Bugzilla

Comment 2

•

24 years ago

Crash still occurring with debug build from 2001-03-12 on WinNT.
My steps to reproduce are slightly different than above: 

1. Go to http://www.chinesecomputing.com
2. Change the character coding selection for the page to anything new
3. CRASH !!! 



void *
js_AllocGCThing(JSContext *cx, uintN flags)
{
    JSBool tried_gc;
    JSRuntime *rt;
    JSGCThing *thing;
    uint8 *flagp;

#ifdef TOO_MUCH_GC
    js_GC(cx, GC_KEEP_ATOMS);
    tried_gc = JS_TRUE;
#else
    tried_gc = JS_FALSE;
#endif

    rt = cx->runtime;
    JS_LOCK_GC(rt);
    JS_ASSERT(!rt->gcRunning); <<<<<<<<<<<<<<<<<<<<<<<<<<< CRASHED HERE
    if (rt->gcRunning) {
        METER(rt->gcStats.finalfail++);
        JS_UNLOCK_GC(rt);
        return NULL;
    }
    METER(rt->gcStats.alloc++);

Comment 3

•

24 years ago

Attached file WinNT stack trace of crash (deleted) — Details

Comment 4

•

24 years ago

NOTES:

1. This crash only happened for me with debug builds, not the nightly binaries

2. With the nightly binaries, I would occasionally get this messagbox on WinNT:

   ###!!!  ASSERTION:  no handler for font package: '0', file       
   d:\mozilla\intl\locale\src\nsFontPackageService.cpp, line 70


3. With the debug build on WinNT, I saw messages like this in the 
   debug console just before the crash:

   JavaScript error: 
   chrome://global/content/charsetOverlay.js  line 126:  wnd has no properties

   JavaScript error: 
   chrome://global/content/charsetOverlay.js  line 126:  wnd has no properties

   ###!!!  ASSERTION:  docshell has null child:  'shell',  file       
   d:\mozilla\docshell\base\nsDocShell.cpp,  line 153

   Charset Overlay menu item pressed: EUC-JP

Comment 5

•

24 years ago

I think this is not a JS Engine problem; reassigning to Embedding:Docshell.
cc'ing Brendan, jband for a more refined analysis on the root cause -

Assignee: rogerl → adamlock

Component: Javascript Engine → Embedding: Docshell

QA Contact: pschwartau → adamlock

John Bandhauer

Comment 6

•

24 years ago

Phil, any place where you see this on Win32 debug builds only and with another 
assertion going the you shoul dsuspect a dup of bug 54792. Look in the stack 
trace for the nest into the event loop from the NS_ASSERTION with a JS gc 
further up the stack.

Comment 7

•

24 years ago

I wasn't sure it was bug 54792 because I didn't see any nsDebug::Assertion 
anywhere in the stack. Should I mark this as a dupe of 54792, then?

John Bandhauer

Comment 8

•

24 years ago

Phil, It is not clear if this bug on the whole is only a dup of bug 54792 or 
not. I think that the two cases you listed probably are. The stack trace 
attached to this bug is not conclusive for me. There may be an assert hiding in 
the transition from the the View dtor into the NT code (that the debugger is 
not showing). Or something else might be going on that causes it to get into 
dispatching events. We know there have been some cases where dtors called 
from JS finalizers have made calls into other objects that then get back into JS 
and cause this assert. I don't know if that is one of them or just the stupid 
Windows assert dialog bug.

Assignee

Comment 9

•

24 years ago

The assertion happens because destruction of the old set of docshells causes 
garbage collection to start. This causes the destruction of the associated views 
and when they destroy or release their parent widget the thread goes into 
message processing loop. Before the loop ends, a DOM keyboard event is processed 
from the queue and the assertion occurs as it is turned into a JS object.

Assignee

Comment 10

•

24 years ago

John, it does look similar to the other bug.

I think the problem here is the message processing loop happening during GC. 
This makes it possible for DOM events to be handled before the loop ends. DOM 
events usually end up with a JS object being created which is causing the bug 
we're seeing because GC is still in progress.

I can't think of why there is a message processing loop in the middle of this 
stack trace, but it could be that the widget is actually a proxy so calling 
methods on it causes events to be posted and a processing loop during which 
other things can happen.

If so then perhaps we can fix this problem by putting checks into the proxy code 
so that when the calling thread and the object thread are the same it just makes 
a straight call rather than using a queue.

Judson Valeski

Updated

•

24 years ago

Target Milestone: --- → mozilla0.9

Comment 11

•

24 years ago

Danm saw a similar problem, and testified that Windows just *had* to process a
message or two in the middle of a Win32 call that I claimed should not process
messages.

I am clueless about Windows, and defer to others, but it seems to me that
between our code's reentrancy restrictions, and JS's run-to-completion execution
model, we need to keep event/message processing confined to well-known loops
down the stack from JS and other code activations.  Can we?

/be

Assignee

Comment 12

•

24 years ago

The only time I know that Win32 will enter a message processing loop without 
explicitly being told to is for a marshalled COM call. Since we don't make such 
a call anywhere near here (though it could happen around the clipboard / drag & 
drop code) I'm not not sure what the reason is.

If it does occur some other time I don't think there is much we can do about 
it.

Dan M

Comment 13

•

24 years ago

Brendan is talking about bug 69586; a crash with a stack trace like this:

js_AllocGCThing(JSContext * 0x00db3460, unsigned int 0) line 448 + 41 bytes
...
nsWindow::DispatchFocus(unsigned int 105) line 4112 + 15 bytes
...
nsWindow::WindowProc(HWND__ * 0x00030288, unsigned int 7, unsigned int 393444, 
long 0) line 923 + 27 bytes
USER32! 77e13eb0()
...
DocumentViewerImpl::~DocumentViewerImpl() line 414 + 97 bytes
...
js_GC(JSContext * 0x035a8908, unsigned int 0) line 1218 + 11 bytes
...
nsDocShell::CreateContentViewer(nsDocShell * const 0x035b47c0, const char * 
0x0012f828, nsIRequest * 0x0482c698, nsIStreamListener * * 0x0012f87c) line 2771 
+ 32 bytes
...
PL_HandleEvent(PLEvent * 0x036924ec) line 586 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x00b57318) line 516 + 9 bytes
_md_EventReceiverProc(HWND__ * 0x001e00e6, unsigned int 49442, unsigned int 0, 
long 11891480) line 1067 + 9 bytes
...
nsAppShellService::Run(nsAppShellService * const 0x00b4b770) line 408

Some object is being deleted, Mozilla enters js_GC, the whole window gets torn 
down, Windows sends a WM_SETFOCUS message directly to some (presumably, other) 
window, the handler for that event invokes JS. eek.

It isn't entering a Windows message processing loop at all. Windows is 
dispatching Windows events directly. All very legal. All very problematic.

Assignee

Comment 14

•

24 years ago

It looks as if focus events should be discarded when a window is being taken 
down. Does anyone know who did the focus stuff?

Assignee

Comment 15

•

24 years ago

CC'ing saari@netscape.com

Chris, do you have any ideas on the best way to prevent the focus issues from 
causing DOM events to be fired to a window during it's destruction?

saari (gone)

Comment 16

•

24 years ago

Event firing during window transitions is a general problem that we haven't
really addressed yet. We could certainly lock things down if we notify the
presshell that it is about to be destroyed and then proceed with said destruction.

Dan M

Comment 17

•

24 years ago

In the stack trace I posted above, the message causing all the trouble is a *get* 
focus message on a window other than the one being destroyed. Two different 
PresShells. Locking down event handling on a window while it is being destroyed 
wouldn't help; in fact, that may be the only window you wouldn't have to lock 
down. Yar.

Assignee

Comment 18

•

24 years ago

It may be possible to stick some extra refcounts on the document viewer to stop 
it going away right in the middle of GC. Investigating further.

Assignee

Comment 19

•

24 years ago

Refcount testing didn't work, moving to 0.9.1. 

Note also that the bug has no apparant ill-effect on the release version save 
perhaps for a lost focus event. The crash, or rather assertion only happens in 
debug mode.

Assignee

Updated

•

24 years ago

Target Milestone: mozilla0.9 → mozilla0.9.1

Comment 20

•

24 years ago

Adam: no kungFuDeathGrip luck?

The debug-only assertion-crash (JS assertions are always fatal) does mean that
in release builds, at least one JS object (or perhaps a string or double, those
are GC-allocated too) allocation is failing.  This should throw an "out of
memory" exception.  Any sign of that?  If not, perhaps the JS allocation is also
somehow dependent on DEBUG.

/be

Comment 21

•

24 years ago

*** Bug 77064 has been marked as a duplicate of this bug. ***

rbs

Comment 22

•

24 years ago

[Win2K] I am seeing this now and then.

Assignee

Comment 23

•

24 years ago

Here is a list of things I tried, all of which were in nsDocShell::Destroy:

1. Put a death grip on the docshell's content viewer
2. Move the call to DestroyChildren() to before the destruction of the parent 
docshell's content viewer
3. Splitting the call to destroy each child docshell into two methods called 
before and after content viewer destruction.

No luck so far.

I will work on this some more. I will also investigate whether the loss of the 
focus DOM event has any repercussions, e.g. JS thinking it's out of memory, 
focus problems etc. Suggestions are welcome.

Marek Z. Jeziorek

Updated

•

24 years ago

Blocks: 77421

Assignee

Comment 24

•

24 years ago

Attached patch Patch fixes the GC problem, review and superreview please (deleted) — Details — Splinter Review

Assignee

Comment 25

•

24 years ago

My previous attempts at death grips were in the wrong place. Turns out that a 
single line death grip in nsDocShell::Embed() ensures the content viewer is 
destroyed *after* GC so this chain of events does not occur.

Patch also removes a few extraneous lines related to the nsDocShell::Destroy() 
method.

Can I have review and super-review please?

rbs

Comment 26

•

24 years ago

For ease of reference later, a comment would help here:

+     // Ensure that the content viewer is destroyed *after* the GC - bug 71515
+     nsCOMPtr<nsIContentViewer> kungfuDeathGrip = mContentViewer;

Assignee

Comment 27

•

24 years ago

Thanks, I'll add a comment above the line.

rpotts (gone)

Comment 28

•

24 years ago

this looks good to me - sr=rpotts.

you might want to delete the comment at line 3079 which says "don't hold onto a 
reference to the DocViewer beyond this point" :-)  Since that is exactly what 
this patch is doing :-)

I believe that that comment is old and out of date...  Does anyone think that 
the comment is still valid?

-- rick

Comment 29

•

24 years ago

r=brendan@mozilla.org