Closed
Bug 71515
Opened 24 years ago
Closed 24 years ago
Reentrance in js_AllocGCThing which leads to crash in debug version
Categories
(Core :: DOM: Navigation, defect)
Tracking
()
RESOLVED
FIXED
mozilla0.9.1
People
(Reporter: shanjian, Assigned: adamlock)
References
()
Details
(Keywords: crash)
Attachments
(2 files)
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
Details | Diff | Splinter Review |
While I was working on bug 70759, the same reproduce process always leads to assertion and
crash in my debug build. This problem has been bothering me for several days. To reproduce
the problem:
1. go to http://www.chinesecomputing.com
2. click any link on the home page (optional: click down several links)
3. change the Character Coding selection for the page (e.g., change to Chinese
Traditional Big 5)
After step 3, the program shows assertion at file jsgc.c, line 448
JS_ASSERT(!rt->gcRunning);
I spent some time on this, and posted here for your reference:
After step 2, set breakpoint at jsgc.c, 1103,
rt->gcRunning = JS_TRUE;
Then follow step 3. The program will stop at jsgc.c:1103 twice. The first
time it stop there, let it go. The second time it will crash.
The reentrance happened in the last item of rt->gcArenaPool.first (around line 1205).
It happened inside function "finalizer(cx, thing);".
Comment 1•24 years ago
|
||
Marking NEW this has a lot of good info in it.
Comment 2•24 years ago
|
||
Crash still occurring with debug build from 2001-03-12 on WinNT.
My steps to reproduce are slightly different than above:
1. Go to http://www.chinesecomputing.com
2. Change the character coding selection for the page to anything new
3. CRASH !!!
void *
js_AllocGCThing(JSContext *cx, uintN flags)
{
JSBool tried_gc;
JSRuntime *rt;
JSGCThing *thing;
uint8 *flagp;
#ifdef TOO_MUCH_GC
js_GC(cx, GC_KEEP_ATOMS);
tried_gc = JS_TRUE;
#else
tried_gc = JS_FALSE;
#endif
rt = cx->runtime;
JS_LOCK_GC(rt);
JS_ASSERT(!rt->gcRunning); <<<<<<<<<<<<<<<<<<<<<<<<<<< CRASHED HERE
if (rt->gcRunning) {
METER(rt->gcStats.finalfail++);
JS_UNLOCK_GC(rt);
return NULL;
}
METER(rt->gcStats.alloc++);
Comment 3•24 years ago
|
||
Comment 4•24 years ago
|
||
NOTES:
1. This crash only happened for me with debug builds, not the nightly binaries
2. With the nightly binaries, I would occasionally get this messagbox on WinNT:
###!!! ASSERTION: no handler for font package: '0', file
d:\mozilla\intl\locale\src\nsFontPackageService.cpp, line 70
3. With the debug build on WinNT, I saw messages like this in the
debug console just before the crash:
JavaScript error:
chrome://global/content/charsetOverlay.js line 126: wnd has no properties
JavaScript error:
chrome://global/content/charsetOverlay.js line 126: wnd has no properties
###!!! ASSERTION: docshell has null child: 'shell', file
d:\mozilla\docshell\base\nsDocShell.cpp, line 153
Charset Overlay menu item pressed: EUC-JP
Comment 5•24 years ago
|
||
I think this is not a JS Engine problem; reassigning to Embedding:Docshell.
cc'ing Brendan, jband for a more refined analysis on the root cause -
Assignee: rogerl → adamlock
Component: Javascript Engine → Embedding: Docshell
QA Contact: pschwartau → adamlock
Comment 6•24 years ago
|
||
Phil, any place where you see this on Win32 debug builds only and with another
assertion going the you shoul dsuspect a dup of bug 54792. Look in the stack
trace for the nest into the event loop from the NS_ASSERTION with a JS gc
further up the stack.
Comment 7•24 years ago
|
||
I wasn't sure it was bug 54792 because I didn't see any nsDebug::Assertion
anywhere in the stack. Should I mark this as a dupe of 54792, then?
Comment 8•24 years ago
|
||
Phil, It is not clear if this bug on the whole is only a dup of bug 54792 or
not. I think that the two cases you listed probably are. The stack trace
attached to this bug is not conclusive for me. There may be an assert hiding in
the transition from the the View dtor into the NT code (that the debugger is
not showing). Or something else might be going on that causes it to get into
dispatching events. We know there have been some cases where dtors called
from JS finalizers have made calls into other objects that then get back into JS
and cause this assert. I don't know if that is one of them or just the stupid
Windows assert dialog bug.
The assertion happens because destruction of the old set of docshells causes
garbage collection to start. This causes the destruction of the associated views
and when they destroy or release their parent widget the thread goes into
message processing loop. Before the loop ends, a DOM keyboard event is processed
from the queue and the assertion occurs as it is turned into a JS object.
Assignee | ||
Comment 10•24 years ago
|
||
John, it does look similar to the other bug.
I think the problem here is the message processing loop happening during GC.
This makes it possible for DOM events to be handled before the loop ends. DOM
events usually end up with a JS object being created which is causing the bug
we're seeing because GC is still in progress.
I can't think of why there is a message processing loop in the middle of this
stack trace, but it could be that the widget is actually a proxy so calling
methods on it causes events to be posted and a processing loop during which
other things can happen.
If so then perhaps we can fix this problem by putting checks into the proxy code
so that when the calling thread and the object thread are the same it just makes
a straight call rather than using a queue.
Updated•24 years ago
|
Target Milestone: --- → mozilla0.9
Comment 11•24 years ago
|
||
Danm saw a similar problem, and testified that Windows just *had* to process a
message or two in the middle of a Win32 call that I claimed should not process
messages.
I am clueless about Windows, and defer to others, but it seems to me that
between our code's reentrancy restrictions, and JS's run-to-completion execution
model, we need to keep event/message processing confined to well-known loops
down the stack from JS and other code activations. Can we?
/be
Assignee | ||
Comment 12•24 years ago
|
||
The only time I know that Win32 will enter a message processing loop without
explicitly being told to is for a marshalled COM call. Since we don't make such
a call anywhere near here (though it could happen around the clipboard / drag &
drop code) I'm not not sure what the reason is.
If it does occur some other time I don't think there is much we can do about
it.
Comment 13•24 years ago
|
||
Brendan is talking about bug 69586; a crash with a stack trace like this:
js_AllocGCThing(JSContext * 0x00db3460, unsigned int 0) line 448 + 41 bytes
...
nsWindow::DispatchFocus(unsigned int 105) line 4112 + 15 bytes
...
nsWindow::WindowProc(HWND__ * 0x00030288, unsigned int 7, unsigned int 393444,
long 0) line 923 + 27 bytes
USER32! 77e13eb0()
...
DocumentViewerImpl::~DocumentViewerImpl() line 414 + 97 bytes
...
js_GC(JSContext * 0x035a8908, unsigned int 0) line 1218 + 11 bytes
...
nsDocShell::CreateContentViewer(nsDocShell * const 0x035b47c0, const char *
0x0012f828, nsIRequest * 0x0482c698, nsIStreamListener * * 0x0012f87c) line 2771
+ 32 bytes
...
PL_HandleEvent(PLEvent * 0x036924ec) line 586 + 10 bytes
PL_ProcessPendingEvents(PLEventQueue * 0x00b57318) line 516 + 9 bytes
_md_EventReceiverProc(HWND__ * 0x001e00e6, unsigned int 49442, unsigned int 0,
long 11891480) line 1067 + 9 bytes
...
nsAppShellService::Run(nsAppShellService * const 0x00b4b770) line 408
Some object is being deleted, Mozilla enters js_GC, the whole window gets torn
down, Windows sends a WM_SETFOCUS message directly to some (presumably, other)
window, the handler for that event invokes JS. eek.
It isn't entering a Windows message processing loop at all. Windows is
dispatching Windows events directly. All very legal. All very problematic.
Assignee | ||
Comment 14•24 years ago
|
||
It looks as if focus events should be discarded when a window is being taken
down. Does anyone know who did the focus stuff?
Assignee | ||
Comment 15•24 years ago
|
||
CC'ing saari@netscape.com
Chris, do you have any ideas on the best way to prevent the focus issues from
causing DOM events to be fired to a window during it's destruction?
Comment 16•24 years ago
|
||
Event firing during window transitions is a general problem that we haven't
really addressed yet. We could certainly lock things down if we notify the
presshell that it is about to be destroyed and then proceed with said destruction.
Comment 17•24 years ago
|
||
In the stack trace I posted above, the message causing all the trouble is a *get*
focus message on a window other than the one being destroyed. Two different
PresShells. Locking down event handling on a window while it is being destroyed
wouldn't help; in fact, that may be the only window you wouldn't have to lock
down. Yar.
Assignee | ||
Comment 18•24 years ago
|
||
It may be possible to stick some extra refcounts on the document viewer to stop
it going away right in the middle of GC. Investigating further.
Assignee | ||
Comment 19•24 years ago
|
||
Refcount testing didn't work, moving to 0.9.1.
Note also that the bug has no apparant ill-effect on the release version save
perhaps for a lost focus event. The crash, or rather assertion only happens in
debug mode.
Comment 20•24 years ago
|
||
Adam: no kungFuDeathGrip luck?
The debug-only assertion-crash (JS assertions are always fatal) does mean that
in release builds, at least one JS object (or perhaps a string or double, those
are GC-allocated too) allocation is failing. This should throw an "out of
memory" exception. Any sign of that? If not, perhaps the JS allocation is also
somehow dependent on DEBUG.
/be
Comment 21•24 years ago
|
||
*** Bug 77064 has been marked as a duplicate of this bug. ***
Comment 22•24 years ago
|
||
[Win2K] I am seeing this now and then.
Assignee | ||
Comment 23•24 years ago
|
||
Here is a list of things I tried, all of which were in nsDocShell::Destroy:
1. Put a death grip on the docshell's content viewer
2. Move the call to DestroyChildren() to before the destruction of the parent
docshell's content viewer
3. Splitting the call to destroy each child docshell into two methods called
before and after content viewer destruction.
No luck so far.
I will work on this some more. I will also investigate whether the loss of the
focus DOM event has any repercussions, e.g. JS thinking it's out of memory,
focus problems etc. Suggestions are welcome.
Assignee | ||
Comment 24•24 years ago
|
||
Assignee | ||
Comment 25•24 years ago
|
||
My previous attempts at death grips were in the wrong place. Turns out that a
single line death grip in nsDocShell::Embed() ensures the content viewer is
destroyed *after* GC so this chain of events does not occur.
Patch also removes a few extraneous lines related to the nsDocShell::Destroy()
method.
Can I have review and super-review please?
Comment 26•24 years ago
|
||
For ease of reference later, a comment would help here:
+ // Ensure that the content viewer is destroyed *after* the GC - bug 71515
+ nsCOMPtr<nsIContentViewer> kungfuDeathGrip = mContentViewer;
Assignee | ||
Comment 27•24 years ago
|
||
Thanks, I'll add a comment above the line.
Comment 28•24 years ago
|
||
this looks good to me - sr=rpotts.
you might want to delete the comment at line 3079 which says "don't hold onto a
reference to the DocViewer beyond this point" :-) Since that is exactly what
this patch is doing :-)
I believe that that comment is old and out of date... Does anyone think that
the comment is still valid?
-- rick
Comment 29•24 years ago
|
||
Assignee | ||
Comment 30•24 years ago
|
||
Fix checked in.
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•