Closed Bug 466747 Opened 16 years ago Closed 16 years ago

TM: "Assertion failure: fp->slots + fp->script->nfixed + js_ReconstructStackDepth(cx, fp->script, fp->regs->pc) == fp->regs->sp" with DOM manipulation of script elements

Categories

(Core :: JavaScript Engine, defect, P2)

x86
macOS
defect

Tracking

()

VERIFIED FIXED

People

(Reporter: jruderman, Assigned: graydon)

References

Details

(Keywords: assertion, testcase, verified1.9.1, Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files, 3 obsolete files)

Loading the testcase in my mozilla-central debug build triggers: Assertion failure: fp->slots + fp->script->nfixed + js_ReconstructStackDepth(cx, fp->script, fp->regs->pc) == fp->regs->sp, at /Users/jruderman/central/js/src/jstracer.cpp:3595
Flags: blocking1.9.1?
Summary: "Assertion failure: fp->slots + fp->script->nfixed + js_ReconstructStackDepth(cx, fp->script, fp->regs->pc) == fp->regs->sp" with DOM manipulation of script elements → TM: "Assertion failure: fp->slots + fp->script->nfixed + js_ReconstructStackDepth(cx, fp->script, fp->regs->pc) == fp->regs->sp" with DOM manipulation of script elements
Assertion failurez iz badz.
Flags: blocking1.9.1? → blocking1.9.1+
Priority: -- → P2
Assignee: general → graydon
Ok, I've narrowed down what's going on here a ways. The assert is claiming that when the trace returned, it left the native stack in a state has 2 new things on it; but when the assert runs js_ReconstructStackDepth() on the bytecode making up the trace -- which simulates the effect of each opcode on the stack depth -- it decides there should only be 1 thing on the stack at that pc, not 2. So the assert trips. So: why are there 2 things on the stack instead of 1? Who is right? This is the question. The exit in question comes from the bottom of the loop: "for (var q = 0; q < " + m + "; ++q) { }" which is an lt relational; the test is "q < 7" in the case sitting in my debugger, guarded on the assumption of "being true"; so we take the exit when the guard turns false. The code in TraceRecorder::relational() emits this guard via TraceRecorder::fuseif(), which describes the exit with the belief that the stack still contains the 2 values "q" and "7", not the single value "false" that ought to be on the stack after the lt operation has occurred. Which is correct for the exit, mm? The pre-operation state "q" and "7", or the post-operation state "false"? Looking at ::relational() more closely, it's clear "false" has not actually been *written* back to the stack when the exit has taken. We decide to exit in ::fuseif *before* writing "false" back to the stack. That's a hint. Digging further, ::fuseif() calls TraceRecorder::snapshot(), which calls js_NativeStackSlots() to work out the number of stack slots present, which asks the *interpreter* about its current sp. So my current hunch is that the interpreter disagrees with the judgment of js_ReconstructStackDepth(), and that 2 is in fact the correct number, given the state of recording: we're still, to some extent, in the "pre-state" of the lt-fused-with-the-next-ifne-opcode hybrid operation. Or something. I think this all *might* have to do with the TRY_BRANCH_AFTER_COND macro shortcut installed in rev 2c17ba7e9da3, back in August, but I'll have to dig a bit more after lunch to confirm. (I'm surprised, if that's the case, that we're not hitting this condition a lot more commonly...)
Status: NEW → ASSIGNED
Attached patch Fix the bug (obsolete) (deleted) — Splinter Review
Whoo, that was a bit of a diversion. What's actually happening is that we're returning with a pc adjustment of 20, which is the IFNE opcode following the LT at the bottom of the loop. This is incorrect; we're supposed to be called to trace *before* the LT executes, and this is the conceptual basis for what gets written out (correctly) by the fuseif operation. So I have discovered talking to gal. The new question, then, is why we get a pc adjustment of 20 rather than 19? It makes no sense. And after much stabbing around, it hasn't to do with thin loops. The answer is much spookier, and explains why we can only reproduce in the browser: there are two <script> tags at work here, containing almost-identical code, compiled-and-run one after another. Turns out we are *recycling* the spidermonkey bytecode buffer memory (via malloc) and running cached JIT'ed machine code for the old bytecode when we actually got new bytecode that is, alas, a single opcode shorter: opcode "JSOP_ONE" (one byte) is used in the second run, triggering the JIT'ed code, but it executes JIT'ed code compiled with a "JSOP_INT8 7" (two bytes) in the same location, wired to return therefore 1 byte of pc late. Thus the wild symptom causing the above deep diversions from the root cause. The fix is trivial. Flush the JIT cache same place we flush the property cache, when the script is destroyed. Patch attached.
Attachment #361015 - Flags: review?
Attachment #361015 - Flags: review? → review?(brendan)
Comment on attachment 361015 [details] [diff] [review] Fix the bug >diff -r 569acf636d50 -r 05de25319810 js/src/jsscript.cpp >--- a/js/src/jsscript.cpp Wed Feb 04 11:08:31 2009 -0800 >+++ b/js/src/jsscript.cpp Fri Feb 06 18:14:14 2009 -0800 >@@ -60,6 +60,9 @@ > #include "jsparse.h" > #include "jsscope.h" > #include "jsscript.h" >+#ifdef JS_TRACER >+#include "jstracer.h" >+#endif > #if JS_HAS_XDR > #include "jsxdrapi.h" > #endif Turns out you can #include "jstracer.h" without #ifdefs around the #include. >@@ -1607,6 +1610,9 @@ > JS_ASSERT(script->owner == cx->thread); > #endif > js_FlushPropertyCacheForScript(cx, script); >+#ifdef JS_TRACER >+ js_FlushJITCache(cx); >+#endif Any way to flush just this script's pc mapping? Worried that we will over-flush when loading top level scripts (sometimes this is done via the DOM APIs: create a Script node, insert it). /be
(In reply to comment #4) > Turns out you can #include "jstracer.h" without #ifdefs around the #include. Wondered about that, wasn't sure if you had a preference wrt. style. > >@@ -1607,6 +1610,9 @@ > > JS_ASSERT(script->owner == cx->thread); > > #endif > > js_FlushPropertyCacheForScript(cx, script); > >+#ifdef JS_TRACER > >+ js_FlushJITCache(cx); > >+#endif > > Any way to flush just this script's pc mapping? Worried that we will over-flush > when loading top level scripts (sometimes this is done via the DOM APIs: create > a Script node, insert it). Also wondered about that. It's not super hard; just need to make a helper function (or perhaps a defaulted script argument to the existing one) and possibly store an extra script pointer in each VMFragment if it's not already reachable from the tangle of what's already in there. Will cost a full hashtable walk, but the full flush already costs that, so strictly cheaper. I'll fiddle with this on monday.
(In reply to comment #5) > (In reply to comment #4) > > > Turns out you can #include "jstracer.h" without #ifdefs around the #include. > > Wondered about that, wasn't sure if you had a preference wrt. style. Generally against extra #ifdefs :-). We often define stub macros in #else parts to relieve callers; same motivation for #include guards. I think originally you had to #ifdef the #include, but that changed and most includers do not ifdef. > Also wondered about that. It's not super hard; just need to make a helper > function (or perhaps a defaulted script argument to the existing one) and > possibly store an extra script pointer in each VMFragment if it's not already > reachable from the tangle of what's already in there. You could just test JS_UPTRDIFF(f->ip, script->code) < script->length and if true, then f->ip is in script's bytecode vector. /be
Attachment #361015 - Flags: review?(brendan) → review?(danderson)
Attachment #361015 - Flags: review?(danderson)
Isn't this a bit extreme? Flushing the entire cache? Brendan how frequently do we hit this path?
Attached patch More conservative fix (obsolete) (deleted) — Splinter Review
This is a more conservative fix; it just nulls out the hashtable entry for any VMFragment peer list associated with a particular script, as brendan suggested. Still fixes this particular crasher, doesn't appear to interact negatively with any other tests I can run (sunspider in browser, dromaeo, alexa, trace-test in shell, etc.)
Attachment #361015 - Attachment is obsolete: true
Attachment #361399 - Flags: review?(gal)
Comment on attachment 361399 [details] [diff] [review] More conservative fix Nice.
Attachment #361399 - Flags: review?(gal) → review+
Whiteboard: fixed-in-tracemonkey
Ugh, no, this 'fix' is not correct; it's nulling out entire hash bucket-chains if their *head element* is in the ip right range, rather than killing the specific offending VMFragments along the chain in question. We actually have to traverse every chain and kill the chain *entries* that are possible offenders. Will post followup patch.
Whiteboard: fixed-in-tracemonkey
Attached patch Correction to flushing logic (obsolete) (deleted) — Splinter Review
It's a testament to both the obscurity of the testcase and the robustness of the JIT that the patch I landed didn't *appear* to break anything; but it wasn't actually correct, as noted in previous comment. This variant is, I think, more correct. Perhaps even fully correct, dare I hope? Please read with some care. I've still got a browser building on this laptop to test more completely.
Attachment #361399 - Attachment is obsolete: true
Attachment #361472 - Flags: review?(gal)
Attachment #361472 - Flags: review?(gal) → review?(jorendorff)
Comment on attachment 361472 [details] [diff] [review] Correction to flushing logic If we search this table frequently (and flush rarely), it would be better to remove the fragment from the chain.
Attachment #361472 - Flags: review?(jorendorff) → review+
This variant disconnects offending entries entirely from the hash chains, as suggested. Probably best. Appears to pass shell testing; rebuilding browser presently to test against original case and confirm. Otherwise look good?
Attachment #361472 - Attachment is obsolete: true
Attachment #361560 - Flags: review?(jorendorff)
Comment on attachment 361560 [details] [diff] [review] Physically disconnect buckets, rather than set ip to 0 > + debug_only_v(printf("Disconnecting VMFragment %p " > + "with ip %p, in range [%p,%p).\n", > + *f, (*f)->ip, script->code, > + script->code + script->length)); GCC will complain that these aren't void *. :-P Otherwise it looks fine.
Attachment #361560 - Flags: review?(jorendorff) → review+
http://hg.mozilla.org/tracemonkey/rev/904555678bda GCC already complains about dozens of such things. If we're to fix them, let's do so all at once?
Really better to avoid adding new warnings -- otherwise the deferred job of fixing the ever-increasing pile looks even more thankless and ill-timed at any given point in the development cycle. I grep out warnings like so: grep warn ./Darwin*/*made|grep -v nanojit|grep -v no-rtti|grep -v editline but the good news is that I do not see novel warnings introduced by this bug's patch. I see: ./Darwin_DBG.OBJ/made:../jstracer.cpp:3264: warning: ‘void js_dumpMap(const TypeMap&)’ defined but not used ./Darwin_DBG.OBJ/made:../../shell/js.cpp:2809: warning: ‘void Callback(JSRuntime*)’ defined but not used ./Darwin_OPT.OBJ/made:../../shell/js.cpp:2809: warning: ‘void Callback(JSRuntime*)’ defined but not used (last two are the same; Andreas is fixing the Callback one), and I'm fixing the jstracer.cpp one in the upvar patch, although it really should go in on its own (provided it gets merged to active release branches -- that's the down side of fixing warnings: versionitis between tm/m-c and 1.9.1, or all three). /be
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → FIXED
Still a lingering secondary patch to backport here, noted in comment #16.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: fixed-in-tracemonkey
Status: REOPENED → RESOLVED
Closed: 16 years ago16 years ago
Resolution: --- → FIXED
http://hg.mozilla.org/tracemonkey/rev/fb75ee8e9017 /cvsroot/mozilla/js/tests/js1_5/Regress/regress-466747.js,v <-- regress-466747.js initial revision: 1.1
Flags: in-testsuite+
v 1.9.1, 1.9.2.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: