Closed Bug 466747 Opened 16 years ago Closed 16 years ago

TM: "Assertion failure: fp->slots + fp->script->nfixed + js_ReconstructStackDepth(cx, fp->script, fp->regs->pc) == fp->regs->sp" with DOM manipulation of script elements

Categories

(Core :: JavaScript Engine, defect, P2)

Product:

Component:

Platform:

x86

macOS

Type:

defect

Priority:

P2

Severity:

critical

Tracking

()

Status:

VERIFIED FIXED

People

(Reporter: jruderman, Assigned: graydon)

References

Details

(Keywords: assertion, testcase, verified1.9.1, Whiteboard: fixed-in-tracemonkey)

Attachments

(2 files, 3 obsolete files)

testcase (triggers a fatal assertion when loaded in Firefox) 16 years ago Jesse Ruderman (deleted), text/html		Details
Fix the bug 16 years ago Graydon Hoare :graydon (deleted), patch		Details \| Diff \| Splinter Review
More conservative fix 16 years ago Graydon Hoare :graydon (deleted), patch	gal : review+	Details \| Diff \| Splinter Review
Correction to flushing logic 16 years ago Graydon Hoare :graydon (deleted), patch	jorendorff : review+	Details \| Diff \| Splinter Review
Physically disconnect buckets, rather than set ip to 0 16 years ago Graydon Hoare :graydon (deleted), patch	jorendorff : review+	Details \| Diff \| Splinter Review

Reporter

Description

•

16 years ago

Attached file testcase (triggers a fatal assertion when loaded in Firefox) (deleted) — Details

Loading the testcase in my mozilla-central debug build triggers: Assertion failure: fp->slots + fp->script->nfixed + js_ReconstructStackDepth(cx, fp->script, fp->regs->pc) == fp->regs->sp, at /Users/jruderman/central/js/src/jstracer.cpp:3595

Flags: blocking1.9.1?

Reporter

Updated

•

16 years ago

Summary: "Assertion failure: fp->slots + fp->script->nfixed + js_ReconstructStackDepth(cx, fp->script, fp->regs->pc) == fp->regs->sp" with DOM manipulation of script elements → TM: "Assertion failure: fp->slots + fp->script->nfixed + js_ReconstructStackDepth(cx, fp->script, fp->regs->pc) == fp->regs->sp" with DOM manipulation of script elements

Mike Beltzner [:beltzner, not reading bugmail]

Comment 1

•

16 years ago

Assertion failurez iz badz.

Flags: blocking1.9.1? → blocking1.9.1+

Priority: -- → P2

Graydon Hoare :graydon

Assignee

Updated

•

16 years ago

Assignee: general → graydon

Graydon Hoare :graydon

Assignee

Comment 2

•

16 years ago

Ok, I've narrowed down what's going on here a ways. The assert is claiming that when the trace returned, it left the native stack in a state has 2 new things on it; but when the assert runs js_ReconstructStackDepth() on the bytecode making up the trace -- which simulates the effect of each opcode on the stack depth -- it decides there should only be 1 thing on the stack at that pc, not 2. So the assert trips. So: why are there 2 things on the stack instead of 1? Who is right? This is the question. The exit in question comes from the bottom of the loop: "for (var q = 0; q < " + m + "; ++q) { }" which is an lt relational; the test is "q < 7" in the case sitting in my debugger, guarded on the assumption of "being true"; so we take the exit when the guard turns false. The code in TraceRecorder::relational() emits this guard via TraceRecorder::fuseif(), which describes the exit with the belief that the stack still contains the 2 values "q" and "7", not the single value "false" that ought to be on the stack after the lt operation has occurred. Which is correct for the exit, mm? The pre-operation state "q" and "7", or the post-operation state "false"? Looking at ::relational() more closely, it's clear "false" has not actually been *written* back to the stack when the exit has taken. We decide to exit in ::fuseif *before* writing "false" back to the stack. That's a hint. Digging further, ::fuseif() calls TraceRecorder::snapshot(), which calls js_NativeStackSlots() to work out the number of stack slots present, which asks the *interpreter* about its current sp. So my current hunch is that the interpreter disagrees with the judgment of js_ReconstructStackDepth(), and that 2 is in fact the correct number, given the state of recording: we're still, to some extent, in the "pre-state" of the lt-fused-with-the-next-ifne-opcode hybrid operation. Or something. I think this all *might* have to do with the TRY_BRANCH_AFTER_COND macro shortcut installed in rev 2c17ba7e9da3, back in August, but I'll have to dig a bit more after lunch to confirm. (I'm surprised, if that's the case, that we're not hitting this condition a lot more commonly...)

Status: NEW → ASSIGNED

Graydon Hoare :graydon

Assignee

Comment 3

•

16 years ago

Attached patch Fix the bug (obsolete) (deleted) — Details — Splinter Review

Whoo, that was a bit of a diversion. What's actually happening is that we're returning with a pc adjustment of 20, which is the IFNE opcode following the LT at the bottom of the loop. This is incorrect; we're supposed to be called to trace *before* the LT executes, and this is the conceptual basis for what gets written out (correctly) by the fuseif operation. So I have discovered talking to gal. The new question, then, is why we get a pc adjustment of 20 rather than 19? It makes no sense. And after much stabbing around, it hasn't to do with thin loops. The answer is much spookier, and explains why we can only reproduce in the browser: there are two <script> tags at work here, containing almost-identical code, compiled-and-run one after another. Turns out we are *recycling* the spidermonkey bytecode buffer memory (via malloc) and running cached JIT'ed machine code for the old bytecode when we actually got new bytecode that is, alas, a single opcode shorter: opcode "JSOP_ONE" (one byte) is used in the second run, triggering the JIT'ed code, but it executes JIT'ed code compiled with a "JSOP_INT8 7" (two bytes) in the same location, wired to return therefore 1 byte of pc late. Thus the wild symptom causing the above deep diversions from the root cause. The fix is trivial. Flush the JIT cache same place we flush the property cache, when the script is destroyed. Patch attached.

Attachment #361015 - Flags: review?

Graydon Hoare :graydon

Assignee

Updated

•

16 years ago

Attachment #361015 - Flags: review? → review?(brendan)

Brendan Eich [:brendan]

Comment 4

•

16 years ago

Comment on attachment 361015 [details] [diff] [review] Fix the bug >diff -r 569acf636d50 -r 05de25319810 js/src/jsscript.cpp >--- a/js/src/jsscript.cpp Wed Feb 04 11:08:31 2009 -0800 >+++ b/js/src/jsscript.cpp Fri Feb 06 18:14:14 2009 -0800 >@@ -60,6 +60,9 @@ > #include "jsparse.h" > #include "jsscope.h" > #include "jsscript.h" >+#ifdef JS_TRACER >+#include "jstracer.h" >+#endif > #if JS_HAS_XDR > #include "jsxdrapi.h" > #endif Turns out you can #include "jstracer.h" without #ifdefs around the #include. >@@ -1607,6 +1610,9 @@ > JS_ASSERT(script->owner == cx->thread); > #endif > js_FlushPropertyCacheForScript(cx, script); >+#ifdef JS_TRACER >+ js_FlushJITCache(cx); >+#endif Any way to flush just this script's pc mapping? Worried that we will over-flush when loading top level scripts (sometimes this is done via the DOM APIs: create a Script node, insert it). /be

Graydon Hoare :graydon

Assignee

Comment 5

•

16 years ago

(In reply to comment #4) > Turns out you can #include "jstracer.h" without #ifdefs around the #include. Wondered about that, wasn't sure if you had a preference wrt. style. > >@@ -1607,6 +1610,9 @@ > > JS_ASSERT(script->owner == cx->thread); > > #endif > > js_FlushPropertyCacheForScript(cx, script); > >+#ifdef JS_TRACER > >+ js_FlushJITCache(cx); > >+#endif > > Any way to flush just this script's pc mapping? Worried that we will over-flush > when loading top level scripts (sometimes this is done via the DOM APIs: create > a Script node, insert it). Also wondered about that. It's not super hard; just need to make a helper function (or perhaps a defaulted script argument to the existing one) and possibly store an extra script pointer in each VMFragment if it's not already reachable from the tangle of what's already in there. Will cost a full hashtable walk, but the full flush already costs that, so strictly cheaper. I'll fiddle with this on monday.

Brendan Eich [:brendan]

Comment 6

•

16 years ago

(In reply to comment #5) > (In reply to comment #4) > > > Turns out you can #include "jstracer.h" without #ifdefs around the #include. > > Wondered about that, wasn't sure if you had a preference wrt. style. Generally against extra #ifdefs :-). We often define stub macros in #else parts to relieve callers; same motivation for #include guards. I think originally you had to #ifdef the #include, but that changed and most includers do not ifdef. > Also wondered about that. It's not super hard; just need to make a helper > function (or perhaps a defaulted script argument to the existing one) and > possibly store an extra script pointer in each VMFragment if it's not already > reachable from the tangle of what's already in there. You could just test JS_UPTRDIFF(f->ip, script->code) < script->length and if true, then f->ip is in script's bytecode vector. /be

Graydon Hoare :graydon

Assignee

Updated

•

16 years ago

Attachment #361015 - Flags: review?(brendan) → review?(danderson)

Graydon Hoare :graydon

Assignee

Updated

•

16 years ago

Attachment #361015 - Flags: review?(danderson)

Andreas Gal :gal

Comment 7

•

16 years ago

Isn't this a bit extreme? Flushing the entire cache? Brendan how frequently do we hit this path?

Graydon Hoare :graydon

Assignee

Comment 8

•

16 years ago

Attached patch More conservative fix (obsolete) (deleted) — Details — Splinter Review

This is a more conservative fix; it just nulls out the hashtable entry for any VMFragment peer list associated with a particular script, as brendan suggested. Still fixes this particular crasher, doesn't appear to interact negatively with any other tests I can run (sunspider in browser, dromaeo, alexa, trace-test in shell, etc.)

Attachment #361015 - Attachment is obsolete: true

Attachment #361399 - Flags: review?(gal)

Andreas Gal :gal

Comment 9

•

16 years ago

Comment on attachment 361399 [details] [diff] [review] More conservative fix Nice.

Attachment #361399 - Flags: review?(gal) → review+

Graydon Hoare :graydon

Assignee

Comment 10

•

16 years ago

http://hg.mozilla.org/tracemonkey/rev/93425e747d7f

Graydon Hoare :graydon

Assignee

Updated

•

16 years ago

Whiteboard: fixed-in-tracemonkey

Graydon Hoare :graydon

Assignee

Comment 11

•

16 years ago

Ugh, no, this 'fix' is not correct; it's nulling out entire hash bucket-chains if their *head element* is in the ip right range, rather than killing the specific offending VMFragments along the chain in question. We actually have to traverse every chain and kill the chain *entries* that are possible offenders. Will post followup patch.

Whiteboard: fixed-in-tracemonkey

Graydon Hoare :graydon

Assignee

Comment 12

•

16 years ago

Attached patch Correction to flushing logic (obsolete) (deleted) — Details — Splinter Review

It's a testament to both the obscurity of the testcase and the robustness of the JIT that the patch I landed didn't *appear* to break anything; but it wasn't actually correct, as noted in previous comment. This variant is, I think, more correct. Perhaps even fully correct, dare I hope? Please read with some care. I've still got a browser building on this laptop to test more completely.

Attachment #361399 - Attachment is obsolete: true

Attachment #361472 - Flags: review?(gal)

Graydon Hoare :graydon

Assignee

Updated

•

16 years ago

Attachment #361472 - Flags: review?(gal) → review?(jorendorff)

Jason Orendorff [:jorendorff]

Comment 13

•

16 years ago

Comment on attachment 361472 [details] [diff] [review] Correction to flushing logic If we search this table frequently (and flush rarely), it would be better to remove the fragment from the chain.

Attachment #361472 - Flags: review?(jorendorff) → review+

Graydon Hoare :graydon

Assignee

Comment 14

•

16 years ago

Attached patch Physically disconnect buckets, rather than set ip to 0 (deleted) — Details — Splinter Review

This variant disconnects offending entries entirely from the hash chains, as suggested. Probably best. Appears to pass shell testing; rebuilding browser presently to test against original case and confirm. Otherwise look good?

Attachment #361472 - Attachment is obsolete: true

Attachment #361560 - Flags: review?(jorendorff)

Jason Orendorff [:jorendorff]

Comment 15

•

16 years ago

Comment on attachment 361560 [details] [diff] [review] Physically disconnect buckets, rather than set ip to 0 > + debug_only_v(printf("Disconnecting VMFragment %p " > + "with ip %p, in range [%p,%p).\n", > + *f, (*f)->ip, script->code, > + script->code + script->length)); GCC will complain that these aren't void *. :-P Otherwise it looks fine.

Attachment #361560 - Flags: review?(jorendorff) → review+

Graydon Hoare :graydon

Assignee

Comment 16

•

16 years ago

http://hg.mozilla.org/tracemonkey/rev/904555678bda GCC already complains about dozens of such things. If we're to fix them, let's do so all at once?

Brendan Eich [:brendan]

Comment 17

•

16 years ago

Really better to avoid adding new warnings -- otherwise the deferred job of fixing the ever-increasing pile looks even more thankless and ill-timed at any given point in the development cycle. I grep out warnings like so: grep warn ./Darwin*/*made|grep -v nanojit|grep -v no-rtti|grep -v editline but the good news is that I do not see novel warnings introduced by this bug's patch. I see: ./Darwin_DBG.OBJ/made:../jstracer.cpp:3264: warning: ‘void js_dumpMap(const TypeMap&)’ defined but not used ./Darwin_DBG.OBJ/made:../../shell/js.cpp:2809: warning: ‘void Callback(JSRuntime*)’ defined but not used ./Darwin_OPT.OBJ/made:../../shell/js.cpp:2809: warning: ‘void Callback(JSRuntime*)’ defined but not used (last two are the same; Andreas is fixing the Callback one), and I'm fixing the jstracer.cpp one in the upvar patch, although it really should go in on its own (provided it gets merged to active release branches -- that's the down side of fixing warnings: versionitis between tm/m-c and 1.9.1, or all three). /be

Comment 18

•

16 years ago

http://hg.mozilla.org/mozilla-central/rev/93425e747d7f

Status: ASSIGNED → RESOLVED

Closed: 16 years ago

Resolution: --- → FIXED

Graydon Hoare :graydon

Assignee

Comment 19

•

16 years ago

Still a lingering secondary patch to backport here, noted in comment #16.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Whiteboard: fixed-in-tracemonkey

Comment 20

•

16 years ago

http://hg.mozilla.org/mozilla-central/rev/904555678bda

Status: REOPENED → RESOLVED

Closed: 16 years ago → 16 years ago

Resolution: --- → FIXED

Comment 21

•

16 years ago

http://hg.mozilla.org/releases/mozilla-1.9.1/rev/3c65bb9e17ea

Keywords: fixed1.9.1

Bob Clary [:bc] (inactive)

Comment 22

•

16 years ago

http://hg.mozilla.org/tracemonkey/rev/fb75ee8e9017 /cvsroot/mozilla/js/tests/js1_5/Regress/regress-466747.js,v <-- regress-466747.js initial revision: 1.1

Flags: in-testsuite+

Bob Clary [:bc] (inactive)

Comment 23

•

16 years ago

v 1.9.1, 1.9.2.

Status: RESOLVED → VERIFIED

Keywords: fixed1.9.1 → verified1.9.1

You need to log in before you can comment on or make changes to this bug.