Closed Bug 632343 Opened 14 years ago Closed 14 years ago

Running automated Firebug tests crashes Firefox [@ js::mjit::EnterMethodJIT]

Categories

(Core :: JavaScript Engine, defect)

x86
Windows Vista
defect
Not set
critical

Tracking

()

RESOLVED FIXED
Tracking Status
blocking2.0 --- betaN+

People

(Reporter: Honza, Assigned: sfink)

References

Details

(Keywords: crash, Whiteboard: [firebug-p1][hardblocker][has patch][fixed-in-tracemonkey])

Crash Data

Attachments

(4 files, 3 obsolete files)

Firefox 4 crashes when running Firebug automated tests (FBTest): Crash report: https://crash-stats.mozilla.com/report/index/bp-8caafa45-9d2c-433b-9402-c1dc02110208 Note that Bug 626830 landed, which could be related: STR: 1) Install Firebug 1.7: http://getfirebug.com/releases/firebug/1.7X/ 2) Install FBTest 1.7: http://getfirebug.com/releases/fbtest/1.7/ 3) Open Test Console: Firebug icon menu 4) Press 'Run All', wait for crash More about how to run Firebug tests here: http://getfirebug.com/wiki/index.php/Running_Automated_Test_Suite Let me know if you need my help to narrow the regression range Bug 595351 is related, but David Baron recommended to create a separate bug report. Honza
Asking for blocking here since this prevents us from running Firebug tests. Honza
blocking2.0: --- → ?
Whiteboard: [firebug-p1]
Severity: normal → critical
Keywords: crash
Summary: Running automated Firebug tests crashes Firefox - js::mjit::EnterMethodJIT → Running automated Firebug tests crashes Firefox [@ js::mjit::EnterMethodJIT]
Regression range according to: http://getfirebug.com/testresults App Changeset: 2ce5e1843ce8 -> OK App Changeset: a02c6f4ffe4a -> CRASH Honza
blocking2.0: ? → final+
Whiteboard: [firebug-p1] → [firebug-p1][hardblocker]
blocking2.0: final+ → betaN+
I'm seeing this with the FBTest suite as well. My current instance is trying to execute code that is zeroed out. gdb can't give me a back trace (on x86-64 at least), but just chasing frame pointers I can get function names. Attaching...
Attached file basic stack trace (deleted) —
Now I have a similar instance, but it hits when my cx is zeroed out. I get some meaningful stack in this case, though, before it hits the mjitted code: #0 0x0000003f9ac0f29b in raise () from /lib64/libpthread.so.0 #1 0x00007ffff46fd054 in JS_Assert (s=0x7ffff4878f9d "(cx)->thread", file=0x7ffff4878d60 "/home/sfink/src/TM-singlestep/js/src/jscntxt.h", ln=2192) at /home/sfink/src/TM-singlestep/js/src/jsutil.cpp:83 #2 0x00007ffff456cf5d in js::AutoGCRooter::AutoGCRooter (this=0x7fffffffb728, cx=0x7fffe8e08298, tag=-1) at /home/sfink/src/TM-singlestep/js/src/jscntxt.h:2192 #3 0x00007ffff456d0fb in js::AutoValueRooter::AutoValueRooter (this=0x7fffffffb728, cx=0x7fffe8e08298, _notifier=...) at /home/sfink/src/TM-singlestep/js/src/jscntxt.h:2260 #4 0x00007ffff45e596c in js_ReportIsNotFunction (cx=0x7fffe8e08298, vp=0x7fffe8d5e550, flags=0) at /home/sfink/src/TM-singlestep/js/src/jsfun.cpp:2971 #5 0x00007ffff4633712 in js::Invoke (cx=0x7fffe8e08298, argsRef=..., flags=0) at /home/sfink/src/TM-singlestep/js/src/jsinterp.cpp:658 #6 0x00007ffff4820ecc in js::mjit::stubs::UncachedCallHelper (f=..., argc=3, ucr=0x7fffffffb8f8) at /home/sfink/src/TM-singlestep/js/src/methodjit/InvokeHelpers.cpp:485 #7 0x00007ffff4820c08 in js::mjit::stubs::UncachedCall (f=..., argc=3) at /home/sfink/src/TM-singlestep/js/src/methodjit/InvokeHelpers.cpp:435 #8 0x00007fffe573e0e7 in ?? () #9 0x0000000000000000 in ?? () I've been getting what is probably the same crash occasionally for a while now (specifically, with js_ReportIsNotFunction on the stack and under a JM frame.) I'm working on getting a nicely reproducible case. I can reproduce trivially by rerunning the test suite, but it crashes in different places each time.
Strange. If I run the 'Firebug' section of the test suite, this always crashes in firebug/openDisableEnableReload. If I expand that section and click on one test at a time, running the same series of tests, it does not crash.
Can you bisect on tracemonkey tree?
Thanks for the bisecting Mossop! Honza
I'll take a stab. I can repro a mjit-only crash in firebug/openDisableEnableReload.js. (Unfortunately, only when running all of the "Firebug" section.) Is this the same test crashing in comment 0?
Assignee: general → lw
Looks like it. That scenario matches what I was seeing, too. Does your stack look about the same? (One or two mjit calls on the stack invoked by an event handler)
Yes.
The cause of the problem seem to be that, if a script is live when we call JS_SetDebugModeForCompartment, it gets added to the liveScripts set and we don't call ReleaseScriptCode on that script. The problem is that this live script could have a call ic which has baked in the address of jit code that *is* freed by JS_SetDebugModeForCompartment. The obvious fix is to just fail setting debug mode if a script from the compartment is active. Why do we even allow calling JS_SetDebugModeForCompartment if any scripts are active? Patch shortly; there also appear to be bugs in nsXPConnect::CheckForDebugMode's error path.
(In reply to comment #13) > The obvious fix is to just fail setting debug mode if a script from the > compartment is active. Why do we even allow calling > JS_SetDebugModeForCompartment if any scripts are active? We don't (or we shouldn't) if you are turning debugging *on*. If you're turning it off, we need to immediately start preventing any JSD callbacks from firing. We do not actually need to recompile (or discard) any scripts, though. They can wait to get sped up the next time they're compiled. > Patch shortly; there also appear to be bugs in nsXPConnect::CheckForDebugMode's > error path. You talking about the leaked context? Oops! You may want to address some of the comments from bug 632864. (Or if you attach the patch, I'll re-raise them on timeless's behalf.)
Attached patch fix (obsolete) (deleted) — Splinter Review
This changes firebug/openDisableEnableReload from crashing every time to merely failing 3 tests: - The net should be marked on the Firebug Statusbar Icon - The script should be marked on the Firebug Statusbar Icon - The script should be marked on the Firebug Statusbar Icon My guess is that this is related to the underlying problem that we are entering debug mode for a compartment whose scripts are on the stack. Steve: if you put a breakpoint on the new 'return JS_FALSE', perhaps you can see how we get into this situation and whether that that in and of itself is a bug or whether we need something like this patch. Patch also fixes a bug in nsXPConnect. Patch also needs comments explaining.
Attachment #511895 - Flags: review?(sphink)
Whiteboard: [firebug-p1][hardblocker] → [firebug-p1][hardblocker][has patch]
(In reply to comment #14) > We don't (or we shouldn't) if you are turning debugging *on*. Ah, so then thats the bug and my patch shouldn't be necessary. Put a breakpoint on the "if (debug) return JS_FALSE" line and you can see how this is happening. It seems like we should but a big fat assert in this function that, when switching on debugging, no scripts are on the stack.
(In reply to comment #15) > Created attachment 511895 [details] [diff] [review] > fix > > This changes firebug/openDisableEnableReload from crashing every time to merely > failing 3 tests: > - The net should be marked on the Firebug Statusbar Icon > - The script should be marked on the Firebug Statusbar Icon > - The script should be marked on the Firebug Statusbar Icon > My guess is that this is related to the underlying problem that we are entering > debug mode for a compartment whose scripts are on the stack. Oddly, that test almost always passes for me with your patch. openInNewWindow fails most of the time, though. > Steve: if you put > a breakpoint on the new 'return JS_FALSE', perhaps you can see how we get into > this situation and whether that that in and of itself is a bug or whether we > need something like this patch. It is hit. I have now edumacated myself about SM stack layouts. I was checking for active scripts by testing cx->regs (equivalent to hasActiveFrame()), but then scanning through all frames in JS_SetDebugModeForCompartment. I *think* I should be checking cx->getCurrentSegment() instead, because saved frames should count. And it turns out you need to be in a request to call xpc::ParticipatesInCycleCollection. Given that ICs can be shared between live and non-live scripts, it seems like JS_SetDebugModeForCompartment should *never* be called when any scripts are live, independent of whether you're turning debugging on or off. I've made another patch that just never calls it at all when turning debugging off. It means that everything will run slower after you turn off debugging until all the scripts eventually get recompiled, which kind of sucks. As an optimization, I could have JS_SetDebugModeForCompartment check whether anything at all is live and if not, discard all JITScripts. But it wouldn't do any good, since in my experience it always takes a little while for everything to stop running. So I'd have to immediately disable JSD, then keep watching for things to quiesce. And that feels like more logic that I would get wrong.
This could probably use some comment improvements and function renamings (JS_SetDebugModeForCompartment -> JS_DiscardJITScripts or something, then reuse the former name for setting comp->debugMode only?) But I wanted to run it past you to see if you think it's the right approach. I also incorporated everything from bug 626743. timeless, if you're listening, I don't know what you mean by this: > + nsCOMPtr<jsdIDebuggerService> jsds = do_GetService(jsdServiceCtrID, &rv); > > i don't think we actually want to do this, i suspect we want to instead ask the > service manager if the service is currently instantiated. the code you have > here will instantiate it even if no one is using it. there is probably some > cost involved in doing this (albeit hopefully not a high cost). This will only get called if SetDebugModeWhenPossible has been called, and realistically speaking that means someone already created the debugger service. But mainly, I don't know what a service manager is. (My knowledge beyond the boundaries of the JS engine is shockingly limited. Not that I know much within, either...)
Attachment #511895 - Attachment is obsolete: true
Attachment #511931 - Flags: review?(lw)
Attachment #511931 - Flags: feedback?(timeless)
Attachment #511895 - Flags: review?(sphink)
With this patch applied, I did 'Run All' on the test suite. It made it much farther, but crashed somewhere around console/api/time.js with the attached stack. This could be a completely different issue that fixing this uncovered. Let me know whether you think it's related. Otherwise I'll file a separate bug. Rerunning starting from the console/api section, it ran all the way through to the end without crashing. In fact, I ran all the tests, one section at a time, without crashing.
Attachment #511931 - Attachment is patch: true
Attachment #511931 - Attachment mime type: application/octet-stream → text/plain
Attachment #511931 - Attachment is obsolete: true
Attachment #512268 - Flags: review?(lw)
Attachment #512268 - Flags: feedback?(timeless)
Attachment #511931 - Flags: review?(lw)
Attachment #511931 - Flags: feedback?(timeless)
New patch is to address an issue Luke noticed when reviewing patch -- I was uselessly calling xpc::ParticipatesInCycleCollection in nsXPConnect::Push when it is already guaranteed to be on the main thread, because it's looking at per-thread data and have checked NS_IsMainThread(). I added an assert just to be sure and ran it through the full fbtest suite. (I didn't run it on the try server, because I realized our JSD test coverage there is too minimal to be useful right now.) It passed.
Comment on attachment 512268 [details] [diff] [review] Fix live script check, do not trigger script recompile when turning off debugging Looks good; just nits: >+ // Discard JIT code for any scripts that change debugMode. This function is >+ // only called from the main thread, and will only consider contexts in >+ // that same (main) thread and scripts inside compartments associated with >+ // that thread. (Scripts off the main thread are allowed to migrate from >+ // thread to thread, but scripts do not migrate between the main thread and >+ // other threads.) I don't think the JS engine should be making reference to the 'main thread'. Also, as long as the caller ensures that 'comp' is in the same thread as cx, I think this would work off the main thread. It would be nice to assert that but, currently, there is no compartment->thread (we need to merge AutoCompartment/AutoRequest and kill cx->globalObject before we can maintain this field). So perhaps just replace the comment with "This function assumes that 'comp' is in the same thread as 'cx'". Second, I would add a comment to CompartmentHasLiveScripts to the effect of "Unsynchronized context iteration is technically a race; but this is only for debug asserts where such a race would be rare" and then wrap the whole thing in #ifdef DEBUG. >+ if (gDesiredDebugMode) { >+ JS_ASSERT(!gDebugMode); >+ gDesiredDebugMode = gDebugMode; >+ JS_SetRuntimeDebugMode(rt, gDebugMode); >+ } else { >+ JS_ASSERT(gDebugMode); >+ gDebugMode = JS_FALSE; >+ } >+ >+ JS_ASSERT(!gDebugMode && !gDesiredDebugMode); Perhaps this could be simplified to just: if (gDesiredDebugMode) JS_SetRuntimeDebugMode(rt, JS_FALSE); gDesiredDebugMode = gDebugMode = JS_FALSE; > jsdService::RecompileForDebugMode (JSContext *cx, JSCompartment *comp, JSBool mode) { > NS_ASSERTION(NS_IsMainThread(), "wrong thread"); >- return JS_SetDebugModeForCompartment(cx, comp, mode) ? NS_OK : NS_ERROR_FAILURE; >+ if (mode) { >+ return JS_SetDebugModeForCompartment(cx, comp, mode) ? NS_OK : NS_ERROR_FAILURE; >+ } else { >+ /* >+ * Some scripts may be live, so just mark the compartment as not wanting >+ * debugMode for scripts compiled from now on. >+ */ >+ comp->debugMode = JS_FALSE; >+ return NS_OK; >+ } > } The reason the code is this way is because of how its called in xpconnect, so it seems like it'd be easier to understand if this function was just rolled into the single caller in xpconnect.
Attachment #512268 - Flags: review?(lw) → review+
> Looks good; just nits: Nits? If those are what you call nits, I'd hate to see what your substantial comments look like! This patch addresses all of your saber-toothed nits, but it turned out to be a big enough change that I'm not comfortable carrying forward your r+. In particular, as you point out it now makes more sense for nsXPConnect.cpp to do the call (or not) to JS_SetDebugModeForCompartment directly because it's so intimately tied to the details of how XPConnect uses it. So I nerfed the whole RecompileForDebugMode IDL entry entirely. Perhaps I should just remove it? (It's [noscript] and I recently changed its signature, so it's not like it's a sacred cow.)
Attachment #512268 - Attachment is obsolete: true
Attachment #512306 - Flags: review?(lw)
Attachment #512268 - Flags: feedback?(timeless)
> Nits? If those are what you call nits, I'd hate to see what your substantial > comments look like! Haha; they were in my head before writing the comment :)
(In reply to comment #23) > Perhaps I should just remove it? I'd like it if you did; does anyone XPCOM-IDL-rules-knowing know whether Steve can do that (interface stability/release concerns and all...) ??
Comment on attachment 512306 [details] [diff] [review] Fix live script check, do not trigger script recompile when turning off debugging comment 25 aside, r+
Attachment #512306 - Flags: review?(lw) → review+
Comment on attachment 512306 [details] [diff] [review] Fix live script check, do not trigger script recompile when turning off debugging The IDL change (removing recompileForDebugMode) sounds like module owner territory. We can leave it alone for now as in the current patch (stubbed out to always return an error), if you don't want further IDL mutation. Or take it out. Works for me either way.
Attachment #512306 - Flags: review?(timeless)
Please minimize IDL mutation at this late stage. Thank you!
Assignee: lw → sphink
Whiteboard: [firebug-p1][hardblocker][has patch] → [firebug-p1][hardblocker][has patch][fixed-in-tracemonkey]
Comment on attachment 512306 [details] [diff] [review] Fix live script check, do not trigger script recompile when turning off debugging Shaver nixed the IDL change, so there's not much left for timeless to review that he hasn't already.
Attachment #512306 - Flags: review?(timeless)
host-5-225:Darwin_DBG.OBJ gal$ ./js -m /Users/gal/workspace/tracemonkey-repository/js/src/jit-test/tests/basic/testBug552248.js Assertion failure: !CompartmentHasLiveScripts(comp), at ../jsdbgapi.cpp:151 Segmentation fault
Whiteboard: [firebug-p1][hardblocker][has patch][fixed-in-tracemonkey] → [firebug-p1][hardblocker][has patch]
generally speaking if the method didn't exist in anything we shipped then not shipping it is preferable to shipping it with a promise that it will never do anything, as it's vtable bloat. that said, if someone else is driving, there's nothing more i can contribute indeed.
Ugh. Forgot about the shell. The problem is that the shell has a setDebug() function that expects to be able to set debug mode synchronously, which is unsupported for FF4. Until this patch, it was faked well enough -- debug mode would be set for all non-live scripts, which in the context of shell tests meant that the toplevel script would not be in debug mode but everything else would be. Apparently, this is good enough for those tests, since they seem to be mostly just checking whether turning on debugging messes things up. Just thinking out loud: 1. Add back in the code to detect live scripts and skip them. This obviously gets things working again the way they were before this patch, but it has unexpected semantics that I don't know if people looking at test code are aware of. 2. Use the shell's -d flag to run all these scripts. Probably make setDebug(true) a no-op if debug mode is already on, in addition. It looks like there's a magic // |jit-test| STUFF marker that could be used for this purpose? It means touching 29 test files. 3. Add some bogus asynchrony into the shell -- have setDebug be a no-op if it's already on, but otherwise it exits the current execution with a request to be reinvoked with debugging on. Messy. 4. LAST_FRAME_CHECKS is run with no scripts live on the stack. Throw a special uncatchable exception which contains a continuation. Have the shell register an error reporter that detects it, turns on debug mode, and invokes the continuation. Probably mostly good for humor value. ...although I've been thinking about doing a simple form of this to allow the debugger to break on uncatchable exceptions and see exactly where they came from even after you've unwound the stack. So it's not completely crazy? Maybe. Or 1b. Special-case the outermost script by never running it mjitted (this part should already happen with the recent changes to interpret before JITting) and allow it to be live when setting debug mode. This kind of fights with the "mjitalways" annotation though. #1 and #2 are feasible. Not sure about #1b. #2 seems cleanest if we're stuck with asynchronous debugger activation. But we probably aren't; sounds like the type inference branch can recompile live scripts. So I'm torn.
Change the shell's setDebug to take a function that gets called, and wrap some tests that way?
Here's #2. It leaves the setDebug(true) calls in the source, which is a little weird. My intent is that setDebug(false) turns off debugging (well, it disables callbacks at least.) setDebug(true) means to assert if debugging isn't already on. I could remove them instead and make setDebug(true) an unconditional error.
Attachment #512531 - Flags: review?(lw)
(In reply to comment #35) > Change the shell's setDebug to take a function that gets called, and wrap some > tests that way? A cleaner version of #3, I guess? I'd still have to teach the shell to stash that function away and call it after everything is done executing (add in a work queue.) Or did you mean to allow live scripts on the stack when turning on debugging (#1), as long as you don't re-enter them when running the setDebug callback? I believe the tests are already written with the expectation that the toplevel script will not itself be subject to debugging, because they would have broken a while back otherwise. So anything that needs to be run with debugging on is already in an inner frame.
Comment on attachment 512531 [details] [diff] [review] Use -d flag to run jit-tests when needed Nice; I also thought this was the best of the options you listed, mostly because it keeps the complexity out of the shell.
Attachment #512531 - Flags: review?(lw) → review+
cdleary-bot mozilla-central merge info: http://hg.mozilla.org/mozilla-central/rev/dd5c76d35ac4 http://hg.mozilla.org/mozilla-central/rev/e25c8949931d (backout) Note: not marking as fixed because last changeset is a backout.
Whiteboard: [firebug-p1][hardblocker][has patch] → [firebug-p1][hardblocker][has patch][needs landing tracemonkey]
Whiteboard: [firebug-p1][hardblocker][has patch][needs landing tracemonkey] → [firebug-p1][hardblocker][has patch][fixed-in-tracemonkey]
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Depends on: 634648
Depends on: 638274
Crash Signature: [@ js::mjit::EnterMethodJIT]
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: