Closed Bug 499169 Opened 15 years ago Closed 15 years ago

top crash [@ js_MonitorLoopEdge(JSContext*, unsigned int&)]

Categories

(Core :: JavaScript Engine, defect, P1)

1.9.1 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla1.9.1

People

(Reporter: samuel.sidler+old, Assigned: gal)

References

Details

(5 keywords, Whiteboard: fixed-in-tracemonkey)

Crash Data

Attachments

(3 files, 4 obsolete files)

The current topcrash in Firefox 3.5b99 and Firefox 3.5 (all RCs) happens with a signature of js_MonitorLoopEdge(JSContext*, unsigned int&). On trunk, it appears further down in the topcrash list and, currently, doesn't have any crashes on Windows. It seems to happen across platforms on 1.9.1, however, and the top two frames all appear to be random hex numbers.

Query for Firefox 3.5 (RC) crashes:
http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=js_MonitorLoopEdge%28JSContext*%2C%20unsigned%20int%26%29

Typically, the crash happens with the following stack, taken from bp-3d4e216c-f4bf-4444-b663-1e82d2090617:

Frame  Module  Signature  Source
0  @0x4122c24
1  @0x12ecbb
2  js3250.dll  js_MonitorLoopEdge  js/src/jstracer.cpp:4862
3  js3250.dll  js_Interpret  js/src/jsinterp.cpp:3308

However, sometimes it appears with the following stack, taken from bp-b786c839-de39-4468-925b-065b72090618:

Frame  Module  Signature  Source
0  @0x162d0c0c
1  @0xbfffc5f7
2  libmozjs.dylib  js_MonitorLoopEdge  js/src/jstracer.cpp:4862
3  libmozjs.dylib  js_Interpret  js/src/jsinterp.cpp:3308
4  libmozjs.dylib  js_Invoke  js/src/jsinterp.cpp:1394
5  libmozjs.dylib  js_fun_apply  js/src/jsfun.cpp:2074
6  libmozjs.dylib  js_Interpret  js/src/jsinterp.cpp:5147
7  libmozjs.dylib  js_Invoke  js/src/jsinterp.cpp:1394
8  libmozjs.dylib  js_fun_apply  js/src/jsfun.cpp:2074
9  libmozjs.dylib  js_Interpret  js/src/jsinterp.cpp:5147
10  libmozjs.dylib  js_Invoke  js/src/jsinterp.cpp:1394
11  libmozjs.dylib  js_fun_apply  js/src/jsfun.cpp:2074
12  libmozjs.dylib  js_Interpret  js/src/jsinterp.cpp:5147
13  libmozjs.dylib  js_Invoke  js/src/jsinterp.cpp:1394
14  libmozjs.dylib  js_fun_apply  js/src/jsfun.cpp:2074
15  libmozjs.dylib  js_Interpret  js/src/jsinterp.cpp:5147
16  libmozjs.dylib  js_Invoke  js/src/jsinterp.cpp:1394
17  libmozjs.dylib  js_fun_apply  js/src/jsfun.cpp:2074
18  libmozjs.dylib  js_Interpret  js/src/jsinterp.cpp:5147
19  libmozjs.dylib  js_Execute  js/src/jsinterp.cpp:1622
20  libmozjs.dylib  JS_EvaluateUCScriptForPrincipals  js/src/jsapi.cpp:5145
21  XUL  nsJSContext::EvaluateString  dom/src/base/nsJSEnvironment.cpp:1631
22  XUL  nsScriptLoader::EvaluateScript  content/base/src/nsScriptLoader.cpp:686
23  XUL  nsScriptLoader::ProcessRequest  content/base/src/nsScriptLoader.cpp:600
24  XUL  nsScriptLoader::ProcessPendingRequests  content/base/src/nsScriptLoader.cpp:740
25  XUL  nsRunnableMethod<nsScriptLoader>::Run  nsThreadUtils.h:264
26  XUL  nsThread::ProcessNextEvent  xpcom/threads/nsThread.cpp:510
27  XUL  NS_ProcessNextEvent_P  nsThreadUtils.cpp:227
28  XUL  nsThread::Shutdown  xpcom/threads/nsThread.cpp:465
29  XUL  NS_InvokeByIndex_P  xpcom/reflect/xptcall/src/md/unix/xptcinvoke_unixish_x86.cpp:179
30  XUL  nsProxyObjectCallInfo::Run  xpcom/proxy/src/nsProxyEvent.cpp:181
31  XUL  nsThread::ProcessNextEvent  xpcom/threads/nsThread.cpp:510
32  XUL  NS_ProcessPendingEvents_P  nsThreadUtils.cpp:180
33  XUL  nsBaseAppShell::NativeEventCallback  widget/src/xpwidgets/nsBaseAppShell.cpp:121
34  XUL  nsAppShell::ProcessGeckoEvents  widget/src/cocoa/nsAppShell.mm:405
35  CoreFoundation  CFRunLoopRunSpecific
36  CoreFoundation  CFRunLoopRunInMode
37  HIToolbox  RunCurrentEventLoopInMode
38  HIToolbox  ReceiveNextEventCommon
39  HIToolbox  BlockUntilNextEventMatchingListInMode
40  AppKit  _DPSNextEvent
41  AppKit  -[NSApplication nextEventMatchingMask:untilDate:inMode:dequeue:]
42  AppKit  -[NSApplication run]
43  XUL  nsAppShell::Run  widget/src/cocoa/nsAppShell.mm:720
44  XUL  nsAppStartup::Run  toolkit/components/startup/src/nsAppStartup.cpp:193
45  XUL  XRE_main  toolkit/xre/nsAppRunner.cpp:3298
46  firefox-bin  main  browser/app/nsBrowserApp.cpp:156
47  firefox-bin  firefox-bin@0x1541
48  firefox-bin  firefox-bin@0x1468
49  @0x1

Filing as security-sensitive for now just to be safe.
Flags: wanted1.9.1.x+
Flags: blocking1.9.1?
Note that this is likely the same as bug 487317, which was marked WFM. I'm guessing it's "random".
Sorry, forgot to say, this is the #1 topcrash with ~3x the amount of crashes as the #2 topcrash.
I probably know the answer to this, but what do we think the frequency is?
Well, it's 3x more frequent than the next crash down. I should note that the reason there are so few 3.5 RC crashes is because we throttle them and only process 15% of the reports. (We throttle all major versions and the RC appears as a major version to the server.) In the last week, there's been ~7700 crashes using b99.
out of 800,000 users that's ... a lot :( Sayrer: stack helpful? any ideas?
It looks like Firebug... could that be right?
(In reply to comment #6)
> It looks like Firebug... could that be right?

Possibly? robcee?
hmm, a bunch of these mention a discussion forum. We should get URLs for this.
Lars: Can you generate a list of URLs for this crash signature? Anything newer than June 9 with a version of Firefox 3.5b99, Firefox 3.5, or Firefox 3.5pre (in that order).
Waldo, this looks like a NULL pointer access on trace to me. Didn't you work on a related bug a while back?
I'd like to see URLs and a set of STRs, ideally. Can't be sure it's Firebug.

One user wrote this in his crash report: "random crashes even after removing the Firefox folder in Library/Application Support, disable all addons, disable all plugins"

This one (translated from French) does point at Firebug though: "I'm still surprised that an RC crashes this often. When I use Firebug with sites that are fairly dense in CSS and JS, I get a crash every 5 minutes, e.g. http://www.blue-days.org"

I'm going to do some digging and see if we can reproduce it.
filed bug 499299 to get the url list from socorro. I'll keep mining those crash reports.
like a champ, I've been commenting away in bug 492041 thinking it was this one. Way to go, Rob.

From that bug's c#26: (ignore the trailing commas from those URLs, I blame Numbers.app) there are lots of about: pages in there. Some about:blank, some about:sessionrestore and some about:rights. There is also an about:ubiquity link. A number of chrome: pages (adblockplus, autopager, downbar, fastdial and google-toolbar to name a few). I'm going to go through some of these remaining URLs and try to find crashers with Firebug. Time to get my 4chan on.

... from c#27: was able to crash: http://202.181.195.27/forum-5-1.html

Load that URL, open Firebug. Click the "Yes I am 18 Years or Older" button. (optional: take a shot of something strong). Page loads, albeit strangely. Clicking from the script panel to the net panel, I think, caused the crash. Trying again to verify.

http://crash-stats.mozilla.com/report/index/d45392dc-cd3a-41be-88f7-c1f852090619?p=1

ted replied in c#29: This crashed me on a Win32 trunk build without Firebug installed, FWIW: http://crash-stats.mozilla.com/report/index/1afc6f5c-0901-4cd2-8452-e3f9a2090619?p=1
another tidbit from lars' report. Earliest reports with this signature are: 2009-06-09 00:17:10.288931 running Firefox 3.5pre & 2009-06-09 00:59:01.635172 running Firefox 3.5b99
(In reply to comment #10) I probably have worked on such bugs, but then again, I've worked on a lot of different null derefs. Nothing here jumps out at me from the bug, the crash signatures, or anything else to say that it's something I would have more or less knowledge of than anyone else.
problem begins between 3.1b3 and 3.5b4. http://hg.mozilla.org/releases/mozilla-1.9.1/rev/3d9704097cd8 http://hg.mozilla.org/releases/mozilla-1.9.1/rev/afac8b5958bc that's a fairly big range and I don't have time to narrow this down any more at the moment. Any west-coasters feel like hunting for this?
QA: Can we get a regression range for this issue based on the URLs from comment 13 and comment 14?
final range: http://hg.mozilla.org/releases/mozilla-1.9.1/pushloghtml?fromchange=7e2facde0c95&tochange=9b52390838f0 Looks like the candidate is likely in rsayre's merge at: http://hg.mozilla.org/releases/mozilla-1.9.1/rev/8940504c799e Andreas' changeset http://hg.mozilla.org/releases/mozilla-1.9.1/rev/54bd8a0c1c4b looks promising. but I'm only guessing based on the checkin comment "Recording continues across loop edge" and some scary-looking pointer math.
54bd8a0c1c4b enables tracing for some code that was accidentally blacklisted too early. It might be a red herring (before that fix we don't trace some code that we really should, and once we do trace it, we trace it incorrectly). So my best guess is it's either not that changeset, or it's not caused by it.
Note that upvar2 is in the regression window. However, we fall on our face on trace, so that's a bit unusual behavior for an upvar bug. 79606200f871 Brendan Eich, upvar2, aka the big one take 2 (452498, r=mrbkap).
rc, just to humor me could you disable the jit and see if it still crashes? I know we are crashing on trace, but if we get something wrong in the code generation for upvar maybe the interpreter dies too (which would be a lot easier to analyze from the stack trace).
andreas, sure thing, checking now. (and I believe your patch may well be a red herring. It just stood out as "interesting" which is why I mentioned it).
um, with jit.content set to false, and jit.chrome set to false, the page loads completely. If I turn on jit.content, the page crashes. Still not necessarily that particular patch, but it's one of the ones in that merge.
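Note for anyone repeating this experiment: "jit.content" and "jit.chrome" above are shorthand. The full about:config names assumed here are javascript.options.jit.content and javascript.options.jit.chrome; the comment itself does not spell them out. A minimal user.js sketch that turns both off for a test profile:

```javascript
// user.js in a throwaway test profile (assumed full pref names, see note above)
user_pref("javascript.options.jit.content", false); // tracing JIT for web content
user_pref("javascript.options.jit.chrome", false);  // tracing JIT for browser chrome
```

Flipping only the first line back to true corresponds to the "turn on jit.content" step described above.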
Now that RC is out there (and we took some fixes between b99 and RC, iirc) do we know if this crash is still happening?
Gary: can we get a TM regression range worked up based on the regression range posted in comment 21?
(In reply to comment #28)
> Now that RC is out there (and we took some fixes between b99 and RC, iirc) do
> we know if this crash is still happening?

Yes, it is. And it's still #1 and it's about double the amount of crashes for the #2 topcrash.
Gary's telling me he needs a more definitive/reduced testcase to be able to autobisect here. Any chance?
auto-bisect what? The merge lists about 10 changesets. By the time you've written a reduced testcase, you could've bisected them with the given testcase. and yes, this still happens on latest nightlies/RCs.
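For reference, bisecting a short list of changesets by hand is just a binary search. The sketch below assumes the list is in push order, that the last entry is known to crash, and that once the crash appears it stays; the changeset names and the crashes() predicate are hypothetical stand-ins for "build that revision and load the testcase".

```javascript
// Return the first changeset for which crashes(cs) is true.
// Assumes crashes() is false for a prefix of the list and true afterwards.
function firstBadChangeset(changesets, crashes) {
  let lo = 0;
  let hi = changesets.length - 1; // the last entry is assumed to be bad
  while (lo < hi) {
    const mid = (lo + hi) >> 1;
    if (crashes(changesets[mid])) {
      hi = mid;      // first bad one is at mid or earlier
    } else {
      lo = mid + 1;  // first bad one is after mid
    }
  }
  return changesets[lo];
}

// Hypothetical example: ten changesets, crash introduced by "cs6".
const merge = ["cs0", "cs1", "cs2", "cs3", "cs4", "cs5", "cs6", "cs7", "cs8", "cs9"];
const builtAndCrashed = (cs) => merge.indexOf(cs) >= 6;
console.log(firstBadChangeset(merge, builtAndCrashed));
```

With about ten changesets this takes at most four builds, which is the point being made above about bisecting with the testcase already in hand.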
Yes, I agree with Rob in comment 32; this is our #1 topcrash and should be the #1 priority to fix in case we decide to block on it. Can we get some manual testing done to see which of the patches on the list regressed it?
I have some disturbing new data: I was testing this on my Mac Pro at home (dual quadcore XEON 5150, 1st generation) and that page (http://202.181.195.27/forum-5-1.html, and click the over 18 link) crashed every time I loaded it. On my Mac Book Pro (last year's model), I can't get this page to crash at all. I don't have access to my mac at home to verify that this is still a problem. Not sure whether they've changed the page or whether hardware is a factor.
correction, dual dual core mac pro.
Do we seriously not have a blamed patch yet? It's been all day.
(In reply to comment #36)
> Do we seriously not have a blamed patch yet? It's been all day.

Yep, seriously. It stopped crashing for us as we were searching.
Note that for http://202.181.195.27/forum-5-1.html on mac os x and flash 10.0.22.87 and svn trunk valgrind I get a number of invalid reads and writes with sizes 1, 2, 4 in Flash_EnforceLocalSecurity
I think they may have changed some of the ad content on that page. Bob, have you tried running this with qm-xserve03? I think it's roughly comparable to the machine I've got at home, hardware-wise.
At this point, with no understanding of what's causing it, I'm having a hard time blocking release on this bug. Rob, are you able to reproduce reliably?
Since we don't have great steps to reproduce or understanding of this bug, I think we should make it public (not security-sensitive). Keeping it private isn't really protecting users, and might exclude people who can help figure out how to reproduce it.
Beltzner: I said in c#34 that I could produce it reliably on my home machine, not at all on my macbook. I think it may be hardware dependent. There's no question in my mind that there is a bug here that is causing a large number of crashes, but no idea where it's coming from or which exact patch is causing it. still waiting to hear from Bob if he can reproduce on his xserve. failing that, it's going to be two+ days before I have access to my desktop machine. Jesse: some of my crashes have been bus errors and memory access violations. There could be an exploit here, but it'd be nice to get some extra hands on this so we could at least develop a hardware profile for crashing machines. Could we try to get some of the QA machinery on this?
(In reply to comment #39)
> I think they may have changed some of the ad content on that page.
>
> Bob, have you tried running this with qm-xserve03? I think it's roughly
> comparable to the machine I've got at home, hardware-wise.

no, but I will right now.
Rob, what is your desktop configuration?
Desktop's a Mac Pro (early generation, 1,1, Late 2007, I think?) dual processor, dual core 2.66 GHz XEON 5150 running OS X 10.5.7. I just ran a test on an Xserve 1,1, dual-dual core 2.66GHz running 10.4.11 but didn't get a crash. I did some informal requests for people to load this site in #qa and #firefox and the few responders didn't get crashes either. It's quite possible the cause of the crash has been removed from the page(s). Deeply troubling.
I'm going to leave this as a nomination for now; we're going to start building Firefox 3.5 RC3 with what we have, and hopefully we'll be able to identify what's causing this crash and find a trivial fix, at which point we can discuss respinning RC3 or waiting for 3.5.1.
Can we open up the bug then? Would be good if people can search for this instead of filing random bugs we have to triage and dup against this if this comes back.
I'm ok with opening this up. /be
Bug 499299 now has a list of URLs associated with this crash. For privacy reasons, only Mozilla employees can access bug 499299. *This* bug doesn't seem to have any security-sensitive information in it, so I'm making it public.
Group: core-security
I've asked QA to dig into this and indicated this is high priority.
Oops, comment 13 already contains a scrubbed, public list of URLs.
This one crashes for me consistently. http://www.skyfunny.com/thread-3548-1-5.html
(gdb) x/20i $pc-24
0x1b667bf9: jne 0x1b665d81
0x1b667bff: mov 0x20(%edx),%edx
0x1b667c02: cmp $0x4f0e,%edx
0x1b667c08: jne 0x1b665d90
0x1b667c0e: mov 0x8(%ecx),%ecx
0x1b667c11: mov (%ecx),%ecx
0x1b667c13: mov (%ecx),%edx
0x1b667c15: mov (%edx),%edx
0x1b667c17: test %edx,%edx
0x1b667c19: jne 0x1b665d9f
0x1b667c1f: mov 0x20(%ecx),%ecx
0x1b667c22: cmp $0x4f0e,%ecx
0x1b667c28: jne 0x1b665dae
0x1b667c2e: mov (%eax),%ecx
0x1b667c30: mov (%ecx),%ecx
0x1b667c32: mov 0xc(%ecx),%ecx
0x1b667c35: cmp $0x13025a,%ecx
0x1b667c3b: jne 0x1b665dbd
0x1b667c41: mov (%eax),%ecx
0x1b667c43: mov (%ecx),%edx
(gdb) p $pc
$1 = (void (*)()) 0x1b667c11
(gdb)
Attached file source file for webpage crash (obsolete) (deleted) —
i can confirm the crash on http://www.skyfunny.com/thread-15010-1-1.html. Attached is the source file, saved from Fx3.0.11
Andreas has this in a debugger now. Stay tuned.
(In reply to comment #54)
> Created an attachment (id=384752) [details]
> source file for webpage crash
>
> i can confirm the crash on http://www.skyfunny.com/thread-15010-1-1.html.
> Attached is the source file, saved from Fx3.0.11

More information: This was run against Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1pre) Gecko/20090623 Shiretoko/3.5pre. Clean profile.
I checked this on Mac Shiretoko, 20090417 (no crash) - 20090418 (crash).
(In reply to comment #57)
> I checked this on Mac Shiretoko, 20090417 (no crash) - 20090418 (crash).

can you add the regression changeset?
(In reply to comment #52)
> This one crashes for me consistently.
> http://www.skyfunny.com/thread-3548-1-5.html

The two required files from that page to crash are mt2.js and jp.js.

beginning of the two files:
mt2.js: var MooTools={version:"1.2.2",
jp.js: MooTools.More={'version':'1.2.2.1'}
cx = ld JSVAL_TO_PSEUDO_BOOLEAN(JSVAL_HOLE)[8]
ld3217 = ld cx[NULL]
eos = ld ld3217[NULL]
ld3218 = ld eos[NULL]
eor = eq ld3218, NULL
xf2957: xf eor -> pc=0x1ec4e410 imacpc=0x0 sp+112 rp+8
  mov eax,8(eax)    eax(JSVAL_TO_PSEUDO_BOOLEAN(JSVAL_HOLE)) ebx(cx) esi(state) edi(sp)
  mov ecx,0(eax)    eax(cx) ebx(cx) esi(state) edi(sp)
  mov edx,0(ecx)    eax(cx) ecx(ld3217) ebx(cx) esi(state) edi(sp)
  mov edx,0(edx)    eax(cx) ecx(ld3217) edx(eos) ebx(cx) esi(state) edi(sp)
  test edx,edx    eax(cx) ecx(ld3217) edx(ld3218) ebx(cx) esi(state) edi(sp)
  jne 0x1b688d63    eax(cx) ecx(ld3217) ebx(cx) esi(state) edi(sp)
---------------------------------------
exit block (LIR_xt|LIR_xf)
0x1b688d63:
merging registers (intersect) with existing edge
  mov ecx,-12(ebp)    <= restore state
  mov eax,449640044
  mov esp,ebp
0x1b688d6d:
  jmp 0x1b676ff8
--------------------------------------- end exit block 0x1accf688
shape = ld ld3217[skip257]
guard(shape) = eq shape, #0xfbe5
$stack1: xf guard(shape) -> pc=0x1ec4e410 imacpc=0x0 sp+112 rp+8
  mov ecx,32(ecx)    eax(cx) ecx(ld3217) ebx(cx) esi(state) edi(sp)
  cmp ecx,64485    eax(cx) ecx(shape) ebx(cx) esi(state) edi(sp)
  jne 0x1b688d72    eax(cx) ebx(cx) esi(state) edi(sp)
---------------------------------------
exit block (LIR_xt|LIR_xf)
0x1b688d72:
merging registers (intersect) with existing edge
  mov ecx,-12(ebp)    <= restore state
  mov eax,449640120
  mov esp,ebp
0x1b688d7c:
  jmp 0x1b676ff8
--------------------------------------- end exit block 0x1accf6d4
ld825 = ld cx[8]
ld3219 = ld ld825[NULL]
ops = ld ld3219[NULL]
ld3220 = ld ops[NULL]
guard(native-map) = eq ld3220, NULL
xf2958: xf guard(native-map) -> pc=0x1ec4e410 imacpc=0x0 sp+112 rp+8
  mov ecx,8(eax)    eax(cx) ebx(cx) esi(state) edi(sp)
  mov edx,0(ecx)    ecx(ld825) ebx(cx) esi(state) edi(sp)
  mov eax,0(edx)    ecx(ld825) edx(ld3219) ebx(cx) esi(state) edi(sp)
  mov eax,0(eax)    eax(ops) ecx(ld825) edx(ld3219) ebx(cx) esi(state) edi(sp)
  test eax,eax    eax(ld3220) ecx(ld825) edx(ld3219) ebx(cx) esi(state) edi(sp)
  mov eax,-36(ebp)    ecx(ld825) edx(ld3219) ebx(cx) esi(state) edi(sp) <= restore GetProperty_tn145
  jne 0x1b688d81    eax(GetProperty_tn145) ecx(ld825) edx(ld3219) ebx(cx) esi(state) edi(sp)
---------------------------------------
exit block (LIR_xt|LIR_xf)
0x1b688d81:
merging registers (intersect) with existing edge
  mov ecx,-12(ebp)    <= restore state
  mov eax,449640348
  mov esp,ebp
0x1b688d8b:
  jmp 0x1b676ff8
--------------------------------------- end exit block 0x1accf7b8
shape = ld ld3219[skip257]
guard(shape) = eq shape, #0xfbe5
sp: xf guard(shape) -> pc=0x1ec4e410 imacpc=0x0 sp+112 rp+8
  mov edx,32(edx)    eax(GetProperty_tn145) ecx(ld825) edx(ld3219) ebx(cx) esi(state) edi(sp)
  cmp edx,64485    eax(GetProperty_tn145) ecx(ld825) edx(shape) ebx(cx) esi(state) edi(sp)
  jne 0x1b688d90    eax(GetProperty_tn145) ecx(ld825) ebx(cx) esi(state) edi(sp)

Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x1b676c11 in ?? ()
0x1b676bf9: jne 0x1b688d81
0x1b676bff: mov 0x20(%edx),%edx
0x1b676c02: cmp $0xfbe5,%edx
0x1b676c08: jne 0x1b688d90
0x1b676c0e: mov 0x8(%ecx),%ecx
0x1b676c11: mov (%ecx),%ecx
0x1b676c13: mov (%ecx),%edx
0x1b676c15: mov (%edx),%edx
0x1b676c17: test %edx,%edx
Based on conversations with Andreas and Bkap, we have to block on this.
Flags: blocking1.9.1? → blocking1.9.1+
tony, juan: I already narrowed this to: http://hg.mozilla.org/releases/mozilla-1.9.1/pushloghtml?fromchange=7e2facde0c95&tochange=9b52390838f0 Feel free to bisect it again, but I'm pretty sure the problem's in that merge.
Flags: blocking1.9.1+ → blocking1.9.1?
Attached file partially reduced (2519 lines, requires browser) (obsolete) (deleted) —
This crashes my mozilla-central opt build on Mac, with a null deref from JIT code. Will try to reduce more.
Attachment #384752 - Attachment is obsolete: true
To reduce it, I'm using:

./lithium.py --testcase=t.js ./ok-shell-crashes-browser.py 12 ~/central/opt-obj/dist/Firefox.app/Contents/MacOS/firefox-bin s1.html

I also tried to hack out all the browser-dependent bits (e.g. window, document, navigator), but I got stuck on the last |document| :(
Flags: blocking1.9.1? → blocking1.9.1+
robcee doesn't have time to bisect among the changesets that went into the merge identified in comment 63. Anyone else want to pick up where he left off, and try to identify the changeset that introduced this crash?
(In reply to comment #66)
> robcee doesn't have time to bisect among the changesets that went into the
> merge identified in comment 63. Anyone else want to pick up where he left off,
> and try to identify the changeset that introduced this crash?

autoBisect can take over once we get a shell testcase.... :)
for what it's worth i crash every time if i refresh any page with firebug 1.3.3's html inspect console open. if i disable jit.content i no longer crash.
Assignee: general → gal
Attached file partially reduced (577 lines, shell) (obsolete) (deleted) —
Attachment #384784 - Attachment is obsolete: true
I'm scanning the urls from comment 13. Crashes so far on 1.9.1/mac os x.

http://www.latio.lv/lv/

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x00054c51 in ?? ()
(gdb) bt
#0 0x00054c51 in ?? ()
#1 0xa03a7690 in __sF ()

http://www.latio.lv/lv/piedavajuma/?view=67988

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x00054c51 in ?? ()
(gdb) bt
#0 0x00054c51 in ?? ()
#1 0xa03a7690 in __sF ()
Previous frame inner to this frame (gdb could not unwind past this frame)

http://www.latio.lv/lv/870/

Program received signal EXC_BAD_ACCESS, Could not access memory.
Reason: KERN_PROTECTION_FAILURE at address: 0x00000000
0x1750ec51 in ?? ()
(gdb) bt
#0 0x1750ec51 in ?? ()
#1 0xa03a7690 in __sF ()
Previous frame inner to this frame (gdb could not unwind past this frame)
Attached file partially reduced (228 lines, shell) (obsolete) (deleted) —
Attachment #384793 - Attachment is obsolete: true
Not exploitable, but fairly bad bug. Fix in a sec.
Priority: -- → P1
Target Milestone: --- → mozilla1.9.1
Attached patch patch (deleted) — Splinter Review
Attachment #384795 - Flags: review?(mrbkap)
Removed a bogus line from the patch in test_property_cache.
Attachment #384795 - Flags: review?(mrbkap) → review+
Reliably reproduced by Damon, analyzed by mrbkap, fix confirmed by brendan. Confirmed to fix the particular test case I was looking at. Independent confirmation very welcome, also a reduced test case (I will give it a try, too). http://hg.mozilla.org/tracemonkey/rev/72f8b38ed38d
Attached file partially reduced (176 lines, shell) (deleted) —
Gal, can you create a testcase from scratch, using your understanding of the bug? This code is hard to reduce all the way.
Attachment #384794 - Attachment is obsolete: true
I want to see a minimal testcase so I know what I'm failing to fuzz ;)
autoBisect shows this is probably related to bug 478525:

The first bad revision is:
changeset: 26145:f449fe8bd097
parent: 26142:33c5c42a29c7
user: Andreas Gal
date: Tue Mar 17 15:39:42 2009 -0700
summary: Try harder to trace array access with non-int / non-string index (478525, r=brendan).

Strange - this isn't in the regression windows of the previous comments.
Blocks: 478525
Keywords: testcase
That's expected: mrbkap narrowed the previous range to http://hg.mozilla.org/releases/mozilla-1.9.1/rev/ab0047adeb64 but realized that change would only affect it in the browser, not in the shell.
function a() {}
function b() {}
a.prototype = null;
var o1 = new a();
var o2 = new b();
function test(o) {
  for (var i = 0; i < 5; i++)
    o.foobar;
}
test(o1);
test(Object.getPrototypeOf(Object.getPrototypeOf(o2)));

Et voilà!
Even more minimal, without ES5 shenanigans:

function a() { }
a.prototype = null;
var o = new a();
function test(o) {
  for (var i = 0; i < 5; i++)
    o.foobar;
}
test(o);
test(Object.prototype);

So |o| and |Object.prototype| both have the same shape, but the former hops once more before nulling out while the second nulls out immediately.
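To make the "hops once more" point concrete, here is a small sketch that counts prototype links before the chain reaches null. protoHops is a made-up helper name, and Object.getPrototypeOf is used purely for illustration; it is not how the tracer walks the chain.

```javascript
// Count how many prototype objects sit between obj and the terminating null.
function protoHops(obj) {
  let hops = 0;
  for (let p = Object.getPrototypeOf(obj); p !== null; p = Object.getPrototypeOf(p)) {
    hops++;
  }
  return hops;
}

function a() {}
a.prototype = null;     // non-object prototype, so |new a()| falls back to Object.prototype
const o = new a();

console.log(protoHops(o));                // 1: o -> Object.prototype -> null
console.log(protoHops(Object.prototype)); // 0: Object.prototype -> null
```

A trace recorded while looking up the missing property on |o| expects one prototype object to exist; replaying it on Object.prototype finds null one step earlier than recorded.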
(In reply to comment #79)
> autoBisect shows this is probably related to bug 478525 :
>
> The first bad revision is:
> changeset: 26145:f449fe8bd097
> parent: 26142:33c5c42a29c7
> user: Andreas Gal
> date: Tue Mar 17 15:39:42 2009 -0700
> summary: Try harder to trace array access with non-int / non-string index
> (478525, r=brendan).
>
> Strange - this isn't in the regression windows of the previous comments.

It seems the bug went in with the patch for bug 478512.

/be
Gary, I think bug 478525 is innocent, although it may be enabling tracing of something in the testcase you were running autoBisect on. Could you try Waldo's smallest testcase and see if it doesn't confirm the patch for bug 478512 being the regressing change? /be
Blocks: 478512
Let's talk about the fix for a second? How localized is the codepath? Are we talking about something which is touched every time we trace, which would require a full beta cycle, or something akin to adding a null check to ensure we don't go somewhere we shouldn't be?
Beltzner: The fix adds a JITted null check, indeed. The factoring out of guardHasPrototype is simple, "constant" in complexity. It builds on common methods used all over, using them in conventional ways. /be
I would sleep better if we can give this a week of rc coverage before we ship final, but we don't need a beta for this. It's a localized additional null pointer check, targeting a side exit that was already present previously and we exit trace instead of exploding with a bus error. At the machine level it's an additional "test reg, reg ; jz exit". We want to run every test we can think of against this, but I think it's pretty low risk overall.
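As a rough model of what the added guard buys (this is illustrative JavaScript, not the jstracer.cpp code; replayMissingProperty and SIDE_EXIT are invented names), consider replaying a "property not found" lookup that was recorded on an object with a known number of prototype hops:

```javascript
// Illustrative model only; the real guard is emitted as machine code on trace.
const SIDE_EXIT = Symbol("side exit");

// Replay a lookup recorded on an object whose chain had `recordedHops`
// prototype objects before reaching null.
function replayMissingProperty(obj, recordedHops, guarded) {
  let p = obj;
  for (let i = 0; i < recordedHops; i++) {
    p = Object.getPrototypeOf(p);
    // The added check: leave the trace if the chain ends earlier than recorded.
    if (guarded && p === null) return SIDE_EXIT;
  }
  // Without the guard, p may be null here and the next call throws a
  // TypeError, which stands in for the null dereference in JIT code.
  return Object.getPrototypeOf(p) === null ? undefined : SIDE_EXIT;
}

function a() {}
a.prototype = null;
const o = new a();

console.log(replayMissingProperty(o, 1, true) === undefined);                // true
console.log(replayMissingProperty(Object.prototype, 1, true) === SIDE_EXIT); // true

let threw = false;
try {
  replayMissingProperty(Object.prototype, 1, false);
} catch (e) {
  threw = e instanceof TypeError;
}
console.log(threw); // true: the unguarded replay "crashes"
```

The guarded path costs one extra comparison per hop and reuses an exit that already exists, which matches the low-risk assessment above.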
OK, so what's the testplan here? Should we land on 1.9.1 and trunk immediately and start testing on nightly builds? Should we also redo RC3? My feeling is that if this is just a null check, we don't have as much to risk as otherwise thought, so getting it on more branches and in the new RC is better than waiting.
bc: Jesse suggested taking the list of URLs from the crash reports (first attachment here) and running them with a build with this patch through load-crash-urls to see if it manages to solve most of them. Sound good?
I think we should optimize this by building it into RC3. If we do and it bites back somehow we back out and respin. But it is a straightforward fix. IMHO the odds are higher that a week's worth of RC testing will emphasize other known topcrashes, or perhaps put a new one on our radar due to new content and/or user cohort in some locale (for example). We need to fix what is topmost and if the next one down is much less frequent, release 3.5 and put the rest of the fixes into the dot release. /be
I'm inclined to agree with the honourable representative from Sunnyvale in comment 91; I'll tell the build team to scupper rc3build1. Waldo's pushing this to mozilla-central and mozilla-1.9.1 as we speak.
Status: NEW → RESOLVED
Closed: 15 years ago
Keywords: fixed1.9.1
Resolution: --- → FIXED
Whiteboard: fixed-in-tracemonkey
(In reply to comment #90)
> bc: Jesse suggested taking the list of URLs from the crash reports (first
> attachment here) and running them with a build with this patch through
> load-crash-urls to see if it manages to solve most of them. Sound good?

Well, obsolete already. :-) I guess we go with fresh builds.

current run at 2239/3054 urls. Note the scan is missing sites that require a click to "I'm over 18" and such. Reduced to a list of homepages, it comes to 1177. That might be good enough for a quick run.

I'll continue to list the reproducible crash urls here so others can check as well.

http://www.skyfunny.com/thread-3548-1-5.html (known not reproducible locally)
http://www.skyfunny.com/thread-15010-1-1.html (known not reproducible locally)
http://www.latio.lv/ru/novosti/1917/ (same stack as before)
http://www.latio.lv/lv/piedavajuma/?view=67122 (same stack as before)
http://www.colchones-online.com/literas.html?from=adwords&gclid=CPCdi7uPjpsCFZkA4wodcXGboQ (same stack as latio)
http://www.colchones-online.com/ (same stack as latio)
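One plausible way to collapse a crash URL list to homepages is to keep one entry per origin. This is only a sketch of the idea; how the list was actually reduced is not stated in this bug, and homepages() is a made-up helper. It relies on the standard URL class.

```javascript
// Collapse a list of page URLs to one homepage URL per origin, keeping first-seen order.
function homepages(urls) {
  const seen = new Set();
  for (const u of urls) {
    seen.add(new URL(u).origin + "/");
  }
  return [...seen];
}

// The reproducible crash URLs listed in this comment.
const crashUrls = [
  "http://www.skyfunny.com/thread-3548-1-5.html",
  "http://www.skyfunny.com/thread-15010-1-1.html",
  "http://www.latio.lv/ru/novosti/1917/",
  "http://www.latio.lv/lv/piedavajuma/?view=67122",
  "http://www.colchones-online.com/literas.html?from=adwords&gclid=CPCdi7uPjpsCFZkA4wodcXGboQ",
  "http://www.colchones-online.com/",
];

console.log(homepages(crashUrls));
```

For the six URLs above this leaves three homepages, one per site.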
Checking the js shell test from comment 83 and the attached html testcase with a debug build on OS X made from http://hg.mozilla.org/releases/mozilla-1.9.1/rev/2747b209db85 looks good. No crash anymore.
(In reply to comment #85)
> Gary, I think bug 478525 is innocent, although it may be enabling tracing of
> something in the testcase you were running autoBisect on. Could you try Waldo's
> smallest testcase and see if it doesn't confirm the patch for bug 478512 being
> the regressing change?
>
> /be

Brendan, you're right. :) Using Waldo's smallest testcase, autoBisect confirms bug 478512 might be related instead.

The first bad revision is:
changeset: 25416:707f96a1de28
parent: 25413:c63cf255ec3b
user: Andreas Gal
date: Thu Feb 26 19:01:02 2009 -0800
summary: Trace reading undefined properties (478512, r=jwalden).
No longer blocks: 478525
478512 is definitively the regressor. Broken originally by me, reviewed by Waldo, identified using his testcase, and fixed by my patch. It all stays in the family :)
only 9 crashers found in the original list. none crash with a fresh build from this morning. am scanning the homepages of the listed urls now with a fresh build.
(In reply to comment #92)
> I'm inclined to agree with the honourable representative from Sunnyvale

Santa Clara, but don't call me a congresscritter if you please (or if I am, where are those bribes?).

The followup fix with the FIXME comment is over-optimistic (proto-chaining means there's still a many to one invalidation hazard when mutating a proto after we cache or guard on a not-found property, even with inherited shapes), but I'll deal with it in bug 497789.

/be
Running the url again from comment 54 no longer crashes on today's branch and trunk nightly. Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.1pre) Gecko/20090624 Shiretoko/3.5pre Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.2a1pre) Gecko/20090624 Minefield/3.6a1pre
Alongside comment #101, I gave it a whirl on Linux using all test case URLs in comment #95 No crashes on 06/24 trunk and RC3 build 2 Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.2a1pre) Gecko/20090624 Minefield/3.6a1pre Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9.1) Gecko/20090624 Firefox/3.5
no crashes on the homepage run with mac os x xserve, but i did get two hangs: http://www.kafic.net/ https://ibank.standardchartered.com.my/ but they aren't reproducible locally.
Attachment #384133 - Attachment is private: true
(In reply to comment #103)
> no crashes on the homepage run with mac os x xserve, but i did get two hangs:
>
> http://www.kafic.net/
> https://ibank.standardchartered.com.my/
>
> but they aren't reproducible locally.

Also no crashes on the crash-url-run with Windows and a build that contains this fix.
fwiw, the topcrash part of this is fixed, but there's still another (much smaller!) lingering crash with a similar stack. I'd give you a set URL with them, but it's not really possible to search by build ID, so... click on the "Table" view to see how many. http://crash-stats.mozilla.com/report/list?product=Firefox&version=Firefox%3A3.5&query_search=signature&query_type=exact&query=&date=&range_value=1&range_unit=weeks&do_query=1&signature=js_MonitorLoopEdge%28JSContext*%2C%20unsigned%20int%26%29 (To be clear, no reason to open this bug, just a confirmation that the topcrash looks fixed.)
Marking bug verified given all the verifications in the comments.
Status: RESOLVED → VERIFIED
Filed bug 500936 on the remaining (non-#1) topcrash.
Crash Signature: [@ js_MonitorLoopEdge(JSContext*, unsigned int&)]
Automatically extracted testcase for this bug was committed: https://hg.mozilla.org/mozilla-central/rev/2e891e0db397
Flags: in-testsuite+