Closed
Bug 601725
Opened 14 years ago
Closed 12 years ago
Firefox 4.0b7pre Crash Report [@ StartRequest ]
Categories
(Core :: JavaScript Engine, defect, P2)
Tracking
()
RESOLVED
WORKSFORME
People
(Reporter: chofmann, Assigned: jorendorff)
References
Details
(Keywords: crash)
Crash Data
these crashes started slowly increasing around sept 26 in builds from the 25th, but then took a larger bump yesterday to the #5 top crash on trunk.
20100924
20100925
20100926 1 4.0b7pre2010092504 1 ,
20100927 1 4.0b7pre2010092704 1 ,
20100928 2 1 4.0b7pre2010092804,
1 4.0b7pre2010092704,
20100929 2 4.0b7pre2010092904 2 ,
20100930 5 3 4.0b7pre2010093004,
2 4.0b7pre2010092904,
20101001 8 6 4.0b7pre2010093004,
2 4.0b7pre2010100108,
20101002 12 5 4.0b7pre2010100204,
4 4.0b7pre2010093004, 3 4.0b7pre2010100108,
20101003 28 15 4.0b7pre2010093004,
7 4.0b7pre2010100204, 4 4.0b7pre2010100304,
1 4.0b7pre2010100315, 1 4.0b7pre2010100108,
stack looks like
http://crash-stats.mozilla.com/report/index/32f6b4e7-b5f9-4b1b-8d9a-ed8642101002
Frame Module Signature [Expand] Source
0 mozjs.dll StartRequest js/src/jsapi.cpp:815
1 xul.dll XPCCallContext::Init js/src/xpconnect/src/xpccallcontext.cpp:165
more reports at
http://crash-stats.mozilla.com/report/list?signature=StartRequest
Reporter | ||
Comment 1•14 years ago
|
||
probably should block beta 7 if the uptick continues
blocking2.0: --- → ?
Comment 2•14 years ago
|
||
FWIW, http://hg.mozilla.org/mozilla-central/rev/d70a9ed2b89e was pushed on Sept 24th which would have first showed up in Sept 25th builds - possible suspect?
Comment 3•14 years ago
|
||
Updated•14 years ago
|
blocking2.0: ? → beta7+
Comment 4•14 years ago
|
||
The first instance of this crash on record is from a 9/23 build:
http://crash-stats.mozilla.com/report/index/742cdb61-8077-49a7-b231-e428b2101007
The last merge from TM before 9/23 was on 9/21:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=e4fdb23efa33&tochange=e6cfaf0840cb
That group also just happens to contain the last changeset to touch StartRequest, so that changeset seems like a good starting point.
http://hg.mozilla.org/mozilla-central/rev/d20abbebe373
Comment 6•14 years ago
|
||
Looks like there is no thread attached to a context? Thread data null? Brian has blame for the lines affected.
Updated•14 years ago
|
Assignee: general → bhackett1024
Updated•14 years ago
|
Priority: -- → P2
Comment 7•14 years ago
|
||
This is a NULL cx->thread on entry to StartRequest, but cx->thread was being dereferenced by StartRequest before d20abbebe373 and d20abbebe373 does not change anything around where cx->thread is written. From the stack traces in the crash reports it looks like the cx came from GetSafeJSContext in (most? all?) cases, which is again code that d20abbebe373 doesn't touch. So this might be a latent issue; I don't know XPConnect at all and am entirely unequipped to diagnose.
My options are to either back out d20abbebe373 (which is a performance patch) or paper over things by adding a cx->thread NULL check on GetSafeJSContext. Is there a preference?
Comment 8•14 years ago
|
||
(In reply to comment #7)
> This is a NULL cx->thread on entry to StartRequest, but cx->thread was being
> dereferenced by StartRequest before d20abbebe373 and d20abbebe373 does not
> change anything around where cx->thread is written. From the stack traces in
> the crash reports it looks like the cx came from GetSafeJSContext in (most?
> all?) cases, which is again code that d20abbebe373 doesn't touch. So this
> might be a latent issue; I don't know XPConnect at all and am entirely
> unequipped to diagnose.
>
> My options are to either back out d20abbebe373 (which is a performance patch)
> or paper over things by adding a cx->thread NULL check on GetSafeJSContext. Is
> there a preference?
Papering over doesn't sound good.
- Can you back out d20abbebe373 and reland it easily, or will there be merge pain from that? If it's easy, then it would be nice to do it now, either to find the problem or rule out this patch. If it's going to be hard, then it sounds like you don't think d20abbebe373 is the problem so it may not be worth trying at this point.
On the hypothesis that d20abbebe373 is not the cause, I looked again and found 2 patches that look possibly related. lw, jorendorff, could these have anything to do with this crash?
(lw; create XPCCallContext in XPC_NW_Construct)
http://hg.mozilla.org/mozilla-central/rev/81b271bfaeba
(jorendorff; JS_Save/RestoreFrameChain should update cx->compartment)
http://hg.mozilla.org/mozilla-central/rev/a8252fe3f211
Comment 9•14 years ago
|
||
(In reply to comment #8)
> - Can you back out d20abbebe373 and reland it easily, or will there be merge
> pain from that? If it's easy, then it would be nice to do it now, either to
> find the problem or rule out this patch. If it's going to be hard, then it
> sounds like you don't think d20abbebe373 is the problem so it may not be worth
> trying at this point.
It wouldn't be hard to back out. The only problem I have is that if the crashes do go away it still leaves me with no clue what is going on --- it seems possible/likely that d20abbebe373 caused this failure, I just don't see how or why that could happen.
Comment 10•14 years ago
|
||
Luke also pointed me to this changeset as possibly related:
(gal; TM: set right compartment in js::ctypes::CClosure::ClosureStub)
http://hg.mozilla.org/mozilla-central/rev/835980f51185
It hit m-c on Oct 1, so it seems less likely.
Comment 11•14 years ago
|
||
I talked to Blake about the "XPCCallContext in XPC_NW_Construct" patch and we don't see anything suspicious.
Comment 12•14 years ago
|
||
Backed out d20abbebe373 for testing as:
http://hg.mozilla.org/tracemonkey/rev/e9bda67d78ad
http://hg.mozilla.org/tracemonkey/rev/0c6648b5044b
Comment 13•14 years ago
|
||
These are on m-c now as well.
Reporter | ||
Comment 14•14 years ago
|
||
mark as fixed, or wait to see if we can confirm by looking for the absence of these in builds after oct 20?
latest stats are:
date tl crashes at, count build, count build, ...
StartRequest
20101014 18 6 4.0b8pre2010101404,
5 4.0b8pre2010101304, 5 4.0b8pre2010101204,
2 4.0b8pre2010101104,
20101015 7 4 4.0b8pre2010101504,
2 4.0b8pre2010101204, 1 4.0b7pre2010100204,
20101016 7 3 4.0b8pre2010101504,
2 4.0b8pre2010101404, 1 4.0b8pre2010101604,
1 4.0b8pre2010101322,
20101017 5 2 4.0b8pre2010101704,
1 4.0b8pre2010101604, 1 4.0b8pre2010101404,
1 4.0b7pre2010100108,
20101018 7 3 4.0b8pre2010101404,
2 4.0b7pre2010100204, 1 4.0b8pre2010101704,
1 4.0b8pre2010101504,
20101019 3 4.0b8pre2010101904 3 ,
OS: Mac OS X → Windows XP
Comment 15•14 years ago
|
||
(In reply to comment #14)
> mark as fixed, or wait to see if we can confirm by looking for the absence of
> these in builds after oct 20?
The backout is an experiment--we need to try to confirm. So far, I see there is one crash with a build id from 10/20, and it doesn't seem to be the standard m-c build, but it seems to be from a cset after the backout, so it looks like it is not fixed.
We could try http://hg.mozilla.org/mozilla-central/rev/a8252fe3f211 if that doesn't work. Otherwise, someone should inspect the code for other clues about the problem or what experiments we can do to diagnose. I can't get to that for a bit yet--I am currently busy with 2 other b7-blocking topcrashes.
Comment 16•14 years ago
|
||
It looks like this crash is still there after the backout:
http://crash-stats.mozilla.com/report/index/57c03edb-7c74-4823-ba02-bb0a62101021
Comment 17•14 years ago
|
||
Brian's patch should go back in. This bug smells like compartments.
/be
Comment 18•14 years ago
|
||
(In reply to comment #17)
> Brian's patch should go back in. This bug smells like compartments.
I'd like to see at least one crash from a standard m-c nightly post-backout before doing that. But once we have ruled out this patch (which I expect we will), let's reland.
Comment 19•14 years ago
|
||
Is this one from a nightly?
http://crash-stats.mozilla.com/report/index/7aeb3b5f-51dd-45ad-9a4a-8aa4b2101021
Comment 20•14 years ago
|
||
(In reply to comment #19)
> Is this one from a nightly?
>
> http://crash-stats.mozilla.com/report/index/7aeb3b5f-51dd-45ad-9a4a-8aa4b2101021
Yes, that's a nightly. Go ahead an reland. Thanks for trying this out for me.
Does anyone think it's worth trying backing out http://hg.mozilla.org/mozilla-central/rev/a8252fe3f211?
Assignee | ||
Comment 21•14 years ago
|
||
Don't back it out yet! That patch is righteous and necessary for correctness in a lot more cases. If it caused this crash, it's revealing a deeper issue that we have to look at. I'll look tomorrow.
Comment 22•14 years ago
|
||
(In reply to comment #21)
> Don't back it out yet! That patch is righteous and necessary for correctness in
> a lot more cases. If it caused this crash, it's revealing a deeper issue that
> we have to look at. I'll look tomorrow.
OK, I won't back out anything. Btw, I don't have any particularly good reason to suspect that patch: it's just in the right time range and seems at least peripherally related to the issue.
For this crash, I think it needs someone to dive into the code and see what's going on, or what experiments we can do to get more data. I figured trying to back out a patch or two would have a small probability of finding the problem until it can get proper attention. Anyway, it didn't work. :-)
Updated•14 years ago
|
Assignee: bhackett1024 → jorendorff
Comment 23•14 years ago
|
||
Do we really feel like this is a topcrash worthy of blocking b7? I'm seeing, what, maybe 10 reports a day?
Reporter | ||
Comment 24•14 years ago
|
||
daily crash rate is variable. about 3-18 crashes per day in the last ten days, and we had that one spike of 28 crashes on oct 3.
this is a new crash since beta5/6 so its hard to predict where this will go.
if it's site content that tickles this bug directly, or indirectly through GC or other areas then the set of sites in the url list would expose the bug to a lot of users.
180 about:blank
102 \N
99
18 http://www.facebook.com/
17 http://www.facebook.com/home.php?
13 jar:file:///C:/Arquivos%20de%20programas/Minefield/omni.jar!/chrome/browser/content/browser/aboutHome.xhtml
13 http://en-us.start3.mozilla.com/firefox?client=firefox-a&rls=org.mozilla:en-US:official
11 http://www.orkut.com.br/Main#Home
10 about:sessionrestore
7 http://static.ak.facebook.com/common/redirectiframe.html
7 http://maps.google.com/
7 http://apps.facebook.com/onthefarm/index.php
7 http://apps.facebook.com/logout.php
6 https://mail.google.com/mail/?shva=1#inbox
6 http://www.orkut.com.br/Home
6 http://www.google.com/
6 http://www.google.com.br/
6 http://www.facebook.com/?ref=home
Reporter | ||
Comment 25•14 years ago
|
||
also theories about the possibility this is related to this family of bugs should be given a chance to pan out.
Bug 604756 - crash [@ JSString::flatten() ] [@ JSString::flatten ]
Bug 605836 - crash [@ js_NewGCShortString(JSContext*)
see comments which might trace back to this bug.
https://bugzilla.mozilla.org/show_bug.cgi?id=605836#c2
https://bugzilla.mozilla.org/show_bug.cgi?id=604756#c14
Comment 26•14 years ago
|
||
(In reply to comment #23)
> Do we really feel like this is a topcrash worthy of blocking b7? I'm seeing,
> what, maybe 10 reports a day?
My vote would be no. It's currently #71. I don't know how many blocking topcrashes we have, but somehow I doubt it's around 70.
(In reply to comment #24)
> daily crash rate is variable. about 3-18 crashes per day in the last ten
> days, and we had that one spike of 28 crashes on oct 3.
>
> this is a new crash since beta5/6 so its hard to predict where this will go.
True.
> if it's site content that tickles this bug directly, or indirectly through GC
> or other areas then the set of sites in the url list would expose the bug to a
> lot of users.
I would say it's very unlikely it's certain site content. Given the signature and the proximate cause, it is probably related to to threading or compartments somehow.
(In reply to comment #25)
> also theories about the possibility this is related to this family of bugs
> should be given a chance to pan out.
>
> Bug 604756 - crash [@ JSString::flatten() ] [@ JSString::flatten ]
> Bug 605836 - crash [@ js_NewGCShortString(JSContext*)
They might be loosely related (possibly both threading-related), but it is unlikely they are closely related: the flatten crash started much earlier in time.
Updated•14 years ago
|
Comment 28•14 years ago
|
||
It is #38 top crasher in 4.0b9 for the last week.
Comment 29•13 years ago
|
||
In bug 671840 with a [@ StartRequest ] crash signature the reporter has this STR:
"I set Mozilla Firefox as a scheduled task to run every day at a given time and show a specific page. (Normally, the page loads without any problem!)"
And that makes Firefox 5 crash when Firefox is not already running.
Comment 30•13 years ago
|
||
It's #29 top crasher in 9.0.1, #15 in 10.0b2, and #28 in 11.0a2 on Mac OS X.
The stack trace now looks like:
Frame Module Signature [Expand] Source
0 XUL StartRequest js/src/jsapi.cpp:944
1 XUL XPCCallContext::Init js/xpconnect/src/XPCCallContext.cpp:148
2 XUL XPCCallContext::XPCCallContext js/xpconnect/src/XPCCallContext.cpp:63
3 XUL nsXPConnect::BeginCycleCollection js/xpconnect/src/nsXPConnect.cpp:469
4 XUL nsCycleCollector::BeginCollection xpcom/base/nsCycleCollector.cpp:2733
5 XUL nsCycleCollectorRunner::Run xpcom/base/nsCycleCollector.cpp:3504
6 XUL nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:631
7 XUL NS_ProcessNextEvent_P obj-firefox/i386/xpcom/build/nsThreadUtils.cpp:245
8 XUL nsThread::ThreadFunc xpcom/threads/nsThread.cpp:272
9 libnspr4.dylib _pt_root nsprpub/pr/src/pthreads/ptthread.c:187
10 libSystem.B.dylib _pthread_start
11 libSystem.B.dylib thread_start
Comment 31•13 years ago
|
||
(In reply to Scoobidiver from comment #30)
> The stack trace now looks like:
> 4 XUL nsCycleCollector::BeginCollection
> xpcom/base/nsCycleCollector.cpp:2733
> 5 XUL nsCycleCollectorRunner::Run xpcom/base/nsCycleCollector.cpp:3504
> 6 XUL nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:631
> 7 XUL NS_ProcessNextEvent_P
> obj-firefox/i386/xpcom/build/nsThreadUtils.cpp:245
> 8 XUL nsThread::ThreadFunc xpcom/threads/nsThread.cpp:272
> 9 libnspr4.dylib _pt_root nsprpub/pr/src/pthreads/ptthread.c:187
> 10 libSystem.B.dylib _pthread_start
> 11 libSystem.B.dylib thread_start
So we start the cycle collection not from the main thread but from some other thread in response to some event???
Comment 32•13 years ago
|
||
(In reply to Igor Bukanov from comment #31)
> So we start the cycle collection not from the main thread but from some
> other thread in response to some event???
Technically yes, but the cycle collector thread and main thread do not run at the same time: the main thread tells the cycle collector thread to run a cycle collection and then the main thread blocks until the cycle collector thread finishes. Why this goofiness? Because gal speculated that it might trashing the caches of the core running the main thread (bug 580096). No measurements were performed and I would be happy if you removed this goofy case; it was no small annoyance to the single-threaded runtime work.
(In reply to Luke Wagner [:luke] from comment #32)
> Why this goofiness? Because gal speculated that it might
> trashing the caches of the core running the main thread (bug 580096). No
> measurements were performed and I would be happy if you removed this goofy
> case; it was no small annoyance to the single-threaded runtime work.
Additionally, I think we might be using a separate thread to make it easier to implement interruptible cycle collection. (At least I think I heard that.) So we might not want to take it out.
Comment 34•13 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #33)
Ah, good point.
Comment 35•13 years ago
|
||
(In reply to Scoobidiver from comment #30)
> It's #29 top crasher in 9.0.1, #15 in 10.0b2, and #28 in 11.0a2 on Mac OS X.
>
> The stack trace now looks like:
What % look like that? With BeginCycleCollection in there somewhere. I looked at maybe 6 crash reports, and only a few had BeginCycleCollection in there, so I'm curious what the rough % in the overall set might be.
Reporter | ||
Comment 36•13 years ago
|
||
reports like https://crash-analysis.mozilla.com/crash_stacks/Stack-summary-11.0a2.txt can help with questions like this to break down the distribution of like stacks for any one signature.
in the browser search down to "startrequest" to find the signature or something deeper in the frame.
Comment 37•13 years ago
|
||
Thanks, I really need to write down that URL somewhere.
I looked through the stack reports for January 6, searching for "|startrequest". The links I give seem to be updated daily, so you won't find the exact same thing I did 1/8 and onwards.
It doesn't show up at all in 9.
In 10, it is ranked 91. There are 23 stacks. https://crash-analysis.mozilla.com/crash_stacks/Stack-summary-10.0.txt
- 9 look like JS_SetRuntimeThread, PR_GetCurrentThread, blank frames, start request
- 8 are nsGlobalWindow::SetNewDocument eventually calling nsDOMClassInfo::DefineStaticJSVals
- 5 are inside the cycle collector
- 1 is some kind of thready thing involving nsRunnable::Release
In 11, it is ranked 57. There are 6 stacks.
- 3 are in the cycle collector
- 2 are inside XPCJSContextStack::Pop whatever that is
- 1 is blank
Comment 38•13 years ago
|
||
It's #100 top browser crasher but #4 on Mac OS X in 10.0.1.
Comment 39•12 years ago
|
||
Scoob, there are no crashes for this sig newer than version 11.
there are other sigs that *contain* StartRequest for current versions, such as mozalloc_abort(char const* const) | mozalloc_handle_oom(unsigned int) | moz_xmalloc | nsStreamLoader::OnStartRequest(nsIRequest*, nsISupports*)
Comment 40•12 years ago
|
||
ref also Bug 671840 and Bug 671266
Comment 41•12 years ago
|
||
(In reply to Wayne Mery (:wsmwk) from comment #39)
> Scoob, there are no crashes for this sig newer than version 11.
Let's close it.
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•