Closed
Bug 865701
Opened 12 years ago
Closed 11 years ago
crash in nsFrameManager::ReResolveStyleContext with AMD Radeon 6310/6320
Categories
(Core :: Layout, defect)
Tracking
()
RESOLVED
INCOMPLETE
People
(Reporter: scoobidiver, Assigned: benjamin)
References
Details
(Keywords: crash, regression)
Crash Data
It's already the #1 top crasher in 21.0b4, which is not yet released.
Signature mozilla::dom::DocumentBinding::CreateInterfaceObjects(JSContext*, JSObject*, JSObject**)
UUID 1dcbafd5-842f-4903-8323-24c0e2130425
Date Processed 2013-04-25 14:16:30
Uptime 2643
Last Crash 44.6 minutes before submission
Install Age 16.8 hours since version was first installed.
Install Time 2013-04-24 21:26:50
Product Firefox
Version 21.0
Build ID 20130423212553
Release Channel beta
OS Windows NT
OS Version 6.1.7601 Service Pack 1
Build Architecture x86
Build Architecture Info AuthenticAMD family 20 model 2 stepping 0
Crash Reason EXCEPTION_ACCESS_VIOLATION_WRITE
Crash Address 0x1d18
App Notes
AdapterVendorID: 0x1002, AdapterDeviceID: 0x9806, AdapterSubsysID: 00000000, AdapterDriverVersion: 6.1.7600.16385
D3D10 Layers? D3D10 Layers- D3D9 Layers? D3D9 Layers-
Processor Notes sp-processor06.phx1.mozilla.com_22392:2012
EMCheckCompatibility True
Adapter Vendor ID 0x1002
Adapter Device ID 0x9806
Total Virtual Memory 2147352576
Available Virtual Memory 1619656704
System Memory Use Percentage 47
Available Page File 2430001152
Available Physical Memory 903467008
Frame Module Signature Source
0 xul.dll mozilla::dom::DocumentBinding::CreateInterfaceObjects obj-firefox/dom/bindings/DocumentBinding.cpp:7308
1 xul.dll NS_NewStyleContext layout/style/nsStyleContext.cpp:723
2 xul.dll nsStyleSet::GetContext layout/style/nsStyleSet.cpp:776
3 xul.dll nsFrameManager::ReResolveStyleContext layout/base/nsFrameManager.cpp:1219
4 xul.dll nsFrameManager::ReResolveStyleContext layout/base/nsFrameManager.cpp:1604
5 xul.dll nsFrameManager::ComputeStyleChangeFor layout/base/nsFrameManager.cpp:1697
6 xul.dll nsCSSFrameConstructor::RestyleElement layout/base/nsCSSFrameConstructor.cpp:8442
7 xul.dll mozilla::css::RestyleTracker::DoProcessRestyles layout/base/RestyleTracker.cpp:209
8 xul.dll PresShell::FlushPendingNotifications layout/base/nsPresShell.cpp:3880
9 xul.dll nsRefreshDriver::Tick layout/base/nsRefreshDriver.cpp:898
10 xul.dll mozilla::RefreshDriverTimer::Tick layout/base/nsRefreshDriver.cpp:156
11 xul.dll nsTimerImpl::Fire xpcom/threads/nsTimerImpl.cpp:498
12 nspr4.dll nspr4.dll@0x8d70
13 xul.dll nsTimerEvent::Run xpcom/threads/nsTimerImpl.cpp:589
14 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:627
15 ntdll.dll EtwEventEnabled
16 nspr4.dll PR_Lock nsprpub/pr/src/threads/combined/prulock.c:201
17 nspr4.dll PR_Unlock nsprpub/pr/src/threads/combined/prulock.c:315
18 xul.dll mozilla::Mutex::Unlock obj-firefox/dist/include/mozilla/Mutex.h:83
19 xul.dll NS_ProcessNextEvent_P obj-firefox/xpcom/build/nsThreadUtils.cpp:238
20 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:117
21 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:208
22 xul.dll _SEH_epilog4
23 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:182
24 xul.dll nsBaseAppShell::Run widget/xpwidgets/nsBaseAppShell.cpp:163
25 xul.dll nsAppShell::Run widget/windows/nsAppShell.cpp:154
26 xul.dll XREMain::XRE_mainRun toolkit/xre/nsAppRunner.cpp:3871
27 mozalloc.dll mozalloc.dll@0x10a0
28 @0x17024e0
More reports at:
https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Adom%3A%3ADocumentBinding%3A%3ACreateInterfaceObjects%28JSContext*%2C+JSObject*%2C+JSObject**%29
Setting QA Contact to Juan since he has a netbook with the AMD 6310 GPU.
QA Contact: jbecerra
Comment 2•12 years ago
I have not been able to reproduce this crash exercising functionality in live.com and yahoo.com accounts or by visiting some of the other URLs associated with this crash and browsing around those sites.
You can access the machine through VNC in the MV office at 10.250.6.86 if you want to give it a try.
I'm wondering if there is any difference between the AMD E350/450 and Radeon 6310/6320 on a Desktop platform vs a Laptop platform. I know that sometimes the same CPU/GPU will have slight technical differences depending on the intended platform. Would anyone be able to comment on this theory? Maybe someone from AMD?
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #3)
> I'm wondering if there is any difference between the AMD E350/450 and Radeon
> 6310/6320 on a Desktop platform vs a Laptop platform. I know that sometimes
> the same CPU/GPU will have slight technical differences depending on the
> intended platform. Would anyone be able to comment on this theory? Maybe
> someone from AMD?
The reason I ask is to decide whether it's worth the time and money to invest in a desktop computer using this hardware, given that Juan has been unable to reproduce thus far on a netbook platform.
Reporter
Updated•12 years ago
Crash Signature: [@ mozilla::dom::DocumentBinding::CreateInterfaceObjects(JSContext*, JSObject*, JSObject**)] → [@ mozilla::dom::DocumentBinding::CreateInterfaceObjects(JSContext*, JSObject*, JSObject**)]
[@ JSCompartment::getNewType(JSContext*, js::Class*, js::TaggedProto, JSFunction*) ]
[@ JS_GetCompartmentPrincipals(JSCompartment*) ]
[@ nsStyleSet::ReparentSt…
Reporter
Comment 5•12 years ago
21.0b1 seems as crashy as 19.0 was (see bug 830531).
Crash Signature: , JSFunction*) ]
[@ JS_GetCompartmentPrincipals(JSCompartment*) ]
[@ nsStyleSet::ReparentStyleContext(nsStyleContext*, nsStyleContext*, mozilla::dom::Element*) ] → , JSFunction*) ]
[@ JS_GetCompartmentPrincipals(JSCompartment*) ]
[@ nsStyleSet::ReparentStyleContext(nsStyleContext*, nsStyleContext*, mozilla::dom::Element*) ]
[@ nsFrameManager::ReResolveStyleContext(nsPresContext*, nsIFrame*, nsIContent*, nsStyleCh…
Reporter
Comment 6•12 years ago
> 21.0b1 seems [...]
I meant 21.0b4.
Reporter
Updated•12 years ago
Crash Signature: , unsigned int) ]
[@ nsStyleSet::ResolveAnonymousBoxStyle(nsIAtom*, nsStyleContext*) ] → , unsigned int) ]
[@ nsStyleSet::ResolveAnonymousBoxStyle(nsIAtom*, nsStyleContext*) ]
[@ nsCSSFrameConstructor::AddFrameConstructionItems(nsFrameConstructorState&, nsIContent*, bool, nsIFrame*, nsCSSFrameConstructor::FrameConstructionItemList&) ]
[@ n…
Updated•12 years ago
Comment 7•12 years ago
Can the impacted users please help test whether ftp://ftp.mozilla.org/pub/mozilla.org/firefox/nightly/21.0b5-candidates/build1/win32/en-US/Firefox%20Setup%2021.0b5.exe reproduces these crashes?
Comment 8•12 years ago
Checking the latest URLs from the crash reports, I see https://www.facebook.com/ as a top hit among the pages where users seem to crash.
Benjamin, please let us know if there are any more interesting correlations in the data yet that can help QA here.
Assignee
Comment 9•12 years ago
I have a full dump of this crash (signature of JSCompartment::getNewType) from Juan's QA computer. I will be examining it thoroughly to check the memory corruption.
Assignee
Comment 10•12 years ago
dbaron or anyone else, could you construct a web page that calls NS_NewStyleContext in as tight a loop as we can manage? I'm trying to make this as reproducible as possible.
Juan or anyone with access to that machine, could you play around with running other programs at the same time as Firefox to see if any other programs cause this crash to happen more regularly? Graphics-intensive programs in particular may make this easier to reproduce.
Flags: needinfo?(jbecerra)
Flags: needinfo?(dbaron)
Reporter
Comment 11•12 years ago
Let's use in the summary the first frame shared by all the stack traces.
Crash Signature: , nsCSSFrameConstructor::FrameConstructionItemList&) ]
[@ nsStyleContext::nsStyleContext(nsStyleContext*, nsIAtom*, nsCSSPseudoElements::Type, nsRuleNode*) ] → , nsCSSFrameConstructor::FrameConstructionItemList&) ]
[@ nsStyleContext::nsStyleContext(nsStyleContext*, nsIAtom*, nsCSSPseudoElements::Type, nsRuleNode*) ]
[@ nsStyleSet::ProbePseudoElementStyle(mozilla::dom::Element*, nsCSSPseudoElements::Type, nsSty…
Summary: crash in mozilla::dom::DocumentBinding::CreateInterfaceObjects with AMD Radeon 6310/6320 → crash in nsFrameManager::ReResolveStyleContext with AMD Radeon 6310/6320
Comment 12•12 years ago
This crash seems to have spiked as of today. Adding needinfo on Kairo to get some data around # of unique affected users.
CCing Ted and Tracy in case they can help out earlier here. The basic query to be used here is similar to https://bugzilla.mozilla.org/show_bug.cgi?id=830531#c25, taking the right signature into account.
Updated•12 years ago
Flags: needinfo?(kairo)
Reporter
Comment 13•12 years ago
I haven't seen these crashes or equivalent ones in 17.0b5 not yet released.
Comment 14•12 years ago
(In reply to Scoobidiver from comment #13)
> I haven't seen these crashes or equivalent ones in 17.0b5 not yet released.
I think you mean 21.0b5 here?
Reporter
Comment 15•12 years ago
(In reply to bhavana bajaj [:bajaj] from comment #14)
> (In reply to Scoobidiver from comment #13)
> > I haven't seen these crashes or equivalent ones in 17.0b5 not yet released.
> I think you mean 21.0b5 here?
Yes. My bad.
Comment 16•12 years ago
Using 21.0b5 I let the machine run over the weekend with the same sorts of tabs and video playlists playing, but it didn't crash this weekend. Before, it used to crash a few hours into the video playlist.
I will now go back to using 21.0b4 to address the request in comment #10.
Flags: needinfo?(jbecerra)
Comment 17•12 years ago
(In reply to Benjamin Smedberg [:bsmedberg] from comment #10)
> dbaron or anyone else, could you construct a web page that calls
> NS_NewStyleContext in as tight a loop as we can manage? I'm trying to make
> this as reproducible as possible.
>
> Juan or anyone with access to that machine, could you play around with
> running other programs at the same time as Firefox to see if any other
> programs cause this crash to happen more regularly? Graphics-intensive
> programs in particular may make this easier to reproduce.
Running multiple programs at the same time seems to make the crash happen sooner. I'll keep trying a few more times, but this last time it took maybe a half hour or so for it to crash on 21.0b4 with several applications open.
Comment 18•12 years ago
(In reply to bhavana bajaj [:bajaj] from comment #12)
> This crash seems to have spiked as of today. Adding needinfo on Kairo to
> get some data around # of unique affected users.
As the vast majority of the crashes is from one signature, I'll give you the installations overview of that one (this is for yesterday only):
breakpad=> SELECT version,COUNT(*) as crashes,COUNT(DISTINCT client_crash_date - install_age * interval '1 second') as installations FROM reports WHERE product='Firefox' AND signature LIKE 'mozilla::dom::DocumentBinding::CreateInterfaceObjects%' AND utc_day_is(date_processed, '2013-04-28') GROUP BY version;
 version | crashes | installations
---------+---------+---------------
 21.0    |   18377 |          7460
(1 row)
Flags: needinfo?(kairo)
Comment 19•12 years ago
For multiple days the ratio of crashes/installation becomes even higher.
breakpad=> SELECT version,COUNT(*) as crashes,COUNT(DISTINCT client_crash_date - install_age * interval '1 second') as installations FROM reports WHERE product='Firefox' AND signature LIKE 'mozilla::dom::DocumentBinding::CreateInterfaceObjects%' AND date_processed BETWEEN '2013-04-24' AND '2013-04-29' GROUP BY version;
 version | crashes | installations
---------+---------+---------------
 21.0    |   44123 |         12495
(1 row)
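The ratio mentioned here can be worked out directly from the two result sets. This is a minimal sketch that uses only the counts quoted in the query results in this bug; the helper name `crashesPerInstallation` is made up for illustration.

```javascript
// Counts copied from the two breakpad query results quoted in this bug.
const samples = [
  { period: "2013-04-28", crashes: 18377, installations: 7460 },
  { period: "2013-04-24 to 2013-04-29", crashes: 44123, installations: 12495 },
];

// Hypothetical helper: average number of crash reports per distinct installation.
function crashesPerInstallation(sample) {
  return sample.crashes / sample.installations;
}

for (const sample of samples) {
  const ratio = crashesPerInstallation(sample).toFixed(2);
  console.log(`${sample.period}: ${ratio} crashes per installation`);
}
```

With these figures the single-day ratio is about 2.46 and the multi-day ratio is about 3.53.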
Comment 20•12 years ago
Unfortunately I haven't been able to reproduce the problem reliably and within a short period of time. This morning I was able to crash within the first five minutes, but after that I wasn't able to get it to reproduce within a short time.
Whenever I have been able to reproduce the problem, it has been while playing a long playlist of YouTube videos and trying to send email using outlook.com. Sometimes it just takes time for this to happen while the videos are playing. In addition, I get a system dialog saying the plugin container has stopped working.
Assignee
Comment 21•12 years ago
Because of the nature of the memory corruption here, it's going to exist in a bunch of different signatures:
40335 (3.1): mozilla::dom::DocumentBinding::CreateInterfaceObjects(JSContext*, JSObject*, JSObject**)
10543 (8.0): JSCompartment::getNewType(JSContext*, js::Class*, js::TaggedProto, JSFunction*)
3623 (2.0): nsStyleSet::ReparentStyleContext(nsStyleContext*, nsStyleContext*, mozilla::dom::Element*)
3196 (1.0): nsFrameManager::ReResolveStyleContext(nsPresContext*, nsIFrame*, nsIContent*, nsStyleChangeList*, nsChangeHint, nsChangeHint, nsRestyleHint, mozilla::css::RestyleTracker&, nsFrameManager::DesiredA11yNotifications, nsTArray<nsIContent*>&, TreeMatchConte...
2454 (4.9): JS_GetCompartmentPrincipals(JSCompartment*)
1968 (2.1): nsStyleSet::ResolveStyleFor(mozilla::dom::Element*, nsStyleContext*, TreeMatchContext&)
822 (7.9): js::detail::HashTable<js::AtomStateEntry const, js::HashSet<js::AtomStateEntry, js::AtomHasher, js::SystemAllocPolicy>::SetOps, js::SystemAllocPolicy>::lookup(js::AtomHasher::Lookup const&, unsigned int, unsigned int)
521 (3.5): nsStyleContext::AddChild(nsStyleContext*)
494 (2.0): nsStyleSet::ResolveAnonymousBoxStyle(nsIAtom*, nsStyleContext*)
124 (5.3): js::NewObjectWithGivenProto(JSContext*, js::Class*, js::TaggedProto, JSObject*, js::gc::AllocKind, js::NewObjectKind)
80 (3.1): nsStyleContext::nsStyleContext(nsStyleContext*, nsIAtom*, nsCSSPseudoElements::Type, nsRuleNode*)
71 (2.5): nsStyleSet::ProbePseudoElementStyle(mozilla::dom::Element*, nsCSSPseudoElements::Type, nsStyleContext*, TreeMatchContext&)
63 (2.3): nsStyleSet::GetContext(nsStyleContext*, nsRuleNode*, nsRuleNode*, nsIAtom*, nsCSSPseudoElements::Type, mozilla::dom::Element*, unsigned int)
42 (5.2): mozilla::Preferences::AddBoolVarCache(bool*, char const*, bool)
39 (5.3): SelectorMatches
30 (1.8): firefox.exe@0x10203
29 (1.0): nsStyleContext::CalcStyleDifference(nsStyleContext*, nsChangeHint)
28 (1.8): firefox.exe@0x60203
27 (4.0): RuleHash::EnumerateAllRules(mozilla::dom::Element*, ElementDependentRuleProcessorData*, NodeMatchContext&)
27 (2.2): nsStyleSet::ResolveStyleByAddingRules(nsStyleContext*, nsCOMArray<nsIStyleRule> const&)
25 (3.9): nsRuleNode::WalkRuleTree(nsStyleStructID, nsStyleContext*)
The (number) is the average depth of ReResolveStyleContext in the stack.
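For anyone who wants to post-process this list, here is a rough sketch of parsing the "count (depth): signature" lines. It is not the script used to produce the list; `parseSignatureLine` is a hypothetical helper, and only the first three lines are included as sample input.

```javascript
// First three lines of the list above, copied verbatim.
const lines = [
  "40335 (3.1): mozilla::dom::DocumentBinding::CreateInterfaceObjects(JSContext*, JSObject*, JSObject**)",
  "10543 (8.0): JSCompartment::getNewType(JSContext*, js::Class*, js::TaggedProto, JSFunction*)",
  "3623 (2.0): nsStyleSet::ReparentStyleContext(nsStyleContext*, nsStyleContext*, mozilla::dom::Element*)",
];

// Hypothetical helper: split "count (depth): signature" into its three fields.
function parseSignatureLine(line) {
  const match = /^(\d+) \((\d+(?:\.\d+)?)\): (.+)$/.exec(line);
  if (!match) {
    throw new Error(`unrecognized line: ${line}`);
  }
  return { count: Number(match[1]), avgDepth: Number(match[2]), signature: match[3] };
}

const rows = lines.map(parseSignatureLine);
const totalCrashes = rows.reduce((sum, row) => sum + row.count, 0);
// Crash-weighted mean of the per-signature average depths.
const weightedDepth =
  rows.reduce((sum, row) => sum + row.count * row.avgDepth, 0) / totalCrashes;

console.log(totalCrashes, weightedDepth.toFixed(2));
```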
I'm working on some scripts to scan the stack memory to see whether all these different crashes all have the callstack ReResolveStyleContext -> NS_NewStyleContext -> nsStyleContext::nsStyleContext -> nsStyleContext::AddChild -> weeds or not. I ran a small sample of the minidumps through a memory checker and I haven't encountered any corrupted .text memory yet.
On IRC we were speculating that perhaps an interrupt handler is clobbering some register by accident, so I'm going to focus my investigation on whether there is a particular register clobber which might produce the varied effects seen in this bug.
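The stack scan described above could look roughly like the following. This is only a sketch under assumptions, not the actual script: every address and code range is made up, and `symbolize` and `containsChain` are hypothetical helpers. It treats any stack word that falls inside a known function's code range as a possible return address and checks whether the expected callers appear in order, innermost first.

```javascript
// Hypothetical code ranges [start, end); real values would come from symbol files.
const functionRanges = [
  { name: "nsFrameManager::ReResolveStyleContext", start: 0x1000, end: 0x1800 },
  { name: "NS_NewStyleContext", start: 0x2000, end: 0x2100 },
  { name: "nsStyleContext::nsStyleContext", start: 0x3000, end: 0x3080 },
];

// Map a stack word to the function whose code range contains it, if any.
function symbolize(word) {
  const hit = functionRanges.find((f) => word >= f.start && word < f.end);
  return hit ? hit.name : null;
}

// True if expectedChain appears as an ordered subsequence of the symbolized words.
function containsChain(stackWords, expectedChain) {
  let next = 0;
  for (const word of stackWords) {
    if (next < expectedChain.length && symbolize(word) === expectedChain[next]) {
      next++;
    }
  }
  return next === expectedChain.length;
}

// Made-up stack words, innermost frame first; words outside every range are ignored.
const stackWords = [0x3042, 0x00a0, 0x0001, 0x2010, 0x7fff0000, 0x1234];
const expectedChain = [
  "nsStyleContext::nsStyleContext",
  "NS_NewStyleContext",
  "nsFrameManager::ReResolveStyleContext",
];

console.log(containsChain(stackWords, expectedChain));
```

Matching a subsequence rather than adjacent words lets saved registers and intermediate frames sit between the return addresses of interest.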
Assignee: nobody → benjamin
Comment 22•12 years ago
probably something like:
<div id="d">
  <div></div>
  <!-- repeat to get a decent number of children -->
</div>
<script>
var d = document.getElementById("d");
var cs = getComputedStyle(d, "");
while (true) { /* or break it up to avoid the slow script dialog */
  d.style.transform = 'translate(1px)';
  cs.color; /* flush style */
  d.style.transform = 'translate(2px)';
  cs.color; /* flush style */
}
</script>
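One way to "break it up", as the comment in the loop suggests, is to run a bounded number of iterations per timer callback. This is a sketch only: `runInChunks` and its parameters are hypothetical names, and the default scheduler assumes `setTimeout` is available.

```javascript
// Runs `step` iterationsPerChunk times, then yields through `schedule`
// before starting the next chunk, for `chunks` chunks in total.
function runInChunks(step, iterationsPerChunk, chunks, schedule = (fn) => setTimeout(fn, 0)) {
  let completed = 0;
  function runChunk() {
    for (let i = 0; i < iterationsPerChunk; i++) {
      step();
    }
    completed++;
    if (completed < chunks) {
      schedule(runChunk);
    }
  }
  if (chunks > 0) {
    runChunk();
  }
}

// In the test page above, `step` would be the loop body, for example:
//   runInChunks(function () {
//     d.style.transform = 'translate(1px)'; cs.color;
//     d.style.transform = 'translate(2px)'; cs.color;
//   }, 1000, 100000);
```

Yielding between chunks lets the event loop run, which should keep the slow script dialog away at the cost of a slightly less tight loop.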
Flags: needinfo?(dbaron)
Comment 23•12 years ago
(In reply to comment #21)
> On IRC we were speculating that perhaps an interrupt handler is clobbering some
> register by accident, so I'm going to focus my investigation on whether there is a
> particular register clobber which might produce the varied effects seen in this
> bug.
I think a more likely explanation is that something is corrupting the heap, causing the second mov instruction in the assembly code you showed me the other day to read a zero value into edx.
Assignee
Comment 24•12 years ago
I have trouble believing that heap corruption could explain why this crash hits just this function-tree and not others and has the other characteristics (varying volume per beta even on the same cset).
More data from running dumplookup on a sample of crashes with "ReResolveStyleContext" somewhere near the top of the stack:
nsStyleContext::nsStyleContext is present as a return address on the stack in 98% of the crashes, and I think the others are unrelated to this.
Of the crashes which had nsStyleContext::nsStyleContext as a return location, most of them are returning to http://hg.mozilla.org/releases/mozilla-beta/annotate/04aba2e6927f/layout/style/nsStyleContext.cpp#l70 which means that they called nsStyleContext::AddChild and crashed there.
https://crash-stats.mozilla.com/report/index/f46d376f-43e4-428a-96fd-0f15a2130430
In this case, AddChild has returned successfully and we have executed another two instructions:
http://pastebin.mozilla.org/2379752 (crash is at line 78 dereferencing eax+0x24 which should be this->mRuleNode). Registers: $EAX=0xa000c7de $EBX=0x5bc7d765 $ECX=0x18e3dc61
Also:
* $EBX is supposed to be `this`, but it's pointing at code within nsStyleContext::AddChild and cannot be a valid heap address.
* $EAX is the correct value of *($EBX+0xc).
* $ECX is a heap address and we just finished a MOV ECX,EBX above. But it's an odd number.
* According to my read of the stack memory, the actual value of `this` should be 0x18e3dc78 *(return address + 4).
* http://pastebin.mozilla.org/2379837 is the disassembly of nsStyleContext::AddChild. It only modifies EAX and EDX and never changes EBX or ECX or makes any calls.
* So $EBX should really be identical to $ECX when we hit this crash.
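The register observations above can be checked with a little arithmetic. This is a minimal sketch using only the values quoted in this comment; the 4-byte alignment check is an assumption about heap objects on this 32-bit build, and `isAligned4` is a hypothetical helper.

```javascript
// Register and stack values quoted in this comment.
const eax = 0xa000c7de;
const ecx = 0x18e3dc61;
const expectedThis = 0x18e3dc78;

// The crashing instruction dereferences eax+0x24.
const faultingAddress = eax + 0x24;

// Assumption: heap objects are at least 4-byte aligned on this 32-bit build.
const isAligned4 = (address) => address % 4 === 0;

console.log(faultingAddress.toString(16)); // a000c802
console.log(isAligned4(ecx), isAligned4(expectedThis)); // false true
console.log((expectedThis - ecx).toString(16)); // 17
```

So $ECX is 0x17 bytes below the value the stack suggests for `this`, and unlike that value it is not aligned the way a heap pointer would be expected to be.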
Assignee
Comment 25•12 years ago
More details from https://crash-stats.mozilla.com/report/index/c3abc1c8-fd11-4db6-940e-dba802130425 which is an EXCEPTION_ILLEGAL_INSTRUCTION:
stack memory shows:
*ESP: nsStyleContext::AddChild + 2 bytes
*(ESP + 4): nsStyleContext::nsStyleContext[70] (returning from AddChild)
*(ESP + 8): EDI saved by nsStyleContext::nsStyleContext
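*(ESP + 12): ESI saved by nsStyleContext::nsStyleContext
*(ESP + 16): EBX saved by nsStyleContext::nsStyleContext
*(ESP + 20): ECX saved by nsStyleContext::nsStyleContext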
*(ESP + 24): return to NS_NewStyleContext from nsStyleContext::nsStyleContext
The minidump can't show memory corruption in nsStyleContext::AddChild, but I'm still betting that the first two bytes of that function are being corrupted into a 2-byte jump instruction which happens to end up in CreateInterfaceObjects.
The exact offset may vary.
Ehsan, does this sound like a reasonable guess? If so, I'm going to have Juan's machine shipped to me and try to add memory watchpoints in a kernel debugger.
Comment 26•12 years ago
(In reply to comment #25)
> Ehsan, does this sound like a reasonable guess? If so, I'm going to have Juan's
> machine shipped to me and try to add memory watchpoints in a kernel debugger.
You've definitely convinced me of the likelihood of the register corruption! Thanks for the detailed analysis!
Reporter
Comment 27•12 years ago
Removing the topcrash keyword because:
1. it no longer happens in 21.0b5 and above
2. bug 772330 comment 19
Keywords: topcrash
Comment 28•12 years ago
Since this is no longer a top crash I'm removing qawanted. QA will continue to dogfood on our AMD netbooks periodically with Beta and RCs.
Keywords: qawanted
Comment 29•11 years ago
(In reply to Robert Kaiser (:kairo@mozilla.com) [away until early June] from comment #18)
> (In reply to bhavana bajaj [:bajaj] from comment #12)
> > This crash seems to have spiked as of today. Adding needinfo on Kairo to
> > get some data around # of unique affected users.
>
> As the vast majority of the crashes is from one signature, I'll give you the
> installations overview of that one (this is for yesterday only):
>
> breakpad=> SELECT version,COUNT(*) as crashes,COUNT(DISTINCT
> client_crash_date - install_age * interval '1 second') as installations
> FROM reports WHERE product='Firefox' AND signature LIKE
> 'mozilla::dom::DocumentBinding::CreateInterfaceObjects%' AND
> utc_day_is(date_processed, '2013-04-28') GROUP BY version;
> version | crashes | installations
> ---------+---------+---------------
> 21.0 | 18377 | 7460
> (1 row)
For comparison, we had 1,583,848 ADI on 21.0b4 on that day.
(In reply to Robert Kaiser (:kairo@mozilla.com) [away until early June] from comment #19)
> For multiple days the ratio of crashes/installation becomes even higher.
>
> breakpad=> SELECT version,COUNT(*) as crashes,COUNT(DISTINCT
> client_crash_date - install_age * interval '1 second') as installations
> FROM reports WHERE product='Firefox' AND signature LIKE
> 'mozilla::dom::DocumentBinding::CreateInterfaceObjects%' AND date_processed
> BETWEEN '2013-04-24' AND '2013-04-29' GROUP BY version;
> version | crashes | installations
> ---------+---------+---------------
> 21.0 | 44123 | 12495
> (1 row)
We had 5,873,863 ADI pings on 21.0b4 over those days.
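To put the crash counts next to these ADI figures, the volume can be expressed per 100 ADI. This is a rough sketch using only the numbers quoted above; `crashesPer100Adi` is a hypothetical helper, and the counts cover only the CreateInterfaceObjects signature.

```javascript
// Crash counts from the quoted queries and ADI figures from this comment.
const periods = [
  { period: "2013-04-28", crashes: 18377, adi: 1583848 },
  { period: "2013-04-24 to 2013-04-29", crashes: 44123, adi: 5873863 },
];

// Hypothetical helper: crash reports for this one signature per 100 ADI.
function crashesPer100Adi({ crashes, adi }) {
  return (crashes / adi) * 100;
}

for (const entry of periods) {
  console.log(`${entry.period}: ${crashesPer100Adi(entry).toFixed(2)} crashes per 100 ADI`);
}
```

That works out to roughly 1.16 crashes per 100 ADI for the single day and roughly 0.75 over the multi-day range.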
Comment 30•11 years ago
ping?
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #26)
> (In reply to comment #25)
> > Ehsan, does this sound like a reasonable guess? If so, I'm going to have Juan's
> > machine shipped to me and try to add memory watchpoints in a kernel debugger.
>
> You've definitely convinced me of the likelihood of the register corruption!
> Thanks for the detailed analysis!
Any news? This (and possibly also bug 830531) is causing us to regenerate multiple release builds as a "workaround", so it is a priority for us.
Flags: needinfo?(benjamin)
Assignee
Updated•11 years ago
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INCOMPLETE