Closed Bug 865701 Opened 12 years ago Closed 11 years ago

crash in nsFrameManager::ReResolveStyleContext with AMD Radeon 6310/6320

Categories

(Core :: Layout, defect)

21 Branch
x86
Windows 7
defect
Not set
critical

Tracking

()

RESOLVED INCOMPLETE
Tracking Status
firefox21 + affected

People

(Reporter: scoobidiver, Assigned: benjamin)

References

Details

(Keywords: crash, regression)

Crash Data

It's already #1 top crasher in 21.0b4 which is not yet released.

Signature: mozilla::dom::DocumentBinding::CreateInterfaceObjects(JSContext*, JSObject*, JSObject**)
UUID: 1dcbafd5-842f-4903-8323-24c0e2130425
Date Processed: 2013-04-25 14:16:30
Uptime: 2643
Last Crash: 44.6 minutes before submission
Install Age: 16.8 hours since version was first installed.
Install Time: 2013-04-24 21:26:50
Product: Firefox
Version: 21.0
Build ID: 20130423212553
Release Channel: beta
OS: Windows NT
OS Version: 6.1.7601 Service Pack 1
Build Architecture: x86
Build Architecture Info: AuthenticAMD family 20 model 2 stepping 0
Crash Reason: EXCEPTION_ACCESS_VIOLATION_WRITE
Crash Address: 0x1d18
App Notes: AdapterVendorID: 0x1002, AdapterDeviceID: 0x9806, AdapterSubsysID: 00000000, AdapterDriverVersion: 6.1.7600.16385 D3D10 Layers? D3D10 Layers- D3D9 Layers? D3D9 Layers-
Processor Notes: sp-processor06.phx1.mozilla.com_22392:2012
EMCheckCompatibility: True
Adapter Vendor ID: 0x1002
Adapter Device ID: 0x9806
Total Virtual Memory: 2147352576
Available Virtual Memory: 1619656704
System Memory Use Percentage: 47
Available Page File: 2430001152
Available Physical Memory: 903467008

Frame Module Signature Source
0 xul.dll mozilla::dom::DocumentBinding::CreateInterfaceObjects obj-firefox/dom/bindings/DocumentBinding.cpp:7308
1 xul.dll NS_NewStyleContext layout/style/nsStyleContext.cpp:723
2 xul.dll nsStyleSet::GetContext layout/style/nsStyleSet.cpp:776
3 xul.dll nsFrameManager::ReResolveStyleContext layout/base/nsFrameManager.cpp:1219
4 xul.dll nsFrameManager::ReResolveStyleContext layout/base/nsFrameManager.cpp:1604
5 xul.dll nsFrameManager::ComputeStyleChangeFor layout/base/nsFrameManager.cpp:1697
6 xul.dll nsCSSFrameConstructor::RestyleElement layout/base/nsCSSFrameConstructor.cpp:8442
7 xul.dll mozilla::css::RestyleTracker::DoProcessRestyles layout/base/RestyleTracker.cpp:209
8 xul.dll PresShell::FlushPendingNotifications layout/base/nsPresShell.cpp:3880
9 xul.dll nsRefreshDriver::Tick layout/base/nsRefreshDriver.cpp:898
10 xul.dll mozilla::RefreshDriverTimer::Tick layout/base/nsRefreshDriver.cpp:156
11 xul.dll nsTimerImpl::Fire xpcom/threads/nsTimerImpl.cpp:498
12 nspr4.dll nspr4.dll@0x8d70
13 xul.dll nsTimerEvent::Run xpcom/threads/nsTimerImpl.cpp:589
14 xul.dll nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:627
15 ntdll.dll EtwEventEnabled
16 nspr4.dll PR_Lock nsprpub/pr/src/threads/combined/prulock.c:201
17 nspr4.dll PR_Unlock nsprpub/pr/src/threads/combined/prulock.c:315
18 xul.dll mozilla::Mutex::Unlock obj-firefox/dist/include/mozilla/Mutex.h:83
19 xul.dll NS_ProcessNextEvent_P obj-firefox/xpcom/build/nsThreadUtils.cpp:238
20 xul.dll mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:117
21 xul.dll MessageLoop::RunHandler ipc/chromium/src/base/message_loop.cc:208
22 xul.dll _SEH_epilog4
23 xul.dll MessageLoop::Run ipc/chromium/src/base/message_loop.cc:182
24 xul.dll nsBaseAppShell::Run widget/xpwidgets/nsBaseAppShell.cpp:163
25 xul.dll nsAppShell::Run widget/windows/nsAppShell.cpp:154
26 xul.dll XREMain::XRE_mainRun toolkit/xre/nsAppRunner.cpp:3871
27 mozalloc.dll mozalloc.dll@0x10a0
28 @0x17024e0

More reports at: https://crash-stats.mozilla.com/report/list?signature=mozilla%3A%3Adom%3A%3ADocumentBinding%3A%3ACreateInterfaceObjects%28JSContext*%2C+JSObject*%2C+JSObject**%29
Keywords: qawanted
Setting QA Contact to Juan since he has a netbook with the AMD 6310 GPU.
QA Contact: jbecerra
I have not been able to reproduce this crash exercising functionality in live.com and yahoo.com accounts or by visiting some of the other URLs associated with this crash and browsing around those sites. You can access the machine through VNC in the MV office at 10.250.6.86 if you want to give it a try.
I'm wondering if there is any difference between the AMD E350/450 and Radeon 6310/6320 on a Desktop platform vs a Laptop platform. I know that sometimes the same CPU/GPU will have slight technical differences depending on the intended platform. Would anyone be able to comment on this theory? Maybe someone from AMD?
(In reply to Anthony Hughes, Mozilla QA (:ashughes) from comment #3)
> I'm wondering if there is any difference between the AMD E350/450 and Radeon
> 6310/6320 on a Desktop platform vs a Laptop platform. I know that sometimes
> the same CPU/GPU will have slight technical differences depending on the
> intended platform. Would anyone be able to comment on this theory? Maybe
> someone from AMD?
The reason I ask is to find out whether it's worth the time and money to invest in a desktop computer using this hardware, given that Juan has been unable to reproduce the crash thus far on a netbook platform.
Crash Signature: [@ mozilla::dom::DocumentBinding::CreateInterfaceObjects(JSContext*, JSObject*, JSObject**)] → [@ mozilla::dom::DocumentBinding::CreateInterfaceObjects(JSContext*, JSObject*, JSObject**)] [@ JSCompartment::getNewType(JSContext*, js::Class*, js::TaggedProto, JSFunction*) ] [@ JS_GetCompartmentPrincipals(JSCompartment*) ] [@ nsStyleSet::ReparentSt…
21.0b1 seems as crashy as 19.0 was (see bug 830531).
Crash Signature: , JSFunction*) ] [@ JS_GetCompartmentPrincipals(JSCompartment*) ] [@ nsStyleSet::ReparentStyleContext(nsStyleContext*, nsStyleContext*, mozilla::dom::Element*) ] → , JSFunction*) ] [@ JS_GetCompartmentPrincipals(JSCompartment*) ] [@ nsStyleSet::ReparentStyleContext(nsStyleContext*, nsStyleContext*, mozilla::dom::Element*) ] [@ nsFrameManager::ReResolveStyleContext(nsPresContext*, nsIFrame*, nsIContent*, nsStyleCh…
> 21.0b1 seems [...] I meant 21.0b4.
Crash Signature: , unsigned int) ] [@ nsStyleSet::ResolveAnonymousBoxStyle(nsIAtom*, nsStyleContext*) ] → , unsigned int) ] [@ nsStyleSet::ResolveAnonymousBoxStyle(nsIAtom*, nsStyleContext*) ] [@ nsCSSFrameConstructor::AddFrameConstructionItems(nsFrameConstructorState&, nsIContent*, bool, nsIFrame*, nsCSSFrameConstructor::FrameConstructionItemList&) ] [@ n…
Checking the latest URLs from the crash reports, I see https://www.facebook.com/ as a top hit on which users seem to crash. Benjamin, please let us know if there are any more interesting correlations in the data yet that can help QA here.
I have a full dump of this crash (signature of JSCompartment::getNewType) from Juan's QA computer. I will be examining it thoroughly to check the memory corruption.
dbaron or anyone else, could you construct a web page that calls NS_NewStyleContext in as tight a loop as we can manage? I'm trying to make this as reproducible as possible.
Juan or anyone with access to that machine, could you play around with running other programs at the same time as Firefox to see if any other programs cause this crash to happen more regularly? Graphics-intensive programs in particular may make this easier to reproduce.
Flags: needinfo?(jbecerra)
Flags: needinfo?(dbaron)
Let's use in the summary the first frame shared by every stack trace.
Crash Signature: , nsCSSFrameConstructor::FrameConstructionItemList&) ] [@ nsStyleContext::nsStyleContext(nsStyleContext*, nsIAtom*, nsCSSPseudoElements::Type, nsRuleNode*) ] → , nsCSSFrameConstructor::FrameConstructionItemList&) ] [@ nsStyleContext::nsStyleContext(nsStyleContext*, nsIAtom*, nsCSSPseudoElements::Type, nsRuleNode*) ] [@ nsStyleSet::ProbePseudoElementStyle(mozilla::dom::Element*, nsCSSPseudoElements::Type, nsSty…
Summary: crash in mozilla::dom::DocumentBinding::CreateInterfaceObjects with AMD Radeon 6310/6320 → crash in nsFrameManager::ReResolveStyleContext with AMD Radeon 6310/6320
This crash seems to have spiked as of today. Adding needinfo on Kairo to get some data around # of unique affected users. CCing Ted and Tracy in case they can help out earlier here. The basic query to be used here is similar to https://bugzilla.mozilla.org/show_bug.cgi?id=830531#c25, taking the right signature into account.
Flags: needinfo?(kairo)
I haven't seen these crashes or equivalent ones in 17.0b5 not yet released.
(In reply to Scoobidiver from comment #13)
> I haven't seen these crashes or equivalent ones in 17.0b5 not yet released.
I think you mean 21.0b5 here?
(In reply to bhavana bajaj [:bajaj] from comment #14)
> (In reply to Scoobidiver from comment #13)
> > I haven't seen these crashes or equivalent ones in 17.0b5 not yet released.
> I think you mean 21.0b5 here ?
Yes. My bad.
Using 21.0b5 I let the machine run over the weekend with the same sorts of tabs and video playlists playing, but it didn't crash this weekend. Before, it used to crash a few hours into the video playlist. I will now go back to using 21.0b4 to address the request in comment #10.
Flags: needinfo?(jbecerra)
(In reply to Benjamin Smedberg [:bsmedberg] from comment #10)
> dbaron or anyone else, could you construct a web page that calls
> NS_NewStyleContext in as tight a loop as we can manage? I'm trying to make
> this as reproducible as possible.
>
> Juan or anyone with access to that machine, could you play around with
> running other programs at the same time as Firefox to see if any other
> programs cause this crash to happen more regularly? Graphics-intensive
> programs in particular may make this easier to reproduce.
Running multiple programs at the same time seems to make the crash happen sooner. I'll keep trying a few more times, but this last time it took maybe a half hour or so for it to crash on 21.0b4 with several applications open.
(In reply to bhavana bajaj [:bajaj] from comment #12)
> This crash seems to have spiked as of today, Adding needsinfo on Kairo to
> get some data around # of unique affected users.
As the vast majority of the crashes is from one signature, I'll give you the installations overview of that one (this is for yesterday only):

breakpad=> SELECT version,COUNT(*) as crashes,COUNT(DISTINCT client_crash_date - install_age * interval '1 second') as installations FROM reports WHERE product='Firefox' AND signature LIKE 'mozilla::dom::DocumentBinding::CreateInterfaceObjects%' AND utc_day_is(date_processed, '2013-04-28') GROUP BY version;
 version | crashes | installations
---------+---------+---------------
 21.0    |   18377 |          7460
(1 row)
Flags: needinfo?(kairo)
For multiple days the ratio of crashes/installation becomes even higher.

breakpad=> SELECT version,COUNT(*) as crashes,COUNT(DISTINCT client_crash_date - install_age * interval '1 second') as installations FROM reports WHERE product='Firefox' AND signature LIKE 'mozilla::dom::DocumentBinding::CreateInterfaceObjects%' AND date_processed BETWEEN '2013-04-24' AND '2013-04-29' GROUP BY version;
 version | crashes | installations
---------+---------+---------------
 21.0    |   44123 |         12495
(1 row)
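To make the "ratio becomes even higher" remark concrete, here is a minimal sketch that recomputes crashes per installation from the two result rows above. The dictionary labels and the helper name are illustrative only; the four numbers are the ones reported by the queries.

```python
# Figures copied from the two breakpad query results quoted above:
# (crashes, distinct installations) for Firefox 21.0.
samples = {
    "2013-04-28 only": (18377, 7460),
    "2013-04-24 to 2013-04-29": (44123, 12495),
}


def crashes_per_installation(crashes, installations):
    """Average number of crash reports submitted per distinct installation."""
    return crashes / installations


ratios = {
    label: crashes_per_installation(crashes, installations)
    for label, (crashes, installations) in samples.items()
}

for label, ratio in ratios.items():
    print(f"{label}: {ratio:.2f} crashes per installation")
```

The single-day figure works out to roughly 2.5 crashes per installation and the multi-day figure to roughly 3.5, which suggests the same installations crash repeatedly rather than the crash spreading evenly across users.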
Unfortunately I haven't been able to reproduce the problem reliably and within a short period of time. This morning I was able to crash within the first five minutes, but after that I wasn't able to get it to reproduce within a short time. Whenever I have been able to reproduce the problem it has been while playing a long playlist of YouTube videos and trying to send email using outlook.com. Sometimes it just takes time for this to happen while the videos are playing. In addition, I get a system dialog saying the plugin container has stopped working.
Because of the nature of the memory corruption here, it's going to exist in a bunch of different signatures:

40335 (3.1): mozilla::dom::DocumentBinding::CreateInterfaceObjects(JSContext*, JSObject*, JSObject**)
10543 (8.0): JSCompartment::getNewType(JSContext*, js::Class*, js::TaggedProto, JSFunction*)
3623 (2.0): nsStyleSet::ReparentStyleContext(nsStyleContext*, nsStyleContext*, mozilla::dom::Element*)
3196 (1.0): nsFrameManager::ReResolveStyleContext(nsPresContext*, nsIFrame*, nsIContent*, nsStyleChangeList*, nsChangeHint, nsChangeHint, nsRestyleHint, mozilla::css::RestyleTracker&, nsFrameManager::DesiredA11yNotifications, nsTArray<nsIContent*>&, TreeMatchConte...
2454 (4.9): JS_GetCompartmentPrincipals(JSCompartment*)
1968 (2.1): nsStyleSet::ResolveStyleFor(mozilla::dom::Element*, nsStyleContext*, TreeMatchContext&)
822 (7.9): js::detail::HashTable<js::AtomStateEntry const, js::HashSet<js::AtomStateEntry, js::AtomHasher, js::SystemAllocPolicy>::SetOps, js::SystemAllocPolicy>::lookup(js::AtomHasher::Lookup const&, unsigned int, unsigned int)
521 (3.5): nsStyleContext::AddChild(nsStyleContext*)
494 (2.0): nsStyleSet::ResolveAnonymousBoxStyle(nsIAtom*, nsStyleContext*)
124 (5.3): js::NewObjectWithGivenProto(JSContext*, js::Class*, js::TaggedProto, JSObject*, js::gc::AllocKind, js::NewObjectKind)
80 (3.1): nsStyleContext::nsStyleContext(nsStyleContext*, nsIAtom*, nsCSSPseudoElements::Type, nsRuleNode*)
71 (2.5): nsStyleSet::ProbePseudoElementStyle(mozilla::dom::Element*, nsCSSPseudoElements::Type, nsStyleContext*, TreeMatchContext&)
63 (2.3): nsStyleSet::GetContext(nsStyleContext*, nsRuleNode*, nsRuleNode*, nsIAtom*, nsCSSPseudoElements::Type, mozilla::dom::Element*, unsigned int)
42 (5.2): mozilla::Preferences::AddBoolVarCache(bool*, char const*, bool)
39 (5.3): SelectorMatches
30 (1.8): firefox.exe@0x10203
29 (1.0): nsStyleContext::CalcStyleDifference(nsStyleContext*, nsChangeHint)
28 (1.8): firefox.exe@0x60203
27 (4.0): RuleHash::EnumerateAllRules(mozilla::dom::Element*, ElementDependentRuleProcessorData*, NodeMatchContext&)
27 (2.2): nsStyleSet::ResolveStyleByAddingRules(nsStyleContext*, nsCOMArray<nsIStyleRule> const&)
25 (3.9): nsRuleNode::WalkRuleTree(nsStyleStructID, nsStyleContext*)

The first number on each line is the crash count for that signature, and the number in parentheses is the average depth of ReResolveStyleContext in the stack.

I'm working on some scripts to scan the stack memory to see whether all these different crashes have the callstack ReResolveStyleContext -> NS_NewStyleContext -> nsStyleContext::nsStyleContext -> nsStyleContext::AddChild -> weeds or not.

I ran a small sample of the minidumps through a memory checker and I haven't encountered any corrupted .text memory yet. On IRC we were speculating that perhaps an interrupt handler is clobbering some register by accident, so I'm going to focus my investigation on whether there is a particular register clobber which might produce the varied effects seen in this bug.
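The check described above (does every crash contain the chain ReResolveStyleContext -> NS_NewStyleContext -> nsStyleContext::nsStyleContext -> nsStyleContext::AddChild?) can be sketched as an ordered-subsequence test. This is not the script used in the investigation; `has_call_chain` and both sample stacks are hypothetical, and the sketch assumes the symbols have already been recovered from stack memory as plain strings, innermost frame first.

```python
# Outermost caller first, as written in the comment above.
CHAIN = [
    "nsFrameManager::ReResolveStyleContext",
    "NS_NewStyleContext",
    "nsStyleContext::nsStyleContext",
    "nsStyleContext::AddChild",
]


def has_call_chain(frames, chain=CHAIN):
    """Return True if every name in `chain` appears in `frames` in call order.

    `frames` lists symbols innermost-first, the way crash reports do, so it
    is walked in reverse. Unrelated frames may sit between chain entries.
    Names are matched as substrings so full signatures with argument lists
    also match.
    """
    callers_first = iter(reversed(frames))
    # Each any() resumes from where the previous match stopped, which
    # enforces ordering without requiring the frames to be adjacent.
    return all(any(name in frame for frame in callers_first) for name in chain)


# Hypothetical symbols recovered by scanning stack memory of one minidump.
scanned_stack = [
    "nsStyleContext::AddChild",
    "nsStyleContext::nsStyleContext",
    "NS_NewStyleContext",
    "nsStyleSet::GetContext",
    "nsFrameManager::ReResolveStyleContext",
    "nsFrameManager::ComputeStyleChangeFor",
]

# Top frames as shown in the processed report at the top of this bug,
# where the constructor and AddChild do not appear.
reported_stack = [
    "mozilla::dom::DocumentBinding::CreateInterfaceObjects",
    "NS_NewStyleContext",
    "nsStyleSet::GetContext",
    "nsFrameManager::ReResolveStyleContext",
]

print(has_call_chain(scanned_stack))
print(has_call_chain(reported_stack))
```

The second stack fails the test because the processed report omits the two innermost frames, which is why the comment talks about scanning raw stack memory rather than relying on the reported frames.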
Assignee: nobody → benjamin
probably something like:

<div id="d">
  <div></div>
  <!-- repeat to get a decent number of children -->
</div>
<script>
var d = document.getElementById("d");
var cs = getComputedStyle(d, "");
while (true) { /* or break it up to avoid the slow script dialog */
  d.style.transform = 'translate(1px)';
  cs.color; /* flush style */
  d.style.transform = 'translate(2px)';
  cs.color; /* flush style */
}
</script>
Flags: needinfo?(dbaron)
(In reply to comment #21)
> On IRC we were speculating that perhaps an interrupt handler is clobbering some
> register by accident, so I'm going focus my investigation on whether there is a
> particular register clobber which might produce the varied effects seen in this
> bug.
I think the most likely explanation is something corrupting the heap, causing the second mov instruction in the assembly code you showed me the other day to read a zero value into edx.
I have trouble believing that heap corruption could explain why this crash hits just this function-tree and not others and has the other characteristics (varying volume per beta even on the same cset).

More data from running dumplookup on a sample of crashes with "ReResolveStyleContext" somewhere near the top of the stack:

nsStyleContext::nsStyleContext is present as a return address on the stack in 98% of the crashes, and I think the others are unrelated to this.

Of the crashes which had nsStyleContext::nsStyleContext as a return location, most of them are returning to http://hg.mozilla.org/releases/mozilla-beta/annotate/04aba2e6927f/layout/style/nsStyleContext.cpp#l70 which means that they called nsStyleContext::AddChild and crashed there.

https://crash-stats.mozilla.com/report/index/f46d376f-43e4-428a-96fd-0f15a2130430

In this case, AddChild has returned successfully and we have executed another two instructions: http://pastebin.mozilla.org/2379752 (crash is at line 78 dereferencing eax+0x24 which should be this->mRuleNode).

Registers:
$EAX=0xa000c7de
$EBX=0x5bc7d765
$ECX=0x18e3dc61

Also:
* $EBX is supposed to be `this`, but it's pointing at code within nsStyleContext::AddChild and cannot be a valid heap address.
* $EAX is the correct value of *($EBX+0xc)
* $ECX is a heap address and we just finished a MOV ECX,EBX above. But it's an odd number. According to my read of the stack memory, the actual value of `this` should be 0x18e3dc78 *(return address + 4).
* http://pastebin.mozilla.org/2379837 is the disassembly of nsStyleContext::AddChild. It only modifies EAX and EDX and never changes EBX or ECX or makes any calls.
* So $EBX should really be identical to $ECX when we hit this crash.
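The register observations above can be restated as a few mechanical checks. This is only an illustrative sketch: the register values and the expected `this` pointer are the ones quoted in the comment, while the 4-byte alignment requirement is an assumption about how such objects are allocated on 32-bit x86, not something stated in this bug.

```python
# Assumption: heap-allocated objects are at least 4-byte aligned on x86-32.
ASSUMED_ALIGNMENT = 4

# Values quoted in the comment above.
registers = {"EAX": 0xA000C7DE, "EBX": 0x5BC7D765, "ECX": 0x18E3DC61}
expected_this = 0x18E3DC78  # `this` as read back from stack memory


def is_aligned(address, alignment=ASSUMED_ALIGNMENT):
    """True if `address` is a multiple of `alignment`."""
    return address % alignment == 0


# After MOV ECX,EBX the two registers should hold the same value.
ebx_matches_ecx = registers["EBX"] == registers["ECX"]
# An odd value cannot be a normally aligned object pointer.
ecx_aligned = is_aligned(registers["ECX"])
this_aligned = is_aligned(expected_this)
# How far ECX is from the value the stack says `this` should have.
ecx_offset = expected_this - registers["ECX"]

print("EBX == ECX:", ebx_matches_ecx)
print("ECX aligned:", ecx_aligned)
print("expected this aligned:", this_aligned)
print("expected this - ECX:", hex(ecx_offset))
```

Under these assumptions the checks agree with the comment: EBX and ECX differ even though one was just copied from the other, ECX is misaligned, and the pointer recovered from the stack is aligned and sits 0x17 bytes above ECX.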
More details from https://crash-stats.mozilla.com/report/index/c3abc1c8-fd11-4db6-940e-dba802130425 which is an EXCEPTION_ILLEGAL_INSTRUCTION:

stack memory shows:
*ESP: nsStyleContext::AddChild + 2 bytes
*(ESP + 4): nsStyleContext::nsStyleContext[70] (returning from AddChild)
*(ESP + 8): EDI saved by nsStyleContext::nsStyleContext
*(ESP + 12): ESI "
*(ESP + 16): EBX "
*(ESP + 20): ECX "
*(ESP + 24): return to NS_NewStyleContext from nsStyleContext::nsStyleContext

The minidump can't show memory corruption in nsStyleContext::AddChild, but I'm still betting that the first two bytes of that function are being corrupted into a 2-byte jump instruction which happens to end up in CreateInterfaceObjects. The exact offset may vary.

Ehsan, does this sound like a reasonable guess? If so, I'm going to have Juan's machine shipped to me and try to add memory watchpoints in a kernel debugger.
(In reply to comment #25)
> Ehsan, does this sound like a reasonable guess? If so, I'm going to have Juan's
> machine shipped to me and try to add memory watchpoints in a kernel debugger.
You've definitely convinced me of the likelihood of the register corruption! Thanks for the detailed analysis!
Removing the topcrash keyword because: 1. it no longer happens in 21.0b5 and above 2. bug 772330 comment 19
Keywords: topcrash
Since this is no longer a top crash I'm removing qawanted. QA will continue to dogfood on our AMD netbooks periodically with Beta and RCs.
Keywords: qawanted
(In reply to Robert Kaiser (:kairo@mozilla.com) [away until early June] from comment #18)
> (In reply to bhavana bajaj [:bajaj] from comment #12)
> > This crash seems to have spiked as of today, Adding needsinfo on Kairo to
> > get some data around # of unique affected users.
>
> As the vast majority of the crashes is from one signature, I'll give you the
> installations overview of that one (this is for yesterday only):
>
> breakpad=> SELECT version,COUNT(*) as crashes,COUNT(DISTINCT
> client_crash_date - install_age * interval '1 second') as installations
> FROM reports WHERE product='Firefox' AND signature LIKE
> 'mozilla::dom::DocumentBinding::CreateInterfaceObjects%' AND
> utc_day_is(date_processed, '2013-04-28') GROUP BY version;
>  version | crashes | installations
> ---------+---------+---------------
>  21.0    |   18377 |          7460
> (1 row)
For comparison, we had 1,583,848 ADI on 21.0b4 on that day.

(In reply to Robert Kaiser (:kairo@mozilla.com) [away until early June] from comment #19)
> For multiple days the ratio of crashes/installation becomes even higher.
>
> breakpad=> SELECT version,COUNT(*) as crashes,COUNT(DISTINCT
> client_crash_date - install_age * interval '1 second') as installations
> FROM reports WHERE product='Firefox' AND signature LIKE
> 'mozilla::dom::DocumentBinding::CreateInterfaceObjects%' AND date_processed
> BETWEEN '2013-04-24' AND '2013-04-29' GROUP BY version;
>  version | crashes | installations
> ---------+---------+---------------
>  21.0    |   44123 |         12495
> (1 row)
We had 5,873,863 ADI pings on 21.0b4 over those days.
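One way to read the ADI comparison is as crashes per 100 ADI for this single signature. The sketch below uses only the crash counts and ADI figures quoted in the comment above; the variable and function names are illustrative, and treating the multi-day ADI ping total as the denominator for the multi-day crash count is an assumption.

```python
# (crashes for this signature, ADI) as quoted in the comment above.
one_day = (18377, 1_583_848)      # 2013-04-28, 21.0b4
multi_day = (44123, 5_873_863)    # 2013-04-24 to 2013-04-29, 21.0b4


def crashes_per_100_adi(crashes, adi):
    """Crash reports for this signature per 100 ADI."""
    return 100 * crashes / adi


rate_one_day = crashes_per_100_adi(*one_day)
rate_multi_day = crashes_per_100_adi(*multi_day)

print(f"2013-04-28: {rate_one_day:.2f} crashes per 100 ADI")
print(f"2013-04-24 to 2013-04-29: {rate_multi_day:.2f} crashes per 100 ADI")
```

On these figures the single day comes out a little above 1 crash per 100 ADI and the multi-day window somewhat below that, consistent with the earlier remark that the crash spiked on that day.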
ping?
(In reply to :Ehsan Akhgari (needinfo? me!) from comment #26)
> (In reply to comment #25)
> > Ehsan, does this sound like a reasonable guess? If so, I'm going to have Juan's
> > machine shipped to me and try to add memory watchpoints in a kernel debugger.
>
> You've definitely convinced me of the likelihood of the register corruption!
> Thanks for the detailed analysis!
Any news? This (and possibly also bug 830531) is causing us to regenerate multiple release builds as a "workaround", so it is a priority for us.
Flags: needinfo?(benjamin)
No news yet.
Flags: needinfo?(benjamin)
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → INCOMPLETE