Closed Bug 1011391 Opened 11 years ago Closed 10 years ago

crash in RuleHash_ClassTable_GetKey referencing address 0x5a5a5a5a or 0x5a5a5a5a5a5a5a5a

Categories

(Core :: CSS Parsing and Computation, defect)

32 Branch
defect
Not set
critical

Tracking

()

VERIFIED FIXED
mozilla33
Tracking Status
firefox31 --- unaffected
firefox32 + verified
firefox33 --- verified

People

(Reporter: cosmin-malutan, Assigned: mccr8)

References

Details

(Keywords: crash, topcrash)

Crash Data

Attachments

(3 files)

This bug was filed from the Socorro interface and is report bp-ea140cff-9bc6-46b1-afd8-f53e62140515. ============================================================= https://crash-stats.mozilla.com/report/index/ea140cff-9bc6-46b1-afd8-f53e62140515 This crashed with Nightly en-US on Mac 10.8 (mm-osx-108-4) in test: http://hg.mozilla.org/qa/mozmill-tests/file/7ac564a10cbc/firefox/tests/remote/testSecurity/testSafeBrowsingNotificationBar.js
Mostly on Windows 7 and 8.1. Started appearing on nigthly 32a1 around 5/2, and it is currently a top crasher, #5, in the list of crashers for nightly. The signature has been around for a while, and other versions are affected, but in other versions it hasn't figured on top of the list. There are very few comments, and not much to go by other than what's been reported in comment #0. More reports here: https://crash-stats.mozilla.com/report/list?product=Firefox&signature=RuleHash_ClassTable_GetKey 0 xul.dll RuleHash_ClassTable_GetKey layout/style/nsCSSRuleProcessor.cpp 1 xul.dll RuleHash_CSMatchEntry layout/style/nsCSSRuleProcessor.cpp 2 xul.dll SearchTable xpcom/glue/pldhash.cpp 3 xul.dll PL_DHashTableOperate(PLDHashTable *,void const *,PLDHashOperator) xpcom/glue/pldhash.cpp 4 xul.dll RuleHash::EnumerateAllRules(mozilla::dom::Element *,ElementDependentRuleProcessorData *,NodeMatchContext &) layout/style/nsCSSRuleProcessor.cpp 5 xul.dll xul.dll@0xf6324c 6 xul.dll nsCSSRuleProcessor::RulesMatching(ElementRuleProcessorData *) layout/style/nsCSSRuleProcessor.cpp 7 xul.dll mozilla::css::SetStyleSheetReference layout/style/nsCSSRules.cpp 8 xul.dll nsStyleSet::FileRules(bool (*)(nsIStyleRuleProcessor *,void *),RuleProcessorData *,mozilla::dom::Element *,nsRuleWalker *) layout/style/nsStyleSet.cpp 9 mozglue.dll arena_malloc_small memory/mozjemalloc/jemalloc.c 10 xul.dll nsStyleContext::nsStyleContext(nsStyleContext *,nsIAtom *,nsCSSPseudoElements::Type,nsRuleNode *,bool) layout/style/nsStyleContext.cpp 11 mozglue.dll arena_malloc memory/mozjemalloc/jemalloc.c 12 xul.dll nsStyleSet::ResolveStyleFor(mozilla::dom::Element *,nsStyleContext *,TreeMatchContext &) layout/style/nsStyleSet.cpp 13 xul.dll nsCSSFrameConstructor::FindDisplayData(nsStyleDisplay const *,mozilla::dom::Element *,nsIFrame *,nsStyleContext *) layout/base/nsCSSFrameConstructor.cpp 14 xul.dll nsCSSFrameConstructor::FrameConstructionItemList::AppendItem(nsCSSFrameConstructor::FrameConstructionData const *,nsIContent *,nsIAtom *,int,PendingBinding *,already_AddRefed<nsStyleContext> &&,bool,nsTArray<nsIAnonymousContentCreator::ContentInfo> *) layout/base/nsCSSFrameConstructor.h 15 xul.dll nsCSSFrameConstructor::ResolveStyleContext(nsStyleContext *,nsIContent *,nsFrameConstructorState *) layout/base/nsCSSFrameConstructor.cpp 16 xul.dll nsCSSFrameConstructor::ResolveStyleContext(nsIFrame *,nsIContent *,nsFrameConstructorState *) layout/base/nsCSSFrameConstructor.cpp 17 xul.dll nsCSSFrameConstructor::AddFrameConstructionItems(nsFrameConstructorState &,nsIContent *,bool,nsIFrame *,nsCSSFrameConstructor::FrameConstructionItemList &) layout/base/nsCSSFrameConstructor.cpp 18 xul.dll mozilla::dom::ExplicitChildIterator::GetNextChild() content/base/src/ChildIterator.cpp 19 xul.dll nsCSSFrameConstructor::ProcessChildren(nsFrameConstructorState &,nsIContent *,nsStyleContext *,nsIFrame *,bool,nsFrameItems &,bool,PendingBinding *,nsIFrame *) layout/base/nsCSSFrameConstructor.cpp 20 xul.dll SearchTable xpcom/glue/pldhash.cpp 21 xul.dll PL_DHashTableOperate(PLDHashTable *,void const *,PLDHashOperator) xpcom/glue/pldhash.cpp 22 xul.dll xul.dll@0x1079e00 23 xul.dll BRFrame::AccessibleType() layout/generic/nsBRFrame.cpp 24 xul.dll nsFrame::Init(nsIContent *,nsIFrame *,nsIFrame *) layout/generic/nsFrame.cpp 25 xul.dll nsFrameManager::RestoreFrameStateFor(nsIFrame *,nsILayoutHistoryState *) layout/base/nsFrameManager.cpp 26 xul.dll nsFrameItems::AddChild(nsIFrame *) layout/base/nsCSSFrameConstructor.cpp 27 xul.dll nsCSSFrameConstructor::ConstructBlock(nsFrameConstructorState &,nsStyleDisplay const *,nsIContent *,nsIFrame *,nsIFrame *,nsStyleContext *,nsIFrame * *,nsFrameItems &,nsIFrame *,PendingBinding *) layout/base/nsCSSFrameConstructor.cpp 28 xul.dll nsPresArena::Allocate(unsigned int,unsigned __int64) layout/base/nsPresArena.cpp 29 xul.dll BRFrame::AccessibleType() layout/generic/nsBRFrame.cpp 30 xul.dll nsCSSFrameConstructor::ConstructNonScrollableBlock(nsFrameConstructorState &,nsCSSFrameConstructor::FrameConstructionItem &,nsIFrame *,nsStyleDisplay const *,nsFrameItems &) layout/base/nsCSSFrameConstructor.cpp 31 xul.dll TreeMatchContext::AutoAncestorPusher::PushAncestorAndStyleScope(nsIContent *) layout/style/nsRuleProcessorData.h 32 xul.dll nsCSSFrameConstructor::ConstructFrameFromItemInternal(nsCSSFrameConstructor::FrameConstructionItem &,nsFrameConstructorState &,nsIFrame *,nsFrameItems &) layout/base/nsCSSFrameConstructor.cpp 33 xul.dll nsStyleContext::nsStyleContext(nsStyleContext *,nsIAtom *,nsCSSPseudoElements::Type,nsRuleNode *,bool) layout/style/nsStyleContext.cpp
This seems to be a null deref, and the claim is that this is nsCOMPtr::get. I don't see anything obviously wrong in RuleHash_ClassTable_GetKey, though... Cameron, any ideas?
Flags: needinfo?(cam)
Argh. You had this in a debugger? If you can do that again, what is entry->mRules[0].mSelector->mClassList? We shouldn't be ending up in the class table if we have no classes, obviously...
(In reply to Boris Zbarsky [:bz] from comment #4) > Argh. You had this in a debugger? If you can do that again, what is > entry->mRules[0].mSelector->mClassList? We shouldn't be ending up in the > class table if we have no classes, obviously... I downloaded the minidump for the one reported in bug 1011119. My memory was incorrect. In that one, mClassList was 0x5a5a5a5a. Is your question whether or not we took the IsPseudoElement() branch? Because if we did not, then selector is entry->mRules[0].mSelector. The disassembly is: 650C7510 push ebp 650C7511 mov ebp,esp 650C7513 mov eax,dword ptr [hdr] 650C7516 mov ecx,dword ptr [eax+4] 650C7519 mov eax,dword ptr [ecx+0Ch] 650C751C cmp dword ptr [eax],0 650C751F jne RuleHash_ClassTable_GetKey+18h (650C7528h) 650C7521 mov eax,dword ptr [eax+0Ch] 650C7524 mov eax,dword ptr [eax] 650C7526 pop ebp 650C7527 ret 650C7528 cmp dword ptr [eax+4],0 650C752C jne RuleHash_ClassTable_GetKey+28h (650C7538h) 650C752E mov eax,dword ptr [eax+1Ch] 650C7531 mov edx,dword ptr [eax+0Ch] 650C7534 mov eax,dword ptr [edx] 650C7536 pop ebp 650C7537 ret 650C7538 mov ecx,dword ptr [eax+0Ch] 650C753B mov eax,dword ptr [ecx] 650C753D pop ebp 650C753E ret We are dying on 53B. The assembly is relatively easy to read. On 519 we store entry->mRules[0].mSelector or 'selector' into EAX. Then IsPsuedoElement[0] is inlined. On 51C we test to see if we have a non-null mLowerCasedTag. If we do not, we take an early return. If we do, we jump to 528, where we test to see if we have a null mCasedTag. If we do, we advance to mNext and return the class list's atom. Otherwise, we jump down to 538 where we grab the class list and return the atom. Since we died on 53B we can infer that we have both mLowerCaseTag and mCasedTag and that we did not take the pseudo element branch. [0] http://mxr.mozilla.org/mozilla-central/source/layout/style/StyleRule.h#150
> In that one, mClassList was 0x5a5a5a5a. OK. That's dead stuff. :( > Is your question whether or not we took the IsPseudoElement() branch? That was one of the things I wanted to know, yes. Alright, so the real issue is that we ended up with a dead object in the hashtable somehow...
If this is a noticeable crash in nightly, can we get a one-day regression range out of crash stats?
This crash reproduced today on Max OSX 10.9 when running our mozmill tests. https://crash-stats.mozilla.com/report/index/63056538-0308-4308-b606-9c6a62140520 Test that failed is the same as in Comment 1 : firefox/tests/remote/testSecurity/testSafeBrowsingNotificationBar.js
Blocks: 1013603
(In reply to David Baron [:dbaron] (needinfo? me) (UTC-7) from comment #7) > If this is a noticeable crash in nightly, can we get a one-day regression > range out of crash stats? These started with build 20140502030202.
Hrm. So looking through that for things that might conceivably lead to corrupted style system memory: 1) Enabling of incremental cycle collection by default. 2) Bug 1001966 -- changes to the CC macros 3) DataStore on workers (on the general "threads + memory == bad idea" assumption; this is a long shot). 4) Bug 999913 -- replace-malloc. But I thought this did nothing by default, right? 5) Bug 1002632 (image document refresh driver bits), but not likely: stacks are showing script running in a normal document. Anything else jumping out at people?
Flags: needinfo?(mh+mozilla)
Flags: needinfo?(continuation)
#3 should not actually be used in the wild. It's b2g only right now, afaik.
Gijs landed bug 1004431 "fix broken CSS in OSX's downloads.css", which I guess in theory could expose an existing bug, though I don't know if that's related to the style system in particular. But that seems unlikely. What are the CCed classes that might be involved here? I could audit them to see if they do anything weird CC-wise, like implement their own AddRef/Release, or do something funny in their Unlink that could leave dead stuff around.
Flags: needinfo?(continuation)
#4 didn't even make it to a nightly yet.
Flags: needinfo?(mh+mozilla)
> What are the CCed classes that might be involved here? Stylesheets and style rules, I guess (and style rules do some weird CC things, in fact). The actual things that are crashing are not CCed, so even the CC bit seemed like a bit of a long shot. :(
This is the #3 topcrasher for Firefox 32.0a1 with 849/9099 crashes in the last 7 days.
Flags: needinfo?(cam)
Once again, as with bug 1009675, most of these crashes aren't really null-dereferences. See bug 1018360 comment #0 for an explanation of how this can happen on the Intel 64-bit (aka amd64) CPU. In this bug's case, the true crash address seems to (mostly) be 0x5a5a5a5a (for 32-bit distros) and 0x5a5a5a5a5a5a5a5a (for 64-bit distros).
Summary: crash in RuleHash_ClassTable_GetKey → crash in RuleHash_ClassTable_GetKey referencing address 0x5a5a5a5a or 0x5a5a5a5a5a5a5a5a
What happens incrementally in incremental cycle collection?
Building the graph (MarkRoots()). ScanRoots() and CollectWhite() happen in a single chunk because they interact badly with weak references.
The first Nightly for ICC was 5/2 so the timing would match comment 0.
Blocks: 1023758
No longer blocks: 1013603
Crash Signature: [@ RuleHash_ClassTable_GetKey] → [@ RuleHash_ClassTable_GetKey][@ RuleHash_IdTable_GetKey]
Assignee: nobody → continuation
Blocks: 911246
I crashed bp-261febbd-8191-46b3-a310-7bcb72140616 EXCEPTION_ACCESS_VIOLATION_READ|0xffffffffffffffff|0
#3 on Nightly, Fx33
Attached file Archive.zip (unminimized testcase) (deleted) —
I seem to be hitting this crash pretty soon with this unminimized testcase. Just open the file 'parentframe3.htm' and make sure you have popup blocking disabled for local files.
> make sure you have popup blocking disabled for local files How do you do this? And what OS are you running on? I just tried your testcase (with today's m-c nightly) on OS X 10.8.5, and didn't see any problems.
I don't see any crashes on OS X 10.7.5 or 10.9.3 either. Before I tested I turned off popup blocking entirely (in Preferences : Content).
This is really just papering over the issue, because presumably the rule cascades for a sheet shouldn't be reachable if we're unlinking the sheet, but maybe it will change how this is failing, and tell us something. I measured locally and it didn't take any measurably amount of time when clearing TechCrunch. https://tbpl.mozilla.org/?tree=Try&rev=a8302af34a76
Attachment #8442347 - Flags: review?(bzbarsky)
Comment on attachment 8442347 [details] [diff] [review] Clear rule cascades in nsCSSStyleSheet's Unlink. r=me to try this out and see if crash frequency goes down.
Attachment #8442347 - Flags: review?(bzbarsky) → review+
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
We're not sure whether this is fixed....
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Target Milestone: mozilla33 → ---
Keywords: leave-open
I experience a crash with a signature referencing this bug, and an alternative signature: https://crash-stats.mozilla.com/report/index/3425700f-561c-42e6-b806-7cc532140619 This is reproducible in my Nightly with a low frequency by repeatedly opening and closing a tab with a news article [1], the crash appears to happen shortly after closing the tab. [1] http://www.svt.se/nyheter/regionalt/vasterbottensnytt/csn-detektiver-jobbar-fran-umea
I can also reproduce this crash in the latest trunk build: https://crash-stats.mozilla.com/report/index/c5b787b9-d782-485d-9e90-2d4e02140620
(In reply to Martijn Wargers [:mwargers] (QA) from comment #37) > I can also reproduce this crash in the latest trunk build: What did you use to reproduce it? The steps in comment 36? From that crash report, the crash is the same, except that it is a null deref now. Which is sort of better. But it will almost certainly be happening as much. (In reply to Lars Viklund from comment #36) > This is reproducible in my Nightly with a low frequency by repeatedly > opening and closing a tab with a news article [1], the crash appears to > happen shortly after closing the tab. Thanks for the link. Can you reproduce this in safe mode? If not, what addons do you have? I tried following your steps but it didn't crash.
Summary: crash in RuleHash_ClassTable_GetKey referencing address 0x5a5a5a5a or 0x5a5a5a5a5a5a5a5a → crash in RuleHash_ClassTable_GetKey referencing address 0x5a5a5a5a or 0x5a5a5a5a5a5a5a5a or 0x0
> https://crash-stats.mozilla.com/report/index/c5b787b9-d782-485d-9e90-2d4e02140620 > except that it is a null deref now It's *not* a null deref. It's a deref at 0x5a5a5a5a5a5a5a5a like all the others. I downloaded the minidump and checked. See comment #20.
Summary: crash in RuleHash_ClassTable_GetKey referencing address 0x5a5a5a5a or 0x5a5a5a5a5a5a5a5a or 0x0 → crash in RuleHash_ClassTable_GetKey referencing address 0x5a5a5a5a or 0x5a5a5a5a5a5a5a5a
> https://crash-stats.mozilla.com/report/index/c5b787b9-d782-485d-9e90-2d4e02140620 And by the way, Martijn, the build from this crash report doesn't yet have the patch from comment #31.
We don't need any more "me too" comments. We *do* need reliable STR for this bug.
I am away from the machine in comment 36 for vacation now, but it has Privacy Badger, Itsalltext, Greasemonkey. Tried permuting add-ons and safe mode but hit rate is too low to tell by hand as I fail to trigger it right.
What are you reproducing the crash with? The test case you previously attached? Do I need to install some addon in addition, because I get a lot of errors about universal XPConnect when I run the test case as is.
Flags: needinfo?(martijn.martijn)
Attached file crash1.htm (deleted) —
Ok, this is a (as good as) minimal testcase that is still showing this crash for me: https://crash-stats.mozilla.com/report/index/c274bea0-4956-4c25-93f5-b751b2140625 The testcase uses popups, so you need to allow popups for this testcase.
Flags: needinfo?(martijn.martijn)
The testcase doesn't work here (attached to the bug), because it doesn't like mixed content, so you would need to enable that or just open the testcase locally. I usually get the crash after a while, when Nightly is in the background for a while.
Martijn, I find your latest testcase works for me, though it's still rather hard to use -- it takes anywhere between a minute and five minutes to crash. First you need to set security.mixed_content.block_active_content to false in about:config. (I think not setting this was the reason I didn't see any crashes with your previous testcase.) Then you need to either turn off all popup blocking or to create an exception for https://bug1011391.bugzilla.mozilla.org. But if you use the Profile Manager (as I do), you need to watch out for one *very* unfortunate side effect: When FF crashes (in any profile) it defaults to reloading the same tabs as were present before you crashed. But with your testcase, this behavior somehow causes FF to load an infinite number of times *while you're still in the Profile Manager*. I ended up having to force quit my computer, then spent 30 minutes verifying all its hard drives with Disk Utility.
You can probably avoid the problem I reported in comment #49 by setting browser.sessionstore.resume_from_crash to false in about:config.
I just did another test with incremental cycle collection turned off (with dom.cycle_collector.incremental set to false). Yesterday's m-c nightly has now been running Martijn's testcase for 10 minutes without crashing.
Thanks, Martijn. johns and I have also been able to reproduce, and we've been doing some analysis with additional logging.
This didn't fix the crash, and Vladan noticed a spike in total CC time on the day this hit Nightly (there's also a small increase in max pause time), so I backed this out. bz was okay with doing that in IRC. https://hg.mozilla.org/integration/mozilla-inbound/rev/ad48f01aae1d I have a patch in progress that I think fixes the issue, so hopefully that will land in the next day or two.
Status: REOPENED → NEW
No longer blocks: 1023758
Depends on: 1023758
the backout patch migrated to central at some point: https://hg.mozilla.org/mozilla-central/rev/ad48f01aae1d
ICC has been disabled on 32 as of tomorrow, and hopefully should be fixed as of today's Nightly.
It looks like there have been no crashes on trunk since bug 1023758 landed, and none on Aurora since bug 911246 was backed out, so I'm going to call this fixed. Thanks to everybody who helped investigate this.
Status: NEW → RESOLVED
Closed: 11 years ago10 years ago
OS: Mac OS X → All
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
verified, no reports of this after 20140702
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: