Closed
Bug 1346421
Opened 8 years ago
Closed 5 years ago
Crash in mozilla::css::StyleRule::IsCCLeaf
Categories
(Core :: DOM: CSS Object Model, defect, P5)
Tracking
()
RESOLVED
WORKSFORME
Tracking | Status | |
---|---|---|
firefox52 | --- | unaffected |
firefox53 | + | fix-optional |
firefox54 | + | fix-optional |
firefox55 | --- | unaffected |
People
(Reporter: philipp, Unassigned)
References
Details
(Keywords: crash, regression)
Crash Data
This bug was filed from the Socorro interface and is
report bp-704e4ecc-6b27-4b42-aced-5c05e2170310.
=============================================================
Crashing Thread (0)
Frame Module Signature Source
0 xul.dll mozilla::css::StyleRule::IsCCLeaf() layout/style/StyleRule.cpp:1281
1 xul.dll mozilla::CSSStyleSheet::cycleCollection::TraverseNative(void*, nsCycleCollectionTraversalCallback&) layout/style/CSSStyleSheet.cpp:547
2 xul.dll MayHaveChild xpcom/base/nsCycleCollector.cpp:2556
3 xul.dll nsPurpleBuffer::PurpleBlock::VisitEntries<RemoveSkippableVisitor>(nsPurpleBuffer&, RemoveSkippableVisitor&) xpcom/base/nsCycleCollector.cpp:1035
4 xul.dll nsPurpleBuffer::RemoveSkippable(nsCycleCollector*, bool, bool, void (*)(void)) xpcom/base/nsCycleCollector.cpp:2815
5 xul.dll nsCycleCollector::ForgetSkippable(bool, bool) xpcom/base/nsCycleCollector.cpp:2863
6 xul.dll nsCycleCollector_forgetSkippable(bool, bool) xpcom/base/nsCycleCollector.cpp:4096
7 xul.dll FireForgetSkippable dom/base/nsJSEnvironment.cpp:1238
8 xul.dll CCTimerFired dom/base/nsJSEnvironment.cpp:1795
9 xul.dll nsTimerImpl::Fire(int) xpcom/threads/nsTimerImpl.cpp:479
10 xul.dll nsTimerEvent::Run() xpcom/threads/TimerThread.cpp:297
11 xul.dll nsThread::ProcessNextEvent(bool, bool*) xpcom/threads/nsThread.cpp:1240
12 xul.dll mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:96
13 xul.dll mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:301
14 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:231
15 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:211
16 xul.dll nsBaseAppShell::Run() widget/nsBaseAppShell.cpp:156
17 xul.dll nsAppShell::Run() widget/windows/nsAppShell.cpp:262
18 xul.dll XRE_RunAppShell() toolkit/xre/nsEmbedFunctions.cpp:924
19 xul.dll mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) ipc/glue/MessagePump.cpp:269
20 xul.dll MessageLoop::RunHandler() ipc/chromium/src/base/message_loop.cc:231
21 xul.dll MessageLoop::Run() ipc/chromium/src/base/message_loop.cc:211
22 xul.dll XRE_InitChildProcess(int, char** const, XREChildData const*) toolkit/xre/nsEmbedFunctions.cpp:756
23 xul.dll mozilla::BootstrapImpl::XRE_InitChildProcess(int, char** const, XREChildData const*) toolkit/xre/Bootstrap.cpp:65
24 firefox.exe content_process_main(mozilla::Bootstrap*, int, char** const) ipc/contentproc/plugin-container.cpp:115
25 firefox.exe wmain toolkit/xre/nsWindowsWMain.cpp:115
26 firefox.exe __scrt_common_main_seh f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:253
27 kernel32.dll BaseThreadInitThunk
28 ntdll.dll __RtlUserThreadStart
29 ntdll.dll _RtlUserThreadStart
this crash signature seems to be regressing in firefox 53 & later versions. so far it has rather been a low-volume crash though...
Reporter | ||
Comment 1•8 years ago
|
||
a related signature may be https://crash-stats.mozilla.com/signature/?product=Firefox&signature=mozilla%3A%3Acss%3A%3ARule%3A%3AIsKnownLive
Crash Signature: [@ mozilla::css::StyleRule::IsCCLeaf] → [@ mozilla::css::StyleRule::IsCCLeaf]
[@ mozilla::css::Rule::IsKnownLive]
Comment 2•8 years ago
|
||
This function was only added in 53, so crashes in it being new in 53 makes sense.
The linked crash is at 0x20c on this line:
return !mDOMDeclaration || !mDOMDeclaration->PreservingWrapper();
0x20c isn't _that_ close to 0.... but still pretty close.
The crashes linked in comment 1 are mostly at 0x1b4 but not all, and at this line:
return nsCCUncollectableMarker::InGeneration(
in IsKnownLive.
Apart from "memory corruption", I have no idea what's up here. And memory corruption would not be this consistent....
Comment 3•8 years ago
|
||
Let's assume this is a regresssion from bug 851892 until proven otherwise.
Blocks: 851892
Comment 4•8 years ago
|
||
Happening on mDOMDeclaration line but not Rule::IsCCLeaf line probably means the StyleRule object itself is still in valid memory, but it is filled with some random data.
There are two crashes from a same install happen at 0x20c, but the address varies otherwise. There is one 0xd, 0x7a000c, and 0x400c. So the position here probably doesn't have much value.
Comment 5•8 years ago
|
||
52.0b2 was just released out, let's see how the trend is going.
While this is low volume it does seem like a recent regression and I want to make sure we don't forget it. Tracking for 53 and 54.
tracking-firefox53:
--- → +
tracking-firefox54:
--- → +
Comment 7•8 years ago
|
||
dbaron: bz: :astley -- any thoughts? What are our options here? I presume we don't want to backout the CSS in WebIDL
Flags: needinfo?(dbaron)
Flags: needinfo?(bzbarsky)
Flags: needinfo?(aschen)
Comment 8•8 years ago
|
||
The best place to start would be with someone with a working Windows install to try debugging one of these crashes in a minidump so we get some idea of what pointers are valid and what pointers are garbage.
The next best place to start is checking whether there is any information on the crash dumps (e.g. urls) that might enable us to reproduce the crash. Especially under rr....
Flags: needinfo?(bzbarsky)
Comment 9•8 years ago
|
||
So, two crashes from 53.0b8:
First, bp-22706024-0157-4a9f-913b-8ed2a2170401, which is a crash at 659DF7A0:
659DF7A0 F6 40 0C 01 test byte ptr [eax+0Ch],1 <===== CRASH HERE
659DF7A4 0F 84 F8 44 BC FF je mozilla::css::StyleRule::IsCCLeaf+11h (655A3CA2h)
659DF7AA 33 C0 xor eax,eax
659DF7AC C3 ret
with EAX = 1
Second, bp-b16689b4-d9df-4a5f-92c2-1ce082170403, which is a crash at 00007FFCB5C90CDE:
00007FFCB5C90CC4 F6 41 18 01 test byte ptr [rcx+18h],1
00007FFCB5C90CC8 75 11 jne mozilla::css::StyleRule::IsCCLeaf+17h (07FFCB5C90CDBh)
00007FFCB5C90CCA 48 8B 51 58 mov rdx,qword ptr [rcx+58h]
00007FFCB5C90CCE 33 C0 xor eax,eax
00007FFCB5C90CD0 48 85 D2 test rdx,rdx
00007FFCB5C90CD3 75 09 jne mozilla::css::StyleRule::IsCCLeaf+1Ah (07FFCB5C90CDEh)
00007FFCB5C90CD5 B8 01 00 00 00 mov eax,1
00007FFCB5C90CDA C3 ret
00007FFCB5C90CDB 32 C0 xor al,al
00007FFCB5C90CDD C3 ret
00007FFCB5C90CDE F6 42 18 01 test byte ptr [rdx+18h],1 <====== CRASH HERE
00007FFCB5C90CE2 75 F6 jne mozilla::css::StyleRule::IsCCLeaf+16h (07FFCB5C90CDAh)
00007FFCB5C90CE4 EB EF jmp mozilla::css::StyleRule::IsCCLeaf+11h (07FFCB5C90CD5h)
with RDX = 0000000020000000.
Comment 10•8 years ago
|
||
I think (at a quick look) that in both cases that means we have a bad mDOMDeclaration pointer in mDOMDeclaration->PreservingWrapper().
Comment 11•8 years ago
|
||
One thing I'm a little suspicious of is whether it's OK for DOMCSSDeclarationImpl to aggregate the two cycle collection interfaces with mRule, but not aggregate in general with mRule.
Comment 12•8 years ago
|
||
My suspicion is that the url field (present in 20 crashes in the past 6 months) isn't particularly useful. 6 are exactly https://www.facebook.com/ , 3 are other urls on facebook, 2 on youtube, 1 on google, 1 https://web.whatsapp.com/ , 1 about:newtab, 1 on etsy, and the remaining 4 on sites I hadn't heard of. Though I suppose that means there's a chance it's Facebook-related.
Maybe bz has some ideas based on the comments above, though?
Flags: needinfo?(dbaron) → needinfo?(bzbarsky)
Updated•8 years ago
|
Comment 13•8 years ago
|
||
> I think (at a quick look) that in both cases that means we have a bad mDOMDeclaration pointer
> in mDOMDeclaration->PreservingWrapper().
Certainly the 64-bit one looks like that. The other has less context, but certainly involves the same sort of "read the thing 3 words in, and check the 1 bit", so is quite plausible.
> but not aggregate in general with mRule
DOMCSSDeclarationImpl does aggregate with mRule for refcounting:
NS_IMPL_ADDREF_USING_AGGREGATOR(DOMCSSDeclarationImpl, mRule)
NS_IMPL_RELEASE_USING_AGGREGATOR(DOMCSSDeclarationImpl, mRule)
Were you thinking of something else in terms of aggregating in general?
Anyway, the basic ownership model here is that mDOMDeclaration is a UniquePtr<DOMCSSDeclarationImpl>, the refcounts are aggregated, and the CC bits are aggregated so that both objects look like a single node in the CC graph. The only real benefit here is that we save a word for the refcount of the DOMCSSDeclarationImpl.
We could disaggregate the stuff, give DOMCSSDeclarationImpl its own refcount, and see whether that makes the crashes go away. But I'm not quite sure how we can end up with a bad mDOMDeclaration value in general. It starts life in the UniquePtr ctr, which nulls it out. Then it's only modified when we allocate the DOMCSSDeclarationImpl, and set to the new object. It's never written anywhere else...
Looking at reports for this crash, I'm seeing the following crash addresses in the last 14 days, including the two from comment 9:
0x800c
0x8a00000018
0x10
0x2c
0x20000018
0x400018
0x200c
0xd
0xe
0x14
0x10c
0x4c
0x11
0x4000000c
0x10c
0x818
0x8000c
All the ones ending in "18" are 64-bit. All the others are 32-bit. A crash for a bad mDOMDeclaration on 64-bit would be at "bad pointer + 0x18" and on 32-bit at "bad pointer + 0xC". So the corresponding bad pointers would be:
0x8000
0x8a00000000
0x04
0x20
0x20000000
0x400000
0x2000
0x1
0x2
0x08
0x100
0x40
0x05
0x40000000
0x100
0x800
0x80000
and all of those except 0x8a00000000 and 0x05 are a single bit from a 0 pointer. The report from comment 0 was crashing at 0x20c which is 0x200 + 0xC.
Anyway, those two non-one-bit-from-0 reports are https://crash-stats.mozilla.com/report/index/1ea017ca-7429-49ce-ba52-c48442170404 and https://crash-stats.mozilla.com/report/index/270040d2-913e-44b9-bd51-195972170326 in case we think we can get something useful out of them. The rest really do look like someone bitflipped mDOMDeclaration...
Flags: needinfo?(bzbarsky)
Comment 14•8 years ago
|
||
So one question is whether this is low-volume enough that we can in fact chalk this up to memory bitflips...
Comment 15•8 years ago
|
||
(In reply to Boris Zbarsky [:bz] (still a bit busy) (if a patch has no decent message, automatic r-) from comment #13)
> Were you thinking of something else in terms of aggregating in general?
I was thinking of COM identity, i.e., both objects produce the same set of results from QueryInterface.
Comment 16•8 years ago
|
||
(In reply to Boris Zbarsky [:bz] (still a bit busy) (if a patch has no decent message, automatic r-) from comment #14)
> So one question is whether this is low-volume enough that we can in fact
> chalk this up to memory bitflips...
Given that the pointer is generally null, I think that's plausible.
Comment 17•8 years ago
|
||
The volume of crashes in Beta54 is low. Mark 54 as fix-optional but still happy to take fix in 54.
Updated•7 years ago
|
Flags: needinfo?(aschen)
Updated•6 years ago
|
Priority: -- → P5
Comment 18•5 years ago
|
||
Closing because no crashes reported for 12 weeks.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
You need to log in
before you can comment on or make changes to this bug.
Description
•