Closed Bug 1367065 Opened 8 years ago Closed 7 years ago

Intermittent flexbox-widget-flex-items-1.html,grid-placement-definite-002.html,test_animations_omta.html,table-cell.html | application crashed [@ DynamicAtom::GCAtomTableLocked]

Categories

(Core :: Layout, defect, P4)

defect

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: intermittent-bug-filer, Unassigned)

References

Details

(Keywords: assertion, crash, intermittent-failure)

Crash Data

Summary: Intermittent css-grid/grid-placement-definite-002.html | application crashed [@ DynamicAtom::GCAtomTableLocked] → Intermittent grid-placement-definite-002.html,test_animations_omta.html | application crashed [@ DynamicAtom::GCAtomTableLocked]
Priority: -- → P4
Summary: Intermittent grid-placement-definite-002.html,test_animations_omta.html | application crashed [@ DynamicAtom::GCAtomTableLocked] → Intermittent grid-placement-definite-002.html,test_animations_omta.html,table-cell.html | application crashed [@ DynamicAtom::GCAtomTableLocked]
This looks horrible... Someone is decreasing DynamicAtom's refcnt without invoking its Release method?
Summary: Intermittent grid-placement-definite-002.html,test_animations_omta.html,table-cell.html | application crashed [@ DynamicAtom::GCAtomTableLocked] → Intermittent flexbox-widget-flex-items-1.html,grid-placement-definite-002.html,test_animations_omta.html,table-cell.html | application crashed [@ DynamicAtom::GCAtomTableLocked]
No longer blocks: stylo-bustage
(In reply to Xidorn Quan [:xidorn] UTC+10 from comment #1) > This looks horrible Does this mean someone should eventually be prioritizing investigating it?
Flags: needinfo?(xidorn+moz)
Yes, someone should really look into this... Let me try to take a quick look. Just to verify and so I know where to look, this is stylo-only, right?
Flags: needinfo?(ryanvm)
Looking at the code, this crash either indicates that someone decreases DynamicAtom::mRefCnt without invoking DynamicAtom::Release, or someone is holding an nsIAtom pointer without actually owning it, so it can become a dangling pointer. It doesn't seem to me the former is possible in the code, so I guess it is more likely we are doing the latter.
Hmmm... I suspect the violating thing is: https://searchfox.org/mozilla-central/rev/09c065976fd4f18d4ad764d7cb4bbc684bf56714/servo/components/style/properties/gecko.mako.rs#3074 It isn't clear to me why this Atom::from_addrefed is safe. It doesn't seem to be safe at all?
But even if that is a cause of this, the code doesn't exist before July 4th. But this intermittent started since May...
I pushed https://treeherder.mozilla.org/#/jobs?repo=try&revision=8d70e6aa5e688986993740d740d440782886a798 which adds a check that a zero-refcounted atom can only be revived via getting it from atom table, but that assertion doesn't seem to catch anything. Interestingly, this try push actually caught one this assertion, which means we get this assertion isn't because we are reviving a weak atom.
I did another try push in https://treeherder.mozilla.org/#/jobs?repo=try&revision=fc0e0fde59029b9bdd1f0d4e4670f924967093a9 This records all operations on gUnusedAtomCount in the log, so that we can track what happens to it. There is one task which asserts at a different place that we are reviving an atom (refcnt from 0 to 1) when gUnusedAtomCount is zero. From the record, I can see a pattern stating: > DynamicAtom: Reviving `Sans` > DynamicAtom: Reviving `Sans` > DynamicAtom: Killing `Sans` > DynamicAtom: Killing `Sans` which means we are double reviving the atom `Sans`. The position of this pattern is different from the assertion, so it is not clear to me whether they are related, but this kind of pattern doesn't seem to appear in tasks which run normally. I kinda suspect there is some kind of race condition inside AtomTable, and it only appears with Stylo because atom table wasn't under this level of concurrency pressure without parallel styling. (BTW, it is also weird that why we are killing and reviving the `Sans` atom so frequently... It really shouldn't.)
OK, so the assertion I saw above is another intermittent which is bug 1367396. I think these two bugs are caused from the same underlying issue that there is race condition inside AtomTable, although it isn't completely clear to me why this can happen.
Flags: needinfo?(xidorn+moz)
Depends on: 1386158
Depends on: 1381731
Resolving intermittent Stylo test failure bugs that haven't had any recent activity.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.