Closed
Bug 1394788
Opened 7 years ago
Closed 5 years ago
Crash in MessageLoop::PostTask_Helper
Categories
(Firefox for Android Graveyard :: General, defect, P5)
Tracking
(fennec+, firefox55 wontfix, firefox56+ wontfix, firefox57 fix-optional)
RESOLVED
WORKSFORME
| | Tracking | Status |
|---|---|---|
| fennec | + | --- |
| firefox55 | --- | wontfix |
| firefox56 | + | wontfix |
| firefox57 | --- | fix-optional |
People
(Reporter: marcia, Unassigned)
References
Details
(Keywords: crash, regression)
Crash Data
This bug was filed from the Socorro interface and is
report bp-fccc50c8-4076-469b-805e-06bd60170829.
=============================================================
Crash that is spiking on Android in 55.0.2: 3766 crashes/ : http://bit.ly/2gnN7l5
Also occurs on desktop on both Windows and Mac, although in much smaller numbers. There are a number of intermittent test failure bugs on file, such as Bug 1394665 with the same signature.
ni on snorp and nevin to see if they can figure out what might be causing this spike on Android.
Reporter
Updated•7 years ago
Flags: needinfo?(snorp)
Flags: needinfo?(cnevinchen)
Comment 1•7 years ago
A lot of these seem to be called from mozilla::layers::UiCompositorControllerChild::Destroy()
Comment 2•7 years ago
Other Android ones seem to come from mozilla::layers::AndroidDynamicToolbarAnimator::UpdateFrameMetrics()
Comment 3•7 years ago
But there are very few Android crashes other than the UiCompositorControllerChild ones.
Comment 4•7 years ago
Crashes in PostTask_Helper are also a big problem for Android tests currently - bug 1394428.
Comment 5•7 years ago
Joe, is there someone on your team who can take a look?
Flags: needinfo?(jcheng)
Comment 6•7 years ago
Thanks for ni.
This doesn't look like a front-end bug, since it crashes in native code. Sorry, I have no idea how to fix it.
Hi Jing Wei,
Do you have any idea about this crash?
Flags: needinfo?(cnevinchen) → needinfo?(topwu.tw)
Updated•7 years ago
Whiteboard: [FNC][SPT57.3][INT]
Comment 7•7 years ago
Hi Jingwei, as discussed, please check this first to clarify whether it will need another team's help. Thanks!
Flags: needinfo?(jcheng)
Comment 8•7 years ago
Since the crash happens in C++ and comment 0 says it also occurs on desktop, let's wait for the platform team's help and expertise.
Flags: needinfo?(topwu.tw)
Comment 9•7 years ago
Joe, Wesly, ~90% of these crashes are on fennec, with about 4k crashes a week on release. "wait until some other team gets to it" doesn't seem like a winning strategy?
Flags: needinfo?(wehuang)
Flags: needinfo?(jcheng)
rbarker, can you take a look? Seems like you may be familiar with this code.
Flags: needinfo?(rbarker)
Comment 12•7 years ago
I believe this may be a dup of Bug 1394428 which I am currently looking into.
Flags: needinfo?(rbarker)
Updated•7 years ago
Flags: needinfo?(wehuang)
Comment 13•7 years ago
Hopefully the patches in bug 1392705 can help here.
Yeah hopefully this goes away with the other patches.
Flags: needinfo?(snorp)
Comment 15•7 years ago
[Tracking Requested - why for this release]: Very high Android crash rate (and not insignificant Desktop crash rate)
tracking-fennec: --- → ?
tracking-firefox56: --- → ?
Updated•7 years ago
Whiteboard: [FNC][SPT57.3][INT]
Comment 17•7 years ago
This is the #1 top crash on Android Nightly for build 20170918100058.
Comment 18•7 years ago
See bug 1392705, which may fix the Android portion of this. This merged to m-c on 9/18.
For non-Android crashes, such as Windows:
https://crash-stats.mozilla.com/signature/?platform=Windows&signature=MessageLoop%3A%3APostTask_Helper&date=%3E%3D2017-09-14T07%3A44%3A00.000Z&date=%3C2017-09-21T07%3A44%3A00.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports
this will still be relevant. These seem to come from all over the place, and don't (all) appear to be shutdown crashes, though that's worth checking.
There are some from MediaManager shutdown, which is called from GetProfileBeforeChange (non-e10s) or XpcomWillShutdown (e10s Content). Basically, we're posting a Task to the MediaManager thread telling it to clean up and shut down.
Others I see crashing in calls from vsync, and a number from the GeckoIO Thread from IPC reception or OnChannelError.
Perhaps some non-atomic/locking oddness, since pump_ can be accessed from multiple threads? Though pump_ seems to be set once and never touched again (unless the entire object is being destroyed). The one access that (potentially) uses pump_ after object destruction is in PostTask(), where it grabs a stack-based ref to pump_ to call pump->ScheduleWork(). I don't see how that could make the call where this always crashes (pump_->GetXPCOMThread()) fail, but all of this is tricky.
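For readers unfamiliar with the pattern mentioned above, here is a minimal sketch of "grab a stack-based ref, then call ScheduleWork()". It is not the Gecko code: SketchPump, SketchLoop, and ReleasePump() are hypothetical names, and std::shared_ptr stands in for whatever refcounting the real loop uses. It only shows why the local copy keeps the pump alive for the ScheduleWork() call; it does not explain a failure in a direct pump_->GetXPCOMThread() call, which reads the member itself.

```cpp
#include <atomic>
#include <cassert>
#include <cstddef>
#include <functional>
#include <memory>
#include <mutex>
#include <queue>
#include <utility>

// Hypothetical stand-in for a message pump; NOT Gecko's base::MessagePump.
class SketchPump {
 public:
  void ScheduleWork() { ++wakeups_; }
  int wakeups() const { return wakeups_.load(); }

 private:
  std::atomic<int> wakeups_{0};
};

// Hypothetical stand-in for a message loop that holds a reference to a pump.
class SketchLoop {
 public:
  explicit SketchLoop(std::shared_ptr<SketchPump> pump)
      : pump_(std::move(pump)) {}

  // Queue a task, then wake the pump through a local strong reference.
  void PostTask(std::function<void()> task) {
    std::shared_ptr<SketchPump> pump;
    {
      std::lock_guard<std::mutex> lock(queue_lock_);
      incoming_.push(std::move(task));
      pump = pump_;  // copy the reference while the lock is held
    }
    // The local copy keeps the pump object alive for this call even if
    // the loop drops its own reference in the meantime.
    pump->ScheduleWork();
  }

  // Models teardown: the loop gives up its reference to the pump.
  void ReleasePump() {
    std::lock_guard<std::mutex> lock(queue_lock_);
    pump_.reset();
  }

  std::size_t pending() {
    std::lock_guard<std::mutex> lock(queue_lock_);
    return incoming_.size();
  }

 private:
  std::mutex queue_lock_;
  std::queue<std::function<void()>> incoming_;
  std::shared_ptr<SketchPump> pump_;
};
```

In this model, a caller that still holds its own shared_ptr to the pump can keep using it after ReleasePump(); calling PostTask() after ReleasePump() would dereference an empty pointer, which is the kind of ordering problem the thread is trying to rule out.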
Bill -- This is an ongoing crash source, with apparently some underlying issue causing it to get hit in many places. Also see what Android was doing that made this a topcrash there until bug 1392705 landed; perhaps that's a clue about what's failing on other platforms.
Flags: needinfo?(wmccloskey)
tracking-fennec: ? → +
Priority: -- → P1
Assignee: nobody → rbarker
For the Windows crashes, I suspect that bug 1395330 caused most of these to stop happening. My hypothesis was that we were crashing because we were shutting down a thread while there was still an IPC channel alive that could post messages to that thread. The assertion catches that situation and crashes us before it can happen. (And we are seeing quite a lot of crashes from bug 1395330.)
Looking at Windows crashes for 57, it looks like they mostly stopped around 9/6, which is around when bug 1395330 landed. I see one crash from a 9/18 build that seems unrelated:
https://crash-stats.mozilla.com/report/index/1e0f4bfb-0066-40a7-938a-cba850170919
Perhaps the message loop for mMediaThread has already been shut down there? In any case, it's different from the IPC crashes.
We still need to fix the crashes arising from bug 1395330, but at least now we have a better sense of what is going on: people are forgetting to Close() their channels before they shut down their threads.
Flags: needinfo?(wmccloskey)
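The ordering problem described in this comment (a channel that can still post to a thread that is being shut down) can be sketched as follows. This is an illustrative model only: SketchThread and SketchChannel are hypothetical names, not Gecko classes, and Shutdown() returns false where the assertion from bug 1395330 would deliberately crash.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <utility>
#include <vector>

// Hypothetical model of a thread with a message queue; NOT a Gecko class.
class SketchThread {
 public:
  bool alive() const { return alive_; }
  int open_channels() const { return open_channels_; }
  std::size_t queued() const { return queue_.size(); }

  // Refuses to shut down while a channel could still post here.
  // The real assertion crashes; this sketch returns false instead.
  bool Shutdown() {
    if (open_channels_ != 0) return false;
    alive_ = false;
    return true;
  }

 private:
  friend class SketchChannel;

  bool Post(std::function<void()> task) {
    if (!alive_) return false;  // posting to a dead thread is the bug
    queue_.push_back(std::move(task));
    return true;
  }

  bool alive_ = true;
  int open_channels_ = 0;
  std::vector<std::function<void()>> queue_;
};

// Hypothetical model of an IPC channel that delivers to one thread.
class SketchChannel {
 public:
  explicit SketchChannel(SketchThread& target) : target_(target) {
    ++target_.open_channels_;
  }
  SketchChannel(const SketchChannel&) = delete;
  SketchChannel& operator=(const SketchChannel&) = delete;
  ~SketchChannel() { Close(); }

  // Must be called before the target thread shuts down.
  void Close() {
    if (open_) {
      open_ = false;
      --target_.open_channels_;
    }
  }

  bool Send(std::function<void()> msg) {
    return open_ && target_.Post(std::move(msg));
  }

 private:
  SketchThread& target_;
  bool open_ = true;
};
```

The intended order in this model is Close() first, then Shutdown(). Calling Shutdown() while a channel is still open is reported immediately, rather than surfacing later as a crash when a message is posted to a thread that no longer exists.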
Comment 20•7 years ago
(In reply to Bill McCloskey (:billm) from comment #19)
> We still need to fix the crashes arising from bug 1395330, but at least now
> we have a better sense of what is going on: people are forgetting to Close()
> their channels before they shut down their threads.
Curious about next steps: does this need someone to hunt for these cases, and is there an easy way to find them?
Flags: needinfo?(wmccloskey)
Bug 1398070 mostly fixed this issue. We're still seeing a few PostTask_Helper crashes, but they're not IPC related. A lot of them seem to be graphics related. Hopefully that team can fix them.
Flags: needinfo?(wmccloskey)
200 crashes in the last week on 56, so I'm not considering this a dot release issue for 56. Crash volume in 57 beta looks pretty low.
Comment 23•7 years ago
[triage] 42 crashes on 58 in the past 7 days and that includes fennec. This is also no longer a top crasher. Given fennec engineering resources, this is non-critical so removing P1.
Randall, please unassign if you're not working on this.
Flags: needinfo?(rbarker)
Priority: P1 → P3
Updated•7 years ago
Assignee: rbarker → nobody
Flags: needinfo?(rbarker)
Comment 24•6 years ago
Re-triaging per https://bugzilla.mozilla.org/show_bug.cgi?id=1473195
Needinfo :susheel if you think this bug should be re-triaged.
Priority: P3 → P5
Reporter
Comment 25•5 years ago
Time to close this one out as WFM since there are no crashes on Android.
Status: NEW → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
Assignee
Updated•4 years ago
Product: Firefox for Android → Firefox for Android Graveyard