Fission crash in [@ mozilla::dom::BrowserBridgeHost::GetLayersId]
Categories
(Core :: Layout, defect, P3)
Tracking
()
People
(Reporter: cpeterson, Assigned: emilio)
Details
(Keywords: crash, Whiteboard: fission-soft-blocker)
Crash Data
Attachments
(2 files, 1 obsolete file)
Parent process crash on macOS and Windows.
Maybe Fission related since 100% of crash reports (11 out of 11) have Fission enabled and the crashes have only been reported in Nightly (86, 87, 88) and Beta (89.0b8, where we are running a Fission Beta experiment).
Crash report: https://crash-stats.mozilla.org/report/index/342930c5-8783-44a8-ad5a-8c72b0210507
Reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS
Top 4 frames of crashing thread:
0 XUL mozilla::dom::BrowserBridgeHost::GetLayersId const dom/ipc/BrowserBridgeHost.cpp:30
1 XUL nsSubDocumentFrame::BuildDisplayList layout/generic/nsSubDocumentFrame.cpp:372
2 XUL nsBlockFrame::BuildDisplayList layout/generic/nsBlockFrame.cpp:7064
3 XUL nsIFrame::BuildDisplayListForChild layout/generic/nsIFrame.cpp:4030
Comment 1•3 years ago
|
||
100% crash on Fission. Give this S2 for now. Perhaps need to be P1 if the milestone is close.
Reporter | ||
Comment 2•3 years ago
|
||
The stack trace is missing some frames because many functions inlined between nsSubDocumentFrame::BuildDisplayList indirectly calling GetRemoteBrowser() and BrowserBridgeHost::GetLayersId(). mBridge is null here:
We should null out bridge host before nulling out remote browser?
needinfo'ing Nika to add more comments so someone else can write the patch.
Comment 3•3 years ago
|
||
I think the risk here is caused by the mBridge
field of BrowserBridgeHost
being cleared out earlier than the mRoot
field of BrowserHost
is cleared out, and before the nsFrameLoader::mRemoteBrowser
field is cleared. The mRemoteBrowser
field is cleared in nsFrameLoader::DestroyComplete()
: https://searchfox.org/mozilla-central/rev/0e8b28fb355afd2fcc69d34e8ed66bbabf59a59a/dom/base/nsFrameLoader.cpp#2061-2065, which is calle dfrom the BrowserParent::ActorDestroy
callback when the browser is fully destroyed here: https://searchfox.org/mozilla-central/rev/0e8b28fb355afd2fcc69d34e8ed66bbabf59a59a/dom/ipc/BrowserParent.cpp#714. This full destroy path is initially triggered by the nsFrameLoader
calling DestroyStart
here: https://searchfox.org/mozilla-central/rev/0e8b28fb355afd2fcc69d34e8ed66bbabf59a59a/dom/base/nsFrameLoader.cpp#2007, which ends up asynchronously performing the rest of the process. In the BrowserBridgeHost
case, BrowserBridgeHost::DestroyStart()
is implemented by directly calling DestroyComplete
, meaning that we can clear out the mBridge
field before we have cleared out the reference to the host.
To fix this issue, we should probably make BrowserBridgeChild
act somewhat more like BrowserChild
, calling nsFrameLoader::DestroyComplete
from ActorDestroy
, and not clearing out mRoot
in the host until the final step is complete.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 4•3 years ago
|
||
Comment 5•3 years ago
|
||
[layout S1/S2 bug triage]: toggling ni=emilio just to be sure this is on your radar (once you're back from PTO). Looks like there's a patch but it's pending some revision. Thanks!
Assignee | ||
Comment 6•3 years ago
|
||
This is probably simpler than messing up with the destruction order, as
callers already need to deal with this case, wdyt?
Assignee | ||
Updated•3 years ago
|
Reporter | ||
Comment 7•3 years ago
|
||
This bug is a hard blocker for Fission M8.
Comment 8•3 years ago
|
||
Change the status for beta to have the same as nightly and release.
For more information, please visit auto_nag documentation.
Reporter | ||
Comment 9•3 years ago
|
||
This crash's volume is so low (only 1 report in Beta 91 and 1 report in Nightly 92) that we don't need to bother uplifting a fix to Beta 91.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 10•3 years ago
|
||
Assignee | ||
Comment 11•3 years ago
|
||
Try run for the later, which I think is more in line with what Nika wanted: https://treeherder.mozilla.org/jobs?repo=try&revision=d007e75078f757d707c95175a52c88aec032b009
Updated•3 years ago
|
Reporter | ||
Comment 12•3 years ago
|
||
Setting status-firefox92=wontfix
because I don't think we need to uplift this fix to Beta 92. This crash's volume is very low and the fix looks a little risky.
Reporter | ||
Updated•3 years ago
|
Comment 13•3 years ago
|
||
Comment 14•3 years ago
|
||
bugherder |
Updated•3 years ago
|
Comment 15•3 years ago
|
||
Backed out changeset 7fb67d32d491 (Bug 1710879) on suspicion of causing content crashes (Bug 1725558)
Backout links:
https://hg.mozilla.org/mozilla-central/rev/d6e2bf1d80b0de62e93d3206fbfe048e1ed4caab
https://hg.mozilla.org/integration/autoland/rev/d6e2bf1d80b0de62e93d3206fbfe048e1ed4caab
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 16•3 years ago
|
||
So this got backed out because of crashes like bug 1725558.
They seem nullptr crashes with a null frameloader, which I believe can happen now that destruction is async if we never finish initializing the bridge (e.g., if we take any of the early returns in places like RecvMakeFrameRemote
). Before, we'd shut down the bridge sync, which would prevent the messages from arriving.
Should I just null-check mFrameLoader
in the BrowserBridgeChild::Recv*
functions? Add a mDestroying
boolean and early return? Or is there a better approach?
Reporter | ||
Comment 17•3 years ago
|
||
Deferring this bug from Fission Milestone M8 to MVP. The crash volume is low and the fix caused some new crash regressions, so we probably wouldn't uplift the new fix to Beta 92 this late in the beta cycle.
Comment 18•3 years ago
|
||
(In reply to Emilio Cobos Álvarez (:emilio) from comment #16)
They seem nullptr crashes with a null frameloader, which I believe can happen now that destruction is async if we never finish initializing the bridge (e.g., if we take any of the early returns in places like
RecvMakeFrameRemote
). Before, we'd shut down the bridge sync, which would prevent the messages from arriving.
Ah, that makes sense. The change to make the deleteBridge
MakeScopeExit
call not synchronously destroy the bridge could cause issues. I should've probably caught that in review :-)
Could we perhaps make Send__delete__
a both:
message, so that we can still synchronously destroy the bridge in the case where setup failed, but we do the async shutdown if it's succeeded, and was bound to a nsFrameLoader
? That would give us the async teardown behaviour which the frameloader expects, but also allows us to only allow the actor to receive messages after it's been properly connected. I don't think the BrowserBridgeParent
side will mind much either, so it'll just simplify things on the BrowserBridgeChild
side.
Should I just null-check
mFrameLoader
in theBrowserBridgeChild::Recv*
functions? Add amDestroying
boolean and early return? Or is there a better approach?
Null-checking mFrameLoader
would also be a reasonable approach, I suppose. I don't love how future methods could easily be missed there and be potential future crashes though.
Comment 20•3 years ago
|
||
Comment 21•3 years ago
|
||
bugherder |
Description
•