RDD Crash in [@ mozilla::ipc::IToplevelProtocol::OtherPid]
Categories
(Core :: Audio/Video: Playback, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox84 | --- | unaffected |
firefox85 | --- | wontfix |
firefox86 | --- | wontfix |
firefox87 | --- | fixed |
firefox88 | --- | fixed |
People
(Reporter: aryx, Assigned: alwu)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: crash, regression, Whiteboard: [not-a-fission-bug])
Crash Data
Attachments
(1 file)
(deleted),
text/x-phabricator-request
|
jcristau
:
approval-mozilla-beta+
|
Details |
Maybe Fission related. (DOMFissionEnabled=1)
Crash report: https://crash-stats.mozilla.org/report/index/9570ab9a-a94f-4d74-a127-8f5c30210107
Reason: SIGSEGV /SEGV_MAPERR
Top 10 frames of crashing thread:
0 libxul.so mozilla::ipc::IToplevelProtocol::OtherPid const ipc/glue/ProtocolUtils.cpp:618
1 libxul.so mozilla::RDDProcessManager::CreateContentBridge dom/media/ipc/RDDProcessManager.cpp:268
2 libxul.so mozilla::RDDProcessManager::EnsureRDDProcessAndCreateBridge const dom/media/ipc/RDDProcessManager.cpp:192
3 libxul.so mozilla::MozPromise<bool, nsresult, false>::ThenValue<mozilla::RDDProcessManager::EnsureRDDProcessAndCreateBridge xpcom/threads/MozPromise.h:835
4 libxul.so mozilla::MozPromise<bool, nsresult, false>::ThenValueBase::ResolveOrRejectRunnable::Run xpcom/threads/MozPromise.h:476
5 libxul.so mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal xpcom/threads/TaskController.cpp:739
6 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1200
7 libxul.so mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:87
8 libxul.so MessageLoop::Run ipc/chromium/src/base/message_loop.cc:309
9 libxul.so nsBaseAppShell::Run widget/nsBaseAppShell.cpp:137
Updated•4 years ago
|
Updated•4 years ago
|
Comment 1•4 years ago
|
||
Bryce, do you know who plans to take over ownership of the RDD process?
We run LaunchRDDProcess()
which launches the process, and then asynchronously runs the callback. The callback calls CreateContentBridge
, which assumes mRDDChild
exists and dereferences it.
In this case it's nullptr, because we ran code between LaunchRDDProcess
and cleared it (I guess because it crashed).
We could try make this not async (direct dispatch promises?), fail gracefully, or retry.
Fairly low volume crash, but probably worth fixing.
(In reply to Matt Woodrow (:mattwoodrow) from comment #1)
Bryce, do you know who plans to take over ownership of the RDD process?
We run
LaunchRDDProcess()
which launches the process, and then asynchronously runs the callback. The callback callsCreateContentBridge
, which assumesmRDDChild
exists and dereferences it.In this case it's nullptr, because we ran code between
LaunchRDDProcess
and cleared it (I guess because it crashed).We could try make this not async (direct dispatch promises?), fail gracefully, or retry.
Fairly low volume crash, but probably worth fixing.
I think :jhlin has expressed some interest, but he's on PTO for the near future. CCing :alwu who may be able to help. I don't have a lot of cycles right now, but will hold NI in case I get time to look.
Updated•4 years ago
|
Assignee | ||
Updated•4 years ago
|
Comment 3•4 years ago
|
||
Changing the priority to p2 as the bug is tracked by a release manager for the current nightly.
See What Do You Triage for more information
Updated•4 years ago
|
Comment 4•4 years ago
|
||
This crash is not related to Fission. About 23% of this signature's crash reports have Fission enabled, but that's roughly the percentage of Nightly users with Fission enabled, so there is no correlation.
With a suitable selection of open Web pages, this turns into a crash upon session restore. (I have no clue what page among my many tabs triggers this crash.)
jbauman, I suggest restoring the more severe designation of the bug considering that it seems easy for this to be a startup crash (really session restore-time crash) from the user perspective.
Comment 9•4 years ago
|
||
I don't have strong feelings about S2/S3. I probably changed it because it's possible to disable RDD, therefore a workaround exists (which is contrary to the definition of S2) and unassigned S2 bugs lead to triage automation spam for our team.
(In reply to Jon Bauman [:jbauman:] from comment #9)
I don't have strong feelings about S2/S3. I probably changed it because it's possible to disable RDD, therefore a workaround exists (which is contrary to the definition of S2) and unassigned S2 bugs lead to triage automation spam for our team.
A workaround exists but not in a way that's discoverable and understandable to a user experiencing this problem.
Assignee | ||
Comment 11•4 years ago
|
||
Hi, do any of you have a bandwidth to take this? If not, then feel free to cancel my NI and I will take this later. Thank you.
Comment 12•4 years ago
|
||
I need to focus on the AVIF stuff at least until we get it shipping by default, so I don't think I can spare the necessary time for this in the short term, but please do include me on any reviews since I'm still planning to get more involved in RDD generally.
Assignee | ||
Comment 13•4 years ago
|
||
Will take a look on it later today.
Assignee | ||
Comment 14•4 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #1)
We could try make this not async (direct dispatch promises?), fail gracefully, or retry.
I do see some value of keeping creating content bridge async, which separate the process of "creating RDD + Video bridge" and "creating content bridge". So I will keep this approach and add new check to avoid getting the crash.
Assignee | ||
Comment 15•4 years ago
|
||
As the process of creating content process is async, the RDD process might be shutdown before that (eg. crash). So we should check if RDD still exists first.
Assignee | ||
Updated•4 years ago
|
Comment 16•4 years ago
|
||
Comment 17•4 years ago
|
||
bugherder |
Comment 18•4 years ago
|
||
Do you want to uplift the fix to beta/87? I'm not sure about uplifting to release/86 because this has been tagged as 'wontfix' for 86.
Assignee | ||
Comment 19•4 years ago
|
||
Sure, uplifting it to Beta seems enough because this is not a serious crash.
Assignee | ||
Comment 20•4 years ago
|
||
Comment on attachment 9206170 [details]
Bug 1685666 - check RDDChild before creating content bridge.
Beta/Release Uplift Approval Request
- User impact if declined: Crash.
- Is this code covered by automated tests?: No
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): This patch doesn't introduce any new behavior, it simply added a protection to prevent from accessing the dangling pointer.
- String changes made/needed: none
Thanks for fixing!
(In reply to Alastor Wu [:alwu] from comment #19)
Sure, uplifting it to Beta seems enough because this is not a serious crash.
If the user has content that triggers this bug in tabs that get restored by session restore, the UX is very bad and, I assume, could push the user to try another browser. I suggest uplifting to release as well in case we end up doing a point release for reasons beyond this bug.
Comment 22•4 years ago
|
||
Comment on attachment 9206170 [details]
Bug 1685666 - check RDDChild before creating content bridge.
crash fix, approved for 87.0b6
Comment 23•4 years ago
|
||
bugherder uplift |
Assignee | ||
Comment 24•4 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #21)
Thanks for fixing!
If the user has content that triggers this bug in tabs that get restored by session restore, the UX is very bad and, I assume, could push the user to try another browser. I suggest uplifting to release as well in case we end up doing a point release for reasons beyond this bug.
On the last seven days, we got 77 crashs total on Fx86, which is not a large amount. I'm not sure if that crosses the line of worth to uplift to Release. I personally prefer to bake the fix on Beta for a while even if the fix is only about a small change, because sometime the thing just happens on someplace you didn't expect. In addition, this crash is avoidable by turning off the pref media.rdd-process.enabled
, so it seems not an urgent problem.
Updated•4 years ago
|
Description
•