Closed Bug 1685666 Opened 4 years ago Closed 4 years ago

RDD Crash in [@ mozilla::ipc::IToplevelProtocol::OtherPid]

Tracking

()

Status:

RESOLVED FIXED

Milestone:

88 Branch

Tracking Flags:

Tracking

Status

firefox-esr78

---

unaffected

firefox84

---

unaffected

firefox85

---

wontfix

firefox86

---

wontfix

firefox87

---

fixed

firefox88

---

fixed

People

(Reporter: aryx, Assigned: alwu)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: crash, regression, Whiteboard: [not-a-fission-bug])

Crash Data

Attachments

(1 file)

Bug 1685666 - check RDDChild before creating content bridge. 4 years ago Alastor Wu [:alwu] (deleted), text/x-phabricator-request	jcristau : approval-mozilla-beta+	Details

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Reporter

Description

•

4 years ago

Maybe Fission related. (DOMFissionEnabled=1)

Crash report: https://crash-stats.mozilla.org/report/index/9570ab9a-a94f-4d74-a127-8f5c30210107

Reason: SIGSEGV /SEGV_MAPERR

Top 10 frames of crashing thread:

0 libxul.so mozilla::ipc::IToplevelProtocol::OtherPid const ipc/glue/ProtocolUtils.cpp:618
1 libxul.so mozilla::RDDProcessManager::CreateContentBridge dom/media/ipc/RDDProcessManager.cpp:268
2 libxul.so mozilla::RDDProcessManager::EnsureRDDProcessAndCreateBridge const dom/media/ipc/RDDProcessManager.cpp:192
3 libxul.so mozilla::MozPromise<bool, nsresult, false>::ThenValue<mozilla::RDDProcessManager::EnsureRDDProcessAndCreateBridge xpcom/threads/MozPromise.h:835
4 libxul.so mozilla::MozPromise<bool, nsresult, false>::ThenValueBase::ResolveOrRejectRunnable::Run xpcom/threads/MozPromise.h:476
5 libxul.so mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal xpcom/threads/TaskController.cpp:739
6 libxul.so nsThread::ProcessNextEvent xpcom/threads/nsThread.cpp:1200
7 libxul.so mozilla::ipc::MessagePump::Run ipc/glue/MessagePump.cpp:87
8 libxul.so MessageLoop::Run ipc/chromium/src/base/message_loop.cc:309
9 libxul.so nsBaseAppShell::Run widget/nsBaseAppShell.cpp:137

Flags: needinfo?(jya-moz)

Pascal Chevrel:pascalc (PTO until August 21)

Updated

•

4 years ago

status-firefox85: unaffected → affected

tracking-firefox86: ? → +

Jean-Yves Avenard [:jya]

Updated

•

4 years ago

Flags: needinfo?(jya-moz) → needinfo?(matt.woodrow)

Matt Woodrow (:mattwoodrow)

Comment 1

•

4 years ago

Bryce, do you know who plans to take over ownership of the RDD process?

We run LaunchRDDProcess() which launches the process, and then asynchronously runs the callback. The callback calls CreateContentBridge, which assumes mRDDChild exists and dereferences it.

In this case it's nullptr, because we ran code between LaunchRDDProcess and cleared it (I guess because it crashed).

We could try make this not async (direct dispatch promises?), fail gracefully, or retry.

Fairly low volume crash, but probably worth fixing.

Flags: needinfo?(matt.woodrow) → needinfo?(bvandyk)

Bryce Seager van Dyk [:bryce] (he/him) - Not reading bugmail

Comment 2

•

4 years ago

(In reply to Matt Woodrow (:mattwoodrow) from comment #1)

Bryce, do you know who plans to take over ownership of the RDD process?

We run LaunchRDDProcess() which launches the process, and then asynchronously runs the callback. The callback calls CreateContentBridge, which assumes mRDDChild exists and dereferences it.

In this case it's nullptr, because we ran code between LaunchRDDProcess and cleared it (I guess because it crashed).

We could try make this not async (direct dispatch promises?), fail gracefully, or retry.

Fairly low volume crash, but probably worth fixing.

I think :jhlin has expressed some interest, but he's on PTO for the near future. CCing :alwu who may be able to help. I don't have a lot of cycles right now, but will hold NI in case I get time to look.

Julien Cristau [:jcristau]

Updated

•

4 years ago

status-firefox85: affected → wontfix

Alastor Wu [:alwu]

Assignee

Updated

•

4 years ago

Blocks: RDD

Alastor Wu [:alwu]

Assignee

Updated

•

4 years ago

Severity: -- → S2

Bryce Seager van Dyk [:bryce] (he/him) - Not reading bugmail

Updated

•

4 years ago

Flags: needinfo?(bvandyk)

Priority: -- → P3

BugBot [:suhaib / :marco/ :calixte]

Comment 3

•

4 years ago

Changing the priority to p2 as the bug is tracked by a release manager for the current nightly.
See What Do You Triage for more information

Priority: P3 → P2

Jon Bauman [:jbauman:]

Updated

•

4 years ago

Severity: S2 → S3

Chris Peterson [:cpeterson]

Comment 4

•

4 years ago

This crash is not related to Fission. About 23% of this signature's crash reports have Fission enabled, but that's roughly the percentage of Nightly users with Fission enabled, so there is no correlation.

status-firefox87: --- → affected

OS: Unspecified → All

Hardware: Unspecified → All

Summary: Crash in [@ mozilla::ipc::IToplevelProtocol::OtherPid] → RDD Crash in [@ mozilla::ipc::IToplevelProtocol::OtherPid]

Whiteboard: [fission-]

Chris Peterson [:cpeterson]

Comment 5

•

4 years ago

Not a Fission bug

Whiteboard: [fission-] → [not-a-fission-bug]

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 6

•

4 years ago

With a suitable selection of open Web pages, this turns into a crash upon session restore. (I have no clue what page among my many tabs triggers this crash.)

Pascal Chevrel:pascalc (PTO until August 21)

Comment 7

•

4 years ago

[Tracking Requested - why for this release]:

status-firefox86: affected → wontfix

tracking-firefox86: + → ---

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 8

•

4 years ago

jbauman, I suggest restoring the more severe designation of the bug considering that it seems easy for this to be a startup crash (really session restore-time crash) from the user perspective.

Flags: needinfo?(jbauman)

Jon Bauman [:jbauman:]

Comment 9

•

4 years ago

I don't have strong feelings about S2/S3. I probably changed it because it's possible to disable RDD, therefore a workaround exists (which is contrary to the definition of S2) and unassigned S2 bugs lead to triage automation spam for our team.

Flags: needinfo?(jbauman)

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 10

•

4 years ago

(In reply to Jon Bauman [:jbauman:] from comment #9)

I don't have strong feelings about S2/S3. I probably changed it because it's possible to disable RDD, therefore a workaround exists (which is contrary to the definition of S2) and unassigned S2 bugs lead to triage automation spam for our team.

A workaround exists but not in a way that's discoverable and understandable to a user experiencing this problem.

Alastor Wu [:alwu]

Assignee

Comment 11

•

4 years ago

Hi, do any of you have a bandwidth to take this? If not, then feel free to cancel my NI and I will take this later. Thank you.

Flags: needinfo?(jolin)

Flags: needinfo?(jbauman)

Jon Bauman [:jbauman:]

Comment 12

•

4 years ago

I need to focus on the AVIF stuff at least until we get it shipping by default, so I don't think I can spare the necessary time for this in the short term, but please do include me on any reviews since I'm still planning to get more involved in RDD generally.

Flags: needinfo?(jbauman)

Alastor Wu [:alwu]

Assignee

Comment 13

•

4 years ago

Will take a look on it later today.

Assignee: nobody → alwu

Flags: needinfo?(jolin)

Alastor Wu [:alwu]

Assignee

Comment 14

•

4 years ago

(In reply to Matt Woodrow (:mattwoodrow) from comment #1)

We could try make this not async (direct dispatch promises?), fail gracefully, or retry.

I do see some value of keeping creating content bridge async, which separate the process of "creating RDD + Video bridge" and "creating content bridge". So I will keep this approach and add new check to avoid getting the crash.

Alastor Wu [:alwu]

Assignee

Comment 15

•

4 years ago

Attached file Bug 1685666 - check RDDChild before creating content bridge. (deleted) — Details

As the process of creating content process is async, the RDD process might be shutdown before that (eg. crash). So we should check if RDD still exists first.

Alastor Wu [:alwu]

Assignee

Updated

•

4 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1692957

Pulsebot

Comment 16

•

4 years ago

Pushed by alwu@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/bf32f5089547 check RDDChild before creating content bridge. r=mattwoodrow

Bogdan Tara[:bogdan_tara | bogdant]

Comment 17

•

4 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/bf32f5089547

Status: NEW → RESOLVED

Closed: 4 years ago

status-firefox88: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → 88 Branch

Viktor Jägersküpper

Comment 18

•

4 years ago

Do you want to uplift the fix to beta/87? I'm not sure about uplifting to release/86 because this has been tagged as 'wontfix' for 86.

Flags: needinfo?(alwu)

Alastor Wu [:alwu]

Assignee

Comment 19

•

4 years ago

Sure, uplifting it to Beta seems enough because this is not a serious crash.

Flags: needinfo?(alwu)

Alastor Wu [:alwu]

Assignee

Comment 20

•

4 years ago

Comment on attachment 9206170 [details]
Bug 1685666 - check RDDChild before creating content bridge.

Beta/Release Uplift Approval Request

User impact if declined: Crash.
Is this code covered by automated tests?: No
Has the fix been verified in Nightly?: No
Needs manual test from QE?: No
If yes, steps to reproduce:
List of other uplifts needed: None
Risk to taking this patch: Low
Why is the change risky/not risky? (and alternatives if risky): This patch doesn't introduce any new behavior, it simply added a protection to prevent from accessing the dangling pointer.
String changes made/needed: none

Attachment #9206170 - Flags: approval-mozilla-beta?

Henri Sivonen (:hsivonen) (away from Bugzilla until 2023-09-11)

Comment 21

•

4 years ago

Thanks for fixing!

(In reply to Alastor Wu [:alwu] from comment #19)

Sure, uplifting it to Beta seems enough because this is not a serious crash.

If the user has content that triggers this bug in tabs that get restored by session restore, the UX is very bad and, I assume, could push the user to try another browser. I suggest uplifting to release as well in case we end up doing a point release for reasons beyond this bug.

Julien Cristau [:jcristau]

Comment 22

•

4 years ago

Comment on attachment 9206170 [details]
Bug 1685666 - check RDDChild before creating content bridge.

crash fix, approved for 87.0b6

Attachment #9206170 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Julien Cristau [:jcristau]

Comment 23

•

4 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-beta/rev/3f263665039a

status-firefox87: affected → fixed

Alastor Wu [:alwu]

Assignee

Comment 24

•

4 years ago

(In reply to Henri Sivonen (:hsivonen) from comment #21)

Thanks for fixing!
If the user has content that triggers this bug in tabs that get restored by session restore, the UX is very bad and, I assume, could push the user to try another browser. I suggest uplifting to release as well in case we end up doing a point release for reasons beyond this bug.

On the last seven days, we got 77 crashs total on Fx86, which is not a large amount. I'm not sure if that crosses the line of worth to uplift to Release. I personally prefer to bake the fix on Beta for a while even if the fix is only about a small change, because sometime the thing just happens on someplace you didn't expect. In addition, this crash is avoidable by turning off the pref media.rdd-process.enabled, so it seems not an urgent problem.

BMO Automation

Updated

•

4 years ago

Has Regression Range: --- → yes

You need to log in before you can comment on or make changes to this bug.