Closed Bug 1811641 Opened 2 years ago Closed 2 years ago

Crash in [@ webrtc::videocapturemodule::VideoCaptureImpl::DeliverCapturedFrame]

Categories

(Core :: WebRTC: Audio/Video, defect, P2)

Unspecified
macOS
defect

Tracking

()

RESOLVED FIXED
111 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox109 --- unaffected
firefox110 --- wontfix
firefox111 --- fixed

People

(Reporter: mccr8, Assigned: pehrsons)

References

(Regression)

Details

(Keywords: crash, regression)

Crash Data

Attachments

(3 files, 1 obsolete file)

Crash report: https://crash-stats.mozilla.org/report/index/8ad89e28-c397-49ab-bff6-616b70230119

Reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS

Top 10 frames of crashing thread:

0  XUL  nsCOMPtr<nsIThreadObserver>::operator=  xpcom/base/nsCOMPtr.h:663
0  XUL  mozilla::ThreadEventQueue::PutEventInternal  xpcom/threads/ThreadEventQueue.cpp:140
0  XUL  mozilla::ThreadEventQueue::PutEvent  xpcom/threads/ThreadEventQueue.cpp:73
0  XUL  mozilla::ThreadEventTarget::Dispatch  xpcom/threads/ThreadEventTarget.cpp:122
0  XUL  nsThread::Dispatch  xpcom/threads/nsThread.cpp:676
1  XUL  nsIEventTarget::Dispatch  dist/include/nsIEventTarget.h:42
1  XUL  mozilla::camera::CamerasParent::RecvStartCapture const  dom/media/systemservices/CamerasParent.cpp:970
1  XUL  mozilla::media::LambdaRunnable<mozilla::camera::CamerasParent::RecvStartCapture  dom/media/systemservices/MediaUtils.h:77
2  XUL  nsThread::ProcessNextEvent  xpcom/ds/nsTObserverArray.h:89
3  XUL  NS_ProcessNextEvent  xpcom/threads/nsThreadUtils.cpp:473

This signature is pretty bad, but I opened a half dozen of these crashes and they all had CamerasParent::RecvStartCapture on the stack and are all on Mac OS, so I'm going to guess that this is fallout from bug 1806605.

Flags: needinfo?(apehrson)
OS: Unspecified → macOS

The guess is reasonable. And this is very similar to bug 1811226 in terms of how it crashes.

That said, I have not been able to gather from the crash reports how this could be happening, nor have I been able to reproduce anything close to this (with ASAN).

Flags: needinfo?(apehrson)

I let a high-res capture that was used in multiple tabs run for a while since it was showing significant latency. That triggered the system to run out of memory. In my case this was handled gracefully by MacOS so I have no evidence it could be the source of any problems, but it's the closest I've gotten to a lead.

Assignee: nobody → apehrson

S4 because this is only enabled in nightly and early beta.

Severity: -- → S4
Priority: -- → P2
Regressed by: 1806605

Set release status flags based on info from the regressing bug 1806605

Wontfix 110 as there is a low volume of crashes and this is gated pre-release channels.

Here are recent crashes with VideoCaptureAvFoundation on the stack.

There are two kinds of crashes in the callback:

This bug and bug 1811226 I'm guessing are symptoms of the race above involving the callback having happened.

It seems most likely that there is a race between the frame callback (on a frameQueue), and the dtor. The AVCaptureSession's stopRunning (which we already wait for) is claimed to stop capture synchronously. However, it doesn't seem unreasonable that there are still frames in flight to the frameQueue once the session has stopped, and hence there's a race.

Thanks to :farre for bouncing ideas with me here.

There's also alwaysDiscardsLateVideoFrames = NO suggesting frames can be queued up here.

Status: NEW → ASSIGNED
Duplicate of this bug: 1811226

Copying crash signatures from duplicate bugs.

Crash Signature: [@ nsCOMPtr<T>::operator= | mozilla::ThreadEventTarget::Dispatch] → [@ nsCOMPtr<T>::operator= | mozilla::ThreadEventTarget::Dispatch] [@ rtc::scoped_refptr<T>::operator=]

Leaving open so we can evaluate the effectiveness of the patch before closing.

Crash Signature: [@ nsCOMPtr<T>::operator= | mozilla::ThreadEventTarget::Dispatch] [@ rtc::scoped_refptr<T>::operator=] → [@ nsCOMPtr<T>::operator= | mozilla::ThreadEventTarget::Dispatch] [@ rtc::scoped_refptr<T>::operator=]
Keywords: leave-open
Pushed by pehrsons@gmail.com: https://hg.mozilla.org/integration/autoland/rev/f96446710a3b [libwebrtc] Wait for dispatched frame handlers to run when stopping. r=ng

Still crashing: 1 2. I'll try another thing, similar to what we do in the old backend.

This is similar to what the old backend does when stopping. Given that we do not
fully understand the failure mode yet, this is a speculative fix.

Pushed by pehrsons@gmail.com: https://hg.mozilla.org/integration/autoland/rev/caacd8bfa912 [libwebrtc] Unset output delegate before stopping. r=webrtc-reviewers,dbaker

This didn't have the desired effect per crash-stats.

This fixes applyConstraints which re-uses the backend with new settings.
Without this patch stopCapture would unset the delegate but startCapture would
not set it again.

Pushed by pehrsons@gmail.com: https://hg.mozilla.org/integration/autoland/rev/d5c555644378 [libwebrtc] Set output delegate when starting instead of during init. r=webrtc-reviewers,ng
Crash Signature: [@ nsCOMPtr<T>::operator= | mozilla::ThreadEventTarget::Dispatch] [@ rtc::scoped_refptr<T>::operator=] → [@ nsCOMPtr<T>::operator= | mozilla::ThreadEventTarget::Dispatch] [@ rtc::scoped_refptr<T>::operator=] [@ webrtc::videocapturemodule::VideoCaptureImpl::RegisterCaptureDataCallback ] [@ CMIOUnitVideoToolboxCompressorEntry ]
Crash Signature: [@ nsCOMPtr<T>::operator= | mozilla::ThreadEventTarget::Dispatch] [@ rtc::scoped_refptr<T>::operator=] [@ webrtc::videocapturemodule::VideoCaptureImpl::RegisterCaptureDataCallback ] [@ CMIOUnitVideoToolboxCompressorEntry ] → [@ nsCOMPtr<T>::operator= | mozilla::ThreadEventTarget::Dispatch] [@ rtc::scoped_refptr<T>::operator=] [@ webrtc::videocapturemodule::VideoCaptureImpl::RegisterCaptureDataCallback ]

Comment on attachment 9318229 [details]
Bug 1811641 - [libwebrtc] Set output delegate when starting instead of during init. r?#webrtc-reviewers!

Beta/Release Uplift Approval Request

  • User impact if declined: Camera reconfigs on Mac (in early beta or earlier) will cause all video tracks coming from that camera to freeze.
  • Is this code covered by automated tests?: No
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: No
  • If yes, steps to reproduce: I don't think this needs manual test from QE since it is enabled only to early beta. But if you disagree the STR is simple:
    Open https://jsfiddle.net/pehrsons/0ehb87au/
    Click start
    Verify that video is flowing after reconfigs (camera must be visibly reset, otherwise one may have to adjust the initial requested width).
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): It is moving the timing of a call on the camera backend. Threading or lifetimes are not affected.
  • String changes made/needed:
  • Is Android affected?: No
Attachment #9318229 - Flags: approval-mozilla-beta?

Thanks :phersons for the uplift request.
Do you still need the leave-open on this keyword or can this be resolved?
We can look at the uplift, but wondering if there was any other work planned for this bug?

Flags: needinfo?(apehrson)

We still have crashes occurring so we can continue trying to fix them here or open another bug if that simplifies something? Unfortunately we have no insight into these crashes other than through crash-stats, so they're tricky to debug.

Flags: needinfo?(apehrson)

(In reply to Andreas Pehrson [:pehrsons] from comment #25)

We still have crashes occurring so we can continue trying to fix them here or open another bug if that simplifies something? Unfortunately we have no insight into these crashes other than through crash-stats, so they're tricky to debug.

Thanks for the context, it gets tricky when we are tracking a fix across multiple releases but I see this is only impacts early beta for now - it won't impact 111 release.
I'll take the uplift so we can see in the beta population, though the volume of the crash is low

Comment on attachment 9318229 [details]
Bug 1811641 - [libwebrtc] Set output delegate when starting instead of during init. r?#webrtc-reviewers!

Approved for 111.0b3

Attachment #9318229 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

(In reply to Andreas Pehrson [:pehrsons] from comment #25)

We still have crashes occurring so we can continue trying to fix them here or open another bug if that simplifies something? Unfortunately we have no insight into these crashes other than through crash-stats, so they're tricky to debug.

If you fixed some of the crashes, I'd change the summary of this bug to describe a bit what you did fix, and then open a new bug for the remaining crashes, if you think there's something worth investigating. Keeping a particular bug about a specific set of patches makes it easier to understand which releases have which patches, and bugs are cheap.

...and also close this bug, I meant to add.

Attachment #9318105 - Attachment is obsolete: true

I agree, I'm just not sure we really fixed something. It does seem like we fixed this class of crashes in the frame callback though. Unfortunately the others seem unrelated.

Status: ASSIGNED → RESOLVED
Crash Signature: [@ nsCOMPtr<T>::operator= | mozilla::ThreadEventTarget::Dispatch] [@ rtc::scoped_refptr<T>::operator=] [@ webrtc::videocapturemodule::VideoCaptureImpl::RegisterCaptureDataCallback ] → [@ webrtc::videocapturemodule::VideoCaptureImpl::DeliverCapturedFrame] [@ mozilla::detail::MutexImpl::mutexLock | mozilla::detail::MutexImpl::lock | mozilla::OffTheBooksMutex::Lock | mozilla::detail::BaseAutoLock<T>::BaseAutoLock]
Closed: 2 years ago
Keywords: leave-open
Resolution: --- → FIXED
Summary: Crash in [@ nsCOMPtr<T>::operator= | mozilla::ThreadEventTarget::Dispatch] → Crash in [@ webrtc::videocapturemodule::VideoCaptureImpl::DeliverCapturedFrame]
Target Milestone: --- → 111 Branch
Blocks: 1817724
Blocks: 1825514
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: