Closed Bug 1397539 Opened 7 years ago Closed 2 years ago

H264 rendering fails after several calls

Categories

(Core :: Audio/Video: GMP, defect, P3)

49 Branch
defect

Tracking

()

RESOLVED FIXED

People

(Reporter: drno, Unassigned)

References

()

Details

Attachments

(1 file)

After making about 10-20 calls on http://mozilla.github.io/webrtc-landing/pc_test.html with "Require H264" turned on remote video no longer gets rendered.

I repeatedly start and stop the call without reloading the page.

After several starts (10-20?) the call connects according to about:webrtc and the ICE stats indicate that data gets exchanged, but both remote video elements never start rendering anything.

When in this "blank white" state I turn off "Require H264" I get proper video rendering (with VP8) again. But as soon as I turn "Require H264" back on rendering still fails.

Reloading the page does not solve the problem. Closing the tab and trying it in another tab does not solve the problem.

This is with OpenH264 1.6, but also with the upcoming 1.7.1 releases.
I create a test page to repro this: https://nils-ohlmeier.github.io/webrtc-landing/pc_test_loop_h264.html

- Open the link
- give access to cam and click the check box for future permission

On my first try it took 27 connections until I start to see white boxes for both remote videos (the bigger videos).

about:webrtc suggests that no RTP gets exchanged (neither send or received) once this stage has been reached.
Mass change P1->P2 to align with new Mozilla triage process
Priority: P1 → P2
we reproduced the issue using the demo : https://nils-ohlmeier.github.io/webrtc-landing/pc_test_loop_h264.html
the log has been uploaded.
Attached file firefox_openh264_not_work.rar (deleted) β€”
We have a couple of different issues relating to H264 now. Byron could you investigate these? I'm guessing they all stem from the same root cause.
Flags: needinfo?(docfaraday)
Not anytime soon. I might be able to give this a look next week?
Flags: needinfo?(docfaraday)
That's fine. Can we keep the needinfo so it doesn't fall off the charts?
Flags: needinfo?(docfaraday)
This seems to happen when we start encoding frames before GMP is done de-initing from the previous test... looking further, but likely a bug in either the GMP code, or the wrapper code we use to interface with GMP.
Yeah, this definitely happens when the GMPParent has not been closed yet when we try to start a new call; the encoder/decoder has been torn down, but the logic in the GMPParent thinks the h264 process is still open and doesn't re-init it. This is almost certainly a bug in either the GMP code proper, or maybe the openh264 stuff.
Pushing further back, this seems to happen when ActorDestroy has not been called on both encoder parents and both decoder parents before starting the new call. This does not appear to be an issue of not waiting long enough to start the new call; either all of the expected ActorDestroy calls are made in short order, or they are simply not made regardless of how long we wait.
Flags: needinfo?(docfaraday)
We are deadlocking in the plugin process here:

https://searchfox.org/mozilla-central/source/dom/media/gmp/GMPVideoEncoderChild.cpp#186
I see the following code at the end of Encode_w (the encode function that is called on the worker thread on the plugin process) in the openh264 code:


    if (!has_frame) {
      // This frame must be destroyed on the main thread.
      TrySyncRunOnMainThread (WrapTask (
                                   this,
                                   &OpenH264VideoEncoder::DestroyInputFrame_m,
                                   inputImage));
      return;
    }

    // Synchronously send this back to the main thread for delivery.
    TrySyncRunOnMainThread (WrapTask (
                                   this,
                                   &OpenH264VideoEncoder::Encode_m,
                                   inputImage,
                                   &encoded,
encoded_type));

We are sync dispatching to main when we're done encoding a frame. Here's the implementation of EncodingComplete:

  virtual void EncodingComplete() {
    shutting_down = true;

    // Release the reference to the external objects, because it is no longer safe to call them
    host_     = nullptr;
    callback_ = nullptr;
    TearDownEncoder();

    Release();
}

TearDownEncoder is implemented thusly:

   void TearDownEncoder() {
     // Stop the worker thread first
     if (worker_thread_) {
       worker_thread_->Join();
       worker_thread_ = nullptr;
     }

     // Destroy OpenH264 encoder
     if (encoder_) {
       WelsDestroySVCEncoder(encoder_);
       encoder_ = nullptr;
     }
   }


This code is called on main. If we get a call to EncodingComplete while we are encoding a frame, "worker_thread_->Join()" will never return, because the worker thread is trying to sync dispatch back to main.
I should note that, when we deadlock, the IPC logging looks like this:

[time: 1533313793767228][49950->49965] [PGMPVideoEncoderParent] Sending  PGMPVideoEncoder::Msg_Encode
[time: 1533313793767359][49950->49965] [PGMPVideoEncoderParent] Sending  PGMPVideoEncoder::Msg_EncodingComplete
[time: 1533313793767397][49965<-49950] [PGMPVideoEncoderChild] Received  PGMPVideoEncoder::Msg_Encode
[time: 1533313793769270][49965<-49950] [PGMPVideoEncoderChild] Received  PGMPVideoEncoder::Msg_EncodingComplete

This seems to line up with the openh264 bug I described in comment 12.
I have filed https://github.com/cisco/openh264/issues/3007
Blocks: 1486988
Assignee: nobody → hankpeng
This bug is supposed to be fixed by the new release v1.8.1.
Status: NEW → ASSIGNED
Depends on: 1533001

The bug assignee didn't login in Bugzilla in the last 7 months.
:bryce, could you have a look please?
For more information, please visit auto_nag documentation.

Assignee: hankpeng → nobody
Status: ASSIGNED → NEW
Flags: needinfo?(bvandyk)

It sounds like this should be fixed in the current OpenH264 release and needs to be verified. :bwc, it sounds like you've previously reproed this and are more familiar with the domain than I. Would you be able to test this again and close or confirm this isn't fixed as appropriate?

Severity: normal → S3
Flags: needinfo?(bvandyk) → needinfo?(docfaraday)
Priority: P2 → P3

I have not had any luck reproducing this.

Status: NEW → RESOLVED
Closed: 2 years ago
Flags: needinfo?(docfaraday)
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: