Peer sees choppy motion/low frameRate in 1-1 call with Google Meet (regression)
Categories
(Core :: WebRTC, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr60 | --- | unaffected |
firefox64 | --- | unaffected |
firefox65 | + | fixed |
firefox66 | + | fixed |
People
(Reporter: jib, Assigned: dminor)
References
(Blocks 1 open bug)
Details
(Keywords: regression)
Attachments
(4 files, 1 obsolete file)
(deleted),
application/x-compressed-tar
|
Details | |
(deleted),
image/png
|
Details | |
(deleted),
image/png
|
Details | |
(deleted),
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-release+
|
Details |
Reporter | ||
Updated•6 years ago
|
Comment 1•6 years ago
|
||
Assignee | ||
Comment 2•6 years ago
|
||
Assignee | ||
Comment 3•6 years ago
|
||
Updated•6 years ago
|
Assignee | ||
Comment 4•6 years ago
|
||
Assignee | ||
Comment 5•6 years ago
|
||
Assignee | ||
Comment 6•6 years ago
|
||
Comment 7•6 years ago
|
||
Comment 8•6 years ago
|
||
Assignee | ||
Comment 9•6 years ago
|
||
Assignee | ||
Comment 10•6 years ago
|
||
Assignee | ||
Comment 11•6 years ago
|
||
Assignee | ||
Comment 12•6 years ago
|
||
Assignee | ||
Comment 13•6 years ago
|
||
Reporter | ||
Comment 14•6 years ago
|
||
Assignee | ||
Comment 15•6 years ago
|
||
Assignee | ||
Comment 16•6 years ago
|
||
Assignee | ||
Comment 17•6 years ago
|
||
Assignee | ||
Comment 18•6 years ago
|
||
Assignee | ||
Comment 19•6 years ago
|
||
Fwiw, I just interviewed an intern on Meet. Their audio and video was fine for me. They reported good audio but choppy video.
Assignee | ||
Comment 20•6 years ago
|
||
Hey Byron, if you can spare some time, Nils suggested you might be a good person to ask to have a look at this one. It is definitely a regression from the branch 64 update, related to simulcast and bitrate settings, but I'm having a hard time tracking it down. Even if you have some suggestions of other places in the code I should be looking, that would be very helpful. Thanks!
Assignee | ||
Comment 21•6 years ago
|
||
Just to confirm that the dropped frames are a separate issue, if I disable frame dropping in the VideoConduit [1] I see the dropped frames reported by SendStatisticsProxy [2] go to zero, but the video is still choppy.
[1] https://searchfox.org/mozilla-central/rev/c3ebaf6de2d481c262c04bb9657eaf76bf47e2ac/media/webrtc/signaling/src/media-conduit/VideoConduit.cpp#711
[2] https://searchfox.org/mozilla-central/rev/c3ebaf6de2d481c262c04bb9657eaf76bf47e2ac/media/webrtc/trunk/webrtc/video/send_statistics_proxy.cc#582
Assignee | ||
Comment 22•6 years ago
|
||
Randell, could you have another look at this? This will be on release very soon and I don't think I'm any closer to tracking it down. Thanks!
Assignee | ||
Comment 23•6 years ago
|
||
I spent some time looking at the values reported by BitrateControllerImpl::AvailableBandwidth. With the branch update, the estimate for the available bandwidth grows much more quickly, but if I leave the call running for a few minutes, both converge to similar values, e.g. 5668096 with the branch update and 5863040 without. The other difference I see is that the RTT with the branch update is around 40, without it is 56.
Assignee | ||
Comment 24•6 years ago
|
||
Looking at the receive side in Chrome, without the branch update, the graphs for googFrameRateReceived, googFrameRateOutput, and googFrameRateDecoded show similar values, typically around 30 FPS with occasional drops to 15 or 20 FPS.
With the branch update, googFrameRateReceived still shows 30FPS with occasional drops, but googFrameRateOutput and googFrameRateDecoded average around 9FPS with lots of fluctuations.
Comment 25•6 years ago
|
||
Dan - can you screenshot the results from comment 24? and if possible get a wireshark trace? Probably a raw wireshark trace, though a trace drno's packet-dump pref wouldn't hurt.
Also a webrtc_trace dump... did you ever hook up the new webrtc log messages to webrtc_trace:5?
Assignee | ||
Comment 26•6 years ago
|
||
Assignee | ||
Comment 27•6 years ago
|
||
I've attached the screenshots illustrating the difference. I've confirmed by adding additional logging that we see the same thing if Firefox is the receiver: the network framerate stays around 30fps, while the decoded framerate drops to 8 or 9.
The framerate problems seem to show up when we switch to the high resolution stream on Meet. I see bits being allocated to it on the send side, and the video resolution increases on the receive side, and then the decoded framerate drops.
I've also briefly looked at appear.in. It also uses three streams, but it seems to stay on the low quality stream and does hit the same problems.
I'll gather the requested traces.
Assignee | ||
Comment 28•6 years ago
|
||
I put the traces up at: https://file.pizza/rosemary-capers-prawn-salami
I have a wireshark capture as well as a webrtc log with rtplogger enabled. The new tracing code (TRACE_EVENT0 and friends) is not hooked up to the logs, I had a quick look and it did not seem straightforward to get it working.
Assignee | ||
Comment 29•6 years ago
|
||
If I log where frames are inserted into the frame buffer [1] and where frames leave the frame buffer [2], on a recent run I see 816 frames go in and 317 frames leave. (317/816)*30fps ~ 11fps, which roughly matches the observed framerate in both Firefox and Chrome. I'm not sure where the missing frames are going, I only see a handful of messages in the log mentioned frames being dropped, not nearly enough to explain the discrepancy.
Clearing the NI to Byron, when I first asked this seemed like a simulcast bitrate problem, that no longer seems likely.
[1] https://searchfox.org/mozilla-central/rev/7adb490485eff0783071a3e132005bceeb337461/media/webrtc/trunk/webrtc/modules/video_coding/frame_buffer2.cc#289
[2] https://searchfox.org/mozilla-central/rev/7adb490485eff0783071a3e132005bceeb337461/media/webrtc/trunk/webrtc/modules/video_coding/frame_buffer2.cc#195
Assignee | ||
Comment 30•6 years ago
|
||
I'm not able to get this to reproduce on calls between Firefox 64 and Nightly, only between Nightly and Nightly or Nightly and Chrome 70. I ran a call between Firefox 64 and Nightly for 20 minutes without it reproducing, then left the call with Firefox 64 and rejoined with Chrome on the same laptop and it reproduced almost immediately.
Reporter | ||
Updated•6 years ago
|
Assignee | ||
Comment 31•6 years ago
|
||
After spending some more time on this, I don't think we're actually dropping frames, it looks like we start requesting them more slowly. I can see frames start building up in the frame buffer.
It also doesn't seem directly tied to resolution. Sometimes my OS X laptop sends 640x480 rather than 1280x720 on meet as the highest resolution stream, and I still see the discrepancy between network framerate and decoded framerate.
Assignee | ||
Comment 32•6 years ago
|
||
I added some logging about how long it takes to get a frame from the frame buffer when receiving:
With the branch 64 update, I see a mean of 50ms, a stddev of 113ms, and a max of 930ms.
Without the update applied, I see a mean of 30ms, a stddev of 18ms, and a max of 152ms.
From previous testing, as far as I can tell, we're inserting frames into the frame buffer at the same rate in each case and it does not appear that we are dropping frames.
Interestingly, as mentioned above, this does not reproduce between Nightly and Firefox 64. I double checked, and Firefox 64 is using the old jitter buffer implementation, not frame_buffer2.cc which is what we use in Nightly after the branch update (and presumably is what is used by Chrome.)
If this was just on the receive side, I would expect one side of the Nightly and Firefox 64 call to be affected, as well as Chrome to Chrome calls. Since this is not the case, it seems likely it is an interaction between send side and the newer frame buffer implementation.
Assignee | ||
Comment 33•6 years ago
|
||
After a bit of digging, it looks like on the receive side we're skipping frames here [1] due to missing decodable frames vastly more often in the branch update case than in the preupdate case (17280 times in a 28632 line log file vs 452 times in a 29084 line log file.)
So we are dropping frames, I missed it the earlier because most places where frames are dropped have a preexisting webrtc log statement, but this one doesn't, perhaps because it does happen occasionally under normal circumstances.
Comment 34•6 years ago
|
||
(In reply to Dan Minor [:dminor] from comment #33)
After a bit of digging, it looks like on the receive side we're skipping frames here [1] due to missing decodable frames vastly more often in the branch update case than in the preupdate case (17280 times in a 28632 line log file vs 452 times in a 29084 line log file.)
So we are dropping frames, I missed it the earlier because most places where frames are dropped have a preexisting webrtc log statement, but this one doesn't, perhaps because it does happen occasionally under normal circumstances.
Have you added logging to check if the continous flag or the num_missing_decodable makes us enter the if condition?
I would guess that the counting for num_missing_decodable is somehow off.
Assignee | ||
Comment 35•6 years ago
|
||
It looks like it is always missing decodable frames. If the continuous flag is sometimes false, it's happening much less often.
Comment 36•6 years ago
|
||
Unfortunately we're out of time for a fix this cycle. Hopefully we can get to the bottom of it for 66 still.
Assignee | ||
Comment 37•6 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM] from comment #36)
Unfortunately we're out of time for a fix this cycle. Hopefully we can get to the bottom of it for 66 still.
Yes, sorry.
As an update, we've reached out to the Meet team to see if they have any logs or anything on their side that might help us diagnose this problem.
I've also been trying to track down why we're unable to decode frames. Unfortunately, the Meet SFU rewrites picture ids, which would be the easiest way to correlate frames from the sender to the receiver. I've been working on logging state changes for each frame in the frame buffer and then using a script to isolate cases where we end up with a large number of non-decodable frames. I'm hoping to see a pattern in frames that we're unable to decode and then be able to use more logging or Wireshark to correlate this back to a problem on the send side.
As far as I can tell, the picture ids from Meet start at zero, don't have gaps and show up in order, so I don't think the problem is that we are just missing frames or receiving them too late. I'll spend some time this morning to double check this before continuing.
Assignee | ||
Comment 38•6 years ago
|
||
Saeed from the Meet team pointed out that the timestamps on the receive side seemed incorrect but they are fine on the send side. It turns out the Meet SFU rewrites timestamps based upon RTCP sender reports.
Unfortunately, I ended up spending time duplicating Saeed's work. For the record:
Looking at the delta between timestamps on the send side:
min|max |mean|std. dev
900|16740|3059|1004
900|6390 |3017|797
900|10440|3026|820
And on the receive side:
min|max |mean|std. dev
1 |88896|3055|9953
So the rewriting has a pretty big impact on the distribution of the timestamps.
Hopefully this explains the problems we are seeing.
Assignee | ||
Comment 39•6 years ago
|
||
There is a difference in the rtp timestamp estimated in the sender report. The last_frame_capture_time_ms_ value at [1] is always zero in affected versions of Firefox, which means the rtp timestamps in the sender report do not match up with the rtp timestamps when a video packet is sent. I'll track down why it is not being set.
If I change the code at [2] to assign a value to last_frame_capture_time_ms_ based upon the current clock time, I can confirm that the bug does not reproduce and the video is smooth.
[1] https://searchfox.org/mozilla-central/rev/6c784c93cfbd5119ed07773a170b59fbce1377ea/media/webrtc/trunk/webrtc/modules/rtp_rtcp/source/rtcp_sender.cc#482
[2] https://searchfox.org/mozilla-central/rev/6c784c93cfbd5119ed07773a170b59fbce1377ea/media/webrtc/trunk/webrtc/modules/rtp_rtcp/source/rtcp_sender.cc#283
Assignee | ||
Comment 40•6 years ago
|
||
Prior to the branch 64 update, timestamps were set in ViEEncoder::OnFrame unconditionally [1]. After the update, the equivalent code is in VideoStreamEncoder::OnFrame, however rather than setting the timestamp unconditionally, it is only done if it is greater than the current time.
If I comment out the if statement so that the timestamp is set all of the time then the bug no longer reproduces.
Note that [1] is from an upstream import commit and does not include any of our local modifications. This was just a case of a behavioural change in the upstream code that we didn't catch.
[1] https://hg.mozilla.org/integration/mozilla-inbound/file/fda2b2655b26/media/webrtc/trunk/webrtc/video/vie_encoder.cc#l461
[2] https://searchfox.org/mozilla-central/rev/6c784c93cfbd5119ed07773a170b59fbce1377ea/media/webrtc/trunk/webrtc/video/video_stream_encoder.cc#681
Assignee | ||
Comment 41•6 years ago
|
||
Assignee | ||
Comment 42•6 years ago
|
||
In the past we relied upon ViEEncoder::OnFrame to set the render time for
frames. With the branch 64 update, this code moved to
VideoStreamEncoder::OnFrame, and only sets the timestamp if it is greater than
the current time. This results in broken rtp timestamp estimates in the rtcp
sender report, which causes problems for Meet and possibly other services
that rewrite rtp timestamps based upon the sender report.
This patch explicitly sets the timestamp in VideoConduit::SendVideoFrame. This
should give us the same behaviour that we had before the branch update without
requiring local modifications to upstream code.
Assignee | ||
Comment 43•6 years ago
|
||
(In reply to Dan Minor [:dminor] from comment #41)
Try job here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b0b047c3b1351304f5d819c0bb316d417aafc369
This has gtests failures because they are testing explicit values of the timestamp rather than just checking that it is increasing. I've filed Bug 1522238 to do a proper fix for this.
Assignee | ||
Comment 44•6 years ago
|
||
In the past we relied upon ViEEncoder::OnFrame to set the render time for
frames. With the branch 64 update, this code moved to
VideoStreamEncoder::OnFrame, and only sets the timestamp if it is greater than
the current time. This results in broken rtp timestamp estimates in the rtcp
sender report, which causes problems for Meet and possibly other services
that rewrite rtp timestamps based upon the sender report.
This patch makes VideoStreamEncoder::OnFrame always set the timestamp. In a
follow on bug, we'll move this behaviour to VideoConduit so we don't have to
maintain a local modification of the upstream code.
Updated•6 years ago
|
Assignee | ||
Comment 45•6 years ago
|
||
Comment 46•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Comment 47•6 years ago
|
||
[Tracking Requested - why for this release]: this is a regression from Firefox 64, which might impact a lot of (Hangout) users.
Updated•6 years ago
|
Updated•6 years ago
|
Comment 48•6 years ago
|
||
For some unknown reason the explanation when filling the beta uplift request does not show up here. Manually copying stuff in a comment here instead:
Feature/Bug causing the regression: 1376873
User impact if declined: The video of Firefox users on Google Hangouts calls appears with very bad quality (low framerate) and freezes very frequently.
Is this code covered by automated tests? Yes
Has the fix been verified in Nightly? No
Needs manual test from QE? No
If yes, steps to reproduce:
List of other uplifts needed: N/A
Risk to taking this patch: Low
Why is the change risky/not risky?
(and alternatives if risky): The update of libwebrtc from upstream to version 64 (in bug 1376873) came with one new additional if condition. We are only restoring the previous code here by removing this if condition and restore the behavior from before the update.
String changes made/needed: N/A
Updated•6 years ago
|
Comment 49•6 years ago
|
||
bugherder |
Comment 50•6 years ago
|
||
Comment on attachment 9038641 [details]
Bug 1515205 - Always set frame timestamps in VideoStreamEncoder::OnFrame; r=drno!
[Triage Comment]
Reverts us to the behavior we had before the last WebRTC upstream update. Nils assures me this was also heavily tested by people on his team across different sites. Approved for 65.0 RC2.
Comment 51•6 years ago
|
||
bugherder uplift |
Comment 52•6 years ago
|
||
(In reply to Nils Ohlmeier [:drno] from comment #48)
Is this code covered by automated tests? Yes
Needs manual test from QE? No
Marking as qe-verify- since it is covered by automated tests.
Description
•