Closed Bug 1515205 Opened 6 years ago Closed 6 years ago

Peer sees choppy motion/low frameRate in 1-1 call with Google Meet (regression)

Categories (Core :: WebRTC, defect, P2)

Product: Core
Component: WebRTC
Version: 65 Branch
Platform: Unspecified, macOS
Type: defect
Priority: P2
Severity: major

Tracking

Status: RESOLVED FIXED
Milestone: mozilla66

Tracking Flags:

                Tracking  Status
firefox-esr60   ---       unaffected
firefox64       ---       unaffected
firefox65       +         fixed
firefox66       +         fixed

People

(Reporter: jib, Assigned: dminor)

References

(Blocks 1 open bug)

Details

(Keywords: regression)

Attachments

(4 files, 1 obsolete file)

webrtc_traces.tgz 6 years ago Dan Minor [:dminor] (deleted), application/x-compressed-tar		Details
webrtc-internals from pre-branch update 6 years ago Dan Minor [:dminor] (deleted), image/png		Details
webrtc-internals from post-branch update 6 years ago Dan Minor [:dminor] (deleted), image/png		Details
Bug 1515205 - Set frame timestamp in VideoConduit::SendVideoFrame; r=drno! 6 years ago Dan Minor [:dminor] (deleted), text/x-phabricator-request		Details
Bug 1515205 - Always set frame timestamps in VideoStreamEncoder::OnFrame; r=drno! 6 years ago Dan Minor [:dminor] (deleted), text/x-phabricator-request	RyanVM : approval-mozilla-release+	Details

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Reporter

Description

•

6 years ago

STRs:
1. In Nightly, create a new meeting, e.g. https://meet.google.com/djy-ehkm-gmr?authuser=1
2. Have a (non-local) friend join meeting using Chrome (unclear if browser is relevant).
3. Wave to friend, moving hand slowly back and forth, for a good 20 seconds.

Expected result:
- Friend reports seeing smooth motion.

Actual result:
- Friend reports seeing choppy motion / low frame rate.

Regression range:
Narrowed inbound regression window from [faf1cfd8, 983a36d2] (5 builds) to [faf1cfd8, 1f47cb09] (3 builds) (~1 steps left)
Pushlog: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=faf1cfd83dd5a3359edfd6d7e66a7123a84bb031&tochange=1f47cb09f1e64ffa9d530235eb8cfb0b3ae65f3c
-> Bug 1376873.

(Full disclosure: above was second to last bisect. Final pointed to Bug 1504867, a mozregression bug?)

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Reporter

Updated

•

6 years ago

Blocks: 1376873

status-firefox64: --- → unaffected

status-firefox65: --- → affected

status-firefox66: --- → affected

status-firefox-esr60: --- → unaffected

status-geckoview64: --- → unaffected

status-geckoview65: --- → affected

tracking-firefox65: --- → ?

tracking-firefox66: --- → ?

Randell Jesup [:jesup] (needinfo me)

Comment 1

•

6 years ago

A good set of logs before and after (with webrtc_trace logs) and a wireshark trace would help a lot in tracking down why this happens.

Dan Minor [:dminor]

Assignee

Comment 2

•

6 years ago

Thank you for running the mozregression. It sounds like this may be bandwidth related, if so, hopefully the Network Link Conditioner tool in OS X will make this reproduce.

Dan Minor [:dminor]

Assignee

Comment 3

•

6 years ago

Fwiw, I'm seeing very choppy video on Meet just on a local call. Meet was one of the services I tested prior to landing the update, I wonder if something has changed on their end that has exposed the bug in Firefox. I don't see the same problem with appear.in locally. That suggests that the problem that Randell is seeing is a separate bug.

Ryan VanderMeulen [:RyanVM]

Updated

•

6 years ago

status-geckoview64: unaffected → ---

status-geckoview65: affected → ---

tracking-firefox65: ? → +

tracking-firefox66: ? → +

Dan Minor [:dminor]

Assignee

Comment 4

•

6 years ago

Randell, are you using OS X when you see problems with calls? I see choppy video from OS X to my Linux desktop and laptop, but not between the Linux machines or from Linux back to OS X. We did switch video capture backends on OS X with this update (Bug 1439997) but it doesn't seem likely that is the source of the problem or it would show up more consistently across services. Local video on the OS X laptop seems fine as well. I spent some time looking at maxMbps/maxFps in VideoConduit and the initial bandwidth allocations in VideoConduit and in the simulcast code just before and after the update. The code has changed, but the resulting calculations seemed to match up. I guess the next step is to see if there is something wrong with the bandwidth estimation as the call progresses.

Flags: needinfo?(rjesup)

Dan Minor [:dminor]

Assignee

Comment 5

•

6 years ago

Bandwidth estimation (as reported to Call::OnNetworkChanged by SendSideCongestionController) converges to similar values pre- and post-update.

Dan Minor [:dminor]

Assignee

Comment 6

•

6 years ago

Attached file webrtc_traces.tgz (deleted) — Details

Log files "webrtc_trace:5" from the OS X machine sending the choppy video. preupdate.log is from a build just before branch update landed. update.log is from a build of today's mozilla central.

Some interesting changes (only present in update.log):

[Child 97579: Main Thread]: D/webrtc_trace (alr_detector.cc:112): Using ALR experiment settings: pacing factor: 1, max pacer queue length: 2875, ALR start bandwidth usage percent: 80, ALR end budget level percent: 40, ALR end budget level percent: -60, ALR experiment group ID: 3

[Child 97579: Main Thread]: D/webrtc_trace (aimd_rate_control.cc:74): Using aimd rate control with back off factor 0.85

[Child 97579: Main Thread]: D/webrtc_trace (delay_based_bwe.cc:107): Using Trendline filter for delay change estimation with window size 20

I've done some experimentation (e.g. disabling the ALR experiment) and none of these seem to have an effect.

Randell Jesup [:jesup] (needinfo me)

Comment 7

•

6 years ago

(In reply to Dan Minor [:dminor] from comment #4)
> Randell, are you using OS X when you see problems with calls?

No. Win10

> I see choppy video from OS X to my Linux desktop and laptop, but not between
> the Linux machines or from Linux back to OS X. We did switch video capture
> backends on OS X with this update (Bug 1439997) but it doesn't seem likely
> that is the source of the problem or it would show up more consistently
> across services. Local video on the OS X laptop seems fine as well.

When I see problems with Vicky (who is on Mac), I see audio freezes/glitches as well. Also resolution drops after them, implying network issues... or issue we have dealing with minor network hiccups. Overall it's FAR worse than I've seen in the past - and far worse than Vidyo on the same machine in the same location, to the same person.

> I spent some time looking at maxMbps/maxFps in VideoConduit and the initial
> bandwidth allocations in VideoConduit and in the simulcast code just before
> and after the update. The code has changed, but the resulting calculations
> seemed to match up. I guess the next step is to see if there is something
> wrong with the bandwidth estimation as the call progresses.

Also, these happen periodically throughout the call. It seems like poor reaction to probing over the BW limit.

Flags: needinfo?(rjesup)

Randell Jesup [:jesup] (needinfo me)

Comment 8

•

6 years ago

> Some interesting changes (only present in update.log):
>
> [Child 97579: Main Thread]: D/webrtc_trace (alr_detector.cc:112): Using ALR
> experiment settings: pacing factor: 1, max pacer queue length: 2875, ALR
> start bandwidth usage percent: 80, ALR end budget level percent: 40, ALR end
> budget level percent: -60, ALR experiment group ID: 3
>
> [Child 97579: Main Thread]: D/webrtc_trace (aimd_rate_control.cc:74): Using
> aimd rate control with back off factor 0.85
>
> [Child 97579: Main Thread]: D/webrtc_trace (delay_based_bwe.cc:107): Using
> Trendline filter for delay change estimation with window size 20
>
> I've done some experimentation (e.g. disabling the ALR experiment) and none
> of these seem to have an effect.

Why are we using ALR experiments in regular calls? Probably this (i.e. screenshare only) is what's causing the first log:

  if (experiment_name == kScreenshareProbingBweExperimentName) {
    // This experiment is now default-on with fixed settings.

ALR is for cases where we're not filling the pipe - though if it is accidentally enabled for regular calls, it could be at fault.

wireshark traces and coordinated logs I think are needed

Flags: needinfo?(dminor)

Dan Minor [:dminor]

Assignee

Comment 9

•

6 years ago

I've captured logs with wireshark traces (using RtpLogger, I assume that is what you meant), but they're too big to attach to this bug. It's a bit of a needle in a haystack, any suggestion of what I should be looking out for?

Flags: needinfo?(dminor)

Dan Minor [:dminor]

Assignee

Comment 10

•

6 years ago

The choppy video I'm seeing is due to frames being dropped by the encoder internally, as reported here [1]. I added a logging statement and see a burst of dropped frames that corresponds to choppy video on the receiver side. [1] https://searchfox.org/mozilla-central/rev/0ee0b63732d35d16ba22d5a1120622e2e8d58c29/media/webrtc/trunk/webrtc/modules/video_coding/generic_encoder.cc#309

Dan Minor [:dminor]

Assignee

Comment 11

•

6 years ago

(In reply to Dan Minor [:dminor] from comment #10)
> The choppy video I'm seeing is due to frames being dropped by the encoder
> internally, as reported here [1]. I added a logging statement and see a
> burst of dropped frames that corresponds to choppy video on the receiver
> side.
>
> [1]
> https://searchfox.org/mozilla-central/rev/0ee0b63732d35d16ba22d5a1120622e2e8d58c29/media/webrtc/trunk/webrtc/modules/video_coding/generic_encoder.cc#309

Maybe this is just a coincidence, the choppiness seems to continue long after I see a frame dropped.

Dan Minor [:dminor]

Assignee

Comment 12

•

6 years ago

This is also occurring with video sent from my Windows laptop, so not an OS X specific issue.

Dan Minor [:dminor]

Assignee

Comment 13

•

6 years ago

There might be something going on with the order in which participants join. I always see choppy video from the second participant back to the first participant, so if I start the meeting on my OS X laptop and then join from Linux, the video from Linux appears choppy on OS X.

It also seems to be related to simulcast. If I hard code streamCount = 1 here [1], I have yet to see any problems.

If I log target bitrates here [2] or here [3], I initially see two streams, e.g.

[60000, 30000, 60000], [60000, 30000, 60000],

which increases to:

[60000, 30000, 60000], [422000, 212000, 422000],

at which point a third stream is added:

[60000, 30000, 60000], [200000, 100000, 200000], [253000, 126000, 253000]

which increases to:

[60000, 30000, 60000], [200000, 100000, 200000], [1000000, 500000, 1000000]

Not conclusive, but it looks like the choppiness starts when we switch from two streams to three. Sometimes that happens earlier in the call, sometimes later.

[1] https://searchfox.org/mozilla-central/rev/ecf61f8f3904549f5d65a8a511dbd7ea4fd1a51d/media/webrtc/signaling/src/media-conduit/VideoStreamFactory.cpp#131
[2] https://searchfox.org/mozilla-central/rev/0ee0b63732d35d16ba22d5a1120622e2e8d58c29/media/webrtc/trunk/webrtc/modules/video_coding/generic_encoder.cc#121
[3] https://searchfox.org/mozilla-central/rev/0ee0b63732d35d16ba22d5a1120622e2e8d58c29/media/webrtc/trunk/webrtc/modules/video_coding/codecs/vp8/vp8_impl.cc#260
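
To line up the onset of choppiness with the two-to-three stream switch, one option is to scan the logged allocations for the first entry that carries a third stream. The following is a minimal sketch, assuming the log has already been parsed into one list of triples per log entry; the parsing step and the name `first_index_with_streams` are hypothetical, and the values are the ones quoted above.

```python
from typing import List, Optional, Sequence

# One logged entry: a list with one triple of bitrates per simulcast stream.
Allocation = List[Sequence[int]]


def first_index_with_streams(allocations: List[Allocation], count: int) -> Optional[int]:
    """Return the index of the first logged entry that has `count` streams."""
    for index, allocation in enumerate(allocations):
        if len(allocation) == count:
            return index
    return None


# Values copied from the log excerpts quoted in this comment.
logged = [
    [[60000, 30000, 60000], [60000, 30000, 60000]],
    [[60000, 30000, 60000], [422000, 212000, 422000]],
    [[60000, 30000, 60000], [200000, 100000, 200000], [253000, 126000, 253000]],
    [[60000, 30000, 60000], [200000, 100000, 200000], [1000000, 500000, 1000000]],
]

# Index of the entry where the third stream first appears.
switch_index = first_index_with_streams(logged, 3)
```

Pairing `switch_index` with the log timestamp of that entry would give a point in the call to compare against the moment the receiver starts reporting choppy video.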

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Reporter

Comment 14

•

6 years ago

System info:

Model Name: MacBook Pro
Model Identifier: MacBookPro13,3
Processor Name: Intel Core i7
Processor Speed: 2.9 GHz
Number of Processors: 1
Total Number of Cores: 4
L2 Cache (per Core): 256 KB
L3 Cache: 8 MB
Memory: 16 GB
Boot ROM Version: 250.0.0.0.0
SMC Version (system): 2.38f7

Chipset Model: AMD Radeon Pro 460
Type: GPU
Bus: PCIe
PCIe Lane Width: x8
VRAM (Dynamic, Max): 4096 MB
Vendor: AMD (0x1002)
ROM Revision: 113-C980AF-908
VBIOS Version: 113-C9801AU-029
EFI Driver Version: 01.A0.908
Automatic Graphics Switching: Supported
gMux Version: 4.0.29 [3.2.8]
Metal: Supported, feature set macOS GPUFamily1 v3
Displays:
  Color LCD:
    Display Type: Built-In Retina LCD
    Resolution: 2880 x 1800 Retina
    Framebuffer Depth: 24-Bit Color (ARGB8888)
    Main Display: Yes
    Mirror: Off
    Online: Yes
    Rotation: Supported
    Automatically Adjust Brightness: No

Dan Minor [:dminor]

Assignee

Comment 15

•

6 years ago

I asked Nico for help on this but it does not reproduce for him at all, other than some dropped frames. It is possible this is a performance regression, that would be consistent with the problem starting when we start allocating bits to all three streams, and would then also depend upon the systems being used and OS load. I think when jib first saw this problem, he also had problems on appear.in that might have been compositor related. If there was a temporary performance regression elsewhere in Firefox perhaps that was enough to make this problem show up as well. My macbook is fairly old, a mid-2015 model with Intel Iris Pro integrated graphics. I think I see a bit of a slow down with my windows laptop, which is beefier, a core i7-4800 @2.7GHz, Quadro K2100, 16 GB ram, but not as severe as with the macbook. I did check that my macbook is not using hardware accelerated VP8 encoding either before or after the update. I also double checked and made sure I did not accidentally leave the network link conditioner enabled on it.

Dan Minor [:dminor]

Assignee

Comment 16

•

6 years ago

I've grabbed some perf.html profiles just in case there was something noticeably different pre- and post- webrtc.org update:

Prior to the webrtc.org update here: https://perfht.ml/2C2Rp8Z
Post webrtc.org update here: https://perfht.ml/2R82oIs

The only thing webrtc related I noticed in the profiles is the RTCStatsQuery, but that seems to take roughly the same amount of time both pre and post update.

Dan Minor [:dminor]

Assignee

Comment 17

•

6 years ago

I did some quick testing and it seems like the audio is not affected even when the video is getting choppy.

Dan Minor [:dminor]

Assignee

Comment 18

•

6 years ago

I've filed Bug 1518125 for the problems that Randell has reported as it seems increasingly likely that is a separate issue.

Dan Minor [:dminor]

Assignee

Comment 19

•

6 years ago

Fwiw, I just interviewed an intern on Meet. Their audio and video was fine for me. They reported good audio but choppy video.

Dan Minor [:dminor]

Assignee

Comment 20

•

6 years ago

Hey Byron, if you can spare some time, Nils suggested you might be a good person to ask to have a look at this one. It is definitely a regression from the branch 64 update, related to simulcast and bitrate settings, but I'm having a hard time tracking it down. Even if you have some suggestions of other places in the code I should be looking, that would be very helpful. Thanks!

Flags: needinfo?(docfaraday)

Dan Minor [:dminor]

Assignee

Comment 21

•

6 years ago

Just to confirm that the dropped frames are a separate issue, if I disable frame dropping in the VideoConduit [1] I see the dropped frames reported by SendStatisticsProxy [2] go to zero, but the video is still choppy.

[1] https://searchfox.org/mozilla-central/rev/c3ebaf6de2d481c262c04bb9657eaf76bf47e2ac/media/webrtc/signaling/src/media-conduit/VideoConduit.cpp#711
[2] https://searchfox.org/mozilla-central/rev/c3ebaf6de2d481c262c04bb9657eaf76bf47e2ac/media/webrtc/trunk/webrtc/video/send_statistics_proxy.cc#582

Dan Minor [:dminor]

Assignee

Comment 22

•

6 years ago

Randell, could you have another look at this? This will be on release very soon and I don't think I'm any closer to tracking it down. Thanks!

Flags: needinfo?(rjesup)

Dan Minor [:dminor]

Assignee

Comment 23

•

6 years ago

I spent some time looking at the values reported by BitrateControllerImpl::AvailableBandwidth. With the branch update, the estimate for the available bandwidth grows much more quickly, but if I leave the call running for a few minutes, both converge to similar values, e.g. 5668096 with the branch update and 5863040 without. The other difference I see is that the RTT with the branch update is around 40, without it is 56.

Dan Minor [:dminor]

Assignee

Updated

•

6 years ago

Blocks: meet

Dan Minor [:dminor]

Assignee

Comment 24

•

6 years ago

Looking at the receive side in Chrome, without the branch update, the graphs for googFrameRateReceived, googFrameRateOutput, and googFrameRateDecoded show similar values, typically around 30 FPS with occasional drops to 15 or 20 FPS.

With the branch update, googFrameRateReceived still shows 30FPS with occasional drops, but googFrameRateOutput and googFrameRateDecoded average around 9FPS with lots of fluctuations.

Randell Jesup [:jesup] (needinfo me)

Comment 25

•

6 years ago

Dan - can you screenshot the results from comment 24? And if possible get a wireshark trace? Probably a raw wireshark trace, though a trace from drno's packet-dump pref wouldn't hurt.

Also a webrtc_trace dump... did you ever hook up the new webrtc log messages to webrtc_trace:5?

Flags: needinfo?(rjesup) → needinfo?(dminor)

Dan Minor [:dminor]

Assignee

Comment 26

•

6 years ago

Attached image webrtc-internals from pre-branch update (deleted) — Details

Dan Minor [:dminor]

Assignee

Comment 27

•

6 years ago

Attached image webrtc-internals from post-branch update (deleted) — Details

I've attached the screenshots illustrating the difference. I've confirmed by adding additional logging that we see the same thing if Firefox is the receiver: the network framerate stays around 30fps, while the decoded framerate drops to 8 or 9.

The framerate problems seem to show up when we switch to the high resolution stream on Meet. I see bits being allocated to it on the send side, and the video resolution increases on the receive side, and then the decoded framerate drops.

I've also briefly looked at appear.in. It also uses three streams, but it seems to stay on the low quality stream and doesn't hit the same problems.

I'll gather the requested traces.

Flags: needinfo?(dminor)

Dan Minor [:dminor]

Assignee

Comment 28

•

6 years ago

I put the traces up at: https://file.pizza/rosemary-capers-prawn-salami

I have a wireshark capture as well as a webrtc log with rtplogger enabled. The new tracing code (TRACE_EVENT0 and friends) is not hooked up to the logs, I had a quick look and it did not seem straightforward to get it working.

Flags: needinfo?(rjesup)

Dan Minor [:dminor]

Assignee

Comment 29

•

6 years ago

If I log where frames are inserted into the frame buffer [1] and where frames leave the frame buffer [2], on a recent run I see 816 frames go in and 317 frames leave. (317/816)*30fps ~ 11fps, which roughly matches the observed framerate in both Firefox and Chrome. I'm not sure where the missing frames are going. I only see a handful of messages in the log mentioning frames being dropped, not nearly enough to explain the discrepancy.

Clearing the NI to Byron, when I first asked this seemed like a simulcast bitrate problem, that no longer seems likely.

[1] https://searchfox.org/mozilla-central/rev/7adb490485eff0783071a3e132005bceeb337461/media/webrtc/trunk/webrtc/modules/video_coding/frame_buffer2.cc#289
[2] https://searchfox.org/mozilla-central/rev/7adb490485eff0783071a3e132005bceeb337461/media/webrtc/trunk/webrtc/modules/video_coding/frame_buffer2.cc#195
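
The frame rate estimate above is the fraction of frames that leave the frame buffer, scaled by the nominal sender frame rate. A minimal sketch of that arithmetic, assuming a 30 fps sender as in the comment (the function name is illustrative):

```python
def effective_fps(frames_out: int, frames_in: int, nominal_fps: float) -> float:
    """Scale the nominal frame rate by the share of frames that were output."""
    if frames_in <= 0:
        raise ValueError("frames_in must be positive")
    return frames_out / frames_in * nominal_fps


# Counts from the run described above: 816 frames in, 317 frames out.
estimate = effective_fps(317, 816, 30.0)
print(f"{estimate:.1f} fps")
```

The result lands between 11 and 12 fps, in the same range as the decoded frame rates observed in Firefox and Chrome.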

Flags: needinfo?(docfaraday)

Dan Minor [:dminor]

Assignee

Comment 30

•

6 years ago

I'm not able to get this to reproduce on calls between Firefox 64 and Nightly, only between Nightly and Nightly or Nightly and Chrome 70. I ran a call between Firefox 64 and Nightly for 20 minutes without it reproducing, then left the call with Firefox 64 and rejoined with Chrome on the same laptop and it reproduced almost immediately.

Jan-Ivar Bruaroey [:jib] (needinfo? me)

Reporter

Updated

•

6 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1519556

Dan Minor [:dminor]

Assignee

Comment 31

•

6 years ago

After spending some more time on this, I don't think we're actually dropping frames, it looks like we start requesting them more slowly. I can see frames start building up in the frame buffer.

It also doesn't seem directly tied to resolution. Sometimes my OS X laptop sends 640x480 rather than 1280x720 on meet as the highest resolution stream, and I still see the discrepancy between network framerate and decoded framerate.

See Also: https://bugzilla.mozilla.org/show_bug.cgi?id=1519556 →

Dan Minor [:dminor]

Assignee

Comment 32

•

6 years ago

I added some logging about how long it takes to get a frame from the frame buffer when receiving:

With the branch 64 update, I see a mean of 50ms, a stddev of 113ms, and a max of 930ms.
Without the update applied, I see a mean of 30ms, a stddev of 18ms, and a max of 152ms.

From previous testing, as far as I can tell, we're inserting frames into the frame buffer at the same rate in each case and it does not appear that we are dropping frames.

Interestingly, as mentioned above, this does not reproduce between Nightly and Firefox 64. I double checked, and Firefox 64 is using the old jitter buffer implementation, not frame_buffer2.cc which is what we use in Nightly after the branch update (and presumably is what is used by Chrome.)

If this was just on the receive side, I would expect one side of the Nightly and Firefox 64 call to be affected, as well as Chrome to Chrome calls. Since this is not the case, it seems likely it is an interaction between send side and the newer frame buffer implementation.

Dan Minor [:dminor]

Assignee

Comment 33

•

6 years ago

After a bit of digging, it looks like on the receive side we're skipping frames here [1] due to missing decodable frames vastly more often in the branch update case than in the preupdate case (17280 times in a 28632 line log file vs 452 times in a 29084 line log file.)

So we are dropping frames. I missed it earlier because most places where frames are dropped have a preexisting webrtc log statement, but this one doesn't, perhaps because it does happen occasionally under normal circumstances.

[1] https://searchfox.org/mozilla-central/rev/c21d6620d384dfb13ede6054015da05a6353b899/media/webrtc/trunk/webrtc/modules/video_coding/frame_buffer2.cc#106
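
Since the two log files differ slightly in length, it helps to compare the share of lines that report a skipped frame rather than the raw counts. A minimal sketch, assuming the locally added log statement can be recognised by a fixed substring; the marker text is hypothetical, because the upstream code has no log statement at this point.

```python
from typing import Iterable, Tuple


def marker_rate(lines: Iterable[str], marker: str) -> Tuple[int, int, float]:
    """Count lines containing `marker`; return (hits, total, hits / total)."""
    total = 0
    hits = 0
    for line in lines:
        total += 1
        if marker in line:
            hits += 1
    rate = hits / total if total else 0.0
    return hits, total, rate


# Shares computed from the counts reported above.
post_update_share = 17280 / 28632  # branch update log
pre_update_share = 452 / 29084     # pre-update log
```

With the counts above, roughly 60% of the lines in the post-update log are skip messages, against under 2% in the pre-update log.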

Nils Ohlmeier [:drno]

Comment 34

•

6 years ago

(In reply to Dan Minor [:dminor] from comment #33)

After a bit of digging, it looks like on the receive side we're skipping frames here [1] due to missing decodable frames vastly more often in the branch update case than in the preupdate case (17280 times in a 28632 line log file vs 452 times in a 29084 line log file.)

So we are dropping frames. I missed it earlier because most places where frames are dropped have a preexisting webrtc log statement, but this one doesn't, perhaps because it does happen occasionally under normal circumstances.

[1] https://searchfox.org/mozilla-central/rev/c21d6620d384dfb13ede6054015da05a6353b899/media/webrtc/trunk/webrtc/modules/video_coding/frame_buffer2.cc#106

Have you added logging to check if the continuous flag or the num_missing_decodable makes us enter the if condition?
I would guess that the counting for num_missing_decodable is somehow off.

Dan Minor [:dminor]

Assignee

Comment 35

•

6 years ago

It looks like it is always missing decodable frames. If the continuous flag is sometimes false, it's happening much less often.

Ryan VanderMeulen [:RyanVM]

Comment 36

•

6 years ago

Unfortunately we're out of time for a fix this cycle. Hopefully we can get to the bottom of it for 66 still.

status-firefox65: affected → wontfix

Dan Minor [:dminor]

Assignee

Comment 37

•

6 years ago

(In reply to Ryan VanderMeulen [:RyanVM] from comment #36)

Unfortunately we're out of time for a fix this cycle. Hopefully we can get to the bottom of it for 66 still.

Yes, sorry.

As an update, we've reached out to the Meet team to see if they have any logs or anything on their side that might help us diagnose this problem.

I've also been trying to track down why we're unable to decode frames. Unfortunately, the Meet SFU rewrites picture ids, which would be the easiest way to correlate frames from the sender to the receiver. I've been working on logging state changes for each frame in the frame buffer and then using a script to isolate cases where we end up with a large number of non-decodable frames. I'm hoping to see a pattern in frames that we're unable to decode and then be able to use more logging or Wireshark to correlate this back to a problem on the send side.

As far as I can tell, the picture ids from Meet start at zero, don't have gaps and show up in order, so I don't think the problem is that we are just missing frames or receiving them too late. I'll spend some time this morning to double check this before continuing.
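
The double check described above (IDs start at zero, have no gaps, and arrive in order) can be automated over the logged picture ids. A minimal sketch, assuming the ids have been extracted from the receive-side log into a list in arrival order; the function name is illustrative, and picture id wraparound is deliberately not handled, so a long capture would need that added.

```python
from typing import Iterable, List, Tuple


def picture_id_anomalies(picture_ids: Iterable[int]) -> List[Tuple[int, int, int]]:
    """Return (position, expected, actual) for each id that breaks 0, 1, 2, ..."""
    anomalies = []
    expected = 0
    for position, actual in enumerate(picture_ids):
        if actual != expected:
            anomalies.append((position, expected, actual))
        # Resynchronise on the id that was actually seen.
        expected = actual + 1
    return anomalies
```

An empty result means the sequence starts at zero and is contiguous and in order; a gap or a reordered frame shows up as one or more tuples naming the position where the sequence broke.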

Dan Minor [:dminor]

Assignee

Comment 38

•

6 years ago

Saeed from the Meet team pointed out that the timestamps on the receive side seemed incorrect but they are fine on the send side. It turns out the Meet SFU rewrites timestamps based upon RTCP sender reports.

Unfortunately, I ended up spending time duplicating Saeed's work. For the record:

Looking at the delta between timestamps on the send side:

min|max  |mean|std. dev
900|16740|3059|1004
900|6390 |3017|797
900|10440|3026|820

And on the receive side:

min|max  |mean|std. dev
1  |88896|3055|9953

So the rewriting has a pretty big impact on the distribution of the timestamps.

Hopefully this explains the problems we are seeing.
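
For reference, tables like the ones above can be produced from a list of per-frame RTP timestamps. This is a minimal sketch, not the script used for the numbers in this comment: it assumes one timestamp per frame in arrival order, uses the sample standard deviation (the comment does not say which variant was used), and converts ticks to milliseconds with the 90 kHz video RTP clock.

```python
import statistics
from typing import Dict, Sequence

VIDEO_RTP_CLOCK_HZ = 90_000  # RTP clock rate used for video


def delta_summary(timestamps: Sequence[int]) -> Dict[str, float]:
    """Summarise the differences between consecutive RTP timestamps."""
    # Modulo 2**32 keeps deltas correct across 32-bit timestamp wraparound.
    deltas = [(later - earlier) % 2**32
              for earlier, later in zip(timestamps, timestamps[1:])]
    if len(deltas) < 2:
        raise ValueError("need at least three timestamps")
    return {
        "min": min(deltas),
        "max": max(deltas),
        "mean": statistics.mean(deltas),
        "stdev": statistics.stdev(deltas),
    }


def ticks_to_ms(ticks: float) -> float:
    """Convert 90 kHz RTP ticks to milliseconds."""
    return ticks * 1000 / VIDEO_RTP_CLOCK_HZ
```

At 90 kHz a mean delta of about 3000 ticks is about 33 ms, which is one frame at 30 fps. The receive-side maximum of 88896 ticks is close to a full second, and the minimum of 1 tick is far below any plausible frame interval.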

Flags: needinfo?(rjesup)

Dan Minor [:dminor]

Assignee

Comment 39

•

6 years ago

There is a difference in the rtp timestamp estimated in the sender report. The last_frame_capture_time_ms_ value at [1] is always zero in affected versions of Firefox, which means the rtp timestamps in the sender report do not match up with the rtp timestamps when a video packet is sent. I'll track down why it is not being set.

If I change the code at [2] to assign a value to last_frame_capture_time_ms_ based upon the current clock time, I can confirm that the bug does not reproduce and the video is smooth.

[1] https://searchfox.org/mozilla-central/rev/6c784c93cfbd5119ed07773a170b59fbce1377ea/media/webrtc/trunk/webrtc/modules/rtp_rtcp/source/rtcp_sender.cc#482
[2] https://searchfox.org/mozilla-central/rev/6c784c93cfbd5119ed07773a170b59fbce1377ea/media/webrtc/trunk/webrtc/modules/rtp_rtcp/source/rtcp_sender.cc#283
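
The effect of a zero capture time can be illustrated with a simplified model of the sender report estimate. The sketch below assumes the report extrapolates from the last sent frame as the last RTP timestamp plus the time elapsed since that frame's capture, converted at 90 ticks per millisecond. The function and parameter names are illustrative and this is not the upstream code.

```python
RTP_TICKS_PER_MS = 90  # 90 kHz video clock


def sender_report_rtp_timestamp(last_rtp_timestamp: int,
                                last_frame_capture_time_ms: int,
                                now_ms: int,
                                timestamp_offset: int = 0) -> int:
    """Extrapolate the RTP timestamp that corresponds to `now_ms`."""
    elapsed_ms = now_ms - last_frame_capture_time_ms
    extrapolated = timestamp_offset + last_rtp_timestamp + elapsed_ms * RTP_TICKS_PER_MS
    return extrapolated % 2**32  # RTP timestamps are 32-bit


# Capture time recorded 20 ms ago: the estimate is 1800 ticks past the last frame.
healthy = sender_report_rtp_timestamp(1_000_000, 49_980, 50_000)

# Capture time stuck at zero: the whole clock value is treated as elapsed time.
broken = sender_report_rtp_timestamp(1_000_000, 0, 50_000)
```

In this model the broken estimate is ahead of the media timestamps by the full clock reading times 90, so a service that maps RTP timestamps through the sender report would see timing that no longer matches the packets.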

Dan Minor [:dminor]

Assignee

Comment 40

•

6 years ago

Prior to the branch 64 update, timestamps were set in ViEEncoder::OnFrame unconditionally [1]. After the update, the equivalent code is in VideoStreamEncoder::OnFrame [2]; however, rather than setting the timestamp unconditionally, it only sets it if the frame's timestamp is greater than the current time.

If I comment out the if statement so that the timestamp is set all of the time then the bug no longer reproduces.

Note that [1] is from an upstream import commit and does not include any of our local modifications. This was just a case of a behavioural change in the upstream code that we didn't catch.

[1] https://hg.mozilla.org/integration/mozilla-inbound/file/fda2b2655b26/media/webrtc/trunk/webrtc/video/vie_encoder.cc#l461
[2] https://searchfox.org/mozilla-central/rev/6c784c93cfbd5119ed07773a170b59fbce1377ea/media/webrtc/trunk/webrtc/video/video_stream_encoder.cc#681
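
The behavioural difference can be modelled in a few lines. This is a sketch of the two behaviours as described in this comment, not the upstream C++; the `Frame` class and function names are hypothetical, and it assumes the conduit hands over frames whose timestamp was never set (modelled here as 0).

```python
from dataclasses import dataclass


@dataclass
class Frame:
    timestamp_us: int  # 0 models a frame whose timestamp was never set


def on_frame_pre_update(frame: Frame, now_us: int) -> Frame:
    """Old behaviour: always stamp the frame with the current time."""
    frame.timestamp_us = now_us
    return frame


def on_frame_post_update(frame: Frame, now_us: int) -> Frame:
    """New behaviour: only clamp timestamps that lie in the future."""
    if frame.timestamp_us > now_us:
        frame.timestamp_us = now_us
    return frame
```

Under that assumption an unset timestamp passes through the post-update path unchanged, which is consistent with the zero capture time seen in the sender report code.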

Dan Minor [:dminor]

Assignee

Comment 41

•

6 years ago

Try job here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b0b047c3b1351304f5d819c0bb316d417aafc369

Dan Minor [:dminor]

Assignee

Comment 42

•

6 years ago

Attached file Bug 1515205 - Set frame timestamp in VideoConduit::SendVideoFrame; r=drno! (obsolete) (deleted) — Details

In the past we relied upon ViEEncoder::OnFrame to set the render time for
frames. With the branch 64 update, this code moved to
VideoStreamEncoder::OnFrame, and only sets the timestamp if it is greater than
the current time. This results in broken rtp timestamp estimates in the rtcp
sender report, which causes problems for Meet and possibly other services
that rewrite rtp timestamps based upon the sender report.

This patch explicitly sets the timestamp in VideoConduit::SendVideoFrame. This
should give us the same behaviour that we had before the branch update without
requiring local modifications to upstream code.

Dan Minor [:dminor]

Assignee

Comment 43

•

6 years ago

(In reply to Dan Minor [:dminor] from comment #41)

Try job here: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b0b047c3b1351304f5d819c0bb316d417aafc369

This has gtests failures because they are testing explicit values of the timestamp rather than just checking that it is increasing. I've filed Bug 1522238 to do a proper fix for this.
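
A test that checks ordering rather than exact values would be robust to the timestamp source changing. A minimal sketch of such a check, written in Python for illustration only (the actual gtests are C++, and the helper name is hypothetical):

```python
from typing import Sequence


def strictly_increasing(timestamps: Sequence[int]) -> bool:
    """True if every timestamp is greater than the one before it."""
    return all(earlier < later
               for earlier, later in zip(timestamps, timestamps[1:]))
```

A sequence such as `[10, 20, 35]` passes, while a repeated or decreasing value fails, regardless of what the absolute values are.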

Dan Minor [:dminor]

Assignee

Comment 44

•

6 years ago

Attached file Bug 1515205 - Always set frame timestamps in VideoStreamEncoder::OnFrame; r=drno! (deleted) — Details

In the past we relied upon ViEEncoder::OnFrame to set the render time for
frames. With the branch 64 update, this code moved to
VideoStreamEncoder::OnFrame, and only sets the timestamp if it is greater than
the current time. This results in broken rtp timestamp estimates in the rtcp
sender report, which causes problems for Meet and possibly other services
that rewrite rtp timestamps based upon the sender report.

This patch makes VideoStreamEncoder::OnFrame always set the timestamp. In a
follow on bug, we'll move this behaviour to VideoConduit so we don't have to
maintain a local modification of the upstream code.

Phabricator Automation

Updated

•

6 years ago

Attachment #9038634 - Attachment is obsolete: true

Dan Minor [:dminor]

Assignee

Comment 45

•

6 years ago

Updated try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9f2198d346a42bec773a2968c27ee1a03d2e1dc1

Comment 46

•

6 years ago

Pushed by dminor@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a77e9c7eabb5 Always set frame timestamps in VideoStreamEncoder::OnFrame; r=drno

Dan Minor [:dminor]

Assignee

Updated

•

6 years ago

Attachment #9038641 - Flags: approval-mozilla-beta?

Nils Ohlmeier [:drno]

Comment 47

•

6 years ago

[Tracking Requested - why for this release]: this is a regression from Firefox 64, which might impact a lot of (Hangout) users.

status-firefox65: wontfix → affected

tracking-firefox65: + → ?

Nils Ohlmeier [:drno]

Updated

•

6 years ago

Attachment #9038641 - Flags: approval-mozilla-beta?

Nils Ohlmeier [:drno]

Updated

•

6 years ago

Attachment #9038641 - Flags: approval-mozilla-beta?

Nils Ohlmeier [:drno]

Comment 48

•

6 years ago

For some unknown reason the explanation entered when filing the beta uplift request does not show up here. Manually copying it into a comment here instead:

Feature/Bug causing the regression: 1376873

User impact if declined: The video of Firefox users on Google Hangouts calls appears with very bad quality (low framerate) and freezes very frequently.

Is this code covered by automated tests? Yes

Has the fix been verified in Nightly? No

Needs manual test from QE? No

If yes, steps to reproduce:

List of other uplifts needed: N/A

Risk to taking this patch: Low

Why is the change risky/not risky?
(and alternatives if risky): The update of libwebrtc from upstream to version 64 (in bug 1376873) came with one additional if condition. We are only restoring the previous code here by removing this if condition, which restores the behavior from before the update.

String changes made/needed: N/A

Ryan VanderMeulen [:RyanVM]

Updated

•

6 years ago

tracking-firefox65: ? → +

Andrei Ciure[:aciure]

Comment 49

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/a77e9c7eabb5

Status: NEW → RESOLVED

Closed: 6 years ago

status-firefox66: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla66

Ryan VanderMeulen [:RyanVM]

Comment 50

•

6 years ago

Comment on attachment 9038641 [details]
Bug 1515205 - Always set frame timestamps in VideoStreamEncoder::OnFrame; r=drno!

[Triage Comment]
Reverts us to the behavior we had before the last WebRTC upstream update. Nils assures me this was also heavily tested by people on his team across different sites. Approved for 65.0 RC2.

Attachment #9038641 - Flags: approval-mozilla-beta? → approval-mozilla-release+

Ryan VanderMeulen [:RyanVM]

Comment 51

•

6 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-release/rev/ca6926210a58

status-firefox65: affected → fixed

Cornel Ionce [:noni] [Hubs QA]

Comment 52

•

6 years ago

(In reply to Nils Ohlmeier [:drno] from comment #48)

Is this code covered by automated tests? Yes
Needs manual test from QE? No

Marking as qe-verify- since it is covered by automated tests.

Flags: qe-verify-

Nico Grunbaum [:ng, @chew:mozilla.org]

Updated

•

4 years ago

Depends on: 1646904

Dan Minor [:dminor]

Assignee

Updated

•

4 years ago

No longer depends on: 1646904
