Closed Bug 1315283 Opened 8 years ago Closed 8 years ago

Intermittent video glitches on AppRTC calls

Categories

(Core :: WebRTC: Audio/Video, defect, P1)

51 Branch
defect

Tracking

()

VERIFIED FIXED
mozilla52
Tracking Status
firefox50 --- unaffected
firefox51 + verified
firefox52 --- verified

People

(Reporter: JuliaC, Assigned: jesup)

References

()

Details

(Keywords: regression)

Attachments

(1 file, 2 obsolete files)

[Affected versions]: - latest Aurora 51.0a2 (2016-11-04) - latest Nightly 52.0a1 (2016-11-04) [Affected platforms]: - Ubuntu 14.04 x86 - Mac OS X 10.11.6 - Windows 7 x64 (highly intermittent, only seen one of ten tries on this OS) [Steps to reproduce]: 1. Launch Firefox on 2 separate stations 2. Go to https://apprtc.appspot.com/ on both stations and initiate a call (enter a room) 3. Wait a few minutes and pay attention to the video playback [Expected result]: - The call is properly taking place, no video or audio issues are encountered [Actual result]: - Video glitches occur at the bottom of the window mainly at the beginning of the call and sometimes they keep emerge until the end of the call (see the attached screenshot https://goo.gl/VE9DHi) [Regression range]: - This issue is not reproducible on Firefox 50 - Further investigation will be done on this side as soon as possible
Version: Trunk → 51 Branch
Track 51+ as new regression. Hi Maire, Could you help find someone to work on this?
Flags: needinfo?(mreavy)
We have been looking at this. It appears to happen with VP9 when there is a resolution change due to load. (apprtc is preferring VP9.) We may already have a fix for this on another bug which is waiting to land.
Assignee: nobody → rjesup
Rank: 10
Flags: needinfo?(mreavy)
Priority: -- → P1
Let's add a "see also" or a dependency on that bug that's waiting to land, just so that we can track it.
We didn't manage to reproduce this issue on the 50 branch, neither on the Nightly 51 builds. It seems that only affected branches are Nightly 52 and Aurora 51. Due to its intermittent nature and because bug 1316325 is often occurring it is very hard to determine a exact regression range for it.
this is due to apprtc preferring VP9 over VP8, and VP9 not handling resizes of the input (due to load). This will abort() in a debug build. I have a working patch to fix this.
Attachment #8809950 - Attachment is obsolete: true
Attachment #8809950 - Flags: review?(tdaede)
Comment on attachment 8809951 [details] [diff] [review] allow VP9 encoder in webrtc to reconfigure if the input resolution changes Review of attachment 8809951 [details] [diff] [review]: ----------------------------------------------------------------- ::: media/webrtc/trunk/webrtc/modules/video_coding/codecs/vp9/vp9_impl.cc @@ +585,5 @@ > + > + codec_.width = input_image.width(); > + codec_.height = input_image.height(); > + > + vpx_img_free(raw_); Where are you calling vpx_codec_destroy()? I think you are leaking a codec ctx for every resize. @@ +597,5 @@ > + config_->g_threads = NumberOfThreads(codec_.width, codec_.height, > + num_cores_); > + // Update the cpu_speed setting for resolution change. > + cpu_speed_ = GetCpuSpeed(codec_.width, codec_.height); > + int result = InitAndSetControlSettings(&codec_); So this should *work*, however VP9 can change size without keyframes, so by re-initializing we are producing a non-ideal video stream. Even if we don't produce a stream that changes size on the fly without reinit, we also have to accept it, so it might make bugs slip by easier if we don't produce resolution changing video.
Attachment #8809951 - Flags: review?(tdaede) → review-
(In reply to Thomas Daede [:TD-Linux] from comment #8) > ::: media/webrtc/trunk/webrtc/modules/video_coding/codecs/vp9/vp9_impl.cc > @@ +585,5 @@ > > + > > + codec_.width = input_image.width(); > > + codec_.height = input_image.height(); > > + > > + vpx_img_free(raw_); > > Where are you calling vpx_codec_destroy()? I think you are leaking a codec > ctx for every resize. We don't create a new codec_ or VP9EncoderImpl (which would be 'this' here); we reuse the encoder. You can compare to ../vp8/vp8_impl.cc > > @@ +597,5 @@ > > + config_->g_threads = NumberOfThreads(codec_.width, codec_.height, > > + num_cores_); > > + // Update the cpu_speed setting for resolution change. > > + cpu_speed_ = GetCpuSpeed(codec_.width, codec_.height); > > + int result = InitAndSetControlSettings(&codec_); > > So this should *work*, however VP9 can change size without keyframes, so by > re-initializing we are producing a non-ideal video stream. And we have auto-resizing on in VP9 (which happens due to bitrate/rate-control I understand); this patch is so we can force a reduction in CPU use (by dropping resolution) if encoding is taking too long (load is too high). This is already done for VP8 (and we're basically mirroring the idea from VP8, though the innards of the function I added are different). This is also generally quite rare, so I'm ok with non-optimal streams. The user will see a distinct reduction in resolution (typically 3/4 x 3/4 previous - or increase when going back the other way).
Comment on attachment 8809951 [details] [diff] [review] allow VP9 encoder in webrtc to reconfigure if the input resolution changes re-requesting review, see comment
Attachment #8809951 - Flags: review?(tdaede)
Comment on attachment 8809951 [details] [diff] [review] allow VP9 encoder in webrtc to reconfigure if the input resolution changes Review of attachment 8809951 [details] [diff] [review]: ----------------------------------------------------------------- So the reason the current proposed patch does not cause a memory leak is because most of vp9_cx_interface.c:encoder_init is guarded by if (ctx->priv != NULL). However, this guard also causes neither validate_config nor set_encoder_config to be called either. So this patch, by calling vpx_codec_enc_init() instead of vpx_codec_enc_config() like VP8 does: https://dxr.mozilla.org/mozilla-central/source/media/webrtc/trunk/webrtc/modules/video_coding/codecs/vp8/vp8_impl.cc#925 is effectively changing config values underneath libvpx without calling the logic in vp9_cx_iface.c:encoder_set_config() (which is responsible for forcing a keyframe on resolution change). It also means that vp9_change_config isn't called right away (though it later is implicitly after codec ctls in InitAndSetControlSettings). Without calling the encoder_set_config() logic, I don't think this patch will work with resolution changes where valid_ref_frame_size are false. (which require a keyframe) It is not clear to me why this patch avoids the assertions when AQ mode == 3 that are triggered by doing the resize "properly" (like for vp8). I would much rather using the API as designed and disabling AQ mode 3 if that solves the problem (and reporting the issue to upstream libvpx).
I looked into this issue in more detail, in particular the effects of repeatedly initializing a codec: Higher up in vpx_codec_enc_init_ver, ctx->priv is set to NULL. This means that the code path in vp9_cx_interface.c is always run, and so the encoder is always reinitialized. This explains why the patch works - it really is re-creating the encoder. However, I can't see anyway that this doesn't leak memory?
Attachment #8809951 - Attachment is obsolete: true
Attachment #8809951 - Flags: review?(tdaede)
Attachment #8810026 - Flags: review?(tdaede) → review+
Pushed by rjesup@wgate.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/8b0d24bbb59d allow VP9 encoder in webrtc to reconfigure if the input resolution changes r=TD-Linux
Comment on attachment 8810026 [details] [diff] [review] allow VP9 encoder in webrtc to reconfigure if the input resolution changes Approval Request Comment [Feature/regressing bug #]: enabling VP9 in webrtc (in 51) [User impact if declined]: VP9 in webrtc would have to be disabled in 51 [Describe test coverage new/current, TreeHerder]: VP9 calls are not tested in treeherder yet. Even if they were, this might be hard to reliably test (it might unreliably get tested due to load on test VMs) Manual test is easy; start apprtc (or other VP9) call, and (in linux) do "stress -c 8" (or 16, whatever) and wait a bit. [Risks and why]: Lower risk - functionalities are almost identical to VP8; but there are bugs if we just modify the config in vp9 (in AQ 3 (cyclic) it asserts; in AQ 0 it works, but on a resize to less than 1/2 x 1/2 original, it asserts in convolve()). Instead we're destroying and recreating the encoder, and resetting the bitrates/etc. Biggest risk would be a leak. [String/UUID change made/needed]: None
Attachment #8810026 - Flags: approval-mozilla-aurora?
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Comment on attachment 8810026 [details] [diff] [review] allow VP9 encoder in webrtc to reconfigure if the input resolution changes Since merge day is passed, 51 became beta. So I remove aurora flag and change it to 51 beta. This patch fixes a regression. Beta51+. This should be in 51 beta 2.
Attachment #8810026 - Flags: approval-mozilla-beta+
Attachment #8810026 - Flags: approval-mozilla-aurora?
Attachment #8810026 - Flags: approval-mozilla-aurora-
Flags: qe-verify+
I verified this issue on Ubuntu 14.04 x86, Mac OS X 10.11.6 and Windows 7 x64, using latest Aurora 52.0a2 (2016-11-23) and 51.0b2 build1 (20161121093909). The fix looks good, I couldn't reproduce the issue anymore and also, no other side effects were noticed.
Status: RESOLVED → VERIFIED

I've seen this issue occur again in appRTC and Hangouts in both Nightly v67.0a1 from 2019-03-11 and Beta v66.0b14, on Windows.
Should I log a new issue or can I reopen this one?

Flags: needinfo?(jib)

Bodea, please open a new issue since this one contains patches that have landed and been uplifted. Thanks.

Flags: needinfo?(jib)

I could not manage to reproduce it again today on the the 3 day old Nightly, Beta14 or RC. Some edge case might be hiding here.
Do you have any idea what could influence the reproduction of this issue? Do you know of anything that I could do to enhance the chances of reproduction?

Flags: needinfo?(rjesup)

Bodea: Please file a new bug, and NI dminor on it. Also, if possible if you can repro it, check regressionrange - the most likely candidate would be changes in FF65 to update upstream webrtc code.

Does it look like the image in comment 0?

dminor: A quick look shows that the code I added doesn't appear to be in vp9_impl.cc anymore.... It looks like this was lost. Relanding the patch should work. The check should go right before line 504 it appears.

Flags: needinfo?(rjesup)
Flags: needinfo?(dminor)
Flags: needinfo?(daniel.bodea)
Blocks: 1535584

I've cloned this to Bug 1535584.

Flags: needinfo?(dminor)
Flags: needinfo?(daniel.bodea)

Hi :jesup,

Yes, the reproduction looked the same. The video was shown correctly for a few seconds and then only partially, only about a quarter of the whole streaming video (a corner of it) and the green object overlapping the video, occasionally.

I have logged the following: bug 1536048.

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: