Closed Bug 1690688 Opened 4 years ago Closed 4 years ago

Some YouTube videos hang with SW-WR (when Windows laptop is on battery power)

Categories

(Core :: Audio/Video: Playback, defect, P2)

Unspecified
Windows
defect

Tracking

()

VERIFIED FIXED
87 Branch
Tracking Status
firefox-esr78 --- unaffected
firefox85 --- unaffected
firefox86 --- disabled
firefox87 --- verified

People

(Reporter: cpeterson, Assigned: alwu)

References

(Blocks 2 open bugs, Regression)

Details

(Keywords: regression, Whiteboard: [media-youtube])

Attachments

(1 file)

This is a regression from bug 1681043. I bisected the regression to this pushlog:

https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=12325181f43aee1257922717f8ae4d52e4e502d6&tochange=50fb5b9343f6ddec6ec41d4e648fd8a4807ef133

Steps to reproduce

  1. Enable gfx.webrender.software in 86 Beta or 87 Nightly and restart Firefox.
  2. Load https://www.youtube.com/watch?v=N9M4Fn0efUE
  3. Play the video in full screen mode. (Playing in full screen is not required, but seems to reproduce the problem more reliably.)

Expected result

The video and audio play smoothly.

Actual result

The video hangs after about ten seconds, but the audio continues playing. Disabling SW-WR and restarting Firefox makes the hang go away. I can reproduce with SW-WR on Windows, but not macOS.

This problem doesn't happen with all YouTube videos. It also seems to happen with other game videos on https://modernwolf.net/games, but the "Skeleton Crew" video linked above reproduces the problem most reliably.

Blocks: RDD
Blocks: sw-wr-stability
No longer blocks: gfx-triage, sw-wr-dogfood

Chris, could you please grab a performance profile (with the graphics setting enabled) for this?

Flags: needinfo?(cpeterson)

(In reply to Matt Woodrow (:mattwoodrow) from comment #1)

Chris, could you please grab a performance profile (with the graphics setting enabled) for this?

Here is a profile:

https://share.firefox.dev/2MZ4mcI

Looks like the Compositor and Renderer go idle around 12.5 seconds, there's a major GC around 13 seconds, and then the video frame is stuck, as shown in the Screenshots row.

I am only able to reproduce this problem when my laptop is on battery power.

Flags: needinfo?(cpeterson)
Summary: Some YouTube videos hang with SW-WR → Some YouTube videos hang with SW-WR (when Windows laptop is on battery power)

This seems like a fairly high priority issue -- Windows bustage of YT under a non-exotic config. Alastor, Matt, do you have any further insight that would mitigate this concern? If not, does this fall into either of your wheelhouses?

Severity: -- → S2
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(alwu)
Priority: -- → P2

(In reply to Chris Peterson [:cpeterson] from comment #2)

(In reply to Matt Woodrow (:mattwoodrow) from comment #1)

Chris, could you please grab a performance profile (with the graphics setting enabled) for this?

Here is a profile:

https://share.firefox.dev/2MZ4mcI

Looks like the Compositor and Renderer go idle around 12.5 seconds, there's a major GC around 13 seconds, and then the video frame is stuck, as shown in the Screenshots row.

I am only able to reproduce this problem when my laptop is on battery power.

Sorry, can you please grab one with the 'media' config enabled. Looks like the gfx is working fine, but we're not providing new video frames.

Flags: needinfo?(matt.woodrow) → needinfo?(cpeterson)

(In reply to Matt Woodrow (:mattwoodrow) from comment #4)

Sorry, can you please grab one with the 'media' config enabled. Looks like the gfx is working fine, but we're not providing new video frames.

Here's a 'media' config profile:

https://share.firefox.dev/3jGss8v

I now see the video hang after just 1-2 seconds instead of 10+ (and then about 5-10 seconds later the video intermittently jumps ahead and plays a couple seconds before hanging again).

Flags: needinfo?(cpeterson)

From what I've seen, video decoding was too slow to keep up with the audio, so the video sink dropped a lot of video frames.
Most of the blocking time was spent in MFTDecoder::Output(), where the blocking happened inside Microsoft Media Foundation.

When gfx.webrender.software is turned on, I guess that affects this attribute, which would make all video decoding run in the RDD process instead of the GPU process. But I'm not clear why running both video and audio decoding in the RDD process would make decoding too slow.

If they all run in a software decoder, maybe that is why this issue can only be reproduced when the laptop is on battery power, where the computer's performance might be constrained.

I saw there were three decoder threads in RDD and they were all blocked in MFTDecoder::Output(); not sure if they would affect each other. (And why three decoders? Why not two, one for video and one for audio?)

Flags: needinfo?(alwu)

Matt, is this pref gfx.webrender.software something that users would turn on easily? Is turning that off affecting this, making us unable to create a video decoder in the GPU process? If so, are we still able to create a decoder in the GPU process when this pref is on?

I think the patch causing this issue is this one, which would prevent creating a video decoder in the GPU process.

Flags: needinfo?(matt.woodrow)

(In reply to Alastor Wu [:alwu] from comment #7)

Matt, is this pref gfx.webrender.software something that users would turn on easily? Is turning that off affecting this, making us unable to create a video decoder in the GPU process? If so, are we still able to create a decoder in the GPU process when this pref is on?

I think that check is somewhat wrong, since it will indeed block decoding in the GPU process when software WebRender is enabled, even if we had a d3d11 compositor. I think we can return true if UsingSoftwareWebRenderD3D11() is true.
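The condition being discussed can be modeled roughly as below. This is a simplified Python sketch, not Gecko code: `CompositorInfo`, its fields, and both function names are hypothetical stand-ins, and only `UsingSoftwareWebRenderD3D11()` is a name taken from this thread. The "before" function is an assumed shape of the regressed check, inferred from the description above.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class CompositorInfo:
    """Hypothetical stand-in for the compositor state the check consults."""
    software_webrender: bool  # is software WebRender (SW-WR) active?
    d3d11_compositor: bool    # is composition still backed by D3D11?


def can_decode_in_gpu_process_before(info: CompositorInfo) -> bool:
    # Assumed shape of the regressed check: any software WebRender
    # configuration blocks decoding in the GPU process.
    return not info.software_webrender


def can_decode_in_gpu_process_after(info: CompositorInfo) -> bool:
    # Proposed shape: software WebRender on top of a D3D11 compositor
    # (the UsingSoftwareWebRenderD3D11() case) still allows it.
    if info.software_webrender:
        return info.d3d11_compositor
    # Simplification: hardware WebRender is treated as always eligible.
    return True
```

With `CompositorInfo(software_webrender=True, d3d11_compositor=True)`, the first function refuses GPU-process decoding and the second allows it, which is the case this bug is about.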

I think the patch causing this issue is this one, which would prevent creating a video decoder in the GPU process.

The pref itself is unlikely to be changed, but the underlying feature of software webrender may be automatically enabled on Nightly for some users (and will be on other channels in the future).

From what I've seen, video decoding was too slow to keep up with the audio, so the video sink dropped a lot of video frames.
Most of the blocking time was spent in MFTDecoder::Output(), where the blocking happened inside Microsoft Media Foundation.

Indeed, it looks like we had a couple of very slow frames in the Renderer thread where software WebRender consumed a lot of CPU. From there it appears we're decoding fine, but we just can't catch up and we're just discarding all the frames we do decode.

Fixing the condition above so that cpeterson gets a hardware decoder again will help, but some users might truly be stuck on software decoding and software WebRender. I think we want to look at our skip-to-keyframe logic again to see if there's something better we can do (like presenting the frames late rather than never?) for this case.

I saw there were three decoder threads in RDD and they were all blocked in MFTDecoder::Output(); not sure if they would affect each other. (And why three decoders? Why not two, one for video and one for audio?)

I suspect this is just the threadpool, and a single decoder (or two) are jumping between threads in the pool for each task/frame.

Flags: needinfo?(matt.woodrow)

(In reply to Matt Woodrow (:mattwoodrow) from comment #8)

I think that check is somewhat wrong, since it will indeed block decoding in the GPU process when software WebRender is enabled, even if we had a d3d11 compositor. I think we can return true if UsingSoftwareWebRenderD3D11() is true.

Thank you! I will submit a patch to tweak that check.

I suspect this is just the threadpool, and a single decoder (or two) are jumping between threads in the pool for each task/frame.

Yep, you're right. When I zoomed in on the activity on the decoder threads, I saw the video decoding tasks running on different threads from time to time, with no overlap.
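For illustration, the thread-pool effect described here can be reproduced in a small Python sketch. It is only an analogy using Python's `concurrent.futures`, not the Gecko thread pool; `decode` and `run_single_decoder` are hypothetical names, and the short sleep stands in for the blocking decoder call.

```python
import threading
import time
from concurrent.futures import ThreadPoolExecutor


def decode(frame_index: int):
    """Pretend to decode one frame; record where and when it ran."""
    start = time.monotonic()
    time.sleep(0.001)  # stand-in for the blocking decoder call
    return threading.get_ident(), start, time.monotonic()


def run_single_decoder(frame_count: int, pool_size: int = 3):
    """Run one decoder's frames strictly one after another on a shared pool."""
    records = []
    with ThreadPoolExecutor(max_workers=pool_size) as pool:
        for i in range(frame_count):
            # Waiting on each result before submitting the next task
            # serializes the decoder even though the pool has several threads.
            records.append(pool.submit(decode, i).result())
    return records
```

Each record is `(thread id, start, end)`. The tasks never overlap in time, but which pool thread picks up each one is up to the pool, so a profile can show activity on several "decoder" threads for a single decoder.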

Assignee: nobody → alwu

(In reply to Matt Woodrow (:mattwoodrow) from comment #8)

I think we want to look at our skip-to-keyframe logic again to see if there's something better we can do (like presenting the frames late rather than never?) for this case.

Our mechanism for skip-to-keyframe is to keep the audio playing and present the last frame to users. Chrome's way is to pause the video completely until it has enough data. I'm not sure we should show frames that are already late, because there are probably good reasons for choosing the current mechanism. Bryce, any thoughts about this?

Flags: needinfo?(bvandyk)
Whiteboard: [media-youtube]
Pushed by alwu@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4657effbc3c4 enable video decoding in GPU process if the compositor supports D3D11. r=mattwoodrow
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 87 Branch

(In reply to Alastor Wu [:alwu] from comment #10)

(In reply to Matt Woodrow (:mattwoodrow) from comment #8)

I think we want to look at our skip-to-keyframe logic again to see if there's something better we can do (like presenting the frames late rather than never?) for this case.

Our mechanism for skip-to-keyframe is to keep the audio playing and present the last frame to users. Chrome's way is to pause the video completely until it has enough data. I'm not sure we should show frames that are already late, because there are probably good reasons for choosing the current mechanism. Bryce, any thoughts about this?

My understanding of our behaviour was that we were supposed to detect that we'd fallen behind, and then jump decoding forward to the next keyframe (and buffer 3-10 frames ahead of that) so that we could start playing video again when we got to that point.

Looking at the profile it appears we just keep decoding late frames for ~4 seconds, so we're dropping hundreds of frames and it looks frozen.

It's unexpected to me that we'd keep trying to decode for so long without just giving up and advancing.
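The expected behaviour described above (notice we have fallen behind, then jump decoding forward to the next keyframe) could be sketched as follows. This is a minimal Python model under stated assumptions, not the MediaFormatReader logic: `Frame`, `skip_target`, and the sample `frames` list are hypothetical, and the buffering of a few frames past the keyframe is left out.

```python
from typing import NamedTuple, Optional, Sequence


class Frame(NamedTuple):
    time: float      # presentation time in seconds
    keyframe: bool   # True if decodable without earlier frames


def skip_target(frames: Sequence[Frame], clock: float) -> Optional[int]:
    """Index of the first keyframe at or after the playback clock, or None."""
    for i, frame in enumerate(frames):
        if frame.keyframe and frame.time >= clock:
            return i
    return None


# Sample stream: one frame every 0.5 s, a keyframe every fourth frame.
frames = [Frame(i * 0.5, i % 4 == 0) for i in range(12)]
```

With the clock at 2.3 s, `skip_target(frames, 2.3)` points at the keyframe at 4.0 s, so every frame between the clock and that keyframe would be skipped without being decoded rather than decoded late and dropped.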

I don't have much to add. I'm not opposed to our current mechanism. It sounds like we have room to improve the skipping if we're falling behind and then getting stuck with a backlog of frames we're already late on, but I don't have insight as to why we're doing that or how to fix it.

Flags: needinfo?(bvandyk)

sw-wr was activated in early betas only, marking disabled for 86.

(In reply to Matt Woodrow (:mattwoodrow) from comment #14)

My understanding of our behaviour was that we were supposed to detect that we'd fallen behind, and then jump decoding forward to the next keyframe (and buffer 3-10 frames ahead of that) so that we could start playing video again when we got to that point.

Looking at the profile it appears we just keep decoding late frames for ~4 seconds, so we're dropping hundreds of frames and it looks frozen.

It's unexpected to me that we'd keep trying to decode for so long without just giving up and advancing.

That is another mechanism in MediaFormatReader. At around 9s, I saw we started getting video frames that were not late relative to the audio, which might be where we finished skipping.

The frame dropping happens in VideoSink, which simply drops the frames behind the audio clock. I will open another bug to investigate that.
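A rough model of that drop rule, as a Python sketch rather than the actual VideoSink code: `drop_late_frames` is a hypothetical name, and comparing each frame's end time against the audio clock is an assumption about what "behind the audio clock" means.

```python
from typing import List, Tuple


def drop_late_frames(frame_end_times: List[float],
                     audio_clock: float) -> Tuple[List[float], int]:
    """Split decoded frames into those still presentable and a dropped count."""
    # A frame is kept only if it ends after the current audio position.
    kept = [t for t in frame_end_times if t > audio_clock]
    return kept, len(frame_end_times) - len(kept)
```

If the decoder only ever delivers frames that are already behind the audio clock, every call keeps nothing, so the last presented frame stays on screen while the audio keeps playing.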

I see a similar frame-dropping issue in bug 1692881 and will investigate the problem there.

Flags: qe-verify+

Confirmed issue with 87.0a1 (2021-02-03), 86.0.1 on Win10 right after going fullscreen.

  • Setting the video to theater mode and pressing the F11 key a couple of times was enough to trigger it.

Can confirm that with 87.0b9-Win10 the issue does not manifest anymore.

Status: RESOLVED → VERIFIED
Flags: qe-verify+
Has Regression Range: --- → yes