Closed Bug 1539043 Opened 5 years ago Closed 4 years ago

Reduce / remove copy and memory allocation when using RemoteDataDecoder (decoder side)

Categories

(Core :: Audio/Video: Playback, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED

People

(Reporter: jya, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: power, Whiteboard: [media-rdd])

The current process for video decoding is the following:

Decode (RDD) -> allocate and copy into YCbCrBuffer, Create Image/VideoData (RDD) -> Allocate and Copy into shemm (RDD) -> IPC -> Deserialise IPC (CP) -> Allocate and Copy into RecyclingPlanarYCbCrImage (CP) -> Allocate and Copy into shmem (CP) -> IPC -> Deserialise IPC (GPU) -> Allocate and copy (GPU) -> Upload to surface.

The process is almost the same for audio decoding.

We could be using shmem all the way across by use of a YCbCrShmemBuffer or a AudioData with a shmeme backend, only needing a copy once and then sharing it across process.

Michael could you please take a look at this?

Assignee: nobody → mfroman
Flags: needinfo?(mfroman)
Priority: -- → P2

Based on my bug backlog, hunting crashes, I suspect it will be a while.

Flags: needinfo?(mfroman)
Keywords: power
Depends on: 1552622

jya and I discussed this a bit today.

Ideally we want the RDD process to allocate shmem, and get that to the GPU process to be uploaded to the GPU. Neither the RDD process, nor the content process will be able to access the GPU once we have full sandboxing in place.

Currently our IPDL code can only share shmem across the IPDL connection that it was created on, so if the RDD process sends shmem to the content process, then we can't share that with the GPU process without copying to a new shmem. It's possible that we could get that restriction changed, but maybe not easily.

My preferred option is to do the same as the GPU process decoder, and open a new IPDL channel to connect the decoder directly to the compositor to send decoded frames across. The RDD decoder would then return opaque handles back to the content process that identify a frame that has been sent to the compositor.

Unifying the code used for the RDD decoder, and the GPU process decoder would probably be the easiest way to achieve this, and is something I think we want regardless (since the two implementations are copy pastes of each other).

It sounds like the content process sometimes needs to read the content of decoded audio buffers, so we'd need to stick with the extra copy there. Audio should be small enough that this isn't a huge deal.

After talking to jya a bit the other day, he thought this could be a simple change by making sure ImageContainer::CreatePlanarYCbCrImage creates a SharedPlanarYCbCrImage because it has a valid mImageClient here[1]. That requires the ImageContainer here[2] is created with mode ASYNCHRONOUS. However, we still fail the check for mImageClient[3] because EnsureImageClient is not finding a ImageBridgeChild here[4].

Fixing that requires we hookup an ImageBridge like GPU process does here[5] with the additional IPC messaging/namespaces/etc.

[1] https://searchfox.org/mozilla-central/source/gfx/layers/ImageContainer.cpp#230
[2] https://searchfox.org/mozilla-central/source/dom/media/ipc/RemoteVideoDecoder.cpp#147
[3] https://searchfox.org/mozilla-central/source/gfx/layers/ImageContainer.cpp#229
[4] https://searchfox.org/mozilla-central/source/gfx/layers/ImageContainer.cpp#162
[5] https://searchfox.org/mozilla-central/source/gfx/ipc/GPUProcessManager.cpp#773

(In reply to Michael Froman [:mjf] from comment #5)

After talking to jya a bit the other day, he thought this could be a simple change by making sure ImageContainer::CreatePlanarYCbCrImage creates a SharedPlanarYCbCrImage because it has a valid mImageClient here[1]. That requires the ImageContainer here[2] is created with mode ASYNCHRONOUS. However, we still fail the check for mImageClient[3] because EnsureImageClient is not finding a ImageBridgeChild here[4].

Fixing that requires we hookup an ImageBridge like GPU process does here[5] with the additional IPC messaging/namespaces/etc.

Yes, unfortunately it turned out that a shmem is only usable with the ipdl bridge that created it.

So if you create a shmem on the RDD<->Content IPDL, the resulting shmem can only be read by the designated content process and the RDD process.
The GPU process wouldn't be able to access it.

So it's slightly (and that's a massive understatement) more work than I first imagined.

We need a GPU<->RDD IPDL ; this would allow to either allocate a shmem that would ultimately end up being used by the GPU process. Or the ability to ask the GPU process for an image, and directly perform the image upload from the decoder to the GPU image within the RDD. This is what the current content process does on Windows.
However, this would require to loosen the RDD sandbox to access win32 API, which is what we are trying to remove from the content process.

We have existing IPDL for connecting a decoder directly to the compositor, called PVideoBridge.

It's currently only initialized in the GPU decoder (which we should fix by unifying the two remote decoder implementations), and currently is initialized assuming that the compositor thread is in the same process.

I think we could fix the latter fairly easily by creating PVideoBridge Endpoints in the parent process when we create the RDD process, and send one into the newly created RDD process and the other to the compositor (either in the parent process, or the GPU process - GPUProcessManager knows which).

There's sure to be some tricky corners with handling ordering guarantees, and recreating the connection if the GPU process crashes, but I don't think it's too bad.

Depends on: 1561178
Depends on: 1561179

Bug 1542881 comment 2 has a writeup of the possible options we have here, from both a performance and security perspective.

Depends on: 1561181
Depends on: 1562910
Assignee: mfroman → nobody

This job will be done via bug 1595994. We will now directly upload the content into a GPU surface (on windows platforms)

Depends on: 1595994
Priority: P2 → P3
Whiteboard: [media-rdd]
Depends on: 1618429

This bug still persists on Mac OS Mojave 10.14.6 (18G4032) running FireFox 74.0.1 (64-bit). My computer is lifting off, RDD process is pulling around 128% CPU sometimes. This computer is a fresh install.

I ran into this today on Firefox 75.0 in Linux Mint 19.3. Certain videos, apparently those using the av01 codec, are using a lot of CPU when playing.

I also have this issue on Ubuntu 18.04 Firefox 75.0 (64 bit), makes it impossible to view videos. Isn't there any way to disable it until fixed?

Firefox nowadays "kills" the computer when watching YouTube. CPU load goes rocket high, RDD process uses all CPU it can grab., then after a short time CPU has to throttle and things get even worse.

Problem is, more and more YouTube videos are AV1-encoded, and recently YouTube changed the HD availability to 1080p minimum (was 720p previously), so people are forced to use 1080p to get HD, and then Firefox goes havoc with AV1/RDD.

There's a (non-default) YouTube setting under "Playback and performance" where you can force YouTube to prefer AV1 only for SD, not for HD. Basically that forces VP9 for higher resolutions, and all is fine with Firefox for now. But that option is likely to be removed eventually.

With YouTube's trend to go for AV1 1080p HD, more and more users are getting faced with that problem. Other browsers do not have those performance problems.

If there's a chance that Firefox can be optimized for better AV1 performance, I'd be very happy. Firefox is the best of all browsers!

(Firefox 78.0.1, Fedora 32 64-bit Linux)

I hope I'm not spamming this list but I can confirm visual 'freezing' and 'stuttering' videos on YouTube. Note that this is only video stuttering and the audio is coming through just fine.

For me, Firefox recently upgraded to 78.0.1 (64-bit).

I suspect that perhaps the combination of Firefox's upgrade and Youtube recently started pushing higher quality video is the reason myself (and I guess others?) are seeing this issue now.

When I downgrade to 720p, the video stuttering goes away.

I also see the "RDD Process" but it only takes 1-2% of CPU and 0.5 % memory (on an 8G machine) as reported by top. The "Web Content" process shoots up to 200% CPU when viewing a 2180p video and down to under 100% when I reduce the quality to 720p (I have a quad core machine).

Here is some other info if it's useful:

$ firefox --version
Mozilla Firefox 78.0.1
$ uname -a
Linux mill 4.4.0-184-generic #214-Ubuntu SMP Thu Jun 4 10:14:11 UTC 2020 x86_64 x86_64 x86_64 GNU/Linux
$ cat /etc/lsb-release 
DISTRIB_ID=Ubuntu
DISTRIB_RELEASE=16.04
DISTRIB_CODENAME=xenial
DISTRIB_DESCRIPTION="Ubuntu 16.04.6 LTS"
$ free -h
              total        used        free      shared  buff/cache   available
Mem:           7.7G        2.0G        1.9G        542M        3.7G        4.8G
Swap:          7.9G        546M        7.3G

Hope that helps!

Same here with the latest FF 79.0 64-bit.

It has only started with the latest version update for me.

BTW, this is not an enhancement, it is clearly a bug. Stuttering is a buggy behavior.

The description of the bug is no longer valid.

We know have as little copy as possible, just one in the worse case scenario, and this can't be improved unless we do the decoding in the GPU process and directly decode into a GPU surface.

I'll mark this bug as fixed as this will be handled in bug 1539735

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.