Closed Bug 1668735 Opened 4 years ago Closed 2 years ago

FF81 Linux - high CPU load playing video/stream (no GPU usage?)

Categories: Core :: Audio/Video: Playback, enhancement (Firefox 81)
Status: RESOLVED WORKSFORME
People: Reporter: star, Unassigned, NeedInfo
References: Blocks 1 open bug
Attachments: 2 files

User Agent: Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:81.0) Gecko/20100101 Firefox/81.0

Steps to reproduce:

Stream video/live TV at welt.de
Start FF with
MOZ_X11_EGL=1 MOZ_LOG="PlatformDecoderModule:5"
Use Ubuntu 20.04 (MATE) with the modesetting driver on a Core m5.
Check for the video acceleration flags as described in
https://bugzilla.mozilla.org/show_bug.cgi?id=1619523
(media.ffvpx.enabled: false
media.ffmpeg.vaapi-drm-display.enabled: true
media.ffmpeg.vaapi.enabled: true
widget.wayland-dmabuf-vaapi.enabled: true)

Actual results:

The CPU usage of my Core m5 stays the same as before, around 50% for streaming. This affects battery life.
If I stream with mplayer, the load is only around 5..10%.

Expected results:

I expected a decrease in CPU load with FF80/81, as it should support hardware acceleration, to extend playtime. Some of the streams can't be played via mplayer; I used the one from welt.de for comparison.

The log looks like this:
~$ MOZ_X11_EGL=1 MOZ_LOG="PlatformDecoderModule:5" firefox 2>&1 | grep 'VA-API'
[Child 7583: MediaPDecoder #3]: D/PlatformDecoderModule Initialising VA-API FFmpeg decoder
libva info: VA-API version 1.7.0
[Child 7583: MediaPDecoder #3]: D/PlatformDecoderModule VA-API FFmpeg init successful
[Child 7583: MediaPDecoder #1]: D/PlatformDecoderModule Choosing FFmpeg pixel format for VA-API video decoding.
[Child 7583: MediaPDecoder #1]: D/PlatformDecoderModule DMABUF/VA-API Got one frame output with pts=1601567281664000dts=1601567281464000 duration=120000 opaque=-9223372036854775808
[Child 7583: MediaPDecoder #1]: D/PlatformDecoderModule DMABUF/VA-API Got one frame output with pts=1601567281824000dts=1601567281625943 duration=40000 opaque=-9223372036854775808
[Child 7583: MediaPDecoder #3]: D/PlatformDecoderModule DMABUF/VA-API Got one frame output with
...

Summary: FF81 - high CPU load playing video/stream (no GPU usage?) → FF81 Linux - high CPU load playing video/stream (no GPU usage?)

Setting a component for this issue in order to get the dev team involved.
If you feel it's an incorrect one please feel free to change it to a more appropriate one.

Component: Untriaged → Audio/Video: Playback
Product: Firefox → Core

Martin, do you have any idea what's going on?

Severity: -- → S3
Type: defect → enhancement
Flags: needinfo?(stransky)

Star, could you post the output you get when you run vainfo in the terminal? AFAIK the ffmpeg va-api backend can be used for both hardware and software decoding. That means if your GPU decoder doesn't support a certain codec, ffmpeg will just decode things in software, upload to VRAM and still share the result with firefox via dmabuf.

I do not know if there's a clear hint when a certain format is decoded in hardware or not - but Martin should know :)

Please attach a full Firefox log from the playback.
Thanks.

Flags: needinfo?(stransky) → needinfo?(star)

(In reply to Robert Mader [:rmader] from comment #3)

<...> AFAIK the ffmpeg va-api backend can be used for both hardware and software decoding. That means if your GPU decoder doesn't support a certain codec, ffmpeg will just decode things in software, upload to VRAM and still share the result with firefox via dmabuf.

Nope, doesn't work that way. For example, vaExportSurfaceHandle() that exports dma-buf surface, is part of libva, which in turn calls an implementation from a loaded VA-API driver.

It's technically possible to make a VA-API driver that performs software decoding and uploads results to dma-buf surfaces. Doesn't make a lot of sense, but still possible. But FFmpeg is definitely not an appropriate place for such code. Even with some crazy tricks it won't be able to reliably intercept calls to vaExportSurfaceHandle().

media.ffmpeg.vaapi.enabled internally also enables media.ffmpeg.dmabuf-textures.enabled which is software decoding to dmabuf. It reduces at least one copy and can also prevent a sluggish UI when playing 8K videos (if VAAPI doesn't decode beyond 4K).

(In reply to Rinat from comment #5)

It's technically possible to make a VA-API driver that performs software decoding and uploads results to dma-buf surfaces. Doesn't make a lot of sense, but still possible. But FFmpeg is definitely not an appropriate place for such code. Even with some crazy tricks it won't be able to reliably intercept calls to vaExportSurfaceHandle().

Just a note: We do that on Firefox side, see:
https://searchfox.org/mozilla-central/rev/dafb74eec8028248324018e8cd32b93808e3fd5c/dom/media/platforms/ffmpeg/FFmpegVideoDecoder.cpp#776
and below.

Thanks everyone for clarification!

(In reply to Robert Mader [:rmader] from comment #3)

Star, could you post the output you get when you run vainfo in the terminal? AFAIK the ffmpeg va-api backend can be used for both hardware and software decoding. That means if your GPU decoder doesn't support a certain codec, ffmpeg will just decode things in software, upload to VRAM and still share the result with firefox via dmabuf.

I do not know if there's a clear hint when a certain format is decoded in hardware or not - but Martin should know :)

Sorry for being away for a while. Here it is:
libva info: VA-API version 1.7.0
libva info: Trying to open /usr/lib/x86_64-linux-gnu/dri/iHD_drv_video.so
libva info: Found init function __vaDriverInit_1_7
libva info: va_openDriver() returns 0
vainfo: VA-API version: 1.7 (libva 2.6.0)
vainfo: Driver version: Intel iHD driver for Intel(R) Gen Graphics - 20.1.1 ()
vainfo: Supported profile and entrypoints
VAProfileMPEG2Simple : VAEntrypointVLD
VAProfileMPEG2Main : VAEntrypointVLD
VAProfileH264Main : VAEntrypointVLD
VAProfileH264Main : VAEntrypointEncSliceLP
VAProfileH264High : VAEntrypointVLD
VAProfileH264High : VAEntrypointEncSliceLP
VAProfileJPEGBaseline : VAEntrypointVLD
VAProfileJPEGBaseline : VAEntrypointEncPicture
VAProfileH264ConstrainedBaseline: VAEntrypointVLD
VAProfileH264ConstrainedBaseline: VAEntrypointEncSliceLP
VAProfileVP8Version0_3 : VAEntrypointVLD
VAProfileHEVCMain : VAEntrypointVLD
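
In a vainfo listing like the one above, VAEntrypointVLD marks a decode entry point, while the VAEntrypointEnc* entries are for encoding. The following is a minimal sketch of how one might check a given profile for hardware decode support; the function name has_hw_decode is hypothetical, and it only reads vainfo text from standard input. Read this way, the listing contains no VP9 profile, so a VP9 stream would presumably not be decoded in hardware by this driver.

```shell
# Succeeds (exit status 0) if the vainfo output on stdin lists the
# decode entry point VAEntrypointVLD for the given profile name.
# has_hw_decode is a hypothetical helper, not part of libva or vainfo.
has_hw_decode() {
  profile=$1
  awk -v p="$profile" -F':' '
    {
      # The profile name is the text before the first colon, padded with spaces.
      name = $1
      gsub(/[ \t]/, "", name)
    }
    name == p && $2 ~ /VAEntrypointVLD/ { found = 1 }
    END { exit (found ? 0 : 1) }
  '
}

# Usage:
#   vainfo 2>&1 | has_hw_decode VAProfileH264High && echo "H.264 High: hardware decode listed"
```

The profile name is matched exactly rather than as a substring, so VAProfileH264Main does not accidentally match a longer profile name.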

Flags: needinfo?(star)
Attached file: log as requested (deleted)
(In reply to Martin Stránský [:stransky] from comment #4)

Please attach a full Firefox log from the playback.
Thanks.

What exactly should I do? What is missing from the log in the original post?

(In reply to star from comment #10)

(In reply to Martin Stránský [:stransky] from comment #4)

Please attach a full Firefox log from the playback.
Thanks.
What exactly should I do? What is missing from the log in the original post?

Just run this in a terminal:

MOZ_X11_EGL=1 MOZ_LOG="PlatformDecoderModule:5" firefox > log.txt 2>&1

play the video and attach log.txt here.
Thanks.
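
Once log.txt exists, a quick summary can show whether the VA-API path was taken at all. The sketch below counts two message types that appear in the log excerpt from the original report; the function name summarize_vaapi_log is hypothetical, and the patterns are assumed to match the log format of this Firefox version only.

```shell
# Print how often VA-API initialisation succeeded and how many frames
# were reported on the DMABUF/VA-API path in a MOZ_LOG capture.
# summarize_vaapi_log is a hypothetical helper name.
summarize_vaapi_log() {
  log_file=$1
  # grep -c prints the number of matching lines; it exits non-zero when
  # there are no matches, so the status is ignored with "|| true".
  init_ok=$(grep -c 'VA-API FFmpeg init successful' "$log_file" || true)
  frames=$(grep -c 'DMABUF/VA-API Got one frame output' "$log_file" || true)
  echo "init_successful=$init_ok frames=$frames"
}

# Usage:
#   summarize_vaapi_log log.txt
```

A non-zero frame count only shows that frames went through the dmabuf path. As noted earlier in this bug, that path can also carry software-decoded frames, so this is a hint rather than proof of hardware decoding.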

Flags: needinfo?(star)

I just updated to the 82.0.3 (Ubuntu) release and attached the log above. Thanks for looking at it.

Flags: needinfo?(star)
Attached file: log.txt (deleted)

The decoding to dmabuf seems to be done correctly. I expect there's an issue with playback via WebRender. Can you check whether video size makes a difference here? For instance, compare CPU usage when 720p/1K/2K/4K video is played over VA-API in Firefox. To avoid potential issues, it would be good to download the clips and open them locally from disk, i.e. avoid streaming. You can get some at https://www.h264info.com/clips.html
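
To make such a comparison repeatable, CPU time can be sampled from /proc before and after a fixed playback interval. The following is a rough sketch for Linux only; cpu_ticks and cpu_percent are hypothetical helper names, and the result is expressed as a percentage of one core. Since Firefox decodes in child processes (the log above shows "Child 7583"), all relevant PIDs need to be passed in.

```shell
# Sum of user + system CPU ticks consumed so far by the given PIDs.
cpu_ticks() {
  total=0
  for pid in "$@"; do
    # Strip "pid (comm) " first: comm may contain spaces, which would
    # shift the field numbers. utime and stime are then fields 12 and 13.
    t=$(sed 's/^.*) //' "/proc/$pid/stat" 2>/dev/null | awk '{ print $12 + $13 }')
    total=$((total + ${t:-0}))
  done
  echo "$total"
}

# Convert a tick delta over an interval into percent of one core.
# Arguments: ticks_before ticks_after interval_seconds ticks_per_second
cpu_percent() {
  awk -v a="$1" -v b="$2" -v secs="$3" -v hz="$4" \
    'BEGIN { printf "%.1f\n", (b - a) * 100 / (hz * secs) }'
}

# Usage (assumes pgrep is installed and the processes match "firefox"):
#   pids=$(pgrep -f firefox); hz=$(getconf CLK_TCK)
#   before=$(cpu_ticks $pids); sleep 10; after=$(cpu_ticks $pids)
#   cpu_percent "$before" "$after" 10 "$hz"
```

Running the same measurement with mpv for each clip size would give numbers that can be compared directly with the Firefox figures.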

Flags: needinfo?(star)

The four files I tested all behave the same. While mpv creates a load of around 15..20%, Firefox is at around 60..80%, with all files played from disk. Do I have something misconfigured in Firefox? It also seems that scrolling creates more load than in Chromium.
Ah, btw: playback of the same video file in Chromium needs 10..20% less, so around 40..60%, but that is still a lot more than mpv.
In case you wonder why I care: my Core m5 draws a lot more power under load, so this affects battery time.

Flags: needinfo?(star)

Can you attach output of this terminal command?

lspci | grep "VGA"

Thanks.

Flags: needinfo?(star)

FTR, it's expected that we take more CPU time than mpv, at least until bug 1650378 is solved. But certainly not that much more :)

(In reply to Martin Stránský [:stransky] from comment #16)

Can you attach output of this terminal command?

lspci | grep "VGA"

Thanks.

00:02.0 VGA compatible controller: Intel Corporation HD Graphics 515 (rev 07)

Flags: needinfo?(star)

I see, thanks. As the chip does not have dedicated GPU memory and uses system RAM, I think we do something wrong when we make a texture from dmabuf or so. But it's hard to debug when I don't have the HW available.

(In reply to Martin Stránský [:stransky] from comment #19)

I see, thanks. As the chip does not have dedicated GPU memory and uses system RAM, I think we do something wrong when we make a texture from dmabuf or so. But it's hard to debug when I don't have the HW available.

I thought most of these chips don't have dedicated RAM. Is there anything I can do to help you with that? Let me know.

Do you still see that bug?
Thanks.

Flags: needinfo?(star)
Status: UNCONFIRMED → RESOLVED
Closed: 2 years ago
Resolution: --- → WORKSFORME