Open
Bug 1394061
Opened 7 years ago
Updated 2 years ago
av1 performance regression
Categories
(Core :: Audio/Video: Playback, defect, P3)
NEW
People
(Reporter: rillian, Assigned: drno)
The recent update of the av1 reference implementation in third_party/aom to upstream commit id f5bdeac22930ff4c6b219be49c843db35970b918 (bug 1380118) resulted in dropped frames for streams over 1 Mbps, as demonstrated by the demo at https://demo.bitmovin.com/public/firefox/av1/ even on high-end hardware.
A lot of new features have been added to the code recently, so it may just be extra complexity. It's also now returning 16 bit-per-channel image data, even for 8-bit input, so memory bandwidth should be higher. Or maybe something is interacting badly with the Firefox playback scheduling.
This bug is about tracking down and resolving the regression so the demo plays smoothly.
Comment 1•7 years ago (Reporter)
David, the update should be merged soon, making it easier to verify this. Could you run your profiler, please, and see if anything stands out vs the 2017 August 25 Firefox Nightly?
Flags: needinfo?(dmajor)
Comment hidden (typo)
(Fixed typo)
As a first step sanity-check, I confirmed that I can reproduce the symptoms on my test machine (haven't run the profiler yet) with https://demo.bitmovin.com/public/firefox/av1/ at 1Mbps.
Nightly 08-25 takes about 9% of my 8-core CPU, so 75% of a core
Nightly 08-28 takes about 15% of my CPU, so a full core and then some
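The conversion from whole-machine CPU percentage to cores can be sketched as below. This is only an illustration of the arithmetic in this comment; the function name `cores_used` is made up for the example, and the 8-core count is taken from the comment.

```python
def cores_used(total_cpu_percent, logical_cores):
    """Convert a whole-machine CPU percentage into an equivalent number of cores."""
    return total_cpu_percent / 100.0 * logical_cores


# Figures from the comment: an 8-core machine at 9% and 15% total CPU.
before = cores_used(9, 8)    # Nightly 08-25, roughly three quarters of a core
after = cores_used(15, 8)    # Nightly 08-28, a bit more than one full core

print(f"08-25: {before:.2f} cores, 08-28: {after:.2f} cores")
```

Since a decoder thread that needs more than one core's worth of work cannot keep up in real time, crossing 1.0 here is consistent with the dropped frames.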
Flags: needinfo?(dmajor)
Huge increase in av1_loop_filter_rows:
Nightly 0825:
xul.dll!av1_loop_filter_frame, 7532
xul.dll!av1_loop_filter_rows, 7531
xul.dll!av1_filter_block_plane_non420_ver, 4828
xul.dll!av1_filter_block_plane_non420_hor, 2676
Nightly 0828:
xul.dll!av1_loop_filter_frame, 33164
xul.dll!av1_loop_filter_rows, 33163
xul.dll!av1_filter_block_plane_vert, 16703
xul.dll!av1_filter_block_plane_horz, 16378
Also a large increase in CreateAndCopyData:
Nightly 0825:
xul.dll!mozilla::VideoData::CreateAndCopyData, 611
xul.dll!mozilla::VideoData::SetVideoDataToImage, 607
xul.dll!mozilla::layers::SharedPlanarYCbCrImage::CopyData, 607
xul.dll!mozilla::layers::UpdateYCbCrTextureClient, 589
xul.dll!mozilla::layers::MappedYCbCrTextureData::CopyInto, 588
xul.dll!mozilla::layers::MappedYCbCrChannelData::CopyInto, 588
Nightly 0828:
xul.dll!mozilla::VideoData::CreateAndCopyData, 4907
xul.dll!mozilla::VideoData::SetVideoDataToImage, 4895
xul.dll!mozilla::layers::SharedPlanarYCbCrImage::CopyData, 4895
xul.dll!mozilla::layers::UpdateYCbCrTextureClient, 4882
xul.dll!mozilla::layers::MappedYCbCrTextureData::CopyInto, 4882
xul.dll!mozilla::layers::MappedYCbCrChannelData::CopyInto, 4882
And some notable decreases:
av1_cdef_frame decreased from 7106 to 4573
update_boundary_info (#663) decreased from 6043 to negligible
The numbers in the data above are sample counts at 8kHz for 10 seconds of playback. So about 80k samples represents "a full core" worth of processing. In total I had 72k samples in nightly 0825, and 92k samples in nightly 0828, which more or less agrees with my comment 3.
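A rough cross-check of these figures, as a sketch only: it converts sample counts into fractions of a core and sums the per-function changes listed above. Treating "negligible" as zero samples and using the rounded totals (72k, 92k) are assumptions made for the example.

```python
SAMPLE_RATE_HZ = 8000
DURATION_S = 10
FULL_CORE_SAMPLES = SAMPLE_RATE_HZ * DURATION_S  # 80,000 samples = one busy core


def core_fraction(samples):
    """Fraction of one core represented by a sample count."""
    return samples / FULL_CORE_SAMPLES


# Change in sample counts from Nightly 0825 to Nightly 0828.
deltas = {
    "av1_loop_filter_frame": 33164 - 7532,
    "VideoData::CreateAndCopyData": 4907 - 611,
    "av1_cdef_frame": 4573 - 7106,
    "update_boundary_info": 0 - 6043,  # assumption: "negligible" counted as 0
}
net_listed = sum(deltas.values())
net_total = 92000 - 72000  # rounded totals from the comment

for name, delta in deltas.items():
    print(f"{name}: {delta:+d} samples ({core_fraction(delta):+.2f} core)")
print(f"listed changes: {net_listed:+d}, overall change: about {net_total:+d}")
```

The listed changes sum to about 21k samples against an overall increase of about 20k, so under these assumptions the loop filter and the copy path account for essentially the whole regression.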
Comment 7•7 years ago (Reporter)
Thanks, David. That's very helpful. It looks like the PARALLEL_DEBLOCK feature disabled SIMD in the loop filter. I've tried to re-enable it in https://aomedia-review.googlesource.com/c/aom/+/19920 but there may be alignment issues.
The CreateAndCopyData spike is on our side. I thought using the mSkip member of YCbCrBuffer::Plane would avoid the overhead of my downsampling loop, but if it did, it didn't help enough. Hopefully native 10/12-bit support (bug 1215089) will give us a fast-path here, but in the meantime I'll look into optimizing what we have.
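To illustrate the two approaches being compared, here is a minimal sketch, not the Gecko code. It assumes 8-bit samples stored in little-endian 16-bit containers, and it assumes a skip value means "bytes skipped between consecutive samples"; the function names are hypothetical.

```python
import struct


def pack_plane16(samples):
    """Pack samples into little-endian 16-bit containers (assumed layout)."""
    return struct.pack("<%dH" % len(samples), *samples)


def strided_low_bytes(plane16, skip=1):
    """Skip-style copy: take one byte, skip `skip` bytes, repeat.

    Only valid when every sample fits in 8 bits, so the low byte is the sample.
    """
    return plane16[::skip + 1]


def downsample_loop(plane16, bit_depth=8):
    """Explicit per-sample loop: unpack each 16-bit value and shift down to 8 bits."""
    count = len(plane16) // 2
    samples = struct.unpack("<%dH" % count, plane16)
    shift = bit_depth - 8
    return bytes(s >> shift for s in samples)


plane = pack_plane16([0, 128, 255])
print(strided_low_bytes(plane) == downsample_loop(plane))
```

Both paths still read twice as many source bytes as a native 8-bit plane would, which fits the observation that the skip-based copy did not remove enough of the cost.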
Assignee: nobody → giles
Updated•7 years ago
Priority: -- → P3
Comment 8•7 years ago (Reporter)
Nils, this is a tracking bug for a performance regression from the last update we did. The recent upstream optimizations should help, but we probably still need a 16-bit fast-path. Overlaps with HDR work.
Assignee: giles → drno
Updated•2 years ago
Severity: normal → S3