Open Bug 1576107 Opened 5 years ago Updated 2 years ago

Degraded video performance and high energy consumption for streamed video on MacOS

Categories

(Core :: Audio/Video, defect, P3)

All
macOS
defect

Tracking

()

People

(Reporter: whimboo, Unassigned)

References

(Depends on 2 open bugs, Blocks 1 open bug)

Details

(Whiteboard: [media-performance])

Attachments

(2 files, 1 obsolete file)

Attached image Screenshot 2019-08-23 at 10.25.13.png (deleted) —

Mozilla/5.0 (Macintosh; Intel Mac OS X 10.14; rv:70.0) Gecko/20100101 Firefox/70.0 ID:20190821215524

In short, watching streamed video on macOS is a bad experience for users because of the high power usage and the number of dropped frames across nearly all video resolutions and supported codecs. The details behind this statement follow below.

Earlier this year I worked on a new video streaming benchmark suite for Raptor called the youtube-playback performance tests. It uses the playback-perf test suite from Google, a copy of which we now also run in our own infrastructure. The suite contains videos with different codecs (H264 and VP9), resolutions (from 144p up to 2160p), frame rates (30 and 60), and playback speeds (0.25x .. 1.0x .. 2.0x). The Raptor benchmark iterates over all of those individual video files, plays each one for 15s, and measures the number of dropped frames. The harness of the test site tolerates at most a single dropped frame and marks everything above that as failed. To let us also measure the number of dropped frames with Raptor, I updated the test suite accordingly, and even got some improvements included upstream.

The Raptor tests have been running on integration branches since May, and have since produced a lot of interesting data across all supported platforms. The data is best viewed in detail on Perfherder, but because of the complexity of the test results we also summarize it in the health dashboard. Let's drill in from the top down:

Health dashboard

This dashboard is used to visualize various performance results at a high level, including ours. Those can be found in the "Media Playback" section at https://health.graphics/quantum/64. See also the first attached screenshot for a snapshot from today.

Now, what the different box colors mean: green means zero dropped frames, orange means 1, and red means anything above that. As mentioned above, we use the same threshold that Google defines for its own test runner results, which fail if more than 1 frame is dropped. The results for VP9 and H264 encoded files are split into two sections.
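
The color rule can be written down as a small function. This is only an illustrative Python sketch: the name classify_dropped_frames is made up for this comment and is not taken from the dashboard's code.

```python
def classify_dropped_frames(dropped: int) -> str:
    """Map a dropped-frame count to the box color used on the dashboard."""
    if dropped < 0:
        raise ValueError("dropped frame count cannot be negative")
    if dropped == 0:
        return "green"   # no dropped frames
    if dropped == 1:
        return "orange"  # still within the tolerance of one dropped frame
    return "red"         # more than one dropped frame counts as failed


# Example counts, including the rough Mac numbers mentioned in this bug.
for count in (0, 1, 5, 45, 400):
    print(count, classify_dropped_frames(count))
```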

While performance on all other major platforms, even including Windows on ARM64, is good for VP9 and somewhat degraded for high-resolution H264 videos, the results for Mac are bad across the board. It's not even possible to play back videos at 480p resolution without a lot of dropped frames.

In those 15s of playback a 480p VP9 video produces around 5 dropped frames, and 720p and 1080p videos already produce ~45 dropped frames. It gets even worse with 4k@60fps videos, which have more than 400 dropped frames even at a playback speed of 1.0x!

Hovering over the boxes and clicking them brings you to a more detailed page on the health dashboard. You will notice a bit of noise in the results, which could probably be improved.

Perfherder

As the health dashboard pulls its data from Perfherder and only shows the results of the tests that matter most for now, Perfherder has all the details. E.g. this example shows the mean number of dropped frames across all the performed video tests for the major platforms. MacOS clearly falls apart, with nearly 10x as many dropped frames.

By clicking a data point and selecting the Compare link in the popup, it's possible to see the raw data of the collected results, such as the number of decoded and dropped frames and the percentage of dropped/decoded.

Chromium

Besides Firefox, the Test Engineering Performance team also plans to run those tests with Chromium. I hope they will be enabled soon, so that performance numbers are available for comparison. See bug 1554967 for details.

Reproducibility

To check those numbers on your own, just load the mentioned performance test suite from Google and run all the tests. If you want to use Raptor to generate reusable data, just run

./mach raptor-test --test raptor-youtube-playback (--post-startup-delay 0)

Results in JSON format will be printed to the console instead of being uploaded to Perfherder.

Energy consumption

While the tests are running you can see that the energy consumption of Firefox is really high. I haven't done any measurements on that, but the Test Engineering Performance team is currently working on getting the power tests updated to also run those video streaming tests (personally I'm not aware of an open bug yet).

Also, when watching videos on other streaming sites like tvnow.de and keeping an eye on the CPU load, I always see 200% usage (2 full cores). Compare this to Safari, which uses only about 40-60%.

Summary: Degraded video performance and high engery consumption for streamed video → Degraded video performance and high engery consumption for streamed video on MacOS
Depends on: 1575575

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+2] from comment #0)

Energy consumption

While the tests are running you can see that the energy consumption of Firefox is really high. I haven't done any measurements on that, but the Test Engineering Performance team is currently working on getting the power tests updated to also run those video streaming tests (personally I'm not aware of an open bug yet).

Also, when watching videos on other streaming sites like tvnow.de and keeping an eye on the CPU load, I always see 200% usage (2 full cores). Compare this to Safari, which uses only about 40-60%.

GPU usage, on top of CPU usage, is a big problem. The GPU should be doing practically nothing (as it does in any other browser). When using Intel Power Gadget on OSX, the GPU frequency is around 0.2GHz when playing a VP9 video in Chrome, and consistently pinned to its max frequency in Firefox.

Bug 1400787 has energy consumption figures.

Priority: -- → P2
Summary: Degraded video performance and high engery consumption for streamed video on MacOS → Degraded video performance and high energy consumption for streamed video on MacOS

So the reported dropped frames is the sum of mDroppedDecodedFrames, mDroppedSinkFrames, mDroppedCompositorFrames. It would be good to know which of these three is the problem.

I ran the 88. PlaybackPerf.VP9.1080p60@1.5X test (from our instance) on a bunch of different Macs in the office. This test has consistently high numbers on the graph page for dropped frames for VP9 videos on macOS. However, out of the six Macs tested, only one showed any dropped frames at all, and it was a very old machine.

  • MacBook Pro (Late 2016), 10.12.6: no dropped frames.
  • MacBook Pro (Early 2013), 10.14.6: no dropped frames.
  • MacBook Pro (Mid 2012), 10.15: no dropped frames.
  • Mac mini (Late 2012), 10.13.6: no dropped frames.
  • MacBook Pro (Mid 2010), 10.13.6: no dropped frames.
  • MacBook Pro (Early 2011), 10.9.2: 300-500 dropped frames.

The only videos that show any frame drops on the two most modern machines are VP9 2160p60 videos at 1.5x and 2x.

(In reply to Markus Stange [:mstange] from comment #4)

The only videos that show any frame drops on the two most modern machines are VP9 2160p60 videos at 1.5x and 2x.

I have the same results on my mid-2015 MBP running "./mach raptor-test --test raptor-youtube-playback".

Hi Henrik, on what hardware are the tests running in CI? Thanks!

Flags: needinfo?(hskupin)

Mac tests are running on "Mac Mini R7". Windows and Linux tests are running on "HPE Moonshot". https://wiki.mozilla.org/TestEngineering/Performance/Talos/Platforms

Flags: needinfo?(hskupin)

The pattern seems to be the following:

During video playback, in the beginning, all frames correctly make it to the screen, and become "presented" frames.
Then, after some number of frames that seems to be between 200 and 500 for me, something changes and no frames make it to the screen from here on out. Instead, they all become "dropped sink frames". This happens both on my Late 2016 MBP and on the Early 2011 MBP.
On the Early 2011 machine, on one attempt I saw a slightly different phenomenon: After 408 presented frames and 122 dropped sink frames, the rest of the frames became "dropped decoded frames". (Final statistic: 408 presented, 122 dropped sink, 382 dropped decoded. This sums up to 912 which is more than the total 720.)
Here's a profile from the Early 2011 machine where all video frames start getting dropped at the 28.5 second mark.
Here's a profile from the Late 2016 machine (captured by Instruments) where all video frames start getting dropped at the 8 second mark.
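
To make the bookkeeping for the Early 2011 run explicit, here is a small Python sketch that adds up the reported categories and compares them against the expected total. The class and field names are hypothetical (only the three Gecko counter names in the comments come from this bug), and the dropped compositor count for that run was not stated, so it is assumed to be 0.

```python
from dataclasses import dataclass


@dataclass
class FrameStats:
    """Frame counters for one playback run (hypothetical container)."""

    presented: int
    dropped_decoded: int      # corresponds to mDroppedDecodedFrames
    dropped_sink: int         # corresponds to mDroppedSinkFrames
    dropped_compositor: int   # corresponds to mDroppedCompositorFrames

    @property
    def dropped_total(self) -> int:
        # The reported "dropped frames" value is the sum of the three counters.
        return self.dropped_decoded + self.dropped_sink + self.dropped_compositor

    @property
    def accounted(self) -> int:
        # Presented frames plus all dropped frames.
        return self.presented + self.dropped_total


# Numbers from the Early 2011 run; dropped_compositor=0 is an assumption.
run = FrameStats(presented=408, dropped_decoded=382, dropped_sink=122,
                 dropped_compositor=0)
expected_total = 720

print("dropped:", run.dropped_total)
print("accounted for:", run.accounted)
print("excess over expected total:", run.accounted - expected_total)
```

If the counters were mutually exclusive, the accounted value could not exceed the total, so a positive excess suggests that some frames are counted in more than one category.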

On the 2160 videos, I also sometimes get into the state where the video throbber appears halfway through and the network requests are restarted.
Here's an example profile of that happening.

Depends on: 1589230
Attachment #9101861 - Attachment is obsolete: true

I chatted with Henrik on Slack about this briefly yesterday. He said that he used to see the poor performance when running the tests locally on his MacBook Pro, but when he retested yesterday things were much improved, which suggests that something is going on with the Mac minis in automation.

Looking at the machine specs, the Mac minis in automation are dual-core machines. I asked Markus, and the Mac mini tested in comment 3 is a quad-core machine. However, he pointed out that one of the laptops he had tested on was dual-core and the tests were fine there, so this might not be a relevant difference.

Henrik also mentioned that there is a WIP on Bug 1553131 to get Gecko profiler profiles while running in automation. I'm going to see what results I get with that next.

Attached file Dropped frames by type (deleted) —

(In reply to Jeff Muizelaar [:jrmuizel] from comment #2)

So the reported dropped frames is the sum of mDroppedDecodedFrames, mDroppedSinkFrames, mDroppedCompositorFrames. It would be good to know which of these three is the problem.

I ran three try jobs: one with only dropped decoded frames reported [1], one with only dropped sink frames reported [2], and one with only dropped compositor frames reported [3].

The only tests on which we are dropping decoded frames are VP9.2160p30@2x and VP9.2160p60@2x, which are problematic even when run outside of automation. Looking at the PlaybackPerf.VP9.1080p60@1.5X test, we have 0 of 690 dropped sink frames and 210 of 909 dropped compositor frames. That isn't necessarily typical: some tests show only dropped sink frames, and some have a mixture of both. The full results are attached.

Since these results come from combining individual try runs made at different times, we have to be a bit careful when making direct comparisons, but I would say they do point to the problem being at the sink or the compositor and not with decoding.
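
For reference, the PlaybackPerf.VP9.1080p60@1.5X figures quoted above can be turned into drop rates with a few lines of Python. This is just a sketch: dropped_percentage is a made-up helper name, and the denominators are used exactly as reported, without assuming what they count.

```python
def dropped_percentage(dropped: int, total: int) -> float:
    """Return dropped/total as a percentage."""
    if total <= 0:
        raise ValueError("total must be positive")
    return 100.0 * dropped / total


# Figures from the two separate try runs for PlaybackPerf.VP9.1080p60@1.5X.
results = {
    "dropped sink frames": (0, 690),
    "dropped compositor frames": (210, 909),
}

for label, (dropped, total) in results.items():
    rate = dropped_percentage(dropped, total)
    print(f"{label}: {dropped} of {total} ({rate:.1f}%)")
```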

[1] https://treeherder.mozilla.org/#/jobs?repo=try&revision=ab86b4fbb7b70e8d10b7511e7f76c5beebd5f71e
[2] https://treeherder.mozilla.org/#/jobs?repo=try&revision=9c899ba0914a3cbae9988e23737c67fecef9ed09
[3] https://treeherder.mozilla.org/#/jobs?repo=try&revision=1544c7a7c305fb2c1711fd6e7c4b0d3fa1c3b100

Here's an excerpt of test 88 from the profile that was captured on the CI machine: https://perfht.ml/2PgC0df
It looks like the ImageBridgeTransactions arrive at bad times.

Blocks: 1604207

Marian, you didn't mention a very important bit from bug 1554967 here: the results with Chromium are even worse. As the following plot shows, it has 6x more dropped frames on MacOS:

https://treeherder.mozilla.org/perf.html#/graphs?highlightAlerts=1&series=mozilla-central,2054686,1,10&series=mozilla-central,2189198,1,10&timerange=2592000

As such I wonder if there is a general problem or limitation with those Mac Minis in CI.

Henrik, I had a discussion with Dave Hunt about these devices. There will be an upgrade in Q1, and then we can see more clearly whether the new hardware solves these issues.

Here is another discussion from yesterday (#perftest on Slack):

davehunt :
@jmaher happy new year! do you know how many Apple Mac Minis we have for perf testing?

jmaher :
@davehunt about 450
it is a shared pool with the unittests
@davehunt on a related note, the mac mini pool is out of warranty and there is new hardware to replace it- this is upgraded specs as that is what is available; timeline is Q1 to replace- probably February


I also ran the youtube-playback tests with "--power-test" to force the Mac jobs to run on MacBook Pro.

But at least we have some results for Firefox:
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&tier=1%2C2%2C3&revision=2b7e94a872ff9b92dddd25ecce10383d9d85464f

The workerID contains "mbpro", so I assume the tests are running on MacBook Pro:
(Selected a ytp job -> Task link -> Worker Type:
https://firefox-ci-tc.services.mozilla.com/provisioners/releng-hardware/worker-types/gecko-t-osx-1014-power)

The ytp score is not displayed under the Performance tab for ytp power tests, but I followed these steps to get the results:

  1. From "OS X 10.14 Shippable opt", selected the ytp jobs
  2. Clicked on the "Job Details" tab at the bottom
  3. Clicked on perfherder-data.json
  4. Clicked on "Collapse All"

The values of the suites[0]['value'] field are:

  1. 35.64
  2. 34.84
  3. 33.85
  4. 36.9
  5. 33.42
  6. 35.92
  7. 36.58
  8. 34.57
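
Instead of clicking through the UI, the same field could be read from a downloaded perfherder-data.json with a short Python sketch like the one below. Only the suites[0]['value'] path comes from this comment; the helper names, the local file path argument, and the sample payload (including its "name" key) are assumptions for illustration.

```python
import json


def suite_value(perfherder_data: dict) -> float:
    """Return the value of the first suite, i.e. suites[0]['value']."""
    return perfherder_data["suites"][0]["value"]


def load_suite_value(path: str) -> float:
    """Read a locally saved perfherder-data.json and return its first suite value."""
    with open(path, encoding="utf-8") as handle:
        return suite_value(json.load(handle))


# Hypothetical minimal payload; a real file contains many more fields.
sample = {"suites": [{"name": "example-suite", "value": 35.64}]}
print(suite_value(sample))
```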

The results are still higher than on other platforms, but lower than for the regular "mac OS 10" jobs on Firefox
(https://firefox-ci-tc.services.mozilla.com/provisioners/releng-hardware/worker-types/gecko-t-osx-1014),
where we have values between 50 and 60:

https://treeherder.mozilla.org/perf.html#/graphs?highlightAlerts=1&series=mozilla-central,2054686,1,10&series=mozilla-central,2044136,1,10&series=mozilla-central,2044425,1,10&series=mozilla-central,2045003,1,10&series=mozilla-central,2043847,1,10&series=mozilla-central,2044714,1,10&series=mozilla-central,2043558,1,10&timerange=2592000

Latest measurements for OS X 10.14 Shippable opt
(running ytp as power tests on MacBook Pro devices):

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f8bcb8a71f67cc6a4c2162d4f038ab573a8622fe
(the ytp tests failed on "Linux x64 shippable opt" because the power tests are designed to run only on Mac OS devices)

Firefox 74.0a1

  • 34.79
  • 35.9
  • 34.36
  • 35.35
  • 34
  • 34.86
  • 35.21
  • 35.43
  • 33.58
  • 35.2
  • 35.8

Chrome 79.0.3945.117

  • 229.82
  • 221.78
  • 229.64
  • 225.41
  • 229.26
  • 222.86
  • 228.6
  • 223.09
  • 222.93
  • 232.99
  • 224.85
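
To compare the two browsers at a glance, the values listed above can be averaged. The following Python sketch only does that arithmetic; the variable names are made up, and it assumes that each listed number is the ytp score of one job and that lower is better.

```python
from statistics import mean

# Scores copied from the lists above (one value per job).
firefox_scores = [34.79, 35.9, 34.36, 35.35, 34, 34.86,
                  35.21, 35.43, 33.58, 35.2, 35.8]
chrome_scores = [229.82, 221.78, 229.64, 225.41, 229.26, 222.86,
                 228.6, 223.09, 222.93, 232.99, 224.85]

firefox_mean = mean(firefox_scores)
chrome_mean = mean(chrome_scores)
ratio = chrome_mean / firefox_mean

print(f"Firefox mean: {firefox_mean:.2f}")
print(f"Chrome mean:  {chrome_mean:.2f}")
print(f"Chrome / Firefox: {ratio:.1f}x")
```

On these MacBook Pro runs the Chrome mean comes out at roughly 6.5 times the Firefox mean.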
Priority: P2 → P3
Whiteboard: [media-performance]
Depends on: 1654040
Depends on: 1655853
Severity: normal → S3