Closed Bug 602080 Opened 14 years ago Closed 8 years ago

Perf regression from bug 130078 in CPU use during video playback

Categories

(Core Graveyard :: Tracking, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(blocking2.0 -)

RESOLVED INCOMPLETE
mozilla2.0b7
Tracking Status
blocking2.0 --- -

People

(Reporter: kinetik, Assigned: romaxa)

References

Details

While investigating bug 601542, I discovered further possible perf regressions in video playback that need to be investigated. I'm testing on x86_64 Linux and building with optimizations (-g -O2 -fno-omit-frame-pointer). Testing against a locally downloaded copy of http://media.tinyvid.tv/3d198wqepg78m.ogg and using /usr/bin/time to measure CPU. The browser is run with the video on the command line, and closed with Ctrl-C once it fades to black: 4613e0369937 (immediately before Ogg decoder rewrite/removal of liboggplay): 21.38user 5.00system 3:36.90elapsed 12%CPU (0avgtext+0avgdata 303504maxresident)k 16inputs+824outputs (1major+405814minor)pagefaults 0swaps 1d00691be5f2 (Ogg decoder rewrite landed): 29.67user 6.00system 3:34.40elapsed 16%CPU (0avgtext+0avgdata 296608maxresident)k 0inputs+14040outputs (0major+32544minor)pagefaults 0swaps * User CPU increased 38%. e560e0fe90d1 (recent trunk): 35.71user 5.47system 3:34.09elapsed 19%CPU (0avgtext+0avgdata 320256maxresident)k 0inputs+408outputs (0major+22370minor)pagefaults 0swaps * User CPU increased 20% from 1d00691be5f2, 67% from 4613e0369937. Bug 601542 is tracking a regression that seems to be caused by writing smaller audio blocks more frequently. We also seem to have lost another ~5% between the old backend and the new backend (see bug 601542 comment 6). Possible causes for the rest of the regression (completely untested guesses, also not exhaustive): * Audio Data API (bug 490705) Adds extra work to decoder even when there are no listeners are attached. * Replacement Y'CbCr code (bug 551277) Possibly slower on x86_64 and/or machines with modern SSE cores. * Decoder refactoring (bug 560708/bug 563825) * Layers changes (bug ?)
0aefd9d484f1 (before Audio Data API landed): 30.86user 5.97system 3:38.43elapsed 16%CPU (0avgtext+0avgdata 359120maxresident)k 67520inputs+8608outputs (60major+34701minor)pagefaults 0swaps 081a707a76b8 (Audio Data API landed): 30.39user 5.94system 3:34.86elapsed 16%CPU (0avgtext+0avgdata 605008maxresident)k 37480inputs+35176outputs (44major+47022minor)pagefaults 0swaps This seems to eliminate bug 490705, and also suggests that ~5s have been lost sometime between 081a707a76b8 and now.
Someone suggested disabling controls (by commenting out the binding in html.css), which doesn't gain our ~5s back, but does reveal the controls cost us around ~4-5% user CPU overhead: 081a707a76b8 (Audio Data API landed), no controls: 29.14user 5.54system 3:35.66elapsed 16%CPU (0avgtext+0avgdata 644400maxresident)k 12200inputs+29104outputs (1major+46763minor)pagefaults 0swaps e560e0fe90d1 (recent trunk), no controls: 34.03user 7.08system 3:40.14elapsed 18%CPU (0avgtext+0avgdata 303664maxresident)k 28960inputs+6824outputs (179major+20910minor)pagefaults 0swaps
ac526bc2af65 (before gfx/ycbcr landed): 33.48user 5.56system 3:34.39elapsed 18%CPU (0avgtext+0avgdata 266896maxresident)k 4064inputs+8outputs (0major+52678minor)pagefaults 0swaps ae968e2a4792 (after that and bug 551277 landed, contains quite a few other changesets): 33.57user 6.68system 3:34.21elapsed 18%CPU (0avgtext+0avgdata 269184maxresident)k 184inputs+8outputs (2major+43070minor)pagefaults 0swaps So, no change there. Also, user CPU has dropped from ~33s to ~30s when the Audio Data API landed, then gone back up to ~35s? I'm confused. The measurements I'm making seem to be repeatable within ~0.5s, so I don't think this is measurement error.
Ugh, last set of measurements (comment 3 only) are bogus because I was relying on the new default objdir behaviour. Rerun: ac526bc2af65 (before gfx/ycbcr landed): 29.28user 5.01system 3:39.30elapsed 15%CPU (0avgtext+0avgdata 334688maxresident)k 38056inputs+39664outputs (0major+35131minor)pagefaults 0swaps ae968e2a4792 (after that and bug 551277 landed, contains quite a few other changesets): 30.42user 5.09system 3:57.63elapsed 14%CPU (0avgtext+0avgdata 297888maxresident)k 3560inputs+12656outputs (1major+22330minor)pagefaults 0swaps ~3.8% difference, possibly within the noise.
So there are two major changes and one smaller one: * ~21s to ~29s - decoder rewrite, mostly audio stuff (bug 601542) * ~29s to ~30s - YCbCr changes (bug 551277) * ~30s to ~35s - tracemonkey merge (http://hg.mozilla.org/mozilla-central/rev/f290d1fe9544)
Bisected the ~30s to ~35s change within the TM merge and found it was this changeset: http://hg.mozilla.org/mozilla-central/rev/ffacf623a9df previous changeset: 30.63user 6.73system 3:33.80elapsed 17%CPU (0avgtext+0avgdata 616192maxresident)k 8320inputs+13768outputs (0major+45025minor)pagefaults 0swaps ffacf623a9df: 36.71user 5.35system 3:37.48elapsed 19%CPU (0avgtext+0avgdata 606416maxresident)k 0inputs+4704outputs (0major+40762minor)pagefaults 0swaps 20% user CPU hit from that change.
Summary: Investigate possible perf regression in video decoder → Multiple perf regressions in CPU use during video playback (67% higher CPU use)
Nominating for blocking so that this is on the table for someone (me?) to continue tracking down the 30 to 35s regression.
blocking2.0: --- → ?
We should probably split these out into separate bugs. We've already got bug 601542 so we probably just need one more bug for the regression associated with bug 130078.
romaxa's patch in bug 615870 solves (and even improves) the painting regression: 31.15user 4.27system 3:40.86elapsed 16%CPU (0avgtext+0avgdata 413120maxresident)k 8inputs+10432outputs (1major+29800minor)pagefaults 0swaps 23.53user 4.33system 3:41.61elapsed 12%CPU (0avgtext+0avgdata 411488maxresident)k 0inputs+12752outputs (1major+29882minor)pagefaults 0swaps Roughly 25% less CPU use. I'm going to mark this bug as depending on 615870. The other large regression is covered by 601542, and the YCbCr stuff has changed again with an updated code landing, so it has probably improved.
Blocks: 130078
Component: Video/Audio → Tracking
Depends on: 615870
Summary: Multiple perf regressions in CPU use during video playback (67% higher CPU use) → Perf regression from bug 130078 in CPU use during video playback
Target Milestone: --- → mozilla2.0b7
Assignee: nobody → romaxa
Whiteboard: [softblocker]
So bug 615870 gets back the performance that we lost and then some, but do we understand why bug 130078 caused a perf regression in the first place? Maybe we are leaving perf on the table, or maybe 130078 is causing perf problems in other places that bug 615870 don't fix.
I assume that it's the cost of extra display list construction, but I can't prove it. One interesting thing would be to compare video performance on a popup window with no chrome ... see if that regressed when 130078 landed.
Taking off the blocker list since we fixed the regression, but leaving open for investigation.
blocking2.0: final+ → -
Whiteboard: [softblocker]
Marking all tracking bugs which haven't been updated since 2014 as INCOMPLETE. If this bug is still relevant, please reopen it and move it into a bugzilla component related to the work being tracked. The Core: Tracking component will no longer be used.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → INCOMPLETE
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.