Bug 1600269 (Open), opened 5 years ago, updated 2 years ago

Worse performance with new Fission compatible `drawSnapshot` API

Categories

(Core :: Graphics: Layers, defect, P3)

Tracking

Fission Milestone: Future

People

(Reporter: whimboo, Unassigned, NeedInfo)

References

(Blocks 2 open bugs)

Details

(Keywords: perf, regression)

The following Gecko profile shows the process of taking a full screenshot via Marionette and the new `drawSnapshot` API for https://www.blueorigin.com/latest/gallery (1503 x 4938 pixels):

https://perfht.ml/37Ng3cG

Steps that take a very long time:

  1. 5s to draw the snapshot
  2. ~4s to move the screenshot data to the parent process (1.2GB)
  3. ~4s in toDataURL()
  4. Marionette: ~4s to transfer the base64 encoded data via a socket to the Marionette client
Matt, can you please have a look at the first 3 parts?
Flags: needinfo?(matt.woodrow)
Keywords: perf
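As a rough sanity check on step 4 (hedged: this assumes an uncompressed 4-bytes-per-pixel RGBA buffer and ignores PNG compression, so it is only an upper bound), the raw pixel data for a 1503 x 4938 snapshot is about 28 MiB, and base64 then inflates whatever payload actually goes over the socket by 4/3:

```python
import base64

# Raw RGBA buffer for the 1503 x 4938 test page (4 bytes per pixel),
# ignoring PNG compression -- an upper bound on the encoded image data.
width, height, bytes_per_pixel = 1503, 4938, 4
raw_bytes = width * height * bytes_per_pixel
print(f"raw RGBA buffer: {raw_bytes / 1024**2:.1f} MiB")  # ~28.3 MiB

# base64 encodes every 3 input bytes as 4 output characters, so the
# payload sent over the Marionette socket grows by a factor of 4/3.
sample = b"\x00" * 300
print(len(base64.b64encode(sample)) / len(sample))  # 4/3
```

Note this is far below the 1.2GB mentioned in step 2, which is the recording data moved to the parent process, not the final pixel buffer.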

Using Puppeteer to run https://github.com/puppeteer/puppeteer/blob/master/examples/screenshot-fullpage.js with Chromium yields the following results:

$ time PUPPETEER_PRODUCT=chrome PUPPETEER_EXECUTABLE_PATH=.local-chromium/mac-706915/chrome-mac/Chromium.app/Contents/MacOS/Chromium  NODE_PATH=../ node examples/screenshot-fullpage.js
PUPPETEER_PRODUCT=chrome PUPPETEER_EXECUTABLE_PATH= NODE_PATH=../ node   4.53s user 1.91s system 52% cpu 12.366 total
$ file full.png
full.png: PNG image data, 750 x 11018, 8-bit/color RGBA, non-interlaced

So it takes 12s in total including the page load.

Blocks: 1499845

:whimboo, since this bug is a regression, could you fill in (if possible) the regressed_by field?
For more information, please visit auto_nag documentation.

Flags: needinfo?(hskupin)

For Marionette specifically it regressed with bug 1559592, but the problem actually lies in the implementation of the new API. Given that this is a core issue, I'm not sure what I would have to set.

Flags: needinfo?(hskupin)

Tracking for Fission Nightly (M6)

Fission Milestone: --- → M6
Blocks: fission-perf
Flags: needinfo?(matt.woodrow)
Priority: -- → P3
Fission Milestone: M6 → Future

Matt, do you know if this is still an issue we need to track or has this been fixed already?

Flags: needinfo?(matt.woodrow)

I don't think we need to block on this, but no it hasn't been fixed.

Flags: needinfo?(matt.woodrow)

Maybe this is also related to what we can see in bug 1559592: creating snapshots of pages that include Apple emojis causes poor performance when generating page thumbnails for Ctrl-Tab on macOS.

I think the best thing to do here is to record into file handles, rather than shmem.

The current code records into normal memory (reallocating to grow as needed if the recording is big), copies into shmem, then copies the shmem back into memory in the parent process. Recording directly into a file would remove the reallocations, and both of those copies.

We already do the same thing for the print recording, using DrawEventRecorderPRFileDesc instead of DrawEventRecorderMemory.

We'd need CrossProcessPaint to allocate a temporary file for each content process it wants to record, and send that along with the request to record. We could then instantiate DrawEventRecorderPRFileDesc using the file handle, record into that, and then just notify the parent that we've completed (and it can read from the file).
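As an illustration only (plain Python, not Gecko code; the file contents and child script are made up), the flow described above looks like this: the parent allocates a temporary file, the child process writes its recording straight into it, and the parent reads the file once it is notified of completion, with no shmem round-trip or reallocation:

```python
import os
import subprocess
import sys
import tempfile

# Parent: allocate a temporary file for each process it wants to record.
fd, path = tempfile.mkstemp(suffix=".recording")
os.close(fd)

# Child: stands in for the content process. It writes the "recording"
# directly to the file it was handed, instead of building it up in
# memory and copying it through shmem.
child_code = "import sys; open(sys.argv[1], 'wb').write(b'draw-events')"
subprocess.run([sys.executable, "-c", child_code, path], check=True)
# The child exiting plays the role of the completion notification.

# Parent: read the finished recording straight from the file.
with open(path, "rb") as f:
    recording = f.read()
os.unlink(path)
print(recording)  # b'draw-events'
```

The two copies the current code performs (memory to shmem in the child, shmem to memory in the parent) both disappear; the only transfer left is the file write and read.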

(In reply to Matt Woodrow (:mattwoodrow) from comment #9)

https://treeherder.mozilla.org/jobs?repo=try&tier=1%2C2%2C3&revision=931041fd3e1a32550510c708792b3fdd55200beb

Sadly the builds for macOS opt are busted, so I don't have binaries to download. Re-triggering the build job doesn't help due to some TaskCluster failures. Matt, could you please push a new try build, or at least test yourself locally? The Marionette test that I'm using for the measurement is:

from marionette_harness import MarionetteTestCase

class Test(MarionetteTestCase):
    def test_screenshot(self):
        # Navigate to a long page and capture a full-page screenshot
        self.marionette.navigate("https://www.blueorigin.com/latest/gallery")
        screenshot = self.marionette.screenshot(full=True)

Then it can be run via `MOZ_PROFILER_STARTUP=1 MOZ_PROFILER_SHUTDOWN=marionette_profile.json mach marionette-test test.py`.

Here is a profile with the most recent Firefox Nightly build, which looks even worse compared to the build that I used when filing this bug:

https://share.firefox.dev/3rc6vAG

So I'm looking forward to seeing performance improvements here.

Flags: needinfo?(hskupin) → needinfo?(matt.woodrow)

And one more thing, could we actually get a performance test for the drawSnapshot API?

Flags: needinfo?(matt.woodrow) → needinfo?(hskupin)

The improvements here seem to be marginal: https://share.firefox.dev/3r8HXbQ.

It still takes a long time, at least given the duration of the Task PWindowGlobal::Msg_DrawSnapshot, which reports 11s compared to 7.5s before.

Flags: needinfo?(hskupin) → needinfo?(matt.woodrow)

(In reply to Henrik Skupin (:whimboo) [⌚️UTC+1] from comment #13)

The improvements here seem to be marginal: https://share.firefox.dev/3r8HXbQ.

Same profile with symbols: https://share.firefox.dev/3pIqBT0 (and here with CPU data visible).

Interesting, those profiles no longer show time spent in unnecessary copies :( Is the total time worse than before? I guess the overhead of using files is worse than just copying.

Back to the drawing board I guess!

Here are some numbers as a comparison with Chrome, which include not only the drawSnapshot time but everything, including the base64 encoding and the transmission of the screenshot over the socket to geckodriver. In this case only for the visible viewport:

Firefox (current mozilla-central):

  • Time before screenshot: 1612298483.7496388
  • Time after screenshot: 1612298485.118722
    Screenshot: 2560x1636

Firefox (try build with your changes):

  • Time before screenshot: 1612298848.8437572
  • Time after screenshot: 1612298850.0715609
    Screenshot: 2560x1636

Chrome:

  • Time before screenshot: 1612298380.4292579
  • Time after screenshot: 1612298382.065851
    Screenshot: 2400x1588
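Computed from the timestamps above, the screenshot durations come out to roughly 1.37s (current mozilla-central), 1.23s (try build), and 1.64s (Chrome):

```python
# Screenshot durations derived from the "before"/"after" timestamps above.
timings = {
    "Firefox (mozilla-central)": (1612298483.7496388, 1612298485.118722),
    "Firefox (try build)":       (1612298848.8437572, 1612298850.0715609),
    "Chrome":                    (1612298380.4292579, 1612298382.065851),
}
for name, (before, after) in timings.items():
    print(f"{name}: {after - before:.3f}s")
```

The try build comes out about 0.41s faster than Chrome, matching the "around 0.4s" figure quoted in the comparison.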

So for images of this size your new code path is actually faster, and we also beat Chrome by around 0.4s. Here is a profile for that particular scenario.

So maybe creating screenshots of the full page (which is huge!) triggers some slow code path?

I can take a look at a comparison between Firefox and Chrome for a full-page screenshot tomorrow. I would have to do it via Puppeteer and the CDP protocol.

Severity: normal → S3