Closed Bug 1565225 Opened 5 years ago Closed 5 years ago

Composition recording with webrender does not record a blank orange frame

Categories

(Core :: Graphics: WebRender, defect, P1)

Unspecified
Windows 10
defect

Tracking

()

RESOLVED WORKSFORME
Tracking Status
firefox-esr60 --- unaffected
firefox-esr68 --- unaffected
firefox68 --- unaffected
firefox69 --- wontfix
firefox70 --- wontfix
firefox71 --- wontfix
firefox72 --- fix-optional

People

(Reporter: denispal, Assigned: barret)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(2 files, 1 obsolete file)

I noticed this when trying to use the browsertime tool to obtain visual metrics for page loads. Browsertime will first inject a <div> element that will paint a blank orange frame and once removed is an indicator of navigation start. However, with webrender enabled the orange frame is never recorded by the composition recorder. After bisection, it seems to have regressed by bug 1413546. Flipping layout.display-list.retain.chrome to false does fix the problem.

Instructions to reproduce are a bit involved but here they are:

  1. Install npm and nodejs for your platform
  2. git clone https://github.com/mozilla/browsertime.git
  3. cd browsertime
  4. npm install
  5. python -m pip install Pillow
  6. Install imagemagick for your platform from a package manager or from https://imagemagick.org/script/download.php
  7. Install ffmpeg for your platform from a package manager or from https://ffmpeg.org/download.html

Then you can execute:
node C:\src\browsertime\bin\browsertime.js --skipHar -b firefox --viewPort maximize -n 1 --visualMetrics true https://www.youtube.com --firefox.preference layers.acceleration.disabled:false --firefox.binaryPath "C:\\Program Files\\Firefox Nightly\\firefox.exe"

You'll see the following error because the orange frame was never recorded:
[2019-07-11 09:35:45] ERROR: The video recording start is zero, either no orange is there in the video or VisualMetrics failed: {.........}

adding --firefox.preference layout.display-list.retain.chrome:false fixes the problem.

You can also delete the following line from browsertime to inspect the window recorder directory: https://github.com/mozilla/browsertime/blob/master/lib/video/screenRecording/firefox/firefoxWindowRecorder.js#L120

Component: Graphics → Graphics: WebRender

Where is the code in browser time and add/removes the orange div?

(In reply to Timothy Nikkel (:tnikkel) from comment #1)

Where is the code in browser time and add/removes the orange div?

This script is injected to add orange:
https://github.com/mozilla/browsertime/blob/master/lib/video/screenRecording/setOrangeBackground.js

This one is injected to remove it:
https://github.com/mozilla/browsertime/blob/master/lib/core/seleniumRunner.js#L192-L202

Blocks: wr-70
Priority: -- → P2
Attached file errors.txt (deleted) —

Is browsertime expected to work on mac?

I get errors even when layout.display-list.retain.chrome:false and both with and without webrender (different errors with/without webrender).

Flags: needinfo?(dpalmeiro)

I can reproduce the same error. It seems like the composition recorder doesn't actually work on Mac when Webrender is enabled. I opened bug 1566915 for this issue. Unfortunately, this means the problem in this bug can probably only be reproduced on Windows for now.

Flags: needinfo?(dpalmeiro)
OS: Unspecified → Windows 10
Attached file or.html (deleted) —

After applying the patches in bug 1566915 and bug 1569258 I get usable composition recording on my mac.

I created a standalone testcase that approximates the orange dive that browsertime uses based on the links in comment 2. The orange div still showed up in the recorded pngs.

Regressor landed in 69, adjusting flags.

Denis, is this still a problem?

Flags: needinfo?(dpalmeiro)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #7)

Denis, is this still a problem?

Yes, still no orange frame on windows 10.

Flags: needinfo?(dpalmeiro)
Assignee: nobody → brennie

Hey Barret, it looks like this might be in your wheelhouse?

Flags: needinfo?(brennie)

I believe the problem is that the display lists of the frame before the orange one and the orange one will be the same and hence RenderThread::UpdateAndRender (which is what triggers recording) isn't being called.

However, I really don't know enough about WebRender internals or display lists to be able to tackle this.

Flags: needinfo?(brennie)

(In reply to Barret Rennie [:barret] (he/him) from comment #10)

I believe the problem is that the display lists of the frame before the orange one and the orange one will be the same and hence RenderThread::UpdateAndRender (which is what triggers recording) isn't being called.

However, I really don't know enough about WebRender internals or display lists to be able to tackle this.

Naive question: can we make browsertime ("any harness") paint a special sequence of frames (say orange/brown/orange/brown) and teach the processing to recognize subsequences of sufficient length? That might avoid smart frame coalescing/paint suppression without requiring changes to the rendering engine itself.

I don't think that the problem is the colour; I thnk that the problem is that about:blank and the blank orange frame use the same display list. We would need to change it so they do not generate the same display list (I think). I might be super off base here though tbh.

That theory could be tested by just adding a smaller orange div inside the big orange div, that should give us a different display list but the same visual output.

Priority: P2 → P3
No longer blocks: wr-70

I had a play with this, and I can't reproduce the failure on Windows or OSX. I get videos recorded in both cases, and they always include the orange frame.

The bug title mentions WebRender, but the STR has --firefox.preference layers.acceleration.disabled:false (which would disable all acceleration, include WebRender).

I've been replacing that with gfx.webrender.all=true to ensure we do have WebRender enabled.

Looking at the browsertime code, it appears that we set the orange frame, and then wait 400ms before removing it and initiating the load.
Does varying that value change how frequently this reproduces for you?

In general, there's no guarantee that the compositing backend (WebRender) won't coalesce frames if it receives multiple while it's busy (doing the initial startup?), and waiting a fixed number of ms is likely to have intermittent failures of this nature.

If running Gecko-specific privileged scripts is an option, then the ideal solution is to use nsIDOMWindowUtils to get the 'lastTransactionId' before adding the orange div, and then wait for a MozAfterPaint event with an id > that value.

Matt,

Thanks for your analysis.

If running Gecko-specific privileged scripts is an option, then the ideal solution is to use nsIDOMWindowUtils to get the 'lastTransactionId' before adding the orange div, and then wait for a MozAfterPaint event with an id > that value.

Yes, Gecko-privileged scripts are an option. Let's agree that the ball is now in perf team's court to determine if your suggestion can be made to work. Thanks again!

(In reply to Nick Alexander :nalexander [he/him] from comment #15)

Matt,

Thanks for your analysis.

If running Gecko-specific privileged scripts is an option, then the ideal solution is to use nsIDOMWindowUtils to get the 'lastTransactionId' before adding the orange div, and then wait for a MozAfterPaint event with an id > that value.

Yes, Gecko-privileged scripts are an option. Let's agree that the ball is now in perf team's court to determine if your suggestion can be made to work. Thanks again!

Hi all! I dug into this during the past week.

First, on Android we simply don't (can't) load a data:text/html,... URL, so the orange div insertion can never work. See Bug 1582606. This can be worked around at the Browsertime level by setting a pref and tweaking the Browsertime code a tiny bit around this conditional.

Second, what Matt suggests feels like it should be possible, but it's awkward. The way to achieve this is with an actor pair between the main and content processes. We want to execute chrome JS in the main process (because that's where Marionette executes privileged JS) but we need to wait for the MozAfterPaint event in the content process (because that's the Window that needs to paint). However, setting up actors dynamically is awkward: it feels do-able, but it involves writing JS to disk (or blobs, preferably) and delicately arranging message passing.

While investigating this approach I also saw a suggestion to wait for two animation frames: see https://searchfox.org/mozilla-central/rev/c8933daeb71df02566407eff88904ee981a7bcdd/dom/animation/test/chrome/test_running_on_compositor.html#73-78 and https://searchfox.org/mozilla-central/source/testing/web-platform/tests/web-animations/testcommon.js#156-167 and related functions. If I do this in content JS immediately after inserting the orange div, things appear to work. I don't really know if this is reliable, but it does ensure the orange frame is painted.

However, while really try to drill into this, I got side tracked into some Android debugging around making the screen recorder work in the Android x86/x86_64 emulator (Bug 1584659) so I'm not quite done.

Matt, perhaps you could comment on whether animation frames would be enough here, and whether my understanding of which process to expect the MozAfterPaint event to fire in is correct.

Flags: needinfo?(matt.woodrow)

(In reply to Nick Alexander :nalexander [he/him] from comment #16)

Matt, perhaps you could comment on whether animation frames would be enough here, and whether my understanding of which process to expect the MozAfterPaint event to fire in is correct.

I'm not Matt but I think you should be able to wait for MozAfterPaint in the parent process, it should fire after compositing the updated layer tree it receives from the child process.

(In reply to Timothy Nikkel (:tnikkel) from comment #17)

I'm not Matt but I think you should be able to wait for MozAfterPaint in the parent process, it should fire after compositing the updated layer tree it receives from the child process.

Couldn't we also get a MozAfterPaint for an unrelated browser front-end change, while the content process is still busy?

I think you do need to wait in the content process to get fully reliable results.

The two animation frames approach should also mostly work, but it's relying on an implementation detail (that we throttle the main-thread painting to stop it getting more than two frames ahead of the compositor, see [1]).

There's a non-zero chance that this will change in the future, especially with WebRender (which has been designed to support a 4 frame pipeline).

I'd highly recommend finding a way to get an automated mozilla-central test for the behaviour you're using, so that this doesn't silently stop working on you again.

[1] https://searchfox.org/mozilla-central/rev/6866d3a650c826f1cabd123663e84b95ee743701/layout/base/nsRefreshDriver.cpp#2283

Flags: needinfo?(matt.woodrow)

P3 and we are building 70RC2 today, so wontfix for 70.

Are you still working on this, Barret?

Flags: needinfo?(brennie)

This is in my queue but I've got other things on my plate atm.

Flags: needinfo?(brennie)

P3 and we have one beta left before RC week, wontfix for 71.

Blocks: 1601004
Priority: P3 → P1
Status: NEW → ASSIGNED

A summary on the decided course of action:

Instead of attempting to work around WR's multiple frame pipeline, we will instead capture an epoch timestamp of when the recording starts. This can be used in lieu of the orange frame in browsertime (with some modifications therein) to compute where in the captured video the navigation starts.

When running Browsertime using the Firefox Composition Recorder with WebRender
enabled, it was some times the case that the composition recorder would miss a
frame generated artificially by Browsertime to mark the start of page load,
which would completely break visual metrics. This is an attempt to ensure that
we can wait for the render pipeline to flush from within Browsertime (via a
Chrome-priviliged script injected by WebDriver [1]) so that this does not occur.

  1. Example usage of this api:
gBrowser
  .selectedBrowser
  .frameLoader
  .browsingContext
  .currentWindowGlobal
  .getActor("FlushWaiter")
  .waitForFlush();

This works for me now using nightly on Feb 19, 2020.

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → WORKSFORME
Attachment #9127713 - Attachment is obsolete: true
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: