Closed
Bug 1361970
Opened 8 years ago
Closed 7 years ago
5.84 - 6.68% tp5o_scroll (windows8-64) regression on push 9fa3d6e54b5b59b88e6f63496d05601e454fa7a0 (Mon May 1 2017)
Categories
(Core :: Graphics: Layers, defect, P3)
Tracking
()
RESOLVED
FIXED
mozilla56
Tracking | Status | |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox54 | --- | unaffected |
firefox55 | --- | wontfix |
firefox56 | --- | fixed |
People
(Reporter: jmaher, Assigned: mattwoodrow)
References
(Regression)
Details
(Keywords: perf, regression, talos-regression, Whiteboard: [gfx-noted])
Attachments
(2 files, 1 obsolete file)
(deleted),
text/x-review-board-request
|
mstange
:
review+
|
Details |
(deleted),
patch
|
Details | Diff | Splinter Review |
Talos has detected a Firefox performance regression from push 9fa3d6e54b5b59b88e6f63496d05601e454fa7a0. As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
7% tp5o_scroll summary windows8-64 pgo e10s 1.86 -> 1.98
6% tp5o_scroll summary windows8-64 opt e10s 1.88 -> 1.99
You can find links to graphs and comparison views for each of the above tests at: https://treeherder.mozilla.org/perf.html#/alerts?id=6347
On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.
To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests
For information on reproducing and debugging the regression, either on try or locally, see: https://wiki.mozilla.org/Buildbot/Talos/Running
*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***
Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Reporter | ||
Comment 1•8 years ago
|
||
:mattwoodrow can you take a look at this regression and determine if we can fix this or need to back it out or accept it?
Component: Untriaged → Graphics: Layers
Flags: needinfo?(matt.woodrow)
Product: Firefox → Core
Updated•8 years ago
|
Priority: -- → P3
Whiteboard: [gfx-noted]
Assignee | ||
Comment 2•8 years ago
|
||
I'll investigate this. It looks like it's the APZ tests only for some reason.
Assignee | ||
Comment 3•8 years ago
|
||
I can't seem to get tp5o_scroll running locally. Unzipping the tp5n.zip file fails because the paths are too long (both when done automatically using mach talos-test and when unzipping by hand).
Has anyone seen this before?
Markus, does profiling on try still work?
Flags: needinfo?(mstange)
Comment 4•8 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #3)
> Markus, does profiling on try still work?
It mostly doesn't; non-e10s tests have a higher probability of giving you useful profiles than e10s tests. (This is bug 1329132.)
And when loaded into perf.html, the call tree has many duplicated frames that are not combined into the same node, https://github.com/devtools-html/perf.html/issues/256 .
Flags: needinfo?(mstange)
Reporter | ||
Comment 5•8 years ago
|
||
the paths might be too long if you have a long base/root directory name to start with. I unzip to c:\users\jmaher\mozilla-inbound\objdir, but it could be easy to have a much longer name.
Assignee | ||
Comment 6•8 years ago
|
||
I had the path as: c:\users\mattwoodrow\src\mozilla-central\objdir-debug without success. Switching to c:\users\mattwoodrow\m-c\o fixed things.
I guess we just don't have many characters to spare with the tp5 path names.
Assignee | ||
Comment 7•8 years ago
|
||
This is because APZ sets non-integer translations on scrolled layers.
Previously we just had the scrolling layer, but now we have the scrolling layer + the non scrolled ColorLayer behind it.
When scrolling via the main-thread, the scroll translation is always an integer, and PostProcessLayers (on the compositor side) is able to determine the scrolling layer occludes the ColorLayer, and we don't render it.
When scrolling via APZ, the scroll translations is (sometimes) non-integer, leading to the occlusion test failing, and us drawing an extra Layer across the whole screen.
The extra annoying part is that most Layer implementations snap their transform back to integers anyway (as part of their ComputeEffectiveTransforms impl), but this snapping is only reflected in mEffectiveTransform, not the other transform values.
The effective transform is the transform to the nearest intermediate surface (or the root), but PostProcessLayers only uses the local layer transforms, since it recurses down through all Layers and want to update visible/opaque regions on each.
We can't easily replicate the snapping on the local transform, since multiple accumulated non-integer translations might snap differently.
Possible solutions:
* Rewrite PostProcessLayers to work with effective transforms, not entirely sure how this would work.
* Allow non-integer translations to still occlude area within PostProcessLayers. I think we basically convert the int-region into a float-region, translate by the non-integer amounts, and then round to inside pixels.
* Make APZ snap its scroll offsets before setting them on Layers.
Kats, botond, is the 3rd option feasible? Neither of the first two are particularly appealing, so hopefully that one is easier.
Flags: needinfo?(matt.woodrow)
Flags: needinfo?(bugmail)
Flags: needinfo?(botond)
Comment 8•8 years ago
|
||
(In reply to Matt Woodrow (:mattwoodrow) from comment #7)
> * Make APZ snap its scroll offsets before setting them on Layers.
>
> Kats, botond, is the 3rd option feasible?
I think it should be doable. We just need to make sure we do the same snapping everywhere, because we use the async transform for fixed-pos layers, scrollbars, checkerboarding calculations, hit-testing, etc. A rule of thumb would be to do snapping in all the functions in AsyncPanZoomController that read the mTestAsyncScrollOffset field, and to do the snapping basically right after that field is combined with the scroll offset from mFrameMetrics.
Flags: needinfo?(bugmail)
Comment 9•8 years ago
|
||
Although I guess the snapping needs to be done in LayerPixel or ParentLayerPixel space, so there might be some conversions needed back and forth from CSS pixel space. Still, a helper function to do that would be relatively clean I think.
Comment 10•8 years ago
|
||
I have a vague recollection of us discussing APZ snapping scroll offsets to integers in the early days of APZ, and concluding that it was undesirable, but I can't remember why, nor can I find any record of that discussion.
Out of curiosity, though:
(In reply to Matt Woodrow (:mattwoodrow) from comment #7)
> * Allow non-integer translations to still occlude area within
> PostProcessLayers. I think we basically convert the int-region into a
> float-region, translate by the non-integer amounts, and then round to inside
> pixels.
why is this option unappealing? Is it just due to the performance impact of the additional region operations?
Flags: needinfo?(botond)
Assignee | ||
Comment 11•8 years ago
|
||
(In reply to Botond Ballo [:botond] from comment #10)
> I have a vague recollection of us discussing APZ snapping scroll offsets to
> integers in the early days of APZ, and concluding that it was undesirable,
> but I can't remember why, nor can I find any record of that discussion.
>
> Out of curiosity, though:
>
> (In reply to Matt Woodrow (:mattwoodrow) from comment #7)
> > * Allow non-integer translations to still occlude area within
> > PostProcessLayers. I think we basically convert the int-region into a
> > float-region, translate by the non-integer amounts, and then round to inside
> > pixels.
>
> why is this option unappealing? Is it just due to the performance impact of
> the additional region operations?
I had a think about this more:
It seems that the normal transform on Layers (both mTransform and the shadow transform) aren't actually part of the coordinate space conversions, they are just inputs to the effective transform. The effective transform can differ from the local transform in ways that aren't predictable (thanks to snapping, and the accumulation of non-integer amounts across multiple transforms).
Given that, it seems that there isn't actually a defined conversion between a Layer and its parent, unless that parent happens to have an intermediate surface (or is the root).
We currently have visible regions on all Layer types, but it seems to me that it's only a meaningful representation for the root, containers with an intermediate surface, or leaf layers. ContainerLayers without an intermediate surface can't have a defined visible region, since we have no idea what the transform to their coordinate space actually is.
The right solution might be to make the 'official' traversal order of Layers be to jump directly down to Layers that have a defined coordinate space (we do this for preserve-3d already) rather than traversing through the intermediate ContainerLayers. We'd also want to do something to invalidate the visible region on those skipped ContainerLayers so that nobody tries to use it.
That's probably a lot of work, and not something I fancy doing, especially with all the WR changes happening right now.
Rounding the non-integer translations inwards to generate an approximate opaque regions feels like its propagating this weird state more, and I think that's the main reason I don't like it. The perf impact is a bit of pain too.
Comment hidden (mozreview-request) |
Comment 14•7 years ago
|
||
mozreview-review |
Comment on attachment 8869593 [details]
Bug 1361970 - Make PostProcessLayers occlusion culling work against the surface we will draw to rather than the parent layer.
https://reviewboard.mozilla.org/r/141172/#review148922
This looks plausible. I only have one very superficial comment.
::: layout/base/UnitTransforms.h:300
(Diff revision 1)
> }
>
> +
> +template <typename TargetUnits, typename SourceUnits>
> +static gfx::IntRectTyped<TargetUnits>
> +MoveBy(const gfx::IntRectTyped<SourceUnits>& aRect,
Maybe call this TransformByTranslation? Since that's what you have here: Usually, the two units require a transform to be applied to convert from one to the other, but you know that in this case, the transform is just an integer translation.
Attachment #8869593 -
Flags: review?(mstange) → review+
Comment 15•7 years ago
|
||
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/4524bfde8583
Make PostProcessLayers occlusion culling work against the surface we will draw to rather than the parent layer. r=mstange
Comment 16•7 years ago
|
||
Backed out in https://hg.mozilla.org/integration/mozilla-inbound/rev/48e28c61a9dc for failures in transform-3d/backface-visibility-3.html and transform-3d/opacity-preserve3d-1.html, https://treeherder.mozilla.org/logviewer.html#?job_id=105048217&repo=mozilla-inbound, and maybe, not sure yet, Android-only async-scrolling/group-opacity-surface-size-1.html, https://treeherder.mozilla.org/logviewer.html#?job_id=105031192&repo=mozilla-inbound
Comment 17•7 years ago
|
||
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/14ac21b548cf
Make PostProcessLayers occlusion culling work against the surface we will draw to rather than the parent layer. r=mstange
Comment 18•7 years ago
|
||
had to back this out in https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=6a84d1b254338f398a9558b886c0ca31a94e7874&filter-classifiedState=unclassified&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception for failures like https://treeherder.mozilla.org/logviewer.html#?job_id=105399904&repo=mozilla-inbound
Flags: needinfo?(matt.woodrow)
Comment 19•7 years ago
|
||
Backout by cbook@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7b6507f85ef5
Backed out changeset 14ac21b548cf
Assignee | ||
Comment 20•7 years ago
|
||
Returning None rather than an empty rect is a bit confusing, and actually wrong if the rect in question is a clip.
Assignee: nobody → matt.woodrow
Flags: needinfo?(matt.woodrow)
Attachment #8876558 -
Flags: review?(bugmail)
Assignee | ||
Comment 21•7 years ago
|
||
Containerful scrolling does weird things when you have fixed position content within the scrolled container. The effective visible region of the container can increase when scrolled (something moves away from the fixed content), but it doesn't update the visible regions for this.
Using CalculateScissorRect included the current visible region of the ancestor with an intermediate surface in the clip, so we'd clip to the old size.
Getting rid of this and sticking to computing the clip rects ourselves lets PostProcessLayers correctly compute the increased visible area on the ancestors.
Comment 22•7 years ago
|
||
Comment on attachment 8876558 [details] [diff] [review]
Return empty rect
Review of attachment 8876558 [details] [diff] [review]:
-----------------------------------------------------------------
The UntransformBy function was added in http://searchfox.org/mozilla-central/commit/368ce48f75b8189bc0372d2749b4861f069ba9c1 - my understanding of that commit message is that the function should return Nothing() if the untransformed rect is entirely on the negative side of the w=0 plane. However the implementation seems to convert any empty rect into Nothing(), which is a different thing. Can we change the implementation to do the former? I feel like that would be a more correct fix, but it might be more involved as it would require changes to ProjectRectBounds.
Failing that, I'd still prefer to leave the UntransformBy function as returning a Maybe, and change the call site in LayerManagerComposite to convert it to an empty rect in case of Nothing() - something like:
LayerRect insideClipFloat = UntransformBy(...).valueOr(LayerRect());
Presumably the other two call sites will not need to be touched at all if we do this, since they're not dealing with clip rects and the new code is functionally equivalent to the old code.
The reason I'd prefer this is because even though the implementation of UntransformBy currently doesn't match what it's supposed to do, the correct implementation would still want to return a Maybe, and so I'd like to keep the signature the same. That will make it easier to fix the function in a follow-up bug if we decide not to do it right away.
(Also, please add a commit message to the patch.)
Attachment #8876558 -
Flags: review?(bugmail) → review-
Assignee | ||
Updated•7 years ago
|
Attachment #8876558 -
Attachment is obsolete: true
Comment 23•7 years ago
|
||
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/68932b1a5b71
Make PostProcessLayers occlusion culling work against the surface we will draw to rather than the parent layer. r=mstange
Comment 24•7 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 7 years ago
status-firefox56:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Updated•7 years ago
|
status-firefox54:
--- → unaffected
status-firefox55:
--- → affected
status-firefox-esr52:
--- → unaffected
Reporter | ||
Comment 25•7 years ago
|
||
and a fix:
== Change summary for alert #7294 (as of June 14 2017 02:17 UTC) ==
Improvements:
6% tp5o_scroll summary osx-10-10 opt e10s 3.02 -> 2.85
2% tscrollx summary linux64 opt e10s 4.38 -> 4.28
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7294
Comment 26•7 years ago
|
||
Matt, should we be backing out bug 1349418 from 55, or would that be worse than shipping with this bug here?
Flags: needinfo?(matt.woodrow)
Assignee | ||
Comment 27•7 years ago
|
||
This is a pretty small regression that likes affects talos a lot more (thanks to ASAP mode) than it would affect real webpages.
Given that, I'm not too worried, and think we can just live with it, but backing out is fine too.
Flags: needinfo?(matt.woodrow)
Updated•7 years ago
|
(In reply to Matt Woodrow (:mattwoodrow) from comment #27)
> This is a pretty small regression that likes affects talos a lot more
> (thanks to ASAP mode) than it would affect real webpages.
>
> Given that, I'm not too worried, and think we can just live with it, but
> backing out is fine too.
Bug 1373983 seems kind of bad. Can you take a look? Or reconsider backout? Thanks.
Flags: needinfo?(matt.woodrow)
Comment 29•7 years ago
|
||
Dropping stale needinfo here. The remaining regressions already have needinfos on matt, so there's nothing he needs to do on this bug specifically.
Flags: needinfo?(matt.woodrow)
Updated•3 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•