Closed Bug 1189903 Opened 9 years ago Closed 9 years ago

8% Linux 64 glterrain regression on Mozilla-Inbound-Non-PGO on July 30, 2015 from push 189161dc1616

Categories

(Testing :: Talos, defect)

defect
Not set
normal

Tracking

(firefox42 fixed)

RESOLVED FIXED
mozilla42
Tracking Status
firefox42 --- fixed

People

(Reporter: jmaher, Assigned: acomminos)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

Attachments

(1 file, 2 obsolete files)

Talos has detected a Firefox performance regression from commit 189161dc1616 in bug 1179063.

This is a list of all known regressions and improvements related to your bug:
http://alertmanager.allizom.org:8080/alerts.html?rev=189161dc1616&showAll=1

On the page above you can see Talos alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.

To learn more about the regressing test, please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#glterrain

Reproducing and debugging the regression:
If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p linux64 -u none -t g1  # add "mozharness: --spsProfile" to generate profile data

To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code

Then run the following command from the directory where you set up Talos:
talos --develop -e <path>/firefox -a glterrain

Making a decision:
As the patch author we need your feedback to help us handle this regression.
*** Please let us know your plans by Monday, or the offending patch will be backed out! ***

Our wiki page oulines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling


here is a perfherder comparison view:
https://treeherder.allizom.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=c14a0de10f23&newProject=mozilla-inbound&newRevision=189161dc1616&hideMinorChanges=1

It shows the glterrain regression as well as a tscrollx regression.  The smaller regressions might not be as much of a concern.
:shu, as the author who broke the build which prevented us from finding the root cause, I would appreciate your help in narrowing this down.
Flags: needinfo?(shu)
:bhearsum, I added you to the cc list has 2 of the changes in here have you as the author, can you confirm that your changes are not responsible for this?
Flags: needinfo?(bhearsum)
(In reply to Joel Maher (:jmaher) from comment #2)
> :bhearsum, I added you to the cc list has 2 of the changes in here have you
> as the author, can you confirm that your changes are not responsible for
> this?

Highly, highly unlikely. My change was http://hg.mozilla.org/integration/mozilla-inbound/rev/2d831e32aada (and a merge commit), which is a no-op on mozilla-central.
Flags: needinfo?(bhearsum)
[un-CC'ing myself; my commit in that range is just making a small tweak to a SVG file that's only used in devtools. So unless this talos test involves opening & rendering our devtools, it's impossible for my tweak to have affected it.]
This is likely caused by my changes, but I'm not sure how (considering that the Linux talos builds don't use the OpenGL compositor AFAICT). I'll try to figure out why.
we could push to try with it backed out and in, if you add "--rebuild 5" it will give enough data point to remove noise.
(In reply to Andrew Comminos [:acomminos] from comment #7)
> This is likely caused by my changes, but I'm not sure how (considering that
> the Linux talos builds don't use the OpenGL compositor AFAICT). I'll try to
> figure out why.

Most linux tests have regressed, and the WebGL wasn't even the biggest regression. tscrollx regressed 9%, and other tests have regressed roughly 2-3% on average (and 3 tests improved by ~2%):

https://treeherder.allizom.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=c14a0de10f23&newProject=mozilla-inbound&newRevision=189161dc1616&hideMinorChanges=1

Do despite the bug title, it's not only about WebGL.
This appears to be caused by forcing RGBA surfaces and applying a colour mask with WebGL even when we don't really need one (i.e. when not sharing surfaces). I'll write up a patch to use standard RGB surfaces without a colour mask when layers acceleration is disabled.

I still don't understand how tscrollx has regressed with the basic compositor however.
it could be we have two different regressions in this range.
Here's a patch that should go down the old path when surface sharing is disabled.

Talos push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=07f69ee7c4d0
Sorry, gfxInfo won't actually give us the info we need on Linux unless it's enabled by default; this patch should do the correct thing.

Talos push: https://treeherder.mozilla.org/#/jobs?repo=try&revision=31e9e7b95d43
Attachment #8641889 - Attachment is obsolete: true
Attachment #8641973 - Flags: review?(jgilbert)
Talos runs with this patch applied seem to go much better when compared to c14a0de10f23;

https://treeherder.allizom.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=c14a0de10f23&newProject=try&newRevision=31e9e7b95d43&hideMinorChanges=0

The dromaeo_dom and a11yr regressions are almost certainly not due to my changes.
thanks for jumping on this- I will look at the other regressions.  I suspect the dromaeo dom might be misleading due to the way it is calculated in compare view vs the way it was intended to be calculated.  I need to look at a11y and see the trend a bit more, in the past that was a noisy test.
Flags: needinfo?(shu)
Comment on attachment 8641973 [details] [diff] [review]
Don't use RGBA surfaces on GLX if surface sharing is not used.

Review of attachment 8641973 [details] [diff] [review]:
-----------------------------------------------------------------

::: dom/canvas/WebGLContext.cpp
@@ +686,5 @@
>          baseCaps.premultAlpha = true;
>  
> +    if (gl->IsANGLE() ||
> +        (gl->GetContextType() == GLContextType::GLX &&
> +         gfxPlatform::GetPlatform()->GetCompositorBackend() == LayersBackend::LAYERS_OPENGL)) {

Multi-line conditionals means the { comes down onto its own line.
Attachment #8641973 - Flags: review?(jgilbert) → review+
Attachment #8641973 - Attachment is obsolete: true
Assignee: nobody → acomminos
Status: NEW → ASSIGNED
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/d4a8f6b67ce0
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
verified that this resolved the issue!  Thanks for working on it!
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: