Closed
Bug 1246443
Opened 9 years ago
Closed 9 years ago
3-5% tp5o_scroll regression on inbound (v.47) on Feb 5th, from push 0dc0dc424f06
Categories
(Testing :: Talos, defect)
Testing
Talos
Tracking
(firefox47 fixed)
VERIFIED
FIXED
mozilla47
Tracking | Status | |
---|---|---|
firefox47 | --- | fixed |
People
(Reporter: jmaher, Assigned: jnicol)
References
Details
(Keywords: perf, regression, Whiteboard: [talos_regression])
Attachments
(2 files)
Talos has detected a Firefox performance regression from your commit 0dc0dc424f066d349b7624f6e88ad42c98bec5d8. We need you to address this regression.
This is a list of all known regressions and improvements related to your bug:
http://alertmanager.allizom.org:8080/alerts.html?rev=0dc0dc424f066d349b7624f6e88ad42c98bec5d8&showAll=1
On the page above you can see Talos alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.
To learn more about the regressing test, please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#tp5o_scroll
Reproducing and debugging the regression:
If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p linux64,win64,win32 -u none -t g1 # add "mozharness: --spsProfile" to generate profile data
To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code
Then run the following command from the directory where you set up Talos:
talos --develop -e <path>/firefox -a tp5o_scroll
Making a decision:
As the patch author we need your feedback to help us handle this regression.
*** Please let us know your plans by Tuesday, or the offending patch will be backed out! ***
Our wiki page outlines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Reporter | ||
Comment 1•9 years ago
|
||
here is a compare view showing all the differences:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=2ada62724f2a&newProject=mozilla-inbound&newRevision=0dc0dc424f06&framework=1&filter=scroll
there are two different bugs here, I don't have time in the next 2 days to investigate this much, so lets start with bug 1241917
:jnicol, can you take a look at this and let us know if you think this is your patch causing this talos regression?
Flags: needinfo?(jnicol)
Comment 2•9 years ago
|
||
Bug 1245550 is definitely not involved in this. It does not apply to any of the affected platforms.
Assignee | ||
Comment 3•9 years ago
|
||
It's not entirely surprising that bug 1241917 would have caused this - reducing the display port size could decrease scroll performance.
That change is vital, however. Without it we're OOM crashing on lots of sites. Can we leave it in, and I'll have a more in depth look to see if there's a way to avoid this regression with it in?
Flags: needinfo?(jnicol)
Reporter | ||
Comment 4•9 years ago
|
||
thanks for the explanation! Is there anything actionable here (in the future), or should we mark this as wontfix?
Comment 5•9 years ago
|
||
From the links it seems we've regressed both e10s and non-e10s? Bug 1241917 should definitely not have regressed non-e10s. As for e10s, the scrollframe that we scroll is most likely the root scroll frame, and so bug 1241917 should not have an effect, so that seems odd as well.
Assignee | ||
Comment 6•9 years ago
|
||
Hmm yes that's a good point. The fact my patch shouldn't affect the root scroll frame I thought was a bit odd, and I hadn't even considered the non-e10s case. But Lee is also right that the other patch shouldn't have caused this.
I'd suggest we keep the patch in, and keep this ticket open while I investigate further?
Comment 7•9 years ago
|
||
My guess is that transforming a rect from the root frame to every scroll frame in the document is the culprit. We should probably avoid that (the expensive transform) when 1) we don't have apz enabled, and 2) the scroll frame doesn't WantAsyncScroll(), and 3) maybe even if the scrollframe doesn't have a displayport.
Assignee | ||
Comment 8•9 years ago
|
||
Bug 1241917 made it so that a subframe's displayport base is restricted
to the root composition bounds (in addition to its previous
restrictions). This involved an expensive coordinate transformation
causing a scrolling performance regression.
This avoids restricting the displayport base to the root composition
bounds unless: APZ is enabled, the frame is async-scrollable, and the
frame has a display port.
Review commit: https://reviewboard.mozilla.org/r/34195/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/34195/
Attachment #8717522 -
Flags: review?(tnikkel)
Assignee | ||
Comment 9•9 years ago
|
||
Okay, the above patch implements what :tnikkel suggests. I think it might help, but I'm having a bit of trouble with talos and try and perfherder. My local results seem inconsistent. I did a try run and I now have this: https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=2dfb45d74f42&newProject=try&newRevision=02d055409ed4&framework=1&showOnlyImportant=0
But since there's only 1 run the confidence is low. Joel, what can I do to make it do more runs / increase the confidence in the results?
Flags: needinfo?(jmaher)
Comment 10•9 years ago
|
||
You just need to retrigger the talos jobs a bunch more times so that there's more data samples. I retriggered your try push ones a bunch but you might need to do the inbound version you're comparing against as well.
Comment 11•9 years ago
|
||
(In reply to Jamie Nicol [:jnicol] from comment #9)
> Okay, the above patch implements what :tnikkel suggests. I think it might
> help, but I'm having a bit of trouble with talos and try and perfherder. My
> local results seem inconsistent. I did a try run and I now have this:
> https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-
> inbound&originalRevision=2dfb45d74f42&newProject=try&newRevision=02d055409ed4
> &framework=1&showOnlyImportant=0
>
> But since there's only 1 run the confidence is low. Joel, what can I do to
> make it do more runs / increase the confidence in the results?
Click on the revisions in the header (e.g. https://treeherder.mozilla.org/#/jobs?repo=try&revision=02d055409ed4), this will bring you to the treeherder jobs view. From there, you can click on the jobs you want to retrigger and do so from the jobs panel (using the refresh button on the left).
There's a bug open to streamline this process at some point. Feel free to ask questions like this on #treeherder, if jmaher's not around chances are someone else will be able to help during normally NA/Europe hours.
Reporter | ||
Comment 12•9 years ago
|
||
one trick is to use |--rebuild 5| at the end of your try syntax. As for the base revision, that needs to be done manually (I did that, jobs should show up in a while).
In addition, I triggered some e10s jobs as the regression was there as well. This is great that you have a fix testing on try and working on verifying this! Thanks for the quick work:)
Flags: needinfo?(jmaher)
Assignee | ||
Comment 13•9 years ago
|
||
Thanks for the help everyone. It looks like that patch definitely helps with this regression. But unfortunately it caused windows7-32 glterrain to regress slightly.
Updated•9 years ago
|
Attachment #8717522 -
Flags: review?(tnikkel) → review+
Comment 14•9 years ago
|
||
Comment on attachment 8717522 [details]
MozReview Request: Bug 1246443 - Only restrict displayport base to root composition bounds if content has displayport; r?tnikkel
https://reviewboard.mozilla.org/r/34195/#review31173
Okay, I change my mind. The only condition we want to check is if we have a displayport, because if we do we definitly want to do the more accurate displayport base calculation, and if we don't then we don't care.
There is one un-obvious problem with this patch as is though. Which is that we check wasUsingDisplayPort before we call MaybeCreateDisplayPort, which (you guessed it) might create a displayport. I'll upload a patch to this bug that splits out the displayport base setting from the "maybe created display port" part so that you can fix this.
Comment 15•9 years ago
|
||
Attachment #8718241 -
Flags: review?(botond)
Comment 16•9 years ago
|
||
Comment on attachment 8718241 [details] [diff] [review]
Split "setting displayport base" part out of MaybeCreateDisplayPort
I don't think anything in MaybeCreateDisplayPort depends on the displayport base being set. I looked through the code. Correct me if I'm wrong though.
Updated•9 years ago
|
Attachment #8718241 -
Flags: review?(botond) → review+
Comment 17•9 years ago
|
||
(In reply to Jamie Nicol [:jnicol] from comment #13)
> Thanks for the help everyone. It looks like that patch definitely helps with
> this regression. But unfortunately it caused windows7-32 glterrain to
> regress slightly.
I'm guessing the glterrain regression is probably not real.
Reporter | ||
Comment 18•9 years ago
|
||
do you have a link to the glterrain regression? I suspect this might be on try and that could be hard to confirm depending on what you are comparing to.
Comment 19•9 years ago
|
||
Comment 21•9 years ago
|
||
bugherder |
Assignee | ||
Comment 22•9 years ago
|
||
Joel, I saw the glterrain regression in the link in comment 9
Assignee | ||
Comment 23•9 years ago
|
||
Comment on attachment 8717522 [details]
MozReview Request: Bug 1246443 - Only restrict displayport base to root composition bounds if content has displayport; r?tnikkel
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34195/diff/1-2/
Attachment #8717522 -
Attachment description: MozReview Request: Bug 1246443 - Only restrict displayport base to root composition bounds if necessary; r?tnikkel → MozReview Request: Bug 1246443 - Only restrict displayport base to root composition bounds if content has displayport; r?tnikkel
Reporter | ||
Comment 24•9 years ago
|
||
the data in compare view shows a <1% difference in glterrain, I wouldn't worry about it as it is a bimodal test on win7 and all it takes is 1 of the many data points to fall high or low and it throws the geometric mean off.
Thanks for getting a patch up for review!
Assignee | ||
Comment 25•9 years ago
|
||
Latest version on mozreview is rebased on the other patch and makes the change :tnikkel suggests in comment 14, so the r+ carries over. Here is a try run with that patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f51039416fa3
Requesting checkin please.
Keywords: leave-open → checkin-needed
Comment 26•9 years ago
|
||
Keywords: checkin-needed
Comment 27•9 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox47:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla47
Reporter | ||
Comment 28•9 years ago
|
||
thanks for fixing this, I have verified this on the graphs!
Status: RESOLVED → VERIFIED
Updated•8 years ago
|
Assignee: nobody → jnicol
You need to log in
before you can comment on or make changes to this bug.
Description
•