Closed
Bug 1278686
Opened 8 years ago
Closed 8 years ago
2.38% tresize (windows8-64) regression on push 8908cc624380 (Thu Jun 2 2016)
Categories
(Core :: WebRTC, defect, P1)
Tracking
RESOLVED
WONTFIX
People
(Reporter: jmaher, Unassigned)
Details
(Keywords: perf, regression, Whiteboard: [talos_regression])
Attachments
(2 files)
patch (deleted) | Details | Diff | Splinter Review
patch (deleted) | Details | Diff | Splinter Review
Talos has detected a Firefox performance regression from push 8908cc624380. Because you authored one of the patches included in that push, we need your help to address this regression.
This is a list of all known regressions and improvements related to the push:
https://treeherder.mozilla.org/perf.html#/alerts?id=1431
On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.
To learn more about the regressing test(s), please see:
https://wiki.mozilla.org/Buildbot/Talos/Tests#tresize
Reproducing and debugging the regression:
If you would like to re-run this Talos test on a potential fix, use try with the following syntax:
try: -b o -p win64 -u none -t chromez-e10s[Windows 8] --rebuild 5 # add "mozharness: --spsProfile" to generate profile data
(we suggest --rebuild 5 to be more confident in the results)
To run the test locally and do a more in-depth investigation, first set up a local Talos environment:
https://wiki.mozilla.org/Buildbot/Talos/Running#Running_locally_-_Source_Code
Then run the following command from the directory where you set up Talos:
talos --develop -e [path]/firefox -a tresize
(add --e10s to run tests in e10s mode)
Making a decision:
We need your feedback as the patch author to help us handle this regression.
*** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! ***
Our wiki page outlines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Comment 1•8 years ago
Is this with or without e10s? It looks like maybe with e10s.
It seems weird that this would cause a regression, because it should make us do strictly less work in an e10s build.
Reporter
Comment 2•8 years ago
This is a tricky one, as we had 29 changesets pushed while the build was broken; imagine 100 try pushes to figure this out, but that is the truth.
I did attempt to narrow this down and I believe it is:
https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=8908cc624380
there are a few changesets there and my suspicion is:
https://hg.mozilla.org/integration/mozilla-inbound/rev/25e2bc431fa9
As far as I can tell, we just have the single regression on Windows 8 tresize.
Either way, I was not able to back that changeset out and verify on try: the backout wasn't clean, and my attempt to resolve the merge conflicts resulted in a build error.
:mccr8, can you take a look at this and let us know if you can figure out why this is the case?
Reporter
Comment 3•8 years ago
Sorry, this is e10s only!
Comment 4•8 years ago
You need to also back out efcca7d36ad4 to back out either cb803c900f83 or 25e2bc431fa9.
Reporter
Comment 5•8 years ago
That is a lot of either/or combinations; can you push to try with the backout? I have already spent about 4 hours trying to bisect this, and I don't mind validating data against a try push!
Comment 6•8 years ago
I'm not sure what exactly you want. Are you looking for a patch against trunk mozilla-central that has 25e2bc431fa9 backed out?
Reporter
Comment 7•8 years ago
I was just trying to narrow down the 4 patches to figure out which one is the root cause. If you have an idea of what part of the code is causing the regression, ideally we don't need anything else.
To narrow it down, I would ideally do something like:
# SUSPECT_REVS holds the candidate changeset hashes
for rev in $SUSPECT_REVS; do
    hg backout -r "$rev"    # back out one candidate changeset
    hg push -f try          # push the backout to try for a Talos run
    # then revert the backout locally before testing the next changeset
done
Comment 8•8 years ago
I have no idea what could cause this. Two patches are removing unused code, and the other two affect how we do shutdown in e10s processes, which should not affect a resize test. The only part that makes sense is that this is e10s-only, and my patches only affect e10s. I'll upload some patches against trunk with various subsets backed out. These patches touch the same file, so hg backout isn't really going to work, as you saw.
Comment 9•8 years ago
Comment 10•8 years ago
Comment 11•8 years ago
Talos compare with bug 1277067 and bug 1276383 backed out:
https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=d62e594806ea
Talos compare with bug 1277067 and bug 1270308 backed out:
https://treeherder.mozilla.org/perf.html#/comparechooser?newProject=try&newRevision=7501f6a48316
Comment 12•8 years ago
It does look like the score got 3% better with 1276383 backed out, and almost no change with the other, so that's at least consistent.
Jim, it looks like you worked on this e10s tresize test at some point. Do you have any idea how a change in WebRTC shutdown might cause a regression here? I wouldn't think the content process would shut down during this test, which is the only time the code I changed should do anything.
Flags: needinfo?(jmathies)
Reporter
Comment 13•8 years ago
Thanks for the pushes and narrowing this down!
Reporter
Updated•8 years ago
Component: Untriaged → WebRTC
Product: Firefox → Core
Updated•8 years ago
Rank: 19
Priority: -- → P1
Comment 14•8 years ago
(In reply to Andrew McCreight [:mccr8] from comment #12)
> It does look like the score got 3% better with 1276383 backed out, and
> almost no change with the other, so that's at least consistent.
>
> Jim, it looks like you worked on this e10s tresize test at some point. Do
> you have any idea how a change in WebRTC shutdown might cause a regression
> here? I wouldn't think the content process would shut down during this test,
> which is the only time the code I changed should do anything.
Nope, it shouldn't. The test opens a window and resizes it in small increments, measuring the time it takes to reflow and paint the window.
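For reference, here is a rough sketch of the kind of measurement loop such a test might use (this is not the actual tresize source; the DOM API calls, step size, and iteration count are illustrative assumptions):

// Illustrative sketch only; not the real tresize implementation.
async function measureResizeSteps(win: Window, steps = 20): Promise<number> {
  const timings: number[] = [];
  for (let i = 0; i < steps; i++) {
    const start = performance.now();
    win.resizeBy(2, 2); // grow the window by a small increment
    // Wait for the next frame so the reflow/paint caused by the resize
    // is included in the measured interval.
    await new Promise<number>(resolve => requestAnimationFrame(resolve));
    timings.push(performance.now() - start);
  }
  // The harness would report an aggregate, e.g. the mean time per step.
  return timings.reduce((a, b) => a + b, 0) / timings.length;
}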
Flags: needinfo?(jmathies)
Reporter
Comment 15•8 years ago
We are seeing this on Aurora; I will need to ping :mccr8 later this week.
Flags: needinfo?(jmaher)
Comment 16•8 years ago
It seems like this must be some bogosity in the test itself (like maybe the content process incidentally gets created and destroyed), but I haven't done the investigation to really see if that's the case. I'm okay with backing out 1277067 and 1276383 on Aurora (see the link in comment 11) if we want to buy some more time.
Reporter
Comment 17•8 years ago
Would we get any benefit from backing this out versus leaving it in? Is there work in progress for Firefox 50 to maybe look into this? I don't think the goal is to fix every regression, just to fix what we can and try to document everything else.
Flags: needinfo?(jmaher)
Comment 18•8 years ago
I think we should just WONTFIX this, unless somebody wants to investigate why WebRTC shutdown interferes with resizing a window.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX