Closed Bug 1278686 Opened 8 years ago Closed 8 years ago

2.38% tresize (windows8-64) regression on push 8908cc624380 (Thu Jun 2 2016)

Categories

(Core :: WebRTC, defect, P1)

49 Branch
defect

Tracking

()

RESOLVED WONTFIX

People

(Reporter: jmaher, Unassigned)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

Attachments

(2 files)

Talos has detected a Firefox performance regression from push 8908cc624380. As author of one of the patches included in that push, we need your help to address this regression. This is a list of all known regressions and improvements related to the push: https://treeherder.mozilla.org/perf.html#/alerts?id=1431 On the page above you can see an alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format. To learn more about the regressing test(s), please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#tresize Reproducing and debugging the regression: If you would like to re-run this Talos test on a potential fix, use try with the following syntax: try: -b o -p win64 -u none -t chromez-e10s[Windows 8] --rebuild 5 # add "mozharness: --spsProfile" to generate profile data (we suggest --rebuild 5 to be more confident in the results) To run the test locally and do a more in-depth investigation, first set up a local Talos environment: https://wiki.mozilla.lorg/Buildbot/Talos/Running#Running_locally_-_Source_Code Then run the following command from the directory where you set up Talos: talos --develop -e [path]/firefox -a tresize (add --e10s to run tests in e10s mode) Making a decision: As the patch author we need your feedback to help us handle this regression. *** Please let us know your plans within 3 business days, or the offending patch(es) will be backed out! *** Our wiki page outlines the common responses and expectations: https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Is this with or without e10s? It looks like maybe with e10s. It seems weird that this would cause a regression, because it should make us do strictly less work in an e10s build.
this is a tricky one as we had 29 changesets get pushed while the build was broken- imagine 100 try pushes to get this figured out, but that is the truth. I did attempt to narrow this down and I believe it is: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?changeset=8908cc624380 there are a few changesets there and my suspicion is: https://hg.mozilla.org/integration/mozilla-inbound/rev/25e2bc431fa9 as far as I can tell we just have the single regression on windows 8 tresize. either way, I was not able to back that changeset out and verify on try as it wasn't clean and my attempt to rectify the merge conflicts resulted in a build error. :mccr8, can you take a look at this and let us know if you can figure out why this is the case?
sorry, this is e10s only!
You need to also back out efcca7d36ad4 to back out either cb803c900f83 or 25e2bc431fa9.
that is a lot of either or stuff, can you push to try with the backout? I have spent about 4 hours trying to bisect this down already, I don't mind validating data against a try push!
I'm not sure what exactly you want. Are you looking for a patch against trunk mozilla-central that has 25e2bc431fa9 backed out?
I was just trying to narrow down the 4 patches to figure out which one is the root cause- if you have an idea of what part of the code is causing the regression, ideally we don't need anything. to narrow it down, I would ideally: for patch in patches: hg backout patch.rev hg push -f try // revert changes
I have no idea what could cause this. Two patches are removing unused code, and the other two affect how we do shutdown in e10s processes, which should not affect a resize test. The only part that makes sense is that this is e10s-only, and my patches only affect e10s. I'll upload some patches against trunk with various subsets backed out. These patches touch the same file so hg backout isn't really going to work, as you saw.
It does looks like the score got 3% better with 1276383 backed out, and almost none with the other, so that's at least consistent. Jim, it looks like you worked on this e10s tresize test at some point. Do you have any idea how a change in WebRTC shutdown might cause a regression here? I wouldn't think the content process would shut down during this test, which is the only time code I changed should do anything.
Flags: needinfo?(jmathies)
Thanks for the pushes and narrowing this down!
Component: Untriaged → WebRTC
Product: Firefox → Core
Rank: 19
Priority: -- → P1
(In reply to Andrew McCreight [:mccr8] from comment #12) > It does looks like the score got 3% better with 1276383 backed out, and > almost none with the other, so that's at least consistent. > > Jim, it looks like you worked on this e10s tresize test at some point. Do > you have any idea how a change in WebRTC shutdown might cause a regression > here? I wouldn't think the content process would shut down during this test, > which is the only time code I changed should do anything. Nope it shouldn't. The test opens a window and resizes it in small increments measuring the time it takes to reflow and paint the window.
Flags: needinfo?(jmathies)
we are seeing this on Aurora, I will need to ping :mccr8 later this week
Flags: needinfo?(jmaher)
It seems like this must be some bogosity in the test itself (like maybe the content process incidentally gets created and destroyed) but I haven't done the investigation to really see if that's the case. I'm okay with backing out 1277067 and 1276383 on Aurora (see the link in comment 11) if we want to buy some more time.
would we get any benefit from backing out vs leaving it in? Is there work in progress for Firefox 50 to maybe look into this? I don't think the goal is to fix every regression, just fix what we can and try to document everything else.
Flags: needinfo?(jmaher)
I think we should just WONTFIX this, unless somebody wants to investigate why WebRTC shutdown interferes with resizing a window.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: