Closed
Bug 1171653
Opened 9 years ago
Closed 9 years ago
4% Linux*/Win7 tp5o regression on Firefox e10s on June 01, 2015 from push baa9c64fea6f
Categories
(Testing :: Talos, defect, P5)
Tracking
(e10s+, firefox41- wontfix, firefox42- affected, firefox43- affected)
People
(Reporter: jmaher, Assigned: tnikkel)
References
(Blocks 1 open bug)
Details
(Keywords: perf, regression, Whiteboard: [talos_regression][e10s])
Talos has detected a Firefox performance regression from your commit baa9c64fea6f. We need you to address this regression.
This is a list of all known regressions and improvements related to your bug:
http://alertmanager.allizom.org:8080/alerts.html?rev=baa9c64fea6f&showAll=1
On the page above you can see Talos alert for each affected platform as well as a link to a graph showing the history of scores for this test. There is also a link to a treeherder page showing the Talos jobs in a pushlog format.
To learn more about the regressing test, please see: https://wiki.mozilla.org/Buildbot/Talos/Tests#tsvg.2C_tsvgx
Making a decision:
As the patch author we need your feedback to help us handle this regression.
*** Please let us know your plans by Monday, or the offending patch will be backed out! ***
Our wiki page outlines the common responses and expectations:
https://wiki.mozilla.org/Buildbot/Talos/RegressionBugsHandling
Reporter
Comment 1•9 years ago
our first e10s only regression! yay for talos.
the problem is that this shows up on m-c only, and in a large merge range. We can't really run talos e10s on try (though we can hack it).
not sure what route to take here.
Reporter
Comment 2•9 years ago
Updated•9 years ago
Flags: needinfo?(mconley)
Comment 3•9 years ago
This was a merge from fx-team. Do we see a similar bump there?
Flags: needinfo?(mconley) → needinfo?(jmaher)
Reporter
Comment 4•9 years ago
we only run e10s talos on m-c (not fx-team or inbound). and this is the *first* regression that is e10s only !!!
Flags: needinfo?(jmaher)
Comment 5•9 years ago
Ugh. Ok.
How feasible would it be to backfill the missing fx-team e10s talos data from around that date?
Flags: needinfo?(jmaher)
Reporter
Comment 6•9 years ago
we don't have a way to run the tests on fx-team. maybe we can add that in and just not schedule it (i.e. create the builders without running them by default).
:catlee, can we create builders for talos e10s jobs on fx-team (and ideally mozilla-inbound)? Then when we find an issue we can backfill. Do let me know how this might work.
Flags: needinfo?(jmaher) → needinfo?(catlee)
Comment 7•9 years ago
hm, I don't think we have a way right now to create the tests and not run them.
could we use seta for this?
Flags: needinfo?(catlee)
Comment 8•9 years ago
There's not much we can do here until we have a changeset. :(
Flags: needinfo?(jmaher)
Reporter
Comment 9•9 years ago
:catlee, SETA doesn't work on Talos at the moment. If it did, we could apply the same SETA logic to talos on inbound/fx-team for e10s and reduce the resources used. Maybe it makes sense to do this.
kmoir, can you weigh in on how much work it might be to apply SETA to the talos builders?
Flags: needinfo?(jmaher) → needinfo?(kmoir)
Comment 10•9 years ago
Well, we would have to change the talos scheduler to use the class that looks at the skipconfig data. And we would need talos data generated by your SETA scripts so we could consume it. So it does require significant testing, like the previous implementation. However, we do have code that works for opt and debug tests, so the work should be more on the testing side; the implementation shouldn't be that difficult in theory.
Flags: needinfo?(kmoir)
Updated•9 years ago
Flags: needinfo?(mconley)
Comment 11•9 years ago
Is tying this into our current automation practical, given that we'll probably only need to do this once?
I seem to recall MattN had some scripts that let us do some backfilling of talos data when we were working on Australis... MattN, are those scripts still around? Perhaps we could modify them for our purposes.
Flags: needinfo?(mconley) → needinfo?(MattN+bmo)
Comment 12•9 years ago
http://hg.mozilla.org/users/mozilla_noorenberghe.ca/talos-tart/file/b53e872a557f/tart-nightlies.sh
http://hg.mozilla.org/users/mozilla_noorenberghe.ca/talos-tart/file/b53e872a557f/README-TART
A patch to moznightly was also needed so it didn't delete the downloaded build. I believe moznightly is gone now though there is a github issue to bring it back.
I'm on my phone on a plane now so these instructions aren't thorough but it may get you started. I can help next week at Whistler. I will need to add your machine ID to my server IIRC if you want to post there like the one script does. If you save the result files you can also POST them with later.
Flags: needinfo?(MattN+bmo)
Updated•9 years ago
tracking-e10s: --- → m7+
Updated•9 years ago
Assignee: nobody → mconley
Comment 13•9 years ago
Alright, I don't think it makes much sense to get a bunch of releng or ateam people hacking on making this happen, since this is probably one-time-only.
I've requested a Linux talos machine. My plan is to write a script that will run the tp5o test on the machine for each push to fx-team within the regression range, and report the results.
jmaher - is it possible / advisable for me to have talos report the results from this machine to graph server for analysis? Or should I do the old trick of posting the results to a Google Spreadsheet or into a file for manual analysis?
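The "run once per push and save to a file" plan above could look roughly like the sketch below. This is only an illustration under assumptions: `run_for_each_push` and the `run_tp5o` callable are hypothetical names, not part of talos, and how tp5o actually gets launched on the loaner machine is left to whatever callable is passed in.

```python
import csv
import io
from typing import Callable, Iterable, TextIO


def run_for_each_push(
    revisions: Iterable[str],
    run_tp5o: Callable[[str], float],
    out: TextIO,
) -> int:
    """Run the test once per revision and write 'revision,tp5o' rows to out.

    run_tp5o is a hypothetical callable: it takes a revision hash and
    returns a single score for that build. Returns the number of rows written.
    """
    writer = csv.writer(out, lineterminator="\n")
    writer.writerow(["revision", "tp5o"])
    count = 0
    for rev in revisions:
        writer.writerow([rev, run_tp5o(rev)])
        count += 1
    return count


def fake_run(rev: str) -> float:
    """Stand-in runner so the sketch executes without a talos machine."""
    return float(len(rev))


# Demo: write results for the two endpoints of the regression range.
buffer = io.StringIO()
rows = run_for_each_push(["2c815cc65cc9", "8707d35414f4"], fake_run, buffer)
```

Writing plain CSV keeps the manual-analysis route open (spreadsheet import) even if reporting to graph server turns out not to be practical.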
Flags: needinfo?(jmaher)
Reporter
Comment 14•9 years ago
it would be just fine to report to graph server as long as you have the branch and machine names correct.
Flags: needinfo?(jmaher)
Comment 15•9 years ago
So the fastest path (at least while at Whistler) seemed to be bisecting with try pushes between sessions.
Just a reminder, this is the regression range that was identified: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2c815cc65cc9&tochange=8707d35414f4
8707d35414f4: Bad
70a376c0f23d: Bad
dc2e19e737c7: Bad
d5adc9e191d7: Bad
09b27d21c789: Bad
507b6aba4555: Bad
d551aa12ebb1: Good
1a955124eccc: Good
9aed76a4ee0b: Good
2c815cc65cc9: Good
My bisection leads me to believe that this was caused by bug 1148582.
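For anyone repeating this kind of bisection, here is a minimal sketch of the search itself. It assumes the list above is in push order, newest first, and that every push after the first bad one is also bad; `first_bad` is a hypothetical helper, and the verdicts are copied from the list rather than measured by the code.

```python
from typing import Callable, Sequence


def first_bad(revisions: Sequence[str], is_bad: Callable[[str], bool]) -> str:
    """Return the earliest revision for which is_bad() is true.

    Assumes revisions are ordered oldest to newest, the first one is known
    good, the last one is known bad, and once a revision is bad every later
    revision is bad too.
    """
    lo, hi = 0, len(revisions) - 1  # lo: known good, hi: known bad
    while hi - lo > 1:
        mid = (lo + hi) // 2
        if is_bad(revisions[mid]):
            hi = mid
        else:
            lo = mid
    return revisions[hi]


# Verdicts from the list above, reordered oldest first (True means bad).
VERDICTS = {
    "2c815cc65cc9": False,
    "9aed76a4ee0b": False,
    "1a955124eccc": False,
    "d551aa12ebb1": False,
    "507b6aba4555": True,
    "09b27d21c789": True,
    "d5adc9e191d7": True,
    "dc2e19e737c7": True,
    "70a376c0f23d": True,
    "8707d35414f4": True,
}
ORDERED = list(VERDICTS)

# In a real bisection, is_bad would trigger a try push and compare tp5o scores.
culprit = first_bad(ORDERED, VERDICTS.__getitem__)
```

With ten candidates the search needs only three checks, which is why bisecting with try pushes between sessions was workable.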
Blocks: 1148582
Comment 16•9 years ago
This was marked m7 so we could at least identify the regressing changeset. I think we've done that now.
Assignee
Comment 17•9 years ago
I'm guessing this has the same cause as bug 1169756. Pushing
https://hg.mozilla.org/try/rev/20272d58e2e5
to try with whatever other options are needed to reproduce this could confirm.
Comment 18•9 years ago
There's a patch to force e10s enabled for talos by pointing it at an alternative talos repo / revision. That's the best we can do until bug 1174780 is fixed.
I'll do the try push comparison:
Before tn's patch: https://treeherder.mozilla.org/#/jobs?repo=try&revision=84968faa49e8
After: https://treeherder.mozilla.org/#/jobs?repo=try&revision=183f5d88ea27
Comment 19•9 years ago
Retriggers still coming in, but the initial results of this patch are compelling:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=84968faa49e8&newProject=try&newRevision=183f5d88ea27
Comment 20•9 years ago
Retriggers in - we have a winner!
Assignee
Comment 21•9 years ago
Unfortunately landing that patch would be a correctness regression. See bug 1169756 comment 16 for example.
Comment 22•9 years ago
Hrm. Well, at least we know this is where the bottleneck is. Let me know if / when you've got another patch you'd like to test.
Updated•9 years ago
Assignee: mconley → tnikkel
Comment 23•9 years ago
[Tracking Requested - why for this release]: regression in 41
tracking-firefox41: --- → ?
Comment hidden (obsolete) |
Comment hidden (obsolete) |
FF41 does not have e10s enabled by default. Moved tracking to 42 and 43 to ensure this gets attention there.
status-firefox42: --- → affected
status-firefox43: --- → affected
tracking-firefox42: --- → +
tracking-firefox43: --- → +
It doesn't look like it'll be useful to track this any more; I'd like to know though, how we will be testing and prioritizing performance issues when e10s is turned on. From talking with joel it sounds like we have e10s tests turned on for all pushes for talos now and so it will be easier to pinpoint future regressions.
Should we close this, or is it still useful to leave it open? Brad, what do you think?
Comment 29•9 years ago
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #28)
> It doesn't look like it'll be useful to track this any more; I'd like to
> know though, how we will be testing and prioritizing performance issues when
> e10s is turned on. From talking with joel it sounds like we have e10s tests
> turned on for all pushes for talos now and so it will be easier to pinpoint
> future regressions.
>
> Should we close this, or is it still useful to leave it open? Brad, what
> do you think?
IMO, regressions should track the release they regressed in. If we are saying we don't care to fix this regression then close it as won't fix.
Flags: needinfo?(blassey.bugs)
tnikkel, it may be up to you then. I asked Brad before I noticed you were assigned to the bug. Improving performance would be great of course, and I don't want to close this if you're still intending to work on it.
Flags: needinfo?(tnikkel)
Updated•9 years ago
Priority: -- → P5
Reporter
Updated•9 years ago
Status: NEW → RESOLVED
Closed: 9 years ago
Flags: needinfo?(tnikkel)
Resolution: --- → WONTFIX