Closed Bug 1032831 Opened 10 years ago Closed 10 years ago

3-30% regressions due to disabling OMTC on Aurora

Categories

(Core :: Graphics, defect)

All
Windows 8
defect
Not set
normal

Tracking

()

RESOLVED WONTFIX
Tracking Status
firefox32 + wontfix

People

(Reporter: jmaher, Unassigned)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

on the flip side of the regressions, we had improvements, and now that we are turning them off, we see a series of performance tests regress: winxp: * tscroll-asap 31.4% * svg-asap 25.8% win7: * tscroll-asap 28.7% win8: * ts, paint 7.04% * paint 4.37% * tscroll-asap 27% * session restore 3.7% * session restore no auto restore 7.48% I assume we are fine taking this hit as it was am improvement on aurora/trunk so the regression won't be seen when we uplift to beta.
This is probably a good hint that when we report regressions as a result of some changeset, we should also log the improvements, such that it doesn't look as if we only regress both when we land it and then regress again when we back it out later. And it'll also allow us to compare the improvements/regressions once we back it out to make sure we didn't end up at a worse place than before the original patch landed. Kyle, would this be a good enough example to demonstrate why it'd be useful for dzAlerts to also log improvements?
Flags: needinfo?(klahnakoski)
IMHO, if those are not regressions when compared with 31, we should be OK. We know that OMTC improves some benchmarks and regresses others (I'm sure we did see some good perf wins as well with this disabling, right?) but as long as non-OMTC 32 does not regress over non-OMTC 31, we should be good. OMTC on Windows is just too unstable to ship, the amount of crashes would be too much for even make me sign this off for beta at this point.
(In reply to Robert Kaiser (:kairo@mozilla.com) from comment #3) > IMHO, if those are not regressions when compared with 31, we should be OK. Agreed. However, a more accurate approach would be to check the kinds of improvements we got when OMTC was turned on, and verify that the regressions of this bug roughly match their magnitudes (off the top of my head - looks like they do). Otherwise, there could have been other, possibly many, performance changes between FX31 and now. Currently we "officially" only track regressions, so such comparison is possible but we don't have the data "ready" for it. Comment 1 was a bit off topic for this bug but suggests that we should also at least log the improvements as well somehow, even if mostly they don't require further actions.
The side effect here is that the 33 uplift to aurora will result in a large series of improvement/regressions again. It isn't a big deal, just something we need to keep in mind.
Matching improvements to regressions and performing a net benefit analysis is far outside the scope of an alerting app. I suspect this type of analysis requires a human to review all test suite results and perform a context sensitive analysis. I imagine alerting to improvements will help little here: The improvement may be below a given threshold and be invisible, or noise will mess with the estimate of the improvement amplitude. Both force the engineer to review all test suites and evaluate the discrepancies: which is necessarily harder than doing the net benefit analysis on it's own. Now, *if* we believe the regressions and improvements happen on the same changeset, I can understand the benefit of reporting those improvements: All regression alerts come with a list of corresponding improvements that happened on other tests on the same revision. Maybe that will catch this case, maybe not.
Flags: needinfo?(klahnakoski)
(In reply to Kyle Lahnakoski [:ekyle] from comment #6) > Matching improvements to regressions and performing a net benefit analysis > is far outside the scope of an alerting app. How about a using the previous version on the same channel as a benchmark and using that for comparison as well? This should at least let us know if we're shipping a regression from the previous release. Do we know what, if any, regression 32 shows in this respect in comparison with 31?
Flags: needinfo?(jmaher)
(In reply to Kyle Lahnakoski [:ekyle] from comment #6) > Matching improvements to regressions and performing a net benefit analysis > is far outside the scope of an alerting app. It may be outside the scope of an alerting app, but in order to do that with another tool, we'll first need to have a log of improvements as well, and then maybe some other tool could match the improvements to regressions to changesets. And logging the improvements falls right within the core scope of an alerting app. All the algorithms are pretty much identical for detecting regressions or improvements.
(In reply to Avi Halachmi (:avih) from comment #8) > (In reply to Kyle Lahnakoski [:ekyle] from comment #6) > > Matching improvements to regressions and performing a net benefit analysis > > is far outside the scope of an alerting app. > > ... logging the improvements falls right within the core scope of an > alerting app. All the algorithms are pretty much identical for detecting > regressions or improvements. Agreed, changes will be made to the alerting app so the small regressions and all improvements are logged to a database. Then we can work on (separate) clients that use that info to deal with the complexities of net benefit analysis.
for each uplift (I am behind from the one 2 weeks ago) I compile the list of regressions and improvements and summarize this so we can see what changes. The uplifts are great ways to track the delta because we have 6 weeks of difference between the two versions (except for changes that landed on both). The conversation of Kyle and Avi is more of a meta conversation about the newer higher resolution tracking system we are working on and fine tuning.
Flags: needinfo?(jmaher)
I don't think there is actually anything that we need to do in this bug for 32. The 'regressions' are from a state of 32 that we will not ship. As Kairo said in comment 3, if these are not regressions when compared with 31, we're ok. We will see the perf improvements again in 33. I'm marking this as won't fix for 32 and resolving as won't fix.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.