Closed Bug 1039611 Opened 10 years ago Closed 10 years ago

55 Windows PGO regressions on just about all suites on inbound (fx33) from July 14th

Categories

(Core :: JavaScript Engine, defect)

x86
Windows 7
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jmaher, Unassigned)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

I am still narrowing down the exact change and quantifying the damage, but here is what we have: http://54.215.155.53:8080/alerts.html?rev=0f25f3ccb7b3&showAll=1 I put this in a table: https://docs.google.com/spreadsheets/d/1H0U0flbFFR_v1Vllrr3Fl3vyjagZrwZYzJmSgV-fVNc/edit?usp=sharing The tests of concern are: a11y - 160 - 230% regression tp5o responsiveness - 145-182% regression really all tests other than canvasmark have a 10%+ regression. The tests which are not affected are: tscrollx v8_7 canvasmark glterrain
Blocks: 1037886
Component: Talos → JavaScript Engine
Product: Testing → Core
:jandem, can you comment on if we should back this out or resolve this in another fix?
Flags: needinfo?(jdemooij)
related to bug 1029422, it sounds like there is a fix which just landed.
Flags: needinfo?(jdemooij)
It is surprising that this code would affect performance across the board, given that it shouldn't even run unless we're explicitly gathering memory stats.
(In reply to Andrew McCreight (Away July 17-24) [:mccr8] from comment #3) > It is surprising that this code would affect performance across the board, > given that it shouldn't even run unless we're explicitly gathering memory > stats. It's not bug 1037886 (that one landed 10 minutes ago) but bug 1037869. We've had the exact same issue a few weeks ago with another string patch, then I landed another patch and the regression was magically gone but now it returned. Something must trigger some deoptimization somewhere. Anyway, bug 1030706 landed today and should have fixed this for good.
Blocks: 1037869
No longer blocks: 1037886
Depends on: 1030706
FWIW, I checked one of the regressions - tp5o_scroll which tests scrolling like tscrollx but on the tp5o pages, and it looks like the non-pgo results (4.2 ms on average) are roughly in the middle between the normal pgo results and the regressed pgo results (3.2 ms, 5.3ms respectively): http://graphs.mozilla.org/graph.html#tests=[[323,63,31],[323,131,31]]&sel=1404932946132,1405537746132&displayrange=7&datatype=running Also interesting that tscrollx almost didn't regress while tp5o_scroll which should exercise identical/very-similar code paths regressed significantly. Not sure why it happens, and it's also hard to investigate. Ideas are welcome. The scroll code is identical between those tests, but one difference - other than scrolling a different set of pages, is that tscrollx scrolls with normal content privileges, while tp5o_scroll scrolls with chrome privileges from within the pageLoader talos addon.
After turning pgo off for js/src/, we have resolve almost all of the regressions here. We had 55 regressions originally, and are left with 11 small ones: https://docs.google.com/spreadsheets/d/1H0U0flbFFR_v1Vllrr3Fl3vyjagZrwZYzJmSgV-fVNc/edit?usp=sharing (please see the tabs at the bottom) What is left is a 4% average regression for: TART (all 3) TPaint (all 3) Session Restore (all 3) Session Restore No Auto (Win 8) TS, Paint (Win 7) One observation is that the scrolling and page loading tests all went back to normal while the ones that open a new window or tab have held onto a small regression. The question to still answer is the remaining regressions related to turning PGO off, or related to some other change while we had the large spike in pgo only regressions for a few days. My gut tells me this is related to pgo as we didn't see the regressed values change by ~4%.
after the fix there is one regression: tp5 main startup file io - 9% regression: http://graphs.mozilla.org/graph.html#tests=[[242,63,25]]&sel=1405429529000,1405602329000&displayrange=7&datatype=running I assume this is a side effect of less pgo.
(In reply to Joel Maher (:jmaher) from comment #7) > after the fix there is one regression: > tp5 main startup file io - 9% regression: > http://graphs.mozilla.org/graph.html#tests=[[242,63,25]]&sel=1405429529000, > 1405602329000&displayrange=7&datatype=running > > I assume this is a side effect of less pgo. While I didn't try to dispute the numbers, I think I could interpret them differently than "due to less pgo". Here's a graph of the same metric over 3 months: http://graphs.mozilla.org/graph.html#tests=[[242,63,25]]&sel=none&displayrange=90&datatype=running As you could tell, it's been creeping upwards throughout all this period. Except for some occasional short periods of lower values (one of them apparently during the period of this bug's regressions due to the strings), it's just getting worse all the time. I think the last "regression" you see could be due to this rather than due to less PGO.
Yeah I doubt a regression in "tp5 main startup file io" could have been caused by turning off PGO in js/src. On the pure JS benchmarks we saw consistent wins or little change..
this is coming on 6 months old, are there any reasons not to close this as wontfix?
(In reply to Joel Maher (:jmaher) from comment #6) > After turning pgo off for js/src/, we have resolve almost all of the > regressions here. We had 55 regressions originally, and are left with 11 > small ones: ... That's likely the best we can get of it. Let's close.
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.