Closed
Bug 1039611
Opened 10 years ago
Closed 10 years ago
55 Windows PGO regressions on just about all suites on inbound (fx33) from July 14th
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: jmaher, Unassigned)
References
Details
(Keywords: perf, regression, Whiteboard: [talos_regression])
I am still narrowing down the exact change and quantifying the damage, but here is what we have:
http://54.215.155.53:8080/alerts.html?rev=0f25f3ccb7b3&showAll=1
I put this in a table:
https://docs.google.com/spreadsheets/d/1H0U0flbFFR_v1Vllrr3Fl3vyjagZrwZYzJmSgV-fVNc/edit?usp=sharing
The tests of concern are:
a11y - 160 - 230% regression
tp5o responsiveness - 145-182% regression
really all tests other than canvasmark have a 10%+ regression.
The tests which are not affected are:
tscrollx
v8_7
canvasmark
glterrain
Reporter | ||
Updated•10 years ago
|
Component: Talos → JavaScript Engine
Product: Testing → Core
Reporter | ||
Comment 1•10 years ago
|
||
:jandem, can you comment on if we should back this out or resolve this in another fix?
Flags: needinfo?(jdemooij)
Reporter | ||
Comment 2•10 years ago
|
||
related to bug 1029422, it sounds like there is a fix which just landed.
Flags: needinfo?(jdemooij)
Comment 3•10 years ago
|
||
It is surprising that this code would affect performance across the board, given that it shouldn't even run unless we're explicitly gathering memory stats.
Comment 4•10 years ago
|
||
(In reply to Andrew McCreight (Away July 17-24) [:mccr8] from comment #3)
> It is surprising that this code would affect performance across the board,
> given that it shouldn't even run unless we're explicitly gathering memory
> stats.
It's not bug 1037886 (that one landed 10 minutes ago) but bug 1037869. We've had the exact same issue a few weeks ago with another string patch, then I landed another patch and the regression was magically gone but now it returned. Something must trigger some deoptimization somewhere.
Anyway, bug 1030706 landed today and should have fixed this for good.
Comment 5•10 years ago
|
||
FWIW, I checked one of the regressions - tp5o_scroll which tests scrolling like tscrollx but on the tp5o pages, and it looks like the non-pgo results (4.2 ms on average) are roughly in the middle between the normal pgo results and the regressed pgo results (3.2 ms, 5.3ms respectively):
http://graphs.mozilla.org/graph.html#tests=[[323,63,31],[323,131,31]]&sel=1404932946132,1405537746132&displayrange=7&datatype=running
Also interesting that tscrollx almost didn't regress while tp5o_scroll which should exercise identical/very-similar code paths regressed significantly. Not sure why it happens, and it's also hard to investigate. Ideas are welcome.
The scroll code is identical between those tests, but one difference - other than scrolling a different set of pages, is that tscrollx scrolls with normal content privileges, while tp5o_scroll scrolls with chrome privileges from within the pageLoader talos addon.
Reporter | ||
Comment 6•10 years ago
|
||
After turning pgo off for js/src/, we have resolve almost all of the regressions here. We had 55 regressions originally, and are left with 11 small ones:
https://docs.google.com/spreadsheets/d/1H0U0flbFFR_v1Vllrr3Fl3vyjagZrwZYzJmSgV-fVNc/edit?usp=sharing (please see the tabs at the bottom)
What is left is a 4% average regression for:
TART (all 3)
TPaint (all 3)
Session Restore (all 3)
Session Restore No Auto (Win 8)
TS, Paint (Win 7)
One observation is that the scrolling and page loading tests all went back to normal while the ones that open a new window or tab have held onto a small regression.
The question to still answer is the remaining regressions related to turning PGO off, or related to some other change while we had the large spike in pgo only regressions for a few days. My gut tells me this is related to pgo as we didn't see the regressed values change by ~4%.
Reporter | ||
Comment 7•10 years ago
|
||
after the fix there is one regression:
tp5 main startup file io - 9% regression:
http://graphs.mozilla.org/graph.html#tests=[[242,63,25]]&sel=1405429529000,1405602329000&displayrange=7&datatype=running
I assume this is a side effect of less pgo.
Comment 8•10 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #7)
> after the fix there is one regression:
> tp5 main startup file io - 9% regression:
> http://graphs.mozilla.org/graph.html#tests=[[242,63,25]]&sel=1405429529000,
> 1405602329000&displayrange=7&datatype=running
>
> I assume this is a side effect of less pgo.
While I didn't try to dispute the numbers, I think I could interpret them differently than "due to less pgo".
Here's a graph of the same metric over 3 months: http://graphs.mozilla.org/graph.html#tests=[[242,63,25]]&sel=none&displayrange=90&datatype=running
As you could tell, it's been creeping upwards throughout all this period. Except for some occasional short periods of lower values (one of them apparently during the period of this bug's regressions due to the strings), it's just getting worse all the time.
I think the last "regression" you see could be due to this rather than due to less PGO.
Comment 9•10 years ago
|
||
Yeah I doubt a regression in "tp5 main startup file io" could have been caused by turning off PGO in js/src. On the pure JS benchmarks we saw consistent wins or little change..
Reporter | ||
Comment 10•10 years ago
|
||
this is coming on 6 months old, are there any reasons not to close this as wontfix?
Comment 11•10 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #6)
> After turning pgo off for js/src/, we have resolve almost all of the
> regressions here. We had 55 regressions originally, and are left with 11
> small ones: ...
That's likely the best we can get of it. Let's close.
Reporter | ||
Updated•10 years ago
|
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•