Closed
Bug 1179732
Opened 9 years ago
Closed 9 years ago
11-147% linux*/winxp tresize regression when comparing e10s to non e10s on the same build
Categories
(Testing :: Talos, defect, P2)
Testing
Talos
Tracking
(e10s+)
RESOLVED
FIXED
Tracking | Status | |
---|---|---|
e10s | + | --- |
People
(Reporter: jmaher, Assigned: gw280)
References
(Blocks 1 open bug)
Details
Attachments
(2 files)
(deleted),
text/x-review-board-request
|
Details | |
(deleted),
patch
|
jmaher
:
review+
|
Details | Diff | Splinter Review |
comparing:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=aad95360a002&exclusion_profile=false&filter-searchStr=talos
we have:
linux:
:( tresize 17.8 +/- 2% (7#) [ +11.7%] 19.9 +/- 1% (7#)
linux64:
:( tresize 15.7 +/- 1% (7#) [ +17.8%] 18.4 +/- 3% (7#)
win7:
:) tresize 15.7 +/- 1% (7#) [ -14.3%] 13.4 +/- 1% (7#)
winxp:
:( tresize 11.8 +/- 2% (7#) [+147.0%] 29.2 +/- 3% (7#)
win8:
:) tresize 12.4 +/- 1% (7#) [ -11.3%] 11.0 +/- 1% (7#)
notice win7/win8 have improvements. osx10.10 doesn't have any difference.
in general the linux non-e10s vs e10s follow a similar pattern:
http://graphs.mozilla.org/graph.html#tests=[[254,1,45],[254,1,41],[254,1,35],[254,1,43],[254,1,33],[254,1,37]]&sel=1428061558484,1435837558484&displayrange=90&datatype=geo
winxp is just wonky. fyi, winxp is wonky for tscrollx and tp5o_scroll as well.
Updated•9 years ago
|
Keywords: regressionwindow-wanted
Updated•9 years ago
|
Comment hidden (obsolete) |
Updated•9 years ago
|
Flags: needinfo?(bgirard)
Comment 2•9 years ago
|
||
Some bad links in there. Let me try that again.
Comment 3•9 years ago
|
||
non-e10s:
* http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try-Non-PGO/sha512/5a82eda28a55fc4e1f6780a90d5569be714655d3c9ee2eb6dc9dcc88404eced842464ef36ba01772264dcc809ede04f5648747cac060f276f1ee033eaa4d3c79&pathInZip=profile_tresize/tresize/cycle_0.sps#report=6a79a4841916fbac7e10910c8f39574f018fed2c
* http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try-Non-PGO/sha512/5a82eda28a55fc4e1f6780a90d5569be714655d3c9ee2eb6dc9dcc88404eced842464ef36ba01772264dcc809ede04f5648747cac060f276f1ee033eaa4d3c79&pathInZip=profile_tresize/tresize/cycle_1.sps#report=d5145bee3af8afcc9e60fc81d8c15545ef3ef0de
* http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try-Non-PGO/sha512/5a82eda28a55fc4e1f6780a90d5569be714655d3c9ee2eb6dc9dcc88404eced842464ef36ba01772264dcc809ede04f5648747cac060f276f1ee033eaa4d3c79&pathInZip=profile_tresize/tresize/cycle_2.sps#report=45d247924d8a2ef7c1b4b961fbaf0230276356b1
e10s:
* http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try-Non-PGO/sha512/950f0b481ae4dd67389b8503d4f0194d91779636e3a7aed215b45708b8a3366723b39605cfd38bc61debf86269cad8e59a3206ae41f8d861411dbd6a822175ef&pathInZip=profile_tresize/tresize/cycle_0.sps#report=56d6b1001887bdd67b9ec77743498cbbb532e923
* http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try-Non-PGO/sha512/950f0b481ae4dd67389b8503d4f0194d91779636e3a7aed215b45708b8a3366723b39605cfd38bc61debf86269cad8e59a3206ae41f8d861411dbd6a822175ef&pathInZip=profile_tresize/tresize/cycle_1.sps#report=11b2e4d053e518828788c6a96672dd6a885094fb
* http://people.mozilla.org/~bgirard/cleopatra/?zippedProfile=http://mozilla-releng-blobs.s3.amazonaws.com/blobs/Try-Non-PGO/sha512/950f0b481ae4dd67389b8503d4f0194d91779636e3a7aed215b45708b8a3366723b39605cfd38bc61debf86269cad8e59a3206ae41f8d861411dbd6a822175ef&pathInZip=profile_tresize/tresize/cycle_2.sps#report=fd0f120b5c19ae1393baaddd996aa39bbaafa3ae
Comment 4•9 years ago
|
||
The thing that's sticking out at me right now is that we seem to call nsPresContext::FireDOMPaintEvent farrrrrrr more often in the content process (which gets dispatched synchronously to the parent via the add-on shims), than we do with non-e10s.
Looking at at the first profiles for non-e10s and e10s that I listed...
Calls to nsPresContext::FireDOMPaintEvent:
non-e10s: 30 samples
e10s (content process): 1418 samples
Comment 5•9 years ago
|
||
My gut feeling would be to look into what's the deal with that PushGroup. It's very costly on the e10s profile. If we improve that then the RPC message would get through a lot faster.
Killing that RpcMessage as well would be wonderful.
FireDOMPaintEvent showing up as well seems suspicious.
We likely need an answer for all three of these if we want this regression to go away.
Flags: needinfo?(bgirard)
Comment 6•9 years ago
|
||
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → gwright
Priority: -- → P1
Comment 8•9 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/28315/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/28315/
Comment 9•9 years ago
|
||
Assignee | ||
Comment 10•9 years ago
|
||
For the record, that's a profile from my Linux box running tresize. We're trying to debug why there's such a huge startup time for the content process.
Assignee | ||
Comment 11•9 years ago
|
||
More complete profile: https://cleopatra.io/#report=7ff1953e607c2e3c629844ab0836e7aa95f09372
Assignee | ||
Comment 12•9 years ago
|
||
So, what appears to be happening here is that tresize uses an addon to run the test, and that addon is using shims which is really killing our startup performance. We seem to spend around 4-5 seconds initialising before we actually start to run the test, which on my box at least accounts for the discrepancy between e10s and non-e10s.
I haven't been able to conclusively prove this as the first 5s of any profile I take is a black hole with no samples recorded. Mike and I have looked into trying to get the profiler to sample but haven't had any luck.
Focusing my efforts now on killing the use of shims in this addon and hoping the issue goes away.
Reporter | ||
Comment 13•9 years ago
|
||
good find- fixing the addon is fair game!
Comment 14•9 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #13)
> good find- fixing the addon is fair game!
Good find, but not sure if fixing the addon is the first thing I'd do.
The addon doesn't use any special features IIRC, and it communicates cross process using messages, so not sure what the shims would be used for TBH. Maybe for the resize commands themselves? IIRC making it work with e10s was done with the advice of the e10s guys (don't recall the bug number, Joel?), so it shouldn't be doing something really bad.
Even if it does end up using shims, this might happen with other addons in the wild too, so it's probably worth understanding where does the slow-down come from, at the very least (which according to comment 12 is being done already, which is great).
Assignee | ||
Comment 15•9 years ago
|
||
This is a good first step imho. It narrows the gap but doesn't quite get us there. Investigating further.
Attachment #8712404 -
Flags: review?(jmaher)
Assignee | ||
Comment 16•9 years ago
|
||
Let's see what try shows https://treeherder.mozilla.org/#/jobs?repo=try&revision=856618af7188
Reporter | ||
Comment 17•9 years ago
|
||
try shows that e10s vs non e10s we have:
* win7 is equal
* win8 is slightly faster
* linux64 is much slower
Reporter | ||
Comment 18•9 years ago
|
||
Comment on attachment 8712404 [details] [diff] [review]
0001-Bug-1179732-Make-the-tresize-addon-e10s-compatible.patch
Review of attachment 8712404 [details] [diff] [review]:
-----------------------------------------------------------------
these changes look good. Try doesn't have any issues with it :)
Attachment #8712404 -
Flags: review?(jmaher) → review+
Comment 19•9 years ago
|
||
Assignee | ||
Updated•9 years ago
|
Keywords: leave-open
Assignee | ||
Comment 20•9 years ago
|
||
https://treeherder.mozilla.org/perf.html#/graphs?timerange=86400&series=[mozilla-inbound,ff4ce004871790022f413b1abb21632e7c788f65,1]&series=[mozilla-inbound,2c2a8f5bf693abaac9441bda475152c167bf24e0,1]
The initial results for WinXP non-PGO are looking good. e10s tresize timings are now basically the same as non-e10s. PGO is still showing the same though, but it's small.
Comment 21•9 years ago
|
||
bugherder |
Comment 22•9 years ago
|
||
bugherder |
Reporter | ||
Comment 23•9 years ago
|
||
now that we have more data, this is looking much more aligned! tresize is bi-modal on opt, but for opt e10s, we seem to match (or beat) the upper mode of the results- and in e10s land we have a single mode.
Assignee | ||
Comment 24•9 years ago
|
||
Jeff, given that Windows and OS X are now no longer showing a regression or a very slight one (~5%), and it's only Linux that's showing a significant regression (~25%), how much of a priority is this now?
Flags: needinfo?(jgriffiths)
Comment 25•9 years ago
|
||
(In reply to George Wright (:gw280) (:gwright) from comment #24)
> Jeff, given that Windows and OS X are now no longer showing a regression or
> a very slight one (~5%), and it's only Linux that's showing a significant
> regression (~25%), how much of a priority is this now?
Lower priority, for sure. What are my options? This should not block and also I would not call it a P1, but I think we need to understand why Linux is regressing.
Flags: needinfo?(jgriffiths)
Updated•9 years ago
|
Priority: -- → P2
Comment 27•9 years ago
|
||
Joel, according to your measurements at bug 1144120 comment 14, tresize on e10s linux is now on par (within 1%) compared to non-e10s.
Can we use this data to close this bug?
Flags: needinfo?(jmaher)
Comment 28•9 years ago
|
||
Though it's still worse by 7-12% on windows 7/XP.
Reporter | ||
Comment 29•9 years ago
|
||
linux is fixed, winxp is the only one that is measurably slower, win7/osx10.10 are similar and win8 has an improvement. Close this out if we are not going to work on winxp.
Flags: needinfo?(jmaher)
Assignee | ||
Comment 30•9 years ago
|
||
Based on https://treeherder.mozilla.org/perf.html#/e10s?filter=tresize it looks like we can close this bug. The release criteria state we can regress this test by up to 5%, so to me it looks like we have hit that.
Jeff, what do you think?
Flags: needinfo?(jgriffiths)
Comment 31•9 years ago
|
||
(In reply to George Wright (:gw280) (:gwright) from comment #30)
> Based on https://treeherder.mozilla.org/perf.html#/e10s?filter=tresize it
> looks like we can close this bug. The release criteria state we can regress
> this test by up to 5%, so to me it looks like we have hit that.
>
> Jeff, what do you think?
I would like to see a followup bug for the winxp regression in particular, but besides that I agree.
Flags: needinfo?(jgriffiths)
Assignee | ||
Comment 32•9 years ago
|
||
Filed bug 1252947
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Assignee | ||
Updated•9 years ago
|
Keywords: leave-open
You need to log in
before you can comment on or make changes to this bug.
Description
•