Closed Bug 1186956 Opened 9 years ago Closed 9 years ago

2-20% svgr_opacity/tp5_responsiveness regressions on win* platforms seen on inbound July 21

Categories

(Testing :: Talos, defect)

defect
Not set
normal

Tracking

(firefox42 fixed)

RESOLVED FIXED
mozilla42
Tracking Status
firefox42 --- fixed

People

(Reporter: jmaher, Assigned: mconley)

References

Details

(Keywords: perf, regression, Whiteboard: [talos_regression])

Attachments

(2 files, 1 obsolete file)

We have a build bustage and a series of patches that didn't generate data!  As patch authors, we all need to pitch in and check whether our own patch is the root cause.

Here is what we have (http://alertmanager.allizom.org:8080/alerts.html?rev=a86a6429b078&showAll=1&testIndex=0&platIndex=0):
damp - improvements 3-6%
tp5o private bytes/mainrss - 2-6% improvements

tp5o responsiveness - regressions 20% on all win* platforms
svg opacity - 4.3 -> 6.9% regressions
tp5o %cpu - win7 - 2.7% regression

here is a compareview of the tests:
https://treeherder.allizom.org/perf.html#/compare?originalProject=mozilla-inbound&originalRevision=bcf65c04b69c&newProject=mozilla-inbound&newRevision=a86a6429b078

you can see svg near the bottom has a bunch of regressions, and damp has some improvements!

compare view doesn't have counters (including responsiveness), so here is a graph view showing the responsiveness issue:
http://graphs.mozilla.org/graph.html#tests=[[267,131,25],[267,131,37]]&sel=1435076913436,1437668913436&displayrange=30&datatype=geo
here is the range:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=bcf65c04b69c&tochange=11ee37ed0cd9&filter-searchStr=Windows%208%2064-bit%20mozilla-inbound%20talos%20tp5o

If you are needinfo'd on this bug, I need your help determining whether your patch caused one or more of the regressions listed above.  If you cannot respond in the next 24 hours, I will back out and we can reland.  :mccr8, I would appreciate your help, as the build was broken, which is the only reason I have a larger regression window here!
Flags: needinfo?(tbsaunde+mozbugs)
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(mfowler)
Flags: needinfo?(mconley)
Flags: needinfo?(continuation)
Isn't the patch that changes Talos most likely to be the cause for changing the Talos score?
I mean bug 1186057.
Flags: needinfo?(continuation)
The change in bug 1186031 affects only b2g gonk. It does not seem related to this bug.
Flags: needinfo?(sotaro.ikeda.g)
Bug 1185726 should only have an effect when a11y is enabled, which I'm pretty sure is not the case for svg tests.  The same should probably be true for Lorien's patch, though I haven't looked.

Without looking at what happened in the talos repo, that seems like a good bet.  I'd also think we could narrow down the set of possibly bad commits on try if someone spent the time.
Flags: needinfo?(tbsaunde+mozbugs)
Thanks for commenting so far.  I am familiar with the talos changes; they should affect other tests as well if they were the root cause.  Let's see what mfowler has to say.  Ideally we can identify a revision or two, test on try, and confirm.  My track record for pushing to try and bisecting is horrible when I cannot just 'hg update <rev>'.  I would rather reduce confusion and start by asking folks.
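The try-based narrowing mentioned above is essentially a bisection over the pushes in the regression range. A minimal sketch in Python, assuming a hypothetical `is_regressed(rev)` check (for example, "push this revision to try and compare tp5o responsiveness against the baseline"); the function name and the placeholder revision names are illustrative and not part of Talos or Treeherder. It also assumes every revision in the list can be built and tested, which was not true for this range because of the build bustage.

```python
from typing import Callable, Sequence


def first_bad_revision(
    revisions: Sequence[str],
    is_regressed: Callable[[str], bool],
) -> str:
    """Return the earliest revision for which is_regressed() is true.

    Assumptions (hypothetical, for illustration only):
      * revisions are ordered oldest to newest,
      * the newest revision is known to be regressed,
      * once a revision is regressed, every later one is too.
    """
    if not revisions:
        raise ValueError("need at least one revision to bisect")

    lo, hi = 0, len(revisions) - 1
    while lo < hi:
        mid = (lo + hi) // 2
        if is_regressed(revisions[mid]):
            # The regression is at mid or earlier.
            hi = mid
        else:
            # The regression was introduced after mid.
            lo = mid + 1
    return revisions[lo]


# Usage with placeholder revision names (not real changesets).
pushes = ["rev0", "rev1", "rev2", "rev3", "rev4"]
known_bad = {"rev3", "rev4"}
culprit = first_bad_revision(pushes, lambda rev: rev in known_bad)
```

Each call to the check costs one try push and a Talos run, so halving the range keeps the number of pushes logarithmic in the number of candidate changesets.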
Joel, I'm pretty confident that bug 1185592 (which is the only patch of mine in the regression range you posted) is unrelated to this issue. It did two things - delete some dead logging macros that were no longer used, and cache the result of parsing an image's MIME type instead of parsing it every time we decode it. I can't see either of those changes being responsible for this kind of regression.
Flags: needinfo?(mfowler)
Talked to jmaher about this over IRC. The talos version bump is, imo, the most suspicious change in the range. We're going to Try a backout and see where the numbers go. Assuming that the talos bump is the regressor, we'll then get some profiles and see what's gone wrong.
Flags: needinfo?(mconley)
and this looks to be the talos changes that mconley worked on. 

baseline push (tip of inbound) with profiling:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ba2e695c2287

baseline push and backed out talos with profile:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ca9a78e09250

Look at tp5 responsiveness and svg opacity on Windows XP or Windows 7 for data.
Flags: needinfo?(mconley)
Thanks!
Strangely, it looks like the backout patch didn't get any builds. Re-pushed:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7f7b81f6273e
These two try pushes seem to indicate that it was the patch for doing GC / CC in the content process between talos pageloads that's affecting us here:

Adds Task.async support:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2eaa40679192

Adds GC / CC between pageloads:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=dccc14c00cd2

Looking into what can be done here...
I don't fully understand what's slowing us down here. According to the profiles from the talos machines, we're spending more time serializing nsISHEntry's after we garbage collect in the frame script... I don't know why that is.

I will note, however, that there's really no point in running GC / CC in single-process mode... so the shortest path to eliminating this regression is probably to skip the unnecessary GC / CC in the frame script if we're not running in multi-process mode.
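The idea is simply to gate the collection on whether the browser is running multi-process. A rough sketch of that guard in Python, for illustration only: the actual change lives in the Talos frame script (JavaScript) and is not reproduced here, and the names `collect_between_pageloads`, `force_gc`, and `force_cc` are hypothetical.

```python
from typing import Callable


def collect_between_pageloads(
    is_multiprocess: bool,
    force_gc: Callable[[], None],
    force_cc: Callable[[], None],
) -> bool:
    """Run GC, then CC, only when running multi-process (e10s).

    Returns True if the collections ran and False if they were skipped.
    All names here are hypothetical stand-ins for the frame-script logic.
    """
    if not is_multiprocess:
        # Single-process mode: skip the frame-script collection entirely.
        return False
    force_gc()
    force_cc()
    return True


# Usage: record which collections run in each mode.
calls = []
ran_single = collect_between_pageloads(
    False, lambda: calls.append("gc"), lambda: calls.append("cc")
)
calls_after_single = list(calls)
ran_e10s = collect_between_pageloads(
    True, lambda: calls.append("gc"), lambda: calls.append("cc")
)
```

With this shape, the e10s path should keep its existing behaviour and the single-process path should return to its behaviour from before the GC / CC patch landed.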

All are based on top of 8bae34af92ea:

Before (hopefully slow): https://treeherder.mozilla.org/#/jobs?repo=try&revision=467107cda21f
After (hopefully fast): https://treeherder.mozilla.org/#/jobs?repo=try&revision=79af0f0f6897
After with e10s (hopefully unchanged): https://treeherder.mozilla.org/#/jobs?repo=try&revision=5662e518afa9
Bug 1186956 - Only GC / CC in a frame script if we're running with e10s. r?jmaher
Attachment #8639997 - Flags: review?(jmaher)
Comment on attachment 8639997 [details]
MozReview Request: Bug 1186956 - Only GC / CC in a frame script if we're running with e10s. r?jmaher

https://reviewboard.mozilla.org/r/14267/#review12905

Ship It!
Attachment #8639997 - Flags: review?(jmaher) → review+
url:        https://hg.mozilla.org/build/talos/rev/85587fddabcd5944604aca0e8afea3f4eac0eb3b
changeset:  85587fddabcd5944604aca0e8afea3f4eac0eb3b
user:       Mike Conley <mconley@mozilla.com>
date:       Mon Jul 27 23:22:44 2015 -0400
description:
Bug 1186956 - Only GC / CC in a frame script if we're running with e10s. r=jmaher
Bug 1186956 - Bump to latest talos version to address some tp5r and tsvg_opacity regressions. r?jmaher
Attachment #8640150 - Flags: review?(jmaher)
Flags: needinfo?(mconley)
Comment on attachment 8640150 [details]
MozReview Request: Bug 1186956 - Bump to latest talos version to address some tp5r and tsvg_opacity regressions. r?jmaher

https://reviewboard.mozilla.org/r/14291/#review12923

Ship It!
Attachment #8640150 - Flags: review?(jmaher) → review+
url:        https://hg.mozilla.org/integration/mozilla-inbound/rev/82fc23b5d5b808f42ebc0dec348c9ac704c6a7f2
changeset:  82fc23b5d5b808f42ebc0dec348c9ac704c6a7f2
user:       Mike Conley <mconley@mozilla.com>
date:       Tue Jul 28 17:38:18 2015 -0400
description:
Bug 1186956 - Bump to latest talos version to address some tp5r and tsvg_opacity regressions. r=jmaher
https://hg.mozilla.org/mozilla-central/rev/82fc23b5d5b8
Assignee: nobody → mconley
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla42
verified on graph server!  thanks for fixing this.
Attachment #8639997 - Attachment is obsolete: true