Closed
Bug 1178353
Opened 9 years ago
Closed 9 years ago
windows xp svgr talos e10s only regression on m-c
Categories
(Core :: Graphics: ImageLib, defect)
Core
Graphics: ImageLib
Tracking
()
People
(Reporter: jmaher, Assigned: seth)
References
(Blocks 1 open bug)
Details
(Keywords: perf, regression, Whiteboard: [talos_regression])
we have 2 e10s only regressions on winxp talos svgx.
You can see from the graph we have 2 bumps (http://graphs.mozilla.org/graph.html#tests=[[281,1,45],[281,1,37]]&sel=1432965814114,1435557814114&displayrange=30&datatype=geo):
0) ~265.00
1) ~285.00, ~7% regression, June 19th: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=2694ff2ace6a&tochange=c319f262ce3e
2) ~305.00, ~7% regression, June 22nd: http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=cddf3b36b5e2&tochange=be81b8d6fae9
in bisecting the second regression, I have a series of (non pgo) try pushes:
hg update 026e77985e59: https://treeherder.mozilla.org/#/jobs?repo=try&revision=011553447470
hg update 2505945b9d43: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9e249ca2733e
hg update 953f8ac1e47e: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7a41f2e99e7d
hg update 02f640a72dcd: https://treeherder.mozilla.org/#/jobs?repo=try&revision=1c17a0df1e32
hg update 69a9775dc5c9: https://treeherder.mozilla.org/#/jobs?repo=try&revision=44c859f0e373
hg update 513d62fe75c9: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9ba309fffb34
Ideally we will see which of these helps narrow down the regression so we can run another round and find the offending push.
There is still work to do for the first regression.
Reporter | ||
Updated•9 years ago
|
Keywords: perf,
regression
Whiteboard: [talos_regression]
Reporter | ||
Comment 1•9 years ago
|
||
for the first regression, the scope is narrowed to just a single m-c push:
http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=b6fe3099f8f6&tochange=c319f262ce3e
Reporter | ||
Comment 2•9 years ago
|
||
the range is now between:
hg update 69a9775dc5c9: https://treeherder.mozilla.org/#/jobs?repo=try&revision=44c859f0e373
hg update 513d62fe75c9: https://treeherder.mozilla.org/#/jobs?repo=try&revision=9ba309fffb34
which translates to:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=c319f262ce3e&tochange=69a9775dc5c9&filter-searchStr=talos%20svg%20win%20xp
and the pushlog:
http://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=c319f262ce3e&tochange=69a9775dc5c9
-----------
hg update 2afd5728f13a: https://treeherder.mozilla.org/#/jobs?repo=try&revision=25a17a9ad784
hg update a79cb8e688cd: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b96a3fedcab0
hg update 51168e14388c: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3676e4ea5862
these will take a few hours to update, good times! One interesting fact is that we might have 1 regression that happened to show up oddly between two pushes, we will see if that is the case. Folks feel free to hop in here and add to it.
Reporter | ||
Comment 3•9 years ago
|
||
and the winner is bug 1174923:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a79cb8e688cd
Seth, as the patch author of this, can you comment on what options we have to fix this?
Component: Talos → ImageLib
Flags: needinfo?(seth)
Product: Testing → Core
Assignee | ||
Comment 4•9 years ago
|
||
(In reply to Joel Maher (:jmaher) from comment #3)
> and the winner is bug 1174923:
> https://hg.mozilla.org/integration/mozilla-inbound/rev/a79cb8e688cd
>
> Seth, as the patch author of this, can you comment on what options we have
> to fix this?
I have no idea why firing the document load event *earlier* should cause a regression in svgr performance. Then again, I don't know what the test actually does.
Here's my guess as to what's happening: this test is waiting for the load event, and then measuring how long it takes to do some things that require images to be decoded. (For example, perhaps it draws images to a canvas, or uses them as masks.) Because previously we waited for the images to be decoded before firing the load event, we used to not count the time it takes to decode those images into the measurement. Now that we are firing the load event earlier, we are probably starting to count that image decoding time as part of the measurement, causing an apparent regression.
If my guess is correct, then the best bet for fixing this problem is to fix the test to ensure that images are decoded in some other way before the clock starts. One option is to draw them into a canvas, which will force them to be decoded.
Assignee | ||
Updated•9 years ago
|
Flags: needinfo?(seth)
Reporter | ||
Comment 5•9 years ago
|
||
interesting. as this is e10s only there might be some odd case with decoding.
looking at a detailed compare view;
https://treeherder.allizom.org/perf.html#/compare?originalProject=try&originalRevision=3676e4ea5862&newProject=try&newRevision=b96a3fedcab0
we see the subtests:
https://treeherder.allizom.org/perf.html#/comparesubtest?originalProject=try&originalRevision=3676e4ea5862&newProject=try&newRevision=b96a3fedcab0&originalSignature=bfc8e7e96da7fe3c59d39d17bf698ed4982d030b&newSignature=bfc8e7e96da7fe3c59d39d17bf698ed4982d030b
and there are two which really have issues:
1) tsvgx composite-scale-rotate-opacity.svg: http://hg.mozilla.org/build/talos/file/33449734f99f/talos/page_load_test/svgx/composite-scale-rotate-opacity.svg
2) tsvgx composite-scale-rotate.svg: http://hg.mozilla.org/build/talos/file/33449734f99f/talos/page_load_test/svgx/composite-scale-rotate.svg
the other non hixie showed some improvement. As you can see the regression seems to be mostly in composite scale rotate.
Reporter | ||
Comment 6•9 years ago
|
||
as a note there were two issues mentioned in the first comment, this bug is now dealing with the second issue and bug 1178643 is dealing with the first issue.
Assignee | ||
Comment 7•9 years ago
|
||
Hmmm. I'm still not sure how to explain this - I can't think of any explanation for why composite-scale-rotate-opacity.svg should take *five times* as long with this patch applied. Especially since these are page load tests, if the directory they're stored is to be believed - that suggests that firing page load earlier should be helpful rather than causing a regression.
However, with such a strong signal, it should be possible to reproduce this locally. I'll take a look tomorrow and see if I can learn anything.
Assignee | ||
Comment 8•9 years ago
|
||
By the way Joel, thanks for the additional details. That will make investigation much easier.
Reporter | ||
Comment 9•9 years ago
|
||
maybe mconley can help as well. I am on pto today/tomorrow- so any response will be spotty. I am happy to help investigate where possible- the trick is running on try is we need to have this in e10s mode which isn't easily available on try.
Reporter | ||
Comment 10•9 years ago
|
||
ok, I see this talos test regress when this landed the first time (as narrowed down in bug 1178643). What is odd is that we don't correct on the backout. We have enough retriggers to show that the test regresses each time on backout.
This is confusing nonetheless, but it does keep pointing at the code in question here.
Comment 12•9 years ago
|
||
[Tracking Requested - why for this release]: regression in 41
Assignee | ||
Comment 13•9 years ago
|
||
Joel, do I understand you correctly that there's no way of testing this on try?
Can we fix that?
Reporter | ||
Comment 14•9 years ago
|
||
right now the best way to test on e10s is to edit talos.json in your tree:
diff --git a/testing/talos/talos.json b/testing/talos/talos.json
--- a/testing/talos/talos.json
+++ b/testing/talos/talos.json
@@ -1,16 +1,16 @@
{
"talos.zip": {
"url": "http://talos-bundles.pvt.build.mozilla.org/zips/talos.a6052c33d420.zip",
"path": ""
},
"global": {
- "talos_repo": "https://hg.mozilla.org/build/talos",
- "talos_revision": "4a8d22dd38c4"
+ "talos_repo": "http://hg.mozilla.org/users/mconley_mozilla.com/e10s-talos",
+ "talos_revision": "default"
},
"extra_options": {
"android": [ "--apkPath=%(apk_path)s" ]
},
"suites": {
"chromez": {
"tests": ["tresize", "tcanvasmark"]
},
there are some quirks to sort out, but we need to get this as an option on talos try chooser.
Assignee | ||
Comment 16•9 years ago
|
||
Just posting to note that some other issues have kept me from focusing on this, but it's still a high priority for me and I plan to come back to it soon.
Don't see this getting uplifted to 41.
e10s is disabled in FF41 by default.
Comment 19•9 years ago
|
||
We disabled e10s 42 when it moved to beta.
Still enabled in 43. Tracking request to bring that to Liz radar.
status-firefox43:
--- → affected
tracking-firefox43:
--- → ?
Ok, tracking for 43. Is this also affecting 44?
Assignee | ||
Comment 22•9 years ago
|
||
(In reply to Brad Lassey [:blassey] (use needinfo?) from comment #21)
> Seth, is this still a high priority?
Yes, but not compared to other (1) regressions we've had recently that more directly impact users and (2) B2G 2.5 needs. Hence I haven't had a chance to address it yet.
That said, there are unrelated reasons to stop sync decoding images in nsSVGImageFrame, which should fix this regression, so I expect it to get fixed soon.
Flags: needinfo?(seth)
Comment 23•9 years ago
|
||
I assume this is still an issue for Firefox 44/45?
status-firefox45:
--- → ?
Reporter | ||
Comment 24•9 years ago
|
||
we only run the tests on trunk although the code is still a problem on all branches.
Assignee | ||
Comment 25•9 years ago
|
||
Adding a speculative dependency on bug 1217571, which I cannot *believe* managed to exist for so long with no one noticing. Given that this is e10s specific and involves painting the same image many, many times, it's quite possible that bug 1217571 may fix this.
Depends on: 1217571
Assignee | ||
Comment 26•9 years ago
|
||
Joel, let's watch what happens once bug 1217571 gets merged. Needinfo'ing you to make you aware of the situation.
Flags: needinfo?(jmaher)
Reporter | ||
Comment 27•9 years ago
|
||
Thanks Seth! I see an improvement on inbound already:
https://treeherder.mozilla.org/perf.html#/graphs?series=[mozilla-inbound,f437443b6f45bafdf125532c1c3b59db1db3fd83,1]&series=[mozilla-inbound,6418118e3b7aa64dc305955d3f987d58986fea82,1]
Flags: needinfo?(jmaher)
Updated•9 years ago
|
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WORKSFORME
No need to track this for FF44 as bug 1217571 is already tracked and marking this as fixed on 44 based on last comment.
Marking disabled for 43 since we are deferring the e10s experiment to 44.
You need to log in
before you can comment on or make changes to this bug.
Description
•