4.83 - 28.14% displaylist_mutate / glvideo Mean / perf_reftest_singleton (linux64, macosx1014, windows10, windows7) regression on push 4c3e6fb95aa6184651f215d466e702ff7d1660d3 (Wed October 21 2020)
Categories
(Core :: Audio/Video: Playback, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox82 | --- | unaffected |
firefox83 | --- | unaffected |
firefox84 | --- | disabled |
firefox85 | --- | disabled |
firefox86 | --- | fixed |
People
(Reporter: alexandrui, Unassigned)
References
(Regression)
Details
(4 keywords, Whiteboard: [perf:alert:0])
Attachments
(2 obsolete files)
Perfherder has detected a talos performance regression from push 4c3e6fb95aa6184651f215d466e702ff7d1660d3. As author of one of the patches included in that push, we need your help to address this regression.
Regressions:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
26% | glvideo | Mean tick time across 100 ticks: | macosx1014-64-shippable-qr | e10s stylo | 27.31 -> 34.49 |
26% | glvideo | Mean tick time across 100 ticks: | macosx1014-64-shippable | e10s stylo | 27.85 -> 35.17 |
26% | glvideo | Mean tick time across 100 ticks: | macosx1014-64-shippable | e10s stylo | 27.71 -> 34.79 |
5% | displaylist_mutate | linux64-shippable-qr | e10s stylo | 2,942.05 -> 3,084.10 |
Improvements:
Ratio | Suite | Test | Platform | Options | Absolute values (old vs new) |
---|---|---|---|---|---|
5% | perf_reftest_singletons | window-named-property-get.html | linux64-shippable | e10s stylo | 595.81 -> 565.39 |
Details of the alert can be found in the alert summary, including links to graphs and comparisons for each of the affected tests. Please follow our guide to handling regression bugs and let us know your plans within 3 business days, or the offending patch(es) will be backed out in accordance with our regression policy.
For more information on performance sheriffing please see our FAQ.
Reporter | ||
Updated•4 years ago
|
Comment 1•4 years ago
|
||
Set release status flags based on info from the regressing bug 1595994
Comment 2•4 years ago
|
||
Investigating.
None of those tests include a media element to be played.
We would never encounter the new code path introduced in bug 1595994.
However, when doing a first run, there a sanity test being run that checks if hardware acceleration can be used. This is done by decoding a single frame video and ensuring it displays properly. This will happen once only.
So it's likely what we are seeing is the additional time required to spawn the RDD process.
For glvideo on mac, this is expected if webrender isn't running. Once webrender is enabled by default it should improved things.
I would argue that if the time to spawned the RDD process is included and is like running a new configuration from zero, it doesn't really reflect a typical use.
We need to determine if the hardware is not doing crazy things.
Comment 3•4 years ago
|
||
I tried all day to understand what was going on with some of those tests as I had no explanation for the regression, even with the new code path disabled by pref I was still getting the same range of values.
Here is a push with all of bug 1595994 reverted, even more so both the GPU remote decoder and the RDD process have been turned off.
Here we have:
perf_reftest_singletons bloom-basic-2.html opt windows7-32-shippable e10s stylo: 64.98
perf_reftest_singletons bloom-basic-2.html opt windows10-64-shippable e10s stylo: 52.41
perf_reftest_singletons bloom-basic.html opt windows10-64-shippable e10s stylo: 50.8
Here is the windows7-32 graph: https://treeherder.mozilla.org/perf.html#/graphs?series=try,1908965,1,1&selected=1908965,1245748561 you can see no changes.
Those are the 4 biggest regression showing in the report, even with the revert we are still at the same range of values.
So I have no reasonable explanation for those regressions, it's not due to the RDD not being started, in the push above,
Comment 4•4 years ago
|
||
In addition, I can't also explain any of the performance improvement, the code modified by bug 1595994 isn't called.
With all code reverted, the biggest improvement shown in the list
perf_reftest_singletons parent-basic-singleton.html opt macosx1014-64-shippable e10s stylo: 168.91
still give the same figures.
Comment 5•4 years ago
|
||
Updated•4 years ago
|
Comment 6•4 years ago
|
||
Updated•4 years ago
|
Reporter | ||
Comment 7•4 years ago
|
||
Thanks for the input, Jean. Did some retriggers. Bug 1595994 seems more reasonable as the culprit. I will update the bug as the graph reveals another culprit.
Reporter | ||
Updated•4 years ago
|
Comment 8•4 years ago
|
||
(In reply to Alexandru Ionescu (needinfo me) [:alexandrui] from comment #7)
Thanks for the input, Jean. Did some retriggers. Bug 1595994 seems more reasonable as the culprit. I will update the bug as the graph reveals another culprit.
It may be worth also looking at the regression that shows around October 21st there https://arewefastyet.com/win10/overview?numDays=60
You can see the regression in most of those tests around 1 day before bug 1595994 landed.
Reporter | ||
Updated•4 years ago
|
Reporter | ||
Updated•4 years ago
|
Reporter | ||
Comment 9•4 years ago
|
||
(In reply to Jean-Yves Avenard [:jya] from comment #3)
I tried all day to understand what was going on with some of those tests as I had no explanation for the regression, even with the new code path disabled by pref I was still getting the same range of values.
Here is a push with all of bug 1595994 reverted, even more so both the GPU remote decoder and the RDD process have been turned off.
Here we have:
perf_reftest_singletons bloom-basic-2.html opt windows7-32-shippable e10s stylo: 64.98
perf_reftest_singletons bloom-basic-2.html opt windows10-64-shippable e10s stylo: 52.41
perf_reftest_singletons bloom-basic.html opt windows10-64-shippable e10s stylo: 50.8Here is the windows7-32 graph: https://treeherder.mozilla.org/perf.html#/graphs?series=try,1908965,1,1&selected=1908965,1245748561 you can see no changes.
Those are the 4 biggest regression showing in the report, even with the revert we are still at the same range of values.
So I have no reasonable explanation for those regressions, it's not due to the RDD not being started, in the push above,
Jean, when I filed the bug I based on glvideo regressions from this alert. If you look at this graph retriggers you'll be ablt to see pretty clear that Bug 1595994 is the culprit. I'll do some retriggers on perf_reftest_singletons to make sure that this test's regression is also caused by the same patch. I'll keep you posted!
Comment 10•4 years ago
|
||
Regression of glvideo on Mac without webrender enabled is expected.
Not any of the others.
Reporter | ||
Comment 11•4 years ago
|
||
Indeed, perf_reftest_singleons were caused by Bug 1644624. I reassigned them. Thanks for the heads-up!
Updated•4 years ago
|
Updated•4 years ago
|
Comment 12•4 years ago
|
||
Jean-Yves what's the status here, will the patch on this bug land?
Comment 13•4 years ago
|
||
Oops didn't mean to change 84 status
Comment 14•4 years ago
|
||
(In reply to Julien Cristau [:jcristau] from comment #12)
Jean-Yves what's the status here, will the patch on this bug land?
we can, but I don't believe it will affect the outcome of some regressions here.
The majority of it isn't even related to the bug first mentioned.
Comment 15•4 years ago
|
||
Comment 16•4 years ago
|
||
Backed out for Gecko Decision task build bustage.
Push with failure: https://treeherder.mozilla.org/jobs?repo=autoland&resultStatus=testfailed%2Cbusted%2Cexception&revision=684580bcc08740311afc74cbcc7cfd0909204058
Log: https://treeherder.mozilla.org/logviewer?job_id=322264435&repo=autoland&lineNumber=1581
Backout: https://hg.mozilla.org/integration/autoland/rev/0a76681f2d76616b6384b3cdcdc0785ed2f42388
Updated•4 years ago
|
Updated•4 years ago
|
Comment 17•4 years ago
|
||
(In reply to Jean-Yves Avenard [:jya] from comment #14)
(In reply to Julien Cristau [:jcristau] from comment #12)
Jean-Yves what's the status here, will the patch on this bug land?
we can, but I don't believe it will affect the outcome of some regressions here.
The majority of it isn't even related to the bug first mentioned.
AIUI there's still the displaylist_mutate and glvideo-with-webrender cases?
Comment 18•4 years ago
|
||
Alexandru could you update the bug title and comment 0 to reflect comment 11?
Comment 19•4 years ago
|
||
(In reply to Julien Cristau [:jcristau] from comment #17)
AIUI there's still the displaylist_mutate and glvideo-with-webrender cases?
displaylist_mutate is nonsensical, as we won't hit the new codepath with that test. So I have no explanation on what's going on.
For webgl, as mentioned earlier, there will be a temporary regression occurring until webrender is enabled and webgl in the GPU process becomes active, and this one may now be.
Comment 20•4 years ago
|
||
Short note that displaylist_mutate
was apparently only added for WR when this landed (https://treeherder.mozilla.org/perfherder/graphs?series=autoland,1925725,1,1&timerange=5184000&series=autoland,2812899,1,1&series=mozilla-central,2820195,1,1) - and that, so far, performs massively worse that basic, i.e. should be investigated either way :)
Reporter | ||
Updated•4 years ago
|
Comment 21•4 years ago
|
||
(In reply to Jean-Yves Avenard [:jya] from comment #3)
I tried all day to understand what was going on with some of those tests as I had no explanation for the regression, even with the new code path disabled by pref I was still getting the same range of values.
https://treeherder.mozilla.org/#/jobs?repo=try&selectedTaskRun=FHgVQ9wfSqOYbctIyNeRaA.0&revision=8138419746a1d0a56dbb9a278e275ea5eb3daff2
Re-using that same build with the revert we have:
without bug 1595994:
displaylist_mutate flattened_mutate.html opt e10s stylo webrender: 2460.49
displaylist_mutate inactive_mutate.html opt e10s stylo webrender: 3142.56
displaylist_mutate mutate.html opt e10s stylo webrender: 4018.05
displaylist_mutate opt e10s stylo webrender: 3143.71
compare to supposedly the regressed range (https://treeherder.mozilla.org/jobs?repo=mozilla-central&revision=1980f87855fc557e7ef6bb24b66c4d2df5606afa&group_state=expanded&selectedTaskRun=efYRjeh6RMqAnUOVnTDxlQ.0)
with bug 1595994:
displaylist_mutate flattened_mutate.html opt e10s stylo webrender: 2317.37
displaylist_mutate inactive_mutate.html opt e10s stylo webrender: 3055.52
displaylist_mutate mutate.html opt e10s stylo webrender: 4004.04
displaylist_mutate opt e10s stylo webrender: 3049.27
We can see here that with none of the RDD work we get even worse results
This is conclusive proof that bug 1595994 has nothing to do with any regressions on displaylist_mutate
Comment 22•4 years ago
|
||
The main problem appears to be in webgl itself ; even if the image should be readable directly, it still uses a readback to the RDD process to read it.
GLBlitHelper::BlitImageToFramebuffer isn't called.
This seems to be the same problem as bug 1670542 and bug 1640607.
So moving the decoder from the content process to the GPU or RDD process only exposed the issue.
Setting webgl.out-of-process to false; and we move from 35 to 4.5
Updated•4 years ago
|
Comment 24•4 years ago
|
||
(In reply to Jeff Gilbert [:jgilbert] from comment #23)
This is being addressed in bug 1640607, thanks!
:jgilbert could you confirm that bug 1640607 covers the remaining regressions?
Comment 26•4 years ago
|
||
webgl.out-of-process is off for beta and release.
Comment 27•4 years ago
|
||
Improvement shown in bug 1640607 comment 23.
Updated•4 years ago
|
Updated•4 years ago
|
Description
•