Worker events can be starved by refresh driver
Categories
(Core :: Layout, defect)
Tracking
()
People
(Reporter: jrmuizel, Assigned: smaug)
References
(Blocks 1 open bug, Regressed 1 open bug, Regression)
Details
(Keywords: perf-alert, regression)
Attachments
(6 files)
In bug 1749365 we're seeing a laggy volume control on Twitch caused by what I believe is high latency in receiving events from a worker.
I've created a test case that shows this starvation quite dramatically compared to Safari and Chrome.
Updated•3 years ago
|
Comment 1•3 years ago
|
||
Set release status flags based on info from the regressing bug 1300658
Updated•3 years ago
|
asuth, does this look like a worker-side issue to you?
Comment 3•3 years ago
|
||
The problem as I understand it here is that postMessage() from workers to their parent are explicitly dispatched with a normal priority via a ThrottledEventQueue and this is intentionally structured this way because of bug 1300659 where the concern was a worker being able to run faster than the main thread and spam it so much it couldn't make forward progress. Whereas vsync is dispatched as its own vsync priority class which is higher than Normal and also the MediumHigh workers use for other types of events.
The general options would seem to be:
- The refresh driver needs to try and do something to scale to leave more slack in the system. This seems tricky because I think the granularity is largely a question of dropping frames or not.
- Worker-to-parent event prioritization could be more dynamic using a token bucket or similar so that if a worker is only sending a small number of message per second, those could be re-targeted to be medium-high. But in the event of spamming of postMessage like in bug 1300659, a steady-state would be reached where the worker drops down to normal priority again.
- Do note that there are potential event ordering issues, but that the use of ThrottledEventQueue helps mitigate this because it internally buffers dispatched events and so can be re-targeted to different underlying queues without too much trouble.
This ends up being more of a :smaug question, though I think?
Updated•3 years ago
|
Updated•3 years ago
|
Assignee | ||
Comment 4•3 years ago
|
||
Assignee | ||
Updated•3 years ago
|
Reporter | ||
Comment 5•3 years ago
|
||
The tracing version of this test case makes it easier to determine the behaviour of other browsers.
A basic summary of the results are:
Firefox tries keep running at 60Hz
Chrome drops the frame rate down to 8.3Hz (120ms)
Safari doesn't throttle events from the worker at all
Reporter | ||
Comment 6•3 years ago
|
||
Reporter | ||
Comment 7•3 years ago
|
||
Reporter | ||
Comment 8•3 years ago
|
||
Updated•3 years ago
|
Comment 10•3 years ago
|
||
Backed out changeset d3f40b4651f5 (Bug 1755006) for causing mochitest failures on test_pointerlock-api.html.
Backout link
Push with failures - 4
Failure Log
dt3 Failure Log
Comment 11•3 years ago
|
||
Assignee | ||
Updated•3 years ago
|
Comment 12•3 years ago
|
||
bugherder |
Comment 13•3 years ago
|
||
Backed out for causing mochitest failures on browser_ext_pageAction_context.js,browser_panel_list_accessibility.js
- Backout link
- Push with failures
Link to failure logs:
https://treeherder.mozilla.org/logviewer?job_id=370940154&repo=autoland&lineNumber=5367
https://treeherder.mozilla.org/logviewer?job_id=370940061&repo=autoland&lineNumber=3753
https://treeherder.mozilla.org/logviewer?job_id=370939976&repo=autoland&lineNumber=3763
https://treeherder.mozilla.org/logviewer?job_id=370939245&repo=autoland&lineNumber=13894
https://treeherder.mozilla.org/logviewer?job_id=370940369&repo=autoland&lineNumber=4949
https://treeherder.mozilla.org/logviewer?job_id=370940079&repo=autoland&lineNumber=13806
https://treeherder.mozilla.org/logviewer?job_id=370939447&repo=autoland&lineNumber=3086 - Failure line: TEST-UNEXPECTED-FAIL | toolkit/mozapps/extensions/test/browser/browser_panel_list_accessibility.js | The first item is focused - Got [object HTMLButtonElement], expected [object HTMLElement]
Assignee | ||
Comment 14•3 years ago
|
||
https://treeherder.mozilla.org/logviewer?job_id=370940154&repo=autoland&lineNumber=5367
is an example of browser_ext_pageAction_context.js failure.
Looks like it is https://bugzilla.mozilla.org/show_bug.cgi?id=1503404
Comment 15•3 years ago
|
||
Hello, this(Bug 1655139) also turned into high frequency starting from this bug which got backed out. Can you please look at this too?
Comment 16•3 years ago
|
||
Backout merged to central: https://hg.mozilla.org/mozilla-central/rev/b3eceffcdc4e
Comment 17•3 years ago
|
||
Comment 18•3 years ago
|
||
bugherder |
Updated•3 years ago
|
Comment hidden (obsolete) |
Updated•3 years ago
|
Updated•3 years ago
|
Reporter | ||
Comment 20•3 years ago
|
||
Release Note Request (optional, but appreciated)
[Why is this notable]: Performance of the volume slider on Twitch is much better than before
[Affects Firefox for Android]:
[Suggested wording]: Improve the fairness between painting and handling other events. This noticeably improves the performance of the volume slider on Twitch.
[Links (documentation, blog post, etc)]: Before and after videos here: https://jrmuizel.github.io/twitch/volume.html
Updated•3 years ago
|
Description
•