ServiceWorkerManager is the second most frequent timer when Firefox is completely idle
Categories
(Core :: DOM: Service Workers, defect, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr91 | --- | unaffected |
firefox96 | --- | unaffected |
firefox97 | --- | wontfix |
firefox98 | --- | fixed |
firefox99 | --- | fixed |
People
(Reporter: florian, Assigned: jesup)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: regression)
Attachments
(1 file)
(deleted),
text/x-phabricator-request
|
pascalc
:
approval-mozilla-beta+
|
Details |
In an attempt to identify avoidable idle wake ups, I profiled Firefox for a long duration without doing anything to it. On Windows I kept a single tab with about:blank, on Mac I closed all browser windows.
In the profiles (mac | windows) I see ServiceWorkerManager
timers firing every 10s.
Looking at the code, the only purpose of this timer seems to be updating telemetry probes added in bug 1740335.
I wonder if we could stop this timer when there's no service worker, and I suspect it's unintentional that the probe descriptions say "sampled every minute" but the timer is set to 10s.
Not related to the timer: I don't see where the SERVICE_WORKER_FETCH_RUNNING
probe defined in Histograms.json is used, I wonder if it's a leftover from a previous version of the patch.
Assignee | ||
Comment 1•3 years ago
|
||
I flagged this during discussion with chutten; unfortunately the telemetry API doesn't provide an easy way to disable the timer but still produce reliable results. I need the 0's during no-serviceworker times to make the result meaningful (like that we found that 85% of the time we have no SW running). If I disable the timer, to keep the stats right I need to do 2 things: one, when we start a SW (go from 0->1), we'd need to restart the timer (easy), and we'd need to submit N(seconds)/10 0's to Telemetry (the 0's we didn't record). This means calling the API N/10 times in a row, or creating an array of 0's and passing that in (and internally it basically calls the record method with 0 N times, so there's no real perf win). We'd also need to notice shutdown and submit 0's then if we're in a stopped-timer state.
Assignee | ||
Comment 2•3 years ago
|
||
Not related to the timer: I don't see where the
SERVICE_WORKER_FETCH_RUNNING
probe defined in Histograms.json is used, I wonder if it's a leftover from a previous version of the patch.
Yes, I believe so
Assignee | ||
Comment 3•3 years ago
|
||
The only way I see to solve this without a performance/pageload impact would be to submit the 0's via creating an idle runnable, and even then I'm still worried it would run long enough in some cases (like a new SW after N days without one running) that it could impact pageload. Perhaps use a large fixed delay (10s? 30s?), then use an idle runnable.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 4•3 years ago
|
||
I suppose we could do some type of increasing delay the longer we've had 0 SW's, and then iterating N/10 times on every wakeup, with some type of upper limit to limit maximum jank. Note that none of these ideas exactly as above would help the cases where a SW remains running, though we could trivially extend them for "# of SW's hasn't changed", with a trigger to cut the delay immediately on any change in the number (instead of on 0->1 transitions)
Reporter | ||
Comment 5•3 years ago
|
||
Is the telemetry API slow enough to make it noticeably slow if we add the same value a few thousand times in a row? And if the answer is yes, can we call it off main thread?
Updated•3 years ago
|
Comment 6•3 years ago
|
||
Set release status flags based on info from the regressing bug 1740335
Updated•3 years ago
|
Comment 7•3 years ago
|
||
(In reply to Florian Quèze [:florian] from comment #5)
Is the telemetry API slow enough to make it noticeably slow if we add the same value a few thousand times in a row? And if the answer is yes, can we call it off main thread?
Telemetry (and Glean) APIs are threadsafe and are in the performance class of "cheap but not free".
In today's m-c Telemetry, Histograms are given samples proportional to the number of paints or vsync intervals (and the latter even on release channel) amounting to hundreds of millions of samples per major version, easily tens of thousands of samples per subsession, if not more. ( Check about:telemetry
and use the search box for "rasterize" and you'll see your own counts. Refresh the page to watch them increase.) There are no doubt Histograms with even more samples in them, but those are two that came to mind : )
Telemetry does take a lock on the calling thread and perform some small in-memory calculations per sample to figure out what bucket it should be in and then to increment the count in that bucket. There are some value and bounds checks as well, but they go by quickly.
In short: we should be fine to receive a whole whack-load of zeroes. If you learn that Telemetry isn't up to the task, do let us know and we'll cut a better API for you or I can help you encode your information in a slightly different way (like maybe we count lengths of time that we're at zero SWs to keep 0s out of the main SW probe). Please use the array API if you plan on sending us a bunch of zeroes so you only pick up Telemetry's Histogram lock once per batch. And by all means do so off the main thread. It shouldn't run long, but as advertised these operations are "cheap but not free".
(( In the case of Glean, the Glean SDK has its own dispatcher to take things off of the instrumenting thread so dispatching yourself wouldn't be needed. ))
Updated•3 years ago
|
Assignee | ||
Comment 8•3 years ago
|
||
Updated•3 years ago
|
Updated•3 years ago
|
Comment 10•3 years ago
|
||
Backed out for causing failures on nsISupportsImpl.cpp:43. CLOSED TREE
Backout link : https://hg.mozilla.org/integration/autoland/rev/a66c1dc9f44dd80542a1a58f00d14946d3badf17
Link to failure log : https://treeherder.mozilla.org/logviewer?job_id=367088428&repo=autoland&lineNumber=2579
Comment 11•3 years ago
|
||
Please also check:
- assertion failure in MOZ_Crash -> https://treeherder.mozilla.org/logviewer?job_id=367094551&repo=autoland&lineNumber=4418
- bc failure on browser_serviceworker_fetch_new_process.js -> https://treeherder.mozilla.org/logviewer?job_id=367101162&repo=autoland&lineNumber=5462
- bc failure on browser_toolbarKeyNav.js -> https://treeherder.mozilla.org/logviewer?job_id=367105002&repo=autoland&lineNumber=2007
Comment 12•3 years ago
|
||
Comment 13•3 years ago
|
||
bugherder |
Updated•3 years ago
|
Comment 14•3 years ago
|
||
The patch landed in nightly and beta is affected.
:jesup, is this bug important enough to require an uplift?
If not please set status_beta
to wontfix
.
For more information, please visit auto_nag documentation.
Updated•3 years ago
|
Assignee | ||
Comment 15•3 years ago
|
||
Comment on attachment 9261815 [details]
Bug 1752387: Don't use a timer for recording ServiceWorker running telemetry r=chutten,#dom-worker-reviewers
Beta/Release Uplift Approval Request
- User impact if declined: Increased power use by idle firefox instances (including on mobile)
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Relatively simple patch that removes a timer. We certainly don't need to include it, but power use especially by mobile is an issue, and mobile does finger apps that use power in the background
- String changes made/needed: none
Comment 16•3 years ago
|
||
Comment on attachment 9261815 [details]
Bug 1752387: Don't use a timer for recording ServiceWorker running telemetry r=chutten,#dom-worker-reviewers
We are still in early betas and that looks like a valuable fix to uplift. Approved for 98 beta 5, thanks.
Comment 17•3 years ago
|
||
bugherder uplift |
Updated•3 years ago
|
Description
•