Closed Bug 1752387 Opened 3 years ago Closed 3 years ago

ServiceWorkerManager is the second most frequent timer when Firefox is completely idle

Categories

(Core :: DOM: Service Workers, defect, P3)

defect

Tracking

()

RESOLVED FIXED
99 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox96 --- unaffected
firefox97 --- wontfix
firefox98 --- fixed
firefox99 --- fixed

People

(Reporter: florian, Assigned: jesup)

References

(Blocks 1 open bug, Regression)

Details

(Keywords: regression)

Attachments

(1 file)

In an attempt to identify avoidable idle wake ups, I profiled Firefox for a long duration without doing anything to it. On Windows I kept a single tab with about:blank, on Mac I closed all browser windows.

In the profiles (mac | windows) I see ServiceWorkerManager timers firing every 10s.

Looking at the code, the only purpose of this timer seems to be updating telemetry probes added in bug 1740335.

I wonder if we could stop this timer when there's no service worker, and I suspect it's unintentional that the probe descriptions say "sampled every minute" but the timer is set to 10s.

Not related to the timer: I don't see where the SERVICE_WORKER_FETCH_RUNNING probe defined in Histograms.json is used, I wonder if it's a leftover from a previous version of the patch.

Flags: needinfo?(rjesup)

I flagged this during discussion with chutten; unfortunately the telemetry API doesn't provide an easy way to disable the timer but still produce reliable results. I need the 0's during no-serviceworker times to make the result meaningful (like that we found that 85% of the time we have no SW running). If I disable the timer, to keep the stats right I need to do 2 things: one, when we start a SW (go from 0->1), we'd need to restart the timer (easy), and we'd need to submit N(seconds)/10 0's to Telemetry (the 0's we didn't record). This means calling the API N/10 times in a row, or creating an array of 0's and passing that in (and internally it basically calls the record method with 0 N times, so there's no real perf win). We'd also need to notice shutdown and submit 0's then if we're in a stopped-timer state.

Flags: needinfo?(rjesup)

Not related to the timer: I don't see where the SERVICE_WORKER_FETCH_RUNNING probe defined in Histograms.json is used, I wonder if it's a leftover from a previous version of the patch.

Yes, I believe so

The only way I see to solve this without a performance/pageload impact would be to submit the 0's via creating an idle runnable, and even then I'm still worried it would run long enough in some cases (like a new SW after N days without one running) that it could impact pageload. Perhaps use a large fixed delay (10s? 30s?), then use an idle runnable.

Flags: needinfo?(chutten)

I suppose we could do some type of increasing delay the longer we've had 0 SW's, and then iterating N/10 times on every wakeup, with some type of upper limit to limit maximum jank. Note that none of these ideas exactly as above would help the cases where a SW remains running, though we could trivially extend them for "# of SW's hasn't changed", with a trigger to cut the delay immediately on any change in the number (instead of on 0->1 transitions)

Is the telemetry API slow enough to make it noticeably slow if we add the same value a few thousand times in a row? And if the answer is yes, can we call it off main thread?

Set release status flags based on info from the regressing bug 1740335

Has Regression Range: --- → yes

(In reply to Florian Quèze [:florian] from comment #5)

Is the telemetry API slow enough to make it noticeably slow if we add the same value a few thousand times in a row? And if the answer is yes, can we call it off main thread?

Telemetry (and Glean) APIs are threadsafe and are in the performance class of "cheap but not free".

In today's m-c Telemetry, Histograms are given samples proportional to the number of paints or vsync intervals (and the latter even on release channel) amounting to hundreds of millions of samples per major version, easily tens of thousands of samples per subsession, if not more. ( Check about:telemetry and use the search box for "rasterize" and you'll see your own counts. Refresh the page to watch them increase.) There are no doubt Histograms with even more samples in them, but those are two that came to mind : )

Telemetry does take a lock on the calling thread and perform some small in-memory calculations per sample to figure out what bucket it should be in and then to increment the count in that bucket. There are some value and bounds checks as well, but they go by quickly.

In short: we should be fine to receive a whole whack-load of zeroes. If you learn that Telemetry isn't up to the task, do let us know and we'll cut a better API for you or I can help you encode your information in a slightly different way (like maybe we count lengths of time that we're at zero SWs to keep 0s out of the main SW probe). Please use the array API if you plan on sending us a bunch of zeroes so you only pick up Telemetry's Histogram lock once per batch. And by all means do so off the main thread. It shouldn't run long, but as advertised these operations are "cheap but not free".

(( In the case of Glean, the Glean SDK has its own dispatcher to take things off of the instrumenting thread so dispatching yourself wouldn't be needed. ))

Flags: needinfo?(chutten)
Assignee: nobody → rjesup
Status: NEW → ASSIGNED
Severity: -- → S4
Priority: -- → P3
Pushed by rjesup@wgate.com: https://hg.mozilla.org/integration/autoland/rev/068ad1f32fba Don't use a timer for recording ServiceWorker running telemetry r=chutten,dom-worker-reviewers,edenchuang
Pushed by rjesup@wgate.com: https://hg.mozilla.org/integration/autoland/rev/3f4d90aeea3b Don't use a timer for recording ServiceWorker running telemetry r=chutten
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 99 Branch
Flags: needinfo?(rjesup)

The patch landed in nightly and beta is affected.
:jesup, is this bug important enough to require an uplift?
If not please set status_beta to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(rjesup)

Comment on attachment 9261815 [details]
Bug 1752387: Don't use a timer for recording ServiceWorker running telemetry r=chutten,#dom-worker-reviewers

Beta/Release Uplift Approval Request

  • User impact if declined: Increased power use by idle firefox instances (including on mobile)
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Relatively simple patch that removes a timer. We certainly don't need to include it, but power use especially by mobile is an issue, and mobile does finger apps that use power in the background
  • String changes made/needed: none
Flags: needinfo?(rjesup)
Attachment #9261815 - Flags: approval-mozilla-beta?

Comment on attachment 9261815 [details]
Bug 1752387: Don't use a timer for recording ServiceWorker running telemetry r=chutten,#dom-worker-reviewers

We are still in early betas and that looks like a valuable fix to uplift. Approved for 98 beta 5, thanks.

Attachment #9261815 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: