Closed Bug 1191339 Opened 9 years ago Closed 7 years ago

Launch application with ServiceWorker registered takes much longer

Categories

(Firefox OS Graveyard :: Performance, defect, P1)

ARM
Gonk (Firefox OS)
defect

Tracking

(tracking-b2g:+)

RESOLVED WONTFIX
FxOS-S7 (18Sep)
tracking-b2g +

People

(Reporter: ting, Unassigned)

References

Details

(Keywords: perf)

Attachments

(1 file)

While in Paris NGA work week, Francisco told me they have some benchmark tests for ServiceWorker [1]. And the result he most concerns is the app empty-sw, since it takes ~2x startup time comapre to baseline which does not register any. Note the app registers just an empty ServiceWorker, and fetch a json file when startup. I cpatured two profiles on Aries L, one for baseline [2] and one for empty-sw [3] (add the apps to apps/verticalhome/build/default-homescreens.json so I can launch them). I also enabled ServiceWorker preferences "dom.serviceWorkers.enabled", "dom.serviceWorkers.interception.enabled" and installed the ServiceWorker before profiling. The results of running Raptor to measure both apps on Aries L: baseline: | Metric | Mean | Median | Min | Max | StdDev | p95 | | --------------------------------------- | ------- | ------- | ------- | ------- | ------ | ------- | | coldlaunch.benchmark-js-file-parsed | 111.067 | 94.500 | 70.000 | 248.000 | 40.281 | 182.000 | | coldlaunch.benchmark-dom-content-loaded | 122.172 | 109.000 | 76.000 | 222.000 | 38.361 | 205.850 | | coldlaunch.benchmark-load | 144.833 | 129.000 | 96.000 | 257.000 | 45.155 | 251.000 | | coldlaunch.benchmark-payload-loaded | 161.567 | 142.500 | 110.000 | 290.000 | 48.242 | 285.000 | | coldlaunch.benchmark-payload-read | 178.900 | 159.000 | 126.000 | 297.000 | 47.796 | 292.000 | | coldlaunch.fullyLoaded | 179.000 | 159.000 | 126.000 | 297.000 | 47.914 | 293.000 | empty-sw: | Metric | Mean | Median | Min | Max | StdDev | p95 | | ----------------------------------- | ------- | ------- | ------- | ------- | ------ | ------- | | coldlaunch.benchmark-js-file-parsed | 273.967 | 269.000 | 213.000 | 338.000 | 34.135 | 332.000 | | coldlaunch.benchmark-load | 275.400 | 270.500 | 214.000 | 338.000 | 34.190 | 333.000 | | coldlaunch.benchmark-payload-loaded | 312.367 | 313.000 | 239.000 | 377.000 | 35.130 | 369.000 | | coldlaunch.benchmark-payload-read | 315.967 | 316.500 | 251.000 | 380.000 | 34.727 | 371.000 | | coldlaunch.fullyLoaded | 316.167 | 316.500 | 252.000 | 380.000 | 34.706 | 372.000 | Versions: Gecko: f3b757156f69 Gaia: 581de383687dc441a878d2c91a0167c6ec688fef [1] https://github.com/azasypkin/sw-tests [2] http://people.mozilla.org/~bgirard/cleopatra/#report=049737cc6516a32ce4de73d123c10196a199910b [3] http://people.mozilla.org/~bgirard/cleopatra/#report=2511bcf01a1c118b6c4442edf6f6fb7cd5f381f2
(In reply to Ting-Yu Chou [:ting] from comment #0) > I cpatured two profiles on Aries L, one for baseline [2] and one for empty-sw [3] The profile of empty-sw shows child process main thread idles for ~150ms before WorkerMainThreadRunnable::Run(), I'll try to figure out why. Ben, please let us know if you have any ideas, thanks.
Flags: needinfo?(bkelly)
Is it spending time spinning up the ServiceWorker thread (even though you have no fetch event handler)?
Flags: needinfo?(bkelly)
I am not sure, will figure that out. Maybe I should change the subject a bit, the problem is not about whether there's a fetch event handler or not, but why it's idling.
Summary: Lauch application with ServiceWorker registered takes much longer, even it is empty → Lauch application with ServiceWorker registered takes much longer
What do you mean by "idling"? Do you mean why is the ServiceWorker going away between requests? The answer to that is we haven't implemented a grace period to keep the ServiceWorker alive for a short time before killing it. We should add this to debounce the worker thread and minimize teardown/startup costs. Catalin, is this related to what you are looking at now?
Flags: needinfo?(catalin.badea392)
Summary: Lauch application with ServiceWorker registered takes much longer → Launch application with ServiceWorker registered takes much longer
I mean the child process main thread is idling for ~150ms during application startup.
(In reply to Ting-Yu Chou [:ting] from comment #5) > I mean the child process main thread is idling for ~150ms during application > startup. "idling" is waiting in epoll_wait().
Is it in a single epoll_wait() or spread across many?
Attached patch Perf patch (deleted) — Splinter Review
I am trying to set some perf marks on the SW implementation to see where the time is spent with the attached patch and this is the output that I got so far: I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|INTERCEPTION_START_app://benchmark-app.gaiamobile.org/empty-sw/index.html|594144.537325|0.000000|1438795590256 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|WORKER_PRIVATE_CREATION|594146.488887|0.000000|1438795590258 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_CREATION|594146.806908|0.000000|1438795590258 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|DISPATCH_FETCH_EVENT_app://benchmark-app.gaiamobile.org/empty-sw/index.html|594533.544096|0.000000|1438795590645 I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_FETCH_app://benchmark-app.gaiamobile.org/empty-sw/index.html|594144.537325|389.006771|1438795590646 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|RESET_INTERCEPTION_app://benchmark-app.gaiamobile.org/empty-sw/index.html|594539.975867|0.000000|1438795590651 I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_RESET_app://benchmark-app.gaiamobile.org/empty-sw/index.html|594144.537325|395.438542|1438795590652 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|INTERCEPTION_START_app://benchmark-app.gaiamobile.org/empty-sw/js/app.js|594591.069773|0.000000|1438795590702 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|WORKER_PRIVATE_CREATION|594596.358002|0.000000|1438795590707 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_CREATION|594596.463887|0.000000|1438795590708 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|INTERCEPTION_START_app://benchmark-app.gaiamobile.org/empty-sw/js/dummy.js|594597.139148|0.000000|1438795590708 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|WORKER_PRIVATE_REUSED|594603.246648|0.000000|1438795590714 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_CREATION|594603.460241|0.000000|1438795590715 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|DISPATCH_FETCH_EVENT_app://benchmark-app.gaiamobile.org/empty-sw/js/app.js|594709.592117|0.000000|1438795590821 I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_FETCH_app://benchmark-app.gaiamobile.org/empty-sw/js/app.js|594591.069773|118.522344|1438795590821 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|RESET_INTERCEPTION_app://benchmark-app.gaiamobile.org/empty-sw/js/app.js|594710.466908|0.000000|1438795590822 I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_RESET_app://benchmark-app.gaiamobile.org/empty-sw/js/app.js|594591.069773|119.397135|1438795590822 I/PerformanceTiming( 3214): Performance Entry: benchmark-app.gaiamobile.org|mark|benchmark-js-file-parsed|584.330000|0.000000|1438795590825 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|DISPATCH_FETCH_EVENT_app://benchmark-app.gaiamobile.org/empty-sw/js/dummy.js|594716.858575|0.000000|1438795590828 I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_FETCH_app://benchmark-app.gaiamobile.org/empty-sw/js/dummy.js|594597.139148|119.719427|1438795590828 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|INTERCEPTION_START_app://benchmark-app.gaiamobile.org/empty-sw/payload.json|594717.400710|0.000000|1438795590829 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|WORKER_PRIVATE_REUSED|594719.572273|0.000000|1438795590831 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_CREATION|594719.824617|0.000000|1438795590831 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|RESET_INTERCEPTION_app://benchmark-app.gaiamobile.org/empty-sw/js/dummy.js|594720.922429|0.000000|1438795590832 I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_RESET_app://benchmark-app.gaiamobile.org/empty-sw/js/dummy.js|594597.139148|123.783281|1438795590832 I/PerformanceTiming( 3214): Performance Entry: benchmark-app.gaiamobile.org|mark|benchmark-load|592.960000|0.000000|1438795590834 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|DISPATCH_FETCH_EVENT_app://benchmark-app.gaiamobile.org/empty-sw/payload.json|594738.534356|0.000000|1438795590850 I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_FETCH_app://benchmark-app.gaiamobile.org/empty-sw/payload.json|594717.400710|21.133646|1438795590850 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|RESET_INTERCEPTION_app://benchmark-app.gaiamobile.org/empty-sw/payload.json|594764.198002|0.000000|1438795590875 I/PerformanceTiming( 3214): Performance Entry: Gecko|measure|TIME_FROM_INTERCEPTION_TO_RESET_app://benchmark-app.gaiamobile.org/empty-sw/payload.json|594717.400710|46.797292|1438795590876 I/PerformanceTiming( 3214): Performance Entry: benchmark-app.gaiamobile.org|mark|benchmark-payload-loaded|698.780000|0.000000|1438795590940 I/PerformanceTiming( 3214): Performance Entry: benchmark-app.gaiamobile.org|mark|benchmark-payload-read|715.210000|0.000000|1438795590956 I/PerformanceTiming( 3214): Performance Entry: benchmark-app.gaiamobile.org|mark|fullyLoaded|715.420000|0.000000|1438795590956 I/PerformanceTiming( 2228): Performance Entry: Gecko|mark|INTERCEPTION_START_app://benchmark-app.gaiamobile.org/img/icons/icon48x48.png|989178.313789|0.000000|1438795591069 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_DESTRUCTION|600403.501645|0.000000|1438795596515 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_DESTRUCTION|600403.593520|0.000000|1438795596515 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_DESTRUCTION|600403.647113|0.000000|1438795596515 I/PerformanceTiming( 3214): Performance Entry: Gecko|mark|SERVICE_WORKER_DESTRUCTION|600403.695603|0.000000|1438795596515 I am measuring the time that it takes from starting the interception of a request to sending the fetch event and to resetting the interception of the request. IIUC, the numbers show the following: - We take ~390ms to get to the interception reset for the first resource (index.html). - We take ~125ms for the following resources (app.js and dummy.js). - We take ~45ms for the last resource (payload.json). As you can see on the log, the requests for app.js and dummy.js are different in a way that we create a worker private for app.js but we reuse it for dummy.js, but still the times are quite similar. The difference between the times of the app.js and dummy.js and payload.json requests may be because payload.json is requested via the Fetch API (?)
Attachment #8643802 - Attachment is patch: true
BTW I run the tests on a Flame device
(In reply to Fernando Jiménez Moreno [:ferjm] (PTO July 27th - July 31th and Aug10th - Aug 24th) from comment #8) > As you can see on the log, the requests for app.js and dummy.js are > different in a way that we create a worker private for app.js but we reuse > it for dummy.js, but still the times are quite similar. > > The difference between the times of the app.js and dummy.js and payload.json > requests may be because payload.json is requested via the Fetch API (?) This is probably because dummy.js starts at almost the same time as app.js. It reuses the WorkerPrivate, but the worker is still not started yet. In contrast, payload.json starts just before app.js and dummy.js complete. So it does benefit from the WorkerPrivate thread being fully spun up.
If the WorkerPrivate thread from handling index.html interception was still alive when app.js was triggered, then it seems we would save ~100ms.
(In reply to Ben Kelly [:bkelly] from comment #4) > What do you mean by "idling"? Do you mean why is the ServiceWorker going > away between requests? > > The answer to that is we haven't implemented a grace period to keep the > ServiceWorker alive for a short time before killing it. We should add this > to debounce the worker thread and minimize teardown/startup costs. > > Catalin, is this related to what you are looking at now? From what I can tell, there are two problems here: 1. We're not reusing the service worker to handle sequences of fetch events. 2. We dispatch fetch events to a service worker that doesn't have a fetch event handler. My patches from bug 1188545 should fix the first problem - the service worker will live at least ~30 seconds after each event. At the moment, each fetch event holds a reference to the service worker that keeps it alive (which in turn keeps the WorkerPrivate alive) until the fetch handler was executed. Thus, having a very fast succession of fetches should reuse the WorkerPrivate. I suspect the reason the first fetch is slower than the rest in Fernando's results is because that's the only time the worker thread is actually created. Gecko keeps threads around for a short amount of time (I think 10s) and reuses them when creating a new WorkerPrivate.
Flags: needinfo?(catalin.badea392)
(In reply to Ting-Yu Chou [:ting] from comment #1) > (In reply to Ting-Yu Chou [:ting] from comment #0) > > I cpatured two profiles on Aries L, one for baseline [2] and one for empty-sw [3] > > The profile of empty-sw shows child process main thread idles for ~150ms > before WorkerMainThreadRunnable::Run(), I'll try to figure out why. > > Ben, please let us know if you have any ideas, thanks. Is that the WorkerMainThreadRunnable that resets the channel for index.html?
Flags: needinfo?(janus926)
(In reply to Cătălin Badea (:catalinb) from comment #12) > 2. We dispatch fetch events to a service worker that doesn't have a fetch > event handler. This is not something we can address ourselves. We agreed to add a disableFetch() call on InstallEvent to opt-out of fetch event processing. We haven't implemented that yet, though. > My patches from bug 1188545 should fix the first problem - the service > worker will live at least ~30 seconds > after each event. Awesome! > At the moment, each fetch event holds a reference to the service worker that > keeps it alive (which in turn keeps the WorkerPrivate alive) until the fetch > handler was executed. Thus, having a very fast succession of fetches should > reuse the WorkerPrivate. I suspect the reason the first fetch is slower than > the rest in Fernando's results is because that's the only time the worker > thread is actually created. If there is no overlap in the FetchEvent, then the ServiceWorker ref drops to zero. > Gecko keeps threads around for a short amount of time (I think 10s) and > reuses them when creating a new WorkerPrivate. Keeps threads alive, but it still starts shutting down the javascript running inside of it immediately when ref count of the DOM object reaches zero as far as I know. Loading, parsing, and executing the javascript is probably heavier weight than actually spawning an OS thread.
(In reply to Cătălin Badea (:catalinb) from comment #13) > Is that the WorkerMainThreadRunnable that resets the channel for index.html? It is CompileScriptRunnable that takes time. Child process main thread is mostly idling between nsJARChannel::AsyncOpen() intercepting index.html and FetchEventRunnable calling nsJARChannel::ResetInterception(), and the CompileScriptRunnable for sw.js is processed before FetchEventRunnable. There're also two problems here: - Why does CompileScriptRunnable take time for an empty script? - Can we compile the service worker script ahead of time (AOT), e.g., startup cache?
Flags: needinfo?(janus926)
(In reply to Ting-Yu Chou [:ting] from comment #15) > (In reply to Cătălin Badea (:catalinb) from comment #13) > > Is that the WorkerMainThreadRunnable that resets the channel for index.html? > > It is CompileScriptRunnable that takes time. > > Child process main thread is mostly idling between nsJARChannel::AsyncOpen() > intercepting index.html and FetchEventRunnable calling > nsJARChannel::ResetInterception(), and the CompileScriptRunnable for sw.js > is processed before FetchEventRunnable. > > There're also two problems here: > > - Why does CompileScriptRunnable take time for an empty script? > - Can we compile the service worker script ahead of time (AOT), e.g., > startup cache? For the JS scripts in general I was hoping to be able to use some of the work in bug 1164389. So far the patches there does not support Workers but it should not be too hard to add in dom/workers/ScriptLoader.cpp (irrc).
(In reply to Ting-Yu Chou [:ting] from comment #15) > - Why does CompileScriptRunnable take time for an empty script? It takes time from ScriptLoaderRunnable::RunInternal() calling CacheCreator::Load() til CacheCreator::ResolvedCallback() is called. I'll take a deeper look.
The dealy is from two runnables which both dispatch to and pending on chrome process main thread: PrincipalVerifier and QuotaInitRunnable, thus ScripLoaderRunnable receives CacheCreator::ResolvedCallback() late in content process. I added profiler markers (PVDispatch, PVRun, QIRDispatch, QIRRun) before dispatching the runnable and when it runs, the profile [1] shows B2G process main thread was busy working in: 1. awf_launch() handling webapps-launch event in system app (~15ms), 2. refresh driver tick #1 (50ms), 3. awm_switchApp() to switch app to foreground/background in system app (~25ms), 4. refresh driver tick #2 (80ms) There's a bug 1110625 for 2 and 4. [1] http://people.mozilla.org/~bgirard/cleopatra/#report=8a3f722ae0eb9733943fc52208878e4b29a115d4
Depends on: 1110625
(In reply to Ting-Yu Chou [:ting] from comment #18) > 1. awf_launch() handling webapps-launch event in system app (~15ms), > 2. refresh driver tick #1 (50ms), > 3. awm_switchApp() to switch app to foreground/background in system app > (~25ms), > 4. refresh driver tick #2 (80ms) We could try to collapse the Cache main thread runnables into a single runnable, but it seems like it will still get delayed on these operations. Unfortunately I don't think there is any way to avoid main thread when opening an origin in QuotaManager right now.
Do we always need Cache to be ready before loading service worker script?
(In reply to Ting-Yu Chou [:ting] from comment #20) > Do we always need Cache to be ready before loading service worker script? Currently the service worker scripts are stored in the Cache API, so yes we do. We chose Cache API partly because it was convenient (didn't need to get http cache pinning implemented) and partly because the scripts were exposed in the spec via a Cache object. I believe they are no longer exposed in the spec, however, so we could stop using Cache API in theory. This would be a pretty large change, though.
Also, we're talking about moving other persistent data like SW registrations into quota manager. So they would also grow a "main thread runnable at startup" requirement. Jan, does your work to switch QM over to PBackground reduce the need to go to main thread to initialize origin storage in QM?
Flags: needinfo?(Jan.Varga)
(In reply to Ben Kelly [:bkelly] from comment #22) > Also, we're talking about moving other persistent data like SW registrations > into quota manager. So they would also grow a "main thread runnable at > startup" requirement. > > Jan, does your work to switch QM over to PBackground reduce the need to go > to main thread to initialize origin storage in QM? Once quota manager is on PBackground thread, the main thread will be only used occasionally. For example the first call to QuotaManager::GetOrCreate(nsIRunnable* aCallback) will have to dispatch a runnable to main thread to get profile directory and preferences.
Flags: needinfo?(Jan.Varga)
Aleh, bug 1196604 is landed, do you see any improvements from that?
Flags: needinfo?(azasypkin)
(In reply to Ting-Yu Chou [:ting] from comment #24) > Aleh, bug 1196604 is landed, do you see any improvements from that? Mmm, I've just re-run tests I use to measure SW performance. I see that app with registered SW loads a bit faster now (it takes ~50ms less for the first app performance mark). See the latest numbers [1] and compare with the previous ones [2]. Only thing that bothers me a bit is the initial numbers [3] look even better than both [2] and [3]... Not sure what has changed. [1] https://github.com/nga-crash-tests/service-worker-benchmarks#24.08.2015 [2] https://github.com/nga-crash-tests/service-worker-benchmarks#12.08.2015 [3] https://github.com/nga-crash-tests/service-worker-benchmarks#15.07.2015
Flags: needinfo?(azasypkin)
Target Milestone: --- → FxOS-S7 (18Sep)
tracking-b2g: --- → +
Keywords: perf
Priority: -- → P1
Bug 961049 was just landed to switch QuotaManager to PBackground.
Firefox OS is not being worked on
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: