Open
Bug 960986
Opened 11 years ago
Updated 2 years ago
Chrome Worker startup is very slow on second run. Disabling nglayout.debug.disable_xul_cache removes slowdown
Categories
(Core :: DOM: Workers, defect, P5)
Core
DOM: Workers
Tracking
NEW
People
(Reporter: smacleod, Unassigned)
Details
(Whiteboard: [Async:blocker])
While investigating bug 959130 we noticed that Chrome Worker startup time was > 600 ms (on a rMBP; slower on a Windows VM).
These long startup times happen on the second run; a cold startup is much quicker (around 100 ms on the rMBP). The timings run from sending the first message to JS execution inside the worker, and were taken on the Session Store worker (browser/components/sessionstore/src/SessionWorker.js).
Setting "nglayout.debug.disable_xul_cache" to true results in all timings being "fast".
STR:
- Rebuild browser/
- Run Firefox and observe the time to start SessionWorker (will be ~100 ms)
- Close Firefox
- Run Firefox and observe the time to start SessionWorker (will be > 600 ms)
- Subsequent runs will also show the large timing
- Set nglayout.debug.disable_xul_cache = true
- Rebuild browser/
- Run Firefox and observe the time to start SessionWorker (will be ~100 ms)
- Close Firefox
- Run Firefox and observe the time to start SessionWorker (will be ~100 ms)
This doesn't sound like a worker bug but more like something inside our cache implementation.
Comment 2•11 years ago
Is there anything we can do to debug this? Where should we start?
Comment 3•11 years ago
I did some bisecting in StartupCache::GetBuffer() and found the following:
Excluding browser/base/content/browser.js and browser/components/customizableui/content/panelUI.js keeps the session worker startup time at ~250 ms. I put this at the top of the function:
> if (!strcmp(id, "xulcache/resource/app/chrome/browser/content/browser/browser.js")) {
> return NS_ERROR_FAILURE;
> }
>
> if (!strcmp(id, "xulcache/resource/app/chrome/browser/content/browser/customizableui/panelUI.js")) {
> return NS_ERROR_FAILURE;
> }
Removing either of those bail-outs brings the session worker startup time back to 600-700 ms on the second start, i.e. with a filled startup cache.
Comment 4•11 years ago
Removing the DeserializeOutOfLine() call that was introduced with bug 188744 [1] (I think) keeps the worker fast, without excluding browser.js and panelUI.js. Now, I'm sure that call has a purpose, but something is going wrong here.
https://github.com/mozilla/gecko-dev/commit/b11fb7ca9d1c5c22e93d60ce0dc187705fcb32da
Comment 5•11 years ago
Adding Olli who knows a bit about the XUL prototype cache. Maybe he has a clue what's going on.
Comment 6•11 years ago
http://bonsai.mozilla.org/cvsblame.cgi?file=mozilla/content/xul/content/src/nsXULElement.cpp&rev=1.443
http://bonsai.mozilla.org/cvsview2.cgi?diff_mode=context&whitespace_mode=show&file=nsXULElement.cpp&branch=&root=/cvsroot&subdir=mozilla/content/xul/content/src&command=DIFF_FRAMESET&rev1=1.442&rev2=1.443
Comment 7•11 years ago
BTW, it's not only the session worker that suffers. The shared OS.File worker (resource://gre/modules/osfile/osfile_async_worker.js) is delayed by roughly the same amount of time. Measured in CrashMonitor.jsm, which uses it right at startup.
Comment 8•11 years ago
Starting with an empty startup cache, WorkerPrivate::WorkerPrivate() is called and DedicatedWorkerGlobalScope::PostMessage() right after it.
Starting with a filled startup cache there is a delay (around 400 ms) between worker creation and the call to PostMessage().
Comment 9•11 years ago
With a filled startup cache the CompileScriptRunnable seems to be taking longer, i.e. scriptLoader->LoadWorkerScript():
https://mxr.mozilla.org/mozilla-central/source/dom/workers/WorkerPrivate.cpp#880
Comment 10•11 years ago
LoadWorkerScript() with empty startup cache:
(292 ms) + (239 ms)
Filled startup cache:
(735 ms) + (675 ms)
We'll have Telemetry once bug 916076 lands.
Comment 12•11 years ago
Measuring ScriptLoaderRunnable times with an empty startup cache, measured from Run() to OnStreamCompleteInternal():
-- ScriptLoaderRunnable (resource://gre/modules/osfile/osfile_async_worker.js) (23 ms)
-- ScriptLoaderRunnable (resource:///modules/sessionstore/SessionWorker.js) (18 ms)
-- ScriptLoaderRunnable (resource://gre/modules/osfile.jsm) (5 ms)
-- ScriptLoaderRunnable (resource://gre/modules/osfile.jsm) (7 ms)
-- ScriptLoaderRunnable (resource://gre/modules/workers/require.js) (7 ms)
-- ScriptLoaderRunnable (resource://gre/modules/osfile/osfile_unix_back.jsm) (7 ms)
-- ScriptLoaderRunnable (resource://gre/modules/osfile/osfile_shared_front.jsm) (7 ms)
-- ScriptLoaderRunnable (resource://gre/modules/osfile/osfile_unix_front.jsm) (7 ms)
-- ScriptLoaderRunnable (resource://gre/modules/workers/require.js) (7 ms)
-- ScriptLoaderRunnable (resource://gre/modules/osfile/osfile_unix_back.jsm) (7 ms)
-- ScriptLoaderRunnable (resource://gre/modules/osfile/osfile_shared_front.jsm) (7 ms)
-- ScriptLoaderRunnable (resource://gre/modules/osfile/osfile_unix_front.jsm) (7 ms)
A filled cache yields the following:
-- ScriptLoaderRunnable (resource://gre/modules/osfile/osfile_async_worker.js) (54 ms)
-- ScriptLoaderRunnable (resource:///modules/sessionstore/SessionWorker.js) (49 ms)
-- ScriptLoaderRunnable (resource://gre/modules/osfile.jsm) (3 ms)
-- ScriptLoaderRunnable (resource://gre/modules/osfile.jsm) (3 ms)
-- ScriptLoaderRunnable (resource://gre/modules/workers/require.js) (94 ms)
-- ScriptLoaderRunnable (resource://gre/modules/osfile/osfile_unix_back.jsm) (94 ms)
-- ScriptLoaderRunnable (resource://gre/modules/osfile/osfile_shared_front.jsm) (94 ms)
-- ScriptLoaderRunnable (resource://gre/modules/osfile/osfile_unix_front.jsm) (94 ms)
-- ScriptLoaderRunnable (resource://gre/modules/workers/require.js) (93 ms)
-- ScriptLoaderRunnable (resource://gre/modules/osfile/osfile_unix_back.jsm) (93 ms)
-- ScriptLoaderRunnable (resource://gre/modules/osfile/osfile_shared_front.jsm) (93 ms)
-- ScriptLoaderRunnable (resource://gre/modules/osfile/osfile_unix_front.jsm) (93 ms)
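Summing the per-script times above gives a rough per-scenario total (a back-of-the-envelope check only, since the loads may overlap in time):

```javascript
// Per-script ScriptLoaderRunnable times (ms), copied from the lists above.
const emptyCache  = [23, 18, 5, 7, 7, 7, 7, 7, 7, 7, 7, 7];
const filledCache = [54, 49, 3, 3, 94, 94, 94, 94, 93, 93, 93, 93];

const sum = (xs) => xs.reduce((a, b) => a + b, 0);
console.log(sum(emptyCache), sum(filledCache)); // → 109 857
```

That ~109 ms vs ~857 ms total lines up with the ~100 ms cold-start and 600-700 ms warm-start figures reported earlier.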
I was afraid that the slowdown could be caused by workers/require.js, but comment 12 seems to suggest that this isn't the case, as some of the 94 ms scripts are loaded using require(), others using importScripts().
Comment 14•11 years ago
It looks like browser.xul and its script files are loaded from the startup cache, which means the browser window is ready much earlier. I think that means the worker scripts have to fight for resources with the images and style sheets of the first browser window, which then leads to delayed worker initialization. Does that sound plausible?
Comment 15•11 years ago
Worker scripts aren't put into the startup cache so it seems like the best option would be to go with bug 959598 and not use Workers in the (critical) startup path.
Why wouldn't we just startup cache worker scripts that we care about?
Comment 17•11 years ago
That of course is another possibility. It just seems bad to rely on that list of worker scripts we care about being up to date at all times. Wouldn't it be easier to discourage using workers for time-critical startup code? In hindsight, it does sound like a bad idea.
We don't have to manually maintain a list; we could startupcache all ChromeWorkers.
"Don't use parallelism" seems like a bad solution to me.
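The "startupcache all ChromeWorkers" idea is essentially "compile once, reuse the compiled form on later startups". A minimal sketch of that lookup-or-compile pattern (illustrative names only, not Gecko's StartupCache API):

```javascript
// Sketch of a startup-cache-style lookup: key compiled scripts by URL,
// compile on a miss, and serve the cached result on later lookups.
class ScriptCache {
  constructor(compile) {
    this.compile = compile;   // expensive step we want to skip on warm runs
    this.entries = new Map(); // url -> compiled script
  }
  get(url, source) {
    let compiled = this.entries.get(url);
    if (compiled === undefined) {
      compiled = this.compile(source); // cold path: parse + compile
      this.entries.set(url, compiled); // warm path on later lookups
    }
    return compiled;
  }
}

// Usage: the second lookup for the same URL never re-compiles.
let compiles = 0;
const cache = new ScriptCache((src) => { compiles++; return `compiled:${src}`; });
cache.get('resource://gre/modules/workers/require.js', 'src');
cache.get('resource://gre/modules/workers/require.js', 'src');
console.log(compiles); // → 1
```

In Gecko the cached value would be serialized compiled bytecode persisted across restarts, but the lookup shape is the same.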
Comment 20•11 years ago
Right, startup caching all ChromeWorkers would indeed work. With a ~250 ms (on a fast machine) worker startup time on first run, I wonder how fast we can get with the startup cache. I am totally not against caching worker scripts; it just seems easier not to use workers here.
If you need (or would benefit from) parallelism at startup then sure, use workers. But afaik we won't win much by using workers for disk I/O at startup where the disk is under heavy load and the bottleneck anyway?
Updated•11 years ago
Whiteboard: [Async:blocker]
Comment 21•11 years ago
(In reply to Kyle Huey [:khuey] (khuey@mozilla.com) (AFK Jan 22/23) from comment #18)
> We don't have to manually maintain a list; we could startupcache all
> ChromeWorkers.
Sounds like we should probably do this either way. Are there downsides I'm missing? Do we need to be concerned about increasing the startupcache size significantly, or other weird interactions?
(In reply to Tim Taubert [:ttaubert] from comment #20)
> But afaik we won't win much by using workers for disk I/O at
> startup where the disk is under heavy load and the bottleneck anyway?
Indeed. Sounds like we should fix bug 959598 and generally avoid relying on OS.File in the startup path.
Note that we're strongly considering providing a native version of OS.File.read that would completely side-step the issue.
Ok, we have numbers for the OS.File worker:
- 80% of measures show a startup above 200ms;
- 42% are above 600ms;
- 3% are above 5 seconds!
Source: http://telemetry.mozilla.org/#nightly/29/OSFILE_WORKER_LAUNCH_MS (from January 23rd).
That's pretty bad. Gavin, you have a bird's eye view on this. Can you put someone in charge?
Flags: needinfo?(gavin.sharp)
Comment 24•11 years ago
vladan, can you find someone to own "startupcache all ChromeWorkers"?
Flags: needinfo?(gavin.sharp)
Updated•11 years ago
Flags: needinfo?(vdjeric)
Comment 25•11 years ago
Asking Aaron for his opinion about comment #24, during the perf meeting it seemed like he definitely has one :)
Flags: needinfo?(vdjeric) → needinfo?(aklotz)
Comment 26•11 years ago
Normally I'm hesitant to add more stuff to the startup cache. But since this problem appears to be caused by various interdependencies between things that are already in the cache and things that are not, I'm more inclined to go along with this, provided we verify that the changes do indeed mitigate the problem.
This stuff is precompiled during make package, so it's going to need some enhancements in http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/installer/packager.py#116 and possibly http://mxr.mozilla.org/mozilla-central/source/toolkit/mozapps/installer/precompile_cache.js
Flags: needinfo?(aklotz)
Updated•7 years ago
Priority: -- → P5
Updated•2 years ago
Severity: normal → S3