Closed
Bug 961318
Opened 11 years ago
Closed 11 years ago
Random big delays (up to 5-10s) when opening new windows/chrome documents
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
VERIFIED
FIXED
mozilla29
People
(Reporter: dagger.bugzilla, Assigned: luke)
References
Details
(Keywords: regression)
Attachments
(5 files)
(deleted),
application/x-javascript
|
Details | |
(deleted),
text/plain
|
Details | |
(deleted),
image/gif
|
Details | |
(deleted),
patch
|
billm
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
lsblakk
:
approval-mozilla-aurora+
|
Details | Diff | Splinter Review |
I'm seeing periodic delays of up to 5-10s when opening new dialogs and windows (any windows, not just browser.xul ones), and when opening some chrome documents (at least about:config) into tabs. Weirdly enough, the browser stays completely responsive while the windows load and doesn't completely peg the CPU -- the window just takes a long time to show up.
I've attached a script that measures the window open delay. Run it on a chrome scratchpad (Enable devtools.chrome.enabled -> Shift-F4 -> Environment -> Browser), and it'll open and immediately close a dialog 30 times, printing timing info to the console. The length of the delays seems to depend on memory consumption, so opening up some tabs with a memory-heavy page (I guess Gmail or Facebook; http://us.gizmodo.com/ perhaps if you don't have accounts) will make them a lot more noticeable.
Here's an example of the script output for me (with my Nightly profile that's consuming ~950 MB of RAM). Every 10s or so, the window takes 4-7s to open:
Delay: 31ms (0s since start)
Delay: 4109ms (5.6s since start)
Delay: 93ms (7.3s since start)
Delay: 110ms (8.9s since start)
Delay: 47ms (10.4s since start)
Delay: 6344ms (18.3s since start)
Delay: 31ms (19.8s since start)
Delay: 47ms (21.4s since start)
Delay: 31ms (22.9s since start)
Delay: 6953ms (31.3s since start)
Delay: 47ms (32.9s since start)
Delay: 94ms (34.5s since start)
Delay: 125ms (36.1s since start)
Delay: 6157ms (43.8s since start)
Delay: 62ms (45.4s since start)
Delay: 125ms (47s since start)
Delay: 109ms (48.6s since start)
Delay: 4375ms (54.5s since start)
I see the same problem in a clean profile, provided I open some pages to push the RAM usage up.
Regression range is https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=06c8949f78a0&tochange=1d9356c6c769 so I strongly suspect bug 931864.
github
Reporter | ||
Comment 1•11 years ago
|
||
This is the XUL the script loads in the dialog (i.e. the big data: URI). This line loads an empty script:
<script type="application/javascript;version=1.7" src="data:application/x-javascript;charset=utf-8,"/>
Oddly enough, without this line, the dialog consistently opens in <150ms. Using an actual .js file instead of a data: URI makes no difference, but if the script is inline:
<script type="application/javascript;version=1.7"> // script here </script>
it works just fine with no delays.
(I also tested without the "load" handler, running the JS immediately, and it made no difference.)
Reporter | ||
Comment 2•11 years ago
|
||
And here's a .gif for good measure.
Reporter | ||
Comment 3•11 years ago
|
||
> github
Whoops. Anybody reading #developers from an hour or so ago will know why I randomly have "github" there ;)
Updated•11 years ago
|
tracking-firefox29:
--- → ?
Assignee | ||
Comment 4•11 years ago
|
||
What the patch in bug 931864 did is make async script parsing block on the completion of incremental GC. (Before the patch, we'd just parse the async script synchronously making the determination of whether an async script was parsed asychronously highly non-deterministic.) Thus, on the slow load times, the window is being opened during a full GC and the delay is the time it takes for the GC to complete. GC time is proportional to the amount of life stuff, so it makes sense that you see worse times for a big profile.
I'm not sure if there is an easy fix for this or how much work is justified to fix it. This only happens for async scripts (which is why the inline script in your experiment wasn't affected) where there is already an expectation of delay (from network latency or waiting for other scripts to finish execution). Also, this only happens for full GCs which are not usually so frequent and close together (closing windows triggers them, which is why your script hits this regularly).
One thought is that, since the GC is incremental, it'll take a longer to complete (because it is letting content run in between GC slices) so perhaps we could do something to try to rush the incremental GC when an async script is waiting to be parsed. Any thoughts on that Bill?
Assignee | ||
Updated•11 years ago
|
Status: UNCONFIRMED → NEW
Ever confirmed: true
Comment 5•11 years ago
|
||
> This only happens for async scripts (which is why the inline script in your experiment
> wasn't affected) where there is already an expectation of delay (from network latency or
> waiting for other scripts to finish execution).
Then chrome code shouldn't be using async scripts on the blocking path to showing a window, perhaps? Right now it does. The user experience of hitting Cmd-N or Ctrl-N and then there being no visible response for 5s is a lot worse than an immediate window appearing that's not usable for 5s (which itself is a lot worse than an immediate and immediately usable window).
Assignee | ||
Comment 6•11 years ago
|
||
Yeah, although that also sounds unpleasant.
Alternatively, we could say that, if an off-thread parse task is started, we disable collection of the atom's zone. This has the possibility of starvation given a constant stream of async script parse jobs, but it doesn't seem like it would arise in practice...
Assignee | ||
Comment 7•11 years ago
|
||
On second though, if the atoms compartment did get starved for collection, eventually we'd get one of those powerful (purge/shrinking?) GCs which I assume is non-incremental? Just guessing here, perhaps Bill could weigh in.
Flags: needinfo?(wmccloskey)
I don't really like the idea of disabling collection of the atoms zone. First, we don't currently have a mechanism to cancel collection of a given compartment when the GC has already started. It might be hard to add this, too, since we could have already executed some |if (isFullGC)| checks, and it we'd have to back off from that somehow.
The starvation thing is also an issue. We don't have any good backstop there. Eventually when the zone got big enough, we'd trigger a non-incremental GC. However, the triggering logic doesn't take the actual string size into account--only sizeof(JSString)--so we could end up with pretty bad behavior. We also have the malloc bytes trigger, but I doubt it would work in this case since it gets reset on every GC.
Could we put in a check so that we only do the parsing off-thread during a GC if the script is huge? My understanding is that the original patch was meant to handle multi-megabyte asm.js scripts. I'm guessing that even browser.js isn't that big.
We really need to find a long-term fix for the atoms zone. I talked about the problem with Brian during the GC work week, but I don't remember the conclusion we came to. Maybe we can talk to him when he gets back.
Flags: needinfo?(wmccloskey)
Assignee | ||
Comment 9•11 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #8)
> Could we put in a check so that we only do the parsing off-thread during a
> GC if the script is huge? My understanding is that the original patch was
> meant to handle multi-megabyte asm.js scripts. I'm guessing that even
> browser.js isn't that big.
Yeah, that's probably the best short-term fix. Caching isn't even used unless the script is >10,000 chars.
Assignee | ||
Comment 10•11 years ago
|
||
browser.js seems to be 549,150 chars. However, it doesn't seem to be async compiled again after the initial load. Other than that, when opening a new tab, the biggest script to be async compiled is 58,877 chars (chrome://browser/content/newtab/newTab.js). The second biggest script is less than 10,000 chars (the caching minimum threshold).
It seems kinda sad to disable async parsing for *all* scripts <59K chars just because of this one potential problem. Rather, I'm thinking we disable if rt->activeGCInAtomsZone() AND num-chars < 500,000. This seems like a fair tradeoff: if a full GC is in progress, it's probably faster not to wait for the GC to finish; but for big scripts, we'd still like to avoid the main-thread jank.
(In reply to Luke Wagner [:luke] from comment #10)
> It seems kinda sad to disable async parsing for *all* scripts <59K chars
> just because of this one potential problem. Rather, I'm thinking we disable
> if rt->activeGCInAtomsZone() AND num-chars < 500,000. This seems like a
> fair tradeoff: if a full GC is in progress, it's probably faster not to wait
> for the GC to finish; but for big scripts, we'd still like to avoid the
> main-thread jank.
Yes, I think that makes sense.
Assignee | ||
Comment 12•11 years ago
|
||
With this patch, the script in comment 0 produces pretty regular <30ms load times. I also threw in the heuristic of not using async parsing for tiny scripts because, seriously, it's not cheap; we create a whole Zone for each task.
Comment on attachment 8362725 [details] [diff] [review]
fix-off-thread-heuristics
Review of attachment 8362725 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jsworkers.cpp
@@ +234,5 @@
> +{
> + // Off thread parsing can't occur during incremental collections on the
> + // atoms compartment, to avoid triggering barriers. (Outside the atoms
> + // compartment, the compilation will use a new zone which doesn't require
> + // barriers itself.) If an atoms-zone GC is in progress, hold off on
Rather than say "which doesn't require barriers itself", I think it would be clearer to say "that is never collected".
Attachment #8362725 -
Flags: review?(wmccloskey) → review+
Assignee | ||
Comment 14•11 years ago
|
||
Bill, I was just thinking that 1,000,000 is overly conservative. Several large asm.js codebases are in the 100,000-1,000,000 range, with 100-300ms compile times. Again, browser.js is above this limit, but it isn't parsed repeatedly (thank goodness; XUL prototype cache, I'm guessing?).
Assignee | ||
Comment 15•11 years ago
|
||
So does 100,000 seem reasonable?
Sure, that's fine.
Assignee | ||
Comment 17•11 years ago
|
||
Comment 18•11 years ago
|
||
Comment 19•11 years ago
|
||
And just for variety, Android 2.2 (but not Android 2.2 Armv6) mochitest-4, https://tbpl.mozilla.org/php/getParsedLog.php?id=33303231&tree=Mozilla-Inbound
Assignee | ||
Comment 20•11 years ago
|
||
D'oh! In the shell, offThreadCompileScript wants to ignore these heuristics, so I'll add a CompileOption.forceCaching that is set only from this shell function.
The one mochitest failure will be an activeGCInAtomsZone for a script < 100,000 chars. The scripts are already generated to be bigger than 10,000 chars, so I'll just bump them up in size.
Assignee | ||
Comment 21•11 years ago
|
||
Comment 22•11 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Updated•11 years ago
|
Reporter | ||
Comment 23•11 years ago
|
||
I can verify this is fixed on Windows, if that's enough for VERIFIED.
Should this be uplifted to Aurora, since bug 931864 landed on 28?
Comment 24•11 years ago
|
||
Good catch! Nominating this for 28 tracking.
tracking-firefox28:
--- → ?
Keywords: regression
Assignee | ||
Comment 25•11 years ago
|
||
[Approval Request Comment]
Bug caused by (feature/regressing bug #): 931864
User impact if declined: random Chrome stalls when loading
Testing completed (on m-c, etc.): m-c
Risk to taking this patch (and alternatives if risky): low
(Attaching patch with simple rebase conflicts fixed.)
Attachment #8367539 -
Flags: approval-mozilla-aurora?
Updated•11 years ago
|
Updated•11 years ago
|
Attachment #8367539 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 26•11 years ago
|
||
Comment 27•11 years ago
|
||
I've verified the bug using the following environment:
Firefox 29
User Agent:Mozilla/5.0 (Windows NT 6.1; WOW64; rv:29.0) Gecko/20100101 Firefox/29.0
Build Id: 20140203004003
Windows 7 x64
Firefox 28
User Agent:Mozilla/5.0 (Windows NT 6.1; WOW64; rv:28.0) Gecko/20100101 Firefox/28.0
Build Id:20140203004003
Windows 7 x64
FF Aurora 29
User Agent:Mozilla/5.0 (Macintosh; Intel Mac OS X 10.9; rv:28.0) Gecko/20100101 Firefox/28.0
Build Id: 20140204004003
Mac OS X 10.9
FF Aurora 29
User Agent:Mozilla/5.0 (X11; Linux i686; rv:29.0) Gecko/20100101 Firefox/29.0
Build Id:20140204004003
Ubuntu 12.04 x32
Status: RESOLVED → VERIFIED
Keywords: verifyme
You need to log in
before you can comment on or make changes to this bug.
Description
•