Closed
Bug 959130
Opened 11 years ago
Closed 11 years ago
don't use a ChromeWorker to read the session file at startup
Categories
(Firefox :: Session Restore, defect)
Firefox
Session Restore
Tracking
()
RESOLVED
FIXED
Firefox 29
People
(Reporter: Yoric, Assigned: smacleod)
References
Details
(Keywords: perf, regression, Whiteboard: [talos_regression][qa-])
Attachments
(8 files, 4 obsolete files)
(deleted),
application/force-download
|
Details | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
application/force-download
|
Details | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
ttaubert
:
review+
Sylvestre
:
approval-mozilla-aurora+
lsblakk
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
ttaubert
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Details | Diff | Splinter Review |
Graph http://screencast.com/t/cafAln40oXU, collected by joy, indicates a regression in the time between firstpaint and sessionrestored (this is measure d2 in the graph) in Firefox 25.
We need to investigate this. Note that this may be Session Restore, Add-on Manager or some other stuff that takes place between firstpaint and sessionrestored.
Updated•11 years ago
|
Summary: Investigate Firefox 25 regression on sessiontrestored → Investigate Firefox 25 regression on sessionrestored
Comment 1•11 years ago
|
||
Comment 2•11 years ago
|
||
My current assumption is that the sessionrestored notification fires a tad later due to moving I/O to a ChromeWorker. Reading and creating backup copies might be de-prioritized and thus leading to us firing notifications later.
Reporter | ||
Comment 3•11 years ago
|
||
For add-on manager, this looks shorter:
https://bugzilla.mozilla.org/buglist.cgi?list_id=9177896&resolution=FIXED&query_format=advanced&bug_status=RESOLVED&component=Add-On&component=Add-on&component=Add-on%20Manager&component=Add-on%20Security&component=Add-on%20Validation&component=Add-ons&component=Add-ons%20%26%20Plugins&component=Add-ons%20Manager&target_milestone=Firefox%2025
Reporter | ||
Comment 4•11 years ago
|
||
(In reply to Tim Taubert [:ttaubert] from comment #2)
> My current assumption is that the sessionrestored notification fires a tad
> later due to moving I/O to a ChromeWorker. Reading and creating backup
> copies might be de-prioritized and thus leading to us firing notifications
> later.
Also, there is a cost associated to launching the worker.
Reporter | ||
Comment 5•11 years ago
|
||
Joy, before we can proceed, we need to know the duration of this regression.
Flags: needinfo?(sguha)
Comment 6•11 years ago
|
||
From 1% sample, considering all the release US profiles
For each profile compute, the mean of sessionRestored - firstpaint across sessions per week for a given version.
Versions are in the set {24, 25.0, 25.0.1, 26.0}
This is the geometric mean of d2 across all profiles for a version across weeks
d2 = mean (sessionRestored - firstpaint in millisecond)
ver d2 = mean (sessionRestored - firstpaint in millisecond)
1: 24.0$ 245.1465
2: 25.0$ 897.2273
3: 25.0.1$ 686.6769
4: 26.0$ 747.9843
Hope this helps
Cheers
Saptarshi
Flags: needinfo?(sguha)
Comment 7•11 years ago
|
||
(In reply to "Saptarshi Guha[:joy]" from comment #6)
> From 1% sample, considering all the release US profiles
>
> For each profile compute, the mean of sessionRestored - firstpaint across
> sessions per week for a given version.
> Versions are in the set {24, 25.0, 25.0.1, 26.0}
>
> This is the geometric mean of d2 across all profiles for a version across
> weeks
>
> d2 = mean (sessionRestored - firstpaint in millisecond)
>
> ver d2 = mean (sessionRestored - firstpaint in millisecond)
> 1: 24.0$ 245.1465
> 2: 25.0$ 897.2273
> 3: 25.0.1$ 686.6769
> 4: 26.0$ 747.9843
>
> Hope this helps
> Cheers
> Saptarshi
If this is the geometric mean, can we get some measure of standard deviation or such as well? Some of us discussed earlier today that it might be that we caused an issue for a limited number of people and that's manipulating the average. Diagnosing that as opposed to a 'general' slowdown would help.
Flags: needinfo?(sguha)
Reporter | ||
Comment 8•11 years ago
|
||
(In reply to "Saptarshi Guha[:joy]" from comment #6)
> From 1% sample, considering all the release US profiles
>
> For each profile compute, the mean of sessionRestored - firstpaint across
> sessions per week for a given version.
> Versions are in the set {24, 25.0, 25.0.1, 26.0}
>
> This is the geometric mean of d2 across all profiles for a version across
> weeks
>
> d2 = mean (sessionRestored - firstpaint in millisecond)
>
> ver d2 = mean (sessionRestored - firstpaint in millisecond)
> 1: 24.0$ 245.1465
> 2: 25.0$ 897.2273
> 3: 25.0.1$ 686.6769
> 4: 26.0$ 747.9843
Is that in milliseconds?
Looking at the patch, I believe that we have added the following steps:
- launching the worker;
- writeLoadStateOnceAfterStartup (in case of sync fallback).
Assignee | ||
Comment 9•11 years ago
|
||
Tim and I have have an alternative hypothesis we're looking into. Bug 898308 is also present in the range, which moves the session store initialization for the first window to a little later in the delayed startup process.
Before the change we would initialize in the _delayedStartup() function, and after we would wait until "delayed-startup-finished" was fired. This moves a significant ammount of code to before initialization.
Tim took some initial measurements and on his rMBP this accounts for ~34ms, and ~70ms on linux. On a slower machine this could be even larger.
There is also some Windows specific initialization that occurs in between, which we're trying to get measurements on.
(In reply to "Saptarshi Guha[:joy]" from comment #6)
> From 1% sample, considering all the release US profiles
>
> For each profile compute, the mean of sessionRestored - firstpaint across
> sessions per week for a given version.
> Versions are in the set {24, 25.0, 25.0.1, 26.0}
>
> This is the geometric mean of d2 across all profiles for a version across
> weeks
>
> d2 = mean (sessionRestored - firstpaint in millisecond)
>
> ver d2 = mean (sessionRestored - firstpaint in millisecond)
> 1: 24.0$ 245.1465
> 2: 25.0$ 897.2273
> 3: 25.0.1$ 686.6769
> 4: 26.0$ 747.9843
>
> Hope this helps
> Cheers
> Saptarshi
joy, a breakdown by Operating system could be useful, would you be able to provide this?
Reporter | ||
Comment 10•11 years ago
|
||
(In reply to Steven MacLeod [:smacleod] from comment #9)
> Tim took some initial measurements and on his rMBP this accounts for ~34ms,
> and ~70ms on linux. On a slower machine this could be even larger.
The regression seems to be one order of magnitude larger.
I seem to remember measuring that launching a worker could explain 300ms on my machine.
Comment 11•11 years ago
|
||
PDF
D := sessionrestored - firstpaint
page 1: Box Plot of D in seconds, x axis is a log scale
To read the following: points above the grey line (y==x) implies that quantiles of the variable on the Y axis > quantiles of variable on X axis
page 2: QQ Plot of log(D) for v25 vs log(D) for v24
page 3: QQ Plot of log(D) for v25.0.1 vs log(D) for v24
page 4: QQ Plot of log(D) for v26 vs log(D) for v24
Percentiles (seconds)
p v24.0$ v25.0$ v26.0$ v25.0.1$
1: 0.00 0.000 0.000 0.000 0.000
2: 0.01 0.019 0.059 0.054 0.062
3: 0.05 0.030 0.115 0.095 0.112
4: 0.10 0.041 0.163 0.128 0.153
5: 0.20 0.063 0.261 0.185 0.229
6: 0.30 0.090 0.384 0.251 0.317
7: 0.40 0.129 0.553 0.335 0.431
8: 0.50 0.186 0.792 0.454 0.587
9: 0.60 0.280 1.156 0.635 0.815
10: 0.70 0.453 1.759 0.951 1.188
11: 0.80 0.842 2.884 1.636 1.904
12: 0.90 2.139 5.622 3.821 3.805
13: 0.95 4.544 10.016 8.259 6.952
14: 0.99 19.209 34.244 43.900 27.318
15: 1.00 918350.860 760061.238 80008.623 1245991.998
Flags: needinfo?(sguha)
Comment 12•11 years ago
|
||
We profiled on Paolo's machine and got the following profile:
https://people.mozilla.org/~bgirard/cleopatra/#report=01fc530bf31ad36ba35adbf295aa690989ccb8aa
Interesting is that we seem to wait a lot (~4s) on system calls in WaitForMessage(). This function has changed in Fx 25 (bug 896896):
https://hg.mozilla.org/mozilla-central/rev/4887f7d34df2
Comment 13•11 years ago
|
||
Correction: this function has been added in Fx 25.
Comment 14•11 years ago
|
||
If anyone wants to give it a try and compare startup times, here's a patch backing out bug 896896. I unfortunately don't have a Windows machine at hand until next week.
Reporter | ||
Comment 15•11 years ago
|
||
(In reply to Tim Taubert [:ttaubert] from comment #12)
> We profiled on Paolo's machine and got the following profile:
Did Paolo get the time to perform several rounds of profiling to check whether this was stable diagnosis?
> Interesting is that we seem to wait a lot (~4s) on system calls in
> WaitForMessage(). This function has changed in Fx 25 (bug 896896):
>
> https://hg.mozilla.org/mozilla-central/rev/4887f7d34df2
Nicholas, any chance that you could shed some light on this?
Flags: needinfo?(ncameron)
Comment 16•11 years ago
|
||
(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?") from comment #15)
> (In reply to Tim Taubert [:ttaubert] from comment #12)
> > We profiled on Paolo's machine and got the following profile:
>
> Did Paolo get the time to perform several rounds of profiling to check
> whether this was stable diagnosis?
>
> > Interesting is that we seem to wait a lot (~4s) on system calls in
> > WaitForMessage(). This function has changed in Fx 25 (bug 896896):
> >
> > https://hg.mozilla.org/mozilla-central/rev/4887f7d34df2
>
> Nicholas, any chance that you could shed some light on this?
AFAIK, the implementation of WaitForMessage should be no less efficient than the previous use of WaitMessage. But these are some deep, dark windows internals things (perf-wise) so I might be wrong. We've been backing that change out of beta and release because it causes issues for some users. But as far as the docs go, it is the right approach. And it does prevent a real bug which appears when we use OMTC.
The time spent on system calls is probably just sleeping, waiting for a message. That should not be a problem, unless you think that is time that should be spent doing something.
Flags: needinfo?(ncameron)
Reporter | ||
Comment 17•11 years ago
|
||
(In reply to Nick Cameron [:nrc] from comment #16)
> The time spent on system calls is probably just sleeping, waiting for a
> message. That should not be a problem, unless you think that is time that
> should be spent doing something.
Having the main thread sleep a total of 4 seconds during startup seems a little surprising.
Comment 18•11 years ago
|
||
(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?") from comment #17)
> (In reply to Nick Cameron [:nrc] from comment #16)
> > The time spent on system calls is probably just sleeping, waiting for a
> > message. That should not be a problem, unless you think that is time that
> > should be spent doing something.
>
> Having the main thread sleep a total of 4 seconds during startup seems a
> little surprising.
Yeah, that seems bad. I am not sure if that is because of WaitForMultipleObjectsEx, or if there is some kind of *lock that would be there anyway which involves it. I am by no means an expert on this Windows API stuff.
Reporter | ||
Comment 19•11 years ago
|
||
We don't have the distribution of the regression across operating systems, but we suspect that it's Windows only. Testing under Windows seems to indicate a large regression (see comment 12) and small-to-trivial regressions on MacOS X and Linux that probably match bug 959598.
The attached patch lets us test without initializing or waiting for SessionWorker. Don't use this with your everyday profile.
Comment 20•11 years ago
|
||
Same interpretation as Comment[11]. This is for Windows XP.
p x.24.0$ x.25.0$ x.26.0$ x.25.0.1$
1: 0.00 0.000 0.000 0.000 0.000
2: 0.01 0.029 0.103 0.081 0.098
3: 0.05 0.054 0.197 0.149 0.184
4: 0.10 0.078 0.296 0.211 0.264
5: 0.20 0.127 0.492 0.334 0.418
6: 0.30 0.197 0.719 0.476 0.593
7: 0.40 0.300 1.006 0.652 0.806
8: 0.50 0.451 1.406 0.892 1.090
9: 0.60 0.675 1.987 1.256 1.497
10: 0.70 1.057 2.930 1.871 2.153
11: 0.80 1.856 4.573 3.144 3.374
12: 0.90 4.184 8.702 6.880 6.529
13: 0.95 8.440 15.623 14.526 11.933
14: 0.99 34.828 56.922 78.442 48.304
15: 1.00 918350.860 94209.719 62206.390 955376.719
Comment 21•11 years ago
|
||
Looking at those numbers it seems very likely that bug 896896 is the culprit. Probably in combination with SessionStore switching to its own I/O worker.
Fx 24 was good, Fx 25 was bad. Fx 25.0.1 was slightly better again because we backed out the WaitForMessage() change [1].
As Nick said this was backed out from Fx 26 as well but the numbers are still bad probably because we have all the Nightly, Beta, and Aurora usage data - which we don't have for 25.0.1.
[1] https://hg.mozilla.org/releases/mozilla-release/rev/33b48d5a8be3
Comment 22•11 years ago
|
||
(In reply to "Saptarshi Guha[:joy]" from comment #20)
> Created attachment 8361034 [details]
> Updated Numbers for WinXP
Joy, thank you for all the numbers. I'd be really interested if we see the same numbers on Windows 7. What about Linux or Mac? According to our current theory they shouldn't be at all or just slightly affected. I really hope it's not too much work getting that data :)
Flags: needinfo?(sguha)
Comment 23•11 years ago
|
||
(In reply to Tim Taubert [:ttaubert] from comment #22)
> (In reply to "Saptarshi Guha[:joy]" from comment #20)
> > Created attachment 8361034 [details]
> > Updated Numbers for WinXP
>
> Joy, thank you for all the numbers. I'd be really interested if we see the
> same numbers on Windows 7. What about Linux or Mac? According to our current
> theory they shouldn't be at all or just slightly affected. I really hope
> it's not too much work getting that data :)
Not a problem. Will get them and update the bug accordingly
Flags: needinfo?(sguha)
Comment 24•11 years ago
|
||
After some more investigation it looks like WaitForMessage() isn't to blame. Bug 959598 accounts for a huge delay on startup, on my XP VM it takes about 900ms to initialize the worker and read the session file.
Another problem are the sessionstore initialization changes. For small to medium sessions where we finish reading long before delayed-startup-finished is fired we wait 70-100ms until restoring the session into the initial window only because of all the stuff being called in delayedStartup(). We can fix this by calling SessionStore.initSession() back from where it was.
Reporter | ||
Comment 25•11 years ago
|
||
Here's a patch that restores using OS.File instead of SessionWorker to read the session.
Reporter | ||
Comment 26•11 years ago
|
||
Tim, could you test the performance with that patch?
Flags: needinfo?(ttaubert)
Comment 27•11 years ago
|
||
Your patch unfortunately doesn't apply at all but I just wrote my own version :)
OS.File with dedicated Worker: ~800ms
OS.File: : ~800ms (your patch)
NetUtil.asyncFetch() : ~ 70ms
Flags: needinfo?(ttaubert)
Comment 28•11 years ago
|
||
Results on my MBP are as follows:
It takes about ~220ms from calling |new ChromeWorker(worker.js)| to executing the first line of |worker.js|. Starting up workers seems really slow.
The other thing that isn't really great is that |importScripts("resource://gre/modules/osfile.jsm")| alone takes 60ms off the worker's "startup time" until it's ready to handle I/O requests.
Updated•11 years ago
|
Comment 29•11 years ago
|
||
(In reply to Tim Taubert [:ttaubert] from comment #28)
> It takes about ~220ms from calling |new ChromeWorker(worker.js)| to
> executing the first line of |worker.js|. Starting up workers seems really
> slow.
Just FTR, due to bug 960986 worker startup time is actually more like 600-700ms.
Assignee | ||
Comment 30•11 years ago
|
||
I have a patch to fix the initial session store read almost complete, which I'll post in the next day or two. It uses NetUtil.asyncFetch, but I'll also try with nsIDomFileReader and see what's faster. There will be a bit of a hack due to Bug 937220, but we should be able to remove it easily once that bug is fixed.
Depends on: 937220
Comment 31•11 years ago
|
||
(In reply to Steven MacLeod [:smacleod] from comment #30)
> It uses NetUtil.asyncFetch, but I'll also try with nsIDomFileReader
We should be careful with asyncFetch, since it can introduce main-thread I/O depending on the file stream being used. Feels like an all-native solution would be better still, I think we should prototype it and measure.
Reporter | ||
Comment 32•11 years ago
|
||
(In reply to :Gavin Sharp (email gavin@gavinsharp.com) from comment #31)
> (In reply to Steven MacLeod [:smacleod] from comment #30)
> > It uses NetUtil.asyncFetch, but I'll also try with nsIDomFileReader
>
> We should be careful with asyncFetch, since it can introduce main-thread I/O
> depending on the file stream being used. Feels like an all-native solution
> would be better still, I think we should prototype it and measure.
I have a native OS.File.read implementation prototype in bug 961665.
Updated•11 years ago
|
Blocks: fxdesktopbacklog
Whiteboard: [triage]
Assignee | ||
Comment 33•11 years ago
|
||
This patch switches the SessionStore read over to using the DOMFileReader. To avoid crashes due to worker startup at shutdown we also start the worker with a noop during startup.
Assignee: nobody → smacleod
Attachment #8368075 -
Flags: review?(ttaubert)
Attachment #8368075 -
Flags: review?(dteller)
Assignee | ||
Comment 34•11 years ago
|
||
Attachment #8368076 -
Flags: review?(dteller)
Assignee | ||
Comment 35•11 years ago
|
||
With both of the previous patches applied on my machine I see an improvement to the (Session Restored - First Paint) times.
There was a lot of noise in the measurements, but I pulled out what seemed to be a typical case for each. The timestamps appear in order (except for the INFO.* stamps which are printed after Session Restored):
LEGEND:
[CALLING READ] - Time immediately before SessionFile.read() call
[WORKER STARTING] - Time immediately before first message posted to worker (either read, or noop)
[AFTER NOOP CALL] - Time immediately after first message posted to worker (after the post() call)
[WORKER STARTED] - Time of first line of JS execution inside SessionWorker (right before osfile import)
[WORKER OSFILE IMPORTED] - Time immediately after osfile is imported in worker
[READ FINISHED] - Time of first line inside |_onSessionFileRead| in nsSessionStartup
Others are self explanatory
# BEFORE CHANGES:
## COLD:
1391108642855 [CALLING READ]
1391108642858 [WORKER STARTING]
1391108642859 [AFTER NOOP CALL]
1391108643048 [WORKER STARTED]
1391108643110 [WORKER OSFILE IMPORTED]
1391108643113 [READ FINISHED]
1391108642546 [INFO.PROCESS]
1391108643771 [INFO.FIRSTPAINT]
1391108643853 [INFO.SESSIONRESTORED]
1225 [TIME TO FIRST PAINT]
82 [SESSION RESTORED - FIRST PAINT]
1307 [TIME TO SESSION RESTORED]
## HOT:
1391108651116 [CALLING READ]
1391108651118 [WORKER STARTING]
1391108651119 [AFTER NOOP CALL]
1391108651313 [WORKER STARTED]
1391108651946 [WORKER OSFILE IMPORTED]
1391108651952 [READ FINISHED]
1391108650861 [INFO.PROCESS]
1391108651784 [INFO.FIRSTPAINT]
1391108652008 [INFO.SESSIONRESTORED]
923 [TIME TO FIRST PAINT]
224 [SESSION RESTORED - FIRST PAINT]
1147 [TIME TO SESSION RESTORED]
1391108658025 [CALLING READ]
1391108658026 [WORKER STARTING]
1391108658027 [AFTER NOOP CALL]
1391108658253 [WORKER STARTED]
1391108658845 [WORKER OSFILE IMPORTED]
1391108658847 [READ FINISHED]
1391108657739 [INFO.PROCESS]
1391108658699 [INFO.FIRSTPAINT]
1391108658892 [INFO.SESSIONRESTORED]
960 [TIME TO FIRST PAINT]
193 [SESSION RESTORED - FIRST PAINT]
1153 [TIME TO SESSION RESTORED]
# AFTER CHANGES:
## COLD:
1391109078865 [CALLING READ]
1391109078868 [WORKER STARTING]
1391109078869 [AFTER NOOP CALL]
1391109079058 [READ FINISHED]
1391109079064 [WORKER STARTED]
1391109079128 [WORKER OSFILE IMPORTED]
1391109078564 [INFO.PROCESS]
1391109079776 [INFO.FIRSTPAINT]
1391109079827 [INFO.SESSIONRESTORED]
1212 [TIME TO FIRST PAINT]
51 [SESSION RESTORED - FIRST PAINT]
1263 [TIME TO SESSION RESTORED]
## HOT:
1391108903131 [CALLING READ]
1391108903132 [WORKER STARTING]
1391108903132 [AFTER NOOP CALL]
1391108903325 [READ FINISHED]
1391108903329 [WORKER STARTED]
1391108902873 [INFO.PROCESS]
1391108903828 [INFO.FIRSTPAINT]
1391108903905 [INFO.SESSIONRESTORED]
955 [TIME TO FIRST PAINT]
77 [SESSION RESTORED - FIRST PAINT]
1032 [TIME TO SESSION RESTORED]
1391108904100 [WORKER OSFILE IMPORTED]
1391108908894 [CALLING READ]
1391108908896 [WORKER STARTING]
1391108908896 [AFTER NOOP CALL]
1391108909093 [READ FINISHED]
1391108909096 [WORKER STARTED]
1391108908650 [INFO.PROCESS]
1391108909542 [INFO.FIRSTPAINT]
1391108909630 [INFO.SESSIONRESTORED]
892 [TIME TO FIRST PAINT]
88 [SESSION RESTORED - FIRST PAINT]
980 [TIME TO SESSION RESTORED]
1391108909803 [WORKER OSFILE IMPORTED]
Comment 36•11 years ago
|
||
Comment on attachment 8368076 [details] [diff] [review]
Patch 2 - Stop using OS.File for startup read in CrashMonitor.jsm
Review of attachment 8368076 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/components/crashmonitor/CrashMonitor.jsm
@@ +95,5 @@
> *
> * @return {Promise} A promise that resolves/rejects once loading is complete
> */
> loadPreviousCheckpoints: function () {
> + let d = Promise.defer();
Nit: maybe call it "deferred"?
@@ +99,5 @@
> + let d = Promise.defer();
> + CrashMonitorInternal.previousCheckpoints = d.promise;
> +
> + let reader = Cc["@mozilla.org/files/filereader;1"].
> + createInstance(Ci.nsIDOMFileReader);
Nit: please indent by two spaces.
@@ +108,5 @@
> + }
> +
> + reader.onload = function () {
> + let notifications = null;
> + let decoder = new TextDecoder();
We could combine the listeners by using reader.addEventListener("loadend", ...). When that is called we just check reader.error?
@@ +114,2 @@
> try {
> + let contents = decoder.decode(Uint8Array(reader.result));
Can this throw? If not we should move it out of the try stmt.
@@ +114,3 @@
> try {
> + let contents = decoder.decode(Uint8Array(reader.result));
> + notifications = Object.freeze(JSON.parse(contents));
Same here for freeze(), we could do that below when calling resolve().
@@ +117,2 @@
> } catch (ex) {
> Cu.reportError("Error while loading crash monitor data: " + ex);
Nit: "... error while parsing ..."
@@ +126,5 @@
> +
> + try {
> + // Note that both the call to File(...) and readAsArrayBuffer(...) will
> + // cause synchronous disk access as they check for existence and
> + // retreieve file meta-data before the file is read asynchronously.
Nit: retrieve
@@ +127,5 @@
> + try {
> + // Note that both the call to File(...) and readAsArrayBuffer(...) will
> + // cause synchronous disk access as they check for existence and
> + // retreieve file meta-data before the file is read asynchronously.
> + let file = File(dsFile);
Move that out of the try stmt?
@@ +128,5 @@
> + // Note that both the call to File(...) and readAsArrayBuffer(...) will
> + // cause synchronous disk access as they check for existence and
> + // retreieve file meta-data before the file is read asynchronously.
> + let file = File(dsFile);
> + reader.readAsArrayBuffer(file);
How about using reader.readAsText(file, "utf-8")? We could get rid of the TextDecoder and Uint8Array usage then, no?
@@ +133,5 @@
> + } catch (ex if ex.result == Cr.NS_ERROR_FILE_NOT_FOUND) {
> + d.resolve(null);
> + } catch (ex) {
> + Cu.reportError("Error while loading crash monitor data: " + ex);
> + d.resolve(null);
Nit: maybe:
} catch (ex) {
if (ex.result == Cr.NS_ERROR_FILE_NOT_FOUND) {
Cu.reportError("Error while loading crash monitor data: " + ex);
}
d.resolve(null);
}
Attachment #8368076 -
Flags: feedback+
Comment 37•11 years ago
|
||
(In reply to Tim Taubert [:ttaubert] from comment #36)
> @@ +128,5 @@
> > + // Note that both the call to File(...) and readAsArrayBuffer(...) will
> > + // cause synchronous disk access as they check for existence and
> > + // retreieve file meta-data before the file is read asynchronously.
(nit: retrieve)
That's rather unfortunate :( IIRC Yoric suggested you file a bug (I don't see one on file). Perhaps we can fix it quickly?
Comment 38•11 years ago
|
||
Comment on attachment 8368075 [details] [diff] [review]
Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown
Review of attachment 8368075 [details] [diff] [review]:
-----------------------------------------------------------------
::: browser/components/sessionstore/src/SessionFile.jsm
@@ +152,5 @@
> + * string The name of the session file.
> + * @returns {promise}
> + */
> + _readSessionFile: function (filename) {
> + let d = Promise.defer();
Nit: deferred
@@ +161,5 @@
> + reader.onerror = function () {
> + d.reject(reader.error);
> + }
> +
> + reader.onload = function () {
addEventListener("loadend") and check reader.error?
@@ +181,5 @@
> + // Note that both the call to File(...) and readAsArrayBuffer(...) will
> + // cause synchronous disk access as they check for existence and
> + // retreieve file meta-data before the file is read asynchronously.
> + let file = File(dsFile);
> + reader.readAsArrayBuffer(file);
readAsText()?
@@ +184,5 @@
> + let file = File(dsFile);
> + reader.readAsArrayBuffer(file);
> + } catch (ex) {
> + d.reject(ex);
> + }
Do we need to reject here? Wouldn't we just continue with the next file as the exception is caught in read()?
Attachment #8368075 -
Flags: review?(ttaubert) → feedback+
Reporter | ||
Updated•11 years ago
|
Severity: normal → major
Reporter | ||
Comment 39•11 years ago
|
||
Comment on attachment 8368075 [details] [diff] [review]
Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown
Review of attachment 8368075 [details] [diff] [review]:
-----------------------------------------------------------------
::: browser/components/sessionstore/src/SessionFile.jsm
@@ +127,5 @@
> */
> _isClosed: false,
>
> read: function () {
> + SessionWorker.post("noop");
Please add a comment explaining why you do this.
Also, couldn't we do this in app-startup in nsSessionStartup.js?
@@ +179,5 @@
> +
> + try {
> + // Note that both the call to File(...) and readAsArrayBuffer(...) will
> + // cause synchronous disk access as they check for existence and
> + // retreieve file meta-data before the file is read asynchronously.
Mention the bug number.
Also, "retrieve."
@@ +181,5 @@
> + // Note that both the call to File(...) and readAsArrayBuffer(...) will
> + // cause synchronous disk access as they check for existence and
> + // retreieve file meta-data before the file is read asynchronously.
> + let file = File(dsFile);
> + reader.readAsArrayBuffer(file);
readAsText() would simplify the code, and might possibly be faster.
::: browser/components/sessionstore/src/SessionWorker.js
@@ +71,3 @@
> */
> + noop: function () {
> + return {result: true};
Let's call it |init|.
Attachment #8368075 -
Flags: review?(dteller) → feedback+
Comment 40•11 years ago
|
||
FTR, testing both patches on my machine yielded good but not great results. They took as long as reading using the ChromeWorker (200ms) but that was after all an almost constant time and didn't increase to 800ms with a filled scache anymore.
Trying NetUtil.asyncFetch() I saw 140ms read times. We had discarded that idea earlier because it seemed hard to get the encoding right but I think we found code we can steal from Metro. Steven will give that a try.
Reporter | ||
Comment 41•11 years ago
|
||
Comment on attachment 8368076 [details] [diff] [review]
Patch 2 - Stop using OS.File for startup read in CrashMonitor.jsm
Review of attachment 8368076 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/components/crashmonitor/CrashMonitor.jsm
@@ +121,5 @@
> + d.resolve(notifications);
> + }
> +
> + let dsFile = Services.dirsvc.get("ProfD", Ci.nsIFile);
> + dsFile.append("sessionCheckpoints.json");
I'd prefer if you created the nsIFile and initialized it from CrashMonitorInternal.path.
@@ +128,5 @@
> + // Note that both the call to File(...) and readAsArrayBuffer(...) will
> + // cause synchronous disk access as they check for existence and
> + // retreieve file meta-data before the file is read asynchronously.
> + let file = File(dsFile);
> + reader.readAsArrayBuffer(file);
As in the other patch, readAsText would probably be better.
@@ +133,5 @@
> + } catch (ex if ex.result == Cr.NS_ERROR_FILE_NOT_FOUND) {
> + d.resolve(null);
> + } catch (ex) {
> + Cu.reportError("Error while loading crash monitor data: " + ex);
> + d.resolve(null);
As Tim, but with a != instead of a ==.
Attachment #8368076 -
Flags: review?(dteller) → feedback+
Assignee | ||
Comment 42•11 years ago
|
||
I left the noop worker message (now "init") inside sessionfile to keep things as simple as possible. Patch now uses NetUtil.
Attachment #8368075 -
Attachment is obsolete: true
Attachment #8368946 -
Flags: review?(ttaubert)
Assignee | ||
Comment 43•11 years ago
|
||
Patch now uses NetUtil.
Attachment #8368076 -
Attachment is obsolete: true
Attachment #8368949 -
Flags: review?(ttaubert)
Assignee | ||
Comment 44•11 years ago
|
||
Comment 45•11 years ago
|
||
Comment on attachment 8368946 [details] [diff] [review]
Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown v2
Review of attachment 8368946 [details] [diff] [review]:
-----------------------------------------------------------------
Thanks!
::: browser/components/sessionstore/src/SessionFile.jsm
@@ +129,5 @@
>
> read: function () {
> + // We must initialize the worker during startup so it will be ready to
> + // perform the final write. If shutdown happens soon after startup and
> + // the worker has not started yet it will cause lockups.
Nit: I hope the remaining lockup will be fixed soon. The actual reason we need to do it here is possible dataloss.
@@ +133,5 @@
> + // the worker has not started yet it will cause lockups.
> + // See Bug 964531.
> + SessionWorker.post("init");
> +
> + return Task.spawn(function () {
Nit: function* () { [if that works with .bind(this), if not just ignore me]
Attachment #8368946 -
Flags: review?(ttaubert) → review+
Comment 46•11 years ago
|
||
Comment on attachment 8368946 [details] [diff] [review]
Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown v2
Review of attachment 8368946 [details] [diff] [review]:
-----------------------------------------------------------------
::: browser/components/sessionstore/src/SessionFile.jsm
@@ +139,5 @@
> + for (let filename of [this.path, this.backupPath]) {
> + try {
> + data = yield this._readSessionFile(filename);
> + break;
> + } catch (ex if ex.result == Cr.NS_ERROR_FILE_NOT_FOUND) {
Wait, did you try this? If you reject(status) below what's the exception that then arrives here? Does rejecting a promise in Task.jsm cause an exception? I never tried this. Would be great to maybe remove all the files in your profile dir and check that it works as expected.
Comment 47•11 years ago
|
||
Comment on attachment 8368949 [details] [diff] [review]
Patch 2 - Stop using OS.File for startup read in CrashMonitor.jsm v2
Review of attachment 8368949 [details] [diff] [review]:
-----------------------------------------------------------------
LGTM
::: toolkit/components/crashmonitor/CrashMonitor.jsm
@@ +102,5 @@
> + let file = FileUtils.File(CrashMonitorInternal.path);
> + NetUtil.asyncFetch(file, function(inputStream, status) {
> + if (!Components.isSuccessCode(status)) {
> + if (status != Cr.NS_ERROR_FILE_NOT_FOUND) {
> + Cu.reportError("Error while loading crash monitor data: " + ex);
"ex" isn't defined here.
Attachment #8368949 -
Flags: review?(ttaubert) → review+
Assignee | ||
Comment 48•11 years ago
|
||
Attachment #8368946 -
Attachment is obsolete: true
Attachment #8369390 -
Flags: review?(ttaubert)
Assignee | ||
Comment 49•11 years ago
|
||
Attachment #8368949 -
Attachment is obsolete: true
Attachment #8369392 -
Flags: review?(ttaubert)
Comment 50•11 years ago
|
||
Comment on attachment 8369390 [details] [diff] [review]
Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown v3
Review of attachment 8369390 [details] [diff] [review]:
-----------------------------------------------------------------
LGTM!
Attachment #8369390 -
Flags: review?(ttaubert) → review+
Comment 51•11 years ago
|
||
Comment on attachment 8369392 [details] [diff] [review]
Patch 2 - Stop using OS.File for startup read in CrashMonitor.jsm v3
Review of attachment 8369392 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/components/crashmonitor/CrashMonitor.jsm
@@ +119,2 @@
> } catch (ex) {
> Cu.reportError("Error while loading crash monitor data: " + ex);
Nit: Error while parsing...
Attachment #8369392 -
Flags: review?(ttaubert) → review+
Comment 52•11 years ago
|
||
Updated•11 years ago
|
No longer blocks: fxdesktopbacklog
Whiteboard: [triage]
Updated•11 years ago
|
status-firefox27:
--- → wontfix
status-firefox28:
--- → affected
status-firefox29:
--- → affected
tracking-firefox28:
--- → ?
tracking-firefox29:
--- → ?
Comment 53•11 years ago
|
||
I assume this won't make Fx 29 but let's see.
status-firefox30:
--- → fixed
Assignee | ||
Comment 54•11 years ago
|
||
Comment on attachment 8369390 [details] [diff] [review]
Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown v3
[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 959130 (This bug)
User impact if declined: Startup perf regression documented in this bug
Testing completed (on m-c, etc.): Tested locally, pushed to try, landing [1]
Risk to taking this patch (and alternatives if risky): Shouldn't be very risky, just a small change to read the file differently outside of the worker.
String or IDL/UUID changes made by this patch: None
[1] https://hg.mozilla.org/integration/fx-team/rev/13d1ff429b44
Attachment #8369390 -
Flags: approval-mozilla-aurora?
Comment 55•11 years ago
|
||
Since it hasn't landed to central yet, we might have to uplift this to Aurora/Beta after the merge - tracking so we make sure it does.
Comment 56•11 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/13d1ff429b44
https://hg.mozilla.org/mozilla-central/rev/75303a3ddc0c
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 29
Comment 57•11 years ago
|
||
It looks like there has been a 27% talos regression on "Fx-Team-Non-PGO - Tp5 No Network Row Major MozAfterPaint (Non-Main Startup File IO Bytes)".
https://groups.google.com/forum/#!topic/mozilla.dev.tree-management/StAkmt81_ec
http://mzl.la/1fqYBZ9
Tp5 is a page loader test but the "Startup File IO Bytes" hints to it measuring I/O going through file channels at startup. We previously omitted those channels by using OS.File so it seems reasonable that there is of course more I/O reported now.
I think it is safe to ignore but wanted to write it down in case someone thinks otherwise.
Joel, can you confirm that this test regressed only because of more bytes going through the file channels at startup? Other Tp5 tests would have regressed as well if this would really impact page load time, correct?
Flags: needinfo?(jmaher)
Updated•11 years ago
|
Flags: needinfo?(jmaher)
Whiteboard: [talos_regression]
Comment 58•11 years ago
|
||
this test is xperf specfically- we measure bytes on startup and during loading pages and report the total bytes from those accessed files. When these numbers change it is usually related to loading DLL's that we didn't load previously (sometimes things are delay loaded) or related to some new feature that adds/requires libraries which were not there before.
Reporter | ||
Comment 59•11 years ago
|
||
If this is a xperf test, this means that OS.File was not hiding byte usage.
Reporter | ||
Comment 60•11 years ago
|
||
I strongly doubt it's related to this bug.
At the moment, none of the Talos tests has a sessionstore.js, so I can't imagine us causing 125kb+ of I/O during startup. Barring any mistake, all the code of asyncFetch resides in libxul and libc/win32.dll or equivalent, so us causing a dll load sounds unlikely, too.
Comment 61•11 years ago
|
||
Ah, I didn't think about those startup profiles not having a sessionstore.js file. It's probably completely unrelated then and safe to ignore.
Updated•11 years ago
|
Attachment #8369390 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Updated•11 years ago
|
Assignee | ||
Comment 63•11 years ago
|
||
Comment on attachment 8369390 [details] [diff] [review]
Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown v3
[Approval Request Comment]
Bug caused by (feature/regressing bug #): Bug 959130 (This bug)
User impact if declined: Startup perf regression documented in this bug
Testing completed (on m-c, etc.): On m-c and aurora currently.
Risk to taking this patch (and alternatives if risky): Introduces a small talos regression (Bug 970043) which we believe is worth the much larger perf win introduced. Otherwise pretty low risk.
String or IDL/UUID changes made by this patch: None
Attachment #8369390 -
Flags: approval-mozilla-beta?
Flags: needinfo?(smacleod)
Updated•11 years ago
|
Attachment #8369390 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 64•11 years ago
|
||
Steven, can you please provide a patch that applies on Beta?
Flags: needinfo?(smacleod)
Keywords: branch-patch-needed
Assignee | ||
Comment 65•11 years ago
|
||
Flags: needinfo?(smacleod)
Assignee | ||
Comment 66•11 years ago
|
||
Patch for beta uplift is now attached.
Keywords: branch-patch-needed → checkin-needed
Comment 67•11 years ago
|
||
Keywords: checkin-needed
Comment 68•11 years ago
|
||
Deprioritizing for QA testing before we release. If there's any chance this could cause regressions in Session Restore functionality then please flag for testing by removing [qa-] and adding the verifyme keyword.
Whiteboard: [talos_regression] → [talos_regression][qa-]
Comment 69•11 years ago
|
||
Updated•11 years ago
|
status-b2g-v1.3:
--- → fixed
Updated•3 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•