959130 - don't use a ChromeWorker to read the session file at startup

David Teller [:Yoric] - still alive but not very active

Reporter

Description

•

11 years ago

Graph http://screencast.com/t/cafAln40oXU, collected by joy, indicates a regression in the time between firstpaint and sessionrestored (this is measure d2 in the graph) in Firefox 25. We need to investigate this. Note that this may be Session Restore, Add-on Manager or some other stuff that takes place between firstpaint and sessionrestored.

Tim Taubert [:ttaubert] (inactive)

Updated

•

11 years ago

Summary: Investigate Firefox 25 regression on sessiontrestored → Investigate Firefox 25 regression on sessionrestored

Tim Taubert [:ttaubert] (inactive)

Comment 1

•

11 years ago

The list of sessionstore changes in Fx 25: https://bugzilla.mozilla.org/buglist.cgi?resolution=FIXED&query_format=advanced&bug_status=RESOLVED&component=Session%20Restore&target_milestone=Firefox%2025&list_id=9177857

Tim Taubert [:ttaubert] (inactive)

Comment 2

•

11 years ago

My current assumption is that the sessionrestored notification fires a tad later due to moving I/O to a ChromeWorker. Reading and creating backup copies might be de-prioritized and thus leading to us firing notifications later.

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 3

•

11 years ago

For add-on manager, this looks shorter: https://bugzilla.mozilla.org/buglist.cgi?list_id=9177896&resolution=FIXED&query_format=advanced&bug_status=RESOLVED&component=Add-On&component=Add-on&component=Add-on%20Manager&component=Add-on%20Security&component=Add-on%20Validation&component=Add-ons&component=Add-ons%20%26%20Plugins&component=Add-ons%20Manager&target_milestone=Firefox%2025

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 4

•

11 years ago

(In reply to Tim Taubert [:ttaubert] from comment #2) > My current assumption is that the sessionrestored notification fires a tad > later due to moving I/O to a ChromeWorker. Reading and creating backup > copies might be de-prioritized and thus leading to us firing notifications > later. Also, there is a cost associated to launching the worker.

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 5

•

11 years ago

Joy, before we can proceed, we need to know the duration of this regression.

Flags: needinfo?(sguha)

Tim Taubert [:ttaubert] (inactive)

Updated

•

11 years ago

Depends on: 959598

"Saptarshi Guha[:joy]"

Comment 6

•

11 years ago

From 1% sample, considering all the release US profiles For each profile compute, the mean of sessionRestored - firstpaint across sessions per week for a given version. Versions are in the set {24, 25.0, 25.0.1, 26.0} This is the geometric mean of d2 across all profiles for a version across weeks d2 = mean (sessionRestored - firstpaint in millisecond) ver d2 = mean (sessionRestored - firstpaint in millisecond) 1: 24.0$ 245.1465 2: 25.0$ 897.2273 3: 25.0.1$ 686.6769 4: 26.0$ 747.9843 Hope this helps Cheers Saptarshi

Flags: needinfo?(sguha)

:Gijs (he/him)

Comment 7

•

11 years ago

(In reply to "Saptarshi Guha[:joy]" from comment #6) > From 1% sample, considering all the release US profiles > > For each profile compute, the mean of sessionRestored - firstpaint across > sessions per week for a given version. > Versions are in the set {24, 25.0, 25.0.1, 26.0} > > This is the geometric mean of d2 across all profiles for a version across > weeks > > d2 = mean (sessionRestored - firstpaint in millisecond) > > ver d2 = mean (sessionRestored - firstpaint in millisecond) > 1: 24.0$ 245.1465 > 2: 25.0$ 897.2273 > 3: 25.0.1$ 686.6769 > 4: 26.0$ 747.9843 > > Hope this helps > Cheers > Saptarshi If this is the geometric mean, can we get some measure of standard deviation or such as well? Some of us discussed earlier today that it might be that we caused an issue for a limited number of people and that's manipulating the average. Diagnosing that as opposed to a 'general' slowdown would help.

Flags: needinfo?(sguha)

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 8

•

11 years ago

(In reply to "Saptarshi Guha[:joy]" from comment #6) > From 1% sample, considering all the release US profiles > > For each profile compute, the mean of sessionRestored - firstpaint across > sessions per week for a given version. > Versions are in the set {24, 25.0, 25.0.1, 26.0} > > This is the geometric mean of d2 across all profiles for a version across > weeks > > d2 = mean (sessionRestored - firstpaint in millisecond) > > ver d2 = mean (sessionRestored - firstpaint in millisecond) > 1: 24.0$ 245.1465 > 2: 25.0$ 897.2273 > 3: 25.0.1$ 686.6769 > 4: 26.0$ 747.9843 Is that in milliseconds? Looking at the patch, I believe that we have added the following steps: - launching the worker; - writeLoadStateOnceAfterStartup (in case of sync fallback).

Steven MacLeod [:smacleod]

Assignee

Comment 9

•

11 years ago

Tim and I have have an alternative hypothesis we're looking into. Bug 898308 is also present in the range, which moves the session store initialization for the first window to a little later in the delayed startup process. Before the change we would initialize in the _delayedStartup() function, and after we would wait until "delayed-startup-finished" was fired. This moves a significant ammount of code to before initialization. Tim took some initial measurements and on his rMBP this accounts for ~34ms, and ~70ms on linux. On a slower machine this could be even larger. There is also some Windows specific initialization that occurs in between, which we're trying to get measurements on. (In reply to "Saptarshi Guha[:joy]" from comment #6) > From 1% sample, considering all the release US profiles > > For each profile compute, the mean of sessionRestored - firstpaint across > sessions per week for a given version. > Versions are in the set {24, 25.0, 25.0.1, 26.0} > > This is the geometric mean of d2 across all profiles for a version across > weeks > > d2 = mean (sessionRestored - firstpaint in millisecond) > > ver d2 = mean (sessionRestored - firstpaint in millisecond) > 1: 24.0$ 245.1465 > 2: 25.0$ 897.2273 > 3: 25.0.1$ 686.6769 > 4: 26.0$ 747.9843 > > Hope this helps > Cheers > Saptarshi joy, a breakdown by Operating system could be useful, would you be able to provide this?

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 10

•

11 years ago

(In reply to Steven MacLeod [:smacleod] from comment #9) > Tim took some initial measurements and on his rMBP this accounts for ~34ms, > and ~70ms on linux. On a slower machine this could be even larger. The regression seems to be one order of magnitude larger. I seem to remember measuring that launching a worker could explain 300ms on my machine.

"Saptarshi Guha[:joy]"

Comment 11

•

11 years ago

Attached file foo.pdf (deleted) — Details

PDF D := sessionrestored - firstpaint page 1: Box Plot of D in seconds, x axis is a log scale To read the following: points above the grey line (y==x) implies that quantiles of the variable on the Y axis > quantiles of variable on X axis page 2: QQ Plot of log(D) for v25 vs log(D) for v24 page 3: QQ Plot of log(D) for v25.0.1 vs log(D) for v24 page 4: QQ Plot of log(D) for v26 vs log(D) for v24 Percentiles (seconds) p v24.0$ v25.0$ v26.0$ v25.0.1$ 1: 0.00 0.000 0.000 0.000 0.000 2: 0.01 0.019 0.059 0.054 0.062 3: 0.05 0.030 0.115 0.095 0.112 4: 0.10 0.041 0.163 0.128 0.153 5: 0.20 0.063 0.261 0.185 0.229 6: 0.30 0.090 0.384 0.251 0.317 7: 0.40 0.129 0.553 0.335 0.431 8: 0.50 0.186 0.792 0.454 0.587 9: 0.60 0.280 1.156 0.635 0.815 10: 0.70 0.453 1.759 0.951 1.188 11: 0.80 0.842 2.884 1.636 1.904 12: 0.90 2.139 5.622 3.821 3.805 13: 0.95 4.544 10.016 8.259 6.952 14: 0.99 19.209 34.244 43.900 27.318 15: 1.00 918350.860 760061.238 80008.623 1245991.998

Flags: needinfo?(sguha)

Tim Taubert [:ttaubert] (inactive)

Comment 12

•

11 years ago

We profiled on Paolo's machine and got the following profile: https://people.mozilla.org/~bgirard/cleopatra/#report=01fc530bf31ad36ba35adbf295aa690989ccb8aa Interesting is that we seem to wait a lot (~4s) on system calls in WaitForMessage(). This function has changed in Fx 25 (bug 896896): https://hg.mozilla.org/mozilla-central/rev/4887f7d34df2

Tim Taubert [:ttaubert] (inactive)

Comment 13

•

11 years ago

Correction: this function has been added in Fx 25.

Tim Taubert [:ttaubert] (inactive)

Comment 14

•

11 years ago

Attached patch 0001-Bug-959130-Back-out-cset-4887f7d34df2-bug-896896.patch (deleted) — Details — Splinter Review

If anyone wants to give it a try and compare startup times, here's a patch backing out bug 896896. I unfortunately don't have a Windows machine at hand until next week.

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 15

•

11 years ago

(In reply to Tim Taubert [:ttaubert] from comment #12) > We profiled on Paolo's machine and got the following profile: Did Paolo get the time to perform several rounds of profiling to check whether this was stable diagnosis? > Interesting is that we seem to wait a lot (~4s) on system calls in > WaitForMessage(). This function has changed in Fx 25 (bug 896896): > > https://hg.mozilla.org/mozilla-central/rev/4887f7d34df2 Nicholas, any chance that you could shed some light on this?

Flags: needinfo?(ncameron)

Nick Cameron [:nrc]

Comment 16

•

11 years ago

(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?") from comment #15) > (In reply to Tim Taubert [:ttaubert] from comment #12) > > We profiled on Paolo's machine and got the following profile: > > Did Paolo get the time to perform several rounds of profiling to check > whether this was stable diagnosis? > > > Interesting is that we seem to wait a lot (~4s) on system calls in > > WaitForMessage(). This function has changed in Fx 25 (bug 896896): > > > > https://hg.mozilla.org/mozilla-central/rev/4887f7d34df2 > > Nicholas, any chance that you could shed some light on this? AFAIK, the implementation of WaitForMessage should be no less efficient than the previous use of WaitMessage. But these are some deep, dark windows internals things (perf-wise) so I might be wrong. We've been backing that change out of beta and release because it causes issues for some users. But as far as the docs go, it is the right approach. And it does prevent a real bug which appears when we use OMTC. The time spent on system calls is probably just sleeping, waiting for a message. That should not be a problem, unless you think that is time that should be spent doing something.

Flags: needinfo?(ncameron)

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 17

•

11 years ago

(In reply to Nick Cameron [:nrc] from comment #16) > The time spent on system calls is probably just sleeping, waiting for a > message. That should not be a problem, unless you think that is time that > should be spent doing something. Having the main thread sleep a total of 4 seconds during startup seems a little surprising.

Nick Cameron [:nrc]

Comment 18

•

11 years ago

(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?") from comment #17) > (In reply to Nick Cameron [:nrc] from comment #16) > > The time spent on system calls is probably just sleeping, waiting for a > > message. That should not be a problem, unless you think that is time that > > should be spent doing something. > > Having the main thread sleep a total of 4 seconds during startup seems a > little surprising. Yeah, that seems bad. I am not sure if that is because of WaitForMultipleObjectsEx, or if there is some kind of *lock that would be there anyway which involves it. I am by no means an expert on this Windows API stuff.

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 19

•

11 years ago

Attached patch Removing the worker from the startup - DO NOT LAND (deleted) — Details — Splinter Review

We don't have the distribution of the regression across operating systems, but we suspect that it's Windows only. Testing under Windows seems to indicate a large regression (see comment 12) and small-to-trivial regressions on MacOS X and Linux that probably match bug 959598. The attached patch lets us test without initializing or waiting for SessionWorker. Don't use this with your everyday profile.

"Saptarshi Guha[:joy]"

Comment 20

•

11 years ago

Attached file Updated Numbers for WinXP (deleted) — Details

Same interpretation as Comment[11]. This is for Windows XP. p x.24.0$ x.25.0$ x.26.0$ x.25.0.1$ 1: 0.00 0.000 0.000 0.000 0.000 2: 0.01 0.029 0.103 0.081 0.098 3: 0.05 0.054 0.197 0.149 0.184 4: 0.10 0.078 0.296 0.211 0.264 5: 0.20 0.127 0.492 0.334 0.418 6: 0.30 0.197 0.719 0.476 0.593 7: 0.40 0.300 1.006 0.652 0.806 8: 0.50 0.451 1.406 0.892 1.090 9: 0.60 0.675 1.987 1.256 1.497 10: 0.70 1.057 2.930 1.871 2.153 11: 0.80 1.856 4.573 3.144 3.374 12: 0.90 4.184 8.702 6.880 6.529 13: 0.95 8.440 15.623 14.526 11.933 14: 0.99 34.828 56.922 78.442 48.304 15: 1.00 918350.860 94209.719 62206.390 955376.719

Tim Taubert [:ttaubert] (inactive)

Comment 21

•

11 years ago

Looking at those numbers it seems very likely that bug 896896 is the culprit. Probably in combination with SessionStore switching to its own I/O worker. Fx 24 was good, Fx 25 was bad. Fx 25.0.1 was slightly better again because we backed out the WaitForMessage() change [1]. As Nick said this was backed out from Fx 26 as well but the numbers are still bad probably because we have all the Nightly, Beta, and Aurora usage data - which we don't have for 25.0.1. [1] https://hg.mozilla.org/releases/mozilla-release/rev/33b48d5a8be3

Tim Taubert [:ttaubert] (inactive)

Comment 22

•

11 years ago

(In reply to "Saptarshi Guha[:joy]" from comment #20) > Created attachment 8361034 [details] > Updated Numbers for WinXP Joy, thank you for all the numbers. I'd be really interested if we see the same numbers on Windows 7. What about Linux or Mac? According to our current theory they shouldn't be at all or just slightly affected. I really hope it's not too much work getting that data :)

Flags: needinfo?(sguha)

"Saptarshi Guha[:joy]"

Comment 23

•

11 years ago

(In reply to Tim Taubert [:ttaubert] from comment #22) > (In reply to "Saptarshi Guha[:joy]" from comment #20) > > Created attachment 8361034 [details] > > Updated Numbers for WinXP > > Joy, thank you for all the numbers. I'd be really interested if we see the > same numbers on Windows 7. What about Linux or Mac? According to our current > theory they shouldn't be at all or just slightly affected. I really hope > it's not too much work getting that data :) Not a problem. Will get them and update the bug accordingly

Flags: needinfo?(sguha)

Tim Taubert [:ttaubert] (inactive)

Comment 24

•

11 years ago

After some more investigation it looks like WaitForMessage() isn't to blame. Bug 959598 accounts for a huge delay on startup, on my XP VM it takes about 900ms to initialize the worker and read the session file. Another problem are the sessionstore initialization changes. For small to medium sessions where we finish reading long before delayed-startup-finished is fired we wait 70-100ms until restoring the session into the initial window only because of all the stuff being called in delayedStartup(). We can fix this by calling SessionStore.initSession() back from where it was.

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 25

•

11 years ago

Attached patch Using OS.File to read the session (deleted) — Details — Splinter Review

Here's a patch that restores using OS.File instead of SessionWorker to read the session.

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 26

•

11 years ago

Tim, could you test the performance with that patch?

Flags: needinfo?(ttaubert)

Tim Taubert [:ttaubert] (inactive)

Comment 27

•

11 years ago

Your patch unfortunately doesn't apply at all but I just wrote my own version :) OS.File with dedicated Worker: ~800ms OS.File: : ~800ms (your patch) NetUtil.asyncFetch() : ~ 70ms

Flags: needinfo?(ttaubert)

Tim Taubert [:ttaubert] (inactive)

Comment 28

•

11 years ago

Results on my MBP are as follows: It takes about ~220ms from calling |new ChromeWorker(worker.js)| to executing the first line of |worker.js|. Starting up workers seems really slow. The other thing that isn't really great is that |importScripts("resource://gre/modules/osfile.jsm")| alone takes 60ms off the worker's "startup time" until it's ready to handle I/O requests.

Tim Taubert [:ttaubert] (inactive)

Updated

•

11 years ago

Blocks: 898308, 891360

Depends on: 960952, 960986

Keywords: regressionwindow-wanted

Tim Taubert [:ttaubert] (inactive)

Comment 29

•

11 years ago

(In reply to Tim Taubert [:ttaubert] from comment #28) > It takes about ~220ms from calling |new ChromeWorker(worker.js)| to > executing the first line of |worker.js|. Starting up workers seems really > slow. Just FTR, due to bug 960986 worker startup time is actually more like 600-700ms.

Steven MacLeod [:smacleod]

Assignee

Comment 30

•

11 years ago

I have a patch to fix the initial session store read almost complete, which I'll post in the next day or two. It uses NetUtil.asyncFetch, but I'll also try with nsIDomFileReader and see what's faster. There will be a bit of a hack due to Bug 937220, but we should be able to remove it easily once that bug is fixed.

Depends on: 937220

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 31

•

11 years ago

(In reply to Steven MacLeod [:smacleod] from comment #30) > It uses NetUtil.asyncFetch, but I'll also try with nsIDomFileReader We should be careful with asyncFetch, since it can introduce main-thread I/O depending on the file stream being used. Feels like an all-native solution would be better still, I think we should prototype it and measure.

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 32

•

11 years ago

(In reply to :Gavin Sharp (email gavin@gavinsharp.com) from comment #31) > (In reply to Steven MacLeod [:smacleod] from comment #30) > > It uses NetUtil.asyncFetch, but I'll also try with nsIDomFileReader > > We should be careful with asyncFetch, since it can introduce main-thread I/O > depending on the file stream being used. Feels like an all-native solution > would be better still, I think we should prototype it and measure. I have a native OS.File.read implementation prototype in bug 961665.

Tim Taubert [:ttaubert] (inactive)

Updated

•

11 years ago

Blocks: fxdesktopbacklog

Whiteboard: [triage]

Tim Taubert [:ttaubert] (inactive)

Updated

•

11 years ago

Depends on: 964531

Steven MacLeod [:smacleod]

Assignee

Comment 33

•

11 years ago

Attached patch Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown (obsolete) (deleted) — Details — Splinter Review

This patch switches the SessionStore read over to using the DOMFileReader. To avoid crashes due to worker startup at shutdown we also start the worker with a noop during startup.

Assignee: nobody → smacleod

Attachment #8368075 - Flags: review?(ttaubert)

Attachment #8368075 - Flags: review?(dteller)

Steven MacLeod [:smacleod]

Assignee

Comment 34

•

11 years ago

Attached patch Patch 2 - Stop using OS.File for startup read in CrashMonitor.jsm (obsolete) (deleted) — Details — Splinter Review

Attachment #8368076 - Flags: review?(dteller)

Steven MacLeod [:smacleod]

Assignee

Comment 35

•

11 years ago

With both of the previous patches applied on my machine I see an improvement to the (Session Restored - First Paint) times. There was a lot of noise in the measurements, but I pulled out what seemed to be a typical case for each. The timestamps appear in order (except for the INFO.* stamps which are printed after Session Restored): LEGEND: [CALLING READ] - Time immediately before SessionFile.read() call [WORKER STARTING] - Time immediately before first message posted to worker (either read, or noop) [AFTER NOOP CALL] - Time immediately after first message posted to worker (after the post() call) [WORKER STARTED] - Time of first line of JS execution inside SessionWorker (right before osfile import) [WORKER OSFILE IMPORTED] - Time immediately after osfile is imported in worker [READ FINISHED] - Time of first line inside |_onSessionFileRead| in nsSessionStartup Others are self explanatory # BEFORE CHANGES: ## COLD: 1391108642855 [CALLING READ] 1391108642858 [WORKER STARTING] 1391108642859 [AFTER NOOP CALL] 1391108643048 [WORKER STARTED] 1391108643110 [WORKER OSFILE IMPORTED] 1391108643113 [READ FINISHED] 1391108642546 [INFO.PROCESS] 1391108643771 [INFO.FIRSTPAINT] 1391108643853 [INFO.SESSIONRESTORED] 1225 [TIME TO FIRST PAINT] 82 [SESSION RESTORED - FIRST PAINT] 1307 [TIME TO SESSION RESTORED] ## HOT: 1391108651116 [CALLING READ] 1391108651118 [WORKER STARTING] 1391108651119 [AFTER NOOP CALL] 1391108651313 [WORKER STARTED] 1391108651946 [WORKER OSFILE IMPORTED] 1391108651952 [READ FINISHED] 1391108650861 [INFO.PROCESS] 1391108651784 [INFO.FIRSTPAINT] 1391108652008 [INFO.SESSIONRESTORED] 923 [TIME TO FIRST PAINT] 224 [SESSION RESTORED - FIRST PAINT] 1147 [TIME TO SESSION RESTORED] 1391108658025 [CALLING READ] 1391108658026 [WORKER STARTING] 1391108658027 [AFTER NOOP CALL] 1391108658253 [WORKER STARTED] 1391108658845 [WORKER OSFILE IMPORTED] 1391108658847 [READ FINISHED] 1391108657739 [INFO.PROCESS] 1391108658699 [INFO.FIRSTPAINT] 1391108658892 [INFO.SESSIONRESTORED] 960 [TIME TO FIRST PAINT] 193 [SESSION RESTORED - FIRST PAINT] 1153 [TIME TO SESSION RESTORED] # AFTER CHANGES: ## COLD: 1391109078865 [CALLING READ] 1391109078868 [WORKER STARTING] 1391109078869 [AFTER NOOP CALL] 1391109079058 [READ FINISHED] 1391109079064 [WORKER STARTED] 1391109079128 [WORKER OSFILE IMPORTED] 1391109078564 [INFO.PROCESS] 1391109079776 [INFO.FIRSTPAINT] 1391109079827 [INFO.SESSIONRESTORED] 1212 [TIME TO FIRST PAINT] 51 [SESSION RESTORED - FIRST PAINT] 1263 [TIME TO SESSION RESTORED] ## HOT: 1391108903131 [CALLING READ] 1391108903132 [WORKER STARTING] 1391108903132 [AFTER NOOP CALL] 1391108903325 [READ FINISHED] 1391108903329 [WORKER STARTED] 1391108902873 [INFO.PROCESS] 1391108903828 [INFO.FIRSTPAINT] 1391108903905 [INFO.SESSIONRESTORED] 955 [TIME TO FIRST PAINT] 77 [SESSION RESTORED - FIRST PAINT] 1032 [TIME TO SESSION RESTORED] 1391108904100 [WORKER OSFILE IMPORTED] 1391108908894 [CALLING READ] 1391108908896 [WORKER STARTING] 1391108908896 [AFTER NOOP CALL] 1391108909093 [READ FINISHED] 1391108909096 [WORKER STARTED] 1391108908650 [INFO.PROCESS] 1391108909542 [INFO.FIRSTPAINT] 1391108909630 [INFO.SESSIONRESTORED] 892 [TIME TO FIRST PAINT] 88 [SESSION RESTORED - FIRST PAINT] 980 [TIME TO SESSION RESTORED] 1391108909803 [WORKER OSFILE IMPORTED]

Tim Taubert [:ttaubert] (inactive)

Comment 36

•

11 years ago

Comment on attachment 8368076 [details] [diff] [review] Patch 2 - Stop using OS.File for startup read in CrashMonitor.jsm Review of attachment 8368076 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/crashmonitor/CrashMonitor.jsm @@ +95,5 @@ > * > * @return {Promise} A promise that resolves/rejects once loading is complete > */ > loadPreviousCheckpoints: function () { > + let d = Promise.defer(); Nit: maybe call it "deferred"? @@ +99,5 @@ > + let d = Promise.defer(); > + CrashMonitorInternal.previousCheckpoints = d.promise; > + > + let reader = Cc["@mozilla.org/files/filereader;1"]. > + createInstance(Ci.nsIDOMFileReader); Nit: please indent by two spaces. @@ +108,5 @@ > + } > + > + reader.onload = function () { > + let notifications = null; > + let decoder = new TextDecoder(); We could combine the listeners by using reader.addEventListener("loadend", ...). When that is called we just check reader.error? @@ +114,2 @@ > try { > + let contents = decoder.decode(Uint8Array(reader.result)); Can this throw? If not we should move it out of the try stmt. @@ +114,3 @@ > try { > + let contents = decoder.decode(Uint8Array(reader.result)); > + notifications = Object.freeze(JSON.parse(contents)); Same here for freeze(), we could do that below when calling resolve(). @@ +117,2 @@ > } catch (ex) { > Cu.reportError("Error while loading crash monitor data: " + ex); Nit: "... error while parsing ..." @@ +126,5 @@ > + > + try { > + // Note that both the call to File(...) and readAsArrayBuffer(...) will > + // cause synchronous disk access as they check for existence and > + // retreieve file meta-data before the file is read asynchronously. Nit: retrieve @@ +127,5 @@ > + try { > + // Note that both the call to File(...) and readAsArrayBuffer(...) will > + // cause synchronous disk access as they check for existence and > + // retreieve file meta-data before the file is read asynchronously. > + let file = File(dsFile); Move that out of the try stmt? @@ +128,5 @@ > + // Note that both the call to File(...) and readAsArrayBuffer(...) will > + // cause synchronous disk access as they check for existence and > + // retreieve file meta-data before the file is read asynchronously. > + let file = File(dsFile); > + reader.readAsArrayBuffer(file); How about using reader.readAsText(file, "utf-8")? We could get rid of the TextDecoder and Uint8Array usage then, no? @@ +133,5 @@ > + } catch (ex if ex.result == Cr.NS_ERROR_FILE_NOT_FOUND) { > + d.resolve(null); > + } catch (ex) { > + Cu.reportError("Error while loading crash monitor data: " + ex); > + d.resolve(null); Nit: maybe: } catch (ex) { if (ex.result == Cr.NS_ERROR_FILE_NOT_FOUND) { Cu.reportError("Error while loading crash monitor data: " + ex); } d.resolve(null); }

Attachment #8368076 - Flags: feedback+

:Gavin Sharp [email: gavin@gavinsharp.com]

Comment 37

•

11 years ago

(In reply to Tim Taubert [:ttaubert] from comment #36) > @@ +128,5 @@ > > + // Note that both the call to File(...) and readAsArrayBuffer(...) will > > + // cause synchronous disk access as they check for existence and > > + // retreieve file meta-data before the file is read asynchronously. (nit: retrieve) That's rather unfortunate :( IIRC Yoric suggested you file a bug (I don't see one on file). Perhaps we can fix it quickly?

Tim Taubert [:ttaubert] (inactive)

Comment 38

•

11 years ago

Comment on attachment 8368075 [details] [diff] [review] Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown Review of attachment 8368075 [details] [diff] [review]: ----------------------------------------------------------------- ::: browser/components/sessionstore/src/SessionFile.jsm @@ +152,5 @@ > + * string The name of the session file. > + * @returns {promise} > + */ > + _readSessionFile: function (filename) { > + let d = Promise.defer(); Nit: deferred @@ +161,5 @@ > + reader.onerror = function () { > + d.reject(reader.error); > + } > + > + reader.onload = function () { addEventListener("loadend") and check reader.error? @@ +181,5 @@ > + // Note that both the call to File(...) and readAsArrayBuffer(...) will > + // cause synchronous disk access as they check for existence and > + // retreieve file meta-data before the file is read asynchronously. > + let file = File(dsFile); > + reader.readAsArrayBuffer(file); readAsText()? @@ +184,5 @@ > + let file = File(dsFile); > + reader.readAsArrayBuffer(file); > + } catch (ex) { > + d.reject(ex); > + } Do we need to reject here? Wouldn't we just continue with the next file as the exception is caught in read()?

Attachment #8368075 - Flags: review?(ttaubert) → feedback+

David Teller [:Yoric] - still alive but not very active

Reporter

Updated

•

11 years ago

Severity: normal → major

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 39

•

11 years ago

Comment on attachment 8368075 [details] [diff] [review] Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown Review of attachment 8368075 [details] [diff] [review]: ----------------------------------------------------------------- ::: browser/components/sessionstore/src/SessionFile.jsm @@ +127,5 @@ > */ > _isClosed: false, > > read: function () { > + SessionWorker.post("noop"); Please add a comment explaining why you do this. Also, couldn't we do this in app-startup in nsSessionStartup.js? @@ +179,5 @@ > + > + try { > + // Note that both the call to File(...) and readAsArrayBuffer(...) will > + // cause synchronous disk access as they check for existence and > + // retreieve file meta-data before the file is read asynchronously. Mention the bug number. Also, "retrieve." @@ +181,5 @@ > + // Note that both the call to File(...) and readAsArrayBuffer(...) will > + // cause synchronous disk access as they check for existence and > + // retreieve file meta-data before the file is read asynchronously. > + let file = File(dsFile); > + reader.readAsArrayBuffer(file); readAsText() would simplify the code, and might possibly be faster. ::: browser/components/sessionstore/src/SessionWorker.js @@ +71,3 @@ > */ > + noop: function () { > + return {result: true}; Let's call it |init|.

Attachment #8368075 - Flags: review?(dteller) → feedback+

Tim Taubert [:ttaubert] (inactive)

Comment 40

•

11 years ago

FTR, testing both patches on my machine yielded good but not great results. They took as long as reading using the ChromeWorker (200ms) but that was after all an almost constant time and didn't increase to 800ms with a filled scache anymore. Trying NetUtil.asyncFetch() I saw 140ms read times. We had discarded that idea earlier because it seemed hard to get the encoding right but I think we found code we can steal from Metro. Steven will give that a try.

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 41

•

11 years ago

Comment on attachment 8368076 [details] [diff] [review] Patch 2 - Stop using OS.File for startup read in CrashMonitor.jsm Review of attachment 8368076 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/crashmonitor/CrashMonitor.jsm @@ +121,5 @@ > + d.resolve(notifications); > + } > + > + let dsFile = Services.dirsvc.get("ProfD", Ci.nsIFile); > + dsFile.append("sessionCheckpoints.json"); I'd prefer if you created the nsIFile and initialized it from CrashMonitorInternal.path. @@ +128,5 @@ > + // Note that both the call to File(...) and readAsArrayBuffer(...) will > + // cause synchronous disk access as they check for existence and > + // retreieve file meta-data before the file is read asynchronously. > + let file = File(dsFile); > + reader.readAsArrayBuffer(file); As in the other patch, readAsText would probably be better. @@ +133,5 @@ > + } catch (ex if ex.result == Cr.NS_ERROR_FILE_NOT_FOUND) { > + d.resolve(null); > + } catch (ex) { > + Cu.reportError("Error while loading crash monitor data: " + ex); > + d.resolve(null); As Tim, but with a != instead of a ==.

Attachment #8368076 - Flags: review?(dteller) → feedback+

Steven MacLeod [:smacleod]

Assignee

Comment 42

•

11 years ago

Attached patch Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown v2 (obsolete) (deleted) — Details — Splinter Review

I left the noop worker message (now "init") inside sessionfile to keep things as simple as possible. Patch now uses NetUtil.

Attachment #8368075 - Attachment is obsolete: true

Attachment #8368946 - Flags: review?(ttaubert)

Steven MacLeod [:smacleod]

Assignee

Comment 43

•

11 years ago

Attached patch Patch 2 - Stop using OS.File for startup read in CrashMonitor.jsm v2 (obsolete) (deleted) — Details — Splinter Review

Patch now uses NetUtil.

Attachment #8368076 - Attachment is obsolete: true

Attachment #8368949 - Flags: review?(ttaubert)

Steven MacLeod [:smacleod]

Assignee

Comment 44

•

11 years ago

Try: https://tbpl.mozilla.org/?tree=Try&rev=5ac8f184cb5b

Tim Taubert [:ttaubert] (inactive)

Comment 45

•

11 years ago

Comment on attachment 8368946 [details] [diff] [review] Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown v2 Review of attachment 8368946 [details] [diff] [review]: ----------------------------------------------------------------- Thanks! ::: browser/components/sessionstore/src/SessionFile.jsm @@ +129,5 @@ > > read: function () { > + // We must initialize the worker during startup so it will be ready to > + // perform the final write. If shutdown happens soon after startup and > + // the worker has not started yet it will cause lockups. Nit: I hope the remaining lockup will be fixed soon. The actual reason we need to do it here is possible dataloss. @@ +133,5 @@ > + // the worker has not started yet it will cause lockups. > + // See Bug 964531. > + SessionWorker.post("init"); > + > + return Task.spawn(function () { Nit: function* () { [if that works with .bind(this), if not just ignore me]

Attachment #8368946 - Flags: review?(ttaubert) → review+

Tim Taubert [:ttaubert] (inactive)

Comment 46

•

11 years ago

Comment on attachment 8368946 [details] [diff] [review] Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown v2 Review of attachment 8368946 [details] [diff] [review]: ----------------------------------------------------------------- ::: browser/components/sessionstore/src/SessionFile.jsm @@ +139,5 @@ > + for (let filename of [this.path, this.backupPath]) { > + try { > + data = yield this._readSessionFile(filename); > + break; > + } catch (ex if ex.result == Cr.NS_ERROR_FILE_NOT_FOUND) { Wait, did you try this? If you reject(status) below what's the exception that then arrives here? Does rejecting a promise in Task.jsm cause an exception? I never tried this. Would be great to maybe remove all the files in your profile dir and check that it works as expected.

Tim Taubert [:ttaubert] (inactive)

Comment 47

•

11 years ago

Comment on attachment 8368949 [details] [diff] [review] Patch 2 - Stop using OS.File for startup read in CrashMonitor.jsm v2 Review of attachment 8368949 [details] [diff] [review]: ----------------------------------------------------------------- LGTM ::: toolkit/components/crashmonitor/CrashMonitor.jsm @@ +102,5 @@ > + let file = FileUtils.File(CrashMonitorInternal.path); > + NetUtil.asyncFetch(file, function(inputStream, status) { > + if (!Components.isSuccessCode(status)) { > + if (status != Cr.NS_ERROR_FILE_NOT_FOUND) { > + Cu.reportError("Error while loading crash monitor data: " + ex); "ex" isn't defined here.

Attachment #8368949 - Flags: review?(ttaubert) → review+

Steven MacLeod [:smacleod]

Assignee

Comment 48

•

11 years ago

Attached patch Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown v3 (deleted) — Details — Splinter Review

Attachment #8368946 - Attachment is obsolete: true

Attachment #8369390 - Flags: review?(ttaubert)

Steven MacLeod [:smacleod]

Assignee

Comment 49

•

11 years ago

Attached patch Patch 2 - Stop using OS.File for startup read in CrashMonitor.jsm v3 (deleted) — Details — Splinter Review

Attachment #8368949 - Attachment is obsolete: true

Attachment #8369392 - Flags: review?(ttaubert)

Tim Taubert [:ttaubert] (inactive)

Comment 50

•

11 years ago

Comment on attachment 8369390 [details] [diff] [review] Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown v3 Review of attachment 8369390 [details] [diff] [review]: ----------------------------------------------------------------- LGTM!

Attachment #8369390 - Flags: review?(ttaubert) → review+

Tim Taubert [:ttaubert] (inactive)

Comment 51

•

11 years ago

Comment on attachment 8369392 [details] [diff] [review] Patch 2 - Stop using OS.File for startup read in CrashMonitor.jsm v3 Review of attachment 8369392 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/crashmonitor/CrashMonitor.jsm @@ +119,2 @@ > } catch (ex) { > Cu.reportError("Error while loading crash monitor data: " + ex); Nit: Error while parsing...

Attachment #8369392 - Flags: review?(ttaubert) → review+

Tim Taubert [:ttaubert] (inactive)

Comment 52

•

11 years ago

https://hg.mozilla.org/integration/fx-team/rev/13d1ff429b44 https://hg.mozilla.org/integration/fx-team/rev/75303a3ddc0c

Tim Taubert [:ttaubert] (inactive)

Updated

•

11 years ago

No longer blocks: fxdesktopbacklog

Whiteboard: [triage]

Tim Taubert [:ttaubert] (inactive)

Updated

•

11 years ago

status-firefox27: --- → wontfix

status-firefox28: --- → affected

status-firefox29: --- → affected

tracking-firefox28: --- → ?

tracking-firefox29: --- → ?

Tim Taubert [:ttaubert] (inactive)

Comment 53

•

11 years ago

I assume this won't make Fx 29 but let's see.

status-firefox30: --- → fixed

Steven MacLeod [:smacleod]

Assignee

Comment 54

•

11 years ago

Comment on attachment 8369390 [details] [diff] [review] Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown v3 [Approval Request Comment] Bug caused by (feature/regressing bug #): Bug 959130 (This bug) User impact if declined: Startup perf regression documented in this bug Testing completed (on m-c, etc.): Tested locally, pushed to try, landing [1] Risk to taking this patch (and alternatives if risky): Shouldn't be very risky, just a small change to read the file differently outside of the worker. String or IDL/UUID changes made by this patch: None [1] https://hg.mozilla.org/integration/fx-team/rev/13d1ff429b44

Attachment #8369390 - Flags: approval-mozilla-aurora?

Lukas Blakk [:lsblakk] use ?needinfo

Comment 55

•

11 years ago

Since it hasn't landed to central yet, we might have to uplift this to Aurora/Beta after the merge - tracking so we make sure it does.

tracking-firefox28: ? → +

tracking-firefox29: ? → +

Ryan VanderMeulen [:RyanVM]

Comment 56

•

11 years ago

https://hg.mozilla.org/mozilla-central/rev/13d1ff429b44 https://hg.mozilla.org/mozilla-central/rev/75303a3ddc0c

Status: NEW → RESOLVED

Closed: 11 years ago

status-firefox29: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → Firefox 29

Tim Taubert [:ttaubert] (inactive)

Comment 57

•

11 years ago

It looks like there has been a 27% talos regression on "Fx-Team-Non-PGO - Tp5 No Network Row Major MozAfterPaint (Non-Main Startup File IO Bytes)". https://groups.google.com/forum/#!topic/mozilla.dev.tree-management/StAkmt81_ec http://mzl.la/1fqYBZ9 Tp5 is a page loader test but the "Startup File IO Bytes" hints to it measuring I/O going through file channels at startup. We previously omitted those channels by using OS.File so it seems reasonable that there is of course more I/O reported now. I think it is safe to ignore but wanted to write it down in case someone thinks otherwise. Joel, can you confirm that this test regressed only because of more bytes going through the file channels at startup? Other Tp5 tests would have regressed as well if this would really impact page load time, correct?

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

11 years ago

Flags: needinfo?(jmaher)

Whiteboard: [talos_regression]

Joel Maher ( :jmaher ) (UTC -8)

Comment 58

•

11 years ago

this test is xperf specfically- we measure bytes on startup and during loading pages and report the total bytes from those accessed files. When these numbers change it is usually related to loading DLL's that we didn't load previously (sometimes things are delay loaded) or related to some new feature that adds/requires libraries which were not there before.

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 59

•

11 years ago

If this is a xperf test, this means that OS.File was not hiding byte usage.

David Teller [:Yoric] - still alive but not very active

Reporter

Comment 60

•

11 years ago

I strongly doubt it's related to this bug. At the moment, none of the Talos tests has a sessionstore.js, so I can't imagine us causing 125kb+ of I/O during startup. Barring any mistake, all the code of asyncFetch resides in libxul and libc/win32.dll or equivalent, so us causing a dll load sounds unlikely, too.

Tim Taubert [:ttaubert] (inactive)

Comment 61

•

11 years ago

Ah, I didn't think about those startup profiles not having a sessionstore.js file. It's probably completely unrelated then and safe to ignore.

Sylvestre Ledru [:Sylvestre]

Updated

•

11 years ago

Attachment #8369390 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

:Gavin Sharp [email: gavin@gavinsharp.com]

Updated

•

11 years ago

No longer depends on: 960952, 960986

Summary: Investigate Firefox 25 regression on sessionrestored → don't use a ChromeWorker to read the session file at startup

:Gijs (he/him)

Updated

•

11 years ago

Depends on: 967766

:Gijs (he/him)

Updated

•

11 years ago

Depends on: 970043

:Gijs (he/him)

Updated

•

11 years ago

No longer depends on: 967766

Lukas Blakk [:lsblakk] use ?needinfo

Comment 62

•

11 years ago

Can we get an uplift nomination for Beta here?

Flags: needinfo?(smacleod)

Steven MacLeod [:smacleod]

Assignee

Comment 63

•

11 years ago

Comment on attachment 8369390 [details] [diff] [review] Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown v3 [Approval Request Comment] Bug caused by (feature/regressing bug #): Bug 959130 (This bug) User impact if declined: Startup perf regression documented in this bug Testing completed (on m-c, etc.): On m-c and aurora currently. Risk to taking this patch (and alternatives if risky): Introduces a small talos regression (Bug 970043) which we believe is worth the much larger perf win introduced. Otherwise pretty low risk. String or IDL/UUID changes made by this patch: None

Attachment #8369390 - Flags: approval-mozilla-beta?

Flags: needinfo?(smacleod)

Lukas Blakk [:lsblakk] use ?needinfo

Updated

•

11 years ago

Attachment #8369390 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Tim Taubert [:ttaubert] (inactive)

Comment 64

•

11 years ago

Steven, can you please provide a patch that applies on Beta?

Flags: needinfo?(smacleod)

Keywords: branch-patch-needed

Steven MacLeod [:smacleod]

Assignee

Comment 65

•

11 years ago

Attached patch Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown (Beta Uplift) (deleted) — Details — Splinter Review

Flags: needinfo?(smacleod)

Steven MacLeod [:smacleod]

Assignee

Comment 66

•

11 years ago

Patch for beta uplift is now attached.

Keywords: branch-patch-needed → checkin-needed

Ryan VanderMeulen [:RyanVM]

Comment 67

•

11 years ago

https://hg.mozilla.org/releases/mozilla-beta/rev/8d35f3f8b28b

status-firefox28: affected → fixed

Keywords: checkin-needed

u279076

Comment 68

•

11 years ago

Deprioritizing for QA testing before we release. If there's any chance this could cause regressions in Session Restore functionality then please flag for testing by removing [qa-] and adding the verifyme keyword.

Whiteboard: [talos_regression] → [talos_regression][qa-]

Ryan VanderMeulen [:RyanVM]

Comment 69

•

11 years ago

https://hg.mozilla.org/releases/mozilla-b2g28_v1_3/rev/8d35f3f8b28b

Ryan VanderMeulen [:RyanVM]

Updated

•

11 years ago

status-b2g-v1.3: --- → fixed

Mathew Hodson

Updated

•

3 years ago

Blocks: 964531

No longer depends on: 964531

foo.pdf 11 years ago "Saptarshi Guha[:joy]" (deleted), application/force-download		Details
0001-Bug-959130-Back-out-cset-4887f7d34df2-bug-896896.patch 11 years ago Tim Taubert [:ttaubert] (inactive) (deleted), patch		Details \| Diff \| Splinter Review
Removing the worker from the startup - DO NOT LAND 11 years ago David Teller [:Yoric] - still alive but not very active (deleted), patch		Details \| Diff \| Splinter Review
Updated Numbers for WinXP 11 years ago "Saptarshi Guha[:joy]" (deleted), application/force-download		Details
Using OS.File to read the session 11 years ago David Teller [:Yoric] - still alive but not very active (deleted), patch		Details \| Diff \| Splinter Review
Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown 11 years ago Steven MacLeod [:smacleod] (deleted), patch	ttaubert : feedback+ Yoric : feedback+	Details \| Diff \| Splinter Review
Patch 2 - Stop using OS.File for startup read in CrashMonitor.jsm 11 years ago Steven MacLeod [:smacleod] (deleted), patch	ttaubert : feedback+ Yoric : feedback+	Details \| Diff \| Splinter Review
Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown v2 11 years ago Steven MacLeod [:smacleod] (deleted), patch	ttaubert : review+	Details \| Diff \| Splinter Review
Patch 2 - Stop using OS.File for startup read in CrashMonitor.jsm v2 11 years ago Steven MacLeod [:smacleod] (deleted), patch	ttaubert : review+	Details \| Diff \| Splinter Review
Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown v3 11 years ago Steven MacLeod [:smacleod] (deleted), patch	ttaubert : review+ Sylvestre : approval-mozilla-aurora+ lsblakk : approval-mozilla-beta+	Details \| Diff \| Splinter Review
Patch 2 - Stop using OS.File for startup read in CrashMonitor.jsm v3 11 years ago Steven MacLeod [:smacleod] (deleted), patch	ttaubert : review+	Details \| Diff \| Splinter Review
Patch 1 - Stop using the SessionWorker for read at startup to avoid slowdown (Beta Uplift) 11 years ago Steven MacLeod [:smacleod] (deleted), patch		Details \| Diff \| Splinter Review