Closed
Bug 1361102
Opened 8 years ago
Closed 7 years ago
nsUpdateService.js does main thread IO writes for the update xml files during the first startup after an update, and whenever an update is downloaded
Categories
(Toolkit :: Application Update, defect, P2)
Toolkit
Application Update
Tracking
()
Tracking | Status | |
---|---|---|
firefox57 | --- | fixed |
People
(Reporter: florian, Assigned: robert.strong.bugs)
References
(Blocks 3 open bugs)
Details
(Whiteboard: [bhr])
Attachments
(4 files, 4 obsolete files)
(deleted),
patch
|
dthayer
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
dthayer
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
dthayer
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
robert.strong.bugs
:
review+
|
Details | Diff | Splinter Review |
FileUtils_closeSafeFileOutputStream is called 5 times by update related code on this profile: https://perfht.ml/2p1wH1d
The first one is before the first window is shown. The next 4 are after the window is shown, but cause UI jank, and they seem to be writing the same file over and over again.
Comment 1•8 years ago
|
||
This comes from _writeUpdatesToXMLFile(): <https://searchfox.org/mozilla-central/rev/ae8c2e2354db652950fe0ec16983360c21857f2a/toolkit/mozapps/update/nsUpdateService.js#2740>. The reason we're flushing here is that the output file stream used here is a safe output file stream. As far as I can tell, that goes back to bug 299302 where this code was introduced first: <https://searchfox.org/mozilla-central/rev/f59f02c76f3a0e997a84a5711b8531bbec91173c/toolkit/mozapps/update/src/nsUpdateService.js.in#1868>
Robert, is an fflush() really necessary here? If no, we can simply make the output file stream here an atomic output filestream (added in bug 928321) which doesn't flush. That should fix this bug...
Flags: needinfo?(robert.strong.bugs)
Assignee | ||
Comment 2•8 years ago
|
||
That is likely but I'll look at it again in a few days after I finish some uplifts I have to get done.
Flags: needinfo?(robert.strong.bugs)
Reporter | ||
Comment 3•8 years ago
|
||
Using a deferred task to write it once instead of 4 times would also be helpful.
Comment 4•8 years ago
|
||
If flushing is necessary we should consider doing so from a background thread to avoid pausing the UI thread.
I saw reports of this in the BHR reports of hangs of the main thread for over 8 seconds, for example:
https://people-mozilla.org/~mlayzell/bhr/20170429/763.html#1
https://people-mozilla.org/~mlayzell/bhr/20170429/753.html#0
https://people-mozilla.org/~mlayzell/bhr/20170429/967.html#0
Assignee | ||
Comment 5•8 years ago
|
||
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #1)
> This comes from _writeUpdatesToXMLFile():
> <https://searchfox.org/mozilla-central/rev/
> ae8c2e2354db652950fe0ec16983360c21857f2a/toolkit/mozapps/update/
> nsUpdateService.js#2740>. The reason we're flushing here is that the output
> file stream used here is a safe output file stream. As far as I can tell,
> that goes back to bug 299302 where this code was introduced first:
> <https://searchfox.org/mozilla-central/rev/
> f59f02c76f3a0e997a84a5711b8531bbec91173c/toolkit/mozapps/update/src/
> nsUpdateService.js.in#1868>
Not related to bug 299302 which was an add-ons manager bug.
Component: Application Update → Startup and Profile System
Assignee | ||
Comment 6•8 years ago
|
||
Ehsan, what is the typical file output stream that is used for things like this? I much prefer using what is used by the majority of code for app update when ever possible.
Flags: needinfo?(ehsan)
Assignee | ||
Comment 7•8 years ago
|
||
I have a couple of other improvements to how app update writes to files I've been planning but they haven't made it to the top of my queue due to higher priority work. Fixing this bug will definitely be an improvement for approximately 2 startups every 6 weeks on the release channel and I'll fold the other improvements in with this.
Comment 8•8 years ago
|
||
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #6)
> Ehsan, what is the typical file output stream that is used for things like
> this? I much prefer using what is used by the majority of code for app
> update when ever possible.
An atomic file output stream: <https://searchfox.org/mozilla-central/rev/ae8c2e2354db652950fe0ec16983360c21857f2a/toolkit/modules/FileUtils.jsm#112>. This gives you a file output stream that you can write to and when you close it, we atomically replace it with the destination file. On Windows this is typically done by writing the new file to a temporary file and then atomically rename the temporary file to the destination file to ensure that the written data doesn't get stored partially. The safe output file stream that the current code uses does the exact same thing plus the additional fflush.
To make things even better and avoid the main-thread I/O itself also, you can create an input string stream and use nsIDOMSerializer to serialize the XML DOM into the string and then use NetUtil.asyncCopy() to copy the input string stream into the output stream. The finish callback which would be responsible to close the output stream which will close the file handle will still do I/O on the main thread but that will be much better than the current situation... :-)
I'm wondering if the OS.File API gives us better primitives to avoid all main-thread I/O while working with XPCOM streams? David should be able to help us figure out if we can use that to our benefit.
(BTW thanks for the note about how frequently we hit this code on the release channel, knowing that is helpful in putting the severity of the bug into perspective.)
Flags: needinfo?(ehsan) → needinfo?(dteller)
OS.File doesn't work with XPCOM streams, unfortunately. The main primitive for atomic writes is `OS.File.writeAtomic`, which takes an entire buffer (String or ArrayBuffer) and writes it to disk using the usual pseudo-atomic properties.
As a side-note, recall that even a full flush does not guarantee that the file is correcty written to disk. For Session Restore, we had to come up with alternative solutions. I wrote a blog post about this a long time ago: https://dutherenverseauborddelatable.wordpress.com/2014/06/26/firefox-the-browser-that-has-your-backup/
Flags: needinfo?(dteller)
Note that if you have a string, OS.File.writeAtomic will generally be faster than NetUtil.asyncCopy.
Updated•8 years ago
|
Whiteboard: [qf] → [qf][bhr]
Assignee | ||
Updated•8 years ago
|
Component: Startup and Profile System → Application Update
Comment 11•8 years ago
|
||
[qf:p3] because of the frequency of the occurrence in comment 7.
Whiteboard: [qf][bhr] → [qf:p3][bhr]
Reporter | ||
Comment 12•8 years ago
|
||
(In reply to :Ehsan Akhgari (super long backlog, slow to respond) from comment #11)
> [qf:p3] because of the frequency of the occurrence in comment 7.
"2 startups every 6 weeks" in comment 7 is one way to phrase it, but we could also say that it affects 100% of the startups right after updating to 57. Is this the first time experience we want for 57?
It will also occur a lot on Nightly where we do much of our profiling.
Assignee | ||
Comment 13•8 years ago
|
||
Prioritizing this based on people profiling is probably not the right way to prioritize either since it isn't prioritizing off of what our release users experience.
If we had all the time in the world to fix all of the bugs then everything should be a P1 whether it affected 1 startup after updating or all startups. Even then an all startups bug should have a higher priority than this bug.
Having said that I do plan on addressing this bug and the other things I mentioned well before 57 so the priority isn't that important to me.
Reporter | ||
Comment 14•7 years ago
|
||
Here is another startup profile from the quantum reference hardware https://perfht.ml/2rgSN14 where we spend more than 3s on the main thread writing this file.
Assignee | ||
Comment 15•7 years ago
|
||
Interesting profile in comment #14. This code path is for staging an update which happens after the update download completes which won't happen during startup the vast majority of time. Surprised you caught it on startup.
https://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/nsUpdateService.js#3507
https://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/nsUpdateService.js#3711
https://dxr.mozilla.org/mozilla-central/source/toolkit/xre/nsUpdateDriver.cpp#869
https://dxr.mozilla.org/mozilla-central/source/toolkit/xre/nsUpdateDriver.cpp#975
https://dxr.mozilla.org/mozilla-central/source/toolkit/mozapps/update/nsUpdateService.js#2739
It will of course also benefit from the changes proposed in this bug.
Assignee | ||
Updated•7 years ago
|
Priority: -- → P2
Reporter | ||
Comment 16•7 years ago
|
||
I keep seeing this in startup profiles, here is another one seen today: https://perfht.ml/2sQ0fQJ
This happens whenever we are downloading an update, and whenever we are installing an update. I've been told during the all hands that we have lots of users who use Firefox only once or twice a month. These users will experience this jank in each Firefox session.
Given this, I don't think qf:p3 is the right priority here.
Summary: The first startup after an update writes update related files to disk several times using main thread I/O → nsUpdateService.js does main thread IO during the first startup after an update, and whenever an update is downloaded
Whiteboard: [qf:p3][bhr] → [qf][bhr]
Assignee | ||
Comment 17•7 years ago
|
||
Note: we are trying to finish up bug 1343671 which will move the download outside of Firefox (also doesn't require Firefox to be running) to a scheduled task on Windows. This will benefit this bug as well as get those people that seldom run Firefox (easily less than 15% though I don't have an exact number) updated quicker. There are also plans to do something similar on Mac.
Updated•7 years ago
|
Whiteboard: [qf][bhr] → [qf:p1][bhr]
Comment 18•7 years ago
|
||
Robert, are you still planning on taking this bug for 57? Bug 1343671 got pushed to 58, correct? If you're low on time I can look into the improvements you all have talked about here.
Flags: needinfo?(robert.strong.bugs)
Assignee | ||
Comment 19•7 years ago
|
||
Doug, I planned on this bug for 57 and Bug 1343671 got pushed to 58. I should be done with bug 1385865 soon and this bug would be next in line since bug 1385865 changes the code that this bug touches quite alot. If you could take this after I land bug 1385865 that would be helpful.
Flags: needinfo?(robert.strong.bugs)
Assignee | ||
Comment 20•7 years ago
|
||
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #19)
> Doug, I planned on this bug for 57 and Bug 1343671 got pushed to 58. I
> should be done with bug 1385865 soon and this bug would be next in line
> since bug 1385865 changes the code that this bug touches quite alot. If you
> could take this after I land bug 1385865 that would be helpful.
Copied the wrong bug number. I meant that "bug 1367910 changes the code that this bug touches quite alot", etc.
Comment 21•7 years ago
|
||
Works for me!
Updated•7 years ago
|
Assignee: nobody → dothayer
Status: NEW → ASSIGNED
Reporter | ||
Comment 22•7 years ago
|
||
Robert, I'm confused by the status of this bug. It seems in comment 19 and 20 you want this to wait until bug 1367910 lands, but that bug doesn't have a patch that looks close to landing (no pending review request, and a patch last touched 3 months ago). What's the next step here to get this unblocked?
Flags: needinfo?(robert.strong.bugs)
Assignee | ||
Comment 23•7 years ago
|
||
Florian, I have a patch almost ready for review but other work has repeatedly taken priority. I'll ask Doug to go ahead with this and I'll adapt the patch for bug 1367910 after this lands. To expedite this I am going to ask Doug to request review from you for this bug.
Doug, I'll adapt the patch for bug 1367910 to what lands for this bug. Please request review from Florian. Thanks!
Flags: needinfo?(robert.strong.bugs) → needinfo?(dothayer)
Reporter | ||
Comment 24•7 years ago
|
||
(In reply to Florian Quèze [:florian] [:flo] from comment #3)
> Using a deferred task to write it once instead of 4 times would also be
> helpful.
Or maybe http://searchfox.org/mozilla-central/source/toolkit/mozapps/extensions/DeferredSave.jsm already does most of what we need.
Comment 25•7 years ago
|
||
Robert, you mentioned in IRC that you were taking this. Should I unassign myself?
Flags: needinfo?(dothayer) → needinfo?(robert.strong.bugs)
Assignee | ||
Comment 26•7 years ago
|
||
Done
Assignee: dothayer → robert.strong.bugs
Flags: needinfo?(robert.strong.bugs)
Assignee | ||
Comment 27•7 years ago
|
||
I suspect that I will need to change when UpdateService shutsdown so not quite ready for review. The modified tests are in good shape and I'll attach that patch after I finish the aforementioned.
Attachment #8904446 -
Flags: feedback?(dothayer)
Assignee | ||
Comment 28•7 years ago
|
||
Attachment #8904446 -
Attachment is obsolete: true
Attachment #8904446 -
Flags: feedback?(dothayer)
Attachment #8904712 -
Flags: review?(dothayer)
Assignee | ||
Comment 29•7 years ago
|
||
A couple of the tests that tested multiple things had to be split out to make the end state of the xml files deterministic so they wouldn't periodically fail when run sequentially.
Attachment #8904740 -
Flags: review?(dothayer)
Assignee | ||
Comment 30•7 years ago
|
||
Attachment #8904741 -
Flags: review?(dothayer)
Assignee | ||
Comment 31•7 years ago
|
||
Assignee | ||
Comment 32•7 years ago
|
||
Test run is looking good and none of the tests failed the sequential run and had to be retried so far
Reporter | ||
Comment 33•7 years ago
|
||
Comment on attachment 8904712 [details] [diff] [review]
patch 1 - client code
Review of attachment 8904712 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/mozapps/update/nsUpdateService.js
@@ +2530,5 @@
> + // If there is an active-update.xml.tmp file try to use that for the active
> + // update. If it contains a valid update remove the active-update.xml file
> + // and rename the active-update.xml.tmp to active-update.xml.
> + let activeUpdates = [];
> + let tmpFile = getUpdateFile([FILE_ACTIVE_UPDATE_XML + ".tmp"]);
From a quick glance, this part of the patch looks like it's doing a lot of main thread IO. How is it related to the rest of the patch, and is it all required to be done synchronously?
@@ +2538,5 @@
> + if (activeUpdates.length > 0) {
> + let file = getUpdateFile([FILE_ACTIVE_UPDATE_XML]);
> + try {
> + if (file.exists()) {
> + file.remove(false);
It's faster to call .remove() and let it throw an exception when the file doesn't exist, than to call .exists() before.
Comment 34•7 years ago
|
||
Comment on attachment 8904712 [details] [diff] [review]
patch 1 - client code
Review of attachment 8904712 [details] [diff] [review]:
-----------------------------------------------------------------
On the whole it looks good, but echoing Florian's comment, it would be nice if we could avoid increasing the sync IO in UpdateManager(). It should generally just be an extra file.exists() call, which doesn't seem too bad to me, but I'd like to hear your take on whether it's necessary for this to be synchronous.
::: toolkit/mozapps/update/nsUpdateService.js
@@ +2531,5 @@
> + // update. If it contains a valid update remove the active-update.xml file
> + // and rename the active-update.xml.tmp to active-update.xml.
> + let activeUpdates = [];
> + let tmpFile = getUpdateFile([FILE_ACTIVE_UPDATE_XML + ".tmp"]);
> + if (tmpFile.exists()) {
_loadXMLFileIntoArray already does this check, so I think this is redundant.
@@ +2538,5 @@
> + if (activeUpdates.length > 0) {
> + let file = getUpdateFile([FILE_ACTIVE_UPDATE_XML]);
> + try {
> + if (file.exists()) {
> + file.remove(false);
Do we need this? moveTo should replace the file if it already exists (http://searchfox.org/mozilla-central/rev/4d8e389498a08668cce9ebf6232cc96be178c3e4/xpcom/io/nsIFile.idl#155)
@@ +2541,5 @@
> + if (file.exists()) {
> + file.remove(false);
> + }
> + tmpFile.moveTo(tmpFile.parent, FILE_ACTIVE_UPDATE_XML);
> + LOG("UpdateManager: found active-update.xml.tmp file");
Nit: Should this say something different than the LOG a few lines above?
Attachment #8904712 -
Flags: review?(dothayer) → review-
Comment 35•7 years ago
|
||
Comment on attachment 8904741 [details] [diff] [review]
patch 3 - test code
Review of attachment 8904741 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good!
Attachment #8904741 -
Flags: review?(dothayer) → review+
Updated•7 years ago
|
Attachment #8904740 -
Flags: review?(dothayer) → review+
Assignee | ||
Comment 36•7 years ago
|
||
Florian, is there any data on the percentage of clients that fail completing the write operation when using writeAtomic and tmpPath that don't complete the rename of the tmp file? If that isn't available is there any data on the percentage of clients that fail completing the write operation when using writeAtomic in general?
Flags: needinfo?(florian)
Reporter | ||
Comment 37•7 years ago
|
||
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #36)
> Florian, is there any data on the percentage of clients that fail completing
> the write operation when using writeAtomic and tmpPath that don't complete
> the rename of the tmp file? If that isn't available is there any data on the
> percentage of clients that fail completing the write operation when using
> writeAtomic in general?
These are questions for Yoric.
Flags: needinfo?(florian) → needinfo?(dteller)
Assignee | ||
Comment 38•7 years ago
|
||
I am changing the summary to limit the scope of this bug and make it closer to the original bug report. I filed bug 1397450 to cover the additional items that have been added since it is unlikely that all main thread IO will be eliminated in nsUpdateService.js during the 57 cycle.
Summary: nsUpdateService.js does main thread IO during the first startup after an update, and whenever an update is downloaded → nsUpdateService.js does main thread IO writes for the update xml files during the first startup after an update, and whenever an update is downloaded
(In reply to Robert Strong [:rstrong] (use needinfo to contact me) from comment #36)
> Florian, is there any data on the percentage of clients that fail completing
> the write operation when using writeAtomic and tmpPath that don't complete
> the rename of the tmp file? If that isn't available is there any data on the
> percentage of clients that fail completing the write operation when using
> writeAtomic in general?
No, we do not have such data.
Flags: needinfo?(dteller)
Let me add one piece of data: most of the time, if there is a ".tmp", it's corrupted data.
The only case in which a ".tmp" can contain all the data, uncorrupted, is if:
1. the process *and the system* have written all the data to disk; AND
2. the process or the system crash before actually renaming the ".tmp" to its final name.
Most OSes finish treating the renaming *before* finishing writing all the data to disk, which means that this situation pretty much never happens.
In other words, the ".tmp" is typically useless.
Assignee | ||
Comment 41•7 years ago
|
||
Thanks for the info David!
Doug, I think there is already enough telemetry in pingStateAndStatusCodes, specifically PATCH_UNKNOWN, to handle the case where the active update is missing for whatever reason.
Attachment #8904712 -
Attachment is obsolete: true
Attachment #8905790 -
Flags: review?(dothayer)
Assignee | ||
Comment 42•7 years ago
|
||
This mainly removes redundant checks and performs file checks for a couple of other tests.
Attachment #8905792 -
Flags: review?(dothayer)
Assignee | ||
Comment 43•7 years ago
|
||
Assignee | ||
Comment 44•7 years ago
|
||
Try push
https://treeherder.mozilla.org/#/jobs?repo=try&revision=43e2a0959ce3b5a2dc1c3e1994a18cb1ce37b4e0
Attachment #8905792 -
Attachment is obsolete: true
Attachment #8905792 -
Flags: review?(dothayer)
Attachment #8905806 -
Flags: review?(dothayer)
Comment 45•7 years ago
|
||
Comment on attachment 8905790 [details] [diff] [review]
patch 1 - client code rev 2
Review of attachment 8905790 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good!
Attachment #8905790 -
Flags: review?(dothayer) → review+
Updated•7 years ago
|
Attachment #8905806 -
Flags: review?(dothayer) → review+
Assignee | ||
Comment 46•7 years ago
|
||
I combined patch 3 and 4.
Attachment #8905806 -
Attachment is obsolete: true
Attachment #8906038 -
Flags: review+
Comment 47•7 years ago
|
||
Pushed by rstrong@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/a18f510ef12e
update the application update xml files asynchronously for writes and deletions. r=dothayer
https://hg.mozilla.org/integration/mozilla-inbound/rev/814b441b2bad
split out tests so they are more deterministic when updating the application update xml files asynchronously. r=dothayer
https://hg.mozilla.org/integration/mozilla-inbound/rev/b04b2abfc170
change tests to account for updating the application update xml files asynchronously. r=dothayer
Comment 48•7 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/a18f510ef12e
https://hg.mozilla.org/mozilla-central/rev/814b441b2bad
https://hg.mozilla.org/mozilla-central/rev/b04b2abfc170
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox57:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Updated•3 years ago
|
Performance Impact: --- → P1
Whiteboard: [qf:p1][bhr] → [bhr]
You need to log in
before you can comment on or make changes to this bug.
Description
•