Closed
Bug 670911
Opened 13 years ago
Closed 13 years ago
"clear cache now" (or cache delete after crash) is slow (and hangs UI) on Windows/NTFS
Categories
(Core :: Networking: Cache, defect)
Tracking
()
RESOLVED
FIXED
mozilla8
People
(Reporter: neil, Assigned: jduell.mcbugs)
References
Details
(Keywords: hang)
Attachments
(2 files, 1 obsolete file)
(deleted),
patch
|
michal
:
review+
christian
:
approval-mozilla-aurora-
christian
:
approval-mozilla-beta-
|
Details | Diff | Splinter Review |
(deleted),
patch
|
jduell.mcbugs
:
review+
jduell.mcbugs
:
superreview+
christian
:
approval-mozilla-aurora-
christian
:
approval-mozilla-beta-
|
Details | Diff | Splinter Review |
When the cache service detects that the cache is corrupt after a crash, it tries to clear the cache quickly by renaming the folder and then unlocking the cache while it deletes the files in the background. Unfortunately bug 224692 means that all of the files and subfolders in the cache have their security information reset, and the cache is still locked while this happens freezing the UI as soon as it tries to access the cache. Bug 559942 exacerbates the problem by allowing the cache to become bigger and therefore hold more files.
Comment 1•13 years ago
|
||
This sounds like something that needs to be fixed on the cache's side.
Component: XPCOM → Networking: Cache
QA Contact: xpcom → networking.cache
Reporter | ||
Comment 2•13 years ago
|
||
The actual hang occurs inside the call to nsLocalFileWin::MoveTo which is why I filed the bug under XPCOM...
Comment 3•13 years ago
|
||
See also bug 651011.
Assignee | ||
Comment 4•13 years ago
|
||
Neil, can you explain a bit more why the UI hangs during the folder deletion? We're renaming the "Cache" folder to "Cache.trash", then deleting it (too quickly: in bug 651011 we're going to slow the deletion down). Are you saying that the directory rename operation itself causes security info updates on all the files in the directory tree? Or that even once we've created a new, different "Cache" directory, we're somehow locked while Cache.trash is deleted?
I've really been wondering why this has caused hangs (as opposed to just annoying amounts of I/O)--thanks for reporting!
Reporter | ||
Comment 5•13 years ago
|
||
(In reply to comment #4)
> Neil, can you explain a bit more why the UI hangs during the folder
> deletion? We're renaming the "Cache" folder to "Cache.trash", then deleting
> it (too quickly: in bug 651011 we're going to slow the deletion down). Are
> you saying that the directory rename operation itself causes security info
> updates on all the files in the directory tree?
Bug 224692 causes the security to be updated on all the files in the directory tree. Although this happens on a separate thread, the cache is locked until the call to MoveTo completes, causing the UI to hang when it needs the cache.
Assignee | ||
Comment 6•13 years ago
|
||
> Bug 224692 causes the security to be updated on all the files in the directory tree.
Do we have an API that will allow us to rename the directory w/o all the NTFS permissions getting touched for the whole tree? From the discussion in bug 224692 it sounds like the native windows move command doesn't touch the info, and we added code to support the case of wanting downloads to inherit permission of their enclosing folder when we move them from TEMP. Can we turn the code off easily?
I suppose the other alternative is to not rename the Cache directory, and create a new one called Cache-1, etc., but it involves jumping through hoops at startup (which directory is the 'real' Cache?) that I'd rather avoid if we can do a rename quickly.
Reporter | ||
Comment 7•13 years ago
|
||
(In reply to comment #6)
> > Bug 224692 causes the security to be updated on all the files in the directory tree.
> Do we have an API that will allow us to rename the directory w/o all the
> NTFS permissions getting touched for the whole tree? From the discussion in
> bug 224692 it sounds like the native windows move command doesn't touch the
> info, and we added code to support the case of wanting downloads to inherit
> permission of their enclosing folder when we move them from TEMP. Can we
> turn the code off easily?
Well, that depends on your definition of easily. Ideally I would have suggested that we turn the code off if the first parameter for MoveTo is null, but I wasn't sure how easy it was to tell that from the point where the call is made.
Assignee | ||
Comment 11•13 years ago
|
||
Resetting ACLs (as added in bug 224692) is pointless when we're already in the target directory, and hangs UI during corrupted cache rename.
Attachment #547350 -
Flags: review?(benjamin)
Assignee | ||
Comment 12•13 years ago
|
||
Currently, for no good reason, we take a bad "Cache" directory and 1) create a Cache.Trash directory, 2) create a "Trash" subdirectory of that, and then 3) move Cache into it, winding up with "Cache.Trash/Trash/Cache".
This changes the logic to simply rename "Cache" -> "Cache.trash". We need to keep the operation as a rename in the same directory (rather than a move into a subdirectory) in order to keep NTFS ACL resets from slowing down the rename into a potentially minutes-long operation (locking the cache and thus the UI the whole while).
Also adds telemetry calls. Some of this will conflict with Taras' already half-landed (inbound) patch for bug 672651. I'll fix conflicts before I land.
Still to come: 1) batching delete of cache dir tree into smaller chunks separated by sleep calls, and 2) adding a 'delay' parameter to delay start of all deletes for some amount of time (at startup we don't want the extra I/O. But these two patches cover the most horrible part of the bug (the UI freeze), so I'd like to see them land ASAP.
Assignee: nobody → jduell.mcbugs
Status: NEW → ASSIGNED
Attachment #547355 -
Flags: review?(michal.novotny)
Assignee | ||
Updated•13 years ago
|
OS: Windows XP → All
Hardware: x86 → All
Summary: Cache slow to "clear" after a crash → Cache slow to clear (and hangs UI) after a crash
Comment 13•13 years ago
|
||
On Windows at least, lowering the thread priority will lower its IO priority as well, so that might be something to try too.
Assignee | ||
Comment 14•13 years ago
|
||
If some folks would be willing to test what I've got so far to verify that the UI no longer locks up, that would be great (I've got an SSD drive and can't replicate any noticeable lockup). There's a build available at
http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jduell@mozilla.com-d3f1bb76aa01/try-win32/firefox-8.0a1.en-US.win32.zip
Steps: backup your profile (with a big Cache directory preferably), kill your browser with the task manager, and then try running this build and see if you hang at startup (make sure your cache directory is tiny after the restart, or Firefox may have managed to exit "cleanly": for some reason I don't always find that "End Task' always results in a "corrupted" cache). Then you can exit, restore your profile and get your old, big cache back. (Extra credit: try the corrupt restart with your stock Firefox and compare). Then note in this bug how big your cache was and how it went. Thanks!
James--thanks for the suggestion: the thread is already getting launched with PR_PRIORITY_LOW, which the NSPR docs say is the lowest possible priority.
Status: ASSIGNED → NEW
OS: All → Windows XP
Hardware: All → x86
Assignee | ||
Comment 15•13 years ago
|
||
And if you have about:telemetry installed
https://addons.mozilla.org/en-US/firefox/addon/abouttelemetry/
you can also go to "about:telemetry" in the ftp build after you restart, and report the numbers for "Time spent renaming bad Cache to Cache.Trash (ms)" and "Time spent deleting disk cache (ms)" (about:telemetry won't work on Firefox 5, so no comparison #s alas)
Updated•13 years ago
|
Attachment #547355 -
Flags: review?(michal.novotny) → review+
Updated•13 years ago
|
Attachment #547350 -
Flags: superreview?(neil)
Attachment #547350 -
Flags: review?(benjamin)
Attachment #547350 -
Flags: review+
Reporter | ||
Comment 16•13 years ago
|
||
Comment on attachment 547350 [details] [diff] [review]
let renames w/o changing directory skip NTFS ACL reset
>+ // Set security permissions to inherit from parent.
>+ // Note: propagates to all children: slow for big file trees
Nit: needs trailing full stop.
> nsresult
> nsLocalFile::CopyMove(nsIFile *aParentDir, const nsAString &newName, PRBool followSymlinks, PRBool move)
> {
>+ PRBool sameDirRename = PR_FALSE;
>+
> nsCOMPtr<nsIFile> newParentDir = aParentDir;
> // check to see if this exists, otherwise return an error.
> // we will check this by resolving. If the user wants us
> // to follow links, then we are talking about the target,
> // hence we can use the |followSymlinks| parameter.
> nsresult rv = ResolveAndStat();
> if (NS_FAILED(rv))
> return rv;
>
> if (!newParentDir)
> {
>+ // No need to reset ACLs to "new" dir if not changing directory
>+ sameDirRename = PR_TRUE;
I was looking at this when it dawned on me that sameDirRename == !newParentDir == !aParentDir, and as the latter isn't used anywhere else in the method, you can just pass !aParentDir to CopySingleFile. Or am I missing something?
Assignee | ||
Comment 17•13 years ago
|
||
> you can just pass !aParentDir to CopySingleFile
Yes, that looks like it works. I don't think it makes the code as easy to follow as the named variable and accompanying comment (and the compiler will presumably make them equally efficient). But I'm fine with using your logic if you prefer.
Is that a +sr?
Reporter | ||
Comment 18•13 years ago
|
||
Comment on attachment 547350 [details] [diff] [review]
let renames w/o changing directory skip NTFS ACL reset
Sure, if you're happy to do that, then sr=me with that changed.
Attachment #547350 -
Flags: superreview?(neil) → superreview+
Assignee | ||
Comment 19•13 years ago
|
||
incorporates comment 16.
Attachment #547350 -
Attachment is obsolete: true
Attachment #549053 -
Flags: superreview+
Attachment #549053 -
Flags: review+
Assignee | ||
Comment 20•13 years ago
|
||
I've inserted a old, non-SSD drive in my machine, and here are some numbers with and without these patches:
With a 877 MB Cache folder:
-----------------------------
Rename to *.Trash (blocks UI) 15179 ms -> 217 ms
Cache open at Startup 15182 ms -> 222 ms
Delete Trash (off-main-thread) 21337 ms -> 19450
With a 293 MB Cache folder:
-----------------------------
Rename to *.Trash (blocks UI) 1728 ms -> 62 ms
Cache open at Startup 1733 ms -> 66 ms
Delete Trash (off-main-thread) 3172 ms -> 8000 ms
So this fix is a big win--we go from blocking the UI (very noticeably) for 15 seconds to 200 ms, a 70x speedup. (Interesting to note that the I/O times here do not scale linearly--the delete and old-style ACL rename are 3x longer per-byte for the bigger cache).
Assignee | ||
Comment 21•13 years ago
|
||
http://hg.mozilla.org/integration/mozilla-inbound/rev/f5272524806b
http://hg.mozilla.org/integration/mozilla-inbound/rev/85efefa85d5f
I'm moving the rest of the work (delaying the delete IO) to a followup--bug 674869--so we can look at landing this on aurora/beta ASAP.
Whiteboard: [inbound]
Assignee | ||
Comment 22•13 years ago
|
||
Comment on attachment 547355 [details] [diff] [review]
Change Cache.trash logic to not use extra subdirectories
The UI hang here is really gross, and this fix is pretty simple.
Attachment #547355 -
Flags: approval-mozilla-beta?
Attachment #547355 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 23•13 years ago
|
||
Comment on attachment 549053 [details] [diff] [review]
let renames w/o changing directory skip NTFS ACL reset v2
Note: need both patches to get any benefit.
Attachment #549053 -
Flags: approval-mozilla-beta?
Attachment #549053 -
Flags: approval-mozilla-aurora?
Assignee | ||
Comment 24•13 years ago
|
||
Oh, got one data point wrong:
Delete Trash (off-main-thread) 3172 ms -> 8000 ms
This is actually inverted (delete got faster, from 8->3 seconds). But I suspect that's actually noise--there's nothing in this patch that changes the delete logic, just the rename.
Assignee | ||
Comment 25•13 years ago
|
||
Rename to *.Trash (blocks UI) 1728 ms -> 62 ms
Cache open at Startup 1733 ms -> 66 ms
Also, in case it's not obvious, Cache open at Startup blocks on the rename, which is why the Cache open time timing is always just a touch longer.
Reporter | ||
Comment 26•13 years ago
|
||
(In reply to comment #20)
> (Interesting to note that the I/O times here do not scale linearly--the
> delete and old-style ACL rename are 3x longer per-byte for the bigger cache).
It's probably the disk size of the metadata that influences the ACL rename, so just depends on the number of files rather than their size.
Comment 27•13 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/85efefa85d5f
http://hg.mozilla.org/mozilla-central/rev/f5272524806b
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla8
Assignee | ||
Comment 28•13 years ago
|
||
> It's probably the disk size of the metadata that influences the ACL rename,
> so just depends on the number of files rather than their size.
But I did the 877 MB test by making two copies of the 293 directory and storing them as subdirs, so the number of files per MB is the same. So I'm still not sure what the increased time is about. But the takeaway is that we should make sure we always test at the upper range of the max cache size...
Whiteboard: [inbound]
Target Milestone: mozilla8 → ---
Comment 29•13 years ago
|
||
Comment on attachment 547355 [details] [diff] [review]
Change Cache.trash logic to not use extra subdirectories
We don't want this in aurora or beta and will wait for it to bubble up from central.
Attachment #547355 -
Flags: approval-mozilla-beta?
Attachment #547355 -
Flags: approval-mozilla-beta-
Attachment #547355 -
Flags: approval-mozilla-aurora?
Attachment #547355 -
Flags: approval-mozilla-aurora-
Attachment #549053 -
Flags: approval-mozilla-beta?
Attachment #549053 -
Flags: approval-mozilla-beta-
Attachment #549053 -
Flags: approval-mozilla-aurora?
Attachment #549053 -
Flags: approval-mozilla-aurora-
Comment 31•13 years ago
|
||
There might be a regression in bug 675420, but I'm not sure. I'm only seeing a problem in Nightly, while another tester claims that it also exists in 5.0 and up.
Assignee | ||
Comment 32•13 years ago
|
||
bug 675420 is caused by a different bugfix. This bugfix is innocent of regressions (so far :)
Updated•13 years ago
|
Severity: normal → blocker
Updated•13 years ago
|
Severity: blocker → critical
Assignee | ||
Comment 36•13 years ago
|
||
Renaming to make it clear that this also covers the case where a user hits "Clear Now" to clear the cache. (so not just cache delete after a crash).
The fix lands in Firefox/Gecko 8.
Summary: Cache slow to clear (and hangs UI) after a crash → "clear cache now" (or cache delete after crash) is slow (and hangs UI) on Windows/NTFS
Target Milestone: --- → mozilla8
Comment 39•13 years ago
|
||
I've tried clearing cache on Windows XP and it's still slow (Firefox Beta).
There's also bug 730662 that is about slow clearing of Crash.Trash folder after a crash (on Windows 7).
Comment 40•13 years ago
|
||
Sorry, Cache.Trash.
Comment 41•13 years ago
|
||
(In reply to Marco Castelluccio from comment #39)
> I've tried clearing cache on Windows XP and it's still slow (Firefox Beta).
Was this slow because of deleting Cache.Trash or because of evicting all entries (bug #648232)?
Comment 42•13 years ago
|
||
I don't know, I'll try to investigate with your script to fill the cache (however I don't know how much time it will take, as the computer wasn't mine and I haven't access to it currently).
You need to log in
before you can comment on or make changes to this bug.
Description
•