Closed
Bug 913822
Opened 11 years ago
Closed 9 years ago
HTTP cache v2: fix shutdown time regression on a slow storage
Categories
(Core :: Networking: Cache, defect)
Core
Networking: Cache
Tracking
()
RESOLVED
FIXED
mozilla46
People
(Reporter: mayhemer, Assigned: mayhemer)
References
(Blocks 1 open bug)
Details
(Whiteboard: [cache2])
Attachments
(1 file, 1 obsolete file)
(deleted),
patch
|
michal
:
review+
Sylvestre
:
approval-mozilla-beta+
|
Details | Diff | Splinter Review |
Writes and file closes are done as the last operation on the I/O thread, with the least priority. When user visits a page with a lot (400+) resources on a page, like images or so, writes to disk are serialized and slow.
When we shut down, we wait for all the writes to finish. This may keep the browser open for minutes (!). We should just abandon never written files and partially written just leave as broken or delete them when there is just few of such files and it won't take a lot of time.
On a slow storage this introduces a huge and unbearable shutdown time regression.
Assignee | ||
Updated•11 years ago
|
Blocks: cache2enable
Assignee | ||
Comment 1•11 years ago
|
||
Suggestion:
- at the earliest moment we know the process is going to exit:
- set CacheIOThread to a FastShutdown state
- create a timestamp now() + say 2s
- when executing events on levels WRITE or CLOSE, check we are in the FastShutdown
- if so, check that now() is smaller then the timestamp we took
- if not, it means we are over the shutdown time limit and just prevent call of Run() for events on those two levels, thus just throw file writes and closings away
Not sure this may introduce any leaks or hangs, but seems as a simple solution.
Assignee | ||
Comment 2•11 years ago
|
||
It's been decided this shutdown time regression is not worth the risk of breaking the code since we are approaching final stage. Also, I don't believe we will be so much blocked even on mobile devices. I was testing with an extremely slow storage and an extreme number of data to write so this problem was visible. For production I think we can fix this as one of first bugs after cache2 enable.
Also doesn't block the never-remember-case since we tend to keep data only in memory for that setting. Hence that bug's not dependent on any shutdown cleanup we would handle/introduce/base here.
Assignee: michal.novotny → nobody
Assignee | ||
Comment 3•10 years ago
|
||
I think it's simpler now to fix and becomes pretty important. During my tests with cache on a slower sdcard I can hang for several tens of seconds to write all the unwritten stuff to the cache. We should not spend more then 2-3 seconds by delayed writes, after that they should be just bypassed as NOOP.
We unfortunatelly don't have tele for write and close times... so it's hard to find out if we also have to skip PR_Close. But according my research in the past, we probably have to skip it too... So, option just for a release build w/o leak detection?
Assignee: nobody → honzab.moz
Status: NEW → ASSIGNED
Assignee | ||
Comment 5•9 years ago
|
||
Based on bug 1215970 comment 86, it seems like leaking is a way we can go with.
I'll try a patch according comment 1.
Assignee | ||
Comment 6•9 years ago
|
||
- there is a 2 seconds limit to write and close anything
- the limit has a preference "browser.cache.max_shutdown_io_lag"
- when we pass that limit we never write anything and leave the files open
- files that have been doomed before shutdown are tho deleted from disk every time
- "invalid" handles (=with unwritten metadata) are left to prevent I/O (dooming) ; these will never load back, checksums will not validate
https://treeherder.mozilla.org/#/jobs?repo=try&revision=95e6f9365a5b
Attachment #8706475 -
Flags: review?(michal.novotny)
Comment 7•9 years ago
|
||
Comment on attachment 8706475 [details] [diff] [review]
v1
Review of attachment 8706475 [details] [diff] [review]:
-----------------------------------------------------------------
::: netwerk/cache2/CacheFileIOManager.cpp
@@ +2292,5 @@
> DebugOnly<bool> found;
> found = mHandlesByLastUsed.RemoveElement(aHandle);
> MOZ_ASSERT(found);
>
> + if (aHandle->mIsDoomed || aIgnoreShutdownLag || !IsPastShutdownIOLag()) {
Why we need to close handles of doomed entries?
Assignee | ||
Comment 8•9 years ago
|
||
(In reply to Michal Novotny (:michal) from comment #7)
> Comment on attachment 8706475 [details] [diff] [review]
> v1
>
> Review of attachment 8706475 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: netwerk/cache2/CacheFileIOManager.cpp
> @@ +2292,5 @@
> > DebugOnly<bool> found;
> > found = mHandlesByLastUsed.RemoveElement(aHandle);
> > MOZ_ASSERT(found);
> >
> > + if (aHandle->mIsDoomed || aIgnoreShutdownLag || !IsPastShutdownIOLag()) {
>
> Why we need to close handles of doomed entries?
Because entries that are doomed prior shutdown must be removed from disk. Later you are deleting the files, hence the handle must be closed (at least on windows, but osx has a problem too I think).
Comment 9•9 years ago
|
||
(In reply to Honza Bambas (:mayhemer) from comment #8)
> Because entries that are doomed prior shutdown must be removed from disk.
> Later you are deleting the files, hence the handle must be closed (at least
> on windows, but osx has a problem too I think).
If CacheFileHandle::mIsDoomed is true then the file was already moved to doomed directory. It's not critical to remove these files during shutdown because they will be deleted on next startup.
Assignee | ||
Comment 10•9 years ago
|
||
(In reply to Michal Novotny (:michal) from comment #9)
> (In reply to Honza Bambas (:mayhemer) from comment #8)
> > Because entries that are doomed prior shutdown must be removed from disk.
> > Later you are deleting the files, hence the handle must be closed (at least
> > on windows, but osx has a problem too I think).
>
> If CacheFileHandle::mIsDoomed is true then the file was already moved to
> doomed directory. It's not critical to remove these files during shutdown
> because they will be deleted on next startup.
You are right! I will update the patch. Thanks.
Assignee | ||
Updated•9 years ago
|
Attachment #8706475 -
Flags: review?(michal.novotny)
Assignee | ||
Comment 11•9 years ago
|
||
So, the only one difference is removal of |aHandle->mIsDoomed| from the condition. There is no need to do anything else, since we don't break on mFile->Remove() failure later in the code.
Actually, I'm thinking of not PR_Close()'ing the doomed and invalid files immediately after shutdown at all, just leak them, they won't load next or will be deleted next time anyway. This may save even more I/O and get more metadata be successfully written to disk. What do you think?
And I'm also thinking of giving the metadata write a bit more priority somehow during the shutdown time. But that would definitely be an optimization that may or may not have any real affect and definitely for a different bug.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=469a0a6cc7c8
Attachment #8706475 -
Attachment is obsolete: true
Attachment #8707675 -
Flags: review?(michal.novotny)
Updated•9 years ago
|
Attachment #8707675 -
Flags: review?(michal.novotny) → review+
Assignee | ||
Updated•9 years ago
|
Keywords: checkin-needed
Comment 12•9 years ago
|
||
Keywords: checkin-needed
Comment 13•9 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox46:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla46
Comment 14•9 years ago
|
||
If we think this can impact top crashers 1215970 we should consider uplift to 45. Honza do you think its a reasonable candidate? looks that way to me..
Flags: needinfo?(honzab.moz)
Assignee | ||
Comment 15•9 years ago
|
||
There has been no report of error since we've landed this. So, yes. I will request beta approval.
Flags: needinfo?(honzab.moz)
Assignee | ||
Comment 16•9 years ago
|
||
Comment on attachment 8707675 [details] [diff] [review]
v1.1
Approval Request Comment
[Feature/regressing bug #]: we could say the HTTP cache v2
[User impact if declined]: very long shutdown times when storage media where HTTP cache is located is slow
[Describe test coverage new/current, TreeHerder]: currently rides aurora
[Risks and why]: hard to say, but according zero error feedback so far I think minimal
[String/UUID change made/needed]: none
Attachment #8707675 -
Flags: approval-mozilla-beta?
Updated•9 years ago
|
status-firefox45:
--- → affected
Comment 17•9 years ago
|
||
Comment on attachment 8707675 [details] [diff] [review]
v1.1
If it can fix the skype issue, taking it.
Should be in 45 beta 3.
Attachment #8707675 -
Flags: approval-mozilla-beta? → approval-mozilla-beta+
Comment 18•9 years ago
|
||
bugherder uplift |
Comment 19•9 years ago
|
||
Unfortunately, this patch seems to not have helped bug 1215970 or bug 1158189 in 45.0b3.
You need to log in
before you can comment on or make changes to this bug.
Description
•