Closed
Bug 586859
Opened 14 years ago
Closed 14 years ago
Move startupcache write off the main thread
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
People
(Reporter: benedict, Assigned: taras.mozilla)
References
(Blocks 1 open bug)
Details
(Whiteboard: [softblocker])
Attachments
(2 files, 5 obsolete files)
(deleted),
patch
|
taras.mozilla
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
benjamin
:
approval2.0+
|
Details | Diff | Splinter Review |
Post-startup writes in startupcache should be batched into an in-memory cache, and only written out after a long idle period. All writes should happen off the main thread.
Reporter | ||
Updated•14 years ago
|
Assignee: nobody → bhsieh
Reporter | ||
Comment 1•14 years ago
|
||
The design that we had in mind was:
Writes always go to an in-memory cache. Attempts to read will check this cache first and the on-disk JAR second. After a sufficient idle period (say, no user input for 5 minutes), this in-memory cache is written out on a separate thread. When this is done, the in-memory cache is deleted and the zipArchive (the representation of the on-disk JAR) is reloaded.
Need to think about the best way to synchronize this. One way is just to have a lock around the in-memory cache and another lock around the zipArchive, but there may be a better approach.
Assignee: bhsieh → tglek
Assignee | ||
Comment 2•14 years ago
|
||
This does a couple of things
a) it always batches writes, instead of just first ones. This reduces fragmentation, by writing less. I'm not sure if I broke anything by changing this logic.
b) Writes off main thread. There is no locking, just pr_join in case there is an outstanding write request. I think this is good enough since cache access should be relatively rare, so there is a good chance of a write getting through without anyone having to wait on it.
c) Since we are writing off main thread, I turned on compression to reduce amount of io performed on subsequent startups.
d) Make use of the timestamp field in the jar. File stamps will enable fragmentation analysis to see how many times the startup cache was written to. This will be useful if we later want to trigger rewrites of the cache
Assignee | ||
Comment 3•14 years ago
|
||
Assignee | ||
Comment 4•14 years ago
|
||
This accomplishes the following
* move zip writing to a single function
* group zip writes
* start the write timer on demand
* postpone write timer with every cache write
* turn on compression, add timestamps
This passes try
Attachment #488494 -
Flags: review?
Assignee | ||
Updated•14 years ago
|
Attachment #487701 -
Attachment is obsolete: true
Assignee | ||
Updated•14 years ago
|
Attachment #488494 -
Flags: review? → review?(ben.hsieh)
Assignee | ||
Comment 5•14 years ago
|
||
Comment on attachment 488494 [details] [diff] [review]
prepare startup cache for off-main thread writing
Ben, if you don't have time to review this, please just glance over to see if I made any fatal assumptions
Assignee | ||
Comment 6•14 years ago
|
||
This is my first piece of threaded C++, so I might've picked the wrong approach here. The idea is to spawn off a worker thread for writes and hope that nobody else tries to read/write from the cache(which is relatively rare) while the thread runs(otherwise they will block).
Attachment #488599 -
Flags: review?(jones.chris.g)
Assignee | ||
Updated•14 years ago
|
Flags: wanted-fennec1.0?
Assignee | ||
Updated•14 years ago
|
tracking-fennec: --- → ?
Flags: wanted-fennec1.0?
Updated•14 years ago
|
blocking2.0: --- → final+
Comment on attachment 488599 [details] [diff] [review]
run zipwriter + reload zip on a thread
Approach looks OK as far as I understand it. I don't know startupcache well to confirm the assumption of cache/read writes being relatively rare.
It's not clear to me how this patch gets startupcache reloaded off the main thread. It doesn't seem like that's possible, given the apparently synchronous API.
What prevents gStartupCache from being deleted while the writer thread is writing it? Are all the methods/functions that can mutate gStartupCache protected by WaitOnWriteThread? Need to document that.
Things this patch needs
- Document the threading model. The are exactly two, main and writer thread. There's only one startupcache "instance" at any time, and it only travels from main-->writer to be written to disk. Attempts to reincarnate a new instance are serialized after the writing of the old instance.
- I would rename WaitOnWriteThread() to something more specific, like FinishWritingPreviousCache().
- Make strong assertions about which functions can be called on which thread. It seem like WaitOnWriteThread() is main-thread only, right?
Attachment #488599 -
Flags: review?(jones.chris.g) → feedback+
Assignee | ||
Comment 8•14 years ago
|
||
(In reply to comment #7)
> Comment on attachment 488599 [details] [diff] [review]
> run zipwriter + reload zip on a thread
>
> Approach looks OK as far as I understand it. I don't know startupcache well to
> confirm the assumption of cache/read writes being relatively rare.
Every write to the cache delays the writeout by 60 seconds. Writes occur on cache misses when something is trying to read from cache. Cache read/writes are usually clustered together. Thus there is a high chance of the writeout occurring without anyone blocking on it.
>
> It's not clear to me how this patch gets startupcache reloaded off the main
> thread. It doesn't seem like that's possible, given the apparently synchronous
> API.
I meant the part where it does LoadArchive() after the write.
>
> What prevents gStartupCache from being deleted while the writer thread is
> writing it? Are all the methods/functions that can mutate gStartupCache
> protected by WaitOnWriteThread? Need to document that.
The destructor calls WriteToDisk which does WaitOnWriteThread().
>
> Things this patch needs
> - Document the threading model. The are exactly two, main and writer thread.
> There's only one startupcache "instance" at any time, and it only travels from
> main-->writer to be written to disk. Attempts to reincarnate a new instance
> are serialized after the writing of the old instance.
> - I would rename WaitOnWriteThread() to something more specific, like
> FinishWritingPreviousCache().
> - Make strong assertions about which functions can be called on which thread.
> It seem like WaitOnWriteThread() is main-thread only, right?
That's the confusing part: it is not mainthread-only. LoadArchive() is called from the main thread upon opening the startup cache. WriteToDisk() is also called from the main thread in the destructor. Ie in the these cases these are inherently synchronous operations, makes no sense to do them on a worker thread.
The write-thread is spawned on a timeout while the app is running. This way we can avoid a slow shutdown. After writing out the cache, the zipreader is reloaded on the same thread.
WaitOnWriteThread() can be called from a main thread to wait for the worker thread to finish. However since the same code is used in the worker thread and main thread, the worker thread can also call WaitOnWriteThread() which does a thread.join on itself which is a no-op.
It's a bit messy, but this is the simplest threading model I could come up with for startupcache.
Assignee | ||
Comment 9•14 years ago
|
||
Comment on attachment 488494 [details] [diff] [review]
prepare startup cache for off-main thread writing
I'm guessing benh is too busy atm
Attachment #488494 -
Flags: review?(ben.hsieh) → review?(dwitte)
(In reply to comment #8)
This seems OK. Just ensure all that makes it into code comments ;).
Reporter | ||
Comment 11•14 years ago
|
||
Hey, sorry about the delay. (As though I ever could get _my_ patches reviewed in less than a week, ;) ).
Just a few small comments:
>+ nsCOMPtr<nsIZipWriter> zipW = do_CreateInstance("@mozilla.org/zipwriter;1");
>+ if (!zipW)
>+ return;
Probably want a NS_WARNING in this case, as below.
>@@ -388,7 +345,6 @@ StartupCache::InvalidateCache()
>
> // This is usually closed, but it's possible to get into
> // an inconsistent state.
>- mZipW->Close();
I think you can remove the comment now, right?
>+nsresult
>+StartupCache::ResetStartupWriteTimer()
>+{
>+ mStartupWriteInitiated = PR_FALSE;
Do you still need this variable?
From the other patch:
>+ //stick a functiontimer thing here
>+ //NS_WARNING("Waiting on startupcache write");
Are you still planning on doing either of these?
> nsresult
> StartupCache::PutBuffer(const char* id, const char* inbuf, PRUint32 len)
> {
>+ WaitOnWriteThread();
This Wait is only necessary in debug, right? Otherwise the PutBuffer doesn't check the existing archive or in-memory table at all? (I'm not sure about this, though.)
So basically the only case where we would block at all is if we did a write, then nothing for 10 minutes, then did another access operation, correct?
Also, maybe we need another test in TestStartupCache now to test interweaving read/writes. One suggestion for that is, you could make ResetStartupWriteTimer() take a param that says how long to wait, which defaults to 10 seconds. Then you can make that shorter in tests.
Other than that, patch looks pretty good. Thanks for the cleanup bits too.
Updated•14 years ago
|
tracking-fennec: ? → 2.0+
Comment 12•14 years ago
|
||
Comment on attachment 488494 [details] [diff] [review]
prepare startup cache for off-main thread writing
>diff --git a/startupcache/StartupCache.cpp b/startupcache/StartupCache.cpp
>@@ -334,7 +286,7 @@ CacheCloseHelper(const nsACString& key,
> NS_ASSERTION(NS_SUCCEEDED(rv) && hasEntry == PR_FALSE,
> "Existing entry in disk StartupCache.");
> #endif
>- rv = writer->AddEntryStream(key, 0, 0, stream, false);
>+ rv = writer->AddEntryStream(key, holder->time, PR_TRUE, stream, false);
Why is providing holder->time relevant? Are you planning to use it later? It costs cycles to compute it...
>@@ -388,7 +345,6 @@ StartupCache::InvalidateCache()
>
> // This is usually closed, but it's possible to get into
> // an inconsistent state.
>- mZipW->Close();
No need for the comment now.
>@@ -599,20 +551,29 @@ StartupCacheWrapper::StartupWriteComplet
> return NS_OK;
> }
>
>+
Extra newline.
>+nsresult
>+StartupCache::ResetStartupWriteTimer()
>+{
Move this up with the other SC methods, rather than by SCW?
>+ mStartupWriteInitiated = PR_FALSE;
>+ nsresult rv;
>+ if (!mTimer)
>+ mTimer = do_CreateInstance("@mozilla.org/timer;1", &rv);
>+ else
>+ rv = mTimer->Cancel();
>+ NS_ENSURE_SUCCESS(rv, rv);
>+ // Wait for 10 seconds, then write out the cache.
Presumably you mean 60. :)
Also, throw some newlines around?
Can you cancel the timer in ~StartupCache? Also, do you want to force a write there, or throw away any unwritten data?
r=dwitte with fixes. Tests? Talos results?
Attachment #488494 -
Flags: review?(dwitte) → review+
Assignee | ||
Comment 13•14 years ago
|
||
> >- rv = writer->AddEntryStream(key, 0, 0, stream, false);
> >+ rv = writer->AddEntryStream(key, holder->time, PR_TRUE, stream, false);
>
> Why is providing holder->time relevant? Are you planning to use it later? It
> costs cycles to compute it...
It's useful for analysis of how startup cache is populated/etc. In the future it will be useful for automatic defrag-logic. CPU overhead is irrelevant in this case.
>
> Can you cancel the timer in ~StartupCache? Also, do you want to force a write
> there, or throw away any unwritten data?
>
> r=dwitte with fixes. Tests? Talos results?
It's already canceled+flushed. No tests, this just changes existing behavior to be more efficient, no new functionality.
Comment 14•14 years ago
|
||
(In reply to comment #13)
> > >- rv = writer->AddEntryStream(key, 0, 0, stream, false);
> > >+ rv = writer->AddEntryStream(key, holder->time, PR_TRUE, stream, false);
> >
> > Why is providing holder->time relevant? Are you planning to use it later? It
> > costs cycles to compute it...
>
> It's useful for analysis of how startup cache is populated/etc. In the future
> it will be useful for automatic defrag-logic. CPU overhead is irrelevant in
> this case.
>
> >
> > Can you cancel the timer in ~StartupCache? Also, do you want to force a write
> > there, or throw away any unwritten data?
> >
> > r=dwitte with fixes. Tests? Talos results?
>
> It's already canceled+flushed. No tests, this just changes existing behavior to
> be more efficient, no new functionality.
you can haz approval?
Assignee | ||
Comment 15•14 years ago
|
||
Addressed review comments
Attachment #487702 -
Attachment is obsolete: true
Attachment #488494 -
Attachment is obsolete: true
Attachment #498255 -
Flags: review+
Assignee | ||
Comment 16•14 years ago
|
||
Added comments on threading as requested
Attachment #488599 -
Attachment is obsolete: true
Attachment #498257 -
Flags: review?(jones.chris.g)
Comment on attachment 498257 [details] [diff] [review]
run zipwriter + reload zip on a thread
I still don't particularly like WaitOnWriteThread() but wouldn't r- for that.
Attachment #498257 -
Flags: review?(jones.chris.g) → review+
Updated•14 years ago
|
Whiteboard: [softblocker]
Assignee | ||
Comment 18•14 years ago
|
||
http://hg.mozilla.org/mozilla-central/rev/c2cc892c5d53
http://hg.mozilla.org/mozilla-central/rev/908f598737d7
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Whiteboard: [softblocker]
Assignee | ||
Comment 19•14 years ago
|
||
backed out the threading part in case it was causing an intermittent gc-on-shutdown crash
http://hg.mozilla.org/mozilla-central/rev/dc0cff820b22
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 20•14 years ago
|
||
Are you joining the helper thread on xpcom-shutdown? We mustn't leave threads running past the XPCOM shutdown process.
Whiteboard: [softblocker]
Assignee | ||
Comment 21•14 years ago
|
||
(In reply to comment #20)
> Are you joining the helper thread on xpcom-shutdown? We mustn't leave threads
> running past the XPCOM shutdown process.
No, I this patch joins the thread from startupcache destructor. Perhaps that happens too late in shutdown.
Comment 22•14 years ago
|
||
Yes, that happens too late.
Assignee | ||
Comment 23•14 years ago
|
||
This is the same patch as before, but it joins the writer thread on shutdown.
Attachment #498257 -
Attachment is obsolete: true
Attachment #505205 -
Flags: approval2.0?
Updated•14 years ago
|
Attachment #505205 -
Flags: approval2.0? → approval2.0+
Assignee | ||
Comment 24•14 years ago
|
||
Status: REOPENED → RESOLVED
Closed: 14 years ago → 14 years ago
Resolution: --- → FIXED
Comment 25•14 years ago
|
||
Looks like the original landing caused a small Ts regression (~40ms) in Mobile
before:
663b7d2d4b6d ts: 6881.78
after:
22a635e15db1 ts: 6923.56
Unexpected?
Assignee | ||
Comment 26•14 years ago
|
||
(In reply to comment #25)
> Looks like the original landing caused a small Ts regression (~40ms) in Mobile
>
> before:
> 663b7d2d4b6d ts: 6881.78
>
> after:
> 22a635e15db1 ts: 6923.56
>
> Unexpected?
I turned on compression to reduce cold io :(.
You need to log in
before you can comment on or make changes to this bug.
Description
•