Reduce time spent decompressing StartupCache entries
Categories
(Toolkit :: Startup and Profile System, enhancement, P3)
Tracking
()
People
(Reporter: dthayer, Assigned: dthayer)
References
(Blocks 2 open bugs, Regressed 2 open bugs)
Details
(Keywords: main-thread-io, perf, Whiteboard: [fxperf:p1])
Attachments
(10 files, 2 obsolete files)
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details |
On reference hardware I see anywhere from 600ms to 3000ms spent in inflate under StartupCache::GetBuffer. I think we can do better with lz4. Currently we compress at compression level 1 with deflate, I imagine because compressing with deflate at max compression is too expensive. With lz4 I'm able to compress at its level 6 while taking less time, meaning the overall filesize is about the same, while taking an order of magnitude less time.
Updated•6 years ago
|
Assignee | ||
Comment 1•6 years ago
|
||
[fxperf:p1] because I'm currently working on it.
Assignee | ||
Comment 2•6 years ago
|
||
Assignee | ||
Comment 3•6 years ago
|
||
I opted to go with what I perceived as the more expedient route
of leaving lz4 roughly where it is and just adding to that. The
biggest complication was xxhash, which is included elsewhere.
I'm not generally proficient with build-related things though so
my solution may be wrong and not just ugly.
Assignee | ||
Comment 4•6 years ago
|
||
Edits to all of the other locations that use nsIZipWriter to
follow. This only switches StartupCache. Regarding the substance
of the change, though, I opted for lz4 because it's the fastest
by a long shot on decompress, so we'll pay the smallest cost for
warm startups, and at compression level 6 it's able to match the
compression ratio of deflate at level 1, while still being
faster to compress. We could potentially opt for a higher
compression ratio alg, but I think the shutdown cost of doing so
may not be worth paying, considering we already opted to compress
with deflate at 1.
Depends on D30640
Assignee | ||
Comment 5•6 years ago
|
||
I split this out to hopefully make it easier to digest the more
substantive changes in the previous commit. The changes here
should just leave the existing behavior unchanged.
Depends on D30641
Assignee | ||
Comment 6•6 years ago
|
||
This will not behave exactly the same if we had previously written bad
data for the entry that would fail to decompress. I imagine this is rare
enough, and the consequences are not severe enough, that this should be
fine.
Depends on D30642
Assignee | ||
Comment 7•6 years ago
|
||
I thought I had already written out the patch to remove these, but
apparently not. Per discussion in the startup cache telemetry bug,
there should be no reason for doing this.
Depends on D30640
Updated•6 years ago
|
Assignee | ||
Comment 8•6 years ago
|
||
This just splits out the InputBuffer and OutputBuffer helper classes
to make it cleaner for the StartupCache to include them.
Depends on D30643
Assignee | ||
Comment 9•6 years ago
|
||
I am not aware of anything that depends on StartupCache being a
zip file, and since I want to use lz4 compression because inflate
is showing up quite a lot in profiles, it's simplest to just use
a custom format. This loosely mimicks the ScriptPreloader code,
with a few diversions:
- Obviously the contents of the cache are compressed. I used lz4
for this as I hit the same file size as deflate at a compression
level of 1, which is what the StartupCache was using previously,
while decompressing an order of magnitude faster. Seemed like
the most conservative change to make. I think it's worth
investigating what the impact of slower algs with higher ratios
would be, but for right now I settled on this. We'd probably
want to look at zstd next. - I use streaming compression for this via lz4frame. This is not
strictly necessary, but has the benefit of not requiring as
much memory for large buffers, as well as giving us a built-in
checksum, rather than relying on the much slower CRC that we
were doing with the zip-based approach. - I coded the serialization of the headers inline, since I had to
jump back to add the offset and compressed size, which would
make the nice Code(...) method for the ScriptPreloader stuff
rather more complex. Open to cleaner solutions, but moving it
out just felt like extra hoops for the reader to jump through
to understand without the benefit of being more concise.
Depends on D34651
Assignee | ||
Comment 10•6 years ago
|
||
We should be able to just let consumers borrow the buffer, as long
as we can guarantee that we'll keep it around. We shouldn't be
invalidating the cache with any significant frequency, so we
should be able to just hold onto old items indefinitely.
Depends on D34652
Assignee | ||
Comment 11•6 years ago
|
||
The first run loads more things into the StartupCache than are
used on the second and subsequent runs. This just ensures that if
the StartupCache diverges too far from its actual use that we will
rebuild it.
Depends on D34653
Updated•6 years ago
|
Updated•6 years ago
|
Assignee | ||
Comment 12•6 years ago
|
||
I updated the patch to just change the StartupCache's underlying format entirely, rather than extending nsIZipWriter to support lz4, as it's nonstandard. This means that D30640 is no longer strictly necessary, but I would like to keep it for reasons mentioned in the description of D34652.
This also facilitated eliminating some copies and making it easier to compact the cache if it's larger than it needs to be. Removing unused items from the cache means that every page that straddles the boundary between two cache items is fully utilized.
Talos seems rather happy with the changes overall. In my original approach, I compressed the cache as one big chunk and decompressed the whole thing on initialize. This had more dramatic results, but strangely it improved WebRender-enabled ts_paint while degrading WR-disabled ts_paint. I couldn't get this effect to go away, so I went with the per-item compression.
Assignee | ||
Comment 13•5 years ago
|
||
Does what it says on the tin. Once we have a central scheduling
system this should likely just consume that.
Depends on D34654
Assignee | ||
Comment 14•5 years ago
|
||
Kris, could you give an update on the status of the reviews of this? Does it look like it will be generally acceptable or do you foresee major changes required?
Comment 15•5 years ago
|
||
Hey kmag, any chance you'll have a chance to look at this soon? Or would it be best to find a different reviewer?
Comment 16•5 years ago
|
||
Hey froydnj, would you be a good candidate to review these last few patches?
Comment 17•5 years ago
|
||
(In reply to Mike Conley (:mconley) (:⚙️) from comment #16)
Hey froydnj, would you be a good candidate to review these last few patches?
Sure, I can look at these. Doug, throw these into my phabricator queue, please?
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Comment 19•5 years ago
|
||
Depends on D35454
Assignee | ||
Comment 20•5 years ago
|
||
Quick update on where this is: I ran it through Talos to check that the numbers were still good, and somehow a ts_paint regression has crept in, so I'm trying to diagnose where this was introduced, as ts_paint was something like 5% green in the earlier versions of the patches.
Assignee | ||
Comment 21•5 years ago
|
||
I am almost certain that the above result is spurious at this point. ts_paint performs better locally, startup profiles better locally (less time spend in the StartupCache), and ts_paint on try profiles better all with these patches applied. I think stuff is just being shuffled around by this performing faster such that it happens to be a little worse on the hardware we're running this on.
Waiting for profiles on try from shippable builds, but assuming those come in looking the same my plan is just to button this up and take the nominal loss.
Assignee | ||
Comment 22•5 years ago
|
||
Adds an explicit error mapping function to Result, to simplify
using MOZ_TRY... macros.
Depends on D34651
Comment 23•5 years ago
|
||
Comment 24•5 years ago
|
||
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=268934700&repo=autoland
https://treeherder.mozilla.org/logviewer.html#?job_id=268934680&repo=autoland
Backout link: https://hg.mozilla.org/integration/autoland/rev/f43fd233453d0ffdf1a866e0d4061a2c5a9e7f46
Comment 25•5 years ago
|
||
Comment 26•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/ea19d193ebbb
https://hg.mozilla.org/mozilla-central/rev/eb37df990734
https://hg.mozilla.org/mozilla-central/rev/7915439e3c10
https://hg.mozilla.org/mozilla-central/rev/fbba9fe43303
https://hg.mozilla.org/mozilla-central/rev/8cb90c97dddb
https://hg.mozilla.org/mozilla-central/rev/992e2a6359f9
https://hg.mozilla.org/mozilla-central/rev/849df6250b7e
https://hg.mozilla.org/mozilla-central/rev/c90c4ca6cd25
https://hg.mozilla.org/mozilla-central/rev/36a2b9138e11
https://hg.mozilla.org/mozilla-central/rev/0c4837041392
Comment 27•5 years ago
|
||
Thanks a lot, it looks like it improved DevTools startup quite significantly!
== Change summary for alert #23353 (as of Sat, 05 Oct 2019 21:21:33 GMT) ==
Improvements:
43% damp console.openwithcache.open.DAMP linux64-shippable opt e10s stylo 506.01 -> 287.60
7% damp simple.webconsole.open.DAMP linux64-shippable opt e10s stylo 295.81 -> 275.02
7% damp complicated.webconsole.open.DAMP windows10-64-shippable-qr opt e10s stylo 322.21 -> 300.79
6% damp cold.webconsole.open.DAMP windows10-64-shippable opt e10s stylo 306.30 -> 287.18
6% damp cold.webconsole.open.DAMP windows10-64-shippable-qr opt e10s stylo 305.04 -> 286.02
...
For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=23353
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 29•5 years ago
|
||
Update on this for any interested parties: it looks like we improved median time to first paint by something like 5-10% with this change. Here is the raw data - med_val is the median time to first paint, and the change landed on the 20191005 build:
----- before change -----
build: 20190917, median_first_paint: 2507
build: 20190918, median_first_paint: 2385
build: 20190919, median_first_paint: 2254
build: 20190920, median_first_paint: 2381
build: 20190921, median_first_paint: 2416
build: 20190922, median_first_paint: 2361
build: 20190923, median_first_paint: 2395
build: 20190924, median_first_paint: 2407
build: 20190925, median_first_paint: 2492
build: 20190926, median_first_paint: 2507
build: 20190927, median_first_paint: 2383
build: 20190928, median_first_paint: 1991
build: 20190929, median_first_paint: 2431
build: 20190930, median_first_paint: 2364
build: 20191001, median_first_paint: 2349
build: 20191002, median_first_paint: 2359
build: 20191003, median_first_paint: 2352
build: 20191004, median_first_paint: 2401
mean value: 2374
----- after change -----
build: 20191005, median_first_paint: 2246
build: 20191006, median_first_paint: 2223
build: 20191007, median_first_paint: 2182
mean value: 2217
6.61% improvement
The builds after 20191007 show even lower (better) numbers, but it's too early to tell if they are valid.
Comment 30•5 years ago
|
||
Just FTR, as it may be too late to be worth doing anything about it: I think this bug should have updated the CLOBBER file -- at least, I've had broken builds on a couple of machines recently, complaining about "No rule to make target '..../mfbt/lz4.c', needed by 'lz4.obj'", which I believe is a result of the file moves here (lz4.{c,h} moved into a subdirectory). Clobbering the objdir (where that out-of-date dependency is recorded) fixes this.
Comment 31•5 years ago
|
||
(In reply to Jonathan Kew (:jfkthame) from comment #30)
Just FTR, as it may be too late to be worth doing anything about it: I think this bug should have updated the CLOBBER file -- at least, I've had broken builds on a couple of machines recently, complaining about "No rule to make target '..../mfbt/lz4.c', needed by 'lz4.obj'", which I believe is a result of the file moves here (lz4.{c,h} moved into a subdirectory). Clobbering the objdir (where that out-of-date dependency is recorded) fixes this.
I filed bug 1591172 about this, as the issue reappeared for me even after clobbering.
Comment 32•5 years ago
|
||
Have you by any chance relaxed the test to print out the warning result.isErr() in the following code a bit?
I am trying to figure out why I see many errors in locally executed |mach xpcshell-tests| of C-C TB (FULL DEBUG version) in the last month (it ran OK in August.).
https://bugzilla.mozilla.org/show_bug.cgi?id=1589779
While I am analyzing the many cases of indexedDB failures now observed in the local log, I noticed many warnings of the following form:
77:57.81 pid:29914 [29914, Main Thread] WARNING: 'result.isErr()', file /NEW-SSD/NREF-COMM-CENTRAL/mozilla/startupcache/StartupCache.cpp, line 173
I am not sure if the warning is related to the indexedDB error I observe or not.
But it could be that maybe xpcshell-tests harness may not populate the startup cache correctly or something.
I just wanted to be sure that the warning could be safely ignored and unrelated to the indexedDB failures I observe.
At the same time, it may be nice if we can reduce the warning to a single line instead of producing it many times (maybe very difficult to do.)
Anyway, just FYI.
TIA
Comment 33•5 years ago
|
||
I’ve also noticed an increase in these warnings since this patch
landed. It can be reproduced outside of xpcshell just by using the
-h
flag:
% ./mach run -- -h
0:00.47 /home/ato/src/gecko/obj-x86_64-pc-linux-gnu/dist/bin/firefox -h -no-remote -profile /home/ato/src/gecko/obj-x86_64-pc-linux-gnu/tmp/profile-default
Usage: /home/ato/src/gecko/obj-x86_64-pc-linux-gnu/dist/bin/firefox [ options ... ] [URL]
where options include:
X11 options
--display=DISPLAY X display to use
--sync Make X calls synchronous
--g-fatal-warnings Make all warnings fatal
Firefox options
-h or --help Print this message.
-v or --version Print Firefox version.
-P <profile> Start with <profile>.
--profile <path> Start with profile at <path>.
--migration Start with migration wizard.
--ProfileManager Start with ProfileManager.
--no-remote Do not accept or send remote commands; implies
--new-instance.
--new-instance Open new instance, not a new window in running instance.
--UILocale <locale> Start with <locale> resources as UI Locale.
--safe-mode Disables extensions and themes for this session.
--allow-downgrade Allows downgrading a profile.
--MOZ_LOG=<modules> Treated as MOZ_LOG=<modules> environment variable,
overrides it.
--MOZ_LOG_FILE=<file> Treated as MOZ_LOG_FILE=<file> environment variable,
overrides it. If MOZ_LOG_FILE is not specified as an
argument or as an environment variable, logging will be
written to stdout.
--headless Run without a GUI.
--save-recordings Save recordings for all content processes to a directory.
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
[622, Main Thread] WARNING: 'result.isErr()', file /home/ato/src/gecko/startupcache/StartupCache.cpp, line 173
Description
•