Closed Bug 340148 Opened 18 years ago Closed 15 years ago

nsFastLoadFileReader::ComputeChecksum is ~10% of cold-startup time

Categories

(Core :: XPCOM, defect, P2)

x86
Windows XP
defect

Tracking

()

RESOLVED DUPLICATE of bug 412796

People

(Reporter: darin.moz, Assigned: taras.mozilla)

References

(Depends on 1 open bug)

Details

(Keywords: perf, Whiteboard: [ts][patches in dependent bugs])

I profiled a cold-start of trunk Firefox on windows (static build w/ symbols), and I found that we spend about 10% of the time computing checksums on the two fastload files.

There's a comment in the code saying that the 8k buffer size ought to be tuned.  Maybe we can do even better.  Maybe we can avoid having to read in the entire fastload file since otherwise the file might not be read in its entirety (assuming it might contain data for windows that are not opened at startup).
Status: NEW → ASSIGNED
Keywords: perf
Priority: -- → P4
Wow 10% is a lot. Having faster startup time is something that's VERY important for a lot of users.
For the suite, there could be lots of data to checksum not needed for first browser window.  For Firefox, it seems to me the dialogs and any non-browser window data will be relatively small.

I suspect if we mmapped the whole file and ran over it to checksum it, warming it up for deserialization, we would see the profile data shift away from i/o.  I'm assuming the checksum computation is tiny -- it should be.

/be
-> reassign to default owner
Assignee: darin.moz → nobody
Status: ASSIGNED → NEW
Taking, want to deCOMtaminate FastLoad.

/be
Assignee: nobody → brendan
Depends on: 104170
Blocks: 195010
Changing target to Future since 1.9a1 has long shipped. Brendan, is this still on the radar for 1.9?
Target Milestone: mozilla1.9alpha1 → Future
Flags: wanted1.9.1?
Whiteboard: [ts]
results from the FunctionTimer patch bug 480735:

ddahl-t500 1~/code/moz/mozilla-central/mozilla % ../obj-i686-pc-linux-gnu-optimize/dist/bin/firefox -P cold -no-remote
ddahl-t500 ~/code/moz/mozilla-central/mozilla % cat ~/logs/startup.txt | grep Compute                                
-1243674240[b5a18060]: > ComputeChecksum (/home/ddahl/code/moz/mozilla-central/mozilla/xpcom/io/nsFastLoadService.cpp:391)
-1243674240[b5a18060]: <     2 ms (    2 ms total) - ComputeChecksum (/home/ddahl/code/moz/mozilla-central/mozilla/xpcom/io/nsFastLoadService.cpp:391)
-1243674240[b5a18060]: > ComputeChecksum (/home/ddahl/code/moz/mozilla-central/mozilla/xpcom/io/nsFastLoadService.cpp:391)
-1243674240[b5a18060]: <     1 ms (    1 ms total) - ComputeChecksum (/home/ddahl/code/moz/mozilla-central/mozilla/xpcom/io/nsFastLoadService.cpp:391)
ddahl-t500 ~/code/moz/mozilla-central/mozilla %
sorry the last log data was warm startup, here are the cold startup numbers on Linux/Trunk:

ddahl-t500 139~/code/moz/mozilla-central/mozilla % cat ~/logs/startup.txt|grep Compute
-1243018880[b5b18060]: > ComputeChecksum (/home/ddahl/code/moz/mozilla-central/mozilla/xpcom/io/nsFastLoadService.cpp:391)
-1243018880[b5b18060]: <    76 ms (   76 ms total) - ComputeChecksum (/home/ddahl/code/moz/mozilla-central/mozilla/xpcom/io/nsFastLoadService.cpp:391)
-1243018880[b5b18060]: > ComputeChecksum (/home/ddahl/code/moz/mozilla-central/mozilla/xpcom/io/nsFastLoadService.cpp:391)
-1243018880[b5b18060]: <    33 ms (   33 ms total) - ComputeChecksum (/home/ddahl/code/moz/mozilla-central/mozilla/xpcom/io/nsFastLoadService.cpp:391)
-1243018880[b5b18060]: > ComputeChecksum (/home/ddahl/code/moz/mozilla-central/mozilla/xpcom/io/nsFastLoadService.cpp:391)
-1243018880[b5b18060]: <     0 ms (    0 ms total) - ComputeChecksum (/home/ddahl/code/moz/mozilla-central/mozilla/xpcom/io/nsFastLoadService.cpp:391)
I've also done some profiling on this.  I stuck some PRIntervalTimes into nsFastLoadFileReader::ComputeChecksum and ran 10 runs under simulated cold startup on OS X (i.e., sync && purge), where I'm getting 9-10s overall wall clock startup time.  Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.5; en-US; rv:1.9.3a1pre) Gecko/20090826 Minefield/3.7a1pre

Upshot is, I'm seeing ~3% of wall clock startup time, not 10%, with ~83% of that 3% in reads and ~2% of that 3% in NS_AccumulateFastLoadChecksum, confirming comment 2.  (Despite using PR_IntervalToMicroseconds, I'm getting only ms granularity; besides CPU, the remaining ~15% of that 3% must be lost precision, including intervals < 1ms (Seeks and Tells).)

Data follows.  Numbers are times in ms.  "total" is total time spent in nsFastLoadFileReader::ComputeChecksum for each of the 10 runs.  "read" is total time spent in Read calls from nsFastLoadFileReader::ComputeChecksum for each of the 10 runs.  "NS_AccumulateFastLoadChecksum" is total time spent in NS_AccumulateFastLoadChecksum calls from nsFastLoadFileReader::ComputeChecksum for each of the 10 runs.

> total <- c(314.0,94.0,92.0,94.0,610.0,516.0,256.0,94.0,154.0,564.0)
> mean(total)
[1] 278.8
> median(total)
[1] 205
> sd(total)
[1] 211.339
> min(total)
[1] 92
> max(total)
[1] 610
> read <- c(307.0,79.0,82.0,90.0,602.0,116.0,246.0,83.0,148.0,555.0)
> mean(read)
[1] 230.8
> median(read)
[1] 132
> sd(read)
[1] 198.8359
> min(read)
[1] 79
> max(read)
[1] 602
> NS_AccumulateFastLoadChecksum <- c(5.0,11.0,4.0,1.0,5.0,8.0,9.0,8.0,4.0,4.0)
> mean(NS_AccumulateFastLoadChecksum)
[1] 5.9
> median(NS_AccumulateFastLoadChecksum)
[1] 5
> sd(NS_AccumulateFastLoadChecksum)
[1] 2.998148
> min(NS_AccumulateFastLoadChecksum)
[1] 1
> max(NS_AccumulateFastLoadChecksum)
[1] 11
> mean(read) / mean(total)
[1] 0.8278336
> mean(NS_AccumulateFastLoadChecksum) / mean(total)
[1] 0.02116212
> wall_clock <- 9500
> mean(total) / wall_clock
[1] 0.02934737
BTW, mmapping per comment 2 is being worked on in bug 412796.
Found out that we checksum 7mb worth of stuff every startup, only to read 200K of it.

I wonder if we should disable fastload checksums while we rewrite the fastload system to be less slow?
http://people.mozilla.com/~tglek/less_fasl.diff saves 12% of warm startup on n810. Unfortunately cold startup time went to hell due to throughput decreasing by 90%. 

perhaps doing more aggressive mmap than bug 412796, with correct madvise flags will fix the throughput.
Fastload writes the file when we are slow-loading, so are we really pushing out XDR'ed bytes for more files than we read in when we next hit the .mfasl file? Or are we pushing out more bytes for some reason?

/be
(In reply to comment #12)
> Fastload writes the file when we are slow-loading, so are we really pushing out
> XDR'ed bytes for more files than we read in when we next hit the .mfasl file?
> Or are we pushing out more bytes for some reason?
> 
> /be

yes during component registration every file is read in...which happens to coincide with when fastload is created(i think).
Argh, how asymmetric -- easy to fix?

/be
(In reply to comment #14)
> Argh, how asymmetric -- easy to fix?
> 
> /be

well if I do what you are implying, it shouldn't be hard to not write fasl files during comporent registration. But then we'll get burned during execution since other components and modules would be slowloaded.

I think the proper fix is to do what zip files do and store checksums per fastloaded object. Furthermore, we should set the file ordering to be what startup looks like so it's a short sequential read on startup. That doesn't sound easy at all.
(In reply to comment #10)
> Found out that we checksum 7mb worth of stuff every startup, only to read 200K
> of it.

That wasn't completely correct. The XPC.mfasl file is only ~ 2mb, so we checksum 2mb of it. But in the end due to spreading files all over the place and buffering aggressively we read in 7mb out of the 2mb file during startup.
Depends on: 517956
(In reply to comment #16)
> (In reply to comment #10)
> > Found out that we checksum 7mb worth of stuff every startup, only to read 200K
> > of it.
> 
> That wasn't completely correct. The XPC.mfasl file is only ~ 2mb, so we
> checksum 2mb of it. But in the end due to spreading files all over the place
> and buffering aggressively we read in 7mb out of the 2mb file during startup.

Sounds like bug 195010 is indeed biting.

/be
Conclusion:
Did some more digging. I think I got confused by xdr api on friday. In some cases we do indeed read the entire fastload file on startup, but it's not as bad as i thought.

 If a fastload file is written during normal(non-component-reg) startup then we do not overread due to checksum/etc. 

If a fastload is written during compreg we can overread, by around 2.25x due to checksoming. 

Together with overreading due to buffering as suggested in bug 195010, we end up with 2.5x more io, than filesize.

I think mmap io(something like bug 504864, but more aggressive) is going to solve the reading-multiple times problem.
Assignee: brendan → tglek
Depends on: 412796
Severity: minor → normal
Priority: P4 → P2
Target Milestone: Future → ---
Whiteboard: [ts] → [ts][patches in dependent bugs]
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE
Flags: wanted1.9.1?
You need to log in before you can comment on or make changes to this bug.