Closed Bug 1519184 Opened 6 years ago Closed 6 years ago

startup cache is purged on second run of windows builds

Categories

(Toolkit :: Startup and Profile System, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla66
Tracking Status
firefox66 --- fixed

People

(Reporter: mossop, Assigned: mossop)

Details

Attachments

(1 file)

When Firefox runs we cache a number of things, JS, XUL, XBL, CSS etc. Whenever a new build of Firefox runs against a profile we clear the cache to rebuild with the new information.

On the first run of a profile where compatibility.ini doesn't exist (presumably to cover the case of upgrading from a very old versions of Firefox, but also affects new profiles) we attempt to clear the cache as well.

If any attempt to clear the cache fails we set a flag in compatibility.ini telling us to clear the cache on next startup.

Unfortunately nsIFile.remove returns different error codes on different platforms when the file in question already doesn't exist. So an attempt to remove a cache that does't exist is counted as a failure on Windows and so the cache will be cleared again on second run.

When Firefox runs we cache a number of things, JS, XUL, XBL, CSS etc. Whenever a new build of Firefox runs against a profile we clear the cache to rebuild with the new information. On the first run of a profile where compatibility.ini doesn't exist (presumably to cover the case of upgrading from a very old versions of Firefox, but also affects new profiles) we attempt to clear the cache as well. If any attempt to clear the cache fails we set a flag in compatibility.ini telling us to clear the cache on next startup. Unfortunately nsIFile.remove returns different error codes on different platforms when the file in question already doesn't exist. So an attempt to remove a cache that does't exist is counted as a failure on Windows and so the cache will be cleared again on second run. This change counts the file not found return code as counting as a success when clearing the cache.

Added this to the "this week I learned" section of our meeting notes, fwiw...

Filed bug 1519200 as a follow-up.

Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66

Noticed some big tp5n improvements on Windows thanks to this! \0/

== Change summary for alert #18689 (as of Thu, 10 Jan 2019 18:41:11 GMT) ==

Improvements:

31% tp5n time_to_session_store_window_restored_ms windows7-32 pgo e10s stylo 1,043.21 -> 723.74
22% tp5n nonmain_startup_fileio windows7-32 opt e10s stylo 2,572,989.83 -> 2,007,252.17
21% tp5n time_to_session_store_window_restored_ms windows7-32 opt e10s stylo 1,018.64 -> 800.80
21% tp5n nonmain_startup_fileio windows7-32 pgo e10s stylo 2,525,237.75 -> 1,995,149.17
20% tp5n time_to_session_store_window_restored_ms windows10-64 pgo e10s stylo 821.90 -> 653.93

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=18689

Yeah, essentially those are just because it now uses the startup cache when doing measurements. Those improvements won't really affect users except on the second startup of a build.

Dave, I noticed a 27% tp5n main_normal_fileio regression.
Was this also expected?

Flags: needinfo?(dtownsend)

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #8)

Dave, I noticed a 27% tp5n main_normal_fileio regression.
Was this also expected?

This change would certainly change the file i/o characteristics of talos tests that run in the second run of Firefox, making them much closer to what users see in general. I'm not sure what tp5n main_normal_fileio is though so I can't guess why it regressed. Is there any info on it?

Flags: needinfo?(dtownsend) → needinfo?(igoldan)

(In reply to Dave Townsend [:mossop] (he/him) from comment #9)

(In reply to Ionuț Goldan [:igoldan], Performance Sheriffing from comment #8)

Dave, I noticed a 27% tp5n main_normal_fileio regression.
Was this also expected?

This change would certainly change the file i/o characteristics of talos tests that run in the second run of Firefox, making them much closer to what users see in general. I'm not sure what tp5n main_normal_fileio is though so I can't guess why it regressed. Is there any info on it?

This is a xperf subtest and you can read more about this suite from here. I think main_normal_fileio stands for the main thread's disk I/O usage after the browser started up.

Am I correct, Aaron?

Flags: needinfo?(igoldan) → needinfo?(aklotz)

Correct, where the delineation between "startup" and "normal" is the sessionstore-windows-restored event.

Flags: needinfo?(aklotz)

This sounds odd then. Is there any way to see the files being accessed?

In the non-cached case wouldn't we just be loading files out of the jar, whereas in the cached case, we'd be loading jar file + the startup cache file? That could explain more i/o.

(In reply to Brendan Dahl [:bdahl] from comment #13)

In the non-cached case wouldn't we just be loading files out of the jar, whereas in the cached case, we'd be loading jar file + the startup cache file? That could explain more i/o.

Sure, but I'd expect all the cached stuff to be loaded by the time sessionstore-windows-restored fired.

(In reply to Dave Townsend [:mossop] (he/him) from comment #12)

This sounds odd then. Is there any way to see the files being accessed?

Needinfo for this, so we can get to the bottom of this. Given some of the other "interesting" hangs/issues I've seen when I messed with startupcache (bug 1445739), I'd really like to understand this better. I suspect shutdown IO to the cache for things that didn't load on first startup but loaded on second startup, but lists of files etc. would really help.

Flags: needinfo?(igoldan)

(Also, it's Not Good that the regression is for mainthread IO.)

(In reply to Dave Townsend [:mossop] (he/him) from comment #14)

(In reply to Brendan Dahl [:bdahl] from comment #13)

In the non-cached case wouldn't we just be loading files out of the jar, whereas in the cached case, we'd be loading jar file + the startup cache file? That could explain more i/o.

Sure, but I'd expect all the cached stuff to be loaded by the time sessionstore-windows-restored fired.

Never mind, missed that this was after the session windows restored, was thinking it was startup.

(In reply to :Gijs (he/him) from comment #15)

(In reply to Dave Townsend [:mossop] (he/him) from comment #12)

This sounds odd then. Is there any way to see the files being accessed?

Needinfo for this, so we can get to the bottom of this.

Aaron, could you provide some assistance here? It would save us some time, as I'm not too familiar with xperf tests.

Flags: needinfo?(igoldan) → needinfo?(aklotz)

I seem to recall that at one point we uploaded the original etl file as an artifact, but I'm not seeing any here.

Joel, am I recalling that correctly? Or do we only do that if the xperf suite fails?

Flags: needinfo?(aklotz) → needinfo?(jmaher)

I believe it is only if xperf fails. You can always push to try and make it happen with some small edits. I have always wanted to make xperf easier to debug (i.e. retrigger and collect etl/metrics)

Flags: needinfo?(jmaher)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: