Bugzilla

Comment 1

•

5 years ago

(In reply to Myk Melez [:myk] [@mykmelez] from comment #0)

Thanks for filing!

I don't see an fsync when I profile startup with an existing Firefox profile, however:

https://perfht.ml/2IFesME

That's still a lot of main thread I/O on your profile. These operations seem cheap in your profile, but I assume you ran this on a machine with an SSD.

Florian: is your startup profile possibly a first-run (or "first run after upgrade") profile?

It's a startup profile captured automatically by the startup main thread I/O test I'm writing in bug 1540135. So this is a slightly artificial situation as mochitests have some custom defaults to avoid network requests. It's closer to a first-run startup than to a normal warm startup though.

I've observed that startup in our test infrastructure is typically CPU bound, with I/O being very cheap. I would expect actual users (especially users with mechanical hard drives) to see much longer times for these I/O operations.

Flags: needinfo?(florian)

Whiteboard: [fxperf]

Comment 2

•

5 years ago

Here is what it looks like on a normal cold startup on the reference hardware: https://perfht.ml/2LgAOGD

Marco Bonardo [:mak]

•

5 years ago

(In reply to Myk Melez [:myk] [@mykmelez] from comment #3)

I have an idea for how to address this and will investigate…

Hi Myk. Can you tell us a bit more about your plans? :-)

Flags: needinfo?(myk)

Reporter

Comment 5

•

5 years ago

For XULStore in general, we should be able to preload its database on its background thread at the same time that URLPreloader pre-reads other URLs on its background thread.

For the specific XULStore values that earlyBlankFirstPaint() and nsXULWindow more generally requires to show the first browser (or blank) window—width, height, screenX, screenY, and sizemode—we may be able to speed up retrieval even more by storing them in bespoke storage that is optimized for retrieval of just those values.

For example, a binary file with the shortest sequence of bytes sufficient to represent those values in predetermined order (thus not requiring delimiters) would presumably be 9-10 bytes in size: two each for width, height, screenX, and screenY; plus one byte for sizemode; and perhaps another byte that identifies the format version (in case we want to change it in the future).

But I'll start by preloading XULStore in general.

Flags: needinfo?(myk)

Brian Grinstead [:bgrins]

Comment 6

•

5 years ago

(In reply to Myk Melez [:myk] [@mykmelez] from comment #5)

For the specific XULStore values that earlyBlankFirstPaint() and nsXULWindow more generally requires to show the first browser (or blank) window—width, height, screenX, screenY, and sizemode—we may be able to speed up retrieval even more by storing them in bespoke storage that is optimized for retrieval of just those values.

For example, a binary file with the shortest sequence of bytes sufficient to represent those values in predetermined order (thus not requiring delimiters) would presumably be 9-10 bytes in size: two each for width, height, screenX, and screenY; plus one byte for sizemode; and perhaps another byte that identifies the format version (in case we want to change it in the future).

Would this file be read on the main thread? If so, just storing these values in prefs (which are already available in memory at the time we display the early blank window) would be faster than needing to touch another file on disk.

Do you expect to be able to fix this bug for 68? If not, should we backout bug 1460811 or restrict it to nightly-only in the meantime?

Comment 7

•

5 years ago

(In reply to Florian Quèze [:florian] from comment #6)

Do you expect to be able to fix this bug for 68? If not, should we backout bug 1460811 or restrict it to nightly-only in the meantime?

Florian, are there any plans to add:

automated tests to capture this type of I/O change
a policy around when it's appropriate to backout/disable patches due to an I/O change

I understand that (1) is very difficult to do. But it seems harder to do (2) without it, and without (2) it's not clear to me how to make a decision in any particular case.

Brian Grinstead [:bgrins]

Comment 8

•

5 years ago

(In reply to Brian Grinstead [:bgrins] from comment #7)

(In reply to Florian Quèze [:florian] from comment #6)

Do you expect to be able to fix this bug for 68? If not, should we backout bug 1460811 or restrict it to nightly-only in the meantime?

Florian, are there any plans to add:

automated tests to capture this type of I/O change

a policy around when it's appropriate to backout/disable patches due to an I/O change

I understand that (1) is very difficult to do. But it seems harder to do (2) without it, and without (2) it's not clear to me how to make a decision in any particular case.

Yes, I've been working on this on and off for the last 3 months, and it landed yesterday: see bug 1540135.
Actually, this bug has been found because I had to add a whitelist entry for it between my try runs.
Sounds straightforward given the test is a mochitest: a new test failure means sheriffs will backout the patch that introduced new I/O.

The next question is: when is it fine to whitelist new I/O? The answer will be: if you can convince the front-end performance team who's looking after this test (mostly myself and mconley I guess) that this new main thread I/O is unavoidable.

Comment 9

•

5 years ago

(In reply to Florian Quèze [:florian] from comment #8)

Yes, I've been working on this on and off for the last 3 months, and it landed yesterday: see bug 1540135.
Actually, this bug has been found because I had to add a whitelist entry for it between my try runs.

Sounds straightforward given the test is a mochitest: a new test failure means sheriffs will backout the patch that introduced new I/O.

Thanks for the response - a new mochitest sounds great.

Reporter

Comment 10

•

5 years ago

(In reply to Florian Quèze [:florian] from comment #6)

Would this file be read on the main thread? If so, just storing these values in prefs (which are already available in memory at the time we display the early blank window) would be faster than needing to touch another file on disk.

That's currently true, and storing these values in prefs is the path of least resistance. But it isn't clear that it has to be true, so it's worth pushing on this assumption.

Do we really need to parse all prefs before we display the early blank window? Do we even need to parse any prefs? And if so, what's the subset that we actually need, and how do we isolate those, so we can load them first, and then display the window, before we load the rest?

Do you expect to be able to fix this bug for 68? If not, should we backout bug 1460811 or restrict it to nightly-only in the meantime?

Sorry, I should have mentioned this earlier: in bug 1547877 I re-added the old implementation of XULStore and configured the new implementation to be enabled only on Nightly (because of this issue and others). So the new implementation will not ride the trains to the 68 release.

(Nevertheless, this issue is on my list of the top three issues to resolve before we do allow the new implementation to ride the trains. So it remains a top priority for me.)

Status: NEW → ASSIGNED

Comment 11

•

5 years ago

(In reply to Myk Melez [:myk] [@mykmelez] from comment #10)

Do we really need to parse all prefs before we display the early blank window? Do we even need to parse any prefs? And if so, what's the subset that we actually need, and how do we isolate those, so we can load them first, and then display the window, before we load the rest?

We probably don't need to parse prefs, no. My unverified assumption is that parsing prefs is cheaper than reading a small file from the disk. I'm wondering if what you are suggesting is equivalent to storing prefs in a more efficient format.

I would expect the cases where prefs are expensive to parse to be due to large values having been stored in prefs by legacy add-ons.

Sorry, I should have mentioned this earlier: in bug 1547877 I re-added the old implementation of XULStore and configured the new implementation to be enabled only on Nightly (because of this issue and others). So the new implementation will not ride the trains to the 68 release.

Thanks! I think that means my I/O test will fail when merged to beta, and more conditions need to be added on some whitelist entries.

Reporter

Comment 12

•

5 years ago

(In reply to Florian Quèze [:florian] from comment #11)

My unverified assumption is that parsing prefs is cheaper than reading a small file from the disk. I'm wondering if what you are suggesting is equivalent to storing prefs in a more efficient format.

I don't think so, given the intentional flexibility of the prefs API, and since there are many prefs, even on firstrun with a new profile (70 user_pref lines in prefs.js after quitting Firefox following firstrun on my laptop)

I expect an optimally efficient format for storing/reading/parsing five specific integer values to be much more efficient than an optimally efficient format for storing/reading/parsing an arbitrary set of prefs.

(Nevertheless, it's worth considering whether we can improve prefs efficiency as well to reduce its startup cost.)

Thanks! I think that means my I/O test will fail when merged to beta, and more conditions need to be added on some whitelist entries.

Sorry for the bustage! Let me know how I can help with the cleanup. (The relevant configuration option is MOZ_NEW_XULSTORE, which is defined in toolkit/moz.configure and reflected into AppConstants.MOZ_NEW_XULSTORE.)

:Gijs (he/him)

Comment 13

•

5 years ago

(In reply to Myk Melez [:myk] [@mykmelez] from comment #12)

(In reply to Florian Quèze [:florian] from comment #11)

My unverified assumption is that parsing prefs is cheaper than reading a small file from the disk. I'm wondering if what you are suggesting is equivalent to storing prefs in a more efficient format.

I don't think so, given the intentional flexibility of the prefs API, and since there are many prefs, even on firstrun with a new profile (70 user_pref lines in prefs.js after quitting Firefox following firstrun on my laptop)

And we should consider what we can do about this more generally, esp. when it comes to state storage that isn't actually a user "preference", like "timestamp when we last started as stored by telemetry". But that's not really the case for this bug - this is actually user-configured stuff.

I expect an optimally efficient format for storing/reading/parsing five specific integer values to be much more efficient than an optimally efficient format for storing/reading/parsing an arbitrary set of prefs.

This makes sense for parsing, but that's not the only (nor biggest) cost here. If all the prefs consumers used this logic, things would probably get (a lot) worse because we'd end up with many many tiny separate files on separate blocks/sectors on disk, so on HDDs there'll be more seeking than just having 1 file, seeking which in practice will be more expensive in wall-clock time. Costs of file reads on most systems likely only start being vaguely related to their size once you need more than a few blocks/sectors of disk space (ie when there's a chance of fragmentation). As Florian already said, the CPU cost of parsing prefs is likely negligible on a lot of users' systems compared to the disk IO cost -- the opposite of how things pan out on talos (but even there, I'm fairly sure pref parsing doesn't show up in profiles as significant). Your example of 70 user_pref lines likely still fits in very few disk sectors (maybe even in only 1, depending on the size of the prefs and the type of disk etc.).

Reporter

Comment 14

•

5 years ago

(In reply to :Gijs (he/him) from comment #13)

(In reply to Myk Melez [:myk] [@mykmelez] from comment #12)

(In reply to Florian Quèze [:florian] from comment #11)

My unverified assumption is that parsing prefs is cheaper than reading a small file from the disk. I'm wondering if what you are suggesting is equivalent to storing prefs in a more efficient format.

I don't think so, given the intentional flexibility of the prefs API, and since there are many prefs, even on firstrun with a new profile (70 user_pref lines in prefs.js after quitting Firefox following firstrun on my laptop)

And we should consider what we can do about this more generally, esp. when it comes to state storage that isn't actually a user "preference", like "timestamp when we last started as stored by telemetry". But that's not really the case for this bug - this is actually user-configured stuff.

Agreed, it'd be useful to think about the issues more generally, in addition to whatever we do here for this issue specifically.

I expect an optimally efficient format for storing/reading/parsing five specific integer values to be much more efficient than an optimally efficient format for storing/reading/parsing an arbitrary set of prefs.

This makes sense for parsing, but that's not the only (nor biggest) cost here. If all the prefs consumers used this logic, things would probably get (a lot) worse because we'd end up with many many tiny separate files on separate blocks/sectors on disk, so on HDDs there'll be more seeking than just having 1 file, seeking which in practice will be more expensive in wall-clock time.

Agreed, the solution I've proposed here is not intended to be generalized to all prefs consumers. I'm only proposing it for this particular one because it blocks early blank first paint (both the current implementation, which happens to run after prefs are read/parsed, and an ideal implementation that wouldn't depend on prefs).

Costs of file reads on most systems likely only start being vaguely related to their size once you need more than a few blocks/sectors of disk space (ie when there's a chance of fragmentation). As Florian already said, the CPU cost of parsing prefs is likely negligible on a lot of users' systems compared to the disk IO cost -- the opposite of how things pan out on talos (but even there, I'm fairly sure pref parsing doesn't show up in profiles as significant).

Understood. In the case of the XULStore values that block early blank first paint, which we can store in 9-10 bytes, we should be able to improve both CPU time and IO time (and thus wall-clock time) to that first paint by reading and parsing the values from a separate file.

Your example of 70 user_pref lines likely still fits in very few disk sectors (maybe even in only 1, depending on the size of the prefs and the type of disk etc.).

On my Windows system, prefs.js after first run is 6,645 bytes. On macOS and Linux, it's 6,019 and 5,840, respectively. All of which take at least 12 sectors on an HDD with 512-byte sectors or two sectors on an HDD with 4k sectors.

Prefs files in the wild, which contain accumulated "preferences" (both actual user settings and data stored by Firefox components), will be larger, although it isn't clear by how much on average. The prefs.js file in my default profile is 45,764 bytes (although this is likely to be an outlier).

Reporter

Updated

•

5 years ago

Blocks: 1560211

Mike Conley (:mconley) (:⚙️)

Comment 15

•

5 years ago

This is still causing slowness visible in Nightly startup profiles, eg. https://perfht.ml/2GInN3A

Comment 16

•

5 years ago

Hey myk,

Can I presume correctly that you're unlikely to work on this anytime soon? If so (or if I don't hear back by the end of the week), I'll unassign you.

Flags: needinfo?(myk)

Flags: needinfo?(mconley)

Comment 17

•

5 years ago

I was told in the #browser-arch room that Victor has this bug in his todos for Q3.

Assignee: myk → vporof

Flags: needinfo?(myk)

Flags: needinfo?(mconley)

Assignee

Updated

•

5 years ago

Blocks: crlite

Assignee

Updated

•

5 years ago

Blocks: ship-rkv

Assignee

Updated

•

5 years ago

No longer blocks: crlite

Assignee

Updated

•

5 years ago

Type: task → enhancement

Summary: investigate XULStore main thread file I/O → Fix XULStore main thread file I/O performance