Closed Bug 1186714 Opened 9 years ago Closed 9 years ago

PlacesBackups.create is extremely janky

Categories

(Toolkit :: Places, defect)

Priority: Not set
Severity: normal
Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla43
Iteration: 43.1 - Aug 24
Tracking Status: firefox43 --- fixed

People

(Reporter: markh, Assigned: markh)

References

(Blocks 1 open bug)

Details

(Whiteboard: [fxsync])

Attachments

(2 files, 1 obsolete file)

Attached file places-backup-jank-scratchpad.js (deleted) —
In a profile with 10000 bookmarks, PlacesBackups.create() causes ~1 second of jank, during which time the browser is unresponsive. Mak expressed surprise on IRC, as he thought it was all done using non-blocking async operations.

To reproduce, get a profile with 10000 bookmarks and run the attached script in a browser scratchpad. The script waits 3 seconds before starting the backup - during those 3 seconds, grab the border of the browser window and start resizing. The resizing stops for the duration of the backup (~1s on my development box). The script also reports jank from PerformanceStats.jsm to the browser console. Running the script for me shows:

> while waiting for timer: 3x1ms, 3x2ms, 3x4ms, 3x8ms, 1x16ms
> doing snapshot: 3x1ms, 3x2ms, 3x4ms, 3x8ms, 1x16ms, 1x32ms, 1x64ms, 1x128ms, 1x256ms, 1x512ms
> timer after snapshot: 2x1ms, 2x2ms, 2x4ms, 2x8ms
> done

So during the initial timer there is one jank of 16-32ms, then during the backup there is one jank of 512-1024ms, then after the backup we are back to an 8ms jank. This matches exactly how long the browser is unresponsive while resizing.

sqlmanager tells me my bookmarks table has just over 10000 rows, mostly created via a simple script (eg, example.com/1, example.com/2, etc). The created backup file is 519,842 bytes. I'm happy to share places.db or the backup files if it helps.

Note we are doing a backup as part of Sync. As Mak noted, this may not be necessary, as bookmark backups are run once per day anyway - but that just means even non-Sync users are seeing this jank regularly.
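For readers without the attachment, the power-of-two jank buckets in the report above (e.g. "1x16ms, 1x512ms" meaning one pause in the 16-32ms range and one in the 512-1024ms range) can be summarized with a small helper. This is a generic sketch of the bucketing format only, not the PerformanceStats.jsm API; `bucketize` and `formatBuckets` are hypothetical names.

```javascript
// Bucket observed event-loop pauses (in ms) by power of two, matching the
// "NxMms" format used in the jank reports above. Pauses under 1ms are ignored.
function bucketize(delaysMs) {
  const buckets = new Map();
  for (const d of delaysMs) {
    if (d < 1) continue;
    // Round down to the nearest power of two, e.g. a 20ms pause lands in
    // the 16ms bucket, a 600ms pause in the 512ms bucket.
    const bucket = 2 ** Math.floor(Math.log2(d));
    buckets.set(bucket, (buckets.get(bucket) || 0) + 1);
  }
  return buckets;
}

// Render a bucket map as "1x2ms, 1x16ms, 1x512ms", smallest bucket first.
function formatBuckets(buckets) {
  return [...buckets.entries()]
    .sort((a, b) => a[0] - b[0])
    .map(([ms, n]) => `${n}x${ms}ms`)
    .join(", ");
}
```

For example, `formatBuckets(bucketize([20, 3, 0.5, 600]))` yields `"1x2ms, 1x16ms, 1x512ms"`: one mid-size pause, one tiny one, and one half-second hang of the kind this bug is about.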
Just to be sure I understand your sentence: you are mentioning numbers given to you by PerformanceStats.jsm, right?
(In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment #1)
> Just to be sure I understand your sentence: you are mentioning numbers given
> to you by PerformanceStats.jsm, right?

Yep, as dumped by the .js attachment.

(In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment #2)
> A profile would be useful

http://people.mozilla.org/~bgirard/cleopatra/#report=f75fe9e3b71f3261583527df33c9d149d7d6973c&invertCallback=true&filter=%5B%7B%22type%22%3A%22RangeSampleFilter%22,%22start%22%3A217400,%22end%22%3A220147%7D%5D&selection=%22(total)%22,393,223,218,118,3,117,122,121,120,126,117,3,116,114,3,113,3,1

(wow, that's long!) - both the Gecko profiler and the devtools profiler seem to blame promiseBookmarksTree
Off-hand, the only synchronous part left in promiseBookmarksTree is PlacesUtils.getAnnotationsForItem, but it should only run for a subset of the bookmarks. Changing this, by introducing an async helper, would not be too hard, provided there's a relation to the jank. Could you add some Performance.now() measures into promiseBookmarksTree, to get the total time spent in getAnnotationsForItem?

The profile looks strange (0.4% spent in promiseBookmarksTree?) and I don't see a long jank (there are a couple of about 40ms); I see a lot of packed operations that make me think of event-loop spinning. Was this profile taken by running the method from the console, or from Sync while it spins the event loop? I mean, spinning the event loop will clearly pack a lot of IO operations into a small time.

Note, we have telemetry about the time needed to collect the bookmarks tree, which tells me 80% of users are below 200ms. 7% are over 1s, but this value is taken before and after yields, so it also accounts for time spent on the async threads.

http://telemetry.mozilla.org/dist.html#measure=PLACES_BACKUPS_BOOKMARKSTREE_MS&max_channel_version=nightly%2F41&min_channel_version=nightly%2F38&product=Firefox&cumulative=0&use_submission_date=0&start_date=2015-05-11&end_date=2015-06-29&sanitize=1
I'd like to know more about the script that generates the database: is it adding annotations, tags, anything uncommon?
(I should also add that the backup operation happens on idle)
(In reply to David Rajchenbach-Teller [:Yoric] (use "needinfo") from comment #2)
> A profile would be useful, but from the top of my head, I suspect JSON
> serialization:

We also have telemetry about the JSON serialization, and it indicates it's not an issue for about 99.5% of users:

http://telemetry.mozilla.org/dist.html#measure=PLACES_BACKUPS_TOJSON_MS&max_channel_version=nightly%2F41&min_channel_version=nightly%2F38&product=Firefox&cumulative=0&use_submission_date=0&start_date=2015-05-11&end_date=2015-06-29&sanitize=1
(In reply to Marco Bonardo [::mak] from comment #4)
> Could you add some Performance.now() measures into promiseBookmarksTree, to
> get the total time spent in getAnnotationsForItem?

I couldn't work out how to use Performance.now() without a window, but it is called only about 10 times. I commented out the getAnnotationsForItem() call in promiseBookmarksTree and it had no effect.

> The profile looks strange(0.4% spent in promiseBookmarksTree?) and I don't
> see a long jank (there are a couple of about 40ms), I see a lot of packed
> operations that make me think to the loop spinning.

As I mentioned, the browser completely hangs for ~1 second - the jank will be obvious if you run it with lots of bookmarks.

> Was this profile taken by running the method from the console

The profile was taken by running the attached script in a scratchpad. No Sync involved.

(In reply to Marco Bonardo [::mak] from comment #5)
> I'd like to know more about your script that generates the database, is it
> adding annotations, tags, something non-common?

I ran this script a long time ago, purely for testing Sync. Sadly I don't have it now, but I'm pretty sure it was just something simple like:

> for (i = 0; i < 10000; i++) {
>   addBookmarkOrSomething("http://example.com/" + i); // can't remember what I used :)
> }

I wasn't attempting to make complex bookmarks; it was just the shortest-possible script to get a large number of them. I suspect you will be able to reproduce it with the simplest bookmarks possible.

> that tells me 80% of users are below 200ms.

Yeah, I've no doubt that 10000 bookmarks is more than most users have, so it isn't representative. However, the question here seems to be how much of that 200ms is jank. In my scenario the browser hangs for most of the wall-time - if that's also true for users that only take 200ms, that still seems bad.
Sure, I'm not trying to reduce the importance of this report; I'm trying to understand how much priority it should have. 200ms of peak IO on idle isn't terrible; 1s of jank while the user is using the browser would be terrible, and Sync is more likely to cause the latter since it's not running this backup code on idle and it's also spinning the event loop around it (more likely to convert a high IO peak into complete jank).

The situation has improved a lot regarding backups lately: we moved from seconds of main-thread jank to this less-than-1s of high traffic. We can likely do better. We need to profile each part of the function and see where we can achieve the highest wins.

But in the meanwhile I think Sync should just stop trying to be nice and stop forcing backups. We have telemetry that can tell us if users have backups that are too old, so we can tweak its code, and we consider a minimal (like, last-day) loss of data "sucky-but-acceptable".
Blocks: PlacesJank
I think we might also want to try yielding soon() every N bookmarks to give back time to the main-thread, the price there would be a longer time to generate the backup, but it might not be that bad.
(In reply to Marco Bonardo [::mak] from comment #9)
> 200ms of peak IO on idle it's not terrible, 1s of jank while the user is
> using the browser, would be terrible

Isn't the whole point of the async IO that the browser shouldn't be hanging (whether it's 200ms or 1s)?

> and Sync is more likely to cause the latter since it's not running
> this backup code on idle and it's also spinning around it (more likely to
> convert high peak IO into complete jank).

PerformanceStats and physically using the browser while running this script show exactly the same results. I don't think Sync is at all relevant here (but yes, not doing it on idle makes it worse for the user).

> But in the meanwhile I think Sync should just stop trying to be nice and
> stop forcing backups.

A consideration here is that Sync is about to make serious (and possibly buggy) changes to bookmarks - there are reports of total bookmark loss after a sync, which sucks terribly. This makes me feel the backup is valuable. Note also that Sync only does it on a *first* sync, not *every* sync.

> I think we might also want to try yielding soon() every N bookmarks

I guess I just don't understand enough :) I expected all the async IO to have a wall-time cost but a very low "jank cost" on an unloaded system. I also thought promise.then would always result in an event-loop spin, which I (naively :) assumed does the right thing in terms of UI responsiveness. The JSON encoding obviously blocks, and throwing messages at the IO threads hurts a bit, but the results still surprise me. I'll see if I can make Services.Profiler.AddMarker() be useful...
(In reply to Mark Hammond [:markh] from comment #11)
> I guess I just don't understand enough :) I expected all the async IO to
> have a wall-time cost, but very low "jank cost" on an unloaded system. I
> also thought promise.then will always result in an event-loop spin

Async IO by itself should not jank, but the results must be passed to the main thread, and callbacks from the async execution run on the main thread. However, here I suspect we spend a lot of time running JS - a small amount of JS, but multiplied by 10000 bookmarks in a tight loop.

Promises used to be resolved at the end of the current tick; as Yoric explained to me the other day, today they run in microtasks, so they are likely not giving back time to the main thread as we expect.
(In reply to Marco Bonardo [::mak] from comment #12)
> promises are resolved at the end of the current tick, as Yoric explained me
> the other day, today they run in microtasks, so they are likely not giving
> back time to the main-thread as we expect.

Right - I could have sworn I heard that wasn't the case! So yeah, yielding in the main loop has a minor wall-time impact and a huge less-jank impact :) Thanks!
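The microtask behaviour mak describes is easy to demonstrate in plain JS (runnable in Node or a scratchpad; nothing here is Gecko-specific): all pending promise reactions drain before the next macrotask, so a long promise chain never lets timers, paints, or input handling run in between.

```javascript
// Record the order in which synchronous code, a promise reaction
// (microtask), and a zero-delay timer (macrotask) actually run.
const order = [];

Promise.resolve().then(() => order.push("promise (microtask)"));
setTimeout(() => order.push("timer (macrotask)"), 0);
order.push("sync");

// After the synchronous code finishes, the microtask queue drains first,
// then the timer fires: order ends up as
// ["sync", "promise (microtask)", "timer (macrotask)"].
```

This is why chaining `.then()` inside the backup loop doesn't give the UI a chance to breathe, while inserting a real macrotask boundary (e.g. a timer) does.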
This patch yields every 10 items. I tweaked the .js attachment to report wall-time too. I ran each case 3 times.

Without the patch:
> doing snapshot: 3x1ms, 3x2ms, 3x4ms, 3x8ms, 2x16ms, 1x32ms, 1x64ms, 1x128ms, 1x256ms, 1x512ms , took 2719
> doing snapshot: 5x1ms, 5x2ms, 5x4ms, 5x8ms, 1x16ms, 1x32ms, 1x64ms, 1x128ms, 1x256ms, 1x512ms , took 2783
> doing snapshot: 5x1ms, 5x2ms, 5x4ms, 5x8ms, 2x16ms, 1x32ms, 1x64ms, 1x128ms, 1x256ms, 1x512ms , took 2678

With this patch but n=100:
> doing snapshot: 108x1ms, 108x2ms, 108x4ms, 108x8ms, 47x16ms, 4x32ms, 1x64ms, 1x128ms , took 2859
> doing snapshot: 109x1ms, 109x2ms, 109x4ms, 109x8ms, 48x16ms, 3x32ms, 1x64ms, 1x128ms , took 2960
> doing snapshot: 109x1ms, 109x2ms, 109x4ms, 109x8ms, 39x16ms, 2x32ms , took 2719

With this patch (ie, n=10):
> doing snapshot: 165x1ms, 165x2ms, 165x4ms, 165x8ms, 4x16ms, 2x32ms, 1x64ms , took 3284
> doing snapshot: 164x1ms, 164x2ms, 164x4ms, 164x8ms, 2x16ms, 1x32ms , took 2948
> doing snapshot: 168x1ms, 168x2ms, 168x4ms, 168x8ms, 1x16ms, 1x32ms , took 2951

With this patch but yielding on every item (ie, no counter):
> doing snapshot: 238x1ms, 238x2ms, 238x4ms, 238x8ms, 4x16ms, 1x32ms , took 4427
> doing snapshot: 235x1ms, 235x2ms, 235x4ms, 235x8ms, 2x16ms, 1x32ms , took 4660
> doing snapshot: 243x1ms, 243x2ms, 243x4ms, 243x8ms, 3x16ms, 3x32ms , took 4692

Not particularly scientific :) But... n=10 has one jank at 64ms and the increase in wall-time seems reasonable. n=100 is still fairly janky. n=0 probably increases wall-time too far. What do you think?
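The shape of the yield-every-N change measured above is roughly the following. This is a hedged sketch in modern async/await style rather than the actual Task.jsm generator patch; `processAll`, `processItem`, and `YIELD_EVERY` are illustrative names, not identifiers from the patch.

```javascript
// Sketch of the yield-every-N pattern (illustrative, not the real patch).
// A bare `await Promise.resolve()` only queues a microtask and would NOT
// let pending UI events run; we must cross a macrotask boundary, here via
// setTimeout(..., 0).
const YIELD_EVERY = 10; // this patch used 10; the landed version used 50

function tick() {
  return new Promise(resolve => setTimeout(resolve, 0));
}

async function processAll(items, processItem) {
  let sinceYield = 0;
  for (const item of items) {
    processItem(item);
    if (++sinceYield >= YIELD_EVERY) {
      sinceYield = 0;
      await tick(); // give the main thread a chance to paint / handle input
    }
  }
}
```

The trade-off in the measurements falls directly out of this loop: a smaller N means more macrotask round-trips (longer wall time) but shorter uninterrupted runs of JS (smaller worst-case jank).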
Attachment #8637790 - Flags: feedback?(mak77)
Comment on attachment 8637790 [details] [diff] [review]
0004-Bug-1186714-bookmark-backup-yields-the-event-loop-oc.patch

Review of attachment 8637790 [details] [diff] [review]:
-----------------------------------------------------------------

Yes, I think we should do something like this. On the other hand, having to yield every 10 entries to avoid a 64ms jank points to some code behaving very poorly, and that should be fixed - 10 is very few entries. On my system n=10 moves the wall time from 8s to 10s; that's likely a little too much.

The other problem is that PlacesTransactions.jsm will use promiseBookmarksTree to fetch subtrees. Those will usually be small, but in many cases they might be over 10 items, and we likely don't want to slow down all the UI operations involving bookmarks, like moving a folder with 10 bookmarks inside.

Even if it's unscientific, I'd guess we should go half-way and yield every 50 items; that should leave plenty of space for "simple" UI operations while still helping a whole backup a lot. Then we can file a follow-up to figure out which part of the JS code is taking 64ms just to do some simple operations on 10 items...
Attachment #8637790 - Flags: feedback?(mak77)
(In reply to Marco Bonardo [::mak] from comment #15)
> On my system the n=10 moves the wall time from 8s to 10s, that's likely a
> little bit too much.

Wow - my system took only 1s for 10,000 bookmarks, but almost all of that time was janky. Are you able to run the attachment here and see how janky that 8s is?

> The other problem is that PlacesTransactions.jsm will use
> promiseBookmarksTree to fetch subtrees, that usually will be small, but in
> many cases they might be over 10 items, and we likely don't want to slowdown
> all the UI operations involving bookmarks, like moving a folder with 10
> bookmarks inside.

Right, but surely it would be worse to have 8s of jank than 10s of no jank?

> Even if it's unscientific, I'd guess we should go half-way, yield every 50
> items, should have plenty of space for "simple" ui operations, while still
> helping a lot a whole backup.
> Then we can file a follow-up to figure out which part of the js code is
> taking 64ms just to do some simple operations on 10 items...

I'd have thought we should approach this the other way around - ie, decide the maximum jank we will tolerate, and ask what value of n that implies. The big question there, though, is "on what system do we measure this?".

(I still believe there should be a way for a Task.jsm task to say "please yield for me whenever you detect jank", which would make the value of n depend on the actual performance of the system and would end up making this moot, but it seems I'm the only one who thinks this.)
Flags: needinfo?(mak77)
(In reply to Mark Hammond [:markh] from comment #16)
> Wow - my system took only 1s for 10,000 bookmarks - but almost all of that
> time was janky. Are you able to run the attachment here and see how janky
> that 8s is?

It was in a debug build, which matters a lot; in an opt build it takes far less (300ms). I have 1200 bookmarks. I slightly modified the script to run PlacesBackups.getBookmarksTree() instead of create(), since we are measuring promiseBookmarksTree and that's more harmless.

This in Nightly (opt):
while waiting for timer: 6x1ms, 6x2ms, 6x4ms, 6x8ms
doing snapshot: 3x1ms, 3x2ms, 3x4ms, 3x8ms, 1x16ms, 1x32ms, 1x64ms, 1x128ms, 1x256ms
timer after snapshot: 3x1ms, 3x2ms, 3x4ms, 3x8ms

This in a debug build:
while waiting for timer: 8x1ms, 8x2ms, 8x4ms, 8x8ms, 2x16ms
doing snapshot: 15x1ms, 15x2ms, 15x4ms, 15x8ms, 6x16ms, 4x32ms, 4x64ms, 1x128ms, 1x256ms, 1x512ms
timer after snapshot: 44x1ms, 44x2ms, 44x4ms, 44x8ms, 20x16ms, 7x32ms, 3x64ms, 1x128ms

> > Even if it's unscientific, I'd guess we should go half-way, yield every 50
> > items, should have plenty of space for "simple" ui operations, while still
> > helping a lot a whole backup.
> > Then we can file a follow-up to figure out which part of the js code is
> > taking 64ms just to do some simple operations on 10 items...
>
> I'd have thought we should approach this the other way - ie, what's the
> maximum jank we will tolerate, and what value of n does this imply? The big
> question here though is "on what system do we measure this".

Yeah, the problem is that I don't want to make any UI operation feel slow just to avoid a 32ms jank. Or better: ideally I would like no jank at all, but we need a compromise. The other problem is that in some edge cases (like bookmarks.html generation) adding seconds could kill the feature, since it still happens on shutdown.
> (I still believe there should be a way so a Task.jsm task can say "please
> yield for me whenever you detect jank" which would make the value of n
> depend on the actual performance of the system, which would end up making
> this moot, but it seems I'm the only one who thinks this)

That would be awesome, if we could detect jank in real time without an overhead cost. I still think we need a compromise and to investigate the actual cause of the jankiness. Here you are using a 10000-bookmarks case; telemetry tells me 90% of users have fewer than 5000, so let's start by moving the test to 5000 and measuring for n=50, 40, 30. I think 10 is too strict for UI operations; I'd like to stay between 30 and 50 for now and then remove the remaining jankiness with actual code fixes.
Flags: needinfo?(mak77)
sorry, that was "99% of users"
(In reply to Marco Bonardo [::mak] (spotty available until 24 Aug) from comment #17)
> I still think we need a compromise and investigate the actual cause of
> jankiness... Here you are using a 10000 bookmarks case, telemetry tells me
> 90% of users have less than 5000, so let's start by moving the test to 5000
> and measure for n=50, 40, 30. I think 10 is too strict for UI operations,
> I'd like to stay between 30 and 50 for now and then remove remaining
> jankiness with actual code fixes.

The issue I have here is that the jank will be different for everyone - the jank my machine measures is different from yours, and different from our "typical" user's (whatever that means). I do take your point, though, and I'd like to move this forward without having to find the time to dig into exactly why this is happening. So I opened bug 1197701 in the hope that the performance team can investigate the underlying reasons, and I've updated the patch to yield every 50 iterations, which is the range you were hoping for. While this *will* reduce jank, whether it reduces it enough is subjective, but for now I'm happy to take this as "good enough" for Sync's purposes given it only happens on the first sync.
Assignee: nobody → markh
Status: NEW → ASSIGNED
Attachment #8651655 - Flags: review?(mak77)
Attachment #8637790 - Attachment is obsolete: true
Iteration: --- → 43.1 - Aug 24
Flags: firefox-backlog+
Comment on attachment 8651655 [details] [diff] [review]
0006-Bug-1186714-bookmark-backup-yields-the-event-loop-oc.patch

Review of attachment 8651655 [details] [diff] [review]:
-----------------------------------------------------------------

Yes, it will be different for everyone, but if we fix the worst case we should ideally be fine. I agree some improvement is better than no improvement, so let's move on.

::: toolkit/components/places/PlacesUtils.jsm
@@ +1962,5 @@
>      parentsMap.set(item.guid, item);
> +
> +    // With many bookmarks we end up stealing the CPU - even with yielding!
> +    // So we let everyone else have a go every few items (bug 1186714). Also,
> +    // bug 1197701 exists in the hope of being able to remove this entirely.

Well, considering the fate of bug 1197701, maybe that last phrase could just go. I also think we need specific tooling to solve this case, but paolo is right that a catch-all solution could end up inviting people to just write sucky code and let the system work around it (yes, this could happen regardless; we are already doing that...)
Attachment #8651655 - Flags: review?(mak77) → review+
Thanks!

> a catch-all solution could end up inviting people to just write
> sucky code and let the system workaround it (yes, this could happen
> regardless, we are already doing that...)

FTR, I think we are being way too harsh on this code. Casual inspection shows it *should* be IO-bound and *should not* display the characteristics we are seeing. So far no one can satisfactorily explain why this code behaves the way it does, but the fact that it does is indisputable. Eg, a highly efficient caching system could easily turn it into "no actual IO, all promises can be satisfied from in-memory data" - leading to a situation where the more we optimize the code, the jankier it gets, though this is not obvious from inspection. But yeah, no point flogging a dead horse :)

Pushed with the reference to bug 1197701 removed.
https://hg.mozilla.org/integration/fx-team/rev/1520b5bf78dd
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla43
Whiteboard: [fxsync]