Bug 1389381 (Closed) - Opened 7 years ago, Closed 7 years ago
16 Second Pause in ExtensionUtils.jsm's truncate()
Categories: Core :: JavaScript Engine, defect
People: Reporter: sstangl, Assigned: kmag
(Keywords: regression)
Attachments (3 files):
(deleted), text/x-review-board-request | aswan: review+ | Sylvestre: approval-mozilla-beta+ | Sylvestre: approval-mozilla-release+
(deleted), text/x-review-board-request | aswan: review+ | Sylvestre: approval-mozilla-beta+ | Sylvestre: approval-mozilla-release+
(deleted), text/x-review-board-request | aswan: review+ | Sylvestre: approval-mozilla-beta+ | Sylvestre: approval-mozilla-release+
Several users on Reddit are reporting that Firefox 55 is completely unusable for them due to extremely long pauses. I asked them to attempt a recording using the Gecko Profiler. One user was unable to get the profiler installed and running, but another one was able to upload several profiles.
The one this bug is about is here: https://perfht.ml/2vJH3JQ
The user is experiencing a 16 second hang on the main thread under WindowDestroyedEvent::Run(). In ExtensionUtils.jsm, the function truncate() is called on something, which then proceeds to spend 16 whole seconds seeking through an OrderedHashTable, calling onRemove() handlers, and finally remove()ing each item.
The user says, "All addons except session manager and uBlock disabled."
This seems pretty severe, and multiple users are reporting that 55 is effectively unusable for them.
The specific user's report on Reddit is here: https://www.reddit.com/r/firefox/comments/6siseo/loading_websites_in_firefox_very_slow/dlgcl0q/
The full thread is here: https://www.reddit.com/r/firefox/comments/6siseo/loading_websites_in_firefox_very_slow/
I'm really not sure how to diagnose this. Ehsan or bz, would you be able to point me in the right direction, or see what the likely culprit would be? I remember that Ehsan changed some stuff with hash tables recently, but I don't know if that could be related.
Flags: needinfo?(ehsan)
Flags: needinfo?(bzbarsky)
Comment 1•7 years ago
This seems like something Kris might know something about, at least based on blame.
But fundamentally, we're starting off at http://searchfox.org/mozilla-central/rev/4b79f3b23aebb4080ea85e94351fd4046116a957/toolkit/components/extensions/ExtensionContent.jsm#555 where we're told an inner window died. This calls http://searchfox.org/mozilla-central/rev/4b79f3b23aebb4080ea85e94351fd4046116a957/toolkit/components/extensions/MessageChannel.jsm#782 which adds things to this.abortedResponses. That's a LimitedSet (of max size 30), so we end up at http://searchfox.org/mozilla-central/rev/4b79f3b23aebb4080ea85e94351fd4046116a957/toolkit/components/extensions/ExtensionUtils.jsm#280 which calls http://searchfox.org/mozilla-central/rev/4b79f3b23aebb4080ea85e94351fd4046116a957/toolkit/components/extensions/ExtensionUtils.jsm#271
But... in this case that should do exactly one this.delete() call per add() call. That shouldn't be unreasonable. So maybe we have a ton of this.pendingResponses in abortResponses? Or we have a ton of DocumentManagers around? But at first glance DocumentManager is a singleton...
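For orientation, here is a minimal sketch of the shape of that path. The names (abortResponses, pendingResponses, messageId, innerWindowID) are simplified placeholders rather than the actual MessageChannel.jsm API, and LimitedSet is the ExtensionUtils.jsm class reproduced in comment 2 below; the point is just that each aborted response produces one add(), and each add() past the limit should cost exactly one delete().
---
// Hedged sketch, not the real MessageChannel.jsm code: when an inner window
// dies, every matching pending response is recorded in a 30-entry LimitedSet,
// so each add() beyond the limit should trigger exactly one delete().
const abortedResponses = new LimitedSet(30);

function abortResponses(pendingResponses, innerWindowID) {
  for (const response of pendingResponses) {
    if (response.innerWindowID === innerWindowID) {
      abortedResponses.add(response.messageId); // at most one delete() per add()
    }
  }
}
---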
Flags: needinfo?(bzbarsky) → needinfo?(kmaglione+bmo)
Comment 2•7 years ago
If LimitedSet#add() (and hence #truncate()) is called a few thousand times in quick succession, and no GC happens in that time, the SetIteratorObjects aren't collected, which means their internal OrderedHashTable#Range isn't destroyed. In that case we spend a huge amount of time in http://searchfox.org/mozilla-central/rev/4b79f3b23aebb4080ea85e94351fd4046116a957/js/src/ds/OrderedHashTable.h#205-208 updating effectively stale ranges.
Simple shell test case for that scenario:
---
class LimitedSet extends Set {
  constructor(limit, iterable = undefined) {
    super(iterable);
    this.limit = limit;
  }

  truncate(limit) {
    for (let item of this) {
      if (this.size <= limit) {
        break;
      }
      this.delete(item);
    }
  }

  add(item) {
    if (!this.has(item) && this.size >= this.limit) {
      this.truncate(this.limit - 1);
    }
    super.add(item);
  }
}

var ls = new LimitedSet(30);

var t = dateNow();
for (var i = 0; i < 50000; ++i) {
  ls.add(i);

  // GC inactive set iterators to destroy Range instances.
  if (i % 1000 === 0)
    gc(); // <-- Without this explicit GC, the test case needs much more time.
}
print(dateNow() - t);
---
Comment 3•7 years ago
And if this is causing this slow-down, the simplest fix would be to change truncate() to:
---
truncate(limit) {
  // Iterate over all entries to ensure the SetIteratorObject's Range gets destroyed. (bug 1389381)
  for (let item of this) {
    if (this.size > limit) {
      this.delete(item);
    }
  }
}
---
Comment 4•7 years ago
Hmm. Can we nursery-allocate set iterator objects? That seems like it would help too...
Comment 5•7 years ago
(In reply to Boris Zbarsky [:bz] (vacation Aug 14-27) from comment #4)
> Hmm. Can we nursery-allocate set iterator objects? That seems like it
> would help too...
bug 1368420. But for nursery allocation we'd need to have a different implementation for OrderedHashTable#Range, so we can background finalize the iterator objects. (At least that's my understanding.)
Comment 6•7 years ago
[Tracking Requested - why for this release]: Severe perf issue.
Is comment 3 something we could ship in a dot release (and maybe 56 too) while we figure out a better long-term fix?
status-firefox55: --- → affected
status-firefox56: --- → affected
status-firefox57: --- → affected
status-firefox-esr52: --- → unaffected
tracking-firefox55: --- → ?
tracking-firefox56: --- → ?
tracking-firefox57: --- → ?
Keywords: regression
Whiteboard: [qf]
Comment 7•7 years ago
(In reply to Ryan VanderMeulen [:RyanVM] from comment #6)
> Is comment 3 something we could ship in a dot release (and maybe 56 too)
> while we figure out a better long-term fix?
It seems like we need a way to reproduce the problem so that we know this fixes it for that to be a viable option...
Flags: needinfo?(ehsan)
Comment 8•7 years ago
Based on code inspection, I think this should be a regression from bug 1260548 which first added |this.abortedResponses = new ExtensionUtils.LimitedSet(30);| <https://hg.mozilla.org/mozilla-central/rev/8e67a3e9a5bf#l4.17>
Blocks: 1260548
Comment 9•7 years ago
FWIW, I tried installing both the Session Manager and uBlock Origin extensions in a new profile on 55.0.1 and browsed around a bit to see if I could reproduce this issue, but I didn't succeed.
Assignee
Comment 10•7 years ago
There's a chance this is caused by bug 1386937, which caused us to add duplicate inner-window-destroyed observers. But that seems pretty unlikely, since only the first observer should actually find any responses to abort, which is why I haven't asked for uplift.
In the worst case, we can probably just remove the code that adds entries to abortedResponses, since it's really only there to prevent unhelpful console spam. But I think I'd rather try the solution in comment 3 first, combined with only truncating when we're something like 25% over the limit.
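As a rough illustration of that combination (not the patches that eventually landed as Parts 2 and 3), a LimitedSet could pair the full-iteration truncate() from comment 3 with a slop margin, so pruning only kicks in once the set is roughly 25% over its nominal limit:
---
// Hedged sketch only: a LimitedSet variant with "slop", based on the class in
// comment 2 and the truncate() from comment 3. The names and the slop default
// are illustrative, not the landed implementation.
class SloppyLimitedSet extends Set {
  constructor(limit, slop = Math.ceil(limit * 0.25), iterable = undefined) {
    super(iterable);
    this.limit = limit;
    this.slop = slop;
  }

  truncate(limit) {
    // Iterate the whole set so the iterator's Range is destroyed promptly.
    for (let item of this) {
      if (this.size > limit) {
        this.delete(item);
      }
    }
  }

  add(item) {
    // Only prune once we're `slop` entries past the limit, so truncate() runs
    // far less often than once per add().
    if (!this.has(item) && this.size >= this.limit + this.slop) {
      this.truncate(this.limit - 1);
    }
    return super.add(item);
  }
}
---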
I'd still like to know exactly why this is happening, though, and find a way to reproduce. This seems like a pretty pathological worst case, and I can't even guess what an extension would have to do in order to wind up with that many responses to abort.
Although, looking at this more closely... When we call abortResponses, the aborted responses are removed from the set from a promise handler, which happens asynchronously. Which means that we should wind up iterating over the already aborted responses again from each observer callback. But even then, we only truncate when the item isn't already in the set, which means we'd have to have more than 30 responses to abort for each iteration.
Flags: needinfo?(kmaglione+bmo)
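For context on Part 1, here is a minimal sketch of the asynchronous-removal problem described in the last paragraph of comment 10, and of removing the entry synchronously instead. The names (ResponseTracker, shouldAbort, messageId) are placeholders, and the real MessageChannel.jsm code is more involved; LimitedSet is the class from comment 2.
---
// Hedged sketch, not the real MessageChannel.jsm code.
class ResponseTracker {
  constructor() {
    this.pendingResponses = new Set();
    this.abortedResponses = new LimitedSet(30);
  }

  // Before: removal happens in a promise handler, i.e. only after returning to
  // the event loop, so a second observer callback in the same turn still sees
  // (and re-processes) the same pending responses.
  abortResponsesAsync(shouldAbort) {
    for (let response of this.pendingResponses) {
      if (shouldAbort(response)) {
        Promise.resolve().then(() => {
          this.pendingResponses.delete(response); // too late
          this.abortedResponses.add(response.messageId);
        });
      }
    }
  }

  // After (Part 1's idea): remove the entry at the point where it is aborted.
  abortResponsesSync(shouldAbort) {
    for (let response of this.pendingResponses) {
      if (shouldAbort(response)) {
        this.pendingResponses.delete(response); // removed synchronously
        this.abortedResponses.add(response.messageId);
      }
    }
  }
}
---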
Assignee
Comment 11•7 years ago
Also, neither Session Manager nor any of the public versions of μBlock are WebExtensions, so it would be nice to know exactly which extensions they're using.
Assignee
Comment 12•7 years ago
Ah, looking at the profile some more, it looks like the only WebExtension they're using is Zotero.
Assignee
Comment 13•7 years ago
So, just running the Zotero add-on for a short while, I get a huge number of console errors. If I had to guess, I'd say their messaging wrapper code (which is pretty hairy) doesn't handle those correctly, and they wind up with a bunch of messages that are waiting for response callbacks that will never come, all of which get canceled when the window running the content scripts finally closes.
So, given that, I'm just going to fix the issues that cause us to handle this so poorly, and not worry too much about trying to reproduce it.
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment 17•7 years ago
Comment on attachment 8896468 [details]
Bug 1389381: Part 1 - Synchronously remove pending responses after aborting them.
https://reviewboard.mozilla.org/r/167714/#review173012
Attachment #8896468 - Flags: review?(aswan) → review+
Comment 18•7 years ago
Comment on attachment 8896469 [details]
Bug 1389381: Part 2 - Add slop to the maximum size of a LimitedSet.
https://reviewboard.mozilla.org/r/167716/#review173014
Good intentions, but where does `this.slop` actually get set?
Attachment #8896469 - Flags: review?(aswan) → review-
Comment 19•7 years ago
Comment on attachment 8896470 [details]
Bug 1389381: Part 3 - Loop over the entire set iterator when truncating a LimitedSet.
https://reviewboard.mozilla.org/r/167718/#review173016
Attachment #8896470 - Flags: review?(aswan) → review+
Comment hidden (mozreview-request)
Comment hidden (mozreview-request)
Comment 22•7 years ago
Comment on attachment 8896469 [details]
Bug 1389381: Part 2 - Add slop to the maximum size of a LimitedSet.
https://reviewboard.mozilla.org/r/167716/#review173088
Attachment #8896469 - Flags: review?(aswan) → review+
Assignee
Comment 23•7 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/4b63f0f44148519df73c725ab99746d4d686e12f
Bug 1389381: Part 1 - Synchronously remove pending responses after aborting them. r=aswan
https://hg.mozilla.org/integration/mozilla-inbound/rev/364ef71685abd9959f7816810e422a15ec92087f
Bug 1389381: Part 2 - Add slop to the maximum size of a LimitedSet. r=aswan
https://hg.mozilla.org/integration/mozilla-inbound/rev/96fb5933d07336b44f19b77a511981cba96e29f3
Bug 1389381: Part 3 - Loop over the entire set iterator when truncating a LimitedSet. r=aswan
Assignee
Updated•7 years ago
Assignee: nobody → kmaglione+bmo
Assignee
Updated•7 years ago
Assignee
Comment 24•7 years ago
Comment on attachment 8896468 [details]
Bug 1389381: Part 1 - Synchronously remove pending responses after aborting them.
Approval Request Comment
[Feature/Bug causing the regression]: Bug 1260548
[User impact if declined]: This can cause significant performance issues in certain cases where extension scripts create large numbers of message listeners which never call their response callbacks.
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: N/A
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: This is a fairly trivial change to make sure that aborted response objects are removed from a map synchronously, as expected, rather than only after returning to the main event loop or microtask queue. There should be no change in behavior except in the existing corner cases which currently produce incorrect behavior.
[String changes made/needed]: None.
Attachment #8896468 - Flags: approval-mozilla-release?
Attachment #8896468 - Flags: approval-mozilla-beta?
Assignee
Comment 25•7 years ago
Comment on attachment 8896469 [details]
Bug 1389381: Part 2 - Add slop to the maximum size of a LimitedSet.
Approval Request Comment
[Feature/Bug causing the regression]: Bug 1260548
[User impact if declined]: This can cause significant performance issues in certain cases where extension scripts create large numbers of message listeners which never call their response callbacks.
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: N/A
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: This is a fairly simple change to reduce overhead by pruning entries from a set less often, rather than every time a new entry is added after it reaches its maximum capacity.
[String changes made/needed]: None.
Attachment #8896469 - Flags: approval-mozilla-release?
Attachment #8896469 - Flags: approval-mozilla-beta?
Assignee
Comment 26•7 years ago
Comment on attachment 8896470 [details]
Bug 1389381: Part 3 - Loop over the entire set iterator when truncating a LimitedSet.
Approval Request Comment
[Feature/Bug causing the regression]: Bug 1260548
[User impact if declined]: This can cause significant performance issues in certain cases where extension scripts create large numbers of message listeners which never call their response callbacks.
[Is this code covered by automated tests?]: Yes.
[Has the fix been verified in Nightly?]: N/A
[Needs manual test from QE? If yes, steps to reproduce]: No.
[List of other uplifts needed for the feature/fix]: None.
[Is the change risky?]: No.
[Why is the change risky/not risky?]: This is a fairly simple change to reduce overhead by looping over an entire iterator, to make sure it's finalized immediately, rather than exiting the loop early. The logic is unchanged, and the extra loop overhead from a maximum of about 30 entries in a JIT-optimized function should not amount to much.
[String changes made/needed]: None.
Attachment #8896470 - Flags: approval-mozilla-release?
Attachment #8896470 - Flags: approval-mozilla-beta?
Comment 27•7 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/4b63f0f44148
https://hg.mozilla.org/mozilla-central/rev/364ef71685ab
https://hg.mozilla.org/mozilla-central/rev/96fb5933d073
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla57
Comment 28•7 years ago
Would it be possible to have a test to make sure we don't regress here?
Thanks
Comment 29•7 years ago
Comment on attachment 8896468 [details]
Bug 1389381: Part 1 - Synchronously remove pending responses after aborting them.
Fixes some severe perf issues; taking it as a ride-along for 55.0.2.
Attachment #8896468 - Flags: approval-mozilla-release? → approval-mozilla-release+
Attachment #8896468 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Updated•7 years ago
Attachment #8896469 - Flags: approval-mozilla-release? → approval-mozilla-release+
Attachment #8896469 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Updated•7 years ago
Attachment #8896470 - Flags: approval-mozilla-release? → approval-mozilla-release+
Attachment #8896470 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Updated•7 years ago
Comment 30•7 years ago
bugherder uplift
https://hg.mozilla.org/releases/mozilla-beta/rev/3323f34f84bb
https://hg.mozilla.org/releases/mozilla-beta/rev/ffabdaaebcea
https://hg.mozilla.org/releases/mozilla-beta/rev/b301a9babee6
Flags: in-testsuite?
Comment 31•7 years ago
bugherder uplift
Comment 32•7 years ago
Sean, would it be possible to ask some of the affected Reddit users to try out the CI builds that contain this fix? Should be reasonably safe to do, but backing up their profile first would be a reasonable precaution :)
OSX: https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-release-macosx64/1502721201/firefox-55.0.2.en-US.mac.dmg
Win32: https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-release-win32/1502720964/firefox-55.0.2.en-US.win32.zip
Win64: https://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-release-win64/1502720964/firefox-55.0.2.en-US.win64.zip
Flags: needinfo?(sstangl)
Comment 33•7 years ago
We tried to reproduce this issue on 55.0.2-build1 as well, but we didn't manage to do so.
Comment 34•7 years ago
Added to the release notes with "Fix performance regressions with WebExtension"
Reporter
Comment 35•7 years ago
Ryan: I sent messages 2 days ago, but have received no responses.
Flags: needinfo?(sstangl)
Assignee
Comment 36•7 years ago
(In reply to Sean Stangl [:sstangl] from comment #35)
> Ryan: I sent messages 2 days ago, but have received no responses.
It's probably not worth worrying about, at this point. I'm pretty confident that this solves the regression, and if there are still issues as bad as this, I'm sure we'll hear about them soon enough.
Comment 37•7 years ago
(In reply to Kris Maglione [:kmag] from comment #36)
> (In reply to Sean Stangl [:sstangl] from comment #35)
> > Ryan: I sent messages 2 days ago, but have received no responses.
>
> It's probably not worth worrying about, at this point. I'm pretty confident
> that this solves the regression, and if there are still issues as bad as
> this, I'm sure we'll hear about them soon enough.
Hi Kris, we're still seeing a subset of our users on v55.0.2 experiencing 10-15 second startup times. I'm able to partially reproduce on a Win 10 machine running FF 55.0.2 x64. Here's a profile with Ghostery running at startup, where the browser locks up for about 10 seconds:
https://perfht.ml/2imUBFC
In this profile I disabled and re-enabled Ghostery around the 80sec mark, and Firefox locked up for about 20 seconds.
https://perfht.ml/2imq9LQ
This has been a massive issue for us recently, and I'm grateful that you were all able to patch it in the last release. Any further help you could provide would be much appreciated. Thanks,
Chris
Assignee
Comment 38•7 years ago
(In reply to Christopher Tino from comment #37)
> Hi Kris, we're still seeing a subset of our users on v55.0.2 experiencing
> 10-15sec startup times. I'm able to partially reproduce on a Win 10 machine
> running FF 55.0.2 x64. Here's a profile running Ghostery at startup and the
> browser locks up for about 10sec:
There's too much going on in those profiles to make much sense of, but the problems are clearly unrelated to this bug. In both cases, a lot of the overhead is from Sync and SDK extensions (particularly LastPass).
Aside from that, most of the WebExtension overhead seems to be JSON/structured clone overhead in the storage API (some of which was probably fixed by bug 1356546 and bug 1370752), and other random messaging overhead.
Comment 39•7 years ago
It may be worth filing a separate bug for comment 37...
Comment 40•7 years ago
Thanks guys. We're trying to reproduce the issue based on the info we have from users. I'll see if I can get a few to run the profiler, then I'll open a new bug if we find anything useful.
Updated•3 years ago
Performance Impact: --- → ?
Whiteboard: [qf]