Closed
Bug 1348772
Opened 8 years ago
Closed 7 years ago
Optimize Array.prototype.shift to be O(1) instead of O(n)
Categories
(Core :: JavaScript Engine, enhancement, P2)
Core
JavaScript Engine
Tracking
()
People
(Reporter: jandem, Assigned: jandem)
References
(Blocks 2 open bugs)
Details
(Whiteboard: [platform-rel-Outlook][platform-rel-Microsoft])
Attachments
(2 files, 1 obsolete file)
(deleted),
text/html
|
Details | |
(deleted),
patch
|
jonco
:
review+
|
Details | Diff | Splinter Review |
Last week I was investigating an Elm benchmark and it uses Array#shift in a way that results in quadratic behavior for us (it uses arrays as queues). Both V8 and JSC optimize this be much faster.
We could optimize this as follows:
(1) When we shift() an element, increment the elements_ pointer to point to the second element (and move the ObjectElements header as well).
(2) In the ObjectElements header, increment an integer (numShiftedElements or something). This will allow us to recover the original pointer which we have to pass to free/realloc etc.
(3) To avoid wasting memory when we shift many elements, we would reallocate and discard this space when we grow the elements (when we have to realloc anyway) or we could do this during GC when numShiftedElements is sufficiently large.
(4) unshift could be optimized in a similar way when numShiftedElements > 0.
This shouldn't be too difficult to implement I think and it would be nice to have it fixed soonish, as it's easy to get quadratic behavior from this.
I'm attaching a micro-benchmark that takes 1 ms in Chrome/Safari and 35 ms for us.
Assignee | ||
Comment 1•8 years ago
|
||
Oh and we should probably fix GC post barriers to store the dense element index relative to the original pointer (add numShiftedElements).
Assignee | ||
Comment 2•7 years ago
|
||
I added some logging and I think this would help a lot of websites. Although shift() is typically used with small arrays (<= 3 elements), when I open outlook.com for instance it looks like they have an array with > 1900 elements and keep shift()ing one element. This will easily take more than a few ms on an average machine.
Considering this is causing quadratic behavior on real websites, we should probably get it fixed soon. I'll work on a patch this week, hopefully it won't be that hard.
Comment 3•7 years ago
|
||
This is a great idea! Both shift() and pop() can treat the elements array as a dequeue. A "shrink capacity to 1/2 when usage is <1/3" would give us O(1) amortized performance for shift() and pop(), while avoid edge cases where we keep shrinking/growing for alternating add/remove ops around the shrink/grow boundary.
Updated•7 years ago
|
platform-rel: --- → ?
Whiteboard: [qf] → [qf][platform-rel-Outlook][platform-rel-Microsoft]
Assignee | ||
Comment 4•7 years ago
|
||
This implements what I described in comment 0. It seems to work very well: we can now shift the elements header and when we grow/shrink elements we unshift these shifted elements if needed.
Most of the changes have to do with GC stuff: barriers, elements alloc/realloc/free, etc. Other than that it works pretty transparently and I didn't have to touch a lot of code. I changed Ion's optimized shift code to move the elements before (instead of after) updating the length + initializedLength, this makes it more similar to ArrayShiftDenseKernel.
The attached micro-benchmark improves from 35 ms to 0-1 ms. On Jon's GC barrier benchmark in bug 1362956 I get this:
Before:
Length: Iterations: GC state: Contents: Shift time: Pop time:
100000 1000 no GC objects 0.10338 0.0010601
100000 1000 no GC numbers 0.084298 0.00094800
100000 1000 in GC objects 1.8337 0.0011079
100000 1000 in GC numbers 0.43972 0.0010459
After:
Length: Iterations: GC state: Contents: Shift time: Pop time:
100000 1000 no GC objects 0.00022510 0.00090601
100000 1000 no GC numbers 0.00010181 0.00073486
100000 1000 in GC objects 0.00013989 0.0010242
100000 1000 in GC numbers 0.00010498 0.00071899
Note that shift() is now as fast as pop() and we no longer have the bad perf cliff when we're in the middle of an incremental GC.
We can also optimize unshift() now when there are shifted elements, but I'll do that as a follow-up.
Assignee: nobody → jdemooij
Status: NEW → ASSIGNED
Flags: needinfo?(jdemooij)
Attachment #8866435 -
Flags: review?(jcoppeard)
Assignee | ||
Comment 5•7 years ago
|
||
Note that shift() is also used quite a lot in our chrome code, for instance when restoring tabs:
http://searchfox.org/mozilla-central/rev/7057a51c5cf29b5b115b1db19ace2cfe3fd83a0e/browser/components/sessionstore/SessionStore.jsm#3823
So this may improve session restore perf a bit for users with thousands of tabs.
Comment 6•7 years ago
|
||
Note to ehsan & QF triage team: I'm marking this "qf:p1" because that's what we'd do in triage, and I might have to miss the first half of the next triage meeting.
Whiteboard: [qf][platform-rel-Outlook][platform-rel-Microsoft] → [qf:p1][platform-rel-Outlook][platform-rel-Microsoft]
Assignee | ||
Comment 7•7 years ago
|
||
With the previous patch we could shift up to 64k elements but this patch reduces that to 2047. It doesn't matter much perf wise but it may improve memory usage in certain pathological cases, and in general I think it's good to be conservative with these things.
I noticed Chrome has a bad perf cliff somewhere around ~50,000:
Chrome shifting 50k elements: 4 ms
Chrome shifting 51k elements: > 1 second
Chrome shifting 80k elements: > 2.5 seconds
Nightly without patch shifting 80k elements: ~2.5 seconds
Nightly with patch shifting 80k elements: 4 ms
So Chrome still has the quadratic behavior when shifting more than 50k elements. Safari takes about 5 ms and is similar to us.
Attachment #8866435 -
Attachment is obsolete: true
Attachment #8866435 -
Flags: review?(jcoppeard)
Attachment #8866685 -
Flags: review?(jcoppeard)
Comment 8•7 years ago
|
||
Comment on attachment 8866685 [details] [diff] [review]
Patch
Review of attachment 8866685 [details] [diff] [review]:
-----------------------------------------------------------------
This is great. And those numbers speak for themselves.
::: js/src/vm/NativeObject.cpp
@@ +728,5 @@
> +
> + // Move the elements. Initialize to |undefined| to ensure pre-barriers
> + // don't see garbage.
> + for (size_t i = 0; i < numShifted; i++)
> + initDenseElement(i, UndefinedValue());
We need to trigger a prebarrier when shifting these elements out of the array in the first place. Does that not happen already? Oh right, this might have have the ObjectElements header copied over it.
::: js/src/vm/NativeObject.h
@@ +210,5 @@
> + static const size_t MaxShiftedElements = (1 << NumShiftedElementsBits) - 1;
> + static const size_t NumShiftedElementsShift = 32 - NumShiftedElementsBits;
> + static const size_t FlagsMask = (1 << NumShiftedElementsShift) - 1;
> + static_assert(MaxShiftedElements == 2047,
> + "MaxShiftedElements should match the comment");
It might be simpler to store flags and shifted elements count as separate uint16_ts. Setting a maximum for the shifted element count is a good idea though.
@@ +273,5 @@
> + void addShiftedElements(uint32_t count) {
> + MOZ_ASSERT(count < capacity);
> + MOZ_ASSERT(count < initializedLength);
> + MOZ_ASSERT(!(flags & (NONWRITABLE_ARRAY_LENGTH | FROZEN | COPY_ON_WRITE)));
> + uint32_t numShifted = numShiftedElements() + count;
Can this overflow?
@@ +1054,5 @@
> + void* getUnshiftedElementsHeader() const {
> + return ObjectElements::fromElements(unshiftedElements());
> + }
> +
> + uint32_t indexForBarrier(uint32_t index) const {
The name doesn't make it clear what this does. Maybe it could be called unshiftedIndex or indexIgnoringShift or something?
Attachment #8866685 -
Flags: review?(jcoppeard) → review+
Pushed by jandemooij@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/ecfa2c50a8d8
Optimize Array.prototype.shift to have O(1) perf instead of O(n). r=jonco
Assignee | ||
Comment 10•7 years ago
|
||
Thanks for the quick review.
(In reply to Jon Coppeard (:jonco) from comment #8)
> It might be simpler to store flags and shifted elements count as separate
> uint16_ts. Setting a maximum for the shifted element count is a good idea
> though.
These flags are accessed from JIT code and there's some concern that accessing 16-bit values is slower than 32-bit values. We have some workarounds for that for JSFunction flags, see this comment:
http://searchfox.org/mozilla-central/rev/d66b9f27d5630a90b2fce4d70d4e9050f43df9b4/js/src/jit/MacroAssembler-inl.h#446
> Can this overflow?
No all these values (and the result of the addition too) are <= MAX_DENSE_ELEMENTS_ALLOCATION.
> The name doesn't make it clear what this does. Maybe it could be called
> unshiftedIndex or indexIgnoringShift or something?
Good point, I renamed it to unshiftedIndex and removed the comment (because the function is no longer barrier-specific).
Comment 11•7 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
status-firefox55:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla55
Comment 13•5 years ago
|
||
Here's a test showcasing the improved performance compared to Chrome:
http://www.lonniebest.com/BadShiftPerformance/
Here is some praise:
https://www.reddit.com/r/firefox/comments/d511vl/firefox_arrayshift_is_faster_than_chrome/
Updated•3 years ago
|
Performance Impact: --- → P1
Whiteboard: [qf:p1][platform-rel-Outlook][platform-rel-Microsoft] → [platform-rel-Outlook][platform-rel-Microsoft]
You need to log in
before you can comment on or make changes to this bug.
Description
•