Open Bug 1845728 Opened 1 year ago Updated 11 months ago

Make Object.keys elidable when only the length property is used.

Categories: Core :: JavaScript Engine: JIT, enhancement, P1

Tracking: ASSIGNED

People: (Reporter: nbp, Assigned: nbp)

References: (Blocks 1 open bug)

Whiteboard: [sp3]

Attachments: (2 files)

In Bug 1827268 comment 0, the snippet of code shows that Object.keys is called on two objects in order to compare the lengths against each other, and then only one of the two key arrays is iterated over.
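For reference, the pattern being described looks roughly like React's shallowEqual (a hedged sketch for illustration; the exact code from Bug 1827268 comment 0 may differ):

```javascript
// Sketch of the two-Object.keys pattern: keysB is allocated only so that
// its .length can be compared; the array itself is never iterated.
function shallowEqual(objA, objB) {
  if (Object.is(objA, objB)) return true;
  if (typeof objA !== "object" || objA === null ||
      typeof objB !== "object" || objB === null) {
    return false;
  }
  const keysA = Object.keys(objA);
  const keysB = Object.keys(objB); // used only for its length
  if (keysA.length !== keysB.length) return false;
  for (let i = 0; i < keysA.length; i++) {
    if (!Object.prototype.hasOwnProperty.call(objB, keysA[i]) ||
        !Object.is(objA[keysA[i]], objB[keysA[i]])) {
      return false;
    }
  }
  return true;
}
```

The second Object.keys result is dead apart from its length property, which is what makes that allocation a candidate for elision.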

If we do not implement a cache for Object.keys, we could replace one of the two Object.keys calls with a special function that only computes the length, avoiding building the array.

Attached file shallowTest.js (deleted) —

I whipped together an egregiously broken prototype of this. Suppose you have a builtin "KeysCount" which doesn't allocate the array but simply returns the count that the array would have had. You could then write shallowEquals as follows:

// What if we had a hypothetical builtin that could count keys without actually allocating? and we could
// transform this to use said builtin?
function shallowEqualKeysCount(objA, objB) {
    if (Object.is(objA, objB)) return true;
    if ("object" !== typeof objA || null === objA || "object" !== typeof objB || null === objB) return !1;
    if (KeysCount(objA) !== KeysCount(objB)) {
        return false;
    }

    var keysA = Object.keys(objA);
    for (let key = 0; key < keysA.length; key++) {
        if (!Object.prototype.hasOwnProperty.call(objB, keysA[key]) || !Object.is(objA[keysA[key]], objB[keysA[key]])) {
            return false;
        }
    }
    return true;
}

(The expectation here being that, hypothetically, we could teach the JITs enough to use this intrinsic.)

The challenge in evaluating this is designing a representative workload. Unfortunately, I don't have any insight into what the actual React values that go through shallowEquals look like. However, I built the attached benchmark: you have 3 groups of objects, A, B and C. A and B differ in the length of the property key set. A and C differ only in a value.
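A minimal sketch of that benchmark structure (the attached shallowTest.js may differ in sizes and shapes; the group contents here are illustrative assumptions):

```javascript
// Three groups of objects for exercising shallowEquals:
//  - A vs B differ in key-set length, so a count check fails fast;
//  - A vs C have identical keys but one differing value, so the count
//    check passes and the full key/value loop must run.
const groupA = [];
const groupB = [];
const groupC = [];
for (let i = 0; i < 1000; i++) {
  groupA.push({ x: i, y: i, z: i });
  groupB.push({ x: i, y: i });           // fewer keys than A
  groupC.push({ x: i, y: i, z: i + 1 }); // same keys as A, one value differs
}
```

With this split, the counting builtin only helps on the A-vs-B comparisons, where the key counts actually differ.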

Using a jerry-rigged builtin that borrows all its code from the paths for obj_keys, I measure about a 20% speedup using the counting version of shallowEquals over the non-counting version.

hyperfine '/home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js ../perf/shallowTest.js' 'COUNT=1 /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js ../perf/shallowTest.js' 
Benchmark 1: /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js ../perf/shallowTest.js
  Time (mean ± σ):      1.012 s ±  0.016 s    [User: 1.042 s, System: 0.036 s]
  Range (min … max):    0.982 s …  1.032 s    10 runs
 
Benchmark 2: COUNT=1 /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js ../perf/shallowTest.js
  Time (mean ± σ):     847.0 ms ±  10.6 ms    [User: 850.0 ms, System: 11.6 ms]
  Range (min … max):   836.9 ms … 873.3 ms    10 runs
 
Summary
  'COUNT=1 /home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js ../perf/shallowTest.js' ran
    1.19 ± 0.02 times faster than '/home/matthew/unified/obj-opt-shell-nodebug-x86_64-pc-linux-gnu/dist/bin/js ../perf/shallowTest.js'

I'll attach my poor patch to this bug.

Note: again, the problem with this evidence is that it's not clear what the values actually look like in React applications! Note, for example, that if you drop Group B from the benchmark, you see no speedup whatsoever -- this surprised me, by the way! I would have expected some speedup from avoiding the array allocation, but I don't measure any without Group B.

Whiteboard: [sp3]
Severity: -- → N/A
Priority: -- → P1

FWIW, looking at the DFG-generated code from JSC: it does not elide the array allocation (but it does perform the allocation inline in JIT code), and it is still 3.3x faster than SM on shallowEquals.

That's a fair point. Before experimenting with scalar replacement, we could start by simply generating an inline fast allocation path for cases where the native iterator is available.

I literally do not care how others are implementing things, unless I have no clue and I need some reading material for inspiration.

Recover instructions do not exist in other engines, and they are unlikely to adopt them if they do not already have an implementation. I also doubt they would implement recover instructions for this one use case.

I am not here to implement the same feature as other engines, I am here to make us faster, and we would not be faster if we keep copying.

Fear not; knowing about how other engines work doesn't mean you have to discard your own ideas for optimization. It just means that by combining both ideas you may become faster than the other engines.

(In reply to Nicolas B. Pierron [:nbp] from comment #5)

> and we would not be faster if we keep copying.

Not strictly true, because there are multiple engines to copy from. If the time in a function is made up of its subparts A and B, and Chrome is fast at A but slow at B while Safari is slow at A but fast at B, then we can be faster than either of them by copying Chrome's optimization for A and Safari's optimization for B, becoming fast at both A and B.
