Open Bug 1827268 Opened 1 year ago Updated 10 months ago

shallowEqual from React is a lot slower in SM than V8 and JSC

Categories

(Core :: JavaScript Engine: JIT, enhancement, P2)

enhancement

Tracking

()

People

(Reporter: jrmuizel, Unassigned)

References

(Depends on 2 open bugs, Blocks 2 open bugs)

Details

(Whiteboard: [sp3])

Attachments

(2 files, 1 obsolete file)

function shallowEqual(objA, objB) {
  if (objectIs(objA, objB)) return !0;
  if ("object" !== typeof objA || null === objA || "object" !== typeof objB || null === objB) return !1;
  var keysA = Object.keys(objA),
    keysB = Object.keys(objB);
  if (keysA.length !== keysB.length) return !1;
  for (keysB = 0; keysB < keysA.length; keysB++) if (!hasOwnProperty$1.call(objB, keysA[keysB]) || !objectIs(objA[keysA[keysB]], objB[keysA[keysB]])) return !1;
  return !0;
}

SM: https://share.firefox.dev/3KtdDnP
V8: https://share.firefox.dev/3zMGxu3
JSC: https://share.firefox.dev/3monnHQ

The majority of the time is spent in Object.keys.

V8 is 2x faster than us. JSC is 3.6x faster. Getting to the same speed as V8 would gain us 3.3% on this benchmark. Getting to the same speed as JSC would gain us 3.3%

This function shows up as a problem in TodoMVC-React, TodoMVC-ReactRedux and MatrixReact. I suspect it also shows up in React-Stockcharts but don't have profiles showing that yet.

Whiteboard: [sp3]

Most of the time shallowEqual will run to completion and return true.

It looks like JSC is hitting this code in lower tiers:
https://searchfox.org/wubkat/rev/226c809b2e6b2461435b10075fdd24410dce2cea/Source/JavaScriptCore/runtime/ObjectConstructor.cpp#989

and at higher tiers inlines that fast path here:
https://searchfox.org/wubkat/rev/226c809b2e6b2461435b10075fdd24410dce2cea/Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp#15178

For Object.keys, most of our time is spent allocating the array object from C++. There's some slowness for the object metadata callback and various nursery bookkeeping. We might be able to carve out a faster path here.

We could also specialize Object.keys (and getOwnPropertyNames) in CacheIR. The main win would be to be able to allocate the array object in JIT code and then initialize its elements in C++. With the right caching we could avoid calling into C++ at all, but our property enumeration code seems to be pretty fast so for a first stab it might not be necessary.

(In reply to Jan de Mooij [:jandem] from comment #2)

For Object.keys, most of our time is spent allocating the array object from C++.

Yeah, we're page faulting quite a bit during the allocation. I want to try and better understand what's going on allocation wise between the different browsers.

Depends on: 1827420

Bug 1827420 should get rid of most of the AutoSetNewObjectMetadata overhead.

jonco and I talked about adding a faster, inlined code path for nursery allocation from C++. We think something more similar to the JIT's GC allocation code might work, where we optimize for the common case based on some flag.

Blocks: sm-runtime
Severity: -- → N/A
Priority: -- → P2
Depends on: 1827810
Depends on: 1836679

(In reply to Jeff Muizelaar [:jrmuizel] from comment #0)

V8 is 2x faster than us. JSC is 3.6x faster.

Update: After bug 1839437, JSC is now only 2.8x faster. Firefox profile, WebKit profile
There's still a lot of room to improve the React subtests, just by making shallowEqual faster.

(In reply to Markus Stange [:mstange] from comment #5)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #0)

V8 is 2x faster than us. JSC is 3.6x faster.

Update: After bug 1839437, JSC is now only 2.8x faster. Firefox profile, WebKit profile
There's still a lot of room to improve the React subtests, just by making shallowEqual faster.

I want to better understand before taking this work, are we comparing the identical string comparisons workloads?
How is for-in cache related to shallowEqual speed? Are we sure this is an equality issue, or could this be that we are doing too many comparisons?

Flags: needinfo?(mstange.moz)
Flags: needinfo?(jmuizelaar)

One crazy talk idea:

I noticed that keysB = Object.keys(objB) is only used to compute the length of the array. Thus, we should not have to worry about things such as ordering the keys and so on, as long as we can count correctly.

If we were to optimize that only in Ion, and given the existing Firefox profile we might be slightly slower than JSC.

This might be doable using recover instructions (in cased of removed branches), and ensuring that there is no aliasing objects inbetween, and that array length prototype is not changed. However, I do not think we have any way to invalidate the code if the length property is replaced on the prototype nowadays.

There might be some way to make this work with scalar replacement?

Right now we do a regular call into the C++ implementation of Object.keys. If we turn that into a dedicated MIR node, then we could look for cases where we only do keys[idx] and/or keys.length and skip allocating the array itself.

(In reply to Nicolas B. Pierron [:nbp] from comment #6)

are we comparing the identical string comparisons workloads?

We are comparing the time spent in the JS function shallowEqual on https://shell--speedometer-preview.netlify.app/?suite=TodoMVC-React&iterationCount=100#home . It should be identical workloads. I don't know what the object shapes / field types are.

How is for-in cache related to shallowEqual speed?

See the discussion in bug 1836679.

Are we sure this is an equality issue, or could this be that we are doing too many comparisons?

I don't know.

Flags: needinfo?(mstange.moz)
Flags: needinfo?(jmuizelaar)
Assignee: nobody → nicolas.b.pierron
Status: NEW → ASSIGNED
Depends on: 1845728
Assignee: nicolas.b.pierron → nobody
Status: ASSIGNED → NEW
Attached file shallowBench.js (obsolete) (deleted) —

Attached is an attempt at creating a microbenchmark shallowEquals to measure the current state of the world outside of react.

It's interesting to note that in a profile of this, obj_keys is down to only 30% of the execution time of shallow_equals.

This version of the profile has IONPERF=ir set

Looking at where the ticks end up out of 967 ticks:

  • 341 Ticks to obj_keys
  • 168 ticks to a MegamorphicHasProp
  • 108 and 140 = 240 ticks to two MegamorphicLoadSlotByValue

Of course, all this could well be a side effect of the microbenchmark. I'll try seeing if I can figure out approximately what a real react benchmark looks like here

So, to get more ticks measured in shallowEqual, I wrapped the benchmark() call in Speedometer/resources/todomvc/architecture-examples/react/index.js in a 30 iteration loop, and upped the sample rate to 4000 on samply.

That produces this profile profile with IONPERF=ir set.

Some Observations:

  • My microbenchmark doesn't reflect react: The Megamorphic bits are gone.
  • obj_keys is up to 44% of ticks
  • 12% of time is spent in IonIC GetElems (why these show up as siblings to the ion compiled shallowEqual and not children I'm not entirely clear.
  • The rest of the profile outside of the calls is quite flat looking at the Ion IR (358 ticks); no individual op is assigned more than 12 ticks.

In the older version of React that is used in Speedometer 2, shallowEqual looks like this:

  function shallowEqual(a, b) {
    var k;
    for (k in a) {
      if (a.hasOwnProperty(k) && a[k] !== b[k]) {
        return false;
      }
    }
    for (k in b) {
      if (b.hasOwnProperty(k) && a[k] !== b[k]) {
        return false;
      }
    }
    return true;
  }

The a[k] and b[k] inside the loop end up being megamorphic. In bug 1799025, motivated in part by this code, I added an optimization to speed up obj[key] inside a for (key in obj) loop.

That optimization doesn't do anything for the new version, but maybe we can try to reuse that approach. In particular, objA[keysA[keysB]] is loading a property of objA based on an element of Object.keys(objA), so we have extra information and can do better than the megamorphic cache.

Rough sketch: we generate a specialized MIR node for Object.keys. In scalar replacement, we look for cases where the only uses of the result are Object.keys(o).length and/or Object.keys(o)[idx]. (We could support more cases if they came up elsewhere.) Object.keys(o) becomes MObjectToIterator. MIteratorLength and MLoadIteratorElement should be relatively simple to implement.

The major obstacle that might sink the whole scheme is figuring out how to close the iterator afterwards. We keep track of which iterators are currently in use because deleting a property during a for-in can mutate the iterator contents. For actual for-in loops, we have bytecode ops and trynotes to ensure that iterators are always closed appropriately. We would need a different approach if we're handling iterators that aren't present in the original bytecode.

If we could make this work, we would be able to eliminate the allocation of the Object.keys arrays, and replace one of the megamorphic loads with a faster version. The megamorphic hasProp and the second megamorphic load are harder to eliminate because they're using keys from one object to access another object.

Just to make sure I'm on the same page: You think this will work on the react workload, rather than the microbenchmark I built (which turns out to not be reflective of the react benchmark? )

I still need to profile the other react subtests -- got distracted :D

Following the same procedure as Comment 12, react-redux -- looks very similar to react above.

Angular does have a similar function to shallowEqual (sharing a name) but it is barely called at all in the testing.

Unfortunately I haven't been able to get profiles of other potential users of shallowEqual in the browser.

(In reply to Matthew Gaudet (he/him) [:mgaudet] from comment #14)

Just to make sure I'm on the same page: You think this will work on the react workload, rather than the microbenchmark I built (which turns out to not be reflective of the react benchmark? )

Hmm, good question. I wrote my comment before reading yours.

I think the megamorphic bits are gone because we failed to transpile the baseline ICs. Note that there are GetElem ICs as siblings (presumably replacing the MegamorphicLoadSlotByValue from your microbenchmark) and a HasOwn IC (VMWrapper: IonHasOwnICUpdate shows up as a child). It would be interesting to figure out why we couldn't transpile a megamorphic IC from baseline; maybe some value flowed in that we don't support, like an object with a getter?

The iterator indices optimization can optimize getProp and hasProp ICs, so that change doesn't necessarily rule out the scalar replacement approach. But scalar replacement is complicated and uncertain; figuring out why our megamorphic ICs don't support real React should be easier. I'd start there.

If I bump the iteration count and change shallowEqual() in the attached benchmark to be:

function shallowEqual(objA, objB) {
        return objectIs(objA, objB);
}

I get:

V8:  Infinity   2329.542 equal:  19000000 not equal:  342000000
SM:  Infinity   4153.805 equal:  19000000 not equal:  342000000
JSC: Infinity   2330.136 equal:  19000000 not equal:  342000000

Hmm. So it appears that we just never try to transpile the baseline ICs because we have multiple active stubs; using some home-brewed printing I get the following stubs:

Starting to dump shallowEqual stubs
This stub has an entry count of 891
shallowEqualStub  GuardToObject                 inputId 0
shallowEqualStub  GuardToString                 inputId 1
shallowEqualStub  GuardSpecificAtom             strId 1, expectedOffset 0
shallowEqualStub  GuardShape                    objId 0, shapeOffset 8
shallowEqualStub  LoadFixedSlotResult           objId 0, offsetOffset 16
shallowEqualStub  ReturnFromIC                  
--- end  stub ---
This stub has an entry count of 594
shallowEqualStub  GuardToObject                 inputId 0
shallowEqualStub  GuardToString                 inputId 1
shallowEqualStub  GuardSpecificAtom             strId 1, expectedOffset 0
shallowEqualStub  GuardShape                    objId 0, shapeOffset 8
shallowEqualStub  LoadFixedSlotResult           objId 0, offsetOffset 16
shallowEqualStub  ReturnFromIC                  
--- end  stub ---
This stub has an entry count of 297
shallowEqualStub  GuardToObject                 inputId 0
shallowEqualStub  GuardToString                 inputId 1
shallowEqualStub  GuardSpecificAtom             strId 1, expectedOffset 0
shallowEqualStub  GuardShape                    objId 0, shapeOffset 8
shallowEqualStub  LoadFixedSlotResult           objId 0, offsetOffset 16
shallowEqualStub  ReturnFromIC                  
--- end  stub ---
>> All Stubs dumped

I haven't quite managed to get what the actual atoms are yet, but is this something where the answer would be stub folding on the specific atom?

Printing the keys in the benchmark it seems we always have exactly 3 keys; todo,dispatch,index (oops. hit submit by accident)

This would explain us having 3 stubs, and why we can't transpile.

You could try using Caroline's CacheIRHealthReport work as a less home-brewed way to get this information. I believe it has a feature where IC chains with identical CacheIR will report which ops have the same fields, and which ops have different fields, which would help to determine whether there are multiple atoms and/or multiple shapes here.

Do your results imply that all these objects have exactly the same shape? I would not have predicted that. Is that normal for React, or a consequence of the way the benchmark was written?

On the three attaches I see at the GetElem I'm looking at they're all for the same object (and thus same shape); but given the chains are also length 3 by the end... I think we can conclude they always have the same shape.

My (weak) understanding of React, that you have a state object which updated through copies would lead me to conclude that having the same shape is not unexpected.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #17)

If I bump the iteration count and change shallowEqual() in the attached benchmark to be:

I think I'm going to mark the microbenchmark as obsolete for now -- we probably can rebuild it based on what I'm learning from TodoMCV/React.

At the moment I think it would be confusing to look at it too much more.

Attachment #9346265 - Attachment is obsolete: true

I would expect similar shapes for each React component type, and TodoMVC probably just doesn't have a lot of different React components.
Let's check the code: todo,dispatch,index are the props of the Item component: https://github.com/WebKit/Speedometer/blob/0fef685dcbbc4d56e82b9f4bdcb63b02a6270d73/resources/todomvc/architecture-examples/react/src/todo/components/item.jsx#L6
There are four other components in the components directory so I would expect their props to have other shapes.

IIRC shallowEqual gets called for the props of a component to determine if it should re-render. It's possible that Item is the only component that gets re-rendered during the benchmark.

You could use the Firefox profiler for a more realistic React testcase.

I think we will end up going megamorphic if the total number of props across all components exceeds six. That isn't true here, but I would assume it's true for most non-trivial React code. That said, the code we currently generate may well be faster than the megamorphic cache would have been, so we don't necessarily need to fix anything here.

(In reply to Iain Ireland [:iain] from comment #13)

Rough sketch: we generate a specialized MIR node for Object.keys. In scalar replacement, we look for cases where the only uses of the result are Object.keys(o).length and/or Object.keys(o)[idx]. (We could support more cases if they came up elsewhere.) Object.keys(o) becomes MObjectToIterator. MIteratorLength and MLoadIteratorElement should be relatively simple to implement.

I am currently looking at this issue as part of Bug 1845728, without looking at the iterator aspect yet.

I gathered per line sample counts on react-todoMVC-long. You can see that Object.keys is still a big problem in SM compared to JSC. There's also a lot of samples being attributed to the for loop. (I haven't looked into why yet)

You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: