1827268 - shallowEqual from React is a lot slower in SM than V8 and JSC

Reporter

Description

•

1 year ago

function shallowEqual(objA, objB) {
  if (objectIs(objA, objB)) return !0;
  if ("object" !== typeof objA || null === objA || "object" !== typeof objB || null === objB) return !1;
  var keysA = Object.keys(objA),
    keysB = Object.keys(objB);
  if (keysA.length !== keysB.length) return !1;
  for (keysB = 0; keysB < keysA.length; keysB++) if (!hasOwnProperty$1.call(objB, keysA[keysB]) || !objectIs(objA[keysA[keysB]], objB[keysA[keysB]])) return !1;
  return !0;
}

SM: https://share.firefox.dev/3KtdDnP
V8: https://share.firefox.dev/3zMGxu3
JSC: https://share.firefox.dev/3monnHQ

The majority of the time is spent in Object.keys.

V8 is 2x faster than us. JSC is 3.6x faster. Getting to the same speed as V8 would gain us 3.3% on this benchmark. Getting to the same speed as JSC would gain us 3.3%

This function shows up as a problem in TodoMVC-React, TodoMVC-ReactRedux and MatrixReact. I suspect it also shows up in React-Stockcharts but don't have profiles showing that yet.

Jeff Muizelaar [:jrmuizel]

Reporter

Updated

•

1 year ago

Whiteboard: [sp3]

Jira Integration Bot

Updated

•

1 year ago

See Also: → https://mozilla-hub.atlassian.net/browse/SP3-317

Markus Stange [:mstange]

Updated

•

1 year ago

Blocks: speedometer3

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 1

•

1 year ago

Most of the time shallowEqual will run to completion and return true.

It looks like JSC is hitting this code in lower tiers:
https://searchfox.org/wubkat/rev/226c809b2e6b2461435b10075fdd24410dce2cea/Source/JavaScriptCore/runtime/ObjectConstructor.cpp#989

and at higher tiers inlines that fast path here:
https://searchfox.org/wubkat/rev/226c809b2e6b2461435b10075fdd24410dce2cea/Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp#15178

Jan de Mooij [:jandem]

Comment 2

•

1 year ago

For Object.keys, most of our time is spent allocating the array object from C++. There's some slowness for the object metadata callback and various nursery bookkeeping. We might be able to carve out a faster path here.

We could also specialize Object.keys (and getOwnPropertyNames) in CacheIR. The main win would be to be able to allocate the array object in JIT code and then initialize its elements in C++. With the right caching we could avoid calling into C++ at all, but our property enumeration code seems to be pretty fast so for a first stab it might not be necessary.

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 3

•

1 year ago

(In reply to Jan de Mooij [:jandem] from comment #2)

For Object.keys, most of our time is spent allocating the array object from C++.

Yeah, we're page faulting quite a bit during the allocation. I want to try and better understand what's going on allocation wise between the different browsers.

Jan de Mooij [:jandem]

Updated

•

1 year ago

Depends on: 1827420

Jan de Mooij [:jandem]

Comment 4

•

1 year ago

Bug 1827420 should get rid of most of the AutoSetNewObjectMetadata overhead.

jonco and I talked about adding a faster, inlined code path for nursery allocation from C++. We think something more similar to the JIT's GC allocation code might work, where we optimize for the common case based on some flag.

Nicolas B. Pierron [:nbp]

Updated

•

1 year ago

Blocks: sm-runtime

Severity: -- → N/A

Priority: -- → P2

Jan de Mooij [:jandem]

Updated

•

1 year ago

Depends on: 1827810

Jeff Muizelaar [:jrmuizel]

Reporter

Updated

•

1 year ago

Depends on: 1836679

Markus Stange [:mstange]

Comment 5

•

1 year ago

(In reply to Jeff Muizelaar [:jrmuizel] from comment #0)

V8 is 2x faster than us. JSC is 3.6x faster.

Update: After bug 1839437, JSC is now only 2.8x faster. Firefox profile, WebKit profile
There's still a lot of room to improve the React subtests, just by making shallowEqual faster.

Nicolas B. Pierron [:nbp]

Comment 6

•

1 year ago

(In reply to Markus Stange [:mstange] from comment #5)

(In reply to Jeff Muizelaar [:jrmuizel] from comment #0)

V8 is 2x faster than us. JSC is 3.6x faster.

Update: After bug 1839437, JSC is now only 2.8x faster. Firefox profile, WebKit profile
There's still a lot of room to improve the React subtests, just by making shallowEqual faster.

I want to better understand before taking this work, are we comparing the identical string comparisons workloads?
How is for-in cache related to shallowEqual speed? Are we sure this is an equality issue, or could this be that we are doing too many comparisons?

Flags: needinfo?(mstange.moz)

Flags: needinfo?(jmuizelaar)

Nicolas B. Pierron [:nbp]

Comment 7

•

1 year ago

One crazy talk idea:

I noticed that keysB = Object.keys(objB) is only used to compute the length of the array. Thus, we should not have to worry about things such as ordering the keys and so on, as long as we can count correctly.

If we were to optimize that only in Ion, and given the existing Firefox profile we might be slightly slower than JSC.

This might be doable using recover instructions (in cased of removed branches), and ensuring that there is no aliasing objects inbetween, and that array length prototype is not changed. However, I do not think we have any way to invalidate the code if the length property is replaced on the prototype nowadays.

Iain Ireland [:iain]

Comment 8

•

1 year ago

There might be some way to make this work with scalar replacement?

Right now we do a regular call into the C++ implementation of Object.keys. If we turn that into a dedicated MIR node, then we could look for cases where we only do keys[idx] and/or keys.length and skip allocating the array itself.

Markus Stange [:mstange]

Comment 9

•

1 year ago

(In reply to Nicolas B. Pierron [:nbp] from comment #6)

are we comparing the identical string comparisons workloads?

We are comparing the time spent in the JS function shallowEqual on https://shell--speedometer-preview.netlify.app/?suite=TodoMVC-React&iterationCount=100#home . It should be identical workloads. I don't know what the object shapes / field types are.

How is for-in cache related to shallowEqual speed?

See the discussion in bug 1836679.

Are we sure this is an equality issue, or could this be that we are doing too many comparisons?

I don't know.

Flags: needinfo?(mstange.moz)

Flags: needinfo?(jmuizelaar)

Nicolas B. Pierron [:nbp]

Updated

•

1 year ago

Assignee: nobody → nicolas.b.pierron

Status: NEW → ASSIGNED

Nicolas B. Pierron [:nbp]

Updated

•

11 months ago

Depends on: 1845728

Nicolas B. Pierron [:nbp]

Updated

•

11 months ago

Assignee: nicolas.b.pierron → nobody

Status: ASSIGNED → NEW

Matthew Gaudet (he/him) [:mgaudet]

Comment 10

•

11 months ago

Attached file shallowBench.js (obsolete) (deleted) — Details

Attached is an attempt at creating a microbenchmark shallowEquals to measure the current state of the world outside of react.

It's interesting to note that in a profile of this, obj_keys is down to only 30% of the execution time of shallow_equals.

Matthew Gaudet (he/him) [:mgaudet]

Comment 11

•

11 months ago

This version of the profile has IONPERF=ir set

Looking at where the ticks end up out of 967 ticks:

341 Ticks to obj_keys
168 ticks to a MegamorphicHasProp
108 and 140 = 240 ticks to two MegamorphicLoadSlotByValue

Of course, all this could well be a side effect of the microbenchmark. I'll try seeing if I can figure out approximately what a real react benchmark looks like here

Matthew Gaudet (he/him) [:mgaudet]

Comment 12

•

11 months ago

So, to get more ticks measured in shallowEqual, I wrapped the benchmark() call in Speedometer/resources/todomvc/architecture-examples/react/index.js in a 30 iteration loop, and upped the sample rate to 4000 on samply.

That produces this profile profile with IONPERF=ir set.

Some Observations:

My microbenchmark doesn't reflect react: The Megamorphic bits are gone.
obj_keys is up to 44% of ticks
12% of time is spent in IonIC GetElems (why these show up as siblings to the ion compiled shallowEqual and not children I'm not entirely clear.
The rest of the profile outside of the calls is quite flat looking at the Ion IR (358 ticks); no individual op is assigned more than 12 ticks.

Iain Ireland [:iain]

Comment 13

•

11 months ago

In the older version of React that is used in Speedometer 2, shallowEqual looks like this:

  function shallowEqual(a, b) {
    var k;
    for (k in a) {
      if (a.hasOwnProperty(k) && a[k] !== b[k]) {
        return false;
      }
    }
    for (k in b) {
      if (b.hasOwnProperty(k) && a[k] !== b[k]) {
        return false;
      }
    }
    return true;
  }

The a[k] and b[k] inside the loop end up being megamorphic. In bug 1799025, motivated in part by this code, I added an optimization to speed up obj[key] inside a for (key in obj) loop.

That optimization doesn't do anything for the new version, but maybe we can try to reuse that approach. In particular, objA[keysA[keysB]] is loading a property of objA based on an element of Object.keys(objA), so we have extra information and can do better than the megamorphic cache.

Rough sketch: we generate a specialized MIR node for Object.keys. In scalar replacement, we look for cases where the only uses of the result are Object.keys(o).length and/or Object.keys(o)[idx]. (We could support more cases if they came up elsewhere.) Object.keys(o) becomes MObjectToIterator. MIteratorLength and MLoadIteratorElement should be relatively simple to implement.

The major obstacle that might sink the whole scheme is figuring out how to close the iterator afterwards. We keep track of which iterators are currently in use because deleting a property during a for-in can mutate the iterator contents. For actual for-in loops, we have bytecode ops and trynotes to ensure that iterators are always closed appropriately. We would need a different approach if we're handling iterators that aren't present in the original bytecode.

If we could make this work, we would be able to eliminate the allocation of the Object.keys arrays, and replace one of the megamorphic loads with a faster version. The megamorphic hasProp and the second megamorphic load are harder to eliminate because they're using keys from one object to access another object.

Matthew Gaudet (he/him) [:mgaudet]

Comment 14

•

11 months ago

Just to make sure I'm on the same page: You think this will work on the react workload, rather than the microbenchmark I built (which turns out to not be reflective of the react benchmark? )

I still need to profile the other react subtests -- got distracted :D

Matthew Gaudet (he/him) [:mgaudet]

Comment 15

•

11 months ago

Following the same procedure as Comment 12, react-redux -- looks very similar to react above.

Angular does have a similar function to shallowEqual (sharing a name) but it is barely called at all in the testing.

Unfortunately I haven't been able to get profiles of other potential users of shallowEqual in the browser.

Iain Ireland [:iain]

Comment 16

•

11 months ago

(In reply to Matthew Gaudet (he/him) [:mgaudet] from comment #14)

Just to make sure I'm on the same page: You think this will work on the react workload, rather than the microbenchmark I built (which turns out to not be reflective of the react benchmark? )

Hmm, good question. I wrote my comment before reading yours.

I think the megamorphic bits are gone because we failed to transpile the baseline ICs. Note that there are GetElem ICs as siblings (presumably replacing the MegamorphicLoadSlotByValue from your microbenchmark) and a HasOwn IC (VMWrapper: IonHasOwnICUpdate shows up as a child). It would be interesting to figure out why we couldn't transpile a megamorphic IC from baseline; maybe some value flowed in that we don't support, like an object with a getter?

The iterator indices optimization can optimize getProp and hasProp ICs, so that change doesn't necessarily rule out the scalar replacement approach. But scalar replacement is complicated and uncertain; figuring out why our megamorphic ICs don't support real React should be easier. I'd start there.

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 17

•

11 months ago

If I bump the iteration count and change shallowEqual() in the attached benchmark to be:

function shallowEqual(objA, objB) {
        return objectIs(objA, objB);
}

I get:

V8:  Infinity   2329.542 equal:  19000000 not equal:  342000000
SM:  Infinity   4153.805 equal:  19000000 not equal:  342000000
JSC: Infinity   2330.136 equal:  19000000 not equal:  342000000

Matthew Gaudet (he/him) [:mgaudet]

Comment 18

•

11 months ago

Hmm. So it appears that we just never try to transpile the baseline ICs because we have multiple active stubs; using some home-brewed printing I get the following stubs:

Starting to dump shallowEqual stubs
This stub has an entry count of 891
shallowEqualStub  GuardToObject                 inputId 0
shallowEqualStub  GuardToString                 inputId 1
shallowEqualStub  GuardSpecificAtom             strId 1, expectedOffset 0
shallowEqualStub  GuardShape                    objId 0, shapeOffset 8
shallowEqualStub  LoadFixedSlotResult           objId 0, offsetOffset 16
shallowEqualStub  ReturnFromIC                  
--- end  stub ---
This stub has an entry count of 594
shallowEqualStub  GuardToObject                 inputId 0
shallowEqualStub  GuardToString                 inputId 1
shallowEqualStub  GuardSpecificAtom             strId 1, expectedOffset 0
shallowEqualStub  GuardShape                    objId 0, shapeOffset 8
shallowEqualStub  LoadFixedSlotResult           objId 0, offsetOffset 16
shallowEqualStub  ReturnFromIC                  
--- end  stub ---
This stub has an entry count of 297
shallowEqualStub  GuardToObject                 inputId 0
shallowEqualStub  GuardToString                 inputId 1
shallowEqualStub  GuardSpecificAtom             strId 1, expectedOffset 0
shallowEqualStub  GuardShape                    objId 0, shapeOffset 8
shallowEqualStub  LoadFixedSlotResult           objId 0, offsetOffset 16
shallowEqualStub  ReturnFromIC                  
--- end  stub ---
>> All Stubs dumped

I haven't quite managed to get what the actual atoms are yet, but is this something where the answer would be stub folding on the specific atom?

Matthew Gaudet (he/him) [:mgaudet]

Comment 19

•

11 months ago

Printing the keys in the benchmark it seems we always have exactly 3 keys; todo,dispatch,index (oops. hit submit by accident)

This would explain us having 3 stubs, and why we can't transpile.

Iain Ireland [:iain]

Comment 20

•

11 months ago

You could try using Caroline's CacheIRHealthReport work as a less home-brewed way to get this information. I believe it has a feature where IC chains with identical CacheIR will report which ops have the same fields, and which ops have different fields, which would help to determine whether there are multiple atoms and/or multiple shapes here.

Do your results imply that all these objects have exactly the same shape? I would not have predicted that. Is that normal for React, or a consequence of the way the benchmark was written?

Matthew Gaudet (he/him) [:mgaudet]

Comment 21

•

11 months ago

On the three attaches I see at the GetElem I'm looking at they're all for the same object (and thus same shape); but given the chains are also length 3 by the end... I think we can conclude they always have the same shape.

My (weak) understanding of React, that you have a state object which updated through copies would lead me to conclude that having the same shape is not unexpected.

Matthew Gaudet (he/him) [:mgaudet]

Comment 22

•

11 months ago

(In reply to Jeff Muizelaar [:jrmuizel] from comment #17)

If I bump the iteration count and change shallowEqual() in the attached benchmark to be:

I think I'm going to mark the microbenchmark as obsolete for now -- we probably can rebuild it based on what I'm learning from TodoMCV/React.

At the moment I think it would be confusing to look at it too much more.

Matthew Gaudet (he/him) [:mgaudet]

Updated

•

11 months ago

Attachment #9346265 - Attachment is obsolete: true

Markus Stange [:mstange]

Comment 23

•

11 months ago

I would expect similar shapes for each React component type, and TodoMVC probably just doesn't have a lot of different React components.
Let's check the code: todo,dispatch,index are the props of the Item component: https://github.com/WebKit/Speedometer/blob/0fef685dcbbc4d56e82b9f4bdcb63b02a6270d73/resources/todomvc/architecture-examples/react/src/todo/components/item.jsx#L6
There are four other components in the components directory so I would expect their props to have other shapes.

IIRC shallowEqual gets called for the props of a component to determine if it should re-render. It's possible that Item is the only component that gets re-rendered during the benchmark.

You could use the Firefox profiler for a more realistic React testcase.

Iain Ireland [:iain]

Comment 24

•

11 months ago

I think we will end up going megamorphic if the total number of props across all components exceeds six. That isn't true here, but I would assume it's true for most non-trivial React code. That said, the code we currently generate may well be faster than the megamorphic cache would have been, so we don't necessarily need to fix anything here.

Nicolas B. Pierron [:nbp]

Comment 25

•

11 months ago

(In reply to Iain Ireland [:iain] from comment #13)

Rough sketch: we generate a specialized MIR node for Object.keys. In scalar replacement, we look for cases where the only uses of the result are Object.keys(o).length and/or Object.keys(o)[idx]. (We could support more cases if they came up elsewhere.) Object.keys(o) becomes MObjectToIterator. MIteratorLength and MLoadIteratorElement should be relatively simple to implement.

I am currently looking at this issue as part of Bug 1845728, without looking at the iterator aspect yet.

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 26

•

10 months ago

Attached image per line sample counts with jsc on react-long (deleted) — Details

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 27

•

10 months ago

Attached image per line sample counts with sm on react-long (deleted) — Details

Jeff Muizelaar [:jrmuizel]

Reporter

Comment 28

•

10 months ago

I gathered per line sample counts on react-todoMVC-long. You can see that Object.keys is still a big problem in SM compared to JSC. There's also a lot of samples being attributed to the for loop. (I haven't looked into why yet)

shallowBench.js 11 months ago Matthew Gaudet (he/him) [:mgaudet] (deleted), application/x-javascript		Details
per line sample counts with jsc on react-long 10 months ago Jeff Muizelaar [:jrmuizel] (deleted), image/png		Details
per line sample counts with sm on react-long 10 months ago Jeff Muizelaar [:jrmuizel] (deleted), image/png		Details