Open Bug 1820627 Opened 2 years ago Updated 1 year ago

A lot more time is spent in completeWork on matrix-react compared to Chrome

Categories

(Core :: JavaScript Engine, defect, P2)

defect

Tracking

()

People

(Reporter: mstange, Unassigned)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [sp3])

Attachments

(1 file)

In perf profiles of the shell version of the Matrix React benchmark, we identified completeWork as a JS function which had a significant difference in "self time" (JS-only view) between Spidermonkey and V8.

870 self samples in Spidermonkey: https://share.firefox.dev/3YqvS1S
649 self samples in V8: https://share.firefox.dev/3yeHLgG

These profiles are filtered so that they only show stacks which we consider "self time" of the JS function.

Many of the extra samples seems to be in the Ion-compiled code for the function, 408 vs 184.

Depends on: 1820605

The hottest hot spot in this function, by self time, is this part of a IsNullOrLikeUndefinedAndBranchV:

          mov rsi, -0x2000000000000
          xor rsi, rbx
          mov rdi, qword [rsi]             // Load shape from object
51   51   mov rdi, qword [rdi]             // Load baseshape from shape
32   32   mov rdi, qword [rdi]             // Load class from baseshape
14   14   test dword [rdi + 0x8], 0x80000  // Check for proxy class
 3    3   jnz $+0x46bb
 1    1   test dword [rdi + 0x8], 0x40     // Check emulates-undefined flag
          jnz $+0x5
 1    1   jmp $+0x1c

This corresponds to something like if (!someValtue) in the source code. We've already determined that the value is an object, but we have to verify that it isn't the stupid document.all object, which is falsy. To do so, we have to load the shape, base shape, and class.

It looks like V8 has an "undetectable" bit on their shape equivalent that they can check for the same reason. Now that there's a proxy bit on the shape, maybe it makes sense to hoist the emulatesUndefined bit to the shape too. (Prior to that, we had to load the class anyway to check whether the object was potentially a proxy for document.all.)

I hate that there's all this machinery to support a compatibility hack that hasn't been relevant in decades, but here we are.

My initial results are somewhat disappointing. I wrote a patch to hoist the flag out of the class and into an ObjectFlag on the shape, which should eliminate the baseshape/class loads. Profiling that version:

          mov rsi, -0x2000000000000
          xor rsi, rbx
 1        mov rdi, qword [rsi]            // Load shape from object
51   51   test dword [rdi + 0x8], 0x30    // Check for proxy class
22   22   jz $+0x468f                     
          test dword [rdi + 0xc], 0x2000  // Check emulates-undefined flag
 1    1   jnz $+0x5

We still end up with a lot of samples loading the shape of the object. I suppose it's possible that there are fewer distinct base shapes and (especially) classes than objects/shapes, so the loads I eliminated are more likely to hit in the cache.

Trying a run in CI to see if there's any measurable performance difference.

Here are updated profiles taken with fixed per-sample overhead.
SM: https://share.firefox.dev/3l7u2FL
V8: https://share.firefox.dev/3yBURVu
The difference in completeWork self time is now less pronounced: 1555 vs 1416 samples. So it's still worth looking into, but lower hanging fruit than other places in matrix-react. The larger difference in the original profiles may have been caused by stack-depth-dependent sampling overhead.

That said, the optimization still sounds worthwhile to my untrained ears. If you have three places which access a certain piece of memory, then you won't see improvements until you've eliminated all three. That doesn't mean that each individual elimination isn't worth taking.

I've put up the patch as a WIP. Does it affect your profile numbers at all?

The current version has some small costs elsewhere, although I think we could remove most of those if we decided this was worth doing at all.

Adding NI to Markus, based on comments 4 and 5.

Flags: needinfo?(mstange.moz)
Severity: -- → S3
Priority: -- → P2

I haven't tested the patch yet.

In the meantime, I noticed that this function also shows up on speedometer 3. The difference to Chrome makes up:

  • 1.3% on TodoMVC-React
  • 1.3% on TodoMVC-React-Redux
  • 1.0% on TodoMVC-React-Complex-DOM
  • 0.2% on React-Stockcharts-SVG
Blocks: speedometer3
Whiteboard: [sp3]

I tried out the patch on react-long and it made a noticeable difference on the number of samples in completeWork:
before 1283 samples: https://share.firefox.dev/3qZBDJr
after 968 samples: https://share.firefox.dev/45GwQM7

On the specific line null != workInProgress.stateNode where the IsNullOrLikeUndefinedAndBranchV comes frome we go from 577 samples down to 323 samples.

I also looked into what WebKit and V8 do.

WebKit has a watchpoint that fires if a masqueradesAsUndefined object is ever created. If the watchpoint hasn't fired then they just generate the simple code that you'd expect: https://searchfox.org/wubkat/rev/2347cf165a5a540f8da145a38150c0a3f3f8781e/Source/JavaScriptCore/dfg/DFGSpeculativeJIT.cpp#8217

If the watchpoint has fired they look at a type info flag which lives in the Structure (Shape) https://searchfox.org/wubkat/rev/2347cf165a5a540f8da145a38150c0a3f3f8781e/Source/JavaScriptCore/runtime/JSTypeInfo.h#40

Profiling JSC I see 724 samples in completeWork (pre watchpoint) vs 1004 samples if I manually trigger the watchpoint with var IsHTMLDDA = $262.IsHTMLDDA; (192 sample vs 5 samples on the null != workInProgress.stateNode)

In V8 I see 122 samples on that line it looks like it just uses a flag on the shape like JSC and doesn't have any watchpoint/fuse system to avoid the flag check.

Flags: needinfo?(mstange.moz)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: