Closed Bug 1786375 Opened 2 years ago Closed 2 years ago

WebKit megamorphic-load benchmark 1.6x slower in FIrefox vs Chrome

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: denispal, Unassigned)

References

(Blocks 2 open bugs)

Details

Denis Palmeiro [:denispal]

Reporter

Description

•

2 years ago

Firefox behaves quite poorly on the microbenchmark megamorphic-load.js from WebKit which can be found here https://raw.githubusercontent.com/WebKit/WebKit/main/JSTests/microbenchmarks/megamorphic-load.js, but I've changed it slightly to run a bit longer:

start = Date.now();
(function() {
    var array = [];
    for (var i = 0; i < 1000; ++i) {
        var o = {};
        o["i" + i] = i;
        o.f = 42;
        array.push(o);
    }

for (var j = 0 ; j < 100 ; j++) {
    for (var i = 0; i < 1000000; ++i) {
        var result = array[i % array.length].f;
        if (result != 42)
            throw "Error: bad result: " + result;
    }
}
})();
end = Date.now();
console.log(end-start);

Spidermonkey: 1320 ms
V8: 854 ms

The profile from perf looks like this which looks similar to a lot of other profiles from react, matrix, slack, reddit, etc:

Samples: 5K of event 'instructions', Event count (approx.): 15871908413
Overhead  Command    Shared Object         Symbol
+   67.21%  js         jitted-1355365-25.so  [.] Ion /home/denis/src/WebKit/JSTests/microbenchmarks/megamorphic-load.js:3:9
+   24.84%  js         js                    [.] js::jit::GetNativeDataPropertyPure
+    7.41%  js         jitted-1355365-24.so  [.] Ion /home/denis/src/WebKit/JSTests/microbenchmarks/megamorphic-load.js:3:9
+    0.06%  js         js                    [.] js::frontend::TokenStreamSpecific<mozilla::Utf8Unit, js::frontend::ParserAnyCharsAccess<js::frontend::GeneralParser<js::frontend::FullParseHandler, mozilla::Utf8Unit> > >::getTokenInternal
+    0.03%  js         jitted-1355365-22.so  [.] IC: MegamorphicNativeSlot
+    0.03%  js         js                    [.] js::frontend::GeneralParser<js::frontend::FullParseHandler, mozilla::Utf8Unit>::primaryExpr
+    0.02%  js         js                    [.] js::frontend::EmitterScope::lookupInCache
+    0.02%  js         jitted-1355365-18.so  [.] IC: BinaryArith.Int32.Mod

Comparing performance counters, it seems we execute almost 2x as many instructions as Chome:

Firefox:

 6,073,506,603      cycles
15,879,944,293      instructions              #    2.61  insn per cycle
 2,912,475,009      branch-instructions
       489,436      cache-misses
     1,829,848      icache_64b.iftag_miss

   1.341984818 seconds time elapsed

Chrome:

 3,749,745,343      cycles                                                      
 9,195,343,474      instructions              #    2.45  insn per cycle         
 2,141,285,555      branch-instructions                                         
       944,924      cache-misses                                                
     3,753,871      icache_64b.iftag_miss                                       

   0.812395097 seconds time elapsed

A large number of instructions are part of this sequence of code generated by Ion:

            │     ModI                                                                                                                                                                                                                        ▒
   28624373 │       mov    %rbx,%rax                                                                                                                                                                                                          ▒
            │       test   %ecx,%ecx                                                                                                                                                                                                          ▒
            │     ↓ je     337                                                                                                                                                                                                                ▒
  410882702 │       test   %ebx,%ebx                                                                                                                                                                                                          ▒
            │     ↓ js     198                                                                                                                                                                                                                ▒
    2401639 │       mov    %rcx,%rdx                                                                                                                                                                                                          ▒
   43494055 │       sub    $0x1,%edx                                                                                                                                                                                                          ▒
            │       test   %edx,%ecx                                                                                                                                                                                                          ▒
            │     ↓ jne    18f                                                                                                                                                                                                                ▒
            │       and    %ebx,%edx                                                                                                                                                                                                          ▒
            │     ↓ jmp    1af                                                                                                                                                                                                                ▒
  480812223 │18f:   xor    %edx,%edx                                                                                                                                                                                                          ▒
    2959710 │       idiv   %ecx                                                                                                                                                                                                               ▒
 1596561001 │     ↓ jmp    1af                                                                                                                                                                                                                ▒
            │198:   cmp    $0x80000000,%ebx                                                                                                                                                                                                   ▒
            │     ↓ je     345                                                                                                                                                                                                                ▒
            │1a4:   cltd                                                                                                                                                                                                                      ▒
            │       idiv   %ecx                                                                                                                                                                                                               ▒
            │       test   %edx,%edx                                                                                                                                                                                                          ▒
            │     ↓ je     33e                                                                                                                                                                                                                ▒
            │     BoundsCheck                                                                                                                                                                                                                 ▒
            │1af:   cmp    %edx,-0x28(%rbp)                                                                                                                                                                                                   ▒
            │     ↓ jbe    353                                                                                                                                                                                                                ▒
            │     SpectreMaskIndex                                                                                                                                                                                                            ▒
   43915670 │       xor    %ecx,%ecx                                                                                                                                                                                                          ▒
    2896413 │       cmp    -0x28(%rbp),%edx                                                                                                                                                                                                   ▒
  281633276 │       cmovb  %edx,%ecx                                                                                                                                                                                                          ▒
            │     MoveGroup                                                                                                                                                                                                                   ▒
   17590813 │       mov    -0x38(%rbp),%rdx                                                                                                                                                                                                   ▒
            │     LoadElementAndUnbox                                                                                                                                                                                                         ▒
    8754710 │       movabs $0xfffe000000000000,%r11                                                                                                                                                                                           ▒
            │       xor    (%rdx,%rcx,8),%r11                                                                                                                                                                                                 ◆
  761616959 │       mov    %r11,%rsi                                                                                                                                                                                                          ▒
            │       shr    $0x2f,%r11                                                                                                                                                                                                         ▒
  114574485 │     ↓ jne    35a                                                                                                                                                                                                                ▒
            │     MegamorphicLoadSlot                                                                                                                                                                                                         ▒
   95856216 │       mov    (%rsi),%rax                                                                                                                                                                                                        ▒
  997275975 │       mov    (%rax),%rax                                                                                                                                                                                                        ▒
 1164739541 │       mov    (%rax),%rax                                                                                                                                                                                                        ▒
 1040131403 │       testl  $0x40000,0x8(%rax)                                                                                                                                                                                                 ▒
 1236461039 │     ↓ jne    361                                                                                                                                                                                                                ▒
  130014040 │       movabs $0xfff9800000000000,%r11                                                                                                                                                                                           ▒
            │       push   %r11                                                                                                                                                                                                               ▒
    5805394 │       mov    %rsp,%rbx                                                                                                                                                                                                          ▒
  152958533 │       movabs $0x7f62b3b25500,%rax                                                                                                                                                                                               ▒
            │       movabs $0xa5cf504d00,%rdi                                                                                                                                                                                                 ▒
            │       sub    $0x8,%rsp                                                                                                                                                                                                          ▒
            │       mov    %rbx,%rcx                                                                                                                                                                                                          ▒
  129553533 │       mov    %rdi,%rdx                                                                                                                                                                                                          ▒
            │       mov    %rax,%rdi                                                                                                                                                                                                          ▒
    8711696 │     → call   Ion /home/denis/src/WebKit/JSTests/microbenchmarks/megamorphic-load.js:3:9                                                                                                                                         ▒
            │       add    $0x8,%rsp                                                                                                                                                                                                          ▒
  352224081 │       pop    %rcx                                                                                                                                                                                                               ▒
   17398055 │       test   $0xff,%al                                                                                                                                                                                                          ▒
            │     ↓ je     361

Denis Palmeiro [:denispal]

Reporter

Updated

•

2 years ago

Component: JavaScript Engine → JavaScript Engine: JIT

Jan de Mooij [:jandem]

Comment 1

•

2 years ago

Copying my Matrix comments, for posterity:

The fixed-size cache doesn't handle 1000 different shapes very well, resulting in a cache hit rate that's less than 1%. On most actual websites, the hit rate is around 80-95% last time I measured.
All objects have just two own properties and we're looking up one of them. This wouldn't benefit a lot from the cache even in the best case. The cache is most effective for missing properties, objects with many properties, or properties living on the proto chain.
We could experiment with a different table/hashing strategy (secondary table, different table size?).
Making the JSObject/Shape/BaseShape header words non-atomic might help a bit too.
The i % array.length is pretty slow (1024 entries is much faster than 1000). We should check if V8 is also using an idiv instruction for this.

Nicolas B. Pierron [:nbp]

Comment 2

•

2 years ago

Looking only at the generated assembly:

The major issue comes from the pointer chasing done within js::jit::MacroAssembler::branchIfNonNativeObj , and js::jit::MacroAssembler::loadObjClassUnsafe. Unfortunately there does not seems to be any way to improve by spreading the instructions as the data dependency is tight.
One way to improve this code would be to see that the Array object does not escape despite being manipulated with MArrayPush and assume that the length is a constant of 1000 after the first loop. This way we could replace the "division by a variable" by a "division by a constant". Removing the zero-check, and the sign check.
Another aspect is the register allocation chosen is not optimal. Ideally we should not have any MoveGroup being resolved for making this unconditional callWithABI. The Lowering phase should be updated to select a specific set of registers.
The register allocator only seems to restore the array in rdx using a MoveGroup instruction. Most likely due to the register flush required by the MegamorphicLoadSlot. However, loading it this late, probably cause misses in the loading of the value out of the array. The value could have been loaded memory prior the SpecterMaskIndex, using a different temporary register.
Why are BoundCheck / SpecterMaskIndex even necessary, when we are doing a modulo with the size of the array?

Nicolas B. Pierron [:nbp]

Updated

•

2 years ago

Blocks: sm-opt-jits, sm-defects-benchmark

Priority: -- → P2

Doug Thayer [:dthayer] (he/him)

Comment 3

•

2 years ago

We're basically on par with Chrome for this microbenchmark now. We can beat Chrome handily by increasing our megamorphic cache size, but I don't think that's representative of real life. Closing this out, but feel free to reopen.

Status: NEW → RESOLVED

Closed: 2 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.

Bugzilla

WebKit megamorphic-load benchmark 1.6x slower in FIrefox vs Chrome

Categories

(Core :: JavaScript Engine: JIT, defect, P2)

Tracking

()

People

(Reporter: denispal, Unassigned)

References

(Blocks 2 open bugs)

Details

Crash Data

Security

(public)

User Story

Description

Updated

Comment 1

Comment 2

Updated

Comment 3