Open Bug 1805280 Opened 2 years ago Updated 2 years ago

High cache miss count for subtest React-TodoMVC in GrandPrix

Categories

(Core :: JavaScript Engine, task, P3)

task

People

(Reporter: denispal, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [sp3-react-todomvc])

The React-TodoMVC subtest, located at https://grandprixbench.netlify.app/?suites=React-TodoMVC, has a noticeably lower IPC in Firefox than in Chromium (1.15 vs. 1.41 instructions per cycle in the runs below). This seems largely related to cache misses, especially in the icache.

Chromium:

Performance counter stats for process id '1329513':

   114,250,901,598      cycles                                                             
   160,562,154,374      instructions                     #    1.41  insn per cycle         
    32,684,497,884      branch-instructions                                                
       811,569,390      cache-misses                                                       
     2,500,193,250      icache_64b.iftag_miss                                              

      42.170563185 seconds time elapsed

Firefox:

 Performance counter stats for process id '1330136':

   128,673,347,788      cycles                                                             
   148,228,288,424      instructions                     #    1.15  insn per cycle         
    28,009,990,401      branch-instructions                                                
     1,246,536,046      cache-misses                                                       
     4,082,621,693      icache_64b.iftag_miss                                              

      42.165150893 seconds time elapsed
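
To make the two runs easier to compare, here is a minimal sketch that recomputes IPC from the counters above and normalizes the miss counts per thousand instructions (MPKI). The numbers are copied from the two `perf stat` outputs; the function names are just for illustration. MPKI is not something perf printed here, and both runs are whole-process counts over about 42 seconds, so treat the result as a rough comparison only.

```python
# Counter values copied from the two `perf stat` runs above.
COUNTERS = {
    "Chromium": {
        "cycles": 114_250_901_598,
        "instructions": 160_562_154_374,
        "cache-misses": 811_569_390,
        "icache_64b.iftag_miss": 2_500_193_250,
    },
    "Firefox": {
        "cycles": 128_673_347_788,
        "instructions": 148_228_288_424,
        "cache-misses": 1_246_536_046,
        "icache_64b.iftag_miss": 4_082_621_693,
    },
}


def ipc(counters):
    """Instructions retired per cycle."""
    return counters["instructions"] / counters["cycles"]


def mpki(counters, event):
    """Occurrences of `event` per thousand retired instructions."""
    return counters[event] * 1000 / counters["instructions"]


def summarize(all_counters):
    """Return IPC and MPKI figures for each browser, rounded to 2 places."""
    return {
        name: {
            "ipc": round(ipc(c), 2),
            "cache_mpki": round(mpki(c, "cache-misses"), 2),
            "icache_mpki": round(mpki(c, "icache_64b.iftag_miss"), 2),
        }
        for name, c in all_counters.items()
    }


if __name__ == "__main__":
    for name, row in summarize(COUNTERS).items():
        print(name, row)
```

With these inputs, Firefox comes out at roughly 27.5 icache tag misses per thousand instructions versus about 15.6 for Chromium, so the gap remains after accounting for Firefox retiring fewer instructions.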

Here is a surface profile of the cache-misses for this test. Most of them seem to be coming from the GC and jumping into Trampolines.

Samples: 64K of event 'cache-misses', Event count (approx.): 762002296
  Overhead  Command          Shared Object            Symbol
+    4.56%  Isolated Web Co  libxul.so                [.] js::GCMarker::doMarking<0u>
+    2.17%  Isolated Web Co  jitted-1331115-1.so      [.] Trampolines
+    1.49%  Isolated Web Co  libxul.so                [.] js::jit::InitRestParameter
+    1.37%  Isolated Web Co  [unknown]                [k] 0xffffffff99149667
+    1.30%  Isolated Web Co  libxul.so                [.] js::TraceManuallyBarrieredGCCellPtr
+    1.25%  Isolated Web Co  libxul.so                [.] js::AtomizeString
+    1.24%  Isolated Web Co  libxul.so                [.] nsPurpleBuffer::VisitEntries<SnowWhiteKiller>
+    1.12%  Isolated Web Co  libxul.so                [.] js::jit::GetNativeDataPropertyByValuePure
+    1.00%  Isolated Web Co  libxul.so                [.] js::TenuringTracer::traceObject
+    0.93%  Isolated Web Co  libxul.so                [.] js::NativeObject::addProperty
+    0.87%  Isolated Web Co  libxul.so                [.] js::jit::SetElementMegamorphic
+    0.78%  Isolated Web Co  libxul.so                [.] js::jit::ICEntry::trace
+    0.62%  Isolated Web Co  libxul.so                [.] js::GenericTracerImpl<js::gc::MarkingTracerT<0u> >::onJitCodeEdge
+    0.61%  Isolated Web Co  libc.so.6                [.] __strncpy_sse2_unaligned
+    0.61%  Isolated Web Co  libc.so.6                [.] __stpncpy_sse2_unaligned
+    0.56%  Isolated Web Co  jitted-1331115-782.so    [.] RegExp
+    0.56%  Isolated Web Co  libxul.so                [.] js::jit::GetNativeDataPropertyPure
+    0.53%  Isolated Web Co  firefox                  [.] Allocator<MozJemallocBase>::free

Same thing but for icache misses:

Samples: 20K of event 'icache_64b.iftag_miss', Event count (approx.): 4048060720
  Overhead  Command          Shared Object            Symbol
+    1.61%  Isolated Web Co  libxul.so                [.] js::jit::GetNativeDataPropertyByValuePure
+    1.22%  Isolated Web Co  libxul.so                [.] js::jit::SetElementMegamorphic
+    1.14%  Isolated Web Co  jitted-1331115-4.so      [.] BaselineInterpreter
+    1.04%  Isolated Web Co  libxul.so                [.] js::AtomizeString
+    1.02%  Isolated Web Co  jitted-1331115-1.so      [.] Trampolines
+    0.78%  Isolated Web Co  libxul.so                [.] js::GetIterator
+    0.72%  Isolated Web Co  libc.so.6                [.] __strncpy_sse2_unaligned
+    0.63%  Isolated Web Co  libxul.so                [.] js::NativeObject::addProperty
+    0.55%  Isolated Web Co  libxul.so                [.] Interpret
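
To see how much of a profile like this lands in JIT code versus C++, here is a rough sketch that sums the overhead column per shared object from pasted `perf report` text. The regular expression and the `jitted-*` bucket name are my own assumptions based on the row layout shown above. It only sees the rows that were pasted, so the totals cover a small part of all samples. Running `perf report --sort dso` on the full data would be the more reliable way to get this.

```python
import re
from collections import defaultdict

# Assumed row layout, e.g.:
# "+    1.61%  Isolated Web Co  libxul.so      [.] js::AtomizeString"
ROW = re.compile(
    r"^\+?\s*(?P<pct>\d+\.\d+)%\s+"   # overhead percentage
    r"(?P<comm>.+?)\s{2,}"            # command (may contain single spaces)
    r"(?P<dso>\S+)\s+"                # shared object
    r"\[(?P<kind>.)\]\s+"             # [.] user space, [k] kernel
    r"(?P<sym>.+?)\s*$"               # symbol name
)


def overhead_by_dso(report_text):
    """Sum the overhead percentages of the listed rows per shared object.

    All perf jitdump objects (jitted-<pid>-<n>.so) go into one bucket.
    Lines that do not look like report rows are skipped.
    """
    totals = defaultdict(float)
    for line in report_text.splitlines():
        match = ROW.match(line)
        if match is None:
            continue
        dso = match["dso"]
        bucket = "jitted-*" if dso.startswith("jitted-") else dso
        totals[bucket] += float(match["pct"])
    return dict(totals)


# Rows copied from the Firefox icache_64b.iftag_miss profile above.
SAMPLE = """\
+    1.61%  Isolated Web Co  libxul.so                [.] js::jit::GetNativeDataPropertyByValuePure
+    1.22%  Isolated Web Co  libxul.so                [.] js::jit::SetElementMegamorphic
+    1.14%  Isolated Web Co  jitted-1331115-4.so      [.] BaselineInterpreter
+    1.04%  Isolated Web Co  libxul.so                [.] js::AtomizeString
+    1.02%  Isolated Web Co  jitted-1331115-1.so      [.] Trampolines
+    0.78%  Isolated Web Co  libxul.so                [.] js::GetIterator
+    0.72%  Isolated Web Co  libc.so.6                [.] __strncpy_sse2_unaligned
+    0.63%  Isolated Web Co  libxul.so                [.] js::NativeObject::addProperty
+    0.55%  Isolated Web Co  libxul.so                [.] Interpret
"""

if __name__ == "__main__":
    for dso, pct in sorted(overhead_by_dso(SAMPLE).items(), key=lambda kv: -kv[1]):
        print(f"{pct:5.2f}%  {dso}")
```

For the rows listed, libxul.so accounts for more of the icache misses than the JIT code does, which fits the observation that the hot entries are mostly C++ helpers.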

For comparison, Chromium spends very little time outside of JIT code for this benchmark:

Samples: 83K of event 'cycles', Event count (approx.): 101000369374, Thread: chrome
  Overhead  Comman  Shared Object           Symbol
+    3.59%  chrome  chrome                  [.] Builtins_KeyedStoreIC_Megamorphic
+    2.79%  chrome  chrome                  [.] Builtins_LoadIC
+    2.55%  chrome  chrome                  [.] Builtins_KeyedLoadIC_Megamorphic
+    2.23%  chrome  chrome                  [.] v8::internal::StringTable::TryStringToIndexOrLookupExisting
+    2.17%  chrome  chrome                  [.] Builtins_ObjectPrototypeHasOwnProperty
+    1.85%  chrome  chrome                  [.] std::Cr::__introsort<std::Cr::_ClassicAlgPolicy, v8::internal::EnumIndexComparator<v8::internal::NameDictionary>&, v8::internal::AtomicSlot>
+    1.62%  chrome  chrome                  [.] Builtins_LoadIC_Megamorphic
+    1.04%  chrome  chrome                  [.] v8::internal::FastKeyAccumulator::GetKeys
+    0.98%  chrome  chrome                  [.] blink::HTMLCollection::length
+    0.75%  chrome  chrome                  [.] allocator_shim::internal::PartitionMalloc
+    0.75%  chrome  chrome                  [.] Builtins_ForInFilter
+    0.66%  chrome  chrome                  [.] Builtins_StoreIC
+    0.64%  chrome  chrome                  [.] Builtins_CallFunction_ReceiverIsNotNullOrUndefined
+    0.61%  chrome  chrome                  [.] Builtins_BaselineOutOfLinePrologue
+    0.52%  chrome  chrome                  [.] Builtins_StrictEqual_WithFeedback
     0.49%  chrome  chrome                  [.] Builtins_StrictEqual_Baseline
     0.47%  chrome  chrome                  [.] Builtins_ToBooleanForBaselineJump
     0.45%  chrome  chrome                  [.] v8::internal::BaseNameDictionary<v8::internal::NameDictionary, v8::internal::NameDictionaryShape>::Add
     0.45%  chrome  chrome                  [.] Builtins_KeyedStoreIC
     0.43%  chrome  chrome                  [.] v8::internal::Runtime_RegExpExecMultiple
     0.39%  chrome  chrome                  [.] allocator_shim::internal::PartitionFree
     0.37%  chrome  chrome                  [.] Builtins_SetDataProperties
     0.36%  chrome  chrome                  [.] Builtins_CallFunction_ReceiverIsNullOrUndefined
     0.36%  chrome  chrome                  [.] v8::internal::LookupIterator::Start<false>
     0.32%  chrome  chrome                  [.] v8::internal::RegExpGlobalCache::FetchNext
     0.31%  chrome  chrome                  [.] v8::internal::RegExpGlobalCache::RegExpGlobalCache
     0.29%  chrome  chrome                  [.] v8::internal::String::WriteToFlat<unsigned char>
     0.28%  chrome  chrome                  [.] Builtins_RecordWriteSaveFP
     0.26%  chrome  chrome                  [.] Builtins_BaselineLeaveFrame
     0.24%  chrome  chrome                  [.] Builtins_KeyedLoadIC
     0.24%  chrome  chrome                  [.] Builtins_LoadICTrampoline_Megamorphic
     0.22%  chrome  chrome                  [.] Builtins_Call_ReceiverIsNotNullOrUndefined_Baseline_Compact
     0.22%  chrome  chrome                  [.] Builtins_Call_ReceiverIsNotNullOrUndefined
     0.21%  chrome  chrome                  [.] Builtins_ArrayFilter
     0.20%  chrome  chrome                  [.] Builtins_StringAdd_CheckNone
     0.20%  chrome  chrome                  [.] Builtins_RegExpReplace

It seems like icache_64b.iftag_miss > cache-misses. Is cache-misses L2 misses?

Is it possible to get the cache miss profiles from Chrome as well?

InitRestParameter showing up so high on the dcache miss list is interesting. I don't think we'd previously identified it as a hotspot.

(In reply to Jeff Muizelaar [:jrmuizel] from comment #2)

It seems like icache_64b.iftag_miss > cache-misses. Is cache-misses L2 misses?

It isn't perfectly clear what exactly "cache-misses" entails, but the perf_event_open man page describes PERF_COUNT_HW_CACHE_MISSES as: "Cache misses. Usually this indicates Last Level Cache misses; this is intended to be used in conjunction with the PERF_COUNT_HW_CACHE_REFERENCES event to calculate cache miss rates." It seems like it must also include all prefetch requests, since it's much higher than the LLC-*-misses counters.

I added some additional cache counters for further comparison below, but I think the icache misses may be the bigger issue and maybe the overall number of references we make. I wonder if this is because we are jumping into C++ to do GetNativeDataPropertyByValuePure, SetElementMegamorphic, addProperty, etc, while Chrome seems to do these things with an IC for the most part.

Chromium:

 Performance counter stats for process id '1345316':

     6,332,227,906      cache-references                                                     (36.31%)
       436,142,674      cache-misses                     #    6.888 % of all cache refs      (36.63%)
     1,907,876,036      L1-dcache-load-misses            #    4.38% of all L1-dcache accesses  (36.63%)
    43,511,411,953      L1-dcache-loads                                                      (36.79%)
    17,005,365,797      L1-dcache-stores                                                     (36.41%)
     2,516,458,663      L1-icache-load-misses                                                (36.23%)
        18,363,927      LLC-load-misses                  #    3.09% of all LL-cache accesses  (36.51%)
       594,196,920      LLC-loads                                                            (36.44%)
        40,397,238      LLC-store-misses                                                     (17.91%)
       135,787,475      LLC-stores                                                           (18.10%)
     2,471,436,366      icache_64b.iftag_miss                                                (26.94%)

      31.129881019 seconds time elapsed

Firefox:

 Performance counter stats for process id '1344844':

     8,598,874,325      cache-references                                                     (35.84%)
       713,126,462      cache-misses                     #    8.293 % of all cache refs      (36.08%)
     2,230,684,392      L1-dcache-load-misses            #    5.73% of all L1-dcache accesses  (36.34%)
    38,948,140,714      L1-dcache-loads                                                      (36.74%)
    18,422,283,077      L1-dcache-stores                                                     (36.70%)
     3,588,594,526      L1-icache-load-misses                                                (36.85%)
        38,634,359      LLC-load-misses                  #    5.00% of all LL-cache accesses  (36.83%)
       772,720,459      LLC-loads                                                            (36.66%)
        59,490,735      LLC-store-misses                                                     (17.87%)
       238,620,687      LLC-stores                                                           (17.80%)
     3,598,953,522      icache_64b.iftag_miss                                                (26.60%)

      39.157739853 seconds time elapsed
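
As a sanity check on how these counters relate, here is a small sketch that recomputes the miss rates perf prints next to the counters (cache-misses over cache-references, and LLC-load-misses over LLC-loads) from the raw values above. The values are copied from the two outputs; the helper name is made up. Note that the trailing percentages in parentheses, e.g. (36.31%), are the share of run time each event was actually being counted because of multiplexing, so the raw counts are scaled estimates and the ratios are approximate.

```python
# Counter values copied from the two `perf stat` outputs above.
CACHE_COUNTERS = {
    "Chromium": {
        "cache-references": 6_332_227_906,
        "cache-misses": 436_142_674,
        "LLC-loads": 594_196_920,
        "LLC-load-misses": 18_363_927,
    },
    "Firefox": {
        "cache-references": 8_598_874_325,
        "cache-misses": 713_126_462,
        "LLC-loads": 772_720_459,
        "LLC-load-misses": 38_634_359,
    },
}


def miss_rate_percent(misses, references):
    """Miss rate as a percentage of references."""
    return 100.0 * misses / references


def cache_summary(all_counters):
    """Per-browser cache and LLC-load miss rates, in percent."""
    summary = {}
    for name, c in all_counters.items():
        summary[name] = {
            "cache_miss_rate": miss_rate_percent(
                c["cache-misses"], c["cache-references"]
            ),
            "llc_load_miss_rate": miss_rate_percent(
                c["LLC-load-misses"], c["LLC-loads"]
            ),
        }
    return summary


if __name__ == "__main__":
    for name, row in cache_summary(CACHE_COUNTERS).items():
        print(
            f"{name}: cache miss rate {row['cache_miss_rate']:.3f}%, "
            f"LLC load miss rate {row['llc_load_miss_rate']:.2f}%"
        )
```

Besides the higher miss rates, Firefox also makes about 36% more cache references than Chromium in these runs (8.60 billion vs. 6.33 billion), which is the "overall number of references" concern mentioned above.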

Is it possible to get the cache miss profiles from Chrome as well?

Samples: 42K of event 'cache-misses', Event count (approx.): 450303745, Thread: chrome
  Overhead  Comman  Shared Object           Symbol
+    2.12%  chrome  chrome                  [.] Builtins_KeyedStoreIC_Megamorphic
+    1.91%  chrome  chrome                  [.] Builtins_LoadIC_Megamorphic
+    1.35%  chrome  chrome                  [.] v8::internal::RegExpGlobalCache::RegExpGlobalCache
+    1.17%  chrome  chrome                  [.] Builtins_ObjectPrototypeHasOwnProperty
+    1.14%  chrome  chrome                  [.] blink::HTMLCollection::length
+    1.03%  chrome  chrome                  [.] allocator_shim::internal::PartitionMalloc
+    1.01%  chrome  chrome                  [.] v8::internal::Runtime_RegExpExecMultiple
+    0.86%  chrome  chrome                  [.] v8::internal::StringTable::TryStringToIndexOrLookupExisting
+    0.85%  chrome  chrome                  [.] Builtins_LoadIC
+    0.62%  chrome  chrome                  [.] v8::internal::Scavenger::ScavengePage
+    0.57%  chrome  chrome                  [.] allocator_shim::internal::PartitionFree
+    0.54%  chrome  chrome                  [.] v8::internal::TracedHandles::ComputeWeaknessForYoungObjects
+    0.54%  chrome  chrome                  [.] Builtins_KeyedLoadIC_Megamorphic
     0.50%  chrome  chrome                  [.] Builtins_RegExpReplace

and

Samples: 12K of event 'icache_64b.iftag_miss', Event count (approx.): 2504437566, Thread: chrome
  Overhead  Comman  Shared Object           Symbol
+    3.45%  chrome  chrome                  [.] Builtins_LoadIC
+    1.96%  chrome  chrome                  [.] Builtins_KeyedLoadIC_Megamorphic
+    1.63%  chrome  chrome                  [.] Builtins_CallFunction_ReceiverIsNotNullOrUndefined
+    1.31%  chrome  chrome                  [.] v8::internal::StringTable::TryStringToIndexOrLookupExisting
+    1.17%  chrome  chrome                  [.] Builtins_KeyedStoreIC_Megamorphic
+    1.10%  chrome  chrome                  [.] Builtins_StoreIC
+    0.92%  chrome  chrome                  [.] blink::Element::RecalcStyle
+    0.86%  chrome  chrome                  [.] blink::StyleResolver::ApplyBaseStyleNoCache
+    0.84%  chrome  chrome                  [.] Builtins_LoadIC_Megamorphic
+    0.72%  chrome  chrome                  [.] v8::internal::Map::TransitionToDataProperty
+    0.70%  chrome  chrome                  [.] v8::internal::Builtin_HandleApiCall
+    0.67%  chrome  chrome                  [.] Builtins_ObjectPrototypeHasOwnProperty
+    0.65%  chrome  chrome                  [.] Builtins_BaselineOutOfLinePrologue
+    0.62%  chrome  chrome                  [.] v8::internal::LookupIterator::Start<false>
+    0.61%  chrome  chrome                  [.] Builtins_CallFunction_ReceiverIsNullOrUndefined
+    0.57%  chrome  chrome                  [.] Builtins_BaselineLeaveFrame
+    0.56%  chrome  chrome                  [.] blink::EventDispatcher::Dispatch
+    0.54%  chrome  chrome                  [.] blink::StyleResolver::ResolveStyle
+    0.53%  chrome  chrome                  [.] Builtins_StrictEqual_Baseline
+    0.53%  chrome  chrome                  [.] allocator_shim::internal::PartitionMalloc
+    0.50%  chrome  chrome                  [.] blink::EventTarget::FireEventListeners

Do we know how much GC V8 is doing? GC will typically have a lot of cache misses.

(You said it spends little time outside the JIT, so presumably not as much.)

(In reply to Jon Coppeard (:jonco) from comment #5)

Do we know how much GC V8 is doing? GC will typically have a lot of cache misses.

(You said it spends little time outside the JIT, so presumably not as much.)

I actually can't find Chrome spending any significant time in the GC. I do see some call stacks for the GC in the perf script output, but not in the report. That being said, I think our GC activity is also quite low when looking at overall cycles, so the bulk of the slowdown is probably coming from icache misses, whose profile seems to line up more closely with the cycles report.

(In reply to Denis Palmeiro [:denispal] from comment #4)

I added some additional cache counters for further comparison below, but I think the icache misses may be the bigger issue and maybe the overall number of references we make. I wonder if this is because we are jumping into C++ to do GetNativeDataPropertyByValuePure, SetElementMegamorphic, addProperty, etc, while Chrome seems to do these things with an IC for the most part.

For the purpose of understanding icache behaviour, you probably need to dig in a bit more into which of these builtins are cloned vs. shared, and whether they are laid out in memory in interesting ways. Another consideration might be the types of calls being used (such as direct vs. indirect). Translating C++ code to equivalent MASM would not change icache behaviour, so I think we need a better theory about what would be structurally different (e.g. a certain hot path running less code; avoiding trampoline frames more often; cloning builtins to use short jumps; etc).

Another thing I notice about the data is that one of our hot icache frames is the BaselineInterpreter, while in Chrome there are many Builtins that appear to be pieces of their equivalent. I suspect that Chrome's profile simply appears better because the same work is divided up among fine-grained symbols.

[triage note] I will set this bug as a P3 task in the meantime.

My understanding is that this bug is still in the investigation stage, until we figure out the root cause of the icache misses. Looking at icache misses at the function level gives only a fuzzy description of the problem. A better report should probably look at the instruction level and map those instructions back to the logic they implement.

Once we have identified what is going on and can pinpoint the issue, we can file new bugs, as defects, to fix each one.

Severity: -- → N/A
Type: defect → task
Priority: -- → P3
Whiteboard: [sp3:react-todomvc]
Whiteboard: [sp3:react-todomvc] → [sp3-react-todomvc]