Closed Bug 1821183 Opened 2 years ago Closed 1 year ago

EmberJS-TodoMVC spends a lot of time in Ion-compiled code for Cache.prototype.get

Categories: Core :: JavaScript Engine, task, P2

Tracking: RESOLVED WONTFIX

People: (Reporter: mstange, Unassigned)

References: (Blocks 1 open bug)

Details

In the function Cache.prototype.get in the EmberJS-TodoMVC benchmark, we spend 1149 samples of "JS-only self time", 957 of which are in the Ion-generated code itself. V8 spends 556 samples of "JS-only self time" in the same function, 374 of which are in the TurboFan-generated code.

To reproduce, run index.js inside https://github.com/mozilla/Speedometer/tree/ab0e3b8406bbc0356244f8cd430900e9a346737b/resources/todomvc/architecture-examples/emberjs/dist in the JS shell.

In SpiderMonkey, the function's full name is Cache</Cache.prototype.get

Profiles (focused on Cache.prototype.get's "JS-only self time"):
Spidermonkey: https://share.firefox.dev/3ZxdRk5
V8: https://share.firefox.dev/3Zxfz4Z

I don't have a way to share the generated assembly code at the moment.

I took a look at the generated assembly for this. Nothing stands out in particular. The hot spot is the code generated for this.store.get(key): after inlining it's this.store.data[key], and in particular the MegamorphicLoadSlotByValue we generate for data[key] accounts for 87 of the 153 self samples in my profile.

    Cache.prototype.get = function get(obj) {
      var key = this.key === undefined ? obj : this.key(obj);
      var value = this.store.get(key);
      if (value === undefined) {
        this.misses++;
        value = this._set(key, this.func(obj));
      } else if (value === UNDEFINED) {
        this.hits++;
        value = undefined;
      } else {
        this.hits++;
        // nothing to translate
      }

      return value;
    };

MegamorphicGetPropertyByValue has been pretty aggressively optimized already.
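
For reference, here is a minimal JS-shell sketch of the same kind of access (hypothetical, not taken from the Ember sources; the object and key names are made up): a string-keyed read from an object with many properties, which should end up on the megamorphic by-value load path much like this.store.data[key] does.

    // Hypothetical micro-benchmark (not from the benchmark sources).
    // Build an object with many distinct properties so that a keyed read
    // can't be specialized to a single shape/property and stays megamorphic.
    var store = Object.create(null);
    var keys = [];
    for (var i = 0; i < 10000; i++) {
      var k = "key" + i;
      keys.push(k);
      store[k] = i;
    }

    var sum = 0;
    for (var iter = 0; iter < 1000; iter++) {
      for (var j = 0; j < keys.length; j++) {
        sum += store[keys[j]]; // value-keyed load, analogous to data[key]
      }
    }
    print(sum); // shell built-in; expected 49995000000

Profiling a loop like this in the shell should show roughly how much of the cost sits in the megamorphic lookup itself versus the surrounding Ion code.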

It's a little interesting that Markus's profile has so many more samples overall than mine. Mine ran for 1.1s with 13K samples; Markus's profile ran for 1.5s with 150K samples. That might just be differences in our perf setup, though.

Markus, can you confirm that this was an apples-to-apples comparison (e.g., we aren't missing places where V8 inlined this function)? Our codegen here seems reasonably good; I'm not sure how V8 could be twice as fast as us.

Thanks for taking a look!

> Markus, can you confirm that this was an apples-to-apples comparison (e.g., we aren't missing places where V8 inlined this function)? Our codegen here seems reasonably good; I'm not sure how V8 could be twice as fast as us.

I filed this bug before we had understood that the high sampling frequency introduces rather high stack-depth-dependent skew. Since our stacks are deeper than V8's, this is not as much of an apples-to-apples comparison as I initially assumed.

I will compare again once I have profiles with uniform overhead.

The new profiles tell a similar story to the old profiles.

SM: https://share.firefox.dev/3YEmUOW
V8: https://share.firefox.dev/3ZSe2X2

634 of the 1061 total JS-self-time samples are spent in Ion-generated code. V8 had a total JS-self-time of 566 samples for this function.

Something similar seems to happen in CachedTag.prototype.value (code is here): 5333 samples in SM vs 2322 in V8, with 3457 of the SM samples in Ion-generated code:

SM: https://share.firefox.dev/3larGpx
V8: https://share.firefox.dev/3ywd9Yl

> e.g., we aren't missing places where V8 inlined this function

This was running V8 with inlining turned off. I have also scoured the profile for places where V8 spends more time than we do, and haven't found any that seemed related to the work done in Cache.prototype.get.

> Markus's profile ran for 1.5s with 150K samples. That might just be differences in our perf setup, though.

We were able to set the sampling frequency to 100kHz by rebooting the machine and setting /proc/sys/kernel/perf_cpu_time_max_percent to zero.

Blocks: 1801194
Type: defect → task
Priority: -- → P2

EmberJS has been removed from sp3. Does the issue show up in other tests?

Flags: needinfo?(mstange.moz)

Unclear. This particular function is unique to this test, so this bug no longer blocks sp3.

No longer blocks: speedometer3
Flags: needinfo?(mstange.moz)
Whiteboard: [sp3]

I'm going to go ahead and close this, then. Realistically we're never going to get around to optimizing one particular function in a benchmark we don't care about, particularly since my initial investigation didn't turn up anything interesting. If there's an important issue here, it will show up elsewhere.

Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → WONTFIX
Whiteboard: [sp3]