Closed Bug 576687 Opened 14 years ago Closed 13 years ago

JM: Slower performance than tm and competition on "original" JS A* benchmark

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: bzbarsky, Unassigned)

References

()

Details

See url in url field.  Relevant numbers (moo branch has fix for bug 576398):

m-c:  312ms
v8:   158ms
moo: 3265ms
Ran it a few times, so these are aggregate Shark estimates:
 * 664ms in JS_GetReservedSlot from CallPropertyOp
 * 569ms in InlineGetProp (PIC failure?)
 * 528ms in stubs::Call
 * 516ms in PropertyCache::fill
 * 505ms in stubs::CallProp
... blah blah blah

This basically means we're compiling everything in the benchmark, but our scope chains and calls suck. No easy fix.

If we can't wait for the reboot of scope chains, maybe we can coerce the PICs to understand call objects better.
Hmm.  Call prop performance is critical on dromaeo, so we likely need to do _something_ here.  :(
We shouldn't be using JS_GetReservedSlot if we can load more directly. See how much that wins.

We need to own our PIC hit cases. That (PIC failure?) needs investigation.

Call objects can have compile-time shapes once bug 558451 is done, but even now we get cache hits off them in the interpreter. If these aren't Jaeger-PICed they ought to be.

/be
In particular, TM has a fast-path for Call objects where we convert prop access into direct dslots loads on trace (after some guards).

We should be able to do the same in JM after similar guards (but not sure how easy that is to fit into the JM arch).  Given the CallPropertyOp above it sounds like we're doing the same slow path TM used to have with js_GetPropertyHelper or the like at the moment.  See the patches in bug 530255 (for gets) and bug 532477 (for sets) for the optimizations we had to make to Call prop access in TM to get it to stop gating dromaeo.
OK, this looks like the relevant stack for call prop sets in a simple testcase:

#5  0x0052b402 in js_SetProperty (cx=0x14af600, obj=0x169d48c0, id={bits = 364177984}, vp=0x152d4160) at /Users/bzbarsky/mozilla/moo-branch/mozilla/js/src/jsobj.cpp:5207
#6  0x0047e3bf in JSObject::setProperty (this=0x169d48c0, cx=0x14af600, id={bits = 364177984}, vp=0x152d4160) at jsobj.h:670
#7  0x0067ff33 in js::mjit::ic::SetProp (f=@0xbfffbb90, index=4) at /Users/bzbarsky/mozilla/moo-branch/mozilla/js/src/methodjit/PolyIC.cpp:967

and for gets:

#3  0x0052dfe7 in js_NativeGet (cx=0x14af600, obj=0x169d48c0, pobj=0x169d48c0, sprop=0x1665770, getHow=0, vp=0xbfffbb18) at /Users/bzbarsky/mozilla/moo-branch/mozilla/js/src/jsobj.cpp:4730
#4  0x006556a5 in NameOp (f=@0xbfffbb90, obj=0x169d48c0, callname=false) at /Users/bzbarsky/mozilla/moo-branch/mozilla/js/src/methodjit/StubCalls.cpp:350
#5  0x00655bf7 in js::mjit::stubs::Name (f=@0xbfffbb90) at /Users/bzbarsky/mozilla/moo-branch/mozilla/js/src/methodjit/StubCalls.cpp:427
Note that pic fails here because we have getters and setters, I would think...
Shell mini-testcase for call prop access:

 (function() {
    var ret;
    var str = "abc";
    dump('aaa');
    var f = function() {
      for (var j = 0; j < 5000; ++j) {
        ret = str.charAt(0)
        ret = str.charAt(str.length - 1)
      }
    }
    var start = new Date();
    var i = 0;
    while ((new Date()) - start < 1000) {
      ++i;
      f();
    }
    print((i / (new Date() - start)) * 1000);
  })()
Thanks Boris, splitting off into bug 576733.
After fixing bug 576733, comment #7's test case still has problems. According to Shark:

  22%: stubs::SlowCall
  22%: stubs::BindName
  17%: str_charAt
   4%: str->chars()

Everything related to calls is bug 572275 - maybe we should have some sort of MIC there, for fast natives.

BINDNAME - I think we could make this a PIC too.
Depends on: JaegerCalls
BINDNAME uses the property cache, which is an interpreter-oriented PIC, so yeah.

You shouldn't have to check every hop in the scope chain, thanks to shape regen and the way we guard. Just the direct and target objects. Dmandelin knows the details.

/be
> After fixing bug 576733, comment #7's test case still has problems.

For what it's worth, with that fix I see 2820ms instead of 3265ms.  So this didn't seem to help as much as I'd thought it would based on comment 1...
(In reply to comment #11)

BINDNAME PIC landed. Will re-shark tomorrow and also see where our PICs are failing.
With the fix for bug 576926, we're at 1300ms here with moo tip.  Definite progress!
Depends on: 576926
Tested this with a tracemonkey nightly build from today. Numbers look much better now:

Chrome: 140 ms
TM: 180 ms
JM: 315 ms
JM+TM: 315 ms
Safari: 348 ms
seems like we want to get back on trace here?
Yeah, it does.  Do we need a separate bug on that?
Depends on: 580468
Peformance improved a lot over the last few days (maybe Brian's call patches in bug 587707?)

TM: 180 -> 175 ms 
JM & JM+TM: 315 -> 270 ms
On my system, JM+TI gets 167-172ms vs. 177-199ms (it bounces around a lot) for Chrome at the URL above. Safe to call this WFM?
Yea, indeed.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → WORKSFORME
We should add a regression test here, possibly...
Flags: in-testsuite?
You need to log in before you can comment on or make changes to this bug.