Closed
Bug 602129
Opened 14 years ago
Closed 14 years ago
JM: make f.call(...) fast
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
People
(Reporter: bzbarsky, Assigned: luke)
References
Details
Attachments
(3 files, 3 obsolete files)
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
dvander
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
dvander
:
review+
|
Details | Diff | Splinter Review |
When running the attached testcase I get these numbers:
bin% ./js ~/bar.js
801: control (no call)
1771: call([object global])
1632: call(null)
1633: call(undefined)
bin% ./js -m ~/bar.js
130: control (no call)
904: call([object global])
934: call(null)
946: call(undefined)
bin% ./js -j ~/bar.js
89: control (no call)
41: call([object global])
41: call(null)
1619: call(undefined)
bin% ./js -m -j ~/bar.js
97: control (no call)
50: call([object global])
47: call(null)
921: call(undefined)
So it looks like call() in JM is about 2x faster than in interp, but still a good bit slower than an actual function call (slower than interp, comparatively), and a lot slower than call when it gets traced (my local tree is slightly hacked so null thisp will trace for call).
For comparison, here is v8 opt shell:
v8-read-only% ./shell ~/bar.js
85: control (no call)
151: call([object global])
147: call(null)
160: call(undefined)so we're pretty close to them in JM for the straight function call, but a lot slower for call(
Comment 1•14 years ago
|
||
We can special-case this in the native call MIC, afaik. Might also help v8-deltablue a bit, bhackett found out that it calls Function.prototype.call 125k times (bug 595884 comment 13)
Assignee | ||
Comment 2•14 years ago
|
||
(In reply to comment #1)
Shark shows that that (with shell -m) we spend 10.9ms in js::Invoke 10.4ms in js_fun_call, out of 695ms in v8-deltablue.
Incidentally, David and I were talking about this yesterday, and the neat thing about expressions like foo.call(this, arg1, arg2) is that, at the call site of 'call', the stack is:
[Function.prototype.call, foo, this, arg1, arg2]
and the desired call stack to call 'foo' is:
[foo, this, arg1, arg2]
So we should be able to get a f.call() to cost only a tiny bit more than f().
Reporter | ||
Comment 3•14 years ago
|
||
Hmm. Can just somehow fast-path the case when f.call happens to be a function backed by js_fun_call or whatnot? So guard on that and take the fast-path if so, else fall back to whatever we're doing now?
Assignee | ||
Comment 4•14 years ago
|
||
(In reply to comment #3)
That's what we were thinking. It would basically be a PIC + Call IC mash-up.
Updated•14 years ago
|
Blocks: JaegerSpeed
Assignee | ||
Comment 5•14 years ago
|
||
The first step is to distinguish calls to 'apply' from calls to 'call'. The easiest way to do this is to just split JSOP_APPLY into two opcodes.
Assignee: general → lw
Status: NEW → ASSIGNED
Assignee | ||
Comment 6•14 years ago
|
||
This patch is a first stab at what was discussed above and hardly tested. It basically just adds a guard that f.call is actually js_fun_call and, if this passes, proceeds to do a normal call ic to call the 'this' parameter of js_fun_call.
Preliminary results show a 10x (-m) speedup of
function f(x) { return x+1 }
f.call(null, 3)
which puts us on par with jsc. This patch could even go a bit faster by comparing function-object equality instead of native equality.
Assignee | ||
Comment 7•14 years ago
|
||
Passes trace tests on x86.
Assignee | ||
Updated•14 years ago
|
Summary: JM: f.call seems to be slow compared to TM and V8 → JM: make f.call(...) fast
Assignee | ||
Comment 8•14 years ago
|
||
Green on try. SS unaffected.
deltablue: 1.058x as fast 436.3ms +/- 0.3% 412.4ms +/- 1.4%
Attachment #483714 -
Attachment is obsolete: true
Attachment #484024 -
Attachment is obsolete: true
Attachment #484349 -
Flags: review?(dvander)
Assignee | ||
Updated•14 years ago
|
Attachment #483713 -
Flags: review?(dvander)
Assignee | ||
Comment 9•14 years ago
|
||
Refactored to fit in with the rest of the patch queue. Still green on try.
Attachment #484349 -
Attachment is obsolete: true
Attachment #486276 -
Flags: review?(dvander)
Attachment #484349 -
Flags: review?(dvander)
Assignee | ||
Comment 10•14 years ago
|
||
I implemented the TODO:
// TODO: for compileAndGo scripts, we can just guard on callee JSObject*
I was unable to write a micro-benchmark that showed a speedup. This makes sense since the two extra loads are not lengthening the critical path for an OoO cpu.
Updated•14 years ago
|
Attachment #483713 -
Flags: review?(dvander) → review+
Comment on attachment 486276 [details] [diff] [review]
patch
Nice patch, only have tiny nits.
>+ explicit Address() {}
>+
THANK YOU.
>+void
>+mjit::Compiler::checkCallSpeculation(uint32 argc,
>+ FrameEntry *origCallee, FrameEntry *origThis,
>+ MaybeRegisterID calleeType, RegisterID calleeData,
Although it's longer, parameters 3,4 make more sense prefixed by "orig".
>+ Jump isNative = masm.branchPtr(Assembler::NotEqual,
>+ Address(calleeData, JSFunction::offsetOfNativeOrScript()),
>+ ImmPtr(JS_FUNC_TO_DATA_PTR(void *, js_fun_call)));
I would static assert that the layouts are compatible, here.
>+ uncachedCallPatch->ool = true;
>+ uncachedCallPatch->hasSlowNcode = false;
It's clearer to make ool mean "does not have fastNcodePatch", and then ...
>+ stubcc.masm.loadPtr(FrameAddress(offsetof(VMFrame, regs.fp)), JSFrameReg);
>+ Address ncodeAddr(JSFrameReg, JSStackFrame::offsetOfncode());
>+ uncachedCallPatch->fastNcodePatch = stubcc.masm.storePtrWithPatch(ImmPtr(NULL), ncodeAddr);
... set slowNcodePatch instead.
>+ /* Cannot rejoin: there is more fast path to generate in inlineCallHelper. */
This is a little cryptic. Maybe "Load the return address from vp[0], then jump. inlineCallHelper() will manually link this jump to rejoin into the fast path, since IC slow path expects the return value to be in vp[1]."
>+bool
>+mjit::Compiler::lowerableFunCall(jsbytecode *pc)
Unless this gets enhanced in the rest of the patch queue, might be better as:
static inline bool
IsLowerableFunCall?
> emitUncachedCall(argc, callingNew);
> return;
>+#ifdef JS_MONOIC
> }
>
>-#ifdef JS_MONOIC
Rebase misfire? Good idea to build with --disable-monoic and run jitflags=m (it can catch errors after changing ICs, and also catch ARM regressions).
>- Registers tempRegs;
>+ CallGenInfo callIC(PC);
>+ CallPatchInfo callPatch;
>+ MaybeRegisterID icCalleeType;
>+ RegisterID icCalleeData;
>+ Address icRvalAddr;
>+
>+ /* These locals are only initialized if lowerFunCall. */
>+ Jump uncachedCallSlowRejoin;
>+ CallPatchInfo uncachedCallPatch;
A very brief comment to the right of each of these would be more readable than the block comments above.
>+ if (lowerFunCall)
>+ stubcc.crossJump(uncachedCallSlowRejoin, masm.label());
"The mispredicted .call slow path loaded rval already. Join it back to the fast-path now."
Attachment #486276 -
Flags: review?(dvander) → review+
Assignee | ||
Comment 12•14 years ago
|
||
(In reply to comment #11)
Good comments, thanks
> I would static assert that the layouts are compatible, here.
Done in offsetOfNativeOrScript().
> > emitUncachedCall(argc, callingNew);
> > return;
> >+#ifdef JS_MONOIC
> > }
> >
> >-#ifdef JS_MONOIC
>
> Rebase misfire? Good idea to build with --disable-monoic and run jitflags=m (it
> can catch errors after changing ICs, and also catch ARM regressions).
Thanks, will do. This, though, is not a rebase error. If you look further down, this cuts out the #else case in #if JS_MONOIC. That is, it unifies the code run when !defined(JS_MONOIC) with the code when defined(JS_MONOIC) && debugMode.
Assignee | ||
Comment 13•14 years ago
|
||
Comment 14•14 years ago
|
||
Landed last month:
http://hg.mozilla.org/mozilla-central/rev/2f3a0ac5e251
http://hg.mozilla.org/mozilla-central/rev/d1bf74046ba7
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•