Closed Bug 602129 Opened 14 years ago Closed 14 years ago

JM: make f.call(...) fast

Categories: Core :: JavaScript Engine, defect

Platform: x86 macOS
Type: defect
Priority: Not set
Severity: normal
Tracking


RESOLVED FIXED

People

(Reporter: bzbarsky, Assigned: luke)

References

Details

Attachments

(3 files, 3 obsolete files)

Attached file Testcase (deleted) —
When running the attached testcase I get these numbers:

bin% ./js ~/bar.js
801: control (no call)
1771: call([object global])
1632: call(null)
1633: call(undefined)

bin% ./js -m ~/bar.js
130: control (no call)
904: call([object global])
934: call(null)
946: call(undefined)

bin% ./js -j ~/bar.js
89: control (no call)
41: call([object global])
41: call(null)
1619: call(undefined)

bin% ./js -m -j ~/bar.js
97: control (no call)
50: call([object global])
47: call(null)
921: call(undefined)

So it looks like call() in JM is about 2x faster than in the interpreter, but still a good bit slower than an actual function call (comparatively slower than in the interpreter), and a lot slower than call when it gets traced (my local tree is slightly hacked so a null thisp will trace for call).

For comparison, here is the v8 opt shell:

v8-read-only% ./shell ~/bar.js
85: control (no call)
151: call([object global])
147: call(null)
160: call(undefined)

So we're pretty close to them in JM for the straight function call, but a lot slower for call().
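The attached testcase itself isn't visible here; a minimal sketch of the kind of micro-benchmark being measured follows (the function body, loop count, and labels are assumptions, not the actual bar.js):

  // Hypothetical reconstruction of bar.js; the real testcase may differ.
  function f(x) { return x + 1; }

  function bench(name, fn) {
      var start = Date.now();
      var sum = 0;
      for (var i = 0; i < 10000000; i++)
          sum += fn(i);
      print((Date.now() - start) + ": " + name);
      return sum;
  }

  bench("control (no call)", function (i) { return f(i); });
  bench("call([object global])", function (i) { return f.call(this, i); });
  bench("call(null)", function (i) { return f.call(null, i); });
  bench("call(undefined)", function (i) { return f.call(undefined, i); });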
Blocks: 602132
We can special-case this in the native call MIC, afaik. It might also help v8-deltablue a bit; bhackett found that it calls Function.prototype.call 125k times (bug 595884 comment 13).
(In reply to comment #1)
Shark shows that (with shell -m) we spend 10.9ms in js::Invoke and 10.4ms in js_fun_call, out of 695ms in v8-deltablue.

Incidentally, David and I were talking about this yesterday, and the neat thing about expressions like foo.call(this, arg1, arg2) is that, at the call site of 'call', the stack is:

  [Function.prototype.call, foo, this, arg1, arg2]

and the desired call stack to call 'foo' is:

  [foo, this, arg1, arg2]

So we should be able to get f.call() to cost only a tiny bit more than f().
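Put another way, the only work 'call' does on that stack is drop its own callee slot and re-dispatch. A plain-JS illustration of the equivalence (foo and obj are made-up names, not engine code):

  function foo(a, b) { return this.base + a + b; }
  var obj = { base: 10, foo: foo };

  // At the call site of 'call' the engine sees the stack
  //   [Function.prototype.call, foo, obj, 1, 2]
  // but all 'call' has to do is invoke foo with obj as |this|:
  var viaCall = foo.call(obj, 1, 2);   // 13

  // ...which is exactly what the direct call's stack [foo, obj, 1, 2] computes:
  var direct = obj.foo(1, 2);          // 13

  print(viaCall === direct);           // true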
Hmm. Can we just somehow fast-path the case where f.call happens to be a function backed by js_fun_call or whatnot? That is, guard on that and take the fast path if so, and otherwise fall back to whatever we're doing now?
(In reply to comment #3) That's what we were thinking. It would basically be a PIC + Call IC mash-up.
The first step is to distinguish calls to 'apply' from calls to 'call'. The easiest way to do this is to just split JSOP_APPLY into two opcodes.
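For context, the two builtins take their arguments differently, which is why a call-specific fast path needs to be distinguishable at the opcode level. A plain-JS reminder of the difference (not engine code):

  function f(a, b) { return a + b; }

  // 'call' receives its arguments individually after |this|...
  print(f.call(null, 1, 2));      // 3

  // ...while 'apply' receives them as a single array-like object, which
  // has to be unpacked before the callee's frame can be set up.
  print(f.apply(null, [1, 2]));   // 3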
Assignee: general → lw
Status: NEW → ASSIGNED
Attached patch WIP 1 (obsolete) (deleted) — Splinter Review
This patch is a first stab at what was discussed above and is hardly tested. It basically just adds a guard that f.call is actually js_fun_call and, if this passes, proceeds to do a normal call IC to call the 'this' parameter of js_fun_call. Preliminary results show a 10x (-m) speedup of

  function f(x) { return x + 1 }
  f.call(null, 3)

which puts us on par with JSC. This patch could even go a bit faster by comparing function-object equality instead of native equality.
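The guard is needed because 'call' can be shadowed on a per-function basis; a hedged sketch of the two cases the IC has to handle (names are illustrative only):

  function f(x) { return x + 1; }

  // Fast path: f.call resolves to the real Function.prototype.call
  // (js_fun_call in the engine), so the guarded IC can invoke f directly.
  print(f.call(null, 3));   // 4

  // Guard failure: once 'call' is shadowed, the speculation is wrong and
  // the compiled code must fall back to an ordinary (uncached) call.
  f.call = function (thisArg, x) { return "shadowed: " + x; };
  print(f.call(null, 3));   // "shadowed: 3"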
Attached patch patch (obsolete) (deleted) — Splinter Review
Passes trace tests on x86.
Blocks: 605192
Blocks: 602262
Summary: JM: f.call seems to be slow compared to TM and V8 → JM: make f.call(...) fast
Depends on: 605352
Depends on: 605355
Attached patch patch (obsolete) (deleted) — Splinter Review
Green on try. SS unaffected. deltablue: 1.058x as fast (before: 436.3ms +/- 0.3%, after: 412.4ms +/- 1.4%).
Attachment #483714 - Attachment is obsolete: true
Attachment #484024 - Attachment is obsolete: true
Attachment #484349 - Flags: review?(dvander)
Attachment #483713 - Flags: review?(dvander)
Attached patch patch (deleted) — Splinter Review
Refactored to fit in with the rest of the patch queue. Still green on try.
Attachment #484349 - Attachment is obsolete: true
Attachment #486276 - Flags: review?(dvander)
Attachment #484349 - Flags: review?(dvander)
I implemented the TODO:

  // TODO: for compileAndGo scripts, we can just guard on callee JSObject*

I was unable to write a micro-benchmark that showed a speedup. This makes sense, since the two extra loads do not lengthen the critical path on an out-of-order (OoO) CPU.
Attachment #483713 - Flags: review?(dvander) → review+
Comment on attachment 486276 [details] [diff] [review]
patch

Nice patch, only have tiny nits.

>+    explicit Address() {}
>+

THANK YOU.

>+void
>+mjit::Compiler::checkCallSpeculation(uint32 argc,
>+                                     FrameEntry *origCallee, FrameEntry *origThis,
>+                                     MaybeRegisterID calleeType, RegisterID calleeData,

Although it's longer, parameters 3,4 make more sense prefixed by "orig".

>+    Jump isNative = masm.branchPtr(Assembler::NotEqual,
>+                                   Address(calleeData, JSFunction::offsetOfNativeOrScript()),
>+                                   ImmPtr(JS_FUNC_TO_DATA_PTR(void *, js_fun_call)));

I would static assert that the layouts are compatible, here.

>+    uncachedCallPatch->ool = true;
>+    uncachedCallPatch->hasSlowNcode = false;

It's clearer to make ool mean "does not have fastNcodePatch", and then ...

>+    stubcc.masm.loadPtr(FrameAddress(offsetof(VMFrame, regs.fp)), JSFrameReg);
>+    Address ncodeAddr(JSFrameReg, JSStackFrame::offsetOfncode());
>+    uncachedCallPatch->fastNcodePatch = stubcc.masm.storePtrWithPatch(ImmPtr(NULL), ncodeAddr);

... set slowNcodePatch instead.

>+    /* Cannot rejoin: there is more fast path to generate in inlineCallHelper. */

This is a little cryptic. Maybe "Load the return address from vp[0], then jump. inlineCallHelper() will manually link this jump to rejoin into the fast path, since IC slow path expects the return value to be in vp[1]."

>+bool
>+mjit::Compiler::lowerableFunCall(jsbytecode *pc)

Unless this gets enhanced in the rest of the patch queue, might be better as: static inline bool IsLowerableFunCall?

>         emitUncachedCall(argc, callingNew);
>         return;
>+#ifdef JS_MONOIC
>     }
>
>-#ifdef JS_MONOIC

Rebase misfire? Good idea to build with --disable-monoic and run jitflags=m (it can catch errors after changing ICs, and also catch ARM regressions).

>-    Registers tempRegs;
>+    CallGenInfo callIC(PC);
>+    CallPatchInfo callPatch;
>+    MaybeRegisterID icCalleeType;
>+    RegisterID icCalleeData;
>+    Address icRvalAddr;
>+
>+    /* These locals are only initialized if lowerFunCall. */
>+    Jump uncachedCallSlowRejoin;
>+    CallPatchInfo uncachedCallPatch;

A very brief comment to the right of each of these would be more readable than the block comments above.

>+    if (lowerFunCall)
>+        stubcc.crossJump(uncachedCallSlowRejoin, masm.label());

"The mispredicted .call slow path loaded rval already. Join it back to the fast-path now."
Attachment #486276 - Flags: review?(dvander) → review+
(In reply to comment #11)
Good comments, thanks.

> I would static assert that the layouts are compatible, here.

Done in offsetOfNativeOrScript().

> >         emitUncachedCall(argc, callingNew);
> >         return;
> >+#ifdef JS_MONOIC
> >     }
> >
> >-#ifdef JS_MONOIC
>
> Rebase misfire? Good idea to build with --disable-monoic and run jitflags=m (it
> can catch errors after changing ICs, and also catch ARM regressions).

Thanks, will do. This, though, is not a rebase error. If you look further down, this cuts out the #else case in #if JS_MONOIC. That is, it unifies the code run when !defined(JS_MONOIC) with the code run when defined(JS_MONOIC) && debugMode.
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Depends on: 613732
Depends on: 756864