Closed Bug 581893 Opened 14 years ago Closed 14 years ago

build a js::Invoke Gatling gun for str_replace / array_sort / array_extra

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: luke, Assigned: luke)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 11 obsolete files)

WIP Patch 14 years ago Alan Pierce (deleted), patch		Details \| Diff \| Splinter Review
Sunspider comparison 14 years ago Alan Pierce (deleted), text/plain		Details
Patch, v1 14 years ago Alan Pierce (deleted), patch		Details \| Diff \| Splinter Review
Sunspider comparison 14 years ago Alan Pierce (deleted), text/plain		Details
WIP 14 years ago Luke Wagner [:luke] (deleted), patch		Details \| Diff \| Splinter Review
patch 14 years ago Luke Wagner [:luke] (deleted), patch		Details \| Diff \| Splinter Review
patch 14 years ago Luke Wagner [:luke] (deleted), patch		Details \| Diff \| Splinter Review
faster 14 years ago Luke Wagner [:luke] (deleted), patch		Details \| Diff \| Splinter Review
fix assert 14 years ago Luke Wagner [:luke] (deleted), patch	Waldo : review-	Details \| Diff \| Splinter Review
patch 14 years ago Luke Wagner [:luke] (deleted), patch		Details \| Diff \| Splinter Review
with lazy allocation 14 years ago Luke Wagner [:luke] (deleted), patch		Details \| Diff \| Splinter Review
this time, the gatling patch 14 years ago Luke Wagner [:luke] (deleted), patch	Waldo : review+	Details \| Diff \| Splinter Review

Luke Wagner [:luke]

Assignee

Description

•

14 years ago

When doing an array_sort, str_replace, or array_extra, js::Invoke is called with the same callee/this on the same number of arguments many many times.  We already avoid the overhead of pushing the arguments for each invocation using the pattern:

pushInvokeArgs
for (...) {
  set args
  js::Invoke
}
popInvokeArgs

We could take this much farther by factoring more work out of the loop.  That is, in preparing to call js::Invoke many times, we can indicate the callee and number of actual arguments and dynamically choose a specialization of js::Invoke to take advantage of the dynamic properties of the callee.  Kinda like a PIC.  Furthermore, a single stack frame can be pushed and reused for each js::Invoke call.  In the best case, Invoke could be reduced to little more than a ptr-to-fun call (to the dynamically-chosen specialization) and a single memcpy to initialize the stack from a prototype built during the preparation step.

Shark shows all of SS spending 1% (~5ms) in Invoke + InvokeCommon called from str_replace and array_sort.  Looks like dmandelin already identified us as 2x slower doing Array.sort in bug 578226.  5ms is not big, so not a high priority, but, as bug 578226 points out, when its a hot path we are 2x slower than JSC.  Of course, JaegerCalls will help the situation too.

V8 has non-trivial Invoke overhead too, but of the single-shot mostly from apply/call, so no wins there.

Boris Zbarsky [:bzbarsky]

Comment 1

•

14 years ago

fwiw, JM jits the callback function, so the impact of the invoke overhead is proportionally more there than on TM....

Andreas Gal :gal

Comment 2

•

14 years ago

Even in case of TM we will invoke the JM-ied callback function eventually.

Alan Pierce

Updated

•

14 years ago

Assignee: general → apierce

Status: NEW → ASSIGNED

Alan Pierce

Comment 3

•

14 years ago

Attached patch WIP Patch (obsolete) (deleted) — Details — Splinter Review

This passes all the tests, but doesn't optimize cases like native functions, and the fast invoke call can probably be made a lot faster. It's about a 1% SS win, and will hopefully be more by the time the patch is done.

Alan Pierce

Comment 4

•

14 years ago

Attached file Sunspider comparison (obsolete) (deleted) — Details

Luke Wagner [:luke]

Assignee

Comment 5

•

14 years ago

Awesome!  In addition to measuring progress on SS, it might be useful to measure progress on a few micro-benchmarks.

Alan Pierce

Comment 6

•

14 years ago

Attached patch Patch, v1 (obsolete) (deleted) — Details — Splinter Review

This patch is for TM (I also have a version for JM, if it's better to apply the patch there). It passes trace tests and jstests. I don't have try server access anymore, so I wasn't able to test that. A few notes:

The location of the code is a little weird. BatchInvokeSession's declaration is in jscntxt.h,  since it needs InvokeArgsGuard and InvokeFrameGuard, while the function definitions are in jsinterp.cpp, since I think it's better to have all invoke code in jsinterp.cpp.

I tried with and without a memcpy to set the stack frame entries, and it's faster without the memcpy.

My main concern with this patch is that between the calls to prepare() and invoke(), the state of the stack/registers is weird, since we've pushed a stack frame, but aren't immediately going to use it. In this state, array_extra calls GetArrayElement, which can eventually call js_InferFlags (called from CallResolveOp), which uses cx->regs->pc, so we can't have an invalid pc. However, we also can't just temporarily give cx->fp and cx->regs their previous values because in some places (e.g. ValueToNumber in sort_compare) we need to do a JS function call while a BatchInvokeSession is prepared, which would cause us to clobber the arguments and the frame (this was causing an assert, although there might be a way to get around it). My solution was to have the prepare() create a full stack frame just like is done before every invoke. This would mess up js_InferFlags, I think, since we use a different cx->regs-pc than before, but I'm not sure if its correctness was necessary in these cases.

Perf updates in a soon-to-follow commment.

Attachment #464257 - Attachment is obsolete: true

Attachment #466690 - Flags: review?(lw)

Alan Pierce

Comment 7

•

14 years ago

Attached file Sunspider comparison (obsolete) (deleted) — Details

After I rebased on top of YARR and some stack frame changes, some other parts of unpack-code seemed to get slower from my patch, so the total win on unpack-code wasn't as much. I spent a while trying to figure it out, with not much success. I've seen unpack-code have weird fluctuations like this before, so it may just be little differences in the optimizer or cache behavior. So, the win on unpack-code may or may not be more than what's shown here. Also, the wins in places other than tagcloud and unpack-code can probably be ignored.


In addition to the sunspider results attached, here's the running time on a few microbenchmarks:

Calling replace on a string of length 2^20, with the identity function as a lambda:

Before: 167ms
After:  149ms


Sorting an initially-reversed list with 1000000 elements, specifying a regular integer comparison lambda:

Before: 925ms
After:  739ms

So, the patch doesn't hugely speed up either string replace or sort, but it's enough to be significant.

Attachment #464258 - Attachment is obsolete: true

Luke Wagner [:luke]

Assignee

Comment 8

•

14 years ago

I'm sorry for the delay.  I've been finishing some Invoke-changing patches (bug 581263 and bug 539144) which will both simplify this patch and change its performance impact.

Chris Leary [:cdleary] (not checking bugmail)

Comment 9

•

14 years ago

Looks like I'm going to be taking this and rebasing it on top of the argv removal patch if there are no objections.

Assignee: alangpierce → cdleary

Chris Leary [:cdleary] (not checking bugmail)

Comment 10

•

14 years ago

bhackett said he would like to take this while I bang my head against regexp stuff.

Assignee: cdleary → bhackett1024

Luke Wagner [:luke]

Assignee

Comment 11

•

14 years ago

Stealing back with permission.

Assignee: bhackett1024 → lw

Luke Wagner [:luke]

Assignee

Comment 12

•

14 years ago

Attached patch WIP (obsolete) (deleted) — Details — Splinter Review

Rebased and updated.  I was able to rip a bit more out of the hot invoke path and there may be an ounce or two of perf left still -- I did a few things for code simplicity that I could gross for speed.  Micro-benchmarks of forEach/sort/replace show a 20-45% speedup.  SS shows nice speedup (-j -m) for the two benchmarks that actually Invoke a lot:

  string:              1.044x as fast    105.8ms +/- 0.5%   101.3ms +/- 0.5%
    base64:            -                   5.7ms +/- 4.0%     5.4ms +/- 4.3%
    fasta:             -                  21.1ms +/- 1.1%    20.9ms +/- 1.0%
    tagcloud:          1.027x as fast     27.0ms +/- 0.6%    26.3ms +/- 1.0%
    unpack-code:       1.091x as fast     41.8ms +/- 0.6%    38.3ms +/- 1.0%
    validate-input:    ??                 10.3ms +/- 2.2%    10.4ms +/- 2.3%

For an overall 1.8% (6.5ms) speedup.

V8 is mostly unaffected (no hot Invoke), but v8-raytrace is showing a 10% regression, which is weird, since that shouldn't be affected at all, so that needs investigation.

Attachment #466690 - Attachment is obsolete: true

Attachment #466692 - Attachment is obsolete: true

Attachment #466690 - Flags: review?(lw)

Luke Wagner [:luke]

Assignee

Comment 13

•

14 years ago

(In reply to comment #12)
> V8 is mostly unaffected (no hot Invoke), but v8-raytrace is showing a 10%
> regression, which is weird, since that shouldn't be affected at all, so that
> needs investigation.

Ah, it was a poltergeist.  No measurable change now.  tagcloud/unpack-code speedup remains.

Luke Wagner [:luke]

Assignee

Comment 14

•

14 years ago

Attached patch patch (obsolete) (deleted) — Details — Splinter Review

(With bug 587707 landed) I was able to avoid clearing rval on each iteration which shaves off maybe .5ms on unpack-code.

Attachment #479212 - Attachment is obsolete: true

Attachment #479485 - Flags: review?(jwalden+bmo)

Luke Wagner [:luke]

Assignee

Comment 15

•

14 years ago

Attached patch patch (obsolete) (deleted) — Details — Splinter Review

Temporarily put back the rval-setting to avoid doing a 10% job of what bug 593882 is going to do properly; once that lands it will be trivial to remove.

Attachment #479485 - Attachment is obsolete: true

Attachment #479850 - Flags: review?(jwalden+bmo)

Attachment #479485 - Flags: review?(jwalden+bmo)

Luke Wagner [:luke]

Assignee

Comment 16

•

14 years ago

Attached patch faster (obsolete) (deleted) — Details — Splinter Review

This patch optimizes access of session arguments to not go through fp->canonicalActualArgs() which gives a 7% boost on a str_replace micro-benchmark.

Attachment #479850 - Attachment is obsolete: true

Attachment #479862 - Flags: review?(jwalden+bmo)

Attachment #479850 - Flags: review?(jwalden+bmo)

Luke Wagner [:luke]

Assignee

Comment 17

•

14 years ago

Attached patch fix assert (obsolete) (deleted) — Details — Splinter Review

Fixed silly assert.  Looking green on try server.

Attachment #479862 - Attachment is obsolete: true

Attachment #479953 - Flags: review?(jwalden+bmo)

Attachment #479862 - Flags: review?(jwalden+bmo)

Luke Wagner [:luke]

Assignee

Comment 18

•

14 years ago

Gary, Jesse: the code paths added by this patch are only exercised by String.replace (with a function replace arg), Array.sort (with a function comparator), and the Array extras (filter, forEach, every, map, some, reduce, reduceRight).  Do you suppose there is any way to get some fuzzing that pounds on usage of these?

Jeff Walden [:Waldo]

Comment 20

•

14 years ago

Comment on attachment 479953 [details] [diff] [review]
fix assert

># HG changeset patch
># User Luke Wagner <lw@mozilla.com>
># Date 1285712623 25200
># Node ID ed71a2c842fbd8e99c8f9a331ef739417b669d02
># Parent  a307ab9638b45ce02a4b08ed5ab034c5527a9ec9
>try: -b do -p linux,macosx64 -m none -u all -t none

I think your commit message needs work.  :-P


>diff --git a/js/src/jsarray.cpp b/js/src/jsarray.cpp

>+    InvokeSessionGuard &session = ca->session;
>+    session[0] = *av;
>+    session[1] = *bv;

Range-checking of argument initialization?  BRILLIANT!


>diff --git a/js/src/jsinterp.cpp b/js/src/jsinterp.cpp

> JS_REQUIRES_STACK bool
> Invoke(JSContext *cx, const CallArgs &argsRef, uint32 flags)
> {
>+    /* N.B. Must be kept in sync with InvokeSessionGuard::start/invoke */

This is going to be an ongoing nightmare.  :-(


> bool
>+InvokeSessionGuard::start(JSContext *cx, const Value &calleev, JSObject *thisp, uintN argc)

I will admit my eyes glazed over somewhat reading this.  It could be reasonable; it does roughly seem to follow what Invoke does.  But I see one issue that, in concert with a nearby comment, makes me wonder: is where this-computation happens correct?  Is it after get/pushInvokeFrame or before?  I *thought* that was observable in terms of scope chain for the thisObject hook, but maybe I'm just wrong and am misunderstanding the "Currently, this must happen after the frame is pushed and fp->scopeChain is correct because the thisObject hook may call JS_GetScopeChain." comment in my tree.


>+        /*
>+         * Use the optimized invoke path.
>+         *
>+         * Set (callee, this) once for the session.
>+         * N.B. set before getInvokeFrame does any arg overflow duplication.
>+         */
>+        args_.callee() = calleev;
>+        // FIXME bug 592992: hoist thisObject hook to ExternalInvoke
>+        if (thisp) {
>+            thisp = thisp->thisObject(cx);
>+            if (!thisp)
>+                return false;
>+            JS_ASSERT(IsSaneThisObject(*thisp));
>+            args_.thisv().setObject(*thisp);
>+        } else {
>+            args_.thisv().setNull();
>+        }
>+
>+        /* Push the stack frame once for the session. */
>+        uint32 flags = 0;
>+        if (!stack.getInvokeFrame(cx, args_, fun, script, &flags, &frame_))
>+            return false;
>+
>+        JSStackFrame *fp = frame_.fp();
>+        fp->initCallFrame(cx, calleev.toObject(), fun, argc, flags);
>+
>+        stack.pushInvokeFrame(cx, args_, &frame_);


>diff --git a/js/src/jsinterp.h b/js/src/jsinterp.h

> /*
>+ * Natives like sort/forEach/replace call Invoke repeatedly with the same
>+ * callee, this, and number of arguments. To optimize this, such natives can
>+ * start an "invoke session" to factor out much of the dynamic setup logic
>+ * required by a normal Invoke. Usage is:
>+ *
>+ *   InvokeSessionGuard session(cx);
>+ *   if (!session.start(cx, callee, thisp, argc, &session))
>+ *     ...
>+ *
>+ *   while (...) {
>+ *     // write actual args (not callee, this)
>+ *     session[0] = ...
>+ *     ...
>+ *     session[argc-1] = ...

Spaces around -.


Address the thisObject-calling-location question, and I think you'll be good to go.

Attachment #479953 - Flags: review?(jwalden+bmo) → review-

Luke Wagner [:luke]

Assignee

Comment 21

•

14 years ago

Attached patch patch (obsolete) (deleted) — Details — Splinter Review

Good catch Waldo.  I originally had that in the right place (after the push), but then I forgot and moved it before because args.thisv() is the non-canonical-this when fp->hasOverflowArgs().

The mjit calling interface changed a bit so I changed things around a bit too.  I also moved a few other bits out of the 'invoke' path and onto the 'start' path.

Attachment #479953 - Attachment is obsolete: true

Attachment #481383 - Flags: review?(jwalden+bmo)

Luke Wagner [:luke]

Assignee

Comment 22

•

14 years ago

Attached patch with lazy allocation (obsolete) (deleted) — Details — Splinter Review

Attachment #481383 - Attachment is obsolete: true

Attachment #481731 - Flags: review?(jwalden+bmo)

Attachment #481383 - Flags: review?(jwalden+bmo)

Luke Wagner [:luke]

Assignee

Comment 23

•

14 years ago

Attached patch this time, the gatling patch (deleted) — Details — Splinter Review

Ignore that last patch, wrong bug.

Attachment #481731 - Attachment is obsolete: true

Attachment #481732 - Flags: review?(jwalden+bmo)

Attachment #481731 - Flags: review?(jwalden+bmo)

Jeff Walden [:Waldo]

Comment 24

•

14 years ago

Comment on attachment 481732 [details] [diff] [review]
this time, the gatling patch

One other minor change: don't pass JSObject* thisp, pass const Value& in anticipation of the strict-this patch landing.  (Or, probably better, wait until after it lands and then you'll be forced to make the change.  Do note that these functions should have |undefined| passed as their |this| value, if no other such value has been specified.)

Attachment #481732 - Flags: review?(jwalden+bmo) → review+

Luke Wagner [:luke]

Assignee

Comment 25

•

14 years ago

(In reply to comment #24)
Thanks, changed.

http://hg.mozilla.org/tracemonkey/rev/0caecf667343

Whiteboard: fixed-in-tracemonkey

Robert Sayre

Comment 26

•

14 years ago

http://hg.mozilla.org/mozilla-central/rev/0caecf667343

Status: ASSIGNED → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.