Closed Bug 945555 Opened 11 years ago Closed 8 years ago
jsperf "arrow-function" test performance increases 2x by adding unused `var self=this` statement
Categories
(Core :: JavaScript Engine: JIT, defect, P3)
Tracking
RESOLVED FIXED
People
(Reporter: cpeterson, Unassigned)
References
(Blocks 2 open bugs)
Details
(Keywords: perf, regression)
Attachments
(2 files)
text/javascript (deleted)
patch (deleted)
The following jsperf test compares arrow functions and bind() overhead. The arrow functions, unsurprisingly, are slightly slower than bind().
http://jsperf.com/arrow-functions
The most surprising result, however, is that, by adding an unused `var self=this` statement, the `withSelf` function is 2x faster than the `plain` function! The performance boost goes away if I change `var self=this` to `var self=x`.
plain = function(x) {
  return function(y) { return x + y; };
};

withSelf = function(x) {
  var self = this;
  return function(y) { return x + y; };
};
This seems to be a regression in Nightly 28. `withSelf` perf has been steadily improving in recent Firefox releases, but `plain` perf took a big dive in Nightly 28. withArrow() has been getting slower with each release.
Operations/second from my MacBook Pro:
             withArrow   withSelf     withBind   plain
Nightly 28   390,116     8,230,679    473,588    3,940,216
Aurora 27    443,074     6,964,792    553,815    5,068,682
Beta 26      473,078     6,194,081    572,000    5,065,261
Firefox 25   532,878     4,201,867    484,456    4,746,151
Chrome 31    N/A         32,271,400   524,593    31,806,558
Comment 1•11 years ago
This shows the same odd behavior.
Comment 2•11 years ago
JIT inspector shows about the same things for "withSelf" and "plain", so I'm not sure why the 2x difference...
There is no obvious reason why withArrow should be so slow, btw; we just seem to be taking a slow path for it.
Reporter
Comment 3•11 years ago
My test results from comment 0 were from an "Early 2011" model MacBook Pro (8 GB RAM) and very consistent. On my "Mid 2012" model MacBook Pro Retina (16 GB RAM), "withSelf" is only a little faster than "plain", not 2x. The performance difference seems related to Ion because withSelf and plain are much closer with Baseline (on both MacBooks).
                       withArrow  withSelf   withBind  plain
Nightly 28 (Ion)       443,719    5,675,600  515,022   4,410,839
Nightly 28 (Baseline)  369,219    3,361,748  357,804   3,946,622
Comment 4•11 years ago
Unfortunately I can't reproduce this. I get the following numbers for the shell testcase:
withArrow: 461254.6125461255
withSelf: 13513513.513513513
withBind: 507872.01625190454
plain: 14705882.352941176
Comment 5•11 years ago
That was a 32-bit build; I do see the problem with a 64-bit build:
withArrow: 429922.6139294927
withSelf: 9615384.615384616
withBind: 509424.3504839531
plain: 6097560.975609756
Since you're both on OS X, I assume you also used 64-bit builds. Will take a look.
Comment 6•11 years ago
I commented out the arrow and bind tests and I get the following numbers:
withSelf: 9900990.099009901
plain: 5434782.608695652
If I run plain before withSelf:
plain: 10638297.872340424
withSelf: 5235602.094240838
So the one that runs first is fast? Probably due to GC or something...
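(For context, a minimal sketch of the kind of shell harness being reordered here; the attached testcase isn't reproduced in this thread, so the timing method, iteration count, and use of the shell's print() are assumptions:)

// Hypothetical harness, assuming plain/withSelf from comment 0 are defined.
function bench(name, makeClosure) {
  var fn = makeClosure(1);
  var iterations = 5000000;
  var start = Date.now();
  var sum = 0;
  for (var i = 0; i < iterations; i++)
    sum += fn(i);
  var seconds = (Date.now() - start) / 1000;
  print(name + ": " + (iterations / seconds));
}
// Swapping these two calls reproduces the "whichever runs first is fast" effect.
bench("plain", plain);
bench("withSelf", withSelf);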
Comment 7•11 years ago
That's entirely possible, yes. :(
Reporter
Comment 8•11 years ago
            withArrow  withSelf   withBind  plain
Nightly 28  390,000    8,231,000  474,000   3,940,000
...
Aurora 30   481,000    8,960,000  556,000   6,597,000
Nightly 31  7,027,000  8,055,000  79,000    11,484,000
Kannan's arrow function optimizations can be seen in withArrow test results in Nightly 31, but withBind perf is about 7x slower in Nightly 31 than Aurora 30. I'm using OS X.
Comment 9•11 years ago
That withBind regression happened in http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6fa163ff81a3&tochange=4f3443da36a1
Nothing in there jumps out at me, unfortunately. :( Jan, any ideas?
Flags: needinfo?(jdemooij)
Comment 10•11 years ago
(In reply to Chris Peterson (:cpeterson) from comment #8)
> Kannan's arrow function optimizations can be seen in withArrow test results
> in Nightly 31
I did these optimizations; withArrow should be about as fast as plain/withSelf now, so I'm not sure why that isn't happening. It's possible we are GC'ing, etc.; see also comment 6...
(In reply to Boris Zbarsky [:bz] from comment #9)
> That withBind regression happened in
> http://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=6fa163ff81a3&tochange=4f3443da36a1
>
> Nothing in there jumps out at me, unfortunately. :( Jan, any ideas?
GGC is in there... Forwarding to Terrence.
Flags: needinfo?(jdemooij) → needinfo?(terrence)
Comment 11•11 years ago
Poking the default heap of Lambda yields some interesting results:
--disable-gcgenerational:
withArrow: 18,518,518.51851852
withSelf: 28,571,428.57142857
withBind: 651,041.6666666666
plain: 21,739,130.43478261
current tip (--enable-gcgenerational):
withArrow: 52,631,578.94736842
withSelf: 62,500,000
withBind: 89,936.14533681086
plain: 71,428,571.42857143
with patch:
withArrow: 50,000,000
withSelf: 25,641,025.64102564
withBind: 652,315.7208088714
plain: 23,255,813.95348837
withSelf and plain are benefiting tremendously from participating in GGC, by a factor of 3x or so. The reason withBind isn't getting the same benefit appears to be that bind enters the interpreter and fires a bunch more barriers while bumbling about in C++. From a few minutes of poking around in GGC, the implicated barriers appear to be Shape::parent and InitialShapeTableRef.
I'm not really sure what the right approach is yet: need to spend some more time figuring out why bind is hitting shapes so hard.
Comment 12•11 years ago
Okay, now that I've looked into this, bind is crazy suboptimal in the presence of GGC.
(1) Lambda is always nursery allocated and bind always creates a singleton (tenured) JSFunction. This means the proto link always needs barriers.
(2) This unique nursery lambda proto means each new bound function gets its own entry in the initial shapes table, hammering on the barrier on that table.
(3) JSFunction::initBoundFunction immediately converts the shape to a dictionary shape. This conversion hammers hard on the proto barrier from the weird handling of BaseShape in vm/Shape.cpp.
(4) Then the bound function puts the lambda in a reserved slot, firing a slot barrier as well.
I don't really see a great way to cheat here. Making the bound function nursery-allocated didn't help much here and would have hurt the places that need the singleton behavior. We switched to barriering the initial shapes table because sweeping through them is so expensive -- this is still true: reverting that patch was a net slowdown, even on this benchmark.
Removing any one or two of these barriers is only going to get us a marginal speedup compared to the 10x we want. I think the only way forward here is to not allocate lambdas that are about to be bound in the nursery. I went on a brief adventure in the jit directory looking for where to scan the MIR for lambda, followed by box, followed by getcallnative for CallOrConstructBoundFunction. I quickly got lost and decided I should at least ask if this is a good idea before I spend more time on it.
Flags: needinfo?(terrence) → needinfo?(jdemooij)
Comment 13•11 years ago
Terrence, thanks for looking into this. Our current fun.bind implementation is hopelessly inefficient, both when creating bound functions (as you said, initBoundFunction doing dictionary-mode stuff, etc.) and when calling them (a C++ native that calls back into JS).
Both V8 and JSC self-host function.bind. I think now's a good time to investigate doing that; I'll try to prototype something later today.
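(For anyone following along: "self-hosting" here means writing Function.prototype.bind in JavaScript inside the engine instead of as a C++ native. A very rough sketch of the idea, not SpiderMonkey's actual self-hosted code, which also has to handle |new|, .length, .name, and exotic targets:)

// Illustrative only: the common call path of bind, written in plain JS.
function simpleBind(target, thisArg /*, ...boundArgs */) {
  var boundArgs = Array.prototype.slice.call(arguments, 2);
  return function () {
    var callArgs = Array.prototype.slice.call(arguments);
    return target.apply(thisArg, boundArgs.concat(callArgs));
  };
}
// Roughly equivalent to adder.bind(null, 1) for ordinary (non-constructor) calls:
var adder = function (a, b) { return a + b; };
var addOne = simpleBind(adder, null, 1);
print(addOne(2)); // 3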
Flags: needinfo?(jdemooij)
Updated•11 years ago
Comment 14•8 years ago
Some updated measurements:
withArrow: 43478260.86956522
withSelf: 90909090.9090909
withBind: 1748251.7482517483
plain: 83333333.33333333
Now withSelf and plain are in the same ballpark, but withBind is *much* slower than withArrow.
Comment 15•8 years ago
withArrow, withSelf, and plain are now all about the same speed. The results are somewhat noisy, but over a few runs and with more iterations it's pretty clear.
withBind tests something a bit different: the performance of both the binding operation and calling a bound function. Given that, the perf difference isn't even that bad :) Changing the shell test to measure only calling performance for all cases still has withBind ~3x slower though, which shouldn't be the case after bug 1000780 and some additional optimizations jandem did last year. I'll file a new bug about that.
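(A sketch of the call-only split described above: hoist the bind out of the timed loop so the loop measures only call overhead. The iteration count and harness are assumptions, not the actual modified shell test:)

// Hypothetical call-only measurement for the withBind case.
var bound = function (a, b) { return a + b; }.bind(null, 1);
var iterations = 5000000;
var start = Date.now();
var sum = 0;
for (var i = 0; i < iterations; i++)
  sum += bound(i);
var seconds = (Date.now() - start) / 1000;
print("withBind (calls only): " + (iterations / seconds));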
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED