Closed Bug 945555 Opened 11 years ago Closed 8 years ago
jsperf "arrow-function" test performance increases 2x by adding unused `var self=this` statement
Categories
(Core :: JavaScript Engine: JIT, defect, P3)
Tracking
RESOLVED FIXED
People
(Reporter: cpeterson, Unassigned)
References
(Blocks 2 open bugs)
Details
(Keywords: perf, regression)
Attachments
(2 files)
text/javascript (deleted)
patch (deleted)
The following jsperf test compares arrow functions and bind() overhead. The arrow functions, unsurprisingly, are slightly slower than bind().
http://jsperf.com/arrow-functions
The most surprising result, however, is that, by adding an unused `var self=this` statement, the `withSelf` function is 2x faster than the `plain` function! The performance boost goes away if I change `var self=this` to `var self=x`.
plain = function(x) {
  return function(y) { return x + y; };
};

withSelf = function(x) {
  var self = this;
  return function(y) { return x + y; };
};
This seems to be a regression in Nightly 28. `withSelf` perf has been steadily improving in recent Firefox releases, but `plain` perf took a big dive in Nightly 28. withArrow() has been getting slower with each release.
Operations/second from my MacBook Pro:
             withArrow   withSelf     withBind   plain
Nightly 28   390,116     8,230,679    473,588    3,940,216
Aurora 27    443,074     6,964,792    553,815    5,068,682
Beta 26      473,078     6,194,081    572,000    5,065,261
Firefox 25   532,878     4,201,867    484,456    4,746,151
Chrome 31    N/A         32,271,400   524,593    31,806,558
Comment 1•11 years ago
This shows the same odd behavior.
Comment 2•11 years ago
JIT inspector shows about the same things for "withSelf" and "plain", so I'm not sure why the 2x difference...
There is no obvious reason why withArrow should be so slow, btw; we just seem to be taking a slow path for it.
Reporter
Comment 3•11 years ago
My test results from comment 0 were from an "Early 2011" model MacBook Pro (8 GB RAM) and very consistent. On my "Mid 2012" model MacBook Pro Retina (16 GB RAM), "withSelf" is only a little faster than "plain", not 2x. The performance difference seems related to Ion because withSelf and plain are much closer with Baseline (on both MacBooks).
                       withArrow  withSelf   withBind  plain
Nightly 28 (Ion)       443,719    5,675,600  515,022   4,410,839
Nightly 28 (Baseline)  369,219    3,361,748  357,804   3,946,622
Comment 4•11 years ago
Unfortunately I can't reproduce this. I get the following numbers for the shell testcase:
withArrow: 461254.6125461255
withSelf: 13513513.513513513
withBind: 507872.01625190454
plain: 14705882.352941176
Comment 5•11 years ago
That was a 32-bit build; I do see the problem with a 64-bit build:
withArrow: 429922.6139294927
withSelf: 9615384.615384616
withBind: 509424.3504839531
plain: 6097560.975609756
Since you're both on OS X, I assume you also used 64-bit builds. Will take a look.
Comment 6•11 years ago
I commented out the arrow and bind tests and I get the following numbers:
withSelf: 9900990.099009901
plain: 5434782.608695652
If I run plain before withSelf:
plain: 10638297.872340424
withSelf: 5235602.094240838
So the one that runs first is fast? Probably due to GC or something...
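(For context, a minimal sketch of the kind of shell harness being reordered here; the attached testcase isn't reproduced in this thread, so the timing method, iteration count, and use of the shell's print() are assumptions:)

// Hypothetical harness, assuming plain/withSelf from comment 0 are defined.
function bench(name, makeClosure) {
  var fn = makeClosure(1);
  var iterations = 5000000;
  var start = Date.now();
  var sum = 0;
  for (var i = 0; i < iterations; i++)
    sum += fn(i);
  var seconds = (Date.now() - start) / 1000;
  print(name + ": " + (iterations / seconds));
}
// Swapping these two calls reproduces the "whichever runs first is fast" effect.
bench("plain", plain);
bench("withSelf", withSelf);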
Comment 7•11 years ago
That's entirely possible, yes. :(
Reporter
Comment 8•11 years ago
            withArrow  withSelf   withBind  plain
Nightly 28  390,000    8,231,000  474,000   3,940,000
...
Aurora 30   481,000    8,960,000  556,000   6,597,000
Nightly 31  7,027,000  8,055,000  79,000    11,484,000
Kannan's arrow function optimizations can be seen in withArrow test results in Nightly 31, but withBind perf is about 7x slower in Nightly 31 than Aurora 30. I'm using OS X.
Comment 9•11 years ago
That withBind regression happened in http://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=6fa163ff81a3&tochange=4f3443da36a1
Nothing in there jumps out at me, unfortunately. :( Jan, any ideas?
Flags: needinfo?(jdemooij)
Comment 10•11 years ago
(In reply to Chris Peterson (:cpeterson) from comment #8)
> Kannan's arrow function optimizations can be seen in withArrow test results
> in Nightly 31
I did these optimizations; withArrow should be about as fast as plain/withSelf now, so I'm not sure why that isn't happening. It's possible we are GC'ing, etc.; see also comment 6...
(In reply to Boris Zbarsky [:bz] from comment #9)
> That withBind regression happened in
> http://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=6fa163ff81a3&tochange=4f3443da36a1
>
> Nothing in there jumps out at me, unfortunately. :( Jan, any ideas?
GGC is in there... Forwarding to Terrence.
Flags: needinfo?(jdemooij) → needinfo?(terrence)
Comment 11•11 years ago
Poking the default heap of Lambda yields some interesting results:
--disable-gcgenerational:
withArrow: 18,518,518.51851852
withSelf: 28,571,428.57142857
withBind: 651,041.6666666666
plain: 21,739,130.43478261
current tip (--enable-gcgenerational):
withArrow: 52,631,578.94736842
withSelf: 62,500,000
withBind: 89,936.14533681086
plain: 71,428,571.42857143
with patch:
withArrow: 50,000,000
withSelf: 25,641,025.64102564
withBind: 652,315.7208088714
plain: 23,255,813.95348837
withSelf and plain are benefiting tremendously from participating in GGC, by a factor of 3x or so. The reason withBind isn't getting the same benefit appears to be that bind enters the interpreter and fires a bunch more barriers while bumbling about in C++. From a few minutes of poking around in GGC, the implicated barriers appear to be Shape::parent and InitialShapeTableRef.
I'm not really sure what the right approach is yet: need to spend some more time figuring out why bind is hitting shapes so hard.
Comment 12•11 years ago
Okay, now that I've looked into this, bind is crazy suboptimal in the presence of GGC.
(1) Lambda is always nursery allocated and bind always creates a singleton (tenured) JSFunction. This means the proto link always needs barriers.
(2) This unique nursery lambda proto means each new bound function gets its own entry in the initial shapes table, hammering on the barrier on that table.
(3) JSFunction::initBoundFunction immediately converts the shape to a dictionary shape. This conversion hammers hard on the proto barrier from the weird handling of BaseShape in vm/Shape.cpp.
(4) Then the bound function puts the lambda in a reserved slot, firing a slot barrier as well.
I don't really see a great way to cheat here. Making the bound function nursery-allocated didn't help much here and would have hurt the places that need the singleton behavior. We switched to barriering the initial shapes table because sweeping through them is so expensive -- this is still true: reverting that patch was a net slowdown, even on this benchmark.
Removing any one or two of these barriers is only going to get us a marginal speedup compared to the 10x we want. I think the only way forward here is to not allocate lambdas that are about to be bound in the nursery. I went on a brief adventure in the jit directory looking for where to scan the MIR for lambda, followed by box, followed by getcallnative for CallOrConstructBoundFunction. I quickly got lost and decided I should at least ask if this is a good idea before I spend more time on it.
Flags: needinfo?(terrence) → needinfo?(jdemooij)
Comment 13•11 years ago
Terrence, thanks for looking into this. Our current fun.bind implementation is hopelessly inefficient, both when creating bound functions (as you said, initBoundFunction doing dictionary-mode stuff, etc.) and when calling them (a C++ native that calls back into JS).
Both V8 and JSC self-host function.bind. I think now's a good time to investigate doing that; I'll try to prototype something later today.
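(For anyone following along: "self-hosting" here means writing Function.prototype.bind in JavaScript inside the engine instead of as a C++ native. A very rough sketch of the idea, not SpiderMonkey's actual self-hosted code, which also has to handle |new|, .length, .name, and exotic targets:)

// Illustrative only: the common call path of bind, written in plain JS.
function simpleBind(target, thisArg /*, ...boundArgs */) {
  var boundArgs = Array.prototype.slice.call(arguments, 2);
  return function () {
    var callArgs = Array.prototype.slice.call(arguments);
    return target.apply(thisArg, boundArgs.concat(callArgs));
  };
}
// Roughly equivalent to adder.bind(null, 1) for ordinary (non-constructor) calls:
var adder = function (a, b) { return a + b; };
var addOne = simpleBind(adder, null, 1);
print(addOne(2)); // 3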
Flags: needinfo?(jdemooij)
Updated•11 years ago
Comment 14•8 years ago
Some updated measurements:
withArrow: 43478260.86956522
withSelf: 90909090.9090909
withBind: 1748251.7482517483
plain: 83333333.33333333
Now withSelf and plain are in the same ballpark, but withBind is *much* slower than withArrow.
Comment 15•8 years ago
withArrow, withSelf, and plain are now all about the same speed. The results are somewhat noisy, but over a few runs and with more iterations it's pretty clear.
withBind tests something a bit different: the performance of both the binding operation and calling a bound function. Given that, the perf difference isn't even that bad :) Changing the shell test to measure only calling performance for all cases still has withBind ~3x slower though, which shouldn't be the case after bug 1000780 and some additional optimizations jandem did last year. I'll file a new bug about that.
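(A sketch of the call-only split described above: hoist the bind out of the timed loop so the loop measures only call overhead. The iteration count and harness are assumptions, not the actual modified shell test:)

// Hypothetical call-only measurement for the withBind case.
var bound = function (a, b) { return a + b; }.bind(null, 1);
var iterations = 5000000;
var start = Date.now();
var sum = 0;
for (var i = 0; i < iterations; i++)
  sum += bound(i);
var seconds = (Date.now() - start) / 1000;
print("withBind (calls only): " + (iterations / seconds));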
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED