Open
Bug 1404950
Opened 7 years ago
Updated 2 years ago
new Promise(resolve => {}) is a performance footgun
Categories
(Core :: JavaScript Engine, enhancement, P3)
Core
JavaScript Engine
Tracking
()
People
(Reporter: ehsan.akhgari, Unassigned)
References
(Depends on 1 open bug, Blocks 1 open bug)
Details
(Keywords: perf)
It's common for JS code to want to do this:
var p = new Promise(resolve => {
/* code */
});
Right now this is super expensive in SpiderMonkey because we'd need to clone the lambda function every single time we create a new Promise. This shows up quite badly in many profiles, for example https://perfht.ml/2fE6qTL from bug 1400534. This forces people to do crazy workarounds to avoid the idiomatic way to write this code such as https://reviewboard.mozilla.org/r/185408/diff/1 (from bug 1404652) which basically avoids the issue by storing the callback in a variable and passing that to all Promise constructors.
Reporter | ||
Updated•7 years ago
|
Comment 1•7 years ago
|
||
On this silly microbenchmark (perils noted):
var count = 100000;
var start = performance.now();
for (var i = 0; i < count; ++i) {
new Promise((r) => void(0));
}
var stop = performance.now();
alert((stop - start)/count * 1e6);
I see ~450ns in Firefox, ~230 in Safari, ~160ns in Chrome...
Reporter | ||
Updated•7 years ago
|
Flags: needinfo?(jdemooij)
Comment 2•7 years ago
|
||
(In reply to Boris Zbarsky [:bz] (still digging out from vacation mail) from comment #1)
> On this silly microbenchmark (perils noted):
> I see ~450ns in Firefox, ~230 in Safari, ~160ns in Chrome...
Is this with async stacks enabled or disabled? Until bug 1280819 is fixed, that'll be a huge difference.
Flags: needinfo?(bzbarsky)
Comment 3•7 years ago
|
||
> Is this with async stacks enabled or disabled?
Disabled. I always have them disabled, because otherwise performance measurements are sort of meaningless. :(
With async stacks enabled, the Firefox number is ~1050ns.
Flags: needinfo?(bzbarsky)
Comment 4•7 years ago
|
||
(In reply to Boris Zbarsky [:bz] (still digging out from vacation mail) from comment #3)
> > Is this with async stacks enabled or disabled?
>
> Disabled. I always have them disabled, because otherwise performance
> measurements are sort of meaningless. :(
>
> With async stacks enabled, the Firefox number is ~1050ns.
Ok, that's quite unfortunate. I think v8 at least inlines Promise constructors, so that's probably something we have to look at.
Comment 5•7 years ago
|
||
If we know that the escaping closure won't be mutated and won't be used for its identity (which we might be able to know in this case, Promise being built-in?) then the closure creation could be hoisted by the compiler, I guess.
More generally we should have some kind of lamdba lifting, of course.
Comment 6•7 years ago
|
||
A few thoughts:
(1) We used to have a no-clone optimization for certain lambdas, but it's really hard to do for JS because of things like f.caller escaping the function. It's possible we could bring back a limited form of this, maybe just for strict mode.
(2) Boris' benchmark in comment 1 spends most of its time in the Promise constructor, not the lambda cloning, AFAICS. For the micro-benchmark below I get 370-400 ns in the shell (with --no-async-stacks). If I manually hoist the lambda it improves to 350-380 ns - a very small win.
(3) This runs in Ion. If the chrome JS code is really hot, we should double check it's Ion compiled, because lambda cloning will be relatively slower in Baseline.
alert = print;
performance = Date;
function f() {
var count = 100000;
var start = performance.now();
for (var i = 0; i < count; ++i) {
new Promise(r => void 0);
}
var stop = performance.now();
alert((stop - start)/count * 1e6);
}
f();
Comment 7•7 years ago
|
||
Self-hosting the promise constructor would be nice, because these JS -> C++ -> JS calls are bad for perf and hard to optimize.
A profile does show a bunch of overhead here that we can probably eliminate.
Comment 8•7 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #7)
> A profile does show a bunch of overhead here that we can probably eliminate.
In particular 6-7% is under JS::dbg::onNewPromise. This is a public API that can now be private (since Promise got moved into SpiderMonkey) and then we can easily fast-path this. I'll patch.
Comment 9•7 years ago
|
||
I'll spend some more time on this tomorrow. Allocating the PromiseObject and the resolving functions can hopefully be optimized quite a bit.
Reporter | ||
Comment 10•7 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #6)
> (3) This runs in Ion. If the chrome JS code is really hot, we should double
> check it's Ion compiled, because lambda cloning will be relatively slower in
> Baseline.
AFAIK the chrome JS code in question in the profile in comment 0 is very hot but it doesn't run in loops, it runs in response to HTTP traffic originating from the child process observed as IPC traffic in the parent process, so basically it's the result of the same function running many many times but not in a loop. Is that something that will get Ion compiled?
Comment 11•7 years ago
|
||
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #10)
> AFAIK the chrome JS code in question in the profile in comment 0 is very hot
> but it doesn't run in loops, it runs in response to HTTP traffic originating
> from the child process observed as IPC traffic in the parent process, so
> basically it's the result of the same function running many many times but
> not in a loop. Is that something that will get Ion compiled?
It should get Baseline-compiled after 10 calls, and Ion-compiled after 1000.
Comment 12•7 years ago
|
||
I'm not sure this is a great testcase for the lambda clone overhead I've been seeing. I suspect we have optimizations to prevent cloning the lambda in the case where it doesn't actually capture anything. When I run a separate test with an actual lambda capture, I see significantly more overhead when passing an arrow function literal as opposed to a reference:
"use strict";
let iterations = 1024 * 1024;
{
let start = performance.now();
for (let i = 0; i < iterations; i++) {
let deferred = {};
deferred.promise = new Promise((resolve, reject) => {
deferred.resolve = resolve;
deferred.reject = reject;
});
}
let end = performance.now();
let delta = (end - start) * 1000 * 1000;
document.body.appendChild(document.createTextNode(`Per-iteration: ${Math.round(delta / iterations)}ns`));
}
document.body.appendChild(document.createElement("br"));
{
let start = performance.now();
let deferred;
let lambda = (resolve, reject) => {
deferred.resolve = resolve;
deferred.reject = reject;
};
for (let i = 0; i < iterations; i++) {
deferred = {};
deferred.promise = new Promise(lambda);
}
let end = performance.now();
let delta = (end - start) * 1000 * 1000;
document.body.appendChild(document.createTextNode(`Per-iteration: ${Math.round(delta / iterations)}ns`));
}
Results:
Per-iteration: 646ns
Per-iteration: 507ns
In the patch Ehsan mentioned in comment 0, that was adding up to 5-10% of the total overhead in a pretty hot code path (and possibly more, since it seems to increase GC pressure, but I'm not sure how much).
And, unfortunately, I'm not sure how much of that code path manages to run in Ion, since the non-syntactic scope stuff we turned on in bug 1381961 is known to break Ion in a lot of cases, and it's hard to tell with JIT is being used from profiles, sometimes. I do know that at least *some* of the code runs in Ion, though.
Comment 13•7 years ago
|
||
At least when testing in the shell, the overhead in comment #12 for the per-iteration lambda seems to stem from using |let| instead of |var| for the |deferred| variable.
Comment 14•7 years ago
|
||
(In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're blocked) from comment #12)
> I suspect we have optimizations to prevent cloning the lambda
> in the case where it doesn't actually capture anything.
If it does not capture anything, then what you probably noticed is the lack of scope chain creation, to store the captured variables.
We have optimizations, in IonMonkey, to avoid cloning the lambda it does not escape. As the Promise constructor is not in JS, we actually have no way to infer that, and thus they are always considered as escaping, which I guess is likely the default with promises anyway.
Comment 15•7 years ago
|
||
(In reply to Nicolas B. Pierron [:nbp] from comment #14)
> If it does not capture anything, then what you probably noticed is the lack
> of scope chain creation, to store the captured variables.
That seems closer to reality, assuming scope chain creation is the most expensive part. It would also explain why switching from `let` to `var` in the loop made things faster.
A more exhaustive test:
let iterations = 1024 * 1024;
function run(callback) {
let start = performance.now();
for (let i = 0; i < iterations; i++) {
callback();
}
let end = performance.now();
let delta = (end - start) * 1000 * 1000;
document.body.appendChild(document.createTextNode(`Per-iteration ${callback.name}(): ${Math.round(delta / iterations)}ns`));
document.body.appendChild(document.createElement("br"));
}
let deferred;
let lambda = (resolve, reject) => {
deferred.resolve = resolve;
deferred.reject = reject;
};
function newPromiseCloneNewScope() {
let deferred = {};
deferred.promise = new Promise((resolve, reject) => {
deferred.resolve = resolve;
deferred.reject = reject;
});
return deferred;
}
function newPromiseCloneSameScope() {
deferred = {};
deferred.promise = new Promise((resolve, reject) => {
deferred.resolve = resolve;
deferred.reject = reject;
});
return deferred;
}
function newPromiseNoClone() {
deferred = {};
deferred.promise = new Promise(lambda);
return deferred;
}
run(newPromiseCloneNewScope);
run(newPromiseCloneSameScope);
run(newPromiseNoClone);
{
let start = performance.now();
let deferred;
for (let i = 0; i < iterations; i++) {
deferred = {};
deferred.promise = new Promise((resolve, reject) => {
deferred.resolve = resolve;
deferred.reject = reject;
});
}
let end = performance.now();
let delta = (end - start) * 1000 * 1000;
document.body.appendChild(document.createTextNode(`Per-iteration for () CloneNoNewScope: ${Math.round(delta / iterations)}ns`));
document.body.appendChild(document.createElement("br"));
}
Results:
Per-iteration newPromiseCloneNewScope(): 763ns
Per-iteration newPromiseCloneSameScope(): 618ns
Per-iteration newPromiseNoClone(): 513ns
Per-iteration for () CloneNoNewScope: 534ns
But, unfortunately, this doesn't help my original use case very much, since every promise constructor call is in a new function invocation, and the no-clone version is still by far the fastest.
Updated•7 years ago
|
Comment 16•7 years ago
|
||
(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #1)
> I see ~450ns in Firefox, ~230 in Safari, ~160ns in Chrome...
Boris, this should be quite a bit faster with the latest Nightly, if you want to try it. Still a lot more to shave off though.
Comment 17•7 years ago
|
||
Yes, on an m-c build from today looks somewhat faster (now ~260 in Firefox, ~170 in Safari; I think it got colder since my last measurements, so less thermal throttling...)
Comment 18•7 years ago
|
||
Cancelling needinfo for now. We improved this a lot, there's still more to do though.
Flags: needinfo?(jdemooij)
Updated•3 years ago
|
Performance Impact: --- → P3
Whiteboard: [qf:p3]
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•