Open Bug 1404950 Opened 7 years ago Updated 2 years ago

new Promise(resolve => {}) is a performance footgun

Categories

(Core :: JavaScript Engine, enhancement, P3)

Tracking

Performance Impact low
Tracking Status
firefox57 --- wontfix

People

(Reporter: ehsan.akhgari, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: perf)

It's common for JS code to want to do this:

  var p = new Promise(resolve => { /* code */ });

Right now this is super expensive in SpiderMonkey, because we need to clone the lambda function every single time we create a new Promise. This shows up quite badly in many profiles, for example https://perfht.ml/2fE6qTL from bug 1400534. It forces people into crazy workarounds that avoid the idiomatic way of writing this code, such as https://reviewboard.mozilla.org/r/185408/diff/1 (from bug 1404652), which basically sidesteps the issue by storing the callback in a variable and passing that to all Promise constructors.
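For concreteness, here is a minimal sketch of the two patterns being contrasted (the names are illustrative, not taken from the actual patch):

  // Idiomatic: a fresh arrow function is cloned on every call.
  function makeDeferredIdiomatic() {
    let deferred = {};
    deferred.promise = new Promise((resolve, reject) => {
      deferred.resolve = resolve;
      deferred.reject = reject;
    });
    return deferred;
  }

  // Workaround: hoist the executor into a shared variable so it is
  // only created once. This relies on the executor running
  // synchronously inside the Promise constructor (which the spec
  // guarantees), but it is clearly more fragile and less readable.
  var currentDeferred;
  var sharedExecutor = (resolve, reject) => {
    currentDeferred.resolve = resolve;
    currentDeferred.reject = reject;
  };
  function makeDeferredHoisted() {
    currentDeferred = {};
    currentDeferred.promise = new Promise(sharedExecutor);
    return currentDeferred;
  }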
On this silly microbenchmark (perils noted):

  var count = 100000;
  var start = performance.now();
  for (var i = 0; i < count; ++i) {
    new Promise((r) => void(0));
  }
  var stop = performance.now();
  alert((stop - start)/count * 1e6);

I see ~450ns in Firefox, ~230ns in Safari, ~160ns in Chrome...
Flags: needinfo?(jdemooij)
Blocks: 1342037
(In reply to Boris Zbarsky [:bz] (still digging out from vacation mail) from comment #1)
> On this silly microbenchmark (perils noted):
> I see ~450ns in Firefox, ~230ns in Safari, ~160ns in Chrome...

Is this with async stacks enabled or disabled? Until bug 1280819 is fixed, that'll make a huge difference.
Flags: needinfo?(bzbarsky)
> Is this with async stacks enabled or disabled?

Disabled. I always have them disabled, because otherwise performance measurements are sort of meaningless. :(

With async stacks enabled, the Firefox number is ~1050ns.
Flags: needinfo?(bzbarsky)
(In reply to Boris Zbarsky [:bz] (still digging out from vacation mail) from comment #3)
> > Is this with async stacks enabled or disabled?
>
> Disabled. I always have them disabled, because otherwise performance
> measurements are sort of meaningless. :(
>
> With async stacks enabled, the Firefox number is ~1050ns.

OK, that's quite unfortunate. I think V8 at least inlines Promise constructors, so that's probably something we have to look at.
If we know that the escaping closure won't be mutated and won't be used for its identity (which we might be able to know in this case, Promise being built-in?), then the closure creation could be hoisted by the compiler, I guess. More generally we should have some kind of lambda lifting, of course.
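As a rough illustration of what lambda lifting means at the source level (hand-written here; the real transformation would happen inside the compiler, and bind() is only used to show the shape of it):

  // Before: the arrow function captures |x|, so a closure environment
  // must be materialized on every call.
  function before(x) {
    return new Promise(resolve => resolve(x));
  }

  // After lifting: the free variable |x| becomes an explicit parameter
  // of a top-level function. Note that bind() still allocates a bound
  // function object, so this hand-written version doesn't actually win
  // anything; a compiler doing this internally could avoid that.
  function liftedExecutor(x, resolve, reject) {
    resolve(x);
  }
  function after(x) {
    return new Promise(liftedExecutor.bind(null, x));
  }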
A few thoughts:

(1) We used to have a no-clone optimization for certain lambdas, but it's really hard to do for JS because of things like f.caller escaping the function. It's possible we could bring back a limited form of this, maybe just for strict mode.

(2) Boris' benchmark in comment 1 spends most of its time in the Promise constructor, not the lambda cloning, AFAICS. For the micro-benchmark below I get 370-400 ns in the shell (with --no-async-stacks). If I manually hoist the lambda it improves to 350-380 ns - a very small win.

(3) This runs in Ion. If the chrome JS code is really hot, we should double-check it's Ion-compiled, because lambda cloning will be relatively slower in Baseline.

  alert = print;
  performance = Date;
  function f() {
    var count = 100000;
    var start = performance.now();
    for (var i = 0; i < count; ++i) {
      new Promise(r => void 0);
    }
    var stop = performance.now();
    alert((stop - start)/count * 1e6);
  }
  f();
Self-hosting the Promise constructor would be nice, because these JS -> C++ -> JS calls are bad for perf and hard to optimize. A profile does show a bunch of overhead here that we can probably eliminate.
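To make the idea concrete, a very rough conceptual sketch of what a self-hosted constructor looks like. This is NOT the actual SpiderMonkey implementation; the two helpers are hypothetical stand-ins for engine-internal intrinsics (the second is named after the spec operation CreateResolvingFunctions), stubbed out here so the sketch runs:

  // Hypothetical stand-ins for engine-internal intrinsics.
  function CreatePromiseObject() {
    return { state: "pending", value: undefined };
  }
  function CreateResolvingFunctions(promise) {
    return {
      resolve(v) { if (promise.state === "pending") { promise.state = "fulfilled"; promise.value = v; } },
      reject(e)  { if (promise.state === "pending") { promise.state = "rejected";  promise.value = e; } },
    };
  }

  // The constructor body is itself written in (privileged) JS inside
  // the engine, so user JS -> constructor -> executor never has to
  // cross into C++ and back.
  function SelfHostedPromiseConstructor(executor) {
    var promise = CreatePromiseObject();
    var pair = CreateResolvingFunctions(promise);
    try {
      executor(pair.resolve, pair.reject);
    } catch (e) {
      pair.reject(e);
    }
    return promise;
  }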
(In reply to Jan de Mooij [:jandem] from comment #7)
> A profile does show a bunch of overhead here that we can probably eliminate.

In particular, 6-7% is under JS::dbg::onNewPromise. This is a public API that can now be made private (since Promise got moved into SpiderMonkey), and then we can easily fast-path this. I'll patch.
Depends on: 1405330
I'll spend some more time on this tomorrow. Allocating the PromiseObject and the resolving functions can hopefully be optimized quite a bit.
(In reply to Jan de Mooij [:jandem] from comment #6)
> (3) This runs in Ion. If the chrome JS code is really hot, we should double
> check it's Ion compiled, because lambda cloning will be relatively slower in
> Baseline.

AFAIK the chrome JS code in question in the profile in comment 0 is very hot, but it doesn't run in loops: it runs in response to HTTP traffic originating from the child process, observed as IPC traffic in the parent process. So basically it's the same function running many, many times, but not in a loop. Is that something that will get Ion-compiled?
(In reply to :Ehsan Akhgari (needinfo please, extremely long backlog) from comment #10)
> AFAIK the chrome JS code in question in the profile in comment 0 is very hot,
> but it doesn't run in loops: it runs in response to HTTP traffic originating
> from the child process, observed as IPC traffic in the parent process. So
> basically it's the same function running many, many times, but not in a
> loop. Is that something that will get Ion-compiled?

It should get Baseline-compiled after 10 calls, and Ion-compiled after 1000.
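(For anyone trying to reproduce this outside a loop: a simple way to make sure a callback has crossed those thresholds before measuring is to warm it up explicitly. The 10/1000 counts are the ones quoted above and may vary between builds; a minimal sketch:)

  function hot() {
    return new Promise(r => r());
  }
  // Warm up well past the quoted Ion threshold before timing anything.
  for (var i = 0; i < 1500; i++) {
    hot();
  }
  // ...only start measuring hot() after this point...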
I'm not sure this is a great testcase for the lambda clone overhead I've been seeing. I suspect we have optimizations to prevent cloning the lambda in the case where it doesn't actually capture anything. When I run a separate test with an actual lambda capture, I see significantly more overhead when passing an arrow function literal as opposed to a reference:

  "use strict";

  let iterations = 1024 * 1024;

  {
    let start = performance.now();
    for (let i = 0; i < iterations; i++) {
      let deferred = {};
      deferred.promise = new Promise((resolve, reject) => {
        deferred.resolve = resolve;
        deferred.reject = reject;
      });
    }
    let end = performance.now();
    let delta = (end - start) * 1000 * 1000;
    document.body.appendChild(document.createTextNode(`Per-iteration: ${Math.round(delta / iterations)}ns`));
  }

  document.body.appendChild(document.createElement("br"));

  {
    let start = performance.now();
    let deferred;
    let lambda = (resolve, reject) => {
      deferred.resolve = resolve;
      deferred.reject = reject;
    };
    for (let i = 0; i < iterations; i++) {
      deferred = {};
      deferred.promise = new Promise(lambda);
    }
    let end = performance.now();
    let delta = (end - start) * 1000 * 1000;
    document.body.appendChild(document.createTextNode(`Per-iteration: ${Math.round(delta / iterations)}ns`));
  }

Results:

  Per-iteration: 646ns
  Per-iteration: 507ns

In the patch Ehsan mentioned in comment 0, that was adding up to 5-10% of the total overhead in a pretty hot code path (and possibly more, since it seems to increase GC pressure, but I'm not sure how much). And, unfortunately, I'm not sure how much of that code path manages to run in Ion, since the non-syntactic scope stuff we turned on in bug 1381961 is known to break Ion in a lot of cases, and it's sometimes hard to tell which JIT is being used from profiles. I do know that at least *some* of the code runs in Ion, though.
At least when testing in the shell, the overhead in comment #12 for the per-iteration lambda seems to stem from using |let| instead of |var| for the |deferred| variable.
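(For illustration, a minimal sketch of the two variants being compared; |n| stands in for the iteration count, and this is not the exact test from comment 12:)

  // |let|: the loop body gets a fresh lexical binding for |deferred|
  // on each iteration, so each cloned arrow function presumably needs
  // its own environment on the scope chain.
  function withLet(n) {
    for (let i = 0; i < n; i++) {
      let deferred = {};
      deferred.promise = new Promise((resolve, reject) => {
        deferred.resolve = resolve;
        deferred.reject = reject;
      });
    }
  }

  // |var|: |deferred| is hoisted to function scope, so every clone can
  // share the same function-level environment.
  function withVar(n) {
    for (var i = 0; i < n; i++) {
      var deferred = {};
      deferred.promise = new Promise((resolve, reject) => {
        deferred.resolve = resolve;
        deferred.reject = reject;
      });
    }
  }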
(In reply to Kris Maglione [:kmag] (long backlog; ping on IRC if you're blocked) from comment #12)
> I suspect we have optimizations to prevent cloning the lambda
> in the case where it doesn't actually capture anything.

If it does not capture anything, then what you probably noticed is the lack of scope chain creation to store the captured variables. We have optimizations, in IonMonkey, to avoid cloning the lambda if it does not escape. As the Promise constructor is not in JS, we actually have no way to infer that, and thus the lambdas are always considered escaping, which I guess is likely the default with promises anyway.
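(A minimal illustration of the distinction being made here:)

  // Non-capturing: the arrow function has no free variables, so no
  // scope-chain environment has to be allocated for it.
  new Promise(resolve => resolve(42));

  // Capturing: |value| is free inside the arrow function, so the
  // engine must materialize an environment to hold it.
  let value = 42;
  new Promise(resolve => resolve(value));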
(In reply to Nicolas B. Pierron [:nbp] from comment #14)
> If it does not capture anything, then what you probably noticed is the lack
> of scope chain creation to store the captured variables.

That seems closer to reality, assuming scope chain creation is the most expensive part. It would also explain why switching from `let` to `var` in the loop made things faster. A more exhaustive test:

  let iterations = 1024 * 1024;

  function run(callback) {
    let start = performance.now();
    for (let i = 0; i < iterations; i++) {
      callback();
    }
    let end = performance.now();
    let delta = (end - start) * 1000 * 1000;
    document.body.appendChild(document.createTextNode(`Per-iteration ${callback.name}(): ${Math.round(delta / iterations)}ns`));
    document.body.appendChild(document.createElement("br"));
  }

  let deferred;
  let lambda = (resolve, reject) => {
    deferred.resolve = resolve;
    deferred.reject = reject;
  };

  function newPromiseCloneNewScope() {
    let deferred = {};
    deferred.promise = new Promise((resolve, reject) => {
      deferred.resolve = resolve;
      deferred.reject = reject;
    });
    return deferred;
  }

  function newPromiseCloneSameScope() {
    deferred = {};
    deferred.promise = new Promise((resolve, reject) => {
      deferred.resolve = resolve;
      deferred.reject = reject;
    });
    return deferred;
  }

  function newPromiseNoClone() {
    deferred = {};
    deferred.promise = new Promise(lambda);
    return deferred;
  }

  run(newPromiseCloneNewScope);
  run(newPromiseCloneSameScope);
  run(newPromiseNoClone);

  {
    let start = performance.now();
    let deferred;
    for (let i = 0; i < iterations; i++) {
      deferred = {};
      deferred.promise = new Promise((resolve, reject) => {
        deferred.resolve = resolve;
        deferred.reject = reject;
      });
    }
    let end = performance.now();
    let delta = (end - start) * 1000 * 1000;
    document.body.appendChild(document.createTextNode(`Per-iteration for () CloneNoNewScope: ${Math.round(delta / iterations)}ns`));
    document.body.appendChild(document.createElement("br"));
  }

Results:

  Per-iteration newPromiseCloneNewScope(): 763ns
  Per-iteration newPromiseCloneSameScope(): 618ns
  Per-iteration newPromiseNoClone(): 513ns
  Per-iteration for () CloneNoNewScope: 534ns

But, unfortunately, this doesn't help my original use case very much, since every Promise constructor call is in a new function invocation, and the no-clone version is still by far the fastest.
Depends on: 1405999
Depends on: 1406870
Keywords: perf
Priority: -- → P3
Whiteboard: [qf:p3]
(In reply to Boris Zbarsky [:bz] (no decent commit message means r-) from comment #1)
> I see ~450ns in Firefox, ~230ns in Safari, ~160ns in Chrome...

Boris, this should be quite a bit faster with the latest Nightly, if you want to try it. Still a lot more to shave off, though.
Yes, an m-c build from today looks somewhat faster (now ~260ns in Firefox vs. ~170ns in Safari; I think the machine was running cooler than during my last measurements, so less thermal throttling...).
Cancelling needinfo for now. We've improved this a lot; there's still more to do, though.
Flags: needinfo?(jdemooij)
Performance Impact: --- → P3
Whiteboard: [qf:p3]
Severity: normal → S3