Open Bug 1393712
Opened 7 years ago · Updated 2 years ago
for-await-of with Async Generator is slow
Categories: Core :: JavaScript Engine (enhancement, P2)
People: Reporter: arai; Unassigned
References: Depends on 5 open bugs
Details: Keywords: perf; Whiteboard: [js:perf]
From https://github.com/tc39/proposal-async-iteration/issues/112
--- code ---
async function* asyncRange(from, to) {
  for (let i = from; i < to; i++) {
    yield i;
  }
}

async function main() {
  const start = Date.now()
  // Prevent dead code elimination, although I doubt
  // there's any yet.
  let total = 0
  for await (const i of asyncRange(0, 550000)) {
    total += i
  }
  const end = Date.now()
  console.log(`Total: ${ total }`)
  console.log(`Time Taken: ${ end - start }`)
}

main()
--------
In the JS shell it takes ~5 seconds, and in the browser it takes ~15 seconds.
Haven't checked yet, but bug 1317481 might be related.
Comment 1•7 years ago
(In reply to Tooru Fujisawa [:arai] from comment #0)
> Haven't checked yet, but bug 1317481 might be related.
That, and all the various other Promise optimizations. Plus, importantly, optimizations for Generators. I added a dependency on bug 1317690 for that, but I'm pretty sure that even without Ion compilation there's more we can do to make Generators faster.
Comment 2•7 years ago
A big part of this is that when we have generators, we turn all bindings into heap allocations instead of using the frame. This is done to avoid copying the stack on every yield. It becomes a problem when there are lets in a loop: we allocate a new lexical environment on each iteration. I'm in the process of looking at a solution for this. On one of the ARES-6 benchmarks, something like 40% of the runtime is eliminated by turning the let into a var.
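A minimal sketch of the difference, in plain script terms rather than engine internals:
--- code ---
// The same generator loop with a per-iteration let binding and with a
// function-scoped var. With let, each iteration conceptually gets a fresh
// lexical environment that the suspended generator has to keep alive on
// the heap; with var, there is a single binding for the whole function.
function* rangeLet(from, to) {
  for (let i = from; i < to; i++) {  // new environment for i each iteration
    yield i;
  }
}

function* rangeVar(from, to) {
  for (var i = from; i < to; i++) {  // one i for the whole generator
    yield i;
  }
}
--------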
Reporter
Comment 3•7 years ago
When I profile the code in the shell, I see CaptureCurrentStack taking some amount of time in several places.
While it's useful information for debugging, there's a chance that internal-use-only promises don't need that information.
Might be nice to take a look.
Reporter
Comment 4•7 years ago
With the default configuration, it takes 5 seconds, but with --no-async-stacks, it takes 3.5 seconds.
Reporter
Comment 5•7 years ago
Also, I see CreateIterResultObject. That currently uses DefineProperty,
but it could use fixed slots, like the RegExp match result object?
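For reference, a script-level sketch of what the spec's CreateIterResultObject(value, done) produces; the suggestion above is only about how the engine builds this object internally, which isn't observable from script:
--- code ---
// CreateIterResultObject(value, done) just makes a plain object with
// `value` and `done` data properties on the ordinary Object prototype.
function createIterResultObject(value, done) {
  return { value: value, done: done };
}
--------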
Reporter
Comment 6•7 years ago
(In reply to Tooru Fujisawa [:arai] from comment #3)
> When I profile the code in the shell, I see CaptureCurrentStack taking
> some amount of time in several places.
> While it's useful information for debugging, there's a chance that
> internal-use-only promises don't need that information.
> Might be nice to take a look.
Or perhaps we could share the captured stack object between the two promises?
Comment 7•7 years ago
(In reply to Tooru Fujisawa [:arai] from comment #6)
> (In reply to Tooru Fujisawa [:arai] from comment #3)
> > When I profile the code in the shell, I see CaptureCurrentStack taking
> > some amount of time in several places.
> > While it's useful information for debugging, there's a chance that
> > internal-use-only promises don't need that information.
> > Might be nice to take a look.
>
> Or perhaps we could share the captured stack object between the two
> promises?
That seems worthwhile, yes.
In general, I wouldn't worry about the async stacks too much: they're not enabled in release builds of the browser, and eventually should be disabled in Nightly as well, as long as the debugger isn't open. See bug 1280819.
Ultimately, bug 1317481 is what we'll want here, I think.
Reporter
Comment 8•7 years ago
I'm wondering if SpeciesConstructor called in js::OriginalPromiseThen can be optimized out.
That's taking 4% of the whole time, after the bug 1394682 fix.
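For context, an approximate script-level sketch of what the SpeciesConstructor lookup does per spec; the optimization would be to skip these observable property reads when the promise is a plain Promise whose constructor and @@species are untouched:
--- code ---
// Approximate script-level version of SpeciesConstructor(O, defaultCtor).
// The engine pays for the `constructor` and Symbol.species property gets
// even in the common case where they resolve back to %Promise% anyway.
function speciesConstructor(O, defaultCtor) {
  const C = O.constructor;
  if (C === undefined) return defaultCtor;
  if (C === null || (typeof C !== "object" && typeof C !== "function")) {
    throw new TypeError("O.constructor is not an object");
  }
  const S = C[Symbol.species];
  if (S === undefined || S === null) return defaultCtor;
  if (typeof S === "function") return S;  // approximation of IsConstructor
  throw new TypeError("@@species is not a constructor");
}
--------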
Reporter
Comment 9•7 years ago
(In reply to Tooru Fujisawa [:arai] from comment #8)
> I'm wondering if SpeciesConstructor called in js::OriginalPromiseThen can be
> optimized out.
Looks like bug 1386534 does that.
Depends on: 1386534
Reporter
Comment 10•7 years ago
NewDenseCopiedArray in GeneratorObject::suspend also takes ~5% of the whole time.
Would it be possible to reuse the array between suspend calls?
Reporter
Comment 11•7 years ago
Bug 1317481 certainly improves the performance:
3587 [ms] => 710 [ms]
Updated•7 years ago
Priority: -- → P2
Whiteboard: [js:perf][qf:p3]
Reporter
Comment 12•7 years ago
Somehow, the current best performance after bug 1396499 and bug 1317481 is ~2200 [ms], not ~700 [ms].
I should figure out when and why the performance gain was reduced.
Another thing I noticed in the profiler is the allocation of an array and a function inside Promise job handling.
I'm not sure whether it's possible to reuse them or reduce the allocation, though...
Also, it might be possible to inline JSOP_YIELD in Baseline in some cases, after fixing bug 1396499.
Reporter
Comment 13•7 years ago
(In reply to Tooru Fujisawa [:arai] from comment #12)
> Another thing I noticed in the profiler is the allocation of an array
> and a function inside Promise job handling.
> I'm not sure whether it's possible to reuse them or reduce the
> allocation, though...
Bug 1342044 seems to be related.
Reporter
Comment 14•7 years ago
(In reply to Tooru Fujisawa [:arai] from comment #12)
> Also, it might be possible to inline JSOP_YIELD in Baseline in some
> cases, after fixing bug 1396499.
Filed bug 1409231.
Reporter
Comment 15•7 years ago
(In reply to Tooru Fujisawa [:arai] from comment #12)
> Somehow, the current best performance after bug 1396499 and bug 1317481
> is ~2200 [ms], not ~700 [ms].
> I should figure out when and why the performance gain was reduced.
Properly rebased, and now the time is ~500 [ms] :)
Reporter
Comment 16•7 years ago
With the fixed version, the following still take a significant amount of time:
13.7% AsyncGeneratorRequest::create
8.2% LexicalEnvironmentObject::create
6.7% CreatePromiseObjectWithoutResolutionFunctions
6.3% CreateIterResultObject
6.2% ResolvePromiseInternal
Reporter
Comment 17•7 years ago
(In reply to Tooru Fujisawa [:arai] from comment #16)
> With the fixed version, the following still take a significant amount of time:
> 13.7% AsyncGeneratorRequest::create
Filed bug 1410283.
Reporter
Comment 18•7 years ago
After bug 1317481 (and others), GetPropertyPure inside PromiseOptimizable is taking some amount of time (~5% of the total).
It needs to check the property value, so we cannot use just the shape...
Reporter
Comment 19•7 years ago
(In reply to Tooru Fujisawa [:arai] from comment #18)
> After bug 1317481 (and others), GetPropertyPure inside PromiseOptimizable
> is taking some amount of time (~5% of the total).
> It needs to check the property value, so we cannot use just the shape...
Filed bug 1410695.
Depends on: 1410695
Reporter
Comment 20•7 years ago
With all current WIP patches applied, the performance difference between let+const and var is the following:
original (let+const): ~400 [ms]
replaced with var: ~310 [ms]
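For reference, a sketch of the var variant, assuming it is simply the original testcase with the block-scoped bindings replaced:
--- code ---
// Same benchmark as comment 0, with let/const replaced by var so the loop
// does not need a fresh lexical environment per iteration.
async function* asyncRange(from, to) {
  for (var i = from; i < to; i++) {
    yield i;
  }
}

async function main() {
  var start = Date.now()
  var total = 0
  for await (var i of asyncRange(0, 550000)) {
    total += i
  }
  var end = Date.now()
  console.log(`Total: ${ total }`)
  console.log(`Time Taken: ${ end - start }`)
}

main()
--------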
Reporter
Comment 21•7 years ago
Now the time taken by setFixedSlot looks somewhat significant in some cases.
28 [ms] (8.5%) is taken by CreateIterResultObject
20 [ms] (6.3%) is taken by AsyncGeneratorObject::createRequest, which does setFixedSlot (the request object itself is cached)
10 [ms] (3.6%) is taken by ResolvePromiseInternal, which does NativeGetProperty
9 [ms] (2.7%) is taken by ResolvePromise, which does setFixedSlot
One more thing I can think of: when an async generator is consumed by for-await-of, the iterator result object is not exposed to user script, I think, so we could reuse it instead of creating it every time.
Or perhaps we could just avoid creating the iterator result object entirely and pass the value/done props directly?
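A hypothetical sketch of the reuse idea in script terms (not the engine's actual implementation): when the consumer is known to be for-await-of, the result object never escapes to user code, so a single object could be overwritten on each step instead of allocating a fresh one.
--- code ---
// Hypothetical: one result object, overwritten per step. This is only
// safe if the consumer (here, for-await-of) never retains the object.
const reusedResult = { value: undefined, done: false };

function reuseIterResult(value, done) {
  reusedResult.value = value;
  reusedResult.done = done;
  return reusedResult;
}
--------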
Reporter
Comment 22•7 years ago
(In reply to Ted Campbell [:tcampbell] (PTO until Oct 30) from comment #2)
> A big part of this is that when we have generators, we turn all bindings
> into heap allocations instead of using the frame. This is done to avoid
> copying the stack on every yield. It becomes a problem when there are lets
> in a loop: we allocate a new lexical environment on each iteration. I'm in
> the process of looking at a solution for this. On one of the ARES-6
> benchmarks, something like 40% of the runtime is eliminated by turning the
> let into a var.
Filed bug 1412202.
Updated•3 years ago
Performance Impact: --- → P3
Whiteboard: [js:perf][qf:p3] → [js:perf]
Updated•2 years ago
Severity: normal → S3