Warp: Trial Inlining Prototype
Categories
(Core :: JavaScript Engine: JIT, enhancement, P1)
Tracking
()
Tracking | Status | |
---|---|---|
firefox82 | --- | fixed |
People
(Reporter: iain, Assigned: iain)
References
(Blocks 1 open bug)
Details
Attachments
(40 files)
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details |
WarpBuilder relies heavily on transpiling CacheIR. We would like to be able to inline scripted functions in WarpBuilder, but this runs into problems for functions with many different callers. The ICs in such a function are likely to be polymorphic, even if they are monomorphic with respect to each individual caller.
The new approach: somewhere between Baseline and Warp we add a new warmup threshold. When a script reaches that threshold, we analyze that script and its callees to choose a set of callees that we would like to inline. For each call site we decide to inline, we allocate a new ICScript with a fresh set of ICEntries for the callee, and replace the call IC in the caller with a new special stub that will invoke the callee using the new ICScript. (If the callee gets hot, we can inline recursively.) When the root script hits the Warp warmup threshold, we can use the CacheIR from the inlined ICScripts, giving us more accurate information.
Updated•4 years ago
|
Assignee | ||
Comment 1•4 years ago
|
||
When we clone a JitScript for inlining, there is some data that we want to share and some data that needs to be separate. This patch moves the separate data (warmup count, ICEntry array) into ICScript and embeds an ICScript into JitScript. The layout of a JitScript in memory is mostly unchanged by this patch.
Assignee | ||
Comment 2•4 years ago
|
||
We want to be able to swap out ICScripts, so we can't hardcode the ICEntry in EmitCallIC. Hoisting the ICStubReg initialization out of EmitCallIC (where it was the same across architectures) lets us change it in a subsequent patch.
Depends on D80196
Assignee | ||
Comment 3•4 years ago
|
||
Depends on D80197
Assignee | ||
Comment 4•4 years ago
|
||
The previous patch put the ICScript pointer in the baseline frame. This patch uses that pointer to find the right ICEntry.
Depends on D80198
Assignee | ||
Comment 5•4 years ago
|
||
A previous patch put the ICScript pointer in the baseline frame. This patch uses that pointer to find the warmup count.
Depends on D80199
Assignee | ||
Comment 6•4 years ago
|
||
There were already three copies of this code, and I was about to add a fourth.
WarpOracle has similar code for accessing stubDataStart
. I wasn't sure that we needed an accessor for that yet.
Depends on D80200
Assignee | ||
Updated•4 years ago
|
Assignee | ||
Comment 7•4 years ago
|
||
Our plan for invalidating transpiled Warp ICs when guards fail is to bailout (leaving us in the baseline interpreter) and invalidate when attaching a new IC. To make this work with trial inlining, we need to ensure that we can invalidate the root of the inlining tree, not the original version of the inlined script. The most robust way to do this is to unconditionally store the ICScript in BaselineFrame, instead of only storing it in baseline and omitting it from blinterp.
This patch adds code to setInterpreterFields and the blinterp prologue to store the default ICScript, and fixes emit_JumpTarget to find the new entry in the correct ICScript. bumpInterpreterICEntry does not need to be changed. Right now the ICScript in the blinterp frame will always be the default ICScript. In the future, when we actually do Warp inlining, we can add setInterpreterFieldsForBailout and make sure that the correct ICScript is provided.
(Drive-by cleanup: BaselineFrame::addressOfEnvironmentChain was dead code.)
Depends on D80201
Assignee | ||
Comment 8•4 years ago
|
||
My current plan for stub allocation is as follows. A JitScript contains a fallbackStubSpace, which is used for all of its own stubs. When we decide to do trial inlining for a JitScript, we create an InliningRoot, which is owned by the JitScript. The InliningRoot contains an array of unique pointers to inlined ICScripts and has a single fallbackStubSpace which is shared by all those ICScripts.
I wrote this code when I was thinking that we would throw the InliningRoot away after Warp compilation. If we intend to keep it around indefinitely, it might be better to reuse the stubspace in the root JitScript, in which case we don't need this patch.
Depends on D81795
Assignee | ||
Comment 9•4 years ago
|
||
Depends on D81796
Assignee | ||
Comment 10•4 years ago
|
||
In a later patch it will be useful to be able to declare a CallFlags and give it a value later.
Depends on D81797
Assignee | ||
Comment 11•4 years ago
|
||
When we create the stub that calls an inlined script, we want to ensure that we use all the same guards as the original stub. To do so robustly, this patch adds CacheIRCloner, which can read an op from a CacheIRReader and write a copy of that op to a CacheIRWriter. This lets us clone the parts of the stub that we need and then add new ops.
In the future, this code may also be useful when folding together ICs with the same CacheIR.
Depends on D81798
Assignee | ||
Comment 12•4 years ago
|
||
This is mostly the same as CallScriptedFunction. The key differences are:
-
We have an extra argument, the ICScript, which must be stored in the JSContext so the Baseline prologue can find it.
-
We have to ensure that we call the baseline. This is a little bit awkward. We need to verify that the target still has a BaselineScript before calling
updateArgc
, which will overwrite the input operand for spread calls. However, on x86 32-bit, we don't have a spare register to hold the code pointer while we are pushing the arguments onto the stack. I eventually compromised on just reloading the code pointer for x86-32.
Depends on D81799
Assignee | ||
Comment 13•4 years ago
|
||
This patch sets up the framework to call DoTrialInlining when we want to inline. The next patch actually implements DoTrialInlining.
Depends on D81800
Assignee | ||
Comment 14•4 years ago
|
||
This patch implements the "trial" part of trial inlining. We find inlining candidates, clone them, create new ICScripts, and attach new ICs. When WarpOracle runs, we have accurate information available. The next step is to use that information to actually inline.
With this patch, --warp + --warp-trial-inlining passes almost all jit-tests, with the exception of a couple (ion/for-in-iterator-1.js
and self-hosting/is-possibly-wrapped-typed-array.js
) that rely on looping until inIon
is true. If we do a trial-inlining of the script containing the loop, then the warmup counter for the cloned ICScript gets hot, but the warmup counter for the original script (used in GetOptimizationLevel) is not hot enough to compile. I'm holding off on fixing that until we have decided what our policy is for inlined scripts that get hot.
Depends on D81801
Updated•4 years ago
|
Comment 15•4 years ago
|
||
Assignee | ||
Comment 16•4 years ago
|
||
Comment 17•4 years ago
|
||
Comment 18•4 years ago
|
||
Comment 19•4 years ago
|
||
Comment 20•4 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/75801659b613
https://hg.mozilla.org/mozilla-central/rev/8af0842d4e7d
https://hg.mozilla.org/mozilla-central/rev/ea762fef4139
https://hg.mozilla.org/mozilla-central/rev/da3a5a1a4414
https://hg.mozilla.org/mozilla-central/rev/55574569ada4
https://hg.mozilla.org/mozilla-central/rev/3c9fe96e1f50
https://hg.mozilla.org/mozilla-central/rev/88d8782f9cab
https://hg.mozilla.org/mozilla-central/rev/62a055d16919
https://hg.mozilla.org/mozilla-central/rev/f83599523d2b
https://hg.mozilla.org/mozilla-central/rev/0f907f953c9e
https://hg.mozilla.org/mozilla-central/rev/1aa0f867fca2
https://hg.mozilla.org/mozilla-central/rev/9bd1a1409856
https://hg.mozilla.org/mozilla-central/rev/8b4d525ae0e2
https://hg.mozilla.org/mozilla-central/rev/8695942e5375
Comment 21•4 years ago
|
||
Comment 22•4 years ago
|
||
bugherder |
Assignee | ||
Comment 23•4 years ago
|
||
Assignee | ||
Comment 24•4 years ago
|
||
Almost all of the data in WarpBuilder was already script-specific, so instead of renaming it to WarpScriptBuilder everywhere, I pulled out a WarpCompilation class to hold the data that is shared between scripts.
Depends on D83179
Assignee | ||
Comment 25•4 years ago
|
||
Preparing for inlining, at which point the WarpSnapshot / WarpScriptSnapshot relationship is no longer 1-to-1.
Depends on D83180
Assignee | ||
Comment 26•4 years ago
|
||
When taking a snapshot for inlining, we have to read the CacheIR to find the ICScript and the target. We could store the target in another field in CallInlinedFunction, but that could be awkward in the future if we want to inline without trial inlining, because we won't have a CallInlinedFunction available. Instead, I hoisted the CacheIR-reading code in maybeInlineCall into its own function that can be shared by TrialInlining and the snapshot code.
Depends on D83182
Assignee | ||
Comment 27•4 years ago
|
||
I based the InliningTree / CompileInfo code on IonBuilder::inlineScriptedCall.
In the future, if we want to inline cases where we didn't allocate a new ICScript, it should just be a matter of changing a few lines in maybeInlineIC. We can use the default ICScript off the target script.
(Note: we now check in maybeInlineCall that we don't already have a CallInlinedFunction. This should not normally happen, but could happen if we reset the warmup counter for a script.)
Depends on D83183
Comment 28•4 years ago
|
||
Comment 29•4 years ago
|
||
bugherder |
Assignee | ||
Comment 30•4 years ago
|
||
We want to use this code in WarpBuilder. It's cleaner if it just returns a bool.
Assignee | ||
Comment 31•4 years ago
|
||
When compiling with --ion-eager, we may do bytecode analysis before we have a jitscript. Instead of adding a flag to JitScript, I reused the uninlineable
flag on JSScript.
Depends on D83719
Assignee | ||
Comment 32•4 years ago
|
||
Drawing a clearer distinction between cases where we don't support inlining and cases where we have chosen not to inline.
Depends on D83720
Assignee | ||
Comment 33•4 years ago
|
||
Depends on D83721
Comment 34•4 years ago
|
||
Comment 35•4 years ago
|
||
bugherder |
Assignee | ||
Comment 36•4 years ago
|
||
Based on IonBuilder::buildInline and IonBuilder::inlineScriptedCall.
Assignee | ||
Comment 37•4 years ago
|
||
Fixing up the relevant TODOs.
There's one remaining TODO in build_Try
about supporting try-catch in inlined functions. The corresponding comment in IonBuilder dates back to bug 916914. As far as I can tell, it is out of date, and we support inlining JSOp::Try out of the box. I might be missing something, though, so I've left that comment in for now.
Depends on D83880
Assignee | ||
Comment 38•4 years ago
|
||
When a script becomes observable to the debugger, we invalidate the IonScript. We must also invalidate any scripts that inlined that script. The infrastructure to do this already exists; we just have to make sure inlined Warp compilations register themselves.
Depends on D83881
Comment 39•4 years ago
|
||
Assignee | ||
Comment 40•4 years ago
|
||
When we bail out of Warp, we create one or more BaselineFrames, each of which stores an ICScript. Until now, we used the default ICScript for each frame. This works poorly with our invalidation strategy: when we bail out of an inlined function because a transpiled guard failed, we want to resume execution with the inlined ICScript so that attaching a new IC will trigger us to invalidate the inlined version of the script.
Fortunately, bailouts rebuild the stack from the outermost to the innermost frame. The outermost frame will always use its default ICScript, and inner frames will use ICScripts owned by the outermost frame's JitScript. In theory we could recover the ICScript for an inlined call by looking up the ICEntry for that pc and inspecting the stub, but that seems slow and awkward. Instead, I added edges between ICScripts to let us find inlined children.
Note: the set of ICScripts in the inlining tree is the same as the set of ICScripts in InliningRoot::inlinedScripts_. If we moved ICScript ownership into the tree, then the only remaining members of InliningRoot would be the fallback stub space, and a JSScript pointer that I am about to add so that we can find the right script to invalidate. We could consider getting rid of InliningRoot entirely, but since one of the reasons we wanted a separate fallback stub space was because it could make things easier when we cancel inlining, I am holding off until we look at cancelation.
Assignee | ||
Comment 41•4 years ago
|
||
In preparation for fixing bailouts, this patch does two things:
- Moves initialization of
BaselineFrame::icScript_
out ofsetInterpreterFields
, where it didn't really belong. - Fixes
setInterpreterFields
to initializeinterpreterICEntry_
to use the ICScript stored in the frame instead of getting it from the JitScript.
Depends on D84427
Assignee | ||
Comment 42•4 years ago
|
||
Depends on D84428
Assignee | ||
Comment 43•4 years ago
|
||
When we replace a stub in an inlined ICScript, we want to invalidate the script into which it was inlined, rather than the original copy of the inlined script.
Depends on D84429
Comment 44•4 years ago
|
||
bugherder |
Comment 45•4 years ago
|
||
Comment 46•4 years ago
|
||
bugherder |
Assignee | ||
Comment 47•4 years ago
|
||
This fixes a crash in Elm-TodoMVC.
Assignee | ||
Comment 48•4 years ago
|
||
With these conservative heuristics, trial inlining gives a performance improvement on Octane and does not regress Speedometer.
Depends on D86101
Assignee | ||
Comment 49•4 years ago
|
||
Prior to this patch, if we aborted with AbortReason::Disable (for example, because it contains an unsupported opcode), then we would abort the entire compilation. With this patch, we mark the script uninlineable, clean up, and continue as if we had not done trial inlining. This improves Octane by ~1.3%.
The CacheIR for the callsite still contains a CallInlinedFunction. The best approach I could come up with was to transpile CallInlinedFunction differently depending on whether we were actually inlining. If not, it is treated like a regular CallScriptedFunction.
An alternative approach would be to check all the conditions that could cause AbortReason::Disable beforehand (possibly in BytecodeAnalysis) and preemptively mark the script unlineable, with the downside of creating another piece of code that had to be kept in sync with WarpOracle, and there are just enough quirky cases to make that seem like a bad idea. In the future, we could combine the approaches by preemptively marking the easy cases and relying on this code for the rest.
Depends on D86102
Assignee | ||
Comment 50•4 years ago
|
||
Depends on D86103
Comment 51•4 years ago
|
||
Comment 52•4 years ago
|
||
bugherder |
Assignee | ||
Comment 53•4 years ago
|
||
Limit the maximum depth of the inlining tree and don't inline scripts with too many arguments.
Comment 54•4 years ago
|
||
Comment 55•4 years ago
|
||
bugherder |
Assignee | ||
Comment 56•4 years ago
|
||
Pulling this out into its own function so that the next patch can add an early return.
Assignee | ||
Comment 57•4 years ago
|
||
We have to loosen this assertion to handle cases where we trial-inline a script because needsArgsObj
is false, but then we call argumentsOptimizationFailed on that script before we reach the warp oracle.
Depends on D86771
Comment 58•4 years ago
|
||
Comment 59•4 years ago
|
||
bugherder |
Assignee | ||
Comment 60•4 years ago
|
||
If a script cannot be warp-compiled, then we do not want to do trial inlining for that script. Where possible, it seems better to mark scripts proactively, instead of undoing the trial inlining after failing to warp-compile.
Assignee | ||
Comment 61•4 years ago
|
||
Depends on D87899
Comment 62•4 years ago
|
||
Assignee | ||
Updated•4 years ago
|
Comment 63•4 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/bd4a5fb3aa65
https://hg.mozilla.org/mozilla-central/rev/4462d1a3ffbb
Updated•4 years ago
|
Description
•