Closed Bug 1382650 Opened 7 years ago Closed 6 years ago

Consider adding a 4th tier using Ion optimization levels

Categories

(Core :: JavaScript Engine: JIT, enhancement, P3)


Tracking


RESOLVED FIXED
mozilla68
Tracking Status
firefox67 --- fixed
firefox68 --- fixed

People

(Reporter: jandem, Assigned: jandem)

References

(Depends on 1 open bug, Blocks 2 open bugs)

Details

Attachments

(8 files, 1 obsolete file)

Attached patch Hackish prototype (obsolete) (deleted) — Splinter Review
Scripts are currently Ion-compiled when their use count reaches ~1000. Ion does pretty aggressive inlining of scripts; this is necessary for really hot code (like the first 5 Octane tests). On workloads like Speedometer this inlining doesn't help as much, and compilation time is an issue.

I'm attaching a hackish prototype that uses the optimization-level infrastructure Hannes added (I had to fix some regressions and bugs, but it mostly worked). With this patch we have 2 Ion tiers: we still compile scripts at use count 1000, but we only inline 1-2 levels deep and don't inline large functions. We also insert use count checks, and when a script's use count reaches 100,000 we recompile with the heuristics we have now.

It works surprisingly well. Octane is mostly unaffected or becomes a bit faster, SunSpider and Kraken regress by a few %, and Speedometer wins at least a few points on my machine (not sure how much because it's pretty noisy). I'm not sure if this is worth pursuing now, as this stuff is hard to get right and I think there's still lower-hanging fruit.
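(For illustration only, a minimal sketch of the two-threshold tier-up idea described above. Apart from the 1,000 / 100,000 use counts, the names and structure are made up and don't correspond to the actual SpiderMonkey code.)

    // Sketch: compile at the normal threshold with restricted inlining, then
    // recompile with full inlining once the script becomes very hot.
    #include <cstdint>

    enum class Tier { Baseline, IonNormal, IonFull };

    constexpr uint32_t NormalIonThreshold = 1'000;    // first Ion compile, light inlining
    constexpr uint32_t FullIonThreshold   = 100'000;  // recompile with aggressive inlining

    struct ScriptCounters {
        uint32_t useCount = 0;
        Tier tier = Tier::Baseline;
    };

    // Hypothetical hook run on script entry / loop back-edges in lower-tier code.
    Tier maybeTierUp(ScriptCounters& s) {
        s.useCount++;
        if (s.tier == Tier::Baseline && s.useCount >= NormalIonThreshold)
            s.tier = Tier::IonNormal;   // inline only 1-2 levels, skip large callees
        else if (s.tier == Tier::IonNormal && s.useCount >= FullIonThreshold)
            s.tier = Tier::IonFull;     // fall back to the current aggressive heuristics
        return s.tier;
    }

The point of the second threshold is that only code proven to be extremely hot pays the compile-time and memory cost of aggressive inlining.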
Similar to bug 1080776?
(In reply to Guilherme Lima from comment #1)
> Similar to bug 1080776?

It's similar but a bit different: the approach there is to add a tier between Baseline and current Ion (at warmup count 100), whereas the patch here "delays" full-Ion. In general I don't think Ion-compiling at warmup count 100 (that bug) is a good idea.
Comment on attachment 8888286 [details] [diff] [review]
Hackish prototype

Review of attachment 8888286 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit/IonBuilder.cpp
@@ +5421,4 @@
>      MRecompileCheck* check =
>          MRecompileCheck::New(alloc(), target->nonLazyScript(),
>                               optimizationInfo().inliningRecompileThreshold(),
> +                             MRecompileCheck::RecompileCheckType::Inlining);

The problem is that the more we inline, the more likely we are to invalidate the compilation because of inlining issues. I suggest we use a larger threshold factor than 4, or scale it exponentially with the inlining depth, so that we do not invalidate the code too frequently.
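(One way to read that suggestion, as a hypothetical sketch rather than the actual IonBuilder heuristic: grow the recompile threshold exponentially with the inlining depth so deeply inlined code isn't invalidated too eagerly.)

    #include <cstdint>

    // Hypothetical: scale the inlining recompile threshold by a factor per
    // inlining level, so deeper inlining tolerates more warm-up before we
    // consider invalidating and recompiling.
    uint32_t inliningRecompileThreshold(uint32_t baseThreshold,
                                        uint32_t inliningDepth,
                                        uint32_t factor = 4) {
        uint32_t threshold = baseThreshold;
        for (uint32_t level = 0; level < inliningDepth; level++)
            threshold *= factor;  // exponential in depth, per the suggestion above
        return threshold;
    }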
Attachment #8888286 - Flags: feedback+
When Apple added their LLVM-tier JIT, they found that the performance improvements from a 4th tier mostly came from better inlining information. This design would be valuable by itself, but it would be even better if tier-4 Ion could read type information from tier-3 Ion's ICs.
Priority: -- → P3
Depends on: 1467496
Blocks: 1490847

I've been playing with this a bit and it's still a significant win on Speedometer (> 5%), Google Docs (5-10%) etc. SunSpider, Kraken, Octane are either unaffected or show an improvement.

This only tunes inlining options - we could probably get more speedups by making more IonBuilder code or expensive optimization passes run only at the highest optimization level.
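(As an illustration of that last point, a hedged sketch of gating an expensive pass on the optimization level; the pass and accessor names are invented and not the real IonOptimizationLevels API.)

    enum class OptimizationLevel { Normal, Full };

    struct OptimizationInfo {
        OptimizationLevel level;
        // Invented accessors; the real optimization-info queries differ.
        bool cheapPassesEnabled() const { return true; }
        bool expensivePassEnabled() const { return level == OptimizationLevel::Full; }
    };

    void runPasses(const OptimizationInfo& info /*, MIRGraph& graph */) {
        if (info.cheapPassesEnabled()) {
            // always-on passes run at both tiers
        }
        if (info.expensivePassEnabled()) {
            // a costly analysis runs only for very hot (Full-tier) scripts
        }
    }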

Assignee: nobody → jdemooij
Status: NEW → ASSIGNED

Just as a thought, I have been wondering whether it would be possible to use AFL-style branch tuples (http://lcamtuf.coredump.cx/afl/technical_details.txt) in the Baseline ICs to generate an implicit trace.

If we had something like that, then we would be able to make separate trace trees by type/shape, receiving the benefits of inlining even in polymorphic code.
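(For context, a minimal sketch of the AFL-style branch-tuple bookkeeping from the linked write-up; this is not Baseline IC code, just the edge-counting trick the comment refers to.)

    #include <cstdint>

    // Each basic block gets a pseudo-random ID; an executed edge is recorded as
    // prev ^ cur in a small byte map, so A->B and B->A hit different counters.
    constexpr uint32_t MapSize = 1 << 16;
    uint8_t edgeHits[MapSize];
    uint32_t prevLocation = 0;

    void recordEdge(uint32_t curLocation) {   // called at the top of each block
        edgeHits[(curLocation ^ prevLocation) % MapSize]++;
        prevLocation = curLocation >> 1;      // shift keeps A->A and A<->B distinct
    }

The question above is essentially whether per-edge counters like these, keyed additionally by type or shape in the Baseline ICs, could stand in for an explicit trace.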

I believe the answer is yes, but it's quite a lot of work.

Depends on: 1536612
Attachment #8888286 - Attachment is obsolete: true

We used to have a separate Ion warmup threshold for small functions, but both
thresholds are now set to 1000, so there's no need to special-case small functions.

Also storing this in IonOptimizationLevels.h/cpp is more complicated than
necessary.

Depends on D24153

There's a lot of complexity around setting/unsetting the eagerCompilation flag.
It's simpler to determine this based on the warm-up threshold being 0.
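(In other words, roughly the following; a simplified sketch, not the actual JitOptions layout.)

    #include <cstdint>

    struct JitOptionsSketch {
        uint32_t ionWarmUpThreshold = 1000;

        // Derive eager compilation from the threshold instead of keeping a
        // separate flag in sync.
        bool eagerIonCompilation() const { return ionWarmUpThreshold == 0; }
        void setEagerIonCompilation() { ionWarmUpThreshold = 0; }
    };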

This also fixes some jit-tests whose behavior would otherwise change with this
patch.

Depends on D24154

Also adds a javascript.options.ion.full.threshold browser pref and similar shell
flags.

This doesn't rename the existing prefs yet.

Depends on D24155

We want the OSR warm-up threshold difference to be more than 100 for the
full-optimizations tier. Making it relative to the warm-up threshold also works
better for tests that set a small Ion warm-up threshold.

Also disables OSR in some tests depending on the old behavior.
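(A sketch of the idea with an illustrative formula; the actual patch may compute the difference differently.)

    #include <cstdint>

    // OSR (loop back-edge) threshold as the script threshold plus a relative
    // delta, rather than a hard-coded "+ 100".
    uint32_t osrWarmUpThreshold(uint32_t ionWarmUpThreshold) {
        uint32_t delta = ionWarmUpThreshold / 8;   // illustrative fraction
        return ionWarmUpThreshold + delta;
    }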

Depends on D24156

The old code would assert because we needed too many scratch registers, but it
was dead code until this patch.

Depends on D24157

Ion can do aggressive inlining, but inlining a lot of code has a negative
effect on compilation time and memory usage. It also means we spend more time
in the slower Baseline code while compiling the Ion code off-thread or after an
invalidation.

To address this, Ion now consists of two tiers:

  • Normal: the first tier (warm-up threshold of 1,000) only inlines small
    functions one level deep. This tier also has recompile checks to
    recompile the script when it becomes very hot.

  • Full: the second tier (warm-up threshold of 100,000) is only used for very
    hot code, so we can afford to inline a lot more.

This improves Speedometer and GDocs by more than 5%.
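(A hedged sketch of how the two tiers might differ in their inlining parameters. Apart from the 1,000 / 100,000 warm-up thresholds, the numbers and field names are illustrative, not the values in IonOptimizationLevels.cpp.)

    #include <cstdint>

    enum class OptimizationLevel { Normal, Full };

    struct TierSettings {
        uint32_t warmUpThreshold;
        uint32_t maxInlineDepth;        // how many levels of calls to inline
        uint32_t maxInlineeLength;      // only inline callees smaller than this
        bool     emitRecompileChecks;   // tier up again when the script gets very hot
    };

    TierSettings settingsFor(OptimizationLevel level) {
        if (level == OptimizationLevel::Normal)
            return {1'000, 1, 100, true};      // small functions, one level deep
        return {100'000, 10, 1'000, false};    // very hot code: inline aggressively
    }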

Depends on D24158

Attachment #9052206 - Attachment description: Bug 1382650 part 3 - Clean up eager compilation behavior. r?nbp! → Bug 1382650 part 3 - Clean up Ion eager compilation code. r?nbp!
Keywords: leave-open
Pushed by jdemooij@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/ca8d272b176f
part 1 - Remove separate Ion warmup threshold for small functions, as it's equivalent to the normal one. r=nbp
https://hg.mozilla.org/integration/autoland/rev/30247ab61679
part 2 - Store Ion warmup threshold only in JitOptions. r=nbp
https://hg.mozilla.org/integration/autoland/rev/7ee65d07d83f
part 3 - Clean up Ion eager compilation code. r=nbp
Pushed by jdemooij@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/a812f6daf98e
part 4 - Split Ion warmup threshold JitOption in 'normal' and 'full' options. r=nbp
https://hg.mozilla.org/integration/autoland/rev/0614a9178eef
part 5 - Calculate OSR warm-up threshold difference based on the warm-up threshold instead of hard-coding 100. r=nbp
https://hg.mozilla.org/integration/autoland/rev/6960595971e5
part 6 - Fix ARM64 implementation of branch32(AbsoluteAddress, Imm32). r=nbp
Keywords: leave-open
Pushed by jdemooij@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/391cc6344efe
part 7 - Use a separate Ion optimization level for very hot code. r=nbp
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

Unfortunately bug 1536874 made changes to the tp6 tests right before this landed, so I don't know if perf alerts will work well.

That said, I did a Try push comparing Speedometer and tp6 with and without part 7 [0] on Linux64, OS X, Win64:

  • Speedometer: 5% faster
  • Instagram: 20% faster on Linux64, 13% on Win64. Not sure what's going on.
  • GDocs: 7-8% faster
  • GSlides: 6-7% faster
  • Facebook: 4% faster
  • YouTube: 4% faster

And some smaller improvements across the board. Improvements might be bigger on low-end devices.

[0] https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=699add20be66c755397e327ec2474d851aeae1a9&newProject=try&newRevision=8e46faa6645ec0f40fa0c4b04353e8bc8ea0e89c&framework=10

We might uplift these patches to beta. Having a JitOption makes it easier to
turn this off if needed.

Pushed by jdemooij@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/3134740d831c
part 8 - Add a JitOption to disable use of Ion optimization levels. r=nbp

Comment on attachment 9052204 [details]
Bug 1382650 part 1 - Remove separate Ion warmup threshold for small functions, as it's equivalent to the normal one. r?nbp!

Beta/Release Uplift Approval Request

  • Feature/Bug causing the regression: None
  • User impact if declined: After some discussion we'd like to uplift these patches to FF67 to fix issues with Ion compilation slowing down page loads.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Medium
  • Why is the change risky/not risky? (and alternatives if risky): There's some risk but this has been on Nightly for a week without any issues. We think the perf improvements are worth it.
  • String changes made/needed: None
Attachment #9052204 - Flags: approval-mozilla-beta?
Attachment #9052205 - Flags: approval-mozilla-beta?
Attachment #9052206 - Flags: approval-mozilla-beta?
Attachment #9052207 - Flags: approval-mozilla-beta?
Attachment #9052208 - Flags: approval-mozilla-beta?
Attachment #9052209 - Flags: approval-mozilla-beta?
Attachment #9052210 - Flags: approval-mozilla-beta?
Attachment #9055095 - Flags: approval-mozilla-beta?

Comment on attachment 9052204 [details]
Bug 1382650 part 1 - Remove separate Ion warmup threshold for small functions, as it's equivalent to the normal one. r?nbp!

These patches have been on Nightly for 10 days with no stability or functional regressions reported. Given that and the significant performance improvements on multiple major sites, I'm accepting the risk and taking these patches for 67 beta 9, as we are still in the middle of the beta cycle. If the uplifts degrade beta stability, we can back these patches out.

Attachment #9052204 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9052205 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9052206 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9052207 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9052208 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9052209 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9052210 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Attachment #9055095 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

== Change summary for alert #20341 (as of Mon, 08 Apr 2019 08:41:03 GMT) ==

Improvements:

2% tp5o linux64-shippable opt e10s stylo 130.87 -> 127.77

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=20341

No longer regressions: 1545379

Noticed some perf improvements! \o/

== Change summary for alert #20188 (as of Thu, 28 Mar 2019 21:31:19 GMT) ==

Improvements:

4% tp5o windows10-64-shippable-qr opt e10s stylo 122.10 -> 117.15

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=20188

== Change summary for alert #20401 (as of Thu, 11 Apr 2019 05:40:11 GMT) ==

Improvements:

5% raptor-speedometer-firefox osx-10-10 opt 39.67 -> 41.84
2% raptor-speedometer-fennec android-hw-p2-8-0-arm7-api-16-nightly opt 23.98 -> 24.53

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=20401

Regressions: 1642067