Remove Shape Teleporting Optimization
Categories
(Core :: JavaScript Engine, enhancement, P3)
Tracking
()
People
(Reporter: mgaudet, Unassigned)
References
(Blocks 1 open bug)
Details
Attachments
(2 files, 1 obsolete file)
Our Shape Teleporting optimization incurs a number of costs in terms of complexity, and may be worth removing after a cursory investigation of the impact of doing so.
Updated•6 years ago
|
Comment 1•6 years ago
|
||
I did a quick analysis of the degree to which teleporting shapes today. It's not referenced in IonBuilder at all - only in the CacheIR infrastructure.
I did a quick tour of a number of tp-6 page-loads with an instrumented build - measuring the number of uses of the optimization and the number of prototype object guards.
The distribution of typical hit-counts goes like this:
HITS HITPCT SAVES VOLUME VOLPCT
59 0.00 9 590 0.00
537 0.00 8 4833 0.02
1989 0.01 7 15912 0.07
2507 0.02 10 27577 0.13
24021 0.15 5 144126 0.67
29604 0.18 6 207228 0.96
65166 0.40 4 325830 1.50
261538 1.61 3 1046152 4.83
792086 4.88 2 2376258 10.97
2442486 15.04 1 4884972 22.56
12618471 77.71 0 12618471 58.28
Here HITS is the number of hits in CacheIR of the optimization.
The hits are bucketed by SAVES - the length of the proto-chain list that they "saved" from having to be guarded. (so: 2507 hits of the optimization optimized a case where 10 proto-chain guards were saved).
The "VOLUME" column is just HITS scaled by SAVES - an estimate of the amount of computation saved (per optimization use during execution) for each bucket.
So about 60% of uses of the Teleporting chain optimization save us 0 guards.
About 80% of uses save us 0 or 1 guards.
About 97% of uses save us <= 4 guards.
We have a peak at 5 and a sharp dropoff after that, so 5 seems like a good number for a cut off. <2.5% of the volume of saved guards come from those cases, and they cap off at 10. We could also just set a cap of 10 to catch those cases if we really care for them.
Comment 2•5 years ago
|
||
I've gone ahead and implemented this, and collected some more data as well. (The data I have is from instrumentation and actually doesn't depend on the implementation, but I'm going to get page load times from a release build as well.)
I instrumented the IC guard code to count dynamic instances (of ICs, and of individual guards) with and without the shape-teleporting early-out at CacheIR.cpp:713. Here's my data from manual browsing of some of the tp6 sites (specifically Reddit, Google Search, YouTube front page, Google Docs, and Amazon shopping search):
================
With Teleporting
================
| chainLength | Dyn IC Instances | Dyn IC Instances (CDF %) | Dyn Guard Instances | Dyn Guard Instances (CDF %) |
|-------------+------------------+--------------------------+---------------------+-----------------------------|
| 0 | 2118337 | 99.55% | 0 | 0.00% |
| 1 | 275 | 99.57% | 275 | 0.51% |
| 2 | 1213 | 99.62% | 2426 | 4.96% |
| 3 | 714 | 99.66% | 2142 | 8.90% |
| 4 | 1025 | 99.70% | 4100 | 16.43% |
| 5 | 1076 | 99.76% | 5380 | 26.32% |
| 6 | 2102 | 99.85% | 12612 | 49.50% |
| 7 | 798 | 99.89% | 5586 | 59.76% |
| 8 | 730 | 99.93% | 5840 | 70.49% |
| 9 | 669 | 99.96% | 6021 | 81.56% |
| 10 | 332 | 99.97% | 3320 | 87.66% |
| 11 | 208 | 99.98% | 2288 | 91.86% |
| 12 | 369 | 100.00% | 4428 | 100.00% |
===================
Without Teleporting
===================
| chainLength | Dyn IC Instances | Dyn IC Instances (CDF %) | Dyn Guard Instances | Dyn Guard Instances (CDF %) |
|-------------+------------------+--------------------------+---------------------+-----------------------------|
| 0 | 2118337 | 78.79% | 0 | 0.00% |
| 1 | 279533 | 89.18% | 279533 | 26.84% |
| 2 | 219504 | 97.35% | 439008 | 68.98% |
| 3 | 38191 | 98.77% | 114573 | 79.98% |
| 4 | 17874 | 99.43% | 71496 | 86.85% |
| 5 | 1395 | 99.48% | 6975 | 87.52% |
| 6 | 2157 | 99.56% | 12942 | 88.76% |
| 7 | 2974 | 99.68% | 20818 | 90.76% |
| 8 | 1145 | 99.72% | 9160 | 91.64% |
| 9 | 1296 | 99.77% | 11664 | 92.76% |
| 10 | 1207 | 99.81% | 12070 | 93.92% |
| 11 | 383 | 99.83% | 4213 | 94.32% |
| 12 | 2592 | 99.92% | 31104 | 97.31% |
| 13 | 1536 | 99.98% | 19968 | 99.22% |
| 14 | 501 | 100.00% | 7014 | 99.90% |
| 15 | 71 | 100.00% | 1065 | 100.00% |
The fourth column (dynamic guard instances) is just the second column (dynamic IC invocations) multiplied by the chain length -- so e.g. if we have an IC with two guards and invoke it N times, that counts for 2N guards executed.
Looking at the weighted-by-guard-count results, there is a bit of a long-ish tail, so it's unclear whether this would be a cheap change. I suppose the real question is whether these counts are significant relative to total execution time -- I'll try to get pageload numbers with the Raptor infrastructure next.
(Will attach Phab diff for clean change and a patch file + raw instrumentation logs for the data.)
Comment 3•5 years ago
|
||
Comment 4•5 years ago
|
||
Comment 5•5 years ago
|
||
Comment 6•5 years ago
|
||
It seems that there is non-negligible impact to disabling shape teleporting (data from a local ./mach raptor-test --test raptor-tp6-1
with and without the above patch):
======
raptor-no-teleporting.json
======
* raptor-tp6-amazon-firefox: pageload: 787.840000 ms
* raptor-tp6-facebook-firefox: pageload: 685.350000 ms
* raptor-tp6-google-firefox: pageload: 509.250000 ms
* raptor-tp6-youtube-firefox: pageload: 949.170000 ms
======
raptor-teleporting.json
======
* raptor-tp6-amazon-firefox: pageload: 744.100000 ms
* raptor-tp6-facebook-firefox: pageload: 636.190000 ms
* raptor-tp6-google-firefox: pageload: 479.060000 ms
* raptor-tp6-youtube-firefox: pageload: 930.690000 ms
So that's between 2.0% (YouTube front page) and 7.7% (Facebook newsfeed) page-load time increase -- not so great.
A few ideas occur to me to simplify guards in other ways, but nothing immediately actionable. (There's perhaps a bit more design space to explore here too, e.g. don't attach an IC at all if the guard sequence is too long, which I could try too...) Thoughts?
Comment 7•5 years ago
|
||
Just to clarify - is when you are counting IC instances here, is it # of generated instances, or # of hits per instance?
Comment 8•5 years ago
|
||
(In reply to Kannan Vijayan [:djvj] from comment #7)
Just to clarify - is when you are counting IC instances here, is it # of generated instances, or # of hits per instance?
Number of hits (the counts come from printfs in the ICs that execute at IC execution time).
Updated•5 years ago
|
Updated•2 years ago
|
Description
•