Open Bug 1539467 Opened 6 years ago Updated 2 years ago

Remove Shape Teleporting Optimization

Categories

(Core :: JavaScript Engine, enhancement, P3)

enhancement

Tracking

()

People

(Reporter: mgaudet, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(2 files, 1 obsolete file)

Our Shape Teleporting optimization incurs a number of costs in terms of complexity, and may be worth removing after a cursory investigation of the impact of doing so.

Priority: -- → P3

I did a quick analysis of the degree to which teleporting shapes today. It's not referenced in IonBuilder at all - only in the CacheIR infrastructure.

I did a quick tour of a number of tp-6 page-loads with an instrumented build - measuring the number of uses of the optimization and the number of prototype object guards.

The distribution of typical hit-counts goes like this:

      HITS     HITPCT SAVES    VOLUME     VOLPCT
        59       0.00    9        590       0.00
       537       0.00    8       4833       0.02
      1989       0.01    7      15912       0.07
      2507       0.02   10      27577       0.13
     24021       0.15    5     144126       0.67
     29604       0.18    6     207228       0.96
     65166       0.40    4     325830       1.50
    261538       1.61    3    1046152       4.83
    792086       4.88    2    2376258      10.97
   2442486      15.04    1    4884972      22.56
  12618471      77.71    0   12618471      58.28

Here HITS is the number of hits in CacheIR of the optimization.

The hits are bucketed by SAVES - the length of the proto-chain list that they "saved" from having to be guarded. (so: 2507 hits of the optimization optimized a case where 10 proto-chain guards were saved).

The "VOLUME" column is just HITS scaled by SAVES - an estimate of the amount of computation saved (per optimization use during execution) for each bucket.

So about 60% of uses of the Teleporting chain optimization save us 0 guards.
About 80% of uses save us 0 or 1 guards.
About 97% of uses save us <= 4 guards.

We have a peak at 5 and a sharp dropoff after that, so 5 seems like a good number for a cut off. <2.5% of the volume of saved guards come from those cases, and they cap off at 10. We could also just set a cap of 10 to catch those cases if we really care for them.

I've gone ahead and implemented this, and collected some more data as well. (The data I have is from instrumentation and actually doesn't depend on the implementation, but I'm going to get page load times from a release build as well.)

I instrumented the IC guard code to count dynamic instances (of ICs, and of individual guards) with and without the shape-teleporting early-out at CacheIR.cpp:713. Here's my data from manual browsing of some of the tp6 sites (specifically Reddit, Google Search, YouTube front page, Google Docs, and Amazon shopping search):

================
With Teleporting
================

| chainLength | Dyn IC Instances | Dyn IC Instances (CDF %) | Dyn Guard Instances | Dyn Guard Instances (CDF %) |
|-------------+------------------+--------------------------+---------------------+-----------------------------|
|           0 |          2118337 |                   99.55% |                   0 |                       0.00% |
|           1 |              275 |                   99.57% |                 275 |                       0.51% |
|           2 |             1213 |                   99.62% |                2426 |                       4.96% |
|           3 |              714 |                   99.66% |                2142 |                       8.90% |
|           4 |             1025 |                   99.70% |                4100 |                      16.43% |
|           5 |             1076 |                   99.76% |                5380 |                      26.32% |
|           6 |             2102 |                   99.85% |               12612 |                      49.50% |
|           7 |              798 |                   99.89% |                5586 |                      59.76% |
|           8 |              730 |                   99.93% |                5840 |                      70.49% |
|           9 |              669 |                   99.96% |                6021 |                      81.56% |
|          10 |              332 |                   99.97% |                3320 |                      87.66% |
|          11 |              208 |                   99.98% |                2288 |                      91.86% |
|          12 |              369 |                  100.00% |                4428 |                     100.00% |


===================
Without Teleporting
===================

| chainLength | Dyn IC Instances | Dyn IC Instances (CDF %) | Dyn Guard Instances | Dyn Guard Instances (CDF %) |
|-------------+------------------+--------------------------+---------------------+-----------------------------|
|           0 |          2118337 |                   78.79% |                   0 |                       0.00% |
|           1 |           279533 |                   89.18% |              279533 |                      26.84% |
|           2 |           219504 |                   97.35% |              439008 |                      68.98% |
|           3 |            38191 |                   98.77% |              114573 |                      79.98% |
|           4 |            17874 |                   99.43% |               71496 |                      86.85% |
|           5 |             1395 |                   99.48% |                6975 |                      87.52% |
|           6 |             2157 |                   99.56% |               12942 |                      88.76% |
|           7 |             2974 |                   99.68% |               20818 |                      90.76% |
|           8 |             1145 |                   99.72% |                9160 |                      91.64% |
|           9 |             1296 |                   99.77% |               11664 |                      92.76% |
|          10 |             1207 |                   99.81% |               12070 |                      93.92% |
|          11 |              383 |                   99.83% |                4213 |                      94.32% |
|          12 |             2592 |                   99.92% |               31104 |                      97.31% |
|          13 |             1536 |                   99.98% |               19968 |                      99.22% |
|          14 |              501 |                  100.00% |                7014 |                      99.90% |
|          15 |               71 |                  100.00% |                1065 |                     100.00% |

The fourth column (dynamic guard instances) is just the second column (dynamic IC invocations) multiplied by the chain length -- so e.g. if we have an IC with two guards and invoke it N times, that counts for 2N guards executed.

Looking at the weighted-by-guard-count results, there is a bit of a long-ish tail, so it's unclear whether this would be a cheap change. I suppose the real question is whether these counts are significant relative to total execution time -- I'll try to get pageload numbers with the Raptor infrastructure next.

(Will attach Phab diff for clean change and a patch file + raw instrumentation logs for the data.)

Attached file Bug 1539467: Remove shape teleporting. WIP. (obsolete) (deleted) —
Attached file shape-teleporting-logs.zip (deleted) —

It seems that there is non-negligible impact to disabling shape teleporting (data from a local ./mach raptor-test --test raptor-tp6-1 with and without the above patch):

======
raptor-no-teleporting.json
======

* raptor-tp6-amazon-firefox: pageload: 787.840000 ms
* raptor-tp6-facebook-firefox: pageload: 685.350000 ms
* raptor-tp6-google-firefox: pageload: 509.250000 ms
* raptor-tp6-youtube-firefox: pageload: 949.170000 ms
======
raptor-teleporting.json
======

* raptor-tp6-amazon-firefox: pageload: 744.100000 ms
* raptor-tp6-facebook-firefox: pageload: 636.190000 ms
* raptor-tp6-google-firefox: pageload: 479.060000 ms
* raptor-tp6-youtube-firefox: pageload: 930.690000 ms

So that's between 2.0% (YouTube front page) and 7.7% (Facebook newsfeed) page-load time increase -- not so great.

A few ideas occur to me to simplify guards in other ways, but nothing immediately actionable. (There's perhaps a bit more design space to explore here too, e.g. don't attach an IC at all if the guard sequence is too long, which I could try too...) Thoughts?

Just to clarify - is when you are counting IC instances here, is it # of generated instances, or # of hits per instance?

(In reply to Kannan Vijayan [:djvj] from comment #7)

Just to clarify - is when you are counting IC instances here, is it # of generated instances, or # of hits per instance?

Number of hits (the counts come from printfs in the ICs that execute at IC execution time).

Attachment #9088527 - Attachment is obsolete: true
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: