Closed Bug 1478104 Opened 6 years ago Closed 3 years ago

Slow typing (1-5s delay) for GDocs with many active contributors

Categories

(Core :: JavaScript Engine: JIT, defect, P2)

Tracking

RESOLVED FIXED
Tracking Status
firefox63 --- wontfix
firefox64 --- affected
firefox65 --- affected

People

(Reporter: Harald, Unassigned)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

STR:
- https://docs.google.com/document/d/146p7Y8Ues_AKjj4ReWCk6InOPWe3C3Koy6EQ1qnYKNM/edit#
- Have 20 devtools team members join and type updates

Profiles from different users experiencing the slowdown:

:jdescottes profile: https://perfht.ml/2LlAlT1
:digitarald https://perfht.ml/2Li8fIe
:pbro https://perfht.ml/2uJqewh
Component: General → JavaScript Engine: JIT
Whiteboard: [qf] → [qf:p1:f67][js:investigate]
Maybe related: a profile from a high-end Windows laptop with only 2 people in the document, but keypress events taking 200+ ms: https://perfht.ml/2LP6WQG
Same doc as in the bug description, with slow typing. This time on Windows with WebRender enabled and GFX set to Nvidia: https://perfht.ml/2vm5Pxy
Priority: -- → P2
The issue is much less noticeable after we created a new, short doc, so document length is a core part of the STR.
Bas identified that we were compiling a lot of scripts, and potentially invalidating a lot of them.  Ted, Bas, and I investigated and found that yes, we were compiling a lot of scripts, but not really invalidating many of them.

Late today we discovered that we were basically inlining scripts really aggressively when compiling, leading to huge compiled scripts that would presumably take a LONG time for Ion to compile (janking the main thread, delaying the execution of optimized code by making background compilation take longer, and also leading to more invalidations, since more code gets compiled together into one piece of jitcode).

Adjusting these tunings basically makes everything better.  Simple patch to come soon.

The fix might regress Octane, but it's most likely worth it.
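For a sense of the kind of knobs involved, here's a toy model of an inlining budget check.  This is not SpiderMonkey code; the names echo tunables like maxInlineDepth_ discussed in this bug, and the values are illustrative, not the ones in the patch.

// Toy model of an inliner budget check. Tightening either knob caps the
// size of the generated script, and therefore the compile time.
#include <cstdint>
#include <cstdio>

struct InlineBudget {
  uint32_t maxInlineDepth = 3;        // levels of nested calls to flatten
  uint32_t maxTotalBytecode = 80000;  // cumulative bytecode per compilation
};

bool shouldInline(const InlineBudget& budget, uint32_t depth,
                  uint32_t totalBytecode, uint32_t calleeBytecode) {
  if (depth > budget.maxInlineDepth) return false;
  return totalBytecode + calleeBytecode <= budget.maxTotalBytecode;
}

int main() {
  InlineBudget budget;
  printf("%d\n", shouldInline(budget, 5, 10000, 400));  // 0: too deep
  printf("%d\n", shouldInline(budget, 2, 10000, 400));  // 1: within budget
}

The point of the patch is just to make checks like these say "no" sooner, so no single compilation balloons.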
Assignee: nobody → kvijayan
Attached patch adjust-inlining-tunings.patch (deleted) - Splinter Review
So this fixes it.

I'm gonna have to eat some crow here.  After preaching up and down about how some magical set of tuning changes isn't going to fix our problems... well, this magical set of tuning changes fixes all our problems.

So much for my proclamations from on high.

It doesn't even regress Octane appreciably, from what I can tell.

The gdocs page runs smooth as butter.

We seriously need to re-investigate Ion and how well it's serving us on real web pages.  I don't want to land this patch _without_ having some explicit Ion telemetry land with it.  We were completely blind to this issue.  We cannot afford to continue to be blind to JIT issues.

So: land this patch, and land Ion compile-time and inlining telemetry with it.  Follow up by landing telemetry for all other significant JIT events that make sense and don't hit us with perf overhead.
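To sketch what such telemetry could look like (all names here are hypothetical, not Gecko's actual telemetry API): time each compile and bucket the durations, so distributions like "how many compiles took >1s" become visible.

// Hypothetical sketch of Ion compile-time telemetry; names are made up.
#include <chrono>
#include <cstdio>
#include <map>
#include <string>

static std::map<std::string, int> gCompileTimeHistogram;

void recordCompileTime(double ms) {
  const char* bucket = ms < 10 ? "<10ms"
                     : ms < 100 ? "10-100ms"
                     : ms < 1000 ? "100ms-1s" : ">1s";
  gCompileTimeHistogram[bucket]++;
}

void compileScript() { /* Ion compilation elided */ }

int main() {
  auto start = std::chrono::steady_clock::now();
  compileScript();
  auto end = std::chrono::steady_clock::now();
  recordCompileTime(
      std::chrono::duration<double, std::milli>(end - start).count());
  for (const auto& [bucket, count] : gCompileTimeHistogram)
    printf("%s: %d\n", bucket.c_str(), count);
}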
Attachment #9011166 - Flags: review?(tcampbell)
Attachment #9011166 - Flags: review?(bas)
To save at least a bit of face here, I'd like to reiterate one comment I've been preaching for a while:

ION COMPILES TOO MUCH CODE!

We've overfitted Ion for Octane, and Octane is basically completely oblivious to large (relative) amounts of time spent in compilation.  We need to investigate this for further gains.
Perhaps related to bug 1382650 / bug 1383358?
Should maybe block bug 1490847.
Kraken regresses by 4%.  Speedometer improves by 7%.  Octane looks to be a wash (maybe a slight regression).  The GDocs page goes from unusable to pretty freaking smooth and usable (even page-load smoothness seems to improve drastically).

Jan, I'd like your go-ahead for this since we are going to be regressing some classical benchmarks.  I think this is a clear yes, but it's good to be explicitly aware of where we're gaining and where we're losing.
Thanks for the heads-up, Mayank!  I think we should close bug 1382650.  I'm actually more interested in reducing tiers and taking a different architectural approach.

Bug 1383358 is also somewhat misguided, I think - we like Ion compilations; we just want to be smarter about them.

Bug 1490847 is definitely valid to be blocked on this.  Adding now.
Blocks: 1490847
CC-ing Jan for feedback.  See comment 8 and comment 5.
Flags: needinfo?(jdemooij)
Comment on attachment 9011166 [details] [diff] [review]
adjust-inlining-tunings.patch

Switching the second review to Jan, as he is a JS/JIT peer.
Flags: needinfo?(jdemooij)
Attachment #9011166 - Flags: review?(bas) → review?(jdemooij)
Kannan: awesome results!  I'll try this patch and run some perf profiles as well.
FYI, we should try to land this in 64 if we can.

Moving to [qf:p1] for re-evaluation of target release
Whiteboard: [qf:p1:f67][js:investigate] → [qf:p1][js:investigate]
I'd like to see the results for leaving maxInlineDepth_ alone, too. I'm curious what portion of the change it is responsible for.
Whiteboard: [qf:p1][js:investigate] → [qf][js:investigate]
Whiteboard: [qf][js:investigate] → [qf:p1:f64][qf:js:investigate]
OK, so after a bunch of head-scratching and weirdness, I think I have a reasonably strong, testable hypothesis for what's happening here.  Let's review what occurred first:

My initial patch showed significant improvements on the build at the time.  This was confirmed by jesup on his own builds.  Then, when updating to tip, the patch's gains disappeared.  Then, when jesup reverted to the older revision, the gains seen earlier were somehow gone.  Then, when he updated to tip, it was fast again, both on an unpatched build and on a patched build, though load times still showed problems.

My attempts yielded similar results: across different runs on the same build, when updating back and forth, with or without the patch applied, the issues still showed up here and there; sometimes not at all, sometimes on this build but not that one, sometimes on that build but not this one.

All in all, it's very odd behaviour.

Last night I ran a bunch of instrumented builds and collected information using my trusty friend `printf`.

Here's what I think is happening, with some prelude setup to justify it.

Prelude:

1. When we turn Ion off entirely, the page is slow, because without the Ion JIT, execution is slow enough to make the page respond poorly.  Events take a long time to complete because JS execution just takes an order of magnitude longer.

2. During keypresses, and maybe loads, there is some eval path that is getting executed, which hits the interpreter and stays there until we warm up into baseline.  This causes our generally poor response times.

3. The aggressive nature of our Ion compiles exacerbates the problem - we take a long time to compile perfect code, but by that point we have already spent a lot of time executing things slowly in the interpreter, and keypresses need an immediate, fast response.

(Example: I'm seeing compiled scripts get retired for linking, in a number of cases, 3-4 seconds after compilation.  Not infrequently, when pathological cases hit, we can retire 7-24 seconds after compiles finish.)

4. Any instability in Ion (which occurs frequently enough) exacerbates the issue as well, as we return to baseline and/or the interpreter and execute slowly again.


Conclusions:

1. Firstly, we need to make sure that "good enough" Ion compiles are made available for execution in a timely manner.  This might mean reducing Ion's aggressiveness, or adding a new "cheap Ion" phase that provides compiled scripts more quickly.

However, this will not solve the issue on its own.

2. Improve cold-code execution speed - which comes down to interpreter speed, and potentially entering baseline earlier.  Still need to verify.
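To make the prelude concrete, here is a toy model of the tiering it describes.  The thresholds are illustrative, not SpiderMonkey's actual values; the window before any JIT code exists is exactly where a cold keypress handler lands.

// Toy model of warm-up-driven tiering: interpreter until the baseline
// threshold, baseline until the Ion threshold.
#include <cstdint>
#include <cstdio>

enum class Tier { Interpreter, Baseline, Ion };

struct Script {
  uint32_t warmUpCount = 0;
  Tier tier = Tier::Interpreter;
};

void runOnce(Script& s, uint32_t baselineThreshold, uint32_t ionThreshold) {
  s.warmUpCount++;
  if (s.tier == Tier::Interpreter && s.warmUpCount >= baselineThreshold)
    s.tier = Tier::Baseline;
  if (s.tier == Tier::Baseline && s.warmUpCount >= ionThreshold)
    s.tier = Tier::Ion;  // in reality: queue an off-thread Ion compile
}

int main() {
  Script s;
  for (int i = 0; i < 1200; i++) {
    runOnce(s, 10, 1000);
    if (i == 0 || i == 9 || i == 999)
      printf("after run %d: tier %d\n", i + 1, (int)s.tier);
  }
}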
A few more thoughts: 

Heavyweight eval() calls in interactive events are a Bad Idea.

Compilation in response to user input events should perhaps be more aggressive (shorter warmup?) than compilation triggered by timers, network activity, etc.  We could have the context "we're running an event off the input event queue" available to affect tunings.  In general, we should consider having situational tunings we choose between based on <factors> - a sketch of the idea follows.
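A hypothetical sketch of what choosing tunings by context could look like; the names and numbers are made up, not real SpiderMonkey policy.

// Situational tunings: pick warm-up thresholds based on why we're running.
#include <cstdint>
#include <cstdio>

enum class RunContext { InputEvent, Timer, Network, Other };

struct Tunings {
  uint32_t baselineThreshold;
  uint32_t ionThreshold;
};

Tunings tuningsFor(RunContext ctx) {
  switch (ctx) {
    case RunContext::InputEvent:
      return {5, 200};  // user is waiting: tier up quickly
    default:
      return {10, 1000};  // background work: normal warmup
  }
}

int main() {
  Tunings t = tuningsFor(RunContext::InputEvent);
  printf("input-event thresholds: baseline=%u ion=%u\n",
         t.baselineThreshold, t.ionThreshold);
}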

In Google Docs we're seeing some Ion compiles take >1s of CPU time (just on the helper thread, not counting lazylink on the main thread).  That can't be a good thing. (See the perfht.ml link below.)

Some way to measure the effectiveness of an Ion compile against the cost of the compile would let us tune and improve from real data - both locally and in telemetry.
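One hypothetical shape for that measurement: compare each compile's cost against the execution it actually served, and whether it survived.  All fields and numbers here are illustrative.

// A compile "paid off" if the generated code ran long enough to amortize
// its cost and wasn't invalidated.
#include <cstdio>

struct CompileRecord {
  double compileMs;   // helper-thread time spent compiling
  double executedMs;  // time subsequently spent in the compiled code
  bool invalidated;   // discarded due to type instability?
};

bool paidOff(const CompileRecord& r) {
  return !r.invalidated && r.executedMs > r.compileMs;
}

int main() {
  CompileRecord bad{1200.0, 40.0, false};   // >1s compile that barely ran
  CompileRecord good{30.0, 900.0, false};
  printf("%d %d\n", paidOff(bad), paidOff(good));  // prints: 0 1
}

Aggregated over many records, locally or via telemetry, this would show how often Ion compiles actually win.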



We still experience (at least on my dual-Xeon machine) ~10% jemalloc lock contention (see bug 1488780, currently slated for qf:p1:67).  See https://perfht.ml/2N0lzNn - during keypresses it's even higher - https://perfht.ml/2ztMEVS, with several long spins of up to 60ms while one of the JS helper threads does a LifoAlloc::freeAll() (mostly in the madvise()).  Perhaps we should bump that bug up, unless the contention is specific to high-core-count machines.

Note this profile is a "fast" profile where keypress time is <100ms.

Even "fast" profiles (right now?) still will fall behind typing - note the latency values if you hover the black bars for keypresses (150-300+ ms latency before it starts executing - end result is 400-500ms (sometimes more) before typing is echoed.  Edge (last I checked) does even better than chrome here - and chrome is clearly still much better than we are, even in our 'fast' mode.   Their keypress handlers run in 40-70ms typically (vs 60-110).  Chrome's latency before running a keypress is rarely more than 60ms.  Sometimes they'll run several in a row before rendering - they appear to hold off rendering while they're still processing input events (at least for several keys at a time), much as we do (when they fall behind), but ther rarely fall more than 2 chars behind.
Depends on: 1494504, 1384808
Attachment #9011166 - Flags: review?(tcampbell)
Attachment #9011166 - Flags: review?(jdemooij)
Depends on: 1488768
Depends on: 1489142
Depends on: 1500052
Depends on: 1501328
Whiteboard: [qf:p1:f64][qf:js:investigate] → [qf:js:investigate]
Assignee: kvijayan → nobody
Performance Impact: --- → ?
Whiteboard: [qf:js:investigate]

:smaug could you retest this to see if it's improved? If we want to keep this bug open, could you set the performance project flag using the calculator: https://codepen.io/mstange/pen/eYeWrGr

Flags: needinfo?(bugs)

mcomella and I were testing a copy of the test document (which does have 151 pages), and typing there felt quite smooth on Nightly.

https://share.firefox.dev/34ORO0W
The longest jank is 149ms, but even that isn't visible jank.  We paint twice there and handle 4 keypress events.
And keypress latencies in general are <50ms.

Google Docs has of course changed over the years, and bug 1755006 helped quite a bit with other cases where typing was slow.

So unless someone can still see some concrete slowness here, I'm not sure how useful it is to keep tracking this as a performance issue.
Better to file new performance bugs with new STR.

It also looks like the dependencies have been fixed, so perhaps one could even close this bug?  But this is in the JIT component, so it's up to the JS folks to decide.

Performance Impact: ? → ---
Flags: needinfo?(bugs)

The JS team feels this can be closed now.  The Warp project and other changes have also helped improve the experience with GDocs.

Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED