Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Description

4 years ago

While the majority of Fluent in Gecko is already in either Rust or C++, there are still two pieces in JavaScript.

There are three main reasons to move away from JS here:

Performance (see bug 1613705 comment 6 for some rough estimates)
Memory (same estimate gives us ~800kb savings)
Architecture - current architecture makes JS code block first paint and layout of the initial window.

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Updated

4 years ago

Priority: -- → P1

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Updated

4 years ago

Depends on: 1613705

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Updated

4 years ago

Depends on: 1660392

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 1

4 years ago

We now have the first functional pieces operational and are starting to tie things up. The planned order of steps is as follows:

(djg) Land bug 1660393 to get C++ L10nRegistry::Load(Sync) working
(djg) Factor out chunk-vec as a separate PR against fluent-rs (https://github.com/zbraniecki/fluent-rs/pull/3)
(djg/zibi) Merge the l10nregistry-rs PR from :djg (https://github.com/zbraniecki/l10nregistry-rs/pull/1)
(zibi) polish and release l10nregistry-rs
(zibi/djg) release chunk-vec
(zibi) Plug l10nregistry-rs into L10nRegistry in Gecko and expose via XPIDL
(djg/zibi) Merge the fluent-fallback PR from :djg (https://github.com/zbraniecki/fluent-rs/pull/3)
(zibi) Write a PR that moves Localization.cpp to use fluent-fallback
(djg) Get the Future->Promise bridge for fluent-fallback working for use by the Localization WebIDL
(zibi) Clean up Localization/DOMLocalization/DocumentL10n to remove the no longer needed JSContext
(zibi) Remove Localization.jsm and L10nRegistry.jsm

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Updated

4 years ago

Blocks: 1683759

Paul Adenot (:padenot)

Updated

4 years ago

Blocks: 1685365

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 2

4 years ago

Attached patch: markers.diff (deleted)

A set of markers used in performance profiles for identifying:

l10n_start_URL - when the document encounters the initial FTL link
l10n_trigger_URL - when the document triggers initial translation phase
l10n_end_URL - when the document reports initial translation to be completed

Assignee: nobody → zbraniecki

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 3

4 years ago

Final pre-review numbers!

With the advancements in Gecko bindings I was able to profile startup with the markers as described above.

Here are my profiles based on mozilla-central from the past weekend:

mozilla-central (using JS L10nRegistry and Localization):

1ms intervals:

https://share.firefox.dev/2Mmn6Te
https://share.firefox.dev/3a3R1bd
https://share.firefox.dev/3c8uIUn

0.1ms intervals:

https://share.firefox.dev/3qGfs5c
https://share.firefox.dev/3og8jqH

mozilla-central + l10nregistry-rs + localization-rs:

1ms intervals:

https://share.firefox.dev/3sNsqjv
https://share.firefox.dev/2LRgPzj
https://share.firefox.dev/2KO9C2q

0.1ms intervals:

https://share.firefox.dev/2Y7fAyi
https://share.firefox.dev/2YaePVh

In the main process you can find browser.xhtml and about:preferences, and in the content process about:home and about:newtab.

I'd appreciate any eyeballs that may want to evaluate anything standing out.

From my evaluation it looks like we're generally in good shape, and what remains is:

Consider whether we want to prefetch L10n in either sync or async and then apply translation as we parse instead of collecting elements and applying translations after.
Consider whether we want to maintain the XUL cache and what its real value is once we're out of the JS realm on the blocking path.
Bunch of microoptimizations in the Fluent parser around slice iteration and bytes retrieval.
Further Gecko/XPCOM/DOM bindings optimizations to minimize the cost there (hope to catch those in the review process!)
In about:newtab there seems to be a large cost of JSON parsing, likely l10n-args. Is there a chance we can parse JSON faster?

I consider those optimizations optional and non-blocking for landing this work now, because the performance numbers look good!
I'll share more details in the next comment.

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 4

4 years ago

I evaluated performance of four documents:

browser.xhtml
about:preferences
about:home
about:newtab

using two methods:

Profiler with 1ms sampling: elapsed time, plus memory over the l10n_end - l10n_trigger interval
talos tests

Profiler

For the profiler, I used an opt build and measured l10n_end - l10n_start and l10n_end - l10n_trigger, the former being similarly noisy to talos and the latter much cleaner. The latter is the real difference: the phase where localization is applied. If you look at the profiles, almost nothing happens before then, as we don't currently prefetch, so we can focus on the end - trigger phase.

We need to recognize that the profiler adds some overhead and in theory may give us different results, so it is important to cross-check with talos. In this case, I think the results are quite consistent: talos matches end - start in the profiler results, while end - trigger is the isolated difference that represents the actual perf difference from the change.
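
As a rough illustration of the two intervals discussed above, here is a minimal Python sketch. The `L10nMarkers` class and the timestamps are hypothetical and are not taken from the linked profiles; only the marker names come from comment 2.

```python
from dataclasses import dataclass


@dataclass
class L10nMarkers:
    """Timestamps (in ms) of the three profiler markers for one document load."""

    start: float    # l10n_start: the document encounters the initial FTL link
    trigger: float  # l10n_trigger: the initial translation phase is triggered
    end: float      # l10n_end: the initial translation is reported complete

    def end_minus_start(self) -> float:
        # Noisier interval, comparable to what talos observes.
        return self.end - self.start

    def end_minus_trigger(self) -> float:
        # Isolated interval: the phase where localization is actually applied.
        return self.end - self.trigger


# Hypothetical timestamps, chosen only to show the arithmetic.
markers = L10nMarkers(start=100.0, trigger=140.0, end=145.0)
print("end - start:  ", markers.end_minus_start(), "ms")
print("end - trigger:", markers.end_minus_trigger(), "ms")
```

Because nothing is prefetched, most of the end - start time in such a trace is unrelated work, which is why the end - trigger interval is the cleaner signal.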

There's also a little bit of first-run difference, so I used the average of the 2nd and 3rd runs for the table below (stdev between them is low):

Document	JS (ms)	Rust (ms)	Diff	%
browser.xhtml	7.5	4.7	-2.8ms	-37.3%
Preferences	19	9.6	-9.5ms	-65.8%
about:home	79	14	-65ms	-82.27%
about:newtab	122	80	-42ms	-34.42%

Document	JS (mem)	Rust (mem)	Diff	%
browser.xhtml	1.3MB	0.9MB	-0.4MB	-30.76%
Preferences	1.94MB	1.26MB	-0.68MB	-35.05%
about:home	5.25MB	0.97MB	-4.28MB	-81.50%
about:newtab	6.8MB	2.8MB	-4.00MB	-58.82%

Both numbers, time and memory, go significantly down!
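
For readers who want to recompute the Diff and % columns, a minimal Python sketch follows. It assumes the percentage is the change relative to the JS value; the helper name `change` is hypothetical, and the inputs are the browser.xhtml time row and the about:newtab memory row from the tables above.

```python
def change(js, rust):
    """Return (absolute diff, percent change) going from the JS value to the Rust value."""
    diff = rust - js
    pct = diff / js * 100.0
    return diff, pct


# browser.xhtml time: 7.5 ms (JS) -> 4.7 ms (Rust)
time_diff, time_pct = change(7.5, 4.7)
print(f"browser.xhtml time: {time_diff:+.1f}ms ({time_pct:+.1f}%)")

# about:newtab memory: 6.8 (JS) -> 2.8 (Rust), same unit as the memory table
mem_diff, mem_pct = change(6.8, 2.8)
print(f"about:newtab memory: {mem_diff:+.2f} ({mem_pct:+.2f}%)")
```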

Talos

Unfortunately, talos tests are quite noisy, so it's really hard to pin-point the wins, but one of the wins with the patches is that the stdev goes noticeably down, so I hope to also make the talos tests a better tool for further optimizations evaluation.

I tried to run it with ~40 reps, but stdev is consistently high enough that cutting 3ms from browser.xhtml or even 10ms from about:preferences is indistinguishable from noise when stdev is 15-20ms!
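
To put rough numbers on that, here is a back-of-the-envelope Python sketch. It assumes independent repetitions, the same stdev and the same number of reps on both sides of the comparison, and roughly normal noise; none of these assumptions is verified against the talos data, and the function name is hypothetical.

```python
import math


def standard_error_of_difference(stdev, reps):
    """Standard error of the difference between two sample means,
    assuming both samples share the same stdev and repetition count."""
    return math.sqrt(2.0) * stdev / math.sqrt(reps)


for stdev in (15.0, 20.0):
    se = standard_error_of_difference(stdev, reps=40)
    print(f"stdev={stdev:.0f}ms, 40 reps: standard error of the difference is about {se:.2f}ms")
```

Under these assumptions a 3ms change is smaller than one standard error, and a 10ms change is only around two to three standard errors, which is marginal at best if the real noise is less well-behaved than assumed.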

As a result, my read from talos is that most numbers go down, in several cases quite significantly. stdev also goes down, which is great for the value of talos going forward :)

Test	Platform	JS (ms)	Rust (ms)	Diff	%
ts_paint	Linux	253.1	254.98	+1.88ms	+0.7%
ts_paint	MacOS	928.1	934.32	+6.22ms	+0.67%
ts_paint	Windows	365.88	359.85	-6.03ms	-1.64%
twinopen	Linux	342.67	343.88	+1.21ms	+0.35%
twinopen	MacOS	124.54	122.0	-2.54ms	-2.03%
twinopen	Windows	104.5	101.66	-2.84ms	-2.74%
about_newtab	Linux	30.85	30.21	-0.64ms	-2.07%
about_newtab	MacOS	32.36	32.08	-0.28ms	-0.86%
about_newtab	Windows	31.81	29.74	-2.07ms	-6.50%
about_preferences_basic	Linux	124.39	102.0	-22.39ms	-17.99%
about_preferences_basic	MacOS	107.73	104.84	-2.89ms	-2.68%
about_preferences_basic	Windows	116.19	105.94	-10.25ms	-8.82%

Here's the full compare view: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=8771dfdc8694a91053b5e86c0a8ad9de34b68393&newProject=try&newRevision=c7f8b45423c3f228ad170c0a9b668e424f9abc96

With profiler wins in both time and memory, and talos showing a general downward trend, some strong wins, and much lower stdev in all tests, I'm comfortable recommending this change with the numbers as we have them right now.

Once we're closer to landing, I'll redo the talos tests to see if maybe we get more significant wins.

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Updated

4 years ago

Summary: Migrate Fluent in Gecko off of JavaScript → [meta] Migrate Fluent in Gecko off of JavaScript

BugBot [:suhaib / :marco/ :calixte]

Updated

4 years ago

Keywords: meta

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Updated

4 years ago

Depends on: 1672317

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 5

4 years ago

Latest benchmarks: bug 1613705 comment 37
Latest talos numbers: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=4e7fdee308deafa3bebc6f177caf5d1720ee369f&newProject=try&newRevision=fd42ad55cf7527849a589454153c6e3bf1a38b11&framework=1

The status of the patchset:

FileSource - mostly reviewed, likely close to final state, some opportunity to profile I/O
L10nRegistry - in review, seems to be stabilizing, likely in last rounds of review
Localization - first round of reviews, functionality complete

And on the crate side:

fluent-syntax - stable, documented, good test coverage
fluent-bundle - stable, documented, good test coverage
fluent-fallback - to be documented and cleaned up, but stable
l10nregistry - to be documented and cleaned up, but stable

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 6

4 years ago

New talos build: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=c5d056841197483fbcd1e4359ec71c2c85a752a9&newProject=try&newRevision=677f66def3a622cc6ae09bbe2c48d6bd629ea638&framework=1

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Updated

3 years ago

Depends on: 1723886

Zibi Braniecki [:zbraniecki][:gandalf]

Assignee

Comment 7

3 years ago

This is now fixed and in beta.

Status: NEW → RESOLVED

Closed: 3 years ago

Resolution: --- → FIXED