Open
Bug 1336655
Opened 8 years ago
Updated 2 years ago
reduce repo-size impact of vendoring stylo dependencies
Categories
(Firefox Build System :: General, defect, P5)
Firefox Build System
General
Tracking
(firefox57 wontfix)
NEW
Tracking | Status | |
---|---|---|
firefox57 | --- | wontfix |
People
(Reporter: bholley, Unassigned)
References
Details
The patch in bug 1336607 is 15-megs uncompressed, which is a lot. We poked at this a bit and found some low-hanging fruit. Not entirely sure how much work here we want to do though.
Cargo provides the nice feature of being able to exclude files for packaging: http://doc.crates.io/manifest.html#the-exclude-and-include-fields-optional
Some easy wins in that department are:
* Skipping the tests directory in rust-bindgen, which saves 740k
* Removing the checked-in copy of Sherlock holmes in aho-corasick, which saves 500k
* skipping css-parsing-tests from rust-cssparser, which saves 300k
* skipping regex/examples and regex/tests, which saves 300k
Each one of these would require a republish, which is a bit annoying. But if it's a non-breaking update, we should be able to avoid touching and Cargo.tomls.
The other big win would be to use Cargo features to avoid pulling in things like encoding-index-simpchinese via rust-encoding. All of those dependencies have massive tables, and we definitely don't want them in either or source tree or our binary. This refactoring would save ~2M.
The above steps would reduce the patch size from ~15 megs to ~10 megs. If we think that's worth the time, somebody just needs to go and do it.
Comment 1•8 years ago
|
||
- unicode-segmentation: https://github.com/unicode-rs/unicode-segmentation/pull/15
- unicode-normalization: https://github.com/unicode-rs/unicode-normalization/pull/13
- aho-corasick: https://github.com/BurntSushi/aho-corasick/pull/10
Comment 2•8 years ago
|
||
- bindgen: https://github.com/servo/rust-bindgen/pull/477
- cssparser: https://github.com/servo/rust-cssparser/pull/116
I'll leave someone else to do the encoding one.
Why do we need to vendor regex? I thought we were getting rid of the env_logger dependency?
Comment 3•8 years ago
|
||
I think the reduction from 15MB to 10MB is worth a few hours of work.
Also, do we really need all ~80 packages that Servo pulls in? That more than quadruples our existing package count.
Comment 4•8 years ago
|
||
Heh, 80. Servo pulls in 300 packages IIRC, Stylo already gets the reduced version.
I'll have a look at the crates being vendored though. We shouldn't need stuff like xdg.
Comment 5•8 years ago
|
||
(In reply to Manish Goregaokar [:manishearth] from comment #4)
> Heh, 80. Servo pulls in 300 packages IIRC, Stylo already gets the reduced
> version.
300?!
> I'll have a look at the crates being vendored though. We shouldn't need
> stuff like xdg.
Thank you.
Comment 6•8 years ago
|
||
> 300?!
Remember that a lot of our stack is from the community. Core HTTP handling is done by hyper, for example, which itself has its functionality (MIME, cookies, etc) split up into many crates (most of them maintained by hyper itself, but some of them, like cookie-rs, maintained by others). Then, many projects are groups of related crates -- like the unicode-* crates (Rust's ICU), phf-*, num-*, etc. Then there are all the bindings crates (especially for UI stuff). It ... adds up :)
Most of these crates wouldn't be pulled in even if layout were made part of Gecko (well, it would need some refactoring, but there's no strong dependency of layout on these things); they belong to things like script/net/canvas/etc which probably aren't going to be moved into Gecko.
---------
Looks like xdg is pulled in by servo_config, which looks for a conf file in the xdg base directory. We should probably exclude as much of servo_config as possible (we need it for opts, but perhaps that can be abstracted out)
Comment 7•8 years ago
|
||
(In reply to Manish Goregaokar [:manishearth] from comment #2)
> Why do we need to vendor regex? I thought we were getting rid of the
> env_logger dependency?
Bindgen uses regular expressions for configuration.
Comment 8•8 years ago
|
||
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #0)
> avoid pulling in things like encoding-index-simpchinese via rust-encoding.
Eventually, I’d like to move Servo (and it’s dependencies) to using Henri Sivonen’s encoding-rs (which optimizes for code size) instead of rust-encoding.
One difficulty is that removing rust-encoding support in rust-url would be a breaking change and so should go in version 2.0, which means also doing a breaking version of Hyper etc.
Reporter | ||
Comment 9•8 years ago
|
||
(In reply to Simon Sapin (:SimonSapin) from comment #8)
> (In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #0)
> > avoid pulling in things like encoding-index-simpchinese via rust-encoding.
>
> Eventually, I’d like to move Servo (and it’s dependencies) to using Henri
> Sivonen’s encoding-rs (which optimizes for code size) instead of
> rust-encoding.
How big of a difference does it make to codesize? Is this something we're likely to do (or that it makes sense to try to do) before shipping stylo?
Comment 10•8 years ago
|
||
> How big of a difference does it make to codesize?
I don’t know. I could take some time to measure, if that helps. Or Henri might already have measured? (encoding-rs vs rust-encoding)
But I just realized: since comment 0 mentions the size of the patch in bug 1336607, are we talking about size of uncompressed source code, or size of compiled binary code?
Flags: needinfo?(hsivonen)
Reporter | ||
Comment 11•8 years ago
|
||
(In reply to Simon Sapin (:SimonSapin) from comment #10)
> But I just realized: since comment 0 mentions the size of the patch in bug
> 1336607, are we talking about size of uncompressed source code, or size of
> compiled binary code?
This bug as filed is about the size of source (both compressed and uncompressed, primarily compressed).
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #9)
> How big of a difference does it make to codesize?
That's a difficult question to answer precisely without actually doing all the work in the Gecko context and measuring libxul sizes.
However, both rust-encoding and encoding_rs have sample command-line apps, so their sizes can be compared. It's worth noting that the encoding_rs sample, for sample purposes, uses both encoding_rs's "UTF-8 as internal Unicode representation" and "UTF-16 as internal Unicode representation" functionality. The former corresponds to what rust-encoding does and the latter to uconv does.
The release-mode x86_64 sizes are:
recode: 4174864
recode --features no-optimized-legacy-encoding: 4109320
recode_rs --features simd-accel: 3969536
recode_rs --features 'simd-accel no-static-ideograph-encoder-tables': 3919616
So encoding_rs is 185 KB or 200 KB smaller than rust-encoding depending on how speed vs. size is valued for legacy encode. (Note that even when valuing speed over size, encoding_rs currently doesn't do that tradeoff to the fullest: Even in the larger+faster mode, encoding_rs still encodes legacy CJK more slowly than rust-encoding and doesn't speed up single-byte or EUC-KR encode. It would be easy to make legacy CJK encode in encoding_rs faster by making the lookup tables even larger, but I doubt that browser use cases justify that. In both cases, encoding_rs is faster than rust-encoding for decode.)
But in both cases encoding_rs does more than rust-encoding: encoding_rs also contains the "replace uconv" UTF-16 functionality.
So since encoding_rs is already smaller than rust-encoding, the difference should be even greater when comparing encoding_rs to the combination of rust-encoding and uconv, which is the eventual big-picture comparison we should consider.
> Is this something we're
> likely to do (or that it makes sense to try to do) before shipping stylo?
The current status is that the Rust code of encoding_rs in a good shape and I have planned the Gecko glue, but I haven't yet typed out the Gecko bits. I intend to start that phase next week, and I'd expect to finish that part before Stylo ships.
The main foreseeable blocker is that explicit SSE2 is needed in order not to regress perf for the most important conversion pair (decoding UTF-8 into UTF-16), because in uconv that conversion pair uses SSE2. Explicit SSE2 isn't currently nightly-only in Rust. This may likely not get solved before Stylo ships.
So we'll need one of:
1) BurntSushi's work discussed in https://internals.rust-lang.org/t/getting-explicit-simd-on-stable-rust/4380 getting done and shipped.
2) Start building Firefox with nightly Rust.
3) Start building Firefox with release-snapshot rustc with nightly-only features enabled.
4) Take a regression in UTF-8 to UTF-16 decode perf until #1 is done.
(I don't think it makes sense to work around Rust's SIMD status by introducing hacks to call into C++ in order to be allowed to use explicit SSE2. It would make more sense to put the effort into boosting option #1 instead.)
Flags: needinfo?(hsivonen)
Comment 13•8 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #12)
> The release-mode x86_64 sizes are:
> recode: 4174864
> recode --features no-optimized-legacy-encoding: 4109320
> recode_rs --features simd-accel: 3969536
> recode_rs --features 'simd-accel no-static-ideograph-encoder-tables': 3919616
Is ~4MB really the size of the binary+data, or does that include debug info, which can have varying size depending on various factors, such that it could be larger with smaller binary+data?
Try `size $binary`.
Oops. There's an even better comparison to make: rust-encoding's sample app vs. the same sample app compiled with the rust-encoding API compatibility layer for encoding_rs. Also, LTO enabled in both cases this time, which in this case hopefully should discard UTF-16 functionality from encoding_rs and HZ from rust-encoding (but I didn't actually check).
So this is the "recode" sample app from rust-encoding in release mode with LTO on x86_64.
compat layer with simd: 2896784
compat layer with simd and slower legacy encode: 2847048
rust-encoding: 3076736
rust-encoding with slower legacy encode: 2862336
So 176 KB or 15 KB smaller depending on the legacy encode config.
Oh, and the encoding_rs version I used is the current development version that contains Rayon. However, Rayon a) is sunk cost for Gecko and b) turned out to be quite a bit less useful that I hoped for encoding_rs.
Anyway, even before I get around to writing the uconv replacement glue code, it might make sense for the Rust bits in Gecko to use encoding_rs (directly or with compat layer) instead of rust-encoding.
(Recent good stuff isn't on crates.io quite yet, so https://github.com/hsivonen/encoding_rs and https://github.com/hsivonen/encoding_rs_compat/ with the latter's Cargo.toml modified to depend on an off-crates.io copy of the former until I get the 0.4.0 release of encoding_rs polished and onto crates.io.)
"size" command outputs (release-mode with LTO on x86_64):
recode with rust-encoding:
text data bss dec hex filename
841755 19641 3640 865036 d330c target/release/examples/recode
recode with rust-encoding and smaller+slower legacy encode:
text data bss dec hex filename
631451 19641 3640 654732 9fd8c target/release/examples/recode
recode with compat layer and simd:
text data bss dec hex filename
635931 25825 3704 665460 a2774 target/release/examples/recode
recode with compat layer, simd and smaller+slower legacy encode:
text data bss dec hex filename
586427 25825 3704 615956 96614 target/release/examples/recode
The text vs. data numbers look counter-intuitive.
Comment 16•8 years ago
|
||
text is how executable code is called.
(In reply to Mike Hommey [:glandium] from comment #16)
> text is how executable code is called.
That's my expectation. Yet, the options to reduce table size does nothing to the data segment. Same thing with both encoding_rs and rust-encoding. Is rustc putting stuff that logically belongs in the data segment (immutable statically allocated data tables) into the text segment?
Comment 18•8 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) (away from Bugzilla until 2017-02-14) from comment #17)
> (In reply to Mike Hommey [:glandium] from comment #16)
> > text is how executable code is called.
>
> That's my expectation. Yet, the options to reduce table size does nothing to
> the data segment. Same thing with both encoding_rs and rust-encoding. Is
> rustc putting stuff that logically belongs in the data segment (immutable
> statically allocated data tables) into the text segment?
Yes.
Reporter | ||
Comment 19•8 years ago
|
||
FWIW, the above discussion is a bit of a digression for this bug, so we should split out a separate bug if there's more discussion to be had.
Comment 20•8 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) (away from Bugzilla until 2017-02-14) from comment #17)
> (In reply to Mike Hommey [:glandium] from comment #16)
> > text is how executable code is called.
>
> That's my expectation. Yet, the options to reduce table size does nothing to
> the data segment. Same thing with both encoding_rs and rust-encoding. Is
> rustc putting stuff that logically belongs in the data segment (immutable
> statically allocated data tables) into the text segment?
And/or in bss, filled through code.
Updated•8 years ago
|
Priority: -- → P5
Comment 21•7 years ago
|
||
status-firefox57=wontfix unless someone thinks this bug should block 57
status-firefox57:
--- → wontfix
Updated•7 years ago
|
Product: Core → Firefox Build System
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•