Bug 1374824 (Closed): Opened 7 years ago, Closed 7 years ago

Build stylo for linux64

Categories

(Firefox Build System :: General, enhancement)

Type: enhancement
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla56
Tracking Status: firefox56 fixed

People

(Reporter: gps, Assigned: gps)

References

Details

Attachments

(2 files)

Per stylo meeting earlier today, we'll be enabling building of stylo on platforms once they support it. (Stylo will still be disabled via pref for now.)

It is my understanding that we should be able to enable stylo for Linux64 builds today. So let's do that.

For now, we'll still have a "stylo" platform that runs tests with stylo enabled. This will likely stick around until stylo is enabled by default, at which point those separate test jobs become redundant.
We need the Servo backout service to be up before enabling building for linux64. Then we can stop building linux64-stylo builds (bug 1374748).
Blocks: 1374748
Summary: Enable stylo for linux64 → Build stylo for linux64
The Try push got a ton of mochitest failures: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f9568ad432825bbd73322991ba25266c85c3e669. I thought I heard the test suites were supposed to be clean after froydnj landed a WPT fix yesterday. Perhaps I misheard? Or maybe my patch is just wrong.
Greg, I think your Try push is missing the Stylo test fix from bug 1374432 comment 5. It landed on m-c just two hours before your Try push and I don't see the changeset in your Try push's hg history.

Manish, can you double check if the test failures in Greg's Try push were caused by using the Stylo test expectations when Stylo is built but disabled?
Assignee: nobody → gps
Flags: needinfo?(manishearth)
Seems to be the case.

Needs https://hg.mozilla.org/mozilla-central/rev/553d1437de14

We're also failing some accessibility tests on stylo due to lack of zoom. I'm not sure how those got enabled, but we may need to mark those as expected fail for stylo.
Flags: needinfo?(manishearth)
Comment on attachment 8879731 [details]
Bug 1374824 - Build stylo on Linux64 in automation;

https://reviewboard.mozilla.org/r/151072/#review156328

Sure, let's do this.  I think we need to polish `mach bootstrap` a little more before we require this locally, but building in automation WFM.

Want to write the equivalent patch for win64? :)
Attachment #8879731 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj] from comment #7)
> Want to write the equivalent patch for win64? :)

and maybe win32, now that bug 1365254 has landed?
Comment on attachment 8880097 [details]
Bug 1374824 - Fix --enable-stylo value processing;

https://reviewboard.mozilla.org/r/151454/#review156400

::: toolkit/moz.configure:634
(Diff revision 1)
>      enable_stylo = None
>  
>      # The default is to not build Stylo at all.
>      if value.origin == 'default':
>          pass
> -    elif value == 'build':
> +    elif len(value) and value[0] == 'build':

Can you file a followup bug to fix the `--enable-webrender` code?  `stylo_config` was modeled after `webrender`, and it looks like `webrender` ought to have the same bug.

Eek, does that mean we've been shipping webrender?!
Attachment #8880097 - Flags: review?(nfroyd) → review+
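For anyone hitting the same pattern, here is a minimal standalone sketch of the bug (illustrative only, not the real moz.configure API), assuming option values arrive as tuple-like objects of the arguments given on the command line:

  # Sketch only: --enable-stylo=build yields ('build',), and a plain
  # --enable-stylo yields an empty (but "positive") tuple.
  def stylo_config(value):
      build_stylo = None
      # Buggy form: a tuple never compares equal to a bare string, so
      # `if value == 'build':` could never match.
      if len(value) and value[0] == 'build':
          build_stylo = True
      return build_stylo

  assert stylo_config(('build',)) is True
  assert stylo_config(()) is None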
Filed bug 1375231 for that --enable-webrender option issue. I'll also get a new bug on file for enabling Stylo on Windows, since I plan on landing this as soon as the Try run comes back clean. As a preview, I have a Try run for Windows at https://hg.mozilla.org/try/rev/da915b659f224b3a99173c8286c9812e52049587.
Status: NEW → ASSIGNED
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/1b5000300190
Fix --enable-stylo value processing; r=froydnj
https://hg.mozilla.org/integration/autoland/rev/471a163b37d0
Build stylo on Linux64 in automation; r=froydnj
Had to back out because of a base-toolchains build failure:

[task 2017-06-21T20:37:04.408331Z] 20:37:04     INFO -       Running `/home/worker/workspace/build/src/rustc/bin/rustc --crate-name selectors /home/worker/workspace/build/src/servo/components/selectors/lib.rs --crate-type lib -C opt-level=2 -C panic=abort -g -C metadata=e2cc91683d270b0d -C extra-filename=-e2cc91683d270b0d --out-dir /home/worker/workspace/build/src/obj-firefox/toolkit/library/x86_64-unknown-linux-gnu/release/deps --emit=dep-info,link --target x86_64-unknown-linux-gnu -C linker=/home/worker/workspace/build/src/build/cargo-linker -L dependency=/home/worker/workspace/build/src/obj-firefox/toolkit/library/x86_64-unknown-linux-gnu/release/deps -L dependency=/home/worker/workspace/build/src/obj-firefox/toolkit/library/release/deps --extern bitflags=/home/worker/workspace/build/src/obj-firefox/toolkit/library/x86_64-unknown-linux-gnu/release/deps/libbitflags-a8408926e41fa377.rlib --extern log=/home/worker/workspace/build/src/obj-firefox/toolkit/library/x86_64-unknown-linux-gnu/release/deps/liblog-2f95e95cd59c68ff.rlib --extern servo_arc=/home/worker/workspace/build/src/obj-firefox/toolkit/library/x86_64-unknown-linux-gnu/release/deps/libservo_arc-e1c1cd1c0f46d3c3.rlib --extern precomputed_hash=/home/worker/workspace/build/src/obj-firefox/toolkit/library/x86_64-unknown-linux-gnu/release/deps/libprecomputed_hash-bb9056906d2fbf92.rlib --extern phf=/home/worker/workspace/build/src/obj-firefox/toolkit/library/x86_64-unknown-linux-gnu/release/deps/libphf-c32f66c63df2c0d8.rlib --extern cssparser=/home/worker/workspace/build/src/obj-firefox/toolkit/library/x86_64-unknown-linux-gnu/release/deps/libcssparser-4d9f3aea91e20e19.rlib --extern smallvec=/home/worker/workspace/build/src/obj-firefox/toolkit/library/x86_64-unknown-linux-gnu/release/deps/libsmallvec-d4496f15d9350a89.rlib --extern matches=/home/worker/workspace/build/src/obj-firefox/toolkit/library/x86_64-unknown-linux-gnu/release/deps/libmatches-bb55a7f51587b234.rlib --extern fnv=/home/worker/workspace/build/src/obj-firefox/toolkit/library/x86_64-unknown-linux-gnu/release/deps/libfnv-5a1b6d32969b4651.rlib`
[task 2017-06-21T20:37:07.107892Z] 20:37:07     INFO -  error: `Self` and associated types in struct expressions and patterns are unstable (see issue #37544)
[task 2017-06-21T20:37:07.108208Z] 20:37:07     INFO -     --> /home/worker/workspace/build/src/servo/components/selectors/context.rs:118:9
[task 2017-06-21T20:37:07.108474Z] 20:37:07     INFO -      |
[task 2017-06-21T20:37:07.108772Z] 20:37:07     INFO -  118 |         Self {
[task 2017-06-21T20:37:07.109044Z] 20:37:07     INFO -      |         ^^^^
[task 2017-06-21T20:37:07.113654Z] 20:37:07     INFO -      |
[task 2017-06-21T20:37:07.113740Z] 20:37:07     INFO -      = help: add #![feature(more_struct_aliases)] to the crate attributes to enable
[task 2017-06-21T20:37:07.113802Z] 20:37:07     INFO -  error: `Self` and associated types in struct expressions and patterns are unstable (see issue #37544)
[task 2017-06-21T20:37:07.113855Z] 20:37:07     INFO -     --> /home/worker/workspace/build/src/servo/components/selectors/context.rs:136:9
[task 2017-06-21T20:37:07.113893Z] 20:37:07     INFO -      |
[task 2017-06-21T20:37:07.113929Z] 20:37:07     INFO -  136 |         Self {
[task 2017-06-21T20:37:07.113966Z] 20:37:07     INFO -      |         ^^^^
[task 2017-06-21T20:37:07.113992Z] 20:37:07     INFO -      |
[task 2017-06-21T20:37:07.114039Z] 20:37:07     INFO -      = help: add #![feature(more_struct_aliases)] to the crate attributes to enable
[task 2017-06-21T20:37:07.114096Z] 20:37:07     INFO -  error: `Self` and associated types in struct expressions and patterns are unstable (see issue #37544)
[task 2017-06-21T20:37:07.114155Z] 20:37:07     INFO -    --> /home/worker/workspace/build/src/servo/components/selectors/matching.rs:83:9
[task 2017-06-21T20:37:07.114191Z] 20:37:07     INFO -     |
[task 2017-06-21T20:37:07.114228Z] 20:37:07     INFO -  83 |         Self {
[task 2017-06-21T20:37:07.114261Z] 20:37:07     INFO -     |         ^^^^
[task 2017-06-21T20:37:07.114287Z] 20:37:07     INFO -     |
[task 2017-06-21T20:37:07.114335Z] 20:37:07     INFO -     = help: add #![feature(more_struct_aliases)] to the crate attributes to enable
Manish: we're currently using rustc 1.15.1 on these base toolchains builds. Can we make this code easily work with that version of Rust or do we need to bump the Rust toolchain?
Flags: needinfo?(manishearth)
Backout by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/8fa50ef8c799
Backed out changeset 471a163b37d0 for breaking base toolchains build
Per IRC with glandium, we'll bump minimum Rust to 1.17. There's already a bug with a patch. So this shouldn't take long.
Depends on: 1374807
Flags: needinfo?(manishearth)
Yes, you can just replace Self with the actual named type being used there.
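As a hedged illustration (type and field names invented; the real code lives in servo/components/selectors/), the rustc-1.15-compatible spelling looks like this:

  struct MatchingContext {
      relevant_link_found: bool,
  }

  impl MatchingContext {
      fn new() -> Self {
          // rustc 1.15 gates `Self { ... }` in struct expressions behind
          // #![feature(more_struct_aliases)] (rust-lang/rust#37544), so
          // spell out the type name instead:
          MatchingContext { relevant_link_found: false }
      }
  }

  fn main() {
      let _ctx = MatchingContext::new();
  }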
(In reply to Gregory Szorc [:gps] from comment #12)
> Filed bug 1375231 for that OptionRender issue. I'll also get a new bug on
> file for enabling Windows on Stylo, since I plan on landing this as soon as
> the Try run comes back clean. As a preview, I have a Try run for Windows at
> https://hg.mozilla.org/try/rev/da915b659f224b3a99173c8286c9812e52049587.

Greg's win32/win64 Try push with stylo built-but-disabled looks green:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=da915b659f224b3a99173c8286c9812e52049587

Greg, you can use bug 1366048 for enabling Stylo on Windows.
Pushed by gszorc@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/28585cf7da6f
Build stylo on Linux64 in automation; r=froydnj
https://hg.mozilla.org/mozilla-central/rev/1b5000300190
https://hg.mozilla.org/mozilla-central/rev/28585cf7da6f
Status: ASSIGNED → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla56
Some build metrics alerts came in from this change:
== Change summary for alert #7458 (as of June 21 2017 21:34 UTC) ==

Regressions:

119%  build times summary linux64 opt base-toolchains taskcluster-c4.4xlarge     743.02 -> 1,627.24
 67%  build times summary linux64 opt taskcluster-c4.4xlarge                     783.47 -> 1,309.12
 24%  build times summary linux64 pgo taskcluster-c4.4xlarge                     2,727.86 -> 3,377.98
  3%  installer size summary linux64 pgo                                         60,643,474.21 -> 62,556,429.75

Improvements:

 83%  sccache requests_not_cacheable summary linux64 opt base-toolchains taskcluster-c4.4xlarge     62.00 -> 10.42

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7458
I see some talos memory metrics showing a regression with this change:
== Change summary for alert #7476 (as of June 21 2017 20:25 UTC) ==

Regressions:

  5%  tp5o Private Bytes linux64 opt e10s     1,200,620,725.43 -> 1,258,366,861.51
  5%  tp5o_webext Private Bytes linux64 pgo e10s     1,220,618,437.43 -> 1,278,471,854.60
  5%  tp5o_webext Private Bytes linux64 opt e10s     1,209,557,268.92 -> 1,266,870,514.87
  5%  tp5o Private Bytes linux64 pgo e10s     1,213,060,503.48 -> 1,268,325,219.69

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7476

I don't see this in areweslimyet, but it's odd that this is consistent. :gps, do you have thoughts on why we might see larger memory usage?
Flags: needinfo?(gps)
Depends on: 1375550
(In reply to Joel Maher ( :jmaher) from comment #23)
> I don't see this in areweslimyet, but it's odd that this is consistent.
> :gps, do you have thoughts on why we might see larger memory usage?

I am not surprised to see a small increase in memory usage even when Stylo is pref'd off, because the binary size is larger with Stylo.

@ bholley: Is a ~50 MB (5%) memory usage regression with Stylo pref'd off within the expected range??
Flags: needinfo?(bobbyholley)
(In reply to Chris Peterson [:cpeterson] from comment #24)
> (In reply to Joel Maher ( :jmaher) from comment #23)
> > I don't see this in areweslimyet, but it's odd that this is consistent.
> > :gps, do you have thoughts on why we might see larger memory usage?
> 
> I am not surprised to see a small increase in memory usage even when Stylo
> is pref'd off, because the binary size is larger with Stylo.

Yes, but that should be on the order of a megabyte.

> 
> @ bholley: Is a ~50 MB (5%) memory usage regression with Stylo pref'd off
> within the expected range??

Err, definitely not. I would expect zero memory overhead when stylo is prefed off.
Flags: needinfo?(bobbyholley)
Aside from linking in the Rust code, the code changes we get from building by default are very few: http://searchfox.org/mozilla-central/search?q=ifdef+MOZ_STYLO&path=

Of those, these are the only ones that could have any impact on memory (at least within my imagination):

[1] adds one extra pointer to nsDocument. There are not many documents in the runtime, so this should never cost more than a few hundred bytes.
[2] adds a single static boolean. This will cost a single byte for all of Gecko.
[3] parses sheets loaded by the stylesheet service in servo mode as well as gecko mode (since they're system-wide). This could plausibly increase overhead, but usage of the stylesheet service is very rare (mostly limited to addons and tests I think), so I would be quite surprised if this was a problem.
[4] invokes some static servo initialization code on startup, which calls [5]. But that doesn't seem to initialize more than a couple of bytes here and there.

[1] http://searchfox.org/mozilla-central/rev/ae94cfb36d804727dfb38f2750bfcaab4621df6e/dom/base/nsIDocument.h#3040
[2] http://searchfox.org/mozilla-central/rev/ae94cfb36d804727dfb38f2750bfcaab4621df6e/layout/base/nsLayoutUtils.h#2953
[3] http://searchfox.org/mozilla-central/rev/ae94cfb36d804727dfb38f2750bfcaab4621df6e/layout/base/nsStyleSheetService.cpp#260
[4] http://searchfox.org/mozilla-central/rev/ae94cfb36d804727dfb38f2750bfcaab4621df6e/layout/build/nsLayoutStatics.cpp#309
[5] http://searchfox.org/mozilla-central/rev/ae94cfb36d804727dfb38f2750bfcaab4621df6e/servo/ports/geckolib/glue.rs#136
And to be clear, even with stylo prefed on, 5% would be a much higher memory regression than we're targeting.

Regardless, if this is happening with stylo disabled, we need someone to investigate what the heck is going on.
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #27)
> And to be clear, even with stylo prefed on, 5% would be a much higher memory
> regression than we're targeting.

And to that end, are we 100% sure we only built stylo and didn't accidentally enable it? Given that we haven't finished certain key style struct sharing optimizations, 5% could be a plausible memory regression with stylo enabled at present.
(In reply to Carsten Book [:Tomcat] from comment #21)
> https://hg.mozilla.org/mozilla-central/rev/1b5000300190
> https://hg.mozilla.org/mozilla-central/rev/28585cf7da6f

(Nightly user) Can't verify: I built from m-c some minutes ago on Debian Testing x64 and have layout.css.servo.enabled;true - is there any way I can see if Stylo is really enabled? Have you thought about showing this information on about:support (like it is done for Webrender)?
(In reply to Darkspirit from comment #29)
> (In reply to Carsten Book [:Tomcat] from comment #21)
> > https://hg.mozilla.org/mozilla-central/rev/1b5000300190
> > https://hg.mozilla.org/mozilla-central/rev/28585cf7da6f
> 
> (Nightly user) Can't verify: I built from m-c some minutes ago on Debian
> Testing x64 and have layout.css.servo.enabled;true - is there any way I can
> see if Stylo is really enabled? Have you thought about showing this
> information on about:support (like it is done for Webrender)?

Easiest way to be 100% sure is to install the Gecko Profiler [1], load a webpage, capture a profile, and type 'Servo_TraverseSubtree' in the search box. If you see samples in that function, you're running stylo. If not, you should see time spent in ResolveStyleFor.

[1] https://perf-html.io/
(In reply to Darkspirit from comment #29)
> (In reply to Carsten Book [:Tomcat] from comment #21)
> > https://hg.mozilla.org/mozilla-central/rev/1b5000300190
> > https://hg.mozilla.org/mozilla-central/rev/28585cf7da6f
> 
> (Nightly user) Can't verify: I built from m-c some minutes ago on Debian
> Testing x64 and have layout.css.servo.enabled;true - is there any way I can
> see if Stylo is really enabled? Have you thought about showing this
> information on about:support (like it is done for Webrender)?

That's ... unexpected. It's not even supposed to be enabled by default anywhere other than in automation.
(In reply to Darkspirit from comment #29)
> and have *set* layout.css.servo.enabled;true
Sorry!
Size differences between libxul.so on the linux64 pgo build just before the enable-stylo landing, and the one on the landing changeset:

$ stat -c '%s %n' */libxul.so
114903144 firefox-stylo/libxul.so
108118952 firefox/libxul.so

That's a 6.7MB size difference, but that may well be mostly symbols.

$ readelf -Wl firefox/libxul.so | awk '$1 == "LOAD" { tot += strtonum($5); } END { print tot }'
75660364
$ readelf -Wl firefox-stylo/libxul.so | awk '$1 == "LOAD" { tot += strtonum($5); } END { print tot }'
81394596

That's a 5.7MB difference in loaded code+data size.

And including BSS:
$ readelf -Wl firefox/libxul.so | awk '$1 == "LOAD" { tot += strtonum($6); } END { print tot }'
76272468
$ readelf -Wl firefox-stylo/libxul.so | awk '$1 == "LOAD" { tot += strtonum($6); } END { print tot }'
82006892

not much more BSS.

All in all, the regression in RSS should be that amount, multiplied by the number of processes (5?).

But the noted regression is about private bytes; I don't remember what that actually counts...

Just in case:

$ ./mach python toolkit/mozapps/installer/unpack.py firefox-stylo
$ grep servo firefox-stylo/greprefs
pref("layout.css.servo.enabled", false);
As of today, you can also load this testcase:

  data:text/html,<style>::first-line { color: green }</style><div>Is this green?</div>

If the text is green, you are not using stylo.  If it's black, you're using stylo.

Of course that relies on a stylo bug that we plan to fix soon, so long-term this won't help.
Others have addressed the memory question.

I also just filed bug 1375692 for about:support to gain some kind of Stylo indication. That seems like low hanging fruit, so I may just implement it :)
Flags: needinfo?(gps)
So, I added some logging to the talos script so that it dumps all the lines from /proc/<pid>/maps that it adds up to compute private bytes.

Here are the things of note in the resulting data:
- there are only 2 content processes running, so 3 total processes.
- the difference due to libxul.so is... 319488 bytes
- all the rest of the difference is due to mapped, private memory. IOW, mostly the heap.
- some of that is due to differences in BSS sizes in some libraries. For libxul.so, that's 12288 bytes (on top of the 319488 mentioned above); that's exactly one page per process. Presumably, there's no BSS difference for other libraries, but it's hard to verify in /proc/pid/maps output.
- For the very first private bytes sample, the difference is only 11.5MB (for 2 processes, though), which suggests the rest of the regression is not related to something that happens during startup.

If RSS hasn't regressed, that would suggest we're mapping memory we're not using, aka wasting address space. Not really a problem on 64-bits, but definitely one on 32-bits. But maybe RSS did also regress but stayed below some threshold?
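For reference, a rough sketch of deriving a private-bytes number from /proc/<pid>/maps (hypothetical Python; talos' actual accounting may differ in detail):

  def private_bytes(pid):
      # Sum the sizes of all private, writable mappings; this counts
      # mapped address space whether or not the pages are resident.
      total = 0
      with open('/proc/%d/maps' % pid) as f:
          for line in f:
              fields = line.split()
              addrs, perms = fields[0], fields[1]
              if perms.endswith('p') and 'w' in perms:
                  start, end = (int(x, 16) for x in addrs.split('-'))
                  total += end - start
      return total

Counting mapped-but-unused address space this way is consistent with the observation above that private bytes can regress while RSS stays flat.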
So, for one thing, the stylo threads seem to be started up, because their associated jemalloc arenas are there. From running a build with stylo built-in locally, I see 3 additional arenas, each mapping one chunk of 1MB. Times 3 processes, that would possibly account for 9MB of largely unused memory (there *are* some allocations happening in those arenas, though (about 40k per arena), but they all seem to be related to initialization, that is, there doesn't seem to be more of them after visiting web sites). That's close enough to the difference between the first private bytes samples that I'm ready to say the additional difference might be noise.

Now the question is where the remaining 40MB come from.
(In reply to Mike Hommey [:glandium] from comment #37)
> So, for one thing, the stylo threads seem to be started up, because their
> associated jemalloc arenas are there.

Ahah! I'd have thought we wouldn't fire up the thread pools until actual parallel styling work happened, but we must be doing it somewhere. It's a lazy_static, but it's in the GLOBAL_STYLE_DATA data structure [1], which gets touched in enough places that it's plausible to me that we're triggering it somehow. For example, if a stylesheet happens to get loaded in the stylesheet service, and we parse it for Servo, we'd end up triggering it at [2]. Can you verify if that's what happens?

> From running a build with stylo
> built-in locally, I see 3 additional arenas, each mapping one chunk of 1MB.
> Times 3 processes, that would possibly account for 9MB of largely unused
> memory (there *are* some allocations happening in those arenas, though
> (about 40k per arena)

Hm, I wonder what that would be. Assuming the parsing theory is correct, and we're not actually doing any parallel traversals, the only code I can imagine running in the arena is the thread initialization routine at [3]. But does that really allocate 40k?

Anyway, if the above holds, the easiest solution is probably just to move the threadpool into its own separate lazy static, which we could call STYLE_THREAD_POOL. That would prevent it from being initialized when we parse stylesheets.

> , but they all seem to be related to initialization,
> that is, there doesn't seem to be more of them after visiting web sites).
> That's close enough to the difference between the first private bytes
> samples that I'm ready to say the additional difference might be noise.

> 
> Now the question is where the remaining 40MB come from.

[1] http://searchfox.org/mozilla-central/rev/3291398f10dcbe192fb52e74974b172616c018aa/servo/components/style/gecko/global_style_data.rs#92
[2] http://searchfox.org/mozilla-central/rev/3291398f10dcbe192fb52e74974b172616c018aa/servo/ports/geckolib/glue.rs#767
[3] http://searchfox.org/mozilla-central/rev/3291398f10dcbe192fb52e74974b172616c018aa/servo/components/style/gecko/global_style_data.rs#36
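A minimal sketch of that split (field and crate details assumed, not taken from the real tree; the modern rayon::ThreadPoolBuilder API is used here, the 2017-era crate spelled this differently). Keeping the pool behind its own lazy_static means that merely touching GLOBAL_STYLE_DATA, e.g. during stylesheet parsing, no longer creates the worker threads:

  #[macro_use]
  extern crate lazy_static;
  extern crate num_cpus;
  extern crate rayon;

  pub struct GlobalStyleData {
      pub num_threads: usize,
      // ... other cheap-to-initialize shared state
  }

  pub struct StyleThreadPool {
      pub pool: Option<rayon::ThreadPool>,
  }

  lazy_static! {
      pub static ref GLOBAL_STYLE_DATA: GlobalStyleData = GlobalStyleData {
          num_threads: ::std::cmp::max(num_cpus::get() * 3 / 4, 1),
      };

      // Only dereferenced when a parallel traversal actually runs, so
      // parsing a sheet no longer spins up the pool as a side effect.
      pub static ref STYLE_THREAD_POOL: StyleThreadPool = StyleThreadPool {
          pool: rayon::ThreadPoolBuilder::new()
              .num_threads(GLOBAL_STYLE_DATA.num_threads)
              .build()
              .ok(),
      };
  }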
(In reply to Bobby Holley (:bholley) (busy with Stylo) from comment #38)
> (In reply to Mike Hommey [:glandium] from comment #37)
> > So, for one thing, the stylo threads seem to be started up, because their
> > associated jemalloc arenas are there.
> 
> Ahah! I'd have thought we wouldn't fire up the thread pools until actual
> parallel styling work happened, but we must be doing it somehow. It's a
> lazy_static, but it's in the GLOBAL_STYLE_DATA data structure [1], which can
> get bumped in enough places that it's plausible to me that we're doing that
> somehow. For example, if a stylesheet happens to get loaded in the
> stylesheet service, and we parse it for Servo, we'd end up triggering it at
> [2]. Can you verify if that's what happens?

Seems to be it:
#0  0x00007f3c276914e9 in std::sync::once::Once::call_once::_$u7b$$u7b$closure$u7d$$u7d$::h93aefed2d775f018 ()
    at /home/glandium/f/libxul.so
#1  0x00007f3c278b2243 in std::sync::once::Once::call_inner::h3176ed88d2cbec9a () at /home/glandium/f/libxul.so
#2  0x00007f3c2740e674 in Servo_StyleSheet_FromUTF8Bytes () at /home/glandium/f/libxul.so
#3  0x00007f3c2589a834 in mozilla::ServoStyleSheet::ParseSheet(mozilla::css::Loader*, nsAString const&, nsIURI*, nsIURI*, nsIPrincipal*, unsigned int, nsCompatibility, mozilla::css::LoaderReusableStyleSheets*) () at /home/glandium/f/libxul.so
#4  0x00007f3c25877067 in mozilla::css::Loader::ParseSheet(nsAString const&, mozilla::css::SheetLoadData*, bool&) ()
    at /home/glandium/f/libxul.so
#5  0x00007f3c25877dc1 in mozilla::css::SheetLoadData::OnStreamComplete(nsIUnicharStreamLoader*, nsISupports*, nsresult, nsAString const&) [clone .part.437] () at /home/glandium/f/libxul.so
#6  0x00007f3c238f6b6c in nsUnicharStreamLoader::OnStopRequest(nsIRequest*, nsISupports*, nsresult) () at /home/glandium/f/libxul.so
#7  0x00007f3c2472b585 in nsSyncLoadService::PushSyncStreamToListener(nsIInputStream*, nsIStreamListener*, nsIChannel*) ()
    at /home/glandium/f/libxul.so
#8  0x00007f3c2587412f in mozilla::css::Loader::LoadSheet(mozilla::css::SheetLoadData*, mozilla::css::StyleSheetState, bool) ()
    at /home/glandium/f/libxul.so

I'm going to try to move the thread pool out of the global style state.
Submitted https://github.com/servo/servo/pull/17492 which I confirmed removes the thread pool initialization. Talos says this makes the whole 50MB regression go away... which is surprising to me. I'll still dig into this to understand where exactly that would come from.
(In reply to Boris Zbarsky [:bz] (if a patch has no decent message, automatic r-) from comment #34)
> As of today, you can also load this testcase:
> data:text/html,<style>::first-line { color: green }</style><div>Is this green?</div>
> If the text is green, you are not using stylo.  If it's black, you're using stylo.

Yep, it's black. Thank you.
Where should/can I report broken sites? Here in Bugzilla, or does it have to be on GitHub?
I don't know if stylo is faster or the same as without, but it's not slower.
1) completely broken: http://www.ardmediathek.de/tv (german public tv)
2) https://www.zdf.de/comedy/neo-magazin-mit-jan-boehmermann/neo-magazin-royale-mit-jan-boehmermann-vom-22-juni-2017-100.html (german public tv): the video is bigger than its parent div, so one can't see the video controls, and if I click on "Download" and then "X" to close that overlay, the font gets larger.
3) The tabs of Bugzilla's comment field (where I am currently typing) have no text.

Or should I just wait some weeks and report then?
Some more data:
- Somehow, talos is getting 6(!) style threads per process. I could only gather jemalloc stats for 2 processes, but both of them had 6 style threads. That's 6MB per process, which would bring us to 18MB for 3 processes.

- In the data I got for the 2 processes, the difference, as seen by jemalloc itself, is around 15.7MB. That's 3+MB more than the expected overhead from the 12 arenas with one chunk per arena, and according to jemalloc, those are all due to fragmentation.

- Extrapolated to 3 processes, we're still far from 50MB, so either the process I didn't get a jemalloc report for has even more overhead than the 2 others somehow, or there are some non-jemalloc allocations happening somehow. I'm also not excluding jemalloc under-reporting its own overhead for those arenas.

- All style arenas but one per process have 3112 bytes allocated in small allocations, and 36864 bytes in one large allocation. The other one per process has 5192 bytes in small allocations, and 36864 bytes in one large allocation. The small allocations have the following exact sizes: 1 * 8, 2 * 16, 1 * 1024, 1 * 2048; the arena with 5192 bytes adds 1 * 32 and 1 * 2048.

- I tried identifying where these allocations come from, but I hit https://github.com/rust-lang/rust/issues/41775, which prevents getting good backtraces. Some manual work identified things related to Rust native strings and FFI strings, one 576-byte allocation from tools/profiler/core/platform.cpp:2021, one allocation for the thread name for the profiler, one 32808-byte allocation from tools/profiler/core/ThreadInfo.cpp:27, etc. All in all, that's a lot of allocations due to the profiler registration added in https://github.com/servo/servo/commit/82bffefbf7820b58d2ddbf0440326af799d1e36b happening after the thread switched to a separate arena, per https://github.com/servo/servo/commit/642cd08f21727ee35dc3dace14d0c9ad5837f380. It actually seems to make sense to move those allocations to the main arena. If I force that to happen, I'm down to 2096 bytes for most arenas, or 4176 bytes for one of them per process, and it looks like they could all come from rayon, but I'm still waiting for rr to hit the breakpoint I've set in jemalloc. I might send a PR to move those allocations to the main arena, but I first want to see why I was getting jemalloc segfaults in gdb in that case, which suggests some race condition.
> - Somehow, talos is getting 6(!) style threads per process.

If STYLO_THREADS is not set, it looks like we use cmp::max(num_cpus::get() * 3 / 4, 1).  num_cpus::get() returns logical CPUs (compare num_cpus::get_physical()).

Is talos running on 2-core or 4-core machines?  6 threads would be consistent with 4-core....
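Worked out (a sketch using the num_cpus crate; the formula is the one quoted above):

  extern crate num_cpus;
  use std::cmp;

  fn default_style_threads() -> usize {
      cmp::max(num_cpus::get() * 3 / 4, 1)
  }

  fn main() {
      // e.g. 4 hyperthreaded cores -> 8 logical CPUs -> 8 * 3 / 4 = 6
      println!("{}", default_style_threads());
  }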
(In reply to Darkspirit from comment #41)
> (In reply to Boris Zbarsky [:bz] (if a patch has no decent message,
> automatic r-) from comment #34)
> > As of today, you can also load this testcase:
> > data:text/html,<style>::first-line { color: green }</style><div>Is this green?</div>
> > If the text is green, you are not using stylo.  If it's black, you're using stylo.
> 
> Yep, it's black. Thank you.
> Where should/can I report broken sites? Here in bugzilla or has it to be on
> Github?
> I don't know if stylo is faster or the same as without, but it's not slower.

That's great to hear, and matches what we see on Tp (which currently shows them about the same). In reality stylo is much faster doing the actual styling work, but there's some other overhead (that we're working on fixing) that's cancelling out the wins.

> 1) completely broken: http://www.ardmediathek.de/tv (german public tv)
> 2)
> https://www.zdf.de/comedy/neo-magazin-mit-jan-boehmermann/neo-magazin-royale-
> mit-jan-boehmermann-vom-22-juni-2017-100.html (german public tv) Video is
> bigger than it's parent div, so one can't see the video controls and if I
> click on "Download" and "X" to close that layover, the font is getting large.
> 3) The tabs if bugzilla's comment field (where I am currently typing in)
> have no text.
> 
> Or should I just wait some weeks and report then?

No, this is really great. Can you please file these as separate bugs in the Core:CSS component, with "stylo: " in the title, and make them block the stylo-site-issues metabug (bug 1375906)? Thanks!
(In reply to Mike Hommey [:glandium] from comment #42)
> Some more data:
> - Somehow, talos is getting 6(!) style threads per process. I could only
> gather jemalloc stats for 2 processes, but both of them had 6 style threads.
> That's 6MB per process, which would bring us to 18MB for 3 processes.

Yes, that matches my mental model. I was a bit surprised in comment 37 to see you talking about only three threads, but I now realize you were talking about a local machine, which is presumably 2-core (or 4 non-hyperthreaded cores).

My understanding is that the Talos machines are 4-core Intel chips, and that we run 6 threads on those cores (per the formula bz described). You can set STYLO_THREADS=X as an environment variable to tweak the count if it helps your investigation.

> 
> - In the data I got for the 2 processes, the difference, as seen by jemalloc
> itself, is around 15.7MB. That's 3+MB more than the expected overhead from
> the 12 arenas with one chunk per arena, and according to jemalloc, those are
> all due to fragmentation.

Hm, that's weird. The whole point of allocating 1MB arenas is to avoid fragmentation...

> 
> - Extrapolated to 3 processes, we're still far from 50MB, so either the
> process I didn't get a jemalloc report for has even more overhead than the 2
> others somehow, or there are some non-jemalloc allocations happening
> somehow. I'm also not excluding jemalloc under-reporting its own overhead
> for those arenas.
> 
> - All style arenas but one per process have 3112 bytes allocated in small
> allocations, and 36864 bytes in one large allocation. The other one per
> process has 5192 bytes in small allocations, and 36864 bytes in one large
> allocation. The small allocations are with the following exact sizes: 1 * 8,
> 2 * 16, 1* 1024, 1* 2048... the arena with 5192 bytes, add 1 * 32 and 1 *
> 2048.
> 
> - I tried identifying where these allocations come from, but I hit
> https://github.com/rust-lang/rust/issues/41775 which prevents from getting
> good backtraces. Some manual work identified some things related to rust
> native strings and ffi strings, one 576 bytes allocation from
> tools/profiler/core/platform.cpp:2021, one allocation for the thread name
> for the profile, one 32808 allocation from
> tools/profiler/core/ThreadInfo.cpp:27, etc. All in all, that's a lot of
> things due to the profiler registration added in
> https://github.com/servo/servo/commit/
> 82bffefbf7820b58d2ddbf0440326af799d1e36b happening after the thread switched
> to use a separate arena, per
> https://github.com/servo/servo/commit/
> 642cd08f21727ee35dc3dace14d0c9ad5837f380 . It actually seems to make sense
> to move those allocations to the main arena.

Yeah, agreed. We really only need the fancy per-thread arenas for the style traversal.

> If I force that to happen, I'm
> down to 2096 for most arenas, or 4176 bytes for one of them per process, and
> it looks like they could all come from rayon, but I'm still waiting for rr
> to hit the breakpoint I've set in jemalloc. I might send a PR to move those
> allocations to the main arena, but I first want to see why I was getting
> jemalloc segfaults in gdb in that case, suggesting some race condition.
> 1) completely broken: http://www.ardmediathek.de/tv (german public tv)

That's bug 1372041 at first glance.  Nixing that one media="print" link from the DOM makes it fine.

Bugs on the rest would be great.
(In reply to Boris Zbarsky [:bz] (if a patch has no decent message, automatic r-) from comment #46)
> > 1) completely broken: http://www.ardmediathek.de/tv (german public tv)
> 
> That's bug 1372041 at first glance.  Nixing that one media="print" link from
> the DOM makes it fine.

Filing a bug on this too, and making it depend on bug 1372041, would be fine I think. That way we'll remember to re-test the site once the dependent issue is fixed.

Also, congratulations on being our first Stylo tester! :-)
Blocks: 1375925
I filed bug 1375925 to continue investigating the memory issues. Let's move the discussion there, since this bug is closed and rather overloaded anyway.
Blocks: 1375976
I see the private bytes issue fixed in this range:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=13d724745f9fc3e6c3c768a91d264c71cf02f626&tochange=4d8da1f6e9e1b962cd5c94f5e0b577c173f7883a

which includes :glandium's PR:
https://github.com/servo/servo/pull/17492

== Change summary for alert #7529 (as of June 23 2017 20:47 UTC) ==

Improvements:

  5%  tp5o_webext Private Bytes linux64 opt e10s     1,264,828,390.42 -> 1,207,399,837.12
  5%  tp5o Private Bytes linux64 opt e10s            1,258,016,228.94 -> 1,201,419,362.32

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=7529
Product: Core → Firefox Build System