Open Bug 1834977 Opened 1 year ago Updated 1 year ago

layout.css.stylo-threads=0 and/or layout.css.stylo-parallelism-threshold=0 improves sp3 numbers on Android significantly

Status:

NEW

People

(Reporter: smaug, Unassigned)

References

(Blocks 1 open bug)

Details

(Whiteboard: [sp3])

Attachments

(1 file)

WIP: Bug 1834977 - non-parallel stylo is faster on Android, so change the pref to layout.css.stylo-parallelism-threshold=0, r=emilio 1 year ago Olli Pettay [:smaug][bugs@pettay.fi] (deleted), text/x-phabricator-request		Details

Olli Pettay [:smaug][bugs@pettay.fi]

Reporter

Description

•

1 year ago

Android
https://treeherder.mozilla.org/perfherder/comparesubtest?originalProject=try&newProject=try&newRevision=74b475a614a21cd410ab2327ac512b6a17209ba4&originalSignature=4590278&newSignature=4590278&framework=13&application=geckoview&originalRevision=ed3b6f0c54d4a01ac18e70ef4c0fc4a1685b9ccc&page=1&showOnlyConfident=1

But Windows doesn't like it
https://treeherder.mozilla.org/perfherder/comparesubtest?originalProject=try&newProject=try&newRevision=74b475a614a21cd410ab2327ac512b6a17209ba4&originalSignature=4586009&newSignature=4586009&framework=13&application=firefox&originalRevision=ed3b6f0c54d4a01ac18e70ef4c0fc4a1685b9ccc&page=1&showOnlyConfident=1

Jira Integration Bot

Updated

•

1 year ago

See Also: → https://mozilla-hub.atlassian.net/browse/SP3-391

Gregory Pappas [:gregp]

Updated

•

1 year ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1834127

Olli Pettay [:smaug][bugs@pettay.fi]

Reporter

Comment 1

•

1 year ago

I also tried values 2 and 4, and they weren't good either.

Some page load tests for 0 are pending: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=33e9e0370b16185d0c77545b1e1cf787b08625c8&newProject=try&newRevision=721e5df325b8fac16338064a994dec3aef8c7a31

Olli Pettay [:smaug][bugs@pettay.fi]

Reporter

Comment 2

•

1 year ago

Attached file WIP: Bug 1834977 - non-parallel stylo is faster on Android, so change the pref to layout.css.stylo-parallelism-threshold=0, r=emilio (deleted) — Details

Emilio Cobos Álvarez (:emilio)

Comment 3

•

1 year ago

A try with a bigger parallelism threshold: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=46e0f500a3998d351f8c72b787d34414fc4edf25&newProject=try&newRevision=e3778acc9590726d0eff9f9e0b8556693ecb1e14&page=1

Bobby Holley (:bholley)

Comment 4

•

1 year ago

Seems like it would be good to understand this a little better. I could imagine it might be device-dependent, and we don't want to overrotate on the devices we happen to have in CI.

Emilio Cobos Álvarez (:emilio)

Comment 5

•

1 year ago

Yeah, we've reproduced this on multiple devices, but others don't show the same issue.

On my device and Jesup's there's a somewhat reasonable explanation, which is that there's only one P-core even though there are multiple e-cores.

But the A51 in automation has four big cores and four little cores, so that might not be it.

So my current theory, which I need to verify, is about the priority of the stylo threads. Other potential avenues for investigation are looking into whether more style sharing is somehow disproportionately better on Android, whether Android somehow has a lot more context-switching overhead, or jemalloc/TLS arenas.

But yeah I'd avoid disabling parallelism without really understanding this.

Jamie Nicol [:jnicol]

Comment 6

•

1 year ago

Here are some results from testing different values for stylo-threads on a variety of devices. I was struggling to get results on some lower-end devices (perhaps I was hitting the issue where memory pressure clears markers and causes an exception), but I can try again if need be.

Also caveat that the results were fairly noisy, and perhaps I should have been capturing more data than just the score, but it was fairly time consuming already.

https://docs.google.com/spreadsheets/d/1FcZoy85SH3eXFARCyFYOx2ZnYnctB8e-9pISOi-bgzw/edit?usp=sharing

To me it looks like disabling parallelism doesn't really regress anywhere. It seems to make little difference on homogeneous cores. On a 4+4 big/little configuration we sometimes see some improvement. And on devices where the CPU configuration is 1+1+6 or 2+2+4 we see a large improvement.
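For readers unfamiliar with the "4+4" or "1+1+6" notation, here is a minimal sketch of how such a label could be derived from per-core maximum frequencies. It assumes that cores sharing the same maximum frequency belong to the same cluster, which is a simplification, and that the frequencies were obtained elsewhere (on Linux-based systems they may be readable from `/sys/devices/system/cpu/cpu<N>/cpufreq/cpuinfo_max_freq`, though access can vary by device). The function name `cluster_layout` is hypothetical and is not part of Gecko or Stylo.

```rust
use std::collections::BTreeMap;

/// Summarize per-core maximum frequencies (in kHz) as a cluster layout
/// string, fastest cluster first, e.g. "1+1+6".
///
/// Assumption: cores with an identical maximum frequency form one cluster.
fn cluster_layout(max_freqs_khz: &[u64]) -> String {
    // Count how many cores share each distinct maximum frequency.
    // BTreeMap keeps its keys sorted in ascending order.
    let mut counts: BTreeMap<u64, usize> = BTreeMap::new();
    for &freq in max_freqs_khz {
        *counts.entry(freq).or_insert(0) += 1;
    }

    // Walk the clusters from the highest frequency to the lowest and
    // join the per-cluster core counts with '+'.
    counts
        .values()
        .rev()
        .map(|n| n.to_string())
        .collect::<Vec<_>>()
        .join("+")
}
```

With made-up frequencies of one core at 3,000,000 kHz, one at 2,400,000 kHz and six at 1,800,000 kHz, the function returns "1+1+6"; eight cores at the same frequency give "8", the homogeneous case.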

Emilio Cobos Álvarez (:emilio)

Comment 7

•

1 year ago

(In reply to Emilio Cobos Álvarez (:emilio) from comment #3)

A try with a bigger parallelism threshold: https://treeherder.mozilla.org/perfherder/compare?originalProject=try&originalRevision=46e0f500a3998d351f8c72b787d34414fc4edf25&newProject=try&newRevision=e3778acc9590726d0eff9f9e0b8556693ecb1e14&page=1

This seems to improve Android and Windows. This just doubles the parallelism threshold but not the work unit size, so that we guarantee we have full work for two threads before we switch to parallel mode. Does that change seem objectionable, Bobby?
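To illustrate the idea, here is a rough sketch of the decision being described. It is not Stylo's actual code: the constant value, the names `WORK_UNIT_MAX`, `TraversalMode` and `choose_mode`, and the exact comparison are all assumptions made for the example, and it does not model how the prefs or their special values are interpreted.

```rust
/// Hypothetical number of nodes handed to a worker as one unit of work.
/// The real value used by Stylo may differ.
const WORK_UNIT_MAX: usize = 16;

#[derive(Debug, Clone, Copy, PartialEq, Eq)]
enum TraversalMode {
    Sequential,
    Parallel,
}

/// Decide how to traverse, given the number of nodes waiting to be styled.
///
/// `min_nodes_for_parallel` is the threshold under discussion: the
/// traversal only goes parallel once at least that many nodes are pending.
fn choose_mode(
    pending_nodes: usize,
    worker_threads: usize,
    min_nodes_for_parallel: usize,
) -> TraversalMode {
    if worker_threads == 0 || pending_nodes < min_nodes_for_parallel {
        // Not enough work (or no workers): stay on the calling thread.
        TraversalMode::Sequential
    } else {
        TraversalMode::Parallel
    }
}
```

Under these assumptions, a threshold of `WORK_UNIT_MAX` sends 20 pending nodes to the pool even though the second unit is only a quarter full, whereas a threshold of `2 * WORK_UNIT_MAX` keeps them on the calling thread and only goes parallel once two full units (32 nodes) are available.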

Flags: needinfo?(bholley)

Emilio Cobos Álvarez (:emilio)

Updated

•

1 year ago

Depends on: 1835280

Bobby Holley (:bholley)

Comment 8

•

1 year ago

(In reply to Emilio Cobos Álvarez (:emilio) from comment #7)

This seems to improve Android and Windows. This just doubles the parallelism threshold but not the work unit size, so that we guarantee we have full work for two threads before we switch to parallel mode. Does that change seem objectionable, Bobby?

That seems like a reasonable change to make (it's worth testing locally in a few configurations and on various workloads as well).

That said, I think it doesn't really bring us closer to understanding why parallelism doesn't seem to be working as we expect on Android. The e-core thing is a plausible theory, but Jamie indicated that disabling parallelism also doesn't regress devices with homogeneous cores, which isn't what I'd initially expect. So there might be another effect going on that could be useful to understand, and ideally we'd do some profiling and investigation on such a device.

Flags: needinfo?(bholley)

Daniel Holbert [:dholbert]

Updated

•

1 year ago

Severity: -- → S3

Olli Pettay [:smaug][bugs@pettay.fi]

Reporter

Comment 9

•

1 year ago

The MotionMark 1.2 score also seems to improve on the A54 if the value of the pref is 0.

Olli Pettay [:smaug][bugs@pettay.fi]

Reporter

Comment 10

•

1 year ago

Some comparison:
https://share.firefox.dev/3C1XB08 default settings
https://share.firefox.dev/3ozUutg stylo-threads=0

I don't put much trust in the actual times reported by those particular profiles, but whatever the profiler does manage to capture in the stylo threads is mostly futex_wait. That shows up a lot on desktop too. Is that expected?

With 0 stylo threads, the main thread just stays busy and CPU usage is around 90%. With stylo threads we have multiple threads, but the reported CPU usage is much lower, at least for some of the threads.

Since stylo is using so many threads, some of them do get run on little cores. However, even limiting it to just 2 threads isn't as good as 0. I expect that the AndroidUI thread in the parent process, the main thread of the content process, and something else too get to use big cores by default, so stylo might run mostly on little cores. This needs some more investigation.
And once we figure out how to run the main thread on the fastest possible core on devices which have a single very fast core, the difference here would be even larger, I think.

Do we always try to use all the available stylo threads even if there isn't much work to do? Does stylo always wait for all the threads to acknowledge to the main thread that they have finished their work? (In other words, do we always wake up all 6 threads?)

Summary: layout.css.stylo-threads=0 improves sp3 numbers on Android significantly → layout.css.stylo-threads=0 and/or layout.css.stylo-parallelism-threshold=0 improves sp3 numbers on Android significantly

Olli Pettay [:smaug][bugs@pettay.fi]

Reporter

Comment 11

•

1 year ago

Based on profiling with Android GPU Inspector, stylo threads almost always run on little cores (on the A54). And while work runs on those other threads, the big core which runs the main thread is occasionally slowed down, and it takes a bit of time to ramp it up again.

Phabricator Automation

Updated

•

1 year ago

Attachment #9335874 - Attachment description: WIP: Bug 1834977 - non-parallel stylo is faster on Android, so change the pref to layout.css.stylo-threads=0, r=emilio → WIP: Bug 1834977 - non-parallel stylo is faster on Android, so change the pref to layout.css.stylo-parallelism-threshold=0, r=emilio

Emilio Cobos Álvarez (:emilio)

Updated

•

1 year ago

Depends on: 1835923
