clock_gettime() etc are very slow on some android devices
Categories
(Core :: Performance, defect)
Tracking
()
People
(Reporter: jnicol, Assigned: jnicol)
References
(Blocks 1 open bug)
Details
(Whiteboard: [sp3])
Attachments
(7 files, 1 obsolete file)
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details |
https://share.firefox.dev/3PPYuRI
This shows up a lot in a speedometer (around 4%), and skews comparisons with Chrome as both browsers take timestamps in different places, though appear to be similarly affected overall.
This occurs on several devices I have tested including the Samsung A51, Samsung S9, Pixel 5, OnePlus 8 Pro. On other devices (including Pixel 4, PIxel 6, Samsung S10) this barely shows up in profiles if at all. This isn't a exynos vs snapdragon type thing. Additionally, naively checking the kernel version doesn't show a pattern: if this was broken then fixed in specific versions then perhaps vendor specific backports muddy the water.
It appears that the VDSO fast path is failing for some reason and we end up doing a syscall. I'm trying to get kernel symbols included in the profile to see if that sheds any light on why.
We may need to audit our usage of timestamps on the critical path.
Updated•1 year ago
|
Updated•1 year ago
|
Assignee | ||
Comment 1•1 year ago
|
||
I think I've figured out why this is slow, on the A51 at least.
The code I'm linking to below is from the 4.14 branch of the android common kernel. The actual kernel used on the device will differ but hopefully the relevant bits are still the same.
See ARM64_ERRATUM_858921
, which affects Cortex-A73 CPUs (which the Samsung A51 has). Presumably this workaround is therefore enabled.
We can see here this creates a arch_timer_erratum_workaround
instance with the read_cntvct_el0
field set.
We can then see here that if that field is set, we set clocksource_counter.archdata.vdso_direct
to false
.
Then in update_vsyscall()
, which is where the kernel writes its time data to the vdso data page, if archdata.vdso_direct
is false then it sets vdso_data.use_syscall
to true.
Then finally, see the actual clock_gettime() implementation in the vdso. This uses the syscall_check
macro, which reads the value of vdso_data.use_syscall
and branches to the syscall fallback if it's set.
Assignee | ||
Comment 2•1 year ago
|
||
CLOCK_MONOTONIC_COARSE
is unaffected by this issue, it stays in the VDSO fast path. So at the very least we can add an implementation of TimeStamp::NowLowRes()
using this.
Additionally we can then audit our existing usage of timestamps, and see if there's any we can either remove or switch to low resolution.
Assignee | ||
Comment 3•1 year ago
|
||
The Pixel 5 family is also affected via the same mechanism, but due to a different CPU erratum, which affects Cortex A76 and Kyro 4G (of which the pixel 5 has the latter core).
While the pixel 6 family has a Cortex A76, it is not affected due to this commit which restricts that workaround to AArch32 mode.
There are other affected devices too with different CPUs, and sometimes 2 devices may have the same CPU and kernel version but one is affected and one is not. But I'm confident enough that this is our cause, and will chaulk that up to vendors cherry-picking different patches on to their kernels.
Assignee | ||
Comment 4•1 year ago
|
||
We have discovered that clock_gettime(CLOCK_MONOTONIC) can be slow on
certain arm64 devices due to kernel workarounds for CPU errata
avoiding the VDSO fast-path. Using CLOCK_MONOTONIC_COARSE is
unnaffected by these issues. This patch adds an implementation of
TimeStamp::NowLoRes() using the coarse clock, meaning that when lower
precision timestamps are adequate we do not pay the penalty of hitting
the slow path.
Updated•1 year ago
|
Assignee | ||
Comment 5•1 year ago
|
||
On certain devices we have seen that taking frequent timestamps can be
expensive. Currently we measure the time taken to write protect JIT
code, and as this is a hot path we see these timestamps show up in
profiles. However, since bug 1835876 we no longer write protect code
in content processes. This patch therefore avoids measuring the time
when write protection is disabled.
Depends on D185004
Assignee | ||
Comment 6•1 year ago
|
||
On certain devices we have seen that taking frequent high-resolution
timestamps can be expensive, therefore it makes sense to use
low-resolution timestamps instead where possible.
js::Nursery::isUnderused() and CCGCScheduler::MaybePokeCC() check for
timeouts of > 1 second, therefore the low resolution timer is more
than adequate.
Depends on D185005
Assignee | ||
Comment 7•1 year ago
|
||
On certain devices we have seen that taking frequent timestamps can be
expensive, therefore it makes sense to reduce the number of timestamps
taken in hot paths. Measuring the time taken in refresh driver phases
and in NotifyObservers() shows up in profiles. However, these values
are only used to report to telemetry and we do not actually collect
this telemetry on Android. This patch therefore avoids measuring these
durations on Android.
Depends on D185006
Assignee | ||
Comment 8•1 year ago
|
||
As taking timestamps can be expensive on certain devices we should try
to avoid taking them where possible. This patch fixes a few occasions
where we take a timestamp solely for a profiler marker even when the
current thread is not being profiled.
Depends on D185007
Assignee | ||
Comment 9•1 year ago
|
||
On certain devices we have seen that taking frequent timestamps can be
expensive, therefore it makes sense to reduce the number of timestamps
taken in hot paths. HttpChannelChild::OnStopRequest() currently takes
multiple timestamps (for profiler markers and telemetry measurements)
without performing meaningful work in between. This patch replaces
these with a single timestamp used multiple times.
Depends on D185008
Assignee | ||
Comment 10•1 year ago
|
||
On certain devices we have seen that taking frequent timestamps can be
expensive, therefore it makes sense to reduce the number of timestamps
taken in hot paths. nsTimerImpl::Fire() currently takes a timestamp
which only gets used in logging. This patch therefore makes it so the
timestamp is only taken when logging is enabled.
Depends on D185009
Updated•1 year ago
|
Updated•1 year ago
|
Updated•1 year ago
|
Comment 11•1 year ago
|
||
This is a more generic version of D185008.
Comment 12•1 year ago
|
||
Updated•1 year ago
|
Comment 13•1 year ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/dfac5b1681a1
https://hg.mozilla.org/mozilla-central/rev/e1cf0b9086ba
https://hg.mozilla.org/mozilla-central/rev/cf8e2772f907
https://hg.mozilla.org/mozilla-central/rev/65677e6aa05d
https://hg.mozilla.org/mozilla-central/rev/01be17916dd3
https://hg.mozilla.org/mozilla-central/rev/e03410fcc17d
https://hg.mozilla.org/mozilla-central/rev/7b7e075d34e7
Comment 14•1 year ago
|
||
Comment 15•1 year ago
|
||
Backed out for bustages on ProfilerMarkers.h
Backout link: https://hg.mozilla.org/integration/autoland/rev/df7fb7f4fde75763265536c8098274ee9d22677d
Log link: https://treeherder.mozilla.org/logviewer?job_id=425518473&repo=autoland&lineNumber=27635
Assignee | ||
Comment 16•1 year ago
|
||
redirecting ni to emilio
Comment 17•1 year ago
|
||
Comment on attachment 9348145 [details]
Bug 1843534 - Ensure profiler_add_marker arguments are not evaluated when not profiled. r=canaltinova,aabh
Revision D185800 was moved to bug 1848130. Setting attachment 9348145 [details] to obsolete.
Comment 18•1 year ago
|
||
Let me move that to a separate bug to ease tracking.
Updated•1 year ago
|
Description
•