profiler_capture_backtrace_into should not lock the main Profiler mutex
Categories
(Core :: Gecko Profiler, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox93 | --- | fixed |
People
(Reporter: mozbugz, Assigned: mozbugz)
References
(Blocks 1 open bug)
Details
Attachments
(7 files)
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details |
As evidenced in https://share.firefox.dev/3iMEBu4 (look at "StreamTrans #52", where lots of IOs create lots of markers, which want to capture the stack), because markers use profiler_capture_backtrace_into
, which locks the global profiler mutex, it looks like there is a massive profiler overhead, when in fact it's "only" because of the shared lock, which makes it more likely that markers will get stuck attempting to lock the mutex, and then that mutex-locking function will get sampled.
To solve this, profiler_capture_backtrace_into
should not lock the profiler mutex.
Some problems to overcome:
- Access the
RegisteredThread
. But since we always capture the stack in the current thread, it shouldn't need to lock the mutex to accessRegisteredThread
, or at least theRacy
part of it, but I see code that uses some non-racy information, I'm hoping this could be moved to the racy part? To be investigated. - Access to
ActivePS::Feature...
, we should be able to switch toRacyFeatures
instead. DoSharedSample
uses a sharedJsFrameBuffer
fromCorePS
. A comment there mentions that it used to be on the stack, but that caused overflows. I think the easiest step would be to have a second buffer that is protected by a separate mutex, so it doesn't clash with the Profiler's mutex; this would solve the main issue with the over-representation ofprofiler_capture_backtrace_into
in samples. It would of course interact between different threads, but hopefully it's less of a problem; follow-ups could involve multiple buffers, and/or on-stack buffers if we can guarantee no overflow, etc.- Whatever else I may have missed!
Assignee | ||
Comment 1•3 years ago
|
||
Registers::SyncPopulate()
now uses a ucontext_t that's stored with the Registers
object, so it can safely be called from parallel threads.
Depends on D121978
Assignee | ||
Comment 2•3 years ago
|
||
Also added extra nullptr checks to avoid surprises if calling code ever tries to use lul before it's first set.
Depends on D122083
Assignee | ||
Comment 3•3 years ago
|
||
All implementations of DoNativeBacktrace are now thread-safe, so it's not necessary to make their use dependent on the Profiler's gPSMutex being locked.
Depends on D122084
Assignee | ||
Comment 4•3 years ago
|
||
Depends on D122085
Assignee | ||
Comment 5•3 years ago
|
||
MergeStack requires a fairly large buffer to store JS frames, too big to be allocated on the stack without risking a stack overflow.
Until now, there was only one buffer, stored in CorePS, and only accessible while holding the Profiler gPSMutex.
Now each thread that has a JSContext, also has its own JS frame buffer, which is accessible on the thread without needing any lock.
The Profiler's Sampler still uses the CorePS buffer for its periodic sampling, but it won't prevent parallel on-thread sampling anymore.
The appropriate buffer is passed to ExtractJsFrames and then MergeStacks.
MergeStacks accepts a null pointer, which happens on threads where there is no JSContext, and therefore no JS to sample.
Depends on D122086
Assignee | ||
Comment 6•3 years ago
|
||
Since these functions don't need to access profiler functions requiring a lock, they themselves don't need that lock anymore.
Depends on D122087
Assignee | ||
Comment 7•3 years ago
|
||
profiler_capture_backtrace_into
now only uses thread-safe functions: ThreadRegistration::WithOnThreadRefOf, Register::SyncPopulate, DoSyncSample.
So we don't need to lock the main profiler mutex anymore.
This means that on-thread sampling (typically used in markers) can happen at the same time the periodic sampler has locked the profiler mutex and is sampling this and other threads.
Depends on D122088
Comment 8•3 years ago
|
||
Backed out for Android build bustages
Backout link : https://hg.mozilla.org/integration/autoland/rev/119cef315d6b2879d731550cf98573dced22ad34
Push with failures : https://treeherder.mozilla.org/jobs?repo=autoland&resultStatus=pending%2Crunning%2Csuccess%2Ctestfailed%2Cbusted%2Cexception&searchStr=android%2C5.0%2Caarch64%2Copt%2Cbuild-android-aarch64%2Fopt%2Cb&revision=e6a849ee0e31bff064cafa9322ec59756d690791
Failure log: https://treeherder.mozilla.org/logviewer?job_id=349448269&repo=autoland&lineNumber=23637
Assignee | ||
Comment 9•3 years ago
|
||
It looks like Registers
cannot be passed by-value on aarch64. Easy to fix, on it!
Comment 10•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/94e7c8168e46
https://hg.mozilla.org/mozilla-central/rev/5ced356ad936
https://hg.mozilla.org/mozilla-central/rev/47d7b07f68b0
https://hg.mozilla.org/mozilla-central/rev/2772adb20068
https://hg.mozilla.org/mozilla-central/rev/9fb7bba8eec3
https://hg.mozilla.org/mozilla-central/rev/e1e1fb107a1a
https://hg.mozilla.org/mozilla-central/rev/c8933e49a64b
Comment 11•3 years ago
|
||
Comment 12•3 years ago
|
||
This comment was left by mistake so please kindly ignore it.
Description
•