Closed Bug 1716959 Opened 3 years ago Closed 3 years ago

profiler_capture_backtrace_into should not lock the main Profiler mutex

Categories

(Core :: Gecko Profiler, defect, P2)

defect

Tracking

()

RESOLVED FIXED
93 Branch
Tracking Status
firefox93 --- fixed

People

(Reporter: mozbugz, Assigned: mozbugz)

References

(Blocks 1 open bug)

Details

Attachments

(7 files)

(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details
(deleted), text/x-phabricator-request
Details

As evidenced in https://share.firefox.dev/3iMEBu4 (look at "StreamTrans #52", where lots of IOs create lots of markers, which want to capture the stack), because markers use profiler_capture_backtrace_into, which locks the global profiler mutex, it looks like there is a massive profiler overhead, when in fact it's "only" because of the shared lock, which makes it more likely that markers will get stuck attempting to lock the mutex, and then that mutex-locking function will get sampled.

To solve this, profiler_capture_backtrace_into should not lock the profiler mutex.

Some problems to overcome:

  • Access the RegisteredThread. But since we always capture the stack in the current thread, it shouldn't need to lock the mutex to access RegisteredThread, or at least the Racy part of it, but I see code that uses some non-racy information, I'm hoping this could be moved to the racy part? To be investigated.
  • Access to ActivePS::Feature..., we should be able to switch to RacyFeatures instead.
  • DoSharedSample uses a shared JsFrameBuffer from CorePS. A comment there mentions that it used to be on the stack, but that caused overflows. I think the easiest step would be to have a second buffer that is protected by a separate mutex, so it doesn't clash with the Profiler's mutex; this would solve the main issue with the over-representation of profiler_capture_backtrace_into in samples. It would of course interact between different threads, but hopefully it's less of a problem; follow-ups could involve multiple buffers, and/or on-stack buffers if we can guarantee no overflow, etc.
  • Whatever else I may have missed!
No longer depends on: 1721569

Registers::SyncPopulate() now uses a ucontext_t that's stored with the Registers object, so it can safely be called from parallel threads.

Depends on D121978

Also added extra nullptr checks to avoid surprises if calling code ever tries to use lul before it's first set.

Depends on D122083

All implementations of DoNativeBacktrace are now thread-safe, so it's not necessary to make their use dependent on the Profiler's gPSMutex being locked.

Depends on D122084

MergeStack requires a fairly large buffer to store JS frames, too big to be allocated on the stack without risking a stack overflow.
Until now, there was only one buffer, stored in CorePS, and only accessible while holding the Profiler gPSMutex.

Now each thread that has a JSContext, also has its own JS frame buffer, which is accessible on the thread without needing any lock.
The Profiler's Sampler still uses the CorePS buffer for its periodic sampling, but it won't prevent parallel on-thread sampling anymore.

The appropriate buffer is passed to ExtractJsFrames and then MergeStacks.
MergeStacks accepts a null pointer, which happens on threads where there is no JSContext, and therefore no JS to sample.

Depends on D122086

Since these functions don't need to access profiler functions requiring a lock, they themselves don't need that lock anymore.

Depends on D122087

profiler_capture_backtrace_into now only uses thread-safe functions: ThreadRegistration::WithOnThreadRefOf, Register::SyncPopulate, DoSyncSample.
So we don't need to lock the main profiler mutex anymore.

This means that on-thread sampling (typically used in markers) can happen at the same time the periodic sampler has locked the profiler mutex and is sampling this and other threads.

Depends on D122088

It looks like Registers cannot be passed by-value on aarch64. Easy to fix, on it!

Flags: needinfo?(gsquelart)
Pushed by gsquelart@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/d5607038a5d6 Make Registers::SyncPopulate() re-entrant on Linux, for safe use in parallel stack unwinding calls - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/2f97ef8a91de Make CorePS::mLul atomic, to avoid using the profiler lock to access it - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/7c84ffb63567 Lock is not needed anymore in DoNativeBacktrace - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/ef79ed5f95c1 Pass features to DoSharedSamples to remove lock-dependent feature requests - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/c39b9c1696aa On-thread sampling uses a per-thread JS frame buffer that's only allocated when there's also a JSContext - r=canaltinova https://hg.mozilla.org/integration/autoland/rev/eb105787f5bd Remove compulsory proof-of-lock in DoNativeBacktrace and DoSyncSample - r=canaltinova

This comment was left by mistake so please kindly ignore it.

Regressions: 1779257
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: