Re-locking non-recursive mutex in profiler_shutdown -> MOZ_LOG -> WriteFile -> IOInterposer -> profiler_get_backtrace
Categories
(Core :: Gecko Profiler, defect, P2)
Tracking
()
People
(Reporter: mozbugz, Assigned: mozbugz)
References
Details
Attachments
(2 files, 1 obsolete file)
Found when closing Firefox that was started with MOZ_LOG=prof:3 MOZ_PROFILER_STARTUP=1 ./mach run
(which now effectively enables "mainthreadio" since bug 1539839) :
###!!! ERROR: Potential deadlock detected:
=== Cyclical dependency starts at
--- Mutex : StaticMutex (currently acquired)
calling context
#01: mozilla::BaseStaticMutex<mozilla::recordreplay::Behavior::Preserve>::Lock (c:\mozilla-source\obj-mc-dbg\dist\include\mozilla\StaticMutex.h:39)
#02: mozilla::BaseAutoLock<PSMutex &>::BaseAutoLock (c:\mozilla-source\obj-mc-dbg\dist\include\mozilla\Mutex.h:165)
#03: profiler_get_backtrace (c:\mozilla-source\mozilla-central\tools\profiler\core\platform.cpp:3730)
#04: mozilla::ProfilerIOInterposeObserver::Observe (c:\mozilla-source\mozilla-central\tools\profiler\gecko\ProfilerIOInterposeObserver.cpp:19)
#05:anonymous namespace'::PerThreadData::CallObservers (c:\mozilla-source\mozilla-central\xpcom\build\IOInterposer.cpp:129) #06: mozilla::IOInterposer::Report (c:\mozilla-source\mozilla-central\xpcom\build\IOInterposer.cpp:474) #07: mozilla::IOInterposeObserver::Observation::Report (c:\mozilla-source\mozilla-central\xpcom\build\IOInterposer.cpp:382) #08:
anonymous namespace'::WinIOAutoObservation::~WinIOAutoObservation (c:\mozilla-source\mozilla-central\xpcom\build\PoisonIOInterposerWin.cpp:148)
#09: `anonymous namespace'::InterposedNtWriteFile (c:\mozilla-source\mozilla-central\xpcom\build\PoisonIOInterposerWin.cpp:269)
#10: WriteFile[C:\WINDOWS\System32\KERNELBASE.dll +0x1ebda]
#11: SetLoadLibraryCallback[C:\Program Files\ConEmu\ConEmu\ConEmuHk64.dll +0x28a03]
#12: fflush[C:\WINDOWS\System32\ucrtbase.dll +0x16db8]
#13: write[C:\WINDOWS\System32\ucrtbase.dll +0x15df5]
#14: write[C:\WINDOWS\System32\ucrtbase.dll +0x15c7f]
#15: o_iswalpha[C:\WINDOWS\System32\ucrtbase.dll +0x158b7]
#16: fclose_nolock[C:\WINDOWS\System32\ucrtbase.dll +0x172ee]
#17: fclose[C:\WINDOWS\System32\ucrtbase.dll +0x176f0]
#18: vprintf_stderr (c:\mozilla-source\mozilla-central\xpcom\base\nsCRTGlue.cpp:262)#19: fprintf_stderr (c:\mozilla-source\mozilla-central\xpcom\base\nsCRTGlue.cpp:287)
#20: mozilla::LogModuleManager::Print (c:\mozilla-source\mozilla-central\xpcom\base\Logging.cpp:460) #21: mozilla::LogModule::Printv (c:\mozilla-source\mozilla-central\xpcom\base\Logging.cpp:602) #22: mozilla::detail::log_print (c:\mozilla-source\mozilla-central\xpcom\base\Logging.cpp:71) #23: locked_profiler_stop (c:\mozilla-source\mozilla-central\tools\profiler\core\platform.cpp:3392) #24: profiler_shutdown (c:\mozilla-source\mozilla-central\tools\profiler\core\platform.cpp:2918)
=== Cycle completed at
--- Mutex : StaticMutex (currently acquired)
calling context
#01: mozilla::BaseStaticMutex<mozilla::recordreplay::Behavior::Preserve>::Lock (c:\mozilla-source\obj-mc-dbg\dist\include\mozilla\StaticMutex.h:39)
#02: mozilla::BaseAutoLock<PSMutex &>::BaseAutoLock (c:\mozilla-source\obj-mc-dbg\dist\include\mozilla\Mutex.h:165)
#03: profiler_get_backtrace (c:\mozilla-source\mozilla-central\tools\profiler\core\platform.cpp:3730)
#04: mozilla::ProfilerIOInterposeObserver::Observe (c:\mozilla-source\mozilla-central\tools\profiler\gecko\ProfilerIOInterposeObserver.cpp:19)
#05:anonymous namespace'::PerThreadData::CallObservers (c:\mozilla-source\mozilla-central\xpcom\build\IOInterposer.cpp:129) #06: mozilla::IOInterposer::Report (c:\mozilla-source\mozilla-central\xpcom\build\IOInterposer.cpp:474) #07: mozilla::IOInterposeObserver::Observation::Report (c:\mozilla-source\mozilla-central\xpcom\build\IOInterposer.cpp:382) #08:
anonymous namespace'::WinIOAutoObservation::~WinIOAutoObservation (c:\mozilla-source\mozilla-central\xpcom\build\PoisonIOInterposerWin.cpp:148)
#09: `anonymous namespace'::InterposedNtWriteFile (c:\mozilla-source\mozilla-central\xpcom\build\PoisonIOInterposerWin.cpp:269)
#10: WriteFile[C:\WINDOWS\System32\KERNELBASE.dll +0x1ebda]
#11: SetLoadLibraryCallback[C:\Program Files\ConEmu\ConEmu\ConEmuHk64.dll +0x28a03]
#12: fflush[C:\WINDOWS\System32\ucrtbase.dll +0x16db8]
#13: write[C:\WINDOWS\System32\ucrtbase.dll +0x15df5]
#14: write[C:\WINDOWS\System32\ucrtbase.dll +0x15c7f]
#15: o_iswalpha[C:\WINDOWS\System32\ucrtbase.dll +0x158b7]
#16: fclose_nolock[C:\WINDOWS\System32\ucrtbase.dll +0x172ee]
#17: fclose[C:\WINDOWS\System32\ucrtbase.dll +0x176f0]
#18: vprintf_stderr (c:\mozilla-source\mozilla-central\xpcom\base\nsCRTGlue.cpp:262)
#19: fprintf_stderr (c:\mozilla-source\mozilla-central\xpcom\base\nsCRTGlue.cpp:287)
#20: mozilla::LogModuleManager::Print (c:\mozilla-source\mozilla-central\xpcom\base\Logging.cpp:460)
#21: mozilla::LogModule::Printv (c:\mozilla-source\mozilla-central\xpcom\base\Logging.cpp:602)
#22: mozilla::detail::log_print (c:\mozilla-source\mozilla-central\xpcom\base\Logging.cpp:71)
#23: locked_profiler_stop (c:\mozilla-source\mozilla-central\tools\profiler\core\platform.cpp:3392)
#24: profiler_shutdown (c:\mozilla-source\mozilla-central\tools\profiler\core\platform.cpp:2918)
We've had a few similar issues with disk operations initiated from the profiler.
Maybe we need a more systematic way to disable the IO interposer when the profiler lock is taken?
Alternatively, maybe profiler_get_backtrace shouldn't use the lock? (Or use a different lock if stack-walking itself is not reentrant?)
Assignee | ||
Comment 1•6 years ago
|
||
Note: StaticMutex stacktraces gotten by commenting out MOZ_CALLSTACK_DISABLED in BlockingResourceBase.h:21.
Comment 2•6 years ago
|
||
Might also be what I was hitting when attempting to work on bug 1524574.
Assignee | ||
Comment 3•5 years ago
|
||
PSMutex can store the id of the thread owning the lock, and it's safe to check
whether the current thread owns that lock (either it does and no-one else can
change it, or it's another id).
Comments related to memory hooks have been moved to memory_hooks.cpp, because
profiler_is_locked_on_current_thread()
could be used for other things.
Assignee | ||
Comment 4•5 years ago
|
||
Logging could be intercepted by the I/O interposer, which takes the lock, this
could deadlock if the mutex had already been taken in the same thread (e.g., by
a "locked" profiler function that wants to log something).
Now, functions that hold the lock must use LOG_LOCKED(lock, ...)
, which will
skip logging when the I/O interposer is running.
In DEBUG builds, LOG()
checks that the lock is not owned, so find possible
misuses.
LOG()
s outside of platform.cpp were changed to NS_WARNING
s, because they
could not access gPSMutex
that's defined in platform.cpp; and they're really
warnings anyway.
Depends on D49895
Assignee | ||
Comment 5•5 years ago
|
||
I want to modify the PSMutex that Greg wrote in bug 1564475, to check if the current thread really owns the mutex or not.
Then those MOZ_LOG
s in locked profiler functions/scopes could be rewritten to skip logging when the I/O interposer is running, thus avoiding a deadlock (at the cost of some lost logging).
Updated•5 years ago
|
Updated•5 years ago
|
Comment 7•5 years ago
|
||
Backed out 2 changesets for causing bustages in shared-libraries-linux.cc and profiler/core/platform.cpp
Backout link: https://hg.mozilla.org/integration/autoland/rev/5d748daa45d3bd1893a47a377eeff6c8608021ee
Failure logs:
- src/tools/profiler/core/shared-libraries-linux.cc:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=272700366&repo=autoland&lineNumber=27770 - src/tools/profiler/core/platform.cpp:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=272700381&repo=autoland&lineNumber=78237
Assignee | ||
Comment 8•5 years ago
|
||
Thank you Alexandru.
Note to self: NS_WARNING
is not printf
!
I see now that my Try didn't include Android. 😅
Comment 9•5 years ago
|
||
There are also some assertion failures on profiler/core/platform.cpp:
https://treeherder.mozilla.org/#/jobs?repo=autoland&fromchange=f9e0d3054d08c26e3e58875dfac073dfe035026f&searchStr=windows%2C10%2Cx64%2Cdebug%2Cxpcshell%2Ctests%2Ctest-windows10-64%2Fdebug-xpcshell-e10s-2%2Cx%28x2%29&tochange=71c5259fcfa66a0d76b95df280a8843aab93ecbc&selectedJob=272705135
Log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=272705135&repo=autoland&lineNumber=5825
Assignee | ||
Comment 10•5 years ago
|
||
While working on these issues, I realized that in the first patch I introduced just the right tool to solve this issue at the most common points:
Instead of trying to cleverly pause&resume the interposer around potential I/Os in locked sections of the profiler, and instead of trying not to produce some I/Os (mostly from logging), we could just have the interposer itself check if the mutex is already locked (with profiler_is_locked_on_current_thread()
), in which case it just gives up and won't use potentially-locking profiler functions!
Hopefully this should handle all sources of profiler->IO->interposer->profiler deadlocks, so we won't have to revisit this again.
Assignee | ||
Comment 11•5 years ago
|
||
The source of deadlock is due to profiler->IO->interposer->profiler loops, where
the second profiler call tries to lock the profiler mutex again.
Thanks to profiler_is_locked_on_current_thread
the interposer can now check
whether the mutex is already locked, in which case it just won't call profiler
functions anymore, breaking the loop.
Depends on D49895
Updated•5 years ago
|
Comment 12•5 years ago
|
||
Comment 13•5 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/f0aabdb3bdb4
https://hg.mozilla.org/mozilla-central/rev/7e72dcc08ffb
Updated•5 years ago
|
Description
•