Closed Bug 1457545 Opened 6 years ago Closed 6 years ago

Mac Crash deadlock triggered by dlsym()/dlopen() deadlock

Categories

(Toolkit :: Crash Reporting, defect, P1)

57 Branch
Unspecified
macOS
defect

Tracking

()

RESOLVED FIXED
mozilla62
Tracking Status
firefox62 --- fixed

People

(Reporter: haik, Assigned: haik)

References

(Blocks 1 open bug)

Details

Attachments

(4 files)

See also bugs 1457501 and 1395504 which are similar. While running the test testing/marionette/harness/marionette_harness/tests/unit/test_crash.py with a change to make the test wait indefinitely for the browser to cleanly exit (see attached) and the fix for 1457501, I hit test hangs where Firefox has exited but a content process is stuck in a hung state. This problem is triggered by breakpad code in the exception handler thread running ExceptionHandler::WaitForMessage(). After we've finished generating a minidump, we call breakpad_exc_server() to send a response back to the exception server. That triggers a dlsym(3) call which blocks on a lock held by a suspended thread in the middle of a dlopen(3). The dlopen appears to hold a lock which is used to serialize dlsym() and dlopen() calls. Since the lock holder is suspended, it never releases the lock and causes deadlock in the crash handling. Ted tracked down the dlsym() blocking in Darwin/XNU code [1]. Here are some stacks from an instance of the problem. Exception handler thread, blocked trying to call dlsym: frame #0 libsystem_kernel.dylib`__psynch_mutexwait() frame #1 libsystem_pthread.dylib`_pthread_mutex_lock_wait() frame #2 libsystem_pthread.dylib`_pthread_mutex_lock_slow() frame #3 libdyld.dylib`LockHelper::LockHelper() frame #4 libdyld.dylib`dlsym() frame #5 libsystem_kernel.dylib`internal_catch_exception_raise() frame #6 libsystem_kernel.dylib`_Xexception_raise() frame #7 libsystem_kernel.dylib`exc_server() frame #8 XUL`google_breakpad::ExceptionHandler::WaitForMessage() at exception_handler.cc:207 frame #9 XUL`google_breakpad::ExceptionHandler::WaitForMessage() at exception_handler.cc:596 frame #10 libsystem_pthread.dylib`_pthread_body() frame #11 libsystem_pthread.dylib`_pthread_start() frame #12 libsystem_pthread.dylib`thread_start() Main thread, suspended during a call to dlopen, presumably holding a libdyld serialization lock: frame #0 dyld`stat64() frame #1 dyld`dyld::my_stat() frame #2 dyld`dyld::loadPhase5() frame #3 dyld`dyld::loadPhase4() frame #4 dyld`dyld::loadPhase3() frame #5 dyld`dyld::loadPhase1() frame #6 dyld`dyld::loadPhase0() frame #7 dyld`dyld::load() frame #8 dyld`dlopen() frame #9 libdyld.dylib`dlopen() frame #10 libsystem_sandbox.dylib`sandbox_init_with_parameters() frame #11 XUL`mozilla::StartMacSandbox() at Sandbox.mm:282 frame #12 XUL`mozilla::dom::ContentChild::RecvSetProcessSandbox() at ContentChild.cpp:1711 frame #13 XUL`mozilla::dom::ContentChild::RecvSetProcessSandbox() at ContentChild.cpp:1745 frame #14 XUL`mozilla::dom::PContentChild::OnMessageReceived() at PContentChild.cpp:5610 frame #15 XUL`mozilla::ipc::MessageChannel::DispatchAsyncMessage() at MessageChannel.cpp:2141 frame #16 XUL`mozilla::ipc::MessageChannel::DispatchMessage() at MessageChannel.cpp:2071 frame #17 XUL`mozilla::ipc::MessageChannel::MessageTask::Run() at MessageChannel.cpp:1950 frame #18 XUL`nsThread::ProcessNextEvent() at nsThread.cpp:1096 frame #19 XUL`NS_ProcessNextEvent() at nsThreadUtils.cpp:519 frame #20 XUL`mozilla::ipc::MessagePump::Run() at MessagePump.cpp:97 frame #21 XUL`MessageLoop::Run() at message_loop.cc:326 frame #22 XUL`MessageLoop::Run() at message_loop.cc:319 frame #23 XUL`MessageLoop::Run() at message_loop.cc:299 frame #24 XUL`XRE_RunAppShell() at nsEmbedFunctions.cpp:888 frame #25 XUL`MessageLoop::Run() at message_loop.cc:326 frame #26 XUL`MessageLoop::Run() at message_loop.cc:319 frame #27 XUL`MessageLoop::Run() at message_loop.cc:299 frame #28 XUL`XRE_InitChildProcess() at nsEmbedFunctions.cpp:719 frame #29 plugin-container`main() at plugin-container.cpp:50 frame #30 plugin-container`main() at MozillaRuntimeMain.cpp:25 frame #31 plugin-container`start() [1] https://github.com/apple/darwin-xnu/blob/5394bb038891708cd4ba748da79b90a33b19f82e/libsyscall/mach/exc_catcher.c#L55
Assignee: nobody → haftandilian
Priority: -- → P1
The posted changes resume threads before calling breakpad_exc_server(). Needs analysis is needed to determine if this is the best fix.
I'm looking into ways to fix this that don't require resuming threads earlier. And I hope to land this early in 62.
Haik, do you have an update for us?
Flags: needinfo?(haftandilian)
(In reply to Henrik Skupin (:whimboo) from comment #7) > Haik, do you have an update for us? Sorry, some higher priority things came up. I need to study the breakpad implementation a bit more. The posted fix worked well in my tests, but I'm not sure it's the right fix yet. I'll aim to get to this by the end of the week.
Flags: needinfo?(haftandilian)
After looking more at the breakpad code, I don't see a problem with resuming threads before calling breakpad_exc_server() after we've attempted to generate the minidump. At this point, we've already called WriteMinidumpWithException(exit_after_write=true) and therefore attempted to message the parent to write the minidump. Since WriteMinidumpWithException() returned instead of exiting, it must be because its call to crash_generation_client_->RequestDumpForException() fails which may mean we failed to generate a minidump. The call to breakpad_exc_server() ends up forwarding the exception which should result in the process being killed anyway.
Comment on attachment 8971682 [details] Bug 1457545 - Part 3 - Re-enable crash tests for Mac opt builds https://reviewboard.mozilla.org/r/240456/#review251212 Hurray!! Thank you so much!
Attachment #8971682 - Flags: review?(hskupin) → review+
Comment on attachment 8977763 [details] Bug 1457545 - Part 2 - Fix indendation (whitespace change only) https://reviewboard.mozilla.org/r/245250/#review251260 LGTM
Attachment #8977763 - Flags: review?(gsvelto) → review+
Comment on attachment 8971681 [details] Bug 1457545 - Part 1 - Mac Crash deadlock triggered by dlsym()/dlopen() deadlock https://reviewboard.mozilla.org/r/240454/#review251258 Very nice catch!
Attachment #8971681 - Flags: review?(gsvelto) → review+
Pushed by haftandilian@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c57a74c70332 Part 1 - Mac Crash deadlock triggered by dlsym()/dlopen() deadlock r=gsvelto https://hg.mozilla.org/integration/autoland/rev/a532236ec4a8 Part 2 - Fix indendation (whitespace change only) r=gsvelto https://hg.mozilla.org/integration/autoland/rev/e87851499d8f Part 3 - Re-enable crash tests for Mac opt builds r=whimboo
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: