Closed Bug 1346356 Opened 8 years ago Closed 8 years ago

Deadlock in the profiler because locked_profiler_stop notifies observers

Categories

(Core :: Gecko Profiler, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla55
Tracking Status
firefox52 --- unaffected
firefox-esr52 --- unaffected
firefox53 --- unaffected
firefox54 --- unaffected
firefox55 --- fixed

People

(Reporter: mstange, Assigned: n.nethercote)

References

(Blocks 1 open bug)

Details

(Keywords: regression)

Attachments

(4 files, 1 obsolete file)

Example call stacks: > [...] > NS_InvokeByIndex > nsProfiler::StopProfiler > profiler_stop > locked_profiler_stop <-- !! > nsObserverService::NotifyObservers > SharedStub > PrepareAndDispatch > nsXPCWrappedJSClass::CallMethod > JS_CallFunctionValue > js::Call > [... js stuff] > mozilla::dom::Element::SetAttr > mozilla::dom::Element::SetAttrAndNotify > nsNodeUtils::AttributeChanged > mozilla::PresShell::AttributeChanged > mozilla::GeckoRestyleManager::AttributeChanged > nsImageBoxFrame::AttributeChanged > mozilla::PresShell::FrameNeedsReflow > nsRefreshDriver::AddLayoutFlushObserver > profiler_get_backtrace > PR_Lock > [...] > mozilla::dom::ContentChild::RecvStopProfiler > profiler_stop > locked_profiler_stop <-- !! > nsObserverService::NotifyObservers > SharedStub > PrepareAndDispatch > nsXPCWrappedJSClass::CallMethod > JS_CallFunctionValue > js::Call > js::InternalCallOrConstruct > js::RunScript > [... js stuff] > js::jit::Bailout > js::jit::BailoutIonToBaseline > ProfilerJSEventMarker > PR_Lock
Bleh. I'll take a look on Monday.
Assignee: nobody → n.nethercote
Blocks: 1273422
Oh dear. I assigned this to myself and said I would look at it on Monday, but I ended up looking at it yesterday and writing some patches. They are quite similar to yours, but do a bit more refactoring as well, and I fixed a leak I found along the way. Also, I think yours misses the fact that profiler_init() calls profiler_start(). How about I upload them now for your consideration? They're a little under-tested, but I might as well show you them now given the circumstances.
profiler_start() can call locked_profiler_stop(). In that case it must then call SamplerThread::Join() in order to free the SamplerThread, but it currently doesn't. This patch adds such a call. It also marks locked_profiler_start() with MOZ_MUST_USE to make this mistake less likely in the future.
Attachment #8846385 - Flags: review?(mstange)
SamplerThread::Join() is always called just before the SamplerThread object is deleted. Might as well combine them.
Attachment #8846386 - Flags: review?(mstange)
Calling NotifyObserver() with gPSMutex locked is a bad idea; arbitrary code can run in observers, which can easily include code that calls other profiler functions that themselves lock gPSMutex, causing deadlock. This has been seen in practise with locked_profiler_stop(). This patch moves all but one of the NotifyObserver() calls in platform.cpp to after the sections where gPSMutex is locked. The remaining call (for the "profiler-subprocess") is harmless, because it just calls a simple callback implemented within platform.cpp, and hard to move. In the future we plan to allow profiler_start() and profiler_stop() to be called from different threads. When that happens, it will be possible for the "profiler-start" and "profiler-stop" notifications to arrive out of order. Unfortunately this seems difficult to avoid. (Well, recursive mutexes would make this problem much easier, but we don't have those...)
Attachment #8846387 - Flags: review?(mstange)
This patch does the following. - Introduces NotifyObservers() for the simple notification cases in platform.cpp. - Removes profiler_lock() and profiler_unlock() because they do notifications that the profiler add-on no longer listens for.
Attachment #8846388 - Flags: review?(mstange)
It sounds like you have confirmed that this deadlock is the cause of bug 1273422? Good to know.
(In reply to Nicholas Nethercote [:njn] from comment #4) Oh, oops. Well, I didn't spend very much time on it, and I trust your patches more than I trust mine. :) (In reply to Nicholas Nethercote [:njn] from comment #9) > It sounds like you have confirmed that this deadlock is the cause of bug > 1273422? Good to know. I'm not 100% certain but the log does say "###!!! ERROR: Potential deadlock detected:" so I think it's fairly likely. Thanks for uploading your patches. On the other hand now I can't promise I'll look at them before Monday (your Tuesday)...
Comment on attachment 8846352 [details] Bug 1346356 - Do not notify observers while gPSMutex is locked. https://reviewboard.mozilla.org/r/119408/#review121304 I will r- because my changes subsume yours, and ultimately I think leave the code in a better state overall. Sorry for the overlap!
Attachment #8846352 - Flags: review?(n.nethercote) → review-
> Thanks for uploading your patches. On the other hand now I can't promise > I'll look at them before Monday (your Tuesday)... It's a public holiday here on Monday anyway :)
Comment on attachment 8846385 [details] [diff] [review] (part 1) - Fix SamplerThread leak in profiler_start() Review of attachment 8846385 [details] [diff] [review]: ----------------------------------------------------------------- ::: tools/profiler/core/platform.cpp @@ +2359,4 @@ > LOG("END profiler_start"); > } > > +static MOZ_MUST_USE SamplerThread* Nice, I was about to suggest adding a MOZ_MUST_USE here.
Attachment #8846385 - Flags: review?(mstange) → review+
Attachment #8846386 - Flags: review?(mstange) → review+
Comment on attachment 8846387 [details] [diff] [review] (part 3) - Do most NotifyObserver() calls when gPSMutex is unlocked Review of attachment 8846387 [details] [diff] [review]: ----------------------------------------------------------------- ::: tools/profiler/core/platform.cpp @@ +2394,1 @@ > if (samplerThread) { Using samplerThread is much better than the separate wasActive variable that I was using.
Attachment #8846387 - Flags: review?(mstange) → review+
Attachment #8846388 - Flags: review?(mstange) → review+
Attachment #8846352 - Attachment is obsolete: true
https://hg.mozilla.org/integration/mozilla-inbound/rev/3ec4e9dbdcb3f2a47e0618b6c97a3f753f92f1c9 Bug 1346356 (part 1) - Fix SamplerThread leak in profiler_start(). r=mstange. https://hg.mozilla.org/integration/mozilla-inbound/rev/8b0bfd19d13b861440f99b973249ac70f53cd457 Bug 1346356 (part 2) - Merge SamplerThread::Join() into ~SamplerThread(). r=mstange. https://hg.mozilla.org/integration/mozilla-inbound/rev/cbfe870385c09f0a63c46b586eeadb113cbde08a Bug 1346356 (part 3) - Do most NotifyObserver() calls when gPSMutex is unlocked. r=mstange. https://hg.mozilla.org/integration/mozilla-inbound/rev/e4725643ead636a7cb9b8f2103f68a31fe278dbe Bug 1346356 (part 4) - Clean up notification in the profiler. r=mstange.
No longer blocks: 1273422
mstange, how did you encounter this? Just running the browser manually?
(In reply to Nicholas Nethercote [:njn] from comment #16) > mstange, how did you encounter this? Just running the browser manually? Yes, and starting and stopping the profiler. I didn't do anything special.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: