Closed Bug 1004669 Opened 10 years ago Closed 10 years ago

Intermittent Windows & OS X shutdown timeout with MediaShutdownManager::Shutdown() on the stack

Categories

(Core :: Audio/Video, defect)

defect
Not set
major

Tracking

()

RESOLVED FIXED
mozilla32
Tracking Status
firefox30 --- fixed
firefox31 --- fixed
firefox32 --- fixed
firefox-esr24 --- unaffected
b2g-v1.4 --- fixed
b2g-v2.0 --- fixed

People

(Reporter: emorley, Assigned: jwwang)

References

Details

(Keywords: intermittent-failure, Whiteboard: [qa-] )

Attachments

(3 files, 1 obsolete file)

Breaking out from bug 918759, which has become a dumping ground. On OS X and Windows, we're seeing intermittent failures where we hang during shutdown - and the induced crash shows MediaShutdownManager on the stack. Looking at the timeline, I believe bug 938107 caused this regression, since: https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=918759&entireHistory=true&tree=trunk ...shows a spike at 2013-12-16/2013-12-17 (note the OS X instances prior to that we're bug 957693 and bug 960605 aiui). Example logs: https://tbpl.mozilla.org/php/getParsedLog.php?id=38356405&full=1&branch=try#error0 Windows 7 32-bit try debug test crashtest on 2014-04-23 17:12:28 PDT for push 97491fef6385 17:35:31 INFO - Thread 0 17:35:31 INFO - 0 ntdll.dll + 0x470b4 17:35:31 INFO - eip = 0x772270b4 esp = 0x002af6e8 ebp = 0x002af754 ebx = 0x00000000 17:35:31 INFO - esi = 0x00000050 edi = 0x00000000 eax = 0x005c1d28 ecx = 0x650e6d3c 17:35:31 INFO - edx = 0x650e6d46 efl = 0x00200246 17:35:31 INFO - Found by: given as instruction pointer in context 17:35:31 INFO - 1 kernel32.dll + 0x4baf2 17:35:31 INFO - eip = 0x759cbaf3 esp = 0x002af75c ebp = 0x002af76c 17:35:31 INFO - Found by: previous frame's frame pointer 17:35:31 INFO - 2 kernel32.dll + 0x4baa1 17:35:31 INFO - eip = 0x759cbaa2 esp = 0x002af774 ebp = 0x002af780 17:35:31 INFO - Found by: previous frame's frame pointer 17:35:31 INFO - 3 nss3.dll!_PR_MD_WAIT_CV [w95cv.c:97491fef6385 : 248 + 0x10] 17:35:31 INFO - eip = 0x67a119de esp = 0x002af788 ebp = 0x002af79c 17:35:31 INFO - Found by: previous frame's frame pointer 17:35:31 INFO - 4 nss3.dll!_PR_WaitCondVar [prucv.c:97491fef6385 : 172 + 0x24] 17:35:31 INFO - eip = 0x67a0db1c esp = 0x002af7a4 ebp = 0x002af7b8 17:35:31 INFO - Found by: call frame info 17:35:31 INFO - 5 nss3.dll!PR_WaitCondVar [prucv.c:97491fef6385 : 525 + 0xb] 17:35:31 INFO - eip = 0x67a0df36 esp = 0x002af7c0 ebp = 0x002af7d8 17:35:31 INFO - Found by: call frame info 17:35:31 INFO - 6 nss3.dll!PR_Wait [prmon.c:97491fef6385 : 294 + 0xf] 17:35:31 INFO - eip = 0x67a0116d esp = 0x002af7e0 ebp = 0x002af7f8 17:35:31 INFO - Found by: call frame info 17:35:31 INFO - 7 xul.dll!mozilla::ReentrantMonitor::Wait(unsigned int) [BlockingResourceBase.cpp:97491fef6385 : 320 + 0xb] 17:35:31 INFO - eip = 0x63771999 esp = 0x002af800 ebp = 0x002af820 17:35:31 INFO - Found by: call frame info 17:35:31 INFO - 8 xul.dll!nsEventQueue::GetEvent(bool,nsIRunnable * *) [nsEventQueue.cpp:97491fef6385 : 63 + 0x9] 17:35:31 INFO - eip = 0x637e3499 esp = 0x002af828 ebp = 0x002af83c 17:35:31 INFO - Found by: call frame info 17:35:31 INFO - 9 xul.dll!nsThread::ProcessNextEvent(bool,bool *) [nsThread.cpp:97491fef6385 : 693 + 0x23] 17:35:31 INFO - eip = 0x637eeb99 esp = 0x002af844 ebp = 0x002af8a4 17:35:31 INFO - Found by: call frame info 17:35:31 INFO - 10 xul.dll!NS_ProcessNextEvent(nsIThread *,bool) [nsThreadUtils.cpp:97491fef6385 : 263 + 0xc] 17:35:31 INFO - eip = 0x637867ae esp = 0x002af8ac ebp = 0x002af8b8 17:35:31 INFO - Found by: call frame info 17:35:31 INFO - 11 xul.dll!mozilla::SharedThreadPool::SpinUntilShutdown() [SharedThreadPool.cpp:97491fef6385 : 78 + 0xc] 17:35:31 INFO - eip = 0x648addec esp = 0x002af8c0 ebp = 0x002af8d8 17:35:31 INFO - Found by: call frame info 17:35:31 INFO - 12 xul.dll!mozilla::MediaShutdownManager::Shutdown() [MediaShutdownManager.cpp:97491fef6385 : 134 + 0x4] 17:35:31 INFO - eip = 0x648b0a0f esp = 0x002af8cc ebp = 0x002af8d8 17:35:31 INFO - Found by: call frame info 17:35:31 INFO - 13 xul.dll!mozilla::MediaShutdownManager::Observe(nsISupports *,char const *,wchar_t const *) [MediaShutdownManager.cpp:97491fef6385 : 102 + 0x7] 17:35:31 INFO - eip = 0x648b4005 esp = 0x002af8d8 ebp = 0x002af8d8 17:35:31 INFO - Found by: call frame info 17:35:31 INFO - 14 xul.dll!nsObserverList::NotifyObservers(nsISupports *,char const *,wchar_t const *) [nsObserverList.cpp:97491fef6385 : 96 + 0x19] 17:35:31 INFO - eip = 0x637bf6b0 esp = 0x002af8e0 ebp = 0x002af8f8 17:35:31 INFO - Found by: call frame info 17:35:31 INFO - 15 xul.dll!nsObserverService::NotifyObservers(nsISupports *,char const *,wchar_t const *) [nsObserverService.cpp:97491fef6385 : 302 + 0xd] 17:35:31 INFO - eip = 0x637bf7cf esp = 0x002af900 ebp = 0x002af914 17:35:31 INFO - Found by: call frame info 17:35:31 INFO - 16 xul.dll!mozilla::ShutdownXPCOM(nsIServiceManager *) [nsXPComInit.cpp:97491fef6385 : 777 + 0x16] 17:35:31 INFO - eip = 0x637868e0 esp = 0x002af91c ebp = 0x002af950 17:35:31 INFO - Found by: call frame info ... And: https://tbpl.mozilla.org/php/getParsedLog.php?id=38290528&full=1&branch=mozilla-inbound#error1 Rev4 MacOSX Snow Leopard 10.6 mozilla-inbound debug test mochitest-1 on 2014-04-22 18:06:19 PDT for push 4fca4b6565d8 19:13:51 INFO - Thread 0 (crashed) 19:13:51 INFO - 0 libSystem.B.dylib + 0x3ba6a 19:13:51 INFO - rbx = 0x00007fff7022bcc0 r12 = 0x000000010012bbf0 19:13:51 INFO - r13 = 0x000000010012bb78 r14 = 0x0000000000000001 19:13:51 INFO - r15 = 0x0000000000000000 rip = 0x00007fff819a0a6a 19:13:51 INFO - rsp = 0x00007fff5fbfe238 rbp = 0x00007fff5fbfe2b0 19:13:51 INFO - Found by: given as instruction pointer in context 19:13:51 INFO - 1 libSystem.B.dylib + 0x3f880 19:13:51 INFO - rip = 0x00007fff819a4881 rsp = 0x00007fff5fbfe240 19:13:51 INFO - rbp = 0x00007fff5fbfe2b0 19:13:51 INFO - Found by: stack scanning 19:13:51 INFO - 2 libSystem.B.dylib + 0xdb7f2 19:13:51 INFO - rip = 0x00007fff81a407f3 rsp = 0x00007fff5fbfe268 19:13:51 INFO - rbp = 0x00007fff5fbfe2b0 19:13:51 INFO - Found by: stack scanning 19:13:51 INFO - 3 libSystem.B.dylib + 0x3a12c 19:13:51 INFO - rip = 0x00007fff8199f12d rsp = 0x00007fff5fbfe280 19:13:51 INFO - rbp = 0x00007fff5fbfe2b0 19:13:51 INFO - Found by: stack scanning 19:13:51 INFO - 4 libnss3.dylib!PR_Wait + 0x142 19:13:51 INFO - rip = 0x0000000100525163 rsp = 0x00007fff5fbfe2c0 19:13:51 INFO - rbp = 0x00007fff5fbfe320 19:13:51 INFO - Found by: stack scanning 19:13:51 INFO - 5 XUL!_ZZL13nsEscapeCountPKc12nsEscapeMaskPmE8hexChars + 0x291b 19:13:51 INFO - rip = 0x00000001055e929c rsp = 0x00007fff5fbfe2e8 19:13:51 INFO - rbp = 0x00007fff5fbfe320 19:13:51 INFO - Found by: stack scanning 19:13:51 INFO - 6 libSystem.B.dylib + 0x6699 19:13:51 INFO - rip = 0x00007fff8196b69a rsp = 0x00007fff5fbfe2f0 19:13:51 INFO - rbp = 0x00007fff5fbfe320 19:13:51 INFO - Found by: stack scanning 19:13:51 INFO - 7 XUL!mozilla::ReentrantMonitor::Wait(unsigned int) [BlockingResourceBase.cpp:4fca4b6565d8 : 319 + 0xb] 19:13:51 INFO - rip = 0x0000000101812f53 rsp = 0x00007fff5fbfe330 19:13:51 INFO - rbp = 0x00007fff5fbfe360 19:13:51 INFO - Found by: stack scanning 19:13:51 INFO - 8 XUL!nsEventQueue::GetEvent(bool, nsIRunnable**) [ReentrantMonitor.h:4fca4b6565d8 : 199 + 0xc] 19:13:51 INFO - rbx = 0x000000010012b8d0 r12 = 0x00000001055e929c 19:13:51 INFO - r13 = 0x00000001055e92b4 r14 = 0x00007fff5fbfe3f8 19:13:51 INFO - r15 = 0x00000001055e9ef7 rip = 0x00000001018ab3c9 19:13:51 INFO - rsp = 0x00007fff5fbfe370 rbp = 0x00007fff5fbfe3a0 19:13:51 INFO - Found by: call frame info 19:13:51 INFO - 9 XUL!nsThread::ProcessNextEvent(bool, bool*) [nsThread.h:4fca4b6565d8 : 97 + 0x16] 19:13:51 INFO - rbx = 0x0000000000000000 r12 = 0x000000010012b850 19:13:51 INFO - r13 = 0x000000010012d8a8 r14 = 0x0000000000000001 19:13:51 INFO - r15 = 0x000000010012b890 rip = 0x00000001018aeca0 19:13:51 INFO - rsp = 0x00007fff5fbfe3b0 rbp = 0x00007fff5fbfe480 19:13:51 INFO - Found by: call frame info 19:13:51 INFO - 10 XUL!NS_ProcessNextEvent(nsIThread*, bool) [nsThreadUtils.cpp:4fca4b6565d8 : 263 + 0xc] 19:13:51 INFO - rbx = 0x0000000000000001 r12 = 0x00000001563bd880 19:13:51 INFO - r13 = 0x000000010012d8a8 r14 = 0x0000000151ab5b78 19:13:51 INFO - r15 = 0x00000001055dbda6 rip = 0x000000010181da53 19:13:51 INFO - rsp = 0x00007fff5fbfe490 rbp = 0x00007fff5fbfe4a0 19:13:51 INFO - Found by: call frame info 19:13:51 INFO - 11 XUL!mozilla::SharedThreadPool::SpinUntilShutdown() [SharedThreadPool.cpp:4fca4b6565d8 : 78 + 0x11] 19:13:51 INFO - rbx = 0x0000000151ab5b60 r12 = 0x00000001563bd880 19:13:51 INFO - r13 = 0x000000010012d8a8 r14 = 0x0000000151ab5b78 19:13:51 INFO - r15 = 0x00000001055dbda6 rip = 0x0000000103102c8c 19:13:51 INFO - rsp = 0x00007fff5fbfe4b0 rbp = 0x00007fff5fbfe4c0 19:13:51 INFO - Found by: call frame info 19:13:51 INFO - 12 XUL!mozilla::MediaShutdownManager::Shutdown() [MediaShutdownManager.cpp:4fca4b6565d8 : 134 + 0x4] 19:13:51 INFO - rbx = 0x0000000151ab5b60 r12 = 0x00000001563bd880 19:13:51 INFO - r13 = 0x000000010012d8a8 r14 = 0x0000000151ab5b78 19:13:51 INFO - r15 = 0x00000001055dbda6 rip = 0x0000000103102b85 19:13:51 INFO - rsp = 0x00007fff5fbfe4d0 rbp = 0x00007fff5fbfe4f0 19:13:51 INFO - Found by: call frame info 19:13:51 INFO - 13 XUL!mozilla::MediaShutdownManager::Observe(nsISupports*, char const*, char16_t const*) [MediaShutdownManager.cpp:4fca4b6565d8 : 102 + 0x7] 19:13:51 INFO - rbx = 0x00000001055dbda6 r12 = 0x00000001563bd880 19:13:51 INFO - r13 = 0x000000010012d8a8 r14 = 0x0000000151ab5b60 19:13:51 INFO - r15 = 0x00000001055dbda6 rip = 0x0000000103102a91 19:13:51 INFO - rsp = 0x00007fff5fbfe500 rbp = 0x00007fff5fbfe510 19:13:51 INFO - Found by: call frame info 19:13:51 INFO - 14 XUL!nsObserverList::NotifyObservers(nsISupports*, char const*, char16_t const*) [nsObserverList.cpp:4fca4b6565d8 : 96 + 0xe] 19:13:51 INFO - rbx = 0x0000000000000005 r12 = 0x00000001563bd880 19:13:51 INFO - r13 = 0x000000010012d8a8 r14 = 0x0000000000000000 19:13:51 INFO - r15 = 0x00000001055dbda6 rip = 0x00000001018645f8 19:13:51 INFO - rsp = 0x00007fff5fbfe520 rbp = 0x00007fff5fbfe550 19:13:51 INFO - Found by: call frame info 19:13:51 INFO - 15 XUL!nsObserverService::NotifyObservers(nsISupports*, char const*, char16_t const*) [nsObserverService.cpp:4fca4b6565d8 : 302 + 0x10] 19:13:51 INFO - rbx = 0x0000000100144ae0 r12 = 0x00000001055dbda6 19:13:51 INFO - r13 = 0x0000000100144b08 r14 = 0x0000000000000000 19:13:51 INFO - r15 = 0x000000010012d8a8 rip = 0x0000000101865ba4 19:13:51 INFO - rsp = 0x00007fff5fbfe560 rbp = 0x00007fff5fbfe590 19:13:51 INFO - Found by: call frame info 19:13:51 INFO - 16 XUL!mozilla::ShutdownXPCOM(nsIServiceManager*) [nsXPComInit.cpp:4fca4b6565d8 : 776 + 0xd] 19:13:51 INFO - rbx = 0x0000000000000000 r12 = 0x0000000000000000 19:13:51 INFO - r13 = 0x00007fff5fbfe9e8 r14 = 0x000000010012d8a8 19:13:51 INFO - r15 = 0x0000000000000000 rip = 0x0000000101825c0c 19:13:51 INFO - rsp = 0x00007fff5fbfe5a0 rbp = 0x00007fff5fbfe610 19:13:51 INFO - Found by: call frame info ...
This intermittent failure is #10 on OrangeFactor - cpearce or roc, could you take a look?
Flags: needinfo?(cpearce)
Flags: needinfo?(roc)
This is going to have to be cpearce I think
Flags: needinfo?(roc)
I'll add this to my queue. This could caused by *anything* in the media stack, so it will take some time to find a solution.
Flags: needinfo?(cpearce)
If we block waiting for the mediadecoder shutdown like this in release builds, and we hang like this, this could make the user unable to restart Firefox. Which is bad. What we can do here is in opt (and thus release but not debug) builds is stick a timeout in SharedThreadPool::SpinUntilShutdown(), and give up if we don't complete shutdown before the timeout expires. Then we're less likely to cause grief with users, but we'll still get data to help us solve this.
It looks like some leaks preventing sPools->Count() from reaching 0 in SharedThreadPool::Release and ShutdownPoolsEvent is not dispatched. SharedThreadPool::SpinUntilShutdown waits until sPools is destroyed which never happens.
Thank you for looking at this :-)
Attached patch 1004669_fix_leaks.patch (obsolete) (deleted) — Splinter Review
MediaTaskQueue::Dispatch doesn't take the memory ownership of aRunnable when mIsShutdown is true. (https://hg.mozilla.org/try/file/20ca234fd62b/content/media/MediaTaskQueue.cpp#l33) And there will be a leak if passing |new nsRunnable(xxx)| and the memory ownership isn't transferred. (https://hg.mozilla.org/try/file/20ca234fd62b/content/media/MediaDecoderStateMachine.cpp#l1610) nsThread::DispatchInternal also has such a problem which can return early before transferring the memory ownership. (https://hg.mozilla.org/try/file/20ca234fd62b/xpcom/threads/nsThread.cpp#l453) Should we consider |thread->Dispatch(new nsRunnable())| a bad idiom or parameter as raw pointer being bad for the memory ownership is unclear?
Attachment #8420621 - Flags: review?(cpearce)
Attachment #8420621 - Flags: feedback?(rlin)
(In reply to JW Wang [:jwwang] from comment #11) > Should we consider |thread->Dispatch(new nsRunnable())| a bad idiom or > parameter as raw pointer being bad for the memory ownership is unclear? Yes! This is a bad idiom. I hit this last week in new code I was writing. We should prevent these silly mistakes.
Comment on attachment 8420621 [details] [diff] [review] 1004669_fix_leaks.patch Review of attachment 8420621 [details] [diff] [review]: ----------------------------------------------------------------- Good catch, but we can tweak the interface to make it explicit. ::: content/media/MediaTaskQueue.cpp @@ +26,5 @@ > MOZ_COUNT_DTOR(MediaTaskQueue); > } > > nsresult > +MediaTaskQueue::Dispatch(RefPtr<nsIRunnable> aRunnable) Please make this: nsresult MediaTaskQueue::Dispatch(TemporaryRef<nsIRunnable> aRunnable) { RefPtr<nsIRunnable> runnable(aRunnable); // ... Then the caller must explicitly call event.forget() and pass ownership of a reference to the task queue, though it may retain another reference it if wishes.
Attachment #8420621 - Flags: review?(cpearce) → review-
Sometimes I am confused with when to use nsRefPtr or RefPtr. It appears to me RefPtr+TemporaryRef are easier to use and read than nsRefPtr+already_AddRefed. Should we also replace nsRefPtr in MediaDecoderStateMachine?
(In reply to JW Wang [:jwwang] from comment #15) > Sometimes I am confused with when to use nsRefPtr or RefPtr. It appears to > me RefPtr+TemporaryRef are easier to use and read than > nsRefPtr+already_AddRefed. Should we also replace nsRefPtr in > MediaDecoderStateMachine? I prefer mozilla::RefPtr. I started a thread in m.d.platform to clarify: https://groups.google.com/forum/#!topic/mozilla.dev.platform/Xsuwy4h7lNE
Attached patch 1004669_fix_leaks.patch (deleted) — Splinter Review
Use TemporaryRef to prevent loss of memory ownership.
Assignee: nobody → jwwang
Attachment #8420621 - Attachment is obsolete: true
Status: NEW → ASSIGNED
Attachment #8420621 - Flags: feedback?(rlin)
Attachment #8420744 - Flags: feedback?(rlin)
Comment on attachment 8420744 [details] [diff] [review] 1004669_fix_leaks.patch Look good to me. :)
Attachment #8420744 - Flags: feedback?(rlin) → feedback+
Attachment #8420744 - Flags: review?(cpearce)
Attachment #8420744 - Flags: review?(cpearce) → review+
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
Given how this bug started (spun off from a big dumping ground bug), I'm not entirely sure what all branches are affected by this. AFAICT, both Aurora and Beta are affected? JW, can you please confirm and nominate this patch for whatever branches need it? :)
Flags: needinfo?(jwwang)
Attached patch 1004669_fix_leaks_aurora.patch (deleted) — Splinter Review
[Approval Request Comment] Bug caused by (feature/regressing bug #): 962385 User impact if declined: browser might hang if shutdown when playing music Testing completed (on m-c, etc.): try: https://tbpl.mozilla.org/?tree=Try&rev=f4a1a4a84069 Risk to taking this patch (and alternatives if risky): low String or IDL/UUID changes made by this patch: none
Attachment #8421592 - Flags: review+
Attachment #8421592 - Flags: approval-mozilla-aurora?
Flags: needinfo?(jwwang)
Attached patch 1004669_fix_leaks_beta.patch (deleted) — Splinter Review
[Approval Request Comment] Bug caused by (feature/regressing bug #): 962385 User impact if declined: browser might hang if shutdown when playing music Testing completed (on m-c, etc.): try: https://tbpl.mozilla.org/?tree=Try&rev=37893fcdde60 Risk to taking this patch (and alternatives if risky): low String or IDL/UUID changes made by this patch: none
Attachment #8421595 - Flags: review+
Attachment #8421595 - Flags: approval-mozilla-beta?
Attachment #8421592 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #8421595 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(Tweaking summary to avoid future TBPL false positives, now this is fixed)
Summary: Intermittent *Windows & OS X* TEST-UNEXPECTED-FAIL | Shutdown | application timed out after 330 seconds with no output (with MediaShutdownManager::Shutdown() on the stack) → Intermittent Windows & OS X shutdown timeout with MediaShutdownManager::Shutdown() on the stack
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: