Closed Bug 1279612 Opened 8 years ago Closed 8 years ago

Near permafailing mda tests on Win7 debug on beta in test_webvtt_disabled.html | application crashed [@ mozalloc_abort(char const * const)]

Categories

(Core :: Audio/Video: GMP, defect, P1)

49 Branch
defect

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox48 + wontfix
firefox49 + wontfix
firefox50 --- fixed
firefox51 --- fixed
firefox52 --- fixed

People

(Reporter: KWierso, Assigned: cyu)

References

Details

(Keywords: assertion, crash, intermittent-failure)

Attachments

(1 file)

Component: Audio/Video → Audio/Video: Playback
Benjamin - can you look into this?
Flags: needinfo?(bechen)
Priority: -- → P2
Assignee: nobody → bechen
Flags: needinfo?(bechen)
[Tracking Requested - why for this release]: Extremely high media test failure rate on OSX & Windows, especially on e10s, undermining confidence in what we're shipping.

mochitest-media (especially e10s) on OSX/Win is a trainwreck on Beta at the moment. It makes me very concerned about the quality of what we're shipping to our users. The crash this bug was filed for is one of a few a different ones that are hitting nearly every push at the moment:

https://treeherder.mozilla.org/logviewer.html#?job_id=1178363&repo=mozilla-beta#L21500
https://treeherder.mozilla.org/logviewer.html#?job_id=1175310&repo=mozilla-beta#L9472
https://treeherder.mozilla.org/logviewer.html#?job_id=1182251&repo=mozilla-beta#L29271
https://treeherder.mozilla.org/logviewer.html#?job_id=1187186&repo=mozilla-beta#L5180
https://treeherder.mozilla.org/logviewer.html#?job_id=1186497&repo=mozilla-beta#L29188
etc...

https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&filter-searchStr=mda&fromchange=46d72a56c57dafb4dc1061d4741a3e1181ac3d68&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=runnable

Seems like we have some serious media shutdown issues on 48 AFAICT.
Severity: normal → critical
Flags: needinfo?(ajones)
Keywords: assertion, crash
Tracking as it is critical for the release.
JW - is this a media shutdown issue?
Flags: needinfo?(ajones) → needinfo?(jwwang)
Priority: P2 → P1
It looks like an stl container error.

https://treeherder.mozilla.org/logviewer.html#?job_id=1243597&repo=mozilla-beta#L8407
fatal: STL threw length_error: deque<T> too long

stack:
0  mozglue.dll!mozalloc_abort(char const * const) [mozalloc_abort.cpp:2b22de79c849 : 33 + 0x2e]
1  mozglue.dll!abort_from_exception [msvc_raise_wrappers.cpp:2b22de79c849 : 18 + 0x8]
2  mozglue.dll!std::moz_Xlength_error(char const *) [msvc_raise_wrappers.cpp:2b22de79c849 : 36 + 0xd]
3 xul.dll!std::deque<MessageLoop::PendingTask,std::allocator<MessageLoop::PendingTask> >::_Growmap(unsigned int) [deque : 1782 + 0xb] 

Hi Nathan,
Do you have any idea about "STL threw length_error: deque<T> too long" or should I ask some MFBT guys?
Flags: needinfo?(jwwang) → needinfo?(nfroyd)
MSVC STL implementation aborts if the container length exceeds 0x7fffffff, which is quite large. I think this is more likely a memory corruption.
Cervantes's comment sounds like a more likely cause of this problem; a queue in core IPC code growing to 2^31 elements sounds pretty unusual.  It's possible that we just keep injecting messages into the queue without processing them, though?

Some logging on that deque when it grows to 128 elements, 256 elements, etc. might be informative, along with an NS_ASSERTION(false): we might get some insight as to whether it's an overly long queue if the same stacks are showing up repeatedly.  Or we might confirm the memory corruption hypothesis if we only get a few messages before crashing.

Looking at the crash report shows that this is a non-main-thread queue that's supposedly overflowing.  I bet that the crashing thread is a PBackground thread, since the main thread (Thread 0) certainly looks like it's waiting for a background thread to shut down.  This crash is happening in the parent process, so perhaps the PBackground thread is stuck in some sort of error loop, constantly posting error messages (perhaps because it can't reach the child process?), and it can't make progress to unblock the main thread?

ni? Kyle for PBackground eyes on this problem.
Flags: needinfo?(nfroyd) → needinfo?(khuey)
The crashing thread is the IPC thread, not the PBackground thread (note the base::MessagePumpForIO::DoRunLoop() on the stack).

The main thread is definitely blocked waiting for the PBackground thread to shut down.  It's not obvious to me which thread is the PBackground thread.  None of the threads remaining are wedged, and none are obviously the PBackground thread.  It's possible that it's not even running at all, because we keep spinning until there are no more actors left (http://hg.mozilla.org/releases/mozilla-beta/annotate/tip/ipc/glue/BackgroundImpl.cpp#l1244) but there is the timeout code just above that that should kick in.
Flags: needinfo?(khuey)
https://hg.mozilla.org/try/rev/1303f0a868c2b8b60ba406e38f28af33cd32ce11#l1.13
I add some sanity checks and assertions.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7358d94e4b2c
The tests show that we never have more than 512 tasks in the queue.

Maybe it is really a memory corruption?
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0f1512b7fd05

This sets X86 debug register to crash when the length variable of the incoming queue of the IPC thread is corrupted. Hopefully we should see the corruption in the crash stack.
Fix build bustage. https://treeherder.mozilla.org/#/jobs?repo=try&revision=0363e19bf430
On 48: https://treeherder.mozilla.org/#/jobs?repo=try&revision=477f56bccec3&selectedJob=23333260

Strangely, it still crashed like without the watchpoint.
https://hg.mozilla.org/try/rev/024cec644970bcf7e0ce018c94880c93da011e2e
I add some logs to trace the life cycle of MessageLoop and find out...

https://treeherder.mozilla.org/logviewer.html#?job_id=23334553&repo=try#L14694
Assertion failure: !mOnDelete->IsRevoked() (jw UAF! mWorkerLoop is about to delete!), at c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/ipc/glue/MessageChannel.cpp:2119 
It looks like |mWorkerLoop| becomes a dangling pointer and causes UAF.

Hi Kyle,
Since |MessageChannel::mWorkerLoop| is a raw pointer, do you know what ensures it is always valid?
Flags: needinfo?(khuey)
I don't think anything ensures it's always valid.  The target thread/MessageLoop should outlive the channel.

Can you figure out if the timer at http://searchfox.org/mozilla-central/rev/a7c8e9f3cc323fd707659175a46826ad12899cd1/ipc/glue/BackgroundImpl.cpp#1228 is firing?
Flags: needinfo?(khuey)
https://treeherder.mozilla.org/logviewer.html#?job_id=23402844&repo=try#L14536

22:33:33 INFO - jw schedule shutdown timer, closuer=002EF698 timer=10A9FFB0 
...
22:33:33 INFO - Assertion failure: !mOnDelete->IsRevoked() (jw UAF! mWorkerLoop is about to delete!), at c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/ipc/glue/MessageChannel.cpp:2119 

Can't find logs about timer fired.
Flags: needinfo?(khuey)
There's also "[Parent 1148] WARNING: pipe error: 109: file c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 343".  That is (presumably) the error that we're trying to handle when we crash.  MSDN says system error code 109 is ERROR_BROKEN_PIPE, which makes sense.

I would try to figure out how the PBackground thread's MessageLoop got destroyed, because that doesn't make any sense. The main thread is still spinning the event loop in ShutdownBackgroundThread so it shouldn't have called Shutdown() on it yet.
Flags: needinfo?(khuey)
Benjamin and JW,
How is this bug going?
Flags: needinfo?(jwwang)
Flags: needinfo?(bechen)
Per comment 19, Kyle is figuring out the timing of destroying the PBackground thread.
Flags: needinfo?(jwwang)
Flags: needinfo?(bechen)
Er, no, those were suggestions for you to try ...

Anyways, I'm in Taipei this week, so let's look at it in person.
Can we find someone familiar with PBackground to look into this issue?
I'm the closest thing to that that exists.
Too late for 48, can we make progress for 49?
Since Kyle is going, who can find someone else familiar with PBackground to take a look?
I know something about PBackground... perhaps billm also?
It appears that this crash is related to GMPParent. I added more debug code to the patch in comment #16 and found that the crash happens when the IPC transport notifies channel error after the GMP thread is already shut down.

The try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1bb9adf0e8dc&selectedJob=25363170

The stack of opening an IPC channel with the crashing MessageLoop:
02:16:27     INFO -  jw MessageChannel 198D8438 Open 12636CC0
                                                     ^^^^^^^^ this is the message loop address
02:16:27     INFO -  [1704] #01: mozilla::gmp::GMPParent::LoadProcess() [dom/media/gmp/GMPParent.cpp:175]
02:16:27     INFO -  [1704] #02: mozilla::gmp::GMPParent::EnsureProcessLoaded() [dom/media/gmp/GMPParent.cpp:626]
02:16:27     INFO -  [1704] #03: mozilla::gmp::GMPServiceParent::RecvLoadGMP(nsCString const &,nsCString const &,nsTArray<nsCString> &&,nsTArray<unsigned long> &&,unsigned long *,nsCString *,unsigned int *,nsresult *) [dom/media/gmp/GMPServiceParent.cpp:1846]
02:16:27     INFO -  [1704] #04: mozilla::gmp::PGMPServiceParent::OnMessageReceived(IPC::Message const &,IPC::Message * &) [obj-firefox/ipc/ipdl/PGMPServiceParent.cpp:297]
02:16:27     INFO -  [1704] #05: mozilla::ipc::MessageChannel::DispatchSyncMessage(IPC::Message const &,IPC::Message * &) [ipc/glue/MessageChannel.cpp:1637]
02:16:27     INFO -  [1704] #06: mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message &&) [ipc/glue/MessageChannel.cpp:1600]
02:16:27     INFO -  [1704] #07: mozilla::ipc::MessageChannel::OnMaybeDequeueOne() [ipc/glue/MessageChannel.cpp:1572]
02:16:27     INFO -  [1704] #08: nsRunnableMethodImpl<bool ( mozilla::ipc::MessageChannel::*)(void),0,1>::Run() [obj-firefox/dist/include/nsThreadUtils.h:751]
02:16:27     INFO -  [1704] #09: mozilla::ipc::MessageChannel::DequeueTask::Run() [obj-firefox/dist/include/mozilla/ipc/MessageChannel.h:498]
02:16:27     INFO -  [1704] #10: nsThread::ProcessNextEvent(bool,bool *) [xpcom/threads/nsThread.cpp:1067]
02:16:27     INFO -  [1704] #11: NS_ProcessNextEvent(nsIThread *,bool) [xpcom/glue/nsThreadUtils.cpp:290]
02:16:27     INFO -  [1704] #12: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate *) [ipc/glue/MessagePump.cpp:384]
02:16:27     INFO -  [1704] #13: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:241]
02:16:27     INFO -  [1704] #14: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:235]
02:16:27     INFO -  [1704] #15: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:215]
02:16:27     INFO -  [1704] #16: nsThread::ThreadFunc(void *) [xpcom/threads/nsThread.cpp:469]
02:16:27     INFO -  [1704] #17: _PR_NativeRunThread [nsprpub/pr/src/threads/combined/pruthr.c:397]
02:16:27     INFO -  [1704] #18: pr_root [nsprpub/pr/src/md/windows/w95thred.c:95]
02:16:27     INFO -  [1704] #19: ucrtbase.DLL + 0x862a4
02:16:27     INFO -  [1704] #20: kernel32.dll + 0x53c45
02:16:27     INFO -  [1704] #21: ntdll.dll + 0x637f5
02:16:27     INFO -  [1704] #22: ntdll.dll + 0x637c8

Then we the log entry
02:32:11     INFO -  cyu shutdown the GMP thread. pid=3952
shows that the parent process shuts down the GMP thread, but later an error with IPC channel is notified:

02:32:11     INFO -  mWorkerLoop is 12636CC0
02:32:11     INFO -  Assertion failure: !mOnDelete->IsRevoked() (jw UAF! mWorkerLoop is about to delete!), at c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/ipc/glue/MessageChannel.cpp:2132
02:32:11     INFO -  [3996] #01: mozilla::ipc::ProcessLink::OnChannelError() [ipc/glue/MessageLink.cpp:418]
02:32:11     INFO -  [3996] #02: base::MessagePumpForIO::WaitForIOCompletion(unsigned long,base::MessagePumpForIO::IOHandler *) [ipc/chromium/src/base/message_pump_win.cc:497]
02:32:11     INFO -  [3996] #03: base::MessagePumpForIO::DoRunLoop() [ipc/chromium/src/base/message_pump_win.cc:441]
02:32:11     INFO -  [3996] #04: base::MessagePumpWin::Run(base::MessagePump::Delegate *) [ipc/chromium/src/base/message_pump_win.h:80]
02:32:11     INFO -  [3996] #05: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:241]
02:32:11     INFO -  [3996] #06: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:235]
02:32:11     INFO -  [3996] #07: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:215]
02:32:11     INFO -  [3996] #08: base::Thread::ThreadMain() [ipc/chromium/src/base/thread.cc:192]
02:32:11     INFO -  [3996] #09: `anonymous namespace'::ThreadFunc [ipc/chromium/src/base/platform_thread_win.cc:29]
02:32:11     INFO -  [3996] #10: kernel32.dll + 0x53c45
02:32:11     INFO -  [3996] #11: ntdll.dll + 0x637f5
02:32:11     INFO -  [3996] #12: ntdll.dll + 0x637c8

We need to find out which IPC channel opened on the GMP thread is left open when the thread is shut down.
cpearce, gerald, see comment 30 - GMP is involved here, and this may also be the cause of bug 1238542, which is apparently near permafail on beta (MSG not shutdown issues).  This bug started about when 48 uplifted to beta.
Flags: needinfo?(gsquelart)
Flags: needinfo?(cpearce)
It appears that this is a race condition during shutdown. A GMPServiceParent instance is deleted asynchronously in its GMPServiceParent::ActorDestroy(), which depends on PGMPServiceChild::Close(). If the parent process observes "xpcom-shutdown-threads" earlier than the child process, it could shut down the GMP thread while the GMPServiceParent instance is still alive. Later when GMPServiceChild::Close() is called because the child process observes "xpcom-shutdown-threads" and Close() the IPC channel, the parent process will observe that and try to post a task to the already-shutdown GMP thread.

I guess we need something similar to mozilla::layers::CompositorThreadHolder() for the GMP thread to wait until all IPC channels running on it are closed.
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #32)

> I guess we need something similar to
> mozilla::layers::CompositorThreadHolder() for the GMP thread to wait until
> all IPC channels running on it are closed.

You should look at AsyncShutdownBlocker (as used in MediaStreamGraph); this will hold shutdown at that stage of shutdown.

(Note: currently that's backed out of beta, but will reland soon.  It's enabled on aurora/nightly)
It looks like it is not caused by WebVTT, so un-assign Benjamin.
Assignee: bechen → nobody
Assignee: nobody → cpearce
Anthony, looks like Chris is out on PTO till Sept. 5th. Is there anyone else who can fix the tests before the beta-release merge next Monday, or before 49 release a week later?
Flags: needinfo?(ajones)
(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #35)
> Anthony, looks like Chris is out on PTO till Sept. 5th. Is there anyone else
> who can fix the tests before the beta-release merge next Monday, or before
> 49 release a week later?

Blake - can you co-ordinate with JW and Cervantes to see what we can do here?
Flags: needinfo?(ajones) → needinfo?(bwu)
(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #36)
> (In reply to Liz Henry (:lizzard) (needinfo? me) from comment #35)
> > Anthony, looks like Chris is out on PTO till Sept. 5th. Is there anyone else
> > who can fix the tests before the beta-release merge next Monday, or before
> > 49 release a week later?
> 
> Blake - can you co-ordinate with JW and Cervantes to see what we can do here?
JW is PTO today. 
Cervantes, 
Can you help us move this bug forward?
Flags: needinfo?(bwu) → needinfo?(cyu)
Is there anything left to do here?  Now that bug 1239873 appears to be landed for good (not getting backed out from Beta via bug 1255737 every cycle), the media test shutdown issues are largely gone. Anyway, I think the issue is well-understood enough via bug 1255737 and deps that we can just close this out now.
I don't think bug 1239873 and 1255737 fix this bug. Those 2 bugs handles shutdown of the MediaStreanGraph thread, while this bug handles the bug of shutting down the GMP thread.

I have a preliminary patch to avoid use-after-free of the GMP thread based on the previous analysis: https://treeherder.mozilla.org/#/jobs?repo=try&revision=124b32d6c992 . We should see no more GMP thread crashes with it.

But note that win7 mda chunk has several crashes. I've seen MediaStreamGraph thread, PBackground thread and GMP thread shutdown crashes in this chunk. Those crashes will start showing up after the GMP thread crash is fixed.
Flags: needinfo?(cyu)
If I load TreeHerder on beta and filter by "job symbol": "mda", I hardly see this test_webvtt_disabled.html failure coming up as "near permafail". I've spotted 2 instances of this failures in the last 50 runs.

https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&fromchange=e48f4467d705d26ddd69ef5ac1825ec973c30344&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=runnable&filter-job_type_symbol=mda&selectedJob=1475053

Of course mda test boxes do look alarmingly orange, it looks like most of these are other bugs in the MSG, i.e. they should be WebRTC or WebAudio related.

That's not to say we shouldn't fix the bug; I'd like to see cyu land his fix. It just seems we should be focusing on the other failures in MSG will bring the orange rate down faster. For example bug 1238542, and the *test*peerConnection* failures.
Flags: needinfo?(cpearce)
Look forward cyu's patch to fix this bug!
Flags: needinfo?(cyu)
The original crash seems to disappear from beta runs. I am not sure if some change elsewhere fixes or hides this crash. We need to check if the analysis in comment #32 is still valid. If yes then I'll move forward with the current patch. Otherwise, we can close this bug.
Flags: needinfo?(cyu)
Comment on attachment 8789755 [details]
Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down.

I'm on PTO, so defer this to Gerald.
Attachment #8789755 - Flags: review?(cpearce) → review?(gsquelart)
Comment on attachment 8789755 [details]
Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down.

https://reviewboard.mozilla.org/r/77856/#review76500

r+ with comments addressed.

::: dom/media/gmp/GMPServiceParent.cpp:1874
(Diff revision 1)
> +}
> +
> +NS_IMETHODIMP
> +GeckoMediaPluginServiceParent::BlockShutdown(nsIAsyncShutdownClient*)
> +{
> +

Remove empty line.

::: dom/media/gmp/GMPServiceParent.cpp:1899
(Diff revision 1)
> +  MOZ_ASSERT(mServiceUserCount > 0);
> +  if (--mServiceUserCount == 0) {

Are you sure about this assert & decrement?

In ServiceUserCreated() above, you allow for a creation after shutdown, but don't modify the count for it (which makes sense).
But then when that post-shutdown user gets destroyed, I believe this assertion will fail (either because the count is already 0; or you will decrement it once too many, so shutdown will be unblocked too soon and then the assertion will fail later on).

If I am correct, I cannot think of an easy solution to this, other than keeping a list of post-shutdown creations (this list should stay empty is almost all cases anyway), so that you can actively ignore them when they get destroyed.

::: dom/media/gmp/GMPServiceParent.cpp:1945
(Diff revision 1)
> +  NS_DispatchToMainThread(
> +    NewRunnableMethod(mService.get(),
> +                      &GeckoMediaPluginServiceParent::ServiceUserDestroyed),
> +    NS_DISPATCH_SYNC);

Does this dispatch really need to be synchronous? Synchronous dispatching between GMPT and MT has been a source of shutdown hangs, so it's best to avoid them when possible!
ServiceUserDestroyed doesn't need the object to still be alive, does it?
(Even if you implement my proposal from the previous issue, you will just pass the 'this' pointer as an ID instead of a usable pointer, so it should still be fine to dispatch asynchronously -- I don't think an ABA problem is possible, for reasons that cannot fit in this margin.)
Attachment #8789755 - Flags: review?(gsquelart) → review+
Comment on attachment 8789755 [details]
Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down.

https://reviewboard.mozilla.org/r/77856/#review76500

> Are you sure about this assert & decrement?
> 
> In ServiceUserCreated() above, you allow for a creation after shutdown, but don't modify the count for it (which makes sense).
> But then when that post-shutdown user gets destroyed, I believe this assertion will fail (either because the count is already 0; or you will decrement it once too many, so shutdown will be unblocked too soon and then the assertion will fail later on).
> 
> If I am correct, I cannot think of an easy solution to this, other than keeping a list of post-shutdown creations (this list should stay empty is almost all cases anyway), so that you can actively ignore them when they get destroyed.

You are right. The assertion here is incorrect since post-shutdown creation is not handled corretly. Instead of handling the post-shutdown creation problem, I think it makes much more sense if we just deny the creation of GMPServiceParent in GMPServiceParent::Create() if GeckoMediaPluginServiceParent::mShuttingDown is true. After all, we are shutting down the system, and creating a new instances of GMPServiceParent after the "profile-change-teardown" phase doesn't makes much sense. Then we don't need to worry about the inconsistency between ServiceUserCreated() and ServiceUserDestroyed().

> Does this dispatch really need to be synchronous? Synchronous dispatching between GMPT and MT has been a source of shutdown hangs, so it's best to avoid them when possible!
> ServiceUserDestroyed doesn't need the object to still be alive, does it?
> (Even if you implement my proposal from the previous issue, you will just pass the 'this' pointer as an ID instead of a usable pointer, so it should still be fine to dispatch asynchronously -- I don't think an ABA problem is possible, for reasons that cannot fit in this margin.)

It doesn't need to be synchornous. And if we deny the creation of GMPServiceParent in shutting down, then we don't event need to pass the pointer here.
Comment on attachment 8789755 [details]
Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down.

https://reviewboard.mozilla.org/r/77856/#review76500

> You are right. The assertion here is incorrect since post-shutdown creation is not handled corretly. Instead of handling the post-shutdown creation problem, I think it makes much more sense if we just deny the creation of GMPServiceParent in GMPServiceParent::Create() if GeckoMediaPluginServiceParent::mShuttingDown is true. After all, we are shutting down the system, and creating a new instances of GMPServiceParent after the "profile-change-teardown" phase doesn't makes much sense. Then we don't need to worry about the inconsistency between ServiceUserCreated() and ServiceUserDestroyed().

Thank you. I'll want to review that, please!
Comment on attachment 8789755 [details]
Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down.

https://reviewboard.mozilla.org/r/77856/#review76792

Back to 'r?' so I can review the upcoming big change.
Attachment #8789755 - Flags: review+ → review?(gsquelart)
Assignee: cpearce → cyu
Component: Audio/Video: Playback → Audio/Video: GMP
Comment on attachment 8789755 [details]
Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down.

https://reviewboard.mozilla.org/r/77856/#review78378

Sorry for the delay, I didn't notice the new revision. I'm usually quick to review, so please buzz me next time you don't hear from me within a day.

r+ with concern addressed:

::: dom/media/gmp/GMPServiceParent.cpp:2118
(Diff revision 2)
> +
>    nsAutoPtr<GMPServiceParent> serviceParent(new GMPServiceParent(gmp));
>  
>    nsCOMPtr<nsIThread> gmpThread;
>    nsresult rv = gmp->GetThread(getter_AddRefs(gmpThread));
> -  NS_ENSURE_SUCCESS(rv, nullptr);
> +  MOZ_RELEASE_ASSERT(NS_SUCCEEDED(rv));

Why change this from a mild test-and-return-null to a very strong assert-and-crash-everywhere?
Not being in shutdown does not guarantee that gmp->GetThread will always succeed, e.g.: NS_NewNamedThread could fail (very unlikely, but we should still handle this gracefully where possible).

If you want to prevent an unnecessary GMPServiceParent construction&destruction in this case, I think you could move the creation from line 2114 to after this line.
Attachment #8789755 - Flags: review?(gsquelart) → review+
Comment on attachment 8789755 [details]
Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down.

https://reviewboard.mozilla.org/r/77856/#review78378

> Why change this from a mild test-and-return-null to a very strong assert-and-crash-everywhere?
> Not being in shutdown does not guarantee that gmp->GetThread will always succeed, e.g.: NS_NewNamedThread could fail (very unlikely, but we should still handle this gracefully where possible).
> 
> If you want to prevent an unnecessary GMPServiceParent construction&destruction in this case, I think you could move the creation from line 2114 to after this line.

From GeckoMediaPluginService::GetThread(), it looks to me that if we are not in shutdown, then we should be able to get the thread (except when we fail to create the thread). But I think you are right that it the wrong choice to place an assertion here because the caller won't have any chance to handle the failure, even if the failure is very likely. And it's wrong that the caller method of GeckoMediaPluginService::GetThread() crosses the boundary to look at the implementation and place an assertion based on it. So I'll use NS_ENSURE_SUCCESS() here.
Pushed by cyu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b82fa1825412
Block xpcom-will-shutdown before GMPServiceParent instances are shut down. r=gerald
Backed out in https://hg.mozilla.org/integration/autoland/rev/79604052ba79 for static analysis bustage, https://treeherder.mozilla.org/logviewer.html#?job_id=3849813&repo=autoland
https://treeherder.mozilla.org/#/jobs?repo=try&revision=caba143550c7
Pushed by cyu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/583124bcd4d9
Block xpcom-will-shutdown before GMPServiceParent instances are shut down. r=gerald
https://hg.mozilla.org/mozilla-central/rev/583124bcd4d9
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Not sure which branches (if any) you want to backport this to, but please fill out the approval requests whenever you decide :)
Flags: needinfo?(gsquelart) → needinfo?(cyu)
Comment on attachment 8789755 [details]
Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down.

Approval Request Comment
[Feature/regressing bug #]: 1057908
[User impact if declined]: Potential browser crash during shutdown after the user has ever viewed DRM-wrapped content.
[Describe test coverage new/current, TreeHerder]: Patch landed on m-c. It was also tested on try on the beta branch. Also performed manual tests with code modification trying to trigger the bug.
[Risks and why]: Low to medium. This is a fix of a timing issue during shutdown. We carefully reviewed and tested the patch. It's unlikely to have side effects.
[String/UUID change made/needed]: None.
Flags: needinfo?(cyu)
Attachment #8789755 - Flags: approval-mozilla-beta?
Attachment #8789755 - Flags: approval-mozilla-aurora?
Comment on attachment 8789755 [details]
Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down.

Makes the GMP shutdown more robust, Aurora51+, Beta50+
Attachment #8789755 - Flags: approval-mozilla-beta?
Attachment #8789755 - Flags: approval-mozilla-beta+
Attachment #8789755 - Flags: approval-mozilla-aurora?
Attachment #8789755 - Flags: approval-mozilla-aurora+
https://hg.mozilla.org/releases/mozilla-aurora/rev/c796e9b24f3c
https://hg.mozilla.org/releases/mozilla-beta/rev/118b9b4f5473
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: