Bugzilla

Comment 7

•

8 years ago

It looks like an stl container error.

https://treeherder.mozilla.org/logviewer.html#?job_id=1243597&repo=mozilla-beta#L8407
fatal: STL threw length_error: deque<T> too long

stack:
0  mozglue.dll!mozalloc_abort(char const * const) [mozalloc_abort.cpp:2b22de79c849 : 33 + 0x2e]
1  mozglue.dll!abort_from_exception [msvc_raise_wrappers.cpp:2b22de79c849 : 18 + 0x8]
2  mozglue.dll!std::moz_Xlength_error(char const *) [msvc_raise_wrappers.cpp:2b22de79c849 : 36 + 0xd]
3 xul.dll!std::deque<MessageLoop::PendingTask,std::allocator<MessageLoop::PendingTask> >::_Growmap(unsigned int) [deque : 1782 + 0xb] 

Hi Nathan,
Do you have any idea about "STL threw length_error: deque<T> too long" or should I ask some MFBT guys?

Flags: needinfo?(jwwang) → needinfo?(nfroyd)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Assignee

Comment 8

•

8 years ago

MSVC STL implementation aborts if the container length exceeds 0x7fffffff, which is quite large. I think this is more likely a memory corruption.

Nathan Froyd [:froydnj]

Comment 9

•

8 years ago

Cervantes's comment sounds like a more likely cause of this problem; a queue in core IPC code growing to 2^31 elements sounds pretty unusual.  It's possible that we just keep injecting messages into the queue without processing them, though?

Some logging on that deque when it grows to 128 elements, 256 elements, etc. might be informative, along with an NS_ASSERTION(false): we might get some insight as to whether it's an overly long queue if the same stacks are showing up repeatedly.  Or we might confirm the memory corruption hypothesis if we only get a few messages before crashing.

Looking at the crash report shows that this is a non-main-thread queue that's supposedly overflowing.  I bet that the crashing thread is a PBackground thread, since the main thread (Thread 0) certainly looks like it's waiting for a background thread to shut down.  This crash is happening in the parent process, so perhaps the PBackground thread is stuck in some sort of error loop, constantly posting error messages (perhaps because it can't reach the child process?), and it can't make progress to unblock the main thread?

ni? Kyle for PBackground eyes on this problem.

Flags: needinfo?(nfroyd) → needinfo?(khuey)

Comment 10

•

8 years ago

The crashing thread is the IPC thread, not the PBackground thread (note the base::MessagePumpForIO::DoRunLoop() on the stack).

The main thread is definitely blocked waiting for the PBackground thread to shut down.  It's not obvious to me which thread is the PBackground thread.  None of the threads remaining are wedged, and none are obviously the PBackground thread.  It's possible that it's not even running at all, because we keep spinning until there are no more actors left (http://hg.mozilla.org/releases/mozilla-beta/annotate/tip/ipc/glue/BackgroundImpl.cpp#l1244) but there is the timeout code just above that that should kick in.

Flags: needinfo?(khuey)

Comment 11

•

8 years ago

https://hg.mozilla.org/try/rev/1303f0a868c2b8b60ba406e38f28af33cd32ce11#l1.13
I add some sanity checks and assertions.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=7358d94e4b2c
The tests show that we never have more than 512 tasks in the queue.

Maybe it is really a memory corruption?

Comment hidden (Intermittent Failures Robot)

20 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-beta: 17
* fx-team: 3

Platform breakdown:
* windows7-32: 14
* windows8-64: 6

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1279612&startday=2016-06-27&endday=2016-07-03&tree=all

Assignee

Comment 13

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=0f1512b7fd05

This sets X86 debug register to crash when the length variable of the incoming queue of the IPC thread is corrupted. Hopefully we should see the corruption in the crash stack.

Assignee

Comment 14

•

8 years ago

Fix build bustage. https://treeherder.mozilla.org/#/jobs?repo=try&revision=0363e19bf430

Assignee

Comment 15

•

8 years ago

On 48: https://treeherder.mozilla.org/#/jobs?repo=try&revision=477f56bccec3&selectedJob=23333260

Strangely, it still crashed like without the watchpoint.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 16

•

8 years ago

https://hg.mozilla.org/try/rev/024cec644970bcf7e0ce018c94880c93da011e2e
I add some logs to trace the life cycle of MessageLoop and find out...

https://treeherder.mozilla.org/logviewer.html#?job_id=23334553&repo=try#L14694
Assertion failure: !mOnDelete->IsRevoked() (jw UAF! mWorkerLoop is about to delete!), at c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/ipc/glue/MessageChannel.cpp:2119 
It looks like |mWorkerLoop| becomes a dangling pointer and causes UAF.

Hi Kyle,
Since |MessageChannel::mWorkerLoop| is a raw pointer, do you know what ensures it is always valid?

Flags: needinfo?(khuey)

Comment 17

•

8 years ago

I don't think anything ensures it's always valid.  The target thread/MessageLoop should outlive the channel.

Can you figure out if the timer at http://searchfox.org/mozilla-central/rev/a7c8e9f3cc323fd707659175a46826ad12899cd1/ipc/glue/BackgroundImpl.cpp#1228 is firing?

Flags: needinfo?(khuey)

Comment 18

•

8 years ago

https://treeherder.mozilla.org/logviewer.html#?job_id=23402844&repo=try#L14536

22:33:33 INFO - jw schedule shutdown timer, closuer=002EF698 timer=10A9FFB0 
...
22:33:33 INFO - Assertion failure: !mOnDelete->IsRevoked() (jw UAF! mWorkerLoop is about to delete!), at c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/ipc/glue/MessageChannel.cpp:2119 

Can't find logs about timer fired.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Updated

•

8 years ago

Flags: needinfo?(khuey)

Comment 19

•

8 years ago

There's also "[Parent 1148] WARNING: pipe error: 109: file c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/ipc/chromium/src/chrome/common/ipc_channel_win.cc, line 343".  That is (presumably) the error that we're trying to handle when we crash.  MSDN says system error code 109 is ERROR_BROKEN_PIPE, which makes sense.

I would try to figure out how the PBackground thread's MessageLoop got destroyed, because that doesn't make any sense. The main thread is still spinning the event loop in ShutdownBackgroundThread so it shouldn't have called Shutdown() on it yet.

Flags: needinfo?(khuey)

Comment hidden (Intermittent Failures Robot)

20 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-beta: 20

Platform breakdown:
* windows7-32: 18
* windows8-64: 2

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1279612&startday=2016-07-04&endday=2016-07-10&tree=all

Comment 21

•

8 years ago

Benjamin and JW,
How is this bug going?

Flags: needinfo?(jwwang)

Flags: needinfo?(bechen)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 22

•

8 years ago

Per comment 19, Kyle is figuring out the timing of destroying the PBackground thread.

Flags: needinfo?(jwwang)

Flags: needinfo?(bechen)

Comment 23

•

8 years ago

Er, no, those were suggestions for you to try ...

Anyways, I'm in Taipei this week, so let's look at it in person.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 24

•

8 years ago

Can we find someone familiar with PBackground to look into this issue?

Comment 25

•

8 years ago

I'm the closest thing to that that exists.

Comment hidden (Intermittent Failures Robot)

13 automation job failures were associated with this bug in the last 7 days.

Repository breakdown:
* mozilla-beta: 13

Platform breakdown:
* windows7-32: 13

For more details, see:
https://brasstacks.mozilla.com/orangefactor/?display=Bug&bugid=1279612&startday=2016-07-11&endday=2016-07-17&tree=all

Sylvestre Ledru [:Sylvestre]

Comment 27

•

8 years ago

Too late for 48, can we make progress for 49?

status-firefox48: affected → wontfix

status-firefox49: --- → affected

tracking-firefox49: --- → +

Comment 28

•

8 years ago

Since Kyle is going, who can find someone else familiar with PBackground to take a look?

Comment 29

•

8 years ago

I know something about PBackground... perhaps billm also?

Assignee

Comment 30

•

8 years ago

It appears that this crash is related to GMPParent. I added more debug code to the patch in comment #16 and found that the crash happens when the IPC transport notifies channel error after the GMP thread is already shut down.

The try push:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1bb9adf0e8dc&selectedJob=25363170

The stack of opening an IPC channel with the crashing MessageLoop:
02:16:27     INFO -  jw MessageChannel 198D8438 Open 12636CC0
                                                     ^^^^^^^^ this is the message loop address
02:16:27     INFO -  [1704] #01: mozilla::gmp::GMPParent::LoadProcess() [dom/media/gmp/GMPParent.cpp:175]
02:16:27     INFO -  [1704] #02: mozilla::gmp::GMPParent::EnsureProcessLoaded() [dom/media/gmp/GMPParent.cpp:626]
02:16:27     INFO -  [1704] #03: mozilla::gmp::GMPServiceParent::RecvLoadGMP(nsCString const &,nsCString const &,nsTArray<nsCString> &&,nsTArray<unsigned long> &&,unsigned long *,nsCString *,unsigned int *,nsresult *) [dom/media/gmp/GMPServiceParent.cpp:1846]
02:16:27     INFO -  [1704] #04: mozilla::gmp::PGMPServiceParent::OnMessageReceived(IPC::Message const &,IPC::Message * &) [obj-firefox/ipc/ipdl/PGMPServiceParent.cpp:297]
02:16:27     INFO -  [1704] #05: mozilla::ipc::MessageChannel::DispatchSyncMessage(IPC::Message const &,IPC::Message * &) [ipc/glue/MessageChannel.cpp:1637]
02:16:27     INFO -  [1704] #06: mozilla::ipc::MessageChannel::DispatchMessageW(IPC::Message &&) [ipc/glue/MessageChannel.cpp:1600]
02:16:27     INFO -  [1704] #07: mozilla::ipc::MessageChannel::OnMaybeDequeueOne() [ipc/glue/MessageChannel.cpp:1572]
02:16:27     INFO -  [1704] #08: nsRunnableMethodImpl<bool ( mozilla::ipc::MessageChannel::*)(void),0,1>::Run() [obj-firefox/dist/include/nsThreadUtils.h:751]
02:16:27     INFO -  [1704] #09: mozilla::ipc::MessageChannel::DequeueTask::Run() [obj-firefox/dist/include/mozilla/ipc/MessageChannel.h:498]
02:16:27     INFO -  [1704] #10: nsThread::ProcessNextEvent(bool,bool *) [xpcom/threads/nsThread.cpp:1067]
02:16:27     INFO -  [1704] #11: NS_ProcessNextEvent(nsIThread *,bool) [xpcom/glue/nsThreadUtils.cpp:290]
02:16:27     INFO -  [1704] #12: mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate *) [ipc/glue/MessagePump.cpp:384]
02:16:27     INFO -  [1704] #13: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:241]
02:16:27     INFO -  [1704] #14: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:235]
02:16:27     INFO -  [1704] #15: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:215]
02:16:27     INFO -  [1704] #16: nsThread::ThreadFunc(void *) [xpcom/threads/nsThread.cpp:469]
02:16:27     INFO -  [1704] #17: _PR_NativeRunThread [nsprpub/pr/src/threads/combined/pruthr.c:397]
02:16:27     INFO -  [1704] #18: pr_root [nsprpub/pr/src/md/windows/w95thred.c:95]
02:16:27     INFO -  [1704] #19: ucrtbase.DLL + 0x862a4
02:16:27     INFO -  [1704] #20: kernel32.dll + 0x53c45
02:16:27     INFO -  [1704] #21: ntdll.dll + 0x637f5
02:16:27     INFO -  [1704] #22: ntdll.dll + 0x637c8

Then we the log entry
02:32:11     INFO -  cyu shutdown the GMP thread. pid=3952
shows that the parent process shuts down the GMP thread, but later an error with IPC channel is notified:

02:32:11     INFO -  mWorkerLoop is 12636CC0
02:32:11     INFO -  Assertion failure: !mOnDelete->IsRevoked() (jw UAF! mWorkerLoop is about to delete!), at c:/builds/moz2_slave/try-w32-d-00000000000000000000/build/src/ipc/glue/MessageChannel.cpp:2132
02:32:11     INFO -  [3996] #01: mozilla::ipc::ProcessLink::OnChannelError() [ipc/glue/MessageLink.cpp:418]
02:32:11     INFO -  [3996] #02: base::MessagePumpForIO::WaitForIOCompletion(unsigned long,base::MessagePumpForIO::IOHandler *) [ipc/chromium/src/base/message_pump_win.cc:497]
02:32:11     INFO -  [3996] #03: base::MessagePumpForIO::DoRunLoop() [ipc/chromium/src/base/message_pump_win.cc:441]
02:32:11     INFO -  [3996] #04: base::MessagePumpWin::Run(base::MessagePump::Delegate *) [ipc/chromium/src/base/message_pump_win.h:80]
02:32:11     INFO -  [3996] #05: MessageLoop::RunInternal() [ipc/chromium/src/base/message_loop.cc:241]
02:32:11     INFO -  [3996] #06: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:235]
02:32:11     INFO -  [3996] #07: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:215]
02:32:11     INFO -  [3996] #08: base::Thread::ThreadMain() [ipc/chromium/src/base/thread.cc:192]
02:32:11     INFO -  [3996] #09: `anonymous namespace'::ThreadFunc [ipc/chromium/src/base/platform_thread_win.cc:29]
02:32:11     INFO -  [3996] #10: kernel32.dll + 0x53c45
02:32:11     INFO -  [3996] #11: ntdll.dll + 0x637f5
02:32:11     INFO -  [3996] #12: ntdll.dll + 0x637c8

We need to find out which IPC channel opened on the GMP thread is left open when the thread is shut down.

Comment 31

•

8 years ago

cpearce, gerald, see comment 30 - GMP is involved here, and this may also be the cause of bug 1238542, which is apparently near permafail on beta (MSG not shutdown issues).  This bug started about when 48 uplifted to beta.

Flags: needinfo?(gsquelart)

Flags: needinfo?(cpearce)

Updated

•

8 years ago

Blocks: 1295228

Assignee

Comment 32

•

8 years ago

It appears that this is a race condition during shutdown. A GMPServiceParent instance is deleted asynchronously in its GMPServiceParent::ActorDestroy(), which depends on PGMPServiceChild::Close(). If the parent process observes "xpcom-shutdown-threads" earlier than the child process, it could shut down the GMP thread while the GMPServiceParent instance is still alive. Later when GMPServiceChild::Close() is called because the child process observes "xpcom-shutdown-threads" and Close() the IPC channel, the parent process will observe that and try to post a task to the already-shutdown GMP thread.

I guess we need something similar to mozilla::layers::CompositorThreadHolder() for the GMP thread to wait until all IPC channels running on it are closed.

Comment 33

•

8 years ago

(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #32)

> I guess we need something similar to
> mozilla::layers::CompositorThreadHolder() for the GMP thread to wait until
> all IPC channels running on it are closed.

You should look at AsyncShutdownBlocker (as used in MediaStreamGraph); this will hold shutdown at that stage of shutdown.

(Note: currently that's backed out of beta, but will reland soon.  It's enabled on aurora/nightly)

Anthony Jones (:ajones, :kentuckyfriedtakahe, :k17e)

Comment 34

•

8 years ago

It looks like it is not caused by WebVTT, so un-assign Benjamin.

Assignee: bechen → nobody

Updated

•

8 years ago

Assignee: nobody → cpearce

Liz Henry (:lizzard Please n-i to RyanVM, jcristau, or pascal)

Comment 35

•

8 years ago

Anthony, looks like Chris is out on PTO till Sept. 5th. Is there anyone else who can fix the tests before the beta-release merge next Monday, or before 49 release a week later?

Flags: needinfo?(ajones)

Anthony Jones (:ajones, :kentuckyfriedtakahe, :k17e)

Comment 36

•

8 years ago

(In reply to Liz Henry (:lizzard) (needinfo? me) from comment #35)
> Anthony, looks like Chris is out on PTO till Sept. 5th. Is there anyone else
> who can fix the tests before the beta-release merge next Monday, or before
> 49 release a week later?

Blake - can you co-ordinate with JW and Cervantes to see what we can do here?

Flags: needinfo?(ajones) → needinfo?(bwu)

Ryan VanderMeulen [:RyanVM]

Comment 37

•

8 years ago

(In reply to Anthony Jones (:kentuckyfriedtakahe, :k17e) from comment #36)
> (In reply to Liz Henry (:lizzard) (needinfo? me) from comment #35)
> > Anthony, looks like Chris is out on PTO till Sept. 5th. Is there anyone else
> > who can fix the tests before the beta-release merge next Monday, or before
> > 49 release a week later?
> 
> Blake - can you co-ordinate with JW and Cervantes to see what we can do here?
JW is PTO today. 
Cervantes, 
Can you help us move this bug forward?

Flags: needinfo?(bwu) → needinfo?(cyu)

Comment 38

•

8 years ago

Is there anything left to do here?  Now that bug 1239873 appears to be landed for good (not getting backed out from Beta via bug 1255737 every cycle), the media test shutdown issues are largely gone. Anyway, I think the issue is well-understood enough via bug 1255737 and deps that we can just close this out now.

Chris Pearce [:cpearce (Not reading bugmail)]

Assignee

Comment 39

•

8 years ago

I don't think bug 1239873 and 1255737 fix this bug. Those 2 bugs handles shutdown of the MediaStreanGraph thread, while this bug handles the bug of shutting down the GMP thread.

I have a preliminary patch to avoid use-after-free of the GMP thread based on the previous analysis: https://treeherder.mozilla.org/#/jobs?repo=try&revision=124b32d6c992 . We should see no more GMP thread crashes with it.

But note that win7 mda chunk has several crashes. I've seen MediaStreamGraph thread, PBackground thread and GMP thread shutdown crashes in this chunk. Those crashes will start showing up after the GMP thread crash is fixed.

Flags: needinfo?(cyu)

Comment 40

•

8 years ago

If I load TreeHerder on beta and filter by "job symbol": "mda", I hardly see this test_webvtt_disabled.html failure coming up as "near permafail". I've spotted 2 instances of this failures in the last 50 runs.

https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&fromchange=e48f4467d705d26ddd69ef5ac1825ec973c30344&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=runnable&filter-job_type_symbol=mda&selectedJob=1475053

Of course mda test boxes do look alarmingly orange, it looks like most of these are other bugs in the MSG, i.e. they should be WebRTC or WebAudio related.

That's not to say we shouldn't fix the bug; I'd like to see cyu land his fix. It just seems we should be focusing on the other failures in MSG will bring the orange rate down faster. For example bug 1238542, and the *test*peerConnection* failures.

Flags: needinfo?(cpearce)

Comment 41

•

8 years ago

Look forward cyu's patch to fix this bug!

Flags: needinfo?(cyu)

Chris Pearce [:cpearce (Not reading bugmail)]

Assignee

Comment 42

•

8 years ago

The original crash seems to disappear from beta runs. I am not sure if some change elsewhere fixes or hides this crash. We need to check if the analysis in comment #32 is still valid. If yes then I'll move forward with the current patch. Otherwise, we can close this bug.

Flags: needinfo?(cyu)

Comment hidden (mozreview-request)

Review commit: https://reviewboard.mozilla.org/r/77856/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/77856/

Comment 44

•

8 years ago

Comment on attachment 8789755 [details]
Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down.

I'm on PTO, so defer this to Gerald.

Attachment #8789755 - Flags: review?(cpearce) → review?(gsquelart)

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Comment 45

•

8 years ago

mozreview-review

Comment on attachment 8789755 [details]
Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down.

https://reviewboard.mozilla.org/r/77856/#review76500

r+ with comments addressed.

::: dom/media/gmp/GMPServiceParent.cpp:1874
(Diff revision 1)
> +}
> +
> +NS_IMETHODIMP
> +GeckoMediaPluginServiceParent::BlockShutdown(nsIAsyncShutdownClient*)
> +{
> +

Remove empty line.

::: dom/media/gmp/GMPServiceParent.cpp:1899
(Diff revision 1)
> +  MOZ_ASSERT(mServiceUserCount > 0);
> +  if (--mServiceUserCount == 0) {

Are you sure about this assert & decrement?

In ServiceUserCreated() above, you allow for a creation after shutdown, but don't modify the count for it (which makes sense).
But then when that post-shutdown user gets destroyed, I believe this assertion will fail (either because the count is already 0; or you will decrement it once too many, so shutdown will be unblocked too soon and then the assertion will fail later on).

If I am correct, I cannot think of an easy solution to this, other than keeping a list of post-shutdown creations (this list should stay empty is almost all cases anyway), so that you can actively ignore them when they get destroyed.

::: dom/media/gmp/GMPServiceParent.cpp:1945
(Diff revision 1)
> +  NS_DispatchToMainThread(
> +    NewRunnableMethod(mService.get(),
> +                      &GeckoMediaPluginServiceParent::ServiceUserDestroyed),
> +    NS_DISPATCH_SYNC);

Does this dispatch really need to be synchronous? Synchronous dispatching between GMPT and MT has been a source of shutdown hangs, so it's best to avoid them when possible!
ServiceUserDestroyed doesn't need the object to still be alive, does it?
(Even if you implement my proposal from the previous issue, you will just pass the 'this' pointer as an ID instead of a usable pointer, so it should still be fine to dispatch asynchronously -- I don't think an ABA problem is possible, for reasons that cannot fit in this margin.)

Attachment #8789755 - Flags: review?(gsquelart) → review+

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Assignee

Comment 46

•

8 years ago

mozreview-review-reply

Comment on attachment 8789755 [details]
Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down.

https://reviewboard.mozilla.org/r/77856/#review76500

> Are you sure about this assert & decrement?
> 
> In ServiceUserCreated() above, you allow for a creation after shutdown, but don't modify the count for it (which makes sense).
> But then when that post-shutdown user gets destroyed, I believe this assertion will fail (either because the count is already 0; or you will decrement it once too many, so shutdown will be unblocked too soon and then the assertion will fail later on).
> 
> If I am correct, I cannot think of an easy solution to this, other than keeping a list of post-shutdown creations (this list should stay empty is almost all cases anyway), so that you can actively ignore them when they get destroyed.

You are right. The assertion here is incorrect since post-shutdown creation is not handled corretly. Instead of handling the post-shutdown creation problem, I think it makes much more sense if we just deny the creation of GMPServiceParent in GMPServiceParent::Create() if GeckoMediaPluginServiceParent::mShuttingDown is true. After all, we are shutting down the system, and creating a new instances of GMPServiceParent after the "profile-change-teardown" phase doesn't makes much sense. Then we don't need to worry about the inconsistency between ServiceUserCreated() and ServiceUserDestroyed().

> Does this dispatch really need to be synchronous? Synchronous dispatching between GMPT and MT has been a source of shutdown hangs, so it's best to avoid them when possible!
> ServiceUserDestroyed doesn't need the object to still be alive, does it?
> (Even if you implement my proposal from the previous issue, you will just pass the 'this' pointer as an ID instead of a usable pointer, so it should still be fine to dispatch asynchronously -- I don't think an ABA problem is possible, for reasons that cannot fit in this margin.)

It doesn't need to be synchornous. And if we deny the creation of GMPServiceParent in shutting down, then we don't event need to pass the pointer here.

Comment 47

•

8 years ago

mozreview-review-reply

Comment on attachment 8789755 [details]
Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down.

https://reviewboard.mozilla.org/r/77856/#review76500

> You are right. The assertion here is incorrect since post-shutdown creation is not handled corretly. Instead of handling the post-shutdown creation problem, I think it makes much more sense if we just deny the creation of GMPServiceParent in GMPServiceParent::Create() if GeckoMediaPluginServiceParent::mShuttingDown is true. After all, we are shutting down the system, and creating a new instances of GMPServiceParent after the "profile-change-teardown" phase doesn't makes much sense. Then we don't need to worry about the inconsistency between ServiceUserCreated() and ServiceUserDestroyed().

Thank you. I'll want to review that, please!

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Comment 48

•

8 years ago

mozreview-review

Comment on attachment 8789755 [details]
Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down.

https://reviewboard.mozilla.org/r/77856/#review76792

Back to 'r?' so I can review the upcoming big change.

Attachment #8789755 - Flags: review+ → review?(gsquelart)

Gerald Squelart (he/him) (not at Mozilla since 2022-09-15)

Updated

•

8 years ago

Assignee: cpearce → cyu

Component: Audio/Video: Playback → Audio/Video: GMP

Comment hidden (mozreview-request)

Comment on attachment 8789755 [details]
Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/77856/diff/1-2/

Comment 50

•

8 years ago

mozreview-review

Comment on attachment 8789755 [details]
Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down.

https://reviewboard.mozilla.org/r/77856/#review78378

Sorry for the delay, I didn't notice the new revision. I'm usually quick to review, so please buzz me next time you don't hear from me within a day.

r+ with concern addressed:

::: dom/media/gmp/GMPServiceParent.cpp:2118
(Diff revision 2)
> +
>    nsAutoPtr<GMPServiceParent> serviceParent(new GMPServiceParent(gmp));
>  
>    nsCOMPtr<nsIThread> gmpThread;
>    nsresult rv = gmp->GetThread(getter_AddRefs(gmpThread));
> -  NS_ENSURE_SUCCESS(rv, nullptr);
> +  MOZ_RELEASE_ASSERT(NS_SUCCEEDED(rv));

Why change this from a mild test-and-return-null to a very strong assert-and-crash-everywhere?
Not being in shutdown does not guarantee that gmp->GetThread will always succeed, e.g.: NS_NewNamedThread could fail (very unlikely, but we should still handle this gracefully where possible).

If you want to prevent an unnecessary GMPServiceParent construction&destruction in this case, I think you could move the creation from line 2114 to after this line.

Attachment #8789755 - Flags: review?(gsquelart) → review+

Assignee

Comment 51

•

8 years ago

mozreview-review-reply

Comment on attachment 8789755 [details]
Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down.

https://reviewboard.mozilla.org/r/77856/#review78378

> Why change this from a mild test-and-return-null to a very strong assert-and-crash-everywhere?
> Not being in shutdown does not guarantee that gmp->GetThread will always succeed, e.g.: NS_NewNamedThread could fail (very unlikely, but we should still handle this gracefully where possible).
> 
> If you want to prevent an unnecessary GMPServiceParent construction&destruction in this case, I think you could move the creation from line 2114 to after this line.

From GeckoMediaPluginService::GetThread(), it looks to me that if we are not in shutdown, then we should be able to get the thread (except when we fail to create the thread). But I think you are right that it the wrong choice to place an assertion here because the caller won't have any chance to handle the failure, even if the failure is very likely. And it's wrong that the caller method of GeckoMediaPluginService::GetThread() crosses the boundary to look at the implementation and place an assertion based on it. So I'll use NS_ENSURE_SUCCESS() here.

Comment hidden (mozreview-request)

Comment on attachment 8789755 [details]
Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/77856/diff/2-3/

Pulsebot

Comment 53

•

8 years ago

Pushed by cyu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/b82fa1825412
Block xpcom-will-shutdown before GMPServiceParent instances are shut down. r=gerald

Phil Ringnalda (:philor)

Comment 54

•

8 years ago

Backed out in https://hg.mozilla.org/integration/autoland/rev/79604052ba79 for static analysis bustage, https://treeherder.mozilla.org/logviewer.html#?job_id=3849813&repo=autoland

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Assignee

Comment 55

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=caba143550c7

Comment hidden (mozreview-request)

Comment on attachment 8789755 [details]
Bug 1279612 - Block xpcom-will-shutdown before GMPServiceParent instances are shut down.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/77856/diff/3-4/

Pulsebot

Comment 57

•

8 years ago

Pushed by cyu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/583124bcd4d9
Block xpcom-will-shutdown before GMPServiceParent instances are shut down. r=gerald

Reporter

Comment 58

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/583124bcd4d9

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox52: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla52

Ryan VanderMeulen [:RyanVM]

Comment 59

•

8 years ago

Not sure which branches (if any) you want to backport this to, but please fill out the approval requests whenever you decide :)

status-firefox49: affected → wontfix

status-firefox50: --- → affected

status-firefox51: --- → affected

Flags: needinfo?(gsquelart) → needinfo?(cyu)