Closed
Bug 1278443
Opened 8 years ago
Closed 8 years ago
ChildImpl::OpenProtocolOnMainThread crashes when attempting to send a message during shutdown
Categories
(Core :: IPC, defect, P2)
Tracking
()
RESOLVED
FIXED
mozilla52
People
(Reporter: ting, Assigned: cyu)
References
Details
(Keywords: crash, topcrash-mac, Whiteboard: btpp-backlog)
Crash Data
Attachments
(3 files, 1 obsolete file)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
text/x-review-board-request
|
billm
:
review+
gchang
:
approval-mozilla-aurora+
|
Details |
(deleted),
text/x-review-board-request
|
mrbkap
:
review+
|
Details |
This bug was filed from the Socorro interface and is
report bp-c35a89a5-deb8-4c1e-be49-a46202160606.
=============================================================
#2 of Mac OS X Nightly 20160605030215, 4 crashes from 2 installations. It's a MOZ_CRASH from opening PBackground failure.
Comment 1•8 years ago
|
||
Is this a new regression? I don't think that 4 crashes would typically be enough to investigate a bug report, in case this is developers doing odd things.
Flags: needinfo?(janus926)
Comment 2•8 years ago
|
||
That's four crashes for a single build, on OSX, which is fairly high for OSX. I see 51 crashes in the last 7 days, mostly on 49.
These are all happening during shutdown, so it is likely more fallout from the "submit old crash reports" thing.
BackgroundChild does have a sShutdownHasStarted variable that tracks when shutdown has started, but it is waiting for NS_XPCOM_SHUTDOWN_THREADS_OBSERVER_ID, which I think is partway through XPCOM shutdown, so it isn't catching when we trigger this in the earlier part of XPCOM shutdown, when we are presumably still not allowed to send IPC messages. That would be easy enough to fix to listen for whatever earlier point when we can't send IPC messages any more... but when we hit that condition we also crash, so it wouldn't really improve things! Maybe the behavior could be changed to fail more gracefully.
Updated•8 years ago
|
Summary: Crash in (anonymous namespace)::ChildImpl::OpenProtocolOnMainThread → ChildImpl::OpenProtocolOnMainThread crashes when attempting to send a message during shutdown
Updated•8 years ago
|
tracking-e10s:
--- → ?
Keywords: topcrash-mac
Comment 3•8 years ago
|
||
I don't know how much it matters, but in two of the crash reports I looked at, what is happening is that the child is receiving a destroy message, which causes it to trigger PageHide, which runs some script, which triggers what I think is a sync XHR that spins the event loop, which then tries to create this background child thing.
Updated•8 years ago
|
Comment 4•8 years ago
|
||
Ting-Yu, are you interested/able to investigate more here? At first glance this doesn't seem super-urgent.
Whiteboard: btpp-backlog
Reporter | ||
Comment 6•8 years ago
|
||
Let's see if bug 1268559 could help this.
Flags: needinfo?(janus926)
Comment 7•8 years ago
|
||
Updated•8 years ago
|
Priority: -- → P2
Updated•8 years ago
|
Crash Signature: [@ (anonymous namespace)::ChildImpl::OpenProtocolOnMainThread] → [@ (anonymous namespace)::ChildImpl::OpenProtocolOnMainThread] [@ `anonymous namespace''::ChildImpl::OpenProtocolOnMainThread ]
Updated•8 years ago
|
Comment 8•8 years ago
|
||
Crash volume for signature '(anonymous namespace)::ChildImpl::OpenProtocolOnMainThread':
- nightly (version 50): 35 crashes from 2016-06-06.
- aurora (version 49): 110 crashes from 2016-06-07.
- beta (version 48): 1 crash from 2016-06-06.
- release (version 47): 0 crashes from 2016-05-31.
- esr (version 45): 0 crashes from 2016-04-07.
Crash volume on the last weeks:
W. N-1 W. N-2 W. N-3 W. N-4 W. N-5 W. N-6 W. N-7
- nightly 6 3 1 2 0 7 9
- aurora 11 7 6 8 12 29 29
- beta 0 0 1 0 0 0 0
- release 0 0 0 0 0 0 0
- esr 0 0 0 0 0 0 0
Affected platform: Mac OS X
status-firefox48:
--- → affected
status-firefox49:
--- → affected
Updated•8 years ago
|
Keywords: intermittent-failure
Comment 9•8 years ago
|
||
Crash volume for signature '`anonymous namespace''::ChildImpl::OpenProtocolOnMainThread':
- nightly (version 51): 147 crashes from 2016-08-01.
- aurora (version 50): 280 crashes from 2016-08-01.
- beta (version 49): 185 crashes from 2016-08-02.
- release (version 48): 6 crashes from 2016-07-25.
- esr (version 45): 5 crashes from 2016-05-02.
Crash volume on the last weeks (Week N is from 08-22 to 08-28):
W. N-1 W. N-2 W. N-3
- nightly 42 51 31
- aurora 105 89 26
- beta 5 85 78
- release 3 0 1
- esr 2 0 1
Affected platform: Windows
Crash rank on the last 7 days:
Browser Content Plugin
- nightly #13
- aurora #16
- beta #803
- release #4554
- esr #4043
status-firefox51:
--- → affected
status-firefox-esr45:
--- → affected
Comment 10•8 years ago
|
||
bug 1277582 just landed, let's see if this will disappear too.
Hello Kan-Ru, I just checked the crash signature and I see crash reports from 09-12 aurora50 build. I don't think the fix from bug 1277582 address this signature. Will you be working on an alternate fix for it?
Flags: needinfo?(kchen)
Updated•8 years ago
|
Assignee: kchen → cyu
Comment 13•8 years ago
|
||
Crash volume for signature '`anonymous namespace''::ChildImpl::OpenProtocolOnMainThread':
- nightly (version 52): 148 crashes from 2016-09-19.
- aurora (version 51): 56 crashes from 2016-09-19.
- beta (version 50): 2 crashes from 2016-09-20.
- release (version 49): 10 crashes from 2016-09-05.
- esr (version 45): 21 crashes from 2016-06-01.
Crash volume on the last weeks (Week N is from 10-03 to 10-09):
W. N-1 W. N-2
- nightly 48 100
- aurora 45 11
- beta 2 0
- release 4 6
- esr 1 5
Affected platform: Windows
Crash rank on the last 7 days:
Browser Content Plugin
- nightly #11
- aurora #21
- beta #4616
- release #6227
- esr #4483
status-firefox52:
--- → affected
Assignee | ||
Comment 14•8 years ago
|
||
After applying the patch, just open and then close the browser. Then the content process crashes.
Assignee | ||
Comment 15•8 years ago
|
||
This patch fails ChildImpl::OpenProtocolOnMainThread() gently after PContentChild is closed.
It looks like CamerasChild can handle the failure in opening BackgroundChild.
The change in RuntimeService seems to work, but I am not sure if it's the right fix.
Attachment #8797584 -
Flags: review?(wmccloskey)
Comment on attachment 8797584 [details] [diff] [review]
WIP
Review of attachment 8797584 [details] [diff] [review]:
-----------------------------------------------------------------
If we decide to make the worker change, Blake should probably review that. I know very little about workers. Otherwise this looks good to me.
::: dom/workers/RuntimeService.cpp
@@ +2547,5 @@
>
> // Too many idle threads, just shut this one down.
> if (shutdownThread) {
> MOZ_ALWAYS_SUCCEEDS(aThread->Shutdown());
> } else if (scheduleTimer) {
Maybe combine the mIdleThreadTimer check with this one?
@@ -2787,5 @@
> // mWorkerPrivate->SetThread() in order to avoid accidentally consuming
> // worker messages here.
> if (NS_WARN_IF(!BackgroundChild::SynchronouslyCreateForCurrentThread())) {
> // XXX need to fire an error at parent.
> - return NS_ERROR_UNEXPECTED;
I guess I don't see why we need to change this. If the worker fails to start up when we're shutting down, would that be bad? It seems pretty reasonable to me.
::: ipc/glue/BackgroundImpl.cpp
@@ +2049,5 @@
>
> return true;
> }
>
> + if (sIPCShutdown) {
Can you just use
if (XRE_IsContentProcess() && ContentChild::GetSingleton()->IsShuttingDown())
?
Then you could remove the observer.
Attachment #8797584 -
Flags: review?(wmccloskey)
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Updated•8 years ago
|
Attachment #8797584 -
Attachment is obsolete: true
Assignee | ||
Comment 19•8 years ago
|
||
(In reply to Bill McCloskey (:billm) from comment #16)
> Comment on attachment 8797584 [details] [diff] [review]
> WIP
>
> Review of attachment 8797584 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> If we decide to make the worker change, Blake should probably review that. I
> know very little about workers. Otherwise this looks good to me.
>
> ::: dom/workers/RuntimeService.cpp
> @@ +2547,5 @@
> >
> > // Too many idle threads, just shut this one down.
> > if (shutdownThread) {
> > MOZ_ALWAYS_SUCCEEDS(aThread->Shutdown());
> > } else if (scheduleTimer) {
>
> Maybe combine the mIdleThreadTimer check with this one?
>
This is related to the problem in test case design. Starting a worker in "xpcom-shutdow" phase confuses RuntimeService's cleanup sequence because RuntimeService also observes "xpcom-shutdow" to Shutdown() itself.
> @@ -2787,5 @@
> > // mWorkerPrivate->SetThread() in order to avoid accidentally consuming
> > // worker messages here.
> > if (NS_WARN_IF(!BackgroundChild::SynchronouslyCreateForCurrentThread())) {
> > // XXX need to fire an error at parent.
> > - return NS_ERROR_UNEXPECTED;
>
> I guess I don't see why we need to change this. If the worker fails to start
> up when we're shutting down, would that be bad? It seems pretty reasonable
> to me.
>
It'd be pretty bad because the worker will not be unregistered. Then RuntimeService will wait forever for this leaked worker during shutdown.
(In reply to Cervantes Yu [:cyu] [:cervantes] from comment #19)
> (In reply to Bill McCloskey (:billm) from comment #16)
> > Comment on attachment 8797584 [details] [diff] [review]
> > WIP
> >
> > Review of attachment 8797584 [details] [diff] [review]:
> > -----------------------------------------------------------------
> >
> > If we decide to make the worker change, Blake should probably review that. I
> > know very little about workers. Otherwise this looks good to me.
> >
> > ::: dom/workers/RuntimeService.cpp
> > @@ +2547,5 @@
> > >
> > > // Too many idle threads, just shut this one down.
> > > if (shutdownThread) {
> > > MOZ_ALWAYS_SUCCEEDS(aThread->Shutdown());
> > > } else if (scheduleTimer) {
> >
> > Maybe combine the mIdleThreadTimer check with this one?
> >
> This is related to the problem in test case design. Starting a worker in
> "xpcom-shutdow" phase confuses RuntimeService's cleanup sequence because
> RuntimeService also observes "xpcom-shutdow" to Shutdown() itself.
I just meant it would be easier to write this:
} else if (scheduleTimer && mIdleThreadTimer) {
rather than:
} else if (scheduleTimer) {
if (mIdleThreadTimer) {
> > @@ -2787,5 @@
> > > // mWorkerPrivate->SetThread() in order to avoid accidentally consuming
> > > // worker messages here.
> > > if (NS_WARN_IF(!BackgroundChild::SynchronouslyCreateForCurrentThread())) {
> > > // XXX need to fire an error at parent.
> > > - return NS_ERROR_UNEXPECTED;
> >
> > I guess I don't see why we need to change this. If the worker fails to start
> > up when we're shutting down, would that be bad? It seems pretty reasonable
> > to me.
> >
> It'd be pretty bad because the worker will not be unregistered. Then
> RuntimeService will wait forever for this leaked worker during shutdown.
OK, I was afraid that might be the case. I wonder how hard it would be to actually notify the parent of the error.
Comment 21•8 years ago
|
||
mozreview-review |
Comment on attachment 8797962 [details]
Bug 1278443 - Part 1: Don't open PBackground actors after PContentChild is closed.
https://reviewboard.mozilla.org/r/83580/#review83282
Attachment #8797962 -
Flags: review?(wmccloskey) → review+
Comment 22•8 years ago
|
||
Hey Cervantes, sorry for my delay in reviewing this. I'll make sure to finish this review either later today or tomorrow.
Comment 23•8 years ago
|
||
mozreview-review |
Comment on attachment 8797963 [details]
Bug 1278443 - Part 2: Continue to run the worker even if BackgroundChild fails to create.
https://reviewboard.mozilla.org/r/83582/#review84330
I'm not sure I fully understand all of the implications of this, but let's try it.
Attachment #8797963 -
Flags: review?(mrbkap) → review+
Updated•8 years ago
|
Flags: needinfo?(cyu)
Assignee | ||
Comment 24•8 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&tochange=9fa393166521&fromchange=1175841277bc1bc3193ddd830aa569ca82317cf2&selectedJob=29426388
mochitest mda on Windows 7 VM looks unhappy with the change.
Flags: needinfo?(cyu)
Assignee | ||
Comment 25•8 years ago
|
||
Comment 26•8 years ago
|
||
Pushed by cyu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/53a144cdc973
Part 1: Don't open PBackground actors after PContentChild is closed. r=billm
https://hg.mozilla.org/integration/autoland/rev/d4858e65d890
Part 2: Continue to run the worker even if BackgroundChild fails to create. r=mrbkap
Comment 27•8 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/53a144cdc973
https://hg.mozilla.org/mozilla-central/rev/d4858e65d890
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
Comment 28•8 years ago
|
||
Is this worth trying to backport?
Flags: needinfo?(cyu)
Keywords: intermittent-failure
Comment 29•8 years ago
|
||
Hi Kan-Ru,
Since :cyu is OOO, this might need your help to create uplift request. Thanks.
Flags: needinfo?(cyu) → needinfo?(kchen)
Comment 30•8 years ago
|
||
Comment on attachment 8797962 [details]
Bug 1278443 - Part 1: Don't open PBackground actors after PContentChild is closed.
Approval Request Comment
[Feature/regressing bug #]:
[User impact if declined]: Fixed a shutdown crash
[Describe test coverage new/current, TreeHerder]: on m-c. Baked on Nightly.
[Risks and why]: Low. This patch disallows PBackground creation when shutdown; PBackground client should already handle connection failure gracefully.
[String/UUID change made/needed]:
Flags: needinfo?(kchen)
Attachment #8797962 -
Flags: approval-mozilla-aurora?
Comment 31•8 years ago
|
||
Comment on attachment 8797962 [details]
Bug 1278443 - Part 1: Don't open PBackground actors after PContentChild is closed.
Fix a crash. Take it in 51 aurora.
Attachment #8797962 -
Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Comment 33•8 years ago
|
||
bugherder uplift |
Backed out in https://hg.mozilla.org/releases/mozilla-aurora/rev/90660f89ad092bd9cb32261637be6b1c33a9e731 for being the likely cause of Wr failures like https://treeherder.mozilla.org/logviewer.html#?job_id=3962885&repo=mozilla-aurora
Comment 36•8 years ago
|
||
bugherder uplift |
You need to log in
before you can comment on or make changes to this bug.
Description
•