1277582 - PNeckoChild::SendPHttpChannelConstructor called during shutdown

Reporter

Description

•

8 years ago

This bug was filed from the Socorro interface and is report bp-59626cfd-df24-4c42-8446-90a822160602. ============================================================= Seen while looking at nightly crash stats. Signatures appear for both Windows and Mac. First appearance on Mac seems to be 5-14.

Andrew McCreight [:mccr8]

Comment 1

•

8 years ago

This might be a signature change for bug 1152087.

Jim Mathies [:jimm]

Comment 2

•

8 years ago

[Tracking Requested - why for this release]: #1 browser side top crasher in 49

tracking-firefox49: --- → ?

Andrew McCreight [:mccr8]

Comment 3

•

8 years ago

I looked at a few reports, and I noticed that almost all of them have a ShutdownProgress value, like profile-before-change or xpcom-shutdown. I'm not sure where that comes from (the child doesn't have profile-before change), but we might be trying to create a new HttpChannel after shutdown has started. The constructors seem to be mostly or all being called from HttpChannelChild::ContinueAsyncOpen.

Andrew McCreight [:mccr8]

Updated

•

8 years ago

Component: Networking → Networking: HTTP

Tracy Walker [:tracy]

Comment 5

•

8 years ago

This has risen to nearly 5% of all crashes on Nightly. note at time of this comment: How are we getting an installation count of 354 when there are only 222 crash reports on 49.0a1?

tracking-e10s: --- → ?

Andrew McCreight [:mccr8]

Comment 6

•

8 years ago

I think this spike is from the backwash of old crash reports from blassey's patch. Looking through the first page, most of them are from a week or more ago, which makes sense as this is some kind of shutdown crash. So I think it isn't a real top crash. Also, why are we doing anything with PNeckoChild in the parent? I find that surprising.

Andrew McCreight [:mccr8]

Comment 7

•

8 years ago

If you click on "by build date" instead of "by report date" on the top crash list, this is only #10, which about 1% of crashes. So, serious, but not as bad as it looks.

Keywords: topcrash

Andrew McCreight [:mccr8]

Updated

•

8 years ago

Summary: Crash in mozalloc_abort | Abort | NS_DebugBreak | mozilla::ipc::FatalError | mozilla::net::PNeckoChild::SendPHttpChannelConstructor → Parent process calls PNeckoChild::SendPHttpChannelConstructor during shutdown

David Baron :dbaron:

Updated

•

8 years ago

Flags: needinfo?(mcmanus)

Andrew McCreight [:mccr8]

Comment 8

•

8 years ago

(In reply to Andrew McCreight [:mccr8] from comment #6) > Also, why are we doing anything with PNeckoChild in the parent? I find that > surprising. I guess the parent process is the one that sends a message to start up the child, or something. Anyways, I'd guess what we need to do here is the usual IPC shutdown dance where something checks if we've shutdown somehow, and if so, don't send the message.

Jim Mathies [:jimm]

Comment 9

•

8 years ago

This has risen up to the #1 top browser crash in nightly. We're merging today so it's going to move to aurora as well.

Jim Mathies [:jimm]

Updated

•

8 years ago

Flags: needinfo?(jduell.mcbugs)

Andrew McCreight [:mccr8]

Updated

•

8 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1278161

Patrick McManus [:mcmanus]

Comment 10

•

8 years ago

jason is the right person to look at this and he's looking right now..

Flags: needinfo?(mcmanus)

Jason Duell

Comment 11

•

8 years ago

Valentin is taking this. Dragana points out it may be related to bug 1277961

Assignee: nobody → valentin.gosu

Flags: needinfo?(jduell.mcbugs)

Andrew McCreight [:mccr8]

Updated

•

8 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1277961

blinky

Comment 12

•

8 years ago

I can reproduce this crash, See https://dl.dropboxusercontent.com/u/95157096/85f61cf7/tt971ibpet.mp4. Crash reports: bp-19670c25-325e-42ef-ae3d-e6b2f2160606 bp-960750e9-471c-4b0f-9f7b-77ab92160606

blinky

Comment 13

•

8 years ago

Regression range: https://hg.mozilla.org/integration/mozilla-inbound/pushloghtml?fromchange=02c69f4f896255189ce2f9f4e0d875e383bcfbd7&tochange=a1bd47d76f71162534090485acc57866dcd55eef Regressed by: bug 1229961

Jim Mathies [:jimm]

Comment 14

•

8 years ago

(In reply to blinky from comment #13) > Regression range: > https://hg.mozilla.org/integration/mozilla-inbound/ > pushloghtml?fromchange=02c69f4f896255189ce2f9f4e0d875e383bcfbd7&tochange=a1bd > 47d76f71162534090485acc57866dcd55eef > > Regressed by: bug 1229961 Thanks for testing and providing the video. Curious how you came up with this regression range?

Flags: needinfo?(over68)

Jim Mathies [:jimm]

Comment 15

•

8 years ago

There's a mixture of stacks in here. The plugin _geturlnotify during plugin shutdown calls that blinky experienced are part of this, but not 100% of the stacks.

blinky

Comment 16

•

8 years ago

(In reply to Jim Mathies [:jimm] from comment #14) > (In reply to blinky from comment #13) > > Regression range: > > https://hg.mozilla.org/integration/mozilla-inbound/ > > pushloghtml?fromchange=02c69f4f896255189ce2f9f4e0d875e383bcfbd7&tochange=a1bd > > 47d76f71162534090485acc57866dcd55eef > > > > Regressed by: bug 1229961 > > Thanks for testing and providing the video. Curious how you came up with > this regression range? With setting the pref dom.ipc.plugins.asyncdrawing.enabled to false, I can not reproduce the crash.

blinky

Updated

•

8 years ago

Flags: needinfo?(over68)

Marcia Knous [:marcia]

Reporter

Comment 17

•

8 years ago

Let's track this crash for 49.

tracking-firefox49: ? → +

Valentin Gosu [:valentin] (he/him)

Comment 18

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=792f7a961887

Valentin Gosu [:valentin] (he/him)

Comment 19

•

8 years ago

Do you think you could try installing this build and letting us know if you can reproduce the bug? http://archive.mozilla.org/pub/firefox/try-builds/valentin.gosu@gmail.com-05e6aad0172e900900dd2d0ea04df3798ddccb6f/try-win32/firefox-49.0a1.en-US.win32.installer.exe Thanks!

Flags: needinfo?(over68)

blinky

Comment 20

•

8 years ago

(In reply to Valentin Gosu [:valentin] from comment #19) > Do you think you could try installing this build and letting us know if you > can reproduce the bug? > http://archive.mozilla.org/pub/firefox/try-builds/valentin.gosu@gmail.com- > 05e6aad0172e900900dd2d0ea04df3798ddccb6f/try-win32/firefox-49.0a1.en-US. > win32.installer.exe > > Thanks! I can not reproduce the crash with this build, thanks.

Flags: needinfo?(over68)

Valentin Gosu [:valentin] (he/him)

Comment 21

•

8 years ago

Attached patch Don't send SendPHttpChannelConstructor ipdl messages during child shutdown (obsolete) (deleted) — Details — Splinter Review

MozReview-Commit-ID: F6pCCn4jPVb

Attachment #8760642 - Flags: review?(mcmanus)

Dragana Damjanovic [:dragana]

Comment 22

•

8 years ago

Do you know why we are calling HttpChannelChild::Continue... in the parent process?

Valentin Gosu [:valentin] (he/him)

Comment 23

•

8 years ago

I don't think this is the parent process. There are plugin-container related things in the crash-report.

Dragana Damjanovic [:dragana]

Comment 24

•

8 years ago

(In reply to Valentin Gosu [:valentin] from comment #23) > I don't think this is the parent process. There are plugin-container related > things in the crash-report. Ok, only the title is wrong :)

Patrick McManus [:mcmanus]

Updated

•

8 years ago

Attachment #8760642 - Flags: review?(mcmanus)

Attachment #8760642 - Flags: review?(josh)

Attachment #8760642 - Flags: review?(dd.mozilla)

Attachment #8760642 - Flags: feedback+

Andrew McCreight [:mccr8]

Updated

•

8 years ago

Summary: Parent process calls PNeckoChild::SendPHttpChannelConstructor during shutdown → PNeckoChild::SendPHttpChannelConstructor called during shutdown

Dragana Damjanovic [:dragana]

Updated

•

8 years ago

Attachment #8760642 - Flags: review?(dd.mozilla) → review+

Josh Matthews [:jdm]

Comment 25

•

8 years ago

The crash stack in comment 0 is the exact same problem as bug 1277961; it's not anything related to shutdown in particular, it's that the TabChild being used in the XHR is no longer valid. In this particular crash stack, the original stack is from TabChild::RecvDestroy - this would ordinarily be fine, since it enqueues a runnable that sends the TabChild::__delete__ message to the parent. However, the stack shows us starting a synchronous XHR via the "unload" event dispatched to the tab child global, which processes the runnable and deletes the tab child (since there's apparently a service worker involved that causes the XHR networking to be asynchronous). A more robust fix that would deal with bug 1277961 as well would be to check the state of the tab child object we want to use before using it.

Josh Matthews [:jdm]

Comment 26

•

8 years ago

Comment on attachment 8760642 [details] [diff] [review] Don't send SendPHttpChannelConstructor ipdl messages during child shutdown As explained in the previous comment, I think the fact that this works is merely coincidence.

Attachment #8760642 - Flags: review?(josh) → review-

Andrew McCreight [:mccr8]

Comment 27

•

8 years ago

Comment 25 sounds similar to bug 1278443.

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1278443

Valentin Gosu [:valentin] (he/him)

Comment 28

•

8 years ago

(In reply to Josh Matthews [:jdm] from comment #26) > Comment on attachment 8760642 [details] [diff] [review] > Don't send SendPHttpChannelConstructor ipdl messages during child shutdown > > As explained in the previous comment, I think the fact that this works is > merely coincidence. I don't think it's a mere coincidence, this is one of the codepaths that lead to us sending IPC messages after the shutdown has begun. Indeed, it seems that bug 1277961 is a generalization of this issue, and would not be fixed by this patch. I'll have a chat with :jduell to see what are the options for fixing this.

Daniel Stenberg [:bagder]

Updated

•

8 years ago

Whiteboard: [necko-active]

Jim Mathies [:jimm]

Updated

•

8 years ago

tracking-e10s: ? → +

Priority: -- → P1

Valentin Gosu [:valentin] (he/him)

Comment 29

•

8 years ago

So, I talked about this issue with :BillM and it seems that attachment 8759907 [details] [diff] [review] would fix this issue (and bug 1277961).

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1268559

Daniel Stenberg [:bagder]

Comment 30

•

8 years ago

What seems to be a related shutdown crash involving the HTTP channel is in bug 1274886.

Daniel Stenberg [:bagder]

Updated

•

8 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1274886

Jim Mathies [:jimm]

Updated

•

8 years ago

Depends on: 1268559

Valentin Gosu [:valentin] (he/him)

Comment 31

•

8 years ago

Unfortunatelly, it seems bug 1268559 didn't fix this: https://crash-stats.mozilla.com/report/index/25b8c4d5-3926-40c9-a192-98be62160628

Valentin Gosu [:valentin] (he/him)

Comment 32

•

8 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=5a9b60ad0c43

Valentin Gosu [:valentin] (he/him)

Comment 33

•

8 years ago

Attached patch Don't send SendPHttpChannelConstructor ipdl messages during child shutdown (deleted) — Details — Splinter Review

MozReview-Commit-ID: F6pCCn4jPVb

Attachment #8765914 - Flags: review?(josh)

Attachment #8765914 - Flags: review?(dd.mozilla)

Valentin Gosu [:valentin] (he/him)

Updated

•

8 years ago

Attachment #8760642 - Attachment is obsolete: true

Josh Matthews [:jdm]

Comment 34

•

8 years ago

The FatalError that's being invoked in that crash report is different: ``` bool sendok__ = (mChannel)->Send(msg__); if ((!(sendok__))) { FatalError("constructor for actor failed"); return nullptr; } ```

Josh Matthews [:jdm]

Comment 35

•

8 years ago

That's the same stack trace as the crash report reported in https://bugzilla.mozilla.org/show_bug.cgi?id=1277961#c5, too.

Josh Matthews [:jdm]

Comment 36

•

8 years ago

I'm not sure what to do about that; this patch shouldn't affect it at all, so we'll probably need input from someone like billm who knows the IPC layer well.

Flags: needinfo?(wmccloskey)

Liz Henry (:lizzard Please n-i to RyanVM, jcristau, or pascal)

Comment 37

•

8 years ago

Is this related to the crash from bug 1277961? This is the #1 topcrash in aurora so I'm hoping once this new patch lands we can judge if it's effective and uplift it.

status-firefox50: --- → affected

tracking-firefox50: --- → +

Flags: needinfo?(valentin.gosu)

Valentin Gosu [:valentin] (he/him)

Comment 38

•

8 years ago

The stack trace in bug 1277961 also contains SendPHttpChannelConstructor. I fully expect that bug would be fixed by this patch.

Flags: needinfo?(valentin.gosu)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 39

•

8 years ago

It looks like there are several issues here. The simplest one (which mainly seems to be covered by bug 1277961) is that the PContent link to the parent process has unexpectedly closed. I'm not sure why that happens--maybe the parent crashed. Any attempt to send at this point will fail. The best thing we can do is avoid sending. So I think Valentin's patch will work in this case. So I think we should proceed with this patch, but maybe it should be moved to bug 1277961. However, there's a separate issue where we're shutting down normally. We should be allowed to send sync XHRs at this point since we delete the tab actor in a runnable. However, it looks like the XHR tries to send more IPC stuff in a separate turn of the event loop from the initial XHR, and it somehow has to do with service workers? This part I don't really understand. What does ResetInterception mean? Anyway, the ideal solution for this problem is to avoid tearing down the tab actor until the XHR has completed. I think we could use a similar solution as in bug 1268559: the Send__delete__ runnable for TabChild would be put on hold if we're in a nested event loop. However, it would be great if we had a test case for this issue. It's pretty easy to send an XHR from an "unload" or "pagehide" handler, but I don't know how to get to this ResetInterception situation. Josh, can you answer that or needinfo someone who can? It will be a lot easier to patch this with a test case.

Flags: needinfo?(wmccloskey) → needinfo?(josh)

Valentin Gosu [:valentin] (he/him)

Updated

•

8 years ago

Blocks: 1277961

Josh Matthews [:jdm]

Comment 40

•

8 years ago

(In reply to Bill McCloskey (:billm) from comment #39) > However, there's a separate issue where we're shutting down normally. We > should be allowed to send sync XHRs at this point since we delete the tab > actor in a runnable. However, it looks like the XHR tries to send more IPC > stuff in a separate turn of the event loop from the initial XHR, and it > somehow has to do with service workers? This part I don't really understand. > What does ResetInterception mean? > > Anyway, the ideal solution for this problem is to avoid tearing down the tab > actor until the XHR has completed. I think we could use a similar solution > as in bug 1268559: the Send__delete__ runnable for TabChild would be put on > hold if we're in a nested event loop. > > However, it would be great if we had a test case for this issue. It's pretty > easy to send an XHR from an "unload" or "pagehide" handler, but I don't know > how to get to this ResetInterception situation. Josh, can you answer that or > needinfo someone who can? It will be a lot easier to patch this with a test > case. With regards to the test case, the only difference is that a page with an unload/pagehide handler needs to register an empty service worker - this will cause the ResetInterception code path to be taken, which makes AsyncOpen become more asynchronous than usual.

Flags: needinfo?(josh)

Josh Matthews [:jdm]

Comment 41

•

8 years ago

Comment on attachment 8765914 [details] [diff] [review] Don't send SendPHttpChannelConstructor ipdl messages during child shutdown Review of attachment 8765914 [details] [diff] [review]: ----------------------------------------------------------------- I'm not convinced about how much this will help, but it certainly shouldn't hurt besides covering up cases where we should be sending valid actors but can't due to bugs.

Attachment #8765914 - Flags: review?(josh) → review+

Dragana Damjanovic [:dragana]

Updated

•

8 years ago

Attachment #8765914 - Flags: review?(dd.mozilla) → review+

Valentin Gosu [:valentin] (he/him)

Comment 42

•

8 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/c465467931d3806df0a28d3351d6eb15351b561a Bug 1277582 - Don't send SendPHttpChannelConstructor ipdl messages during child shutdown r=dragana,jdm

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 43

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/c465467931d3

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox50: affected → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla50

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 44

•

8 years ago

This isn't fixed. We still need to figure out what to do during normal shutdown.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Valentin Gosu [:valentin] (he/him)

Comment 46

•

8 years ago

It appears that the latest patch hasn't fixed the issue either: https://crash-stats.mozilla.com/report/index/67dbee6c-acf9-44ed-92d2-ce61e2160709 This is quite confusing to me. I was expecting that tabChild->IsDestroyed() would be true in this case. Bill, is there any other reason why ``` bool sendok__ = (mChannel)->Send(msg__); ``` might return false, other than shutdown?

Flags: needinfo?(wmccloskey)

Ben Kelly [:bkelly, not reviewing]

Updated

•

8 years ago

Depends on: 1286008

Ben Kelly [:bkelly, not reviewing]

Comment 47

•

8 years ago

This seems to be causing us to leak outer windows. See the STR in bug 1286008. After backing this patch out the leaks go away. If this doesn't even fix the origin problem can we just back it out please?

Flags: needinfo?(valentin.gosu)

Ben Kelly [:bkelly, not reviewing]

Comment 48

•

8 years ago

I suspect twitter leaks with this patch because it uses a service worker without a fetch event handler. Its always hitting the ResetInterception() path which basically does nothing with an error code from ContinueAsyncOpen(): https://dxr.mozilla.org/mozilla-central/source/netwerk/protocol/http/HttpChannelChild.cpp#2531 Josh, it seems like maybe ResetInterception() should do something with that error code to abort the channel?

Flags: needinfo?(josh)

Ben Kelly [:bkelly, not reviewing]

Comment 49

•

8 years ago

Actually, I have an easy fix for the bad error handling in bug 1286258. I'll just fix that there.

Flags: needinfo?(josh)

Kan-Ru Chen [:kanru] (UTC+9)

Comment 50

•

8 years ago

This is #3 crash on windows nightly build 20160711034039 with 21 crashes.

status-firefox50: fixed → affected

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 51

•

8 years ago

As I said in comment 39, I don't expect the patch here to fix this bug. I had expected it to fix bug 1277961 (the patch really should have been moved there). However, it looks like it didn't fix that bug. I think the reason is that we're doing the wrong check. IsDestroyed() only checks if the tab has been destroyed gracefully. We should be using IPCOpen() instead. Comment 39 describes a change that I think would actually fix this bug. Valentin, is that something you could work on?

Flags: needinfo?(wmccloskey)

Kan-Ru Chen [:kanru] (UTC+9)

Updated

•

8 years ago

decebalus

Comment 52

•

8 years ago

Hi! I have many crashes on some .pdf files and my crash reports suggests this bug. These crashes happens only with e10s enabled in all versions: FF 47/ Beta / Nightly. FF freeze and I should kill process from task manager (or kill Firefox plugin container wich remains at 30-35% CPU usage) I tried everything: clean profile/clean installation/ safe mode..the issue remains. OS: WIN 8.1 pro

Valentin Gosu [:valentin] (he/him)

Comment 53

•

8 years ago

Attached patch Check if TabChild is still available before calling SendPHttpChannelConstructor (deleted) — Details — Splinter Review

MozReview-Commit-ID: 5XH3cYkbUd1

Attachment #8771178 - Flags: review?(wmccloskey)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Updated

•

8 years ago

Attachment #8771178 - Flags: review?(wmccloskey) → review+

Valentin Gosu [:valentin] (he/him)

Updated

•

8 years ago

Keywords: checkin-needed

Valentin Gosu [:valentin] (he/him)

Comment 54

•

8 years ago

(In reply to Bill McCloskey (:billm) from comment #51) > Comment 39 describes a change that I think would actually fix this bug. > Valentin, is that something you could work on? I'm actually not very familiar with the workings of TabChild, which is probably why I didn't fix this properly. If you could help me with that I would really appreciate it.

Flags: needinfo?(valentin.gosu)

Pulsebot

Comment 55

•

8 years ago

Pushed by ryanvm@gmail.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/2c0a0814add0 Check if TabChild is still available before calling SendPHttpChannelConstructor. r=billm

Keywords: checkin-needed

Carsten Book [:Tomcat]

Comment 56

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/2c0a0814add0

Status: REOPENED → RESOLVED

Closed: 8 years ago → 8 years ago

status-firefox50: affected → fixed

Resolution: --- → FIXED

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Updated

•

8 years ago

Status: RESOLVED → REOPENED

Flags: needinfo?(wmccloskey)

Resolution: FIXED → ---

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 57

•

8 years ago

Bug 1277961 (the PNeckoChild::Write signature) appears to be fixed now!

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 58

•

8 years ago

Attached patch wait for nested event loops to finish at TabChild::Destroy (obsolete) (deleted) — Details — Splinter Review

I wasn't able to reproduce the crash, but I was able to reproduce the XHR failing when a service worker was registered. This patch fixes that problem. Hopefully it will solve the crashes too. Blake, the basic problem is that there's a sync XHR and within the nested event loop we're tearing down the PBrowser. That causes the XHR to fail/crash. We already protect against tearing down PContent in this situation. This patch just protects PBrowser in the same (ugly) way. Eventually I would like to avoid running (at least some) IPC message handlers from within nested event loops, but that will require quite a bit more testing.

Flags: needinfo?(wmccloskey)

Attachment #8772627 - Flags: review?(mrbkap)

Blake Kaplan (:mrbkap) (inactive)

Updated

•

8 years ago

Attachment #8772627 - Flags: review?(mrbkap) → review+

Kan-Ru Chen [:kanru] (UTC+9)

Comment 59

•

8 years ago

Still seeing the crashes. Bill, is the patch ready to land?

Flags: needinfo?(wmccloskey)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 60

•

8 years ago

Not quite. There were some test failures that I need to figure out.

Flags: needinfo?(wmccloskey)

Valentin Gosu [:valentin] (he/him)

Updated

•

8 years ago

Assignee: valentin.gosu → wmccloskey

Liz Henry (:lizzard Please n-i to RyanVM, jcristau, or pascal)

Comment 61

•

8 years ago

Marking this affected for 50 since the bug was reopened. Once we have a verified fix, we may want to uplift it as far as beta since it looks like a top crash in beta 1.

status-firefox50: fixed → affected

Tobias B. Besemer [:BesTo] (QA)

Comment 62

•

8 years ago

[Tracking Requested - why for this release]: FF51.0a1: 165 crashes in the last 7 days https://crash-stats.mozilla.com/signature/?product=Firefox&signature=mozalloc_abort%20%7C%20NS_DebugBreak%20%7C%20mozilla%3A%3Aipc%3A%3AFatalError%20%7C%20mozilla%3A%3Anet%3A%3APNeckoChild%3A%3ASendPHttpChannelConstructor

status-firefox51: --- → affected

tracking-firefox51: --- → ?

Tobias B. Besemer [:BesTo] (QA)

Updated

•

8 years ago

OS: All → Windows

Version: Trunk → 49 Branch

Tobias B. Besemer [:BesTo] (QA)

Updated

•

8 years ago

status-firefox48: --- → affected

status-firefox-esr45: --- → unaffected

Version: 49 Branch → 48 Branch

Marcia Knous [:marcia]

Reporter

Comment 63

•

8 years ago

Tracking 51+ for this crash which we can check on nightly volume once the fix lands.

tracking-firefox51: ? → +

Liz Henry (:lizzard Please n-i to RyanVM, jcristau, or pascal)

Comment 64

•

8 years ago

Something else must have mitigated this crash for 49, maybe something Dragana worked on. It isn't a topcrash in beta anymore, but still looks significant volume in 51 and 50.

status-firefox48: affected → wontfix

status-firefox49: affected → wontfix

Marcia Knous [:marcia]

Reporter

Comment 65

•

8 years ago

(In reply to Bill McCloskey (:billm) from comment #60) > Not quite. There were some test failures that I need to figure out. Bill: Any progress yet? This remains #4 top browser crash on nightly.

Flags: needinfo?(wmccloskey)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 66

•

8 years ago

I spent quite a bit of time on this today. It turns out my original patch wouldn't have fixed the problem. I'm not sure why I thought it would. However, I've been able to reproduce a crash that looks identical to the ones we're seeing in the reports. Here's what I think is happening: 1. The user quits while there is a page open that has a service worker registered. The page has initiated an XHR at some point (maybe in an onunload handler, but could be anywhere). 2. We go through the shutdown sequence, sending the Destroy message to every tab and then the Shutdown message to the ContentChild. 3. The parent calls the Close() method on the PContent channel. 4. The child receives the goodbye message on the IPC thread and posts a runnable on the main thread to call ActorDestroy on the ContentChild. It immediately sets the channel state to closed. 5. At this point, the XHR rises up and decides it wants to ContinueAsyncOpen. Even though the tab it's associated with is dead, it doesn't notice this because the tabChild we get from GetCallbacks is null anyway. I'm not sure why this is. When it sends the PHttpChannelConstructor message, Connected() is false for the channel and we assert. 6. We write out a minidump, but the parent never intercepts it because the ContentParent for that process is already gone. That's why the ProcessType is being set incorrectly for these crashes (bug 1299964). I'll prepare a patch for this ASAP. We can patch around this, but it exposes some fundamental issues in how IPDL deals with failure. Since this is a shutdown crash that only gets reported through the infobar, it's invisible to release users. So it's not a major priority.

Flags: needinfo?(wmccloskey)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 67

•

8 years ago

Attached patch don't send HTTP constructor when shutting down (deleted) — Details — Splinter Review

Please see the previous comment for an explanation of what's going on. Basically, there's a short time interval between when the IPC thread finds out that the channel is closed and when the main thread finds out that the channel is closed where we'll crash if we send any constructor messages. It seems like networking is where this tends to happen, but it could happen anywhere. This patch fixes the problem for networking. The parent won't Close() the channel until the child sends the FinishShutdown message. We really need to fix this more generally though. I'll file something for that.

Attachment #8772627 - Attachment is obsolete: true

Attachment #8787798 - Flags: review?(mrbkap)

Blake Kaplan (:mrbkap) (inactive)

Comment 68

•

8 years ago

Comment on attachment 8787798 [details] [diff] [review] don't send HTTP constructor when shutting down Review of attachment 8787798 [details] [diff] [review]: ----------------------------------------------------------------- r=me with my question addressed. ::: netwerk/protocol/http/HttpChannelChild.cpp @@ +1998,5 @@ > AddIPDLReference(); > > + ContentChild* cc = static_cast<ContentChild*>(gNeckoChild->Manager()); > + if (cc->IsShuttingDown()) { > + return NS_ERROR_FAILURE; Shouldn't this happen before the AddIPDLReference call? If we don't call SendPHttpChannelConstructor then we never actually transfer ownership. It isn't a big deal because we're shutting down anyway, but might be worth fixing.

Attachment #8787798 - Flags: review?(mrbkap) → review+

Gabriele Svelto [:gsvelto]

Updated

•

8 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1282776

Pulsebot

Comment 69

•

8 years ago

Pushed by wmccloskey@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/f7fc4fab8018 Don't send HTTP constructor when content process has started to shut down (r=mrbkap)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 70

•

8 years ago

Good point. I moved the check up above the addref.

Carsten Book [:Tomcat]

Comment 71

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/f7fc4fab8018

Status: REOPENED → RESOLVED

Closed: 8 years ago → 8 years ago

status-firefox51: affected → fixed

Resolution: --- → FIXED

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 72

•

8 years ago

Hi Bill, should we consider uplifting this fix to Aurora50?

Flags: needinfo?(wmccloskey)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 73

•

8 years ago

Sure. I would like to hold off uplifting to beta since this crash doesn't really affect release users, and we should make sure there are no regressions.

Flags: needinfo?(wmccloskey)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 74

•

8 years ago

Comment on attachment 8787798 [details] [diff] [review] don't send HTTP constructor when shutting down This fixes a content process crash that is only visible to users when the unsubmitted crashes infobar pops up. It cannot lead to data loss as far as I know. Approval Request Comment [Feature/regressing bug #]: e10s I think [User impact if declined]: more unsubmitted crash reports [Describe test coverage new/current, TreeHerder]: on m-c for a little while [Risks and why]: Simple patch, pretty low risk. [String/UUID change made/needed]: None

Attachment #8787798 - Flags: approval-mozilla-aurora?

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 75

•

8 years ago

I'll wait for this to stabilize on Nightly51 for a day or two before approving the uplift to Aurora50.

Ritu Kothari (:ritu) (Inactive, please n-i to RyanVM, jcristau, or pascal)

Comment 76

•

8 years ago

Comment on attachment 8787798 [details] [diff] [review] don't send HTTP constructor when shutting down Crash fix, verified that the data from Nightly looks promising so far, Aurora50+

Attachment #8787798 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

Target Milestone: mozilla50 → mozilla51

Ryan VanderMeulen [:RyanVM]

Comment 77

•

8 years ago

bugherder uplift

https://hg.mozilla.org/releases/mozilla-aurora/rev/5a6b175148fc

status-firefox50: affected → fixed

Bill McCloskey [inactive unless it's an emergency] (:billm)

Assignee

Comment 78

•

8 years ago

It looks like this worked. Last nightly crash was on 9/7.

Junior [inactive]

Updated

•

8 years ago

Blocks: 1363659

Don't send SendPHttpChannelConstructor ipdl messages during child shutdown 8 years ago Valentin Gosu [:valentin] (he/him) (deleted), patch	dragana : review+ jdm : review- mcmanus : feedback+	Details \| Diff \| Splinter Review
Don't send SendPHttpChannelConstructor ipdl messages during child shutdown 8 years ago Valentin Gosu [:valentin] (he/him) (deleted), patch	jdm : review+ dragana : review+	Details \| Diff \| Splinter Review
Check if TabChild is still available before calling SendPHttpChannelConstructor 8 years ago Valentin Gosu [:valentin] (he/him) (deleted), patch	billm : review+	Details \| Diff \| Splinter Review
wait for nested event loops to finish at TabChild::Destroy 8 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) (deleted), patch	mrbkap : review+	Details \| Diff \| Splinter Review
don't send HTTP constructor when shutting down 8 years ago Bill McCloskey [inactive unless it's an emergency] (:billm) (deleted), patch	mrbkap : review+ ritu : approval-mozilla-aurora+	Details \| Diff \| Splinter Review