Closed
Bug 1423261
Opened 7 years ago
Closed 6 years ago
Crash in mozilla::ipc::MessageChannel::Clear | mozilla::ipc::MessageChannel::~MessageChannel | mozilla::dom::PContentParent::~PContentParent
Categories
(Core :: DOM: Content Processes, defect, P1)
Tracking
RESOLVED
FIXED
mozilla61
Version | Tracking | Status |
---|---|---|
firefox-esr52 | --- | unaffected |
firefox-esr60 | --- | fixed |
firefox57 | --- | unaffected |
firefox58 | --- | wontfix |
firefox59 | --- | wontfix |
firefox60 | - | wontfix |
firefox61 | --- | fixed |
People
(Reporter: philipp, Assigned: spohl)
References
Details
(4 keywords)
Crash Data
Attachments
(3 files, 9 obsolete files)
(deleted), patch | jimm: review+ | Details | Diff | Splinter Review
(deleted), patch | jimm: review+ | Details | Diff | Splinter Review
(deleted), patch | jimm: review+, RyanVM: approval-mozilla-esr60+ | Details | Diff | Splinter Review
This bug was filed from the Socorro interface and is report bp-0077eff1-a75a-4c26-ac0c-0dd730171205.
=============================================================
Top 10 frames of crashing thread:
0 xul.dll mozilla::ipc::MessageChannel::Clear ipc/glue/MessageChannel.cpp:708
1 xul.dll mozilla::ipc::MessageChannel::~MessageChannel ipc/glue/MessageChannel.cpp:581
2 xul.dll mozilla::dom::PContentParent::~PContentParent ipc/ipdl/PContentParent.cpp:264
3 xul.dll mozilla::dom::ContentParent::cycleCollection::DeleteCycleCollectable dom/ipc/ContentParent.h:316
4 xul.dll SnowWhiteKiller::~SnowWhiteKiller xpcom/base/nsCycleCollector.cpp:2729
5 xul.dll nsCycleCollector::FreeSnowWhite xpcom/base/nsCycleCollector.cpp:2917
6 xul.dll nsCycleCollector::Shutdown xpcom/base/nsCycleCollector.cpp:3987
7 xul.dll nsCycleCollector_shutdown xpcom/base/nsCycleCollector.cpp:4373
8 xul.dll mozilla::ShutdownXPCOM xpcom/build/XPCOMInit.cpp:973
9 xul.dll ScopedXPCOMStartup::~ScopedXPCOMStartup toolkit/xre/nsAppRunner.cpp:1508
=============================================================
Crashes during shutdown with this signature are (re)appearing in Firefox 58; the same signature was fixed once before in bug 1363601.
On 58.0b this is currently the #3 browser top crash. All the crashes are from Windows and come with MOZ_CRASH(MessageChannel destroyed without being closed).
It first regressed in 58.0a1 build 20171003100226, so a likely pushlog containing the regression would be: https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=15f221f491f707b1e8e46da344b6dd5a394b1242&tochange=11fe0a2895aab26c57bcfe61b3041d7837e954cd
Comment 1•7 years ago
Hey Gabor, can you take a quick look here? You fixed this previously, had something to do with preallocated processes.
Flags: needinfo?(gkrizsanits)
Comment 2•7 years ago
(In reply to [:philipp] from comment #0)
> crashes during shutdown with this signature are (re)appearing in firefox 58
> - the same signature got fixed once before in bug 1363601.
> on 58.0b this is currently the #3 browser top crash, all the crashes are
> from windows and come with MOZ_CRASH(MessageChannel destroyed without being
> closed).
>
> it first regressed on 58.0a1 build 20171003100226, so a likely pushlog
> containing the regression would be:
> https://hg.mozilla.org/mozilla-central/
> pushloghtml?fromchange=15f221f491f707b1e8e46da344b6dd5a394b1242&tochange=11fe
> 0a2895aab26c57bcfe61b3041d7837e954cd
The patch that re-enables the preallocated process manager is in this range. However, that wasn't a big change at the time, so I'm not convinced it's related. (Note that we have crashes with this signature even on the 57 release, where the ppm is disabled completely.) Something seems to have changed around November 15th-ish: https://crash-stats.mozilla.com/signature/?product=Firefox&signature=mozilla%3A%3Aipc%3A%3AMessageChannel%3A%3AClear%20%7C%20mozilla%3A%3Aipc%3A%3AMessageChannel%3A%3A~MessageChannel%20%7C%20mozilla%3A%3Adom%3A%3APContentParent%3A%3A~PContentParent&date=%3E%3D2017-09-01T13%3A35%3A00.000Z&date=%3C2017-12-07T13%3A35%3A59.000Z#graphs
That's where the crash rate went up like crazy. I'm not sure why; I couldn't find anything suspicious yet.
Flags: needinfo?(gkrizsanits)
Reporter
Comment 3•7 years ago
(In reply to Gabor Krizsanits [:krizsa :gabor] from comment #2)
> Something seems to be changed around November 15th-ish:
> https://crash-stats.mozilla.com/signature/
> ?product=Firefox&signature=mozilla%3A%3Aipc%3A%3AMessageChannel%3A%3AClear%20
> %7C%20mozilla%3A%3Aipc%3A%3AMessageChannel%3A%3A~MessageChannel%20%7C%20mozil
> la%3A%3Adom%3A%3APContentParent%3A%3A~PContentParent&date=%3E%3D2017-09-
> 01T13%3A35%3A00.000Z&date=%3C2017-12-07T13%3A35%3A59.000Z#graphs
>
> that's where the crash-rate went up like crazy, I'm not sure why, could not
> found anything suspicious yet.
That's when 58 was released/pushed to the beta population.
If you filter to crashes on Nightly, you can see they started appearing on that channel on October 3rd (the pushlog of the changes is in comment #0): https://crash-stats.mozilla.com/signature/?product=Firefox&release_channel=nightly&signature=mozilla%3A%3Aipc%3A%3AMessageChannel%3A%3AClear%20%7C%20mozilla%3A%3Aipc%3A%3AMessageChannel%3A%3A~MessageChannel%20%7C%20mozilla%3A%3Adom%3A%3APContentParent%3A%3A~PContentParent&date=%3E%3D2017-09-01T15%3A35%3A00.000Z#graphs
Comment 4•7 years ago
Gabor, does anything in the range from Comment #3 stick out to you?
Flags: needinfo?(gkrizsanits)
Comment 5•7 years ago
(In reply to Mike Taylor [:miketaylr] (58 Regression Engineering Owner) from comment #4)
> Gabor, does anything in the range from Comment #3 stick out to you?
Based on comment 3, it's probably the preallocated process manager (bug 1385249).
I've looked into this over the last couple of days, but without being able to reproduce it, it's quite hard to tell what's going on. It's a Windows-only shutdown crash, and it seems like it can happen without the ppm, but the ppm makes it (much) more likely to happen. What _should_ ensure that this never happens is that during shutdown we explicitly spin the event loop and wait for the channels to close for all content processes, including the ppm, here:
https://searchfox.org/mozilla-central/rev/f6f1731b1b7fec332f86b55fa40e2c9ae67ac39b/dom/ipc/ContentParent.cpp#2764
and shortly before that we make sure that no more preallocated processes are spawned during shutdown, here:
https://searchfox.org/mozilla-central/rev/f6f1731b1b7fec332f86b55fa40e2c9ae67ac39b/dom/ipc/PreallocatedProcessManager.cpp#139
I discovered that the ppm should probably release the preallocated process right after the channel is closed (this currently happens a bit later: https://searchfox.org/mozilla-central/rev/f6f1731b1b7fec332f86b55fa40e2c9ae67ac39b/dom/ipc/PreallocatedProcessManager.cpp#84)
What might also cause some trouble is that mForceKillTimer might fire while we're waiting for the channel, but I fail to see how that would leave the channel open.
Flags: needinfo?(gkrizsanits)
Comment 6•7 years ago
I guess it's worth a try to release the preallocated process earlier and see if that helps somehow, but this will probably require more work to understand the real underlying issue around content process shutdown. Another approach is to convert the failing release assertion, during shutdown, into something more resilient.
Comment 7•7 years ago
Jim/Blake, any ideas of how to progress here?
Flags: needinfo?(mrbkap)
Flags: needinfo?(jmathies)
Updated•7 years ago
Assignee: nobody → mrbkap
Flags: needinfo?(mrbkap)
Comment 8•7 years ago
This is almost certainly the same problem that billm found in bug 1138520, comment 26. In particular, KillHard either isn't killing the (hung?) child process or, at least, killing the process without notifying the parent and closing the channel.
I don't know too much about Windows IPC stuff and especially its interaction with force killing processes. Jim, do you know what could cause this? It's worth noting that in the past, this seems to have happened for hung child processes (and those processes stayed hung for at least a minute, triggering the parent process's shutdown hang killer).
We probably should be figuring out what's happening to these processes.
Reporter
Comment 9•7 years ago
[Tracking Requested - why for this release]:
So far in 58rc1 this is the #2 top browser crash (close to 3% of all browser crashes).
tracking-firefox58:
--- → ?
If we can get a fix ready, this seems like a good candidate for inclusion in a potential 58.0.1 release. GChang, FYI.
Flags: needinfo?(gchang)
Comment 11•7 years ago
Hey Bob, do you have any ideas for how to track this down? I'm wondering if the child process ever starts up at all.
One thing I've noticed is that we don't seem to ever upgrade the process priority of the preallocated processes that we create. I suppose it might be possible, then, that it doesn't get scheduled ever. I'll file a new bug to track that.
Flags: needinfo?(bobowencode)
Comment 12•7 years ago
(In reply to Blake Kaplan (:mrbkap) from comment #11)
> One thing I've noticed is that we don't seem to ever upgrade the process
> priority of the preallocated processes that we create. I suppose it might be
> possible, then, that it doesn't get scheduled ever. I'll file a new bug to
> track that.
After reading more code, setting process priorities isn't enabled on any of our current platforms anyway, so fixing this won't have any effect.
Comment 13•7 years ago
I'll track this as a possible dot release candidate for 58.
Flags: needinfo?(gchang)
Comment 14•7 years ago
https://crash-stats.mozilla.com/search/?signature=%3Dmozilla%3A%3Aipc%3A%3AMessageChannel%3A%3AClear%20%7C%20mozilla%3A%3Aipc%3A%3AMessageChannel%3A%3A~MessageChannel%20%7C%20mozilla%3A%3Adom%3A%3APContentParent%3A%3A~PContentParent&product=Firefox&date=%3E%3D2018-01-11T14%3A45%3A34.000Z&date=%3C2018-01-18T14%3A45%3A34.000Z&_sort=-date&_facets=signature&_facets=version&_facets=uptime&_facets=shutdown_progress&_columns=date&_columns=signature&_columns=product&_columns=version&_columns=build_id&_columns=platform#facet-uptime
crash notes -
* always occurs during xpcom shutdown
* uptimes are between 25 and about 130 seconds, none higher.
* currently #4 top browser crash for beta 58
I'm not sure what those uptimes imply but I find it interesting. An odd timing window.
Comment 15•7 years ago
This also impacts both 64-bit and 32-bit builds, although 32-bit has a much higher incidence rate.
Comment 16•7 years ago
(In reply to Jim Mathies [:jimm] from comment #14)
> * uptimes are between 25 and about 130 seconds, none higher.
>
> I'm not sure what those uptimes imply but I find it interesting. An odd
> timing window.
I think Super Search is messing with me here; uptimes do go higher, they just don't show up in the search results.
Flags: needinfo?(jmathies)
Comment 17•7 years ago
(In reply to Blake Kaplan (:mrbkap) from comment #8)
> This is almost certainly the same problem that billm found in bug 1138520,
> comment 26. In particular, KillHard either isn't killing the (hung?) child
> process or, at least, killing the process without notifying the parent and
> closing the channel.
I don't think it's the responsibility of the child process to communicate shutdown. The KillProcess call in KillHard's OnGenerateMinidumpComplete calls directly through to TerminateProcess, and I've never seen that call fail. By the way, there's a scary wait in here [1]: 60 seconds waiting for the process to shut down.
> I don't know too much about Windows IPC stuff and especially its interaction
> with force killing processes. Jim, do you know what could cause this? It's
Not yet.
One note: I don't think this is a case where the child never launches, since we open the handle [2] right before we try to terminate it. That call should fail for an invalid handle.
[1] https://searchfox.org/mozilla-central/source/ipc/chromium/src/base/process_util_win.cc#427
[2] https://searchfox.org/mozilla-central/source/dom/ipc/ContentParent.cpp#3161
Updated•7 years ago
Flags: needinfo?(jmathies)
Updated•7 years ago
Flags: needinfo?(jmathies)
Flags: needinfo?(bobowencode)
Updated•7 years ago
Assignee: mrbkap → nobody
Updated•7 years ago
Priority: -- → P2
Updated•7 years ago
tracking-firefox58:
+ → ---
Assignee
Comment 18•7 years ago
We call MOZ_CRASH[1] because we believe the MessageChannel to not be closed. We rely on Unsound_IsClosed()[2] to determine whether or not the channel is closed. But the method name implies that it isn't sound to rely 100% on its return value to determine whether or not the channel is closed, and it is further confirmed in the following comment[3]:
// Unsound_IsClosed and Unsound_NumQueuedMessages are safe to call from any
// thread, but they make no guarantees about whether you'll get an
// up-to-date value; the values are written on one thread and read without
// locking, on potentially different threads. Thus you should only use
// them when you don't particularly care about getting a recent value (e.g.
// in a memory report).
It seems like we shouldn't crash the application based on a possibly out-of-date value. Even if there is a true issue here and the message channel has indeed not been shut down yet, we should check this based on a guaranteed up-to-date value to get an accurate measure of how prevalent this problem is.
Bill, I see that this was added in bug 1119878. Do you agree with the above and do you happen to know of a way to check whether the message channel is closed in a guaranteed up-to-date way?
[1] https://hg.mozilla.org/releases/mozilla-beta/annotate/73ef186ad51a/ipc/glue/MessageChannel.cpp#l704
[2] https://hg.mozilla.org/releases/mozilla-beta/annotate/73ef186ad51a/ipc/glue/MessageChannel.cpp#l701
[3] https://hg.mozilla.org/releases/mozilla-beta/annotate/73ef186ad51a/ipc/glue/MessageChannel.h#l310
Flags: needinfo?(bill.mccloskey)
Comment 19•7 years ago
I think in this case it's safe to use Unsound_IsClosed. The reason Unsound_IsClosed is marked "unsound" is that it reads state that's written on the I/O thread. However, assuming that someone called MessageChannel::Close() on the main thread already (which is what this assertion is trying to check), we should have already done a round-trip from the main thread to the I/O thread in order to close the channel. That happens like this:
MessageChannel::Close() ->
MessageChannel::SynchronouslyClose() ->
ProcessLink::SendClose() -> ...via PostMessage to I/O thread...
ProcessLink::OnCloseChannel() ->
Transport::Close() [this is the platform-specific version that sets closed_]
Sets MessageChannel::mChannelState to ChannelClosed and notifies CV
SynchronouslyClose also waits on the mChannelState becoming ChannelClosed
So the whole thing is race-free *assuming* that you've already called MessageChannel::Close() on the main thread. But that's what we're trying to assert anyway.
As to why this crash is happening, I'm afraid I don't know.
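The round trip described above can be sketched as a sequential toy model. This is a hypothetical simplification: `ToyChannel` and its members are illustrative names, and the real code actually hops from the main thread to the I/O thread and blocks on a condition variable, whereas here the "round trip" is inlined so the ordering guarantee is easy to follow.

```cpp
#include <cassert>

// Sequential toy model of MessageChannel::Close(). In the real code,
// SendClose() is posted to the I/O thread and SynchronouslyClose() waits
// until mChannelState becomes ChannelClosed.
enum class ChannelState { Connected, Closing, Closed };

class ToyChannel {
public:
  // "Main thread": MessageChannel::Close() -> SynchronouslyClose().
  void Close() {
    mState = ChannelState::Closing;
    // ProcessLink::SendClose(): in reality a hop to the I/O thread while the
    // main thread waits on a condition variable; here we just run the handler.
    OnCloseChannel();
    // SynchronouslyClose() returns only once the state is ChannelClosed, so
    // the main thread has fully synchronized with the I/O thread here.
    assert(mState == ChannelState::Closed);
  }

  // Safe to call after Close() has returned: the waiting round trip above
  // already made the cross-thread write visible, which is why the "unsound"
  // read is sound in the destructor's assertion.
  bool Unsound_IsClosed() const { return mState == ChannelState::Closed; }

private:
  // "I/O thread": ProcessLink::OnCloseChannel() -> Transport::Close().
  void OnCloseChannel() { mState = ChannelState::Closed; }

  ChannelState mState = ChannelState::Connected;
};
```

The point of the sketch is the ordering: by the time `Close()` returns, the state transition has already happened, so the later unlocked read cannot be stale.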
Flags: needinfo?(bill.mccloskey)
Assignee
Comment 20•7 years ago
(In reply to Bill McCloskey [inactive unless it's an emergency] (:billm) from comment #19)
> (...)
>
> So the whole thing is race-free *assuming* that you've already called
> MessageChannel::Close() on the main thread. But that's what we're trying to
> assert anyway.
>
> As to why this crash is happening, I'm afraid I don't know.
Thank you for the clarifications! In this case we might instead be dealing with a channel error, timeout or similar. Unsound_IsClosed() checks if the channel is in the ChannelClosed state. But looking at MessageChannel::Close(), there are situations that could result in a different channel state.
I'm suggesting that we land this diagnostics patch to figure out what state the channel is in. This will show whether we crash due to one or several different states. We can then work backwards and figure out what might be causing the channel to be in this state when it is expected to be closed.
Assignee: nobody → spohl.mozilla.bugs
Status: NEW → ASSIGNED
Attachment #8954788 -
Flags: review?(jmathies)
Comment 21•7 years ago
[Tracking Requested - why for this release]:
status-firefox60:
--- → affected
tracking-firefox60:
--- → ?
Comment 22•7 years ago
Comment on attachment 8954788 [details] [diff] [review]
Diagnostics patch for mChannelState
Review of attachment 8954788 [details] [diff] [review]:
-----------------------------------------------------------------
lgtm. any bets on which we'll end up with?
Attachment #8954788 -
Flags: review?(jmathies) → review+
Assignee
Comment 23•7 years ago
(In reply to Jim Mathies [:jimm] from comment #22)
> Comment on attachment 8954788 [details] [diff] [review]
> Diagnostics patch for mChannelState
>
> Review of attachment 8954788 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> lgtm. any bets on which we'll end up with?
I believe it's most likely that we'll see ChannelError, but no matter what the state will be we should be able to add more diagnostics once we know what we're dealing with.
Assignee
Comment 24•7 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/b05965477a5adc39e0d64f61d13c278e8cf9989a
Bug 1423261: Diagnostics patch to obtain more info about the IPC channel state when we expect it to be closed. r=jimm
Assignee
Updated•7 years ago
Keywords: leave-open
Comment 25•7 years ago
bugherder
Assignee
Comment 26•7 years ago
Crash stats[1] just displayed the first crash report[2] after the diagnostics patch landed; the channel state is ChannelConnected. I will keep an eye on further reports.
[1] https://crash-stats.mozilla.com/signature/?product=Firefox&version=60.0a1&signature=mozilla%3A%3Aipc%3A%3AMessageChannel%3A%3AClear%20%7C%20mozilla%3A%3Aipc%3A%3AMessageChannel%3A%3A~MessageChannel%20%7C%20mozilla%3A%3Adom%3A%3APContentParent%3A%3A~PContentParent&date=%3E%3D2018-02-23T10%3A45%3A30.000Z&date=%3C2018-03-02T10%3A45%3A30.000Z&_columns=date&_columns=product&_columns=version&_columns=build_id&_columns=platform&_columns=reason&_columns=address&_columns=install_time&_sort=-date&page=1#reports
[2] https://crash-stats.mozilla.com/report/index/377cbbd2-30ec-46f4-b59a-3ca030180302
Updated•7 years ago
Assignee
Comment 28•7 years ago
We discovered that the crash reports in Socorro did not include the protocol name as an annotation, even though it was added by calling CrashReporter::AnnotateCrashReport just before MOZ_CRASH. I've now discovered that Socorro appears to filter annotations based on a schema[1]. `ProtocolName` was not in the schema. However, `IPCFatalErrorProtocol` is and I have confirmed that crash reports for other crashes successfully submit the protocol name under `IPCFatalErrorProtocol`.
Crash reports after this patch should tell us which protocol(s) is/are causing problems here.
[1] https://github.com/mozilla-services/socorro/blob/master/socorro/schemas/crash_report.json
Attachment #8958019 -
Flags: review?(jmathies)
Comment 29•7 years ago
(In reply to Stephen A Pohl [:spohl] from comment #28)
> Created attachment 8958019 [details] [diff] [review]
> Fix the way we annotate crash reports
>
> We discovered that the crash reports in Socorro did not include the protocol
> name as an annotation, even though it was added by calling
> CrashReporter::AnnotateCrashReport just before MOZ_CRASH. I've now
> discovered that Socorro appears to filter annotations based on a schema[1].
> `ProtocolName` was not in the schema. However, `IPCFatalErrorProtocol` is
> and I have confirmed that crash reports for other crashes successfully
> submit the protocol name under `IPCFatalErrorProtocol`.
>
> Crash reports after this patch should tell us which protocol(s) is/are
> causing problems here.
>
> [1]
> https://github.com/mozilla-services/socorro/blob/master/socorro/schemas/
> crash_report.json
We set these other values here -
https://searchfox.org/mozilla-central/source/ipc/glue/ProtocolUtils.cpp#291
I think the issue is that ProtocolName isn't getting included in super search queries. We have to file a bug on getting that added.
Comment 30•7 years ago
Bug 1257986 is similar.
Updated•7 years ago
Attachment #8958019 -
Flags: review?(jmathies)
Assignee
Comment 31•7 years ago
(In reply to Jim Mathies [:jimm] from comment #29)
> I think the issue is that ProtocolName isn't getting included in super
> search queries. We have to file a bug on getting that added.
There are several reasons I thought changing ProtocolName to IPCFatalErrorProtocol would be preferable over adding ProtocolName to super search:
1. Both "tags" (not sure if this is the right terminology) refer to the same thing: the name of the protocol in question during a fatal IPC error.
2. IPCFatalErrorProtocol has a specific name that more accurately describes that the reason why it was added to the crash report is because there was a fatal error with this particular protocol.
3. ProtocolName is a very generic name and even if we went the route of adding it to super search, I would suggest that we change it before adding it to super search to make it more specific.
4. If we were to change ProtocolName to a more specific name, we would end up with something that sounds a lot like IPCFatalErrorProtocol.
5. The current use of IPCFatalErrorProtocol that you mention is for the same purpose as here: adding the problematic protocol name to the crash report.
6. There are enough new crash reports coming in that we don't need to go back through existing crash reports and add ProtocolName to super search to get to the important data. Looking at IPCFatalErrorProtocol going forward would be sufficient.
Unless there is an implicit reason that I'm not aware of that says that a specific crash annotation tag (IPCFatalErrorProtocol) cannot be used in two distinct places in our code base, I believe we would have a more accurate tag name for this crash annotation going forward and we would see results come in faster by switching to IPCFatalErrorProtocol than by adding ProtocolName to super search.
Flags: needinfo?(jmathies)
Updated•7 years ago
Flags: needinfo?(jmathies)
Attachment #8958019 -
Flags: review+
Assignee
Comment 32•7 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/4ea7d945741c1ffb4db49a4d6c0f61ec973eafce
Bug 1423261: Submit IPC protocol names in shutdown crash reports in a way that Socorro is able to display. r=jimm
Comment 33•7 years ago
bugherder
Comment 34•7 years ago
Assignee
Comment 35•7 years ago
At a high level, shutdown works like this:
1. Parent tells child to shut down and sets mShutdownPending to true.
2. Child goes through shutdown sequence. At the end of the sequence, the child sends a FinishShutdown message to the parent.
3. The parent receives the FinishShutdown message, calls Close() on the message channel and completes shutdown.
This patch adds a crash annotation message to our crash reports for nightly builds. This will change the crash signature a bit because we will trigger this crash inside ContentParent, but it will allow us to gather more information about the state of the ContentParent. My best guess at the moment is that the ContentParent is in the `shutdown pending` state and fails to receive a FinishShutdown message from the child in a timely manner. Assuming that this is the case, we may be able to do away with this roundtrip during shutdown. The parent and the child both have ForceKillTimers and we may be able to rely on this timer to shut down the child should it not terminate in a timely manner. But we should confirm that we're in the `shutdown pending` state first.
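The three-step handshake above can be sketched as a toy model. `ToyParent`, `ToyChild`, and the string messages are illustrative stand-ins, not the real IPDL-generated interfaces; the sketch only shows the happy path in which FinishShutdown arrives before the ContentParent is destroyed.

```cpp
#include <cassert>
#include <string>

// Toy model of the parent/child shutdown handshake described above.
struct ToyParent {
  bool mShutdownPending = false;
  bool mChannelClosed = false;

  // Step 1: parent asks the child to shut down.
  std::string RequestShutdown() {
    mShutdownPending = true;
    return "Shutdown";  // message sent over the channel
  }

  // Step 3: parent receives FinishShutdown and closes the channel.
  void RecvFinishShutdown() { mChannelClosed = true; }

  // The invariant behind MOZ_CRASH(MessageChannel destroyed without being
  // closed): the channel must be closed before the ContentParent dies.
  bool SafeToDestroy() const { return mChannelClosed; }
};

struct ToyChild {
  // Step 2: child runs its shutdown sequence, then replies to the parent.
  std::string RecvShutdown(const std::string& aMsg) {
    assert(aMsg == "Shutdown");
    // ... child-side shutdown work would happen here ...
    return "FinishShutdown";
  }
};
```

The crash in this bug corresponds to destroying `ToyParent` while `SafeToDestroy()` is still false, i.e. step 3 never completed.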
Attachment #8959578 -
Flags: review?(jmathies)
Assignee
Comment 36•7 years ago
Uploaded the wrong patch. This is it instead.
Attachment #8959578 -
Attachment is obsolete: true
Attachment #8959578 -
Flags: review?(jmathies)
Attachment #8959581 -
Flags: review?(jmathies)
Updated•7 years ago
status-firefox61:
--- → affected
Comment 37•7 years ago
Comment on attachment 8959581 [details] [diff] [review]
Add crash message annotation
Review of attachment 8959581 [details] [diff] [review]:
-----------------------------------------------------------------
lgtm!
Attachment #8959581 -
Flags: review?(jmathies) → review+
Assignee
Comment 38•7 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/58ddcf890331b9ddbc81b9bf4e8e3e04f551996e
Bug 1423261: Add crash message annotations to our crash reports to help investigate shutdown crashes due to open message channels when they are expected to be closed. Nightly only. r=jimm
Comment 39•7 years ago
bugherder
Assignee
Comment 40•7 years ago
The first crash report with the latest diagnostics patch has just come in:
https://crash-stats.mozilla.com/report/index/63545839-471f-4910-ae37-cb34a0180320
The IPC Fatal Error Message is:
queued-ipc-messages/content-parent(Browser, pid=-1, open channel, 0x1fd186b8, refcnt=1, numQueuedMessages=140, remoteType=web, mCalledClose=false, mCalledKillHard=true, mShutdownPending=true, mIPCOpen=true)
This seems to confirm my previous suspicion that the parent is in the shutdown pending state, but fails to receive confirmation from the child that it has shut down in a timely manner. Also notice that KillHard has been called on the parent, which could be the reason why we see this crash signature.
Next, I will explore if the parent can request child shutdowns without waiting for those shutdowns to complete.
Assignee
Updated•7 years ago
Crash Signature: [@ mozilla::ipc::MessageChannel::Clear | mozilla::ipc::MessageChannel::~MessageChannel | mozilla::dom::PContentParent::~PContentParent] → [@ mozilla::ipc::MessageChannel::Clear | mozilla::ipc::MessageChannel::~MessageChannel | mozilla::dom::PContentParent::~PContentParent]
[@ mozilla::dom::ContentParent::~ContentParent]
Assignee
Comment 42•7 years ago
There may be a simple explanation why we might be getting into this state: when a shutdown message is received within a nested event loop by the child, we delay shutdown by an arbitrary 100ms in the hopes that the event loop will have finished by then. This is to prevent premature termination of "unload" or "pagehide" handlers, for example. However, there was no limit to how many times shutdown could be delayed by 100ms. If shutdown was delayed too long, the ForceKillTimer would trigger in the parent, resulting in this crash here.
This patch adds a limit to the number of times that a child can delay shutdown. The limit is equal to half the ForceKillTimer timeout to allow sufficient time for the child to send the FinishShutdown message back to the parent.
This patch should help us verify this theory, and if confirmed, fix the crash.
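The bounded-delay idea can be sketched as follows. The names and the budget arithmetic are hypothetical simplifications of the patch: the real code re-posts `ContentChild::ShutdownInternal` with a 100ms delay and derives its budget from the `dom.ipc.tabs.shutdownTimeoutSecs` pref.

```cpp
#include <cstdint>

// Each retry while a nested event loop is running burns 100ms of budget.
constexpr int32_t kDelayMs = 100;

struct ShutdownScheduler {
  int32_t mBudgetMs;               // e.g. half the ForceKillTimer timeout
  bool mInNestedEventLoop = true;  // set while "unload"/"pagehide" etc. run

  // Returns true once the child should actually proceed with shutdown.
  bool TryShutdown() {
    if (mInNestedEventLoop && mBudgetMs >= kDelayMs) {
      mBudgetMs -= kDelayMs;  // delay another 100ms and try again later
      return false;
    }
    // Budget exhausted (or the nested loop finished): proceed with shutdown
    // even if still nested, leaving the parent enough headroom to receive
    // the FinishShutdown message before its ForceKillTimer fires.
    return true;
  }
};
```

With a 250ms budget, for example, the scheduler allows exactly two 100ms delays before forcing shutdown on the third attempt.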
Attachment #8960940 -
Flags: review?(jmathies)
Assignee
Updated•7 years ago
Priority: P2 → P1
Comment 43•7 years ago
Comment on attachment 8960940 [details] [diff] [review]
Patch - set shutdown timeout in child processes
Review of attachment 8960940 [details] [diff] [review]:
-----------------------------------------------------------------
::: dom/ipc/ContentChild.cpp
@@ +3025,5 @@
> NewRunnableMethod(
> "dom::ContentChild::RecvShutdown", this,
> &ContentChild::ShutdownInternal),
> + delay);
> + mShutdownTimeout -= delay;
Before these changes, we might get caught up in this forever, right? With these changes, and with a default value of 5 for dom.ipc.tabs.shutdownTimeoutSecs, we'll wait up to 2.5 seconds. What's the KillHard timeout? 5 seconds?
::: dom/ipc/ContentChild.h
@@ +830,5 @@
> // NOTE: This member is atomic because it can be accessed from off-main-thread.
> mozilla::Atomic<uint32_t> mPendingInputEvents;
> #endif
>
> + int32_t mShutdownTimeout;
nit - lets move this up near this line -
mozilla::Atomic<bool> mShuttingDown
Attachment #8960940 -
Flags: review?(jmathies) → review+
Assignee
Comment 44•7 years ago
Thank you! Addressed feedback, carrying over r+.
(In reply to Jim Mathies [:jimm] from comment #43)
> Comment on attachment 8960940 [details] [diff] [review]
> Patch - set shutdown timeout in child processes
>
> Review of attachment 8960940 [details] [diff] [review]:
> -----------------------------------------------------------------
>
> ::: dom/ipc/ContentChild.cpp
> @@ +3025,5 @@
> > NewRunnableMethod(
> > "dom::ContentChild::RecvShutdown", this,
> > &ContentChild::ShutdownInternal),
> > + delay);
> > + mShutdownTimeout -= delay;
>
> Before these changes, we might get caught up in this forever right?
That's right, which may result in the parent killing the child via KillHard(). The crash in this bug occurs because we notice that the channel hasn't been closed when the ContentParent gets destroyed. KillHard() does not attempt to close the channel, so it seems likely that this is the reason for these crashes.
> With these changes and with a default value of 5 for
> dom.ipc.tabs.shutdownTimeoutSecs we'll wait up to 2.5 seconds. What's the
> KillHard timeout? 5 seconds?
Right, the code doesn't make this particularly clear but dom.ipc.tabs.shutdownTimeoutSecs actually *is* the KillHard timeout.
Attachment #8960940 -
Attachment is obsolete: true
Attachment #8961206 -
Flags: review+
Assignee
Comment 45•7 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/ff30955a00d2f6cbba9288b3ba2b7b749a577628
Bug 1423261: Don't allow child processes to delay shutdown endlessly when a shutdown message is received from within a nested event loop. r=jimm
Comment 46•7 years ago
bugherder
Comment hidden (offtopic)
Updated•7 years ago
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 48•7 years ago
mozilla::ipc::MessageChannel::Clear | mozilla::ipc::MessageChannel::~MessageChannel | mozilla::dom::PContentParent::~PContentParent
mozilla::dom::ContentParent::~ContentParent
Top Crashers for Firefox 61.0a1
26 0.64% 0.03% mozilla::dom::ContentParent::~ContentParent 33 33 0 0 32 0 2016-02-24
Top Crashers for Firefox 60.0b
6 3.28% 0.15% mozilla::ipc::MessageChannel::Clear | mozilla::ipc::MessageChannel::~MessageChannel | mozilla::dom::PContentParent::~PContentParent 1181 1181 0 0 1171 0 2017-05-17
Top Crashers for Firefox 59.0.2
4 2.11% 1.08% mozilla::ipc::MessageChannel::Clear | mozilla::ipc::MessageChannel::~MessageChannel | mozilla::dom::PContentParent::~PContentParent 2543 2543 0 0 2456 0 2017-05-17
Top Crashers for Firefox 58.0.2
3 2.33% -0.38% mozilla::ipc::MessageChannel::Clear | mozilla::ipc::MessageChannel::~MessageChannel | mozilla::dom::PContentParent::~PContentParent 312 312 0 0 315 0 2017-05-17
Keywords: nightly-community, topcrash
Updated•7 years ago
Updated•7 years ago
Assignee
Comment 49•7 years ago
Assignee
Comment 50•7 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/abb2fafe775b3e58cdff89164a229883755a08df
Bug 1423261: Revert changeset ff30955a00d2 since it is ineffective. r=me
Assignee
Comment 51•7 years ago
Comment on attachment 8961206 [details] [diff] [review]
Patch - set shutdown timeout in child processes
This has been backed out (see comment 50). After further digging it turns out that the shutdown delay due to nested event loops was introduced to help with content process shutdown hangs. Adding a timeout to this delay simply gets us back to the state before we had an explicit shutdown delay due to nested event loops, i.e. a shutdown hang.
Attachment #8961206 -
Attachment is obsolete: true
Assignee
Comment 52•7 years ago
This patch should fix the crashes reported here by closing the IPC channel during KillHard shutdowns. We are already submitting both parent- and content-process crash dumps during KillHard shutdowns and will therefore continue to have the same data to investigate content process shutdown hangs.
Attachment #8966345 -
Flags: review?(jmathies)
Updated•7 years ago
Attachment #8966345 -
Flags: review?(jmathies) → review+
Assignee
Comment 53•7 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/f45d6ae3fdb194a07ef5eba9f9a760f28f564b64
Bug 1423261: Close the IPC channel during KillHard shutdowns to prevent shutdown crashes in the parent process. r=jimm
Comment 54•7 years ago
bugherder
Comment 55•7 years ago
bugherder
Assignee
Comment 56•7 years ago
(In reply to Natalia Csoregi [:nataliaCs] from comment #55)
> https://hg.mozilla.org/mozilla-central/rev/f45d6ae3fdb1
This causes bug 1453252. The fix may be as simple as restricting the Close() call to the parent side, but I will need to investigate further when I get back next week. Will back out shortly.
Comment 57•7 years ago
Backed out changeset f45d6ae3fdb1 (bug 1423261) for causing bug 1453252.
Flags: needinfo?(spohl.mozilla.bugs)
Comment 58•7 years ago
Backout by shindli@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/88f297d206a6
Backed out changeset f45d6ae3fdb1 for causing bug 1453252. a=backout
Assignee
Comment 59•6 years ago
The previous patch runs into issues when the IPC channel gets properly closed before cycle collection runs during a KillHard shutdown. So we've learned that the IPC channel may be in either connected or closed state at the time of cycle collection during a KillHard shutdown.
Since we now understand that the parent may still have a connected IPC channel during a KillHard shutdown, it makes sense to skip the intentional crash reported in this bug when KillHard has been called. We continue to collect crash dumps for hung content processes as before.
Attachment #8966345 -
Attachment is obsolete: true
Flags: needinfo?(spohl.mozilla.bugs)
Attachment #8968272 -
Flags: review?(jmathies)
Updated•6 years ago
Attachment #8968272 -
Flags: review?(jmathies) → review+
Assignee
Comment 60•6 years ago
I was a bit too quick in posting my previous patch. The patch applied to code that we added for diagnostics (attachment 8959581 [details] [diff] [review]) and which we should revert. I'm posting three patches instead, of which this is the first one:
1. Revert changeset 58ddcf890331 (attachment 8959581 [details] [diff] [review]) to get back to the previous crash signature.
2. Add a new field (mInKillHardShutdown) to MessageChannel to be able to tell when we're in a KillHard shutdown during cycle collection.
3. Restrict patch 2 above to Nightly only. This makes it easy to turn the fix on for all branches by backing this patch out.
Assignee
Comment 61•6 years ago
This reverts changeset 58ddcf890331 (attachment 8959581 [details] [diff] [review]) which was added for diagnostic purposes. Marking as r+ since this is a straight backout.
Attachment #8968272 -
Attachment is obsolete: true
Attachment #8968590 -
Flags: review+
Assignee
Comment 62•6 years ago
This adds a new field (mInKillHardShutdown) to MessageChannel to be able to tell when we're in a KillHard shutdown. This allows us to skip the intentional crash (this bug) during cycle collection.
Attachment #8968591 -
Flags: review?(jmathies)
Assignee
Comment 63•6 years ago
This restricts the fix to nightly only for now. A simple backout of this patch will allow us to turn the fix on for all branches.
Attachment #8968592 -
Flags: review?(jmathies)
Updated•6 years ago
Attachment #8968591 -
Flags: review?(jmathies) → review+
Updated•6 years ago
Attachment #8968592 -
Flags: review?(jmathies) → review+
Assignee
Comment 64•6 years ago
Assignee
Comment 65•6 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/d86b5b69f86745f0d9f505f633f48d1bf2ab45e1
Bug 1423261: Backout diagnostics changeset 58ddcf890331. r=backout
https://hg.mozilla.org/integration/mozilla-inbound/rev/c8842b205236ab4e73ffa1892c00fe379bfd9efd
Bug 1423261: Skip intentionally crashing the browser during KillHard shutdowns. r=jimm
https://hg.mozilla.org/integration/mozilla-inbound/rev/6385170a411128e87a2bc180202cedd7594efbe7
Bug 1423261: Restrict patch to nightly only for now. r=jimm
Comment 66•6 years ago
bugherder
Assignee
Comment 68•6 years ago
This has fallen 23 spots in the nightly top crash list and there are no reported crashes after the 20180417225505 build. This looks ready to be turned on for all branches.
Assignee
Comment 69•6 years ago
https://hg.mozilla.org/integration/mozilla-inbound/rev/aa51d5460d2815b8794ba244dcf279632c8fce22
Backout changeset c8842b205236 to turn on the fix for bug 1423261 on all branches. a=backout
Assignee
Updated•6 years ago
Attachment #8938319 -
Attachment is obsolete: true
Assignee
Updated•6 years ago
Attachment #8959581 -
Attachment is obsolete: true
Assignee
Updated•6 years ago
Attachment #8968590 -
Attachment is obsolete: true
Assignee
Updated•6 years ago
Attachment #8968592 -
Attachment is obsolete: true
Assignee
Updated•6 years ago
Keywords: leave-open
Assignee
Comment 70•6 years ago
I suggest we let this ride the train since this is a shutdown crash and doesn't have a real impact on users, other than seeing a crash dialog.
Comment 71•6 years ago
Sounds good to me, thanks Stephen.
Reporter
Updated•6 years ago
Status: REOPENED → RESOLVED
Closed: 7 years ago → 6 years ago
Resolution: --- → FIXED
Reporter
Updated•6 years ago
Target Milestone: --- → mozilla61
Comment 72•6 years ago
Stephen: I see a few Mac signatures in crash stats, some 61/62: https://bit.ly/2rIEkwT. Is this another manifestation of the crash? Crash reason: MOZ_CRASH(MessageChannel destroyed without being closed (mChannelState == ChannelConnected).)
Flags: needinfo?(spohl.mozilla.bugs)
Assignee
Comment 73•6 years ago
(In reply to Marcia Knous [:marcia - needinfo? me] from comment #72)
> Stephen: I see a few Mac signatures in crash stats, some 61/62:
> https://bit.ly/2rIEkwT. Is this another manifestation of the crash? Crash
> reason: MOZ_CRASH(MessageChannel destroyed without being closed
> (mChannelState == ChannelConnected).)
It is possible that this is the child side of the same issue, i.e. we're in a KillHard shutdown when the child attempts to destroy the MessageChannel. This would need its own investigation and fix however, so it would be best to track this in a separate bug.
Flags: needinfo?(spohl.mozilla.bugs)
Reporter
Comment 74•6 years ago
would this be an issue to consider fixing in 60esr?
status-firefox-esr60: --- → affected
Flags: needinfo?(ryanvm)
Comment 75•6 years ago
I'll redirect to Stephen on that.
Flags: needinfo?(ryanvm) → needinfo?(spohl.mozilla.bugs)
Assignee
Comment 76•6 years ago
Comment on attachment 8968591 [details] [diff] [review]
Fix: Skip intentionally crashing the browser when KillHard has been called
Yes, it seems reasonable to consider this for ESR60.
[Approval Request Comment]
If this is not a sec:{high,crit} bug, please state case for ESR consideration: Avoids showing crash dialogs when we terminate browser processes during excessively long shutdowns.
User impact if declined: If the browser takes too long to shut down we terminate the browser processes. Without this fix, the user is presented with a crash dialog.
Fix Landed on Version: 61
Risk to taking this patch (and alternatives if risky): Has been riding the trains with no fallout.
String or UUID changes made by this patch: none
See https://wiki.mozilla.org/Release_Management/ESR_Landing_Process for more info.
Flags: needinfo?(spohl.mozilla.bugs)
Attachment #8968591 -
Flags: approval-mozilla-esr60?
Comment 77•6 years ago
Comment on attachment 8968591 [details] [diff] [review]
Fix: Skip intentionally crashing the browser when KillHard has been called
Fixes a topcrash on ESR60 and has been shipping with Fx61 without any known problems. Approved for ESR 60.2.
Attachment #8968591 -
Flags: approval-mozilla-esr60? → approval-mozilla-esr60+
Comment 78•6 years ago
bugherder uplift