Closed Bug 1215265 Opened 9 years ago Closed 8 years ago

CompositorChild leaks in e10s content process

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla48

Tracking Flags:

Tracking

Status

e10s

---

firefox46

---

wontfix

firefox48

---

fixed

People

(Reporter: mccr8, Assigned: nical)

References

(Blocks 3 open bugs)

Details

(Whiteboard: [gfx-noted])

Attachments

(5 files, 7 obsolete files)

MozReview Request: Bug 1215265. Don't leak CompositorChild in content processes. r=nical 9 years ago Robert O'Callahan (:roc) (email my personal email if necessary) (deleted), text/x-review-board-request	nical : review+	Details
MozReview Request: Bug 1215265. Add some missing MOZ_COUNT_CTOR/DTORs. r=nical 9 years ago Robert O'Callahan (:roc) (email my personal email if necessary) (deleted), text/x-review-board-request	nical : review+	Details
MozReview Request: Bug 1215265. Shut down ImageBridgeChild properly. r=nical 9 years ago Robert O'Callahan (:roc) (email my personal email if necessary) (deleted), text/x-review-board-request	nical : review+	Details
MozReview Request: Bug 1215265. Make CrossProcessCompositorParent::ActorDestroy release mCompositorThreadHolder and mSelfRef. r=nical 9 years ago Robert O'Callahan (:roc) (email my personal email if necessary) (deleted), text/x-review-board-request	nical : review+	Details
MozReview Request: Bug 1215265. Make sCompositor clearly a strong reference and release it in DeferredDestroyCompositor. r=nical 9 years ago Robert O'Callahan (:roc) (email my personal email if necessary) (deleted), text/x-review-board-request	nical : review+	Details
MozReview Request: Bug 1215265. Make it safe to call CrossProcessCompositorParent::Destroy more than once. r=nical 9 years ago Robert O'Callahan (:roc) (email my personal email if necessary) (deleted), text/x-review-board-request	nical : review+	Details
patch 9 years ago Robert O'Callahan (:roc) (email my personal email if necessary) (deleted), patch		Details \| Diff \| Splinter Review
Shut ImageBridge down properly 8 years ago Nicolas Silva [:nical] (deleted), patch	sotaro : review+	Details \| Diff \| Splinter Review
do a cycle collection before shutting down gfx 8 years ago Nicolas Silva [:nical] (deleted), patch	mccr8 : review-	Details \| Diff \| Splinter Review
Shut ShadowLayers down properly 8 years ago Nicolas Silva [:nical] (deleted), patch	sotaro : review+	Details \| Diff \| Splinter Review
Add a pref to enable shutting child processes down and disable it on windows. 8 years ago Nicolas Silva [:nical] (deleted), patch	sotaro : review+	Details \| Diff \| Splinter Review
Shut dhadowlayers down properly (adendum) 8 years ago Nicolas Silva [:nical] (deleted), patch	sotaro : review+	Details \| Diff \| Splinter Review

Andrew McCreight [:mccr8]

Reporter

Description

•

9 years ago

The previous bug for this (bug 1065536) ended up as kind of a mess, so I'm filing a new bug for this instead. CompositorChild does not properly shutdown in the content process, which results in leaks. There is only one per process, and it is expected to live as long as the process itself, so this is not an issue from a memory usage perspective, but it does make it harder to find other leaks.

Andrew McCreight [:mccr8]

Reporter

Comment 1

•

9 years ago

This class also leaks a Transport, because it does not have the DeleteTask boilerplate in its dtor. (As in bug 1213407.)

Brad Lassey [:blassey] (use needinfo?)

Updated

•

9 years ago

tracking-e10s: --- → +

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 2

•

9 years ago

I'm looking into this with rr. The leaked reference is in CompositorChild::sCompositor. CompositorChild::ActorDestroy is not called before the process shuts down.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 3

•

9 years ago

For chrome processes, nsBaseWidget::Shutdown calls nsBaseWidget::DestroyCompositor to destroy the CompositorChild. For content processes, the PuppetWidget doesn't have a mCompositorChild. It's a global singleton; I don't know what's supposed to be responsible for cleaning it up on Shutdown. nical, do you?

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Assignee

Comment 4

•

9 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #3)
> For chrome processes, nsBaseWidget::Shutdown calls
> nsBaseWidget::DestroyCompositor to destroy the CompositorChild. For content
> processes, the PuppetWidget doesn't have a mCompositorChild. It's a global
> singleton; I don't know what's supposed to be responsible for cleaning it up
> on Shutdown. nical, do you?

Hm, I assumed the PuppetWidget would handle this since it inherits nsBaseWidget's code that does the work, but mCompositorChild is indeed null.
It's not quite clear to me whether CompositorChild was originally intended to be a singleton or if it was hacked into being one.
If we never want to have several CompositorChilds in the same process, I'd be game for making it REALLY a singleton, and take its destruction out of nsBaseWidget to put it in gfxPlatform::ShutdownLayersIPC along with the other protocols. Otherwise we could make sure PuppetWidget has its mCompopositorChild member properly set like other widgets but then let's remove the singleton in the process!
Considering the current state of the code, I have a slight preference for the first approach.

Flags: needinfo?(nical.bugzilla)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 5

•

9 years ago

(In reply to Nicolas Silva [:nical] from comment #4)
> Hm, I assumed the PuppetWidget would handle this since it inherits
> nsBaseWidget's code that does the work, but mCompositorChild is indeed null.
> It's not quite clear to me whether CompositorChild was originally intended
> to be a singleton or if it was hacked into being one.
> If we never want to have several CompositorChilds in the same process, I'd
> be game for making it REALLY a singleton, and take its destruction out of
> nsBaseWidget to put it in gfxPlatform::ShutdownLayersIPC along with the
> other protocols. Otherwise we could make sure PuppetWidget has its
> mCompopositorChild member properly set like other widgets but then let's
> remove the singleton in the process!
> Considering the current state of the code, I have a slight preference for
> the first approach.

The current situation is that in chrome processes there is one CompositorChild per window, but in content processes there is only one CompositorChild. Assuming we keep that, destroying the CompositorChild gfxPlatform::ShutdownLayersIPC looks like a good option.

Nicolas Silva [:nical]

Assignee

Comment 6

•

9 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #5)
> The current situation is that in chrome processes there is one
> CompositorChild per window,

I think you meant CompositorParent? My understanding is that the chrome process also has 1 and only CompositorChild but I may be wrong.

> but in content processes there is only one
> CompositorChild. Assuming we keep that, destroying the CompositorChild
> gfxPlatform::ShutdownLayersIPC looks like a good option.

Ok

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 7

•

9 years ago

(In reply to Nicolas Silva [:nical] from comment #6)
> (In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #5)
> > The current situation is that in chrome processes there is one
> > CompositorChild per window,
> 
> I think you meant CompositorParent? My understanding is that the chrome
> process also has 1 and only CompositorChild but I may be wrong.

I think that's incorrect. I think in the chrome process there's one CompositorParent and one CompositorChild per window, and one CrossProcessCompositorParent per content process. Each content process gets one CompositorChild corresponding to its CrossProcessCompositorParent.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 8

•

9 years ago

Attached file MozReview Request: Bug 1215265. Don't leak CompositorChild in content processes. r=nical (obsolete) (deleted) — Details

Bug 1215265. Don't leak CompositorChild in content processes. r=nical

This makes the shutdown sequence of CrossProcessCompositorParent
work very much like the shutdown sequence of CompositorParent:
the child does SendStop(); the parent's RecvStop does some cleanup
and then queues a DeferredDestroy task; DeferredDestroy releases
the parent, destroying it; IPDL sends a message to destroy the child.

Attachment #8688935 - Flags: review?(nical.bugzilla)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 9

•

9 years ago

Attached file MozReview Request: Bug 1215265. Add some missing MOZ_COUNT_CTOR/DTORs. r=nical (obsolete) (deleted) — Details

Bug 1215265. Add some missing MOZ_COUNT_CTOR/DTORs. r=nical

Attachment #8688936 - Flags: review?(nical.bugzilla)

Andrew McCreight [:mccr8]

Reporter

Comment 10

•

9 years ago

Please also remove the appendExpectedLeakCounts() chunk for bug 1215265 from leaklog.py:
  http://mxr.mozilla.org/mozilla-central/source/testing/mozbase/mozleak/mozleak/leaklog.py#49
(I recently added object-count-based leak thresholds for content process leak checking, because byte-based thresholds let too much through.)

Andrew McCreight [:mccr8]

Reporter

Comment 11

•

9 years ago

Also, thanks for fixing this leak! It makes it harder to notice real leaks.

Nicolas Silva [:nical]

Assignee

Updated

•

9 years ago

Attachment #8688935 - Flags: review?(nical.bugzilla) → review+

Nicolas Silva [:nical]

Assignee

Comment 12

•

9 years ago

Comment on attachment 8688935 [details]
MozReview Request: Bug 1215265. Don't leak CompositorChild in content processes. r=nical

https://reviewboard.mozilla.org/r/25451/#review22977

::: gfx/thebes/gfxPlatform.cpp:751
(Diff revision 1)
> +        CompositorChild::ShutdownLayersIPC();

It would be great if all CompositorChilds (in parent and child process) had the same shutdown logic, but we can look into that as a followup.

Nicolas Silva [:nical]

Assignee

Comment 13

•

9 years ago

Comment on attachment 8688936 [details]
MozReview Request: Bug 1215265. Add some missing MOZ_COUNT_CTOR/DTORs. r=nical

https://reviewboard.mozilla.org/r/25453/#review22975

::: gfx/layers/ipc/ImageBridgeParent.cpp:62
(Diff revision 1)
> +  MOZ_COUNT_CTOR(ImageBridgeParent);

Nit: Aren't IPDL protocols already MOZ_COUNT_CTOR/DTOR'd ? since all PImageBridgeParents are ImageBridgeParents leaks will be reported twice.

Attachment #8688936 - Flags: review?(nical.bugzilla) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 14

•

9 years ago

(In reply to Nicolas Silva [:nical] from comment #12)
> It would be great if all CompositorChilds (in parent and child process) had
> the same shutdown logic, but we can look into that as a followup.

That would require either enforcing one PuppetWidget per process (which we definitely don't want to do) or using one CompositorChild per PuppetWidget (which would require significant changes, e.g. CompositorParent relies on being able to find a LayerTransactionParent given a child process ID).

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 15

•

9 years ago

(In reply to Nicolas Silva [:nical] from comment #13)
> Comment on attachment 8688936 [details]
> MozReview Request: Bug 1215265. Add some missing MOZ_COUNT_CTOR/DTORs.
> r=nical
> 
> https://reviewboard.mozilla.org/r/25453/#review22975
> 
> ::: gfx/layers/ipc/ImageBridgeParent.cpp:62
> (Diff revision 1)
> > +  MOZ_COUNT_CTOR(ImageBridgeParent);
> 
> Nit: Aren't IPDL protocols already MOZ_COUNT_CTOR/DTOR'd ? since all
> PImageBridgeParents are ImageBridgeParents leaks will be reported twice.

True, although we already do that in many cases. But I'll drop the ImageBridge changes.

Nicolas Silva [:nical]

Assignee

Comment 16

•

9 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #14)
> That would require either enforcing one PuppetWidget per process (which we
> definitely don't want to do) or using one CompositorChild per PuppetWidget
> (which would require significant changes, e.g. CompositorParent relies on
> being able to find a LayerTransactionParent given a child process ID).

So IIUC, we currently have several (puppet) widgets which use the same CompositorChild (of which we have exactly one instance per process, child or parent alike). What I have in mind is a static CompositorChild::ShutDown method (or whatever its name) which cleans up the singleton CompsitorChild, or at least all of its IPC components, maybe leaving a refcounted CompositorChild in a state where it can't do anything but won't crash either if some widgets are still alive past that point (which would surprise me because it'd be super late in in the shutdown sequence). But I may be missing something.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 17

•

9 years ago

(In reply to Nicolas Silva [:nical] from comment #16)
> What I have in mind is a static CompositorChild::ShutDown
> method (or whatever its name) which cleans up the singleton CompsitorChild,
> or at least all of its IPC components, maybe leaving a refcounted
> CompositorChild in a state where it can't do anything but won't crash either
> if some widgets are still alive past that point (which would surprise me
> because it'd be super late in in the shutdown sequence). But I may be
> missing something.

How would that be different from my patch here?

Nicolas Silva [:nical]

Assignee

Comment 18

•

9 years ago

(In reply to Robert O'Callahan (:roc) (Mozilla Corporation) from comment #17)
> How would that be different from my patch here?

Not much. In your patch, CompositorChild::ShutdownLayersIPC() is called only in child processes, and it would be nice if instead of having the widget destroy the CompsoitorChild in the parent process, we'd just unconditionally shut down the CompositorChild singleton in both the child and parent processes the same way from XPCOM's shutdown. The work should be the same so i'd rather not have to remember 2 paths to clean CompositorChild up.
I don't mean to block landing your patches, I say this out loud so I'll remember and maybe put a patch up next time I have some time to spare on small cleanups.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 19

•

9 years ago

The problem is that CompositorChild is not a singleton in the parent process, there's one per widget (because there's one CompositorParent were widget). Maybe we should fix that but it's quite a lot of work I think.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 20

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/7b113a6167e94e97645286e7431b98d17bc2610c
Bug 1215265. Don't leak CompositorChild in content processes. r=nical

https://hg.mozilla.org/integration/mozilla-inbound/rev/0cbe4b80d443dd21899a600b7a9e0e8449ccdc4c
Bug 1215265. Add some missing MOZ_COUNT_CTOR/DTORs. r=nical

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 21

•

9 years ago

Backed out for being the likely cause of mass bustage on inbound. If it wasn't these, it was one or more of the other bugs patched in that push:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=162ded2e49c8



remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/564f5491ccdd
remote:   https://hg.mozilla.org/integration/mozilla-inbound/rev/acbac8b86619

Flags: needinfo?(roc)

Andrew McCreight [:mccr8]

Reporter

Updated

•

9 years ago

Blocks: 1227347

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 22

•

9 years ago

Comment on attachment 8688935 [details]
MozReview Request: Bug 1215265. Don't leak CompositorChild in content processes. r=nical

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/25451/diff/1-2/

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 23

•

9 years ago

Comment on attachment 8688936 [details]
MozReview Request: Bug 1215265. Add some missing MOZ_COUNT_CTOR/DTORs. r=nical

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/25453/diff/1-2/

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 24

•

9 years ago

Attached file MozReview Request: Bug 1215265. Shut down ImageBridgeChild properly. r=nical (obsolete) (deleted) — Details

Bug 1215265. Shut down ImageBridgeChild properly. r=nical

Attachment #8691204 - Flags: review?(nical.bugzilla)

Nicolas Silva [:nical]

Assignee

Comment 25

•

9 years ago

Comment on attachment 8691204 [details]
MozReview Request: Bug 1215265. Shut down ImageBridgeChild properly. r=nical

https://reviewboard.mozilla.org/r/26107/#review23467

nice

Attachment #8691204 - Flags: review?(nical.bugzilla) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 28

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=8912d9ae19b1

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 29

•

9 years ago

Attached file MozReview Request: Bug 1215265. Make CrossProcessCompositorParent::ActorDestroy release mCompositorThreadHolder and mSelfRef. r=nical (deleted) — Details

Bug 1215265. Make CrossProcessCompositorParent::ActorDestroy release mCompositorThreadHolder and mSelfRef. r=nical

Attachment #8692657 - Flags: review?(nical.bugzilla)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 30

•

9 years ago

Without this latest patch, the first three patches cause shutdown hangs because a CompositorThreadHolder leaks.

Flags: needinfo?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 31

•

9 years ago

Two problems showed up in the latest try push. We're leaking CompositorThreadHolder, on Windows only. I plan to add some logging to try to track that down.

On B2G, content processes are crashing in CompositorChild::ActorDestroy:
  // Due to poor lifetime management of gralloc (and possibly shmems) we will
  // crash at some point in the future when we get destroyed due to abnormal
  // shutdown. Its better just to crash here. On desktop though, we have a chance
  // of recovering.
  if (aWhy == AbnormalShutdown) {
    NS_RUNTIMEABORT("ActorDestroy by IPC channel failure at CompositorChild");
  }

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 32

•

9 years ago

We reach ActorDestroy(AbnormalShutdown) on desktop too.

The basic problem seems to be that CompositorChild::sCompositor (used only in content processes) is assigned in CompositorChild::Create by forget()ing the reference we created. So there's an implicit 1 refcount held by sCompositor. But the only balancing Release is in CompositorChild::ActorDestroy. So during content process shutdown, we call SendStop to trigger destruction of the CompositorParent and then do CompositorChild::DeferredDestroy, but DeferredDestroy does not cause the CompositorChild to be deleted (since it adds as many refs as it releases). So CompositorChild stays alive until IPC notices that the CompositorParent's channel has been destroyed, at which point we get a channel error and CompositorChild::ActorDestroy with an abnormal reason.

I think that DeferredDestroy should release the reference that ActorDestroy releases.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 33

•

9 years ago

https://treeherder.mozilla.org/#/jobs?repo=try&revision=c4ef7d7827f3

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 34

•

9 years ago

Comment on attachment 8688935 [details]
MozReview Request: Bug 1215265. Don't leak CompositorChild in content processes. r=nical

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/25451/diff/2-3/

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 35

•

9 years ago

Comment on attachment 8688936 [details]
MozReview Request: Bug 1215265. Add some missing MOZ_COUNT_CTOR/DTORs. r=nical

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/25453/diff/2-3/

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 36

•

9 years ago

Comment on attachment 8691204 [details]
MozReview Request: Bug 1215265. Shut down ImageBridgeChild properly. r=nical

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/26107/diff/1-2/

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 37

•

9 years ago

Comment on attachment 8692657 [details]
MozReview Request: Bug 1215265. Make CrossProcessCompositorParent::ActorDestroy release mCompositorThreadHolder and mSelfRef. r=nical

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/26297/diff/1-2/

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 38

•

9 years ago

Attached file MozReview Request: Bug 1215265. Make sCompositor clearly a strong reference and release it in DeferredDestroyCompositor. r=nical (obsolete) (deleted) — Details

Bug 1215265. Make sCompositor clearly a strong reference and release it in DeferredDestroyCompositor. r=nical

Attachment #8692833 - Flags: review?(nical.bugzilla)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 39

•

9 years ago

Attached file MozReview Request: Bug 1215265. Make it safe to call CrossProcessCompositorParent::Destroy more than once. r=nical (obsolete) (deleted) — Details

Bug 1215265. Make it safe to call CrossProcessCompositorParent::Destroy more than once. r=nical

With the new setup, RecvStop gets called and then it's possible for the
CompositorChild to go away, causing ActorDestroy to get called. We need
to make sure it's safe to call Destroy() twice.

Attachment #8692834 - Flags: review?(nical.bugzilla)

Nicolas Silva [:nical]

Assignee

Updated

•

9 years ago

Attachment #8692657 - Flags: review?(nical.bugzilla)

Nicolas Silva [:nical]

Assignee

Comment 40

•

9 years ago

Comment on attachment 8692657 [details]
MozReview Request: Bug 1215265. Make CrossProcessCompositorParent::ActorDestroy release mCompositorThreadHolder and mSelfRef. r=nical

https://reviewboard.mozilla.org/r/26297/#review23777

The patch title says something about mCompositorThreadHolder and mSelfRef but I don't see references to them in the actual patch, is this the complete patch?

Nicolas Silva [:nical]

Assignee

Comment 41

•

9 years ago

Comment on attachment 8692657 [details]
MozReview Request: Bug 1215265. Make CrossProcessCompositorParent::ActorDestroy release mCompositorThreadHolder and mSelfRef. r=nical

https://reviewboard.mozilla.org/r/26297/#review23785

Attachment #8692657 - Flags: review+

Nicolas Silva [:nical]

Assignee

Comment 42

•

9 years ago

Comment on attachment 8692833 [details]
MozReview Request: Bug 1215265. Make sCompositor clearly a strong reference and release it in DeferredDestroyCompositor. r=nical

https://reviewboard.mozilla.org/r/26335/#review23781

Attachment #8692833 - Flags: review?(nical.bugzilla) → review+

Nicolas Silva [:nical]

Assignee

Comment 43

•

9 years ago

Comment on attachment 8692834 [details]
MozReview Request: Bug 1215265. Make it safe to call CrossProcessCompositorParent::Destroy more than once. r=nical

https://reviewboard.mozilla.org/r/26337/#review23783

Attachment #8692834 - Flags: review?(nical.bugzilla) → review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 44

•

9 years ago

https://reviewboard.mozilla.org/r/26297/#review23777

CrossProcessCompositorParent::ActorDestroy calls Destroy, which we're changing to trigger DeferredDestroy.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 45

•

9 years ago

Still got bugs on Windows and B2G:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=06d486170e47&group_state=expanded
I can't afford to spend any more time on this for the forseeable future.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 46

•

9 years ago

Attached patch patch (obsolete) (deleted) — Details — Splinter Review

That try push included this patch for ImageBridgeChild. (As well as some logging code.)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 47

•

9 years ago

Talked to billm and mccr8 about this. Bill thinks the core problem is that we need to call CompositorParent::Close, perhaps from a runnable dispatched by RecvStop.

Sotaro Ikeda [:sotaro]

Updated

•

9 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1232549

Sylvestre Ledru [:Sylvestre]

Comment 48

•

9 years ago

Robert, bug 1232549 caused the mozilla-aurora tree to be closed. Do you have an eta to land that?

Flags: needinfo?(roc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 49

•

9 years ago

The work in this bug is harder than I thought and needs to be redone. After redoing it, which will take a while to get right, there will still be a risk of shutdown crashes and other things, so this won't be a good fit for Aurora.

Flags: needinfo?(roc)

Nicolas Silva [:nical]

Assignee

Comment 50

•

9 years ago

I tried to take the PImageBridge part of this patch queue and apply it separately but things break because of media resources being still alive when shutting gfx (and the ImageBridge) down (see bug 1207220). We won't be able to do anything with ImageBridge's shutdown as long as all media resources aren't properly destroyed when gfx is shut down, so marking dependent although it's a bit of a stretch since this bug fixes the shutdown of both the CompositorChild and ImageBridgeChild protocols, and the former isn't blocked on media issues.

Depends on: 1207220

Andrew McCreight [:mccr8]

Reporter

Comment 51

•

9 years ago

Feel free to deduplicate bug 1117203 and separate out fixing the two leaks. Any forward progress on this would be great. :)

Nicolas Silva [:nical]

Assignee

Comment 52

•

9 years ago

I'll give this bug another shot as soon as bug 1207220 is resolved which should be soon.

Sotaro Ikeda [:sotaro]

Updated

•

9 years ago

Whiteboard: [gfx-noted]

Nicolas Silva [:nical]

Assignee

Updated

•

9 years ago

Blocks: 1243369

Andrew McCreight [:mccr8]

Reporter

Updated

•

9 years ago

Blocks: 1239504

Tom Tromey :tromey

Comment 53

•

9 years ago

In bug 1239504, I'm landing a hack to work around an LSan report that arises
because the ImageBridgeChild thread is not shut down properly.
When this thread shutdown is fixed, it would be good to back out this hack
at the same time.

Brad Lassey [:blassey] (use needinfo?)

Comment 54

•

8 years ago

Andrew, do you want to pick this up from roc?

Flags: needinfo?(continuation)

Priority: -- → P3

Andrew McCreight [:mccr8]

Reporter

Comment 55

•

8 years ago

This really needs somebody who understands layers to work on it. Maybe Nicolas could take a look per comment 52?

Flags: needinfo?(continuation) → needinfo?(nical.bugzilla)

Andrew McCreight [:mccr8]

Reporter

Updated

•

8 years ago

Blocks: 1252263

Nicolas Silva [:nical]

Assignee

Comment 56

•

8 years ago

I'll look into this (I already did since comment 52 but unfortunately bug 1207220's resolution wasn't enough to get the patch queue to work).

Assignee: nobody → nical.bugzilla

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Assignee

Comment 57

•

8 years ago

I have been looking into this, lately. In order to shut CompositorChild down properly I first want to shut the ImageBridgeChild down properly since the later depends on the former. We currently don't shut Imagebridge down for content processes doing it mostly works except on windows where a lot of tests end up timing out on shutdown. For most of them, the problem is that a CompositorThreadHolder is leaked, and CompositorParent::ShutDown spins the event loop until CompositorThreadHolder is destroyed. few things actually hold references to CompsoitorThreadHolders, and ImageBridgeParent is one of them.

On most platforms, content processes' ImageBridge shutdown always ends with the ActorDestroy hook called with AbnormalShutdown as the reason, except on Windows. The fact that windows differs there is a bit worrying. We expect ImageBridge to run the ActorDestroy hook for every possible shutdown case at some point. Ideally with "Deletion" rather than "AbnormalShutdown".
ImageBridgeParent holds a self reference that is cleared only if ActroDestroy is called, hence the leak, the the deadlock.

Ironically, not shutting ImageBridge down means it is always still alive when the channel is destroyed, which causes the abnormal shutdown. AbnornmalShutdown sounds sad but at least it does get ActorDestroy called. I can't remember why we don't call __delete__ on top level protocols, I guess I'll figure that out quickly enough, but that should ensure ActorDestroy is called.

Nicolas Silva [:nical]

Assignee

Comment 58

•

8 years ago

Attached patch Shut ImageBridge down properly (deleted) — Details — Splinter Review

So, the reason we didn't Close() the channel (you can't __delete__ a top-level protocol, you close its channel instead) was because of some other bug that got resolved after a lot of the shutdown code was written, and closing the channel properly wasn't revisited afterwards.

This patch replaces the PImageBridge::Stop message by closing the ImageBridge's channel. What was in ImageBridgeParent::RecvStop got moved into ActorDestroy and actor destroy is now properly called in any situation (be it a normal or an abnormal shutdown). We have the guarantee that CompositorParent::DeferredDestroy is called, mSelfRef never leaked and all is well... or so I hope and this try push will tell: https://treeherder.mozilla.org/#/jobs?repo=try&revision=0c18425ba56e

Attachment #8691204 - Attachment is obsolete: true

Attachment #8734750 - Flags: review?(sotaro.ikeda.g)

Nicolas Silva [:nical]

Assignee

Comment 59

•

8 years ago

The try push doesn't look very good, unfortunately. A lot of the failures I've looked at are media stuff that outlived the gfx shutdown, and later released by the cycle collector. Hopefully this can be worked around throwing some cycle collections here and there.

Sotaro Ikeda [:sotaro]

Comment 60

•

8 years ago

Comment on attachment 8734750 [details] [diff] [review]
Shut ImageBridge down properly

The code seems ok, though it caused test failures.

Attachment #8734750 - Flags: review?(sotaro.ikeda.g) → review+

Sotaro Ikeda [:sotaro]

Comment 61

•

8 years ago

:nical, do you have an idea why the following crash happned? Crash of OffTheBooksMutex::Lock normally seems to mean that Mutex is going to be deleted during the mutex is locked.

  https://treeherder.mozilla.org/logviewer.html#?job_id=18592065&repo=try

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Assignee

Comment 62

•

8 years ago

(In reply to Sotaro Ikeda [:sotaro] from comment #61)
> :nical, do you have an idea why the following crash happned? Crash of
> OffTheBooksMutex::Lock normally seems to mean that Mutex is going to be
> deleted during the mutex is locked.

Yes, in this case the HTMLMediaElement is destroyed too late in the shutdown sequence (during a cycle collection happening after gfx shutdown), so it calls into stuff that's already destroyed.
We'll have to do another pass of finding resources that outlive the gfx shutdown and clean them up synchronously. This problem is not caused by this patch per say, it's just that not shutting down the ipdl protocols and threads for child processes was hiding the issue. It does block my patches from landing, though, so we'll have to do something about it.

I'll try to land the shutdown patches preff'ed off so that we can more easily make progress in this area.

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Assignee

Comment 63

•

8 years ago

Attached patch do a cycle collection before shutting down gfx (obsolete) (deleted) — Details — Splinter Review

This patch should trigger a cycle collection before shutting down gfx. I am not actually sure that it's the proper way to do this.

Attachment #8735902 - Flags: review?(continuation)

Andrew McCreight [:mccr8]

Reporter

Comment 64

•

8 years ago

Doesn't this mean we get crashes if something leaks media elements? That doesn't sound great. I think media people have had to deal with this issue for stream graph stuff. Maybe jesup or somebody has some ideas?

Andrew McCreight [:mccr8]

Reporter

Comment 65

•

8 years ago

Comment on attachment 8735902 [details] [diff] [review]
do a cycle collection before shutting down gfx

Review of attachment 8735902 [details] [diff] [review]:
-----------------------------------------------------------------

Doesn't this mean we get use-after-free crashes if something leaks media elements? That s

Attachment #8735902 - Flags: review?(continuation) → review-

Andrew McCreight [:mccr8]

Reporter

Comment 66

•

8 years ago

We also want to avoid shutdown GCs in non-DEBUG builds, though I think we never run ShutdownXPCOM in non-DEBUG builds, so that should be okay.

Andrew McCreight [:mccr8]

Reporter

Comment 67

•

8 years ago

(In reply to Andrew McCreight [:mccr8] from comment #65)
> Doesn't this mean we get use-after-free crashes if something leaks media
> elements? That s

Ignore that, I just meant the same block of text as I put in comment 64.

Sotaro Ikeda [:sotaro]

Updated

•

8 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1254829

Nicolas Silva [:nical]

Assignee

Comment 68

•

8 years ago

(In reply to Andrew McCreight [:mccr8] from comment #64)
> Doesn't this mean we get crashes if something leaks media elements? That
> doesn't sound great.

I don't understand how the crashes you mention relate to the extra CC.
Currently we get crashes if a gfx resource is used after the gfx module is shut down (destroying it too late counts as used).

Is the r- because we don't want to spend time on extra CC during shutdown, or is there something else? I think that ShutdownXPCOM is run in non-debug builds.

Olli Pettay [:smaug][bugs@pettay.fi]

Comment 69

•

8 years ago

That extra CC during shutdown seems like a hack and doesn't work if someone happens to leak object.
Also, it doesn't guarantee everything is released after it. For example some code might on purpose keep objects alive until some other notifications happening after xpcom-shutdown.
Or ( this is a leak sure ) the common case where non-weak ObserverService observers are kept alive until the ObserverService is shutdown and those listeners may keep other objects alive.

In general code shouldn't rely on CC for any cleans up happening before/during shutdown.

Sounds like media stuff needs to explicitly release some resources when xpcom-shutdown notification is fired.

Nicolas Silva [:nical]

Assignee

Updated

•

8 years ago

Blocks: 1245574

Andrew McCreight [:mccr8]

Reporter

Comment 70

•

8 years ago

Yes, what Olli said. Thanks for explaining it better than me. As I said, the media people have had to deal with similar issues for a while, so I'd talk to jesup about what they do. I don't know the details.

Nicolas Silva [:nical]

Assignee

Updated

•

8 years ago

Blocks: 1242442

Nicolas Silva [:nical]

Assignee

Comment 71

•

8 years ago

Attached patch Shut ShadowLayers down properly (deleted) — Details — Splinter Review

Similar to shutting ImageBridge down. This patch includes the stuff Roc wrote that I had reviewed and add code to close the channel is properly and make sure we don't leak CompositorThreadHolders.
the calling CleanupRemoteResources was moved from the compositor to the widget because the Compositor some times outlives the widget and adding a strong ref to to the widget would create a cycle.

looks good on try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=692f220c11a1b17dabd7571afa99c990c9177439

Attachment #8737880 - Flags: review?(sotaro.ikeda.g)

Nicolas Silva [:nical]

Assignee

Comment 72

•

8 years ago

Attached patch Add a pref to enable shutting child processes down and disable it on windows. (deleted) — Details — Splinter Review

There's still a few things to figure out on windows specifically. The try push in the previous comment was with this patch applied.

Attachment #8688935 - Attachment is obsolete: true

Attachment #8688936 - Attachment is obsolete: true

Attachment #8692833 - Attachment is obsolete: true

Attachment #8692834 - Attachment is obsolete: true

Attachment #8693359 - Attachment is obsolete: true

Attachment #8735902 - Attachment is obsolete: true

Attachment #8737882 - Flags: review?(sotaro.ikeda.g)

Sotaro Ikeda [:sotaro]

Comment 73

•

8 years ago

(In reply to Nicolas Silva [:nical] from comment #72)
> There's still a few things to figure out on windows specifically. The try
> push in the previous comment was with this patch applied.

Is it going to be addressed in this bug?

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Assignee

Comment 74

•

8 years ago

(In reply to Sotaro Ikeda [:sotaro] from comment #73)
> Is it going to be addressed in this bug?

Yes. Depending on the amount of work I might take a break from this and come back to the remaining windows bits later if no one else wants to have fun with shutdown issues.

Flags: needinfo?(nical.bugzilla)

Sotaro Ikeda [:sotaro]

Comment 75

•

8 years ago

Comment on attachment 8737880 [details] [diff] [review]
Shut ShadowLayers down properly

Review of attachment 8737880 [details] [diff] [review]:
-----------------------------------------------------------------

I have one question about calling thread of CleanupRemoteDrawing(). The question is the following.

::: gfx/layers/ipc/CompositorBridgeChild.cpp
@@ +33,5 @@
>  
>  namespace mozilla {
>  namespace layers {
>  
> +static StaticRefPtr<CompositorBridgeChild> sCompositor;

//nit. It might better change "sCompositor" to like "sCompositorBridge".

@@ +185,5 @@
>  }
>  
> +// static
> +bool
> +CompositorBridgeChild::ChildProcessHasCompositor()

//nit. It might better to change "ChildProcessHasCompositor()" to like "ChildProcessHasCompositorBridge()"

::: widget/nsBaseWidget.cpp
@@ +269,5 @@
>  }
>  
>  void nsBaseWidget::DestroyCompositor()
>  {
> +  CleanupRemoteDrawing();

// Is it ok to call it here? It is on main thread. It seems to be expected that CleanupRemoteDrawing() is called on Compositor thread.

Sotaro Ikeda [:sotaro]

Comment 76

•

8 years ago

(In reply to Sotaro Ikeda [:sotaro] from comment #75)
> 
> // Is it ok to call it here? It is on main thread. It seems to be expected
> that CleanupRemoteDrawing() is called on Compositor thread.

All another remote drawing functions are called on compositor thread.

Sotaro Ikeda [:sotaro]

Updated

•

8 years ago

Attachment #8737882 - Flags: review?(sotaro.ikeda.g) → review+

Sotaro Ikeda [:sotaro]

Updated

•

8 years ago

Attachment #8737880 - Flags: review?(sotaro.ikeda.g)

Sotaro Ikeda [:sotaro]

Updated

•

8 years ago

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Assignee

Comment 77

•

8 years ago

(In reply to Sotaro Ikeda [:sotaro] from comment #76)
> All another remote drawing functions are called on compositor thread.

You are right, code seems to expect CleanupRemoteResources to happen on the compositor thread. I'm digging some more and I think we should be good if we call it when receiving the WillClose message, I'm doing some more testing to be sure.

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Assignee

Comment 78

•

8 years ago

Attached patch Shut dhadowlayers down properly (adendum) (deleted) — Details — Splinter Review

Modifications to the patch "Shut Shadowlayers down properly" I figured it would be easier to review this in isolation than the whole thing again. calling CleanupRemoteDrawing from RecvWillClose fixed the use-after-free issue without having to move the call off the compositor thread.

Attachment #8738514 - Flags: review?(sotaro.ikeda.g)

Nicolas Silva [:nical]

Assignee

Comment 79

•

8 years ago

Comment on attachment 8737880 [details] [diff] [review]
Shut ShadowLayers down properly

Review of attachment 8737880 [details] [diff] [review]:
-----------------------------------------------------------------

Re-setting the review flag since there is a followup fix addressing the review comments.

Attachment #8737880 - Flags: review?(sotaro.ikeda.g)

Sotaro Ikeda [:sotaro]

Comment 80

•

8 years ago

Comment on attachment 8738514 [details] [diff] [review]
Shut dhadowlayers down properly (adendum)

Review of attachment 8738514 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good!

Attachment #8738514 - Flags: review?(sotaro.ikeda.g) → review+

Sotaro Ikeda [:sotaro]

Updated

•

8 years ago

Attachment #8737880 - Flags: review?(sotaro.ikeda.g) → review+

Pulsebot

Comment 81

•

8 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/1f07fce7daf2
https://hg.mozilla.org/integration/mozilla-inbound/rev/577472ad5c38
https://hg.mozilla.org/integration/mozilla-inbound/rev/125f5b4294f9

Nicolas Silva [:nical]

Assignee

Updated

•

8 years ago

Blocks: 1262886

Nicolas Silva [:nical]

Assignee

Updated

•

8 years ago

Blocks: 1262898

Wes Kocher (:KWierso) (Not reading bugmail; email directly if needed)

Comment 82

•

8 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/1f07fce7daf2
https://hg.mozilla.org/mozilla-central/rev/577472ad5c38
https://hg.mozilla.org/mozilla-central/rev/125f5b4294f9

Status: NEW → RESOLVED

Closed: 8 years ago

status-firefox48: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla48

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

Depends on: 1246529

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

See Also: https://bugzilla.mozilla.org/show_bug.cgi?id=1254829 →

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

Blocks: 1252283

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

Blocks: 1252348

Ryan VanderMeulen [:RyanVM]

Updated

•

8 years ago

Blocks: 1253440

Ryan VanderMeulen [:RyanVM]

Comment 83

•

8 years ago

Wowza, this fixed a pile of shutdown assertion/crash/leak bugs we've had on file and allowed me to re-enable a few entire *directories* of tests! Is there any chance we can safely uplift this to Aurora as well so it's included in the next e10s experiments on Beta47?

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Assignee

Comment 84

•

8 years ago

(In reply to Ryan VanderMeulen [:RyanVM] from comment #83)
> Wowza, this fixed a pile of shutdown assertion/crash/leak bugs we've had on
> file and allowed me to re-enable a few entire *directories* of tests! Is
> there any chance we can safely uplift this to Aurora as well so it's
> included in the next e10s experiments on Beta47?

\o/

I think that it's a bit risky to uplift, though. We could try and be ready to back it out in case of issue, I'll see how cleanly it applies to aurora but I suspect it will require some fiddling.

Flags: needinfo?(nical.bugzilla)

Gian-Carlo Pascutto [:gcp]

Updated

•

8 years ago

Depends on: 1263515

Gian-Carlo Pascutto [:gcp]

Comment 85

•

8 years ago

This seems to have broken hardware accelerated rendering on Linux.

Nicolas Silva [:nical]

Assignee

Updated

•

8 years ago

Blocks: 1264293

Andrew McCreight [:mccr8]

Reporter

Updated

•

8 years ago

status-firefox44: affected → ---

status-firefox46: --- → wontfix

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 87

•

8 years ago

Uplifting this to 47 would also make uplifting bug 1264161 easier. In particular this patch changed some stuff in CompositorParent::ActorDestroy and without that change I'm not entirely clear on what impact bug 1264161 will have. Now that this has had some more bake time, do you feel more comfortable uplifting it?

Flags: needinfo?(nical.bugzilla)

Nicolas Silva [:nical]

Assignee

Comment 88

•

8 years ago

I am a bit worried that uplifting this will mean uplifting a lot of other shutdown stuff we've been doing in the last few months and I don't have a clear idea of what depends on what. I don't know how hard it would be to have a fix for bug 1264161 in 47 but it's most likely still less risky.

Flags: needinfo?(nical.bugzilla)

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Comment 89

•

8 years ago

Ok,  thanks.

(no longer active)

Updated

•

7 years ago

Depends on: 1334655

David Anderson [:dvander] - inactive, e-mail if emergency

Updated

•

7 years ago

No longer depends on: 1334655

You need to log in before you can comment on or make changes to this bug.