Closed Bug 1303426 Opened 8 years ago Closed 8 years ago

Content crashes when video is playing while the GPU process shuts down

Categories

(Core :: Graphics: Layers, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla52
Tracking Status
firefox52 --- fixed

People

(Reporter: dvander, Assigned: dvander)

References

(Blocks 1 open bug)

Details

Attachments

(5 files, 1 obsolete file)

The crash occurs in BufferTextureData::BorrowMappedYCbCrData[1]. The reason is, shmems are deallocated immediately after ImageBridgeChild's ActorDestroy, and there is no way to prevent a race between the media thread and the ImageBridgeChild thread.

Even existing mechanisms do not prevent this: nothing locks shmem deallocation, so locking the actor or TextureClient has no effect.

[1] http://searchfox.org/mozilla-central/source/gfx/layers/BufferTexture.cpp#329
I can't really see any reason not to do this. The biggest risk (I think) is that we call DeallocShmem without sending a destroy message for a PTexture, and then end up leaking the shmem. But my understanding is TextureClient owns the shmem and is responsible for freeing it. So this shouldn't really be an issue.
Assignee: nobody → dvander
Status: NEW → ASSIGNED
Attachment #8792738 - Flags: review?(wmccloskey)
gw280 found bug 523174 which increases my suspicion: the original intent was for shmem to work in lock-step with a single actor. In fact this bug didn't include "unsafe" shmem support, so it just didn't anticipate how modern gecko would work.
Comment on attachment 8792738 [details] [diff] [review]
part 1, ipc::Shmem should hold strong ref

Nical, what do you think of this approach? This does in fact fix the video crashes I get when restarting the GPU process. Is there any TextureClient code we can clean up after?

Note that we still have to acquire a lock if an ipc::Shmem will have readers and writers on different threads. If that's not the case we will need to fix it. However, the gfxCriticalError here[1] can probably be removed since the early return will be safe?

[1] http://searchfox.org/mozilla-central/rev/f6c298b36db67a7109079c0dd7755f329c1d58e2/gfx/layers/client/TextureClient.cpp#342
Attachment #8792738 - Flags: feedback?(nical.bugzilla)
Also, try run for the shmem change: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2f326e435c7b0f1ca50087db02574feab82cf95b
Attachment #8792738 - Flags: feedback?(nical.bugzilla) → feedback+
This turns CanSend asserts in ImageBridgeChild into safe failures.
Attachment #8793102 - Flags: review?(nical.bugzilla)
Attached patch part 3, remove ChildActor (deleted) β€” β€” Splinter Review
This removes ChildActor and folds the still-relevant parts directly into TextureChild. mDestroy is renamed to mOwnerCalledDestroy, to differentiate it from the two other booleans that already indicate various destruction states.
Attachment #8793103 - Flags: review?(nical.bugzilla)
Attached patch part 4, fix DeallocateTextureClient (deleted) β€” β€” Splinter Review
When the GPU process dies, with all of the previous patches applied, we still hit two asserts.

First, the gfxCriticalError that IPC is still open (this should be a bogus assert after part 1 lands). Second, TextureChild's destructor wants mOwnerCalledDestroy to be true. We don't call destroy if IPC isn't open.

To clean this up I moved the tail code of DeallocateTextureClient into TextureChild::Destroy and TextureChild::DestroySync. The sync case is now called when IPC isn't open, in addition to the normal sync case. That method ensures DestroyTextureData is called, but skips IPC stuff if ActorDestroy has been called.
Attachment #8793106 - Flags: review?(nical.bugzilla)
Attached patch part 5, address TODO comment (obsolete) (deleted) β€” β€” Splinter Review
There's a scary looking comment in TextureClient::AsTextureClient about how it's prone to races on shutdown. This comment is true, but it looks like no callers would exhibit this problem. This patch adds an assert so no illegal uses come up.
Attachment #8793107 - Flags: review?(nical.bugzilla)
Comment on attachment 8792738 [details] [diff] [review]
part 1, ipc::Shmem should hold strong ref

Review of attachment 8792738 [details] [diff] [review]:
-----------------------------------------------------------------

DeallocShmem sends a destroy message, so I don't see how we could get a leak that way.
Attachment #8792738 - Flags: review?(wmccloskey) → review+
Attachment #8793102 - Flags: review?(nical.bugzilla) → review+
Attachment #8793103 - Flags: review?(nical.bugzilla) → review+
Attachment #8793106 - Flags: review?(nical.bugzilla) → review+
Comment on attachment 8793107 [details] [diff] [review]
part 5, address TODO comment

Review of attachment 8793107 [details] [diff] [review]:
-----------------------------------------------------------------

::: gfx/layers/client/TextureClient.cpp
@@ +768,5 @@
>    TextureChild* tc = static_cast<TextureChild*>(actor);
>  
> +  // NOTE: AsTextureClient must be called from the IPDL thread that owns the
> +  // actor. Otherwise, we could race with TextureClient's refcount dropping
> +  // to 0 and the actor could be destroyed.

AsTextureClient is already only called from the IPDL thread. The race happens when the TextureClient's ref-count drops to zero on another thread (say, an ImageContainer held by a video decoder was holding the the last reference to the texture and goes away on a media-related thread). In this case AsTextureClient may return a pointer to a texture that may be somewhere between the beginning of its destructor and after deletion of its memory.

It's good to assert here but I don't think it addresses the race, so you should modify the comment so that it does not look like the race only depends on this being accessed on the main thread.
Attached patch part 5, update comment (deleted) β€” β€” Splinter Review
Attachment #8793107 - Attachment is obsolete: true
Attachment #8793107 - Flags: review?(nical.bugzilla)
Attachment #8793378 - Flags: review?(nical.bugzilla)
Attachment #8793378 - Flags: review?(nical.bugzilla) → review+
Pushed by danderson@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/268c634a2dc8
Make ipc::Shmem hold a strong reference to underlying SharedMemory. (bug 1303426 part 1, r=billm)
https://hg.mozilla.org/integration/mozilla-inbound/rev/e2ae7ac2fb56
Fail safely if IPC has shut down during an ImageBridge shmem request. (bug 1303426 part 2, r=nical)
https://hg.mozilla.org/integration/mozilla-inbound/rev/4b306288100e
Remove the ChildActor helper class. (bug 1303426 part 3, r=nical)
https://hg.mozilla.org/integration/mozilla-inbound/rev/7d098e62480b
Destoy TextureData even if TextureChild::ActorDestroy has already run. (bug 1303426 part 4, r=nical)
https://hg.mozilla.org/integration/mozilla-inbound/rev/1db550ca8682
Update comment in TextureClient::AsTextureClient. (bug 1303426 part 5, r=nical)
https://hg.mozilla.org/mozilla-central/rev/268c634a2dc8
https://hg.mozilla.org/mozilla-central/rev/e2ae7ac2fb56
https://hg.mozilla.org/mozilla-central/rev/4b306288100e
https://hg.mozilla.org/mozilla-central/rev/7d098e62480b
https://hg.mozilla.org/mozilla-central/rev/1db550ca8682
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla52
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: