Closed Bug 1302611 Opened 8 years ago Closed 8 years ago

Unhandled critical error in TextureClient.cpp

Categories

(Core :: Graphics: Layers, defect)

defect
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 1303426

People

(Reporter: dvander, Assigned: gw280)

References

(Blocks 1 open bug)

Details

When I play HTML5 video and kill the GPU process, I hit this assert in LayerTransactionParent.cpp:

http://searchfox.org/mozilla-central/rev/bcc9ea6947878ca6378e5b7d6f08c1c090ed9eb7/gfx/layers/client/TextureClient.cpp#342

This comment suggests that we're in some kind of unsalvageable state. If that's true, we have to figure out how to salvage it. Otherwise, we should remove the assert and keep going.

Nical, any thoughts?
Flags: needinfo?(nical.bugzilla)
Blocks: 1297568
I'll take this as part of bug 1297568
Assignee: nobody → gwright
This typically happens when textures are being used after shutdown, or destroyed in a racy way on two threads at the same time (again because of shutdown forcing deallocation on the IPDL thread while the other thread was already in the process of deallocating the texture). This is usually best fixed by either fixing the user of the texture which should not be holding on to the texture after gfx is shut down, or taking the necessary steps during abnormal shutdown to prevent this from happening.

There's been a lot of work to minimize this situation. As far as I know, the abnormal shutdown case (which seems to be the case here) has never been fully solved. Basically, during abnormal shutdown, IPDL will force all IPC things to be destroyed regardless of whether or not other threads are using these resources.
For actors we can work around it by replacing the deallocation by a reference count but shmems, among other things, will be unmapped while another thread is accessing the memory. We tried to wrap all accesses to the texture's shared data in a lock, but this creates interleaved locking patterns that ASAN is not happy with so we disabled it in some cases.
This specific assertion is hit because destruction is happening in a racy way. It does not look like shmems and other shared resources are involved, but when this happens all of the shared resource issues could be happening. It's just that this is the easiest place to detect the racy situation and put the assertion.

As far as I know, the assertion is legit. When this happens we are in a bad situation and I think it's better to crash here than in some random place later. This assertion helps with detecting when users of TextureClient are not shut down in the proper order, so removing it would enable this type of bug to proliferate again.
A hack could be to detect whether the shutdown was a normal shutdown and an abnormal shutdown, and attempt to tolerate the raciness with abnormal shutdowns (while we work out a better way to handle it).

The whole thing is certainly fixable but I believe it requires to rethink a good chunk of the way we share resources and come up with a stricter system. There's a fundamental tension between how some things need to be deallocated synchronously and our general yolo-approach to multithreading and automatic memory management.
Flags: needinfo?(nical.bugzilla)
Thanks, nical - this also helped me understand why bug 1303426 was happening, and I think it's the same problem. We deallocate shmems asynchronously but nothing synchronizes it with the media thread. This is worth fixing in IPDL, will do that in that bug.
No longer depends on: 1303426
Forwarding duping to bug 1303426 since that bug will implicitly fix this one.
Status: UNCONFIRMED → RESOLVED
Closed: 8 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.