Make APZ tell chrome main thread the transforms for chrome to content coordinate spaces when GPU process is used
Categories
(Core :: Panning and Zooming, enhancement, P3)
Tracking
()
Tracking | Status | |
---|---|---|
firefox68 | --- | fixed |
People
(Reporter: hsivonen, Assigned: kats)
References
(Blocks 1 open bug)
Details
(Whiteboard: [fission-event-m2])
Attachments
(2 files)
Bug 1533673 - Prevent the GPU RemoteContentController from sending messages to a dead actor. r?rhunt
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details |
Follow-up for bug 1530661:
Bug 1530661 is disabled by default when the GPU process is in use. The IPC problem should be tracked down and fixed so that the pref can be removed and the code run unconditionally in the GPU process case.
Reporter | ||
Comment 1•6 years ago
|
||
Marking [fission-event-m2] on the assumption that for [fission-event-m1] it's more important to have feature breadth on some platforms (Linux and Mac) than platform coverage (Windows).
Assignee | ||
Updated•6 years ago
|
Reporter | ||
Comment 2•6 years ago
|
||
Let's see if it just works now, considering that in the original bug the mutex scope got narrowed after blocking Windows by pref:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8c3dc02b3f5934a4d103db5017fee75642d9fef5
Reporter | ||
Comment 3•6 years ago
|
||
(In reply to Henri Sivonen (:hsivonen) from comment #2)
Let's see if it just works now, considering that in the original bug the mutex scope got narrowed after blocking Windows by pref:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=8c3dc02b3f5934a4d103db5017fee75642d9fef5
No, still fails, so the mutex scope change didn't fix this.
Updated•6 years ago
|
Updated•6 years ago
|
Assignee | ||
Updated•6 years ago
|
Assignee | ||
Comment 4•6 years ago
|
||
Still failing. The RemoteContentController::SendLayerTransforms
is blowing up with a Route error: message sent to unknown actor ID
error. This is quite bizarre because the code is (a) running on the compositor thread, which is the right thread, and (b) guarded by an mCanSend
check, which should be false if the ActorDestroy
method was called.
Clearly some sort of race condition since there's a bunch of tests running fine before it fails but it's not clear why this would happen. I'll try reproducing locally on Linux with the GPU process enabled and use rr if I can, as that will be easier to debug than spamming tryserver.
Assignee | ||
Comment 5•6 years ago
|
||
Best theory so far is that the main thread is running RemoteCompositorSession::Shutdown
at the same time that the GPU process is doing its RemoteContentController::SendLayerTransforms
, and so when the message arrives to the main thread the actor is gone. But if that's the case I'm kind of surprised we haven't run into this crash before because all the RemoteContentController methods should be subject to the same problem.
Assignee | ||
Comment 6•6 years ago
|
||
The theory seems to be supported by a try push with additional logging here. The crash always seems to occur immediately after RemoteCompositorSession::Shutdown
runs.
Looking at the RemoteContentController methods, NotifyLayerTransforms
is the only one that would get called from the updater thread (which with WR enabled would be the scene builder thread) so maybe that's related, although I can't see how that would matter, since it redispatches itself to the compositor thread anyway.
Anyway, now that I think I know what's going I'll try to come up with a fix.
Assignee | ||
Comment 7•6 years ago
|
||
Assignee | ||
Comment 8•6 years ago
|
||
Currently it's possible for the RemoteContentController in the GPU process to
be sending a message to the UI process while the UI process is running
RemoteCompositorSession::Shutdown. This means that by the time that message
is processed, the UI process actor is destroyed and the message produces
a routing error.
This patch ensures that before RemoteCompositorSession::Shutdown completes,
it notifies the RemoteContentController in the GPU process that it's about
to destroy the APZChild actor. This eliminates the race because it ensures
the RemoteContentController is synchronously notified of the impending
actor destruction before it tries to send the message.
Assignee | ||
Comment 9•6 years ago
|
||
With the IPC fix in the previous patch this seems to work now.
Depends on D29941
Updated•6 years ago
|
Comment 10•6 years ago
|
||
Comment 11•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/c737fc642ddc
https://hg.mozilla.org/mozilla-central/rev/4abfc5c2a4ee
Description
•