Closed Bug 1129223 Opened 10 years ago Closed 10 years ago

[e10s] MozAfterRemotePaint event uses a lot of messages

Categories

(Core :: Graphics: Layers, defect)

x86_64
Linux
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla39
Tracking Status
e10s + ---
firefox39 --- fixed

People

(Reporter: billm, Assigned: billm)

References

Details

Attachments

(2 files, 2 obsolete files)

The design of the MozAfterRemotePaint event is described here: https://bugzilla.mozilla.org/show_bug.cgi?id=1009628#c0 It seems like it uses a lot of messages (and context switches). I wonder if it would be possible to do something like this: When the event is requested, the chrome process would generate a new "sequence number". This number would be sent to the content process via a message. The CompositorChild would include the latest sequence number in any layer updates sent to the parent. When the parent received a layer update with a sequence number equal to or greater than the one it generated, it would fire the MozAfterRemotePaint event. Benoit, do you think this would work?
Flags: needinfo?(bgirard)
Is the sequence number necessary? If I understand correctly your proposal is to save the trip from the compositor to the content process and instead have the compositor notify the event listener directly. It was design this way for simplicity because the compositor might not have a channel with the entity that's requesting the event. In practice it's always the chrome's main thread so maybe it's fine if it's not flexible that way.
Flags: needinfo?(bgirard) → needinfo?(wmccloskey)
The reason I think the sequence number is important is to ensure that, when a layer update arrives, it's not just content that was slow to arrive from the old tab. Imagine this sequence: 1. Tab A paints in content process. Layer update is sent to parent. 2. User switches from tab A to tab B. We request a MozAfterRemotePaint notification. 3. Layer data from tab A arrives at compositor thread. We don't want to dispatch the MozAfterRemotePaint in this case because the data is stale.
Flags: needinfo?(wmccloskey)
(In reply to Bill McCloskey (:billm) from comment #2) > We don't want to dispatch the MozAfterRemotePaint in this case because the > data is stale. Then it should be up to the caller to handle this (likely by calling the callback). The layer system only sees incoming update. It doesn't know what a tab is so it has no measure of whats stale.
likely be canceling the callback*
(In reply to Benoit Girard (:BenWa) from comment #3) > (In reply to Bill McCloskey (:billm) from comment #2) > > We don't want to dispatch the MozAfterRemotePaint in this case because the > > data is stale. > > Then it should be up to the caller to handle this (likely by calling the > callback). The layer system only sees incoming update. It doesn't know what > a tab is so it has no measure of whats stale. Well, there's no way for the caller to know whether the data is stale without a sequence number. That's why we need a sequence number. Also, we don't want to cancel the callback in this case. We want to wait until layer data comes in that was generated after we requested the MozAfterRemotePaint event. The sequence number can be used to tell that.
(In reply to Bill McCloskey (:billm) from comment #5) > (In reply to Benoit Girard (:BenWa) from comment #3) > > (In reply to Bill McCloskey (:billm) from comment #2) > > > We don't want to dispatch the MozAfterRemotePaint in this case because the > > > data is stale. > > > > Then it should be up to the caller to handle this (likely by calling the > > callback). The layer system only sees incoming update. It doesn't know what > > a tab is so it has no measure of whats stale. > > Well, there's no way for the caller to know whether the data is stale > without a sequence number. That's why we need a sequence number. I don't think that's true. Consider the following with no sequence number. Note that for some cases the user will still want the mozAfterPaint even if more changes are made later so the definition of stale is user dependent: mozAfterRemotePaint(function() { // This is called after A is sent to the compositor // but it doesn't promise that the compositor isn't showing // tab B. The user would need to guard against that himself. }); // Change to tab A setTimeout(function() { // Change to tab B }, 100 ms); This could be fixed by doing: var afterPaintHandle = mozAfterRemotePaint(function() { // This is called after A is sent to the compositor // but it doesn't promise that the compositor isn't showing // tab B. The user would need to guard against that himself. if (window.gCurrentTab != "tabA") { dump("Stake\n"); } }); // Change to tab A window.gCurrentTab = "tabB"; setTimeout(function() { // Change to tab B mozCancelAfterRemotePaint(afterPaintHandle); // OR window.gCurrentTab = "tabB"; }, 100 ms); > > Also, we don't want to cancel the callback in this case. We want to wait > until layer data comes in that was generated after we requested the > MozAfterRemotePaint event. The sequence number can be used to tell that. The user should decide if he wants to cancel the callback because he no longer cares about it. Similarly to setTimeout/setInterval/requestAnimationFrame.
I agree that if there are two tab switches, then the frontend should be responsible for canceling the event listener from the first tab switch. I think that's what you're talking about? But I'm talking about a different situation where there's only one tab switch. Let's say that the user is looking at tab A. Imagine an animation is playing in that tab and lots of layer updates are continuously being sent from the content process to the compositor. Now the user switches to tab B. The frontend wants to install a callback that should be called when we draw the first frame for tab B. The problem is that we may still receive some frames from tab A even after the tab switch takes place. How do we know whether the incoming frames are from tab A or tab B? The frontend certainly has no way of knowing. That's what the sequence number solves. It essentially labels the layer updates, telling us whether they're from before the tab switch (and therefore from tab A) or after the tab switch (and therefore from tab B).
I checked the code and I'm not certain that it can't get confused if you have 2 or more pending callback. But I'm fairly sure it's correct if you have only a single pending callback. (In reply to Bill McCloskey (:billm) from comment #7) Ohh right. I believe this is already handled properly. Take my example code above. We set our callback and the active tab in the same JS Event. Because of JS' run to completion semantics we know that we wont get a paint in the middle of requesting the event and doing a tab switch. Note we can re-order the 3 global statement in any possible permutation and still get the same behavior. From the point of view of the refresh driver the 'mozAfterRemotePaint' and 'window.gCurrentTab = "tabB"' are atomic as the refresh driver can not run in between. When the refresh driver fires it will have a AfterRemotePaint callback requested which will follow along with the transaction. So it shouldn't fire the callback for animations that are pending. i.e. the code shouldn't get confused with another main thread transaction. Having said that it would be worth while checking that we handle the more complicated cases like OMTAnimations, which is not enabled yet. Do we know of any bugs with the current implementation? Your initial bug report seems to suggest wanting to reduce the overhead. I would imagine that we only do this on tab switch. Do we want to go through this trouble to save 1 IPC message there?
(In reply to Benoit Girard (:BenWa) from comment #8) > Ohh right. I believe this is already handled properly. Take my example code > above. We set our callback and the active tab in the same JS Event. Because > of JS' run to completion semantics we know that we wont get a paint in the > middle of requesting the event and doing a tab switch. Note we can re-order > the 3 global statement in any possible permutation and still get the same > behavior. From the point of view of the refresh driver the > 'mozAfterRemotePaint' and 'window.gCurrentTab = "tabB"' are atomic as the > refresh driver can not run in between. The problem is that the content process is running independently of the chrome process. Here's how the tab switch code currently works when switching from A to B: 1. Chrome asks content process to start rendering tab B (by setting the docShellIsActive flag for it) 2. Chrome installs a mozAfterRemotePaint listener on tab B 3. When the listener fires, chrome code switches to tab B (up until now we were updating layers but not displaying them) and sets docShellIsActive to false for A The problem is that, until we set docShellIsActive to false for tab A, we're still getting layer updates for A. And nothing in the code distinguishes between updates for A and B. It turns out that even the current code is broken. I tested this by changing the TabChild::RecvSetIsDocShellActive handler to avoid setting isActive to true for a certain tab. That means it will never publish layer updates and we'll never be able to draw it. With the existing code, the mozAfterRemotePaint event still fires when we switch to that tab because the compositor is still getting data from the old tab and it can't tell them apart. The result is that we switch to the new tab and it's just drawn as a black rectangle. I'm pretty sure this is why I often see flashes of black when switching tabs in e10s. > Do we know of any bugs with the current implementation? Your initial bug > report seems to suggest wanting to reduce the overhead. I would imagine that > we only do this on tab switch. Do we want to go through this trouble to save > 1 IPC message there? There is also a performance issue with using so many messages. Although the raw cost of the messages is low, each roundtrip requires the main thread on the content process to process a message. If the content process is working hard, it won't process messages very often. It's not uncommon for the main thread to be busy and only processing messages every 100ms. The two roundtrips add an extra 200ms to the tab switch time.
Attached patch notify-paint (obsolete) (deleted) — Splinter Review
I went ahead and implemented this. It turns out that we don't even need a sequence number since the layer trees for different tabs have IDs. We can just use the ID to figure out whether we've gotten layer data for the correct tab.
Assignee: nobody → wmccloskey
Status: NEW → ASSIGNED
Attachment #8559591 - Flags: review?(bgirard)
tracking-e10s: --- → +
Sorry it's a complex change, need a bit more time to review.
Attached patch rm-remote-local-paint (deleted) — Splinter Review
This patch removes mozAfterRemotePaint events for non-remote windows. We don't actually need the event, so I don't see any reason to keep the code around.
Attachment #8562457 - Flags: review?(bugs)
Attached patch notify-paint (obsolete) (deleted) — Splinter Review
It turns out that we also need a message when layers are unloaded. This version adds that functionality.
Attachment #8559591 - Attachment is obsolete: true
Attachment #8559591 - Flags: review?(bgirard)
Attachment #8562543 - Flags: review?(bgirard)
Attachment #8562457 - Flags: review?(bugs) → review+
We had a discussion on IRC: - mozAfterRemotePaint is fundamentally worse for tab switch if the child is busy since it requires two events to be posted on its event loop. - The patch here is providing different behavior than mozAfterRemotePaint and is nothing like mozAfterPaint so it should have a different name. Perhaps we should call it mozLayerTreeReady. - Need to decide if we're going to keep mozAfterRemotePaint or not.
Attachment #8562543 - Flags: review?(bgirard)
How should I proceed here? I will change the name of the event. Do you want me to leave the old mozAfterRemotePaint or can I delete it?
Flags: needinfo?(bgirard)
I'd vote to keep it. You mention you knew that it was incorrect, if you know why I'd be nice to document what's wrong with the next person that needs the stronger semantics. I'd imagine if we don't use it in the near future and we find a solution for the b2g keyboard without it then we could remove it entirely.
Flags: needinfo?(bgirard)
Attached patch notify-paint v2 (deleted) — Splinter Review
OK, I made the changes you requested.
Attachment #8562543 - Attachment is obsolete: true
Attachment #8570257 - Flags: review?(bgirard)
Comment on attachment 8570257 [details] [diff] [review] notify-paint v2 Review of attachment 8570257 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/base/nsFrameLoader.cpp @@ +2685,5 @@ > + if (mRemoteBrowser) { > + mRemoteBrowser->RequestNotifyLayerTreeReady(); > + } > + > + return NS_OK; Silent failure here is suboptimal but probably ok in this context. ::: dom/ipc/TabParent.cpp @@ +2670,5 @@ > +void > +TabParent::RequestNotifyLayerTreeReady() > +{ > + if (RenderFrameParent* frame = GetRenderFrame()) { > + CompositorParent::RequestNotifyLayerTreeReady(frame->GetLayersId(), Ok good, this event wont misfire if we have multiple remote layer trees. ::: dom/ipc/TabParent.h @@ +372,5 @@ > > bool SendLoadRemoteScript(const nsString& aURL, > const bool& aRunInGlobalScope); > > + void RequestNotifyLayerTreeReady(); Maybe a comment like: // See nsIFrameLoader requestNotifyLayerTreeReady ::: gfx/layers/ipc/CompositorParent.cpp @@ +1434,5 @@ > +CompositorParent::RequestNotifyLayerTreeReady(uint64_t aLayersId, CompositorUpdateObserver* aObserver) > +{ > + EnsureLayerTreeMapReady(); > + MonitorAutoLock lock(*sIndirectLayerTreesLock); > + sIndirectLayerTrees[aLayersId].mLayerTreeReadyObserver = aObserver; I guess here's you're assuming that you only have one remote site which can talk to this layer tree via mRemoteBrowser? Either make this into a list of add an assert that this is either null or that 'mLayerTreeReadyObserver == aObserver'. This will make sure that we never setup two update observer have the second clobber the first should we later allow more than one observer.
Attachment #8570257 - Flags: review?(bgirard) → review+
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: