Closed Bug 1541589 (apz-fission-hit-test) Opened 6 years ago Closed 4 years ago

[meta] Ensure APZ hit-testing gives correct results with Fission

Categories

(Core :: Panning and Zooming, enhancement, P3)

enhancement

Tracking

()

RESOLVED FIXED
Fission Milestone M7a

People

(Reporter: kats, Unassigned)

References

(Blocks 2 open bugs)

Details

(Keywords: meta, Whiteboard: [apz:fission:3:M])

Attachments

(1 file, 1 obsolete file)

Currently APZ gets a dispatch-to-content region for a number of reasons. The main buckets are:

  1. There is an element with APZ-aware listeners, so APZ doesn't know if it should use the event for async scrolling until main thread runs the listener
  2. There is an area for which APZ doesn't know which scrollframe to target, so it needs to ask the main thread.

Case 1 is not particularly interesting for Fission, since it's technically not a hit-testing problem. Case 2, however, is. There appear to be a number of scenarios that fall into case 2. SVGs and elements with clip-paths sitting on top of an OOP iframe is one of the more complex scenarios. Some other edge-case scenarios that fall into case 2: these codepaths as well as this one. There's also a case with rounded corners (bug 1421385).

It might not be feasible in the short/medium term to handle of these in APZ even if we assume WebRender is on, so it would be good to have a main thread fallback codepath.

Ryan had a discussion with Nika today that revealed some additional problems. In particular the main thread fallback cannot do synchronous IPC to the content process (because the content process can block indefinitely) and we also want to maintain input event ordering. More discussion about this with Botond revealed a defect in bug 1528725 which I didn't realize before. I've filed bug 1542020 with details on that. I also filed bug 1542019 to split the two "buckets" described in comment 0, as it will mitigate some of the perf impact from the planned fix for the defect.

Botond and I discussed the synchronous IPC problem and came up with a possible solution. Right now, the UI process synchronously sends input events to APZ and expects the input events come back stamped with a layers id indicating which content process that input event should be delivered to. The "Case 2" conditions described in comment 0 mean that sometimes APZ will not be able to do this stamping. If this happens, the UI process main thread would need to put the input event into a queue and trigger an asynchronous hit-test (which might involve multiple messages to multiple processes to figure out the true destination of the event). Any events following that one would also go into the queue. Once the hit test is resolved, the UI process could process the input events in the queue, until it encounters another input event that is unstamped. It is likely that this will be pretty rare.

APZ's main-thread fallback will be separate and unaffected. Either APZ will stamp the event, or the main thread will do a fallback hit-test. In either case, we will know for sure which process needs to receive the event, and when that process receives the event, it will send a SetTargetAPZC message back to APZ which will let it know the destination scrollframe of the input event. APZ has it's own input queue to hold events in such cases.

Having to have any content fallback for hit testing seems very unfortunate. Is there a reason why SVG & CSS clipping information can't be uploaded to WR itself?

It is unfortunate, but at the moment WR doesn't do SVGs natively. We basically rasterize it using gecko, which means it's not easy for WR to know all the hit-testing details of stuff inside the SVG. We do have plans to make WR do SVGs natively at which point this problem will go away. However that's not a small project, so it will take some time.

That being said, we shouldn't need to do this content fallback in most cases; only when an SVG's bounding box overlaps an OOP iframe. I wrote a little testcase and I'm not sure we handle that case properly with current Gecko (non-WR, non-fission) either. I'll have to investigate a bit more there.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #3)

I wrote a little testcase and I'm not sure we handle that case properly with current Gecko (non-WR, non-fission) either. I'll have to investigate a bit more there.

Filed bug 1543482 for this

No longer blocks: 1560312

Tentatively moving all bugs whose summaries mention "Fission" (or other Fission-related keywords) but are not assigned to a Fission Milestone to the "?" triage milestone.

This will generate a lot of bugmail, so you can filter your bugmail for the following UUID and delete them en masse:

0ee3c76a-bc79-4eb2-8d12-05dc0b68e732

Fission Milestone: --- → ?

Blocks enabling Fission in Nightly (M6)

Fission Milestone: ? → M6
Fission Milestone: M6 → M7
Whiteboard: [apz:fission:3:M]

Henri, isn't this already implemented?

Flags: needinfo?(hsivonen)

No. There is some hit-testing functionality, but this specific bug is about edge cases where WR doesn't have enough information to properly do the hit-test.

Flags: needinfo?(hsivonen)

btw, we plan to allow Fission on macOS and Linux without WR (and may need to support Fission on Windows without WR), so we will want to test APZ bugs both with and without WR.

Assignee: nobody → kats

I started thinking a bit more deeply about this bug and possible approaches.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #1)

Right now, the UI process synchronously sends input events to APZ and expects the input events come back stamped with a layers id indicating which content process that input event should be delivered to. The "Case 2" conditions described in comment 0 mean that sometimes APZ will not be able to do this stamping. If this happens, the UI process main thread would need to put the input event into a queue and trigger an asynchronous hit-test (which might involve multiple messages to multiple processes to figure out the true destination of the event).
[...]
APZ's main-thread fallback will be separate and unaffected. Either APZ will stamp the event, or the main thread will do a fallback hit-test.

I'm not sure why I added a second queue here - I think it might be possible to just our existing InputQueue infrastructure to solve the problem. That is, in the case where APZ is unable to determine a target layers id, the UI process main thread should just do its regular (main-thread) hit-test on the input event and pass the event on to whatever subdocument (OOP or not) that it hits. This can happen multiple times until finally the event reaches the process with the actual hit content, and at that point it can send the usual SetTargetAPZC message back to APZ which will let it know the destination scrollframe of the input event.

I need to think about this more but if it is viable then it would be much simpler. At this point my main concern is about the callback transform and other compositor transforms that might need to be unapplied on the way down and if that will pose a problem.

Attached file Hit-testing through clips (obsolete) (deleted) —

I spent some more time looking into this and making various test cases. I think what I had in mind in comment 10 sorta works, but it's not as trivial as I thought. The hard part is doing the "regular (main-thread) hit-test on the input event" because in a Fission world that means the event will have to bounce around between various processes and the parent process. So it's not great.

The good news is that based on my testing, the only case that we don't currently handle properly is when there's a polygon/complex clip-path on top of the OOPIF. And I discussed that with Jeff today and he suggested that we should just send the clip-path info over to WR and then we can hit-test it inside WR. If we can do that, it means we don't need any of the main-thread hit-testing codepaths (at least with WR enabled) and everything will work great. So I'd prefer going with that approach.

I'm attach the test case I made. To use it, click on one of the buttons in the bottom box to apply a particular clip on top of the iframe. The behaviour (mouseover info and whether or not you can scroll the OOPIF) should match in fission and non-fission. Right now the ellipse case and the path mask case do match; the former because we can hit-test simple shapes in WR already, and the latter because we don't hit-test through masks even on the main-thread. The other three cases are all "polygon" type clips and we should be able to send a representation of that over to WR and use it for hit-testing.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #0)

Some other edge-case scenarios that fall into case 2: these codepaths as well as this one.

These scenarios actually don't fall into case 2, they fall into case 1. That is, we should be able to hit-test the correct process for these cases in the compositor, but we may not know if the point lands on a scrollbar or not (for example). So we still want to hold off on doing APZ actions until that's resolved, but we can at least dispatch the gecko event to the correct process and that's all that matters here.

There's also a case with rounded corners (bug 1421385).

This one still needs fixing for hit-testing purposes, but shouldn't require a big architectural change or anything. It's just a matter of ensuring the nsDisplayCompositorHitTestInfo item has the correct clip chain on it.

I discussed this further with Botond. There are other cases that are interesting, even with WR enabled.

For example, in bug 1645433 he came across various scenarios related to inactive scrollframes where the way we position the hit-test item for the inactive scrollframe means that it can get hit even when it shouldn't. His work on that bug will fix some of those cases but there are still scenarios where positioned items inside the inactive scrollframe are on top of the OOPIF even though the scrollframe itself is underneath. And then the hit info for the inactive scrollframe ends up on top, causing hit tests to go wrong. I've created a test case that demonstrates this and included it in the attachment (the "scroller with split items" scenario). Note that once the scroller is activated (by scrolling on the green "abs inside scroller" item) the disparity between fission and non-fission goes away, as expected.

We discussed this case a bit and came up with a couple of potential solutions.

  • One is to simply activate any scroller that intersects with an OOPIF's rectangle. That way we won't have the inactive scroller hit info at all and it won't cause any problems.
  • Another is to drop the eVisibleToHitTest flag on the inactive scrollframe item so that it doesn't end up participating in hit-testing. Instead we would hit the contents of the scrollframe (or the OOPIF if that's under the cursor). To make this work we'd need to ensure that even inactive scrollframes get scrollIds (so that their contents which will end up getting hit have correct scrollids) and to have APZ treat such scrollIds as implicitly inactive rather than asserting here.

Assuming we can handle these cases, we couldn't think of anything else in the WR codepath that would require the main-thread fallback. We did discuss a bit around the implementation details of the fallback in case we need it for the layers codepath, and concluded that it would indeed require some sort of queue in the parent process. In order to maintain input ordering we'd have to keep a copy of any event dispatched from the widget code into Gecko. If the process receiving the event determined it was wrong (i.e. it hit-tested to a nsDisplayRemote or equivalent) it would then notify the parent process who would then re-dispatch it to the newly-determined correct process. Any events behind that one would have to wait. This all seems highly complex and only useful for some scenarios in the non-WR codepath, so I'm hoping we can avoid doing it entirely.

Attachment #9185735 - Attachment is obsolete: true

I filed bug 1675375 for the clip-path case and bug 1675547 for the inactive scrollframe case. With those two cases handled I believe we won't need a fallback path at all for WR. We may still need it for layers depending on whether or not we're going to ship Fission with layers. But even then, I would advise against implementing the fallback path because it will add a lot of complexity, for a codepath that's going away, and it only fixes some relatively rare edge cases.

If we do run into real-world cases that need improved handling with layers+fission, it might be feasible to spot-fix them by adding more state to the layer tree (e.g. adding more things into the EventRegions structure) for APZ to use when hit-testing.

Assignee: kats → nobody
No longer blocks: 1519412
Type: defect → enhancement
Blocks: 1675547

Based on the discussion in this bug, and the fact that my latest understanding is that Fission will require WebRender (falling back to software WebRender if necessary, but not to Layers), it's looking like we're not going to have a dispatch-to-content fallback mechanism for APZ hit testing.

I'm going to re-purpose this bug to be a meta bug tracking correctness issues with APZ hit testing for Fission. (I'm choosing this bug for that purpose rather than, say, bug 1542020, which I'll dupe over shortly, mostly because this bug has a lot of good discussion and it makes sense to keep important discussion in open bugs.)

Summary: Ensure we have a viable dispatch-to-content codepath for APZ hit-testing with Fission → [meta] Ensure APZ hit-testing gives correct results Fission
Summary: [meta] Ensure APZ hit-testing gives correct results Fission → [meta] Ensure APZ hit-testing gives correct results with Fission

I'm also going to keep this bug blocking M7, but some of its dependencies may block M6c if we're expecting nightly users to run into them.

Depends on: 1625173
Alias: apz-fission-hit-test
No longer blocks: 1675547
Depends on: 1675547
Depends on: 1675375
Depends on: 1682200
Depends on: 1682209

The only remaining sub-bug is now targeting M7a so I'm moving this to the same target.
Botond, do we have confirmation that this is all of the known apz hit testing issues?

Fission Milestone: M7 → M7a
Flags: needinfo?(botond)

Yep, I'm not aware of anything else.

Flags: needinfo?(botond)

All identified dependencies are fixed so based on Botond's confirmation in comment 19, I'm closing this as Fixed.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: