Closed Bug 1675547 Opened 4 years ago Closed 4 years ago

Better handle hit-testing when inactive scrollframes intersect OOPIFs

Categories

(Core :: Panning and Zooming, enhancement)

enhancement

Tracking

()

RESOLVED FIXED
87 Branch
Fission Milestone M6c
Tracking Status
firefox87 --- fixed

People

(Reporter: kats, Assigned: tnikkel)

References

(Blocks 2 open bugs, Regressed 1 open bug)

Details

(Whiteboard: [2/5] patches bounced)

Attachments

(5 files, 2 obsolete files)

As discussed in https://bugzilla.mozilla.org/show_bug.cgi?id=1541589#c13 there are some hit-testing issues when inactive scrollframes are underneath OOPIFs. This happens because of the way we generate the hit info for the inactive scrollframe; it can end up on top of the OOPIF. The compositor then targets the inactive scrollframe for incoming events instead of the OOPIF.

Botond is fixing a subset of these cases in bug 1645433 but in discussion with Matt Woodrow they realized that there are other cases that won't be handled, such as when there is a positioned item inside the inactive scrollframe with a z-index that places it on top of the OOPIF. There is such a case in the mega-test here (click on the "scroller with split items" button), and I'll attach a standalone test case to this bug as well.

In terms of fixing it, we came up with a couple of potential solutions, which I wrote down in this comment.

Attached file Bug 1675547 - Add a test (deleted) —

Depends on D96075

Some more discussion on this topic today, with a variety of possible solutions to explore. The first few are all sub-cases of the second bullet point in bug 1541589 comment 13.

  • One is to ensure that hit-test info items inside the inactive scrollframe have the correct scrollId (that of the inactive scrollframe) with no other changes. In particular this would leave the ASRs untouched, the WebRenderScrollData untouched, and the APZC tree untouched. However, when APZ does a hit-test and gets a scrollId that it doesn't recognize, it would treat it the same as an unlayerized scrollframe and leave it in the input queue until a round-trip to the main thread layerized it and created the APZC instance. Implementation-wise, this would involve ensuring that hit-test info items inside an inactive scrollframe have an mScrollTarget explicitly set at this point rather than relying on getting the scrollId from the item's ASR. I thought about this approach a bit more as I was writing this down, and I realized that if the APZ tree doesn't have metrics or an APZC instance for the scrollId, it may not know where that scrollframe is within the tree of scrollframes, and may not be able to properly untransform the input event for main-thread dispatch, if e.g. the inactive scrollframe is inside a nested active scrollframe with an async transform. So that's also something to consider.

  • Another is to ensure that all inactive scrollframes still create ASR instances, and that items inside the scrollframes use those ASRs. This would automatically ensure that hit-test info items inside the inactive scrollframe have the correct scrollId (i.e., the above bullet point) but would also cause WebRenderLayerScrollData instances corresponding to the ASRs get sent over as part of the WebRenderScrollData. This in turn would cause creation of the APZCs in the APZ tree. Input events targeted at these APZCs would not necessarily wait for a main-thread round-trip (unless there were e.g. APZ-aware listeners there) but would instead go straight to the APZC via the input queue. This would result in the APZC scrolling but immediately checkerboarding because of the lack of displayport. The main-thread version of the input event or the next repaint request would cause the displayport to get set and clear the checkerboard.

https://treeherder.mozilla.org/jobs?repo=try&group_state=expanded&revision=d2c349236e5e9c187eb84951104a6cd7e5d5ad6b is my most recent incomplete WIP for this approach, but it's failing some reftests and I need to investigate why.

  • Another variant of the above scenario is that we set the eRequiresTargetConfirmation flag on the hit-test info items, so that the input events would wait in the InputQueue until the main-thread round-trip happened, which should eliminate or shorten the checkerboarding. This flag could be set either at display-list building time or tacked on to the result of the hit-test in the compositor. The latter is probably simpler.

Just for completeness, the first bullet point from bug 1541589 comment 13:

  • Set a zero-margin displayport on every inactive scrollframe, thereby activating it. The end result of this should be the same as the second bullet above, but the implementation may be simpler as it takes advantage of existing abstractions rather than mucking around with ASRs directly.

Finally, a backup plan:

  • Allow out-of-order event dispatch in these rare edge cases. Here, if the input events stamped with the wrong layers id by APZ, the parent process still dispatches it to that (wrong) content process. the content process can determine that the input event falls inside an OOPIF, and can then stamp the input event with the more-correct layers id of the OOPIF and send it back to the parent process to redispatch. This repeats until the input arrives at the correct OOPIF. Note that while a content process is processing an input event, other inputs may arrive and go to the correct target process (e.g. keyboard inputs that go directly to the currently-focused element, without needing to be stamped with a layers id by APZ), so this can result in out-of-order event delivery to an OOPIF. Still, the number cases in which this occurs may be low enough to make it ok.

New WIP: https://treeherder.mozilla.org/#/jobs?repo=try&revision=20682aa8e27981b4c2f2bc114a85c72e4129d07d

I'm expecting this to be green for correctness but I haven't really looked into the performance implications. This patchset basically activates all the scrollframes when WR is enabled, and then updates failing apz mochitests accordingly.

Fission Milestone: --- → M6c

https://treeherder.mozilla.org/perfherder/compare?originalProject=mozilla-central&newProject=try&newRevision=b12972231592ab7eece8701984afe8db016bdcea&framework=1&selectedTimeRange=172800 is the Linux WR talos performance comparison. ~9% regression on tscrollx and tp5o_scroll but otherwise seems ok. So that's not terrible; I suspect the regression is mostly just from extra displaylist items and scroll data making IPC slower.

No longer depends on: apz-fission-hit-test

Looking at the regression it's basically every subtest that has regressed.

testing/talos/talos/tests/scroll/reader.htm from tscrollx is a simple page, the only things that have scrollable overflow are the root element and the scrollable div that presumably the test is scrolling. So I would expect both to be active for this test. So I would expect the patch to be a no-op, but yet we still see a large regression on that page.

I got other weird results like making the change only affect the content process, or only affect the non-content process, didn't move the numbers. And making the change apply to non-wr doesn't move the non-wr numbers at all. So I just disabled the change and pushed that to try and I still get the worse perf numbers. So it looks like the difference is coming purely from some difference when running talos on m-c vs on try. So I'm guessing there is no actual perf regression here, just tscroll/asap mode biting us again. For example bug 1645275 documents how tscroll bit us previously, although I can't tell if that is related or not.

So I will probably just get kats' tests and patches reviewed and land them and see if we get perf alerts.

If we don't have the WantAsyncScroll condition there, then some reftests
(e.g. clipPath-on-outflowElement-01a.html) will fail because even
non-scrollable overflow:hidden divs get activated.

Assignee: nobody → tnikkel
Status: NEW → ASSIGNED
Attached file Bug 1675547 - Add test. r?botond (deleted) —

Depends on D99983

Timothy, are you working on this scrollframe bug? Do you have an ETA for addressing the review comments from Phabricator?

Flags: needinfo?(tnikkel)
Whiteboard: [ETA ? patches in Phabricator need more work]

Yeah, I'm working on it! Addressing the review comments really spiralled out into a much bigger patch stack. The patches are all written (approach approved by the eventually reviewer) they pass try now, but I'm working on a few test issues. Writing tests are always a nightmare and always takes at least 2x longer than debugging and writing a patch for the original issue it seems. I finally got the test working okay in bug 1689492 over the weekend (not linked here, but needed to so that my patch stack has a reasonable mental model that I can work with).

Flags: needinfo?(tnikkel)

(In reply to Timothy Nikkel (:tnikkel) from comment #11)

The patches are all written (approach approved by the eventually reviewer) they pass try now, but I'm working on a few test issues. Writing tests are always a nightmare and always takes at least 2x longer than debugging and writing a patch for the original issue it seems. I finally got the test working okay in bug 1689492 over the weekend (not linked here, but needed to so that my patch stack has a reasonable mental model that I can work with).

Sounds good. I totally understand about tests taking longer than the patch itself. :)

I'm the EPM for the Fission project. If you get blocked or need anything, just let me know!

Whiteboard: [ETA ? patches in Phabricator need more work] → [ETA ? patches done; now fixing tests in bug 1689492]
Depends on: 1690433

We want this for hit testing purposes.

Depends on D103859

Timothy, your patches (after botond's r+) can land without the new tests in bug 1689492, correct?
Adding new tests should follow right after but should not be a M6c blocker.

Flags: needinfo?(tnikkel)

Bug 1689492 is already landed (it's a bug to land a test).

I'll land the patches in the dependant bug (bug 1690433) shortly.

Before flipping the pref here I need to modify a few tests slightly otherwise they will fail. (The tests were originally modified to expect the new behaviour always, whereas I'm only flipping the pref to enable the new behaviour with fission.) I'll change the tests more thoroughly after landing the pref flip.

Flags: needinfo?(tnikkel)
Pushed by tnikkel@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/1408a1f0e735 Add test. r=botond https://hg.mozilla.org/integration/autoland/rev/2e05f59e722e Update existing mochitests to handle WR layerizing. r=botond https://hg.mozilla.org/integration/autoland/rev/5064e7de4dcb Adjust tests that we are touching to look at the activate all scroll frames pref and modify expectations accordingly.
Pushed by tnikkel@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7913b9adce62 Force layerization for all scrollframes that want it, when WR and fission is enabled. r=botond
Attached file Bug 1675547. Fix eslint complaints. (obsolete) (deleted) —

Depends on D104176

Backout by cbrindusan@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/fa3b003cee3f Backed out changeset 7913b9adce62 for bc failures in browser_test_group_fission.js. CLOSED TREE

Also saw the following intermittent showing up with the backed out changes: https://treeherder.mozilla.org/logviewer?job_id=328969509&repo=autoland&lineNumber=12438

Whiteboard: [ETA ? patches done; now fixing tests in bug 1689492] → [2/5] patches bounced
Attachment #9201373 - Attachment is obsolete: true

The failure in helper_check_dp_size.html is interesting. This only happens on the M-fis, so all scroll frames get a minimal display port at least. We call promiseMoveMouseAndScrollWheelOver in the test to activate an inner div. AsyncPanZoomController::HandleEndOfPan gets called (assuming from OnPanEnd). It calls FlushRepaints() on the overscroll handoff chain. This calls RequestContentRepaint on the outer div. The resulting UpdateSubFrame call does not clear the minimal display port property because we already have a display port (it happens to be minimal). Then PrepareForSetTargetAPZCNotification is called to activate the inner div, it sets a zero margin display port on all ancestors including the outer div, and hence removes the minimal display port property. The outer div ends up with a zero margin display port (but not minimal). This is the normal case. When the test fails these two things happen in reversed order: PrepareForSetTargetAPZCNotification removes the minimal display port property, and then UpdateSubFrame is called, which sets non-zero margins for the display port.

If we are in a situation where all scroll frames are not activated we avoid this entirely because apz doesn't know about the outer div scroll frame, so there is no RequestContentRepaint happening.

Flags: needinfo?(tnikkel)

Possible actions to take:

  1. change SetZeroMarginDisplayPortOnAsyncScrollableAncestors to never clear a minimal display port property
  2. switch zero margin display ports to minimal display ports (I have a patch for this already), it would also fix bug 1691160
  3. divide all RequestContentRepaint calls into those that want to set a displayport and those that don't and send that in the repaint request and have the apz callback helper look at that field (I have a patch for this already)
  4. have UpdateSub/RootFrame in apz callback helper never replace a zero margin display port with a non-zero margin display port.

number 4 feels kind of hacky.
number 3 is appealing because it could potentially eliminate other sources of setting a full display port when we don't want. As we saw in bug 1687927, comment 8 avoiding setting a full display port is a nice and significant perf win.
number 2 I was planning on doing, but wanted to wait until fall out of this and related patches was clear.
number 1 seems like a nice, good, small idea to do, and doesn't preclude us from doing any of the other options or other potential fixes I haven't thought of on this list.

Depends on: 1691186

Bug 1691186 filed for number 1 above with patch.

Pushed by tnikkel@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/10606ca6f128 Add test. r=botond https://hg.mozilla.org/integration/autoland/rev/213de2637a7a Update existing mochitests to handle WR layerizing. r=botond https://hg.mozilla.org/integration/autoland/rev/aafff5935628 Adjust tests that we are touching to look at the activate all scroll frames pref and modify expectations accordingly.
Pushed by tnikkel@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/64ddd35a8fc0 Force layerization for all scrollframes that want it, when WR and fission is enabled. r=botond
Attachment #9193684 - Attachment is obsolete: true
Blocks: 1691878

Filed bug 1691878 to more better test this code in some existing tests, which was mentioned in review comments.

Regressions: 1695957
Regressions: 1724245
Regressions: CVE-2022-36319
Regressions: 1781007
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: