Closed Bug 1645433 Opened 4 years ago Closed 4 years ago

[Fission] Can't interact with out-of-process subframes on some pages

Categories

(Core :: Panning and Zooming, defect, P2)

defect

Tracking

()

RESOLVED FIXED
85 Branch
Fission Milestone M6c
Tracking Status
firefox85 --- fixed

People

(Reporter: u608768, Assigned: botond)

References

(Blocks 1 open bug)

Details

Attachments

(4 files)

Attached file testcase.html (deleted) —

+++ This bug was initially created as a clone of Bug #1620309 +++

STR

  1. Enable apz.allow_zooming and fission.autostart, and restart.
  2. Open the attached test case. Note that the toplevel document has origin https://bug1645433.bmoattachments.org and the subframe has origin https://example.com, and so the two documents are loaded into separate processes.
  3. Attempt to interact with the example.com subframe (highlight text, scroll, etc). You should notice that this doesn't work.
  4. Interact with the toplevel document (highlight text, scroll, etc), and then try to interact with the subframe again. You should notice that this now works as expected.

I can reproduce this on Linux and macOS, and cpeterson was able to reproduce on Windows.

As far as I can tell, this only reproduces when the toplevel content has height: 100%; overflow: auto and the iframe has position: fixed.

Fission Milestone: --- → M6c

Kris, this needs to be fixed before Fx82 soft freeze as Fission will be turned on in Nightly at that time. This should have been blocking desktop-zoom-nightly.

Fission Milestone: M6c → M6b
Flags: needinfo?(ktaeleman)

Based on comment 0 this bug only occurs under very specific conditions (position:fixed iframe), so I think the original milestone of M6c was justified. And regardless, I don't see why this should have been blocking desktop-zoom-nightly since we're shipping desktop zoom before fission will even be turned on in Nightly.

A couple of notes:

  • The test page here appears to have been reduced from chat.mozilla.org (where the original STR involved interacting with a dialog)
  • Based on the Fission mana page, while Fission may not be enabled by default in nightly until 84, an experiment could be shipped to some nightly users in 82.

So, we may want to consider fixing this bug in the 82 cycle to avoid regressing the behaviour of chat.mozilla.org for users who participate in the experiment.

Moving this to tracking in release, but based on dtz release and fission release timelines lower priority than dtz release blockers.

Blocks: desktop-zoom-release
No longer blocks: desktop-zoom-post
Flags: needinfo?(ktaeleman)

Yes, Fission Nightly experiment is targeted for Fx83 so all issues need to be fixed in Fx82.

Kris, please find an assignee for getting this fixed in Fx82.

Flags: needinfo?(ktaeleman)

I can take a look at this.

Assignee: nobody → botond
Flags: needinfo?(ktaeleman)

I can reproduce this in a nightly from the day this was filed, but not in today's nightly -- looks like something has fixed it since then!

Fix range points to bug 1651050. This was one of the changes we made to get existing mochitests passing with apz.allow_zooming=true, so it's certainly plausible that it fixed misbehaviour like this that only happened with apz.allow_zooming=true.

Unfortunately, while bug 1651050 does fix this particular testcase, the mechanism by which it does so is largely incidental. Attached is a slightly modified testcase which still demonstrates the underlying issue.

STR

Same as the original testcase (try scrolling the "Example Domain" iframe right after page load).

Expected results

The iframe contents scroll.

Actual results

The page contents scroll instead. If you scroll the area outside the iframe first, then subsequent scrolling over the iframe succeeds.

Attachment #9173780 - Attachment mime type: text/plain → text/html

Actually, the issue with the modified testcase occurs with desktop zooming disabled, too, so this remaining issue does not need to block desktop-zoom-release.

No longer blocks: desktop-zoom-release
Summary: [Fission] Can't interact with out-of-process subframes when apz.allow_zooming is enabled → [Fission] Can't interact with out-of-process subframes on some pages

Anyways, I debugged the issue with the modified testcase, and here's my diagnosis:

The important ingredient required to trigger the bug is that an inactive scroll frame overlaps the OOP iframe (even if the OOP iframe is on top).

(This explains why desktop zooming initially regressed the original testcase, and why bug 1651050 fixed it. The <html> element in the original testcase is not scrollable, so our "activate the first scroll frame we find" heuristic activated the <div>, thereby avoiding the bug. Desktop zooming initially caused the <html> to be activated even if it's not scrollable (because it's zoomable), so the <div> remained inactive and the bug was triggered. Bug 1651050 then changed the heuristic such that the <html> is only activated if it's actually zoomed, so the <div> is now activated again and the bug is avoided again. My modification makes the <html> scrollable so the <div> is never activated and the bug is always triggered.)

The inactive scroll frame builds an eInactiveScrollframe CompositorHitTest display item here, which triggers the dispatch-to-content codepath if hit. This item has a z-index of INT32_MAX, for seemingly sad reasons. As a result, it creates an invisible layer, with a large DTC region covering the entire <div>, that's on top of the RefLayer for the OOP iframe. Compositor hit testing hits this invisible layer. We take the DTC codepath, which isn't hooked up for cross-process stuff, and so the event is never delivered to the OOP iframe's process.

Interacting with the <div> activates it, thereby avoiding the bug again.

So, bug 1541589 (possibly together with some related work like bug 1542020) would fix this. However:

  1. Fixing bug 1541589 is somewhat involved, and we were not planning on fixing it until M7.
  2. It's rather unfortunate that we need a cross-process dispatch-to-content mechanism to accurately target events in a case like this in the first place. It means sending an IPC message to the <div>'s content process and back, just to determine that the correct target for the event is the OOP iframe's process. This is not great for latency, and there's nothing fundamental about the page structure that should necessitate this extra chatter.

My preference would be to fix this by addressing the display list building issue from bug 1448841 in some other way that does not require the <div>'s inactive scrollframe item to be on top of everything else (and in particular the OOP iframe's Remote item) in the display list. Then compositor hit testing over the OOP iframe would just work correctly without needing any dispatch-to-content.

Kashav, would you be able to re-test the original STR from bug 1620309 with a recent nightly? Based on my investigation so far, I suspect it no longer reproduces, since the remaining issue here is not specific to apz.allow_zooming=true (and so if chat.mozilla.org were affected by the remaining issue, we would have noticed that sooner).

Flags: needinfo?(kmadan)

(In reply to Botond Ballo [:botond] from comment #14)

Kashav, would you be able to re-test the original STR from bug 1620309 with a recent nightly? Based on my investigation so far, I suspect it no longer reproduces, since the remaining issue here is not specific to apz.allow_zooming=true (and so if chat.mozilla.org were affected by the remaining issue, we would have noticed that sooner).

Neither bug 1620309 nor comment #0 reproduce on today's nightly!

Flags: needinfo?(kmadan)

(In reply to Botond Ballo [:botond] from comment #13)

My preference would be to fix this by addressing the display list building issue from bug 1448841 in some other way that does not require the <div>'s inactive scrollframe item to be on top of everything else (and in particular the OOP iframe's Remote item) in the display list. Then compositor hit testing over the OOP iframe would just work correctly without needing any dispatch-to-content.

I discussed this with Matt on chat; looks like it should be doable. I'm recording some tips he gave me for reference:

I think that we can use the max-z-index-in-list logic for full builds, and for partial builds we can lookup the previous hit-test item and include that z-index in the calculation
That should mean that we can only ever move it forward during partial updates, not back (and it'll always be guaranteed to be on top of everything that it needs to)
We also need to manually mark the old item as invalid if that happens, since we've effectively invalidated (by computing a new z-index for it)
If you look at DiscardOldItems, it has code for both iterating the old items for the frame, and for marking them as being invalid (can't be reused)

(In reply to :kashav from comment #15)

Neither bug 1620309 nor comment #0 reproduce on today's nightly!

Botond, if the original issue isn't reproducible, could you please describe what will we be fixing here, so we can re-evaluate if it should block Fission M6b or not?

Flags: needinfo?(botond)

(In reply to Neha Kochar [:neha] from comment #17)

(In reply to :kashav from comment #15)

Neither bug 1620309 nor comment #0 reproduce on today's nightly!

Botond, if the original issue isn't reproducible, could you please describe what will we be fixing here, so we can re-evaluate if it should block Fission M6b or not?

There is an underlying issue where compositor hit testing can target the wrong content process in some cases where an OOP iframe is on top of another scrollable element.

The issue no longer occurs on the original reduced testcase, nor on chat.mozilla.org which was the source of that testcase. However, it occurs on a slightly modified version of the reduced testcase, posted in comment 11.

As this remaining issue has not been reported as affecting a real website thus far, I think it would make sense for it to block M6c or M7 rather than M6b.

Flags: needinfo?(botond)
Blocks: apz-fission

Moving to M6c per comment 18.

Fission Milestone: M6b → M6c

I have a fix based on the approach described in comment 16 that works locally. I'd like to write a test as well before posting for review.

(The fix ix here for reference.)

(In reply to Botond Ballo [:botond] from comment #20)

I have a fix based on the approach described in comment 16 that works locally. I'd like to write a test as well before posting for review.

I cooked up a test since I wrote a very similar one for bug 1675547. It is attached as part of a patch stack that includes other tests, but can be rebased as needed. It does require the patch from bug 1675378 to be applied first though. I also confirmed that the fix in comment 21 doesn't fix the test for bug 1675547

(In reply to Kartikaya Gupta (email:kats@mozilla.staktrace.com) from comment #23)

I cooked up a test since I wrote a very similar one for bug 1675547. [...]

Nice, thanks!

The test that I was working on was taking a slighly different approach (not using Fission, just hit-testing in an in-process scenario and checking that we don't get a dspatch-to-content hit result). I'll go with yours as it tracks the user-visible STR more closely.

Hmm, the test seems to be passing on Linux but failing on Mac: https://treeherder.mozilla.org/jobs?repo=try&revision=4a8ca51e4819de7255b715e0704f0b5eae4c4dd1

Actually, it seems like it's passing on non-WR but failing on WR (the previous push just used mach try auto which seemed to schedule browser-chrome tests for non-WR on Linux but WR on Mac): https://treeherder.mozilla.org/jobs?repo=try&revision=0712747548a987d76270d6ab992381aa81b054eb

That's puzzling. I definitely tested it with and without your patch to make sure it was failing without and passing with. That being said the test only runs with WR so it's not surprising that it passes on non-WR (because it doesn't run).

Here's what I ran locally and passed: https://treeherder.mozilla.org/jobs?repo=try&group_state=expanded&revision=601208be1d727fe5a5ecacfa58fefe5fc49c17a2

(Note that the top patch is the one from comment 25 but I also accidentally squashed a change in there to disable all the other subtests of browser_test_group_fission.js)

If it passes on try then most likely something changed between my base rev and current m-c. If it fails then it's some sort of discrepancy between local runs and try runs.

(In reply to Kartikaya Gupta (email:kats@mozilla.staktrace.com) from comment #29)

So that was green.

Err scratch that. Looks like my base rev was from before WR got enabled on macOS CI, so WR was disabled on that push and the test didn't get run. Sigh.

I ran the rebased test locally and it passes. So there must be some difference between try and my local setup that's causing it to fail on try :(

Attachment #9186049 - Attachment description: Bug 1645433 - Add a test → Bug 1645433 - Add a test. r=botond
Pushed by bballo@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/7e28049666b7
Avoid using INT32_MAX as the z-index of inactive scrollframe items. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/34c5981b4667
Add a test. r=botond

The reftest failures were due to us sometimes computing a z-index of -1 for the inactive scrollframe item. That caused the inactive item to be behind non-positioned items for the scrolled content. Making sure the z-index is at least 0 resolved the issue.

Flags: needinfo?(botond)
Pushed by bballo@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/6846280ac5a0
Avoid using INT32_MAX as the z-index of inactive scrollframe items. r=mattwoodrow
https://hg.mozilla.org/integration/autoland/rev/39e2701db99a
Add a test. r=botond
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 85 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: