Closed Bug 1621740 Opened 5 years ago Closed 4 years ago

Get all Tier 1 tests passing with apz.allow_zooming=true

Categories

(Core :: Panning and Zooming, task, P3)

task

Tracking

()

RESOLVED FIXED
81 Branch
Tracking Status
firefox81 --- fixed

People

(Reporter: botond, Assigned: kats)

References

Details

(Whiteboard: apz-planning)

Attachments

(2 files)

Before we can land a patch to flip apz.allow_zooming=true on the Nightly channel, we need to get all Tier 1 tests passing with that pref flipped. This bug tracks.

Depends on: 1556556
Priority: -- → P3
Blocks: 1623473
Whiteboard: desktop-zoom-nightly
Whiteboard: desktop-zoom-nightly → apz-planning

To get a feel for the scope of the work here, this Try push with the pref flipped shows 38 failing test jobs on Linux debug.

A couple of notes:

  • I didn't vet the failures for existing intermittents
  • It's likely that large subsets of the failures share the same few root causes (but there's also likely to be a tail of issues causing individual failures)

Also, a piece of good news here is that, unlike the Android test failures in bug 1556556, Linux test failures can be debugged using rr / pernosco.

I'm looking into the assert in accessible/tests/mochitest/hittest/test_browser.html

Depends on: 1636380

Fresh push

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f80c461063f5ab4b164c1e7d8ee6e4e2e0d02402

A couple of new asserts hitting, haven't looked at the differences in detail.

Depends on: 1640381

Going through the failures there are a copy of common threads:
-crashtests that already asserts during reflow get a few more reflow asserts (we can annotate but if we do extra reflow for every page load we might want to avoid that)
-any reftest using the reftest-async-scroll attribute fails, looks like the aync scroll doesn't get applied
-some mochitests fail what seem like some basic geometric tests, and hence the fix for those could fix a lot of other mochitests

TEST-UNEXPECTED-FAIL | dom/tests/mochitest/dom-level0/test_innerWidthHeight_script.html | CSS viewport width is meta viewport width - got 308, expected 320
TEST-UNEXPECTED-FAIL | dom/tests/mochitest/dom-level0/test_innerWidthHeight_script.html | CSS viewport height is meta viewport height - got 308, expected 320

TEST-UNEXPECTED-FAIL | accessible/tests/browser/bounds/browser_test_resolution.js | Wrong scaled width of [DOM node id: p1, role: paragraph, address: 0x7fc9af6c2dc0] - Got 2528, expected 2504 with error of 2 -

-some devtools tests hit

Assertion failure: !mUsingAsyncZoomContainer || !haveRootContentOutsideAsyncZoomContainer (If there is an async zoom container, all scroll nodes with root content scroll metadata should be inside it), at /builds/worker/checkouts/gecko/gfx/layers/apz/src/APZCTreeManager.cpp:566

And then there are a bunch of mochitest failures, but I think we should look into the two failing mochitests above first (or something similar).

(In reply to Timothy Nikkel (:tnikkel) from comment #6)

Assertion failure: !mUsingAsyncZoomContainer || !haveRootContentOutsideAsyncZoomContainer (If there is an async zoom container, all scroll nodes with root content scroll metadata should be inside it), at /builds/worker/checkouts/gecko/gfx/layers/apz/src/APZCTreeManager.cpp:566

There is an STR on file that causes this at bug 1610657, could have the same underlying cause.

I think it might be a good idea to turn apz.allow_zooming on by default in tests, and explicitly disable the pref in tests that are failing. That will ensure that (a) any new tests that are added don't set us back by adding new failures, because they will be backed out on landing and (b) As fix the test failures and re-enable the pref on those individual tests, we can be sure we're not regressing other tests.

So my proposal:
a) Land patch that adds apz.allow_zooming=true to the testing profile and turns it back off for individual tests that are failing with it.
b) Work through the tests that have it off, re-enabling them one at a time
c) Once all the tests are good, or we deem that that remaining tests don't need zooming enabled, we can turn on the pref in nightly.
d) Once the pref enablement is riding the trains we can remove it from the testing profile as it won't be needed there any more.

Thoughts?

(In reply to Timothy Nikkel (:tnikkel) from comment #6)

-some mochitests fail what seem like some basic geometric tests, and hence the fix for those could fix a lot of other mochitests

TEST-UNEXPECTED-FAIL | dom/tests/mochitest/dom-level0/test_innerWidthHeight_script.html | CSS viewport width is meta viewport width - got 308, expected 320
TEST-UNEXPECTED-FAIL | dom/tests/mochitest/dom-level0/test_innerWidthHeight_script.html | CSS viewport height is meta viewport height - got 308, expected 320

Similar to this, there's e.g. this wpt failure:

TEST-UNEXPECTED-FAIL | /css/css-position/position-fixed-at-bottom-right-on-viewport.html | position:fixed - assert_equals: expected "0px" but got "-12px"

It looks very much like the scrollbars are impacting layout properties when they shouldn't be.

Btw here's a new try push I did earlier today: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=eefa7d752132f78d57351de88f7bdbfc99fe5bfc

A lot of the failures are due to this scrollbar-impacting-layout-properties problem, so we should probably tackle that first. Should be easily reproducible/debuggable with rr but I don't have a good mental model of the relevant layout code. Still, I can take a whack at it later this week if nobody beats me to it.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #10)

Btw here's a new try push I did earlier today: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=eefa7d752132f78d57351de88f7bdbfc99fe5bfc

Looks like crashtests and reftests got fixed. Would be interesting to know what fixed them.

Your plan on enabling the pref for tests sounds reasonable to me.

(In reply to Timothy Nikkel (:tnikkel) from comment #11)

Looks like crashtests and reftests got fixed. Would be interesting to know what fixed them.

Actually, I'm not sure, but I think the way you enabled zooming in testing profiles does not affect reftests and crashtests, so they are probably still broken.

Depends on: 1642869

(In reply to Timothy Nikkel (:tnikkel) from comment #12)

(In reply to Timothy Nikkel (:tnikkel) from comment #11)

Looks like crashtests and reftests got fixed. Would be interesting to know what fixed them.

Actually, I'm not sure, but I think the way you enabled zooming in testing profiles does not affect reftests and crashtests, so they are probably still broken.

Yeah, I think testing/profiles/reftest/user.js is used for reftest/crashtests

dom/tests/mochitest/dom-level0/test_innerWidthHeight_script.html specifies a metaviewport width and height of 320 and enables meta viewport pref, which we don't support with layout scrollbars. Scaling a 320x320 document into a window that is wider than it is long correctly gives us a vertical scrollbar, and the (layout) scrollbar correctly decreases the clientWidth. We should just disable the test when we have layout scrollbars.

/css/css-position/position-fixed-at-bottom-right-on-viewport.html the fixed pos item and scrollbars render the same with and without the allow zooming pref, so the difference must be in the getComputedStyle somehow.

(In reply to Timothy Nikkel (:tnikkel) from comment #14)

dom/tests/mochitest/dom-level0/test_innerWidthHeight_script.html specifies a metaviewport width and height of 320 and enables meta viewport pref, which we don't support with layout scrollbars. Scaling a 320x320 document into a window that is wider than it is long correctly gives us a vertical scrollbar, and the (layout) scrollbar correctly decreases the clientWidth. We should just disable the test when we have layout scrollbars.

Alternatively, we could force overlay scrollbars for these tests, right? However when I run this particular test with --setpref apz.allow_zooming=true --setpref ui.useOverlayScrollbars=true it still fails. I'm going to dig into this a bit more, I'd rather not disable tests if there's a more targeted solution we can apply.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #15)

--setpref ui.useOverlayScrollbars=true

Doh, it's an int pref, so it needs to be =1. With that the test passes. It's also a pref that doesn't require a restart, which is good, because we can just set it on individual tests that are enabling meta-viewport.

Notes as I go through the tests:

  • test_sizetocontent_clamp.html - not obvious what the problem here is. meta-viewport pref is not enabled but the innerWidth is modified
  • test_innerWidthHeight_script.html - setting ui.useOverlayScrollbars=1 here seems appropriate and fixes the test
  • test_bug982141.html - this requires the root APZC to not be scrollable and I guess making it zoomable changes that. So setting apz.allow_zooming=false seems like the right fix here
  • test_frame_reconstruction.html - not obvious, needs investigation
  • start-edge-in-block-layout-direction.html - not obvious, needs investigation
  • test_group_hittest.html subtests - not obvious, needs investigation. These seem like simple tests, might be good to investigate first. Both failing subtests seem like the same problem
  • test_group_wheelevents.html - hangs, not sure why. if hit-testing is affected (as implied by test_group_hittest failures) that might explain the hang, because wheel events might be going to the wrong thing
  • test_relative_update.html - needs investigation. this one smells like a potential code bug. also this one can't be fixed by setting apz.allow_zooming=false inside the test itself, because it's a single-page test and by the time we set the pref it's too late.
  • test_bug574663.html - probably also hit-test related

For the ones where I'm disabling apz.allow_zooming I'm a little concerned that it might break the test or make it invalid on Android where this pref is normally true always. Some of the tests don't run on Android but will need to check the rest.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #17)

  • test_bug982141.html - this requires the root APZC to not be scrollable and I guess making it zoomable changes that. So setting apz.allow_zooming=false seems like the right fix here

Maybe this?
https://searchfox.org/mozilla-central/rev/7cadba1d8b8feaec4615f5bb98aac4b8a719793c/layout/generic/nsGfxScrollFrame.cpp#1408

Seems plausible, yeah. Not sure how it works on Android though.

Also here's my WIP so far if you feel like stealing and building on it. https://github.com/staktrace/gecko-dev/commit/52762e0eac4aeb6b33116c2d1619e35effcb43fd

For now I'm focusing on the test_relative_update failure, in bug 1643042.

Assigning to myself as I'm looking into getting the tests passing as much as possible.

Assignee: nobody → kats

I'm starting to suspect that many of the test failures are caused by the fact that turning on the pref also causes the MVM to get activated, via this condition. This seems fundamentally wrong to me - although the MVM might be doing some quasi-important things with relation to resolution, the CSS viewport and such shouldn't be affected by flipping the pref.

Disabling the MVM fixes at least test_sizetocontent_clamp.html and I suspect will fix a number of other bugs too.

(In reply to Timothy Nikkel (:tnikkel) from comment #18)

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #17)

  • test_bug982141.html - this requires the root APZC to not be scrollable and I guess making it zoomable changes that. So setting apz.allow_zooming=false seems like the right fix here

Maybe this?
https://searchfox.org/mozilla-central/rev/7cadba1d8b8feaec4615f5bb98aac4b8a719793c/layout/generic/nsGfxScrollFrame.cpp#1408

So yes, the root scrollframe now has mZoomable = true, which has quite a large set of changes downstream. For one thing, it takes the displayport that we set here which I think is kind of undesirable.

Consider the case (e.g. in test_bug982141.html) where the root scrollframe is not actually scrollable (due to overflow:hidden) but it is zoomable. In this case, I think we definitely want to be setting a displayport on whatever the first actually-scrollable scrollframe is, because that's likely to be the "primary" scroller that the user will use. We might want to also set a displayport on the root (zoomable) scrollframe but it might be better to do that lazily, when the user actually zooms it.

I expect that (a) zooming is going to happen much less frequently than scrolling and (b) the first zoom action will be a zoom in, which can be done in the compositor without any displayport as long as stuff is layerized properly. The main benefit of a displayport on a non-scrollable root frame is that when you're zoomed in, you can clip what you're painting to be less than the entire thing to save memory, while still keeping some amount of margin to scroll around the zoomed page without checkerboarding right away. And for that we don't need the displayport to be installed right away when the page loads, we can do it after the user zooms in.

With that in mind it might be good to remove the mZoomable condition from WantAsyncScroll and move it to some subset of the use sites of WantAsyncScroll.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #23)

With that in mind it might be good to remove the mZoomable condition from WantAsyncScroll and move it to some subset of the use sites of WantAsyncScroll.

Sounds reasonable to me. Or we could even extend the auto setting of a displayport to handle setting a displayport on the first zoomable and the first scrollable scrollframe it finds (if those are different).

Recentest push: https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=4c67347df8088cf7ba2703c11d7085461cd703ad

I started looking at the first reftest failure. Problem seems to be that the way the test's async scroll offset is applied, it gets picked up as part of the visual async transform which ends up on the zoom container layer instead of the contents where we actually want that transform. (In particular, the bg-fixed layer is a descendant of the zoom container and erroneously gets the scroll transform). I think in general when we have reftest-async-scroll properties we want them to get applied to both the layout and visual viewports on the APZ side. We will also want to add new attributes that apply to just the visual viewport, for future reftests that exercise scrolling the visual viewport around inside the layout viewport. Although I'm not sure what the current behaviour is with both reftest-async-zoom and reftest-async-scroll properties. Maybe there it's supposed to apply only to the visual viewport.

Ah the problem is more subtle than I thought. The attributes do move the visual viewport and the layout viewport is supposed to follow to keep the visual viewport contained, with this call. But that doesn't happen because the scrollable rect is short, because the test page has overflow:hidden.

^ That try push is quite green. Just two reftests that need fuzzing (one could argue that the difference on partial-prerender-expansion-rotate.html is actually a failure, but considering it's happening only on unaccelerated Windows I think the cost/benefit tradeoff to investigating is not worth it).

Now that all the deps are landed we could in theory turn on apz.allow_zooming on reftests/mochitests/wpt (there's just a few test expectations that need tweaking along with this change). We had decided to hold off on it for a bit to avoid the scenario where changes break behaviour with zooming disabled, but don't get caught by tests because they're running with zooming enabled. Of course, this means that changes might break behaviour with allow_zooming enabled. I'll do try pushes periodically to try and catch any such breakages.

So it looks like now that we have apz.allow_zooming in the Features.toml file we can't just flip the pref from StaticPrefList.yaml, it has to be false and Features.toml becomes the single point of truth. If we set it to true elsewhere there's a test failure. Except of course the test doesn't run on Android, so I have no idea what it's doing there now.. or if Features.toml is even used on Android. Sigh.

:mythmon, as the one who added the feature-gate code, can you clarify if the library is enabled on Android? There's a bunch of stuff like this which makes me think it is not, but then the documentation explicitly lists android as a valid condition. What happens when a pref is set to true on Android but then the Feature based on that pref leaves default-value as false?

Also, does the FeatureGate code always clobber the default value of the pref as set in StaticPrefList.yaml? i.e. if it set to true in StaticPrefList.yaml but the default-value for those conditions is false, will the FeatureGate code set a user pref to make it false? And finally, at what point during startup does the FeatureGate code do its pref-fiddling? I know in the past we've had issues with Normandy or Shield or one of the other "deploy an experiment to users" tools where it wasn't setting the pref early enough. I realize that the FeatureGate code allows you to specify a feature as restart-required, but how does that work with automated testing? i.e. if the pref is false in StaticPrefList.yaml, but the default-value of the Feature is true, and it requires a restart, will testing in automation run with the feature enabled or disabled?

Flags: needinfo?(mcooper)

Do we also need to consider tests that enable apz.allow_zooming only for themselves?

can you clarify if the library is enabled on Android?

I believe it is, based on the build system. The line you linked to in Troubleshoot.jsm isn't one that I worked on, so I can't comment on why it doesn't run on Android.

Feature gates does not do any pref fiddling at all. The default-value field is primarily used for when the value of a feature gate is read directly and the backing pref doesn't have a value.

I realize that the FeatureGate code allows you to specify a feature as restart-required, but how does that work with automated testing?

The restart-required field doesn't enforce any particular behavior. As far as I know the only user of the field is the about:preferences UI, which prompts the user to restart after changing a field. It doesn't do anything in automated tests.

I know in the past we've had issues with Normandy or Shield or one of the other "deploy an experiment to users" tools where it wasn't setting the pref early enough.

Feature Gates is not a "deploy an experiment to users" or really a remote change tool at all. It's a way to add metadata and move towards a more featureful version of prefs, but it doesn't actually make any changes to pref values itself. It's mostly a UI think right now.

I'm of the opinion that you should be able to set the default value of the feature gate to true (at least on Android), and that the test failure you saw in comment 31 is not a thing we should be testing. I think we should just remove that test, but again, I didn't work on the troubleshoot usage of feature gates, so I'm not sure of the implications there.

There has also been some talk of making the default-value field optional, but there are some challenges there, since the preferences system is complicated.

Flags: needinfo?(mcooper)

Ah, perfect, thank you for that explanation!

(In reply to Timothy Nikkel (:tnikkel) from comment #33)

Do we also need to consider tests that enable apz.allow_zooming only for themselves?

Those tests should continue to work as before. If apz.allow_zooming is set to true globally their attempt to set the pref should be a no-op.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #38)

(In reply to Timothy Nikkel (:tnikkel) from comment #33)

Do we also need to consider tests that enable apz.allow_zooming only for themselves?

Those tests should continue to work as before. If apz.allow_zooming is set to true globally their attempt to set the pref should be a no-op.

And when apz.allow_zooming is set to false globally the tests can still enable it for one test?

(In reply to Timothy Nikkel (:tnikkel) from comment #39)

And when apz.allow_zooming is set to false globally the tests can still enable it for one test?

Yup.

Pushed by kgupta@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/a816301284e6 Update wpt expectations for a couple of unexpected-passes. r=botond https://hg.mozilla.org/integration/autoland/rev/bb406c0968c5 Fuzz a reftest that shows some differences with allow_zooming. r=botond
Flags: needinfo?(kats)

I should have realized this before but these patches will cause failures on the central-as-beta simulation pushes. And also if we actually do a merge without desktop zooming riding the trains. So I'll update the patches to allow both passing and failing on these tests.

Pushed by kgupta@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/044bc5a155d8 Update wpt expectations for a couple of unexpected-passes. r=botond https://hg.mozilla.org/integration/autoland/rev/e8faabdbc607 Fuzz a reftest that shows some differences with allow_zooming. r=botond
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED
Target Milestone: --- → 81 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: