We run a lot of duplicated reftests
Categories
(Testing :: Reftest, enhancement, P4)
Tracking
(firefox83 fixed)
Tracking | Status | |
---|---|---|
firefox83 | --- | fixed |
People
(Reporter: MatsPalmgren_bugz, Assigned: jgraham)
References
(Blocks 1 open bug)
Details
Attachments
(9 files, 2 obsolete files)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
text/html
|
Details | |
(deleted),
text/html
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details | |
(deleted),
text/x-phabricator-request
|
Details |
Comment 1•6 years ago
|
||
Comment 2•6 years ago
|
||
Reporter | ||
Comment 3•6 years ago
|
||
Comment 4•6 years ago
|
||
Assignee | ||
Comment 5•6 years ago
|
||
Reporter | ||
Comment 6•6 years ago
|
||
Assignee | ||
Comment 7•6 years ago
|
||
Reporter | ||
Comment 8•6 years ago
|
||
Assignee | ||
Comment 9•6 years ago
|
||
Reporter | ||
Comment 10•6 years ago
|
||
Assignee | ||
Comment 11•6 years ago
|
||
Reporter | ||
Comment 12•6 years ago
|
||
Assignee | ||
Comment 13•6 years ago
|
||
Comment 14•6 years ago
|
||
Assignee | ||
Comment 15•6 years ago
|
||
Reporter | ||
Comment 16•6 years ago
|
||
Comment 17•6 years ago
|
||
Reporter | ||
Comment 18•6 years ago
|
||
Assignee | ||
Comment 19•6 years ago
|
||
Assignee | ||
Comment 20•6 years ago
|
||
Assignee | ||
Updated•6 years ago
|
Comment 21•6 years ago
|
||
Comment 22•6 years ago
|
||
Comment 23•6 years ago
|
||
Reporter | ||
Comment 24•6 years ago
|
||
Assignee | ||
Comment 25•6 years ago
|
||
Reporter | ||
Comment 26•6 years ago
|
||
Assignee | ||
Comment 27•6 years ago
|
||
Assignee | ||
Comment 28•6 years ago
|
||
Reporter | ||
Comment 29•6 years ago
|
||
Comment 30•6 years ago
|
||
Assignee | ||
Comment 31•6 years ago
|
||
Reporter | ||
Comment 32•6 years ago
|
||
Assignee | ||
Comment 33•6 years ago
|
||
Reporter | ||
Comment 34•6 years ago
|
||
Reporter | ||
Comment 35•6 years ago
|
||
Assignee | ||
Comment 36•6 years ago
|
||
Reporter | ||
Comment 37•6 years ago
|
||
Comment 38•6 years ago
|
||
Comment 39•6 years ago
|
||
Comment 40•6 years ago
|
||
Comment 41•6 years ago
|
||
Updated•6 years ago
|
Comment 42•6 years ago
|
||
Reporter | ||
Comment 43•6 years ago
|
||
Reporter | ||
Comment 44•6 years ago
|
||
Assignee | ||
Comment 45•6 years ago
|
||
Reporter | ||
Comment 46•6 years ago
|
||
Assignee | ||
Comment 47•6 years ago
|
||
Reporter | ||
Comment 48•6 years ago
|
||
Assignee | ||
Comment 49•6 years ago
|
||
Reporter | ||
Comment 50•6 years ago
|
||
Assignee | ||
Comment 51•6 years ago
|
||
Assignee | ||
Comment 52•6 years ago
|
||
Comment 53•6 years ago
|
||
Reporter | ||
Comment 54•6 years ago
|
||
Assignee | ||
Comment 55•6 years ago
|
||
All the recorded blockers here are fixed. Are there other issues that people feel should block deduping the reftests?
Reporter | ||
Comment 56•6 years ago
|
||
No, I don't see any blockers. We can fix any remaining tests
that fails when run as WPT but not as a reftest as we go.
Comment 57•6 years ago
|
||
Are we running the WPT reftests on Android now, and if not, should we be blocking on that? Looking at a m-c push, I see Tier 2 Wr jobs on Android 7.0 x86 opt, but no other Android configurations.
Comment 58•6 years ago
|
||
I see the dependent bug 1509983 which enabled them for that configuration, in Tier 2. What prevents those from moving to Tier 1, and should we be running those tests in debug or other Android version configurations?
Assignee | ||
Comment 59•6 years ago
|
||
OK, so the Android configuration has been a little bit in flux, but the situation is clearer now.
We are currently running tests in Fennec as Tier-2 and (just landed, and not yet very green) GeckoView as Tier-3.
The intent is to green up the GeckoView version and then eventually promote that to Tier-1. I expect that Fennec tests will be dropped at some point, although that could be subject to product decisions. After the opt tests are Tier-1 (or at least Tier-2) we can look at debug, but I'm not confident we will be able to run the full suite in that configuration given the limited resources and relatively poor performance. But hopefully I'm wrong.
I don't think there's any plan to run the tests in other android configurations; given the relative performance it's hard to justify running things in ARM Andoid emulators on x86 hardware rather than in x86 Andoid emulators on x86 hardware unless there's a strong justification for doing so. For generic gecko code we've been taking the approach that Android vs !Android is a more significant distinction than ARM vs x86, although there are obviously exceptions for things like the Spidermonkey JIT. If that's something we need to do with (a subset of) wpt, we should discuss that.
Updated•6 years ago
|
Assignee | ||
Comment 60•5 years ago
|
||
Are there any raminaing issues that people feel are blocking this change? We now have Tier 1 support for wpt reftests on the Android emulators.
The confusion caused by the vendor-imports directory just came up talking to people working on other browser engines as a pain point for interop testing layout features.
Assignee | ||
Comment 61•5 years ago
|
||
dbaron: Do you have feedback on whether there are remaining blockers here?
Comment 62•5 years ago
|
||
I'm still a bit worried about whether the developer experience for writing reftests for WPT is on par with that for our original reftest harness.
Here's an example of something that seems unnecessarily difficult with WPT but was easy with the reftest harness (when I did it just a few hours earlier, but I was asked by my reviewers to make the test a WPT instead, probably for good reason):
I'm adding a new failing test for bug 1584018, and I want to run just that test to check that I wrote the manifest correctly (since I'm going to remove that manifest in the next commit when I fix the failure, so nobody else will ever actually run it). So I type ./mach web-platform-tests testing/web-platform/tests/css/css-flexbox/child-percent-basis-resize-1.html
, and instead of the normal process of running the tests given, closing, and printing output... since I ran just one test, the WPT harness decides that I magically wanted to leave the window open. And when I close the one of the two windows involved that has UI... that still doesn't close the other one, so I have to Ctrl+C the process and I don't get the printf'd output at the end that says whether the results were expected or not. Then I vaguely recall having this conversation with jgraham before, and recall that there was perhaps a --close-when-done
argument... but there's nothing that looks like that in ./mach help web-platform-tests
, and --close-when-done
is rejected as an error. So I give up, and just add a second test that I didn't need to run to the command line arguments in order to get the output that I wanted:
web-platform-test
~~~~~~~~~~~~~~~~~
Ran 2 checks (2 tests)
Expected results: 2
Unexpected results: 0
OK
(This was literally my first interaction with WPT since you needinfo'd me... and the fact that I hit something with subpar developer experience seems entirely normal.)
Comment 63•5 years ago
|
||
Oh, I guess the option I wanted would have been --no-pause-after-test
. (I still think the pause-after-test default doesn't make sense for reftests.)
Comment 64•5 years ago
|
||
(In reply to David Baron :dbaron: 🏴 ⌚UTC-7 from comment #63)
Oh, I guess the option I wanted would have been
--no-pause-after-test
. (I still think the pause-after-test default doesn't make sense for reftests.)
That seems reasonable, and hopefully a straightforward change to https://searchfox.org/mozilla-central/rev/2a355e56c490768be676689de18629485c9d699b/testing/web-platform/tests/tools/wptrunner/wptrunner/wptrunner.py#123 to check that test_loader.tests["reftest"]
is the array that's got the one item in it?
Assignee | ||
Comment 65•5 years ago
|
||
I filed and fixed bug 1591349 for the --pause-after-test
defaults to depend on the test type; now only testharness tests will default to pausing when only a single test is run, matching the mochitest behaviour.
It's not clear to me that blocking this change on wpt harness UX is the best way forward. The current situation is already causing problems for gecko developers (per comment 0) and developers of other engines who are trying to improve interop with gecko by using our tests (c.f. comment 60). I would expect that layout developers have to interact with the wpt harness anyway, so the downsides of increasing the number of tests for which that is required seems limited, and there are clear upsides in terms of reducing duplication, making editing the tests easier, and them more accessible to other engines.
That said, I'm of course keen to work with people to improve the wpt harness experience. It does of course have some constraints from the need to work cross-browser, and has some design decisions that are different from the gecko harness, but I think it's made a lot of progress in terms of performance and featureset over the last few years. Where there are still issues with missing features or poor UX for wpt reftests I'd really like to work with layout developers to get them fixed.
Comment 66•5 years ago
|
||
I'm ok with going ahead with this as long as you check that the fail/pass state matches as you deduplicate. If you find cases where tests are failing in one harness and passing in another, those need to be investigated and understood.
Comment 67•5 years ago
|
||
One other question: do the harnesses use different canvas sizes? (e.g., reftest harness uses 800x1000) If so, it might be worth a little bit more checking that the tests are testing what they're supposed to be testing at both sizes.
Comment 68•5 years ago
|
||
(In reply to David Baron :dbaron: 🏴 ⌚UTC-8 from comment #67)
One other question: do the harnesses use different canvas sizes? (e.g., reftest harness uses 800x1000) If so, it might be worth a little bit more checking that the tests are testing what they're supposed to be testing at both sizes.
Yes, they do. WPT uses 800x600. See also https://github.com/web-platform-tests/wpt/issues/7135 about making that configurable.
Assignee | ||
Comment 69•5 years ago
|
||
Status update:
The following tests have failures as wpt reftests (paths relative to testing/web-platform/tests/css/vendor-imports/mozilla/mozilla-central-reftests
:
flexbox/flexbox-align-self-horiz-004.xhtml
- Intermittent on Linux.flexbox/flexbox-mbp-horiz-003-reverse.xhtml
- Intermittent on Windowsflexbox/flexbox-min-height-auto-002b.html
- Failing on Linuxmasking/clip-path-geometryBox-2.html
- Failing in WebRendermasking/mask-image-2.html
,masking/mask-image-3a.html
,masking/mask-image-3b.html
,masking/mask-image-3c.html
,masking/mask-image-3d.html
,masking/mask-image-3e.html
,masking/mask-image-3f.html
,masking/mask-image-3g.html
,masking/mask-image-3h.html
,masking/mask-image-3i.html
- Intermittent timeout on Android opt (Bug 1624089)masking/mask-origin-2.html
- Failing everywhereshapes1/shape-outside-border-box-border-radius-002.html
,shapes1/shape-outside-border-box-border-radius-004.html
,shapes1/shape-outside-margin-box-border-radius-002.html
,shapes1/shape-outside-margin-box-border-radius-004.html
- Failing everywheresizing/vert-block-size-small-or-larger-than-container-with-min-or-max-content-2b.html
- Failing everywheretext3/segment-break-transformation-removable-1.html
,text3/segment-break-transformation-removable-2.html
,text3/segment-break-transformation-removable-3.html
,text3/segment-break-transformation-removable-4.html
- Failing on (linux) 32 bit.
If the intermittent or failing tests are "new" (i.e. last 6 months) they may need fuzzy annotations. We previously added logging for tests with a document larger than the viewport and elminated most/all of those, but it needs to be rechecked given there have been changes here in that time.
Comment 70•4 years ago
|
||
I spoke with :jgraham and :jmaher (separately) about this issue and the approach to stop running duplicate tests (especially so if they are intermittents).
As we know in the current climate of things, cost savings is a top priority. While not everything is about ROI, I think for things I'm involved in there's a heavy skew towards finding ways to cut costs, improve efficiency and ship a better product.
Intermittents cause sheriffs a lot of workload as they must annotate every Tier-1 and Tier-2 failures on autoland and mozilla-central.
If these tests are duplicated across Reftest and WPT(reftest), and it intermittently times out on one or the other, then I see two problems:
- we are causing additional workload to sheriffs for little to no gain, impacting their work.
- we are incurring additional costs in the CI system when duplicated tests are run as they add overhead, execution time and artifact storage space costs.
The duplicated tests are moderate in number and won't be a thousand-dollar impact immediately, but this is also probably one of the lower hanging fruits that we can pick off and not adversely impact the quality of tests or the shipped product in any way.
With the above stated, I am proposing that we either put the WPT(reftest) variant to a backlog status. Alternatively, mainline the WPT(reftest) variant leading to the Reftest variant being dropped.
Comment 71•4 years ago
|
||
I know there are a lot of duplicated tests between reftest and wpt-reftest, but wpt-reftest does have many unique reftests- I am not sure we want to put them all in backlog or turn them off.
Assignee | ||
Comment 72•4 years ago
|
||
I assume the proposal was just for these specific reftests, not all wpt reftests.
In any case I think that backlogging the wpt versions is the wrong solution here. Moving more things onto the wpt reftest harness and away from vendor-specific reftests is important for Gecko to maintain web compatibility. Note that a lot of the issues we have already fixed related to this bug were basically the tests testing the wrong thing in the cross-browser case (because of different viewport sizes, or incorrect fuzziness), meaning they weren't providing useful coverage for Blink or WebKit. Fixing those issues is good and doesn't happen if we continue to view internal reftests as the default and cross-browser reftests as the edge case.
AFIAK comment #69 identifies all the remaning tests that don't behave identically in the two harnesses. With the exception of the masking tests where there's a real issue on Android it's most likely that the tests are just requiring some additional fuzzy annotations or the platform specific failures aren't problematic. In any case, with the expection of those named tests I'd hope it's uncontroversial to remove the reftest harness copies of the remaining tests.
Comment 73•4 years ago
|
||
that is 24 tests- What would be the work required to fix these or the risk of not running these properly?
Do we support all modes in wpt-reftest (webrender, fission, linux/windows/osx/android, gpu, no-accel) ?
Comment 74•4 years ago
|
||
shapes1/shape-outside-border-box-border-radius-002.html, shapes1/shape-outside-border-box-border-radius-004.html, shapes1/shape-outside-margin-box-border-radius-002.html, shapes1/shape-outside-margin-box-border-radius-004.html - Failing everywhere
FWIW, these are already failing under our own reftest framework annotated in this reftest.list. I filed bug 1652112 to merge shapes tests into wpt to move this bug a bit forward.
Assignee | ||
Comment 75•4 years ago
|
||
that is 24 tests- What would be the work required to fix these or the risk of not running these properly?
Apart from the masking tests I suspect it's only verifying that these tests have the same behaviour as the reftest harness, or modifying something like a fuzzy annotation i.e. almost no work. For the masking tests there's a suspected bug somewhere where on android we can end up waiting for a layout that never happens or something. That's going to require a bit more work.
Do we support all modes in wpt-reftest (webrender, fission, linux/windows/osx/android, gpu, no-accel) ?
I think we never enabled the no-accel/gpu configurations? I don't know of any reason we couldn't do that though. But note that's largely an orthogonal issue to this one; if having full coverage on those configurations is important we already have a problem because we already have things that are only covered with wpt reftests (and enabling those is probably going to dwarf whatever cost savings we make here).
Assignee | ||
Comment 76•4 years ago
|
||
Updated•4 years ago
|
Assignee | ||
Comment 77•4 years ago
|
||
I couldn't reproduce this bug with a try push containing 20
retriggers, so I think it's safe to remove the annotations. If this
comes up again as an intermittent, we can revert this patch and
investigate further.
Assignee | ||
Comment 78•4 years ago
|
||
These tests are duplicated across the reftest harness and the wpt
reftest harness. This has a number of problems:
-
We run every test twice, taking unnecessary time and resources.
-
A bespoke sync process is required for this directory.
-
We often get metadata out of sync between the two copies, so they
seem to have different results. -
Other vendors often don't realise these tests exist, so they're less
useful for interop. -
When others do realise the test exist, they don't feel able to fix
issues in them because of the complex sync.
This patch removes the reftest harness copy of the tests. It seems
like the wpt reftest harness is sufficiently aligned with the reftest
harness that this shouldn't lose much in the way of coverage (and any
remaining differences are obviously a problem for all other wpt
reftests as well so should be fixed in general if there's a problem).
A notable remaining difference is that many object-fit tests are using
image-rendering:-moz-crisp-edges. The export script changed this to
image-rendering:pixelated, whih is not actually supported by gecko.
Assignee | ||
Comment 79•4 years ago
|
||
This better matches the original mozilla-central copies of these tests
Comment 80•4 years ago
|
||
I just notice we have some reftest-paged
reftests under layout/reftests/w3c-css/submitted
. https://searchfox.org/mozilla-central/search?q=reftest-paged&path=layout%2Freftests%2Fw3c-css%2Fsubmitted&case=true®exp=false
They are not running as wpt print reftest on wpt. I discover this because my patch breaks layout/reftests/w3c-css/submitted/break3/moz-block-fragmentation-001.html
on my try run, but it doesn't fail in wpt jobs.
Could you convert them to proper wpt print reftest, please? Bug 1645272 contains some examples of our previous attempt.
Assignee | ||
Comment 82•4 years ago
|
||
Assignee | ||
Comment 83•4 years ago
|
||
Comment 84•4 years ago
|
||
Comment 85•4 years ago
|
||
Comment 86•4 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/b6502643bedc
https://hg.mozilla.org/mozilla-central/rev/3cb57a090de2
https://hg.mozilla.org/mozilla-central/rev/f11d38d8ed40
https://hg.mozilla.org/mozilla-central/rev/1a10423235ce
https://hg.mozilla.org/mozilla-central/rev/25436958000f
https://hg.mozilla.org/mozilla-central/rev/69ffd9d94e53
https://hg.mozilla.org/mozilla-central/rev/af2cd2d1ca4e
Description
•