Reduce costs of running web-platform-tests scheduling by not running tests which are guaranteed to fail
Categories
(Testing :: web-platform-tests, enhancement)
Tracking
(firefox76 fixed)
| | Tracking | Status |
|---|---|---|
| firefox76 | --- | fixed |
People
(Reporter: bc, Assigned: jgraham)
References
Details
(Whiteboard: [ci-costs-2020:done], [wptsync upstream])
Attachments
(3 files, 1 obsolete file)
web-platform-tests attempt to run every test in the manifests even if there is no chance that the test will pass. This results in about 20% of the time and money spent on running web platform tests being wasted to at least some degree.
Bug 1594796 Comment 3 mentions some drawbacks to simply turning off these tests. Coming up with a strategy that fits with the needs of the web platform tests and works for all branches is an open question.
Assignee
Comment 1•5 years ago
"Guaranteed to fail" doesn't accurately characterise any suggested approach. Tests that are expected to fail may in fact pass (or error, timeout, or crash) due to code changes in Gecko, in the test, or in the harness. Tests for features we don't support may pass if they happen to be testing things that are true in the absence of any implementation (for example a test that we don't implement some historical variant of an API).
Reporter
Comment 2•5 years ago
Thank you for correcting my misunderstanding. I am sure that whoever takes on this task will benefit.
Assignee
Comment 3•5 years ago
I appreciate that was quite a pedantic correction, but I want to be sure that we have a common — and correct — understanding of the current situation to base decisions on.
Reporter
Comment 4•5 years ago
This patch adds a --skip-status argument to web platform tests which
can be used to skip tests with the specified expected status. More
than one status can be specified by including additional --skip-status
arguments.
A corresponding --wpt-skip-status argument with the same usage is also
added to the mach try fuzzy command which can be used to set the
values for try runs.
A special value of NONE is used to clear an existing list of expected
statuses to be skipped. When a value of NONE is detected in the list
of skip statuses, all entries in the list up to and including the NONE
will be discarded leaving the subsequent values in the list.
NONE is intended to be used in try runs in order to reset the skip
list so that any skip statuses specified either in
taskcluster/ci/test/web-platform.yml or in
taskcluster/taskgraph/transforms/tests.py can be overridden by the
user.
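The NONE reset described here is just a left-to-right scan over the accumulated argument list. A minimal sketch of that behaviour (hypothetical helper name; not the actual patch code):

```python
def apply_none_reset(statuses):
    """Drop every entry up to and including each "NONE" marker, keeping
    only the values that follow it (sketch of the behaviour described
    above; the real patch may differ)."""
    result = []
    for status in statuses:
        if status == "NONE":
            result = []  # reset: discard everything collected so far
        else:
            result.append(status)
    return result
```

So a try push that appends --wpt-skip-status NONE after the in-tree defaults ends up skipping nothing.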
The current version of this patch uses
taskcluster/taskgraph/transforms/tests.py to set
--skip-status=FAIL --skip-status=TIMEOUT --skip-status=ERROR
for the autoland and try repositories.
- Default try run, skipping tests with expected statuses FAIL, TIMEOUT, and ERROR as set by the test transform:
  ./mach try fuzzy --query web-platform-tests
- Try run not skipping any tests:
  ./mach try fuzzy --query web-platform-tests --wpt-skip-status NONE
Updated•5 years ago
Reporter
Comment 5•5 years ago
I think this approach, where try is run by default with the FAIL, TIMEOUT and ERROR statuses skipped, is not workable. If we did this as outlined in this patch then developers might submit patches which passed on try and autoland but then showed failures when run on mozilla-central. I think for this reason we should not use this approach.
Instead I think we should set the skip status values only for autoland in the transform and allow mozilla-central and try to run the full set of tests. This would not result in the maximum possible savings but would save about 20% on autoland and would allow the tests to be run completely on mozilla-central, thereby skirting any problems with wpt sync.
jgraham, ahal: I'm not looking for a full review yet but wanted to get your feedback to see if this approach is acceptable. I expect that the final version will not set the skip status values for try.
Assignee
Comment 6•5 years ago
The plan of record at the moment is to start by running tests for features we aren't implementing less frequently (or not at all), instead of using the expected status to determine when to run things. There's an action on me to start gathering the data on which tests this applies to by talking to product and engineering.
Reporter
Comment 7•5 years ago
Looking at bug 1572820 comment 7, we've got some of the plumbing in place with this patch.
--skip-implementation-status=not-implementing,backlog corresponds to the --skip-status/--wpt-skip-status, though it uses a comma-delimited list, which does seem better than the array value I used here.
If we are going to be implementing something similar for the other frameworks, then instead of having distinct wpt only skip argument for try, we could use the same argument name for try and for the runners for wpt and the other frameworks.
The other area to consider is how to specify these arguments for the different repositories. I couldn't figure out how to do this in taskcluster/ci/test/web-platform.yml and had to resort to using the transform in taskcluster/taskgraph/transforms/tests.py. Does anyone have any suggestions on how to do this using the yaml file instead? The existing extra-options made life difficult when trying to shoe-horn in additional extra-option processing based on by-project.
Also, if we do specify project/repo based extra-options for the mozharness command in the tree especially for try how would we allow the user to override them? Is the use of NONE as a signal to clear preceding values acceptable?
Comment 8•5 years ago
Yeah, we should definitely make sure that we run everything on try (i.e, keep the behaviour on try / central the same, whatever we end up doing). Enabling this on autoland-only seems like the right approach!
Reporter
Comment 9•5 years ago
Work in progress:
The following test suites are supported:
- marionette
- mochitests
- web-platform-tests
- xpcshell
By default try does not have --skip-implementation-status set. Only autoland has the command line argument set in the test transform.
For demonstration purposes, I used not-implemented,backlog as possible values for implementation-statuses. The appropriate values will need to be determined.
This does not support the "-if" type tags. They appear to be best suited for the "built-in" filters and not the "Instance" filters I used. "built-in" seems to be geared towards fixed tag names and adding the conditional to the "Instance" filters would require plumbing changes to the expression parsing.
- ./mach try fuzzy --query 'xpcshell | mochitest | marionette | web-platform-tests' --skip-implementation-statuses not-implemented,backlog
  Looks ok and does skip the appropriate tests.
- ./mach try fuzzy --query 'reftest | crashtest | jsreftest -1$' --skip-implementation-statuses not-implemented,backlog --artifact
  TODO: Though only the "supported" frameworks have --skip-implementation-statuses set in taskcluster/taskgraph/transforms/tests.py setup_skip_implementation_statuses, the reftests were still called with the unsupported command line argument. I must be missing some point here.
Feedback requested.
Comment 10•5 years ago
Thanks for the details, :bc.
I would like to support the -if syntax, primarily for cases where we have 'windows installer tests' that will be not_supported on android, linux, and osx. That would also cover cases where we cannot realistically support a test running in ccov mode or some other configuration.
2 other things to consider:
- change to --skip-statuses (small semantic, but allows for more flexibility)
- ensure we can support a 'never run' and a 'periodic run' vs default. I think the plan here would do that as we could do:
- 'never': --skip-statuses default, highly_intermittent, long_runtime
- 'periodic': --skip-statuses default, not_supported, backlog
- default: --skip-statuses highly_intermittent, long_runtime, not_supported, backlog
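The three-mode scheme above could be sketched as a simple mapping from run mode to --skip-statuses values (names are taken from the comment and are illustrative only; this is not actual taskgraph configuration):

```python
# Illustrative mapping of the proposed run modes to skip lists.
SKIP_STATUSES_BY_MODE = {
    "never": ["default", "highly_intermittent", "long_runtime"],
    "periodic": ["default", "not_supported", "backlog"],
    "default": ["highly_intermittent", "long_runtime",
                "not_supported", "backlog"],
}

def skip_statuses_args(mode):
    # Render one mode's skip list as a command line argument
    return ["--skip-statuses=" + ",".join(SKIP_STATUSES_BY_MODE[mode])]
```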
Assignee
Comment 11•5 years ago
(more comments on the rest of this later)
--skip-statuses is confusing because it sounds more like result statuses. If you want a better name that isn't tied to implementation status, call it --skip-flags or something.
Reporter
Comment 12•5 years ago
I've fixed a minor bug with the wpt support and confirmed that wpt's conditional manifest properties do work so we have that at least. I'll continue looking into the manifestparser filters with conditional properties.
Comment 13•5 years ago
(In reply to Bob Clary [:bc:] from comment #9)
For demonstration purposes, I used not-implemented,backlog as possible values for implementation-statuses. The appropriate values will need to be determined.
What would "backlog" mean in this case?
This does not support the "-if" type tags. They appear to be best suited for the "built-in" filters and not the "Instance" filters I used. "built-in" seems to be geared towards fixed tag names and adding the conditional to the "Instance" filters would require plumbing changes to the expression parsing.
Could you expand what you mean here? Why would you need to "support" the skip-if filter from what looks like a completely separate filter? The manifest filters are supposed to act like a chain that modifies the test objects as they go (they're kind of like early versions of the transforms system we have in taskgraph). Consumers (aka test harnesses) then add whichever filters they want prior to calling active_tests. So for example, the mochitest harness might decide to add both the skip_if filter as well as the skip_implementation_status filter (or maybe it just adds one or the other). Unless you are explicitly removing the skip_if filter, I don't understand why skip-if wouldn't be working.
Feedback requested.
What if we piggy-back off of the tagging system and implement --skip-tag? That way we can simply add a tag for whatever crazy purposes we can come up with (one of which being an implementation status). So e.g., in the manifest:
[test_foo.js]
tags = devtools not-implemented
[test_bar.js]
tags = long-runtime backlog
Then run with:
./mach mochitest --skip-tag not-implemented --skip-tag long-runtime
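The --skip-tag behaviour proposed here could be sketched as a manifestparser-style filter, assuming the usual (tests, values) filter convention; the class name and test dicts below are hypothetical:

```python
# Sketch of a "skip by tag" manifest filter: drop any test whose tags
# intersect the set of tags passed on the command line, e.g.
# --skip-tag not-implemented --skip-tag long-runtime.
class skip_tags:
    def __init__(self, tags):
        self.tags = set(tags)

    def __call__(self, tests, values):
        for test in tests:
            # manifestparser stores tags as a space-delimited string
            test_tags = set(test.get("tags", "").split())
            if not test_tags & self.tags:
                yield test
```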
p.s. would appreciate it if you could push to Phabricator for feedback, as the hg.m.o diffs aren't that great. It also gives us a place to leave comments on the code.
Comment 14•5 years ago
Alternatively, we could modify the existing --tag attribute to pass its value through manifestparser's expression language. So you could do things like:
./mach mochitest --tag "!not-implemented && !long-runtime"
This is less user friendly, but potentially more powerful. Ftr, I think I prefer --skip-tag.
Comment 15•5 years ago
Could we support tag-if? I am thinking of the windows-installer test cases that are currently skipped on !win; maybe they'd have a win_only tag.
Comment 16•5 years ago
Then we would need logic in the taskgraph to pass in --skip-tag win_only on non-Windows tasks.. which seems much harder than using skip-if = os == "win" :). I'd push back against tag-if unless there was a really good reason for it.
Edit: Come to think of it.. why would win_only be a conditional tag? It would need to be a constant tag anyway.
Comment 17•5 years ago
As mentioned in comment 10, there are a lot of tests that are skipped because of implementation details; that is confusing the data when trying to figure out what is skipped for other reasons (frequent/perma fail, etc.).
There is a good chance some wpt tests will be implemented on desktop but not yet on android, therefore we need some method to say this is not supported or not yet implemented on a specific platform or config.
Comment 18•5 years ago
Ah, that's a pretty good reason :). And I guess that explains what :bc meant by supporting the -if syntax. In that case, yes.. tag-if should be do-able. However, we may want to tag different things based on different conditions. E.g.: not-implemented on Windows, but backlog on Android. So we can't use a single tag-if key as manifestparser doesn't allow duplicate keys (note: this same problem would apply to implementation-status-if).
So we'd need some kind of per tag expression, maybe something like:
[test_foo.js]
tags =
devtools
os == "android", long-running
os == "win", not-implemented
It's definitely going to add a whole bunch of complexity though. For instance, tags currently use a space as a delimiter. We'd have to switch to something else (newlines in example above).
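A toy parser for the newline-delimited, per-tag conditional format sketched above; the expression evaluation here handles only simple 'key == "value"' conditions, standing in for manifestparser's real expression language, and all names are hypothetical:

```python
def parse_conditional_tags(value, context):
    """Parse a newline-delimited tags value where each line is either a
    bare tag or '<condition>, <tag>'. Conditional tags are kept only
    when the condition matches the supplied context."""
    tags = []
    for line in value.strip().splitlines():
        line = line.strip()
        if "," in line:
            condition, tag = (part.strip() for part in line.split(",", 1))
            # toy evaluation of conditions of the form: os == "android"
            key, _, expected = condition.partition("==")
            if context.get(key.strip()) == expected.strip().strip('"'):
                tags.append(tag)
        else:
            tags.append(line)
    return tags
```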
Comment 19•5 years ago
(Honestly, I'd recommend implementing -if in a follow-up.. whether or not we use tags or implementation-status.. it's going to be a whole can of worms)
Reporter
Comment 20•5 years ago
(In reply to Andrew Halberstadt [:ahal] from comment #13)
What would "backlog" mean in this case?
I just used it as an example from jgraham's bug 1572820 comment 3 in order to exercise the code. We need to figure out what values we need, then determine the tests which need the annotations.
This does not support the "-if" type tags. They appear to be best suited for the "built-in" filters and not the "Instance" filters I used. "built-in" seems to be geared towards fixed tag names and adding the conditional to the "Instance" filters would require plumbing changes to the expression parsing.
Could you expand what you mean here? Why would you need to "support" the skip-if filter from what looks like a completely separate filter? The manifest filters are supposed to act like a chain that modifies the test objects as they go (they're kind of like early versions of the transforms system we have in taskgraph). Consumers (aka test harnesses) then add whichever filters they want prior to calling active_tests. So for example, the mochitest harness might decide to add both the skip_if filter as well as the skip_implementation_status filter (or maybe it just adds one or the other). Unless you are explicitly removing the skip_if filter, I don't understand why skip-if wouldn't be working.
I'm not talking about replacing the built-in filters skip_if, run_if or fail_if but am talking about something like implementation-statuses-if. It would be different from the current built-in filters which require the annotation to look like
<tag>-if = <boolean expression>
We need instead something that would allow
<tag>-if = <boolean expression> value
Which isn't supported by manifestparser's expression parser.
Note that web-platform-tests' manifests and manifest parser already support conditional properties with multiple values. If we want to go down this route I would suggest we consider standardizing on the web-platform-tests version.
What if we piggy-back off of the tagging system and implement --skip-tag? That way we can simply add a tag for whatever crazy purposes we can come up with (one of which being an implementation status). So e.g., in the manifest:
[test_foo.js]
tags = devtools not-implemented
[test_bar.js]
tags = long-runtime backlog
Then run with:
./mach mochitest --skip-tag not-implemented --skip-tag long-runtime
That might cover the use-cases. I'd have to think about it to see if it supported everything we wanted. A prerequisite would be to determine how we would want to annotate tests and how we would want to use the annotations to select tests.
p.s would appreciate if you could push to phabricator for feedback as the hg.m.o diffs aren't that great. It also gives us a place to leave comments on the code.
Will do. I'm doing another try run at the moment and just noticed the problem with the xpcshell tests on android. I was planning on doing a moz-phab submit --wip without a bug number once we were further along so we could share. I'll do that in a bit.
Reporter
Comment 21•5 years ago
Assignee
Comment 22•5 years ago
OK, so for wpt this looks like a reasonable approach. A couple of minor feedback points:
- I'd rather the type conversion stuff go in the manifest rather than in wpttest.py. See the prefs implementation for comparison; this should be very similar. But if we use tags there might be no changes required.
- I think we should explicitly mark these tests as skipped in the output, just like we do for tests that are disabled. So let's just add them to the list of disabled tests. Later we could consider having some extra information, e.g. in the message property, to indicate why they were disabled.
Reporter
Comment 23•5 years ago
(In reply to James Graham [:jgraham] from comment #22)
- I'd rather the type conversion stuff go in the manifest rather than in wpttest.py. See the prefs implementation for comparison; this should be very similar. But if we use tags there might be no changes required.
I don't understand. Can you be more explicit in what you mean here?
Reporter
Updated•5 years ago
Assignee
Comment 24•5 years ago
See how tags work, the type conversion is in:
https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/manifestexpected.py#56-64
https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/manifestexpected.py#274-276 (and later in the same file for "reasons").
Then for wpttest.py we just read the value and handle the (little-used) reset atom:
https://searchfox.org/mozilla-central/source/testing/web-platform/tests/tools/wptrunner/wptrunner/wpttest.py#297-309
If we just use tags directly there's no changes required, ofc, but if we take a different path, then that's the closest implementation to copy.
Reporter
Comment 25•5 years ago
(In reply to James Graham [:jgraham] from comment #22)
Updated the phab requests with the changes.
OK, so for wpt this looks like a reasonable approach. A couple of minor feedback points:
- I'd rather the type conversion stuff go in the manifest rather than in wpttest.py. See the prefs implementation for comparison; this should be very similar. But if we use tags there might be no changes required.
I was talking with jmaher and I am leaning towards a small set of conditional tags instead of these implementation-statuses. That would give us conditional parsing of the manifests on web-platform-tests and the tests that use manifestparser.
- I think we should explicitly mark these tests as skipped in the output, just like we do for tests that are disabled. So let's just add them to the list of disabled tests. Later we could consider having some extra information e.g. in the message property to indicate why they were disabled.
I already marked them as disabled and they showed up with the other tests as skipped. I also had a log message listing each test which was skipped and the reason(s) why.
Assignee
Comment 26•5 years ago
What's the status here? I have enough information to nominate some wpt directories for "never run" on the basis that we have objected to the feature existing. Getting the "will implement eventually, but not yet on the roadmap" list pinned down might take longer due to the changes around product, but the former is the lowest hanging fruit anyway.
Reporter
Comment 27•5 years ago
jgraham: I have https://phabricator.services.mozilla.com/D57497 which assumes we are using --skip-implementation-statuses=not-implemented,backlog. The next step is to decide which implementation statuses you want to use and I will put up a final version of the patch for review.
Assignee
Comment 28•5 years ago
I think not-implementing and backlog are good (not-implemented sounds like anything that doesn't have an implementation). I didn't know if you were planning to change the implementation per comment #25; if not I'll do a review of the wpt changes.
Reporter
Comment 29•5 years ago
(In reply to James Graham [:jgraham] from comment #28)
I think not-implementing and backlog are good (not-implemented sounds like anything that doesn't have an implementation). I didn't know if you were planning to change the implementation per comment #25; if not I'll do a review of the wpt changes.
(In reply to Bob Clary [:bc:] from comment #25)
I was talking with jmaher and I am leaning towards a small set of conditional tags instead of these implementation-statuses. That would give us conditional parsing of the manifests on web-platform-tests and the tests that use manifestparser.
Oh, right. Let me look into this. I'll ping you when I have any questions or something to review.
Reporter
Updated•5 years ago
Updated•5 years ago
Reporter
Comment 30•5 years ago
I started looking into this again and implemented not-implementing-if and backlog-if tags for manifestparser, but I am really at a loss on where to go with web-platform-tests. Personally, I think I did not understand the issues when I filed this bug; it should be closed as invalid and a new one created by someone with a better understanding of the intended implementation and scope, with a new person assigned to carry it forward.
Assignee
Comment 31•5 years ago
I can take this work. I had the impression your patch was basically doing the correct thing, so I'm hopeful that it isn't hard to get it over the line :)
Assignee
Updated•5 years ago
Assignee
Comment 32•5 years ago
This adds an implementation-status field to the wpt metadata with 3
possible values: "not-implementing", "backlog" and "implementing" (the
latter being the same as the default value). It also adds a
--skip-implementation-status command line argument to wptrunner that
can be used to skip running tests with the specified implementation
statuses. This is primarily to allow Gecko to avoid running tests (or
run tests less frequently) for things where we don't implement the
spec and don't plan to, either ever or in the current roadmap.
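As a rough sketch of the skip check this describes (hypothetical helper, not the actual wptrunner code):

```python
# Tests without an implementation-status field default to "implementing"
# and are never skipped; --skip-implementation-status supplies the set
# of statuses to skip, e.g. {"not-implementing", "backlog"}.
DEFAULT_STATUS = "implementing"

def should_skip(test_metadata, skip_statuses):
    status = test_metadata.get("implementation-status", DEFAULT_STATUS)
    return status in skip_statuses
```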
Assignee
Comment 33•5 years ago
Assignee
Comment 34•5 years ago
Comment 35•5 years ago
Comment 38•5 years ago
bugherder
https://hg.mozilla.org/mozilla-central/rev/c2a6dd63c951
https://hg.mozilla.org/mozilla-central/rev/8823ff6be13b
https://hg.mozilla.org/mozilla-central/rev/bb0d32984294
Updated•5 years ago