Closed Bug 1466923 Opened 6 years ago Closed 6 years ago

Test verification of web-platform tests sometimes reports PASS for steps with test failures

Categories

(Testing :: web-platform-tests, enhancement)

Version: Version 3
Type: enhancement
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: gbrown, Assigned: gbrown)

References

(Blocks 1 open bug)

Details

Follow-up on https://bugzilla.mozilla.org/show_bug.cgi?id=1466740#c2.

Consider: https://treeherder.mozilla.org/logviewer.html#?job_id=181826441&repo=mozilla-inbound&lineNumber=1329-1343

02:37:46     INFO -  ## All results ##
02:37:46     INFO -
02:37:46     INFO -  ### /css/css-scoping/shadow-host-with-before-after.html ###
02:37:46     INFO -  | Subtest | Results | Messages |
02:37:46     INFO -  |---------|---------|----------|
02:37:46     INFO -  |         | FAIL    |          |
02:37:46     INFO -
02:37:46     INFO -  ::: Running tests in a loop 10 times : PASS
02:37:46     INFO -  ::: Running tests in a loop with restarts 5 times : PASS
02:37:46     INFO -  ::: Running tests in a loop 10 times with flags chaos_mode_flags=3 : PASS
02:37:46     INFO -  ::: Running tests in a loop with restarts 5 times with flags chaos_mode_flags=3 : PASS
02:37:46     INFO -  :::
02:37:46     INFO -  ::: Test verification PASS
02:37:46     INFO -  :::
02:37:47     INFO -  Return code: 0

If we know that a test has failed, why list PASS for each step, and for overall test verification? It seems inconsistent and confusing. (On the other hand, it's not too serious, because the task is still orange.)
:jgraham -- I suspect you understand this code much better than I. Should we be checking the 'results' here, in addition to 'inconsistent'? https://dxr.mozilla.org/mozilla-central/rev/d8f180ab74921fd07a66d6868914a48e5f9ea797/testing/web-platform/tests/tools/wptrunner/wptrunner/stability.py#267-277
Flags: needinfo?(james)
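For concreteness, here is a minimal sketch, with hypothetical names rather than the actual stability.py code, of the two signals being discussed: consistency between iterations, and agreement with the expected (metadata) result.

def verify(runs):
    """runs: list of per-iteration dicts mapping test id -> (status, expected)."""
    inconsistent = []
    unexpected = []
    for test_id in runs[0]:
        statuses = {run[test_id][0] for run in runs}
        if len(statuses) > 1:
            inconsistent.append(test_id)   # result differs between iterations
        status, expected = runs[0][test_id]
        if status != expected:
            unexpected.append(test_id)     # result differs from the metadata
    # Today only `inconsistent` decides the stability PASS/FAIL; the question
    # here is whether `unexpected` should also be taken into account.
    return len(inconsistent) == 0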
It depends what you think the purpose of test-verify is. In my model the defining feature of test-verify is that it checks stability, and the PASS at the end only refers to whether the test is stable or not. Putting FAIL for a test that consistently FAILs is wrong because, assuming it's marked as failing in the metadata, nothing is wrong. That's not a situation that occurs for other test harnesses, because they typically don't allow any non-passing tests, but it is something we need in wpt.

In this specific case it seems like the test itself FAILed, stably, on Windows. So what we see is TEST-UNEXPECTED-FAIL from the test not matching expectations, FAIL (only) in the list of results obtained, and PASS for the stability check.

This is perhaps confusing, but it is presenting all the information. I'm not sure it would be less confusing to make some of these things degenerate; e.g. marking the stability check as FAIL if the test has an unexpected result might also make people believe the test is unstable when that isn't actually the case.
Flags: needinfo?(james)
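A small hypothetical illustration of that distinction, using the case from comment 0, where the test FAILs on every iteration but no metadata marks it as an expected FAIL:

runs = ["FAIL"] * 10        # same result on every iteration
expected = "PASS"           # no metadata entry, so a pass is expected

is_stable = len(set(runs)) == 1      # True: the stability check reports PASS
is_expected = runs[0] == expected    # False: TEST-UNEXPECTED-FAIL is logged
# The job still turns orange because of the unexpected result, even though
# every stability step reports PASS.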
(In reply to James Graham [:jgraham] from comment #2)

Thanks James, that's quite helpful.

> In my model the defining feature of test-verify is that it checks stability,

I might agree with you, but I'm not 100% sure...

> Putting FAIL for a test that consistently FAILs is wrong because, assuming
> it's marked as failing in the metadata, nothing is wrong.

I agree completely with that case. I am still bothered by the case -- as in comment 0 -- where a test fails consistently but it is not marked as failing in the metadata. If there is no metadata expectation that the test fails, but TVw finds the test failing consistently, is that "stable"?
> If there is no metadata expectation that the test fails, but TVw finds the
> test failing consistently, is that "stable"?

Yes, IMO. I should note that we use the same code for stability checking upstream, and in that case we don't have expectation data, so any test that happens to not pass on Firefox or Chrome would be regarded as "unstable" if we made stability depend on the results themselves rather than just on the consistency of results between runs.

Obviously for the m-c case we always have expectation data by the time TVw runs, so we could have different behaviour there, but that would have to be a flag of some sort, since we can't tell the difference between missing expectation data and expecting a pass, and that would add some complexity.

Since we always fail the job correctly and the issue is "just" how hard it is to interpret the log, I'd rather not change anything here. Ideally we would start scheduling things so that changing a web-platform-test always causes the relevant test to run in the normal jobs, which I think would make things clearer here.
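A sketch of the kind of flag described above; the option name and function are hypothetical, not the actual wptrunner interface:

def stability_status(runs, expected=None, fail_on_unexpected=False):
    if len(set(runs)) > 1:
        return "FAIL"    # genuinely inconsistent between iterations
    if fail_on_unexpected and expected is not None and runs[0] != expected:
        return "FAIL"    # stable, but not what the metadata expects
    return "PASS"

# Upstream stability runs: expected is None, so only consistency matters and a
# browser that simply doesn't pass the test is not reported as "unstable".
# On mozilla-central, expectation data exists by the time TVw runs, so the flag
# could be enabled there, at the cost of the extra complexity mentioned above.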
Great discussion. I think this is either a P5 or a WONTFIX, given that the job reports the proper green or orange colour when finished. If we correctly identify the failing test, then we point the right person (developer or sheriff) at the right problem.

The remaining question is how we can reduce confusion when reading a log file and figuring out what failed and why. Can we print out the expected status from web-platform-tests? Seeing that a test's status is FAIL but that FAIL was also the expected status would reduce confusion; of course, any output like this would need to appear in the right place so it can be read easily (thinking of debug logs and the hundreds of lines of spew between useful nuggets of information).
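As a rough sketch of that suggestion, using a hypothetical formatting helper rather than existing wptrunner output code, the results table could carry the expected status next to the observed one:

def format_result_row(subtest, status, expected):
    note = "" if status == expected else "UNEXPECTED"
    return "| %s | %s | expected %s | %s |" % (subtest, status, expected, note)

print(format_result_row("", "FAIL", "FAIL"))
# prints: |  | FAIL | expected FAIL |  |
# An expected failure is then visibly different from an unexpected one.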
Assignee: nobody → gbrown
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → WONTFIX