Open Bug 1357513 (test-verify) Opened 8 years ago Updated 2 years ago

[meta] New/modified test verification

Categories

(Testing :: General, enhancement, P3)

Product: Testing

Component: General

Type: enhancement

Priority: P3

Severity: S3

Tracking

(Not tracked)

Status: NEW

People

(Reporter: gbrown, Unassigned)

References

(Depends on 18 open bugs, Blocks 2 open bugs)

Details

(Keywords: meta)

Geoff Brown [:gbrown]

Reporter

Description

•

8 years ago

A major finding from the Stockwell project's triaging experience: Many frequent intermittent test failures arise from the introduction of new tests, or the modification of existing tests. A few random examples:

https://bugzilla.mozilla.org/show_bug.cgi?id=1340413#c7
https://bugzilla.mozilla.org/show_bug.cgi?id=1318389#c2
https://bugzilla.mozilla.org/show_bug.cgi?id=1353894#c2
https://bugzilla.mozilla.org/show_bug.cgi?id=1351456#c1
https://bugzilla.mozilla.org/show_bug.cgi?id=1351409#c5

Sometimes a new/modified test fails frequently and obviously on try and the test is improved before check-in to an integration branch. Sometimes a new/modified test fails frequently and obviously on check-in and the changeset is backed out. But sometimes those checks fail and an intermittent test failure is introduced anyway.

We can reduce intermittent failures by introducing tools and processes which find these cases faster. The basic strategy here is to notice when tests are being updated and subject those tests to more stringent verification right away. For example, when mochitest test_blah.html is updated in a push to try or an integration branch, a new test-verification job is run and it runs test_blah.html 50 times, in isolation. A similar test-verification mach command might be useful for ad-hoc use in development environments.

Not all of the implementation details are clear to me, but some of them are; I'll file dependent bugs.
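The basic strategy described above (re-run an updated test many times, in isolation, and flag it as soon as it fails) could be sketched roughly as follows. This is only an illustration, not the actual test-verify implementation: `verify_test` and the `run_once` callback are hypothetical names, and `run_once` stands in for whatever launches the test on its own in a fresh harness.

```python
from typing import Callable, Optional


def verify_test(run_once: Callable[[], bool], iterations: int = 50) -> Optional[int]:
    """Run one test repeatedly.

    Returns the 1-based iteration at which the test first failed,
    or None if every iteration passed.

    `run_once` is a hypothetical callback: it should run the test in
    isolation and return True on pass, False on failure.
    """
    for attempt in range(1, iterations + 1):
        if not run_once():
            # Stop at the first failure; one failure is enough to flag the test.
            return attempt
    return None


if __name__ == "__main__":
    # Simulated flaky test that fails on its third run.
    calls = {"n": 0}

    def flaky() -> bool:
        calls["n"] += 1
        return calls["n"] != 3

    print(verify_test(flaky))
```

Stopping at the first failure keeps the job cheap for tests that are obviously unstable, while stable tests pay the full cost of all iterations.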

Geoff Brown [:gbrown]

Reporter

Updated

•

8 years ago

Depends on: 1357520

Geoff Brown [:gbrown]

Reporter

Updated

•

8 years ago

Depends on: 1357551

Geoff Brown [:gbrown]

Reporter

Updated

•

8 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1357557

William Lachance (:wlach)

Comment 2

•

8 years ago

It feels like there's a fair amount of overlap between this and bug 1322433. Some of the business logic is no doubt different but the basic concept ("run this job N times or until it fails") seems similar. You could consider using action tasks for this, which would have the added bonus of exposing the feature in the treeherder UI: http://gecko.readthedocs.io/en/latest/taskcluster/taskcluster/actions.html

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1371782

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1380121

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1380122

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1380126

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1390599

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1390884

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1390889

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1390893

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1391694

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1396901

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1396905

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1397043

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1397970

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1398953

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1398933

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1394910

Geoff Brown [:gbrown]

Reporter

Comment 3

•

7 years ago

Documentation at: https://developer.mozilla.org/en-US/docs/test_verification

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1400405

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1400691

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1400895

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1400967

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1400979

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1404525

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1404526

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1405141

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1405143

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

7 years ago

Depends on: 1405369

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1403565

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1405428

J. Ryan Stinnett [:jryans] (Use needinfo, replies may be slow)

Updated

•

7 years ago

Depends on: 1405561

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1406204

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1406213

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

7 years ago

Depends on: 1406663

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1406407

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1409507

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1409511

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1410911

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1411660

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1412349

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1418375

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1418363

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1423918

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1411298

Kartikaya Gupta (email:kats@mozilla.staktrace.com)

Updated

•

7 years ago

Depends on: 1425929

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Blocks: 1428828

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1431125

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Priority: -- → P3

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Updated

•

7 years ago

Depends on: 1439591

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

7 years ago

Depends on: 1439589

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1441990

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1443177

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

7 years ago

Depends on: 1453056

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1447179

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

7 years ago

Depends on: 1455316

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

7 years ago

Depends on: 1455309

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Assignee: gbrown → nobody

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

7 years ago

Depends on: 1461440

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

7 years ago

Depends on: 1461809

Geoff Brown [:gbrown]

Reporter

Updated

•

7 years ago

Depends on: 1462182

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

7 years ago

Depends on: 1465117

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

7 years ago

Depends on: 1466187

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

6 years ago

Depends on: 1466578

Geoff Brown [:gbrown]

Reporter

Updated

•

6 years ago

Depends on: 1466862

Geoff Brown [:gbrown]

Reporter

Updated

•

6 years ago

Depends on: 1460901

Geoff Brown [:gbrown]

Reporter

Updated

•

6 years ago

Depends on: 1466923

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

6 years ago

Depends on: 1467837

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

6 years ago

Depends on: 1469583

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

6 years ago

Depends on: 1471227

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

6 years ago

Depends on: 1473392

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

6 years ago

Depends on: 1476318

Geoff Brown [:gbrown]

Reporter

Updated

•

6 years ago

Depends on: 1475194

Geoff Brown [:gbrown]

Reporter

Updated

•

6 years ago

Depends on: 1477976

Geoff Brown [:gbrown]

Reporter

Updated

•

6 years ago

Depends on: 1483421

Geoff Brown [:gbrown]

Reporter

Updated

•

6 years ago

Depends on: 1482413

BugBot [:suhaib / :marco/ :calixte]

Updated

•

6 years ago

Keywords: meta

Geoff Brown [:gbrown]

Reporter

Updated

•

6 years ago

Blocks: Intermittents-2019

Depends on: 1483292

Geoff Brown [:gbrown]

Reporter

Updated

•

6 years ago

Depends on: 1522113

Ting-Yu Lin [:TYLin] (UTC-8)

Updated

•

6 years ago

Depends on: 1534867

Geoff Brown [:gbrown]

Reporter

Updated

•

6 years ago

Depends on: 1535417

Geoff Brown [:gbrown]

Reporter

Updated

•

6 years ago

Depends on: 1536696

Geoff Brown [:gbrown]

Reporter

Updated

•

6 years ago

Alias: test-verify

Geoff Brown [:gbrown]

Reporter

Updated

•

6 years ago

Depends on: 1529238

Geoff Brown [:gbrown]

Reporter

Updated

•

6 years ago

Depends on: 1545297

Geoff Brown [:gbrown]

Reporter

Updated

•

6 years ago

Depends on: 1528471

Geoff Brown [:gbrown]

Reporter

Updated

•

6 years ago

Depends on: 1550735

Geoff Brown [:gbrown]

Reporter

Updated

•

6 years ago

Depends on: 1535287

Geoff Brown [:gbrown]

Reporter

Updated

•

6 years ago

Depends on: 1552300

Geoff Brown [:gbrown]

Reporter

Comment 4

•

5 years ago

One weakness of TV is that the TV task may not run with the same task configuration as the normal test task in which the tests would run. For instance, if the xpcshell test task and the mochitest test task for a particular platform use different builds (eg. Windows xpcshell tests may run against a signed build) then TV can be configured to match one or the other, but not both. This is why TVg was introduced: so that different virtualizations could be used in TV/TVg; but deciding which tests apply to TV vs TVg has also been tricky. And task configurations are always changing.

:bc's recent work on "test isolation" suggests a different approach -- https://treeherder.mozilla.org/#/jobs?repo=try&tier=1%2C2%2C3&revision=bc8f78e7ea0b947c07b6a6c4c502882faa1b973f -- where existing task definitions are cloned. There would be additional challenges for TV, but I'm thinking TV could identify test files from the hg log (as it does today), then spawn new tasks for each supported suite affected by the push. If a push modified a mochitest and an xpcshell test, TV would notice that, then spawn M-tv and X-tv tasks, each cloned from the appropriate existing task definition.
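The grouping step proposed above (find the test files changed by a push, then spawn one TV task per affected suite) might look something like the sketch below. Everything here other than the M-tv and X-tv labels is an assumption made for illustration: `tv_tasks_for_push` is a hypothetical helper, and `suite_of` stands in for a real manifest lookup that maps a file path to its suite.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Mapping

# Task labels from the comment above; the suite names used as keys are assumed.
TV_LABELS = {"mochitest": "M-tv", "xpcshell": "X-tv"}


def tv_tasks_for_push(
    changed_files: Iterable[str], suite_of: Mapping[str, str]
) -> Dict[str, List[str]]:
    """Group changed test files by suite and map each suite to a TV task label.

    `suite_of` is a hypothetical path -> suite mapping. Files that are not
    known tests, or that belong to unsupported suites, are skipped.
    """
    tasks: Dict[str, List[str]] = defaultdict(list)
    for path in changed_files:
        label = TV_LABELS.get(suite_of.get(path, ""))
        if label is not None:
            tasks[label].append(path)
    # Sort the file lists so the result does not depend on input order.
    return {label: sorted(paths) for label, paths in tasks.items()}


if __name__ == "__main__":
    suites = {
        "dom/tests/test_blah.html": "mochitest",
        "netwerk/test/unit/test_foo.js": "xpcshell",
    }
    push = ["dom/tests/test_blah.html", "netwerk/test/unit/test_foo.js", "README.txt"]
    print(tv_tasks_for_push(push, suites))
```

In the cloning approach, each resulting label would then be paired with a copy of the matching existing task definition, so the TV run inherits the same build and configuration as the normal task for that suite.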

Geoff Brown [:gbrown]

Reporter

Updated

•

5 years ago

Depends on: 1561884

Geoff Brown [:gbrown]

Reporter

Updated

•

5 years ago

Depends on: 1568063

Geoff Brown [:gbrown]

Reporter

Updated

•

5 years ago

Depends on: 1569982

Mark Banner (:standard8)

Updated

•

5 years ago

Depends on: 1577197

Geoff Brown [:gbrown]

Reporter

Updated

•

5 years ago

Depends on: 1593779

Geoff Brown [:gbrown]

Reporter

Updated

•

5 years ago

Depends on: 1599242

Joel Maher ( :jmaher ) (UTC -8)

Comment 5

•

5 years ago

:gbrown, given the upcoming changes in test scheduling (test manifest level) as well as recent fixes to retain meta data while retriggering, do you think fixing some of the scheduling issues for test-verify is accurate in the coming months?

I would like to know that test-verify works for all our major test harnesses and configs and that it is scheduled properly. Maybe a stretch goal is to treat tests that do not pass test-verify as something we only run on m-c and not on try by default (i.e. lower value). I don't think we can consider something like that without knowing if test-verify is accurate.

Based on the dependencies:
https://bugzilla.mozilla.org/showdependencytree.cgi?id=1357513&hide_resolved=1

it looks like there is some work to do here but not a lot.

Flags: needinfo?(gbrown)

Geoff Brown [:gbrown]

Reporter

Comment 6

•

5 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #5)

> :gbrown, given the upcoming changes in test scheduling (test manifest level) as well as recent fixes to retain meta data while retriggering, do you think fixing some of the scheduling issues for test-verify is accurate in the coming months?

Since https://bugzilla.mozilla.org/show_bug.cgi?id=1522113#c14, most of my TV scheduling concerns have been addressed. Do you have TV scheduling concerns? How would test manifest level test scheduling affect TV scheduling?

> I would like to know that test-verify works for all our major test harnesses and configs and that it is scheduled properly.

test-verify supports wpt, mochitest (including subsuites, etc), reftest/crashtest/jsreftest, and xpcshell; nothing else.

> Maybe a stretch goal is to treat tests that do not pass test-verify as something we only run on m-c and not on try by default (i.e. lower value). I don't think we can consider something like that without knowing if test-verify is accurate.

TV is intended as an early warning system which draws attention to test vulnerabilities that can lead to intermittent failures; also, it provides a fast and convenient way to reproduce many intermittent failures quickly. I don't think it is appropriate to modify test scheduling based on TV results; certainly intermittent failure history is a more direct, simple, and fair metric to use for such purposes. (This is part of why I keep saying that tier-1 TV should be a non-goal.)

I believe that TV is mostly accurate: It finds genuine vulnerabilities in tests, it reproduces most frequent intermittent failures, it very rarely fails without good reason. There is sometimes a perception that TV is not accurate because it reports failures related to tests relying on state established by other tests (eg. tests that cannot run standalone cannot pass TV).
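As an illustration of the "relies on state established by other tests" case, here is a toy sketch (not harness code; all names are hypothetical) in which a test passes when run after its neighbour but fails when run standalone, which is the situation TV reports.

```python
# Hypothetical shared state, standing in for prefs, profile data, or server state.
state = {}


def test_setup_pref():
    state["pref"] = "enabled"
    return state["pref"] == "enabled"


def test_uses_pref():
    # Implicitly relies on test_setup_pref having run first.
    return state.get("pref") == "enabled"


def run_suite(tests):
    """Run tests in order against shared state, as a normal chunk might."""
    state.clear()
    return [test() for test in tests]


def run_isolated(test):
    """Run a single test against fresh state, as a TV-style run would."""
    state.clear()
    return test()


if __name__ == "__main__":
    print(run_suite([test_setup_pref, test_uses_pref]))
    print(run_isolated(test_uses_pref))
```

The failure in the isolated run points at a real vulnerability: the same test would also break if chunking or manifest order ever changed.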

> Based on the dependencies:
> https://bugzilla.mozilla.org/showdependencytree.cgi?id=1357513&hide_resolved=1
>
> it looks like there is some work to do here but not a lot.

Bug dependencies here reflect a mixture of in-progress work and imminent plans that have been unexpectedly postponed; more extensive, longer term plans for TV were proposed in planning documents in Q4 2018 and Q1 2019 and a trimmed down version ("smart" TV) was again proposed recently but none of these proposals have been supported. Given the on-going lack of investment in TV, I am considering de-scheduling it entirely in 2020.

Flags: needinfo?(gbrown)

Geoff Brown [:gbrown]

Reporter

Updated

•

5 years ago

Depends on: 1610886

Geoff Brown [:gbrown]

Reporter

Updated

•

5 years ago

Depends on: 1551889

Geoff Brown [:gbrown]

Reporter

Updated

•

5 years ago

Depends on: 1628695

Rob Wu [:robwu]

Updated

•

5 years ago

Depends on: 1641900

Geoff Brown [:gbrown]

Reporter

Updated

•

5 years ago

Depends on: 1640758

Geoff Brown [:gbrown]

Reporter

Updated

•

5 years ago

Depends on: 1598776

Geoff Brown [:gbrown]

Reporter

Updated

•

5 years ago

Depends on: 1642360

Geoff Brown [:gbrown]

Reporter

Updated

•

4 years ago

Depends on: 1708763

Geoff Brown [:gbrown]

Reporter

Updated

•

3 years ago

Depends on: 1724296

Geoff Brown [:gbrown]

Reporter

Updated

•

3 years ago

Depends on: 1720101

Geoff Brown [:gbrown]

Reporter

Updated

•

3 years ago

Depends on: tv-chaosmode-timeout-wpt

Geoff Brown [:gbrown]

Reporter

Updated

•

2 years ago

Depends on: 1783867

Updated

•

2 years ago

Severity: normal → S3
