Closed Bug 1612549 Opened 5 years ago Closed 2 years ago

Automatically retry failing tests

Categories

(Testing :: General, enhancement, P3)

Version 3

Tracking

(Not tracked)

RESOLVED MOVED

People

(Reporter: marco, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

Once the smart scheduling work is done, it will unlock a few nice enhancements. For example: automatically retrying failing tests to check whether they are intermittent.

NOTE: Doing this will in turn also improve the data we have and thus the results of the ML algorithm.

Priority: -- → P3

We can schedule these tasks with lower priority.

I think this is a great idea. There are a few ways to tackle this:

  1. retry inside the job: this saves a lot of resources, but it won't help if the machine is bad or there are infra issues
  2. retry the same job with the same set of tests
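As a minimal sketch of approach 1 (retry inside the job), the logic could look like the following. This is illustrative only; `run_test`, `classify_failure`, and the result labels are hypothetical names, not an existing Taskcluster or Treeherder API:

```python
# Hypothetical in-job retry sketch: re-run a failing test within the
# same job to decide whether the failure is intermittent.

def classify_failure(run_test, max_retries=3):
    """Run `run_test` (a callable returning True on pass) once; on
    failure, retry up to `max_retries` times inside the same job.

    Returns "pass", "intermittent" (failed once but passed on a
    retry), or "failure" (failed on every attempt).
    """
    if run_test():
        return "pass"
    for _ in range(max_retries):
        if run_test():
            return "intermittent"
    return "failure"


# Example: a test that fails twice and then passes is flagged
# as intermittent rather than a real failure.
attempts = iter([False, False, True])
print(classify_failure(lambda: next(attempts)))  # intermittent
```

Note the caveat from point 1 above: because the retries run on the same machine, this cannot distinguish an intermittent test from a bad machine or an infra issue, which is why approach 2 (a separate job) is still needed in those cases.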

The risk here is that if we accidentally trigger retries for most if not all intermittent test failures, we could end up scheduling many extra jobs, which would be costly on platforms like OSX and Android hardware.

As this is for sheriffs, we should focus on trees like autoland and m-c, and keep this to windows10-64 and linux64 only. I would hesitate to put restrictions on the number of retriggers per revision, because then the sheriffs won't know what has been retriggered and what hasn't; that adds more work for the sheriffs, effectively cancelling out the savings from retriggering.

On the point of time savings vs. added time: this shifts when the extra data becomes available. Sheriffs are looking at failures within 30 minutes, and in most if not all cases they have made decisions and added annotations by then. In some cases they need to wait for retriggers/backfills/future data, and that can take a few hours. So any work we do here needs to account for the sheriff workflow and how we would present the data to them via Treeherder.

Status: NEW → RESOLVED
Closed: 2 years ago
Resolution: --- → MOVED