Open Bug 1520610 Opened 6 years ago Updated 2 years ago

Why does tv-bf fail to reproduce some intermittents?

Categories

(Testing :: General, defect, P3)

Version 3
defect

Tracking

(Not tracked)

People

(Reporter: gbrown, Unassigned)

Details

:jmaher has been running TV-bf on some known regressions and keeps finding cases where the test does not fail in TV-bf. Why is that?

Possible explanations:

  • Is TV-bf not running the test?
  • Is TV-bf failing to notice test failures?
  • Is TV-bf running in an appropriate environment? (Same instance type, same e10s, same ...?)
  • Is the intermittent too infrequent to find by (reasonable) repetition?
  • Does the intermittent only occur when run after another test in the manifest?

An example:

https://treeherder.mozilla.org/#/jobs?repo=autoland&searchStr=tv-bf&tochange=a85699150a8b513d42dc4eab0e17b7bd9926190b&fromchange=7da55789063f261d79bc947ac6338f6b2658e10e

It looks like the correct test is run: browser/base/content/test/static/browser_parsable_css.js
It never fails (no UNEXPECTED-FAIL in the log).
It is running on gecko-t-linux-large, with e10s config.
Other possibilities remain...

I have run 6 so far and I see:

  • 2 tests fail on the regressed push and pass on the previous push
      ◦ 1 of those had an intermittent failure in 1 of 3 runs on the previous push
  • 2 tests are already annotated as skip-if=verify, so we couldn't run them
  • for 2 tests we could not reproduce the failure (green on both the offending and the previous push)

Typically I run on the offending push and the 2 prior pushes, 3 times each. It would be good to record machine information and specifics like e10s.
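As a rough sketch of why 3 runs per push can miss a regression: assuming each run fails independently with a fixed rate of 1/N (a simplification, since real intermittents often depend on machine or test order), the chance that 3 runs show at least one failure is 1 - (1 - 1/N)^3. The function name `detection_probability` is hypothetical.

```python
def detection_probability(n: int, runs: int) -> float:
    """Chance that at least one of `runs` runs fails, assuming
    (hypothetically) independent runs with a failure rate of 1/n."""
    return 1.0 - (1.0 - 1.0 / n) ** runs


if __name__ == "__main__":
    # 3 runs per push, matching the procedure described above
    for n in range(2, 11):
        print(f"1 in {n:2d}: {detection_probability(n, 3):.0%}")
```

Under these assumptions a 1-in-2 intermittent is caught about 88% of the time, but a 1-in-10 intermittent only about 27% of the time, so infrequent failures alone could account for some green results.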

One thought is to run the test a single time within its "manifest" to replicate context. This would differ from normal runs, because a job runs a set of manifests; ideally we could get results in under 10 minutes.

  • On a related note, it might be a good idea to give all our manifests a time limit of 5 minutes on opt. If a manifest has too many tests, we split it up, either dynamically or by hand (e.g. mochitest1.ini, mochitest2.ini, mochitest3.ini). This would keep manifest sessions, and therefore browser sessions, short. It would also force us to fix a chunk of the tests that fail in verify mode, shrinking the gap between test-verify and our current solution.
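A minimal sketch of the dynamic split, assuming per-test runtimes are already known (for example from past opt runs) and that manifest order should be preserved so each test still runs after the same neighbours. The function `split_manifest`, its input format, and the test names in the usage example are all hypothetical.

```python
from typing import List, Tuple


def split_manifest(
    tests: List[Tuple[str, float]], limit_s: float = 300.0
) -> List[List[str]]:
    """Split an ordered manifest into consecutive chunks whose estimated
    runtime stays within limit_s (default 5 minutes). A single test that
    is longer than the limit ends up in a chunk of its own."""
    chunks: List[List[str]] = []
    current: List[str] = []
    elapsed = 0.0
    for name, duration in tests:
        # Close the current chunk if this test would push it past the limit
        if current and elapsed + duration > limit_s:
            chunks.append(current)
            current, elapsed = [], 0.0
        current.append(name)
        elapsed += duration
    if current:
        chunks.append(current)
    return chunks


if __name__ == "__main__":
    example = [("browser_a.js", 120), ("browser_b.js", 150),
               ("browser_c.js", 90), ("browser_d.js", 200)]
    print(split_manifest(example))
```

A greedy, order-preserving split keeps the neighbouring tests of each test mostly intact, which matters if an intermittent depends on what ran before it in the manifest.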
Priority: -- → P3

In email today, we discussed a possible goal: "whenever there is a test-regression on the tree (dozens/week) we would run test-verify and it would be orange on the regressed push and green on the previous push at least 80% of the time", where "test-regression" means a test begins intermittently failing with frequency greater than some threshold (1 time in N, 2 <= N <= 10), where it was consistently passing in the recent past.
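To get a feel for what the 80% goal implies, here is a sketch that finds the smallest number of runs k with 1 - (1 - 1/N)^k >= 0.8. It assumes independent runs, a failure rate of exactly 1/N on the regressed push, and a previous push that is always green; the last assumption does not always hold, since one of the cases above was already intermittent before the regression. The name `runs_needed` is hypothetical.

```python
def runs_needed(n: int, target: float = 0.8) -> int:
    """Smallest number of runs k such that at least one run fails with
    probability >= target, assuming independent runs with failure rate 1/n."""
    if n < 2 or not 0.0 < target < 1.0:
        raise ValueError("need n >= 2 and 0 < target < 1")
    pass_rate = 1.0 - 1.0 / n  # chance that a single run passes
    k, all_pass = 0, 1.0
    # Add runs until the chance of seeing at least one failure reaches target
    while 1.0 - all_pass < target:
        k += 1
        all_pass *= pass_rate
    return k


if __name__ == "__main__":
    for n in range(2, 11):
        print(f"1 in {n:2d}: {runs_needed(n)} runs")
```

Under these assumptions a 1-in-2 intermittent needs 3 runs on the regressed push, while a 1-in-10 intermittent needs 16, well above the 3 runs per push described earlier.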

Severity: normal → S3