Closed Bug 829728 Opened 12 years ago Closed 12 years ago

Crashes whilst running Talos should turn the run orange rather than red

Categories

(Testing :: Talos, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: emorley)

Details

(Keywords: sheriffing-P1)

Attachments

(1 file)

Red on TBPL normally means infra. 

At the moment Talos crashes show as red - and IMO is half the reason devs don't realise how many talos failures are actually due to problems in-product (the other half is the lack of PROCESS-CRASH lines until recently).

We need buildbot to always use WARNINGS (orange), rather than FAILURE (red), unless it's definitely an infra issue.

Looks like we can do this at:
https://hg.mozilla.org/build/buildbotcustom/file/68f25a5132c4/steps/talos.py#l106
In fact, for the runTests step of the talos run, should we just always use orange (instead of red), like unit tests do?
Attached patch Patch v1 (deleted) — Splinter Review
Currently:
1) If the talos run-tests step had a non-green result, buildbot just returned the result as-is.
2) It it was green, we double-checked to see if "ERROR", "USAGE:" or "FAIL:" were present in the stdout, and if so, overrode the result. (IMO talos shouldn't be returning a zero exit code if there was a failure, so this is talos being broken).
3) Otherwise we returned green.

However:
* "USAGE:" doesn't even appear in the talos repo any more, so pointless to check for it.
* As mentioned above we should be able to rely on the exit code being non-zero if an error occurred & not have to override the result if it is SUCCESS. However, we have to leave these hacks in until we fix up talos to exit with the correct return value.

With this patch:
We now also check if the run result was FAILURE (red) + a non-harness/infra failure mode (currently PROCESS-CRASH and TEST-UNEXPECTED-FAIL) - and if so, swap it to WARNINGS (orange).

In other bugs, we can then:
1) Make all in-product failure mode use "{TEST-UNEXPECTED-FAIL,PROCESS-CRASH} | foo | bar" (so they appear orange). Crashes are already done, so this just leaves things like "timeout exceeded", "no output from browser", "failed to initialise browser" etc.
2) Make sure that all ("ERROR", "FAIL:") talos failure modes result in a non-zero talos return code (so we can clean up the hacks in the buildbotcustom talos steps even more).

Sound ok? :-)
Attachment #701834 - Flags: feedback?(jmaher)
Comment on attachment 701834 [details] [diff] [review]
Patch v1

Review of attachment 701834 [details] [diff] [review]:
-----------------------------------------------------------------

will superResult == FAILURE?  otherwise this looks good.
Attachment #701834 - Flags: feedback?(jmaher) → feedback+
Yeah, eg:
https://tbpl.mozilla.org/php/getParsedLog.php?id=18780960&full=1&branch=mozilla-inbound
(PROCESS-CRASH)

Full Log

builder: mozilla-inbound_panda_android_test-remote-tp4m_nochrome
slave: panda-0690
starttime: 1358173227.42
results: failure (2)
Comment on attachment 701834 [details] [diff] [review]
Patch v1

(See comment 2 for notes :-))
Attachment #701834 - Flags: review?(bhearsum)
Comment on attachment 701834 [details] [diff] [review]
Patch v1

Review of attachment 701834 [details] [diff] [review]:
-----------------------------------------------------------------

Thanks for this!
Attachment #701834 - Flags: review?(bhearsum) → review+
This is in production.
Looks good, eg https://tbpl.mozilla.org/?tree=Mozilla-Inbound&rev=aeacf3c6231b&jobname=talos 

:-)
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: