Closed Bug 920161 Opened 11 years ago Closed 11 years ago

Cloning of talos repo does not retry and/or output a TBPL compatible failure message ("command timed out: 3600 seconds without output, attempting to kill")

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: mozilla)

Details

(Keywords: intermittent-failure, sheriffing-P1)

Attachments

(1 file)

+++ This bug was initially created as a clone of Bug #920153 +++

The talos equivalent of bug 920153.

eg: https://tbpl.mozilla.org/php/getParsedLog.php?id=27701351&tree=Mozilla-Inbound

{
06:13:24 INFO - #####
06:13:24 INFO - ##### Running clone-talos step.
06:13:24 INFO - #####
06:13:24 INFO - Running pre-action listener: _resource_record_pre_action
06:13:24 INFO - Running main action method: clone_talos
06:13:24 INFO - Populating webroot /builds/slave/talos-slave/talos-data...
06:13:24 INFO - rmtree: /builds/slave/talos-slave/talos-data/talos
06:13:24 INFO - retry: Calling <function rmtree at 0xb6e6d454> with args: ('/builds/slave/talos-slave/talos-data/talos',), kwargs: {}, attempt #1
06:13:24 INFO - retry: Calling <bound method Talos._get_revision of <mozharness.mozilla.testing.talos.Talos object at 0xb7040b6c>> with args: (<mozharness.base.vcs.mercurial.MercurialVCS object at 0xb6ea1a2c>, '/builds/slave/talos-slave/test/build/talos_repo'), kwargs: {}, attempt #1
06:13:24 INFO - Setting /builds/slave/talos-slave/test/build/talos_repo to http://hg.mozilla.org/build/talos revision ca2229a32cb6.
06:13:24 INFO - Cloning http://hg.mozilla.org/build/talos to /builds/slave/talos-slave/test/build/talos_repo.
06:13:24 INFO - Running command: ['hg', '--config', 'ui.merge=internal:merge', 'clone', 'http://hg.mozilla.org/build/talos', '/builds/slave/talos-slave/test/build/talos_repo']
06:13:24 INFO - Copy/paste: hg --config ui.merge=internal:merge clone http://hg.mozilla.org/build/talos /builds/slave/talos-slave/test/build/talos_repo
command timed out: 3600 seconds without output, attempting to kill
process killed by signal 9
program finished with exit code -1
elapsedTime=3658.177546
========= Finished '/tools/buildbot/bin/python scripts/scripts/talos_script.py ...' failed (results: 2, elapsed: 1 hrs, 58 secs) (at 2013-09-11 07:13:24.901622) =========
}

Expected:
* Few retries of the hg clone
* "Automation Error: Unable to clone talos repo" (and buildbot not having to kill the run).
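For illustration only, the expected behaviour (a few retries of the clone, then a failure line TBPL can parse) might look roughly like the sketch below. This is not the mozharness retry code; the function name `clone_with_retries` and its `attempts`, `sleep_seconds`, and `run` parameters are hypothetical, and `run` exists only so the command runner can be swapped out.

```python
import subprocess
import time


def clone_with_retries(cmd, attempts=3, sleep_seconds=0, run=subprocess.call):
    """Run cmd up to `attempts` times; return True on the first zero exit code.

    All names here are hypothetical; this is not the mozharness API.
    """
    for attempt in range(1, attempts + 1):
        print("retry: running %s, attempt #%d" % (cmd, attempt))
        if run(cmd) == 0:
            return True
        if attempt < attempts:
            # Back off briefly before trying again.
            time.sleep(sleep_seconds)
    # Message text taken from the "Expected" list in this bug.
    print("Automation Error: Unable to clone talos repo")
    return False
```

Note that retries alone do not help with a clone that hangs silently, since the first attempt never returns; that case needs an output timeout as well.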
Keywords: sheriffing-P1
Chris, is there someone that could take a look at this for us? :-)
Flags: needinfo?(catlee)
Attached patch talos_timeout (deleted) — Splinter Review
This patch:
* allows for an "output_timeout" in the repo definition for MercurialVCS
* fixes a bunch of pep8 in mozharness.mozilla.testing.talos
* adds a hardcoded 1200 second talos clone timeout
* adds a hardcoded mozprocess dependency; we now create the virtualenv before cloning talos, and then update the virtualenv with talos+pyyaml later.

Got past the talos clone on ash with no concerns; the talos clone had an output timeout of 1200 as expected.
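As a rough illustration of what an output timeout means here (kill the command once it has printed nothing for N seconds, rather than waiting for buildbot's 3600 second kill), a minimal sketch follows. It uses only the Python 3 standard library, not mozprocess, and is not the code in the attached patch; `run_with_output_timeout` and its return shape are assumptions made for this example.

```python
import queue
import subprocess
import sys
import threading


def run_with_output_timeout(cmd, output_timeout):
    """Run cmd; kill it if it prints nothing for output_timeout seconds.

    Returns (exit_code, timed_out). Hypothetical helper, not mozharness code.
    """
    proc = subprocess.Popen(cmd, stdout=subprocess.PIPE, stderr=subprocess.STDOUT)
    lines = queue.Queue()

    def reader():
        # Forward each output line to the main thread as it arrives.
        for line in proc.stdout:
            lines.put(line)
        lines.put(None)  # sentinel: the output stream has closed

    threading.Thread(target=reader, daemon=True).start()

    while True:
        try:
            # The timeout restarts on every line, so it bounds silence,
            # not the total run time.
            line = lines.get(timeout=output_timeout)
        except queue.Empty:
            proc.kill()
            proc.wait()
            return proc.returncode, True
        if line is None:
            return proc.wait(), False
        sys.stdout.write(line.decode(errors="replace"))
```

With something like this, a caller could pass the clone command with an output timeout of 1200 and, when `timed_out` comes back true, log a failure line itself instead of leaving buildbot to kill the run.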
Assignee: nobody → aki
Attachment #8355690 - Flags: review?(jgriffin)
Flags: needinfo?(catlee)
Comment on attachment 8355690 talos_timeout

Review of attachment 8355690:
-----------------------------------------------------------------
lgtm
Attachment #8355690 - Flags: review?(jgriffin) → review+
Armen: are you going to merge mozharness at some point this week, or should I?
Flags: needinfo?(armenzg)
Merged mozharness (not getting CCed to this bug).

(In reply to Aki Sasaki [:aki] from comment #123)
> Armen: are you going to merge mozharness at some point this week, or should
> I?

I had pushed new code to Cypress to make sure that default was in good shape. I had to do a lot of retries until it all looked green.
Flags: needinfo?(armenzg)
Thanks! This bug should be resolved for desktop talos. Android panda talos runs a separate workflow and may not be fixed yet, but looks to be the minority of issues posted here. I *think* I should resolve this bug, but could leave open/morph for Pandas. Do you have a preference?
Flags: needinfo?(emorley)
(In reply to Aki Sasaki [:aki] from comment #125)
> I *think* I should resolve this bug, but could leave open/morph for Pandas.
> Do you have a preference?

Let's close this - very few of the failures were for the Pandas - we can always file another bug if needed. Thank you all for fixing this! :-D
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: needinfo?(emorley)
Resolution: --- → FIXED
Component: General Automation → General