Closed Bug 808814 Opened 12 years ago Closed 12 years ago

mozharness download-and-extract should detect, retry, and report download errors

Categories

(Release Engineering :: Applications: MozharnessCore, defect, P3)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mozilla, Assigned: mozilla)

Details

(Whiteboard: [mozharness][unittest])

Attachments

(1 file, 2 obsolete files)

13:45:21 INFO - #####
13:45:21 INFO - ##### Running download-and-extract step.
13:45:21 INFO - #####
13:45:21 INFO - Downloading http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/cedar-macosx64/1352141866/firefox-19.0a1.en-US.mac.tests.zip
13:45:25 INFO - mkdir: /Users/cltbld/talos-slave/test/build
13:51:26 INFO - mkdir: /Users/cltbld/talos-slave/test/build/tests
13:51:26 INFO - Running command: ['unzip', '-o', '/Users/cltbld/talos-slave/test/build/firefox-19.0a1.en-US.mac.tests.zip', 'bin/*', 'certs/*', 'modules/*', 'mozbase/*', 'mochitest/*'] in /Users/cltbld/talos-slave/test/build/tests
13:51:26 INFO - Copy/paste: unzip -o /Users/cltbld/talos-slave/test/build/firefox-19.0a1.en-US.mac.tests.zip bin/* certs/* modules/* mozbase/* mochitest/*
13:51:26 INFO - Archive: /Users/cltbld/talos-slave/test/build/firefox-19.0a1.en-US.mac.tests.zip
13:51:26 INFO - End-of-central-directory signature not found. Either this file is not
13:51:26 INFO - a zipfile, or it constitutes one disk of a multi-part archive. In the
13:51:26 INFO - latter case the central directory and zipfile comment will be found on
13:51:26 INFO - the last disk(s) of this archive.
13:51:26 INFO - unzip: cannot find zipfile directory in one of /Users/cltbld/talos-slave/test/build/firefox-19.0a1.en-US.mac.tests.zip or
13:51:26 INFO - /Users/cltbld/talos-slave/test/build/firefox-19.0a1.en-US.mac.tests.zip.zip, and cannot find /Users/cltbld/talos-slave/test/build/firefox-19.0a1.en-US.mac.tests.zip.ZIP, period.
13:51:26 ERROR - Return code: 9
13:51:26 INFO - Downloading http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/cedar-macosx64/1352141866/firefox-19.0a1.en-US.mac.dmg
13:55:16 INFO - Setting buildbot property build_url to http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/cedar-macosx64/1352141866/firefox-19.0a1.en-US.mac.dmg
13:55:16 INFO - mkdir: /Users/cltbld/talos-slave/test/properties
13:55:16 INFO - Writing buildbot properties ['build_url'] to /Users/cltbld/talos-slave/test/properties/build_url
13:55:16 INFO - Writing to file /Users/cltbld/talos-slave/test/properties/build_url
13:55:16 INFO - Contents:
13:55:16 INFO - build_url:http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/cedar-macosx64/1352141866/firefox-19.0a1.en-US.mac.dmg
Rather more terse but still unhelpful, from downloading a build (50-50 odds whether it was a period of ftp.m.o doing 500/503, or one of the busted-dns periods we're having now where ftp.m.o can't be resolved for a few seconds)

https://tbpl.mozilla.org/php/getParsedLog.php?id=16876494&tree=Cedar

17:07:05 INFO - Downloading http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/cedar-win32/1352414251/firefox-19.0a1.en-US.win32.zip
17:07:26 FATAL - URL Error: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/cedar-win32/1352414251/firefox-19.0a1.en-US.win32.zip
17:07:26 FATAL - Exiting -1
I think the HTTP error gives a status code, and the URL error only tells you there's a problem reaching the URL (possibly DNS?). We could potentially do a DNS check on the server after the retries fail.
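To make the HTTP error vs. URL error distinction concrete, here is a rough sketch of how the two could be told apart. It assumes Python 3's urllib.error (the code in 2012 would have used urllib2), and classify_download_error is a made-up name for illustration, not anything in mozharness.

```python
import socket
import urllib.error


def classify_download_error(exc):
    """Return a (kind, detail) pair for an exception raised while downloading.

    Hypothetical helper, for illustration only.
    """
    # HTTPError is a subclass of URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        # The server answered, so a status code is available (500, 503, ...).
        return ("http", "status %d" % exc.code)
    if isinstance(exc, urllib.error.URLError):
        # No HTTP response at all; a gaierror reason points at name resolution.
        if isinstance(exc.reason, socket.gaierror):
            return ("dns", str(exc.reason))
        return ("url", str(exc.reason))
    return ("other", repr(exc))
```

With something like this, a 500/503 period and a busted-dns period would at least produce different log lines.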
Assignee: nobody → aki
This is my latest test result, from a bogus sendchange:

12:36:06 INFO - Downloading http://ftpyadda.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/cedar-ics_armv7a_gecko/1352838526/b2g-19.0a1.en-US.android-arm.tests.zip to /home/cltbld/talos-slave/test/build/b2g-19.0a1.en-US.android-arm.tests.zip
12:36:06 WARNING - Try 1: URL Error: http://ftpyadda.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/cedar-ics_armv7a_gecko/1352838526/b2g-19.0a1.en-US.android-arm.tests.zip
12:36:06 INFO - Sleeping 5 seconds...
12:36:11 WARNING - Try 2: URL Error: http://ftpyadda.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/cedar-ics_armv7a_gecko/1352838526/b2g-19.0a1.en-US.android-arm.tests.zip
12:36:11 INFO - Sleeping 10 seconds...
12:36:21 WARNING - Try 3: URL Error: http://ftpyadda.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/cedar-ics_armv7a_gecko/1352838526/b2g-19.0a1.en-US.android-arm.tests.zip
12:36:21 INFO - Sleeping 15 seconds...
12:36:36 WARNING - Try 4: URL Error: http://ftpyadda.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/cedar-ics_armv7a_gecko/1352838526/b2g-19.0a1.en-US.android-arm.tests.zip
12:36:36 INFO - Sleeping 20 seconds...
12:36:56 WARNING - Try 5: URL Error: http://ftpyadda.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/cedar-ics_armv7a_gecko/1352838526/b2g-19.0a1.en-US.android-arm.tests.zip
12:36:56 INFO - Sleeping 25 seconds...
12:37:21 INFO - Running command: ['nslookup', 'ftpyadda.mozilla.org']
12:37:21 INFO - Copy/paste: nslookup ftpyadda.mozilla.org
12:37:22 INFO - Server: 10.12.48.19
12:37:22 INFO - Address: 10.12.48.19#53
12:37:22 ERROR - ** server can't find ftpyadda.mozilla.org: NXDOMAIN
12:37:22 ERROR - Either ftpyadda.mozilla.org is an invalid hostname, or DNS is busted.
12:37:22 INFO - Return code: 0
12:37:22 FATAL - Try 6: URL Error: http://ftpyadda.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/cedar-ics_armv7a_gecko/1352838526/b2g-19.0a1.en-US.android-arm.tests.zip
12:37:22 FATAL - Exiting -1

We'll be sleeping longer (multiples of 20, atm) outside of staging.

Q: should we then set RETRY in buildbot? Or is failing this many download attempts reason to go red?
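For readers who want the shape of the behavior in the log above without opening the patch, here is a minimal sketch: six tries, a sleep that grows by a fixed step after each failure, and a DNS check once the last try fails. It is not the code that landed. download_with_retries and its parameters are hypothetical, it assumes Python 3, and it resolves the hostname with socket.gethostbyname instead of shelling out to nslookup.

```python
import socket
import time
import urllib.error
import urllib.parse


def download_with_retries(fetch, url, attempts=6, sleep_step=5,
                          sleep=time.sleep, resolve=socket.gethostbyname,
                          log=print):
    """Call fetch(url) up to `attempts` times, sleeping longer after each failure.

    Hypothetical sketch; `fetch` is any callable that raises URLError on failure.
    """
    for attempt in range(1, attempts + 1):
        try:
            return fetch(url)
        except urllib.error.URLError:
            if attempt < attempts:
                log("WARNING - Try %d: URL Error: %s" % (attempt, url))
                delay = attempt * sleep_step  # 5, 10, 15, ... with the default step
                log("INFO - Sleeping %d seconds..." % delay)
                sleep(delay)
                continue
            # Final failure: check whether the hostname resolves before giving up.
            host = urllib.parse.urlsplit(url).hostname
            try:
                resolve(host)
            except OSError:
                log("ERROR - Either %s is an invalid hostname, or DNS is busted." % host)
            log("FATAL - Try %d: URL Error: %s" % (attempt, url))
            raise
```

Passing sleep_step=20 would give the "multiples of 20" behavior mentioned for outside of staging. The sleep, resolve, and log parameters are only there so the loop can be exercised without a network or real waiting.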
Flags: needinfo?
This patch is ready for review, unless we want to add a buildbot RETRY status at the end.
Attachment #681350 - Attachment is obsolete: true
Flags: needinfo?
Flags: needinfo?
(In reply to Aki Sasaki [:aki] from comment #6)
> Created attachment 681645 [details] [diff] [review]
> download retry with nslookup, also tear out vestiges of noop
>
> This patch is ready for review, unless we want to add a buildbot RETRY
> status at the end.

Let's say no buildbot RETRY for now, and see how often it hits us in production :-)
Flags: needinfo?
Now with tooltool retry, which I tested by putting in a bogus tooltool server in the b2g emulator configs in staging.
Attachment #681645 - Attachment is obsolete: true
Attachment #681661 - Flags: review?(rail)
Blocks: 812149
Comment on attachment 681661 [details] [diff] [review]
download retry with nslookup, tooltool retry, also tear out vestiges of noop

Review of attachment 681661 [details] [diff] [review]:
-----------------------------------------------------------------

LGTM. I think it would be great to factor out the retry logic or use util.retry from tools.
Attachment #681661 - Flags: review?(rail) → review+
Yeah, I was thinking that we could pass a method, frequency/count, error_level, error_msg, etc. to a helper retry method. I'm futuring that atm, though.
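As a rough idea of what such a helper could look like: the sketch below takes the method, a count, a sleep step, an error level, and an error message. The name retry and every parameter here are assumptions made for illustration. This is neither util.retry from tools nor anything that exists in mozharness, and it assumes Python 3.

```python
import logging
import time


def retry(action, attempts=5, sleep_step=20, error_level=logging.WARNING,
          error_msg="Try %(attempt)d failed: %(error)s",
          retry_exceptions=(Exception,), args=(), kwargs=None,
          sleep=time.sleep, logger=logging.getLogger("retry")):
    """Call action(*args, **kwargs) until it succeeds or attempts run out.

    Hypothetical helper: logs each failure at error_level, sleeps
    attempt * sleep_step seconds between tries, re-raises the last error.
    """
    kwargs = kwargs or {}
    for attempt in range(1, attempts + 1):
        try:
            return action(*args, **kwargs)
        except retry_exceptions as exc:
            logger.log(error_level, error_msg % {"attempt": attempt, "error": exc})
            if attempt == attempts:
                raise
            sleep(attempt * sleep_step)
```

A download call would then pass its own method and exception types, e.g. retry(fetch, args=(url,), retry_exceptions=(IOError,)), where fetch and url stand in for whatever the caller already has.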
Comment on attachment 681661 [details] [diff] [review]
download retry with nslookup, tooltool retry, also tear out vestiges of noop

http://hg.mozilla.org/build/mozharness/rev/8854e241ce97

Thanks Rail!
Attachment #681661 - Flags: checked-in+
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Product: mozilla.org → Release Engineering
Component: General Automation → Mozharness