Closed
Bug 808814
Opened 12 years ago
Closed 12 years ago
mozharness download-and-extract should detect, retry, and report download errors
Categories
(Release Engineering :: Applications: MozharnessCore, defect, P3)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: mozilla, Assigned: mozilla)
References
Details
(Whiteboard: [mozharness][unittest])
Attachments
(1 file, 2 obsolete files)
patch (deleted) | rail: review+ | mozilla: checked-in+
13:45:21 INFO - #####
13:45:21 INFO - ##### Running download-and-extract step.
13:45:21 INFO - #####
13:45:21 INFO - Downloading http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/cedar-macosx64/1352141866/firefox-19.0a1.en-US.mac.tests.zip
13:45:25 INFO - mkdir: /Users/cltbld/talos-slave/test/build
13:51:26 INFO - mkdir: /Users/cltbld/talos-slave/test/build/tests
13:51:26 INFO - Running command: ['unzip', '-o', '/Users/cltbld/talos-slave/test/build/firefox-19.0a1.en-US.mac.tests.zip', 'bin/*', 'certs/*', 'modules/*', 'mozbase/*', 'mochitest/*'] in /Users/cltbld/talos-slave/test/build/tests
13:51:26 INFO - Copy/paste: unzip -o /Users/cltbld/talos-slave/test/build/firefox-19.0a1.en-US.mac.tests.zip bin/* certs/* modules/* mozbase/* mochitest/*
13:51:26 INFO - Archive: /Users/cltbld/talos-slave/test/build/firefox-19.0a1.en-US.mac.tests.zip
13:51:26 INFO - End-of-central-directory signature not found. Either this file is not
13:51:26 INFO - a zipfile, or it constitutes one disk of a multi-part archive. In the
13:51:26 INFO - latter case the central directory and zipfile comment will be found on
13:51:26 INFO - the last disk(s) of this archive.
13:51:26 INFO - unzip: cannot find zipfile directory in one of /Users/cltbld/talos-slave/test/build/firefox-19.0a1.en-US.mac.tests.zip or
13:51:26 INFO - /Users/cltbld/talos-slave/test/build/firefox-19.0a1.en-US.mac.tests.zip.zip, and cannot find /Users/cltbld/talos-slave/test/build/firefox-19.0a1.en-US.mac.tests.zip.ZIP, period.
13:51:26 ERROR - Return code: 9
13:51:26 INFO - Downloading http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/cedar-macosx64/1352141866/firefox-19.0a1.en-US.mac.dmg
13:55:16 INFO - Setting buildbot property build_url to http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/cedar-macosx64/1352141866/firefox-19.0a1.en-US.mac.dmg
13:55:16 INFO - mkdir: /Users/cltbld/talos-slave/test/properties
13:55:16 INFO - Writing buildbot properties ['build_url'] to /Users/cltbld/talos-slave/test/properties/build_url
13:55:16 INFO - Writing to file /Users/cltbld/talos-slave/test/properties/build_url
13:55:16 INFO - Contents:
13:55:16 INFO - build_url:http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/cedar-macosx64/1352141866/firefox-19.0a1.en-US.mac.dmg
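In the log above, the broken tests zip only surfaces when unzip exits with return code 9. As a minimal sketch of one way to detect a bad archive before extraction, the snippet below uses Python's standard zipfile module. The helper name is_usable_zip is hypothetical and is not part of mozharness.

```python
import zipfile


def is_usable_zip(path):
    """Return True if path is a readable zip whose members pass their CRC checks.

    Hypothetical helper for illustration; not mozharness code.
    """
    # A truncated download usually lacks the end-of-central-directory record,
    # which is exactly what unzip complains about in the log.
    if not zipfile.is_zipfile(path):
        return False
    try:
        with zipfile.ZipFile(path) as archive:
            # testzip() returns the name of the first corrupt member, or None.
            return archive.testzip() is None
    except (zipfile.BadZipFile, OSError):
        return False
```

A caller could treat a False result like any other download failure and fetch the file again instead of moving on to the next step.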
Comment 1•12 years ago
Rather more terse but still unhelpful, from downloading a build (50-50 odds whether it was a period of ftp.m.o doing 500/503, or one of the busted-dns periods we're having now where ftp.m.o can't be resolved for a few seconds)
https://tbpl.mozilla.org/php/getParsedLog.php?id=16876494&tree=Cedar
17:07:05 INFO - Downloading http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/cedar-win32/1352414251/firefox-19.0a1.en-US.win32.zip
17:07:26 FATAL - URL Error: http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/cedar-win32/1352414251/firefox-19.0a1.en-US.win32.zip
17:07:26 FATAL - Exiting -1
Comment 2•12 years ago (Assignee)
I think the HTTP error gives a status code, and the URL error tells you there's a URL issue (possibly DNS?).
We could potentially do a dns check on the server after the retries fail.
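As a rough illustration of the distinction drawn here, the sketch below uses Python 3's urllib.error classes (mozharness at the time would have used the Python 2 equivalents). The function name describe_download_error is hypothetical.

```python
import urllib.error


def describe_download_error(exc):
    """Turn a download exception into a log-friendly message.

    Hypothetical helper for illustration; not mozharness code.
    """
    # HTTPError subclasses URLError, so it has to be checked first.
    if isinstance(exc, urllib.error.HTTPError):
        # The server answered, so a status code is available (e.g. 500/503).
        return "HTTP Error %s: %s" % (exc.code, exc.reason)
    if isinstance(exc, urllib.error.URLError):
        # No HTTP response at all: bad hostname, DNS failure, refused connection...
        return "URL Error (possibly DNS?): %s" % (exc.reason,)
    return "Unexpected error: %r" % (exc,)
```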
Comment 3•12 years ago (Assignee)
Updated•12 years ago (Assignee)
Assignee: nobody → aki
Comment 4•12 years ago
Comment 5•12 years ago (Assignee)
This is my latest test result, from a bogus sendchange:
12:36:06 INFO - Downloading http://ftpyadda.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/cedar-ics_armv7a_gecko/1352838526/b2g-19.0a1.en-US.android-arm.tests.zip to /home/cltbld/talos-slave/test/build/b2g-19.0a1.en-US.android-arm.tests.zip
12:36:06 WARNING - Try 1: URL Error: http://ftpyadda.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/cedar-ics_armv7a_gecko/1352838526/b2g-19.0a1.en-US.android-arm.tests.zip
12:36:06 INFO - Sleeping 5 seconds...
12:36:11 WARNING - Try 2: URL Error: http://ftpyadda.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/cedar-ics_armv7a_gecko/1352838526/b2g-19.0a1.en-US.android-arm.tests.zip
12:36:11 INFO - Sleeping 10 seconds...
12:36:21 WARNING - Try 3: URL Error: http://ftpyadda.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/cedar-ics_armv7a_gecko/1352838526/b2g-19.0a1.en-US.android-arm.tests.zip
12:36:21 INFO - Sleeping 15 seconds...
12:36:36 WARNING - Try 4: URL Error: http://ftpyadda.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/cedar-ics_armv7a_gecko/1352838526/b2g-19.0a1.en-US.android-arm.tests.zip
12:36:36 INFO - Sleeping 20 seconds...
12:36:56 WARNING - Try 5: URL Error: http://ftpyadda.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/cedar-ics_armv7a_gecko/1352838526/b2g-19.0a1.en-US.android-arm.tests.zip
12:36:56 INFO - Sleeping 25 seconds...
12:37:21 INFO - Running command: ['nslookup', 'ftpyadda.mozilla.org']
12:37:21 INFO - Copy/paste: nslookup ftpyadda.mozilla.org
12:37:22 INFO - Server: 10.12.48.19
12:37:22 INFO - Address: 10.12.48.19#53
12:37:22 ERROR - ** server can't find ftpyadda.mozilla.org: NXDOMAIN
12:37:22 ERROR - Either ftpyadda.mozilla.org is an invalid hostname, or DNS is busted.
12:37:22 INFO - Return code: 0
12:37:22 FATAL - Try 6: URL Error: http://ftpyadda.mozilla.org/pub/mozilla.org/b2g/tinderbox-builds/cedar-ics_armv7a_gecko/1352838526/b2g-19.0a1.en-US.android-arm.tests.zip
12:37:22 FATAL - Exiting -1
We'll be sleeping longer (multiples of 20, atm) outside of staging.
Q: should we then set RETRY in buildbot? Or is failing this many download attempts reason to go red?
Flags: needinfo?
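The behaviour visible in the comment 5 log (six tries, sleeps growing by 5 seconds, an nslookup once the final try fails) can be approximated as below. This is a sketch written from the log output alone, not the attached patch: the function names, parameters, and log wording are assumptions, and the fetch, sleep, and dns_check hooks exist only so the loop can be exercised without a network.

```python
import logging
import subprocess
import time
import urllib.error
import urllib.request
from urllib.parse import urlparse

log = logging.getLogger("download")


def nslookup(hostname):
    """Run nslookup purely for diagnostics; its output ends up in the job log."""
    try:
        subprocess.call(["nslookup", hostname])
    except OSError:
        log.error("nslookup is not available on this machine")


def download_with_retry(url, dest, attempts=6, sleep_step=5,
                        fetch=urllib.request.urlretrieve,
                        sleep=time.sleep, dns_check=nslookup):
    """Download url to dest, retrying with a linearly growing sleep.

    Hypothetical sketch; names and defaults are assumptions.
    """
    for attempt in range(1, attempts + 1):
        try:
            fetch(url, dest)
            return dest
        except urllib.error.URLError as exc:  # also catches HTTPError
            if attempt == attempts:
                # Out of retries: check DNS so the log says why, then give up.
                dns_check(urlparse(url).hostname)
                log.critical("Try %d: %s: %s", attempt, type(exc).__name__, url)
                raise
            log.warning("Try %d: %s: %s", attempt, type(exc).__name__, url)
            delay = sleep_step * attempt  # 5, 10, 15, ... as in the log
            log.info("Sleeping %d seconds...", delay)
            sleep(delay)
```

Passing sleep_step=20 would give the longer "multiples of 20" sleeps mentioned for non-staging runs.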
Comment 6•12 years ago (Assignee)
This patch is ready for review, unless we want to add a buildbot RETRY status at the end.
Attachment #681350 - Attachment is obsolete: true
Flags: needinfo?
Updated•12 years ago (Assignee)
Flags: needinfo?
Comment 7•12 years ago
(In reply to Aki Sasaki [:aki] from comment #6)
> Created attachment 681645 [details] [diff] [review]
> download retry with nslookup, also tear out vestiges of noop
>
> This patch is ready for review, unless we want to add a buildbot RETRY
> status at the end.
Let's say no buildbot RETRY for now, and see how often it hits us in production :-)
Flags: needinfo?
Comment 8•12 years ago (Assignee)
Now with tooltool retry, which I tested by putting in a bogus tooltool server in the b2g emulator configs in staging.
Attachment #681645 - Attachment is obsolete: true
Attachment #681661 - Flags: review?(rail)
Comment 9•12 years ago
Comment on attachment 681661 [details] [diff] [review]
download retry with nslookup, tooltool retry, also tear out vestiges of noop
Review of attachment 681661 [details] [diff] [review]:
-----------------------------------------------------------------
LGTM. I think it would be great to factor out the retry logic or use util.retry from tools.
Attachment #681661 - Flags: review?(rail) → review+
Comment 10•12 years ago (Assignee)
Yeah, I was thinking that we could pass a method, frequency/count, error_level, error_msg, etc. to a helper retry method. I'm futuring that atm, though.
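A helper along those lines might look like the following sketch. It is illustrative only: the name retry, its parameters, and its defaults are assumptions, and it is neither the util.retry from tools nor anything that landed in mozharness with this bug.

```python
import logging
import time

log = logging.getLogger("retry")


def retry(action, args=(), kwargs=None, attempts=5, sleep_step=20,
          retry_exceptions=(Exception,), error_level=logging.WARNING,
          error_msg="Try %(attempt)d of %(attempts)d failed: %(error)s",
          sleep=time.sleep):
    """Call action(*args, **kwargs) until it succeeds or attempts run out.

    Hypothetical helper; sleeps sleep_step * attempt seconds between tries.
    """
    kwargs = kwargs or {}
    for attempt in range(1, attempts + 1):
        try:
            return action(*args, **kwargs)
        except retry_exceptions as exc:
            log.log(error_level, error_msg % {
                "attempt": attempt, "attempts": attempts, "error": exc})
            if attempt == attempts:
                raise  # let the caller decide whether this is fatal
            sleep(sleep_step * attempt)
```

Both the download and tooltool steps could then share one loop and differ only in the callable and the message they pass in.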
Comment 11•12 years ago (Assignee)
Comment on attachment 681661 [details] [diff] [review]
download retry with nslookup, tooltool retry, also tear out vestiges of noop
http://hg.mozilla.org/build/mozharness/rev/8854e241ce97
Thanks Rail!
Attachment #681661 - Flags: checked-in+
Updated•12 years ago (Assignee)
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Updated•11 years ago
Product: mozilla.org → Release Engineering
Updated•10 years ago
Component: General Automation → Mozharness