Closed Bug 1266624 Opened 9 years ago Closed 8 years ago

Intermittent-infra [test-linux.sh:error] failed to download mozharness zip

Categories

(Taskcluster :: Operations and Service Requests, task, P3)

Tracking

(Not tracked)

RESOLVED FIXED
mozilla53

People

(Reporter: philor, Assigned: wcosta)

References

Details

(Keywords: intermittent-failure)

Attachments

(2 files)

https://tools.taskcluster.net/task-inspector/#P51vUqolSTGW_9-1eCWnHQ/0
+ curl --fail -o mozharness.zip --retry 10 -L https://queue.taskcluster.net/v1/task/Uc1c4AkUSYSfKMjoY_y7hQ/artifacts/public/build/mozharness.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 29 100 29 0 0 81 0 --:--:-- --:--:-- --:--:-- 85
100 139 100 139 0 0 279 0 --:--:-- --:--:-- --:--:-- 279
79 642k 79 509k 0 0 317k 0 0:00:02 0:00:01 0:00:01 317k
79 642k 79 509k 0 0 195k 0 0:00:03 0:00:02 0:00:01 0
79 642k 79 509k 0 0 141k 0 0:00:04 0:00:03 0:00:01 0
79 642k 79 509k 0 0 110k 0 0:00:05 0:00:04 0:00:01 0
79 642k 79 509k 0 0 92938 0 0:00:07 0:00:05 0:00:02 0
79 642k 79 509k 0 0 78860 0 0:00:08 0:00:06 0:00:02 0
79 642k 79 509k 0 0 68489 0 0:00:09 0:00:07 0:00:02 0
79 642k 79 509k 0 0 60527 0 0:00:10 0:00:08 0:00:02 0
79 642k 79 509k 0 0 54223 0 0:00:12 0:00:09 0:00:03 0
79 642k 79 509k 0 0 49108 0 0:00:13 0:00:10 0:00:03 0
79 642k 79 509k 0 0 44875 0 0:00:14 0:00:11 0:00:03 0
79 642k 79 509k 0 0 41314 0 0:00:15 0:00:12 0:00:03 0
79 642k 79 509k 0 0 38277 0 0:00:17 0:00:13 0:00:04 0
79 642k 79 509k 0 0 35655 0 0:00:18 0:00:14 0:00:04 0
79 642k 79 509k 0 0 33370 0 0:00:19 0:00:15 0:00:04 0
79 642k 79 509k 0 0 31360 0 0:00:20 0:00:16 0:00:04 0
79 642k 79 509k 0 0 29578 0 0:00:22 0:00:17 0:00:05 0
79 642k 79 509k 0 0 27988 0 0:00:23 0:00:18 0:00:05 0
79 642k 79 509k 0 0 26560 0 0:00:24 0:00:19 0:00:05 0
79 642k 79 509k 0 0 25271 0 0:00:26 0:00:20 0:00:06 0
79 642k 79 509k 0 0 24101 0 0:00:27 0:00:21 0:00:06 0
79 642k 79 509k 0 0 23035 0 0:00:28 0:00:22 0:00:06 0
79 642k 79 509k 0 0 22058 0 0:00:29 0:00:23 0:00:06 0
79 642k 79 509k 0 0 21162 0 0:00:31 0:00:24 0:00:07 0
79 642k 79 509k 0 0 20335 0 0:00:32 0:00:25 0:00:07 0
79 642k 79 509k 0 0 19571 0 0:00:33 0:00:26 0:00:07 0
79 642k 79 509k 0 0 18862 0 0:00:34 0:00:27 0:00:07 0
79 642k 79 509k 0 0 18202 0 0:00:36 0:00:28 0:00:08 0
79 642k 79 509k 0 0 17587 0 0:00:37 0:00:29 0:00:08 0
79 642k 79 509k 0 0 17013 0 0:00:38 0:00:30 0:00:08 0
79 642k 79 509k 0 0 16474 0 0:00:39 0:00:31 0:00:08 0
79 642k 79 509k 0 0 15969 0 0:00:41 0:00:32 0:00:09 0
79 642k 79 509k 0 0 15494 0 0:00:42 0:00:33 0:00:09 0
79 642k 79 509k 0 0 15046 0 0:00:43 0:00:34 0:00:09 0
79 642k 79 509k 0 0 14623 0 0:00:44 0:00:35 0:00:09 0
79 642k 79 509k 0 0 14224 0 0:00:46 0:00:36 0:00:10 0
79 642k 79 509k 0 0 13845 0 0:00:47 0:00:37 0:00:10 0
79 642k 79 509k 0 0 13487 0 0:00:48 0:00:38 0:00:10 0
79 642k 79 509k 0 0 13146 0 0:00:50 0:00:39 0:00:11 0
79 642k 79 509k 0 0 12822 0 0:00:51 0:00:40 0:00:11 0
79 642k 79 509k 0 0 12514 0 0:00:52 0:00:41 0:00:11 0
79 642k 79 509k 0 0 12220 0 0:00:53 0:00:42 0:00:11 0
79 642k 79 509k 0 0 11940 0 0:00:55 0:00:43 0:00:12 0
79 642k 79 509k 0 0 11672 0 0:00:56 0:00:44 0:00:12 0
79 642k 79 509k 0 0 11416 0 0:00:57 0:00:45 0:00:12 0
79 642k 79 509k 0 0 11171 0 0:00:58 0:00:46 0:00:12 0
79 642k 79 509k 0 0 10937 0 0:01:00 0:00:47 0:00:13 0
79 642k 79 509k 0 0 10712 0 0:01:01 0:00:48 0:00:13 0
79 642k 79 509k 0 0 10496 0 0:01:02 0:00:49 0:00:13 0
79 642k 79 509k 0 0 10288 0 0:01:03 0:00:50 0:00:13 0
79 642k 79 509k 0 0 10089 0 0:01:05 0:00:51 0:00:14 0
79 642k 79 509k 0 0 9897 0 0:01:06 0:00:52 0:00:14 0
79 642k 79 509k 0 0 9712 0 0:01:07 0:00:53 0:00:14 0
79 642k 79 509k 0 0 9535 0 0:01:09 0:00:54 0:00:15 0
79 642k 79 509k 0 0 9363 0 0:01:10 0:00:55 0:00:15 0
79 642k 79 509k 0 0 9198 0 0:01:11 0:00:56 0:00:15 0
79 642k 79 509k 0 0 9038 0 0:01:12 0:00:57 0:00:15 0
79 642k 79 509k 0 0 8884 0 0:01:14 0:00:58 0:00:16 0
79 642k 79 509k 0 0 8735 0 0:01:15 0:00:59 0:00:16 0
79 642k 79 509k 0 0 8590 0 0:01:16 0:01:00 0:00:16 0
80 642k 80 515k 0 0 8534 0 0:01:17 0:01:01 0:00:16 1228
80 642k 80 515k 0 0 8398 0 0:01:18 0:01:02 0:00:16 1228
80 642k 80 515k 0 0 8267 0 0:01:19 0:01:03 0:00:16 1228
80 642k 80 515k 0 0 8139 0 0:01:20 0:01:04 0:00:16 1228
80 642k 80 515k 0 0 8015 0 0:01:22 0:01:05 0:00:17 1228
80 642k 80 515k 0 0 7895 0 0:01:23 0:01:06 0:00:17 0
80 642k 80 515k 0 0 7778 0 0:01:24 0:01:07 0:00:17 0
80 642k 80 515k 0 0 7665 0 0:01:25 0:01:08 0:00:17 0
80 642k 80 515k 0 0 7555 0 0:01:27 0:01:09 0:00:18 0
80 642k 80 515k 0 0 7448 0 0:01:28 0:01:10 0:00:18 0
80 642k 80 515k 0 0 7345 0 0:01:29 0:01:11 0:00:18 0
80 642k 80 515k 0 0 7244 0 0:01:30 0:01:12 0:00:18 0
80 642k 80 515k 0 0 7145 0 0:01:32 0:01:13 0:00:19 0
80 642k 80 515k 0 0 7050 0 0:01:33 0:01:14 0:00:19 0
80 642k 80 515k 0 0 6957 0 0:01:34 0:01:15 0:00:19 0
80 642k 80 515k 0 0 6866 0 0:01:35 0:01:16 0:00:19 0
80 642k 80 515k 0 0 6778 0 0:01:37 0:01:17 0:00:20 0
80 642k 80 515k 0 0 6692 0 0:01:38 0:01:18 0:00:20 0
80 642k 80 515k 0 0 6608 0 0:01:39 0:01:19 0:00:20 0
80 642k 80 515k 0 0 6527 0 0:01:40 0:01:20 0:00:20 0
80 642k 80 515k 0 0 6527 0 0:01:40 0:01:20 0:00:20 0
curl: (18) transfer closed with 130367 bytes remaining to read
+ fail 'failed to download mozharness zip'
+ echo
+ echo '[test-linux.sh:error]' 'failed to download mozharness zip'
[test-linux.sh:error] failed to download mozharness zip
+ exit 1
This hasn't been tagged in 12 days. Closing for lack of activity.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INCOMPLETE
Status: RESOLVED → REOPENED
Resolution: INCOMPLETE → ---
Component: General → Operations
Priority: -- → P3
This also happened a lot this morning.
Component: Operations → Buildduty
Product: Taskcluster → Release Engineering
QA Contact: bugspam.Callek
I think this is a problem with the RelEng Archiver service.
Component: Buildduty → Operations
Product: Release Engineering → Taskcluster
QA Contact: bugspam.Callek
Sorry, my bad - I should have read the bug description, and not just the title. Indeed from comment 0, the problem was a TaskCluster one. The reason I suggested switching it was that today we had 404's from the releng api archiver service, I believe - but that is a different problem to the one described in this bug, so moved it back to Taskcluster :: Operations.
This intermittent is displayed as an "orange" right now, but IMO it should be red or purple. I filed bug 1292353 on that.
I suspect that the autostarring is lumping a lot of different problems into this one bug. The original post was about a download that failed, while bug 1292353 is about a 404 from somewhere (queue or cloud-mirror).
s|the autostarring|taskcluster/scripts/tester/test-ubuntu1(2|6)04.sh|
If you run a script which does
    if ! curl --fail -o mozharness.zip --retry 10 -L $MOZHARNESS_URL; then
        fail "failed to download mozharness zip"
    fi
then you're going to get every possible curl failure reporting the same error message.
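To illustrate the point above, here is a minimal sketch of how the failure message could carry the curl exit status, so that different failures stop collapsing into one identical line. It is not the in-tree script: `describe_curl_failure` and `download_mozharness` are hypothetical names, and only the three exit codes that appear in this bug (18, 22, 56) are given their own wording.

```shell
#!/bin/bash
# Hypothetical helper: turn a curl exit status into a distinct message.
# The three named codes are the ones seen in this bug's logs.
describe_curl_failure() {
    case "$1" in
        18) echo "transfer closed early (partial file)" ;;
        22) echo "HTTP error response (e.g. 404) with --fail" ;;
        56) echo "failure receiving network data" ;;
        *)  echo "curl exited with status $1" ;;
    esac
}

# Hypothetical wrapper around the curl invocation quoted above.
# It prints the usual error prefix plus the specific cause.
download_mozharness() {
    local url="$1" status=0
    curl --fail -o mozharness.zip --retry 10 -L "$url" || status=$?
    if [ "$status" -ne 0 ]; then
        echo "[test-linux.sh:error] failed to download mozharness zip: $(describe_curl_failure "$status")"
        return 1
    fi
}
```

With something like this, a 404 and a truncated transfer would produce different log lines and could be starred against different bugs.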
Summary: Intermittent [test-linux.sh:error] failed to download mozharness zip → Intermittent-infra [test-linux.sh:error] failed to download mozharness zip
Sample: https://tools.taskcluster.net/task-inspector/#bIEs4qriR9iQ_5fa2RywXA/0
----
# Unzip the mozharness ZIP file created by the build task
if ! curl --fail -o mozharness.zip --retry 10 -L $MOZHARNESS_URL; then
    fail "failed to download mozharness zip"
fi
+ curl --fail -o mozharness.zip --retry 10 -L https://queue.taskcluster.net/v1/task/GiDYEiBzT9u7Uw6GRz8oXA/artifacts/public/build/mozharness.zip
  % Total    % Received % Xferd  Average Speed   Time    Time     Time  Current
                                 Dload  Upload   Total   Spent    Left  Speed
0 0 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
100 29 100 29 0 0 120 0 --:--:-- --:--:-- --:--:-- 121
100 231 100 231 0 0 879 0 --:--:-- --:--:-- --:--:-- 879
0 231 0 0 0 0 0 0 --:--:-- --:--:-- --:--:-- 0
curl: (22) The requested URL returned error: 404
+ fail 'failed to download mozharness zip'
+ echo
+ echo '[test-linux.sh:error]' 'failed to download mozharness zip'
[test-linux.sh:error] failed to download mozharness zip
----
In this case, the parent task (https://tools.taskcluster.net/task-inspector/#GiDYEiBzT9u7Uw6GRz8oXA/0) finished at 07:40:12Z past the hour, and the dependent task started at 07:46:21Z, so the artifact should not have 404'd at that time.
:dustin, this is a taskcluster issue and one of our top intermittent failures, can you find someone to take a look at this?
Flags: needinfo?(dustin)
I think this is the same as bug 1306189
Assignee: nobody → jhford
Depends on: 1306189
Flags: needinfo?(dustin)
I hit this variation, which seems uncommon:
https://queue.taskcluster.net/v1/task/WDnmcSTFSW6dd4xseB3ECQ/runs/0/artifacts/public%2Flogs%2Flive_backing.log
[task 2016-10-18T19:42:53.183525Z] 5 584k 5 34399 0 0 113 0 1:28:17 0:05:02 1:23:15 0
[task 2016-10-18T19:42:53.709537Z] 5 584k 5 34399 0 0 113 0 1:28:17 0:05:03 1:23:14 0
[task 2016-10-18T19:42:53.709573Z] 5 584k 5 34399 0 0 113 0 1:28:17 0:05:03 1:23:14 0
[task 2016-10-18T19:42:53.710695Z] curl: (56) SSL read: error:00000000:lib(0):func(0):reason(0), errno 104
https://github.com/docker/docker/issues/2011 has some interesting discussion, but I'm not sure if it is relevant.
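The log above shows a transfer sitting at a current speed of 0 for about five minutes before the connection was reset. As a hedged sketch only, curl's `--speed-limit` and `--speed-time` options can abort a transfer that stays below a given rate for a given time, rather than waiting for the peer to drop it. The function name and both threshold values below are assumptions for illustration, not values taken from this bug or from any landed patch.

```shell
#!/bin/bash
# Hypothetical thresholds: give up if the transfer averages fewer than
# 1024 bytes/sec over a 30 second window. Tune to the real artifact sizes.
STALL_BYTES_PER_SEC=1024
STALL_SECONDS=30

# Hypothetical wrapper: fetch URL into DEST, aborting stalled transfers.
fetch_abort_if_stalled() {
    local url="$1" dest="$2"
    curl --fail -L -o "$dest" \
        --speed-limit "$STALL_BYTES_PER_SEC" \
        --speed-time "$STALL_SECONDS" \
        "$url"
}
```

My understanding is that curl reports a too-slow transfer as a timeout, which `--retry` treats as transient, but that should be checked against the curl version in the test image before relying on it.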
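I think the fixes for bug 1306189 addressed the bulk of this bug. The remainder may be background noise from using curl, whose --retry option only retries a narrow set of transient errors and so does not cover failures like the partial transfer in comment 0.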
:dustin, a large spike happened yesterday, can you look into this and see if we have a new recurring problem or if this is a hiccup in the system?
Flags: needinfo?(dustin)
Greg looked yesterday (the spike was two days ago) -- it was due to an increased error rate from S3
Flags: needinfo?(dustin)
Hi jhford, this seems to be a high offender on OF. Do you still have time to look into what can be done to improve it? Thanks.
Flags: needinfo?(jhford)
John's away until mid-December. Until we start doing `hg robustcheckout` for tests, I think we need to make this curl invocation more reliable by retrying.
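As a rough illustration of "more reliable by retrying", the sketch below wraps any command in a retry loop with exponential backoff, so that failures curl's own `--retry` does not cover still get another attempt. It is a sketch under stated assumptions, not the patch that later landed for this bug: `retry_with_backoff` is a hypothetical helper, and the attempt count and initial delay in the usage comment are arbitrary.

```shell
#!/bin/bash
# Hypothetical helper:
#   retry_with_backoff MAX_ATTEMPTS INITIAL_DELAY_SECONDS COMMAND [ARGS...]
# Runs COMMAND until it succeeds or MAX_ATTEMPTS is reached, doubling the
# delay between attempts. Returns 0 on success, 1 if every attempt failed.
retry_with_backoff() {
    local max_attempts="$1" delay="$2" attempt=1
    shift 2
    while true; do
        if "$@"; then
            return 0
        fi
        if [ "$attempt" -ge "$max_attempts" ]; then
            return 1
        fi
        echo "attempt $attempt failed; retrying in ${delay}s" >&2
        sleep "$delay"
        delay=$((delay * 2))
        attempt=$((attempt + 1))
    done
}

# Example use (commented out so the sketch has no side effects); `fail`
# and $MOZHARNESS_URL are the names used by the test script quoted earlier:
#   retry_with_backoff 5 1 curl --fail -o mozharness.zip -L "$MOZHARNESS_URL" \
#       || fail "failed to download mozharness zip"
```

Because each attempt restarts the download from scratch, this suits small artifacts such as the roughly 600 KB zip in the logs; resuming partial transfers would be a separate design choice.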
Flags: needinfo?(jhford)
Dustin, can you work on making curl more reliable?
I can put it on my list, but not soon.
Maybe Greg can find somebody else to look into this.
Flags: needinfo?(garndt)
Assignee: jhford → wcosta
wander will be able to take a stab at this sometime soon. It appears that this is just a modification to how we download mozharness.zip to make it a bit better when things have a hiccup.
Flags: needinfo?(garndt)
Depends on: 1319449
Comment on attachment 8814246 [details] Bug 1266624: Apply exponential backoff for mozharness download. https://reviewboard.mozilla.org/r/95484/#review95582
Attachment #8814246 - Flags: review+
:jonasfj image builder still fails, any idea why? https://tools.taskcluster.net/task-inspector/#eJdJazH9TMCE6TCAKbPh-A/
Flags: needinfo?(jopsen)
I keep seeing this failure too, always ending just the same, after the mesa tooltool_fetch. It is intermittent -- if I retry a few times, it will eventually succeed. https://public-artifacts.taskcluster.net/XMUNGbqnTTqO07fDA7z7Xw/0/public/logs/live_backing.log https://public-artifacts.taskcluster.net/FODulSJxTraUrfAybvCtMA/0/public/logs/live_backing.log
@wcosta, image builder failing is definitely unrelated to this... Looking at the curl options, it could be related; I filed bug 1320776 to investigate that possibility further. So far I haven't been able to reproduce this issue.
Flags: needinfo?(jopsen)
Depends on: 1320776
Comment on attachment 8814246 [details] Bug 1266624: Apply exponential backoff for mozharness download. dustin gave a r+, that's good for me!
Attachment #8814246 - Flags: review?(garndt)
Pushed by wcosta@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/373caf284bbc Apply exponential backoff for mozharness download. r=dustin
Keywords: leave-open
Comment on attachment 8817200 [details] [diff] [review] apply backoff mozharness fixes to macosx script. r=dustin I forgot to fix mac script in my earlier patch :/
Attachment #8817200 - Flags: review?(dustin)
Attachment #8817200 - Flags: review?(dustin) → review+
Comment on attachment 8814246 [details] Bug 1266624: Apply exponential backoff for mozharness download. Looks like this might have been covered under the other patch
Attachment #8814246 - Flags: review?(garndt)
Pushed by cbook@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/05799cb1659d apply backoff mozharness fixes to macosx script. r=dustin
Keywords: checkin-needed
Status: REOPENED → RESOLVED
Closed: 9 years ago → 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla53
Whiteboard: [checkin-needed-aurora]
Depends on: 1433059
No longer depends on: 1433059
Component: Operations → Operations and Service Requests