Closed Bug 1186619 Opened 9 years ago Closed 9 years ago

Artifact upload should be retried

Categories

(Taskcluster :: Workers, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: garndt, Assigned: wcosta)

References

Details

Attachments

(1 file)

(deleted), text/x-github-pull-request
garndt
: review+
jonasfj
: review+
Details
Occasionally uploading of artifacts will fail. Docker-worker should make an attempt to retry uploading the artifact with some kind of exponential back off. Example: [taskcluster] Error uploading "private/build/aries.zip" artifact. Error: Could not upload artifact. Error: write ECONNRESET
Blocks: 1165759
Should note that these issues were happening before, but the errors were not caught and reported properly. This caused the task to hang forever until it couldn't be reclaimed any longer (usually 24 hours later).
Blocks: 1187960
Blocks: 1187590
Blocks: 1197807
Blocks: 1203523
Assignee: nobody → wcosta
Status: NEW → ASSIGNED
Attached file Artifact Upload PR (deleted) —
Attachment #8680260 - Flags: review?(garndt)
Comment on attachment 8680260 [details] Artifact Upload PR Some work needs to be done around retrying individual files within a directory rather than the directory itself. Talked on IRC about it.
Attachment #8680260 - Flags: review?(garndt)
Comment on attachment 8680260 [details] Artifact Upload PR Patch updated based on discussion on IRC.
Attachment #8680260 - Flags: review?(garndt)
Left some feedback in the PR. I'm not sure how multiple reads form the stream is causing the same data to be returned, it might be because of the small size. Tests around directory uploads should be done with files of various sizes (preferably above the 16k watermark of the stream (I think that's the watermark) )
Comment on attachment 8680260 [details] Artifact Upload PR Removing flag for now. Spoke to Wander on IRC and it seems the approach should be to stream the copy to a file, and then retry upload from that file. Also tests should be updated to do directory uploads and also be careful when the IP table rule is put into place so it captures the artifact uppload and not the calls to things like the queue.
Attachment #8680260 - Flags: review?(garndt)
Attachment #8680260 - Flags: review?(jopsen)
Attachment #8680260 - Flags: review?(garndt)
Comment on attachment 8680260 [details] Artifact Upload PR iptables is not good for testing. Optimally use a proxy on localhost, or an invalid url which isn't a perfect test. Unit test might be the only feasible way to test this. Also fix promise/async issues.
Attachment #8680260 - Flags: review?(jopsen) → review-
Comment on attachment 8680260 [details] Artifact Upload PR I reworked the patch, there are some tests failure but seems unrelated to this. I had to add two optional parameters to the S3 upload function, which allow to hijack in the upload URL.
Attachment #8680260 - Flags: review- → review?(jopsen)
Ignoring the "all" chunk right now (I cant' reproduce it locally), none of the chunks should fail after rebasing master hopefully.
Comment on attachment 8680260 [details] Artifact Upload PR From a quick look it seems okay to me. I trust garndt will give it a more detailed review.
Attachment #8680260 - Flags: review?(jopsen) → review+
Comment on attachment 8680260 [details] Artifact Upload PR looks good, been rolling this out slowly to workers.
Attachment #8680260 - Flags: review?(garndt) → review+
This support has landed.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Component: Docker-Worker → Workers
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: