Closed Bug 1432183 Opened 7 years ago Closed 7 years ago

403s from queue.taskcluster.net and cloud-mirror.taskcluster.net

Categories

(Taskcluster :: Operations and Service Requests, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: philor, Assigned: dustin)

References

Details

This is https://github.com/taskcluster/taskcluster-retrospectives/blob/master/retrospectives/2018-01-18-spring-cleaning-url-change.md When that occurred, we rolled back in Heroku and the plan was to fix things up Friday morning (it's a config change to cloud-mirror followed by a re-deployment of the queue patch). But then lots of other stuff failed and we never did the fix-up. Then pmoore innocently landed a change to queue this morning, re-deploying the known-bad patch. At least this time the fix was clear!
Assignee: nobody → dustin
Commit pushed to master at https://github.com/taskcluster/taskcluster-queue https://github.com/taskcluster/taskcluster-queue/commit/0beb33f6680f2121685af292b420c2eb518b8d43 Bug 1432183 - Go back to the old AWS-SDK version This is a partial revert of a8c1f01734d7457af1fe23bb0e406a0b78043228. This upgrade caused a URL change which led to the outage in https://github.com/taskcluster/taskcluster-retrospectives/blob/master/retrospectives/2018-01-18-spring-cleaning-url-change.md
Severity: blocker → normal
Summary: Trees closed - 403s from queue.taskcluster.net and cloud-mirror.taskcluster.net → 403s from queue.taskcluster.net and cloud-mirror.taskcluster.net
I deployed the changes to cloud-mirror this morning, and re-deployed the queue spring-cleaning patch just now.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Cloud-mirror started 503'ing about 40 minutes after that at 16:18 UTC. I can't tell why from the logs -- sentry and statsum are misconfigured for the service, so there are no exceptions to see in the logs. I rolled back the queue patch (both in Heroku and in git) and rolled back cloud-mirror in heroku only (not in docker-cloud). I'll need some help diagnosing and fixing this.
Status: RESOLVED → REOPENED
Flags: needinfo?(jhford)
Resolution: FIXED → ---
John suggested just landing the queue changes without the aws-sdk upgrade that causes the cloud-mirror failures. https://github.com/taskcluster/taskcluster-queue/pull/245
Flags: needinfo?(jhford)
Commit pushed to master at https://github.com/taskcluster/taskcluster-queue https://github.com/taskcluster/taskcluster-queue/commit/d1bc912764f66f5a4489a90782428e98ee32bc50 Bug 1432183 - pin version of AWS SDK at 2.1.22 Newer versions generate valid URLs that cause failures in cloud-mirror; cloud-mirror cannot be redeployed to fix this issue, so we will hold back aws-sdk in the queue until cloud-mirror is turned off.
I'm deploying the version from comment 6 (I had to go back and add aws-sdk-promise, too) to the queue so that we have the spring-cleaning in place and can continue to hack on queue. There's still a time-bomb here in that if we update aws-sdk, cloud-mirror will fail.
Bug 1433020 to enable defusing the time-bomb. Otherwise, this is fixed as queue no longer has anything backed out.
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Component: Operations → Operations and Service Requests
You need to log in before you can comment on or make changes to this bug.