Closed Bug 1434338 Opened 7 years ago Closed 7 years ago

Investigate periodic H12's in Heroku services

Categories

(Taskcluster :: Services, enhancement)

enhancement
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: dustin, Assigned: dustin)

References

Details

No description provided.
From the last 24h: Jan 29 23:46:54 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/v1/azure/taskclusterqueuev1/table/QueueTaskGroupMembers/read-write" host=auth.taskcluster.net request_id=86722837-b01e-4902-b8dd-f485c9b2178f fwd="54.145.103.177" dyno=web.3 connect=1ms service=30001ms status=503 bytes=0 protocol=https Jan 30 02:55:45 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/v1/azure/taskclusterqueuev1/table/QueueTaskRequirement/read-write" host=auth.taskcluster.net request_id=e5265bb4-ff3c-4e44-b789-bd338de7b6bb fwd="54.159.233.238" dyno=web.7 connect=2ms service=30000ms status=503 bytes=0 protocol=https Jan 30 03:43:19 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/v1/azure/taskclusterqueuev1/table/QueueArtifacts/read-write" host=auth.taskcluster.net request_id=778fc5e3-d7d0-47d9-b8b5-b08b56b24d64 fwd="54.158.179.180" dyno=web.8 connect=1ms service=30000ms status=503 bytes=0 protocol=https Jan 30 03:51:21 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/v1/azure/taskclusterqueuev1/table/QueueTaskGroupMembers/read-write" host=auth.taskcluster.net request_id=4f3d7d27-976c-47da-94c6-d730ee987460 fwd="54.162.112.145" dyno=web.6 connect=6ms service=30001ms status=503 bytes=0 protocol=https Jan 30 03:56:04 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/v1/azure/taskclusterqueuev1/table/QueueTasks/read-write" host=auth.taskcluster.net request_id=527beb75-9676-4005-9f9f-ea27594b1562 fwd="54.163.75.42" dyno=web.5 connect=0ms service=30000ms status=503 bytes=0 protocol=https Jan 30 04:36:36 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/v1/azure/taskclusterqueuev1/table/QueueTaskGroupMembers/read-write" host=auth.taskcluster.net request_id=0a6c800a-6b59-4e1e-ac56-81d2d96b0825 fwd="54.224.34.77" dyno=web.5 connect=2ms service=30000ms status=503 bytes=0 protocol=https Jan 30 05:30:00 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/v1/azure/taskclusterqueuev1/table/QueueTasks/read-write" host=auth.taskcluster.net request_id=e0c0c9ee-cf01-474d-878d-add20a198a0e fwd="54.159.161.78" dyno=web.6 connect=1ms service=30000ms status=503 bytes=0 protocol=https Jan 30 05:43:03 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/v1/azure/taskclusterqueuev1/table/QueueTaskGroupMembers/read-write" host=auth.taskcluster.net request_id=60db4ec9-10c2-4def-a072-0f3069d4d36f fwd="54.226.77.207" dyno=web.8 connect=1ms service=30000ms status=503 bytes=0 protocol=https Jan 30 06:13:23 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/v1/azure/taskclusterqueuev1/table/QueueWorker/read-write" host=auth.taskcluster.net request_id=ff81a5c2-c346-4fca-9f4d-cd24a66c931f fwd="54.145.103.177" dyno=web.7 connect=0ms service=30000ms status=503 bytes=0 protocol=https Jan 30 06:37:51 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/v1/azure/taskclusterqueuev1/table/QueueWorkerType/read-write" host=auth.taskcluster.net request_id=3d1d0ed4-c414-4c02-9764-e5b08e35f103 fwd="54.81.6.197" dyno=web.8 connect=5ms service=30001ms status=503 bytes=0 protocol=https Jan 30 07:12:22 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/v1/azure/taskclusterqueuev1/table/QueueTaskRequirement/read-write" host=auth.taskcluster.net request_id=daab804d-0443-4ac2-a54d-a8df8ebee0a5 fwd="54.234.160.246" dyno=web.2 connect=1ms service=30000ms status=503 bytes=0 protocol=https Jan 30 07:55:46 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/v1/azure/taskclusterqueuev1/table/QueueTaskGroups/read-write" host=auth.taskcluster.net request_id=75fd45b6-705d-4196-b305-dc71c9409c15 fwd="54.81.156.187" dyno=web.8 connect=0ms service=30000ms status=503 bytes=0 protocol=https Jan 30 08:03:27 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/v1/azure/taskclusterqueuev1/table/QueueWorker/read-write" host=auth.taskcluster.net request_id=ae694c10-3835-46f0-8247-cbf2e058e367 fwd="54.224.151.66" dyno=web.5 connect=2ms service=30001ms status=503 bytes=0 protocol=https Jan 30 08:32:00 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/v1/azure/taskclusterqueuev1/table/QueueTaskDependency/read-write" host=auth.taskcluster.net request_id=6f0589af-7fbe-4787-80f6-8b3dc9f81148 fwd="54.161.198.186" dyno=web.5 connect=1ms service=30000ms status=503 bytes=0 protocol=https There's a strong consistency there: these are all Azure credential requests. Bug 1434369 to time those out early.
Yesterday's failure was different. The web.1 dyno failed without warning: Jan 29 16:38:40 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=15f5463a-785b-47d2-9871-6fe5dfedd125 fwd="54.211.186.84" dyno=web.1 connect=1ms service=5ms status=200 bytes=1348 protocol=https Jan 29 16:38:40 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=896861a1-fc10-4358-893d-65a4e3cc7f7f fwd="54.162.211.136" dyno=web.1 connect=9ms service=12ms status=200 bytes=8713 protocol=https Jan 29 16:38:41 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=5486c8f3-2fed-4b2d-9df3-320a0ac21d2a fwd="54.167.122.228" dyno=web.1 connect=4ms service=8ms status=200 bytes=8713 protocol=https Jan 29 16:38:41 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=f8b310ec-77a5-41be-886a-0196fc1bddbd fwd="54.196.247.57" dyno=web.1 connect=1ms service=7ms status=200 bytes=13501 protocol=https Jan 29 16:38:41 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=e7e8a151-02b7-468a-8632-74151cf78875 fwd="54.198.71.27" dyno=web.1 connect=1ms service=5ms status=200 bytes=1029 protocol=https Jan 29 16:38:41 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=176744b9-f39b-4814-a419-b009c25fc0ba fwd="54.196.119.202" dyno=web.1 connect=1ms service=4ms status=200 bytes=1207 protocol=https Jan 29 16:38:41 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=e7e8de4b-a92d-42a6-9e85-19b3952ebcf0 fwd="54.167.122.228" dyno=web.1 connect=4ms service=9ms status=200 bytes=1036 protocol=https Jan 29 16:38:41 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=f6eb6a93-94a2-439e-83ff-85b6e6383a19 fwd="54.234.230.153" dyno=web.1 connect=0ms service=12ms status=200 bytes=985 protocol=https Jan 29 16:38:41 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=26328312-b943-475b-acf3-7a0a88d8ebeb fwd="54.161.21.155" dyno=web.1 connect=1ms service=47ms status=200 bytes=1346 protocol=https Jan 29 16:38:41 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=ff2a3bdb-ac75-42d8-9bf3-8840f3822d46 fwd="54.198.71.27" dyno=web.1 connect=1ms service=185ms status=200 bytes=2091 protocol=https Jan 29 16:38:41 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=e794db1d-7f30-4780-bd14-04fa336d632e fwd="54.196.247.57" dyno=web.1 connect=2ms service=105ms status=200 bytes=1318 protocol=https Jan 29 16:38:41 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=28af7096-3643-4a6a-9696-fc7d4c485937 fwd="54.156.124.123" dyno=web.1 connect=9ms service=19ms status=200 bytes=1273 protocol=https Jan 29 16:38:41 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=4c76fe21-6eb2-4452-bc44-2b6577baac72 fwd="54.167.122.228" dyno=web.1 connect=1ms service=5ms status=200 bytes=985 protocol=https Jan 29 16:38:41 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=574b37e2-8a52-489b-b64e-eb577e8dcd65 fwd="54.205.191.197" dyno=web.1 connect=1ms service=21ms status=200 bytes=1283 protocol=https Jan 29 16:38:41 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=de912fc4-d9f6-4815-9b54-e4d74fabcff3 fwd="54.211.186.84" dyno=web.1 connect=0ms service=130ms status=200 bytes=1028 protocol=https Jan 29 16:39:09 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=90e898d0-886a-4d61-9cb0-20b6adb5cea8 fwd="54.162.38.99" dyno=web.1 connect=0ms service=18ms status=200 bytes=1036 protocol=https Jan 29 16:39:09 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=25263ec2-ff2f-4311-abf5-a863200a421a fwd="54.204.92.232" dyno=web.1 connect=1ms service=4ms status=200 bytes=1275 protocol=https Jan 29 16:39:11 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=ed627c51-bcfa-43ee-ab4e-2696a6b180f8 fwd="54.205.191.197" dyno=web.1 connect=0ms service=30000ms status=503 bytes=0 protocol=https Jan 29 16:39:11 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=9bb158e0-a142-4b0b-9807-054a7cdf5b04 fwd="54.205.191.197" dyno=web.1 connect=0ms service=30002ms status=503 bytes=0 protocol=https Jan 29 16:39:11 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=90ae0b95-37a7-4d53-a5fc-8f8ff7c879e2 fwd="54.161.236.99" dyno=web.1 connect=3ms service=30000ms status=503 bytes=0 protocol=https Jan 29 16:39:12 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=1d2d929b-0eb1-4ed7-8f2f-d353b57c2b49 fwd="54.205.191.197" dyno=web.1 connect=18ms service=30001ms status=503 bytes=0 protocol=https Jan 29 16:39:12 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=3514880a-c956-4424-88d1-a322c74cd484 fwd="54.205.191.197" dyno=web.1 connect=3ms service=30005ms status=503 bytes=0 protocol=https Jan 29 16:39:12 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=6566425f-24ab-4266-a6c7-3b27fc097493 fwd="54.91.191.76" dyno=web.1 connect=4ms service=30000ms status=503 bytes=0 protocol=https Jan 29 16:39:12 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=a568e584-5b2a-4bbd-bedc-2eb5d6856299 fwd="54.211.186.84" dyno=web.1 connect=3ms service=30000ms status=503 bytes=0 protocol=https Things to note here: - There are only authenticateHawk requests on this dyno (the most recent call other than authenticateHawk is to expandScopes at 16:38:35 and completed in 3ms; the next one is also expandScopes, completing at 16:39:13 with an H12 and thus having begun at 16:38:43) - Two requests complete at 16:39:09, just two seconds before the H12's start, both within 20ms of beginning, indicating that the event loop was running at that time - That said, no other requests complete after 16:38:41, 30s before the H12's start, indicating that the event loop was *not* running at that time, as requests typically come to a dyno at about 20/sec - The dyno never recovered - it was terminated manually (by restarting all dynos) - The service times leading up to the failure do not ramp smoothly up to 30000ms, but a few requests before the failure are in the 100-200ms range instead of the usual 5-15ms range. - I didn't see any client editing API calls around this time on any dyno, but it's possible that it decided to regenerate the resolver anyway (which it does on a schedule). DEBUG logging was disabled at the time, so this would not appear here. - This dyno did not get a listClients call (which we suspect might be slow, since it expands scopes for all clients)
Jonas is looking for some patterns in the metrics.
Two more azure-related H12's: Jan 31 16:13:52 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/v1/azure/taskclusterindexv1/table/IndexedTasks/read-write" host=auth.taskcluster.net request_id=ddd3700b-34ed-4e8d-bbfe-a6e825d2357f fwd="184.72.152.166" dyno=web.1 connect=1ms service=30000ms status=503 bytes=0 protocol=https Jan 31 17:11:26 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/v1/azure/taskclusterqueuev1/table/QueueTaskDependency/read-write" host=auth.taskcluster.net request_id=9459a0d5-dddb-4ebe-822c-b7f96f157158 fwd="184.73.43.189" dyno=web.5 connect=1ms service=30001ms status=503 bytes=0 protocol=https
About an hour ago, web.5 got "sick": its connect= and service= times went up 50-100x: Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=cb1be7f6-0eea-48df-ad81-c9540f4eb64c fwd="54.163.184.30" dyno=web.2 connect=0ms service=2ms status=200 bytes=1235 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=578fb248-7c36-4211-9528-1a28072b1fb1 fwd="54.159.187.102" dyno=web.2 connect=4ms service=25ms status=200 bytes=1336 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=7d66634b-2125-4236-94df-609e804ca1cc fwd="184.73.43.189" dyno=web.2 connect=7ms service=27ms status=200 bytes=1207 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=fd602f58-b386-4f93-8400-14f41d15981a fwd="54.163.184.30" dyno=web.2 connect=1ms service=26ms status=200 bytes=1336 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=ab983373-05f8-4acf-901d-12461da62e71 fwd="54.81.60.223" dyno=web.2 connect=9ms service=83ms status=200 bytes=1275 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=46233eb8-5f93-4668-9c33-6033cfa2ad05 fwd="54.159.183.74" dyno=web.2 connect=2ms service=16ms status=200 bytes=1207 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=bc659872-3708-4e0d-abba-954865d85192 fwd="54.167.137.245" dyno=web.2 connect=7ms service=21ms status=200 bytes=1630 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=8e88a509-9e40-4083-8047-aaee7bc11cb5 fwd="54.234.210.209" dyno=web.2 connect=6ms service=3ms status=200 bytes=1036 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=eb773f00-7375-493a-ae96-5c7e43f7fa91 fwd="54.158.8.132" dyno=web.2 connect=16ms service=22ms status=200 bytes=1336 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=57e96787-47db-4f54-b3f4-f30b53a66aac fwd="54.145.51.224" dyno=web.2 connect=0ms service=4ms status=200 bytes=1273 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=ce40abd8-7211-407a-a487-b0e5e4cc6bd3 fwd="184.73.43.189" dyno=web.2 connect=11ms service=21ms status=200 bytes=1273 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=839741c7-30e1-47c5-9670-0c06ba0d1cef fwd="54.221.137.69" dyno=web.2 connect=2ms service=2ms status=200 bytes=1273 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=4e588181-23f0-433f-91bb-73a27ec6e01b fwd="54.235.0.209" dyno=web.2 connect=2ms service=111ms status=200 bytes=1258 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=7e6baea4-7e94-4e84-a131-4584e720794a fwd="174.129.102.130" dyno=web.2 connect=7ms service=25ms status=200 bytes=8713 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=10d893c1-3bb9-4d5d-a7ab-b1840c8afd28 fwd="54.159.212.164" dyno=web.2 connect=2ms service=10ms status=200 bytes=1199 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=8f22e6f0-8d7a-4ac1-a70a-f09ce76105a5 fwd="54.81.60.223" dyno=web.2 connect=4ms service=6ms status=200 bytes=1321 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=072b96a8-a188-4a3c-a8e1-3b4e13fbc9a6 fwd="54.159.183.74" dyno=web.2 connect=16ms service=32ms status=200 bytes=1199 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=f7b1fcdd-a498-461e-bbe2-5e5ac156ed4c fwd="54.198.219.77" dyno=web.2 connect=12ms service=9ms status=200 bytes=1273 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=c95eb199-36dc-49ac-a6ad-b15b25778923 fwd="54.196.33.252" dyno=web.2 connect=9ms service=5ms status=200 bytes=1235 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=c47059cd-4587-4471-8bf5-e93c9378ae48 fwd="174.129.102.130" dyno=web.2 connect=9ms service=11ms status=200 bytes=1333 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=9ded5bf8-2bce-492f-bff4-7975a814f965 fwd="50.16.12.181" dyno=web.2 connect=8ms service=8ms status=200 bytes=1404 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=d116b282-d9c4-4c82-bfa0-3616e0d7bd86 fwd="54.158.8.132" dyno=web.2 connect=2ms service=10ms status=200 bytes=1197 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=3ed5d221-5ef3-4483-aa30-e61af7886300 fwd="50.16.12.181" dyno=web.2 connect=3ms service=13ms status=200 bytes=1207 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=f16a00eb-30cc-49bd-a7db-86f00e1576b6 fwd="54.227.52.160" dyno=web.2 connect=2ms service=7ms status=200 bytes=1207 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=2694445e-859d-4739-8f59-a853b549d849 fwd="54.211.255.120" dyno=web.2 connect=3ms service=6ms status=200 bytes=1336 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=ea3b12f3-a29b-4b9a-8a1e-51d08a49ffaf fwd="54.221.137.69" dyno=web.2 connect=2ms service=9ms status=200 bytes=1324 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=aad621ec-2f2e-4475-a1b3-887fffa0c884 fwd="50.16.12.181" dyno=web.2 connect=2ms service=13ms status=200 bytes=1333 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=5543676f-2c7f-410d-b99e-9f434d0ff321 fwd="174.129.102.130" dyno=web.2 connect=1ms service=5ms status=200 bytes=8713 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=GET path="/v1/azure/taskclustersecretsv1/tables" host=auth.taskcluster.net request_id=c5aed901-665d-4422-9e7d-fedfa56c7691 fwd="54.245.179.83" dyno=web.2 connect=3ms service=438ms status=200 bytes=696 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=d7ef772a-b1cf-4875-87eb-48ad41348674 fwd="54.198.219.77" dyno=web.2 connect=13ms service=18ms status=200 bytes=1275 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=77c7cc58-2900-4926-a26e-94c450b2fa1b fwd="54.225.34.105" dyno=web.2 connect=14ms service=25ms status=200 bytes=1324 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=d64ea682-8410-4b89-a534-ee49934f2970 fwd="54.159.212.164" dyno=web.2 connect=0ms service=2ms status=200 bytes=1207 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=e1f26099-61ca-48fd-87f8-dbd817c04a30 fwd="54.211.255.120" dyno=web.2 connect=2ms service=16ms status=200 bytes=1273 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=43131ca3-7649-44b5-9d3e-713db00305bf fwd="54.205.247.112" dyno=web.2 connect=3ms service=2ms status=200 bytes=1036 protocol=https Jan 31 17:33:31 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=37b6368c-4d14-48e3-b41f-d76fda6f72a9 fwd="54.159.212.164" dyno=web.2 connect=4ms service=7ms status=200 bytes=1185 protocol=https Jan 31 17:33:32 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=75d2f63d-5cc5-46b6-a9c3-183a394905f1 fwd="54.196.33.252" dyno=web.2 connect=3ms service=4ms status=200 bytes=1172 protocol=https Jan 31 17:33:32 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=0c3d498d-1642-484e-94a0-8f3e3c9a7921 fwd="54.198.23.137" dyno=web.2 connect=62ms service=160ms status=200 bytes=1039 protocol=https Jan 31 17:33:32 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=c3d5b66d-1ffb-4504-aecd-19f91cce2dbc fwd="54.167.137.245" dyno=web.2 connect=89ms service=186ms status=200 bytes=1172 protocol=https Jan 31 17:33:32 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=538391b5-adee-47b7-9ac6-95af328ab925 fwd="54.158.8.132" dyno=web.2 connect=71ms service=166ms status=200 bytes=1327 protocol=https Jan 31 17:33:32 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=264924a4-5d10-4842-b958-520a897f5221 fwd="54.196.33.252" dyno=web.2 connect=143ms service=269ms status=200 bytes=1321 protocol=https Jan 31 17:33:32 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=b4556194-495c-4952-9c00-0b5c025e3476 fwd="54.158.8.132" dyno=web.2 connect=155ms service=284ms status=200 bytes=1273 protocol=https Jan 31 17:33:32 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=011dc8ae-01a5-4a5e-89e5-492ce9a7879c fwd="54.163.184.30" dyno=web.2 connect=149ms service=276ms status=200 bytes=1336 protocol=https Jan 31 17:33:32 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=b63a53d0-3ab1-40b9-a27c-298105eb5a95 fwd="54.226.78.236" dyno=web.2 connect=185ms service=115ms status=200 bytes=1288 protocol=https Jan 31 17:33:32 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=d9596ce9-b556-41de-9785-197108ad4999 fwd="54.159.212.164" dyno=web.2 connect=187ms service=117ms status=200 bytes=1273 protocol=https Jan 31 17:33:32 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=4db758fc-7117-4590-8e79-1592562752d7 fwd="174.129.102.130" dyno=web.2 connect=165ms service=249ms status=200 bytes=8713 protocol=https Jan 31 17:33:32 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=3cc2a923-73fc-42f1-aa2d-e10e607a0813 fwd="54.167.137.245" dyno=web.2 connect=80ms service=119ms status=200 bytes=1273 protocol=https Jan 31 17:33:32 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=26a1aa10-a974-4d59-88dd-13266250bf65 fwd="54.158.8.132" dyno=web.2 connect=199ms service=119ms status=200 bytes=1273 protocol=https Jan 31 17:33:32 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=8dfb6662-1463-4389-9426-f04ecd54ca81 fwd="54.198.23.137" dyno=web.2 connect=137ms service=118ms status=200 bytes=1207 protocol=https Jan 31 17:33:32 taskcluster-auth heroku/router: at=info method=GET path="/v1/azure/taskclusterauthv1/table/BackupTest171219/read-only" host=auth.taskcluster.net request_id=5a2311ff-89da-48e1-9a5f-2c36a0142a3a fwd="54.245.179.83" dyno=web.2 connect=147ms service=38ms status=200 bytes=862 protocol=https Jan 31 17:33:32 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=0ef47b0f-086f-44c6-b04b-61e714e39fc4 fwd="54.198.219.77" dyno=web.2 connect=149ms service=111ms status=200 bytes=1267 protocol=https Jan 31 17:33:33 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=551251c7-f11b-4604-9cf1-a46f3b0ab6f1 fwd="54.167.137.245" dyno=web.2 connect=91ms service=193ms status=200 bytes=1324 protocol=https Jan 31 17:33:33 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=bac8c92a-a870-44f6-b3d1-d532f7e846c1 fwd="54.234.210.209" dyno=web.2 connect=170ms service=246ms status=200 bytes=985 protocol=https Jan 31 17:33:33 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=40d29d4a-dce0-40d2-9b52-3a887e55e3a4 fwd="54.226.78.236" dyno=web.2 connect=91ms service=207ms status=200 bytes=1273 protocol=https Jan 31 17:33:33 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=faa1eb9c-d6ec-4a3d-b245-9fe94ce05856 fwd="54.226.78.236" dyno=web.2 connect=91ms service=196ms status=200 bytes=1273 protocol=https Jan 31 17:33:33 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=01686af4-2249-466b-8cb4-005d0ae7cf6b fwd="54.159.183.74" dyno=web.2 connect=94ms service=196ms status=200 bytes=1333 protocol=https Jan 31 17:33:33 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=02afe02d-7c46-4138-8429-9c10822df260 fwd="54.205.247.112" dyno=web.2 connect=179ms service=120ms status=200 bytes=1273 protocol=https Jan 31 17:33:33 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=41cbc21a-59fa-48dd-94e7-d2fd8ac814d7 fwd="54.162.85.43" dyno=web.2 connect=98ms service=224ms status=200 bytes=1273 protocol=https Jan 31 17:33:33 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=7245f4c0-519b-4d5f-80bc-a329fdc7ae5a fwd="54.221.137.69" dyno=web.2 connect=167ms service=312ms status=200 bytes=1273 protocol=https Jan 31 17:33:33 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=13ce4553-231d-4a92-8f76-e5b5cf3ddf22 fwd="54.81.60.223" dyno=web.2 connect=138ms service=284ms status=200 bytes=1199 protocol=https Jan 31 17:33:33 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=b4787007-c34c-4cfc-8f17-82b14336e545 fwd="54.205.42.160" dyno=web.2 connect=199ms service=312ms status=200 bytes=1925 protocol=https Jan 31 17:33:33 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=a0a67a7d-b074-47cb-ae43-854f1b017bab fwd="54.163.184.30" dyno=web.2 connect=207ms service=359ms status=200 bytes=1336 protocol=https Jan 31 17:33:33 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=e11fa900-ed93-4e41-9d2b-620673c7d9b2 fwd="50.16.12.181" dyno=web.2 connect=196ms service=332ms status=200 bytes=1341 protocol=https Jan 31 17:33:33 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=4156e177-f5bc-4d6a-8b7e-54a33fd010b3 fwd="54.226.199.31" dyno=web.2 connect=202ms service=361ms status=200 bytes=1273 protocol=https Jan 31 17:33:33 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=80e3074b-3195-4dac-a4df-c847016f9da9 fwd="54.81.60.223" dyno=web.2 connect=211ms service=330ms status=200 bytes=1324 protocol=https Jan 31 17:33:33 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=188e8baf-0750-4b80-9346-5a2a08428922 fwd="54.226.78.236" dyno=web.2 connect=209ms service=350ms status=200 bytes=1275 protocol=https Jan 31 17:33:33 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=9a472c57-7c41-4599-8a99-4372b6571371 fwd="54.81.60.223" dyno=web.2 connect=203ms service=357ms status=200 bytes=1275 protocol=https Jan 31 17:33:33 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=d98dfa7a-7a30-432f-9342-85ffcecade06 fwd="174.129.102.130" dyno=web.2 connect=206ms service=360ms status=200 bytes=8713 protocol=https Jan 31 17:33:33 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=091ec2e3-03da-48f5-b7c5-a00313f2beb1 fwd="54.81.60.223" dyno=web.2 connect=207ms service=370ms status=200 bytes=1333 protocol=https It stayed in this state until about 17:55 (the times sort of petered out over a few seconds) The connect times here are particularly suspcious. In previous cases of H12's, we've seen connect=3ms service=300001ms or the like -- very short connect times. But in this case, the connect times are in the 100's of ms, suggesting that the host network stack is delayed (https://devcenter.heroku.com/articles/http-routing#heroku-router-log-format indicates that connect= is the TCP connect time)
I filed a Heroku support ticket.
The suddenly-30s thing happened again. This time, I was editing roles (for bug 1425366) Jan 31 22:19:13 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=b09dff88-943a-4c91-9ff5-0283d1b7e344 fwd="54.146.210.91" dyno=web.8 connect=0ms service=3ms status=200 bytes=1275 protocol=https Jan 31 22:19:13 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=ce3ef411-298f-4fbc-9dac-c743e4e7e70c fwd="54.146.210.91" dyno=web.8 connect=1ms service=3ms status=200 bytes=1275 protocol=https Jan 31 22:19:13 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=6aa65d81-21ae-461c-8097-8332391e7304 fwd="54.196.253.158" dyno=web.8 connect=1ms service=3ms status=200 bytes=985 protocol=https Jan 31 22:19:13 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=9dba323c-27b3-432d-a323-facff37e4281 fwd="23.20.101.24" dyno=web.8 connect=1ms service=9ms status=200 bytes=1926 protocol=https Jan 31 22:19:14 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=c17fcb92-db7c-4085-8399-fd47f5edc1b1 fwd="54.80.212.111" dyno=web.8 connect=1ms service=7ms status=200 bytes=1273 protocol=https Jan 31 22:19:14 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=6891bf36-9304-4972-a288-96dac43fcd06 fwd="54.205.168.5" dyno=web.8 connect=1ms service=8ms status=200 bytes=1199 protocol=https Jan 31 22:19:14 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=57f70e36-5829-4385-aa3d-e6eeb7f87b59 fwd="54.196.253.158" dyno=web.8 connect=1ms service=9ms status=200 bytes=1199 protocol=https Jan 31 22:19:44 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=092e7b9c-48ce-4947-aed1-a935732a84f6 fwd="54.221.118.133" dyno=web.8 connect=3ms service=30000ms status=503 bytes=0 protocol=https Jan 31 22:19:44 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=92d951de-d1eb-4707-8c87-8a2d7d3c3e55 fwd="54.82.60.164" dyno=web.8 connect=3ms service=30001ms status=503 bytes=0 protocol=https Jan 31 22:19:44 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=74953333-af33-44ed-8edc-518bcef568e8 fwd="54.205.47.2" dyno=web.8 connect=1ms service=30000ms status=503 bytes=0 protocol=https Jan 31 22:19:44 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=95e1888c-41bf-415d-a22e-53a2025731df fwd="107.21.169.188" dyno=web.8 connect=1ms service=30001ms status=503 bytes=0 protocol=https Jan 31 22:19:44 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=9de139f1-2c7f-4607-b48b-ae2b3d98af78 fwd="54.224.157.36" dyno=web.8 connect=1ms service=30001ms status=503 bytes=0 protocol=https Jan 31 22:19:44 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=6ad63652-1153-4c1a-bcd9-7aa1085c5b6c fwd="54.92.248.240" dyno=web.8 connect=2ms service=30000ms status=503 bytes=0 protocol=https Jan 31 22:19:44 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=44a14f97-6404-4b09-8971-9facdc0d713b fwd="50.16.22.33" dyno=web.8 connect=2ms service=30001ms status=503 bytes=0 protocol=https Jan 31 22:19:44 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=fd6f1fef-f977-4e32-9d4f-dd945c113354 fwd="54.92.248.240" dyno=web.8 connect=5ms service=30000ms status=503 bytes=0 protocol=https Jan 31 22:19:44 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=090bd544-2ca3-4142-b683-03ca4af7177a fwd="23.20.101.24" dyno=web.8 connect=4ms service=30000ms status=503 bytes=0 protocol=https
That said, the last change I made was quite a bit earlier: Jan 31 22:10:52 taskcluster-auth heroku/router: at=info method=POST path="/v1/roles/worker-type%3A*" host=auth.taskcluster.net request_id=84da4320-5551-4b1f-929a-5f94526eedce fwd="100.4.199.178" dyno=web.1 connect=1ms service=972ms status=200 bytes=2657 protocol=https So it's not clear exactly what this was.. A restart all all dynos seems to have cleared up auth's issue. I don't see any downstream damage..
web.6 got sick about 3h ago Feb 05 09:38:00 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=f4429925-2cdd-4fd9-b1dc-6cbb685249c0 fwd="54.163.108.178" dyno=web.6 connect=0ms service=3ms status=200 bytes=1039 protocol=https Feb 05 09:38:00 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=1879a5ec-4b8f-4d7a-9188-17fd273d04da fwd="54.166.84.39" dyno=web.6 connect=1ms service=4ms status=200 bytes=1273 protocol=https Feb 05 09:38:00 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=f5afafea-04a3-45bb-bae6-357e18abea77 fwd="54.211.32.103" dyno=web.6 connect=1ms service=4ms status=200 bytes=1207 protocol=https Feb 05 09:38:30 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=58ac1959-d0ff-47e2-844c-58ad76a546aa fwd="54.166.13.82" dyno=web.6 connect=2ms service=30000ms status=503 bytes=0 protocol=https Feb 05 09:38:30 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=5df68350-4261-468b-8926-17fcd94e3680 fwd="23.20.7.207" dyno=web.6 connect=1ms service=30000ms status=503 bytes=0 protocol=https Feb 05 09:38:30 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=c339939f-7a08-4546-96ae-dae69d9e3939 fwd="54.198.33.150" dyno=web.6 connect=1ms service=30000ms status=503 bytes=0 protocol=https Feb 05 09:38:30 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=316c434c-1854-40b1-87e3-de4e5aa9208e fwd="54.196.247.57" dyno=web.6 connect=0ms service=30000ms status=503 bytes=0 protocol=https Feb 05 09:38:30 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=e1bd4559-e867-4835-9de2-ceb67b23f12b fwd="54.221.14.146" dyno=web.6 connect=4ms service=30000ms status=503 bytes=0 protocol=https ... Feb 05 09:43:26 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=1a16b111-bcf6-4d8e-ad95-2d60f7668d15 fwd="23.20.213.113" dyno=web.6 connect=4ms service=30000ms status=503 bytes=0 protocol=https Feb 05 09:43:26 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=5e352aae-98be-4313-83bb-2fa76dc36719 fwd="54.196.247.57" dyno=web.6 connect=4ms service=30000ms status=503 bytes=0 protocol=https Feb 05 09:43:27 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=81b7730a-e1b7-4941-887e-65c0c562ed19 fwd="54.226.169.192" dyno=web.6 connect=1012ms service=30000ms status=503 bytes=0 protocol=https Feb 05 09:43:27 taskcluster-auth heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=3fc8c1a6-13b2-475a-9b55-c51bff915c44 fwd="54.146.48.11" dyno=web.6 connect=3002ms service=30001ms status=503 bytes=0 protocol=https Feb 05 09:43:27 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=8e95ccdd-cb13-4534-bba1-9d5778af3abd fwd="54.146.48.11" dyno=web.6 connect=1ms service=5ms status=200 bytes=1535 protocol=https Feb 05 09:43:27 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=b1e69033-1174-47a6-916a-1d86dba962fd fwd="54.226.169.192" dyno=web.6 connect=1ms service=43ms status=200 bytes=1273 protocol=https Feb 05 09:43:27 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=41fe756c-dc77-43a6-a94f-41814f02d4be fwd="54.91.150.116" dyno=web.6 connect=1ms service=3ms status=200 bytes=1324 protocol=https Feb 05 09:43:27 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=c8eae5db-a3b2-4878-91be-be3703c2e3ee fwd="54.224.241.210" dyno=web.6 connect=1ms service=3ms status=200 bytes=1273 protocol=https Feb 05 09:43:27 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=ccceb8e5-5ae0-4ce2-a314-7bdfe3b66c67 fwd="54.163.108.178" dyno=web.6 connect=1ms service=4ms status=200 bytes=1324 protocol=https Feb 05 09:43:27 taskcluster-auth heroku/router: at=info method=POST path="/v1/authenticate-hawk" host=auth.taskcluster.net request_id=d08ef674-940b-4852-85f0-f0867cb9497d fwd="54.205.244.43" dyno=web.6 connect=0ms service=2ms status=200 bytes=1207 protocol=https It seems to have fixed itself this time. There's still no ramp-up or ramp-down on the service= times: everything is going fine, then suddenly this dyno is not responding in 30s, then about five minutes later it's back to replying in a few ms. It never takes 1s or 5s or 27s. I do not see any related logging at either the start or end time or 30s before the start time. I haven't heard back from Heroku about the support request.
I added statsum* to DEBUG for the production service to try to see when datapoints are being submitted
Looking through the list of middleware https://github.com/taskcluster/taskcluster-lib-api/blob/master/src/api.js#L874 I don't see anything else that could be blocking a response for >30s. I added taskcluster-lib-monitor to DEBUG too, just in case.
Just saw a similar thing in tc-secrets. This service runs two web dynos, and web.2 got "sick": Feb 05 17:40:18 taskcluster-secrets heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/v1/secret/project/releng/gecko/build/level-3/adjust-sdk.token" host=secrets.taskcluster.net request_id=03666874-49a0-463b-b99c-a64d36eda7a1 fwd="52.205.255.138" dyno=web.2 connect=5ms service=30001ms status=503 bytes=0 protocol=https ... Feb 05 17:46:35 taskcluster-secrets heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/brighton" host=secrets.taskcluster.net request_id=2a9c606c-2a83-4544-8884-3bd33d8f7f7f fwd="89.136.108.180" dyno=web.2 connect=3ms service=30000ms status=503 bytes=0 protocol=https Feb 05 17:46:40 taskcluster-secrets heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/v1/secret/project/taskcluster/gecko/hgfingerprint" host=secrets.taskcluster.net request_id=2b24c347-c2cf-4c27-b6a6-761e4e552a15 fwd="13.56.236.249" dyno=web.2 connect=1ms service=30000ms status=503 bytes=0 protocol=https Feb 05 17:46:53 taskcluster-secrets heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/v1/secret/project/taskcluster/gecko/hgfingerprint" host=secrets.taskcluster.net request_id=45893eab-b075-4281-ab72-cfd3dc930697 fwd="54.175.204.156" dyno=web.2 connect=7ms service=30002ms status=503 bytes=0 protocol=https Feb 05 17:46:59 taskcluster-secrets heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/v1/secret/repo%3Agithub.com%2Fmozilla-releng%2Fservices%3Abranch%3Astaging" host=secrets.taskcluster.net request_id=33f85b2a-5532-43c6-8bce-479b2a3d5f1f fwd="54.236.138.100" dyno=web.2 connect=1ms service=30000ms status=503 bytes=0 protocol=https Feb 05 17:47:07 taskcluster-secrets heroku/router: at=error code=H12 desc="Request timeout" method=GET path="/v1/secret/project/releng/gecko/build/level-3/gapi.data" host=secrets.taskcluster.net request_id=642b4f84-6906-4fe4-a9a5-1538990041e8 fwd="34.228.245.124" dyno=web.2 connect=4ms service=30000ms status=503 bytes=0 protocol=https The GET /v1/secret/<secret> API endpoint does call out to Azure, so slow or retried responses from Azure would cause this failure. Those calls are asynchronous, so waiting on Azure would not block other requests. However, /brighton is not a valid path, and the Express router should have quickly turned that into a 404 with no remote calls. So something "froze" this dyno for about seven minutes. Right at the end of that: Feb 05 17:46:43 taskcluster-secrets heroku/web.2: State changed from up to down Feb 05 17:46:43 taskcluster-secrets heroku/web.2: State changed from down to starting Feb 05 17:46:47 taskcluster-secrets heroku/web.2: Starting process with command `node src/main.js server` Feb 05 17:46:53 taskcluster-secrets heroku/web.2: State changed from starting to up so heroku noticed, or something. Notably, that dyno restart does not appear on the Heroku dashboard Looking back in the logs from comment 9, I see (interspersed with many H12's): Feb 05 09:43:12 taskcluster-auth heroku/web.6: State changed from up to down Feb 05 09:43:12 taskcluster-auth heroku/web.6: State changed from down to starting Feb 05 09:43:16 taskcluster-auth heroku/web.6: Starting process with command `node src/main server` Feb 05 09:43:22 taskcluster-auth app/web.6: 2018-02-05T09:43:22.322Z auth:ScopeResolver Loading clients and roles Feb 05 09:43:26 taskcluster-auth heroku/web.6: State changed from starting to up which corresponds to the end of that outage, too.
Summary: Investigate periodic H12's in Auth service → Investigate periodic H12's in Heroku services
I updated the Heroku support request with more details. Hopefully they'll get back to us soon -- nothing for five days now.
Support confirms that this is an issue with shutdowns of dyno hosts for planned migrations. They indicate it is fixed and that we should not see further examples of it.
I'll close in a bit if this is no longer happening.
Assignee: jopsen → dustin
I haven't seen anything.. Heroku says there are still minor issues around dyno relocations, etc., but the minutes-long outages are fixed. The minor issues will cause one-off 500's etc. that should be automatically retried.
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
We just saw this on taskcluster-queue. web.3 (out of 25!) got sick. Lots of Feb 15 17:02:45 taskcluster-queue heroku/router: at=error code=H12 desc="Request timeout" method=POST path="/v1/claim-work/aws-provisioner-v1/gecko-1-b-win2012" host=queue.taskcluster.net request_id=e61ebbf9-a4a5-465c-8f42-4b74beacc700 fwd="34.209.25.0" dyno=web.3 connect=318ms service=30000ms status=503 bytes=0 protocol=https Restarting all dynos fixed it. It's starting to look like Heroku just isn't up to the task of hosting something with this level of uptime requirement.
(In reply to Dustin J. Mitchell [:dustin] pronoun: he from comment #17) > We just saw this on taskcluster-queue. web.3 (out of 25!) got sick. Lots of The performance dynos are single-tenant (unlike P1/P2). Given the sheer number of P1s in use for taskcluster-queue, would switching to fewer performance-m dynos (at a higher concurrency) perhaps provide an overall greater performance/$ and improved reliability? https://devcenter.heroku.com/articles/heroku-enterprise#available-dyno-types The guide here suggests that the performance dynos have fewer relocations: https://help.heroku.com/KR6BGO6A/what-do-relocating-dyno-to-a-new-server-messages-mean (I guess it depends on whether this bug is caused by relocations, or corrected by them [and several minutes later than would have been ideal])
I'll leave that question to Jonas -- the same question applies for auth, which is using 2X dynos right now.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Occurred again just now. This happened to treeherder Friday, too. It appears that Heroku now includes a "Platform Initiated" dyno restart when this occurs. It's still a 5-10 minute outage, though.
I got a note from Heroku saying that on Feb 16 they fixed another major source of this issue. It is, indeed, becoming less frequent. I'd like to start dialing back the resources afforded to the auth service (which will, surprisingly, reduce our exposure to this issue!) tomorrow.
Memory usage is way down and load is hovering around 0.5, so I'm going to dial this back from 8 to 6 2X dynos.
..another outage, unrelated to my change
Numbers still look OK, so dialing down to 5 dynos
That seems fine, too.
Status: REOPENED → RESOLVED
Closed: 7 years ago7 years ago
Resolution: --- → FIXED
Component: Authentication → Services
You need to log in before you can comment on or make changes to this bug.