Closed Bug 1286935 Opened 8 years ago Closed 8 years ago

Requests to queue.taskcluster.net from Heroku sometimes taking up to 30 seconds

Categories

(Taskcluster :: Services, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: jhford)

References

Details

Attachments

(3 files)

Treeherder's log parser fetches logs from queue.taskcluster.net in order to parse them. For the last 12 hours, our SCL3 instance was seeing request times between 1.0-1.5s:
https://rpm.newrelic.com/accounts/677903/applications/5585473/externals?tw%5Bend%5D=1468523469&tw%5Bstart%5D=1468480269#id=5b2245787465726e616c2f71756575652e7461736b636c75737465722e6e65742f616c6c222c22225d&sort_by=average_call_time
(will attach graph screenshots shortly for those without New Relic access)

However, for the same timeframe, our Heroku instance saw times between 2-29s:
https://rpm.newrelic.com/accounts/677903/applications/14179733/externals?tw%5Bend%5D=1468523454&tw%5Bstart%5D=1468480254#id=5b2245787465726e616c2f71756575652e7461736b636c75737465722e6e65742f616c6c222c22225d&sort_by=average_call_time

An example of a log that was slow to fetch:
https://queue.taskcluster.net/v1/task/G4y51plRTzu3UgrjjFWkLQ/runs/0/artifacts/public%2Flogs%2Flive_backing.log

I'm presuming this may be because the Heroku instance's requests are being routed through cloud-mirror. As such, I'm guessing that the problem is either:
a) cloud-mirror was under too much load at those times
b) Treeherder was the first consumer to request that log, and the first fetch just takes a while (until cloud-mirror has mirrored it to the relevant region)
c) some other issue?

Some questions:

1) Is there logging somewhere for cloud-mirror that can show how long requests are taking / the current load? If not, could we add some?

2) I see cloud-mirror's `maxWaitForCachedCopy` is set to 28000ms, which matches the max times we're seeing. Can we lower this please?

3) Given that I'm pretty sure Treeherder is the majority consumer of these logs (can we work out some stats? it sets a unique user-agent) - and Heroku's US region choice (where Treeherder will soon live) is us-east - should we just store the Taskcluster logs in us-east to start with, to save having to always use cloud-mirror? Is there a reason Taskcluster's AWS instances use us-west? (The Heroku parts of Taskcluster are us-east already.)
Flags: needinfo?(jhford)
Flags: needinfo?(garndt)
Attached image scl3-request-durations.jpg (deleted) —
Attached image heroku-request-durations.jpg (deleted) —
As far as configuring cloud-mirror goes, I will defer that to John.

I'm pretty sure Treeherder might be the only consumer of the logs from within AWS. There are people that will view them from the task inspector or the raw log button in Treeherder, but that's outside of AWS and not using cloud-mirror. I can't think of anything within AWS that would be requesting those logs other than Treeherder, and since Heroku runs in us-east and our canonical artifact bucket is us-west-2, that guarantees that every log for jobs reported to Treeherder will be transferred over to us-east.

Our instances producing those logs can be spawned in any of the three US EC2 regions, but the upload of logs always goes to our canonical bucket, us-west-2, first. We have talked before about posting artifacts to buckets within the region and then somehow getting those back into us-west-2, but I don't think there was much traction there.

There is also no way to request an artifact and explicitly say not to use cloud-mirror. It's an automatic operation by the queue.
Attached image us-east-1_copy_times.png (deleted) —
Here are the copy times we have for us-east-1, but they're not broken down per artifact, so not terribly helpful. We have a lot of artifacts of various sizes being transferred (some being many gigabytes, resulting in higher times).
Flags: needinfo?(garndt)
Thank you, that's helpful. Some more questions (in addition to #1/#2 in comment 0 for John):

4) It is my understanding that the point of cloud-mirror is to save money (and I guess time) by avoiding multiple region-to-region file transfers for frequently requested resources. For docker images or other similar artifacts (I'm just guessing as to what else it mirrors) this makes perfect sense, but I wonder if logs are normally only ever accessed once from AWS, by just Treeherder? In which case perhaps cloud-mirror is redundant for logs specifically? Although "Treeherder" means both the stage and prod instances of Treeherder, to be fair.

5) Even if #4 isn't true (and we still want to use cloud-mirror for logs), it would presumably be cheaper to make the canonical S3 bucket be in us-east, rather than always uploading to us-west and immediately mirroring to us-east.

It's worth noting as well that Treeherder requests every job's log, not just those for failing jobs.

(In reply to Greg Arndt [:garndt] from comment #3)
> We have talked before about posting artifacts to buckets
> within the region and then somehow get those back into us-west-2, but I
> don't think there was much traction there.

I don't think there would be much cost saving from doing this, given #5 above?

> There is also not a way to request a artifact and explicitly say to not use
> cloud-mirror. It's an automatic operation by the queue.

Ah, that was going to be my next question.
Also it would seem that in the cases where cloud-mirror hits maxWaitForCachedCopy, we get the worst of all worlds:
* a long request time for Treeherder
* S3 transfer cost for Treeherder connecting directly to the original resource
* S3 transfer cost from the cloud-mirror mirroring attempt, which is going to finish too late to be any use (and the cached resource may never get used, since Treeherder may end up having been the only AWS tool to request it)
Note: we can't disable cloud-mirror for logs only. The only way to do so would be a hack/regexp on the name... I don't want to go there :)

I am not sure we can/should solve this. Cost-wise it probably/hopefully doesn't matter what we do, as logs are ideally not a significant portion of traffic.

Maybe cloud-mirror could be taught to skip caching of small artifacts... Or do the caching of small artifacts in the web process... though I'll admit that is a bit scary too.

@jhford, maybe we should review the cloud-mirror stats to see if cache hit rates have changed recently. I hope Treeherder is just a small number of requests, but it is hard to tell...
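A minimal sketch of the "skip caching of small artifacts" idea, assuming cloud-mirror can afford a HEAD request per artifact (the 8 MB threshold and helper name are made up here, not cloud-mirror's actual code):

======================================================================
var request = require('request-promise');

// Arbitrary cut-off below which mirroring is assumed not to be worth it.
var SMALL_ARTIFACT_THRESHOLD = 8 * 1024 * 1024; // 8 MB

// Returns true if the artifact is big enough to be worth an inter-region
// copy; small artifacts would instead get a 302 straight to the canonical
// bucket rather than being queued for a copy node.
async function shouldMirror(upstreamUrl) {
  let response = await request.head({
    uri: upstreamUrl,
    resolveWithFullResponse: true,
  });
  let size = parseInt(response.headers['content-length'] || '0', 10);
  return size >= SMALL_ARTIFACT_THRESHOLD;
}
======================================================================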
(In reply to Jonas Finnemann Jensen (:jonasfj) from comment #7)
> I am not sure we can/should solve this. Cost-wise it probably/hopefully
> doesn't matter what we do, as logs are ideally not a significant portion of
> traffic..

This bug was filed about the 15x slowdown affecting Treeherder due to cloud-mirror, rather than about cost. So we really do need to fix this bug.

Can we just upload all logs to us-east instead? (Presumably cloud-mirror then wouldn't need to do anything, so the slowdown would go away.)
(In reply to Ed Morley [:emorley] from comment #0)
> Some questions:
>
> 1) Is there logging somewhere for cloud-mirror that can show how long
> requests are taking / current load? If not, could we add some?

Yes: https://app.signalfx.com/#/dashboard/Ci0GbQDAcAM?startTime=-1d&endTime=Now

I'm not sure what you need to do to access this, but it shows all of our metrics regarding the current copy speed and copy times, as well as our cache hit and miss rates. We usually have a very good rate as long as the inter-region transfer speeds are good.

> 2) I see cloud-mirror's `maxWaitForCachedCopy` is set to 28000ms, which
> matches the max times we're seeing. Can we lower this please?

For larger artifacts, we really do need to have >25000ms. Getting into special timeouts for different types of artifacts is a lot of added complexity and likely wouldn't help in the end, unless we're starved for copy nodes. If we're bound by the time it takes for a cloud-mirror copy node to do the file transfer, your process would also be bound by the same speed.

I don't have metrics right now on how many items are waiting in the SQS copy queue, but it's theoretically possible that there aren't enough copiers around. I will look into adding a metric that periodically monitors how many items are in the work queue, and into increasing the number of copy nodes. Sadly, docker cloud is very limiting here: we can only have ten containers in a service without a couple of janky hacks. I'm considering dropping docker cloud and just deploying on bare metal. I've filed bug 1287429 to track the monitoring work. Dropping docker-cloud is in the very early stages.

> 3) Given I'm pretty sure Treeherder is the majority consumer of these logs
> (can we work out some stats? it sets a unique user-agent) - and Heroku's US
> region choice (where Treeherder will soon live) is in US-east, should we
> just store the taskcluster logs in US-east to start with, to save having to
> always use cloud-mirror? Is there a reason Taskcluster's AWS instances use
> US-west? (The Heroku parts of Taskcluster are US-east already)

I'm not sure why us-west-2 was chosen as the canonical region, but it was. I'm not sure what would be involved in switching to us-east-1, but I don't personally have any issues with that change happening. I don't know if special-casing logs is really the right thing to do, or if it would be better to just move all artifacts. Jonas could speak more to this than me.
Flags: needinfo?(jhford)
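For reference, the maxWaitForCachedCopy behaviour discussed above amounts to "poll for the cached copy, then fall back to the canonical URL". A rough illustration, not cloud-mirror's actual implementation (the lookup function and poll interval are placeholders):

======================================================================
var MAX_WAIT_FOR_CACHED_COPY = 28000; // ms, the config value discussed above
var POLL_INTERVAL = 500;              // ms, made up

function sleep(ms) {
  return new Promise(resolve => setTimeout(resolve, ms));
}

// lookupCachedCopy is a placeholder for whatever backend holds the cache
// state (e.g. a Redis lookup); it resolves to the in-region URL or null.
async function redirectUrlFor(upstreamUrl, lookupCachedCopy) {
  let deadline = Date.now() + MAX_WAIT_FOR_CACHED_COPY;
  while (Date.now() < deadline) {
    let cached = await lookupCachedCopy(upstreamUrl);
    if (cached) {
      return cached;   // 302 to the copy in the requester's region
    }
    await sleep(POLL_INTERVAL);
  }
  return upstreamUrl;  // give up and 302 to the canonical bucket
}
======================================================================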
Ed, as a test, I've created a second identical service in docker cloud and bumped us up to 20 total copier processes running on 4 m4.xlarges. Please let me know if your times go down or not.
Flags: needinfo?(emorley)
Relevant pull request: https://github.com/taskcluster/cloud-mirror/pull/18

This will let us see how long each message waits before its work starts.
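The measurement itself could be as simple as stamping each SQS message when it is enqueued and reporting the delta when a copier picks it up. A sketch under that assumption (the field and metric names here are illustrative, not the PR's actual schema):

======================================================================
var AWS = require('aws-sdk');
var sqs = new AWS.SQS({region: 'us-west-2'});

// Enqueue a copy request with an enqueue timestamp embedded in the body.
async function enqueueCopy(queueUrl, upstreamUrl) {
  await sqs.sendMessage({
    QueueUrl: queueUrl,
    MessageBody: JSON.stringify({
      url: upstreamUrl,
      enqueuedAt: Date.now(),
    }),
  }).promise();
}

// When a copier dequeues the message, report how long it sat waiting.
function reportWaitTime(rawMessageBody, reportMetric) {
  let body = JSON.parse(rawMessageBody);
  reportMetric('cloud-mirror.sqs-message-wait-ms', Date.now() - body.enqueuedAt);
}
======================================================================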
Thank you for spinning up the additional instance and opening that PR. The long (20-30s response time) requests come in spikes throughout the day, so we'll need to wait a bit to see how things have improved.

I've sent you an invite to the Treeherder New Relic account, so you can follow along too, at:
https://rpm.newrelic.com/accounts/677903/applications/14179733/externals#id=5b2245787465726e616c2f71756575652e7461736b636c75737465722e6e65742f616c6c222c22225d

(Use the time adjustment menu to switch to e.g. the 24 hour or 3 day view to compare.)
Flags: needinfo?(emorley)
Sadly there was another massive slowdown between 05:35 and 06:55 UTC+1 today.

Permalink of the last 3 days until now:
https://rpm.newrelic.com/accounts/677903/applications/14179733/externals?tw%5Bend%5D=1468912544&tw%5Bstart%5D=1468653344#id=5b2245787465726e616c2f71756575652e7461736b636c75737465722e6e65742f616c6c222c22225d

(In reply to John Ford [:jhford] from comment #9)
> I'm not sure why us-west-2 was chosen as the canonical region, but it was.
> I'm not sure what would be involved in switching to us-east-1, but I don't
> personally have any issues with that change happening. I don't know if
> special casing logs is really the right thing to do, or if it would be
> better to just move all artifacts. Jonas could speak more to this than me.

Jonas?
Flags: needinfo?(jopsen)
UTC+1 is London time right now, right? Berlin/CEST is +2? If so, that corresponds directly to a large spike in all transfer times. It looks like it was related to extremely large artifacts being requested. I suspect these are B2G artifacts. The transfer speeds remained relatively constant, but the transfer times were a lot longer. I suspect that these extremely large files are starving out the copiers for the smaller files.

Based on the CPU utilization being so low, I've added another 10 containers, but left the number of nodes static. This *should* make it harder to become starved for copy nodes, but until I land PR #18, I won't know for sure. That's what I'm about to do.

I think something needs to be done about these massive files. They're taking up to 6 minutes to complete, which isn't acceptable. I don't know much about the jobs that are using these files. Greg, Wander, do you know what's going on here? Are these things that we really do need to be caching? I suspect that if they are, we need to do some special work in the tasks to ensure 100% that they are fetched from the cache and never redirected to the original URL.

If we're confident that we can trust Content-Length, I think we could do a couple of things to mitigate this issue:

1. Ignore files that have a Content-Length above 2GB. These files are definitely not going to transfer in the window; maybe it's better to just skip them. The downside is that these are the files we get the biggest caching win with.
2. Create two buckets (and thus two work queues) of really-big files and normal-sized files, and have a small pool of containers that work through the big files, leaving the normal containers to pick up the slack (see the sketch after this comment).

And a couple of other things to consider:

1. Move artifacts to us-east-1. We're so tightly coupled to Heroku, it's sort of weird that we use us-west-2 as our canonical region.
2. Figure out how to deploy Heroku apps in us-west-2. I saw something about Heroku being able to pick the AWS region, but I don't know the details.
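A rough sketch of the second mitigation above (two work queues split by size); the queue URLs and the 2 GB cut-off are placeholders, not real configuration:

======================================================================
var AWS = require('aws-sdk');
var request = require('request-promise');

var sqs = new AWS.SQS({region: 'us-west-2'});
var BIG_FILE_QUEUE = 'https://sqs.us-west-2.amazonaws.com/EXAMPLE/big-copies';
var NORMAL_QUEUE   = 'https://sqs.us-west-2.amazonaws.com/EXAMPLE/normal-copies';
var BIG_FILE_THRESHOLD = 2 * 1024 * 1024 * 1024; // 2 GB

// Route each copy request by Content-Length so multi-gigabyte files cannot
// starve the copiers that handle ordinary artifacts such as logs.
async function enqueueBySize(upstreamUrl) {
  let response = await request.head({
    uri: upstreamUrl,
    resolveWithFullResponse: true,
  });
  let size = parseInt(response.headers['content-length'] || '0', 10);
  let queueUrl = size >= BIG_FILE_THRESHOLD ? BIG_FILE_QUEUE : NORMAL_QUEUE;
  await sqs.sendMessage({
    QueueUrl: queueUrl,
    MessageBody: JSON.stringify({url: upstreamUrl, size: size}),
  }).promise();
}
======================================================================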
Bug 1287450 is interfering with my ability to get this done.
(In reply to John Ford [:jhford] from comment #14)
> 1. figure out how to deploy heroku apps in us-west-2. I saw something about
> heroku being able to pick aws region, but I don't know the details.

This isn't possible unless we (a) persuade whomever pays for Heroku to switch us to their Enterprise contract, and (b) once on Enterprise, switch from the 'common runtime' to 'private spaces' (and this requires a manual migration of the app).

(Though that said, I would like to persuade them to switch to Enterprise regardless, since we could really do with SSO and the finer-grained team permissions.)

$ heroku regions
ID         Location                 Runtime
─────────  ───────────────────────  ──────────────
eu         Europe                   Common Runtime
us         United States            Common Runtime
frankfurt  Frankfurt, Germany       Private Spaces
oregon     Oregon, United States    Private Spaces
tokyo      Tokyo, Japan             Private Spaces
virginia   Virginia, United States  Private Spaces

And:

$ http --session=heroku -b https://api.heroku.com/regions/us
{
    "country": "United States",
    "created_at": "2012-11-21T21:44:16Z",
    "description": "United States",
    "id": "59accabd-516d-4f0e-83e6-6e3757701145",
    "locale": "Virginia",
    "name": "us",
    "private_capable": false,
    "provider": {
        "name": "amazon-web-services",
        "region": "us-east-1"
    },
    "updated_at": "2015-08-20T01:37:59Z"
}

$ http --session=heroku -b https://api.heroku.com/regions/oregon
{
    "country": "United States",
    "created_at": "2015-08-20T01:37:59Z",
    "description": "Oregon, United States",
    "id": "be90a7d3-570f-4ba7-bc42-f33c2cbece22",
    "locale": "Oregon",
    "name": "oregon",
    "private_capable": true,
    "provider": {
        "name": "amazon-web-services",
        "region": "us-west-2"
    },
    "updated_at": "2015-08-20T01:37:59Z"
}

See: https://devcenter.heroku.com/articles/regions
I just switched from 4x m4.xlarge to 2x c4.4xlarge, which cost roughly the same amount of money. It seems to be lowering the transfer times. It looks like this instance type has much better inter-region transfer speeds: with the same number of containers, it appears that the 2.8GB file that was taking 4.86 minutes is now taking 2.26 minutes. This is not a solution, but if we can double the throughput of even the slowest files, we'll work through the queue quicker.
A few random thoughts here:

* Ignoring large files for the moment seems likely to alleviate this issue short-term (at a cost in transfer).
* This seems like the perfect job for AWS SQS + autoscaling, where AWS itself would spin up the additional capacity during spikes. The autoscaled instances could either run the copy operations in multiple node processes, or in multiple docker containers configured to start on boot.
* We might consider read-through (that is, redirecting to the source object) after a short wait for a copy to complete. That short wait could scale with object size, so it's 500ms for a small JSON blob but tens of seconds for a multi-GB device image.
* It may also be useful to "pre-warm" certain caches on task completion. This could be a simple pulse listener that issues a specially-crafted "HEAD" request to cloud-mirror for artifacts matching a list of regexes (see the sketch after this comment). This would avoid, or at least reduce, the wait-for-copy penalty paid by the first requester.

In determining the best solution, it would be helpful to know:
- how many times an artifact is requested (ideally as a function of size)
- when, relative to a task's completion, an artifact is requested
- what the outliers are from these patterns (e.g., docker images, tc-vcs caches)

This information should allow generation of a simple model of hit rate, storage costs, and requester wait time, which are the parameters we're trying to balance.
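A minimal sketch of the pre-warm idea from the last bullet, assuming that a HEAD through the queue's artifact endpoint (followed through its redirect) is enough to kick off the cloud-mirror copy; the regex list and function shape are invented for illustration:

======================================================================
var request = require('request-promise');

// Artifacts worth warming as soon as the task completes.
var PREWARM_PATTERNS = [
  /^public\/logs\/live_backing\.log$/,
];

async function prewarmArtifacts(taskId, runId, artifactNames) {
  for (let name of artifactNames) {
    if (!PREWARM_PATTERNS.some(re => re.test(name))) {
      continue;
    }
    let url = 'https://queue.taskcluster.net/v1/task/' + taskId +
              '/runs/' + runId + '/artifacts/' + encodeURIComponent(name);
    // The HEAD response itself is discarded; its only purpose is to start
    // the copy before the first real consumer (e.g. Treeherder) asks.
    await request.head({uri: url, followAllRedirects: true, simple: false});
  }
}
======================================================================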
> I'm not sure why us-west-2 was chosen as the canonical region, but it was.

Back in the day, us-east-1 didn't have read-after-write consistency for the first PUT. In fact, us-east-1 AFAIK still offers some endpoints with eventual consistency, though it should be said that it now has endpoints that are read-after-write consistent for the first PUT.

As the bucket is stored in the azure entities for artifacts, we can change the canonical bucket. But it would take some minor tricks, and some asserts in the code would have to be removed; cloud-mirror would also have to be moved. If there is significant interest, it's something we can put on the road map.
Flags: needinfo?(jopsen)
If we can solve this bug by increasing the instance sizes and/or numbers for cloud-mirror, or else via some of the other suggestions in e.g. comment 14 / comment 18, then which region is used for storage makes very little difference to me. (It will just be up to the Taskcluster team to make the cost/benefit tradeoff of comment 19.)

We just need to find _a_ solution to this in the extremely short term, since in 1-2 weeks Treeherder will be on Heroku, and any log parsing issues due to cloud-mirror will become a tree-closing event.
(In reply to Ed Morley [:emorley] from comment #20)
> If we can solve this bug via increasing instance sizes/and or numbers for
> cloud-mirror, or else via some of the other suggestions in eg comment 14 /
> comment 18, then which region is used for storage makes very little
> difference to me. (It will just be up to the Taskcluster team to make a
> cost/benefit tradeoff of comment 19.)

I switched to a significantly better instance type for the copier and have gone to a total of 40 copy nodes. My graphs are a lot better now.

> We just need to find _a_ solution to this in the extremely short term, since
> in 1-2 weeks Treeherder will be on Heroku, and any log parsing issues due to
> cloud-mirror will become a tree closing event.

In that case, this bug would not cause a tree-closing event. Barring exceptional cases, cloud-mirror guarantees a 302 redirect to a copy of the artifact in your region or the canonical one within 30 seconds.
(In reply to John Ford [:jhford] from comment #21)
> In that case, this bug would not cause a tree-closing event. Barring
> exceptional cases, cloud-mirror guarantees a 302 redirect to a copy of the
> artifact in your region or the canonical one within 30 seconds.

That timeout makes the entire parsing job take 15x longer. Until now the workload has been CPU-bound, so we're not using eventlet with celery, and this causes a backlog.
(In reply to John Ford [:jhford] from comment #21)
> I switched to a significantly better instance type for the copier and have
> gone to a total of 40 copy nodes. My graphs are a lot better now

Many thanks :-)
So it looks like that with the c4.4xlarge instead of m4.xlarge and the increased number of docker cloud containers, we've reduced the peak time in New Relic from ~28s to ~4.5s. The average time is ~2.5s. The changes have been in production for >24 hours with the new node type, and that period included many of the large files that were previously thought to be causing the delay. Ed and I discussed this on IRC and we've decided to call this fixed.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Many thanks! :-)
Assignee: nobody → jhford
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I'm writing a patch right now to monitor the number of messages that are waiting and in progress. Once deployed, we will see whether the issue is copy nodes being starved or something else. As long as we aren't starved for copy nodes, cloud-mirror really doesn't add much, if any, overhead vs. doing the request yourself inside Heroku directly to us-west-2.

We have lots of data showing that the us-west-2 -> us-east-1 transfer speeds are quite low. I'm doing a few different things to try to improve the situation, but this is likely down to low speeds between the regions.
I am now monitoring the number of pending and inflight messages with SQS. This will tell us for sure whether the issue is transfer speeds or an application issue with cloud-mirror. As long as we don't see large numbers of cloud-mirror.production.sqs-messages.waiting, we're bound only by transfer speeds.
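For anyone wanting to reproduce this kind of monitoring, the SQS counts mentioned above map onto queue attributes that can be polled; a sketch, with the queue URL and the "inflight" metric name as placeholders (the "waiting" metric name is the one quoted above):

======================================================================
var AWS = require('aws-sdk');
var sqs = new AWS.SQS({region: 'us-west-2'});

async function reportQueueDepth(queueUrl, reportMetric) {
  let result = await sqs.getQueueAttributes({
    QueueUrl: queueUrl,
    AttributeNames: [
      'ApproximateNumberOfMessages',           // waiting for a copy node
      'ApproximateNumberOfMessagesNotVisible', // currently being processed
    ],
  }).promise();
  reportMetric('cloud-mirror.production.sqs-messages.waiting',
               parseInt(result.Attributes.ApproximateNumberOfMessages, 10));
  reportMetric('cloud-mirror.production.sqs-messages.inflight',
               parseInt(result.Attributes.ApproximateNumberOfMessagesNotVisible, 10));
}
======================================================================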
Thank you :-)
The relevant chart is here: https://app.signalfx.com/#/chart/CoNJtW8AcAc?startTime=-1d&endTime=Now If that gets to be rather large, then we are being starved for copy nodes.
Hmm so the request times regularly spike every 4 hours looking at New Relic (I don't have access to the graph in comment 31). What's scheduled every 4 hours?
I've since moved cloud-mirror copiers onto even bigger machines and haven't seen too many long file copy times. Are we OK to close this bug out?
Sounds good :-)
Status: REOPENED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Seeing more spikes/backlogs today. Please can we just put the logs in the correct region to start with? (Comment 8, comment 19). This all seems like such a hack around not doing so...
(In reply to Ed Morley [:emorley] from comment #36)
> Seeing more spikes/backlogs today.
> Please can we just put the logs in the correct region to start with?
> (Comment 8, comment 19).
>
> This all seems like such a hack around not doing so...

Ed, us-west-2 is the correct region for artifacts. Logs are artifacts. Heroku being in us-east-1 is the issue that needs to be worked around. Our reason for choosing a region other than us-east-1 is that S3 in that region does not offer immediate consistency, which adds a lot of complexity to this problem.

I have been busy working on two things to improve this for you. The first is that I wrote an entirely new SQS queue library: http://github.com/jhford/sqs-simple. This will enable us to easily have a bunch more file copiers. Second is that I've used that library to implement having even more file copiers in each process. That work is being tracked here: https://github.com/taskcluster/cloud-mirror/pull/19. These should help this situation substantially, so please understand that the performance of cloud-mirror is not something that has been forgotten.

Most importantly, though, is to understand that we're working with a distributed system. A 30s delay is unfortunate, but as that's an expected thing to happen, your system should be able to handle this occurrence correctly.
(In reply to John Ford [:jhford] CET/CEST Berlin Time from comment #37)
> Our reason for choosing a region other than us-east-1 is that S3 in that
> region does not offer immediate consistency, which adds a lot of complexity
> to this problem.

My understanding from comment 19 was that this is now mostly fixed.

> I have been busy working on two things to improve this for you.

Much appreciated :-)

> Most importantly, though, is to understand that we're working with a
> distributed system. A 30s delay is unfortunate, but as that's an expected
> thing to happen, your system should be able to handle this occurrence
> correctly.

I agree we should handle occasional failures (we do), but a backlog in anything to do with the CI system is a tree-closing event - whether that be job scheduling becoming backlogged, or log parsing now increasing in duration from 5s to 30s per log.

I think we're just going to have to move ahead with the Heroku migration, and in the worst case sheriffs will just have to close trees if the backlogs are problematic.
Another option: given logs are typically only accessed once (particularly given the caching period is only 24 hours), can we just skip cloud-mirror for them? I don't see how it's really saving much money.

From an outsider's perspective, queue.taskcluster.net "works" from !AWS, but is very unusable from AWS, due to a cost-saving system from which we personally see no benefit, and from which I suspect you see little benefit (for logs specifically) either.
(In reply to Ed Morley [:emorley] from comment #39)
> Another option:
>
> Given logs are typically only accessed once (particularly given the caching
> period is only 24 hours), can we just skip cloud-mirror for them? I don't
> see how it's really saving much money.
>
> From an outsiders perspective, queue.taskcluster.net "works" from !AWS, but
> is very unusable from AWS, due to a cost-saving system for which we
> personally see no benefit, and I suspect you see little benefit from (for
> logs specifically) either.

It's actually quite usable from within AWS; we use this system for all of our builds and tests to fetch any artifacts which they use. It's not just a cost-savings system, rather it's a performance-raising system. Instead of doing the slow inter-region transfers for each task, we have a small overhead on the first transfer or two, in return for having nearly-zero transfer times thereafter. If you're interested, we usually have an 80+% cache-hit rate on resources cached with this system.

Logs are not treated in any special way as far as the Queue (the relevant system here) is concerned. I really don't want to get into special-casing logs, either.

If you'd really like to work around cloud-mirror right now, you could do this:

1. HEAD request for the artifact on queue.taskcluster.net
2. Grab the Location: header from the previous request
3. The cloud-mirror route is https://cloud-mirror.taskcluster.net/v1/redirect/s3/us-east-1/<urlEncodedUpstreamUrl>
4. decodeURIComponent(urlEncodedUpstreamUrl)
5. GET the result of 4.

This is not at all a supported method... If you chose to implement this, it'd be wise to have it fall back to the correct API. This is a bad idea that I'm mentioning to be complete, and is not a recommendation.

Here's a sample implementation using async/await and request-promise in node. Provided, again, only to be complete and absolutely not even slightly recommended. Expect this to break at literally any time.

======================================================================
var _url = require('url');
var request = require('request-promise');

async function main(url) {
  // HEAD the queue artifact URL without following the redirect, so we can
  // inspect the Location header that points at cloud-mirror.
  let response = await request.head({
    uri: url,
    followRedirect: false,
    simple: false,
    resolveWithFullResponse: true,
  });

  // The last path segment of the cloud-mirror URL is the URL-encoded
  // upstream (canonical bucket) URL; decode it to get the direct S3 URL.
  let segments = _url.parse(response.headers.location).pathname.split('/');
  let s3Location = decodeURIComponent(segments[segments.length - 1]);

  console.log(JSON.stringify({location: s3Location, outcome: 'success'}));
}

main(process.argv[2]).catch(err => {
  console.log(JSON.stringify({err: err.stack || err, outcome: 'failure'}));
  process.exit(1);
});
======================================================================
(In reply to Ed Morley [:emorley] from comment #39)
> Given logs are typically only accessed once (particularly given the caching
> period is only 24 hours), can we just skip cloud-mirror for them? I don't
> see how it's really saving much money.

Jonas, how do you feel about having a skip-cloud-mirror option or endpoint that's protected by a scope that we only give to Treeherder? I think that's better to implement because it's a general solution to the problem, rather than doing some sort of special casing based on log name and then having to adjust that every time. If we do this, we should be very hesitant to give out this scope. Maybe we should also be using the CDN URL for this instead of S3 directly? I'm not sure off hand.
Flags: needinfo?(jopsen)
I'm also curious to see what the effects of multi-consumer would be. The main bottleneck in this system is actually the inter-region transfer time and not anything else. Ed, talking directly with S3 is not going to reduce the transfer times much I suspect, aside from the 30s case where there weren't enough copy nodes.
(In reply to John Ford [:jhford] CET/CEST Berlin Time from comment #40)
> It's actually quite usable from within AWS, we use this system for all of
> our builds and tests to fetch any artifacts which they use.

For artifacts that are (a) read many times and (b) large, then yes, there is definitely value.

My previous comment was referencing the log parsing use case specifically. I don't believe there are any other consumers of these logs apart from Treeherder, and each log is only read once (well, twice if we include Treeherder stage, but it will request it at about the same time, so I'm guessing it results in no net saving in transfer time).

> If you'd really like to work around cloud-mirror right now, you could do
> this:

Thank you for this suggestion, though I agree it might be good to settle on something slightly less fragile :-)

(In reply to John Ford [:jhford] CET/CEST Berlin Time from comment #42)
> Ed, talking directly with S3 is not going to reduce the transfer times much
> I suspect, aside from the 30s case where there weren't enough copy nodes.

The transfer times aren't the problem at all. It's the 30s case that is the sole cause of our issues.

Thank you again for your continued help with this!
(In reply to John Ford [:jhford] CET/CEST Berlin Time from comment #40)
> Logs are not treated in any special way as far as the Queue (the relevant
> system here) is concerned. I really don't want to get into special casing
> logs, either.

Why, out of curiosity?

I think the main issue here is the design assumption that all artifacts are equal regardless of size/type/intended consumer, and that we have to treat log files the same as 1GB images.

As for what means we use to differentiate (file type, location, size, special header, alternative access URL), I'm happy to leave that up to you - but I think any other solution (like increasing the instance size or quantity of the cloud-mirror copier nodes) misses the point that for the log file use case, cloud-mirror adds no value, only complexity and points of failure.

Of course, if there's some other consumer or consideration for logs that I'm not aware of, please do say :-)
> how do you feel about having a skip-cloud-mirror option or endpoint that's
> protected by a scope that we only give to treeherder?

I would really prefer to avoid any special hacks based on file name etc.

@jhford, perhaps the better thing to do is to bypass cloud-mirror for small files - either on the cloud-mirror side (as it does a HEAD request, right?), or on the queue side if we recorded file size. Or we could implement the two-SQS-queues approach for cloud-mirror; that would help too. But perhaps cloud-mirror should just redirect small artifacts to the origin region/bucket.
Flags: needinfo?(jopsen)
I think we need to do something to address Ed's concerns, as they are causing hardship for others by delaying job status and eventually closing trees.

So we have a class of users downloading artifacts who are latency-sensitive (Treeherder, and also users with browsers), and a class of users who are bandwidth-sensitive (mostly other tasks). These users have some patterns we might exploit: the latency-sensitive users tend to be grabbing small artifacts, while bandwidth-sensitivity is really only an issue for large artifacts. Likewise, latency-sensitive users tend to hit artifacts that are only downloaded once, while bandwidth-sensitive users hit artifacts which are downloaded multiple times.

We can exploit the first difference as Jonas has suggested: do not mirror small artifacts, and just redirect to their origin region. This leaves the definition of "small" somewhat vague, and our log files do tend to get rather large (100MB).

We can exploit the second difference by responding with a redirect to the origin for the first request to an object while triggering a copy, and only redirecting to the copy when it is complete. Here "first" should be considered as some distributed-system-friendly approximation.

We could also just leave it to the user, adding an extra header that says "I am latency sensitive" which would cause a redirect to the origin every time.
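A sketch of that second approach (redirect the first requester to the origin while the copy happens in the background); the cache interface and enqueueCopy function here are hypothetical, not an existing API:

======================================================================
// cache.get(url) resolves to the in-region URL if a copy exists, else null;
// enqueueCopy(url) queues the inter-region copy without blocking on it.
async function resolveArtifact(upstreamUrl, cache, enqueueCopy) {
  let cached = await cache.get(upstreamUrl);
  if (cached) {
    return cached;       // later requests get the in-region copy
  }
  // Approximately the "first" request: start the copy, but answer the
  // latency-sensitive caller immediately with the origin URL.
  await enqueueCopy(upstreamUrl);
  return upstreamUrl;
}
======================================================================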
Ah, much more clearly framed than my comments - thank you :-)

(In reply to Dustin J. Mitchell [:dustin] from comment #46)
> We can exploit the first difference as Jonas has suggested: do not mirror
> small artifacts, and just redirect to their origin region. This leaves the
> definition of "small" somewhat vague, and our log files do tend to get
> rather large (100MB).

All of the logs should be gzipped, so the actual file size is typically <10MB in many cases, IIRC?
(In reply to Dustin J. Mitchell [:dustin] from comment #46)
> I think we need to do something to address Ed's concerns, as they are
> causing hardship for others by delaying job status and eventually closing
> trees.

I think that if this could close trees, it's probably some bit of measurement we want to do even on the TaskCluster side: to say that a job status should be updated within Treeherder within X seconds of job resolution, especially if a 30 second delay here would cause a tree closure. This could just as well be something happening within taskcluster-treeherder that delays the job status update.

> So we have a class of users downloading artifacts which are
> latency-sensitive (treeherder, and also users with browsers);

The solution to reducing latency for users with browsers might be different here, since they will not be going through cloud-mirror (unless they are in EC2), and they will also need to be aware of needing to set headers to avoid any caching/copy middle layer - unless we base it on the user agent or something as well, to know that the request is coming from a browser.

> We could also just leave it to the user, adding an extra header that says "I
> am latency sensitive" which would cause a redirect to the origin every time.

I do like the idea of doing something like this, especially for files that do not need to be transferred and stored in another bucket (such as log files). It seems like a lot of overhead for a one-time artifact like this.

------------------------------

On a different note, I wanted to understand the frequency at which this happens, so I used one of the New Relic links from earlier to see the typical response times for queue.taskcluster.net. Looking back in August, it doesn't seem like there were any spikes >5k ms, so I think I might be looking at the wrong thing. I did notice some spikes in the 20k+ ms range in July.

Also to note, it appears that cloud-mirror has made some of this a lot better (specifically the timeout), because as I understand it, s3-copy-proxy had a 90 second timeout. There's still some work to do here to make things better. John has some PRs to improve copy speeds/times.
It's been pointed out to me that I missed the part where having any delay in log processing delays all other log processing. That's definitely bad. Sorry that I missed that point originally.
(In reply to Ed Morley [:emorley] from comment #43)
> For artifacts that are (a) read many times, (b) large, then yes there is
> definitely value.
>
> My previous comment was referencing the log parsing use case specifically. I
> don't believe there are any other consumers of these logs apart from
> Treeherder, and each log is only read once (well twice if we include
> Treeherder stage, but it will request it about the same time, so I'm
> guessing results in no net saving in transfer time).

Special-casing things leads to really unmaintainable code on our side. The concern I have is that we could add this special case for logs, but then we'll need a special case for system X, then another for system Y.

The reason for the latency spikes is that we're running out of copier nodes. I have a patch for that which is awaiting review, but that should be ready to go pretty soon. Even for small files or files that are used once, the potential performance penalty of cloud-mirror, when it has enough copy nodes, should be negligible. The time to copy has three main elements:

1. time to get a copy node
2. time to do the inter-region transfer
3. time to do the intra-region transfer

The first should go away entirely once my patch lands. The second will be the same whether you fetch from S3 or cloud-mirror fetches from S3. The third is negligible. If we can use the same pipeline for all artifacts, we can avoid the complexities of having special pipelines for one style of artifact vs. another.

> Thank you for this suggestion, though I agree it might be good to settle on
> something slightly less fragile :-)

I'm very relieved ;)

(In reply to Ed Morley [:emorley] from comment #44)
> I think the main issue here is the assumption in the design that all
> artifacts are equal regardless of size/type/intended consumer, and that we
> have to treat log files the same as 1GB images.

I made the assumption that all artifacts should be equal because I believe it to be true. The root of the issue here is that we're running out of copy nodes. One of the strategies we've discussed to avoid this is bucketing requests into a small-file/low-latency-demand bucket and a large-file/high-latency-works bucket. With this, we can keep the general approach for the actual actions taken, but make optimizations about how to order that work. Let's see how things change when we have the same bucket but more copiers deployed!

> As for what means we use to differentiate (file type, location, size,
> special header, alternative access URL) I'm happy to leave up to you - but I
> think any other solution (like increasing instance size or quantity of the
> cloud-mirror copier nodes) is missing the point that for the log file use
> case, cloud-mirror adds no value, only complexity and points of failure.

Whether using cloud-mirror for all artifacts adds or removes complexity is a matter of perspective. I think a concern here, though, is that a delay in processing a single log is able to bring the whole log ingestion down. Even if you're talking to S3 directly, I have data to show that us-west-2 to us-east-1 often runs as slow as 0.25kb/s. At that speed, a 10mb file might itself take longer than 30s just to transfer.

> Of course, if there's some other consumer or consideration for logs that I'm
> not aware of, please do say :-)

I'm always open to having a chat on Vidyo or equivalent about this. Maybe there are architectural choices in the Treeherder system which could be revisited now that you've changed the deployment environment significantly.
One idea that I had: instead of shipping *every* log across the continent, which has transfer time and cost penalties, you could always do the log processing in us-west-2 and ship only the result of the processing across. I think this is actually the best outcome here, because you would get significantly better performance out of it, assuming the result of log processing is smaller than the original log. I'm happy to help design something which would achieve this.
> Special casing things leads to really unmaintainable code on our side. The
> concern I have is that we could add this special case for logs, but then
> we'll need a special case for system X, then another for system Y.

I agree that in general things shouldn't be special-cased; however, there is a significant distinction between special-casing poorly chosen attributes (which is not what I was suggesting, hence "As for what means we use to differentiate ... I'm happy to leave up to you"), and having two distinct types of consumers that are not going to scope-creep into 3 or 4 types later:
* latency-sensitive
* bandwidth-sensitive

> One idea that I had was that instead of shipping *every* log across the
> continent, which will have transfer time and cost penalties, you could
> always do the log processing in us-west-2 and ship only the result of
> processing across. I think this is actually the best outcome here, because
> you will get significantly better performance out of this, if the result of
> log processing is smaller than the original log.

A few considerations:
* Log processing requires a bunch of DB access, so moving it to us-west would increase DB latency, unless all of Treeherder moved.
* Heroku doesn't offer us-west for the standard common runtime apps, only Enterprise private spaces (though we may be moving to the Enterprise plan soon) - and the latter has a few considerations over the common runtime. (Plus all of the other Taskcluster Heroku apps are us-east.)
* The current Treeherder log viewer routes the log chunk being viewed through the Treeherder API, so even if log processing were handled in the same region as the logs, there would be latency viewing the logs in the browser if the S3->S3 transfer is slow. (The new proposed log viewer skips the API entirely, however.)

One option (and something we want anyway) is to move more of Taskcluster to structured logging, since the structured error summaries it produces are much smaller (Treeherder only has to fetch the summary, not the full structured log, during processing). We would also need to omit the steps covered by the structured log from the raw log (currently difficult, since the structured log doesn't cover all of stdout/stderr), since Treeherder has to parse that too. (CC jgraham)

> I think a concern here, though, is that delay of processing a single log is
> able to bring the whole log ingestion down. Even if you're talking to S3
> directly, I have data to show that us-west-2 to us-east-1 often runs as slow
> as 0.25kb/s. At that speed, a 10mb file itself might be longer than 30s
> just to transfer.

This is orthogonal to whether we use cloud-mirror or not, though. Even if we do the above (ie: move more things to structured logging, so we have smaller files to transfer), any cloud-mirror copier-node issue still causes ingestion lag - and in fact will become a bigger component of the overall time. ie, time to transfer:
* with cloud-mirror: <up to 30s copier node delays> + <S3 west-east transfer time of a small file>
* without cloud-mirror: <S3 west-east transfer time of a small file>

If S3-S3 transfer times are regularly bad, we should also file issues against AWS.
(In reply to Ed Morley [:emorley] from comment #51)
> I agree in general things shouldn't be special-cased, however there is a
> significant distinction between special-casing poorly chosen attributes
> (which is not what I was suggesting), and having two distinct types of
> consumers that are not going to scope-creep into 3 or 4 types later:
> * latency-sensitive
> * bandwidth-sensitive

I agree that there are two classes, but the absolute long pole of cloud-mirror is the actual inter-region transfer time, in all cases where we aren't copy-node starved. Given that, adding complexity to handle these two classes differently, when the potential improvement to be had is so small, doesn't feel like the right thing to do.

> A few considerations:
> * Log processing requires a bunch of DB access, so moving it to us-west
> would increase DB latency, unless all of Treeherder moved.

Maybe the DB latency increase is worth the transfer time trade-off? Maybe you could do more of the analysis as part of the query, so that latency is less critical?

> * Heroku doesn't offer us-west for the standard common runtime apps, only
> Enterprise private spaces (though we may be moving to the Enterprise plan
> soon) - and the latter has a few considerations over the common runtime.
> (Plus all of the other Taskcluster Heroku apps are us-east.)

I would also *really* love to have this, for cloud-mirror itself! I just don't know when or if this is happening.

> * The current Treeherder log viewer routes the log chunk being viewed
> through the Treeherder API, so even if log processing were handled in the
> same region as the logs, there would be latency viewing the logs in the
> browser if the S3->S3 transfer is slow. (The new proposed log viewer skips
> the API entirely, however.)

Skipping the Treeherder API seems like the right solution to this problem to me. But in the meantime, doesn't this mean that we are using the us-east-1 copy of the artifact more than once? Doesn't that mean that we need to have these behind cloud-mirror anyway?

> One option (and something we want anyway) is to move more of Taskcluster to
> structured logging, since the structured error summaries it produces are
> much smaller (Treeherder only has to fetch the summary, not the full
> structured log, during processing).

That's a great idea!

> We would also need to omit the steps covered by the structured log from the
> raw log (currently difficult, since the structured log doesn't cover all of
> stdout/stderr), since Treeherder has to parse that too. (CC jgraham)

Maybe that logic could be moved into the job itself? That might make it easier for developers who wish to add new patterns to the parsing to self-serve. Maybe more of the Treeherder logic could move into the job itself? This model has worked really well for Taskcluster itself, and means that task-specific parsing code can live alongside the code that generates the logs that it tests. I suspect credentials and ACLs are what'd undermine that, though.

> This is orthogonal to whether we use cloud-mirror or not, though.
>
> Even if we do the above (ie: move more things to structured logging, so we
> have smaller files to transfer), any cloud-mirror copier-node issue still
> causes ingestion lag - and in fact will become a bigger component of the
> overall time.

Right. The lag introduced by cloud-mirror is a very well understood problem.
Without the lag waiting for a copy node, cloud-mirror being there makes no practical difference, and allows us to have no special casing.

> ie, time to transfer:
> * with cloud-mirror: <up to 30s copier node delays> + <S3 west-east transfer
> time of a small file>
> * without cloud-mirror: <S3 west-east transfer time of a small file>

With the multi-consumer cloud-mirror branch, there should not be issues with waiting for copy nodes.

> If S3-S3 transfer times are regularly bad, we should also file issues
> against AWS.

This is a known problem that is not unique to us. I'm pretty sure this is a constraint outside of Amazon's control, but it's something I am working on. Oregon and Virginia are quite far from each other; Oregon -> California copies run the same code path but are way faster.
The multi-consumer branch was deployed into production today and has drastically reduced the number of messages that sit around waiting. I've seen a 3-4x increase in message-processing concurrency in the load peaks today, and I've seen no messages that were waiting for a copy node. I'd like to mark this bug as fixed, if you agree. If this problem shows up again, let's use a new bug.
Sounds good to me! Thank you for your work on this :-)
Status: REOPENED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
As we discussed last night and today, this morning I changed the formation of copy nodes. The change was from 10 containers of 100 queue listeners each on one c4.8xlarge, to the same load split across one c4.8xlarge and two c4.4xlarges. We're no longer seeing any issues with message-processing concurrency, and the transfer speeds are back to where they were before. In the 3 hours since this change went live, the New Relic reported times are back to the 2-3s that they should be, with no spikes in that period.

I suspect that a couple of improvements are still possible (bug 1304697), but we're in a pretty good state here. I'm marking this bug RESOLVED FIXED because we're no longer starved for nodes and we're seeing good transfer times. The issue underlying this bug has been fully addressed, so if there are further issues, let's open a new bug.
Status: REOPENED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Depends on: 1305768
Component: Integration → Services