Closed Bug 1154248 Opened 10 years ago Closed 6 years ago

Taskcluster jobs being submitted to Treeherder with log URLs that 404/403

Categories

(Taskcluster :: Services, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: emorley, Assigned: dustin)

We're periodically seeing many taskcluster jobs with invalid (404ing) log URLs. It would be great if we could eliminate these, since they're causing quite a bit of noise in the Treeherder New Relic exception logs. James, do you have any ideas? Thanks :-)
[emorley@treeherder-processor1.stage.private.scl3 ~]$ grep 'HTTP Error' /var/log/celery/celery_worker_log_parser.log | tail -n 50
[2015-04-14 04:57:54,996: ERROR/Worker-66] Failed to download/parse log for try e7d4a6df-0646-4767-9c7b-d714bf0d6658/1 (https://queue.taskcluster.net/v1/task/59Sm3wZGR2ece9cUvw1mWA/runs/1/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:57:56,006: ERROR/Worker-66] Failed to download/parse log for mozilla-inbound a6345444-b75c-4f26-bcd0-5024e669e40e/2 (https://queue.taskcluster.net/v1/task/pjRURLdcTya80FAk5mnkDg/runs/2/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:57:56,018: ERROR/Worker-65] Failed to download/parse log for try 921137ee-25aa-48ce-af1d-1ad680053221/0 (https://queue.taskcluster.net/v1/task/khE37iWqSM6vHRrWgAUyIQ/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:57:56,512: ERROR/Worker-65] Failed to download/parse log for try ecac06fe-aa18-41c1-8628-c25ae6fe146e/1 (https://queue.taskcluster.net/v1/task/7KwG_qoYQcGGKMJa5v4Ubg/runs/1/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:57:56,994: ERROR/Worker-65] Failed to download/parse log for try 2fcb4a2a-42c6-445e-bfd7-a15f70c3f36d/0 (https://queue.taskcluster.net/v1/task/L8tKKkLGRF6_16FfcMPzbQ/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:57:58,851: ERROR/Worker-65] Failed to download/parse log for try dea3ee03-1e1b-47c1-86c3-2417d205f96d/0 (https://queue.taskcluster.net/v1/task/3qPuAx4bR8GGwyQX0gX5bQ/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:57:59,690: ERROR/Worker-66] Failed to download/parse 
log for try 7561c61c-5d0c-45c5-a5f7-3dee40fceae3/1 (https://queue.taskcluster.net/v1/task/dWHGHF0MRcWl9z3uQPzq4w/runs/1/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:02,504: ERROR/Worker-66] Failed to download/parse log for try 12bf2a67-ed82-4f0e-92b7-f29714b7a85f/1 (https://queue.taskcluster.net/v1/task/Er8qZ-2CTw6St_KXFLeoXw/runs/1/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:03,854: ERROR/Worker-66] Failed to download/parse log for mozilla-inbound 68251795-7e8c-4d27-94b5-491607a01624/0 (https://queue.taskcluster.net/v1/task/aCUXlX6MTSeUtUkWB6AWJA/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:05,884: ERROR/Worker-66] Failed to download/parse log for try bb9e1bd2-8c92-4306-8f66-dd07ba2c965a/0 (https://queue.taskcluster.net/v1/task/u54b0oySQwaPZt0HuiyWWg/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:48,577: ERROR/Worker-65] Failed to download/parse log for b2g-inbound 3bc44a95-5a5e-4f7d-ba97-6db43e5c3906/0 (https://queue.taskcluster.net/v1/task/O8RKlVpeT326l220Plw5Bg/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:48,611: ERROR/Worker-66] Failed to download/parse log for mozilla-inbound 81fcaca8-8ae6-432a-9851-b7069f7c87cb/0 (https://queue.taskcluster.net/v1/task/gfysqIrmQyqYUbcGn3yHyw/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:49,107: ERROR/Worker-65] Failed to download/parse log for mozilla-inbound 9e7704f8-6f80-4c43-b860-95d98ed72866/0 (https://queue.taskcluster.net/v1/task/nncE-G-ATEO4YJXZjtcoZg/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:49,121: ERROR/Worker-66] Failed to download/parse log for mozilla-inbound 8ae3c9c7-3cd0-480b-9450-a3af55f20c02/0 
(https://queue.taskcluster.net/v1/task/iuPJxzzQSAuUUKOvVfIMAg/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:49,624: ERROR/Worker-66] Failed to download/parse log for mozilla-inbound f8883888-46c3-49bd-b38a-66ad60e21c73/0 (https://queue.taskcluster.net/v1/task/-Ig4iEbDSb2zimatYOIccw/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:49,645: ERROR/Worker-65] Failed to download/parse log for mozilla-inbound 4ce39052-cf0c-45f2-8bb1-7807bd5394ef/0 (https://queue.taskcluster.net/v1/task/TOOQUs8MRfKLsXgHvVOU7w/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:50,140: ERROR/Worker-65] Failed to download/parse log for try fdf1f1ed-6a11-4e2d-a2eb-0e09c93c4598/1 (https://queue.taskcluster.net/v1/task/_fHx7WoRTi2i6w4JyTxFmA/runs/1/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:50,169: ERROR/Worker-66] Failed to download/parse log for try 002c18f8-4678-49c6-86e7-21d6cece85d5/0 (https://queue.taskcluster.net/v1/task/ACwY-EZ4ScaG5yHWzs6F1Q/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:50,697: ERROR/Worker-65] Failed to download/parse log for try 339e0d3f-48bd-4a0d-8d8b-7e3b26e3a191/0 (https://queue.taskcluster.net/v1/task/M54NP0i9Sg2Ni347JuOhkQ/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:50,702: ERROR/Worker-66] Failed to download/parse log for mozilla-inbound a8048685-9715-48d9-b6bf-08298dff577f/0 (https://queue.taskcluster.net/v1/task/qASGhZcVSNm2vwgpjf9Xfw/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:51,192: ERROR/Worker-65] Failed to download/parse log for mozilla-inbound 50417910-286e-4f99-806b-1ed1646d2d91/0 (https://queue.taskcluster.net/v1/task/UEF5EChuT5mAax7RZG0tkQ/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:51,214: 
ERROR/Worker-66] Failed to download/parse log for mozilla-inbound 410e8cab-4a97-4913-bce0-fb4d11e3e3e8/0 (https://queue.taskcluster.net/v1/task/QQ6Mq0qXSRO84PtNEePj6A/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:51,692: ERROR/Worker-65] Failed to download/parse log for try eb120a7d-434a-4f01-bc16-90a656b22c0c/0 (https://queue.taskcluster.net/v1/task/6xIKfUNKTwG8FpCmVrIsDA/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:51,721: ERROR/Worker-66] Failed to download/parse log for mozilla-inbound b80846c8-5ff1-4604-839d-6f226688cc96/0 (https://queue.taskcluster.net/v1/task/uAhGyF_xRgSDnW8iZojMlg/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:52,191: ERROR/Worker-65] Failed to download/parse log for try 39ecc1cd-3ffe-4851-b75e-d56b464ca99c/1 (https://queue.taskcluster.net/v1/task/OezBzT_-SFG3XtVrRkypnA/runs/1/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:52,217: ERROR/Worker-66] Failed to download/parse log for mozilla-inbound a4e2495a-15fd-4c55-b0bc-764db0b88173/0 (https://queue.taskcluster.net/v1/task/pOJJWhX9TFWwvHZNsLiBcw/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:52,728: ERROR/Worker-65] Failed to download/parse log for try d5845232-411d-4555-950e-2444f0a9ecb6/0 (https://queue.taskcluster.net/v1/task/1YRSMkEdRVWVDiRE8Knstg/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:52,753: ERROR/Worker-66] Failed to download/parse log for try 2de5d181-d6db-4b3a-bfa6-cbf7f7da6bf5/0 (https://queue.taskcluster.net/v1/task/LeXRgdbbSzq_psv399pr9Q/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:53,239: ERROR/Worker-65] Failed to download/parse log for try c99ccaf0-385e-4c49-9f4c-1f9a6f8b97c9/0 
(https://queue.taskcluster.net/v1/task/yZzK8DheTEmfTB-ab4uXyQ/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:53,273: ERROR/Worker-66] Failed to download/parse log for mozilla-inbound 174b439b-b4d3-409e-9c35-02213e389a56/0 (https://queue.taskcluster.net/v1/task/F0tDm7TTQJ6cNQIhPjiaVg/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:53,721: ERROR/Worker-65] Failed to download/parse log for try 590069fb-2733-47d3-8c96-779124f2f41a/1 (https://queue.taskcluster.net/v1/task/WQBp-yczR9OMlneRJPL0Gg/runs/1/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:53,779: ERROR/Worker-66] Failed to download/parse log for try 2f206f12-c595-47c8-9a78-16f8104f37df/1 (https://queue.taskcluster.net/v1/task/LyBvEsWVR8iaeBb4EE833w/runs/1/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:54,287: ERROR/Worker-66] Failed to download/parse log for mozilla-inbound 652659f1-95ad-44ce-a5fc-824f3e13a47f/0 (https://queue.taskcluster.net/v1/task/ZSZZ8ZWtRM6l_IJPPhOkfw/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:54,310: ERROR/Worker-65] Failed to download/parse log for try f921b06a-e52f-4824-b738-97a51f27d809/1 (https://queue.taskcluster.net/v1/task/-SGwauUvSCS3OJelHyfYCQ/runs/1/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:54,814: ERROR/Worker-66] Failed to download/parse log for try 99997f6d-06ca-4f93-a0ff-fb012274720b/0 (https://queue.taskcluster.net/v1/task/mZl_bQbKT5Og__sBInRyCw/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:54,827: ERROR/Worker-65] Failed to download/parse log for mozilla-inbound 636f5431-0aa3-4f33-ae96-ce54706f9a77/0 (https://queue.taskcluster.net/v1/task/Y29UMQqjTzOuls5UcG-adw/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:55,362: ERROR/Worker-66] 
Failed to download/parse log for try 1ebf6f87-7d8d-44f2-9251-0c8922eaaee2/1 (https://queue.taskcluster.net/v1/task/Hr9vh32NRPKSUQyJIuqu4g/runs/1/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:55,410: ERROR/Worker-65] Failed to download/parse log for try ba12f4a2-2fc7-461f-b972-b50adce0fc56/1 (https://queue.taskcluster.net/v1/task/uhL0oi_HRh-5crUK3OD8Vg/runs/1/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:55,909: ERROR/Worker-65] Failed to download/parse log for try 52c6784f-b93a-4e5c-85f5-bd61813a7e26/0 (https://queue.taskcluster.net/v1/task/UsZ4T7k6TlyF9b1hgTp-Jg/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:55,942: ERROR/Worker-66] Failed to download/parse log for try 34b43ff5-295a-4039-8b6e-08a4f1c851e0/0 (https://queue.taskcluster.net/v1/task/NLQ_9SlaQDmLbgik8chR4A/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:56,475: ERROR/Worker-65] Failed to download/parse log for try f4aa89f1-368d-4533-86da-60d0cd5a4869/0 (https://queue.taskcluster.net/v1/task/9KqJ8TaNRTOG2mDQzVpIaQ/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:56,482: ERROR/Worker-66] Failed to download/parse log for mozilla-inbound 1d7ad85d-713a-4857-b1f4-baf1e9192d21/0 (https://queue.taskcluster.net/v1/task/HXrYXXE6SFex9Lrx6RktIQ/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:57,002: ERROR/Worker-65] Failed to download/parse log for try 963a34c0-3727-47b4-832b-8dc52653f95d/0 (https://queue.taskcluster.net/v1/task/ljo0wDcnR7SDK43FJlP5XQ/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:57,003: ERROR/Worker-66] Failed to download/parse log for try 3401a2ec-2158-4954-8d0f-9a2f43e7e4c6/0 (https://queue.taskcluster.net/v1/task/NAGi7CFYSVSND5ovQ-fkxg/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: 
Not Found [2015-04-14 04:58:57,512: ERROR/Worker-65] Failed to download/parse log for try 916c2d84-06e9-4a6d-8202-e61efec4dc43/0 (https://queue.taskcluster.net/v1/task/kWwthAbpSm2CAuYe_sTcQw/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:57,530: ERROR/Worker-66] Failed to download/parse log for b2g-inbound f2d25aa3-4137-4327-b8db-d0be4d6e2839/0 (https://queue.taskcluster.net/v1/task/8tJao0E3Qye429C-TW4oOQ/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:58,025: ERROR/Worker-66] Failed to download/parse log for try ee179ff5-db27-44fe-8aec-9eefc29c897d/0 (https://queue.taskcluster.net/v1/task/7hef9dsnRP6K7J7vwpyJfQ/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 04:58:58,047: ERROR/Worker-65] Failed to download/parse log for mozilla-inbound 7490e46f-a58a-4520-823d-2a727c125da5/0 (https://queue.taskcluster.net/v1/task/dJDkb6WKRSCCPSpyfBJdpQ/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 05:09:14,680: ERROR/Worker-68] Failed to download/parse log for try bface915-93a8-473b-a91d-cee3335d023f/0 (https://queue.taskcluster.net/v1/task/v6zpFZOoRzupHc7jM10CPw/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-14 05:10:15,732: ERROR/Worker-67] Failed to download/parse log for try bface915-93a8-473b-a91d-cee3335d023f/0 (https://queue.taskcluster.net/v1/task/v6zpFZOoRzupHc7jM10CPw/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found
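A dump like the one above is easier to reason about once it is tallied by repository and status code. The following is only a sketch for that purpose: `tally_errors` and `LINE_RE` are hypothetical names, and the pattern is inferred from the log lines quoted in this comment rather than taken from Treeherder's source.

```python
import re
from collections import Counter

# Pattern inferred from the celery log-parser error lines quoted above.
# Whitespace is matched with \s+ so entries wrapped across lines still match.
LINE_RE = re.compile(
    r"Failed\s+to\s+download/parse\s+log\s+for\s+(?P<repo>\S+)\s+(?P<guid>\S+)\s+"
    r"\((?P<url>\S+)\):\s+HTTP\s+Error\s+(?P<status>\d{3})"
)


def tally_errors(text):
    """Count (repository, HTTP status) pairs found in a blob of log output."""
    return Counter(
        (m.group("repo"), int(m.group("status"))) for m in LINE_RE.finditer(text)
    )
```

Feeding it the output of the `grep` command above would give a per-repository count, which makes a spike on one branch easy to spot.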
Flags: needinfo?(jlal)
Depends on: 1154276
I looked over only some of these, but all of the ones I checked were caused by the task being resolved as an exception because the claim expired (caused by bug 1154276). Once that bug is fixed, it should cut back on some of the exception logging, but there may still be cases where a task has no logs, such as when the instance was not shut down gracefully, died somewhere in the process of handling a task, was canceled (although that might change when another bug is resolved), or the worker/task ended up in some other exceptional state. The worker makes a best effort to upload logs, but if the task is resolved as "exception", it's a toss-up whether the logs will be uploaded, depending on what happened.
I am not sure what the best way to handle this is... I suspect you want to show a 404 in the UI as a human-readable error rather than an exception... With the spot model we continue to roll out, we will always have some of these errors (though we can reduce the rate, as Greg mentions above).
Flags: needinfo?(jlal)
Thank you for the explanation - agree we should probably make these non-fatal. In the meantime, there's been a massive spike on stage (like 50x) in the last ~hour:
https://rpm.newrelic.com/accounts/677903/applications/5585473/traced_errors?tw[end]=1429204643&tw[start]=1429193843
Any ideas what caused this? eg:
[2015-04-16 10:16:00,061: ERROR/Worker-7] Failed to download/parse log for try 58ed2a50-31ad-4095-bf0e-87a9b209030e/0 (https://queue.taskcluster.net/v1/task/WO0qUDGtQJW_DoepsgkDDg/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-16 10:16:03,566: ERROR/Worker-8] Failed to download/parse log for try 4364a183-9cde-463b-a92e-3e17dbef00cc/1 (https://queue.taskcluster.net/v1/task/Q2Shg5zeRjupLj4X2-8AzA/runs/1/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-16 10:16:04,751: ERROR/Worker-7] Failed to download/parse log for try 0b09fbb9-bfdb-48dd-9b19-5cc16670b347/0 (https://queue.taskcluster.net/v1/task/Cwn7ub_bSN2bGVzBZnCzRw/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-16 10:16:05,285: ERROR/Worker-7] Failed to download/parse log for mozilla-inbound c887c387-70d9-4f91-af57-de767c8a9b4b/0 (https://queue.taskcluster.net/v1/task/yIfDh3DZT5GvV952fIqbSw/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-16 10:16:16,182: ERROR/Worker-7] Failed to download/parse log for try fd11f749-7124-4c6b-a5b6-94ef9d40043d/0 (https://queue.taskcluster.net/v1/task/_RH3SXEkTGultpTvnUAEPQ/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-16 10:16:18,310: ERROR/Worker-8] Failed to download/parse log for try a36bffe4-e6d4-4a94-ab22-e64d443e287a/0 (https://queue.taskcluster.net/v1/task/o2v_5ObUSpSrIuZNRD4oeg/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-16 10:16:19,816: ERROR/Worker-8] Failed to download/parse log for try 3fc55c01-3af9-4bfb-88e2-6fff6eb00957/1 
(https://queue.taskcluster.net/v1/task/P8VcATr5S_uI4m__brAJVw/runs/1/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-16 10:16:23,770: ERROR/Worker-8] Failed to download/parse log for mozilla-inbound c38b1054-3115-4ce5-a979-f5ed7fb870bf/1 (https://queue.taskcluster.net/v1/task/w4sQVDEVTOWpefXtf7hwvw/runs/1/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-16 10:16:39,288: ERROR/Worker-7] Failed to download/parse log for fx-team 78e35d56-9249-472c-813d-429ebb52403e/0 (https://queue.taskcluster.net/v1/task/eONdVpJJRyyBPUKeu1JAPg/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-16 10:16:44,710: ERROR/Worker-8] Failed to download/parse log for mozilla-inbound 626c09cf-a134-4e1e-a47c-9ec31b4ec169/2 (https://queue.taskcluster.net/v1/task/YmwJz6E0Th6kfJ7DG07BaQ/runs/2/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-16 10:17:00,788: ERROR/Worker-7] Failed to download/parse log for cypress ed74a18d-3c84-4240-8ffb-43190713cbd6/2 (https://queue.taskcluster.net/v1/task/7XShjTyEQkCP-0MZBxPL1g/runs/2/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-16 10:17:03,284: ERROR/Worker-7] Failed to download/parse log for try 5fae6a73-1bfb-4352-9724-777ac9c97277/1 (https://queue.taskcluster.net/v1/task/X65qcxv7Q1KXJHd6yclydw/runs/1/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found [2015-04-16 10:17:47,659: ERROR/Worker-8] Failed to download/parse log for fx-team 86aac008-8acb-44e0-b266-3ff80e1fc570/0 (https://queue.taskcluster.net/v1/task/hqrACIrLROCyZj_4Dh_FcA/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 404: Not Found
Spot checking a lot of these, they seem to be related more or less to the original problem. We also have a large emulator test backlog that a bunch of workers are trying to get through, so there might be a sustained increase while the backlog clears. I'll keep digging in as well.
I've filed bug 1155647 and have a PR there that makes 404s not generate exceptions on our side. So long as there are not cases where due to races, the log is initially 404 and then N seconds later is uploaded, this should be fine from my POV. Though for the sheriffs, I suspect reducing the number of jobs that are missing logs would avoid potential confusion :-)
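For readers following along, the idea of making 404s non-fatal can be sketched roughly as below. This is not the patch from bug 1155647: `fetch_log`, its `opener` parameter, and `missing_statuses` are hypothetical names chosen for illustration, and only the standard-library `urllib` calls are real.

```python
import logging
import urllib.error
import urllib.request

logger = logging.getLogger(__name__)


def fetch_log(url, opener=urllib.request.urlopen, missing_statuses=(404,)):
    """Return the log body as bytes, or None if the log is missing.

    `opener` is injectable (an assumption of this sketch) so the function
    can be exercised without network access.
    """
    try:
        with opener(url) as response:
            return response.read()
    except urllib.error.HTTPError as err:
        if err.code in missing_statuses:
            # A missing log is expected for some runs: warn and carry on
            # instead of letting the exception propagate.
            logger.warning("Unable to retrieve log for %s: HTTP %s", url, err.code)
            return None
        # Anything else is still treated as a real failure.
        raise
```

Note that this sketch does nothing about the race mentioned above; if a log can appear some seconds after an initial 404, the caller would need its own retry policy.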
I'm also now seeing instances where we get 403s - any idea why these are protected? If this is intentional (eg private builds) - I wonder if we should have a way of marking the log_url property when submitting to Treeherder that the URL is only for potential downstream tool consumption (where that tool may have access), and not for Treeherder's logparser? [2015-04-20 14:05:52,156: ERROR/Worker-9] Failed to download/parse log for try a3c9285d-4e57-473a-8779-c32826b0bb93/1 (https://queue.taskcluster.net/v1/task/o8koXU5XRzqHecMoJrC7kw/runs/1/artifacts/public/logs/live_backing.log): HTTP Error 403: Forbidden [2015-04-20 23:56:51,865: ERROR/Worker-47] Failed to download/parse log for try 5a76999c-51de-444e-b28d-ac48633796b0/0 (https://queue.taskcluster.net/v1/task/WnaZnFHeRE6yjaxIYzeWsA/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 403: Forbidden [2015-04-21 00:00:52,853: ERROR/Worker-47] Failed to download/parse log for try 5a76999c-51de-444e-b28d-ac48633796b0/0 (https://queue.taskcluster.net/v1/task/WnaZnFHeRE6yjaxIYzeWsA/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 403: Forbidden [2015-04-21 00:11:55,512: ERROR/Worker-47] Failed to download/parse log for try 5a76999c-51de-444e-b28d-ac48633796b0/0 (https://queue.taskcluster.net/v1/task/WnaZnFHeRE6yjaxIYzeWsA/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 403: Forbidden [2015-04-21 00:36:03,957: ERROR/Worker-50] Failed to download/parse log for try 5a76999c-51de-444e-b28d-ac48633796b0/0 (https://queue.taskcluster.net/v1/task/WnaZnFHeRE6yjaxIYzeWsA/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 403: Forbidden [2015-04-21 00:46:06,082: ERROR/Worker-51] Failed to download/parse log for try 5a76999c-51de-444e-b28d-ac48633796b0/0 (https://queue.taskcluster.net/v1/task/WnaZnFHeRE6yjaxIYzeWsA/runs/0/artifacts/public/logs/live_backing.log): HTTP Error 403: Forbidden
(In reply to Ed Morley [:emorley] from comment #7)
> [2015-04-20 14:05:52,156: ERROR/Worker-9] Failed to download/parse log for
> try a3c9285d-4e57-473a-8779-c32826b0bb93/1
> (https://queue.taskcluster.net/v1/task/o8koXU5XRzqHecMoJrC7kw/runs/1/
> artifacts/public/logs/live_backing.log): HTTP Error 403: Forbidden
Just looking at this first task, it still appears to be pending: https://tools.taskcluster.net/task-inspector/#o8koXU5XRzqHecMoJrC7kw
The first run failed with worker-shutdown (this can happen if we lose a spot instance).
The second run failed with claim-expired (6 hours later).
The third run is still pending, 20 hours after the task was submitted. :/ In four hours it will hit the task deadline.
Should the log be 403 though? Surely it should 404 or else we should use different log URLs for each task - the only reason Treeherder is trying to access it is that it was given as the log URL for a completed job.
s/task/run/
(In reply to Ed Morley [:emorley] from comment #9)
> Should the log be 403 though? Surely it should 404 or else we should use
> different log URLs for each task - the only reason Treeherder is trying to
> access it is that it was given as the log URL for a completed job.
I'm not sure if the 403 is intentional or not - garndt can probably offer more insight here. Comment 8 was just to highlight that the flow for this particular task also seems a bit suspicious, with the 6 hour timeout for the claim expiring, and now in the 14 hours since the claim expiry, no new execution being triggered. I should have explained that I only wished to provide extra context for troubleshooting the issue, rather than providing an explanation (which I currently do not have). :/
Flags: needinfo?(garndt)
It was helpful, thank you :-) (I now know about the task-inspector for a start hehe)
Great! There are also some more awesome tools here: https://tools.taskcluster.net/ Certainly worth trying them all out. =)
So this looks to be a combination of two bugs. The first issue is being worked on [1], and Jonas and I have been discussing the second issue [2]. The first issue caused the task to be claimed, but the second issue is what is causing your 403s. The artifact reference was created within taskcluster (which causes the logs to appear as log artifacts); however, the actual logs were not uploaded because the instance was killed. When this happens, AWS returns an access denied message.
[1] https://bugzilla.mozilla.org/show_bug.cgi?id=1154276
[2] https://bugzilla.mozilla.org/show_bug.cgi?id=1155645
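The difference between the two status codes, as described in this thread, can be summarised in a small helper. This is a reading of the comments here, not documented queue behaviour, and `explain_missing_log` is a hypothetical name.

```python
def explain_missing_log(status_code):
    """Map the HTTP status of a failed log fetch to its likely cause.

    The mapping is an assumption based on the discussion in this bug.
    """
    if status_code == 404:
        # No log artifact exists for this run, e.g. the worker died or the
        # claim expired before anything was uploaded.
        return "no log artifact exists for this run"
    if status_code == 403:
        # The artifact reference was created, but the worker was killed
        # before the upload finished, so the storage backend denies access.
        return "artifact was registered but the log was never uploaded"
    return "unexpected status %d" % status_code
```

Either way the log is unavailable, so a consumer could reasonably handle both cases the same way and use the distinction only for diagnostics.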
Flags: needinfo?(garndt)
Ah thank you :-)
Depends on: 1155645
Summary: Taskcluster jobs being submitted to Treeherder with log URLs that 404 → Taskcluster jobs being submitted to Treeherder with log URLs that 404/403
No longer blocks: treeherder-nr-exceptions
Component: TaskCluster → General
Product: Testing → Taskcluster
Component: General → Integration
Component: Integration → Platform and Services
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WORKSFORME
https://papertrailapp.com/systems/treeherder-prod/events?q=%22Unable%20to%20retrieve%20log%20for%22 Feb 06 00:25:04 treeherder-prod app/worker_log_parser.6: [2018-02-06 00:25:04,187] WARNING [treeherder.log_parser.utils:47] Unable to retrieve log for 257598466: 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/IJ4AzTqkS8WXKrwlLm4ZUw/runs/0/artifacts/public/logs/live_backing.log Feb 06 00:25:04 treeherder-prod app/worker_log_parser.6: [2018-02-06 00:25:04,187: WARNING/Worker-128] Unable to retrieve log for 257598466: 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/IJ4AzTqkS8WXKrwlLm4ZUw/runs/0/artifacts/public/logs/live_backing.log Feb 06 00:25:10 treeherder-prod app/worker_log_parser.4: [2018-02-06 00:25:10,390] WARNING [treeherder.log_parser.utils:47] Unable to retrieve log for 257598475: 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/YAYQC5ClQYSme6j7W7Jqpw/runs/0/artifacts/public/logs/live_backing.log Feb 06 00:25:10 treeherder-prod app/worker_log_parser.4: [2018-02-06 00:25:10,390: WARNING/Worker-147] Unable to retrieve log for 257598475: 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/YAYQC5ClQYSme6j7W7Jqpw/runs/0/artifacts/public/logs/live_backing.log Feb 06 00:26:20 treeherder-prod app/worker_log_parser.5: [2018-02-06 00:26:20,496] WARNING [treeherder.log_parser.utils:47] Unable to retrieve log for 257598600: 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/cCA8RQDLRHKSdFF4s5By4A/runs/0/artifacts/public/logs/live_backing.log Feb 06 00:26:20 treeherder-prod app/worker_log_parser.5: [2018-02-06 00:26:20,496: WARNING/Worker-82] Unable to retrieve log for 257598600: 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/cCA8RQDLRHKSdFF4s5By4A/runs/0/artifacts/public/logs/live_backing.log Feb 06 00:26:25 treeherder-prod app/worker_log_parser.1: [2018-02-06 00:26:25,301] WARNING [treeherder.log_parser.utils:47] 
Unable to retrieve log for 257598617: 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/WenQnFxiSnC5sxLOMEknhQ/runs/0/artifacts/public/logs/live_backing.log Feb 06 00:26:25 treeherder-prod app/worker_log_parser.1: [2018-02-06 00:26:25,301: WARNING/Worker-152] Unable to retrieve log for 257598617: 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/WenQnFxiSnC5sxLOMEknhQ/runs/0/artifacts/public/logs/live_backing.log Feb 06 00:26:25 treeherder-prod app/worker_log_parser.2: [2018-02-06 00:26:25,356] WARNING [treeherder.log_parser.utils:47] Unable to retrieve log for 257598615: 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/IK7e0iiuQbKEjksdIcZTPw/runs/0/artifacts/public/logs/live_backing.log Feb 06 00:26:25 treeherder-prod app/worker_log_parser.2: [2018-02-06 00:26:25,356: WARNING/Worker-67] Unable to retrieve log for 257598615: 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/IK7e0iiuQbKEjksdIcZTPw/runs/0/artifacts/public/logs/live_backing.log Feb 06 00:26:52 treeherder-prod app/worker_log_parser.3: [2018-02-06 00:26:52,395] WARNING [treeherder.log_parser.utils:47] Unable to retrieve log for 257598677: 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/OR4qL-tWT3aP19gpsqrawg/runs/0/artifacts/public/logs/live_backing.log Feb 06 00:26:52 treeherder-prod app/worker_log_parser.3: [2018-02-06 00:26:52,395: WARNING/Worker-36] Unable to retrieve log for 257598677: 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/OR4qL-tWT3aP19gpsqrawg/runs/0/artifacts/public/logs/live_backing.log Feb 06 00:27:08 treeherder-prod app/worker_log_parser.4: [2018-02-06 00:27:08,170] WARNING [treeherder.log_parser.utils:47] Unable to retrieve log for 257598717: 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/OeKeVoKBTwKctIN4NTQyZg/runs/0/artifacts/public/logs/live_backing.log Feb 06 00:27:08 treeherder-prod 
app/worker_log_parser.4: [2018-02-06 00:27:08,170: WARNING/Worker-143] Unable to retrieve log for 257598717: 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/OeKeVoKBTwKctIN4NTQyZg/runs/0/artifacts/public/logs/live_backing.log Feb 06 00:27:13 treeherder-prod app/worker_log_parser.5: [2018-02-06 00:27:13,572] WARNING [treeherder.log_parser.utils:47] Unable to retrieve log for 257598730: 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/PJmXMB59TkC9lEVHL0Y90g/runs/0/artifacts/public/logs/live_backing.log Feb 06 00:27:13 treeherder-prod app/worker_log_parser.5: [2018-02-06 00:27:13,572: WARNING/Worker-81] Unable to retrieve log for 257598730: 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/PJmXMB59TkC9lEVHL0Y90g/runs/0/artifacts/public/logs/live_backing.log Feb 06 00:27:14 treeherder-prod app/worker_log_parser.3: [2018-02-06 00:27:13,812] WARNING [treeherder.log_parser.utils:47] Unable to retrieve log for 257598731: 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/MSuCd5_-T-aoy9_jUkAPHw/runs/0/artifacts/public/logs/live_backing.log Feb 06 00:27:14 treeherder-prod app/worker_log_parser.3: [2018-02-06 00:27:13,812: WARNING/Worker-38] Unable to retrieve log for 257598731: 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/MSuCd5_-T-aoy9_jUkAPHw/runs/0/artifacts/public/logs/live_backing.log Feb 06 00:27:35 treeherder-prod app/worker_log_parser.2: [2018-02-06 00:27:35,637] WARNING [treeherder.log_parser.utils:47] Unable to retrieve log for 257598774: 404 Client Error: Not Found for url: https://queue.taskcluster.net/v1/task/QUMdmSgbTNq9R7sts6LwKw/runs/0/artifacts/public/logs/live_backing.log Feb 06 00:27:35 treeherder-prod app/worker_log_parser.2: [2018-02-06 00:27:35,637: WARNING/Worker-64] Unable to retrieve log for 257598774: 404 Client Error: Not Found for url: 
https://queue.taskcluster.net/v1/task/QUMdmSgbTNq9R7sts6LwKw/runs/0/artifacts/public/logs/live_backing.log
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Flags: needinfo?(jhford)
Dustin, per our conversation on IRC about cloud-mirror being involved here, it isn't. Treeherder uses the skip-cache header to bypass cloud-mirror.
As for these errors, I'm not sure why they are failing.
https://public-artifacts.taskcluster.net/QUMdmSgbTNq9R7sts6LwKw/1/public/logs/live_backing.log works, but the /0/ one is just completely absent.
Pete, are you aware of any issues where failing tasks result in a lack of logs being uploaded? I think I saw something which might be related in bugmail recently.
Ed, it looks like the logs genuinely aren't present and I'm not sure how to see what went wrong.
Wander, could you help by checking into the example QUMdmSgbTNq9R7sts6LwKw task in the docker-worker logs?
Flags: needinfo?(wcosta)
Flags: needinfo?(pmoore)
Flags: needinfo?(jhford)
(In reply to John Ford [:jhford] CET/CEST Berlin Time from comment #17)
> Dustin, per our conversation on IRC about cloud-mirror being involved here,
> it isn't. Treeherder uses the skip-cache header to bypass cloud-mirror.
>
> As for these errors, I'm not sure why they are failing
>
> https://public-artifacts.taskcluster.net/QUMdmSgbTNq9R7sts6LwKw/1/public/
> logs/live_backing.log works, but the /0/ one is just completely absent
>
> Pete, are you aware of any issues where failing tasks result in a lack of
> logs being uploaded? I think I saw something which might be related in
> bugmail recently.
This is expected behaviour - I suspect the worker that took run 0 was a victim of bug 1372172 - and thus lost contact with the world before the task completed and it could upload the backing log. Then the queue resolved it as claim-expired, and a new task run was created. On run 1, the worker didn't get swallowed into the abyss and was able to upload the backing log. Independently of whether bug 1372172 is resolved, it is quite possible for workers to suddenly fail/lose power/catch fire etc, so nothing should expect the backing log to be present, especially in the case the task run is resolved as claim-expired.
> Ed, it looks like the logs genuinely aren't present and I'm not sure how to
> see what went wrong.
>
> Wander, could you help by checking into the example QUMdmSgbTNq9R7sts6LwKw
> task in the docker-worker logs?
This is a generic-worker task.
Flags: needinfo?(wcosta)
Flags: needinfo?(pmoore)
Would there ever be cases where a `claim-expired` job had uploaded a log? If not, perhaps {treeherder, taskcluster-treeherder, <something upstream>} could not set the log URL in this case?
To answer that question, yes -- it's possible that a worker dies between uploading an artifact and resolving the task, resulting in claim-expired.
I think the thing causing the confusion here is that tc-treeherder is unconditionally adding a log URL to the messages it sends to treeherder: https://github.com/taskcluster/taskcluster-treeherder/blob/f6de5be5cb224784ef91b7e9c9ca43ee19bde843/src/handler.js#L289
There are two things we could do here:
* consult the queue (queue.listArtifacts) to see if there's a log artifact
* assume that exceptions don't have logs
The first option would add a bit of extra load on the queue, slow down tc-treeherder, and add more time between task completion and appearance in TH. The second option might be a decent compromise. Users can still find logs -- if they exist -- via the inspector. Ed, what do you think?
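The two options can be sketched side by side. This is an illustration in Python, not the tc-treeherder code (which is JavaScript): `build_job_logs`, the `list_artifacts` callable, and the `"live_backing_log"` name are hypothetical, while the URL layout and the "exception" state are taken from the log lines and comments in this bug.

```python
LOG_ARTIFACT = "public/logs/live_backing.log"
QUEUE_BASE = "https://queue.taskcluster.net/v1"


def log_url(task_id, run_id):
    """Build the log URL in the shape seen in the errors quoted in this bug."""
    return f"{QUEUE_BASE}/task/{task_id}/runs/{run_id}/artifacts/{LOG_ARTIFACT}"


def build_job_logs(task_id, run_id, run_state, list_artifacts=None):
    """Return the log references to attach to a Treeherder job message.

    Option 1: if `list_artifacts` (a hypothetical callable returning artifact
    names for a run) is given, attach the log only when it is listed.
    Option 2: otherwise, assume runs resolved as "exception" have no log.
    """
    if list_artifacts is not None:
        has_log = LOG_ARTIFACT in list_artifacts(task_id, run_id)
    else:
        has_log = run_state != "exception"
    if not has_log:
        return []
    # "live_backing_log" is a placeholder name for this sketch.
    return [{"name": "live_backing_log", "url": log_url(task_id, run_id)}]
```

Option 2 needs no extra request per job, which is the trade-off described above; the cost is that a log uploaded just before a claim expired would not be linked from Treeherder.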
Flags: needinfo?(emorley)
I think the second option sounds like a good compromise that will have a net win for UX :-)
Flags: needinfo?(emorley)
Assignee: nobody → dustin
I'm still seeing logs for successful tasks, so I think we're OK here.
Status: REOPENED → RESOLVED
Closed: 7 years ago → 6 years ago
Resolution: --- → FIXED
Amazing - thank you :-)
Component: Platform and Services → Services