Closed
Bug 1284289
Opened 8 years ago
Closed 8 years ago
Improvements to celery task retrying & New Relic reporting
Categories
(Tree Management :: Treeherder: Data Ingestion, defect, P2)
Tree Management
Treeherder: Data Ingestion
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Assigned: emorley)
References
Details
Attachments
(1 file)
There are currently a few issues:
* Bug 1281809 added the `@retryable_task` decorator to the `parse_log`, `store_failure_lines` and `crossreference_error_lines` tasks, however all three had their own retry logic already, so there are now two layers of retries, which is what contributed to the 100K message backlog in bug 1283413.
* New Relic lumps all of the `celery.exceptions.Retry()` exceptions together which isn't great for visibility. We should use the Python agent's `.record_exception()` feature to report the original exception (and include the retry count in the custom attributes list), then we can add `celery.exceptions.Retry()` to the ignore list on the New Relic dashboard config.
* There are a few other tasks that can be switched to the `@retryable_task` decorator to save duplication.
* The Celery docs now suggest prefixing the `.retry()` call with `raise `, to make the lack of continuing code execution more obvious. (Even though that particular `raise` will never actually occur.)
Assignee | ||
Updated•8 years ago
|
Assignee | ||
Updated•8 years ago
|
Comment 1•8 years ago
|
||
Assignee | ||
Updated•8 years ago
|
Attachment #8772834 -
Flags: review?(wlachance)
Comment 2•8 years ago
|
||
Comment on attachment 8772834 [details]
[treeherder] mozilla:task-retry-tweaks > mozilla:master
Looks like some nice cleanup!
Attachment #8772834 -
Flags: review?(wlachance) → review+
Comment 3•8 years ago
|
||
Commits pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/46b2a41f0b553c63787f94af79d0c93e7ea6f5c9
Bug 1284289 - Remove duplicate .retry() of crossreference_error_lines
Since the task already has the `@retryable_task` decorator applied.
https://github.com/mozilla/treeherder/commit/2f317a0be2ec7b6c993feb44b5d33e18791d77b0
Bug 1284289 - Remove duplicate .retry() of store_failure_lines
Since the task already has the `@retryable_task` decorator applied.
https://github.com/mozilla/treeherder/commit/77959aa2c7a6077f0a0c1ced5018619f76153d73
Bug 1284289 - Remove duplicate .retry() of parse_log
Since the task already has the `@retryable_task` decorator applied.
Note that the decorator retries in more cases than the .retry()s in
`utils.py` did, since previously only `urllib2.URLError` exceptions
during `extract_text_log_artifacts()` would retry, whereas now any will.
However (a) we can add more exemptions to `@retryable_task` as we
encounter them later, (b) this isn't a regression on the current
duplicate `.retry()` behaviour.
https://github.com/mozilla/treeherder/commit/ffd5a5c8921ec7f892f20462089686efe2231972
Bug 1284289 - Use @retryable_task with the submit_elasticsearch_doc task
Rather than duplicating more retry-handling logic.
https://github.com/mozilla/treeherder/commit/e518b52b6e359d742c2e85cf4400e219bf9f694f
Bug 1284289 - Clean up log parser logging statements
https://github.com/mozilla/treeherder/commit/9faf9d97ae0b050c1bdc3fda3c4bb7a4e6a15c34
Bug 1284289 - Clarify celery retry call behaviour with 'raise' prefix
As recommended by:
http://docs.celeryproject.org/en/stable/reference/celery.app.task.html#celery.app.task.Task.retry
https://github.com/mozilla/treeherder/commit/e699729b40f4359248aa54803de7f94b0a7abefc
Bug 1284289 - Simplify @retryable_task's exception blacklist
The previous variable name of `raise_exceptions` was easy to misread as
"exceptions that will cause this task to fail and thus retry". Also,
nothing is customising this list at present, so keep as a fixed
blacklist for now.
https://github.com/mozilla/treeherder/commit/77a809cd0c7adbc273a88a859d0350ab99ba1730
Bug 1284289 - Don't retry tasks for 'zlib.error` exceptions
This prevents continually retrying when we hit bug 1284360, given that
the task will never succeed anyway.
https://github.com/mozilla/treeherder/commit/f5817334d86e08836874b00ecac38fd6e4479f3c
Bug 1284289 - Manually report exceptions that caused tasks to retry
See in-code comment for more details.
Once this is deployed, I'll use the New Relic web configuration page to
add `celery.exceptions:Retry` to the ignore list. (Contrary to the
linked New Relic docs, this cannot be done via newrelic.ini, since the
server-side config takes preference once server-side mode is enabled.)
Assignee | ||
Updated•8 years ago
|
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 4•8 years ago
|
||
I've now added 'celery.exceptions:Retry' to the ignore list for all New Relic app configurations via the web settings UI, since we're reporting the exceptions individually.
As such:
* Exceptions of type "celery.exceptions:Retry" will no longer be ingested into New Relic
* The counts of existing exceptions will appear to increase, but it's really just the retries morphing signature (from the PR here)
* When looking at an individual exception, you can tell if it was from a retry, since it will have the custom attribute "number_of_prior_retries"
* The new "Error Analysis" section on New Relic should be much more usable, since exceptions won't just be all piled into the one bucket.
If anyone has any questions, just ask :-)
You need to log in
before you can comment on or make changes to this bug.
Description
•