Bug 1441996
Opened 7 years ago
Closed 3 years ago
Sentry connectivity checks for Socorro processes
Categories
(Socorro :: General, task)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 1758701
People
(Reporter: osmose, Unassigned)
Details
We use Sentry across Socorro for reporting errors, but we have nothing in place to alert us when it is unreachable. This is also a bit complex because Sentry is typically how we'd report an error like this.
For the webapp, we could add an endpoint that sends a test message to Sentry and throws a 500 error if it fails. Infra can hit this endpoint either periodically, after a deploy, or both.
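A minimal, framework-agnostic sketch of that endpoint logic. The function name and the `send_test_message` callable are hypothetical, and the sketch assumes the sender raises when delivery fails. A client that queues messages in the background would need to surface delivery failures some other way.

```python
from typing import Callable, Tuple


def sentry_check_endpoint(
    send_test_message: Callable[[str], None],
) -> Tuple[int, str]:
    """Return (HTTP status, response text) for a Sentry check endpoint.

    send_test_message is a hypothetical callable that is assumed to raise
    if the test message could not be delivered to Sentry.
    """
    try:
        send_test_message("Socorro webapp Sentry connectivity check")
    except Exception as exc:
        # Delivery failed: answer with a 500 so the request from infra fails.
        return 500, f"Sentry check failed: {exc!r}"
    return 200, "Sentry check passed"
```

A real view would wrap this in whatever response type the webapp's framework expects.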
For the processor/crontabber/etc, one suggestion was to report to Datadog when we can't send a test message on startup.
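A rough sketch of that startup check, under stated assumptions: `send_test_message` and `incr_metric` are hypothetical callables standing in for the Sentry client and the Datadog metrics client, and the metric name is made up for illustration.

```python
from typing import Callable


def check_sentry_on_startup(
    component: str,
    send_test_message: Callable[[str], None],
    incr_metric: Callable[[str], None],
) -> bool:
    """Send a test message at startup; count a metric if it cannot be sent.

    Both callables are assumptions: send_test_message is expected to raise
    on delivery failure, and incr_metric increments a named counter.
    """
    try:
        send_test_message(f"{component} startup Sentry check")
    except Exception:
        # Sentry is unreachable, so report through the metrics path instead.
        incr_metric(f"{component}.sentry_check.failed")  # hypothetical name
        return False
    return True
```

Returning a boolean rather than raising keeps the process starting up even when Sentry is down, which seems to fit the intent of reporting the failure rather than blocking on it.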
Comment 1•7 years ago (Reporter)
willkg: Besides the webapp, processor, and crontabber, are there any other processes/services we would want to cover with this kind of check?
willkg/miles: What options do we have for reporting besides Sentry itself and Datadog?
Flags: needinfo?(willkg)
Flags: needinfo?(miles)
Comment 2•7 years ago
Sentry and Datadog are the realistic places where this reporting should be handled.
We could put this in the healthcheck/heartbeat endpoints in some capacity, but returning non-200 in those endpoints is treated as page-able downtime.
Flags: needinfo?(miles)
Comment 3•7 years ago
Seems to me that what we want to test here are two things:
1. Will the code we have send exceptions to Sentry?
2. Is the configuration for the component correct?
Both of those are things that change during deploys--they're not things that change on the whims of time.
Given that, I don't want to add these to heartbeat-type healthchecks. I think we want to implement during-deploy checks that get run once during a deploy for each component.
For mechanisms, the webapp has that "./manage.py raven whatever" thing. I think we could build an equivalent thing for the processor and crontabber where a "pass" is "error got sent to sentry" and a "fail" is "code raised an error trying to send an error to sentry".
Sending an incr to datadog on fails is interesting, but I think I'd rather this used our existing deploy alerting for when deploys fail.
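The pass/fail semantics described in this comment could be sketched roughly as below. This is an illustration, not the command that was eventually built: `run_sentry_check` and `send_test_error` are hypothetical names, and the sender is assumed to raise when it cannot deliver. The check returns a process exit status so that a failed check would fail the deploy step and go through existing deploy alerting.

```python
import sys
from typing import Callable


def run_sentry_check(
    component: str,
    send_test_error: Callable[[Exception], None],
    out=sys.stderr,
) -> int:
    """Run once per component during a deploy; return a process exit status.

    Pass (0): the test error was sent to Sentry.
    Fail (1): the code raised an error while trying to send it.
    """
    try:
        send_test_error(RuntimeError(f"{component}: Sentry deploy check"))
    except Exception as exc:
        print(f"FAIL {component}: {exc!r}", file=out)
        return 1
    print(f"PASS {component}", file=out)
    return 0
```

A deploy script could then call `sys.exit(run_sentry_check("processor", sender))` for each component, where `sender` is whatever function that component uses to report to Sentry.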
Flags: needinfo?(willkg)
Comment 4•3 years ago
We implemented a CLI in bug #1758701 that lets us test Sentry configuration and connectivity for any of the server nodes, so I'm going to dupe this one to that.
Status: NEW → RESOLVED
Closed: 3 years ago
Resolution: --- → DUPLICATE