Closed Bug 1441996 Opened 7 years ago Closed 3 years ago

Sentry connectivity checks for Socorro processes

Tracking

(Not tracked)

Status:

RESOLVED DUPLICATE of bug 1758701

People

(Reporter: osmose, Unassigned)

Details

Osmose [:osmose, :mkelly]

Reporter

Description

7 years ago

We use Sentry across Socorro for reporting errors, but we have nothing in place to alert us when it is unreachable. This is also a bit complex because Sentry is typically how we'd report an error like this.

For the webapp, we could add an endpoint that sends a test message to Sentry and returns a 500 error if it fails. Infra can hit this endpoint periodically, after a deploy, or both. For the processor/crontabber/etc., one suggestion was to report to Datadog when we can't send a test message on startup.
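A minimal sketch of the two ideas above, not an actual Socorro implementation. Everything here is hypothetical: `send_test_message` stands in for whatever call sends a test event to Sentry, `incr_metric` stands in for a Datadog counter increment, and the metric name `sentry.unreachable` is made up for illustration.

```python
from typing import Callable, Tuple


def sentry_check_view(send_test_message: Callable[[], None]) -> Tuple[int, str]:
    """Webapp-style check: return (status_code, body).

    Returns 500 if sending the test message raises, 200 otherwise.
    """
    try:
        send_test_message()
    except Exception as exc:
        return 500, f"sentry unreachable: {exc}"
    return 200, "ok"


def startup_sentry_check(
    send_test_message: Callable[[], None],
    incr_metric: Callable[[str], None],
) -> bool:
    """Processor/crontabber-style check run once at startup.

    Sentry can't report its own outage, so a failure is reported
    through the metrics client instead.
    """
    try:
        send_test_message()
    except Exception:
        incr_metric("sentry.unreachable")  # hypothetical metric name
        return False
    return True
```

Both functions take the sender as an argument so the check logic can be exercised without a live Sentry or Datadog connection.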

Osmose [:osmose, :mkelly]

Reporter

Comment 1

7 years ago

willkg: Besides the webapp, processor, and crontabber, are there any other processes/services we would want to cover with this kind of check?

willkg/miles: What options do we have for reporting besides Sentry itself and Datadog?

Flags: needinfo?(willkg)

Flags: needinfo?(miles)

Miles Crabill [:miles]

Comment 2

7 years ago

Sentry and Datadog are the realistic places where this reporting should be handled. We could put this in the healthcheck/heartbeat endpoints in some capacity, but returning non-200 in those endpoints is treated as page-able downtime.

Flags: needinfo?(miles)

Will Kahn-Greene [:willkg] ET needinfo? me

Comment 3

7 years ago

Seems to me that what we want to test here are two things:

1. will the code we have send exceptions to Sentry
2. is the configuration for the component correct

Both of those are things that change during deploys; they're not things that change on the whims of time. Given that, I don't want to add these to heartbeat-type healthchecks. I think we want to implement during-deploy checks that get run once during a deploy for each component.

For mechanisms, the webapp has that "./manage.py raven whatever" thing. I think we could build an equivalent thing for the processor and crontabber where a "pass" is "error got sent to Sentry" and a "fail" is "code raised an error trying to send an error to Sentry".

Sending an incr to Datadog on fails is interesting, but I think I'd rather this used our existing deploy alerting for when deploys fail.
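A rough sketch of what a during-deploy check along those lines could look like, not the actual tooling. The function name `run_deploy_checks` and the per-component sender callables are hypothetical. Each sender is assumed to try sending a test error to Sentry and to raise if it can't.

```python
import sys
from typing import Callable, Dict


def run_deploy_checks(senders: Dict[str, Callable[[], None]]) -> int:
    """Run one Sentry check per component and return a process exit code.

    "pass" means the sender returned without raising; "fail" means the
    code raised while trying to send. Any failure yields exit code 1 so
    the deploy pipeline's existing failure alerting can pick it up.
    """
    failed = []
    for component, send_test_error in senders.items():
        try:
            send_test_error()
        except Exception as exc:
            print(f"FAIL {component}: {exc}", file=sys.stderr)
            failed.append(component)
        else:
            print(f"PASS {component}")
    return 1 if failed else 0
```

A deploy script would call this once per deploy and pass the result to `sys.exit()`, so a broken Sentry configuration fails the deploy instead of paging through a healthcheck.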

Flags: needinfo?(willkg)

Will Kahn-Greene [:willkg] ET needinfo? me

Comment 4

3 years ago

In bug #1758701, we implemented a CLI that lets us test Sentry configuration and connectivity for any of the server nodes, so I'm going to dupe this one to that.

Status: NEW → RESOLVED

Closed: 3 years ago

Resolution: --- → DUPLICATE

Categories

(Socorro :: General, task)
