Closed Bug 1647841 Opened 4 years ago Closed 4 years ago

Monitor overall system health of production environment

Categories

(Cloud Services :: Operations: CRLite, task)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: sven, Assigned: sven)

References

Details

Bug 1596537 mentions that some metrics are exported to Stackdriver. Figure out whether we can export these to InfluxDB instead, or whether we should implement a Poucave check.

We also need to monitor the memory utilization of the Redis instance.
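
For reference, a minimal sketch of how memory utilization could be derived from the text of a Redis INFO reply. It assumes the instance has a maxmemory limit configured and uses the standard used_memory and maxmemory fields; the helper names and the sample values are made up for illustration and are not taken from our deployment.

```python
from typing import Dict, Optional


def parse_redis_info(text: str) -> Dict[str, str]:
    """Parse the key:value lines of a Redis INFO reply into a dict."""
    fields = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#"):
            continue
        key, _, value = line.partition(":")
        fields[key] = value
    return fields


def memory_utilization(info: Dict[str, str]) -> Optional[float]:
    """Return used_memory / maxmemory, or None when no limit is set."""
    used = int(info["used_memory"])
    limit = int(info.get("maxmemory", "0"))
    if limit <= 0:
        # maxmemory of 0 means "no limit", so a ratio is undefined.
        return None
    return used / limit


# Hypothetical sample reply: 768 MiB used out of a 1 GiB limit.
SAMPLE_INFO = "# Memory\r\nused_memory:805306368\r\nmaxmemory:1073741824\r\n"

if __name__ == "__main__":
    print(memory_utilization(parse_redis_info(SAMPLE_INFO)))
```

Returning None rather than 0 when no limit is configured keeps an unlimited instance from looking like an idle one on a dashboard.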

I've filed https://github.com/mozilla-services/poucave/pull/555 for the Poucave check.

The code change for Poucave is deployed; https://github.com/mozilla-services/cloudops-infra/pull/2474 will enable the new check.

Remaining steps from today's meeting:

Open a PR against crlite to add a crlite prefix to the metrics.

Set up PagerDuty (low urgency).

Add Pingdom checks for Poucave.

Create a Grafana dashboard for the custom metrics and Redis.

Add an alert on Redis memory usage.
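
To illustrate the metric prefix step, here is a rough sketch of the naming scheme only. The dot separator, the function name, and the metric names are assumptions for illustration; the actual change belongs in the crlite code base and may look quite different.

```python
from typing import Dict


def with_prefix(metrics: Dict[str, float], prefix: str = "crlite") -> Dict[str, float]:
    """Return a copy of metrics with every name under the given prefix.

    Names that already carry the prefix are left unchanged, so applying
    the function twice gives the same result as applying it once.
    """
    qualified = {}
    for name, value in metrics.items():
        if name.startswith(prefix + "."):
            qualified[name] = value
        else:
            qualified[f"{prefix}.{name}"] = value
    return qualified


if __name__ == "__main__":
    # Hypothetical metric names, not the ones crlite actually emits.
    print(with_prefix({"entries_processed": 42.0, "crlite.queue_depth": 3.0}))
```

A shared prefix makes it possible to select all of the custom metrics with a single pattern when building the dashboard.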

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED