Bug 1647841
Opened 4 years ago
Closed 4 years ago
Monitor overall system health of production environment
Categories
(Cloud Services :: Operations: CRLite, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: sven, Assigned: sven)
Bug 1596537 mentions that some metrics are exported to Stackdriver. Figure out whether we can export them to InfluxDB instead, or whether we should implement a Poucave check.
Comment 1 (Assignee) • 4 years ago
We also need to monitor the memory utilization of the Redis instance.
Comment 2 (Assignee) • 4 years ago
I've filed https://github.com/mozilla-services/poucave/pull/555 for the Poucave check.
Comment 3 (Assignee) • 4 years ago
The code change for Poucave is deployed; https://github.com/mozilla-services/cloudops-infra/pull/2474 will enable the new check.
Comment 4 • 4 years ago
Remaining steps from today's meeting:
PR to crlite to add a crlite prefix to metrics
Set up PagerDuty (low urgency)
Add Pingdom checks for Poucave
Create a Grafana dashboard for custom metrics and Redis
Add an alert on Redis memory usage
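For the Redis memory alert, a minimal sketch of the threshold logic might look like the following. It assumes the alert is derived from the `used_memory` and `maxmemory` fields of the Redis `INFO memory` reply; the helper names and the 0.8 threshold are hypothetical and are not taken from the actual dashboard or alert configuration.

```python
from typing import Dict, Optional


def parse_info(text: str) -> Dict[str, str]:
    """Parse the 'key:value' lines of a Redis INFO reply into a dict."""
    fields: Dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and section headers such as "# Memory".
        if not line or line.startswith("#"):
            continue
        key, sep, value = line.partition(":")
        if sep:
            fields[key] = value
    return fields


def memory_utilization(info: Dict[str, str]) -> Optional[float]:
    """Return used_memory / maxmemory, or None if no limit is configured."""
    used = int(info["used_memory"])
    limit = int(info.get("maxmemory", "0"))
    # maxmemory is 0 when Redis has no memory limit set,
    # so a ratio cannot be computed in that case.
    if limit <= 0:
        return None
    return used / limit


def should_alert(info: Dict[str, str], threshold: float = 0.8) -> bool:
    """Hypothetical rule: alert once utilization reaches the threshold."""
    ratio = memory_utilization(info)
    return ratio is not None and ratio >= threshold
```

In practice the INFO text would come from the Redis instance itself (for example via a client's `INFO memory` call), and the threshold would be tuned to whatever the Grafana alert ends up using.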
Updated • 4 years ago
Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → FIXED