Closed Bug 778805 Opened 12 years ago Closed 12 years ago

Verify Zeus thift_check.py properly reports failures to Zeus for socorro staging

Categories

(Infrastructure & Operations Graveyard :: WebOps: Other, task)

x86
macOS
task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bburton, Assigned: bburton)

We're seeing ongoing connectivity issues to Socorro staging's HBase pool, but Zeus is not recording any errors for its backend nodes, as it does for the Postgres pool. Work with :tmary to stop HBase on one of the Socorro staging nodes (hp-node62 through hp-node69) and confirm that the Zeus check (https://pp-zlb01.phx.mozilla.net:9090/apps/zxtm/index.fcgi?section=Extra%20Files%3AExternProgMonitors) is working properly.
Per IRC and Zeus logs:

20:34:13 tmary | solarce: done

[30/Jul/2012:11:34:25 -0700] WARN monitors/socorro-thrift-check monitorfail Monitor has detected a failure in node '10.8.100.62:9090': Monitor exited, exit code 1, no output generated
[30/Jul/2012:11:34:25 -0700] SERIOUS pools/socorro-thrift-stage:9090 nodes/10.8.100.62:9090 nodefail Node 10.8.100.62 has failed - A monitor has detected a failure
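For context, the log above shows Zeus treating a monitor that exits with code 1 as a node failure. Here is a minimal sketch of a check following that exit-code convention. It is not the actual thift_check.py, whose contents are not shown in this bug. The function name `check_node` and its parameters are hypothetical, and a plain TCP connect is a weaker test than a real Thrift call would be.

```python
import socket


def check_node(host, port, timeout=5.0):
    """Return 0 if a TCP connection to host:port succeeds, else 1.

    Hypothetical helper: the return value is meant to be used as the
    process exit code, so 0 means healthy and 1 means failed.
    """
    try:
        # create_connection raises OSError on refusal, timeout, or
        # an unreachable host.
        with socket.create_connection((host, port), timeout=timeout):
            return 0
    except OSError:
        return 1
```

A wrapper script would pass the result to `sys.exit()`, for example `sys.exit(check_node("10.8.100.62", 9090))`. How Zeus hands the node address to an external monitor is not covered here.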
Status: NEW → ASSIGNED
Confirmed happy again, monitor is working:

[30/Jul/2012:11:45:56 -0700] INFO monitors/socorro-thrift-check monitorok Monitor is working for node '10.8.100.62:9090'.
[30/Jul/2012:11:45:57 -0700] INFO pools/socorro-thrift-stage:9090 nodes/10.8.100.62:9090 nodeworking Node 10.8.100.62 is working again
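To put a number on the test window, the `nodefail` and `nodeworking` entries quoted in this bug can be paired up per node. This is a rough sketch that assumes the log line layout seen in those two entries. The names `EVENT_RE` and `node_outages` are made up for illustration and are not part of any Zeus tooling.

```python
import re
from datetime import datetime

# Matches pool-level node events of the form quoted in this bug:
# [timestamp] LEVEL pools/<pool> nodes/<node> nodefail|nodeworking ...
EVENT_RE = re.compile(
    r"^\[(?P<ts>[^\]]+)\]\s+(?P<level>\w+)\s+.*?"
    r"nodes/(?P<node>\S+)\s+(?P<event>nodefail|nodeworking)\b"
)

LOG = [
    "[30/Jul/2012:11:34:25 -0700] SERIOUS pools/socorro-thrift-stage:9090 "
    "nodes/10.8.100.62:9090 nodefail Node 10.8.100.62 has failed - "
    "A monitor has detected a failure",
    "[30/Jul/2012:11:45:57 -0700] INFO pools/socorro-thrift-stage:9090 "
    "nodes/10.8.100.62:9090 nodeworking Node 10.8.100.62 is working again",
]


def node_outages(lines):
    """Return (node, downtime) pairs for each nodefail/nodeworking pair."""
    down_since = {}
    outages = []
    for line in lines:
        m = EVENT_RE.match(line.strip())
        if not m:
            continue  # skip monitor-level lines and anything unrecognised
        ts = datetime.strptime(m.group("ts"), "%d/%b/%Y:%H:%M:%S %z")
        node = m.group("node")
        if m.group("event") == "nodefail":
            # Keep the earliest failure time if failures repeat.
            down_since.setdefault(node, ts)
        elif node in down_since:
            outages.append((node, ts - down_since.pop(node)))
    return outages


outages = node_outages(LOG)
```

For the two entries above this gives a single outage for 10.8.100.62:9090 of 11 minutes 32 seconds, the time HBase was stopped on the node during the test.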
Status: ASSIGNED → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Component: Server Operations: Web Operations → WebOps: Other
Product: mozilla.org → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard