Closed
Bug 881228
Opened 11 years ago
Closed 11 years ago
Please enable downtime alerts for nagios-releng in #buildduty
Categories
(mozilla.org Graveyard :: Server Operations, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: Callek, Assigned: ashish)
References
Details
This would let nagios-releng post downtime alerts like the ones we already see from nagios-scl3:
Mon 05:16:48 PDT ringring.mv.mozilla.com is DOWNTIMESTART (UP) :PING OK - Packet loss = 0%, RTA = 3.18 ms
nagios-scl3 Mon 05:17:47 PDT ringring.mv.mozilla.com is DOWNTIMEEND (UP) :PING OK - Packet loss = 0%, RTA = 2.41 ms
This lets us tell whether a group of hosts is alerting because it just came off downtime or because the alert is new.
We could additionally turn it on for our release@ nagios notification e-mails, but I'm not as concerned about giving us more spam that way, since every buildduty person I know of uses the channel rather than e-mail for this task.
Reporter
Comment 1•11 years ago
I did tell you in IRC there was no urgency here, but I'm curious what we're looking at for an ETA, since this would help a lot, for me if no one else!
Flags: needinfo?(ashish)
Assignee
Comment 2•11 years ago
:Callek, the last time we spoke you mentioned you'd bring this up in your team meeting and clarify whether emailing the list for downtime notifications (just as all other alerts do) was feasible. If so, I can expedite and push this out soon. Do let me know, thanks!
Flags: needinfo?(ashish) → needinfo?(bugspam.Callek)
Reporter
Comment 3•11 years ago
Redirecting needinfo to Hal, since he took ownership of this item.
Flags: needinfo?(bugspam.Callek) → needinfo?(hwine)
Comment 4•11 years ago
Hal, poke.
Comment 5•11 years ago
Shyam -- aiui the ideal solution is to get the "depends on" relationships into Nagios, so we'll get the right notifications.
Sending downtime alerts is a stop-gap until the "depends on" relationships are established. We seem to be bogging down on that effort -- I can't even find a bug on it, so created bug 932598.
Since we've made some procedural changes on our side, back to Callek to confirm this is still wanted by the buildduty team.
Flags: needinfo?(hwine) → needinfo?(bugspam.Callek)
Reporter
Comment 6•11 years ago
(In reply to Hal Wine [:hwine] (use needinfo) from comment #5)
> Shyam -- aiui the ideal solution is to get the "depends on" relationships
> into Nagios, so we'll get the right notifications.
>
> Sending downtime alerts is a stop-gap until the "depends on" relationships
> are established. We seem to be bogging down on that effort -- I can't even
> find a bug on it, so created bug 932598.
This is indeed still wanted, regardless of whether the "depends on" relationships are accurate.
Getting the "depends on" work done will lower the usefulness of this as a global thing, but will not invalidate this bug in and of itself.
Flags: needinfo?(bugspam.Callek)
Assignee
Comment 7•11 years ago
Pushed out, live and verified:
Hosts:
---8<---
23:03:14 < ashish> nagios-releng: downtime t-w732-ix-126.wintest.releng.scl3.mozilla.com 1m test
23:03:14 < nagios-releng> ashish: Downtime for host t-w732-ix-126.wintest.releng.scl3.mozilla.com scheduled for 0:01:00
23:03:15 < nagios-releng> Tue 23:03:14 PST t-w732-ix-126.wintest.releng.scl3.mozilla.com is DOWNTIMESTART (DOWN) :PING CRITICAL - Packet loss = 100%
23:04:15 < nagios-releng> Tue 23:04:15 PST t-w732-ix-126.wintest.releng.scl3.mozilla.com is DOWNTIMEEND (DOWN) :PING CRITICAL - Packet loss = 100%
---8<---
Services:
---8<---
22:18:53 < ashish> nagios-releng: downtime 4003 1m test
22:18:53 < nagios-releng> ashish: Downtime for service bm-remote.build.mtv1.mozilla.com:http scheduled for 0:01:00
22:18:57 < nagios-releng> Tue 22:18:57 PST bm-remote.build.mtv1.mozilla.com:http is DOWNTIMESTART (WARNING): HTTP WARNING: HTTP/1.1 403 Forbidden - 599 bytes in 0.007 second response time (http://m.allizom.org/http) (notify-by-email) HTTP WARNING: HTTP/1.1 403 Forbidden - 599 bytes in 0.007 second response time
22:19:53 < nagios-releng> Tue 22:19:53 PST bm-remote.build.mtv1.mozilla.com:http is DOWNTIMEEND (WARNING): HTTP WARNING: HTTP/1.1 403 Forbidden - 599 bytes in 0.007 second response time (http://m.allizom.org/http) (notify-by-email) HTTP WARNING: HTTP/1.1 403 Forbidden - 599 bytes in 0.007 second response time
---8<---
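For anyone scripting against the channel, here is a minimal Python sketch of how the DOWNTIMESTART/DOWNTIMEEND lines above could be picked out. The regular expression is inferred only from the sample lines in this bug, not from any documented bot format, and the names `ALERT_RE`, `Alert`, `parse_alert` and `is_downtime_event` are hypothetical helpers made up for illustration. Lines in any other shape simply return `None`.

```python
import re
from typing import NamedTuple, Optional

# Assumption: alert lines look like the samples in this bug, i.e.
#   "<day> <HH:MM:SS> <tz> <host[:service]> is <KIND> (<STATE>) :<plugin output>"
# The pattern is a guess based on those samples only.
ALERT_RE = re.compile(
    r"(?P<day>\w{3}) (?P<time>\d{2}:\d{2}:\d{2}) (?P<tz>\w+) "
    r"(?P<target>\S+) is (?P<kind>\w+) \((?P<state>\w+)\)\s*:\s*(?P<output>.*)"
)


class Alert(NamedTuple):
    target: str  # host, or host:service for service alerts
    kind: str    # e.g. DOWNTIMESTART or DOWNTIMEEND
    state: str   # state reported in parentheses, e.g. UP, DOWN, WARNING
    output: str  # plugin output following the colon


def parse_alert(line: str) -> Optional[Alert]:
    """Return the parsed alert, or None if the line does not match the assumed shape."""
    m = ALERT_RE.search(line)
    if m is None:
        return None
    return Alert(m["target"], m["kind"], m["state"], m["output"])


def is_downtime_event(alert: Alert) -> bool:
    """True when the alert marks the start or end of a scheduled downtime."""
    return alert.kind in {"DOWNTIMESTART", "DOWNTIMEEND"}
```

A consumer could call `parse_alert` on each channel line and use `is_downtime_event` to separate hosts coming off downtime from anything else it recognises.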
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•10 years ago
Product: mozilla.org → mozilla.org Graveyard