Closed Bug 537988 Opened 15 years ago Closed 15 years ago

Ensure there are appropriate nagios checks on dm-download02

Categories

(mozilla.org Graveyard :: Server Operations, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: reed, Assigned: arzhel)

Details

dm-download02 is the only machine in Bouncer that serves every release ever made. That makes it critical for users on old Firefox versions: it is the only machine they can get old releases from (via Bouncer) in order to update to the latest release.
Please ensure there are appropriate nagios checks, including:
* Apache working
* Files for old versions exist (separate checks for netapp vs. eql)
* etc...
It would be really nice if there were two machines in Bouncer serving every file, but I'll take what I can get for now, I guess...
23:48:15 <@fox2mike> nagios: status dm-download02:.*
23:48:18 <@nagios> fox2mike: dm-download02:avg load is OK: OK - load average: 0.05, 0.11, 0.09
23:48:18 <@nagios> fox2mike: dm-download02:RSYNC is OK: TCP OK - 0.000 second response time on port 873
23:48:18 <@nagios> fox2mike: dm-download02:health is OK: OK - System: proliant dl360 g4, S/N: USM50703JH, ROM: P52 12/02/2004, hardware working fine
23:48:18 <@nagios> fox2mike: dm-download02:root partition is OK: DISK OK - free space: / 10990 MB (17% inode=99%):
23:48:18 <@nagios> fox2mike: dm-download02:http - dm-download02.mozilla.org is OK: HTTP OK HTTP/1.1 200 OK - 810 bytes in 0.003 seconds
23:48:18 <@nagios> fox2mike: dm-download02:RAID is OK: RAID OK: Smart Array 6i in Slot 0 array A logicaldrive 1 (67.8 GB, RAID 1+0, OK) [Controller Status: OK Cache Status: OK]
23:48:20 <@nagios> fox2mike: dm-download02:ftp is OK: FTP OK - 0.002 second response time on port 21 [220 (vsFTPd 2.0.5)]
23:48:22 <@nagios> fox2mike: dm-download02:PING is OK: PING OK - Packet loss = 0%, RTA = 0.17 ms
23:48:24 <@nagios> fox2mike: dm-download02:http - videos.mozilla.org is OK: HTTP OK HTTP/1.1 200 OK - 2094 bytes in 0.002 seconds
What more do you think needs to be monitored?
(In reply to comment #1)
> What more do you think needs to be monitored?
We need to ensure that files for old versions exist (at least two checks: one for what's on the netapp and one for what's on eql). You might get away with HTTP checks that just do a HEAD for those files (rather than actually GETting them), or you can do a file check on the system itself...
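The HEAD-based option suggested here could look roughly like the sketch below. It is not the check that was deployed for this bug; it is a minimal, hypothetical Nagios-style plugin written with Python's standard `http.client`, and the names `classify` and `head_check` are made up for illustration. It assumes that a 2xx answer means the file is present, treats a redirect as a warning, and treats everything else as critical.

```python
import http.client
from urllib.parse import quote, urlsplit

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def classify(status):
    """Map an HTTP status code to a (exit_code, label) pair."""
    if 200 <= status < 300:
        return OK, "OK"
    if 300 <= status < 400:
        return WARNING, "WARNING"
    return CRITICAL, "CRITICAL"


def head_check(url, timeout=10.0):
    """Send one HEAD request (redirects are not followed).

    Returns (exit_code, message) in the usual Nagios plugin style.
    """
    parts = urlsplit(url)
    if parts.scheme not in ("http", "https") or not parts.netloc:
        return UNKNOWN, f"HEAD UNKNOWN - unsupported URL: {url}"
    conn_cls = (http.client.HTTPSConnection if parts.scheme == "https"
                else http.client.HTTPConnection)
    # Percent-encode spaces and similar characters; keep existing %XX escapes.
    path = quote(parts.path or "/", safe="/%")
    conn = None
    try:
        conn = conn_cls(parts.netloc, timeout=timeout)
        conn.request("HEAD", path)
        status = conn.getresponse().status
    except (OSError, http.client.HTTPException) as err:
        return CRITICAL, f"HEAD CRITICAL - {url}: {err}"
    finally:
        if conn is not None:
            conn.close()
    code, label = classify(status)
    return code, f"HEAD {label} - {url} returned HTTP {status}"
```

A wrapper script would print the message and exit with the returned code. Because only headers are requested, the check stays cheap even for large installers.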
Assignee: server-ops → ayounsi
Arzhel, eta?
Soon. check_http doesn't support HEAD, so I chose to use a file check. The last step is to figure out how to check files with a space in their path/name through NRPE.
I added file checks on en-US and de-DE for the Linux build of the latest release of each Firefox and Thunderbird branch (1.0.8, 1.5.0.12, 2.0.0.20, 3.0.18, 3.5.8; for Firefox, both on the netapp and on the eql). I can't check the win32 or mac builds because both have spaces in their filenames (and NRPE doesn't like escaped characters), but we can assume that if the win32 build disappears, the linux-i686 build will disappear as well. Is that good enough, or are more checks required?
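One possible way around the spaces-in-filenames limitation, sketched here as an assumption and not as what was deployed: keep the list of files in a manifest on the host itself, so that no path ever has to pass through NRPE command arguments. The manifest format (one path per line, `#` for comments) and the names `load_manifest` and `check_files` are hypothetical.

```python
import os

# Standard Nagios plugin exit codes.
OK, WARNING, CRITICAL, UNKNOWN = 0, 1, 2, 3


def load_manifest(manifest_path):
    """Return the non-empty, non-comment lines of the manifest as paths.

    Each line is taken verbatim, so spaces in a path need no escaping.
    """
    with open(manifest_path, encoding="utf-8") as handle:
        lines = (line.rstrip("\r\n") for line in handle)
        return [line for line in lines
                if line.strip() and not line.startswith("#")]


def check_files(paths, label="FILES"):
    """Return (exit_code, message) depending on whether every path exists."""
    if not paths:
        return UNKNOWN, f"{label} UNKNOWN - no paths configured"
    missing = [p for p in paths if not os.path.isfile(p)]
    if missing:
        return CRITICAL, f"{label} CRITICAL - missing: " + "; ".join(missing)
    return OK, f"{label} OK - {len(paths)} file(s) present"
```

With one manifest per storage backend, the same script could serve as two separate checks (one labelled for the netapp, one for the eql), and the win32 and mac builds could be listed directly instead of being inferred from the linux-i686 build.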
Bug 548408 covers the timeouts caught by these checks.
I think we can close this bug.
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Product: mozilla.org → mozilla.org Graveyard