Closed Bug 1017551 Opened 11 years ago Closed 9 years ago

Add Nagios alert for pending jobs in scheduler DB with submit time > N hours ago

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

task
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 978956

People

(Reporter: emorley, Unassigned)

Details

(Keywords: sheriffing-P1, Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2531] )

In bug 1012633 it took us some time to realise that a backlog was building up: the usual metric of "are there more than N total queued jobs" missed it, because most jobs were being picked up by a machine and run fine while a specific subset were not. To avoid missing this in future, we should add a Nagios alert (or fix/tweak the current rule if we already have something similar) that checks for pending (i.e. scheduled but not started) jobs in the scheduler DB with a submit time more than N hours ago. Open to suggestions as to what we set N to; we may also need to vary N depending on whether the job is scheduled on a main tree ({mozilla-central, mozilla-inbound, b2g-inbound, fx-team, mozilla-aurora, mozilla-beta, mozilla-release, mozilla-b2g*, ...}) or a lower priority tree ({try, cedar, ash, ...}).
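A minimal sketch of the check logic proposed above, not an existing script. It assumes the pending jobs have already been fetched from the scheduler DB as `(branch, submitted_at)` pairs, with `submitted_at` in epoch seconds. The names `tier_for`, `check_stale_pending` and `MAX_PENDING_HOURS` are hypothetical, and the per-tier values of N are placeholders, since N is still undecided here. The exit codes follow the standard Nagios plugin convention (0 = OK, 2 = CRITICAL).

```python
import time

# Standard Nagios plugin exit codes.
OK, CRITICAL = 0, 2

# Main trees as listed in this bug; anything else is treated as lower priority.
MAIN_TREES = {
    "mozilla-central", "mozilla-inbound", "b2g-inbound", "fx-team",
    "mozilla-aurora", "mozilla-beta", "mozilla-release",
}

# Placeholder values for N (hours) per tier; assumptions, not agreed numbers.
MAX_PENDING_HOURS = {"main": 2, "low": 8}


def tier_for(branch):
    """Classify a branch as a main tree or a lower priority tree."""
    if branch in MAIN_TREES or branch.startswith("mozilla-b2g"):
        return "main"
    return "low"


def check_stale_pending(pending, now=None):
    """Return (exit_code, message) for an iterable of (branch, submitted_at).

    `pending` must contain only jobs that are scheduled but not yet started.
    """
    now = time.time() if now is None else now
    stale = []
    for branch, submitted_at in pending:
        age_hours = (now - submitted_at) / 3600.0
        if age_hours > MAX_PENDING_HOURS[tier_for(branch)]:
            stale.append((branch, age_hours))
    if not stale:
        return OK, "OK: no pending jobs older than their tree's limit"
    oldest_branch, oldest_age = max(stale, key=lambda item: item[1])
    return CRITICAL, "CRITICAL: %d stale pending job(s); oldest %.1fh on %s" % (
        len(stale), oldest_age, oldest_branch)
```

Because the check looks at the age of each pending job rather than the total queue length, a handful of stuck jobs would trip it even when the overall queue looks healthy, which is the case that went unnoticed in bug 1012633. A real plugin would print the message and exit with the returned code.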
Keywords: sheriffing-P1
Similar/dupe of bug 978956?
Yeah, though that one states it's just for tests; happy for you to dupe either way and/or morph.
Given the total lack of activity in that bug, I don't really care at this point.
Depends on: 1050264
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2531]
Component: Tools → Buildduty
QA Contact: hwine → bugspam.Callek
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → DUPLICATE
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard