Closed Bug 1188930 Opened 9 years ago Closed 9 years ago

Identify how stats can reset in Heka

Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P2)

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: mreid, Unassigned)

Details

We received an alert where a custom measure called "ProcessFileFailures" increased from 0 to 1. Then when we checked the Heka dashboard UI, that measure had reverted to zero.

The code for incrementing the number of failures is:
https://github.com/mozilla-services/data-pipeline/blob/master/heka/plugins/s3splitfile/s3splitfile_output.go#L477

As far as I can tell there is no mechanism for resetting this value in the code. Heka was not restarted after the alert.

Rob, is this expected behaviour? Do you know why it might have reverted to zero?
Flags: needinfo?(rmiller)
May need to change the priority based on the data.
Iteration: --- → 42.3 - Aug 10
Priority: -- → P2
Yeah, I'm baffled by this one. The behaviour isn't expected, nor do I know how it might have reverted to zero. If someone showed up in IRC w/ this problem, I'd strongly suspect that the process actually HAD been restarted but they somehow didn't know that was the case. That seems less likely to me here, however, since both you and Trink seem to have checked and verified that it hadn't. :P
Flags: needinfo?(rmiller)
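It looks like there are two dwl instances running, and https://pipeline-prototype-dwl.prod.mozaws.net/#health is landing on the one without the error. That would explain why the dashboard showed zero even though nothing was restarted or reset.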
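For illustration, here is a minimal Go sketch of how a per-process counter can appear to reset when one hostname fronts two instances. Everything in it is an assumption made for the example: the `instance` and `roundRobin` types, the names `dwl-a` and `dwl-b`, and the round-robin routing are hypothetical and are not taken from Heka or from the actual deployment.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// instance stands in for one running process. The counter lives only in
// that process's memory. Hypothetical type, not part of Heka's API.
type instance struct {
	name                string
	processFileFailures int64
}

// recordFailure only ever increments; nothing in this sketch resets the value.
func (i *instance) recordFailure() {
	atomic.AddInt64(&i.processFileFailures, 1)
}

// failures returns the current value, as a dashboard request would read it.
func (i *instance) failures() int64 {
	return atomic.LoadInt64(&i.processFileFailures)
}

// roundRobin stands in for whatever maps a single hostname onto several
// instances. The real routing policy is unknown; this is an assumption.
type roundRobin struct {
	backends []*instance
	next     int
}

// pick returns the next backend in rotation.
func (r *roundRobin) pick() *instance {
	b := r.backends[r.next%len(r.backends)]
	r.next++
	return b
}

func main() {
	a := &instance{name: "dwl-a"}
	b := &instance{name: "dwl-b"}

	// Only one instance hits the failure, which is what raises the alert.
	a.recordFailure()

	lb := &roundRobin{backends: []*instance{a, b}}
	for n := 0; n < 2; n++ {
		inst := lb.pick()
		fmt.Printf("%s ProcessFileFailures=%d\n", inst.name, inst.failures())
	}
}
```

The first simulated dashboard request reports 1 and the second reports 0, although neither counter was ever reset. Checking each instance directly, rather than through the shared hostname, would tell the two cases apart.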
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
Product: Cloud Services → Cloud Services Graveyard