Closed Bug 1188930 Opened 9 years ago Closed 9 years ago

Identify how stats can reset in Heka

Tracking

(Not tracked)

Status:

RESOLVED FIXED

People

(Reporter: mreid, Unassigned)

Details

Mark Reid [:mreid]

Reporter

Description

•

9 years ago

We received an alert that a custom measure called "ProcessFileFailures" had increased from 0 to 1. When we then checked the Heka dashboard UI, that measure had reverted to zero. The code for incrementing the number of failures is: https://github.com/mozilla-services/data-pipeline/blob/master/heka/plugins/s3splitfile/s3splitfile_output.go#L477 As far as I can tell, there is no mechanism in the code for resetting this value, and Heka was not restarted after the alert. Rob, is this expected behaviour? Do you know why it might have reverted to zero?

Mark Reid [:mreid]

Reporter

Updated

•

9 years ago

Flags: needinfo?(rmiller)

Thomas Huelbert

Comment 1

•

9 years ago

May need to change the priority based on the data.

Iteration: --- → 42.3 - Aug 10

Priority: -- → P2

Rob Miller [:rmiller]

Comment 2

•

9 years ago

Yeah, I'm baffled by this one. The behaviour isn't expected, nor do I know how it might have reverted to zero. If someone showed up in IRC w/ this problem, I'd strongly suspect that the process actually HAD been restarted but they somehow didn't know that was the case. That seems less likely here, however, since both you and Trink seem to have checked and verified that it wasn't. :P

Flags: needinfo?(rmiller)

Mike Trinkala [:trink]

Comment 3

•

9 years ago

It looks like there are two dwl instances running, and https://pipeline-prototype-dwl.prod.mozaws.net/#health is landing on the one without the error.

Status: NEW → RESOLVED

Closed: 9 years ago

Resolution: --- → FIXED

BMO Automation

Updated

•

6 years ago

Product: Cloud Services → Cloud Services Graveyard


Categories

(Cloud Services Graveyard :: Metrics: Pipeline, defect, P2)
