Closed Bug 1125088 Opened 10 years ago Closed 10 years ago

Ensure log parser doesn't re-parse the log if another task in the queue has already done so

Categories

(Tree Management :: Treeherder: Data Ingestion, defect, P2)


Tracking

(Not tracked)

RESOLVED WORKSFORME

People

(Reporter: emorley, Unassigned)

References

Details

Similar to the missing pushlog ingestion task, we should also check that we don't unnecessarily repeat the task if we've already performed it since. I.e. in this scenario:

1) Log parse for job X is scheduled, but doesn't complete promptly.
2) A user clicks on job X in the Treeherder UI; the parsed log isn't available, so a high priority on-demand log parsing task is scheduled.
3) Potentially another user does the same as #2 on a different machine.
4) We now (potentially) have multiple duplicate entries in the high priority log parse queue, plus the original normal-priority task.
5) One of the tasks completes, leaving several redundant tasks that should ideally be no-ops.

We may handle this case already - but it's worth checking - since if we don't, it can massively compound a log parser backlog: we end up with hundreds of high priority tasks from people clicking in the UI, which get handled before the actual backlog.
Summary: Check log parsing doesn't do busy work if we've since parsed the log (eg via on demand parsing) → Check log parser doesn't re-parse the log if another task in the queue has already done so
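The scenario above can be sketched as a simple "check before parsing" guard. This is a minimal illustration, not Treeherder's actual code: the `JobLog` class and `parse_log_task` function are hypothetical stand-ins for whatever record tracks a job's parse status.

```python
# Hypothetical sketch: guard a log-parsing task so duplicate queue entries
# become no-ops once any copy of the task has completed.

class JobLog:
    """Minimal stand-in for a job's log record (not Treeherder's model)."""
    def __init__(self):
        self.status = "pending"  # pending -> parsed

    def parse(self):
        # real code would fetch and parse the log here
        self.status = "parsed"


def parse_log_task(job_log, parse_count):
    """Parse the log unless an earlier duplicate task already did."""
    if job_log.status == "parsed":
        return parse_count  # no-op: another task beat us to it
    job_log.parse()
    return parse_count + 1


log = JobLog()
count = 0
# three duplicate tasks queued: the original plus two on-demand high
# priority copies from users clicking in the UI
for _ in range(3):
    count = parse_log_task(log, count)
print(count)  # only the first task does real work
```

With the guard, the redundant high priority tasks drain from the queue almost instantly instead of re-downloading and re-parsing the same log.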
Yes, that's what guarantees the no-op, although there could be a case where the log parser keeps failing and that condition is never met.
But we'll want to retry in many cases, right? And the cases where we shouldn't are bug 1125104 - i.e. don't retry 10 times if we hit an exception that should not be retried. Presuming we're happy with this, we can close this bug now :-)
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → WORKSFORME
Summary: Check log parser doesn't re-parse the log if another task in the queue has already done so → Ensure log parser doesn't re-parse the log if another task in the queue has already done so
Oh, but I see what you mean - even in the "we should retry" case - e.g. an ftp.m.o timeout - we can end up with multiple duplicate tasks in log_parser_hp (plus the original in log_parser), which will _all_ retry 10 times each, hammering ftp.m.o. Plus of course the more obvious case: two tasks racing, both end up running, but both succeed, so at least we don't retry.
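The retry amplification described above can be quantified with a small sketch. The numbers and function names here are illustrative only; the point is that re-checking the "already parsed?" flag before each retry attempt, not just at task start, caps the load on the upstream server.

```python
# Hypothetical sketch of the failure mode: N duplicate tasks each retrying
# up to max_retries against ftp.m.o, versus re-checking a shared "parsed"
# flag before every attempt so later duplicates bail out early.

def attempts_without_guard(num_tasks, max_retries):
    # every duplicate task retries independently against the server
    return num_tasks * max_retries


def attempts_with_guard(num_tasks, max_retries, parsed_after):
    """Re-check the parsed flag before each attempt; once one task
    finally succeeds (after `parsed_after` total attempts here),
    all remaining duplicates become no-ops."""
    attempts = 0
    parsed = False
    for _ in range(num_tasks):
        for _ in range(max_retries):
            if parsed:
                break  # duplicate sees the log is parsed: no-op
            attempts += 1
            if attempts >= parsed_after:
                parsed = True  # this attempt succeeded
                break
    return attempts


print(attempts_without_guard(3, 10))              # 30 hits on ftp.m.o
print(attempts_with_guard(3, 10, parsed_after=4)) # 4 hits
```

So even when both racing tasks would individually succeed, the per-attempt check keeps duplicate retries from multiplying the load during an upstream outage.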