Closed Bug 1198273 Opened 9 years ago Closed 9 years ago

Have metrics for scheduling

Categories

(Testing :: General, defect)

Not set
normal

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: armenzg, Unassigned)

References

(Blocks 1 open bug)

Details

I would like to analyze the effects of automatic backfilling and SETA.
Please do explain more.
We currently don't have a tool that allows us to see:
1) how many jobs were scheduled on a push
2) how many of those were because of a user's request
3) how many of those were because of automatic scheduling

We could break this down per pool or per builder. We would easily be able to see what percentage of all jobs is due to automatic backfilling, and how many pushes get the full set of jobs (no SETA applied).
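As a rough illustration of the three counts listed above, here is a minimal Python sketch. The record layout (the `push_id`, `builder`, and `reason` fields) and the reason labels are assumptions made up for this example; they are not a real Treeherder or buildbot schema.

```python
from collections import Counter, defaultdict

# Hypothetical job records. The field names and the "reason" values
# ("scheduler", "user", "backfill") are assumptions for illustration only.
jobs = [
    {"push_id": "a1", "builder": "win7-opt-mochitest-1", "reason": "scheduler"},
    {"push_id": "a1", "builder": "win7-opt-mochitest-2", "reason": "user"},
    {"push_id": "a1", "builder": "linux64-opt-xpcshell", "reason": "backfill"},
    {"push_id": "b2", "builder": "win7-opt-mochitest-1", "reason": "scheduler"},
]


def jobs_per_push(records):
    """Return {push_id: Counter({reason: job count})}."""
    counts = defaultdict(Counter)
    for job in records:
        counts[job["push_id"]][job["reason"]] += 1
    return counts


def share_by_reason(records, reason):
    """Return the percentage of all jobs that were scheduled for `reason`."""
    if not records:
        return 0.0
    matching = sum(1 for job in records if job["reason"] == reason)
    return 100.0 * matching / len(records)


per_push = jobs_per_push(jobs)
total_for_a1 = sum(per_push["a1"].values())
backfill_share = share_by_reason(jobs, "backfill")
```

Grouping by `builder` (or by a pool name, if one were recorded) instead of `push_id` would give the per-builder or per-pool breakdown.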
I would like to see:
* number of coalesced jobs for a push; avg/push
* per platform number of pushes we can handle (avg/platform)
* breakdown of periodic jobs (nightly, pgo)

If we can derive from this capacity planning for hardware pools, then we can more accurately model impact on our infrastructure. While we don't spread the load on all machines perfectly, we can do theoretical models.
(In reply to Joel Maher (:jmaher) from comment #3)
> I would like to see:
> * number of coalesced jobs for a push; avg/push
> * per platform number of pushes we can handle (avg/platform)
> * breakdown of periodic jobs (nightly, pgo)
>
> If we can derive from this capacity planning for hardware pools, then we can
> more accurately model impact on our infrastructure. While we don't spread
> the load on all machines perfectly, we can do theoretical models.

To make this work, we'd also need to track the amount of machine time we spend per push, per branch, per platform. That is, on win7-opt, we might spend 30 hours of machine time per push on mozilla-central, but 20 hours per push on mozilla-inbound (due to SETA), on average. This number varies over time as we change test jobs, and Firefox and/or the infra speeds up (or slows down).

If we tracked this, we could realistically start to predict the impacts of adding new test jobs, increasing the size of our hardware pools, etc.

There are a lot of edge cases that affect this number, like retriggers and cancellations, and try runs are a big wildcard, but I think if we looked at averages over time we could come up with a pretty useful set of numbers. We should be able to pull this out of Treeherder, I believe.
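The machine-time figure described in this comment could be computed along these lines. This is only a sketch under stated assumptions: the record fields (`branch`, `platform`, `push_id`, `duration_s`) and the sample durations are invented for the example and do not come from Treeherder.

```python
from collections import defaultdict

# Hypothetical job records; field names and durations are assumptions.
jobs = [
    {"branch": "mozilla-central", "platform": "win7-opt", "push_id": "c1", "duration_s": 7200},
    {"branch": "mozilla-central", "platform": "win7-opt", "push_id": "c1", "duration_s": 3600},
    {"branch": "mozilla-central", "platform": "win7-opt", "push_id": "c2", "duration_s": 3600},
    {"branch": "mozilla-inbound", "platform": "win7-opt", "push_id": "i1", "duration_s": 1800},
]


def avg_machine_hours_per_push(records):
    """Return {(branch, platform): average machine hours spent per push}."""
    seconds = defaultdict(float)  # total job time per (branch, platform)
    pushes = defaultdict(set)     # distinct pushes seen per (branch, platform)
    for job in records:
        key = (job["branch"], job["platform"])
        seconds[key] += job["duration_s"]
        pushes[key].add(job["push_id"])
    return {key: seconds[key] / 3600.0 / len(pushes[key]) for key in seconds}


averages = avg_machine_hours_per_push(jobs)
```

Retriggers and cancelled jobs are counted like any other job here; a real analysis would need to decide how to treat them, and whether to exclude try pushes.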
I believe ouija could generate this very quickly :)
I hear we pull this data out of Treeherder; however, I have found TH pretty unreliable with respect to showing more jobs than there are or fewer jobs than there should be. Until the buildbot jobs are run through TC we won't have accurate data (though waiting for perfect data should not hold this up). I'm afraid the scope of what you're mentioning is way bigger than what I had in mind.
Some of my personal notes: https://etherpad.mozilla.org/scheduling-metrics-bug1198273 We're considering this to be part of an Outreachy internship.
I'm going to close a bunch of bugs and start talking with the sheriffs about what specifically they need, instead of coming up with what I think is needed.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INVALID