Closed Bug 1642985 Opened 4 years ago Closed 4 years ago

Improve 'disperse' optimization so tasks are spread across configurations more evenly

Categories

(Firefox Build System :: Task Configuration, defect, P3)

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: sg, Unassigned)

References

(Blocks 1 open bug)

Details

On the mach try auto push https://treeherder.mozilla.org/#/jobs?repo=try&revision=dff104d2242c06b6e11d762c810317f8c4e77afe&selectedTaskRun=RvFYemYqRKqVx2QM858_6Q-0, the selection of tests looks inconsistent:

Some mochitest jobs were run on Linux WebRender and Android, but none on Windows or OS X.

Interestingly, I also tried running mach try syntax, which didn't run any mochitests at all:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ef52fd438a716f28606c1217e4688aba143a3f34

Summary: Inconsistent selection of tests → Inconsistent selection of mochitests with mach try auto

FWIW, when patches changing the same files landed previously (before being backed out), mochitests were triggered ONLY on Windows: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception&revision=052839fb9b56060e603e4fada1c8f7e56df4ae0f

Thanks for filing. We're handling platforms on autoland and try differently (the former is chosen by the service, the latter attempts to disperse manifests across platforms). So the differences between try <-> autoland are somewhat expected.

Unfortunately, I think what you're seeing is due to a limitation in the "disperse" algorithm. It tries to set a cap on the number of configurations a manifest can run on (the cap depends on how "important" the manifest is). The Linux configurations are being processed first, and are likely eating up the manifests' quota before we get around to Windows.
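
To illustrate, here is a minimal sketch of how that starvation can happen. It assumes a simplified model with a constant cap per manifest and configurations visited in sorted order. The function name `schedule_with_cap`, the configuration strings, and the manifest path are made up for the example and are not taken from bugbug.py:

```python
from collections import defaultdict


def schedule_with_cap(tasks, cap):
    """Keep a (config, manifest) task only while the manifest is under its cap.

    tasks: iterable of (config, manifest) pairs, in processing order.
    cap: maximum number of configurations per manifest (assumed constant
         here; the bug says the real cap depends on manifest importance).
    """
    seen = defaultdict(set)  # manifest -> configurations already scheduled
    kept = []
    for config, manifest in tasks:
        if len(seen[manifest]) < cap:
            seen[manifest].add(config)
            kept.append((config, manifest))
    return kept


# Hypothetical configuration labels, for illustration only.
configs = [
    "windows10-64/opt",
    "linux1804-64/opt",
    "linux1804-64/debug",
    "linux1804-64-qr/opt",
]
# Sorting puts every Linux configuration ahead of Windows.
tasks = [(config, "dom/tests/mochitest.ini") for config in sorted(configs)]
kept = schedule_with_cap(tasks, cap=2)
```

In this model both slots are taken by Linux configurations, so the Windows task is dropped even though the manifest has never run on Windows.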

We could probably do this a bit better by keeping track not only of "seen" configurations, but also of "seen" operating systems in general. If a manifest has already been scheduled on a given OS family, its "quota" is reduced. If it hasn't, the quota is increased.

Code is here if you are curious:
https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/optimize/bugbug.py#119
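
As a rough sketch of the OS-family idea (not the code linked above), the quota could be adjusted per task depending on whether the manifest has already been scheduled on that OS family. Everything here is an assumption made for illustration: the `FAMILIES` prefixes, the `bonus` and `penalty` parameters, the configuration strings, and the manifest path.

```python
from collections import defaultdict

# Assumption: the OS family can be read off the configuration name's prefix.
FAMILIES = ("linux", "windows", "macosx", "android")


def os_family(config):
    """Return the (hypothetical) OS family for a configuration label."""
    for family in FAMILIES:
        if config.startswith(family):
            return family
    return "other"


def schedule_os_aware(tasks, base_cap, bonus=1, penalty=1):
    """Cap configurations per manifest, favouring OS families not yet seen."""
    seen_configs = defaultdict(set)   # manifest -> scheduled configurations
    seen_families = defaultdict(set)  # manifest -> scheduled OS families
    kept = []
    for config, manifest in tasks:
        family = os_family(config)
        if family in seen_families[manifest]:
            cap = base_cap - penalty  # family already covered: lower quota
        else:
            cap = base_cap + bonus    # new family: raise quota
        if len(seen_configs[manifest]) < cap:
            seen_configs[manifest].add(config)
            seen_families[manifest].add(family)
            kept.append((config, manifest))
    return kept


# Hypothetical configuration labels, for illustration only.
configs = [
    "linux1804-64-qr/opt",
    "linux1804-64/debug",
    "linux1804-64/opt",
    "macosx1014-64/opt",
    "windows10-64/opt",
]
tasks = [(config, "dom/tests/mochitest.ini") for config in sorted(configs)]
kept = schedule_os_aware(tasks, base_cap=2)
kept_families = {os_family(config) for config, _ in kept}
```

With these inputs, one configuration per OS family is kept instead of several Linux ones. The exact bonus and penalty values would need tuning against real scheduling data.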

Severity: -- → S3
Component: Try → Task Configuration
Priority: -- → P3
Summary: Inconsistent selection of mochitests with mach try auto → Improve 'disperse' optimization so tasks are spread across configurations more evenly

I'm working on smarter platform selection for manifests, so this should be fixed by that.
If it works, we can drop the disperse algorithm and close this as WONTFIX. If it doesn't work, we can come back to this bug.

(In reply to Marco Castelluccio [:marco] from comment #3)

> I'm working on smarter platform selection for manifests, so this should be fixed by that.

Can you refer me to the bug tracking that?

Flags: needinfo?(mcastelluccio)

Here it is: bug 1639164.

Flags: needinfo?(mcastelluccio)

Marking this as WONTFIX as bug 1639164 is about to land.

Status: NEW → RESOLVED
Closed: 4 years ago
Resolution: --- → WONTFIX