Improve 'disperse' optimization so tasks are spread across configurations more evenly
Categories
(Firefox Build System :: Task Configuration, defect, P3)
Tracking
(Not tracked)
People
(Reporter: sg, Unassigned)
References
(Blocks 1 open bug)
Details
On the mach try auto push https://treeherder.mozilla.org/#/jobs?repo=try&revision=dff104d2242c06b6e11d762c810317f8c4e77afe&selectedTaskRun=RvFYemYqRKqVx2QM858_6Q-0, the selection of tests looks inconsistent:
Some mochitest jobs were run on Linux WebRender and Android, but none on Windows or OS X.
Interestingly, I also tried running mach try syntax, which didn't run any mochitests at all:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ef52fd438a716f28606c1217e4688aba143a3f34
Reporter
Comment 1•4 years ago
FWIW, when patches changing the same files landed earlier (before being backed out), mochitests were triggered ONLY on Windows: https://treeherder.mozilla.org/#/jobs?repo=autoland&resultStatus=success%2Ctestfailed%2Cbusted%2Cexception&revision=052839fb9b56060e603e4fada1c8f7e56df4ae0f
Comment 2•4 years ago
Thanks for filing. We're handling platforms on autoland and try differently (the former is chosen by the service, the latter attempts to disperse manifests across platforms). So the differences between try <-> autoland are somewhat expected.
Unfortunately, I think what you're seeing is due to a limitation in the "disperse" algorithm. It tries to cap the number of configurations a manifest can run on (the cap depends on how "important" the manifest is). The Linux configurations are processed first, and are likely using up each manifest's quota before we get around to Windows.
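To make the ordering problem concrete, here is a minimal sketch of a per-manifest cap. It is not the actual bugbug.py code: the function name, the configuration labels, the manifest path and the cap value are all made up for illustration.

```python
from collections import Counter


def disperse_naive(tasks, cap):
    """Keep a task only while its manifest is still under the cap.

    `tasks` is a list of (configuration, manifest) pairs in processing
    order. Hypothetical helper, not the real optimization strategy.
    """
    seen = Counter()  # manifest -> number of configurations scheduled so far
    kept = []
    for config, manifest in tasks:
        if seen[manifest] < cap:
            seen[manifest] += 1
            kept.append((config, manifest))
    return kept


# Illustrative labels only; Linux happens to come first in the list.
MANIFEST = "example/mochitest.ini"
tasks = [
    ("linux1804-64/opt", MANIFEST),
    ("linux1804-64-qr/opt", MANIFEST),
    ("android-em-7.0-x86_64/opt", MANIFEST),
    ("windows10-64/opt", MANIFEST),
    ("macosx1014-64/opt", MANIFEST),
]

kept_naive = disperse_naive(tasks, cap=2)
print([config for config, _ in kept_naive])
```

With a cap of 2, both slots go to the two Linux configurations, so Windows and OS X never get the manifest. The result depends entirely on the order in which configurations are visited.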
We could probably do a bit better by keeping track not only of "seen" configurations, but also of "seen" operating systems in general. If a manifest has already been scheduled on a given OS family, its quota is reduced. If it hasn't, its quota is increased.
Code is here if you are curious:
https://searchfox.org/mozilla-central/source/taskcluster/taskgraph/optimize/bugbug.py#119
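For illustration only, the OS-family idea could look roughly like the sketch below. This is not what bugbug.py does today; `os_family`, `disperse_by_family`, the `adjust` parameter, the prefix-based family detection and the configuration labels are all assumptions made up for the example.

```python
from collections import Counter, defaultdict

# Hypothetical OS families, detected from the configuration label prefix.
FAMILIES = ("linux", "android", "windows", "macosx")


def os_family(config):
    """Map a configuration label to an OS family (assumed naming scheme)."""
    for family in FAMILIES:
        if config.startswith(family):
            return family
    return "other"


def disperse_by_family(tasks, cap, adjust=1):
    """Per-manifest cap whose quota depends on whether the OS family is new.

    `tasks` is a list of (configuration, manifest) pairs in processing
    order. The quota shrinks by `adjust` if the manifest already ran on
    this OS family and grows by `adjust` if it has not.
    """
    seen_count = Counter()            # manifest -> tasks scheduled so far
    seen_families = defaultdict(set)  # manifest -> OS families scheduled
    kept = []
    for config, manifest in tasks:
        family = os_family(config)
        if family in seen_families[manifest]:
            quota = cap - adjust
        else:
            quota = cap + adjust
        if seen_count[manifest] < quota:
            seen_count[manifest] += 1
            seen_families[manifest].add(family)
            kept.append((config, manifest))
    return kept


MANIFEST = "example/mochitest.ini"  # made-up manifest path
family_tasks = [
    ("linux1804-64/opt", MANIFEST),
    ("linux1804-64-qr/opt", MANIFEST),
    ("android-em-7.0-x86_64/opt", MANIFEST),
    ("windows10-64/opt", MANIFEST),
    ("macosx1014-64/opt", MANIFEST),
]

kept_by_family = disperse_by_family(family_tasks, cap=2, adjust=1)
print([config for config, _ in kept_by_family])
```

In this example the second Linux configuration is skipped because Linux was already covered, which leaves room for Android and Windows. OS X is still dropped because it is visited last, so a real fix would also need to think about ordering or the size of the adjustment.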
Comment 3•4 years ago
I'm working on smarter platform selection for manifests, so this should be fixed by that.
If it works, we can drop the disperse algorithm and close this as WONTFIX. If it doesn't work, we can come back to this bug.
Reporter
Comment 4•4 years ago
(In reply to Marco Castelluccio [:marco] from comment #3)
> I'm working on smarter platform selection for manifests, so this should be fixed by that.
Can you refer me to the bug tracking that?
Comment 6•4 years ago
Marking this as WONTFIX as bug 1639164 is about to land.