Closed Bug 1352676 Opened 8 years ago Closed 8 years ago

Windows 7 taskcluster tests don't run/start, get killed after one day

Categories

(Release Engineering :: General, enhancement)

Type: enhancement
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: aryx, Unassigned)


Details

(Whiteboard: [stockwell infra])

I believe this is because these tests require the GPU instances, and AWS has a limited pool of them available. Maybe we have our own restrictions, and there is a danger of us outbidding ourselves. If that is the case, maybe we can run the GPU jobs on central only.
Yes, we're throttling in the provisioner. Worker type gecko-t-win7-32-gpu, which uses the pricier g2.2xlarge instance, is currently limited to 48 instances and is showing 5k+ pending tasks. :garndt, how do you feel about allowing more of this instance type?
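To see why a 5k+ backlog does not shrink under a 48-instance cap, here is a minimal sketch of a capped provisioner. It is a hypothetical, simplified model, not the aws-provisioner's actual scaling algorithm: the function name `instances_to_request` and the `tasks_per_instance` parameter are assumptions made for illustration.

```python
def instances_to_request(pending_tasks: int, running_instances: int,
                         max_capacity: int, tasks_per_instance: int = 1) -> int:
    """Hypothetical simplified model of a capped provisioner.

    Returns how many new instances would be requested. This is NOT the
    real aws-provisioner algorithm, just an illustration of a hard cap.
    """
    # Instances needed to cover the pending backlog (ceiling division).
    wanted = -(-pending_tasks // tasks_per_instance)
    # The cap only leaves room for this many additional instances.
    headroom = max(0, max_capacity - running_instances)
    return min(wanted, headroom)


# With 48 instances already running under a cap of 48, nothing new starts.
print(instances_to_request(5000, 48, 48))
# Doubling the cap to 96 frees room for 48 more instances.
print(instances_to_request(5000, 48, 96))
```

In this model the backlog size stops mattering once every allowed instance is running: the headroom is zero, so the only lever is the cap itself, which is why the thread turns to raising it.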
Flags: needinfo?(rthijssen) → needinfo?(garndt)
Yeah, we should absolutely be adding more capacity to account for the additional things being enabled. Where to set that upper limit is a balance of worker efficiency, cost, and bidding competition. The last one is what worries me, and I don't know enough about which GPU instances we provision for the releng services. We'll need to be careful bumping this up and balance it against the instances releng needs, so that we don't get into a bidding war.
Flags: needinfo?(garndt)
For some reason, I can no longer change provisioner settings in tools.taskcluster.net. If someone has access to https://tools.taskcluster.net/aws-provisioner/#gecko-t-win7-32-gpu/ please raise the max capacity to 96 (double the current limit).
Never mind, login sorted and limit raised.
If we see buildbot/build-cloud-tools not getting enough instances, we can simply remove us-east-1 and us-west-2 from the gecko-t-win7-32-gpu provisioner config and force TC builds into us-west-1 and eu-central-1, where buildbot doesn't operate. I hope I'm correct in thinking this would be an unlikely scenario (as discussed in bug 1318648).
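The region fallback described above amounts to filtering the region list in the worker-type definition. The sketch below assumes a trimmed-down definition shape (`workerType`, `maxCapacity`, and a `regions` list of objects with a `region` key) and a hypothetical helper `drop_regions`; the real aws-provisioner schema has more fields and its exact layout is not confirmed here.

```python
import copy

# Hypothetical, trimmed-down worker-type definition. The field layout is an
# assumption for illustration, not the exact aws-provisioner schema.
WORKER_TYPE = {
    "workerType": "gecko-t-win7-32-gpu",
    "maxCapacity": 96,
    "regions": [
        {"region": "us-east-1"},
        {"region": "us-west-1"},
        {"region": "us-west-2"},
        {"region": "eu-central-1"},
    ],
}


def drop_regions(definition: dict, excluded: set) -> dict:
    """Return a copy of the definition with the excluded regions removed."""
    updated = copy.deepcopy(definition)  # leave the original untouched
    updated["regions"] = [
        r for r in updated["regions"] if r["region"] not in excluded
    ]
    return updated


# Keep TC Windows 7 GPU workers out of the regions buildbot also bids in.
tc_only = drop_regions(WORKER_TYPE, {"us-east-1", "us-west-2"})
print([r["region"] for r in tc_only["regions"]])
```

Working on a deep copy means the edited definition can be reviewed and diffed against the current one before it is submitted to the provisioner.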
Pending count is down to 0.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Whiteboard: [stockwell infra]
Component: General Automation → General