Bug 1352676
Opened 8 years ago
Closed 8 years ago
Windows 7 taskcluster tests don't run/start, get killed after one day
Categories
(Release Engineering :: General, enhancement)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: aryx, Unassigned)
References
Details
(Whiteboard: [stockwell infra])
Windows 7 taskcluster tests added in bug 1351272 are shown as pending and never start, but get killed after one day, e.g. https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=3364cc17988c013c36f2a8123315db2855393011&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=retry&filter-resultStatus=usercancel&filter-resultStatus=runnable
Trees stay open because these are Tier 2.
Flags: needinfo?(rthijssen)
Comment 1•8 years ago
I believe this is because these require the gpu instances and AWS has a limited pool available. Maybe we have our own restrictions and there is a danger of us outbidding ourselves. If this is the case, maybe we can run the gpu jobs on central only.
Comment 2•8 years ago
Yes, we're throttling in the provisioner. Currently limited to 48 instances and showing 5k+ pending for worker type gecko-t-win7-32-gpu which is the pricier g2.2xlarge.
:garndt: how do you feel about allowing more of this instance type?
Flags: needinfo?(rthijssen) → needinfo?(garndt)
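A rough back-of-the-envelope sketch of why a capped pool and a large backlog can leave tasks pending until they expire. The instance limit and pending count come from the comment above; the average task duration and the one-task-per-instance model are hypothetical assumptions made only for illustration, not measured values.

```python
import math


def hours_to_drain(pending_tasks, max_instances, avg_task_minutes):
    """Rough time to clear a backlog, assuming each instance runs one task
    at a time, back to back, with no new tasks arriving (a simplification)."""
    if max_instances <= 0:
        raise ValueError("max_instances must be positive")
    # With one task per instance, the backlog is worked off in "waves".
    waves = math.ceil(pending_tasks / max_instances)
    return waves * avg_task_minutes / 60


# Hypothetical average task duration; not taken from this bug.
ASSUMED_AVG_TASK_MINUTES = 30

for limit in (48, 96):
    estimate = hours_to_drain(5000, limit, ASSUMED_AVG_TASK_MINUTES)
    print(f"limit={limit}: roughly {estimate} hours to drain 5000 pending tasks")
```

Under this simplified model, doubling the instance limit roughly halves the drain time; real numbers depend on actual task durations and on how many new tasks keep arriving.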
Comment 3•8 years ago
Yes, we should absolutely be adding more capacity to account for the additional things being enabled. Where that upper limit ends up is really a balance of worker efficiency, cost, and bidding competition. The last one is the one that worries me, and I do not know enough about which gpu instances we provision for the releng services. We'll need to be careful bumping this up and balancing it with the instances releng needs so we do not get into a bid war.
Flags: needinfo?(garndt)
Comment hidden (Intermittent Failures Robot)
Comment 5•8 years ago
for some reason, i can no longer change provisioner settings in tools.taskcluster.net. if someone has access to https://tools.taskcluster.net/aws-provisioner/#gecko-t-win7-32-gpu/ please up the max capacity to 96 (double the current limit).
Comment 6•8 years ago
nm, login sorted and limit upped.
Comment 7•8 years ago
if we see problems in buildbot/build-cloud-tools not having enough instances available, we can simply remove us-east-1 and us-west-2 from the gecko-t-win7-32-gpu provisioner config and force tc builds into us-west-1 and eu-central-1 where buildbot doesn't operate. i hope that i'm correct in thinking this would be an unlikely scenario (as discussed in bug 1318648).
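The region fallback described above could be sketched roughly as follows. The dictionary shape here is a hypothetical, heavily simplified stand-in for a worker type definition (the field names `regions`, `region`, and `maxCapacity` are assumptions for illustration), and `restrict_regions` is a made-up helper, not part of any Taskcluster tool.

```python
import copy


def restrict_regions(worker_type, excluded):
    """Return a copy of the worker type config without the excluded regions."""
    updated = copy.deepcopy(worker_type)
    updated["regions"] = [
        entry for entry in updated["regions"] if entry["region"] not in excluded
    ]
    # Guard against accidentally leaving the worker type nowhere to run.
    if not updated["regions"]:
        raise ValueError("refusing to produce a worker type with no regions")
    return updated


# Hypothetical, simplified config; a real definition carries far more detail.
worker_type = {
    "workerType": "gecko-t-win7-32-gpu",
    "maxCapacity": 96,
    "regions": [
        {"region": "us-east-1"},
        {"region": "us-west-1"},
        {"region": "us-west-2"},
        {"region": "eu-central-1"},
    ],
}

# Drop the regions shared with buildbot, per the comment above.
restricted = restrict_regions(worker_type, {"us-east-1", "us-west-2"})
print([entry["region"] for entry in restricted["regions"]])
```

Working on a copy keeps the original definition available, so the change is easy to revert if the shared regions turn out not to be a problem.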
Comment 8•8 years ago
pending count is down to 0.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated•8 years ago
Whiteboard: [stockwell infra]
Updated•7 years ago
Component: General Automation → General