Closed Bug 1198317 Opened 9 years ago Closed 9 years ago

reduce the number of available b-2008-ix instances in TRY in order to force y-2008-spot instantiation

Categories

(Infrastructure & Operations :: RelOps: Puppet, task)

Platform: x86_64, Windows Server 2008
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: grenade, Assigned: grenade)

References

Details

(Whiteboard: [windows][aws])

Attachments

(1 file)

disabled instances: b-2008-ix-0036 b-2008-ix-0039 b-2008-ix-0019 b-2008-ix-0038 b-2008-ix-0025 b-2008-ix-0054 b-2008-ix-0022 b-2008-ix-0058 b-2008-ix-0026 b-2008-ix-0059
disabled instances extended to include: b-2008-ix-0030 b-2008-ix-0174 b-2008-ix-0057 b-2008-ix-0023 b-2008-ix-0047 b-2008-ix-0046 b-2008-ix-0041 b-2008-ix-0044 b-2008-ix-0061 b-2008-ix-0055 b-2008-ix-0043 b-2008-ix-0049 b-2008-ix-0035 b-2008-ix-0029 b-2008-ix-0031 b-2008-ix-0062 b-2008-ix-0045
all machines returned to pool. will disable more tomorrow.
progress:
- reduced ix capacity to a single instance (b-2008-ix-0043)
- pushed win32, win64 m-c build to try (https://treeherder.mozilla.org/#/jobs?repo=try&revision=272cab1322fc)
- observed messages in the watch pending log indicating our max bid price (0.4) would not be successful
- updated max bid price for y-2008 to 0.5 (https://github.com/mozilla/build-cloud-tools/pull/109)
- observed successful spot requests in the ec2 console (3 for use1, 3 for usw2, as expected/configured in slavealloc)
- observed spot instances starting, successfully running userdata, naming themselves and mailing logs
- now awaiting build output at https://ftp-ssl.mozilla.org/pub/mozilla.org/firefox/try-builds/rthijssen@mozilla.com-272cab1322fc
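A rough illustration of why the bid bump mattered: under the 2015-era EC2 spot model, a spot request is only fulfilled while the max bid is at or above the current spot price in that region (treating a tie as a win is a simplification here). The function name and the prices below are hypothetical, chosen only to show a bid of 0.4 losing and 0.5 winning; they are not values observed in this bug.

```python
from typing import Dict, List


def winning_regions(max_bid: float, spot_prices: Dict[str, float]) -> List[str]:
    """Return the regions whose current spot price does not exceed max_bid.

    Simplified model: a request is fulfilled while bid >= market price.
    """
    return sorted(region for region, price in spot_prices.items() if max_bid >= price)


# Hypothetical market prices, for illustration only (not observed values).
prices = {"us-east-1": 0.45, "us-west-2": 0.45}

print(winning_regions(0.4, prices))  # []
print(winning_regions(0.5, prices))  # ['us-east-1', 'us-west-2']
```

In practice the watch pending log and the ec2 console are the authoritative signals; this only models the comparison they reflect.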
Attached image spot-cltbld.png (deleted)
the us-east-1 instances appear to have hung mid-build. RDPing to the instances (001 - 003) as cltbld shows this cmd prompt, which is running but apparently going nowhere.
Attachment #8653370 - Flags: feedback?(mcornmesser)
the us-west-2 instances have all terminated. I cannot find any evidence that they did any work before terminating (slave_health/treeherder). The PaperTrail logs end like this:

Aug 27 02:09:48 y-2008-spot-101.try.releng.usw2.mozilla.com USER32: The process c:\windows\SysWOW64\shutdown.exe (Y-2008-SPOT-101) has initiated the shutdown of computer Y-2008-SPOT-101 on behalf of user Y-2008-SPOT-101\cltbld for the following reason: No title for this reason could be found Reason Code: 0x800000ff Shutdown Type: shutdown Comment: #015
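To pull the useful fields out of a USER32 shutdown event like the one quoted above, a minimal sketch could parse the line and split the reason code into its parts. It assumes the bit layout from Microsoft's shutdown reason code documentation (top bit = planned flag, bits 16-23 = major reason, low 16 bits = minor reason); under that reading, 0x800000ff is a planned shutdown with major reason 0 (SHTDN_REASON_MAJOR_OTHER) and minor reason 0xff (SHTDN_REASON_MINOR_NONE), i.e. a deliberate shutdown.exe call as cltbld rather than a crash. The regex and the `parse_shutdown` helper are hypothetical and only target this one log format.

```python
import re

# The PaperTrail line quoted above, verbatim.
LOG_LINE = (
    "Aug 27 02:09:48 y-2008-spot-101.try.releng.usw2.mozilla.com USER32: "
    "The process c:\\windows\\SysWOW64\\shutdown.exe (Y-2008-SPOT-101) has initiated "
    "the shutdown of computer Y-2008-SPOT-101 on behalf of user Y-2008-SPOT-101\\cltbld "
    "for the following reason: No title for this reason could be found "
    "Reason Code: 0x800000ff Shutdown Type: shutdown Comment: #015"
)

# Assumed reason-code bit layout (see lead-in).
FLAG_PLANNED = 0x80000000
MAJOR_MASK = 0x00FF0000
MINOR_MASK = 0x0000FFFF

# Hypothetical pattern for this single USER32 shutdown message format.
PATTERN = re.compile(
    r"^(?P<ts>\w{3} +\d+ [\d:]{8}) (?P<host>\S+) USER32: The process (?P<process>\S+) "
    r".*?on behalf of user (?P<user>\S+) .*?Reason Code: (?P<code>0x[0-9a-fA-F]+) "
    r"Shutdown Type: (?P<type>\w+)"
)


def parse_shutdown(line: str) -> dict:
    """Extract host, process, user and decoded reason code from a shutdown event."""
    m = PATTERN.search(line)
    if m is None:
        raise ValueError("not a USER32 shutdown event")
    code = int(m.group("code"), 16)
    return {
        "host": m.group("host"),
        "process": m.group("process"),
        "user": m.group("user"),
        "type": m.group("type"),
        "planned": bool(code & FLAG_PLANNED),
        "major": (code & MAJOR_MASK) >> 16,
        "minor": code & MINOR_MASK,
    }


event = parse_shutdown(LOG_LINE)
print(event["host"], event["user"], event["planned"], hex(event["major"]), hex(event["minor"]))
```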
I think we've demonstrated that the spinning up and terminating processes work. We obviously have work to do to get mozilla-build's undies untwisted, but that's another bug...
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED
There are alerts in #buildduty that indicate a buildbot misconfiguration or missing configuration:

[sns alert] Thu 06:08:03 PDT buildbot-master78.bb.releng.usw2.mozilla.com watch_twistd_log.py: Count: 675 | First instance: 2015-08-27 05:28:27-0700 | Most recent instance: 2015-08-27 06:00:02-0700 | Twistd exception: twisted.cred.error.UnauthorizedLogin - unknown 10.132.67.67
[sns alert] Thu 06:08:03 PDT buildbot-master78.bb.releng.usw2.mozilla.com watch_twistd_log.py: Count: 681 | First instance: 2015-08-27 05:28:27-0700 | Most recent instance: 2015-08-27 06:00:01-0700 | Twistd exception: twisted.cred.error.UnauthorizedLogin - unknown 10.132.67.101

I've verified that those are Windows spot instances: 10.132.67.67 (y-2008-spot-103) and 10.132.67.101 (y-2008-spot-102).
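When several of these watch_twistd_log.py alerts pile up, a small script can tally which IPs the master is rejecting. This is a minimal sketch that assumes only the alert format quoted above; `rejected_logins` is a hypothetical helper, and the IP-to-host mapping is the one verified in this comment, not a lookup against any inventory.

```python
import re
from typing import Dict, List

# The two [sns alert] lines quoted above.
ALERTS = [
    "[sns alert] Thu 06:08:03 PDT buildbot-master78.bb.releng.usw2.mozilla.com "
    "watch_twistd_log.py: Count: 675 | First instance: 2015-08-27 05:28:27-0700 | "
    "Most recent instance: 2015-08-27 06:00:02-0700 | Twistd exception: "
    "twisted.cred.error.UnauthorizedLogin - unknown 10.132.67.67",
    "[sns alert] Thu 06:08:03 PDT buildbot-master78.bb.releng.usw2.mozilla.com "
    "watch_twistd_log.py: Count: 681 | First instance: 2015-08-27 05:28:27-0700 | "
    "Most recent instance: 2015-08-27 06:00:01-0700 | Twistd exception: "
    "twisted.cred.error.UnauthorizedLogin - unknown 10.132.67.101",
]

# IP-to-host mapping as verified in this comment.
KNOWN_HOSTS = {"10.132.67.67": "y-2008-spot-103", "10.132.67.101": "y-2008-spot-102"}

PATTERN = re.compile(
    r"Count: (?P<count>\d+) \|.*UnauthorizedLogin - unknown (?P<ip>\d+(?:\.\d+){3})"
)


def rejected_logins(alerts: List[str]) -> Dict[str, int]:
    """Map each rejected IP to the Count reported in its alert line."""
    result: Dict[str, int] = {}
    for line in alerts:
        m = PATTERN.search(line)
        if m:
            result[m.group("ip")] = int(m.group("count"))
    return result


for ip, count in rejected_logins(ALERTS).items():
    print(ip, KNOWN_HOSTS.get(ip, "unknown"), count)
```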
All of the alerts were for use1 IPs; I didn't see any for usw2.