Closed
Bug 942498
Opened 11 years ago
Closed 11 years ago
Use spot instances for try builds
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: catlee, Assigned: rail)
Attachments
(4 files)
(deleted), patch | catlee: review+ | rail: checked-in+
(deleted), text/plain
(deleted), text/plain
(deleted), patch | dustin: review+ | rail: checked-in+
We should be able to use spot instances pretty easily for try builds. Try builds are always clobber builds, so as long as we have a semi-recent local copy of the try repo, builds on fresh spot instances shouldn't be slower than builds on regular in-house or ondemand instances.
Updated • Assignee • 11 years ago
Assignee: nobody → rail
Comment 1 • Assignee • 11 years ago
Attachment #8346212 - Flags: review?(catlee)
Updated • Reporter • 11 years ago
Attachment #8346212 - Flags: review?(catlee) → review+
Comment 2 • Assignee • 11 years ago
Comment on attachment 8346212 (configs)
https://hg.mozilla.org/build/buildbot-configs/rev/550521109f80
Attachment #8346212 - Flags: checked-in+
Comment 3 • Assignee • 11 years ago
Comment 4 • Assignee • 11 years ago
Comment 5 • Assignee • 11 years ago
Comment 6 • Assignee • 11 years ago
I extended subnets in AWS to use the whole block of 10.DC.64.0/22 and added missing SOA entries to the inventory.
Comment 7 • Assignee • 11 years ago
I have had 2 spot instances running in the staging environment since yesterday. If I don't find any major issues, I'd like to bring some into production early next week.
Comment 8 • Assignee • 11 years ago
The try subnet is actually 10.DC.64.0/22.
Attachment #8348247 - Flags: review?(dustin)
Updated • 11 years ago
Attachment #8348247 - Flags: review?(dustin) → review+
Comment 9 • Assignee • 11 years ago
Comment on attachment 8348247 ([puppet] allowed IPs)
remote: https://hg.mozilla.org/build/puppet/rev/48e532b1967f
remote: https://hg.mozilla.org/build/puppet/rev/6b45823f91df
Attachment #8348247 - Flags: checked-in+
Comment 10 • Assignee • 11 years ago
Dec 16, Mon
total jobs, jobs on spots, spot retries
528, 14 (2%), 9 (64%)
The retry ratio is very high here. Since build jobs take much longer, the probability of an instance being killed mid-job is much higher. I'll try to bump the price a bit and see the difference.
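The reasoning that longer jobs are more likely to be killed can be illustrated with a small sketch. It assumes, purely for illustration, that spot terminations arrive at a constant rate (a Poisson process); neither this model nor the rate used below comes from this bug, and the function name is hypothetical.

```python
import math


def interruption_probability(rate_per_hour, job_hours):
    """Chance that at least one termination hits a job of the given length.

    Assumes terminations follow a Poisson process with a constant rate,
    which is an illustrative simplification, not measured behaviour.
    """
    return 1.0 - math.exp(-rate_per_hour * job_hours)


# Hypothetical rate: 0.1 terminations per instance-hour.
ASSUMED_RATE = 0.1

for hours in (0.25, 1.0, 2.0):
    p = interruption_probability(ASSUMED_RATE, hours)
    print(f"{hours:>4} h job: {p:.1%} chance of being killed")
```

Under this assumption the risk grows with job length, which fits the observation that long build jobs see more retries than short jobs would at the same bid.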
Comment 11 • Assignee • 11 years ago
It was better yesterday:
total jobs, jobs on spots, spot retries
1097, 54 (4%), 1 (1%)
I'm going to bump the limits.
Comment 12 • Assignee • 11 years ago
Thu, Dec 19
total jobs, jobs on spots, spot retries
781, 219 (28%), 28 (12%)
I think the high retry percentage is correlated with the shortage of m3.xlarge instances these days.
Comment 13 • Assignee • 11 years ago
date, total jobs, jobs on spots, spot retries
2013-12-20, 546, 286 (52%), 14 (4%)
2013-12-21, 133, 122 (91%), 0 (0%)
2013-12-22, 17, 17 (100%), 0 (0%)
2013-12-23, 236, 137 (58%), 0 (0%)
2013-12-24, 226, 152 (67%), 0 (0%)
2013-12-25, 33, 23 (69%), 0 (0%)
2013-12-26, 5, 5 (100%), 0 (0%)
2013-12-27, 1, 1 (100%), 0 (0%)
2013-12-28, 0, 0 (0%), 0 (0%)
2013-12-29, 38, 34 (89%), 0 (0%)
2013-12-30, 83, 26 (31%), 0 (0%)
2013-12-31, 27, 27 (100%), 0 (0%)
I'm going to monitor try builds on spot instances to find cases where we don't retry properly.
Comment 14 • Assignee • 11 years ago
I bumped the try limit up to 200 in total: https://hg.mozilla.org/build/cloud-tools/rev/c362cf6b1af6
Comment 15 • Assignee • 11 years ago
FTR, after talking to Taras about the AWS spot pricing model, we decided to bump the bid prices up to 10¢ for m1.medium (vs. the 12¢ on-demand price). The bid is the highest price we are willing to pay if the "market" price goes up. This should improve our retry stats.
I adjusted the prices on Friday, Jan 3rd, PT afternoon.
date, total jobs, jobs on spots, spot retries
2014-01-01, 3, 3 (100%), 0 (0%)
2014-01-02, 262, 245 (93%), 1 (0%)
2014-01-03, 289, 283 (97%), 25 (8%)
2014-01-04, 163, 152 (93%), 8 (5%)
2014-01-05, 3, 3 (100%), 3 (100%)
The stats are still not representative due to low load.
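The bid-price model described in this comment can be sketched as follows: the instance is billed at the market price each hour, and it is terminated once the market price rises above the bid. The function name and the market price series are hypothetical; only the 10¢ bid comes from the comment.

```python
def simulate_spot(bid, hourly_market_prices):
    """Walk through hourly market prices for one spot instance.

    Returns (hours_run, total_cost, interrupted). The instance pays the
    market price, not the bid, and is killed when the price exceeds the bid.
    """
    cost = 0.0
    hours = 0
    for price in hourly_market_prices:
        if price > bid:
            return hours, cost, True  # outbid: instance is terminated
        cost += price
        hours += 1
    return hours, cost, False


# Bid of $0.10/hour as in the comment; the price series is made up.
hours, cost, interrupted = simulate_spot(0.10, [0.02, 0.03, 0.15, 0.02])
print(hours, round(cost, 2), interrupted)
```

A higher bid does not raise the cost while the market price stays low; it only raises the ceiling at which the job gets interrupted, which is why bumping the bid is expected to reduce retries.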
Comment 16 • Assignee • 11 years ago
I'm going to leave the current limits as is because I see some spikes in retries. I hope they're related to the usw2 network blips, but it'd be better to isolate them.
date, total jobs, jobs on spots, spot retries
2014-01-06, 366, 362 (98%), 9 (2%)
2014-01-07, 754, 611 (81%), 106 (17%)
2014-01-08, 1093, 713 (65%), 54 (7%)
Comment 17 • Assignee • 11 years ago
Updated stats for try-linux64
date, total jobs, jobs on spots, spot retries, o-d retries
2014-01-01, 3, 3 (100%), 0 (0%), 0 (0%)
2014-01-02, 262, 245 (93%), 1 (0%), 0 (0%)
2014-01-03, 289, 283 (97%), 25 (8%), 0 (0%)
2014-01-04, 163, 152 (93%), 8 (5%), 0 (0%)
2014-01-05, 3, 3 (100%), 3 (100%), 0 (0%)
2014-01-06, 366, 362 (98%), 9 (2%), 0 (0%)
2014-01-07, 754, 611 (81%), 106 (17%), 0 (0%)
2014-01-08, 1093, 713 (65%), 54 (7%), 0 (0%)
2014-01-09, 1231, 940 (76%), 69 (7%), 2 (0%)
2014-01-10, 1312, 848 (64%), 6 (0%), 4 (0%)
2014-01-11, 468, 345 (73%), 1 (0%), 28 (22%)
2014-01-12, 267, 267 (100%), 0 (0%), 0 (0%)
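For readers checking the table, the percentages appear to be computed as follows: spot jobs relative to total jobs, spot retries relative to spot jobs, and o-d retries relative to on-demand jobs (total minus spot), each truncated to a whole number. This is inferred from the figures themselves, not from the reporting script, so the sketch below and its function names are assumptions.

```python
def pct(part, whole):
    """Integer percentage, truncated; 0 when the denominator is 0."""
    return 100 * part // whole if whole else 0


def format_row(date, total, spot, spot_retries, od_retries):
    """Rebuild one stats row from raw counts.

    Assumed denominators: total jobs for spot share, spot jobs for spot
    retries, and on-demand jobs (total - spot) for o-d retries.
    """
    on_demand = total - spot
    return (
        f"{date}, {total}, {spot} ({pct(spot, total)}%), "
        f"{spot_retries} ({pct(spot_retries, spot)}%), "
        f"{od_retries} ({pct(od_retries, on_demand)}%)"
    )


print(format_row("2014-01-11", 468, 345, 1, 28))
# prints: 2014-01-11, 468, 345 (73%), 1 (0%), 28 (22%)
```

Read this way, the 22% o-d retry figure on 2014-01-11 means 28 retries out of 123 on-demand jobs, not out of all 468 jobs.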
Comment 18 • Assignee • 11 years ago
I bumped the limits for try jobs twice: https://hg.mozilla.org/build/cloud-tools/rev/098082cd1239
Comment 19 • Assignee • 11 years ago
This has been quite stable so far, closing.
date, total jobs, jobs on spots, spot retries, o-d retries
2014-01-13, 790, 662 (83%), 7 (1%), 0 (0%)
2014-01-14, 889, 843 (94%), 32 (3%), 0 (0%)
2014-01-15, 1244, 1079 (86%), 105 (9%), 0 (0%)
2014-01-16, 1072, 997 (93%), 2 (0%), 0 (0%)
2014-01-17, 1158, 814 (70%), 122 (14%), 0 (0%)
2014-01-18, 377, 340 (90%), 0 (0%), 0 (0%)
2014-01-19, 152, 152 (100%), 0 (0%), 0 (0%)
2014-01-20, 461, 461 (100%), 2 (0%), 0 (0%)
2014-01-21, 1363, 814 (59%), 15 (1%), 0 (0%)
2014-01-22, 997, 679 (68%), 14 (2%), 1 (0%)
2014-01-23, 1249, 688 (55%), 3 (0%), 2 (0%)
2014-01-24, 1064, 704 (66%), 10 (1%), 0 (0%)
2014-01-25, 138, 113 (81%), 6 (5%), 0 (0%)
2014-01-26, 67, 67 (100%), 0 (0%), 0 (0%)
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated • 7 years ago
Component: General Automation → General