Closed Bug 1009012 Opened 11 years ago Closed 9 years ago

auto-scale buildbot masters in AWS

Categories

(Release Engineering :: General, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: catlee, Unassigned)

References

Details

(Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2986] )

We're running quite a few buildbot-masters on on-demand machines in AWS to cover our peak load. However, we're not always running at peak load! We should shut off masters when they're not needed. For the moment, let's limit this to starting/stopping existing instances rather than creating/terminating new ones.

Things to keep in mind:
- Need to monitor masters dying, and to start new ones if required
- Use slavealloc to mark masters as disabled, enable graceful shutdown on the master, then halt
- Need to figure out how to resume a master and get it up-to-date
- manage_masters.py needs to not fail on suspended masters

Step 1 is to look at historical load and estimate how many masters we could shut off, factor that by the on-demand price, and see how much the potential savings could be.
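A rough sketch of the stop/start ordering described above, in Python. This is only an illustration under assumptions: the `Master` record and the `disable`, `graceful_shutdown`, `bring_up_to_date` and `enable` callables are hypothetical stand-ins for whatever slavealloc and the master actually expose, and `ec2` is assumed to be any object with `stop_instances`/`start_instances` methods taking `InstanceIds` (as a boto3 EC2 client does). How to bring a resumed master up-to-date is still an open question, so it is left to a caller-supplied hook.

```python
from dataclasses import dataclass
from typing import Any, Callable


@dataclass(frozen=True)
class Master:
    name: str         # hypothetical: master name as slavealloc knows it
    instance_id: str  # EC2 instance ID of the host running the master


def stop_master(
    master: Master,
    ec2: Any,
    disable: Callable[[str], None],
    graceful_shutdown: Callable[[str], None],
) -> None:
    """Disable, drain, then stop the instance (stop, not terminate)."""
    disable(master.name)            # stop slaves being allocated to it
    graceful_shutdown(master.name)  # assumed to block until the master exits
    ec2.stop_instances(InstanceIds=[master.instance_id])


def start_master(
    master: Master,
    ec2: Any,
    bring_up_to_date: Callable[[str], None],
    enable: Callable[[str], None],
) -> None:
    """Start the existing instance, update it, then re-enable it."""
    ec2.start_instances(InstanceIds=[master.instance_id])
    bring_up_to_date(master.name)   # open question in this bug; hook only
    enable(master.name)
```

Keeping the slavealloc and shutdown steps as injected callables means the ordering can be exercised with fakes before anything touches a real instance.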
Some initial data: We have 7 m1.large and 19 m3.medium on demand instances running 24/7 as masters. This costs us about $1,800 per month for CPU time. If the lowest we can scale down is 4 masters per region (1 build + 1 try + 1 32-bit tests + 1 64-bit tests), then we can get down to about $500 per month for these. We could also look at running >1 master per instance, perhaps on beefier instances.
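For anyone wanting to redo this estimate, the arithmetic can be sketched as below. The instance counts come from the comment above; the hourly prices, the 730 hours-per-month figure, and the make-up of the minimum fleet are assumptions and placeholders, not real AWS prices, so substitute current numbers before trusting the result.

```python
from typing import Dict

HOURS_PER_MONTH = 730  # assumption: average hours in a month


def monthly_cost(
    counts: Dict[str, int],
    hourly_price: Dict[str, float],
    hours: float = HOURS_PER_MONTH,
) -> float:
    """Cost of running `counts` instances of each type for `hours` hours."""
    return sum(n * hourly_price[itype] * hours for itype, n in counts.items())


def potential_savings(
    current: Dict[str, int],
    minimum: Dict[str, int],
    hourly_price: Dict[str, float],
) -> float:
    """Upper bound on savings: assumes the fleet sits at `minimum` all month."""
    return monthly_cost(current, hourly_price) - monthly_cost(minimum, hourly_price)


if __name__ == "__main__":
    current = {"m1.large": 7, "m3.medium": 19}       # from this bug
    prices = {"m1.large": 0.10, "m3.medium": 0.05}   # PLACEHOLDER prices
    minimum = {"m3.medium": 8}                       # hypothetical minimum fleet
    print(f"current: ${monthly_cost(current, prices):,.2f}/month")
    print(f"max savings: ${potential_savings(current, minimum, prices):,.2f}/month")
```

Real savings would be lower than this bound, since masters have to come back up whenever load rises.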
Also use slaveapi to reboot any connected slaves, so they're not sitting around trying to connect to something that's off?
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2976]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2976] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2981]
Whiteboard: [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2981] → [kanban:engops:https://mozilla.kanbanize.com/ctrl_board/6/2986]
No master no headache!
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX
Component: General Automation → General