Closed Bug 978971 Opened 11 years ago Closed 11 years ago

Spot bidding should be able to filter out AZs with bad conditions

Categories

(Release Engineering :: General, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jlund, Assigned: rail)

Details

This could be due to:
- problems getting spot nodes
- nodes not being able to connect to buildbot
- jacuzzi logic going crazy on the masters

This caused tree closures today, and sheriffs report that it is a regular occurrence. Today's incident resolved itself. The root cause is unknown; this bug serves to find it.

See Bug 978956 for how to catch it when it happens. Much of this goes unnoticed because we are not seeing a high pending count, just a few machines pending for too long (over 1 hour).
We hit the following scenario here:
- the bidding library gives us a list of choices and we use the cheapest one
- the availability zone does not have enough capacity to serve the spot requests
- the spot sanity script sees the "capacity-oversubscribed" results and cancels them

We should enhance the bidding library so we can filter out choices based on some condition, such as too many failures in the last N minutes.
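To make the proposal concrete, here is a minimal sketch of "cheapest choice, subject to filters". It is not the actual cloud-tools bidding code: the `Choice` tuple, the `pick_cheapest` function, and the filter signature are all hypothetical names chosen for illustration.

```python
from typing import Callable, Iterable, NamedTuple, Optional


class Choice(NamedTuple):
    """Hypothetical shape of one bidding choice (not the real cloud-tools type)."""
    availability_zone: str
    instance_type: str
    price: float


# A filter returns True when the choice is still acceptable.
ChoiceFilter = Callable[[Choice], bool]


def pick_cheapest(choices: Iterable[Choice],
                  filters: Iterable[ChoiceFilter] = ()) -> Optional[Choice]:
    """Return the cheapest choice that passes every filter, or None."""
    filters = list(filters)
    usable = [c for c in choices if all(f(c) for f in filters)]
    return min(usable, key=lambda c: c.price, default=None)


# Example: skip an AZ that recently returned "capacity-oversubscribed".
bad_azs = {"us-east-1a"}  # assumed to be produced by some recent-failure check
choices = [
    Choice("us-east-1a", "m3.medium", 0.05),
    Choice("us-east-1b", "m3.medium", 0.07),
]
best = pick_cheapest(choices, [lambda c: c.availability_zone not in bad_azs])
```

Keeping the filters as plain callables means a new condition (for example "too many failures in the last N minutes") can be added without touching the selection logic.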
Assignee: nobody → rail
Summary: we are hitting long pending wait times before machines pick up jobs for aws machines, regularly → Spot bidding should be able to filter out AZs with bad conditions
I landed the following to see how it behaves on the weekend: http://hg.mozilla.org/build/cloud-tools/rev/3429306e6bd2

What it does:
* It checks whether the market price is higher than 80% of our bid price. If it is, we don't try to request spot instances.
* It checks recent (last 15 minutes) spot requests for the requested instance type in the requested AZ. If more than 10% of them show instances being killed, or requests not fulfilled due to low price or oversubscribed capacity, we skip this spot choice.
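The two checks can be sketched roughly as follows. This is an illustration of the described logic, not the code from the linked changeset: the `SpotRequest` record, the function names, and the exact set of status codes counted as failures are assumptions.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta, timezone

# Assumed set of spot request status codes treated as failures.
BAD_STATUS_CODES = {
    "instance-terminated-by-price",  # instance killed
    "price-too-low",                 # not fulfilled due to low price
    "capacity-oversubscribed",       # not fulfilled due to capacity
}


@dataclass
class SpotRequest:
    """Hypothetical record of one past spot request."""
    instance_type: str
    availability_zone: str
    status_code: str
    created: datetime


def price_too_close_to_bid(market_price, bid_price, max_ratio=0.8):
    """True if the market price is above 80% of our bid."""
    return market_price > bid_price * max_ratio


def too_many_recent_failures(requests, instance_type, az, now,
                             window=timedelta(minutes=15),
                             max_failure_ratio=0.1):
    """True if more than 10% of recent matching requests failed."""
    recent = [r for r in requests
              if r.instance_type == instance_type
              and r.availability_zone == az
              and now - r.created <= window]
    if not recent:
        return False  # no history, so nothing to hold against this choice
    failed = sum(1 for r in recent if r.status_code in BAD_STATUS_CODES)
    return failed / len(recent) > max_failure_ratio


def is_choice_usable(market_price, bid_price, requests,
                     instance_type, az, now):
    """Combine both checks; skip the choice if either one trips."""
    return (not price_too_close_to_bid(market_price, bid_price)
            and not too_many_recent_failures(requests, instance_type, az, now))
```

In this sketch an AZ with no recent requests is treated as usable, which is a design choice rather than something stated in the bug.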
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Component: Tools → General