Closed
Bug 978971
Opened 11 years ago
Closed 11 years ago
Spot bidding should be able to filter out AZs with bad conditions
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jlund, Assigned: rail)
Details
Long pending wait times before AWS machines pick up jobs could be due to:
- problems getting spot nodes
- nodes not being able to connect to buildbot
- jacuzzi logic going crazy on the masters
This caused tree closures today, and sheriffs report that it is a regular occurrence.
Today's incident resolved itself. The root cause is unknown.
This bug serves to find that cause.
See Bug 978956 for how to catch it when it happens. Much of this goes unnoticed because we do not see a high pending count, only a few machines pending for too long (over 1 hour).
Comment 1 (assignee) • 11 years ago
We hit the following scenario here:
- the bidding library gives us a list of choices and we use the cheapest one
- the availability zone doesn't have enough capacity to serve the spot requests
- the spot sanity script sees the "capacity-oversubscribed" results and cancels the requests
We should enhance the bidding library so we can filter out choices based on some condition, such as too many failures in the last N minutes.
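A minimal sketch of that idea, not the actual cloud-tools code: the `SpotChoice` type, its fields, and `pick_cheapest` are hypothetical names used only for illustration. The selection step takes a predicate, so an AZ with bad conditions can be dropped before the cheapest remaining choice is picked.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class SpotChoice:
    # Hypothetical shape of one entry returned by the bidding library.
    availability_zone: str
    instance_type: str
    price: float


def pick_cheapest(
    choices: Iterable[SpotChoice],
    is_usable: Callable[[SpotChoice], bool],
) -> Optional[SpotChoice]:
    """Return the cheapest choice that passes the filter, or None."""
    usable = [c for c in choices if is_usable(c)]
    return min(usable, key=lambda c: c.price, default=None)


# Example: skip an AZ that is known to be oversubscribed right now.
choices = [
    SpotChoice("us-east-1a", "m3.xlarge", 0.05),
    SpotChoice("us-east-1b", "m3.xlarge", 0.07),
]
bad_azs = {"us-east-1a"}
best = pick_cheapest(choices, lambda c: c.availability_zone not in bad_azs)
```

With the filter in place, the cheaper but oversubscribed zone is skipped and the next cheapest choice is used instead. If every choice is filtered out, the caller gets `None` and can decide not to bid at all.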
Assignee: nobody → rail
Summary: we are hitting long pending wait times before machines pick up jobs for aws machines, regularly → Spot bidding should be able to filter out AZs with bad conditions
Comment 2 (assignee) • 11 years ago
I landed the following to see how it behaves on the weekend:
http://hg.mozilla.org/build/cloud-tools/rev/3429306e6bd2
What it does:
* It checks whether the market price is higher than 80% of our bid price. If it is, we don't try to request spot instances.
* It checks recent (last 15 minutes) spot requests for the requested instance type in the requested AZ. If more than 10% of them ended with instances being killed, or were not fulfilled due to low price or oversubscribed capacity, we skip this spot choice.
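The two checks could be sketched roughly as follows. This is an illustration under assumptions, not the landed patch: the function names, the dictionary shape of a spot request (`created`, `status_code`), and the exact set of status codes treated as failures are all hypothetical. Only the 80% and 10% thresholds and the 15-minute window come from the description above.

```python
from datetime import datetime, timedelta, timezone

PRICE_RATIO_LIMIT = 0.8            # market price must not exceed 80% of the bid
FAILURE_RATIO_LIMIT = 0.1          # skip the choice if more than 10% failed
WINDOW = timedelta(minutes=15)     # only look at recent spot requests

# Assumed mapping of "killed / low price / oversubscribed" to status codes.
BAD_STATUS_CODES = frozenset({
    "price-too-low",
    "capacity-oversubscribed",
    "instance-terminated-by-price",
    "instance-terminated-no-capacity",
    "instance-terminated-capacity-oversubscribed",
})


def price_ok(market_price, bid_price):
    """First check: is the market price low enough relative to our bid?"""
    return market_price <= PRICE_RATIO_LIMIT * bid_price


def failure_ratio(requests, now):
    """Fraction of requests inside the window that ended in a bad state."""
    recent = [r for r in requests if now - r["created"] <= WINDOW]
    if not recent:
        return 0.0
    bad = sum(1 for r in recent if r["status_code"] in BAD_STATUS_CODES)
    return bad / len(recent)


def choice_usable(market_price, bid_price, requests, now):
    """Both checks combined for one instance type in one AZ."""
    return (price_ok(market_price, bid_price)
            and failure_ratio(requests, now) <= FAILURE_RATIO_LIMIT)


now = datetime(2014, 3, 7, 12, 0, tzinfo=timezone.utc)
recent = now - timedelta(minutes=5)
requests = (
    [{"created": recent, "status_code": "fulfilled"}] * 8
    + [{"created": recent, "status_code": "capacity-oversubscribed"}] * 2
)
usable = choice_usable(0.05, 0.10, requests, now)  # 2 of 10 failed, so skipped
```

Requests older than the window are ignored, so an AZ that had trouble an hour ago becomes eligible again once its recent requests look healthy.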
Updated (assignee) • 11 years ago
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated • 8 years ago
Component: Tools → General