Bug 1287604
Opened 8 years ago
Closed 8 years ago
Experiment with different AWS instance types for TC linux64 builds
Categories
(Firefox Build System :: General, defect, P2)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: jgriffin, Assigned: jgriffin)
References
(Blocks 1 open bug)
Details
Linux64 builds in TaskCluster are currently built using a blend of m3/c3/r3.2xlarge instances, depending on pricing and availability.
We'd like to experiment with AWS instance types that have more RAM and/or cores, so we can weigh faster end-to-end build times in automation against the added cost.
Comment 1•8 years ago
It is more important to scale cores than RAM. As long as we have 1+ GB/core, we should be fine. A little less is probably OK. Depends on platform though.
I've built in 4-5 minutes on a c4.8xlarge. That was only the build itself; symbol generation, packaging, etc. take several minutes longer. But the C++ in the build system does scale out to dozens of cores pretty well.
Assignee
Comment 3•8 years ago
Some numbers:
type         compile (min)   build (min)   price (cents/hr)
m3.2xlarge   32              40            13.5
m4.4xlarge   14              22            20.6
c4.2xlarge   24              33            15.3
c4.4xlarge   13              20            22.9
r3.4xlarge   17              25            25.3
This suggests we should consider switching to m4/c4.4xlarge for linux builds; this would shave about 20 minutes off the build time at a cost delta of around 7 to 9 cents an hour. Since we'd be using the instance for about 20 minutes less per build, the real delta per build is only about 5 or 6 cents. This is a tiny cost compared to the development velocity improvements we could achieve by reducing build times, especially on Try.
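For anyone who wants to replay the arithmetic, here is a rough sketch using the numbers from the table above. It assumes the simplest possible billing model (hourly price times build minutes, with no idle time, startup overhead, or hourly rounding), so its per-build figures need not match the per-build estimate quoted above. The function names are made up for illustration.

```python
# Figures copied from the table in this comment:
# instance type -> (compile minutes, build minutes, spot price in cents/hr)
INSTANCES = {
    "m3.2xlarge": (32, 40, 13.5),
    "m4.4xlarge": (14, 22, 20.6),
    "c4.2xlarge": (24, 33, 15.3),
    "c4.4xlarge": (13, 20, 22.9),
    "r3.4xlarge": (17, 25, 25.3),
}


def cost_per_build_cents(build_minutes, cents_per_hour):
    """Cost of one build, ASSUMING we pay only for the minutes the build runs."""
    return cents_per_hour * build_minutes / 60.0


def summarize(baseline="m3.2xlarge"):
    """Compare every instance type against the baseline type."""
    _, base_build, base_price = INSTANCES[baseline]
    rows = {}
    for name, (_compile_min, build_min, price) in INSTANCES.items():
        rows[name] = {
            "minutes_saved": base_build - build_min,
            "hourly_delta_cents": round(price - base_price, 1),
            "cents_per_build": round(cost_per_build_cents(build_min, price), 2),
        }
    return rows


if __name__ == "__main__":
    for name, row in summarize().items():
        print(name, row)
```

A different billing model (for example, paying for whole instance-hours, or for idle time between jobs) would change the per-build column, so treat it as a lower bound rather than a prediction.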
I haven't run experiments on 8xlarge instances yet; comment 1 suggests this would result in additional speed increases, but they would come at greater cost. Currently a c4.8xlarge spot instance costs 45.7 cents/hr.
Comment 4•8 years ago
https://tools.taskcluster.net/aws-provisioner/ says we have ~100 instances of {dbg,opt}-linux-{32,64}. Assuming we run 100 instances 24/7, multiply the cost per hour by 74,400 (100 instances x 24 hours x 31 days) to get our monthly cost. e.g.
100x m3.2xlarge @ $0.135: $10,044
100x c4.2xlarge @ $0.153: $11,383
100x c4.4xlarge @ $0.229: $17,037
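The same estimate as a small sketch, assuming 100 instances running around the clock for a 31-day month (which is where the 74,400 instance-hours come from). The helper name is hypothetical; the prices are the spot prices quoted in this bug.

```python
HOURS_PER_MONTH = 24 * 31  # 31-day month, matching the 74,400 instance-hours figure


def monthly_cost_dollars(price_per_hour, instances=100, hours=HOURS_PER_MONTH):
    """Monthly cost in dollars, ASSUMING every instance runs 24/7 at a flat price."""
    return price_per_hour * instances * hours


if __name__ == "__main__":
    for name, price in [("m3.2xlarge", 0.135),
                        ("c4.2xlarge", 0.153),
                        ("c4.4xlarge", 0.229)]:
        print(f"100x {name} @ ${price:.3f}: ${monthly_cost_dollars(price):,.0f}")
```

Spot prices fluctuate and the pool is not always fully loaded, so this is an upper-bound style estimate rather than an expected bill.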
The jump from m3.2xl to c4.2xl for a little over $1,000/mo is a no-brainer IMO.
Considering build jobs are the long pole in automation, I think throwing thousands of dollars at the problem per month is warranted.
jgriffin: did you test a full build job (symbol generation and all)? Or is this just the `mach build` piece?
Flags: needinfo?(jgriffin)
Assignee
Comment 5•8 years ago
I ran the entire build, using TaskCluster's build.sh script. So, the raw data I was looking at was something like this:
PERFHERDER_DATA: {"framework": {"name": "build_metrics"}, "suites": [{"subtests": [{"name": "libxul.so", "value": 119959088}], "name": "installer size", "value": 66849994, "alertThreshold": 0.25}, {"subtests": [{"name": "configure", "value": 25.377415895462036}, {"name": "pre-export", "value": 0.4210519790649414}, {"name": "export", "value": 26.041933059692383}, {"name": "compile", "value": 773.1008520126343}, {"name": "misc", "value": 2.1479151248931885}, {"name": "libs", "value": 9.281205177307129}, {"name": "tools", "value": 0.4877140522003174}, {"name": "package-tests", "value": 111.22820997238159}, {"name": "buildsymbols", "value": 204.79079699516296}, {"name": "package", "value": 44.21001100540161}, {"name": "upload", "value": 8.736287832260132}], "name": "build times", "value": 1207.6935601234436}]}
(this is for a c4.4xlarge instance)
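A minimal sketch of how the per-phase timings could be pulled out of a log line like the one above. It assumes the line contains the literal prefix "PERFHERDER_DATA: " followed by a single JSON object, as in the excerpt; the function names are hypothetical, and the sample line is a shortened copy of the data above that keeps only two phases.

```python
import json

PREFIX = "PERFHERDER_DATA: "


def parse_perfherder_line(line):
    """Return the decoded JSON payload, or None if the line has no payload."""
    start = line.find(PREFIX)
    if start == -1:
        return None
    return json.loads(line[start + len(PREFIX):])


def build_phase_seconds(data, suite_name="build times"):
    """Map each phase name to its duration in seconds for the given suite."""
    for suite in data.get("suites", []):
        if suite.get("name") == suite_name:
            return {t["name"]: t["value"] for t in suite["subtests"]}
    return {}


# Shortened sample: only two of the phases from the log line above.
SAMPLE = (
    'PERFHERDER_DATA: {"framework": {"name": "build_metrics"}, "suites": '
    '[{"subtests": [{"name": "compile", "value": 773.1008520126343}, '
    '{"name": "buildsymbols", "value": 204.79079699516296}], '
    '"name": "build times", "value": 1207.6935601234436}]}'
)

if __name__ == "__main__":
    phases = build_phase_seconds(parse_perfherder_line(SAMPLE))
    # Print the slowest phases first, converted to minutes.
    for name, seconds in sorted(phases.items(), key=lambda kv: -kv[1]):
        print(f"{name}: {seconds / 60:.1f} min")
```

Sorting by duration makes it easy to see that compile dominates, with buildsymbols and package-tests the next largest contributors in the full data.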
Flags: needinfo?(jgriffin)
Comment 6•8 years ago
773s for a compile on a c4.4xlarge seems a bit long, since the c4.4xlarge has 16 vCPUs. I would expect the compile tier to take 300-450s on that instance type.
I wonder if ccache or sccache could be interfering here. Also, in the c4 series, everything except the c4.8xlarge is shared hardware. So if there are other instances on the same physical machine, you'll be competing for system resources.
Slow I/O (e.g. from EBS) could be slowing things down as well.
Assignee
Updated•8 years ago
Status: NEW → ASSIGNED
Assignee
Updated•8 years ago
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Updated•6 years ago
Component: Build Config → General
Product: Firefox → Firefox Build System