Closed
Bug 1378381
Opened 7 years ago
Closed 7 years ago
OpenCloudConfig: avoid long-running format of EBS backed Z: drive
Categories
(Infrastructure & Operations :: RelOps: General, task)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: pmoore, Assigned: grenade)
References
Details
On Windows workers managed by OpenCloudConfig, we currently format the Z: drive[1] between task runs in order to improve the efficiency of reads from the Z: drive (due to the copy-on-read semantics).
The format can take 20-25 minutes, so we should avoid it. Instead we should probably create a volume at startup on the instance. The two possible volume types would be an EBS volume (remote) or an instance store volume (local). An instance store volume might not be possible on all instance types, so we'll need to check whether there are appropriate instance types we can use in all cases that suit all our requirements (including pricing!). I'm not sure at the moment whether dynamically creating an EBS volume from scratch removes the need to format the drive for performance gain. We would need to test this. It looks like an EBS volume can be created via PowerShell[2].
AWS provides comprehensive documentation about block device mapping configuration[3].
--
[1] https://github.com/mozilla-releng/OpenCloudConfig/blob/9e615f9b56026faca9307f8dc582097f101a6d67/userdata/Configuration/GenericWorker/run-generic-worker-format-and-reboot.bat#L37
[2] http://docs.aws.amazon.com/powershell/latest/reference/items/New-EC2Volume.html
[3] http://docs.aws.amazon.com/AWSEC2/latest/WindowsGuide/block-device-mapping-concepts.html
Comment 1•7 years ago
Instance store volumes are going the way of the dodo and don't exist on modern EC2 instance types. Everything is backed by EBS, right?
Either way, yes, attaching a fresh EBS volume and doing a quick format is the way to go. If you initialize an EBS volume from an AMI, you get the crappy copy-on-read behavior. It is probably faster to initialize a fresh EBS volume and stream bits from S3 than to initialize from an AMI and touch all sectors via format.
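For illustration, the distinction above can be expressed in terms of block device mappings: for a device that the AMI does not already map, an entry with no SnapshotId asks EC2 for a blank volume, while an entry that names a snapshot is restored from it and is read lazily. A minimal Python sketch follows; the helper names and the sample snapshot ID are hypothetical, and this is not code from OpenCloudConfig.

```python
def fresh_ebs_mapping(device_name, size_gb):
    """Describe a blank EBS volume (no SnapshotId) that is deleted with the instance."""
    return {
        "DeviceName": device_name,
        "Ebs": {"DeleteOnTermination": True, "VolumeSize": size_gb},
    }


def is_snapshot_backed(mapping):
    """True when the entry names a snapshot, so its blocks are loaded on first read.

    Assumption: the device is not already mapped by the AMI. A root device
    inherits the AMI's snapshot even when the entry omits SnapshotId.
    """
    return "SnapshotId" in mapping.get("Ebs", {})


mappings = [
    fresh_ebs_mapping("/dev/sdb", 120),
    # Hypothetical snapshot-backed volume, included only for contrast.
    {"DeviceName": "/dev/sdc", "Ebs": {"SnapshotId": "snap-0123456789abcdef0"}},
]
snapshot_backed = [m["DeviceName"] for m in mappings if is_snapshot_backed(m)]
print(snapshot_backed)
```

Only the snapshot-backed entry would need its sectors touched before reads become fast; the blank volume needs just a quick format.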
Blocks: 1305174
Updated•7 years ago
Assignee: relops → rthijssen
Assignee
Comment 2•7 years ago
just an update that the implementation that mounts fresh ebs volumes (on spot instances) at boot, is working. i'm testing on gecko-1-b-win2012-beta.
basic design is:
- ami contains only the c: drive which has dependencies installed through occ during golden ami run
- occ updates the provisioner config with a section like this:
  "launchSpec": {
    "BlockDeviceMappings": [
      {
        "DeviceName": "/dev/sda1",
        "Ebs": {
          "DeleteOnTermination": true,
          "VolumeSize": 40
        }
      },
      {
        "DeviceName": "/dev/sdb",
        "Ebs": {
          "DeleteOnTermination": true,
          "VolumeSize": 120
        }
      }
    ],
    ...
- occ/dsc runs again on the spot instance and initialises /dev/sdb with two partitions for y: and z:, quick formats these and assigns drive letters (https://github.com/mozilla-releng/OpenCloudConfig/blob/346047b7/userdata/rundsc.ps1#L294-L355)
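The partitioning step in the last bullet can be sketched as a generated diskpart script. This is only an illustration in Python, not the rundsc.ps1 implementation linked above: the function name, the disk number, the volume labels and the 20 GB / remainder split are all assumptions, since the thread does not state them.

```python
def diskpart_script(disk_number, partitions):
    """Build a diskpart script that wipes a disk and quick formats each partition.

    partitions: list of (letter, label, size_mb) tuples; size_mb=None means
    "use the remaining space". Bringing the disk online or clearing a
    read-only attribute is deliberately left out of this sketch.
    """
    lines = [f"select disk {disk_number}", "clean"]
    for letter, label, size_mb in partitions:
        size = f" size={size_mb}" if size_mb is not None else ""
        # A newly created partition gets focus, so format and assign act on it.
        lines.append(f"create partition primary{size}")
        lines.append(f'format fs=ntfs label="{label}" quick')
        lines.append(f"assign letter={letter}")
    return "\n".join(lines) + "\n"


# Assumed layout: 20 GB for Y:, the rest of the 120 GB volume for Z:.
script = diskpart_script(1, [("Y", "cache", 20 * 1024), ("Z", "task", None)])
print(script)
```

The resulting text could be saved to a file and run with `diskpart /s <file>`. Because each format is a quick format on a blank volume, it should not need the long full-sector pass described in the bug description.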
i ran into problems with today's testing, due to a missing cot gpg key for gecko-1-b-win2012-beta, which meant that test builds failed (https://treeherder.mozilla.org/#/jobs?repo=try&revision=59e97080262b13037129a420613c3d0d229da018&group_state=expanded&exclusion_profile=false&filter-searchStr=tc). but i expect to have this resolved and deployed tomorrow to gecko-(1-3)-b-win2012.
Comment 3•7 years ago
This got landed in try today in https://github.com/mozilla-releng/OpenCloudConfig/commit/c819210a76021161285fb782a708316fd8e2807e but signs point to it causing this problem:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=074acb44df33719de2e47ee941887fc8da6e61e4&selectedJob=123961923
Comment 4•7 years ago
(In reply to Amy Rich [:arr] [:arich] from comment #3)
> This got landed in try today in https://github.com/mozilla-releng/OpenCloudConfig/commit/c819210a76021161285fb782a708316fd8e2807e but signs point to it causing this problem:
>
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=074acb44df33719de2e47ee941887fc8da6e61e4&selectedJob=123961923
After conversation in irc with arr and gps.
https://github.com/mozilla-releng/OpenCloudConfig/commit/44f6633a88caefb776c294e86a325d8e3b6f6554
Comment 5•7 years ago
Rob mentioned today that he was testing an updated patch in try again.
Flags: needinfo?(rthijssen)
Assignee
Comment 6•7 years ago
yes. has been running for nearly 24 hours on gecko-1-b-win2012 without any hg path length exceptions. probably due to the robustcheckout updates and the removal of the hg precache as well.
https://github.com/mozilla-releng/OpenCloudConfig/commit/c087f802347464ee3084a66ee8c7590bf852ec15
promoting now to gecko-2-b-win2012 & gecko-3-b-win2012
https://github.com/mozilla-releng/OpenCloudConfig/commit/bfaef41031359709dfde980669f3aa0dbb193410
Flags: needinfo?(rthijssen)
Assignee
Updated•7 years ago
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Reporter
Comment 7•7 years ago
Thanks Rob!
Comment 8•7 years ago
rob, do you know if this could affect the testers as well? I believe they do formats too, but the patch is for changing the builders.
Flags: needinfo?(rthijssen)
Assignee
Comment 9•7 years ago
yes, the patch went live for win 7 and 10 this afternoon as well.
Flags: needinfo?(rthijssen)
Assignee
Comment 10•7 years ago
win 7:
https://github.com/mozilla-releng/OpenCloudConfig/commit/baebbfda25e015c496b67ecaadab6c82073813e6
win 10:
https://github.com/mozilla-releng/OpenCloudConfig/commit/3218a73e0f4a90fe009a9fec87a4767544da2cfd
the ami change:
https://github.com/mozilla-releng/OpenCloudConfig/commit/bf26a64d67f5e74aa9fe901bbd93c48953cf3c65