Upgrade to taskcluster-proxy 5.1.0 and generic-worker 13.0.2 on OCC worker types
Categories: Infrastructure & Operations :: RelOps: OpenCloudConfig, task
Tracking: Not tracked
People: Reporter: pmoore, Assignee: grenade
References: Blocks 1 open bug
Attachments: 5 files, 1 obsolete file
Some planned gecko changes require taskcluster-proxy 5.1.0 and generic-worker 12.0.0:
- bug 1492618: g-w 12.0.0 is needed for ed25519 chain of trust signing keys
- bug 1508381: removing hardcoding of TASKCLUSTER_ROOT_URL and TASKCLUSTER_PROXY_URL from task definitions
Comment 1•6 years ago
Either you can generate the new level 3 ed25519 keypair and send me the public key portion, or I can generate a keypair and send you the private key portion; either way works for me.
Comment 2•6 years ago (Assignee)
i'm currently clearing the backlog of pull requests using the beta worker types.
i should be able to get those workers testing gw 12 and tc-p 5.1.0 tomorrow.
Comment 3•6 years ago (Assignee)
this pr to the cot-gpg-keys repo rotates the gecko-3-b-win2012 cot key and changes the key algorithm to eddsa/Ed25519
Comment 4•6 years ago (Assignee)
Comment 5•6 years ago (Assignee)
Comment 6•6 years ago (Assignee)
- the two links above are to try pushes that use the beta worker types
- the beta worker types were built this morning from the occ beta branch using:
  - generic-worker 12.0.0
  - taskcluster-proxy 5.1.0
Comment 7•6 years ago
Comment 8•6 years ago
Should be able to run this via go run genEd25519.go. It writes a private key to ./ed25519-privkey and the public key to stdout.
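For readers without access to the attachment, here is a minimal sketch of what such a generator could look like. It is not the attached genEd25519.go: it assumes Go 1.16+ (crypto/ed25519 is in the standard library and os.WriteFile exists), and it assumes ./ed25519-privkey holds the base64-encoded 32-byte seed rather than the 64-byte expanded key. The helper name generateKeypair is hypothetical.

```go
// Hypothetical sketch of an Ed25519 keypair generator; not the attached genEd25519.go.
// Assumption: the private key file stores the base64-encoded 32-byte seed.
package main

import (
	"crypto/ed25519"
	"crypto/rand"
	"encoding/base64"
	"fmt"
	"io"
	"log"
	"os"
)

// generateKeypair (hypothetical helper) returns the base64-encoded seed and public key.
// Passing a nil reader makes ed25519.GenerateKey fall back to crypto/rand.Reader.
func generateKeypair(r io.Reader) (privB64, pubB64 string, err error) {
	pub, priv, err := ed25519.GenerateKey(r)
	if err != nil {
		return "", "", err
	}
	privB64 = base64.StdEncoding.EncodeToString(priv.Seed())
	pubB64 = base64.StdEncoding.EncodeToString(pub)
	return privB64, pubB64, nil
}

func main() {
	privB64, pubB64, err := generateKeypair(rand.Reader)
	if err != nil {
		log.Fatal(err)
	}
	// 0600: the private key should only be readable by its owner.
	if err := os.WriteFile("./ed25519-privkey", []byte(privB64), 0600); err != nil {
		log.Fatal(err)
	}
	// Only the public half goes to stdout, so it can be pasted into a bug or PR.
	fmt.Println(pubB64)
}
```

Printing only the public key keeps the private seed off the terminal scrollback; whether the real script does the same is not stated beyond the description above.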
Comment 9•6 years ago
Should be able to run this via go run getPubKey.go. It reads the base64 privkey string from ./ed25519-privkey and writes the base64-encoded pubkey to stdout.
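A minimal sketch of the same derivation, assuming (as above, and not confirmed by the attachment) that ./ed25519-privkey contains the base64-encoded 32-byte seed. The helper pubKeyFromPrivB64 is hypothetical; ed25519.NewKeyFromSeed and PrivateKey.Public are standard-library calls (Go 1.16+ for os.ReadFile).

```go
// Hypothetical sketch of getPubKey.go's behaviour; not the attached file.
// Assumption: ./ed25519-privkey holds the base64-encoded 32-byte Ed25519 seed.
package main

import (
	"crypto/ed25519"
	"encoding/base64"
	"fmt"
	"log"
	"os"
	"strings"
)

// pubKeyFromPrivB64 (hypothetical helper) derives the base64 public key from a base64 seed.
// Surrounding whitespace (e.g. a trailing newline in the file) is ignored.
func pubKeyFromPrivB64(privB64 string) (string, error) {
	seed, err := base64.StdEncoding.DecodeString(strings.TrimSpace(privB64))
	if err != nil {
		return "", err
	}
	if len(seed) != ed25519.SeedSize {
		return "", fmt.Errorf("expected %d-byte seed, got %d bytes", ed25519.SeedSize, len(seed))
	}
	priv := ed25519.NewKeyFromSeed(seed)
	// Public() returns crypto.PublicKey; its dynamic type is ed25519.PublicKey.
	pub := priv.Public().(ed25519.PublicKey)
	return base64.StdEncoding.EncodeToString(pub), nil
}

func main() {
	data, err := os.ReadFile("./ed25519-privkey")
	if err != nil {
		log.Fatal(err)
	}
	pubB64, err := pubKeyFromPrivB64(string(data))
	if err != nil {
		log.Fatal(err)
	}
	fmt.Println(pubB64)
}
```

Because the public key is fully determined by the seed, running this against the file written by the generator should print the same public key the generator printed.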
Comment 10•6 years ago (Assignee)
Comment 11•6 years ago (Assignee)
Comment 12•6 years ago (Assignee)
Comment 13•6 years ago (Assignee)
Comment 15•6 years ago (Assignee)
Comment 16•6 years ago (Assignee)
Comment 17•6 years ago (Assignee)
aki, new public key for gecko-3-b-win2012 is:
6UPrVTyw0EPQV7bCEMXo+5jNR4clbK55JWG74bBJHZQ=
should i make a pr for the cot repo or are we doing it differently now?
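As a quick sanity check before pasting a key like the one above into another system: an Ed25519 public key is 32 bytes, so its standard base64 form is 44 characters ending in a single "=". A minimal Go check (looksLikeEd25519PubKey is a hypothetical helper; it validates encoding and length only and cannot prove the key is the right one):

```go
package main

import (
	"crypto/ed25519"
	"encoding/base64"
	"fmt"
)

// looksLikeEd25519PubKey (hypothetical helper) reports whether s is standard base64
// that decodes to exactly ed25519.PublicKeySize (32) bytes.
// It checks shape only; it says nothing about which private key produced it.
func looksLikeEd25519PubKey(s string) bool {
	b, err := base64.StdEncoding.DecodeString(s)
	return err == nil && len(b) == ed25519.PublicKeySize
}

func main() {
	fmt.Println(looksLikeEd25519PubKey("6UPrVTyw0EPQV7bCEMXo+5jNR4clbK55JWG74bBJHZQ=")) // true
}
```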
Comment 18•6 years ago
Thanks! No, for now this will go into the scriptworker ed25519 PR.
Once we retire gpg cot we can retire the cot-gpg-keys repo.
Comment 19•6 years ago (Reporter)
Thanks Rob!
Important deployment note
The old workers routinely check .secrets.generic-worker.config.deploymentId to see whether it has changed, and hence whether they should shut down.
However, the new deployment updates the .userData.genericWorker.config.deploymentId property instead, and removing the contents of the secrets section isn't enough to trigger the old workers to shut down.
Therefore the old workers won't notice the new deployment. If it is important that they don't stay around too long, you'll need to either manually update the old deploymentId (.secrets.generic-worker.config.deploymentId) or just kill the old workers.
Credit to :SimonSapin for diagnosing this issue! He hit it when updating the servo-win2016 worker type.
Of course, this only applies to the very first upgrade. Once the workers are running v13, future upgrades won't be affected, as the workers will be checking .userData.genericWorker.config.deploymentId and the deployments will be updating that same property. The issue occurs only when crossing the v13 boundary.
Thanks!
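To make the failure mode in the deployment note concrete, here is a toy Go model of the check. It is not generic-worker's code: the nested-map layout and the helpers lookup and shouldShutDown are hypothetical, and only the two property paths come from the note. The model treats a missing value as "no change", which matches the note's point that emptying the secrets section does not make old workers shut down.

```go
// Toy model of the deploymentId check; not generic-worker's implementation.
package main

import "fmt"

// lookup (hypothetical helper) walks nested maps along path and returns the string found, if any.
func lookup(doc map[string]interface{}, path ...string) (string, bool) {
	var cur interface{} = doc
	for _, key := range path {
		m, ok := cur.(map[string]interface{})
		if !ok {
			return "", false
		}
		if cur, ok = m[key]; !ok {
			return "", false
		}
	}
	s, ok := cur.(string)
	return s, ok
}

// shouldShutDown (hypothetical) reports whether the deploymentId at path differs from
// the one the worker started with. A missing value counts as "no change".
func shouldShutDown(doc map[string]interface{}, startedWith string, path ...string) bool {
	id, ok := lookup(doc, path...)
	return ok && id != startedWith
}

func main() {
	// The new deployment bumps userData but leaves the old secrets value untouched.
	doc := map[string]interface{}{
		"secrets": map[string]interface{}{
			"generic-worker": map[string]interface{}{
				"config": map[string]interface{}{"deploymentId": "old"},
			},
		},
		"userData": map[string]interface{}{
			"genericWorker": map[string]interface{}{
				"config": map[string]interface{}{"deploymentId": "new"},
			},
		},
	}
	preV13 := []string{"secrets", "generic-worker", "config", "deploymentId"}
	v13 := []string{"userData", "genericWorker", "config", "deploymentId"}
	fmt.Println(shouldShutDown(doc, "old", preV13...)) // false: an old worker never notices
	fmt.Println(shouldShutDown(doc, "old", v13...))    // true: a v13 worker would notice
}
```

Once both sides use the userData path, the second case is the only one that matters, which is why the problem only appears when crossing the v13 boundary.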
Comment 20•6 years ago (Assignee)
Comment 21•6 years ago (Assignee)
Comment 22•6 years ago (Assignee)
Comment 23•6 years ago
we currently fail to run jittest on windows10, and it isn't run in general (we switched it off in favour of spidermonkey, but this was an accident).
the jsreftest failure is new to me; I see it broken on mozilla-central too:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&tier=1%2C2%2C3&searchStr=jsreftest%2Cwindows7-32&fromchange=b29c87add05f735b250612ca2444103652750091
so the gw upgrade doesn't introduce any new failures.
Comment 24•6 years ago (Assignee)
gw update deployment in progress...
planned deployment order is:
- 08:00 UTC: gecko-t-win10-64-hw, gecko-t-win10-64-ux
- 09:00 UTC: gecko-t-win10-64, gecko-t-win10-64-gpu, gecko-t-win7-32, gecko-t-win7-32-gpu
- 10:00 UTC: gecko-1-b-win2012, gecko-2-b-win2012, gecko-3-b-win2012
timings will change if anything doesn't go smoothly or if we roll back at some stage due to issues.
Comment 25•6 years ago (Assignee)
there are issues with missing configuration settings on gecko-t-win10-64-hw & gecko-t-win10-64-ux. i have patched these and am waiting to see if the patch succeeds.
deployment to gecko-t-win10-64, gecko-t-win10-64-gpu, gecko-t-win7-32, gecko-t-win7-32-gpu, gecko-1-b-win2012, gecko-2-b-win2012, gecko-3-b-win2012 is delayed until the issues are resolved.
Comment 26•6 years ago (Assignee)
rolled back gw to 10.11.2 and tc-proxy to 4.1 on win 10 hardware. will reattempt the upgrade tomorrow if i can fix the config issues.
Comment 27•6 years ago
Doh. Let me know if you need a hand with anything.
Comment 28•6 years ago (Assignee)
deploying gw 13.0.2 and tc-proxy 5.1.0 to gecko-t-win10-64-gpu only (for now).
https://tools.taskcluster.net/groups/WPAKKJyzQLGyIUHXmWuiNg/tasks/PVRN0pKYRQmsACTEJ1nFJw/runs/0/logs/public%2Flogs%2Flive.log
Comment 29•6 years ago (Assignee)
i've had to revert gecko-t-win10-64-gpu as something is wrong with gw configuration on ec2 instances as well.
papertrail shows this:
Feb 26 15:35:29 i-0096aed52a1cb4c8e.gecko-t-win10-64-gpu.use1.mozilla.com generic-worker: Making system call GetProfilesDirectoryW with args: [0 C042021358]
Feb 26 15:35:29 i-0096aed52a1cb4c8e.gecko-t-win10-64-gpu.use1.mozilla.com generic-worker: Result: 0 0 The data area passed to a system call is too small.
Feb 26 15:35:29 i-0096aed52a1cb4c8e.gecko-t-win10-64-gpu.use1.mozilla.com generic-worker: Making system call GetProfilesDirectoryW with args: [C042017E20 C042021358]
Feb 26 15:35:29 i-0096aed52a1cb4c8e.gecko-t-win10-64-gpu.use1.mozilla.com generic-worker: Result: 1 0 The operation completed successfully.
Feb 26 15:35:29 i-0096aed52a1cb4c8e.gecko-t-win10-64-gpu.use1.mozilla.com generic-worker: Loading generic-worker config file 'C:\generic-worker\generic-worker.config'...
Feb 26 15:35:29 i-0096aed52a1cb4c8e.gecko-t-win10-64-gpu.use1.mozilla.com generic-worker: Error loading configuration: open C:\generic-worker\generic-worker.config: The system cannot find the file specified.
i thought that gw generates its own configuration files when we run generic-worker.exe run --configure-for-aws, but obviously something isn't right.
Comment 30•6 years ago (Reporter)
Comment 31•6 years ago (Assignee)
Comment 32•6 years ago (Reporter)
On reflection, I don't think that was the issue. Judging from this log line:
Feb 26 14:23:52 i-0096aed52a1cb4c8e.gecko-t-win10-64-gpu.use1.mozilla.com generic-worker-service: C:\generic-worker>.\generic-worker.exe run 1>.\generic-worker.log 2>&1
it looks like C:\generic-worker\run-generic-worker.bat didn't get replaced with the OCC version of run-generic-worker.bat on gecko-t-win10-64-gpu, since the log line does not include --configure-for-aws.
The PR from comment 30 does no harm. Strictly speaking it isn't needed, but it is still desirable because it causes C:\generic-worker\run-generic-worker.bat to be generated correctly. In OCC, we replace C:\generic-worker\run-generic-worker.bat in GenericWorkerStateWait with this file, so applying the PR now will not really impact the final machine images. However, it is a little more explicit and informative to include --configure-for-aws at installation time. And if at some point we chose not to patch the generated C:\generic-worker\run-generic-worker.bat with the OCC version, having the PR already landed would mean that the script runs generic-worker with valid parameters. So it feels cleaner to include this change set now, for future-proofing.
In conclusion, it appears the failure was that GenericWorkerStateWait didn't run or failed for some reason on gecko-t-win10-64-gpu.
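The diagnosis above hinges on whether the logged command line contains --configure-for-aws. A minimal Go sketch of that check (hasConfigureForAWS is a hypothetical helper, not part of generic-worker or OCC); it compares whole whitespace-separated arguments, so a flag that merely shares a prefix is not counted:

```go
package main

import (
	"fmt"
	"strings"
)

// hasConfigureForAWS (hypothetical helper) reports whether a logged command line
// passes --configure-for-aws as a separate argument.
func hasConfigureForAWS(logLine string) bool {
	for _, field := range strings.Fields(logLine) {
		if field == "--configure-for-aws" {
			return true
		}
	}
	return false
}

func main() {
	// The papertrail line quoted in this comment (hostname prefix trimmed).
	line := `generic-worker-service: C:\generic-worker>.\generic-worker.exe run 1>.\generic-worker.log 2>&1`
	fmt.Println(hasConfigureForAWS(line)) // false, consistent with the diagnosis
}
```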
Comment 33•6 years ago (Reporter)
(In reply to Rob Thijssen [:grenade (EET)] from comment #31)
Comment on attachment 9046706
Github Pull Request for OpenCloudConfig (added --configure-for-aws)
Review of attachment 9046706:
thanks, i'm retesting on beta presently.
Just a heads up that the first push accidentally included gecko-t-win7-32-hw (which doesn't run in AWS), so I removed the changes to it and force-pushed.
Comment 34•6 years ago (Assignee)
windows 7 rollout is complete. amis are live and running gw 13 now
Comment 35•6 years ago
Any luck with the builds? I don't see the new ed25519 cot artifacts on autoland, which I'm guessing means they're running older AMIs.
Comment 36•6 years ago (Assignee)
(In reply to Aki Sasaki [:aki] from comment #35)
Any luck with the builds? I don't see the new ed25519 cot artifacts on autoland, which I'm guessing means they're running older AMIs.
windows 10 is in progress now. 2012 will follow...
Comment 37•6 years ago (Assignee)
windows 10 rollout is complete. amis are live and running gw 13 now
Comment 38•6 years ago (Assignee)
Comment 39•6 years ago (Assignee)
gecko-1-b-win2012 & gecko-2-b-win2012 rollout is complete. amis are live and running gw 13 now
gecko-3-b-win2012, gecko-3-b-win2012-c4 & gecko-3-b-win2012-c5 (and non-prod stragglers) updates will proceed tomorrow morning (08:00 GMT)
Comment 40•6 years ago (Assignee)
gecko-3-b-win2012, gecko-3-b-win2012-c4 & gecko-3-b-win2012-c5 rollout is complete. amis are live and running gw 13 now
Comment 41•6 years ago (Reporter)
We're getting close:
aws-provisioner-v1/win2012r2-cu: generic-worker 13.0.2
aws-provisioner-v1/gecko-2-b-win2012: generic-worker 13.0.2
aws-provisioner-v1/servo-win2016-staging: generic-worker 13.0.2
aws-provisioner-v1/gecko-t-win10-64-cu: generic-worker 13.0.2
aws-provisioner-v1/nss-win2012r2: generic-worker 13.0.2
aws-provisioner-v1/deepspeech-win: generic-worker 13.0.3
aws-provisioner-v1/gecko-t-win10-64: generic-worker 13.0.2
aws-provisioner-v1/gecko-t-win7-32-gpu-b: generic-worker 13.0.2
aws-provisioner-v1/gecko-t-win7-32-beta: generic-worker 13.0.2
aws-provisioner-v1/gecko-3-b-win2012-c5: generic-worker 13.0.2
aws-provisioner-v1/gecko-t-win7-32: generic-worker 13.0.2
aws-provisioner-v1/gecko-t-win10-64-beta: generic-worker 13.0.2
aws-provisioner-v1/win2012r2: generic-worker 13.0.2
aws-provisioner-v1/gecko-3-b-win2012-c4: generic-worker 13.0.2
aws-provisioner-v1/gecko-t-win10-64-gpu-a: generic-worker 11.1.0
aws-provisioner-v1/gecko-3-b-win2012: generic-worker 13.0.2
aws-provisioner-v1/gecko-t-win7-32-gpu: generic-worker 13.0.2
aws-provisioner-v1/gecko-t-win7-32-cu: generic-worker 13.0.2
aws-provisioner-v1/gecko-1-b-win2012: generic-worker 13.0.2
aws-provisioner-v1/gecko-t-win10-64-gpu: generic-worker 13.0.2
aws-provisioner-v1/gecko-t-win10-64-gpu-b: generic-worker 13.0.2
aws-provisioner-v1/gecko-1-b-win2012-beta: generic-worker 13.0.2
aws-provisioner-v1/servo-win2016: generic-worker 13.0.2
aws-provisioner-v1/nss-win2012r2-new: generic-worker 13.0.2
aws-provisioner-v1/gecko-t-win10-64-alpha: generic-worker 11.1.0
So I think the only ones that still need upgrading are:
aws-provisioner-v1/gecko-t-win10-64-gpu-a: generic-worker 11.1.0
aws-provisioner-v1/gecko-t-win10-64-alpha: generic-worker 11.1.0
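The filtering in this comment (keep the worker types still below 13.0.2) can be reproduced mechanically. A minimal Go sketch, assuming the "aws-provisioner-v1/<name>: generic-worker X.Y.Z" line format shown above and plain numeric X.Y.Z versions; parseVersion, less and needsUpgrade are hypothetical helpers.

```go
// Hypothetical helpers for the "which worker types are still below 13.0.2?" check.
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// parseVersion turns "13.0.2" into [13 0 2]; it only accepts plain numeric X.Y.Z.
func parseVersion(v string) ([3]int, error) {
	var out [3]int
	parts := strings.Split(v, ".")
	if len(parts) != 3 {
		return out, fmt.Errorf("unexpected version %q", v)
	}
	for i, p := range parts {
		n, err := strconv.Atoi(p)
		if err != nil {
			return out, err
		}
		out[i] = n
	}
	return out, nil
}

// less reports whether version a sorts before version b (major, then minor, then patch).
func less(a, b [3]int) bool {
	for i := range a {
		if a[i] != b[i] {
			return a[i] < b[i]
		}
	}
	return false
}

// needsUpgrade returns the worker types whose generic-worker version is below minVersion.
func needsUpgrade(lines []string, minVersion string) ([]string, error) {
	minV, err := parseVersion(minVersion)
	if err != nil {
		return nil, err
	}
	var stale []string
	for _, line := range lines {
		parts := strings.SplitN(line, ": generic-worker ", 2)
		if len(parts) != 2 {
			continue // not in the expected format; ignore
		}
		v, err := parseVersion(strings.TrimSpace(parts[1]))
		if err != nil {
			return nil, err
		}
		if less(v, minV) {
			stale = append(stale, parts[0])
		}
	}
	return stale, nil
}

func main() {
	// A subset of the list above.
	lines := []string{
		"aws-provisioner-v1/deepspeech-win: generic-worker 13.0.3",
		"aws-provisioner-v1/gecko-t-win10-64-gpu-a: generic-worker 11.1.0",
		"aws-provisioner-v1/gecko-3-b-win2012: generic-worker 13.0.2",
		"aws-provisioner-v1/gecko-t-win10-64-alpha: generic-worker 11.1.0",
	}
	stale, err := needsUpgrade(lines, "13.0.2")
	if err != nil {
		panic(err)
	}
	// Prints the two 11.1.0 worker types, one per line.
	fmt.Println(strings.Join(stale, "\n"))
}
```

Comparing parsed numbers rather than strings matters here: a plain string comparison would sort "11.1.0" and "13.0.2" correctly by luck, but would misorder versions such as "13.0.10" and "13.0.4".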
Comment 42•6 years ago (Reporter)
Hey Rob,
Are you ok to upgrade these last two?
aws-provisioner-v1/gecko-t-win10-64-gpu-a: generic-worker 11.1.0
aws-provisioner-v1/gecko-t-win10-64-alpha: generic-worker 11.1.0
Also, any idea what might be wrong here?
Thanks!
Comment 43•6 years ago (Reporter)
(In reply to Pete Moore [:pmoore][:pete] from comment #42)
Hey Rob,
Are you ok to upgrade these last two?
aws-provisioner-v1/gecko-t-win10-64-gpu-a: generic-worker 11.1.0
aws-provisioner-v1/gecko-t-win10-64-alpha: generic-worker 11.1.0
I'm deploying these in https://github.com/mozilla-releng/OpenCloudConfig/commit/5ce58ec41b0923af79c3e8a005a4d908dba040f8
Also, any idea what might be wrong here?
I created bug 1533402 for this. It turned out to be an issue in the archiver library:
Comment 44•6 years ago (Reporter)
Deploying generic-worker 13.0.4 to gecko-*-win*-{b,beta,cu} worker types in https://tools.taskcluster.net/groups/ZYTaoWLCSQu8qg4BIJzZ3g
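For reference, the gecko-*-win*-{b,beta,cu} pattern uses shell-style brace expansion, which Go's path.Match does not support. The sketch below expands the braces by hand into three patterns and checks some worker-type names from comment 41 against them. matchesTarget is a hypothetical helper, and how the actual deployment tooling interprets the pattern is an assumption.

```go
package main

import (
	"fmt"
	"path"
)

// targetPatterns is gecko-*-win*-{b,beta,cu} with the braces expanded by hand,
// because path.Match understands *, ? and [...] but not {a,b}.
var targetPatterns = []string{
	"gecko-*-win*-b",
	"gecko-*-win*-beta",
	"gecko-*-win*-cu",
}

// matchesTarget (hypothetical helper) reports whether a worker type matches any expanded pattern.
func matchesTarget(workerType string) bool {
	for _, p := range targetPatterns {
		// The patterns above are well-formed, so the error from path.Match can be ignored.
		if ok, _ := path.Match(p, workerType); ok {
			return true
		}
	}
	return false
}

func main() {
	for _, wt := range []string{
		"gecko-t-win7-32-cu",
		"gecko-1-b-win2012-beta",
		"gecko-t-win10-64-gpu-b",
		"gecko-3-b-win2012",
		"gecko-t-win10-64",
	} {
		fmt.Println(wt, matchesTarget(wt))
	}
}
```

Note that the production build worker types such as gecko-3-b-win2012 do not match, since every expanded pattern requires a -b, -beta or -cu suffix.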
Comment 45•6 years ago (Reporter)
(In reply to Pete Moore [:pmoore][:pete] from comment #44)
Deploying generic-worker 13.0.4 to gecko-*-win*-{b,beta,cu} worker types in https://tools.taskcluster.net/groups/ZYTaoWLCSQu8qg4BIJzZ3g
Due to a bug, that deployment didn't go so well (it deployed generic-worker 13.0.2 instead of 13.0.4), so I've triggered another deployment in https://tools.taskcluster.net/groups/KTuN0josS6SDidd9QQYLvQ
Comment 46•6 years ago (Reporter)
Try push with generic-worker 13.0.4:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0d5a0411f4070cb119ce8c4b48fb714e972844d4
Comment 47•6 years ago (Reporter)
(In reply to Pete Moore [:pmoore][:pete] from comment #46)
Try push with generic-worker 13.0.4:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0d5a0411f4070cb119ce8c4b48fb714e972844d4
Hey Rob,
This try push with 13.0.4 looks fine to me; do you see any reason for us not to update?
Note that the primary reason for the upgrade was to fix mounting archives that contain hard links on gecko-t-win7-32-cu (used by generic-worker CI), so we don't really gain anything by upgrading the gecko worker types, other than keeping them up to date.
So I'll let you make the call on whether it is worth it.
Thanks!
Comment 49•6 years ago (Reporter)
Comment 50•6 years ago (Reporter)
I'm going to mark this as resolved and track the 13.0.2 -> 13.0.4 upgrade in a separate bug, as all the worker types are now on 13.0.2 or higher.