Configure docker-worker to work with worker-manager in GCP
Categories
(Taskcluster :: Workers, task)
Tracking
(Not tracked)
People
(Reporter: dustin, Assigned: dustin)
References
Details
Assignee
Comment 1•5 years ago
I got started with a thing at https://github.com/djmitche/taskcluster-worker-runner
Assignee
Comment 2•5 years ago
https://github.com/taskcluster/docker-worker/pull/464
The master branch of tc-worker-runner now has all the pieces needed to run docker-worker under aws-provisioner. The next step will be to try that out for real, using ami-test or some such workerType.
The only hitch in the plan here is that docker-worker expects a good bit of information in its config which is not always available, such as region or instance type or public IP. In general that data seems to be used for things like logging identifiers and other debugging-related stuff. I've abstracted that as "provisionerMetadata" here, and encouraged treating it as a soft requirement. If there are "harder" requirements that are generally applicable, we could certainly add those to tc-worker-runner.
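The "soft requirement" idea could be sketched like this (a hypothetical JavaScript sketch; the field names region, instanceType, and publicIp are illustrative, not the actual worker-runner schema):

```javascript
// Hypothetical sketch of "provisionerMetadata" as a soft requirement.
// Field names are illustrative; the real schema may differ.
function resolveProvisionerMetadata(config) {
  const meta = (config && config.provisionerMetadata) || {};
  return {
    // Fall back to placeholders so logging identifiers degrade gracefully
    // when a provider cannot supply a field.
    region: meta.region || 'unknown',
    instanceType: meta.instanceType || 'unknown',
    publicIp: meta.publicIp || null,
  };
}
```

A worker could then build its logging identifiers from these fields without treating a missing value as fatal.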
Assignee
Comment 3•5 years ago
needinfo: requesting review of what's in the tc-worker-runner repo right now.
Assignee
Comment 4•5 years ago
Content of my TODO.txt right now, for reference:
* basic CI
* linting
* generate README from --help, check in CI
* add docs to tc repo
* factor out common code in providerconfig.go, workerimplconfig.go
* support caching configuration over restarts
* support setting permissions for files
* support unpacking files in secrets (?? or just make workers take it as config)
* support starting workers as another user
* support preventing access to metadata via firewall
* stay running, reboot or halt when worker exits (based partially on exit code)
* manage autologin
* support polling for expired deployments (send a signal to the worker?)
* support termination notification
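The "reboot or halt when worker exits (based partially on exit code)" item could look something like this (a hypothetical sketch; the specific exit-code convention is invented for illustration and is not the real worker-runner protocol):

```javascript
// Hypothetical sketch: decide what to do after the worker process exits.
// The specific exit codes are invented for illustration.
function actionForWorkerExit(exitCode) {
  if (exitCode === 0) {
    return 'halt';    // clean exit: shut the instance down
  }
  if (exitCode === 42) {
    return 'reboot';  // worker requested a reboot
  }
  return 'restart';   // anything else: restart the worker and stay running
}
```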
Comment 5•5 years ago
The interface looks great. Initial thought:
"This looks like it does more than run workers, it manages workers... Let's call it worker-manager instead."
Then I realized what I had done.
Anything I can do to help out with this?
Assignee
Comment 6•5 years ago
Next target per bstack, once I get aws-provisioner working: docker-worker + gcp provider
Assignee
Comment 7•5 years ago
https://github.com/taskcluster/docker-worker/pull/464
^^ working on ami-test
with taskcluster-worker-runner@0.1.1
Assignee
Comment 8•5 years ago
This bug is now more narrowly targeted at docker-worker / gcp-provider.
Assignee
Updated•5 years ago
Assignee
Comment 9•5 years ago
https://github.com/taskcluster/docker-worker/pull/465 for the taskcluster-worker-runner compatibility in docker-worker.
Assignee
Comment 10•5 years ago
So this seems to be running workers, but they shut down with
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: Uncaught Exception! Attempting to report to Sentry and crash.
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: Error: spawn shutdown ENOENT
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: at Process.ChildProcess._handle.onexit (internal/child_process.js:190:19)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: at onErrorNT (internal/child_process.js:362:16)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: at _combinedTickCallback (internal/process/next_tick.js:139:11)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: at process._tickDomainCallback (internal/process/next_tick.js:219:9)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: reportError - level: fatal, tags: {}
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: { Error: spawn shutdown ENOENT
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: at Process.ChildProcess._handle.onexit (internal/child_process.js:190:19)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: at onErrorNT (internal/child_process.js:362:16)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: at _combinedTickCallback (internal/process/next_tick.js:139:11)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: at process._tickDomainCallback (internal/process/next_tick.js:219:9)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: errno: 'ENOENT',
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: code: 'ENOENT',
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: syscall: 'spawn shutdown',
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: path: 'shutdown',
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: spawnargs: [ '-h', 'now' ] }
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: Succesfully reported error to Sentry.
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: 2019/07/03 22:24:31 exit status 1
I'm not sure why this patch would change that behavior? My only guess is that maybe it's not passing $PATH along to the worker...
Assignee
Comment 11•5 years ago
Assignee
Comment 12•5 years ago
Assignee
Comment 13•5 years ago
Brian's been working on getting this set up in the taskcluster-dev deployment. Once that's up to the point of generating lots of errors, I can hack on those errors.
Assignee
Comment 14•5 years ago
Assignee
Comment 15•5 years ago
..and those have landed now, too.
So the current state is that we can successfully build an image that will run docker-worker. I think we're about ready to run docker-worker tasks in staging, then!