Configure docker-worker to work with worker-manager in GCP
Categories
(Taskcluster :: Workers, task)
Tracking
(Not tracked)
People
(Reporter: dustin, Assigned: dustin)
References
Details
Assignee
Comment 1•5 years ago
I got started with a thing at https://github.com/djmitche/taskcluster-worker-runner
Assignee
Comment 2•5 years ago
https://github.com/taskcluster/docker-worker/pull/464
The master branch of tc-worker-runner now has all the pieces needed to run docker-worker under aws-provisioner. The next step will be to try that out for real, using ami-test or some such workerType.
The only hitch in the plan here is that docker-worker expects a good bit of information in its config which is not always available, such as region or instance type or public IP. In general that data seems to be used for things like logging identifiers and other debugging-related stuff. I've abstracted that as "provisionerMetadata" here, and encouraged treating it as a soft requirement. If there are "harder" requirements that are generally applicable, we could certainly add those to tc-worker-runner.
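The "soft requirement" idea could be sketched like this (a hypothetical JavaScript sketch; the field names region, instanceType, and publicIp are illustrative, not the actual worker-runner schema):

```javascript
// Hypothetical sketch of "provisionerMetadata" as a soft requirement.
// Field names are illustrative; the real schema may differ.
function resolveProvisionerMetadata(config) {
  const meta = (config && config.provisionerMetadata) || {};
  return {
    // Fall back to placeholders so logging identifiers degrade gracefully
    // when a provider cannot supply a field.
    region: meta.region || 'unknown',
    instanceType: meta.instanceType || 'unknown',
    publicIp: meta.publicIp || null,
  };
}
```

A worker could then build its logging identifiers from these fields without treating a missing value as fatal.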
Assignee
Comment 3•5 years ago
needinfo: requesting review of what's in the tc-worker-runner repo right now.
Assignee
Comment 4•5 years ago
Content of my TODO.txt right now, for reference:
* basic CI
* linting
* generate README from --help, check in CI
* add docs to tc repo
* factor out common code in providerconfig.go, workerimplconfig.go
* support caching configuration over restarts
* support setting permissions for files
* support unpacking files in secrets (?? or just make workers take it as config)
* support starting workers as another user
* support preventing access to metadata via firewall
* stay running, reboot or halt when worker exits (based partially on exit code)
* manage autologin
* support polling for expired deployments (send a signal to the worker?)
* support termination notification
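The "reboot or halt when worker exits (based partially on exit code)" item could look something like this (a hypothetical sketch; the specific exit-code convention is invented for illustration and is not the real worker-runner protocol):

```javascript
// Hypothetical sketch: decide what to do after the worker process exits.
// The specific exit codes are invented for illustration.
function actionForWorkerExit(exitCode) {
  if (exitCode === 0) {
    return 'halt';    // clean exit: shut the instance down
  }
  if (exitCode === 42) {
    return 'reboot';  // worker requested a reboot
  }
  return 'restart';   // anything else: restart the worker and stay running
}
```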
Comment 5•5 years ago
The interface looks great. Initial thought:
"This looks like it does more than run workers, it manages workers... Let's call it worker-manager instead."
Then I realized what I had done.
Anything I can do to help out with this?
Assignee
Comment 6•5 years ago
Next target per bstack, once I get aws-provisioner working: docker-worker + gcp provider
Assignee
Comment 7•5 years ago
https://github.com/taskcluster/docker-worker/pull/464
^^ working on ami-test
with taskcluster-worker-runner@0.1.1
Assignee
Comment 8•5 years ago
This bug is now more narrowly targeted at docker-worker / gcp-provider.
Assignee
Updated•5 years ago
Assignee
Comment 9•5 years ago
https://github.com/taskcluster/docker-worker/pull/465 for the taskcluster-worker-runner compatibility in docker-worker.
Assignee
Comment 10•5 years ago
So this seems to be running workers, but they shut down with
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: Uncaught Exception! Attempting to report to Sentry and crash.
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: Error: spawn shutdown ENOENT
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: at Process.ChildProcess._handle.onexit (internal/child_process.js:190:19)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: at onErrorNT (internal/child_process.js:362:16)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: at _combinedTickCallback (internal/process/next_tick.js:139:11)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: at process._tickDomainCallback (internal/process/next_tick.js:219:9)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: reportError - level: fatal, tags: {}
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: { Error: spawn shutdown ENOENT
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: at Process.ChildProcess._handle.onexit (internal/child_process.js:190:19)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: at onErrorNT (internal/child_process.js:362:16)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: at _combinedTickCallback (internal/process/next_tick.js:139:11)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: at process._tickDomainCallback (internal/process/next_tick.js:219:9)
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: errno: 'ENOENT',
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: code: 'ENOENT',
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: syscall: 'spawn shutdown',
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: path: 'shutdown',
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: spawnargs: [ '-h', 'now' ] }
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: Succesfully reported error to Sentry.
Jul 03 22:24:31 docker-worker.aws-provisioner.us-west-2c.ami-0b7a5d713c599ffb9.c5d-4xlarge.i-0a595bb6ab3d1d726 docker-worker: 2019/07/03 22:24:31 exit status 1
I'm not sure why this patch would change that behavior? My only guess is that maybe it's not passing $PATH along to the worker...
Assignee
Comment 11•5 years ago
Assignee
Comment 12•5 years ago
Assignee
Comment 13•5 years ago
Brian's been working on getting this set up in the taskcluster-dev deployment. Once that's up to the point of generating lots of errors, I can hack on those errors.
Assignee
Comment 14•5 years ago
Assignee
Comment 15•5 years ago
..and those have landed now, too.
So the current state is that we can successfully build an image that will run docker-worker. I think we're about ready to run docker-worker tasks in staging, then!