[worker-manager] update static provider to require workers to authenticate themselves
Categories
(Taskcluster :: Services, task)
Tracking
(Not tracked)
People
(Reporter: dustin, Assigned: dustin)
References
Details
Per some discussions with relops, the idea is to pre-configure static workers with a shared secret that those workers can use to authenticate themselves to worker-manager and get taskcluster credentials.
We'll need an API to add and delete workers, with the ability for a provider to "reject" a worker (e.g., the google and ec2 providers won't allow adding workers). Then users can add workers in worker pools managed by a static provisioner.
The static provider will then need a registerWorker implementation that verifies the shared secret.
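A minimal sketch of the "reject" idea, assuming a hypothetical Provider base class whose add/remove operations refuse by default. Cloud providers such as google keep that default, and the static provider overrides it to accept workers and remember their secrets. The class and method names are illustrative only, not worker-manager's actual code.

```python
from typing import Dict, Tuple


class WorkerRejected(Exception):
    """Raised when a provider does not allow workers to be managed through the API."""


class Provider:
    """Hypothetical base class: by default a provider rejects API-created workers."""

    def create_worker(self, pool_id: str, worker_id: str, secret: str) -> None:
        raise WorkerRejected(f"{type(self).__name__} does not allow adding workers")

    def remove_worker(self, pool_id: str, worker_id: str) -> None:
        raise WorkerRejected(f"{type(self).__name__} does not allow removing workers")


class GoogleProvider(Provider):
    """Cloud providers launch their own workers, so they keep the rejecting defaults."""


class StaticProvider(Provider):
    """Accepts user-created workers and remembers each worker's shared secret."""

    def __init__(self) -> None:
        self.workers: Dict[Tuple[str, str], str] = {}

    def create_worker(self, pool_id: str, worker_id: str, secret: str) -> None:
        self.workers[(pool_id, worker_id)] = secret

    def remove_worker(self, pool_id: str, worker_id: str) -> None:
        # Removing an unknown worker is treated as a no-op in this sketch.
        self.workers.pop((pool_id, worker_id), None)
```

Keeping the rejection in the base class means a new cloud provider is safe by default: it has to opt in explicitly to user-managed workers.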
Note that those who would like to can still run workers without this sort of pre-configuration, just as they always have done: configure the worker with credentials including queue:claim-task:<workerId> and set it running.
I'll make the credential lifetime a provider configuration parameter. Although Firefox CI uses static workers that restart frequently, and thus will call registerWorker frequently, other use-cases will likely have long-running workers. Since we're issuing temporary credentials that have a maximum lifetime of 30 days, we have two options:
- require workers to shut down and re-register before their credentials expire (worker-runner could do this pretty easily); or
- issue permacreds for workers
We can solve that in a followup.
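A minimal sketch of the first option, assuming the provider configuration holds the lifetime as a duration that is clamped to the 30-day maximum, and that the worker re-registers once expiry is within a fixed margin. The names credential_expiry and should_reregister, and the one-hour margin, are hypothetical; they are not actual worker-manager or worker-runner settings.

```python
from datetime import datetime, timedelta

# Maximum lifetime of temporary Taskcluster credentials, per the description above.
MAX_CREDENTIAL_LIFETIME = timedelta(days=30)


def credential_expiry(now: datetime, configured_lifetime: timedelta) -> datetime:
    """Hypothetical provider-side helper: clamp the configured lifetime to the maximum."""
    return now + min(configured_lifetime, MAX_CREDENTIAL_LIFETIME)


def should_reregister(
    now: datetime, expires: datetime, margin: timedelta = timedelta(hours=1)
) -> bool:
    """Hypothetical worker-side check: shut down and re-register once expiry is near."""
    return expires - now <= margin
```

For example, a 90-day configured lifetime would be clamped to 30 days, and a worker checking 30 minutes before expiry would decide to re-register.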
Comment 1 • 5 years ago (Assignee)
AJ, this is solving the problem of identifying a hardware worker to the Taskcluster services such that it can start claiming work. Messing this up would potentially mean that anyone can get credentials to claim and execute tasks.
I'll add to the description above that I'd like to validate the shared secret in such a way that it is not revealed. That will probably be by sending
{"salt": "iethu6mishaeSho2Thai", "hash": "aegh9loo1Nongier9ko1xaj3ok6na1Ahvahb7iez"}
where the hash is HMAC(<secret>, <salt>). The secret will be user-specified.
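For concreteness, a sketch of computing and checking such a proof in Python. The hash function and encoding aren't specified above, so SHA-256 and hex output are assumptions, as are the helper names make_identity_proof and verify_identity_proof. The secret is the HMAC key and the salt is the message, matching HMAC(<secret>, <salt>).

```python
import hashlib
import hmac
import secrets


def make_identity_proof(secret: str) -> dict:
    """Worker side: pick a random salt and send HMAC(secret, salt) alongside it."""
    salt = secrets.token_hex(10)  # 20 hex characters, like the example salt's length
    digest = hmac.new(secret.encode(), salt.encode(), hashlib.sha256).hexdigest()
    return {"salt": salt, "hash": digest}


def verify_identity_proof(secret: str, proof: dict) -> bool:
    """Server side: recompute the HMAC for the supplied salt and compare in constant time."""
    expected = hmac.new(secret.encode(), proof["salt"].encode(), hashlib.sha256).hexdigest()
    return hmac.compare_digest(expected, proof["hash"])
```

Because the worker chooses the salt and nothing ties the proof to a particular request, anyone who observes a valid {salt, hash} pair can replay it, so the proof is as reusable as the secret itself. This is consistent with the later decision (comment 4) to treat the secret as a plain bearer token.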
Do you see anything else to worry about here?
Comment 2 • 5 years ago (Assignee)
First half will be https://github.com/taskcluster/taskcluster/pull/998
Comment 3 • 5 years ago
(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #1)
AJ, this is solving the problem of identifying a hardware worker to the Taskcluster services such that it can start claiming work. Messing this up would potentially mean that anyone can get credentials to claim and execute tasks.
I'll add to the description above that I'd like to validate the shared secret in such a way that it is not revealed. That will probably be by sending
{"salt": "iethu6mishaeSho2Thai", "hash": "aegh9loo1Nongier9ko1xaj3ok6na1Ahvahb7iez"}
where the hash is HMAC(<secret>, <salt>). The secret will be user-specified.
Do you see anything else to worry about here?
This all sounds pretty good, I do have a few questions:
- "validate the shared secret in such a way that it is not revealed" - Why? Why not have the workers have something adjacent to an API key that is sent along in the request?
- How will these "shared secrets" be created? Who will be able to create them?
- When you "add a worker", does that create a "shared secret" that is then used by the worker to claim tasks? If not, then what exactly does "adding a worker"/registerWorker do?
Comment 4 • 5 years ago (Assignee)
"validate the shared secret in such a way that it is not revealed" - Why? Why not have the workers have something adjacent to an API key that is sent along in the request?
The particular way I implemented this accomplishes nothing, now that I look at it again. Yeah, let's just use it as a bearer token.
How will these "shared secrets" be created? Who will be able to create them?
They'll be created by the caller of the createWorker API method. The alternative is for createWorker to generate a random secret and return it. The chosen approach is a little more flexible for users, allowing cases where, for example, all workers in a pool have the same secret. Users can make that decision for themselves.
When you "add a worker", does that create a "shared secret" that is then used by the worker to claim tasks? If not, then what exactly does "adding a worker"/registerWorker do?
Yes, basically. There are two steps here:
- user calls createWorker with a secret value, and sets up the worker with that same value
- worker calls registerWorker on startup, using that secret value as an "identity proof", and gets Taskcluster credentials (which include scopes to claim tasks) in response
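The two steps above can be modelled in a few lines. This is a minimal sketch, not worker-manager's implementation: StaticWorkerRegistry, create_worker and register_worker are hypothetical stand-ins for the createWorker and registerWorker API methods. The secret is treated as a plain bearer token, as agreed earlier in this comment, and the returned scope list is a placeholder for real temporary Taskcluster credentials.

```python
import hmac
from typing import Dict, Tuple


class RegistrationError(Exception):
    """Raised when a worker cannot prove its identity."""


class StaticWorkerRegistry:
    """Hypothetical in-memory stand-in for the static provider's worker records."""

    def __init__(self) -> None:
        self._secrets: Dict[Tuple[str, str], str] = {}

    def create_worker(self, pool_id: str, worker_id: str, secret: str) -> None:
        # Step 1: the user records the secret; the same value is configured on the worker.
        self._secrets[(pool_id, worker_id)] = secret

    def register_worker(self, pool_id: str, worker_id: str, identity_proof: str) -> dict:
        # Step 2: on startup the worker presents the secret as its identity proof.
        expected = self._secrets.get((pool_id, worker_id))
        # compare_digest avoids short-circuiting on the first mismatched byte.
        if expected is None or not hmac.compare_digest(
            expected.encode(), identity_proof.encode()
        ):
            raise RegistrationError("unknown worker or bad identity proof")
        # Placeholder: real responses would carry temporary Taskcluster credentials.
        return {"scopes": [f"queue:claim-task:{worker_id}"]}
```

Rejecting unknown workers and wrong secrets with the same error avoids telling a caller which worker IDs exist.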
Comment 5 • 5 years ago
How will these "shared secrets" be created? Who will be able to create them?
They'll be created by the caller of the createWorker API method. The alternative is for createWorker to generate a random secret and return it. The chosen approach is a little more flexible for users, allowing cases where, for example, all workers in a pool have the same secret. Users can make that decision for themselves.
This is interesting. So the user supplies their "shared secret"? Are there requirements on the format and entropy of the secret? How is accidental re-use prevented?
Comment 6 • 5 years ago (Assignee)
There aren't any such requirements -- those are all security requirements that the user/deployer would enforce, and not a threat to the TC platform itself. Re-use might be beneficial in some cases, e.g., a collection of identical VMs from the same template, where the distinction between the workers is not important.
That said, I can implement what you're suggesting. I'd like to get some feedback from the relops folks as to whether that makes their work more difficult first.
Comment 7 • 5 years ago
There aren't any such requirements -- those are all security requirements that the user/deployer would enforce, and not a threat to the TC platform itself. Re-use might be beneficial in some cases, e.g., a collection of identical VMs from the same template, where the distinction between the workers is not important.
I hear that. If this route is taken, I'd suggest some sort of format/entropy requirements to prevent bruteforcing.
That said, I can implement what you're suggesting. I'd like to get some feedback from the relops folks as to whether that makes their work more difficult first.
OK, sounds good.
Comment 8 • 5 years ago (Assignee)
I hear that. If this route is taken, I'd suggest some sort of format/entropy requirements to prevent bruteforcing.
Hm, I think this could work. For client accessTokens, we use two slugids back to back, so 44 characters from the URL-safe base64 charset. We could do the same here and include in the docs that we recommend using slugid() + slugid() and that each worker have a unique secret. That doesn't prohibit aaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaaa but does prohibit TODO and foobar.
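A rough sketch of the proposed format rule in Python. The helpers approx_slugid, recommended_secret and secret_format_ok are hypothetical. approx_slugid only approximates the real slugid library (16 random bytes, URL-safe base64, padding stripped, giving 22 characters) and, unlike slugid, does not set UUID v4 version bits, so two of them carry 256 random bits in this sketch.

```python
import base64
import re
import secrets

# 44 characters from the URL-safe base64 alphabet, i.e. two 22-character slugids.
SECRET_PATTERN = re.compile(r"[A-Za-z0-9_-]{44}")


def approx_slugid() -> str:
    """Approximates slugid(): 16 random bytes, URL-safe base64 without '=' padding."""
    return base64.urlsafe_b64encode(secrets.token_bytes(16)).decode("ascii").rstrip("=")


def recommended_secret() -> str:
    """The recommended shape: slugid() + slugid(), unique per worker."""
    return approx_slugid() + approx_slugid()


def secret_format_ok(secret: str) -> bool:
    """Format check only: length and charset, not randomness or uniqueness."""
    return SECRET_PATTERN.fullmatch(secret) is not None
```

As the comment says, a check like this accepts 44 copies of "a"; it only rules out short placeholders such as TODO and foobar, so the recommendation to use random, per-worker secrets still has to live in the docs.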
Comment 9 • 5 years ago (Assignee)