Bug 1170784 (Closed) · Opened 9 years ago · Closed 7 years ago

Support for generic docker-worker proxies with secure secret injection (proposal)

Categories: Taskcluster :: Workers, defect
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED WONTFIX
Reporter: dustin
Assignee: Unassigned
Whiteboard: [docker-worker]

The goal here is to enable creation of things like https://github.com/taskcluster/testdroid-proxy without requiring modification of docker-worker. So it should be easy to create and update proxies in-tree, including experimentation in try:

task.payload.proxies = { "relengapi": "quay.io/djmitche/relengapi-proxy:0.0.2" }

I don't see any reason to limit which proxies a task can use (and such limiting would make it harder for devs to self-service new proxies). Without any secret data, there's nothing a proxy can do that a task couldn't do all by itself. Ideally the proxy images would be built using the in-tree testing/docker stuff, although this does make it tricky to build go apps, for example (to my knowledge, go is one of the few languages not already used in the gecko build process).

The tricky bit is controlling access to the secrets (and infra-specific configuration) that proxies need. For that, I think we can turn to the secrets service proposed in bug 1168534, or whatever that develops into. So we need a way for a task's scopes to grant its *proxies* permission to access certain secrets, without giving the task itself such permissions. The idea is to use some simple transform from the task's scopes to the proxy's scopes; the docker-worker would then give some temporary credentials to the proxy based on the transformed scopes (and limited by the workerType's scopes, so e.g. the try worker wouldn't be able to grant balrog credentials).

The exact form of that scope is not quite determined yet. We need something that will allow us to carve out a "testing" space in the secrets to which anyone can write and grant proxies access, to allow self-serve proxy development. And we need to be able to prevent users from granting scopes for important secrets to untrusted proxies. For example, all users will have a scope allowing them to grant the symbol-upload proxy access to the symbol-upload secret, but we need to ensure that users cannot grant their secret-stealing proxy access to the same secret.
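To make the "limited by the workerType's scopes" part concrete, here is a minimal sketch, assuming taskcluster's usual scope-satisfaction semantics (a granted scope ending in "*" covers any scope with that prefix). The workerType scope names are purely hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// scopeSatisfied reports whether `required` is covered by `granted`, using
// taskcluster's scope semantics: a granted scope ending in "*" satisfies any
// required scope that starts with the part before the "*".
func scopeSatisfied(required string, granted []string) bool {
	for _, g := range granted {
		if g == required {
			return true
		}
		if strings.HasSuffix(g, "*") && strings.HasPrefix(required, strings.TrimSuffix(g, "*")) {
			return true
		}
	}
	return false
}

func main() {
	// Hypothetical workerType scopes: the worker would refuse to hand a proxy
	// any scope that the workerType itself does not hold.
	workerTypeScopes := []string{
		"secrets:get:garbage/*",
		"secrets:get:project/relengapi/*",
	}
	for _, s := range []string{
		"secrets:get:project/relengapi/tooltool", // allowed
		"secrets:get:project/balrog/credentials", // refused, e.g. on a try worker
	} {
		fmt.Printf("%-45s allowed=%v\n", s, scopeSatisfied(s, workerTypeScopes))
	}
}
```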
Blocks: 1168314
Blocks: 1170753
Blocks: 1164615
Basically, a scope of the form:

docker-worker:delegate:<docker-image>:<scope>

would delegate the scope <scope> to any proxy running the image <docker-image>. Then <scope> can grant access to a secret or a service, and <docker-image> is an image we trust to ensure that this access is guarded.

Example: A proxy that allows you to index tasks by hash of artifacts may be trusted to validate the hash of artifacts before inserting the task into the index. So it's safe to give this trusted proxy image a scope that allows it to insert into the index.

From this example, it seems it might be just as useful if docker-worker had a trusted source from which it could get a set of scopes that a trusted docker-image was allowed to use.

Example: docker-worker is told to run <docker-image> as a proxy. Then docker-worker looks up the auth-role "assume:proxy-image:<docker-image>", which returns a list of scopes assigned to the proxy. (auth-roles is a scope-management concept under development in my mind, a mapping from roleId to scopes.)
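A minimal sketch of how docker-worker might extract the scopes delegated to a particular proxy image under the proposed delegate-scope form; the task scopes and image tag in the example are hypothetical:

```go
package main

import (
	"fmt"
	"strings"
)

// delegatedScopes extracts the scopes a task delegates to a particular proxy
// image via scopes of the proposed form
// "docker-worker:delegate:<docker-image>:<scope>".
func delegatedScopes(taskScopes []string, image string) []string {
	prefix := "docker-worker:delegate:" + image + ":"
	var out []string
	for _, s := range taskScopes {
		if strings.HasPrefix(s, prefix) {
			out = append(out, strings.TrimPrefix(s, prefix))
		}
	}
	return out
}

func main() {
	// Hypothetical task scopes; only the first one targets the proxy image.
	taskScopes := []string{
		"docker-worker:delegate:quay.io/djmitche/relengapi-proxy:0.0.2:secrets:get:project/relengapi/token",
		"queue:route:index.gecko.v2.try",
	}
	fmt.Println(delegatedScopes(taskScopes, "quay.io/djmitche/relengapi-proxy:0.0.2"))
}
```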
The first bit sounds good, but how would I self-serve building a new proxy in that case, since I wouldn't have any scope for my docker image? I don't really understand the auth-roles / assume bit, perhaps due to lack of access to your mind :)
As a usability note, it would be *great* to also get stdout from the proxies kept as an artifact (even better if it live-scrolls!). This will require not logging credentials, but that's not a big deal.
Component: TaskCluster → Docker-Worker
Product: Testing → Taskcluster
Whiteboard: [docker-worker]
Component: Docker-Worker → Worker
This sounds like it'd enable some very powerful stuff, but I wonder if it's possible to build a slightly more limited version that was easier to implement but covered most of our use cases. AFAIK most of our use cases are "let this task access this HTTP API without having to know the credentials". I don't know about the other proxies, but AFAICT the relengapi-proxy literally just fetches a temporary token and then sticks that in an HTTP Authentication header for each request and passes it on: https://github.com/taskcluster/relengapi-proxy/blob/4f3e8febb0ff6b24ffe9da0aa47224c981b5ed64/proxy.go#L69

Would it be possible to define something like that in a generic way, like:

```
"proxies": [
  {
    "local_name": "relengapi",
    "target": "https://api.pub.build.mozilla.org/",
    "secret": "something/in/secrets",
    "proxy_auth": {
      "type": "http_bearer_auth"
    }
  }
]
```

With something like that in a task definition, it would proxy http://relengapi/whatever -> https://api.pub.build.mozilla.org/whatever with the authentication header added.

Then instead of writing a custom proxy for crash-stats.mozilla.org for symbol upload, maybe we could define something like:

```
"proxies": [
  {
    "local_name": "crash-stats",
    "target": "https://crash-stats.mozilla.org/",
    "secret": "something/in/secrets",
    "proxy_auth": {
      "type": "http_header",
      "header": "Auth-Token"
    }
  }
]
```

What do you think about that?
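As a rough illustration of the generic proxy described above (not an existing docker-worker feature), a single-host reverse proxy that injects the configured auth header could look like this in Go. It assumes the secret has already been fetched out of band; the config field names simply mirror the illustrative JSON:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
)

// ProxyConfig mirrors the illustrative task-definition snippet above.
type ProxyConfig struct {
	LocalName  string // e.g. "relengapi", exposed to the task as http://relengapi/
	Target     string // e.g. "https://api.pub.build.mozilla.org/"
	AuthHeader string // "Authorization" for bearer auth, or e.g. "Auth-Token"
	Token      string // secret value, fetched out of band (e.g. from the secrets service)
}

// newAuthProxy builds a reverse proxy that forwards every request to the
// configured target and injects the auth header before passing it on.
func newAuthProxy(cfg ProxyConfig) (*httputil.ReverseProxy, error) {
	target, err := url.Parse(cfg.Target)
	if err != nil {
		return nil, err
	}
	p := httputil.NewSingleHostReverseProxy(target)
	orig := p.Director
	p.Director = func(r *http.Request) {
		orig(r)
		r.Host = target.Host
		if cfg.AuthHeader == "Authorization" {
			r.Header.Set("Authorization", "Bearer "+cfg.Token)
		} else {
			r.Header.Set(cfg.AuthHeader, cfg.Token)
		}
	}
	return p, nil
}

func main() {
	// Hypothetical values; in docker-worker the token would come from the
	// secrets service, not a hard-coded string.
	proxy, err := newAuthProxy(ProxyConfig{
		LocalName:  "crash-stats",
		Target:     "https://crash-stats.mozilla.org/",
		AuthHeader: "Auth-Token",
		Token:      "<token fetched from secrets>",
	})
	if err != nil {
		log.Fatal(err)
	}
	log.Fatal(http.ListenAndServe(":80", proxy))
}
```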
How are the temporary credentials fetched in that scenario?

One thing I'm worried about is that we not change how proxies are accessed too many times. Currently we have http://relengapi and http://taskcluster, but one of the ideas that we've talked about is a general proxy, something like http://task-proxy/api.pub.build.mozilla.org/some/path/on/relengapi, where the first element of the path is used as a hostname and TC temporary credentials are always added. I *think* we can support that across all of our platforms. That would also avoid the need to develop custom proxies -- if necessary, such a thing would sit outside of taskcluster and translate taskcluster credentials into service credentials. For example, if codecov.org had an API to submit coverage results, we could build a codecov-proxy.pub.build.mozilla.org that takes taskcluster credentials on input, generates matching codecov.org API credentials, and forwards the request to codecov.org.

Also, as I understand it, these things are implemented differently in taskcluster-worker's docker-engine, where the proxying takes place in the engine process rather than in a second container (those second containers have proven unreliable). So I think we are not at a great point to start redesigning this feature. Once we have a more solid footing in using taskcluster-worker everywhere, we can draw up a more general, cross-platform solution.

I think this came up in the context of submitting stats directly to crash-stats, right? If we were able to help build a purpose-specific crash-stats-proxy, would that get you past the immediate need? I found the relengapi proxy to be not too difficult, so there's hope this would be simple to implement, followed by deployment in docker-worker.
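A minimal sketch of the http://task-proxy/<host>/<path> idea, with the first path segment used as the upstream hostname. The credential-signing step is only indicated by a comment, since how the temporary credentials are fetched is exactly the open question above:

```go
package main

import (
	"log"
	"net/http"
	"net/http/httputil"
	"strings"
)

// taskProxy implements the "http://task-proxy/<host>/<path>" idea: the first
// path segment names the upstream host, and credentials would be attached to
// every outgoing request. Input validation is elided for brevity.
func taskProxy() http.Handler {
	return &httputil.ReverseProxy{
		Director: func(r *http.Request) {
			parts := strings.SplitN(strings.TrimPrefix(r.URL.Path, "/"), "/", 2)
			r.URL.Scheme = "https"
			r.URL.Host = parts[0] // e.g. "api.pub.build.mozilla.org"
			r.Host = parts[0]
			if len(parts) > 1 {
				r.URL.Path = "/" + parts[1]
			} else {
				r.URL.Path = "/"
			}
			// Here the worker would sign the request with (temporary)
			// taskcluster credentials, e.g. a Hawk Authorization header.
		},
	}
}

func main() {
	log.Fatal(http.ListenAndServe(":8080", taskProxy()))
}
```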
Yeah, the immediate thing I'd like to fix is symbol upload, where we currently have that private docker image with the credentials baked in. The proxy there ought to be trivial: it'd literally just have to fetch the token from secrets and put it in the 'Auth-Token' HTTP header. It would be nicer to not have a one-off proxy for every service like this, but if we're not in a good place to build that, then this would still be an improvement.
I think we've agreed that it's better to stand up a Heroku app that proxies requests, giving an architecture like:

Worker <--> HerokuApp <--> SymbolUpload

In this scenario the Worker would use the taskclusterProxy feature to contact HerokuApp with requests signed with taskcluster credentials. HerokuApp authorizes the request based on the taskcluster scopes associated with it, and forwards it to SymbolUpload using a secret that's burned into the Heroku app. This way the secrets never reach the worker and can't be leaked if an evil task breaks out of the docker container. Also, the HerokuApp can be used from all workers (as well as any service) that support making requests with taskcluster credentials.

---

If you have any concerns about how to do this, please reach out to me. I'm happy to help. We've done proxies like the HerokuApp here for sentry, s3, statsum, and webhooktunnel with success.
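A rough sketch of what the HerokuApp side might look like: offload Hawk validation to the auth service, check for a required scope, and forward to SymbolUpload with the burned-in token. The authenticate-hawk request/response fields and the scope name used here are assumptions and should be checked against the auth service docs:

```go
package main

import (
	"bytes"
	"encoding/json"
	"log"
	"net/http"
	"net/http/httputil"
	"net/url"
	"os"
)

// authResponse is the subset of the auth service's reply we care about.
// (Field names assumed from the v1 authenticate-hawk API; verify before use.)
type authResponse struct {
	Status string   `json:"status"`
	Scopes []string `json:"scopes"`
}

// authorize offloads Hawk validation to taskcluster-auth and checks that the
// caller holds the scope this proxy requires. Wildcard/role expansion of the
// required scope is omitted for brevity.
func authorize(r *http.Request, requiredScope string) bool {
	payload, _ := json.Marshal(map[string]interface{}{
		"method":        r.Method,
		"resource":      r.URL.RequestURI(),
		"host":          r.Host,
		"port":          443,
		"authorization": r.Header.Get("Authorization"),
	})
	resp, err := http.Post("https://auth.taskcluster.net/v1/authenticate-hawk",
		"application/json", bytes.NewReader(payload))
	if err != nil {
		return false
	}
	defer resp.Body.Close()
	var ar authResponse
	if json.NewDecoder(resp.Body).Decode(&ar) != nil || ar.Status != "auth-success" {
		return false
	}
	for _, s := range ar.Scopes {
		if s == requiredScope {
			return true
		}
	}
	return false
}

func main() {
	target, _ := url.Parse("https://crash-stats.mozilla.org/")
	upstream := httputil.NewSingleHostReverseProxy(target)
	token := os.Getenv("SYMBOL_UPLOAD_TOKEN") // the secret "burned into" the app

	http.Handle("/", http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if !authorize(r, "project:symbol-upload") { // hypothetical scope name
			http.Error(w, "missing scope", http.StatusForbidden)
			return
		}
		r.Header.Set("Auth-Token", token)
		r.Host = target.Host
		upstream.ServeHTTP(w, r)
	}))
	log.Fatal(http.ListenAndServe(":"+os.Getenv("PORT"), nil))
}
```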
Status: NEW → RESOLVED
Closed: 7 years ago
QA Contact: pmoore
Resolution: --- → WONTFIX
Obviously, support for taskcluster authentication can also be added directly in a service like SymbolUpload. Even if it's written in another language, it's pretty easy to do; I'd be happy to help if anyone wants to play with this. (Since hawk and scope resolution are off-loaded to taskcluster-auth, all of this is actually pretty easy to do.)
For symbol upload we went a different direction in bug 1422740: we just put auth tokens in taskcluster secrets and have the upload task fetch them from there. Maybe it would be nicer for the task to not actually get access to the secrets, but it's not hugely important to me and this is better than the previous situation we had with a private Docker image, so I'm unlikely to worry about further improvements. :)
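For reference, fetching such a token from the secrets service inside the task is straightforward once docker-worker's taskclusterProxy feature is enabled and the task has the matching secrets:get:<name> scope. The secret path and payload layout below are hypothetical:

```go
package main

import (
	"encoding/json"
	"fmt"
	"log"
	"net/http"
)

func main() {
	// With the taskclusterProxy feature enabled, the task can reach the
	// secrets service at http://taskcluster/ using the task's own scopes.
	// The secret name here is hypothetical.
	resp, err := http.Get("http://taskcluster/secrets/v1/secret/project/releng/symbol-upload-token")
	if err != nil {
		log.Fatal(err)
	}
	defer resp.Body.Close()

	// The secrets service wraps the stored value in a "secret" property;
	// a flat string-to-string payload is assumed for this example.
	var body struct {
		Secret map[string]string `json:"secret"`
	}
	if err := json.NewDecoder(resp.Body).Decode(&body); err != nil {
		log.Fatal(err)
	}
	// The token would then go into the upload request's Auth-Token header.
	fmt.Println("token length:", len(body.Secret["token"]))
}
```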
Component: Worker → Workers