Migrate deepspeech to community taskcluster deployment
Categories
(Taskcluster :: Operations and Service Requests, task)
Tracking
(Not tracked)
People
(Reporter: dustin, Assigned: miles)
Make a plan to move this project to the new community deployment.
Notes:
need to upgrade docker-worker workers quite some distance
generic-worker is at 14.x, can try an upgrade
Comment 1 (Reporter) • 5 years ago
Pete, FYI regarding the worker updates here. I think Alex was getting started on that already.
Comment 2 • 5 years ago
Yeah, I started some work, but I'm busy with other things. I'll focus on that at the start of next month :)
Comment 3 • 5 years ago
So, I've updated:
- all RPi3, LePotato boards to https://github.com/lissyx/docker-worker/commit/23b2d5dff7ae3c6ac22a9c31e7357fc3c7d2de19
- all macOS (heavy & light) workers to generic-worker simple v15.1.5
And deepspeech-win/-b for Windows runs generic-worker v15.1.0.
Comment 4 (Reporter) • 5 years ago
I'll work on updating tc.yml to v1. Notably, it uses env vars:
https://github.com/lissyx/taskcluster-github-decision/blob/master/tc-decision.py#L62-L80
Comment 5 (Reporter) • 5 years ago
Summarizing a conversation with Alexandre in IRC (and notes from above):
- Updating to v1 tc.yml might not be necessary, and seems hard due to the use of env vars. We can still specify a different provisionerId/workerType using v0.
- We will manage the project with https://github.com/mozilla/community-tc-config/, so we should figure out:
  - What cloud-based worker pools are required (we have docker-worker and win2012r2 right now)
  - Clients that are required (one for each worker)
  - Scopes and roles that are required, where that's not clear from the existing roles:
  - Any hooks, or other things I haven't thought of
Comment 6 • 5 years ago
(In reply to Dustin J. Mitchell [:dustin] (he/him) from comment #5)
Summarizing a conversation with Alexandre in IRC (and notes from above):
- Updating to v1 tc.yml might not be necessary, and seems hard due to use of env vars. We can still specify different provisionerId/workerType using v0.
- We will manage the project with https://github.com/mozilla/community-tc-config/, so we should figure out
- What cloud-based worker pools are required (we have docker-worker and win2012r2 right now)
We use:
taskcluster:
  schedulerId: taskcluster-github
  docker:
    provisionerId: aws-provisioner-v1
    workerType: deepspeech-worker
    workerTypeKvm: deepspeech-kvm-worker
    workerTypeWin: deepspeech-win-b
  dockerrpi3:
    provisionerId: deepspeech-provisioner
    workerType: ds-rpi3
  dockerarm64:
    provisionerId: deepspeech-provisioner
    workerType: ds-lepotato
  generic:
    provisionerId: deepspeech-provisioner
    workerType: ds-macos-light
  script:
    provisionerId: deepspeech-provisioner
    workerType: ds-scriptworker
- Clients that are required (one for each worker)
Not sure I get this one; we only have clientIds for the macOS, RPi3 and LePotato workers.
- Scopes and roles that are required, where that's not clear from the existing roles:
Not sure I get this one either.
- Any hooks, or other things I haven't thought of
He he, no idea :)
Comment 7 (Assignee) • 5 years ago
For my notes:
workerType: deepspeech-worker
This can be a standard docker-worker pool in GCP.
workerTypeKvm: deepspeech-kvm-worker
This will need to be docker-worker on metal instances in AWS.
workerTypeWin: deepspeech-win-b
This can be a generic-worker windows pool in AWS/GCP.
Essentially, we have to make sure that the same worker images are available in new and separate AWS/GCP accounts, and that corresponding worker-pools exist in the community cluster for this project. That's why we're asking about cloud workers.
dockerrpi3:
  provisionerId: deepspeech-provisioner
  workerType: ds-rpi3
dockerarm64:
  provisionerId: deepspeech-provisioner
  workerType: ds-lepotato
generic:
  provisionerId: deepspeech-provisioner
  workerType: ds-macos-light
script:
  provisionerId: deepspeech-provisioner
  workerType: ds-scriptworker
Given that you have your own provisioner here and your comment above, we'll need to provide you with credentials / clientIds for these workers, and you'll need to update TASKCLUSTER_ROOT_URL and other configurations.
- Scopes and roles that are required, where that's not clear from the existing roles:
This is referring to extra taskcluster scopes that your project will need, i.e. beyond GitHub. The workers you have will need to be able to claim tasks from the queue, for example.
In addition to the github repo roles it looks like you have a project admin role that we'll replicate as well: https://tools.taskcluster.net/auth/roles/project%3Adeepspeech%3Aadmin
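As a rough illustration of the split described above (cloud pools that need worker images versus self-hosted workers that need clientIds), here is a minimal Python sketch. The `WORKERS` dict simply mirrors the settings quoted from comment 6; the helper name `split_workers` and the rule "anything on `aws-provisioner-v1` is cloud-based" are assumptions made for illustration, not part of any Taskcluster tooling.

```python
# Hypothetical helper: mirrors the worker settings quoted in comment 6.
WORKERS = {
    "docker": {
        "provisionerId": "aws-provisioner-v1",
        "workerType": "deepspeech-worker",
        "workerTypeKvm": "deepspeech-kvm-worker",
        "workerTypeWin": "deepspeech-win-b",
    },
    "dockerrpi3": {"provisionerId": "deepspeech-provisioner", "workerType": "ds-rpi3"},
    "dockerarm64": {"provisionerId": "deepspeech-provisioner", "workerType": "ds-lepotato"},
    "generic": {"provisionerId": "deepspeech-provisioner", "workerType": "ds-macos-light"},
    "script": {"provisionerId": "deepspeech-provisioner", "workerType": "ds-scriptworker"},
}


def split_workers(workers, cloud_provisioner="aws-provisioner-v1"):
    """Return (cloud_worker_types, self_hosted_worker_types), each sorted.

    Assumption: a section is cloud-based exactly when its provisionerId
    equals `cloud_provisioner`; everything else is treated as self-hosted.
    """
    cloud, self_hosted = [], []
    for section in workers.values():
        target = cloud if section["provisionerId"] == cloud_provisioner else self_hosted
        for key, value in section.items():
            # Collect workerType, workerTypeKvm, workerTypeWin, ...
            if key.startswith("workerType"):
                target.append(value)
    return sorted(cloud), sorted(self_hosted)


cloud, self_hosted = split_workers(WORKERS)
print("need cloud worker pools:", cloud)
print("need clientIds/credentials:", self_hosted)
```

The first list corresponds to the pools that must exist in the new AWS/GCP accounts; the second to the workers that need fresh credentials and an updated TASKCLUSTER_ROOT_URL.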
Comment 8 • 5 years ago
Miles, we likely don't need the KVM type anymore. For our own provisioner, I think I can even create clientIds myself. At least I did in the past; does the community deployment change that?
Comment 9 (Assignee) • 5 years ago
The structure of things is changing a bit with the move to the community deployment. The biggest shift is that we're managing taskcluster roles and scopes and other things in a project definition yaml file.
Because of the complexity of this project there are a few things that will need to change:
- I've added your project definition in this PR: https://github.com/mozilla/community-tc-config/pull/51 (already applied, so things are in place)
- The community cluster has a different root URL, https://community-tc.services.mozilla.com/, so references to this will need to change
- The community cluster has different provisioners, and worker types are per project
I've created a bug for these changes in the deepspeech repo: https://github.com/mozilla/DeepSpeech/pull/2485
Here is my PR to community-tc-config that adds the deepspeech project (and corresponding worker-pools): https://github.com/mozilla/community-tc-config/pull/51/. It also creates clients, and users in the GitHub team mozilla/research-machine-learning will be able to reset those clients' accessTokens to set up workers.
There's a bit more to do here, and some verification work to make sure things won't be broken, but the foundation is laid. Outstanding work:
- I need to test the provisionerId/workerType changes myself on my fork in community-tc
- I need to mirror the deepspeech changes to the tensorflow repo
- Once the changes have landed in each repo, a GitHub admin will need to replace the taskcluster integration with the community-tc-integration
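Since the root URL changes, one way to find what needs rewriting in a repo is to scan its config and scripts for the old host. The following is a minimal Python sketch under stated assumptions: the function name `find_legacy_references` and the sample text are made up for illustration, and the check is a plain substring match on `taskcluster.net`, so it will also flag comments or documentation that mention the old deployment.

```python
OLD_HOST = "taskcluster.net"
NEW_ROOT_URL = "https://community-tc.services.mozilla.com/"


def find_legacy_references(text):
    """Return (line_number, stripped_line) pairs that mention the old host."""
    return [
        (number, line.strip())
        for number, line in enumerate(text.splitlines(), start=1)
        if OLD_HOST in line
    ]


# Illustrative sample only; not taken from the DeepSpeech repo.
sample = """\
curl https://queue.taskcluster.net/v1/task/abc
export TASKCLUSTER_ROOT_URL=https://community-tc.services.mozilla.com/
see https://tools.taskcluster.net/auth/roles
"""

for number, line in find_legacy_references(sample):
    print(f"line {number}: {line}  (should point at {NEW_ROOT_URL})")
```

Each reported line still has to be reviewed by hand, because the new deployment does not necessarily expose the same paths under the new root URL.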
Comment 10 • 5 years ago
(In reply to Miles Crabill [:miles] [also mcrabill@mozilla.com] from comment #9)
The structure of things is changing a bit with the move to the community deployment. The biggest shift is that we're managing taskcluster roles and scopes and other things in a project definition yaml file.
Because of the complexity of this project there are a few things that will need to change:
- I've added your project definition in this PR: https://github.com/mozilla/community-tc-config/pull/51 (already applied, so things are in place)
- The community cluster has a different root URL, https://community-tc.services.mozilla.com/, so references to this will need to change
Will this be the TASKCLUSTER_ROOT_URL to use?
- The community cluster has different provisioners, and worker types are per project
Which ones change? Again, it's not obvious to me from the linked PR.
I've created a bug for these changes in the deepspeech repo: https://github.com/mozilla/DeepSpeech/pull/2485
You need to take care of https://github.com/mozilla/tensorflow as well, on branches master and r1.14 (and potentially others).
Here is my PR to community-tc-config that adds the deepspeech project (and corresponding worker-pools): https://github.com/mozilla/community-tc-config/pull/51/. It also creates clients, and users in the github team mozilla/research-machine-learning will be able to reset those clients' accessTokens to set up workers.
There's a bit more to do here, and some verification work to make sure things won't be broken, but the foundation is laid. Outstanding work:
- I need to test the provisionerId/workerType changes myself on my fork in community-tc
- I need to mirror the deepspeech changes to the tensorflow repo
- Once the changes have landed in each repo, a GitHub admin will need to replace the taskcluster integration with the community-tc-integration
Can we get some planning here? We're getting close to a v0.6 release now, and I'd like to know when we have a hard cut date (this was supposed to be September 21st).
Comment 11 (Reporter) • 5 years ago
PSA: The existing (https://taskcluster.net) deployment will be shut down a week from today, on November 9. After that point, any CI not migrated to the new community cluster will stop functioning. The TC team is ready and eager to help get everything migrated by that time, but the deadline is firm.
Apologies for failing to communicate this as broadly and loudly as necessary, and for the bugspam now.
Comment 12 (Reporter) • 5 years ago
Alexandre emailed with a pretty tight timeline, essentially looking to land this on the 5th (Tuesday) at the latest.
I've given the filed PRs a good going-over, and landed the community-tc-config PR. I also filed https://github.com/mozilla/DeepSpeech/pull/2486 to replace https://github.com/mozilla/DeepSpeech/pull/2485. And I filed https://github.com/mozilla/tensorflow/pull/113 as miles suggested in #2485. Thankfully there are no taskcluster.net references in tensorflow to rewrite.
I see that lots of DS configs reference an index path for tensorflow. I hope that putting that path in place is as simple as pushing to the tensorflow repo, and perhaps updating the sha1 in the index path in the DS repo?
I'll check back today (Sunday) during my daylight hours. I'm also available all day Monday to work on this. Pete, if you can help out with any issues Monday before I'm awake, that would be great. Hopefully that's limited to landing and applying community-tc-config patches.
Comment 13 • 5 years ago
Yeah, the TensorFlow repo is less complicated. Honestly, if it's a change in our repo, it's less of an issue. What I would like to avoid is us being blocked on something you need to do.
Comment 14 (Reporter) • 5 years ago
Same -- and in general we've architected this so that there are fewer of those, and especially fewer of them that are "behind the scenes" from your perspective -- at worst, you should need to file a PR to community-tc-config and someone can merge/apply.
That said, SimonSapin has encountered a number of bugs that have been solved most efficiently by someone like me, and solving such bugs is a top priority.
Comment 15 • 5 years ago
So, neither my GitHub account nor :reuben's is working as expected: we are not granted assume:project-admin:deepspeech. In my case, it seems to be because I am not part of mozilla/research-machine-learning, but reuben is.
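For debugging a missing grant like this, it can help to check a list of held scopes against the one that is required. Below is a minimal Python sketch of Taskcluster's scope-satisfaction rule (exact match, or a held scope ending in `*` that prefixes the required scope). The function name `scope_satisfied` and the example scope sets are hypothetical, and the sketch does not model role expansion, i.e. how `assume:` scopes turn into the scopes of the role they name.

```python
def scope_satisfied(held_scopes, required):
    """Return True if any held scope satisfies `required`.

    A held scope satisfies the required one when it is identical, or when
    it ends in '*' and the part before the '*' is a prefix of `required`.
    """
    for scope in held_scopes:
        if scope == required:
            return True
        if scope.endswith("*") and required.startswith(scope[:-1]):
            return True
    return False


required = "assume:project-admin:deepspeech"

# Hypothetical scope sets, for illustration only.
print(scope_satisfied({"assume:project-admin:deepspeech"}, required))  # exact match
print(scope_satisfied({"assume:project-admin:*"}, required))           # wildcard prefix
print(scope_satisfied({"assume:repo:github.com/mozilla/*"}, required)) # unrelated scope
```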
Comment 16 (Reporter) • 5 years ago
^^ addressed temporarily in #66
Comment 17 • 5 years ago
Unblocked: we could manually run dry tasks with true as the payload on all workers. So at least this part is fine.
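For reference, a dry task of this kind could look roughly like the Python sketch below, which only builds a task definition as a dict and does not submit it. Several things here are assumptions: that "true" means the shell command `true`, the placeholder pool `proj-example/ci`, the `ubuntu:18.04` image, and the owner and source values. The payload shown follows the docker-worker shape (`image`, `command`, `maxRunTime`); generic-worker expects a differently shaped `command`, so it would need its own payload.

```python
from datetime import datetime, timedelta, timezone


def smoke_task(provisioner_id, worker_type, payload):
    """Build a minimal task definition that just exercises a worker."""
    now = datetime.now(timezone.utc)

    def stamp(moment):
        # Timestamp in the UTC ISO 8601 form used in task definitions.
        return moment.strftime("%Y-%m-%dT%H:%M:%S.000Z")

    return {
        "provisionerId": provisioner_id,
        "workerType": worker_type,
        "created": stamp(now),
        "deadline": stamp(now + timedelta(hours=1)),
        "payload": payload,
        "metadata": {
            "name": f"smoke test on {provisioner_id}/{worker_type}",
            "description": "Runs `true` to confirm the worker claims and completes tasks",
            "owner": "nobody@example.com",               # placeholder
            "source": "https://example.com/smoke-test",  # placeholder
        },
    }


# docker-worker style payload (assumed image; any image providing /bin/true works).
DOCKER_PAYLOAD = {"image": "ubuntu:18.04", "command": ["/bin/true"], "maxRunTime": 600}

task = smoke_task("proj-example", "ci", DOCKER_PAYLOAD)
print(task["metadata"]["name"])
```

A task that resolves as completed shows that the worker has valid credentials, can reach the queue at the new root URL, and can run a command.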
Comment 18 • 5 years ago
PRs, Push and package upload are all working: https://community-tc.services.mozilla.com/tasks/groups/KCFy3t0rQBGMj2k_Ecb0Vg