Closed Bug 1088350 Opened 10 years ago Closed 9 years ago

taskcluster-cron: Periodic task scheduler

Categories

(Taskcluster :: General, defect)

x86_64
Linux
defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: jonasfj, Unassigned)

References

Details

cron.taskcluster.net should offer a way to schedule tasks periodically.

Let's say one uploads the following:
 - a name
 - an interval
 - a task template

The task template could be parameterized similarly to the task-graph here:
https://gist.github.com/jonasfj/d16a2d6edd6dc75f1599#file-task-graph-yml
jonasfj has libraries/scripts for this somewhere (taskcluster-try)...
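
To make the template idea concrete, here is a minimal sketch of parameter substitution. The `{{key}}` placeholder syntax, the `render` helper and the example task fields are all assumptions for illustration; the linked gist and the taskcluster-try scripts may well do this differently.

```python
import re
from typing import Any, Dict

# Hypothetical placeholder syntax: {{key}}, with optional inner whitespace.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render(template: Any, params: Dict[str, str]) -> Any:
    """Recursively substitute {{key}} placeholders in every string value."""
    if isinstance(template, str):
        # A missing parameter raises KeyError instead of being left in place.
        return PLACEHOLDER.sub(lambda m: params[m.group(1)], template)
    if isinstance(template, dict):
        return {key: render(value, params) for key, value in template.items()}
    if isinstance(template, list):
        return [render(value, params) for value in template]
    return template  # numbers, booleans and None pass through unchanged


# Illustrative template only; the field names are not a real task definition.
task_template = {
    "metadata": {"name": "nightly build {{date}}"},
    "payload": {"command": ["make", "nightly", "REV={{revision}}"]},
    "retries": 3,
}

task = render(task_template, {"date": "2015-05-01", "revision": "abc123"})
```

Failing loudly on an unknown parameter seems safer here than silently submitting a task with a literal `{{...}}` in it.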

We should probably not let the interval be less than 15 min or so...
I imagine that we store all scheduled tasks in Azure Table Storage, with the template perhaps in blob storage, and that every 5 min we go through the list from table storage and schedule tasks as needed.
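
A rough sketch of one such pass, assuming each stored entry carries its interval and the time it was last scheduled. `ScheduledTask`, `check_interval` and `due_tasks` are hypothetical names, and the storage layer is left out entirely.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta
from typing import List, Optional

MIN_INTERVAL = timedelta(minutes=15)  # the "15 min or so" lower bound


@dataclass
class ScheduledTask:
    name: str
    interval: timedelta
    last_scheduled: Optional[datetime] = None  # None means never scheduled


def check_interval(interval: timedelta) -> None:
    """Reject intervals below the minimum when an entry is uploaded."""
    if interval < MIN_INTERVAL:
        raise ValueError(f"interval must be at least {MIN_INTERVAL}")


def due_tasks(entries: List[ScheduledTask], now: datetime) -> List[ScheduledTask]:
    """Return the entries whose interval has elapsed since the last run."""
    return [
        entry
        for entry in entries
        if entry.last_scheduled is None
        or now - entry.last_scheduled >= entry.interval
    ]
```

With a pass every 5 min, a task can be scheduled up to roughly 5 min later than its interval strictly requires, which is one reason a very small minimum interval would not buy much.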

Note, we should probably employ a two-step locking process on Azure Table Storage, so that regardless of when the process restarts we only create a task once.
Presumably we write down the taskId and the timestamp of the last submission; then we can trivially check whether that taskId exists and take the second-step lock after that.
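
One possible reading of the two-step idea, sketched against in-memory stand-ins. `FakeTable` imitates a table with conditional (etag-checked) updates and `FakeQueue` imitates the queue; neither is a real Azure or Taskcluster client, and the row layout is an assumption.

```python
import uuid
from dataclasses import dataclass, replace
from typing import Dict, Optional


class Conflict(Exception):
    """Raised when a conditional update loses a race."""


@dataclass(frozen=True)
class Row:
    pending_task_id: Optional[str] = None  # step 1: reserved, maybe not created
    pending_since: Optional[str] = None
    last_task_id: Optional[str] = None     # step 2: confirmed as created
    etag: int = 0


class FakeTable:
    """Stand-in for a table offering optimistic concurrency via etags."""

    def __init__(self) -> None:
        self.rows: Dict[str, Row] = {}

    def get(self, name: str) -> Row:
        return self.rows.setdefault(name, Row())

    def update(self, name: str, row: Row, expected_etag: int) -> Row:
        if self.get(name).etag != expected_etag:
            raise Conflict(name)
        stored = replace(row, etag=expected_etag + 1)
        self.rows[name] = stored
        return stored


class FakeQueue:
    """Stand-in for the queue; only existence checks and creation."""

    def __init__(self) -> None:
        self.tasks: Dict[str, dict] = {}

    def exists(self, task_id: str) -> bool:
        return task_id in self.tasks

    def create(self, task_id: str, definition: dict) -> None:
        self.tasks[task_id] = definition


def schedule_once(table, queue, name, definition, now, crash_after_reserve=False):
    """Create at most one task per reservation, even across restarts."""
    row = table.get(name)
    if row.pending_task_id is None:
        # Step 1: write down a fresh taskId and timestamp before submitting.
        reserved = replace(row, pending_task_id=uuid.uuid4().hex, pending_since=now)
        row = table.update(name, reserved, row.etag)
    if crash_after_reserve:
        return None  # simulates the process dying between the two steps
    # After a restart the same reserved taskId is reused, never a new one.
    if not queue.exists(row.pending_task_id):
        queue.create(row.pending_task_id, definition)
    # Step 2: mark the reservation as done.
    done = replace(row, last_task_id=row.pending_task_id,
                   pending_task_id=None, pending_since=None)
    table.update(name, done, row.etag)
    return done.last_task_id
```

The caller is still expected to invoke this only when the entry is due. If two processes race, one of the conditional updates raises `Conflict` and that process can simply re-read the row and retry.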

This component will require a frontend API and a backend, communication over exchanges, so it'll have some of those too.
The API should have a UI on tools.taskcluster.net and it should be protected by scopes parameterized with the `name` property given.
Hence, all scheduled tasks called `jonasfj-tasks/*` can only be modified by me.
This would be useful if we want to allow other servers to build on top of the API, but also to prevent people from fooling around with the nightly task, unless of course they have sufficient scopes.
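
A small sketch of how a name-parameterized scope check might look, relying on the usual rule that a scope ending in `*` satisfies any scope it prefixes. The `cron:modify:<name>` scope pattern is made up for this example and is not an existing scope.

```python
from typing import Iterable


def required_scope(name: str) -> str:
    """Hypothetical scope needed to modify the scheduled task `name`."""
    return f"cron:modify:{name}"


def satisfies(held: Iterable[str], required: str) -> bool:
    """True if any held scope equals `required` or prefix-matches it via '*'."""
    for scope in held:
        if scope == required:
            return True
        if scope.endswith("*") and required.startswith(scope[:-1]):
            return True
    return False


my_scopes = ["cron:modify:jonasfj-tasks/*"]
can_edit_mine = satisfies(my_scopes, required_scope("jonasfj-tasks/update-caches"))
can_edit_nightly = satisfies(my_scopes, required_scope("nightly/mozilla-central"))
```

Here `can_edit_mine` is true and `can_edit_nightly` is false, which is the isolation described above.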
Use cases involve:
 - Update git clone caches on S3 that we use when a cache folder is cold,
 - Run nightly builds/tests,
 - Schedule any kind of task (imagine something like telemetry analysis, \me still dream about porting that one day)
Blocks: 1164212
We have discussed in-tree alternatives. I don't recall the design anymore.
But we should possibly discuss options here. Building a component dedicated to scheduling jobs for various trees etc, might not be easy to manage going forward (or trivial to implement).

@wcosta, let me know if this is high priority.
Rob is currently working on adding Windows golden AMI creation to the set of periodic AMI generation jobs in RelEng for their cloud tools project. Currently the AMI generation process is executed in a cron job on a fixed machine, which lacks visibility and robustness. This would be a perfect fit for defining as a periodic taskcluster taskgraph.

Overall in Mozilla we must have hundreds or thousands of jobs running on a regular cadence that prop up our systems. In RelEng I can think of b2g bumper, slave rebooter, hg poller, builds4hr, vcs sync, legacy vcs sync - no doubt there are several more.

Therefore ideally a web interface would allow you to drill down to the set of periodic taskgraphs you are interested in, perhaps by tag (e.g. all of these could be tagged with "releng").

Then some kind of dashboard showing you the current (or most recent) status of each of the periodic taskgraph executions etc.

CC'ing Rob for his input.

Essentially a way to drill down to something like status.taskcluster.net but where the entries are for periodic jobs, rather than live services, and a way to fit tens or hundreds of them cleanly into a display so that you can see at a glance which scheduled jobs are currently having problems etc (and with e.g. some percentage success for each one to show if it is intermittently failing etc).
No longer blocks: 1164212
Additional features from discussion with pmoore 
 * forced indexing, under "latest" and "<date>"
 * Special UI on tools.taskcluster.net
(Most of this still just dreaming)
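
For the forced indexing idea, a tiny sketch of the two index paths a run could be filed under. The `cron.<name>` prefix, the `index_paths` helper and the `YYYY-MM-DD` date format are all assumptions; nothing in this bug fixes them.

```python
from datetime import date
from typing import List


def index_paths(name: str, run_date: date) -> List[str]:
    """Return the 'latest' and per-date index paths for one scheduled run."""
    prefix = f"cron.{name}"  # hypothetical namespace prefix
    return [f"{prefix}.latest", f"{prefix}.{run_date.isoformat()}"]


paths = index_paths("git-cache-update", date(2015, 5, 1))
```

Each run would overwrite the `latest` entry and add a new dated one, so consumers can pick either the newest result or a specific day.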

Note, maybe it should only schedule tasks, then we can always create task graphs from there.
So I am not sure if this has come up yet, but I think the design of "cron" should really be simply to send a pulse event. The "events" tooling (respond to pulse events with TC tasks) should be the way we handle this type of logic, and that particular event type can simply listen to some periodic thing we set up.
@jlal,
This has been discussed in bug 1149789 about hooks, and yeah I agree we shouldn't make both.
I suspect that for consistency, we should just implement periodic tasks in something like hooks.taskcluster.net, as proposed in bug 1149789.

-----
Open questions to the people who seem to be interested in this feature:
 A) What is the smallest resolution you care about (higher yields better guarantees)
    e.g. daily, every 6 hours, every 15 min, every 1 min?
 B) What semantics are we interested in?
    - Scheduled at least once
    - Scheduled at most once
    - Scheduled exactly once (only possible with something like 1 hour resolution, or higher)
Component: TaskCluster → General
Product: Testing → Taskcluster
This will be accomplished within the "hooks" service, instead.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → WONTFIX