Closed
Bug 1245953
Opened 9 years ago
Closed 9 years ago
Allow non-build/test tasks to be added easily
Categories
(Taskcluster :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: gps, Assigned: gps)
References
(Blocks 1 open bug)
Details
Attachments
(9 files)
All 9 attachments: (deleted), text/x-review-board-request, garndt: review+
In-tree TaskCluster tasks are currently classified as a "build" or "test" task. And, "test" tasks are dependent on a "build" task running, so they can be thought of as an extension to "build" tasks. As bug 1229588 showed us and as work I'm doing further shows, not everything is a "build." There are many "other" or "misc" tasks. This includes static analysis, documentation generation, or miscellaneous tests (such as my upcoming work to test that "mach bootstrap" works).
I'm filing this bug to track adding support for new task types. The goal is to enable new high-level categories of tasks to be introduced easily without having to shoehorn them into a "build" task. As part of this work, the existing eslint "build" task will likely get converted to the new world order.
This work likely means inventing a new primitive for the try syntax parser for TaskCluster so this new namespace of task can be specified.
I intend to work on this ASAP. I'm sure I'm horribly naive about the implications of this change. So if I'm barking up the wrong tree, I'd appreciate an IRC ping telling me. But I feel like I grok the YAML files and the graph generation mechanism enough that I can hack something together. Whether it is correct, I guess I'll find out...
Comment 1•9 years ago
+1
Another use case I have is a behind-the-scenes worker task. Basically a task that won't show up on treeherder, but that other higher level (in the graph) tasks can use to farm out work across multiple AWS instances.
Comment 2•9 years ago
> I intend to work on this ASAP. I'm sure I'm horribly naive about the implications of this change.
++, go hack the current solution..
--
I hope that over the coming weeks (maybe months) work on bug 1243844 will completely refactor
the current in-tree configs and make other kinds of tasks more feasible. We already had to hack
the mach python command to do on-push rebuilds of in-tree dockerfiles.
But bug 1243844 is at high risk of turning into a bike-shedding expedition, so I don't recommend waiting
for it. Rather, go ahead and solve your problem :)
Comment 3•9 years ago
rillian and I have jobs of this nature too: building rust toolchains and fetching Maven Gradle dependencies, respectively. We both want to upload to tooltool; perhaps an approach to making things depend on those tooltool uploads can fall out of this work.
Assignee | ||
Comment 4•9 years ago
Assignee | ||
Comment 5•9 years ago
Assignee | ||
Comment 6•9 years ago
THIS PATCH IS A BIT HACKY. TREAT REVIEW AS A REQUEST FOR FEEDBACK.
ASSUME I DON'T KNOW WHAT I'M DOING BECAUSE I REALLY DON'T.
Currently, tasks are either "build" or "test" tasks. And "test" tasks
are dependent on "build" tasks, so they are effectively an extension of
"build" tasks.
Not everything is a "build" task. Not everything is associated with a
specific platform.
This commit introduces support for defining non-build "tasks" under the
"tasks" top-level element of a jobs YAML file.
By default, all these tasks run.
The -j/--job argument has been added to the try syntax parser. It
specifies an opt-in list of these non-build tasks to run. By default, it
runs all of them.
The eslint-gecko "build" task has been moved to this new mechanism.
Review commit: https://reviewboard.mozilla.org/r/33729/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/33729/
Attachment #8716128 -
Flags: review?(jopsen)
Comment 7•9 years ago
Comment on attachment 8716128 [details]
MozReview Request: Bug 1245953 - Support defining non-build/test tasks; r?garndt
https://reviewboard.mozilla.org/r/33729/#review30709
This looks reasonably sane to me... but like you I honestly don't know what is going on here :)
Maybe wcosta or garndt is a better reviewer.
Attachment #8716128 -
Flags: review?(jopsen)
Comment 8•9 years ago
(In reply to Gregory Szorc [:gps] from comment #6)
> Created attachment 8716128 [details]
> MozReview Request: Bug 1245953 - Support defining non-build/test tasks;
> r?jonasfj
>
> THIS PATCH IS A BIT HACKY. TREAT REVIEW AS A REQUEST FOR FEEDBACK.
> ASSUME I DON'T KNOW WHAT I'M DOING BECAUSE I REALLY DON'T.
>
> Currently, tasks are either "build" or "test" tasks. And "test" tasks
> are dependent on "build" tasks, so they are effectively an extension of
> "build" tasks.
>
> Not everything is a "build" task. Not everything is associated with a
> specific platform.
>
> This commit introduces support for defining non-build "tasks" under the
> "tasks" top-level element of a jobs YAML file.
>
> By default, all these tasks run.
This seems wrong. There are going to be tons of tasks that are platform specific (like various Android lint tasks), and lots that are intermittent (like fetching Android deps, or building the rust toolchain).
Let's land with them all off and figure out how to opt-in as quickly as possible. Landing with them all on doesn't let us land new tasks easily -- unless there's a way to "disable" a job that I do not see in the patch.
Assignee | ||
Comment 9•9 years ago
(In reply to Nick Alexander :nalexander from comment #8)
> (In reply to Gregory Szorc [:gps] from comment #6)
> >
> > This commit introduces support for defining non-build "tasks" under the
> > "tasks" top-level element of a jobs YAML file.
> >
> > By default, all these tasks run.
>
> This seems wrong. There are going to be tons of tasks that are platform
> specific (like various Android lint tasks), and lots that are intermittent
> (like fetching Android deps, or building the rust toolchain).
You are correct. I'm going to scope bloat myself to cover addressing this.
Comment 10•9 years ago
Before you do that -- we on the TC team have been tossing around ideas for a larger, long-overdue refactoring of the whole mess under testing/taskcluster/tasks. The idea is to support requirements like you're working on, and many more.
In bug 1247703, we're going to do some brainstorming of what we want this new thing to "feel like" for developers, at which point I'd like to talk to everyone on this bug (and a few others) to see if the results would meet your needs. Then we can set to work implementing it -- we actually have a number of hands on keyboards to do the implementing, so it should go pretty fast!
Assignee | ||
Comment 11•9 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #10)
> Before you do that -- we on the TC team have been tossing around ideas for a
> larger, long-overdue refactoring of the whole mess under
> testing/taskcluster/tasks. The idea is to support requirements like you're
> working on, and many more.
>
> In bug 1247703, we're going to do some brainstorming of what we want this
> new thing to "feel like" for developers, at which point I'd like to talk to
> everyone on this bug (and a few others) to see if the results would meet
> your needs. Then we can set to work implementing it -- we actually have a
> number of hands on keyboards to do the implementing, so it should go pretty
> fast!
That feels like something that can linger on for weeks or months. I'm really about instant gratification in this bug and I think I can piece something together that fits the immediate needs of myself and others. I have no problems with my code getting refactored to a better solution in the weeks and months ahead. And I have no desire to make any sweeping changes to existing in-tree tasks. My plan is to staple on something with the basic ability to schedule non-build/test tasks as a result of specific files changing. Emphasis on "staple on."
Assignee | ||
Comment 12•9 years ago
The function will soon query something that isn't limited to pushlog
info. Rename it accordingly.
Review commit: https://reviewboard.mozilla.org/r/34689/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/34689/
Attachment #8718679 -
Flags: review?(garndt)
Assignee | ||
Comment 13•9 years ago
requests should *always* be used for performing HTTP requests because it
has a better API *and* has sane security defaults compared to the HTTP
request APIs in the Python standard library. Although, Python 2.7.9+
does have slightly saner defaults in the standard library. I still trust
requests more.
Review commit: https://reviewboard.mozilla.org/r/34691/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/34691/
Attachment #8718680 -
Flags: review?(garndt)
Assignee | ||
Comment 14•9 years ago
Before, we attempted to build and query a URL that potentially had
"None" in it. This printed some wonky messages in the log and may have
contributed to added latency due to the HTTP request that was doomed to
fail.
Review commit: https://reviewboard.mozilla.org/r/34693/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/34693/
Attachment #8718681 -
Flags: review?(garndt)
Assignee | ||
Comment 15•9 years ago
In preparation for adding more content that isn't strictly related to
pushlog info.
Review commit: https://reviewboard.mozilla.org/r/34695/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/34695/
Attachment #8718682 -
Flags: review?(garndt)
Assignee | ||
Comment 16•9 years ago
Over in bug 1247802 we deployed a new JSON web API on hg.mozilla.org
that returns JSON metadata for changesets that are relevant for build
automation. It returns a superset of what is returned by the pushlog
JSON API. So we switch to it.
Review commit: https://reviewboard.mozilla.org/r/34697/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/34697/
Attachment #8718683 -
Flags: review?(garndt)
Assignee | ||
Comment 17•9 years ago
We're about to introduce a mechanism to influence which tasks run based
on what files change. To help debug what's happening, print out the list
of commits that influence the task selection.
Review commit: https://reviewboard.mozilla.org/r/34699/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/34699/
Attachment #8718684 -
Flags: review?(garndt)
Assignee | ||
Comment 18•9 years ago
Comment on attachment 8716128 [details]
MozReview Request: Bug 1245953 - Support defining non-build/test tasks; r?garndt
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/33729/diff/1-2/
Attachment #8716128 -
Attachment description: MozReview Request: Bug 1245953 - Support defining non-build/test tasks; r?jonasfj → MozReview Request: Bug 1245953 - Support defining non-build/test tasks; r?garndt
Attachment #8716128 -
Flags: review?(garndt)
Assignee | ||
Comment 19•9 years ago
Firefox's automation currently tends to run all the jobs all the time.
It is wasteful to do this. For example, running ESLint when the commit
only changes a .cpp file adds no value.
This commit adds support for only running tasks when certain files
change. The new-style tasks introduced by the previous commit have been
taught a "when" dictionary property that defines conditions that should
hold for the task to be executed. We define a "file_patterns" list that
defines lists of mozpack path matching expressions that will be matched
against the set of files changed by the changesets relevant to the
changeset being built. The eslint task has been updated to only run if
the file extensions that it actually lints change.
Review commit: https://reviewboard.mozilla.org/r/34701/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/34701/
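For readers following along, the "when" / "file_patterns" check described in this commit message can be sketched roughly as below. This is only an illustration, not the patch itself: it uses the standard library's fnmatch as a stand-in for mozpack path matching (whose semantics differ), and the task dictionary layout, the should_run signature, and the example patterns are all assumptions.

```python
from fnmatch import fnmatch


def should_run(task, changed_files):
    """Return True if the task has no file conditions or any changed file matches."""
    patterns = task.get('when', {}).get('file_patterns')
    if not patterns:
        # No conditions defined: the task always runs.
        return True
    return any(fnmatch(path, pattern)
               for path in changed_files
               for pattern in patterns)


# Hypothetical task definition; the patterns are illustrative only.
eslint_task = {'when': {'file_patterns': ['*.js', '*.jsm', '*.jsx']}}
```

With this sketch, a push touching only a .cpp file would skip eslint_task, while a task with no "when" entry would always run.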
Attachment #8718685 -
Flags: review?(garndt)
Assignee | ||
Comment 20•9 years ago
https://reviewboard.mozilla.org/r/34697/#review31349
FYI this doesn't yet work against hg.mozilla.org because bug 1247802 hasn't landed or been deployed yet. While I'm here, basically the series starting with this commit is PoC and in the RFC stage.
Assignee | ||
Comment 21•9 years ago
Few more quick tidbits:
1) There is a strong possibility I'll wake up tomorrow and change my mind about having the changeset resolution on the server. We can certainly implement it client side by getting the pushlog info from the JSON web API (like today) and the phases info from the local repo. Let me sleep on it.
2) I'm not thrilled putting the file patterns in the YAML files because having references to paths not inside the paths themselves is a recipe for creating cruft. e.g. down the line we may have tasks that are only scheduled when files in foo/ change. Then someone deletes foo/ and we forget there are references to foo/ in TaskCluster tasks. moz.build files already have a mechanism for mapping metadata for files. There is even a mach command for accessing that data. It is certainly possible to have an e.g. IMPACTED_TASKS moz.build variable that is used to trigger tasks from changes to files. This is probably only an hour or 2 of work to implement (or about as much time as this series took me).
Again, this series is very RFC. And given the massive refactorings planned in bug 1247703, I'd say perfect is definitely the enemy of done. You can already do a lot with what I've implemented and I'd be stoked if we got something quick and dirty landed.
Updated•9 years ago
Attachment #8718679 -
Flags: review?(garndt) → review+
Comment 22•9 years ago
Comment on attachment 8718679 [details]
MozReview Request: Bug 1245953 - Rename query_pushinfo to query_vcsinfo; r=garndt
https://reviewboard.mozilla.org/r/34689/#review31379
Updated•9 years ago
Attachment #8718680 -
Flags: review?(garndt) → review+
Comment 23•9 years ago
Comment on attachment 8718680 [details]
MozReview Request: Bug 1245953 - Use requests for performing HTTP request; r=garndt
https://reviewboard.mozilla.org/r/34691/#review31381
Thanks! Definitely prefer the change over to 'requests'
Comment 24•9 years ago
Comment on attachment 8718681 [details]
MozReview Request: Bug 1245953 - Fail fast if no VCS info defined; r=garndt
https://reviewboard.mozilla.org/r/34693/#review31389
Attachment #8718681 -
Flags: review?(garndt) → review+
Comment 25•9 years ago
So from what I understand this looks at files that are modified in the push, which means that "files changed" is implicitly referencing the parent cset to the first cset in the push. That means my try push, if it happens to include csets which weren't already on try, may run a lot more than I really need it to.
We've been discussing a very different way to detect files changed, similar to what we do now for Docker images. For those, we hash the directory containing the Dockerfile, then look for a task matching that hash in the index at docker.images.v1.mozilla-central.desktop-build.hash.<hash>. When generating the graph, if we find a task in the index, we use that task's output as the docker image. If not, we schedule a new task and depend on it.
I think we could do the same with any arbitrary set of files in the tree. For example, we could hash testing/mozharness excluding testing/mozharness/configs, and schedule a run of the mozharness test suite for each such tree. Similarly, we can achieve artifact builds this way by hashing only the inputs to the c++ components, and building those in a separate task from the artifact build (while linting that artifact build in parallel, even).
The real power of this technique comes from the ability to not only say "we don't need to do X", but to have the cached outputs of X readily available for the tasks that you *do* want to perform. For example, the model you've put forward here doesn't handle the docker-image case very well.
I only bring this up now because I think it will be rather difficult to refactor a system built on changes in pushlog to one built on hashes. What do you think?
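A minimal sketch of the hash-then-look-up idea, under stated assumptions: the index is modeled as a plain dict rather than the real TaskCluster index service, and hash_directory, resolve_task, and the namespace string are hypothetical names chosen for illustration.

```python
import hashlib
import os


def hash_directory(root):
    """Return a SHA-256 hex digest over relative file paths and contents under root."""
    digest = hashlib.sha256()
    for dirpath, dirnames, filenames in os.walk(root):
        dirnames.sort()  # make traversal order deterministic
        for name in sorted(filenames):
            full = os.path.join(dirpath, name)
            rel = os.path.relpath(full, root).replace(os.sep, '/')
            digest.update(rel.encode('utf-8') + b'\0')
            with open(full, 'rb') as fh:
                digest.update(fh.read())
            digest.update(b'\0')
    return digest.hexdigest()


def resolve_task(index, namespace, root):
    """Reuse an indexed task for this content hash, or report that a new one is needed."""
    key = '%s.%s' % (namespace, hash_directory(root))
    task_id = index.get(key)
    if task_id is not None:
        return ('reuse', task_id)
    return ('schedule', key)
```

When the lookup misses, the graph generator would schedule a new task and depend on it; when it hits, the existing task's output is used directly.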
Comment 26•9 years ago
Comment on attachment 8718682 [details]
MozReview Request: Bug 1245953 - Rename "push info" to "vcs info"; r=garndt
https://reviewboard.mozilla.org/r/34695/#review31393
Attachment #8718682 -
Flags: review?(garndt) → review+
Comment 27•9 years ago
Comment on attachment 8716128 [details]
MozReview Request: Bug 1245953 - Support defining non-build/test tasks; r?garndt
https://reviewboard.mozilla.org/r/33729/#review31401
::: testing/taskcluster/taskcluster_graph/commit_parser.py:261
(Diff revision 2)
> + parser.add_argument('-j', '--job', action='append')
Does this mean that jobs will have to be specified as such:
"--job task1 --job task2 --job task3"
instead of comma delimited like platform and unittests?
::: testing/taskcluster/taskcluster_graph/commit_parser.py:327
(Diff revision 2)
> + # Process miscellaneous tasks.
I'm not sure if this is too much, but a comment about the default behavior of including all jobs if no jobs are explicitly declared might clear up some confusion if someone is viewing this without reading the commit message for this change.
Attachment #8716128 -
Flags: review?(garndt)
Comment 28•9 years ago
Comment on attachment 8718685 [details]
MozReview Request: Bug 1245953 - Support for only running tasks when certain files change; r?garndt
https://reviewboard.mozilla.org/r/34701/#review31409
::: testing/taskcluster/mach_commands.py:398
(Diff revision 1)
> + def should_run(task):
I don't see an issue with the implementation side of this, but I do leave it open for discussion based on https://bugzilla.mozilla.org/show_bug.cgi?id=1245953#c25
Attachment #8718685 -
Flags: review?(garndt)
Comment 29•9 years ago
I left some review comments and r+'d some of the commits. I left some of the commits untouched (such as those relating to the new hg endpoint we will be querying) until that endpoint is deployed and we see how it fares with these changes.
Comment 30•9 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #25)
> So from what I understand this looks at files that are modified in the push,
> which means that "files changed" is implicitly referencing the parent cset
> to the first cset in the push. That means my try push, if it happens to
> include csets which weren't already on try, may run a lot more than I really
> need it to.
>
> We've been discussing a very different way to detect files changed, similar
> to what we do now for Docker images. For those, we hash the directory
> containing the Dockerfile, then look for a task matching that hash in the
> index at docker.images.v1.mozilla-central.desktop-build.hash.<hash>. When
> generating the graph, if we find a task in the index, we use that task's
> output as the docker image. If not, we schedule a new task and depend on it.
>
> I think we could do the same with any arbitrary set of files in the tree.
> For example, we could hash testing/mozharness excluding
> testing/mozharness/configs, and schedule a run of the mozharness test suite
> for each such tree. Similarly, we can achieve artifact builds this way by
> hashing only the inputs to the c++ components, and building those in a
> separate task from the artifact build (while linting that artifact build in
> parallel, even).
>
> The real power of this technique comes from the ability to not only say "we
> don't need to do X", but to have the cached outputs of X readily available
> for the tasks that you *do* want to perform. For example, the model you've
> put forward here doesn't handle the docker-image case very well.
>
> I only bring this up now because I think it will be rather difficult to
> refactor a system built on changes in pushlog to one built on hashes. What
> do you think?
I have thought a little about this, and want to raise two mundane concerns:
1) with the docker-image case, and I think many more cases, the hash needs to contain an arbitrary set of files in the source tree. The set grows to become the whole tree relatively quickly. At that point, the hash of content is roughly equivalent to the cset hash.
2) the runway to actually using docker-specific approaches in production is really long. cset and sets of changed files based on version control has the advantage that we can do some version of this in buildbot and get wins today.
Comment 31•9 years ago
1) I think you're arguing that this whole effort is useless, as we'll quickly find ourselves back to rebuilding everything on every push. Maybe that's the case, but some really smart people are working on it, so hopefully not! Anyway, I don't think that gets to the issue of whether to look at push metadata or hash in-tree files.
2) Buildbot doesn't support any kind of in-tree scheduling, so regardless of which approach we choose, it won't work in Buildbot.
2b) The approach I'm suggesting isn't docker-specific. I used the example of building docker images since it already exists, but if you replace "docker" with "clang" the point still holds (it's just not implemented yet). In that case we would hash the script used to build the clang toolchain, https://dxr.mozilla.org/mozilla-central/source/testing/taskcluster/scripts/misc/build-clang-linux.sh.
With that in place, you could test out a patch to clang or a version bump in try. To accomplish the same thing by looking at files changed in the push, you'd need to find the last cset in which any of the affected files were changed, then look up that cset in the index to find the clang blob, which is a lot of hg operations when the list of files gets long.
Comment 32•9 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #31)
> 1) I think you're arguing that this whole effort is useless, as we'll
> quickly find ourselves back to rebuilding everything on every push. Maybe
> that's the case, but some really smart people are working on it, so
> hopefully not! Anyway, I don't think that gets to the issue of whether to
> look at push metadata or hash in-tree files.
Maybe I'm missing the thread here, but: the docker-image hashing approach is great when your image doesn't refer to the rest of the tree, but it's not so great when it does. From my recent experience, my job should depend on:
testing/docker/JOB/**
testing/taskcluster/JOB/**
mobile/android/JOB/**
mobile/android/configs/**
configure.in
python/**
At some point, with all the connections between the bits in the tree and poorly defined boundaries, we approach hashing the whole tree. configure.in and build defines in particular run into this (just like they do now with the existing build system).
In any case, I think we're all aware of the difficulties. I'm absolutely *not* arguing the effort is useless, so I'll leave this be.
> 2) Buildbot doesn't support any kind of in-tree scheduling, so regardless of
> which approach we choose, it won't work in Buildbot.
I misspoke above -- I meant to say "taskcluster-specific", not "docker-specific". That is, I can't solve my problems in taskcluster-specific ways -- right now, caching based on content hashes -- because I need to work in buildbot.
> 2b) The approach I'm suggesting isn't docker-specific. I used the example
> of building docker images since it already exists, but if you replace
> "docker" with "clang" the point still holds (it's just not implemented yet).
> In that case we would hash the script used to build the clang toolchain,
> https://dxr.mozilla.org/mozilla-central/source/testing/taskcluster/scripts/
> misc/build-clang-linux.sh.
>
> With that in place, you could test out a patch to clang or a version bump in
> try. To accomplish the same thing by looking at files changed in the push,
> you'd need to find the last cset in which any of the affected files were
> changed, then look up that cset in the index to find the clang blob, which
> is a lot of hg operations when the list of files gets long.
To my eye, this is what VCS systems are good at. Mercurial, for example, can quickly tell you what changesets aren't "public" (i.e., haven't landed) and tell you all changed files in them.
Assignee | ||
Comment 33•9 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #25)
> So from what I understand this looks at files that are modified in the push,
> which means that "files changed" is implicitly referencing the parent cset
> to the first cset in the push. That means my try push, if it happens to
> include csets which weren't already on try, may run a lot more than I really
> need it to.
It's /slightly/ more complicated than that. Read the very detailed commit message in bug 1247802.
You do bring up a valid point about the Try push containing extra changesets (e.g. things that have landed on central that haven't yet been pushed to Try). However, I have a Q2 deliverable to stand up a unified, auto aggregating Firefox repository. As part of that, I will have Try and (hopefully) the MozReview repos automatically aggregating as well. This means that Try/MozReview pushes should only receive commits specific to that push and the algorithm implemented in bug 1247802 will be sufficient. Until then, we could pull in extra, unwanted files. But that's OK because that's basically our scheduling model today (largely changed files agnostic).
I'll respond to the rest of your comment later once I have time to digest it.
Flags: needinfo?(gps)
Assignee | ||
Comment 34•9 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #25)
When you squint really hard, you realize that DAGs and build systems are everywhere and that version control repositories are snapshotting filesystems and therefore can be used to populate dependency graphs similar to how traditional filesystems are consulted by traditional build systems.
From a very high level, you could model everything that our build automation does in a modern build system (not GNU Make). Someone would check something in and `mach build` (read: the build system) would create Docker images, compile toolchains, build Firefox, run tests, do l10n repacks, create installers, etc. This could all be derived from a single DAG. If you did all the work on a single machine, you could use the filesystem to look at what changed and the build system would do a minimal rebuild.
Unfortunately, that doesn't scale. So you need multiple machines. And that means that traditional filesystem mtime based approaches for invalidating inputs and outputs can't easily be used. But mtimes are a crappy mechanism for doing change detection. To borrow terminology from HTTP caching, mtime (like the If-Modified-Since header) is a "weak validator" because it is indirectly measuring change (albeit in a cheap way). Content based change detection (the ETag header) is a "strong validator" and therefore preferred. But it is typically more expensive and therefore has scaling problems.
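To make the weak/strong distinction concrete, here is a small sketch (the function names are made up for illustration). The mtime check is cheap but only indirectly tracks change; the content hash reads every byte but changes only when the content does.

```python
import hashlib
import os


def weak_validator(path):
    """Cheap, indirect change signal: modification time in nanoseconds."""
    return os.stat(path).st_mtime_ns


def strong_validator(path):
    """Costlier, direct change signal: SHA-256 of the file's bytes."""
    with open(path, 'rb') as fh:
        return hashlib.sha256(fh.read()).hexdigest()
```

Touching a file without editing it changes the weak validator but leaves the strong one untouched, which is why mtime-based invalidation tends to over-rebuild.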
Your suggested approach of hashing inputs to a) determine what needs to be rebuilt and b) facilitate caching is an excellent idea and is how "build systems" should work. In fact, this is how the Bazel build system works! Many of us have a serious nerd crush on the Bazel build system for this reason. So, in a way your proposal is to implement a build system, albeit one that knows how to perform distributed execution and caching. Of course, "distributed execution" can be modeled in any build system as a rule that runs a command to schedule a remote job that has no inputs that can be immediately used. Where the model does somewhat break down is if scheduled/remote job output needs to further influence the build. But this can be modeled as separate "targets" or "goals" for the build process, which can be invoked whenever.
I *really* like your proposal for content based scheduling and caching. However, it will suffer from many of the shortcomings of most build systems. Namely that getting dependencies correct is hard. Modern build systems like Tup and Bazel instrument filesystem accesses and/or sandbox execution so all dependencies are guaranteed to be captured. (Actually, I /think/ Bazel may even error if build dependencies aren't explicitly declared.)
I'd encourage you to seriously investigate using an actual and modern build system for scheduling automation tasks. I recommend Bazel. With a little forward planning, we /might/ even be able to merge the Firefox build system into the same DAG down the road. That would be epic.
If you end up building your own special purpose "build system," you may want to lean on version control for accessing content hashes efficiently, as scanning 100,000+ files could be time prohibitive. Although, my i7-6700K can SHA-256 a mozilla-central checkout in <6s if the entire source tree is in the page cache. Of course, the source tree isn't the only input.
Flags: needinfo?(gps)
Comment 35•9 years ago
Comment on attachment 8718683 [details]
MozReview Request: Bug 1245953 - Query automationrelevance API instead of pushlog; r=garndt
https://reviewboard.mozilla.org/r/34697/#review31659
This looks good to me, but the service it depends on hasn't been deployed yet.
Attachment #8718683 -
Flags: review?(garndt) → review+
Comment 36•9 years ago
Comment on attachment 8718684 [details]
MozReview Request: Bug 1245953 - Print info on commits influencing scheduling; r=garndt
https://reviewboard.mozilla.org/r/34699/#review31661
This looks good to me, but the service it depends on hasn't been deployed yet.
Attachment #8718684 -
Flags: review?(garndt) → review+
Comment 37•9 years ago
Is it even realistic to consider using a Java-based build system? To be honest, I had already rejected the possibility of writing this in JS since our build requirements don't include node. Also, from a quick look, it doesn't seem like Bazel could separate out the small corner of it that might be useful for this purpose. Basically I would just need it to generate a DAG and then replace nodes in that DAG with existing nodes from other DAGs that have the same inputs. I don't need it to actually execute that DAG, nor would any of the automatic dependency stuff be practical.
Comment 38•9 years ago
Also, in trying to think about using version-control for this instead of hashing files locally, I think the question I want to ask is, "what is the latest revision containing this version of all of these paths". For a single directory, that's easy:
dustin@dustin-moz-devel ~/p/m-c $ hg log -l 1 testing/docker/desktop-test/
changeset: 284023:e3cf1fdd0d97
bookmark: bug1246947
user: Dustin J. Mitchell <dustin@mozilla.com>
date: Thu Feb 11 16:47:54 2016 +0000
summary: Bug 1242979: Install Valgrind on mochitest-valgrind test nodes; r=jseward
So we would look up e3cf1fdd0d97 in the index of docker-image/desktop-test builds and, if found, link to (and use the image from) the existing task. If not, create a new task and link to that.
Does that generalize correctly to multiple paths? For example:
dustin@dustin-moz-devel ~/p/m-c $ hg log -l 1 testing/mozharness/{mach_commands.py,mozharness,mozinfo,mozprocess,scripts,test,tox.ini,setup.py,unit.sh}
changeset: 283960:d4d72e7b30da
parent: 283959:b21946a2e993
parent: 283899:97c7a71cce02
user: Carsten "Tomcat" Book <cbook@mozilla.com>
date: Thu Feb 11 11:52:01 2016 +0100
summary: merge mozilla-inbound to mozilla-central a=merge
Assignee | ||
Comment 39•9 years ago
(In reply to Dustin J. Mitchell [:dustin] from comment #38)
> Does that generalize correctly to multiple paths? For example:
>
> dustin@dustin-moz-devel ~/p/m-c $ hg log -l 1
> testing/mozharness/{mach_commands.py,mozharness,mozinfo,mozprocess,scripts,
> test,tox.ini,setup.py,unit.sh}
No. `hg log -l 1 file1 file2` will assemble all changesets touching *any* of the files listed and display the newest one. To find the last changeset where all files were the same, you'd likely need to write a custom extension/command.
Comment 40•9 years ago
|
||
I think that's what I want. If I run `hg log -l 1 file1 file2` today, then make a bunch of commits that don't affect file1 or file2, it will continue returning the same hash. When I do touch file1, it will return the hash in which I touched file1.
Comment 41•9 years ago
|
||
If you have something like this:
1----2----3----5
\
\---4
Where your checkout is 5, the numbers are the revision numbers, and the files you're looking at have been changed in 2 and 4.
In that case, hg log -l 1 file1 file2 will give you 4 when you want 2.
So what you probably want is to at least throw in ancestry.
hg log -l 1 -r 'reverse(::.)' file1 file2
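The difference ancestry makes can be modelled in a few lines of Python. This is only an illustration of the graph above, not a Mercurial API: the `parents` map is an assumption (in particular, 4 is assumed to branch off 2), and `ancestors` and `newest_touching` are hypothetical helpers.

```python
def ancestors(parents, rev):
    """Return rev plus all of its ancestors (what the `::rev` revset selects)."""
    seen = set()
    stack = [rev]
    while stack:
        r = stack.pop()
        if r not in seen:
            seen.add(r)
            stack.extend(parents[r])
    return seen


def newest_touching(touching, candidates):
    """Newest revision among `candidates` that touched the files, or None."""
    hits = touching & candidates
    return max(hits) if hits else None


# Assumed shape of the example graph: 1-2-3-5 on one line, 4 branching off 2.
parents = {1: [], 2: [1], 3: [2], 4: [2], 5: [3]}
# Revisions that changed file1 or file2.
touching = {2, 4}

# Looking at every revision in the repository picks up the unrelated branch.
without_ancestry = newest_touching(touching, set(parents))
# Restricting to ancestors of the checkout (5) ignores revision 4.
with_ancestry = newest_touching(touching, ancestors(parents, 5))
```

Here `without_ancestry` is 4 and `with_ancestry` is 2, which matches the behaviour described above.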
Comment 42•9 years ago
|
||
Comment on attachment 8718683 [details]
MozReview Request: Bug 1245953 - Query automationrelevance API instead of pushlog; r=garndt
https://reviewboard.mozilla.org/r/34697/#review31821
::: testing/taskcluster/mach_commands.py:160
(Diff revision 1)
> - url = '%s/json-pushes?changeset=%s' % (repository, revision)
> + url = '%s/json-automationrelevance/%s' % (repository, revision)
If 'repository' contains a trailing slash, this results in ...//json-automationrelevance..
As seen in task XI9-AZb9TKGNLn5lwFgXKQ
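One way to avoid the doubled slash is to strip any trailing slash from the repository URL before formatting. This is a sketch only: the helper name `automationrelevance_url` is hypothetical, and it is not necessarily how the patch ended up addressing the issue.

```python
def automationrelevance_url(repository, revision):
    """Build the json-automationrelevance URL without doubling slashes."""
    # rstrip('/') makes 'https://host/repo' and 'https://host/repo/' equivalent.
    return '%s/json-automationrelevance/%s' % (repository.rstrip('/'), revision)
```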
Attachment #8718683 -
Flags: review+
Assignee | ||
Comment 43•9 years ago
|
||
https://reviewboard.mozilla.org/r/33729/#review31401
> Does this mean that jobs will have to be specified as such:
> "--job task1 --job task2 --job task3"
> instead of comma delimited like platform and unittests?
As implemented, yes. I'll add support for commas.
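A rough sketch of how both forms could be accepted with `argparse`: collect repeated `--job` values, then split each one on commas. The function name `parse_jobs` is hypothetical and this is not the actual try syntax parser; only the `--job` flag comes from the review comment.

```python
import argparse


def parse_jobs(argv):
    """Accept both repeated --job flags and comma-delimited values."""
    parser = argparse.ArgumentParser()
    # action='append' collects every --job occurrence into a list.
    parser.add_argument('--job', dest='jobs', action='append')
    args = parser.parse_args(argv)
    if args.jobs is None:
        # No --job given; the caller decides what that means.
        return None
    jobs = []
    for value in args.jobs:
        # Split comma-delimited values and drop empty names ("a,,b").
        jobs.extend(name for name in value.split(',') if name)
    return jobs
```

With this, `--job task1,task2 --job task3` and `--job task1 --job task2 --job task3` produce the same list.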
Assignee | ||
Comment 44•9 years ago
|
||
Comment on attachment 8718679 [details]
MozReview Request: Bug 1245953 - Rename query_pushinfo to query_vcsinfo; r=garndt
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34689/diff/1-2/
Attachment #8718679 -
Attachment description: MozReview Request: Bug 1245953 - Rename query_pushinfo to query_vcsinfo; r?garndt → MozReview Request: Bug 1245953 - Rename query_pushinfo to query_vcsinfo; r=garndt
Assignee | ||
Comment 45•9 years ago
|
||
Comment on attachment 8718680 [details]
MozReview Request: Bug 1245953 - Use requests for performing HTTP request; r=garndt
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34691/diff/1-2/
Attachment #8718680 -
Attachment description: MozReview Request: Bug 1245953 - Use requests for performing HTTP request; r?garndt → MozReview Request: Bug 1245953 - Use requests for performing HTTP request; r=garndt
Assignee | ||
Updated•9 years ago
|
Attachment #8718681 -
Attachment description: MozReview Request: Bug 1245953 - Fail fast if no VCS info defined; r?garndt → MozReview Request: Bug 1245953 - Fail fast if no VCS info defined; r=garndt
Assignee | ||
Comment 46•9 years ago
|
||
Comment on attachment 8718681 [details]
MozReview Request: Bug 1245953 - Fail fast if no VCS info defined; r=garndt
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34693/diff/1-2/
Assignee | ||
Comment 47•9 years ago
|
||
Comment on attachment 8718682 [details]
MozReview Request: Bug 1245953 - Rename "push info" to "vcs info"; r=garndt
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34695/diff/1-2/
Attachment #8718682 -
Attachment description: MozReview Request: Bug 1245953 - Rename "push info" to "vcs info"; r?garndt → MozReview Request: Bug 1245953 - Rename "push info" to "vcs info"; r=garndt
Assignee | ||
Comment 48•9 years ago
|
||
Comment on attachment 8718683 [details]
MozReview Request: Bug 1245953 - Query automationrelevance API instead of pushlog; r=garndt
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34697/diff/1-2/
Attachment #8718683 -
Attachment description: MozReview Request: Bug 1245953 - Query automationrelevance API instead of pushlog; r?garndt → MozReview Request: Bug 1245953 - Query automationrelevance API instead of pushlog; r=garndt
Attachment #8718683 -
Flags: review?(garndt)
Assignee | ||
Updated•9 years ago
|
Attachment #8718684 -
Attachment description: MozReview Request: Bug 1245953 - Print info on commits influencing scheduling; r?garndt → MozReview Request: Bug 1245953 - Print info on commits influencing scheduling; r=garndt
Assignee | ||
Comment 49•9 years ago
|
||
Comment on attachment 8718684 [details]
MozReview Request: Bug 1245953 - Print info on commits influencing scheduling; r=garndt
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34699/diff/1-2/
Assignee | ||
Comment 50•9 years ago
|
||
It is possible to hook up in-tree documentation to Sphinx. Convert the
one-off README.md to ReStructuredText and add it to the Sphinx docs.
I added a moz.build file under testing/ because I don't think it is
appropriate for the Sphinx directive to live in the root moz.build file.
Review commit: https://reviewboard.mozilla.org/r/35207/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/35207/
Attachment #8720109 -
Flags: review?(garndt)
Assignee | ||
Comment 51•9 years ago
|
||
Comment on attachment 8716128 [details]
MozReview Request: Bug 1245953 - Support defining non-build/test tasks; r?garndt
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/33729/diff/2-3/
Attachment #8716128 -
Flags: review?(garndt)
Assignee | ||
Updated•9 years ago
|
Attachment #8718685 -
Flags: review?(garndt)
Assignee | ||
Comment 52•9 years ago
|
||
Comment on attachment 8718685 [details]
MozReview Request: Bug 1245953 - Support for only running tasks when certain files change; r?garndt
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/34701/diff/1-2/
Updated•9 years ago
|
Attachment #8718683 -
Flags: review?(garndt) → review+
Comment 53•9 years ago
|
||
Comment on attachment 8718683 [details]
MozReview Request: Bug 1245953 - Query automationrelevance API instead of pushlog; r=garndt
https://reviewboard.mozilla.org/r/34697/#review31899
Comment 54•9 years ago
|
||
Comment on attachment 8720109 [details]
MozReview Request: Bug 1245953 - Convert TaskCluster docs to Sphinx; r?garndt
https://reviewboard.mozilla.org/r/35207/#review31901
Attachment #8720109 -
Flags: review?(garndt) → review+
Comment 56•9 years ago
|
||
Comment on attachment 8716128 [details]
MozReview Request: Bug 1245953 - Support defining non-build/test tasks; r?garndt
https://reviewboard.mozilla.org/r/33729/#review31905
::: testing/taskcluster/taskcluster_graph/commit_parser.py:336
(Diff revision 3)
> + # args.jobs == None implies all taks.
'tasks'
Attachment #8716128 -
Flags: review?(garndt) → review+
Updated•9 years ago
|
Attachment #8718685 -
Flags: review?(garndt) → review+
Comment 57•9 years ago
|
||
Comment on attachment 8718685 [details]
MozReview Request: Bug 1245953 - Support for only running tasks when certain files change; r?garndt
https://reviewboard.mozilla.org/r/34701/#review31909
Assignee | ||
Comment 58•9 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/0f1977ad30c69b233e837f275b9e8c20b8b148d7
Bug 1245953 - Rename query_pushinfo to query_vcsinfo; r=garndt
https://hg.mozilla.org/integration/mozilla-inbound/rev/870ae50e413d371973492883c690f690f370aa0a
Bug 1245953 - Use requests for performing HTTP request; r=garndt
https://hg.mozilla.org/integration/mozilla-inbound/rev/2ae135674c0bc62796dc907700bce1223a850851
Bug 1245953 - Fail fast if no VCS info defined; r=garndt
https://hg.mozilla.org/integration/mozilla-inbound/rev/c3734f50f9468a50d429dff5d8ea9bf54a0a5e78
Bug 1245953 - Rename "push info" to "vcs info"; r=garndt
https://hg.mozilla.org/integration/mozilla-inbound/rev/1a4d474faefbf08fb9a5aab1cd1d713d68314d44
Bug 1245953 - Query automationrelevance API instead of pushlog; r=garndt
https://hg.mozilla.org/integration/mozilla-inbound/rev/7b676d94c9dd9f63491fa5683dd433e955d14404
Bug 1245953 - Print info on commits influencing scheduling; r=garndt
https://hg.mozilla.org/integration/mozilla-inbound/rev/547e0c23071ebab8bde38d7ec354dff3a4dacdad
Bug 1245953 - Convert TaskCluster docs to Sphinx; r=garndt
https://hg.mozilla.org/integration/mozilla-inbound/rev/623765c2381e49126a9768ba2f597edec7763ee6
Bug 1245953 - Support defining non-build/test Task Cluster tasks; r=garndt
https://hg.mozilla.org/integration/mozilla-inbound/rev/eee2e3b43fc1441709d940a62a59e38e47104b73
Bug 1245953 - Support for only running tasks when certain files change; r=garndt
Assignee | ||
Comment 59•9 years ago
|
||
I'd like to again state for the record that the new generic tasks feature is very half baked and is only intended to be a stop-gap until the imminent in-tree tasks rewrite occurs. Depending on timelines, I may need to add additional features to generic tasks to facilitate bug 1245969.
Comment 60•9 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/0f1977ad30c6
https://hg.mozilla.org/mozilla-central/rev/870ae50e413d
https://hg.mozilla.org/mozilla-central/rev/2ae135674c0b
https://hg.mozilla.org/mozilla-central/rev/c3734f50f946
https://hg.mozilla.org/mozilla-central/rev/1a4d474faefb
https://hg.mozilla.org/mozilla-central/rev/7b676d94c9dd
https://hg.mozilla.org/mozilla-central/rev/547e0c23071e
https://hg.mozilla.org/mozilla-central/rev/623765c2381e
https://hg.mozilla.org/mozilla-central/rev/eee2e3b43fc1
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED