Closed
Bug 1288993
Opened 8 years ago
Closed 8 years ago
Run valgrind-mochitest twice a day as a Tier 2 job
Categories
(Release Engineering :: Applications: MozharnessCore, defect)
Tracking
(firefox54 fixed)
RESOLVED
FIXED
People
(Reporter: n.nethercote, Assigned: jmaher)
References
Details
Attachments
(2 files, 2 obsolete files)
(deleted), patch | dustin: review+
(deleted), patch | dustin: review+
We want to run the valgrind-mochitest job twice a day. It won't be a failures-are-cause-for-backout job, because it doesn't run on every push. That means it'll probably be hidden by default. But it'll be a big improvement on the current situation which is that jseward runs mochitest under Valgrind on his own machine every once in a while -- the results will be more frequent and visible to anyone who cares to look.
Reporter | ||
Updated•8 years ago
|
Summary: Run the valgrind-mochitest job twice a day → Run valgrind-mochitest twice a day as a Tier 2 job
Reporter | ||
Updated•8 years ago
|
Assignee: nobody → jseward
Comment 1•8 years ago
|
||
jonasfj, jseward, we spoke about this in London. I believe there was some talk about a cron-like feature on TaskCluster.
Comment 2•8 years ago
|
||
That's probably the hooks service. https://tools.taskcluster.net/hooks/
Comment 3•8 years ago
|
||
Possibly related to https://github.com/mozilla/ouija/issues/186
Assignee | ||
Comment 5•8 years ago
|
||
Now that cron.yml is hooked up, let's work on this.
Assignee | ||
Comment 6•8 years ago
|
||
I am happy to take this bug. I know we want this twice/day, and I'm not sure whether depending on the nightlies is a good idea, but this should get us started. Please let me know how to test this or what else I should do.
Comment 7•8 years ago
|
||
Comment on attachment 8832958 [details] [diff] [review]
run mochitest-valgrind on m-c on the nightly builds
Review of attachment 8832958 [details] [diff] [review]:
-----------------------------------------------------------------
::: .cron.yml
@@ +29,5 @@
>
> + - name: nightly-mochitest-valgrind
> + job:
> + type: decision-task
> + treeherder-symbol: tc-M-V()
That looks like a group name with no symbol - does that do something special in TreeHerder?
I suspect something like Vg would be better; it will appear on the decision task row in treeherder.
@@ +30,5 @@
> + - name: nightly-mochitest-valgrind
> + job:
> + type: decision-task
> + treeherder-symbol: tc-M-V()
> + triggered-by: nightly
This needs some work still, but I don't think you want --triggered-by=nightly here -- you just want a "regular" decision task, only with a target tasks method
@@ +35,5 @@
> + target-tasks-method: mochitest_valgrind
> + projects:
> + - mozilla-central
> + when:
> + - {hour: 16, minute: 0}
It would probably be good to run this at a different time from the nightlies, just to get a more even task load.
::: taskcluster/taskgraph/target_tasks.py
@@ +138,5 @@
> return [l for l in filtered_for_project if filter(full_task_graph[l])]
>
>
> +@_target_task('mochitest_valgrind')
> +def target_tasks_valgrind(full_task_graph, parameters):
This is great -- exactly how target task methods were intended :)
Attachment #8832958 -
Flags: feedback?(dustin) → feedback+
Assignee | ||
Comment 8•8 years ago
|
||
Thanks for the feedback. I have adjusted this, and I believe what I have is more in line with a final solution. Please r- if there are nits or if I am doing something wrong.
Attachment #8832958 -
Attachment is obsolete: true
Attachment #8833291 -
Flags: review?(dustin)
Comment 9•8 years ago
|
||
Comment on attachment 8833291 [details] [diff] [review]
run mochitest-valgrind twice/day
Review of attachment 8833291 [details] [diff] [review]:
-----------------------------------------------------------------
::: .cron.yml
@@ +29,5 @@
> + job:
> + type: decision-task
> + treeherder-symbol: Vg
> + target-tasks-method: mochitest_valgrind
> + projects:
This is `run-on-projects` now (but the format is the same)
Attachment #8833291 -
Flags: review?(dustin) → review+
Comment 10•8 years ago
|
||
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/b8370948ee4a
Run valgrind-mochitest twice a day as a Tier 2 job. r=dustin
Comment 11•8 years ago
|
||
bugherder |
Comment 12•8 years ago
|
||
https://treeherder.mozilla.org/logviewer.html#?job_id=75018174&repo=mozilla-central
[task 2017-02-07T04:01:56.233593Z] Traceback (most recent call last):
[task 2017-02-07T04:01:56.233644Z] File "/home/worker/checkouts/gecko/taskcluster/mach_commands.py", line 165, in taskgraph_decision
[task 2017-02-07T04:01:56.233690Z] return taskgraph.decision.taskgraph_decision(options)
[task 2017-02-07T04:01:56.233745Z] File "/home/worker/checkouts/gecko/taskcluster/taskgraph/decision.py", line 106, in taskgraph_decision
[task 2017-02-07T04:01:56.233794Z] create_tasks(tgg.optimized_task_graph, tgg.label_to_taskid, parameters)
[task 2017-02-07T04:01:56.233847Z] File "/home/worker/checkouts/gecko/taskcluster/taskgraph/create.py", line 76, in create_tasks
[task 2017-02-07T04:01:56.233874Z] f.result()
[task 2017-02-07T04:01:56.233947Z] File "/home/worker/checkouts/gecko/python/futures/concurrent/futures/_base.py", line 398, in result
[task 2017-02-07T04:01:56.233983Z] return self.__get_result()
[task 2017-02-07T04:01:56.234036Z] File "/home/worker/checkouts/gecko/python/futures/concurrent/futures/thread.py", line 55, in run
[task 2017-02-07T04:01:56.234086Z] result = self.fn(*self.args, **self.kwargs)
[task 2017-02-07T04:01:56.234146Z] File "/home/worker/checkouts/gecko/taskcluster/taskgraph/create.py", line 108, in create_task
[task 2017-02-07T04:01:56.234180Z] res.raise_for_status()
[task 2017-02-07T04:01:56.234234Z] File "/home/worker/checkouts/gecko/python/requests/requests/models.py", line 840, in raise_for_status
[task 2017-02-07T04:01:56.234274Z] raise HTTPError(http_error_msg, response=self)
[task 2017-02-07T04:01:56.234330Z] HTTPError: 409 Client Error: Conflict for url: http://taskcluster/queue/v1/task/XYQVC7MnQA2wZSjl949hXg
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 13•8 years ago
|
||
I had thought these changes were backed out, but in dxr and my local mozilla-inbound checkout I see all code from the patch, and on mozilla-central, I can see the Vg job:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=Gecko%20Decision%20Task%20opt%20Decision%20task%20for%20cron%20job%20nightly-mochitest-valgrind%20cron(Vg)&selectedJob=75391724
I think next up is getting more tests running under Vg. In looking at the Vg task, I don't see mochitest-valgrind tests running?
We have this transform:
https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/target_tasks.py#150
:dustin, could you help shed light on why you think we run the Vg task, but not the mochitest-valgrind tests?
Status: REOPENED → ASSIGNED
Flags: needinfo?(dustin)
Comment 14•8 years ago
|
||
from https://bugzilla.mozilla.org/show_bug.cgi?id=1339148:
for example code coverage and valgrind ran on this task:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=4ec373fafebf79846cd5fde0561ac02fa0bb9647&filter-searchStr=cron&group_state=expanded
valgrind is defined in cron.yml:
https://dxr.mozilla.org/mozilla-central/source/.cron.yml#42
which calls:
target-tasks-method: mochitest_valgrind
and the transform is here:
https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/target_tasks.py#152
and the definitions of the tests are here:
https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/test/tests.yml#808
Comment 16•8 years ago
|
||
Ugh, I mid-aired myself, having written out a long explanation. Here's what I did, in summary: looked at the decision task (Vg), and at the logs, to see that the target task method filtered out all but four tasks, so it is probably to blame. Then I found a valgrind task in full-task-graph.json, and looked at its attributes. I mentally executed the filter in the target task method against those attributes, and noted that the `unittest_suite` attribute is not what you want (it is "mochitest" in this case). I think you want to check both `unittest_suite` and `unittest_flavor`.
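The inspection step described above (finding the valgrind task and reading its attributes) can be sketched in Python. This is a minimal sketch, assuming the task graph JSON parses to a mapping from task label to an object with an `attributes` key. The helper name `find_tasks` and the sample labels are hypothetical stand-ins, not copied from the real graph; only the `unittest_suite` value "mochitest" comes from this comment.

```python
import json


def find_tasks(graph, needle):
    """Return {label: attributes} for every task whose label contains needle."""
    return {
        label: task.get('attributes', {})
        for label, task in graph.items()
        if needle in label
    }


# Hypothetical stand-in for the parsed contents of full-task-graph.json.
sample_graph = {
    'test-linux64/opt-mochitest-valgrind-1': {
        'attributes': {'kind': 'test', 'unittest_suite': 'mochitest'},
    },
    'build-linux64/opt': {'attributes': {'kind': 'build'}},
}

matches = find_tasks(sample_graph, 'valgrind')
for label, attributes in sorted(matches.items()):
    # Print one line per matching task so the attributes can be compared
    # against what the target task method's filter expects.
    print(label, json.dumps(attributes, sort_keys=True))
```

Reading the attributes this way makes it easier to see which attribute actually carries the "valgrind" distinction before writing the filter.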
Flags: needinfo?(dustin)
Assignee | ||
Comment 17•8 years ago
|
||
thanks for the pointer :dustin. I believe I have this correct for valgrind and code coverage, so please review and let me know what you think.
Attachment #8836841 -
Flags: review?(dustin)
Comment 18•8 years ago
|
||
Comment on attachment 8836841 [details] [diff] [review]
use proper taskcluster attributes to define cron tasks
Review of attachment 8836841 [details] [diff] [review]:
-----------------------------------------------------------------
I think this *might* work, but can definitely be clearer (at least clear enough that it's not uncertain whether it would work..)
::: taskcluster/taskgraph/target_tasks.py
@@ +157,5 @@
> # only select platforms
> if platform not in ['linux64']:
> return False
> + if task.attributes.get('unittest_suite') or \
> + task.attributes.get('unittest_flavor'):
Is this conditional meant to guard against KeyError in the accesses below? If so, it should be "and" not "or".
@@ +158,5 @@
> if platform not in ['linux64']:
> return False
> + if task.attributes.get('unittest_suite') or \
> + task.attributes.get('unittest_flavor'):
> + if not (task.attributes['unittest_suite'].startswith('mochitest-valgrind') or
No suite names start with mochitest-valgrind. Check out the full-task-graph.json and find the valgrind test to see what attributes it has (I think the suite is `mochitest`, but double-check me)
@@ +159,5 @@
> return False
> + if task.attributes.get('unittest_suite') or \
> + task.attributes.get('unittest_flavor'):
> + if not (task.attributes['unittest_suite'].startswith('mochitest-valgrind') or
> + task.attributes['unittest_flavor'].startswith('mochitest-valgrind')):
This will probably end up accidentally doing what you want, since you combine these with "or", and since no non-mochitest suites have a flavor named `mochitest-valgrind`.
@@ +169,5 @@
> @_target_task('nightly_code_coverage')
> def target_tasks_code_coverage(full_task_graph, parameters):
> """Target tasks that generate coverage data."""
> def filter(task):
> + platform = task.attributes.get('test_platform')
I can't tell what the diff is in this hunk....
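The "and" versus "or" point in this review can be illustrated with a small Python sketch. It is a simplified mirror of the reviewed conditional, not the patch itself: the function names are hypothetical, attributes are modelled as a plain dict, and the suite and flavor values in the second function are taken from the filter quoted later in this bug.

```python
def filter_with_or_guard(attributes):
    """Simplified mirror of the reviewed conditional: the guard uses "or"."""
    if attributes.get('unittest_suite') or attributes.get('unittest_flavor'):
        # Direct indexing raises KeyError when only one of the keys exists.
        return (attributes['unittest_suite'].startswith('mochitest-valgrind') or
                attributes['unittest_flavor'].startswith('mochitest-valgrind'))
    return False


def filter_with_defaults(attributes):
    """Use .get() with a default so a missing key can never raise."""
    suite = attributes.get('unittest_suite', '')
    flavor = attributes.get('unittest_flavor', '')
    return suite.startswith('mochitest') and flavor.startswith('valgrind-plain')


only_suite = {'unittest_suite': 'mochitest'}

try:
    filter_with_or_guard(only_suite)
    raised = False
except KeyError:
    raised = True

print(raised)
print(filter_with_defaults(only_suite))
```

With `.get(key, '')` the guard becomes unnecessary, which is also easier to reason about during review.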
Attachment #8836841 -
Flags: review?(dustin) → review-
Assignee | ||
Comment 19•8 years ago
|
||
After discussing over Vidyo, I understand more of what I am doing. I verified this with data from the task-graph.json I generated on the try server:
https://public-artifacts.taskcluster.net/KQ_z07M7Qo-xCesUkJuk7g/0/public/task-graph.json
Attachment #8836841 -
Attachment is obsolete: true
Attachment #8838622 -
Flags: review?(dustin)
Comment 20•8 years ago
|
||
Comment on attachment 8838622 [details] [diff] [review]
use proper taskcluster attributes to define cron tasks
Review of attachment 8838622 [details] [diff] [review]:
-----------------------------------------------------------------
What's got two thumbs and likes this patch?
Attachment #8838622 -
Flags: review?(dustin) → review+
Comment 21•8 years ago
|
||
Pushed by jmaher@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/bb77e8d293e0
adjust target tasks to use correct taskcluster attributes. r=dustin
Comment 22•8 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 23•8 years ago
|
||
this is deployed and I assume working properly (the code coverage cron task is). I do not see any test jobs related to valgrind, is it possible there are none ready?
Flags: needinfo?(jseward)
Comment 24•8 years ago
|
||
(In reply to Joel Maher ( :jmaher) from comment #23)
> this is deployed and I assume working properly
Joel, that's great to hear.
> I do not see any test jobs related to valgrind, is it possible there
> are none ready?
I am not sure what I need to provide here in order to complete the
picture. Currently I have it that if you push to try with the
syntax "-b o -p linux64 -u mochitest-valgrind -t none", you get a
v/mochi run, which shows up in the usual way in Treeherder, and
that is what I'd hoped to have auto-run.
There is an entry in .cron.yml that looks plausible:
    - name: nightly-mochitest-valgrind
      job:
          type: decision-task
          treeherder-symbol: Vg
          target-tasks-method: mochitest_valgrind
      run-on-projects:
          - mozilla-central
      when:
          - {hour: 16, minute: 0}
          - {hour: 4, minute: 0}
Is that what you were looking for, or something else?
Sorry to be so vague about this.
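The two `when` entries in the .cron.yml excerpt above amount to a run roughly every 12 hours. The following Python sketch computes the next scheduled run from such a list. It is a simplified illustration and not how the cron machinery actually evaluates `when`; the function name `next_run` is hypothetical, and the hours are assumed to be UTC (the 04:01Z timestamp in comment 12 is consistent with that).

```python
from datetime import datetime, timedelta

# The schedule from the .cron.yml excerpt, as plain dicts.
WHEN = [{'hour': 16, 'minute': 0}, {'hour': 4, 'minute': 0}]


def next_run(when, now):
    """Return the earliest scheduled datetime strictly after now."""
    candidates = []
    for entry in when:
        run = now.replace(hour=entry['hour'], minute=entry['minute'],
                          second=0, microsecond=0)
        if run <= now:
            # Today's slot has already passed, so use tomorrow's.
            run += timedelta(days=1)
        candidates.append(run)
    return min(candidates)


now = datetime(2017, 2, 7, 4, 1, 56)
print(next_run(WHEN, now))
```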
Flags: needinfo?(jseward) → needinfo?(jmaher)
Assignee | ||
Comment 25•8 years ago
|
||
so this entry is supposed to trigger a valgrind build and/or all tests (i.e. mochitest-valgrind). This is defined here:
https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/target_tasks.py#152
@_target_task('mochitest_valgrind')
def target_tasks_valgrind(full_task_graph, parameters):
    """Target tasks that only run on the cedar branch."""
    def filter(task):
        platform = task.attributes.get('test_platform')
        if platform not in ['linux64']:
            return False
        if task.attributes.get('unittest_suite', '').startswith('mochitest') and \
           task.attributes.get('unittest_flavor', '').startswith('valgrind-plain'):
            return True
        return False
    return [l for l, t in full_task_graph.tasks.iteritems() if filter(t)]
and my assumption was that this would launch:
https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/test/tests.yml#761
I guess my question is- are we expecting those tests to be run twice/day? If not, then we have more work to do.
Flags: needinfo?(jmaher)
Comment 26•8 years ago
|
||
(In reply to Joel Maher ( :jmaher) from comment #25)
> I guess my question is- are we expecting those tests to be run twice/day?
> If not, then we have more work to do.
I am lost, unfortunately. Can we talk on irc?
Comment 27•8 years ago
|
||
(In reply to Joel Maher ( :jmaher) from comment #25)
> I guess my question is- are we expecting those tests to be run twice/day?
Yes, it is those. Although they are a few lines further down the file now:
https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/test/tests.yml#773
Assignee | ||
Comment 28•8 years ago
|
||
I am not clear why this isn't working. For example we have a mochitest-valgrind-1 definition here:
https://public-artifacts.taskcluster.net/H9uSK3PmSF2hnA55WX_ygg/0/public/task-graph.json
attributes
build_platform "linux64"
build_type "opt"
e10s false
kind "test"
run_on_projects
test_chunk "1"
test_platform "linux64"
unittest_flavor "valgrind-plain"
unittest_suite "mochitest"
unittest_try_name "mochitest-valgrind"
and in our cron.yml target task that we call ( https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/target_tasks.py#152 ):
@_target_task('mochitest_valgrind')
def target_tasks_valgrind(full_task_graph, parameters):
    """Target tasks that only run on the cedar branch."""
    def filter(task):
        platform = task.attributes.get('test_platform')
        if platform not in ['linux64']:
            return False
        if task.attributes.get('unittest_suite', '').startswith('mochitest') and \
           task.attributes.get('unittest_flavor', '').startswith('valgrind-plain'):
            return True
        return False
    return [l for l, t in full_task_graph.tasks.iteritems() if filter(t)]
I really do not know why these are not scheduled, for example:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=106a96755d3bcebe64bbbc3b521d65d262ba9c02&filter-searchStr=cron%20valgrind
:dustin, can you see anything that is going wrong here?
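One way to check the filter in isolation is to run it against the attributes listed above. The sketch below copies the filter body from this comment and feeds it stand-in task objects. `Task` is a hypothetical namedtuple rather than the real taskgraph class, the second task is an invented non-valgrind example, and `run_on_projects` is omitted because its value is not shown here.

```python
from collections import namedtuple

# Hypothetical stand-in for a taskgraph task: only the fields the filter reads.
Task = namedtuple('Task', ['label', 'attributes'])


def valgrind_filter(task):
    """Same logic as the nested filter() in target_tasks_valgrind."""
    platform = task.attributes.get('test_platform')
    if platform not in ['linux64']:
        return False
    if task.attributes.get('unittest_suite', '').startswith('mochitest') and \
       task.attributes.get('unittest_flavor', '').startswith('valgrind-plain'):
        return True
    return False


tasks = {
    # Attributes as listed for mochitest-valgrind-1 in this comment.
    'mochitest-valgrind-1': Task('mochitest-valgrind-1', {
        'build_platform': 'linux64',
        'build_type': 'opt',
        'e10s': False,
        'kind': 'test',
        'test_chunk': '1',
        'test_platform': 'linux64',
        'unittest_flavor': 'valgrind-plain',
        'unittest_suite': 'mochitest',
        'unittest_try_name': 'mochitest-valgrind',
    }),
    # Invented counter-example: a plain mochitest that should be rejected.
    'mochitest-plain-1': Task('mochitest-plain-1', {
        'test_platform': 'linux64',
        'unittest_flavor': 'plain',
        'unittest_suite': 'mochitest',
    }),
}

selected = [label for label, task in sorted(tasks.items()) if valgrind_filter(task)]
print(selected)
```

If the filter selects the valgrind task here, the target task method itself is probably not the reason the jobs are missing from view.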
Flags: needinfo?(dustin)
Updated•8 years ago
|
Status: RESOLVED → REOPENED
Flags: needinfo?(dustin)
Resolution: FIXED → ---
Updated•8 years ago
|
Flags: needinfo?(dustin)
Comment 29•8 years ago
|
||
It actually did run those jobs
https://tools.taskcluster.net/task-group-inspector/#/NK8dGPYGR9WOXbJrmfesfQ?_k=ybsuyh
what's not clear is, why they didn't show up in treeherder.
https://queue.taskcluster.net/v1/task/ce1Plsk2RjCDtNqB2DW6vQ has
"routes": [
  "tc-treeherder.v2.mozilla-central.106a96755d3bcebe64bbbc3b521d65d262ba9c02.-1",
  "tc-treeherder-stage.v2.mozilla-central.106a96755d3bcebe64bbbc3b521d65d262ba9c02.-1"
],
and
"extra": {
  "treeherder": {
    "jobKind": "test",
    "groupSymbol": "tc-M-V",
    "collection": {
      "opt": true
    },
    "machine": {
      "platform": "linux64"
    },
    "groupName": "Mochitests on Valgrind executed by TaskCluster",
    "tier": 1,
    "symbol": "10"
  }
the decision task (the Vg that does show up) has
"routes": [
  "index.gecko.v2.mozilla-central.latest.firefox.decision", // not relevant to TH
  "tc-treeherder.v2.mozilla-central.106a96755d3bcebe64bbbc3b521d65d262ba9c02.-1",
  "tc-treeherder-stage.v2.mozilla-central.106a96755d3bcebe64bbbc3b521d65d262ba9c02.-1"
],
and
"extra": {
  "treeherder": {
    "symbol": "Vg",
    "groupSymbol": "cron"
  }
}
Greg, based on what you know of the TH integration, can you see why those might not have shown up? I tried turning off exclusions, etc., and no luck.
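For comparing the routes on the two tasks, it can help to split them into their components. The Python sketch below does that for the route strings quoted above. The field names are an assumption based on the visible shape of the route (root, version, project, revision, and a trailing push identifier, here -1); the helper `parse_treeherder_route` is hypothetical and not part of any TaskCluster library.

```python
def parse_treeherder_route(route):
    """Split a dotted tc-treeherder route into named components.

    The meaning assigned to each position is an assumption inferred from
    the routes quoted in this bug.
    """
    root, version, project, revision, push_id = route.split('.')
    return {
        'root': root,
        'version': version,
        'project': project,
        'revision': revision,
        'push_id': int(push_id),
    }


test_route = ('tc-treeherder.v2.mozilla-central.'
              '106a96755d3bcebe64bbbc3b521d65d262ba9c02.-1')
parsed = parse_treeherder_route(test_route)
print(parsed['project'], parsed['revision'], parsed['push_id'])
```

Parsed this way, the test task and the decision task carry the same project and revision, so the routes alone do not explain a difference in visibility.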
Flags: needinfo?(dustin) → needinfo?(garndt)
Comment 30•8 years ago
|
||
I turned off the "excluded jobs" filter and see it:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=106a96755d3bcebe64bbbc3b521d65d262ba9c02&filter-searchStr=linux64%20tc-m-v&group_state=expanded&filter-tier=1&filter-tier=2&filter-tier=3&selectedJob=80414218&exclusion_profile=false
Flags: needinfo?(garndt)
Assignee | ||
Comment 31•8 years ago
|
||
Odd, I tried that, and it looks like dustin tried that too. Either way, this is working.
:jseward, do you have what you need here? Is there a plan for getting these green and showing again?
Flags: needinfo?(jseward)
Comment 32•8 years ago
|
||
(In reply to Joel Maher ( :jmaher) from comment #31)
Great!
> :jseward, do you have what you need here?
Nearly! One more question: how do I find these URLs? Is there a
way for me to see all the valgrind runs and nothing else?
> Is there a plan for getting these green and showing again?
There are 3 sources of failure:
(1) Timeouts caused by valgrind. I looked at these a while back and can
get back to them.
(2) Errors reported by valgrind. I can fix the real ones and suppress
the false ones, and have slowly been doing so.
(3) Failures that would have occurred anyway (running natively). These
are a bit of a problem because there's no easy way to distinguish
them from (2) without having to look at all the failing chunks
-- in both cases they go orange. Ideally they could be a different
colour. I filed bug 1341406 about that.
Flags: needinfo?(jseward) → needinfo?(jmaher)
Assignee | ||
Comment 33•8 years ago
|
||
here is a link to help you:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=valgrind&exclusion_profile=false
I go to:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central
search for 'valgrind'
then click the 'excluded jobs' which allows you to see the results.
Thanks for the information about the greening up! These are getting greener :)
Please close this if you feel that there is nothing else to do here.
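The search link above follows a simple pattern: a repo, a search string, and `exclusion_profile=false` to show excluded jobs. A minimal Python sketch for building such links is shown here. The function name `treeherder_search_url` is hypothetical, and the query parameters are only the ones visible in the URLs in this bug.

```python
from urllib.parse import urlencode


def treeherder_search_url(search, repo='mozilla-central', show_excluded=True):
    """Build a Treeherder jobs URL filtered by a search string."""
    params = {'repo': repo, 'filter-searchStr': search}
    if show_excluded:
        # Matches the effect of clicking 'excluded jobs' in the UI.
        params['exclusion_profile'] = 'false'
    return 'https://treeherder.mozilla.org/#/jobs?' + urlencode(params)


print(treeherder_search_url('valgrind'))
```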
Flags: needinfo?(jmaher)
Comment 34•8 years ago
|
||
(In reply to Joel Maher ( :jmaher) from comment #33)
Joel, Dustin, thank you for doing this!
A perhaps better URL can be constructed by searching for the "tc-M-V"
string:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=tc-M-V&exclusion_profile=false
So it seems to work. But I noticed just now an interesting anomaly,
which you can see at (eg)
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=eb23648534779c110f3a1f2baae1849ae4a9c570&filter-searchStr=tc-M-V&exclusion_profile=false
Each run would normally display 40 chunk results inside the tc-M-V parentheses,
but this one -- and others I've seen -- display 80. At first, I thought
that each job had been run twice. But no, what seems to have happened is
that treeherder is displaying together the result of two different sets of
runs, one of which was requested at "Sat Mar 4, 17:04:04" and the other
at "Sun Mar 5, 5:02:55".
Is that expected? I assume this is somehow related to the fact that there
were no merges to m-c over the weekend (or at least in the interval
between the two abovementioned dates) and so the two builds are
regarded as identical.
Is it possible to fix this easily?
Flags: needinfo?(jmaher)
Comment 35•8 years ago
|
||
Treeherder indexes by revision, so if the jobs ran on the same revision, no, there is no distinction.
Assignee | ||
Comment 36•8 years ago
|
||
The problem here is that we are running twice/day, and when there are no pushes to m-c, it schedules tests on the most recent revision, which happens to be a duplicate. Would you like to go once/day? This is only a problem during low-volume periods.
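The doubling from 40 to 80 chunks follows directly from grouping by revision. The Python sketch below models that under simplified assumptions: jobs are plain dicts, `group_by_revision` is a hypothetical helper, and this is not Treeherder's actual data model. The revision and the two request times are the ones quoted in comment 34.

```python
from collections import defaultdict


def group_by_revision(jobs):
    """Group job records by revision, ignoring when they were requested."""
    grouped = defaultdict(list)
    for job in jobs:
        grouped[job['revision']].append(job)
    return grouped


REVISION = 'eb23648534779c110f3a1f2baae1849ae4a9c570'
CHUNKS = 40

# Two cron runs on the same revision, 40 chunks each.
jobs = [
    {'revision': REVISION, 'requested': requested, 'symbol': str(chunk)}
    for requested in ('Sat Mar 4, 17:04:04', 'Sun Mar 5, 5:02:55')
    for chunk in range(1, CHUNKS + 1)
]

grouped = group_by_revision(jobs)
print(len(grouped), len(grouped[REVISION]))
```

Both runs land in the same group because nothing in the grouping key distinguishes them.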
Flags: needinfo?(jmaher)
Comment 37•8 years ago
|
||
(In reply to Joel Maher ( :jmaher) from comment #36)
> [..] Would you like to go once/day?
No, it's fine like it is. I understand why this is happening, it's no
big deal, and I'd prefer to stick with the more-or-less 12 hour latency.
So I'm going to close. I think we're done here. Thank you for your
collective efforts.
Status: REOPENED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED