Closed Bug 1288993 Opened 8 years ago Closed 8 years ago

Run valgrind-mochitest twice a day as a Tier 2 job

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

defect
Not set
normal

Tracking

(firefox54 fixed)

RESOLVED FIXED
Tracking Status
firefox54 --- fixed

People

(Reporter: n.nethercote, Assigned: jmaher)

References

Details

Attachments

(2 files, 2 obsolete files)

We want to run the valgrind-mochitest job twice a day. It won't be a failures-are-cause-for-backout job, because it doesn't run on every push. That means it'll probably be hidden by default. But it'll be a big improvement on the current situation which is that jseward runs mochitest under Valgrind on his own machine every once in a while -- the results will be more frequent and visible to anyone who cares to look.
Summary: Run the valgrind-mochitest job twice a day → Run valgrind-mochitest twice a day as a Tier 2 job
Assignee: nobody → jseward
jonasfj, jseward, we spoke about this in London. I believe there was some talk about a cron-like feature on TaskCluster.
That's probably the hooks service. https://tools.taskcluster.net/hooks/
Blocks: 1289646
now that cron.yml is hooked up, let's work on this.
I am happy to take this bug. I know we want this twice/day; I'm not sure depending on the nightlies is a good idea, but this should get us started. Please let me know how to test this or what else I should do.
Assignee: jseward → jmaher
Status: NEW → ASSIGNED
Attachment #8832958 - Flags: feedback?(dustin)
Comment on attachment 8832958 [details] [diff] [review]
run mochitest-valgrind on m-c on the nightly builds

Review of attachment 8832958 [details] [diff] [review]:
-----------------------------------------------------------------

::: .cron.yml
@@ +29,5 @@
>
> +    - name: nightly-mochitest-valgrind
> +      job:
> +          type: decision-task
> +          treeherder-symbol: tc-M-V()

That looks like a group name with no symbol - does that do something special in TreeHerder? I suspect something like Vg would be better; it will appear on the decision task row in treeherder.

@@ +30,5 @@
> +    - name: nightly-mochitest-valgrind
> +      job:
> +          type: decision-task
> +          treeherder-symbol: tc-M-V()
> +          triggered-by: nightly

This needs some work still, but I don't think you want --triggered-by=nightly here -- you just want a "regular" decision task, only with a target tasks method.

@@ +35,5 @@
> +          target-tasks-method: mochitest_valgrind
> +      projects:
> +          - mozilla-central
> +      when:
> +          - {hour: 16, minute: 0}

It would probably be good to run this at a different time from the nightlies, just to get a more even task load.

::: taskcluster/taskgraph/target_tasks.py
@@ +138,5 @@
>      return [l for l in filtered_for_project if filter(full_task_graph[l])]
>
>
> +@_target_task('mochitest_valgrind')
> +def target_tasks_valgrind(full_task_graph, parameters):

This is great -- exactly how target task methods were intended :)
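The review above relies on a name-based dispatch: the cron entry only names a `target-tasks-method`, and a decorator registers the Python function under that name. The following is a minimal, hypothetical sketch of that pattern, not the actual code in taskcluster/taskgraph/target_tasks.py; the registry dict, `get_method`, and the task labels are assumptions, and the label-substring filter is a placeholder for the attribute-based filter discussed later in this bug.

```python
from typing import Callable, Dict, List

# Hypothetical stand-in for the registry behind @_target_task: maps a
# `target-tasks-method` name from .cron.yml to a selection function.
_target_task_methods: Dict[str, Callable] = {}


def _target_task(name: str) -> Callable:
    """Decorator that registers a target-task method under `name`."""
    def wrap(func: Callable) -> Callable:
        _target_task_methods[name] = func
        return func
    return wrap


def get_method(name: str) -> Callable:
    """Look up a registered method; raises KeyError for unknown names."""
    return _target_task_methods[name]


@_target_task('mochitest_valgrind')
def target_tasks_valgrind(full_task_graph: Dict[str, dict], parameters: dict) -> List[str]:
    # Placeholder selection by label substring; the real method filters on
    # task attributes instead.
    return [label for label in full_task_graph if 'valgrind' in label]


# A cron entry only names the method; the decision task resolves it at run time.
cron_job = {
    'name': 'nightly-mochitest-valgrind',
    'job': {'type': 'decision-task',
            'treeherder-symbol': 'Vg',
            'target-tasks-method': 'mochitest_valgrind'},
}

# Hypothetical task labels, for illustration only.
full_task_graph = {
    'build-linux64/opt': {},
    'test-linux64/opt-mochitest-valgrind-1': {},
    'test-linux64/opt-mochitest-1': {},
}

method = get_method(cron_job['job']['target-tasks-method'])
print(method(full_task_graph, parameters={}))
```

Keeping the method name as a string in the cron config means the YAML never has to import Python; an unknown name fails loudly with a KeyError at decision time.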
Attachment #8832958 - Flags: feedback?(dustin) → feedback+
thanks for the feedback. I have adjusted this, and I believe what I have is more in line with a final solution; please r- if there are nits or if I am doing something wrong.
Attachment #8832958 - Attachment is obsolete: true
Attachment #8833291 - Flags: review?(dustin)
Comment on attachment 8833291 [details] [diff] [review]
run mochitest-valgrind twice/day

Review of attachment 8833291 [details] [diff] [review]:
-----------------------------------------------------------------

::: .cron.yml
@@ +29,5 @@
> +      job:
> +          type: decision-task
> +          treeherder-symbol: Vg
> +          target-tasks-method: mochitest_valgrind
> +      projects:

This is `run-on-projects` now (but the format is the same)
Attachment #8833291 - Flags: review?(dustin) → review+
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/b8370948ee4a Run valgrind-mochitest twice a day as a Tier 2 job. r=dustin
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
https://treeherder.mozilla.org/logviewer.html#?job_id=75018174&repo=mozilla-central

[task 2017-02-07T04:01:56.233593Z] Traceback (most recent call last):
[task 2017-02-07T04:01:56.233644Z]   File "/home/worker/checkouts/gecko/taskcluster/mach_commands.py", line 165, in taskgraph_decision
[task 2017-02-07T04:01:56.233690Z]     return taskgraph.decision.taskgraph_decision(options)
[task 2017-02-07T04:01:56.233745Z]   File "/home/worker/checkouts/gecko/taskcluster/taskgraph/decision.py", line 106, in taskgraph_decision
[task 2017-02-07T04:01:56.233794Z]     create_tasks(tgg.optimized_task_graph, tgg.label_to_taskid, parameters)
[task 2017-02-07T04:01:56.233847Z]   File "/home/worker/checkouts/gecko/taskcluster/taskgraph/create.py", line 76, in create_tasks
[task 2017-02-07T04:01:56.233874Z]     f.result()
[task 2017-02-07T04:01:56.233947Z]   File "/home/worker/checkouts/gecko/python/futures/concurrent/futures/_base.py", line 398, in result
[task 2017-02-07T04:01:56.233983Z]     return self.__get_result()
[task 2017-02-07T04:01:56.234036Z]   File "/home/worker/checkouts/gecko/python/futures/concurrent/futures/thread.py", line 55, in run
[task 2017-02-07T04:01:56.234086Z]     result = self.fn(*self.args, **self.kwargs)
[task 2017-02-07T04:01:56.234146Z]   File "/home/worker/checkouts/gecko/taskcluster/taskgraph/create.py", line 108, in create_task
[task 2017-02-07T04:01:56.234180Z]     res.raise_for_status()
[task 2017-02-07T04:01:56.234234Z]   File "/home/worker/checkouts/gecko/python/requests/requests/models.py", line 840, in raise_for_status
[task 2017-02-07T04:01:56.234274Z]     raise HTTPError(http_error_msg, response=self)
[task 2017-02-07T04:01:56.234330Z] HTTPError: 409 Client Error: Conflict for url: http://taskcluster/queue/v1/task/XYQVC7MnQA2wZSjl949hXg
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
I had thought these changes were backed out, but in dxr and my local mozilla-inbound checkout I see all code from the patch, and on mozilla-central, I can see the Vg job: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=Gecko%20Decision%20Task%20opt%20Decision%20task%20for%20cron%20job%20nightly-mochitest-valgrind%20cron(Vg)&selectedJob=75391724 I think next up is getting more tests running under Vg. In looking at the Vg task, I don't see mochitest-valgrind tests running? We have this transform: https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/target_tasks.py#150 :dustin, could you help shed light on why you think we run the Vg task, but not the mochitest-valgrind tests?
Status: REOPENED → ASSIGNED
Flags: needinfo?(dustin)
Ugh, I mid-aired myself, having written out a long explanation. Here's what I did, in summary: looked at the decision task (Vg), and at the logs, to see that the target task method filtered out all but four tasks, so it is probably to blame. Then I found a valgrind task in full-task-graph.json, and looked at its attributes. I mentally executed the filter in the target task method against those attributes, and noted that the `unittest_suite` attribute is not what you want (it is "mochitest" in this case). I think you want to check both `unittest_suite` and `unittest_flavor`.
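The debugging step described above (find the valgrind task in full-task-graph.json and read its attributes) can be scripted. This is a small sketch under an assumption: that full-task-graph.json is a JSON object keyed by task label, with each value carrying an "attributes" object. The labels below are illustrative, not copied from a real graph.

```python
import json
from typing import Dict


def find_tasks(graph: Dict[str, dict], needle: str) -> Dict[str, dict]:
    """Return {label: attributes} for every task whose label contains `needle`."""
    return {label: task.get('attributes', {})
            for label, task in graph.items() if needle in label}


# Assumed file shape; labels are hypothetical.
sample = json.loads('''{
  "test-linux64/opt-mochitest-valgrind-1": {
    "attributes": {"test_platform": "linux64",
                   "unittest_suite": "mochitest",
                   "unittest_flavor": "valgrind-plain"}
  },
  "build-linux64/opt": {"attributes": {"build_platform": "linux64"}}
}''')

for label, attrs in find_tasks(sample, 'valgrind').items():
    print(label, attrs['unittest_suite'], attrs['unittest_flavor'])
    # The suite is plain "mochitest", so a check that only looks for a suite
    # starting with "mochitest-valgrind" can never match this task.
    print(attrs['unittest_suite'].startswith('mochitest-valgrind'))  # False
```

Against a real artifact, load the graph with `json.load(open('full-task-graph.json'))` and pass it to `find_tasks` in place of `sample`.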
Flags: needinfo?(dustin)
thanks for the pointer :dustin. I believe I have this correct for valgrind and code coverage, so please review and let me know what you think.
Attachment #8836841 - Flags: review?(dustin)
Comment on attachment 8836841 [details] [diff] [review]
use proper taskcluster attributes to define cron tasks

Review of attachment 8836841 [details] [diff] [review]:
-----------------------------------------------------------------

I think this *might* work, but can definitely be clearer (at least clear enough that it's not uncertain whether it would work..)

::: taskcluster/taskgraph/target_tasks.py
@@ +157,5 @@
>          # only select platforms
>          if platform not in ['linux64']:
>              return False
> +        if task.attributes.get('unittest_suite') or \
> +           task.attributes.get('unittest_flavor'):

Is this conditional meant to guard against KeyError in the accesses below? If so, it should be "and" not "or".

@@ +158,5 @@
>          if platform not in ['linux64']:
>              return False
> +        if task.attributes.get('unittest_suite') or \
> +           task.attributes.get('unittest_flavor'):
> +            if not (task.attributes['unittest_suite'].startswith('mochitest-valgrind') or

No suite names start with mochitest-valgrind. Check out the full-task-graph.json and find the valgrind test to see what attributes it has (I think the suite is `mochitest`, but double-check me)

@@ +159,5 @@
>              return False
> +        if task.attributes.get('unittest_suite') or \
> +           task.attributes.get('unittest_flavor'):
> +            if not (task.attributes['unittest_suite'].startswith('mochitest-valgrind') or
> +                    task.attributes['unittest_flavor'].startswith('mochitest-valgrind')):

This will probably end up accidentally doing what you want, since you combine these with "or", and since no non-mochitest suites have a flavor named `mochitest-valgrind`.

@@ +169,5 @@
>  @_target_task('nightly_code_coverage')
>  def target_tasks_code_coverage(full_task_graph, parameters):
>      """Target tasks that generate coverage data."""
>      def filter(task):
> +        platform = task.attributes.get('test_platform')

I can't tell what the diff is in this hunk....
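The "and" vs "or" point in the review can be checked in isolation. This sketch keeps the corrected suite/flavor predicate fixed and varies only the guard, so the KeyError is the only difference. The function names and the suite-only attribute dict are hypothetical; the third variant mirrors the `.get(key, '')` style that the later version of the filter uses.

```python
def guarded_with_or(attrs: dict) -> bool:
    # Shape of the reviewed patch: the block runs if EITHER key is present,
    # but both keys are then indexed directly.
    if attrs.get('unittest_suite') or attrs.get('unittest_flavor'):
        return (attrs['unittest_suite'].startswith('mochitest') and
                attrs['unittest_flavor'].startswith('valgrind-plain'))
    return False


def guarded_with_and(attrs: dict) -> bool:
    # Both keys must be present before either is indexed.
    if attrs.get('unittest_suite') and attrs.get('unittest_flavor'):
        return (attrs['unittest_suite'].startswith('mochitest') and
                attrs['unittest_flavor'].startswith('valgrind-plain'))
    return False


def with_defaults(attrs: dict) -> bool:
    # No guard needed: a missing key becomes '' and simply fails startswith().
    return (attrs.get('unittest_suite', '').startswith('mochitest') and
            attrs.get('unittest_flavor', '').startswith('valgrind-plain'))


suite_only = {'unittest_suite': 'mochitest'}  # hypothetical task with no flavor
print(guarded_with_and(suite_only), with_defaults(suite_only))  # False False
try:
    guarded_with_or(suite_only)
except KeyError as exc:
    print('KeyError:', exc)  # KeyError: 'unittest_flavor'
```

The defaulted form is arguably the clearest, since it removes the separate guard entirely.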
Attachment #8836841 - Flags: review?(dustin) → review-
after discussing over vidyo, I understand more of what I am doing. I verified this with data from the task-graph.json of a push I did to the try server: https://public-artifacts.taskcluster.net/KQ_z07M7Qo-xCesUkJuk7g/0/public/task-graph.json
Attachment #8836841 - Attachment is obsolete: true
Attachment #8838622 - Flags: review?(dustin)
Comment on attachment 8838622 [details] [diff] [review] use proper taskcluster attributes to define cron tasks Review of attachment 8838622 [details] [diff] [review]: ----------------------------------------------------------------- What's got two thumbs and likes this patch?
Attachment #8838622 - Flags: review?(dustin) → review+
Pushed by jmaher@mozilla.com: https://hg.mozilla.org/integration/mozilla-inbound/rev/bb77e8d293e0 adjust target tasks to use correct taskcluster attributes. r=dustin
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
this is deployed and I assume working properly (the code coverage cron task is). I do not see any test jobs related to valgrind, is it possible there are none ready?
Flags: needinfo?(jseward)
(In reply to Joel Maher ( :jmaher) from comment #23)
> this is deployed and I assume working properly

Joel, that's great to hear.

> I do not see any test jobs related to valgrind, is it possible there
> are none ready?

I am not sure what I need to provide here in order to complete the picture. Currently I have it that if you push to try with the syntax "-b o -p linux64 -u mochitest-valgrind -t none", you get a v/mochi run, which shows up in the usual way in Treeherder, and that is what I'd hoped to have auto-run.

There is an entry in .cron.yml that looks plausible:

    - name: nightly-mochitest-valgrind
      job:
          type: decision-task
          treeherder-symbol: Vg
          target-tasks-method: mochitest_valgrind
      run-on-projects:
          - mozilla-central
      when:
          - {hour: 16, minute: 0}
          - {hour: 4, minute: 0}

Is that what you were looking for, or something else? Sorry to be so vague about this.
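The two `when` entries above are 12 hours apart, which is what gives the "twice a day" cadence. Below is a hypothetical sketch of how such entries could be matched; it assumes (not confirmed in this bug) that times are UTC and that the cron scheduler wakes every 15 minutes and fires a job when the rounded-down time equals one of its entries. `should_run` and `granularity` are illustrative names, not the real cron implementation.

```python
from datetime import datetime, timezone
from typing import Dict, List

# The `when` list from the .cron.yml entry above (assumed to be UTC).
WHEN: List[Dict[str, int]] = [{'hour': 16, 'minute': 0}, {'hour': 4, 'minute': 0}]


def should_run(now: datetime, when: List[Dict[str, int]], granularity: int = 15) -> bool:
    """Hypothetical matcher: round `now` down to the scheduler tick and
    compare it with each {hour, minute} entry."""
    minute = now.minute - now.minute % granularity
    return any(e['hour'] == now.hour and e['minute'] == minute for e in when)


# Simulate one day of 15-minute ticks and list the ones that would fire.
ticks = [datetime(2017, 3, 4, h, m, tzinfo=timezone.utc)
         for h in range(24) for m in (0, 15, 30, 45)]
fired = [t.strftime('%H:%M') for t in ticks if should_run(t, WHEN)]
print(fired)  # ['04:00', '16:00']
```

Under this model, a tick that arrives a few minutes late (say 16:07) still rounds down to 16:00 and fires once.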
Flags: needinfo?(jseward) → needinfo?(jmaher)
so this entry is supposed to trigger a valgrind build and/or all tests (i.e. mochitest-valgrind). This is defined here:
https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/target_tasks.py#152

@_target_task('mochitest_valgrind')
def target_tasks_valgrind(full_task_graph, parameters):
    """Target tasks that only run on the cedar branch."""
    def filter(task):
        platform = task.attributes.get('test_platform')
        if platform not in ['linux64']:
            return False
        if task.attributes.get('unittest_suite', '').startswith('mochitest') and \
           task.attributes.get('unittest_flavor', '').startswith('valgrind-plain'):
            return True
        return False
    return [l for l, t in full_task_graph.tasks.iteritems() if filter(t)]

and my assumption was that this would launch:
https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/test/tests.yml#761

I guess my question is- are we expecting those tests to be run twice/day? If not, then we have more work to do.
Flags: needinfo?(jmaher)
(In reply to Joel Maher ( :jmaher) from comment #25)
> I guess my question is- are we expecting those tests to be run twice/day?
> If not, then we have more work to do.

I am lost, unfortunately. Can we talk on irc?
(In reply to Joel Maher ( :jmaher) from comment #25)
> I guess my question is- are we expecting those tests to be run twice/day?

Yes, it is those. Although they are a few lines further down the file now:
https://dxr.mozilla.org/mozilla-central/source/taskcluster/ci/test/tests.yml#773
I am not clear why this isn't working. For example we have a mochitest-valgrind-1 definition here:
https://public-artifacts.taskcluster.net/H9uSK3PmSF2hnA55WX_ygg/0/public/task-graph.json

attributes
    build_platform      "linux64"
    build_type          "opt"
    e10s                false
    kind                "test"
    run_on_projects
    test_chunk          "1"
    test_platform       "linux64"
    unittest_flavor     "valgrind-plain"
    unittest_suite      "mochitest"
    unittest_try_name   "mochitest-valgrind"

and in our cron.yml target task that we call (https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/target_tasks.py#152):

@_target_task('mochitest_valgrind')
def target_tasks_valgrind(full_task_graph, parameters):
    """Target tasks that only run on the cedar branch."""
    def filter(task):
        platform = task.attributes.get('test_platform')
        if platform not in ['linux64']:
            return False
        if task.attributes.get('unittest_suite', '').startswith('mochitest') and \
           task.attributes.get('unittest_flavor', '').startswith('valgrind-plain'):
            return True
        return False
    return [l for l, t in full_task_graph.tasks.iteritems() if filter(t)]

I really do not know why these are not scheduled, for example:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=106a96755d3bcebe64bbbc3b521d65d262ba9c02&filter-searchStr=cron%20valgrind

:dustin, can you see anything that is going wrong here?
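One way to rule out the filter is to run it against the attributes quoted above. This sketch is a Python 3 rendering of the quoted method (`items()` instead of `iteritems()`). The `Task` and `TaskGraph` dataclasses are minimal hypothetical stand-ins for the real taskgraph classes, and the second and third tasks (and all labels) are invented for contrast.

```python
from dataclasses import dataclass, field
from typing import Dict, List


@dataclass
class Task:
    label: str
    attributes: Dict[str, str] = field(default_factory=dict)


@dataclass
class TaskGraph:
    tasks: Dict[str, Task]


def target_tasks_valgrind(full_task_graph: TaskGraph, parameters: dict) -> List[str]:
    """Select linux64 mochitest tasks whose flavor starts with valgrind-plain."""
    def filter_task(task: Task) -> bool:
        if task.attributes.get('test_platform') not in ['linux64']:
            return False
        return (task.attributes.get('unittest_suite', '').startswith('mochitest') and
                task.attributes.get('unittest_flavor', '').startswith('valgrind-plain'))
    return [label for label, task in full_task_graph.tasks.items() if filter_task(task)]


# First task: attributes copied from the mochitest-valgrind-1 entry above.
# The other two tasks and all labels are hypothetical.
graph = TaskGraph(tasks={
    'mochitest-valgrind-1': Task('mochitest-valgrind-1', {
        'build_platform': 'linux64', 'build_type': 'opt', 'kind': 'test',
        'test_chunk': '1', 'test_platform': 'linux64',
        'unittest_flavor': 'valgrind-plain', 'unittest_suite': 'mochitest',
        'unittest_try_name': 'mochitest-valgrind'}),
    'mochitest-plain-1': Task('mochitest-plain-1', {
        'test_platform': 'linux64', 'unittest_suite': 'mochitest',
        'unittest_flavor': 'plain'}),
    'mochitest-valgrind-asan-1': Task('mochitest-valgrind-asan-1', {
        'test_platform': 'linux64-asan', 'unittest_suite': 'mochitest',
        'unittest_flavor': 'valgrind-plain'}),
})

print(target_tasks_valgrind(graph, parameters={}))  # ['mochitest-valgrind-1']
```

Since the quoted attributes pass the filter, the selection step is unlikely to be the problem, which fits the next comment's finding that the jobs did in fact run.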
Flags: needinfo?(dustin)
Status: RESOLVED → REOPENED
Flags: needinfo?(dustin)
Resolution: FIXED → ---
Flags: needinfo?(dustin)
It actually did run those jobs:
https://tools.taskcluster.net/task-group-inspector/#/NK8dGPYGR9WOXbJrmfesfQ?_k=ybsuyh

What's not clear is why they didn't show up in treeherder. https://queue.taskcluster.net/v1/task/ce1Plsk2RjCDtNqB2DW6vQ has

  "routes": [
    "tc-treeherder.v2.mozilla-central.106a96755d3bcebe64bbbc3b521d65d262ba9c02.-1",
    "tc-treeherder-stage.v2.mozilla-central.106a96755d3bcebe64bbbc3b521d65d262ba9c02.-1"
  ],

and

  "extra": {
    "treeherder": {
      "jobKind": "test",
      "groupSymbol": "tc-M-V",
      "collection": {
        "opt": true
      },
      "machine": {
        "platform": "linux64"
      },
      "groupName": "Mochitests on Valgrind executed by TaskCluster",
      "tier": 1,
      "symbol": "10"
    }
  }

The decision task (the Vg that does show up) has

  "routes": [
    "index.gecko.v2.mozilla-central.latest.firefox.decision",  // not relevant to TH
    "tc-treeherder.v2.mozilla-central.106a96755d3bcebe64bbbc3b521d65d262ba9c02.-1",
    "tc-treeherder-stage.v2.mozilla-central.106a96755d3bcebe64bbbc3b521d65d262ba9c02.-1"
  ],

and

  "extra": {
    "treeherder": {
      "symbol": "Vg",
      "groupSymbol": "cron"
    }
  }

Greg, based on what you know of the TH integration, can you see why those might not have shown up? I tried turning off exclusions, etc., and no luck.
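Comparing the routes programmatically makes the point above concrete: the test task and the decision task report to the same treeherder project and revision, so routing alone does not explain why only Vg was visible. This is a sketch. The five-field layout is inferred from the routes quoted above, and reading the trailing field as a push ID is an assumption.

```python
from typing import NamedTuple, Set, Tuple


class TreeherderRoute(NamedTuple):
    prefix: str    # "tc-treeherder" or "tc-treeherder-stage"
    version: str   # "v2"
    project: str
    revision: str
    push_id: str   # assumed meaning of the trailing field ("-1" here)


def parse_route(route: str) -> TreeherderRoute:
    parts = route.split('.')
    if len(parts) != 5 or not parts[0].startswith('tc-treeherder'):
        raise ValueError(f'not a treeherder route: {route!r}')
    return TreeherderRoute(*parts)


def treeherder_targets(routes) -> Set[Tuple[str, str, str]]:
    """(prefix, project, revision) for each treeherder route; others are skipped."""
    out = set()
    for r in routes:
        try:
            p = parse_route(r)
        except ValueError:
            continue  # e.g. the index.gecko.* route
        out.add((p.prefix, p.project, p.revision))
    return out


rev = '106a96755d3bcebe64bbbc3b521d65d262ba9c02'
test_routes = [f'tc-treeherder.v2.mozilla-central.{rev}.-1',
               f'tc-treeherder-stage.v2.mozilla-central.{rev}.-1']
decision_routes = ['index.gecko.v2.mozilla-central.latest.firefox.decision'] + test_routes

print(treeherder_targets(test_routes) == treeherder_targets(decision_routes))  # True
```

With identical targets, the remaining differences are in `extra.treeherder` (group symbol, tier) and in how the Treeherder view filters jobs.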
Flags: needinfo?(dustin) → needinfo?(garndt)
odd, I tried that and it looks like dustin tried that- either way, this is working. :jseward, do you have what you need here? Is there a plan for getting these green and showing again?
Flags: needinfo?(jseward)
(In reply to Joel Maher ( :jmaher) from comment #31)

Great!

> :jseward, do you have what you need here?

Nearly! One more question: how do I find these URLs? Is there a way for me to see all the valgrind runs and nothing else?

> Is there a plan for getting these green and showing again?

There are 3 sources of failure:

(1) Timeouts caused by valgrind. I looked at these a while back and can get back to them.

(2) Errors reported by valgrind. I can fix the real ones and suppress the false ones, and have slowly been doing so.

(3) Failures that would have occurred anyway (running natively). These are a bit of a problem because there's no easy way to distinguish them from (2) without having to look at all the failing chunks; in both cases they go orange. Ideally they could be a different colour. I filed bug 1341406 about that.
Flags: needinfo?(jseward) → needinfo?(jmaher)
here is a link to help you: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=valgrind&exclusion_profile=false I go to: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central search for 'valgrind' then click the 'excluded jobs' which allows you to see the results. Thanks for the information about the greening up! These are getting greener :) Please close this if you feel that there is nothing else to do here.
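The search URLs in this thread all share one shape, so they can be generated instead of hand-edited. This is a small sketch that reproduces the 2017-era Treeherder jobs URLs quoted in this bug; `treeherder_jobs_url` is a hypothetical helper, and the current Treeherder UI may use different parameters.

```python
from typing import Optional
from urllib.parse import urlencode

BASE = 'https://treeherder.mozilla.org/#/jobs?'


def treeherder_jobs_url(repo: str, search: str, revision: Optional[str] = None,
                        show_excluded: bool = True) -> str:
    """Build a jobs-view URL in the form used in this bug."""
    params = {'repo': repo}
    if revision:
        params['revision'] = revision
    params['filter-searchStr'] = search
    if show_excluded:
        # Appears to match clicking 'excluded jobs' in the UI.
        params['exclusion_profile'] = 'false'
    return BASE + urlencode(params)


print(treeherder_jobs_url('mozilla-central', 'valgrind'))
```

Dict insertion order is preserved, so the query parameters come out in the same order as the hand-written links.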
Flags: needinfo?(jmaher)
(In reply to Joel Maher ( :jmaher) from comment #33)

Joel, Dustin, thank you for doing this!

A perhaps better URL can be constructed by searching for the "tc-M-V" string:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&filter-searchStr=tc-M-V&exclusion_profile=false

So it seems to work. But I noticed just now an interesting anomaly, which you can see at (e.g.)
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=eb23648534779c110f3a1f2baae1849ae4a9c570&filter-searchStr=tc-M-V&exclusion_profile=false

Each run would normally display 40 chunk results inside the tc-M-V parentheses, but this one (and others I've seen) displays 80. At first, I thought that each job had been run twice. But no: what seems to have happened is that treeherder is displaying together the results of two different sets of runs, one of which was requested at "Sat Mar 4, 17:04:04" and the other at "Sun Mar 5, 5:02:55".

Is that expected? I assume this is somehow related to the fact that there were no merges to m-c over the weekend (or at least in the interval between the two abovementioned dates), and so the two builds are regarded as identical. Is it possible to fix this easily?
Flags: needinfo?(jmaher)
Treeherder indexes by revision, so if the jobs ran on the same revision, no, there is no distinction.
the problem here is that we are running twice/day and there were no pushes to m-c in between, so the cron job schedules tests on the most recent revision, which is the same revision the previous run used. Would you like to go once/day? This is only a problem during low-volume periods.
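The 80-chunk anomaly follows directly from grouping by revision. This is a minimal sketch, assuming (as comment #35 says) that Treeherder keys a push by repository and revision. The job records are hypothetical; only the revision, the 40-chunk count, and the two request times come from this bug.

```python
from collections import defaultdict
from typing import Dict, List, Tuple


def group_by_push(jobs: List[dict]) -> Dict[Tuple[str, str], List[dict]]:
    """Group job records by (repo, revision), the key a push is identified by."""
    groups: Dict[Tuple[str, str], List[dict]] = defaultdict(list)
    for job in jobs:
        groups[(job['repo'], job['revision'])].append(job)
    return groups


rev = 'eb23648534779c110f3a1f2baae1849ae4a9c570'
# Two cron runs with no new push in between: both use the same tip revision.
runs = [{'repo': 'mozilla-central', 'revision': rev, 'chunk': c, 'requested': when}
        for when in ('Sat Mar 4, 17:04:04', 'Sun Mar 5, 5:02:55')
        for c in range(1, 41)]

groups = group_by_push(runs)
print(len(groups), len(groups[('mozilla-central', rev)]))  # 1 80
```

Nothing in the key separates the two runs, so the only ways to split them would be a new push in between or a less frequent schedule, which is the trade-off offered above.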
Flags: needinfo?(jmaher)
(In reply to Joel Maher ( :jmaher) from comment #36) > [..] Would you like to go once/day? No, it's fine like it is. I understand why this is happening, it's no big deal, and I'd prefer to stick with the more-or-less 12 hour latency. So I'm going to close. I think we're done here. Thank you for your collective efforts.
Status: REOPENED → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED