Closed
Bug 1271077
Opened 9 years ago
Closed 9 years ago
Surface resource utilization to Treeherder
Categories
(Release Engineering :: Applications: MozharnessCore, defect)
Tracking
(firefox49 fixed)
RESOLVED
FIXED
| Tracking | Status |
---|---|---|
firefox49 | --- | fixed |
People
(Reporter: gps, Assigned: gps)
References
(Blocks 1 open bug)
Details
Attachments
(1 file)
Bug 1271035 made reftests ~15% faster across the board by eliminating a major source of I/O during the jobs.
Bug 859573 implemented resource monitoring in mozharness. Had the data been surfaced more visibly, I think someone would have noticed sooner.
My proposal for this bug is to:
1) Use "TinderboxPrint" to surface minimal resource utilization metrics in Treeherder
2) Send minimal resource utilization metrics to Perfherder so we can look at trends over time
I think recording overall CPU and I/O would be a good start.
Assignee | ||
Comment 1•9 years ago
I'll work on this next week.
In the meantime, https://treeherder.mozilla.org/#/jobs?repo=try&revision=235c714f2f50 should hopefully have some nice data to report in Treeherder in a few hours. If you click on the job, it should print system metrics for the overall job in the "job details" pane.
Assignee: nobody → gps
Status: NEW → ASSIGNED
Assignee | ||
Comment 2•9 years ago
I finally got the formatting correct in https://treeherder.mozilla.org/#/jobs?repo=try&revision=d09130fbe0545f1b6e29a97045949e1d385a819b
Comment 3•9 years ago
is the goal to get this in perfherder to track alerts, or have some basic data easy to find in treeherder?
Assignee | ||
Comment 4•9 years ago
The system resource utilization during job execution is important: it
gives us an idea of the efficiency (or lack thereof) of activities.
As bug 1271035 showed us, there can be some really wonky things going
on during job execution. To help us notice these things, this commit
prints some overall resource utilization data with the special
"TinderboxPrint" syntax so it appears in Treeherder. This should
hopefully draw the attention of more eyeballs and cause people to
ask questions about what jobs are doing.
This supplements the existing printing of total resource usage in the
logs. Unfortunately nobody was really looking at that data because it
wasn't exposed that well. This commit should change that.
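A minimal sketch of the kind of log line this produces, reusing the format string and the `TinderboxPrint: ` prefix that appear in the patch under review. The function names and sample numbers are hypothetical, and `print` stands in for the mozharness logger.

```python
def tinderbox_line(message):
    """Prefix a message with the special TinderboxPrint marker."""
    return 'TinderboxPrint: %s' % message


def cpu_summary(attr, seconds, percent):
    """Format one CPU time bucket (e.g. 'user') with its share of the total."""
    return 'CPU {}<br/>{:,.1f} ({:,.1f}%)'.format(attr, seconds, percent)


if __name__ == '__main__':
    # Sample values are made up for illustration only.
    print(tinderbox_line(cpu_summary('user', 1234.5, 62.3)))
```

With these sample values the printed line is `TinderboxPrint: CPU user<br/>1,234.5 (62.3%)`.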
Review commit: https://reviewboard.mozilla.org/r/51477/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/51477/
Attachment #8750559 -
Flags: review?(jlund)
Assignee | ||
Comment 5•9 years ago
(In reply to Joel Maher (:jmaher) from comment #3)
> is the goal to get this in perfherder to track alerts, or have some basic
> data easy to find in treeherder?
Both.
I want the data in Treeherder to increase the chances of eyeballs spotting something fishy.
I also want alerts because if, e.g., someone refactors Places and causes tests to use 100 MB for I/O, that's important to detect. I also want to know if things like refactoring Marionette make tests faster or slower. We have next to zero insight into these things today.
Anyway, I'm inclined to punt Perfherder/alerting to a follow-up bug. Let's get the numbers in front of eyeballs as a first step.
Assignee | ||
Comment 6•9 years ago
Comment on attachment 8750559 [details]
MozReview Request: Bug 1271077 - Print system resource utilization so it appears in Treeherder; r?jlund
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51477/diff/1-2/
Assignee | ||
Comment 7•9 years ago
Comment on attachment 8750559 [details]
MozReview Request: Bug 1271077 - Print system resource utilization so it appears in Treeherder; r?jlund
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51477/diff/2-3/
Assignee | ||
Comment 8•9 years ago
Comment on attachment 8750559 [details]
MozReview Request: Bug 1271077 - Print system resource utilization so it appears in Treeherder; r?jlund
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51477/diff/3-4/
Comment 9•9 years ago
Comment on attachment 8750559 [details]
MozReview Request: Bug 1271077 - Print system resource utilization so it appears in Treeherder; r?jlund
https://reviewboard.mozilla.org/r/51477/#review48545
neat. I think this will be super helpful. couple comments below
::: testing/mozharness/mozharness/base/python.py:577
(Diff revision 4)
> + continue
> +
> + if attr in ('count', 'index'):
> + continue
> +
> + value = getattr(cpu_times, attr)
hm, this seems odd. I would have thought this is a container that we can iterate over..
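For illustration only: if the CPU times object is a namedtuple (as psutil's CPU time results are), its `_fields` attribute lists just the data fields, so there is no need to skip tuple methods such as `count` and `index`. The `CpuTimes` type and `cpu_time_shares` function below are hypothetical stand-ins, not the code from the patch.

```python
from collections import namedtuple

# Hypothetical stand-in for the CPU times object; field names are illustrative.
CpuTimes = namedtuple('CpuTimes', ['user', 'system', 'idle'])


def cpu_time_shares(cpu_times):
    """Return {field: (seconds, percent of total)} for each named field."""
    total = sum(cpu_times)
    shares = {}
    for attr in cpu_times._fields:  # data fields only, no count()/index()
        value = getattr(cpu_times, attr)
        percent = value * 100.0 / total if total else 0.0
        shares[attr] = (value, percent)
    return shares
```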
::: testing/mozharness/mozharness/base/python.py:584
(Diff revision 4)
> + if percent > 1.00:
> + self._tinderbox_print('CPU {}<br/>{:,.1f} ({:,.1f}%)'.format(
> + attr, value, percent))
> +
> + # Swap on Windows isn't reported by psutil.
> + if os.name not in ('nt', 'ce'):
fyi - there is _is_windows()
https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/base/script.py#70
::: testing/mozharness/mozharness/base/python.py:594
(Diff revision 4)
> start_time, end_time = rm.phases[phase]
> - cpu_percent, cpu_times, io = resources(phase)
> + cpu_percent, cpu_times, io, swap = resources(phase)
> log_usage(phase, end_time - start_time, cpu_percent, cpu_times, io)
>
> + def _tinderbox_print(self, message):
> + self.info('TinderboxPrint: %s' % message)
This is outside the scope of this patch, but it would be nice to define this in core and use it anywhere we log for Treeherder.
Attachment #8750559 -
Flags: review?(jlund)
Comment 10•9 years ago
https://reviewboard.mozilla.org/r/51477/#review48551
feel free to r? me again once you comment back. I'm not expecting code change.
Assignee | ||
Comment 11•9 years ago
Comment on attachment 8750559 [details]
MozReview Request: Bug 1271077 - Print system resource utilization so it appears in Treeherder; r?jlund
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51477/diff/4-5/
Attachment #8750559 -
Flags: review?(jlund)
Assignee | ||
Comment 12•9 years ago
Comment on attachment 8750559 [details]
MozReview Request: Bug 1271077 - Print system resource utilization so it appears in Treeherder; r?jlund
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51477/diff/5-6/
Assignee | ||
Comment 13•9 years ago
https://reviewboard.mozilla.org/r/51477/#review48601
I noticed percentages on the Try build were >100% in some scenarios. This is because systems have multiple cores. I changed the output to report percentages in terms of total CPU. So, e.g., one core at 100% on a 4-core machine is reported as 25% total CPU. There is room to report the CPU core count as well, but we'd need to expose that from the resource monitor. We can do that as a follow-up.
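A rough sketch of the normalization described here, assuming per-core utilization is available as a list of percentages. How the patch actually computes the figure is not shown in this thread, and `total_cpu_percent` is a hypothetical name.

```python
def total_cpu_percent(per_core_percents):
    """Average per-core utilization into one 0-100 figure for the whole machine."""
    if not per_core_percents:
        return 0.0
    return sum(per_core_percents) / float(len(per_core_percents))


if __name__ == '__main__':
    # One busy core out of four.
    print(total_cpu_percent([100.0, 0.0, 0.0, 0.0]))
```

For one fully busy core out of four, this yields 25.0, matching the example above.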
Comment 14•9 years ago
Comment on attachment 8750559 [details]
MozReview Request: Bug 1271077 - Print system resource utilization so it appears in Treeherder; r?jlund
https://reviewboard.mozilla.org/r/51477/#review48993
Attachment #8750559 -
Flags: review?(jlund) → review+
Comment 15•9 years ago
Assignee | ||
Comment 16•9 years ago
Will tackle Perfherder in another bug.
Summary: Surface resource utilization to Treeherder, Perfherder → Surface resource utilization to Treeherder
Comment 17•9 years ago
bugherder |