Closed Bug 1271077 Opened 9 years ago Closed 9 years ago

Surface resource utilization to Treeherder

Categories

(Release Engineering :: Applications: MozharnessCore, defect)

defect
Not set
normal

Tracking

(firefox49 fixed)

RESOLVED FIXED
Tracking Status
firefox49 --- fixed

People

(Reporter: gps, Assigned: gps)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

Bug 1271035 made reftests ~15% faster across the board by eliminating a major source of I/O during the jobs. Bug 859573 implemented resource monitoring in mozharness. Had the data been surfaced more visibly, I think someone would have noticed sooner.

My proposal for this bug is to:

1) Use "TinderboxPrint" to surface minimal resource utilization metrics in Treeherder
2) Send minimal resource utilization metrics to Perfherder so we can look at trends over time

I think recording overall CPU and I/O would be a good start.
I'll work on this next week. In the meantime, https://treeherder.mozilla.org/#/jobs?repo=try&revision=235c714f2f50 should hopefully have some nice data to report in Treeherder in a few hours. If you click on the job, it should print system metrics for the overall job in the "job details" pane.
Assignee: nobody → gps
Status: NEW → ASSIGNED
is the goal to get this in perfherder to track alerts, or have some basic data easy to find in treeherder?
Blocks: fastci
The system resource utilization during job execution is important: it gives us an idea of the efficiency (or lack thereof) of activities. As bug 1271035 showed us, there can be some really wonky things going on during job execution.

To help us notice these things, this commit prints some overall resource utilization data with the special "TinderboxPrint" syntax so it appears in Treeherder. This should hopefully draw the attention of more eyeballs and cause people to ask questions about what jobs are doing.

This supplements the existing printing of total resource usage in the logs. Unfortunately, nobody was really looking at that data because it wasn't exposed that well. This commit should change that.

Review commit: https://reviewboard.mozilla.org/r/51477/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/51477/
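As a rough illustration of the "TinderboxPrint" approach described above, here is a minimal sketch of a helper that builds and logs such a line. The `TinderboxPrint: ` prefix and the `<br/>` separator are taken from the patch quoted later in this bug; the function names `format_tinderbox` and `log_tinderbox` and the use of the standard `logging` module are assumptions for illustration, not mozharness code.

```python
import logging

log = logging.getLogger("resource-usage")  # hypothetical logger name


def format_tinderbox(label, value):
    """Build one log line using the TinderboxPrint prefix.

    The label/value split around <br/> mirrors the format string in the
    patch under review; this helper itself is hypothetical.
    """
    return 'TinderboxPrint: {}<br/>{}'.format(label, value)


def log_tinderbox(label, value):
    """Emit the line at INFO level, as the patch does via self.info()."""
    line = format_tinderbox(label, value)
    log.info(line)
    return line
```

Keeping the formatting separate from the logging call makes the line format easy to check in isolation.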
Attachment #8750559 - Flags: review?(jlund)
(In reply to Joel Maher (:jmaher) from comment #3)
> is the goal to get this in perfherder to track alerts, or have some basic
> data easy to find in treeherder?

Both. I want the data in Treeherder to increase the chances of eyeballs spotting something fishy. I also want alerts because if e.g. someone refactors Places and causes tests to use 100 MB for I/O, that's important to detect. I also want to know if things like refactoring Marionette make tests faster or slower. We have next to zero insight into these things today.

Anyway, I'm inclined to punt Perfherder/alerting to a follow-up bug. Let's get the numbers in front of eyeballs as a first step.
Comment on attachment 8750559 [details] MozReview Request: Bug 1271077 - Print system resource utilization so it appears in Treeherder; r?jlund Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51477/diff/1-2/
Comment on attachment 8750559 [details] MozReview Request: Bug 1271077 - Print system resource utilization so it appears in Treeherder; r?jlund Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51477/diff/2-3/
Comment on attachment 8750559 [details] MozReview Request: Bug 1271077 - Print system resource utilization so it appears in Treeherder; r?jlund Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51477/diff/3-4/
Comment on attachment 8750559 [details] MozReview Request: Bug 1271077 - Print system resource utilization so it appears in Treeherder; r?jlund

https://reviewboard.mozilla.org/r/51477/#review48545

neat. I think this will be super helpful. couple comments below

::: testing/mozharness/mozharness/base/python.py:577
(Diff revision 4)
> +        continue
> +
> +    if attr in ('count', 'index'):
> +        continue
> +
> +    value = getattr(cpu_times, attr)

hm, this seems odd. I would have thought this is a container that we can iterate over..

::: testing/mozharness/mozharness/base/python.py:584
(Diff revision 4)
> +    if percent > 1.00:
> +        self._tinderbox_print('CPU {}<br/>{:,.1f} ({:,.1f}%)'.format(
> +            attr, value, percent))
> +
> +    # Swap on Windows isn't reported by psutil.
> +    if os.name not in ('nt', 'ce'):

fyi - there is _is_windows() https://dxr.mozilla.org/mozilla-central/source/testing/mozharness/mozharness/base/script.py#70

::: testing/mozharness/mozharness/base/python.py:594
(Diff revision 4)
>   start_time, end_time = rm.phases[phase]
> - cpu_percent, cpu_times, io = resources(phase)
> + cpu_percent, cpu_times, io, swap = resources(phase)
>   log_usage(phase, end_time - start_time, cpu_percent, cpu_times, io)
>
> + def _tinderbox_print(self, message):
> +     self.info('TinderboxPrint: %s' % message)

outside of this scope but this would be nice to be defined in core and replace anytime we log for treeherder.
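On the reviewer's question about iterating over `cpu_times`: if the object is a namedtuple (an assumption here; the patch only shows `getattr` plus a skip for `count` and `index`, which are the inherited tuple methods), its `_fields` attribute lists only the data fields. The sketch below is one possible alternative, not the code that landed. `CpuTimes`, `cpu_summary_lines`, the field names, and the way `percent` is derived from the sum of all fields are hypothetical.

```python
from collections import namedtuple

# Hypothetical stand-in for the cpu_times object; real field names vary by OS.
CpuTimes = namedtuple('CpuTimes', ['user', 'system', 'idle', 'iowait'])


def cpu_summary_lines(cpu_times, threshold=1.0):
    """Return one message per CPU time field above `threshold` percent.

    Iterating over _fields avoids picking up tuple methods such as
    count() and index(), so no explicit skip list is needed.
    """
    total = float(sum(cpu_times))
    if total <= 0:
        return []

    lines = []
    for attr, value in zip(cpu_times._fields, cpu_times):
        # Assumption: percent is this field's share of all CPU time.
        percent = value / total * 100.0
        if percent > threshold:
            lines.append('CPU {}<br/>{:,.1f} ({:,.1f}%)'.format(
                attr, value, percent))
    return lines
```

With `CpuTimes(50.0, 25.0, 24.5, 0.5)`, the `iowait` field falls under the 1% threshold and is omitted, matching the `percent > 1.00` filter in the quoted hunk.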
Attachment #8750559 - Flags: review?(jlund)
https://reviewboard.mozilla.org/r/51477/#review48551 feel free to r? me again once you comment back. I'm not expecting code change.
Comment on attachment 8750559 [details] MozReview Request: Bug 1271077 - Print system resource utilization so it appears in Treeherder; r?jlund Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51477/diff/4-5/
Attachment #8750559 - Flags: review?(jlund)
Comment on attachment 8750559 [details] MozReview Request: Bug 1271077 - Print system resource utilization so it appears in Treeherder; r?jlund Review request updated; see interdiff: https://reviewboard.mozilla.org/r/51477/diff/5-6/
https://reviewboard.mozilla.org/r/51477/#review48601

I noticed percentages on the Try build were >100% in some scenarios. This is because systems have multiple cores. I changed to report percentage in terms of total CPU. For example, one core at 100% on a 4-core machine will report as 25% total CPU.

There is room to report the CPU core count. We'd need to expose that from the resource monitor. We can do that as a follow-up.
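The normalization described above can be sketched as follows. This is a minimal illustration that assumes the input is a list of per-core utilization percentages (the shape `psutil.cpu_percent(percpu=True)` returns); the function name `total_cpu_percent` is hypothetical and this is not the code from the patch.

```python
def total_cpu_percent(per_core_percents):
    """Express CPU usage as a share of the whole machine.

    Summing per-core values can exceed 100% on multi-core systems;
    dividing by the core count keeps the result in the 0-100 range.
    """
    if not per_core_percents:
        return 0.0
    return sum(per_core_percents) / float(len(per_core_percents))


# One core fully busy on a 4-core machine.
print(total_cpu_percent([100.0, 0.0, 0.0, 0.0]))  # 25.0
```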
Comment on attachment 8750559 [details] MozReview Request: Bug 1271077 - Print system resource utilization so it appears in Treeherder; r?jlund https://reviewboard.mozilla.org/r/51477/#review48993
Attachment #8750559 - Flags: review?(jlund) → review+
Will tackle Perfherder in another bug.
Summary: Surface resource utilization to Treeherder, Perfherder → Surface resource utilization to Treeherder
Blocks: 1272176
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED