Closed Bug 812352 Opened 12 years ago Closed 7 years ago

[Meta] New counter handling architecture for Talos

Categories

(Testing :: Talos, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED WONTFIX

People

(Reporter: k0scist, Unassigned)

Details

(Whiteboard: [talos_wishlist])

To review, counters accumulate data auxiliary to the "main"
measurement -- either the time of the test or some computed value.

There is a pool of possible counters.

These counters may or may not be available depending on your environment:
  - is an external program installed/available?
  - what operating system are you on?
  - etc.

You activate these counters
  - either from the command line
  - or in test.py
  - or by editing the YAML file if you're crazy

However, the handling of this is far from unified:

* So you have these counters listed in test.py:
http://hg.mozilla.org/build/talos/file/07322bbe0f7d/talos/test.py#l155

win_counters = ['Working Set', 'Private Bytes', '%Processor Time']
w7_counters = ['Working Set', 'Private Bytes', '% Processor Time', 'Modified Page List Bytes']
linux_counters = ['Private Bytes', 'RSS', 'XRes']
mac_counters = ['Private Bytes', 'RSS']

* You can see the available linux and mac counters in `counter_dict`:
  http://hg.mozilla.org/build/talos/file/07322bbe0f7d/talos/cmanager_linux.py
  http://hg.mozilla.org/build/talos/file/07322bbe0f7d/talos/cmanager_mac.py

* Windows counters depend on the mysterious win32pdh

* We don't have any remote counters, currently, just a stub class:
  http://hg.mozilla.org/build/talos/file/07322bbe0f7d/talos/cmanager_remote.py

* There are also some counters that we don't treat like counters:
  shutdown, Main_RSS, Content_RSS, responsiveness

* Most counters are per-cycle; some counters persist across cycles
  (shutdown, responsiveness)

* Instead of living near the rest of the counter logic, the short name
  for counters lives in output:
  http://hg.mozilla.org/build/talos/file/07322bbe0f7d/talos/output.py#l75


Actions:

- each counter should be transitioned to a class:

"""
class Counter(object):
    '''abstract base class for counter object'''

    @classmethod
    def available(cls):
        '''returns if the counter is available on your system'''
"""

- all counter classes should be moved to a central registry

- PerfConfigurator should be modified to allow counters to be added
  and inspected

A good interface would be:

PerfConfigurator --all-counters:
  list all counters available and for each one what is required for
  them to be available

PerfConfigurator --available-counters:
  list all counters available for your system (or for remote, the
  remote system)

PerfConfigurator --counter 'Private Bytes':
  will add the private bytes counter to the tests' counter set

- I would like to merge the xperf counters back into win counters:
  that is in fact what they are
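
The class and registry actions above could be sketched roughly like this. It is only an illustration: `REGISTRY`, `register`, `all_counters`, `available_counters` and the two example classes are hypothetical names, not existing Talos code, and the availability checks are deliberately simplified.

```python
import sys


class Counter(object):
    '''abstract base class for counter objects'''

    name = None  # human-readable counter name, e.g. 'Private Bytes'

    @classmethod
    def available(cls):
        '''returns whether the counter is available on this system'''
        raise NotImplementedError


# hypothetical central registry: counter name -> counter class
REGISTRY = {}


def register(cls):
    '''class decorator that adds a counter class to the registry'''
    REGISTRY[cls.name] = cls
    return cls


@register
class PrivateBytes(Counter):
    name = 'Private Bytes'

    @classmethod
    def available(cls):
        # assumption: every platform has some way to measure this
        return True


@register
class XRes(Counter):
    name = 'XRes'

    @classmethod
    def available(cls):
        # assumption: XRes is only meaningful on Linux
        return sys.platform.startswith('linux')


def all_counters():
    '''names of every registered counter, available or not'''
    return sorted(REGISTRY)


def available_counters():
    '''names of the registered counters usable on this system'''
    return sorted(name for name, cls in REGISTRY.items() if cls.available())
```

With something like this, `--all-counters` and `--available-counters` reduce to printing the results of the last two functions.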

I would love to start working on this, though I'm not assigning it to myself yet since I'm not sure I have enough time. If anybody new wants to work on this, I can also assist/mentor.
This bug will be quite intricate, and though it may be broken into several parts, it will be a bit of effort to get this right.  Note also that this is low-priority right now.

The initial and perhaps most painful work will be getting all of the counters under one umbrella.  Currently, counters do not behave the same and there are several "global" counters that cut across browser cycles.
It is worth reading and understanding:
* http://hg.mozilla.org/build/talos/file/751345a46752/talos/ttest.py#l301
* http://hg.mozilla.org/build/talos/file/751345a46752/talos/ttest.py#l366
* http://hg.mozilla.org/build/talos/file/751345a46752/talos/ttest.py#l404
* http://hg.mozilla.org/build/talos/file/751345a46752/talos/ttest.py#l432
(as well as the cmanager_*.py files). It would be kinda nice to have this code have a more blackbox feeling.
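
As a rough idea of what "one umbrella" could mean for per-cycle and global counters, here is a minimal sketch. `CounterSet` and its methods are hypothetical and do not mirror the current ttest.py logic; the sketch only shows both kinds of counters being recorded through the same calls while being reset differently at the end of a cycle.

```python
class CounterSet(object):
    '''holds per-cycle and global counters behind one interface (hypothetical)'''

    def __init__(self):
        self.per_cycle = {}   # name -> values for the current cycle
        self.global_ = {}     # name -> values across all cycles
        self.finished = []    # one dict of per-cycle results per finished cycle

    def add(self, name, per_cycle=True):
        target = self.per_cycle if per_cycle else self.global_
        target[name] = []

    def record(self, name, value):
        if name in self.per_cycle:
            self.per_cycle[name].append(value)
        else:
            self.global_[name].append(value)

    def end_cycle(self):
        # snapshot and reset per-cycle data; global data is left alone
        self.finished.append(dict((n, list(v)) for n, v in self.per_cycle.items()))
        for values in self.per_cycle.values():
            del values[:]


counters = CounterSet()
counters.add('RSS')                         # per-cycle
counters.add('shutdown', per_cycle=False)   # persists across cycles
for cycle in range(2):
    counters.record('RSS', 100 + cycle)     # made-up sample values
    counters.record('shutdown', 5 + cycle)
    counters.end_cycle()
```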

In fact one thing that would be nice to have right now is documentation about how this currently works: bug 815009. Of course this would have to be changed once we revise this system.

One of the things that would be nice to iterate on here is the API that will make this possible.  Counters will probably need an API that supports checking whether they are available (e.g. "is an executable on the system PATH?" or "are you on the right operating system?"), instantiation (possibly with parameters, e.g. executable path), and measuring.
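
To make the three parts of such an API concrete, here is one possible shape, as a sketch only. The attribute names (`platforms`, `executable`), the `unavailable_reason` method and the `executable_path` parameter are all assumptions for illustration. `xrestop` is used as an example external tool, without claiming that is how Talos measures XRes.

```python
import shutil
import sys


class Counter(object):
    '''hypothetical counter API: availability, instantiation, measuring'''

    platforms = None    # tuple of sys.platform prefixes, or None for any
    executable = None   # external program required, or None

    @classmethod
    def unavailable_reason(cls):
        '''returns None if usable here, else a string saying why not'''
        if cls.platforms and not sys.platform.startswith(cls.platforms):
            return 'requires platform in %s, not %s' % (cls.platforms,
                                                        sys.platform)
        if cls.executable and shutil.which(cls.executable) is None:
            return '%s not found on PATH' % cls.executable
        return None

    @classmethod
    def available(cls):
        return cls.unavailable_reason() is None

    def __init__(self, pid, **params):
        self.pid = pid          # process to measure
        self.params = params    # e.g. executable_path

    def measure(self):
        '''returns one sample; subclasses implement this'''
        raise NotImplementedError


class XRes(Counter):
    platforms = ('linux',)
    executable = 'xrestop'   # assumption: illustrative external tool


# instantiation with a parameter; measuring is left to a real subclass
counter = XRes(1234, executable_path='/usr/bin/xrestop')
```

Returning a reason string rather than a bare boolean is a design choice: it lets the same check drive both "is it available?" and "what is required for it to be available?".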

I'm not convinced that e.g. linux_counters, win32_counters, mac_counters, xperf_counters, etc, is necessarily how this should be specified.  I don't really have any strong suggestions to fix this at this point in time, but maybe this can be inferred from the overhaul here.  Perhaps just "counters" is sufficient. (Remember we're aiming for the future here vs trying to support the past, so we should figure out what we ultimately want. We can figure out what to do about e.g. buildbot compatibility when we're preparing to deploy if this is still an issue then.)

The CounterManager API will probably also need an overhaul.  PerfConfigurator.py will need to be updated with the ability to set counters and to list them via --all-counters and --available-counters.
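
The command-line side might look something like the following sketch. It assumes `argparse`, which may not be what PerfConfigurator.py actually uses for option parsing, and `build_parser` is a hypothetical helper; only the three option names come from this bug.

```python
import argparse


def build_parser():
    '''hypothetical parser covering only the counter-related options'''
    parser = argparse.ArgumentParser(prog='PerfConfigurator')
    parser.add_argument('--all-counters', action='store_true',
                        help='list every known counter and its requirements')
    parser.add_argument('--available-counters', action='store_true',
                        help='list the counters usable on this system')
    parser.add_argument('--counter', action='append', default=[],
                        dest='counters', metavar='NAME',
                        help='add NAME to the counter set (repeatable)')
    return parser


# equivalent to: PerfConfigurator --counter 'Private Bytes' --counter RSS
args = build_parser().parse_args(['--counter', 'Private Bytes',
                                  '--counter', 'RSS'])
```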

So yeah.  Not necessarily a small amount of work.  As said, one thing I'd like to iterate on is an architecture/API that supports everything we do currently, covers what is suggested in this bug, and is flexible enough to support future work.
Depends on: 777757
Also, not necessarily part of this bug, but worth pointing out: missing counters are a serious problem for Talos in production (see e.g. https://bugzilla.mozilla.org/buglist.cgi?resolution=---;query_format=advanced;component=Talos;product=Testing;list_id=5041061 ). We turned off erroring out for this failure, which is pretty undesirable. Whatever the new framework is, some thought should be given to making it easy to diagnose and hopefully fix any counter errors.
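
One way to make counter problems easier to diagnose would be to resolve the requested counters up front and report exactly why each unusable one was dropped. A minimal sketch follows; `resolve_counters`, the registry shape and the reason strings are made up for illustration.

```python
def resolve_counters(requested, registry):
    '''split requested counter names into usable ones and problems.

    registry maps counter name -> callable returning None when the
    counter is usable, or a string explaining why it is not.
    '''
    usable, problems = [], {}
    for name in requested:
        check = registry.get(name)
        if check is None:
            problems[name] = 'unknown counter'
            continue
        reason = check()
        if reason is None:
            usable.append(name)
        else:
            problems[name] = reason
    return usable, problems


# hypothetical registry with made-up availability results
registry = {
    'RSS': lambda: None,
    'XRes': lambda: 'xrestop not found on PATH',
}
usable, problems = resolve_counters(['RSS', 'XRes', 'Bogus'], registry)
for name, reason in sorted(problems.items()):
    print('counter %r skipped: %s' % (name, reason))
```

Whether a problem should then be fatal or just logged is a separate policy decision; the point is that the reason is available either way.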
Blocks: 1088251
Whiteboard: [talos_wishlist]
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → WONTFIX