Closed
Bug 805925
Opened 12 years ago
Closed 11 years ago
mozharness talos tpn busted on cedar on Linux and Windows: "Unable to proceed with missing counter 'tp5n_%cpu'"
Categories
(Testing :: Talos, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: emorley, Unassigned)
References
Details
https://tbpl.mozilla.org/php/getParsedLog.php?id=16493747&tree=Cedar
https://tbpl.mozilla.org/php/getParsedLog.php?id=16492196&tree=Cedar
https://tbpl.mozilla.org/php/getParsedLog.php?id=16492816&tree=Cedar
https://tbpl.mozilla.org/php/getParsedLog.php?id=16492755&tree=Cedar

eg:
{
09:45:18 INFO - NOISE: Outputting talos results => {'results_urls': ['http://graphs.mozilla.org/server/collect.cgi'], 'datazilla_urls': ['https://datazilla.mozilla.org/talos']}
09:45:18 INFO - DEBUG: Working with test: tp5n
09:45:18 INFO - Generating results file: tp5n:
09:45:18 INFO - Started Fri, 26 Oct 2012 09:45:18
09:45:18 INFO - Generating results file: tp5n:
09:45:18 INFO - Stopped Fri, 26 Oct 2012 09:45:18
09:45:18 INFO - No results collected for: tp5n_%cpu:
09:45:18 INFO - Error Fri, 26 Oct 2012 09:45:18
09:45:18 INFO - DEBUG: Working with test: tp5n
09:45:18 INFO - Generating results file: tp5n:
09:45:18 INFO - Started Fri, 26 Oct 2012 09:45:18
09:45:18 INFO - Generating results file: tp5n:
09:45:18 INFO - Stopped Fri, 26 Oct 2012 09:45:18
09:45:18 INFO - No results collected for: tp5n_%cpu:
09:45:18 INFO - Error Fri, 26 Oct 2012 09:45:18
09:45:18 INFO - FAIL: Unable to proceed with missing counter 'tp5n_%cpu'
09:45:18 ERROR - Traceback (most recent call last):
09:45:18 INFO - File "c:\talos-slave\test\build\venv\Scripts\talos-script.py", line 9, in <module>
09:45:18 INFO - load_entry_point('talos==0.0', 'console_scripts', 'talos')()
09:45:18 INFO - File "c:\talos-slave\test\build\venv\lib\site-packages\talos\run_tests.py", line 300, in main
09:45:18 INFO - run_tests(parser)
09:45:18 INFO - File "c:\talos-slave\test\build\venv\lib\site-packages\talos\run_tests.py", line 276, in run_tests
09:45:18 INFO - talos_results.output(results_urls, **results_options)
09:45:18 INFO - File "c:\talos-slave\test\build\venv\lib\site-packages\talos\results.py", line 89, in output
09:45:18 INFO - raise e
09:45:18 CRITICAL - talos.utils.talosError: "Unable to proceed with missing counter 'tp5n_%cpu'"
}
Comment 1•12 years ago
That is a new one for us. I have seen this fail on tp5n_xperf_main_startup_netio and tp5n_xres, but %cpu is a new one!
Reporter | ||
Comment 2•12 years ago
talos... the gift that keeps on giving!
Comment 3•12 years ago
It looks like this isn't mozharness specific? Is there something I can do to help this along?
Comment 4•12 years ago
Well, failing once every few days, spread across every talos job that runs on every tree, isn't mozharness-specific, but failing every single time seems to be.
Comment 5•12 years ago
This may be related to bug 795531 on the mozharness side.
Comment 6•12 years ago
We have stopped making counters mandatory for the time being. In light of that, should we keep this bug open?
Comment 7•12 years ago
We need to update talos and the other packages to pick up the workaround in comment 6.
Comment 8•12 years ago
As Aki says, we do need to update talos + deps to get better here (bug 823306). The current revision, 0e9224d7bc95, raises an error if we fail to collect counters: http://hg.mozilla.org/build/talos/diff/524c6ff1736b/talos/output.py#l208 . Because missing counters happen all the time and caused several intermittent bugs, we subsequently disabled this error: http://hg.mozilla.org/build/talos/file/71f7f2ed08a7/talos/output.py#l208 . See https://bugzilla.mozilla.org/show_bug.cgi?id=812315 . Unfortunately, this has merely transformed the failure into a different intermittent: bug 812729. So while we should update the packages and get to parity with what we use to test m-c, we're still going to have the graphserver error. An alternative is to not care about graphserver for mozharness talos and go straight to datazilla.
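For illustration only, the difference between a mandatory and a non-mandatory counter check might look roughly like the sketch below. This is not the talos code: `MissingCounterError` and `check_counters` are hypothetical names, and the shape of the `collected` mapping (counter name to a list of values) is an assumption.

```python
import logging


class MissingCounterError(Exception):
    """Hypothetical stand-in for the talosError seen in the log."""


def check_counters(expected, collected, mandatory=True):
    """Return the expected counter names that have no collected values.

    With mandatory=True the first missing counter raises, similar to the
    failure reported in this bug. With mandatory=False the gap is only
    logged and the caller can carry on with whatever was collected.
    """
    # A counter counts as missing if it is absent or its value list is empty.
    missing = [name for name in expected if not collected.get(name)]
    if missing and mandatory:
        raise MissingCounterError(
            "Unable to proceed with missing counter '%s'" % missing[0])
    for name in missing:
        logging.warning("No results collected for: %s", name)
    return missing
```

Called as `check_counters(["tp5n_%cpu"], {}, mandatory=False)`, the sketch logs a warning and returns the missing names instead of aborting the run, which is roughly the trade-off being discussed here.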
Comment 9•12 years ago
I would like to say mozharness should only care about datazilla, but that would be putting the cart before the horse. Technically we could do it and the UI will support it fine. I still think we have about 2 months before all tests and data are stable, organized and validated in the datazilla UI. Then we would need to hook up the regression emailer to it, or find a way to report failures. That is one of the final steps, but until we have that at least well under way we shouldn't be talking about reporting to datazilla only.
Comment 10•11 years ago
(In reply to Joel Maher (:jmaher) from comment #9)
> I would like to say mozharness should only care about datazilla, but that
> would be putting the cart before the horse.
>
> Technically we could do it and the UI will support it fine. I still think
> we have about 2 months before all tests and data are stable, organized and
> validated in the datazilla UI. Then we would need to hook up the regression
> emailer to it, or find a way to report failures. That is one of the final
> steps, but until we have that at least well under way we shouldn't be
> talking about reporting to datazilla only.

Two months to get into datazilla - huh - is there anything we can do in the meanwhile? I'm asking because it's unclear (at least to my quick read) what the next steps are here, and whether this really does block bug#713055 (talos-on-mozharness).
Comment 11•11 years ago
The next step is for somebody who can update and debug this to figure out why we fail to collect counters on mozharness only. The same test harness works just fine on buildbot/tinderbox, so something is amiss in the land of mozharness.
Comment 12•11 years ago
I am not seeing this error on Linux, only on Windows. On Linux we are timing out on the tp test, and the logs show a magical 20-minute void in the timestamps. For Windows, it would be nice if we could update mozharness to use the latest talos bits.
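A quick way to spot that kind of void is to scan the log for large jumps between consecutive timestamps. The following is a minimal sketch, assuming every relevant line starts with an `HH:MM:SS` stamp as in the log excerpt in the description; `find_gaps` and its `threshold` parameter are hypothetical names, not part of mozharness or talos.

```python
import re
from datetime import datetime, timedelta

# Matches a leading "HH:MM:SS " stamp, e.g. "09:45:18 INFO - ...".
STAMP = re.compile(r"^(\d{2}:\d{2}:\d{2})\s")


def find_gaps(lines, threshold=timedelta(minutes=10)):
    """Return (start, end, delta) for each timestamp jump >= threshold."""
    gaps = []
    previous = None
    for line in lines:
        match = STAMP.match(line)
        if not match:
            continue  # skip lines without a leading timestamp
        current = datetime.strptime(match.group(1), "%H:%M:%S")
        if previous is not None:
            delta = current - previous
            if delta < timedelta(0):
                # Assumption: a negative jump means the log crossed midnight.
                delta += timedelta(days=1)
            if delta >= threshold:
                gaps.append((previous.time(), current.time(), delta))
        previous = current
    return gaps
```

Feeding it the lines of a saved log, for example `find_gaps(open(path))` with `path` pointing at a downloaded log file, would list each pause together with the stamps on either side of it.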
Comment 13•11 years ago
Unless I'm mistaken, we're no longer running tp5n and we can WONTFIX this bug !!! If we do, however, we should open a new one for tp5o being busted across all platforms :\
Comment 14•11 years ago
All we did was adjust the tp5 pageset and call it tp5o. This isn't a WONTFIX, since the same harness is being run. If you feel the issues are different, then go ahead and WONTFIX. I do know that tp5o runs great on all our production platforms.
Comment 15•11 years ago
Apparently, this is fixed in bug 887479.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED