Closed Bug 1518969 Opened 6 years ago Closed 6 years ago

talos --gecko-profile option doesn't seem to work anymore

Categories

(Testing :: Talos, defect)

Version 3
defect
Not set
normal

Tracking

(firefox-esr60 unaffected, firefox64 unaffected, firefox65 unaffected, firefox66 fixed)

RESOLVED FIXED
mozilla66
Tracking Status
firefox-esr60 --- unaffected
firefox64 --- unaffected
firefox65 --- unaffected
firefox66 --- fixed

People

(Reporter: dholbert, Assigned: dholbert)

References

Details

(Keywords: regression)

Attachments

(1 file)

STR:

  1. Run either of these commands:
./mach talos-test -a tart --gecko-profile --cycles 2
./mach talos-test -a ts_paint --gecko-profile --cycles 2

(use --geckoProfile if you prefer; it's just an alias)

EXPECTED RESULTS:
The test should run and give me gecko profile somehow.

ACTUAL RESULTS:
I don't end up getting any gecko profile that I can discover. Instead: a firefox window spawns, runs a cycle of tests, and then takes you to some talos configuration page with a bunch of checkboxes (not profiler-related). If you quit out of that Firefox session at this point, your terminal complains with something like the following:

15:46:53     INFO -  PID 10930 | Cycle 1(1): loaded http://127.0.0.1:57819/tests/tart/tart.html#auto (next: http://127.0.0.1:57819/tests/tart/tart.html#auto)
15:48:01     INFO -  PID 10930 | [Parent 10930, Gecko_IOThread] WARNING: pipe error (92): Connection reset by peer: file /scratch/work/builds/mozilla-central/mozilla/ipc/chromium/src/chrome/common/ipc_channel_posix.cc, line 349
15:48:02     INFO -  Terminating psutil.Process(pid=10930, name='firefox', started='15:46:38')
15:48:02     INFO -  TEST-INFO | 10930: exit 0
15:48:02     INFO -  TEST-UNEXPECTED-ERROR | tart | Could not find report in browser output: [('tsformat', ('__start_report', '__end_report')), ('tpformat', ('__start_tp_report', '__end_tp_report'))]
15:48:02    ERROR -  Traceback (most recent call last):
15:48:02     INFO -    File "/scratch/work/builds/mozilla-central/mozilla/testing/talos/talos/run_tests.py", line 300, in run_tests
15:48:02     INFO -      talos_results.add(mytest.runTest(browser_config, test))
15:48:02     INFO -    File "/scratch/work/builds/mozilla-central/mozilla/testing/talos/talos/ttest.py", line 64, in runTest
15:48:02     INFO -      return self._runTest(browser_config, test_config, setup)
15:48:02     INFO -    File "/scratch/work/builds/mozilla-central/mozilla/testing/talos/talos/ttest.py", line 278, in _runTest
15:48:02     INFO -      else None)
15:48:02     INFO -    File "/scratch/work/builds/mozilla-central/mozilla/testing/talos/talos/results.py", line 95, in add
15:48:02     INFO -      global_counters=self.global_counters
15:48:02     INFO -    File "/scratch/work/builds/mozilla-central/mozilla/testing/talos/talos/results.py", line 326, in __init__
15:48:02     INFO -      self.parse()
15:48:02     INFO -    File "/scratch/work/builds/mozilla-central/mozilla/testing/talos/talos/results.py", line 353, in parse
15:48:02     INFO -      % self.report_tokens)
15:48:02     INFO -    File "/scratch/work/builds/mozilla-central/mozilla/testing/talos/talos/results.py", line 337, in error
15:48:02     INFO -      raise utils.TalosError(message)
15:48:02     INFO -  TalosError: Could not find report in browser output: [('tsformat', ('__start_report', '__end_report')), ('tpformat', ('__start_tp_report', '__end_tp_report'))]
15:48:02     INFO -  TEST-INFO took 87316ms
15:48:02     INFO -  SUITE-END | took 87s
15:48:02    ERROR - Return code: 2
15:48:02  WARNING - setting return code to 2
15:48:02    ERROR - # TBPL FAILURE #

Apparently this may have regressed recently... bgrins says this worked a few weeks ago (before Christmas).

(This isn't just me, too -- bgrins is hitting the error locally as well.)

Just for good measure, I tried exporting MOZ_UPLOAD_DIR=/tmp/upload (a directory I created) in my shell, since the documentation for --gecko-profile mentions that variable is used to figure out where to put the profile. But that doesn't seem to make a difference & nothing ends up in that directory.

I'm using a local mozilla-central build at https://hg.mozilla.org/mozilla-central/rev/cf9e7037ac4c

My mozconfig, in case it matters (probably doesn't):

mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/../obj
mk_add_options AUTOCLOBBER=1
ac_add_options --enable-linker=lld
ac_add_options --enable-debug-symbols --enable-optimize
ac_add_options --with-ccache
mk_add_options 'export CCACHE_PREFIX=icecc'
mk_add_options MOZ_MAKE_FLAGS="-j120"

Hmm, I think the regression/behavior-change here might be external to mozilla-central...

At least, I tested an arbitrary changeset from 2018-12-01 [1] and 2018-12-20 [2], and both had the same issue. (And bgrins had said he ran a command like this successfully before the holidays, which I think would've been more recently than 2018-12-01.)

bgrins, correct me if I'm wrong & the last-good-date might be longer ago...

I suppose it's possible there's a footgun that I'm hitting here, too. :) Please let me know if anyone can get this sort of command to work & if you know what I might be doing wrong...

[1] https://hg.mozilla.org/mozilla-central/rev/571c01c5f84b
[2] https://hg.mozilla.org/mozilla-central/rev/d3dbff04a1d4

(In reply to Daniel Holbert [:dholbert] from comment #2)

Hmm, I think the regression/behavior-change here might be external to mozilla-central...

At least, I tested an arbitrary changeset from 2018-12-01 [1] and 2018-12-20 [2], and both had the same issue. (And bgrins had said he ran a command like this successfully before the holidays, which I think would've been more recently than 2018-12-01.)

bgrins, correct me if I'm wrong & the last-good-date might be longer ago...

I remember taking at least a few in Orlando (~Dec 5) when we were looking at some flexbox perf issues.

I tried another build [3] from a couple months earlier (from October 2nd) for good measure, and got the same behavior there as described in comment 0.

[3] https://hg.mozilla.org/mozilla-central/rev/1f7b787349b8

sorry, disregard comment 2 and 4 perhaps; those builds have "commit dates" of the dates that I mentioned, but their push dates are significantly later. /me shakes fist at untrustworthy hg commit dates (and 'hg up [date]' being untrustworthy)

So the oldest build I've actually tested so far is https://hg.mozilla.org/mozilla-central/rev/571c01c5f84b which was pushed on Dec 15, after bgrins' recollected last-good date.

OK, I tested a build from a commit whose push date is late november:
https://hg.mozilla.org/mozilla-central/rev/fbd97100c83c

...and I confirmed that it works. It quits Firefox after each cycle (doesn't sit on the page w/ results and checkboxes), and the terminal output has "Adding profile" near the end, and I end up with a browser window launched with some profile-viewing options ("Choose a profile from this zip file" & some links).

So this is indeed a regression in mozilla-central that is discoverable via bisection, in theory.

Keywords: regression

Tracked down the regression with 'hg --bisect'.

The first bad revision is:
https://hg.mozilla.org/mozilla-central/rev/2d59d8c885a3
Mark Banner — Bug 1415483 - Apply the new options to reject-importGlobalProperties across the codebase, remove unnecessary importGlobalProperties. r=nika

That ^^ revision gives "actual results", while the parent revision, https://hg.mozilla.org/mozilla-central/rev/85f261cb71cf, gives "expected results". This makes some sense, because 2d59d8c885a3 was a pretty wide-ranging commit that did include some changes to talos code.

(Warning: to be able to test either of those revisions, you have to first revert the [unrelated] previous commit, Bug 1512673's 985505cc1347, which broke the build & was later backed out.)

Blocks: 1415483

Aha! I'm pretty sure the problem was from https://hg.mozilla.org/mozilla-central/rev/2d59d8c885a3#l20.2 's removal of Cu.importGlobalProperties(["TextEncoder"]); from the "talos-powers/api.js" file, because:

  1. In a build at (first-bad) cset 2d59d8c885a3, I see this in my error console when the issue happens:

ReferenceError: TextEncoder is not defined[Learn More] api.js:124:13

(2) If I revert that one-line api.js change from a build at that first-bad cset 2d59d8c885a3, then I'm restored to EXPECTED RESULTS.

This line was removed by mistake in Bug 1415483. I'm restoring it along with a eslint-rule-disabling comment that we now need (as of that same bug).
Assignee: nobody → dholbert
Status: NEW → ASSIGNED

I verified that the attached (phabricator-submitted) patch gives me "expected results" in a mozilla-central build from yesterday.

Pushed by dholbert@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/24b9c0eb34c0 Restore TextEncoder import in talos's api.js file. r=Standard8
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla66
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: