Closed
Bug 1342828
Opened 8 years ago
Closed 7 years ago
linux64 debug mozmill test always hit timeout
Categories
(Thunderbird :: Testing Infrastructure, defect)
Thunderbird
Testing Infrastructure
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: arai, Assigned: tomprince)
References
Details
Attachments
(5 files, 1 obsolete file)
(deleted),
patch
|
jlund
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Fallen
:
review+
jorgk-bmo
:
approval-comm-beta+
jorgk-bmo
:
approval-comm-esr52+
|
Details | Diff | Splinter Review |
(deleted),
text/x-github-pull-request
|
Details | |
(deleted),
patch
|
jlund
:
review+
|
Details | Diff | Splinter Review |
(deleted),
text/x-review-board-request
|
nthomas
:
review+
|
Details |
> command timed out: 7200 seconds elapsed running ['/tools/buildbot/bin/python',
> 'scripts/scripts/desktop_unittest.py', '--cfg', 'unittests/linux_unittest.py',
> '--mozmill-suite', 'mozmill', '--cfg', 'unittests/thunderbird_extra.py',
> '--blob-upload-branch', 'try-comm-central', '--download-symbols', 'true'],
> attempting to kill
it need to be split up into 2 or more chunks, like bug 1248692 for xpcshell test.
Reporter | ||
Comment 2•8 years ago
|
||
can it be separated into many smaller parts?
in that case it's easy to test try-only intermittent.
Reporter | ||
Comment 3•8 years ago
|
||
I think currently mozmill test result is not reliable for finding regression, because of this issue.
even if we compare the result between previous run, the set of executed tests can be different, and we'll overlook regression, or get confused by not-new-regression.
Severity: normal → major
Updated•8 years ago
|
Assignee: nobody → aleth
Flags: needinfo?(aleth)
Comment 4•8 years ago
|
||
I'm not sure if mozmill supports totalChunks, but let's try it and see. If it doesn't work we might have to increase the script_maxtime instead.
Attachment #8843727 -
Flags: review?(jlund)
Comment 5•8 years ago
|
||
Sorry about the following unrelated comment:
On Linux debug the Xpcshell tests run as X1 and X2 on the treeherder. X1 seem to be TB tests whereas X2 are M-C tests. Could I have that on all platforms and also on opt builds.
Comment 6•8 years ago
|
||
(In reply to Jorg K (GMT+1) from comment #5)
> Sorry about the following unrelated comment:
> On Linux debug the Xpcshell tests run as X1 and X2 on the treeherder. X1
> seem to be TB tests whereas X2 are M-C tests. Could I have that on all
> platforms and also on opt builds.
That pattern is a coincidence, as you can see from the fact it's only approximately true (e.g. devtools/ in X1). The test order is more or less alphabetical.
I'm not sure what the resource costs would be of using two xpcshell chunks everywhere.
Comment 7•8 years ago
|
||
Comment on attachment 8843727 [details] [diff] [review]
Use two chunks for TB Linux debug mozmill tests
Review of attachment 8843727 [details] [diff] [review]:
-----------------------------------------------------------------
::: mozilla-tests/thunderbird_config.py
@@ +166,5 @@
> ]
> +MOZMILL_TWO_CHUNKS = [
> + ('mozmill', {
> + 'use_mozharness': True,
> + 'script_path': 'scripts/desktop_unittest.py',
hm, I suppose if you wanted to test this on try you hack the desktop_unittest.py script in-tree to pretend as if totalChunks:2 was passed in the script args.
Attachment #8843727 -
Flags: review?(jlund) → review+
Comment 8•8 years ago
|
||
What is the status of this bug?
It has a positive review but no chekin yet...
Flags: needinfo?(jlund)
Flags: needinfo?(aleth)
Comment 9•8 years ago
|
||
Sadly I haven't seen Aleth in weeks :-(
Comment 10•8 years ago
|
||
Or it just needs to land on *central? Can it break anything outside of c-c if it isn't right? In which repo does it go when the file is in mozilla-tests/ ?
Comment 11•8 years ago
|
||
(In reply to Markus Adrario [:Taraman] from comment #8)
> What is the status of this bug?
> It has a positive review but no chekin yet...
I'm not sure. It has been reviewed so it can land on buildbot-configs (outside gecko/central). Someone will have to land and be on point for any fallout.
Flags: needinfo?(jlund)
Comment 12•8 years ago
|
||
Philipp, could you please take a look at this and land it if you think it's right.
Flags: needinfo?(philipp)
Comment 13•8 years ago
|
||
Looks right to me. I'll take care to back out if there is fallout, just let me know if I miss it.
https://hg.mozilla.org/build/buildbot-configs/rev/555764ee0a26cda0164cb1d54fdb014b2afc6606
Flags: needinfo?(philipp)
Flags: needinfo?(aleth)
Comment 14•8 years ago
|
||
When is this change expected to take effect?
until now the problem remains...
Comment 15•8 years ago
|
||
Philipp, can you please back this out again, it seems to have destroyed Mozmill on Linux altogether, see treeherder C-C after Mon May 1, 2017, 22:15:53: ? ? B X1 X2.
Flags: needinfo?(philipp)
Comment 16•8 years ago
|
||
Backed out in https://hg.mozilla.org/build/buildbot-configs/rev/f3cb566ee39e4a42af2f7713053d73d98f11f7f6
Pending a merge to production and a reconfig
Flags: needinfo?(philipp)
Comment 17•8 years ago
|
||
Nick, could you please take care of the merge to production, etc.
Flags: needinfo?(nthomas)
Comment 19•7 years ago
|
||
Tom, could you take a look at this one. We put this into production and had to back it out.
Flags: needinfo?(tom.prince)
Assignee | ||
Comment 20•7 years ago
|
||
It looks like the mozmill test runner (https://dxr.allizom.org/comm-central/source/mail/test/mozmill/runtestlist.py) doesn't have support for running in multiple chunks currently, unlike the xpcshell runner (https://dxr.allizom.org/comm-central/source/mozilla/testing/xpcshell/runxpcshelltests.py).
It looks like it would make sense to port runtestlist to use manifestparser, which can then take advantage manifestparser.chunk_by_slice. A quicker solution would be to just duplicate the slicing logic (https://dxr.allizom.org/comm-central/source/mozilla/testing/mozbase/manifestparser/manifestparser/filters.py#170-194) from there.
Assignee | ||
Comment 21•7 years ago
|
||
I looked into using manifestparser, but it has more expectations on test layout and test running, so I just implemented chunking in the mozmill runner.
It does look like mozmill has some support for manifestparser (although it doesn't expose filters). It might be worthwhile exploring migrating to using that, but this unblocks chunking the tests pending that work.
Flags: needinfo?(tom.prince)
Comment 22•7 years ago
|
||
Comment on attachment 8876459 [details] [diff] [review]
Support chunks in mozmill tests.
Looks good to me, thanks for figuring it out! I agree that this is the simpler solution, especially given mozmill is only still used in Thunderbird.
Let's get this patch pushed then retry the buildbot changes. I wonder if the question mark that showed up on treeherder was just a side-effect of the chunk logic missing, or if more work is needed.
Attachment #8876459 -
Flags: review+
Updated•7 years ago
|
Assignee: aleth → tom.prince
Keywords: checkin-needed
Comment 23•7 years ago
|
||
(In reply to Philipp Kewisch [:Fallen] from comment #22)
> Let's get this patch pushed then retry the buildbot changes.
I can certainly land this in the next push.
Can someone please enlighten me what the Buildbot changes are for. Are they needed as well? Philipp talks about a "simpler solution" presented in the second patch.
Comment 24•7 years ago
|
||
Oh, looks like the first patch requests chunks and the second patch makes them possible for C-C.
Comment 25•7 years ago
|
||
To see the treeherder question marks Philipp mentioned, please click here:
https://treeherder.mozilla.org/#/jobs?repo=comm-central&revision=34153f64820674523909b789c68f58e8f0c1af01
Assignee | ||
Comment 26•7 years ago
|
||
> I wonder if the question mark that showed up on treeherder was just a side-effect of the chunk logic missing, or if more work is needed.
I had been going to guess that it was just a side effect, but looking at https://github.com/mozilla/treeherder/blob/5e7a78cbbd129b21164609a3d96353dc114ecde8/treeherder/etl/buildbot.py#L642 the trailing '$' in the regex might need to be removed.
Comment 27•7 years ago
|
||
Does this change mozmill itself? May the change get overwritten if we update mozmill? If so, the hunk should be somehow marked so it doesn't get lost as it will not be upstream.
Assignee | ||
Comment 28•7 years ago
|
||
No. The vendored copy of mozmill is at https://dxr.allizom.org/comm-central/source/mail/test/resources/mozmill . This is just the code that invokes mozmill. https://dxr.allizom.org/comm-central/source/mail/test/mozmill/runtest.py is a wrapper around mozmill which is invoked by https://dxr.allizom.org/comm-central/source/mail/test/mozmill/runtestlist.py which is what is being changed.
Comment 29•7 years ago
|
||
(In reply to Jorg K (GMT+2) from comment #23)
> (In reply to Philipp Kewisch [:Fallen] from comment #22)
> > Let's get this patch pushed then retry the buildbot changes.
> I can certainly land this in the next push.
>
> Can someone please enlighten me what the Buildbot changes are for. Are they
> needed as well? Philipp talks about a "simpler solution" presented in the
> second patch.
So for checkin, please push the c-c patch. I will take care of the buildbot patch once it is in. With "simpler solution", I just agreed with Tom from comment 20 and 21 that just porting the chunking logic makese sense instead of porting mozmill to use the manifestparser.
Tom, if you are certain that the $ will fix it I think you can go ahead and send a PR for treeherder. Otherwise we can fix this when the builds are in as it seems to just be a cosmetic issue.
Comment 30•7 years ago
|
||
I know I make myself very unpopular by re-reviewing things before landing, but so be it.
Let's assume we have 100 tests and want two chunks. Let's also assume this_chunk is set to 1 and 2, although I can't see where this happens.
this_chunk=1:
+ tests_per_chunk = float(len(tests)) / options.total_chunks
+ start = int(round((options.this_chunk - 1) * tests_per_chunk))
+ end = int(round(options.this_chunk * tests_per_chunk))
We get: tests_per_chunk = 50.
start=0
end=50.
this_chunk=2:
We get: tests_per_chunk = 50.
start=50
end=100.
I think there is a -1 missing on the end calculation. Or am I missing something? Then I apologise in advance ;-)
Keywords: checkin-needed
Assignee | ||
Comment 31•7 years ago
|
||
> I know I make myself very unpopular by re-reviewing things before landing, but so be it.
Easier to fix things before they land than after.
> I think there is a -1 missing on the end calculation. Or am I missing something?
Python (as well as many other languages) uses half-open intervals to represent ranges. That is, a range includes the left endpoint but not the right endpoint. So tests[0:50] selects everything from 0-49 and tests[50:100] selects everything 50-99. They do this to make this kind of code easier to reason about; you know that you've covered a large interval by smaller ones if all the endpoints match, rather than if the endpoints differ by one, for example.
> Then I apologise in advance ;-)
If something is confusing for one person, it is liable to be confusing to others, so better to catch it and improve it, if necessary. I don't *think* that any explanation is needed in the code here, but it seems better to ask than to let something confusing get through.
Comment 32•7 years ago
|
||
(In reply to Tom Prince from comment #31)
> That is, a range includes the left endpoint but not the
> right endpoint. So tests[0:50] selects everything from 0-49 and
> tests[50:100] selects everything 50-99.
I thought you would answer that, I was going to look it up once I returned to my office, but you beat me to it.
This clearly shows my ignorance and the fact that we need a build/release engineer ;-)
> I don't *think* that any explanation is needed in the code here, ...
Well, if it's a language feature, it doesn't need a comment. Code is to be read by people capable of understanding it ;-)
I'll get it landed after the next M-C merge. As I explained elsewhere, M-C land so many changes that in order to see bustage early, I do a push after every M-C merge.
Keywords: checkin-needed
Comment 33•7 years ago
|
||
https://hg.mozilla.org/comm-central/rev/aa78e5d5dc26d9cda6e983589ccb98a7a94054be
Philipp, please take care of the buildbot part. Can we also please fix the treeherder, cosmetic or not ;-)
Flags: needinfo?(philipp)
Keywords: checkin-needed → leave-open
Comment 34•7 years ago
|
||
Mozmill still working ;-)
Just as a matter of interest, where do you increment options.this_chunk?
Assignee | ||
Comment 35•7 years ago
|
||
this_chunk isn't every incremented. The buildbot configuration arranges to call the script twice (in the two different builds), once with --this-chunk 1 and once with --this-chunk 2. (See https://dxr.allizom.org/build-central/source/buildbotcustom/misc.py#665-686 for details)
Comment 36•7 years ago
|
||
Good to know, thanks. Any reason to use allizom?
https://dxr.mozilla.org/build-central/source/buildbotcustom/misc.py#665-686 works, too.
Comment 37•7 years ago
|
||
https://hg.mozilla.org/build/buildbot-configs/rev/4f9b97ebf944bb3fb0aff829e8450d41ffdc5821
I think buildbot-configs is merged to production and reconfigured at midnight, I always forget if this is automated or requires someone to manually intervene.
Flags: needinfo?(philipp)
Comment 38•7 years ago
|
||
The question marks are back, can we get them fixed, please.
Flags: needinfo?(tom.prince)
Assignee | ||
Comment 39•7 years ago
|
||
https://github.com/mozilla/treeherder/pull/2573 should fix the `?`s.
Flags: needinfo?(tom.prince)
Comment 40•7 years ago
|
||
Comment 41•7 years ago
|
||
Commit pushed to master at https://github.com/mozilla/treeherder
https://github.com/mozilla/treeherder/commit/a0fbe83925d3a9c72363ba71b12822f5111aeae4
Bug 1342828 - Support chunked mozmill jobs (#2573)
Assignee | ||
Comment 42•7 years ago
|
||
It looks like one of the tests is still failing due to timeout. It looks like the number of test per folder is unevenly distributed causing the first chunk to take wildly longer. (The first chunk has over 600 tests and the second has ~250).
Comment 43•7 years ago
|
||
(In reply to Tom Prince from comment #39)
> https://github.com/mozilla/treeherder/pull/2573 should fix the `?`s.
It does, you can see it working on comm-beta right now. Thanks!
Assignee | ||
Comment 44•7 years ago
|
||
It looks like https://bugzilla.mozilla.org/attachment.cgi?id=8876459 should be landed comm-beta too, since the buildbot changes are causing the tests to not run there. (This might mean that this also needs to be applied to ESR releases?)
Comment 45•7 years ago
|
||
Comment on attachment 8876459 [details] [diff] [review]
Support chunks in mozmill tests.
I'll take care of it, thanks!
Attachment #8876459 -
Flags: approval-comm-esr52+
Attachment #8876459 -
Flags: approval-comm-beta+
Comment 46•7 years ago
|
||
Assignee | ||
Comment 47•7 years ago
|
||
Assignee | ||
Updated•7 years ago
|
Attachment #8880478 -
Flags: review?(jlund)
Comment 48•7 years ago
|
||
Comment on attachment 8880478 [details] [diff] [review]
Support chunked mozmill tests in try syntax.
Review of attachment 8880478 [details] [diff] [review]:
-----------------------------------------------------------------
lgtm. do we need to update trychooser front end to add this option?
Attachment #8880478 -
Flags: review?(jlund) → review+
Comment 49•7 years ago
|
||
https://hg.mozilla.org/build/buildbotcustom/rev/e32f42e43b34782fde995aff4477c5cbc5d2d967
(In reply to Jordan Lund (:jlund) from comment #48)
> do we need to update trychooser front end to add this option?
Maybe not. Mozmill is not used in Firefox and our TB folks just put |-u mozmill|. That just stopped working due to the chunking so now it will work again.
Are we done here or will we still address comment #42? Tom?
Assignee | ||
Comment 50•7 years ago
|
||
Looking at the latest treeherder push on comm-central, the first mozmill test run is still timing out, so we'll need to something to address that.
An easy workaround would be to up the chunks, but given that the first chunk takes >120m and the second just 45, there should be a better way of balancing the tests.
Comment 51•7 years ago
|
||
I'd go to three chunks and be done with it ;-)
Comment 52•7 years ago
|
||
Nick, sorry for the hassle, did the push from comment #49 get merged to production?
Flags: needinfo?(nthomas)
Comment 53•7 years ago
|
||
https://hg.mozilla.org/build/buildbotcustom/pushloghtml?fromchange=07a05ce39ac6&tochange=4e4744e608c6 shows pretty clearly that it did.
Flags: needinfo?(nthomas)
Comment 54•7 years ago
|
||
Sorry, I didn't know what to look for, now I understand.
Comment 55•7 years ago
|
||
Tom, can we do a bit more here, since both Xpcshell and Mozmill are timing out, for example here:
https://treeherder.mozilla.org/#/jobs?repo=comm-central&revision=0170000376576d23e86bd8faa3e51c4e659f85c2
Flags: needinfo?(mozilla)
Comment hidden (mozreview-request) |
Assignee | ||
Updated•7 years ago
|
Attachment #8887142 -
Attachment is obsolete: true
Comment hidden (mozreview-request) |
Assignee | ||
Comment 58•7 years ago
|
||
mozreview-review |
Comment on attachment 8887143 [details]
Bug 1342828 - Use many chunks in thunderbird debug tests.
https://reviewboard.mozilla.org/r/157892/#review162994
::: mozilla-tests/thunderbird_config.py:171
(Diff revision 1)
> ('mozmill', {
> 'use_mozharness': True,
> 'script_path': 'scripts/desktop_unittest.py',
> 'extra_args': ['--mozmill-suite', 'mozmill',
> '--cfg', 'unittests/thunderbird_extra.py',
> '--opt-cfg', 'unittests/thunderbird_buildbot.py'],
This depends on https://reviewboard.mozilla.org/r/155746/ for the trivial reason that they overlap adding this line.
Assignee | ||
Comment 59•7 years ago
|
||
mozreview-review |
Comment on attachment 8887143 [details]
Bug 1342828 - Use many chunks in thunderbird debug tests.
https://reviewboard.mozilla.org/r/157894/#review162998
This is probably overkill, but it seems easier to split things up more, and increase the timeouts to figure out what is going on, and then make the bounds tighter later.
Comment 60•7 years ago
|
||
Great!
Comment 61•7 years ago
|
||
mozreview-review |
Comment on attachment 8887143 [details]
Bug 1342828 - Use many chunks in thunderbird debug tests.
https://reviewboard.mozilla.org/r/157894/#review163254
Attachment #8887143 -
Flags: review+
Comment 62•7 years ago
|
||
Comment on attachment 8887143 [details]
Bug 1342828 - Use many chunks in thunderbird debug tests.
https://hg.mozilla.org/build/buildbot-configs/rev/26fba14dab2d8a75690dcec4394c1d06ce60f3df
https://hg.mozilla.org/build/buildbot-configs/rev/cc0315e8cf031861b866a972ca5048e93b22a232
Will be used for jobs starting in 55 minutes or so.
Comment 63•7 years ago
|
||
If you would like to dig into the timeouts and running them locally is an issue then I suggest getting a loaner, see https://wiki.mozilla.org/ReleaseEngineering/How_To/Request_a_loaner.
Comment 64•7 years ago
|
||
I think it's time to close this bug and open a new one for follow-up work.
In
https://treeherder.mozilla.org/#/jobs?repo=comm-central&revision=25b49e872ac53975737e857be31f5c676c586007
there are no more timeouts, the slowest tests are X4 and Z1:
Linux 32bit: X4: 110 min, Z1: 113 min
Linux 64bit: X4: 85 min, Z1: 88 min.
With 'script_maxtime': 3600*3 which is 180 min we should be good for a while, right?
Status: NEW → RESOLVED
Closed: 7 years ago
Flags: needinfo?(mozilla)
Keywords: leave-open
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•