Closed Bug 906510 (opened 11 years ago; closed 11 years ago)

Enhance parallel XPCShell harness to retry failed tests sequentially

Category: Testing :: XPCShell Harness
Type: defect
Priority: Not set
Severity: normal
Tracking: Not tracked
Status: RESOLVED FIXED
Target Milestone: mozilla26
People: Reporter: mihneadb, Assigned: mihneadb
References: Blocks 1 open bug
Attachments: 1 file, 5 obsolete files

Working on bug 898658, I found that I ended up with ~200 tests marked to run sequentially just because I saw them fail once or twice across tens of try X test runs. This will not scale well.

I suggest a new approach: we make the harness smarter (some might say "more complex") so that instead of marking flaky tests as "run-sequentially", we watch for failures in the parallel run and then rerun those failures sequentially at the end of the run.
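
Roughly, the idea looks like this (a minimal, self-contained sketch, not the actual patch; run_one() is a made-up stand-in for the real per-test runner):

    from concurrent.futures import ThreadPoolExecutor

    def run_all(tests, run_one, max_workers=4):
        failed = []
        with ThreadPoolExecutor(max_workers=max_workers) as pool:
            # First pass: everything in parallel; collect the failures.
            for test, passed in zip(tests, pool.map(run_one, tests)):
                if not passed:
                    failed.append(test)  # mark for a sequential retry
        # Second pass: rerun each failure alone, so failures caused only by
        # parallel execution (resource contention, shared state) drop out.
        still_failing = [t for t in failed if not run_one(t)]
        return failed, still_failing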

I did "some" try runs [1] and found this works pretty well. As you can see from the logs, on average there's not many tests that get rerun and they don't take long. I also talked a bit to philor about this in #ateam and (short version [2]) he supports this approach.


The only platform that is really flaky for parxpc (parallel xpcshell) and might not necessarily benefit is Win8, from what I can see in the try runs. We could keep the current way of running tests on Win8 by adding a check against mozinfo (sketched below, after the links).

[1] https://tbpl.mozilla.org/?tree=Try&rev=6c50d2ebcada
[2] http://logbot.glob.com.au/?c=mozilla%23ateam#c661779
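
For reference, the mozinfo check could look roughly like this (a sketch; the Win8 os_version value, "6.2", is my assumption):

    import mozinfo

    def retries_enabled():
        # Keep the old fully-sequential behavior on Win8 instead of retrying.
        if mozinfo.os == "win" and mozinfo.os_version == "6.2":
            return False
        return True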
Blocks: 660788
Attached patch safetynet (obsolete) (deleted) — Splinter Review
This is the patch I used for the try runs. We could probably make it cleaner, but I wanted to get a proof of concept up quickly so I could do the test runs.

I'd love some feedback on:

1. should we pursue this approach (it seems to me the only reasonable / fast way to get this rolling into production asap)
2. [if 1. is yes] how should I structure this patch (I'm not sure if there's a nicer approach than the early returns)

Thanks
Assignee: nobody → mihneadb
Attachment #791879 - Flags: feedback?(ted)
My only concern is that by doing this we're wallpapering over real test problems that need fixing. As we know from hiding/skipping tests, obscuring a test's problems tends to /dev/null the issue in the minds of any interested parties.

But I also know that getting flaky tests fixed is a much longer-term issue and I wouldn't want to see this work (and the benefits it provides) held up in the meantime. Can we at least write a list of retried tests to the log so that they're easily identifiable? That way, we can at least work towards eliminating them and hopefully remove this workaround in the long run.
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #2)
> My only concern is that by doing this we're wallpapering over real test
> problems that need fixing. As we know from hiding/skipping tests,
> obscuring a test's problems tends to /dev/null the issue in the minds of
> any interested parties.
> 
> But I also know that getting flaky tests fixed is a much longer-term issue
> and I wouldn't want to see this work (and the benefits it provides) held
> up in the meantime. Can we at least write a list of retried tests to the
> log so that they're easily identifiable? That way, we can at least work
> towards eliminating them and hopefully remove this workaround in the long
> run.

Yes, absolutely, we would be logging them.
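
Something along these lines (a sketch; assumes each test object is a dict that carries its manifest path under "path"):

    def log_retried_tests(log, try_again_list):
        # Surface the retried tests in the log so flaky tests stay visible
        # instead of being silently wallpapered over.
        if not try_again_list:
            return
        log.info("The following tests failed when run in parallel and were retried:")
        for test in try_again_list:
            log.info("  %s" % test.get("path", test))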
Attached patch Retry tests that timed out when run in parallel (obsolete) (deleted) — Splinter Review
It might be enough to just retry the tests that time out. I'm not sure; doing
a try run to confirm.

https://tbpl.mozilla.org/?tree=Try&rev=9363edb097b1
Comment on attachment 792640 [details] [diff] [review]
Retry tests that timed out when run in parallel

Ok, this is clearly not enough. :(
Attachment #792640 - Attachment is obsolete: true
Attached patch Rerun tests that fail when run in parallel. (obsolete) (deleted) — Splinter Review
Cleaned the code up a bit and improved logging of reruns.

https://tbpl.mozilla.org/?tree=Try&rev=2bb0d6ce7b96
Attachment #791879 - Attachment is obsolete: true
Attachment #791879 - Flags: feedback?(ted)
Attachment #792920 - Flags: feedback?(ted)
It would be great to come up with a way to tie the retry logs into orangefactor, but I don't think we currently have support for uploading anything from tbpl to somewhere on the internet. Could be a follow-up.
Note that bug 749421 just went live on Cedar.
Great, I'll file a follow-up bug after this lands.
Comment on attachment 792920 [details] [diff] [review]
Rerun tests that fail when run in parallel.

Review of attachment 792920 [details] [diff] [review]:
-----------------------------------------------------------------

I think this is a net positive. We should probably do this in our other harnesses as well to reduce the amount of intermittent orange (but ideally we'd have a way to list the tests that are intermittently failing).

::: testing/xpcshell/runxpcshelltests.py
@@ +79,5 @@
>          self.test_object = test_object
>          self.cleanup_dir_list = cleanup_dir_list
> +        self.try_again_list = try_again_list
> +        self.first_try = first_try
> +        self.retry = False

It feels like you've got too many variables here. I'd probably just make retry a list, and append its contents to try_again_list in the main runTests function.

@@ +129,5 @@
>          self.event.set()
>  
> +    def mark_for_retry(self):
> +        self.try_again_list.append(self.test_object)
> +        self.retry = True

So this could just be:
self.retry.append(self.test_object)

@@ +575,5 @@
> +
> +                if self.first_try:
> +                    self.mark_for_retry()
> +                    self.clean_temp_dirs(name, stdout)
> +                    return

All these repeated blocks are kind of ugly. Can you just set retry = True here, and then at the bottom of this method do:
if retry:
  self.mark_for_retry()
?
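
In other words, something like this shape for run_test (a simplified sketch with hypothetical helper names; the real method is much larger):

    def run_test(self):
        retry = False
        if self.timed_out():
            self.report_timeout()
            retry = self.first_try   # only retry failures from the first pass
        elif not self.passed():
            self.report_failure()
            retry = self.first_try
        # Single exit point: one place to mark the test and clean up.
        if retry:
            self.mark_for_retry()
        self.clean_temp_dirs(self.test_name, self.stdout)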

@@ +1302,5 @@
> +                    verbose=verbose, pStdout=pStdout, pStderr=pStderr,
> +                    keep_going=keepGoing, log=self.log, mobileArgs=mobileArgs,
> +                    first_try=False, **kwargs)
> +            test.start()
> +            test.join()

I'd like to see this factored into a function to share with the other call to testClass above.
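
For example (a sketch following the names in the quoted diff; hypothetical helper, argument plumbing abbreviated):

    def make_test_runner(self, test_object, first_try, **kwargs):
        # One place to build a testClass instance, shared by the parallel
        # pass and the sequential retry pass so the call sites can't drift.
        return self.testClass(test_object,
                              verbose=self.verbose,
                              log=self.log,
                              mobileArgs=self.mobileArgs,
                              first_try=first_try,
                              **kwargs)

    # parallel pass: test = self.make_test_runner(test_object, first_try=True)
    # retry pass:    test = self.make_test_runner(test_object, first_try=False)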
Attachment #792920 - Flags: feedback?(ted) → feedback+
(In reply to Ted Mielczarek [:ted.mielczarek] from comment #10)
> I think this is a net positive. We should probably do this in our other
> harnesses as well to reduce the amount of intermittent orange (but ideally
> we'd have a way to list the tests that are intermittently failing).

And ideally the frequency of tests in this list would be hooked up directly to orangefactor. Otherwise, doing this on a larger scale would make orangefactor more or less useless.
While that's true, orange factor exists as a measure of the pain of sheriffing the tree. If we're retrying failures and presenting a green run for tests that don't fail repeatably, we're removing some of that pain, which makes it less important.
Attached patch Rerun tests that fail when run in parallel. (obsolete) (deleted) — Splinter Review
Cleaned up, feedback welcome.
Attachment #792920 - Attachment is obsolete: true
Attached patch Rerun tests that fail when run in parallel. (obsolete) (deleted) — Splinter Review
Attachment #796169 - Attachment is obsolete: true
Comment on attachment 796236 [details] [diff] [review]
Rerun tests that fail when run in parallel.

https://tbpl.mozilla.org/?tree=Try&rev=6c43ba6cfd2b

I'm not sure about refactoring that init call; could you please explain what you meant?

Not sure how I could make this cleaner.
Attachment #796236 - Flags: feedback?(ted)
Rebased on latest m-c changes. run_test already has ~3 exit points because
of the way we handle hangs, test failures and crashes, so the patch has to
affect all those.

Changing to r?.
Attachment #796809 - Flags: review?(ted)
Attachment #796236 - Attachment is obsolete: true
Attachment #796236 - Flags: feedback?(ted)
Comment on attachment 796809 [details] [diff] [review]
Rerun tests that fail when run in parallel.

Review of attachment 796809 [details] [diff] [review]:
-----------------------------------------------------------------

Looks a lot better, thanks!
Attachment #796809 - Flags: review?(ted) → review+
Keywords: checkin-needed
https://hg.mozilla.org/mozilla-central/rev/fa1e79afefca
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
I think this change should be reverted.  Tests were written with a particular pass/fail threshold in mind,  and moving the threshold might change whether they're still testing what they're intended to test.  For example, a test might have been written that would pass all the time without a bug, but fail intermittently when the bug is present.  (I've definitely written tests of that form, and I suspect others have as well.)

Moving the line might relieve the pain temporarily, but it won't solve the problem that there will always be some bugs (whether bugs in the code or the tests) that cause tests to intermittently cross the pass/fail line.

The point of running tests is to prevent regressing the things that they're testing.  The way to deal with an intermittent failure problem is to put resources into fixing the tests that fail intermittently (particularly into finding the changes that caused them to start failing intermittently).
David: Correct me if I'm wrong, but your complaint has nothing to do with the fact we re-run tests that failed but rather that we report a failed-then-passed test as passed, right? Presumably, we could re-run tests all we want and as long as the output of the job is "proper" then it doesn't matter what the harness does behind the scenes.

Perhaps we should add a new color to differentiate between "always failed" and "intermittently failed"?
(In reply to Gregory Szorc [:gps] from comment #21)
> David: Correct me if I'm wrong, but your complaint has nothing to do with
> the fact we re-run tests that failed but rather that we report a
> failed-then-passed test as passed, right?

Right.

> Presumably, we could re-run tests
> all we want and as long as the output of the job is "proper" then it doesn't
> matter what the harness does behind the scenes.
> 
> Perhaps we should add a new color to differentiate between "always failed"
> and "intermittently failed"?

I think it should be orange, just like other failed tests.

(Also see the somewhat more extended discussion in bug 918922.)
Blocks: 996504