Closed Bug 1630982 Opened 5 years ago Closed 5 years ago

Terminate mochitests using exit(0) as soon as we have test results

Tracking

(firefox78 fixed)

Status:

RESOLVED FIXED

Milestone:

mozilla78

Tracking Flags:

Tracking

Status

firefox78

---

fixed

People

(Reporter: florian, Assigned: florian)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [ci-costs-2020:done])

Attachments

(3 files, 1 obsolete file)

Bug 1630982 - Terminate browser-chrome mochitests with an exit(0), r=dthayer,gbrown. 5 years ago Florian Quèze [:florian] (deleted), text/x-phabricator-request		Details
IPC errors when terminating the parent process with an exit(0) 5 years ago Florian Quèze [:florian] (deleted), text/plain		Details
Bug 1630982 - Exit(0) at the end of xpcshell tests, r=gbrown. 5 years ago Florian Quèze [:florian] (deleted), text/x-phabricator-request		Details
Bug 1630982 - terminate the process quickly for mochitest-plain tests, r=gbrown. 5 years ago Florian Quèze [:florian] (deleted), text/x-phabricator-request		Details

Florian Quèze [:florian]

Assignee

Description

•

5 years ago

Shutdown is slow because we write things to disk and do cleanups (eg. garbage collection). There's an ongoing effort (tracked in bug 1606879) to make shutdown faster in general by doing the writes during the beginning of shutdown, and then terminating the process with an exit(0) for optimized builds.

For browser chrome mochitests, we can terminate without worrying about what we write to disk, so we can be much more agressive.

Florian Quèze [:florian]

Assignee

Comment 1

•

5 years ago

Attached file Bug 1630982 - Terminate browser-chrome mochitests with an exit(0), r=dthayer,gbrown. (deleted) — Details

Marco Castelluccio [:marco]

Updated

•

5 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1608454

Marco Castelluccio [:marco]

Comment 2

•

5 years ago

Would this skip leak checking?

Joel Maher ( :jmaher ) (UTC -8)

Comment 3

•

5 years ago

I see this is skipping debug, so leak checking would still be valid. we should validate ccov, tsan, asan needs; assuming this is successful, we should do the rest of mochitest, xpcshell, reftest, wpt as well.

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

5 years ago

Whiteboard: [ci-costs-2020:todo]

Marco Castelluccio [:marco]

Comment 4

•

5 years ago

For ccov, we need to take the long shutdown road, as coverage counters are written during shutdown.

Florian Quèze [:florian]

Assignee

Comment 5

•

5 years ago

So far I’ve tested on try with the browser/base/content/test/webrtc folder of tests that I know well (as I wrote most of them years ago). Initial results seem encouraging: 53s instead of 75s on Win7, 63 instead of 73 on Win10, 55 instead of 66 on OS X. On linux 299s without the patch and 246s with it (but both runs lost 119s during startup on the initialization of dbus that seems to be terribly slow when startup profiling is enabled, as we discovered in bug 1624868).

James Graham [:jgraham]

Comment 6

•

5 years ago

How different is this to just sending SIGKILL to the browser? For wpt control has to return to the Python and we currently do a dance where we try to do a clean shutdown using marionette, then send a SIGTERM and then a SIGKILL. So for an implementation there there are two options: either allow marionette to perform a fast shutdown, or just kill the process agressively.

Florian Quèze [:florian]

Assignee

Comment 7

•

5 years ago

Attached file IPC errors when terminating the parent process with an exit(0) (deleted) — Details

Florian Quèze [:florian]

Assignee

Comment 8

•

5 years ago

One behavior that I don't like with the current patch is that terminating abruptly the parent process without saying goodbye to child processes means they output noise in the terminal before also terminating themselves. Probably not much of a problem for runs on infra, but I would also like this patch to improve the experience for developers running mochitests locally. Sending an IPC messages to tell child processes to quietly exit could be a follow-up.

Florian Quèze [:florian]

Assignee

Comment 9

•

5 years ago

(In reply to James Graham [:jgraham] from comment #6)

How different is this to just sending SIGKILL to the browser?

If profiling is enabled, this patch dumps the profile to the disk before calling exit. I would like profiling to be available for all tests, so that we can see where they spend time, and find more opportunities for optimization.

I think we should allow marionette to perform a fast shutdown.

Geoff Brown [:gbrown]

Comment 10

•

5 years ago

Overall I think this is a great initiative and I am happy to see the implementation in the proposed patch. However, I feel we should keep in mind -- especially if we are considering expanding the scope of changes -- that we want to test the entire browser life-cycle, and various tests have, on occasion, demonstrated shutdown hangs.

Florian Quèze [:florian]

Assignee

Comment 11

•

5 years ago

(In reply to Geoff Brown [:gbrown] from comment #10)

various tests have, on occasion, demonstrated shutdown hangs.

Were these hangs not happening on debug builds?

Geoff Brown [:gbrown]

Comment 12

•

5 years ago

I'm not sure, but probably. As I said, I support the proposed change; I just don't want us to try to short cut shutdown everywhere.

Phabricator Automation

Updated

•

5 years ago

Assignee: nobody → florian

Status: NEW → ASSIGNED

Florian Quèze [:florian]

Assignee

Comment 13

•

5 years ago

Attached file Bug 1630982 - Exit(0) at the end of xpcshell tests, r=gbrown. (obsolete) (deleted) — Details

Florian Quèze [:florian]

Assignee

Comment 14

•

5 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #3)

I see this is skipping debug, so leak checking would still be valid. we should validate ccov, tsan, asan needs; assuming this is successful, we should do the rest of mochitest, xpcshell, reftest, wpt as well.

The patch already skips debug builds, and also skips the this._coverageCollector case, so I'm hoping it's good for ccov. Anything specific that needs to be done to validate that tsan and asan are not going to be negatively affected?

Flags: needinfo?(jmaher)

Marco Castelluccio [:marco]

Comment 15

•

5 years ago

(In reply to Florian Quèze [:florian] from comment #14)

The patch already skips debug builds, and also skips the this._coverageCollector case, so I'm hoping it's good for ccov.

No, _coverageCollector is not used in normal coverage builds (it was used for coverage collected via the debugger). There is a MOZ_CODE_COVERAGE definition that you can use instead.

Florian Quèze [:florian]

Assignee

Comment 16

•

5 years ago

xpcshell profile of the same test without the exit(0) https://perfht.ml/2VxnujY and with it https://perfht.ml/2x9hHYE

Joel Maher ( :jmaher ) (UTC -8)

Comment 17

•

5 years ago

let me check with :decoder about tsan (and he might know about asan).

:decoder, this patch is doing a quick browser shutdown when we can to save time. Is this ok for TSAN? we are not doing this for debug as we look for leaks. Are there similar things for TSAN? If so how can we look for them?

Flags: needinfo?(jmaher) → needinfo?(choller)

Christian Holler (:decoder)

Comment 18

•

5 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #17)

let me check with :decoder about tsan (and he might know about asan).

:decoder, this patch is doing a quick browser shutdown when we can to save time. Is this ok for TSAN? we are not doing this for debug as we look for leaks. Are there similar things for TSAN? If so how can we look for them?

Is this just for tests, or is this the same we do in reality in the release builds?

Flags: needinfo?(choller) → needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 19

•

5 years ago

just for tests, hacked into the harnesses

Flags: needinfo?(jmaher)

Christian Holler (:decoder)

Comment 20

•

5 years ago

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #19)

just for tests, hacked into the harnesses

I don't think it's a good idea to do this (not just in TSan, but for all builds), in particular because we had and have problems with shutdown. Shutdown hangs or races have been a constant problem in Firefox and skipping this part just in tests greatly increases the risk that we ship a shutdown regression of some sort.

Sylvestre Ledru [:Sylvestre]

Comment 21

•

5 years ago

we could keep the normal exit process enabled for the m-c job (and disable it for try and autoland)
or some variance of this :)

and, anyway, if we regress too much, we can always come back to the previous behavior

Florian Quèze [:florian]

Assignee

Comment 22

•

5 years ago

(In reply to Christian Holler (:decoder) from comment #20)

I don't think it's a good idea to do this (not just in TSan, but for all builds), in particular because we had and have problems with shutdown. Shutdown hangs or races have been a constant problem in Firefox and skipping this part just in tests greatly increases the risk that we ship a shutdown regression of some sort.

Which kind of shutdown hangs would happen in opt builds but not in debug builds? I'm under the impression that debug builds are more likely to hang than optimized builds, so I don't think we would be losing significant test coverage.

Adding the needinfo back for the initial question:

(In reply to Joel Maher ( :jmaher ) (UTC-4) from comment #17)

let me check with :decoder about tsan (and he might know about asan).

:decoder, this patch is doing a quick browser shutdown when we can to save time. Is this ok for TSAN? we are not doing this for debug as we look for leaks. Are there similar things for TSAN? If so how can we look for them?

Flags: needinfo?(choller)

Christian Holler (:decoder)

Comment 23

•

5 years ago

(In reply to Florian Quèze [:florian] from comment #22)

Which kind of shutdown hangs would happen in opt builds but not in debug builds? I'm under the impression that debug builds are more likely to hang than optimized builds, so I don't think we would be losing significant test coverage.

That is correct because in opt builds, the content processes have a fast exit path, while in debug they don't (hence in total, you will see more hangs with debug builds, if the hang is caused by a child).

But there is no such thing for the parent process in debug vs. opt and since opt has a vastly different timing compared to debug, it is not viable to assume that opt has less hangs than debug. It could well be exactly the other way around.

Adding the needinfo back for the initial question:

With TSan we should not exit because we would not be able to catch shutdown races anymore. There is no debug TSan configuration running because that would be way too slow.

Flags: needinfo?(choller)

Phabricator Automation

Updated

•

5 years ago

Attachment #9141272 - Attachment description: Bug 1630982 - Terminate mochitests with an exit(0), → Bug 1630982 - Terminate mochitests with an exit(0), r=dthayer,gbrown.

Florian Quèze [:florian]

Assignee

Comment 24

•

5 years ago

(In reply to Sylvestre Ledru [:Sylvestre] from comment #21)

we could keep the normal exit process enabled for the m-c job (and disable it for try and autoland)
or some variance of this :)

After a discussion on matrix with decoder, I think we should avoid the exit(0) shutdown path for RELEASE_OR_BETA. This is a good compromise between ensuring we test what we ship to users, and reducing CI costs.

Phabricator Automation

Updated

•

5 years ago

Attachment #9141272 - Attachment description: Bug 1630982 - Terminate mochitests with an exit(0), r=dthayer,gbrown. → Bug 1630982 - Terminate browser-chrome mochitests with an exit(0), r=dthayer,gbrown.

Pulsebot

Comment 25

•

5 years ago

Pushed by fqueze@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/b77388326b1b Terminate browser-chrome mochitests with an exit(0), r=gbrown,dthayer.

Florian Quèze [:florian]

Assignee

Comment 26

•

5 years ago

I reduced the patch to only contain the part related to browser-chrome tests so that it can land sooner.

For the mochitest-plain tests, I have test failures on try when a profiler folder is re-used for the edge case of a test manifest specifying tests using both the http and https schemes.

For the xpcshell tests, I have failures for tests that run in a child process, I need to figure out how to handle this case.

I'll handle finishing both of these in follow-up bugs.

Summary: Terminate the browser process using exit(0) as soon as we have test results → Terminate browser-chrome mochitests using exit(0) as soon as we have test results

Dorel Luca [:dluca]

Comment 27

•

5 years ago

Backed out changeset b77388326b1b (bug 1630982) for causing leaks on mochitest jobs. CLOSED TREE

Log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=301178919&repo=autoland&lineNumber=1666

Push with failures:
https://treeherder.mozilla.org/#/jobs?repo=autoland&group_state=expanded&revision=b77388326b1bf23441fad1c3aa22ad04443550f9\

Backout:
https://hg.mozilla.org/integration/autoland/rev/e2f1a6b3f0a865cbdbf6810d2278de290ce17773

Flags: needinfo?(florian)

Florian Quèze [:florian]

Assignee

Comment 28

•

5 years ago

Attached file Bug 1630982 - terminate the process quickly for mochitest-plain tests, r=gbrown. (deleted) — Details

Depends on D71336

Pulsebot

Comment 29

•

5 years ago

Pushed by fqueze@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/ae9720f97abb Terminate browser-chrome mochitests with an exit(0), r=gbrown,dthayer. https://hg.mozilla.org/integration/autoland/rev/a8cd211f5393 terminate the process quickly for mochitest-plain tests, r=gbrown.

Razvan Maries

Comment 30

•

5 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/ae9720f97abb
https://hg.mozilla.org/mozilla-central/rev/a8cd211f5393

Status: ASSIGNED → RESOLVED

Closed: 5 years ago

status-firefox78: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla78

Bogdan Tara[:bogdan_tara | bogdant]

Updated

•

5 years ago

Regressions: 1636316

Florian Quèze [:florian]

Assignee

Comment 31

•

5 years ago

(In reply to Florian Quèze [:florian] from comment #26)

I reduced the patch to only contain the part related to browser-chrome tests so that it can land sooner.

For the mochitest-plain tests, I have test failures on try when a profiler folder is re-used for the edge case of a test manifest specifying tests using both the http and https schemes.

This part actually landed in this bug.

For the xpcshell tests, I have failures for tests that run in a child process, I need to figure out how to handle this case.

I fixed this case where xpcshell tests using a child process failed, but I still have 2 xpcshell tests that crash at shutdown and I haven't figured out why yet; I'll file a follow-up for the xpcshell part of my patches.

Flags: needinfo?(florian)

Summary: Terminate browser-chrome mochitests using exit(0) as soon as we have test results → Terminate mochitests using exit(0) as soon as we have test results

Florian Quèze [:florian]

Assignee

Updated

•

5 years ago

Blocks: 1637637

Phabricator Automation

Comment 32

•

5 years ago

Comment on attachment 9143127 [details]
Bug 1630982 - Exit(0) at the end of xpcshell tests, r=gbrown.

Revision D72407 was moved to bug 1637637. Setting attachment 9143127 [details] to obsolete.

Attachment #9143127 - Attachment is obsolete: true

Simon Giesecke [:sg] [he/him]

Updated

•

5 years ago

Regressions: 1637715

Joel Maher ( :jmaher ) (UTC -8)

Updated

•

4 years ago

Whiteboard: [ci-costs-2020:todo] → [ci-costs-2020:done]

Marco Castelluccio [:marco]

Comment 33

•

2 years ago

Did we ever consider doing this for other test suites, e.g. reftest?

Flags: needinfo?(florian)

Florian Quèze [:florian]

Assignee

Comment 34

•

2 years ago

(In reply to Marco Castelluccio [:marco] from comment #33)

Did we ever consider doing this for other test suites, e.g. reftest?

I did it for xpcshell tests too in bug 1637637. I haven't attempted to do it for reftests. I just grabbed a profile of running locally a random folder of reftests: https://share.firefox.dev/3NIFAHM On my machine we could save about 200ms with an exit(0) after the last test finishes. That seems small compared to the 1.6s it took to start running tests (we do a full Firefox startup, initializing a full browser window before we close it to open the reftests window instead).

Feel free to open bugs for reftests (or other test suites) if you think there are interesting potential wins to be had. I'm probably not going to take them myself, but happy to help.

Flags: needinfo?(florian)

Marco Castelluccio [:marco]

Comment 35

•

2 years ago

Given the large number of manifests/groups that we run, 200 ms per manifest/group is still a pretty large saving.
Joel, other than reftest, where could we do this?

Flags: needinfo?(jmaher)

Joel Maher ( :jmaher ) (UTC -8)

Comment 36

•

2 years ago

reftest and web-platform-tests might be interesting. I assume we have all mochitests (webgl, media, a11y, browser, chrome, devtools, plain, etc.) with this exit(0). I don't believe there are other harnesses where we would see the scale of large volumes of tests and browser restarts.

This bug would be a good bug for a contributor or new hire.

Flags: needinfo?(jmaher)

Marco Castelluccio [:marco]

Updated

•

2 years ago

Blocks: 1778458

Marco Castelluccio [:marco]

Comment 37

•

2 years ago

(In reply to Joel Maher ( :jmaher ) (UTC -0800) (back July 11th) from comment #36)

reftest and web-platform-tests might be interesting.

I filed bug 1778458 for reftest.
wpt has bug 1608454 implemented according to bug 1608454 comment 4, so IDK if it would benefit from this as well.

I assume we have all mochitests (webgl, media, a11y, browser, chrome, devtools, plain, etc.) with this exit(0). I don't believe there are other harnesses where we would see the scale of large volumes of tests and browser restarts.

The two patches in this bug mention browser-chrome and plain, so I don't know if the other mochitest are covered as well by them. Are they? If not, we should file something like bug 1778458 for the rest of mochitest.

You need to log in before you can comment on or make changes to this bug.