Closed
Bug 1142229
Opened 10 years ago
Closed 10 years ago
Debug emulator shutdown timeout caused by dom/ipc/test_NuwaProcessCreation.html and dom/ipc/test_NuwaProcessDeadlock.html
Categories
(Core :: IPC, defect)
Tracking
()
People
(Reporter: ahal, Assigned: cyu)
References
Details
Attachments
(2 files, 2 obsolete files)
(deleted),
patch
|
sinker
:
review+
bajaj
:
approval-mozilla-b2g32+
bajaj
:
approval-mozilla-b2g34+
bajaj
:
approval-mozilla-b2g37+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
mrbkap
:
review+
bajaj
:
approval-mozilla-b2g32+
bajaj
:
approval-mozilla-b2g34+
bajaj
:
approval-mozilla-b2g37+
|
Details | Diff | Splinter Review |
All debug emulator mochitest-11 jobs have a shutdown timeout in them (e.g search for TEST-UNEXPECTED-FAIL in [1]). These are green because of a special hack in mozharness just for this issue.
RyanVM and I recently figured out test_NuwaProcessCreation.html and test_NuwaProcessDeadlock.html are somehow causing it. Interestingly, disabling either of those tests fixes the issue. It's only when both are run together that the timeout occurs:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=0019c0ba19fd
https://treeherder.mozilla.org/#/jobs?repo=try&revision=698c547cf90d
[1] http://ftp.mozilla.org/pub/mozilla.org/b2g/try-builds/ahalberstadt@mozilla.com-e035ba199aec/try-emulator-debug/try_ubuntu64_vm-b2g-emulator-debug_test-mochitest-debug-11-bm122-tests1-linux64-build72.txt.gz
Comment 1•10 years ago
|
||
Just so it's clear here, we're very likely to be disabling these tests in short order as they're blocking other patches from landing and are causing a legitimate permafail at the moment.
Assignee | ||
Comment 2•10 years ago
|
||
Just to be sure, are we closing both test cases or just one to make it green? We need at least one test case as a guard for breakage of Nuwa functionality.
Comment 3•10 years ago
|
||
Ideally we'd disable neither :-). That said, happy to defer to your preference if it comes to that.
Assignee | ||
Comment 4•10 years ago
|
||
I can look into it, but the problem was only seen on the try server as I remember. Debugging is really painful as I could only printf and push to try to see the output. If there is debuggable environment for such cases, we can save lots of time on similar bugs. Do you have any suggestion?
Comment 5•10 years ago
|
||
Assignee | ||
Comment 6•10 years ago
|
||
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #5)
> Loaner slave?
> https://wiki.mozilla.org/ReleaseEngineering/How_To/Request_a_slave
That looks pretty delicious. I'll try that.
Reporter | ||
Updated•10 years ago
|
Comment 7•10 years ago
|
||
What do you think about disabling test_NuwaProcessDeadlock.html so we can fix this and bug 1132368 in one shot?
Flags: needinfo?(cyu)
Assignee | ||
Comment 8•10 years ago
|
||
We still need the test case to guard against new stuff added to the Nuwa process that creates a deadlock as the original bug did. I agree that we disable the test case temporarily, rinse, and then reenable the test case.
Flags: needinfo?(cyu)
Assignee | ||
Comment 9•10 years ago
|
||
This can be reproduced locally so no need for loaner slave.
Assignee | ||
Comment 10•10 years ago
|
||
Hooking the Nuwa test cases under ProcessPriorityManager turns out to be a bad idea. Setting process priorities is an indirect signal that can subject to timing problems easily. I think it's better to redesign the test cases that use more direct signals from where the Nuwa process are created/shutdown.
Assignee | ||
Comment 11•10 years ago
|
||
This should not block and timeout: https://treeherder.mozilla.org/#/jobs?repo=try&revision=ce95abcb1760
The root cause of this timeout is:
1. We shut down the Nuwa process.
1.1 The run loop of the main thread exits.
1.2 The Nuwa process shuts down XPCOM
1.3 Some observer of xpcom-shutdown fires a Timer to the Timer thread, which is frozen.
1.4 The Nuwa process deadlocks.
2. The b2g process starts to shut down.
2.1 The compositor thread shuts down, which cannot complete because the compositor thread is ref-counted.
2.2 The b2g process deadlocks.
The patch contains rework of the test cases and adds logic to shut down the Nuwa process. Ideally we can thaw the threads to let them shut down correctly. But it's still not functioning. I use _exit() to force shutting down the process, but it leads to abort() crashes in the b2g process. I need to figure out how to fix it. At least, with this patch we should not see timeouts anymore.
Assignee | ||
Comment 12•10 years ago
|
||
This cleans up the test case and remove the use of flaky timeouts. Also we use a dedicated pref value 'dom.ipc.preallocatedProcessManager.testMode' only for Nuwa test cases.
Attachment #8584449 -
Attachment is obsolete: true
Attachment #8586642 -
Flags: review?(ryanvm)
Assignee | ||
Comment 13•10 years ago
|
||
Attachment #8586644 -
Flags: review?(tlee)
Assignee | ||
Comment 14•10 years ago
|
||
Comment 15•10 years ago
|
||
Comment on attachment 8586642 [details] [diff] [review]
Part 1: refactor the test cases.
I'm not a peer. Wish I could help here, but way out of my jurisdiction :)
Flags: needinfo?(cyu)
Attachment #8586642 -
Flags: review?(ryanvm)
Assignee | ||
Comment 16•10 years ago
|
||
Comment on attachment 8586642 [details] [diff] [review]
Part 1: refactor the test cases.
Request review from :mrbkap.
Blake, since Kyle is on vacation, would you review the test cases? Thanks.
Flags: needinfo?(cyu)
Attachment #8586642 -
Flags: review?(mrbkap)
Updated•10 years ago
|
Attachment #8586644 -
Flags: review?(tlee) → review+
Comment 17•10 years ago
|
||
Comment on attachment 8586642 [details] [diff] [review]
Part 1: refactor the test cases.
Review of attachment 8586642 [details] [diff] [review]:
-----------------------------------------------------------------
Clearing this request until my question about the second pushPrefEnv calls is resolved.
::: dom/ipc/tests/test_NuwaProcessCreation.html
@@ +39,5 @@
> cpmm.removeMessageListener("TEST-ONLY:nuwa-add-new-process", msgHandler);
> +
> +
> + SpecialPowers.pushPrefEnv({
> + 'clear': [
This shouldn't be necessary. The initial pushPrefEnv knows that it has to clear the prefs if they don't exist to begin with. This just creates an extra round-trip that doesn't do anything.
::: dom/ipc/tests/test_NuwaProcessDeadlock.html
@@ +39,5 @@
> cpmm.removeMessageListener("TEST-ONLY:nuwa-add-new-process", msgHandler);
> +
> +
> + SpecialPowers.pushPrefEnv({
> + 'clear': [
Same here.
Attachment #8586642 -
Flags: review?(mrbkap)
Assignee | ||
Comment 18•10 years ago
|
||
Update to the previous revision: remove the unnecessary SpecialPowers.pushPrefEnv() call to clear the pref values on test case shutdown.
Attachment #8586642 -
Attachment is obsolete: true
Attachment #8589569 -
Flags: review?(mrbkap)
Comment 19•10 years ago
|
||
Comment on attachment 8589569 [details] [diff] [review]
Part 1: refactor the test cases (v2),
Review of attachment 8589569 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good. Thanks!
Attachment #8589569 -
Flags: review?(mrbkap) → review+
Assignee | ||
Comment 20•10 years ago
|
||
Comment 21•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/a255a2f9d4f3
https://hg.mozilla.org/mozilla-central/rev/6b67bc79ec2d
Assignee: nobody → cyu
Status: NEW → RESOLVED
Closed: 10 years ago
status-firefox40:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla40
Comment 22•10 years ago
|
||
Nice, I've confirmed that this actually fixes some permafails we recently began to encounter on the older release branches as well after the chunking boundaries changed. Please can we nominate these patches for b2g32/34/37 approval? I've already done the rebases and verified green on Try.
status-b2g-v2.0:
--- → affected
status-b2g-v2.1:
--- → affected
status-b2g-v2.2:
--- → affected
status-b2g-master:
--- → fixed
status-firefox38:
--- → wontfix
status-firefox39:
--- → wontfix
Flags: needinfo?(cyu)
Comment 23•10 years ago
|
||
The alternative to approval being that I disable the Nuwa tests on the release branches, but that doesn't seem like an ideal outcome when we have a relatively small, simple patch that can fix them instead.
Assignee | ||
Comment 24•10 years ago
|
||
It's pretty safe change so we should uplift to release branches. Part 2 contains a change in IPC, but no one is ever going to Close() the Nuwa process except in test cases that changes the pref "dom.ipc.processPrelaunch.enabled" dynamically.
Flags: needinfo?(cyu)
Assignee | ||
Comment 25•10 years ago
|
||
Comment on attachment 8586644 [details] [diff] [review]
Part 2: Shut down the Nuwa process with QuickExit()
NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 922465
User impact if declined: None. This is a test case fix.
Testing completed: Automatic tests on try and locally.
Risk to taking this patch (and alternatives if risky): Pretty low. This is a fix in test case. Also the change is only reachable in debug build, it's never seen in release build.
Attachment #8586644 -
Flags: approval-mozilla-b2g37?
Attachment #8586644 -
Flags: approval-mozilla-b2g34?
Attachment #8586644 -
Flags: approval-mozilla-b2g32?
Assignee | ||
Comment 26•10 years ago
|
||
Comment on attachment 8589569 [details] [diff] [review]
Part 1: refactor the test cases (v2),
NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): bug 922465
User impact if declined: None. This is a test case fix.
Testing completed: Automatic tests on try and locally.
Risk to taking this patch (and alternatives if risky): Pretty low. This is a fix in test case.
Attachment #8589569 -
Flags: approval-mozilla-b2g37?
Attachment #8589569 -
Flags: approval-mozilla-b2g34?
Attachment #8589569 -
Flags: approval-mozilla-b2g32?
Updated•10 years ago
|
Attachment #8586644 -
Flags: approval-mozilla-b2g37?
Attachment #8586644 -
Flags: approval-mozilla-b2g37+
Attachment #8586644 -
Flags: approval-mozilla-b2g34?
Attachment #8586644 -
Flags: approval-mozilla-b2g34+
Attachment #8586644 -
Flags: approval-mozilla-b2g32?
Attachment #8586644 -
Flags: approval-mozilla-b2g32+
Comment 27•10 years ago
|
||
Comment on attachment 8589569 [details] [diff] [review]
Part 1: refactor the test cases (v2),
Spoke offline with Ryan and he gave me context and we feel its better to uplift this to help with permafails else we would have to end up disabling nuwa tests. Please note, these are deemed low risk as well, hence the a+
Attachment #8589569 -
Flags: approval-mozilla-b2g37?
Attachment #8589569 -
Flags: approval-mozilla-b2g37+
Attachment #8589569 -
Flags: approval-mozilla-b2g34?
Attachment #8589569 -
Flags: approval-mozilla-b2g34+
Attachment #8589569 -
Flags: approval-mozilla-b2g32?
Attachment #8589569 -
Flags: approval-mozilla-b2g32+
Comment 28•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/a3286b24ae86
https://hg.mozilla.org/releases/mozilla-b2g37_v2_2/rev/ba89756104b0
https://hg.mozilla.org/releases/mozilla-b2g34_v2_1/rev/d6a5a21e289f
https://hg.mozilla.org/releases/mozilla-b2g34_v2_1/rev/24860a5e6c84
https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/068cd1b55d7d
https://hg.mozilla.org/releases/mozilla-b2g32_v2_0/rev/29d943c277ab
https://hg.mozilla.org/releases/mozilla-b2g30_v1_4/rev/b8551dbe2a77
https://hg.mozilla.org/releases/mozilla-b2g30_v1_4/rev/3b992f316710
status-b2g-v1.4:
--- → fixed
Comment 29•10 years ago
|
||
https://hg.mozilla.org/releases/mozilla-b2g34_v2_1s/rev/d6a5a21e289f
https://hg.mozilla.org/releases/mozilla-b2g34_v2_1s/rev/24860a5e6c84
https://hg.mozilla.org/releases/mozilla-b2g32_v2_0m/rev/068cd1b55d7d
https://hg.mozilla.org/releases/mozilla-b2g32_v2_0m/rev/29d943c277ab
status-b2g-v2.0M:
--- → fixed
status-b2g-v2.1S:
--- → fixed
Comment hidden (off-topic) |
Comment 32•10 years ago
|
||
Ouch, I meant to leave comment 31 in bug 1099195. Wrong tab. Sorry about that.
Flags: needinfo?(khuey)
You need to log in
before you can comment on or make changes to this bug.
Description
•