Perma macOS shutdown hangs devtools/server/tests/xpcshell/test_extension_storage_actor.js | Test timed out + browser/components/extensions/test/xpcshell/test_ext_urlbar.js | Test timed out when Gecko 88 switches to Late Beta on 2021-04-02
Categories
(Toolkit :: General, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr78 | --- | unaffected |
firefox86 | --- | unaffected |
firefox87 | --- | unaffected |
firefox88 | + | fixed |
firefox89 | + | verified |
People
(Reporter: aryx, Assigned: jstutte)
References
(Regression)
Details
(Keywords: regression)
Attachments
(1 file)
(deleted),
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
|
Details |
- central-as-late-beta simulation (both xpcshell failures on macOS platforms)
- How to run these simulations
- Failure logs
** devtools/server/tests/xpcshell/test_extension_storage_actor.js
** browser/components/extensions/test/xpcshell/test_ext_urlbar.js
There are several affects in each log.
The issue doesn't affect early beta and at least some tests are related to storage which indicates a dependency on the old storage implementation.
[task 2021-03-20T21:08:25.346Z] 21:08:25 INFO - "CONSOLE_MESSAGE: (info) 1616274224729 addons.manager DEBUG before shutdown"
[task 2021-03-20T21:08:25.346Z] 21:08:25 INFO - "CONSOLE_MESSAGE: (info) 1616274224729 addons.manager DEBUG shutdown"
[task 2021-03-20T21:08:25.347Z] 21:08:25 INFO - "CONSOLE_MESSAGE: (info) 1616274224729 addons.manager DEBUG Calling shutdown blocker for XPIProvider"
[task 2021-03-20T21:08:25.347Z] 21:08:25 INFO - "CONSOLE_MESSAGE: (info) 1616274224729 addons.xpi DEBUG shutdown"
[task 2021-03-20T21:08:25.347Z] 21:08:25 INFO - "CONSOLE_MESSAGE: (info) 1616274224729 addons.xpi-utils DEBUG shutdown"
[task 2021-03-20T21:08:25.347Z] 21:08:25 INFO - "CONSOLE_MESSAGE: (info) 1616274224729 addons.manager DEBUG Calling shutdown blocker for PluginProvider"
[task 2021-03-20T21:08:25.347Z] 21:08:25 INFO - "CONSOLE_MESSAGE: (info) 1616274224729 addons.manager DEBUG Calling shutdown blocker for GMPProvider"
[task 2021-03-20T21:08:25.347Z] 21:08:25 INFO - "CONSOLE_MESSAGE: (info) 1616274224738 addons.manager DEBUG Async provider shutdown done"
[task 2021-03-20T21:08:25.347Z] 21:08:25 INFO - PID 7401 | WARNING: At least one completion condition is taking too long to complete. Conditions: [{"name":"ContentParent: id=11c6f1000","state":{"remoteTypePrefix":"prealloc"},"filename":"/builds/worker/checkouts/gecko/dom/ipc/ContentParent.cpp","lineNumber":3540,"stack":["resource://gre/modules/nsAsyncShutdown.jsm:addBlocker:162","/Users/cltbld/tasks/task_1616259816/build/tests/xpcshell/head.js:_do_main:239","/Users/cltbld/tasks/task_1616259816/build/tests/xpcshell/head.js:_execute_test:567","-e:null:1"]}] Barrier: profile-before-change
[task 2021-03-20T21:08:25.347Z] 21:08:25 INFO - PID 7401 | FATAL ERROR: AsyncShutdown timeout in profile-before-change Conditions: [{"name":"ContentParent: id=11c6f1000","state":{"remoteTypePrefix":"prealloc"},"filename":"/builds/worker/checkouts/gecko/dom/ipc/ContentParent.cpp","lineNumber":3540,"stack":["resource://gre/modules/nsAsyncShutdown.jsm:addBlocker:162","/Users/cltbld/tasks/task_1616259816/build/tests/xpcshell/head.js:_do_main:239","/Users/cltbld/tasks/task_1616259816/build/tests/xpcshell/head.js:_execute_test:567","-e:null:1"]}] At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources.
[task 2021-03-20T21:08:25.347Z] 21:08:25 INFO - PID 7401 | WARNING: No crash reporter available
[task 2021-03-20T21:08:25.347Z] 21:08:25 INFO - PID 7401 | \x07[Parent 7401, Main Thread] ###!!! ABORT: file /builds/worker/checkouts/gecko/dom/ipc/ContentParent.cpp:3540
[task 2021-03-20T21:08:25.347Z] 21:08:25 INFO - PID 7401 | [Parent 7401, Main Thread] ###!!! ABORT: file /builds/worker/checkouts/gecko/dom/ipc/ContentParent.cpp:3540
[task 2021-03-20T21:08:25.347Z] 21:08:25 INFO - PID 7401 | ###!!! [Child][MessageChannel] Error: (msgtype=0x3A00E2,name=PContent::Msg_GraphicsError) Channel error: cannot send/recv
[task 2021-03-20T21:08:25.347Z] 21:08:25 INFO - PID 7401 | [GFX1-]: Receive IPC close with reason=AbnormalShutdown
Assignee | ||
Comment 1•4 years ago
|
||
Hi Paul, am I right that the use of prompter.jsm depends on configurations and thus we might just end up with an untested configuration here in late beta? I doubt that local storage is the direct culprit this time (though it might be involved in causing the hang we do not catch here). Can you point me somewhere how to configure the release behavior locally here? (Edit) Just realizing that it's Mac only, which I cannot test anyway.
Brain dump so far:
SpinEventLoopUntilOrQuit
is used only in few places other than prompter.jsm:
browser/base/content/tabbrowser.js
toolkit/components/passwordmgr/LoginManagerAuthPrompter.jsm
toolkit/content/widgets/browser-custom-element.js
Of particular interest might be browser/base/content/tabbrowser.js as (according to the comment) we seem to prompt right afterwards, though apparently only if the beforeUnloadPromises
really resolved. I wonder if we should be more explicit about errors to avoid any doubt of what is going to happen in the abort case.
It might also be the case, that there are other process types involved here for which we do not advance the correct shutdown phase, but then I would not understand why this happens only with late beta.
Assignee | ||
Updated•4 years ago
|
Updated•4 years ago
|
Updated•4 years ago
|
Assignee | ||
Comment 2•4 years ago
|
||
So I added some warnings and made a central-as-late-beta simulation on MacOS (thanks to aryx for the instructions).
It seems that AppShutdown::AdvanceShutdownPhase
is never called here which might cause a lack of shutdown notifications in general.
Are XPCShell tests (that involve XPI, it seems) invoked differently under MacOS (and this only on the release channel) ? Sounds weird.
I will launch the same test also for linux.
Assignee | ||
Comment 3•4 years ago
|
||
Selecting the tasks for linux I noticed, that we do not execute XPCShell tests at all on linux shippable? That could explain the OS difference, at least.
Anyway, here is the try run on linux-asan and macos in parallel as late beta. This might give us a hook for a pernosco session.
(Edit) I just realized that I was not using the right way to print out messages to the log. So I updated things and gave it a new try (link updated, too).
Reporter | ||
Comment 4•4 years ago
|
||
The XPCshell tasks run on Linux shippable for central-as-beta simulations and regular beta pushes. ./mach try chooser --full
offers XPCshell tests for shippable builds both on Linux and macOS here.
Assignee | ||
Comment 5•4 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #3)
Anyway, here is the try run on linux-asan and macos in parallel as late beta. This might give us a hook for a pernosco session.
From my first try runs it has been confirmed to be MacOS only, as reported. So no real hope for a pernosco session here.
Assignee | ||
Comment 6•4 years ago
|
||
So this time the failure is apparently happening on a different test. Filtering the output for timeouts yields:
[task 2021-03-23T16:01:25.472Z] 16:01:25 INFO - browser/components/tests/unit/test_distribution.js failed or timed out, will retry.
[task 2021-03-23T16:02:49.257Z] 16:02:49 INFO - extensions/pref/autoconfig/test/unit/test_autoconfig_nonascii.js failed or timed out, will retry.
[task 2021-03-23T16:06:15.859Z] 16:06:15 INFO - TEST-TIMEOUT | xpcshell.ini:browser/components/extensions/test/xpcshell/test_ext_urlbar.js | took 300120ms
[task 2021-03-23T16:06:57.547Z] 16:06:57 INFO - TEST-TIMEOUT | xpcshell-child-process.ini:dom/indexedDB/test/unit/test_keys.js | took 300038ms
[task 2021-03-23T16:07:33.208Z] 16:07:33 INFO - TEST-TIMEOUT | dom/push/test/xpcshell/test_unregister_invalid_json.js | took 300079ms
[task 2021-03-23T16:07:33.209Z] 16:07:33 INFO - dom/push/test/xpcshell/test_unregister_invalid_json.js failed or timed out, will retry.
[task 2021-03-23T16:08:16.759Z] 16:08:16 INFO - toolkit/components/telemetry/tests/unit/test_ChildScalars.js failed or timed out, will retry.
[task 2021-03-23T16:08:27.489Z] 16:08:27 INFO - toolkit/components/url-classifier/tests/unit/test_listmanager.js failed or timed out, will retry.
[task 2021-03-23T16:09:35.090Z] 16:09:35 INFO - TEST-TIMEOUT | netwerk/test/unit_ipc/test_alt-data_stream_wrap.js | took 300037ms
[task 2021-03-23T16:11:03.107Z] 16:11:03 INFO - TEST-TIMEOUT | xpcshell-e10s.ini:toolkit/components/extensions/test/xpcshell/test_ext_webRequest_responseBody.js | took 300012ms
[task 2021-03-23T16:11:13.042Z] 16:11:13 INFO - TEST-TIMEOUT | xpcshell-e10s.ini:toolkit/components/extensions/test/xpcshell/test_ext_contentscript_triggeringPrincipal.js | took 300008ms
[task 2021-03-23T16:15:39.664Z] 16:15:39 INFO - TEST-START | security/manager/ssl/tests/unit/test_ocsp_timeout.js
[task 2021-03-23T16:16:16.303Z] 16:16:16 INFO - TEST-PASS | security/manager/ssl/tests/unit/test_ocsp_timeout.js | took 36591ms
[task 2021-03-23T16:18:21.769Z] 16:18:21 INFO - xpcshell.ini:browser/components/extensions/test/xpcshell/test_ext_urlbar.js failed or timed out, will retry.
[task 2021-03-23T16:18:50.576Z] 16:18:50 INFO - xpcshell-child-process.ini:dom/indexedDB/test/unit/test_keys.js failed or timed out, will retry.
[task 2021-03-23T16:21:35.155Z] 16:21:35 INFO - netwerk/test/unit_ipc/test_alt-data_stream_wrap.js failed or timed out, will retry.
[task 2021-03-23T16:22:59.502Z] 16:22:59 INFO - xpcshell-e10s.ini:toolkit/components/extensions/test/xpcshell/test_ext_webRequest_responseBody.js failed or timed out, will retry.
[task 2021-03-23T16:23:13.035Z] 16:23:13 INFO - xpcshell-e10s.ini:toolkit/components/extensions/test/xpcshell/test_ext_contentscript_triggeringPrincipal.js failed or timed out, will retry.
[task 2021-03-23T16:23:29.457Z] 16:23:29 WARNING - TEST-UNEXPECTED-TIMEOUT | xpcshell-e10s.ini:toolkit/components/extensions/test/xpcshell/test_ext_webRequest_responseBody.js | Test timed out
[task 2021-03-23T16:23:29.529Z] 16:23:29 INFO - PID 5688 | FATAL ERROR: AsyncShutdown timeout in profile-before-change Conditions: [{"name":"ContentParent: id=11fc8c800","state":{"remoteTypePrefix":"web"},"filename":"/builds/worker/checkouts/gecko/dom/ipc/ContentParent.cpp","lineNumber":3540,"stack":["resource://gre/modules/nsAsyncShutdown.jsm:addBlocker:162","resource://testing-common/ExtensionXPCShellUtils.jsm:_initBrowser:274","/Users/cltbld/tasks/task_1616512595/build/tests/xpcshell/head.js:_do_main:239","/Users/cltbld/tasks/task_1616512595/build/tests/xpcshell/head.js:_execute_test:567","-e:null:1"]},{"name":"ContentParent: id=12092f000","state":{"remoteTypePrefix":"prealloc"},"filename":"/builds/worker/checkouts/gecko/dom/ipc/ContentParent.cpp","lineNumber":3540,"stack":["resource://gre/modules/nsAsyncShutdown.jsm:addBlocker:162","/Users/cltbld/tasks/task_1616512595/build/tests/xpcshell/head.js:_do_main:239","/Users/cltbld/tasks/task_1616512595/build/tests/xpcshell/head.js:_execute_test:567","-e:null:1"]}] At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources.
It seems, we encounter several timeouts, some of them reported as INFO and not as error. And then something tries to abort the child process probably through an IPC message to the content processes:
[task 2021-03-23T16:23:29.527Z] 16:23:29 INFO - PID 5688 | [Child 5689: Main Thread]: D/AppShutdown AppShutdown::AdvanceShutdownPhase called with 1
[task 2021-03-23T16:23:29.527Z] 16:23:29 INFO - PID 5688 | [Child 5689: Main Thread]: I/AppShutdown AppShutdown::AdvanceShutdownPhase advancing to 1
[task 2021-03-23T16:23:29.527Z] 16:23:29 INFO - PID 5688 | [Child 5690: Main Thread]: D/AppShutdown AppShutdown::AdvanceShutdownPhase called with 1
[task 2021-03-23T16:23:29.528Z] 16:23:29 INFO - PID 5688 | [Child 5690: Main Thread]: I/AppShutdown AppShutdown::AdvanceShutdownPhase advancing to 1
This is the only touching point with the code I changed here, but it happens way after we already messed up, it seems. And the AsyncShutdown timeout
is happening immediately after, thus is not induced by AppShutdown::AdvanceShutdownPhase
(which potentially spins up the terminator thread, but this is not causing the FATAL ERROR
here - it would crash hard after a timeout).
I might not be the right person to investigate this deeply, but I cannot really see how the patch from bug 1696397 could cause those initial timeouts. The abort case of SpinEventLoopUntilOrQuit is never hit, it seems, so this should not influence anything. On the linux executions I see far less timeouts which cause a retry that then works. And I do not see any invocation of AppShutdown::AdvanceShutdownPhase
at all. To me all I see in the MacOS log just shows a sane Async Shutdown Blocker behavior after some unforeseen timeouts of unknown nature. Which might just be caused by slow execution?
To be sure, we could simulate, if a backout of the patch from bug 1696397 really would solve this problem? My hg "expertise" is not enough to do this on my own, though...
Assignee | ||
Comment 7•4 years ago
|
||
Before we do this, I pushed another try that tracks each call to nsThreadManager::SpinEventLoopUntilInternal
. If we find some of those next to the tests that time out and that have a value >0 attached to them, I might be wrong.
Reporter | ||
Comment 8•4 years ago
|
||
It's from bug 1696397. The pushlog lists a push by Hiro before the one with bug 1696397. The try push based on the former didn't hit the issue while the one on the latter did. The imported patch got used to fix some bustage.
Reporter | ||
Comment 9•4 years ago
|
||
And test_ext_webRequest_responseBody.js was always one of the failing tests as far as I checked but not the first one.
Assignee | ||
Comment 10•4 years ago
|
||
Thanks for clarifying. On my local tests, xpcshell seems to use a lot of nsThreadManager::SpinEventLoopUntil
but I do not see the nsThreadManager::SpinEventLoopUntilOrQuit
variant, which is the only one that should have changed behavior. Let's see if this is different in the try push. I am not an xpcshell expert at all, but there might be something we do different when we retry a timed out test execution.
Assignee | ||
Comment 11•4 years ago
|
||
So, just to summarize my findings so far (based on the latest push):
- there seem to be no calls to
SpinEventLoopUntilOrQuit
involved:
13 matches
[task 2021-03-23T20:20:14.771Z] 20:20:14 INFO - PID 4597 | [Parent 4597: Main Thread]: D/nsThreadManager SpinEventLoopUntilInternal called for PerTestCoverageUtils.jsm:awaitPromise (0)
[task 2021-03-23T20:20:14.772Z] 20:20:14 INFO - PID 4597 | [Parent 4597: Main Thread]: D/nsThreadManager SpinEventLoopUntilInternal called for Test(xpcshell/head.js:_do_main) (0)
[task 2021-03-23T20:20:14.773Z] 20:20:14 INFO - PID 4597 | [Parent 4597: Main Thread]: D/nsThreadManager SpinEventLoopUntilInternal called for Test(xpcshell/head.js:_do_main)|XPIProvider.jsm:awaitPromise (0)
[task 2021-03-23T20:20:15.136Z] 20:20:15 INFO - PID 4597 | [Parent 4597: Main Thread]: D/nsThreadManager SpinEventLoopUntilInternal called for PerTestCoverageUtils.jsm:awaitPromise (0)
[task 2021-03-23T20:20:15.136Z] 20:20:15 INFO - PID 4597 | [Parent 4597: Main Thread]: D/nsThreadManager SpinEventLoopUntilInternal called for Test(xpcshel/head.js:_execute_test) (0)
[task 2021-03-23T20:37:29.190Z] 20:37:29 INFO - PID 4819 | [Parent 4819: Main Thread]: D/nsThreadManager SpinEventLoopUntilInternal called for PerTestCoverageUtils.jsm:awaitPromise (0)
[task 2021-03-23T20:37:29.191Z] 20:37:29 INFO - PID 4819 | [Parent 4819: Main Thread]: D/nsThreadManager SpinEventLoopUntilInternal called for Test(xpcshell/head.js:_do_main) (0)
[task 2021-03-23T20:37:29.272Z] 20:37:29 INFO - PID 4819 | [Parent 4819: Main Thread]: D/nsThreadManager SpinEventLoopUntilInternal called for PerTestCoverageUtils.jsm:awaitPromise (0)
[task 2021-03-23T20:37:29.272Z] 20:37:29 INFO - PID 4819 | [Parent 4819: Main Thread]: D/nsThreadManager SpinEventLoopUntilInternal called for Test(xpcshel/head.js:_execute_test) (0)
[task 2021-03-23T20:54:50.825Z] 20:54:50 INFO - PID 5027 | [Parent 5027: Main Thread]: D/nsThreadManager SpinEventLoopUntilInternal called for PerTestCoverageUtils.jsm:awaitPromise (0)
[task 2021-03-23T20:54:50.825Z] 20:54:50 INFO - PID 5027 | [Parent 5027: Main Thread]: D/nsThreadManager SpinEventLoopUntilInternal called for Test(xpcshell/head.js:_do_main) (0)
[task 2021-03-23T20:54:50.955Z] 20:54:50 INFO - PID 5027 | [Parent 5027: Main Thread]: D/nsThreadManager SpinEventLoopUntilInternal called for PerTestCoverageUtils.jsm:awaitPromise (0)
[task 2021-03-23T20:54:50.955Z] 20:54:50 INFO - PID 5027 | [Parent 5027: Main Thread]: D/nsThreadManager SpinEventLoopUntilInternal called for Test(xpcshel/head.js:_execute_test) (0)
These are all instances of calls to SpinEventLoopUntilInternal
. The (n)
indicates the shutdown phase we should abort the loop for, 0
means no abort at all. So no SpinEventLoopUntilOrQuit
has been called, apparently. And in fact no additional message indicates that we aborted a loop due to shutdown (I added those messages on the push, too).
- There seems to be a correlation between earlier
INFO ... xxx test timed out, will retry.
and laterWARNING ... xxx | Test timed out
:
13 matches
[task 2021-03-23T19:32:37.337Z] 19:32:37 INFO - extensions/pref/autoconfig/test/unit/test_autoconfig_nonascii.js failed or timed out, will retry.
[task 2021-03-23T19:38:48.897Z] 19:38:48 INFO - toolkit/components/url-classifier/tests/unit/test_listmanager.js failed or timed out, will retry.
[task 2021-03-23T19:47:49.498Z] 19:47:49 INFO - xpcshell.ini:browser/components/extensions/test/xpcshell/test_ext_history.js failed or timed out, will retry.
[task 2021-03-23T19:48:36.512Z] 19:48:36 INFO - xpcshell-child-process.ini:dom/indexedDB/test/unit/test_keys.js failed or timed out, will retry.
[task 2021-03-23T19:48:44.132Z] 19:48:44 INFO - xpcshell.ini:browser/components/extensions/test/xpcshell/test_ext_urlbar.js failed or timed out, will retry.
[task 2021-03-23T19:51:32.918Z] 19:51:32 INFO - netwerk/test/unit_ipc/test_alt-data_stream_wrap.js failed or timed out, will retry.
[task 2021-03-23T19:53:04.033Z] 19:53:04 INFO - xpcshell-e10s.ini:toolkit/components/extensions/test/xpcshell/test_ext_webRequest_filterResponseData.js failed or timed out, will retry.
[task 2021-03-23T19:53:06.037Z] 19:53:06 INFO - xpcshell-e10s.ini:toolkit/components/extensions/test/xpcshell/test_ext_webRequest_responseBody.js failed or timed out, will retry.
[task 2021-03-23T19:53:14.320Z] 19:53:14 INFO - xpcshell-e10s.ini:toolkit/components/extensions/test/xpcshell/test_ext_contentscript_triggeringPrincipal.js failed or timed out, will retry.
[task 2021-03-23T19:55:27.250Z] 19:55:27 WARNING - TEST-UNEXPECTED-TIMEOUT | xpcshell.ini:browser/components/extensions/test/xpcshell/test_ext_urlbar.js | Test timed out
[task 2021-03-23T19:55:27.395Z] 19:55:27 INFO - "CONSOLE_MESSAGE: (error) [JavaScript Error: "SkippableTimer :: UrlbarProviderExtension notification timer :: Timed out!" {file: "resource:///modules/UrlbarUtils.jsm" line: 1935}]
[task 2021-03-23T19:55:27.402Z] 19:55:27 INFO - "CONSOLE_MESSAGE: (error) [JavaScript Error: "SkippableTimer :: UrlbarProviderExtension notification timer :: Timed out!" {file: "resource:///modules/UrlbarUtils.jsm" line: 1935}]
[task 2021-03-23T20:12:27.622Z] 20:12:27 WARNING - TEST-UNEXPECTED-TIMEOUT | xpcshell-e10s.ini:toolkit/components/extensions/test/xpcshell/test_ext_webRequest_responseBody.js | Test timed out
- It still just happens only on MacOS, which is puzzling me additionally. The later
WARNING - TEST-UNEXPECTED-TIMEOUT
are not happening under linux at all.
So from what I understand so far from the logs the move from nsThreadManager::SpinEventLoopUntilOrShutdown
to nsThreadManager::SpinEventLoopUntilOrQuit
seems not relevant here. But I might be overlooking something obvious here, :kmag ?
Comment 12•4 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #11)
- There seems to be a correlation between earlier
INFO ... xxx test timed out, will retry.
and laterWARNING ... xxx | Test timed out
:
Yes. When a test fails when run in parallel, the harness tries running it again on its own in case there's a race that we may not hit when under less load.
So from what I understand so far from the logs the move from
nsThreadManager::SpinEventLoopUntilOrShutdown
tonsThreadManager::SpinEventLoopUntilOrQuit
seems not relevant here. But I might be overlooking something obvious here, :kmag ?
These are all timeouts in async shutdown blockers waiting for ContentParent shutdown. The only way they'd have anything to do with your changes is if something's spinning a nested event loop in a child process and was affected by your changes. That seems unlikely, though.
Comment 13•4 years ago
|
||
(In reply to Kris Maglione [:kmag] from comment #12)
The only way they'd have anything to do with your changes is if something's spinning a nested event loop in a child process and was affected by your changes.
For what it's worth, spinEventLoopUntilOrQuit
in Prompter.jsm here: https://searchfox.org/mozilla-central/rev/cc4a17cea338fe236626d838ca96e9bf6a775675/toolkit/components/prompts/src/Prompter.jsm#1065 can be called in the content process. For example if a site calls window.alert
.
(In reply to Jens Stutte [:jstutte] from comment #1)
Hi Paul, am I right that the use of prompter.jsm depends on configurations and thus we might just end up with an untested configuration here in late beta?
I'm not aware of any late-beta specific behavior of Prompter.jsm itself, however its consumers (via nsIPrompt
, or nsIPromptService
) could behave differently.
Assignee | ||
Comment 14•4 years ago
|
||
(In reply to Paul Zühlcke [:pbz] from comment #13)
(In reply to Kris Maglione [:kmag] from comment #12)
The only way they'd have anything to do with your changes is if something's spinning a nested event loop in a child process and was affected by your changes.
For what it's worth,
spinEventLoopUntilOrQuit
in Prompter.jsm here: https://searchfox.org/mozilla-central/rev/cc4a17cea338fe236626d838ca96e9bf6a775675/toolkit/components/prompts/src/Prompter.jsm#1065 can be called in the content process. For example if a site callswindow.alert
.
The log messages I added seem to indicate, that this is not the case here. I added some more messages (also for direct non-JS SpinEventLoopUntil calls) and made a new try push.
My only remaining doubt is, if we still arrive here on all processes, thus I included also observer messages in the log.
Assignee | ||
Comment 15•4 years ago
|
||
I made a try push with a doubled timeout. It shows the same behavior, which means we clearly have some live lock here.
I am not able to force the same behavior locally under linux, even with ridiculously short timeouts.
(In reply to Jens Stutte [:jstutte] from comment #6)
To be sure, we could simulate, if a backout of the patch from bug 1696397 really would solve this problem? My hg "expertise" is not enough to do this on my own, though...
I do not see any path forward than trying a backout push and in case this heals compare the logs with the failing one.
Comment 16•4 years ago
|
||
(In reply to Paul Zühlcke [:pbz] from comment #13)
(In reply to Kris Maglione [:kmag] from comment #12)
The only way they'd have anything to do with your changes is if something's spinning a nested event loop in a child process and was affected by your changes.
For what it's worth,
spinEventLoopUntilOrQuit
in Prompter.jsm here: https://searchfox.org/mozilla-central/rev/cc4a17cea338fe236626d838ca96e9bf6a775675/toolkit/components/prompts/src/Prompter.jsm#1065 can be called in the content process. For example if a site callswindow.alert
.
This is happening in xpcshell tests, which don't support any sort of prompts (though I did check that the failing tests weren't using them just in case). And, even if they did, the processes that are blocking shutdown are usually preallocated processes, which wouldn't ever get a chance to run code that could trigger the prompt service.
Comment 17•4 years ago
|
||
OK, one possible significant difference I do see is that between bug 1689953 and bug 1696397, we now fire "quit-application" observers in content processes at ContentChild
shutdown, which we didn't before. That could mean that we're triggering observers that we didn't before, and it definitely means that we're calling a couple of observers that listen for both "content-child-will-shutdown"
and "quit-application"
an additional time.
Assignee | ||
Comment 18•4 years ago
|
||
Yeah, might be. I kind of anticipated the need to make this cleaner in bug 1697745. A faster hack could be to have a variant of AdvanceShutdownPhase that does not fire any observers.
Assignee | ||
Comment 19•4 years ago
|
||
Updated•4 years ago
|
Reporter | ||
Updated•4 years ago
|
Comment 20•4 years ago
|
||
Comment 21•4 years ago
|
||
bugherder |
Assignee | ||
Comment 22•4 years ago
|
||
Comment on attachment 9211363 [details]
Bug 1699983: Avoid firing unexpected shutdown observer notification in content process. r?kmag,#xpcom-reviewers
Beta/Release Uplift Approval Request
- User impact if declined: This fixes a regression caused by D107619. We have no evidence of concrete user impact, yet, but in order to be sure we will not cause any harm with the other patch we should uplift this.
Though we do not have a specific test for it, the regression has been detected by our automatic testing.
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: Yes
- Needs manual test from QE?: No
- If yes, steps to reproduce:
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): It is not risky and reverts a slightly risky change made by D107619, that is firing unexpected shutdown observer notifications in the content process.
- String changes made/needed:
Reporter | ||
Comment 23•4 years ago
|
||
Verified fixed with latest central-as-late-beta simulation. Thank you to everybody involved.
Comment 24•4 years ago
|
||
Comment on attachment 9211363 [details]
Bug 1699983: Avoid firing unexpected shutdown observer notification in content process. r?kmag,#xpcom-reviewers
Thanks indeed for getting this sorted out! Approved for 88.0b4.
Comment 25•4 years ago
|
||
bugherder uplift |
Updated•4 years ago
|
Description
•