Closed Bug 1814104 Opened 2 years ago Closed 2 years ago

FATAL ERROR: AsyncShutdown timeout in profile-change-teardown Conditions

Categories

(Core :: DOM: Content Processes, defect, P3)

All
Unspecified
defect

Tracking

()

RESOLVED FIXED
112 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox109 --- unaffected
firefox110 --- unaffected
firefox111 --- fixed
firefox112 --- fixed

People

(Reporter: petr.sumbera, Assigned: jstutte)

References

(Regression)

Details

(Keywords: regression)

Crash Data

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:102.0) Gecko/20100101 Firefox/102.0

Steps to reproduce:

Firefox on Solaris fails with:

FATAL ERROR: AsyncShutdown timeout in profile-change-teardown Conditions: [{"name":"Extension shutdown: webcompat@mozilla.org","state":{"state":"Startup: Run manifest, asyncEmitManifestEntry(\"background\")"},"filename":"resource://gre/modules/addons/XPIProvider.jsm","lineNumber":2630,"stack":["resource://gre/modules/addons/XPIProvider.jsm:startup/<:2630","resource://gre/modules/AsyncShutdown.sys.mjs:observe:576","resource:///modules/HeadlessShell.jsm:handleCmdLineArgs:266"]},{"name":"Extension shutdown: webcompat-reporter@mozilla.org","state":{"state":"Startup: Run manifest, asyncEmitManifestEntry(\"background\")"},"filename":"resource://gre/modules/addons/XPIProvider.jsm","lineNumber":2630,"stack":["resource://gre/modules/addons/XPIProvider.jsm:startup/<:2630","resource://gre/modules/AsyncShutdown.sys.mjs:observe:576","resource:///modules/HeadlessShell.jsm:handleCmdLineArgs:266"]},{"name":"Extension shutdown: screenshots@mozilla.org","state":{"state":"Startup: Run manifest, asyncEmitManifestEntry(\"background\")"},"filename":"resource://gre/modules/addons/XPIProvider.jsm","lineNumber":2630,"stack":["resource://gre/modules/addons/XPIProvider.jsm:startup/<:2630","resource://gre/modules/AsyncShutdown.sys.mjs:observe:576","resource:///modules/HeadlessShell.jsm:handleCmdLineArgs:266"]},{"name":"Extension shutdown: pictureinpicture@mozilla.org","state":{"state":"Startup: Run manifest, asyncEmitManifestEntry(\"background\")"},"filename":"resource://gre/modules/addons/XPIProvider.jsm","lineNumber":2630,"stack":["resource://gre/modules/addons/XPIProvider.jsm:startup/<:2630","resource://gre/modules/AsyncShutdown.sys.mjs:observe:576","resource:///modules/HeadlessShell.jsm:handleCmdLineArgs:266"]},{"name":"Extension shutdown: formautofill@mozilla.org","state":{"state":"Startup: Run manifest, asyncEmitManifestEntry(\"background\")"},"filename":"resource://gre/modules/addons/XPIProvider.jsm","lineNumber":2630,"stack":["resource://gre/modules/addons/XPIProvider.jsm:startup/<:2630","resource://gre/modules/AsyncShutdown.sys.mjs:observe:576","resource:///modules/HeadlessShell.jsm:handleCmdLineArgs:266"]}] At least one completion condition failed to complete within a reasonable amount of time. Causing a crash to ensure that we do not leave the user with an unresponsive process draining resources.
WARNING: No crash reporter available
^G[Parent 161944, Main Thread] ###!!! ABORT: file resource://gre/modules/addons/XPIProvider.jsm:2630
#01: NS_DebugBreak[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x18325042]
#02: nsDebugImpl::Abort(char const*, int)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x183246fc]
#03: nsXPTCStubBase::nsXPTCStubBase(nsIXPTCProxy*, nsXPTInterfaceInfo const*)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x1856ebee]
#04: CallMethodHelper::Invoke()[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x195a679d]
#05: CallMethodHelper::Call()[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x195a4c14]
#06: XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x1958c5d7]
#07: XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x1959468f]
#08: CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x22413e67]
#09: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x223f49e4]
#10: InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x223f4ea9]
#11: js::CallFromStack(JSContext*, JS::CallArgs const&, js::CallReason)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x223f4ef1]
#12: Interpret(JSContext*, js::RunState&)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x22403aed]
#13: js::RunScript(JSContext*, js::RunState&)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x223f44dd]
#14: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x223f4b91]
#15: InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x223f4ea9]
#16: js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x223f504d]
#17: js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::MutableHandle<JS::Value>)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x2116f9fe]
#18: PromiseReactionJob(JSContext*, unsigned int, JS::Value*)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x2141c7e8]
#19: CallJSNative(JSContext*, bool (*)(JSContext*, unsigned int, JS::Value*), js::CallReason, JS::CallArgs const&)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x22413e67]
#20: js::InternalCallOrConstruct(JSContext*, JS::CallArgs const&, js::MaybeConstruct, js::CallReason)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x223f49e4]
#21: InternalCall(JSContext*, js::AnyInvokeArgs const&, js::CallReason)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x223f4ea9]
#22: js::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, js::AnyInvokeArgs const&, JS::MutableHandle<JS::Value>, js::CallReason)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x223f504d]
#23: JS::Call(JSContext*, JS::Handle<JS::Value>, JS::Handle<JS::Value>, JS::HandleValueArray const&, JS::MutableHandle<JS::Value>)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x211fecef]
#24: mozilla::dom::PromiseJobCallback::Call(mozilla::dom::BindingCallContext&, JS::Handle<JS::Value>, mozilla::ErrorResult&)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x1ac91180]
#25: mozilla::dom::PromiseJobCallback::Call(mozilla::ErrorResult&, char const*, mozilla::dom::CallbackObject::ExceptionHandling, JS::Realm*)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x18346cad]
#26: mozilla::dom::PromiseJobCallback::Call(char const*)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x18346d4e]
#27: mozilla::PromiseJobRunnable::Run(mozilla::AutoSlowOperation&)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x1835854b]
#28: mozilla::CycleCollectedJSContext::PerformMicroTaskCheckPoint(bool)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x18331be4]
#29: mozilla::CycleCollectedJSContext::AfterProcessTask(unsigned int)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x18330f20]
#30: XPCJSContext::AfterProcessTask(unsigned int)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x19527b75]
#31: nsThread::ProcessNextEvent(bool, bool*)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x18519fe9]
#32: nsXPTCStubBase::nsXPTCStubBase(nsIXPTCProxy*, nsXPTInterfaceInfo const*)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x1856ebee]
#33: CallMethodHelper::Invoke()[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x195a679d]
#34: CallMethodHelper::Call()[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x195a4c14]
#35: XPCWrappedNative::CallMethod(XPCCallContext&, XPCWrappedNative::CallMode)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x1958c5d7]
#36: XPC_WN_CallMethod(JSContext*, unsigned int, JS::Value*)[/builds/psumbera/mozilla-central-build/INSTALLED/lib/firefox/libxul.so +0x1959468f]
#37: ??? (???:???)

The first bad revision is:
changeset: 650360:df20721143f6
user: Jens Stutte <jstutte@mozilla.com>
date: Tue Jan 24 03:22:52 2023 -0800
description:
Bug 1811195 - Bail out on AppShutdownConfirmed in DocumentLoadListener::OnStartRequest. r=smaug,necko-reviewers,nika,valentin

Differential Revision: https://phabricator.services.mozilla.com/D167395
Keywords: regression
Regressed by: 1811195

Hi! Does this happen always/on each shutdown? Did you check if that happens with a clean profile?

(In reply to Jens Stutte [:jstutte] from comment #1)

Hi! Does this happen always/on each shutdown? Did you check if that happens with a clean profile?

It happens every time I start Firefox.
It's part of my Fireofx builder on Solaris. It just do following:
firefox --screenshot about:blank
It starts always with new profile (empty HOME directory).

Note that I do have special patch in place (for SPARC which seems to be slower):

--- a/modules/libpref/init/all.js       Thu Mar 24 02:36:16 2022 -0700
+++ b/modules/libpref/init/all.js       Mon Aug 29 23:36:50 2022 -0700
@@ -703,7 +703,7 @@
 #elif defined(MOZ_CODE_COVERAGE)
   pref("toolkit.asyncshutdown.crash_timeout", 180000); // 3 minutes
 #else
-  pref("toolkit.asyncshutdown.crash_timeout", 60000); // 1 minute
+  pref("toolkit.asyncshutdown.crash_timeout", 300000); // 5 minutes
 #endif // !defined(MOZ_ASAN) && !defined(MOZ_TSAN)
 // Extra logging for AsyncShutdown barriers and phases
 pref("toolkit.asyncshutdown.log", false);

(In reply to Petr Sumbera from comment #0)

The first bad revision is:
changeset: 650360:df20721143f6
user: Jens Stutte <jstutte@mozilla.com>
date: Tue Jan 24 03:22:52 2023 -0800
description:
Bug 1811195 - Bail out on AppShutdownConfirmed in DocumentLoadListener::OnStartRequest. r=smaug,necko-reviewers,nika,valentin

Differential Revision: https://phabricator.services.mozilla.com/D167395

I assume this includes all patches from bug 1811195 ? Can you check if bug 1813559 helps ? We had some frequent intermittent hangs without that here, too.

Thanks!

Setting this to Core > DOM: Content Processes, if this is not the right component please move it to a more appropriate one. Thanks!

Component: Untriaged → DOM: Content Processes
Product: Firefox → Core

(In reply to Jens Stutte [:jstutte] from comment #4)

(In reply to Petr Sumbera from comment #0)

The first bad revision is:
changeset: 650360:df20721143f6
user: Jens Stutte <jstutte@mozilla.com>
date: Tue Jan 24 03:22:52 2023 -0800
description:
Bug 1811195 - Bail out on AppShutdownConfirmed in DocumentLoadListener::OnStartRequest. r=smaug,necko-reviewers,nika,valentin

Differential Revision: https://phabricator.services.mozilla.com/D167395

I assume this includes all patches from bug 1811195 ?

This is output of 'hg bisect'. It means this is the first bad commit (and yes there were other your commits which followed).

Can you check if bug 1813559 helps ? We had some frequent intermittent hangs without that here, too.

It still fails with latest sources (which include now also changes from bug 1813559).

For recorrd. I can still avoid the issue with lastest sources by reverting problematic commit:

diff -r 6ae9a5505bee -r 2f72239aa024 netwerk/ipc/DocumentLoadListener.cpp
--- a/netwerk/ipc/DocumentLoadListener.cpp      Wed Feb 01 00:09:29 2023 +0200
+++ b/netwerk/ipc/DocumentLoadListener.cpp      Wed Feb 01 01:26:30 2023 -0800
@@ -2424,16 +2424,10 @@

   auto* loadingContext = GetLoadingBrowsingContext();
   if (!loadingContext || loadingContext->IsDiscarded()) {
-    Cancel(NS_ERROR_UNEXPECTED, "No valid LoadingBrowsingContext."_ns);
+    DisconnectListeners(NS_ERROR_UNEXPECTED, NS_ERROR_UNEXPECTED);
     return NS_ERROR_UNEXPECTED;
   }

-  if (AppShutdown::IsInOrBeyond(ShutdownPhase::AppShutdownConfirmed)) {
-    Cancel(NS_ERROR_ILLEGAL_DURING_SHUTDOWN,
-           "Aborting OnStartRequest after shutdown started."_ns);
-    return NS_OK;
-  }
-
   // Block top-level data URI navigations if triggered by the web. Logging is
   // performed in AllowTopLevelNavigationToDataURI.
   if (!nsContentSecurityManager::AllowTopLevelNavigationToDataURI(mChannel)) {

(In reply to Petr Sumbera from comment #2)

It's part of my Fireofx builder on Solaris. It just do following:
firefox --screenshot about:blank
It starts always with new profile (empty HOME directory).

Probably I see the same here. I think we shifted something that makes it more likely that we want to shutdown some modules (here extensions) before we even started them up completely. We know that shutting down too fast can cause problems in general (also before), but here we have an easy to reproduce case, it seems. I assume this is not directly content process related but I'll leave it here for now for further investigation.

Hardware: Unspecified → All

(In reply to Petr Sumbera from comment #7)

For recorrd. I can still avoid the issue with lastest sources by reverting problematic commit:

-  if (AppShutdown::IsInOrBeyond(ShutdownPhase::AppShutdownConfirmed)) {
-    Cancel(NS_ERROR_ILLEGAL_DURING_SHUTDOWN,
-           "Aborting OnStartRequest after shutdown started."_ns);
-    return NS_OK;
-  }

I assume only this part matters here. Extensions have this two-step shutdown blockers that relies on sending something to the content process running the extension. Olli, could it be that we break that logic here ? Basically we did this to avoid we try to create new processes, but here we probably affect existing ones, too?

Flags: needinfo?(smaug)

Tried this on Nightly on Linux:
It leads to the following errors (although, screenshot is saved):

WARNING: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
WARNING: addBlocker@resource://gre/modules/AsyncShutdown.sys.mjs:727:15
addBlocker@resource://gre/modules/AsyncShutdown.sys.mjs:523:26
addBlocker@resource://gre/modules/AsyncShutdown.sys.mjs:458:15
setupShutdown@resource://gre/modules/osfile/osfile_async_front.jsm:1548:28
@resource://gre/modules/osfile/osfile_async_front.jsm:1568:16
@resource://gre/modules/osfile.jsm:12:30
@resource://gre/modules/TelemetryStorage.sys.mjs:10:28
checkPendingPings@resource://gre/modules/TelemetrySend.sys.mjs:863:17
setup@resource://gre/modules/TelemetrySend.sys.mjs:803:18
setup@resource://gre/modules/TelemetrySend.sys.mjs:241:30
setupTelemetry/this.delayedInitTask<@resource://gre/modules/TelemetryControllerParent.sys.mjs:829:36
observe@resource://gre/modules/AsyncShutdown.sys.mjs:576:16
JavaScript error: resource://gre/modules/AsyncShutdown.sys.mjs, line 727: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
JavaScript error: resource://gre/modules/AsyncShutdown.sys.mjs, line 727: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
JavaScript error: resource://gre/modules/AsyncShutdown.sys.mjs, line 727: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
JavaScript error: resource://gre/modules/AsyncShutdown.sys.mjs, line 727: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
JavaScript error: resource://gre/modules/AsyncShutdown.sys.mjs, line 727: Error: Phase "profile-before-change" is finished, it is too late to register completion condition "OS.File: flush I/O queued before profileBeforeChange"
JavaScript error: resource://gre/modules/TerminatorTelemetry.jsm, line 89: NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsITelemetry.getHistogramById]
JavaScript error: resource://gre/modules/TerminatorTelemetry.jsm, line 89: NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsITelemetry.getHistogramById]
JavaScript error: resource://gre/modules/TerminatorTelemetry.jsm, line 89: NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsITelemetry.getHistogramById]
[Parent 70971, Main Thread] WARNING: FuncScope not on stack!: file /home/aiunusov/mozilla-unified/dom/canvas/WebGLContext.cpp:1771
[Parent 70971, Main Thread] WARNING: FuncScope not on stack!: file /home/aiunusov/mozilla-unified/dom/canvas/WebGLContext.cpp:1771
[Parent 70971, IPDL Background] WARNING: IPC Connection Error: [Parent][PBackgroundParent] RunMessage(msgname=PRemoteWorkerService::Msg___delete
) Channel closing: too late to send/recv, messages will be lost: file /home/aiunusov/mozilla-unified/ipc/glue/MessageChannel.cpp:1927
[Socket 71047, Main Thread] WARNING: IPC message 'PSocketProcess::Msg_FOGData' discarded: actor cannot send: file /home/aiunusov/mozilla-unified/ipc/glue/ProtocolUtils.cpp:504
[Socket 71047, Main Thread] WARNING: NS_ENSURE_TRUE(Preferences::InitStaticMembers()) failed: file /home/aiunusov/mozilla-unified/modules/libpref/Preferences.cpp:4663
Destroying context 0x7f3857efd040 surface 0x7f3857eb5800 on display 0x7f38616d9800
[Parent 70971, Main Thread] WARNING: Extra shutdown CC: 'i < NORMAL_SHUTDOWN_COLLECTIONS', file /home/aiunusov/mozilla-unified/xpcom/base/nsCycleCollector.cpp:3384
[Parent 70971, Main Thread] WARNING: NS_ENSURE_TRUE(InitStaticMembers()) failed: file /home/aiunusov/mozilla-unified/obj-x86_64-pc-linux-gnu/dist/include/mozilla/Preferences.h:129
[Parent 70971, Main Thread] WARNING: NS_ENSURE_TRUE(Preferences::InitStaticMembers()) failed: file /home/aiunusov/mozilla-unified/modules/libpref/Preferences.cpp:4663

(In reply to Artur Iunusov from comment #10)

Tried this on Nightly on Linux:
It leads to the following errors (although, screenshot is saved):

./obj-x86_64-pc-linux-gnu/dist/bin/firefox --screenshot about:blank
*** You are running in headless mode.
[GFX1-]: glxtest: VA-API test failed: missing or old libva-drm library.
Screenshot saved to: /home/aiunusov/mozilla-unified/screenshot.png
console.error: (new TypeError("NetworkError: Network request failed", "resource://services-settings/Utils.jsm", 237))
console.error: Region.sys.mjs: "Error fetching region" (new TypeError("NetworkError when attempting to fetch resource.", ""))
console.error: Region.sys.mjs: "Failed to fetch region" (new Error("NO_RESULT", "resource://gre/modules/Region.sys.mjs", 416))

That's not the same error, apparently. Could it be that for whatever reasons --screenshot enables --headless only on Linux and that prevents some things from being started that otherwise would start?

The bug has a release status flag that shows some version of Firefox is affected, thus it will be considered confirmed.

Status: UNCONFIRMED → NEW
Ever confirmed: true
Flags: needinfo?(aiunusov)

It seems that it's true for the following platforms: XP_WIN || MOZ_WIDGET_GTK || XP_MACOSX:
https://searchfox.org/mozilla-central/source/toolkit/xre/nsAppRunner.cpp#3922
On other systems we will not see that error, but according to gfxPlatform::IsHeadless()), the mode is headless indeed.

Flags: needinfo?(aiunusov)

and here we set it: https://searchfox.org/mozilla-central/rev/e7dd9d9ba128478e1ff399778e851365ced3c806/toolkit/xre/nsAppRunner.cpp#3916

you could disable that check to try to reproduce locally on Linux.

Duplicate of this bug: 1815118
Duplicate of this bug: 1815456
Duplicate of this bug: 1815244

Copying crash signatures from duplicate bugs.

Crash Signature: [@ nsDebugImpl::Abort(char const*, int)]

(In reply to Jens Stutte [:jstutte] from comment #14)

and here we set it: https://searchfox.org/mozilla-central/rev/e7dd9d9ba128478e1ff399778e851365ced3c806/toolkit/xre/nsAppRunner.cpp#3916

you could disable that check to try to reproduce locally on Linux.

tried. Got exactly the same log, except line "*** You are running in headless mode.".

Rest of the log is same.

Duplicate of this bug: 1813651

Copying crash signatures from duplicate bugs.

Crash Signature: [@ nsDebugImpl::Abort(char const*, int)] → [@ nsDebugImpl::Abort(char const*, int)] [@ void __stdcall wil::details::DebugBreak]

Set release status flags based on info from the regressing bug 1811195

Duplicate of this bug: 1816359

Jari could this be triaged for severity, wondering the impact on 111?

Crash Signature: [@ nsDebugImpl::Abort(char const*, int)] [@ void __stdcall wil::details::DebugBreak] → [@ nsDebugImpl::Abort(char const*, int)] [@ void __stdcall wil::details::DebugBreak]
Flags: needinfo?(jjalkanen)

I wonder if this relates to bug 1799421 (and if the patch there might help to avoid this, though I doubt it, there is probably more to it).

In any case I would not think that this is very relevant for normal usage - this seems to be specific to a very early shutdown while the browser is still starting.

Severity: -- → S3
Priority: -- → P3
Attached file startuphang.json (deleted) —

So the shutdown blocker hang annotation shows clearly that all extensions are still starting while we should ask them to shutdown. I wonder if their startup is really so slow or if the early shutdown request makes them just hang without ever resolving their promises?

And there seem to be already some handling against races between extension startup and shutdown. But there is yet a huge window of time where many things happen until we create the promise we see in the hang report within which we can already be asked to shutdown.

I wonder if we should just explicitly wait for all the startup promises to have finished at the beginning of shutdown rather than hoping to be able to abort the startup at any time?

Flags: needinfo?(tomica)
Flags: needinfo?(smaug)
Flags: needinfo?(jjalkanen)

(In reply to Jens Stutte [:jstutte] from comment #29)

I wonder if we should just explicitly wait for all the startup promises to have finished at the beginning of shutdown rather than hoping to be able to abort the startup at any time?

Oh, that seems already to be the case? But IIUC we wait there blocking the quitApplicationGranted barrier rather than returning a promise we wait for in profileChangeTeardown. So maybe we should move that await into the async execution part?

Hi Petr, as it seems I am not able to reproduce your exact failure, would you mind to try out the attached patch and/or the one from bug 1799421 (single and combined) ? In particular if it still happens with the patch from this bug, we probably know that XPIProvider startup is just slower than our timeout.

Flags: needinfo?(petr.sumbera)

I have tested attached patch file and I'm not able to reproduce the issue with it.

Flags: needinfo?(petr.sumbera)

(In reply to Petr Sumbera from comment #34)

I have tested attached patch file and I'm not able to reproduce the issue with it.

Thanks for your support! That's nice to hear. I need to check if we want to have this as a paper over patch now and work on some better solution later, but it seems easy enough to be good to go, for now.

Assignee: nobody → jstutte
Attachment #9318799 - Attachment description: WIP: Bug 1814104 - Do not shutdown XPIProvider before it is fully started. → Bug 1814104 - Do not shutdown XPIProvider before it is fully started. r?mixedpuppy
Status: NEW → ASSIGNED

Just for a better understanding of the root cause:

Each single extensions did already wait with its shutdown until the startup of that single extension completed or rejected.

The patch from comment 7 made us cancel document load requests after reaching ShutdownPhase::AppShutdownConfirmed aka quit-application. If still in the process of starting up, this can make us abort the loading of the extension, apparently without rejecting its startupPromise.

The proposed patch "heals" this by blocking the earlier quitApplicationGranted until all extensions loaded, ensuring each of their startupPromise resolves before continuing shutdown.

It seems it would be more robust and faster to find a way to make each extension's startupPromise reject if DocumentLoadListener::OnStartRequest is canceling, but I am not really able to follow the path that would ensure this.

Hey Jens, in REO triage we noticed the attached patch has been waiting on review for a week. Does that need a nudge or a different reviewer / reviewer group to move along for 111?

Flags: needinfo?(jstutte)
Pushed by jstutte@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/c5c6075af679 Do not shutdown XPIProvider before it is fully started. r=rpl
Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 112 Branch

(In reply to Petr Sumbera from comment #34)

I have tested attached patch file and I'm not able to reproduce the issue with it.

Hi Petr, would you mind to re-test your case with current central? Thanks for your support!

Flags: needinfo?(tomica)
Flags: needinfo?(petr.sumbera)
Flags: needinfo?(jstutte)

I can confirm that with latest sources I don't see the problem.

Flags: needinfo?(petr.sumbera)

The patch landed in nightly and beta is affected.
:jstutte, is this bug important enough to require an uplift?

  • If yes, please nominate the patch for beta approval.
  • If no, please set status-firefox111 to wontfix.

For more information, please visit auto_nag documentation.

Flags: needinfo?(jstutte)

Comment on attachment 9318799 [details]
Bug 1814104 - Do not shutdown XPIProvider before it is fully started. r?mixedpuppy

Beta/Release Uplift Approval Request

  • User impact if declined: Shutdown hangs if asked to shutdown while still starting. It seems these can happen in the wild in particular in background tasks, where the direct impact is probably low.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): The patch is simply collecting extension startup promises and waits until they are settled before entering shutdown.
  • String changes made/needed:
  • Is Android affected?: Unknown
Flags: needinfo?(jstutte)
Attachment #9318799 - Flags: approval-mozilla-beta?

Comment on attachment 9318799 [details]
Bug 1814104 - Do not shutdown XPIProvider before it is fully started. r?mixedpuppy

Approved for 111.0b8

Attachment #9318799 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: