1675068 - Service workers delaying shutdown

Reporter

Description

•

4 years ago

After the nasty deadlock in bug 1588152 was resolved, it turns out that there are still reports of service workers delaying the shutdown. The top 3 async shutdown timeout reasons, accounting for 60% of the crashes, are:

{"shutdownStates":"parent process IPDL background thread, ","pendingPromises":1,"acceptingPromises":false}
{"shutdownStates":"","pendingPromises":0,"acceptingPromises":true}
{"shutdownStates":"content process main thread, ","pendingPromises":1,"acceptingPromises":false}

Yaron Tausky [:ytausky]

Reporter

Updated

•

4 years ago

Assignee: nobody → ytausky

Status: NEW → ASSIGNED

Crash Signature: [@ AsyncShutdownTimeout | profile-change-teardown | ServiceWorkerShutdownBlocker: shutting down Service Workers ]

Florian Quèze [:florian]

Comment 1

•

4 years ago

(In reply to Yaron Tausky [:ytausky] from comment #0)

{"shutdownStates":"content process main thread, ","pendingPromises":1,"acceptingPromises":false}

I just encountered a crash today that falls in this category: https://crash-stats.mozilla.org/report/index/bac8b5b8-29da-487f-a42d-71cf40201104#tab-metadata

I have semi-reliable steps to reproduce. When I do startup profiling on Windows (start with MOZ_PROFILER_STARTUP=1 in the environment), sometimes when I attempt to collect the profile, the profiler tab doesn't load, and remain stuck with the animated loading indicator, and the profiler.firefox.com domain name as the tab title, but the tab doesn't actually load. The profiler website uses a service worker. When a profiler tab fails to load like this, Firefox shutdown never completes cleanly.

On a slow test laptop, I reproduce quite consistently when doing a cold startup profile. On my devlopment machine I could reproduce about half the time with my local build (so I might be able to test things there).

Jens Stutte [:jstutte]

Updated

•

4 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1675708

Jens Stutte [:jstutte]

Updated

•

4 years ago

Severity: -- → S3

Priority: -- → P2

Itiel

Comment 2

•

4 years ago

I woke my laptop, immediately updated Nightly and after ~2-3 minutes of the updater running in the background I got this crash:
https://crash-stats.mozilla.org/report/index/0c366038-3f4a-42d8-bdbe-181e70201116

Florian Quèze [:florian]

Updated

•

4 years ago

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1677421

alex_mayorga

Comment 3

•

4 years ago

¡Hola!

Build ID 20201211213049 crashed here:

https://crash-stats.mozilla.org/report/index/7bb232a4-9947-4e39-acdf-a71050201216#tab-details

Updated flags per the crash data available.

¡Gracias!
Alex

status-firefox85: --- → affected

status-firefox86: --- → affected

status-firefox-esr78: --- → affected

OS: Unspecified → All

Hardware: Unspecified → All

alex_mayorga

Comment 4

•

4 years ago

¡Hola y'all!

Build ID 20210429214231 crashed here:

https://crash-stats.mozilla.org/report/index/56c30794-a886-45dd-9af5-b00c20210430#tab-details

Updated flags per the crash data available:

https://crash-stats.mozilla.org/signature/?product=Firefox&signature=AsyncShutdownTimeout%20%7C%20profile-change-teardown%20%7C%20ServiceWorkerShutdownBlocker%3A%20shutting%20down%20Service%20Workers

¡Gracias!
Alex

status-firefox89: --- → affected

status-firefox90: --- → affected

Jens Stutte [:jstutte]

Updated

•

3 years ago

Assignee: ytausky → jstutte

alex_mayorga

Comment 5

•

3 years ago

¡Hola y'all!

Build ID 20220129091708 crashed here:

https://crash-stats.mozilla.org/report/index/56c30794-a886-45dd-9af5-b00c20210430#tab-details

Updated flags per the crash data available:

https://crash-stats.mozilla.org/signature/?product=Firefox&signature=AsyncShutdownTimeout%20%7C%20profile-change-teardown%20%7C%20ServiceWorkerShutdownBlocker%3A%20shutting%20down%20Service%20Workers

¡Gracias!
Alex

status-firefox96: --- → affected

status-firefox97: --- → affected

status-firefox98: --- → affected

Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout)

Comment 6

•

2 years ago

This signature started to get crash reports again at the end of last week (10-20 crash reports per Nighty). Regression range:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=720d5125a9b4aa6750806ed9b51fbd0811da10c4&tochange=59134b451eec1454be5f2e489176b4b59f1ddb5a

Yulia, could bug 1742438 be related?

Flags: needinfo?(ystartsev)

Yulia Startsev [:yulia] | Back in Oct.

Comment 7

•

2 years ago

Yes, possibly. Hopefully I have a fix for this regression already.

Flags: needinfo?(ystartsev)

Dianna Smith [:diannaS]

Updated

•

2 years ago

status-firefox104: --- → affected

status-firefox85: affected → wontfix

status-firefox86: affected → wontfix

status-firefox89: affected → wontfix

status-firefox90: affected → wontfix

status-firefox96: affected → wontfix

status-firefox97: affected → wontfix

status-firefox98: affected → wontfix

tracking-firefox104: --- → +

BugBot [:suhaib / :marco/ :calixte]

Comment 8

•

2 years ago

The bug is marked as tracked for firefox104 (nightly). However, the bug still has low severity.

:jstutte, could you please increase the severity for this tracked bug? If you disagree with the tracking decision, please talk with the release managers.

For more information, please visit auto_nag documentation.

Flags: needinfo?(jstutte)

Jens Stutte [:jstutte]

Comment 9

•

2 years ago

(In reply to Yulia Startsev [:yulia] from comment #7)

Yes, possibly. Hopefully I have a fix for this regression already.

Yulia, could you please link the bug with the fix here?

Flags: needinfo?(jstutte) → needinfo?(ystartsev)

Yulia Startsev [:yulia] | Back in Oct.

Comment 10

•

2 years ago

(discussed with jens offline)

I took a look, and this appears to still be happening. I believe this is due to the cache promise cleanup potentially sometimes being late. it is similar to the points made by Yaron above a few years ago, but now that it is done independently of the ScriptLoader shutdown it may be lagging or failing. I'll investigate.

Yulia Startsev [:yulia] | Back in Oct.

Comment 11

•

2 years ago

I think I found the underlying issue. Cancellation was changed into "informing" rather than triggering a kill of the worker. However, it is possible for the worker to be killed without going through cancellation, which means we don't do clean up. I am looking in to how to make this always go through the cancellation path / do cleanup.

Jens Stutte [:jstutte]

Comment 12

•

2 years ago

(removing myself as assignee as Yulia seems to be on top of it)

Assignee: jstutte → ystartsev

Yulia Startsev [:yulia] | Back in Oct.

Updated

•

2 years ago

Depends on: 1783190

Dianna Smith [:diannaS]

Updated

•

2 years ago

status-firefox104: affected → wontfix

status-firefox105: --- → affected

tracking-firefox104: + → ---

BugBot [:suhaib / :marco/ :calixte]

Comment 13

•

2 years ago

The bug is linked to a topcrash signature, which matches the following criterion:

Top 10 desktop browser crashes on nightly

:yulia, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(ystartsev)

Keywords: topcrash

BugBot [:suhaib / :marco/ :calixte]

Comment 14

•

2 years ago

Based on the topcrash criteria, the crash signature linked to this bug is not a topcrash signature anymore.

For more information, please visit auto_nag documentation.

Keywords: topcrash

Yulia Startsev [:yulia] | Back in Oct.

Comment 15

•

2 years ago

I am still tracking this bug, as there is a bit more work to do regarding the cancellation path. Will update once I am finished.

Flags: needinfo?(ystartsev)

Yulia Startsev [:yulia] | Back in Oct.

Comment 16

•

2 years ago

I've reviewed the crashes in the last week, and they do not appear to be related to the ScriptLoader shutdown any more, they appear to be hanging in the JS interpreter, which looks unrelated.

Jens Stutte [:jstutte]

Comment 17

•

2 years ago

Indeed we are back to normal here. Unassigning Yulia but we keep it open for further monitoring.

Assignee: ystartsev → nobody

Status: ASSIGNED → NEW

BugBot [:suhaib / :marco/ :calixte]

Comment 18

•

2 years ago

The bug is linked to a topcrash signature, which matches the following criterion:

Top 10 desktop browser crashes on nightly

:jmarshall, could you consider increasing the severity of this top-crash bug?

For more information, please visit auto_nag documentation.

Flags: needinfo?(jmarshall)

Keywords: topcrash

Jens Stutte [:jstutte]

Updated

•

2 years ago

Flags: needinfo?(jmarshall) → needinfo?(jstutte)

Donal Meehan [:dmeehan]

Comment 19

•

2 years ago

:jstutte this has started spiking in recent nightly builds.
Could someone take a look, is it related to Bug 1812490?

Donal Meehan [:dmeehan]

Updated

•

2 years ago

status-firefox105: affected → wontfix

status-firefox109: --- → wontfix

status-firefox110: --- → wontfix

status-firefox111: --- → affected

Donal Meehan [:dmeehan]

Updated

•

2 years ago

tracking-firefox111: --- → +

Jens Stutte [:jstutte]

Comment 20

•

2 years ago

The spike seemed to start with landing of bug 1811195, which has a remote worker related part. Then we landed the fix for bug 1812490 and numbers went down a lot, but seem to be still significantly higher than before. My assumption would be, that one of the two conditions we changed in bug 1811195 is not triggered anymore since bug 1812490 landed, but the other is?

:asuth, could it be that the bail out in BackgroundParentImpl::AllocPRemoteWorkerControllerParent is doing more harm than good?

Flags: needinfo?(jstutte) → needinfo?(bugmail)

Jens Stutte [:jstutte]

Comment 21

•

2 years ago

(In reply to Jens Stutte [:jstutte] from comment #20)

:asuth, could it be that the bail out in BackgroundParentImpl::AllocPRemoteWorkerControllerParent is doing more harm than good?

From here that looks very likely.

Jens Stutte [:jstutte]

Updated

•

2 years ago

Depends on: 1813559

Jens Stutte [:jstutte]

Updated

•

2 years ago

Flags: needinfo?(bugmail)

Andrew Sutherland [:asuth] (he/him)

Comment 22

•

2 years ago

(In reply to Jens Stutte [:jstutte] from comment #20)

:asuth, could it be that the bail out in BackgroundParentImpl::AllocPRemoteWorkerControllerParent is doing more harm than good?

Yeah, I think this would probably break the state machine and so removing it again in https://phabricator.services.mozilla.com/D168264 makes sense.

BugBot [:suhaib / :marco/ :calixte]

Comment 23

•

2 years ago

This is a reminder regarding comment #8!

The bug is marked as tracked for firefox111 (nightly). We have limited time to fix this, the soft freeze is in 9 days. However, the bug still isn't assigned.

Jens Stutte [:jstutte]

Comment 24

•

2 years ago

(In reply to Release mgmt bot [:suhaib / :marco/ :calixte] from comment #23)

This is a reminder regarding comment #8!

The bug is marked as tracked for firefox111 (nightly). We have limited time to fix this, the soft freeze is in 9 days. However, the bug still isn't assigned.

We do not yet see data after bug 1813559 landed (which is tracked, too), the hope would be that we can untrack this bug again after that landed.

BugBot [:suhaib / :marco/ :calixte]

Comment 25

•

2 years ago

This is a reminder regarding comment #8!

The bug is marked as tracked for firefox111 (nightly). We have limited time to fix this, the soft freeze is in 8 days. However, the bug still isn't assigned.

Donal Meehan [:dmeehan]

Comment 26

•

2 years ago

Removing tracking for 111, the crash volume dropped after Bug 1813559 landed.

tracking-firefox111: + → ---

phsaldanha

Comment 27

•

2 years ago

I can reproduce this bug by opening all my 40+ bookmarks and closing the browser before all pages are loaded. Sometimes it happens and sometimes it doesn’t, but most of the time I get the same crash.
STR:
1 - Open Firefox Nightly 102.0a1
2 - Right-click bookmark bar -> Open all bookmarks
3 - Close the browser right after opening all the bookmarks without waiting for them to finish loading

Results: Firefox closes most of the processes, but a few particular processes remain open for a while until I get the crash notification.

Expected results: Firefox should close all the tabs.

Using Windows 11

Crash report: https://crash-stats.mozilla.org/report/index/205110cc-31e3-4c5c-ab0d-2df8a0230221

Andrew Sutherland [:asuth] (he/him)

Updated

•

2 years ago

Depends on: 1825446

Comment hidden (off-topic)

Jens Stutte [:jstutte]

Comment 32

•

1 year ago

Could it be that the bot is getting confused around merge day about how to query the nightly topcrash criteria (similar to https://github.com/mozilla/bugbug/issues/351) ?

Flags: needinfo?(smujahid)

Suhaib Mujahid [:suhaib]

Comment 33

•

1 year ago

(In reply to Jens Stutte [:jstutte] from comment #32)

Could it be that the bot is getting confused around merge day about how to query the nightly topcrash criteria

Thank you for reporting this! The bot account for the release day and includes the previous version in the first week after the release date. However, it seems that contradictions in the versions on the release day caused this. I filed an issue to follow up on this: https://github.com/mozilla/bugbot/issues/2100

(similar to https://github.com/mozilla/bugbug/issues/351) ?

I guess you mean https://github.com/mozilla/bugbug/issues/3516

Flags: needinfo?(smujahid)

Comment hidden (off-topic)

Jens Stutte [:jstutte]

Comment 37

•

1 year ago

(In reply to Suhaib Mujahid [:suhaib] from comment #33)

(In reply to Jens Stutte [:jstutte] from comment #32)

Could it be that the bot is getting confused around merge day about how to query the nightly topcrash criteria

Thank you for reporting this! The bot account for the release day and includes the previous version in the first week after the release date. However, it seems that contradictions in the versions on the release day caused this. I filed an issue to follow up on this: https://github.com/mozilla/bugbot/issues/2100

It seems it is happening again.

Flags: needinfo?(smujahid)

Suhaib Mujahid [:suhaib]

Comment 38

•

1 year ago

:jstutte I will prioritize working on #2100 to prevent this from happening again.

Flags: needinfo?(smujahid)

Comment hidden (off-topic)

Worcester12345

Comment 40

•

1 year ago

"Perf" key word?

Dennis Clarke

Comment 41

•

1 year ago

This crash may fall into the same category : 

https://crash-stats.mozilla.org/report/index/3e31cb79-abdb-46e7-83da-0f8680230819#tab-bugzilla

I see Signature: [@ AsyncShutdownTimeout | profile-change-teardown | ServiceWorkerShutdownBlocker: shutting down Service Workers ]

Comment hidden (off-topic)