Closed Bug 1685204 Opened 4 years ago Closed 3 years ago

Hit MOZ_CRASH(Shutdown hanging after all known phases and workers finished.) at src/toolkit/components/terminator/nsTerminator.cpp:233 | application crashed [@ PR_NativeRunThread(void*)]

Categories

(Core :: DOM: Workers, defect, P2)

Tracking

RESOLVED DUPLICATE of bug 1719481
Tracking Status
firefox86 --- wontfix
firefox90 --- wontfix
firefox91 --- affected
firefox92 --- affected

People

(Reporter: tsmith, Unassigned)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure, Whiteboard: [stockwell unknown])

We have seen this a few times recently while fuzzing. It just randomly happened while trying to get an rr trace for another issue.

A Pernosco session is available here: https://pernos.co/debug/pB_A9fWne6ZRjl2hOCO05Q/index.html

Hit MOZ_CRASH(Shutdown hanging after all known phases and workers finished.) at src/toolkit/components/terminator/nsTerminator.cpp:233

#0 0xe4356b707c4 in mozilla::(anonymous namespace)::RunWatchdog(void*) /home/twsmith/code/mozilla-central/toolkit/components/terminator/nsTerminator.cpp:233:5
#1 0x7f188f590444 in _pt_root /home/twsmith/code/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:201:5
#2 0x535c1cb5a6da in start_thread (/lib/x86_64-linux-gnu/libpthread.so.0+0x76da)
#3 0x535c1d0a4a3e in clone /build/glibc-2ORdQG/glibc-2.27/misc/../sysdeps/unix/sysv/linux/x86_64/clone.S:95

Bugbug thinks this bug should belong to this component, but please revert this change in case of error.

Component: General → DOM: Workers
Blocks: 1505660
Severity: -- → S3
Priority: -- → P2
Crash Signature: [@ _PR_NativeRunThread(void*)]
Summary: Hit MOZ_CRASH(Shutdown hanging after all known phases and workers finished.) at src/toolkit/components/terminator/nsTerminator.cpp:233 → Hit MOZ_CRASH(Shutdown hanging after all known phases and workers finished.) at src/toolkit/components/terminator/nsTerminator.cpp:233 | application crashed [@ PR_NativeRunThread(void*)]

Update:

There have been 30 failures within the last 7 days:

  • 3 failures on Windows 10 x86 WebRender debug
  • 4 failures on windows10-64-2004-qr debug
  • 23 failures on Windows 10 x64 WebRender debug

Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=347032414&repo=mozilla-central&lineNumber=44423

[task 2021-08-01T10:47:19.563Z] 10:47:19     INFO - PROCESS-CRASH | Last test finished | application crashed [@ PR_NativeRunThread(void*)]
[task 2021-08-01T10:47:19.563Z] 10:47:19     INFO - Mozilla crash reason: MOZ_CRASH(Shutdown hanging after all known phases and workers finished.)
[task 2021-08-01T10:47:19.563Z] 10:47:19     INFO - Crash dump filename: C:\Users\task_1627813005\AppData\Local\Temp\tmput6xzc5c.mozrunner\minidumps\483a0348-c7bb-4eb2-b5d2-a34ba6ae7dc5.dmp
[task 2021-08-01T10:47:19.563Z] 10:47:19     INFO - Operating system: Windows NT
[task 2021-08-01T10:47:19.563Z] 10:47:19     INFO -                   10.0.17134 
[task 2021-08-01T10:47:19.563Z] 10:47:19     INFO - CPU: amd64
[task 2021-08-01T10:47:19.563Z] 10:47:19     INFO -      family 6 model 85 stepping 7
[task 2021-08-01T10:47:19.563Z] 10:47:19     INFO -      8 CPUs
[task 2021-08-01T10:47:19.563Z] 10:47:19     INFO - 
[task 2021-08-01T10:47:19.563Z] 10:47:19     INFO - GPU: UNKNOWN
[task 2021-08-01T10:47:19.563Z] 10:47:19     INFO - 
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO - Crash reason:  EXCEPTION_BREAKPOINT
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO - Crash address: 0xc086e128
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO - Process uptime: 207 seconds
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO - 
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO - Thread 20 (crashed) - Shutdown Hang Terminator 0  xul.dll!mozilla::`anonymous namespace'::RunWatchdog(void*) [nsTerminator.cpp:c59236b26192d1299ae1353fefbdb9c147e01aa8 : 246 + 0x0]
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO -     rax = 0x00007ffdc4395e41   rdx = 0x0000000000000000
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO -     rcx = 0x00007ffdf0498880   rbx = 0x00007ffde60790f7
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO -     rsi = 0x00007ffdfb1e3ca0   rdi = 0x0000000000000276
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO -     rbp = 0x00007ffde60d1d08   rsp = 0x0000009d8393fbd0
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO -      r8 = 0x0000009d8393fd80    r9 = 0x0000000000000021
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO -     r10 = 0x0000009d8393fd30   r11 = 0x00007ffdfdae0000
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO -     r12 = 0x000002564c849138   r13 = 0x000002564c849148
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO -     r14 = 0x0000000000000000   r15 = 0x00007ffde6079118
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO -     rip = 0x00007ffdc086e128
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO -     Found by: given as instruction pointer in context
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO -  1  nss3.dll!PR_NativeRunThread(void*) [pruthr.c:c59236b26192d1299ae1353fefbdb9c147e01aa8 : 399 + 0xe]
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO -     rbx = 0x00007ffde60790f7   rbp = 0x00007ffde60d1d08
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO -     rsp = 0x0000009d8393fc20   r12 = 0x000002564c849138
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO -     r13 = 0x000002564c849148   r14 = 0x0000000000000000
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO -     r15 = 0x00007ffde6079118   rip = 0x00007ffde5f29462
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO -     Found by: call frame info
[task 2021-08-01T10:47:19.564Z] 10:47:19     INFO -  2  nss3.dll!pr_root(void*) [w95thred.c:c59236b26192d1299ae1353fefbdb9c147e01aa8 : 139 + 0xd]
[task 2021-08-01T10:47:19.565Z] 10:47:19     INFO -     rbx = 0x00007ffde60790f7   rbp = 0x00007ffde60d1d08
[task 2021-08-01T10:47:19.565Z] 10:47:19     INFO -     rsp = 0x0000009d8393fca0   r12 = 0x000002564c849138
[task 2021-08-01T10:47:19.565Z] 10:47:19     INFO -     r13 = 0x000002564c849148   r14 = 0x0000000000000000
[task 2021-08-01T10:47:19.565Z] 10:47:19     INFO -     r15 = 0x00007ffde6079118   rip = 0x00007ffde5f19e41
[task 2021-08-01T10:47:19.565Z] 10:47:19     INFO -     Found by: call frame info
[task 2021-08-01T10:47:19.565Z] 10:47:19     INFO -  3  ucrtbase.dll!RtlpHpSegPageRangeShrink + 0xda
[task 2021-08-01T10:47:19.565Z] 10:47:19     INFO -     rbx = 0x00007ffde60790f7   rbp = 0x00007ffde60d1d08
[task 2021-08-01T10:47:19.565Z] 10:47:19     INFO -     rsp = 0x0000009d8393fcd0   r12 = 0x000002564c849138
[task 2021-08-01T10:47:19.565Z] 10:47:19     INFO -     r13 = 0x000002564c849148   r14 = 0x0000000000000000
[task 2021-08-01T10:47:19.565Z] 10:47:19     INFO -     r15 = 0x00007ffde6079118   rip = 0x00007ffdfae6c4be
[task 2021-08-01T10:47:19.565Z] 10:47:19     INFO -     Found by: call frame info
[task 2021-08-01T10:47:19.565Z] 10:47:19     INFO - 

Jens, could you help us assign this to someone?
Thank you.

Flags: needinfo?(jstutte)
Whiteboard: [stockwell unknown] → [stockwell unknown][stockwell needswork:owner]

So this one looks interesting.

If I read the stacks right, the main thread is waiting for the IO thread to terminate, while the IO thread, which is processing a ChildReaper runnable, is waiting for process_ to terminate. I assume that the process we are waiting for is a child process (can we see this somewhere?). This wait was introduced in bug 1268559, fairly recently (5y ago) relative to the age of the IPC code around the join (9y).

IIUC the situation, it feels wrong to me that we wait endlessly for a child process to terminate (endlessly, that is, until the shutdown terminator triggers). I would expect it to be totally irrelevant for the parent process (and its final task of cleanly saving the session state) whether any child process is still alive after a certain shutdown stage (and we are in a very late stage here). Might we rather want to forcibly kill them if they do not react to the quit message in a timely manner?
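
To make the "wait with a deadline, then kill" idea concrete, here is a minimal POSIX-only sketch. It is not the actual ChildReaper code and does not use any Gecko API: the name ReapWithDeadline, the ReapResult enum and the 10 ms polling interval are all hypothetical, chosen only for illustration. It assumes the quit message has already been sent to the child and only covers the reaping step.

```cpp
#include <cassert>
#include <cerrno>
#include <chrono>
#include <csignal>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

// Hypothetical result type, for illustration only.
enum class ReapResult { ExitedOnItsOwn, ForceKilled, Error };

// Hypothetical helper (not Gecko code): poll for the child's exit until the
// grace period is over, then SIGKILL it and reap it, instead of blocking in
// waitpid() for an unbounded amount of time.
ReapResult ReapWithDeadline(pid_t child, std::chrono::milliseconds grace) {
  assert(child > 0);
  const auto deadline = std::chrono::steady_clock::now() + grace;
  int status = 0;

  for (;;) {
    // WNOHANG makes waitpid() return 0 immediately if the child is still alive.
    pid_t r = waitpid(child, &status, WNOHANG);
    if (r == child) {
      return ReapResult::ExitedOnItsOwn;
    }
    if (r < 0) {
      return ReapResult::Error;
    }
    if (std::chrono::steady_clock::now() >= deadline) {
      break;
    }
    usleep(10 * 1000);  // Assumed polling interval of 10 ms.
  }

  // Grace period expired: stop waiting politely and force termination.
  if (kill(child, SIGKILL) != 0) {
    return ReapResult::Error;
  }
  // SIGKILL cannot be caught or ignored, so this blocking wait is bounded.
  // Retry only if the wait itself is interrupted by a signal.
  pid_t r;
  do {
    r = waitpid(child, &status, 0);
  } while (r < 0 && errno == EINTR);
  return r == child ? ReapResult::ForceKilled : ReapResult::Error;
}
```

With something along these lines, a child that never reacts would cost the parent at most the grace period during shutdown rather than keeping the IO thread (and with it the main thread) blocked until the terminator fires. Whether a forced kill is acceptable at this shutdown stage is of course the open question here.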

Flags: needinfo?(jstutte) → needinfo?(nika)

I believe this may be a similar issue to bug 1719481, as :mccr8 noted when linking the bugs together, so I think my bug 1719481 comment 10 also applies here. I'm guessing that the main difference is just that that bug is running under ccov, and this one is under debug, so the crashes probably look a bit different.

I think I'm going to dupe the bugs for now, and we can re-open this one if it turns out to be different.

Status: NEW → RESOLVED
Closed: 3 years ago
Flags: needinfo?(nika)
Resolution: --- → DUPLICATE

Thanks for clarifying. I had not read through the other bug carefully.
