Intermittent [Tier 2] Last test finished | application crashed [@ PR_NativeRunThread(void*)] | After Last test finished | application terminated with exit code 1
Categories
(Core :: IPC, defect, P3)
Tracking
()
Fission Milestone | Future |
Tracking | Status | |
---|---|---|
firefox-esr91 | --- | unaffected |
firefox93 | --- | unaffected |
firefox94 | --- | affected |
firefox95 | --- | affected |
firefox96 | --- | affected |
People
(Reporter: intermittent-bug-filer, Unassigned)
References
Details
(Keywords: crash, intermittent-failure, Whiteboard: [fission:m96?][stockwell disabled][stockwell unknown])
Crash Data
Attachments
(2 files)
Filed by: nbeleuzu [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=351255856&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Bmk8FOpkS-OnNkO1cEw4sw/runs/0/artifacts/public/logs/live_backing.log
[task 2021-09-11T23:21:31.450Z] 23:21:31 INFO - PROCESS-CRASH | Last test finished | application crashed [@ PR_NativeRunThread(void*)]
[task 2021-09-11T23:21:31.450Z] 23:21:31 INFO - Mozilla crash reason: MOZ_CRASH(Shutdown hanging after all known phases and workers finished.)
[task 2021-09-11T23:21:31.466Z] 23:21:31 INFO - Crash dump filename: C:\Users\task_163140047228689\AppData\Local\Temp\tmpb0mrwjnq.mozrunner\minidumps\d3e6ec8a-8e13-4d83-a902-04c303b32dde.dmp
[task 2021-09-11T23:21:31.466Z] 23:21:31 INFO - Operating system: Windows NT
[task 2021-09-11T23:21:31.466Z] 23:21:31 INFO - 10.0.19041
[task 2021-09-11T23:21:31.466Z] 23:21:31 INFO - CPU: amd64
[task 2021-09-11T23:21:31.466Z] 23:21:31 INFO - family 6 model 85 stepping 7
[task 2021-09-11T23:21:31.466Z] 23:21:31 INFO - 8 CPUs
[task 2021-09-11T23:21:31.466Z] 23:21:31 INFO -
[task 2021-09-11T23:21:31.466Z] 23:21:31 INFO - GPU: UNKNOWN
[task 2021-09-11T23:21:31.466Z] 23:21:31 INFO -
[task 2021-09-11T23:21:31.466Z] 23:21:31 INFO - Crash reason: EXCEPTION_BREAKPOINT
[task 2021-09-11T23:21:31.466Z] 23:21:31 INFO - Crash address: 0xc6d80448
[task 2021-09-11T23:21:31.466Z] 23:21:31 INFO - Process uptime: 847 seconds
[task 2021-09-11T23:21:31.466Z] 23:21:31 INFO -
[task 2021-09-11T23:21:31.466Z] 23:21:31 INFO - Thread 19 tid 5288 (crashed) - Shutdown Hang Terminator 0 xul.dll!mozilla::`anonymous namespace'::RunWatchdog(void*) [nsTerminator.cpp:96fd1bb41036f14d9b18f9d1e462f8382453c6f2 : 246 + 0x0]
[task 2021-09-11T23:21:31.466Z] 23:21:31 INFO - rax = 0x00007ffdca5b6001 rdx = 0x0000000000000000
[task 2021-09-11T23:21:31.466Z] 23:21:31 INFO - rcx = 0x00007ffdf6f19898 rbx = 0x00007ffdf28f90f7
[task 2021-09-11T23:21:31.467Z] 23:21:31 INFO - rsi = 0x00007ffdfd0bada0 rdi = 0x0000000000000276
[task 2021-09-11T23:21:31.467Z] 23:21:31 INFO - rbp = 0x00007ffdf2951d08 rsp = 0x000000cdc87ffae0
[task 2021-09-11T23:21:31.467Z] 23:21:31 INFO - r8 = 0x000000cdc87ffcc0 r9 = 0x00007ffdfee70000
[task 2021-09-11T23:21:31.467Z] 23:21:31 INFO - r10 = 0x00007ffdfeec2651 r11 = 0x000000cdc87ff4a0
[task 2021-09-11T23:21:31.467Z] 23:21:31 INFO - r12 = 0x000002b626924938 r13 = 0x000002b626924948
[task 2021-09-11T23:21:31.467Z] 23:21:31 INFO - r14 = 0x0000000000000000 r15 = 0x00007ffdf28f9118
[task 2021-09-11T23:21:31.467Z] 23:21:31 INFO - rip = 0x00007ffdc6d80448
[task 2021-09-11T23:21:31.467Z] 23:21:31 INFO - Found by: given as instruction pointer in context
[task 2021-09-11T23:21:31.467Z] 23:21:31 INFO - 1 nss3.dll!PR_NativeRunThread(void*) [pruthr.c:96fd1bb41036f14d9b18f9d1e462f8382453c6f2 : 399 + 0xe]
[task 2021-09-11T23:21:31.467Z] 23:21:31 INFO - rbx = 0x00007ffdf28f90f7 rbp = 0x00007ffdf2951d08
[task 2021-09-11T23:21:31.467Z] 23:21:31 INFO - rsp = 0x000000cdc87ffb30 r12 = 0x000002b626924938
[task 2021-09-11T23:21:31.467Z] 23:21:31 INFO - r13 = 0x000002b626924948 r14 = 0x0000000000000000
[task 2021-09-11T23:21:31.467Z] 23:21:31 INFO - r15 = 0x00007ffdf28f9118 rip = 0x00007ffdf27a9462
[task 2021-09-11T23:21:31.467Z] 23:21:31 INFO - Found by: call frame info
[task 2021-09-11T23:21:31.467Z] 23:21:31 INFO - 2 nss3.dll!pr_root(void*) [w95thred.c:96fd1bb41036f14d9b18f9d1e462f8382453c6f2 : 139 + 0xd]
[task 2021-09-11T23:21:31.467Z] 23:21:31 INFO - rbx = 0x00007ffdf28f90f7 rbp = 0x00007ffdf2951d08
[task 2021-09-11T23:21:31.467Z] 23:21:31 INFO - rsp = 0x000000cdc87ffbb0 r12 = 0x000002b626924938
[task 2021-09-11T23:21:31.467Z] 23:21:31 INFO - r13 = 0x000002b626924948 r14 = 0x0000000000000000
[task 2021-09-11T23:21:31.467Z] 23:21:31 INFO - r15 = 0x00007ffdf28f9118 rip = 0x00007ffdf2799e41
[task 2021-09-11T23:21:31.467Z] 23:21:31 INFO - Found by: call frame info
[task 2021-09-11T23:21:31.468Z] 23:21:31 INFO - 3 ucrtbase.dll!RtlpReAllocateHeap + 0x1ce
[task 2021-09-11T23:21:31.468Z] 23:21:31 INFO - rbx = 0x00007ffdf28f90f7 rbp = 0x00007ffdf2951d08
[task 2021-09-11T23:21:31.468Z] 23:21:31 INFO - rsp = 0x000000cdc87ffbe0 r12 = 0x000002b626924938
[task 2021-09-11T23:21:31.468Z] 23:21:31 INFO - r13 = 0x000002b626924948 r14 = 0x0000000000000000
[task 2021-09-11T23:21:31.468Z] 23:21:31 INFO - r15 = 0x00007ffdf28f9118 rip = 0x00007ffdfca41bb2
[task 2021-09-11T23:21:31.468Z] 23:21:31 INFO - Found by: call frame info
[task 2021-09-11T23:21:31.468Z] 23:21:31 INFO -
[task 2021-09-11T23:21:31.468Z] 23:21:31 INFO - Thread 0 tid 6544 - GeckoMain
[task 2021-09-11T23:21:31.468Z] 23:21:31 INFO - 0 ntdll.dll!NtWaitForSingleObject + 0x14
[task 2021-09-11T23:21:31.468Z] 23:21:31 INFO - rax = 0x0000000000000004 rdx = 0x0000000000000000
[task 2021-09-11T23:21:31.468Z] 23:21:31 INFO - rcx = 0x0000000000000428 rbx = 0x0000000000000000
[task 2021-09-11T23:21:31.468Z] 23:21:31 INFO - rsi = 0x0000000000000000 rdi = 0x0000000000000428
[task 2021-09-11T23:21:31.468Z] 23:21:31 INFO - rbp = 0x000000cdc57ff660 rsp = 0x000000cdc57ff268
[task 2021-09-11T23:21:31.468Z] 23:21:31 INFO - r8 = 0x0000000000000010 r9 = 0x0000000000000008
[task 2021-09-11T23:21:31.468Z] 23:21:31 INFO - r10 = 0x0000000000000000 r11 = 0x00007ffdfeff0248
[task 2021-09-11T23:21:31.469Z] 23:21:31 INFO - r12 = 0xaaaaaaaaaaaaaaaa r13 = 0x000002b626909490
[task 2021-09-11T23:21:31.469Z] 23:21:31 INFO - r14 = 0x0000000000000428 r15 = 0x0000000000000000
[task 2021-09-11T23:21:31.469Z] 23:21:31 INFO - rip = 0x00007ffdfef0cea4
[task 2021-09-11T23:21:31.469Z] 23:21:31 INFO - Found by: given as instruction pointer in context
[task 2021-09-11T23:21:31.469Z] 23:21:31 INFO - 1 KERNELBASE.dll!RtlReleaseSRWLockShared + 0xae
[task 2021-09-11T23:21:31.469Z] 23:21:31 INFO - rbx = 0x0000000000000000 rbp = 0x000000cdc57ff660
[task 2021-09-11T23:21:31.469Z] 23:21:31 INFO - rsp = 0x000000cdc57ff270 r12 = 0xaaaaaaaaaaaaaaaa
[task 2021-09-11T23:21:31.469Z] 23:21:31 INFO - r13 = 0x000002b626909490 r14 = 0x0000000000000428
[task 2021-09-11T23:21:31.469Z] 23:21:31 INFO - r15 = 0x0000000000000000 rip = 0x00007ffdfc5c19ce
[task 2021-09-11T23:21:31.469Z] 23:21:31 INFO - Found by: call frame info
[task 2021-09-11T23:21:31.469Z] 23:21:31 INFO - 2 xul.dll!base::MessagePumpForIO::ScheduleWork() [message_pump_win.cc:96fd1bb41036f14d9b18f9d1e462f8382453c6f2 : 389 + 0xd]
[task 2021-09-11T23:21:31.469Z] 23:21:31 INFO - rbx = 0x0000000000000000 rbp = 0x000000cdc57ff660
[task 2021-09-11T23:21:31.469Z] 23:21:31 INFO - rsp = 0x000000cdc57ff2a0 r12 = 0xaaaaaaaaaaaaaaaa
[task 2021-09-11T23:21:31.470Z] 23:21:31 INFO - r13 = 0x000002b626909490 r14 = 0x0000000000000428
[task 2021-09-11T23:21:31.470Z] 23:21:31 INFO - r15 = 0x0000000000000000 rip = 0x00007ffdc1497271
[task 2021-09-11T23:21:31.470Z] 23:21:31 INFO - Found by: call frame info
[task 2021-09-11T23:21:31.470Z] 23:21:31 INFO - 3 xul.dll!MessageLoop::PostTask_Helper(already_AddRefed<nsIRunnable>, int) [message_loop.cc:96fd1bb41036f14d9b18f9d1e462f8382453c6f2 : 422 + 0xd]
[task 2021-09-11T23:21:31.470Z] 23:21:31 INFO - rbx = 0x0000000000000000 rbp = 0x000000cdc57ff660
[task 2021-09-11T23:21:31.470Z] 23:21:31 INFO - rsp = 0x000000cdc57ff2f0 r12 = 0xaaaaaaaaaaaaaaaa
[task 2021-09-11T23:21:31.470Z] 23:21:31 INFO - r13 = 0x000002b626909490 r14 = 0x0000000000000428
[task 2021-09-11T23:21:31.470Z] 23:21:31 INFO - r15 = 0x0000000000000000 rip = 0x00007ffdc14aaebd
[task 2021-09-11T23:21:31.470Z] 23:21:31 INFO - Found by: call frame info
[task 2021-09-11T23:21:31.470Z] 23:21:31 INFO - 4 xul.dll!base::Thread::Stop() [thread.cc:96fd1bb41036f14d9b18f9d1e462f8382453c6f2 : 124 + 0x9]
[task 2021-09-11T23:21:31.470Z] 23:21:31 INFO - rbx = 0x0000000000000000 rbp = 0x000000cdc57ff660
[task 2021-09-11T23:21:31.470Z] 23:21:31 INFO - rsp = 0x000000cdc57ff390 r12 = 0xaaaaaaaaaaaaaaaa
[task 2021-09-11T23:21:31.470Z] 23:21:31 INFO - r13 = 0x000002b626909490 r14 = 0x0000000000000428
[task 2021-09-11T23:21:31.471Z] 23:21:31 INFO - r15 = 0x0000000000000000 rip = 0x00007ffdc14b5350
[task 2021-09-11T23:21:31.471Z] 23:21:31 INFO - Found by: call frame info
[task 2021-09-11T23:21:31.471Z] 23:21:31 INFO - 5 xul.dll!mozilla::ipc::BrowserProcessSubThread::~BrowserProcessSubThread() [BrowserProcessSubThread.cpp:96fd1bb41036f14d9b18f9d1e462f8382453c6f2 : 40 + 0xf]
[task 2021-09-11T23:21:31.471Z] 23:21:31 INFO - rbx = 0x0000000000000000 rbp = 0x000000cdc57ff660
[task 2021-09-11T23:21:31.471Z] 23:21:31 INFO - rsp = 0x000000cdc57ff400 r12 = 0xaaaaaaaaaaaaaaaa
[task 2021-09-11T23:21:31.471Z] 23:21:31 INFO - r13 = 0x000002b626909490 r14 = 0x0000000000000428
[task 2021-09-11T23:21:31.471Z] 23:21:31 INFO - r15 = 0x0000000000000000 rip = 0x00007ffdc14fe3eb
[task 2021-09-11T23:21:31.471Z] 23:21:31 INFO - Found by: call frame info
[task 2021-09-11T23:21:31.471Z] 23:21:31 INFO - 6 xul.dll!mozilla::ShutdownXPCOM(nsIServiceManager*) [XPCOMInit.cpp:96fd1bb41036f14d9b18f9d1e462f8382453c6f2 : 796 + 0x1d]
[task 2021-09-11T23:21:31.471Z] 23:21:31 INFO - rbx = 0x0000000000000000 rbp = 0x000000cdc57ff660
[task 2021-09-11T23:21:31.471Z] 23:21:31 INFO - rsp = 0x000000cdc57ff440 r12 = 0xaaaaaaaaaaaaaaaa
[task 2021-09-11T23:21:31.471Z] 23:21:31 INFO - r13 = 0x000002b626909490 r14 = 0x0000000000000428
[task 2021-09-11T23:21:31.471Z] 23:21:31 INFO - r15 = 0x0000000000000000 rip = 0x00007ffdc0c6c473
[task 2021-09-11T23:21:31.471Z] 23:21:31 INFO - Found by: call frame info
[task 2021-09-11T23:21:31.471Z] 23:21:31 INFO - 7 xul.dll!ScopedXPCOMStartup::~ScopedXPCOMStartup() [nsAppRunner.cpp:96fd1bb41036f14d9b18f9d1e462f8382453c6f2 : 1683 + 0x8]
[task 2021-09-11T23:21:31.471Z] 23:21:31 INFO - rbx = 0x0000000000000000 rbp = 0x000000cdc57ff660
[task 2021-09-11T23:21:31.471Z] 23:21:31 INFO - rsp = 0x000000cdc57ff530 r12 = 0xaaaaaaaaaaaaaaaa
[task 2021-09-11T23:21:31.472Z] 23:21:31 INFO - r13 = 0x000002b626909490 r14 = 0x0000000000000428
[task 2021-09-11T23:21:31.472Z] 23:21:31 INFO - r15 = 0x0000000000000000 rip = 0x00007ffdc6dc23b4
[task 2021-09-11T23:21:31.472Z] 23:21:31 INFO - Found by: call frame info
[task 2021-09-11T23:21:31.472Z] 23:21:31 INFO - 8 xul.dll!XREMain::XRE_main(int, char**, mozilla::BootstrapConfig const&) [nsAppRunner.cpp:96fd1bb41036f14d9b18f9d1e462f8382453c6f2 : 5493 + 0x19]
[task 2021-09-11T23:21:31.472Z] 23:21:31 INFO - rbx = 0x0000000000000000 rbp = 0x000000cdc57ff660
[task 2021-09-11T23:21:31.472Z] 23:21:31 INFO - rsp = 0x000000cdc57ff610 r12 = 0xaaaaaaaaaaaaaaaa
[task 2021-09-11T23:21:31.472Z] 23:21:31 INFO - r13 = 0x000002b626909490 r14 = 0x0000000000000428
[task 2021-09-11T23:21:31.472Z] 23:21:31 INFO - r15 = 0x0000000000000000 rip = 0x00007ffdc6dd1bf2
[task 2021-09-11T23:21:31.472Z] 23:21:31 INFO - Found by: call frame info
[task 2021-09-11T23:21:31.472Z] 23:21:31 INFO - 9 xul.dll!XRE_main(int, char**, mozilla::BootstrapConfig const&) [nsAppRunner.cpp:96fd1bb41036f14d9b18f9d1e462f8382453c6f2 : 5535 + 0x10]
[task 2021-09-11T23:21:31.472Z] 23:21:31 INFO - rbx = 0x0000000000000000 rbp = 0x000000cdc57ff660
[task 2021-09-11T23:21:31.472Z] 23:21:31 INFO - rsp = 0x000000cdc57ff720 r12 = 0xaaaaaaaaaaaaaaaa
[task 2021-09-11T23:21:31.472Z] 23:21:31 INFO - r13 = 0x000002b626909490 r14 = 0x0000000000000428
[task 2021-09-11T23:21:31.472Z] 23:21:31 INFO - r15 = 0x0000000000000000 rip = 0x00007ffdc6dd24f2
[task 2021-09-11T23:21:31.472Z] 23:21:31 INFO - Found by: call frame info
[task 2021-09-11T23:21:31.472Z] 23:21:31 INFO - 10 firefox.exe!NS_internal_main(int, char**, char**) [nsBrowserApp.cpp:96fd1bb41036f14d9b18f9d1e462f8382453c6f2 : 386 + 0x10f]
[task 2021-09-11T23:21:31.472Z] 23:21:31 INFO - rbx = 0x0000000000000000 rbp = 0x000000cdc57ff660
[task 2021-09-11T23:21:31.472Z] 23:21:31 INFO - rsp = 0x000000cdc57ff880 r12 = 0xaaaaaaaaaaaaaaaa
[task 2021-09-11T23:21:31.473Z] 23:21:31 INFO - r13 = 0x000002b626909490 r14 = 0x0000000000000428
[task 2021-09-11T23:21:31.473Z] 23:21:31 INFO - r15 = 0x0000000000000000 rip = 0x00007ff64ea3183e
[task 2021-09-11T23:21:31.473Z] 23:21:31 INFO - Found by: call frame info
[task 2021-09-11T23:21:31.473Z] 23:21:31 INFO - 11 firefox.exe!wmain(int, wchar_t**) [nsWindowsWMain.cpp:96fd1bb41036f14d9b18f9d1e462f8382453c6f2 : 131 + 0x15]
[task 2021-09-11T23:21:31.473Z] 23:21:31 INFO - rbx = 0x0000000000000000 rbp = 0x000000cdc57ff660
[task 2021-09-11T23:21:31.473Z] 23:21:31 INFO - rsp = 0x000000cdc57ffa60 r12 = 0xaaaaaaaaaaaaaaaa
[task 2021-09-11T23:21:31.473Z] 23:21:31 INFO - r13 = 0x000002b626909490 r14 = 0x0000000000000428
[task 2021-09-11T23:21:31.473Z] 23:21:31 INFO - r15 = 0x0000000000000000 rip = 0x00007ff64ea3121b
[task 2021-09-11T23:21:31.473Z] 23:21:31 INFO - Found by: call frame info
[task 2021-09-11T23:21:31.473Z] 23:21:31 INFO - 12 firefox.exe!__scrt_common_main_seh() [exe_common.inl : 288 + 0x22]
[task 2021-09-11T23:21:31.473Z] 23:21:31 INFO - rbx = 0x0000000000000000 rbp = 0x000000cdc57ff660
[task 2021-09-11T23:21:31.473Z] 23:21:31 INFO - rsp = 0x000000cdc57ffb20 r12 = 0xaaaaaaaaaaaaaaaa
[task 2021-09-11T23:21:31.473Z] 23:21:31 INFO - r13 = 0x000002b626909490 r14 = 0x0000000000000428
[task 2021-09-11T23:21:31.473Z] 23:21:31 INFO - r15 = 0x0000000000000000 rip = 0x00007ff64eacfce8
[task 2021-09-11T23:21:31.473Z] 23:21:31 INFO - Found by: call frame info
[task 2021-09-11T23:21:31.473Z] 23:21:31 INFO - 13 kernel32.dll!LdrpSnapKernelBaseExtensions + 0x150
[task 2021-09-11T23:21:31.473Z] 23:21:31 INFO - rbx = 0x0000000000000000 rbp = 0x000000cdc57ff660
[task 2021-09-11T23:21:31.473Z] 23:21:31 INFO - rsp = 0x000000cdc57ffb60 r12 = 0xaaaaaaaaaaaaaaaa
[task 2021-09-11T23:21:31.473Z] 23:21:31 INFO - r13 = 0x000002b626909490 r14 = 0x0000000000000428
[task 2021-09-11T23:21:31.473Z] 23:21:31 INFO - r15 = 0x0000000000000000 rip = 0x00007ffdfd0b7034
[task 2021-09-11T23:21:31.473Z] 23:21:31 INFO - Found by: call frame info
Comment hidden (Intermittent Failures Robot) |
Comment 2•3 years ago
|
||
Marking as P3 since it's intermittent but not assigning since it is low frequency and debug only.
Comment hidden (Intermittent Failures Robot) |
Comment 4•3 years ago
|
||
In the past week, we had 63 occurrences for this crash, all on windows10-64-2004-qr debug.
Most recent failure: https://treeherder.mozilla.org/logviewer?job_id=352725786&repo=autoland&lineNumber=28567
[task 2021-09-26T11:44:12.393Z] 11:44:12 INFO - PROCESS-CRASH | Last test finished | application crashed [@ PR_NativeRunThread(void*)]
[task 2021-09-26T11:44:12.393Z] 11:44:12 INFO - Mozilla crash reason: MOZ_CRASH(Shutdown hanging after all known phases and workers finished.)
[task 2021-09-26T11:44:12.393Z] 11:44:12 INFO - Crash dump filename: C:\Users\task_163265260581469\AppData\Local\Temp\tmphubucl6h.mozrunner\minidumps\f8875539-a866-402b-8e7e-c82bc80cfa56.dmp
[task 2021-09-26T11:44:12.394Z] 11:44:12 INFO - Operating system: Windows NT
[task 2021-09-26T11:44:12.394Z] 11:44:12 INFO - 10.0.19041
[task 2021-09-26T11:44:12.394Z] 11:44:12 INFO - CPU: amd64
[task 2021-09-26T11:44:12.394Z] 11:44:12 INFO - family 6 model 85 stepping 7
[task 2021-09-26T11:44:12.394Z] 11:44:12 INFO - 8 CPUs
[task 2021-09-26T11:44:12.394Z] 11:44:12 INFO -
[task 2021-09-26T11:44:12.394Z] 11:44:12 INFO - GPU: UNKNOWN
[task 2021-09-26T11:44:12.394Z] 11:44:12 INFO -
[task 2021-09-26T11:44:12.394Z] 11:44:12 INFO - Crash reason: EXCEPTION_BREAKPOINT
[task 2021-09-26T11:44:12.394Z] 11:44:12 INFO - Crash address: 0x4d7c2b78
[task 2021-09-26T11:44:12.394Z] 11:44:12 INFO - Process uptime: 451 seconds
[task 2021-09-26T11:44:12.394Z] 11:44:12 INFO -
[task 2021-09-26T11:44:12.394Z] 11:44:12 INFO - Thread 20 tid 4956 (crashed) - Shutdown Hang Terminator 0 xul.dll!mozilla::`anonymous namespace'::RunWatchdog(void*) [nsTerminator.cpp:9237deee110beeaa7605195528a273b2bbc77352 : 246 + 0x0]
[task 2021-09-26T11:44:12.394Z] 11:44:12 INFO - rax = 0x00007ff85100d341 rdx = 0x0000000000000000
[task 2021-09-26T11:44:12.394Z] 11:44:12 INFO - rcx = 0x00007ff8784cc948 rbx = 0x00007ff87748a0f7
[task 2021-09-26T11:44:12.394Z] 11:44:12 INFO - rsi = 0x00007ff8803eada0 rdi = 0x0000000000000276
[task 2021-09-26T11:44:12.394Z] 11:44:12 INFO - rbp = 0x00007ff8774e2d08 rsp = 0x00000001847ffd90
[task 2021-09-26T11:44:12.394Z] 11:44:12 INFO - r8 = 0x00000001847fff70 r9 = 0x00007ff881d70000
[task 2021-09-26T11:44:12.394Z] 11:44:12 INFO - r10 = 0x00007ff881dc2651 r11 = 0x00000001847ff750
[task 2021-09-26T11:44:12.394Z] 11:44:12 INFO - r12 = 0x00000196b1d04938 r13 = 0x00000196b1d04948
[task 2021-09-26T11:44:12.395Z] 11:44:12 INFO - r14 = 0x0000000000000000 r15 = 0x00007ff87748a118
[task 2021-09-26T11:44:12.395Z] 11:44:12 INFO - rip = 0x00007ff84d7c2b78
[task 2021-09-26T11:44:12.395Z] 11:44:12 INFO - Found by: given as instruction pointer in context
[task 2021-09-26T11:44:12.395Z] 11:44:12 INFO - 1 nss3.dll!PR_NativeRunThread(void*) [pruthr.c:9237deee110beeaa7605195528a273b2bbc77352 : 399 + 0xe]
[task 2021-09-26T11:44:12.395Z] 11:44:12 INFO - rbx = 0x00007ff87748a0f7 rbp = 0x00007ff8774e2d08
[task 2021-09-26T11:44:12.395Z] 11:44:12 INFO - rsp = 0x00000001847ffde0 r12 = 0x00000196b1d04938
[task 2021-09-26T11:44:12.395Z] 11:44:12 INFO - r13 = 0x00000196b1d04948 r14 = 0x0000000000000000
[task 2021-09-26T11:44:12.395Z] 11:44:12 INFO - r15 = 0x00007ff87748a118 rip = 0x00007ff877339462
[task 2021-09-26T11:44:12.395Z] 11:44:12 INFO - Found by: call frame info
[task 2021-09-26T11:44:12.395Z] 11:44:12 INFO - 2 nss3.dll!pr_root(void*) [w95thred.c:9237deee110beeaa7605195528a273b2bbc77352 : 139 + 0xd]
[task 2021-09-26T11:44:12.395Z] 11:44:12 INFO - rbx = 0x00007ff87748a0f7 rbp = 0x00007ff8774e2d08
[task 2021-09-26T11:44:12.395Z] 11:44:12 INFO - rsp = 0x00000001847ffe60 r12 = 0x00000196b1d04938
[task 2021-09-26T11:44:12.395Z] 11:44:12 INFO - r13 = 0x00000196b1d04948 r14 = 0x0000000000000000
[task 2021-09-26T11:44:12.395Z] 11:44:12 INFO - r15 = 0x00007ff87748a118 rip = 0x00007ff877329e41
[task 2021-09-26T11:44:12.395Z] 11:44:12 INFO - Found by: call frame info
[task 2021-09-26T11:44:12.395Z] 11:44:12 INFO - 3 ucrtbase.dll!RtlpReAllocateHeap + 0x1ce
[task 2021-09-26T11:44:12.395Z] 11:44:12 INFO - rbx = 0x00007ff87748a0f7 rbp = 0x00007ff8774e2d08
[task 2021-09-26T11:44:12.396Z] 11:44:12 INFO - rsp = 0x00000001847ffe90 r12 = 0x00000196b1d04938
[task 2021-09-26T11:44:12.396Z] 11:44:12 INFO - r13 = 0x00000196b1d04948 r14 = 0x0000000000000000
[task 2021-09-26T11:44:12.396Z] 11:44:12 INFO - r15 = 0x00007ff87748a118 rip = 0x00007ff87f6d1bb2
[task 2021-09-26T11:44:12.396Z] 11:44:12 INFO - Found by: call frame info
[task 2021-09-26T11:44:12.396Z] 11:44:12 INFO - 4 kernel32.dll!LdrpSnapKernelBaseExtensions + 0x150
[task 2021-09-26T11:44:12.396Z] 11:44:12 INFO - rbp = 0x00007ff8774e2d08 rsp = 0x00000001847ffec0
[task 2021-09-26T11:44:12.396Z] 11:44:12 INFO - rip = 0x00007ff8803e7034
[task 2021-09-26T11:44:12.396Z] 11:44:12 INFO - Found by: stack scanning
[task 2021-09-26T11:44:12.396Z] 11:44:12 INFO -
Comment hidden (Intermittent Failures Robot) |
Comment 6•3 years ago
|
||
Pleas re-evaluate, it's failing frequently.
Comment 7•3 years ago
|
||
Kris or Jens, can you take a look at what's happening here and/or advise on how we'd get more details? I guess this is from https://searchfox.org/mozilla-central/rev/3fa5cc437a4937c621ea068ba5dc246f75831633/toolkit/components/terminator/nsTerminator.cpp#246 , but I have no idea how to investigate. Naïvely, it looks like we're hanging while writing shutdown leak logs or something? This appears to happen exclusively on Windows debug, but on different tests, primarily browser chrome but also plain ones.
Comment 8•3 years ago
|
||
Clearing priority/severity so this goes back to being flagged in triage.
Comment 9•3 years ago
|
||
My only suggestion would be to increase toolkit.asyncshutdown.crash_timeout
and see if this moves the needle. From this log it seems that we pass ~70 seconds from starting the shutdown to the crash, which is close to the default timeout of 60s + 3s additional time (which cannot be reduced).
Comment 10•3 years ago
|
||
An increase to 120s didn't improve the situation.
Comment 11•3 years ago
|
||
(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #10)
An increase to 120s didn't improve the situation.
At this failure frequency, is it possible to bisect down what broke things? That might be the quickest way of making progress.
Comment 12•3 years ago
|
||
This seems to be similar to bug 1684441, note that the curves of the intermittent frequency have a very similar shape. We filed bug 1728721 there that might help also here, in many cases shutdown hangs are caused by spinning a nested event loop that waits for a condition that cannot be satisfied any more, and even slight timing and/or order changes in the event processing elsewhere might move the frequency up or down.
And over in bug 1731564 I started work on having the SpinEventLoopUntil annotations also for C++ calls.
Comment 13•3 years ago
|
||
(In reply to :Gijs (he/him) from comment #11)
At this failure frequency, is it possible to bisect down what broke things? That might be the quickest way of making progress.
Not sure this is a new issue, the signature gets reported for the first time on 2021-07-07 - just a change in reported failure lines?
Comment 14•3 years ago
|
||
(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #13)
(In reply to :Gijs (he/him) from comment #11)
At this failure frequency, is it possible to bisect down what broke things? That might be the quickest way of making progress.
Not sure this is a new issue, the signature gets reported for the first time on 2021-07-07 - just a change in reported failure lines?
Hm, looks like that may have gotten filed as bug 1719481? I guess I assumed there was a change in frequency, but yes, a change in reported lines might also affect this - or perhaps this broke on tests of one kind first and then on tests of another kind? :-(
It's unfortunate this is Windows-specific, as it means we can't use pernosco to gather information about what's happening at the time of the shutdown hang. bug 1719481 comment 10 suggests that the minidumps might help, and blames bug 1685204 which has since been fixed (though 1719481 itself is still open and recorded failure instances in the past week). I don't know much about Windows minidumps and/or how to extract useful information out of them. I tried for a bit using the guide at https://firefox-source-docs.mozilla.org/contributing/debugging/debugging_a_minidump.html but didn't get very far; adding the moco symbol server and MS symbol server doesn't seem to be enough to load the symbol dump, possibly because the Windows version and my (up-to-date) Win10 install have different versions for ntdll.dll and such. :gsvelto, do you know how to do this and/or do we have better / more up-to-date documentation?
I did then look at the raw log from comment 0 some more, and there is no event loop in any of the stacks, beyond the main event loop, so I don't think this is related to the nested event loop spinning as comment #12 suggested. Jens, do you have any other hypotheses, or ways we could investigate further?
Comment 15•3 years ago
|
||
(In reply to :Gijs (he/him) from comment #14)
It's unfortunate this is Windows-specific, as it means we can't use pernosco to gather information about what's happening at the time of the shutdown hang. bug 1719481 comment 10 suggests that the minidumps might help, and blames bug 1685204 which has since been fixed (though 1719481 itself is still open and recorded failure instances in the past week). I don't know much about Windows minidumps and/or how to extract useful information out of them. I tried for a bit using the guide at https://firefox-source-docs.mozilla.org/contributing/debugging/debugging_a_minidump.html but didn't get very far; adding the moco symbol server and MS symbol server doesn't seem to be enough to load the symbol dump, possibly because the Windows version and my (up-to-date) Win10 install have different versions for ntdll.dll and such. :gsvelto, do you know how to do this and/or do we have better / more up-to-date documentation?
AFAIK you will not be able to find the symbols for xul.dll on the symbol server if this is not an official build. I am not aware of another way to get them for this specific dump.
I did then look at the raw log from comment 0 some more, and there is no event loop in any of the stacks, beyond the main event loop, so I don't think this is related to the nested event loop spinning as comment #12 suggested. Jens, do you have any other hypotheses, or ways we could investigate further?
Sorry for the noise then, I should have examined the stacks. Still it remains curious that on bug 1684441 we see a very similar trend in the frequency.
Comment 16•3 years ago
|
||
For Try pushes, use the push's menu in Treeherder (top right) > Add new jobs (Search)
> Activate Use full job list
> Search for symbol
> Pick the upload-symbols
task for the desired platform and wait for it to complete.
Comment 17•3 years ago
|
||
Once the symbols are uploaded you can simply run~/.mozbuild/minidump_stackwalkminidump_stackwalk <path_to_the_minidump.dmp>
. It will fetch symbols automatically and will get you a stack trace much faster than Visual Studio. I recommend using Visual Studio only if you need to inspect local variables and arguments, or poke other data on the stack.
Comment 18•3 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #15)
Sorry for the noise then, I should have examined the stacks. Still it remains curious that on bug 1684441 we see a very similar trend in the frequency.
I think the peaks are from Sundays, when there are a much lower number of pushes altogether, and so frequency calculations spike if the number of mozilla-central pushes compared to autoland pushes fluctuates significantly.
Comment 19•3 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #17)
Once the symbols are uploaded you can simply run
~/.mozbuild/minidump_stackwalkminidump_stackwalk <path_to_the_minidump.dmp>
. It will fetch symbols automatically and will get you a stack trace much faster than Visual Studio. I recommend using Visual Studio only if you need to inspect local variables and arguments, or poke other data on the stack.
Per discussion with Gabriele, I don't think running minidump_stackwalk will yield information that's not already in the stacks that are in the logs.
At this point, I'm not sure how to make progress here; I think it's going to require one of the XPCOM / crash-stack-tea-leaf-reader folks to take a look, I suspect. I poked at some of the logs - the Breakpad ExceptionHandler
is busy writing something to disk in each crash, but I don't know if that's just normal because we've caught it in breakpad or what. There are usually a handful of TaskController threads that are waiting on a condvar (on Windows, via NtWaitForAlertByThreadId
) and the crash from comment 0 has us stopped inside the destructor of ipc::BrowserProcessSubThread, in RtlReleaseSRWLockShared
, but I don't see that in the other crash logs, and I don't know if that's relevant, coincidence, or a red herring...
Adding in :mccr8 in the hope that either he or Kris can (find someone to) help move this forward
Comment hidden (Intermittent Failures Robot) |
Comment 21•3 years ago
|
||
I extracted a better stack trace including symbols for the Windows system libraries from the crash in comment 0 and here's what I can see:
- The main thread is stuck here waiting for the thread to be joined.
- A child process of some kind seems to have crashed, because the breakpad exception handler thread is busy writing out a minidump
- All the other threads appear to be waiting
- Not knowing all the IPC bits my guess is that the main thread is waiting for a child process to shut down, but the child crashed, and taking minidump is taking so long that we time out
It's hard to tell which child crashed but I think I can divine it by opening the minidump with Visual Studio
Comment 22•3 years ago
|
||
FYI this is a better stack trace of the crash in comment 0
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 25•3 years ago
|
||
1 KERNELBASE.dll!WaitForSingleObjectEx
2 xul.dll!`anonymous namespace'::ChildReaper::WillDestroyCurrentMessageLoop()
3 xul.dll!MessageLoop::~MessageLoop()
This is something we do on builds where we need child processes to fully shut down (instead of doing an early OS-level exit to save time) because of instrumentation for things like memory leaks and coverage: so, debug builds (refcount logging), ASan/LSan, ccov, etc. It's an indefinite wait for all child processes to exit, and it's done in a somewhat hacky way after XPCOM thread shutdown. There'd been talk of changing it to eventually time out and give up, but we'd want to avoid interrupting a crash report in progress.
The same IPC “feature” appears in bug 1719481 but I don't know if the root cause is the same.
Updated•3 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 28•3 years ago
|
||
There are 65 total failures in the last 7 days on - windows10-64-2004-qr debug
Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=354845815&repo=autoland&lineNumber=15617
[task 2021-10-14T23:20:48.861Z] 23:20:48 INFO - TEST-START | browser/base/content/test/protectionsUI/browser_protectionsUI_trackers_subview.js
[task 2021-10-14T23:20:48.865Z] 23:20:48 INFO - GECKO(8124) | [Child 1104: Main Thread]: I/DocShellAndDOMWindowLeak --DOCSHELL 234c7913000 == 0 [pid = 1104] [id = 0] [url = http://trackertest.org/]
[task 2021-10-14T23:20:48.877Z] 23:20:48 INFO - GECKO(8124) | [Child 1104, Main Thread] WARNING: IPC message discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:527
[task 2021-10-14T23:20:48.882Z] 23:20:48 INFO - GECKO(8124) | [Child 1640, Main Thread] WARNING: IPC message discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:527
[task 2021-10-14T23:22:11.352Z] 23:22:11 INFO - GECKO(8124) | [Child 5136, Main Thread] WARNING: '!ipcActor->SendRecordDiscardedData(discardedData)', file /builds/worker/checkouts/gecko/toolkit/components/telemetry/core/ipc/TelemetryIPCAccumulator.cpp:298
[task 2021-10-14T23:22:13.617Z] 23:22:13 INFO - GECKO(8124) | XPCOM_MEM_BLOAT_LOG: C:\Users\task_163425230863140\AppData\Local\Temp\tmpr5hf2mxk.mozrunner\runtests_leaks.log
[task 2021-10-14T23:22:13.618Z] 23:22:13 INFO - GECKO(8124) | Writing to log: C:\Users\task_163425230863140\AppData\Local\Temp\tmpr5hf2mxk.mozrunner\runtests_leaks.log
[task 2021-10-14T23:22:13.620Z] 23:22:13 INFO - GECKO(8124) | RunWatchdog: Mainthread nested event loops during hang:
[task 2021-10-14T23:22:13.621Z] 23:22:13 INFO - GECKO(8124) | --- (no nested event loop active)
[task 2021-10-14T23:22:14.635Z] 23:22:14 INFO - GECKO(8124) | Hit MOZ_CRASH(Shutdown hanging after all known phases and workers finished.) at /builds/worker/checkouts/gecko/toolkit/components/terminator/nsTerminator.cpp:256
[task 2021-10-14T23:22:14.733Z] 23:22:14 INFO - Initializing stack-fixing for the first stack frame, this may take a while...
[task 2021-10-14T23:22:24.308Z] 23:22:24 INFO - GECKO(8124) | #01: mozilla::`anonymous namespace'::RunWatchdog(void*) [toolkit/components/terminator/nsTerminator.cpp:256]
[task 2021-10-14T23:22:24.503Z] 23:22:24 INFO - GECKO(8124) | #02: PR_NativeRunThread(void*) [nsprpub/pr/src/threads/combined/pruthr.c:408]
[task 2021-10-14T23:22:24.514Z] 23:22:24 INFO - GECKO(8124) | #03: pr_root(void*) [nsprpub/pr/src/md/windows/w95thred.c:140]
[task 2021-10-14T23:22:24.518Z] 23:22:24 INFO - fix-stacks: error: failed to read debug info file `ucrtbase.pdb` for `C:\Windows\System32\ucrtbase.dll`
[task 2021-10-14T23:22:24.518Z] 23:22:24 INFO - fix-stacks: note: this is expected and harmless for all PDB files on opt automation runs
[task 2021-10-14T23:22:24.518Z] 23:22:24 INFO - fix-stacks: The system cannot find the file specified. (os error 2)
[task 2021-10-14T23:22:24.519Z] 23:22:24 INFO - GECKO(8124) | #04: configthreadlocale [C:\Windows\System32\ucrtbase.dll + 0x21bb2]
[task 2021-10-14T23:22:24.522Z] 23:22:24 INFO - fix-stacks: error: failed to read debug info file `wkernel32.pdb` for `C:\Windows\System32\KERNEL32.DLL`
[task 2021-10-14T23:22:24.522Z] 23:22:24 INFO - fix-stacks: note: this is expected and harmless for all PDB files on opt automation runs
[task 2021-10-14T23:22:24.522Z] 23:22:24 INFO - fix-stacks: The system cannot find the file specified. (os error 2)
[task 2021-10-14T23:22:24.523Z] 23:22:24 INFO - GECKO(8124) | #05: BaseThreadInitThunk [C:\Windows\System32\KERNEL32.DLL + 0x17034]
[task 2021-10-14T23:22:24.527Z] 23:22:24 INFO - fix-stacks: error: failed to read debug info file `wntdll.pdb` for `C:\Windows\SYSTEM32\ntdll.dll`
[task 2021-10-14T23:22:24.527Z] 23:22:24 INFO - fix-stacks: note: this is expected and harmless for all PDB files on opt automation runs
[task 2021-10-14T23:22:24.527Z] 23:22:24 INFO - fix-stacks: The system cannot find the file specified. (os error 2)
[task 2021-10-14T23:22:24.528Z] 23:22:24 INFO - GECKO(8124) | #06: RtlUserThreadStart [C:\Windows\SYSTEM32\ntdll.dll + 0x52651]
[task 2021-10-14T23:22:24.528Z] 23:22:24 INFO - GECKO(8124) | [GPU 4592, IPC I/O Child] WARNING: [4A180DBF879BE3FB.57C8326487332A4C]: Ignoring message 'EVENT_MESSAGE' to peer 37124C67C69AC9FD.7D14168D4B2BD336 due to a missing broker: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:297
[task 2021-10-14T23:22:24.529Z] 23:22:24 INFO - GECKO(8124) | [GPU 4592, IPC I/O Child] WARNING: [4A180DBF879BE3FB.57C8326487332A4C]: Ignoring message 'EVENT_MESSAGE' to peer 37124C67C69AC9FD.7D14168D4B2BD336 due to a missing broker: file /builds/worker/checkouts/gecko/ipc/glue/NodeController.cpp:297
[task 2021-10-14T23:22:24.529Z] 23:22:24 INFO - GECKO(8124) | Destroying context 0000027796667000 surface 00000277948652E0 on display 0000027796608200
[task 2021-10-14T23:22:24.530Z] 23:22:24 INFO - GECKO(8124) | [GPU 4592, Main Thread] WARNING: NS_ENSURE_TRUE(Preferences::InitStaticMembers()) failed: file /builds/worker/checkouts/gecko/modules/libpref/Preferences.cpp:4439
[task 2021-10-14T23:22:24.530Z] 23:22:24 INFO - TEST-INFO | Main app process: exit 1
task 2021-10-14T23:22:24.548Z] 23:22:24 INFO - TEST-INFO | browser/base/content/test/protectionsUI/browser_protectionsUI_trackers_subview.js | This test created 1 hidden docshell(s)
[task 2021-10-14T23:22:24.548Z] 23:22:24 INFO - Buffered messages finished
[task 2021-10-14T23:22:24.549Z] 23:22:24 ERROR - TEST-UNEXPECTED-FAIL | Last test finished | application terminated with exit code 1
[task 2021-10-14T23:22:24.549Z] 23:22:24 INFO - runtests.py | Application ran for: 0:04:09.742412
[task 2021-10-14T23:22:24.549Z] 23:22:24 INFO - zombiecheck | Reading PID log: C:\Users\task_163425230863140\AppData\Local\Temp\tmp45prhdxvpidlog
task 2021-10-14T23:22:31.792Z] 23:22:31 INFO - mozcrash Saved app info as Z:\task_163425230863140\build\blobber_upload_dir\25e3ca0f-5074-4ff7-956a-c73d45a10776.extra
[task 2021-10-14T23:22:31.836Z] 23:22:31 INFO - PROCESS-CRASH | Last test finished | application crashed [@ PR_NativeRunThread(void*)]
[task 2021-10-14T23:22:31.836Z] 23:22:31 INFO - Mozilla crash reason: MOZ_CRASH(Shutdown hanging after all known phases and workers finished.)
[task 2021-10-14T23:22:31.847Z] 23:22:31 INFO - Crash dump filename: C:\Users\task_163425230863140\AppData\Local\Temp\tmpr5hf2mxk.mozrunner\minidumps\25e3ca0f-5074-4ff7-956a-c73d45a10776.dmp
[task 2021-10-14T23:22:31.847Z] 23:22:31 INFO - Operating system: Windows NT
[task 2021-10-14T23:22:31.847Z] 23:22:31 INFO - 10.0.19041
[task 2021-10-14T23:22:31.847Z] 23:22:31 INFO - CPU: amd64
[task 2021-10-14T23:22:31.847Z] 23:22:31 INFO - family 6 model 85 stepping 7
[task 2021-10-14T23:22:31.847Z] 23:22:31 INFO - 8 CPUs
[task 2021-10-14T23:22:31.847Z] 23:22:31 INFO -
[task 2021-10-14T23:22:31.847Z] 23:22:31 INFO - GPU: UNKNOWN
[task 2021-10-14T23:22:31.847Z] 23:22:31 INFO -
[task 2021-10-14T23:22:31.847Z] 23:22:31 INFO - Crash reason: EXCEPTION_BREAKPOINT
[task 2021-10-14T23:22:31.847Z] 23:22:31 INFO - Crash address: 0x80ef856a
[task 2021-10-14T23:22:31.847Z] 23:22:31 INFO - Process uptime: 240 seconds
[task 2021-10-14T23:22:31.847Z] 23:22:31 INFO -
[task 2021-10-14T23:22:31.847Z] 23:22:31 INFO - Thread 19 tid 7448 (crashed) - Shutdown Hang Terminator 0 xul.dll!mozilla::`anonymous namespace'::RunWatchdog(void*) [nsTerminator.cpp:03de2f75a442af1253075b106158bca2a198385b : 256 + 0x0]
[task 2021-10-14T23:22:31.847Z] 23:22:31 INFO - rax = 0x00007ffe8475fe73 rdx = 0x0000000000000000
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - rcx = 0x00007ffe953bc948 rbx = 0x00007ffe9567a0f7
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - rsi = 0x000000c1f04efd10 rdi = 0x0000000000000276
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - rbp = 0x00007ffe956d2d08 rsp = 0x000000c1f04efcf0
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - r8 = 0x000000c1f04efee0 r9 = 0x00007ffeb9250000
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - r10 = 0x00007ffeb92a2651 r11 = 0x000000c1f04ef6b0
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - r12 = 0x000002b755d33838 r13 = 0x000002b755d33848
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - r14 = 0x0000000000000000 r15 = 0x00007ffe9567a118
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - rip = 0x00007ffe80ef856a
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - Found by: given as instruction pointer in context
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - 1 nss3.dll!PR_NativeRunThread(void*) [pruthr.c:03de2f75a442af1253075b106158bca2a198385b : 399 + 0xe]
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - rbx = 0x00007ffe9567a0f7 rbp = 0x00007ffe956d2d08
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - rsp = 0x000000c1f04efd50 r12 = 0x000002b755d33838
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - r13 = 0x000002b755d33848 r14 = 0x0000000000000000
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - r15 = 0x00007ffe9567a118 rip = 0x00007ffe95529462
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - Found by: call frame info
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - 2 nss3.dll!pr_root(void*) [w95thred.c:03de2f75a442af1253075b106158bca2a198385b : 139 + 0xd]
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - rbx = 0x00007ffe9567a0f7 rbp = 0x00007ffe956d2d08
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - rsp = 0x000000c1f04efdd0 r12 = 0x000002b755d33838
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - r13 = 0x000002b755d33848 r14 = 0x0000000000000000
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - r15 = 0x00007ffe9567a118 rip = 0x00007ffe95519e41
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - Found by: call frame info
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - 3 ucrtbase.dll!RtlpReAllocateHeap + 0x1ce
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - rbx = 0x00007ffe9567a0f7 rbp = 0x00007ffe956d2d08
[task 2021-10-14T23:22:31.848Z] 23:22:31 INFO - rsp = 0x000000c1f04efe00 r12 = 0x000002b755d33838
[task 2021-10-14T23:22:31.849Z] 23:22:31 INFO - r13 = 0x000002b755d33848 r14 = 0x0000000000000000
[task 2021-10-14T23:22:31.849Z] 23:22:31 INFO - r15 = 0x00007ffe9567a118 rip = 0x00007ffeb6ef1bb2
[task 2021-10-14T23:22:31.849Z] 23:22:31 INFO - Found by: call frame info
Comment hidden (Intermittent Failures Robot) |
Comment 30•3 years ago
|
||
In the last 7 days there have been 58 failures: https://treeherder.mozilla.org/intermittent-failures/bugdetails?startday=2021-10-13&endday=2021-10-20&tree=trunk&bug=1730374
Happens on windows10-64-2004-qr debug.
Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=355362647&repo=mozilla-central&lineNumber=57891
Comment hidden (Intermittent Failures Robot) |
Updated•3 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment 33•3 years ago
|
||
(In reply to Gabriele Svelto [:gsvelto] from comment #21)
I extracted a better stack trace including symbols for the Windows system libraries from the crash in comment 0 and here's what I can see:
- The main thread is stuck here waiting for the thread to be joined.
- A child process of some kind seems to have crashed, because the breakpad exception handler thread is busy writing out a minidump
- All the other threads appear to be waiting
- Not knowing all the IPC bits my guess is that the main thread is waiting for a child process to shut down, but the child crashed, and taking minidump is taking so long that we time out
It's hard to tell which child crashed but I think I can divine it by opening the minidump with Visual Studio
Looking at another stack it seems we are stuck on PostQueuedCompletionStatus
on the main thread. It seems that PostQueuedCompletionStatus
can delay its execution in some cases, not sure if this is what happens here, though. FWIW, we create the completion IO port with a concurrency of 1 thread only. From what I read, too much concurrency wrt to the initial setting seems to be a possible issue here.
Updated•3 years ago
|
Comment 34•3 years ago
|
||
I think bug 1684441 is the ASan/TSan equivalent of this shutdown hang issue.
Comment 35•3 years ago
|
||
I was confused about what sIOThread was and what was going on here, but Nika explained that sIOThread is the main processes IPC thread, and we're probably waitpiding for the child processes to go away here. Which I guess is what gsvelto was talking about in comment 21.
Comment 36•3 years ago
|
||
In logs where we don't hang, the shutdown sequence has these two lines from the GPU process (well, I'm assuming the Destroying Context is from the GPU process):
Destroying context 0000018618D65000 surface 000001861706A580 on display 00000186170FD800
[GPU 3572, Main Thread] WARNING: NS_ENSURE_TRUE(Preferences::InitStaticMembers()) failed: ...
...then a bunch of --DOMWINDOW stuff and the like in the parent, then the parent exits.
Looking at about a half dozen of the failing logs, we get the Hit MOZ_CRASH(Shutdown hanging after all known phases and workers finished.)
and THEN we get some error messages from the GPU process:
[GPU 7188, IPC I/O Child] WARNING: ...: Ignoring message 'EVENT_MESSAGE' to peer ... due to a missing broker
[GPU 7188, IPC I/O Child] WARNING: ...: Ignoring message 'EVENT_MESSAGE' to peer ... due to a missing broker
Destroying context 0000027CBDC67000 surface 0000027CBBD6A190 on display 0000027CBBDFD400
[GPU 7188, Main Thread] WARNING: NS_ENSURE_TRUE(Preferences::InitStaticMembers()) failed: ...
The first two messages are presumably the GPU process being sad because the parent went away, but it seems bad that we're apparently not shutting down until then. The previous logging from the GPU process was about a minute earlier.
Some but not all of the failing logs also feature log spam from telemetry in a child failing to send IPC right up until the point of the hang:
[Child 1008, Main Thread] WARNING: '!ipcActor->SendRecordDiscardedData(discardedData)'
...so maybe a content process is also taking a while to go away, sometimes. But the GPU stuff above is still there.
Maybe we could add some logging to figure out exactly which process we're waiting on, including a PID. It might also be good to get some logging of the PID for the process that we're getting a minidump for. Maybe there's some weird shutdown race there.
Updated•3 years ago
|
Comment 37•3 years ago
|
||
FWIW I opened one of the dmp files with Visual Studio.
The main thread is waiting for the IPC sIOThread
to join.
The IPC thread seems to be waiting for a content process to exit.
It is not clear to me who was supposed to tell the child process to exit, but it apparently did not exit yet (assuming that Windows recognizes an obsolete process handle).
There is some
[Child 6944, Main Thread] WARNING: IPC message discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:531
but I am not sure if this is said by the same child we are waiting for and then some
###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
message but I do not know if some of those were supposed to send the quit message to the child process. From the frequent "message discarded" entries in the log it seems something bad happened to our IPC layer in general?
In any case I am wondering if WillDestroyCurrentMessageLoop()
should have some timeout and forced kill via KillProcess()
to avoid such situations?
Comment 38•3 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #37)
message but I do not know if some of those were supposed to send the quit message to the child process. From the frequent "message discarded" entries in the log it seems something bad happened to our IPC layer in general?
This just means that the child tried to send a message to the parent after the IPDL stuff was shut down. We get some of that regularly, so it isn't a big deal. However, it looks like in many of these error logs we're spamming out a bunch of this for telemetry, and IIRC I didn't see that in non-failing logs. But it might just be a symptom rather than a cause of a process failing to shut down.
In any case I am wondering if
WillDestroyCurrentMessageLoop()
should have some timeout and forced kill viaKillProcess()
to avoid such situations?
I don't remember where it is off the top of my head, but we do already have code set up to do something along those lines, but it is intentionally disabled for debug and ASan builds because we need those processes to do stuff when shutting down. If we don't let them run their course, we'll either get other error messages or may start silently failing to get LSan/ASan errors from the content process. I'm hoping we can figure out why the child process is not going away. I'm also not sure what exactly is supposed to trigger it.
I'm doing a try run that includes logging of the PID of the process we're waiting for, and the PID of the process we're doing the minidump stuff with. It might confirm my theory that this is the GPU process.
Comment 39•3 years ago
|
||
I did a try run with patches that print out the pid of process
in WriteMinidumpWithExceptionForProcess()
, and the pid of process_
in WillDestroyCurrentMessageLoop()
.
In the three cases that I saw in my try run ("fortunately" this seems to reproduce fairly often on bc5), the process in WriteMinidumpWithExceptionForProcess() is for the main process, and that isn't happening until after we've hit the hang. So I think the presence of the minidump in the minidump stack is a red herring: it looks like it is just recording the actual crash from the hang watchdog. Hopefully that makes sense.
As for the other logging, in 2 cases there is something like this:
QQQ WillDestroyCurrentMessageLoop pid 8964
[Child 8964, Main Thread] WARNING: IPC message discarded [...]
The parent process is waiting for a content process to exit, but the content process is still far enough away from shutdown that it is still trying to send IPC messages. Well, these are actually telemetry messages, and as Nika found yesterday, there's actually an attempt to send them as late as the shutdown of the XPCOM component manager, via ShutdownTelemetry().
So I guess the GPU process is not the problematic process, it is a content process.
(In the third case, it is also a content process that we're waiting on, but the last logging message from that content process happened much earlier. But there was a crash in some process around the same time so maybe that caused things to go weird?)
Comment 40•3 years ago
|
||
In one of these logs, the process that isn't going away has a weird history. It gets created, goes along enough to create a leak log, then there's some ++DOCSHELL stuff, then less than a second later it is hitting "WARNING: MsgDropped in ContentChild:" and then the only logging it produces for the rest of its lifetime is failed attempts to send telemetry IPC messages to the parent. It is sitting around doing that for around 5 minutes! The other log where this is happening follows the same general pattern. Nika was wondering if there could be some kind of startup/shutdown race that causes the process to not shutdown.
Comment 41•3 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #40)
In one of these logs, the process that isn't going away has a weird history. It gets created, goes along enough to create a leak log, then there's some ++DOCSHELL stuff, then less than a second later it is hitting "WARNING: MsgDropped in ContentChild:" and then the only logging it produces for the rest of its lifetime is failed attempts to send telemetry IPC messages to the parent. It is sitting around doing that for around 5 minutes! The other log where this is happening follows the same general pattern. Nika was wondering if there could be some kind of startup/shutdown race that causes the process to not shutdown.
FWIW, in that log I read:
[task 2021-10-27T21:16:47.572Z] 21:16:47 INFO - GECKO(5636) | [Child 8964, Main Thread] WARNING: DispatchEvent called on non-current inner window, dropping. Please check the window in the caller instead.: file /builds/worker/checkouts/gecko/dom/base/nsGlobalWindowInner.cpp:4195
[task 2021-10-27T21:16:47.642Z] 21:16:47 INFO - GECKO(5636) | [Child 8964, Main Thread] WARNING: DispatchEvent called on non-current inner window, dropping. Please check the window in the caller instead.: file /builds/worker/checkouts/gecko/dom/base/nsGlobalWindowInner.cpp:4195
Just a wild guess (as I ignore many details): Could it be that we ignore some quit events if we are not the current inner window?
Comment 42•3 years ago
|
||
gerard-majax mention bug 1718210 as a shutdown hang involving the GPU process.
Comment 43•3 years ago
|
||
This is probably IPC, not threads, so I'll clear Kris's needinfo.
Comment hidden (Intermittent Failures Robot) |
Comment 45•3 years ago
|
||
I've been working on reproducing this locally on my Windows machine.
I can reproduce it very frequently (maybe even 100% of the time) with:
mach test --headless --enable-fission --verify browser/components/extensions/test/browser/browser_ext_tabs_move_window.js
It consistently fails during the first part of verification, where it runs the test 10 times. Next I'm going to figure out how to get the right logging going. Enabling chromium logging unfortunately seems to slow things down enough that the hang goes away.
Comment 46•3 years ago
|
||
Unfortunately, the hang I'm getting locally appears to be something else.
Updated•3 years ago
|
Comment 47•3 years ago
|
||
mccr8 says this test failure looks like it's Fission and debug only. It might or might not affect real users.
Of the 393 test failures from the last four weeks, all but one are on windows10-64-2004-qr. The one is one macosx1015-64-qr.
Comment 48•3 years ago
|
||
This is the stack for the ContentChild::ActorDestroy call:
#02: mozilla::dom::ContentChild::ActorDestroy(mozilla::ipc::IProtocol::ActorDestroyReason) [dom/ipc/ContentChild.cpp:2226]
#03: mozilla::ipc::IProtocol::DestroySubtree(mozilla::ipc::IProtocol::ActorDestroyReason) [ipc/glue/ProtocolUtils.cpp:623]
#04: mozilla::dom::PContentChild::OnChannelClose() [/ipc/ipdl/PContentChild.cpp::16069]
#05: mozilla::ipc::MessageChannel::OnNotifyMaybeChannelError() [ipc/glue/MessageChannel.cpp:2474]
#06: mozilla::detail::RunnableMethodImpl<mozilla::ipc::MessageChannel ,void (mozilla::ipc::MessageChannel::)(),0,mozilla::RunnableKind::Cancelable>::Run() [xpcom/threads/nsThreadUtils.h:1203]
#07: mozilla::RunnableTask::Run() [xpcom/threads/TaskController.cpp:468]
#08: mozilla::TaskController::DoExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex &> const&) [xpcom/threads/TaskController.cpp:770]
#09: mozilla::TaskController::ExecuteNextTaskOnlyMainThreadInternal(mozilla::detail::BaseAutoLock<mozilla::Mutex &> const&) [xpcom/threads/TaskController.cpp:606]
#10: mozilla::TaskController::ProcessPendingMTTask(bool) [xpcom/threads/TaskController.cpp:390]
#11: mozilla::detail::RunnableFunction<`lambda at /builds/worker/checkouts/gecko/xpcom/threads/TaskController.cpp:126:7'>::Run() [xpcom/threads/nsThreadUtils.h:532]
#12: nsThread::ProcessNextEvent(bool, bool*) [xpcom/threads/nsThread.cpp:1178]
#13: NS_ProcessNextEvent(nsIThread*, bool) [xpcom/threads/nsThreadUtils.cpp:467]
#14: mozilla::ipc::MessagePump::Run(base::MessagePump::Delegate*) [ipc/glue/MessagePump.cpp:107]
#15: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:325]
#16: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:307]
#17: nsBaseAppShell::Run() [widget/nsBaseAppShell.cpp:139]
#18: nsAppShell::Run() [widget/windows/nsAppShell.cpp:603]
#19: XRE_RunAppShell() [toolkit/xre/nsEmbedFunctions.cpp:917]
#20: mozilla::ipc::MessagePumpForChildProcess::Run(base::MessagePump::Delegate*) [ipc/glue/MessagePump.cpp:235]
#21: MessageLoop::RunHandler() [ipc/chromium/src/base/message_loop.cc:325]
#22: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:307]
#23: XRE_InitChildProcess(int, char**, XREChildData const*) [toolkit/xre/nsEmbedFunctions.cpp:753]
It looks the same in non-failing runs, so whatever oddness is here is not sufficient to explain the hangs.
Comment 49•3 years ago
|
||
The failure rate in this bug could likely be greatly reduced by disabling browser/components/extensions/test/browser/browser_ext_tabs_move_window.js on Windows 10 Fission debug. It seems to be happening elsewhere, but a process created during that specific test seems to hang about 2/3 of the time.
Comment 50•3 years ago
|
||
Updated•3 years ago
|
Updated•3 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment 52•3 years ago
|
||
Comment 53•3 years ago
|
||
I'm kind of out of ideas here. Maybe inspiration will strike later. I'll move this to IPC because it seems like some kind of weird IPC shutdown issue, at least in the current form. Hopefully the test disablement will reduce the failure rate enough.
Comment 54•3 years ago
|
||
bugherder |
Comment hidden (Intermittent Failures Robot) |
Comment 56•3 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #53)
I'm kind of out of ideas here. Maybe inspiration will strike later. I'll move this to IPC because it seems like some kind of weird IPC shutdown issue, at least in the current form. Hopefully the test disablement will reduce the failure rate enough.
Hi Andrew, now that you added the pid in bug 1740529, I see a pattern here when I filter for it:
[task 2021-11-11T16:54:53.110Z] 16:54:53 INFO - GECKO(4396) | ### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to C:\Users\task_163664756302097\AppData\Local\Temp\tmp1xjaw5t3.mozrunner\runtests_leaks_tab_pid1384.log
[task 2021-11-11T16:54:53.110Z] 16:54:53 INFO - GECKO(4396) | [1384, Main Thread] WARNING: XPCOM_MEM_BLOAT_LOG is set, disabling native allocations.: file /builds/worker/checkouts/gecko/tools/profiler/core/platform.cpp:250
[task 2021-11-11T16:54:54.851Z] 16:54:54 INFO - GECKO(4396) | [Child 1384, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005 (NS_ERROR_FAILURE): file /builds/worker/checkouts/gecko/docshell/base/nsDocShell.cpp:8535
[task 2021-11-11T16:54:55.039Z] 16:54:55 INFO - GECKO(4396) | [Child 1384, Main Thread] WARNING: IPC message 'PIdleScheduler::Msg_StartedGC' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:525
[task 2021-11-11T16:54:55.061Z] 16:54:55 INFO - GECKO(4396) | [Child 1384, Main Thread] WARNING: IPC message 'PIdleScheduler::Msg_DoneGC' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:525
[task 2021-11-11T16:54:55.093Z] 16:54:55 INFO - GECKO(4396) | [Child 1384, Main Thread] WARNING: IPC message 'PIdleScheduler::Msg_StartedGC' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:525
[task 2021-11-11T16:54:55.114Z] 16:54:55 INFO - GECKO(4396) | [Child 1384, Main Thread] WARNING: IPC message 'PIdleScheduler::Msg_DoneGC' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:525
[task 2021-11-11T16:54:55.117Z] 16:54:55 INFO - GECKO(4396) | [Child 1384, Main Thread] WARNING: Extra shutdown CC: 'i < NORMAL_SHUTDOWN_COLLECTIONS', file /builds/worker/checkouts/gecko/xpcom/base/nsCycleCollector.cpp:3353
[task 2021-11-11T16:54:55.120Z] 16:54:55 INFO - GECKO(4396) | [Child 1384, Main Thread] WARNING: IPC message 'PIdleScheduler::Msg_StartedGC' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:525
[task 2021-11-11T16:54:55.123Z] 16:54:55 INFO - GECKO(4396) | [Child 1384, Main Thread] WARNING: IPC message 'PIdleScheduler::Msg_DoneGC' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:525
[task 2021-11-11T16:54:55.124Z] 16:54:55 INFO - GECKO(4396) | [Child 1384, Main Thread] WARNING: IPC message 'PIdleScheduler::Msg_StartedGC' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:525
[task 2021-11-11T16:54:55.125Z] 16:54:55 INFO - GECKO(4396) | [Child 1384, Main Thread] WARNING: IPC message 'PIdleScheduler::Msg_DoneGC' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:525
[task 2021-11-11T16:54:55.134Z] 16:54:55 INFO - GECKO(4396) | [Child 1384, Main Thread] WARNING: NS_ENSURE_TRUE(Preferences::InitStaticMembers()) failed: file /builds/worker/checkouts/gecko/modules/libpref/Preferences.cpp:4439
[task 2021-11-11T17:00:11.018Z] 17:00:11 INFO - ==> process 3264 launched child process 1384 ("Z:\task_163664756302097\build\application\firefox\firefox.exe" -contentproc --channel="3264.649.62035424\28145897" -childID 325 -isForBrowser -prefsHandle 4632 -prefMapHandle 5476 -prefsLen 11781 -prefMapSize 255484 -jsInitHandle 1332 -jsInitLen 336328 -parentBuildID 20211111152544 -appDir "Z:\task_163664756302097\build\application\firefox\browser" - 3264 "\\.\pipe\gecko-crash-server-pipe.3264" 5792 22fe6348970 tab)
[task 2021-11-11T17:00:11.070Z] 17:00:11 INFO - ==> process 3264 launched child process 1668 ("Z:\task_163664756302097\build\application\firefox\firefox.exe" -contentproc --channel="3264.773.2013840525\1546756485" -childID 387 -isForBrowser -prefsHandle 4916 -prefMapHandle 5904 -prefsLen 11860 -prefMapSize 255484 -jsInitHandle 1332 -jsInitLen 336328 -parentBuildID 20211111152544 -appDir "Z:\task_163664756302097\build\application\firefox\browser" - 3264 "\\.\pipe\gecko-crash-server-pipe.3264" 6068 22fe7606970 tab)
[task 2021-11-11T17:00:11.364Z] 17:00:11 INFO - zombiecheck | Checking for orphan process with PID: 1384
[task 2021-11-11T17:00:19.174Z] 17:00:19 INFO - leakcheck | Processing leak log file C:\Users\task_163664756302097\AppData\Local\Temp\tmp1xjaw5t3.mozrunner\runtests_leaks_tab_pid1384.log
[task 2021-11-11T17:00:19.174Z] 17:00:19 INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, tab process 1384
[task 2021-11-11T17:07:54.894Z] 17:07:54 INFO - ==> process 6244 launched child process 5976 ("Z:\task_163664756302097\build\application\firefox\firefox.exe" -contentproc --channel="6244.365.1175471384\1084633623" -childID 183 -isForBrowser -prefsHandle 5868 -prefMapHandle 5960 -prefsLen 11579 -prefMapSize 255448 -jsInitHandle 1352 -jsInitLen 336328 -parentBuildID 20211111152544 -appDir "Z:\task_163664756302097\build\application\firefox\browser" - 6244 "\\.\pipe\gecko-crash-server-pipe.6244" 6036 1f0f9540970 tab)
[task 2021-11-11T17:07:54.899Z] 17:07:54 INFO - ==> process 6244 launched child process 3636 ("Z:\task_163664756302097\build\application\firefox\firefox.exe" -contentproc --channel="6244.377.1886900087\1384882622" -childID 189 -isForBrowser -prefsHandle 3640 -prefMapHandle 3352 -prefsLen 11579 -prefMapSize 255448 -jsInitHandle 1352 -jsInitLen 336328 -parentBuildID 20211111152544 -appDir "Z:\task_163664756302097\build\application\firefox\browser" - 6244 "\\.\pipe\gecko-crash-server-pipe.6244" 3376 1f0ef1e9870 tab)
[task 2021-11-11T17:09:49.592Z] 17:09:49 INFO - ==> process 3076 launched child process 8476 ("Z:\task_163664756302097\build\application\firefox\firefox.exe" -contentproc --channel="3076.131.1821038806\520881384" -childID 66 -isForBrowser -prefsHandle 6124 -prefMapHandle 5440 -prefsLen 11637 -prefMapSize 255484 -jsInitHandle 1364 -jsInitLen 336328 -parentBuildID 20211111152544 -appDir "Z:\task_163664756302097\build\application\firefox\browser" - 3076 "\\.\pipe\gecko-crash-server-pipe.3076" 5056 1c7ae21b570 tab)
[task 2021-11-11T17:09:50.158Z] 17:09:50 INFO - 0 |TOTAL | 57 0| 21384 0|
[task 2021-11-11T17:13:26.009Z] 17:13:26 INFO - ==> process 3656 launched child process 628 ("Z:\task_163664756302097\build\application\firefox\firefox.exe" -contentproc --channel="3656.947.1384405750\374374135" -childID 474 -isForBrowser -prefsHandle 3136 -prefMapHandle 3280 -prefsLen 11250 -prefMapSize 255448 -jsInitHandle 1308 -jsInitLen 336328 -parentBuildID 20211111152544 -appDir "Z:\task_163664756302097\build\application\firefox\browser" - 3656 "\\.\pipe\gecko-crash-server-pipe.3656" 4056 1be8696e970 tab)
[task 2021-11-11T17:19:58.515Z] 17:19:58 INFO - GECKO(2852) | ### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to C:\Users\task_163664756302097\AppData\Local\Temp\tmpc1fun4vn.mozrunner\runtests_leaks_tab_pid1384.log
[task 2021-11-11T17:19:58.516Z] 17:19:58 INFO - GECKO(2852) | [1384, Main Thread] WARNING: XPCOM_MEM_BLOAT_LOG is set, disabling native allocations.: file /builds/worker/checkouts/gecko/tools/profiler/core/platform.cpp:250
[task 2021-11-11T17:20:00.714Z] 17:20:00 INFO - GECKO(2852) | [Child 1384, Main Thread] WARNING: MsgDropped in ContentChild: file /builds/worker/checkouts/gecko/dom/ipc/ContentChild.cpp:2282
[task 2021-11-11T17:20:02.727Z] 17:20:02 INFO - GECKO(2852) | [Child 1384, Main Thread] WARNING: IPC message 'PContent::Msg_AccumulateChildKeyedHistograms' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:525
[task 2021-11-11T17:20:02.728Z] 17:20:02 INFO - GECKO(2852) | [Child 1384, Main Thread] WARNING: '!ipcActor->SendAccumulateChildKeyedHistograms(keyedHistogramsToSend)', file /builds/worker/checkouts/gecko/toolkit/components/telemetry/core/ipc/TelemetryIPCAccumulator.cpp:283
[task 2021-11-11T17:20:02.729Z] 17:20:02 INFO - GECKO(2852) | [Child 1384, Main Thread] WARNING: IPC message 'PContent::Msg_RecordDiscardedData' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:525
[task 2021-11-11T17:20:02.730Z] 17:20:02 INFO - GECKO(2852) | [Child 1384, Main Thread] WARNING: '!ipcActor->SendRecordDiscardedData(discardedData)', file /builds/worker/checkouts/gecko/toolkit/components/telemetry/core/ipc/TelemetryIPCAccumulator.cpp:298
[task 2021-11-11T17:20:08.606Z] 17:20:08 INFO - GECKO(2852) | [Child 1384, Main Thread] WARNING: IPC message 'PContent::Msg_AccumulateChildHistograms' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:525
[task 2021-11-11T17:20:08.607Z] 17:20:08 INFO - GECKO(2852) | [Child 1384, Main Thread] WARNING: '!ipcActor->SendAccumulateChildHistograms(histogramsToSend)', file /builds/worker/checkouts/gecko/toolkit/components/telemetry/core/ipc/TelemetryIPCAccumulator.cpp:279
[task 2021-11-11T17:20:08.608Z] 17:20:08 INFO - GECKO(2852) | [Child 1384, Main Thread] WARNING: IPC message 'PContent::Msg_RecordDiscardedData' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:525
[task 2021-11-11T17:20:08.608Z] 17:20:08 INFO - GECKO(2852) | [Child 1384, Main Thread] WARNING: '!ipcActor->SendRecordDiscardedData(discardedData)', file /builds/worker/checkouts/gecko/toolkit/components/telemetry/core/ipc/TelemetryIPCAccumulator.cpp:298
[task 2021-11-11T17:20:54.182Z] 17:20:54 INFO - GECKO(2852) | [Child 1384, Main Thread] WARNING: IPC message 'PContent::Msg_UpdateChildScalars' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:525
[task 2021-11-11T17:20:54.183Z] 17:20:54 INFO - GECKO(2852) | [Child 1384, Main Thread] WARNING: '!ipcActor->SendUpdateChildScalars(scalarsToSend)', file /builds/worker/checkouts/gecko/toolkit/components/telemetry/core/ipc/TelemetryIPCAccumulator.cpp:287
[task 2021-11-11T17:20:54.183Z] 17:20:54 INFO - GECKO(2852) | [Child 1384, Main Thread] WARNING: IPC message 'PContent::Msg_RecordDiscardedData' discarded: actor cannot send: file /builds/worker/checkouts/gecko/ipc/glue/ProtocolUtils.cpp:525
[task 2021-11-11T17:20:54.184Z] 17:20:54 INFO - GECKO(2852) | [Child 1384, Main Thread] WARNING: '!ipcActor->SendRecordDiscardedData(discardedData)', file /builds/worker/checkouts/gecko/toolkit/components/telemetry/core/ipc/TelemetryIPCAccumulator.cpp:298
[task 2021-11-11T17:21:33.293Z] 17:21:33 INFO - GECKO(2852) | Waiting in WillDestroyCurrentMessageLoop for pid 1384
[task 2021-11-11T17:22:31.072Z] 17:22:31 INFO - ==> process 7628 launched child process 1384 ("Z:\task_163664756302097\build\application\firefox\firefox.exe" -contentproc --channel="7628.193.1023192696\825889016" -childID 97 -isForBrowser -prefsHandle 5620 -prefMapHandle 5616 -prefsLen 12080 -prefMapSize 255448 -jsInitHandle 1352 -jsInitLen 336328 -parentBuildID 20211111152544 -appDir "Z:\task_163664756302097\build\application\firefox\browser" - 7628 "\\.\pipe\gecko-crash-server-pipe.7628" 5732 2223b76b470 tab)
[task 2021-11-11T17:22:31.317Z] 17:22:31 INFO - zombiecheck | Checking for orphan process with PID: 1384
[task 2021-11-11T17:22:38.878Z] 17:22:38 INFO - leakcheck | Processing leak log file C:\Users\task_163664756302097\AppData\Local\Temp\tmpc1fun4vn.mozrunner\runtests_leaks_tab_pid1384.log
Could it be that we end up with a race between one parent asking a child process to close and another parent creating a child process who get's assigned the very same process ID before we reach WillDestroyCurrentMessageLoop in the first parent ? IIUC, Windows does reuse process IDs. Still it should be guaranteed to not do this until the last handle to a process has not been freed, so it looks like this should not be possible. But given we are out of ideas...
Updated•3 years ago
|
Comment 57•3 years ago
|
||
Hmm, are we sure that WillDestroyCurrentMessageLoop
ever does what we expect it to do? The message we print is only conditional with an #ifdef
and the existence of the process_
handle, but I see it appearing in the logs I looked at only when we actually block.
We call WillDestroyCurrentMessageLoop
only if we had a DestructionObserver
registered. So my doubt would be that the normal flow unregisters that observer before we get into MessageLoop::~MessageLoop
and that if we get here with a registered observer we always block?
Comment 58•3 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #57)
Hmm, are we sure that
WillDestroyCurrentMessageLoop
ever does what we expect it to do? The message we print is only conditional with an#ifdef
and the existence of theprocess_
handle, but I see it appearing in the logs I looked at only when we actually block.We call
WillDestroyCurrentMessageLoop
only if we had aDestructionObserver
registered. So my doubt would be that the normal flow unregisters that observer before we get intoMessageLoop::~MessageLoop
and that if we get here with a registered observer we always block?
Yeah, my guess is that usually processes should go away by the time we get to WillDestroyCurrentMessageLoop, so we're probably only going to hit this logging code in weird cases.
As for the PID reuse, I'm not sure. When looking at the bc5 test failure (that is now disabled), I saw that the PID of the process occurred a few times in the log in some cases but not others. Though the failures might not all be happening for the same reason.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•3 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Comment 61•3 years ago
|
||
(In reply to Andrew McCreight [:mccr8] from comment #58)
(In reply to Jens Stutte [:jstutte] from comment #57)
Hmm, are we sure that
WillDestroyCurrentMessageLoop
ever does what we expect it to do? The message we print is only conditional with an#ifdef
and the existence of theprocess_
handle, but I see it appearing in the logs I looked at only when we actually block.We call
WillDestroyCurrentMessageLoop
only if we had aDestructionObserver
registered. So my doubt would be that the normal flow unregisters that observer before we get intoMessageLoop::~MessageLoop
and that if we get here with a registered observer we always block?Yeah, my guess is that usually processes should go away by the time we get to WillDestroyCurrentMessageLoop, so we're probably only going to hit this logging code in weird cases.
As for the PID reuse, I'm not sure. When looking at the bc5 test failure (that is now disabled), I saw that the PID of the process occurred a few times in the log in some cases but not others. Though the failures might not all be happening for the same reason.
FWIW, I filed bug 1743362 and I'd want to move all these investigations to bug 1724337 and children.
Comment 62•3 years ago
|
||
Looks like this went to 0 on Nov 22, which was a good 10 days after the disabling patch landed, so something else either fixed this or influenced timing. Should we try reverting the disabling patch, and/or do we know what "fixed" this?
Comment 63•3 years ago
|
||
Superseded by bug 1742677 (and back to RunWatchdog as signature which had been tracked with 1358818 before).
Comment 64•3 years ago
|
||
(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #63)
Superseded by bug 1742677 (and back to RunWatchdog as signature which had been tracked with 1358818 before).
The generic RunWatchdog case has many different possible causes. One is the WillDestroyCurrentMessageLoop
cause we want to investigate in bug 1724337 and children.
There are many other possible causes, in the current list of bug 1742677 I see mainly the SpinEventLoopUntil("CompositorThreadHolder::Shutdown", ...)
case which is actually bug 1712136 and is happening also in the wild.
We cannot keep these cases separate as in crash-stats if they happen as intermittents on our CI ?
Comment hidden (Intermittent Failures Robot) |
Comment 66•3 years ago
|
||
(In reply to Sebastian Hengst [:aryx] (needinfo me if it's about an intermittent or backout) from comment #63)
Superseded by bug 1742677 (and back to RunWatchdog as signature which had been tracked with 1358818 before).
So closing.
Updated•3 years ago
|
Description
•