Intermittent ccov PROCESS-CRASH | Last test finished | application crashed [@ PR_NativeRunThread + 0x16a]
Categories
(Testing :: Code Coverage, defect)
Tracking
(Not tracked)
People
(Reporter: intermittent-bug-filer, Unassigned)
References
Details
(Keywords: crash, intermittent-failure, Whiteboard: [stockwell fixed:other])
Crash Data
Filed by: ncsoregi [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=344563965&repo=mozilla-central
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/LT5kLaJnR_e_kwAUmqJSFg/runs/0/artifacts/public/logs/live_backing.log
[task 2021-07-07T12:25:06.987Z] 12:25:06 INFO - PROCESS-CRASH | Last test finished | application crashed [@ PR_NativeRunThread + 0x16a]
[task 2021-07-07T12:25:06.987Z] 12:25:06 INFO - Mozilla crash reason: MOZ_CRASH(Shutdown hanging after all known phases and workers finished.)
[task 2021-07-07T12:25:06.987Z] 12:25:06 INFO - Crash dump filename: C:\Users\task_1625659277\AppData\Local\Temp\tmptkgwu0vx.mozrunner\minidumps\da69e453-e7a0-4138-8dc8-0760c21b39d3.dmp
[task 2021-07-07T12:25:06.987Z] 12:25:06 INFO - Operating system: Windows NT
[task 2021-07-07T12:25:06.987Z] 12:25:06 INFO - 10.0.17134
[task 2021-07-07T12:25:06.987Z] 12:25:06 INFO - CPU: amd64
[task 2021-07-07T12:25:06.987Z] 12:25:06 INFO - family 6 model 94 stepping 3
[task 2021-07-07T12:25:06.988Z] 12:25:06 INFO - 8 CPUs
[task 2021-07-07T12:25:06.988Z] 12:25:06 INFO -
[task 2021-07-07T12:25:06.988Z] 12:25:06 INFO - GPU: UNKNOWN
[task 2021-07-07T12:25:06.988Z] 12:25:06 INFO -
[task 2021-07-07T12:25:06.988Z] 12:25:06 INFO - Crash reason: EXCEPTION_BREAKPOINT
[task 2021-07-07T12:25:06.988Z] 12:25:06 INFO - Crash address: 0xba5eb1c7
[task 2021-07-07T12:25:06.988Z] 12:25:06 INFO - Process uptime: 73 seconds
[task 2021-07-07T12:25:06.988Z] 12:25:06 INFO -
[task 2021-07-07T12:25:06.988Z] 12:25:06 INFO - Thread 19 (crashed) - Shutdown Hang Terminator 0 xul.dll!void mozilla::nsTerminator::AdvancePhase(enum mozilla::ShutdownPhase) + 0x447
[task 2021-07-07T12:25:06.988Z] 12:25:06 INFO - rax = 0x00007ffec36033ee rdx = 0x0000000000000000
[task 2021-07-07T12:25:06.988Z] 12:25:06 INFO - rcx = 0x00007ffef1af4bb0 rbx = 0x00000219bbe59d30
[task 2021-07-07T12:25:06.988Z] 12:25:06 INFO - rsi = 0x00007ffefeae3ca0 rdi = 0x0000000000000276
[task 2021-07-07T12:25:06.988Z] 12:25:06 INFO - rbp = 0x0000000000000000 rsp = 0x000000d3e7fbfc50
[task 2021-07-07T12:25:06.988Z] 12:25:06 INFO - r8 = 0x0000000000000008 r9 = 0x00007ffefd290cec
[task 2021-07-07T12:25:06.988Z] 12:25:06 INFO - r10 = 0x00000fffdfa72be7 r11 = 0x0022008000000002
[task 2021-07-07T12:25:06.989Z] 12:25:06 INFO - r12 = 0x00007ffedbaf7c20 r13 = 0x00000219ca46b138
[task 2021-07-07T12:25:06.989Z] 12:25:06 INFO - r14 = 0x00007ffedbaceef4 r15 = 0x00007fff001f82c0
[task 2021-07-07T12:25:06.989Z] 12:25:06 INFO - rip = 0x00007ffeba5eb1c7
[task 2021-07-07T12:25:06.989Z] 12:25:06 INFO - Found by: given as instruction pointer in context
[task 2021-07-07T12:25:06.989Z] 12:25:06 INFO - 1 nss3.dll!PR_NativeRunThread + 0x16a
[task 2021-07-07T12:25:06.989Z] 12:25:06 INFO - rbx = 0x00000219bbe59d30 rbp = 0x0000000000000000
[task 2021-07-07T12:25:06.989Z] 12:25:06 INFO - rsp = 0x000000d3e7fbfca0 r12 = 0x00007ffedbaf7c20
[task 2021-07-07T12:25:06.989Z] 12:25:06 INFO - r13 = 0x00000219ca46b138 r14 = 0x00007ffedbaceef4
[task 2021-07-07T12:25:06.989Z] 12:25:06 INFO - r15 = 0x00007fff001f82c0 rip = 0x00007ffedb95186a
[task 2021-07-07T12:25:06.989Z] 12:25:06 INFO - Found by: call frame info
[task 2021-07-07T12:25:06.989Z] 12:25:06 INFO - 2 nss3.dll!PR_MD_CREATE_THREAD + 0xd9
[task 2021-07-07T12:25:06.989Z] 12:25:06 INFO - rbx = 0x00000219bbe59d30 rbp = 0x0000000000000000
[task 2021-07-07T12:25:06.989Z] 12:25:06 INFO - rsp = 0x000000d3e7fbfd10 r12 = 0x00007ffedbaf7c20
[task 2021-07-07T12:25:06.989Z] 12:25:06 INFO - r13 = 0x00000219ca46b138 r14 = 0x00007ffedbaceef4
[task 2021-07-07T12:25:06.990Z] 12:25:06 INFO - r15 = 0x00007fff001f82c0 rip = 0x00007ffedb93dc69
[task 2021-07-07T12:25:06.990Z] 12:25:06 INFO - Found by: call frame info
[task 2021-07-07T12:25:06.990Z] 12:25:06 INFO - 3 ucrtbase.dll!RtlpHpSegPageRangeShrink + 0xda
[task 2021-07-07T12:25:06.990Z] 12:25:06 INFO - rbx = 0x00000219bbe59d30 rbp = 0x0000000000000000
[task 2021-07-07T12:25:06.990Z] 12:25:06 INFO - rsp = 0x000000d3e7fbfd40 r12 = 0x00007ffedbaf7c20
[task 2021-07-07T12:25:06.990Z] 12:25:06 INFO - r13 = 0x00000219ca46b138 r14 = 0x00007ffedbaceef4
[task 2021-07-07T12:25:06.990Z] 12:25:06 INFO - r15 = 0x00007fff001f82c0 rip = 0x00007ffefd26c4be
[task 2021-07-07T12:25:06.990Z] 12:25:06 INFO - Found by: call frame info
[task 2021-07-07T12:25:06.990Z] 12:25:06 INFO -
Updated•3 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 6•3 years ago
|
||
Update:
There have been 59 failures within the last 7 days:
• 44 failures on Windows 10 x64 2004 CCov WebRender opt
• 15 failures on Windows 10 x64 CCov WebRender opt
Recent failure log: https://treeherder.mozilla.org/logviewer?job_id=347570272&repo=mozilla-central&lineNumber=4467
[task 2021-08-05T20:11:44.686Z] 20:11:44 INFO - REFTEST TEST-START | layout/reftests/svg/as-image/defer-unsupported-1.svg == layout/reftests/svg/as-image/defer-unsupported-1-ref.svg
[task 2021-08-05T20:11:44.687Z] 20:11:44 INFO - REFTEST TEST-LOAD | file:///Z:/task_162819332576699/build/tests/reftest/tests/layout/reftests/svg/as-image/defer-unsupported-1.svg | 140 / 141 (99%)
[task 2021-08-05T20:11:44.762Z] 20:11:44 INFO - REFTEST TEST-LOAD | file:///Z:/task_162819332576699/build/tests/reftest/tests/layout/reftests/svg/as-image/defer-unsupported-1-ref.svg | 140 / 141 (99%)
[task 2021-08-05T20:11:44.826Z] 20:11:44 INFO - REFTEST TEST-PASS | layout/reftests/svg/as-image/defer-unsupported-1.svg == layout/reftests/svg/as-image/defer-unsupported-1-ref.svg | image comparison, max difference: 0, number of differing pixels: 0
[task 2021-08-05T20:11:44.826Z] 20:11:44 INFO - REFTEST TEST-END | layout/reftests/svg/as-image/defer-unsupported-1.svg == layout/reftests/svg/as-image/defer-unsupported-1-ref.svg
[task 2021-08-05T20:11:44.841Z] 20:11:44 INFO - REFTEST INFO | Slowest test took 995ms (file:///Z:/task_162819332576699/build/tests/reftest/tests/layout/reftests/svg/as-image/img-novb-widthAndHeight-all-1-ref.html)
[task 2021-08-05T20:11:44.841Z] 20:11:44 INFO - REFTEST INFO | Total canvas count = 4
[task 2021-08-05T20:11:44.903Z] 20:11:44 INFO - 1628194304903 Marionette TRACE Received observer notification quit-application
[task 2021-08-05T20:11:44.905Z] 20:11:44 INFO - 1628194304903 Marionette INFO Stopped listening on port 2828
[task 2021-08-05T20:11:44.905Z] 20:11:44 INFO - 1628194304904 Marionette DEBUG Marionette stopped listening
[task 2021-08-05T20:11:44.975Z] 20:11:44 INFO - JavaScript error: resource:///modules/Interactions.jsm, line 230: NS_ERROR_FAILURE: Component returned failure code: 0x80004005 (NS_ERROR_FAILURE) [nsIUserIdleService.removeIdleObserver]
[task 2021-08-05T20:11:44.983Z] 20:11:44 INFO - ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2021-08-05T20:11:45.393Z] 20:11:45 INFO - ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2021-08-05T20:11:45.394Z] 20:11:45 INFO - ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2021-08-05T20:11:45.394Z] 20:11:45 INFO - ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2021-08-05T20:11:46.795Z] 20:11:46 INFO - ###!!! [Parent][RunMessage] Error: Channel closing: too late to send/recv, messages will be lost
[task 2021-08-05T20:12:52.809Z] 20:12:52 ERROR - TEST-UNEXPECTED-FAIL | Last test finished | application terminated with exit code 1
[task 2021-08-05T20:12:52.809Z] 20:12:52 INFO - REFTEST INFO | Downloading symbols from: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/Fejj4_-TTPK135s8vNnWqw/artifacts/public/build/target.crashreporter-symbols.zip
[task 2021-08-05T20:12:55.929Z] 20:12:55 INFO - REFTEST INFO | Copy/paste: Z:/task_162819332576699/fetches\minidump_stackwalk\minidump_stackwalk.exe C:\Users\task_162819332576699\AppData\Local\Temp\tmpwosl7qvy.mozrunner\minidumps\e9aa6647-cb09-45dc-97c0-efa2fdb5d848.dmp C:\Users\task_162819332576699\AppData\Local\Temp\tmpohhk74m4
[task 2021-08-05T20:12:59.727Z] 20:12:59 INFO - REFTEST INFO | Saved minidump as Z:\task_162819332576699\build\blobber_upload_dir\e9aa6647-cb09-45dc-97c0-efa2fdb5d848.dmp
[task 2021-08-05T20:12:59.732Z] 20:12:59 INFO - REFTEST INFO | Saved app info as Z:\task_162819332576699\build\blobber_upload_dir\e9aa6647-cb09-45dc-97c0-efa2fdb5d848.extra
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO - REFTEST PROCESS-CRASH | Last test finished | application crashed [@ PR_NativeRunThread + 0x16a]
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO - Mozilla crash reason: MOZ_CRASH(Shutdown hanging after all known phases and workers finished.)
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO - Crash dump filename: C:\Users\task_162819332576699\AppData\Local\Temp\tmpwosl7qvy.mozrunner\minidumps\e9aa6647-cb09-45dc-97c0-efa2fdb5d848.dmp
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO - Operating system: Windows NT
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO - 10.0.19041
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO - CPU: amd64
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO - family 6 model 63 stepping 2
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO - 6 CPUs
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO -
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO - GPU: UNKNOWN
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO -
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO - Crash reason: EXCEPTION_BREAKPOINT
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO - Crash address: 0xacd98107
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO - Process uptime: 104 seconds
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO -
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO - Thread 17 (crashed) - Shutdown Hang Terminator 0 xul.dll!void mozilla::nsTerminator::AdvancePhase(enum mozilla::ShutdownPhase) + 0x447
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO - rax = 0x00007ffeb5f5be7b rdx = 0x0000000000000000
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO - rcx = 0x00007ffee46ba8e0 rbx = 0x000002208fd59b30
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO - rsi = 0x00007ffeed0fada0 rdi = 0x0000000000000276
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO - rbp = 0x0000000000000000 rsp = 0x00000037540cfd60
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO - r8 = 0x0000000000000002 r9 = 0x00000037540cfc08
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO - r10 = 0x0000000000000013 r11 = 0x2200000080082000
[task 2021-08-05T20:12:59.776Z] 20:12:59 INFO - r12 = 0x00007ffed6607c20 r13 = 0x000002209a219538
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO - r14 = 0x00007ffed65def24 r15 = 0x00007ffeeeb9faa0
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO - rip = 0x00007ffeacd98107
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO - Found by: given as instruction pointer in context
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO - 1 nss3.dll!PR_NativeRunThread + 0x16a
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO - rbx = 0x000002208fd59b30 rbp = 0x0000000000000000
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO - rsp = 0x00000037540cfdb0 r12 = 0x00007ffed6607c20
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO - r13 = 0x000002209a219538 r14 = 0x00007ffed65def24
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO - r15 = 0x00007ffeeeb9faa0 rip = 0x00007ffed646186a
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO - Found by: call frame info
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO - 2 nss3.dll!PR_MD_CREATE_THREAD + 0xd9
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO - rbx = 0x000002208fd59b30 rbp = 0x0000000000000000
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO - rsp = 0x00000037540cfe20 r12 = 0x00007ffed6607c20
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO - r13 = 0x000002209a219538 r14 = 0x00007ffed65def24
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO - r15 = 0x00007ffeeeb9faa0 rip = 0x00007ffed644dc69
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO - Found by: call frame info
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO - 3 ucrtbase.dll!RtlpReAllocateHeap + 0x1ce
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO - rbx = 0x000002208fd59b30 rbp = 0x0000000000000000
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO - rsp = 0x00000037540cfe50 r12 = 0x00007ffed6607c20
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO - r13 = 0x000002209a219538 r14 = 0x00007ffed65def24
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO - r15 = 0x00007ffeeeb9faa0 rip = 0x00007ffeecb01bb2
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO - Found by: call frame info
[task 2021-08-05T20:12:59.777Z] 20:12:59 INFO -
Comment 7•3 years ago
|
||
Nika, as the owner of this component, could you help us assign it to someone?
Thank you.
Comment 8•3 years ago
|
||
FWIW, it looks like these are windows10-64 ccov WebRender shutdown hangs, spread across various test suites.
In a few logs I looked at, the main thread was in XPCOM shutdown, doing mozilla::ipc::BrowserProcessSubThread::`scalar deleting destructor'
Comment hidden (Intermittent Failures Robot) |
Comment 10•3 years ago
|
||
I managed to get a minidump open, though only for bug 1685204. I'm expecting that these two crashes have the same cause. We're hanging within ChildReaper::WillDestroyCurrentMessageLoop
here: https://searchfox.org/mozilla-central/rev/162a8620f126c27de70766808708696260f2256a/ipc/chromium/src/chrome/common/process_watcher_win.cc#38 in response to the I/O
thread trying to exit. The main thread is then hanging trying to join the I/O thread from BrowserProcessSubThread
's destructor.
I'm guessing these are only appearing in debug & ccov builds because toolkit.shutdown.fastShutdownStage
is disabled for these builds (https://searchfox.org/mozilla-central/rev/162a8620f126c27de70766808708696260f2256a/modules/libpref/init/StaticPrefList.yaml#11013-11017).
ni? :jld who I know has been touching the ChildReaper
code recently, and might have changed something which is leading to these hangs.
Updated•3 years ago
|
Comment hidden (Intermittent Failures Robot) |
Updated•3 years ago
|
Comment 13•3 years ago
|
||
I've been touching the code, but I haven't landed those patches, and they're only to the Unix side, and they don't change this behavior, which is intentional: On builds that need to do a full orderly shutdown because of instrumentation (leak checking, ccov, etc.), the parent process blocks at this point waiting for child processes to exit.
What would help is a way to get the stack for the child process in this case. Nika, you pointed out the other week that this should be converted to use nsIAsyncShutdown
instead of just blocking in late shutdown like this, and I see it has a state
property for exporting information into a hang crash… but in this case I think we'd want a way to say “do a paired minidump with pid N instead” (rather than trying to somehow get all the process's stacks in a useful format and encoded as nsIPropertyBag
s/nsIVariant
s).
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•3 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 25•3 years ago
|
||
There have been 109 total failures in the last 7 days, recent failure log.
Affected platforms are:
- windows10-64-2004-ccov-qr
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 28•3 years ago
|
||
The crashes here are all code coverage build related. Maybe they stopped on Friday, or failing jobs haven't been classified over the weekend yet. Marco, could you have a look please?
Comment hidden (Intermittent Failures Robot) |
Comment 30•3 years ago
|
||
From the crash reason (MOZ_CRASH(Shutdown hanging after all known phases and workers finished.)
), I'd say this was another instance where ccov instrumentation is slowing down shutdown too much, and with the move to Azure this got much worse.
It looks like this stopped on October 8th though, so maybe bug 1733611 was enough to fix this (or at least bring it back to be rarely happening).
Comment 31•3 years ago
|
||
No failures on trunk since the 9th of October.
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Updated•3 years ago
|
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment hidden (Intermittent Failures Robot) |
Comment 39•2 years ago
|
||
The severity field for this bug is set to S4
. However, the following bug duplicate has higher severity:
- Bug 1685204: S3
:marco, could you consider increasing the severity of this bug to S3
?
For more information, please visit auto_nag documentation.
Updated•2 years ago
|
Description
•