CreateRemoteThread (used to crash hanging child processes) doesn't work on Windows ccov builds
Categories
(Core :: IPC, defect)
Tracking
()
People
(Reporter: jld, Unassigned)
References
Details
Attachments
(1 file)
(deleted),
patch
|
Details | Diff | Splinter Review |
Discovered in bug 1805761: on Windows ccov builds (and apparently only ccov builds?), the use of CreateRemoteThread
by ProcessWatcher
to crash a possibly-hanging child process — including if it's intentionally blocked in Sleep
rather than in coverage runtime shutdown — fails with ERROR_ACCESS_DENIED
(error 5).
Thus, we do not get a crash report as promised by the log message. We also don't kill the process, and proceed to do an infinite wait for it to exit on its own (like the status quo before bug 1793525), which isn't ideal.
It would be nice to understand why that happens, but also we could just not attempt to get a crash report on that build type, or fall back to TerminateProcess
when it fails.
Comment 1•2 years ago
|
||
The Win32 error ERROR_ACCESS_DENIED
is not very explicit, but in the case of CreateRemoteThread
, the last error is likely set by converting from a more explicit NTSTATUS
value. It is possible to get the NTSTATUS
value by catching the call to ntdll!RtlNtStatusToDosError
, and retrieving the first argument. Normally I would recommend hooking the function to do this; but I'm not sure of the interaction between ccov builds and our hooking code. So instead, you can use the (dirty) attached patched. It sets, handles, and unsets a one-shot breakpoint on RtlNtStatusToDosError
. Then you can find the meaning for the reported NTSTATUS
code here.
Updated•2 years ago
|
Comment 2•2 years ago
|
||
One reason for getting ERROR_ACCESS_DENIED
would be that the original NTSTATUS
reported within CreateRemoteThread
is 0xC000010A
STATUS_PROCESS_IS_TERMINATING
. I've actually caught this NTSTATUS
locally if playing with kShutdownWaitMs
. If we manage to reproduce the intermittent with the patch and confirm that this is the NTSTATUS
that gets caught, then I believe the correct fix would be to accept ERROR_ACCESS_DENIED
as a benign error -- we cannot create the remote thread because the process is already terminating, but since the process is already terminating it's OK that we can't create the remote thread to crash it, we are in the happy scenario where the process isn't hanging. I have a try push here where we can try to relaunch jobs until we get the intermittent failure.
Comment 3•2 years ago
|
||
That seems to be what's happening indeed: reproduced in Wd2 and in Wd4.
[Parent 4588, IPC I/O Parent] WARNING: Process 4496 may be hanging at shutdown; will wait for up to 8000ms: file Z:/task_167120411251431/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:119
[Parent 4588, IPC I/O Parent] WARNING: Process 4496 hanging at shutdown; attempting crash report (fatal error): file Z:/task_167120411251431/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:154
[Parent 4588, IPC I/O Parent] WARNING: Caught NTSTATUS 0xc000010a
: file Z:/task_167120411251431/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:214
[Parent 4588, IPC I/O Parent] WARNING: CreateRemoteThread: error 5: file Z:/task_167120411251431/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:235
[Parent 4588, IPC I/O Parent] WARNING: Process 1612 hanging at shutdown; attempting crash report (fatal error): file Z:/task_167120411251431/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:154
[Parent 4588, IPC I/O Parent] WARNING: Caught NTSTATUS 0xc000010a
: file Z:/task_167120411251431/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:214
[Parent 4588, IPC I/O Parent] WARNING: CreateRemoteThread: error 5: file Z:/task_167120411251431/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:235
[Parent 4588, IPC I/O Parent] WARNING: Process 8344 hanging at shutdown; attempting crash report (fatal error): file Z:/task_167120411251431/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:154
[Parent 4588, IPC I/O Parent] WARNING: Caught NTSTATUS 0xc000010a
: file Z:/task_167120411251431/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:214
[Parent 4588, IPC I/O Parent] WARNING: CreateRemoteThread: error 5: file Z:/task_167120411251431/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:235
[Parent 5032, IPC I/O Parent] WARNING: Process 6456 may be hanging at shutdown; will wait for up to 8000ms: file Z:/task_167120411251431/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:119
[Parent 5032, IPC I/O Parent] WARNING: Process 6456 hanging at shutdown; attempting crash report (fatal error): file Z:/task_167120411251431/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:154
[Parent 5032, IPC I/O Parent] WARNING: Caught NTSTATUS 0xc000010a
: file Z:/task_167120411251431/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:214
[Parent 5032, IPC I/O Parent] WARNING: CreateRemoteThread: error 5: file Z:/task_167120411251431/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:235
[Parent 5032, IPC I/O Parent] WARNING: Process 7408 hanging at shutdown; attempting crash report (fatal error): file Z:/task_167120411251431/build/src/ipc/chromium/src/chrome/common/process_watcher_win.cc:154
[Parent 5032, IPC I/O Parent] WARNING: Caught NTSTATUS 0xc000010a
I hope this clarifies what's happening and helps you decide how it should be fixed. I personally think that failing with ERROR_ACCESS_DENIED
is a good case where the fatal error shouldn't be reported -- since the process is indeed terminating so not hanging. Maybe the fatal error should be reported after the call to CreateRemoteThread
-- and only if the call did not fail with ERROR_ACCESS_DENIED
?
Comment 4•2 years ago
|
||
I personally think that failing with ERROR_ACCESS_DENIED is a good case where the fatal error shouldn't be reported -- since the process is indeed terminating so not hanging.
That sounds like it could lead to false positives. If the ERROR_ACCESS_DENIED
instead arises from the source process actually not having privileges to create a remote thread in the target process (which is probably not true now, but is a plausible unintended side effect of future changes to our test setup or sandboxing), that will silently make the test useless.
Also, the fact that it's internally reporting STATUS_PROCESS_IS_TERMINATING
may not actually mean we're not hanging! I suspect — based, admittedly, on nothing but questionable intuition — that this error code may be produced after the target process has exited the main program context and is currently within a DLL_PROCESS_DETACH
handler (which may have arbitrary misbehavior).
We could avoid the first issue by falling back to NtTerminateProcess
and checking the return code (STATUS_PROCESS_IS_TERMINATING
good; anything else bad). I'm not sure what we can do about the second, though, if it's real.
Comment 5•2 years ago
|
||
Thanks Ray for these interesting remarks. After testing with the code listed below -- here are some results:
- your intuition is correct that calling
CreateRemoteThread
on a process which is stuck in aDLL_PROCESS_DETACH
handler fails withSTATUS_PROCESS_IS_TERMINATING
(converted toERROR_ACCESS_DENIED
); - however calling
NtTerminateProcess
on such a process terminates it (as in, will stop execution of theDLL_PROCESS_DETACH
handlers and make the process return the provided exit code) and returns0
(STATUS_SUCCESS
), notSTATUS_PROCESS_IS_TERMINATING
.
// IsTerminating.exe
#include <windows.h>
#include <stdio.h>
constexpr bool kReproduceError = true;
int main(int argc, char* argv[])
{
if (argc <= 1) {
printf("Hello from parent\n");
STARTUPINFOW startupInfo{};
startupInfo.cb = sizeof(startupInfo);
PROCESS_INFORMATION processInfo{};
wchar_t commandLine[] = L"IsTerminating.exe -child";
if (!CreateProcessW(nullptr, commandLine, nullptr, nullptr, TRUE, 0, nullptr, nullptr, &startupInfo, &processInfo)) {
printf("CreateProcessW - %lu\n", GetLastError());
return 0;
}
Sleep(1000);
DWORD threadId = 0;
HANDLE thread = CreateRemoteThread(processInfo.hProcess, nullptr, 0, nullptr, nullptr, 0, &threadId);
printf("HANDLE=%p threadId=%lu LastError=%d\n", thread, threadId, GetLastError());
auto NtTerminateProcess = reinterpret_cast<DWORD(*)(HANDLE, DWORD)>(GetProcAddress(GetModuleHandleW(L"ntdll.dll"), "NtTerminateProcess"));
printf("NtTerminateProcess=0x%lx\n", NtTerminateProcess(processInfo.hProcess, 0));
}
else {
printf("Hello from child\n");
if constexpr (!kReproduceError) {
Sleep(2000);
}
LoadLibraryW(L"DllIsTerminating.dll");
}
return 0;
}
// DllIsTerminating.dll
#include <windows.h>
#include <stdio.h>
BOOL APIENTRY DllMain( HMODULE hModule, DWORD ul_reason_for_call, LPVOID lpReserved)
{
switch (ul_reason_for_call)
{
case DLL_PROCESS_ATTACH:
case DLL_THREAD_ATTACH:
case DLL_THREAD_DETACH:
break;
case DLL_PROCESS_DETACH:
printf("I WILL NEVER TERMINATE\n");
while (1) {
}
break;
}
return TRUE;
}
The output is:
Hello from parent
Hello from child
I WILL NEVER TERMINATE
HANDLE=0000000000000000 threadId=0 LastError=5
NtTerminateProcess=0x0
Comment 6•2 years ago
|
||
Windows distinguishes between an exiting process ("clean exit") and a terminating process ("forced exit"). Trying to create a thread (remotely or not) in an exiting process fails with STATUS_PROCESS_IS_TERMINATING
(converted to ERROR_ACCESS_DENIED
) even though strictly speaking the process is currently exiting, not terminating. However calling NtTerminateProcess
on an exiting process succeeds and makes the process a terminating one, so for example calling NtTerminateProcess
again right after the first call fails with STATUS_PROCESS_IS_TERMINATING
. Since creating new threads is not an option for an exiting process, all we can hope to do with these processes is to either terminate them or interact with the only remaining thread in the process.
So I think our options are as follows:
- for processes that may be hanging on shutdown code that runs before
ExitProcess
is called, we can callCreateRemoteThread
like with the current patch; - that may fail because the process has actually already reached
ExitProcess
and is thus exiting, in which case:- if it is safe to assume that all DLLs'
DLL_PROCESS_DETACH
handlers cannot block, we can simply let these processes end their lives peacefully; - if it is not safe to assume this, then:
- if we do not care to know where exiting processes get stuck, we can call
TerminateProcess
to just force the process to terminate; - if we do care, we could try to generate a dump based on the context of the remaining thread in the process, so the call stack will reflect the location where the thread is currently stuck.
- if we do not care to know where exiting processes get stuck, we can call
- if it is safe to assume that all DLLs'
I believe we are heading towards the last option, so here is how I think it can be achieved: enumerate threads to find the only remaining thread in the process; suspend it; get its context; then generate a dump from the captured context. In fact we seem to be doing something very similar here and we should try using this function and see if it works with exiting processes.
If it works we may actually want to consider using this function instead of the CreateRemoteThread
solution for (some?) non-exiting processes as well; the main difference between the two approaches seems to be that with WriteMinidumpForChild
we would have to decide of an arbitrary thread to blame, which can be difficult if there are many threads, and then the same hang might show up with different stacks; whereas with CreateRemoteThread
the blamed thread is the one we create so the call stacks would always be the same, including for different hangs, which is quite bad for triage.
(Edited: It looks much simpler to generate the dump from the captured context directly, if that works, rather than trying to crash the exiting process, which I was considering as a possibility at first.)
Updated•2 years ago
|
Description
•