IOInterposer fails assertion in UnregisterCurrentThread for new test
Categories
(Core :: XPCOM, defect)
Tracking
()
Tracking | Status | |
---|---|---|
firefox67 | --- | affected |
People
(Reporter: gregtatum, Unassigned)
References
(Blocks 1 open bug)
Details
Attachments
(2 files, 3 obsolete files)
I am working on Bug 1527737, and I am writing a new xpcshell test for the profiler and collecting FileIO markers. The test is failing only on debug builds, and it appears to be an error in the logic of the IOInterposer.
I set some breakpoints in RegisterCurrentThread, and UnregisterCurrentThread and observed the following behavior, which appears to be a bug.
thread #2: tid = 0x5790ed XUL`mozilla::IOInterposer::RegisterCurrentThread(aIsMainThread=false) at IOInterposer.cpp:501, name = 'Gecko_IOThread', stop reason = step over
sThreadLocalDataInitialized == false, do not create local storage.
thread #6: tid = 0x57961c XUL`mozilla::IOInterposer::RegisterCurrentThread(aIsMainThread=false) at IOInterposer.cpp:501, name = 'Socket Thread', stop reason = step over
sThreadLocalDataInitialized == false, do not create local storage.
thread #1: tid = 0x5790d0 XUL`mozilla::IOInterposer::RegisterCurrentThread(aIsMainThread=true) at IOInterposer.cpp:505, queue = 'com.apple.main-thread', stop reason = step over
sThreadLocalDataInitialized == true, create local storage.
thread #21: tid = 0x579b4d XUL`mozilla::IOInterposer::RegisterCurrentThread(aIsMainThread=false) at IOInterposer.cpp:505, name = 'Timer Thread', stop reason = step over
sThreadLocalDataInitialized == true, create local storage.
thread #6: tid = 0x57961c XUL`mozilla::IOInterposer::UnregisterCurrentThread() at IOInterposer.cpp:513, name = 'Socket Thread', stop reason = breakpoint 4.1
sThreadLocalDataInitialized == true, try to destruct
Failing try run on debug only: https://treeherder.mozilla.org/#/jobs?repo=try&revision=2cd5a551bd3c2df463aec59e3e0f04b63e540d52
Reporter | ||
Comment 1•6 years ago
|
||
Ok, I reproduced this assertion failure in a debug build:
STR:
- Run a debug build
- Go to https://profiler.firefox.com/
- Install the Gecko Profiler Add-on and open the menu
- Enable "Main thread IO" under features
- Click "Start"
- Click "Discard and Stop"
- Wait a second and the tab crashes
Assertion failure: curThreadData, at /Users/gregorytatum/dev/gecko/xpcom/build/IOInterposer.cpp:505
#01: nsThread::ThreadFunc(void*)[/Users/gregorytatum/dev/gecko/obj-ff-dbg/dist/NightlyDebug.app/Contents/MacOS/XUL +0x1472da]
#02: _pt_root[/Users/gregorytatum/dev/gecko/obj-ff-dbg/dist/NightlyDebug.app/Contents/MacOS/libnss3.dylib +0x16e8b0]
#03: _pthread_body[/usr/lib/system/libsystem_pthread.dylib +0x3305]
#04: _pthread_start[/usr/lib/system/libsystem_pthread.dylib +0x626f]
Looking in searchfox, it appears that IOInterposer::RegisterCurrentThread is being called before initialization, so we are missing those registrations.
I'm not sure what the desired behavior is, should we ignore this, or somehow keep a list of these, and only register them once the interposer is initiated?
Reporter | ||
Comment 2•6 years ago
|
||
Markus: do you have thoughts on how we should approach this? I've only been looking at this from the FileIO perspective, but there is other IO types that are being interposed here. I'm not sure who's consuming that data without digging into it further.
Reporter | ||
Comment 3•6 years ago
|
||
Reporter | ||
Comment 4•6 years ago
|
||
Ok, I took another read-through today, and wrote a patch that retains references for the threads before initialization. This allows my tests from Bug 1527737 to pass, and my STR does not trigger the assertion anymore. The other approach I can think of is to just drop the assertion. :froydnj I flagged you for review, as I saw your review flag on a lot of pieces here. Let me know if there is someone I can redirect it to.
Reporter | ||
Comment 5•6 years ago
|
||
Add an RAII guarded initialization for the IO interposer to the
initialization process for xpc shell tests. This ensures that whenever
an xpcshell tests users the IOInterposer, that it will correctly
catch all registered threads, and will not miss any.
Depends on D20416
Reporter | ||
Comment 6•6 years ago
|
||
Add an RAII guarded initialization for the IO interposer to the
initialization process for content processes. This ensures that whenever
a content process uses the IOInterposer, that it will correctly catch
all registered threads, and will not miss any.
Depends on D20732
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Updated•6 years ago
|
Reporter | ||
Comment 7•6 years ago
|
||
Add an RAII guarded initialization for the IO interposer to the
initialization process for xpc shell tests. This ensures that whenever
an xpcshell tests uses the IOInterposer, that it will correctly
catch all registered threads, and will not miss any.
Reporter | ||
Comment 8•6 years ago
|
||
Add an RAII guarded initialization for the IO interposer to the
initialization process for content processes. This ensures that whenever
a content process uses the IOInterposer, that it will correctly catch
all registered threads, and will not miss any.
Depends on D20736
Reporter | ||
Updated•6 years ago
|
Reporter | ||
Comment 9•6 years ago
|
||
:froydnj Also, assuming this is the approach we go with, do you have a recommended try run for this? My normal aggressive try run for a profiler change is:
mach try -b do -p all -u xpcshell,gtest,mochitest-dt -t none
Comment 10•6 years ago
|
||
(In reply to Greg Tatum [:gregtatum] from comment #9)
:froydnj Also, assuming this is the approach we go with, do you have a recommended try run for this? My normal aggressive try run for a profiler change is:
mach try -b do -p all -u xpcshell,gtest,mochitest-dt -t none
My usual try run is -b do -p all -u all[x64] -t none
to hit all the Linux x86-64 tests, but I think the changes under discussion here do warrant a larger set of tests. I think I'd maybe just run all mochitests for a little more coverage, and I'd possibly run all of the talos tests, just to make sure there aren't weird interactions with IOInterposer and our performance tests.
Reporter | ||
Comment 11•6 years ago
|
||
Reporter | ||
Comment 12•6 years ago
|
||
Reporter | ||
Comment 13•6 years ago
|
||
It turns out sMasterList was showing up as a leak, so I ended up not using the RAII guard, and manually changed the call to IOInterposer::Clear
to be earlier in the child process shutdown process. It's passing locally now, so doing a quick full debug build and try run, before re-requesting review.
Reporter | ||
Comment 14•6 years ago
|
||
Reporter | ||
Comment 15•6 years ago
|
||
I'm getting failures with leak checks now in Windows debug, plus some other mysterious assertion errors around the Windows implementation of PoisonIOInterposer. I'm going to try narrow the scope of this patch a bit, and attempt to land the xpcshell part only, then follow-up with more investigation into the windows error with adding the initialization to the content process.
15:12:41 INFO - PID 9568 | Assertion failure: false (MOZ_ASSERT_UNREACHABLE: Unrecognized opcode sequence), at z:/build/build/src/obj-firefox/dist/include/mozilla/interceptor/PatcherDetour.h:840
15:13:01 INFO - PID 9568 | #01: static int mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared<mozilla::interceptor::MMPolicyInProcess,128> >,long (*)(void **, unsigned long, _OBJECT_ATTRIBUTES *, _IO_STATUS_BLOCK *, _LARGE_INTEGER *, unsigned long, unsigned long, unsigned long, unsigned long, void *, unsigned long)>::InitOnceCallback(union _RTL_RUN_ONCE *, void *, void * *) [mozglue/misc/nsWindowsDllInterceptor.h:194]
15:13:01 INFO -
15:13:01 INFO - PID 9568 | #02: ntdll.dll + 0x2285f
15:13:01 INFO -
15:13:01 INFO - PID 9568 | #03: KERNELBASE.dll + 0x539ca
15:13:01 INFO -
15:13:01 INFO - PID 9568 | #04: mozilla::InitPoisonIOInterposer() [xpcom/build/PoisonIOInterposerWin.cpp:352]
15:13:01 INFO -
15:13:01 INFO - PID 9568 | #05: mozilla::IOInterposer::Init() [xpcom/build/IOInterposer.cpp:406]
15:13:01 INFO -
15:13:01 INFO - PID 9568 | #06: XRE_InitChildProcess(int,char * * const,XREChildData const *) [toolkit/xre/nsEmbedFunctions.cpp:382]
15:13:01 INFO -
15:13:02 INFO - PID 9568 | #07: NS_internal_main(int,char * *,char * *) [browser/app/nsBrowserApp.cpp:265]
15:13:02 INFO -
15:13:02 INFO - PID 9568 | #08: wmain [toolkit/xre/nsWindowsWMain.cpp:129]
15:13:02 INFO -
15:13:02 INFO - PID 9568 | #09: static int __scrt_common_main_seh() [f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:288]
15:13:02 INFO -
15:13:02 INFO - PID 9568 | #10: KERNEL32.DLL + 0x13034
15:13:02 INFO -
15:13:02 INFO - PID 9568 | #11: ntdll.dll + 0x71461
15:13:02 INFO -
15:13:02 INFO - PID 9568 | #01: static int mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared<mozilla::interceptor::MMPolicyInProcess,128> >,long (*)(void **, unsigned long, _OBJECT_ATTRIBUTES *, _IO_STATUS_BLOCK *, _LARGE_INTEGER *, unsigned long, unsigned long, unsigned long, unsigned long, void *, unsigned long)>::InitOnceCallback(union _RTL_RUN_ONCE *, void *, void * *) [mozglue/misc/nsWindowsDllInterceptor.h:194]
15:13:02 INFO -
15:13:02 INFO - PID 9568 | #02: ntdll.dll + 0x2285f
15:13:02 INFO -
15:13:02 INFO - PID 9568 | #03: KERNELBASE.dll + 0x539ca
15:13:02 INFO -
15:13:02 INFO - PID 9568 | #04: mozilla::InitPoisonIOInterposer() [xpcom/build/PoisonIOInterposerWin.cpp:352]
15:13:02 INFO -
15:13:02 INFO - PID 9568 | #05: mozilla::IOInterposer::Init() [xpcom/build/IOInterposer.cpp:406]
15:13:02 INFO -
15:13:02 INFO - PID 9568 | #06: XRE_InitChildProcess(int,char * * const,XREChildData const *) [toolkit/xre/nsEmbedFunctions.cpp:382]
15:13:02 INFO -
15:13:02 INFO - PID 9568 | #07: NS_internal_main(int,char * *,char * *) [browser/app/nsBrowserApp.cpp:265]
15:13:02 INFO -
15:13:02 INFO - PID 9568 | #08: wmain [toolkit/xre/nsWindowsWMain.cpp:129]
15:13:02 INFO -
15:13:02 INFO - PID 9568 | #09: static int __scrt_common_main_seh() [f:/dd/vctools/crt/vcstartup/src/startup/exe_common.inl:288]
15:13:02 INFO -
15:13:02 INFO - PID 9568 | #10: KERNEL32.DLL + 0x13034
15:13:02 INFO -
15:13:02 INFO - PID 9568 | #11: ntdll.dll + 0x71461
Here is the XPC shell try run: https://treeherder.mozilla.org/#/jobs?repo=try&revision=daf79d5536f88855b4544cde827e944a59414701
Comment 16•6 years ago
|
||
Comment 17•6 years ago
|
||
bugherder |
Reporter | ||
Comment 18•6 years ago
|
||
I'm re-opening this, as I only landed the xpc shell part of the patches, and need to still get the content process working on Windows.
Comment 19•6 years ago
|
||
(In reply to Greg Tatum [:gregtatum] from comment #15)
15:12:41 INFO - PID 9568 | Assertion failure: false (MOZ_ASSERT_UNREACHABLE: Unrecognized opcode sequence), at z:/build/build/src/obj-firefox/dist/include/mozilla/interceptor/PatcherDetour.h:840
15:13:01 INFO - PID 9568 | #01: static int mozilla::interceptor::FuncHook<mozilla::interceptor::WindowsDllInterceptor<mozilla::interceptor::VMSharingPolicyShared<mozilla::interceptor::MMPolicyInProcess,128> >,long (*)(void **, unsigned long, _OBJECT_ATTRIBUTES *, _IO_STATUS_BLOCK *, _LARGE_INTEGER *, unsigned long, unsigned long, unsigned long, unsigned long, void *, unsigned long)>::InitOnceCallback(union _RTL_RUN_ONCE *, void *, void * *) [mozglue/misc/nsWindowsDllInterceptor.h:194]
15:13:01 INFO -
15:13:01 INFO - PID 9568 | #02: ntdll.dll + 0x2285f
15:13:01 INFO -
15:13:01 INFO - PID 9568 | #03: KERNELBASE.dll + 0x539ca
15:13:01 INFO -
15:13:01 INFO - PID 9568 | #04: mozilla::InitPoisonIOInterposer() [xpcom/build/PoisonIOInterposerWin.cpp:352]
15:13:01 INFO -
15:13:01 INFO - PID 9568 | #05: mozilla::IOInterposer::Init() [xpcom/build/IOInterposer.cpp:406]
15:13:01 INFO -
15:13:01 INFO - PID 9568 | #06: XRE_InitChildProcess(int,char * * const,XREChildData const *) [toolkit/xre/nsEmbedFunctions.cpp:382]
Aaron, the hg history suggests you touched the code triggering this assertion in bug 1432653. Do you have suggestions about what needs to be done to fix it?
Comment 20•6 years ago
|
||
This is nsWindowsDllInterceptor
telling us that the instruction sequence at the beginning of the function being hooked is something that we do not yet support.
Seeing a disassembly (or even better, a minidump) of the function in question would be helpful here.
Assuming the line numbers in PoisonIOInterposerWin.cpp
haven't changed, NtReadFile
would be the function whose disassembly I'd want to look at (preferably side by side with the bytes of the function).
Comment 21•6 years ago
|
||
(Also knowing the version number for ntdll.dll would be useful)
Comment 22•6 years ago
|
||
I'm seeing this assertion on this try job: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b83546bbafd008708e18df6602cec454d58989f2&searchStr=Windows%2C10%2Cx64%2Cdebug&selectedJob=231325635
The minidump c96d5b4b-fe2e-43dc-9569-219142cd4bfc.dmp might be what you need.
ntdll.dll version is 6.2.17134.254
Reporter | ||
Updated•6 years ago
|
Updated•6 years ago
|
Comment 23•6 years ago
|
||
(In reply to Florian Quèze [:florian] from comment #22)
I'm seeing this assertion on this try job: https://treeherder.mozilla.org/#/jobs?repo=try&revision=b83546bbafd008708e18df6602cec454d58989f2&searchStr=Windows%2C10%2Cx64%2Cdebug&selectedJob=231325635
The minidump c96d5b4b-fe2e-43dc-9569-219142cd4bfc.dmp might be what you need.
ntdll.dll version is 6.2.17134.254
Reporter | ||
Comment 24•6 years ago
|
||
I'm removing the needinfo from :aklotz, and moving it to Bug 1533808.
Reporter | ||
Comment 25•6 years ago
|
||
Also, I've landed the XPC shell part of this patch, but am waiting on Bug 1533808 before I can land the content process part of the patch.
Comment 26•6 years ago
|
||
Comment 27•6 years ago
|
||
Backed out changeset 3719f1901ffe (Bug 1529125) for mda assertion failures CLOSED TREE
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=237293896&repo=autoland&lineNumber=9180
Comment 28•6 years ago
|
||
Apparently we need to do something specific for the rdd (media decoding) process.
Removing the needinfo on Greg as he is traveling this week and I don't expect him to have time for this.
Comment 29•6 years ago
|
||
Comment 30•6 years ago
|
||
The assertion on Windows is:
GECKO(880) | Assertion failure: ok, at z:/build/build/src/obj-firefox/dist/include/mozilla/interceptor/Trampoline.h:350
PROCESS-CRASH | Main app process exited normally | application crashed [@ mozilla::interceptor::VMSharingPolicyShared<mozilla::interceptor::MMPolicyInProcess,128>::Items()]
Comment 31•6 years ago
|
||
Landing the patch here makes the mda test jobs on Windows Debug 64bit perma fail with the assertion in comment 30. I don't understand what the assertion means, so I'm afraid I'll need more help to move this forward.
Comment 32•6 years ago
|
||
I wonder if, given David's new patch, we end up in a scenario where we have reserved trampoline space but never end up committing anything. Perhaps that is why VirtualProtect
fails.
David, thoughts?
Comment 33•6 years ago
|
||
I don't see a dangling Commit but I don't see another cause. Still looking.
Comment 34•6 years ago
|
||
(In reply to David Parks (dparks) [:handyman] from comment #33)
I don't see a dangling Commit but I don't see another cause. Still looking.
Hi David,
It looks like mozilla-group:active_scm_level_1 has scope generic-worker:allow-rdp:aws-provisioner-v1/gecko-t-win*
so section 5.1 of the interactive loaner process shouldn't require any action. Make sure to assign the scopes generic-worker:allow-rdp:aws-provisioner-v1/gecko-t-win10-64-gpu
(section 4) and generic-worker:os-group:aws-provisioner-v1/gecko-t-win10-64-gpu/Administrators
(section 5.2) to the task, and to add the osGroups
property (also section 5.2). Please report if you have any problems, providing the list of scopes you see under this page and the ClientId
you are using (also shown on the same page).
Many thanks!
Updated•5 years ago
|
Comment 35•5 years ago
|
||
Comment 36•5 years ago
|
||
Backed out for talos xperf crashes on Windows:
https://hg.mozilla.org/integration/mozilla-inbound/rev/0f8f181540096810ec288bcb0954a798c05491ee
Failure log: https://treeherder.mozilla.org/logviewer.html#?job_id=243672292&repo=mozilla-inbound
16:36:42 INFO - TEST-INFO | started process 5580 (Z:\task_1556639211\build\application\firefox\firefox -wait-for-browser -no-deelevate -profile c:\users\task_1556639211\appdata\local\temp\tmpujv3fx\profile)
16:36:56 INFO - PID 5580 | __WARNTimeout (1/3) exceeded on http://127.0.0.1:49333/tests/tp5n/thesartorialist.blogspot.com/thesartorialist.blogspot.com/index.html__WARN
16:37:06 INFO - PID 5580 | __WARNTimeout (2/3) exceeded on http://127.0.0.1:49333/tests/tp5n/thesartorialist.blogspot.com/thesartorialist.blogspot.com/index.html__WARN
16:37:16 INFO - PID 5580 | __FAILTimeout in tp5n__FAIL
16:37:16 INFO - PID 5580 | __FAILTimeout (3/3) exceeded on http://127.0.0.1:49333/tests/tp5n/thesartorialist.blogspot.com/thesartorialist.blogspot.com/index.html__FAIL
16:37:16 INFO - PID 5580 | console.error: "You should not call finishTest without having first initted the Profiler"
17:06:42 INFO - Timeout waiting for test completion; killing browser...
17:06:44 INFO - mozcrash Downloading symbols from: https://queue.taskcluster.net/v1/task/ODjp3r4mQ1KXMoxFh-b7aA/artifacts/public/build/target.crashreporter-symbols.zip
17:06:49 INFO - mozcrash Copy/paste: Z:\task_1556639211\build\win32-minidump_stackwalk.exe c:\users\task_1556639211\appdata\local\temp\tmpujv3fx\profile\minidumps\893003da-9996-4f87-8e36-7119db37a5a0.dmp c:\users\task_1556639211\appdata\local\temp\tmpjspbc6
17:06:51 INFO - mozcrash Saved minidump as Z:\task_1556639211\build\blobber_upload_dir\893003da-9996-4f87-8e36-7119db37a5a0.dmp
17:06:51 INFO - PROCESS-CRASH | tp5n | application crashed [unknown top frame]
17:06:51 INFO - Crash dump filename: c:\users\task_1556639211\appdata\local\temp\tmpujv3fx\profile\minidumps\893003da-9996-4f87-8e36-7119db37a5a0.dmp
17:06:51 INFO - Operating system: Windows NT
17:06:51 INFO - 6.1.7601 Service Pack 1
17:06:51 INFO - CPU: x86
17:06:51 INFO - GenuineIntel family 6 model 63 stepping 2
17:06:51 INFO - 8 CPUs
17:06:51 INFO - GPU: UNKNOWN
17:06:51 INFO - No crash
17:06:51 INFO - Process uptime: 1800 seconds
17:06:51 INFO - Thread 0
17:06:51 INFO - 0 ntdll.dll!KiFastSystemCallRet + 0x0
17:06:51 INFO - eip = 0x779370b4 esp = 0x001cf6f8 ebp = 0x001cf764 ebx = 0x00000000
17:06:51 INFO - esi = 0x00000038 edi = 0x00000000 eax = 0x000000f8 ecx = 0x804ddf8d
17:06:51 INFO - edx = 0x001cf6bc efl = 0x00000246
17:06:51 INFO - Found by: given as instruction pointer in context
17:06:51 INFO - 1 ntdll.dll!ZwWaitForSingleObject + 0xc
17:06:51 INFO - eip = 0x77936a24 esp = 0x001cf6fc ebp = 0x001cf764 ebx = 0x00000000
17:06:51 INFO - Found by: call frame info
17:06:51 INFO - 2 KERNELBASE.dll!ResCKeGetCacheIndices + 0x1f
17:06:51 INFO - eip = 0x75ac179c esp = 0x001cf700 ebp = 0x001cf764 ebx = 0x00000000
17:06:51 INFO - Found by: call frame info
17:06:51 INFO - 3 KERNELBASE.dll!_except_handler4 + 0x78
17:06:51 INFO - eip = 0x75ade165 esp = 0x001cf75c ebp = 0x001cf77c
17:06:51 INFO - Found by: stack scanning
17:06:51 INFO - 4 kernel32.dll! ?? ?? ::FNODOBFM::`string' + 0x48
17:06:51 INFO - eip = 0x7785baa2 esp = 0x001cf784 ebp = 0x001cf790 ebx = 0x001cf77c
17:06:51 INFO - Found by: call frame info
17:06:51 INFO - 5 firefox.exe!mozilla::LauncherMain(int &,wchar_t * * const,mozilla::StaticXREAppData const &) [LauncherProcessWin.cpp:46179401c77b3f40291dea4cf636314399785c14 : 350 + 0x9]
17:06:51 INFO - eip = 0x010b431d esp = 0x001cf798 ebp = 0x001cf918
17:06:51 INFO - Found by: previous frame's frame pointer
17:06:51 INFO - 6 firefox.exe!wmain [nsWindowsWMain.cpp:46179401c77b3f40291dea4cf636314399785c14 : 101 + 0xf]
17:06:51 INFO - eip = 0x010b1038 esp = 0x001cf920 ebp = 0x001cf948 ebx = 0x7ffdd000
17:06:51 INFO - esi = 0x64fff0b0 edi = 0x002a1ed8
17:06:51 INFO - Found by: call frame info
17:06:51 INFO - 7 firefox.exe!static int __scrt_common_main_seh() [exe_common.inl : 288 + 0x1c]
17:06:51 INFO - eip = 0x010ea81c esp = 0x001cf950 ebp = 0x001cf990 ebx = 0x7ffdd000
17:06:51 INFO - esi = 0x64fff0b0 edi = 0x002a1ed8
17:06:51 INFO - Found by: call frame info
17:06:51 INFO - 8 kernel32.dll!RtlMultiByteToUnicodeN + 0x73c
17:06:51 INFO - eip = 0x77863c45 esp = 0x001cf998 ebp = 0x001cf99c
17:06:51 INFO - Found by: call frame info
17:06:51 INFO - 9 ntdll.dll!__RtlUserThreadStart + 0x27
17:06:51 INFO - eip = 0x779537f5 esp = 0x001cf9a4 ebp = 0x001cf9dc
17:06:51 INFO - Found by: previous frame's frame pointer
17:06:51 INFO - 10 ntdll.dll!_RtlUserThreadStart + 0x1b
17:06:51 INFO - eip = 0x779537c8 esp = 0x001cf9e4 ebp = 0x001cf9f4
17:06:51 INFO - Found by: call frame info
17:06:51 INFO - Loaded modules:
17:06:51 INFO - 0x010b0000 - 0x0112efff firefox.exe 68.0.0.7059 (main)
17:06:51 INFO - 0x64ef0000 - 0x6500dfff ucrtbase.dll 10.0.17134.12
17:06:51 INFO - 0x670a0000 - 0x6718afff dbghelp.dll 6.1.7601.17514
17:06:51 INFO - 0x67620000 - 0x67622fff api-ms-win-crt-utility-l1-1-0.dll 10.0.17134.12
17:06:51 INFO - 0x67630000 - 0x67632fff api-ms-win-crt-environment-l1-1-0.dll 10.0.17134.12
17:06:51 INFO - 0x67640000 - 0x67642fff api-ms-win-crt-filesystem-l1-1-0.dll 10.0.17134.12
17:06:51 INFO - 0x67650000 - 0x676bffff msvcp140.dll 14.15.26706.0
17:06:51 INFO - 0x67730000 - 0x67732fff api-ms-win-crt-time-l1-1-0.dll 10.0.17134.12
17:06:51 INFO - 0x67740000 - 0x67744fff api-ms-win-crt-math-l1-1-0.dll 10.0.17134.12
17:06:51 INFO - 0x67750000 - 0x67752fff api-ms-win-crt-locale-l1-1-0.dll 10.0.17134.12
17:06:51 INFO - 0x67760000 - 0x677a1fff mozglue.dll 68.0.0.7059
17:06:51 INFO - 0x69150000 - 0x69153fff api-ms-win-crt-convert-l1-1-0.dll 10.0.17134.12
17:06:51 INFO - 0x69160000 - 0x69163fff api-ms-win-crt-stdio-l1-1-0.dll 10.0.17134.12
17:06:51 INFO - 0x69170000 - 0x69172fff api-ms-win-crt-heap-l1-1-0.dll 10.0.17134.12
17:06:51 INFO - 0x69180000 - 0x69193fff VCRUNTIME140.dll 14.15.26706.0
17:06:51 INFO - 0x6b6a0000 - 0x6b6a3fff api-ms-win-crt-string-l1-1-0.dll 10.0.17134.12
17:06:51 INFO - 0x6b6b0000 - 0x6b6b2fff api-ms-win-core-synch-l1-2-0.dll 10.0.17134.12
17:06:51 INFO - 0x71030000 - 0x71032fff api-ms-win-core-timezone-l1-1-0.dll 10.0.17134.12
17:06:51 INFO - 0x717c0000 - 0x717c2fff api-ms-win-core-file-l2-1-0.dll 10.0.17134.12
17:06:51 INFO - 0x71920000 - 0x71922fff api-ms-win-core-processthreads-l1-1-1.dll 10.0.17134.12
17:06:51 INFO - 0x72d20000 - 0x72d22fff api-ms-win-core-file-l1-2-0.dll 10.0.17134.12
17:06:51 INFO - 0x73230000 - 0x73233fff api-ms-win-crt-runtime-l1-1-0.dll 10.0.17134.12
17:06:59 INFO - 0x734a0000 - 0x734a2fff api-ms-win-core-localization-l1-2-0.dll 10.0.17134.12
17:06:59 INFO - 0x74f30000 - 0x74f38fff version.dll 6.1.7600.16385
17:06:59 INFO - 0x75940000 - 0x7598bfff apphelp.dll 6.1.7601.17514
17:06:59 INFO - 0x75ac0000 - 0x75b09fff KERNELBASE.dll 6.1.7601.17514
17:06:59 INFO - 0x76fa0000 - 0x7704bfff msvcrt.dll 7.0.7600.16385
17:06:59 INFO - 0x77050000 - 0x77068fff sechost.dll 6.1.7600.16385
17:06:59 INFO - 0x77070000 - 0x77110fff rpcrt4.dll 6.1.7601.17514
17:06:59 INFO - 0x771f0000 - 0x7728ffff advapi32.dll 6.1.7601.17514
17:06:59 INFO - 0x77810000 - 0x778e3fff kernel32.dll 6.1.7601.17514
17:06:59 INFO - 0x778f0000 - 0x77a2bfff ntdll.dll 6.1.7601.17514
17:06:59 INFO - rmtree() failed for "('c:\users\task_1556639211\appdata\local\temp\tmpujv3fx',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
17:06:59 INFO - rmtree() failed for "('c:\users\task_1556639211\appdata\local\temp\tmpujv3fx',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
17:06:59 INFO - rmtree() failed for "('c:\users\task_1556639211\appdata\local\temp\tmpujv3fx',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
17:06:59 INFO - rmtree() failed for "('c:\users\task_1556639211\appdata\local\temp\tmpujv3fx',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
17:06:59 INFO - rmtree() failed for "('c:\users\task_1556639211\appdata\local\temp\tmpujv3fx',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
17:06:59 INFO - Exception while removing profile directory: c:\users\task_1556639211\appdata\local\temp\tmpujv3fx
17:06:59 INFO - [Error 32] The process cannot access the file because it is being used by another process: 'c:\users\task_1556639211\appdata\local\temp\tmpujv3fx\profile\cert9.db'
17:06:59 INFO - TEST-UNEXPECTED-ERROR | tp5n | Found crashes after test run, terminating test
17:06:59 ERROR - Traceback (most recent call last):
17:06:59 INFO - File "Z:\task_1556639211\build\tests\talos\talos\run_tests.py", line 300, in run_tests
17:06:59 INFO - talos_results.add(mytest.runTest(browser_config, test))
17:06:59 INFO - File "Z:\task_1556639211\build\tests\talos\talos\ttest.py", line 64, in runTest
17:06:59 INFO - return self._runTest(browser_config, test_config, setup)
17:06:59 INFO - File "Z:\task_1556639211\build\tests\talos\talos\ttest.py", line 219, in _runTest
17:06:59 INFO - test_config['name'])
17:06:59 INFO - File "Z:\task_1556639211\build\tests\talos\talos\ttest.py", line 48, in check_for_crashes
17:06:59 INFO - raise TalosCrash('Found crashes after test run, terminating test')
17:06:59 INFO - TalosCrash: Found crashes after test run, terminating test
Reporter | ||
Updated•5 years ago
|
Comment 37•5 years ago
|
||
When running that test locally with the patch applied, I can reproduce the failure and see this in the terminal:
13:20:18 INFO - PID 8912 | ###!!! [Parent][MessageChannel::SendAndWait] Error: Channel error: cannot send/recv
13:20:18 INFO - PID 8912 |
13:20:18 INFO - PID 8912 | [GFX1-]: Failed to initialize GPU process
13:20:18 INFO - PID 8912 |
13:20:18 INFO - PID 8912 | ###!!! [Parent][MessageChannel] Error: (msgtype=0x4C001A,name=PGPU::Msg_ShutdownVR) Channel error: cannot send/recv
Comment 38•5 years ago
|
||
(In reply to Florian Quèze [:florian] from comment #37)
I tried making the patch apply only to content processes rather than all child processes. That made the messages in comment 37 disappear, but the tests still failed, without displaying any page, so I think we failed to start content processes. I wonder if xperf deadlocks with the IOInterposer.
Comment 39•5 years ago
|
||
(In reply to Florian Quèze [:florian] from comment #38)
I wonder if xperf deadlocks with the IOInterposer.
Sorry to bother you again with this, but I'm stuck again. Do you have any idea about how to verify this or make it actionable?
Comment 40•5 years ago
|
||
Running this tests in a debug build gives some potentially interesting output:
Lots of:
[Child 6396, Main Thread] WARNING: Resource acquired is being released in non-LIFO order; why?
: file c:/buildhg/mozilla/xpcom/threads/BlockingResourceBase.cpp, line 329
--- Mutex : mozilla::mscom::LiveSet::mMutex (currently acquired)
calling context
[stack trace unavailable]
Sometimes there's a stack:
#01: `anonymous namespace'::MainThreadIOLoggerImpl::sIOThreadFunc (c:\buildhg\mozilla\xpcom\build\MainThreadIOLogger.cpp:114)
#02: _PR_NativeRunThread (c:\buildhg\mozilla\nsprpub\pr\src\threads\combined\pruthr.c:406)
#03: pr_root (c:\buildhg\mozilla\nsprpub\pr\src\md\windows\w95thred.c:138)
#04: o_strcat_s[C:\WINDOWS\System32\ucrtbase.dll +0x1e3fe]
#05: BaseThreadInitThunk[C:\WINDOWS\System32\KERNEL32.DLL +0x14034]
#06: patched_BaseThreadInitThunk (c:\buildhg\mozilla\mozglue\build\WindowsDllBlocklist.cpp:713)
#07: RtlUserThreadStart[C:\WINDOWS\SYSTEM32\ntdll.dll +0x73691]
Assertion failure: sLogModuleManager != nullptr, at c:/buildhg/mozilla/xpcom/base/Logging.cpp:540
Several times this error:
minidump.cc:1571: ERROR: MinidumpThread has a memory region problem
If anybody wants to try to reproduce locally, the command to run this test is:
mach talos-test --suite xperf
Comment 41•5 years ago
|
||
(In reply to Florian Quèze [:florian] from comment #40)
Running this tests in a debug build gives some potentially interesting output:
Lots of:
[Child 6396, Main Thread] WARNING: Resource acquired is being released in non-LIFO order; why?
: file c:/buildhg/mozilla/xpcom/threads/BlockingResourceBase.cpp, line 329
--- Mutex : mozilla::mscom::LiveSet::mMutex (currently acquired)
calling context
[stack trace unavailable]
This definitely supports the theory if there being a deadlock. If you disable this line in theory you'll get a stack, let me know if that doesn't build though. Unfortunately that code tends to bitrot.
Sometimes there's a stack:
#01: `anonymous namespace'::MainThreadIOLoggerImpl::sIOThreadFunc (c:\buildhg\mozilla\xpcom\build\MainThreadIOLogger.cpp:114)
#02: _PR_NativeRunThread (c:\buildhg\mozilla\nsprpub\pr\src\threads\combined\pruthr.c:406)
#03: pr_root (c:\buildhg\mozilla\nsprpub\pr\src\md\windows\w95thred.c:138)
#04: o_strcat_s[C:\WINDOWS\System32\ucrtbase.dll +0x1e3fe]
#05: BaseThreadInitThunk[C:\WINDOWS\System32\KERNEL32.DLL +0x14034]
#06: patched_BaseThreadInitThunk (c:\buildhg\mozilla\mozglue\build\WindowsDllBlocklist.cpp:713)
#07: RtlUserThreadStart[C:\WINDOWS\SYSTEM32\ntdll.dll +0x73691]Assertion failure: sLogModuleManager != nullptr, at c:/buildhg/mozilla/xpcom/base/Logging.cpp:540
This implies that MainThreadIOLoggerImpl
is trying to log before we setup the log module manager. We should probably try to make sure logging / profiling is setup before this thing.
Comment 42•5 years ago
|
||
(In reply to Eric Rahm [:erahm] from comment #41)
(In reply to Florian Quèze [:florian] from comment #40)
Running this tests in a debug build gives some potentially interesting output:
Lots of:
[Child 6396, Main Thread] WARNING: Resource acquired is being released in non-LIFO order; why?
: file c:/buildhg/mozilla/xpcom/threads/BlockingResourceBase.cpp, line 329
--- Mutex : mozilla::mscom::LiveSet::mMutex (currently acquired)
calling context
[stack trace unavailable]This definitely supports the theory if there being a deadlock. If you disable this line in theory you'll get a stack, let me know if that doesn't build though.
Thanks! When I disable that line, the first occurence fails to generate a stack, I get:
[Child 1672, Main Thread] WARNING: Resource acquired is being released in non-LIFO order; why?
: file c:/buildhg/mozilla/xpcom/threads/BlockingResourceBase.cpp, line 329
--- Mutex : mozilla::mscom::LiveSet::mMutex (currently acquired)
calling context
ERROR: SymInitialize: Param´┐¢tre incorrect.
#01: ??? (???:???)
(and 23 more similar broken stack frames)
Then all the other cases that previous didn't have a stack now show a stack in a11y code, looking like this:
#01: mozilla::a11y::CreateHolderFromAccessible (c:\buildhg\mozilla\accessible\ipc\win\COMPtrTypes.cpp:50)
#02: mozilla::a11y::DocAccessible::DoInitialUpdate (c:\buildhg\mozilla\accessible\generic\DocAccessible.cpp:1372)
#03: mozilla::a11y::DocAccessibleWrap::DoInitialUpdate (c:\buildhg\mozilla\accessible\windows\msaa\DocAccessibleWrap.cpp:135)
#04: mozilla::a11y::NotificationController::WillRefresh (c:\buildhg\mozilla\accessible\base\NotificationController.cpp:662)
#05: nsRefreshDriver::Tick (c:\buildhg\mozilla\layout\base\nsRefreshDriver.cpp:1894)
#06: mozilla::RefreshDriverTimer::TickRefreshDrivers (c:\buildhg\mozilla\layout\base\nsRefreshDriver.cpp:319)
#07: mozilla::RefreshDriverTimer::Tick (c:\buildhg\mozilla\layout\base\nsRefreshDriver.cpp:344)
#08: mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::TickRefreshDriver (c:\buildhg\mozilla\layout\base\nsRefreshDriver.cpp:708)
#09: mozilla::VsyncRefreshDriverTimer::RefreshDriverVsyncObserver::NotifyVsync (c:\buildhg\mozilla\layout\base\nsRefreshDriver.cpp:604)
#10: mozilla::layout::VsyncChild::RecvNotify (c:\buildhg\mozilla\layout\ipc\VsyncChild.cpp:67)
#11: mozilla::layout::PVsyncChild::OnMessageReceived (c:\buildhg\mozilla\obj-browser-dbg\ipc\ipdl\PVsyncChild.cpp:168)
#12: mozilla::ipc::PBackgroundChild::OnMessageReceived (c:\buildhg\mozilla\obj-browser-dbg\ipc\ipdl\PBackgroundChild.cpp:4061)
#13: mozilla::ipc::MessageChannel::DispatchAsyncMessage (c:\buildhg\mozilla\ipc\glue\MessageChannel.cpp:2152)
#14: mozilla::ipc::MessageChannel::DispatchMessage (c:\buildhg\mozilla\ipc\glue\MessageChannel.cpp:2078)
#15: mozilla::ipc::MessageChannel::RunMessage (c:\buildhg\mozilla\ipc\glue\MessageChannel.cpp:1938)
#16: mozilla::ipc::MessageChannel::MessageTask::Run (c:\buildhg\mozilla\ipc\glue\MessageChannel.cpp:1970)
#17: nsThread::ProcessNextEvent (c:\buildhg\mozilla\xpcom\threads\nsThread.cpp:1167)
#18: NS_ProcessNextEvent (c:\buildhg\mozilla\xpcom\threads\nsThreadUtils.cpp:486)
#19: nsSyncStreamListener::Available (c:\buildhg\mozilla\netwerk\base\nsSyncStreamListener.cpp:120)
#20: NS_ImplementChannelOpen (c:\buildhg\mozilla\netwerk\base\nsNetUtil.cpp:990)
#21: nsBaseChannel::Open (c:\buildhg\mozilla\netwerk\base\nsBaseChannel.cpp:619)
#22: nsMessageManagerScriptExecutor::TryCacheLoadAndCompileScript (c:\buildhg\mozilla\dom\base\nsFrameMessageManager.cpp:1289)
#23: nsMessageManagerScriptExecutor::LoadScriptInternal (c:\buildhg\mozilla\dom\base\nsFrameMessageManager.cpp:1218)
#24: mozilla::dom::BrowserChild::RecvLoadRemoteScript (c:\buildhg\mozilla\dom\ipc\BrowserChild.cpp:2154)
Not exactly what I expected. Might be a red herring. But given that there's code that opens a file at the top of the stack, it might still be related.
Comment 43•5 years ago
|
||
The only way that I could see that a11y stack triggering I/O is the first time that code path is taken - COM might be lazily loading some DLLs.
Comment 44•5 years ago
|
||
Have you tried running this under WinDbg
? If you reproduce it there, you can query the kernel for the wait chain.
Once you hit the deadlock, break into the debugger, switch to one of the stuck threads, and then type:
!analyze -hang -v
Comment 45•5 years ago
|
||
(In reply to Aaron Klotz [:aklotz] from comment #44)
Have you tried running this under
WinDbg
?
No, and I think anybody already familiar with WinDbg would do this much faster than it would take me to figure out how to do it, so any help here would be appreciated.
The steps to reproduce are simple:
- make a local build with the patch from this bug applied.
- run: mach talos-test --suite xperf
It's the child processes that deadlock, the parent process is working fine.
Reporter | ||
Updated•5 years ago
|
Updated•2 years ago
|
Description
•