Closed Bug 1265637 Opened 9 years ago Closed 4 years ago

Intermittent Assertion failure: isEmpty() (failing this assertion means this LinkedList's creator is buggy: it should have removed all this list's elements before the list's destruction), at LinkedList.h:332

Categories

(Core :: JavaScript Engine, defect)

Version: 49 Branch
Hardware: x86_64
OS: All
Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED FIXED
Tracking Status
firefox-esr68 --- wontfix
firefox-esr78 --- wontfix
firefox77 --- unaffected
firefox78 + wontfix
firefox79 + disabled
firefox80 --- wontfix

People

(Reporter: cbook, Unassigned)

References

(Regression)

Details

(Keywords: assertion, intermittent-failure, regression, Whiteboard: [stockwell unknown])

Attachments

(1 file)

In bughunter we see more and more: Assertion failure: isEmpty() (failing this assertion means this LinkedList's creator is buggy: it should have removed all this list's elements before the list's destruction), at c:\builds\moz2_slave\m-cen-w32-d-000000000000000000\build\src\obj-firefox\dist\include\mozilla/LinkedList.h:332. The problem is that this makes crash detection more complicated, and we might miss regressions because of it. Waldo, froydnj, it seems you were working on this; can you take a look? Thanks!
Flags: needinfo?(nfroyd)
Flags: needinfo?(jwalden+bmo)
(In reply to Carsten Book [:Tomcat] from comment #0)
> In bughunter we see more and more :
> Assertion failure: isEmpty() (failing this assertion means this LinkedList's
> creator is buggy: it should have removed all this list's elements before the
> list's destruction), at
> c:\builds\moz2_slave\m-cen-w32-d-000000000000000000\build\src\obj-firefox\dist\include\mozilla/LinkedList.h:332
>
> the problem on this is that this is making crash detection more complicated
> and we might miss regressions because of this
>
> Waldo, froydnj , seems you were working on this, can you take a look ?
> thanks!

Do we not have stacks to tell us which LinkedList is not getting cleared? That's the important piece of information here, and it should be reflected in the call stack.
Flags: needinfo?(nfroyd)
bc, do you have a stack? The log I saw didn't generate one, but you might have more insight into getting a stack.
Flags: needinfo?(bob)
The stacks are available from the crash reports. There are a variety of different stacks with this assertion so there may be more than one bug involved here.
Flags: needinfo?(bob)
This assertion indicates a problem with the caller, not with mfbt code. mfbt has nothing to do with anything that hits this assertion -- it's all on the callers. When filing bugs for this stuff, you should set needinfo corresponding to the location of the stack frame that *triggered* this assertion. Not against mfbt, or against froydnj or me. :-)
Flags: needinfo?(jwalden+bmo)
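The point of comment 4, that this assertion is about the caller rather than the list, can be illustrated with a minimal sketch of the ownership contract. `Node`, `IntrusiveList`, and `ownerTearsDownCorrectly` are hypothetical names for this sketch, not mfbt's actual `mozilla::LinkedList` API: elements embed their own links, the list never frees them, and whoever created the list must unlink every element before destroying it. When the destructor check fails, the interesting frame is the owner's teardown code that reached it.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical sketch, not mfbt's mozilla::LinkedList: each element embeds
// its own links, and the list never owns (or frees) its elements.
struct Node {
  Node* prev = nullptr;
  Node* next = nullptr;
  bool inList() const { return next != nullptr; }
};

class IntrusiveList {
 public:
  IntrusiveList() { sentinel_.prev = sentinel_.next = &sentinel_; }
  IntrusiveList(const IntrusiveList&) = delete;
  IntrusiveList& operator=(const IntrusiveList&) = delete;

  // The contract: by the time the list dies, its creator must have unlinked
  // every element. If this fires, the bug is in the caller, not the list.
  ~IntrusiveList() { assert(isEmpty() && "creator must remove all elements first"); }

  bool isEmpty() const { return sentinel_.next == &sentinel_; }

  void insertBack(Node* n) {
    n->prev = sentinel_.prev;
    n->next = &sentinel_;
    sentinel_.prev->next = n;
    sentinel_.prev = n;
  }

  // Unlink an element from whichever list currently holds it.
  static void remove(Node* n) {
    n->prev->next = n->next;
    n->next->prev = n->prev;
    n->prev = n->next = nullptr;
  }

  std::size_t length() const {
    std::size_t count = 0;
    for (const Node* n = sentinel_.next; n != &sentinel_; n = n->next) ++count;
    return count;
  }

 private:
  Node sentinel_;
};

// Usage: a well-behaved owner unlinks its elements before the list dies.
bool ownerTearsDownCorrectly() {
  Node a, b;  // declared first, so they outlive the list
  bool emptyBeforeDestruction = false;
  {
    IntrusiveList list;
    list.insertBack(&a);
    list.insertBack(&b);
    IntrusiveList::remove(&a);
    IntrusiveList::remove(&b);
    emptyBeforeDestruction = list.isEmpty();
  }  // ~IntrusiveList runs here and its assertion holds
  return emptyBeforeDestruction;
}
```

Skipping the two `remove` calls would trip the assertion in `~IntrusiveList`, and the stack would point at `ownerTearsDownCorrectly`, which is exactly where the fix belongs.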
Attached file stack (deleted)
Steps to reproduce: start the browser from the command line with -silent, e.g. firefox-debug/dist/bin/firefox -silent -profile /tmp/foobar
Component: General → XPCOM
I'm seeing this locally on OS X a lot; I get this assertion failure during shutdown. The stack looks similar to the one Bob posted in comment 5. It looks like a JS compartment is holding the linked list?

Thread 0 Crashed:: Dispatch queue: com.apple.main-thread
0 XUL 0x000000010e34dd42 mozilla::LinkedList<js::UnboxedLayout>::~LinkedList() + 178 (LinkedList.h:329)
1 XUL 0x000000010e3200e0 JSCompartment::~JSCompartment() + 944 (ArrayBufferObject.h:497)
2 XUL 0x000000010e368587 JS::Zone::sweepCompartments(js::FreeOp*, bool, bool) + 583 (Utility.h:249)
3 XUL 0x000000010e3688bd js::gc::GCRuntime::sweepZones(js::FreeOp*, bool) + 381 (jsgc.cpp:3898)
4 XUL 0x000000010e375987 js::gc::GCRuntime::incrementalCollectSlice(js::SliceBudget&, JS::gcreason::Reason) + 1623 (jsgc.cpp:6258)
5 XUL 0x000000010e3765bc js::gc::GCRuntime::gcCycle(bool, js::SliceBudget&, JS::gcreason::Reason) + 460 (jsgc.cpp:6448)
6 XUL 0x000000010e376f35 js::gc::GCRuntime::collect(bool, js::SliceBudget, JS::gcreason::Reason) + 741 (jsgc.cpp:6553)
7 XUL 0x000000010e366456 js::gc::GCRuntime::gc(JSGCInvocationKind, JS::gcreason::Reason) + 86 (jsgc.cpp:6614)
8 XUL 0x000000010e5da23c JSRuntime::~JSRuntime() + 844 (Runtime.cpp:426)
9 XUL 0x000000010e2a2746 JS_DestroyRuntime(JSRuntime*) + 22 (Utility.h:249)
10 XUL 0x0000000109c7bb52 mozilla::CycleCollectedJSRuntime::~CycleCollectedJSRuntime() + 226 (CycleCollectedJSRuntime.cpp:475)
11 XUL 0x000000010a69e8be XPCJSRuntime::~XPCJSRuntime() + 14 (mozalloc.h:210)
12 XUL 0x000000010a6e8b0a nsXPConnect::~nsXPConnect() + 138 (nsXPConnect.cpp:107)
13 XUL 0x000000010a6e8b4e nsXPConnect::~nsXPConnect() + 14 (mozalloc.h:210)
14 XUL 0x000000010a6e89b1 nsXPConnect::Release() + 97 (nsXPConnect.cpp:42)
15 XUL 0x000000010a6acc49 xpcModuleDtor() + 9 (XPCJSID.cpp:267)
16 XUL 0x0000000109d0798c nsTArray_Impl<nsAutoPtr<nsComponentManagerImpl::KnownModule>, nsTArrayInfallibleAllocator>::RemoveElementsAt(unsigned long, unsigned long) + 124 (nsCOMPtr.h:403)
17 XUL 0x0000000109d02cc0 nsComponentManagerImpl::Shutdown() + 192 (nsComponentManager.cpp:912)
18 XUL 0x0000000109d40b75 mozilla::ShutdownXPCOM(nsIServiceManager*) + 1477 (XPCOMInit.cpp:991)
19 XUL 0x000000010d55e90a ScopedXPCOMStartup::~ScopedXPCOMStartup() + 186 (nsAppRunner.cpp:1473)
20 XUL 0x000000010d5661b7 XREMain::XRE_main(int, char**, nsXREAppData const*) + 1175 (mozalloc.h:210)
21 XUL 0x000000010d56645e XRE_main + 238 (nsAppRunner.cpp:4559)
22 org.mozilla.nightlydebug 0x0000000109219114 main + 2212 (nsBrowserApp.cpp:220)
23 org.mozilla.nightlydebug 0x0000000109218534 start + 52
Component: XPCOM → JavaScript Engine
OS: Unspecified → Mac OS X
Hardware: Unspecified → x86_64
Version: unspecified → 49 Branch
An easy fix for this would be to add:

  unboxedLayouts.clear();
  wasmModuleWeakList.clear();

to the JSCompartment dtor. However, it's unclear to me whether that would just be hiding a bug elsewhere that should be fixed.
Flags: needinfo?(terrence)
(In reply to Jonathan Watt [:jwatt] from comment #7)
> An easy fix for this would be to add:
>
>   unboxedLayouts.clear();
>   wasmModuleWeakList.clear();
>
> to the JSCompartment dtor. However, it's unclear to me whether that would
> just be hiding a bug elsewhere that should be fixed.

That would be hiding a bug elsewhere, and would probably just move the assertion later, as the heap would still not be empty. I just checked in bug 1268992, which should print the leaking edges before we get to this crash. That should allow us to use the CC leak logs to quickly track down the bug in Gecko or chrome that is holding things live past shutdown.
Flags: needinfo?(terrence)
Depends on: 1309662
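Why clearing the list only hides the problem can be sketched with a hypothetical live-object counter (`Layout`, `Compartment`, and `liveAfterCompartmentDies` are illustrative names, not SpiderMonkey types). Clearing the list in the destructor only unlinks the entries, so an isEmpty() check would pass, but whatever was keeping those objects alive still does, and any later check (here, the counter) still sees them.

```cpp
#include <cassert>
#include <list>

// Hypothetical stand-in for a GC thing (e.g. an unboxed layout) that counts
// how many instances are currently alive. Requires C++17 (inline variable).
struct Layout {
  static inline int live = 0;
  Layout() { ++live; }
  Layout(const Layout&) = delete;
  Layout& operator=(const Layout&) = delete;
  ~Layout() { --live; }
};

// Hypothetical compartment that only indexes layouts; it does not own them.
struct Compartment {
  std::list<Layout*> layouts;
  ~Compartment() {
    // The "easy fix": unlink everything so an isEmpty() check would pass.
    layouts.clear();
  }
};

// Returns how many Layouts are still alive right after the compartment dies.
int liveAfterCompartmentDies() {
  Layout* kept = new Layout();  // something elsewhere still references this
  {
    Compartment compartment;
    compartment.layouts.push_back(kept);
  }  // the list is now empty...
  int stillAlive = Layout::live;  // ...but the object was never freed
  delete kept;                    // tidy up so the sketch itself does not leak
  return stillAlive;
}
```

The list check is satisfied, yet one object is still alive after its compartment is gone, which is the "assertion moves later" effect described above.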
I just got it on Windows on shutdown as well; changing the platform to all.
OS: Mac OS X → All

There are 20 total failures in the last 7 days, on linux64, osx and windows.

Recent failure log: https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=231359838&repo=autoland&lineNumber=6655

22:38:42 INFO - TEST-START | xpcshell-remote.ini:toolkit/components/extensions/test/xpcshell/test_ext_contentScripts_register.js
22:38:43 INFO - rmtree() failed for "('c:\users\task_1551478786\appdata\local\temp\xpc-profile-ksp3rq',)". Reason: The process cannot access the file because it is being used by another process (13). Retrying...
22:38:44 INFO - mozcrash Saved minidump as Z:\task_1551478786\build\blobber_upload_dir\c5e91ecd-3e9c-4eb7-8468-ec0346799c71.dmp
22:38:44 INFO - mozcrash Saved app info as Z:\task_1551478786\build\blobber_upload_dir\c5e91ecd-3e9c-4eb7-8468-ec0346799c71.extra
22:38:44 WARNING - PROCESS-CRASH | xpcshell-remote.ini:toolkit/components/extensions/test/xpcshell/test_ext_proxy_socks.js | application crashed [@ static void ??__FsList@?1??ThreadList@nsThread@@KAAEAV?$LinkedList@VnsThread@@@mozilla@@XZ@YAXXZ()]
22:38:44 INFO - Crash dump filename: c:\users\task_1551478786\appdata\local\temp\xpc-other-d6j_ek\c5e91ecd-3e9c-4eb7-8468-ec0346799c71.dmp
22:38:44 INFO - Operating system: Windows NT
22:38:44 INFO - 10.0.17134
22:38:44 INFO - CPU: amd64
22:38:44 INFO - family 6 model 85 stepping 4
22:38:44 INFO - 8 CPUs
22:38:44 INFO - GPU: UNKNOWN
22:38:44 INFO - Crash reason: EXCEPTION_BREAKPOINT
22:38:44 INFO - Crash address: 0x7ffd8f5828bc
22:38:44 INFO - Assertion: Unknown assertion type 0x00000000
22:38:44 INFO - Process uptime: 3 seconds
22:38:44 INFO - Thread 0 (crashed)
22:38:44 INFO - 0 xul.dll!static void ??__FsList@?1??ThreadList@nsThread@@KAAEAV?$LinkedList@VnsThread@@@mozilla@@XZ@YAXXZ() [Unified_cpp_xpcom_threads1.cpp:1712a35cd239049c186f7da80cdc759885ae81ae : 367 + 0x9c]
22:38:44 INFO - rax = 0x00007ffd967ef727 rdx = 0x00007ffdd5eda640
22:38:44 INFO - rcx = 0x00007ffdcd1afae0 rbx = 0x000002455feb21f8
22:38:44 INFO - rsi = 0x000002455feb1c40 rdi = 0x0000a3dfda9fe39f
22:38:44 INFO - rbp = 0x000002455feb22a0 rsp = 0x0000008fb9dff7d0
22:38:44 INFO - r8 = 0x0000008fb9df9928 r9 = 0x0000008fb9dfaf40
22:38:44 INFO - r10 = 0x0000000000000000 r11 = 0x0000008fb9dfae50
22:38:44 INFO - r12 = 0x0000000000000000 r13 = 0x0000000000000040
22:38:44 INFO - r14 = 0x0000008fb9dff8d8 r15 = 0x000002455feb1c40
22:38:44 INFO - rip = 0x00007ffd8f5828bc
22:38:44 INFO - Found by: given as instruction pointer in context
22:38:44 INFO - 1 ucrtbase.dll!<lambda_f03950bc5685219e0bcd2087efbe011e>::operator() + 0xc3
22:38:44 INFO - rbx = 0x000002455feb21f8 rbp = 0x000002455feb22a0
22:38:44 INFO - rsp = 0x0000008fb9dff800 r12 = 0x0000000000000000
22:38:44 INFO - r13 = 0x0000000000000040 r14 = 0x0000008fb9dff8d8
22:38:44 INFO - r15 = 0x000002455feb1c40 rip = 0x00007ffdd5e01243
22:38:44 INFO - Found by: call frame info
22:38:44 INFO - 2 ucrtbase.dll!__crt_seh_guarded_call<int>::operator()<<lambda_7777bce6b2f8c936911f934f8298dc43>,<lambda_f03950bc5685219e0bcd2087efbe011e> & ptr64,<lambda_3883c3dff614d5e0c5f61bb1ac94921c> > + 0x3b
22:38:44 INFO - rbx = 0x000002455feb21f8 rbp = 0x000002455feb22a0
22:38:44 INFO - rsp = 0x0000008fb9dff860 r12 = 0x0000000000000000
22:38:44 INFO - r13 = 0x0000000000000040 r14 = 0x0000008fb9dff8d8
22:38:44 INFO - r15 = 0x000002455feb1c40 rip = 0x00007ffdd5e01017
22:38:44 INFO - Found by: call frame info
22:38:44 INFO - 3 ucrtbase.dll!execute_onexit_table + 0x34
22:38:44 INFO - rbx = 0x000002455feb21f8 rbp = 0x000002455feb22a0
22:38:44 INFO - rsp = 0x0000008fb9dff890 r12 = 0x0000000000000000
22:38:44 INFO - r13 = 0x0000000000000040 r14 = 0x0000008fb9dff8d8
22:38:44 INFO - r15 = 0x000002455feb1c40 rip = 0x00007ffdd5e00fd4
22:38:44 INFO - Found by: call frame info
22:38:44 INFO - 4 xul.dll!static int dllmain_crt_process_detach(const bool) [dll_dllmain.cpp : 106 + 0x5]
22:38:44 INFO - rbx = 0x000002455feb21f8 rbp = 0x000002455feb22a0
22:38:44 INFO - rsp = 0x0000008fb9dff8c0 r12 = 0x0000000000000000
22:38:44 INFO - r13 = 0x0000000000000040 r14 = 0x0000008fb9dff8d8
22:38:44 INFO - r15 = 0x000002455feb1c40 rip = 0x00007ffd967bd4be
22:38:44 INFO - Found by: call frame info
22:38:44 INFO - 5 xul.dll!static int dllmain_dispatch(struct HINSTANCE *, const unsigned long, void *) [dll_dllmain.cpp : 212 + 0xd]
22:38:44 INFO - rbx = 0x000002455feb21f8 rbp = 0x000002455feb22a0
22:38:44 INFO - rsp = 0x0000008fb9dff8f0 r12 = 0x0000000000000000
22:38:44 INFO - r13 = 0x0000000000000040 r14 = 0x0000008fb9dff8d8
22:38:44 INFO - r15 = 0x000002455feb1c40 rip = 0x00007ffd967bd65c
22:38:44 INFO - Found by: call frame info
22:38:44 INFO - 6 ntdll.dll!remainderf + 0x233
22:38:44 INFO - rbx = 0x000002455feb21f8 rbp = 0x000002455feb22a0
22:38:44 INFO - rsp = 0x0000008fb9dff950 r12 = 0x0000000000000000
22:38:44 INFO - r13 = 0x0000000000000040 r14 = 0x0000008fb9dff8d8
22:38:44 INFO - r15 = 0x000002455feb1c40 rip = 0x00007ffdd8f34053
22:38:44 INFO - Found by: call frame info
22:38:44 INFO - 7 0x2455fea3560
22:38:44 INFO - rbx = 0x000002455feb21f8 rbp = 0x000002455feb22a0
22:38:44 INFO - rsp = 0x0000008fb9dff9b0 r12 = 0x0000000000000000
22:38:44 INFO - r13 = 0x0000000000000040 r14 = 0x0000008fb9dff8d8
22:38:44 INFO - r15 = 0x000002455feb1c40 rip = 0x000002455fea3560
22:38:44 INFO - Found by: call frame info
22:38:44 INFO - 8 xul.dll + 0x738d6b0
22:38:44 INFO - rbp = 0x000002455feb22a0 rsp = 0x0000008fb9dff9b8
22:38:44 INFO - rip = 0x00007ffd967bd6b0
22:38:44 INFO - Found by: stack scanning
22:38:44 INFO - 9 ntdll.dll!_uncaught_exceptions + 0x15
22:38:44 INFO - rbp = 0x000002455feb22a0 rsp = 0x0000008fb9dff9c0
22:38:44 INFO - rip = 0x00007ffdd8f40a05
22:38:44 INFO - Found by: stack scanning
22:38:44 INFO - Loaded modules:
22:38:44 INFO - 0x7ff7beae0000 - 0x7ff7beb61fff plugin-container.exe 67.0.0.6999 (main)
22:38:44 INFO - 0x7ffd8f430000 - 0x7ffd98eb0fff xul.dll 67.0.0.6999
22:38:44 INFO - 0x7ffdaacf0000 - 0x7ffdaad09fff pnrpnsp.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdad7b0000 - 0x7ffdad8dbfff AudioSes.dll 10.0.17134.137
22:38:44 INFO - 0x7ffdb0d10000 - 0x7ffdb0d1dfff winrnr.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdb25c0000 - 0x7ffdb2667fff mscms.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdbc460000 - 0x7ffdbc475fff NapiNSP.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdbf9a0000 - 0x7ffdbfc8cfff nss3.dll 67.0.0.6999
22:38:44 INFO - 0x7ffdc1a70000 - 0x7ffdc1a88fff usp10.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdc2cd0000 - 0x7ffdc2cdffff ColorAdapterClient.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdc5ae0000 - 0x7ffdc5b7afff msvcp140.dll 14.15.26706.0
22:38:44 INFO - 0x7ffdc75c0000 - 0x7ffdc78dbfff DWrite.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdc7ca0000 - 0x7ffdc7e68fff dbghelp.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdcb2d0000 - 0x7ffdcb2d8fff wsock32.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdcb600000 - 0x7ffdcb610fff credui.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdcbf20000 - 0x7ffdcbf30fff lgpllibs.dll 67.0.0.6999
22:38:44 INFO - 0x7ffdcc8d0000 - 0x7ffdcc948fff InputHost.dll ???
22:38:44 INFO - 0x7ffdcc950000 - 0x7ffdcc9e6fff TextInputFramework.dll 10.0.17134.191
22:38:44 INFO - 0x7ffdccd20000 - 0x7ffdcce33fff Windows.UI.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdcd140000 - 0x7ffdcd1b7fff mozglue.dll 67.0.0.6999
22:38:44 INFO - 0x7ffdcdab0000 - 0x7ffdcdac5fff VCRUNTIME140.dll 14.15.26706.0
22:38:44 INFO - 0x7ffdce130000 - 0x7ffdce1a5fff MMDevAPI.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdcf350000 - 0x7ffdcf359fff version.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdcfae0000 - 0x7ffdcfaf9fff dhcpcsvc.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd0100000 - 0x7ffdd041dfff CoreUIComponents.dll 10.0.17134.112
22:38:44 INFO - 0x7ffdd06c0000 - 0x7ffdd06c9fff avrt.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd0790000 - 0x7ffdd08dcfff WinTypes.dll 10.0.17134.112
22:38:44 INFO - 0x7ffdd18a0000 - 0x7ffdd1a53fff propsys.dll 7.0.17134.112
22:38:44 INFO - 0x7ffdd1c70000 - 0x7ffdd1c82fff wtsapi32.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd1cb0000 - 0x7ffdd1cc8fff nlaapi.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd24b0000 - 0x7ffdd27bafff d3d11.dll 10.0.17134.112
22:38:44 INFO - 0x7ffdd3270000 - 0x7ffdd3349fff CoreMessaging.dll 10.0.17134.285
22:38:44 INFO - 0x7ffdd3350000 - 0x7ffdd3379fff WINMMBASE.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd3470000 - 0x7ffdd3492fff winmm.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd3750000 - 0x7ffdd37e7fff uxtheme.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd39a0000 - 0x7ffdd39c8fff dwmapi.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd3fd0000 - 0x7ffdd408afff dxgi.dll 10.0.17134.112
22:38:44 INFO - 0x7ffdd4300000 - 0x7ffdd4330fff ntmarta.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd47b0000 - 0x7ffdd47e7fff IPHLPAPI.DLL 10.0.17134.1
22:38:44 INFO - 0x7ffdd47f0000 - 0x7ffdd48adfff dnsapi.dll 10.0.17134.165
22:38:44 INFO - 0x7ffdd4a30000 - 0x7ffdd4a95fff mswsock.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd4c00000 - 0x7ffdd4c0afff CRYPTBASE.DLL 10.0.17134.1
22:38:44 INFO - 0x7ffdd4d10000 - 0x7ffdd4d34fff bcrypt.dll 10.0.17134.112
22:38:44 INFO - 0x7ffdd4fc0000 - 0x7ffdd4fe6fff devobj.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd5110000 - 0x7ffdd5137fff userenv.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd5210000 - 0x7ffdd522efff profapi.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd5230000 - 0x7ffdd5240fff kernel.appcore.dll 10.0.17134.112
22:38:44 INFO - 0x7ffdd5250000 - 0x7ffdd529bfff powrprof.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd52a0000 - 0x7ffdd52b1fff msasn1.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd52c0000 - 0x7ffdd52c9fff fltLib.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd52d0000 - 0x7ffdd5318fff cfgmgr32.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd5320000 - 0x7ffdd533ffff win32u.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd5340000 - 0x7ffdd55b2fff KERNELBASE.dll 10.0.17134.165
22:38:44 INFO - 0x7ffdd55c0000 - 0x7ffdd5cccfff windows.storage.dll 10.0.17134.285
22:38:44 INFO - 0x7ffdd5cd0000 - 0x7ffdd5d6efff msvcp_win.dll 10.0.17134.137
22:38:44 INFO - 0x7ffdd5d70000 - 0x7ffdd5de9fff bcryptPrimitives.dll 10.0.17134.285
22:38:44 INFO - 0x7ffdd5df0000 - 0x7ffdd5ee9fff ucrtbase.dll 10.0.17134.254
22:38:44 INFO - 0x7ffdd5ef0000 - 0x7ffdd6081fff gdi32full.dll 10.0.17134.285
22:38:44 INFO - 0x7ffdd6140000 - 0x7ffdd6196fff wintrust.dll 10.0.17134.81
22:38:44 INFO - 0x7ffdd61a0000 - 0x7ffdd6381fff crypt32.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd6550000 - 0x7ffdd65a0fff shlwapi.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd6630000 - 0x7ffdd67bffff user32.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd67c0000 - 0x7ffdd681afff sechost.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd6830000 - 0x7ffdd6b52fff combase.dll 10.0.17134.112
22:38:44 INFO - 0x7ffdd6c50000 - 0x7ffdd6c7cfff imm32.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd6ce0000 - 0x7ffdd6d7dfff msvcrt.dll 7.0.17134.1
22:38:44 INFO - 0x7ffdd6d80000 - 0x7ffdd71cafff setupapi.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd71d0000 - 0x7ffdd723bfff ws2_32.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd7240000 - 0x7ffdd72e8fff SHCore.dll 10.0.17134.112
22:38:44 INFO - 0x7ffdd7350000 - 0x7ffdd73effff clbcatq.dll 2001.12.10941.16384
22:38:44 INFO - 0x7ffdd73f0000 - 0x7ffdd7540fff ole32.dll 10.0.17134.137
22:38:44 INFO - 0x7ffdd7550000 - 0x7ffdd7611fff oleaut32.dll 10.0.17134.48
22:38:44 INFO - 0x7ffdd77a0000 - 0x7ffdd7851fff kernel32.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd7870000 - 0x7ffdd7910fff advapi32.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd7920000 - 0x7ffdd8d5ffff shell32.dll 10.0.17134.228
22:38:44 INFO - 0x7ffdd8d60000 - 0x7ffdd8e83fff rpcrt4.dll 10.0.17134.112
22:38:44 INFO - 0x7ffdd8e90000 - 0x7ffdd8e97fff nsi.dll 10.0.17134.1
22:38:44 INFO - 0x7ffdd8ea0000 - 0x7ffdd8ec7fff gdi32.dll 10.0.17134.285
22:38:44 INFO - 0x7ffdd8f00000 - 0x7ffdd90e0fff ntdll.dll 10.0.17134.254
22:38:44 INFO - >>>>>>>
22:38:44 INFO - PID 15724 | Unable to load \untrusted-startup-test-dll.dll; LoadLibraryW failed: 126[15724, Main Thread] WARNING: Failed to get directory to cache.: file z:/build/build/src/security/sandbox/win/src/sandboxbroker/sandboxBroker.cpp, line 81

The assertion always appears during devtools/client/debugger/new/test/mochitest/browser_dbg-worker-scopes.js, before this test fails with a timeout.

Jan, can you take a look at this?

Flags: needinfo?(jdemooij)
Whiteboard: [stockwell needswork:owner]

Clearing the NI because this mostly stopped being an issue. Unboxed object removal and leak fixes might have helped (I think we got this with the unboxedLayouts list when Gecko leaked GC things on shutdown).

Flags: needinfo?(jdemooij)
Regressions: 1548163
Crash Signature: [@ static void ??__FsList@?1??ThreadList@nsThread@@KAAEAV?$LinkedList@VnsThread@@@mozilla@@XZ@YAXXZ()]

There are 10 failures in the last 7 days and 43 in the last 30.
Since this doesn't seem to be going away, Steven, can you assign someone to take a look?

Flags: needinfo?(sdetar)
Whiteboard: [stockwell unknown] → [stockwell needswork:owner]

Jan, do you have any idea what to do with this bug?

Flags: needinfo?(sdetar) → needinfo?(jdemooij)

(In reply to Steven DeTar [:sdetar] from comment #66)

Jan, do you have any idea what to do with this bug?

The logs show that we fail to finish an off-thread parse task on the browser side because it's too late during shutdown. Then during JS_ShutDown we fail because there are still parse tasks in the list... I'll poke a bit more.

What I see in the logs is things like this:

INFO - GECKO(3598) | [Child 7149, JS Helper] WARNING: Called GetMainThread but there isn't a main thread and we're not the main thread.: file /builds/worker/workspace/build/src/xpcom/threads/nsThreadManager.cpp, line 579
INFO - GECKO(3598) | [Child 7149, JS Helper] WARNING: 'NS_FAILED(rv)', file /builds/worker/workspace/build/src/xpcom/threads/nsThreadUtils.cpp, line 241
INFO - GECKO(3598) | [Child 7149, JS Helper] ###!!! ASSERTION: Failed NS_DispatchToMainThread() in shutdown; leaking: 'false', file /builds/worker/workspace/build/src/xpcom/threads/nsThreadUtils.cpp, line 243
... snip ...
INFO - GECKO(3598) | WARNING: YOU ARE LEAKING THE WORLD (at least one JSRuntime and everything alive inside it, that is) AT JS_ShutDown TIME.  FIX THIS!
INFO - GECKO(3598) | Assertion failure: isEmpty() (failing this assertion means this LinkedList's creator is buggy: it should have removed all this list's elements before the list's destruction), at /builds/worker/workspace/build/src/obj-firefox/dist/include/mozilla/LinkedList.h:433

Maybe we're leaking a JSRuntime; then, when we try to finish a parse task, we fail (it's too late in shutdown), and in the end we also assert when destroying the process-wide HelperThreadState because the parse task linked list is not empty.

Does that seem plausible?

(I also noticed many of these logs mention devtools jsonview code or tests..)

Flags: needinfo?(continuation)

Yes, that is plausible. If helper threads are going to try to dispatch things to the main thread, they need to do that earlier. For regular browser stuff, I think we shut down threads with the xpcom-shutdown-threads event. (Generally I'm not a fan of these various assertions that turn leaks into crashes.)

Flags: needinfo?(continuation)
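The ordering requirement can be sketched with a hypothetical `MainQueue` that rejects dispatches once it is closed, standing in for the failing `NS_DispatchToMainThread()` late in shutdown. `MainQueue`, `ShutdownResult`, and `runShutdown` are illustrative only, not Gecko APIs. The sketch assumes helper threads are stopped in a dedicated phase (such as the xpcom-shutdown-threads step mentioned above) before the main thread stops accepting their completions; in the other order, completions are dropped and whatever they would have cleaned up leaks.

```cpp
#include <atomic>
#include <cassert>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>
#include <vector>

// Hypothetical main-thread queue that refuses new work once closed.
class MainQueue {
 public:
  // Returns false, dropping the task, if the queue is already closed.
  bool dispatch(std::function<void()> task) {
    std::lock_guard<std::mutex> lock(mutex_);
    if (closed_) return false;
    tasks_.push_back(std::move(task));
    return true;
  }

  void close() {
    std::lock_guard<std::mutex> lock(mutex_);
    closed_ = true;
  }

  // Runs every accepted task on the calling ("main") thread.
  void drain() {
    std::vector<std::function<void()>> pending;
    {
      std::lock_guard<std::mutex> lock(mutex_);
      pending.swap(tasks_);
    }
    for (auto& task : pending) task();
  }

 private:
  std::mutex mutex_;
  bool closed_ = false;
  std::vector<std::function<void()>> tasks_;
};

struct ShutdownResult {
  int handled;  // completions the main thread actually ran
  int dropped;  // completions rejected because the queue was closed
};

// closeQueueFirst = true models the problematic order: the main thread stops
// accepting work before helpers report back. Otherwise helpers are joined first.
ShutdownResult runShutdown(int helperCount, bool closeQueueFirst) {
  MainQueue queue;
  int handled = 0;  // only touched by drain(), on this thread
  std::atomic<int> dropped{0};

  if (closeQueueFirst) queue.close();

  std::vector<std::thread> helpers;
  for (int i = 0; i < helperCount; ++i) {
    helpers.emplace_back([&queue, &handled, &dropped] {
      // A helper finishes its work and reports the result to the main thread.
      if (!queue.dispatch([&handled] { ++handled; })) ++dropped;
    });
  }
  for (auto& helper : helpers) helper.join();  // stop helper threads...
  queue.close();                               // ...then stop main-thread dispatch
  queue.drain();
  return ShutdownResult{handled, dropped.load()};
}
```

With the helpers joined first, every completion runs; with the queue closed first, every completion is dropped, mirroring the "leaking" warnings in the logs above.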

Bugbug thinks this bug is a regression, but please revert this change in case of error.

Keywords: regression

The recent increase (starting May 19th?) shows up in jobs that ran dom/security/ tests before the failure.

Flags: needinfo?(ckerschb)
Whiteboard: [stockwell unknown] → [stockwell needswork:owner]

The recent occurrences after the dom/security/ mochitests ran start with bug 1629866 according to retriggers.

Flags: needinfo?(ckerschb) → needinfo?(peterv)
Regressed by: 1629866

Comment 103 is about a leak reported during /html/browsers/offline/introduction-4/event_checking.https.html.

From the dupe we have in comm-central (where this is a perma fail on debug) - bug 1639446 - it started May 20 with this push:
https://hg.mozilla.org/mozilla-central/pushloghtml?fromchange=8f68705097b4bf88cd61b43b14401cde98ac75b6&tochange=855249e545c361516a65bcba8f5bc6b423e2d131

Whiteboard: [stockwell disable-recommended] → [stockwell needsword]
Whiteboard: [stockwell needsword] → [stockwell needswork]
Whiteboard: [stockwell needswork] → [stockwell needswork:owner]

[Tracking Requested - why for this release]: High frequency leak regression (from bug 1629866) according to comment 101.

While they are lower frequency than the failures in bug 1634641, the failures here in mochitest-browser-chrome-e10s-5 are happening while running browser/components/downloads/test/browser/, so I think they are the debug equivalent of the regression from bug 1606652. This is a debug build, so maybe there's a little more information in the log?

For instance, I see this shortly before the failure:

[Child 3900, JS Helper] WARNING: Called GetMainThread but there isn't a main thread and we're not the main thread.: file /builds/worker/checkouts/gecko/xpcom/threads/nsThreadManager.cpp, line 651
[Child 3900, JS Helper] WARNING: 'NS_FAILED(rv)', file /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp, line 254
[Child 3900, JS Helper] ###!!! ASSERTION: Failed NS_DispatchToMainThread() in shutdown; leaking: 'false', file /builds/worker/checkouts/gecko/xpcom/threads/nsThreadUtils.cpp, line 256
GECKO(3167) | #01: mozilla::SchedulerGroup::InternalUnlabeledDispatch(mozilla::TaskCategory, already_AddRefed<mozilla::SchedulerGroup::Runnable>&&) [xpcom/threads/SchedulerGroup.cpp:98]
GECKO(3167) | #02: mozilla::SchedulerGroup::LabeledDispatch(mozilla::TaskCategory, already_AddRefed<nsIRunnable>&&, mozilla::dom::DocGroup*) [xpcom/threads/SchedulerGroup.cpp:83]
GECKO(3167) | #03: mozilla::dom::OffThreadScriptLoaderCallback(JS::OffThreadToken*, void*) [dom/script/ScriptLoader.cpp:2218]
GECKO(3167) | #04: js::HelperThread::handleParseWorkload(js::AutoLockHelperThreadState&) [js/src/vm/HelperThreads.cpp:2257]

Flags: needinfo?(dpalmeiro)
Regressed by: 1606652

Although this is quite hard to reproduce, it does indeed seem to be a duplicate of bug 1634641 as I can also generate this assert in the other bug.

Flags: needinfo?(dpalmeiro)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Summary: Assertion failure: isEmpty() (failing this assertion means this LinkedList's creator is buggy: it should have removed all this list's elements before the list's destruction), at LinkedList.h:332 → Intermittent Assertion failure: isEmpty() (failing this assertion means this LinkedList's creator is buggy: it should have removed all this list's elements before the list's destruction), at LinkedList.h:332
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

There are 187 failures in the last 7 days.
All failures are on debug:
Linux1804-64 -> 95 failures
Macosx1014-64 -> 36 failures
Windows10-64 -> 39 failures
Windows7-32 -> 17 failures

Steven, can you please take a look?

Flags: needinfo?(sdetar)

Denis is currently working on this.

Flags: needinfo?(sdetar)
Flags: needinfo?(peterv)
Flags: needinfo?(jdemooij)
Flags: needinfo?(james)
Flags: needinfo?(dpalmeiro)

(In reply to Sebastian Hengst [:aryx] (needinfo on intermittent or backout) from comment #101)

The recent occurrences after the dom/security/ mochitests ran start with bug 1629866 according to retriggers.

This was an existing leak, where we were leaking a ton of BrowsingContexts. Bug 1629866 made the BrowsingContext hold an additional object (ChildSHistory), and that made the leak worse, but it doesn't cause it. Somebody needs to figure out the existing leak.

Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

Denis, are there any updates here?

Flags: needinfo?(dpalmeiro)
Flags: needinfo?(dpalmeiro)

I have been mostly focused on bug 1634641, hoping it will address both issues. However if browser_pdfjs_preview.js is being disabled, then I can shift focus back onto this and come back to 1634641 afterwards.

Flags: needinfo?(dpalmeiro)

What may be happening here is that we initiate an off-thread script parse from the ScriptLoader, and while it is in the middle of compiling, ShutdownXPCOM is issued. The ScriptLoader tries to cancel these off-thread parses, but since the tokens used to cancel them are only made available when the parse is finished, the parses are never properly cleaned up and just linger in the helper thread's parseFinishedList, which leads to this assertion. I believe this was always an existing problem, but since bug 1606652 we issue many more scripts for off-thread compilation, so it now has a higher chance of failing.

This also explains the race condition and may explain the leaks in bug 1634641. Trying to verify this with some changes at the moment and will work on a fix next.
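The suspected race can be replayed deterministically in a single-threaded sketch. Everything here is hypothetical (`HelperState`, `startParse`, `finishParse`, and integer ids used as tokens are illustrative, not the real HelperThreads.cpp or ScriptLoader API). A parse only yields a cancellation token once it finishes, so a parse that completes after the loader's cancellation sweep stays in the finished list. Draining that list wholesale at shutdown is shown as one possible mitigation, not necessarily the approach taken in bug 1652126.

```cpp
#include <cassert>
#include <list>
#include <vector>

// Hypothetical model, not the real HelperThreads.cpp / ScriptLoader code.
// A parse's cancellation token (here just its id) only exists once the
// parse has finished.
class HelperState {
 public:
  void startParse(int id) { inFlight_.push_back(id); }

  // The helper finishes a parse: it moves to the finished list and its
  // token is handed back to the requester.
  int finishParse(int id) {
    inFlight_.remove(id);
    finished_.push_back(id);
    return id;
  }

  // The requester cancels a parse it holds a token for.
  void cancel(int token) { finished_.remove(token); }

  // Possible shutdown-time mitigation: drop every finished parse,
  // whether or not anyone ever received its token.
  void cancelAllFinished() { finished_.clear(); }

  bool finishedListEmpty() const { return finished_.empty(); }

 private:
  std::list<int> inFlight_;
  std::list<int> finished_;  // stands in for the parse-finished list
};

// Replays the interleaving: parse 1 finishes before shutdown starts, parse 2
// is still compiling when the loader cancels the tokens it knows about.
bool finishedListEmptyAtExit(bool drainOnShutdown) {
  HelperState state;
  std::vector<int> loaderTokens;  // tokens the loader has received so far

  state.startParse(1);
  state.startParse(2);
  loaderTokens.push_back(state.finishParse(1));

  // Shutdown begins: the loader can only cancel what it has tokens for.
  for (int token : loaderTokens) state.cancel(token);

  // Parse 2 completes afterwards; its token arrives too late to be used.
  state.finishParse(2);

  if (drainOnShutdown) state.cancelAllFinished();
  // A non-empty list here is what the isEmpty() assertion would catch.
  return state.finishedListEmpty();
}
```

Without the drain, parse 2 is left behind; with it, the list is empty by the time it would be destroyed.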

Bug 1606652 is only enabled on Nightly, so this should not affect 79.

Thunderbird has had permanently failing tests due to this since at least 78. We'd appreciate any fix being uplifted.

Depends on: 1652126
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

Hi Denis, are there any updates regarding the fix?

Flags: needinfo?(dpalmeiro)

Yes, I believe I have a fix for this in bug 1652126. I'm just trying to see if there's a better way to do it before I post a patch for review.

Flags: needinfo?(dpalmeiro)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

In the last 7 days there have been 115 occurrences on linux1804-64, macosx1014-64, windows10-64, build type debug.

Recent failure: https://treeherder.mozilla.org/logviewer.html#?job_id=311057848&repo=mozilla-central

The fix from https://bugzilla.mozilla.org/show_bug.cgi?id=1652126#c11 doesn't seem to work.

Since https://bugzilla.mozilla.org/show_bug.cgi?id=1654357#c2 landed on July 21st, there are still many failures, as seen in the Intermittent Failure View: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2020-07-19&endday=2020-07-26&tree=trunk&bug=1265637

Denis, please take a look at this. Thank you

Flags: needinfo?(dpalmeiro)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

(In reply to Cristina Coroiu [:ccoroiu] from comment #164)

The fix from https://bugzilla.mozilla.org/show_bug.cgi?id=1652126#c11 doesn't seem to work.

Since https://bugzilla.mozilla.org/show_bug.cgi?id=1654357#c2 landed on July 21st, there are still many failures, as seen in the Intermittent Failure View: https://treeherder.mozilla.org/intermittent-failures.html#/bugdetails?startday=2020-07-19&endday=2020-07-26&tree=trunk&bug=1265637

Denis, please take a look at this. Thank you

The tracebacks look different now. I don't see any in handleParseWorkload like before. In any case, Joel can you please help back out bug 1606652 to see if this fixes the problem?

Flags: needinfo?(dpalmeiro) → needinfo?(jmaher)

I did a try push with Bug 1606652 backed out:
https://treeherder.mozilla.org/#/jobs?repo=try&selectedTaskRun=JRBQSSVcQSC9AjHqIcEsaw.0&resultStatus=testfailed%2Cbusted%2Cexception%2Csuccess%2Cusercancel%2Crunning%2Cpending%2Crunnable&revision=a103d9148c440ec3977476c2eb830104026bee81

It contains browser-chrome tests on which the assertion failure discussed here has been seen recently. I don't know if the failure rate seen there can warrant a backout, but "handleParseWorkload" can be seen in the trace.

Thanks Alexandru. I assume then that bug 1606652 is not the offending regressor here? This would seem to explain why turning the feature off also had no effect.

I assume based on :malexandru's try push this is not needed.

Flags: needinfo?(jmaher)
No longer depends on: 1652126

Denis, are you still working on this, considering it's not from bug 1606652?

Flags: needinfo?(dpalmeiro)

(In reply to Andreea Pavel [:apavel] from comment #178)

Denis, are you still working on this, considering it's not from bug 1606652?

No, I am not. I am mostly working on bug 1652126 to help fix bug 1634641.

Flags: needinfo?(dpalmeiro)

Steven, can you assign someone to take a look here?

Flags: needinfo?(sdetar)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

Denis has been looking at this, but via one of the other bugs in comment 180. Based on the history, this bug seems to be more of a meta bug.

Flags: needinfo?(sdetar)

Hi Joel, can you please take a look at the above comments and tell us what to do next here? Do we make this a meta bug?

Flags: needinfo?(jmaher)

This bug has been around for a long time; I suspect there are many causes, and many fixes will be required to get rid of it. Unfortunately, it spans many OSes and test suites, and I cannot think of a way to narrow it down further other than debugging. This could perhaps become a meta bug, or we could just have a bunch of sub-bugs to fix cases as they are debugged.

Flags: needinfo?(jmaher)

I wrote a patch that I think may help split this bug into sub-bugs based on the different users of the LinkedList code. The patch is here, but so far the try push hasn't turned up this failure, so I haven't yet verified that the error message changes the way I intend. I'll keep retriggering periodically.
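The idea of splitting this bug by LinkedList user could look roughly like the sketch below. This is not the patch from the try push mentioned in this bug; `ListElementName` and `nonEmptyListMessage` are hypothetical. The element type is baked into the failure message so that each list user produces a distinct signature. (The Windows crash signature recorded on this bug already carries the element type in its mangled name, `LinkedList@VnsThread`.)

```cpp
#include <cassert>
#include <cstddef>
#include <string>

// Hypothetical trait: each LinkedList user can give its element type a
// readable name, so the failure message (and hence the crash signature and
// bug bucket) differs per user. typeid(T).name() would also work, but its
// output is implementation-defined and often mangled.
template <typename T>
struct ListElementName {
  static constexpr const char* value = "unknown element type";
};

// Illustrative element types standing in for real LinkedList users.
struct ParseTask {};
struct UnboxedLayout {};

template <>
struct ListElementName<ParseTask> {
  static constexpr const char* value = "ParseTask";
};

template <>
struct ListElementName<UnboxedLayout> {
  static constexpr const char* value = "UnboxedLayout";
};

// Builds the message a non-empty list would report when destroyed.
template <typename T>
std::string nonEmptyListMessage(std::size_t remaining) {
  return std::string("LinkedList<") + ListElementName<T>::value +
         "> destroyed with " + std::to_string(remaining) +
         " element(s) still linked";
}
```

For example, `nonEmptyListMessage<ParseTask>(2)` yields "LinkedList<ParseTask> destroyed with 2 element(s) still linked", which would land in a different bucket from an `UnboxedLayout` leak.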

My theory is that some major leaks (those that include a lot of JS and DOM stuff) turn into this assertion. I would expect that they are all for the same class. Honestly, my preference would be to stop asserting for this and let the leak checker do what it does best. Although large JS/DOM leaks tend to also turn into a single bucket.

I'm happy to write a patch to remove the assertion if you think that's the way to go! IMO assertions should only be used for things that are relatively rare and are actually going to be addressed, and this doesn't seem to qualify given how long and how frequently this failure has been occurring.

As a leak person, I think that would be an improvement on the current situation. MFBT people may disagree.

Nathan, as an MFBT person, what do you think? (see comment 185 onwards)

Flags: needinfo?(nfroyd)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

There are 145 failures in the last 7 days.
Most of them (122 failures) are on linux1804-64/debug

:sdetar, can you please take a look?

Flags: needinfo?(sdetar)

Jeff, could you have a look at what Kartikaya was proposing in comment 189? This bug has the second highest failure rate with:

Here's how the changes look: https://hg.mozilla.org/try/rev/c1c688e3603dc850ca7ebd813295c966ac9f6b55 and the try push which has no such failures on it. Thank you.

Flags: needinfo?(jwalden)
Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #191)

Nathan, as an MFBT person, what do you think? (see comment 185 onwards)

Comment 4 still stands, IMHO. But I guess if getting better information from the leak checker is desirable, we could go ahead and disable this assertion for a cycle or so.

Flags: needinfo?(nfroyd)

I don't think disabling it for a cycle will help much. The question is more about whether the assertion provides any value in its current form at all.

Consider the case where there are callers who want to use the LinkedList code and legitimately don't care about the leaks - right now the only real option for those callers is to use a different LinkedList implementation. So from an API point of view, it seems unnecessarily limiting for the LinkedList code to have this assertion. It's unclear to me why the LinkedList code has this assertion in the first place - it can help detect leaks, but we have other mechanisms to do that as well.
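For context, the assertion's own message states the contract: the list's creator must remove every element before the list is destroyed. The sketch below illustrates that contract with a hypothetical, simplified list; `SimpleList`, `Task`, and `TaskQueue` are invented names, not `mozilla::LinkedList` or any real caller. The owner satisfies the contract by draining the list in its own destructor body, which runs before the member list's destructor.

```cpp
#include <cassert>

// Hypothetical, simplified intrusive singly linked list. It mimics only the
// "must be empty at destruction" contract, not mozilla::LinkedList's API.
struct Node {
  Node* next = nullptr;
};

class SimpleList {
 public:
  bool isEmpty() const { return head_ == nullptr; }
  void pushFront(Node* n) {
    n->next = head_;
    head_ = n;
  }
  Node* popFront() {
    Node* n = head_;
    if (n) {
      head_ = n->next;
      n->next = nullptr;
    }
    return n;
  }
  ~SimpleList() {
    // Fatal in debug builds if the owner forgot to drain the list.
    assert(isEmpty() && "creator should remove all elements before destruction");
  }

 private:
  Node* head_ = nullptr;
};

// Hypothetical element type that tracks how many instances are alive.
struct Task : Node {
  static int liveCount;
  Task() { ++liveCount; }
  ~Task() { --liveCount; }
};
int Task::liveCount = 0;

// Hypothetical owner: its destructor body drains list_ before list_'s own
// destructor (and its emptiness check) runs.
class TaskQueue {
 public:
  void add() { list_.pushFront(new Task()); }
  ~TaskQueue() {
    while (Node* n = list_.popFront()) {
      delete static_cast<Task*>(n);
    }
  }

 private:
  SimpleList list_;
};
```

In this design, a caller that does not care about leftover elements has no way to skip the destructor check other than draining the list first, which is the API limitation described above.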

(In reply to Andrew McCreight [:mccr8] from comment #188)

My theory is that some major leaks (those that include a lot of JS and DOM stuff) turn into this assertion. I would expect that they are all for the same class.

Do you know which class? If not then I think my patch may provide at least some useful information (assuming it works as I intended).

Honestly, my preference would be to stop asserting for this and let the leak checker do what it does best. Although large JS/DOM leaks tend to also turn into a single bucket.

I guess I don't fully understand this. Can you elaborate a bit? If we remove the assertion, is there information the leak checker will provide that will facilitate fixing this bug? That's what I was assuming, since the assertion causes a process crash and presumably we don't run the leak checking diagnostic stuff after that.

Flags: needinfo?(nfroyd)
Flags: needinfo?(jwalden)
Flags: needinfo?(continuation)

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #203)

Do you know which class? If not then I think my patch may provide at least some useful information (assuming it works as I intended).

I have no idea. I have some vague recollection it was some JS engine thing, but I could be wrong.

I guess I don't fully understand this. Can you elaborate a bit? If we remove the assertion, is there information the leak checker will provide that will facilitate fixing this bug? That's what I was assuming, since the assertion causes a process crash and presumably we don't run the leak checking diagnostic stuff after that.

It is hard to say. It probably won't help. Different leaks that leak the entire DOM of a web page tend to look the same to the leak checker. Sometimes we can get the URL of an individual page that is leaking, and that can help.

Flags: needinfo?(continuation)

This certainly seems to qualify for conversion to a non-fatal NS_ASSERTION instead of a fatal MOZ_ASSERT at least until the situation is clearer. We may have to adjust tests which check assertion counts, but it is better than a fatal assertion I think.

(In reply to Bob Clary [:bc] from comment #205)

This certainly seems to qualify for conversion to a non-fatal NS_ASSERTION instead of a fatal MOZ_ASSERT at least until the situation is clearer. We may have to adjust tests which check assertion counts, but it is better than a fatal assertion I think.

My understanding is that we can't use NS_ASSERTION in mfbt code, because NS_ASSERTION is defined in xpcom and there's some dependency violation if we use it in mfbt.

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #207)

(In reply to Bob Clary [:bc] from comment #205)

This certainly seems to qualify for conversion to a non-fatal NS_ASSERTION instead of a fatal MOZ_ASSERT at least until the situation is clearer. We may have to adjust tests which check assertion counts, but it is better than a fatal assertion I think.

My understanding is that we can't use NS_ASSERTION in mfbt code, because NS_ASSERTION is defined in xpcom and there's some dependency violation if we use it in mfbt.

You can use it if MOZILLA_INTERNAL_API; see examples elsewhere in mfbt/.
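As a rough illustration of the fatal vs. non-fatal trade-off discussed from comment 205 on, here is a self-contained sketch. `FATAL_ASSERT`, `WARN_ASSERT`, `LIST_ASSERT_EMPTY`, `USE_NONFATAL_LIST_ASSERT`, and `reportListDestroyed` are hypothetical stand-ins for `MOZ_ASSERT`, `NS_ASSERTION`, and a `MOZILLA_INTERNAL_API`-style compile-time switch; they are not the real macros and do not reproduce their exact behavior.

```cpp
#include <cstdio>
#include <cstdlib>

// Hypothetical compile-time switch, analogous in spirit to gating on
// MOZILLA_INTERNAL_API. Remove this line to get the fatal behavior.
#define USE_NONFATAL_LIST_ASSERT 1

// Counts non-fatal assertion failures (stand-in for harness assertion counts).
static int gAssertionCount = 0;

// Stand-in for a fatal assertion: report, then abort the process.
#define FATAL_ASSERT(cond, msg)                                  \
  do {                                                           \
    if (!(cond)) {                                               \
      std::fprintf(stderr, "Assertion failure: %s\n", (msg));    \
      std::abort();                                              \
    }                                                            \
  } while (0)

// Stand-in for a non-fatal assertion: report, count, and keep running.
#define WARN_ASSERT(cond, msg)                                   \
  do {                                                           \
    if (!(cond)) {                                               \
      ++gAssertionCount;                                         \
      std::fprintf(stderr, "non-fatal assertion: %s\n", (msg));  \
    }                                                            \
  } while (0)

#ifdef USE_NONFATAL_LIST_ASSERT
#  define LIST_ASSERT_EMPTY(cond, msg) WARN_ASSERT(cond, msg)
#else
#  define LIST_ASSERT_EMPTY(cond, msg) FATAL_ASSERT(cond, msg)
#endif

// Hypothetical hook a list destructor would call with its isEmpty() result.
inline void reportListDestroyed(bool isEmpty) {
  LIST_ASSERT_EMPTY(isEmpty, "isEmpty() (list destroyed while non-empty)");
}
```

In the non-fatal build the process keeps running and the counter goes up, which is why tests that check assertion counts might need adjusting, as noted in comment 205.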

Flags: needinfo?(sdetar)

My patch from comment 187 causes a crash and I don't know why. But it shows that at least in some instances the LinkedList is holding instances of type js::ParseTask. https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&revision=b09bb71cc35359769a76a5dbc9e3f04bf31ab435

Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #203)

I don't think disabling it for a cycle will help much. The question is more about whether the assertion provides any value in its current form at all.

I meant insofar as we might be able to get more precise information about what's getting leaked, fix the leaks, and then turn it back on.

I suppose I'd r+ a patch to turn it off, but I should think we'd still have the leaks, and we might not have as much information about them. I see the patch in comment 187, but I don't understand "it causes a crash and I don't know why" in comment 211 -- it looks like it caused one crash in the try push, exactly as it was supposed to.

So I'd support one of two things:

  1. Change the code according to the patch in comment 187 so we can get more information.
  2. Remove the assert, but only if we actually get useful information from the leak checker or equivalent -- otherwise, we're getting rid of this problem and not actually moving forward to a solution.
Flags: needinfo?(nfroyd)

Yeah, I think my patch (comment 187) is the way to go here, so we can split this bug by caller and file separate bugs in the buggy users.

(In reply to Nathan Froyd [:froydnj] from comment #212)

I see the patch in comment 187, but I don't understand "it causes a crash and I don't know why" in comment 211 -- it looks like it caused one crash in the try push, exactly as it was supposed to.

Indeed. My confusion was that I wasn't seeing the "<whatever> has a buggy user..." message in the treeherder summary and so I thought it was crashing while evaluating my printf string, rather than after emitting the string, which is what I wanted. But I see now that if I look at the raw job log the string is there, it just doesn't show up in the part of the log treeherder highlights as relevant. So (I guess) the patch is working fine.

Anyway, the important part is that the error message surfaced by TreeHerder does have the class being put into the LinkedList, so the patch as-is should be sufficient to accomplish what we want here, which is to allow sheriffs to classify the crashes by user code.
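The approach described here, putting the element type into the failure message so crashes can be classified by user code, can be sketched as follows. This is a hypothetical, simplified container (`TypedList` and `leakMessage` are invented names), not the patch under review. It uses `typeid(T).name()`, which assumes RTTI is enabled and may yield an implementation-mangled name; the real patch may obtain the type name differently.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdio>
#include <cstdlib>
#include <string>
#include <typeinfo>
#include <vector>

// Hypothetical list whose destruction-time failure names the element type,
// so that failures from different callers land in different buckets.
template <typename T>
class TypedList {
 public:
  bool isEmpty() const { return elems_.empty(); }
  void insert(T* elem) { elems_.push_back(elem); }
  void remove(T* elem) {
    auto it = std::find(elems_.begin(), elems_.end(), elem);
    assert(it != elems_.end() && "element is not in this list");
    if (it != elems_.end()) {
      elems_.erase(it);
    }
  }

  // Builds the per-type diagnostic reported when the list is destroyed non-empty.
  std::string leakMessage() const {
    return std::string(typeid(T).name()) +
           " has a buggy user: it should have removed all this list's"
           " elements before the list's destruction";
  }

  ~TypedList() {
    if (!isEmpty()) {
      std::fprintf(stderr, "Assertion failure: %s\n", leakMessage().c_str());
      std::abort();
    }
  }

 private:
  std::vector<T*> elems_;  // storage simplified; mozilla::LinkedList is intrusive
};
```

With a message like this, a leak involving something like `js::ParseTask` would surface with that type name in the summary line instead of the generic `isEmpty()` text.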

I'll put the patch up for review on a dependent bug.

Whiteboard: [stockwell disable-recommended] → [stockwell needswork:owner]

There are no failures here since bug 1661798 got fixed.
Kartikaya, can this be closed as fixed too?

Flags: needinfo?(kats)

Yeah, I guess so; there are now individual bugs being filed for the specific call sites that are leaking.

Status: NEW → RESOLVED
Closed: 4 years ago
Depends on: 1661682, 1661683
Flags: needinfo?(kats)
Resolution: --- → FIXED

There are some recent failures here for esr-78, does this need to be reopened?

Flags: needinfo?(aryx.bugmail)

No, because it is known to affect Firefox 78 but won't be fixed there (firefox-esr78: wontfix).

Flags: needinfo?(aryx.bugmail)
Has Regression Range: --- → yes