Closed Bug 1835430 Opened 1 years ago Closed 1 year ago

High frequency OSX leakcheck | default 4432 bytes leaked (Mutex, PollableEvent, ReentrantMonitor, nsAStreamCopier, nsPipe, ...)

Categories

(Core :: Networking, defect, P2)

Firefox 115
defect

Tracking

()

RESOLVED FIXED
116 Branch
Tracking Status
firefox-esr102 --- unaffected
firefox-esr115 --- fixed
firefox113 --- unaffected
firefox114 --- unaffected
firefox115 --- fixed
firefox116 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: kershaw, NeedInfo)

References

(Regression)

Details

(Keywords: intermittent-failure, regression, Whiteboard: [retriggered][stockwell disable-recommended][necko-triaged][necko-priority-queue])

Attachments

(1 file)

Filed by: smolnar [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer?job_id=417241513&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/PPS3ocEmR9ut7cbCNjuLUQ/runs/0/artifacts/public/logs/live_backing.log


INFO - TEST-PASS | leakcheck | utility no leaks detected!
[task 2023-05-26T21:21:50.690Z] 21:21:50     INFO - leakcheck | Processing leak log file /var/folders/l4/gqq0rv4x3252x7yrx9lgp66r000014/T/tmphjbr1mgk/runtests_leaks_1072.log
[task 2023-05-26T21:21:50.690Z] 21:21:50     INFO - 
[task 2023-05-26T21:21:50.690Z] 21:21:50     INFO - == BloatView: ALL (cumulative) LEAK AND BLOAT STATISTICS, default process 1357
[task 2023-05-26T21:21:50.691Z] 21:21:50     INFO - 
[task 2023-05-26T21:21:50.691Z] 21:21:50     INFO -      |<----------------Class--------------->|<-----Bytes------>|<----Objects---->|
[task 2023-05-26T21:21:50.691Z] 21:21:50     INFO -      |                                      | Per-Inst   Leaked|   Total      Rem|
[task 2023-05-26T21:21:50.692Z] 21:21:50     INFO -    0 |TOTAL                                 |       48     4432| 2404381       11|
[task 2023-05-26T21:21:50.692Z] 21:21:50     INFO -  606 |Mutex                                 |       96      288|   14072        3|
[task 2023-05-26T21:21:50.692Z] 21:21:50     INFO -  977 |PollableEvent                         |       32       32|       1        1|
[task 2023-05-26T21:21:50.693Z] 21:21:50     INFO - 1036 |ReentrantMonitor                      |       40       40|     973        1|
[task 2023-05-26T21:21:50.694Z] 21:21:50     INFO - 1558 |nsAStreamCopier                       |       40       40|     406        1|
[task 2023-05-26T21:21:50.695Z] 21:21:50     INFO - 1847 |nsPipe                                |      208      208|     716        1|
[task 2023-05-26T21:21:50.695Z] 21:21:50     INFO - 1849 |nsPipeInputStream                     |      160      160|     716        1|
[task 2023-05-26T21:21:50.696Z] 21:21:50     INFO - 1898 |nsSocketTransport                     |      848      848|      38        1|
[task 2023-05-26T21:21:50.696Z] 21:21:50     INFO - 1899 |nsSocketTransportService              |     2808     2808|       1        1|
[task 2023-05-26T21:21:50.696Z] 21:21:50     INFO - 1910 |nsStringBuffer                        |        8        8|  161235        1|
[task 2023-05-26T21:21:50.697Z] 21:21:50     INFO - 
[task 2023-05-26T21:21:50.697Z] 21:21:50     INFO - nsTraceRefcnt::DumpStatistics: 2054 entries
[task 2023-05-26T21:21:50.697Z] 21:21:50     INFO - TEST-INFO | leakcheck | default leaked 3 Mutex
[task 2023-05-26T21:21:50.697Z] 21:21:50     INFO - TEST-INFO | leakcheck | default leaked 1 PollableEvent
[task 2023-05-26T21:21:50.698Z] 21:21:50     INFO - TEST-INFO | leakcheck | default leaked 1 ReentrantMonitor
[task 2023-05-26T21:21:50.698Z] 21:21:50     INFO - TEST-INFO | leakcheck | default leaked 1 nsAStreamCopier
[task 2023-05-26T21:21:50.698Z] 21:21:50     INFO - TEST-INFO | leakcheck | default leaked 1 nsPipe
[task 2023-05-26T21:21:50.699Z] 21:21:50     INFO - TEST-INFO | leakcheck | default leaked 1 nsPipeInputStream
[task 2023-05-26T21:21:50.699Z] 21:21:50     INFO - TEST-INFO | leakcheck | default leaked 1 nsSocketTransport
[task 2023-05-26T21:21:50.699Z] 21:21:50     INFO - TEST-INFO | leakcheck | default leaked 1 nsSocketTransportService
[task 2023-05-26T21:21:50.700Z] 21:21:50     INFO - TEST-INFO | leakcheck | default leaked 1 nsStringBuffer
[task 2023-05-26T21:21:50.700Z] 21:21:50     INFO - TEST-UNEXPECTED-FAIL | leakcheck | default 4432 bytes leaked (Mutex, PollableEvent, ReentrantMonitor, nsAStreamCopier, nsPipe, ...)
[task 2023-05-26T21:21:50.700Z] 21:21:50     INFO - 
[task 2023-05-26T21:21:50.702Z] 21:21:50     INFO - Closing logging queue
[task 2023-05-26T21:21:50.702Z] 21:21:50     INFO - queue closed
[task 2023-05-26T21:21:50.727Z] 21:21:50     INFO - Application command: /opt/worker/tasks/task_168513561145155/build/application/Firefox NightlyDebug.app/Contents/MacOS/firefox --marionette about:blank -foreground -profile /var/folders/l4/gqq0rv4x3252x7yrx9lgp66r000014/T/tmpt8e77i0x
[task 2023-05-26T21:21:50.733Z] 21:21:50     INFO - PID 1371 | ### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to /var/folders/l4/gqq0rv4x3252x7yrx9lgp66r000014/T/tmpf8yhh5bp/runtests_leaks_1072.log
[task 2023-05-26T21:21:50.734Z] 21:21:50     INFO - PID 1371 | [1371, Main Thread] WARNING: XPCOM_MEM_BLOAT_LOG is set, disabling native allocations.: file /builds/worker/checkouts/gecko/tools/profiler/core/platform.cpp:345
[task 2023-05-26T21:21:50.734Z] 21:21:50     INFO - PID 1371 | [1371, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005 (NS_ERROR_FAILURE): file /builds/worker/checkouts/gecko/toolkit/xre/nsXREDirProvider.cpp:475
INFO - PID 1371 | [Parent 1371, Main Thread] WARNING: Rejected attempt to change type of pref extensions.formautofill.creditCards.available's user value from bool to string: file /builds/worker/checkouts/gecko/modules/libpref/Preferences.cpp:1895
[task 2023-05-26T21:21:50.735Z] 21:21:50     INFO - PID 1371 | [Parent 1371, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005 (NS_ERROR_FAILURE): file /builds/worker/checkouts/gecko/toolkit/xre/nsXREDirProvider.cpp:475
[task 2023-05-26T21:21:50.736Z] 21:21:50     INFO - PID 1371 | [Parent 1371, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005 (NS_ERROR_FAILURE): file /builds/worker/checkouts/gecko/toolkit/xre/nsXREDirProvider.cpp:475
[task 2023-05-26T21:21:50.736Z] 21:21:50     INFO - PID 1371 | [Parent 1371, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005 (NS_ERROR_FAILURE): file /builds/worker/checkouts/gecko/toolkit/xre/nsXREDirProvider.cpp:475
[task 2023-05-26T21:21:50.736Z] 21:21:50     INFO - PID 1371 | [WARN  rkv::backend::impl_safe::environment] `load_ratio()` is irrelevant for this storage backend.
[task 2023-05-26T21:21:50.737Z] 21:21:50     INFO - PID 1371 | GL_VENDOR: Intel Inc.
[task 2023-05-26T21:21:50.737Z] 21:21:50     INFO - PID 1371 | mVendor: Intel
[task 2023-05-26T21:21:50.738Z] 21:21:50     INFO - PID 1371 | GL_RENDERER: Intel(R) UHD Graphics 630
[task 2023-05-26T21:21:50.738Z] 21:21:50     INFO - PID 1371 | mRenderer: Unknown
[task 2023-05-26T21:21:50.739Z] 21:21:50     INFO - PID 1371 | mIsMesa: 0
[task 2023-05-26T21:21:50.739Z] 21:21:50     INFO - PID 1371 | 1685136104166	Marionette	INFO	Marionette enabled
[task 2023-05-26T21:21:50.740Z] 21:21:50     INFO - PID 1371 | [Parent 1371, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005 (NS_ERROR_FAILURE): file /builds/worker/checkouts/gecko/toolkit/xre/nsXREDirProvider.cpp:475
[task 2023-05-26T21:21:50.740Z] 21:21:50     INFO - PID 1371 | [Parent 1371, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005 (NS_ERROR_FAILURE): file /builds/worker/checkouts/gecko/toolkit/xre/nsXREDirProvider.cpp:475
[task 2023-05-26T21:21:50.741Z] 21:21:50     INFO - PID 1371 | 1685136104366	Marionette	INFO	Listening on port 50288
[task 2023-05-26T21:21:50.741Z] 21:21:50     INFO - PID 1371 | [Parent 1371, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005 (NS_ERROR_FAILURE): file /builds/worker/checkouts/gecko/toolkit/xre/nsXREDirProvider.cpp:475
[task 2023-05-26T21:21:50.742Z] 21:21:50     INFO - PID 1371 | ### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to /var/folders/l4/gqq0rv4x3252x7yrx9lgp66r000014/T/tmpf8yhh5bp/runtests_leaks_1072_tab_pid1380.log
[task 2023-05-26T21:21:50.742Z] 21:21:50     INFO - PID 1371 | [1380, Main Thread] WARNING: XPCOM_MEM_BLOAT_LOG is set, disabling native allocations.: file /builds/worker/checkouts/gecko/tools/profiler/core/platform.cpp:345
[task 2023-05-26T21:21:50.742Z] 21:21:50     INFO - PID 1371 | [Parent 1371, GMPThread] WARNING: Failed to delete GMP storage directory: file /builds/worker/checkouts/gecko/dom/media/gmp/GMPServiceParent.cpp:1759
[task 2023-05-26T21:21:50.743Z] 21:21:50     INFO - PID 1371 | [Child 1380, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005 (NS_ERROR_FAILURE): file /builds/worker/checkouts/gecko/toolkit/xre/nsXREDirProvider.cpp:475
[task 2023-05-26T21:21:50.743Z] 21:21:50     INFO - PID 1371 | [Child 1380, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005 (NS_ERROR_FAILURE): file /builds/worker/checkouts/gecko/toolkit/xre/nsXREDirProvider.cpp:475
[task 2023-05-26T21:21:50.744Z] 21:21:50     INFO - PID 1371 | [Child 1380, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005 (NS_ERROR_FAILURE): file /builds/worker/checkouts/gecko/toolkit/xre/nsXREDirProvider.cpp:475
[task 2023-05-26T21:21:50.744Z] 21:21:50     INFO - PID 1371 | [Child 1380, Main Thread] WARNING: Fallback to FallbackRenderer: file /builds/worker/checkouts/gecko/dom/ipc/BrowserChild.cpp:2697
[task 2023-05-26T21:21:50.745Z] 21:21:50     INFO - PID 1371 | [Child 1380, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005 (NS_ERROR_FAILURE): file /builds/worker/checkouts/gecko/toolkit/xre/nsXREDirProvider.cpp:475
[task 2023-05-26T21:21:50.745Z] 21:21:50     INFO - PID 1371 | [Child 1380, Main Thread] WARNING: Fallback to FallbackRenderer: file /builds/worker/checkouts/gecko/dom/ipc/BrowserChild.cpp:2697
[task 2023-05-26T21:21:50.746Z] 21:21:50     INFO - PID 1371 | [Child 1380, Main Thread] WARNING: Fallback to FallbackRenderer: file /builds/worker/checkouts/gecko/dom/ipc/BrowserChild.cpp:2697
[task 2023-05-26T21:21:50.746Z] 21:21:50     INFO - PID 1371 | [Child 1380, Main Thread] WARNING: Fallback to FallbackRenderer: file /builds/worker/checkouts/gecko/dom/ipc/BrowserChild.cpp:2697
[task 2023-05-26T21:21:50.747Z] 21:21:50     INFO - PID 1371 | [Child 1380, Main Thread] WARNING: Fallback to FallbackRenderer: file /builds/worker/checkouts/gecko/dom/ipc/BrowserChild.cpp:2697
[task 2023-05-26T21:21:50.747Z] 21:21:50     INFO - PID 1371 | ### XPCOM_MEM_BLOAT_LOG defined -- logging bloat/leaks to /var/folders/l4/gqq0rv4x3252x7yrx9lgp66r000014/T/tmpf8yhh5bp/runtests_leaks_1072_tab_pid1382.log
[task 2023-05-26T21:21:50.748Z] 21:21:50     INFO - PID 1371 | [1382, Main Thread] WARNING: XPCOM_MEM_BLOAT_LOG is set, disabling native allocations.: file /builds/worker/checkouts/gecko/tools/profiler/core/platform.cpp:345
[task 2023-05-26T21:21:50.748Z] 21:21:50     INFO - PID 1371 | [WARN  webrender::device::gl] Missing optimized shader source for gpu_cache_update
<...>
INFO - PID 1371 | [Child 1388, Main Thread] WARNING: NS_ENSURE_SUCCESS(rv, rv) failed with result 0x80004005 (NS_ERROR_FAILURE): file /builds/worker/checkouts/gecko/toolkit/xre/nsXREDirProvider.cpp:475
[task 2023-05-26T21:21:50.769Z] 21:21:50     INFO - Starting runner
[task 2023-05-26T21:21:51.048Z] 21:21:51     INFO - PID 1371 | 2023-05-26 21:21:51.047 firefox[1371:20023] Persistent UI failed to open file file:///Users/cltbld/Library/Saved%20Application%20State/org.mozilla.nightlydebug.savedState/window_1.data: No such file or directory (2)
[task 2023-05-26T21:21:51.394Z] 21:21:51     INFO - TEST-START | /content-security-policy/img-src/icon-allowed.sub.html

Backfills and retriggers indicate this started from bug 1834640.

Can you take a look?

Flags: needinfo?(nika)
Keywords: regression
Regressed by: 1834640
Summary: Intermittent leakcheck | default 4432 bytes leaked (Mutex, PollableEvent, ReentrantMonitor, nsAStreamCopier, nsPipe, ...) → High frequency leakcheck | default 4432 bytes leaked (Mutex, PollableEvent, ReentrantMonitor, nsAStreamCopier, nsPipe, ...)

Set release status flags based on info from the regressing bug 1834640

Whiteboard: [retriggered]

This seems to be OS X 10.15 WebRender debug specific.

Summary: High frequency leakcheck | default 4432 bytes leaked (Mutex, PollableEvent, ReentrantMonitor, nsAStreamCopier, nsPipe, ...) → High frequency OSX leakcheck | default 4432 bytes leaked (Mutex, PollableEvent, ReentrantMonitor, nsAStreamCopier, nsPipe, ...)

Randell, could you take a look at this very frequent failure (started with the changes from bug 1834640)?

Flags: needinfo?(rjesup)
Severity: S4 → --
Priority: P5 → --
Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered][stockwell needswork:owner]

In the log from comment 0, I notice this error log stating we failed to dispatch the continuation event from nsAStreamCopier:

[task 2023-05-26T21:21:49.459Z] 21:21:49     INFO - PID 1357 | [Parent 1357, Main Thread] WARNING: unable to post continuation event: file /builds/worker/checkouts/gecko/xpcom/io/nsStreamUtils.cpp:467

If this dispatch ends up failing e.g. due to the thread being shut down, we could end up leaking the runnable, which in this case is the nsAStreamCopier, and would end up leading to a leak approximately like this one.

There's a chance that the previous behaviour from before bug 1834640 which was not shutting down the socket thread until later during shutdown was masking some error, where we're ending up attempting to use the socket transport during xpcom-shutdown-threads. If that's the case, it seems somewhat likely that we may have been encountering this situation prior to bug 1818998 as well, or the caller which is failing here was introduced during the time between that landing and now.

A quick scan through the calls to NS_AsyncCopy and a look at the particular leaked objects suggests that the leaked copy is probably one of the ones created from nsSocketTransport::Open{Input,Output}Stream (https://searchfox.org/mozilla-central/rev/af5be0d1e0ff1e44bd37896d95cca9f21d47ec9c/netwerk/base/nsSocketTransport2.cpp#2336-2337,2382-2383). If called late enough, I think that these would have the expected behaviour here.

Perhaps these methods on nsSocketTransport could be made clever enough to check if the socket thread is still alive when dispatching? I'm unsure whether that would have the intended effect though, and it's not clear why it would be macOS specific (perhaps just timing?)

Set release status flags based on info from the regressing bug 1834640

Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered]
Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered][stockwell needswork:owner]
Component: web-platform-tests → Networking
Product: Testing → Core
Assignee: nobody → kershaw
Severity: -- → S4
Priority: -- → P2
Whiteboard: [retriggered][stockwell disable-recommended] → [retriggered][stockwell disable-recommended][necko-triaged][necko-priority-queue]

I can reproduce this locally, so I'll try to fix this.

As Nika pointed out in comment #13, the problem is that the {Input,Output}Stream is closed too late after socket thread has been already shut down.
I think the proper way to fix this is to cancel the AsyncCopy when the socket is detached.

Pushed by kjang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/7d29150fa9f6 Cancel async copy to avoid leaks, r=necko-reviewers,valentin

Backed out for causing failures on nsCOMPtr.h

Backout link

Push with failures

Failure log

Flags: needinfo?(kershaw)

(In reply to Norisz Fay [:noriszfay] from comment #35)

Backed out for causing failures on nsCOMPtr.h

Backout link

Push with failures

Failure log

I am fixing the race issue and will be updating the patch soon.

Flags: needinfo?(kershaw)
Pushed by kjang@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/f33a545e3834 Cancel async copy to avoid leaks, r=necko-reviewers,valentin,jesup
Status: NEW → RESOLVED
Closed: 1 year ago
Resolution: --- → FIXED
Target Milestone: --- → 116 Branch

:kershaw could you add an uplift request on this if it's safe to take?
Considering 115 is the next ESR it would be good to fix high frequency failures

Flags: needinfo?(kershaw)
Regressions: 1840300
Version: unspecified → Firefox 115

:kershaw we would take this in 115 along with the fix for the regression (Bug 1840300) if you could add a release uplift request?

Comment on attachment 9340292 [details]
Bug 1835430 - Cancel async copy to avoid leaks, r=#necko

Beta/Release Uplift Approval Request

  • User impact if declined: Potentially memory leaks during shutdown.
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: Yes
  • Needs manual test from QE?: No
  • If yes, steps to reproduce:
  • List of other uplifts needed: Bug 1840300
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Should be low. Cancel async copy at shutdown should be fine.
  • String changes made/needed: N/A
  • Is Android affected?: Yes
Flags: needinfo?(kershaw)
Attachment #9340292 - Flags: approval-mozilla-beta?

(In reply to Donal Meehan [:dmeehan] from comment #44)

:kershaw we would take this in 115 along with the fix for the regression (Bug 1840300) if you could add a release uplift request?

Sorry for the late response. I wanted to wait a bit before requesting uplift, because this change could break some tests.

Duplicate of this bug: 1835478

Comment on attachment 9340292 [details]
Bug 1835430 - Cancel async copy to avoid leaks, r=#necko

Fx115 is now in release, switching the uplift request

Attachment #9340292 - Flags: approval-mozilla-beta? → approval-mozilla-release?

Comment on attachment 9340292 [details]
Bug 1835430 - Cancel async copy to avoid leaks, r=#necko

Approved for 115.0 RC2

Attachment #9340292 - Flags: approval-mozilla-release? → approval-mozilla-release+
Flags: needinfo?(rjesup)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: