Closed
Bug 1304156
Opened 8 years ago
Closed 8 years ago
LeakSanitizer detects leak of 8 bytes in (<unknown module>) when running tests on Ubuntu 16.04
Categories
(Core :: Audio/Video: Playback, defect, P3)
RESOLVED
FIXED
People
(Reporter: dminor, Assigned: karlt)
References
(Blocks 1 open bug)
Details
(Whiteboard: [fixed with bug 1323382])
This is from a recent try run:
[task 2016-09-20T15:52:43.983897Z] 15:52:43 INFO - ==1050==ERROR: LeakSanitizer: detected memory leaks
[task 2016-09-20T15:52:43.983947Z] 15:52:43 INFO - Direct leak of 8 byte(s) in 1 object(s) allocated from:
[task 2016-09-20T15:52:43.984015Z] 15:52:43 INFO - #0 0x4b247b in malloc /builds/slave/moz-toolchain/src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:52:3
[task 2016-09-20T15:52:43.986840Z] 15:52:43 INFO - #1 0x7fe7f6104778 (<unknown module>)
I see a leak of the same size when I run locally. This blocks being able to run ASAN tests on Ubuntu 16.04 images.
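For triage, the report lines above can be picked out of a job log mechanically. The following is a minimal sketch, not part of any harness: the names `parse_leaks` and `has_unknown_module` are hypothetical, and the regular expressions are based only on the four log lines quoted above, so other LSan output shapes may not match.

```python
import re

# Matches e.g. "Direct leak of 8 byte(s) in 1 object(s) allocated from:"
LEAK_RE = re.compile(r"(Direct|Indirect) leak of (\d+) byte\(s\) in (\d+) object\(s\)")
# Matches e.g. "#1 0x7fe7f6104778 (<unknown module>)"
# or "#0 0x4b247b in malloc /path/asan_malloc_linux.cc:52:3".
# Simplification: assumes the function name contains no spaces.
FRAME_RE = re.compile(r"#(\d+) (0x[0-9a-f]+) (?:in (\S+) )?(.*)$")


def parse_leaks(lines):
    """Group stack frames under the leak record that precedes them."""
    leaks = []
    for line in lines:
        leak = LEAK_RE.search(line)
        if leak:
            leaks.append({
                "kind": leak.group(1),
                "bytes": int(leak.group(2)),
                "objects": int(leak.group(3)),
                "frames": [],
            })
            continue
        frame = FRAME_RE.search(line)
        if frame and leaks:
            # (frame index, function name or None, source location or module)
            leaks[-1]["frames"].append(
                (int(frame.group(1)), frame.group(3), frame.group(4))
            )
    return leaks


def has_unknown_module(leak):
    """True if any frame could not be attributed to a loaded module."""
    return any(location == "(<unknown module>)" for _, _, location in leak["frames"])
```

Feeding the lines from the try run through `parse_leaks` should give a single 8-byte record whose second frame is flagged by `has_unknown_module`.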
Reporter
Updated•8 years ago
Rank: 25
Reporter
Comment 1•8 years ago
If I restart the browser between mochitests, it looks like the following tests expose the leak:
dom/media/tests/mochitest/test_getUserMedia_audioCapture.html
dom/media/tests/mochitest/test_peerConnection_capturedVideo.html
I also get a bunch of leaks in libX11.so on my system, but those aren't present in the try run, possibly because I run ASAN+debug locally.
Reporter
Updated•8 years ago
Assignee: nobody → dminor
Reporter
Comment 2•8 years ago
If I turn on reporting of leaked memory locations and capture an rr run with a watch on the leak location, I get the following backtrace for the last allocation to touch that memory:
#0 __memset_avx2 ()
at ../sysdeps/x86_64/multiarch/memset-avx2.S:102
#1 0x000000000042267b in __asan::Allocator::Allocate (
can_fill=true, alloc_type=__asan::FROM_MALLOC, stack=0x2,
alignment=8, size=8, this=0x56ea60 <__asan::instance>)
at /home/dminor/src/llvm/projects/compiler-rt/lib/asan/asan_allocator.cc:448
#2 __asan::asan_malloc (size=size@entry=8,
stack=stack@entry=0x7ffd3ea859a0)
at /home/dminor/src/llvm/projects/compiler-rt/lib/asan/asan_allocator.cc:728
#3 0x00000000004be252 in __interceptor_malloc (size=8)
at /home/dminor/src/llvm/projects/compiler-rt/lib/asan/asan_malloc_linux.cc:53
#4 0x00007f80155b1779 in ?? ()
from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#5 0x00007f80155ba648 in ?? ()
from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#6 0x00007f80155afde2 in ?? ()
from /usr/lib/x86_64-linux-gnu/libgomp.so.1
#7 0x00007f807fb794ea in call_init (l=<optimized out>,
argc=argc@entry=5, argv=argv@entry=0x7ffd3ea89a08,
env=env@entry=0x618000067880) at dl-init.c:72
#8 0x00007f807fb795fb in call_init (env=0x618000067880,
argv=0x7ffd3ea89a08, argc=5, l=<optimized out>) at dl-init.c:30
#9 _dl_init (main_map=main_map@entry=0x61a00020be80, argc=5,
argv=0x7ffd3ea89a08, env=0x618000067880) at dl-init.c:120
#10 0x00007f807fb7e712 in dl_open_worker (a=a@entry=0x7ffd3ea86660)
at dl-open.c:575
#11 0x00007f807fb79394 in _dl_catch_error (
objname=objname@entry=0x7ffd3ea86650,
errstring=errstring@entry=0x7ffd3ea86658,
mallocedp=mallocedp@entry=0x7ffd3ea8664f,
operate=operate@entry=0x7f807fb7e300 <dl_open_worker>,
args=args@entry=0x7ffd3ea86660) at dl-error.c:187
#12 0x00007f807fb7dbd9 in _dl_open (
file=0x7f806d9fbe60 <.str.20> "libavcodec-ffmpeg.so.56",
mode=-2147483646, caller_dlopen=
0x444da5 <__interceptor_dlopen(char const*, int)+101>, nsid=-2,
argc=<optimized out>, argv=<optimized out>, env=0x618000067880)
at dl-open.c:660
#13 0x00007f807ec88f09 in dlopen_doit (a=a@entry=0x7ffd3ea86890)
at dlopen.c:66
#14 0x00007f807fb79394 in _dl_catch_error (
objname=0x78c610 <calloc_memory_for_dlsym+16>,
errstring=0x78c618 <calloc_memory_for_dlsym+24>,
mallocedp=0x78c608 <calloc_memory_for_dlsym+8>,
operate=0x7f807ec88eb0 <dlopen_doit>, args=0x7ffd3ea86890)
at dl-error.c:187
#15 0x00007f807ec89571 in _dlerror_run (
operate=operate@entry=0x7f807ec88eb0 <dlopen_doit>,
args=args@entry=0x7ffd3ea86890) at dlerror.c:163
#16 0x00007f807ec88fa1 in __dlopen (file=<optimized out>,
mode=<optimized out>) at dlopen.c:87
#17 0x0000000000444da5 in __interceptor_dlopen (
filename=0x7f806d9fbe60 <.str.20> "libavcodec-ffmpeg.so.56",
flag=2)
at /home/dminor/src/llvm/projects/compiler-rt/lib/asan/../sanitizer_common/sanitizer_common_interceptors.inc:5135
#18 0x00007f807e488e18 in pr_LoadLibraryByPathname (
name=0x7f806d9fbe60 <.str.20> "libavcodec-ffmpeg.so.56",
flags=<optimized out>)
at /home/dminor/src/firefox-asan/nsprpub/pr/src/linking/prlink.c:803
#19 PR_LoadLibraryWithFlags (libSpec=..., flags=<optimized out>)
at /home/dminor/src/firefox-asan/nsprpub/pr/src/linking/prlink.c:418
#20 0x00007f8066519b85 in mozilla::FFmpegRuntimeLinker::Init ()
at /home/dminor/src/firefox-asan/dom/media/platforms/ffmpeg/FFmpegRuntimeLinker.cpp:62
#21 0x00007f80664b7a45 in mozilla::PDMFactoryImpl::PDMFactoryImpl (
this=0x602000186e90)
at /home/dminor/src/firefox-asan/dom/media/platforms/PDMFactory.cpp:74
#22 mozilla::PDMFactory::EnsureInit() const::$_0::operator()() const
(this=<optimized out>)
at /home/dminor/src/firefox-asan/dom/media/platforms/PDMFactory.cpp:196
#23 mozilla::detail::RunnableFunction<mozilla::PDMFactory::EnsureInit() const::$_0>::Run() (this=<optimized out>)
at /home/dminor/src/firefox-asan/objdir-ff-asan/dist/include/nsThreadUtils.h:278
which makes it look like it might be a system library problem in libgomp.so.
If I set fast_unwind_on_malloc=0 in the ASAN options, the leak disappears, which makes me think this is something that we would normally suppress, except that we get a bad stack in LSAN. Presumably disabling fast_unwind_on_malloc has performance implications, but if the slowdown is too large to accept everywhere, perhaps we can set that flag just for the media job.
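As a rough illustration of setting that flag for a single job only, here is a minimal sketch. `fast_unwind_on_malloc` is the real ASAN option named above; the helper `set_sanitizer_option` is hypothetical, and the sketch assumes options are colon-separated `key=value` pairs with no colons inside values.

```python
import os


def set_sanitizer_option(options, key, value):
    """Return a colon-separated option string with `key` set to `value`.

    Any existing setting of `key` is replaced; other options are kept
    in their original order.
    """
    parts = [part for part in options.split(":") if part]
    kept = [part for part in parts if part.split("=", 1)[0] != key]
    kept.append(f"{key}={value}")
    return ":".join(kept)


# Build an environment for one job without touching the global one.
job_env = dict(os.environ)
job_env["ASAN_OPTIONS"] = set_sanitizer_option(
    job_env.get("ASAN_OPTIONS", ""), "fast_unwind_on_malloc", 0
)
```

The resulting `job_env` could then be passed to whatever launches the media tests, leaving the faster unwinder enabled for every other suite.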
Status: NEW → ASSIGNED
Reporter
Comment 3•8 years ago
Looks like running with fast_unwind_on_malloc=0 is too slow on our test machines: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f7e906e07f85e66ad2921b6d21c6a371a8128b02
Reporter
Comment 4•8 years ago
I ran the dom/media/test tests and they are also affected, which is not surprising given the stack above.
Unless I've missed something, I think we're down to some unpleasant alternatives:
- add a suppression for <unknown module>
- stop running the media tests on asan builds
- investigate the libgomp leak and attempt to put together a patched version of Ubuntu 16.04 to use to run the tests. This isn't quite as bad as it sounds, since the test machines are docker images anyway.
With fast_unwind_on_malloc=0 set, the 8 bytes are reported as being leaked by libdl.so, so I guess there's no guarantee the leak is actually in libgomp.so.
Component: WebRTC → Audio/Video: Playback
Summary: LeakSanitizer detects leak of 8 bytes when running WebRTC tests on Ubuntu 16.04 → LeakSanitizer detects leak of 8 bytes when running media tests on Ubuntu 16.04
Reporter
Comment 5•8 years ago
Andrew, any suggestions on the above? Have I missed something? Thanks!
Flags: needinfo?(continuation)
Comment 6•8 years ago
(In reply to Dan Minor [:dminor] from comment #5)
> Andrew, any suggestions on the above? Have I missed something? Thanks!
Yeah, that analysis sounds right to me. Maybe installing the debug symbols for libgomp on the machine would give us a better stack? I know that's improved the stacks for me locally when debugging LSan issues.
jib, do you know who might be able to look at this? Maybe there's some easy fix to this leak. Otherwise we may have to disable some tests. <unknown module> seems like a very broad suppression, though I haven't noticed it before so maybe it isn't so bad.
Flags: needinfo?(continuation) → needinfo?(jib)
Comment 7•8 years ago
I agree with the analysis also. I was going to suggest dminor ;) The stack suggests this is playback though, which is also how it's triaged, so maybe Anthony has someone who can look at it?
Flags: needinfo?(jib) → needinfo?(ajones)
Comment 8•8 years ago
Gerald - I'm not convinced that an 8-byte leak is an issue. Can you find a way to make the problem go away?
Flags: needinfo?(ajones) → needinfo?(gsquelart)
Priority: P2 → P3
Reporter
Comment 9•8 years ago
The only reason this is an issue is that the ateam is in the process of moving Linux test machines from 12.04 to 16.04, and this blocks them from being able to run ASAN jobs on 16.04.
Assignee: dminor → nobody
Status: ASSIGNED → NEW
Comment 10•8 years ago
(In reply to Dan Minor [:dminor] from comment #9)
> The only reason this is an issue is that the ateam in the process of moving
> linux test machines from 12.04 to 16.04 and this blocks them from being able
> to run ASAN jobs on 16.04.
Is it possible to install debug symbols for libgomp.so onto these machines? I don't know how difficult that is. I know some distros have separate packages for those.
Flags: needinfo?(dminor)
Reporter
Comment 11•8 years ago
The Dockerfile used to create the test machine lives in tree here: testing/docker/desktop1604-test/Dockerfile. I think all that is needed is to add 'apt install libgomp1-dbg' there.
You will also need to remove the special case to run the ASAN job on Ubuntu 12.04 like I did in this try push: https://hg.mozilla.org/try/rev/f3dce4beadaf.
Here's a try push with libgomp1-dbg: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4fa6d941c48d4dfaab853c5ca380c5e756877674
Flags: needinfo?(dminor)
Comment 12•8 years ago
One thing I could blindly do from my side is to unload the library when Firefox shuts down, in case that helps clear the allocated memory.
I'm not set up to run *SAN, so Dan, could you please try the following patch:
https://hg.mozilla.org/try/rev/75807390ff8bf89c6db8228374261f46eed57e2a
(From this try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=f684b972412dfcd949a4f4b65b9f6541fb46046e )
Flags: needinfo?(gsquelart) → needinfo?(dminor)
Reporter
Comment 13•8 years ago
Unfortunately, the leak is still there with this patch in place.
Flags: needinfo?(dminor)
Comment 14•8 years ago
It looks like the debug symbols didn't help either.
Reporter
Comment 15•8 years ago
Looks like this is now working: https://treeherder.mozilla.org/#/jobs?repo=try&revision=97e5846bab0c8b06cbcd110f695adf52fd90bc2a
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME
Reporter
Comment 16•8 years ago
Reopening as it seems this leak now shows up (maybe intermittently?) on other test suites as well.
Status: RESOLVED → REOPENED
Resolution: WORKSFORME → ---
Summary: LeakSanitizer detects leak of 8 bytes when running media tests on Ubuntu 16.04 → LeakSanitizer detects leak of 8 bytes when running tests on Ubuntu 16.04
Comment 17•8 years ago
Permafail on some suites, intermittent for others.
Assignee
Comment 18•8 years ago
An existing comment indicates that leaks in "<unknown module>" "can not [sic]
be suppressed":
http://searchfox.org/mozilla-central/rev/5ee2bd8800b007d6c120d9521d5bf01131885afb/media/webrtc/trunk/webrtc/modules/audio_device/linux/latebindingsymboltable_linux.cc#53
Here's a failed attempt:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=864405473e689113f30b2921555d5b3142e37e3e&selectedJob=32511475
Looking at the sanitizer_stacktrace_printer.cc version in chromium, it seems
that "(<unknown module>)" is printed when there is no module to match against
suppressions.
https://cs.chromium.org/chromium/src/third_party/llvm/compiler-rt/test/asan/TestCases/Linux/stack-trace-dlclose.cc?sq=package:chromium&l=42&dr=C
tests that "(<unknown module>)" is printed when the module has been unloaded.
A different scenario could have another module at the same address as was in
the unloaded module.
https://bugs.chromium.org/p/webrtc/issues/detail?id=3402 is a related bug
involving pulseaudio.
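To make the reasoning in this comment concrete, here is a small model of why a frame from an unloaded library cannot be matched by a module-based suppression. It is only a sketch under stated assumptions: `module_for`, `describe`, and `is_suppressed` are hypothetical names, the address range is made up (only the frame address 0x7f80155b1779 comes from comment 2), and matching is reduced to substring checks against module names, which is a simplification of what the sanitizer runtime does.

```python
def module_for(address, modules):
    """Return the name of the mapped module containing `address`, or None.

    modules: list of (name, start, end) for libraries mapped at report time.
    """
    for name, start, end in modules:
        if start <= address < end:
            return name
    return None


def describe(address, modules):
    """Render the module part of a frame the way the report prints it."""
    name = module_for(address, modules)
    return f"({name})" if name else "(<unknown module>)"


def is_suppressed(frame_addresses, modules, patterns):
    """True if any frame's module name contains any suppression pattern."""
    for address in frame_addresses:
        name = module_for(address, modules)
        if name is None:
            continue  # no module name, so nothing to match against
        if any(pattern in name for pattern in patterns):
            return True
    return False


frame = 0x7F80155B1779  # frame #4 in comment 2
# Hypothetical mapping range, chosen only so that it contains `frame`.
libgomp = ("libgomp.so.1", 0x7F80155A0000, 0x7F80155C0000)

while_loaded = describe(frame, [libgomp])   # library still mapped
after_dlclose = describe(frame, [])         # library unmapped at exit
```

In this model a `libgomp` pattern suppresses the leak while the library is mapped and stops matching once it is unloaded. If a different library were later mapped over the same range, `module_for` would attribute the frame to that library instead, which is the second scenario described above.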
Assignee
Comment 19•8 years ago
Skipping all dlclose calls removes these leaks. I hesitate to apply such a
workaround in general because it may suppress ASAN detection of one class of
use-after-free bugs.
When skipping dlclose removes these leaks in c1, there are no extra
suppressions. That suggests that the library may be holding, in a static
variable, a reference to the leaking memory.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5f8d1abb4aeb0cc9db69c8ed4422c2762b60584f&selectedJob=32534567
This try run confirms that libavcodec-ffmpeg.so.56 is the library that
triggers the leaks when dlclose()d.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=55e3970969db84d54301f73c5af15964cbcdac7e&selectedJob=32539723
If the library is not designed to be dlclose()d, then it probably should not
be dlclose()d. However, comment 2 seems to indicate that libgomp is the
problem library, and so perhaps keeping that open may work around this.
Or perhaps a gcc runtime library update could fix this.
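The static-variable hypothesis can be illustrated with a toy reachability model. This is a sketch of the general idea only, assuming a leak checker that treats the globals of currently loaded modules as roots; `find_leaks` and the region names are hypothetical, and nothing here confirms which library actually owns the 8 bytes.

```python
def find_leaks(heap, roots):
    """Return heap chunks that are not reachable from any root region.

    heap:  dict mapping chunk name -> set of chunk names it points to
    roots: dict mapping region name -> set of chunk names it points to
    """
    reachable = set()
    pending = [chunk for pointers in roots.values() for chunk in pointers]
    while pending:
        chunk = pending.pop()
        if chunk in reachable:
            continue
        reachable.add(chunk)
        pending.extend(heap.get(chunk, ()))
    return sorted(set(heap) - reachable)


heap = {"8-byte allocation": set()}
roots = {
    "main thread stack": set(),
    # A static variable in the library still points at the allocation.
    "library globals": {"8-byte allocation"},
}

# dlclose skipped: the library's globals are still scanned as roots.
leaks_without_dlclose = find_leaks(heap, roots)

# dlclose called: the globals are unmapped, so the pointer is gone.
roots_after_dlclose = {k: v for k, v in roots.items() if k != "library globals"}
leaks_with_dlclose = find_leaks(heap, roots_after_dlclose)
```

In the model, the allocation is reported only in the second case, which is consistent with the observation that skipping dlclose removes the leak without any extra suppressions.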
Summary: LeakSanitizer detects leak of 8 bytes when running tests on Ubuntu 16.04 → LeakSanitizer detects leak of 8 bytes in (<unknown module>) when running tests on Ubuntu 16.04
Assignee
Comment 20•8 years ago
(In reply to Dan Minor [:dminor] from comment #15)
> Looks like this is now working:
> https://treeherder.mozilla.org/#/jobs?repo=try&revision=97e5846bab0c8b06cbcd110f695adf52fd90bc2a
I don't know why those runs were green, because the leaks were still output in the logs.
e.g. https://treeherder.mozilla.org/logviewer.html#?job_id=31044459&repo=try
Assignee
Comment 21•8 years ago
https://treeherder.mozilla.org/#/jobs?repo=try&revision=46b709c267874ac9853029609c1a33e93e1ac0d8&selectedJob=32614977
indicates that the proposed fix for bug 1323382 suppresses these leaks as they appear in mochitest-chrome tests.
Assignee: nobody → karlt
Status: REOPENED → ASSIGNED
Assignee
Updated•8 years ago
Status: ASSIGNED → RESOLVED
Closed: 8 years ago → 8 years ago
Resolution: --- → FIXED
Whiteboard: [fixed with bug 1323382]