Closed Bug 1602689 Opened 5 years ago Closed 3 years ago

Intermittent LeakSanitizer | leak at mozilla::NotNull, RacyRegisteredThread, RegisteredThread::RegisteredThread, mozilla::detail::UniqueSelector

Categories

(Core :: Gecko Profiler, defect, P5)

RESOLVED FIXED
Tracking Status
firefox97 --- fixed
firefox98 --- fixed
firefox99 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: egao)

References

(Blocks 1 open bug)

Details

(Keywords: intermittent-failure)

Attachments

(1 file)

Filed by: nerli [at] mozilla.com
Parsed log: https://treeherder.mozilla.org/logviewer.html#?job_id=280435304&repo=autoland
Full log: https://firefox-ci-tc.services.mozilla.com/api/queue/v1/task/JL3W6jibRL23XK3tRkyPSA/runs/0/artifacts/public/logs/live_backing.log


[task 2019-12-10T07:25:22.316Z] 07:25:22 INFO - GECKO(4052) | SUMMARY: AddressSanitizer: 64 byte(s) leaked in 2 allocation(s).
[task 2019-12-10T07:25:22.392Z] 07:25:22 INFO - TEST-INFO | Main app process: exit 0
[task 2019-12-10T07:25:22.393Z] 07:25:22 INFO - TEST-INFO | LeakSanitizer | To show the addresses of leaked objects add report_objects=1 to LSAN_OPTIONS
[task 2019-12-10T07:25:22.393Z] 07:25:22 INFO - TEST-INFO | LeakSanitizer | This can be done in testing/mozbase/mozrunner/mozrunner/utils.py
[task 2019-12-10T07:25:22.393Z] 07:25:22 ERROR - TEST-UNEXPECTED-FAIL | LeakSanitizer | leak at mozilla::NotNull, RacyRegisteredThread, RegisteredThread::RegisteredThread, mozilla::detail::UniqueSelector
[task 2019-12-10T07:25:22.393Z] 07:25:22 INFO - runtests.py | Application ran for: 0:00:19.358857
[task 2019-12-10T07:25:22.393Z] 07:25:22 INFO - zombiecheck | Reading PID log: /tmp/tmpT2Dxgvpidlog
[task 2019-12-10T07:25:22.393Z] 07:25:22 INFO - ==> process 4052 launched child process 4065
[task 2019-12-10T07:25:22.393Z] 07:25:22 INFO - ==> process 4052 launched child process 4105
[task 2019-12-10T07:25:22.393Z] 07:25:22 INFO - ==> process 4052 launched child process 4118
[task 2019-12-10T07:25:22.393Z] 07:25:22 INFO - ==> process 4052 launched child process 4185
[task 2019-12-10T07:25:22.394Z] 07:25:22 INFO - ==> process 4052 launched child process 4218
[task 2019-12-10T07:25:22.394Z] 07:25:22 INFO - ==> process 4052 launched child process 4249
[task 2019-12-10T07:25:22.394Z] 07:25:22 INFO - zombiecheck | Checking for orphan process with PID: 4065
[task 2019-12-10T07:25:22.394Z] 07:25:22 INFO - zombiecheck | Checking for orphan process with PID: 4105
[task 2019-12-10T07:25:22.394Z] 07:25:22 INFO - zombiecheck | Checking for orphan process with PID: 4118
[task 2019-12-10T07:25:22.394Z] 07:25:22 INFO - zombiecheck | Checking for orphan process with PID: 4185
[task 2019-12-10T07:25:22.394Z] 07:25:22 INFO - zombiecheck | Checking for orphan process with PID: 4249
[task 2019-12-10T07:25:22.394Z] 07:25:22 INFO - zombiecheck | Checking for orphan process with PID: 4218
[task 2019-12-10T07:25:22.394Z] 07:25:22 INFO - Stopping web server
[task 2019-12-10T07:25:22.409Z] 07:25:22 INFO - Stopping web socket server
[task 2019-12-10T07:25:22.413Z] 07:25:22 INFO - Stopping ssltunnel
[task 2019-12-10T07:25:22.432Z] 07:25:22 WARNING - leakcheck | refcount logging is off, so leaks can't be detected!

:gerald - this is an issue that I've been observing on almost all chunks that have failures in linux64-asan/opt, when I use the new ubuntu1804 test image.

I noticed that this particular failure was seen on mozilla-central (ubuntu1604) tests on December 16 but since December 17 it is no longer reported. However, on my latest mozilla-central pull, I am still able to reproduce this issue with ubuntu1804.

Here are some try runs:
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&tier=1%2C2%2C3&revision=55d21a101684af73b0c7faa74ab912ff7d80596a&searchStr=browser-chrome
example log: https://firefoxci.taskcluster-artifacts.net/KSgSfGjHS7iZps2yTEdTFw/0/public/logs/live_backing.log

Would you be able to take a look, or pass the ni to someone that can comment on why this is still reproducible on ubuntu1804? Thanks!

To run tests against ubuntu1804, please use ./mach try fuzzy --ubuntu-bionic and select linux64 tasks as normal.

Flags: needinfo?(gsquelart)

It looks like the leaks are all happening with cubeb in the stack, if that indicates anything.

Thank you Edwin and Andrew for all this good information.

I see that when the "AudioIPC Client RPC" thread is created, the thread function registers the thread, but then it's never de-registered.
And since bug 1445822, the information for still-registered threads is not scrapped when the profiler shuts down (in case the thread still needs access to it to work on labels).

So we need that thread to de-register itself when it ends -- assuming it ever ends? I'm afraid it may be one of those never-ending Rayon threads, see bug 1445822 comment 50.

Paul, I see you wrote this register_thread. Would it be possible to write and call a corresponding deregister_thread? And would it actually be called?!
(The C++ callback is there.)
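
For illustration, the register/de-register pairing discussed above can be sketched as a scope guard. This is a minimal standalone sketch, not the profiler's or AudioIPC's actual code: `ThreadRegistry`, `AutoRegisterThread` and `ThreadMain` are hypothetical names, and the registry only stands in for the profiler's list of registered threads.

```cpp
#include <cassert>
#include <cstddef>
#include <mutex>
#include <set>
#include <string>
#include <utility>

// Hypothetical stand-in for the profiler's list of registered threads.
class ThreadRegistry {
 public:
  void Register(const std::string& name) {
    std::lock_guard<std::mutex> lock(mMutex);
    mNames.insert(name);
  }
  void Unregister(const std::string& name) {
    std::lock_guard<std::mutex> lock(mMutex);
    std::size_t erased = mNames.erase(name);
    // In this sketch, unregistering an unknown thread is treated as a bug.
    assert(erased == 1);
    (void)erased;
  }
  std::size_t Count() {
    std::lock_guard<std::mutex> lock(mMutex);
    return mNames.size();
  }

 private:
  std::mutex mMutex;
  std::set<std::string> mNames;
};

// Scope guard: registers in the constructor, de-registers in the destructor,
// so the thread is always de-registered when its thread function returns.
class AutoRegisterThread {
 public:
  AutoRegisterThread(ThreadRegistry& registry, std::string name)
      : mRegistry(registry), mName(std::move(name)) {
    mRegistry.Register(mName);
  }
  ~AutoRegisterThread() { mRegistry.Unregister(mName); }
  AutoRegisterThread(const AutoRegisterThread&) = delete;
  AutoRegisterThread& operator=(const AutoRegisterThread&) = delete;

 private:
  ThreadRegistry& mRegistry;
  std::string mName;
};

// Hypothetical thread function. Returns the number of registered threads
// observed while the guard is alive.
std::size_t ThreadMain(ThreadRegistry& registry) {
  AutoRegisterThread guard(registry, "AudioIPC Client RPC");
  return registry.Count();
}
```

The guard only helps if the thread function actually returns; a thread that never ends would stay registered, which is the concern raised above.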

Flags: needinfo?(gsquelart) → needinfo?(padenot)

We certainly can, but it's annoying to have to.

Flags: needinfo?(padenot)

Failure is still reproducible on current mozilla-central as of 2020/01/06:
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&tier=1%2C2%2C3&revision=351c5423c62b726096049d19bc4c10849be3fcb9&searchStr=asan%2Cbrowser-chrome&selectedJob=283661039

Refer to test-linux64-asan/opt-mochitest-browser-chrome-fis-e10s-8 M-fis(bc8) for the instance of this failure.

Some possible alternatives:

  • The thread could store its required data on the stack, and the profiler would store what it needs separately; so the profiler could destroy what it owns (like before) while the thread would still be able to do its work after that.
  • Tell LeakSanitizer not to worry (if that's possible?)
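
The first alternative can be sketched roughly as follows. This is a simplified model under stated assumptions, not the profiler's real data layout: `ProfilerThreadRecord`, `ThreadLocalLabels`, `Profiler` and `SimulateThreadOutlivingProfiler` are hypothetical names, and the "thread" is simulated by a plain function call.

```cpp
#include <cstddef>
#include <map>
#include <memory>
#include <string>
#include <utility>

// Hypothetical data that only the profiler needs; the profiler owns it.
struct ProfilerThreadRecord {
  std::string mName;
};

// Hypothetical data the thread needs for its own work (e.g. labels);
// it lives on the thread's stack and is not owned by the profiler.
struct ThreadLocalLabels {
  int mDepth = 0;
  void Push() { ++mDepth; }
  void Pop() { --mDepth; }
};

class Profiler {
 public:
  void Register(int threadId, std::string name) {
    auto record = std::make_unique<ProfilerThreadRecord>();
    record->mName = std::move(name);
    mRecords[threadId] = std::move(record);
  }
  // Shutdown frees everything the profiler owns, so nothing is leaked
  // even if a registered thread is still running.
  void Shutdown() { mRecords.clear(); }
  std::size_t LiveRecords() const { return mRecords.size(); }

 private:
  std::map<int, std::unique_ptr<ProfilerThreadRecord>> mRecords;
};

// Simulates a thread that keeps working after the profiler has shut down.
int SimulateThreadOutlivingProfiler(Profiler& profiler) {
  ThreadLocalLabels labels;  // on the thread's stack
  profiler.Register(1, "AudioIPC Client RPC");
  labels.Push();
  profiler.Shutdown();  // the profiler destroys only what it owns
  labels.Push();        // the thread's own data is still valid
  labels.Pop();
  return labels.mDepth;
}
```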

(In reply to Gerald Squelart [:gerald] (he/him) from comment #10)

  • Tell LeakSanitizer not to worry (if that's possible?)

You can use MOZ_LSAN_INTENTIONALLY_LEAK_OBJECT to tell LSan to ignore the leak of a specific object. Obviously, it should be used with great care.

You can also whitelist via the allocation stack, but in this case where there are just one or two specific objects then MOZ_LSAN_INTENTIONALLY_LEAK_OBJECT might be better.
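
As a rough illustration of the per-object approach: the definition of MOZ_LSAN_INTENTIONALLY_LEAK_OBJECT is not shown in this bug, so the sketch below assumes it behaves like a thin wrapper around LeakSanitizer's `__lsan_ignore_object` and compiles to nothing in non-sanitizer builds. `SKETCH_INTENTIONALLY_LEAK_OBJECT`, `ThreadInfo` and `CreateLongLivedThreadInfo` are hypothetical names used only for this example.

```cpp
#include <cstddef>

// Detect AddressSanitizer builds (Clang via __has_feature, GCC via
// __SANITIZE_ADDRESS__). LeakSanitizer runs as part of ASan on Linux.
#if defined(__has_feature)
#  if __has_feature(address_sanitizer)
#    define SKETCH_HAS_LSAN 1
#  endif
#endif
#if !defined(SKETCH_HAS_LSAN) && defined(__SANITIZE_ADDRESS__)
#  define SKETCH_HAS_LSAN 1
#endif

#ifdef SKETCH_HAS_LSAN
#  include <sanitizer/lsan_interface.h>
// Tell LSan that this heap object is leaked on purpose.
#  define SKETCH_INTENTIONALLY_LEAK_OBJECT(ptr) __lsan_ignore_object(ptr)
#else
// No-op outside sanitizer builds.
#  define SKETCH_INTENTIONALLY_LEAK_OBJECT(ptr) ((void)(ptr))
#endif

// Hypothetical per-thread object that is deliberately never freed.
struct ThreadInfo {
  int mThreadId;
};

ThreadInfo* CreateLongLivedThreadInfo(int threadId) {
  ThreadInfo* info = new ThreadInfo{threadId};
  // Mark exactly this object; other leaks are still reported.
  SKETCH_INTENTIONALLY_LEAK_OBJECT(info);
  return info;
}
```

Marking a single object keeps the exemption narrow, whereas a stack-based suppression hides every leak allocated through the matching frame.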

Checking to see if there's any movement on this.

This issue is one of a couple that are preventing mochitest-browser-chrome from running under linux1804.

:gerald - thanks for the initial triage of this bug back in comment 2.

There has been no movement on this bug, and this is the last item holding back the mochitest-browser-chrome suite from being moved to run on linux1804. This is also a failure that I can't disable or annotate, which means I am blocked in my migration.

My hope was to have the mochitest-browser-chrome suite migrated over before the Berlin All Hands as I am going on parental leave after that time.

Flags: needinfo?(gsquelart)

For reference, this is the most recent try push showing the failures:
https://treeherder.mozilla.org/#/jobs?repo=try&group_state=expanded&tier=1%2C2%2C3&revision=84e3929daec47302129549699274491943bb1ad4&searchStr=asan%2Cbrowser-chrome&selectedJob=285706918

To run pushes against ubuntu1804, please use mach try fuzzy --ubuntu-bionic and select linux64 jobs as normal.

Assignee: nobody → egao
Status: NEW → ASSIGNED
Pushed by egao@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4a26f99760d5 whitelist RegisteredThread::RegisteredThread r=decoder,gerald

I've added the leave-open flag; this issue still needs to be addressed as soon as possible.

I'm on PTO for one week, then half-PTO for 2 weeks, so "ASAP" for me may be a few weeks away...

I think we can arrange to call PROFILER_UNREGISTER_THREAD() when the AudioIPC client threads shut down, I'll try that when I have a chance once I get home from Berlin.

Flags: needinfo?(kinetik)

(In reply to Matthew Gregan [:kinetik] from comment #24)

I think we can arrange to call PROFILER_UNREGISTER_THREAD() when the AudioIPC client threads shut down, I'll try that when I have a chance once I get home from Berlin.

I'll land this change via bug 1614547.

Flags: needinfo?(kinetik)

Worth noting that bug 1610640 may hide this, since AudioIPC is (temporarily) disabled in mochitest-browser-chrome.

Much appreciated, Matthew. ⭐️

I'll keep an eye on leaks here, in case there are other long-living threads...

Depends on: 1614547
Assignee: egao → nobody
Status: ASSIGNED → NEW
Blocks: LSan

RegisteredThread was removed in bug 1722261, so I'll call this bug effectively fixed as of 93.0a1 / 20210824094724.

Status: NEW → RESOLVED
Closed: 3 years ago
Depends on: 1722261
Flags: needinfo?(gsquelart)
Resolution: --- → FIXED
Assignee: nobody → egao