Open Bug 988464 Opened 11 years ago Updated 2 years ago

Flag threads that aren't shut down before nsThreadManager::Shutdown()

Categories

(Core :: XPCOM, defect)

defect

Tracking

()

ASSIGNED
mozilla35

People

(Reporter: jesup, Assigned: jesup)

References

(Depends on 2 open bugs)

Details

Attachments

(1 file, 8 obsolete files)

This is to catch nsThread leaks like the one in bug 986764. Threads should be shut down in xpcom-shutdown-threads. However, this may not be always worthwhile (and will cost code just for cleanup), so we could easily mark known-want-to-shutdown-at-threadmanager-shutdown threads so they don't get flagged and just ::Shutdown().
It's an API contract that client code must shut down all threads by the time xpcom-shutdown-threads finishes, so all of that code ought to exist already.
Note: some threads with ShutdownNeeded() == false have PRThread pointers that go to freed memory!
https://tbpl.mozilla.org/?tree=Try&rev=8ac1c136a95f For some reason I wasn't getting notices on Try runs with nsThread:3 in automation.py.in, though local mochitest runs do show it. So I added an printf() for good measure.
Attachment #8397304 - Attachment is obsolete: true
Attached patch Shut down MediaManager thread in xpcom-shutdown (obsolete) (deleted) — Splinter Review
shuts down MediaThread in xpcom-shutdown (which is before xpcom-shutdown-threads....). Perhaps we should shut it down later in xpcom-shutdown-threads. Also it appeared there were several extant refs to MediaManager itself
Note: automation.py is no longer used in Mochitest.
Attachment #8397379 - Attachment is obsolete: true
Blocks: 988872
Blocks: 988871
Blocks: 988874
Blocks: 988877
Blocks: 988881
Blocks: 988884
Attachment #8397840 - Attachment is obsolete: true
Comment on attachment 8398015 [details] [diff] [review] log nsThreads still active at nsThreadManager::Shutdown() I can remove automation.py.in if we're not using it anymore (seemed to be used for a local ./mach mochitest-plain)
Attachment #8398015 - Flags: review?(benjamin)
Attachment #8397382 - Attachment is obsolete: true
I think we need to lock around the PRThread access
Attachment #8398015 - Attachment is obsolete: true
Attachment #8398015 - Flags: review?(benjamin)
Attachment #8398041 - Flags: review?(benjamin)
(In reply to Randell Jesup [:jesup] from comment #9) > I can remove automation.py.in if we're not using it anymore (seemed to be > used for a local ./mach mochitest-plain) We're not using it for mochitest, it's still being used for reftest. It should not be used in a local mochitest run, I'm not sure where you're seeing that.
Comment on attachment 8398041 [details] [diff] [review] log nsThreads still active at nsThreadManager::Shutdown() Why is the top part #ifdef PR_LOGGING but the lower part #ifdef DEBUG? Is this code so performance-critical that we couldn't use PR logging in all builds and use PR_LOG_TEST to guard whether we enter those logging loops?
Flags: needinfo?(rjesup)
(In reply to Benjamin Smedberg [:bsmedberg] from comment #13) > Comment on attachment 8398041 [details] [diff] [review] > log nsThreads still active at nsThreadManager::Shutdown() > > Why is the top part #ifdef PR_LOGGING but the lower part #ifdef DEBUG? History of how I wrote it; originally it was leeching off nsThread's logs. It should be PR_LOGGING I think > Is this code so performance-critical that we couldn't use PR logging in all > builds and use PR_LOG_TEST to guard whether we enter those logging loops? It's not perf-critical, and only does two PR_Log() calls anyways in shutdown. (Partly I was initially cautious about letting this code run in opt/release builds because I know half-shut-down is a tricky state.) And there should be few or no threads here anyways.) We can use PR_LOG_TEST, but I don't think it's important here.
Flags: needinfo?(rjesup)
Comment on attachment 8398041 [details] [diff] [review] log nsThreads still active at nsThreadManager::Shutdown() ok r=me with #ifdef PR_LOGGING instead of DEBUG
Attachment #8398041 - Flags: review?(benjamin) → review+
Attachment #8398041 - Attachment is obsolete: true
Comment on attachment 8457444 [details] [diff] [review] log nsThreads still active at nsThreadManager::Shutdown() Corry r+ forward
Attachment #8457444 - Flags: review+
sorry had to back out this changes, seems this caused xpcshell test failures like https://tbpl.mozilla.org/php/getParsedLog.php?id=43999486&tree=Mozilla-Inbound TEST-UNEXPECTED-FAIL | /builds/slave/talos-slave/test/build/tests/xpcshell/tests/toolkit/modules/tests/xpcshell/test_sqlite.js | test failed (with xpcshell return code: 1), see following log: PROCESS-CRASH | /builds/slave/talos-slave/test/build/tests/xpcshell/tests/toolkit/modules/tests/xpcshell/test_sqlite.js | application crashed [@ 0x7fff82857c00] Operating system: Mac OS X 23:48:35 INFO - 10.6.8 10K549 23:48:35 INFO - CPU: amd64 23:48:35 INFO - family 6 model 23 stepping 10 23:48:35 INFO - 2 CPUs 23:48:35 INFO - Crash reason: EXC_BAD_ACCESS / 0x0000000d 23:48:35 INFO - Crash address: 0x0 23:48:35 INFO - Thread 0 (crashed) 23:48:35 INFO - 0 0x7fff82857c00 23:48:35 INFO - rbx = 0x0000000000000008 r12 = 0x0000000000000002 23:48:35 INFO - r13 = 0x74537a6f6d206461 r14 = 0x74537a6f6d206461 23:48:35 INFO - r15 = 0x00007fff829a5da4 rip = 0x00007fff82857c00 23:48:35 INFO - rsp = 0x00007fff5fbfd228 rbp = 0x00007fff5fbfd250 23:48:35 INFO - Found by: given as instruction pointer in context 23:48:35 INFO - Thread 1 23:48:35 INFO - 0 0x7fff8286dc0a 23:48:35 INFO - rbx = 0x0000000106280e50 r12 = 0x00007fff7037a998 23:48:35 INFO - r13 = 0x00007fff7037ad68 r14 = 0xffffffffffffffff 23:48:35 INFO - r15 = 0x00007fff7037ada8 rip = 0x00007fff8286dc0a 23:48:35 INFO - rsp = 0x0000000106280ce8 rbp = 0x0000000106280e90 23:48:35 INFO - Found by: given as instruction pointer in context 23:48:35 INFO - Thread 2 23:48:35 INFO - 0 0x7fff8286ea2a 23:48:35 INFO - rbx = 0x0000000106704000 r12 = 0x00000001061593c0 23:48:35 INFO - r13 = 0x0000000106165000 r14 = 0x0000000106704000 23:48:35 INFO - r15 = 0x0000000000000000 rip = 0x00007fff8286ea2a 23:48:35 INFO - rsp = 0x0000000106703f28 rbp = 0x0000000106703f50 23:48:35 INFO - Found by: given as instruction pointer in context 23:48:35 INFO - Thread 3 23:48:35 INFO - 0 0x7fff8286dc0a 23:48:35 INFO - rbx = 0x0000000106147490 r12 = 0x0000000106127000 23:48:35 INFO - r13 = 0x0000000106126800 r14 = 0x0000000106147490 23:48:35 INFO - r15 = 0x0000000000000001 rip = 0x00007fff8286dc0a 23:48:35 INFO - rsp = 0x0000000107280ac8 rbp = 0x0000000107280b30 23:48:35 INFO - Found by: given as instruction pointer in context 23:48:35 INFO - Thread 4 23:48:35 INFO - 0 0x7fff8288fa6a 23:48:35 INFO - rbx = 0x0000000107f81000 r12 = 0x0000000106117308
I should dig out the cause and re-land this
Target Milestone: mozilla33 → mozilla35
We need to get this landed again. We're leaking threads in async storage and we didn't notice...
Minor unbitrotting of jesup's patch.
Attachment #8457444 - Attachment is obsolete: true
That sql XPCshell test still fails. Locally, the log contains a bunch of lines that look like this: 0:04.76 LOG: Thread-1 INFO "CONSOLE_MESSAGE: (warn) [JavaScript Error: "Warning: Sqlite connection 'gc_99.sqlite#0' was not properly closed. Auto-close triggered by garbage collection. " {file: "resource://gre/modules/Sqlite.jsm" line: 67}]" The OSX crash report says that there's a stack overflow, though I'm not sure in what.
FYI, this is the bug we were discussing at Whistler
Simplified version of the patch, which simply NS_ASSERTION()s on any leaked nsThread. Note that simple tests (load Youtube, quit (linux)) show 2 leaked threads: Image Scaler, and FileBlockCache (which helpfully doesn't set the threadname; I'll file a bug on that). We likely won't want to land this without fixing those. Note that GDB shows that FileBlockCache::Close() isn't even being called in my test.
Attachment #8649759 - Flags: review?(nfroyd)
Attachment #8586281 - Attachment is obsolete: true
Assignee: nobody → rjesup
Status: NEW → ASSIGNED
cpearce, bholley - note comment 25 about FileBlockCache leaking
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c2a8c5cf4b35 Mostly green; several C++ unit tests and XPCShell tests don't bother to shut down some of their threads. Interesting that in mochitests, timing works out that Image Scaler and FileBlockCache get shut down (albeit with a sample size of only 1 run), while running a local linux64 debug build they fail always when closing a window after a simple test of Youtube. I can't say I'm surprised at some of our thread shutdowns having timing races, though - one of the bugs that poked me to push this forward again has a strongly implied timing-caused thread leak. Wouldn't be shocked if turning this on exposed a handful of intermittents in automation. Right now it just uses NS_ASSERTION; we could make it log in opt builds (though I'd prefer something that caused orange/red on treeherder; perhaps use some env var to react more harshly if we're in a mochitest). MOZ_CRASH would seem overkill however.
Comment on attachment 8649759 [details] [diff] [review] log nsThreads still active at nsThreadManager::Shutdown() Review of attachment 8649759 [details] [diff] [review]: ----------------------------------------------------------------- r=me with changes below. ::: xpcom/threads/nsThread.h @@ +173,5 @@ > } > }; > > + // This lock protects access to mObserver, mEvents and mEventsAreDoomed > + // and mShutdownRequired. All of those fields (except mShutdownRequired) Nit: this should now read "...mEvents, mEventsAreDoomed, and mShutdownRequired" Feel free to rewrap the comment if necessary. ::: xpcom/threads/nsThreadManager.cpp @@ +3,5 @@ > /* This Source Code Form is subject to the terms of the Mozilla Public > * License, v. 2.0. If a copy of the MPL was not distributed with this > * file, You can obtain one at http://mozilla.org/MPL/2.0/. */ > > +#include "prprf.h" Can we just use mozilla/Snprintf.h and use snprintf_literal directly below? Best to avoid introducing more NSPR-isms.
Attachment #8649759 - Flags: review?(nfroyd) → review+
(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #28) > Can we just use mozilla/Snprintf.h and use snprintf_literal directly below? > Best to avoid introducing more NSPR-isms. Sure! I hadn't noticed that had landed; good!
Blocks: 1270575
Depends on: 1271402
Depends on: 1271429
Depends on: 1271514
Depends on: 1271791
tracking-fennec: --- → ?
relnote-firefox: --- → ?
Flags: sec-bounty?
Flags: in-testsuite+
Flags: behind-pref+
Flags: a11y-review+
tracking-fennec: ? → ---
relnote-firefox: ? → ---
Flags: sec-bounty?
Flags: in-testsuite+
Flags: behind-pref+
Flags: a11y-review+

"PERF" key word?

Depends on: 1644883
Depends on: 1644896
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: