Open
Bug 988464
Opened 11 years ago
Updated 2 years ago
Flag threads that aren't shut down before nsThreadManager::Shutdown()
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
ASSIGNED
mozilla35
People
(Reporter: jesup, Assigned: jesup)
References
(Depends on 2 open bugs)
Details
Attachments
(1 file, 8 obsolete files)
(deleted),
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
This is to catch nsThread leaks like the one in bug 986764. Threads should be shut down in xpcom-shutdown-threads.
However, this may not be always worthwhile (and will cost code just for cleanup), so we could easily mark known-want-to-shutdown-at-threadmanager-shutdown threads so they don't get flagged and just ::Shutdown().
Comment 1•11 years ago
|
||
It's an API contract that client code must shut down all threads by the time xpcom-shutdown-threads finishes, so all of that code ought to exist already.
Assignee | ||
Comment 2•11 years ago
|
||
Note: some threads with ShutdownNeeded() == false have PRThread pointers that go to freed memory!
Assignee | ||
Comment 3•11 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=8ac1c136a95f
For some reason I wasn't getting notices on Try runs with nsThread:3 in automation.py.in, though local mochitest runs do show it. So I added an printf() for good measure.
Assignee | ||
Comment 4•11 years ago
|
||
Assignee | ||
Updated•11 years ago
|
Attachment #8397304 -
Attachment is obsolete: true
Assignee | ||
Comment 5•11 years ago
|
||
shuts down MediaThread in xpcom-shutdown (which is before xpcom-shutdown-threads....). Perhaps we should shut it down later in xpcom-shutdown-threads. Also it appeared there were several extant refs to MediaManager itself
Comment 6•11 years ago
|
||
Note: automation.py is no longer used in Mochitest.
Assignee | ||
Comment 7•11 years ago
|
||
Assignee | ||
Updated•11 years ago
|
Attachment #8397379 -
Attachment is obsolete: true
Assignee | ||
Comment 8•11 years ago
|
||
Assignee | ||
Updated•11 years ago
|
Attachment #8397840 -
Attachment is obsolete: true
Assignee | ||
Comment 9•11 years ago
|
||
Comment on attachment 8398015 [details] [diff] [review]
log nsThreads still active at nsThreadManager::Shutdown()
I can remove automation.py.in if we're not using it anymore (seemed to be used for a local ./mach mochitest-plain)
Attachment #8398015 -
Flags: review?(benjamin)
Assignee | ||
Updated•11 years ago
|
Attachment #8397382 -
Attachment is obsolete: true
Assignee | ||
Comment 10•11 years ago
|
||
I think we need to lock around the PRThread access
Assignee | ||
Updated•11 years ago
|
Attachment #8398015 -
Attachment is obsolete: true
Attachment #8398015 -
Flags: review?(benjamin)
Assignee | ||
Updated•11 years ago
|
Attachment #8398041 -
Flags: review?(benjamin)
Assignee | ||
Comment 11•11 years ago
|
||
https://tbpl.mozilla.org/?tree=Try&rev=e26f103ed199
(with a previous patch)
Comment 12•11 years ago
|
||
(In reply to Randell Jesup [:jesup] from comment #9)
> I can remove automation.py.in if we're not using it anymore (seemed to be
> used for a local ./mach mochitest-plain)
We're not using it for mochitest, it's still being used for reftest. It should not be used in a local mochitest run, I'm not sure where you're seeing that.
Comment 13•11 years ago
|
||
Comment on attachment 8398041 [details] [diff] [review]
log nsThreads still active at nsThreadManager::Shutdown()
Why is the top part #ifdef PR_LOGGING but the lower part #ifdef DEBUG?
Is this code so performance-critical that we couldn't use PR logging in all builds and use PR_LOG_TEST to guard whether we enter those logging loops?
Flags: needinfo?(rjesup)
Assignee | ||
Comment 14•11 years ago
|
||
(In reply to Benjamin Smedberg [:bsmedberg] from comment #13)
> Comment on attachment 8398041 [details] [diff] [review]
> log nsThreads still active at nsThreadManager::Shutdown()
>
> Why is the top part #ifdef PR_LOGGING but the lower part #ifdef DEBUG?
History of how I wrote it; originally it was leeching off nsThread's logs. It should be PR_LOGGING I think
> Is this code so performance-critical that we couldn't use PR logging in all
> builds and use PR_LOG_TEST to guard whether we enter those logging loops?
It's not perf-critical, and only does two PR_Log() calls anyways in shutdown. (Partly I was initially cautious about letting this code run in opt/release builds because I know half-shut-down is a tricky state.) And there should be few or no threads here anyways.) We can use PR_LOG_TEST, but I don't think it's important here.
Flags: needinfo?(rjesup)
Comment 15•11 years ago
|
||
Comment on attachment 8398041 [details] [diff] [review]
log nsThreads still active at nsThreadManager::Shutdown()
ok r=me with #ifdef PR_LOGGING instead of DEBUG
Attachment #8398041 -
Flags: review?(benjamin) → review+
Assignee | ||
Comment 16•10 years ago
|
||
unbitrot
Assignee | ||
Updated•10 years ago
|
Attachment #8398041 -
Attachment is obsolete: true
Assignee | ||
Comment 17•10 years ago
|
||
Comment on attachment 8457444 [details] [diff] [review]
log nsThreads still active at nsThreadManager::Shutdown()
Corry r+ forward
Attachment #8457444 -
Flags: review+
Assignee | ||
Comment 18•10 years ago
|
||
Target Milestone: --- → mozilla33
Comment 19•10 years ago
|
||
sorry had to back out this changes, seems this caused xpcshell test failures like
https://tbpl.mozilla.org/php/getParsedLog.php?id=43999486&tree=Mozilla-Inbound
TEST-UNEXPECTED-FAIL | /builds/slave/talos-slave/test/build/tests/xpcshell/tests/toolkit/modules/tests/xpcshell/test_sqlite.js | test failed (with xpcshell return code: 1), see following log:
PROCESS-CRASH | /builds/slave/talos-slave/test/build/tests/xpcshell/tests/toolkit/modules/tests/xpcshell/test_sqlite.js | application crashed [@ 0x7fff82857c00]
Operating system: Mac OS X
23:48:35 INFO - 10.6.8 10K549
23:48:35 INFO - CPU: amd64
23:48:35 INFO - family 6 model 23 stepping 10
23:48:35 INFO - 2 CPUs
23:48:35 INFO - Crash reason: EXC_BAD_ACCESS / 0x0000000d
23:48:35 INFO - Crash address: 0x0
23:48:35 INFO - Thread 0 (crashed)
23:48:35 INFO - 0 0x7fff82857c00
23:48:35 INFO - rbx = 0x0000000000000008 r12 = 0x0000000000000002
23:48:35 INFO - r13 = 0x74537a6f6d206461 r14 = 0x74537a6f6d206461
23:48:35 INFO - r15 = 0x00007fff829a5da4 rip = 0x00007fff82857c00
23:48:35 INFO - rsp = 0x00007fff5fbfd228 rbp = 0x00007fff5fbfd250
23:48:35 INFO - Found by: given as instruction pointer in context
23:48:35 INFO - Thread 1
23:48:35 INFO - 0 0x7fff8286dc0a
23:48:35 INFO - rbx = 0x0000000106280e50 r12 = 0x00007fff7037a998
23:48:35 INFO - r13 = 0x00007fff7037ad68 r14 = 0xffffffffffffffff
23:48:35 INFO - r15 = 0x00007fff7037ada8 rip = 0x00007fff8286dc0a
23:48:35 INFO - rsp = 0x0000000106280ce8 rbp = 0x0000000106280e90
23:48:35 INFO - Found by: given as instruction pointer in context
23:48:35 INFO - Thread 2
23:48:35 INFO - 0 0x7fff8286ea2a
23:48:35 INFO - rbx = 0x0000000106704000 r12 = 0x00000001061593c0
23:48:35 INFO - r13 = 0x0000000106165000 r14 = 0x0000000106704000
23:48:35 INFO - r15 = 0x0000000000000000 rip = 0x00007fff8286ea2a
23:48:35 INFO - rsp = 0x0000000106703f28 rbp = 0x0000000106703f50
23:48:35 INFO - Found by: given as instruction pointer in context
23:48:35 INFO - Thread 3
23:48:35 INFO - 0 0x7fff8286dc0a
23:48:35 INFO - rbx = 0x0000000106147490 r12 = 0x0000000106127000
23:48:35 INFO - r13 = 0x0000000106126800 r14 = 0x0000000106147490
23:48:35 INFO - r15 = 0x0000000000000001 rip = 0x00007fff8286dc0a
23:48:35 INFO - rsp = 0x0000000107280ac8 rbp = 0x0000000107280b30
23:48:35 INFO - Found by: given as instruction pointer in context
23:48:35 INFO - Thread 4
23:48:35 INFO - 0 0x7fff8288fa6a
23:48:35 INFO - rbx = 0x0000000107f81000 r12 = 0x0000000106117308
Assignee | ||
Comment 20•10 years ago
|
||
I should dig out the cause and re-land this
Target Milestone: mozilla33 → mozilla35
We need to get this landed again. We're leaking threads in async storage and we didn't notice...
Comment 22•10 years ago
|
||
Minor unbitrotting of jesup's patch.
Attachment #8457444 -
Attachment is obsolete: true
Comment 23•10 years ago
|
||
That sql XPCshell test still fails. Locally, the log contains a bunch of lines that look like this:
0:04.76 LOG: Thread-1 INFO "CONSOLE_MESSAGE: (warn) [JavaScript Error: "Warning: Sqlite connection 'gc_99.sqlite#0' was not properly closed. Auto-close triggered by garbage collection.
" {file: "resource://gre/modules/Sqlite.jsm" line: 67}]"
The OSX crash report says that there's a stack overflow, though I'm not sure in what.
Assignee | ||
Comment 24•9 years ago
|
||
FYI, this is the bug we were discussing at Whistler
Assignee | ||
Comment 25•9 years ago
|
||
Simplified version of the patch, which simply NS_ASSERTION()s on any leaked nsThread. Note that simple tests (load Youtube, quit (linux)) show 2 leaked threads: Image Scaler, and FileBlockCache (which helpfully doesn't set the threadname; I'll file a bug on that). We likely won't want to land this without fixing those. Note that GDB shows that FileBlockCache::Close() isn't even being called in my test.
Attachment #8649759 -
Flags: review?(nfroyd)
Assignee | ||
Updated•9 years ago
|
Attachment #8586281 -
Attachment is obsolete: true
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → rjesup
Status: NEW → ASSIGNED
Assignee | ||
Comment 26•9 years ago
|
||
cpearce, bholley - note comment 25 about FileBlockCache leaking
Assignee | ||
Comment 27•9 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c2a8c5cf4b35
Mostly green; several C++ unit tests and XPCShell tests don't bother to shut down some of their threads. Interesting that in mochitests, timing works out that Image Scaler and FileBlockCache get shut down (albeit with a sample size of only 1 run), while running a local linux64 debug build they fail always when closing a window after a simple test of Youtube. I can't say I'm surprised at some of our thread shutdowns having timing races, though - one of the bugs that poked me to push this forward again has a strongly implied timing-caused thread leak. Wouldn't be shocked if turning this on exposed a handful of intermittents in automation.
Right now it just uses NS_ASSERTION; we could make it log in opt builds (though I'd prefer something that caused orange/red on treeherder; perhaps use some env var to react more harshly if we're in a mochitest). MOZ_CRASH would seem overkill however.
Comment 28•9 years ago
|
||
Comment on attachment 8649759 [details] [diff] [review]
log nsThreads still active at nsThreadManager::Shutdown()
Review of attachment 8649759 [details] [diff] [review]:
-----------------------------------------------------------------
r=me with changes below.
::: xpcom/threads/nsThread.h
@@ +173,5 @@
> }
> };
>
> + // This lock protects access to mObserver, mEvents and mEventsAreDoomed
> + // and mShutdownRequired. All of those fields (except mShutdownRequired)
Nit: this should now read "...mEvents, mEventsAreDoomed, and mShutdownRequired" Feel free to rewrap the comment if necessary.
::: xpcom/threads/nsThreadManager.cpp
@@ +3,5 @@
> /* This Source Code Form is subject to the terms of the Mozilla Public
> * License, v. 2.0. If a copy of the MPL was not distributed with this
> * file, You can obtain one at http://mozilla.org/MPL/2.0/. */
>
> +#include "prprf.h"
Can we just use mozilla/Snprintf.h and use snprintf_literal directly below? Best to avoid introducing more NSPR-isms.
Attachment #8649759 -
Flags: review?(nfroyd) → review+
Assignee | ||
Comment 29•9 years ago
|
||
(In reply to Nathan Froyd [:froydnj][:nfroyd] from comment #28)
> Can we just use mozilla/Snprintf.h and use snprintf_literal directly below?
> Best to avoid introducing more NSPR-isms.
Sure! I hadn't noticed that had landed; good!
Assignee | ||
Updated•9 years ago
|
Assignee | ||
Comment 30•9 years ago
|
||
https://treeherder.mozilla.org/#/jobs?repo=try&revision=3dc899f68172 with the fix for bug 1271402 (in review) and bug 1270575 (landed)
Assignee | ||
Updated•9 years ago
|
tracking-fennec: --- → ?
status-firefox60:
--- → ?
status-firefox61:
--- → ?
status-firefox62:
--- → ?
status-firefox-esr52:
--- → ?
status-firefox-esr60:
--- → ?
tracking-firefox60:
--- → ?
tracking-firefox61:
--- → ?
tracking-firefox62:
--- → ?
tracking-firefox-esr52:
--- → ?
tracking-firefox-esr60:
--- → ?
relnote-firefox:
--- → ?
tracking-thunderbird_esr52:
--- → ?
tracking-thunderbird_esr60:
--- → ?
Flags: sec-bounty?
Flags: in-testsuite+
Flags: behind-pref+
Flags: a11y-review+
Updated•6 years ago
|
tracking-fennec: ? → ---
status-firefox60:
? → ---
status-firefox61:
? → ---
status-firefox62:
? → ---
status-firefox-esr52:
? → ---
status-firefox-esr60:
? → ---
tracking-firefox60:
? → ---
tracking-firefox61:
? → ---
tracking-firefox62:
? → ---
tracking-firefox-esr52:
? → ---
tracking-firefox-esr60:
? → ---
relnote-firefox:
? → ---
tracking-thunderbird_esr52:
? → ---
tracking-thunderbird_esr60:
? → ---
Flags: sec-bounty?
Flags: in-testsuite+
Flags: behind-pref+
Flags: a11y-review+
Comment 31•5 years ago
|
||
"PERF" key word?
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•