Closed Bug 998330 Opened 11 years ago Closed 10 years ago

MinimizeMemory should happen on the opened-thread, not on main-thread

Categories

(Toolkit :: Storage, defect)

defect
Not set
normal
Points:
5

Tracking

()

RESOLVED FIXED
mozilla38
Iteration:
38.2 - 9 Feb
Tracking Status
firefox37 --- fixed
firefox38 --- fixed

People

(Reporter: mak, Assigned: mak)

References

Details

Attachments

(2 files, 3 obsolete files)

if the connection is created on a given thread, minimizeMemory should happens on that thread, not on main-thread.
Flags: firefox-backlog?
Flags: firefox-backlog? → firefox-backlog+
Whiteboard: p=0
Needs a point estimate. Putting into the current iteration as it is blocking bug 914070. Set to qa- because I think this will be hard to test manually, please correct me if I'm wrong.
Whiteboard: p=0 → p=0 s=it-31c-30a-29b.3 [qa-]
it's easier to test automatically than manually, basically I'm not sure it's worth a test though, since almost all of our test harnesses hit the assertion I added in bug 914070. This is probably an easy:2, should not require too much time to make a patch.
Whiteboard: p=0 s=it-31c-30a-29b.3 [qa-] → p=2 s=it-31c-30a-29b.3 [qa-]
Attached patch patch v1 (obsolete) (deleted) — Splinter Review
Unfortunately I must remove the thread check from connectionReady cause it may be executed on the main-thread when instead the connection was opened on another thread. For example when we collect all of the connections in the service and we check connectionReady for each of them... Probably with some refactoring (using !isClosing && !isClosed for example) I could fix that, but I feel like we are going too far from the goal here. I added a thread check assertion to ExecuteSimpleSQL, again it would probably be better to have a stricter check (ideally it should only run on the opened thread), but asyncClone() likes to use it from the async thread, and would be more refactoring... So, after spending time evaluating crazy refactorings, I figured I should just fix this bug and leave the rest to future development.
sigh, and unfortunately nsOfflineCacheDevice::Shutdown runs on main thread (and executeSimpleSQL from it) even if the database was created on a different thread... This is further stuff that is broken, but I'm not going to fix it here, also cause cache is going away for cache2, so I'm not even sure it's worth spending time on it. I'll convert the assert to a warning for now.
So, looks like the change here is causing a Connection leak in mochitests, I don't know if it's due to NS_newRunnableMethodWithArg, of just to unfortunate timing in calling minimizeMemory. I will experiment with my own runnable implementation. Though, since I must remove the connectionReady assertion regardless, due to the other consumers, I can start landing bug 914070 without the assertion check.
it's definitely dispatching the event that ExecuteSimpleSQL that causes the leak. it doesn't get better even using a custom runnable and releasing the connection asap. No ideas off-hand.
I looked at https://tbpl.mozilla.org/?tree=Try&rev=cd0c330b38f8. I don't think the problem is nsOfflineCacheDevice in the leak case. I traced down a lot of thread death stuff and it all looks sane. I could maybe imagine cases where Close() returns a failure code, but the Connection itself should still be destructed. Then I noticed in the leak event that there's an alive nsReverseStringSQLFunction. Which is this guy: http://dxr.mozilla.org/mozilla-central/source/dom/src/storage/DOMStorageDBThread.cpp#439 Maybe you're breaking the DOM storage DB?
Specifically, from https://tbpl.mozilla.org/php/getParsedLog.php?id=38414294&tree=Try we have: 09:09:40 INFO - TEST-INFO | leakcheck | leaked 2 Connection (224 bytes) 09:09:40 INFO - TEST-INFO | leakcheck | leaked 5 Mutex (60 bytes) 09:09:40 INFO - TEST-INFO | leakcheck | leaked 1 ReentrantMonitor (16 bytes) 09:09:40 INFO - TEST-INFO | leakcheck | leaked 1 Service (64 bytes) 09:09:40 INFO - TEST-INFO | leakcheck | leaked 1 nsLocalFile (124 bytes) 09:09:40 INFO - TEST-INFO | leakcheck | leaked 1 nsReverseStringSQLFunction (12 bytes) 09:09:40 INFO - TEST-INFO | leakcheck | leaked 2 nsRunnable (24 bytes) 09:09:40 INFO - TEST-INFO | leakcheck | leaked 1 nsStringBuffer (8 bytes) 09:09:40 INFO - TEST-INFO | leakcheck | leaked 2 nsTArray_base (8 bytes) 09:09:40 INFO - TEST-INFO | leakcheck | leaked 1 nsThread (124 bytes)
(In reply to Andrew Sutherland (:asuth) from comment #7) > I looked at https://tbpl.mozilla.org/?tree=Try&rev=cd0c330b38f8. I don't > think the problem is nsOfflineCacheDevice in the leak case. Yep, sorry if I was unclear, offlineCacheDevice was an issue with the assertion cause it acts like minimizeMemory, executing stuff on the wrong thread. But it's definitely not the reason of the leak. Removing the assertion "fixed" both issues (I will file a follow-up to track that). The leak is somehow related to this specific runnable I'm dispatching, or at least to how it interacts with all of the rest, since I don't think it's the runnable itself. > Then I noticed in the leak event that there's an alive > nsReverseStringSQLFunction. Which is this guy: > http://dxr.mozilla.org/mozilla-central/source/dom/src/storage/ > DOMStorageDBThread.cpp#439 > > Maybe you're breaking the DOM storage DB? I'm not sure how it may break by just dispatching a pragma shrink_memory. And a leak is a strange kind of breakage... A couple facts: 1. during mochitests we consume all of the memory, so memory-pressure notification is very likely to happen, thus minimizeMemory runs for sure. 2. the runnable may cause the connection to stay alive a little bit longer, since it has an owning pointer to it, until it can Run. Maybe we end up keeping the connection alive for too long.
Okay, I think I found the problem: DOMStorageDBThread uses a thread created via PR_CreateThread not via NS_NewThread. NS_NewThread uses the thread manager and so gets shutdown as part of the XPCOM shutdown where correctness is guaranteed. Unless there are some wacky #define shenanigans going on, the PR_Thread doesn't get that protection. So when you properly dispatch minimize memory to the DOMStorageDBThread, you have no guarantee that your runnable will run before shutdown happens.
Er, link: http://dxr.mozilla.org/mozilla-central/source/dom/src/storage/DOMStorageDBThread.cpp#81 It's possible there are other complex interactions going on. I just know that in my audit of the offlineCacheDevice, a lot of why your code should be safe was because NSThreads go to shut efforts to make sure all runnables are run during the shutdown process. And I should clarify that DOMStorageDBThread does create 2 connections, so this is very likely to be the guilty/affected party. I haven't looked into what the nsReverseFunction being alive means.
s/shut/such/ efforts. (Sorry, in a rush to get back to some FxOS stuff but wanted to try and help out here.)
Sigh, good find. I wonder if we can change DOM Storage to use nsIThread and the TM... I may try to test such patch on Try. A possible workaround may be to store a mShouldPurgeCache field in the connection and, on first opportunity (like an API call), do the shrink, just for those cases where we should have dispatched. It would still be sort of a problem cause the field would be set on one thread, used on another one, and the shrink may even not happen for a long time, or not at all.
I wonder if the only reason to directly use a PR thread was to be able to set its priority... off-hand anything can be done with an nsIthread as well, since it has a PRThread attribute. New patch on try that changes dom storage to use NS_NewThread, let's see. I think your idea is very likely, indeed the leak happens only on those mochitests using localStorage.
Attached patch patch v2 (obsolete) (deleted) — Splinter Review
Attachment #8411114 - Attachment is obsolete: true
Attached patch DOMStorage patch v1 (obsolete) (deleted) — Splinter Review
this is running here https://tbpl.mozilla.org/?tree=Try&rev=255ce9470a99 I don't know if it's going to work yet, just attaching for reference
Comment on attachment 8413163 [details] [diff] [review] DOMStorage patch v1 Ans looks like it's working! Let's see what Honza thinks about this.
Attachment #8413163 - Flags: review?(honzab.moz)
I need some background summary (too many comments here). From a quick look: - I prefer PR_Thread since it's faster (NS_NewThread stupidly waits for the thread creating blocking the calling thread, Shutdown() then stupidly loops the calling thread event queue - very bad thing! - why the rename to ThreadInit()?
- and also, why are you using nsThreadPoolNaming? there is just a single localstorage IO thread
(In reply to Andrew Sutherland (:asuth) from comment #10) > Okay, I think I found the problem: DOMStorageDBThread uses a thread created > via PR_CreateThread not via NS_NewThread. NS_NewThread uses the thread > manager and so gets shutdown as part of the XPCOM shutdown where correctness > is guaranteed. Unless there are some wacky #define shenanigans going on, > the PR_Thread doesn't get that protection. > > So when you properly dispatch minimize memory to the DOMStorageDBThread, you > have no guarantee that your runnable will run before shutdown happens. First, DOMStorageDBThread is not a regular thread that eats nsIRunnables you dispatch to it. Actually, it doesn't at all and never was meant to. If you need any protection, we can add it. I'd try to put some effort to fix this bug w/o swtiching to NS_NewThread. See also bug 992512.
(In reply to Honza Bambas (:mayhemer) from comment #18) > - I prefer PR_Thread since it's faster (NS_NewThread stupidly waits for the > thread creating blocking the calling thread, Shutdown() then stupidly loops > the calling thread event queue - very bad thing! See comment 10, PR_Thread doesn't ensure runnables are freeing up their resources on shutdown, so basically dispatching anything owning some pointer to it causes a leak. I cannot fix this bug unless we figure a way to solve that leak. Moving to nsIThread looked like the best approach from a safety point of view (since it's a largely tested/used code) than a custom solution. > - why the rename to ThreadInit()? No particular reason, just that not being anymore a ThreadFunc as PRThread expect was making the ThreadFunc naming strange. I can surely revert the change. > - and also, why are you using nsThreadPoolNaming? there is just a single > localstorage IO thread Hm, I somehow thought for a while that we had one thread per host or something like that,my fault. (In reply to Honza Bambas (:mayhemer) from comment #20) > First, DOMStorageDBThread is not a regular thread that eats nsIRunnables you > dispatch to it. Actually, it doesn't at all and never was meant to. Though, there's nothing blocking doing that, it's indeed a very common thing to dispatch runnables. > If you need any protection, we can add it. I'd try to put some effort to > fix this bug w/o swtiching to NS_NewThread. > > See also bug 992512. I just need to avoid leaking anything a runnable owns on shutdown :) If you want to avoid NS_NewThread cause it has measurable negative impact, fine by me, I actually thought it was "safer" to use it than to use the raw PRThread implementation, and the overhead was negligible for a single thread.
(In reply to Marco Bonardo [:mak] from comment #21) > (In reply to Honza Bambas (:mayhemer) from comment #18) > > - I prefer PR_Thread since it's faster (NS_NewThread stupidly waits for the > > thread creating blocking the calling thread, Shutdown() then stupidly loops > > the calling thread event queue - very bad thing! > > See comment 10, PR_Thread doesn't ensure runnables are freeing up their > resources on shutdown, so basically dispatching anything owning some pointer > to it causes a leak. I cannot fix this bug unless we figure a way to solve > that leak. Moving to nsIThread looked like the best approach from a safety > point of view (since it's a largely tested/used code) than a custom solution. > > > - why the rename to ThreadInit()? > > No particular reason, just that not being anymore a ThreadFunc as PRThread > expect was making the ThreadFunc naming strange. I can surely revert the > change. > > > - and also, why are you using nsThreadPoolNaming? there is just a single > > localstorage IO thread > > Hm, I somehow thought for a while that we had one thread per host or > something like that,my fault. Also, it won't work. The way you use it it dispatches an event that is never executed. Leave PR_SetCurrentThreadName on the place it is now and it will do. No need to change it, there is no more magic in the XPCOM code anyway. > > (In reply to Honza Bambas (:mayhemer) from comment #20) > > First, DOMStorageDBThread is not a regular thread that eats nsIRunnables you > > dispatch to it. Actually, it doesn't at all and never was meant to. > > Though, there's nothing blocking doing that, it's indeed a very common thing > to dispatch runnables. No one has a ref to this thread. No way to dispatch runnables here. Also, if there is an nsThread created for this thread by accident (with NS_GetCurrentThread called on the IO thread - does it happen actually? does our Storage code do it?) we can loop it's event queue at shutdown easily or intermittently during execution. We do the same for the new HTTP cache v2 in CacheIOThread. It's 3 lines of code. > > > If you need any protection, we can add it. I'd try to put some effort to > > fix this bug w/o swtiching to NS_NewThread. > > > > See also bug 992512. > > I just need to avoid leaking anything a runnable owns on shutdown :) Understand. I just need some more adjectives here :) Which runnables? What is owned by that runnable and why it has something to do with the localstorage db thread? > If you want to avoid NS_NewThread cause it has measurable negative impact, > fine by me, I actually thought it was "safer" to use it than to use the raw > PRThread implementation, and the overhead was negligible for a single thread. The thing is that the DOM storage thread starts very soon, so it may have some influence when the system is e.g. more loaded. Also I hate the Shutdown() property of nsThread - there is already a bug to make the caller thread queue spinning optional, but not in the tree yet.
(In reply to Honza Bambas (:mayhemer) from comment #22) > No one has a ref to this thread. No way to dispatch runnables here. Also, > if there is an nsThread created for this thread by accident (with > NS_GetCurrentThread called on the IO thread - does it happen actually? does > our Storage code do it?) we can loop it's event queue at shutdown easily or > intermittently during execution. We do the same for the new HTTP cache v2 > in CacheIOThread. It's 3 lines of code. When you create a Storage connection, the connection keeps a reference to the thread it was opened on, since then any related stuff should stay on that thread. I'm trying to add some more assertions to ensure this, since some components are going fancy, opening a connection on thread A and then executing stuff on main-thread... That's basically how we have access to the thread. > Understand. I just need some more adjectives here :) Which runnables? > What is owned by that runnable and why it has something to do with the > localstorage db thread? Sure, I didn't give you enough information indeed. The point is that Storage in certain occasions has to do some maintenance work on the connections, that is independent from the connection owner. One piece of this is handling the "memory-pressure" notification (mostly for B2G memory needs). When memory-pressure is received, we ask all connections to drop their shared cache, thus freeing up some memory. This means we must run a "PRAGMA shrink_memory" on the connection, and we should do that on the thread that opened the connection for thread-safety, but the observer service notifies us on main-thread, thus we need to send a runnable to the proper thread, to execute that query. The runnable has an owning pointer to the connection (otherwise one may close the connection and release if before Run()), and it just invokes the ExecuteSimpleSQL method of the connection (through NS_NewRunnableMethodWithArg) when it runs. Mochitests are hitting this, since the memory is shared across multiple virtual machines and thus limited. If the runnable never runs, the Connection object is never released. > The thing is that the DOM storage thread starts very soon, so it may have > some influence when the system is e.g. more loaded. Also I hate the > Shutdown() property of nsThread - there is already a bug to make the caller > thread queue spinning optional, but not in the tree yet. Though from my understanding, it's the usual way to ensure any runnable left on the thread is run and released. I think this is the sort of "protection" we were speaking about.
(In reply to Marco Bonardo [:mak] from comment #23) > (In reply to Honza Bambas (:mayhemer) from comment #22) > > No one has a ref to this thread. No way to dispatch runnables here. Also, > > if there is an nsThread created for this thread by accident (with > > NS_GetCurrentThread called on the IO thread - does it happen actually? does > > our Storage code do it?) we can loop it's event queue at shutdown easily or > > intermittently during execution. We do the same for the new HTTP cache v2 > > in CacheIOThread. It's 3 lines of code. > > When you create a Storage connection, the connection keeps a reference to > the thread it was opened on, since then any related stuff should stay on > that thread. I'm trying to add some more assertions to ensure this, since > some components are going fancy, opening a connection on thread A and then > executing stuff on main-thread... > That's basically how we have access to the thread. > > > Understand. I just need some more adjectives here :) Which runnables? > > What is owned by that runnable and why it has something to do with the > > localstorage db thread? > > Sure, I didn't give you enough information indeed. > > The point is that Storage in certain occasions has to do some maintenance > work on the connections, that is independent from the connection owner. One > piece of this is handling the "memory-pressure" notification (mostly for B2G > memory needs). > > When memory-pressure is received, we ask all connections to drop their > shared cache, thus freeing up some memory. > This means we must run a "PRAGMA shrink_memory" on the connection, and we > should do that on the thread that opened the connection for thread-safety, > but the observer service notifies us on main-thread, thus we need to send a > runnable to the proper thread, to execute that query. > The runnable has an owning pointer to the connection (otherwise one may > close the connection and release if before Run()), and it just invokes the > ExecuteSimpleSQL method of the connection (through > NS_NewRunnableMethodWithArg) when it runs. > Mochitests are hitting this, since the memory is shared across multiple > virtual machines and thus limited. If the runnable never runs, the > Connection object is never released. Got it, thanks! And you want these events be executed sooner then during shutdown I presume, right? If so, then you patch is a clear r-. To do this properly, you need to copy what is now in CacheIOThread: - create nsIThread with NS_GetCurrentThread() at the start of ThreadFunc [1] - add itself as an observer - when a runnable via nsIThread(nsIEventTarget).dispatch() is posted (OnDispatchedEvent), wake the thread and execute the runnables (mXPCOMThread->ProcessNextEvent in a loop, [2]), up to you to do it before any db-op is executed or after [1] http://mxr.mozilla.org/mozilla-central/source/netwerk/cache2/CacheIOThread.cpp#170 [2] http://mxr.mozilla.org/mozilla-central/source/netwerk/cache2/CacheIOThread.cpp#199 > > > The thing is that the DOM storage thread starts very soon, so it may have > > some influence when the system is e.g. more loaded. Also I hate the > > Shutdown() property of nsThread - there is already a bug to make the caller > > thread queue spinning optional, but not in the tree yet. > > Though from my understanding, it's the usual way to ensure any runnable left > on the thread is run and released. I think this is the sort of "protection" > we were speaking about. No, you understand it backwards. Indeed nsThread loops/runs all pending posted runnables before it joins. The thing is that the other thread's (the one calling Shutdown()) event queue is looped, which is bad and may cause reentrances and otherwise bad unexpected behavior. This is a serious bug some people including me are fighting against. And indeed, when I implement a thread my self (CacheIOThread is also PR_Thread) I take great care to exec all that has been posted (and has to be executed) before the thread goes away.
(In reply to Honza Bambas (:mayhemer) from comment #24) > Got it, thanks! And you want these events be executed sooner then during > shutdown I presume, right? If so, then you patch is a clear r-. well, if it doesn't run due to shutdown, that's fine, we don't need to free up memory if we are already shutting down. Indeed I was now thinking we may maybe keep a weak reference and null-check it... That would basically solve the leak without making crazy changes. So I will first try this, and then if that's not going anywhere I may try implementing your suggestion. > And indeed, when I implement a thread my self (CacheIOThread is also > PR_Thread) I take great care to exec all that has been posted (and has to be > executed) before the thread goes away. But not runnables, right? You mean anything you dispatched from the code that created the thread itself. Off-hand looks like that would benefit from a more general solution than hand picking things for each consumer (but I'm clearly not aware of the actual issues so my comment is very weak).
(In reply to Marco Bonardo [:mak] from comment #25) > > And indeed, when I implement a thread my self (CacheIOThread is also > > PR_Thread) I take great care to exec all that has been posted (and has to be > > executed) before the thread goes away. > > But not runnables, right? You mean anything you dispatched from the code > that created the thread itself. Off-hand looks like that would benefit from > a more general solution than hand picking things for each consumer (but I'm > clearly not aware of the actual issues so my comment is very weak). I am not sure I fully follow this. The CacheIOThread executes any nsIRunnables posted to it via nsIEventTarget.dispatch() and makes sure all of them are executed before shutdown (the event target is obtained with NS_GetCurrentThread() called on the IO thread or with a specific API we provide like getIOTarget().) The localStorage IO thread does not handle nsIRunnables at all, never was a need for it until now.
Comment on attachment 8413163 [details] [diff] [review] DOMStorage patch v1 Based on comment 24.
Attachment #8413163 - Flags: review?(honzab.moz) → review-
Whiteboard: p=2 s=it-31c-30a-29b.3 [qa-] → p=2 s=it-32c-31a-30b.1 [qa-]
Clearly using a refptr doesn't work cause it is just a nsCOMPtr<nsIWeakReference>, so still an owning pointer that leaks on shutdown. And regardless the nsRunnable would leak. I will probably try to implement the suggestion in comment 24, even if it's a bit unfortunate we have to patch consumers, we don't have control over all consumers in the wild, so we can't patch all of them. But at least we could have shrink_memory working on dom storage and indexedDB, that is what matters most on B2G. That said, while this bug should still be fixed, I'll probably ask for it to be removed from the Firefox backlog, it was added cause it was blocking bug 914070, but that landed without this one, and there are more important bugs to be fixed sooner in Firefox. So this could be worked on a little lazier.
Status: ASSIGNED → NEW
Whiteboard: p=2 s=it-32c-31a-30b.1 [qa-] → p=2 [qa-]
mak, maybe create a new bug blocking this one to impl the full nsIRunnable support in the dom storage thread.
(In reply to Honza Bambas (:mayhemer) from comment #29) > mak, maybe create a new bug blocking this one to impl the full nsIRunnable > support in the dom storage thread. would that be to implement the suggestion in comment 24? I was evaluating whether even fixing it we could still have issues with other consumers (thinking of indexedDB mostly) but sounds like DOM Storage at the moment is the only Storage consumer using raw PR_Thread, so likely fixing it may be the right solution as of now. Also because I can't think of a more generic solution that would free up the memory at the right time. I'll file it.
Depends on: 1010717
Points: --- → 2
Flags: qe-verify-
Whiteboard: p=2 [qa-]
Points: 2 → 5
Hey guys, what's the status here? This is biting us pretty hard in bug 1101419.
Blocks: 1101419
I can try next week to look into bug 1010717. If Honza would have any time to do that, it would probably take less time since I don't know that code and I must first figure my way through it.
I have a patch on Try, let's see.
Attachment #8413163 - Attachment is obsolete: true
Attached patch patch v3 (deleted) — Splinter Review
So, supposing we can figure bug 1010717 for which I have a patch that works locally, this still needs a review.
Attachment #8413162 - Attachment is obsolete: true
Attachment #8534394 - Flags: review?(bugmail)
Status: NEW → ASSIGNED
Iteration: --- → 37.2
Attachment #8534394 - Flags: review?(bugmail) → review+
Iteration: 37.2 → 37.3
Hey mak, what's left here?
the pending review in bug 1010717.
Iteration: 37.3 - 12 Jan → 38.1 - 26 Jan
Iteration: 38.1 - 26 Jan → 38.2 - 9 Feb
Target Milestone: --- → mozilla38
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Approval Request Comment [Feature/regressing bug #]: SQLite3.8.7.2 and newer [User impact if declined]: this can cause crashes due to thread-safety problems. The biggest problem though is that it causes intermittent crashes in our tests infrastructure, see bug 1101419. [Describe test coverage new/current, TreeHerder]: m-c [Risks and why]: The risk is limited by the fact this code only runs at memory-pressure. The reason we want this in Aurora is that it will become a long supported B2G branch and this crash causes very frequent intermittent crashes in tests. [String/UUID change made/needed]: none
Attachment #8559222 - Flags: approval-mozilla-aurora?
needinfo-ing Honza to make him aware of this approval request, since the patch includes bug 1010717
Flags: needinfo?(honzab.moz)
Thanks.
Flags: needinfo?(honzab.moz)
Comment on attachment 8559222 [details] [diff] [review] coalesced patch including bug 1010717 Given that we've proven that this can cause crashes in our test infrastructure and we've had this on m-c for over a week, I'm going to accept this change for Aurora even though making things thread safe can be somewhat scary and we don't have this associated with a high priority crash signature from our pre-release channels. The benefit to B2G is a side benefit that will prevent the sheriffs from disabling tests. Aurora+
Attachment #8559222 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Depends on: 1239794
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: