Open Bug 990804 Opened 11 years ago Updated 2 years ago

Add a generic way to run some small piece of code off the main thread without requiring people to spawn their own threads

Categories: (Core :: XPCOM, defect)

Platform: x86, macOS

People: (Reporter: ehsan.akhgari, Unassigned)

Details: (Whiteboard: [Async])

Attachments: (2 files)

We have some places in the code base where we have a little bit of work to do which we want to run off the main thread. In some cases there are good reasons why you would want to spawn your own thread and do the work there, but in a lot of cases that is not really useful: it is error-prone (because people may forget to shut down the thread, for example), it uses too much memory, and it requires writing too much code.

OS X provides an API called "Grand Central Dispatch" <http://en.wikipedia.org/wiki/Grand_Central_Dispatch> which simplifies this task (ignore the language extension parts, but please see the example on the wikipedia page.)

What I propose here is a very simple API like the one below:

bool DispatchToWorkerPool(nsIRunnable* aRunnable);

Under the hood we can make the API do smart things such as manage its own thread pool, scale up/down depending on how much concurrent work is in progress, how many cores you have, etc.

Thoughts?
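To make the intent concrete, here is a minimal sketch of how a consumer might use such a call. DispatchToWorkerPool, MyBlockingWork and DoBlockingIO are hypothetical names used for illustration, not existing Gecko symbols:

#include "nsThreadUtils.h"   // nsRunnable
#include "nsAutoPtr.h"       // nsRefPtr

// Hypothetical work item: some small piece of blocking work (file IO, etc.)
// that we do not want to run on the main thread.
class MyBlockingWork : public nsRunnable
{
public:
  NS_IMETHOD Run()
  {
    DoBlockingIO();  // hypothetical blocking payload
    return NS_OK;
  }
};

void ScheduleWork()
{
  nsRefPtr<nsIRunnable> work = new MyBlockingWork();
  // Proposed API: hand the runnable to a shared, lazily managed pool.
  if (!DispatchToWorkerPool(work)) {
    NS_WARNING("Failed to dispatch to the worker pool");
  }
}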
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #0)
> Thoughts?

Yes please. This API seems like a reasonable place to start.
I also believe that we can start with a trivial implementation of the API and handle any under-the-hood scaling in a followup.
Yes, we could start with one thread. Bug 825746 is related, but not the same, since that is for parallelizing long-running algorithms (I was testing that for certain parts of cycle collection). We may need to be careful not to overuse threads in single-core setups, but DispatchToWorkerPool or some such could of course use the current thread too.
If we want to reuse the implementation also to parallelize algorithms, then before we roll our own we might want to consider either libdispatch or TBB. For the scope of this patch they are both overkill but they might come in handy for future extensions. Do we have a particular policy for using/including external libraries into our codebase?
Flags: needinfo?(ehsan)
(In reply to Olli Pettay [:smaug] from comment #4)
> Yes, we could start with one thread.

Sure, that's quite reasonable, especially since we won't have a ton of consumers for this API in the beginning.

> Bug 825746 is related, but not the same, since that is for parallelizing long
> running algorithms (I was testing that for certain parts of cycle
> collection).

Agreed.

> We may need to be careful to not over use threads in single core setups, but
> DispatchToWorkerPool or some such could of course use the current thread too.

No, I explicitly want this to be an API for running things *off* the current thread and the main thread, so that people can use it for blocking IO, etc.

(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #5)
> If we want to reuse the implementation also to parallelize algorithms, then
> before we roll our own we might want to consider either libdispatch or TBB.
> For the scope of this patch they are both overkill but they might come in
> handy for future extensions.

libdispatch is mostly useful for something like bug 825746, but IIRC it's BSD-only. I don't know much about TBB. Lots of marketing material on their website, but the link to the actual docs seems to be dead! <https://www.threadingbuildingblocks.org/documentation.php>

But I don't understand why we need a library for this. I don't think there is anything particularly tricky about implementing this idea.

> Do we have a particular policy for
> using/including external libraries into our codebase?

We can use them where they make sense, assuming license compat, etc.
Flags: needinfo?(ehsan)
I've sorta been using the stream transport service threadpool for this purpose... If we do make something new it would be good to find existing uses of that service and migrate them over.

However, in my experience these sorts of things are hard to use generically. You have to worry about what happens if you need to do I/O or other blocking operations, you have to worry about temporary objects created on that threadpool being released on the correct thread, sometimes you have to worry about ordering, etc.

And scaling is really hard to do right. Is there any way we could try to leverage the existing per-platform mechanisms to do that for us? Windows has a "smart" threadpool for this sort of thing, and as was pointed out OS X has GCD. Does linux have anything similar?

I'm also worried that we already have too many threadpools. JS already makes one per core, mozStorage seems to make a bazillion, workers, indexedDB, necko, our multiple (!) hang monitoring threads, nss, more?
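For reference, a sketch of the "just post it to the stream transport service" pattern described here, assuming the caller already has an nsIRunnable in hand (error handling abbreviated):

#include "nsIEventTarget.h"
#include "nsNetCID.h"               // NS_STREAMTRANSPORTSERVICE_CONTRACTID
#include "nsServiceManagerUtils.h"  // do_GetService
#include "nsCOMPtr.h"

nsresult DispatchToSTS(nsIRunnable* aRunnable)
{
  nsresult rv;
  nsCOMPtr<nsIEventTarget> sts =
    do_GetService(NS_STREAMTRANSPORTSERVICE_CONTRACTID, &rv);
  if (NS_FAILED(rv)) {
    return rv;
  }
  // The runnable is queued on the STS and runs off the main thread.
  return sts->Dispatch(aRunnable, nsIEventTarget::DISPATCH_NORMAL);
}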
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #7)
> And scaling is really hard to do right. Is there any way we could try to
> leverage the existing per-platform mechanisms to do that for us? Windows has
> a "smart" threadpool for this sort of thing, and as was pointed out OS X has
> GCD. Does linux have anything similar?

I agree that it is hard to get right. Afaik Linux doesn't have a "suggested" mechanism, but TBB, OpenMP >3 and Cilk Plus support task parallelism and work well on Linux.
(In reply to comment #7)
> I've sorta been using the stream transport service threadpool for this
> purpose... If we do make something new it would be good to find existing uses
> of that service and migrate them over.

Yeah, the STS thread pool is sort of our IO thread these days... I definitely agree that if we end up doing this, we should move those uses of STS over.

> However, in my experience these sorts of things are hard to use generically.
> You have to worry about what happens if you need to do I/O or other blocking
> operations, you have to worry about temporary objects created on that
> threadpool being released on the correct thread, sometimes you have to worry
> about ordering, etc.

Completely agreed on the blocking operations issue, that's basically why we need this to be scalable. I'm curious to know if the existing usage of STS does not result in the same concern though?

Not sure if I understand the issue of releasing objects on the correct thread. I'm not proposing posting different runnables for the same logical task to multiple threads in the pool... Although I guess that is hard to enforce at the API level. But this is basically supposed to be used for small "tasks", FWIW.

Not sure if I completely understand the ordering concern, can you please elaborate?

> And scaling is really hard to do right. Is there any way we could try to
> leverage the existing per-platform mechanisms to do that for us? Windows has a
> "smart" threadpool for this sort of thing, and as was pointed out OS X has GCD.
> Does linux have anything similar?

That's actually a very good idea IMO! Are you talking about <http://msdn.microsoft.com/en-us/library/windows/desktop/ms686980%28v=vs.85%29.aspx> for Windows? I don't know of anything equivalent for Linux unfortunately...

> I'm also worried that we already have too many threadpools. JS already makes
> one per core, mozStorage seems to make a bazillion, workers, indexedDB, necko,
> our multiple (!) hang monitoring threads, nss, more?

Yes, absolutely. I think one of the reasons behind this is that there is no better alternative than doing your own. My ultimate goal is for us to be able to run with "a few" threads. Anecdotally, right now my normal Firefox Nightly on OS X has 96 threads. I think that's unacceptable. ;-)

(*Please* try to suggest solutions in addition to raising the problems! :-)
Ehsan, on Windows I would actually suggest TPL (http://msdn.microsoft.com/en-us/library/dd460717%28v=vs.110%29.aspx), which is what TBB uses under the hood. By using a full-blown task library instead of a simple threadpool we might be able to better exploit parallelism for future use cases.
Wrong library, my bad; PPL (http://msdn.microsoft.com/en-us/library/dd492418.aspx) is the right library.
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #9)
> > I'm also worried that we already have too many threadpools. JS already makes
> > one per core, mozStorage seems to make a bazillion, workers, indexedDB, necko,
> > our multiple (!) hang monitoring threads, nss, more?
>
> Yes, absolutely. I think one of the reasons behind this is that there is no
> better alternative than doing your own. My ultimate goal is for us to be
> able to run with "a few" threads. Anecdotally, right now my normal Firefox
> Nightly on OS X has 96 threads. I think that's unacceptable. ;-)

Why do you care how many threads we have?
Thread stacks use memory and address space, which does come up as a big deal for low-memory Windows and Android users especially. We've seen unusual cases with a thousand threads eating 1G of VM space.
> However, in my experience these sorts of things are hard to use generically.
> You have to worry about what happens if you need to do I/O or other blocking
> operations, you have to worry about temporary objects created on that
> threadpool being released on the correct thread, sometimes you have to worry
> about ordering, etc.

I believe that this API should not handle temporary objects or ordering. In order to keep the module simple and maintainable, I would leave these questions to be handled by API clients, who generally know much better than us.

> TBB, OpenMP >3 and Cilk Plus support task parallelism and work well on Linux.

I am not familiar with TBB, but I believe that OpenMP is way overkill. Here too, I believe that we should start with just a simple thread pool, and extend under the hood only when we start having a good idea of how things are used in practice.
Anyway, if we are interested in a trivial implementation, I'm willing to give it a go in a few days.
Assignee: nobody → dteller
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #9)
> Completely agreed on the blocking operations issue, that's basically why we
> need this to be scalable. I'm curious to know if the existing usage of STS
> does not result in the same concern though?

It definitely is, but up until now I don't think anyone beyond necko, me, and a few other folks use it. Mostly everyone uses their own threadpools.

> Not sure if I understand the issue of releasing objects on the correct
> thread. I'm not proposing posting different runnables for the same logical
> task to multiple threads in the pool... Although I guess that is hard to
> enforce at the API level. But this is basically supposed to be used for
> small "tasks", FWIW.

Storage code ran into trouble with this where you would send off some task to a threadpool saying "open a database, get this value out" and then later you'd send a "close this database" task (and there could of course be other tasks in between). That tripped all of our threadsafety assertions, plus violated some sqlite assumptions, etc. To deal with this currently all async storage stuff has to create its own thread per connection which I think is terrible.

> Not sure if I completely understand the ordering concern, can you please
> elaborate?

Well if you have a single thread and you post two tasks to it they run in the posted order. Once you move to a pool they could run in any order. There isn't really any way to detect this kind of dependence. I just worry that having a magic service that purports to run everything optimally will encourage people to favor it over their own threads when order really is important.

> That's actually a very good idea IMO! Are you talking about
> <http://msdn.microsoft.com/en-us/library/windows/desktop/ms686980%28v=vs.85%29.aspx>
> for Windows?

I'd love to use that but that API is only available in Vista. The XP and later API is here:

http://msdn.microsoft.com/en-us/library/windows/desktop/ms686756%28v=vs.85%29.aspx

> (*Please* try to suggest solutions in addition to raising the problems! :-)

It's been a rough week...
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #16)
> > Not sure if I understand the issue of releasing objects on the correct
> > thread. I'm not proposing posting different runnables for the same logical
> > task to multiple threads in the pool... Although I guess that is hard to
> > enforce at the API level. But this is basically supposed to be used for
> > small "tasks", FWIW.
>
> Storage code ran into trouble with this where you would send off some task
> to a threadpool saying "open a database, get this value out" and then later
> you'd send a "close this database" task (and there could of course be other
> tasks in between). That tripped all of our threadsafety assertions, plus
> violated some sqlite assumptions, etc. To deal with this currently all async
> storage stuff has to create its own thread per connection which I think is
> terrible.

Hmm, so is this an issue with AddRef/Release thread safety assertions? Do those objects use thread-safe refcounting? I'm still trying to understand the nature of this problem.

> > Not sure if I completely understand the ordering concern, can you please
> > elaborate?
>
> Well if you have a single thread and you post two tasks to it they run in
> the posted order. Once you move to a pool they could run in any order. There
> isn't really any way to detect this kind of dependence. I just worry that
> having a magic service that purports to run everything optimally will
> encourage people to favor it over their own threads when order really is
> important.

I see. That is a good point. So correct me if I'm wrong, but I think the ordering of when two tasks start can be relatively easily guaranteed by making sure that all of the worker threads look at the same work queue when they're ready to do some work. That is what nsThreadPool already does. Now, that still won't protect people who expect task B to start running after task A has finished if B is posted after A, but I don't know of any thread pool API that guarantees that, so at least that should not be surprising to the consumers of the API.

> > That's actually a very good idea IMO! Are you talking about
> > <http://msdn.microsoft.com/en-us/library/windows/desktop/ms686980%28v=vs.85%29.aspx>
> > for Windows?
>
> I'd love to use that but that API is only available in Vista. The XP and
> later API is here:
>
> http://msdn.microsoft.com/en-us/library/windows/desktop/ms686756%28v=vs.85%29.aspx

Thanks!

The MSDN page says the other API is better but doesn't really pinpoint how...

> > (*Please* try to suggest solutions in addition to raising the problems! :-)
>
> It's been a rough week...

Yeah, sorry, that was too snarky. :-)

So here's another idea. Perhaps we're approaching the problem backwards. The other thing which we can do is to look at an instance of Firefox with a surprising number of threads (any Nightly running your normal browsing session would do I think!), look to see how many threads of which type we have, go look at the consumers to get a sense of what kinds of threads we have hanging around waiting for work in a typical browsing session, and then focus on solving a subset of the general problem. How does that sound?
(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #11)
> Wrong library, my bad; PPL
> (http://msdn.microsoft.com/en-us/library/dd492418.aspx) is the right library.

I don't think that's something that we want to use, as it seems to require you to write your code in a specific way, and is non-portable. Also, it seems to be more useful for parallel algorithms, but most (all?) of the work we do on background threads doesn't really fall under that umbrella.
(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?") from comment #15)
> Anyway, if we are interested in a trivial implementation, I'm willing to give
> it a go in a few days.

David, can you please discuss your plans first? It seems like we still have a lot of unanswered questions here so I'm not sure if we're ready to work on the implementation yet. Are you interested in undertaking my suggestion in the last paragraph of comment 17?
> So correct me if I'm wrong, but I think the ordering of when two tasks start
> can be relatively easily guaranteed by making sure that all of the worker
> threads look at the same work queue when they're ready to do some work.

I don't think so. Say you have a queue with two items: A, B. You have two threads: X, Y.

Thread X wakes up, removes A from the queue, and immediately loses its timeslice before running any of the actual code from A. Then thread Y wakes up, runs B. Then thread X wakes up, runs A.

I mean, in some sense we can claim that A started before B, but not in the "A could have performed some atomic operation to tell B that A has started" sense, for example.
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #18)
> (In reply to Roberto Agostino Vitillo (:rvitillo) from comment #11)
> > Wrong library, my bad; PPL
> > (http://msdn.microsoft.com/en-us/library/dd492418.aspx) is the right library.
>
> I don't think that's something that we want to use, as it seems to require
> you to write your code in a specific way, and is non-portable. Also, it
> seems to be more useful for parallel algorithms, but most (all?) of the work
> we do on background threads doesn't really fall under that umbrella.

Right, but it also allows us to compose tasks (e.g. http://msdn.microsoft.com/en-us/library/dd492427.aspx#composing_tasks); we might want to have a similar API to enforce ordering.

Just to be clear, should this API be used only to run background threads or is there a chance that we might want to use it to implement parallel algorithms in the future? It seems clear that the former is what we really want, but nonetheless it would be great to have a definitive answer since this is an important design decision.
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #17)
> > > That's actually a very good idea IMO! Are you talking about
> > > <http://msdn.microsoft.com/en-us/library/windows/desktop/ms686980%28v=vs.85%29.aspx>
> > > for Windows?
> >
> > I'd love to use that but that API is only available in Vista. The XP and
> > later API is here:
> >
> > http://msdn.microsoft.com/en-us/library/windows/desktop/ms686756%28v=vs.85%29.aspx
>
> Thanks!
>
> The MSDN page says the other API is better but doesn't really pinpoint how...

I've gone a few rounds with the Windows thread pool APIs. Here is why the legacy thread pool API is not as good:

* There is a single legacy thread pool per-process. The new API allows you to create multiple pools that are isolated from each other;
* The new API has better control over the attributes of each thread pool;
* The new API allows you to prioritize work items whereas the old API does not (though this feature is specific to Windows 7 and newer);
* The old API does not have a built-in mechanism to wait for all callbacks to complete (though this is easily implemented by ourselves atop the old API);
* The new API is able to cancel pending callbacks, the old API cannot;
* The new API supports cleaning up resources when they are no longer needed.
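To make the comparison concrete, a minimal sketch of the Vista-and-later thread pool API that the list above refers to; the DoWork body and the pool sizing are illustrative only, and error handling is omitted:

#include <windows.h>

// Work callback run by a pool thread.
static VOID CALLBACK DoWork(PTP_CALLBACK_INSTANCE, PVOID aContext, PTP_WORK)
{
  // ... the actual work item, using aContext ...
}

void SubmitToWindowsPool(void* aContext)
{
  // Unlike the legacy API, we can create a private, isolated pool and size it.
  PTP_POOL pool = CreateThreadpool(NULL);
  SetThreadpoolThreadMinimum(pool, 1);
  SetThreadpoolThreadMaximum(pool, 4);

  TP_CALLBACK_ENVIRON env;
  InitializeThreadpoolEnvironment(&env);
  SetThreadpoolCallbackPool(&env, pool);

  PTP_WORK work = CreateThreadpoolWork(DoWork, aContext, &env);
  SubmitThreadpoolWork(work);

  // The new API can wait for (or cancel) pending callbacks; the old one cannot.
  WaitForThreadpoolWorkCallbacks(work, FALSE);
  CloseThreadpoolWork(work);
  DestroyThreadpoolEnvironment(&env);
  CloseThreadpool(pool);
}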
(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #21)
> Just to be clear, should this API be used only to run background threads or
> is there a chance that we might want to use it to implement parallel
> algorithms in the future? It seems clear that the former is what we really
> want but nonetheless it would be great to have a definitive answer since
> this is an important design decision.

My 2 cents: it's good to keep the scope simple and straightforward here.

The ordering issue for tasks can be easily dealt with by making the thread pool API force instantiation of a "submission scope" for every related set of tasks submitted. The class for the submission scope would receive a unique number (taken from the actual thread pool, which maybe maintains a monotonically increasing int64), and use that number to tag all tasks that have ordering requirements. Any worker thread picking up a task would toggle the "running" flag for a given submission scope, and no other tasks from that submission scope would be scheduled as long as the "running" flag is set.

So instead of:

Thread *thr = createThread();
runInThread(thr, task);
...
runInThread(thr, task);

The usage would transform into:

ThreadSubmissionContext *subcx = threadPool->newSubmissionContext();
subcx->dispatch(task);
...
subcx->dispatch(task);

This whole thread got started because we're kicking off 50+ threads on MacOSX for a single tab in firefox. That's the problem to fix. Other problems are other problems, and we can expand the solution as necessary to address that later.
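A rough sketch of how the pool side of that idea might gate dispatch on the per-scope "running" flag; WorkerPool, TaggedTask and all member names here are hypothetical, with mQueue assumed to be an nsTArray of queue entries and mRunningContexts a hashtable of in-flight scope ids:

#include "nsCOMPtr.h"
#include "nsIRunnable.h"
#include "mozilla/Mutex.h"

// Hypothetical queue entry: a runnable tagged with its submission scope.
struct TaggedTask
{
  uint64_t mContextId;            // 0 == no ordering requirement
  nsCOMPtr<nsIRunnable> mTask;
};

// Called by an idle worker thread: returns the next task, skipping tasks
// whose submission scope already has a task in flight.
already_AddRefed<nsIRunnable> WorkerPool::GetNextTask()
{
  mozilla::MutexAutoLock lock(mLock);
  for (size_t i = 0; i < mQueue.Length(); ++i) {
    TaggedTask& candidate = mQueue[i];
    if (candidate.mContextId &&
        mRunningContexts.Contains(candidate.mContextId)) {
      continue;  // another task from this scope is running; try the next one
    }
    if (candidate.mContextId) {
      mRunningContexts.PutEntry(candidate.mContextId);  // cleared on completion
    }
    nsCOMPtr<nsIRunnable> task = candidate.mTask;
    mQueue.RemoveElementAt(i);
    return task.forget();
  }
  return nullptr;  // nothing runnable right now
}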
(In reply to Boris Zbarsky [:bz] from comment #20)
> > So correct me if I'm wrong, but I think the ordering of when two tasks start
> > can be relatively easily guaranteed by making sure that all of the worker
> > threads look at the same work queue when they're ready to do some work.
>
> I don't think so. Say you have a queue with two items: A, B. You have two
> threads: X, Y.
>
> Thread X wakes up, removes A from the queue, immediately loses timeslice
> before running any of the actual code from A. Then thread Y wakes up, runs
> B. Then thread X wakes up, runs A.
>
> I mean, in some sense we can claim that A started before B, but not in the
> "A could have performed some atomic operation to tell B that A has started"
> sense, for example.

Yeah, but I think that's OK. Like I said, that is already the level of guarantee that nsThreadPool provides. I don't think we need anything fancier than that. And if we do, the idea in comment 23 seems good to me, and seems straightforward.
(In reply to Roberto Agostino Vitillo (:rvitillo) from comment #21)
> (In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #18)
> > (In reply to Roberto Agostino Vitillo (:rvitillo) from comment #11)
> > > Wrong library, my bad; PPL
> > > (http://msdn.microsoft.com/en-us/library/dd492418.aspx) is the right library.
> >
> > I don't think that's something that we want to use, as it seems to require
> > you to write your code in a specific way, and is non-portable. Also, it
> > seems to be more useful for parallel algorithms, but most (all?) of the work
> > we do on background threads doesn't really fall under that umbrella.
>
> Right, but it also allows us to compose tasks (e.g.
> http://msdn.microsoft.com/en-us/library/dd492427.aspx#composing_tasks); we
> might want to have a similar API to enforce ordering.
>
> Just to be clear, should this API be used only to run background threads or
> is there a chance that we might want to use it to implement parallel
> algorithms in the future? It seems clear that the former is what we really
> want but nonetheless it would be great to have a definitive answer since
> this is an important design decision.

Parallel algorithms and facilities for running those are outside of the scope here. :-)
I think that as a concrete next step here, someone should investigate the kinds of threads that we currently create for a typical and small workload. I prefer to stick to a simple solution here if possible, but we should first make sure that such a solution would actually be useful to our existing consumers.
I can only think of two, and in those places I used the stream transport service. Maybe we should just keep it and change the backend to use platform-specific scaling on the platforms that support it, and leave it as a threadpool on the others?
Well, what I meant was for someone to actually go through the threads in a running Firefox build with a small workload, list all of the threads, group them into different buckets per consumer and figure out the reason they exist (whether they're used for small one-off tasks.) By "keep it", do you mean using the stream transport service as the mechanism for the scenario that this bug describes?
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #16)
> To deal with this currently all async storage stuff has to create its own
> thread per connection which I think is terrible.

While it may not be the best concurrency situation we could have, I don't think it's so terrible, exactly because it mimics the underlying library behavior, thus reducing the possibility of foot guns. Actually it may be improved provided one uses fewer threads in a pool and sends runnables for each connection always to the same thread (so they are naturally enqueued). Though then it would be hard to balance the load across the threads: some may stay barely used while some could be overloaded, so some rebalancing would then be required. But this is sort of OT, even if related to ordering tasks properly.
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #19)
> (In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?") from
> comment #15)
> > Anyway, if we are interested in a trivial implementation, I'm willing to give
> > it a go in a few days.
>
> David, can you please discuss your plans first? It seems like we still have
> a lot of unanswered questions here so I'm not sure if we're ready to work on
> the implementation yet. Are you interested in undertaking my suggestion in
> the last paragraph of comment 17?

Sorry for not being clear. I haven't started, but my idea was to basically copy the stream transport service, then add measurements to see how we end up using this new service, and only then decide in which direction we wish to see the service evolve.
(In reply to comment #30)
> (In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #19)
> > (In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?") from
> > comment #15)
> > > Anyway, if we are interested in a trivial implementation, I'm willing to give
> > > it a go in a few days.
> >
> > David, can you please discuss your plans first? It seems like we still have
> > a lot of unanswered questions here so I'm not sure if we're ready to work on
> > the implementation yet. Are you interested in undertaking my suggestion in
> > the last paragraph of comment 17?
>
> Sorry for not being clear. I haven't started, but my idea was to basically copy
> the stream transport service, then add measurements to see how we end up using
> this new service, and only then decide in which direction we wish to see the
> service evolve.

What kind of measurements do you mean? The STS basically gives you an event target against which you can post runnables. Not sure what copying it means...
I'll note the nsThread API can be a pain for these sorts of things: you can't have a task execute one runnable and kill itself without hand-coding a remote Shutdown(), you have to add explicit cleanup code, if you use a (semi) persistent thread you have to do a bunch of locked creation/etc, and if you want it to timeout and go away it's even more work, etc. If a thread wants to kill itself, there should be a standard, shared way to do it, not N hand-rolled versions.

So, a few bits of data:

AudioStream initialization (esp. on Mac): bug 919215, not landed yet. Opening/starting cubeb can take 300ms (and in one case 8.5 seconds). We need to get this off any normal thread, and if in a queue it will block the thread it's working on for a long time. This would benefit from a pool, with a possible API indication that this is a workload of unknown duration (i.e. it needs a thread, it needs it now, and don't make other things wait for the thread if it has it).

MediaManager has a singleton thread used for getUserMedia operations that need to be off mainthread mostly because of blocking: starting/stopping audio/video capture, enumerating audio/video devices, and a few other housekeeping things involving permissions. The main uses are of somewhat indeterminate duration as they involve talking to OS drivers/hardware.

CryptoTask gets spawned to do crypto ops and report back to MainThread. Not sure of duration; I imagine short-to-medium. I think they also don't get cleaned up properly...

DOM Storage uses a bunch (already discussed). Media Decoders use them (though once MSG is converted to the "pull" API a bunch of these may go away).

You have N ImageDecoders, N DNS resolvers, N mozStorage, N Analysis Helpers, Cert Verify, SSL Certs, URL Classifier, Proxy Resolution. Playing a video(s): N Media Decode, N Media S~hine(?), Image Scaler, a whole bunch of multiqueue0:src whatever that is (this is linux), source:src (ditto), aqueue:src (ditto), vqueue:src (ditto), etc. LOTS of threads in total.
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #31)
> > Sorry for not being clear. I haven't started, but my idea was to basically copy
> > the stream transport service, then add measurements to see how we end up using
> > this new service, and only then decide in which direction we wish to see the
> > service evolve.
>
> What kind of measurements do you mean? The STS basically gives you an event
> target against which you can post runnables.

I was thinking of Telemetry statistics about the queue of runnables. Initially:

1. number of runnables queued;
2. time a runnable spends waiting.

We will need to optimize once 1. and 2. start rising, which will certainly not be immediate. If/once we start doing smarter things:

- number of runnables executed concurrently;
- memory used by the pool;
- etc.

> Not sure what copying it means...

Well, basically, it means start by copying & pasting the source code: http://dxr.mozilla.org/mozilla-central/source/netwerk/base/src/nsStreamTransportService.cpp#460-553
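A sketch of how measurement 2 (queue wait time) could be gathered by wrapping each dispatched runnable; WORKER_POOL_QUEUE_WAIT_MS is a hypothetical histogram ID that would have to be added to Histograms.json first:

#include "mozilla/Telemetry.h"
#include "mozilla/TimeStamp.h"
#include "nsThreadUtils.h"

// Sketch only: records how long the wrapped runnable sat in the queue.
class TimedRunnable : public nsRunnable
{
public:
  explicit TimedRunnable(nsIRunnable* aInner)
    : mInner(aInner), mEnqueued(mozilla::TimeStamp::Now()) {}

  NS_IMETHOD Run()
  {
    mozilla::TimeDuration wait = mozilla::TimeStamp::Now() - mEnqueued;
    mozilla::Telemetry::Accumulate(
      mozilla::Telemetry::WORKER_POOL_QUEUE_WAIT_MS,  // hypothetical ID
      static_cast<uint32_t>(wait.ToMilliseconds()));
    return mInner->Run();
  }

private:
  nsCOMPtr<nsIRunnable> mInner;
  mozilla::TimeStamp mEnqueued;
};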
Attached patch: First draft of an API (deleted) — Splinter Review
First draft of a possible API.

* I wrote this as an idl, but we might decide to make it raw C++.
* I added nsIDispatchQueue, as clients such as mozStorage would probably depend on this, but I'm not sure we need this in v1.
* For a v1, I would tend to write fully cross-platform xpcom code, and only refine it to use GCD et al. in v2.
Attachment #8408502 - Flags: feedback?(rvitillo)
Attachment #8408502 - Flags: feedback?(ehsan)
Attachment #8408502 - Flags: feedback?(bent.mozilla)
Comment on attachment 8408502 [details] [diff] [review]
First draft of an API

The API is *not* the hard part of this problem. I think that any attempt to solve this without understanding what the use cases are is not moving in the right direction. In other words, the next step here would be the last paragraph of comment 17, as I've said before. Once we have a better understanding of that, then we can figure out what the API that we need should look like. As you suspected, it won't be an XPIDL API though, as that doesn't really buy us anything since this API will not be callable through JS.

Also, about the idea of copying the code of the STS, that's not the right thing to do. Not at least before we have investigated the use cases (and probably not even then.) The fact that we currently use STS for similar purposes is a hack; all the STS gives you is an nsIEventTarget against which you can post runnables.
Attachment #8408502 - Flags: feedback?(rvitillo)
Attachment #8408502 - Flags: feedback?(ehsan)
Attachment #8408502 - Flags: feedback?(bent.mozilla)
Attachment #8408502 - Flags: feedback-
I'm not sure why you decided that my "feedback?"s had to be canceled. bent obviously has been in touch with some of the issues that prevented mozStorage from using the STS, and I was interested in knowing whether this simple API would be sufficient to solve the problems he (and others) are facing. I would appreciate it if you could avoid doing this again.

Now, if you prefer, I can post the same API in C++ instead of IDL, but since we are early in the design stage, I tend to prefer pseudo-language, to avoid being bogged down by premature details.

> So here's another idea. Perhaps we're approaching the problem backwards. The
> other thing which we can do is to look at an instance of Firefox with a
> surprising number of threads (any Nightly running your normal browsing
> session would do I think!), look to see how many threads of which type we
> have, go look at the consumers to get a sense of what kinds of threads we
> have hanging around waiting for work in a typical browsing session, and then
> focus on solving a subset of the general problem. How does that sound?

I will see if I can do that. If you have a clear idea on how to do it already, don't hesitate to show the results.
Attached file Snapshot of threads (deleted) —
So, here's a snapshot of my threads during a regular browsing session, on Mac OS X.

[waiting cvar] = apparently waiting for a message from another thread
[waiting event] = apparently waiting for a message from the system

1. com.apple.main-thread
2. com.apple.libdispatch-manager
3. [waiting event] (I can't identify)
4. Gecko_IOThread
5. Socket Thread, running nsSocketTransportService
6. [waiting cvar] JS GC Helper
7, 20, 24, 44. [waiting] AsmJSMachExceptionHandlerThread, used by JS for exception-handling, designed to be idle
8. [waiting cvar] JS Watchdog
9. [waiting cvar] Hang Monitor
10. [waiting cvar] BgHangManager
11. [waiting cvar] Cache2 I/O
12. [waiting cvar] Timer Threa
13. [waiting cvar] Cert Verify
14, 15, 16, 17. [waiting cvar] Analysis Helper, used by JS for background compilation
18. [waiting cvar] Network Seer
19, 23, 43. [waiting cvar] DOM Worker (?)
21. [waiting event] ?
22. [waiting cvar] Cache I/O
25. [waiting cvar] Compositor
26. [waiting cvar] ImageBridgeChild
27, 29, 30, 35, 38, 39. [waiting cvar] mozStorage
28. [waiting cvar] Proxy R-olution
31. [waiting cvar] SamplerThread
32. [waiting cvar] Event Tracer
33. [waiting cvar] localstorage DB
34. [waiting cvar] HTML5 Parser
36. [waiting cvar] URL Classifier
37, 46, 54. [waiting cvar] DNS Resolver
40. [waiting cvar] MediaStreamGrph
41. [waiting cvar] Latency Logger
42. [waiting event] com.apple.audio.IOThread.client
45, 55, 56. [waiting cvar] Image Decoder
47, 48, 49, 50, 51, 52, 53, 57. dead or dying
58. [waiting cvar] MediaManager
Interestingly, I don't see the STS in the list.
(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?" - I'll be away on April 9th-16th) from comment #37)
> Created attachment 8408952 [details]
> Snapshot of threads
>
> So, here's a snapshot of my threads during a regular browsing session, on
> Mac OS X.

Thanks for gathering this data, let me do a first pass on the analysis. The items I've marked as (***) below are the ones that I think we should investigate further.

> 1. com.apple.main-thread
> 2. com.apple.libdispatch-manager
> 3. [waiting event] (I can't identify)

That seems to be related to the crash reporter.

> 4. Gecko_IOThread

(***) This seems to be what XRE_GetIOMessageLoop() returns. Looking at some of the callers, this seems very similar to the idea behind this bug. I can't actually tell what enforces that this is only used for I/O. We also seem to be doing some if not all of our IPC on this thread.

> 5. Socket Thread, running nsSocketTransportService

(***) This is the STS. :-)

> 6. [waiting cvar] JS GC Helper
> 7, 20, 24, 44. [waiting] AsmJSMachExceptionHandlerThread, used by JS for
> exception-handling, designed to be idle
> 8. [waiting cvar] JS Watchdog

Let's ignore all things JS related for now, as getting those consumers to change is a different kind of pain anyways.

> 9. [waiting cvar] Hang Monitor
> 10. [waiting cvar] BgHangManager

Sigh, I didn't know we have two of these threads. Hopefully there's a good reason.

> 11. [waiting cvar] Cache2 I/O

This seems like it does fairly complicated things.

> 12. [waiting cvar] Timer Threa

We need to keep this around.

> 13. [waiting cvar] Cert Verify

(***) This we should be able to get rid of based on a quick look.

> 14, 15, 16, 17. [waiting cvar] Analysis Helper, used by JS for background
> compilation

More JS things.

> 18. [waiting cvar] Network Seer

(***) This seems like something which we should be able to get rid of as well.

> 19, 23, 43. [waiting cvar] DOM Worker (?)

These are the OS level threads implementing DOM workers, nothing we can do about them.

> 21. [waiting event] ?

Seems like a system thread.

> 22. [waiting cvar] Cache I/O

Sigh, so we have two cache IO threads. :(

> 25. [waiting cvar] Compositor

This we will need to keep around as the frame compositions are time sensitive.

> 26. [waiting cvar] ImageBridgeChild

This seems like some gfx thing, who can say whether it's actually needed?

> 27, 29, 30, 35, 38, 39. [waiting cvar] mozStorage

(***) Some of the users of these might be interesting to look at too.

> 28. [waiting cvar] Proxy R-olution

(***) This is nsPACMan, we should be able to get rid of it.

> 31. [waiting cvar] SamplerThread

This is the profiler sampler thread, we need to keep it. Plus, our users should not run this.

> 32. [waiting cvar] Event Tracer

Our users should not run this.

> 33. [waiting cvar] localstorage DB

I think we may need to keep this around.

> 34. [waiting cvar] HTML5 Parser

This too.

> 36. [waiting cvar] URL Classifier

This one I'm not so sure of.

> 37, 46, 54. [waiting cvar] DNS Resolver

(***) I think (hope) we can get rid of this.

> 40. [waiting cvar] MediaStreamGrph

This one we need to keep around as it performs time sensitive audio/video stuff.

> 41. [waiting cvar] Latency Logger

This is hopefully debug-only audio latency logging stuff.

> 42. [waiting event] com.apple.audio.IOThread.client

This talks to the OS, not much we can do here.

> 45, 55, 56. [waiting cvar] Image Decoder

(***) I think (hope) we can get rid of these.

> 47, 48, 49, 50, 51, 52, 53, 57. dead or dying

I like these the most! ;-)

> 58. [waiting cvar] MediaManager

This seems to be related to WebRTC, let's not worry about this for now.
(In reply to David Rajchenbach Teller [:Yoric] (please use "needinfo?" - I'll be away on April 9th-16th) from comment #36)
> I'm not sure why you decided that my "feedback?"s had to be canceled. bent
> obviously has been in touch with some of the issues that prevented
> mozStorage from using the STS, and I was interested in knowing whether this
> simple API would be sufficient to solve the problems he (and others) are
> facing.

Oh, sorry, it just seemed to me like you're asking for feedback on the API design, not whether it will be possible for the API design to address those mozStorage use cases.

> I would appreciate it if you could avoid doing this again. Now, if you prefer,
> I can post the same API in C++ instead of IDL, but since we are early in the
> design stage, I tend to prefer pseudo-language, to avoid being bogged down
> by premature details.

Sure. But I think we should think about the API design once we finish analyzing the existing use cases.

> > So here's another idea. Perhaps we're approaching the problem backwards. The
> > other thing which we can do is to look at an instance of Firefox with a
> > surprising number of threads (any Nightly running your normal browsing
> > session would do I think!), look to see how many threads of which type we
> > have, go look at the consumers to get a sense of what kinds of threads we
> > have hanging around waiting for work in a typical browsing session, and then
> > focus on solving a subset of the general problem. How does that sound?
>
> I will see if I can do that. If you have a clear idea on how to do it
> already, don't hesitate to show the results.

You've already started on the right track. :-) I think a good next step would be to go through the list of (***) above, look more closely into each one of them and try to determine the use cases and whether a generic API can be used to satisfy them, and also gather some possible constraints about each one of them. Once we have more details on those we can think about what we need to build in order to move those consumers off of having their own threads.
See also the patch in bug 988464, which logs threads that haven't ended by the end of the shutdown-threads observer (often leaked, or semi-temporary pools or single-instance threads that didn't bother to have shutdown handlers and just rely on the ThreadManager backstop). We've found two thread leaks with this patch.

You can also have up to ~8(+?) storage threads running, and a lot of other temporary ones. Some subsystems start a bunch of temporary pthreads, like webrtc, or nsThreads, like Storage or MediaDecoder, but they should clean them up. We might be able to reduce the number of these (though the nsTimer restriction may limit that, note).

I will note that the pain (and overhead) of spawning threads to handle things we'd like to be async forces people to push stuff to MainThread or STS, or spawn semi-permanent threads, or risk bugs implementing temporary threads (see CryptoTask and getUserMedia permission thread leaks, and related problems having an easy way to shut down the current thread). And there's other stuff that we just run on the current thread when feeding it off would be better than blocking the current thread.

The APIs we make available affect how we design and write our software, and this tends to push us into single-threaded paradigms or permanent specialized threads and misuse of easy-to-dispatch-to threads like MainThread and STS.
(In reply to Randell Jesup [:jesup] from comment #41)
> I will note that the pain (and overhead) of spawning threads to handle
> things we'd like to be async forces people to push stuff to MainThread or
> STS, or spawn semi-permanent threads

Why is pushing things onto STS (which presumably backs things with a thread pool) a problem? Because it risks getting overloaded with stuff and/or the load it then carries isn't appropriate for whatever parameters we tuned the thread pool for? (Assuming we've tried to tune it...)
(In reply to comment #42)
> Why is pushing things onto STS (which presumably backs things with a thread
> pool) a problem? Because it risks getting overloaded with stuff and/or the
> load it then carries isn't appropriate for whatever parameters we tuned the
> thread pool for? (Assuming we've tried to tune it...)

STS uses a single thread.
Comment on attachment 8408502 [details] [diff] [review]
First draft of an API

Review of attachment 8408502 [details] [diff] [review]:
-----------------------------------------------------------------

I tend to agree with Ehsan, let's figure out our list of use cases before trying to design anything.

In general though I don't think we need much more than nsIEventTarget... I've only had to do something more complicated a few times.
Attachment #8408502 - Flags: feedback?(bent.mozilla)
(In reply to ben turner [:bent] (use the needinfo? flag!) from comment #44)
> I tend to agree with Ehsan, let's figure out our list of use cases before
> trying to design anything.
>
> In general though I don't think we need much more than nsIEventTarget...
> I've only had to do something more complicated a few times.

I was actually interested specifically in the use case of mozStorage that you mentioned previously. If I understand correctly, there are two issues that prevent the use of STS for mozStorage:

1. references being passed between too many threads (as per your comment 16);
2. ordering database requests.

The draft I posted is basically an nsIEventTarget that handles 2. and tries to make refcounting of Runnables more predictable, which should help a little (not much) with 1. I am wondering what else we would be missing for mozStorage.
Flags: needinfo?(bent.mozilla)
(In reply to :Ehsan Akhgari (lagging on bugmail, needinfo? me!) from comment #43)
> (In reply to comment #42)
> > Why is pushing things onto STS (which presumably backs things with a thread
> > pool) a problem? Because it risks getting overloaded with stuff and/or the
> > load it then carries isn't appropriate for whatever parameters we tuned the
> > thread pool for? (Assuming we've tried to tune it...)
>
> STS uses a single thread.

Yup - and that can't (reasonably) change; implicit I'm-on-the-STS-thread locking would get broken. And we shouldn't be (over)loading STS with anything that isn't related to socket IO or NSS (NSS is basically single-threaded and must run on STS).
(In reply to David Rajchenbach Teller [:Yoric] from comment #45)

MozStorage might be able to use such a mechanism (I'm not certain, :mak might know better), but IndexedDB couldn't since we try to run some things in parallel and we need much more control over the scheduling.
Flags: needinfo?(bent.mozilla)
Can we just duplicate what STS does today and give it a different xpcom ID?
Yes, that was my initial plan. Duplicate STS for the moment, add more features later if we find out we need them.
David, do you have time to work on this now? If not, do you mind if one of us steals it? I think we might want it sooner rather than later for SW stuff?
If it's urgent, go ahead and steal it.
Assignee: dteller → bkelly
Status: NEW → ASSIGNED
I think I'm just going to implement the nsIEventTarget Dispatch() interface. I don't think most of the STS uses we need to replace use CreateInputTransport(), etc.
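For illustration, a rough sketch of what a thin nsIEventTarget implementation backed by an existing nsIThreadPool could look like; BackgroundEventTarget and mPool are made-up names, and the XPCOM boilerplate is abbreviated:

#include "nsIEventTarget.h"
#include "nsIThreadPool.h"
#include "nsCOMPtr.h"

// Sketch only: forwards Dispatch() to an internal, shared thread pool.
class BackgroundEventTarget : public nsIEventTarget
{
public:
  NS_DECL_THREADSAFE_ISUPPORTS

  NS_IMETHOD Dispatch(nsIRunnable* aEvent, uint32_t aFlags)
  {
    return mPool->Dispatch(aEvent, aFlags);
  }

  NS_IMETHOD IsOnCurrentThread(bool* aResult)
  {
    return mPool->IsOnCurrentThread(aResult);
  }

private:
  ~BackgroundEventTarget() {}

  nsCOMPtr<nsIThreadPool> mPool;  // created and configured elsewhere
};

NS_IMPL_ISUPPORTS(BackgroundEventTarget, nsIEventTarget)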
I guess nsDOMFileReader uses CreateInputTransport(). I'm not sure it needs to, though.
Alternatively, Nathan suggests moving the STS code into xpcom/io under a different name, and then having necko's STS inherit from it. This would avoid code duplication and the separate xpcom IDs would give us different thread pool instances.
No longer blocks: 1147850
No longer blocks: 1147870
It turns out most of the places using StreamTransportService are fine, so I'm not going to be working on this now.
Assignee: bkelly → nobody
Status: ASSIGNED → NEW
No longer blocks: 1147863
Severity: normal → S3