Closed Bug 1436247 Opened 7 years ago Closed 7 years ago

Image decoding threadpool size seems quite large

Categories

(Core :: Graphics: ImageLib, enhancement, P3)


Tracking


RESOLVED FIXED
mozilla60
Tracking Status
firefox60 --- fixed

People

(Reporter: bzbarsky, Assigned: aosmond)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [MemShrink][gfx-noted])

Crash Data

Attachments

(4 files)

It looks like our image decoding threadpool size is set to the number of logical CPUs minus 1, _and_ it preallocates all the threads. So in my case, on a 4-core machine, I have 7 threads here. On an 8-core machine I have 15 threads. Do we ever actually end up using 15 threads effectively here? Especially in a multiprocess world, with various other things contending for CPU timeslices across all the processes? I don't think we have any other threadpools with this sort of behavior... This came up in bug 1436179 (https://bugzilla.mozilla.org/show_bug.cgi?id=1436179) because the memory overhead of those 15 threads ends up being about 600KB just for the profiler metadata.
Whiteboard: [MemShrink]
Seems like at a minimum we should let the threadpool deallocate threads down to some reasonable number of idle threads.
I imagine the typical load on the decoder threads is highly bursty; during pageloads it will spike to use (all?) available threads on many pages. IIRC we don't decode all images for a page (especially a large one) until they're made visible (or are about to be), so scrolling and the like can cause other bursts. I think a max of all cores isn't *unreasonable*, though you can argue about optimality. Using all cores might limit desktop (or laptop) frequency maximums, though I don't see that as a major concern. Given the burstiness, it might make sense to allow the pool to keep some threads around (how many is arguable) and keep others only temporarily (N seconds?). However, anything that drops threads will cause us to spend cycles (and time) recreating them repeatedly, hurting our perf and pageload latency (a little) and our power use (a very little). These hits might be minimal, especially if we keep (say) 4 threads alive, but that requires checking.
I do have some patches that rework our threading a bit. One part always spawns the first decoder thread (and release-asserts its creation) and spawns the remaining threads on demand (if there are pending items in the queue and we have capacity in our pool, attempt to spawn another). It never releases threads from the pool, however. My own local testing suggests this will save us from spawning more than 1 thread in the parent process (since it does limited decoding...), and 1/2 to 2/3 of the threads get spawned for content processes on the new tab page. I need to do some testing on an optimized build to see if that changes things.
Assignee: nobody → aosmond
Reducing it in the Master process would be a big win (doubly so until we can get the Profiler to stop allocating 32K+ per thread, just in case). You might want to add a pref for how many threads to keep alive, and there definitely should be *some* hysteresis in dropping threads to avoid churn. I'd also suggest playing with these parameters and timing something like page-down or End on image-heavy pages. Things like Speedometer (and probably Talos pageload tests) won't show any issues here, or won't show them much, so other ideas are welcome.
This patch makes us spawn a new thread when we have more tasks than idle threads. It will always spawn one at the beginning.
Attachment #8950325 - Flags: review?(tnikkel)
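The spawn condition described above can be sketched as follows. This is a minimal illustration of the "more tasks than idle threads, and below capacity" rule; the struct and function names are hypothetical, not the actual DecodePool fields.

```cpp
#include <cstddef>

// Hypothetical snapshot of pool state; names are illustrative only.
struct PoolState {
  size_t pendingTasks;  // tasks waiting in the queue
  size_t idleThreads;   // workers currently waiting for work
  size_t totalThreads;  // workers already spawned
  size_t maxThreads;    // configured pool capacity
};

// Spawn another worker only when queued work exceeds the number of idle
// workers and the pool still has capacity.
bool ShouldSpawnWorker(const PoolState& s) {
  return s.pendingTasks > s.idleThreads && s.totalThreads < s.maxThreads;
}
```

Under this rule, a queue with as many pending tasks as idle threads spawns nothing, so a mostly idle parent process stays at one thread.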
This patch shuts down threads after a configurable idle timeout. I set the default to 10 minutes, although that was not chosen for any particular reason other than being "long." It will keep at least half of the maximum number of threads, rounded up -- only the excess threads can be shut down for being idle.
Attachment #8950327 - Flags: review?(tnikkel)
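A rough sketch of the idle-shutdown policy described above (keep at least half the maximum, rounded up; only excess idle workers may exit). The helper names are illustrative, not the actual patch code.

```cpp
#include <cstddef>

// Minimum number of threads the pool keeps alive: half the configured
// maximum, rounded up. (Illustrative of the described policy.)
size_t MinIdleThreads(size_t maxThreads) {
  return (maxThreads + 1) / 2;
}

// A worker whose idle timeout has expired may exit only if doing so
// still leaves at least MinIdleThreads(maxThreads) workers alive.
bool MayShutdownIdleWorker(size_t totalThreads, size_t maxThreads,
                           bool idleTimeoutExpired) {
  return idleTimeoutExpired && totalThreads > MinIdleThreads(maxThreads);
}
```

So on a 7-thread pool, at most 3 threads can ever be reclaimed by the timeout, keeping 4 warm for the next pageload burst.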
Not mandatory; the FILO task ordering has always seemed strange to me compared to FIFO.
Attachment #8950328 - Flags: review?(tnikkel)
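For illustration, the FILO-vs-FIFO difference can be sketched with a simple deque: FILO pops the most recently submitted task, FIFO the oldest. These helpers are hypothetical, not DecodePool code.

```cpp
#include <deque>

// FILO (stack-like): the newest task runs first.
int PopFilo(std::deque<int>& queue) {
  int task = queue.back();
  queue.pop_back();
  return task;
}

// FIFO (queue-like): the oldest task runs first.
int PopFifo(std::deque<int>& queue) {
  int task = queue.front();
  queue.pop_front();
  return task;
}
```

FIFO gives more predictable latency per task; FILO favors whatever was requested most recently, which (as noted later in the bug) can actually help a test that only cares about one image.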
There is a related crash (see signature) where we cannot create all the desired threads on startup. These patches will most likely fix the crash (unless it is the first thread we fail to spawn), albeit incidentally. If we get hammered with requests, it will try and fail to spawn a new thread later -- and it will keep retrying as long as the queue depth is greater than the number of idle threads. There will clearly be a runtime cost to this, but maybe I'm worrying over nothing, since a system that cannot spawn a thread will probably crash due to OOM or something anyway. :)
Crash Signature: mozilla::image::DecodePool::DecodePool
Priority: -- → P3
Whiteboard: [MemShrink] → [MemShrink][gfx-noted]
Crash Signature: mozilla::image::DecodePool::DecodePool → [@ mozilla::image::DecodePool::DecodePool ]
Comment on attachment 8950325 [details] [diff] [review]
0001-Bug-1436247-Part-1.-Spawn-image-decoder-threads-on-d.patch, v1

Review of attachment 8950325 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/DecodePool.cpp
@@ +63,4 @@
>    , mShuttingDown(false)
> +  {
> +    MonitorAutoLock lock(mMonitor);
> +    bool rv = CreateThread();

Usually rv is for nsresult, I thought? So just |success| would be okay here.

@@ +110,5 @@
>    // Drop any new work on the floor if we're shutting down.
>    return;
>  }
>
> +  // If there are pending tasks, create more workers if and only if we have

It seems like this block would make more sense after we append to the queue? Or am I missing something?
Attachment #8950325 - Flags: review?(tnikkel) → review+
Attachment #8950329 - Flags: review?(tnikkel) → review+
Attachment #8950328 - Flags: review?(tnikkel) → review+
Attachment #8950327 - Flags: review?(tnikkel) → review+
(In reply to Timothy Nikkel (:tnikkel) from comment #10)
> Usually rv is for nsresult, I thought? So just |success| would be okay here.

Yes, my bad. Will do.

> It seems like this block would make more sense after we append to the queue?
> Or am I missing something?

You are correct. What you are missing was how the check to decide whether to spawn a thread changed. :) I will move it.
Pushed by aosmond@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c32ead4e3525
Part 1. Spawn image decoder threads on demand, rather than at startup. r=tnikkel
https://hg.mozilla.org/integration/mozilla-inbound/rev/3650631487c7
Part 2. Shutdown idle image decoder threads after the configured timeout. r=tnikkel
https://hg.mozilla.org/integration/mozilla-inbound/rev/4ce2bfe462a0
Part 3. Process image decoding tasks in a more predictable order. r=tnikkel
https://hg.mozilla.org/integration/mozilla-inbound/rev/858d629f761d
Part 4. Fix image/DecodePool.h inclusions. r=tnikkel
Depends on: 1421150
try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=43f4fd3a4b65c5ee81b1a41fe518472f25b35055

This suggests part 3 is the problematic piece. It is interesting: I guess if you are a test and you only have one image, it *is* to your advantage that the most recently requested image is decoded first. I'm going to land parts 1, 2 and 4, and consider landing part 3 in the future once I get other machinery in place / tests fixed.
Flags: needinfo?(aosmond)
Pushed by aosmond@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7c913713301f
Part 1. Spawn image decoder threads on demand, rather than at startup. r=tnikkel
https://hg.mozilla.org/integration/mozilla-inbound/rev/e65b7fee0060
Part 2. Shutdown idle image decoder threads after the configured timeout. r=tnikkel
https://hg.mozilla.org/integration/mozilla-inbound/rev/38557fb9f111
Part 3. Fix image/DecodePool.h inclusions. r=tnikkel
I'm a little worried that I'm only seeing one image decoder thread in my content processes on the latest Nightly. That's not at all what I saw during my local dev testing. I need to investigate further.
Flags: needinfo?(aosmond)
My local builds (debug and optimized) have the expected behaviour. Maybe it is related to pgo.
User error. I had changed image.multithreaded_decoding.limit to 1 for testing in my nightly profile at some point, and completely forgot that I did so.
Flags: needinfo?(aosmond)