Closed Bug 1436247 Opened 7 years ago Closed 7 years ago

Image decoding threadpool size seems quite large

Categories

(Core :: Graphics: ImageLib, enhancement, P3)


Tracking


RESOLVED FIXED
mozilla60
Tracking Status
firefox60 --- fixed

People

(Reporter: bzbarsky, Assigned: aosmond)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Whiteboard: [MemShrink][gfx-noted])

Crash Data

Attachments

(4 files)

It looks like our image decoding threadpool size is set to the number of logical CPUs minus 1, _and_ it preallocates all the threads. So in my case, on a 4-core machine, I have 7 threads here. On an 8-core machine I have 15 threads. Do we ever actually end up using 15 threads effectively here? Especially in a multiprocess world, with various other things contending for CPU timeslices across all the processes? I don't think we have any other threadpools with this sort of behavior... This came up in bug 1436179 (https://bugzilla.mozilla.org/show_bug.cgi?id=1436179) because the memory overhead of those 15 threads ends up being about 600KB just for the profiler metadata.
Whiteboard: [MemShrink]
Seems like at a minimum we should let the threadpool deallocate threads down to some reasonable number of idle threads.
I imagine the typical load on the decoder threads is highly bursty; during pageloads it will spike to use (all?) available threads on many pages. IIRC we don't decode all images for a page (especially a large one) until they're made visible (or are about to be), so scrolling and the like can cause other bursts. I think a max of all cores isn't *unreasonable*, though you can argue about optimality. Using all cores might limit desktop (or laptop) frequency maximums, though I don't see that as a major concern. Given the burstiness, it might make sense to allow the pool to keep some threads around (how many is arguable) and keep others only temporarily (N seconds?). However, anything that drops threads will cause us to spend cycles (and time) recreating them repeatedly, hurting our perf and pageload latency (a little) and our power use (a very little). These hits might be minimal, especially if we keep (say) 4 threads alive, but that requires checking.
I do have some patches that rework our threading a bit. One part always spawns the first decoder thread (and release-asserts its creation) and spawns the remaining threads on demand (if there are pending items in the queue and we have capacity in our pool, attempt to spawn another). It never releases threads from the pool, however. My own local testing suggests this will save us from spawning more than 1 thread in the parent process (since it does limited decoding...), and 1/2 to 2/3 of the threads get spawned for content processes on the new tab page. I need to do some testing on an optimized build to see if that changes things.
Assignee: nobody → aosmond
Reducing it in the Master process would be a big win (doubly so until we can get the Profiler to stop allocating 32K+ per thread, just in case). You might want to add a pref for how many threads to keep alive, and there definitely should be *some* hysteresis in dropping threads to avoid churn. I'd also suggest playing with these parameters and timing something like page-down or End on image-heavy pages. Things like Speedometer (and probably Talos pageload tests) won't show any issues here, or won't show them much, so other ideas are welcome.
This patch makes us spawn a new thread when we have more tasks than idle threads. It will always spawn one at the beginning.
Attachment #8950325 - Flags: review?(tnikkel)
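The spawn condition described above can be sketched as follows. This is a minimal illustration of the "more tasks than idle threads, and below capacity" rule; the struct and function names are hypothetical, not the actual DecodePool fields.

```cpp
#include <cstddef>

// Hypothetical snapshot of pool state; names are illustrative only.
struct PoolState {
  size_t pendingTasks;  // tasks waiting in the queue
  size_t idleThreads;   // workers currently waiting for work
  size_t totalThreads;  // workers already spawned
  size_t maxThreads;    // configured pool capacity
};

// Spawn another worker only when queued work exceeds the number of idle
// workers and the pool still has capacity.
bool ShouldSpawnWorker(const PoolState& s) {
  return s.pendingTasks > s.idleThreads && s.totalThreads < s.maxThreads;
}
```

Under this rule, a queue with as many pending tasks as idle threads spawns nothing, so a mostly idle parent process stays at one thread.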
This patch shuts down threads after a configurable idle timeout. I set the default to 10 minutes, although that was not chosen for any particular reason other than being "long." It will keep at least half of the maximum number of threads, rounded up -- only the excess threads can be shut down for being idle.
Attachment #8950327 - Flags: review?(tnikkel)
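A rough sketch of the idle-shutdown policy described above (keep at least half the maximum, rounded up; only excess idle workers may exit). The helper names are illustrative, not the actual patch code.

```cpp
#include <cstddef>

// Minimum number of threads the pool keeps alive: half the configured
// maximum, rounded up. (Illustrative of the described policy.)
size_t MinIdleThreads(size_t maxThreads) {
  return (maxThreads + 1) / 2;
}

// A worker whose idle timeout has expired may exit only if doing so
// still leaves at least MinIdleThreads(maxThreads) workers alive.
bool MayShutdownIdleWorker(size_t totalThreads, size_t maxThreads,
                           bool idleTimeoutExpired) {
  return idleTimeoutExpired && totalThreads > MinIdleThreads(maxThreads);
}
```

So on a 7-thread pool, at most 3 threads can ever be reclaimed by the timeout, keeping 4 warm for the next pageload burst.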
Not mandatory; the FILO task ordering has always seemed strange to me compared to FIFO.
Attachment #8950328 - Flags: review?(tnikkel)
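For illustration, the FILO-vs-FIFO difference can be sketched with a simple deque: FILO pops the most recently submitted task, FIFO the oldest. These helpers are hypothetical, not DecodePool code.

```cpp
#include <deque>

// FILO (stack-like): the newest task runs first.
int PopFilo(std::deque<int>& queue) {
  int task = queue.back();
  queue.pop_back();
  return task;
}

// FIFO (queue-like): the oldest task runs first.
int PopFifo(std::deque<int>& queue) {
  int task = queue.front();
  queue.pop_front();
  return task;
}
```

FIFO gives more predictable latency per task; FILO favors whatever was requested most recently, which (as noted later in the bug) can actually help a test that only cares about one image.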
There is a related crash (see signature) where we cannot create all the desired threads on startup. These patches will most likely fix the crash (unless it is the first thread we fail to spawn), albeit incidentally. If we get hammered with requests, it will try and fail to spawn a new thread later -- and it will keep retrying as long as the queue depth is greater than the number of idle threads. There will clearly be a runtime cost to this, but maybe I'm worrying over nothing, since a system that cannot spawn a thread will probably crash due to OOM or something anyway. :)
Crash Signature: mozilla::image::DecodePool::DecodePool
Priority: -- → P3
Whiteboard: [MemShrink] → [MemShrink][gfx-noted]
Crash Signature: mozilla::image::DecodePool::DecodePool → [@ mozilla::image::DecodePool::DecodePool ]
Comment on attachment 8950325 [details] [diff] [review]
0001-Bug-1436247-Part-1.-Spawn-image-decoder-threads-on-d.patch, v1

Review of attachment 8950325 [details] [diff] [review]:
-----------------------------------------------------------------

::: image/DecodePool.cpp
@@ +63,4 @@
>    , mShuttingDown(false)
> +  {
> +    MonitorAutoLock lock(mMonitor);
> +    bool rv = CreateThread();

Usually rv is for nsresult, I thought? So just |success| would be okay here.

@@ +110,5 @@
>    // Drop any new work on the floor if we're shutting down.
>    return;
>  }
>
> +  // If there are pending tasks, create more workers if and only if we have

It seems like this block would make more sense after we append to the queue? Or am I missing something?
Attachment #8950325 - Flags: review?(tnikkel) → review+
Attachment #8950329 - Flags: review?(tnikkel) → review+
Attachment #8950328 - Flags: review?(tnikkel) → review+
Attachment #8950327 - Flags: review?(tnikkel) → review+
(In reply to Timothy Nikkel (:tnikkel) from comment #10)
> Usually rv is for nsresult, I thought? So just |success| would be okay here.

Yes, my bad. Will do.

> It seems like this block would make more sense after we append to the queue?
> Or am I missing something?

You are correct. What you are missing was how the check to decide whether to spawn a thread changed. :) I will move it.
Pushed by aosmond@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c32ead4e3525
Part 1. Spawn image decoder threads on demand, rather than at startup. r=tnikkel
https://hg.mozilla.org/integration/mozilla-inbound/rev/3650631487c7
Part 2. Shutdown idle image decoder threads after the configured timeout. r=tnikkel
https://hg.mozilla.org/integration/mozilla-inbound/rev/4ce2bfe462a0
Part 3. Process image decoding tasks in a more predictable order. r=tnikkel
https://hg.mozilla.org/integration/mozilla-inbound/rev/858d629f761d
Part 4. Fix image/DecodePool.h inclusions. r=tnikkel
Depends on: 1421150
try: https://treeherder.mozilla.org/#/jobs?repo=try&revision=43f4fd3a4b65c5ee81b1a41fe518472f25b35055

This suggests part 3 is the problematic piece. It is interesting: I guess if you are a test and you only have one image, it *is* to your advantage that the most recently requested image is decoded first. I'm going to land parts 1, 2 and 4, and consider landing part 3 in the future once I get other machinery in place / tests fixed.
Flags: needinfo?(aosmond)
Pushed by aosmond@gmail.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7c913713301f
Part 1. Spawn image decoder threads on demand, rather than at startup. r=tnikkel
https://hg.mozilla.org/integration/mozilla-inbound/rev/e65b7fee0060
Part 2. Shutdown idle image decoder threads after the configured timeout. r=tnikkel
https://hg.mozilla.org/integration/mozilla-inbound/rev/38557fb9f111
Part 3. Fix image/DecodePool.h inclusions. r=tnikkel
I'm a little worried that I'm only seeing one image decoder thread in my content processes on the latest Nightly. That's not at all what I saw during my local dev testing. I need to investigate further.
Flags: needinfo?(aosmond)
My local builds (debug and optimized) have the expected behaviour. Maybe it is related to pgo.
User error. I had changed image.multithreaded_decoding.limit to 1 for testing in my nightly profile at some point, and completely forgot that I did so.
Flags: needinfo?(aosmond)