Closed Bug 1532838 Opened 6 years ago Closed 5 years ago

Tune minimum nursery size using a more realistic benchmark

Categories

(Core :: JavaScript: GC, enhancement, P3)

enhancement

Tracking


RESOLVED FIXED
mozilla68
Performance Impact low
Tracking Status
firefox68 --- fixed

People

(Reporter: pbone, Assigned: pbone)

References

(Regressed 1 open bug)

Details

(Keywords: perf:resource-use)

Attachments

(4 files)

In Bug 1525983 Jon found that collecting the nursery more frequently (4x as frequently) causes 0.5x more time to be spent in minor GCs. One reason might be that the current 192KB minimum is not appropriate for that benchmark. Is it appropriate for typical use / JS games?

Nursery collection is already well under 100µs for these workloads, so this change won't affect responsiveness and maybe shouldn't be a QF bug. However, it will affect throughput.

Whiteboard: [qf] → [qf:p3:resource]

Hi Jean,

To tune this parameter we (Jon and I) think it'd be best to use quantum flow reference hardware. Should I have some hardware shipped here (Australia) for what is probably one-off use? Or can I direct someone who has this hardware to run some tests for me?

Thanks.

Flags: needinfo?(jgong)

I was discussing this with jonco and we think it'd also be interesting to use a microbenchmark to determine the theoretical limits of the nursery and cache. I'm working on that now.

Assignee: nobody → pbone
Status: NEW → ASSIGNED
Depends on: 1544648
Depends on: 1544651
Attached file test.pdf (deleted) β€”

Hi Jonco, here are the results I got from this test. The optimal nursery size for this test is 288KB; that maximises throughput. Unfortunately I didn't measure the mutator's throughput directly, so I'm using allocation rate as a proxy.

Two things affect allocation rate here. If the GC runs less (as a percentage of total runtime) then the mutator can run more, and therefore throughput improves; that's the first allocation-rate column, which shows how much gets allocated over the entire 20s run.

The second column shows allocation rate per second counted only while the mutator is running, which gives an idea of how the GC can affect the mutator's performance. In this case I think the effect is due to the cache: allocations in the nursery touch memory, and more memory needs to be brought into the 256KB L2 cache (on my laptop; I don't have reference hardware). So we see this figure begin to drop after 288KB. My best guess for why it didn't drop before 256KB is that hardware prefetching is able to keep up for a while. We then see the allocation rate drop and begin to pick up again; I'm not sure why it recovers so easily.

Maybe we should raise the minimum nursery size to 256KB. That should improve this case (though not by much, of course) while still keeping baseline memory usage low. Later, once we know whether a tab is in the foreground or background, we can lower the minimum for background tabs and maybe even raise it quite a lot for foreground tabs.

The elephant in the room in this data is the time taken to mark the roots; mostly that's the mkRntm (mark runtime) column in the raw data. It is quite large compared to the time taken to do the collection itself. But even when we discount it (I tried subtracting it from the total GC time), the GC still spends more time collecting at small sizes. I'm not sure which phase dominates then, but that doesn't help us today anyway.

Depends on D29814

Depends on D29815

(In reply to Paul Bone [:pbone] from comment #3)

Thanks for doing this testing.

Maybe we should raise the minimum nursery size to 256KB. That should improve this case (though not by much, of course) while still keeping baseline memory usage low.

Yes that sounds good to me.

Pushed by pbone@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/81d3a25685dc
Use correct units in a preference name r=jonco
https://hg.mozilla.org/integration/autoland/rev/fbf69131cbda
Add a pref for the minimum nursery size r=mccr8
https://hg.mozilla.org/integration/autoland/rev/2509defe2779
Set minimum nursery size to 256KB r=jonco
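For anyone wanting to experiment locally, the pref added by these patches can be overridden in a `prefs.js` / `user.js` fragment along these lines (the pref name below is inferred from the patch descriptions; confirm it against `modules/libpref/init/all.js` in your tree):

```javascript
// Hypothetical user.js fragment: override the minimum nursery size (in KB).
user_pref("javascript.options.mem.nursery.min_kb", 256);
```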

Hi perfherders,

This change may regress memory usage a little bit, but that's okay.

Thanks.

Flags: needinfo?(igoldan)
Flags: needinfo?(fstrugariu)

I did the testing/tuning on my X1 Carbon, not reference hardware. This computer has a 256KB L2 cache.

Flags: needinfo?(jgong)
Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla68

And the expected regression:

== Change summary for alert #20826 (as of Mon, 06 May 2019 17:21:34 GMT) ==

Regressions:

3% Base Content JS windows7-32-shippable opt 3,174,032.00 -> 3,255,554.67
2% Base Content JS linux64-shippable-qr opt 4,014,694.67 -> 4,077,597.33
2% Base Content JS osx-10-10-shippable opt 4,015,438.67 -> 4,078,466.67
2% Base Content JS linux64-shippable opt 4,014,724.00 -> 4,077,464.00
2% Base Content JS windows10-64-shippable-qr opt 4,078,922.67 -> 4,142,024.00
2% Base Content JS windows10-64-shippable opt 4,078,969.33 -> 4,142,170.67

For up to date results, see: https://treeherder.mozilla.org/perf.html#/alerts?id=20826

:pbone please confirm that this is accepted

Flags: needinfo?(pbone)
Flags: needinfo?(igoldan)
Flags: needinfo?(fstrugariu)
Regressions: 1533762

Thanks Bebe, yeah, that's the amount of regression we were expecting.

Flags: needinfo?(pbone)
Blocks: 1550382
Performance Impact: --- → P3
Whiteboard: [qf:p3:resource]
