Closed Bug 1497873 Opened 6 years ago Closed 6 years ago

Investigate improving the nursery resizing heuristic

Tracking

()

Status:

RESOLVED FIXED

Milestone:

mozilla67

Tracking Flags:

Tracking

Status

firefox67

---

fixed

People

(Reporter: jonco, Assigned: pbone)

References

(Blocks 2 open bugs)

Details

(Whiteboard: [MemShrink:P2])

Attachments

(5 files, 5 obsolete files)

Bug 1497873 - Add a kind-of goal seeking herustic for nursery resizing 6 years ago Paul Bone [:pbone] (deleted), patch		Details \| Diff \| Splinter Review
Bug 1497873 - Resizing also depends on previous promotion rate 6 years ago Paul Bone [:pbone] (deleted), text/plain		Details
Tweaking the pretenure condition 6 years ago Paul Bone [:pbone] (deleted), patch		Details \| Diff \| Splinter Review
Test results showing changes in nursery size for Ares6 Air test. 6 years ago Paul Bone [:pbone] (deleted), text/plain		Details
Test results showing changes in nursery size for Ares6 Babylon test. 6 years ago Paul Bone [:pbone] (deleted), text/plain		Details
Bug 1497873 - Add a kind-of goal seeking herustic for nursery resizing 6 years ago Paul Bone [:pbone] (deleted), patch		Details \| Diff \| Splinter Review
charts.zip 6 years ago Jon Coppeard (:jonco) (PTO until 14th September) (deleted), application/zip		Details
Bug 1497873 - Tweaking pretenure condition 6 years ago Paul Bone [:pbone] (deleted), patch		Details \| Diff \| Splinter Review
Bug 1497873 - Tweak the pretenure condition r=jonco! 6 years ago Paul Bone [:pbone] (deleted), text/x-phabricator-request		Details
Bug 1497873 - Add a kind-of goal seeking herustic for nursery resizing r=jonco! 6 years ago Paul Bone [:pbone] (deleted), text/x-phabricator-request		Details

Jon Coppeard (:jonco) (PTO until 14th September)

Reporter

Description

•

6 years ago

Currently we resize the nursery depending on the promotion rate and the size of the nursery used, and according to some thresholds we can double the size of the nursery or shrink it by one chunk.

The problem with this is that growth is very aggressive while shrinking can take a while to have any effect and that the nursery size can end up growing and shrinking a lot even for a uniform workload.

It might make more sense instead to estimate an optimal nursery size based on the data we have, possibly using a moving average, and then resize to that.

Jon Coppeard (:jonco) (PTO until 14th September)

Reporter

Updated

•

6 years ago

Priority: -- → P3

Paul Bone [:pbone]

Assignee

Comment 1

•

6 years ago

Depends on Bug 1433007 because that bug complicates the GC size calculation and is already underway.

Blocks Bug 1449600 because this may make the nursery size more stable and therefore be "nicer" to the chunk allocator.

Related to Bug 1492720 because both bugs propose to change the nursery size calculation.

Blocks: MinGCMem

Depends on: 1433007

See Also: → https://bugzilla.mozilla.org/show_bug.cgi?id=1492720

Paul Bone [:pbone]

Assignee

Comment 2

•

6 years ago

I'd like to work on this, I think the nursery size stability it should provide will make some of my other work have more tangible results.

Assignee: nobody → pbone

Status: NEW → ASSIGNED

Paul Bone [:pbone]

Assignee

Updated

•

6 years ago

Blocks: 1433007

No longer depends on: 1433007

Paul Bone [:pbone]

Assignee

Comment 3

•

6 years ago

Attached patch Bug 1497873 - Add a kind-of goal seeking herustic for nursery resizing (obsolete) (deleted) — Details — Splinter Review

This seems to improve some stability in nursery sizes.  Also the nursery is now very quick to react to changing workloads, maybe too quick.  For example in the some benchmarks as it switches between tasks the "switch" of tasks has different memory allocation patterns, the nursery reacts to this and then starts the next benchmark larger than necessary.  We could prerhaps add back in the condition that the current and previous promotion rates both exceed a threshold before we resize the nursery.

Attachment #9024923 - Flags: review?(jcoppeard)

Paul Bone [:pbone]

Assignee

Comment 4

•

6 years ago

Here's the first set of differences with this patch:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=e92a6d2427ba&framework=1&showOnlyComparable=1&selectedTimeRange=172800

There are some sagnificant size improvments but also some perf regressions.

Paul Bone [:pbone]

Assignee

Updated

•

6 years ago

Whiteboard: [MemShrink]

Paul Bone [:pbone]

Assignee

Comment 5

•

6 years ago

Attached file Bug 1497873 - Resizing also depends on previous promotion rate (obsolete) (deleted) — Details

It seemed that the Nursery was adjusting it's size too quickly.  So I added this patch that restores the behaviour where it needs two collections with a high or low promotion rate before adjusting its size.

Attachment #9025209 - Flags: feedback?(jcoppeard)

Paul Bone [:pbone]

Assignee

Comment 6

•

6 years ago

And here are the results including the second patch, still needs some investigation:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=3f00d2abd8cf&framework=1&showOnlyComparable=1&selectedTimeRange=172800

Paul Bone [:pbone]

Assignee

Comment 7

•

6 years ago

In the ares6 test and possibly others we were getting unlucky and hit a high promotion rate while the nursery was small. causing the wrong objects to get pretenured, then those cause other objects to be tenured and the generational GC stops being an optimisation.

Jon Coppeard (:jonco) (PTO until 14th September)

Reporter

Updated

•

6 years ago

Blocks: 1507445

Jon Coppeard (:jonco) (PTO until 14th September)

Reporter

Comment 8

•

6 years ago

(In reply to Paul Bone [:pbone] from comment #7)
These patches seem reasonable but as you note there are still some regressions.

In order to compare with the current setup you post some numbers illustrating how the nursery size changes over time for benchmarks that make particular use of the nursery?

Paul Bone [:pbone]

Assignee

Comment 9

•

6 years ago

Attached patch Tweaking the pretenure condition (obsolete) (deleted) — Details — Splinter Review

Here is another patch I'm testing with, it may commit on this bug or maybe by itself.

Attachment #9026276 - Flags: feedback?(jcoppeard)

Paul Bone [:pbone]

Assignee

Comment 10

•

6 years ago

As I noted (comment 7) the ares6 test got unlucky with 3 specific object groups getting pretenured, causing a lot more nursery objects surviving the nursery and being evicted, which also meant more regular GC activity.

I've added another patch which refused to make a pretenuring decision if the nursery is less than 4MB.  My thought that if the nursery is small and we promoted a lot, we should probably grow the nursery before we make a pretenuring decision.  This works out and the test is improved.  However there's still some regressions.  One of the interesting comparisions is:

pretenure tweak vs pretenure tweak + nursery sizing change:

All the tests:
https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=c985a960915e&framework=1&showOnlyComparable=1&selectedTimeRange=172800

Ares6:
https://treeherder.mozilla.org/perf.html#/comparesubtest?originalProject=try&originalRevision=c985a960915e&newProject=try&newRevision=eefa2a8c1919&originalSignature=0bae7e5fb769123dc4504aea0458898f63bee0dd&newSignature=0bae7e5fb769123dc4504aea0458898f63bee0dd&framework=11

So there's still something sub-optimal about this new nursery sizing change.  The pretenuring tweak on its own doesn't change much (there's some slight changes but it's hard to be sure), as it is it's only valuable if we also commit the sizing changes.

Paul Bone [:pbone]

Assignee

Comment 11

•

6 years ago

(In reply to Jon Coppeard (:jonco) from comment #8)
> (In reply to Paul Bone [:pbone] from comment #7)
> These patches seem reasonable but as you note there are still some
> regressions.
> 
> In order to compare with the current setup you post some numbers
> illustrating how the nursery size changes over time for benchmarks that make
> particular use of the nursery?

I'll see what I can do.

Paul Bone [:pbone]

Assignee

Comment 12

•

6 years ago

Attached file Test results showing changes in nursery size for Ares6 Air test. (deleted) — Details

Paul Bone [:pbone]

Assignee

Comment 13

•

6 years ago

Attached file Test results showing changes in nursery size for Ares6 Babylon test. (deleted) — Details

Here the Nightly version starts with a 10MB nursery, while the test version starts with a 1MB Nursery.  That's because with the patched code the previous test (Basic) drops its nursery size quickly, while the nightly version does not - and sees a higher tenure rate.

Paul Bone [:pbone]

Assignee

Comment 14

•

6 years ago

Okay, based on that data I've tweaked it a little further (patches soon).  What I've got now is it only consideres the current tenure rate (not the previous one) and it will at most double or halve the nursery.  We needed the halving because sometimes you'd see a really low tenure rate, the nursery size would drop to 1MB, then take 4 collections to restore itself to 16MB.

Here's the tests, comparing a specific version of central, against my patches:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=98c53f6ef2ba&newProject=try&newRevision=8fa0a2c613d9&framework=1&showOnlyConfident=1

There might be some regressions with about_preferences and tp6_facebook,  Neither do much GC work (I'm looking at the profiles generated from those tests) so I'm not sure why there's a difference.  I'll tidy up the patches and share them later tonight.
?

Paul Bone [:pbone]

Assignee

Updated

•

6 years ago

Attachment #9025209 - Attachment is patch: false

Attachment #9025209 - Flags: feedback?(jcoppeard)

Paul Bone [:pbone]

Assignee

Updated

•

6 years ago

Depends on: 1510560

Paul Bone [:pbone]

Assignee

Comment 15

•

6 years ago

Attached patch Bug 1497873 - Add a kind-of goal seeking herustic for nursery resizing (obsolete) (deleted) — Details — Splinter Review

The tests look good for this patch, but it must not land without Bug 1510560 without which it will regress.

Attachment #9024923 - Attachment is obsolete: true

Attachment #9026276 - Attachment is obsolete: true

Attachment #9024923 - Flags: review?(jcoppeard)

Attachment #9026276 - Flags: feedback?(jcoppeard)

Attachment #9028253 - Flags: review?(jcoppeard)

Andrew McCreight [:mccr8]

Updated

•

6 years ago

Whiteboard: [MemShrink] → [MemShrink:P2]

Jon Coppeard (:jonco) (PTO until 14th September)

Reporter

Comment 16

•

6 years ago

Attached file charts.zip (deleted) — Details

To help evaluate changes to the nursery size heuristics, I wrote some scripts to generate charts of the size and tenure rate against time for various benchmarks (octane plus some of talos ones).  Here's the output, with and without the patch in this bug.  (I can post the scripts too if interested).

The results show some improvements (e.g. for EarleyBoyer the nursery size is more stable) and some regressions (e.g. for Splay it varies a lot more).

Jon Coppeard (:jonco) (PTO until 14th September)

Reporter

Comment 17

•

6 years ago

I put the scripts to generate these charts on github: https://github.com/jonco3/nurserySizeCharts

Jon Coppeard (:jonco) (PTO until 14th September)

Reporter

Comment 18

•

6 years ago

Comment on attachment 9028253 [details] [diff] [review]
Bug 1497873 - Add a kind-of goal seeking herustic for nursery resizing

Review of attachment 9028253 [details] [diff] [review]:
-----------------------------------------------------------------

Cancelling review request for now.

Attachment #9028253 - Flags: review?(jcoppeard)

Paul Bone [:pbone]

Assignee

Comment 19

•

6 years ago

So in Jon's results, and replicated in mine, Splay running in the shell sees a lot of floundering WRT the nursery size.  While most of the other benchmarks show fairly stable behavour for stable tenure rates, and react quickly to changing tenure rates.  I also get the sense (but no hard data) that considering the previous tenure rate, and tweaking the pretenure condition result in better behavour in these tests (the latter definitly improved ares when tested on try due to some unlucky pretenuring).

I've exprimented with a few things to remove the floundering in Splay, including a larger deadzone for the tenure rate (didn't work).  So far what has worked, and seems a bit funny, is the "correct" calculation for the tenure rate (considering nursery used size not capacity).  I'm going to start some try/talos jobs now with this change and see what it's like.

One other change that I'm on the fence about is how it should behave WRT the previous tenure rate.  Should it observe it both for growing and shrinking the nursery, or only growing (as before)?  It seems like shrinking quickly might be desirable, particularly if there's not going to be another minor GC for a little while as the app hits a steady state (when tenure rate drops anyway).

¯\_(ツ)_/¯

Paul Bone [:pbone]

Assignee

Comment 20

•

6 years ago

My current patches with incorrect tenure rate calculation:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=6e55e0f8e303&newProject=try&newRevision=8fac29e891b3&framework=1&showOnlyConfident=1

This still has some regressions for ares6, but other tests seem good.  Including some size improvements.

With correct tenure rate calculation:

https://treeherder.mozilla.org/perf.html#/compare?originalProject=try&originalRevision=6e55e0f8e303&newProject=try&newRevision=1c7c752cfa2e&framework=4&showOnlyConfident=1

Regresses awsy tests.

reminder to myself: Try to work out why ares6 regresses in the first comparison.  And for the 2nd, if we can have the correct tenure rate (which fixes the flip-floppy-ness of Splay) without regressing awsy.

Paul Bone [:pbone]

Assignee

Comment 21

•

6 years ago

Attached patch Bug 1497873 - Tweaking pretenure condition (obsolete) (deleted) — Details — Splinter Review

Paul Bone [:pbone]

Assignee

Comment 22

•

6 years ago

So the problem with ares6 is in the babylon subtest (as always) there are more non-imcremental GCs with these patches due to GCBytesTrigger.

Paul Bone [:pbone]

Assignee

Updated

•

6 years ago

Priority: P3 → P1

Paul Bone [:pbone]

Assignee

Updated

•

6 years ago

Attachment #9025209 - Attachment is obsolete: true

Paul Bone [:pbone]

Assignee

Updated

•

6 years ago

Attachment #9028253 - Attachment is obsolete: true

Paul Bone [:pbone]

Assignee

Updated

•

6 years ago

Attachment #9032855 - Attachment is obsolete: true

Paul Bone [:pbone]

Assignee

Comment 23

•

6 years ago

Attached file Bug 1497873 - Tweak the pretenure condition r=jonco! (deleted) — Details

Paul Bone [:pbone]

Assignee

Comment 24

•

6 years ago

Attached file Bug 1497873 - Add a kind-of goal seeking herustic for nursery resizing r=jonco! (deleted) — Details

Also allow the nursery to shrink up to half it's current size, previously
it'd be one chunk at a time. Growing is still capped at twice the current
size. These limits tend to prevent it changing too dramantically for small
changes in allocation patterns.

Depends on D17593

Paul Bone [:pbone]

Assignee

Comment 25

•

6 years ago

Hi Jon,

I'd like to land this next week. The current talos/raptor/js-bench results look good. The big red marks on some graphics test is because someone recently landed something that improved those benchmarks but the changes I tested were based of an older central version.

https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=5a33f9b4e7466ecb0de6ba63cfae2699a7ab942d&framework=1&showOnlyComparable=1&showOnlyConfident=1&selectedTimeRange=86400

Note the raptor tests, some of those tests look very good.

Pulsebot

Comment 26

•

6 years ago

Pushed by pbone@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/adcc2b05c708
Tweak the pretenure condition r=jonco
https://hg.mozilla.org/integration/autoland/rev/a3e1dfc5aae9
Add a kind-of goal seeking herustic for nursery resizing r=jonco

Paul Bone [:pbone]

Assignee

Comment 27

•

6 years ago

NI perfherder people. This may change a few of our benchmarks, see https://treeherder.mozilla.org/perf.html#/compare?originalProject=mozilla-central&newProject=try&newRevision=5a33f9b4e7466ecb0de6ba63cfae2699a7ab942d&framework=10&showOnlyComparable=1&showOnlyConfident=1&selectedTimeRange=86400 (unless it was reddit that changed their code).

I still want to know about any changes you happen to notice, but if you're trying to find a patch that cased a perf change, look here.

Flags: needinfo?(rwood)

Flags: needinfo?(jmaher)

Cosmin Sabou [:CosminS]

Comment 28

•

6 years ago

bugherder

https://hg.mozilla.org/mozilla-central/rev/adcc2b05c708
https://hg.mozilla.org/mozilla-central/rev/a3e1dfc5aae9

Status: ASSIGNED → RESOLVED

Closed: 6 years ago

status-firefox67: --- → fixed

Resolution: --- → FIXED

Target Milestone: --- → mozilla67

Joel Maher ( :jmaher ) (UTC -8)

Comment 29

•

6 years ago

thanks :pbone!

:igoldan, :bebe, heads up- you should be seeing alerts by now

Flags: needinfo?(jmaher)

Flags: needinfo?(igoldan)

Flags: needinfo?(fstrugariu)

Robert Wood [:rwood]

Comment 30

•

6 years ago

Thanks Paul.

Flags: needinfo?(rwood)

Paul Bone [:pbone]

Assignee

Comment 31

•

6 years ago

Hrm, maybe the one I saw on try really was that reddit changed their code. Which is good because I didn't think this change would make that much difference.

Flags: needinfo?(igoldan)

Flags: needinfo?(fstrugariu)

Paul Bone [:pbone]

Assignee

Comment 32

•

6 years ago

This has slightly reduced the nursery bytes telemetry probe:

http://alerts.telemetry.mozilla.org/index.html#/detectors/1/metrics/1931/alerts/?from=2019-01-29&to=2019-01-29

You need to log in before you can comment on or make changes to this bug.