Closed Bug 1119537 Opened 10 years ago Closed 8 years ago

Separate decommit into its own GC phase

Categories

(Core :: JavaScript: GC, defect)

Type: defect
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla50
Tracking Status: firefox50 --- fixed

People

(Reporter: terrence, Assigned: terrence)

References

(Blocks 1 open bug)


Attachments

(1 file, 4 obsolete files)

Decommit and chunk expire/shrink would be much better expressed as parallel tasks.
Actually, we should be able to do *way* better than this:

1) For things swept on the foreground: we can decommit them on the background at any time -- i.e. right after main-thread sweeping, in parallel with background sweeping. This will miss background-swept arenas, BUT:

2) For things swept on the background: we're already in the background, just decommit directly.

This has a very nice follow-on effect: ShrinkGCBuffers no longer needs to worry at all about decommit, because decommit happens as a natural part of sweeping.
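A minimal self-contained sketch of those two paths, under the assumption that Arena, ArenaList, decommitArenas, and the function names here are illustrative stand-ins rather than the real SpiderMonkey types:

#include <thread>
#include <utility>
#include <vector>

struct Arena { void* pages; };
using ArenaList = std::vector<Arena*>;

// Stand-in for the real per-arena madvise(MADV_DONTNEED)/VirtualFree call.
void decommitArenas(const ArenaList& arenas) { /* release physical pages */ }

// Path 2: background sweeping already runs off the main thread, so arenas
// that become empty there can be decommitted inline, right where they are
// swept.
void backgroundSweep(ArenaList& emptyArenas) {
    // ... finalize dead cells, collecting fully empty arenas ...
    decommitArenas(emptyArenas);
}

// Path 1: arenas emptied by foreground (main-thread) sweeping are handed
// off to a helper thread that runs in parallel with background sweeping.
std::thread startForegroundDecommit(ArenaList emptyArenas) {
    return std::thread([arenas = std::move(emptyArenas)] {
        decommitArenas(arenas);
    });
}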
Note that some code relies on chunks staying around till the end of sweeping so that the mark bits are valid even after GC things have been finalized.
Yes, that actually falls out of this scheme naturally: once decommit is completed with the GC, the only thing ShrinkGCBuffers needs to do is truncate the chunk pool. We can easily move that to its own task and leave the sweep task to be exclusively about sweeping/decommit.
Note: this also gets rid of the side-band shrinkFlag communication, which should make it trivial to swap out the gc helper thread with a parallel task and clear out a bunch of code.
Attached patch 1_decommit_bg_during_sweep-v0.diff (obsolete) (deleted) — Splinter Review
Decommit empty arenas directly on the background thread.
Assignee: nobody → terrence
Status: NEW → ASSIGNED
Attachment #8549093 - Flags: review?(jcoppeard)
Attached patch 2_decommit_fg_during_bg_sweeping-v0.diff (obsolete) (deleted) — Splinter Review
Kick off a decommit thread at the end of sweeping to decommit any arenas freed during foreground finalization.
Attachment #8549095 - Flags: review?(jcoppeard)
Comment on attachment 8549093 [details] [diff] [review]
1_decommit_bg_during_sweep-v0.diff

Review of attachment 8549093 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good.

::: js/src/jsgc.cpp
@@ +3398,5 @@
>      }
>  }
>
>      AutoLockGC lock(rt);
> +    DecommitArenaList(rt, emptyArenas, lock);

This now releases the lock where it didn't before. I can't think of why this would be a problem, but it's something to watch out for.
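For context, a paraphrased sketch of the pattern the reviewer is flagging. AutoLockGC/AutoUnlockGC and MarkPagesUnused are real SpiderMonkey names, but this body is illustrative, not the actual patch:

// Takes the GC lock by reference, but drops it around the slow decommit
// syscall. Any state guarded by the lock can change across each iteration,
// and callers can no longer assume the lock is held continuously.
void DecommitArenaList(JSRuntime* rt, ArenaList& emptyArenas, AutoLockGC& lock)
{
    for (Arena* arena : emptyArenas) {
        AutoUnlockGC unlock(lock);        // release while making the syscall
        MarkPagesUnused(arena, ArenaSize);
        // unlock's destructor re-takes the lock here; lists we examined
        // under the lock may have been mutated by other threads meanwhile.
    }
}

As a later comment in this bug notes, an accidental lock release of exactly this kind eventually caused a real deadlock.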
Attachment #8549093 - Flags: review?(jcoppeard) → review+
Comment on attachment 8549095 [details] [diff] [review]
2_decommit_fg_during_bg_sweeping-v0.diff

Review of attachment 8549095 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsgc.cpp
@@ +3333,5 @@
> +    MOZ_ASSERT(!chunk->info.numArenasFreeCommitted);
> +
> +    // Build a Vector of all current available Chunks. Since we release the
> +    // gc lock while doing the decommit syscall, it is dangerous to iterate
> +    // the available list directly, as concurrent operations can modify it.

I guess this comment is out of date now.

@@ +3357,5 @@
> +{
> +    MOZ_ASSERT(CurrentThreadCanAccessRuntime(runtime));
> +    MOZ_ASSERT(!isRunningOutsideLock());
> +    MOZ_ASSERT(toDecommit_.empty());
> +    toDecommit_.appendAll(chunks);

This is fallible, so we should return the result and take appropriate action in the caller.

@@ +3368,5 @@
>
>     // Start at the tail and stop before the first chunk: we allocate from the
>     // head and don't want to thrash with the mutator.
> +    for (size_t i = toDecommit_.length(); i > 1; --i) {
> +        Chunk *chunk = toDecommit_[i - 1];

Heh, I would have written the loop so that the variable i was the index we're looking at, but whatever.
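An illustrative sketch of what the two review suggestions would look like together (hypothetical shape, not the landed code; js::Vector::appendAll really is fallible and returns bool):

bool BackgroundDecommitTask::setChunksToScan(ChunkVector& chunks)
{
    MOZ_ASSERT(toDecommit_.empty());
    // Propagate the fallible append; on OOM the caller can simply skip
    // decommit for this GC rather than proceed with a partial list.
    return toDecommit_.appendAll(chunks);
}

void BackgroundDecommitTask::run()
{
    // Start at the tail and stop before the first chunk: we allocate from
    // the head and don't want to thrash with the mutator. Putting the
    // decrement in the condition makes i the index being examined while
    // still skipping index 0 (and still handling an empty vector).
    for (size_t i = toDecommit_.length(); i-- > 1; ) {
        Chunk* chunk = toDecommit_[i];
        decommitEmptyArenas(chunk);   // hypothetical helper; body elided
    }
}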
Attachment #8549095 - Flags: review?(jcoppeard) → review+
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
The problem here is that sweeping can move chunks from the available pool to the empty pool. If sweeping finishes before decommit, then expire can run before decommit and unmap chunks we haven't yet looked at in the decommit thread. I think the best approach here is to delay expire until after decommit finishes.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=03b3cc5f2272
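A sketch of that ordering, with illustrative names (decommitTask, expireEmptyChunkPool, and finishBackgroundSweep are stand-ins for whatever hooks the real code uses):

void GCRuntime::finishBackgroundSweep()
{
    // Background sweeping may have moved chunks from the available pool to
    // the empty pool, but the decommit task captured raw Chunk* pointers
    // from the available pool when it started.
    decommitTask.join();        // wait: decommit may still touch those chunks
    expireEmptyChunkPool();     // only now is it safe to unmap empty chunks
}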
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: Separate decommit and chunk expire/shrink and run in parallel → Separate decommit into its own thread and run in parallel with background sweeping
Depends on: 1123873
Depends on: 1123260
And backed out for a 19% regression on Splay: https://hg.mozilla.org/integration/mozilla-inbound/rev/aa05cd133a76

I guess that by doing the decommit before we expire chunks, we are decommitting a bunch of pages that would otherwise have been unmapped as part of the chunk. I guess Splay is a bit special because it actually GCs frequently enough that it spends significant time waiting on the background thread?
When this was backed out we saw a 2% regression in the tp5 test for the main RSS measurement on linux*: http://alertmanager.allizom.org:8080/alerts.html?rev=aa05cd133a76&showAll=1&testIndex=0&platIndex=0

This is a small bump and aligns with a win we gained when this originally landed: http://graphs.mozilla.org/graph.html#tests=[[261,131,33]]&sel=none&displayrange=30&datatype=geo
Attached patch make_decommit_a_proper_gc_phase-v0.diff (obsolete) (deleted) — Splinter Review
I had no idea how easy it would be to move background finalization into its own phase. Now that I know, let's give the same treatment to decommit.

This patch is heavily based on 2_decommit_fg_during_bg_sweeping, so some of it may look similar; or not, it's been a year and a ton has changed. I'm going to move 1_decommit_bg_during_sweep to a new bug, since it's pretty much orthogonal.

The browser "shrink" callback and associated machinery is still present. It is currently only doing chunk expiration. This is annoying, as it is basically just restarting the background sweep thread outside the context of the GC solely for this purpose. I will move this into the background decommit task, remove the Gecko callbacks, and lock down all of the background sweeping thread insanity in a followup.
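Roughly, the new phase slots into the incremental GC state machine like this. This is a sketch based on the hunks quoted in the review below; the neighboring state names and the isRunning() check are assumptions, not the actual patch:

switch (incrementalState) {
    // ... MARK, SWEEP, ... earlier phases elided ...

    case FINALIZE:
        // ... once background finalization is done ...
        startDecommit();              // kick off the parallel decommit task
        incrementalState = DECOMMIT;
        MOZ_FALLTHROUGH;

    case DECOMMIT:
        if (decommitTask.isRunning())
            break;                    // take another (zero-work) slice
        incrementalState = FINISH;    // decommit finished; wrap up the GC
        break;
}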
Attachment #8549093 - Attachment is obsolete: true
Attachment #8549095 - Attachment is obsolete: true
Attachment #8723659 - Flags: review?(jcoppeard)
Attached patch 10.00_make_decommit_a_proper_gc_phase-v1.diff (obsolete) (deleted) — Splinter Review
Squashed a few small bugs.
Attachment #8723659 - Attachment is obsolete: true
Attachment #8723659 - Flags: review?(jcoppeard)
Attachment #8723842 - Flags: review?(jcoppeard)
Blocks: 1251463
(In reply to Terrence Cole [:terrence] from comment #20)
> Squashed a few small bugs.

You squashed too hard, the whole patch is missing ;)
Comment on attachment 8723659 [details] [diff] [review]
make_decommit_a_proper_gc_phase-v0.diff

Review of attachment 8723659 [details] [diff] [review]:
-----------------------------------------------------------------

Comments on the previous version:

This looks really nice. I guess this always adds a slice to our incremental GCs, but in practice they usually have many slices already.

::: js/src/jsgc.cpp
@@ +3447,5 @@
> +{
> +    MOZ_ASSERT(CurrentThreadCanAccessRuntime(runtime));
> +    MOZ_ASSERT(!isRunning());
> +    MOZ_ASSERT(toDecommit.empty());
> +    new (&toDecommit) ChunkVector(mozilla::Move(chunks));

I'm not sure about this - will this not fail to destruct the previous vector? I think you want to swap here.

@@ +6268,5 @@
>
> +    startDecommit();
> +    incrementalState = DECOMMIT;
> +
> +    MOZ_FALLTHROUGH;

Neat, I didn't realise that was a thing.
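To spell out the reviewer's point about the placement-new, a two-line comparison (not real patch code; mozilla::Move and mozilla::Swap are MFBT's equivalents of std::move and std::swap):

// Leaky: placement-new constructs a fresh vector on top of the one already
// living in toDecommit without running that vector's destructor, so any
// heap buffer the old vector owned is never freed.
new (&toDecommit) ChunkVector(mozilla::Move(chunks));

// Swapping instead exchanges the two vectors' buffers; the old contents
// are destroyed normally when `chunks` goes out of scope in the caller.
mozilla::Swap(toDecommit, chunks);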
Attached patch 10.00_make_decommit_a_proper_gc_phase-v1.diff (deleted) — Splinter Review
Hopefully less empty.
Attachment #8723842 - Attachment is obsolete: true
Attachment #8723842 - Flags: review?(jcoppeard)
Attachment #8724092 - Flags: review?(jcoppeard)
Comment on attachment 8724092 [details] [diff] [review]
10.00_make_decommit_a_proper_gc_phase-v1.diff

Review of attachment 8724092 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jit-test/tests/gc/incremental-state.js
@@ +11,5 @@
>  // Incremental GC in minimal slice. Note that finalization always uses zero-
>  // sized slices while background finalization is on-going, so we need to loop.
>  gcslice(1000000);
>  while (gcstate() == "finalize") { gcslice(1); }
> +while (gcstate() == "decommit") { gcslice(1); }

Indeed it did. But it's not guaranteed to, depending on how fast the decommit happens. This test, for example, only fails when I'm doing a compile at the same time.

::: js/src/jsgc.cpp
@@ +3448,5 @@
> +{
> +    MOZ_ASSERT(CurrentThreadCanAccessRuntime(runtime));
> +    MOZ_ASSERT(!isRunning());
> +    MOZ_ASSERT(toDecommit.empty());
> +    new (&toDecommit) ChunkVector(mozilla::Move(chunks));

Yup, Swap is the right tool here, not sure what I was thinking.
Summary: Separate decommit into its own thread and run in parallel with background sweeping → Separate decommit into its own GC phase
Attachment #8724092 - Flags: review?(jcoppeard) → review+
Looks like this may have regressed Octane Splay by about 2500 points: http://arewefastyet.com/#machine=28&view=single&suite=octane&subtest=Splay&start=1456752709&end=1456772783 (I was looking because my patch landed right after this)
This is from froydnj's landing immediately before mine. Please leave it out in any case to give me time to investigate the Splay regression.
Flags: needinfo?(terrence)
So it looks like the massive regression on Splay is because the shell never did decommit before. I think what's happening here is that the mutator and the decommit task are competing to allocate arenas. Decommit is winning, as its inner loop is much shorter, so we end up having to repeatedly re-commit and memset the page, effectively halving our allocation throughput.

The solution here is to detect when we are allocating heavily (and thus are likely to need this memory again soon) and to avoid decommitting in this case. I have wired up the high-frequency GC flag as a proxy for this (it effectively aggregates multiple allocation-rate triggers) and it solves the issue on Splay.

Note to future me: after I landed decommit initially, Igor checked in a patch to stop decommit as soon as we attempt to allocate a new Chunk, to fix this exact problem. I also attempted this solution here and it does not appear to work anymore: we're reliably finishing decommit before we allocate enough new memory to need a new chunk.
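A sketch of that heuristic, assuming illustrative wiring (maybeStartBackgroundDecommit is a hypothetical hook; inHighFrequencyGCMode() is the existing scheduling query being reused as the proxy):

void GCRuntime::maybeStartBackgroundDecommit()
{
    // In high-frequency-GC mode the mutator is allocating heavily and will
    // want these pages back almost immediately; decommitting them now just
    // forces a re-commit + memset on the next allocation.
    if (schedulingState.inHighFrequencyGCMode())
        return;
    startDecommit();
}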
I backed this out in https://hg.mozilla.org/integration/mozilla-inbound/rev/5c0bc38a3b88 because of this SM timeout: https://treeherder.mozilla.org/logviewer.html#?job_id=22782124&repo=mozilla-inbound

I'm pretty sure I saw this failure yesterday when this patch landed but I had chalked it up to froydnj's patch, too.
Failed to reproduce locally. I queued up a handful of SM(e) on the m-i push to see if it's intermittent. The above try run may let me hop into the interactive console and check locally? Err, no. Looks like I also need --interactive.
(In reply to Terrence Cole [:terrence] from comment #30)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/af1b34021a320652d4b933d6d38a1d1153a9181d
> Bug 1119537 - Make decommit a proper GC phase; r=jonco

Splay didn't move with this patch ;)
I failed to reproduce with an --interactive job. Maybe some difference with running under TC instead of buildbot.
This was a real bug: a deadlock caused by accidentally releasing the lock where we didn't expect to.
Status: REOPENED → RESOLVED
Closed: 10 years ago → 8 years ago
Resolution: --- → FIXED
Blocks: 1291234
Unfortunately, it appears, after extensive retriggering [1], that this causes bug 1291234, which has recently caused wpt to be hidden by sheriffs on inbound/central [2]. This is obviously not a situation that can be allowed to persist.

I don't have specific insight into what wpt is doing that's weird here, but the harness basically consists of a script injected via marionette that window.open's a new Window containing the test, which will attempt to communicate the test result using both cross-Window function calls and postMessage.

[1] https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=2026686ebec1&tochange=f4320d509686&filter-searchStr%3Dweb-platform-tests%20e10s&exclusion_profile=false&filter-searchStr=web-platform-tests%20e10s%20tc-w-e10s(1)%20debug (note that the orange a couple of pushes before this lands is not TEST-UNEXPECTED-TIMEOUT, whereas all the orange after this lands is.)

[2] bug 1295840
Target Milestone: mozilla38 → mozilla50