Closed
Bug 1119537
Opened 10 years ago
Closed 8 years ago
Separate decommit into its own GC phase
Categories
Core :: JavaScript: GC (defect)
Tracking
Status: RESOLVED FIXED
Target Milestone: mozilla50
firefox50: fixed
People
(Reporter: terrence, Assigned: terrence)
References
(Blocks 1 open bug)
Attachments
(1 file, 4 obsolete files)
patch (deleted), jonco: review+
These would be much better expressed as parallel tasks.
Comment 1 • 10 years ago (Assignee)
Actually, we should be able to do *way* better than this:
1) For things swept on the foreground: we can decommit them on the background at any time -- i.e. right after main-thread sweeping in parallel with background sweeping. This will miss background swept arenas, BUT:
2) For things swept on the background: we're already in the background, just decommit directly.
This has a very nice follow-on effect: ShrinkGCBuffers no longer needs to worry at all about decommit, because decommit happens as a natural part of sweeping.
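Roughly the shape I have in mind, as a sketch only (the list type and chunk bookkeeping are approximations, not the actual jsgc.cpp code):

    // Background sweep path: we are already off the main thread, so empty
    // arenas can be returned to the OS as soon as finalization frees them.
    // Called with the GC lock held; chunk bookkeeping omitted.
    void DecommitArenaList(JSRuntime *rt, ArenaHeader *emptyArenas, AutoLockGC &lock)
    {
        for (ArenaHeader *arena = emptyArenas; arena; ) {
            ArenaHeader *next = arena->next;   // the header lives in the page we are
                                               // about to decommit, so read it first
            MarkPagesUnused(arena, ArenaSize);
            arena = next;
        }
    }

Foreground-swept arenas would get the same treatment from a helper-thread pass kicked off once main-thread sweeping finishes, running alongside background sweeping.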
Comment 2 • 10 years ago
Note that some code relies on chunks staying around till the end of sweeping so that the mark bits are valid even after GC things have been finalized.
Comment 3 • 10 years ago (Assignee)
Yes, that actually falls out of this scheme naturally: once decommit completes as part of the GC, the only thing ShrinkGCBuffers needs to do is truncate the chunk pool. We can easily move that to its own task and leave the sweep task to be exclusively about sweeping/decommit.
Comment 4 • 10 years ago (Assignee)
Note: this also gets rid of the side-band shrinkFlag communication, which should make it trivial to swap out the gc helper thread with a parallel task and clear out a bunch of code.
Comment 5 • 10 years ago (Assignee)
Do decommit of empty arenas directly in the background.
Comment 6 • 10 years ago (Assignee)
Kick off a decommit thread at the end of sweeping to decommit any arenas freed during foreground finalization.
Attachment #8549095 - Flags: review?(jcoppeard)
Comment 7 • 10 years ago
Comment on attachment 8549093 [details] [diff] [review]
1_decommit_bg_during_sweep-v0.diff
Review of attachment 8549093 [details] [diff] [review]:
-----------------------------------------------------------------
Looks good.
::: js/src/jsgc.cpp
@@ +3398,5 @@
> }
> }
>
> AutoLockGC lock(rt);
> + DecommitArenaList(rt, emptyArenas, lock);
This now releases the lock where it didn't before. I can't think of why this would be a problem but it's something to watch out for.
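For context, the release comes from the usual pattern of not holding the GC lock across the decommit syscall; roughly (a sketch, not the patch itself):

    AutoLockGC lock(rt);
    // ... pick what to decommit while the lists are stable ...
    {
        AutoUnlockGC unlock(lock);      // drop the lock around the (slow) syscall
        MarkPagesUnused(arena, ArenaSize);
    }
    // the lock is held again here; the available/empty lists may have changed meanwhile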
Attachment #8549093 - Flags: review?(jcoppeard) → review+
Comment 8 • 10 years ago
Comment on attachment 8549095 [details] [diff] [review]
2_decommit_fg_during_bg_sweeping-v0.diff
Review of attachment 8549095 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jsgc.cpp
@@ +3333,5 @@
> + MOZ_ASSERT(!chunk->info.numArenasFreeCommitted);
> +
> + // Build a Vector of all current available Chunks. Since we release the
> + // gc lock while doing the decommit syscall, it is dangerous to iterate
> + // the available list directly, as concurrent operations can modify it.
I guess this comment is out of date now.
@@ +3357,5 @@
> +{
> + MOZ_ASSERT(CurrentThreadCanAccessRuntime(runtime));
> + MOZ_ASSERT(!isRunningOutsideLock());
> + MOZ_ASSERT(toDecommit_.empty());
> + toDecommit_.appendAll(chunks);
This is fallible so we should return the result and take appropriate action in the caller.
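Something along these lines, presumably (a sketch of the suggested fix; the enclosing function name is whatever the patch uses):

    bool setChunksToDecommit(ChunkVector &chunks)
    {
        MOZ_ASSERT(toDecommit_.empty());
        return toDecommit_.appendAll(chunks);   // propagate OOM to the caller
    }

with the caller falling back to decommitting on the main thread (or skipping decommit) if the append fails.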
@@ +3368,5 @@
>
> // Start at the tail and stop before the first chunk: we allocate from the
> // head and don't want to thrash with the mutator.
> + for (size_t i = toDecommit_.length(); i > 1; --i) {
> + Chunk *chunk = toDecommit_[i - 1];
Heh, I would have written the loop so that the variable i was the index we're looking at but whatever.
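I.e. something like this, with the same behaviour (guarding against an empty vector, since i would otherwise underflow):

    // Start at the tail and stop before the first chunk: we allocate from the
    // head and don't want to thrash with the mutator.
    if (!toDecommit_.empty()) {
        for (size_t i = toDecommit_.length() - 1; i > 0; --i) {
            Chunk *chunk = toDecommit_[i];
            // ... decommit free arenas in |chunk| ...
        }
    }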
Attachment #8549095 - Flags: review?(jcoppeard) → review+
Comment 9 • 10 years ago (Assignee)
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/3682a6de6b1f
remote: https://hg.mozilla.org/integration/mozilla-inbound/rev/1f2367eb72f0
Try run at:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=763a73d2a75c
Busted, but it's all on tip:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=8439a009837d
That should take care of decommit, but we still need expire fixed up.
Comment 10 • 10 years ago (Assignee)
Backed out the background task given some crashes that cropped up on inbound:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c2350a67e3c6
https://treeherder.mozilla.org/ui/logviewer.html#?job_id=5560583&repo=mozilla-inbound
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla38
Comment 12 • 10 years ago (Assignee)
The problem here is that sweeping can move chunks from the available pool to the empty pool. If sweeping finishes before decommit, then expire can run before decommit and unmap chunks we haven't yet looked at in the decommit thread. I think the best approach here is to delay expire until after decommit finishes.
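In other words, the fix is just an ordering constraint; something like this (a sketch, with approximate names):

    // End of background sweeping:
    decommitTask.join();                       // make sure decommit has finished
                                               // scanning its snapshot of chunks
    ExpireChunksAndArenas(rt, shouldShrink);   // only now is it safe to unmap
                                               // chunks that became empty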
https://treeherder.mozilla.org/#/jobs?repo=try&revision=03b3cc5f2272
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Summary: Separate decommit and chunk expire/shrink and run in parallel → Separate decommit into its own thread and run in parallel with background sweeping
Comment 13 • 10 years ago (Assignee)
And backed out for a 19% regression on splay:
https://hg.mozilla.org/integration/mozilla-inbound/rev/aa05cd133a76
I guess by doing the decommit before we expire chunks, we are decommitting a bunch of pages that would otherwise have been unmapped as part of the chunk. I guess Splay is a bit special because it actually GCs frequently enough that it spends significant time waiting on the background thread?
Comment 14 • 10 years ago
Merge of backout:
https://hg.mozilla.org/mozilla-central/rev/aa05cd133a76
Comment 15 • 10 years ago
When this was backed out we saw a 2% regression in the tp5 test for the main RSS measurement on linux*:
http://alertmanager.allizom.org:8080/alerts.html?rev=aa05cd133a76&showAll=1&testIndex=0&platIndex=0
This is a small bump and aligns with a win we gained when this originally landed:
http://graphs.mozilla.org/graph.html#tests=[[261,131,33]]&sel=none&displayrange=30&datatype=geo
Comment 16 • 9 years ago (Assignee)
I had no idea how easy it would be to move background finalization into its own phase. Now that I know, let's give the same treatment to decommit. This patch is heavily based on 2_decommit_fg_during_bg_sweeping, so some of it may look similar; or not, it's been a year and a ton has changed.
I'm going to move 1_decommit_bg_during_sweep to a new bug, since it's pretty much orthogonal.
The browser "shrink" callback and associated machinery is still present. It is currently only doing chunk expiration. This is annoying as it is basically just restarting the background sweep thread outside the context of the GC solely for this purpose. I will move this into the background decommit task, remove the gecko callbacks, and lock down all of the background sweeping thread insanity in a followup.
Attachment #8549093 - Attachment is obsolete: true
Attachment #8549095 - Attachment is obsolete: true
Attachment #8723659 - Flags: review?(jcoppeard)
Comment 17 • 9 years ago (Assignee)
Comment 18 • 9 years ago (Assignee)
Comment 19 • 9 years ago (Assignee)
Comment 20 • 9 years ago (Assignee)
Squashed a few small bugs.
Attachment #8723659 - Attachment is obsolete: true
Attachment #8723659 - Flags: review?(jcoppeard)
Attachment #8723842 - Flags: review?(jcoppeard)
Comment 21 • 9 years ago
(In reply to Terrence Cole [:terrence] from comment #20)
> Squashed a few small bugs.
You squashed too hard, the whole patch is missing ;)
Comment 22 • 9 years ago
Comment on attachment 8723659 [details] [diff] [review]
make_decommit_a_proper_gc_phase-v0.diff
Review of attachment 8723659 [details] [diff] [review]:
-----------------------------------------------------------------
Comments on the previous version:
This looks really nice.
I guess this always adds a slice to our incremental GC, but in practice they usually have many slices already.
::: js/src/jsgc.cpp
@@ +3447,5 @@
> +{
> + MOZ_ASSERT(CurrentThreadCanAccessRuntime(runtime));
> + MOZ_ASSERT(!isRunning());
> + MOZ_ASSERT(toDecommit.empty());
> + new (&toDecommit) ChunkVector(mozilla::Move(chunks));
I'm not sure about this - will this not fail to destruct the previous vector? I think you want to swap here.
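To spell out the concern (a sketch): placement-new constructs a fresh vector on top of the old one without running the old one's destructor, so any heap buffer the old vector owned is leaked, whereas a swap exchanges contents and lets normal cleanup happen:

    // Skips the old vector's destructor; leaks whatever toDecommit owned.
    new (&toDecommit) ChunkVector(mozilla::Move(chunks));

    // Exchanges contents; both vectors are destroyed normally later.
    mozilla::Swap(toDecommit, chunks);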
@@ +6268,5 @@
>
> + startDecommit();
> + incrementalState = DECOMMIT;
> +
> + MOZ_FALLTHROUGH;
Neat, I didn't realise that was a thing.
Comment 23 • 9 years ago (Assignee)
Hopefully less empty.
Attachment #8723842 - Attachment is obsolete: true
Attachment #8723842 - Flags: review?(jcoppeard)
Attachment #8724092 - Flags: review?(jcoppeard)
Comment 24 • 9 years ago (Assignee)
Comment on attachment 8724092 [details] [diff] [review]
10.00_make_decommit_a_proper_gc_phase-v1.diff
Review of attachment 8724092 [details] [diff] [review]:
-----------------------------------------------------------------
::: js/src/jit-test/tests/gc/incremental-state.js
@@ +11,5 @@
> // Incremental GC in minimal slice. Note that finalization always uses zero-
> // sized slices while background finalization is on-going, so we need to loop.
> gcslice(1000000);
> while (gcstate() == "finalize") { gcslice(1); }
> +while (gcstate() == "decommit") { gcslice(1); }
Indeed it did. But it's not guaranteed to, depending on how fast the decommit happens. This test, for example, only fails when I'm doing a compile at the same time.
::: js/src/jsgc.cpp
@@ +3448,5 @@
> +{
> + MOZ_ASSERT(CurrentThreadCanAccessRuntime(runtime));
> + MOZ_ASSERT(!isRunning());
> + MOZ_ASSERT(toDecommit.empty());
> + new (&toDecommit) ChunkVector(mozilla::Move(chunks));
Yup, Swap is the right tool here, not sure what I was thinking.
Updated • 9 years ago (Assignee)
Summary: Separate decommit into its own thread and run in parallel with background sweeping → Separate decommit into its own GC phase
Updated • 9 years ago
Attachment #8724092 - Flags: review?(jcoppeard) → review+
Comment 25 • 9 years ago (Assignee)
https://hg.mozilla.org/integration/mozilla-inbound/rev/2a613f5a58663485f3c20b18a9fc867a09bc532a
Bug 1119537 - Make decommit a proper GC phase; r=jonco
Comment 26 • 9 years ago
Looks like this may have regressed Octane Splay by about 2500 points:
http://arewefastyet.com/#machine=28&view=single&suite=octane&subtest=Splay&start=1456752709&end=1456772783
(I was looking because my patch landed right after this)
I backed this out in https://hg.mozilla.org/integration/mozilla-inbound/rev/d17a3f6097a5 for introducing some hazard failures:
http://mozilla-releng-blobs.s3.amazonaws.com/blobs/mozilla-inbound/sha512/2a436d03307df800afec26fc8e14a807dde6517de7843a913762044086abdb5555d38e13bf472bf9e5202c9679c745be21e60aaf5a95a0f7fc359fc08018b3fe
Flags: needinfo?(terrence)
Comment 28 • 9 years ago (Assignee)
This is from froydnj's landing immediately before mine. Please leave it out in any case to give me time to investigate the Splay regression.
Flags: needinfo?(terrence)
Comment 29 • 9 years ago (Assignee)
So it looks like the massive regression on Splay is because the shell never did decommit before. I think what's happening here is that the mutator and the decommit task are competing to allocate arenas. Decommit is winning as its inner loop is much shorter, so we end up having to repeatedly re-commit and memset the page, effectively halving our allocation throughput.
The solution here is to detect when we are allocating heavily (and thus are likely to need this memory again soon) and to avoid decommitting in this case. I have wired up the high-frequency GC flag as a proxy for this (it effectively aggregates multiple allocation rate triggers) and it solves the issue on Splay.
Note to future me:
After I landed decommit initially, Igor checked in a patch to stop decommit as soon as we attempt to allocate a new Chunk to fix this exact problem. I also attempted this solution here and it does not appear to work anymore: we're reliably finishing decommit before we allocate enough new memory to need a new chunk.
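The heuristic ends up as a single early-out where the decommit work gets queued; roughly (a sketch; the exact predicate and placement may differ from the patch):

    void GCRuntime::startDecommit()
    {
        // A high-frequency GC means the mutator is allocating hard and will want
        // these pages back almost immediately, so don't decommit at all.
        if (schedulingState.inHighFrequencyGCMode())
            return;
        // ... otherwise snapshot the available chunks and kick off the
        //     background decommit task ...
    }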
Comment 30 • 9 years ago (Assignee)
https://hg.mozilla.org/integration/mozilla-inbound/rev/af1b34021a320652d4b933d6d38a1d1153a9181d
Bug 1119537 - Make decommit a proper GC phase; r=jonco
I backed this out in https://hg.mozilla.org/integration/mozilla-inbound/rev/5c0bc38a3b88 because of this SM timeout: https://treeherder.mozilla.org/logviewer.html#?job_id=22782124&repo=mozilla-inbound
I'm pretty sure I saw this failure yesterday when this patch landed but I had chalked it up to froydnj's patch, too.
Comment 32 • 9 years ago (Assignee)
Comment 33 • 9 years ago (Assignee)
Failed to reproduce locally. I queued up a handful of SM(e) on the m-i push to see if it's intermittent. The above try run may let me hop into the interactive console and check locally?
Err, no. Looks like I also need --interactive.
Comment 34 • 9 years ago
Comment 35 • 9 years ago
(In reply to Terrence Cole [:terrence] from comment #30)
> https://hg.mozilla.org/integration/mozilla-inbound/rev/
> af1b34021a320652d4b933d6d38a1d1153a9181d
> Bug 1119537 - Make decommit a proper GC phase; r=jonco
Splay didn't move with this patch ;)
Comment 36 • 9 years ago (Assignee)
I failed to reproduce with an --interactive job. Maybe some difference with running under TC instead of buildbot.
Comment 37 • 9 years ago (Assignee)
Comment 38 • 9 years ago (Assignee)
Comment 39 • 9 years ago (Assignee)
https://hg.mozilla.org/integration/mozilla-inbound/rev/7f1da255d0584a6d6c5ff2579a8e362496934960
Bug 1119537 - Make decommit a proper GC phase; r=jonco
Comment 40 • 9 years ago
Backed out for causing frequent Memory-drainAllocationsLog-13.js timeouts in the linux64 debug SM(e) job.
https://hg.mozilla.org/integration/mozilla-inbound/rev/1bb3c8365cc4
https://treeherder.mozilla.org/logviewer.html#?job_id=25464807&repo=mozilla-inbound
Comment 41 • 8 years ago (Assignee)
Comment 42 • 8 years ago (Assignee)
This was a real bug: a deadlock caused by accidentally releasing the lock where we didn't expect to.
Comment 43 • 8 years ago (Assignee)
https://hg.mozilla.org/integration/mozilla-inbound/rev/c23ff1725f56734958eb2978040ab36f278a6063
Bug 1119537 - Make decommit a proper GC phase; r=jonco
Comment 44 • 8 years ago
bugherder
Status: REOPENED → RESOLVED
Closed: 10 years ago → 8 years ago
status-firefox50: --- → fixed
Resolution: --- → FIXED
Updated • 8 years ago
Comment 45 • 8 years ago
Unfortunately, it appears, after extensive retriggering [1], that this causes bug 1291234 which has recently caused wpt to be hidden by sheriffs on inbound/central [2]. This is obviously not a situation that can be allowed to persist. I don't have specific insight into what wpt is doing that's weird here, but the harness basically consists of a script injected via marionette that window.open's a new Window containing the test, which will attempt to communicate the test result using both cross-Window function calls and postMessage.
[1] https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&fromchange=2026686ebec1&tochange=f4320d509686&filter-searchStr%3Dweb-platform-tests%20e10s&exclusion_profile=false&filter-searchStr=web-platform-tests%20e10s%20tc-w-e10s(1)%20debug (note that the orange a couple of pushes before this lands is not TEST-UNEXPECTED-TIMEOUT, whereas all the orange after this lands is.)
[2] bug 1295840
Updated • 8 years ago
Target Milestone: mozilla38 → mozilla50