Open Bug 1172193 Opened 9 years ago Updated 2 years ago

Heuristics for zone GCs are bad

Tracking

()

Status:

NEW

Tracking Flags:

Tracking

Status

firefox41

---

affected

People

(Reporter: billm, Unassigned)

References

Details

(Whiteboard: [MemShrink:P2])

Attachments

(1 file)

bug1172193-collect-associated-zones 9 years ago Jon Coppeard (:jonco) (PTO until 14th September) (deleted), patch	terrence : review+	Details \| Diff \| Splinter Review

Bill McCloskey [inactive unless it's an emergency] (:billm)

Reporter

Description

•

9 years ago

I'm filing this based on bug 1164014. There's a workaround there in comment 50, but we really need to fix the problem generally. The STR to get the bad behavior is: 1. Compile an opt build that doesn't include the patch in bug 1164014 comment 50. 2. Run mach mochitest -f browser browser_parsable_script.js 3. Observe "MEMORY STAT residentFast after test:" near the end of the log. For me, on 64bit Linux, it's ~1.1GB. That's very high. For a different test I tried, it's 190MB. What's this test doing? The core of it is here: http://mxr.mozilla.org/mozilla-central/source/browser/base/content/test/general/browser_parsable_script.js#44 It reads in all the chrome scripts in Firefox and parses them with Reflect.parse. The AST for each file is fairly large, but it becomes garbage as soon as the file is parsed. So, in theory, we should only use as much memory as we need for the largest file (plus GC overhead). Here's where things go wrong. We import Reflect.parse from a module called reflect.jsm, which lives in a different zone than the test code. So the AST is actually allocated in reflect.jsm's zone. The test just gets a cross-compartment wrapper to it. We're correctly doing ALLOC_TRIGGER and EAGER_ALLOC_TRIGGER GCs for reflect.jsm's zone. However, we can't collect any of the data in it because the CCWs from the test aren't dying. They aren't dying because we don't collect the test zone. And we don't collect the test zone because not much data is being allocated there, so none of our triggers hit. The solution is to somehow collect both zones. If I artificially call scheduleGC(this) after each Reflect.parse call from the test, then everything is good. I think the right way to handle this might as follows: whenever we are about to GC a set of zones, look at all the CCWs coming into it from other zones. Those zones should also be added to the set to be collected. Keep doing this until we reach a fixed point. A long time ago I think we would have always ended up with all the zones in this set. However, all the e10s work we've been doing has had the side effect of partitioning things a lot better. My hope is that we should be able to set up the zones so that content zones have no incoming pointers from the system zone. One thing we'll need to change for that to work is here: http://mxr.mozilla.org/mozilla-central/source/dom/base/nsFrameMessageManager.cpp#1845 We create the global for frame scripts (which, in e10s, run in the child process and are expected to touch content pages) in the system zone. We need to change that so the global is created in the same zone as web content. There may be other places that need to be fixed, but I think it should be feasible. Terrence, do you think you or someone else could take this on?

Flags: needinfo?(terrence)

Andrew McCreight [:mccr8]

Updated

•

9 years ago

Whiteboard: [MemShrink]

Terrence Cole [:terrence]

Comment 1

•

9 years ago

We discussed this today at the GC meeting. Jon already has a patch for the first half of it that appears to work on the given testcase. Now that we have a regression mailing list for GC telemetry, we should notice any adverse reactions from collecting more compartments, so we can push this without too much worry. None of us at all understood the second bit about system globals, however. Bill, could you expand a bit?

Flags: needinfo?(terrence) → needinfo?(wmccloskey)

Bill McCloskey [inactive unless it's an emergency] (:billm)

Reporter

Comment 2

•

9 years ago

Only 10% of GCs are zone GCs, so this probably won't have too much effect on telemetry. The system global thing is just to try to improve performance so that we don't always turn zone GCs into full GCs. Perhaps we could talk about it at the work week. I think we can land Jon's patch without it.

Flags: needinfo?(wmccloskey)

Eric Rahm [:erahm]

Updated

•

9 years ago

Whiteboard: [MemShrink] → [MemShrink:P2]

Ryan VanderMeulen [:RyanVM]

Updated

•

9 years ago

Blocks: 1172468

Ryan VanderMeulen [:RyanVM]

Comment 3

•

9 years ago

FYI, bug 1172468 is nearing the point of permafail, so anything that can be done Really Soon Now would be much appreciated lest we have to resort to disabling the test instead.

No longer blocks: 1172468

Ryan VanderMeulen [:RyanVM]

Updated

•

9 years ago

Blocks: 1172468

Jon Coppeard (:jonco) (PTO until 14th September)

Comment 4

•

9 years ago

Attached patch bug1172193-collect-associated-zones (deleted) — Details — Splinter Review

Attachment #8617913 - Flags: review?(terrence)

Terrence Cole [:terrence]

Comment 5

•

9 years ago

Comment on attachment 8617913 [details] [diff] [review] bug1172193-collect-associated-zones Review of attachment 8617913 [details] [diff] [review]: ----------------------------------------------------------------- LGTM

Attachment #8617913 - Flags: review?(terrence) → review+

Pulsebot

Comment 6

•

9 years ago

https://hg.mozilla.org/integration/mozilla-inbound/rev/136d0f3fe47c

Carsten Book [:Tomcat]

Comment 7

•

9 years ago

sorry had to back this out in https://treeherder.mozilla.org/#/jobs?repo=mozilla-inbound&revision=4d38c2c86ba4 seems one of this changes caused a linux memory leak like : https://treeherder.mozilla.org/logviewer.html#?job_id=10663696&repo=mozilla-inbound

Flags: needinfo?(jcoppeard)

Jon Coppeard (:jonco) (PTO until 14th September)

Comment 8

•

9 years ago

This patch regressed our V8 score by about 6% while it was checked in. I guess the system global change would help there.

Flags: needinfo?(jcoppeard)

Joel Maher ( :jmaher ) (UTC -8)

Comment 9

•

9 years ago

Talos confirmed the v8 regression and improvement upon backout.

Nicolas B. Pierron [:nbp]

Comment 10

•

9 years ago

(In reply to Joel Maher (:jmaher) from comment #9) > Talos confirmed the v8 regression and improvement upon backout. Same thing on the flame: http://arewefastyet.com/regressions/#/regression/1773134 Except for SplayLatency, which regressed after the backout.

The 8472

Comment 11

•

9 years ago

(In reply to Bill McCloskey (:billm) from comment #0) > A long time ago I think we would have always ended up with all the zones in > this set. However, all the e10s work we've been doing has had the side > effect of partitioning things a lot better. My hope is that we should be > able to set up the zones so that content zones have no incoming pointers > from the system zone. Non-SDK addons using sandboxes without setting |sameZoneAs| or possibly interacting with content objects directly from frame scripts might invalidate that assumption, no? For example I've found extended principal compartments of µBlock and greasemonkey related to content windows living in the system zone (I've already filed issues with each). Other addons are probably doing similar things.

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

You need to log in before you can comment on or make changes to this bug.

Bugzilla

Heuristics for zone GCs are bad

Categories

(Core :: JavaScript: GC, defect)

Tracking

()

People

(Reporter: billm, Unassigned)

References

Details

(Whiteboard: [MemShrink:P2])

Crash Data

Security

(public)

User Story

Attachments

(1 file)

Description

Updated

Comment 1

Comment 2

Updated

Updated

Comment 3

Updated

Comment 4

Comment 5

Comment 6

Comment 7

Comment 8

Comment 9

Comment 10

Comment 11

Updated

Attachment

General

Description

File Name

Content Type