Closed Bug 1149752 Opened 10 years ago Closed 10 years ago

Large increase in GC pauses (with beachballing) in the last week of 2015-03

Categories

(Core :: JavaScript: GC, defect)
Priority: Not set
Severity: normal

Tracking

Status: RESOLVED FIXED
Target Milestone: mozilla40
firefox39 --- affected
firefox40 --- fixed
People

(Reporter: MattN, Assigned: jonco)

Details

(Keywords: regression)

Attachments

(6 files, 3 obsolete files)

In the last week I've been getting large multi-second GC pauses which lead to a beachball (Not Responding) on OS X.

Platform: OS X
Channel: Nightly
e10s: Off
Tabs: ~580 (most unloaded due to lazy loading)
App Tab highlights: Gmail (x4), BMO, etherpads, Google Docs/Spreadsheets, Lucidchart
Extensions: Many

I do have Treeherder as an app tab for my own Fx-Team landings, so maybe this is related to bug 1145831 (though I've had that app tab for a few weeks before the problem started).

I'm marking this as a regression until we know that it was caused by something outside of Firefox (like an extension or a web content change). Have there been any recent landings related to GC?
Note that 7% of GC_MAX_PAUSE_MS samples are >= 1s.
Attached file Anonymized memory-report (deleted) —
about:memory shows: 448.28 MB (12.09%) ── heap-unclassified
Did you get the entire giant GC report in your first attachment? I only see slices 0, 1, and 2, and none of them seem that long. Please attach the entire thing. Anecdotally, I've noticed my browser has been more sluggish in the last week or so, but I don't know why.
Attached file Concise GC/CC logs from about:memory (deleted) —
Attached file about:support info (deleted) —
(In reply to Andrew McCreight [:mccr8] from comment #3)
> Did you get the entire giant GC report in your first attachment? I only see
> slices 0, 1, and 2, and none of them seem that long. Please attach the
> entire thing.

I thought I had the entire thing but maybe not. For some reason I'm not hitting the pauses now to capture more output. I'll attach the log when the pauses come back.
Also, the CC log might be useful too, once you are seeing the problems.
It looks like the 7183.4ms output line is truncated. Is this better?
Attachment #8586333 - Attachment is obsolete: true
Flags: needinfo?(continuation)
Yeah, there may be some kind of bug with the logging code. Olli was also complaining about the line being truncated. I do see slice information for one fairly long slice:

Slice: 44, Pause: 222.5 (When: 4779.6ms, Reason: INTER_SLICE_GC):
    Mark: 0.3ms
    Mark Delayed: 0.2ms
    Sweep: 216.3ms
    Mark During Sweeping: 144.0ms
    Mark Weak: 73.9ms
    Mark Gray: 59.9ms
    Mark Gray and Weak: 10.0ms
    Finalize Start Callback: 9.5ms
    Sweep Compartments: 28.4ms
    Sweep Discard Code: 11.3ms
    Sweep Inner Views: 0.2ms
    Sweep Cross Compartment Wrappers: 27.2ms
    Sweep Base Shapes: 4.0ms
    Sweep Initial Shapes: 10.5ms
    Sweep Type Objects: 16.0ms
    Sweep Breakpoints: 3.1ms
    Sweep Regexps: 2.1ms
    Sweep Miscellaneous: 15.9ms
    Sweep type information: 0.2ms
    Sweep Object: 16.6ms
    Minor GCs to Evict Nursery: 0.8ms
    Mark Roots: 0.5ms
    Mark Rooters: 0.1ms

I don't know how useful that is.
Flags: needinfo?(continuation)
I got the untruncated version from "Save verbose" on about:memory. It seems like the output gets truncated before getting to the Browser Console.
Attachment #8586446 - Attachment is obsolete: true
I think a GC person needs to look at this further. Is this something you could look at, Terrence? Thanks.
Flags: needinfo?(terrence)
I'm seeing something similar. CC pauses of 1.3 seconds, with over a million GCed objects in the graph. GC max pauses "only" around 160ms, looking like comment 9.
I have a bunch of ghost windows, from TechCrunch and Facebook. Running minimize memory usage did not fix them.
Slice: 0, Pause: 1467.5 (When: 0.0ms, Reason: ALLOC_TRIGGER):
    Begin Callback: 1458.8ms
    Mark Discard Code: 0.3ms
    Purge: 1.3ms
    Mark: 5.1ms
    Mark Roots: 4.9ms
    Buffer Gray Roots: 3.5ms
    Mark Cross Compartment Wrappers: 0.9ms
    Mark Compartments: 0.3ms
    Minor GCs to Evict Nursery: 1.8ms
Depends on: 1150127
I filed bug 1150127 for my issue which may or may not be the same thing. The about:memory report here shows 4 ghost-windows, so maybe it is the same thing.
(In reply to Matthew N. [:MattN] from comment #10)
> Created attachment 8586465 [details]
> javascript.options.mem.log snippet showing Max Pause: 7183.4ms (untruncated)
>
> I got the untruncated version from "Save verbose" on about:memory. It seems
> like the output gets truncated before getting to the Browser Console.

That still appears truncated (although much further down) -- at least I can't find the slice that has the listed Max Pause. Also, it seems to cut off in the middle.

If you set MOZ_GCTIMER=path/to/some/file.txt, we'll save the output directly to a file, nicely formatted for viewing in a terminal, as opposed to the squished-up format we use for the browser console.
Flags: needinfo?(terrence)
(In reply to Andrew McCreight [:mccr8] from comment #15)
> I filed bug 1150127 for my issue which may or may not be the same thing.
> The about:memory report here shows 4 ghost-windows, so maybe it is the same
> thing.

When I got a long hang today and then opened about:memory, I didn't see any ghost windows at that time, so it may not be related.

Thanks, Terrence. I documented that environment variable on MDN: https://developer.mozilla.org/en-US/docs/Mozilla/Performance/GC_and_CC_logs

Here is a gzipped log for most of today. Near the end of it I had a max pause of 6307.657ms. Do you need any more info?
Attachment #8586465 - Attachment is obsolete: true
Flags: needinfo?(terrence)
Analyzing the long GC: this is a compacting GC triggered because of inactivity. The heap is fairly large (~1GiB), so we might expect a few slices to take a bit longer than budget, but not 6 seconds over budget.

First slice: 96ms total
    50ms in MarkRoots
    25ms in Unmark
    10ms in DiscardCode
    10ms by 1's and 2's elsewhere
This is, unfortunately, a long pause, but not unexpected, as we don't incrementalize any of this work yet.

Slices 1 to 32: 40ms each
Marking is incrementalized perfectly and working exactly as expected for INTER_SLICE_GC.

Slice 33: 25ms
We get a single REFRESH_FRAME slice for some animation on the first slice of sweeping. Still behaving roughly as expected.

Slices 33 to 103: 40ms each
    25ms in FinalizeStartCallback
    10ms in MarkDuringSweep
    6ms in SweepCompartments
    0.05ms in Sweep GC Things
We're hitting our pause times on the nose, but we have so much fixed overhead per sweeping slice that we're only effectively getting 50us of real work done per frame. Since we have ~1s of sweeping to do, we'd take days to finish the GC. Long before then, the CC needs to run, which forces us to just do a full GC.

Slice 104: 6.3s -- an all-zones, non-incremental, compacting GC
    Mark: 1.4s
    Sweep: 1s
    Compact: 3.7s

So the majority of the 6.3s of pause time is the 3.7s of non-incremental compacting. There are a couple of problems here:

(1) We need to figure out what our fixed overheads are and see if we can't incrementalize them. At the very least, we need to detect when we're not making adequate forward progress and adjust our slice times accordingly.

(2) We schedule compacting at the beginning of the GC, but the GC takes so long that the user may have started interacting with the browser again by the time we reach the end, where the long compacting operation actually happens. It would be nice to skip compacting if the user interacts with the session between when we start the compacting GC and when the compacting actually runs.

Jon, is (2) something you could take a look at?
Flags: needinfo?(terrence) → needinfo?(jcoppeard)
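To make point (1) above concrete, here is a minimal sketch of what "detect inadequate forward progress and adjust slice times" could look like. This is not SpiderMonkey code; the struct, field names, and thresholds are all invented for illustration.

#include <algorithm>

// Hypothetical sketch only: if fixed per-slice overhead swamps useful work,
// grow the budget so each slice retires a meaningful amount of sweeping
// instead of spinning until the CC forces a full non-incremental GC.
struct SliceProgress {
    double budgetMs = 40.0;         // nominal incremental slice budget
    double lastSliceMs = 0.0;       // wall time of the previous slice
    double lastUsefulWorkMs = 0.0;  // time spent on actual marking/sweeping

    // Budget to use for the next slice.
    double nextBudget() {
        // If less than 20% of the last slice was useful work, the collector
        // is dominated by fixed costs; double the budget, capped at 100ms.
        if (lastSliceMs > 0.0 && lastUsefulWorkMs / lastSliceMs < 0.2)
            budgetMs = std::min(budgetMs * 2.0, 100.0);
        return budgetMs;
    }
};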
No longer depends on: 1150127
For (1), could we get a profile of the offending functions while one of these GCs is going on? Maybe we recently started doing something slow during the fixed phase without realizing it.
Assignee: nobody → jcoppeard
Flags: needinfo?(jcoppeard)
This could actually still be a dupe of bug 1150127. If you have a leaking window for, say, bugzilla.mozilla.org, but you still have some windows open to bugzilla.mozilla.org, then the ghost-window analysis won't report any ghost windows. billm had something like that yesterday. So it might be good to wait a day or two to see if MattN is still seeing this problem.
I am still seeing this with 40.0a1 (2015-04-09).
Attached patch bug1149752-cancel-inactivity-gc (deleted) — Splinter Review
Patch to add an AbortIncrementalGC() API and call it if the user becomes active again while we are collecting.
Attachment #8591032 - Flags: review?(terrence)
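For readers following along, a rough sketch of the shape of the change. This is illustrative only, not the actual diff: the user-interaction-active/inactive observer topics are real Gecko notifications (see the review comment below), but every function and flag name here is a hypothetical stand-in.

#include <cstring>

static bool sShrinkingGCInProgress = false;  // hypothetical state flag

static void PokeShrinkingGCSketch() { sShrinkingGCInProgress = true; }
static void AbortOngoingGC()        { sShrinkingGCInProgress = false; }

// Sketch of the observer logic described in the patch summary.
void ObserveSketch(const char* aTopic)
{
    if (std::strcmp(aTopic, "user-interaction-inactive") == 0) {
        // User went idle: kick off a speculative shrinking (compacting) GC.
        PokeShrinkingGCSketch();
    } else if (std::strcmp(aTopic, "user-interaction-active") == 0 &&
               sShrinkingGCInProgress) {
        // User came back while we were still collecting: abort between
        // slices instead of making them sit through compacting.
        AbortOngoingGC();
    }
}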
I haven't fully read up on this problem or the patch, but completely aborting a GC seems fairly heavy-handed to me. If the GC is incremental anyway, could we add, say, a static member (bool stopASAP) to SliceBudget and check it in SliceBudget::checkOverBudget? The caveat with that specific approach would be that it would also affect the cycle collector (and other potential users of SliceBudget, though currently the GC and the CC are the only ones), but maybe the same reasoning applies there anyway.
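For illustration, a minimal sketch of what that alternative could look like. SliceBudget and checkOverBudget are the real SpiderMonkey names mentioned above, but the flag and its wiring are just this comment's proposal as sketched here, not landed code.

#include <atomic>
#include <chrono>
#include <cstdint>

static int64_t NowMs() {
    using namespace std::chrono;
    return duration_cast<milliseconds>(
        steady_clock::now().time_since_epoch()).count();
}

// Sketch of the proposal: a process-wide "stop as soon as possible" flag
// that both the GC and CC slice drivers would observe.
struct SliceBudgetSketch {
    static std::atomic<bool> stopASAP;  // set when the user becomes active
    int64_t deadlineMs;                 // when this slice's budget expires

    bool checkOverBudget() const {
        // Report "over budget" immediately if a stop was requested, so the
        // current slice yields at the next check rather than running to its
        // deadline.
        return stopASAP.load(std::memory_order_relaxed) ||
               NowMs() >= deadlineMs;
    }
};
std::atomic<bool> SliceBudgetSketch::stopASAP{false};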
Comment on attachment 8591032 [details] [diff] [review] bug1149752-cancel-inactivity-gc Review of attachment 8591032 [details] [diff] [review]: ----------------------------------------------------------------- ::: dom/base/nsJSEnvironment.cpp @@ +219,5 @@ > > static bool sGCOnMemoryPressure; > + > +// nsJSEnvironmentObserver observes the user-interaction-inactive notifications > +// and triggers a shrinking a garbage collection if the user is still active s/active/inactive/? ::: js/src/jit-test/tests/gc/incremental-abort.js @@ +7,5 @@ > +{ > + // Allocate objectCount objects in zoneCount zones and run a incremental > + // shrinking GC. > + > + print("test abort in state: " + abortState); If you've tested that this works, you should probably remove the prints. We don't currently filter the output to add test markers, so this will confuse treeherder and annoy the sheriffs.
Attachment #8591032 - Flags: review?(terrence) → review+
(In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #23) What's being proposed here is a way to cancel the current ongoing GC. This may perform some amount of work to get the collector back into its idle state. For example, if we are currently sweeping a zone group then we will continue sweeping until the end of that group. This would be used to halt a speculative GC caused by the user inactive notification if the user becomes active again. The abort would happen on the main thread in between GC slices. Let me know if you can see a problem with this.
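A reader's sketch of those abort semantics follows. This is hypothetical code, not GCRuntime; the states and helpers are simplified stand-ins for the real incremental GC state machine.

enum class GCState { Idle, Marking, Sweeping };

// Hypothetical sketch: called on the main thread between slices, the abort
// does the minimum remaining work needed to return the collector to idle.
struct IncrementalGCSketch {
    GCState state = GCState::Idle;

    void abortIncrementalGC() {
        switch (state) {
          case GCState::Idle:
            return;  // no collection in progress
          case GCState::Marking:
            // Mark state can simply be discarded; nothing is half-swept yet.
            dropMarkState();
            break;
          case GCState::Sweeping:
            // A zone group must be swept atomically, so finish the current
            // group first -- this is the "some amount of work" the abort
            // may still have to perform.
            finishSweepingCurrentGroup();
            break;
        }
        state = GCState::Idle;
    }

    void dropMarkState() {}               // elided in this sketch
    void finishSweepingCurrentGroup() {}  // elided in this sketch
};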
Comment on attachment 8591032 [details] [diff] [review] bug1149752-cancel-inactivity-gc Requesting review for browser changes.
Attachment #8591032 - Flags: review?(bugs)
Attachment #8591032 - Flags: review?(bugs) → review+
(In reply to Jon Coppeard (:jonco) from comment #25)
> (In reply to Emanuel Hoogeveen [:ehoogeveen] from comment #23)
> This would be used to halt a speculative GC caused by the user inactive
> notification if the user becomes active again. The abort would happen on
> the main thread in between GC slices.

I see, it seems I misunderstood - thanks for explaining!
Status: NEW → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla40