Open Bug 1057530 (GC.60fps) Opened 10 years ago Updated 1 year ago

[meta] Reduce our GC max-pause

Categories: Core :: JavaScript: GC, defect, P3

People: Reporter: terrence, Unassigned

References: Depends on 9 open bugs, Blocks 1 open bug

Details: Keywords: meta; Whiteboard: [platform-rel-Games]

Attachments: 1 file

Attached file: bad_gc_behavior.txt (deleted)
Right now, on a fresh profile, with everything at its defaults, our max pause is ~40-50ms, and it appears to come entirely from the last slice. The GC dump is attached. Most importantly, the last slice's times break down like this:

Wait Background Thread: 5.8ms
Mark: 2.7ms
  Mark Roots: 1.6ms
Sweep: 30.4ms
  Mark During Sweeping: 8.1ms
    Mark Weak: 2.3ms
    Mark Gray: 5.5ms
    Mark Gray and Weak: 0.2ms
  Finalize Start Callback: 0.5ms
  Sweep Atoms: 3.9ms
  Sweep Compartments: 11.2ms
    Sweep Discard Code: 1.1ms
    Sweep Tables: 7.0ms
      Sweep Cross Compartment Wrappers: 0.8ms
      Sweep Base Shapes: 3.2ms
      Sweep Initial Shapes: 1.7ms
      Sweep Type Objects: 0.6ms
    Discard Analysis: 2.9ms
      Discard TI: 1.0ms
      Sweep Types: 1.9ms
  Sweep Object: 0.8ms
  Sweep String: 0.3ms
  Sweep Script: 0.6ms
  Sweep Shape: 1.9ms
  Sweep JIT code: 0.1ms
  Finalize End Callback: 1.7ms
  Deallocate: 0.5ms
End Callback: 0.2ms
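
A minimal sketch (not an official tool) for inspecting a dump like the one above: it pulls "Phase: N.Nms" pairs out of the text and prints the largest contributors. The filename is assumed to be a local copy of the attached log, and nested phases are included in their parents' totals, so treat the output as a rough view only.

// Minimal sketch: extract per-phase timings from a GC dump and sort them.
const fs = require('fs');

const text = fs.readFileSync('bad_gc_behavior.txt', 'utf8'); // assumed local copy of the attachment
const phases = [];
for (const line of text.split('\n')) {
  const m = line.match(/^\s*(.+?):\s*([\d.]+)\s*ms/);
  if (m) phases.push({ name: m[1].trim(), ms: parseFloat(m[2]) });
}
phases.sort((a, b) => b.ms - a.ms);
for (const { name, ms } of phases.slice(0, 10)) {
  console.log(`${ms.toFixed(1).padStart(6)} ms  ${name}`);
}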
Also, this is on a fast machine with plenty of memory and not on any particular site.
(In reply to Terrence Cole [:terrence] from comment #1)
> Also, this is on a fast machine with plenty of memory and not on any
> particular site.

Win 8.1 laptop, i7-4500U, 8GB RAM, m-c built today (opt build without debug symbols).

Also, the telemetry histograms support these values, e.g. on Firefox 34, GC_MAX_PAUSE_MS's median is 55.09ms: http://telemetry.mozilla.org/#filter=nightly%2F34%2FGC_MAX_PAUSE_MS&aggregates=multiselect-all!Submissions!Mean!5th%20percentile!25th%20percentile!median!75th%20percentile!95th%20percentile&evoOver=Builds&locked=true&sanitize=true&renderhistogram=Graph
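
As a sanity check on how a median like that is read off a telemetry histogram, here is a minimal sketch that estimates a percentile from bucket counts by linear interpolation. The bucket boundaries and counts below are made up for illustration; they are not the real GC_MAX_PAUSE_MS buckets.

// Minimal sketch: estimate a percentile from histogram bucket counts.
function histogramPercentile(buckets, p) {
  const total = buckets.reduce((sum, b) => sum + b.count, 0);
  let target = (p / 100) * total;
  for (const { lo, hi, count } of buckets) {
    if (count > 0 && target <= count) {
      return lo + (hi - lo) * (target / count); // interpolate within the bucket
    }
    target -= count;
  }
  return buckets[buckets.length - 1].hi;
}

const madeUp = [
  { lo: 0,   hi: 25,  count: 250 },
  { lo: 25,  hi: 50,  count: 200 },
  { lo: 50,  hi: 100, count: 450 },
  { lo: 100, hi: 250, count: 100 },
];
console.log(histogramPercentile(madeUp, 50).toFixed(1) + ' ms'); // ~55.6 with these made-up counts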
I think it's important to decide up front whether the goal is to improve the empty-profile use case or the lots-of-tabs use case. Part of the problem with GC is that we don't have test cases that are at all representative of how users actually use the GC. areweslimyet.com is probably the closest, but it takes forever to run.

We could also consider incorporating more GC data into telemetry.
The goal is, IMO, to have the majority of GCs to not have slices above 10ms. Right now, from my experiments (using the patch from bug 1019611) that every 5-30ms we get a 30-60ms max pause.

I used a clean profile with various scenarios, e.g. a constant simple animation on http://testufo.com, just opening and closing tabs, the BananaBread WebGL Mozilla demo, scrolling up and down on cnn.com, and others. The pattern is quite consistent: we typically get a max pause of ~40ms every 10-30s.

Each of those, if it happens during animation, is a clearly visible dropped frame. The testufo.com site also identifies it as "not smooth".

I didn't test with multiple tabs, but I'd say that if it can't keep a simple animation consistently smooth with a single tab on a clean profile, then what chance do we have with multiple tabs and dirty profiles?
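
A minimal sketch of the kind of measurement described above (in the spirit of testufo's smoothness check, not its actual code): time requestAnimationFrame callbacks and flag frames that blow the 60fps budget, which is how a ~40ms pause shows up during animation. Run it in a page that is animating.

// Minimal sketch: measure inter-frame intervals and flag frames that overshoot
// the 60fps budget.
const EXPECTED_MS = 1000 / 60;            // ~16.7ms per frame at 60fps
const DROP_THRESHOLD = EXPECTED_MS * 1.5; // allow some jitter before calling it a drop
let last = performance.now();
let frames = 0, longFrames = 0;

function tick(now) {
  const delta = now - last;
  last = now;
  frames++;
  if (delta > DROP_THRESHOLD) {
    longFrames++;
    console.log(`long frame: ${delta.toFixed(1)}ms (~${Math.round(delta / EXPECTED_MS) - 1} dropped)`);
  }
  if (frames % 600 === 0) console.log(`${longFrames} long frames out of ${frames}`);
  requestAnimationFrame(tick);
}
requestAnimationFrame(tick);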
(In reply to Avi Halachmi (:avih) from comment #4)
> The goal is, IMO, to have the majority of GCs to not have slices above 10ms.
> Right now, from my experiments (using the patch from bug 1019611) that every
> 5-30ms we get a 30-60ms max pause.

Typo: every 5-30s there's a GC pause of 30-60ms, which results in visibly dropped frame[s] if it happens while animating or gaming.
(In reply to Avi Halachmi (:avih) from comment #4)
> The goal is, IMO, to have the majority of GCs to not have slices above 10ms.

If we wanted to really define a _good_ goal, I'd say that slices should typically be no more than 5ms and, if possible, more spread apart (e.g. maybe every 5-10 frames rather than every frame).

10ms taken out of a frame could often still cause a frame drop IMO, since it takes 10 of the frame's ~16.7ms, so it squeezes the frame's work quite a bit.

FWIW, I was also observing CC max slices (bug 1019101), which happen a bit more frequently than GCs (typically every 6-7s) and are shorter: typically 1-7ms, averaging about 4ms I think. On most of the CCs I couldn't notice frame drops, and testufo.com also didn't detect "stutter" during CCs.
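
For concreteness, a minimal sketch of the budget arithmetic behind the 5ms-vs-10ms argument above: at 60fps each frame has ~16.7ms, so whatever the GC/CC slice takes comes straight out of the time left for script, layout and paint. The ~9ms of "regular frame work" is an assumed illustrative number, not a measurement.

// Minimal sketch: how much headroom is left after a slice of a given length.
const FRAME_MS = 1000 / 60;

function remainingBudget(sliceMs, otherWorkMs) {
  // otherWorkMs: script + layout + paint for this frame (illustrative)
  return FRAME_MS - sliceMs - otherWorkMs;
}

for (const slice of [2, 5, 10]) {
  const left = remainingBudget(slice, 9); // assume ~9ms of regular frame work
  console.log(`${slice}ms slice leaves ${left.toFixed(1)}ms headroom` +
              (left < 0 ? ' -> dropped frame' : ''));
}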
OK. I have to stress, though, that 5ms is extremely ambitious. I worked on a research Java GC (Metronome) in 2006, and it managed to achieve pause times in that range, but the throughput was really bad. Another system, the Java GC from Azul, is "pauseless", but it requires OS support. Also, Java GC is a lot easier than JavaScript GC, especially in a browser. I think that getting even close to 10ms would take a year or more of work.

A more reasonable goal would be to try to make GCs less frequent (with better scheduling) and smaller (per-tab rather than for all the tabs). Also, thinking ahead, electrolysis will split apart the JS heaps for the parent and the child processes. As a consequence, GCs in both processes will get faster.
Depends on: 1057563
Yeah, I didn't pretend to be able to estimate the difficulty of a 5ms goal. But both my understanding of animations and my observations of the CC pauses tell me that 5ms is a good goal to strive for.

My estimate is that 10ms taken out of a frame is likely to drop frames often enough, especially if we're CPU-intensive as in gaming, and that consecutive frames with a 10ms pause each are even more likely to (though according to Terrence we don't typically have consecutive >10ms slices; he gave me this example of slice times: [10, 1, 2, 1, 1, 3, 2, 2, 1, 39]).

I do agree, however, that having a frame or two dropped every 2 minutes is way better than every 18s, which is what I observed.

Another approach, whose difficulty I again can't assess, is to make the GCs/slices much more frequent, but only if you can make sure that typically all of them are very quick (say, 1-2ms at the 75th percentile, and not more than 5ms for the rest up to the 99th percentile).
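
A minimal sketch for checking a list of slice times against percentile targets like those; the sample array is the slice sequence Terrence gave in the comment above.

// Minimal sketch: nearest-rank percentiles over a list of slice times (ms).
function percentile(sortedMs, p) {
  const rank = Math.max(1, Math.ceil((p / 100) * sortedMs.length)); // 1-based nearest rank
  return sortedMs[rank - 1];
}

const slices = [10, 1, 2, 1, 1, 3, 2, 2, 1, 39].slice().sort((a, b) => a - b);
console.log('75th percentile:', percentile(slices, 75), 'ms'); // 3 ms
console.log('99th percentile:', percentile(slices, 99), 'ms'); // 39 ms
console.log('max:', slices[slices.length - 1], 'ms');          // 39 ms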
Useful GC benchmarking links for Avi:
http://v8.googlecode.com/svn/branches/bleeding_edge/benchmarks/spinning-balls/index.html
http://29a.ch/2010/6/2/realtime-raytracing-in-javascript
https://videos.cdn.mozilla.net/uploads/mozhacks/flight-of-the-navigator/

(In reply to Avi Halachmi (:avih) from comment #8)
> Another approach, whose difficulty I again can't assess, is to make the
> GCs/slices much more frequent, but only if you can make sure that typically
> all of them are very quick (say, 1-2ms at the 75th percentile, and not more
> than 5ms for the rest up to the 99th percentile).

I don't understand this. What do you mean by more frequent? We're doing a slice at the top of each frame right now so more frequent would not be any different from a longer slice time.
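
For readers unfamiliar with the per-frame slicing mentioned above, here is a minimal toy model of the idea (not SpiderMonkey's actual scheduler): at the top of each frame, run incremental work against a fixed time budget and yield as soon as the budget is exhausted. The 10ms budget is illustrative; the real budget is controlled by prefs and heuristics.

// Minimal toy model of per-frame incremental slices (not SpiderMonkey code).
const SLICE_BUDGET_MS = 10; // illustrative budget

function runSlice(workQueue, budgetMs) {
  const deadline = performance.now() + budgetMs;
  while (workQueue.length && performance.now() < deadline) {
    workQueue.shift()(); // one small unit of marking/sweeping work
  }
  return workQueue.length === 0; // true once the whole collection is finished
}

function frame(workQueue) {
  const done = runSlice(workQueue, SLICE_BUDGET_MS);
  // ...the rest of the frame: script, layout, paint...
  if (!done) requestAnimationFrame(() => frame(workQueue));
}

// Usage: frame(units), where `units` is an array of small closures to run incrementally.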
(In reply to Terrence Cole [:terrence] from comment #9)
> (In reply to Avi Halachmi (:avih) from comment #8)
> > Another approach, whose difficulty I again can't assess, is to make the
> > GCs/slices much more frequent, but only if you can make sure that typically
> > all of them are very quick (say, 1-2ms at the 75th percentile, and not more
> > than 5ms for the rest up to the 99th percentile).
> 
> I don't understand this. What do you mean by more frequent? We're doing a
> slice at the top of each frame right now so more frequent would not be any
> different from a longer slice time.

Obviously more frequent than once a frame would have a negative effect. I suggested an idea where the next GC would start sooner, in the hope that its slices might be shorter. I can't assess how practical or useful it would be.
(In reply to Avi Halachmi (:avih) from comment #10)
> Obviously more frequent than once a frame would have a negative effect. I
> suggested an idea where the next GC would start sooner, in the hope that its
> slices might be shorter. I can't assess how practical or useful it would be.

Ah, no. The minimum length of the longest slices is proportional to the retained size, not the freed size, so this would be ineffective.
Depends on: ParallelSweeping
Depends on: 1065037
Depends on: 1065040
Depends on: 1066135
Depends on: 1068123
Depends on: 1070256
Depends on: ConcurrentMarking
Alias: GC.60fps
No longer depends on: ConcurrentMarking
Depends on: 1052728
Depends on: 1109336
Depends on: 1106964
Depends on: 1111361
Depends on: 1112278
Depends on: 673492
Depends on: 1119549
Depends on: 1120016
No longer depends on: 1119549
Depends on: 1122196
Depends on: 1124473
Depends on: 1127608
Depends on: 1130211
Depends on: 1145831
Depends on: 1157967
Depends on: 1164973
No longer depends on: 1052728
Depends on: 1171749
Depends on: 1181799
Depends on: 1181862
Depends on: 1184578
Depends on: 1186609
Depends on: 1189112
Depends on: 1190457
Depends on: 1192301
Depends on: 1202151
Depends on: 1223078
Depends on: 1227144
Depends on: ConcurrentGC
Depends on: 1233187
Depends on: 1233857
Depends on: 1233862
Depends on: 1238779
Depends on: 1245965
Depends on: 1267812
Whiteboard: [platform-rel-Games]
platform-rel: --- → ?
Depends on: 1294563
platform-rel: ? → ---
Moving to P3 because there has been no activity for at least 24 weeks.
Priority: P1 → P3
QA Whiteboard: qa-not-actionable
Assignee: terrence.d.cole → nobody
Status: ASSIGNED → NEW
Depends on: 1755022

In the process of migrating remaining bugs to the new severity system, the severity for this bug cannot be automatically determined. Please retriage this bug using the new severity system.

Severity: critical → --