Open Bug 600234 Opened 14 years ago Updated 2 years ago

moving GC ArenaHeader out of Arena again

Tracking

()

Status:

NEW

People

(Reporter: igor, Unassigned)

References

Details

Attachments

(2 files, 8 obsolete files)

patch 14 years ago Andreas Gal :gal (deleted), patch		Details \| Diff \| Splinter Review
patch 14 years ago Andreas Gal :gal (deleted), patch		Details \| Diff \| Splinter Review
patch with two headers 14 years ago Igor Bukanov (deleted), patch		Details \| Diff \| Splinter Review
one small header 14 years ago Igor Bukanov (deleted), patch		Details \| Diff \| Splinter Review
v5 14 years ago Igor Bukanov (deleted), patch		Details \| Diff \| Splinter Review
benchmark 14 years ago Gregor Wagner [:gwagner] (deleted), text/plain		Details
v6 14 years ago Igor Bukanov (deleted), patch		Details \| Diff \| Splinter Review
v7 14 years ago Igor Bukanov (deleted), patch		Details \| Diff \| Splinter Review
attachment for the wrong bug 14 years ago Igor Bukanov (deleted), patch		Details \| Diff \| Splinter Review
v8 13 years ago Igor Bukanov (deleted), patch		Details \| Diff \| Splinter Review

Igor Bukanov

Reporter

Description

•

14 years ago

The patch for bug 558861 has moved the GC arena header into the arena itself I presume for code simplicity. But having the header there implies that various read and writes that involves only the header field would have bigger CPU cache and TLB footprints. Also this waste some memory especially for objects and function arenas as the header occupies the whole cell for the object when only 6 words are required. Those effects were visible previously in cachegrind and were the reason for moving the header out of arena.

Andreas Gal :gal

Comment 1

•

14 years ago

Not too worried about the memory waste. Peanuts. Can you link me to the cachegrind data?

Igor Bukanov

Reporter

Comment 2

•

14 years ago

Here is test based on what I have used before: function prepare(N) { var array = []; for (var i = 0; i != N; ++i) { array.push(String.fromCharCode(100, 101, 102)); array.push({}); } return array; } function do_it() { var array = prepare(12e5); var t = Date.now(); gc(); var time1 = Date.now() - t; array = prepare(10); array = null; var t = Date.now(); gc(); var time2 = Date.now() - t; return [time1, time2]; } print(do_it()); The test prints the pair representing the timing of GC dominated mostly by marking and by finalization time of a lot of empty objects and short strings to consider only pure GC things. On the TM the best timing among 20 runs is the pair 100,36. On a pre-compartment GC the best pair is 93,33. So we have slowdown on the scale of 7-10%. This is for 32 bit js shell on dual-core iCore 7 laptop under fedora 13. Given the code that test exercise the only difference can come from a separated arena header.

Igor Bukanov

Reporter

Comment 3

•

14 years ago

Making this bug as a performance regression from the bug 558861.

Blocks: 558861

No longer depends on: 558861

Andreas Gal :gal

Comment 4

•

14 years ago

Lets assume for a second that the workload is relevant and that the measurement is accurate, how do we explain the difference?

Andreas Gal :gal

Comment 5

•

14 years ago

Attached patch patch (obsolete) (deleted) — Details — Splinter Review

The patch to move the header out of line. Untested, but pretty trivial. I am still not convinced this is a good idea though.

Assignee: igor → gal

Andreas Gal :gal

Comment 6

•

14 years ago

Attached patch patch (obsolete) (deleted) — Details — Splinter Review

Without accidental jetpack change.

Attachment #479134 - Attachment is obsolete: true

Igor Bukanov

Reporter

Comment 7

•

14 years ago

(In reply to comment #4) > Lets assume for a second that the workload is relevant and that the measurement > is accurate, how do we explain the difference? The first number in the above test represent the case when almost all GC arenas are fully used with no free cell available. With a separated array of arena headers during the sweep the GC does not need to touch the arena itself as it will see that everything is marked. This explains the first number. The second number is the opposite case when the arena contains only dead things with an empty free list. This is a common scenario when the GC happens after an allocation burst with the heap containing mostly a garbage. But it is harder to see what is responsible for the win here. With or without a separated header the GC has to assemble the new free list which presumably warm ups the CPU cache.

Igor Bukanov

Reporter

Comment 8

•

14 years ago

Attached patch patch with two headers (obsolete) (deleted) — Details — Splinter Review

With a single header I could not match the numbers of the pre compartment GC. It is really important to have the header to be the power of two. So the patch adds two headers, one 4-worded and one two-word where only two-worded header is touched during the GC. This allowed to recover all the timing.

Igor Bukanov

Reporter

Comment 9

•

14 years ago

Attached patch one small header (obsolete) (deleted) — Details — Splinter Review

This patch uses single separated arena header but shrinks it to 4 words in optimized build. This avoids the extra complexity of 2 headers and allowed to have the same results in the benchmark above or its version that I have observed with pre-compartment-gc code. The patch does not use templates for arena header as semantically the header allows to discover the dynamic type of the arena. The patch also adds Arena::chunkAndIndex with very similar code that was used in older GC. Last time I have checked GCC could not optimize separated chunk() and arenaIndex() calls into the common optimized version. The patch does not affect SS or V8 scores.

Assignee: gal → igor

Attachment #479136 - Attachment is obsolete: true

Attachment #479143 - Attachment is obsolete: true

Attachment #479399 - Flags: review?(anygregor)

Igor Bukanov

Reporter

Comment 10

•

14 years ago

Attached patch v5 (obsolete) (deleted) — Details — Splinter Review

Here is a rebased patch that also adds Cell::arenaHeader helper to make those lines with asCell()->arena()->header()-> little bit shorter. The patch also moves the alignment checks into the jsgc.h header. This avoids explicit template incitations and allowed for a wider assert coverage as the check is called during marking of any GC thing including JSXML.

Attachment #479399 - Attachment is obsolete: true

Attachment #479751 - Flags: review?(anygregor)

Attachment #479399 - Flags: review?(anygregor)

Gregor Wagner [:gwagner]

Comment 11

•

14 years ago

This change was mostly driven by code simplicity. Do we have real numbers that really force us to go back to the old system? SS, V8, also the browser versions of the benchmarks? I was measuring a lot for the new GC and the cycles needed to mark an object (total marking time / amount of marked objects) went down.

Gregor Wagner [:gwagner]

Comment 12

•

14 years ago

Attached file benchmark (deleted) — Details

I also used this micro benchmark for shark profiles and I get following numbers: Changeset right before GC landing: idefix2:OPT.OBJ idefix2$ time ./js ../../../../tests/wave.js END real 0m13.179s user 0m13.076s sys 0m0.101s idefix2:OPT.OBJ idefix2$ time ./js -j ../../../../tests/wave.js END real 0m10.501s user 0m10.389s sys 0m0.110s idefix2:OPT.OBJ idefix2$ time ./js -jm ../../../../tests/wave.js END real 0m10.513s user 0m10.401s sys 0m0.109s idefix2:OPT.OBJ idefix2$ time ./js -m ../../../../tests/wave.js END real 0m10.627s user 0m10.498s sys 0m0.092s current tip: idefix2:OPT.OBJ idefix2$ time ./js ../../../../tests/wave.js END real 0m10.363s user 0m10.265s sys 0m0.096s idefix2:OPT.OBJ idefix2$ time ./js -j ../../../../tests/wave.js END real 0m7.991s user 0m7.894s sys 0m0.096s idefix2:OPT.OBJ idefix2$ time ./js -jm ../../../../tests/wave.js END real 0m7.942s user 0m7.847s sys 0m0.093s idefix2:OPT.OBJ idefix2$ time ./js -m ../../../../tests/wave.js END real 0m8.120s user 0m8.022s sys 0m0.097s

Gregor Wagner [:gwagner]

Comment 13

•

14 years ago

Ups ignore that testcase. That's the wrong one.

Igor Bukanov

Reporter

Comment 14

•

14 years ago

(In reply to comment #11) > This change was mostly driven by code simplicity. Unfortunately simpler does not translate into more efficient. Besides, given that we already have separated mark bitmap and delayed marking structures, having the header that comes together with arena adds feature complexity IMO (one have to deal both with embedded and separated structures instead of uniformity of having everything outside). > Do we have real numbers that > really force us to go back to the old system? The numbers above are real with independent callgrind confirmation. > SS, V8, See the last paragraph of the comment 9. It would be very surprising if the benchmark scores are affected. The benchmarks are mostly CPU bound so changes in the GC support structures that does not touch the free list implementation should not be visible in them. > also the browser versions > of the benchmarks? I will try to check the v8 in the browser. > I was measuring a lot for the new GC and the cycles needed to mark an object > (total marking time / amount of marked objects) went down. If the amount of marked objects went down, it implies different GC heuristics like GC running less frequent. It does not indicate that the marking time per object went down. It could well go up as the data here and in the bug 600648 indicates.

Gregor Wagner [:gwagner]

Comment 15

•

14 years ago

(In reply to comment #14) > (In reply to comment #11) > > This change was mostly driven by code simplicity. > > Unfortunately simpler does not translate into more efficient. Besides, given > that we already have separated mark bitmap and delayed marking structures, > having the header that comes together with arena adds feature complexity IMO > (one have to deal both with embedded and separated structures instead of > uniformity of having everything outside). I agree. If the measurements show that we could do better with more complex code then we have to go for it. But if we look at JSC and their GC implementation then the message is clearly simplicity wins! They prove that one can be faster with even simpler code. > > > Do we have real numbers that > > really force us to go back to the old system? > > The numbers above are real with independent callgrind confirmation. > > > SS, V8, > > See the last paragraph of the comment 9. It would be very surprising if the > benchmark scores are affected. The benchmarks are mostly CPU bound so changes > in the GC support structures that does not touch the free list implementation > should not be visible in them. > > > also the browser versions > > of the benchmarks? > > I will try to check the v8 in the browser. We still perform GC's in the current v8 shell harness. > > > I was measuring a lot for the new GC and the cycles needed to mark an object > > (total marking time / amount of marked objects) went down. > > If the amount of marked objects went down, it implies different GC heuristics > like GC running less frequent. It does not indicate that the marking time per > object went down. It could well go up as the data here and in the bug 600648 > indicates. Not the amount of marked objects went down. I measured the cycles needed on average to mark an object. So total cycles needed for marking process of a GC divided by the number of marked objects.

Igor Bukanov

Reporter

Comment 16

•

14 years ago

(In reply to comment #12) > Created attachment 479813 [details] > benchmark > > I also used this micro benchmark for shark profiles and I get following > numbers: The benchmark is very allocation sensitive and the compartment GC includes the inlining of the allocations. A better compassion would to check the pre-compartmental GC with the patch from the 595048 bug on top aginst the current tip. Or the older code versus the current tip plus the patch from the bug 558861 comment 67. In any case, the patch here shows a win from ~/m/tm/js/src> time ~/b/js64opt.tm/js -jm ~/s/wave.js END real 0m8.640s user 0m8.497s sys 0m0.086s to: ~/m/tm/js/src> time ~/b/js64opt.tm/js -jm ~/s/wave.js END real 0m7.416s user 0m7.292s sys 0m0.080s

Igor Bukanov

Reporter

Comment 17

•

14 years ago

(In reply to comment #15) > I agree. If the measurements show that we could do better with more complex > code then we have to go for it. But if we look at JSC and their GC > implementation then the message is clearly simplicity wins! They do not need to deal with threads, OOM, limited amount of the stack space and the need to have a cycle collector. They could also afford much simpler GC heuristics. Still our fast allocation path should be on pair with them. > Not the amount of marked objects went down. > I measured the cycles needed on average to mark an object. Hm, was it on 64 bit system?

Gregor Wagner [:gwagner]

Comment 18

•

14 years ago

> > Not the amount of marked objects went down. > > I measured the cycles needed on average to mark an object. > > Hm, was it on 64 bit system? Yes. All these measurements are part of bug 593729. It would make sense to tune everything in a larger context but I guess time is against me.

Gregor Wagner [:gwagner]

Comment 19

•

14 years ago

This patch doesn't apply to current tip.

Igor Bukanov

Reporter

Comment 20

•

14 years ago

(In reply to comment #19) > This patch doesn't apply to current tip. Sory, I forgot to mention it is on top of the patch from bug 600687 comment 3.

Igor Bukanov

Reporter

Comment 21

•

14 years ago

(In reply to comment #20) > (In reply to comment #19) > > This patch doesn't apply to current tip. > > Sory, I forgot to mention it is on top of the patch from bug 600687 comment 3. And I will land that in a minute.

Igor Bukanov

Reporter

Comment 22

•

14 years ago

In the browser running v8 on my laptop is too noisy to have any firm conclusions. Still the best v8 score among 20 runs went up from 2957 to 2981.

Gregor Wagner [:gwagner]

Comment 23

•

14 years ago

I get a bunch of warnings with this patch: jsgc.cpp ../jsgc.cpp: In function ‘void FinalizeArenaList(JSCompartment*, JSContext*, unsigned int) [with T = JSObject]’: ../jsgc.cpp:2308: instantiated from here ../jsgc.cpp:2001: warning: dereferencing type-punned pointer will break strict-aliasing rules ../jsgc.cpp: In function ‘void FinalizeArenaList(JSCompartment*, JSContext*, unsigned int) [with T = JSFunction]’: ../jsgc.cpp:2309: instantiated from here ../jsgc.cpp:2001: warning: dereferencing type-punned pointer will break strict-aliasing rules ../jsgc.cpp: In function ‘void FinalizeArenaList(JSCompartment*, JSContext*, unsigned int) [with T = JSXML]’: ../jsgc.cpp:2311: instantiated from here ../jsgc.cpp:2001: warning: dereferencing type-punned pointer will break strict-aliasing rules ../jsgc.cpp: In function ‘void FinalizeArenaList(JSCompartment*, JSContext*, unsigned int) [with T = JSShortString]’: ../jsgc.cpp:2323: instantiated from here ../jsgc.cpp:2001: warning: dereferencing type-punned pointer will break strict-aliasing rules ../jsgc.cpp: In function ‘void FinalizeArenaList(JSCompartment*, JSContext*, unsigned int) [with T = JSString]’: ../jsgc.cpp:2324: instantiated from here ../jsgc.cpp:2001: warning: dereferencing type-punned pointer will break strict-aliasing rules

Igor Bukanov

Reporter

Comment 24

•

14 years ago

(In reply to comment #23) > I get a bunch of warnings with this patch: This is from: ap = (Arena<T> **) &header->next; GCC 4.4 does not complain about it has much better aliasing analysis than older versions. But I guess on Mac one one has to deal with that.

Igor Bukanov

Reporter

Comment 25

•

14 years ago

Attached patch v6 (obsolete) (deleted) — Details — Splinter Review

This patch uses template-based header to avoid aliasing warnings.

Attachment #479751 - Attachment is obsolete: true

Attachment #480035 - Flags: review?(anygregor)

Attachment #479751 - Flags: review?(anygregor)

Igor Bukanov

Reporter

Comment 26

•

14 years ago

With the latest patch the callgrind reliably shows for the v8-splay benchmark the decrease of instruction count with the patch from 1.67e9 down to 1.65e9 or a 1.3% win. I cannot see that using the benchmark suite - the iCore7 CPU speed on my laptop fluctuates substantially depending on the thermal situation which adds way too much noise.

Gregor Wagner [:gwagner]

Comment 27

•

14 years ago

Yes the cycles used to mark an object went down. Thats good! I still get a bunch of warnings with this patch: jsapi.cpp ../jsgc.h: In member function ‘js::gc::ArenaHeader<T>* js::gc::Arena<T>::header() const [with T = JSObject]’: ../jsgc.h:234: instantiated from here ../jsgc.h:389: warning: dereferencing type-punned pointer will break strict-aliasing rules ../jsgc.h: In member function ‘js::gc::ArenaHeader<T>* js::gc::Arena<T>::header() const [with T = JSString]’: ../jsgc.h:237: instantiated from here ../jsgc.h:389: warning: dereferencing type-punned pointer will break strict-aliasing rules ../jsgc.h: In member function ‘js::gc::ArenaHeader<T>* js::gc::Arena<T>::header() const [with T = JSShortString]’: ../jsgc.h:240: instantiated from here ../jsgc.h:389: warning: dereferencing type-punned pointer will break strict-aliasing rules ../jsgc.h: In member function ‘js::gc::ArenaHeader<T>* js::gc::Arena<T>::header() const [with T = JSFunction]’: ../jsgc.h:245: instantiated from here ../jsgc.h:389: warning: dereferencing type-punned pointer will break strict-aliasing rules ../jsgc.h: In member function ‘js::gc::ArenaHeader<T>* js::gc::Arena<T>::header() const [with T = js::gc::Cell]’: ../jsgc.h:441: instantiated from here ../jsgc.h:389: warning: dereferencing type-punned pointer will break strict-aliasing rules I also don't really like the chunkAndIndex function. Does it really make a difference? If yes please add a comment. I have very limited time now since I am no longer in Mountain View but I will finish the review once all warnings are gone.

Igor Bukanov

Reporter

Comment 28

•

14 years ago

Comment on attachment 480035 [details] [diff] [review] v6 I will rebase the patch on top of the bug 601075. That bugs removes EmptyArenaLists which will minimize the amount of changes here and should remove the aliasing warnings.

Attachment #480035 - Attachment is obsolete: true

Attachment #480035 - Flags: review?(anygregor)

Igor Bukanov

Reporter

Updated

•

14 years ago

Depends on: 605029

Igor Bukanov

Reporter

Comment 29

•

14 years ago

Attached patch v7 (obsolete) (deleted) — Details — Splinter Review

The new patch is based on patches from the bug 605029 comment 4 and bug 601234 comment 3. To avoid expensive divisions when accessing the header the patch shrinks ArenaHeader down to 4 words by sharing the free list offset together with the ting kind. An alternative of making the header to be 8 words has adverse results on the above benchmark. The patch gives about 0.3%-0.5% win in SS and no regressions in V8. AFAICS it is hard to see wins with V8 as the patch affects the GC heuristics by putting an extra object into the arena and that adds a big noise to v8. For a synthetic benchmark like the following code the patch gives a 5% win. function test() { gc(); var zz = ({}); var z = zz; for (count = 0; count < 1e5; count++) { z.next = ({}); z = z.next; } gc(); for(count = 0; count < 35; count++){ var xx = ({}); var x = xx; for (var j = 0; j < 6e5; ++j) { x.next = ({}); x = x.next; } gc(); xx = x = null; gc(); } } var t = Date.now(); test(); t = Date.now() - t; print("time: "+t);

Attachment #485262 - Flags: review?(gal)

Igor Bukanov

Reporter

Comment 30

•

14 years ago

Attached patch attachment for the wrong bug (obsolete) (deleted) — Details — Splinter Review

patch update to reflect bug 616666 chnages

Attachment #485262 - Attachment is obsolete: true

Attachment #527123 - Flags: review?(gal)

Attachment #485262 - Flags: review?(gal)

Igor Bukanov

Reporter

Updated

•

14 years ago

Attachment #527123 - Attachment description: v8 → attachment for the wrong bug

Attachment #527123 - Attachment is obsolete: true

Attachment #527123 - Flags: review?(gal)

Igor Bukanov

Reporter

Comment 31

•

13 years ago

I need this bug for 648320. There I plan to add a few 2^N size classes of GC things. But if the arena header is stored at the beginning of the arena, this wastes the space for one GC thing which will be substantial for N >= 5. In addition, it complicates allocation and access to the type map which will be necessary for the conservative GC.

Igor Bukanov

Reporter

Updated

•

13 years ago

Blocks: 648320
No longer blocks: 558861

Igor Bukanov

Reporter

Updated

•

13 years ago

Depends on: 665354

Igor Bukanov

Reporter

Comment 32

•

13 years ago

Attached patch v8 (deleted) — Details — Splinter Review

Here is a rebased patch, it looks it is necessary to get best results with smaller chunks in bug 671702.

BugBot [:suhaib / :marco/ :calixte]

Comment 33

•

3 years ago

The bug assignee didn't login in Bugzilla in the last 7 months, so the assignee is being reset.

Assignee: igor → nobody

BMO Automation

Updated

•

2 years ago

Severity: normal → S3

You need to log in before you can comment on or make changes to this bug.