Closed Bug 553033 Opened 15 years ago Closed 15 years ago

Consider replacing the background free list with a vector.

Tracking

()

Status:

RESOLVED FIXED

People

(Reporter: igor, Assigned: igor)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(3 files, 3 obsolete files)

counting cycles for string finalizers 15 years ago Igor Bukanov (deleted), patch		Details \| Diff \| Splinter Review
test case to check the string finalization timing 15 years ago Igor Bukanov (deleted), text/plain		Details
work-in-progress 15 years ago Igor Bukanov (deleted), application/octet-stream		Details
v1 15 years ago Igor Bukanov (deleted), patch		Details \| Diff \| Splinter Review
v2 15 years ago Igor Bukanov (deleted), patch		Details \| Diff \| Splinter Review
v3 15 years ago Igor Bukanov (deleted), patch	Waldo : review+	Details \| Diff \| Splinter Review

Igor Bukanov

Assignee

Description

•

15 years ago

The bug 505612 has moved the free call that are invoked from the finalizers to the background thread. For that the code initially links all memory regions that is supposed to be freed into a linked list. That list is frees later on the background thread. But the linked list approach means that for each reagion that should be freed the GC must bring into the CPU cache the whole cache line containing the begining of the memory region. That may impose rather heavy memory bandwidth tax. An alternative for that would be to put to-be-freed-regions into a vector. This will avoid dereferencing the region and could allow for shorter GC pause. In bug 504640, we found that finalizers were the main cause of GC pauses. We propose running finalizers on a separate thread. js_GC would only collect the finalizers to be run and then hand them off to a new thread. js_GC runs finalizers in place only if memory cannot be allocated to store the pointers to be finalized.

Igor Bukanov

Assignee

Comment 1

•

15 years ago

Attached patch counting cycles for string finalizers (deleted) — Details — Splinter Review

The patch adds rdtsc cycle counting for string finalizers as the timing of cx->free() is most visible for strings.

Assignee: general → igor

Igor Bukanov

Assignee

Comment 2

•

15 years ago

Attached file test case to check the string finalization timing (deleted) — Details

The test case goes together with the measurement code from the comment 1.

Igor Bukanov

Assignee

Comment 3

•

15 years ago

Attached file work-in-progress (obsolete) (deleted) — Details

The patch replaces the background free list with a chunked vector. With the patch I can see clear improvements for rdtsc cycle counts during the finalization with the test case and measurement code above. For Atom N270 1.6GHz CPU the number of cycles dropped from 64 to 60 millions (measured as the best timing from 5 runs). For Xenon X3353 4 Core 2.66 GHz the win was more impressive. The number of cycles has dropped from 80 to 52 millions.

Igor Bukanov

Assignee

Comment 4

•

15 years ago

Attached patch v1 (obsolete) (deleted) — Details — Splinter Review

The new patch version adds comments and removes no longer necessary memory size checks in the allocation helpers. It shows the same improvements as the previous patch.

Attachment #433145 - Attachment is obsolete: true

Attachment #433168 - Flags: review?

Igor Bukanov

Assignee

Updated

•

15 years ago

Attachment #433168 - Flags: review? → review?(gal)

Benjamin Smedberg

Comment 5

•

15 years ago

If a finalizer thread is the goal, we'll need some significant changes in xpconnect. Xpconnect uses a combination of GC hooks and finalizers to "mark" certain C++ objects during GC and "sweep" them at the gc end hook. Also, the C++ destructors often associated with finalizer methods need to run on the main thread.

Igor Bukanov

Assignee

Comment 6

•

15 years ago

(In reply to comment #5) > If a finalizer thread is the goal, we'll need some significant changes in > xpconnect. For now the goal is to move to the background thread the finalization of doubles and strings and probably the standard objects and array. The GC thread will continue to deal with objects with non-default finalizers.

Igor Bukanov

Assignee

Comment 7

•

15 years ago

Attached patch v2 (obsolete) (deleted) — Details — Splinter Review

The new version removes the null check for the passed pointer from cx->free as the code never dereference it. Another change is to move the sweep task from JSThread to JSContext for less indirection there and to remove the thread pointer check. This dropped the cycle counter for the benchmark above by farther few percents. Now for Atom N270 1.6GHz CPU the number of cycles drops from 64 to 52 millions (measured as the best timing from 5 runs) and for Xenon X3353 4 Core 2.66 GHz the change is from 80 to 51 millions. In addition the patch does some renaming to minimize the number of changes that I need for the bug 543036.

Attachment #433168 - Attachment is obsolete: true

Attachment #433613 - Flags: review?

Attachment #433168 - Flags: review?(gal)

Igor Bukanov

Assignee

Comment 8

•

15 years ago

gal: review ping

Serge Gautherie (:sgautherie)

Updated

•

15 years ago

Attachment #433613 - Flags: review? → review?(gal)

Andreas Gal :gal

Comment 9

•

15 years ago

Sorry. Missed this one. Reviewing.

Igor Bukanov

Assignee

Comment 10

•

15 years ago

gal: do you have time to review this?

Andreas Gal :gal

Comment 11

•

15 years ago

Yeah will do. Sorry for the delay, again.

Igor Bukanov

Assignee

Comment 12

•

15 years ago

Attached patch v3 (deleted) — Details — Splinter Review

Patch update to sync with the tip changes.

Attachment #433613 - Attachment is obsolete: true

Attachment #436908 - Flags: review?(jwalden+bmo)

Attachment #433613 - Flags: review?(gal)

Jeff Walden [:Waldo]

Comment 13

•

15 years ago

Comment on attachment 436908 [details] [diff] [review] v3 >Index: tm/js/src/jsgc.cpp >+void >+BackgroundSweepTask::replenishAndFreeLater(void *ptr) >+{ >+ JS_ASSERT(freeCursor == freeCursorEnd); >+ do { >+ if (freeCursor) { >+ if (!freeVector.append(freeCursorEnd - FREE_ARRAY_LENGTH)) >+ break; >+ } I don't really like keeping the pointer to the current free array latent through subtraction, but I don't see a good way to simplify it while avoiding excess arithmetic. I suppose you could mmap 64K pages instead, but that complexity seems even worse to me. Please unify these two conditions like so: > if (freeCursor && !freeVector.append(freeCursor - FREE_ARRAY_LENGTH)) > break; >+ freeCursor = (void **) js_malloc(FREE_ARRAY_LENGTH * sizeof(void *)); You're multiplying by sizeof(void*) here, but you divide by sizeof(jsuword) elsewhere. I would rather not have to rely on this extra level of mental abstraction of remembering that the two types have the same size. Please use these two consts rather than having separated, differently-abstracted size-calculation arithmetic: > static const size_t FREE_ARRAY_SIZE = 0x10000; > static const size_t FREE_ARRAY_LENGTH = FREE_ARRAY_SIZE / sizeof(void*); >+void >+BackgroundSweepTask::run() >+{ >+ if (freeCursor) { >+ void **array = freeCursorEnd - FREE_ARRAY_LENGTH; >+ for (void **p = array; p != freeCursor; ++p) >+ js_free(*p); >+ js_free(array); >+ freeCursor = freeCursorEnd = NULL; >+ } else { >+ JS_ASSERT(!freeCursorEnd); >+ } >+ for (void ***iter = freeVector.begin(); iter != freeVector.end(); ++iter) { >+ void **array = *iter; >+ for (void **p = array, **end = array + FREE_ARRAY_LENGTH; p != end; ++p) >+ js_free(*p); >+ js_free(array); >+ } >+} Please split the array-and-elements-freeing code out into a private static method in BackgroundSweepTask, so that you don't have to have two essentially identical loops here. >Index: tm/js/src/jsgc.h >+/* >+ * During the finalization we do not call the free immediately. Rather we add "do not free", because 1) we're not calling free, we're calling js_free, and 2) "the" is implicit, and its addition is awkward (maybe even grammatically incorrect, not sure, I avoid having to know by rewriting such phrases :-) ). >+ * the corresponding pointers to a buffer which we later release on the >+ * background thread. >+ * >+ * The buffer is implemented as a vector of 64K arrays of pointers, not as a >+ * simple vector, to avoid realloc calls during the vector growth and not to >+ * bloat the binary size of the inlined add method. Any out of memory "and to not bloat" -- parallel "to...and to" is proper grammar, "to...and not to" is awkward. >+ * condition during the buffer growth is treated as a signal to call the free >+ * immediately. "Any OOM during buffer growth results in the pointer being freed immediately." Shorter, avoids the "call the free" nit from before (and again with "during the buffer growth"), >+class BackgroundSweepTask : public JSBackgroundTask { >+ static const size_t FREE_ARRAY_LENGTH = 0x10000 / sizeof(jsuword); >+ typedef Vector<void **, 0x10, js::SystemAllocPolicy> FreeVector; >+ FreeVector freeVector; Don't use a hex literal here, 16 is clearer. Too bad we don't have templatized typedefs from C++0x, or something, to avoid having to specify any default size here when all we want to do is customize allocation policy. This type is never used except in the typedef, so it doesn't serve much purpose. Please fold the type into the freeVector member declaration.

Attachment #436908 - Flags: review?(jwalden+bmo) → review+

Igor Bukanov

Assignee

Comment 14

•

15 years ago

https://hg.mozilla.org/tracemonkey/rev/4da31f657210

Whiteboard: fixed-in-tracemonkey

Bob Clary [:bc] (inactive)

Comment 15

•

15 years ago

Igor: ../jsgc.cpp: In function 'void js_FinishGC(JSRuntime*)': ../jsgc.cpp:1157: error: 'struct JSRuntime' has no member named 'gcHelperThread' linux x86_64 opt at least.

Bob Clary [:bc] (inactive)

Comment 16

•

15 years ago

ditto mac os x x86. this is for the unthreaded shell btw.

David Mandelin [:dmandelin]

Comment 17

•

15 years ago

Yes, all non-JS_THREADSAFE shell builds are still busted from this changeset.

Igor Bukanov

Assignee

Comment 18

•

15 years ago

http://hg.mozilla.org/tracemonkey/rev/dcf388c58def - change by jorendorf to fix single-threaded builds. Thanks!

Andreas Gal :gal

Comment 19

•

15 years ago

Fix for multi-threaded cross-library builds http://hg.mozilla.org/tracemonkey/rev/ba6b5094a11f

Robert Sayre

Comment 20

•

15 years ago

http://hg.mozilla.org/mozilla-central/rev/4da31f657210

Status: NEW → RESOLVED

Closed: 15 years ago

Resolution: --- → FIXED

You need to log in before you can comment on or make changes to this bug.