Closed Bug 553033 Opened 15 years ago Closed 15 years ago

Consider replacing the background free list with a vector.

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: igor, Assigned: igor)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(3 files, 3 obsolete files)

The bug 505612 has moved the free call that are invoked from the finalizers to the background thread. For that the code initially links all memory regions that is supposed to be freed into a linked list. That list is frees later on the background thread. But the linked list approach means that for each reagion that should be freed the GC must bring into the CPU cache the whole cache line containing the begining of the memory region. That may impose rather heavy memory bandwidth tax. An alternative for that would be to put to-be-freed-regions into a vector. This will avoid dereferencing the region and could allow for shorter GC pause. In bug 504640, we found that finalizers were the main cause of GC pauses. We propose running finalizers on a separate thread. js_GC would only collect the finalizers to be run and then hand them off to a new thread. js_GC runs finalizers in place only if memory cannot be allocated to store the pointers to be finalized.
The patch adds rdtsc cycle counting for string finalizers as the timing of cx->free() is most visible for strings.
Assignee: general → igor
The test case goes together with the measurement code from the comment 1.
Attached file work-in-progress (obsolete) (deleted) —
The patch replaces the background free list with a chunked vector. With the patch I can see clear improvements for rdtsc cycle counts during the finalization with the test case and measurement code above. For Atom N270 1.6GHz CPU the number of cycles dropped from 64 to 60 millions (measured as the best timing from 5 runs). For Xenon X3353 4 Core 2.66 GHz the win was more impressive. The number of cycles has dropped from 80 to 52 millions.
Attached patch v1 (obsolete) (deleted) — Splinter Review
The new patch version adds comments and removes no longer necessary memory size checks in the allocation helpers. It shows the same improvements as the previous patch.
Attachment #433145 - Attachment is obsolete: true
Attachment #433168 - Flags: review?
Attachment #433168 - Flags: review? → review?(gal)
If a finalizer thread is the goal, we'll need some significant changes in xpconnect. Xpconnect uses a combination of GC hooks and finalizers to "mark" certain C++ objects during GC and "sweep" them at the gc end hook. Also, the C++ destructors often associated with finalizer methods need to run on the main thread.
(In reply to comment #5) > If a finalizer thread is the goal, we'll need some significant changes in > xpconnect. For now the goal is to move to the background thread the finalization of doubles and strings and probably the standard objects and array. The GC thread will continue to deal with objects with non-default finalizers.
Attached patch v2 (obsolete) (deleted) — Splinter Review
The new version removes the null check for the passed pointer from cx->free as the code never dereference it. Another change is to move the sweep task from JSThread to JSContext for less indirection there and to remove the thread pointer check. This dropped the cycle counter for the benchmark above by farther few percents. Now for Atom N270 1.6GHz CPU the number of cycles drops from 64 to 52 millions (measured as the best timing from 5 runs) and for Xenon X3353 4 Core 2.66 GHz the change is from 80 to 51 millions. In addition the patch does some renaming to minimize the number of changes that I need for the bug 543036.
Attachment #433168 - Attachment is obsolete: true
Attachment #433613 - Flags: review?
Attachment #433168 - Flags: review?(gal)
gal: review ping
Attachment #433613 - Flags: review? → review?(gal)
Sorry. Missed this one. Reviewing.
gal: do you have time to review this?
Yeah will do. Sorry for the delay, again.
Attached patch v3 (deleted) — Splinter Review
Patch update to sync with the tip changes.
Attachment #433613 - Attachment is obsolete: true
Attachment #436908 - Flags: review?(jwalden+bmo)
Attachment #433613 - Flags: review?(gal)
Comment on attachment 436908 [details] [diff] [review] v3 >Index: tm/js/src/jsgc.cpp >+void >+BackgroundSweepTask::replenishAndFreeLater(void *ptr) >+{ >+ JS_ASSERT(freeCursor == freeCursorEnd); >+ do { >+ if (freeCursor) { >+ if (!freeVector.append(freeCursorEnd - FREE_ARRAY_LENGTH)) >+ break; >+ } I don't really like keeping the pointer to the current free array latent through subtraction, but I don't see a good way to simplify it while avoiding excess arithmetic. I suppose you could mmap 64K pages instead, but that complexity seems even worse to me. Please unify these two conditions like so: > if (freeCursor && !freeVector.append(freeCursor - FREE_ARRAY_LENGTH)) > break; >+ freeCursor = (void **) js_malloc(FREE_ARRAY_LENGTH * sizeof(void *)); You're multiplying by sizeof(void*) here, but you divide by sizeof(jsuword) elsewhere. I would rather not have to rely on this extra level of mental abstraction of remembering that the two types have the same size. Please use these two consts rather than having separated, differently-abstracted size-calculation arithmetic: > static const size_t FREE_ARRAY_SIZE = 0x10000; > static const size_t FREE_ARRAY_LENGTH = FREE_ARRAY_SIZE / sizeof(void*); >+void >+BackgroundSweepTask::run() >+{ >+ if (freeCursor) { >+ void **array = freeCursorEnd - FREE_ARRAY_LENGTH; >+ for (void **p = array; p != freeCursor; ++p) >+ js_free(*p); >+ js_free(array); >+ freeCursor = freeCursorEnd = NULL; >+ } else { >+ JS_ASSERT(!freeCursorEnd); >+ } >+ for (void ***iter = freeVector.begin(); iter != freeVector.end(); ++iter) { >+ void **array = *iter; >+ for (void **p = array, **end = array + FREE_ARRAY_LENGTH; p != end; ++p) >+ js_free(*p); >+ js_free(array); >+ } >+} Please split the array-and-elements-freeing code out into a private static method in BackgroundSweepTask, so that you don't have to have two essentially identical loops here. >Index: tm/js/src/jsgc.h >+/* >+ * During the finalization we do not call the free immediately. Rather we add "do not free", because 1) we're not calling free, we're calling js_free, and 2) "the" is implicit, and its addition is awkward (maybe even grammatically incorrect, not sure, I avoid having to know by rewriting such phrases :-) ). >+ * the corresponding pointers to a buffer which we later release on the >+ * background thread. >+ * >+ * The buffer is implemented as a vector of 64K arrays of pointers, not as a >+ * simple vector, to avoid realloc calls during the vector growth and not to >+ * bloat the binary size of the inlined add method. Any out of memory "and to not bloat" -- parallel "to...and to" is proper grammar, "to...and not to" is awkward. >+ * condition during the buffer growth is treated as a signal to call the free >+ * immediately. "Any OOM during buffer growth results in the pointer being freed immediately." Shorter, avoids the "call the free" nit from before (and again with "during the buffer growth"), >+class BackgroundSweepTask : public JSBackgroundTask { >+ static const size_t FREE_ARRAY_LENGTH = 0x10000 / sizeof(jsuword); >+ typedef Vector<void **, 0x10, js::SystemAllocPolicy> FreeVector; >+ FreeVector freeVector; Don't use a hex literal here, 16 is clearer. Too bad we don't have templatized typedefs from C++0x, or something, to avoid having to specify any default size here when all we want to do is customize allocation policy. This type is never used except in the typedef, so it doesn't serve much purpose. Please fold the type into the freeVector member declaration.
Attachment #436908 - Flags: review?(jwalden+bmo) → review+
Whiteboard: fixed-in-tracemonkey
Igor: ../jsgc.cpp: In function 'void js_FinishGC(JSRuntime*)': ../jsgc.cpp:1157: error: 'struct JSRuntime' has no member named 'gcHelperThread' linux x86_64 opt at least.
ditto mac os x x86. this is for the unthreaded shell btw.
Yes, all non-JS_THREADSAFE shell builds are still busted from this changeset.
http://hg.mozilla.org/tracemonkey/rev/dcf388c58def - change by jorendorf to fix single-threaded builds. Thanks!
Fix for multi-threaded cross-library builds http://hg.mozilla.org/tracemonkey/rev/ba6b5094a11f
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: