Closed
Bug 553033
Opened 15 years ago
Closed 15 years ago
Consider replacing the background free list with a vector.
Categories
(Core :: JavaScript Engine, enhancement)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
People
(Reporter: igor, Assigned: igor)
References
Details
(Whiteboard: fixed-in-tracemonkey)
Attachments
(3 files, 3 obsolete files)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
Waldo
:
review+
|
Details | Diff | Splinter Review |
The bug 505612 has moved the free call that are invoked from the finalizers to the background thread. For that the code initially links all memory regions that is supposed to be freed into a linked list. That list is frees later on the background thread.
But the linked list approach means that for each reagion that should be freed the GC must bring into the CPU cache the whole cache line containing the begining of the memory region. That may impose rather heavy memory bandwidth tax. An alternative for that would be to put to-be-freed-regions into a vector. This will avoid dereferencing the region and could allow for shorter GC pause.
In bug 504640, we found that finalizers were the main cause of GC pauses. We propose running finalizers on a separate thread. js_GC would only collect the finalizers to be run and then hand them off to a new thread. js_GC runs finalizers in place only if memory cannot be allocated to store the pointers to be finalized.
Assignee | ||
Comment 1•15 years ago
|
||
The patch adds rdtsc cycle counting for string finalizers as the timing of cx->free() is most visible for strings.
Assignee: general → igor
Assignee | ||
Comment 2•15 years ago
|
||
The test case goes together with the measurement code from the comment 1.
Assignee | ||
Comment 3•15 years ago
|
||
The patch replaces the background free list with a chunked vector. With the patch I can see clear improvements for rdtsc cycle counts during the finalization with the test case and measurement code above.
For Atom N270 1.6GHz CPU the number of cycles dropped from 64 to 60 millions (measured as the best timing from 5 runs). For Xenon X3353 4 Core 2.66 GHz the win was more impressive. The number of cycles has dropped from 80 to 52 millions.
Assignee | ||
Comment 4•15 years ago
|
||
The new patch version adds comments and removes no longer necessary memory size checks in the allocation helpers. It shows the same improvements as the previous patch.
Attachment #433145 -
Attachment is obsolete: true
Attachment #433168 -
Flags: review?
Assignee | ||
Updated•15 years ago
|
Attachment #433168 -
Flags: review? → review?(gal)
Comment 5•15 years ago
|
||
If a finalizer thread is the goal, we'll need some significant changes in xpconnect. Xpconnect uses a combination of GC hooks and finalizers to "mark" certain C++ objects during GC and "sweep" them at the gc end hook. Also, the C++ destructors often associated with finalizer methods need to run on the main thread.
Assignee | ||
Comment 6•15 years ago
|
||
(In reply to comment #5)
> If a finalizer thread is the goal, we'll need some significant changes in
> xpconnect.
For now the goal is to move to the background thread the finalization of doubles and strings and probably the standard objects and array. The GC thread will continue to deal with objects with non-default finalizers.
Assignee | ||
Comment 7•15 years ago
|
||
The new version removes the null check for the passed pointer from cx->free as the code never dereference it. Another change is to move the sweep task from JSThread to JSContext for less indirection there and to remove the thread pointer check.
This dropped the cycle counter for the benchmark above by farther few percents. Now for Atom N270 1.6GHz CPU the number of cycles drops from 64 to 52 millions (measured as the best timing from 5 runs) and for Xenon X3353 4 Core 2.66 GHz the change is from 80 to 51 millions.
In addition the patch does some renaming to minimize the number of changes that I need for the bug 543036.
Attachment #433168 -
Attachment is obsolete: true
Attachment #433613 -
Flags: review?
Attachment #433168 -
Flags: review?(gal)
Assignee | ||
Comment 8•15 years ago
|
||
gal: review ping
Updated•15 years ago
|
Attachment #433613 -
Flags: review? → review?(gal)
Comment 9•15 years ago
|
||
Sorry. Missed this one. Reviewing.
Assignee | ||
Comment 10•15 years ago
|
||
gal: do you have time to review this?
Comment 11•15 years ago
|
||
Yeah will do. Sorry for the delay, again.
Assignee | ||
Comment 12•15 years ago
|
||
Patch update to sync with the tip changes.
Attachment #433613 -
Attachment is obsolete: true
Attachment #436908 -
Flags: review?(jwalden+bmo)
Attachment #433613 -
Flags: review?(gal)
Comment 13•15 years ago
|
||
Comment on attachment 436908 [details] [diff] [review]
v3
>Index: tm/js/src/jsgc.cpp
>+void
>+BackgroundSweepTask::replenishAndFreeLater(void *ptr)
>+{
>+ JS_ASSERT(freeCursor == freeCursorEnd);
>+ do {
>+ if (freeCursor) {
>+ if (!freeVector.append(freeCursorEnd - FREE_ARRAY_LENGTH))
>+ break;
>+ }
I don't really like keeping the pointer to the current free array latent through subtraction, but I don't see a good way to simplify it while avoiding excess arithmetic. I suppose you could mmap 64K pages instead, but that complexity seems even worse to me.
Please unify these two conditions like so:
> if (freeCursor && !freeVector.append(freeCursor - FREE_ARRAY_LENGTH))
> break;
>+ freeCursor = (void **) js_malloc(FREE_ARRAY_LENGTH * sizeof(void *));
You're multiplying by sizeof(void*) here, but you divide by sizeof(jsuword) elsewhere. I would rather not have to rely on this extra level of mental abstraction of remembering that the two types have the same size. Please use these two consts rather than having separated, differently-abstracted size-calculation arithmetic:
> static const size_t FREE_ARRAY_SIZE = 0x10000;
> static const size_t FREE_ARRAY_LENGTH = FREE_ARRAY_SIZE / sizeof(void*);
>+void
>+BackgroundSweepTask::run()
>+{
>+ if (freeCursor) {
>+ void **array = freeCursorEnd - FREE_ARRAY_LENGTH;
>+ for (void **p = array; p != freeCursor; ++p)
>+ js_free(*p);
>+ js_free(array);
>+ freeCursor = freeCursorEnd = NULL;
>+ } else {
>+ JS_ASSERT(!freeCursorEnd);
>+ }
>+ for (void ***iter = freeVector.begin(); iter != freeVector.end(); ++iter) {
>+ void **array = *iter;
>+ for (void **p = array, **end = array + FREE_ARRAY_LENGTH; p != end; ++p)
>+ js_free(*p);
>+ js_free(array);
>+ }
>+}
Please split the array-and-elements-freeing code out into a private static method in BackgroundSweepTask, so that you don't have to have two essentially identical loops here.
>Index: tm/js/src/jsgc.h
>+/*
>+ * During the finalization we do not call the free immediately. Rather we add
"do not free", because 1) we're not calling free, we're calling js_free, and 2) "the" is implicit, and its addition is awkward (maybe even grammatically incorrect, not sure, I avoid having to know by rewriting such phrases :-) ).
>+ * the corresponding pointers to a buffer which we later release on the
>+ * background thread.
>+ *
>+ * The buffer is implemented as a vector of 64K arrays of pointers, not as a
>+ * simple vector, to avoid realloc calls during the vector growth and not to
>+ * bloat the binary size of the inlined add method. Any out of memory
"and to not bloat" -- parallel "to...and to" is proper grammar, "to...and not to" is awkward.
>+ * condition during the buffer growth is treated as a signal to call the free
>+ * immediately.
"Any OOM during buffer growth results in the pointer being freed immediately." Shorter, avoids the "call the free" nit from before (and again with "during the buffer growth"),
>+class BackgroundSweepTask : public JSBackgroundTask {
>+ static const size_t FREE_ARRAY_LENGTH = 0x10000 / sizeof(jsuword);
>+ typedef Vector<void **, 0x10, js::SystemAllocPolicy> FreeVector;
>+ FreeVector freeVector;
Don't use a hex literal here, 16 is clearer. Too bad we don't have templatized typedefs from C++0x, or something, to avoid having to specify any default size here when all we want to do is customize allocation policy.
This type is never used except in the typedef, so it doesn't serve much purpose. Please fold the type into the freeVector member declaration.
Attachment #436908 -
Flags: review?(jwalden+bmo) → review+
Assignee | ||
Comment 14•15 years ago
|
||
Whiteboard: fixed-in-tracemonkey
Comment 15•15 years ago
|
||
Igor:
../jsgc.cpp: In function 'void js_FinishGC(JSRuntime*)':
../jsgc.cpp:1157: error: 'struct JSRuntime' has no member named 'gcHelperThread'
linux x86_64 opt at least.
Comment 16•15 years ago
|
||
ditto mac os x x86. this is for the unthreaded shell btw.
Comment 17•15 years ago
|
||
Yes, all non-JS_THREADSAFE shell builds are still busted from this changeset.
Assignee | ||
Comment 18•15 years ago
|
||
http://hg.mozilla.org/tracemonkey/rev/dcf388c58def - change by jorendorf to fix single-threaded builds. Thanks!
Comment 19•15 years ago
|
||
Fix for multi-threaded cross-library builds
http://hg.mozilla.org/tracemonkey/rev/ba6b5094a11f
Comment 20•15 years ago
|
||
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•