Closed
Bug 392263
Opened 17 years ago
Closed 17 years ago
Using CPU pages as GC arenas
Categories
(Core :: DOM: Core & HTML, enhancement)
Core
DOM: Core & HTML
Tracking
()
RESOLVED
FIXED
People
(Reporter: igor, Assigned: igor)
References
Details
Attachments
(1 file, 30 obsolete files)
(deleted),
patch
|
brendan
:
review+
brendan
:
approval1.9+
|
Details | Diff | Splinter Review |
[This is a spin-off of bug 51 comment 157334]
It is worth to try to use CPU pages allocated via mmap or VirtualAlloc as GC arenas. It should allow for very fast js_GetGCThingFlags and would remove all the alignment complexity and overhead in the current jsgc.c.
Assignee | ||
Comment 1•17 years ago
|
||
This is backup of the initial implementation that does not use any mmap or VirtualAlloc calls. Instead it just allocs 9 * page_size bytes and uses 8 aligned pages inside this chunk with the overhead of 12.5%. I will test the code and later use it as a default implementation when mmap is not available on the target platform.
Comment 2•17 years ago
|
||
It would also allow us to map PROT_READ|PROT_WRITE, no PROT_EXEC -- defeating a class of security exploits that contrive to put the pc in a JS string or similar and execute its characters as instructions.
/be
Assignee | ||
Comment 3•17 years ago
|
||
(In reply to comment #2)
> It would also allow us to map PROT_READ|PROT_WRITE, no PROT_EXEC -- defeating a
> class of security exploits that contrive to put the pc in a JS string or
> similar and execute its characters as instructions.
AFAICS !PROT_EXEC for GC thinsg does not help since malloc would continue to allocate charaters for strings. Thus spraying the heap with strings with various forms of NOP instructions chunks before the bad payload would still work.
Comment 5•17 years ago
|
||
I was hoping we would get to the point where string character vectors would be allocated from ~PROT_EXEC memory.
/be
Assignee | ||
Comment 6•17 years ago
|
||
New versions with less overhead when mmap is not available plus infrastructure to use several arenas per mmap call. The latter is necessary to minimize overhead of mmap call.
Attachment #276841 -
Attachment is obsolete: true
Assignee | ||
Comment 7•17 years ago
|
||
The check point before adding mmap support on POSIX. Note that even without mmap support the patch shows faster allocation / GC on simple benchmarks.
Assignee | ||
Updated•17 years ago
|
Attachment #277636 -
Attachment is obsolete: true
Assignee | ||
Comment 8•17 years ago
|
||
The new version uses mmap on POSIX systems when available.
Attachment #277949 -
Attachment is obsolete: true
Assignee | ||
Comment 9•17 years ago
|
||
The new version fixes warnings with the last one when compiling FF.
Attachment #277973 -
Attachment is obsolete: true
Assignee | ||
Comment 10•17 years ago
|
||
The new version adds VirtualAlloc calls under Windows. This is based on info from http://msdn2.microsoft.com/en-us/library/aa366887.aspxbut I have no means to test it.
Attachment #277979 -
Attachment is obsolete: true
Assignee | ||
Comment 11•17 years ago
|
||
This is optimized version of v1. Notes on the patch:
1. Patch uses 4K arenas aligned on 4K boundaries to allocate GC things. This is hard coded even with 64-bit CPU as x64 uses 4K CPU pages. If mmap is not available, then to get properly alignment arenas the code allocates 32K and uses seven arenas in the block that are guaranteed to be aligned on 4K boundary.
2. Even with mmap available the patch allocates arenas in blocks to minimize the overhead of mmap.
3. As the page size is not a compile-time constant, for extra portability the code checks for the page size at startup and switches to the fall back schema if the size exceeds some sanity limit. Since the arena size is a compile-time constant, this runtime checks adds no visible overhead.
4. The XP_WIN part of the code is not tested beyond parsing by C compiler and based on http://msdn2.microsoft.com/en-us/library/aa366887.aspx.
5. Various synthetic benchmarks like:
function test(n)
{
var now = Date.now;
var startTime = now();
var o = null;
var array = [];
// Interleave garbage with reachable things
while (n-- != 0) {
var x = 1.1 + n;
if (n % 2)
array.push(x);
x = null;
}
for (var i = 0; i != 20; ++i)
gc();
}
test(100 * 1000);
shows a slight speedup of about 1% with the patch.
Attachment #278024 -
Attachment is obsolete: true
Attachment #278061 -
Flags: review?(brendan)
Comment 12•17 years ago
|
||
I will review by Monday, probably by tomorrow.
/be
Assignee | ||
Comment 13•17 years ago
|
||
The new version fixes a bug in OOM detection with mmap. When memory is not available, mmap returns MAP_FAILED or (void *) -1, not NULL.
Attachment #278061 -
Attachment is obsolete: true
Attachment #278240 -
Flags: review?(brendan)
Attachment #278061 -
Flags: review?(brendan)
Assignee | ||
Comment 14•17 years ago
|
||
The new version allows checks allows to explicitly define JS_GC_USE_MMAP during compilation to either 0 or 1 to disable mmap completely or to forces the inclusion of proper system headers.
Attachment #278240 -
Attachment is obsolete: true
Attachment #278241 -
Flags: review?(brendan)
Attachment #278240 -
Flags: review?(brendan)
Comment 15•17 years ago
|
||
Comment on attachment 278241 [details] [diff] [review]
v4
The major comment starting:
* Encodes a flag indicating if the arena is the first in the block, the
is too ragged-right -- reformat as if by Q commands in vim with tw=78. Also:
* the index of arena is either the index of a free arena holding
should say "the index of the arena" or "the index of this arena in its block" or something like that.
In:
* All blocks that have at least one free arena are put to the doubly-linked
s/put to/put in/
I saw another "put" abusage, or three (two pre-existing): "
* All blocks that have at least one free arena are put to the doubly-linked
for these, the simplest fix is "put on".
* the head of linked list of free arenas in the block together with previous
how about "the head of a linked free arena list for the block" or "the head of the block's free arena list"?
* it is removed from the list and BLOCK_ALL_USED_FLAG is set indicating that
BLOCK_ALL_USED_FLAG is not defined, so this comment seems stale.
One question: what bounds the number of mmap calls, if anything? These are not as cheap as one might like, and may consume enough OS overhead that we would prefer bigger chunks than 7 or 8 arenas' worth.
More comments in a bit, feel free to put up a new patch.
/be
Comment 16•17 years ago
|
||
Sorry, I meant to suggest "put on" for
* All blocks that have at least one free arena are put to the doubly-linked
not "put in". "Inserted in" but "put on". Micro-nit on my own comment, but it was inconsistent with itself (later recommending "put on" over "put to").
/be
Assignee | ||
Comment 17•17 years ago
|
||
(In reply to comment #15)
> One question: what bounds the number of mmap calls, if anything?
In the current patch the limit on the number of arenas in the block is limited by 2^ARENA_INDEX_BITS - 1 or 2^(JS_GC_ARENA_SHIFT - 1) - 1 which gives the maximum 2047 arenas per block limiting the block size by 8MB.
> These are not
> as cheap as one might like, and may consume enough OS overhead that we would
> prefer bigger chunks than 7 or 8 arenas' worth.
With synthetic benchmark like
for (var i = 0; i != 1000*1000; ++i) var tmp = i+0.1;
I have not noticed a difference with 4 or more arenas on Linux, so the patch uses 8 under assumption that on Windows, where system calls are more expensive, they are not that much worse than on Linux. But clearly the final value should be exposed by tests on Windows.
With no mmap available the patch defaults to 7 and not 8 arenas so the total size of malloc block would be the same as with mmap. In principle for less overhead for string and double GC things the default may be set to 15, but then that has to be balanced with heap fragmentation.
Assignee | ||
Comment 18•17 years ago
|
||
Fixing comments as suggested.
Attachment #278241 -
Attachment is obsolete: true
Attachment #278473 -
Flags: review?(brendan)
Attachment #278241 -
Flags: review?(brendan)
Comment 19•17 years ago
|
||
Comment on attachment 278473 [details] [diff] [review]
v4b
More comments, sorry for incremental review. One more cycle after this, new patch optional (if you are up still).
"Block" is typically an i/o unit, atomic to some part of a storage hierarchy, not a cluster of arenas or pages. Suggest Cluster or Chunk (latter is as short as Block and less specific). If we really wanted the right words for a hierarchy of Arena/Block, we might prefer Page and Cluster or Segment. Segment implies contiguous span of memory. But Chunk would suffice and be better than Block in my book.
s/gcUnscannedArenaStackTop/gcUnscannedStackTop/ or even gcUnscannedArenaTop -- Top implies Stack.
Please remove pre-existing FINALIZE_LEN deadwood.
Need Windows (and Mac?) build buddies -- I will help test Mac tonight if needed. Crowder, can you try Windows?
In:
* Effectively things are allocated from the start of things' area and flags
* are allocated from the end of flags' area. Such arrangement avoids
please use "their" instead of "things'" and "flags'". Also a comma after "Effectively", but I think you could just lose "Effectively" altogether.
Just wondering why the JS_GC_ prefixing of macros -- GC_ might not be enough in light of Windows.h?
/be
Assignee | ||
Comment 20•17 years ago
|
||
Besides addressing nits and doing renames as suggested above, the patch fixes js_InitGC to set js_gcUseMmap to false with too small/too big pages.
In the new patch only JS_GC_USE_MMAP uses JS_GC_ prefix as the macro can be set from the command line. The rest of macros just use GC_ when necessary.
Attachment #278473 -
Attachment is obsolete: true
Attachment #278555 -
Flags: review?(brendan)
Attachment #278473 -
Flags: review?(brendan)
Assignee | ||
Comment 21•17 years ago
|
||
(In reply to comment #19)
> If we really wanted the right words for a
> hierarchy of Arena/Block, we might prefer Page and Cluster or Segment.
In an initial prototype of the patch I replaced arena by page and used the former to mean "chunk" in the new patch. But then I realized that size(gc arena) != size(cpu page) in general and, to avoid confusion with CPU pages and extra renames in js_GC and other places explicitly accessing gc arenas, I kept the arena.
On the other hand, given the alignment requirements, "page" plays nicely with "chunk" and comments can easily clarify that gc page is not a cpu page. So should I rename JSGCArenaInfo a -> JSGCPageInfo page and update the comments accordingly?
Assignee | ||
Comment 22•17 years ago
|
||
jsgc.c currently uses the word "chunk" to mean "Number of things covered by a single bit of JSGCArena.unscannedThings." The new patch changes that usage into "segment" as "chunk" now means a sequence of GC arenas.
Attachment #278555 -
Attachment is obsolete: true
Attachment #278557 -
Flags: review?
Attachment #278555 -
Flags: review?(brendan)
Comment 23•17 years ago
|
||
v5b patch introduces the following new warnings:
jsgc.c(2615) : warning C4018: '<' : signed/unsigned mismatch
jsgc.c(2678) : warning C4018: '<' : signed/unsigned mismatch
The test-suite isn't working for me on Windows right now, I'll play with it in the morning.
Assignee | ||
Comment 24•17 years ago
|
||
Here is an attempt to fix the warnings reported by MSVC. I think it comes uint16->int32 promotion rule in arena->list->thingSize + 1 where thingSize is uint16 as 1 is int, not uint, literal. Thus the new patch uses explicit (uint32) 1.
Attachment #278557 -
Attachment is obsolete: true
Attachment #278560 -
Flags: review?(brendan)
Attachment #278557 -
Flags: review?
Assignee | ||
Comment 25•17 years ago
|
||
When testing on Windows, it would be nice to know how the timing with an optimized build of jsshell for following test is affected when one change 8 to 1-2-4-16 in lines 928-929, js_InitGC,
js_gcArenasPerChunk = (arenasPerPage < 8)
? 8
: (size_t) arenasPerPage;
The test:
function test()
{
var startTime = Date.now();
var i = 1000*1000;
var tmp = 0;
do {
tmp += 0.1;
} while (--i != 0);
tmp = null;
gc();
var time = Date.now() - startTime;
print("Test time: "+time+"ms");
}
test();
This will measure the overhead of mmap/VirtualAlloc on Posix/Windows.
On my 1.1Ghz Pentium M laptom the best timing from 20 runs was:
js_gcArenasPerChunk test-timing sys-timing
1 454 12
2 431 9
4 428 4
8 432 5
16 428 5
where sys-timing is the best system time as reported by
time ./Linux_All_OPT.OBJ/js ~/m/y.js > /dev/null
So I will change the default from 8 to 4 under Linux since it seems that very slight but increase when going from 4 to 8 pages is real.
Comment 26•17 years ago
|
||
I was getting results all over the map with i == 1,000,000, so I made it 10,000,000 instead (best result of ten runs of test()):
1: 2203
2: 2140
4: 2094
8: 2094
16: 2091
Alas, still getting the same compiler warnings (new line numbers):
jsgc.c(2616) : warning C4018: '<' : signed/unsigned mismatch
jsgc.c(2679) : warning C4018: '<' : signed/unsigned mismatch
Assignee | ||
Comment 27•17 years ago
|
||
(In reply to comment #25)
> js_gcArenasPerChunk test-timing sys-timing
> 1 454 12
> 2 431 9
> 4 428 4
> 8 432 5
> 16 428 5
Here is the results without the patch:
> without-the-patch 474 5
> patch with no mmap 470 6
Where "patch with no mmap" are results of the patch compiled with -DJS_GC_USE_MMAP=0.
Assignee | ||
Comment 28•17 years ago
|
||
(In reply to comment #26)
> I was getting results all over the map with i == 1,000,000, so I made it
> 10,000,000 instead (best result of ten runs of test()):
>
> 1: 2203
> 2: 2140
> 4: 2094
> 8: 2094
> 16: 2091
What about 24 and 32 and without the patch? I just want to confirm that 8 reaches plateau on Windows as well.
Assignee | ||
Comment 29•17 years ago
|
||
This should fix all the warnings.
Attachment #278560 -
Attachment is obsolete: true
Attachment #278569 -
Flags: review?(brendan)
Attachment #278560 -
Flags: review?(brendan)
Comment 30•17 years ago
|
||
> What about 24 and 32 and without the patch? I just want to confirm that 8
> reaches plateau on Windows as well.
>
24: 2150
32: 2134
Without the patch:
2088! And hovering in the mid-to-low 2090's for most runs. The new version does fix all warnings, however.
Assignee | ||
Comment 31•17 years ago
|
||
(In reply to comment #30)
> > What about 24 and 32 and without the patch? I just want to confirm that 8
> > reaches plateau on Windows as well.
> >
>
> 24: 2150
> 32: 2134
So this means that VirtualAlloc(4 CPU pages) brings the lowest overall overhead on Windows as well.
>
> Without the patch:
>
> 2088!
Interesting. It seems that allocating of 9K is extremely optimized on Windows.
Could you repeat the test with 4, 8 and 16 gc arenas per chunk and modified test case where double is replaced by object:
function test()
{
var startTime = Date.now();
var i = 10*1000*1000 / 4; // sizeof(object) = 4*sizeof(double)
var tmp = 0;
do {
tmp = {};
} while (--i != 0);
tmp = null;
gc();
var time = Date.now() - startTime;
print("Test time: "+time+"ms");
}
test();
With the patch the total memory allocation for the test should be 10% less so it would be interesting to compare with/without patch cases.
Assignee | ||
Comment 32•17 years ago
|
||
Timing with the test case from comment 31 on a 1.1 GHz Pentium M Laptop under Fedora 7:
js_gcArenasPerChunk test-timing
1 2477
2 2450
4 2433
8 2431
16 2413
24 2414
32* 2426
64* 2424
patch with no mmap 2424
no patch 2435
Now it seems 16 arenas is best when allocating objects.
* GC_ARENAS_PER_CPU_PAGE_LIMIT was altered to be 64 during the run
Comment 33•17 years ago
|
||
From comment 26, 16 arenas looks best on Windows too, right? How about Mac OS X?
/be
Comment 34•17 years ago
|
||
(In reply to comment #33)
> From comment 26, 16 arenas looks best on Windows too, right? How about Mac OS
> X?
It seems so, though I get equivalent or very, very slightly better results without the patch on Win32.
Assignee | ||
Comment 35•17 years ago
|
||
(In reply to comment #34)
> It seems so, though I get equivalent or very, very slightly better results
> without the patch on Win32.
One more question about Windows performance. What happens if GC_ARENAS_PER_CPU_PAGE_LIMIT is altered to 1000 and js_gcArenasPerChunk is forced to be 128/256 with double/object test cases?
Comment 36•17 years ago
|
||
Comment on attachment 278569 [details] [diff] [review]
v7
* from the end of their area. Such arrangement avoids calculating the
Nit: could just say "This avoids ..." and wrap better (maybe).
* Macros to decode/encode previous arena on the GC list and chunk gap
Nit: "the previous arena ..."
Couldn't you use bit-fields instead of word1 and word2 and BIG_UGLY_MACROS (I know, I started that trend long ago ;-)?
s/\<bi\>/ci/g (block=>chunk renaming fallout)
Nit: 1U is slightly easier on the eyes than (uint32) 1, and just as good.
Suggest avoiding SEGMENT and just using SPAN (spanIndex, etc.) in context of UNSCANNED_ and the like -- just as clear or moreso because shorter, less associated with large (multi-"page") runs of memory, and (yeah) shorter.
I'm ok with arena/chunk instead of page/chunk or similar. It's hard to get these names right, but we're close and arena is traditional.
Need clarity on perf at large number of OS pages per chunk values on top three platforms, and nitpicks if you can, and I'll r/a+.
/be
Assignee | ||
Comment 37•17 years ago
|
||
Here is another test that makes everything allocated reachable until the explicit GC call not to depend on the effects of the last ditch GC:
function test()
{
var startTime = Date.now();
var i = 1000*1000;
var o = null;
do {
o = {o: o};
} while (--i != 0);
o = null;
gc();
var time = Date.now() - startTime;
print("Test time: "+time+"ms");
}
test();
When I run it in Linux console with no X11 running, I got the following results on 1.1 GHz Pentium M laptop with Fedora 7 for optimized js shell
arenas per chunk time of the best among 10 runs
1 1882
2 1871
3 1873
4 1859
6 1864
8 1850
12 1858
16 1859
24 1846
32 1854
48 1864
64 1852
96 1871
128 1856
192 1855
256 1850
no mmap 1860
no patch 1881
Assignee | ||
Comment 38•17 years ago
|
||
The new patch follows suggestions from comment 36 except the bit field ides. I will try that in the next patch.
Attachment #278569 -
Attachment is obsolete: true
Attachment #278569 -
Flags: review?(brendan)
Assignee | ||
Comment 39•17 years ago
|
||
The version of the patch that uses bit fields in JSGCArenaInfo instead of explicit bit manipulations. This indeeds reduces the number of macros, adds clarity and decreases the number of ifs as the previous patch contained code like:
a = GET_PREV_ARENA(a);
if (!a)
jump;
where GET_PREV_ARENA already contained the check for NULL address.
But to proceed with bit fields, I need to verify that on Windows/Mac compilers implements the bit fields properly. So the patch contains:
JS_STATIC_ASSERT(sizeof(JSGCArenaInfo) == sizeof(jsuword) * 4);
So does it compile on Windows/Mac?
Comment 40•17 years ago
|
||
Running with the current patch under windows, I hit the following assertion:
Assertion failure: arenaList->lastCount > 0, at jsgc.c:2622
Assignee | ||
Comment 41•17 years ago
|
||
The new patch fixes the asserts in the previous bit-field patch.
Attachment #278703 -
Attachment is obsolete: true
Attachment #278747 -
Attachment is obsolete: true
Assignee | ||
Comment 42•17 years ago
|
||
Comment on attachment 279069 [details] [diff] [review]
v10 (using bit fields)
The patch passes shell's test suite and fF smoke test.
Attachment #279069 -
Flags: review?(brendan)
Assignee | ||
Comment 43•17 years ago
|
||
For performance tuning I would like to know the results for the test from comment 37 on Windows/Mac with the following values of js_gcArenasPerChunk:
1 2 4 8 16 32 64
For that just add in js_InitGC from jsgc.c after the line:
js_gcArenasPerChunk = JS_MAX((uint32) arenasPerPage, 8);
a line like
js_gcArenasPerChunk = <the desired value>;
For completeness it would be interesting to know the results without the patch and the results with mmap/VirtaulAlloc disabled. For the later change the line js_InitGC that reads:
if (arenasPerPage - 1 <= (size_t) (GC_ARENAS_PER_CPU_PAGE_LIMIT - 1)) {
into something like
if (0 && arenasPerPage - 1 <= (size_t) (GC_ARENAS_PER_CPU_PAGE_LIMIT - 1)) {
Assignee | ||
Comment 44•17 years ago
|
||
The new version fixes JS_GCMETER and uses 1K-sized arenas when mmap is not available to minimize the overhead of aligned allocation implemented through oversized malloc.
Attachment #279069 -
Attachment is obsolete: true
Attachment #279160 -
Flags: review?(brendan)
Attachment #279069 -
Flags: review?(brendan)
Comment 45•17 years ago
|
||
Using the test from comment #37,
1: 1485
2: 1422
4: 1375
8: 1390
16: 1390
32: 1375
64: 1406
With MMAP disabled:
1486
Without the patch:
1515ms (with most other results very nearby)
Status: NEW → ASSIGNED
Assignee | ||
Comment 46•17 years ago
|
||
(In reply to comment #45)
> Using the test from comment #37,
>
> 1: 1485
> 2: 1422
> 4: 1375
> 8: 1390
> 16: 1390
> 32: 1375
> 64: 1406
Since the test from comment 37 makes sure that 32MB of memory is constantly allocated until the explicit GC call, the results should be unhindered by CPU cache effects.
So I will switch the patch to use 4 CPU pages per GC chunk.
Assignee | ||
Comment 47•17 years ago
|
||
The new version uses 16K chunks with 4 4K-sized arenas.
Attachment #279160 -
Attachment is obsolete: true
Attachment #279715 -
Flags: review?(brendan)
Attachment #279160 -
Flags: review?(brendan)
Assignee | ||
Comment 48•17 years ago
|
||
In the new version I renamed gcUnscannedArenaTop back to gcUnscannedArenaStackTop to minimize the patch.
Attachment #279715 -
Attachment is obsolete: true
Attachment #279716 -
Flags: review?(brendan)
Attachment #279715 -
Flags: review?(brendan)
Assignee | ||
Comment 49•17 years ago
|
||
The new version fixes the wrong assert in FLAGP_TO_THING macro and optimizes JS_CallTracer for string and double things. There the code uses THING_TO_FLAGP(thing, thingSize) with a compile-time constant as thingSize so a compiler has more options for the optimization.
Attachment #279716 -
Attachment is obsolete: true
Attachment #280260 -
Flags: review?(brendan)
Attachment #279716 -
Flags: review?(brendan)
Assignee | ||
Comment 50•17 years ago
|
||
Cleanups and optimizations to reduce the number of ifs taken on various code paths.
Attachment #280260 -
Attachment is obsolete: true
Attachment #280265 -
Flags: review?(brendan)
Attachment #280260 -
Flags: review?(brendan)
Assignee | ||
Comment 51•17 years ago
|
||
Comment 52•17 years ago
|
||
Comment on attachment 280265 [details] [diff] [review]
v15 (with bit fields)
struct JSGCArenaInfo may be the first use of bit-fields in SpiderMonkey. Slight style preference for space before the : as well as after (aligning field sizes in a column is cool still).
Need "these" before "indexes" here:
* access either of indexes.
nfreeArenas => numFreeArenas
Nit: overflow line should underhang right operand of = here:
chunkGap = (GC_ARENA_SIZE - (chunk & GC_ARENA_MASK)) &
GC_ARENA_MASK;
Non-nit: the overflow line and final & on first line are not needed since GC_ARENA_MASK = GC_ARENA_SIZE - 1.
Too bad we can't use the extra arena allocated and wasted for alignment in the malloc-based chunk.
RemoveChunkFromList would be shorter if you used a double-indirect "prevp" -- see bug 391211 comment 18.
Suggest early return in NewGCArena for the js_gcArenasPerChunk == 1 case, to match DestroyGCArena.
Nit: "If it is fully used, ..." (missing "is") in:
* Try to allocate things from the last arena. If it fully used, check
Old nit: does "Bag" mean something useful in AddToUnscannedBag etc.? It can't mean "set of values allowing redundant elements, i.e. a multiset".
More comments in a bit, almost done.
/be
Assignee | ||
Comment 53•17 years ago
|
||
(In reply to comment #52)
> Too bad we can't use the extra arena allocated and wasted for alignment in the
> malloc-based chunk.
Well, malloc code is the last resort code when if mmap or some other alignment primitives is not available. For example, on POSIX systems (including Linux) there is posix_memalign that can be used as mmap replacement. So if on some platform that cannot use mmap one would like to avoid 3% overhead of overallocated chunk, then one can try that.
> Old nit: does "Bag" mean something useful in AddToUnscannedBag etc.? It can't
> mean "set of values allowing redundant elements, i.e. a multiset".
A "bag" here means a container where one can add an element and from which one can retrive elements one by one. So this is not a set since the set allows to test for membership and such operation is not required here. It is not even a queue since the queue assumes a particular retrieval order and the code does not need to impose any particular ordering on things in the container. On the other hand such container resembles pretty much a real bag or sack.
Comment 54•17 years ago
|
||
memalign, right -- I was messing with trace-malloc lately, I should have thought of that.
Bag => multiset in my book, so "sack" is better for me. But it's a very minor nit.
/be
Assignee | ||
Comment 55•17 years ago
|
||
The new version brings the following changes besides addressing the nits:
1. When the code uses malloc for the arena allocation, the gap between the malloc area and the first GC arena is stored outside GC chunk. It allowed to use a pointer for JSGCArenaInfo.prev field avoiding the bitfield and simplifying the code.
2. The code uses prepv method to implement the doubly-linked list of JSGCChunkInfo. My only excuse for not using it in the first place is that some initial version of the patch have used paged addresses for JSGCChunkInfo.(prev|next). With that setup it was not possible to implement the method.
3. For consistency with the rest of code AddThingToUnscannedSack and ScanDelayedChildren uses "a", not "arena" as name for a local variable holding JSGCArenaInfo pointer.
4. The new version fixed bad bug in the free phase loop in JS_GC. The code there unconditionally reset arenaList->lastCount even if the first arena arena were not freed. Too bad that the test suite does not covers that.
Attachment #280265 -
Attachment is obsolete: true
Attachment #280266 -
Attachment is obsolete: true
Attachment #280491 -
Flags: review?(brendan)
Attachment #280265 -
Flags: review?(brendan)
Comment 56•17 years ago
|
||
Comment on attachment 280491 [details] [diff] [review]
v16
Rest of v15 comments. I will re-review via interdiff a final patch and stamp it.
/be
-----
Note "bellow" misspelling:
* NewGCChunk/DestroyGCChunk bellow for details.
Backslash not indented enough in:
#define GET_ARENA_CHUNK(arena, index) \
Nit: blank line before:
rt->gcBytes += GC_ARENA_SIZE;
in js_NewGCThing.
Non-nit/question: why not do the gcBytes +/-= GC_ARENA_SIZE in New/DestroyGCArena?
Nit: rename UNSCANNED_SPAN to THINGS_PER_UNSCANNED_BIT? Not sure why I urged "SPAN" before. Let's keep the name concrete in terms of a bitmap. Sanity check: replace "span" with "bit" in:
* span as the real limit can be inside the span.
to get:
* bit as the real limit can be "inside" the bit.
(I added scare-quotes ;-) and it works for me. How about you? Further check:
* Skip free or already scanned things that share the span
becomes:
* Skip free or already scanned things that share the bit
also reads nicely.
In this more concrete light, perhaps s/Bag/Bitmap/.
Could use a "so" before "push" here:
* The thing is the first unscanned thing in the whole arena, push the
Probably want "[the] unscanned stack", not "[the] unscan stack", in:
* unscan stack since AddThingToUnscannedBag ensures that even for
and a comma after "stack" would be nice.
Nit: is it worthwhile to do explicit c.s.e. in:
} else if (endIndex > THINGS_PER_ARENA(thingSize)) {
endIndex = THINGS_PER_ARENA(thingSize);
There's a divide hiding in the macro, but modern compilers should common for us.
Typo or wrong name in this comment:
* When JS_TraceChildren from the above calls JS_Trace that in turn on
s/JS_Trace /JS_CallTracer /. Actually, grep on current source finds two such misnomers:
$ grep -w JS_Trace *.c
jsgc.c: * When JS_TraceChildren from the above calls JS_Trace that in turn
jsgc.c: * after it calls JS_Trace or JS_MarkGCThing for the last time, the
Nit: the change_list label and goto seem like a candidate for a do-while loop. But this requires a goto downward to exit the for (;;) loop. And it requires more indentation. Still, house style tries to use downward gotos but not upward ones that easily rewrite to canonical loop statements.
s/if/is/ in:
* Optimize for string and double as their size if known and their tracing
Canonical comment for:
+ /* unreachable */
uses "NOTREACHED", not "unreachable".
-END-
Comment 57•17 years ago
|
||
(In reply to comment #55)
> 4. The new version fixed bad bug in the free phase loop in JS_GC. The code
> there unconditionally reset arenaList->lastCount even if the first arena arena
> were not freed. Too bad that the test suite does not covers that.
Can you contrive a testcase?
/be
Assignee | ||
Comment 58•17 years ago
|
||
(In reply to comment #56)
> Note "bellow" misspelling:
I must learn the proper spelling by now :(
> Non-nit/question: why not do the gcBytes +/-= GC_ARENA_SIZE in
> New/DestroyGCArena?
gcBytes is irrelevant to arena allocation code so I moved its checks and updates to js_NewGCThing/js_GC/js_DestroyArenaList, the places that define GC memory pressure heuristics.
> In this more concrete light, perhaps s/Bag/Bitmap/.
Do you mean to s/Sack/Bitmap/ in the latest patch. IMO it does not read good. Compare:
static void
AddThingToUnscannedSack(JSRuntime *rt, uint8 *flagp)
...
if (RECURSION_TOO_DEEP())
AddThingToUnscannedSack(rt, flagp);
versus
static void
AddThingToUnscannedBitmap(JSRuntime *rt, uint8 *flagp)
...
if (RECURSION_TOO_DEEP())
AddThingToUnscannedBitmap(rt, flagp);
Perhaps it is better to call the function DelayChildrenScanning. It matches ScanDelayedChildren and even reads better:
static void
DelayChildrenScanning(JSRuntime *rt, uint8 *flagp)
...
if (RECURSION_TOO_DEEP())
DelayChildrenScanning(rt, flagp);
After typing this i realized that it using DelayChildrenTracing/TraceDelayedChildren would be even better:
static void
DelayChildrenScanning(JSRuntime *rt, uint8 *flagp)
...
if (RECURSION_TOO_DEEP())
DelayChildrenTracing(rt, flagp);
else
JS_TraceChildren(trc, thing, kind);
...
JS_TraceChildren(trc, thing, kind);
TraceDelayedChildren(trc);
But that would require comment rewrite.
Comment 59•17 years ago
|
||
DelayTracingChildren is the better order among the words in the identifier. I like it! Looking forward to a new (last) patch.
/be
Assignee | ||
Comment 60•17 years ago
|
||
Addressing nits and replacing (un)scanned -> (un)traced.
Now there is no words "bag" or "sack" in GC sources ;)
Attachment #280491 -
Attachment is obsolete: true
Attachment #280521 -
Flags: review?(brendan)
Attachment #280491 -
Flags: review?(brendan)
Comment 61•17 years ago
|
||
Comment on attachment 280521 [details] [diff] [review]
v17
>+ goto change_list;
> for (;;) {
>+ if (a->list != arenaList) {
>+ change_list:
>+ arenaList = a->list;
That's a downward goto, all right, but still a bit tricky. If a few cycles are not an issue here (I claim they are not), I'd prefer setting arenaList = NULL; before the for loop and letting the if do its thing.
r+a=me with that. Thanks, terrific patch.
/be
Attachment #280521 -
Flags: review?(brendan)
Attachment #280521 -
Flags: review+
Attachment #280521 -
Flags: approval1.9+
Assignee | ||
Comment 62•17 years ago
|
||
The new version skips goto and ifs in TraceDelayedChildren and unconditionally calls THINGS_PER_UNTRACED_BIT(thingSize) per each arena. This makes this rarely executed code simpler and trades storage of few temporaries for calculations.
Attachment #280521 -
Attachment is obsolete: true
Attachment #280648 -
Flags: review?(brendan)
Updated•17 years ago
|
Attachment #280648 -
Flags: review?(brendan)
Attachment #280648 -
Flags: review+
Attachment #280648 -
Flags: approval1.9+
Comment 63•17 years ago
|
||
Forgot to ask that memalign comment would be nice (about how the malloc-only case could use memalign if available).
/be
Assignee | ||
Comment 64•17 years ago
|
||
The new version contains comments suggestion to consider posix_memalign. Here is the inter-diff:
@@ -555,16 +555,19 @@ NewGCChunk()
return (p == MAP_FAILED) ? 0 : (jsuword) p;
# else
# error "Not implemented"
# endif
}
#endif
/*
+ * Implement the chunk allocation using over sized malloc if mmap can not
+ * be used. XXX Consider using posix_memalign here, see bug 396007.
+ *
* Since malloc allocates pointers aligned on the word boundary, to get
* js_gcArenasPerChunk aligned arenas, we need to malloc only
* ((js_gcArenasPerChunk + 1) << GC_ARENA_SHIFT) - sizeof(size_t)
* bytes. But since we stores the gap between the malloced pointer and the
* first arena in the chunk after the chunk, we need to ask for
* ((js_gcArenasPerChunk + 1) << GC_ARENA_SHIFT)
* bytes to ensure that we always have room to store the gap.
*/
Attachment #280648 -
Attachment is obsolete: true
Attachment #280710 -
Flags: review+
Attachment #280710 -
Flags: approval1.9+
Assignee | ||
Comment 65•17 years ago
|
||
Darmon, is it not too late for M8 check in?
Comment 66•17 years ago
|
||
New way is FIXME not XXX in such comments. Please change to that before committing. I hear M8 is done and this can go in when the tree is open for M9.
/be
Assignee | ||
Comment 67•17 years ago
|
||
Patch to check in with comments fixed. I
Attachment #280710 -
Attachment is obsolete: true
Attachment #280991 -
Flags: review+
Attachment #280991 -
Flags: approval1.9+
Assignee | ||
Updated•17 years ago
|
Attachment #280991 -
Attachment description: v1bc → v1c
Assignee | ||
Updated•17 years ago
|
Attachment #280991 -
Attachment description: v1c → v18c
Assignee | ||
Comment 68•17 years ago
|
||
Fixing one more typo.
Attachment #280991 -
Attachment is obsolete: true
Attachment #280993 -
Flags: review+
Attachment #280993 -
Flags: approval1.9+
Assignee | ||
Comment 69•17 years ago
|
||
I checked in the patch from the comment 68 to the trunk:
http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%2Fcvsroot&date=explicit&mindate=1189868942&maxdate=1189869125&who=igor%mir2.org
Status: ASSIGNED → RESOLVED
Closed: 17 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 70•17 years ago
|
||
The patch busted Mac tinderboxes. Apparently on Mac MAP_ANONYMOUS is not defined.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 71•17 years ago
|
||
The new version uses a workaround for platforms where MAP_ANONYMOUS is not defined and only deprecated MAP_ANON is available. Here is the inter-diff:
+++ js/src/jsgc.c 2007-09-15 17:37:01.000000000 +0200
@@ -96,16 +96,22 @@
# define JS_GC_USE_MMAP 0
#endif
#if JS_GC_USE_MMAP
# if defined(XP_WIN)
# include <windows.h>
# elif defined(XP_UNIX) || defined(XP_BEOS)
# include <sys/mman.h>
+
+/* On Mac OS X MAP_ANONYMOUS is not defined. */
+# if !defined(MAP_ANONYMOUS) && defined(MAP_ANON)
+# define MAP_ANONYMOUS MAP_ANON
+# endif
+
# endif
#endif
I would appreciate if somebody can check that the patch compiles on Mac.
Attachment #280993 -
Attachment is obsolete: true
Attachment #281009 -
Flags: review?(brendan)
Attachment #281009 -
Flags: approval1.9?
Comment 72•17 years ago
|
||
Comment on attachment 281009 [details] [diff] [review]
v19
That should do it.
/be
Attachment #281009 -
Flags: review?(brendan)
Attachment #281009 -
Flags: review+
Attachment #281009 -
Flags: approval1.9?
Attachment #281009 -
Flags: approval1.9+
Comment 73•17 years ago
|
||
I tested v19 on my Macbook Pro and it builds fine.
/be
Assignee | ||
Comment 74•17 years ago
|
||
(In reply to comment #73)
> I tested v19 on my Macbook Pro and it builds fine.
Thanks, I checked in the patch from comment 71 to the trunk:
http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%2Fcvsroot&date=explicit&mindate=1189876705&maxdate=1189876980&who=igor%mir2.org
Status: REOPENED → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 75•17 years ago
|
||
I will be away until 23:00 UTC / 16:00 PDT so if tinderbox testing reveals troubles, then the link in comments 74 should allow to take the patch away.
Comment 76•17 years ago
|
||
It appears this patch bumped up Lk on fxdbug-linux-tbox by about .25MB
Comment 77•17 years ago
|
||
The jump's filed as bug 396299
Assignee | ||
Comment 78•17 years ago
|
||
I took out the patch again to investigate the increased leak.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Assignee | ||
Comment 79•17 years ago
|
||
I rechecked-in the patch from comment 71 to the trunk as taking it out had no influence on the leak counters:
http://bonsai.mozilla.org/cvsquery.cgi?module=PhoenixTinderbox&branch=HEAD&cvsroot=%2Fcvsroot&date=explicit&mindate=1189947780&maxdate=1189947840&who=igor%mir2.org
Status: REOPENED → RESOLVED
Closed: 17 years ago → 17 years ago
Resolution: --- → FIXED
Updated•6 years ago
|
Component: DOM → DOM: Core & HTML
You need to log in
before you can comment on or make changes to this bug.
Description
•