Closed Bug 508707 Opened 15 years ago Closed 13 years ago

Use one single GC heap chunk, avoiding frequent mmap and malloc calls

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking


RESOLVED WONTFIX

People

(Reporter: gal, Assigned: gwagner)

References

Details

Attachments

(1 file, 9 obsolete files)

No description provided.
Attached patch patch (obsolete) (deleted) — Splinter Review
Assignee: general → gal
The attached patch allocates the GC heap in a single chunk. This allows very fast identification of GC things, which is handy for conservative stack scanning. On anything other than Mac OS X, pages are only touched if we actually need that much GC heap. Pages are never deallocated. This is a 20ms win on SunSpider on Mac, mostly due to the memset, which avoids Mac OS X page-fault badness during SunSpider execution. It would be great if someone could measure this on win32 or Linux. Even if there is no speedup on win32 or Linux, I think we should seriously consider taking this patch: it removes a lot of complexity (and code) from jsgc, and relying on virtual memory in the year 2009 seems appropriate.
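For illustration, the "very fast identification of GC things" that a single contiguous chunk enables is just a two-comparison range check. This is a hedged sketch; the struct and names (GCHeap, gc_base, gc_end, is_gc_thing) are illustrative, not the patch's actual identifiers.

```c
#include <stdint.h>

/* Sketch: with the whole GC heap in one contiguous chunk, deciding
 * whether an arbitrary pointer (e.g. a word found by conservative
 * stack scanning) might point into the heap is a single range check,
 * with no chunk lists or hash lookups involved. */
typedef struct {
    uintptr_t gc_base;  /* start of the single heap chunk */
    uintptr_t gc_end;   /* one past its end */
} GCHeap;

static int is_gc_thing(const GCHeap *heap, const void *ptr) {
    uintptr_t p = (uintptr_t)ptr;
    return heap->gc_base <= p && p < heap->gc_end;
}
```

With many small mmap'ed chunks, the same query requires walking or indexing all chunks; the single-chunk design reduces it to two compares.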
I tried to compare on Linux but the patch has bit-rotted. Can you rebase and repost?
I got out-of-memory errors with this patch.

@@ -3736,12 +3271,6 @@ js_GC(JSContext *cx, JSGCInvocationKind
      */
     js_SweepScriptFilenames(rt);
-    /*
-     * Destroy arenas after we finished the sweeping so finalizers can safely
-     * use js_IsAboutToBeFinalized().
-     */
-    DestroyGCArenas(rt, emptyArenas);
-

The "allClear" state in js_GC has to be changed, since empty arenas are not destroyed any more. A quick fix is to remove the allClear state.
Attached patch clean version (obsolete) (deleted) — Splinter Review
Another version of the previous patch. No changes, just a new version that shouldn't result in conflicts.
Attachment #394165 - Attachment is patch: true
Attachment #394165 - Attachment mime type: application/octet-stream → text/plain
Attached patch fixed out of memory error (obsolete) (deleted) — Splinter Review
Fix for the out-of-memory error: added an emptyArenas list. An arena is added to the list when it becomes empty during js_GC; NewGCArena takes an arena from the list, if one is available, instead of creating a new arena.
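The recycling scheme this comment describes can be sketched as a bump-pointer allocator backed by a free list of emptied arenas. This is a minimal illustration under assumed names (Heap, new_arena, free_arena, ARENA_SIZE); the real NewGCArena in jsgc.cpp carries more state.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

#define ARENA_SIZE 4096u   /* matches GC_ARENA_SIZE = 1 << 12 */

typedef struct Arena {
    struct Arena *prev;    /* link in the empty-arena list */
} Arena;

typedef struct {
    uintptr_t cursor;      /* next unallocated arena in the chunk */
    uintptr_t limit;       /* end of the chunk */
    Arena    *empty;       /* arenas emptied by the last js_GC */
} Heap;

static Arena *new_arena(Heap *h) {
    if (h->empty) {                        /* reuse an emptied arena first */
        Arena *a = h->empty;
        h->empty = a->prev;
        return a;
    }
    if (h->cursor + ARENA_SIZE > h->limit) /* single chunk exhausted */
        return NULL;
    Arena *a = (Arena *)h->cursor;         /* bump-allocate a fresh arena */
    h->cursor += ARENA_SIZE;
    return a;
}

static void free_arena(Heap *h, Arena *a) {
    a->prev = h->empty;                    /* push onto the empty list; */
    h->empty = a;                          /* memory is never unmapped  */
}
```

The key property is that arenas are never returned to the OS: "freeing" only pushes onto the list, so gcBytes can only grow via the bump pointer.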
Attachment #394165 - Attachment is obsolete: true
TEST                   COMPARISON            FROM                 TO              DETAILS

=============================================================================

** TOTAL **:           1.028x as fast    872.9ms +/- 0.1%   848.9ms +/- 0.1%     significant

=============================================================================

  3d:                  1.053x as fast    135.7ms +/- 0.2%   128.9ms +/- 0.2%     significant
    cube:              1.106x as fast     38.1ms +/- 0.6%    34.5ms +/- 0.5%     significant
    morph:             1.092x as fast     27.9ms +/- 0.5%    25.5ms +/- 0.6%     significant
    raytrace:          1.012x as fast     69.7ms +/- 0.2%    68.8ms +/- 0.2%     significant

  access:              1.039x as fast    130.8ms +/- 0.1%   126.0ms +/- 0.4%     significant
    binary-trees:      1.099x as fast     38.9ms +/- 0.2%    35.4ms +/- 0.4%     significant
    fannkuch:          *1.037x as slow*   54.0ms +/- 0.2%    56.0ms +/- 0.6%     significant
    nbody:             1.171x as fast     24.6ms +/- 0.6%    21.0ms +/- 0.8%     significant
    nsieve:            *1.017x as slow*   13.3ms +/- 1.0%    13.5ms +/- 1.1%     significant

  bitops:              1.044x as fast     34.5ms +/- 0.6%    33.0ms +/- 0.7%     significant
    3bit-bits-in-byte: -                   1.6ms +/- 9.1%     1.5ms +/- 9.8%
    bits-in-byte:      ??                  8.1ms +/- 1.1%     8.1ms +/- 1.1%     not conclusive: might be *1.002x as slow*
    bitwise-and:       -                   2.3ms +/- 5.8%     2.2ms +/- 5.2%
    nsieve-bits:       1.059x as fast     22.5ms +/- 0.6%    21.3ms +/- 0.6%     significant

  controlflow:         *1.026x as slow*   32.2ms +/- 0.6%    33.1ms +/- 0.4%     significant
    recursive:         *1.026x as slow*   32.2ms +/- 0.6%    33.1ms +/- 0.4%     significant

  crypto:              -                  51.6ms +/- 0.8%    51.3ms +/- 1.2%
    aes:               1.020x as fast     28.9ms +/- 1.2%    28.4ms +/- 1.6%     significant
    md5:               *1.017x as slow*   14.5ms +/- 1.0%    14.7ms +/- 1.6%     significant
    sha1:              ??                  8.2ms +/- 1.4%     8.2ms +/- 1.6%     not conclusive: might be *1.002x as slow*

  date:                1.049x as fast    130.5ms +/- 0.1%   124.3ms +/- 0.3%     significant
    format-tofte:      1.049x as fast     62.4ms +/- 0.2%    59.5ms +/- 0.4%     significant
    format-xparb:      1.050x as fast     68.1ms +/- 0.2%    64.8ms +/- 0.2%     significant

  math:                *1.010x as slow*   29.6ms +/- 0.5%    29.9ms +/- 0.6%     significant
    cordic:            ??                 10.5ms +/- 1.4%    10.6ms +/- 1.3%     not conclusive: might be *1.010x as slow*
    partial-sums:      *1.011x as slow*   13.1ms +/- 0.7%    13.2ms +/- 0.9%     significant
    spectral-norm:     ??                  6.0ms +/- 0.7%     6.1ms +/- 1.3%     not conclusive: might be *1.010x as slow*

  regexp:              ??                 39.4ms +/- 0.4%    39.5ms +/- 0.4%     not conclusive: might be *1.003x as slow*
    dna:               ??                 39.4ms +/- 0.4%    39.5ms +/- 0.4%     not conclusive: might be *1.003x as slow*

  string:              1.020x as fast    288.5ms +/- 0.2%   282.9ms +/- 0.2%     significant
    base64:            1.054x as fast     17.1ms +/- 0.5%    16.2ms +/- 1.1%     significant
    fasta:             1.044x as fast     60.2ms +/- 0.2%    57.6ms +/- 0.2%     significant
    tagcloud:          ??                 91.2ms +/- 0.2%    91.3ms +/- 0.3%     not conclusive: might be *1.001x as slow*
    unpack-code:       1.021x as fast     92.1ms +/- 0.4%    90.2ms +/- 0.3%     significant
    validate-input:    1.015x as fast     28.0ms +/- 0.2%    27.6ms +/- 0.7%     significant
Attached patch patch (obsolete) (deleted) — Splinter Review
Attachment #392835 - Attachment is obsolete: true
Attachment #394185 - Attachment is obsolete: true
Attachment #395198 - Flags: review?(igor)
I will review the patch tomorrow, 2009-08-20.
Attached patch patch (obsolete) (deleted) — Splinter Review
Attachment #395198 - Attachment is obsolete: true
Attachment #395441 - Flags: review?(igor)
Attachment #395198 - Flags: review?(igor)
I guess the instrumentation variable name for arenas that are returned to the global free list should be changed; METER(nkilledarenas++); is misleading.
trace-test.js fails with an out of memory error in:

#0  js_ReportOutOfMemory (cx=0x30a990) at ../jscntxt.cpp:1317
#1  0x00022211 in JS_ReportOutOfMemory (cx=0x30a990) at ../jsapi.cpp:5630
#2  0x0006ca70 in js_AddAsGCBytes (cx=0x30a990, sz=5760016) at ../jsgc.cpp:1820

Values in js_AddAsGCBytes are:

    if (rt->gcBytes >= rt->gcMaxBytes ||
        sz > (size_t) (rt->gcMaxBytes - rt->gcBytes)) {
        fprintf(stderr, "sz: %d, gcBytes: %d, max: %d, diff: %d\n",
                sz, rt->gcBytes, rt->gcMaxBytes,
                (size_t)(rt->gcMaxBytes - rt->gcBytes));

sz: 5760016, gcBytes: 67113264, max: 67108864, diff: -4400
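The arithmetic in that trace can be reproduced in isolation; this is a hedged sketch (would_report_oom is a hypothetical name mirroring the shape of the check in js_AddAsGCBytes, not the real function). Once gcBytes grows past gcMaxBytes, the unsigned difference gcMaxBytes - gcBytes wraps to a huge value rather than going negative; printing it with %d is what makes it look like -4400.

```c
#include <stddef.h>

/* Sketch of the OOM check from the trace above. With arenas never
 * destroyed, gc_bytes only grows, so the first condition eventually
 * trips for any allocation, and the unsigned subtraction in the
 * second condition silently wraps. */
static int would_report_oom(size_t gc_bytes, size_t gc_max_bytes, size_t sz) {
    return gc_bytes >= gc_max_bytes ||
           sz > (size_t)(gc_max_bytes - gc_bytes);
}
```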
The AddAsGCBytes API is completely borked. We have an approved patch to remove it. I will try to land the RSS patch first; that will take care of this problem.
Comment on attachment 395441 [details] [diff] [review]
patch

> static void
> FinishGCArenaLists(JSRuntime *rt)
> {
...
>
>+    rt->gcPtr = rt->gcBase;
>     rt->gcBytes = 0;
>-    JS_ASSERT(rt->gcChunkList == 0);

I do not see where the patch releases rt->gcBase. It can be done here, with rt->gcBase set to NULL after calling free. r+ with this fixed.
(In reply to comment #12)

Since this patch would make my life easier with conservative stack walking, I was looking at the problem again. We fill up the heap with arenas and only increase rt->gcBytes, since we don't delete arenas any more. The problem occurs with a very big (5760016-byte) enumerator created in js_Enumerate; its size is added to gcBytes with AddAsGCBytes.
Attached patch patch (obsolete) (deleted) — Splinter Review
Attachment #395441 - Attachment is obsolete: true
Attachment #395441 - Flags: review?(igor)
Attached patch removed AddAsGCBytes (obsolete) (deleted) — Splinter Review
Attachment #399609 - Attachment is obsolete: true
Summary: TM: Allocate GC heap in one chunk → Use one single GC heap chunk, avoiding frequent mmap and malloc calls
Blocks: 505308
Comment on attachment 399618 [details] [diff] [review]
removed AddAsGCBytes

>diff -r 465737c189fc js/src/jscntxt.h
>--- a/js/src/jscntxt.h	Wed Sep 09 16:51:46 2009 -0500
>+++ b/js/src/jscntxt.h	Wed Sep 09 16:01:57 2009 -0700
>@@ -395,18 +395,21 @@ struct JSRuntime {
>      * See bug 492355 for more details.
>      *
>      * This comes early in JSRuntime to minimize the immediate format used by
>      * trace-JITted code that reads it.
>      */
>     uint32              protoHazardShape;
>
>     /* Garbage collector state, used by jsgc.c. */
>-    JSGCChunkInfo       *gcChunkList;
>+    uintptr_t           gcBase;
>+    uintptr_t           gcPtr;
>+    uintptr_t           gcEnd;

A nit, but common naming conventions help a little: the convention applied to gcPtr would yield gcCursor, and gcEnd would be gcLimit. A bit longer, but that fits the member-name style in general.

Similar style-mongering:

+    inline bool IsGCThing(void *ptr) {
+        return uintptr_t(ptr) >= gcBase && uintptr_t(ptr) < gcEnd;
+    }

The canonical name for ptr here is "thing".

jsgc.{cpp,h} use jsuword, not uintptr_t. We can switch, but let's do it all at once, or not at all, since we have better things to do and jsuword is the right-sized type on all platforms already.

This looks better with number-line order:

+    inline bool IsGCThing(void *ptr) {
+        return gcBase <= uintptr_t(ptr) && uintptr_t(ptr) < gcLimit;
+    }

As Igor did for JSString::isStatic, you should also JS_ASSERT((uintptr_t(thing) & JSVAL_TAGMASK) == 0).

/be
Comment on attachment 399618 [details] [diff] [review] removed AddAsGCBytes >diff -r 465737c189fc js/src/jscntxt.h >--- a/js/src/jscntxt.h Wed Sep 09 16:51:46 2009 -0500 >+++ b/js/src/jscntxt.h Wed Sep 09 16:01:57 2009 -0700 >@@ -395,18 +395,21 @@ struct JSRuntime { > * See bug 492355 for more details. > * > * This comes early in JSRuntime to minimize the immediate format used by > * trace-JITted code that reads it. > */ > uint32 protoHazardShape; > > /* Garbage collector state, used by jsgc.c. */ >- JSGCChunkInfo *gcChunkList; >+ uintptr_t gcBase; >+ uintptr_t gcPtr; >+ uintptr_t gcEnd; > JSGCArenaList gcArenaList[GC_NUM_FREELISTS]; >+ JSGCArenaInfo *emptyArenas; > JSGCDoubleArenaList gcDoubleArenaList; > JSDHashTable gcRootsHash; > JSDHashTable *gcLocksHash; > jsrefcount gcKeepAtoms; > size_t gcBytes; > size_t gcLastBytes; > size_t gcMaxBytes; > size_t gcMaxMallocBytes; >@@ -414,16 +417,20 @@ struct JSRuntime { > uint32 gcLevel; > uint32 gcNumber; > JSTracer *gcMarkingTracer; > uint32 gcTriggerFactor; > size_t gcTriggerBytes; > volatile JSBool gcIsNeeded; > volatile JSBool gcFlushCodeCaches; > >+ inline bool IsGCThing(void *ptr) { >+ return uintptr_t(ptr) >= gcBase && uintptr_t(ptr) < gcEnd; >+ } >+ > /* > * NB: do not pack another flag here by claiming gcPadding unless the new > * flag is written only by the GC thread. Atomic updates to packed bytes > * are not guaranteed, so stores issued by one thread may be lost due to > * unsynchronized read-modify-write cycles on other threads. > */ > JSPackedBool gcPoke; > JSPackedBool gcRunning; >diff -r 465737c189fc js/src/jsgc.cpp >--- a/js/src/jsgc.cpp Wed Sep 09 16:51:46 2009 -0500 >+++ b/js/src/jsgc.cpp Wed Sep 09 16:01:57 2009 -0700 >@@ -101,56 +101,16 @@ > */ > #ifdef MOZ_MEMORY > extern "C" { > #include "../../memory/jemalloc/jemalloc.h" > } > #endif > > /* >- * Include the headers for mmap unless we have posix_memalign and do not >- * insist on mmap. 
>- */ >-#if JS_GC_USE_MMAP || (!defined JS_GC_USE_MMAP && !HAS_POSIX_MEMALIGN) >-# if defined(XP_WIN) >-# ifndef JS_GC_USE_MMAP >-# define JS_GC_USE_MMAP 1 >-# endif >-# include <windows.h> >-# elif defined(__SYMBIAN32__) >-// Symbian's OpenC has mmap (and #defines _POSIX_MAPPED_FILES), but >-// doesn't implement MAP_ANON. If we have MOZ_MEMORY, then we can use >-// posix_memalign; we've defined HAS_POSIX_MEMALIGN above. Otherwise, >-// we overallocate. >-# else >-# if defined(XP_UNIX) || defined(XP_BEOS) >-# include <unistd.h> >-# endif >-# if _POSIX_MAPPED_FILES > 0 >-# ifndef JS_GC_USE_MMAP >-# define JS_GC_USE_MMAP 1 >-# endif >-# include <sys/mman.h> >- >-/* On Mac OS X MAP_ANONYMOUS is not defined. */ >-# if !defined(MAP_ANONYMOUS) && defined(MAP_ANON) >-# define MAP_ANONYMOUS MAP_ANON >-# endif >-# if !defined(MAP_ANONYMOUS) >-# define MAP_ANONYMOUS 0 >-# endif >-# else >-# if JS_GC_USE_MMAP >-# error "JS_GC_USE_MMAP is set when mmap is not available" >-# endif >-# endif >-# endif >-#endif >- >-/* > * Check JSTempValueUnion has the size of jsval and void * so we can > * reinterpret jsval as void* GC-thing pointer and use JSTVU_SINGLE for > * different GC-things. > */ > JS_STATIC_ASSERT(sizeof(JSTempValueUnion) == sizeof(jsval)); > JS_STATIC_ASSERT(sizeof(JSTempValueUnion) == sizeof(void *)); > > /* >@@ -228,124 +188,43 @@ JS_STATIC_ASSERT(JSVAL_NULL == 0); > * (GC_ARENA_SIZE - sizeof(JSGCArenaInfo)) / (thingSize + 1). > * > * To allocate doubles we use a specialized arena. It can contain only numbers > * so we do not need the type bits. Moreover, since the doubles do not require > * a finalizer and very few of them are locked via js_LockGCThing API, we use > * just one bit of flags per double to denote if it was marked during the > * marking phase of the GC. The locking is implemented via a hash table. Thus > * for doubles the flag area becomes a bitmap. >- * >- * JS_GC_USE_MMAP macro governs the choice of the aligned arena allocator. 
>- * When it is true, a platform-dependent function like mmap is used to get >- * memory aligned on CPU page boundaries. If the macro is false or undefined, >- * posix_memalign is used when available. Otherwise the code uses malloc to >- * over-allocate a chunk with js_gcArenasPerChunk aligned arenas. The >- * approximate space overhead of this is 1/js_gcArenasPerChunk. For details, >- * see NewGCChunk/DestroyGCChunk below. >- * >- * The code also allocates arenas in chunks when JS_GC_USE_MMAP is 1 to >- * minimize the overhead of mmap/munmap. In this case js_gcArenasPerChunk can >- * not be a compile-time constant as the system page size is not known until >- * runtime. > */ >-#if JS_GC_USE_MMAP >-static uint32 js_gcArenasPerChunk = 0; >-static JSBool js_gcUseMmap = JS_FALSE; >-#elif HAS_POSIX_MEMALIGN >-# define js_gcArenasPerChunk 1 >-#else >-# define js_gcArenasPerChunk 7 >-#endif >- >-#if defined(js_gcArenasPerChunk) && js_gcArenasPerChunk == 1 >-# define CHUNKED_ARENA_ALLOCATION 0 >-#else >-# define CHUNKED_ARENA_ALLOCATION 1 >-#endif > > #define GC_ARENA_SHIFT 12 > #define GC_ARENA_MASK ((jsuword) JS_BITMASK(GC_ARENA_SHIFT)) > #define GC_ARENA_SIZE JS_BIT(GC_ARENA_SHIFT) > >-/* >- * JS_GC_ARENA_PAD defines the number of bytes to pad JSGCArenaInfo structure. >- * It is used to improve allocation efficiency when using posix_memalign. If >- * malloc's implementation uses internal headers, then calling >- * >- * posix_memalign(&p, GC_ARENA_SIZE, GC_ARENA_SIZE * js_gcArenasPerChunk) >- * >- * in a sequence leaves holes between allocations of the size GC_ARENA_SIZE >- * due to the need to fit headers. JS_GC_ARENA_PAD mitigates that so the code >- * calls >- * >- * posix_memalign(&p, GC_ARENA_SIZE, >- * GC_ARENA_SIZE * js_gcArenasPerChunk - JS_GC_ARENA_PAD) >- * >- * When JS_GC_ARENA_PAD is equal or greater than the number of words in the >- * system header, the system can pack all allocations together without holes. 
>- * >- * With JS_GC_USE_MEMALIGN we want at least 2 word pad unless posix_memalign >- * comes from jemalloc that does not use any headers/trailers. >- */ >-#ifndef JS_GC_ARENA_PAD >-# if HAS_POSIX_MEMALIGN && !MOZ_MEMORY >-# define JS_GC_ARENA_PAD (2 * JS_BYTES_PER_WORD) >-# else >-# define JS_GC_ARENA_PAD 0 >-# endif >-#endif >- > struct JSGCArenaInfo { > /* > * Allocation list for the arena or NULL if the arena holds double values. > */ > JSGCArenaList *list; > > /* > * Pointer to the previous arena in a linked list. The arena can either > * belong to one of JSContext.gcArenaList lists or, when it does not have > * any allocated GC things, to the list of free arenas in the chunk with > * head stored in JSGCChunkInfo.lastFreeArena. > */ > JSGCArenaInfo *prev; > >-#if !CHUNKED_ARENA_ALLOCATION > jsuword prevUntracedPage; >-#else >- /* >- * A link field for the list of arenas with marked but not yet traced >- * things. The field is encoded as arena's page to share the space with >- * firstArena and arenaIndex fields. >- */ >- jsuword prevUntracedPage : JS_BITS_PER_WORD - GC_ARENA_SHIFT; >- >- /* >- * When firstArena is false, the index of arena in the chunk. When >- * firstArena is true, the index of a free arena holding JSGCChunkInfo or >- * NO_FREE_ARENAS if there are no free arenas in the chunk. >- * >- * GET_ARENA_INDEX and GET_CHUNK_INFO_INDEX are convenience macros to >- * access either of indexes. >- */ >- jsuword arenaIndex : GC_ARENA_SHIFT - 1; >- >- /* Flag indicating if the arena is the first in the chunk. */ >- jsuword firstArena : 1; >-#endif > > union { > jsuword untracedThings; /* bitset for fast search of marked > but not yet traced things */ > JSBool hasMarkedDoubles; /* the arena has marked doubles */ > } u; >- >-#if JS_GC_ARENA_PAD != 0 >- uint8 pad[JS_GC_ARENA_PAD]; >-#endif > }; > > /* > * Verify that the bit fields are indeed shared and JSGCArenaInfo is as small > * as possible. 
The code does not rely on this check so if on a particular > * platform this does not compile, then, as a workaround, comment the assert > * out and submit a bug report. > */ >@@ -376,69 +255,16 @@ JS_STATIC_ASSERT(offsetof(JSGCArenaInfo, > #define ARENA_INFO_TO_PAGE(arena) \ > (JS_ASSERT(IS_ARENA_INFO_ADDRESS(arena)), \ > ((jsuword) (arena) >> GC_ARENA_SHIFT)) > > #define GET_ARENA_INFO(chunk, index) \ > (JS_ASSERT((index) < js_gcArenasPerChunk), \ > ARENA_START_TO_INFO(chunk + ((index) << GC_ARENA_SHIFT))) > >-#if CHUNKED_ARENA_ALLOCATION >-/* >- * Definitions for allocating arenas in chunks. >- * >- * All chunks that have at least one free arena are put on the doubly-linked >- * list with the head stored in JSRuntime.gcChunkList. JSGCChunkInfo contains >- * the head of the chunk's free arena list together with the link fields for >- * gcChunkList. >- * >- * Structure stored in one of chunk's free arenas. GET_CHUNK_INFO_INDEX gives >- * the index of this arena. When all arenas in the chunk are used, it is >- * removed from the list and the index is set to NO_FREE_ARENAS indicating >- * that the chunk is not on gcChunkList and has no JSGCChunkInfo available. >- */ >- >-struct JSGCChunkInfo { >- JSGCChunkInfo **prevp; >- JSGCChunkInfo *next; >- JSGCArenaInfo *lastFreeArena; >- uint32 numFreeArenas; >-}; >- >-#define NO_FREE_ARENAS JS_BITMASK(GC_ARENA_SHIFT - 1) >- >-#ifdef js_gcArenasPerChunk >-JS_STATIC_ASSERT(1 <= js_gcArenasPerChunk && >- js_gcArenasPerChunk <= NO_FREE_ARENAS); >-#endif >- >-#define GET_ARENA_CHUNK(arena, index) \ >- (JS_ASSERT(GET_ARENA_INDEX(arena) == index), \ >- ARENA_INFO_TO_START(arena) - ((index) << GC_ARENA_SHIFT)) >- >-#define GET_ARENA_INDEX(arena) \ >- ((arena)->firstArena ? 
0 : (uint32) (arena)->arenaIndex) >- >-#define GET_CHUNK_INFO_INDEX(chunk) \ >- ((uint32) ARENA_START_TO_INFO(chunk)->arenaIndex) >- >-#define SET_CHUNK_INFO_INDEX(chunk, index) \ >- (JS_ASSERT((index) < js_gcArenasPerChunk || (index) == NO_FREE_ARENAS), \ >- (void) (ARENA_START_TO_INFO(chunk)->arenaIndex = (jsuword) (index))) >- >-#define GET_CHUNK_INFO(chunk, infoIndex) \ >- (JS_ASSERT(GET_CHUNK_INFO_INDEX(chunk) == (infoIndex)), \ >- JS_ASSERT((uint32) (infoIndex) < js_gcArenasPerChunk), \ >- (JSGCChunkInfo *) ((chunk) + ((infoIndex) << GC_ARENA_SHIFT))) >- >-#define CHUNK_INFO_TO_INDEX(ci) \ >- GET_ARENA_INDEX(ARENA_START_TO_INFO((jsuword)ci)) >- >-#endif >- > /* > * Macros for GC-thing operations. > */ > #define THINGS_PER_ARENA(thingSize) \ > ((GC_ARENA_SIZE - (uint32) sizeof(JSGCArenaInfo)) / ((thingSize) + 1U)) > > #define THING_TO_ARENA(thing) \ > (JS_ASSERT(!JSString::isStatic((JSString *) (thing))), \ >@@ -817,279 +643,38 @@ ShrinkPtrTable(JSPtrTable *table, const > #else > # define METER(x) ((void) 0) > # define METER_IF(condition, x) ((void) 0) > #endif > > #define METER_UPDATE_MAX(maxLval, rval) \ > METER_IF((maxLval) < (rval), (maxLval) = (rval)) > >-#if JS_GC_USE_MMAP || !HAS_POSIX_MEMALIGN >- >-/* >- * For chunks allocated via over-sized malloc, get a pointer to store the gap >- * between the malloc's result and the first arena in the chunk. >- */ >-static uint32 * >-GetMallocedChunkGapPtr(jsuword chunk) >-{ >- JS_ASSERT((chunk & GC_ARENA_MASK) == 0); >- >- /* Use the memory after the chunk, see NewGCChunk for details. 
*/ >- return (uint32 *) (chunk + (js_gcArenasPerChunk << GC_ARENA_SHIFT)); >-} >- >-#endif >- >-static jsuword >-NewGCChunk(void) >-{ >- void *p; >- >-#if JS_GC_USE_MMAP >- if (js_gcUseMmap) { >-# if defined(XP_WIN) >- p = VirtualAlloc(NULL, js_gcArenasPerChunk << GC_ARENA_SHIFT, >- MEM_COMMIT | MEM_RESERVE, PAGE_READWRITE); >- return (jsuword) p; >-# else >- p = mmap(NULL, js_gcArenasPerChunk << GC_ARENA_SHIFT, >- PROT_READ | PROT_WRITE, MAP_PRIVATE | MAP_ANONYMOUS, -1, 0); >- return (p == MAP_FAILED) ? 0 : (jsuword) p; >-# endif >- } >-#endif >- >-#if HAS_POSIX_MEMALIGN >- if (0 != posix_memalign(&p, GC_ARENA_SIZE, >- GC_ARENA_SIZE * js_gcArenasPerChunk - >- JS_GC_ARENA_PAD)) { >- return 0; >- } >- return (jsuword) p; >-#else >- /* >- * Implement chunk allocation using oversized malloc if mmap and >- * posix_memalign are not available. >- * >- * Since malloc allocates pointers aligned on the word boundary, to get >- * js_gcArenasPerChunk aligned arenas, we need to malloc only >- * >- * ((js_gcArenasPerChunk + 1) << GC_ARENA_SHIFT) - sizeof(size_t) >- * >- * bytes. But since we stores the gap between the malloced pointer and the >- * first arena in the chunk after the chunk, we need to ask for >- * >- * ((js_gcArenasPerChunk + 1) << GC_ARENA_SHIFT) >- * >- * bytes to ensure that we always have room to store the gap. 
>- */ >- p = js_malloc((js_gcArenasPerChunk + 1) << GC_ARENA_SHIFT); >- if (!p) >- return 0; >- >- { >- jsuword chunk; >- >- chunk = ((jsuword) p + GC_ARENA_MASK) & ~GC_ARENA_MASK; >- *GetMallocedChunkGapPtr(chunk) = (uint32) (chunk - (jsuword) p); >- return chunk; >- } >-#endif >-} >- >-static void >-DestroyGCChunk(jsuword chunk) >-{ >- JS_ASSERT((chunk & GC_ARENA_MASK) == 0); >-#if JS_GC_USE_MMAP >- if (js_gcUseMmap) { >-# if defined(XP_WIN) >- VirtualFree((void *) chunk, 0, MEM_RELEASE); >-# elif defined(SOLARIS) >- munmap((char *) chunk, js_gcArenasPerChunk << GC_ARENA_SHIFT); >-# else >- munmap((void *) chunk, js_gcArenasPerChunk << GC_ARENA_SHIFT); >-# endif >- return; >- } >-#endif >- >-#if HAS_POSIX_MEMALIGN >- js_free((void *) chunk); >-#else >- /* See comments in NewGCChunk. */ >- JS_ASSERT(*GetMallocedChunkGapPtr(chunk) < GC_ARENA_SIZE); >- js_free((void *) (chunk - *GetMallocedChunkGapPtr(chunk))); >-#endif >-} >- >-#if CHUNKED_ARENA_ALLOCATION >- >-static void >-AddChunkToList(JSRuntime *rt, JSGCChunkInfo *ci) >-{ >- ci->prevp = &rt->gcChunkList; >- ci->next = rt->gcChunkList; >- if (rt->gcChunkList) { >- JS_ASSERT(rt->gcChunkList->prevp == &rt->gcChunkList); >- rt->gcChunkList->prevp = &ci->next; >- } >- rt->gcChunkList = ci; >-} >- >-static void >-RemoveChunkFromList(JSRuntime *rt, JSGCChunkInfo *ci) >-{ >- *ci->prevp = ci->next; >- if (ci->next) { >- JS_ASSERT(ci->next->prevp == &ci->next); >- ci->next->prevp = ci->prevp; >- } >-} >- >-#endif >- > static JSGCArenaInfo * > NewGCArena(JSRuntime *rt) > { >- jsuword chunk; > JSGCArenaInfo *a; > >- if (rt->gcBytes >= rt->gcMaxBytes) >- return NULL; >- >-#if CHUNKED_ARENA_ALLOCATION >- if (js_gcArenasPerChunk == 1) { >-#endif >- chunk = NewGCChunk(); >- if (chunk == 0) >+ if (rt->emptyArenas) { >+ a = rt->emptyArenas; >+ rt->emptyArenas = a->prev; >+ } else { >+ uintptr_t arena = rt->gcPtr; >+ if (arena + GC_ARENA_SIZE > rt->gcEnd) > return NULL; >- a = ARENA_START_TO_INFO(chunk); >-#if 
CHUNKED_ARENA_ALLOCATION >- } else { >- JSGCChunkInfo *ci; >- uint32 i; >- JSGCArenaInfo *aprev; >- >- ci = rt->gcChunkList; >- if (!ci) { >- chunk = NewGCChunk(); >- if (chunk == 0) >- return NULL; >- JS_ASSERT((chunk & GC_ARENA_MASK) == 0); >- a = GET_ARENA_INFO(chunk, 0); >- a->firstArena = JS_TRUE; >- a->arenaIndex = 0; >- aprev = NULL; >- i = 0; >- do { >- a->prev = aprev; >- aprev = a; >- ++i; >- a = GET_ARENA_INFO(chunk, i); >- a->firstArena = JS_FALSE; >- a->arenaIndex = i; >- } while (i != js_gcArenasPerChunk - 1); >- ci = GET_CHUNK_INFO(chunk, 0); >- ci->lastFreeArena = aprev; >- ci->numFreeArenas = js_gcArenasPerChunk - 1; >- AddChunkToList(rt, ci); >- } else { >- JS_ASSERT(ci->prevp == &rt->gcChunkList); >- a = ci->lastFreeArena; >- aprev = a->prev; >- if (!aprev) { >- JS_ASSERT(ci->numFreeArenas == 1); >- JS_ASSERT(ARENA_INFO_TO_START(a) == (jsuword) ci); >- RemoveChunkFromList(rt, ci); >- chunk = GET_ARENA_CHUNK(a, GET_ARENA_INDEX(a)); >- SET_CHUNK_INFO_INDEX(chunk, NO_FREE_ARENAS); >- } else { >- JS_ASSERT(ci->numFreeArenas >= 2); >- JS_ASSERT(ARENA_INFO_TO_START(a) != (jsuword) ci); >- ci->lastFreeArena = aprev; >- ci->numFreeArenas--; >- } >- } >+ rt->gcPtr = arena + GC_ARENA_SIZE; >+ rt->gcBytes += GC_ARENA_SIZE; >+ a = ARENA_START_TO_INFO(arena); > } >-#endif >- >- rt->gcBytes += GC_ARENA_SIZE; > a->prevUntracedPage = 0; > memset(&a->u, 0, sizeof(a->u)); >- > return a; > } > > static void >-DestroyGCArenas(JSRuntime *rt, JSGCArenaInfo *last) >-{ >- JSGCArenaInfo *a; >- >- while (last) { >- a = last; >- last = last->prev; >- >- METER(rt->gcStats.afree++); >- JS_ASSERT(rt->gcBytes >= GC_ARENA_SIZE); >- rt->gcBytes -= GC_ARENA_SIZE; >- >-#if CHUNKED_ARENA_ALLOCATION >- if (js_gcArenasPerChunk == 1) { >-#endif >- DestroyGCChunk(ARENA_INFO_TO_START(a)); >-#if CHUNKED_ARENA_ALLOCATION >- } else { >- uint32 arenaIndex; >- jsuword chunk; >- uint32 chunkInfoIndex; >- JSGCChunkInfo *ci; >-# ifdef DEBUG >- jsuword firstArena; >- >- firstArena = 
a->firstArena; >- arenaIndex = a->arenaIndex; >- memset((void *) ARENA_INFO_TO_START(a), JS_FREE_PATTERN, >- GC_ARENA_SIZE - JS_GC_ARENA_PAD); >- a->firstArena = firstArena; >- a->arenaIndex = arenaIndex; >-# endif >- arenaIndex = GET_ARENA_INDEX(a); >- chunk = GET_ARENA_CHUNK(a, arenaIndex); >- chunkInfoIndex = GET_CHUNK_INFO_INDEX(chunk); >- if (chunkInfoIndex == NO_FREE_ARENAS) { >- chunkInfoIndex = arenaIndex; >- SET_CHUNK_INFO_INDEX(chunk, arenaIndex); >- ci = GET_CHUNK_INFO(chunk, chunkInfoIndex); >- a->prev = NULL; >- ci->lastFreeArena = a; >- ci->numFreeArenas = 1; >- AddChunkToList(rt, ci); >- } else { >- JS_ASSERT(chunkInfoIndex != arenaIndex); >- ci = GET_CHUNK_INFO(chunk, chunkInfoIndex); >- JS_ASSERT(ci->numFreeArenas != 0); >- JS_ASSERT(ci->lastFreeArena); >- JS_ASSERT(a != ci->lastFreeArena); >- if (ci->numFreeArenas == js_gcArenasPerChunk - 1) { >- RemoveChunkFromList(rt, ci); >- DestroyGCChunk(chunk); >- } else { >- ++ci->numFreeArenas; >- a->prev = ci->lastFreeArena; >- ci->lastFreeArena = a; >- } >- } >- } >-# endif >- } >-} >- >-static void > InitGCArenaLists(JSRuntime *rt) > { > uintN i, thingSize; > JSGCArenaList *arenaList; > > for (i = 0; i < GC_NUM_FREELISTS; i++) { > arenaList = &rt->gcArenaList[i]; > thingSize = GC_FREELIST_NBYTES(i); >@@ -1105,27 +690,27 @@ InitGCArenaLists(JSRuntime *rt) > static void > FinishGCArenaLists(JSRuntime *rt) > { > uintN i; > JSGCArenaList *arenaList; > > for (i = 0; i < GC_NUM_FREELISTS; i++) { > arenaList = &rt->gcArenaList[i]; >- DestroyGCArenas(rt, arenaList->last); > arenaList->last = NULL; > arenaList->lastCount = THINGS_PER_ARENA(arenaList->thingSize); > arenaList->freeList = NULL; > } >- DestroyGCArenas(rt, rt->gcDoubleArenaList.first); > rt->gcDoubleArenaList.first = NULL; > rt->gcDoubleArenaList.nextDoubleFlags = DOUBLE_BITMAP_SENTINEL; > >+ if (rt->gcBase) >+ free((void *)rt->gcBase); >+ rt->gcPtr = rt->gcBase = rt->gcEnd = 0; > rt->gcBytes = 0; >- JS_ASSERT(rt->gcChunkList == 0); > } > > /* > * 
This function must not be called when thing is jsdouble.
> */
> static uint8 *
> GetGCThingFlags(void *thing)
> {
>@@ -1229,70 +814,28 @@ typedef struct JSGCRootHashEntry {
>     JSDHashEntryHdr hdr;
>     void            *root;
>     const char      *name;
> } JSGCRootHashEntry;
>
> /* Initial size of the gcRootsHash table (SWAG, small enough to amortize). */
> #define GC_ROOTS_SIZE 256
>
>-#if CHUNKED_ARENA_ALLOCATION
>-
>-/*
>- * For a CPU with extremely large pages using them for GC things wastes
>- * too much memory.
>- */
>-# define GC_ARENAS_PER_CPU_PAGE_LIMIT JS_BIT(18 - GC_ARENA_SHIFT)
>-
>-JS_STATIC_ASSERT(GC_ARENAS_PER_CPU_PAGE_LIMIT <= NO_FREE_ARENAS);
>-
>-#endif
>-
> JSBool
>-js_InitGC(JSRuntime *rt, uint32 maxbytes)
>+js_InitGC(JSRuntime *rt, size_t maxbytes)
> {
>+    /* Overallocate so we can make sure our heap is aligned. */
>+    size_t bytes = maxbytes + GC_ARENA_SIZE - 1;
>+    rt->gcBase = uintptr_t(malloc(bytes));

Long-standing silly embedding practice was to pass 0xffffffff or (uint32)-1 for maxbytes. You must not overflow and wrap to a small size, which you would then buffer-overrun. The best course is to clamp against a sane upper bound (2^30 on 32 bits, 2^40 on 64?).

Don't test 0 assuming null -> 0; use a void *base = malloc(bytes) temporary.

I thought we were mmapping/virtual-allocating? For a big allocation malloc may well do that (and should), but don't we want to decommit physical memory if possible?

>+    rt->gcPtr = uintptr_t((rt->gcBase + GC_ARENA_SIZE - 1) & (~GC_ARENA_MASK));

uintptr_t -> jsuword.

Use GC_ARENA_MASK, not GC_ARENA_SIZE - 1 (see other such expressions, e.g. jsgc.cpp: chunk = ((jsuword) p + GC_ARENA_MASK) & ~GC_ARENA_MASK; and follow suit). Don't overparenthesize ~GC_ARENA_MASK :-P.

/be
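Putting the review's two points together, clamping maxbytes before the overallocation and aligning the base with the arena mask, gives roughly the following. This is a hedged sketch: ARENA_SHIFT, MAXBYTES_LIMIT, and alloc_aligned_heap are illustrative names, and the clamp bound is the reviewer's suggested 2^30, not a value from the patch.

```c
#include <stdint.h>
#include <stdlib.h>

#define ARENA_SHIFT 12
#define ARENA_SIZE  ((uintptr_t)1 << ARENA_SHIFT)
#define ARENA_MASK  (ARENA_SIZE - 1)
/* Assumed sanity bound; the review suggests 2^30 on 32-bit targets. */
#define MAXBYTES_LIMIT ((size_t)1 << 30)

static size_t clamp_maxbytes(size_t maxbytes) {
    /* Embedders historically pass 0xffffffff; clamp so that the
     * later "+ ARENA_SIZE - 1" cannot wrap to a tiny allocation. */
    return maxbytes > MAXBYTES_LIMIT ? MAXBYTES_LIMIT : maxbytes;
}

/* Returns an arena-aligned heap base; *raw_out keeps the pointer
 * that must eventually be passed to free(). */
static void *alloc_aligned_heap(size_t maxbytes, void **raw_out) {
    size_t bytes = clamp_maxbytes(maxbytes) + ARENA_SIZE - 1;
    void *base = malloc(bytes);          /* temporary, tested directly */
    if (!base)
        return NULL;
    *raw_out = base;
    uintptr_t aligned = ((uintptr_t)base + ARENA_MASK) & ~ARENA_MASK;
    return (void *)aligned;
}
```

Rounding up by ARENA_SIZE - 1 guarantees that at least maxbytes of arena-aligned space fits inside the malloc'ed block.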
Attachment #399618 - Flags: review-
To quote Andreas: "bugzilla should magically delete overcited code for me." /be
Attached patch new version (obsolete) (deleted) — Splinter Review
Attachment #399618 - Attachment is obsolete: true
Attached patch changed maxbyte handling (obsolete) (deleted) — Splinter Review
Attachment #399852 - Attachment is obsolete: true
Attachment #400169 - Flags: review?(brendan)
Attachment #400169 - Flags: review?(brendan) → review+
Comment on attachment 400169 [details] [diff] [review]
changed maxbyte handling

>+    void *base = malloc(bytes);
>+    if (base == NULL)

House style wants if (!base).

s/MAXBYTES_LIMIT/GC_&/g to match other #defines from jsgc.{h,cpp}.

r=me with these nits picked. Thanks,

/be
Attached patch new version (deleted) — Splinter Review
Attachment #400169 - Attachment is obsolete: true
New performance numbers without/with the patch:

TEST                   COMPARISON            FROM                 TO              DETAILS

=============================================================================

** TOTAL **:           1.024x as fast    756.4ms +/- 0.1%   738.5ms +/- 0.2%     significant

=============================================================================

  3d:                  1.035x as fast    115.7ms +/- 0.4%   111.8ms +/- 0.5%     significant
    cube:              1.041x as fast     33.1ms +/- 0.7%    31.8ms +/- 0.9%     significant
    morph:             1.080x as fast     23.0ms +/- 0.0%    21.3ms +/- 1.6%     significant
    raytrace:          1.015x as fast     59.6ms +/- 0.6%    58.7ms +/- 0.6%     significant

  access:              1.028x as fast    115.7ms +/- 0.8%   112.6ms +/- 0.5%     significant
    binary-trees:      1.034x as fast     33.5ms +/- 1.1%    32.4ms +/- 1.1%     significant
    fannkuch:          *1.010x as slow*   49.3ms +/- 0.7%    49.8ms +/- 0.6%     significant
    nbody:             1.148x as fast     21.7ms +/- 1.6%    18.9ms +/- 1.2%     significant
    nsieve:            ??                 11.2ms +/- 2.7%    11.5ms +/- 3.3%     not conclusive: might be *1.027x as slow*

  bitops:              1.059x as fast     32.1ms +/- 1.6%    30.3ms +/- 2.2%     significant
    3bit-bits-in-byte: ??                  1.3ms +/- 26.6%    1.4ms +/- 26.4%    not conclusive: might be *1.077x as slow*
    bits-in-byte:      -                   8.3ms +/- 4.2%     8.3ms +/- 4.2%
    bitwise-and:       -                   2.1ms +/- 10.8%    2.0ms +/- 0.0%
    nsieve-bits:       1.097x as fast     20.4ms +/- 1.8%    18.6ms +/- 2.0%     significant

  controlflow:         -                  27.9ms +/- 0.8%    27.9ms +/- 0.8%
    recursive:         -                  27.9ms +/- 0.8%    27.9ms +/- 0.8%

  crypto:              1.033x as fast     46.9ms +/- 1.5%    45.4ms +/- 1.3%     significant
    aes:               1.047x as fast     26.7ms +/- 1.8%    25.5ms +/- 2.0%     significant
    md5:               -                  12.9ms +/- 1.8%    12.9ms +/- 1.8%
    sha1:              -                   7.3ms +/- 4.7%     7.0ms +/- 0.0%

  date:                1.016x as fast    108.7ms +/- 0.4%   107.0ms +/- 0.5%     significant
    format-tofte:      1.007x as fast     57.1ms +/- 0.4%    56.7ms +/- 0.6%     significant
    format-xparb:      1.026x as fast     51.6ms +/- 0.7%    50.3ms +/- 0.7%     significant

  math:                ??                 25.9ms +/- 3.3%    26.4ms +/- 1.9%     not conclusive: might be *1.019x as slow*
    cordic:            -                   8.6ms +/- 4.3%     8.3ms +/- 4.2%
    partial-sums:      *1.068x as slow*   11.8ms +/- 2.6%    12.6ms +/- 2.9%     significant
    spectral-norm:     -                   5.5ms +/- 6.8%     5.5ms +/- 6.8%

  regexp:              -                  34.7ms +/- 1.0%    34.4ms +/- 1.1%
    dna:               -                  34.7ms +/- 1.0%    34.4ms +/- 1.1%

  string:              1.025x as fast    248.8ms +/- 0.2%   242.7ms +/- 0.2%     significant
    base64:            1.056x as fast     11.3ms +/- 3.1%    10.7ms +/- 3.2%     significant
    fasta:             1.055x as fast     53.6ms +/- 0.7%    50.8ms +/- 0.6%     significant
    tagcloud:          1.006x as fast     80.6ms +/- 0.5%    80.1ms +/- 0.3%     significant
    unpack-code:       1.022x as fast     80.5ms +/- 0.5%    78.8ms +/- 0.4%     significant
    validate-input:    1.022x as fast     22.8ms +/- 1.3%    22.3ms +/- 1.5%     significant
Blocks: 504640
Gregor Wagner contributed significantly to this patch. http://hg.mozilla.org/tracemonkey/rev/5f449dffdff5
Whiteboard: fixed-in-tracemonkey
No longer blocks: 504640
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
http://hg.mozilla.org/mozilla-central/rev/99c3d3e1722d backs this out on mozilla-central to see if it's the cause of the Tp4 heap size regression.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Whiteboard: fixed-in-tracemonkey
I was looking at the regressions for MacOSX Darwin 9.0.0 tracemonkey talos tp4 pbytes (in millions):

120  initial, before the patch
153  after applying the patch
122  after removing memset((void *)rt->gcBase, 0, bytes);
123  right before the big merge
140  merge
138  right before the backout
136  after the backout
138  last build

I learned from the newsgroup today:

> It looks like it means virtual memory utilization on all the platforms except for OS X, so the regression there probably isn't good...

and:

> On OS X it is the task's resident size. (bug 516497)

This shows that this patch was not the reason for the regression on OS X.
Please do try calloc. See bug 518730 comment 0. /be
I see a minimal performance win on my system. For the regression tests I would need access to the try-server.

FROM = malloc
TO = calloc

BTW: malloc + memset((void *)rt->gcBase, 0, bytes) also runs at about 1208ms.

TEST                   COMPARISON            FROM                  TO              DETAILS
=============================================================================
** TOTAL **:           -                  1211.1ms +/- 0.2%   1208.8ms +/- 0.1%
=============================================================================
  3d:                  1.004x as fast      177.4ms +/- 0.3%    176.7ms +/- 0.2%   significant
    cube:              1.010x as fast       50.8ms +/- 0.6%     50.3ms +/- 0.7%   significant
    morph:             -                    36.7ms +/- 0.9%     36.6ms +/- 1.0%
    raytrace:          -                    89.9ms +/- 0.3%     89.8ms +/- 0.3%

  access:              -                   197.4ms +/- 0.4%    197.0ms +/- 0.4%
    binary-trees:      ??                   52.5ms +/- 0.7%     52.7ms +/- 0.7%   not conclusive: might be *1.004x as slow*
    fannkuch:          -                    90.6ms +/- 0.4%     90.6ms +/- 0.4%
    nbody:             -                    35.6ms +/- 1.0%     35.3ms +/- 1.0%
    nsieve:            -                    18.7ms +/- 1.8%     18.4ms +/- 2.0%

  bitops:              -                    53.2ms +/- 1.1%     53.2ms +/- 1.1%
    3bit-bits-in-byte: -                     2.3ms +/- 15.0%     2.3ms +/- 15.0%
    bits-in-byte:      -                    14.4ms +/- 2.6%     14.4ms +/- 2.6%
    bitwise-and:       -                     2.9ms +/- 7.8%      2.8ms +/- 10.8%
    nsieve-bits:       ??                   33.6ms +/- 1.1%     33.7ms +/- 1.0%   not conclusive: might be *1.003x as slow*

  controlflow:         -                    51.0ms +/- 0.0%     51.0ms +/- 0.0%
    recursive:         -                    51.0ms +/- 0.0%     51.0ms +/- 0.0%

  crypto:              -                    72.0ms +/- 2.3%     71.6ms +/- 1.3%
    aes:               -                    41.9ms +/- 3.6%     41.3ms +/- 2.5%
    md5:               -                    19.1ms +/- 1.2%     19.1ms +/- 1.2%
    sha1:              *1.018x as slow*     11.0ms +/- 0.0%     11.2ms +/- 2.7%   significant

  date:                -                   163.3ms +/- 0.3%    163.1ms +/- 0.2%
    format-tofte:      -                    86.5ms +/- 0.4%     86.3ms +/- 0.4%
    format-xparb:      -                    76.8ms +/- 0.4%     76.8ms +/- 0.4%

  math:                ??                   44.6ms +/- 1.4%     44.7ms +/- 0.8%   not conclusive: might be *1.002x as slow*
    cordic:            ??                   14.9ms +/- 1.5%     15.0ms +/- 0.0%   not conclusive: might be *1.007x as slow*
    partial-sums:      ??                   20.5ms +/- 1.8%     20.6ms +/- 1.8%   not conclusive: might be *1.005x as slow*
    spectral-norm:     -                     9.2ms +/- 3.3%      9.1ms +/- 2.5%

  regexp:              -                    59.4ms +/- 0.6%     59.2ms +/- 0.5%
    dna:               -                    59.4ms +/- 0.6%     59.2ms +/- 0.5%

  string:              -                   392.8ms +/- 0.2%    392.3ms +/- 0.2%
    base64:            -                    16.8ms +/- 1.8%     16.4ms +/- 2.3%
    fasta:             -                    83.6ms +/- 0.4%     83.5ms +/- 0.5%
    tagcloud:          ??                  124.3ms +/- 0.3%    124.5ms +/- 0.3%   not conclusive: might be *1.002x as slow*
    unpack-code:       -                   132.8ms +/- 0.3%    132.7ms +/- 0.3%
    validate-input:    -                    35.3ms +/- 1.0%     35.2ms +/- 0.9%
This made it onto 1.9.2 somehow. It shouldn't be there. It's backed out from TM.
(In reply to comment #34)
> This made it onto 1.9.2 somehow. It shouldn't be there. It's backed out from TM.

I accidentally pushed this and its backout: http://hg.mozilla.org/releases/mozilla-1.9.2/rev/f6bcaffb09ef, so it's not there. Sorry for the confusion.
Assignee: gal → anygregor
Gregor, do we still need this?
The intention was:

> The attached patch allocates the GC heap in a single chunk. This allows very fast identification of GC things, which is handy for conservative stack scanning.

This patch is less important now because we increased the chunk size. We might want something similar in our new compartments. We should measure the impact once conservative stack scanning is enabled and decide whether this is still necessary.
Status: REOPENED → RESOLVED
Closed: 15 years ago13 years ago
Resolution: --- → WONTFIX