Closed Bug 402614 Opened 17 years ago Closed 15 years ago

Study JS string length populations and lifetimes (was: Fat small string)

Categories

(Core :: JavaScript Engine, enhancement)

enhancement
Not set
normal

Tracking

()

RESOLVED DUPLICATE of bug 553541

People

(Reporter: igor, Assigned: gal)

References

Details

(Keywords: perf)

Attachments

(1 file, 1 obsolete file)

The current representation for strings in SpiderMonkey is 2-word JSString structure that points to separately allocated array with 2-byte chars. For small strings this is wasteful. So it would be nice to introduce/implement a special string type that allocates the chars together with JSString header.
A possible implementation of the idea would borrow one bit from JSString.length to denote a special string type. For example, without any special support from GC to allocate fat strings it is straightforward to fit up to 3 2-byte chars into JSString. Add to that a special treatment of printable ascii text and encoding supporting up to 10 chars per JSString is feasible.
Keywords: perf
Memories of six-bit on DEC PDP 10 (36-bit word, 6 chars per word) come back. Got a patch to try? /be
Stuart has data to share, I think. /be
Stuart's histograms (please attach!) show most string chars vectors are short enough to be allocated in the JS GC heap. This is an easy change that will avoid putting potentially long-lived allocations in malloc-owned pages. Whether JS GC heap pages are underutilized due to many old strings scattered over many pages that are otherwise free even with the above change remains to be seen, but we can work on that harder inside the JS engine. /be
these are all from http://people.mozilla.com/~pavlov/frag/allocs8.txt.gz There are likely more but these seem to be the bigger callstacks. The value numbers are bins in bytes and the count is the number of allocations in that bin of that size. Hope this helps. libSystem.B.dylib`malloc+0x37 libmozjs.dylib`JS_malloc+0x37 libmozjs.dylib`js_InflateString+0x27 libmozjs.dylib`js_Atomize+0x60 libmozjs.dylib`JS_LookupPropertyWithFlags+0x42 XUL`nsXBLBinding::DoInitJSClass(JSContext*, JSObject*, JSObject*, nsCString const&, nsXBLPrototypeBinding*, void**)+0x132 XUL`nsXBLPrototypeBinding::InitClass(nsCString const&, JSContext*, JSObject*, JSObject*, void**)+0x43 XUL`nsXBLProtoImpl::InitTargetObjects(nsXBLPrototypeBinding*, nsIScriptContext*, nsIContent*, nsIXPConnectJSObjectHolder**, void**)+0x12d value ------------- Distribution ------------- count 32 | 0 64 |@@@@@@@@@@@@@@@@@@@@@ 2121 128 |@@@@@@@@@@@@@@@@@@@ 1982 256 | 0 libSystem.B.dylib`realloc+0xd1 libmozjs.dylib`JS_realloc+0x31 libmozjs.dylib`js_ConcatStrings+0x16f libmozjs.dylib`js_Interpret+0x38ef libmozjs.dylib`js_Execute+0x2a0 libmozjs.dylib`JS_EvaluateUCScriptForPrincipals+0x89 XUL`nsJSContext::EvaluateString(nsAString_internal const&, void*, nsIPrincipal*, char const*, unsigned int, unsigned int, nsAString_internal*, int*)+0x2bb XUL`nsScriptLoader::EvaluateScript(nsScriptLoadRequest*, nsString const&)+0x22b value ------------- Distribution ------------- count 8 | 0 16 |@@@@@@@ 535 32 |@@@@@@@@@ 651 64 |@@@@@@@@ 595 128 |@@@@@@@@@ 620 256 |@@@@@ 337 512 |@ 63 1024 |@ 81 2048 | 0 libSystem.B.dylib`malloc+0x37 libmozjs.dylib`JS_malloc+0x37 libmozjs.dylib`js_NewStringCopyN+0x1f libmozjs.dylib`js_AtomizeString+0xc6 libmozjs.dylib`js_AtomizeChars+0x2e libmozjs.dylib`js_GetToken+0x129b libmozjs.dylib`js_MatchToken+0x19 libmozjs.dylib`AssignExpr+0x5d value ------------- Distribution ------------- count 2 | 0 4 |@@ 440 8 |@@@@@@@@@@@@@@ 3217 16 |@@@@@@@@@@@ 2586 32 |@@@@@@@@ 1789 64 |@@@ 777 128 |@ 281 256 |@ 205 512 | 21 1024 | 0 libSystem.B.dylib`realloc+0xd1 libmozjs.dylib`JS_realloc+0x31 libmozjs.dylib`js_ConcatStrings+0x16f libmozjs.dylib`js_Interpret+0x38ef libmozjs.dylib`js_Invoke+0x8f3 libmozjs.dylib`fun_call+0x1b3 libmozjs.dylib`js_Interpret+0x6a28 libmozjs.dylib`js_Invoke+0x8f3 value ------------- Distribution ------------- count 8 | 0 16 |@@@ 94 32 |@@@ 117 64 |@@@@@@@@@ 314 128 |@@@@@@@@@@@@@@ 509 256 |@@@@@@@@@ 342 512 |@ 48 1024 |@ 28 2048 | 0 libSystem.B.dylib`malloc+0x37 libmozjs.dylib`JS_malloc+0x37 libmozjs.dylib`js_ConcatStrings+0x11a libmozjs.dylib`js_Interpret+0x38ef libmozjs.dylib`js_Execute+0x2a0 libmozjs.dylib`JS_EvaluateUCScriptForPrincipals+0x89 XUL`nsJSContext::EvaluateString(nsAString_internal const&, void*, nsIPrincipal*, char const*, unsigned int, unsigned int, nsAString_internal*, int*)+0x2bb XUL`nsScriptLoader::EvaluateScript(nsScriptLoadRequest*, nsString const&)+0x22b value ------------- Distribution ------------- count 2 | 0 4 |@ 318 8 |@@@@@@@@@@@ 2472 16 |@@@@@@@@@@@@@@@@@@@@@@ 5036 32 |@@@@@ 1139 64 |@ 187 128 | 47 256 | 47 512 | 9 1024 | 56 2048 | 0 libSystem.B.dylib`malloc+0x37 libmozjs.dylib`JS_malloc+0x37 libmozjs.dylib`js_InflateString+0x27 libmozjs.dylib`JS_NewStringCopyZ+0x4c libmozjs.dylib`js_GetPrinterOutput+0x38 libmozjs.dylib`JS_DecompileFunction+0x57 libmozjs.dylib`fun_toStringHelper+0x158 libmozjs.dylib`fun_toString+0x1c value ------------- Distribution ------------- count 32 | 0 64 |@@ 30 128 |@@@@@@@@@@@@@@@@@@ 324 256 |@@@@@@@@@@@ 212 512 |@@@@@@@@ 154 1024 |@ 20 2048 | 0 libSystem.B.dylib`malloc+0x37 libmozjs.dylib`JS_malloc+0x37 libmozjs.dylib`js_NewStringCopyN+0x1f libmozjs.dylib`js_AtomizeString+0xc6 libmozjs.dylib`js_AtomizeChars+0x2e libmozjs.dylib`js_XDRStringAtom+0xda libmozjs.dylib`js_XDRAtom+0x5b libmozjs.dylib`js_XDRScript+0x4ce value ------------- Distribution ------------- count 2 | 0 4 | 77 8 |@@@ 506 16 |@@@@@@@@@@@@@@@ 2517 32 |@@@@@@@@@@@@@@@@ 2716 64 |@@@@ 715 128 | 70 256 | 6 512 | 0 1024 | 1 2048 | 0 libSystem.B.dylib`realloc+0xd1 libmozjs.dylib`JS_realloc+0x31 libmozjs.dylib`js_ConcatStrings+0x16f libmozjs.dylib`js_Interpret+0x38ef libmozjs.dylib`js_Invoke+0x8f3 libmozjs.dylib`js_InvokeConstructor+0x182 libmozjs.dylib`js_Interpret+0x4679 libmozjs.dylib`js_Invoke+0x8f3 value ------------- Distribution ------------- count 8 | 0 16 | 10 32 |@@@@@ 95 64 |@@@@@@@@@@@ 231 128 |@@@@@@@@@@ 196 256 |@@@@@@@@@@@@ 240 512 |@ 27 1024 |@ 20 2048 | 0
Not sure this bug answers my question. My question was: In a real browser embedding, if we were for some reason to increase the string size to 4 words from 2, would anyone notice memory-wise. I am not suggesting 3 words, because thats hard to align. I am not suggesting 5, because I can't find good use for more than 4 words in a string header with the current string types supported. Maybe the bug answers the question and I just don't understand the diagrams? My desire to move away from the current bit crunched header is motivated by the fact that bit-uncrunching is slow (as in measured in shark slow). Whether its worth trying something else depends on how much memory we would pay for it. Also, other VMs seem to use larger string representations and their world hasn't ended yet.
No one said the world would end. The point is to get data first. What data do you have about string length masking costing us on benchmarks? This bug is really about allocating string chars with the JSString header, up to some relatively small string length -- if the data supports such fused allocation policy saving memory and/or cycles. /be
Flags: wanted1.9.2?
(In reply to comment #7) > > My desire to move away from the current bit crunched header is motivated by the > fact that bit-uncrunching is slow (as in measured in shark slow). (In reply to comment #8) > The point is to get data first. What data do > you have about string length masking costing us on benchmarks?
I am not proposing fat small strings at this point. I am not really proposing anything at this point. If we were to increase the size of strings, 4 words would be the only sensible next step up. I would like to know whether that's an option from a memory utilization perspective. If our memory use spikes out of control, we clearly don't even have to think about going down this path. Sticking 2 words in there and measuring the memory impact seems a reasonable litmus test to me before I start trying to figure out how to make the processor not take a coffee break every time we touch our string headers.
A secondary motivation is to get rid of byte-sized gctags, by the way. Strings store their external finalizer information in there. I would like to move that somewhere else so we can mark a bitmap, which is more compact and reduces the internal working set size of the mark phase, which literature claims is important. Again, I would like to understand with some simple measurements the cost of various alternatives before trying to come up with a fix. We could also move each external finalizer type into its own pool and tag the entire pool with a finalizer type. That would work with the current string header. But again, I would like to see data first, and then think about implementation.
(In reply to comment #11) > A secondary motivation is to get rid of byte-sized gctags, by the way. Strings > store their external finalizer information in there. I would like to move that > somewhere else so we can mark a bitmap, which is more compact and reduces the > internal working set size of the mark phase, which literature claims is > important. Again, I would like to understand with some simple measurements the > cost of various alternatives before trying to come up with a fix. Getting rid of gc thing flags (not "gctags") is a great idea for all thing types but not via anything like bug 503769. We might want all of: * no gc-thing flag bytes allocated from the other end of the same arena as the thing, which has poor cache effects; * fatter string headers for less bit banging; * fatter string headers to inline-allocate short strings' chars. * fused string header and chars in general, if the GC can handle size variation (lots of size classes). Filing a new bug to study string size classes and lifetimes would be ok, but this bug has some data and seems as good. It captures at least the last two points above. Please file other bugs on the first two. > We could also move each external finalizer type into its own pool and tag the > entire pool with a finalizer type. That's a good idea for all thing types, maybe. Could be studied separately (bug filed on the first bullet above). > That would work with the current string > header. It has the advantage of working with all current gc-thing types. Separating the type-specific string optimizations from generic-for-all-gc-things improvements should let more work happen in parallel. > But again, I would like to see data first, and then think about > implementation. Yeah, let's get data based on workload, which could then be run through real or simulated alternative implementations. Not data from jacking up one parameter to see if you seem to "get away with it" for sunspider per top or taskmanager :-P. /be
Summary: Fat small string → Study JS string length populations and lifetimes (was: Fat small string)
I completely agree. What you propose is _exactly_ what I am trying to propose. I just proposed the first step first, and not the whole set of possible follow-up measurements at once. I already made a case why I think we don't have to measure 3 words. Going to 4 words is the smallest possible increment that makes sense from the status quo. If 4 gives us a good memory profile, we should measure 8, 16, N. If it DOESN'T, we can stop there. Hence my proposal to measure 4, and the we decide the next step. I am not trying to dictate implementation here, but I don't want to dictate unnecessary measurements either.
Summary: Study JS string length populations and lifetimes (was: Fat small string) → Fat small string
(In reply to comment #13) > I completely agree. What you propose is _exactly_ what I am trying to propose. > I just proposed the first step first, and not the whole set of possible > follow-up measurements at once. No. You're proposing perturbing the current impl with two extra dummy words per JSString. That does *not* measure string size populations or lifetimes (so don't stomp my changes to the summary). > I already made a case why I think we don't have to measure 3 words. Going to 4 > words is the smallest possible increment that makes sense from the status quo. Measure workload, not one noisy macro-effect on a limited benchmark for a change to overhead of the implementation. > If 4 gives us a good memory profile, we should measure 8, 16, N. If it DOESN'T, > we can stop there. You're not improvement memory profile by adding dummy words! This makes zero sense. If you actually used the overhead to win space back (storing short strings' chars, e.g. -- fat small strings) then your words "gives us a good memory profile" would make more sense. > Hence my proposal to measure 4, and the we decide the next step. Measure workload, not attempts to get a free lunch on overhead increases. > I am not trying to dictate implementation here, but I don't want to dictate > unnecessary measurements either. What is necessary is to optimize for workload, which means measuring workload. /be
Summary: Fat small string → Study JS string length populations and lifetimes (was: Fat small string)
Attached image Integral over string size from GMail (deleted) —
My bug was duped against this so I am grabbing this one. If we warm up the GC heap, the patch is a 7ms speedup.
Assignee: igor → gal
Depends on: 506174
Attached patch patch (obsolete) (deleted) — Splinter Review
Comment on attachment 390396 [details] [diff] [review] patch >@@ -1688,16 +1685,21 @@ extern JSObject* js_NewGCObject(JSContex > return NewGCThing<JSObject>(cx, flags); > } > > extern JSString* js_NewGCString(JSContext *cx, uintN flags) > { > return NewGCThing<JSString>(cx, flags); > } > >+extern JSShortString* js_NewGCShortString(JSContext *cx, uintN flags) >+{ >+ return NewGCThing<JSShortString>(cx, flags); >+} >+ > extern JSFunction* js_NewGCFunction(JSContext *cx, uintN flags) > { > return NewGCThing<JSFunction>(cx, flags); > } > > extern JSXML* js_NewGCXML(JSContext *cx, uintN flags) Lose the externs and put a space before the * and break the line after, before function name -- it's past time. >@@ -169,16 +169,19 @@ struct JSGCThing { > * values stored in the partially initialized thing. > */ > extern JSObject* > js_NewGCObject(JSContext *cx, uintN flags); > > extern JSString* > js_NewGCString(JSContext *cx, uintN flags); > >+extern JSShortString* >+js_NewGCShortString(JSContext *cx, uintN flags); >+ > extern JSFunction* > js_NewGCFunction(JSContext *cx, uintN flags); > > extern JSXML* > js_NewGCXML(JSContext *cx, uintN flags); When in Rome applied here better, but Vandals sacked the space-before-*-not-after convention. Reapply it. >@@ -122,16 +122,17 @@ typedef struct JSAtomState JSAt > typedef struct JSCodeSpec JSCodeSpec; > typedef struct JSPrinter JSPrinter; > typedef struct JSRegExp JSRegExp; > typedef struct JSRegExpStatics JSRegExpStatics; > typedef struct JSScope JSScope; > typedef struct JSScopeOps JSScopeOps; > typedef struct JSScopeProperty JSScopeProperty; > typedef struct JSStackHeader JSStackHeader; >+typedef struct JSShortString JSShortString; Alphabetical order prevailed till this point, since the list is overlong and there's no better order for finding a type when scanning. Move up a line. >+JSString * >+js_NewString(JSContext *cx, jschar *chars, size_t length) Why is this moving? Please move back for smaller patch. >+{ >+ JSString *str; >+ >+ if (length > JSString::MAX_LENGTH) { I'm going to hold you to the ">" and "MAX" here :-/. >+JSString * >+js_NewEmptyString(JSContext *cx, size_t length) >+{ >+ const size_t MAX_LENGTH = (sizeof(JSShortString) - sizeof(JSString)) / sizeof(jschar); This MAX_LENGTH should not be named the same as JSString::MAX_LENGTH. MAX_SHORT_LENGTH (see below). There is no point in redundantly deriving MAX_SHORT_LENGTH from struct size differencing (scaled); rather use the dimension used in the definition of JSShortString's data member (which should be mData to match the other members in JSString). >+ >+ if ((length + 1) < MAX_LENGTH) { Don't over-parenthesize. This cannot be right given the definition of MAX_LENGTH and the earlier (different limit, same maximum not fencepost idea). MAX_LENGTH as you defined it just above (const size_t local to js_NewEmptyString) is not a fencepost for length + 1, it is a maximum value for capacity (length + 1). Lose the + 1 and use <= MAX_SHORT_LENGTH. >+ JSShortString* sstr = js_NewGCShortString(cx, GCX_STRING); Space before * not after -- prevailing style in jsstr.cpp is consistent. >+ jschar* chars = (jschar*)JS_malloc(cx, (length + 1) * sizeof(jschar)); Space before * not after (and space before * in the cast too -- typically we use a space between the cast and a JS_malloc(...) call to let the code breathe a bit instead of looking like (foo*)...( line noise). >+ if (!chars) >+ return NULL; >+ JSString* str = js_NewString(cx, chars, length); Ditto. > js_NewStringCopyN(JSContext *cx, const jschar *s, size_t n) > { >- jschar *news; >- JSString *str; >- >- news = (jschar *) JS_malloc(cx, (n + 1) * sizeof(jschar)); Here's an example of prevailing cast + malloc style. >+ JSString *str = js_NewEmptyString(cx, n); >+ if (!str) >+ return NULL; >+ >+ JS_ASSERT(str->isFlat()); >+ jschar* chars = str->flatChars(); Space before * not after. >+ JS_ASSERT(str->isFlat()); >+ jschar* chars = str->flatChars(); Again. >+ bool isShort() { >+ return mChars == (jschar*)(this + 1); Space the cast as surrounding code does. >+struct JSShortString : public JSString { class not struct, no C API users requiring struct. >+ jschar data[sizeof(jschar) * 16 - sizeof(JSString)]; mData, and pull out the dimension so you can use it in the .cpp file. For use in the .cpp file, pull out a const size_t MAX_SHORT_LENGTH = 15 definition, and dimension like so: jschar mData[MAX_SHORT_LENGTH + 1]. >+ >+ jschar* initShort(size_t length) { Space before * not after. >+ JS_ASSERT(length <= MAX_LENGTH); This MAX_LENGTH can't be right. Use MAX_SHORT_LENGTH. >+ mLength = length; >+ return mChars = data; >+ } >+}; >+JS_STATIC_ASSERT(sizeof(JSShortString) > sizeof(JSString)); This is a vacuous assertion. There's no way on earth the size could be < or ==. /be
Sorry, I missed the inlining reason to move js_NewString. Odd to have it out of line and inline but if it helps gcc on Mac... (does it matter to MSVC or ICC?). > >+ jschar data[sizeof(jschar) * 16 - sizeof(JSString)]; > > mData, and pull out the dimension so you can use it in the .cpp file. > > For use in the .cpp file, pull out a const size_t MAX_SHORT_LENGTH = 15 I changed the size, didn't I? 40 byte JSShortString isn't so good a size class. But the sizeof based definition is obscure. MAX_SHORT_LENGTH still wins as name and way to dimension (letting compiler compute size_t scaling by jschar size). How about MAX_SHORT_LENGTH = 27? From Gregor's integration that looks to cover > 85% of cases. 56 bytes + 8 for JSString == 64 -- a JS_STATIC_ASSERT that the size is a power of two would be non-vacuous and helpful. /be
Blocks: 505308
what's the status here?
Depends on: 553541
Attachment #390396 - Attachment is obsolete: true
patch farmed out into bug 553541
Depends on: 553824
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: