Closed
Bug 402614
Opened 17 years ago
Closed 15 years ago
Study JS string length populations and lifetimes (was: Fat small string)
Categories
(Core :: JavaScript Engine, enhancement)
Core
JavaScript Engine
Tracking
()
RESOLVED
DUPLICATE
of bug 553541
People
(Reporter: igor, Assigned: gal)
References
Details
(Keywords: perf)
Attachments
(1 file, 1 obsolete file)
(deleted),
image/png
|
Details |
The current representation for strings in SpiderMonkey is 2-word JSString structure that points to separately allocated array with 2-byte chars. For small strings this is wasteful. So it would be nice to introduce/implement a special string type that allocates the chars together with JSString header.
Reporter | ||
Comment 1•17 years ago
|
||
A possible implementation of the idea would borrow one bit from JSString.length to denote a special string type. For example, without any special support from GC to allocate fat strings it is straightforward to fit up to 3 2-byte chars into JSString. Add to that a special treatment of printable ascii text and encoding supporting up to 10 chars per JSString is feasible.
Comment 2•17 years ago
|
||
Memories of six-bit on DEC PDP 10 (36-bit word, 6 chars per word) come back. Got a patch to try?
/be
Comment 3•17 years ago
|
||
Stuart has data to share, I think.
/be
Comment 4•17 years ago
|
||
Stuart's histograms (please attach!) show most string chars vectors are short enough to be allocated in the JS GC heap. This is an easy change that will avoid putting potentially long-lived allocations in malloc-owned pages.
Whether JS GC heap pages are underutilized due to many old strings scattered over many pages that are otherwise free even with the above change remains to be seen, but we can work on that harder inside the JS engine.
/be
Comment 5•17 years ago
|
||
these are all from http://people.mozilla.com/~pavlov/frag/allocs8.txt.gz
There are likely more but these seem to be the bigger callstacks. The value numbers are bins in bytes and the count is the number of allocations in that bin of that size. Hope this helps.
libSystem.B.dylib`malloc+0x37
libmozjs.dylib`JS_malloc+0x37
libmozjs.dylib`js_InflateString+0x27
libmozjs.dylib`js_Atomize+0x60
libmozjs.dylib`JS_LookupPropertyWithFlags+0x42
XUL`nsXBLBinding::DoInitJSClass(JSContext*, JSObject*, JSObject*, nsCString const&, nsXBLPrototypeBinding*, void**)+0x132
XUL`nsXBLPrototypeBinding::InitClass(nsCString const&, JSContext*, JSObject*, JSObject*, void**)+0x43
XUL`nsXBLProtoImpl::InitTargetObjects(nsXBLPrototypeBinding*, nsIScriptContext*, nsIContent*, nsIXPConnectJSObjectHolder**, void**)+0x12d
value ------------- Distribution ------------- count
32 | 0
64 |@@@@@@@@@@@@@@@@@@@@@ 2121
128 |@@@@@@@@@@@@@@@@@@@ 1982
256 | 0
libSystem.B.dylib`realloc+0xd1
libmozjs.dylib`JS_realloc+0x31
libmozjs.dylib`js_ConcatStrings+0x16f
libmozjs.dylib`js_Interpret+0x38ef
libmozjs.dylib`js_Execute+0x2a0
libmozjs.dylib`JS_EvaluateUCScriptForPrincipals+0x89
XUL`nsJSContext::EvaluateString(nsAString_internal const&, void*, nsIPrincipal*, char const*, unsigned int, unsigned int, nsAString_internal*, int*)+0x2bb
XUL`nsScriptLoader::EvaluateScript(nsScriptLoadRequest*, nsString const&)+0x22b
value ------------- Distribution ------------- count
8 | 0
16 |@@@@@@@ 535
32 |@@@@@@@@@ 651
64 |@@@@@@@@ 595
128 |@@@@@@@@@ 620
256 |@@@@@ 337
512 |@ 63
1024 |@ 81
2048 | 0
libSystem.B.dylib`malloc+0x37
libmozjs.dylib`JS_malloc+0x37
libmozjs.dylib`js_NewStringCopyN+0x1f
libmozjs.dylib`js_AtomizeString+0xc6
libmozjs.dylib`js_AtomizeChars+0x2e
libmozjs.dylib`js_GetToken+0x129b
libmozjs.dylib`js_MatchToken+0x19
libmozjs.dylib`AssignExpr+0x5d
value ------------- Distribution ------------- count
2 | 0
4 |@@ 440
8 |@@@@@@@@@@@@@@ 3217
16 |@@@@@@@@@@@ 2586
32 |@@@@@@@@ 1789
64 |@@@ 777
128 |@ 281
256 |@ 205
512 | 21
1024 | 0
libSystem.B.dylib`realloc+0xd1
libmozjs.dylib`JS_realloc+0x31
libmozjs.dylib`js_ConcatStrings+0x16f
libmozjs.dylib`js_Interpret+0x38ef
libmozjs.dylib`js_Invoke+0x8f3
libmozjs.dylib`fun_call+0x1b3
libmozjs.dylib`js_Interpret+0x6a28
libmozjs.dylib`js_Invoke+0x8f3
value ------------- Distribution ------------- count
8 | 0
16 |@@@ 94
32 |@@@ 117
64 |@@@@@@@@@ 314
128 |@@@@@@@@@@@@@@ 509
256 |@@@@@@@@@ 342
512 |@ 48
1024 |@ 28
2048 | 0
libSystem.B.dylib`malloc+0x37
libmozjs.dylib`JS_malloc+0x37
libmozjs.dylib`js_ConcatStrings+0x11a
libmozjs.dylib`js_Interpret+0x38ef
libmozjs.dylib`js_Execute+0x2a0
libmozjs.dylib`JS_EvaluateUCScriptForPrincipals+0x89
XUL`nsJSContext::EvaluateString(nsAString_internal const&, void*, nsIPrincipal*, char const*, unsigned int, unsigned int, nsAString_internal*, int*)+0x2bb
XUL`nsScriptLoader::EvaluateScript(nsScriptLoadRequest*, nsString const&)+0x22b
value ------------- Distribution ------------- count
2 | 0
4 |@ 318
8 |@@@@@@@@@@@ 2472
16 |@@@@@@@@@@@@@@@@@@@@@@ 5036
32 |@@@@@ 1139
64 |@ 187
128 | 47
256 | 47
512 | 9
1024 | 56
2048 | 0
libSystem.B.dylib`malloc+0x37
libmozjs.dylib`JS_malloc+0x37
libmozjs.dylib`js_InflateString+0x27
libmozjs.dylib`JS_NewStringCopyZ+0x4c
libmozjs.dylib`js_GetPrinterOutput+0x38
libmozjs.dylib`JS_DecompileFunction+0x57
libmozjs.dylib`fun_toStringHelper+0x158
libmozjs.dylib`fun_toString+0x1c
value ------------- Distribution ------------- count
32 | 0
64 |@@ 30
128 |@@@@@@@@@@@@@@@@@@ 324
256 |@@@@@@@@@@@ 212
512 |@@@@@@@@ 154
1024 |@ 20
2048 | 0
libSystem.B.dylib`malloc+0x37
libmozjs.dylib`JS_malloc+0x37
libmozjs.dylib`js_NewStringCopyN+0x1f
libmozjs.dylib`js_AtomizeString+0xc6
libmozjs.dylib`js_AtomizeChars+0x2e
libmozjs.dylib`js_XDRStringAtom+0xda
libmozjs.dylib`js_XDRAtom+0x5b
libmozjs.dylib`js_XDRScript+0x4ce
value ------------- Distribution ------------- count
2 | 0
4 | 77
8 |@@@ 506
16 |@@@@@@@@@@@@@@@ 2517
32 |@@@@@@@@@@@@@@@@ 2716
64 |@@@@ 715
128 | 70
256 | 6
512 | 0
1024 | 1
2048 | 0
libSystem.B.dylib`realloc+0xd1
libmozjs.dylib`JS_realloc+0x31
libmozjs.dylib`js_ConcatStrings+0x16f
libmozjs.dylib`js_Interpret+0x38ef
libmozjs.dylib`js_Invoke+0x8f3
libmozjs.dylib`js_InvokeConstructor+0x182
libmozjs.dylib`js_Interpret+0x4679
libmozjs.dylib`js_Invoke+0x8f3
value ------------- Distribution ------------- count
8 | 0
16 | 10
32 |@@@@@ 95
64 |@@@@@@@@@@@ 231
128 |@@@@@@@@@@ 196
256 |@@@@@@@@@@@@ 240
512 |@ 27
1024 |@ 20
2048 | 0
Assignee | ||
Comment 7•15 years ago
|
||
Not sure this bug answers my question. My question was: In a real browser embedding, if we were for some reason to increase the string size to 4 words from 2, would anyone notice memory-wise.
I am not suggesting 3 words, because thats hard to align. I am not suggesting 5, because I can't find good use for more than 4 words in a string header with the current string types supported. Maybe the bug answers the question and I just don't understand the diagrams?
My desire to move away from the current bit crunched header is motivated by the fact that bit-uncrunching is slow (as in measured in shark slow). Whether its worth trying something else depends on how much memory we would pay for it.
Also, other VMs seem to use larger string representations and their world hasn't ended yet.
Comment 8•15 years ago
|
||
No one said the world would end. The point is to get data first. What data do you have about string length masking costing us on benchmarks?
This bug is really about allocating string chars with the JSString header, up to some relatively small string length -- if the data supports such fused allocation policy saving memory and/or cycles.
/be
Flags: wanted1.9.2?
Comment 9•15 years ago
|
||
(In reply to comment #7)
>
> My desire to move away from the current bit crunched header is motivated by the
> fact that bit-uncrunching is slow (as in measured in shark slow).
(In reply to comment #8)
> The point is to get data first. What data do
> you have about string length masking costing us on benchmarks?
Assignee | ||
Comment 10•15 years ago
|
||
I am not proposing fat small strings at this point. I am not really proposing anything at this point. If we were to increase the size of strings, 4 words would be the only sensible next step up. I would like to know whether that's an option from a memory utilization perspective. If our memory use spikes out of control, we clearly don't even have to think about going down this path. Sticking 2 words in there and measuring the memory impact seems a reasonable litmus test to me before I start trying to figure out how to make the processor not take a coffee break every time we touch our string headers.
Assignee | ||
Comment 11•15 years ago
|
||
A secondary motivation is to get rid of byte-sized gctags, by the way. Strings store their external finalizer information in there. I would like to move that somewhere else so we can mark a bitmap, which is more compact and reduces the internal working set size of the mark phase, which literature claims is important. Again, I would like to understand with some simple measurements the cost of various alternatives before trying to come up with a fix.
We could also move each external finalizer type into its own pool and tag the entire pool with a finalizer type. That would work with the current string header. But again, I would like to see data first, and then think about implementation.
Comment 12•15 years ago
|
||
(In reply to comment #11)
> A secondary motivation is to get rid of byte-sized gctags, by the way. Strings
> store their external finalizer information in there. I would like to move that
> somewhere else so we can mark a bitmap, which is more compact and reduces the
> internal working set size of the mark phase, which literature claims is
> important. Again, I would like to understand with some simple measurements the
> cost of various alternatives before trying to come up with a fix.
Getting rid of gc thing flags (not "gctags") is a great idea for all thing types but not via anything like bug 503769.
We might want all of:
* no gc-thing flag bytes allocated from the other end of the same arena as the thing, which has poor cache effects;
* fatter string headers for less bit banging;
* fatter string headers to inline-allocate short strings' chars.
* fused string header and chars in general, if the GC can handle size variation (lots of size classes).
Filing a new bug to study string size classes and lifetimes would be ok, but this bug has some data and seems as good. It captures at least the last two points above. Please file other bugs on the first two.
> We could also move each external finalizer type into its own pool and tag the
> entire pool with a finalizer type.
That's a good idea for all thing types, maybe. Could be studied separately (bug filed on the first bullet above).
> That would work with the current string
> header.
It has the advantage of working with all current gc-thing types. Separating the type-specific string optimizations from generic-for-all-gc-things improvements should let more work happen in parallel.
> But again, I would like to see data first, and then think about
> implementation.
Yeah, let's get data based on workload, which could then be run through real or simulated alternative implementations. Not data from jacking up one parameter to see if you seem to "get away with it" for sunspider per top or taskmanager :-P.
/be
Updated•15 years ago
|
Summary: Fat small string → Study JS string length populations and lifetimes (was: Fat small string)
Assignee | ||
Comment 13•15 years ago
|
||
I completely agree. What you propose is _exactly_ what I am trying to propose. I just proposed the first step first, and not the whole set of possible follow-up measurements at once.
I already made a case why I think we don't have to measure 3 words. Going to 4 words is the smallest possible increment that makes sense from the status quo.
If 4 gives us a good memory profile, we should measure 8, 16, N. If it DOESN'T, we can stop there.
Hence my proposal to measure 4, and the we decide the next step.
I am not trying to dictate implementation here, but I don't want to dictate unnecessary measurements either.
Summary: Study JS string length populations and lifetimes (was: Fat small string) → Fat small string
Comment 14•15 years ago
|
||
(In reply to comment #13)
> I completely agree. What you propose is _exactly_ what I am trying to propose.
> I just proposed the first step first, and not the whole set of possible
> follow-up measurements at once.
No. You're proposing perturbing the current impl with two extra dummy words per JSString. That does *not* measure string size populations or lifetimes (so don't stomp my changes to the summary).
> I already made a case why I think we don't have to measure 3 words. Going to 4
> words is the smallest possible increment that makes sense from the status quo.
Measure workload, not one noisy macro-effect on a limited benchmark for a change to overhead of the implementation.
> If 4 gives us a good memory profile, we should measure 8, 16, N. If it DOESN'T,
> we can stop there.
You're not improvement memory profile by adding dummy words! This makes zero sense.
If you actually used the overhead to win space back (storing short strings' chars, e.g. -- fat small strings) then your words "gives us a good memory profile" would make more sense.
> Hence my proposal to measure 4, and the we decide the next step.
Measure workload, not attempts to get a free lunch on overhead increases.
> I am not trying to dictate implementation here, but I don't want to dictate
> unnecessary measurements either.
What is necessary is to optimize for workload, which means measuring workload.
/be
Summary: Fat small string → Study JS string length populations and lifetimes (was: Fat small string)
Comment 16•15 years ago
|
||
Assignee | ||
Comment 17•15 years ago
|
||
My bug was duped against this so I am grabbing this one.
If we warm up the GC heap, the patch is a 7ms speedup.
Assignee: igor → gal
Depends on: 506174
Assignee | ||
Comment 18•15 years ago
|
||
Comment 19•15 years ago
|
||
Comment on attachment 390396 [details] [diff] [review]
patch
>@@ -1688,16 +1685,21 @@ extern JSObject* js_NewGCObject(JSContex
> return NewGCThing<JSObject>(cx, flags);
> }
>
> extern JSString* js_NewGCString(JSContext *cx, uintN flags)
> {
> return NewGCThing<JSString>(cx, flags);
> }
>
>+extern JSShortString* js_NewGCShortString(JSContext *cx, uintN flags)
>+{
>+ return NewGCThing<JSShortString>(cx, flags);
>+}
>+
> extern JSFunction* js_NewGCFunction(JSContext *cx, uintN flags)
> {
> return NewGCThing<JSFunction>(cx, flags);
> }
>
> extern JSXML* js_NewGCXML(JSContext *cx, uintN flags)
Lose the externs and put a space before the * and break the line after, before function name -- it's past time.
>@@ -169,16 +169,19 @@ struct JSGCThing {
> * values stored in the partially initialized thing.
> */
> extern JSObject*
> js_NewGCObject(JSContext *cx, uintN flags);
>
> extern JSString*
> js_NewGCString(JSContext *cx, uintN flags);
>
>+extern JSShortString*
>+js_NewGCShortString(JSContext *cx, uintN flags);
>+
> extern JSFunction*
> js_NewGCFunction(JSContext *cx, uintN flags);
>
> extern JSXML*
> js_NewGCXML(JSContext *cx, uintN flags);
When in Rome applied here better, but Vandals sacked the space-before-*-not-after convention. Reapply it.
>@@ -122,16 +122,17 @@ typedef struct JSAtomState JSAt
> typedef struct JSCodeSpec JSCodeSpec;
> typedef struct JSPrinter JSPrinter;
> typedef struct JSRegExp JSRegExp;
> typedef struct JSRegExpStatics JSRegExpStatics;
> typedef struct JSScope JSScope;
> typedef struct JSScopeOps JSScopeOps;
> typedef struct JSScopeProperty JSScopeProperty;
> typedef struct JSStackHeader JSStackHeader;
>+typedef struct JSShortString JSShortString;
Alphabetical order prevailed till this point, since the list is overlong and there's no better order for finding a type when scanning. Move up a line.
>+JSString *
>+js_NewString(JSContext *cx, jschar *chars, size_t length)
Why is this moving? Please move back for smaller patch.
>+{
>+ JSString *str;
>+
>+ if (length > JSString::MAX_LENGTH) {
I'm going to hold you to the ">" and "MAX" here :-/.
>+JSString *
>+js_NewEmptyString(JSContext *cx, size_t length)
>+{
>+ const size_t MAX_LENGTH = (sizeof(JSShortString) - sizeof(JSString)) / sizeof(jschar);
This MAX_LENGTH should not be named the same as JSString::MAX_LENGTH. MAX_SHORT_LENGTH (see below).
There is no point in redundantly deriving MAX_SHORT_LENGTH from struct size differencing (scaled); rather use the dimension used in the definition of JSShortString's data member (which should be mData to match the other members in JSString).
>+
>+ if ((length + 1) < MAX_LENGTH) {
Don't over-parenthesize.
This cannot be right given the definition of MAX_LENGTH and the earlier (different limit, same maximum not fencepost idea). MAX_LENGTH as you defined it just above (const size_t local to js_NewEmptyString) is not a fencepost for length + 1, it is a maximum value for capacity (length + 1).
Lose the + 1 and use <= MAX_SHORT_LENGTH.
>+ JSShortString* sstr = js_NewGCShortString(cx, GCX_STRING);
Space before * not after -- prevailing style in jsstr.cpp is consistent.
>+ jschar* chars = (jschar*)JS_malloc(cx, (length + 1) * sizeof(jschar));
Space before * not after (and space before * in the cast too -- typically we use a space between the cast and a JS_malloc(...) call to let the code breathe a bit instead of looking like (foo*)...( line noise).
>+ if (!chars)
>+ return NULL;
>+ JSString* str = js_NewString(cx, chars, length);
Ditto.
> js_NewStringCopyN(JSContext *cx, const jschar *s, size_t n)
> {
>- jschar *news;
>- JSString *str;
>-
>- news = (jschar *) JS_malloc(cx, (n + 1) * sizeof(jschar));
Here's an example of prevailing cast + malloc style.
>+ JSString *str = js_NewEmptyString(cx, n);
>+ if (!str)
>+ return NULL;
>+
>+ JS_ASSERT(str->isFlat());
>+ jschar* chars = str->flatChars();
Space before * not after.
>+ JS_ASSERT(str->isFlat());
>+ jschar* chars = str->flatChars();
Again.
>+ bool isShort() {
>+ return mChars == (jschar*)(this + 1);
Space the cast as surrounding code does.
>+struct JSShortString : public JSString {
class not struct, no C API users requiring struct.
>+ jschar data[sizeof(jschar) * 16 - sizeof(JSString)];
mData, and pull out the dimension so you can use it in the .cpp file.
For use in the .cpp file, pull out a const size_t MAX_SHORT_LENGTH = 15 definition, and dimension like so:
jschar mData[MAX_SHORT_LENGTH + 1].
>+
>+ jschar* initShort(size_t length) {
Space before * not after.
>+ JS_ASSERT(length <= MAX_LENGTH);
This MAX_LENGTH can't be right. Use MAX_SHORT_LENGTH.
>+ mLength = length;
>+ return mChars = data;
>+ }
>+};
>+JS_STATIC_ASSERT(sizeof(JSShortString) > sizeof(JSString));
This is a vacuous assertion. There's no way on earth the size could be < or ==.
/be
Comment 20•15 years ago
|
||
Sorry, I missed the inlining reason to move js_NewString. Odd to have it out of line and inline but if it helps gcc on Mac... (does it matter to MSVC or ICC?).
> >+ jschar data[sizeof(jschar) * 16 - sizeof(JSString)];
>
> mData, and pull out the dimension so you can use it in the .cpp file.
>
> For use in the .cpp file, pull out a const size_t MAX_SHORT_LENGTH = 15
I changed the size, didn't I? 40 byte JSShortString isn't so good a size class.
But the sizeof based definition is obscure. MAX_SHORT_LENGTH still wins as name and way to dimension (letting compiler compute size_t scaling by jschar size).
How about MAX_SHORT_LENGTH = 27? From Gregor's integration that looks to cover > 85% of cases. 56 bytes + 8 for JSString == 64 -- a JS_STATIC_ASSERT that the size is a power of two would be non-vacuous and helpful.
/be
Comment 21•15 years ago
|
||
what's the status here?
Assignee | ||
Updated•15 years ago
|
Attachment #390396 -
Attachment is obsolete: true
Assignee | ||
Comment 22•15 years ago
|
||
patch farmed out into bug 553541
Updated•15 years ago
|
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → DUPLICATE
You need to log in
before you can comment on or make changes to this bug.
Description
•