Closed
Bug 578205
Opened 14 years ago
Closed 14 years ago
Keep string characters inline for small strings
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
People
(Reporter: alangpierce, Assigned: alangpierce)
References
(Blocks 1 open bug)
Details
(Whiteboard: fixed-in-tracemonkey)
Attachments
(2 files, 2 obsolete files)
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
Details | Diff | Splinter Review |
For small strings (<= 6 jschars on 32 bit, <= 12 jschars on 64-bit), we can avoid some malloc overhead by keeping the chars inline in the string header.
After the ropes patch (see bug 571549), here are some statistics on how often initFlat is called in all of sunspider (this is probably the same as the number of actual flat strings):
Length: Number of initFlat calls:
<= 6 12266
>= 7, <= 12 14347
>= 13, <= 32 8509
>= 33, <= 150 9724
>= 150 129
Total 44975
Assignee | ||
Updated•14 years ago
|
Blocks: JaegerSpeed
Comment 1•14 years ago
|
||
Dup of bug 553541 (which was a dup of an older bug).
/be
Comment 2•14 years ago
|
||
Dup either way, I don't care -- good to fix, hope we can avoid losing bugs like this in the future.
/be
Assignee | ||
Updated•14 years ago
|
Assignee: general → apierce
Assignee | ||
Comment 4•14 years ago
|
||
It turns out that this is harder than we thought. The plan was to keep the invariant that any string shorter than some fixed length would be a short string, and a short string must keep its chars inline. Ideally, callers would be changed to find ways around passing in a malloced buffer. If a malloced buffer must be passed in, though, we can keep the invariant by copying from the buffer and then freeing it. On GC, we would know that short strings don't have a buffer that needs to be freed.
I'm pretty sure JS_NewExternalString doesn't work with this model. In order to follow the API, we need to call the finalizer on the buffer when the string is GCed (we cause a refcounting bug in at least one call site if the finalizer is called during NewExternalString, so we can't free the buffer early), so we need to keep track of the pointer, probably in the string header. Losing a pointer of space means we can only hold strings of length 4 (or maybe 3) on 32-bit x86, which probably isn't worth it. It also seems bad to waste a pointer for a stupid special case like this.
I think a better approach is the one from bug 553541, where we allocate a double-size GC thing for these strings, and keep the string contents in the dummy string object (and possibly in the two unused pointers in the normal string header). That way, we can just have a mChars pointer that (except in the case of ropes) always points to the string characters.
Comment 5•14 years ago
|
||
(In reply to comment #4)
> In order to
> follow the API, we need to call the finalizer on the buffer when the string is
> GCed (we cause a refcounting bug in at least one call site if the finalizer is
> called during NewExternalString, so we can't free the buffer early)
Cannot we fix all the users NewExternalString not to depend on this or at least explicitly check if a string is short and avoid using NewExternalString if so?
If this is not that easy, what about just adding short external chars to a buffer and releasing them during the next GC?
Comment 6•14 years ago
|
||
Yeah, if we can copy something cheaper than we can refcnt it, we should do that, make a local copy and don't refcnt up (which requires an API change).
Is there a better API for external strings that would be easier? it's not a complex API, so consumers should be able to update pretty easily.
Assignee | ||
Comment 8•14 years ago
|
||
As far as I can tell, the only thing that would break is XPCStringConvert::ReadableToJSVal in xpcstring.cpp. It creates a string, and then adds a ref on success. Calling the finalizer under JS_NewExternalString would decrement the refcount before it is incremented, so we would free to early and then try to use the freed memory.
We could fix this without changing any function signatures by moving the addRef call to before NewExternalString and then releasing on failure. Alternatively, we could avoid the increment/decrement by giving back a flag that says whether or not the string was actually used. It would then be the caller's responsibility to free the buffer.
I still think that keeping a double-size header for short strings (and have mChars point to the right place in the string header) is better though, since it avoids a branch in chars() (which is heavily-used) and lets us use strings with length up to 12 on 32-bit. It's also safer and less error-prone, since it doesn't add any string invariants; it just sometimes changes the way strings are created.
Of course, if avoiding refcounting in ReadableToJSVal is important, we could do both.
Comment 9•14 years ago
|
||
(In reply to comment #8)
> As far as I can tell, the only thing that would break is
> XPCStringConvert::ReadableToJSVal in xpcstring.cpp. It creates a string, and
> then adds a ref on success.
This is just a bug in that code: it should hold first, drop only on error.
/be
Assignee | ||
Updated•14 years ago
|
Status: NEW → ASSIGNED
Assignee | ||
Comment 10•14 years ago
|
||
This is based off of the backed-out patch from bug 553541, with a few changes. I fixed a bug with the original version (js_InflateStringToBuffer was being called incorrectly), and made it create short strings in js_ConcatStrings, and hopefully made the usage of short strings a bit cleaner.
Assignee | ||
Comment 11•14 years ago
|
||
Assignee | ||
Updated•14 years ago
|
Attachment #458484 -
Flags: review?(gal)
Comment 12•14 years ago
|
||
Comment on attachment 458484 [details] [diff] [review]
Patch
>+inline void
>+js_short_strncpy(jschar *dest, const jschar *src, size_t num)
>+{
>+ for (size_t i = 0; i < num; i++)
>+ dest[i] = src[i];
>+}
>+
I think this wants a switch with specialized cases:
switch (num) {
case 1: *dest = *src; break;
case 2: *(uint32 *)dest = *(uint32 *)src; break;
etc
}
Also, a bunch more asserts would be nice, i.e. that short_strncpy is only called for known, short nums.
> /*
> * Return s advanced past any Unicode white space characters.
> */
> static inline const jschar *
> js_SkipWhiteSpace(const jschar *s, const jschar *end)
> {
> JS_ASSERT(s <= end);
> while (s != end && JS_ISSPACE(*s))
Attachment #458484 -
Flags: review?(gal) → review+
Assignee | ||
Comment 13•14 years ago
|
||
Attachment #458484 -
Attachment is obsolete: true
Assignee | ||
Comment 14•14 years ago
|
||
This patch is on top of the one from bug 578189, so that one needs to get pushed before this one.
Depends on: 578189
Assignee | ||
Comment 15•14 years ago
|
||
Attachment #459620 -
Attachment is obsolete: true
Comment 16•14 years ago
|
||
Whiteboard: fixed-in-tracemonkey
Comment 17•14 years ago
|
||
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•