Closed Bug 521423 Opened 15 years ago Closed 15 years ago

[FIX]Make JSString 4 words instead of 2 (str_slice on large strings is allocation-bound)

Categories

(Core :: JavaScript Engine, defect)

x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: bzbarsky, Assigned: bzbarsky)

References

Details

(Keywords: perf, Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 3 obsolete files)

Right now dependent strings have these limits on how long and offset they can be before copying, because we have to stuff the length+offset in 28 bits. This is affecting dromaeo's slice() test, at least, which does, amongst others: ret = str.slice(0); ret = str.slice( 12000, -1 ); on a string of length 2^16. 80+% of our time on the test is spent in memcpy.
Summary: Make JSString 4 words instead of 2 → Make JSString 4 words instead of 2 (str_slice on large strings is allocation-bound)
Attached patch Proposed patch (obsolete) (deleted) — Splinter Review
Attachment #413763 - Flags: review?(brendan)
Assignee: general → bzbarsky
Summary: Make JSString 4 words instead of 2 (str_slice on large strings is allocation-bound) → [FIX]Make JSString 4 words instead of 2 (str_slice on large strings is allocation-bound)
Keywords: perf
Can you use bitfields for the flags now? Looks good at a glance, let me get to it (or an updated patch) on Monday. Thanks! /be
> Can you use bitfields for the flags now? If I do, what do I replace JS_ATOMIC_SET_MASK with?
You'd have to union a struct containing the bitfields with a jsword -- may not be worth it, but could win on the using expression side of the fence. This reminds me, though: jsword != size_t so if you stick with flags as a single member, then js_AtomicSetMask wants jsword. /be
> You'd have to union a struct containing the bitfields with a jsword -- may not > be worth it, but could win on the using expression side of the fence. Honestly, the using code is well-encapsulated; the size_t bitfield we have now is treated as a private member, effectively; all access is via accessor methods. So switching to a bitfield would just make the accessor methods prettier. > js_AtomicSetMask wants jsword. Sure, and the casts for that are already there, as well as a static assert that sizeof() is the same for jsword and size_t. Has to be that way, since mLength is size_t. ;) It sounds to me like the bitfield is more trouble than it's worth, honestly.
The bitfields could be used directly instead of the flag encapsulation methods, but since the flag encapsulation method cost has been sunk, there's no point. But I do think the flags field should be jsword, not size_t. Also, and this can be in a later bug but since you are hitting every line: we had a style debate and plunked for unprefixed and unsuffixed member names: just foo, no mFoo or m_foo or foo_. /be
In this case that would be |length|, which makes the length() getter and the |length| argument to some of the methods extra fun (note I already had to |this->length()| in one spot to disambiguate. I can rename the members and some of the arguments, I guess, but I'd also need to rename the method and touch all its callers. That doesn't sound that appealing, honestly. ;)
> But I do think the flags field should be jsword, not size_t. Done.
Attached patch With jsword mFlags (obsolete) (deleted) — Splinter Review
Attachment #413763 - Attachment is obsolete: true
Attachment #413984 - Flags: review?(brendan)
Attachment #413763 - Flags: review?(brendan)
Any benchmark results? Will review tomorrow a.m. when caffeinated! /be
> Any benchmark results? The dromaeo slice() benchmark goes from 0.81 runs/s to 100 runs/s. I can try to run whatever other benchmarks you want as desired.
SunSpider matters more and (I can't believe I'm writing this) is more representative than Dromaeo. /be
OK. Sunspider (in browser; if there's a way I should be running it in shell I haven't found the right docs so far) I get: TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: ?? 931.0ms +/- 2.1% 937.2ms +/- 0.7% not conclusive: might be *1.01x as slow* ============================================================================= 3d: ?? 146.6ms +/- 1.3% 148.2ms +/- 7.3% not conclusive: might be *1.01x as slow* cube: - 53.2ms +/- 2.0% 52.4ms +/- 1.3% morph: - 29.8ms +/- 1.9% 29.2ms +/- 1.9% raytrace: ?? 63.6ms +/- 4.1% 66.6ms +/- 16.5% not conclusive: might be *1.05x as slow* access: ?? 129.8ms +/- 3.6% 131.6ms +/- 2.1% not conclusive: might be *1.01x as slow* binary-trees: - 26.6ms +/- 11.3% 26.2ms +/- 8.5% fannkuch: ?? 58.2ms +/- 4.9% 60.0ms +/- 2.1% not conclusive: might be *1.03x as slow* nbody: ?? 27.8ms +/- 7.3% 28.0ms +/- 10.4% not conclusive: might be *1.01x as slow* nsieve: ?? 17.2ms +/- 12.9% 17.4ms +/- 9.6% not conclusive: might be *1.01x as slow* bitops: ?? 44.8ms +/- 13.4% 45.2ms +/- 4.5% not conclusive: might be *1.01x as slow* 3bit-bits-in-byte: *1.20x as slow* 2.0ms +/- 0.0% 2.4ms +/- 28.4% significant bits-in-byte: ?? 11.6ms +/- 22.2% 12.8ms +/- 12.7% not conclusive: might be *1.10x as slow* bitwise-and: ?? 3.0ms +/- 29.3% 3.2ms +/- 17.4% not conclusive: might be *1.07x as slow* nsieve-bits: - 28.2ms +/- 13.0% 26.8ms +/- 3.9% controlflow: - 12.4ms +/- 34.4% 11.8ms +/- 26.2% recursive: - 12.4ms +/- 34.4% 11.8ms +/- 26.2% crypto: ?? 56.4ms +/- 8.5% 57.6ms +/- 7.1% not conclusive: might be *1.02x as slow* aes: ?? 32.4ms +/- 9.2% 32.6ms +/- 10.3% not conclusive: might be *1.01x as slow* md5: ?? 14.4ms +/- 14.4% 14.8ms +/- 7.0% not conclusive: might be *1.03x as slow* sha1: ?? 9.6ms +/- 19.6% 10.2ms +/- 15.9% not conclusive: might be *1.06x as slow* date: - 156.2ms +/- 1.3% 155.8ms +/- 1.3% format-tofte: ?? 88.0ms +/- 1.4% 88.8ms +/- 2.5% not conclusive: might be *1.01x as slow* format-xparb: - 68.2ms +/- 1.5% 67.0ms +/- 3.5% math: ?? 36.6ms +/- 5.7% 37.2ms +/- 6.4% not conclusive: might be *1.02x as slow* cordic: - 12.4ms +/- 9.0% 12.4ms +/- 19.5% partial-sums: ?? 16.2ms +/- 6.4% 16.4ms +/- 6.8% not conclusive: might be *1.01x as slow* spectral-norm: ?? 8.0ms +/- 19.0% 8.4ms +/- 22.4% not conclusive: might be *1.05x as slow* regexp: - 57.4ms +/- 2.9% 56.8ms +/- 3.2% dna: - 57.4ms +/- 2.9% 56.8ms +/- 3.2% string: ?? 290.8ms +/- 0.8% 293.0ms +/- 0.8% not conclusive: might be *1.01x as slow* base64: ?? 16.0ms +/- 14.5% 17.2ms +/- 13.9% not conclusive: might be *1.07x as slow* fasta: - 73.6ms +/- 1.9% 71.8ms +/- 2.8% tagcloud: ?? 89.4ms +/- 0.8% 89.6ms +/- 2.9% not conclusive: might be *1.00x as slow* unpack-code: *1.02x as slow* 82.0ms +/- 1.5% 83.8ms +/- 1.2% significant validate-input: ?? 29.8ms +/- 5.4% 30.6ms +/- 6.2% not conclusive: might be *1.03x as slow* Basically seems to be a wash, which is the best I can hope for on sunspider: it doesn't exercise the pathological cases, so could only pay the cost of slightly bigger JSString... There's at least one in-development site that this patch definitely helps, though; see bug 525161.
(In reply to comment #13) > OK. Sunspider (in browser; if there's a way I should be running it in shell I > haven't found the right docs so far) You have to run sunspider --shell=<path-to-js> and then sunspider-compare-results --shell .... > There's at least one in-development site that this patch definitely helps, > though; see bug 525161. I'm following that bug but I couldn't tell what was going on. Can the reporter say whether he was using new String (and is no longer)? /be
No idea, but they're substringing an XHR responseText, which causes copies without this patch. For the "run sunspider --shell=..." part, do I need to be doing this on something not in our srctree? We don't seem to have a sunspider executable anything around, right?
The sunspider program is a perl script ;-). So is sunspider-compare-results. Get 'em from webkit.org: http://trac.webkit.org/browser/trunk/SunSpider /be
TEST COMPARISON FROM TO DETAILS ============================================================================= ** TOTAL **: 1.013x as fast 838.1ms +/- 0.2% 827.0ms +/- 0.2% significant ============================================================================= 3d: - 129.8ms +/- 0.6% 129.3ms +/- 0.6% cube: - 42.4ms +/- 0.9% 42.4ms +/- 1.2% morph: 1.024x as fast 25.5ms +/- 1.5% 24.9ms +/- 0.9% significant raytrace: ?? 61.9ms +/- 0.4% 62.0ms +/- 0.5% not conclusive: might be *1.002x as slow* access: 1.034x as fast 122.5ms +/- 0.7% 118.5ms +/- 0.6% significant binary-trees: 1.008x as fast 26.0ms +/- 0.0% 25.8ms +/- 1.2% significant fannkuch: 1.056x as fast 58.4ms +/- 0.9% 55.3ms +/- 0.6% significant nbody: - 24.5ms +/- 1.5% 24.4ms +/- 1.5% nsieve: 1.046x as fast 13.6ms +/- 2.7% 13.0ms +/- 0.0% significant bitops: ?? 35.7ms +/- 1.0% 36.0ms +/- 0.0% not conclusive: might be *1.008x as slow* 3bit-bits-in-byte: ?? 1.3ms +/- 26.6% 1.4ms +/- 26.4% not conclusive: might be *1.077x as slow* bits-in-byte: 1.044x as fast 9.4ms +/- 3.9% 9.0ms +/- 0.0% significant bitwise-and: - 2.0ms +/- 0.0% 2.0ms +/- 0.0% nsieve-bits: *1.026x as slow* 23.0ms +/- 0.0% 23.6ms +/- 1.6% significant controlflow: - 8.2ms +/- 3.7% 8.1ms +/- 2.8% recursive: - 8.2ms +/- 3.7% 8.1ms +/- 2.8% crypto: - 51.5ms +/- 2.2% 51.3ms +/- 1.6% aes: - 29.0ms +/- 3.5% 28.9ms +/- 2.5% md5: - 14.5ms +/- 2.6% 14.4ms +/- 2.6% sha1: - 8.0ms +/- 0.0% 8.0ms +/- 0.0% date: 1.017x as fast 134.2ms +/- 0.2% 132.0ms +/- 0.3% significant format-tofte: 1.039x as fast 68.6ms +/- 0.5% 66.0ms +/- 0.5% significant format-xparb: *1.006x as slow* 65.6ms +/- 0.6% 66.0ms +/- 0.0% significant math: ?? 28.8ms +/- 2.0% 28.9ms +/- 0.8% not conclusive: might be *1.003x as slow* cordic: ?? 9.1ms +/- 2.5% 9.4ms +/- 3.9% not conclusive: might be *1.033x as slow* partial-sums: - 13.8ms +/- 2.2% 13.5ms +/- 2.8% spectral-norm: ?? 5.9ms +/- 3.8% 6.0ms +/- 0.0% not conclusive: might be *1.017x as slow* regexp: - 41.1ms +/- 0.5% 41.0ms +/- 0.0% dna: - 41.1ms +/- 0.5% 41.0ms +/- 0.0% string: 1.016x as fast 286.3ms +/- 0.3% 281.9ms +/- 0.4% significant base64: *1.031x as slow* 12.8ms +/- 2.4% 13.2ms +/- 2.3% significant fasta: *1.015x as slow* 67.7ms +/- 0.5% 68.7ms +/- 0.5% significant tagcloud: *1.013x as slow* 89.4ms +/- 0.4% 90.6ms +/- 0.6% significant unpack-code: 1.090x as fast 90.4ms +/- 0.4% 82.9ms +/- 0.5% significant validate-input: *1.019x as slow* 26.0ms +/- 0.0% 26.5ms +/- 1.9% significant
Comment on attachment 413984 [details] [diff] [review] With jsword mFlags Why not keep flatLength for its isFlat() assertion, and similarly for dependentLength()? It would reduce the patch size. Easy to edit the patch (or a diff -U 1 version) and reapply with patch(1). Agree on getting rid of the prefix complexity of course. /be
> Why not keep flatLength for its isFlat() assertion, and similarly for > dependentLength()? Either way; I figured since the implementation is now identical it wasn't worth the cognitive load of making consumers decide between the two (and specifically I wanted to remove the branching some consumers had to do). I suppose I could remove the branching but keep the specific calls in places where we know the string is one or the other; worth it?
I'm in favor of reducing the size of the patch but I'm imposing a bit on your good nature. At this point I'd like jorendorff or jwalden or both to weigh in. Let me say r=me but get one of them to cast the deciding vote. /be
I vote for bz's patch as it stands.
My vote turned out not to be deciding after all. :)
Context: on IRC I clarified that I was concerned about loss of isFlat() assert coverage. That seems worth keeping. /be
Too terse again: and not all paths that called flatLength() also call flatChars(). So there's some value to having flatLength() -- which just asserts isFlat() and then returns mLength with the patch. /be
OK, brendan explained what he actually wants: making sure that we don't lose assert coverage with the length changes. Looking at the patch: * The dependentLength call in JS_GetStringChars is always followed by a dependentChars() call, which asserts. * The dependentLength() call in MinimizeDependentStrings comes after an assert that str->IsDependent(). * The dependentLength call in js_UndependString is always followed by dependentChars(), unless the malloc fails and we error out. * The dependentLength call in getCharsAndLength is followed by dependentChars() * The dependentLength call in getCharsAndEnd is followed by dependentChars() * The flatLength calls in js_AtomizeString comes right after flatChars() calls. * The flatLength() call in GetAtomTotalSize will no longer assert if ATOM_TO_STRING returns a non-flat string. This could use an isFlat() assert, since it'll return the wrong thing if an atom happens to be a dependent string somehow. * The flatLength call in js_CheckForStringIndex comes after a flatChars() call. * The flatLength calls in js_FoldConstants will stop asserting. Again, this is an ATOM_TO_STRING; should ATOM_TO_STRING handle the asserting, maybe? * The flatLength call in js_GetDependentStringChars is followed by flatChars(). * The flatLength call in getCharsAndLength is followed by flatChars() * The flatLength call in getCharsAndEnd is followed by clatChars()
(In reply to comment #18) > (From update of attachment 413984 [details] [diff] [review]) > Why not keep flatLength for its isFlat() assertion, and similarly for > dependentLength()? [...] > /be (In reply to comment #25) > OK, brendan explained what he actually wants: making sure that we don't lose > assert coverage with the length changes. I re-explained on IRC after explaining once in comment 18 :-/. > Looking at the patch: > > * The dependentLength call in JS_GetStringChars is always followed by a > dependentChars() call, which asserts. Asserting twice is not needed but not harmful. I'd prefer the parallel names but that's aesthetics. > * The dependentLength() call in MinimizeDependentStrings comes after an assert > that str->IsDependent(). That could be eliminated because dependentLength() asserts. > * The flatLength calls in js_AtomizeString comes right after flatChars() calls. > * The flatLength() call in GetAtomTotalSize will no longer assert if > ATOM_TO_STRING returns a non-flat string. This could use an isFlat() assert, > since it'll return the wrong thing if an atom happens to be a dependent > string somehow. The assert is good but why must it be pushed up into the flatLength() caller? I'd rather leave the flatLength() call there and let it do the deed. > * The flatLength calls in js_FoldConstants will stop asserting. Again, this > is an ATOM_TO_STRING; should ATOM_TO_STRING handle the asserting, maybe? It could at some cost -- it currently is a macro whose body is JSVAL_TO_STRING(ATOM_KEY(atom)). I know it's breaking a butterfly on a wheel, but my preference given all of this (thanks for the survey!) is to keep flatLength precisely for its common isFlat() assertion, and (bonus) for the name parallelism. /be
In case it is not obvious, the places where the patch combines things with length(), where the string could be flat or dependent, are good and winning. We want length(). I'm only taking issue with the flatLength or dependentLength call removals that lose assertion coverage. /be
Why isn't any and all consideration of renaming, of changing APIs, and of conceivably losing assertion coverage in the process being done in a separate bug? We should fix the summary and not permit this scope creep. I'd rather see the win here earlier and a slightly more laboriously modified source than one that intermeshes this with name changes, making that change harder to read and review.
Waldo: my point, but a bit extreme. We don't remodel a house in too many small jobs, or defer fixing a hole in the wall while sheetrocking the next room. But we do demolish then frame then finish then paint, so I think you're hitting close to the target. Again, I agree with bz (which may mean that I disagree with you) on preserving silly tests like (isFlat() ? flatLength() : dependentLength()) when we could use length(), and of course the prefix special case goes away. But I agree with you that the patch should be smaller than it is, and bz now agrees too. Let's not get too flamey, ok? /be
I'm actually not sure what I think about the "silly" tests. They made sense at the time, they may still make sense, or maybe we really should get JSFlatString and JSDependentString classes in place so that we can have method overriding give us safety and speed wins almost for free -- that might be best all around. I just don't think they need to, or should, be here.
We should not have vacuous tests of the form (isFlat() : mLength : mLength) in the code for long. Staging the cleanup that follows from mLength being its own size_t field with no flag bits to another bug is silly, you can quote me. /be
> Why isn't any and all consideration of renaming, of changing APIs, and of > conceivably losing assertion coverage in the process being done in a separate > bug? Because I asked whether separate staged changesets were desired and was told rolling it all into one was ok. ;) Though grated, I changed the length API in the same local patch in which I made the number of bits used for length not depend on the dependent bit...
Attached patch Said smaller patch (obsolete) (deleted) — Splinter Review
Note that we have tons of ATOM_TO_STRING(foo)->length() in jsparse.cpp; if we really care about the flatLength() assert coverage on that we should consider changing those to use flatLength. In any case, this patch should be what was asked for.
Attachment #413984 - Attachment is obsolete: true
Attachment #414204 - Flags: review?(brendan)
Attachment #413984 - Flags: review?(brendan)
Comment on attachment 414204 [details] [diff] [review] Said smaller patch >+++ b/js/src/jsstr.h >@@ -47,16 +47,17 @@ > * necessitating a pointer after the count, to form a separately allocated > * string descriptor. String descriptors are GC'ed, while their chars are > * allocated from the malloc heap. > */ > #include <ctype.h> > #include "jspubtd.h" > #include "jsprvtd.h" > #include "jslock.h" >+#include "jsapi.h" We should not include jsapi.h in jsstr.h (but if this were necessary, prevailing style alphabetizes within a group of includes: standard, js-nspr-like, js-remaining). Did you include jsapi.h for JSVAL_INT_MAX? That should not be used for MAX_LENGTH. IIRC we want a saner upper bound on string length than 2^30 or anything larger. We had 2^28 jschars, so you could just hardcode that to avoid making a change in this patch. > JS_ALWAYS_INLINE void getCharsAndLength(const jschar *&chars, size_t &length) { >+ length = this->length(); > if (isDependent()) { >- length = dependentLength(); > chars = dependentChars(); > } else { >- length = flatLength(); > chars = flatChars(); > } Don't overbrace single-line per if/then/else statements. Do use ?: (as getCharsAndEnd does) to set a single chars lvalue, instead of if-else. Do set length (the reference outparam) last. >@@ -10119,19 +10090,18 @@ TraceRecorder::record_JSOP_NOT() > lir->ins_eq0(lir->ins2(LIR_feq, v_ins, v_ins)))); > return ARECORD_CONTINUE; > } > if (JSVAL_TAG(v) == JSVAL_OBJECT) { > set(&v, lir->ins_peq0(get(&v))); > return ARECORD_CONTINUE; > } > JS_ASSERT(JSVAL_IS_STRING(v)); >- set(&v, lir->ins_peq0(lir->ins2(LIR_piand, >- lir->insLoad(LIR_ldp, get(&v), (int)offsetof(JSString, mLength)), >- INS_CONSTWORD(JSString::LENGTH_MASK)))); >+ set(&v, lir->ins_peq0(lir->insLoad(LIR_ldp, get(&v), >+ (int)offsetof(JSString, mLength)))); Are these (int) casts necessary? I think not and keep removing them. Another below in the patch. /be
(In reply to comment #34) > Did you include jsapi.h for JSVAL_INT_MAX? Yes; I preserved that because I assumed that there was a reason for the existing static assert to the effect that MAX_LENGTH <= JSVAL_INT_MAX. > We had 2^28 jschars, so you could just hardcode that to avoid > making a change in this patch. OK. Easy enough. > Don't overbrace single-line per if/then/else statements. > > Do use ?: (as getCharsAndEnd does) to set a single chars lvalue, instead of > if-else. > > Do set length (the reference outparam) last. OK. > Are these (int) casts necessary? I think not and keep removing them. Another > below in the patch. I basically tried to not touch the parts that didn't need touching here, especially given the general undocumented mess with LIR, multiple functions doing different things but having the same names, etc. It does look like all the insLoad signatures have int32_t as the third arg so the cast does more or less nothing. I can remove it.
Attached patch With those changes (deleted) — Splinter Review
Attachment #414204 - Attachment is obsolete: true
Attachment #414268 - Flags: review?(brendan)
Attachment #414204 - Flags: review?(brendan)
Bugzilla interdiff works fine for those last two patches if you just want to see the changes I made.
Comment on attachment 414268 [details] [diff] [review] With those changes >+++ b/js/src/jsstr.h >@@ -47,16 +47,17 @@ > * necessitating a pointer after the count, to form a separately allocated > * string descriptor. String descriptors are GC'ed, while their chars are > * allocated from the malloc heap. > */ > #include <ctype.h> > #include "jspubtd.h" > #include "jsprvtd.h" > #include "jslock.h" >+#include "jsapi.h" Please do remove this #include "jsapi.h". >+ static const size_t MAX_LENGTH = (1 << 28); /* That's what we used to do */ This comment will not age gracefully. How about "generous but sane length bound"? > JS_ALWAYS_INLINE void getCharsAndLength(const jschar *&chars, size_t &length) { >- if (isDependent()) { >- length = dependentLength(); >- chars = dependentChars(); >- } else { >- length = flatLength(); >- chars = flatChars(); >- } >+ chars = this->chars(); >+ length = this->length(); > } > > JS_ALWAYS_INLINE void getCharsAndEnd(const jschar *&chars, const jschar *&end) { >- end = isDependent() >- ? dependentLength() + (chars = dependentChars()) >- : flatLength() + (chars = flatChars()); >+ end = length() + (chars = this->chars()); This is cleaner -- I trust GCC will not test isFlat() more than once, and all that. Can you confirm from gdb disass or equiv.? Looks good otherwise -- r=me with above addressed. Thanks again! /be
Attachment #414268 - Flags: review?(brendan) → review+
> Please do remove this #include "jsapi.h". Oops, done. > How about "generous but sane length bound"? Done. > I trust GCC will not test isFlat() more than once I'd hope not; why would it? I can try to verify on the assembly level, but I don't even see why it would happen on the source level....
Nm, I was having double vision. /be
Whiteboard: fixed-in-tracemonkey
Status: NEW → RESOLVED
Closed: 15 years ago
Resolution: --- → FIXED
Depends on: 533148
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: