Closed Bug 577883 Opened 14 years ago Closed 14 years ago

TM: inline unit string comparison

Categories

(Core :: JavaScript Engine, defect)

Other Branch
x86
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: gal, Assigned: gal)

References

Details

(Whiteboard: fixed-in-tracemonkey)

Attachments

(1 file, 1 obsolete file)

No description provided.
Depends on: 577882
sayrer and I both found this issue separately over the last 2 years, time to finally fix it. I will try to find the original bug(s) and dup it/them. Tired and want to upload this.
Attached patch patch (obsolete) (deleted) — Splinter Review
Measured it again. This is closer to a 30ms/7% speedup with the other patch actually (on sunspider).
Assignee: general → gal
Attachment #456712 - Flags: review?(nnethercote)
Comment on attachment 456712 [details] [diff] [review] patch >+template <typename T> >+static inline bool CompareChars(const jschar **s1p, const jschar **s2p, size_t *np) >+{ >+ const size_t N = sizeof(T) / sizeof(jschar); >+ while (*np >= N) { >+ if (*reinterpret_cast<const T *>(*s1p) != *reinterpret_cast<const T *>(*s2p)) >+ return false; >+ (*np) -= N; >+ (*s1p) += N; >+ (*s2p) += N; >+ } >+ return true; >+} CompareChars is an awful name for a function returning a bool -- what does a true result mean? "CharsAreEqual" or something similar would be better. >+ inline bool equals(JSString *other) { >+ /* Fast case: pointer equality could be a quick win. */ >+ if (this == other) >+ return true; >+ >+ size_t n = length(); >+ if (n != other->length()) >+ return false; >+ >+ if (n == 0) >+ return true; >+ >+ const jschar *s1 = chars(); >+ const jschar *s2 = other->chars(); >+ if (n == 1) >+ return *s1 == *s2; >+ >+ if (sizeof(jsuword) == 8 && !js::CompareChars<jsuword>(&s1, &s2, &n)) >+ return false; >+ if (!js::CompareChars<uint32>(&s1, &s2, &n)) >+ return false; >+ JS_ASSERT(n == 0 || n == 1); >+ return n == 0 || *s1 == *s2; >+ } Barf. So CompareChars is effectively unrolling the loop? On 64-bits you first try comparing 8 bytes at a time, and then fall back to 4 bytes? I guess you're assuming the compiler will eliminate the "sizeof(jsuword)..." test on 32-bit machines as dead code? What happens if the size of the string is not a multiple of 8 or 4 -- won't you be comparing values past the end of the string? In general, there's way too much going on here. Comments would help. I also think the template class confuses things. Just making the two instances explicit would probably be no longer but much easier to understand. I only skimmed the rest of the patch, but I think the same problems occur as in the patch for bug 577882 -- LIR_uc2ui used instead of LIR_us2ui and you're assuming the string is flat.
Attachment #456712 - Flags: review?(nnethercote) → review-
Is the unit string bit worth doing in JM?
Back to the patch: you added inlining, plus you special-cased 1-length strings in equals(), plus you did the loop unrolling in equals(). Any idea the perf benefits of these individually? Also, would using memcmp() instead of unrolling the loop yourself be smarter? Presumably the system memcmp() is optimised already.
> Presumably the system memcmp() is optimised already. We discovered this to not necessarily be the case on some operating systems when experimenting with indexOf. See 551539 for details.
(In reply to comment #6) > > Presumably the system memcmp() is optimised already. > > We discovered this to not necessarily be the case on some operating systems > when experimenting with indexOf. See 551539 for details. That's the kind of thing that I would *love* to see in a comment!
The comments added with the code in bug 551539 mention the issue, yes. Though note that memcmp was faster than manual loop on Windows/Mac for long enough strings, too.
Summary: TM: inline unit string comparison and speed up js_StringEquals → TM: inline unit string comparison
Attached patch patch (deleted) — Splinter Review
Attachment #456712 - Attachment is obsolete: true
Cachegrind results of this patch plus the one in bug 584499: --------------------------------------------------------------- | millions of instructions executed | | total | on-trace (may overestimate) | --------------------------------------------------------------- | 111.202 111.487 (0.997x) | 20.797 20.809 (0.999x) | 3d-cube | 56.990 56.948 (1.001x) | 22.313 22.313 (------) | 3d-morph | 147.820 146.788 (1.007x) | 21.982 21.971 (1.001x) | 3d-raytrace | 77.548 77.419 (1.002x) | 17.401 17.400 (------) | access-binary- | 183.728 183.440 (1.002x) | 103.707 103.704 (------) | access-fannkuc | 41.565 41.403 (1.004x) | 17.553 17.551 (------) | access-nbody | 60.014 59.983 (1.001x) | 27.013 27.012 (------) | access-nsieve | 15.688 15.681 (------) | 2.862 2.862 (------) | bitops-3bit-bi | 43.303 43.285 (------) | 30.280 30.280 (------) | bitops-bits-in | 22.773 22.770 (------) | 10.219 10.219 (------) | bitops-bitwise | 71.624 71.610 (------) | 39.453 39.453 (------) | bitops-nsieve- | 42.397 42.309 (1.002x) | 26.655 26.654 (------) | controlflow-re | 47.115 47.002 (1.002x) | 5.541 5.660 (0.979x) | crypto-md5 | 31.336 31.286 (1.002x) | 7.030 7.109 (0.989x) | crypto-sha1 | 127.956 121.973 (1.049x) | 10.461 12.076 (0.866x) | date-format-to | 99.144 98.905 (1.002x) | 11.118 11.116 (------) | date-format-xp | 52.617 52.594 (------) | 27.614 27.614 (------) | math-cordic | 38.416 38.406 (------) | 6.252 6.252 (------) | math-partial-s | 28.590 28.474 (1.004x) | 10.286 10.285 (------) | math-spectral- | 100.938 100.914 (------) | 75.045 75.045 (------) | regexp-dna | 44.077 40.705 (1.083x) | 8.098 10.397 (0.779x) | string-base64 | 123.212 123.302 (0.999x) | 24.544 24.543 (------) | string-fasta | 272.611 269.834 (1.010x) | 6.840 6.671 (1.025x) | string-tagclou | 251.422 251.321 (------) | 5.637 5.635 (------) | string-unpack- | 64.493 63.695 (1.013x) | 7.401 7.521 (0.984x) | string-validat ------- | 2156.590 2141.546 (1.007x) | 546.114 550.163 (0.993x) | all Basically it's a good improvement for date-format-tofte, little difference for anything else.
Comment on attachment 463431 [details] [diff] [review] patch >+ if (!l_str->isRope() && !r_str->isRope() && l_str->length() == 1 && r_str->length() == 1) { >+ VMSideExit *exit = snapshot(BRANCH_EXIT); >+ guard(true, lir->ins2(LIR_eqp, getStringLength(l_ins), INS_CONSTWORD(1)), exit); >+ guard(true, lir->ins2(LIR_eqp, getStringLength(r_ins), INS_CONSTWORD(1)), exit); Factor out INS_CONSTWORD(1) so CseFilter doesn't have to? >+JS_REQUIRES_STACK LIns* >+TraceRecorder::getStringLength(LIns* str_ins) >+{ >+ return lir->ins2ImmI(LIR_rshup, >+ lir->insLoad(LIR_ldp, str_ins, offsetof(JSString, mLengthAndFlags), >+ ACCSET_OTHER, LOAD_CONST), >+ JSString::FLAGS_LENGTH_SHIFT); >+} >+ >+JS_REQUIRES_STACK LIns* >+TraceRecorder::getStringChars(LIns* str_ins) >+{ >+ return lir->insLoad(LIR_ldp, str_ins, offsetof(JSString, mChars), ACCSET_OTHER, LOAD_CONST); >+} I like these. I'll like them even more if you use addName() to give the values names like "strLen" and "strChars". You could do likewise in getStringLength() with mLengthAndFlags. Nice patch!
Attachment #463431 - Flags: review+
Whiteboard: fixed-in-tracemonkey
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Depends on: 585309
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: