Closed Bug 1019512 Opened 10 years ago Closed 10 years ago

Latin1 strings: support trim*, toLowerCase, toUpperCase

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla32

People

(Reporter: jandem, Assigned: jandem)

References

Details

Attachments

(2 files)

      No description provided.
Pretty straight-forward. Note that we could add a IsSpace(Latin1Char), but IsSpace is already very efficient for ASCII chars.

I've been replacing many getChars calls with ensureLinear. We'll probably be able to remove getChars when I'm done.
Attachment #8433244 - Flags: review?(luke)
toLowerCase is pretty straight-forward: if the input is a Latin1 string, the output must also be a Latin1 string.

toUpperCase doesn't have this property: \u00ff maps to \u0178, a non-Latin1 char. So for now toUpperCase always returns a TwoByte string. I also added a test for this.

There are some other, pre-existing issues with toLowerCase/toUpperCase:

* If the string is small, we should use a (fat) inline string.
* If the input has the right case, we should return it without creating a new string.
* ToLowerCase(c) and ToUpperCase(c) could probably be faster, at least if the argument is Latin1Char.

I'll file a follow-up bug to address these. I don't want to spend too much time on this until the rest is done.
Attachment #8433250 - Flags: review?(luke)
(In reply to Jan de Mooij [:jandem] from comment #2)
> I'll file a follow-up bug to address these.

Bug 1019543.
Attachment #8433244 - Flags: review?(luke) → review+
Comment on attachment 8433250 [details] [diff] [review]
Part 2 - toLowerCase and toUpperCase

Review of attachment 8433250 [details] [diff] [review]:
-----------------------------------------------------------------

::: js/src/jsstr.cpp
@@ +695,5 @@
> +        const CharT *chars = str->chars<CharT>(nogc);
> +        for (size_t i = 0; i < length; i++) {
> +            jschar c = unicode::ToLowerCase(chars[i]);
> +            if (IsSame<CharT, Latin1Char>::value)
> +                MOZ_ASSERT(c <= 0xff);

Perhaps JS_ASSERT_IF?

@@ +703,5 @@
> +    }
> +
> +    JSString *res = js_NewString<CanGC>(cx, newChars, length);
> +    if (!res) {
> +        js_free(newChars);

How about instead making newChars a ScopedJSFreePtr and putting newChars.forget() on the success return path?  Even if it's trivial in this case, I like the general style of using RAII to handle failure paths.

@@ +770,3 @@
>  {
> +    // toUpperCase on a Latin1 string can yield a non-Latin1 string. For now,
> +    // we use a TwoByte string for the result.

Might be nice to file a [meta] bug for each case to consider in the future to track each of these cases.  If nothing else, could be some [good first bugs].  It seems like eventually we should try to do them all since any of them, for the wrong workload could make a 2x space difference.

@@ +784,5 @@
> +    }
> +
> +    JSString *res = js_NewString<CanGC>(cx, newChars, length);
> +    if (!res) {
> +        js_free(newChars);

Same ScopedJSFreePtr comment.

::: js/src/jsstr.h
@@ +96,2 @@
>  extern JSFlatString *
> +js_NewString(js::ThreadSafeContext *cx, CharT *chars, size_t length);

Perhaps in one of these string patches we could remove the js_ prefixes from all these string functions and move them into vm/String.h.  They increasingly stick out like a sore thumb.
Attachment #8433250 - Flags: review?(luke) → review+
(In reply to Luke Wagner [:luke] from comment #4)
> > +            if (IsSame<CharT, Latin1Char>::value)
> > +                MOZ_ASSERT(c <= 0xff);
> 
> Perhaps JS_ASSERT_IF?

Yeah, that's what I tried first, but it doesn't work:

  jsstr.cpp:698:60: error: too many arguments provided to function-like macro invocation
              JS_ASSERT_IF(IsSame<CharT, Latin1Char>::value, c <= 0xff);

I think the preprocessor treats the "," after CharT as an argument separator...
https://hg.mozilla.org/integration/mozilla-inbound/rev/67308d682238
https://hg.mozilla.org/integration/mozilla-inbound/rev/c3d4d93e55b3

(In reply to Luke Wagner [:luke] from comment #4)
> How about instead making newChars a ScopedJSFreePtr and putting
> newChars.forget() on the success return path?  Even if it's trivial in this
> case, I like the general style of using RAII to handle failure paths.

Good idea, done.

> Might be nice to file a [meta] bug for each case to consider in the future
> to track each of these cases.  If nothing else, could be some [good first
> bugs].  It seems like eventually we should try to do them all since any of
> them, for the wrong workload could make a 2x space difference.

Yeah I made the follow-up bug block the "latin1strings" bug, I'll revisit these bugs once the initial conversion is done.
https://hg.mozilla.org/mozilla-central/rev/67308d682238
https://hg.mozilla.org/mozilla-central/rev/c3d4d93e55b3
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: