Closed
Bug 1019512
Opened 10 years ago
Closed 10 years ago
Latin1 strings: support trim*, toLowerCase, toUpperCase
Categories
(Core :: JavaScript Engine, defect)
Core
JavaScript Engine
Tracking
()
RESOLVED
FIXED
mozilla32
People
(Reporter: jandem, Assigned: jandem)
References
Details
Attachments
(2 files)
(deleted),
patch
|
luke
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
luke
:
review+
|
Details | Diff | Splinter Review |
No description provided.
Assignee | ||
Comment 1•10 years ago
|
||
Pretty straight-forward. Note that we could add a IsSpace(Latin1Char), but IsSpace is already very efficient for ASCII chars. I've been replacing many getChars calls with ensureLinear. We'll probably be able to remove getChars when I'm done.
Attachment #8433244 -
Flags: review?(luke)
Assignee | ||
Comment 2•10 years ago
|
||
toLowerCase is pretty straight-forward: if the input is a Latin1 string, the output must also be a Latin1 string. toUpperCase doesn't have this property: \u00ff maps to \u0178, a non-Latin1 char. So for now toUpperCase always returns a TwoByte string. I also added a test for this. There are some other, pre-existing issues with toLowerCase/toUpperCase: * If the string is small, we should use a (fat) inline string. * If the input has the right case, we should return it without creating a new string. * ToLowerCase(c) and ToUpperCase(c) could probably be faster, at least if the argument is Latin1Char. I'll file a follow-up bug to address these. I don't want to spend too much time on this until the rest is done.
Attachment #8433250 -
Flags: review?(luke)
Assignee | ||
Comment 3•10 years ago
|
||
(In reply to Jan de Mooij [:jandem] from comment #2) > I'll file a follow-up bug to address these. Bug 1019543.
Blocks: latin1strings, 1019543
![]() |
||
Updated•10 years ago
|
Attachment #8433244 -
Flags: review?(luke) → review+
![]() |
||
Comment 4•10 years ago
|
||
Comment on attachment 8433250 [details] [diff] [review] Part 2 - toLowerCase and toUpperCase Review of attachment 8433250 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jsstr.cpp @@ +695,5 @@ > + const CharT *chars = str->chars<CharT>(nogc); > + for (size_t i = 0; i < length; i++) { > + jschar c = unicode::ToLowerCase(chars[i]); > + if (IsSame<CharT, Latin1Char>::value) > + MOZ_ASSERT(c <= 0xff); Perhaps JS_ASSERT_IF? @@ +703,5 @@ > + } > + > + JSString *res = js_NewString<CanGC>(cx, newChars, length); > + if (!res) { > + js_free(newChars); How about instead making newChars a ScopedJSFreePtr and putting newChars.forget() on the success return path? Even if it's trivial in this case, I like the general style of using RAII to handle failure paths. @@ +770,3 @@ > { > + // toUpperCase on a Latin1 string can yield a non-Latin1 string. For now, > + // we use a TwoByte string for the result. Might be nice to file a [meta] bug for each case to consider in the future to track each of these cases. If nothing else, could be some [good first bugs]. It seems like eventually we should try to do them all since any of them, for the wrong workload could make a 2x space difference. @@ +784,5 @@ > + } > + > + JSString *res = js_NewString<CanGC>(cx, newChars, length); > + if (!res) { > + js_free(newChars); Same ScopedJSFreePtr comment. ::: js/src/jsstr.h @@ +96,2 @@ > extern JSFlatString * > +js_NewString(js::ThreadSafeContext *cx, CharT *chars, size_t length); Perhaps in one of these string patches we could remove the js_ prefixes from all these string functions and move them into vm/String.h. They increasingly stick out like a sore thumb.
Attachment #8433250 -
Flags: review?(luke) → review+
Assignee | ||
Comment 5•10 years ago
|
||
(In reply to Luke Wagner [:luke] from comment #4) > > + if (IsSame<CharT, Latin1Char>::value) > > + MOZ_ASSERT(c <= 0xff); > > Perhaps JS_ASSERT_IF? Yeah, that's what I tried first, but it doesn't work: jsstr.cpp:698:60: error: too many arguments provided to function-like macro invocation JS_ASSERT_IF(IsSame<CharT, Latin1Char>::value, c <= 0xff); I think the preprocessor treats the "," after CharT as an argument separator...
Assignee | ||
Comment 6•10 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/67308d682238 https://hg.mozilla.org/integration/mozilla-inbound/rev/c3d4d93e55b3 (In reply to Luke Wagner [:luke] from comment #4) > How about instead making newChars a ScopedJSFreePtr and putting > newChars.forget() on the success return path? Even if it's trivial in this > case, I like the general style of using RAII to handle failure paths. Good idea, done. > Might be nice to file a [meta] bug for each case to consider in the future > to track each of these cases. If nothing else, could be some [good first > bugs]. It seems like eventually we should try to do them all since any of > them, for the wrong workload could make a 2x space difference. Yeah I made the follow-up bug block the "latin1strings" bug, I'll revisit these bugs once the initial conversion is done.
https://hg.mozilla.org/mozilla-central/rev/67308d682238 https://hg.mozilla.org/mozilla-central/rev/c3d4d93e55b3
Status: ASSIGNED → RESOLVED
Closed: 10 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla32
You need to log in
before you can comment on or make changes to this bug.
Description
•