Closed Bug 226439 Opened 21 years ago Closed 20 years ago

Reduce footprint of nsA(C?)String::Equals() applied to ASCII strings

Categories

(Core :: XPCOM, defect, P1)

defect

Tracking

()

RESOLVED FIXED

People

(Reporter: roc, Assigned: roc)

References

Details

(Keywords: memory-footprint)

Attachments

(8 files, 8 obsolete files)

(deleted), patch
roc
: review+
dbaron
: superreview+
Details | Diff | Splinter Review
(deleted), patch
dbaron
: review+
dbaron
: superreview+
Details | Diff | Splinter Review
(deleted), patch
dbaron
: review+
dbaron
: superreview+
Details | Diff | Splinter Review
(deleted), patch
dbaron
: review+
dbaron
: superreview+
Details | Diff | Splinter Review
(deleted), patch
darin.moz
: review-
Details | Diff | Splinter Review
(deleted), patch
darin.moz
: review+
darin.moz
: superreview+
Details | Diff | Splinter Review
(deleted), application/x-tar
darin.moz
: review+
darin.moz
: superreview+
Details
(deleted), patch
darin.moz
: review+
darin.moz
: superreview+
Details | Diff | Splinter Review
Recently Jonas converted some uses of EqualsIgnoreCase(const char*) to Equals(NS_LITERAL_STRING(...)), and it added quite a bit of footprint especially on Linux. I've implemented new Equals(const char*) methods that do not create a temporary nsDependentString nor a temporary nsStandardStringComparator. They should get back most of the footprint bloat --- and then some, if we convert prior Equals(NS_LITERAL_STRING(...)) methods to use these. Runtime performance may be a little slower in cases where the C++ optimizer does a really good job on the existing outer inline Length() check and the strings are mostly not equal. I doubt we'll notice.
Attached patch as described (obsolete) (deleted) — Splinter Review
I wrote a script which basically does s/\bEquals\(NS_LITERAL_(C?)STRING\((\"[^"]*\")\)\)/Equals\($2\)/ over the entire tree. The resulting build seems to work fine :-). I can't measure the performance impact because it was a debug build, though. I won't attach the resulting patch since it's absolutely gigantic :-).
Comment on attachment 136051 [details] [diff] [review] as described Something to try in 1.7a, but I may as well get it in the queue so I don't forget about it :-).
Attachment #136051 - Flags: superreview?(dbaron)
Attachment #136051 - Flags: review?(scc)
Keywords: footprint
Comment on attachment 136051 [details] [diff] [review] as described You need a new patch for this against the current string code. I'm not crazy about the method on nsAString -- I think it should probably be named something like EqualsASCII, since assertions about non-ASCII characters wouldn't be hit by most developers even if the string were a UTF-8 string coming from user input.
Attachment #136051 - Flags: superreview?(dbaron)
Attachment #136051 - Flags: superreview-
Attachment #136051 - Flags: review?(scc)
so, we currently have: PRBool nsACString::Equals(const char *) const; and, PRBool nsAString::Equals(const char *) const; sounds a lot like: PRBool nsString::EqualsWithConversion(const char *, ... ) const;
Yeah, except that the performance can be good. I'll whip up a new patch with EqualsASCII on both.
EqualsWithConversion does not inflate the argument before comparing. It uses nsStringObsolete.cpp's Compare2to1 method, which is pretty efficient. It sounds to me like the two methods should share the same implementation. I don't mind if you do away with some of the nsStringObsolete code ;-)
Attached patch woohoo, check this out (obsolete) (deleted) — Splinter Review
Okay, it's late and maybe I got carried away
Attachment #136051 - Attachment is obsolete: true
Attached patch revised (obsolete) (deleted) — Splinter Review
it was definitely late, because that patch was off by one.
The name should be changed to EqualsLiteral. darin: oh. well, the old string guide was incorrect then. I think the amount of code that could be reused by nsStringObselete is fairly trivial...
oooh... i like using templates to get the string literal's length. i wonder if we are going to run into portability problems with that :-/
Comment on attachment 145257 [details] [diff] [review] revised >Index: xpcom/string/public/nsCharTraits.h >+ compareASCII( const char_type* s1, const char* s2, size_t n ) ... >+ NS_ASSERTION(!(*s2 & ~0x7F), "Unexpected non-ASCII character"); the same assertion does not exist for the |char_type = char| case. isn't that inconsistent? >Index: xpcom/string/public/nsTAString.h >+ /** >+ * An efficient comparison with ASCII that can be used even for >+ * wide strings. >+ */ >+ NS_COM PRBool EqualsASCII( const char* data, int len ) const; s/int/size_type/ >+ template<int N> >+ inline PRBool EqualsASCII(const char (&str)[N]) const { >+ return EqualsASCII(str, N-1); >+ } nit: use SCC-style indentation for consistency: template<int N> inline PRBool EqualsASCII( const char (&str)[N] ) const { return EqualsASCII(str, N-1); } >Index: xpcom/string/public/nsTSubstring.h >+ NS_COM PRBool EqualsASCII( const char* data, int len ) const; >+ template<int N> >+ inline PRBool EqualsASCII(const char (&str)[N]) const { >+ return EqualsASCII(str, N-1); >+ } same nits apply here. r=me w/ those changes.
Attachment #145257 - Flags: review-
Attached patch revised (obsolete) (deleted) — Splinter Review
revised as per darin's comments. We ought to prepare the fallback code. What's the cleanest way of selecting between the template code and some strlen code per-platform?
Attachment #145257 - Attachment is obsolete: true
Attached patch revised^2 (deleted) — Splinter Review
This adds an easy way to turn off the use of templates to match literals. I've built with the templates disabled and that works.
Attachment #145447 - Attachment is obsolete: true
Comment on attachment 145975 [details] [diff] [review] revised^2 Carrying forward darin's r+. David, if you're OK with this code then I think we can go ahead and check it in on an otherwise quiet evening and see what happens...
Attachment #145975 - Flags: superreview?(dbaron)
Attachment #145975 - Flags: review+
Comment on attachment 145975 [details] [diff] [review] revised^2 The NSCAP_ prefix has always been used only for nsCOMPtr. Perhaps just use NS_? #ifdef-ing includes is a recipe for bustage. However, nsCharTraits.h already includes <string.h>, so I think you should just remove those #ifdef-ed includes completely. with those changes, sr=dbaron
Attachment #145975 - Flags: superreview?(dbaron) → superreview+
Sure. I'll do those things and check in sometime soon when the tree is quiet and I have time to babysit. Thanks!
checked in. here we go
Comment on attachment 145975 [details] [diff] [review] revised^2 >+#ifdef NSCAP_DISABLE_LITERAL_TEMPLATE >+ inline PRBool EqualsLiteral( const char* str ) const >+ { >+ return EqualsASCII(str, strlen(str)); >+ } >+#else >+ template<int N> >+ inline PRBool EqualsLiteral( const char (&str)[N] ) const >+ { >+ return EqualsASCII(str, N-1); >+ } >+#endif Do you suppose this template would work on MSVC? template<class S> inline PRBool EqualsLiteral( S &str ) const { return EqualsASCII(str, sizeof str - 1); }
I don't know if that will work, but I'd like someone with a VC++ compiler to tell me. I'd also like to know whether VC++ 7 can handle the current template. In the meantime I'm going to convert over a file or two so we can see what the code size impact is.
Attached patch experiments for code size (deleted) — Splinter Review
Here are a couple of call sites converted to EqualsLiteral. The nsPresShell patch uses nsAString::EqualsLiteral and the nsObjectFrame patch uses nsACString::EqualsLiteral. I'll check these two parts in separately so we can see the effect of each.
Attachment #146955 - Flags: superreview?(dbaron)
Attachment #146955 - Flags: review?(dbaron)
> I'd also like to know whether VC++ 7 can handle the current template. VC++ 7.1 (as found in Visual Studio .Net 2003 Enterprise) seems to be able to compile the template. I backed out your latest patch to fix windows tinderbox bustage. xpcom.dll just built without errors. Dunno about VC++ 7.0 though.
Alright, I'll make the test check _MSC_VER < 1310.
If I were to try this ... char chars[8] = "foo"; nsCString buf(chars); printf("%s\n", buf.EqualsLiteral(chars) ? "true" : "false"); ... it would compile without errors. And I would see false on a platform which is using the template and true on one which not. No? Documenting how not to use this method might prevent those sorts of abuses.
Good point, but I think this is not very likely. Anyway I will add some documentation explaining that you really need to pass a literal string, not a reference to a variable or any other token.
Comment on attachment 146955 [details] [diff] [review] experiments for code size beware that code size experiments with small samples are sometimes misleading (inlined functions output separately once they're used, perhaps, or things like that)
Attachment #146955 - Flags: superreview?(dbaron)
Attachment #146955 - Flags: superreview+
Attachment #146955 - Flags: review?(dbaron)
Attachment #146955 - Flags: review+
OK. Stuff checked in. I guess the next step will be to use a script to whack all of layout and see what effect that has on code size. I'll apply the following two regular expressions: s/\bEquals\(NS_LITERAL_(C?)STRING\((\"[^"]*\")\)\)/EqualsLiteral\($2\)/ s/\b== \(NS_LITERAL_(C?)STRING\((\"[^"]*\")\)/.EqualsLiteral\($2\)/ There are also instances of '!= NS_LITERAL_STRING' but I'll have to fix those up by hand. How much review do we want for these changes?
Results from the Firefox tinderboxes: Linux redwood: Zdiff:-208 (+0/-208) MacOSX imola: Zdiff:+16 (+156/-140) WINNT5 beast: Zdiff:+74 (+304/-230) beast is using the strlen version. Too bad we don't have a VC++7.1 tinderbox.
Seamonkey tinderboxes: Linux brad: Zdiff:-336 (+2/-338) Linux luna: Zdiff:-224 (+0/-224)
That's a whole lot for just 3 string compares. Maybe we should experiment with making the non-template version non-inline.
This can be used by the non-templated EqualsLiteral, and also by client code that has a char* or char[] that may not be a literal. It's even plausible that it would be a win to do away with the template altogether and just use this. We can test that later...
Comment on attachment 147309 [details] [diff] [review] provide a fast compare-with-null-terminated-ASCII Note that I've tested this by converting layout (patch forthcoming) to EqualsLiteral and then building with the template disabled.
Attachment #147309 - Flags: superreview?(dbaron)
Attachment #147309 - Flags: review?(dbaron)
Attached patch layout conversion (deleted) — Splinter Review
This used the regexps that I noted plus by-hand conversion of the != operators. Plus, uses of Equals(NS_LITERAL_(C?)STRING(...)) where the string was not a literal were converted to EqualsASCII.
Comment on attachment 147310 [details] [diff] [review] layout conversion Up to you how much you want to review vs rubberstamp :-)
Attachment #147310 - Flags: superreview?(dbaron)
Attachment #147310 - Flags: review?(dbaron)
Attachment #147309 - Flags: superreview?(dbaron)
Attachment #147309 - Flags: superreview+
Attachment #147309 - Flags: review?(dbaron)
Attachment #147309 - Flags: review+
Attachment #147310 - Flags: superreview?(dbaron)
Attachment #147310 - Flags: superreview+
Attachment #147310 - Flags: review?(dbaron)
Attachment #147310 - Flags: review+
The rough numbers (aaronl checked in a small patch at the same time): Firefox Linux (redwood): Zdiff:-14126 (+151/-14277) Firefox MacOSX (imola): Zdiff:-12288 (+720/-13008) Firefox WinNT5 (beast): Zdiff:-19180 (+1895/-21075) Seamonkey Linux (brad): Zdiff:-13670 (+1310/-14980) Seamonkey Linux (luna): Zdiff:-15874 (+194/-16068) It also appears to have reduced Tp. Definitely worth converting the rest of the codebase! We should probably do an AppendASCII/AppendLiteral as well.
Attached patch convert the rest of the tree (obsolete) (deleted) — Splinter Review
This is it!
More things that should be done after this: provide AppendASCII/AppendLiteral provide EqualsASCIICaseInsensitive/EqualsLiteralCaseInsensitive
darin, dbaron, does one of you want to rubber-stamp this?
Attached patch drop tag changes in NSS (deleted) — Splinter Review
those tag changes in NSS don't need to be there.
How did you create the patch? I'd rather review an automated process than review the patch.
It was mostly done using these commands: find . -name '*.cpp' | xargs perl -pi -e 's/\bEquals\(NS_LITERAL_(C?)STRING\((\"[^"]*\")\)\)/EqualsLiteral\($2\)/g' find . -name '*.h' | xargs perl -pi -e 's/\bEquals\(NS_LITERAL_(C?)STRING\((\"[^"]*\")\)\)/EqualsLiteral\($2\)/g' find . -name '*.cpp' | xargs perl -pi -e 's/ == NS_LITERAL_(C?)STRING\((\"[^"]*\"\)/.EqualsLiteral\($2\)/g' find . -name '*.h' | xargs perl -pi -e 's/ == NS_LITERAL_(C?)STRING\((\"[^"]*\"\)/.EqualsLiteral\($2\)/g' find . -name '*.cpp' | xargs perl -pi -e 's/\b(\w+) != NS_LITERAL_(C?)STRING\((\"[^"]*\")\)/!$1.EqualsLiteral\($3\)/g' find . -name '*.h' | xargs perl -pi -e 's/\b(\w+) != NS_LITERAL_(C?)STRING\((\"[^"]*\")\)/!$1.EqualsLiteral\($3\)/g' find . -name '*.h' | xargs perl -pi -e 's/\bEquals\(NS_LITERAL_(C?)STRING\(([^\)]+)\)\)/EqualsASCII\($2\)/g' find . -name '*.cpp' | xargs perl -pi -e 's/\bEquals\(NS_LITERAL_(C?)STRING\(([^\)]+)\)\)/EqualsASCII\($2\)/g' I also hand-edited a few files: htmlparser/src/nsParser.cpp parser/htmlparser/src/nsParser.cpp (not sure which is the real one, I edited them both) xpfe/components/winhooks/nsWindowsHooksUtil.cpp
- contentType.Equals(NS_LITERAL_CSTRING(UNKNOWN_CONTENT_TYPE)) || + contentType.EqualsASCII(UNKNOWN_CONTENT_TYPE) || you could use EqualsLiteral here. UNKNOWN_CONTENT_TYPE is just a macro. in fact, i think there are other cases like this in the patch. i would recommend searching the patch for cases where EqualsASCII is used, and fixup as appropriate.
I would rather use EqualsASCII for the places where it is not clear looking just at the line that it's a literal. Otherwise we create potential for confusion. The gain from using EqualsLiteral over EqualsASCII is probably quite small, anyway.
(In reply to comment #44) > I would rather use EqualsASCII for the places where it is not clear looking just > at the line that it's a literal. Otherwise we create potential for confusion. > The gain from using EqualsLiteral over EqualsASCII is probably quite small, anyway. why is that more confusing than the code it replaces? NS_LITERAL_CSTRING(FOO) -> EqualsLiteral(FOO)
good point. Let's reduce the confusion :-)
Honestly, we could convert those to EqualsLiteral, and I don't really mind if someone does it, but it's just extra work for dubious performance gain and I'm not really motivated to do it.
The EqualsASCIIIgnoreCase stuff is hard because xpcom can't depend on intl, so we can't have functions in xpcom that convert Unicode to lower case. Any suggestions?
I believe that in any case when we want to be doing EqualsASCIIIgnoreCase we want to be doing locale-insensitive conversion. So no intl involved...
So is it true that the only Unicode character that locale-independent lower-case maps to 'e' is 'E'?
I think it's more like "if we're doing a EqualsASCIIIgnoreCase and we have any non-ASCII characters, we're not equal". And then it's a matter of doing a case-insensitive comparison between two characters that are both in the ascii range... (still have to do locale-independent because of things like the 'I' and 'i' business in Turkish).
Attached patch Assign, Append and EqualsIgnoreCase variants (obsolete) (deleted) — Splinter Review
OK, this patch does that.
Comment on attachment 147482 [details] [diff] [review] Assign, Append and EqualsIgnoreCase variants Seems to work after I converted layout to use these functions.
Attachment #147482 - Flags: superreview?(darin)
Attachment #147482 - Flags: review?(darin)
Comment on attachment 147411 [details] [diff] [review] drop tag changes in NSS Darin, you said you'd review or rubber-stamp this?
Attachment #147411 - Flags: superreview?(darin)
Attachment #147411 - Flags: review?(darin)
Comment on attachment 147482 [details] [diff] [review] Assign, Append and EqualsIgnoreCase variants >+ static >+ int >+ compareASCIIIgnoreCase( const char_type* s1, const char* s2, size_t n ) >+ { >+ for ( ; n--; ++s1, ++s2 ) >+ { >+ NS_ASSERTION(!(*s2 & ~0x7F), "Unexpected non-ASCII character"); >+ if ( !eq_int_type(to_int_type(ASCIIToLower(*s1)), >+ to_int_type(tolower(*s2))) ) >+ return to_int_type(ASCIIToLower(*s1)) >+ - to_int_type(tolower(*s2)); >+ } >+ >+ return 0; >+ } Alternative loop body suggestion: if (*s1 & ~0x7F) return 1; int_type c1 = tolower(*s1); int_type c2 = tolower(*s2); if (!eq_int_type(c1, c2)) return c1 - c2; If you don't like the return 1; then you have the choice of return *s1 - *s2; or int_type c2 = tolower(*s2); if (*s1 & ~0x7F) return *s1 - c2; int_type c1 = tolower(*s1); etc.
Those are fine. It should all come to the same thing really.
You can't use tolower() for ascii comparisons, last I checked -- it's locale-aware. See bug 159328 for the havoc that can ensue.
Cripes. Alright, I'll do it by hand inline in ASCIIToLower.
(In reply to comment #58) >Alright, I'll do it by hand inline in ASCIIToLower. Does that include inlining ASCIIToLower into compareASCIIIgnoreCase?
nsCRT::toLower does what you want.
> nsCRT::toLower does what you want. nsCRT::ToLower isn't inline, unfortunately. > Does that include inlining ASCIIToLower into compareASCIIIgnoreCase? That should happen, since ASCIIToLower is an inline method of nsCharTraits.
(In reply to comment #61) >>Does that include inlining ASCIIToLower into compareASCIIIgnoreCase? >That should happen, since ASCIIToLower is an inline method of nsCharTraits. Yes, sorry for overlooking that.
Sorry for the huge delay roc... I do plan to spend some time reviewing these patches early next week.
Attached patch updated (obsolete) (deleted) — Splinter Review
Fixes use of tolower as discussed here. Also, I changed the IgnoreCase methods to assume that the literal/ASCII string is already lower-case.
Attachment #147482 - Attachment is obsolete: true
(In reply to comment #64) >Also, I changed the IgnoreCase methods to assume that the literal/ASCII string >is already lower-case. That reminds me of an ignore case comparison routine I had in which I knew that I would never be comparing symbols so I just used (ch | 0x20) to force the character to lower case.
Note that there are currently callers of EqualsIgnoreCase that does pass strings with uppercase characters. So I would recommend either a more descriptive name, or at least some assertions to make sure that that isn't done for these new methods.
Depends on: 242382
Such assertions are in the patch. When the time comes for mass conversion, it's simple enough to write a regular expression that only converts for literals that have no upper-case characters.
Comment on attachment 147411 [details] [diff] [review] drop tag changes in NSS >Index: editor/composer/src/nsComposerCommands.cpp >+ PRBool doingAll = (allStr.EqualsLiteral("all")); kill excess parantheses. >Index: editor/libeditor/html/nsHTMLAbsPosition.cpp >+ PRBool isPositioned = (positionStr.EqualsLiteral("absolute")); ditto. >Index: extensions/cookie/nsPermissionManager.cpp ... >+ if (lineArray[0]->EqualsASCII(kMatchTypeHost) && you can use EqualsLiteral here. >Index: htmlparser/src/CNavDTD.cpp >- if(PR_TRUE==aParserContext.mMimeType.Equals(NS_LITERAL_CSTRING(kHTMLTextContentType))) { >+ if(PR_TRUE==aParserContext.mMimeType.EqualsASCII(kHTMLTextContentType)) { I think it would be good to use EqualsLiteral for these. Again, I don't understand why you are not mapping Equals(NS_LITERAL_...) to EqualsLiteral. Clearly if the argument was a literal before it will still be a literal now, so there is no point mapping to EqualsASCII. Why not use EqualsLiteral whenever you can?
>>Index: htmlparser/src/CNavDTD.cpp I just removed the code in question, so it's not something to worry about that callsite anymore. ;)
> Again, I don't understand why you are not mapping Equals(NS_LITERAL_...) to > EqualsLiteral. OK OK I give. I'll convert all ...(NS_LITERAL(...)) patterns.
> OK OK I give. I'll convert all ...(NS_LITERAL(...)) patterns. r+sr=darin with that change. sorry for not saying that earlier :-(
Comment on attachment 147411 [details] [diff] [review] drop tag changes in NSS marking r- to reflect the fact that roc said he would be landing a different patch. again, r+sr=me for the version that does a straight conversion from NS_LITERAL to EqualsLiteral.
Attachment #147411 - Flags: superreview?(darin)
Attachment #147411 - Flags: review?(darin)
Attachment #147411 - Flags: review-
Comment on attachment 147482 [details] [diff] [review] Assign, Append and EqualsIgnoreCase variants >Index: xpcom/string/public/nsCharTraits.h >+ compareASCIIIgnoreCase( const char_type* s1, const char* s2, size_t n ) >+ { >+ for ( ; n--; ++s1, ++s2 ) >+ { >+ NS_ASSERTION(!(*s2 & ~0x7F), "Unexpected non-ASCII character"); >+ if ( !eq_int_type(to_int_type(ASCIIToLower(*s1)), >+ to_int_type(tolower(*s2))) ) >+ return to_int_type(ASCIIToLower(*s1)) >+ - to_int_type(tolower(*s2)); >+ } in the not-equals case you pay the cost of calling tolower twice. wouldn't it be better to compute once and store in the results in a local variable? for ( ; n--; ++s1, ++s2 ) { NS_ASSERTION(!(*s2 & ~0x7F), "Unexpected non-ASCII character"); int i1 = to_int_type(ASCIIToLower(*s1)); int i2 = to_int_type(tolower(*s2)); if ( !eq_int_type(i1, i2) ) return i1 - i2; } it also looks a lot cleaner IMO ;-) >+ compareASCIINullTerminatedIgnoreCase( const char_type* s1, size_t n, const char* s2 ) ... >+ if ( !eq_int_type(to_int_type(ASCIIToLower(*s1)), >+ to_int_type(tolower(*s2))) ) >+ return to_int_type(ASCIIToLower(*s1)) >+ - to_int_type(tolower(*s2)); same idea applies here too. >+ compareASCIIIgnoreCase( const char_type* s1, const char* s2, size_t n ) ... >+ if ( tolower(*s1) != tolower(*s2) ) >+ return to_int_type(tolower(*s1)) - to_int_type(tolower(*s2)); same thing here. >+ compareASCIINullTerminatedIgnoreCase( const char_type* s1, size_t n, const char* s2 ) ... >+ if ( tolower(*s1) != tolower(*s2) ) >+ return to_int_type(tolower(*s1)) - to_int_type(tolower(*s2)); and here. >Index: xpcom/string/public/nsTSubstring.h >+ NS_COM void AssignASCII( const char* data, size_type length ); >+ NS_COM void AssignASCII( const char* data ); it seems like these can be simple inline wrappers around the normal Assign methods in the case where char_type is char. that has the advantage of avoiding extra DSO entry points. no need to be redundant, right? same thing applies to the Append variants as well as the methods declared on nsTAString. we could also add ReplaceLiteral and InsertLiteral, but maybe those are less common cases? >Index: xpcom/string/src/nsTSubstring.cpp > void >+nsTSubstring_CharT::AssignASCII( const char* data, size_type length ) >+ { >+ // A Unicode string can't depend on an ASCII string buffer, >+ // so this dependence check only applies to CStrings. >+#ifdef CharT_is_char >+ if (IsDependentOn(data, data + length)) >+ { >+ // take advantage of sharing here... >+ Assign(string_type(data, length)); ... and in this codepath you don't even call copyASCII -- and that's fine -- so, no reason to both implementing AssignASCII as anything more than an inline wrapper around the normal Assign method when CharT_is_char is defined. overall, this patch looks good. thanks for the fine work roc! sorry, it took me forever to review this :-( r+sr=darin (with the changes i suggested)
Attachment #147482 - Flags: superreview?(darin)
Attachment #147482 - Flags: superreview+
Attachment #147482 - Flags: review?(darin)
Attachment #147482 - Flags: review+
I'll do all you suggest except: > it seems like these can be simple inline wrappers around the normal > Assign methods in the case where char_type is char. that has the > advantage of avoiding extra DSO entry points. no need to be redundant, > right? I want to check (in debug builds) that that parameter string really is ASCII, for consistency with the AString case. > we could also add ReplaceLiteral and InsertLiteral, but maybe those > are less common cases? I think so. That can be future work if merited. > ... and in this codepath you don't even call copyASCII -- and that's fine When the parameter is a literal, the IsDependentOn test will fail and we will always take the copyASCII path. I would like to retain the ASCII path here. I think the space/time impact of the extra methods will be small. I'm going to go out on a limb and check it in as is. If you violently disagree, let me know and I'll fix it up afterward.
Oh wait. I think you reviewed an earlier version of the patch. Attachment 147694 [details] [diff] has slightly different code, more efficient by assuming the literal is already lowercase. I'll attach an update with your comments (mostly) taken into account.
Attached patch udpated^2 (deleted) — Splinter Review
Attachment #147964 - Attachment is obsolete: true
Comment on attachment 149104 [details] [diff] [review] udpated^2 one more time :-)
Attachment #149104 - Flags: superreview?(darin)
Attachment #149104 - Flags: review?(darin)
I checked in the conversion of Seamonkey to EqualsLiteral. Here are the final commands that I used, in case someone wants to convert Firefox code or some other code: find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/\bEquals\(NS_LITERAL_(C?)STRING\(([^)]*)\)\)/EqualsLiteral\($2\)/g' find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/ == NS_LITERAL_(C?)STRING\(([^)]*)\)/.EqualsLiteral\($2\)/g' find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/\b(\w+) != NS_LITERAL_(C?)STRING\(([^)]*)\)/!$1.EqualsLiteral\($3\)/g'
Numbers this time: Firefox WINNT 5.0 beast Clbr: Zdiff:-15268 Firefox MacOSX Darwin 6.8 imola Dep: Zdiff:-20656 (+19200/-39856) mZdiff:-16608 (+7440/-24048) Firefox Linux redwood Dep: Zdiff:-30957 (+181/-31138) mZdiff:-23824 (+71/-23895) Seamonkey Linux brad Clbr: Zdiff:-39175 (+645/-39820) mZdiff:-25456 (+224/-25680) Seamonkey Linux luna Dep: Zdiff:-40583 (+384/-40967) mZdiff:-25992 (+89/-26081)
Comment on attachment 149104 [details] [diff] [review] udpated^2 > compareLCASCIIIgnoreCase i think this function deserves a comment explaining what it does. you have a comment that mentions EqualsLiteralIgnoreCase, but the code names the function EqualsLCLiteralIgnoreCase. also, i have to say that i find these names very cumbersome. is there no better name that we can come up with? why don't we just state that all string literals passed to the ignore-case versions need to be lower case? then the function names wouldn't need the LC acronym. EqualsLiteralIC might be better than EqualsLCLiteralIgnoreCase IMO.
> why don't we just state that all string literals passed to the ignore-case > versions need to be lower case? That sounds OK for the Literal versions but for the generic ASCII versions it seems a bit less suitable. Maybe we just have to deal. How about just dropping the LC and having EqualsASCIIIgnoreCase, EqualsLiteralIgnoreCase? EqualsLiteralIC sounds fine but EqualsASCIIIC doesn't...
I'm fine with EqualsLiteralIgnoreCase and EqualsASCIIIgnoreCase.
want me to make a new patch?
Comment on attachment 149104 [details] [diff] [review] udpated^2 no need for a new patch. just drop the LC everywhere, and document the cases where the input string must be lowercase. r+sr=darin with that.
Attachment #149104 - Flags: superreview?(darin)
Attachment #149104 - Flags: superreview+
Attachment #149104 - Flags: review?(darin)
Attachment #149104 - Flags: review+
Hmm, that's bad. Thanks for reminding me. How about LowerCaseEqualsASCII, LowerCaseEqualsLiteral? foo->LowerCaseEqualsLiteral("womble") foo->LowerCaseEqualsASCII(str)
> Hmm, that's bad. Thanks for reminding me. yeah, i guess so... though we do have an assertion ;-) > foo->LowerCaseEqualsLiteral("womble") > foo->LowerCaseEqualsASCII(str) does that imply: foo->LowerCaseEqualsLiteralIgnoreCase("womble"); foo->LowerCaseEqualsASCIIIgnoreCase(str); my fingers hurt :-/
> does that imply: > > foo->LowerCaseEqualsLiteralIgnoreCase("womble"); > foo->LowerCaseEqualsASCIIIgnoreCase(str); Nope, LowerCaseEqualsLiteral and LowerCaseEqualsASCII are the full names
In case I wasn't the only one who didn't get it: "LowerCase of |this| equals some literal" I think I like these function names a lot.
OK, last step: convert codebase to use these functions. Here's what I did: # Update obvious calls to Append and Assign find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/\bAppend\(NS_LITERAL_(C?)STRING\(([^)]*)\)\)/AppendLiteral\($2\)/g' find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/\bAssign\(NS_LITERAL_(C?)STRING\(([^)]*)\)\)/AssignLiteral\($2\)/g' # Change Append/AssignWithConversion of a literal to Append/AssignLiteral find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/\bAppendWithConversion\(("[^"]*")\)/AppendLiteral\($1\)/g' find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/\bAssignWithConversion\(("[^"]*")\)/AssignLiteral\($1\)/g' # Update simple uses of operator+= and operator= to Append/AssignLiteral find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/ \+= NS_LITERAL_(C?)STRING\(([^)]*)\);/.AppendLiteral\($2\);/g' find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/ = NS_LITERAL_(C?)STRING\(([^)]*)\);/.AssignLiteral\($2\);/g' # Convert EqualsIgnoreCase of literals to LowerCaseEqualsLiteral find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/\bEqualsIgnoreCase\(("[^"]*")\)/"LowerCaseEqualsLiteral\(".lc($1)."\)"/eg' # Convert Equals(NS_LITERAL..., nsCaseInsensitiveStringComparator()) to LowerCaseEqualsLiteral find . | egrep '\.(cpp|h)$' | xargs perl -0 -pi -e 's/\bEquals\(NS_LITERAL_(C?)STRING\(("[^"]*")\),[ \n\t]*nsCaseInsensitiveStringComparator\(\)\)/"LowerCaseEqualsLiteral\(".lc($2)."\)"/egs' # Fix up <type> <identifier>.AssignLiteral find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/String (\w+)\.AssignLiteral/String $1; $1\.AssignLiteral/g' # Manually restore calls to nsIFile::Append in nsJVMConfigManagerUnix.cpp and # nsLayoutStyleSheetCache.cpp, CBrowserShell.cpp, nsEmbedGlobalHistory.cpp, # mozPersonalDictionary.cpp, mozMySpell.cpp, myspAffixMgr.cpp, pyloader.cpp, # testXalan.cpp, nsBayesianFilter.cpp, nsBookmarksService.cpp, winEmbed.cpp I'll attach the resulting patch.
Attached file tada! (deleted) —
One thing to consider is whether LowerCaseEqualsLiteral makes the code less readable by forcing the RHS to be lower-cased. This patch by no means covers all the cases that can take advantage of the new functions. It's the easily-automated low hanging fruit. Note that this file is gzipped because the uncompressed patch is too big for Bugzilla's little mind.
Comment on attachment 150174 [details] tada! here you go. by my count, 2252 call sites were changed. You may wish to just review the script :-)
Attachment #150174 - Flags: superreview?(darin)
Attachment #150174 - Flags: review?(darin)
Ignore the Makefile change in that patch. I won't check it in.
Hum. nsCaseInsensitiveStringComparator is Unicode aware so it's not generally correct to replace it with LowerCaseEqualsASCII. I will take out those changes.
Another option would be to make LowerCaseEqualsASCII on wide strings be Unicode aware. It doesn't look that hard. What is the rationale for having the case converter be an XPCOM service?
Hmm. We only care about Unicode characters that can lowercase-map to ASCII. Looking at the Unicode data files to see what Unicode characters that means, I find: -- I-with-dot maps to ASCII i followed by a combining dot; the combining dot is not ASCII so it can't match. -- Similarly for Lithuanian accented I and J -- In Turkic and Azeri locales, I-with-dot maps to plain ASCII i, but would we really want this comparison to be locale dependent? Argh. I don't think we do it locale ind -- 0x212A, KELVIN SIGN, maps to ASCII k I18N is fun! Looking at the current case converter (and assuming that there's not some magic alternative service that might be selected at run time; if so it doesn't appear to be in our tree), we map I-with-dot to plain ASCII i and KELVIN SIGN to ASCII k, and that's it. So it would be trivial to have LowerCaseEquals code preserve this behaviour (in fact the change will go in nsCharTraits<PR_Unichar>::ASCIIToLower). I'll do that, and leave the mass-API-update patch as is.
(In reply to comment #97) > Hmm. We only care about Unicode characters that can lowercase-map to ASCII. wouldn't it be easier to map the ASCII string to unicode, in the lower- and uppercase variant? for ASCII, it seems that it's trivial to find the uppercase equivalent in basically all cases... (unless you want i to uppercase to I-with-dot in the turkish locale, which I doubt you do)
(In reply to comment #98) > wouldn't it be easier to map the ASCII string to unicode, in the lower- and > uppercase variant? for ASCII, it seems that it's trivial to find the uppercase > equivalent in basically all cases... I don't know what you mean. The plan in my previous comment is simple and will preserve current behaviour.
please make sure the i18n people are in on this. They might have plans to add support for more characters.
From my reading of the Unicode tables, there really is no other significant case unless we apply specific rules for Lithuanian locales, but CCing smontagu anyway just in case :-)
As described. This should guarantee that LowerCaseEqualsLiteral(...) (applied to a lowercase literal) behaves identically to Equals(..., nsCaseInsensitiveStringComparator()).
Attachment #150243 - Flags: superreview?(darin)
Attachment #150243 - Flags: review?(darin)
I don't think that these changes are valid. The old way was correct and is what would be expected from a function called 'ASCIIToLower'. Since you are explicitly only lowering ascii characters, and the literals are only supposed to be ASCII characters, the comparison should fail when matching anything other than ascii characters.
Maybe we should change the name of that function from ASCIIToLower to something else. But the comment indicates what it's supposed to do and it does do what it's supposed to do.
Are you sure it's wise to add i18n code to xpcom? What if we learn that our uconv code needs to be changed... will the folks making those changes know to update xpcom? Why not implement the nsSubstring::LowerCaseEqualsLiteral symbol outside of xpcom? Why not put it inside unicharutils along side nsCaseInsensitiveStringComparator? It seems reasonable to me to force consumers to link to unicharutils to get this functionality.
It's only about six lines of code, based directly on the Unicode spec, unlikely to ever need changing. I don't think it's worth the pain of moving those lines out into unicharutils, nor the complexity of having nsSubstring split across two DLLs.
(In reply to comment #99) > I don't know what you mean. I meant: Instead of taking the unicode character and converting it to lowercase, you could take the ASCII character, and convert it to unicode (which is as simple as a cast), if that doesn't match uppercase the ascii character and convert it to unicode and compare. that way, you need not worry about unicode lowercasing rules, I think.
biesi's suggestion, clarified on IRC, amounts to exactly what the currently checked in code does.
> It's only about six lines of code, based directly on the Unicode spec, unlikely > to ever need changing. the unicode spec isn't static but changes from version to version with new characters added.
Re: comment 105 If this is to be included in xpcom, then comments should be added to both this location and the appropriate uconv location so that updates in one place will find the other.
(In reply to comment #109) > > It's only about six lines of code, based directly on the Unicode spec, > > unlikely to ever need changing. > > the unicode spec isn't static but changes from version to version with new > characters added. The chances that the Unicode people add some hitherto-obscure characters which get downcase-mapped to ASCII are extremely low, especially considering there are only two such characters today. The chances that accidentally failing to handle case-insensitive comparision of such characters would be noticeable, let alone regarded as a serious issue, are even more negligible. In any case I can add comments to the unicharutils code pointing to the nsCharTraits code. Can we just do the simple thing to make all the Equals(..., nsCaseInsensitiveStringComparator()) calls turn into nice fast code, and close this bug out?
Comments are sufficient as far as I'm concerned.
Comment on attachment 150243 [details] [diff] [review] fix nsCharTraits<PR_Unichar> for full I18N correctness r+sr=darin Please add comments to this code regarding the Unicode to ASCII conversion. Also, there's no reason to put an "else" after "return 'k';" I don't need to see another patch for comments, but please don't hesitate to be verbose in the comments here and in intl/ :-)
Attachment #150243 - Flags: superreview?(darin)
Attachment #150243 - Flags: superreview+
Attachment #150243 - Flags: review?(darin)
Attachment #150243 - Flags: review+
Comment on attachment 150174 [details] tada! well done! r+sr=darin
Attachment #150174 - Flags: superreview?(darin)
Attachment #150174 - Flags: superreview+
Attachment #150174 - Flags: review?(darin)
Attachment #150174 - Flags: review+
all checked in. Not much Tp impact, still waiting for Tinderbox size numbers. Thanks all!
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
http://www.mozilla.org/projects/xpcom/string-guide.html will need to be updated for these changes.
Reduced codesize by about 70K on brad.
Yes, I should update the string guide, but I'm not sure how.
I've got an alternative to the template based solution for EqualsLiteral and family. The current code breaks VC++ 2003 anyway. The existing solution is a bit edgy template wise. The solution I propose is a bit more main stream, I believe and will allow this optimization to be handled on more compilers. I'm not touching the NS_DISABLE_LITERAL_TEMPLATE macro at this point, but I suspect it can be less restrictive if not removed all together. People with the various platforms need to test this out. Patch coming up.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Attached patch Alternative template solution (obsolete) (deleted) — Splinter Review
Oh, and swalker reports that VC++8 won't compile this either.
If you do const char* foo = "blah"; bork.EqualsLiteral(foo) does it compile?
Oh, so were you using this template mechanism to weed out passing string litterals versus by char pointers I think I found what was tripping up the VC++ 2003. It was the use of the terinary operator in the nsIWin32LocalImpl.cpp. So this didn't cause the problem at all. Putting this back to FIXED, I'll provide a patch in another bug for the nsIWin32LocalImp
Status: REOPENED → RESOLVED
Closed: 20 years ago20 years ago
Resolution: --- → FIXED
Attachment #162659 - Attachment is obsolete: true
Component: String → XPCOM
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: