Closed
Bug 226439
Opened 21 years ago
Closed 20 years ago
Reduce footprint of nsA(C?)String::Equals() applied to ASCII strings
Categories
(Core :: XPCOM, defect, P1)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
People
(Reporter: roc, Assigned: roc)
References
Details
(Keywords: memory-footprint)
Attachments
(8 files, 8 obsolete files)
(deleted),
patch
|
roc
:
review+
dbaron
:
superreview+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
dbaron
:
review+
dbaron
:
superreview+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
dbaron
:
review+
dbaron
:
superreview+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
dbaron
:
review+
dbaron
:
superreview+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
darin.moz
:
review-
|
Details | Diff | Splinter Review |
(deleted),
patch
|
darin.moz
:
review+
darin.moz
:
superreview+
|
Details | Diff | Splinter Review |
(deleted),
application/x-tar
|
darin.moz
:
review+
darin.moz
:
superreview+
|
Details |
(deleted),
patch
|
darin.moz
:
review+
darin.moz
:
superreview+
|
Details | Diff | Splinter Review |
Recently Jonas converted some uses of EqualsIgnoreCase(const char*) to
Equals(NS_LITERAL_STRING(...)), and it added quite a bit of footprint especially
on Linux.
I've implemented new Equals(const char*) methods that do not create a temporary
nsDependentString nor a temporary nsStandardStringComparator. They should get
back most of the footprint bloat --- and then some, if we convert prior
Equals(NS_LITERAL_STRING(...)) methods to use these. Runtime performance may be
a little slower in cases where the C++ optimizer does a really good job on the
existing outer inline Length() check and the strings are mostly not equal. I
doubt we'll notice.
Assignee | ||
Comment 1•21 years ago
|
||
Assignee | ||
Comment 2•21 years ago
|
||
I wrote a script which basically does
s/\bEquals\(NS_LITERAL_(C?)STRING\((\"[^"]*\")\)\)/Equals\($2\)/ over the entire
tree. The resulting build seems to work fine :-). I can't measure the
performance impact because it was a debug build, though. I won't attach the
resulting patch since it's absolutely gigantic :-).
Assignee | ||
Comment 3•21 years ago
|
||
Comment on attachment 136051 [details] [diff] [review]
as described
Something to try in 1.7a, but I may as well get it in the queue so I don't
forget about it :-).
Attachment #136051 -
Flags: superreview?(dbaron)
Attachment #136051 -
Flags: review?(scc)
Comment 4•21 years ago
|
||
Comment on attachment 136051 [details] [diff] [review]
as described
You need a new patch for this against the current string code.
I'm not crazy about the method on nsAString -- I think it should probably be
named something like EqualsASCII, since assertions about non-ASCII characters
wouldn't be hit by most developers even if the string were a UTF-8 string
coming from user input.
Attachment #136051 -
Flags: superreview?(dbaron)
Attachment #136051 -
Flags: superreview-
Attachment #136051 -
Flags: review?(scc)
Comment 5•21 years ago
|
||
so, we currently have:
PRBool nsACString::Equals(const char *) const;
and,
PRBool nsAString::Equals(const char *) const;
sounds a lot like:
PRBool nsString::EqualsWithConversion(const char *, ... ) const;
Assignee | ||
Comment 6•21 years ago
|
||
Yeah, except that the performance can be good. I'll whip up a new patch with
EqualsASCII on both.
Comment 7•21 years ago
|
||
EqualsWithConversion does not inflate the argument before comparing. It uses
nsStringObsolete.cpp's Compare2to1 method, which is pretty efficient. It sounds
to me like the two methods should share the same implementation. I don't mind
if you do away with some of the nsStringObsolete code ;-)
Assignee | ||
Comment 8•21 years ago
|
||
Okay, it's late and maybe I got carried away
Attachment #136051 -
Attachment is obsolete: true
Assignee | ||
Comment 9•21 years ago
|
||
it was definitely late, because that patch was off by one.
Assignee | ||
Updated•21 years ago
|
Attachment #145241 -
Attachment is obsolete: true
Assignee | ||
Comment 10•21 years ago
|
||
The name should be changed to EqualsLiteral.
darin: oh. well, the old string guide was incorrect then.
I think the amount of code that could be reused by nsStringObselete is fairly
trivial...
Comment 11•21 years ago
|
||
oooh... i like using templates to get the string literal's length. i wonder if
we are going to run into portability problems with that :-/
Comment 12•21 years ago
|
||
Comment on attachment 145257 [details] [diff] [review]
revised
>Index: xpcom/string/public/nsCharTraits.h
>+ compareASCII( const char_type* s1, const char* s2, size_t n )
...
>+ NS_ASSERTION(!(*s2 & ~0x7F), "Unexpected non-ASCII character");
the same assertion does not exist for the |char_type = char| case.
isn't that inconsistent?
>Index: xpcom/string/public/nsTAString.h
>+ /**
>+ * An efficient comparison with ASCII that can be used even for
>+ * wide strings.
>+ */
>+ NS_COM PRBool EqualsASCII( const char* data, int len ) const;
s/int/size_type/
>+ template<int N>
>+ inline PRBool EqualsASCII(const char (&str)[N]) const {
>+ return EqualsASCII(str, N-1);
>+ }
nit: use SCC-style indentation for consistency:
template<int N>
inline PRBool EqualsASCII( const char (&str)[N] ) const
{
return EqualsASCII(str, N-1);
}
>Index: xpcom/string/public/nsTSubstring.h
>+ NS_COM PRBool EqualsASCII( const char* data, int len ) const;
>+ template<int N>
>+ inline PRBool EqualsASCII(const char (&str)[N]) const {
>+ return EqualsASCII(str, N-1);
>+ }
same nits apply here.
r=me w/ those changes.
Attachment #145257 -
Flags: review-
Assignee | ||
Comment 13•21 years ago
|
||
revised as per darin's comments.
We ought to prepare the fallback code. What's the cleanest way of selecting
between the template code and some strlen code per-platform?
Attachment #145257 -
Attachment is obsolete: true
Assignee | ||
Comment 14•21 years ago
|
||
This adds an easy way to turn off the use of templates to match literals. I've
built with the templates disabled and that works.
Attachment #145447 -
Attachment is obsolete: true
Assignee | ||
Comment 15•21 years ago
|
||
Comment on attachment 145975 [details] [diff] [review]
revised^2
Carrying forward darin's r+. David, if you're OK with this code then I think we
can go ahead and check it in on an otherwise quiet evening and see what
happens...
Attachment #145975 -
Flags: superreview?(dbaron)
Attachment #145975 -
Flags: review+
Comment 16•21 years ago
|
||
Comment on attachment 145975 [details] [diff] [review]
revised^2
The NSCAP_ prefix has always been used only for nsCOMPtr. Perhaps just use
NS_?
#ifdef-ing includes is a recipe for bustage. However, nsCharTraits.h already
includes <string.h>, so I think you should just remove those #ifdef-ed includes
completely.
with those changes, sr=dbaron
Attachment #145975 -
Flags: superreview?(dbaron) → superreview+
Assignee | ||
Comment 17•21 years ago
|
||
Sure. I'll do those things and check in sometime soon when the tree is quiet and
I have time to babysit. Thanks!
Assignee | ||
Updated•21 years ago
|
Priority: -- → P1
Assignee | ||
Comment 18•21 years ago
|
||
checked in. here we go
Comment 19•21 years ago
|
||
Comment on attachment 145975 [details] [diff] [review]
revised^2
>+#ifdef NSCAP_DISABLE_LITERAL_TEMPLATE
>+ inline PRBool EqualsLiteral( const char* str ) const
>+ {
>+ return EqualsASCII(str, strlen(str));
>+ }
>+#else
>+ template<int N>
>+ inline PRBool EqualsLiteral( const char (&str)[N] ) const
>+ {
>+ return EqualsASCII(str, N-1);
>+ }
>+#endif
Do you suppose this template would work on MSVC?
template<class S>
inline PRBool EqualsLiteral( S &str ) const
{
return EqualsASCII(str, sizeof str - 1);
}
Assignee | ||
Comment 20•21 years ago
|
||
I don't know if that will work, but I'd like someone with a VC++ compiler to
tell me.
I'd also like to know whether VC++ 7 can handle the current template.
In the meantime I'm going to convert over a file or two so we can see what the
code size impact is.
Assignee | ||
Comment 21•21 years ago
|
||
Here are a couple of call sites converted to EqualsLiteral. The nsPresShell
patch uses nsAString::EqualsLiteral and the nsObjectFrame patch uses
nsACString::EqualsLiteral. I'll check these two parts in separately so we can
see the effect of each.
Assignee | ||
Updated•21 years ago
|
Attachment #146955 -
Flags: superreview?(dbaron)
Attachment #146955 -
Flags: review?(dbaron)
Comment 22•21 years ago
|
||
> I'd also like to know whether VC++ 7 can handle the current template.
VC++ 7.1 (as found in Visual Studio .Net 2003 Enterprise) seems to be able to
compile the template. I backed out your latest patch to fix windows tinderbox
bustage. xpcom.dll just built without errors. Dunno about VC++ 7.0 though.
Assignee | ||
Comment 23•21 years ago
|
||
Alright, I'll make the test check _MSC_VER < 1310.
Comment 24•21 years ago
|
||
If I were to try this ...
char chars[8] = "foo";
nsCString buf(chars);
printf("%s\n", buf.EqualsLiteral(chars) ? "true" : "false");
... it would compile without errors. And I would see false on a platform which
is using the template and true on one which not. No?
Documenting how not to use this method might prevent those sorts of abuses.
Assignee | ||
Comment 25•21 years ago
|
||
Good point, but I think this is not very likely. Anyway I will add some
documentation explaining that you really need to pass a literal string, not a
reference to a variable or any other token.
Comment 26•21 years ago
|
||
Comment on attachment 146955 [details] [diff] [review]
experiments for code size
beware that code size experiments with small samples are sometimes misleading
(inlined functions output separately once they're used, perhaps, or things like
that)
Attachment #146955 -
Flags: superreview?(dbaron)
Attachment #146955 -
Flags: superreview+
Attachment #146955 -
Flags: review?(dbaron)
Attachment #146955 -
Flags: review+
Assignee | ||
Comment 27•21 years ago
|
||
OK. Stuff checked in. I guess the next step will be to use a script to whack all
of layout and see what effect that has on code size. I'll apply the following
two regular expressions:
s/\bEquals\(NS_LITERAL_(C?)STRING\((\"[^"]*\")\)\)/EqualsLiteral\($2\)/
s/\b== \(NS_LITERAL_(C?)STRING\((\"[^"]*\")\)/.EqualsLiteral\($2\)/
There are also instances of '!= NS_LITERAL_STRING' but I'll have to fix those up
by hand.
How much review do we want for these changes?
Assignee | ||
Comment 28•21 years ago
|
||
Results from the Firefox tinderboxes:
Linux redwood: Zdiff:-208 (+0/-208)
MacOSX imola: Zdiff:+16 (+156/-140)
WINNT5 beast: Zdiff:+74 (+304/-230)
beast is using the strlen version. Too bad we don't have a VC++7.1 tinderbox.
Assignee | ||
Comment 29•21 years ago
|
||
Seamonkey tinderboxes:
Linux brad: Zdiff:-336 (+2/-338)
Linux luna: Zdiff:-224 (+0/-224)
Assignee | ||
Comment 30•21 years ago
|
||
That's a whole lot for just 3 string compares.
Maybe we should experiment with making the non-template version non-inline.
Assignee | ||
Comment 31•21 years ago
|
||
This can be used by the non-templated EqualsLiteral, and also by client code
that has a char* or char[] that may not be a literal.
It's even plausible that it would be a win to do away with the template
altogether and just use this. We can test that later...
Assignee | ||
Comment 32•21 years ago
|
||
Comment on attachment 147309 [details] [diff] [review]
provide a fast compare-with-null-terminated-ASCII
Note that I've tested this by converting layout (patch forthcoming) to
EqualsLiteral and then building with the template disabled.
Attachment #147309 -
Flags: superreview?(dbaron)
Attachment #147309 -
Flags: review?(dbaron)
Assignee | ||
Comment 33•21 years ago
|
||
This used the regexps that I noted plus by-hand conversion of the != operators.
Plus, uses of Equals(NS_LITERAL_(C?)STRING(...)) where the string was not a
literal were converted to EqualsASCII.
Assignee | ||
Comment 34•21 years ago
|
||
Comment on attachment 147310 [details] [diff] [review]
layout conversion
Up to you how much you want to review vs rubberstamp :-)
Attachment #147310 -
Flags: superreview?(dbaron)
Attachment #147310 -
Flags: review?(dbaron)
Updated•21 years ago
|
Attachment #147309 -
Flags: superreview?(dbaron)
Attachment #147309 -
Flags: superreview+
Attachment #147309 -
Flags: review?(dbaron)
Attachment #147309 -
Flags: review+
Updated•21 years ago
|
Attachment #147310 -
Flags: superreview?(dbaron)
Attachment #147310 -
Flags: superreview+
Attachment #147310 -
Flags: review?(dbaron)
Attachment #147310 -
Flags: review+
Assignee | ||
Comment 35•21 years ago
|
||
checked in.
Assignee | ||
Comment 36•21 years ago
|
||
The rough numbers (aaronl checked in a small patch at the same time):
Firefox Linux (redwood): Zdiff:-14126 (+151/-14277)
Firefox MacOSX (imola): Zdiff:-12288 (+720/-13008)
Firefox WinNT5 (beast): Zdiff:-19180 (+1895/-21075)
Seamonkey Linux (brad): Zdiff:-13670 (+1310/-14980)
Seamonkey Linux (luna): Zdiff:-15874 (+194/-16068)
It also appears to have reduced Tp.
Definitely worth converting the rest of the codebase! We should probably do an
AppendASCII/AppendLiteral as well.
Assignee | ||
Comment 37•21 years ago
|
||
This is it!
Assignee | ||
Comment 38•21 years ago
|
||
More things that should be done after this:
provide AppendASCII/AppendLiteral
provide EqualsASCIICaseInsensitive/EqualsLiteralCaseInsensitive
Assignee | ||
Comment 39•21 years ago
|
||
darin, dbaron, does one of you want to rubber-stamp this?
Assignee | ||
Comment 40•21 years ago
|
||
those tag changes in NSS don't need to be there.
Assignee | ||
Updated•21 years ago
|
Attachment #147410 -
Attachment is obsolete: true
Comment 41•21 years ago
|
||
How did you create the patch? I'd rather review an automated process than
review the patch.
Assignee | ||
Comment 42•21 years ago
|
||
It was mostly done using these commands:
find . -name '*.cpp' | xargs perl -pi -e
's/\bEquals\(NS_LITERAL_(C?)STRING\((\"[^"]*\")\)\)/EqualsLiteral\($2\)/g'
find . -name '*.h' | xargs perl -pi -e
's/\bEquals\(NS_LITERAL_(C?)STRING\((\"[^"]*\")\)\)/EqualsLiteral\($2\)/g'
find . -name '*.cpp' | xargs perl -pi -e 's/ ==
NS_LITERAL_(C?)STRING\((\"[^"]*\"\)/.EqualsLiteral\($2\)/g'
find . -name '*.h' | xargs perl -pi -e 's/ ==
NS_LITERAL_(C?)STRING\((\"[^"]*\"\)/.EqualsLiteral\($2\)/g'
find . -name '*.cpp' | xargs perl -pi -e 's/\b(\w+) !=
NS_LITERAL_(C?)STRING\((\"[^"]*\")\)/!$1.EqualsLiteral\($3\)/g'
find . -name '*.h' | xargs perl -pi -e 's/\b(\w+) !=
NS_LITERAL_(C?)STRING\((\"[^"]*\")\)/!$1.EqualsLiteral\($3\)/g'
find . -name '*.h' | xargs perl -pi -e
's/\bEquals\(NS_LITERAL_(C?)STRING\(([^\)]+)\)\)/EqualsASCII\($2\)/g'
find . -name '*.cpp' | xargs perl -pi -e
's/\bEquals\(NS_LITERAL_(C?)STRING\(([^\)]+)\)\)/EqualsASCII\($2\)/g'
I also hand-edited a few files:
htmlparser/src/nsParser.cpp
parser/htmlparser/src/nsParser.cpp (not sure which is the real one, I edited
them both)
xpfe/components/winhooks/nsWindowsHooksUtil.cpp
Comment 43•21 years ago
|
||
- contentType.Equals(NS_LITERAL_CSTRING(UNKNOWN_CONTENT_TYPE)) ||
+ contentType.EqualsASCII(UNKNOWN_CONTENT_TYPE) ||
you could use EqualsLiteral here. UNKNOWN_CONTENT_TYPE is just a macro.
in fact, i think there are other cases like this in the patch. i would
recommend searching the patch for cases where EqualsASCII is used, and fixup as
appropriate.
Assignee | ||
Comment 44•21 years ago
|
||
I would rather use EqualsASCII for the places where it is not clear looking just
at the line that it's a literal. Otherwise we create potential for confusion.
The gain from using EqualsLiteral over EqualsASCII is probably quite small, anyway.
Comment 45•21 years ago
|
||
(In reply to comment #44)
> I would rather use EqualsASCII for the places where it is not clear looking just
> at the line that it's a literal. Otherwise we create potential for confusion.
> The gain from using EqualsLiteral over EqualsASCII is probably quite small,
anyway.
why is that more confusing than the code it replaces?
NS_LITERAL_CSTRING(FOO) -> EqualsLiteral(FOO)
Assignee | ||
Comment 46•21 years ago
|
||
good point. Let's reduce the confusion :-)
Assignee | ||
Comment 47•21 years ago
|
||
Honestly, we could convert those to EqualsLiteral, and I don't really mind if
someone does it, but it's just extra work for dubious performance gain and I'm
not really motivated to do it.
Assignee | ||
Comment 48•21 years ago
|
||
The EqualsASCIIIgnoreCase stuff is hard because xpcom can't depend on intl, so
we can't have functions in xpcom that convert Unicode to lower case. Any
suggestions?
Comment 49•21 years ago
|
||
I believe that in any case when we want to be doing EqualsASCIIIgnoreCase we
want to be doing locale-insensitive conversion. So no intl involved...
Assignee | ||
Comment 50•21 years ago
|
||
So is it true that the only Unicode character that locale-independent lower-case
maps to 'e' is 'E'?
Comment 51•21 years ago
|
||
I think it's more like "if we're doing a EqualsASCIIIgnoreCase and we have any
non-ASCII characters, we're not equal". And then it's a matter of doing a
case-insensitive comparison between two characters that are both in the ascii
range... (still have to do locale-independent because of things like the 'I' and
'i' business in Turkish).
Assignee | ||
Comment 52•21 years ago
|
||
OK, this patch does that.
Assignee | ||
Comment 53•21 years ago
|
||
Comment on attachment 147482 [details] [diff] [review]
Assign, Append and EqualsIgnoreCase variants
Seems to work after I converted layout to use these functions.
Attachment #147482 -
Flags: superreview?(darin)
Attachment #147482 -
Flags: review?(darin)
Assignee | ||
Comment 54•21 years ago
|
||
Comment on attachment 147411 [details] [diff] [review]
drop tag changes in NSS
Darin, you said you'd review or rubber-stamp this?
Attachment #147411 -
Flags: superreview?(darin)
Attachment #147411 -
Flags: review?(darin)
Comment 55•21 years ago
|
||
Comment on attachment 147482 [details] [diff] [review]
Assign, Append and EqualsIgnoreCase variants
>+ static
>+ int
>+ compareASCIIIgnoreCase( const char_type* s1, const char* s2, size_t n )
>+ {
>+ for ( ; n--; ++s1, ++s2 )
>+ {
>+ NS_ASSERTION(!(*s2 & ~0x7F), "Unexpected non-ASCII character");
>+ if ( !eq_int_type(to_int_type(ASCIIToLower(*s1)),
>+ to_int_type(tolower(*s2))) )
>+ return to_int_type(ASCIIToLower(*s1))
>+ - to_int_type(tolower(*s2));
>+ }
>+
>+ return 0;
>+ }
Alternative loop body suggestion:
if (*s1 & ~0x7F)
return 1;
int_type c1 = tolower(*s1);
int_type c2 = tolower(*s2);
if (!eq_int_type(c1, c2))
return c1 - c2;
If you don't like the return 1; then you have the choice of
return *s1 - *s2;
or
int_type c2 = tolower(*s2);
if (*s1 & ~0x7F)
return *s1 - c2;
int_type c1 = tolower(*s1);
etc.
Assignee | ||
Comment 56•21 years ago
|
||
Those are fine. It should all come to the same thing really.
Comment 57•21 years ago
|
||
You can't use tolower() for ascii comparisons, last I checked -- it's
locale-aware. See bug 159328 for the havoc that can ensue.
Assignee | ||
Comment 58•21 years ago
|
||
Cripes.
Alright, I'll do it by hand inline in ASCIIToLower.
Comment 59•21 years ago
|
||
(In reply to comment #58)
>Alright, I'll do it by hand inline in ASCIIToLower.
Does that include inlining ASCIIToLower into compareASCIIIgnoreCase?
Comment 60•21 years ago
|
||
nsCRT::toLower does what you want.
Assignee | ||
Comment 61•21 years ago
|
||
> nsCRT::toLower does what you want.
nsCRT::ToLower isn't inline, unfortunately.
> Does that include inlining ASCIIToLower into compareASCIIIgnoreCase?
That should happen, since ASCIIToLower is an inline method of nsCharTraits.
Comment 62•21 years ago
|
||
(In reply to comment #61)
>>Does that include inlining ASCIIToLower into compareASCIIIgnoreCase?
>That should happen, since ASCIIToLower is an inline method of nsCharTraits.
Yes, sorry for overlooking that.
Comment 63•20 years ago
|
||
Sorry for the huge delay roc... I do plan to spend some time reviewing these
patches early next week.
Assignee | ||
Comment 64•20 years ago
|
||
Fixes use of tolower as discussed here.
Also, I changed the IgnoreCase methods to assume that the literal/ASCII string
is already lower-case.
Attachment #147482 -
Attachment is obsolete: true
Comment 65•20 years ago
|
||
(In reply to comment #64)
>Also, I changed the IgnoreCase methods to assume that the literal/ASCII string
>is already lower-case.
That reminds me of an ignore case comparison routine I had in which I knew that
I would never be comparing symbols so I just used (ch | 0x20) to force the
character to lower case.
Note that there are currently callers of EqualsIgnoreCase that does pass strings
with uppercase characters. So I would recommend either a more descriptive name,
or at least some assertions to make sure that that isn't done for these new methods.
Assignee | ||
Comment 67•20 years ago
|
||
Such assertions are in the patch.
When the time comes for mass conversion, it's simple enough to write a regular
expression that only converts for literals that have no upper-case characters.
Comment 68•20 years ago
|
||
Comment on attachment 147411 [details] [diff] [review]
drop tag changes in NSS
>Index: editor/composer/src/nsComposerCommands.cpp
>+ PRBool doingAll = (allStr.EqualsLiteral("all"));
kill excess parantheses.
>Index: editor/libeditor/html/nsHTMLAbsPosition.cpp
>+ PRBool isPositioned = (positionStr.EqualsLiteral("absolute"));
ditto.
>Index: extensions/cookie/nsPermissionManager.cpp
...
>+ if (lineArray[0]->EqualsASCII(kMatchTypeHost) &&
you can use EqualsLiteral here.
>Index: htmlparser/src/CNavDTD.cpp
>- if(PR_TRUE==aParserContext.mMimeType.Equals(NS_LITERAL_CSTRING(kHTMLTextContentType))) {
>+ if(PR_TRUE==aParserContext.mMimeType.EqualsASCII(kHTMLTextContentType)) {
I think it would be good to use EqualsLiteral for these. Again, I don't
understand why you are not mapping Equals(NS_LITERAL_...) to EqualsLiteral.
Clearly if the argument was a literal before it will still be a literal
now, so there is no point mapping to EqualsASCII. Why not use EqualsLiteral
whenever you can?
Comment 69•20 years ago
|
||
>>Index: htmlparser/src/CNavDTD.cpp
I just removed the code in question, so it's not something to worry about that
callsite anymore. ;)
Assignee | ||
Comment 70•20 years ago
|
||
> Again, I don't understand why you are not mapping Equals(NS_LITERAL_...) to
> EqualsLiteral.
OK OK I give. I'll convert all ...(NS_LITERAL(...)) patterns.
Comment 71•20 years ago
|
||
> OK OK I give. I'll convert all ...(NS_LITERAL(...)) patterns.
r+sr=darin with that change. sorry for not saying that earlier :-(
Comment 72•20 years ago
|
||
Comment on attachment 147411 [details] [diff] [review]
drop tag changes in NSS
marking r- to reflect the fact that roc said he would be landing a different
patch. again, r+sr=me for the version that does a straight conversion from
NS_LITERAL to EqualsLiteral.
Attachment #147411 -
Flags: superreview?(darin)
Attachment #147411 -
Flags: review?(darin)
Attachment #147411 -
Flags: review-
Comment 73•20 years ago
|
||
Comment on attachment 147482 [details] [diff] [review]
Assign, Append and EqualsIgnoreCase variants
>Index: xpcom/string/public/nsCharTraits.h
>+ compareASCIIIgnoreCase( const char_type* s1, const char* s2, size_t n )
>+ {
>+ for ( ; n--; ++s1, ++s2 )
>+ {
>+ NS_ASSERTION(!(*s2 & ~0x7F), "Unexpected non-ASCII character");
>+ if ( !eq_int_type(to_int_type(ASCIIToLower(*s1)),
>+ to_int_type(tolower(*s2))) )
>+ return to_int_type(ASCIIToLower(*s1))
>+ - to_int_type(tolower(*s2));
>+ }
in the not-equals case you pay the cost of calling tolower twice.
wouldn't it be better to compute once and store in the results in a local
variable?
for ( ; n--; ++s1, ++s2 )
{
NS_ASSERTION(!(*s2 & ~0x7F), "Unexpected non-ASCII character");
int i1 = to_int_type(ASCIIToLower(*s1));
int i2 = to_int_type(tolower(*s2));
if ( !eq_int_type(i1, i2) )
return i1 - i2;
}
it also looks a lot cleaner IMO ;-)
>+ compareASCIINullTerminatedIgnoreCase( const char_type* s1, size_t n, const char* s2 )
...
>+ if ( !eq_int_type(to_int_type(ASCIIToLower(*s1)),
>+ to_int_type(tolower(*s2))) )
>+ return to_int_type(ASCIIToLower(*s1))
>+ - to_int_type(tolower(*s2));
same idea applies here too.
>+ compareASCIIIgnoreCase( const char_type* s1, const char* s2, size_t n )
...
>+ if ( tolower(*s1) != tolower(*s2) )
>+ return to_int_type(tolower(*s1)) - to_int_type(tolower(*s2));
same thing here.
>+ compareASCIINullTerminatedIgnoreCase( const char_type* s1, size_t n, const char* s2 )
...
>+ if ( tolower(*s1) != tolower(*s2) )
>+ return to_int_type(tolower(*s1)) - to_int_type(tolower(*s2));
and here.
>Index: xpcom/string/public/nsTSubstring.h
>+ NS_COM void AssignASCII( const char* data, size_type length );
>+ NS_COM void AssignASCII( const char* data );
it seems like these can be simple inline wrappers around the normal
Assign methods in the case where char_type is char. that has the
advantage of avoiding extra DSO entry points. no need to be redundant,
right?
same thing applies to the Append variants as well as the methods
declared on nsTAString.
we could also add ReplaceLiteral and InsertLiteral, but maybe those
are less common cases?
>Index: xpcom/string/src/nsTSubstring.cpp
> void
>+nsTSubstring_CharT::AssignASCII( const char* data, size_type length )
>+ {
>+ // A Unicode string can't depend on an ASCII string buffer,
>+ // so this dependence check only applies to CStrings.
>+#ifdef CharT_is_char
>+ if (IsDependentOn(data, data + length))
>+ {
>+ // take advantage of sharing here...
>+ Assign(string_type(data, length));
... and in this codepath you don't even call copyASCII -- and
that's fine -- so, no reason to both implementing AssignASCII
as anything more than an inline wrapper around the normal
Assign method when CharT_is_char is defined.
overall, this patch looks good. thanks for the fine work roc! sorry,
it took me forever to review this :-(
r+sr=darin (with the changes i suggested)
Attachment #147482 -
Flags: superreview?(darin)
Attachment #147482 -
Flags: superreview+
Attachment #147482 -
Flags: review?(darin)
Attachment #147482 -
Flags: review+
Assignee | ||
Comment 74•20 years ago
|
||
I'll do all you suggest except:
> it seems like these can be simple inline wrappers around the normal
> Assign methods in the case where char_type is char. that has the
> advantage of avoiding extra DSO entry points. no need to be redundant,
> right?
I want to check (in debug builds) that that parameter string really is ASCII,
for consistency with the AString case.
> we could also add ReplaceLiteral and InsertLiteral, but maybe those
> are less common cases?
I think so. That can be future work if merited.
> ... and in this codepath you don't even call copyASCII -- and that's fine
When the parameter is a literal, the IsDependentOn test will fail and we will
always take the copyASCII path. I would like to retain the ASCII path here. I
think the space/time impact of the extra methods will be small. I'm going to go
out on a limb and check it in as is.
If you violently disagree, let me know and I'll fix it up afterward.
Assignee | ||
Comment 75•20 years ago
|
||
Oh wait. I think you reviewed an earlier version of the patch. Attachment 147694 [details] [diff]
has slightly different code, more efficient by assuming the literal is already
lowercase. I'll attach an update with your comments (mostly) taken into account.
Assignee | ||
Comment 76•20 years ago
|
||
Attachment #147964 -
Attachment is obsolete: true
Assignee | ||
Comment 77•20 years ago
|
||
Comment on attachment 149104 [details] [diff] [review]
udpated^2
one more time :-)
Attachment #149104 -
Flags: superreview?(darin)
Attachment #149104 -
Flags: review?(darin)
Assignee | ||
Comment 78•20 years ago
|
||
I checked in the conversion of Seamonkey to EqualsLiteral. Here are the final
commands that I used, in case someone wants to convert Firefox code or some
other code:
find . | egrep '\.(cpp|h)$' | xargs perl -pi -e
's/\bEquals\(NS_LITERAL_(C?)STRING\(([^)]*)\)\)/EqualsLiteral\($2\)/g'
find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/ ==
NS_LITERAL_(C?)STRING\(([^)]*)\)/.EqualsLiteral\($2\)/g'
find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/\b(\w+) !=
NS_LITERAL_(C?)STRING\(([^)]*)\)/!$1.EqualsLiteral\($3\)/g'
Assignee | ||
Comment 79•20 years ago
|
||
Numbers this time:
Firefox WINNT 5.0 beast Clbr: Zdiff:-15268
Firefox MacOSX Darwin 6.8 imola Dep: Zdiff:-20656 (+19200/-39856) mZdiff:-16608
(+7440/-24048)
Firefox Linux redwood Dep: Zdiff:-30957 (+181/-31138) mZdiff:-23824 (+71/-23895)
Seamonkey Linux brad Clbr: Zdiff:-39175 (+645/-39820) mZdiff:-25456 (+224/-25680)
Seamonkey Linux luna Dep: Zdiff:-40583 (+384/-40967) mZdiff:-25992 (+89/-26081)
Comment 80•20 years ago
|
||
Comment on attachment 149104 [details] [diff] [review]
udpated^2
> compareLCASCIIIgnoreCase
i think this function deserves a comment explaining what it does.
you have a comment that mentions EqualsLiteralIgnoreCase, but the code names
the function EqualsLCLiteralIgnoreCase. also, i have to say that i find these
names very cumbersome. is there no better name that we can come up with?
why don't we just state that all string literals passed to the ignore-case
versions need to be lower case? then the function names wouldn't need the LC
acronym.
EqualsLiteralIC might be better than EqualsLCLiteralIgnoreCase IMO.
Assignee | ||
Comment 81•20 years ago
|
||
> why don't we just state that all string literals passed to the ignore-case
> versions need to be lower case?
That sounds OK for the Literal versions but for the generic ASCII versions it
seems a bit less suitable. Maybe we just have to deal.
How about just dropping the LC and having EqualsASCIIIgnoreCase,
EqualsLiteralIgnoreCase? EqualsLiteralIC sounds fine but EqualsASCIIIC doesn't...
Comment 82•20 years ago
|
||
I'm fine with EqualsLiteralIgnoreCase and EqualsASCIIIgnoreCase.
Assignee | ||
Comment 83•20 years ago
|
||
want me to make a new patch?
Comment 84•20 years ago
|
||
Comment on attachment 149104 [details] [diff] [review]
udpated^2
no need for a new patch. just drop the LC everywhere, and document the cases
where the input string must be lowercase.
r+sr=darin with that.
Attachment #149104 -
Flags: superreview?(darin)
Attachment #149104 -
Flags: superreview+
Attachment #149104 -
Flags: review?(darin)
Attachment #149104 -
Flags: review+
/me reminds of comment 66
Assignee | ||
Comment 86•20 years ago
|
||
Hmm, that's bad. Thanks for reminding me.
How about LowerCaseEqualsASCII, LowerCaseEqualsLiteral?
foo->LowerCaseEqualsLiteral("womble")
foo->LowerCaseEqualsASCII(str)
Comment 87•20 years ago
|
||
> Hmm, that's bad. Thanks for reminding me.
yeah, i guess so... though we do have an assertion ;-)
> foo->LowerCaseEqualsLiteral("womble")
> foo->LowerCaseEqualsASCII(str)
does that imply:
foo->LowerCaseEqualsLiteralIgnoreCase("womble");
foo->LowerCaseEqualsASCIIIgnoreCase(str);
my fingers hurt :-/
Assignee | ||
Comment 88•20 years ago
|
||
> does that imply:
>
> foo->LowerCaseEqualsLiteralIgnoreCase("womble");
> foo->LowerCaseEqualsASCIIIgnoreCase(str);
Nope, LowerCaseEqualsLiteral and LowerCaseEqualsASCII are the full names
Comment 89•20 years ago
|
||
In case I wasn't the only one who didn't get it:
"LowerCase of |this| equals some literal"
I think I like these function names a lot.
Assignee | ||
Comment 90•20 years ago
|
||
Checked in.
Assignee | ||
Comment 91•20 years ago
|
||
OK, last step: convert codebase to use these functions.
Here's what I did:
# Update obvious calls to Append and Assign
find . | egrep '\.(cpp|h)$' | xargs perl -pi -e
's/\bAppend\(NS_LITERAL_(C?)STRING\(([^)]*)\)\)/AppendLiteral\($2\)/g'
find . | egrep '\.(cpp|h)$' | xargs perl -pi -e
's/\bAssign\(NS_LITERAL_(C?)STRING\(([^)]*)\)\)/AssignLiteral\($2\)/g'
# Change Append/AssignWithConversion of a literal to Append/AssignLiteral
find . | egrep '\.(cpp|h)$' | xargs perl -pi -e
's/\bAppendWithConversion\(("[^"]*")\)/AppendLiteral\($1\)/g'
find . | egrep '\.(cpp|h)$' | xargs perl -pi -e
's/\bAssignWithConversion\(("[^"]*")\)/AssignLiteral\($1\)/g'
# Update simple uses of operator+= and operator= to Append/AssignLiteral
find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/ \+=
NS_LITERAL_(C?)STRING\(([^)]*)\);/.AppendLiteral\($2\);/g'
find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/ =
NS_LITERAL_(C?)STRING\(([^)]*)\);/.AssignLiteral\($2\);/g'
# Convert EqualsIgnoreCase of literals to LowerCaseEqualsLiteral
find . | egrep '\.(cpp|h)$' | xargs perl -pi -e
's/\bEqualsIgnoreCase\(("[^"]*")\)/"LowerCaseEqualsLiteral\(".lc($1)."\)"/eg'
# Convert Equals(NS_LITERAL..., nsCaseInsensitiveStringComparator()) to
LowerCaseEqualsLiteral
find . | egrep '\.(cpp|h)$' | xargs perl -0 -pi -e
's/\bEquals\(NS_LITERAL_(C?)STRING\(("[^"]*")\),[
\n\t]*nsCaseInsensitiveStringComparator\(\)\)/"LowerCaseEqualsLiteral\(".lc($2)."\)"/egs'
# Fix up <type> <identifier>.AssignLiteral
find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/String
(\w+)\.AssignLiteral/String $1; $1\.AssignLiteral/g'
# Manually restore calls to nsIFile::Append in nsJVMConfigManagerUnix.cpp and
# nsLayoutStyleSheetCache.cpp, CBrowserShell.cpp, nsEmbedGlobalHistory.cpp,
# mozPersonalDictionary.cpp, mozMySpell.cpp, myspAffixMgr.cpp, pyloader.cpp,
# testXalan.cpp, nsBayesianFilter.cpp, nsBookmarksService.cpp, winEmbed.cpp
I'll attach the resulting patch.
Assignee | ||
Comment 92•20 years ago
|
||
One thing to consider is whether LowerCaseEqualsLiteral makes the code less
readable by forcing the RHS to be lower-cased.
This patch by no means covers all the cases that can take advantage of the new
functions. It's the easily-automated low hanging fruit.
Note that this file is gzipped because the uncompressed patch is too big for
Bugzilla's little mind.
Assignee | ||
Comment 93•20 years ago
|
||
Comment on attachment 150174 [details]
tada!
here you go. by my count, 2252 call sites were changed. You may wish to just
review the script :-)
Attachment #150174 -
Flags: superreview?(darin)
Attachment #150174 -
Flags: review?(darin)
Assignee | ||
Comment 94•20 years ago
|
||
Ignore the Makefile change in that patch. I won't check it in.
Assignee | ||
Comment 95•20 years ago
|
||
Hum. nsCaseInsensitiveStringComparator is Unicode aware so it's not generally
correct to replace it with LowerCaseEqualsASCII. I will take out those changes.
Assignee | ||
Comment 96•20 years ago
|
||
Another option would be to make LowerCaseEqualsASCII on wide strings be Unicode
aware. It doesn't look that hard. What is the rationale for having the case
converter be an XPCOM service?
Assignee | ||
Comment 97•20 years ago
|
||
Hmm. We only care about Unicode characters that can lowercase-map to ASCII.
Looking at the Unicode data files to see what Unicode characters that means, I find:
-- I-with-dot maps to ASCII i followed by a combining dot; the combining dot is
not ASCII so it can't match.
-- Similarly for Lithuanian accented I and J
-- In Turkic and Azeri locales, I-with-dot maps to plain ASCII i, but would we
really want this comparison to be locale dependent? Argh. I don't think we do it
locale ind
-- 0x212A, KELVIN SIGN, maps to ASCII k
I18N is fun!
Looking at the current case converter (and assuming that there's not some magic
alternative service that might be selected at run time; if so it doesn't appear
to be in our tree), we map I-with-dot to plain ASCII i and KELVIN SIGN to ASCII
k, and that's it. So it would be trivial to have LowerCaseEquals code preserve
this behaviour (in fact the change will go in
nsCharTraits<PR_Unichar>::ASCIIToLower). I'll do that, and leave the
mass-API-update patch as is.
Comment 98•20 years ago
|
||
(In reply to comment #97)
> Hmm. We only care about Unicode characters that can lowercase-map to ASCII.
wouldn't it be easier to map the ASCII string to unicode, in the lower- and
uppercase variant? for ASCII, it seems that it's trivial to find the uppercase
equivalent in basically all cases...
(unless you want i to uppercase to I-with-dot in the turkish locale, which I
doubt you do)
Assignee | ||
Comment 99•20 years ago
|
||
(In reply to comment #98)
> wouldn't it be easier to map the ASCII string to unicode, in the lower- and
> uppercase variant? for ASCII, it seems that it's trivial to find the uppercase
> equivalent in basically all cases...
I don't know what you mean.
The plan in my previous comment is simple and will preserve current behaviour.
please make sure the i18n people are in on this. They might have plans to add
support for more characters.
Assignee | ||
Comment 101•20 years ago
|
||
From my reading of the Unicode tables, there really is no other significant case
unless we apply specific rules for Lithuanian locales, but CCing smontagu anyway
just in case :-)
Assignee | ||
Comment 102•20 years ago
|
||
As described. This should guarantee that LowerCaseEqualsLiteral(...) (applied
to a lowercase literal) behaves identically to Equals(...,
nsCaseInsensitiveStringComparator()).
Assignee | ||
Updated•20 years ago
|
Attachment #150243 -
Flags: superreview?(darin)
Attachment #150243 -
Flags: review?(darin)
Comment 103•20 years ago
|
||
I don't think that these changes are valid. The old way was correct and is what
would be expected from a function called 'ASCIIToLower'. Since you are
explicitly only lowering ascii characters, and the literals are only supposed to
be ASCII characters, the comparison should fail when matching anything other
than ascii characters.
Assignee | ||
Comment 104•20 years ago
|
||
Maybe we should change the name of that function from ASCIIToLower to something
else. But the comment indicates what it's supposed to do and it does do what
it's supposed to do.
Comment 105•20 years ago
|
||
Are you sure it's wise to add i18n code to xpcom? What if we learn that our
uconv code needs to be changed... will the folks making those changes know to
update xpcom? Why not implement the nsSubstring::LowerCaseEqualsLiteral symbol
outside of xpcom? Why not put it inside unicharutils along side
nsCaseInsensitiveStringComparator? It seems reasonable to me to force consumers
to link to unicharutils to get this functionality.
Assignee | ||
Comment 106•20 years ago
|
||
It's only about six lines of code, based directly on the Unicode spec, unlikely
to ever need changing. I don't think it's worth the pain of moving those lines
out into unicharutils, nor the complexity of having nsSubstring split across two
DLLs.
Comment 107•20 years ago
|
||
(In reply to comment #99)
> I don't know what you mean.
I meant: Instead of taking the unicode character and converting it to lowercase,
you could take the ASCII character, and convert it to unicode (which is as
simple as a cast), if that doesn't match uppercase the ascii character and
convert it to unicode and compare. that way, you need not worry about unicode
lowercasing rules, I think.
Assignee | ||
Comment 108•20 years ago
|
||
biesi's suggestion, clarified on IRC, amounts to exactly what the currently
checked in code does.
> It's only about six lines of code, based directly on the Unicode spec, unlikely
> to ever need changing.
the unicode spec isn't static but changes from version to version with new
characters added.
Comment 110•20 years ago
|
||
Re: comment 105
If this is to be included in xpcom, then comments should be added to both this
location and the appropriate uconv location so that updates in one place will
find the other.
Assignee | ||
Comment 111•20 years ago
|
||
(In reply to comment #109)
> > It's only about six lines of code, based directly on the Unicode spec,
> > unlikely to ever need changing.
>
> the unicode spec isn't static but changes from version to version with new
> characters added.
The chances that the Unicode people add some hitherto-obscure characters which
get downcase-mapped to ASCII are extremely low, especially considering there are
only two such characters today. The chances that accidentally failing to handle
case-insensitive comparision of such characters would be noticeable, let alone
regarded as a serious issue, are even more negligible. In any case I can add
comments to the unicharutils code pointing to the nsCharTraits code. Can we just
do the simple thing to make all the Equals(...,
nsCaseInsensitiveStringComparator()) calls turn into nice fast code, and close
this bug out?
Comment 112•20 years ago
|
||
Comments are sufficient as far as I'm concerned.
Comment 113•20 years ago
|
||
Comment on attachment 150243 [details] [diff] [review]
fix nsCharTraits<PR_Unichar> for full I18N correctness
r+sr=darin
Please add comments to this code regarding the Unicode to ASCII conversion.
Also, there's no reason to put an "else" after "return 'k';"
I don't need to see another patch for comments, but please don't hesitate to be
verbose in the comments here and in intl/ :-)
Attachment #150243 -
Flags: superreview?(darin)
Attachment #150243 -
Flags: superreview+
Attachment #150243 -
Flags: review?(darin)
Attachment #150243 -
Flags: review+
Comment 114•20 years ago
|
||
Comment on attachment 150174 [details]
tada!
well done!
r+sr=darin
Attachment #150174 -
Flags: superreview?(darin)
Attachment #150174 -
Flags: superreview+
Attachment #150174 -
Flags: review?(darin)
Attachment #150174 -
Flags: review+
Assignee | ||
Comment 115•20 years ago
|
||
all checked in. Not much Tp impact, still waiting for Tinderbox size numbers.
Thanks all!
Status: NEW → RESOLVED
Closed: 20 years ago
Resolution: --- → FIXED
Comment 116•20 years ago
|
||
http://www.mozilla.org/projects/xpcom/string-guide.html
will need to be updated for these changes.
Assignee | ||
Comment 117•20 years ago
|
||
Reduced codesize by about 70K on brad.
Assignee | ||
Comment 118•20 years ago
|
||
Yes, I should update the string guide, but I'm not sure how.
Comment 119•20 years ago
|
||
I've got an alternative to the template based solution for EqualsLiteral and
family. The current code breaks VC++ 2003 anyway. The existing solution is a bit
edgy template wise. The solution I propose is a bit more main stream, I believe
and will allow this optimization to be handled on more compilers. I'm not
touching the NS_DISABLE_LITERAL_TEMPLATE macro at this point, but I suspect it
can be less restrictive if not removed all together. People with the various
platforms need to test this out. Patch coming up.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Comment 120•20 years ago
|
||
Comment 121•20 years ago
|
||
Oh, and swalker reports that VC++8 won't compile this either.
Assignee | ||
Comment 122•20 years ago
|
||
If you do
const char* foo = "blah";
bork.EqualsLiteral(foo)
does it compile?
Comment 123•20 years ago
|
||
Oh, so were you using this template mechanism to weed out passing string
litterals versus by char pointers
I think I found what was tripping up the VC++ 2003. It was the use of the
terinary operator in the nsIWin32LocalImpl.cpp. So this didn't cause the
problem at all.
Putting this back to FIXED, I'll provide a patch in another bug for the
nsIWin32LocalImp
Status: REOPENED → RESOLVED
Closed: 20 years ago → 20 years ago
Resolution: --- → FIXED
Updated•20 years ago
|
Attachment #162659 -
Attachment is obsolete: true
Updated•4 years ago
|
Component: String → XPCOM
You need to log in
before you can comment on or make changes to this bug.
Description
•