<a class="header-button" href="https://bugzilla-dev.allizom.org/home" title="Go to home page"> Bugzilla

Comment 4

•

21 years ago

Comment on attachment 136051 [details] [diff] [review] as described You need a new patch for this against the current string code. I'm not crazy about the method on nsAString -- I think it should probably be named something like EqualsASCII, since assertions about non-ASCII characters wouldn't be hit by most developers even if the string were a UTF-8 string coming from user input.

Attachment #136051 - Flags: superreview?(dbaron)

Attachment #136051 - Flags: superreview-

Attachment #136051 - Flags: review?(scc)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 5

•

21 years ago

so, we currently have: PRBool nsACString::Equals(const char *) const; and, PRBool nsAString::Equals(const char *) const; sounds a lot like: PRBool nsString::EqualsWithConversion(const char *, ... ) const;

Assignee

Comment 6

•

21 years ago

Yeah, except that the performance can be good. I'll whip up a new patch with EqualsASCII on both.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 7

•

21 years ago

EqualsWithConversion does not inflate the argument before comparing. It uses nsStringObsolete.cpp's Compare2to1 method, which is pretty efficient. It sounds to me like the two methods should share the same implementation. I don't mind if you do away with some of the nsStringObsolete code ;-)

Assignee

Comment 8

•

21 years ago

Attached patch woohoo, check this out (obsolete) (deleted) — Details — Splinter Review

Okay, it's late and maybe I got carried away

Attachment #136051 - Attachment is obsolete: true

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 9

•

21 years ago

Attached patch revised (obsolete) (deleted) — Details — Splinter Review

it was definitely late, because that patch was off by one.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

21 years ago

Attachment #145241 - Attachment is obsolete: true

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 10

•

21 years ago

The name should be changed to EqualsLiteral. darin: oh. well, the old string guide was incorrect then. I think the amount of code that could be reused by nsStringObselete is fairly trivial...

Comment 11

•

21 years ago

oooh... i like using templates to get the string literal's length. i wonder if we are going to run into portability problems with that :-/

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 12

•

21 years ago

Comment on attachment 145257 [details] [diff] [review] revised >Index: xpcom/string/public/nsCharTraits.h >+ compareASCII( const char_type* s1, const char* s2, size_t n ) ... >+ NS_ASSERTION(!(*s2 & ~0x7F), "Unexpected non-ASCII character"); the same assertion does not exist for the |char_type = char| case. isn't that inconsistent? >Index: xpcom/string/public/nsTAString.h >+ /** >+ * An efficient comparison with ASCII that can be used even for >+ * wide strings. >+ */ >+ NS_COM PRBool EqualsASCII( const char* data, int len ) const; s/int/size_type/ >+ template<int N> >+ inline PRBool EqualsASCII(const char (&str)[N]) const { >+ return EqualsASCII(str, N-1); >+ } nit: use SCC-style indentation for consistency: template<int N> inline PRBool EqualsASCII( const char (&str)[N] ) const { return EqualsASCII(str, N-1); } >Index: xpcom/string/public/nsTSubstring.h >+ NS_COM PRBool EqualsASCII( const char* data, int len ) const; >+ template<int N> >+ inline PRBool EqualsASCII(const char (&str)[N]) const { >+ return EqualsASCII(str, N-1); >+ } same nits apply here. r=me w/ those changes.

Attachment #145257 - Flags: review-

Assignee

Comment 13

•

21 years ago

Attached patch revised (obsolete) (deleted) — Details — Splinter Review

revised as per darin's comments. We ought to prepare the fallback code. What's the cleanest way of selecting between the template code and some strlen code per-platform?

Attachment #145257 - Attachment is obsolete: true

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 14

•

21 years ago

Attached patch revised^2 (deleted) — Details — Splinter Review

This adds an easy way to turn off the use of templates to match literals. I've built with the templates disabled and that works.

Attachment #145447 - Attachment is obsolete: true

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 15

•

21 years ago

Comment on attachment 145975 [details] [diff] [review] revised^2 Carrying forward darin's r+. David, if you're OK with this code then I think we can go ahead and check it in on an otherwise quiet evening and see what happens...

Attachment #145975 - Flags: superreview?(dbaron)

Attachment #145975 - Flags: review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 16

•

21 years ago

Comment on attachment 145975 [details] [diff] [review] revised^2 The NSCAP_ prefix has always been used only for nsCOMPtr. Perhaps just use NS_? #ifdef-ing includes is a recipe for bustage. However, nsCharTraits.h already includes <string.h>, so I think you should just remove those #ifdef-ed includes completely. with those changes, sr=dbaron

Attachment #145975 - Flags: superreview?(dbaron) → superreview+

Assignee

Comment 17

•

21 years ago

Sure. I'll do those things and check in sometime soon when the tree is quiet and I have time to babysit. Thanks!

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

21 years ago

Priority: -- → P1

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 18

•

21 years ago

checked in. here we go

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 19

•

21 years ago

Comment on attachment 145975 [details] [diff] [review] revised^2 >+#ifdef NSCAP_DISABLE_LITERAL_TEMPLATE >+ inline PRBool EqualsLiteral( const char* str ) const >+ { >+ return EqualsASCII(str, strlen(str)); >+ } >+#else >+ template<int N> >+ inline PRBool EqualsLiteral( const char (&str)[N] ) const >+ { >+ return EqualsASCII(str, N-1); >+ } >+#endif Do you suppose this template would work on MSVC? template<class S> inline PRBool EqualsLiteral( S &str ) const { return EqualsASCII(str, sizeof str - 1); }

Assignee

Comment 20

•

21 years ago

I don't know if that will work, but I'd like someone with a VC++ compiler to tell me. I'd also like to know whether VC++ 7 can handle the current template. In the meantime I'm going to convert over a file or two so we can see what the code size impact is.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 21

•

21 years ago

Attached patch experiments for code size (deleted) — Details — Splinter Review

Here are a couple of call sites converted to EqualsLiteral. The nsPresShell patch uses nsAString::EqualsLiteral and the nsObjectFrame patch uses nsACString::EqualsLiteral. I'll check these two parts in separately so we can see the effect of each.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

21 years ago

Attachment #146955 - Flags: superreview?(dbaron)

Attachment #146955 - Flags: review?(dbaron)

Harshal Pradhan

Comment 22

•

21 years ago

> I'd also like to know whether VC++ 7 can handle the current template. VC++ 7.1 (as found in Visual Studio .Net 2003 Enterprise) seems to be able to compile the template. I backed out your latest patch to fix windows tinderbox bustage. xpcom.dll just built without errors. Dunno about VC++ 7.0 though.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 23

•

21 years ago

Alright, I'll make the test check _MSC_VER < 1310.

Harshal Pradhan

Comment 24

•

21 years ago

If I were to try this ... char chars[8] = "foo"; nsCString buf(chars); printf("%s\n", buf.EqualsLiteral(chars) ? "true" : "false"); ... it would compile without errors. And I would see false on a platform which is using the template and true on one which not. No? Documenting how not to use this method might prevent those sorts of abuses.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 25

•

21 years ago

Good point, but I think this is not very likely. Anyway I will add some documentation explaining that you really need to pass a literal string, not a reference to a variable or any other token.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 26

•

21 years ago

Comment on attachment 146955 [details] [diff] [review] experiments for code size beware that code size experiments with small samples are sometimes misleading (inlined functions output separately once they're used, perhaps, or things like that)

Attachment #146955 - Flags: superreview?(dbaron)

Attachment #146955 - Flags: superreview+

Attachment #146955 - Flags: review?(dbaron)

Attachment #146955 - Flags: review+

Assignee

Comment 27

•

21 years ago

OK. Stuff checked in. I guess the next step will be to use a script to whack all of layout and see what effect that has on code size. I'll apply the following two regular expressions: s/\bEquals$NS_LITERAL_(C?)STRING\((\"[^"]*\")$\)/EqualsLiteral$$2$/ s/\b== $NS_LITERAL_(C?)STRING\((\"[^"]*\")$/.EqualsLiteral$$2$/ There are also instances of '!= NS_LITERAL_STRING' but I'll have to fix those up by hand. How much review do we want for these changes?

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 28

•

21 years ago

Results from the Firefox tinderboxes: Linux redwood: Zdiff:-208 (+0/-208) MacOSX imola: Zdiff:+16 (+156/-140) WINNT5 beast: Zdiff:+74 (+304/-230) beast is using the strlen version. Too bad we don't have a VC++7.1 tinderbox.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 29

•

21 years ago

Seamonkey tinderboxes: Linux brad: Zdiff:-336 (+2/-338) Linux luna: Zdiff:-224 (+0/-224)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 30

•

21 years ago

That's a whole lot for just 3 string compares. Maybe we should experiment with making the non-template version non-inline.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 31

•

21 years ago

Attached patch provide a fast compare-with-null-terminated-ASCII (deleted) — Details — Splinter Review

This can be used by the non-templated EqualsLiteral, and also by client code that has a char* or char[] that may not be a literal. It's even plausible that it would be a win to do away with the template altogether and just use this. We can test that later...

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 32

•

21 years ago

Comment on attachment 147309 [details] [diff] [review] provide a fast compare-with-null-terminated-ASCII Note that I've tested this by converting layout (patch forthcoming) to EqualsLiteral and then building with the template disabled.

Attachment #147309 - Flags: superreview?(dbaron)

Attachment #147309 - Flags: review?(dbaron)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 33

•

21 years ago

Attached patch layout conversion (deleted) — Details — Splinter Review

This used the regexps that I noted plus by-hand conversion of the != operators. Plus, uses of Equals(NS_LITERAL_(C?)STRING(...)) where the string was not a literal were converted to EqualsASCII.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 34

•

21 years ago

Comment on attachment 147310 [details] [diff] [review] layout conversion Up to you how much you want to review vs rubberstamp :-)

Attachment #147310 - Flags: superreview?(dbaron)

Attachment #147310 - Flags: review?(dbaron)

Updated

•

21 years ago

Attachment #147309 - Flags: superreview?(dbaron)

Attachment #147309 - Flags: superreview+

Attachment #147309 - Flags: review?(dbaron)

Attachment #147309 - Flags: review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

21 years ago

Attachment #147310 - Flags: superreview?(dbaron)

Attachment #147310 - Flags: superreview+

Attachment #147310 - Flags: review?(dbaron)

Attachment #147310 - Flags: review+

Assignee

Comment 35

•

21 years ago

checked in.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 36

•

21 years ago

The rough numbers (aaronl checked in a small patch at the same time): Firefox Linux (redwood): Zdiff:-14126 (+151/-14277) Firefox MacOSX (imola): Zdiff:-12288 (+720/-13008) Firefox WinNT5 (beast): Zdiff:-19180 (+1895/-21075) Seamonkey Linux (brad): Zdiff:-13670 (+1310/-14980) Seamonkey Linux (luna): Zdiff:-15874 (+194/-16068) It also appears to have reduced Tp. Definitely worth converting the rest of the codebase! We should probably do an AppendASCII/AppendLiteral as well.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 37

•

21 years ago

Attached patch convert the rest of the tree (obsolete) (deleted) — Details — Splinter Review

This is it!

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 38

•

21 years ago

More things that should be done after this: provide AppendASCII/AppendLiteral provide EqualsASCIICaseInsensitive/EqualsLiteralCaseInsensitive

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 39

•

21 years ago

darin, dbaron, does one of you want to rubber-stamp this?

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 40

•

21 years ago

Attached patch drop tag changes in NSS (deleted) — Details — Splinter Review

those tag changes in NSS don't need to be there.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

21 years ago

Attachment #147410 - Attachment is obsolete: true

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 41

•

21 years ago

How did you create the patch? I'd rather review an automated process than review the patch.

Assignee

Comment 42

•

21 years ago

It was mostly done using these commands: find . -name '*.cpp' | xargs perl -pi -e 's/\bEquals$NS_LITERAL_(C?)STRING\((\"[^"]*\")$\)/EqualsLiteral$$2$/g' find . -name '*.h' | xargs perl -pi -e 's/\bEquals$NS_LITERAL_(C?)STRING\((\"[^"]*\")$\)/EqualsLiteral$$2$/g' find . -name '*.cpp' | xargs perl -pi -e 's/ == NS_LITERAL_(C?)STRING$(\"[^"]*\"$/.EqualsLiteral$$2$/g' find . -name '*.h' | xargs perl -pi -e 's/ == NS_LITERAL_(C?)STRING$(\"[^"]*\"$/.EqualsLiteral$$2$/g' find . -name '*.cpp' | xargs perl -pi -e 's/\b(\w+) != NS_LITERAL_(C?)STRING$(\"[^"]*\")$/!$1.EqualsLiteral$$3$/g' find . -name '*.h' | xargs perl -pi -e 's/\b(\w+) != NS_LITERAL_(C?)STRING$(\"[^"]*\")$/!$1.EqualsLiteral$$3$/g' find . -name '*.h' | xargs perl -pi -e 's/\bEquals$NS_LITERAL_(C?)STRING\(([^$]+)\)\)/EqualsASCII$$2$/g' find . -name '*.cpp' | xargs perl -pi -e 's/\bEquals$NS_LITERAL_(C?)STRING\(([^$]+)\)\)/EqualsASCII$$2$/g' I also hand-edited a few files: htmlparser/src/nsParser.cpp parser/htmlparser/src/nsParser.cpp (not sure which is the real one, I edited them both) xpfe/components/winhooks/nsWindowsHooksUtil.cpp

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 43

•

21 years ago

- contentType.Equals(NS_LITERAL_CSTRING(UNKNOWN_CONTENT_TYPE)) || + contentType.EqualsASCII(UNKNOWN_CONTENT_TYPE) || you could use EqualsLiteral here. UNKNOWN_CONTENT_TYPE is just a macro. in fact, i think there are other cases like this in the patch. i would recommend searching the patch for cases where EqualsASCII is used, and fixup as appropriate.

Assignee

Comment 44

•

21 years ago

I would rather use EqualsASCII for the places where it is not clear looking just at the line that it's a literal. Otherwise we create potential for confusion. The gain from using EqualsLiteral over EqualsASCII is probably quite small, anyway.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 45

•

21 years ago

(In reply to comment #44) > I would rather use EqualsASCII for the places where it is not clear looking just > at the line that it's a literal. Otherwise we create potential for confusion. > The gain from using EqualsLiteral over EqualsASCII is probably quite small, anyway. why is that more confusing than the code it replaces? NS_LITERAL_CSTRING(FOO) -> EqualsLiteral(FOO)

Assignee

Comment 46

•

21 years ago

good point. Let's reduce the confusion :-)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 47

•

21 years ago

Honestly, we could convert those to EqualsLiteral, and I don't really mind if someone does it, but it's just extra work for dubious performance gain and I'm not really motivated to do it.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 48

•

21 years ago

The EqualsASCIIIgnoreCase stuff is hard because xpcom can't depend on intl, so we can't have functions in xpcom that convert Unicode to lower case. Any suggestions?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 49

•

21 years ago

I believe that in any case when we want to be doing EqualsASCIIIgnoreCase we want to be doing locale-insensitive conversion. So no intl involved...

Assignee

Comment 50

•

21 years ago

So is it true that the only Unicode character that locale-independent lower-case maps to 'e' is 'E'?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 51

•

21 years ago

I think it's more like "if we're doing a EqualsASCIIIgnoreCase and we have any non-ASCII characters, we're not equal". And then it's a matter of doing a case-insensitive comparison between two characters that are both in the ascii range... (still have to do locale-independent because of things like the 'I' and 'i' business in Turkish).

Assignee

Comment 52

•

21 years ago

Attached patch Assign, Append and EqualsIgnoreCase variants (obsolete) (deleted) — Details — Splinter Review

OK, this patch does that.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 53

•

21 years ago

Comment on attachment 147482 [details] [diff] [review] Assign, Append and EqualsIgnoreCase variants Seems to work after I converted layout to use these functions.

Attachment #147482 - Flags: superreview?(darin)

Attachment #147482 - Flags: review?(darin)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 54

•

21 years ago

Comment on attachment 147411 [details] [diff] [review] drop tag changes in NSS Darin, you said you'd review or rubber-stamp this?

Attachment #147411 - Flags: superreview?(darin)

Attachment #147411 - Flags: review?(darin)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 55

•

21 years ago

Comment on attachment 147482 [details] [diff] [review] Assign, Append and EqualsIgnoreCase variants >+ static >+ int >+ compareASCIIIgnoreCase( const char_type* s1, const char* s2, size_t n ) >+ { >+ for ( ; n--; ++s1, ++s2 ) >+ { >+ NS_ASSERTION(!(*s2 & ~0x7F), "Unexpected non-ASCII character"); >+ if ( !eq_int_type(to_int_type(ASCIIToLower(*s1)), >+ to_int_type(tolower(*s2))) ) >+ return to_int_type(ASCIIToLower(*s1)) >+ - to_int_type(tolower(*s2)); >+ } >+ >+ return 0; >+ } Alternative loop body suggestion: if (*s1 & ~0x7F) return 1; int_type c1 = tolower(*s1); int_type c2 = tolower(*s2); if (!eq_int_type(c1, c2)) return c1 - c2; If you don't like the return 1; then you have the choice of return *s1 - *s2; or int_type c2 = tolower(*s2); if (*s1 & ~0x7F) return *s1 - c2; int_type c1 = tolower(*s1); etc.

Assignee

Comment 56

•

21 years ago

Those are fine. It should all come to the same thing really.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 57

•

21 years ago

You can't use tolower() for ascii comparisons, last I checked -- it's locale-aware. See bug 159328 for the havoc that can ensue.

Assignee

Comment 58

•

21 years ago

Cripes. Alright, I'll do it by hand inline in ASCIIToLower.

Comment 59

•

21 years ago

(In reply to comment #58) >Alright, I'll do it by hand inline in ASCIIToLower. Does that include inlining ASCIIToLower into compareASCIIIgnoreCase?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 60

•

21 years ago

nsCRT::toLower does what you want.

Assignee

Comment 61

•

21 years ago

> nsCRT::toLower does what you want. nsCRT::ToLower isn't inline, unfortunately. > Does that include inlining ASCIIToLower into compareASCIIIgnoreCase? That should happen, since ASCIIToLower is an inline method of nsCharTraits.

Comment 62

•

21 years ago

(In reply to comment #61) >>Does that include inlining ASCIIToLower into compareASCIIIgnoreCase? >That should happen, since ASCIIToLower is an inline method of nsCharTraits. Yes, sorry for overlooking that.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 63

•

20 years ago

Sorry for the huge delay roc... I do plan to spend some time reviewing these patches early next week.

Assignee

Comment 64

•

20 years ago

Attached patch updated (obsolete) (deleted) — Details — Splinter Review

Fixes use of tolower as discussed here. Also, I changed the IgnoreCase methods to assume that the literal/ASCII string is already lower-case.

Attachment #147482 - Attachment is obsolete: true

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 65

•

20 years ago

(In reply to comment #64) >Also, I changed the IgnoreCase methods to assume that the literal/ASCII string >is already lower-case. That reminds me of an ignore case comparison routine I had in which I knew that I would never be comparing symbols so I just used (ch | 0x20) to force the character to lower case.

Comment 66

•

20 years ago

Note that there are currently callers of EqualsIgnoreCase that does pass strings with uppercase characters. So I would recommend either a more descriptive name, or at least some assertions to make sure that that isn't done for these new methods.

Robert O'Callahan (:roc) (email my personal email if necessary)

Updated

•

20 years ago

Depends on: 242382

Assignee

Comment 67

•

20 years ago

Such assertions are in the patch. When the time comes for mass conversion, it's simple enough to write a regular expression that only converts for literals that have no upper-case characters.

Comment 68

•

20 years ago

Comment on attachment 147411 [details] [diff] [review] drop tag changes in NSS >Index: editor/composer/src/nsComposerCommands.cpp >+ PRBool doingAll = (allStr.EqualsLiteral("all")); kill excess parantheses. >Index: editor/libeditor/html/nsHTMLAbsPosition.cpp >+ PRBool isPositioned = (positionStr.EqualsLiteral("absolute")); ditto. >Index: extensions/cookie/nsPermissionManager.cpp ... >+ if (lineArray[0]->EqualsASCII(kMatchTypeHost) && you can use EqualsLiteral here. >Index: htmlparser/src/CNavDTD.cpp >- if(PR_TRUE==aParserContext.mMimeType.Equals(NS_LITERAL_CSTRING(kHTMLTextContentType))) { >+ if(PR_TRUE==aParserContext.mMimeType.EqualsASCII(kHTMLTextContentType)) { I think it would be good to use EqualsLiteral for these. Again, I don't understand why you are not mapping Equals(NS_LITERAL_...) to EqualsLiteral. Clearly if the argument was a literal before it will still be a literal now, so there is no point mapping to EqualsASCII. Why not use EqualsLiteral whenever you can?

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 69

•

20 years ago

>>Index: htmlparser/src/CNavDTD.cpp I just removed the code in question, so it's not something to worry about that callsite anymore. ;)

Assignee

Comment 70

•

20 years ago

> Again, I don't understand why you are not mapping Equals(NS_LITERAL_...) to > EqualsLiteral. OK OK I give. I'll convert all ...(NS_LITERAL(...)) patterns.

Comment 71

•

20 years ago

> OK OK I give. I'll convert all ...(NS_LITERAL(...)) patterns. r+sr=darin with that change. sorry for not saying that earlier :-(

Comment 72

•

20 years ago

Comment on attachment 147411 [details] [diff] [review] drop tag changes in NSS marking r- to reflect the fact that roc said he would be landing a different patch. again, r+sr=me for the version that does a straight conversion from NS_LITERAL to EqualsLiteral.

Attachment #147411 - Flags: superreview?(darin)

Attachment #147411 - Flags: review?(darin)

Attachment #147411 - Flags: review-

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 73

•

20 years ago

Comment on attachment 147482 [details] [diff] [review] Assign, Append and EqualsIgnoreCase variants >Index: xpcom/string/public/nsCharTraits.h >+ compareASCIIIgnoreCase( const char_type* s1, const char* s2, size_t n ) >+ { >+ for ( ; n--; ++s1, ++s2 ) >+ { >+ NS_ASSERTION(!(*s2 & ~0x7F), "Unexpected non-ASCII character"); >+ if ( !eq_int_type(to_int_type(ASCIIToLower(*s1)), >+ to_int_type(tolower(*s2))) ) >+ return to_int_type(ASCIIToLower(*s1)) >+ - to_int_type(tolower(*s2)); >+ } in the not-equals case you pay the cost of calling tolower twice. wouldn't it be better to compute once and store in the results in a local variable? for ( ; n--; ++s1, ++s2 ) { NS_ASSERTION(!(*s2 & ~0x7F), "Unexpected non-ASCII character"); int i1 = to_int_type(ASCIIToLower(*s1)); int i2 = to_int_type(tolower(*s2)); if ( !eq_int_type(i1, i2) ) return i1 - i2; } it also looks a lot cleaner IMO ;-) >+ compareASCIINullTerminatedIgnoreCase( const char_type* s1, size_t n, const char* s2 ) ... >+ if ( !eq_int_type(to_int_type(ASCIIToLower(*s1)), >+ to_int_type(tolower(*s2))) ) >+ return to_int_type(ASCIIToLower(*s1)) >+ - to_int_type(tolower(*s2)); same idea applies here too. >+ compareASCIIIgnoreCase( const char_type* s1, const char* s2, size_t n ) ... >+ if ( tolower(*s1) != tolower(*s2) ) >+ return to_int_type(tolower(*s1)) - to_int_type(tolower(*s2)); same thing here. >+ compareASCIINullTerminatedIgnoreCase( const char_type* s1, size_t n, const char* s2 ) ... >+ if ( tolower(*s1) != tolower(*s2) ) >+ return to_int_type(tolower(*s1)) - to_int_type(tolower(*s2)); and here. >Index: xpcom/string/public/nsTSubstring.h >+ NS_COM void AssignASCII( const char* data, size_type length ); >+ NS_COM void AssignASCII( const char* data ); it seems like these can be simple inline wrappers around the normal Assign methods in the case where char_type is char. that has the advantage of avoiding extra DSO entry points. no need to be redundant, right? same thing applies to the Append variants as well as the methods declared on nsTAString. we could also add ReplaceLiteral and InsertLiteral, but maybe those are less common cases? >Index: xpcom/string/src/nsTSubstring.cpp > void >+nsTSubstring_CharT::AssignASCII( const char* data, size_type length ) >+ { >+ // A Unicode string can't depend on an ASCII string buffer, >+ // so this dependence check only applies to CStrings. >+#ifdef CharT_is_char >+ if (IsDependentOn(data, data + length)) >+ { >+ // take advantage of sharing here... >+ Assign(string_type(data, length)); ... and in this codepath you don't even call copyASCII -- and that's fine -- so, no reason to both implementing AssignASCII as anything more than an inline wrapper around the normal Assign method when CharT_is_char is defined. overall, this patch looks good. thanks for the fine work roc! sorry, it took me forever to review this :-( r+sr=darin (with the changes i suggested)

Attachment #147482 - Flags: superreview?(darin)

Attachment #147482 - Flags: superreview+

Attachment #147482 - Flags: review?(darin)

Attachment #147482 - Flags: review+

Assignee

Comment 74

•

20 years ago

I'll do all you suggest except: > it seems like these can be simple inline wrappers around the normal > Assign methods in the case where char_type is char. that has the > advantage of avoiding extra DSO entry points. no need to be redundant, > right? I want to check (in debug builds) that that parameter string really is ASCII, for consistency with the AString case. > we could also add ReplaceLiteral and InsertLiteral, but maybe those > are less common cases? I think so. That can be future work if merited. > ... and in this codepath you don't even call copyASCII -- and that's fine When the parameter is a literal, the IsDependentOn test will fail and we will always take the copyASCII path. I would like to retain the ASCII path here. I think the space/time impact of the extra methods will be small. I'm going to go out on a limb and check it in as is. If you violently disagree, let me know and I'll fix it up afterward.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 75

•

20 years ago

Oh wait. I think you reviewed an earlier version of the patch. Attachment 147694 [details] [diff] has slightly different code, more efficient by assuming the literal is already lowercase. I'll attach an update with your comments (mostly) taken into account.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 76

•

20 years ago

Attached patch udpated^2 (deleted) — Details — Splinter Review

Attachment #147964 - Attachment is obsolete: true

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 77

•

20 years ago

Comment on attachment 149104 [details] [diff] [review] udpated^2 one more time :-)

Attachment #149104 - Flags: superreview?(darin)

Attachment #149104 - Flags: review?(darin)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 78

•

20 years ago

I checked in the conversion of Seamonkey to EqualsLiteral. Here are the final commands that I used, in case someone wants to convert Firefox code or some other code: find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/\bEquals$NS_LITERAL_(C?)STRING\(([^)]*)$\)/EqualsLiteral$$2$/g' find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/ == NS_LITERAL_(C?)STRING$([^)]*)$/.EqualsLiteral$$2$/g' find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/\b(\w+) != NS_LITERAL_(C?)STRING$([^)]*)$/!$1.EqualsLiteral$$3$/g'

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 79

•

20 years ago

Numbers this time: Firefox WINNT 5.0 beast Clbr: Zdiff:-15268 Firefox MacOSX Darwin 6.8 imola Dep: Zdiff:-20656 (+19200/-39856) mZdiff:-16608 (+7440/-24048) Firefox Linux redwood Dep: Zdiff:-30957 (+181/-31138) mZdiff:-23824 (+71/-23895) Seamonkey Linux brad Clbr: Zdiff:-39175 (+645/-39820) mZdiff:-25456 (+224/-25680) Seamonkey Linux luna Dep: Zdiff:-40583 (+384/-40967) mZdiff:-25992 (+89/-26081)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 80

•

20 years ago

Comment on attachment 149104 [details] [diff] [review] udpated^2 > compareLCASCIIIgnoreCase i think this function deserves a comment explaining what it does. you have a comment that mentions EqualsLiteralIgnoreCase, but the code names the function EqualsLCLiteralIgnoreCase. also, i have to say that i find these names very cumbersome. is there no better name that we can come up with? why don't we just state that all string literals passed to the ignore-case versions need to be lower case? then the function names wouldn't need the LC acronym. EqualsLiteralIC might be better than EqualsLCLiteralIgnoreCase IMO.

Assignee

Comment 81

•

20 years ago

> why don't we just state that all string literals passed to the ignore-case > versions need to be lower case? That sounds OK for the Literal versions but for the generic ASCII versions it seems a bit less suitable. Maybe we just have to deal. How about just dropping the LC and having EqualsASCIIIgnoreCase, EqualsLiteralIgnoreCase? EqualsLiteralIC sounds fine but EqualsASCIIIC doesn't...

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 82

•

20 years ago

I'm fine with EqualsLiteralIgnoreCase and EqualsASCIIIgnoreCase.

Assignee

Comment 83

•

20 years ago

want me to make a new patch?

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 84

•

20 years ago

Comment on attachment 149104 [details] [diff] [review] udpated^2 no need for a new patch. just drop the LC everywhere, and document the cases where the input string must be lowercase. r+sr=darin with that.

Attachment #149104 - Flags: superreview?(darin)

Attachment #149104 - Flags: superreview+

Attachment #149104 - Flags: review?(darin)

Attachment #149104 - Flags: review+

Comment 85

•

20 years ago

/me reminds of comment 66

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 86

•

20 years ago

Hmm, that's bad. Thanks for reminding me. How about LowerCaseEqualsASCII, LowerCaseEqualsLiteral? foo->LowerCaseEqualsLiteral("womble") foo->LowerCaseEqualsASCII(str)

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 87

•

20 years ago

> Hmm, that's bad. Thanks for reminding me. yeah, i guess so... though we do have an assertion ;-) > foo->LowerCaseEqualsLiteral("womble") > foo->LowerCaseEqualsASCII(str) does that imply: foo->LowerCaseEqualsLiteralIgnoreCase("womble"); foo->LowerCaseEqualsASCIIIgnoreCase(str); my fingers hurt :-/

Assignee

Comment 88

•

20 years ago

> does that imply: > > foo->LowerCaseEqualsLiteralIgnoreCase("womble"); > foo->LowerCaseEqualsASCIIIgnoreCase(str); Nope, LowerCaseEqualsLiteral and LowerCaseEqualsASCII are the full names

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 89

•

20 years ago

In case I wasn't the only one who didn't get it: "LowerCase of |this| equals some literal" I think I like these function names a lot.

Assignee

Comment 90

•

20 years ago

Checked in.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 91

•

20 years ago

OK, last step: convert codebase to use these functions. Here's what I did: # Update obvious calls to Append and Assign find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/\bAppend$NS_LITERAL_(C?)STRING\(([^)]*)$\)/AppendLiteral$$2$/g' find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/\bAssign$NS_LITERAL_(C?)STRING\(([^)]*)$\)/AssignLiteral$$2$/g' # Change Append/AssignWithConversion of a literal to Append/AssignLiteral find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/\bAppendWithConversion$("[^"]*")$/AppendLiteral$$1$/g' find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/\bAssignWithConversion$("[^"]*")$/AssignLiteral$$1$/g' # Update simple uses of operator+= and operator= to Append/AssignLiteral find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/ \+= NS_LITERAL_(C?)STRING$([^)]*)$;/.AppendLiteral$$2$;/g' find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/ = NS_LITERAL_(C?)STRING$([^)]*)$;/.AssignLiteral$$2$;/g' # Convert EqualsIgnoreCase of literals to LowerCaseEqualsLiteral find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/\bEqualsIgnoreCase$("[^"]*")$/"LowerCaseEqualsLiteral$".lc($1)."$"/eg' # Convert Equals(NS_LITERAL..., nsCaseInsensitiveStringComparator()) to LowerCaseEqualsLiteral find . | egrep '\.(cpp|h)$' | xargs perl -0 -pi -e 's/\bEquals$NS_LITERAL_(C?)STRING\(("[^"]*")$,[ \n\t]*nsCaseInsensitiveStringComparator\)/"LowerCaseEqualsLiteral$".lc($2)."$"/egs' # Fix up <type> <identifier>.AssignLiteral find . | egrep '\.(cpp|h)$' | xargs perl -pi -e 's/String (\w+)\.AssignLiteral/String $1; $1\.AssignLiteral/g' # Manually restore calls to nsIFile::Append in nsJVMConfigManagerUnix.cpp and # nsLayoutStyleSheetCache.cpp, CBrowserShell.cpp, nsEmbedGlobalHistory.cpp, # mozPersonalDictionary.cpp, mozMySpell.cpp, myspAffixMgr.cpp, pyloader.cpp, # testXalan.cpp, nsBayesianFilter.cpp, nsBookmarksService.cpp, winEmbed.cpp I'll attach the resulting patch.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 92

•

20 years ago

Attached file tada! (deleted) — Details

One thing to consider is whether LowerCaseEqualsLiteral makes the code less readable by forcing the RHS to be lower-cased. This patch by no means covers all the cases that can take advantage of the new functions. It's the easily-automated low hanging fruit. Note that this file is gzipped because the uncompressed patch is too big for Bugzilla's little mind.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 93

•

20 years ago

Comment on attachment 150174 [details] tada! here you go. by my count, 2252 call sites were changed. You may wish to just review the script :-)

Attachment #150174 - Flags: superreview?(darin)

Attachment #150174 - Flags: review?(darin)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 94

•

20 years ago

Ignore the Makefile change in that patch. I won't check it in.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 95

•

20 years ago

Hum. nsCaseInsensitiveStringComparator is Unicode aware so it's not generally correct to replace it with LowerCaseEqualsASCII. I will take out those changes.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 96

•

20 years ago

Another option would be to make LowerCaseEqualsASCII on wide strings be Unicode aware. It doesn't look that hard. What is the rationale for having the case converter be an XPCOM service?

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 97

•

20 years ago

Hmm. We only care about Unicode characters that can lowercase-map to ASCII. Looking at the Unicode data files to see what Unicode characters that means, I find: -- I-with-dot maps to ASCII i followed by a combining dot; the combining dot is not ASCII so it can't match. -- Similarly for Lithuanian accented I and J -- In Turkic and Azeri locales, I-with-dot maps to plain ASCII i, but would we really want this comparison to be locale dependent? Argh. I don't think we do it locale ind -- 0x212A, KELVIN SIGN, maps to ASCII k I18N is fun! Looking at the current case converter (and assuming that there's not some magic alternative service that might be selected at run time; if so it doesn't appear to be in our tree), we map I-with-dot to plain ASCII i and KELVIN SIGN to ASCII k, and that's it. So it would be trivial to have LowerCaseEquals code preserve this behaviour (in fact the change will go in nsCharTraits<PR_Unichar>::ASCIIToLower). I'll do that, and leave the mass-API-update patch as is.

Christian :Biesinger (don't email me, ping me on IRC)

Comment 98

•

20 years ago

(In reply to comment #97) > Hmm. We only care about Unicode characters that can lowercase-map to ASCII. wouldn't it be easier to map the ASCII string to unicode, in the lower- and uppercase variant? for ASCII, it seems that it's trivial to find the uppercase equivalent in basically all cases... (unless you want i to uppercase to I-with-dot in the turkish locale, which I doubt you do)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 99

•

20 years ago

(In reply to comment #98) > wouldn't it be easier to map the ASCII string to unicode, in the lower- and > uppercase variant? for ASCII, it seems that it's trivial to find the uppercase > equivalent in basically all cases... I don't know what you mean. The plan in my previous comment is simple and will preserve current behaviour.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 100

•

20 years ago

please make sure the i18n people are in on this. They might have plans to add support for more characters.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 101

•

20 years ago

From my reading of the Unicode tables, there really is no other significant case unless we apply specific rules for Lithuanian locales, but CCing smontagu anyway just in case :-)

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 102

•

20 years ago

Attached patch fix nsCharTraits<PR_Unichar> for full I18N correctness (deleted) — Details — Splinter Review

As described. This should guarantee that LowerCaseEqualsLiteral(...) (applied to a lowercase literal) behaves identically to Equals(..., nsCaseInsensitiveStringComparator()).

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Updated

•

20 years ago

Attachment #150243 - Flags: superreview?(darin)

Attachment #150243 - Flags: review?(darin)

Brodie

Comment 103

•

20 years ago

I don't think that these changes are valid. The old way was correct and is what would be expected from a function called 'ASCIIToLower'. Since you are explicitly only lowering ascii characters, and the literals are only supposed to be ASCII characters, the comparison should fail when matching anything other than ascii characters.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 104

•

20 years ago

Maybe we should change the name of that function from ASCIIToLower to something else. But the comment indicates what it's supposed to do and it does do what it's supposed to do.

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 105

•

20 years ago

Are you sure it's wise to add i18n code to xpcom? What if we learn that our uconv code needs to be changed... will the folks making those changes know to update xpcom? Why not implement the nsSubstring::LowerCaseEqualsLiteral symbol outside of xpcom? Why not put it inside unicharutils along side nsCaseInsensitiveStringComparator? It seems reasonable to me to force consumers to link to unicharutils to get this functionality.

Assignee

Comment 106

•

20 years ago

It's only about six lines of code, based directly on the Unicode spec, unlikely to ever need changing. I don't think it's worth the pain of moving those lines out into unicharutils, nor the complexity of having nsSubstring split across two DLLs.

Christian :Biesinger (don't email me, ping me on IRC)

Comment 107

•

20 years ago

(In reply to comment #99) > I don't know what you mean. I meant: Instead of taking the unicode character and converting it to lowercase, you could take the ASCII character, and convert it to unicode (which is as simple as a cast), if that doesn't match uppercase the ascii character and convert it to unicode and compare. that way, you need not worry about unicode lowercasing rules, I think.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 108

•

20 years ago

biesi's suggestion, clarified on IRC, amounts to exactly what the currently checked in code does.

Jonas Sicking (:sicking) No longer reading bugmail consistently

Comment 109

•

20 years ago

> It's only about six lines of code, based directly on the Unicode spec, unlikely > to ever need changing. the unicode spec isn't static but changes from version to version with new characters added.

Brodie

Comment 110

•

20 years ago

Re: comment 105 If this is to be included in xpcom, then comments should be added to both this location and the appropriate uconv location so that updates in one place will find the other.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 111

•

20 years ago

(In reply to comment #109) > > It's only about six lines of code, based directly on the Unicode spec, > > unlikely to ever need changing. > > the unicode spec isn't static but changes from version to version with new > characters added. The chances that the Unicode people add some hitherto-obscure characters which get downcase-mapped to ASCII are extremely low, especially considering there are only two such characters today. The chances that accidentally failing to handle case-insensitive comparision of such characters would be noticeable, let alone regarded as a serious issue, are even more negligible. In any case I can add comments to the unicharutils code pointing to the nsCharTraits code. Can we just do the simple thing to make all the Equals(..., nsCaseInsensitiveStringComparator()) calls turn into nice fast code, and close this bug out?

Comment 112

•

20 years ago

Comments are sufficient as far as I'm concerned.

Comment 113

•

20 years ago

Comment on attachment 150243 [details] [diff] [review] fix nsCharTraits<PR_Unichar> for full I18N correctness r+sr=darin Please add comments to this code regarding the Unicode to ASCII conversion. Also, there's no reason to put an "else" after "return 'k';" I don't need to see another patch for comments, but please don't hesitate to be verbose in the comments here and in intl/ :-)

Attachment #150243 - Flags: superreview?(darin)

Attachment #150243 - Flags: superreview+

Attachment #150243 - Flags: review?(darin)

Attachment #150243 - Flags: review+

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 114

•

20 years ago

Comment on attachment 150174 [details] tada! well done! r+sr=darin

Attachment #150174 - Flags: superreview?(darin)

Attachment #150174 - Flags: superreview+

Attachment #150174 - Flags: review?(darin)

Attachment #150174 - Flags: review+

Assignee

Comment 115

•

20 years ago

all checked in. Not much Tp impact, still waiting for Tinderbox size numbers. Thanks all!

Status: NEW → RESOLVED

Closed: 20 years ago

Resolution: --- → FIXED

Brodie

Comment 116

•

20 years ago

http://www.mozilla.org/projects/xpcom/string-guide.html will need to be updated for these changes.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 117

•

20 years ago

Reduced codesize by about 70K on brad.

Robert O'Callahan (:roc) (email my personal email if necessary)

Assignee

Comment 118

•

20 years ago

Yes, I should update the string guide, but I'm not sure how.

Comment 119

•

20 years ago

I've got an alternative to the template based solution for EqualsLiteral and family. The current code breaks VC++ 2003 anyway. The existing solution is a bit edgy template wise. The solution I propose is a bit more main stream, I believe and will allow this optimization to be handled on more compilers. I'm not touching the NS_DISABLE_LITERAL_TEMPLATE macro at this point, but I suspect it can be less restrictive if not removed all together. People with the various platforms need to test this out. Patch coming up.

Status: RESOLVED → REOPENED

Resolution: FIXED → ---

Comment 120

•

20 years ago

Attached patch Alternative template solution (obsolete) (deleted) — Details — Splinter Review

Robert O'Callahan (:roc) (email my personal email if necessary)

Comment 121

•

20 years ago

Oh, and swalker reports that VC++8 won't compile this either.

Assignee

Comment 122

•

20 years ago

If you do const char* foo = "blah"; bork.EqualsLiteral(foo) does it compile?

Comment 123

•

20 years ago

Oh, so were you using this template mechanism to weed out passing string litterals versus by char pointers I think I found what was tripping up the VC++ 2003. It was the use of the terinary operator in the nsIWin32LocalImpl.cpp. So this didn't cause the problem at all. Putting this back to FIXED, I'll provide a patch in another bug for the nsIWin32LocalImp

Status: REOPENED → RESOLVED

Closed: 20 years ago → 20 years ago

Resolution: --- → FIXED