Closed Bug 100214 Opened 23 years ago Closed 23 years ago

xpcom depends on unicharutil for case-insensitive unicode

Categories

(Core :: XPCOM, defect, P1)

x86
Windows 2000
defect

Tracking

()

RESOLVED FIXED
mozilla0.9.6

People

(Reporter: dougt, Assigned: alecf)

References

Details

Attachments

(10 files, 9 obsolete files)

(deleted), patch
jag+mozilla
: review+
sfraser_bugs
: superreview+
Details | Diff | Splinter Review
(deleted), patch
netscape
: review+
Details | Diff | Splinter Review
(deleted), patch
sspitzer
: review+
sfraser_bugs
: superreview+
Details | Diff | Splinter Review
(deleted), patch
jag+mozilla
: review+
Details | Diff | Splinter Review
(deleted), patch
jag+mozilla
: review+
Details | Diff | Splinter Review
(deleted), patch
jag+mozilla
: review+
Details | Diff | Splinter Review
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
netscape
: review+
Details | Diff | Splinter Review
(deleted), patch
jag+mozilla
: review+
Details | Diff | Splinter Review
(deleted), patch
jag+mozilla
: review+
Details | Diff | Splinter Review
Blocks: 66759
Blocks: 100107
No longer blocks: 100107
I started hacking on this a bit. What I've discovered is that the nsCRT routines are the only code that requires unicharutil. My original plan was to simply remove the case-oriented PRUnichar* routines from nsCRT, and switch all consumers to the ns*String routines. But I hit a snag: the ns*String routines use some of the nsCRT routines, in bufferRoutines.h, that I wanted to eliminate! I figured I'd change THOSE consumers to use unicharutil, which the string library already depends on. So string is still just as dependent, but xpcom is free of this dependency. Here's my plan, already underway - remove case-oriented routines that operate on PRUnichar's from nsCRT, and fix consumers - in the string library, switch the consumers of the previous routines over to unicharutil using the CaseConvert routines that already exist I hope this won't be too much work. I could use help as well, especially if someone (scc?) wants to convert bufferRoutines.h not to depend on nsCRT.h - that would make my life MUCH easier.
Attached patch first pass hacky proof-of-concept (obsolete) (deleted) — Splinter Review
ok, heres a patch that has its share of hacks, gross inefficiencies, and blatant lies. Basically what this does is shove off the unicharutil dependency over to the string library: - in string/obsolete, I shuffled around the gCaseConv stuff so that it could be shared - I eliminated the use of the nsCRT routines ToUpper/ToLower for PRUnichar stuff - I made the nsCRT::str*casecmp stuff use the new fangled string library's global Compare function Problems with this patch: - gross use of nsCRT::strlen() - the problem is that we need to nsDependentStrings with their size set to the min of these two - switch from kLatin1ToUCS2 array to NS_ConvertUTF8toUCS2 - I have no idea what the impacts of this are.. the problem is basically that because we're handing a whole string off to Compare(), we need a buffer with the char* -> UCS2 conversion already complete - really I should make a buffer on the stack and do the conversion like the old way. Anyway, this lets me compile the rest of the tree to see if there are any other impacts.
actually, I just realized this bug is for the uconv dependency (not unicharutil) I'm updating the summary, taking this, and opening another bug for uconv (which is totally unrelated, having to do with nsFileSpec/nsLocalFile and friends)
Assignee: dougt → alecf
Summary: xpcom depends on uconv for doublebyte support → xpcom depends on unicharutil for doublebyte support
Target Milestone: --- → mozilla0.9.5
Status: NEW → ASSIGNED
ok, I have a new plan, somewhat like the old. The new plan is to eliminate all case-related APIs from all of XPCOM and String, and make a utility library in unicharutil that resembles the string APIs. For example, I'm eliminating nsCRT::ToUpper() and the string library's ::ToUpperCase(nsAString&) But then I am declaring an implementation of ::ToUpperCase(nsAString&) in nsUnicharUtil.h, and then making an implementation which can be statically linked in. I'd like to hear an opinion from scc on this. What I'd essentially be doing is overloading a few global functions that string declares, in unicharutil. The routines in question are: ToLowerCase() ToUpperCase() CaseInsensitiveFindInReadable() CaseInsensitiveStringComparator (class)
Attached patch pre-landing cleanup (obsolete) (deleted) — Splinter Review
ok, that patch cleans up all uses of nsCRT::str*cmp to compare a PRUnichar* to a char*. nsCRT::str*cmp was doing an implicit conversion from UTF8 to UCS2, so now I make it explicit. Can I get an r=/sr=? jag? scc?
From what I understand, it was doing a conversion from ISO Latin1 to UCS2, which basically means tacking on another 8 bits, all 0, rather than UTF8 to UCS2, which will do all kinds of nasty things if it sees bit 7 set.
doh! you're right. actually, that seems really broken then. ok, I may end up switching those to some temporary NS_ConvertLatin1toUCS2 or something, or I might look at this stuff on a case-by-case basis. I _will_ kill these routines. Argh. this stuff sucks :)
actually, I forgot that most of the cases in that patch are NS_LITERAL_STRING's in fact, there are only 5 uses in that patch, and 4 of them are questionable conversions. I'm going to look at them and figure out how the strings are supposed to be encoded. I found a few more in mail, but they're all either candidates for NS_LITERAL_STRING, or can be changed a bit to just use straight unicode.
ok, I went through the cases and it looks like: nsInternetSearchService: we're just using a global constant char* - I'm pretty sure you can't use NS_NAMED_LITERAL_STRING in a global context since that would declare a global variable, so I'm doing an explicit conversion from ASCII. mozTXTtoHTMLConv.cpp: It turns out the only time we needed to do the conversion was with static strings - so I whacked the APIs to take const PRUnichar*, and then made use of NS_LITERAL_STRING nsComposerCommands: Same deal with editor - but since editor is in the midst of a directory reorg, I didn't want to go changing APIs on them. Other than that, all uses are NS_LITERAL_STRING. Comments?
by the way: 1) that's a diff -bw, I have fixed the funny indentation 2) don't mind the conflicts, that patch was made after I had just updated my tree - I've resolved them appropriately.
Attachment #50632 - Attachment is obsolete: true
I'd like to check in that 3rd patch there (attachment 50699 [details] [diff] [review]) - and would love some reviewers..
Comment on attachment 50699 [details] [diff] [review] updated patch - change APIs where necessary >Index: xpfe/components/search/src/nsInternetSearchService.cpp >=================================================================== >RCS file: /cvsroot/mozilla/xpfe/components/search/src/nsInternetSearchService.cpp,v >retrieving revision 1.146 >diff -u -b -w -r1.146 nsInternetSearchService.cpp >--- nsInternetSearchService.cpp 2001/09/25 00:22:45 1.146 >+++ nsInternetSearchService.cpp 2001/09/25 16:03:11 >@@ -3342,7 +3343,7 @@ > }; > > // make "ISO-8859-1" as the default (not "UTF-8") >- stringEncoding= NS_ConvertASCIItoUCS2("ISO-8859-1"); >+ stringEncoding= NS_LITERAL_STRING("ISO-8859-1"); Might as well add a space before that = while you're there. > PRUint32 loop = 0; > while (encodingList[loop].numericEncoding != nsnull) >Index: mailnews/base/search/src/nsMsgSearchAdapter.cpp >=================================================================== >RCS file: /cvsroot/mozilla/mailnews/base/search/src/nsMsgSearchAdapter.cpp,v >retrieving revision 1.34 >diff -u -b -w -r1.34 nsMsgSearchAdapter.cpp >--- nsMsgSearchAdapter.cpp 2001/07/25 07:52:31 1.34 >+++ nsMsgSearchAdapter.cpp 2001/09/25 16:03:12 >@@ -148,7 +148,7 @@ > nsCAutoString charset; > charset.AssignWithConversion(destCharset); > // Specify a character set unless we happen to be US-ASCII. >- if (nsCRT::strcmp(destCharset, "us-ascii")) >+ if (nsCRT::strcmp(destCharset, NS_LITERAL_STRING("us-ascii").get())) You could compare destCharset and "us-ascii" directly. > result = PR_smprintf("%s%s", nsMsgSearchAdapter::m_kImapCharset, (const char *) charset); > > return result; >Index: mailnews/base/src/nsMsgAccountManager.cpp >=================================================================== >RCS file: /cvsroot/mozilla/mailnews/base/src/nsMsgAccountManager.cpp,v >retrieving revision 1.235 >diff -u -b -w -r1.235 nsMsgAccountManager.cpp >--- nsMsgAccountManager.cpp 2001/09/24 23:20:02 1.235 >+++ nsMsgAccountManager.cpp 2001/09/25 16:03:13 >@@ -996,7 +996,7 @@ > nsCOMPtr<nsIMsgFolder>inboxFolder = do_QueryInterface(aSupport); > PRUnichar *folderName = nsnull; > inboxFolder->GetName(&folderName); >- if (folderName && nsCRT::strcasecmp(folderName, "INBOX") ==0) >+ if (folderName && Compare(nsDependentString(folderName), NS_LITERAL_STRING("INBOX"), nsCaseInsensitiveStringComparator()) ==0) > { > rv1 = inboxFolder->Compact(urlListener, nsnull /* msgwindow */); > if (NS_SUCCEEDED(rv1)) folderName is being leaked here. Use an nsXPIDLString and you can get rid of that nsDependentString. >Index: intl/chardet/src/nsMetaCharsetObserver.cpp >=================================================================== >RCS file: /cvsroot/mozilla/intl/chardet/src/nsMetaCharsetObserver.cpp,v >retrieving revision 1.48 >diff -u -b -w -r1.48 nsMetaCharsetObserver.cpp >--- nsMetaCharsetObserver.cpp 2001/09/19 23:59:58 1.48 >+++ nsMetaCharsetObserver.cpp 2001/09/25 16:03:13 >@@ -89,7 +89,7 @@ > const PRUnichar* nameArray[], > const PRUnichar* valueArray[]) > { >- if(0 != nsCRT::strcasecmp(aTag, "META")) >+ if(0 != nsCRT::strcasecmp(aTag, NS_LITERAL_STRING("META").get())) You converted one of these to a Compare(..., ..., nsCaseInsensitiveStringComparator) earlier. Which style do you want to use? > return NS_ERROR_ILLEGAL_VALUE; > else > return Notify(aDocumentID, numOfAttributes, nameArray, valueArray); >Index: netwerk/streamconv/converters/mozTXTToHTMLConv.cpp >=================================================================== >RCS file: /cvsroot/mozilla/netwerk/streamconv/converters/mozTXTToHTMLConv.cpp,v >retrieving revision 1.43 >diff -u -b -w -r1.43 mozTXTToHTMLConv.cpp >--- mozTXTToHTMLConv.cpp 2001/09/05 10:46:39 1.43 >+++ mozTXTToHTMLConv.cpp 2001/09/25 16:03:15 >@@ -547,7 +551,7 @@ > > PRUint32 > mozTXTToHTMLConv::NumberOfMatches(const PRUnichar * aInString, PRInt32 aInStringLength, >- const char* rep, PRInt32 aRepLen, LIMTYPE before, LIMTYPE after) >+ const PRUnichar* rep, PRInt32 aRepLen, LIMTYPE before, LIMTYPE after) Indentation nit, if you're willing to fix it (seemed like you were). > { > PRUint32 result = 0; > >@@ -564,7 +568,7 @@ > // NOTE: the converted html for the phrase is appended to aOutString > PRBool > mozTXTToHTMLConv::StructPhraseHit(const PRUnichar * aInString, PRInt32 aInStringLength, PRBool col0, >- const char* tagTXT, PRInt32 aTagTXTLen, >+ const PRUnichar* tagTXT, PRInt32 aTagTXTLen, > const char* tagHTML, const char* attributeHTML, > nsString& aOutString, PRUint32& openTags) > { >@@ -621,7 +625,7 @@ > > PRBool > mozTXTToHTMLConv::SmilyHit(const PRUnichar * aInString, PRInt32 aLength, PRBool col0, >- const char* tagTXT, PRInt32 aTagTxtLength, const char* tagHTML, >+ const PRUnichar* tagTXT, PRInt32 aTagTxtLength, const char* tagHTML, > nsString& outputHTML, PRInt32& glyphTextLen) > { > PRInt32 tagLen = nsCRT::strlen(tagTXT); Instead of const PRUnichar* you could change this to const nsAString&.
> Might as well add a space before that = while you're there. /me rolls his eyes :) > >- if (nsCRT::strcmp(destCharset, "us-ascii")) > >+ if (nsCRT::strcmp(destCharset, NS_LITERAL_STRING("us-ascii").get())) > You could compare destCharset and "us-ascii" directly. No, I can't - that's the point of this whole patch. destCharset is a PRUnichar* > result = PR_smprintf("%s%s", nsMsgSearchAdapter::m_kImapCharset, (const char *) charset); > > return result; >Index: mailnews/base/src/nsMsgAccountManager.cpp >=================================================================== >RCS file: /cvsroot/mozilla/mailnews/base/src/nsMsgAccountManager.cpp,v >retrieving revision 1.235 >diff -u -b -w -r1.235 nsMsgAccountManager.cpp >--- nsMsgAccountManager.cpp 2001/09/24 23:20:02 1.235 >+++ nsMsgAccountManager.cpp 2001/09/25 16:03:13 >>- if (folderName && nsCRT::strcasecmp(folderName, "INBOX") ==0) >>+ if (folderName && Compare(nsDependentString(folderName), NS_LITERAL_STRING("INBOX"), nsCaseInsensitiveStringComparator()) ==0) >> { >> rv1 = inboxFolder->Compact(urlListener, nsnull /* msgwindow */); >> if (NS_SUCCEEDED(rv1)) >folderName is being leaked here. Use an nsXPIDLString and you can get rid >of that nsDependentString. Ok, fixed.. and switched back to nsCRT::strcasecmp just for consistency >You converted one of these to a Compare(..., ..., >nsCaseInsensitiveStringComparator) earlier. >Which style do you want to use? I'll leave it as strcasecmp() > mozTXTToHTMLConv::NumberOfMatches(const PRUnichar * aInString, PRInt32 aInStringLength, >- const char* rep, PRInt32 aRepLen, LIMTYPE before, LIMTYPE after) >+ const PRUnichar* rep, PRInt32 aRepLen, LIMTYPE before, LIMTYPE after) Indentation nit, if you're willing to fix it (seemed like you were). As they say "when in rome" - the whole file is indented in the style you see here. It actually looks just fine within the context of surrounding code. >>@@ -621,7 +625,7 @@ >> > PRBool >> mozTXTToHTMLConv::SmilyHit(const PRUnichar * aInString, PRInt32 aLength, PRBool col0, >>- const char* tagTXT, PRInt32 aTagTxtLength, const char* tagHTML, >>+ const PRUnichar* tagTXT, PRInt32 aTagTxtLength, const char* tagHTML, >> nsString& outputHTML, PRInt32& glyphTextLen) >> { >> PRInt32 tagLen = nsCRT::strlen(tagTXT); > Instead of const PRUnichar* you could change this to const nsAString&. I'm going for minimal change here. We've had lots of problems with performance in this particular file due to excess ns*String usage in tight loops (lots of time spent in ns*String constructors, just ask waterson and mscott about it..) so for now I'd rather pass around raw pointers and integers... Attaching an updated patch to address these nits...
> > >- if (nsCRT::strcmp(destCharset, "us-ascii")) > > >+ if (nsCRT::strcmp(destCharset, NS_LITERAL_STRING("us-ascii").get())) > > > You could compare destCharset and "us-ascii" directly. > > No, I can't - that's the point of this whole patch. destCharset is > a PRUnichar* I typo'd that, I meant |charset| (the nsCAutoString just constructed), not |destCharset|. But anyways... And the reason I mentioned the indentation thing in mozTXTToHTMLConv is that you are in fact reindenting near line 490, hence the "seemed like you were" part of that comment.
Comment on attachment 50699 [details] [diff] [review] updated patch - change APIs where necessary r=jag
Attachment #50699 - Flags: review+
Comment on attachment 50699 [details] [diff] [review] updated patch - change APIs where necessary sr=sfraser
Attachment #50699 - Flags: superreview+
ok, that last patch has landed. thanks folks. next stop: eliminate bytewidth-insensitive comparisons from nsStr*
Priority: -- → P2
try as I might, I can't get enough of this done to land for 0.9.5.. I've run into a few issues: - CaseInsensitiveStringComparator() needed to be moved into nsUnicharUtils - CaseInsensitiveFindInReadable() is a little tricky.. we need to expose a generic FindInReadable() that takes a comparator. I'm going to rename FindInReadable_Impl() and expose that. - nsStringArray::IndexOfIgnoreCase is evil! We're only using it once in nsPresShell, but I haven't figured out a great way to work around it. Fortunately, the only consumer is the old viewer app...
Target Milestone: mozilla0.9.5 → mozilla0.9.6
This is going to be a very complex landing - I need to land things in stages... I'm attaching two patches: 1) Add a new CaseCompare() to nsICaseConversion so that we can do fast case-insensitive comparisons inside of nsICaseConversion 2) try to migrate everyone away from the (Assign|Equals)WithConversion routines.. first step is to remove all the obvious conversions to NS_LITERAL_STRING() - this covers about 70% of the usages.
Summary: xpcom depends on unicharutil for doublebyte support → xpcom depends on unicharutil for case-insensitive unicode
Attached patch add CaseCompare() to nsICaseConversion (obsolete) (deleted) — Splinter Review
ok, after a few runs at this and hitting a few dead-ends, here's my plan of action.. I need to write it down so I don't keep forgetting it: - eliminate all callers to nsCRT::str*casecmp that reference PRUnichar* parameters, switching them to the newest string APIs - add nsUnicharUtils.h/.cpp to the build, moving case-insensitive PRUnichar* routines into it. Then, #if 0 out the lines in the header, so that I can procede with the following without changing projects, etc... - hand-roll temporary string comparison routines for the old string library. (This eliminates xpcom's dependency on unicharutil, but not string or string_obsolete) - eliminate all callers to all *WithConversion (old API) routines.. 80-90% of these seem to be operating directly on literal strings, so: - first pass, convert literal string usage to NS_LITERAL_STRING - second pass, convert all other uses - add unicharutil_s.lib (empty at this point) to all consumer libraries that require it. (I'll need mac help here) - turn on #if 0'ed code! - actually remove nsCRT::str*casecmp routines from nsCRT.h/cpp I've done some of this work already. I'm starting to attach patches.
Priority: P2 → P1
Attached patch adding CaseCompare to nsICaseConversion, faster (obsolete) (deleted) — Splinter Review
the above patch gives us a faster case comparison by avoiding virtual method calls, and consolidates the ToLower() stuff into one function, instead of spreading it across the implementations. can I get an r/sr=?
Attachment #52088 - Attachment is obsolete: true
Attached patch massive update of nsCRT::str*casecmp (obsolete) (deleted) — Splinter Review
ok, that last patch: - removes all consumers of nsCRT::str[n]casecmp, and switches over to the new string libraries for the comparison... - converts any such files over to using NS_LITERAL_STRING where we were using AssignWithConversion (there are many more, but I'll put those in a seperate patch) - adds the inclusion of nsUnicharUtils.h to the necessary files, and adds the exporting (win, unix) and linking (windows only). Right now nsUnicharUtils and friends are #if 0'd out. Can I get a preliminary review to make sure I'm not smoking crack here, jag? Also, I'll attach a patch to the exporting of nsUnicharUtils.h to the mac build. After a reviewed version of this patch is checked in, I'll add linux and mac linking. Then, finally I'll throw the switch to turn ON nsUnicharUtils.h, and turn OFF the old routines in nsAString.h/nsReadableUtils.h.
Attached patch export nsUnicharUtils.h on the mac (obsolete) (deleted) — Splinter Review
gee... I don't know why we really need to do all these change. Maybe the easier way to do it just move the Unicode base case conversion code into xpcom.
no! in this case, smaller is better. Unicode base case conversion can and SHOULD BE optional.
frank, the whole point of this bug is to move non-xpcom libraries OUT of XPCOM. There are actually only two somewhat minor dependencies: - case-insensitive comparison and case conversion for PRUnichar* strings - filesystem-charset support This bug is for the first one.
>no! in this case, smaller is better. Unicode base case conversion can and >SHOULD BE optional. Do you know how BIG is the Unicode case conversion before you said that ?
>Unicode base case conversion can and SHOULD BE optional. Why ?
frank, doug (owner of XPCOM) has already decided that unicode case conversion should not be a part of XPCOM, and a number of the XPCOM peers (myself included) agree with him. Its not just a matter of size but also of appropriateness. this simply doesn't belong in XPCOM, all we're doing is moving the dependency out of xpcom and onto the consumers of these functions... we are removing the nsCRT::strcasecmp and nsCRT::strncasecmp routines for unicode, and need to support them instead through the new string APIs. I order to do this and maintain (actually, improve) string performance, we need to add this one comparison method to nsICaseConversion.
Depends on: 104126
Depends on: 104122
ok, I'm breaking this out into a couple different bugs so we can focus our discussions see: bug 104122: adding a case-insensitive compare to nsICaseConversion bug 104126: removing case-insensitive compares from nsStringArray
No longer depends on: 104122, 104126
Attachment #52613 - Attachment is obsolete: true
Attachment #50009 - Attachment is obsolete: true
Comment on attachment 52998 [details] [diff] [review] simply turn on the "util" directory, and make it come first r=cls
Attachment #52998 - Flags: review+
Attachment #52617 - Attachment is obsolete: true
Depends on: 104122, 104126
Attached patch updated nsCRT::strn?cmp changes (deleted) — Splinter Review
Attachment #52614 - Attachment is obsolete: true
I found some flaws in the way I converted nsMetaCharsetObserver.... that is an updated patch. Anyone care to review? The good news is that I've checked in the changes to add nsUnicharUtils.h to the build, and build the libraries on windows and unix, so this patch can land as-is once I get some eyes on it. can I get some reviewers? please? :)
once bug 104122 lands, I can check in attachment 53032 [details] [diff] [review]. Can I get some reviewers there? The only files affected are bufferRoutines.h and nsString2.cpp after that and attachment 53020 [details] [diff] [review] land, there should be no more consumers of nsCRT::strn?casecmp.
Comment on attachment 53032 [details] [diff] [review] remove string/obsolete's dependency on nsCRT::strn?casecmp sr=shaver, pending the results of alecf's regime of thorough testing (the results of which will be posted to this bug shortly-ish).
Attachment #53032 - Flags: superreview+
Comment on attachment 53020 [details] [diff] [review] updated nsCRT::strn?cmp changes r=sspitzer on the mailnews stuff. please make sure to test.
Attachment #53020 - Flags: review+
Comment on attachment 53020 [details] [diff] [review] updated nsCRT::strn?cmp changes >Index: intl/chardet/src/nsMetaCharsetObserver.cpp >=================================================================== >RCS file: /cvsroot/mozilla/intl/chardet/src/nsMetaCharsetObserver.cpp,v >retrieving revision 1.52 >diff -u -r1.52 nsMetaCharsetObserver.cpp >--- nsMetaCharsetObserver.cpp 2001/10/01 20:53:56 1.52 >+++ nsMetaCharsetObserver.cpp 2001/10/11 00:11:48 >@@ -225,25 +240,35 @@ > if(nsnull == httpEquivValue || nsnull == contentValue) > return NS_OK; > >- while(IS_SPACE_CHARS(*httpEquivValue)) >+ while(IS_SPACE_CHARS(*httpEquivValue)) { > httpEquivValue++; >- while(IS_SPACE_CHARS(*contentValue)) >+ httpEquivLen--; >+ } >+ while(IS_SPACE_CHARS(*contentValue)) { > contentValue++; >+ contentLen--; >+ } Do you need contentLen and httpEquivLen? Doesn't look like it. > if( >- ((0==nsCRT::strcasecmp(httpEquivValue,contenttype.get())) || >+ // first try unquoted strings >+ ((0==Compare(nsDependentString(httpEquivValue),contenttype, >+ nsCaseInsensitiveStringComparator())) || >+ // now try "quoted" or 'quoted' strings > (( (httpEquivValue[0]=='\'') || > (httpEquivValue[0]=='\"') ) && >- (0==nsCRT::strncasecmp(httpEquivValue+1, >- contenttype.get(), >- contenttype.Length())) >+ (0==Compare(nsDependentString(httpEquivValue+1, contenttype.Length())), >+ contenttype, >+ nsCaseInsensitiveStringComparator())) Indentation.
ack, you're right... I needed them in a previous iteration of this patch.. they have now been removed. as for indentation, dammit it looks right in emacs! :) stupid tabs. I'm fixing them up as best I can, but this file is full of tabs. does that count as r=jag then? :)
r=/me
Comment on attachment 53020 [details] [diff] [review] updated nsCRT::strn?cmp changes sr=sfraser
Attachment #53020 - Flags: superreview+
thanks so much! That patch has now been checked in... now I can focus on the last patch.. for some reason I'm asserting on startup with it.
The patch you just checked in caused bizarre bustage on Sun WS 5.0 (nebiros tinderbox): Error: Different types for "?:" (nsAString and void). It didn't seem to be a parsing problem as parenthisization didn't help. So I fixed it by moving the conditional deeper into the expression: - !(before == LT_IGNORE ? !Compare(nsDependentString(aInString), nsDependentString(rep, aRepLen), nsCaseInsensitiveStringComparator()) : - !Compare(nsDependentString(aInString+1), nsDependentString(rep, aRepLen), nsCaseInsensitiveStringComparator())) + !(before == !Compare(nsDependentString(aInString + (LT_IGNORE ? 0 : 1) ), nsDependentString(rep, aRepLen), nsCaseInsensitiveStringComparator()))
maybe we should just change this to an if() statement instead of a ?:... your patch, while I'm sure it works, seems confusing to me :)
Attachment #53032 - Attachment is obsolete: true
So after lots of testing, the only thing I found was that in Compare2To1, I had mixed up the 1st and 2nd parameters when iterating through the string. The only changes were: - swapping the kIsoLatin1toUCS2 conversion in Compare2To1 - renaming kIsoLatin1toUCS2 with a "_2" at the end so as not to conflict with the one in xpcom/ds (which is going away, and shouldn't be conflicting because it's static, but the windows linker is complaining) in any case, the patch is almost identical, and I've done lots of testing... jag, can you r= this? shaver, do you mind another sr=? once this patch lands, I can remove about 6 methods from nsCRT, which will remove the unicharutil dependency in xpcom. Then I can tackle the removal from string :)
Comment on attachment 53828 [details] [diff] [review] next iteration of string/obsolete sr=scc
Attachment #53828 - Flags: superreview+
Comment on attachment 53828 [details] [diff] [review] next iteration of string/obsolete r=jag
Attachment #53828 - Flags: review+
Comment on attachment 53882 [details] [diff] [review] minor tree spam to fix consumers of ToLowerCase/ToUpperCase r=jag
Attachment #53882 - Flags: review+
Comment on attachment 53883 [details] [diff] [review] Minor tree spam to wean people off nsCRT::(To|Is)(Upper|Lower) I wonder if we should provide an IsUpper / IsLower, which could be cheaper than ToUpper(ch) == ch.
I considered that, but it involves adding extra methods to the already large interface nsICaseConversion. I think the == test is reasonably fast, and adding support to nsCaseConversionImp2 is going to be somewhat ugly.
Comment on attachment 53883 [details] [diff] [review] Minor tree spam to wean people off nsCRT::(To|Is)(Upper|Lower) r=jag
Attachment #53883 - Flags: review+
Alec: Could patch 53020 be backed out? It seems it created a very visble regression (no *bold* and smileys in mail/news). See bug 105459.
no, we'll fix that bug the right way.
FWIW, there are two bugs that occur in a number of places in that patch: * use of nsDependentString as non-flat (see bug 104651) * forgetting to use nsDependentString as non-flat to simulate the N in strn[|case]cmp :-)
I'm lost. can you explain what I should have been doing there?
Probably the technically correct thing to do would be Substring(nsDependentString(foo),0,7) to get an nsAString& for the first 7 characters of foo.
ok, that's fine too. are you planning on catching all of these in a pass through the tree? (i.e. removing the Length parameter from the nsDependentString constructor)
I'm not sure. See bug 104651.
And, FWIW, I think using nsDependentString here will work fine, so perhaps you're better off just using it the way you did in some cases even though it's a bad patter that, if copied to uses other than compares, could lead to incorrect results.
well, I knew it didn't null-terminate (either by copying or by munging the string) but figured it would be ok with Compare() - I didn't want to make an extra copy of the string, since I'm just comparing.
Attachment #54135 - Attachment is obsolete: true
Comment on attachment 53882 [details] [diff] [review] minor tree spam to fix consumers of ToLowerCase/ToUpperCase sr=scc
Attachment #53882 - Flags: superreview+
Comment on attachment 53883 [details] [diff] [review] Minor tree spam to wean people off nsCRT::(To|Is)(Upper|Lower) looks good, except that I'm hating the old-style casts (at the very end of your patch). In your case, you could be using construction-style casts anyway. Other than that, sr=scc.
Attachment #53883 - Flags: superreview+
Comment on attachment 54141 [details] [diff] [review] take 2 - add ToUpperCase(PRUnichar) and friends, plus add shutdown observer sr=scc
Attachment #54141 - Flags: superreview+
Comment on attachment 54148 [details] [diff] [review] add libunicharutil_s.so to all necessary unix projects r=cls
Attachment #54148 - Flags: review+
ok the following patches are in: the string/obsolete the ToLowerCase/ToUpperCase for nsAStrings the adding of ToUpperCase(PRUnichar) and ToLowerCase(PRUnichar) adding libunicharutil_s to necessary unix projects This is what's left, in this order: 1) fixing up mac projects (hopefully today) 2) landing to (To|Is)(Upper|Lower)Case for PRUnichar (today, requires #1) 3) Yanking all nsICaseConversion out of xpcom (tomorrow, after builds on all platforms) So very close...!
ok, mac project files have finally landed..very soon we'll be able to flip the switch. I've done testing on windows and mac, unix is the only one left
Attached patch the final patch to the string library (obsolete) (deleted) — Splinter Review
ok, that patch finally removes all calls that would require calls into the unicode library, and turns on their equivalent calls in nsUnicharUtils.cpp. One thing you'll notice is that I've added a 2nd operator() for the base type, and merged the two comparators used in FindInReadable() and Compare().. you'll find a i18n implementation of this comparator in nsUnicharUtils.cpp All callers to old APIs (nsCRT) have been converted over to the new string library, and have nsUnicharUtils.h properly included. This means that once I land this patch, everyone will be using the new set of routines.. Can I get a r=jag, and sr=scc?
and that patch simply removes all routines from nsCRT:: that did any dorky conversion or calls into nsICaseConversion. The dependency is dead! Long live the dependency. can I get r=jag and sr=scc on that one too?
Comment on attachment 55324 [details] [diff] [review] the final patch to the string library >Index: string/public/nsAString.h >=================================================================== >RCS file: /cvsroot/mozilla/string/public/nsAString.h,v >retrieving revision 1.9 >diff -u -r1.9 nsAString.h >--- nsAString.h 2001/10/13 15:01:16 1.9 >+++ nsAString.h 2001/10/26 22:17:06 >@@ -658,15 +659,9 @@ > { > public: > virtual int operator()( const char_type*, const char_type*, PRUint32 aLength ) const; >+ virtual int operator() ( char_type, char_type ) const; Indentation. >Index: string/src/nsAString.cpp >=================================================================== >RCS file: /cvsroot/mozilla/string/src/nsAString.cpp,v >retrieving revision 1.10 >diff -u -r1.10 nsAString.cpp >--- nsAString.cpp 2001/10/13 15:01:20 1.10 >+++ nsAString.cpp 2001/10/26 22:17:06 >@@ -34,10 +34,12 @@ > } > > int >-nsCaseInsensitiveStringComparator::operator()( const char_type* lhs, const char_type* rhs, PRUint32 aLength ) const >- { >- return nsCRT::strncasecmp(lhs, rhs, aLength); >- } >+nsDefaultStringComparator::operator()( char_type lhs, char_type rhs) const >+ { >+ if (lhs == rhs) return 0; >+ if (lhs < rhs) return -1; >+ return 1; >+ } return lhs - rhs; >@@ -537,10 +539,31 @@ > return nsCharTraits<char_type>::compare(lhs, rhs, aLength); > } > >+PRBool >+nsDefaultCStringComparator::operator()( char_type lhs, char_type rhs ) const >+ { >+ if (lhs == rhs) return 0; >+ if (lhs < rhs) return -1; >+ return 1; >+ } >+ return lhs - rhs; > int > nsCaseInsensitiveCStringComparator::operator()( const char_type* lhs, const char_type* rhs, PRUint32 aLength ) const > { > return nsCRT::strncasecmp(lhs, rhs, aLength); >+ } >+ >+PRBool >+nsCaseInsensitiveCStringComparator::operator()( char lhs, char rhs ) const >+ { >+ if (lhs == rhs) return 0; >+ >+ lhs = nsCRT::ToUpper(lhs); >+ rhs = nsCRT::ToUpper(rhs); >+ >+ if (rhs == lhs) return 0; >+ if (rhs < lhs) return -1; >+ return 1; Wrong way around. And again you could use: return lhs - rhs; >Index: string/src/nsReadableUtils.cpp >=================================================================== >RCS file: /cvsroot/mozilla/string/src/nsReadableUtils.cpp,v >retrieving revision 1.25 >diff -u -r1.25 nsReadableUtils.cpp >--- nsReadableUtils.cpp 2001/10/02 11:15:45 1.25 >+++ nsReadableUtils.cpp 2001/10/26 22:17:07 >@@ -429,7 +411,8 @@ > while ( !found_it ) > { > // fast inner loop (that's what it's called, not what it is) looks for a potential match >- while ( aSearchStart != aSearchEnd && !compare(*aPatternStart, *aSearchStart) ) >+ while ( aSearchStart != aSearchEnd && >+ compare(*aPatternStart, *aSearchStart) ) |compare| returns 0 if the first argument and the second argument are equal, so |!compare| or rather |compare == 0| is what you want. >@@ -466,7 +449,7 @@ > > // else if we mismatched ... it's time to advance to the next search position > // and get back into the `fast' loop >- if ( !compare(*testPattern, *testSearch) ) >+ if ( compare(*testPattern, *testSearch) ) Same here.
I adjusted the lhs vs. rhs stuff, but you're actually wrong about the compare() - we're actually looking for non-matches in those cases - the old comparator used to be a PRBool, so I've logically reversed the use of compare(), that's why they're in the patch
Doh! You're right. I was being silly and thinking that we were using |compare| consistently with the |compare|s everyone's familiar with. Your patch makes things better.
Comment on attachment 55325 [details] [diff] [review] finally rip unicharutil out of xpcom! r=jag
Attachment #55325 - Flags: review+
I just realized that for the case insensitive compare you probably want to use ToLower, not ToUpper, since ToLower will work for both uppercase and titlecase.
ah, good call.. I noticed that we were being inconsistent, but I didn't know why.. so I just mimiced what the code does today. I like your argument about lower vs. upper vs. titlecase, so I'll go with lower case.
Comment on attachment 55324 [details] [diff] [review] the final patch to the string library some trouble with spacing; also, why not |return lhs-rhs;| and no tests? Clients are only guaranteed a result <, ==, or > |0|, not { -1, 0, 1 }. Fix the spacing (near the top of the patch) and either convince me I'm wrong about the results of compares, or else fix them, then you have my sr.
Comment on attachment 55325 [details] [diff] [review] finally rip unicharutil out of xpcom! removal is beautiful, sr=scc
Attachment #55325 - Flags: superreview+
Attached patch final string patch, take 2 (deleted) — Splinter Review
Comment on attachment 55624 [details] [diff] [review] final string patch, take 2 sr=scc
Attachment #55624 - Flags: superreview+
Comment on attachment 55624 [details] [diff] [review] final string patch, take 2 r=jag
Attachment #55624 - Flags: review+
Attachment #55324 - Attachment is obsolete: true
after much burning, "final string patch" has been checked in. one more patch to go ("finally rip unicharutil out of xpcom") and I'll be done with this one.
yay! finally fixed. praise the string gods.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: