Closed Bug 71247 Opened 24 years ago Closed 23 years ago

change strings in LDAP XPCOM SDK to unicode and/or UTF8 for i18n

Categories

(Directory :: LDAP XPCOM SDK, defect, P3)

Tracking

(Not tracked)

VERIFIED FIXED
mozilla0.9.2

People

(Reporter: dmosedale, Assigned: leif)

References

Details

(Keywords: intl, Whiteboard: [PDT+] Have a fix, r=, sr=, a= critical for 0.9.2)

Attachments

(6 files)

We need to switch to XPIDL wstrings for all the interfaces in LDAP XPCOM SDK so that unicode data doesn't quietly get thrown out when it passes that boundary.
Blocks: 17880
Blocks: 75990
Priority: -- → P2
Status: NEW → ASSIGNED
Apparently we are switching to something called "astring" now (abstract strings).
Keywords: nsbeta1
Priority: P2 → P1
Target Milestone: --- → mozilla0.9.1
Severity: normal → major
Summary: change strings in LDAP XPCOM SDK to wstrings for i18n → change strings in LDAP XPCOM SDK to astrings for i18n
Blocks: 79869
No longer blocks: 79869
Due to this bug, Japanese base DN is displayed garbled on directory edit window. Please see bug 79869 for details.
Two "notes" to this patch: 1. It does use wstring and not AString. There are problems with using the latter in an array context in IDL. 2. Because nsIURI uses "string", nsLDAPURL.cpp (which implements nsIURI) can't use wstring either. This is a big problem, and we haven't decided on a solution. One alternative is to make a new interface, nsIUnicodeURI or something like that, which would be subclassed from nsIURI. #2 means that DNs are still not wide string enabled, and I suggest we open a new bug on that issue, since it requires some other hackage. Dmose: Review this patch please, and if you can, test on Mac/PC.
Keywords: intl
Setting target milestone to 0.9.2 (check it in anytime, even before, when the tree is open for). Per PDT triage.
Target Milestone: mozilla0.9.1 → mozilla0.9.2
Reviewer comments: Can you file a new bug on nsILDAPURL? My reading of RFC 2251 makes me think that attribute names _can_ be UTF8, not just ASCII. So I think all the uses of NS_ConvertASCIItoUCS2 should actually be NS_ConvertUTF8toUCS2. Fix that, and you've got r=dmose@netscape.com. The patch builds fine on Mac, but I don't have a build that actually runs, so I can't test that part just yet.
sdagley just showed me the wonders of the "dx" command to macsbug, so I now do have a usable mac build. LDAP autocomplete is working nicely.
Hmmm, that's not how I read the RFC, it says: A specification may also assign one or more textual names for an attribute type. These names MUST begin with a letter, and only contain ASCII letters, digit characters and hyphens. They are case insensitive. (These ASCII characters are identical to ISO 10646 characters whose UTF-8 encoding is a single byte between 0x00 and 0x7F.) I read this as meaning you can have *SOME* 7-bit ASCII characters, which happens to be a subset of UTF-8. It does exclude "_" for instance, which clearly is allowed in UTF8 (but is ASCII). No?
I was interpreting the AttributeType ::= LDAPString to mean that the attribute name itself could be an LDAPString, not the attribute value. But I think your reading is right. Never mind. :-/
Scott, can you SR this bug please?
Keywords: review
Whiteboard: SR requested from mscott 5/16
Asking "scc" for SR, mscott is swamped. :)
I'll get to this but it won't happen until tonight. if someone wants to do it b4 me then cool.
changing interface methods from string to wstring looks very safe. Looks like you are using NS_LITERAL_STRING macros were appropriate in here too. good. these changes looks safe and correct to me. sr=mscott I assume you have some test cases to demonstrate preservation of unicode data now?
Checked in.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
reassign to Olga as QA contact
QA Contact: yulian → olgac
reassign Xianglan Shirley Ji as QA contact
QA Contact: olgac → ji
I realize I'm reviewing your code after it's already been checked in :-) Sorry. I hope you find this useful anyway. + *_retval = NS_ConvertUTF8toUCS2(rv).ToNewUnicode(); You can go directly to a newly allocated hunk of UCS2 from ASCII with the global |ToNewUnicode| *_retval = ToNewUnicode(rv); If |rv| didn't happen to be a string yet, you would say *_retval = ToNewUnicode( nsDependentCString(rv) ); I think you could use this better pattern in a couple of places. Other than that, though, everything looks fine.
With 05/22 trunk build, the Japanese base DN is still garbled. dmose, should the base DN display problem be fixed in bug 79869?
scc: only the attributes themselves are actually guaranteed to be ascii. values, error messages, etc may contain non-ASCII UTF8 chars ji: no; that depends on nsLDAPURL being fixed to deal with UTF8; since a separate bug didn't get filed for that, I'm reopening this one.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
Questions: * rv = mCurrentOperation->SearchExt(NS_ConvertASCIItoUCS2(baseDn).get() Is baseDn ASCII only? * Is there a bug for UTF-8 nsLDAPURL?
nhotta: My idea behind reopening (and retitling this bug slightly) is so that we can use it to also track the UTF8 work. As you surmised, the base DN can indeed contain UTF8; it's one of the fixes need to close this but out.
Summary: change strings in LDAP XPCOM SDK to astrings for i18n → change strings in LDAP XPCOM SDK to unicode and/or UTF8 for i18n
Because the scope of the UTF8 changes are bit unclear, we're dropping priority on this slightly, but we hope to still see it land for 0.9.2. Current theory is to write a little C++ code that calls the charset conversion code and exposes UCS2<->UTF8 conversion goop to JS via the JSAPI.
Priority: P1 → P3
PDT+ per 6/12 mtg.
Whiteboard: SR requested from mscott 5/16 → [PDT+] SR requested from mscott 5/16
Since getting something into JSAPI (or something similar) seems unlikely at this point, I've made a couple of additions to the nsILDAPService interface: |UTF8toUCS2()| and |UCS2toUTF8()| which are both scritable, e.g. from JS you can do stuff like |url.spec = ldapService.UCS2toUTF8(prefsURL);| (with appropriate code to get the LDAP service etc). This is less than ideal, and I've marked the code appropriately. But until we get a real fix for this, I think this is what we have to do. dmose, review please? -- Leif
A couple of nits: * doxygen @return statements don't have the type or variable name; just the comment * from the UCS2toUTF8 commentary: s/nsLDAPURL/nsILDAPURL/ * if you could put the |NS_IMETHODIMP|s on their own lines so that the "find functions using 'grep ^functionName *' still works Fix those, and you've got r=dmose. No need to post another patch.
Attached patch Second patch, reviewers changes (deleted) — Splinter Review
Created a 2nd patch, with dmose's changes, and also updated patch for changes checked in to ns[I]LDAPService. mscott, can you SR= please? The last patch, from 6/17/01 -- Leif
Whiteboard: [PDT+] SR requested from mscott 5/16 → [PDT+] SR requested from mscott 6/18
Your changes look good to me. Although I didn't see any callers of your new conversion APIs on the LDAP service? sr=mscott
Blocks: 79869
Blocks: 79252
mscott: Aye, this would only be called from Javascript. We have some LDAP related bugs depending on UTF8 encoding strings to be stored in the C strings as defined by |nsIURI|. Thanks! -- leif
Blocks: 83989
jumping in, since I'm reviewing #79869 shouldn't the dn attribute on nsILDAPURL be a wstring now?
sspitzer: Because of the choice of nsIURI to stick to UTF8-encoded |string|s, I think it makes sense to have nsILDAPURL (which inherits from it) do the same. Of course, now the XPCOM SDK is a nasty mishmash of |wstring| (UCS2), AString, and UTF8-encoded |string|. So we're gonna have to go back and clean this up. I wanna wait and see how 84196 shakes out, as well as learn whether nsACString is going to be exposed to XPIDL before deciding exactly how best to do this cleanup.
dmose/sspitzer: I already file a bug on the "mess" in the LDAP XPCOM SDK re: strings, it's bug 86000.
a= asa@mozilla.org for checkin to the trunk. (on behalf of drivers)
Checked in.
Status: REOPENED → RESOLVED
Closed: 24 years ago23 years ago
Resolution: --- → FIXED
With 06/20 build, the multi-byte base DN display problem is fixed, but the the directory with this kind of search root is not accessible. For example, if I specify a directory with Japanese base DN, autocomplete is not working. With the same directory, it's working perfectly with 4.75. Steps to reproduce: 1. Setup a directory with hostname to polyglot.mcom.com, port to 8000, base DN to o=ネットスケープ (view this page in Japanese (Shift-JIS) and copy this string to your directory create window). 2. Go to the preference window for addressing and set autocomplete to use this directory. 3. Bring up a new mail compose window, in To: field, type Mike Smith no match comes up. You should be able to see Mike Smith <ji2@seamonkey.mcom.com> auto-filled into To: field.
Should I open a new bug for this?
Reopened the bug.
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
ji, seems like a new bug was the right answer. The problem you're describing isn't similar to the summary for this bug.
Dealing **properly** with UTF-8 Base DN in LDAP protcol should be part of LDAP XPCOM SDK. So, it was appropriate to re-open this bug. In asnwer to questions by nhotta, dmose says: -- Additional Comments From Dan Mosedale 2001-05-29 16:20 -- ... My idea behind reopening (and retitling this bug slightly) is so that we can use it to also track the UTF8 work. As you surmised, the base DN can indeed contain UTF8; it's one of the fixes need to close this bug out. --- End of the quoted comment --- The remaining problem is that of Base DN and UTF-8 handling and according to dmose this is the bug to deal with such isssues.
please update the status whiteboard with your eta.
Whiteboard: [PDT+] SR requested from mscott 6/18 → [PDT+] need eta
Can I get access to the polyglot machine, and the access log file from the LDAP server? It would help a lot when debugging this. Thanks, -- leif
With the help of Katsuhiko Momoi, I've narrowed this down quite a lot. If I manually edit the prefs.js file, and put in the appropriate UTF8 "byte" string as the Base-DN, everything works fine. When I looked at the existing base in the prefs.js, it looked wrong. So, the code work for reading the prefs, and performing the actual search (see attached log from the LDAP server). Somehow the actual prefs.js setting is getting the wrong base-DN. Before I edited prefs.js, I'd see this in the access log: [21/Jun/2001:16:23:15 -0700] conn=117 fd=33 slot=33 connection from 208.12.32.152 [21/Jun/2001:16:23:15 -0700] conn=117 op=0 BIND dn="" method=128 version=2 [21/Jun/2001:16:23:15 -0700] conn=117 op=0 RESULT err=0 tag=97 nentries=0 [21/Jun/2001:16:23:17 -0700] conn=117 op=1 SRCH base="o=\c3\a7\c2\b9\c2\9d\c3\a9\c2\98\c2\aa\c3\a3\c2\83\c2\a3\c3\a7\c2\b9\c2\9d\c3\a5\c2\8c\c2\bb\c3\a3\c2\81\c2\9b\c3\a7\c2\b9\c2\a7\c3\af\c2\bd\c2\b1\c3\a7\c2\b9\c2\9d\c3\af\c2\bd\c2\bc\c3\a7\c2\b9\c2\9d" scope=2 filter="(|(cn=smith*)(mail=smith*)(sn=smith*))" [21/Jun/2001:16:23:17 -0700] conn=117 op=1 RESULT err=32 tag=101 nentries=0 [21/Jun/2001:16:31:57 -0700] conn=117 op=-1 fd=33 closed error 32 (Broken pipe) After I edited prefs.js, and put in the string as provide to me, I get: [21/Jun/2001:16:33:28 -0700] conn=118 fd=22 slot=22 connection from 208.12.32.152 [21/Jun/2001:16:33:28 -0700] conn=118 op=0 BIND dn="" method=128 version=2 [21/Jun/2001:16:33:28 -0700] conn=118 op=0 RESULT err=0 tag=97 nentries=0 [21/Jun/2001:16:33:28 -0700] conn=118 op=1 SRCH base="o=\e3\83\8d\e3\83\83\e3\83\88\e3\82\b9\e3\82\b1\e3\83\bc\e3\83\97" scope=2 filter="(|(cn=smith*)(mail=smith*)(sn=smith*))" [21/Jun/2001:16:33:28 -0700] conn=118 op=1 RESULT err=0 tag=101 nentries=2 And this looks right to me. -- leif
It looks to me what is saved in the prefs.js is not UTF-8 data. I actually cannot read my BaseDN name as Japanese when I use UTF-8 capable text editor to open my prefs.js.
base="o=\e3\83\8d\e3\83\83\e3\83\88\e3\82\b9\e3\82\b1\e3\83\bc\e3\83\97" As leif says, the above is indeed the correct representation in UTF-8 of the BaseDN name we are using.
Since the preference is saved using SetUnicharPref the data in the prefs.js will not be utf8. I think the problem is we are using CopyCharPref instead of CopyUnicharPref to read the .uri preference. I tried changing the call on windows and tested it and seems to be working fine. I will post a patch right away. Leif can you try in Linux.
Attached patch patch (deleted) — Splinter Review
To confirm, are you saying that the BaseDN data in prefs.js will not be in UTF-8 even after the patch?
Sorry if I sound a bit paranoid, but by agreement, all non-ASCII data saved in prefs.js must be in UTF-8. Pls ignore this comment if I am off target.
Attached patch patch v3 (deleted) — Splinter Review
mommoi, we are saving basedn in utf-8 but I think it is converted to utf-8 twice. The new patch should save basedn in utf-8.
I looked at my prefs.js and it shows basedn as o=\e3\83\8d\e3\83\83\e3\83\88\e3\82\b9\e3\82\b1\e3\83\bc\e3\83\97 tested the last patch on windows and linux. building on mac right now
Whiteboard: [PDT+] need eta → [PDT+] Have a fix, need reviews
r=dmose
Whiteboard: [PDT+] Have a fix, need reviews → [PDT+] Have a fix, r=, need sr=
works on mac tooo.
Whiteboard: [PDT+] Have a fix, r=, need sr= → [PDT+] Have a fix, r=, sr=, need a=
a=blizzard on behalf of drivers for 0.9.2
Whiteboard: [PDT+] Have a fix, r=, sr=, need a= → [PDT+] Have a fix, r=, sr=, a= critical for 0.9.2
Fix checked in
Status: REOPENED → RESOLVED
Closed: 23 years ago23 years ago
Resolution: --- → FIXED
Verified with 06/22 linux and Mac latest trunk build. It's fixed. Will verify it on windows when there is an installable build available.
Verified with win32 06/25 branch build. It's working on windows build too.
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: