Closed Bug 73292 Opened 24 years ago Closed 23 years ago

Add const nsACString& constructor to NS_Convert(ASCII|UTF8)toUCS2

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: jag+mozbugs, Assigned: jag+mozilla)

References

Details

Attachments

(1 file, 4 obsolete files)

For parity with NS_ConvertUCS2toUTF8, which has a constructor for nsAString&, and because this just looks silly: NS_ConvertASCIItoUCS2 uStr(aStr.get());
Adding dependency (blocks bug 73009), removing silly cc.
Blocks: 73009
Targeting 1.1 since this isn't crucial, but it is a step in the right direction for consistency and not too hard so I'd like to get it in before then if possible. If I make a tracking bug for handling encoding/conversion issues, I may move this to block that tracker instead.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.1
Attached patch [patch] First crude attempt. (obsolete) (deleted) — Splinter Review
re-targeting milestones, starting from a clean slate
Target Milestone: mozilla1.1 → ---
I'm assuming a good bit of this is roughly copied, so I'm not reviewing too closely... + NS_ERROR("not a UTF-8 string"); I'd prefer NS_ASSERTION to NS_ERROR. + nsReadingIterator<char> p, end; I prefer |nsACString::const_iterator p, end;|. + explicit + NS_ConvertUTF8toUCS2( const nsACString& aCString ) + { + Init( aCString ); + } It would be nice to also have: explicit NS_ConvertUTF8toUCS2( const nsAFlatCString& aCString ) { Init( aCString.get() ); } to improve performance for things like nsCString and nsDependentString. Also, you could have the same thing (also inline) for NS_ConvertASCIItoUCS2. With that, r=dbaron.
Taking this. dbaron, could you review the new patch?
Assignee: scc → jaggernaut
Status: ASSIGNED → NEW
NS_ConvertASCIItoUCS2(const nsACString&) * should use nsACString::const_iterator rather than nsReadingIterator<char> * should call aCString.Length() and expand the buffer at the beginning to avoid multiple allocations NS_ConvertUTF8toUCS2(const nsACString&) * It would be nice to unify this with Init -- perhaps Init could take an nsACString& and you could use nsDependentString elsewhere? Making the const char* constructor avoid 3 passes would be a trick, though... * Both loops over the string would probably be better written as double for-loops - outer over fragments and inner over characters. * You should probably allow embedded nulls, right? Or not?
> NS_ConvertASCIItoUCS2(const nsACString&) > > * should use nsACString::const_iterator rather than > nsReadingIterator<char> done > * should call aCString.Length() and expand the buffer at the beginning > to avoid multiple allocations done > NS_ConvertUTF8toUCS2(const nsACString&) > > * It would be nice to unify this with Init -- perhaps Init could take > an nsACString& and you could use nsDependentString elsewhere? > Making the const char* constructor avoid 3 passes would be a trick, > though... > * Both loops over the string would probably be better written as > double for-loops - outer over fragments and inner over characters. Let me think about these two. > * You should probably allow embedded nulls, right? Or not? Erh... scc?
Status: NEW → ASSIGNED
Summary: Add const nsAReadableCString& constructor to NS_Convert(ASCII|UTF8)toUCS2 → Add const nsACString& constructor to NS_Convert(ASCII|UTF8)toUCS2
Attached patch third attempt (obsolete) (deleted) — Splinter Review
Attachment #31191 - Attachment is obsolete: true
Attachment #41917 - Attachment is obsolete: true
I'm all ears on how to avoid the initial pass (in nsDependentCString) for the |const char*| constructor of NS_ConvertUTF8toUCS2.
>>nsString2.cpp: >> >>the iterators (ns{Reading,Writing}Iterator) ought to have a fragment_type so >>that you don't have to use nsReadableFragment<char> explicitly. > >Next pass? New bug? > >> + nsReadableFragment<char> frag(start.fragment()); >> >>Maybe >> >> const nsReadableFragment<char>& frag = start.fragment(); >> >>would be better? > >I could use that. > >>How about an NS_ASSERTION(count == mLength, "calculator calculated >>incorrect length"); at the end of nsConvertUTF8toUCS2::Init? > >Done. Btw, I think this would only occur when we encounter a UTF8 string that isn't, in which case we already error. If we're indeed fed utf8 straight from the netwerk, we might want to deal with this more nicely. >>I think we need to look at the error-handling of NS_ConvertUTF8toUCS2 >>carefully since we can expect it to be called on input from the network >>(I would think, anyway, although perhaps not). > >I should do that. > >>+ else >>+ { >>+ ucs4 -= 0x00010000; >>+ *mBuffer++ = 0xD800 | (0x000003FF & (ucs4 >> 10)); >>+ *mBuffer++ = 0xDC00 | (0x000003FF & ucs4); >>+ } >> >>Is this UCS2 or UTF-16 or what? > >I'll come back to you on this one :-) Hmmm... This is UTF-16 [0], and it will mess us up (even in the old code) in that the UTF8 calculator will only have assigned one character for this, and the above writes out two. Eek! [0] ftp://ftp.isi.edu/in-notes/rfc2781.txt
That code sure looks wrong to me. I've cc'd ftang and yokoyama, from whom I stole the conversion code. Guys, we don't really do UTF-16, right?
Attached patch above patch with the UTF16 "support" removed. (obsolete) (deleted) — Splinter Review
Why bother maintaining mLength in ConvertUTF8toUCS2? Or should it just be |#ifdef DEBUG|?
The extra pass with nsDependentString calculating the length is inefficient -- at the very least, don't bother with the ~PRUint32(0), just use the one-parameter constructor, since they mean the same thing and the latter is *much* clearer. I guess, for simplicity, don't worry about the extra pass for now, though -- it's also the best way to allow embedded nulls in all the other cases, I think. Same for the double length calculation. + if ( ucs4 != 0xfeff ) // ignore BOM + *mBuffer++ = ucs4; Don't you need: else --mLength; However, I think it would be better to scrap mLength and keep an mStart instead of mLength, initialized in the ctor to mBuffer, and have Length() be |{ return mBuffer - mStart; }|. One other optimization that would be nice is to, after the check against minUcs4, check |ucs4 < 0xD800| to short-circuit the rest of the checks for the vast majority of characters (with a comment saying that it's short-ciruiting what is below in case someone changes what is below). Should the |SetCapacity| have a |+1| or not? Other than that, r=dbaron. scc should super-review.
I'll rewrite Length() as you proposed. And you're right, nsStr's code takes care of adding space for the 0 terminator, so I don't need to add 1 myself when calling SetCapacity.
At least, if I'm reading this code correctly: http://lxr.mozilla.org/seamonkey/source/string/obsolete/nsStr.cpp#680
Comment on attachment 53457 [details] [diff] [review] Addressing comments. Also removed init and null terminate code from conversion constructors, nsAutoString's constructor does that already. r=dbaron
Attachment #53457 - Flags: review+
scc, sr= please?
Attachment #50925 - Attachment is obsolete: true
Attachment #48525 - Attachment is obsolete: true
sr=scc
Comment on attachment 53457 [details] [diff] [review] Addressing comments. Also removed init and null terminate code from conversion constructors, nsAutoString's constructor does that already. and sr=scc
Attachment #53457 - Flags: superreview+
Ran into a little problem in NS_ConvertUTF8toUCS2::Init: if |count| is |0|, SetCapacity will free the allocated buffer and make mUStr point at a shared \0 string, which we can't write our null terminator to. Added an |mCapacity| guard. Checked in, marking fixed.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Component: String → XPCOM
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: