Closed
Bug 73292
Opened 24 years ago
Closed 23 years ago
Add const nsACString& constructor to NS_Convert(ASCII|UTF8)toUCS2
Categories
(Core :: XPCOM, defect)
Core
XPCOM
Tracking
()
RESOLVED
FIXED
People
(Reporter: jag+mozbugs, Assigned: jag+mozilla)
References
Details
Attachments
(1 file, 4 obsolete files)
(deleted),
patch
|
dbaron
:
review+
jag+mozilla
:
superreview+
|
Details | Diff | Splinter Review |
For parity with NS_ConvertUCS2toUTF8, which has a constructor for nsAString&,
and because this just looks silly:
NS_ConvertASCIItoUCS2 uStr(aStr.get());
Reporter | ||
Comment 1•24 years ago
|
||
Adding dependency (blocks bug 73009), removing silly cc.
Blocks: 73009
Comment 2•24 years ago
|
||
Targeting 1.1 since this isn't crucial, but it is a step in the right direction
for consistency and not too hard so I'd like to get it in before then if
possible. If I make a tracking bug for handling encoding/conversion issues, I
may move this to block that tracker instead.
Status: NEW → ASSIGNED
Target Milestone: --- → mozilla1.1
Reporter | ||
Comment 3•24 years ago
|
||
Comment 4•24 years ago
|
||
re-targeting milestones, starting from a clean slate
Target Milestone: mozilla1.1 → ---
Comment 5•23 years ago
|
||
I'm assuming a good bit of this is roughly copied, so I'm not reviewing too
closely...
+ NS_ERROR("not a UTF-8 string");
I'd prefer NS_ASSERTION to NS_ERROR.
+ nsReadingIterator<char> p, end;
I prefer |nsACString::const_iterator p, end;|.
+ explicit
+ NS_ConvertUTF8toUCS2( const nsACString& aCString )
+ {
+ Init( aCString );
+ }
It would be nice to also have:
explicit
NS_ConvertUTF8toUCS2( const nsAFlatCString& aCString )
{
Init( aCString.get() );
}
to improve performance for things like nsCString and nsDependentString.
Also, you could have the same thing (also inline) for NS_ConvertASCIItoUCS2.
With that, r=dbaron.
Assignee | ||
Comment 6•23 years ago
|
||
Assignee | ||
Comment 7•23 years ago
|
||
Taking this.
dbaron, could you review the new patch?
Assignee: scc → jaggernaut
Status: ASSIGNED → NEW
Comment 8•23 years ago
|
||
NS_ConvertASCIItoUCS2(const nsACString&)
* should use nsACString::const_iterator rather than
nsReadingIterator<char>
* should call aCString.Length() and expand the buffer at the beginning
to avoid multiple allocations
NS_ConvertUTF8toUCS2(const nsACString&)
* It would be nice to unify this with Init -- perhaps Init could take
an nsACString& and you could use nsDependentString elsewhere?
Making the const char* constructor avoid 3 passes would be a trick,
though...
* Both loops over the string would probably be better written as
double for-loops - outer over fragments and inner over characters.
* You should probably allow embedded nulls, right? Or not?
Assignee | ||
Comment 9•23 years ago
|
||
> NS_ConvertASCIItoUCS2(const nsACString&)
>
> * should use nsACString::const_iterator rather than
> nsReadingIterator<char>
done
> * should call aCString.Length() and expand the buffer at the beginning
> to avoid multiple allocations
done
> NS_ConvertUTF8toUCS2(const nsACString&)
>
> * It would be nice to unify this with Init -- perhaps Init could take
> an nsACString& and you could use nsDependentString elsewhere?
> Making the const char* constructor avoid 3 passes would be a trick,
> though...
> * Both loops over the string would probably be better written as
> double for-loops - outer over fragments and inner over characters.
Let me think about these two.
> * You should probably allow embedded nulls, right? Or not?
Erh... scc?
Assignee | ||
Updated•23 years ago
|
Status: NEW → ASSIGNED
Summary: Add const nsAReadableCString& constructor to NS_Convert(ASCII|UTF8)toUCS2 → Add const nsACString& constructor to NS_Convert(ASCII|UTF8)toUCS2
Assignee | ||
Comment 10•23 years ago
|
||
Assignee | ||
Updated•23 years ago
|
Attachment #31191 -
Attachment is obsolete: true
Assignee | ||
Updated•23 years ago
|
Attachment #41917 -
Attachment is obsolete: true
Assignee | ||
Comment 11•23 years ago
|
||
I'm all ears on how to avoid the initial pass (in nsDependentCString) for the
|const char*| constructor of NS_ConvertUTF8toUCS2.
Assignee | ||
Comment 12•23 years ago
|
||
>>nsString2.cpp:
>>
>>the iterators (ns{Reading,Writing}Iterator) ought to have a fragment_type so
>>that you don't have to use nsReadableFragment<char> explicitly.
>
>Next pass? New bug?
>
>> + nsReadableFragment<char> frag(start.fragment());
>>
>>Maybe
>>
>> const nsReadableFragment<char>& frag = start.fragment();
>>
>>would be better?
>
>I could use that.
>
>>How about an NS_ASSERTION(count == mLength, "calculator calculated
>>incorrect length"); at the end of nsConvertUTF8toUCS2::Init?
>
>Done.
Btw, I think this would only occur when we encounter a UTF8 string that isn't,
in which case we already error. If we're indeed fed utf8 straight from the
netwerk, we might want to deal with this more nicely.
>>I think we need to look at the error-handling of NS_ConvertUTF8toUCS2
>>carefully since we can expect it to be called on input from the network
>>(I would think, anyway, although perhaps not).
>
>I should do that.
>
>>+ else
>>+ {
>>+ ucs4 -= 0x00010000;
>>+ *mBuffer++ = 0xD800 | (0x000003FF & (ucs4 >> 10));
>>+ *mBuffer++ = 0xDC00 | (0x000003FF & ucs4);
>>+ }
>>
>>Is this UCS2 or UTF-16 or what?
>
>I'll come back to you on this one :-)
Hmmm... This is UTF-16 [0], and it will mess us up (even in the old code) in
that the UTF8 calculator will only have assigned one character for this, and the
above writes out two. Eek!
[0] ftp://ftp.isi.edu/in-notes/rfc2781.txt
Comment 13•23 years ago
|
||
That code sure looks wrong to me. I've cc'd ftang and yokoyama, from whom I
stole the conversion code. Guys, we don't really do UTF-16, right?
Assignee | ||
Comment 14•23 years ago
|
||
Comment 15•23 years ago
|
||
Why bother maintaining mLength in ConvertUTF8toUCS2? Or should it just be
|#ifdef DEBUG|?
Comment 16•23 years ago
|
||
The extra pass with nsDependentString calculating the length is
inefficient -- at the very least, don't bother with the ~PRUint32(0),
just use the one-parameter constructor, since they mean the same thing
and the latter is *much* clearer. I guess, for simplicity, don't worry
about the extra pass for now, though -- it's also the best way to allow
embedded nulls in all the other cases, I think. Same for the double
length calculation.
+ if ( ucs4 != 0xfeff ) // ignore BOM
+ *mBuffer++ = ucs4;
Don't you need:
else
--mLength;
However, I think it would be better to scrap mLength and keep an
mStart instead of mLength, initialized in the ctor to mBuffer, and
have Length() be |{ return mBuffer - mStart; }|.
One other optimization that would be nice is to, after the check against
minUcs4, check |ucs4 < 0xD800| to short-circuit the rest of the checks
for the vast majority of characters (with a comment saying that it's
short-ciruiting what is below in case someone changes what is below).
Should the |SetCapacity| have a |+1| or not?
Other than that, r=dbaron. scc should super-review.
Assignee | ||
Comment 17•23 years ago
|
||
I'll rewrite Length() as you proposed.
And you're right, nsStr's code takes care of adding space for the 0 terminator,
so I don't need to add 1 myself when calling SetCapacity.
Assignee | ||
Comment 18•23 years ago
|
||
At least, if I'm reading this code correctly:
http://lxr.mozilla.org/seamonkey/source/string/obsolete/nsStr.cpp#680
Assignee | ||
Comment 19•23 years ago
|
||
Comment 20•23 years ago
|
||
Comment on attachment 53457 [details] [diff] [review]
Addressing comments. Also removed init and null terminate code from conversion constructors, nsAutoString's constructor does that already.
r=dbaron
Attachment #53457 -
Flags: review+
Assignee | ||
Comment 21•23 years ago
|
||
scc, sr= please?
Assignee | ||
Updated•23 years ago
|
Attachment #50925 -
Attachment is obsolete: true
Assignee | ||
Updated•23 years ago
|
Attachment #48525 -
Attachment is obsolete: true
Comment 22•23 years ago
|
||
sr=scc
Assignee | ||
Comment 23•23 years ago
|
||
Comment on attachment 53457 [details] [diff] [review]
Addressing comments. Also removed init and null terminate code from conversion constructors, nsAutoString's constructor does that already.
and sr=scc
Attachment #53457 -
Flags: superreview+
Assignee | ||
Comment 24•23 years ago
|
||
Ran into a little problem in NS_ConvertUTF8toUCS2::Init:
if |count| is |0|, SetCapacity will free the allocated buffer and make mUStr
point at a shared \0 string, which we can't write our null terminator to. Added
an |mCapacity| guard.
Checked in, marking fixed.
Status: ASSIGNED → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Updated•4 years ago
|
Component: String → XPCOM
You need to log in
before you can comment on or make changes to this bug.
Description
•