Closed
Bug 132583
Opened 23 years ago
Closed 22 years ago
Anchors that contain non-UTF-8 characters may hang Mozilla
Categories
(Core :: XPCOM, defect, P4)
Tracking
()
VERIFIED
FIXED
mozilla1.0
People
(Reporter: emaijala+moz, Assigned: jag+mozilla)
References
Details
(Keywords: intl, Whiteboard: [adt2])
Attachments
(2 files, 2 obsolete files)
(deleted),
text/html
|
Details | |
(deleted),
patch
|
scc
:
review+
darin.moz
:
superreview+
chofmann
:
approval+
|
Details | Diff | Splinter Review |
Clicking anchor that contains scandinavian characters (anchor is "Origo
käytössä") on a page with Windows-1252 character set hangs Mozilla. This is
because the conversion from UTF8 to USC2 goes to an infinite loop if given wrong
input.
I'll attach a patch that checks the validity of the string before passing it to
the conversion.
I hope that the component is correct.
Reporter | ||
Comment 1•23 years ago
|
||
I added a small function to string/obsolete/nsString2.h. It is used in
docshell/base/nsDocShell.cpp to check if the string is valid UTF-8 before
passing it to NS_ConvertUTF8toUCS2().
Now I'm not sure if this is the best way to do it or if the functions are in
correct places. Input requested.
Reporter | ||
Comment 2•23 years ago
|
||
Don't try this in the middle of something important :)
Comment 3•23 years ago
|
||
isn't the real issue that NS_ConvertUTF8toUCS2 will loop on invalid input?
Summary: Clicking anchor that contains scandinavian characters may hang Mozilla → Clicking anchor that contains scandinavian characters may hang Mozilla
Reporter | ||
Comment 4•23 years ago
|
||
Well, in the conversion code there are lines such as this:
NS_ERROR("Not a UTF-8 string. This code should only be used for converting from
known UTF-8 strings.");
That's why I assumed it's better to avoid calling it..
Reporter | ||
Comment 5•23 years ago
|
||
Oh, I forgot to say that it would be a simple fix to make it not loop forever.
It will already fail an assertion in the loop. Anyway, I'm not sure if it's
proper, because then the result might be unexpected.
Sorry for the spam.
Reporter | ||
Comment 6•23 years ago
|
||
After investigating more I've found that there are more places where this can
happen (nsHTMLContentSink, nsXMLContentSink at least). Changing summary to
reflect this better.
Assignee: adamlock → jaggernaut
Component: Embedding: Docshell → String
QA Contact: adamlock → scc
Summary: Clicking anchor that contains scandinavian characters may hang Mozilla → Anchors that contain non-UTF-8 characters may hang Mozilla
Reporter | ||
Comment 7•23 years ago
|
||
Comment on attachment 75387 [details] [diff] [review]
Patch to avoid trying the conversion if the string isn't UTF-8
Does not fix all situations. Need to find better place for isValidUTF8() also.
Attachment #75387 -
Attachment is obsolete: true
Reporter | ||
Comment 8•23 years ago
|
||
If we don't get anything else into 1.0, I suggest we use this. This will
prevent the string iterator from staying in the infinite loop. Practically I've
just done what there was in comments (breaking out if !one_hop).
Comment 9•23 years ago
|
||
*** Bug 134714 has been marked as a duplicate of this bug. ***
Comment 10•23 years ago
|
||
adding a test case from bug 134714 and nominating for nsbeta1
const PRUnichar * str = NS_ConvertUTF8toUCS2("\xE0\xE0\xE0").get();
Keywords: nsbeta1
Comment 11•23 years ago
|
||
Nav triage team: nsbeta1+, adt2
Updated•22 years ago
|
Priority: -- → P4
Comment 13•22 years ago
|
||
This has to be fixed soon. Bug 139842, bug 142930 are caused by this.
Comment 14•22 years ago
|
||
cc to waterson, I think NS_ConvertUTF8toUCS2 should validate the input and
detect illegal UTF-8.
Comment 15•22 years ago
|
||
This is not limited to the corrupted data outside from Mozilla.
It can be caused by defrating PRUnichar to char then calling
NS_ConvertUTF8toUCS2 later.
Assignee | ||
Comment 16•22 years ago
|
||
This will set the string to an empty string if the input isn't UTF8.
Attachment #77060 -
Attachment is obsolete: true
Comment 17•22 years ago
|
||
Can the code put 0xFFFD for invalid bytes until it sees any valid bytes?
That way, the user can see something like "???abc" instead of a blank string.
Comment 18•22 years ago
|
||
> Can the code put 0xFFFD for invalid bytes until it sees any valid bytes?
> That way, the user can see something like "???abc" instead of a blank string.
>
see the patch of bug 128896
http://bugzilla.mozilla.org/attachment.cgi?id=76351&action=view
Assignee | ||
Comment 19•22 years ago
|
||
Yes, I see your solution there, but there is (or was) code out there that went
something like:
NS_ConvertUTF8toUCS2 temp(someUnknownEncodingStr);
if (temp.IsEmpty()) {
// Looks like we're not UTF8, try something else ...
...
}
Which the old version of this code supported and I broke when I refactored parts
of it.
What I really want (and that's why I've put NS_ERRORs in place) is that people
only call this code when they know that it's a UTF8 string, and to use the intl
converter code when they don't know what they're dealing with.
Comment 20•22 years ago
|
||
Comment on attachment 82858 [details] [diff] [review]
Make NS_ConvertUTF8toUCS2 deal better with faulty input
>Index: nsString2.cpp
>- SetCapacity(count);
>+ SetLength(count);
> // |SetCapacity| normally doesn't guarantee the use we are putting it to here (see its interface comment in nsAString.h),
> // we can only use it since our local implementation, |nsString::SetCapacity|, is known to do what we want
seems like this comment no longer applies.
otherwise, the patch looks good to me. r/sr=darin
Attachment #82858 -
Flags: superreview+
Comment 21•22 years ago
|
||
Comment on attachment 82858 [details] [diff] [review]
Make NS_ConvertUTF8toUCS2 deal better with faulty input
r=scc. It looks reasonable to me.
Attachment #82858 -
Flags: review+
Assignee | ||
Comment 22•22 years ago
|
||
Checked in on trunk. Will check in on branch when I get the appropriate approvals.
Updated•22 years ago
|
Comment 23•22 years ago
|
||
Let's get this verified on the trunk first.
Comment 25•22 years ago
|
||
adding adt1.0.0+. Please get drivers approval and then checkin to the 1.0 Branch.
Comment 26•22 years ago
|
||
Comment on attachment 82858 [details] [diff] [review]
Make NS_ConvertUTF8toUCS2 deal better with faulty input
a=chofmann,dbaron,valeski for the 1.0 branch
Attachment #82858 -
Flags: approval+
Comment 28•22 years ago
|
||
*** Bug 145479 has been marked as a duplicate of this bug. ***
Updated•22 years ago
|
Updated•4 years ago
|
Component: String → XPCOM
You need to log in
before you can comment on or make changes to this bug.
Description
•