Closed
Bug 81024
Opened 24 years ago
Closed 22 years ago
Mouseover for http://www.צה.com shows http://www.%E4%F6.com in statusbar
Categories
(Core :: Internationalization, defect)
VERIFIED
FIXED
mozilla1.2beta
People
(Reporter: Junk_HbJ, Assigned: nhottanscp)
Details
(Keywords: intl, regression)
Attachments
(4 files)
(deleted), image/jpeg
(deleted), image/jpeg
(deleted), patch
(deleted), patch (ftang: review+, darin.moz: superreview+)
From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9) Gecko/20010505
BuildID: 2001051403
If mouse over http://www.צה.com, statusbar shows http://www.%E4%F6.com.
Reproducible: Always
Steps to Reproduce:
1. Mouse over http://www.צה.com.
Actual Results: Message "http://www.%E4%F6.com" in statusbar
Expected Results: Message "http://www.צה.com" in statusbar
This is a spinoff of bug 80942, which contains further comments about this matter.
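For illustration, a minimal Python sketch of the gap between the actual and the expected status bar text. It assumes the escaped bytes %E4 and %F6 are ISO-8859-1 text; the report does not state the page's charset, so that choice is an assumption.

```python
from urllib.parse import unquote

escaped = "http://www.%E4%F6.com"  # text the status bar showed

# Assumption: the two escaped bytes (0xE4, 0xF6) are ISO-8859-1 text.
readable = unquote(escaped, encoding="iso-8859-1")
print(readable)
```

With a different charset the same two bytes would decode to different characters, which is why the decoder has to know which charset the link came from.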
Assignee | ||
Comment 1•24 years ago
I can reproduce this with NS6.01.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Updated•24 years ago
QA Contact: andreasb → jonrubin
*** This bug has been marked as a duplicate of 81022 ***
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → DUPLICATE
Comment 5•24 years ago
This bug is not a duplicate.
bug 81022 is about trying to go to that URL,
while this bug is about moving a mouse pointer over a link.
The information the status bar shows is different.
For this bug it's: http://www.%E4%F6.com
For bug 81022 it's: www.öä.com
And finally bug 81019 is a direct result of this bug.
While with bug 81022 a hostname is shown in the dialog.
These bugs were spun off from bug 80942, which has some more discussion on this.
reopening.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
got it. apologies for the wrong dup marking. I should have read it carefully.
->dougt
Assignee: gagan → dougt
Status: REOPENED → NEW
Target Milestone: --- → mozilla1.0
Comment 7•23 years ago
what is milestone "mozilla1.0" anyway? Moving to future.
Target Milestone: mozilla1.0 → Future
Comment 8•23 years ago
Hello?
Has this bug been fixed? I couldn't reproduce it - screenshot is attached. In
it, the mouse cursor is above the hyperlink.
Comment 9•23 years ago
Comment 10•23 years ago
QA, can you please verify?
Status: NEW → RESOLVED
Closed: 24 years ago → 23 years ago
Resolution: --- → FIXED
Comment 11•23 years ago
mass change, switching qa contact from jonrubin to ruixu.
QA Contact: jonrubin → ruixu
Comment 12•23 years ago
Verified on EN Win98SE, EN WinME, JP Win98SE and KO Win2K with build 2001083003,
this bug has been fixed.
Status: RESOLVED → VERIFIED
Reporter | ||
Comment 13•23 years ago
The little bugger is back in Mozilla 0.9.7.
http://www.צה.com now produces some garbled characters in the status bar.
Ironically, http://www.%E4%F6.com is shown as the correct URL. Weird...
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Comment 14•23 years ago
There is the same problem in trunk.
Updated•23 years ago
Keywords: regression
Comment 15•23 years ago
This has been fixed for a while. Please verify
Status: REOPENED → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
Comment 16•23 years ago
reopening
mouseover http://www.%E4%F6.com shows http://www.צה.com
but mouseover http://www.צה.com shows some gibberish.
And if you right click and Copy Link Address on it, this is what will be copied:
http://www.%D0%96%D0%94.com/
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
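One reading that is consistent with the copied link address in comment 16, sketched in Python: if the two host bytes were 0xF6 0xE4 and were decoded as KOI8-R, re-encoding the result as UTF-8 gives exactly %D0%96%D0%94. Both the byte order and the KOI8-R charset are assumptions here; the comment states neither.

```python
from urllib.parse import quote

# Assumption: the host label was stored as these two raw bytes.
raw_host = bytes([0xF6, 0xE4])

# Assumption: the bytes were interpreted as KOI8-R (two Cyrillic letters).
as_koi8r = raw_host.decode("koi8_r")

# Escaping the mis-decoded text as UTF-8 reproduces the copied link.
copied = quote(as_koi8r, encoding="utf-8")
print(copied)
```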
Comment 17•23 years ago
weird. what build are you using, alexey?
Comment 18•23 years ago
Win32 2002012903
bug 42898 describes similar behaviour
Assignee | ||
Comment 19•23 years ago
This affects the host name only; it is the remaining problem from bug 102656.
Reassigning to nhotta.
http://www.צה.com/צה
Assignee: dougt → nhotta
Status: REOPENED → NEW
Assignee | ||
Updated•23 years ago
Target Milestone: Future → ---
Comment 21•23 years ago
It seems this new problem is caused by the fact that ConvertHostnameToUTF8
within NS_MakeAbsoluteURIWithCharset causes different parts of the returned URI
to be encoded differently. Why do we want to encode different parts of the URI
differently in the result of that function rather than fixing the callers to
understand a consistent encoding?
Assignee | ||
Comment 22•23 years ago
For the host name part, it is agreed that Unicode is used (with ACE encoding).
But other parts, such as file names, are usually not supported by servers if we
use Unicode. We may convert to Unicode internally and then convert back to the
original charset, so we need to remember that charset (bug 84032).
Comment 23•23 years ago
are there any specs that talk about non ascii characters in portions of the URL
other than the hostname? that is, can we always use UTF8 as the encoding for
the entire URL string?
Assignee | ||
Comment 24•23 years ago
There used to be an Internet draft for non-ASCII URLs which was Unicode based.
It expired a while ago and I don't have the last version.
William, do you know anything about that, has the new draft been posted?
I think it is possible to keep URL in UTF-8 internally but as I mentioned, we
need to also keep a charset (e.g. a document charset) so we can convert URL back
if necessary.
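A minimal Python sketch of the idea in comment 24: hold the path as Unicode internally, remember the document charset next to it, and convert back only when the escaped form is needed. The `LinkTarget` class and its field names are hypothetical and are not Mozilla APIs.

```python
from dataclasses import dataclass
from urllib.parse import quote


@dataclass
class LinkTarget:
    """Hypothetical holder for a Unicode path plus its remembered charset."""

    path: str            # kept as Unicode internally
    origin_charset: str  # charset of the document the link came from

    def wire_path(self) -> str:
        # Convert back to the remembered charset, then URL-escape the bytes.
        return quote(self.path, safe="/", encoding=self.origin_charset,
                     errors="strict")


latin1_link = LinkTarget("/äö", "iso-8859-1")
utf8_link = LinkTarget("/äö", "utf-8")
print(latin1_link.wire_path(), utf8_link.wire_path())
```

The same internal text produces different escaped paths depending on the remembered charset, which is why the charset cannot be dropped once the text has been converted to Unicode.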
Comment 25•23 years ago
nhotta: in what cases would it be necessary to convert the URL back to a
document specific charset?
Assignee | ||
Comment 26•23 years ago
E.g., path names; UTF-8 is not usually understood by the server.
Comment 27•23 years ago
right, but is there any guarantee that servers will understand any extended
ascii encoding?
Assignee | ||
Comment 28•23 years ago
No, the web author has to know the server's charset and then apply URL escaping
to guarantee that the link works.
But usually people just put non-ASCII path names in the documents instead, and
that works most of the time. So there are many existing pages like that.
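To make the failure case concrete, a small Python sketch of what happens when the client escapes a path as UTF-8 but the server reads it as ISO-8859-1. Both charsets are assumptions chosen for the example; the thread names no specific server charset.

```python
from urllib.parse import quote, unquote

link_text = "äö"  # non-ASCII path segment as written in the page

# Client side (assumed): escape the text as UTF-8 bytes.
sent = quote(link_text, encoding="utf-8")

# Server side (assumed): interpret the same bytes as ISO-8859-1.
seen_by_server = unquote(sent, encoding="iso-8859-1")

print(sent, seen_by_server)
```

The server ends up with four characters instead of two, so a lookup for the original file name would fail unless both sides agree on the charset.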
Comment 29•23 years ago
ic, that does complicate things. so, we need to ensure that we send out URLs
using the document charset. technically we should be escaping the URLs when we
send them, cuz i think we have to limit ourselves to 7-bit ascii when we hit the
net. hostnames need to be encoded using UTF-8 for IDN purposes. the result is
what we have today which is an URL string composed of different encodings... yuck!
i need to think about this some more... i'm not sure what the right solution is.
if we move to a world in which nsIURI/nsIURL expect UTF-8 parameters, then
we'll need to do charset conversions in necko to generate the right URL string
for sending to servers. and what about proxy servers?? double yuck!
Comment 30•23 years ago
The IRI draft is here:
http://www.ietf.org/internet-drafts/draft-ietf-idn-uri-01.txt
The problem seems to be that the CString or |char *| version of the URL spec
is being passed along to various other modules like content or even docshell.
I agree that having 2 encodings within the same string is yucky.
Comment 31•23 years ago
ok, so if nsIURI has a charset associated with it, then it seems like it would
be best to encode the nsIURI members (including the hostname) in that charset
and then escape them when passing the values across the nsIURI interface.
before using the hostname at the networking level it would have to be converted
to UTF-8 and then ACE encoded. this unfortunately puts a lot of burden on
nsIURI consumers because they must handle various charsets if they want to
convert the URI elements into something readable that is not URL escaped.
alternatively, we could also provide a nsIUnicodeURI and nsIUnicodeURL that
provides UCS2 equivalents of the attributes. then nsIURI and nsIURL would
remain US-ASCII. this seems like it might be the best solution moving forward,
but i still need to think this out some more.
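As a rough illustration of the "convert the hostname and then ACE encode it" step, here is a Python sketch using the standard library's `idna` codec. That codec implements the Punycode-based ACE form from RFC 3490, which may differ from whatever ACE scheme was under discussion when this thread was written; the host name is an example value.

```python
# Example host name (assumption for illustration).
unicode_host = "www.äö.com"

# Each non-ASCII label becomes an ASCII-compatible "xn--" label;
# labels that are already ASCII pass through unchanged.
ace_host = unicode_host.encode("idna")

# The ACE form is pure ASCII and converts back to the readable name.
round_trip = ace_host.decode("idna")
print(ace_host, round_trip)
```

Only the ACE form goes to DNS; anything shown to the user would be converted back to the readable form first.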
Assignee | ||
Comment 32•23 years ago
How about having the string in UTF-8 then convert to a charset since only the
server needs that charset?
Comment 33•23 years ago
nhotta: yeah, i was thinking about that too.
Comment 35•23 years ago
*** Bug 120503 has been marked as a duplicate of this bug. ***
Assignee | ||
Updated•23 years ago
Target Milestone: --- → mozilla1.2
Comment 36•23 years ago
The key to this is in nsWebShell::OnOverLink() where the call to
nsITextToSubURI::UnEscapeAndConvert() unescapes the entire URL string. This is
not desirable, since the hostname part is in UTF-8. If we could change the
method signature to take an nsIURI instead then we could at least use the URL
segments to build a string for display.
The problem is that nsIURI is not passed down from the function call chain.
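The per-segment approach suggested in comment 36 can be sketched in Python as a stand-in for the C++ code: parse the spec, decode the host as UTF-8 and the path in the document charset, then rebuild a display string. `display_spec` is a hypothetical helper; it ignores query and fragment, and it unescapes every escape sequence, which a real implementation would probably restrict.

```python
from urllib.parse import urlsplit, unquote


def display_spec(spec: str, doc_charset: str) -> str:
    """Hypothetical helper: build a readable string from URL segments."""
    parts = urlsplit(spec)
    # Host name: escaped UTF-8, per the discussion above.
    host = unquote(parts.netloc, encoding="utf-8")
    # Path: escaped bytes in the document charset.
    path = unquote(parts.path, encoding=doc_charset)
    return f"{parts.scheme}://{host}{path}"


shown = display_spec("http://www.%C3%A4%C3%B6.com/%E4%F6", "iso-8859-1")
print(shown)
```

Unescaping the whole string with a single charset would garble one of the two segments, which matches the symptom described in this bug.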
Comment 37•23 years ago
This patch first converts the URL to document charset before calling
textToSubURI->UnEscapeAndConvert() (instead of doing NS_ConvertUCS2toUTF8).
Is there a function to use for this kind of conversion instead of going
to the trouble of getting the ccm, then getting the encoder, etc.?
Comment 38•23 years ago
Naoki: Could you take a look at this patch please?
Thanks!
Comment 39•23 years ago
i don't think you want to unescape characters in the range U+00..U+7F.
if any such chars are escaped, they are probably just control characters or
other characters that should be escaped. now, if the URL is a file: URL, i
suppose you could argue that unescaping all chars is likely valid. but, doing
so for HTTP URLs could lead to all sorts of problems (e.g., embedded nulls).
when my patch for bug 124042 lands, there'll be an option to NS_UnescapeURL that
allows you to only unescape bytes with the 8-th bit set.
Assignee | ||
Comment 40•23 years ago
So the host part is converted to a document charset then later converted back to
UTF-8. Is it possible to process the host name and the other parts separately?
If the host name is non ASCII then you can call UnEscapeAndConvert with charset
as "UTF-8" then you don't have to put the conversion code there.
Comment 41•23 years ago
Since nsWebShell::OnOverLink() only has the spec, we would have to instantiate
an nsIURI to parse it, wouldn't we?
Doing UnEscapeAndConvert using "UTF-8" would only work on the hostname part
alone.
Darin: UnEscapeAndConvert uses nsUnescape(), does it unescape 00-7F?
Comment 42•23 years ago
nsUnescape unescapes everything... nsUnescape should never be used. there are
much better alternatives. nsUnescapeCount returns the length of the unescaped
string, so you can be sure not to be fooled by embedded nulls.
once my patch for bug 124042 lands, there'll be a better option. NS_UnescapeURL
which has an argument to specify that only non-ASCII characters should be
unescaped. there'll also be a version of NS_UnescapeURL that returns the result
in a nsACString, which internally handles embedded nulls.
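The "only unescape non-ASCII" behaviour described in comments 39 and 42 can be sketched in Python as follows. This is not the NS_UnescapeURL implementation; `unescape_non_ascii` is a hypothetical stand-in that works on bytes so that escaped nulls and other 7-bit characters stay escaped.

```python
import re

# Matches one %XX escape sequence in a byte string.
_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def unescape_non_ascii(escaped: bytes) -> bytes:
    """Hypothetical helper: unescape only bytes with the 8th bit set."""

    def replace(match: "re.Match[bytes]") -> bytes:
        value = int(match.group(1), 16)
        # Leave 0x00..0x7F escaped (control chars, spaces, embedded nulls).
        return bytes([value]) if value >= 0x80 else match.group(0)

    return _ESCAPE.sub(replace, escaped)


result = unescape_non_ascii(b"/a%00b%20%E4%F6")
print(result)
```

Because %00 and %20 are left alone, the result can still be handled as a normal string without being cut short at an embedded null.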
Assignee | ||
Comment 43•22 years ago
The current plan is to unescape the URI for the status bar by trying UTF-8 and
then the originCharset of the nsIURI.
Assignee | ||
Comment 44•22 years ago
The new function tries UTF-8 before the document charset, so no need to special
case mailto.
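A Python sketch of the "try UTF-8 first, then the document charset" strategy described in comments 43 and 44. It is not the function that was checked in; `unescape_for_ui` and its fallback behaviour are assumptions made for illustration, and the input is assumed to be an ASCII, already-escaped spec.

```python
import re

_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def _unescape_high_bytes(escaped: bytes) -> bytes:
    # Unescape only %80..%FF so ASCII escapes such as %00 stay intact.
    def replace(match):
        value = int(match.group(1), 16)
        return bytes([value]) if value >= 0x80 else match.group(0)

    return _ESCAPE.sub(replace, escaped)


def unescape_for_ui(spec: str, origin_charset: str) -> str:
    """Hypothetical helper: readable URI text for display only."""
    raw = _unescape_high_bytes(spec.encode("ascii"))
    for charset in ("utf-8", origin_charset):
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue  # try the next candidate charset
    return spec  # nothing decoded cleanly: show the escaped form


print(unescape_for_ui("http://www.%C3%A4%C3%B6.com/", "iso-8859-1"))
print(unescape_for_ui("http://example.com/%E4%F6", "iso-8859-1"))
```

Trying UTF-8 first works because arbitrary legacy-charset bytes rarely form valid UTF-8, so a successful strict decode is a strong hint that the bytes really are UTF-8.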
Comment 45•22 years ago
Comment on attachment 95007
Changed to call a new function to unescape URI for UI.
r=ftang
Attachment #95007 - Flags: review+
Assignee | ||
Updated•22 years ago
Target Milestone: mozilla1.2alpha → mozilla1.2beta
Comment 46•22 years ago
Comment on attachment 95007
Changed to call a new function to unescape URI for UI.
sr=darin (sorry for taking so long to review this patch... it looks great!)
Attachment #95007 - Flags: superreview+
Assignee | ||
Comment 47•22 years ago
checked in to the trunk
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 22 years ago
Resolution: --- → FIXED
Assignee | ||
Comment 49•22 years ago
*** Bug 81022 has been marked as a duplicate of this bug. ***