Closed Bug 81024 Opened 24 years ago Closed 22 years ago

Mouseover for http://www.צה.com shows http://www.%E4%F6.com in statusbar

Categories

(Core :: Internationalization, defect)

x86
Windows 98
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla1.2beta

People

(Reporter: Junk_HbJ, Assigned: nhottanscp)

Details

(Keywords: intl, regression)

Attachments

(4 files)

From Bugzilla Helper:
User-Agent: Mozilla/5.0 (Windows; U; Win98; en-US; rv:0.9) Gecko/20010505
BuildID: 2001051403
If mouse over http://www.צה.com, statusbar shows http://www.%E4%F6.com.
Reproducible: Always
Steps to Reproduce:
1. Mouse over http://www.צה.com.
Actual Results: Message "http://www.%E4%F6.com" in statusbar
Expected Results: Message "http://www.צה.com" in statusbar
This is a spinoff of bug 80942, which contains further comments about this matter.
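The escaped form in the report is consistent with the host label being percent-escaped byte by byte in a single-byte charset. A minimal sketch, assuming the label was originally the two characters U+00E4 and U+00F6 (an inference from %E4%F6, not something the report states), and using Python's urllib.parse purely for illustration rather than anything Mozilla ran:

```python
from urllib.parse import quote

# Assumption: the host label in the report was U+00E4 U+00F6.
label = "\u00e4\u00f6"

# Escaping the ISO-8859-1 bytes gives the string seen in the status bar.
latin1_escaped = quote(label, encoding="iso-8859-1")

# Escaping the UTF-8 bytes of the same label gives a different string.
utf8_escaped = quote(label, encoding="utf-8")

print(latin1_escaped)  # %E4%F6
print(utf8_escaped)    # %C3%A4%C3%B6
```

The same two characters therefore have at least two escaped spellings, which is why the charset has to be known before the status bar can show readable text.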
I can reproduce this with NS6.01.
Status: UNCONFIRMED → NEW
Ever confirmed: true
Reassign to gagan.
Assignee: nhotta → gagan
Keywords: intl
QA Contact: andreasb → jonrubin
I think bug 81019 is a direct result of this.
Blocks: 81019
*** This bug has been marked as a duplicate of 81022 ***
Status: NEW → RESOLVED
Closed: 24 years ago
Resolution: --- → DUPLICATE
This bug is not a duplicate. Bug 81022 is about trying to go to that URL, while this bug is about moving the mouse pointer over a link. The information shown is different. For this bug it's: http://www.%E4%F6.com For bug 81022 it's: www.öä.com Finally, bug 81019 is a direct result of this bug, whereas with bug 81022 a hostname is shown in the dialog. These bugs were spawned from bug 80942, which has some more discussion on this. Reopening.
Status: RESOLVED → REOPENED
Resolution: DUPLICATE → ---
got it. apologies for the wrong dup marking. I should have read it carefully. ->dougt
Assignee: gagan → dougt
Status: REOPENED → NEW
Target Milestone: --- → mozilla1.0
what is milestone "mozilla1.0" anyway? Moving to future.
Target Milestone: mozilla1.0 → Future
Hello? Has this bug been fixed? I couldn't reproduce it - screenshot is attached. In it, the mouse cursor is above the hyperlink.
Attached image screenshot while checking testcase (deleted)
QA, can you please verify?
Status: NEW → RESOLVED
Closed: 24 years ago → 23 years ago
Resolution: --- → FIXED
mass change, switching qa contact from jonrubin to ruixu.
QA Contact: jonrubin → ruixu
Verified on EN Win98SE, EN WinME, JP Win98SE and KO Win2K with build 2001083003, this bug has been fixed.
Status: RESOLVED → VERIFIED
The little bugger is back in Mozilla 0.9.7. http://www.צה.com now shows some garbled characters in the status bar. Ironically, http://www.%E4%F6.com is shown as the correct URL. Weird...
Status: VERIFIED → REOPENED
Resolution: FIXED → ---
Attached image The bug is also back in trunk (deleted)
There is the same problem in trunk.
Keywords: regression
This has been fixed for a while. Please verify.
Status: REOPENED → RESOLVED
Closed: 23 years ago → 23 years ago
Resolution: --- → FIXED
Reopening. Mouseover on http://www.%E4%F6.com shows http://www.צה.com, but mouseover on http://www.צה.com shows some gibberish. And if you right click and Copy Link Address on it, this is what will be copied: http://www.%D0%96%D0%94.com/
Status: RESOLVED → REOPENED
Resolution: FIXED → ---
weird. what build are you using alexey?
Win32 2002012903. Bug 42898 describes similar behaviour.
This is for host name only, the remaining problem of bug 102656. Reassign to nhotta. http://www.צה.com/צה
Assignee: dougt → nhotta
Status: REOPENED → NEW
Status: NEW → ASSIGNED
Target Milestone: Future → ---
It seems this new problem is caused by the fact that ConvertHostnameToUTF8 within NS_MakeAbsoluteURIWithCharset causes different parts of the returned URI to be encoded differently. Why do we want to encode different parts of the URI differently in the result of that function rather than fixing the callers to understand a consistent encoding?
For the host name part, it is agreed that Unicode is used (with ACE encoding). But other parts, like file names, are usually not supported by servers if we use Unicode. We may convert to Unicode internally and then convert back to the charset, so we need to remember that charset (bug 84032).
are there any specs that talk about non ascii characters in portions of the URL other than the hostname? that is, can we always use UTF8 as the encoding for the entire URL string?
There used to be an Internet draft for non-ASCII URLs which was Unicode based. It expired a while ago, and I don't have the last version. William, do you know anything about that, has a new draft been posted? I think it is possible to keep the URL in UTF-8 internally but, as I mentioned, we also need to keep a charset (e.g. a document charset) so we can convert the URL back if necessary.
nhotta: in what cases would it be necessary to convert the URL back to a document specific charset?
E.g., path names. UTF-8 is not usually understood by the server.
right, but is there any guarantee that servers will understand any extended ascii encoding?
No, the web author has to know the server's charset and then apply URL escaping in order to guarantee that the link works. But usually people just put non-ASCII path names in the documents instead, and that works most of the time. So there are many existing pages like that.
ic, that does complicate things. so, we need to ensure that we send out URLs using the document charset. technically we should be escaping the URLs when we send them, cuz i think we have to limit ourselves to 7-bit ascii when we hit the net. hostnames need to be encoded using UTF-8 for IDN purposes. the result is what we have today which is a URL string composed of different encodings... yuck! i need to think about this some more... i'm not sure what the right solution is. if we move to a world in which nsIURI/nsIURL expect UTF-8 parameters, then we'll need to do charset conversions in necko to generate the right URL string for sending to servers. and what about proxy servers?? double yuck!
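The split described in this comment (host treated as Unicode and ACE encoded, path sent in the document charset and escaped down to 7-bit ASCII) can be sketched as follows. This is only an illustration under assumptions: `wire_url` is a hypothetical helper, not a necko function, and Python's built-in `idna` codec stands in for whatever ACE scheme the networking layer would actually use.

```python
from urllib.parse import quote


def wire_url(host: str, path: str, doc_charset: str) -> str:
    """Hypothetical helper: build a 7-bit ASCII URL from Unicode parts."""
    # Host: Unicode label -> ACE form (Python's IDNA codec, an assumption).
    ace_host = host.encode("idna").decode("ascii")
    # Path: encode in the document charset, then percent-escape the bytes.
    escaped_path = quote(path, safe="/", encoding=doc_charset)
    return f"http://{ace_host}{escaped_path}"


url = wire_url("www.\u00e4\u00f6.com", "/\u00e4\u00f6", "iso-8859-1")
print(url)
```

The resulting string is pure ASCII, but its host and path were derived through different encodings, which is the "URL string composed of different encodings" concern raised above.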
The IRI draft is here: http://www.ietf.org/internet-drafts/draft-ietf-idn-uri-01.txt The problem seems to be that the CString or |char *| version of the URL spec is being passed along to various other modules like content or even docshell. I agree that having 2 encodings within the same string is yucky.
ok, so if nsIURI has a charset associated with it, then it seems like it would be best to encode the nsIURI members (including the hostname) in that charset and then escape them when passing the values across the nsIURI interface. before using the hostname at the networking level it would have to be converted to UTF-8 and then ACE encoded. this unfortunately puts a lot of burden on nsIURI consumers because they must handle various charsets if they want to convert the URI elements into something readable that is not URL escaped. alternatively, we could also provide a nsIUnicodeURI and nsIUnicodeURL that provides UCS2 equivalents of the attributes. then nsIURI and nsIURL would remain US-ASCII. this seems like it might be the best solution moving forward, but i still need to think this out some more.
How about having the string in UTF-8 and then converting to a charset, since only the server needs that charset?
nhotta: yeah, i was thinking about that too.
Keywords: nsbeta1
nsbeta1- per triage meeting
Keywords: nsbeta1 → nsbeta1-
*** Bug 120503 has been marked as a duplicate of this bug. ***
Target Milestone: --- → mozilla1.2
The key to this is in nsWebShell::OnOverLink() where the call to nsITextToSubURI::UnEscapeAndConvert() unescapes the entire URL string. This is not desirable, since the hostname part is in UTF-8. If we could change the method signature to take an nsIURI instead then we could at least use the URL segments to build a string for display. The problem is that nsIURI is not passed down from the function call chain.
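To illustrate why unescaping the whole string with one charset goes wrong when the host is in UTF-8, here is a rough sketch. The sample spec, the variable names, and the use of Python's `urlsplit` are all assumptions for illustration; the actual code path is nsITextToSubURI::UnEscapeAndConvert(), which is not reproduced here.

```python
from urllib.parse import unquote, unquote_to_bytes, urlsplit

# Assumed sample: host escaped from UTF-8, path escaped from ISO-8859-1.
spec = "http://www.%C3%A4%C3%B6.com/%E4%F6"

# Whole-string approach: unescape everything, decode with the document charset.
whole = unquote_to_bytes(spec).decode("iso-8859-1")

# Per-segment approach: decode the host as UTF-8 and the path as the document charset.
parts = urlsplit(spec)
host = unquote(parts.netloc, encoding="utf-8")
path = unquote(parts.path, encoding="iso-8859-1")
per_segment = f"{parts.scheme}://{host}{path}"

print(whole)        # host comes out garbled
print(per_segment)  # host and path both readable
```

Splitting first requires a parsed URL, which matches the point above that an nsIURI would be more useful to the display code than a bare spec string.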
This patch first converts the URL to document charset before calling textToSubURI->UnEscapeAndConvert() (instead of doing NS_ConvertUCS2toUTF8). Is there a function to use for this kind of conversion instead of going to the trouble of getting the ccm, then get the encoder, etc?
No longer blocks: 81019
Naoki: Could you take a look at this patch please? Thanks!
i don't think you want to unescape characters in the range U+00..U+7F. if any such chars are escaped, they are probably just control characters or other characters that should be escaped. now, if the URL is a file: URL, i suppose you could argue that unescaping all chars is likely valid. but, doing so for HTTP URLs could lead to all sorts of problems (e.g., embedded nulls). when my patch for bug 124042 lands, there'll be an option to NS_UnescapeURL that allows you to only unescape bytes with the 8-th bit set.
So the host part is converted to a document charset then later converted back to UTF-8. Is it possible to process the host name and other part separately? If the host name is non ASCII then you can call UnEscapeAndConvert with charset as "UTF-8" then you don't have to put the conversion code there.
Since nsWebShell::OnOverLink() only has the spec, we would have to instantiate an nsIURI to parse it, wouldn't we? Doing UnEscapeAndConvert using "UTF-8" would only work on the hostname part alone. Darin: UnEscapeAndConvert uses nsUnescape(), does it unescape 00-7F?
nsUnescape unescapes everything... nsUnescape should never be used. there are much better alternatives. nsUnescapeCount returns the length of the unescaped string, so you can be sure not to be fooled by embedded nulls. once my patch for bug 124042 lands, there'll be a better option. NS_UnescapeURL which has an argument to specify that only non-ASCII characters should be unescaped. there'll also be a version of NS_UnescapeURL that returns the result in a nsACString, which internally handles embedded nulls.
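The idea of unescaping only bytes with the high bit set can be sketched like this. It is not NS_UnescapeURL and makes no claim about that function's signature or flags; `unescape_non_ascii` is a hypothetical name used only to show the behaviour being asked for.

```python
import re

# Matches one percent-escape such as %E4 or %00.
_ESCAPE = re.compile(rb"%([0-9A-Fa-f]{2})")


def unescape_non_ascii(escaped: bytes) -> bytes:
    """Hypothetical sketch: unescape only bytes >= 0x80, keep ASCII escapes."""
    def replace(match):
        value = int(match.group(1), 16)
        # Leave escaped ASCII (control chars, NUL, space, ...) untouched.
        return bytes([value]) if value >= 0x80 else match.group(0)

    return _ESCAPE.sub(replace, escaped)


result = unescape_non_ascii(b"/a%00b%20%E4%F6")
print(result)  # b'/a%00b%20\xe4\xf6'
```

Because %00 stays escaped, the output can never contain an embedded null that was hidden in the input, which is the hazard described for nsUnescape.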
The current plan is to try to unescape URI for the status bar, by trying UTF-8 and originCharset of nsIURI.
Blocks: 157673
Keywords: nsbeta1- → nsbeta1
Depends on: 110943
The new function tries UTF-8 before the document charset, so no need to special case mailto.
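The "try UTF-8 first, then the document charset" order can be sketched as below. This is a simplified illustration, not the function that was checked in: `unescape_for_ui` is a hypothetical name, and for brevity it unescapes every byte, whereas the earlier comments argue that escaped ASCII should be left alone.

```python
from urllib.parse import unquote_to_bytes


def unescape_for_ui(escaped: str, origin_charset: str) -> str:
    """Hypothetical sketch: decode as UTF-8 if valid, else the origin charset."""
    raw = unquote_to_bytes(escaped)
    for charset in ("utf-8", origin_charset):
        try:
            return raw.decode(charset)
        except (UnicodeDecodeError, LookupError):
            continue  # not valid in this charset, or charset unknown
    # Nothing worked: show the escaped form rather than garbage.
    return escaped


from_utf8 = unescape_for_ui("http://www.%C3%A4%C3%B6.com/", "iso-8859-1")
from_latin1 = unescape_for_ui("http://www.%E4%F6.com/", "iso-8859-1")
print(from_utf8, from_latin1)
```

Trying UTF-8 first works because byte sequences from legacy single-byte charsets are rarely valid UTF-8, so a successful UTF-8 decode is a strong hint that the escapes really were UTF-8.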
Comment on attachment 95007 [details] [diff] [review] Changed to call a new function to unescape URI for UI. r=ftang
Attachment #95007 - Flags: review+
Target Milestone: mozilla1.2alpha → mozilla1.2beta
Comment on attachment 95007 [details] [diff] [review] Changed to call a new function to unescape URI for UI. sr=darin (sorry for taking so long to review this patch... it looks great!)
Attachment #95007 - Flags: superreview+
checked in to the trunk
Status: ASSIGNED → RESOLVED
Closed: 23 years ago → 22 years ago
Resolution: --- → FIXED
Verified fixed with 2002-09-17 trunk.
Status: RESOLVED → VERIFIED
*** Bug 81022 has been marked as a duplicate of this bug. ***
Depends on: 180372
No longer blocks: 157673