Closed
Bug 43852
Opened 24 years ago
Closed 24 years ago
"Send URLs as UTF-8" not working
Categories
(Core :: Internationalization, defect, P3)
Tracking
()
VERIFIED
FIXED
mozilla0.9
People
(Reporter: bill, Assigned: nhottanscp)
References
()
Details
(Keywords: helpwanted)
Mozilla build M14 seems to work sending URLs as UTF-8, to our
Internationalized domain name service at http://www.nunames.nu/eu-lang-test.htm.
But only if you type the URL into the browser's address form directly (or you
copy and paste it) - not if you click on a link.
The way it works is, if you type the Multilingual URL into the browser window,
using localized non-UTF-8 encoding (say your keyboard and OS encoding is for
ISO-8859-1, for example), the Mozilla M14 browser will convert that URL into
UTF-8 and send the request to the name server to be resolved. For
www.åreskutan.nu it does not display the UTF-8 in the browser window (which
should be "www.Ã¥reskutan.nu" in encoded form) but displays
www.%c3%a5reskutan.nu instead. Even though the browser displays these %
encodings, it actually sends the UTF-8 to the name server query, and this name
will resolve in our system using M14. Under the "rule of least astonishment", it
would be nice if it actually displayed the local language keyboard encoding to
the user: in this case the originally typed ISO-8859-1 form, www.åreskutan.nu.
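The conversion described above can be reproduced in a few lines. This is a minimal Python sketch, not Mozilla code; the helper name `percent_encode_host` is made up for illustration. It starts from the ISO-8859-1 bytes a keyboard would deliver, converts the name to UTF-8, and prints both the percent-escaped form shown in the browser window and the Latin-1 rendering of the same UTF-8 bytes.

```python
def percent_encode_host(host: str, charset: str = "utf-8") -> str:
    """Percent-escape every non-ASCII byte of `host` after encoding it in `charset`.

    Hypothetical helper for illustration; lower-case hex matches the report.
    """
    raw = host.encode(charset)
    return "".join(chr(b) if b < 0x80 else "%{:02x}".format(b) for b in raw)


typed_bytes = b"www.\xe5reskutan.nu"       # what an ISO-8859-1 keyboard delivers
host = typed_bytes.decode("iso-8859-1")    # the name as the user sees it
utf8_bytes = host.encode("utf-8")          # the bytes that should reach the resolver

print(percent_encode_host(host))           # www.%c3%a5reskutan.nu
print(utf8_bytes.decode("iso-8859-1"))     # www.Ã¥reskutan.nu
```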
But using M14, if you have a link on your page (as we do at
http://www.nunames.nu/eu-lang-test.htm) and you click on the link instead, it
correctly displays the utf-8 encoding at the bottom left side of the browser
(where it says "contacting http://www.Ã¥reskutan.nu"), so it seems to be able
to make the correct conversion to UTF-8 from a link which uses ISO-8859-1. But
it does not actually send that UTF-8 to the resolver and the query does not work
as a result. In this case, just as in the previous one, the UTF-8 encoding does
*not* display in the browser window. But this time, it displays a different
series of % encodings:
http://www.%c3%83%c2%a5reskutan.nu
And this is the UTF-8 it actually sends: "www.Ã¥reskutan.nu", which does not
resolve (since the actual name we are serving is encoded as www.Ã¥reskutan.nu.)
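One explanation consistent with the escapes above is a double conversion: the already-correct UTF-8 bytes are read back as ISO-8859-1 text and converted to UTF-8 a second time. The Python sketch below only demonstrates that this assumption reproduces the reported string; it says nothing about where in M14 such a step would occur.

```python
host = "www.\u00e5reskutan.nu"                      # www.åreskutan.nu

once = host.encode("utf-8")                         # correct UTF-8: ... c3 a5 ...
# Assumed bug: the UTF-8 bytes are treated as ISO-8859-1 characters
# and then encoded to UTF-8 again.
twice = once.decode("iso-8859-1").encode("utf-8")   # ... c3 83 c2 a5 ...


def escape(raw: bytes) -> str:
    """Percent-escape non-ASCII bytes (lower-case hex, as in the report)."""
    return "".join(chr(b) if b < 0x80 else f"%{b:02x}" for b in raw)


print(escape(once))    # www.%c3%a5reskutan.nu
print(escape(twice))   # www.%c3%83%c2%a5reskutan.nu
```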
When using Mozilla build M15 (the same as NN 6 Beta, I believe), it also
correctly displays the utf-8 encoding at the bottom left side of the browser
(where it says "contacting http://www.Ã¥reskutan.nu"), so it also seems to be
able to make the correct conversion from a link which uses ISO-8859-1. But it
does not actually send *any* UTF-8 to the resolver, but sends the following %
type encoding to the name server: "www.%c3ƒ%c2%a5reskutan.nu" as ASCII, which
has nothing to do with the correct UTF-8 conversion it initially made, as far as
I can see, is not actually sent as any kind of UTF-8, and is rejected by our
system.
Two variations of a broken browser, I'd say.
Bill Semich
.NU Domain
Assignee
Comment 1•24 years ago
Teruko, please try to reproduce this and confirm if reproducible.
Please also check 4.x behavior.
Assignee
Comment 2•24 years ago
bill@mail.nic.nu, could you list the problems? Also, please try newer builds
(M16 or later).
Using today's win32 M17 build, typing http://www.åreskutan.nu in the URL bar and
hitting return sends the UTF-8 query "http://www.%D0%93%D2%90reskutan.nu/".
I am not sure what is broken.
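The escapes in that query can be decoded to check what was sent. In the Python sketch below, the four bytes turn out to be valid UTF-8, but for two Cyrillic code points rather than for å. The windows-1251 step is only an assumption that happens to reproduce the observed bytes; the thread does not say which code page the test machine used.

```python
from urllib.parse import unquote_to_bytes

observed = unquote_to_bytes("%D0%93%D2%90")
print([f"U+{ord(c):04X}" for c in observed.decode("utf-8")])  # ['U+0413', 'U+0490']

expected = "\u00e5".encode("utf-8")   # UTF-8 for å is c3 a5
# Assumption: the UTF-8 bytes were reinterpreted as windows-1251 text
# and then encoded to UTF-8 once more.
reencoded = expected.decode("cp1251").encode("utf-8")
print(reencoded == observed)          # True
```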
Comment 3•24 years ago
I sent the following message to the people listed in this bug report
in response to Bill. I will repeat it here for the record.
I think we are doing the right thing mostly. But there is one
spec-related issue. We can return UTF-8 URLs in case the URL links on a
web page are not going to the same server as the page itself.
In that case, we don't have to be bound by the page/server charset.
For this small improvement, I will confirm the bug.
====
It seems to me that the current Mozilla is behaving more or less correctly with
regard to returning/sending the URL. To summarize the current
behavior,
1. In the location bar, there is no way we can assume the charset which the
target server requires, so we default to UTF-8 in case the URL
entered contains 8-bit data.
2. For web pages, if the pages are marked with the meta-charset (or if the
server sends the charset info with the page), then we return the URL
in that charset.
I think we are currently following the 2 basic principles described above.
Your web pages are marked as follows:
A. http://www.nunames.nu/eu-lang-test.htm (Windows-1252)
Therefore we return Latin 1 encoding in sending back the URL.
B. http://www.nunames.nu/NUregistryJP.htm (UTF-8)
Therefore we send UTF-8 URL back to the server -- I confirmed
that we indeed do this on this page.
C. http://www.nunames.nu/lldemo (Has no charset info)
Therefore we will send back in whatever charset the user has
selected in the Character Coding menu, or the default browser
view charset.
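The two principles and the three cases A, B and C can be summarized in a short Python sketch. The function names and the `source` values are made up for illustration and do not correspond to Mozilla's code; the point is only that the same link escapes differently depending on the charset chosen.

```python
from typing import Optional


def charset_for_url(source: str, page_charset: Optional[str], user_default: str) -> str:
    """Pick the charset used to escape a URL (hypothetical helper).

    source: "location_bar" for a typed URL, "link" for a link on a page.
    """
    if source == "location_bar":
        return "utf-8"                      # principle 1: target charset unknown
    return page_charset or user_default     # principle 2, with case C as fallback


def escape_in_charset(url_text: str, charset: str) -> str:
    """Percent-escape the non-ASCII bytes of url_text after encoding it in charset."""
    raw = url_text.encode(charset)
    return "".join(chr(b) if b < 0x80 else f"%{b:02X}" for b in raw)


link = "http://www.\u00e5reskutan.nu/"
# Case A: page marked windows-1252
print(escape_in_charset(link, charset_for_url("link", "windows-1252", "iso-8859-1")))
# http://www.%E5reskutan.nu/
# Case B: page marked UTF-8
print(escape_in_charset(link, charset_for_url("link", "utf-8", "iso-8859-1")))
# http://www.%C3%A5reskutan.nu/
```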
I think these are more or less correct but there is probably one
improvement we can make.
If the links on the page are going to different servers than the one
which is hosting the page, then we probably do not have to follow the
charset of the page in sending the URL from a link. I can think of
returning such URLs in UTF-8.
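As a rough Python sketch of that proposal (the name `link_charset` is hypothetical, and comparing host names is only one possible reading of "different servers"): keep the page charset for links that stay on the hosting server and fall back to UTF-8 for everything else.

```python
from urllib.parse import urljoin, urlsplit


def link_charset(page_url: str, link_href: str, page_charset: str) -> str:
    """Return the charset to use when escaping a link found on page_url."""
    target = urljoin(page_url, link_href)   # resolve relative links first
    same_host = urlsplit(target).hostname == urlsplit(page_url).hostname
    return page_charset if same_host else "utf-8"


page = "http://www.nunames.nu/eu-lang-test.htm"
print(link_charset(page, "NUregistryJP.htm", "windows-1252"))               # windows-1252
print(link_charset(page, "http://www.\u00e5reskutan.nu/", "windows-1252"))  # utf-8
```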
Perhaps we can make this bug into making such an improvement.
What do you think -- people on this list?
Other than this, I don't see much else we can do.
====
Status: UNCONFIRMED → NEW
Ever confirmed: true
FYI, there are unresolved issues with unicode canonicalization/normalization
and "case" folding with regards to iDNS.
Assignee
Updated•24 years ago
Assignee: nhotta → ftang
Assignee
Comment 6•24 years ago
Reassign to ftang.
Comment 7•24 years ago
Bob wrote:
>
> IE5 has a preference (on by default?) Tools|Internet Options...|Advanced
> [x] Always send URLs as UTF-8
I received the following email from a Microsoft employee a while ago:
Subject: Re: The .nu domain's experiment with 8859-1 encoded domain names.
Date: Mon, 10 Jan 2000 13:16:55 -0800
From: "Chris Wendt" <christw@microsoft.com>
To: "Erik van der Poel" <erik@netscape.com>, "Karlsson Kent - keka" <keka@im.se>
CC: <hostmaster@mail.nic.nu>, <duerst@w3.org>, <markdavis@ispchannel.com>,
<mark.davis@us.ibm.com>, <goldsmith@apple.com>, <chrispr@microsoft.com>,
<ftang@netscape.com>, <presnick@qualcomm.com>, <henrik.sviden@idg.se>
> > IE 5 can, apparently, always use Unicode/UTF-8 in (all of)
> > the URL, if set properly, already.
(all of) is not correct. Only in the part which comes before the first
question mark '?'.
> What does "if set properly" mean, exactly? How does IE5 deal with HTML
> forms in non-UTF-8 encodings when submitting them?
"If set properly" means that the advanced option "Always send URLs in UTF-8"
is ON. It is ON by default except for the Korean and Traditional Chinese
localized version (major globalization faux pas, I agree :-(()
The query part (behind the first '?') is encoded in the encoding of the
document bearing the <form> or in the client machine's default code page if
the query is not submitted from a FORM. Client code can override the default
setting for non-FORM queries as you can see in the IE5 autosearch feature
where the autosearch query is ALWAYS UTF-8.
If any part of the URL is pre-escaped when IE gets it, i.e. by the HTML
author, there will be no change applied.
I think we should look at the domain names without consideration of queries.
> (1) The Location field (URL bar) where users type the URL via keyboard.
> (2) Links in HTML pages <A HREF="...">
> For (1), we can convert the string typed by the user to UTF-8 before
> sending the domain name to the server.
>
> But for (2), what do you suggest? Should we convert it to UTF-8?
Definitely the same for both cases.
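The split described in the email above (UTF-8 before the first '?', document encoding after it) can be sketched in Python as follows. This is an illustration of the described rule only, not IE5's implementation; the function names and the example path `/sök` are made up. Parts that are already escaped are plain ASCII and therefore pass through unchanged.

```python
def escape_bytes(raw: bytes) -> str:
    """Percent-escape non-ASCII bytes and leave ASCII (including %XX) untouched."""
    return "".join(chr(b) if b < 0x80 else f"%{b:02X}" for b in raw)


def escape_split_at_query(url: str, doc_charset: str) -> str:
    """UTF-8 for everything before the first '?', doc_charset for the query."""
    before, sep, query = url.partition("?")
    return escape_bytes(before.encode("utf-8")) + sep + escape_bytes(query.encode(doc_charset))


url = "http://www.\u00e5reskutan.nu/s\u00f6k?q=\u00e5"
print(escape_split_at_query(url, "iso-8859-1"))
# http://www.%C3%A5reskutan.nu/s%C3%B6k?q=%E5
```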
Comment 8•24 years ago
Kat, why should we treat URLs that go back to the original server differently
from URLs that go to other servers? Does some spec say this?
Comment 9•24 years ago
I don't think there is an RFC which defines that.
However, when we parse a server path (URL) which
is not escaped by the server itself, we do something like
what we are doing, i.e. assume the encoding of the
document and then escape it -- for the part below the host name
level. I think we discussed this issue in:
http://bugzilla.mozilla.org/show_bug.cgi?id=10373
So I am not surprised by what we are doing for the
domain name part of it.
My concern for distinguishing the original server vs. some
other server is motivated by the same consideration, but
I am not sure if that is the best thing to do. That is,
should we treat the domain name part differently
from the rest of the server path?
In the absence of the real standard we can agree on, I think
we can only agree on the best practice.
Comment 10•24 years ago
The approach that Mozilla has taken when the existing browsers do not adhere to
the specs is to implement both, and switch between them based on the "Quirks
Mode" and "Standard Mode". So I guess one possibility here is to follow the
draft in Standard Mode, and follow some mixture of Nav4/MSIE in Quirks Mode.
The draft is ftp://ftp.ietf.org/internet-drafts/draft-masinter-url-i18n-05.txt.
Comment 11•24 years ago
nhotta- I think you are the P person for URL issue in our current matrix.
Reassign back to nhotta.
We probably need to discuss what we should do with this bug.
Assignee: ftang → nhotta
Assignee
Updated•24 years ago
Status: NEW → ASSIGNED
Assignee
Updated•24 years ago
Keywords: helpwanted
Comment 12•24 years ago
*** Bug 49939 has been marked as a duplicate of this bug. ***
Assignee
Comment 13•24 years ago
*** Bug 55303 has been marked as a duplicate of this bug. ***
Assignee
Updated•24 years ago
Target Milestone: --- → Future
Comment 14•24 years ago
I told Mozilla 0.7 to load http://%e2%88%ae.cr.yp.to. That domain (with
three 8-bit characters in place of %e2%88%ae, of course) has an address
in DNS, namely 131.193.178.181. Try ``dig contourcname.cr.yp.to'' and
you'll see, among other things, the relevant A record.
Mozilla gave me error 804b001e, the same error that it gives for
nonexistent.cr.yp.to, and said that the host wasn't found. I had
expected it to find the host without trouble.
Positive note: The not-found dialog box had a UTF-8 display of the name.
Negative note: The ``Resolving host'' display had an ISO-8859-1 display
of the name. I would have been disappointed in that behavior even if
ISO-8859-1 had been my default character set; domain names should be
displayed the same way throughout the world.
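The label in that URL can be inspected with a few lines of Python. The sketch below decodes the three escaped bytes and contrasts the UTF-8 reading (as in the not-found dialog) with the ISO-8859-1 reading (as in the ``Resolving host'' display).

```python
import unicodedata
from urllib.parse import unquote_to_bytes

label = unquote_to_bytes("%e2%88%ae")
print(len(label))                                   # 3

as_utf8 = label.decode("utf-8")                     # a single character
print(f"U+{ord(as_utf8):04X}", unicodedata.name(as_utf8))
# U+222E CONTOUR INTEGRAL

as_latin1 = label.decode("iso-8859-1")              # three separate characters
print([f"U+{ord(c):04X}" for c in as_latin1])
# ['U+00E2', 'U+0088', 'U+00AE']
```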
Assignee
Comment 15•24 years ago
On my Windows2000, WinAPI WSAAsyncGetHostByName (in nsDNSService.cpp) is called
with a host name in UTF-8, and it returns a success.
I also got the same error even with 131.193.178.181.
Assignee
Comment 16•24 years ago
>and it returns a success.
I mean calling the API succeeded but I got the error dialog which says the name
was not found.
Assignee
Comment 17•24 years ago
>I also got the same error even with 131.193.178.181.
Not the same error, I got a page which says "file does not exist" but no dialog
appeared.
BTW, the following URLs (mentioned in the original report) are working with NS6.
http://www.%C3%B6resundsregionen.nu/
http://www.%e7%99%bb%e9%8c%b2%e6%89%80.nu/
I am not sure what is special about http://%e2%88%ae.cr.yp.to.
Comment 18•24 years ago
I've created an index.html now. If you connect to 131.193.178.181 and do
GET http://%e2%88%ae.cr.yp.to HTTP/1.1, you'll see it. But Mozilla says
the host isn't found.
Perhaps this is a UNIX-specific problem. The BIND DNS client library
chokes on unusual characters; does Mozilla still use it?
Assignee
Updated•24 years ago
Target Milestone: Future → mozilla0.9
Assignee
Comment 19•24 years ago
The issue originally filed is resolved. The remaining problem is specific to one
site and can be filed separately. Actually, I cannot connect to 131.193.178.181.
Assignee
Comment 20•24 years ago
The original problem is fixed.
Please file a separate bug for http://%e2%88%ae.cr.yp.to, but I see
131.193.178.181 does not work either.
Status: ASSIGNED → RESOLVED
Closed: 24 years ago
Resolution: --- → FIXED
Comment 21•24 years ago
Changed QA contact to andreasb@netscape.com. Andreas, please talk with nhotta
about how to verify this.
QA Contact: teruko → andreasb
Comment 22•24 years ago
Original problem verified fixed in the following builds:
* 20010313 Linux
* 20010312 Win98
* 20010228 MacOS 9.1
The fix uncovered URL display problems; new bugs are being reported for those.
Status: RESOLVED → VERIFIED