Closed Bug 138951 Opened 23 years ago Closed 22 years ago

URLs are displayed using %-encoding when they should not

Categories

(SeaMonkey :: Location Bar, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 105909

People

(Reporter: Dan.Oscarsson, Assigned: hewitt)

Details

URL Bar is the nearest component to the URL input field I could find; if they are not the same, please move this bug to a different component. In Mozilla 1.0rc1 you are getting closer to handling non-ASCII in URLs. In Mozilla 0.9.8, when I entered the URL /Tjänster in the location field, it was changed to /Tj%C3%A4nster/. In 1.0rc1 it is changed to /Tj%E4nster/. I am guessing this is because my local character set is ISO 8859-1. As the server does a redirect using a UTF-8 URL (1.0rc1 no longer handles redirects containing ISO 8859-1), it looks like Mozilla now understands that my local character set is ISO 8859-1. This is fine! But the URL should not be displayed using %-encoding when the characters can be displayed as themselves. The above URL should be displayed as /Tjänster/, not as a %-encoding of the URL in the local character set.
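For readers following along, the two escaped forms in the report can be reproduced with a minimal Python sketch. It is only an illustration of how the same path is %-encoded under two charsets, not a description of what Mozilla does internally; the variable names are made up for this example.

```python
from urllib.parse import quote

path = "/Tjänster/"

# Percent-encode the same path under the two charsets mentioned in the report.
# quote() leaves "/" unescaped by default.
utf8_form = quote(path, encoding="utf-8")
latin1_form = quote(path, encoding="iso-8859-1")

print(utf8_form)    # /Tj%C3%A4nster/
print(latin1_form)  # /Tj%E4nster/
```

The letter "ä" is the two octets C3 A4 in UTF-8 and the single octet E4 in ISO 8859-1, which is why the displayed form changed between the two builds.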
My understanding of RFC 1738 is that this bug is INVALID. pi
That RFC doesn't really have much to say about _display_ of URLs....
OS: SunOS → All
Hardware: Sun → All
"2.2. URL Character Encoding Issues" says: "URLs are written only with the graphic printable characters of the US-ASCII coded character set. The octets 80-FF hexadecimal are not used in US-ASCII, and the octets 00-1F and 7F hexadecimal represent control characters; these must be encoded." So I think this is what Mozilla should do. pi
Boris: darin has been working on implementing iDNS, which permits UTF-8 URLs, AIUI. Allowing UTF-8 characters in the URLbar should be part of this...
Have a look at: http://www.w3.org/International/2002/draft-w3c-i18n-iri-00.txt A "URL" is said to be ASCII only, while "IRI" in the draft above is the name for a URL in an international context. The most important thing for users is that a URL is displayed using the available characters instead of %-encoding everything that is not ASCII. People want to see things using their own letters. All software should hide protocol details (like %-encoding) from the user, if possible. You can also see bug 105909, which was entered before you did your redesign of URI handling.
fixing this bug is definitely a noble goal IMO. it is extremely tricky, however, because there is no guarantee that non-ASCII bytes in a URL string correspond to any charset at all. moreover, some of the bytes may correspond to a charset and some may not... it is impossible to know for sure. nsIURI::originCharset hints at the charset of the unescaped URL string. it may not be correct though. we basically need some sort of decoder that will preserve the % escape sequences for characters that do not decode correctly.
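A rough Python sketch of the kind of decoder described in the comment above, under stated assumptions: `display_form` and `charset_hint` are hypothetical names (the hint stands in for something like nsIURI::originCharset), and this is not Mozilla's implementation. It unescapes only non-ASCII octets that decode to a printable character in the hinted charset, and leaves every other escape in place.

```python
import re

# A run of one or more consecutive %XX escapes.
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")


def display_form(escaped: str, charset_hint: str = "utf-8") -> str:
    """Return a display string; escapes that do not decode stay as %XX."""

    def decode_run(match: re.Match) -> str:
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        out = []
        i = 0
        while i < len(raw):
            shown = None
            # ASCII escapes (e.g. %20, %2F) are never unescaped here,
            # because doing so could change the meaning of the URL.
            if raw[i] >= 0x80:
                for width in range(1, 5):
                    chunk = raw[i:i + width]
                    if len(chunk) < width:
                        break  # ran out of octets
                    try:
                        text = chunk.decode(charset_hint)
                    except UnicodeDecodeError:
                        continue  # maybe a longer sequence decodes
                    if len(text) == 1 and text.isprintable():
                        shown = (text, width)
                    break
            if shown:
                out.append(shown[0])
                i += shown[1]
            else:
                # Keep the octet escaped (hex digits normalized to upper case).
                out.append("%%%02X" % raw[i])
                i += 1
        return "".join(out)

    return _ESCAPE_RUN.sub(decode_run, escaped)


print(display_form("/Tj%C3%A4nster/"))                # /Tjänster/
print(display_form("/Tj%E4nster/"))                   # /Tj%E4nster/
print(display_form("/Tj%E4nster/", "iso-8859-1"))     # /Tjänster/
```

With a wrong hint the result is merely less readable rather than incorrect, since octets that fail to decode are shown in their original escaped form.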
Yes, you need a special decoder/encoder for displaying URLs. There is both the possibly ACE-encoded host name part and the %-encoded path part. When displaying, every character that the local locale (isprint()) says is printable should be displayed as a character; others should be displayed %-encoded in the non-hostname part. If the host name contains characters that are not displayable, the host name needs to be displayed as the IDNA ACE-encoded name (if IDNA gets selected for use).
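The host name rule proposed above could look roughly like the following Python sketch. It assumes IDNA is the scheme in use (the comment itself leaves that open) and relies on Python's built-in "idna" codec; `display_host` and `display_charset` are hypothetical names, and testing encodability in a charset is used here as a stand-in for the isprint() check.

```python
def display_host(host: str, display_charset: str) -> str:
    """Show the host as-is if the display charset can represent it,
    otherwise fall back to its ACE (xn--...) form."""
    try:
        host.encode(display_charset)
        return host
    except UnicodeEncodeError:
        # Python's "idna" codec converts each label to its ASCII-compatible form.
        return host.encode("idna").decode("ascii")


print(display_host("tjänster.example", "iso-8859-1"))  # shown with "ä"
print(display_host("tjänster.example", "ascii"))       # shown in ACE form
```

The path part would still go through a separate %-escape-aware decoder, since the two parts of the URL use different encodings.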
You're assuming the local locale has something to do with the encoding of the URL... it need not at all.
cc'ing nhotta
Loading the file ÆæØøÅåÄäÖö.html will display it as %C3%86%C3%A6%C3%98%C3%B8%C3%85%C3%A5%C3%84%C3%A4%C3%96%C3%B6.html That's completely unreadable. The URL bar should be considered to be a tiny editor. I need to read things there. There are several instances of this bug, however; bug 137597 is another.
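As a quick check of the example above, the escaped string does decode back to the readable file name when treated as UTF-8. This small Python sketch only illustrates the mapping; the variable names are invented for the example.

```python
from urllib.parse import unquote

escaped = ("%C3%86%C3%A6%C3%98%C3%B8%C3%85%C3%A5"
           "%C3%84%C3%A4%C3%96%C3%B6.html")

# unquote() decodes the escaped octets as UTF-8 by default.
readable = unquote(escaped)
print(readable)  # ÆæØøÅåÄäÖö.html
```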
*** This bug has been marked as a duplicate of 105909 ***
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → DUPLICATE
Product: Core → SeaMonkey