Closed
Bug 84032
Opened 23 years ago
Closed 22 years ago
Add "uriCharsetEncodingHint" field to nsIURI
Categories
(Core :: Networking, defect, P4)
RESOLVED
FIXED
Future
People
(Reporter: nhottanscp, Assigned: neeti)
Details
This was proposed in mozilla netlib newsgroup.
news://news.mozilla.org:119/3AFC62DE.F5CDF5DA%40netscape.com
By adding "hint_charset", charset information will not be lost, so clients can use
that information to apply the appropriate charset conversion. Note that
"hint_charset" may not match the nsIURI internal charset (which is UTF-8).
Comment 1•23 years ago
How about calling it |uriCharsetEncodingHint|? This would help make it clear
that the encoding in question is intended to apply to the URI itself, and not
the thing pointed to by the URI.
Reporter
Comment 2•23 years ago
That sounds good. Changing the summary.
Summary: Add "hint_charset" field to nsIURI → Add "uriCharsetEncodingHint" field to nsIURI
Let me try to understand, this would most likely be acquired from the charset in
the HTML / HTTP response / overridden from View->Encoding menu?
What uses (other than IDN) do you reckon this would be good for?
Reporter
Comment 4•23 years ago
Other cases would be path names, file names.
Comment 6•23 years ago
Bugs targeted at mozilla1.0 without the mozilla1.0 keyword moved to mozilla1.0.1
(you can query for this string to delete spam or retrieve the list of bugs I've
moved)
Target Milestone: mozilla1.0 → mozilla1.0.1
Comment 7•23 years ago
this proposal:
news://news.mozilla.org:119/3AFC62DE.F5CDF5DA%40netscape.com
fails to address several problems:
1) HTTP nsIURI's can be instantiated through redirects in which no charset
information is available. the server may generate a URL in response to a
redirect that contains URL-escaped non-ASCII characters. we have no way of
converting these URLs to UTF-8.
2) also, servers may URL-escape characters that would interfere with parsing a
URL, such as a '/' that is part of a path element and not a path element
delimiter... or a '@' in someone's password. there is a set of reserved
characters that must be URL-escaped, otherwise the URL would fail to parse properly.
in summary, adding a charset attribute to nsIURI is insufficient.
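A minimal Python sketch of the two problems above (an illustration only, not Mozilla code; the paths and the choice of Shift_JIS are assumptions made up for the example). It shows that escaped bytes carry no charset label, and that unescaping a reserved character changes how the path parses:

```python
from urllib.parse import quote, unquote, unquote_to_bytes

# Hypothetical redirect target whose escaped bytes were produced from Shift_JIS.
escaped = "/docs/" + quote("日本", encoding="shift_jis") + ".html"
raw = unquote_to_bytes(escaped)

# Problem 1: the bytes decode under Shift_JIS but are not valid UTF-8,
# and nothing in the URL itself says which charset was used.
as_sjis = raw.decode("shift_jis")
try:
    raw.decode("utf-8")
    utf8_ok = True
except UnicodeDecodeError:
    utf8_ok = False

# Problem 2: unescaping a reserved character changes the path structure.
path = "/files/a%2Fb/c"
segments_escaped = path.split("/")             # "a%2Fb" is one path element
segments_unescaped = unquote(path).split("/")  # now split into "a" and "b"
```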
Reporter
Comment 8•23 years ago
>adding a charset attribute to nsIURI is insufficient.
I agree, but we need the hint charset in order to support existing documents if
we switch to UTF-8 URIs.
Comment 9•23 years ago
nhotta:
the problem is that we cannot switch to a UTF-8 URL in all cases. in some cases
we have no way of converting the unescaped URL to UTF-8. now, that doesn't stop
us from converting the escaped URL to UTF-8, which is of course a no-op. so it
is possible for nsIURI to support UTF-8 w/o requiring that all unescaped URI's
be encoded using UTF-8. URIs for some protocols should simply never be
unescaped. HTTP is an example of one such protocol.
HTTP for example will most likely not use the charset attribute since there is
no way to know in general which charset the escape sequences %80-%FF correspond
to. HTTP URLs shouldn't be unescaped.
there are of course exceptions, and we want to make sure that, in the cases
where charset information does exist, we try to show the user the unescaped URI.
this really means that we should show the user the URI with escape sequences %80
and above unescaped. other escape sequences should probably stay intact since
they could correspond to control characters and other reserved characters that
would either make the URL not display properly or make the URI mean something
entirely different.
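The display rule described above could be sketched roughly as follows in Python. This is an illustration under stated assumptions, not the Mozilla implementation: the function name `unescape_for_display` is hypothetical, and the sketch only handles characters whose bytes are all percent-escaped in one consecutive run. Escapes are decoded with the hint charset only when the run contains a byte of 0x80 or above; any ASCII character that results is re-escaped so reserved and control characters stay intact.

```python
import re

# One or more consecutive %XX escapes.
_ESCAPE_RUN = re.compile(r"(?:%[0-9A-Fa-f]{2})+")

def unescape_for_display(escaped_uri: str, hint_charset: str) -> str:
    """Unescape non-ASCII escape sequences using hint_charset; keep the rest."""
    def replace(match):
        run = match.group(0)
        raw = bytes.fromhex(run.replace("%", ""))
        if all(b < 0x80 for b in raw):
            return run  # pure ASCII escapes (e.g. %2F, %20) stay intact
        try:
            text = raw.decode(hint_charset)
        except (UnicodeDecodeError, LookupError):
            return run  # hint is wrong or unknown: keep the escaped form
        # Re-escape ASCII characters so reserved/control characters stay escaped.
        return "".join(f"%{ord(ch):02X}" if ord(ch) < 0x80 else ch for ch in text)
    return _ESCAPE_RUN.sub(replace, escaped_uri)
```

If the hint does not match the bytes, the URI is returned unchanged, which corresponds to the case where no usable charset information exists.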
Comment 10•23 years ago
As far as I know we store the URL escaped in nsStandardURL. This is a must! We
need to change the escaping to no longer escape chars > 127 by default. On
protocols that need to be in ASCII (could be stored on the protocol information)
we need a second special escaping run for all chars > 127. That can happen just
before sending the request to the server.
URLs as a whole should be unescaped for displaying purpose only.
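A rough Python sketch of the "second escaping run" idea, for illustration only. The function name `escape_non_ascii` is hypothetical, and the UTF-8 default is an assumption; the comment above does not say which charset the bytes should be in. Characters above 127 are percent-escaped just before the request is sent, while existing escapes and ASCII characters are left untouched:

```python
def escape_non_ascii(uri: str, charset: str = "utf-8") -> str:
    """Percent-escape every character above 127; leave ASCII as-is."""
    out = []
    for ch in uri:
        if ord(ch) < 0x80:
            out.append(ch)  # ASCII, including existing %XX escapes, is kept
        else:
            # Encode the character and escape each resulting byte.
            out.extend(f"%{b:02X}" for b in ch.encode(charset))
    return "".join(out)
```

For example, `escape_non_ascii("/a%2Fb/日本.html")` leaves `%2F` alone and escapes only the two non-ASCII characters.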
Reporter
Comment 11•23 years ago
>URLs as a whole should be unescaped for displaying purpose only.
I agree. The hint charset may be used to display if available.
I was talking about the unescaped case. An unescaped non-ASCII URI in a document (e.g.
HREF) is most likely in the charset of the document. I think we don't currently
convert those URIs to UTF-8, but I am not sure whether they are escaped in nsIURI or
left unescaped. In either case, the hint charset would help to display those URIs.
Reporter
Comment 12•22 years ago
This is already available as originCharset in nsIURI.
Status: NEW → RESOLVED
Closed: 22 years ago
Resolution: --- → FIXED