84032 - Add "uriCharsetEncodingHint" field to nsIURI

Reporter

Description

23 years ago

This was proposed in the mozilla netlib newsgroup: news://news.mozilla.org:119/3AFC62DE.F5CDF5DA%40netscape.com. By adding "hint_charset", charset information will not be lost, so clients can use that information to apply the appropriate charset conversion. Note that "hint_charset" may not match the nsIURI internal charset (which is UTF-8).
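
A minimal sketch of the idea, in Python rather than Mozilla's C++/XPIDL: the URI keeps its percent-escaped spec, and a separate hint records the charset the escaped bytes were probably produced in, so a client can decode them for display. `HintedURI`, `charset_hint` and `display_spec` are hypothetical names for illustration, not the nsIURI interface, and the Shift_JIS example assumes the link came from a Shift_JIS page.

```python
from dataclasses import dataclass
from typing import Optional
from urllib.parse import unquote


@dataclass(frozen=True)
class HintedURI:
    """Hypothetical URI holder: an escaped spec plus an optional charset hint.

    The spec stays percent-escaped ASCII; the hint only records which charset
    the escaped bytes were most likely produced in.
    """

    spec: str
    charset_hint: Optional[str] = None

    def display_spec(self) -> str:
        """Unescape for display, using the hint when one is available."""
        if self.charset_hint is None:
            # Without a hint, %80-%FF cannot be interpreted, so show the spec as-is.
            return self.spec
        # Naive: this unescapes every sequence, reserved characters included.
        return unquote(self.spec, encoding=self.charset_hint, errors="replace")


# "%82%A0" is HIRAGANA LETTER A encoded in Shift_JIS.
with_hint = HintedURI("http://example.com/%82%A0.html", charset_hint="shift_jis")
without_hint = HintedURI("http://example.com/%82%A0.html")

shown = with_hint.display_spec()   # 'http://example.com/あ.html'
raw = without_hint.display_spec()  # 'http://example.com/%82%A0.html'
```

The hint changes only how the URI is shown; the stored spec is identical in both objects.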

Dan Mosedale (:dmosedale, :dmose)

Comment 1

23 years ago

How about calling it |uriCharsetEncodingHint|? This would help make it clear that the encoding in question is intended to apply to the URI itself, and not the thing pointed to by the URI.

nhottanscp

Reporter

Comment 2

23 years ago

That sounds good; changing the summary.

Summary: Add "hint_charset" field to nsIURI → Add "uriCharsetEncodingHint" field to nsIURI

neeti

Assignee

Updated

23 years ago

Priority: -- → P4

Target Milestone: --- → mozilla1.0

Wil Tan

Comment 3

23 years ago

Let me try to understand: this would most likely be acquired from the charset in the HTML / HTTP response, or overridden from the View->Encoding menu? What uses (other than IDN) do you reckon this would be good for?

nhottanscp

Reporter

Comment 4

23 years ago

Other cases would be path names and file names.

Dan Mosedale (:dmosedale, :dmose)

Comment 5

23 years ago

Possibly relevant to this is bug 84186.

Asa Dotzler [:asa]

Comment 6

23 years ago

Bugs targeted at mozilla1.0 without the mozilla1.0 keyword moved to mozilla1.0.1 (you can query for this string to delete spam or retrieve the list of bugs I've moved)

Target Milestone: mozilla1.0 → mozilla1.0.1

Darin Fisher

Comment 7

23 years ago

This proposal (news://news.mozilla.org:119/3AFC62DE.F5CDF5DA%40netscape.com) fails to address several problems:

1) HTTP nsIURIs can be instantiated through redirects in which no charset information is available. The server may generate a URL in response to a redirect that contains URL-escaped non-ASCII characters, and we have no way of converting these URLs to UTF-8.

2) Servers may also URL-escape characters that would interfere with parsing a URL, such as a '/' that is part of a path element rather than a path element delimiter, or an '@' in someone's password. There is a set of reserved characters that must be URL-escaped; otherwise the URL would fail to parse properly.

In summary, adding a charset attribute to nsIURI is insufficient.
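
Point 2 can be demonstrated with Python's `urllib.parse`, used here only as a stand-in URL parser (not Mozilla's): once an escaped '/' inside a path element is unescaped, the parser sees a different number of path segments. The `path_segments` helper is hypothetical.

```python
from urllib.parse import quote, unquote, urlsplit


def path_segments(url: str) -> list[str]:
    """Split the path of a URL into its '/'-delimited segments."""
    return urlsplit(url).path.split("/")[1:]


# A single path element whose name contains '/' must be sent as %2F.
segment = quote("a/b", safe="")  # 'a%2Fb'
escaped_url = "http://example.com/docs/" + segment

print(path_segments(escaped_url))           # ['docs', 'a%2Fb']
print(path_segments(unquote(escaped_url)))  # ['docs', 'a', 'b']
```

The escaped form names one resource inside `docs`; the unescaped form names a different, deeper resource, so no charset hint can make blanket unescaping safe.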

nhottanscp

Reporter

Comment 8

23 years ago

>adding a charset attribute to nsIURI is insufficient.

I agree, but we need the hint charset in order to support existing documents if we switch to UTF-8 URIs.

Darin Fisher

Comment 9

23 years ago

nhotta: the problem is that we cannot switch to UTF-8 URLs in all cases; in some cases we have no way of converting the unescaped URL to UTF-8. That doesn't stop us from converting the escaped URL to UTF-8, which is of course a no-op, so it is possible for nsIURI to support UTF-8 without requiring that all unescaped URIs be encoded using UTF-8.

URIs for some protocols should simply never be unescaped, and HTTP is one such protocol. HTTP will most likely not use the charset attribute, since in general there is no way to know which charset the byte sequences %80-%FF are encoded in. HTTP URLs shouldn't be unescaped.

There are, of course, exceptions, and we want to make sure that, in the cases where charset information does exist, we try to show the user the unescaped URI. This really means that we should show the user the URI with escape sequences %80 and above unescaped. Other escape sequences should probably stay intact, since they could correspond to control characters and other reserved characters that would either make the URL display improperly or make the URI mean something entirely different.
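
A rough sketch of that display rule, assuming the charset is known: only runs of escapes in the %80-%FF range are decoded, and everything below %80 stays escaped. `unescape_for_display` is a hypothetical helper, not a Mozilla function; runs that do not decode cleanly in the given charset are left escaped rather than guessed at.

```python
import re
from typing import Optional

# Runs of consecutive escapes whose byte value is 0x80-0xFF.
_HIGH_ESCAPE_RUN = re.compile(r"(?:%[89A-Fa-f][0-9A-Fa-f])+")


def unescape_for_display(spec: str, charset: Optional[str]) -> str:
    """Unescape only %80-%FF sequences for display; keep %00-%7F escaped.

    Escapes below %80 may stand for control or reserved characters
    ('/', '@', '?', ...) whose unescaping would change the URI's meaning.
    Without a charset, nothing is unescaped.
    """
    if charset is None:
        return spec

    def decode_run(match: re.Match) -> str:
        raw = bytes.fromhex(match.group(0).replace("%", ""))
        try:
            return raw.decode(charset)
        except UnicodeDecodeError:
            # Bytes that are not valid in the given charset stay escaped.
            return match.group(0)

    return _HIGH_ESCAPE_RUN.sub(decode_run, spec)


example = "http://example.com/caf%C3%A9%2Fmenu"
shown = unescape_for_display(example, "utf-8")  # 'http://example.com/café%2Fmenu'
```

Because decoding is done per run of high escapes, a multibyte character whose trailing byte falls below 0x80 (possible in Shift_JIS, for example) stays escaped; that errs toward showing the raw form rather than something misleading.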

Andreas Otte

Comment 10

23 years ago

As far as I know we store the URL escaped in nsStandardURL. This is a must! We need to change the escaping to no longer escape chars > 127 by default. On protocols that need to be in ASCII (could be stored on the protocol information) we need a second special escaping run for all chars > 127. That can happen just before sending the request to the server. URLs as a whole should be unescaped for displaying purpose only.
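
The two-step scheme described here might look like this, sketched in Python: the stored form keeps characters above 127 as-is, and a second pass escapes only those characters just before a request goes out, for protocols that need ASCII. `ASCII_ONLY_SCHEMES`, `escape_non_ascii` and `prepare_for_request` are hypothetical, and encoding the escaped characters as UTF-8 by default is an assumption; the comment does not say which charset the second pass should use.

```python
# Hypothetical per-protocol flag: schemes whose requests must be pure ASCII.
ASCII_ONLY_SCHEMES = frozenset({"http", "https", "ftp"})


def escape_non_ascii(spec: str, charset: str = "utf-8") -> str:
    """Percent-escape only characters above U+007F; leave ASCII untouched."""
    parts = []
    for ch in spec:
        if ord(ch) < 0x80:
            parts.append(ch)
        else:
            parts.extend(f"%{byte:02X}" for byte in ch.encode(charset))
    return "".join(parts)


def prepare_for_request(spec: str, charset: str = "utf-8") -> str:
    """Second escaping pass, run just before the request is sent."""
    scheme = spec.split(":", 1)[0].lower()
    if scheme in ASCII_ONLY_SCHEMES:
        return escape_non_ascii(spec, charset)
    return spec


stored = "http://example.com/caf\u00e9/a%2Fb"   # stored form: 'é' kept, '%2F' kept
on_wire = prepare_for_request(stored)           # 'http://example.com/caf%C3%A9/a%2Fb'
legacy = prepare_for_request(stored, "latin-1")  # 'http://example.com/caf%E9/a%2Fb'
```

Existing escapes such as `%2F` pass through untouched, since '%' and hex digits are ASCII. A character that cannot be represented in the chosen charset raises `UnicodeEncodeError` rather than being silently dropped.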

nhottanscp

Reporter

Comment 11

23 years ago

>URLs as a whole should be unescaped for displaying purpose only.

I agree. The hint charset may be used for display if available. I was talking about the unescaped case: an unescaped non-ASCII URI in a document (e.g. an HREF) is most likely in the charset of the document. I think we don't currently convert those URIs to UTF-8, but I am not sure whether they are escaped in nsIURI or left unescaped. In either case, the hint charset would help to display those URIs.
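
To illustrate why the document charset matters for an unescaped HREF: the same text escapes to different bytes depending on the charset used, and only decoding with the matching charset gives the original text back. `escape_href` is a hypothetical helper, and the Shift_JIS page is an assumed example.

```python
from urllib.parse import quote, unquote


def escape_href(href: str, doc_charset: str) -> tuple[str, str]:
    """Escape a raw HREF in the document's charset; return (spec, charset hint)."""
    return quote(href, encoding=doc_charset), doc_charset


href = "/\u3042.html"  # the unescaped HREF text "/あ.html" as it appeared in the page

spec, hint = escape_href(href, "shift_jis")  # ('/%82%A0.html', 'shift_jis')
utf8_spec = quote(href, encoding="utf-8")    # '/%E3%81%82.html'

# With the hint, the display form round-trips back to what the author wrote.
assert unquote(spec, encoding=hint) == href
```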

neeti

Assignee

Updated

22 years ago

Target Milestone: mozilla1.0.1 → Future

nhottanscp

Reporter

Comment 12

22 years ago

This is already available as originCharset in nsIURI.

Status: NEW → RESOLVED

Closed: 22 years ago

Resolution: --- → FIXED

Categories

(Core :: Networking, defect, P4)

People

(Reporter: nhottanscp, Assigned: neeti)

Security

(public)
