<a class="header-button" href="https://bugzilla-dev.allizom.org/home" title="Go to home page"> Bugzilla

Comment 2

•

23 years ago

It's a start. Who can we add from i18n?

Asa Dotzler [:asa]

Comment 3

•

23 years ago

shotgun cc:

Roy Yokoyama

Updated

•

23 years ago

Blocks: 86948

Comment 4

•

23 years ago

No, this is not a necko issue; this is an i18n issue. Reassigning to ftang for now. The problem is: how can we know that we want to encode to ISO-8859-1 instead of UTF-8 in this case?

Assignee: neeti → ftang

Reporter

Comment 5

•

23 years ago

This is nearly the same problem as when entering the URL http://www.mozilla.org/htdig-cgi/htsearch?words=müll But I agree: you don't know whether you have to convert it to ISO8859-1 or ISO8859-9 or whatever. But to choose ISO8859-1 seems to make more sense than to choose UTF8 which is never correct.
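To make the difference concrete, here is a minimal Python sketch (an illustration only, not Mozilla code) that percent-encodes the search word from the URL above under both candidate charsets with the standard library's `urllib.parse.quote`. The variable names are made up for the example.

```python
from urllib.parse import quote

word = "müll"

# Percent-encode the same word under two candidate charsets.
latin1_escaped = quote(word, encoding="iso-8859-1")
utf8_escaped = quote(word, encoding="utf-8")

print(latin1_escaped)  # m%FCll
print(utf8_escaped)    # m%C3%BCll
```

The server only sees the escaped bytes, so a CGI that expects one form will not match the other.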

Asa Dotzler [:asa]

Comment 6

•

23 years ago

setting bug status to New

Status: UNCONFIRMED → NEW

Ever confirmed: true

Comment 7

•

23 years ago

> But to choose ISO8859-1 seems to make more sense than to choose UTF8 which
> is never correct.

First, read:
http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2277.html
http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2718.html
http://www.cis.ohio-state.edu/cgi-bin/rfc/rfc2640.html
http://www.w3.org/International/2000/03/draft-masinter-url-i18n-05.txt
http://www.w3.org/TR/REC-html40/appendix/notes.html#h-B.2.1
http://www.w3.org/TR/REC-xml#sec-external-ent

Reporter

Comment 8

•

23 years ago

Sorry, I was not correct. If a page is encoded in UTF8, the contents of forms are of course sent as UTF8 encoded. But pages encoded in UTF8 are the minority, I guess.

Comment 9

•

23 years ago

Yes, but there are many, many pages that do not use ISO-8859-1. For example, Chinese pages use BIG5 or GB2312, and Japanese pages use Shift_JIS or EUC-JP. This is not a binary decision between ISO-8859-1 and UTF-8. It is a multiple-choice decision.

Comment 10

•

23 years ago

How does IE handle this? Should we build a default URL bar encoding into prefs and allow the user to change it from prefs?

Comment 11

•

23 years ago

Can someone update the summary to be more descriptive? Like "URL: encoding in bookmarks is <a> when it should be <b>"

Reporter

Updated

•

23 years ago

Summary: URL encoding not correct → URL search part is always encoded UTF8

bobj

Comment 12

•

23 years ago

I tried typing this in the location bar and hitting return:
http://www.google.de/search?charset=UTF-8&q=w%C3%BCste

Interestingly, google.de returns good results, but it mislabels the results page:
<meta HTTP-EQUIV="content-type" CONTENT="text/html; charset=ISO-8859-1">

But if you override the above tag using the menu View|Character Coding|More|Unicode (UTF-8), then the results page looks fine and you can see that Google did find hits for "wüste".

So, we could fix the client by adding the charset parameter to the query string. Will this work for all servers? But we also need Google (and others) to correctly label the charset of the results page. cc'ing Kat Momoi in evangelism.
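The mislabeling effect can be reproduced outside the browser. The following is only a rough Python sketch using `urllib.parse.unquote`: it decodes the escaped query value from the URL above once as UTF-8 and once as ISO-8859-1, which is roughly what happens when UTF-8 bytes are displayed under an ISO-8859-1 label.

```python
from urllib.parse import unquote

escaped = "w%C3%BCste"  # query value from the URL above

# Same bytes, two interpretations.
as_utf8 = unquote(escaped, encoding="utf-8")
as_latin1 = unquote(escaped, encoding="iso-8859-1")

print(as_utf8)    # wüste
print(as_latin1)  # wÃ¼ste
```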

Reporter

Comment 13

•

23 years ago

> Interestingly, google.de returns good results

Google only returns pages encoded in UTF8. That is not the result one would normally want! This is a Google bug. The "charset=" parameter does not have any influence, since it depends on the underlying CGI.

If a form is in a page with UTF8 encoding, the form generates a UTF8-encoded search string. If the form is in an ISO8859-1 encoded page, it generates an ISO8859-1 encoded search string. If one enters the search string manually and does not know the encoding that is expected by the CGI, the search string should not be re-encoded; just leave it as it is on that machine.
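As a rough illustration of the form rule described in this comment, here is a small Python sketch. `form_query` is a hypothetical helper invented for the example (not a browser API); it encodes form fields in whatever charset the containing page uses, via the standard `urllib.parse.urlencode`.

```python
from urllib.parse import urlencode

def form_query(fields: dict, page_charset: str) -> str:
    """Encode form fields using the charset of the page that holds the form."""
    return urlencode(fields, encoding=page_charset)

utf8_page = form_query({"words": "müll"}, "utf-8")
latin1_page = form_query({"words": "müll"}, "iso-8859-1")

print(utf8_page)    # words=m%C3%BCll
print(latin1_page)  # words=m%FCll
```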

Katsuhiko Momoi

Comment 14

•

23 years ago

> How does IE handle this?

It seems to send all search requests from the Address Bar to an MSN search engine. So, in essence, they control what is sent. I tried a Latin 1 string and it did not use the UTF-8 %-escaped format. It was 8859-1 in %-escaped format.

> Should we build a default URL bar encoding into prefs and
> allow the user to change it from prefs?

IE has such an option to send URLs as UTF-8 (on by default except in Asian versions of IE). A good solution to this problem needs to include the following considerations:

1. An option to turn UTF-8 URL encoding ON or OFF for input from the URL bar. In case the input is a search string as opposed to a normal URL, provide a fallback encoding for when there is no info for the target search engines/CGIs.
2. An easier way to choose the encoding for different search engines. Currently this is set via RDF files for each search engine. This approach has limitations: users are unable to change values easily.
3. Possible support for the emerging CNRP (Common Name Resolution Protocol), which uses UTF-8, for keyword search. http://www.ietf.org/ids.by.wg/cnrp.html

Are there other requirements?
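One possible way to combine the first two considerations is sketched below in Python. Everything here is an assumption made for illustration: the function name, its parameters, and the resolution order (per-engine charset first, then the UTF-8 option, then the fallback) are not taken from Mozilla or IE.

```python
from typing import Optional
from urllib.parse import quote

def pick_url_bar_charset(engine_charset: Optional[str],
                         send_utf8: bool,
                         fallback_charset: str) -> str:
    """Hypothetical resolution order for text typed into the URL bar."""
    if engine_charset:       # a per-engine setting wins when it is known
        return engine_charset
    if send_utf8:            # the user left the UTF-8 option ON
        return "utf-8"
    return fallback_charset  # nothing else is known about the target

charset = pick_url_bar_charset(None, send_utf8=False,
                               fallback_charset="iso-8859-1")
escaped = quote("wüste", encoding=charset)
print(charset, escaped)  # iso-8859-1 w%FCste
```

Whether the per-engine value or the user option should take precedence is a design choice this sketch does not settle.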

Andreas Becker

Updated

•

23 years ago

Keywords: intl

Updated

•

23 years ago

Status: NEW → ASSIGNED

Manfred Lebek

Comment 15

•

23 years ago

Are German umlauts allowed in links anyway?

Comment 16

•

23 years ago

URL-related issue. Giving to nhotta.

Assignee: ftang → nhotta

Severity: minor → major

Status: ASSIGNED → NEW

Priority: -- → P3

Target Milestone: --- → mozilla0.9.7

nhottanscp

Comment 17

•

23 years ago

Currently only ASCII is allowed in URLs. Characters above 127 have to be escaped. I don't think there is any standard for which character set to use for URLs. There is a similar bug, bug 105909. I think UTF-8 is the way to support as many characters as possible. But depending on the situation, other character sets might be preferred. In bug 105909, I proposed specifying the character set as a pref. Another possibility is to provide an option to use the current document's charset.

Status: NEW → ASSIGNED

Keywords: mozilla1.0

Target Milestone: mozilla0.9.7 → mozilla1.0

nhottanscp

Updated

•

23 years ago

Target Milestone: mozilla1.0 → mozilla1.2

John Levon

Comment 18

•

23 years ago

*** Bug 135763 has been marked as a duplicate of this bug. ***

Ilya Konstantinov

Comment 19

•

23 years ago

Google now supports ie=encoding (input encoding) and oe=encoding (output encoding) fields in its search requests, so we can simply set those to UTF-8 (ie=UTF-8&oe=UTF-8) and be done with it. For other search engines, we should introduce some way to define, in the search engine definition file, which encoding the input should be encoded in.
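A minimal Python sketch of such a request follows. It assumes the `ie` and `oe` fields behave as described in this comment and reuses the google.de search URL from comment 12; `google_search_url` is a hypothetical helper name.

```python
from urllib.parse import urlencode

def google_search_url(term: str) -> str:
    """Build a search URL that declares UTF-8 for both input and output."""
    params = {"q": term, "ie": "UTF-8", "oe": "UTF-8"}
    return "http://www.google.de/search?" + urlencode(params, encoding="utf-8")

url = google_search_url("wüste")
print(url)
# http://www.google.de/search?q=w%C3%BCste&ie=UTF-8&oe=UTF-8
```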

Comment 20

•

23 years ago

->bookmarks Actually, this is a specific issue for each search engine, internet keywords, and bookmarks. The original problem was about a custom keyword in a bookmark. Other areas should be filed as separate bugs.

Assignee: nhotta → new-network-bugs

Status: ASSIGNED → NEW

Reporter

Comment 21

•

23 years ago

With RC1, at least under OS/2, characters above 128 are no longer encoded in UTF8, but in CP437. But for this case, the ie= parameter from Google is nevertheless useful. Where is it documented?
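For comparison, this short Python sketch shows how a single umlaut escapes under CP437 versus the two encodings discussed earlier. It only illustrates the byte values; it does not reproduce the OS/2 build's behavior.

```python
from urllib.parse import quote

char = "ü"

# The same character yields a different escape under each charset.
cp437_escaped = quote(char, encoding="cp437")
latin1_escaped = quote(char, encoding="iso-8859-1")
utf8_escaped = quote(char, encoding="utf-8")

print(cp437_escaped)   # %81
print(latin1_escaped)  # %FC
print(utf8_escaped)    # %C3%BC
```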

Ilya Konstantinov

Comment 22

•

23 years ago

It's not documented anywhere on Google's site, but it is a part of their SOAP API. I still believe we should use it in Mozilla until someone from Google confirms this argument is going to stay during further site improvements. I mailed them (Google) a few days ago and they haven't answered me yet. If someone has a shortcut to get a quicker response, that would be nice :)

Reporter

Comment 23

•

23 years ago

Maybe http://groups.google.de/groups?hl=de&group=google.public.support.general is the right place to ask.

Comment 24

•

23 years ago

really moving to bookmarks.

Component: Networking → Bookmarks

Comment 25

•

23 years ago

okay.

Assignee: new-network-bugs → ben

QA Contact: benc → claudius

Reporter

Comment 26

•

22 years ago

Why is the encoding no longer UTF8? (1.0 final)