Closed Bug 119825 Opened 23 years ago Closed 21 years ago

URL (location) bar Search Feature ignores national encoding (google)

Categories

(Core :: Internationalization, defect)

Platform: x86, All
Type: defect
Priority: Not set
Severity: normal

Tracking


RESOLVED FIXED

People

(Reporter: M.Hankus, Assigned: jshin1987)

Details

(Keywords: fixed1.6, intl)

Attachments

(1 file)

Linux build 2002011108. I use the search feature of the URL bar, and I noticed that the URL bar ignores the national encoding of the entered text. As an example, I use ISO-8859-2, with Google as my preferred search engine. When I enter something in the URL bar and select search, Mozilla queries Google with http://www.google.com/search?q=%3F%F3%3F%3F&sourceid=mozilla-search but when I open Google and enter the same text in its form, I get the query string http://www.google.com/search?q=%BF%F3%B3%E6&hl=pl&btnG=Szukaj+z+Google so the results are completely different.
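Both query strings can be reproduced with Python's standard `urllib.parse.quote`. This is a sketch under an assumption the report does not state: that the location bar encoded the text as ISO-8859-1 and substituted '?' for characters it could not map. The report does not give the typed word either; the sample text here is obtained by decoding the Google form's bytes (%BF%F3%B3%E6) as ISO-8859-2.

```python
from urllib.parse import quote, unquote_to_bytes

# Recover the typed text from the Google form's query bytes (ISO-8859-2).
typed = unquote_to_bytes("%BF%F3%B3%E6").decode("iso-8859-2")

# Correct path: encode in the user's charset, then percent-escape the bytes.
good = quote(typed, encoding="iso-8859-2")

# Assumed buggy path: encode as ISO-8859-1 with replacement. Characters
# outside Latin-1 become '?' (0x3F), which is then escaped as %3F.
bad = quote(typed, encoding="iso-8859-1", errors="replace")

print(typed, good, bad)
```

Only 'ó' exists in ISO-8859-1, which is why it is the one byte (%F3) that survives in the location bar's query.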
*** Bug 118339 has been marked as a duplicate of this bug. ***
It might be more general, because the Search tab in the Sidebar behaves the same way as the URL bar. In the case of ISO-8859-2, all non-ASCII chars are converted to %3F.
*** Bug 131126 has been marked as a duplicate of this bug. ***
It bothers me on Win2k too.
Can someone reproduce this on 1.0RC1?
It disappeared in Win2K (it was there in 0.9.9).
I can reproduce it in 2002041903 on Windows 98SE. I have not tested RC1.
On Linux it is reproducible with the RC1 build, as well as with 2002042121 (Linux).
*** Bug 124588 has been marked as a duplicate of this bug. ***
*** Bug 141393 has been marked as a duplicate of this bug. ***
*** Bug 141841 has been marked as a duplicate of this bug. ***
Verified with Hebrew characters and BeOS (1.0 RC1.0 - 2002050509). Searching using the Google homepage worked fine, giving 16000 results:
http://www.google.com/search?hl=en&q=%26%231496%3B%26%231511%3B%26%231505%3B%26%231496%3B&btnG=Google+Search
The URL search for the same string returned no results:
http://www.google.com/search?q=%3F%3F%3F%3F&sourceid=mozilla-search
Request to change OS from Linux to All.
Can confirm this bug on Win2K in RC3, using Cyrillic. Searching with the Google form is fine; searching through the URL bar sends all characters as %3F, which obviously screws up the search.
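Decoding the two q= values above with Python's standard library shows that they carry different things. The homepage form sent the word as HTML numeric character references (presumably because the form page's encoding could not represent Hebrew; that is an inference, not something stated in the comment), while the location bar had already reduced it to question marks. A small sketch:

```python
import html
from urllib.parse import unquote

# q= value produced by the Google homepage form.
q_form = "%26%231496%3B%26%231511%3B%26%231505%3B%26%231496%3B"

ncrs = unquote(q_form)      # percent-escapes -> '&#NNNN;' references
word = html.unescape(ncrs)  # numeric character references -> Hebrew letters

# q= value produced by the location bar: the text is already gone.
q_urlbar = unquote("%3F%3F%3F%3F")

print(ncrs, word, q_urlbar)
```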
*** Bug 143838 has been marked as a duplicate of this bug. ***
OS: Linux → All
*** Bug 136858 has been marked as a duplicate of this bug. ***
changing component
Assignee: hewitt → yokoyama
Component: URL Bar → Internationalization
QA Contact: claudius → ruixu
can we assume this has been confirmed then? :)
Status: UNCONFIRMED → NEW
Ever confirmed: true
Keywords: intl
QA Contact: ruixu → kasumi
-> nhotta
Assignee: yokoyama → nhotta
*** Bug 152065 has been marked as a duplicate of this bug. ***
*** Bug 153487 has been marked as a duplicate of this bug. ***
The search description file defaults to ISO-8859-1. http://lxr.mozilla.org/seamonkey/source/xpfe/components/search/datasets/google.src Adding bobj to cc. He was trying to send UTF-8 for google search.
It looks like it is fixed now (it works for me with Linux build 2002071911).
*** Bug 144939 has been marked as a duplicate of this bug. ***
cc nhotta
Bug 161181 is about the google.src change.
Status: NEW → ASSIGNED
I'm not sure if anything has changed, but Linux build 2002080321 worked fine, and 2002080508 is not working (I just installed the latest build).
Summary: URL bar Search Feature ignores national encoding → URL (location) bar Search Feature ignores national encoding (google)
*** Bug 155386 has been marked as a duplicate of this bug. ***
*** Bug 128224 has been marked as a duplicate of this bug. ***
*** Bug 149029 has been marked as a duplicate of this bug. ***
So many dups here. The latest one is bug 155386, which was reported 07/02/2002. As Mirek mentioned in comment #27, Mirek tested 2002080321 and it works. I tested the 2002101805 build; it works also. Mirek: could you please test with the latest build?
For me it has been working fine for some time (also with Linux build 2002121922).
Some time? Not all the time?
Since many bugs have been merged into this one, I have to describe all my observations, although I really doubt all of these are simply one bug. I'm using yesterday's nightly build (English) for Windows, running on Win2k English.
1) If you search for "中文" in the sidebar, it returns no results.
2) If you search for "中文" in the address bar, e.g. http://www.google.com/search?hl=en&ie=UTF-8&oe=UTF-8&q=中文&btnG=Google+Search , it translates "中文" to "%D6%D0%CE%C4" and gets no results back. It should translate it to "%E4%B8%AD%E6%96%87" and get plenty of results.
3) If you highlight "中文" in the browser, right-click and select Web Search, it translates it to http://www.google.com/search?q=%3F%3F&sourceid=mozilla-search&start=0&start=0
In short, all Mozilla-based Chinese searches failed :(
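The three escaped forms in this comment correspond to three different byte encodings of the same two characters. A sketch with Python's `urllib.parse.quote`; attributing point 3 to a Latin-1 encode with '?' substitution is an assumption that matches the symptom, not something the comment establishes.

```python
from urllib.parse import quote

text = "中文"

# Point 2: the address bar escaped the GB2312 bytes of the text.
as_gb2312 = quote(text, encoding="gb2312")

# What a URL carrying ie=UTF-8 expects: the escaped UTF-8 bytes.
as_utf8 = quote(text, encoding="utf-8")

# Point 3 (assumed cause): Latin-1 encode, unmappable characters -> '?'.
as_latin1 = quote(text, encoding="iso-8859-1", errors="replace")

print(as_gb2312, as_utf8, as_latin1)
```

Because the URL in point 2 says ie=UTF-8, Google decodes the GB2312 bytes as if they were UTF-8, which explains the empty result.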
Has bug 145375 affected this one?
QA Contact: kasumi → cpetersen0953
It appears that search in both the sidebar and the URL bar is affected by the Preferences | Navigator | Language setting. However, they're affected differently.

In the URL bar, what the user enters is correctly converted to UTF-8 no matter which language is at the top of the pref. lang. list. That is, if I type U+AC00 and U+AC01, the URL of the search result shown in the URL bar contains '%ea%b0%80%ea%b0%81' (the URL-escaped UTF-8 representation of <U+AC00><U+AC01>). Moreover, what I can type in the URL bar is NOT restricted by the repertoire of the locale charset (at least under Win2k; I guess the same is true of Moz-Linux, at least under an ll_CC.UTF-8 locale). However, the search result page is all mangled (the hits themselves appear correct, though, if the actual search engine used, as opposed to the meta search server, supports UTF-8). Changing the character coding (to EUC-KR) doesn't help. With 'ko' at the top of the list, the search result (for a Korean word) is properly rendered. Given this, the problem is not on Mozilla's side but on the server side. The server converts the search result into the legacy MIME charset primarily associated with the language at the top of the list. If it's English, the 'search server' assumes the result is in ISO-8859-1 although it is actually in EUC-KR. Converting EUC-KR to UTF-8 on the assumption that it's ISO-8859-1 leads to a lot of question marks.

Two things have to be done:
1. The 'meta search server' should store everything in UTF-8 in its DB.
2. When sending back the result, it should just hand over the result without any conversion, regardless of the preferred language setting.
These will make multilingual search possible.

The search in the sidebar behaves differently, and for this Mozilla is also to blame, because Mozilla does not convert the input to UTF-8. It only works when the language of the keywords entered matches the language at the top of the preferred lang. list. This is definitely an item for the I18N release notes.
To make search in language 'X' work correctly, that language has to be at the top of the preferred language list in Pref|Navigator|Language. Matt, can you move zh(-CN) or zh-TW to the top and see what you get?
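The statement that '%ea%b0%80%ea%b0%81' is the URL-escaped UTF-8 form of <U+AC00><U+AC01> can be checked with Python's standard library. A minimal sketch; note that `quote` emits uppercase hex digits, while hex digits in percent-escapes are case-insensitive, so the lowercase form seen in the URL bar decodes to the same text.

```python
from urllib.parse import quote, unquote

keyword = "\uac00\uac01"  # U+AC00, U+AC01

# Encode as UTF-8 and percent-escape (Python uses uppercase hex).
escaped = quote(keyword)

# Decode the lowercase form copied from the URL bar.
decoded = unquote("%ea%b0%80%ea%b0%81")

print(escaped, decoded == keyword)
```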
re: comment #12
> On BeOS...
> The URL search for the same string returned no results:
> http://www.google.com/search?q=%3F%3F%3F%3F&sourceid=mozilla-search
Is it still the case that Hebrew characters typed in the URL bar turn into '?' (U+003F) even with Hebrew at the top of your preferred lang. list? What's the locale under which you run Mozilla (if BeOS has such a thing)? It might have to do with Unicode-based systems (Win2k/XP and Linux with a UTF-8 locale) vs. legacy-encoding-based systems (Win9x/ME and Linux with locales using legacy encodings).
With the Google sherlock file updated, the search sidebar works perfectly well for Google regardless of what's at the top of the preferred lang. list. I tested en-US Mozilla under Win2k (KO) with zh-CN at the top of the pref. lang. list. Both a Korean word and a Greek word worked well with Google. (For Greek, I used a letter NOT representable in EUC-KR: I tried 'Καλωσήλθατε'. CJK legacy character sets cover modern Greek letters without diacritic marks, but not those with diacritic marks such as 'ή', U+03AE, eta with tonos.)

However, search in the location (URL) bar doesn't work so well. When I typed '가각' (U+AC00, U+AC01; set View|Character Coding to UTF-8 to see the word) in the location bar with zh-CN at the top of the pref. lang. list, I got no results, with a URL in the location bar that reads:
http://search-intl.netscape.com/zh-cn/google.tmpl?cp=clkzhcnsrp&charset=UTF-8&search=%EA%B0%80%EA%B0%81&lr=lang_zh-CN
'%EA%B0%80%EA%B0%81' is the correct UTF-8 representation of '가각' (U+AC00, U+AC01), so the URL seems to be right. Most likely google.tmpl at http://search-intl.netscape.com is to blame: it assumes that lang=zh-CN means the character repertoire should be restricted to that of GB2312.

With 'ko' at the top, I expected '가각' in the location bar to work fine. I was surprised to find that it does not. Note that the URL below has a different format from the one that appeared with zh-CN as the most preferred lang. Notably, 'ko/' is missing before 'google.tmpl' and '&lr=lang_ko' is missing after search.
http://search-intl.netscape.com/google.tmpl?cp=clkkosrp&charset=UTF-8&all=yes&cat=World/Korean&search=%EA%B0%80%EA%B0%81
When I manually fixed up the URL as follows, it worked.
http://search-intl.netscape.com/ko/google.tmpl?cp=clkkosrp&charset=UTF-8&cat=World/Korean&search=%EA%B0%80%EA%B0%81&lr=lang_ko
So, this problem with Korean has to be fixed on Mozilla's side.

Next I put Greek (el) at the top of my pref. lang. list and tried 'Καλωσήλθατε'. The hits seem to be correct, but the result page looked totally garbled. The URL used was
http://search.netscape.com/nscp_results.adp?query=%ce%9a%ce%b1%ce%bb%cf%89%cf%83%ce%ae%ce%bb%ce%b8%ce%b1%cf%84%ce%b5&source=NSCPRedirect
The URL-escaped string after query= is the correct representation of 'Καλωσήλθατε'.
http://search-intl.netscape.com/el/google.tmpl?cp=clkelsrp&charset=UTF-8&search=%ce%9a%ce%b1%ce%bb%cf%89%cf%83%ce%ae%ce%bb%ce%b8%ce%b1%cf%84%ce%b5&lr=lang_el
Greek was not so lucky: fixing up the URL like the above didn't work. So, this is another 'meta search server' issue. There's no 'el/google.tmpl' for Greek. I don't know why the 'meta search server' cannot simply fall back to the English version if the localized version of 'google.tmpl' is not available on the server. Google supports a large number of languages, and the 'meta search server' should be able to act as a bridge between Google's multilingual search and the location bar.
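A quick way to confirm that the query= value decodes to the intended keyword is Python's `urllib.parse.parse_qs`, which decodes percent-escapes as UTF-8 by default. A sketch using the nscp_results.adp URL quoted above:

```python
from urllib.parse import parse_qs, urlsplit

url = (
    "http://search.netscape.com/nscp_results.adp?"
    "query=%ce%9a%ce%b1%ce%bb%cf%89%cf%83%ce%ae%ce%bb%ce%b8%ce%b1%cf%84%ce%b5"
    "&source=NSCPRedirect"
)

# parse_qs returns a dict mapping each parameter name to a list of values;
# percent-escaped bytes are decoded as UTF-8 (the default encoding).
params = parse_qs(urlsplit(url).query)

print(params["query"][0], params["source"])
```

That the client side gets this right is what points the blame at the server for the garbled result page.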
> With 'ko' at the top, I expected '가각' in the location bar
> to work fine. I was surprised to find that it does not.
Somehow it began to work (with /ko/google.tmpl?....)
> So, this problem with Korean has to be fixed on the Mozilla's side
This turned out to be wrong. Most, if not all, fixes have to be done on the server side (keyword.netscape.com). keyword.netscape.com determines which 'meta server' to call, with what parameters, depending on the value of the Accept-Language HTTP header (which comes from the client's pref. lang. list) and maybe other parameters handed over from Mozilla. I don't know how keyword.netscape.com determines which meta-search server to redirect incoming requests to based on Accept-Language (can it be configured on the client side?). There seem to be three classes of 'meta search servers':
1. http://search-intl.netscape.com/ll-CC/google.tmpl : This one works well if 'll-CC' matches the first element in Accept-Language. However, it seems to be used only when one of the CJK languages is at the top of the pref. lang. list. Even then, there's a problem: it makes an invalid association between ll-CC and a MIME charset and replaces characters outside the repertoire of the associated MIME charset with question marks. That is, when I include eta with tonos (ή) with ko as my pref. language, it becomes '?'. This one should be the easiest to fix, because Google supports multilingual search very well and the sidebar search already works well. Perhaps this is a server-side complement of the fix for bug 145375 (which was done on the client side).
2. The second category is completely broken: www.netscape.fr (used with fr as my pref. language) and suche.netscape.de (for German). They seem to interpret the UTF-8 sequence as a Windows-1252 sequence: when I gave '가' (U+AC00: 0xEA 0xB0 0x80), it searched for U+00EA, U+00B0, U+20AC (ê°€) instead. This means they don't even work for French and German keywords if there's even a single character outside US-ASCII.
I just tried Österreich with 'de' as my pref. language, and suche.netscape.de looked for Ã–sterreich instead. Note that Ö in UTF-8 is 0xC3 0x96, which turns into Ã– when interpreted as Windows-1252.
3. The third category is search.netscape.com/nscp_results.adp. It appears to be used when the first element in Accept-Language is English or another language for which there's no dedicated meta-search server. At the moment, the latter group includes Russian and Greek, among many other languages. This is a curious case.
a. With Russian or Greek as my pref. lang.: when I give keywords not covered by US-ASCII, the search script running there interprets incoming UTF-8 sequences correctly as UTF-8, judging from the fact that the pre-filled search box (for retry) on the result page preserves the input string intact. It also comes up with some relevant hits. For instance, it returns sites like http://www.vienna.at for Österreich with 'ru' as my pref. lang. For 'Καλωσήλθατε' with Greek, some Greek sites are returned. However, characters outside US-ASCII are all rendered as question marks. If I try a Chinese/Japanese/Korean keyword, a couple of hits on the first page are relevant while others appear to be off the mark. A really funny thing happened when I gave 'Österreich' with the Russian pref. and manually switched to Windows-1252. The prefilled keyword for retry turned from Österreich into Ã–sterreich, which is perfectly understandable. The strange thing is that there is a mix of hits, some with Österreich and others with Ã–sterreich. Apparently, what's stored in the DB for search.netscape.com is a mixture of data in UTF-8 (or a legacy encoding with the proper encoding tag) and data in a legacy encoding (with no or a wrong encoding tag).
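The Windows-1252 misreading in category 2 (and the Österreich retry box in category 3) can be reproduced in a few lines of Python. This sketches the symptom only; how those servers actually decoded their input is not known from this bug. The `repair` helper is hypothetical, shown to illustrate that this kind of mojibake is reversible.

```python
def misread_as_cp1252(text: str) -> str:
    """Encode text as UTF-8, then wrongly decode those bytes as Windows-1252.

    Python's cp1252 codec raises on the five bytes Windows-1252 leaves
    undefined (0x81, 0x8D, 0x8F, 0x90, 0x9D); the inputs below avoid them.
    """
    return text.encode("utf-8").decode("cp1252")


def repair(mojibake: str) -> str:
    """Undo the misreading: re-encode as Windows-1252, decode as UTF-8."""
    return mojibake.encode("cp1252").decode("utf-8")


hangul = misread_as_cp1252("\uac00")       # '가' (0xEA 0xB0 0x80) -> 'ê°€'
austria = misread_as_cp1252("Österreich")  # 0xC3 0x96 -> 'Ã–'

print(hangul, austria, repair(austria))
```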
The simplest fix (at least when Google is the preferred search engine for the sidebar search) may be to make keyword.netscape.com redirect all keyword searches to search-intl.netscape.com/xx/google.tmpl instead of the language-specific servers (which don't even work for their target languages) and search.netscape.com/nscp_results.adp. And, needless to say, the google.tmpl script should not restrict the repertoire to that of legacy encodings; instead, it should allow any character in Unicode.
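To make "restrict the repertoire to that of legacy encodings" concrete, here is a sketch with Python codecs. It assumes, based only on the symptoms reported above (ή becoming '?' with ko as the pref. language), that the server forces the keyword into EUC-KR with replacement; the server's actual code was never seen.

```python
word = "Καλωσήλθατε"

# Assumed current behaviour: squeeze the keyword into the legacy charset tied
# to the preferred language ('ko' -> EUC-KR), replacing unmappable characters.
restricted = word.encode("euc_kr", errors="replace").decode("euc_kr")

# Suggested behaviour: keep the keyword in UTF-8 end to end, so any Unicode
# character survives the round trip.
unrestricted = word.encode("utf-8").decode("utf-8")

print(restricted)
print(unrestricted)
```

EUC-KR covers the unaccented Greek letters, so only the eta with tonos is lost in the restricted path.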
Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.6b) Gecko/20031120
I have a probably related problem: when I try to search (in both the Sidebar and the address bar, but not with the additional MozillaPL XUL applet) for a word with Polish diacritical chars in it, it gets messed up.
word: moździerz
Address bar/Sidebar (broken): http://www.google.com/search?q=mo%25u017Adzierz&sourceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8
Google XUL applet (works): http://www.google.com/search?q=mo%C5%BAdzierz&ie=utf8&oe=utf8&sourceid=mozilla-xul
No problem in Mozilla Firebird (with default-charset set to iso8859-2):
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6b) Gecko/20031206 Firebird/0.7+
Got: http://www.google.com/search?q=mo%C5%BAdzierz&sourceid=mozilla-search&start=0&start=0&ie=utf-8&oe=utf-8
Sorry for the spam - I triple-checked and the browser had ISO8859-1 set as the default charset. Setting it to ISO8859-2 doesn't make the problem show up, though.
As I wrote in comment #39, it's still broken (in some cases it works, while in other cases it doesn't), NOT because Mozilla (as a client) does anything wrong BUT because keyword.netscape.com (search.netscape.com) is broken. Presumably, search.netscape.com/keyword.netscape.com is not under the control of mozilla.org anymore. asa, I'm sorry to bother you, but what's mozilla.org's plan for the keyword server(s)? There's nothing we can do on the 'client side', and a relatively simple fix on the server side would fix the problem (comment #39). I'm tempted to change the product field to 'mozilla.org'. For better tracking, I'm assigning this to myself, but it should eventually be reassigned to someone who can fix things on the server side.
P.S. Everyone who wants to post to this bug has to set the character coding in the View menu to UTF-8 _before_ posting to avoid characters outside the repertoire of the current character encoding turn to NCRs (&#12345;) as in comment #34.
re: comment #40. That was a 'transitive' bug. We fixed our escape/unescape to be compliant with the ECMAScript standard (bug 44272), but hadn't fixed all our __misuses__ of escape/unescape (bug 225695). Those problems have since been addressed, so 1.6b should be fine in that respect.
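The broken sidebar URL in comment #40 has q=mo%25u017Adzierz. One plausible way to get that string, sketched below, is the kind of escape/unescape misuse described above: ECMAScript's escape() writes characters above U+00FF as %uXXXX, and if that result is then percent-escaped again, its '%' becomes %25. This call sequence is an assumption for illustration, not a reading of Mozilla's code; `js_escape` is a hypothetical Python approximation of escape() for BMP-only strings.

```python
from urllib.parse import quote

# Characters that ECMAScript's escape() leaves unchanged.
_JS_ESCAPE_SAFE = set(
    "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789@*_+-./"
)


def js_escape(text: str) -> str:
    """Approximate ECMAScript escape() for strings without surrogate pairs."""
    out = []
    for ch in text:
        code = ord(ch)
        if ch in _JS_ESCAPE_SAFE:
            out.append(ch)
        elif code < 0x100:
            out.append("%%%02X" % code)   # e.g. ' ' -> %20
        else:
            out.append("%%u%04X" % code)  # e.g. U+017A -> %u017A
    return "".join(out)


word = "moździerz"

# Hypothesised misuse: escape() first, then percent-escape the result again.
broken = quote(js_escape(word), safe="")

# Correct: percent-escape the UTF-8 bytes once.
correct = quote(word, safe="")

print(broken, correct)
```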
Assignee: nhottanscp → jshin
Status: ASSIGNED → NEW
> to avoid characters outside the repertoire
> of the current character encoding turn to NCRs (&#12345;) as in comment #34.
to avoid turning characters outside the repertoire of the current character encoding into NCRs (&#21308;) as in comment #34.
Status: NEW → ASSIGNED
Attached patch a patch (deleted)
Because we're not sure of the value of setting up a separate keyword server at mozilla.org, and it's too late for 1.6 even if we decide to do that, we'd better take a simple way out by setting 'keyword.URL' to Google. Would it be better to use 'google feeling lucky', as Firebird does? In this patch, I'm using the plain Google search.
I think this should be fixed in both 1.4.2 and 1.6. chofmann, what do you think? I guess you favor setting up our own server, but as you wrote, it's too late for 1.6. As for fixing things on AOL servers, I can only guess it's a rather simple fix, but I can't be sure because I have never seen the code on that side. Therefore, making the default keyword.URL point to Google seems to be an easy way out.
Flags: blocking1.6?
Flags: blocking1.4.2?
Comment on attachment 137625 a patch asking for r/sr. I can't quite decide who to ask for r/sr... (I would have asked smontagu for r, but he's on vacation). This is kinda just filling the hole, but should be a lot better than what we have now.
Attachment #137625 - Flags: superreview?(brendan)
Attachment #137625 - Flags: review?(chofmann)
This would not block the release. Please request approval when you have the necessary reviews and drivers will consider the fix for inclusion in 1.6.
Flags: blocking1.6?
Flags: blocking1.6-
Flags: blocking1.4.2?
Flags: blocking1.4.2-
Comment on attachment 137625 a patch Someone test this heavily; code review is not the thing here. /be
Attachment #137625 - Flags: superreview?(brendan) → superreview+
Thanks for sr. All the test cases mentioned here (Greek, Russian, Polish, German, Korean, Japanese, Chinese) and some others I just made up work well as far as I can tell. Others can test it by setting 'keyword.URL' to 'http://www.google.com/search?ie=UTF-8&oe=utf-8&q=' in about:config and enabling 'keyword' in Edit|Preferences|Navigator|Smart Browsing. See http://www.mozilla.org/docs/end-user/internet-keywords.html for details.
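For testers, the effect of the suggested pref value can be sketched as follows. This assumes (the bug does not spell it out) that a keyword search appends the typed text, encoded as UTF-8 and percent-escaped, to 'keyword.URL'; `keyword_search_url` is a hypothetical helper, not Mozilla code. The ie=UTF-8 parameter tells Google how to interpret the query bytes.

```python
from urllib.parse import quote

# Value suggested above for the 'keyword.URL' pref.
KEYWORD_URL = "http://www.google.com/search?ie=UTF-8&oe=utf-8&q="


def keyword_search_url(typed: str) -> str:
    """Hypothetical: append the typed text as percent-escaped UTF-8."""
    return KEYWORD_URL + quote(typed, safe="")


print(keyword_search_url("moździerz"))
print(keyword_search_url("中文"))
```

Under this assumption, the Polish and Chinese test cases from this bug produce the same q= values that Google's own forms produce with UTF-8 input.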
Blocks: 229262
The patch works for me for French, thanks.
Comment on attachment 137625 a patch asking the module owner for review
Attachment #137625 - Flags: review?(chofmann) → review?(smontagu)
Comment on attachment 137625 a patch r=smontagu. This seems to work well enough out of the box, but I see the %3Fs can still resurface if the default search engine is reset from the search sidebar, e.g. to AskJeeves. There may not be much we can do about that.
Attachment #137625 - Flags: review?(smontagu) → review+
fix checked into the trunk.
Status: ASSIGNED → RESOLVED
Closed: 21 years ago
Resolution: --- → FIXED
Comment on attachment 137625 a patch asking for a1.6
Attachment #137625 - Flags: approval1.6?
Comment on attachment 137625 a patch a=asa (on behalf of drivers) for checkin to 1.6
Attachment #137625 - Flags: approval1.6? → approval1.6+
Keywords: fixed1.6
forgot to comment; checked in to 1.6 branch this afternoon.