Closed Bug 258223 Opened 20 years ago Closed 19 years ago

Bookmark keyword quicksearch need a way to specify character encoding for query URLs

Categories

(Firefox :: Bookmarks & History, defect)

defect
Not set
major

Tracking

()

RESOLVED FIXED
Firefox1.5

People

(Reporter: qiling, Assigned: jshin1987)

References

Details

(Keywords: intl, relnote)

Attachments

(2 files, 10 obsolete files)

User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616 Build Identifier: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.7) Gecko/20040616 using the quicksearch feature in Mozilla and Firefox to search for non ASCII search items often result in the said search item to be sent to website in non-universal encoding. I cannot decipher what encoding it is being sent, but sending it in UTF-8 would be nice. Reproducible: Always Steps to Reproduce: 1. define a quicksearch with a keyword (for example, "g" for http://www.google.com/search?q=%s) 2. use the feature: type "g [searchitem]" in address bar ([searchitem] would necessarily be non-english. I am testing this feature using CHINESE and JAPANESE 3. press enter to activate feature. Actual Results: the query is sent as a wrong encoding and google searches for "????" instead. Expected Results: the search website is sent the properly encoded (UTF-8) queries. Several notes: I am providing some search items that can be used for debugging. japanese searchitem would be "凪," chinese searchitem should be "东" (these are characters that does not exist in the other language). Interestingly, all of the chinese searches come out alright. Even more interestingly, if the chinese character is coupled with the japanese "g 东 凪," the japanese character is sent properly. HOWEVER, if sending japanese alone, google receives something in something strange. I use Japanese version of windows., and the same problem was confirmed in both Moz 1.7 and Firefox 0.9.3. As a final usage note, the addressbar search in Mozilla AND firefox searchbox performs these searches flawlessly. Onto some speculation: I believe that the cause of this problem is that the quicksearch feature probably involves two encodings. one is the encoding for the searchitem, and the other is the encoding requested to the website. This feature need to *force* the query items to interpreted as UTF-8, which it is not doing right now, even though the website requests are all in UTF. When the searchstring is japanese, the quicksearch feature would send the item as the default OS encoding (probably EUC-JP), while the page request specifies UTF. This results in google decoding the searchitem by UTF-8 even though it is encoded in EUC. In the case where chinese or chinese + japanese is sent, the default encoding cannot express the chinese characters and sends the query as UTF, which is accepted and processed properly. *) I am marking this bug as major because its existance implies that all non-english users of mozilla / firefox cannot use this feature. It also means that for these users firefox have no address bar search functionality at all and must rely on the searchbox. This becomes even more pronounced as firefox will replace mozilla as the standard browser. *) this bug is probably related to bug # 205652.
Interesting. The testcases in this bug workforme with an English Win98 system.... Can someone point me to where bookmark keywords are implemented? I thought the "keyword" protocol handler handled that, but it looks like that's a different beastie....
Small comment: Please view this page in UTF-8 encoding to see the test cases properly.
Some additional comments: 凪: japanese native, JIS code 4664, Shift JIS code 93E2, Unicode 51EA 东: chinese (simplified), Unicode 4E1C. Does not exist in JIS code pages. I apparently made an mistake earlier. The character in japanese is sent as S_JIS but interpreted as UTF. Using the quicksearch feature for japanese, the following results are obtained (in firefox): 1) type "g 凪" and press enter 2) the address bar performs substitution to "http://www.google.com/search?q=凪" 3) step 2 quickly changes to "http://www.google.com/search?q=%93%E2" (encoded in Shift_JIS) Using the same feature and searching for chinese 1) type "g 东" and press enter 2) the address bar performs substitution to "http://www.google.com/search?q=东" 3) step 2 quickly changes to "http://www.google.com/search?q=%E4%B8%9C" which I have no idea what encoding it is, but is sent through properly. *) if not using quicksearch but rather Firefox's searchbox (same with moz address bar search), the address will become "http://www.google.com/search?q=%E5%87%AA" directly. Probably the replace function in quicksearch (s/%s/[searchterm]) should have code that will convert [searchterm] to proper encoding before sending it to address bar for page retrieval.
(In reply to comment #3) > *) if not using quicksearch but rather Firefox's searchbox (same with moz > address bar search), the address will become > "http://www.google.com/search?q=%E5%87%AA" directly. This is for the japansee test case 凪
Attached patch Possible fix (obsolete) (deleted) — Splinter Review
I can't actually test this because I don't have any Japanese or Chinese fonts.
ccing intl folks and darin. Since bookmark keywords are handled outside the core, they run up against the code added in bug 130393 (which encodes URIs for certain schemes in the OS charset instead of UTF-8 because servers at the time didn't deal well at all well with UTF-8). Is that situation still present? If not, can we remove that code? That said, Neil's fix should do a decent job for this particular bug, I think... Ling Qi, could you possibly test it? (Testing doesn't require a build environment; just modifying the navigator.js in the jar in the chrome dir in a Mozilla install.)
Assignee: p_ch → jag
Status: UNCONFIRMED → NEW
Component: Bookmarks → XP Apps
Ever confirmed: true
QA Contact: seamonkey.bookmarks → pawyskoczka
Patch 158088 was a charming fix for Mozilla. For 1.7 (sorry I didn't have time to install 1.7.2), I changed line 1431 (Diff shows 1495), which fixed the problem for Moz. The I cannot find the same line for Firefox but in /content/browser/browser.js, function getShortcutOrURI(aURL, aPostDataRef) seem to be the one related to this bug (0.9.3 release code). So Firefox remains untested.
This is rather tricky. Perhaps sending UTF-8 would be the best we can do, but wouldn't work always. For instance, google can be configured to accept keywords in an non-UTF-8-encoding. re comment #6: More and more servers can deal with url-escaped UTF-8s (even though their server file system character encoding is not UTF-8). It seems that MS IIS can deal with them very well (well, most of them are run on Win 2k/XP so that it should), which may explain why they have, by default, 'always send URLs in UTF-8' ON in MS IE. There's an apache module for this well. However, I'm not sure how this apache module deals with cases where a url is not a direct reference to a file on the file system but is a 'GET' URL (whose 'existence' can only be checked after running a CGI). See bug 129726. re comment #3: > 3) step 2 quickly changes to "http://www.google.com/search?q=%E4%B8%9C" which > I have no idea what encoding it is, but is sent through properly. '0xE4 0xB8 0x9c' represents U+4E1C (东) in UTF-8. > 3) step 2 quickly changes to "http://www.google.com/search?q=%93%E2" > (encoded in Shift_JIS) I'm surprised that this worked (it doesn't work for me). Is the UI language for google set to Japanese? The server side program on Google may do some encoding checking (check whether it's UTF-8. if not, fall back to a couple of most widely used encoding for the language selected for the UI. For Japanse, they would be Shift_JIS, EUC-JP, and perhaps ISO-2022-JP which can be rather reliably distinguished from each other).
Keywords: intl
(In reply to comment #8) > I'm surprised that this worked (it doesn't work for me). Is the UI language > for google set to Japanese? The server side program on Google may do some > encoding checking (check whether it's UTF-8. if not, fall back to a couple of > most widely used encoding for the language selected for the UI. For Japanse, > they would be Shift_JIS, EUC-JP, and perhaps ISO-2022-JP which can be rather > reliably distinguished from each other). I am not too sure what you mean by "this" in "...surprised that this worked..." In any case, the google UI is set to english, but the same behavior exists for google.com in japanese AND google.co.jp (which, coincidentally, is also in japanese). I have also tried this with altavista with the same results. I believe it has to do with the fact that my OS is japanese version of windows than the interface language of google. While I admit that google and other websites attempts to detect the searchterm and figure out its encoding, it is non-functional in this case because the URL sent to the website is already in escaped form, e.g. %93%E2, probably because mozilla is skeptical about sending urls in UTF encoding? A bit offtopic, but how do does 0xE4 0xB8 0x9C represents U+4E1C? I could not see any corrolation between the two numbers.
So then the question is whether just URL-encoding the data as the patch does is ok....
(In reply to comment #10) > So then the question is whether just URL-encoding the data as the patch does is > ok.... I honestly do not know where to look but I would suspect that the Moz address-bar search and the firefox searchbox already URL-encode the data before sending it out.
Of course; the question is whether the data is URL-encoded before we start converting encodings to the plaform encoding or after...
(In reply to comment #9) > While I admit that google and other websites attempts to detect the searchterm > and figure out its encoding, it is non-functional in this case because the URL > sent to the website is already in escaped form, e.g. %93%E2, probably because > mozilla is skeptical about sending urls in UTF encoding? URLs _MUST_ be sent in escaped form. You can not send 8bit data in HTTP requests. > A bit offtopic, but how do does 0xE4 0xB8 0x9C represents U+4E1C? I could not > see any corrolation between the two numbers. See http://en.wikipedia.org/wiki/UTF-8
This is the third time I'm writting this. (I keep forgetting to submit my comment before I quit mozilla ...). We need to offer a way to let users specify the character encoding to use when converting what's entered in the address bar before url-escaping and submitting to the server. Just always using UTF-8 doesn't work. When you take a look at the definition of a searchplugin, it's clear why you need that. To define a searchplugin, you have to specify what character encoding to use. For instance, a Korean search engine (http://search.naver.com) expects query strings to come in EUC-KR (or x-windows-949). If we convert a keyword to UTF-8 before url-escaping and putting in place of '%s' in 'http://search.naver.com/search.naver?where=nexearch&query=%s', it would return a lot of garbages. What we can do are: 1. parse 'mozencoding=<ENCODING>' (e.g. 'mozencoding=ISO-8859-1', 'mozencoding=UTF-8', 'mozencoding=EUC-JP') in the bookmark keyword quick search definition 2. convert the input string to the encoding specified via 'mozencoding' before url-escaping if it's present. otherwise, use either the platform encdoing or UTF-8. Which one to use, we have to think about. (currently, we always conver to the platform encoding) 3. get rid of 'mozencoding=<ENCODING>' in the query string before submitting it to the server 4. document this feature in the bookmark quick search document (and release notes). I'm adding asa to CC because he documented the bookmark quick search at the mozilla.org web site.
> URLs _MUST_ be sent in escaped form. You can not send 8bit data in HTTP requests. I don't think HTTP has such a restriction. It's just the definition of URI that restricts what can be used in URI to a subset of ASCII characters. Now Martin Duerest and others have been working on IRI drafts according to which non-ASCII characters are allowed in UTF-8. (no other encoding is allowed unless it's url-escaped). See http://www.w3.org/2004/Talks/IUC25iri/
(In reply to comment #15) > I don't think HTTP has such a restriction. It's just the definition of URI > that restricts what can be used in URI to a subset of ASCII characters. Well, HTTP (RFC 2616) normatively references RFC 2396. > Now > Martin Duerest and others have been working on IRI drafts according to which > non-ASCII characters are allowed in UTF-8. (no other encoding is allowed unless > it's url-escaped). See http://www.w3.org/2004/Talks/IUC25iri/ why doesn't it use escaped utf-8?
(In reply to comment #16) > (In reply to comment #15) > > I don't think HTTP has such a restriction. It's just the definition of URI > > that restricts what can be used in URI to a subset of ASCII characters. > > Well, HTTP (RFC 2616) normatively references RFC 2396. Ok. You're right. For HTTP 1.1 request, IRIs have to be converted to URIs. > > it's url-escaped). See http://www.w3.org/2004/Talks/IUC25iri/ > > why doesn't it use escaped utf-8? See http://www.w3.org/International/iri-edit/draft-duerst-iri-07.txt and references therein.
Bug 123006 actually covers the use of encoding. We can keep this bug for the character set issues.
*** Bug 266667 has been marked as a duplicate of this bug. ***
Taking. I've just made a patch but have to test it.
Assignee: jag → jshin
Attached patch patch (obsolete) (deleted) — Splinter Review
I implemented 'mozcharencoding=XXX' I mentioned earlier.
Attachment #158088 - Attachment is obsolete: true
Comment on attachment 163888 [details] [diff] [review] patch asking for r/sr. + return str.replace(/[^a-zA-Z0-9+\-*\._@]/gi, encodeChar); I'll get rid of 'A-Z' (or 'i' flag)
Attachment #163888 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #163888 - Flags: review?(jag)
*** Bug 264406 has been marked as a duplicate of this bug. ***
Attached patch firefox patch v1 (obsolete) (deleted) — Splinter Review
I'm not sure what aPostDataReference does so that I leave that part alone. However, the part shared with Mozilla is patched the same way as is done in attachment 163888 [details] [diff] [review].
+ .classes["@mozilla.org/intl/scriptableunicodeconverter"] + .getService(Components.interfaces.nsIScriptableUnicodeConverter); given that it has state, shouldn't it be used with createInstance? (like calendar and mail do. unlike chatzilla and venkman do...)
Attached patch patch v2 (per cbie's comment) (obsolete) (deleted) — Splinter Review
cbie, thanks for the catch. I'm not using nsIScriptableUnicodeConverter as a service any more. I also renamed 'legacyStrEncode' to 'encodeByteString' with a more extensive comment.
Attachment #163888 - Attachment is obsolete: true
Attachment #163897 - Attachment is obsolete: true
Attachment #163888 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #163888 - Flags: review?(jag)
Attached patch firefox patch v2 (obsolete) (deleted) — Splinter Review
Comment on attachment 163945 [details] [diff] [review] patch v2 (per cbie's comment) asking for r/sr
Attachment #163945 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #163945 - Flags: review?(jag)
My patch implemented what I wrote in comment #14, in which I also wrote : > 2. convert the input string to the encoding specified via 'mozencoding' before > url-escaping if it's present. > otherwise, use either the platform encdoing or UTF-8. Which one to use, we have > to think about. (currently, we always conver to the platform encoding) In bug 123006, '%S' was added (in mozilla but not in firefox) to send strings in the platform encoding. '%s' was chnaged to be used for sending in url-escaped UTF-8. Here we're adding a way to specify explicitly the character encoding to use for URLs. This is necessary because there are still a lot of 'local' search engines and search-engine like web sites that are not enlightnened enough to go all the way to UTF-8 as google has. In addition, on modern Linux (and probably Mac OS X), the platform encoding is UTF-8 so that users really have no choice without this patch.
OS: Windows XP → All
Hardware: PC → All
Summary: Bookmark keyword quicksearch sends query in non-universal encoding → Bookmark keyword quicksearch need a way to specify character encoding for query URLs
Jungshik, I did some testing with the patch of bug 136006, and I found that oppposite to what I initially thought, with that patch %s doesn't always send the URL as UTF-8 with it. I was testing initially with google and was always getting UTF-8, but with dictionary.reference.com, it goes out as cp1252. I'd like to understand better what the call to 'encodeURIComponent' does exactly. Don't you thing it's possible to be smart enough to do the right thing by using LAST_CHARSET to determine the encoding to use, and not require the special mozencoding keyword ?
(In reply to comment #30) > with that patch %s doesn't always send the URL as UTF-8 with it. > > I was testing initially with google and was always getting UTF-8, but with > dictionary.reference.com, it goes out as cp1252. > > I'd like to understand better what the call to 'encodeURIComponent' does exactly. I guess dictionary.reference.com is running either IIS or Apache with IRI module that automatically translates incoming UTF-8 to what its server-side program understands (in this case Windows-1252). See ECMA 262 (ECMAscript standard) section 15.1.3. We implemented ECMA 262 version of encodeURI* failthfully in our JS engine. That is, it always produces URL-escaped UTF-8. > Don't you thing it's possible to be smart enough to do the right thing by using > LAST_CHARSET to determine the encoding to use, and not require the special > mozencoding keyword ? I don't think that's reliable enough. Moreover, that wouldn't work until the page in question is tried for the first time. With that, the 'mass/automatic' deployment of keyword search entries wouldn't be possible.
Status: NEW → ASSIGNED
Comment on attachment 163945 [details] [diff] [review] patch v2 (per cbie's comment) >+ re = /^(.*)(\&mozcharencoding=)([a-z][_\-a-z0-9]+)$/i; Well, if that's the easiest way to specify the charset... oh, and don't forget to watch out for trailing spaces. > shortcutURL = /%[sS]/.test(shortcutURL) ? >- shortcutURL.replace(/%s/g, encodeURIComponent(text)) >- .replace(/%S/g, text) : >+ (newText ? shortcutURL.replace(/%[sS]/g, escapeByteString(newText)) : >+ shortcutURL.replace(/%s/g, encodeURIComponent(text)) >+ .replace(/%S/g, text)) : > null; This looks ugly and offhand the easiest way I could think of simplifying it was to have a variable encodedText which is either encodeURIComponent(text) or escape(convertFromUnicode(text)) for the %s case, and just use text for the %S case; using the escaped charset in the %S case doesn't make sense to me. >+ var unicodeConverter = Components >+ .classes["@mozilla.org/intl/scriptableunicodeconverter"] >+ .createInstance(Components.interfaces.nsIScriptableUnicodeConverter); >+ unicodeConverter.charset = charset; >+ if ("Finish" in unicodeConverter) { Why would "Finish" not be in unicodeConverter? The interface declares it, so it must exist. >+// url-escape a 'byte string' made up of zero-padded '16-bit characters' Ordinary escape() should do this these days. As you know it used to be the same as encodeURIComponent but that violated the ECMA spec. Chatzilla uses its own version because the XPIs are designed to be compatible with previous versions of Mozilla.
Attached patch patch v3 (obsolete) (deleted) — Splinter Review
addressed Neil's concerns
Attachment #163945 - Attachment is obsolete: true
Attachment #163946 - Attachment is obsolete: true
Attachment #163945 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #163945 - Flags: review?(jag)
Comment on attachment 164872 [details] [diff] [review] patch v3 thanks for comments. asking for r/sr.
Attachment #164872 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #164872 - Flags: review?(jag)
Attached patch patch for firefox (1.0) (obsolete) (deleted) — Splinter Review
Comment on attachment 164872 [details] [diff] [review] patch v3 (In reply to comment #32) >don't forget to watch out for trailing spaces. That came out wrong... I was actually referring to your source code!
Attachment #164872 - Flags: superreview?(neil.parkwaycc.co.uk) → superreview+
oops. thanks for sr. I got rid of trailing spaces in the source code lines I added. Also removed was '\s*' in the regular expression.
Comment on attachment 164872 [details] [diff] [review] patch v3 + re = /^(.*)(\&mozcharencoding=)([a-z][_\-a-z0-9]+)\s*$/i; Missing |var| or |const|, though for one-time uses, prefer to inline. No need really to put &mozcharencoding= into a group, so you can change matches[3] to matches[2]. Now, I have to point out that this is a rather ugly hack. A cleaner solution would be to make the charset be an explicit field in the new bookmark / bookmark properties dialogs, and save it as a property of the bookmark, instead of encoding it in the URL. Also, though highly unlikely, your solution won't play nice with any url that actually uses "mozencoding" in its query. This may require some changes to the bookmark service (specifically, the parser and serializer), perhaps talk to Ben Goodger first.
Attachment #164872 - Flags: review?(jag) → review-
(In reply to comment #38) > + re = /^(.*)(\&mozcharencoding=)([a-z][_\-a-z0-9]+)\s*$/i; > > Missing |var| or |const|, though for one-time uses, prefer to inline. > No need really to put &mozcharencoding= into a group, so you can change > matches[3] to matches[2]. Thanks for point them out. > A cleaner solution would be to make the charset be an explicit field > in the new bookmark /bookmark properties dialogs, and save it as > a property of the bookmark, Sure, it would be cleaner that way and I thought about it before making this hack. However, it seemed to be an overkill to add all those changes to both front-end and back-end for this rather 'obscure' feature. Moreover, unless the front-end is designed to make it clear what 'character encoding' is for, it can easily get users confused. > Also, though highly unlikely, your solution won't > play nice with any url that actually uses "mozencoding" in its query. That's why I made it rather long ('mozcharencoding') and match only at the end (to minimize that possibility as much as possible). > This may require some changes to the bookmark service (specifically, the parser > and serializer), perhaps talk to Ben Goodger first. Anyway, I'll talk to him about it.
don't bookmarks already store a charset?
*** Bug 267755 has been marked as a duplicate of this bug. ***
Christian, see the LAST_CHARSET discussion in comments #30 and #31 I'm not completely convinced by junshik's answer, because I don't see how the deployment of keyword search entries can be done without deploying a complete bookmark entry, including the LAST_CHARSET element. Also when you create by hand a new entry, the encoding will be wrong only one time, so it should not be too bad. I don't know how LAST_CHARSET is selected for newly created bookmarks, but if it is correctly set using the value that corresponds to the last display of the page, it will fail even more rarely than that.
LAST_CHARSET will never be changed for bookmarks containing %s because it works by looking up the bookmarks for the loaded URL and updating the last visited date and character set, so such a bookmark would only have a character set if it was altered from a visited static bookmark.
In fact the same bug applies to bookmarks pointing to redirected pages - their last visited date never updates because only the target page loads.
*** Bug 242508 has been marked as a duplicate of this bug. ***
Blocks: 270703
Blocks: 260006
Product: Core → Mozilla Application Suite
Keywords: relnote
*** Bug 276122 has been marked as a duplicate of this bug. ***
*** Bug 287050 has been marked as a duplicate of this bug. ***
MSIE-based variants have support for this and MS IE7 may do the same. I'll ping Ben once more (comment #39) This should belong to core,but it doesn't have a suitable component so that I'm moving it to firefox | bookmark for a better exposure.
Component: XP Apps → Bookmarks
Flags: review-
Product: Mozilla Application Suite → Firefox
Attached patch new patch for firefox (obsolete) (deleted) — Splinter Review
With this patch, three methods are tried in turn : 1 . mozcharset=xxx at the end of a shortcut URL : needs a manual intervention of a user, needs to mention in the release notes (or asa's note on keyword) 2. LAST_CHARSET value in the bookmark which is only useful when keyword search bookmarks are made from static bookmarks. needs to mention in the release notes (or asa's note on keyword) 3. UTF-8 (the default) I added |getLastCharset| method to nsIBookmarksService and called it in browser.js to refer to |LAST_CHARSET| if available. If method #1 is considered too inconvenient for ordinary users, I can get rid of it.
Attachment #164880 - Attachment is obsolete: true
Attachment #188830 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #188830 - Flags: review?(vladimir)
Comment on attachment 188830 [details] [diff] [review] new patch for firefox I don't think you should be guessing the character set when the URL doesn't even contain %s. >+ nsXPIDLString charset; >+ charsetData->GetValue(getter_Copies(charset)); >+ aCharset = charset; Slightly more efficient would be to use const PRUnichar *charset; charsetData->GetValueConst(&charset); aCharset.Assign(charset);
Attachment #188830 - Flags: superreview?(neil.parkwaycc.co.uk) → superreview-
(In reply to comment #50) > (From update of attachment 188830 [details] [diff] [review] [edit]) > I don't think you should be guessing the character set when the URL doesn't > even contain %s. You're worried about an unncessary(but harmless otherwise) call to |BMSC.getLastCharset| ? I'll invoked that only when '%s' is present > const PRUnichar *charset; > charsetData->GetValueConst(&charset); > aCharset.Assign(charset); Ok, I'll change that.
(In reply to comment #49) > Created an attachment (id=188830) [edit] > new patch for firefox > > With this patch, three methods are tried in turn : > > 1 . mozcharset=xxx at the end of a shortcut URL : needs a manual intervention > of a user, needs to mention in the release notes (or asa's note on keyword) > 2. LAST_CHARSET value in the bookmark which is only useful when keyword search > bookmarks are made from static bookmarks. needs to mention in the release notes > (or asa's note on keyword) > > 3. UTF-8 (the default) > > I added |getLastCharset| method to nsIBookmarksService and called it in > browser.js to refer to |LAST_CHARSET| if available. > > If method #1 is considered too inconvenient for ordinary users, I can get rid > of it. > The order to handle what encoding should use for the searching is nice, and the patch works fine for me for those non UTF-8 encoding search sites with GET method. But there are still lots of websites use POST method (and some sites reject GET method to query), in this case, the %s is in the post data which is still handled in the old way and don't work correctly.
It should be noted that nsFormSubmission.cpp uses nsISaveAsCharset (with attr_EntityAfterCharsetConv and attr_FallbackDecimalNCR) as encoder.
Attached patch browser.js patch for firefox 1.0.4 (obsolete) (deleted) — Splinter Review
Made some small changes for browser.js based on Jungshik Shin's last patch. Should works for POST method search sites now, tested in Firefox 1.0.4/1.0.5 Windows. Notes: This patch file is based on the release package(<install dir>\chrome\browser.jar) of Firefox not the CVS code. it don't contain the patch code to cpp/idl files to support LAST_CHARSET (check Jungshik Shin's patch code for it). I made this so you can temporary fix the problem without waiting for new official release or download/build the source code. (but it only support mozcharset in URL as metioned above)
Comment on attachment 189407 [details] [diff] [review] browser.js patch for firefox 1.0.4 >--- browser.jar.1.0.4\content\browser\browser.js Tue Apr 12 21:43:20 2005 >+++ browser\content\browser\browser.js Fri Jul 15 19:05:29 2005 >@@ -1468,51 +1468,86 @@ > var shortcutURL = BMSVC.resolveKeyword(aURL, aPostDataRef); > if (!shortcutURL) { > // rjc: add support for string substitution with shortcuts (4/4/2000) > // (see bug # 29871 for details) > var aOffset = aURL.indexOf(" "); > if (aOffset > 0) { > var cmd = aURL.substr(0, aOffset); > var text = aURL.substr(aOffset+1); > shortcutURL = BMSVC.resolveKeyword(cmd, aPostDataRef); > if (shortcutURL && text) { >+ var encodedText = null; >+ var charset= ""; >+ var matches = shortcutURL.match(/^(.*)\&mozcharset=([a-zA-Z][_\-a-zA-Z0-9]+)\s*$/); >+ >+ if (matches) { >+ shortcutURL = matches[1]; >+ charset = matches[2]; >+ } else { >+ try { >+ charset = BMSVC.getLastCharset(shortcutURL); >+ } catch (ex) { >+ } >+ } >+ if (charset != "") >+ encodedText = escape(convertFromUnicode(charset, text)); >+ else // default case: charset=UTF-8 >+ encodedText = encodeURIComponent(text); >+ > if (aPostDataRef && aPostDataRef.value) { > // XXXben - currently we only support "application/x-www-form-urlencoded" > // enctypes. > aPostDataRef.value = unescape(aPostDataRef.value); >- if (aPostDataRef.value.match(/%s/)) >- aPostDataRef.value = getPostDataStream(aPostDataRef.value, text, >+ if (encodedText && aPostDataRef.value.match(/%[sS]/)) { >+ aPostDataRef.value = getPostDataStream(aPostDataRef.value.replace(/%s/g, encodedText).replace(/%S/g, text), text, > "application/x-www-form-urlencoded"); >- else { >+ } else { > shortcutURL = null; > aPostDataRef.value = null; > } >- } >- else >- shortcutURL = shortcutURL.match(/%s/) ? shortcutURL.replace(/%s/g, encodeURIComponent(text)) : null; >+ } else { >+ if (encodedText && /%[sS]/.test(shortcutURL)) >+ shortcutURL = shortcutURL.replace(/%s/g, encodedText) >+ .replace(/%S/g, text); >+ else >+ shortcutURL = null; >+ } > } > } > } > > if (shortcutURL) > aURL = shortcutURL; > > } catch (ex) { > } > return aURL; > } > >+function convertFromUnicode(charset, str) >+{ >+ try { >+ var unicodeConverter = Components >+ .classes["@mozilla.org/intl/scriptableunicodeconverter"] >+ .createInstance(Components.interfaces.nsIScriptableUnicodeConverter); >+ unicodeConverter.charset = charset; >+ str = unicodeConverter.ConvertFromUnicode(str); >+ return str + unicodeConverter.Finish(); >+ } catch(ex) { >+ return null; >+ } >+} >+ > function getPostDataStream(aStringData, aKeyword, aType) > { > var dataStream = Components.classes["@mozilla.org/io/string-input-stream;1"] > .createInstance(Components.interfaces.nsIStringInputStream); >- aStringData = aStringData.replace(/%s/g, encodeURIComponent(aKeyword)); > dataStream.setData(aStringData, aStringData.length); > > var mimeStream = Components.classes["@mozilla.org/network/mime-input-stream;1"] > .createInstance(Components.interfaces.nsIMIMEInputStream); > mimeStream.addHeader("Content-Type", aType); > mimeStream.addContentLength = true; > mimeStream.setData(dataStream); > return mimeStream.QueryInterface(Components.interfaces.nsIInputStream); > } >
Attached patch seamonkey patch (obsolete) (deleted) — Splinter Review
Attachment #164872 - Attachment is obsolete: true
Attachment #188830 - Attachment is obsolete: true
Attachment #190223 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #190223 - Flags: review?(vladimir)
Comment on attachment 190223 [details] [diff] [review] seamonkey patch Neil's concerns are addressed in this patch. >+ if (charsetData) { >+ PRUnichar *charset; >+ charsetData->GetValueConst(&charset); It should be |const PRUnichar *charset|. I fixed it in my tree.
Attachment #188830 - Flags: review?(vladimir)
Attached patch firefox patch (deleted) — Splinter Review
firefox patch with 'post' support
Attachment #190860 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #190860 - Flags: review?(vladimir)
My latest patch for firefox includes the patch of wyns_sh@hotmail.com (I changed it slightly, though). Vladimir and Neil, can you r/sr my two patches soon?
Comment on attachment 190223 [details] [diff] [review] seamonkey patch Where's the definition of convertFromUnicode?
|convertFromUnicode| was added and a typo was fixed.
Attachment #189407 - Attachment is obsolete: true
Attachment #190223 - Attachment is obsolete: true
Attachment #191586 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #191586 - Flags: review?(vladimir)
Attachment #190223 - Flags: superreview?(neil.parkwaycc.co.uk)
Attachment #190223 - Flags: review?(vladimir)
Comment on attachment 190860 [details] [diff] [review] firefox patch r=me, this looks ok visually -- haven't tried it out yet. I can't think of anything bad that could happen with sending a &mozcharset=foo to a site if the bookmark is selected instead of used as a keyword, but it's probably no worse than sending "%s" down..
Attachment #190860 - Flags: review?(vladimir) → review+
Comment on attachment 191586 [details] [diff] [review] seamonkey patch with the missing function and a typo fixed >+ var charset= ""; Nit: space before = >+ re = /^(.*)\&mozcharset=([a-zA-Z][_\-a-zA-Z0-9]+)\s*$/; No declaration; I assume you mean "const re" here? >+ if (charset != "") Nit: trailing space. Also, I would just use if (charset)
Attachment #191586 - Flags: superreview?(neil.parkwaycc.co.uk) → superreview+
Thanks for r/sr. Vlad, can I assume that your 'r' of firefox patch imples 'r' of seamonkey patch? The latter is a subset of the former. Neil, can you review firefox patch (attachment 190860 [details] [diff] [review]) as well?
Comment on attachment 190860 [details] [diff] [review] firefox patch I can accept vlad's + as applying to attachment 191586 [details] [diff] [review] too.
Attachment #190860 - Flags: superreview?(neil.parkwaycc.co.uk) → superreview+
Comment on attachment 190860 [details] [diff] [review] firefox patch asking for a 1.8b4 As vlad noted, this is a low risk patch, but those who need to use keyword quicksearch with encodings other than UTF-8 love to see this in ff 1.5
Attachment #190860 - Flags: approval1.8b4?
Comment on attachment 191586 [details] [diff] [review] seamonkey patch with the missing function and a typo fixed asking for a 1.8b4 As vlad noted (he granted 'r' for the superset of this patch for firefox), this is a low risk patch, but those who need to use keyword quicksearch with encodings other than UTF-8 love to see this in 1.8 btw, I've already taken car of Neil's nits in my tree.
Attachment #191586 - Flags: approval1.8b4?
Attachment #191586 - Flags: approval1.8b4? → approval1.8b4+
Attachment #190860 - Flags: approval1.8b4? → approval1.8b4+
Whiteboard: [checkin needed]
landed on trunk
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox1.1
Whiteboard: [checkin needed]
Attachment #191586 - Flags: review?(vladimir)
*** Bug 260006 has been marked as a duplicate of this bug. ***
One more tiny bug need to be fixed, the post data is also stored in encoded form in bookmarks like: name=%3D%25s so in browser.js (aPostDataRef && /%s/.test(aPostDataRef.value)) won't match before unescaping even there is %s in post data. resulting the last_charset isn't used. I am using firefox 1.5 RC3, a quick fix is: --- browser-orig.js Sun Oct 23 12:39:12 2005 +++ browser.js Sat Dec 10 12:31:49 2005 @@ -1631,6 +1631,9 @@ var charset = ""; const re = /^(.*)\&mozcharset=([a-zA-Z][_\-a-zA-Z0-9]+)\s*$/; var matches = shortcutURL.match(re); + if (aPostDataRef && aPostDataRef.value) { + aPostDataRef.value = unescape(aPostDataRef.value); + } if (matches) { shortcutURL = matches[1]; charset = matches[2]; @@ -1651,7 +1654,6 @@ if (aPostDataRef && aPostDataRef.value) { // XXXben - currently we only support "application/x-www-form-urlencoded" // enctypes. - aPostDataRef.value = unescape(aPostDataRef.value); if (aPostDataRef.value.match(/%[sS]/)) { aPostDataRef.value = getPostDataStream(aPostDataRef.value, text, encodedText,
*** Bug 336664 has been marked as a duplicate of this bug. ***
*** Bug 246848 has been marked as a duplicate of this bug. ***
*** Bug 270515 has been marked as a duplicate of this bug. ***
*** Bug 272419 has been marked as a duplicate of this bug. ***
Can’t you automatically detect this, based on the encoding the form field would have? ~Grauw
I agree with Laurens Holst, the browser should detect the character encoding, and the user will be able to override this auto-detected encoding, if needed.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: