Closed
Bug 413784
Opened 17 years ago
Closed 17 years ago
Search for a non-English term in the URL doesn't match
Categories
(Firefox :: Bookmarks & History, defect)
Tracking
VERIFIED
FIXED
Firefox 3 beta3
People
(Reporter: erwan, Assigned: erwan)
References
(Blocks 1 open bug)
Details
Attachments
(3 files, 4 obsolete files)
(deleted), patch
(deleted), patch
(deleted), image/png
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b3pre) Gecko/2008012320 Minefield/3.0b3pre
Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9b3pre) Gecko/2008012320 Minefield/3.0b3pre

Since the DB holds a URI-encoded URI, search terms typed in their decoded form don't match it. For example, this URL:
http://flocktest.wordpress.com/2008/01/23/%E3%81%BD%E3%81%BD/
becomes, decoded:
http://flocktest.wordpress.com/2008/01/23/ぽぽ/
A search for "ぽぽ" returns no result. It should return that page.

Reproducible: Always

Steps to Reproduce:
1. Visit http://flocktest.wordpress.com/2008/01/23/ぽぽ/
2. Search for "ぽぽ" in the URL bar

Actual Results:
No result (or other non-related pages)

Expected Results:
http://flocktest.wordpress.com/2008/01/23/ぽぽ/ appears in the results set

I had to create that dummy page because most pages with non-English characters in their URL appear to have the same terms in the title. The bug would not reproduce in that case because, with a match on the title, the pages would show.

Some solutions have been discussed in bug 389465:
* do a decodeURI before the search, and search for both terms. That would increase the cost of a query.
* store a decoded version in the DB. That would increase the cost of indexing and the size of the DB.
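The mismatch in the report can be reproduced outside the browser. The sketch below is only an illustration, not Firefox code: it assumes Python's urllib.parse.quote and unquote as stand-ins for encodeURI and decodeURI, and the variable names are made up for the example.

```python
from urllib.parse import quote, unquote

# The URL as it is stored (percent-encoded), taken from the report above.
stored_url = "http://flocktest.wordpress.com/2008/01/23/%E3%81%BD%E3%81%BD/"
term = "ぽぽ"

# The typed term is not a substring of the encoded URL, so a plain match fails.
naive_match = term in stored_url

# Decoding the stored URL first makes the term visible again.
decoded_url = unquote(stored_url)
decoded_match = term in decoded_url

# Alternatively, encoding the term lets it match the stored form directly.
encoded_term = quote(term)
encoded_match = encoded_term in stored_url

print(naive_match, decoded_match, encoded_match)
```

Either direction works for this URL: decode what is stored, or encode what is typed. The two options listed in the report differ mainly in where that cost is paid.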
Updated•17 years ago
Severity: minor → normal
Status: UNCONFIRMED → NEW
Ever confirmed: true
OS: Linux → All
Hardware: PC → All
Version: unspecified → Trunk
Comment 1•17 years ago
See also bug 407974 and bug #391691
Comment 2•17 years ago
bug 389465 comment #10:
> (Things aren't that bad in this case anyway because the title gets matched..)

I beg to differ. If you only consider Wikipedia-style websites, that may be correct, but not even Wikipedia follows this on every kind of page. Example:
<http://fa.wikipedia.org/w/index.php?title=%D8%B5%D9%81%D8%AD%D9%87%D9%94_%D8%A7%D8%B5%D9%84%DB%8C&action=edit>

I think this is a serious loss of functionality for non-English users, especially those whose languages use little or no Latin script. Therefore, I'm requesting the blocking flag on this bug.
Blocks: Persian-Fx3.5
Flags: blocking-firefox3?
Comment 3•17 years ago
I agree that the general problem is really bad if we can't match in urls, but like I said, in that particular case of wikipedia, it's not too bad. However, there's a belief that users will start focusing on matching in titles rather than in urls. That's why there's an emphasis on the title in the display, and why the title is searched before the url when querying.
Comment 4•17 years ago
Here is a common problem, and a serious one for power users. After opening two pages on WP, one with an ASCII-only URL and the other with an IRI, i.e. http://en.wikipedia.org/wiki/Durs_Grunbein and http://en.wikipedia.org/wiki/Durs_Gr%C3%BCnbein, I try to open them again later and want to select the URL I'm looking for, but it's not there. Of course this is not a big deal for German users, who have only a few non-ASCII letters, but it makes displaying/searching URLs useless for all non-Latin users.
Assignee
Comment 5•17 years ago
Edward, what do you think about just unescaping the query string?
* We just add the cost of one unescape to the query; we don't do a double query or mess with the DB.
* If the user types the escaped string (like "%D8%B5") it will not match. But I don't think anyone is ever going to do this kind of search.
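One possible reading of this proposal is that the comparison happens on unescaped text, which is why a typed escape sequence would stop matching. The sketch below illustrates that reading only. url_matches is a hypothetical helper, the example.org URL is made up, and urllib.parse.unquote stands in for whatever unescape routine the real code would use.

```python
from urllib.parse import unquote

def url_matches(stored_url, term):
    """Hypothetical helper: compare the typed term against the unescaped URL."""
    return term.lower() in unquote(stored_url).lower()

# Made-up URL whose path is the percent-encoded form of U+0635 (Arabic letter sad).
stored = "http://example.org/%D8%B5"

typed_decoded = url_matches(stored, "\u0635")   # user types the letter itself
typed_escaped = url_matches(stored, "%D8%B5")   # user types the escape sequence

print(typed_decoded, typed_escaped)
```

Under this reading, only one unescape is added per comparison and the stored data is left alone, at the price of escaped input no longer matching.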
Assignee
Comment 6•17 years ago
Since the URLs are URI-encoded in the DB, I changed the SQL query to use an encoded version of the search string to match on the URL (but still use the non-encoded version to match on the title). That relies on a native implementation of encodeURI that I put inline in the file, because I didn't know where to put it. Maybe it would be better somewhere else?
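The idea in this comment can be sketched with an in-memory SQLite database. This is not the attached patch: the pages table, the search and like_pattern helpers, and the use of Python's urllib.parse.quote in place of encodeURI are all assumptions made for illustration. Note that quote and encodeURI do not leave exactly the same set of characters unescaped.

```python
import sqlite3
from urllib.parse import quote

def like_pattern(text):
    """Wrap text in LIKE wildcards, escaping %, _ and \\ so they match literally."""
    escaped = text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"

def search(conn, term):
    """Match the raw term against titles and the encoded term against URLs."""
    sql = (
        "SELECT url FROM pages "
        "WHERE title LIKE ? ESCAPE '\\' OR url LIKE ? ESCAPE '\\' "
        "ORDER BY url"
    )
    params = (like_pattern(term), like_pattern(quote(term)))
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT, title TEXT)")
conn.executemany("INSERT INTO pages VALUES (?, ?)", [
    ("http://flocktest.wordpress.com/2008/01/23/%E3%81%BD%E3%81%BD/", "Test post"),
    ("http://example.org/other", "Unrelated page"),
])

results = search(conn, "ぽぽ")
print(results)
```

The percent signs produced by the encoding are escaped before they reach LIKE. Otherwise each one would act as a wildcard and the URL match would be looser than intended.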
Attachment #299335 -
Flags: review?
Assignee
Updated•17 years ago
Attachment #299335 -
Flags: review? → review?(dietrich)
Comment 7•17 years ago
It's not really a problem now, but later if we allow the user to type multiple words like "page title ぽ" to search in both the title and url at the same time, we won't know which ones to escape or not. I suppose we would do something like
(title LIKE 'page' OR url LIKE encode('page')) AND
(title LIKE 'title' OR url LIKE encode('title')) AND
(title LIKE 'ぽ' OR url LIKE encode('ぽ'))
// pretend there are % wildcards around each pattern
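The per-token query shape suggested above could look roughly like the following. It is a sketch under assumptions, not the code that landed: the pages table and the search_tokens and like_pattern helpers are hypothetical, tokens are split on whitespace only, and urllib.parse.quote plays the role of encode().

```python
import sqlite3
from urllib.parse import quote

def like_pattern(text):
    """Wrap text in LIKE wildcards, escaping %, _ and \\ so they match literally."""
    escaped = text.replace("\\", "\\\\").replace("%", "\\%").replace("_", "\\_")
    return f"%{escaped}%"

def search_tokens(conn, query):
    """Every token must match either the title (raw) or the URL (encoded)."""
    tokens = query.split()
    if not tokens:
        return []
    clause = "(title LIKE ? ESCAPE '\\' OR url LIKE ? ESCAPE '\\')"
    sql = (
        "SELECT url FROM pages WHERE "
        + " AND ".join([clause] * len(tokens))
        + " ORDER BY url"
    )
    params = []
    for token in tokens:
        params.append(like_pattern(token))         # title side: as typed
        params.append(like_pattern(quote(token)))  # url side: percent-encoded
    return [row[0] for row in conn.execute(sql, params)]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE pages (url TEXT, title TEXT)")
conn.executemany("INSERT INTO pages VALUES (?, ?)", [
    ("http://example.org/%E3%81%BD", "page title"),
    ("http://example.org/plain", "page title"),
])

hits = search_tokens(conn, "page title ぽ")
print(hits)
```

Because each token is tried both ways, there is no need to decide up front which words belong to the title and which to the URL.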
Assignee
Comment 8•17 years ago
Comment on attachment 299335
encodeURI the search string for URL match

Oops, no longer applies... I'm merging now and I'll submit a new patch then.
Attachment #299335 -
Attachment is obsolete: true
Attachment #299335 -
Flags: review?(dietrich)
Assignee
Comment 9•17 years ago
Patch merged with recent changes - working on a unit test.
Assignee
Comment 10•17 years ago
Attachment #299358 -
Attachment is obsolete: true
Attachment #299363 -
Flags: review?(dietrich)
Assignee
Comment 11•17 years ago
Broke again, new version. Dietrich: is there any patch in the pipe that I should know about?
Attachment #299363 -
Attachment is obsolete: true
Attachment #299639 -
Flags: review?(dietrich)
Attachment #299363 -
Flags: review?(dietrich)
Assignee
Comment 12•17 years ago
merged again
Attachment #299639 -
Attachment is obsolete: true
Attachment #300093 -
Flags: review?(dietrich)
Attachment #299639 -
Flags: review?(dietrich)
Comment 13•17 years ago
Thanks for the patches, Erwan. I've reimplemented the escaping and merged the patch on top of a few other changes, like:
Bug 414285 - Refactor AutoCompleteTagsSearch token splitting code and persist tokens
Bug 401869 - Allow multiple words search in Auto-complete/Location Bar
Attachment #300333 -
Flags: review?(dietrich)
Updated•17 years ago
Attachment #300093 -
Flags: review?(dietrich)
Updated•17 years ago
Attachment #300333 -
Flags: review?(dietrich)
Comment 14•17 years ago
Thanks for looking into this and providing patches, Erwan.

Checking in toolkit/components/places/tests/unit/test_413784.js;
/cvsroot/mozilla/toolkit/components/places/tests/unit/test_413784.js,v <-- test_413784.js
initial revision: 1.1
done
Assignee: nobody → erwan
No longer depends on: 414285
Comment 15•17 years ago
This should be fixed by bug 407974.
Status: NEW → RESOLVED
Closed: 17 years ago
Depends on: 407974
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → Firefox 3 beta3
Updated•17 years ago
Flags: in-litmus-
Verified FIXED using Mozilla/5.0 (Macintosh; U; Intel Mac OS X 10.4; en-US; rv:1.9b3pre) Gecko/2008020419 Minefield/3.0b3pre; see the screenshot in comment 16, above.
Status: RESOLVED → VERIFIED
Updated•17 years ago
Flags: blocking-firefox3? → blocking-firefox3+
Updated•17 years ago
Blocks: fx35-l10n-fa
Updated•17 years ago
No longer blocks: Persian-Fx3.5
Updated•16 years ago
Blocks: Persian-Fx3.5
Updated•16 years ago
No longer blocks: fx35-l10n-fa
Comment 18•15 years ago
Bug 451915 - move Firefox/Places bugs to Firefox/Bookmarks and History. Remove all bugspam from this move by filtering for the string "places-to-b-and-h".

In Thunderbird 3.0b, you do that as follows:
Tools | Message Filters
Make sure the correct account is selected. Click "New".
Conditions: Body contains places-to-b-and-h
Change the action to "Delete Message".
Select "Manually Run" from the dropdown at the top.
Click OK.
Select the filter in the list, make sure "Inbox" is selected at the bottom, and click "Run Now". This should delete all the bugspam. You can then delete the filter.

Gerv
Component: Places → Bookmarks & History
QA Contact: places → bookmarks