Octothorpe/sharp/hash/pound and question marks are elided from end of searches
Categories
(Firefox :: Address Bar, defect, P1)
Tracking
()
People
(Reporter: mqudsi, Assigned: mak)
References
(Regression)
Details
(Keywords: regression)
Attachments
(2 files)
(deleted),
text/x-phabricator-request
|
RyanVM
:
approval-mozilla-beta+
|
Details |
(deleted),
patch
|
RyanVM
:
approval-mozilla-release+
RyanVM
:
approval-mozilla-esr68+
|
Details | Diff | Splinter Review |
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0
Steps to reproduce:
Try searching (w/ Google as the default search provider, if it makes a difference) via the omnibox/omnibar for a search term with the literal #
as the last non-whitespace character in your search, e.g. "how to do foo in C#"
Actual results:
The #
is trimmed/elided from the search parameters; it is presumably detected as an unused trailing document location indicator/page anchor.
For the original example, the search results for "how to do foo in C" are loaded instead.
Expected results:
Firefox should detect that a search is being performed rather than a navigation to a specific URL, and any/all url cleanup operations (including eliding unused/trailing #
and ?
characters, typically denoting unused cruft from previous query string or window anchor operations/manipulations) should be skipped.
As a basic heuristic, any omnibar content containing embedded (i.e. non-trailing or and non-prefixed) whitespace should not be sanitized in this manner. More advanced differentiation between searches and navigation requests is the domain of #1080682, but in this particular case the contents of the omnibar are being correctly interpreted as search terms (and the search results are pulled up), but it seems like the sanitization is being blindly performed regardless of the classification of the content/operation.
Comment 1•5 years ago
|
||
Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0
20190619214046
- DuckDuckGo set as default search engine.
- Enter
how to program c#
into the address bar.
Browser Console shows request to https://duckduckgo.com/?q=how+to+program+c&t=ffab
Searching directly from the https://duckduckgo.com box shows a request to https://duckduckgo.com/?q=how+to+program+c%23&t=hj
Comment 2•5 years ago
|
||
We have a set of "restriction" characters you can use at the beginning or the end of your search to restrict results to certain sources. The "#" character happens to cause your search to match on titles. The full list is here, if you're interested: https://searchfox.org/mozilla-central/rev/c0ca77697c6868482f30af873ec8069f2c080a34/browser/components/urlbar/UrlbarTokenizer.jsm#58
So what you're seeing is intentional behavior, but I agree it's surprising.
Marco, any thoughts? iirc I thought we should only allow restriction chars at the beginning. :-) Although I guess you could hit unfortunate cases even with that, like searching for a hashtag (#foo).
Comment 3•5 years ago
|
||
Oh but in this case you're selecting the first result, right? So what I mentioned above is true, it's just not really relevant.
There was a similar bug about this I think, and Mark suggested that if you choose the first result ("how to program c -- Search with Engine"), we should include restriction characters in the search string we send to the engine. Which makes sense to me. (And the first result should say "how to program c# -- Search with Engine" -- it should include the #.)
i.e., this really is a bug, not intended behavior.
Assignee | ||
Comment 5•5 years ago
|
||
(In reply to Drew Willcoxon :adw from comment #3)
Oh but in this case you're selecting the first result, right? So what I mentioned above is true, it's just not really relevant.
There was a similar bug about this I think, and Mark suggested that if you choose the first result ("how to program c -- Search with Engine"), we should include restriction characters in the search string we send to the engine. Which makes sense to me. (And the first result should say "how to program c# -- Search with Engine" -- it should include the #.)
I agree with that, provided we still remove restriction chars at the beginning of the search string. Otherwise we'd regress the fix for bug 1334019.
In practice a trailing restriction char would still apply, but won't be stripped from the heuristic search result, right?
Assignee | ||
Updated•5 years ago
|
Comment 6•5 years ago
|
||
(In reply to Marco Bonardo [::mak] from comment #5)
I agree with that, provided we still remove restriction chars at the
beginning of the search string. Otherwise we'd regress the fix for bug
1334019.
True, although ideally you could search for "#someHashtag". Could "?" be a special case? That is, remove "?" at the beginning but not other restriction chars? "?" is the only char that affects the heuristic result (right?) -- it forces a search heuristic, but none of the other chars force one.
In practice a trailing restriction char would still apply, but won't be
stripped from the heuristic search result, right?
Yes, I think so, e.g. "how to program c#" would still restrict non-heuristic results to titles, but the heuristic would show "how to program c# -- Search with Google"
Updated•5 years ago
|
Assignee | ||
Comment 7•5 years ago
|
||
(In reply to Drew Willcoxon :adw from comment #6)
(In reply to Marco Bonardo [::mak] from comment #5)
I agree with that, provided we still remove restriction chars at the
beginning of the search string. Otherwise we'd regress the fix for bug
1334019.True, although ideally you could search for "#someHashtag". Could "?" be a special case? That is, remove "?" at the beginning but not other restriction chars? "?" is the only char that affects the heuristic result (right?) -- it forces a search heuristic, but none of the other chars force one.
Maybe, it depends on how much it complicates the code.
This feature is very cool and mildly useful now that I read what it does, but it's absolutely obscure otherwise. In my opinion the UI should reflect that the search is being restricted to titles/bookmarks/etc.
For instance, this already happens in part with the "?" token, since typing it will show an empty "Search with <engine>". Ideally it should be something that remains visible even after the user has typed other characters - perhaps something like what is done with custom search engines (via keywords), where you can clearly understand that you will be searching on website X rather than a normal search?
Updated•5 years ago
|
Assignee | ||
Comment 16•5 years ago
|
||
I can try to find some time.
Comment 18•5 years ago
|
||
I was thinking of looking at this too, so if you need any help or want me to take it, let me know.
Comment hidden (me-too) |
Updated•5 years ago
|
Comment 21•5 years ago
|
||
Given the number of dupes coming in, I'd consider this for a 68.0.2 dot release.
Comment 22•5 years ago
|
||
Should we bump this to a P1 since it's on the dot release radar?
Assignee | ||
Comment 23•5 years ago
|
||
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Updated•5 years ago
|
Comment 26•5 years ago
|
||
Comment 28•5 years ago
|
||
As a workaround you can add two hashtags like "##" at the end
Comment 29•5 years ago
|
||
bugherder |
Comment 30•5 years ago
|
||
(In reply to pipernene from comment #28)
As a workaround you can add two hashtags like "##" at the end
A workaround that works with all characters is to put your search in quotes, like "c#", "c++", "c?"
We're going to fix this possibly as far back as 68, so that won't be necessary soon.
Updated•5 years ago
|
Assignee | ||
Comment 31•5 years ago
|
||
Comment on attachment 9080347 [details]
Bug 1560228 - Strip only leading question marks from search string. r=adw
Beta/Release Uplift Approval Request
- User impact if declined: Some search strings are cut, searching for things like "doing something in c#" is broken
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: Yes
- If yes, steps to reproduce: Try to search for "c#" or "c++", see other comments for more example. Check the searched string when the first result (search) is confirmed is what was typed.
Only exception, "?something" will search for "something". In practice the only special char we strip is a leading question mark. - List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Simple code removal with test
- String changes made/needed: none
ESR Uplift Approval Request
- If this is not a sec:{high,crit} bug, please state case for ESR consideration: Annoyance in searching from the urlbar
- User impact if declined: Some search strings are cut, searching for things like "doing something in c#" is broken
- Fix Landed on Version:
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): Simple code removal with test
- String or UUID changes made by this patch:
Assignee | ||
Updated•5 years ago
|
Assignee | ||
Comment 32•5 years ago
|
||
Comment on attachment 9080347 [details]
Bug 1560228 - Strip only leading question marks from search string. r=adw
Beta/Release Uplift Approval Request
- User impact if declined: See Beta uplift
- Is this code covered by automated tests?: Yes
- Has the fix been verified in Nightly?: No
- Needs manual test from QE?: Yes
- If yes, steps to reproduce: See Beta uplift
- List of other uplifts needed: None
- Risk to taking this patch: Low
- Why is the change risky/not risky? (and alternatives if risky): See Beta uplift
- String changes made/needed:
Updated•5 years ago
|
Comment 33•5 years ago
|
||
I reproduced this issue using Fx 68.0 on Windows 10 x64.
I can confirm this issue is fixed, I verified using Fx 70.0a1 (2019-07-26) on Windows 10 x64, Ubuntu 18.04 LTS and macOS 10.13.
Comment 34•5 years ago
|
||
Comment on attachment 9080347 [details]
Bug 1560228 - Strip only leading question marks from search string. r=adw
Fixes a regression causing search strings to be cut off when certain characters are present. Verified on Nightly. Approved for 69.0b9.
Comment 35•5 years ago
|
||
bugherder uplift |
Comment 36•5 years ago
|
||
Backed out changeset 77b48431ba6f (Bug 1560228) for failures on browser_keyword.js
Backout link: https://hg.mozilla.org/releases/mozilla-beta/rev/581fdb8b8e9cda964d6020feb00efe1eec062b0d
Failure log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=258578454&repo=mozilla-beta&lineNumber=18972
23:36:22 INFO - Entering test bound test_keyword_using_get
23:36:22 INFO - Buffered messages logged at 23:36:19
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Should have a keyword result - 4 == 4 -
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Node should contain the name of the bookmark and query - "example.com: something" == "example.com: something" -
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Should have an empty action - true == true -
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Expect correct url - "moz-action:keyword,{"url":"https%3A%2F%2Fexample.com%2Fbrowser%2Fbrowser%2Fcomponents%2Furlbar%2Ftests%2Fbrowser%2Fprint_postdata.sjs%3Fq%3Dsomething","keyword":"get","input":"get%20something"}" == "moz-action:keyword,{"url":"https%3A%2F%2Fexample.com%2Fbrowser%2Fbrowser%2Fcomponents%2Furlbar%2Ftests%2Fbrowser%2Fprint_postdata.sjs%3Fq%3Dsomething","keyword":"get","input":"get%20something"}" -
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | URL hbox element sanity check - true == true -
23:36:22 INFO - Normal click on result
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Tab should have loaded from clicking on result - "https://example.com/browser/browser/components/urlbar/tests/browser/print_postdata.sjs?q=something" == "https://example.com/browser/browser/components/urlbar/tests/browser/print_postdata.sjs?q=something" -
23:36:22 INFO - Middle-click on result
23:36:22 INFO - Console message: [JavaScript Error: "The character encoding of the plain text document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the file needs to be declared in the transfer protocol or file needs to use a byte order mark as an encoding signature." {file: "https://example.com/browser/browser/components/urlbar/tests/browser/print_postdata.sjs?q=something" line: 0}]
23:36:22 INFO - Buffered messages logged at 23:36:20
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Should have a keyword result - 4 == 4 -
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Expect correct url - "moz-action:keyword,{"url":"https%3A%2F%2Fexample.com%2Fbrowser%2Fbrowser%2Fcomponents%2Furlbar%2Ftests%2Fbrowser%2Fprint_postdata.sjs%3Fq%3Dsomethingmore","keyword":"get","input":"get%20somethingmore"}" == "moz-action:keyword,{"url":"https%3A%2F%2Fexample.com%2Fbrowser%2Fbrowser%2Fcomponents%2Furlbar%2Ftests%2Fbrowser%2Fprint_postdata.sjs%3Fq%3Dsomethingmore","keyword":"get","input":"get%20somethingmore"}" -
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Tab should have loaded from middle-clicking on result - "https://example.com/browser/browser/components/urlbar/tests/browser/print_postdata.sjs?q=somethingmore" == "https://example.com/browser/browser/components/urlbar/tests/browser/print_postdata.sjs?q=somethingmore" -
23:36:22 INFO - Leaving test bound test_keyword_using_get
23:36:22 INFO - Entering test bound test_keyword_using_post
23:36:22 INFO - Console message: [JavaScript Error: "The character encoding of the plain text document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the file needs to be declared in the transfer protocol or file needs to use a byte order mark as an encoding signature." {file: "https://example.com/browser/browser/components/urlbar/tests/browser/print_postdata.sjs?q=somethingmore" line: 0}]
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Should have a keyword result - 4 == 4 -
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Node should contain the name of the bookmark and query - "example.com: something" == "example.com: something" -
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Should have an empty action - true == true -
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Expect correct url - "moz-action:keyword,{"url":"https%3A%2F%2Fexample.com%2Fbrowser%2Fbrowser%2Fcomponents%2Furlbar%2Ftests%2Fbrowser%2Fprint_postdata.sjs","keyword":"post","input":"post%20something","postData":"q%3Dsomething"}" == "moz-action:keyword,{"url":"https%3A%2F%2Fexample.com%2Fbrowser%2Fbrowser%2Fcomponents%2Furlbar%2Ftests%2Fbrowser%2Fprint_postdata.sjs","keyword":"post","input":"post%20something","postData":"q%3Dsomething"}" -
23:36:22 INFO - Normal click on result
23:36:22 INFO - Buffered messages logged at 23:36:21
23:36:22 INFO - waiting for tab
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Tab should have loaded from clicking on result - "https://example.com/browser/browser/components/urlbar/tests/browser/print_postdata.sjs" == "https://example.com/browser/browser/components/urlbar/tests/browser/print_postdata.sjs" -
23:36:22 INFO - Console message: [JavaScript Error: "The character encoding of the plain text document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the file needs to be declared in the transfer protocol or file needs to use a byte order mark as an encoding signature." {file: "https://example.com/browser/browser/components/urlbar/tests/browser/print_postdata.sjs" line: 0}]
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | post data was submitted correctly - "q=something" == "q=something" -
23:36:22 INFO - Leaving test bound test_keyword_using_post
23:36:22 INFO - Entering test bound test_keyword_with_question_mark
23:36:22 INFO - Buffered messages logged at 23:36:22
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Result should be a search - 2 == 2 -
23:36:22 INFO - Buffered messages finished
23:36:22 INFO - TEST-UNEXPECTED-FAIL | browser/components/urlbar/tests/browser/browser_keyword.js | Check search query - "question" == "question?" - JS frame :: chrome://mochitests/content/browser/browser/components/urlbar/tests/browser/browser_keyword.js :: test_keyword_with_question_mark :: line 236
23:36:22 INFO - Stack trace:
23:36:22 INFO - chrome://mochitests/content/browser/browser/components/urlbar/tests/browser/browser_keyword.js:test_keyword_with_question_mark:236
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Result should be a keyword - 4 == 4 -
Updated•5 years ago
|
Assignee | ||
Comment 37•5 years ago
|
||
what is failing is the legacy version of the test, with QB disabled. That will be a common thing for uplifts :(
Assignee | ||
Comment 38•5 years ago
|
||
As expected, the legacy test needed a small update (2 lines removal in UrlbarTestUtils.jsm) to cope with the changed code
Comment 39•5 years ago
|
||
bugherder uplift |
Comment 40•5 years ago
|
||
I can confirm this issue is fixed on beta as well, I verified using Fx 69.0b9 on Windows 10 x64, Ubuntu 18.04 LTS and macOS 10.13.
Assignee | ||
Updated•5 years ago
|
Comment 42•5 years ago
|
||
Updated•5 years ago
|
Updated•5 years ago
|
Comment 43•5 years ago
|
||
bugherder uplift |
Comment 44•5 years ago
|
||
bugherder uplift |
Comment 45•5 years ago
|
||
Added to the 68.0.2 relnotes as:
Fixed a bug causing some special characters to be cut off from the end of the search terms when searching from the URL bar
Comment 48•5 years ago
|
||
Using the builds from taskcluster, I verified this issue with Fx 68.0.2(x32) and 68.1.0esr(x32) on Windows 10 x64, macOS 10.13.6 and Ubuntu 18.04 LTS.
Comment 49•5 years ago
|
||
Problem is not only with #
but with all special characters (#
, $
, %
, ^
, *
, ?
, +
, %
mentioned in https://searchfox.org/mozilla-central/rev/c0ca77697c6868482f30af873ec8069f2c080a34/browser/components/urlbar/UrlbarTokenizer.jsm#58).
Try to search this keyword:
foo in c++.
search results shown for foo in c+ (https://duckduckgo.com/?q=foo+in+c%2B&t=ffab&ia=web).
Comment 50•5 years ago
|
||
Yes, and it should be fixed now on all Firefox versions back to 68 -- although I don't think the latest 68 release includes it yet, and I'm not sure about the latest 69/beta build. If it's not fixed in the current 68 and 69 releases, it will be in the next ones.
Comment 51•5 years ago
|
||
The current 69 beta should contain this fix. The 68.0.2 release isn't available yet - ETA is later this week.
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•5 years ago
|
Updated•3 years ago
|
Updated•3 years ago
|
Description
•