Closed Bug 1560228 Opened 5 years ago Closed 5 years ago

Octothorpe/sharp/hash/pound and question marks are elided from end of searches

Categories

(Firefox :: Address Bar, defect, P1)

68 Branch
defect
Points:
2

Tracking

()

VERIFIED FIXED
Firefox 70
Iteration:
70.2 - Jul 22 - Aug 4
Tracking Status
relnote-firefox --- 68+
firefox-esr60 --- unaffected
firefox-esr68 68+ verified
firefox68 + verified
firefox69 + verified
firefox70 + verified

People

(Reporter: mqudsi, Assigned: mak)

References

(Regression)

Details

(Keywords: regression)

Attachments

(2 files)

User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:68.0) Gecko/20100101 Firefox/68.0

Steps to reproduce:

Try searching (w/ Google as the default search provider, if it makes a difference) via the omnibox/omnibar for a search term with the literal # as the last non-whitespace character in your search, e.g. "how to do foo in C#"

Actual results:

The # is trimmed/elided from the search parameters; it is presumably detected as an unused trailing document location indicator/page anchor.

For the original example, the search results for "how to do foo in C" are loaded instead.

Expected results:

Firefox should detect that a search is being performed rather than a navigation to a specific URL, and any/all url cleanup operations (including eliding unused/trailing # and ? characters, typically denoting unused cruft from previous query string or window anchor operations/manipulations) should be skipped.

As a basic heuristic, any omnibar content containing embedded (i.e. non-trailing or and non-prefixed) whitespace should not be sanitized in this manner. More advanced differentiation between searches and navigation requests is the domain of #1080682, but in this particular case the contents of the omnibar are being correctly interpreted as search terms (and the search results are pulled up), but it seems like the sanitization is being blindly performed regardless of the classification of the content/operation.

Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:69.0) Gecko/20100101 Firefox/69.0
20190619214046

  1. DuckDuckGo set as default search engine.
  2. Enter how to program c# into the address bar.

Browser Console shows request to https://duckduckgo.com/?q=how+to+program+c&t=ffab

Searching directly from the https://duckduckgo.com box shows a request to https://duckduckgo.com/?q=how+to+program+c%23&t=hj

Status: UNCONFIRMED → NEW
Component: Untriaged → Address Bar
Ever confirmed: true

We have a set of "restriction" characters you can use at the beginning or the end of your search to restrict results to certain sources. The "#" character happens to cause your search to match on titles. The full list is here, if you're interested: https://searchfox.org/mozilla-central/rev/c0ca77697c6868482f30af873ec8069f2c080a34/browser/components/urlbar/UrlbarTokenizer.jsm#58

So what you're seeing is intentional behavior, but I agree it's surprising.

Marco, any thoughts? iirc I thought we should only allow restriction chars at the beginning. :-) Although I guess you could hit unfortunate cases even with that, like searching for a hashtag (#foo).

Priority: -- → P3

Oh but in this case you're selecting the first result, right? So what I mentioned above is true, it's just not really relevant.

There was a similar bug about this I think, and Mark suggested that if you choose the first result ("how to program c -- Search with Engine"), we should include restriction characters in the search string we send to the engine. Which makes sense to me. (And the first result should say "how to program c# -- Search with Engine" -- it should include the #.)

i.e., this really is a bug, not intended behavior.

(In reply to Drew Willcoxon :adw from comment #3)

Oh but in this case you're selecting the first result, right? So what I mentioned above is true, it's just not really relevant.

There was a similar bug about this I think, and Mark suggested that if you choose the first result ("how to program c -- Search with Engine"), we should include restriction characters in the search string we send to the engine. Which makes sense to me. (And the first result should say "how to program c# -- Search with Engine" -- it should include the #.)

I agree with that, provided we still remove restriction chars at the beginning of the search string. Otherwise we'd regress the fix for bug 1334019.
In practice a trailing restriction char would still apply, but won't be stripped from the heuristic search result, right?

Points: --- → 2

(In reply to Marco Bonardo [::mak] from comment #5)

I agree with that, provided we still remove restriction chars at the
beginning of the search string. Otherwise we'd regress the fix for bug
1334019.

True, although ideally you could search for "#someHashtag". Could "?" be a special case? That is, remove "?" at the beginning but not other restriction chars? "?" is the only char that affects the heuristic result (right?) -- it forces a search heuristic, but none of the other chars force one.

In practice a trailing restriction char would still apply, but won't be
stripped from the heuristic search result, right?

Yes, I think so, e.g. "how to program c#" would still restrict non-heuristic results to titles, but the heuristic would show "how to program c# -- Search with Google"

(In reply to Drew Willcoxon :adw from comment #6)

(In reply to Marco Bonardo [::mak] from comment #5)

I agree with that, provided we still remove restriction chars at the
beginning of the search string. Otherwise we'd regress the fix for bug
1334019.

True, although ideally you could search for "#someHashtag". Could "?" be a special case? That is, remove "?" at the beginning but not other restriction chars? "?" is the only char that affects the heuristic result (right?) -- it forces a search heuristic, but none of the other chars force one.

Maybe, it depends on how much it complicates the code.

This feature is very cool and mildly useful now that I read what it does, but it's absolutely obscure otherwise. In my opinion the UI should reflect that the search is being restricted to titles/bookmarks/etc.

For instance, this already happens in part with the "?" token, since typing it will show an empty "Search with <engine>". Ideally it should be something that remains visible even after the user has typed other characters - perhaps something like what is done with custom search engines (via keywords), where you can clearly understand that you will be searching on website X rather than a normal search?

Regressed by: 1538050

7 duplicates now. We should probably prioritize this.

Priority: P3 → P2

I can try to find some time.

Assignee: nobody → mak77
Status: NEW → ASSIGNED

I was thinking of looking at this too, so if you need any help or want me to take it, let me know.

Given the number of dupes coming in, I'd consider this for a 68.0.2 dot release.

Should we bump this to a P1 since it's on the dot release radar?

Flags: needinfo?(mak77)
Flags: needinfo?(mak77)
Priority: P2 → P1
Iteration: --- → 70.2 - Jul 22 - Aug 4
Pushed by mak77@bonardo.net: https://hg.mozilla.org/integration/autoland/rev/5cd2f7a145ac Strip only leading question marks from search string. r=adw

As a workaround you can add two hashtags like "##" at the end

Status: ASSIGNED → RESOLVED
Closed: 5 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 70

(In reply to pipernene from comment #28)

As a workaround you can add two hashtags like "##" at the end

A workaround that works with all characters is to put your search in quotes, like "c#", "c++", "c?"

We're going to fix this possibly as far back as 68, so that won't be necessary soon.

Comment on attachment 9080347 [details]
Bug 1560228 - Strip only leading question marks from search string. r=adw

Beta/Release Uplift Approval Request

  • User impact if declined: Some search strings are cut, searching for things like "doing something in c#" is broken
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: Try to search for "c#" or "c++", see other comments for more example. Check the searched string when the first result (search) is confirmed is what was typed.
    Only exception, "?something" will search for "something". In practice the only special char we strip is a leading question mark.
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Simple code removal with test
  • String changes made/needed: none

ESR Uplift Approval Request

  • If this is not a sec:{high,crit} bug, please state case for ESR consideration: Annoyance in searching from the urlbar
  • User impact if declined: Some search strings are cut, searching for things like "doing something in c#" is broken
  • Fix Landed on Version:
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): Simple code removal with test
  • String or UUID changes made by this patch:
Attachment #9080347 - Flags: approval-mozilla-esr68?
Attachment #9080347 - Flags: approval-mozilla-beta?
Flags: qe-verify+

Comment on attachment 9080347 [details]
Bug 1560228 - Strip only leading question marks from search string. r=adw

Beta/Release Uplift Approval Request

  • User impact if declined: See Beta uplift
  • Is this code covered by automated tests?: Yes
  • Has the fix been verified in Nightly?: No
  • Needs manual test from QE?: Yes
  • If yes, steps to reproduce: See Beta uplift
  • List of other uplifts needed: None
  • Risk to taking this patch: Low
  • Why is the change risky/not risky? (and alternatives if risky): See Beta uplift
  • String changes made/needed:
Attachment #9080347 - Flags: approval-mozilla-release?
QA Whiteboard: [qa-triaged]

I reproduced this issue using Fx 68.0 on Windows 10 x64.
I can confirm this issue is fixed, I verified using Fx 70.0a1 (2019-07-26) on Windows 10 x64, Ubuntu 18.04 LTS and macOS 10.13.

Comment on attachment 9080347 [details]
Bug 1560228 - Strip only leading question marks from search string. r=adw

Fixes a regression causing search strings to be cut off when certain characters are present. Verified on Nightly. Approved for 69.0b9.

Attachment #9080347 - Flags: approval-mozilla-beta? → approval-mozilla-beta+

Backed out changeset 77b48431ba6f (Bug 1560228) for failures on browser_keyword.js

Push with failures: https://treeherder.mozilla.org/#/jobs?repo=mozilla-beta&fromchange=a00fc0a3e360c00ae12da0cf501f99721ed613c3&searchStr=mochitest-browser-chrome&tochange=581fdb8b8e9cda964d6020feb00efe1eec062b0d&selectedJob=258578454

Backout link: https://hg.mozilla.org/releases/mozilla-beta/rev/581fdb8b8e9cda964d6020feb00efe1eec062b0d

Failure log:
https://treeherder.mozilla.org/logviewer.html#/jobs?job_id=258578454&repo=mozilla-beta&lineNumber=18972

23:36:22 INFO - Entering test bound test_keyword_using_get
23:36:22 INFO - Buffered messages logged at 23:36:19
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Should have a keyword result - 4 == 4 -
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Node should contain the name of the bookmark and query - "example.com: something" == "example.com: something" -
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Should have an empty action - true == true -
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Expect correct url - "moz-action:keyword,{"url":"https%3A%2F%2Fexample.com%2Fbrowser%2Fbrowser%2Fcomponents%2Furlbar%2Ftests%2Fbrowser%2Fprint_postdata.sjs%3Fq%3Dsomething","keyword":"get","input":"get%20something"}" == "moz-action:keyword,{"url":"https%3A%2F%2Fexample.com%2Fbrowser%2Fbrowser%2Fcomponents%2Furlbar%2Ftests%2Fbrowser%2Fprint_postdata.sjs%3Fq%3Dsomething","keyword":"get","input":"get%20something"}" -
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | URL hbox element sanity check - true == true -
23:36:22 INFO - Normal click on result
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Tab should have loaded from clicking on result - "https://example.com/browser/browser/components/urlbar/tests/browser/print_postdata.sjs?q=something" == "https://example.com/browser/browser/components/urlbar/tests/browser/print_postdata.sjs?q=something" -
23:36:22 INFO - Middle-click on result
23:36:22 INFO - Console message: [JavaScript Error: "The character encoding of the plain text document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the file needs to be declared in the transfer protocol or file needs to use a byte order mark as an encoding signature." {file: "https://example.com/browser/browser/components/urlbar/tests/browser/print_postdata.sjs?q=something" line: 0}]
23:36:22 INFO - Buffered messages logged at 23:36:20
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Should have a keyword result - 4 == 4 -
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Expect correct url - "moz-action:keyword,{"url":"https%3A%2F%2Fexample.com%2Fbrowser%2Fbrowser%2Fcomponents%2Furlbar%2Ftests%2Fbrowser%2Fprint_postdata.sjs%3Fq%3Dsomethingmore","keyword":"get","input":"get%20somethingmore"}" == "moz-action:keyword,{"url":"https%3A%2F%2Fexample.com%2Fbrowser%2Fbrowser%2Fcomponents%2Furlbar%2Ftests%2Fbrowser%2Fprint_postdata.sjs%3Fq%3Dsomethingmore","keyword":"get","input":"get%20somethingmore"}" -
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Tab should have loaded from middle-clicking on result - "https://example.com/browser/browser/components/urlbar/tests/browser/print_postdata.sjs?q=somethingmore" == "https://example.com/browser/browser/components/urlbar/tests/browser/print_postdata.sjs?q=somethingmore" -
23:36:22 INFO - Leaving test bound test_keyword_using_get
23:36:22 INFO - Entering test bound test_keyword_using_post
23:36:22 INFO - Console message: [JavaScript Error: "The character encoding of the plain text document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the file needs to be declared in the transfer protocol or file needs to use a byte order mark as an encoding signature." {file: "https://example.com/browser/browser/components/urlbar/tests/browser/print_postdata.sjs?q=somethingmore" line: 0}]
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Should have a keyword result - 4 == 4 -
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Node should contain the name of the bookmark and query - "example.com: something" == "example.com: something" -
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Should have an empty action - true == true -
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Expect correct url - "moz-action:keyword,{"url":"https%3A%2F%2Fexample.com%2Fbrowser%2Fbrowser%2Fcomponents%2Furlbar%2Ftests%2Fbrowser%2Fprint_postdata.sjs","keyword":"post","input":"post%20something","postData":"q%3Dsomething"}" == "moz-action:keyword,{"url":"https%3A%2F%2Fexample.com%2Fbrowser%2Fbrowser%2Fcomponents%2Furlbar%2Ftests%2Fbrowser%2Fprint_postdata.sjs","keyword":"post","input":"post%20something","postData":"q%3Dsomething"}" -
23:36:22 INFO - Normal click on result
23:36:22 INFO - Buffered messages logged at 23:36:21
23:36:22 INFO - waiting for tab
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Tab should have loaded from clicking on result - "https://example.com/browser/browser/components/urlbar/tests/browser/print_postdata.sjs" == "https://example.com/browser/browser/components/urlbar/tests/browser/print_postdata.sjs" -
23:36:22 INFO - Console message: [JavaScript Error: "The character encoding of the plain text document was not declared. The document will render with garbled text in some browser configurations if the document contains characters from outside the US-ASCII range. The character encoding of the file needs to be declared in the transfer protocol or file needs to use a byte order mark as an encoding signature." {file: "https://example.com/browser/browser/components/urlbar/tests/browser/print_postdata.sjs" line: 0}]
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | post data was submitted correctly - "q=something" == "q=something" -
23:36:22 INFO - Leaving test bound test_keyword_using_post
23:36:22 INFO - Entering test bound test_keyword_with_question_mark
23:36:22 INFO - Buffered messages logged at 23:36:22
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Result should be a search - 2 == 2 -
23:36:22 INFO - Buffered messages finished
23:36:22 INFO - TEST-UNEXPECTED-FAIL | browser/components/urlbar/tests/browser/browser_keyword.js | Check search query - "question" == "question?" - JS frame :: chrome://mochitests/content/browser/browser/components/urlbar/tests/browser/browser_keyword.js :: test_keyword_with_question_mark :: line 236
23:36:22 INFO - Stack trace:
23:36:22 INFO - chrome://mochitests/content/browser/browser/components/urlbar/tests/browser/browser_keyword.js:test_keyword_with_question_mark:236
23:36:22 INFO - TEST-PASS | browser/components/urlbar/tests/browser/browser_keyword.js | Result should be a keyword - 4 == 4 -

Flags: needinfo?(mak77)

what is failing is the legacy version of the test, with QB disabled. That will be a common thing for uplifts :(

Attached patch Beta/ESR patch (deleted) — Splinter Review

As expected, the legacy test needed a small update (2 lines removal in UrlbarTestUtils.jsm) to cope with the changed code

Flags: needinfo?(mak77)
Attachment #9081280 - Flags: checkin?(ryanvm)

I can confirm this issue is fixed on beta as well, I verified using Fx 69.0b9 on Windows 10 x64, Ubuntu 18.04 LTS and macOS 10.13.

Summary: Octothorpe/sharp/hash/pound and question marks are elided from end of omnibox searches → Octothorpe/sharp/hash/pound and question marks are elided from end of searches
Comment on attachment 9081280 [details] [diff] [review] Beta/ESR patch Fixes a commonly-reported regression in Fx68. Verified fixed by QA and no known issues reported since it landed. Approved for 68.0.2 and 68.0.2esr.
Attachment #9081280 - Flags: checkin?(ryanvm)
Attachment #9081280 - Flags: approval-mozilla-release+
Attachment #9081280 - Flags: approval-mozilla-esr68+
Attachment #9080347 - Flags: approval-mozilla-release?
Attachment #9080347 - Flags: approval-mozilla-esr68?

Added to the 68.0.2 relnotes as:

Fixed a bug causing some special characters to be cut off from the end of the search terms when searching from the URL bar

Using the builds from taskcluster, I verified this issue with Fx 68.0.2(x32) and 68.1.0esr(x32) on Windows 10 x64, macOS 10.13.6 and Ubuntu 18.04 LTS.

Status: RESOLVED → VERIFIED
Flags: qe-verify+

Problem is not only with # but with all special characters (#, $, %, ^, *, ?, +, % mentioned in https://searchfox.org/mozilla-central/rev/c0ca77697c6868482f30af873ec8069f2c080a34/browser/components/urlbar/UrlbarTokenizer.jsm#58).

Try to search this keyword:
foo in c++.
search results shown for foo in c+ (https://duckduckgo.com/?q=foo+in+c%2B&t=ffab&ia=web).

Yes, and it should be fixed now on all Firefox versions back to 68 -- although I don't think the latest 68 release includes it yet, and I'm not sure about the latest 69/beta build. If it's not fixed in the current 68 and 69 releases, it will be in the next ones.

The current 69 beta should contain this fix. The 68.0.2 release isn't available yet - ETA is later this week.

Flags: in-qa-testsuite?(comorasu.cristian)
Flags: in-qa-testsuite?(comorasu.cristian) → in-qa-testsuite+
Has Regression Range: --- → yes
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: