Closed Bug 1164159 Opened 10 years ago Closed 9 years ago

Send information about the default search engines through Telemetry

Categories

(Toolkit :: Telemetry, defect, P1)

defect

Tracking

()

RESOLVED FIXED
mozilla41
Iteration:
41.3 - Jun 29
Tracking Status
firefox40 --- fixed
firefox41 --- fixed

People

(Reporter: glind, Assigned: florian)

References

Details

(Whiteboard: [hijacking][fxsearch])

Attachments

(2 files, 11 obsolete files)

(deleted), patch
florian
: review+
Details | Diff | Splinter Review
(deleted), patch
florian
: review+
Details | Diff | Splinter Review
Minimal Proposal: - near BEGINNING and END of session - store / transmit all `SearchService.getDefaultEngines()` as (alias, anonymized path, template string) tuples Justifcations: - changes DURING the session can be correlated with addon installs (and will overestimate those effects) - changes BETWEEN sessions should not happen, and are clear evidence of tampering. Notes: 1. Doesn't require a 100% sample. Small samples (1%) should be sufficient to detect things "pretty early" unless they have "really slow" rollouts. We can formalize this if this number is too much. 2. Claim: hashing the values of these engines removes much of the value. These *should* be our engines as deployed. 3. This is both a VERIFICATION and an ESCALATION DETECTION mechanism. As the fix rolls out, "unusual defaultEngines" should go to (and maintain at) near 0 levels.
Gavin, Benjamin, Florian, or Vladan :) Ideas and critiques welcome. I am working on implementing it, unless you want someone else to do it.
Flags: needinfo?(gavin.sharp)
Assignee: nobody → glind
Status: NEW → ASSIGNED
Comment on attachment 8607055 [details] [diff] [review] WIP Hardening Verification Telemetry Questions: (Florian: - this a sensible plan / way to implement? This does not capture any aspects of the file path. - I claim that this should only be called once, and that being called twice is actually a Bad Sign. Is that right? Benjamin: - new probe. - I claim this is needed. - Georg had some suggestions for simplifying the description
Attachment #8607055 - Flags: feedback?(florian)
Attachment #8607055 - Flags: feedback?(benjamin)
(In reply to Gregg Lind (User Advocacy - Heartbeat - Test Pilot) from comment #3) > Questions: > > (Florian: > > - This does not capture any aspects of the file path. Is this intentional, or because you haven't found a straight forward way to capture the path? > - I claim that this should only be called once, and that being called twice > is actually a Bad Sign. Is that right? Assuming you are talking about the _recordDefaultEngines method you are adding, do you see any reason why it could be called twice? Maybe the case where a synchronous initialization of the search service happens while the async code path was already in progress?
Comment on attachment 8607055 [details] [diff] [review] WIP Hardening Verification Telemetry Review of attachment 8607055 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/search/nsSearchService.js @@ +4869,5 @@ > + }, > + > + _recordDefaultEngines: function eus_recordDefaultEngines () { > + DO_LOG("recording defaultengines"); > + let engs = this.getDefaultEngines({}).map((eng)=>[eng.name,eng.searchForm]) I think you really want to record some information about where we are loading the plugin from. How would you feel about using a format like this: jar:[app]omni.ja/browser/yahoo.xml [profile]searchplugins/foo.xml [app]searchplugins/evil.xml jar:[profile]extensions/foo.xpi/foo.xml [other]unexpected.xml <- 'other' is used when neither the app nor profile directory is an ancestor of the folder containing the engine file, to avoid leaking user-identifiable data like a username. To compute this location information, you'll most likely want to add another getter in the Engine implementation. This will be inspired from the existing code in the _id getter. I can't decide if the name is useful information or not. I'm afraid the only reason why it's useful is that it's the info we use to set the original default from the browser.search.defaultenginename pref. I'm pretty sure .searchForm isn't the information you want to save. This is the URL of the form that will be loaded if the user presses <enter> in the searchbox without entering any search query. Instead I think you want something like engine._getURLOfType(URLTYPE_SEARCH_HTML).template ::: toolkit/components/telemetry/Histograms.json @@ +4768,5 @@ > + "expires_in_version": "never", > + "kind": "count", > + "keyed": true, > + "releaseChannelCollection": "opt-out", > + "description": "Verify claims that https://bugzilla.mozilla.org/1162569 and https://bugzilla.mozilla.org/show_bug.cgi?id=1164159 works to protect getDefault" This doesn't seem to be a very useful description to me. What about something more like "Record information about the set of default search engines"?
Attachment #8607055 - Flags: feedback?(florian)
Summary: Verify Hardening of SearchSerivce DefaultEngines → Verify Hardening of SearchService DefaultEngines
Attachment #8607055 - Attachment is obsolete: true
Attachment #8607055 - Flags: feedback?(benjamin)
Comment on attachment 8607917 [details] [diff] [review] WIP v2 Hardening Verification Telemetry (ignore all the rawEngines stuff, that's for me to debug :) ) I suggest four possible ways to serialize the engine. 1 + // eng._serializeToJSON() this is RIGHT, but WAY TOO BIG 2 + // eng._getURLOfType("URLTYPE_SEARCH_HTML").template <- TOO SMALL. 3 + // eng._getURLOfType("URLTYPE_SEARCH_HTML") <- SUFFICIENT, EXPLICIT 4 + // eng.getSubmission("dummy", null, "searchbar").uri.spec <- SUFFICIENT, Implicit, shorter, less clear? 1, 3, 4 are sufficient to get partner code hijack. Thoughts? I am leaning toward (4), but maybe am being silly about space costs. 3 is SUPER EXPLICIT. With 4, analysts have to do query arg splitting anyway to get the 'partner code arg' for each engine. Nits: only some engines currently support the "saerchbar" Purpose. Example output: JSON.stringify(Services.search.getDefaultEngines().map(x => x.getSubmission('something', null, 'searchbar').uri.spec), null, 2) `[ "https://search.yahoo.com/yhs/search?p=something&ei=UTF-8&hspart=mozilla&hsimp=yhs-001", "https://www.google.com/search?q=something&ie=utf-8&oe=utf-8", "https://www.bing.com/search?q=something&pc=MOZI&form=MOZSBR", "http://www.amazon.com/exec/obidos/external-search/?field-keywords=something&mode=blended&tag=mozilla-20&sourceid=Mozilla-search", "https://duckduckgo.com/?q=something&t=ffsb", "http://rover.ebay.com/rover/1/711-47294-18009-3/4?mfe=search&mpre=http://www.ebay.com/sch/i.html?_nkw=something", "https://twitter.com/search?q=something&partner=Firefox&source=desktop-search", "https://en.wikipedia.org/wiki/Special:Search?search=something&sourceid=Mozilla-search" ]`
Attachment #8607917 - Flags: feedback?(florian)
Rephrasing the summary to match the code changes we are doing in this bug. Verifying will be done elsewhere once we have received the data. (and removing the needinfo on Gavin as I don't think you are still waiting for information from him)
Flags: needinfo?(gavin.sharp)
Summary: Verify Hardening of SearchService DefaultEngines → Send information about the default search engines through Telemetry
(In reply to Gregg Lind (User Advocacy - Heartbeat - Test Pilot) from comment #7) > Comment on attachment 8607917 [details] [diff] [review] > WIP v2 Hardening Verification Telemetry > > (ignore all the rawEngines stuff, that's for me to debug :) ) > > I suggest four possible ways to serialize the engine. > > 1 + // eng._serializeToJSON() this is RIGHT, but WAY TOO BIG > 2 + // eng._getURLOfType("URLTYPE_SEARCH_HTML").template <- TOO SMALL. > 3 + // eng._getURLOfType("URLTYPE_SEARCH_HTML") <- SUFFICIENT, EXPLICIT > 4 + // eng.getSubmission("dummy", null, "searchbar").uri.spec <- > SUFFICIENT, Implicit, shorter, less clear? I would suggest we use a variation of your 3 and 4: eng._getURLOfType("text/html").getSubmission("", e, "searchbar").uri.spec This gives the same result as your 4, but avoids sending the word 'dummy' for each url. Here are the results we get with my suggestion: Services.search.getDefaultEngines().map(e => e.wrappedJSObject._getURLOfType("text/html").getSubmission("", e, "searchbar").uri.spec).join('\n') "https://www.google.com/search?q=&ie=utf-8&oe=utf-8 https://search.yahoo.com/yhs/search?p=&ei=UTF-8&hspart=mozilla&hsimp=yhs-001 https://www.bing.com/search?q=&pc=MOZI&form=MOZSBR http://www.amazon.com/exec/obidos/external-search/?field-keywords=&mode=blended&tag=mozilla-20&sourceid=Mozilla-search https://duckduckgo.com/?q=&t=ffsb http://rover.ebay.com/rover/1/711-47294-18009-3/4?mfe=search&mpre=http://www.ebay.com/sch/i.html?_nkw= https://twitter.com/search?q=&partner=Firefox&source=desktop-search https://en.wikipedia.org/wiki/Special:Search?search=&sourceid=Mozilla-search"
Attached patch WIP v3 (obsolete) (deleted) — Splinter Review
Mark, I'm only requesting feedback for now because I haven't figured out yet how much automated testing this should receive. If you have suggestions about how this can be meaningfully tested, feel free to share :). Vladan, I'm needinfo'ing you for the added telemetry probe. It's the first time I add a new probe, so in addition to checking that we are not sending sensitive user data, please also verify that the data we'll get will be usable. I'm not sure how one sees the results of an histogram of type 'count'.
Attachment #8607917 - Attachment is obsolete: true
Attachment #8607917 - Flags: feedback?(florian)
Flags: needinfo?(vdjeric)
Attachment #8610154 - Flags: feedback?(markh)
Isn't this already covered by the search engine telemetry in TelemetryEnvironment? See bug 1138503 and ping.environment.settings.defaultSearchEngine in https://ci.mozilla.org/job/mozilla-central-docs/Tree_Documentation/
Flags: needinfo?(vdjeric) → needinfo?(glind)
Telemetry records the default search engine in the environment section of every ping and sends off a new ping as soon as the environment changes (e.g. when the default search engine changes during the session or if a new addon gets installed)
Flags: needinfo?(alessio.placitelli)
It should also be possible to infer search engine changes between sessions by analyzing the collected Telemetry from a given client
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #11) > Isn't this already covered by the search engine telemetry in > TelemetryEnvironment? See bug 1138503 and > ping.environment.settings.defaultSearchEngine The patch here is not to record the user's default search engine, but the set of default engines the search service finds at startup. This probe is to help us understand how many of our users are affected by malware modifying the engine files on the disk, or impersonating some of our default providers in some other ways.
Florian and I talked on IRC. I understand now that this patch is looking for search engine modifications, not search engine changes. - We should still report the search engine details in the environment section, instead of trying to shoe-horn the data into keyed histograms - Gregg: Why don't we only report on changes to the default search engine? - There are privacy issues with reporting URLs of search engines, e.g. if IBM's corporate intranet search engine is my default at work, then it would identify me as an IBM employee. Can we only report hashes, or only report the search engine's URL if we found it to have been tampered with?
Flags: needinfo?(alessio.placitelli)
Comment on attachment 8610154 [details] [diff] [review] WIP v3 Review of attachment 8610154 [details] [diff] [review]: ----------------------------------------------------------------- The code itself looks fine to me, and I'll leave the debates about the privacy tradeoffs to Gregg and Vladan. I'd have thought a fairly simple test for _telemetryId() would be good, and it should also be possible to test exactly what was submitted to telemetry - ie, I guess I don't see what is particularly difficult about testing this. (Maybe testing XREAppDist is tricky in an xpcshell environment, but living without specific coverage for that might be ok) ::: toolkit/components/search/nsSearchService.js @@ +2842,5 @@ > + } > + > + let id = "unknown"; > + if (file) { > + id = file.path; It seems a bit confusing to assign to id here but always overwrite it later after checking a different bool. Can't you set id to null, s/id.startsWith/file.path.startsWith/ and replace knownLocation with a check that id isn't null? @@ +4557,5 @@ > e.hidden = false; > } > }, > > + _recordDefaultEngines: function SRCH_SVC__recordDefaultEngines() { I don't think we need this old-school function name.
Attachment #8610154 - Flags: feedback?(markh) → feedback+
Whiteboard: [hijacking][fxsearch]
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #15) > - There are privacy issues with reporting URLs of search engines, e.g. if > IBM's corporate intranet search engine is my default at work, then it would > identify me as an IBM employee. Can we only report hashes, or only report > the search engine's URL if we found it to have been tampered with? Identifying hijacked instances will rely on the full URI, as service providers maintain affiliate programs which make use of affiliate codes that are used in conjunction with monetizing hijacking activities. This provides for better identification of known hijackers, and also allows for identification of new hijackers as they come out. I'm happy to walk this through justifications, because there's definite benefit to the user in being helped to fix a known problem.
So, tradeoffs, tradeoffs. Controversies: 1. do we record all getDefaultEngines() or just getDefaultEngine()[0] 2. do we add attributes to `currentEngine`. a) if so, telemetryId? b) getSubmission? c) Should we replace .identifier with telemetryId? 3. should getSubmission() be hashed. Problems I want to solve most: - one-time full census of issues, problems, tactics. (overall approach = measure big, then drop data at server / remove proves) - anticipate reactions, and measure likely escalation. Short answers (for my opinion): 1. [0] is sufficient for 'most cases'. All() allows us to catch patterns and attempts (for when people try to hijack google, but it's not yet default). I claim that for 'default' engines, there is no privacy concern, if things are working correctly. There is a *storage* concern, but 'normal' packets could be dropped at the server. 2. a) yes. Adding .telemetryId has little additional risk. Benefits: detect load order issues. b) getSubmissionPath puts legitimate engines at risk. I claim that this risk is worthwhile, for ALL ENGINES. c) No: Risk: replacing identifier with telemetryId may break #metrics analyses, dashboards, etc. 3. No. Hashing would protect privacy for legitimate users. I claim that easily seeing the partners and structure of malicious urls is more valuable. Each engine uses different query args, and conventions. After a week or two of analysis, we should be able to whitelist ones and drop those at the server. - Right now, we have poor insight into how much these problems are occurring. - Risk to legitimate users: If you have a sensitive engine (say uTorrent, Pornhub, or Company-Internal), it will show as 'other-uTorrent'. This is already 'embarrassing'. Adding an .telemetryId will make the origin clearer ([profile]/uTorrent.xml), with no additional risk. Claim: Recording full submission url increases the risk for legitimate users, because it may add identifiable keys. - If the engine is *malicious*, then the submissionURL is the fingerprint of the attack. (as in #17). Knowing this helps to know who is responsible, and allows more forms of pressure (business contracts, press, legal) to come to bear on the problem.
Flags: needinfo?(kev)
(In reply to Gregg Lind (User Advocacy - Heartbeat - Test Pilot) from comment #18) > - Risk to legitimate users: If you have a sensitive engine (say uTorrent, > Pornhub, or Company-Internal), it will show as 'other-uTorrent'. This is > already 'embarrassing'. Adding an .telemetryId will make the origin clearer > ([profile]/uTorrent.xml), with no additional risk. Naming this telemetryId (here and in the patch) is opaque - it seems like it should be e.g. anonymizedEnginePath.
Rank: 32
Flags: firefox-backlog+
Priority: -- → P3
Attached patch WIP v4 (obsolete) (deleted) — Splinter Review
Vladan, do you like this new proposal from a privacy point of view? I'm now extending what has been done in bug 1138503 to send in the environment more information about the engine currently set as the default. For user installed engines, the only additional piece of info is the anonymized load path. For engines that are coming from a jar file, or from the application folder, or from the distribution folder, or engines with a name causing it to be sorted at the top of the list, we will also include the submissionURL, which can let us see if partner codes have been changed. Alessio, is it fine to just replace the string you are reporting in bug 1138503 with JSON data, or do you expect this to cause any issue?
Assignee: glind → florian
Attachment #8610154 - Attachment is obsolete: true
Flags: needinfo?(vdjeric)
Flags: needinfo?(alessio.placitelli)
Comment on attachment 8614135 [details] [diff] [review] WIP v4 Review of attachment 8614135 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/telemetry/TelemetryEnvironment.jsm @@ +850,5 @@ > // Make sure we have a settings section. > this._currentEnvironment.settings = this._currentEnvironment.settings || {}; > // Update the search engine entry in the current environment. > + this._currentEnvironment.settings.defaultSearchEngine = > + Services.search.wrappedJSObject.getDefaultEngineInfo(); What form will this data have, can you summarize? We want to have data equivalency with FHR and need to run comparisons for verification. Search data is one of the critical metrics there.
Flags: needinfo?(alessio.placitelli)
(In reply to Florian Quèze [:florian] [:flo] from comment #20) Some example of the defaultSearchEngine values put in the Telemetry environment: Google as the default, from omni.ja: {"identifier":"google","name":"Google","loadPath":"jar:[app]/omni.ja!browser/google.xml","submissionURL":"https://www.google.com/search?q=&ie=utf-8&oe=utf-8"} Google from an unpacked omni.ja (ie. a local build that hasn't been packaged yet): {"identifier":"google","name":"Google","loadPath":"[app]/chrome/en-US/locale/browser/searchplugins/google.xml","submissionURL":"https://www.google.com/search?q=&ie=utf-8&oe=utf-8"} User installed engine: {"name":"Ecosia","loadPath":"[profile]/searchplugins/ecosia.xml"} Engine from a distribution partner: {"name":"Yahoo","loadPath":"[distribution]/searchplugins/common/ysearch.xml","submissionURL":"https://fr.search.yahoo.com/search?p=&ei=UTF-8&fr=chrf-yff38"}
Comment on attachment 8614135 [details] [diff] [review] WIP v4 Review of attachment 8614135 [details] [diff] [review]: ----------------------------------------------------------------- (In reply to Gregg Lind (User Advocacy - Heartbeat - Test Pilot) from comment #18) > one-time full census of issues, problems, tactics. (overall approach = measure big, then drop data at server / remove proves) Dropping data at the server is tricky. AIUI, the backend can't modify Telemetry pings, only delete them. Obviously, we can't delete lots of pings from a release. So if we want to purge the collected data server-side, we would have to limit the number of pings with sensitive data to a small random % of users. (In reply to Florian Quèze [:florian] [:flo] from comment #20) > Vladan, do you like this new proposal from a privacy point of view? I do like this! I recommend first landing this Telemetry on Nightly to check that we don't have any privacy oversights. After that we can uplift. ::: toolkit/components/search/nsSearchService.js @@ +4710,5 @@ > + > + if (sendSubmissionURL) { > + let submission = > + engine._getURLOfType("text/html").getSubmission("", engine, "searchbar"); > + result.submissionURL = submission.uri.spec; Is it still possible for the username/password to be explicitly provided in a URL? e.g. https://username:password@server/resource.ext http://support2.microsoft.com/default.aspx?scid=kb;[LN];834489
Flags: needinfo?(vdjeric)
(In reply to Florian Quèze [:florian] [:flo] from comment #22) > (In reply to Florian Quèze [:florian] [:flo] from comment #20) > > Some example of the defaultSearchEngine values put in the Telemetry > environment: > > Google as the default, from omni.ja: > {"identifier":"google","name":"Google","loadPath":"jar:[app]/omni.ja!browser/ > google.xml","submissionURL":"https://www.google.com/search?q=&ie=utf- > 8&oe=utf-8"} > > Google from an unpacked omni.ja (ie. a local build that hasn't been packaged > yet): > {"identifier":"google","name":"Google","loadPath":"[app]/chrome/en-US/locale/ > browser/searchplugins/google.xml","submissionURL":"https://www.google.com/ > search?q=&ie=utf-8&oe=utf-8"} > > User installed engine: > {"name":"Ecosia","loadPath":"[profile]/searchplugins/ecosia.xml"} > > Engine from a distribution partner: > {"name":"Yahoo","loadPath":"[distribution]/searchplugins/common/ysearch.xml", > "submissionURL":"https://fr.search.yahoo.com/search?p=&ei=UTF-8&fr=chrf- > yff38"} Brendan, i assume you are not running comparisons on that yet? Would it problematic for metrics needs to remap the proposed search engine data here to what matches FHR [0] [1]? If yes we can probably just move that to a separate field and leave `defaultSearchEngine` as-is. 0: https://ci.mozilla.org/job/mozilla-central-docs/Tree_Documentation/services/healthreport/healthreport/dataformat.html#org-mozilla-searches-engines 1: https://ci.mozilla.org/job/mozilla-central-docs/Tree_Documentation/toolkit/components/telemetry/telemetry/environment.html#defaultsearchengine
Flags: needinfo?(bcolloran)
(In reply to Georg Fritzsche [:gfritzsche] from comment #24) > (In reply to Florian Quèze [:florian] [:flo] from comment #22) > > (In reply to Florian Quèze [:florian] [:flo] from comment #20) > > > > Some example of the defaultSearchEngine values put in the Telemetry > > environment: > > > > Google as the default, from omni.ja: > > {"identifier":"google","name":"Google","loadPath":"jar:[app]/omni.ja!browser/ > > google.xml","submissionURL":"https://www.google.com/search?q=&ie=utf- > > 8&oe=utf-8"} > > > > Google from an unpacked omni.ja (ie. a local build that hasn't been packaged > > yet): > > {"identifier":"google","name":"Google","loadPath":"[app]/chrome/en-US/locale/ > > browser/searchplugins/google.xml","submissionURL":"https://www.google.com/ > > search?q=&ie=utf-8&oe=utf-8"} > Brendan, i assume you are not running comparisons on that yet? > Would it problematic for metrics needs to remap the proposed search engine > data here to what matches FHR [0] [1]? I'll note that I included the 'identifier' value only for backward compatibility here, but the loadPath value should be giving us much more reliable information. At some point in the future, I would like us to stop using the identifier completely, and to just remove it. Doesn't have to happen now, but if metrics tools can be adapted sooner than later to stop relying on 'identifier', that would be extra nice :).
(In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #23) > I recommend first landing this Telemetry on Nightly to check that we don't > have any privacy oversights. After that we can uplift. How far can we hope to uplift? We will apparently need to uplift bug 1138503 at the same time, as it landed only for 41. It looks like TelemetryEnvironment.jsm already existed in 39 (landed in bug 1122047), but there have been lots of changes to that file since the initial landing. > ::: toolkit/components/search/nsSearchService.js > @@ +4710,5 @@ > > + > > + if (sendSubmissionURL) { > > + let submission = > > + engine._getURLOfType("text/html").getSubmission("", engine, "searchbar"); > > + result.submissionURL = submission.uri.spec; > > Is it still possible for the username/password to be explicitly provided in > a URL? e.g. https://username:password@server/resource.ext Good catch! The next version of the patch will clear the username/password from submission URLs.
Flags: needinfo?(kev)
Flags: needinfo?(glind)
(In reply to Florian Quèze [:florian] [:flo] from comment #26) > (In reply to Vladan Djeric (:vladan) -- please needinfo! from comment #23) > > > I recommend first landing this Telemetry on Nightly to check that we don't > > have any privacy oversights. After that we can uplift. > > How far can we hope to uplift? We will apparently need to uplift bug 1138503 > at the same time, as it landed only for 41. We do have data submissions from pre-release channels with environment data from 39, but there are known bugs and confidence in the data is not high. The earliest ship-date to the release channel is now 40, i would uplift to that. We will try to uplift bug 1138503 and others to 40 as soon as we have some issues fixed.
(In reply to Georg Fritzsche [:gfritzsche] from comment #24) > (In reply to Florian Quèze [:florian] [:flo] from comment #22) > > (In reply to Florian Quèze [:florian] [:flo] from comment #20) > > > > Some example of the defaultSearchEngine values put in the Telemetry > > environment: > > > > Google as the default, from omni.ja: > > {"identifier":"google","name":"Google","loadPath":"jar:[app]/omni.ja!browser/ > > google.xml","submissionURL":"https://www.google.com/search?q=&ie=utf- > > 8&oe=utf-8"} > > > > Google from an unpacked omni.ja (ie. a local build that hasn't been packaged > > yet): > > {"identifier":"google","name":"Google","loadPath":"[app]/chrome/en-US/locale/ > > browser/searchplugins/google.xml","submissionURL":"https://www.google.com/ > > search?q=&ie=utf-8&oe=utf-8"} > > > > User installed engine: > > {"name":"Ecosia","loadPath":"[profile]/searchplugins/ecosia.xml"} > > > > Engine from a distribution partner: > > {"name":"Yahoo","loadPath":"[distribution]/searchplugins/common/ysearch.xml", > > "submissionURL":"https://fr.search.yahoo.com/search?p=&ei=UTF-8&fr=chrf- > > yff38"} > > Brendan, i assume you are not running comparisons on that yet? > Would it problematic for metrics needs to remap the proposed search engine > data here to what matches FHR [0] [1]? > If yes we can probably just move that to a separate field and leave > `defaultSearchEngine` as-is. > > 0: > https://ci.mozilla.org/job/mozilla-central-docs/Tree_Documentation/services/ > healthreport/healthreport/dataformat.html#org-mozilla-searches-engines > 1: > https://ci.mozilla.org/job/mozilla-central-docs/Tree_Documentation/toolkit/ > components/telemetry/telemetry/environment.html#defaultsearchengine needinfo-ing Christina and Dave, who work more with search stuff than I do. This is all fine by me, but I'm not the main consumer of this info.
Flags: needinfo?(dzeber)
Flags: needinfo?(christina)
Flags: needinfo?(bcolloran)
Attached patch Patch v5 (obsolete) (deleted) — Splinter Review
The wrappedJSObject hack is so that the patch can land on aurora. For landing on central, I'll remove that hack and expose the getDefaultEngineInfo function in the nsIBrowserSearchService interface.
Attachment #8614135 - Attachment is obsolete: true
Attachment #8615426 - Flags: review?(markh)
(In reply to Mark Hammond [:markh] from comment #16) > The code itself looks fine to me, and I'll leave the debates about the > privacy tradeoffs to Gregg and Vladan. I'd have thought a fairly simple test > for _telemetryId() would be good, and it should also be possible to test > exactly what was submitted to telemetry - ie, I guess I don't see what is > particularly difficult about testing this. (Maybe testing XREAppDist is > tricky in an xpcshell environment, but living without specific coverage for > that might be ok) The test from bug 1138503 already covered the case of an engine in a jar file, of an engine in the profile, and of no engine at all. I just adapted that test, I think that'll be enough testing coverage. The case I was concerned about testing was an engine in the omni.ja file, which doesn't exist yet at the time our tests run. I guess we'll just live without coverage for that. I also didn't test the distribution case, but I think that'll be OK too.
Comment on attachment 8615426 [details] [diff] [review] Patch v5 Review of attachment 8615426 [details] [diff] [review]: ----------------------------------------------------------------- Looks fine to me - please take my comments as suggestions only. ::: toolkit/components/search/nsSearchService.js @@ +2841,5 @@ > + uri.QueryInterface(Ci.nsIFileURL) > + file = uri.file; > + } > + } > + it looks to me like if file is null after this block, prefix and suffix also are null, meaning the result is "unknown". Assuming I'm not missing anything, I think we should just early return if (!file) here. Also, given this is about mal(ish)ware, it seems there might be a risk that this._file is null and the scheme isn't chrome - is it work trying to capture at least the scheme that's actually being used so analysis of the data might tell us something interesting? eg, "[scheme]/resource" or similar? @@ +2874,5 @@ > + // If the folder doesn't have a known ancestor, don't record its path to > + // avoid leaking user identifiable data. > + if (!id) > + id = "[other]/" + file.leafName; > + } please cuddle this else (the file is already a bit inconsistent, but no need to make that worse :) (oops - but if you take my other suggestion of the early-return above, it gets removed completely!) @@ +4674,5 @@ > + engine = this.defaultEngine; > + } catch(e) { > + // The defaultEngine getter will throw if there's no engine at all, > + // which shouldn't happen unless an add-on or a test deleted all of them. > + // Our preferences UI doesn't let users do that. maybe a cu.reportError()? @@ +4680,5 @@ > + > + if (!engine) { > + result.name = "NONE"; > + } > + else { please cuddle else ::: toolkit/components/telemetry/tests/unit/test_TelemetryEnvironment.js @@ +1009,5 @@ > // Our default engine from the JAR file has an identifier. Check if it is correctly > // reported. > data = TelemetryEnvironment.currentEnvironment; > checkEnvironmentData(data); > + let expectedSearchEngineData = '{"identifier":"telemetrySearchIdentifier","name":"telemetrySearchIdentifier","loadPath":"jar:[other]/searchTest.jar!testsearchplugin/telemetrySearchIdentifier.xml","submissionURL":"http://ar.wikipedia.org/wiki/%D8%AE%D8%A7%D8%B5:%D8%A8%D8%AD%D8%AB?search=&sourceid=Mozilla-search"}'; would it be possible to make this prettier by writing it as let expected.. = JSON.stringify({ identifier: ..., ... }); ? @@ +1042,5 @@ > > data = TelemetryEnvironment.currentEnvironment; > checkEnvironmentData(data); > > + const EXPECTED_SEARCH_ENGINE = '{"name":"telemetry_default","loadPath":"[profile]/searchplugins/telemetrydefault.xml"}'; similarly here, although it's not quite as ugly here :)
Attachment #8615426 - Flags: review?(markh) → review+
Attached patch Patch v6 (obsolete) (deleted) — Splinter Review
(In reply to Mark Hammond [:markh] from comment #31) > Looks fine to me - please take my comments as suggestions only. Thanks for the review! :) I'm carrying forward your r+, but feel free to have another look at the changes if you see this before it lands. > it looks to me like if file is null after this block, prefix and suffix also > are null, meaning the result is "unknown". Assuming I'm not missing > anything, I think we should just early return if (!file) here. > > Also, given this is about mal(ish)ware, it seems there might be a risk that > this._file is null and the scheme isn't chrome - is it work trying to > capture at least the scheme that's actually being used so analysis of the > data might tell us something interesting? eg, "[scheme]/resource" or > similar? This is a good idea, I've made it return [scheme]/leafName. And... I think this change eliminates the only remaining case where we were returning 'unknown', so I simplified the code.
Attachment #8615426 - Attachment is obsolete: true
Attachment #8616010 - Flags: review+
Attached patch Remove wrappedJSObject hack (obsolete) (deleted) — Splinter Review
This patch is only for central, and applies after attachment 8616010 [details] [diff] [review] which I expect to uplift to aurora.
Attachment #8616011 - Flags: review?(markh)
Attachment #8616011 - Flags: review?(markh) → review+
Comment on attachment 8616011 [details] [diff] [review] Remove wrappedJSObject hack Review of attachment 8616011 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/telemetry/TelemetryEnvironment.jsm @@ +854,5 @@ > // Make sure we have a settings section. > this._currentEnvironment.settings = this._currentEnvironment.settings || {}; > // Update the search engine entry in the current environment. > this._currentEnvironment.settings.defaultSearchEngine = > + Services.search.getDefaultEngineInfo(); We should not do this until we hear back from the people who will use the measurement. If this is urgent to land, we could maybe add it as a separate field for now, say |defaultSearchEngineData|. You should also update the documentation in environment.rst.
(In reply to Georg Fritzsche [:gfritzsche] from comment #35) > Comment on attachment 8616011 [details] [diff] [review] > Remove wrappedJSObject hack > > Review of attachment 8616011 [details] [diff] [review]: > ----------------------------------------------------------------- > > ::: toolkit/components/telemetry/TelemetryEnvironment.jsm > @@ +854,5 @@ > > // Make sure we have a settings section. > > this._currentEnvironment.settings = this._currentEnvironment.settings || {}; > > // Update the search engine entry in the current environment. > > this._currentEnvironment.settings.defaultSearchEngine = > > + Services.search.getDefaultEngineInfo(); > > We should not do this until we hear back from the people who will use the > measurement. > If this is urgent to land, we could maybe add it as a separate field for > now, say |defaultSearchEngineData|. > > You should also update the documentation in environment.rst. This was supposed to comment on attachment 8616010 [details] [diff] [review].
Iteration: --- → 41.3 - Jun 29
Rank: 32 → 4
Priority: P3 → P1
(In reply to Georg Fritzsche [:gfritzsche] from comment #35) > We should not do this until we hear back from the people who will use the > measurement. Turns out dzeber is the person that will have the best insight into how this search data should be set up. He's been out on vacation, but will be back this week.
Flags: needinfo?(christina)
(In reply to Georg Fritzsche [:gfritzsche] from comment #36) > (In reply to Georg Fritzsche [:gfritzsche] from comment #35) > > Comment on attachment 8616011 [details] [diff] [review] > > Remove wrappedJSObject hack > > > > Review of attachment 8616011 [details] [diff] [review]: > > ----------------------------------------------------------------- > > > > ::: toolkit/components/telemetry/TelemetryEnvironment.jsm > > @@ +854,5 @@ > > > // Make sure we have a settings section. > > > this._currentEnvironment.settings = this._currentEnvironment.settings || {}; > > > // Update the search engine entry in the current environment. > > > this._currentEnvironment.settings.defaultSearchEngine = > > > + Services.search.getDefaultEngineInfo(); > > > > We should not do this until we hear back from the people who will use the > > measurement. > > If this is urgent to land, we could maybe add it as a separate field for > > now, say |defaultSearchEngineData|. Talking with Brendan again, storing the engine data in a separate field now would make our comparisons to the FHR data definitely easier. Can we do that and drop the |defaultSearchEngine| field later after we could make the shipping decision?
> Talking with Brendan again, storing the engine data in a separate field now > would make our comparisons to the FHR data definitely easier. > Can we do that and drop the |defaultSearchEngine| field later after we could > make the shipping decision? I agree that we should keep the current |defaultSearchEngine| as is, and add the new detailed info as a separate |defaultSearchEngineData| field. This will make things easiest on our end.
Flags: needinfo?(dzeber)
(In reply to Dave Zeber [:dzeber] from comment #39) > > Talking with Brendan again, storing the engine data in a separate field now > > would make our comparisons to the FHR data definitely easier. > > Can we do that and drop the |defaultSearchEngine| field later after we could > > make the shipping decision? > > I agree that we should keep the current |defaultSearchEngine| as is, and add > the new detailed info as a separate |defaultSearchEngineData| field. This > will make things easiest on our end. Then, can we drop the 'identifier' value immediately from the new JSON data? I had included it to simplify the transition, but if we are keeping the old defaultSearchEngine value during the transition, I don't think it's needed anymore.
Flags: needinfo?(dzeber)
Attached patch Patch v7 (obsolete) (deleted) — Splinter Review
- reverted defaultSearchEngine to its current value. - added defaultSearchEngineData for the JSON info (I didn't include the identifier in it). - updated environment.rst Georg, requesting review from you mostly for the environment.rst changes. I'll be carrying forward Mark's review for the rest of the patch.
Attachment #8616010 - Attachment is obsolete: true
Attachment #8617248 - Flags: review?(gfritzsche)
Carrying forward markh's review here; simple unbitrotting.
Attachment #8616011 - Attachment is obsolete: true
Attachment #8617249 - Flags: review+
Comment on attachment 8617248 [details] [diff] [review] Patch v7 Review of attachment 8617248 [details] [diff] [review]: ----------------------------------------------------------------- This looks ok, but we should really not make defaultEngineData a string if possible. ::: toolkit/components/telemetry/docs/environment.rst @@ +35,5 @@ > blocklistEnabled: <bool>, // true on failure > isDefaultBrowser: <bool>, // null on failure, not available on Android > defaultSearchEngine: <string>, // e.g. "yahoo" > + defaultSearchEngineData: <string>, // JSON data giving the engine name, > + // its loadpath and for default engines the submission URL. Does this need to be a string? It should just be a JSON style object so we can work with all the data with one JSON serialization or deserialization.
Attachment #8617248 - Flags: review?(gfritzsche) → review+
(In reply to Georg Fritzsche [:gfritzsche] from comment #43) > Comment on attachment 8617248 [details] [diff] [review] > Patch v7 > > Review of attachment 8617248 [details] [diff] [review]: > ----------------------------------------------------------------- > > This looks ok, but we should really not make defaultEngineData a string if > possible. > > ::: toolkit/components/telemetry/docs/environment.rst > @@ +35,5 @@ > > blocklistEnabled: <bool>, // true on failure > > isDefaultBrowser: <bool>, // null on failure, not available on Android > > defaultSearchEngine: <string>, // e.g. "yahoo" > > + defaultSearchEngineData: <string>, // JSON data giving the engine name, > > + // its loadpath and for default engines the submission URL. > > Does this need to be a string? > It should just be a JSON style object so we can work with all the data with > one JSON serialization or deserialization. In the end it comes down to what form Metrics can best work with, but i don't see why any object-style data should be on this in stringified form. Note that the string form is also fragile for ordering, even if we put big comments in the code that generates it.
Attached patch Patch v8 (obsolete) (deleted) — Splinter Review
Attachment #8617248 - Attachment is obsolete: true
Attachment #8617317 - Flags: review?(gfritzsche)
Attachment #8617249 - Attachment is obsolete: true
Attachment #8617318 - Flags: review+
Comment on attachment 8617317 [details] [diff] [review] Patch v8 Review of attachment 8617317 [details] [diff] [review]: ----------------------------------------------------------------- ::: toolkit/components/telemetry/docs/environment.rst @@ +37,5 @@ > defaultSearchEngine: <string>, // e.g. "yahoo" > + defaultSearchEngineData: {, // data about the current default engine > + name: <string>, // e.g. "Yahoo" > + loadPath: <string>, // where the engine line is located > + submissionURL: <string> // only included for default engines loadPath & submissionURL are only present if we have any default search engine set. We should add something to that effect to the comment. This is the default search engine data, so "only included for default engine" seems redundant. @@ +213,5 @@ > This field's contents are ``Services.search.defaultEngine.identifier`` (if defined) or ``"other-"`` + ``Services.search.defaultEngine.name`` if not. In other words, search engines without an ``.identifier`` are prefixed with ``other-``. > + > +defaultSearchEngineData > +~~~~~~~~~~~~~~~~~~~~~~~ > +Contains JSON style data identifying the engine currently set as the default. Nit: "Contains data identifying ..." as the whole ping structure is JSON style. @@ +220,5 @@ > + > +- a ``name`` property with the name of the engine, or ``NONE`` if no > + engine is currently set as the default. > + > +- a ``loadPath`` property: an anonymized path of the engine xml file, e.g. Document as only present if we have a default search engine set. @@ +228,5 @@ > + [distribution]/searchplugins/common/engine.xml > + [other]/engine.xml > + > +- a ``submissionURL`` property with the HTTP url we would use to make > +a search. (Only included for default engines.) Document as only present if we have a default search engine set. "(Only included for default engines.)" seems redundant.
Attachment #8617317 - Flags: review?(gfritzsche) → review+
Attached patch Patch v8 with improved comments (obsolete) (deleted) — Splinter Review
Attachment #8617317 - Attachment is obsolete: true
Attachment #8617346 - Flags: review+
Attached patch Patch v9 (as landed) (deleted) — Splinter Review
My first try push (https://treeherder.mozilla.org/#/jobs?repo=try&revision=39bf4092f67c) had xpcshell failures on Windows due to the Windows path separator being \. I added a .replace(/\\/g, "/") and then the tests passed: https://treeherder.mozilla.org/#/jobs?repo=try&revision=3374b9815e33 Removing the needinfo from comment 40 as I'm no longer waiting for additional information here.
Attachment #8617346 - Attachment is obsolete: true
Flags: needinfo?(dzeber)
Attachment #8617852 - Flags: review+
Depends on: 1174223
Could we have Metrics review the data we're seeing from Nightly and verify its meeting needs? Needinfo-ing Christina for feedback/review
Flags: needinfo?(christina)
(In reply to Kev Needham [:kev] from comment #52) > Could we have Metrics review the data we're seeing from Nightly and verify > its meeting needs? Needinfo-ing Christina for feedback/review Please follow-up on this in bug 1174223.
Comment on attachment 8617852 [details] [diff] [review] Patch v9 (as landed) Now that bug 1120356 (and more specifically bug 1138503) have reached aurora, we can uplift this too. Approval Request Comment [Feature/regressing bug #]: record data about search hijacking so that we can plan the next steps. [User impact if declined]: no direct user impact; indirect user impact is that it would possibly take us another 6 weeks to have a solid plan to help users suffering from hijacked search. [Describe test coverage new/current, TreeHerder]: landed on central more than a week ago. Covered by automated tests. [Risks and why]: low risk given the testing this has already received. The code additions are self-contained. [String/UUID change made/needed]: none, because this patch uses a hack to avoid changing the nsIBrowserSearchService interface. This hack has been reverted on central already.
Attachment #8617852 - Flags: approval-mozilla-aurora?
Update: Ilana is reviewing the data collection and we should be able to give an update before the end of the week.
Flags: needinfo?(isegall)
Flags: needinfo?(isegall)
Flags: needinfo?(christina) → needinfo?(isegall)
Comment on attachment 8617852 [details] [diff] [review] Patch v9 (as landed) We are not in the beta cycle yet, we would have time to fix potential regressions and it is important to fix that. Taking it.
Attachment #8617852 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
(In reply to Ryan VanderMeulen [:RyanVM UTC-4] from comment #57) > https://hg.mozilla.org/releases/mozilla-aurora/rev/7fdd6ef12231 Ryan, this second changeset wasn't meant for aurora, it changes an interface.
Flags: needinfo?(ryanvm)
Per IRC discussion, we're going to leave things as-is since interfaces don't freeze until beta.
Flags: needinfo?(ryanvm)
Provisionally approved; running analysis to see if any obvious impersonation is going on. Right now, distribution list for that analysis will be by end of week and will include glind and cchoi. If you'd like to be on this list, let me know as soon as possible.
Flags: needinfo?(isegall)
Seeing environment.settings.defaultSearchEngineData in about:telemetry on an aurora 40.0a2 profile, but not seeing it when I pull telemetry from aws (most updated aurora and nightly). I see the following fields in environment.settings: isDefaultBrowser', u'locale', u'update', u'e10sEnabled', u'blocklistEnabled', u'userPrefs', u'telemetryEnabled', so environment settings data is being submitted, but not defaultSearchData. Also, saved-telemetry-pings in profile directory (not sure if that's still actively used) also omits defaultSearchEngineData. Is this not being uploaded to the server? If not, is that intentional?
Flags: needinfo?(mreid)
Flags: needinfo?(gfritzsche)
Ilana, what's the buildid? This didn't uplift to aurora until the 25-June build.
(In reply to Benjamin Smedberg [:bsmedberg] from comment #62) > Ilana, what's the buildid? This didn't uplift to aurora until the 25-June > build. Right - if you pull samples you might have data from older builds which don't contain this data yet. The same might apply for your local pings in saved-telemetry-pings. New pings should contain the data, you can check the archived pings in about:telemetry (local archive of the same data that is submitted to the server).
Flags: needinfo?(mreid)
Flags: needinfo?(gfritzsche)
Thanks, everyone!
Several search engines have null submission urls, example: en-US other-AOL Web Search AOL Web Search [profile]/searchplugins/aol-web-search.xml None Any idea why this would be happening?
Flags: needinfo?(gfritzsche)
(In reply to Ilana from comment #65) > Several search engines have null submission urls, example: > > en-US other-AOL Web Search AOL Web Search > [profile]/searchplugins/aol-web-search.xml None > > > Any idea why this would be happening? Because it's not an engine we ship, so we don't report the submission URL for privacy reasons.
Flags: needinfo?(gfritzsche)
We miscommunicated during the design phase then. Glad it's a "worked as designed" at least! Claim: FOR HOSTILE providers, those submission urls (in particular the partner codes) are the key to identifying and pressuring them. (For example, through pressuring engines to cancel their affiliate contracts). I will review the preliminary data and see if there is some sensible compromise here.
> Because it's not an engine we ship, so we don't report the submission URL > for privacy reasons. FWIW we do on Fennec.
We appear to capture the submissionURL of some non-defaults (ex: LEO Eng-Deu) and DON'T for several that are theoretically built-in (some instances of Yahoo!). See below: https://docs.google.com/spreadsheets/d/1V2uuxxqAwvIHuSW3ahmQPeYM9gfEukTWmCrD1MtdstM/edit#gid=0
(In reply to Ilana from comment #69) > We appear to capture the submissionURL of some non-defaults (ex: LEO > Eng-Deu) and DON'T for several that are theoretically built-in (some > instances of Yahoo!). Are we talking a bout "shipping" (comment 66) or "default"? Because we don't ship AOL, but LEO is a default searchplugin for the German version of Firefox. https://transvision.mozfr.org/productization/?locale=de&product=browser
Hi all, I'm not sure what is the correct course of action here to add the submissionURL, but we need to do so for two reasons: 1) Understand the scope of search hijacking. I hope I don't need to explain why this is a high priority. 2) Consistency with other FHR/Telemetry implementations, notably Fennec, which does include this information.
So, let me clarify what I said in comment 66. We report the submissionURL for engines that are: - in the application folder (these should be our shipped defaults) - in the distribution folder (these should be from known partners) Additionally we report the submission URL for engines that have a non default order (ie. they are placed at the top of the engine list rather than sorted alphabetically). These are the engines that have their name mentioned in one of the browser.search.order.* prefs. Things we could do to collect a bit more data: - collect the submissionURL if Firefox thinks an engine that isn't from the application/distribution folders is the original default engine. - add a prefix to the submissionURL when it has been collected because of a browser.search.order pref. Per discussion during the data review, we intentionally don't report the submission URL for engines that could have intentionally be installed by the user (and so contain personally identifiable information) : - the engines in [profile]/searchplugins/ - the engines in [profile]/extensions/ (note that we do report the name of the engine file, so we can still figure out what extension installed the engine) Note that since bug 1109354, engines installed in the profile folder can no longer override the engines in the application or distribution folders, so these engines are not responsible for search impersonation. Due to a bug in the patch here, we currently report the submissionURL for engines that are inside packed add-ons (ie. loadpaths like jar:[profile]/extensions/file.xpi!...). I'll file a follow-up to fix that. (In reply to John Jensen from comment #71) > 1) Understand the scope of search hijacking. I hope I don't need to explain > why this is a high priority. I don't see how collecting the submissionURL of user-installed engines helps us understand the scope of search hijacking. We are already collecting much more data than we were before, and AFAIK we are just beginning the analysis of that additional data. I think the data we are now collecting is enough to help us plan the next steps in search hijacking mitigation. > 2) Consistency with other FHR/Telemetry implementations, notably Fennec, > which does include this information. Can you point me to the Fennec implementation you are referring to? The submissionURL thing is something I added and that didn't exist at all before this patch. It's being collected from nsSearchService.js which is shared between Firefox Desktop and Fennec.
Depends on: 1184974
(In reply to Florian Quèze [:florian] [:flo] from comment #72) > Due to a bug in the patch here, we currently report the submissionURL for > engines that are inside packed add-ons (ie. loadpaths like > jar:[profile]/extensions/file.xpi!...). I'll file a follow-up to fix that. Filed bug 1184974 to fix this.
Blocks: 1259139
Blocks: 1576184
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: