Closed
Bug 1159884
Opened 10 years ago
Closed 9 years ago
Implement inadjacency with a hardcoded list of hashed sites
Categories
(Firefox :: New Tab Page, defect)
Firefox
New Tab Page
Tracking
()
People
(Reporter: kghim, Assigned: mzhilyaev)
References
(Blocks 1 open bug)
Details
(Whiteboard: .002)
Attachments
(5 files, 6 obsolete files)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
benjamin
:
feedback+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
image/png
|
Details |
Problem: commercial partners do not want their content to be negatively associated with adult content. In the context of Suggested Tiles, this means no sponsored or affiliate tiles should appear within same browser viewport. As an example, MGM would not want a 007 DVD release to be appearing within the same page (in particular, directly next to) where PirateBay tile appears.
Scope:
- Prevent Suggested Tiles from appearing when a blacklisted site appears within the user top 100 frecent history
- Limit to adult content sites for 39 release
- Utilize preexisting open source blacklists such as:
http://urlblacklist.com/?sec=download
http://dsi.ut-capitole.fr/blacklists/index_en.php
http://www.squidblacklist.org/
Success criteria:
- Physical verification: with a test campaign Suggested Tile, verify that the test tile does not appear with the blacklisted site within the viewport of the new tab page
- Data verification: with a given test campaign's report, verify that the blacklisted site did not record an impression. I assume we would have to use Rappor or RR for this.
Reporter | ||
Updated•10 years ago
|
Whiteboard: .002
Comment 1•10 years ago
|
||
(In reply to Kevin Ghim from comment #0)
> - Data verification: with a given test campaign's report, verify that the
> blacklisted site did not record an impression. I assume we would have to use
> Rappor or RR for this.
We have not discussed using RAPPOR or randomized response for sending back user history, so I'm not sure what you're suggesting for "test campaign's report." And even with RAPPOR/RR, it wouldn't tell us for sure if a blacklisted site was visible.
We don't send impressions about which history tiles were shown.
Comment 2•10 years ago
|
||
jterry, is there an acceptable way to shorten the lists? http://dsi.ut-capitole.fr/blacklists/index_en.php has an adult list with over 1million sites. Can we do just the top X number of adult sites?
Flags: needinfo?(jterry)
Reporter | ||
Comment 3•10 years ago
|
||
Taking this off of priority list until we hear back from Lass on how agencies work with Twitter and Facebook's user generated content adjacency issues.
Whiteboard: .002 → .?
Comment 4•10 years ago
|
||
In that we might not need to implement this at all? Or we just don't know the details of the list?
If it's the latter, maksik can continue implementing something to support a list that we decide to go with.
Comment 5•10 years ago
|
||
Just for posterity, what we discussed earlier today and my thoughts:
So there are several lines of thinking that interrelate on this issue I think that it might be best if we set aside some time just on this to discuss further and solve in more detail. If needed I am in.
BD side - as we discussed, we can't and therefore shouldn't agree to terms that guarantee to a partner that we can prevent any type of adjacency issues (mainly pinning is a user control feature that we do not have any control on). But, to close this issue with them and help we can offer that we will make reasonable efforts to control for certain types of adjaceny (if they ask what this means we would have to be more and more specific so the devil is in the details of what we can control technologically and I would need to know if we can do this if at all...how?), and that upon a notice from the partner we can implement certain take-down control (i.e., turning a Pinned Tile into a History Tile (Ed to confirm this can be done easily), or perhaps moving the partner Tile to a new location in the grid).
Blacklisting - I think we need to think through carefully again and operate very transparently and share with the world our plan. In my view, Mozilla should curate this list. We can check out publicly available data and then create our own. Any lists that we get from others or that have license terms I need to be looped in and let's discuss the rules of the road more deeply for each as we go. I think we should involve policy on this early as well.
I think we need to define what we mean when we say "adult content" or any other similarly blacklisted category or site and the who, what and why very carefully.
I don't think we should let partners dictate at all what goes on our lists.
I don't think we should rely on others lists or license them as there are many problems with that (cost, SLA, 3rd party curating the web for Mozilla and policy problems etc..)
This is probably the flip of the "what partners we will work with list" / whitelist, so perhaps we can leverage that as we build this one and again policy needs to be involved.
I'd like to understand how RAPPOR works here a bit more. If we are doing user data stuff here too btw let's add Marshall and make sure he understands this piece.
Whiteboard: .? → .002
Comment 6•10 years ago
|
||
(In reply to Ed Lee :Mardak from comment #2)
> jterry, is there an acceptable way to shorten the lists?
> http://dsi.ut-capitole.fr/blacklists/index_en.php has an adult list with
> over 1million sites. Can we do just the top X number of adult sites?
As discussed, we'll do a best effort while being mindful of disk usage. We'll produce a list in bug 1160596 to hardcode into Firefox in this bug.
Depends on: 1160596
Flags: needinfo?(jterry)
Comment 7•10 years ago
|
||
maksik said that MattN had some comments about blacklisting:
1) don't have a cleartext list of adult sites in Firefox
2) potential size/disk/memory issues with having a large list (100k+?)
3) could potentially reuse some of safebrowsing
For (1), what's the concern here? We could hash eTLD+1 and match, but it's not really hiding anything. A list of porn sites is not hard to find, and hashes as strings probably makes (2) worse.
There was a suggestion from dev.planning on using a bloom filter, which should help address (1) and (2) and provide some false positives to add to some deniability if somehow we tracked that the user triggered on the blacklist.
Flags: needinfo?(MattN+bmo)
Comment 9•9 years ago
|
||
(In reply to Ed Lee :Mardak from comment #7)
> maksik said that MattN had some comments about blacklisting:
>
> 1) don't have a cleartext list of adult sites in Firefox
I agree that we shouldn't do this.
We need to either push back on the requirement or find another solution.
Comment 10•9 years ago
|
||
(In reply to :Gavin Sharp [email: gavin@gavinsharp.com] from comment #9)
> (In reply to Ed Lee :Mardak from comment #7)
> > 1) don't have a cleartext list of adult sites in Firefox
> I agree that we shouldn't do this.
Just to add some notes from IRC:
The concern isn't so much of plaintext or not, i.e., encoding/decoding (rot13,atob) vs 1-way hash doesn't change the worry.
Part of the reason why this is required is for legal compliance of contracts from advertisers that want to protect their brand, so Mozilla will provide a best effort of not juxtaposing questionable content with the paid content.
We could try to avoid the problem by not having it in the contract in the first place, but this then leaves a significant amount of money on the table from brands that would pay for this feature/protection. We have foregone money in building a system that protects user data and privacy, but I don't think this feature impacts users much at all.
This lack of user impact is also an argument against this because there isn't much user value gained for implementing this either.
From the business side, I hear that negative adjacency is one of the largest if not the largest limiter of how much money can be made.
Comment 11•9 years ago
|
||
Following up from earlier, I think it might be time to have a meeting on this topic as a group so we can be on the same page, avoid surprises and plan the right path to get to release without hiccups or stalls, Kevin will you lead this on this topic? Add all the relevant / key folks here so we can continue the discussion from earlier today.
Flags: needinfo?(kghim)
Comment 12•9 years ago
|
||
We'll include example.com to the list for people to be able to test without actually going to an adult site.
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → mzhilyaev
Assignee | ||
Updated•9 years ago
|
Iteration: --- → 40.2 - 27 Apr
Points: --- → 13
Updated•9 years ago
|
Component: Tiles → New Tab Page
Product: Content Services → Firefox
Reporter | ||
Updated•9 years ago
|
Flags: needinfo?(kghim)
Comment 13•9 years ago
|
||
(In reply to Geoff Piper from comment #11)
> Following up from earlier, I think it might be time to have a meeting on
> this topic as a group so we can be on the same page, avoid surprises and
> plan the right path to get to release without hiccups or stalls, Kevin will
> you lead this on this topic? Add all the relevant / key folks here so we can
> continue the discussion from earlier today.
One possible technical approach here is to use the safe browsing technology
to carry the blacklist. This would avoid needing to have any list on the browser
and would also let us leverage existing technology. Geoff, I'd be happy to help
out here as needed.
Updated•9 years ago
|
Iteration: 40.2 - 27 Apr → 41.1 - May 25
Assignee | ||
Comment 14•9 years ago
|
||
Attachment #8604971 -
Flags: review?(msamuel)
Comment 15•9 years ago
|
||
Comment on attachment 8604971 [details] [diff] [review]
v1. initial implementation of negative adjacency
Review of attachment 8604971 [details] [diff] [review]:
-----------------------------------------------------------------
Max,
Are you planning to land a non-invertible transform before shipping?
Comment 16•9 years ago
|
||
(In reply to Eric Rescorla (:ekr) from comment #15)
> Are you planning to land a non-invertible transform before shipping?
We could, what's the benefit of doing that?
Assignee | ||
Comment 17•9 years ago
|
||
> Are you planning to land a non-invertible transform before shipping?
just to clarify that this patch is for getting client's negative adjacency logic right.
it will be trivial to make client download black list from "save browsing" endpoint and/or use crypto hash.
To Ed's point, the black list of ill reputable sites is available: anyone can reconstruct the blacklist client has (or downloads). "non-invertible transform" does not seem to provide any more protection as the simplest clear-text obfuscation (such as base64)... right?
Comment 18•9 years ago
|
||
My understanding was that there was some concern about having the list of
blacklisted sites on the client (hence the interest in Bloom filtrs,
hashing, etc.) I don't personally have a strong opinion on whether this
is necessary, but I would observe that having the data base64ed
(or any trivially invertible transform) looks a lot more like having
the list on the client than not.
Comment 19•9 years ago
|
||
One specific example I heard was to avoid false positive matches of some computer file scanner searching for porn domains triggering on plaintext lists in Firefox. So basic base64 could avoid that.
The bloom filter was more in the context of adding false positive site matches to add noise to when a site isn't shown, so that a malicious listener wouldn't know for sure if it was truly a blacklisted site triggering.
Assignee | ||
Comment 20•9 years ago
|
||
Comment on attachment 8604971 [details] [diff] [review]
v1. initial implementation of negative adjacency
cancelling review until safe browsing issues are considered (see Bug 1164303)
Attachment #8604971 -
Flags: review?(msamuel)
Assignee | ||
Comment 21•9 years ago
|
||
Comment on attachment 8604971 [details] [diff] [review]
v1. initial implementation of negative adjacency
Requesting review again since synchronous calls to nsIURIClassifier.idl is OK.
Attachment #8604971 -
Flags: review?(msamuel)
Comment 22•9 years ago
|
||
Comment on attachment 8604971 [details] [diff] [review]
v1. initial implementation of negative adjacency
Review of attachment 8604971 [details] [diff] [review]:
-----------------------------------------------------------------
::: browser/modules/DirectoryLinksProvider.jsm
@@ +745,5 @@
> this._topSitesWithSuggestedLinks.add(changedLinkSite);
> return true;
> }
> +
> + // always run _updateSuggestedTile if changedLinkSite is black listed
You're sending aLink to _isSiteBlackListed() not changedLinkSite so this comment is a bit confusing.
@@ +747,5 @@
> }
> +
> + // always run _updateSuggestedTile if changedLinkSite is black listed
> + // and there are tiles configured to avoid it
> + if (this._avoidNegativeSites && this._isSiteBlackListed(aLink)) {
`_isSiteBlackListed` seems like a misleading name since you're passing it a link object rather than a site
@@ +803,5 @@
> }
> }
> + // since newTabLinks are available, set _newTabHasNegativeSite here
> + // note that _shouldUpdateSuggestedTile is called by _updateSuggestedTile
> + this._newTabHasNegativeSite = this._avoidNegativeSites && this._checkForNegativeSites(newTabLinks);
I would expect this._newTabHasNegativeSite to get set whenever newtab links change. _getCurrentTopSiteCount() does get called indirectly whenever that happens, but it's not super clear.
Maybe a comment here that we are setting _newTabHasNegativeSite when links change or to move this code to a more explicit location?
@@ +942,5 @@
>
> + /****** Negative adjacency *********/
> +
> + /**
> + * Loads sites balcklist for negative adjacency
balcklist -> blacklist
@@ +980,5 @@
> + }
> + // get the domain from host
> + try {
> + // extract base domain from host
> + baseDomain = eTLD.getBaseDomainFromHost(host);
You would only get here if !link.baseDomain. So by the end of this try/catch, you might either have a baseDomain that was achieved through NewTabUtils.extractSite() OR eTLD.getBaseDomainFromHost(host). Was this on purpose? If so, could you explain in a comment in which case you use each approach?
@@ +997,5 @@
> + * Checks if new tab has black listed site
> + * @param new tab links (or nothing, in which case NewTabUtils.links.getLinks() is called
> + * @return true if new tab shows negative site
> + */
> + _checkForNegativeSites: function DirectoryLinksProvider_checkForNegativeSites(newTabLink) {
newTabLink -> newTabLinks
Attachment #8604971 -
Flags: review?(msamuel)
Comment 23•9 years ago
|
||
(In reply to :Gavin Sharp [email: gavin@gavinsharp.com] from comment #9)
> I agree that we shouldn't do this.
Could you explain your concern, so we can figure out how we should move forwards?
Flags: needinfo?(gavin.sharp)
Comment 24•9 years ago
|
||
dcamp pointed out how the blacklist is not quite the right name and could cause people to think the wrong thing. We're not actually blocking content from a list of sites, and we're actually closer to blocking ads. (Hey! We're shipping with a built in ad blocker? ;))
It sounds like we need a name for the list and the behavior it self.
"safety list" and "check safety"
e.g., if (tile.checkSafety) SAFETY_LIST.enforce()
Comment 25•9 years ago
|
||
maksik suggested inadjacency. That seems plain enough. :) We've got frecency and now inadjacency! Hopefully people don't get confused with adjacency lists or frecent/recency. ;)
Assignee | ||
Comment 26•9 years ago
|
||
I actually like safety theme better. inadjacency seems over-academic (and hard to spell) :)
Comment 27•9 years ago
|
||
Comment on attachment 8604971 [details] [diff] [review]
v1. initial implementation of negative adjacency
> link.lastVisitDate = rawLinks.suggested.length - position;
>+ // check if link wants to avoid negative sites
>+ if (link.avoidNegativeSites) {
>+ this._avoidNegativeSites = true;
The naming convention of values from the server aren't camel case, so use:
if (link.check_inadjacency)
Assignee | ||
Comment 28•9 years ago
|
||
replaced mentions of black and negative with inadjacency
Attachment #8604971 -
Attachment is obsolete: true
Attachment #8608379 -
Flags: review?(adw)
Assignee | ||
Comment 29•9 years ago
|
||
Attachment #8608379 -
Attachment is obsolete: true
Attachment #8608379 -
Flags: review?(adw)
Attachment #8608389 -
Flags: review?(adw)
Comment 30•9 years ago
|
||
I tried to catch you on IRC, but what I think is:
- shipping a not-too-big, non-reversible in-product blacklist "for now" is acceptable, assuming technically feasible and not too risky for your timeline etc.
- we should really move away from that towards something like bug 1164303 in the not-too-distant future
Updated•9 years ago
|
Flags: needinfo?(gavin.sharp)
Assignee | ||
Comment 31•9 years ago
|
||
make file names more explicit
Attachment #8608389 -
Attachment is obsolete: true
Attachment #8608389 -
Flags: review?(adw)
Attachment #8608392 -
Flags: review?(adw)
Updated•9 years ago
|
Attachment #8608392 -
Attachment is patch: true
Attachment #8608392 -
Attachment mime type: text/x-patch → text/plain
Comment 32•9 years ago
|
||
Comment on attachment 8608392 [details] [diff] [review]
v4. explicit naming of newTab.inadjacent.json file
Review of attachment 8608392 [details] [diff] [review]:
-----------------------------------------------------------------
::: browser/modules/DirectoryLinksProvider.jsm
@@ +976,5 @@
> + this._inadjacentSites = new Set();
> + if (jsonObject && jsonObject["domains"]) {
> + // iterate through inadjacent sites
> + jsonObject["domains"].forEach(site => {
> + this._inadjacentSites.add(site);
You can just do new Set(jsonObject.domains) since Sets can be initialized with arrays.
Attachment #8608392 -
Flags: review?(adw) → review+
Updated•9 years ago
|
Summary: Implement adult related content negative adjacency → Implement inadjacency with a hardcoded list of sites
Updated•9 years ago
|
Summary: Implement inadjacency with a hardcoded list of sites → Implement inadjacency with a hardcoded list of hashed sites
Comment 33•9 years ago
|
||
(In reply to Drew Willcoxon :adw from comment #32)
> You can just do new Set(jsonObject.domains) since Sets can be initialized
> with arrays.
Neat. ;)
- this._inadjacentSites = new Set();
- if (jsonObject && jsonObject["domains"]) {
- // iterate through inadjacent sites
- jsonObject["domains"].forEach(site => {
- this._inadjacentSites.add(site);
- });
- }
+ this._inadjacentSites = new Set(jsonObject && jsonObject.domains);
new Set(undefined).size == 0
Saved some more lines :)
Comment 34•9 years ago
|
||
Attachment #8608392 -
Attachment is obsolete: true
Comment 35•9 years ago
|
||
Attachment #8608425 -
Attachment is obsolete: true
Comment 36•9 years ago
|
||
Attachment #8608427 -
Attachment is obsolete: true
Comment 38•9 years ago
|
||
Attachment #8608564 -
Flags: feedback?(benjamin)
Comment 39•9 years ago
|
||
Comment on attachment 8608564 [details] [diff] [review]
documentation v1
f? per https://wiki.mozilla.org/Firefox/Data_Collection
There's no additional data being collected, but the server provides an extra boolean to restrict when a suggestion can be shown.
Comment 40•9 years ago
|
||
Status: NEW → RESOLVED
Closed: 9 years ago
status-firefox41:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → Firefox 41
Comment 41•9 years ago
|
||
Comment on attachment 8608564 [details] [diff] [review]
documentation v1
"The inadjacency list is packaged with Firefox" would be better if you specified the source file which contains the list.
Attachment #8608564 -
Flags: feedback?(benjamin) → feedback+
Comment 43•9 years ago
|
||
Comment 44•9 years ago
|
||
Comment 45•9 years ago
|
||
Comment 46•9 years ago
|
||
verified on osx nightly 2015-05-26
1) create new profile
2) visit enough mozilla related sites (at least 8) to get suggestion to appear on new tab page [see attachment first row]
3) visit example.com
4) open new tab page and see example.com but no suggestion [see attachment second row]
5) block example.com tile and open new tab to see suggestion return
6) drag example.com from another tab into newtab to pin tile and cause suggestion to disappear [see row third row]
Comment 47•9 years ago
|
||
Sorry step 1a:
change pref in about:config
browser.newtabpage.directory.source
to
https://people.mozilla.org/~elee/suggested.1159884.json
Status: RESOLVED → VERIFIED
Comment 48•9 years ago
|
||
Comment on attachment 8610024 [details] [diff] [review]
for aurora (Mardak will land)
Approval Request Comment: See bug 1140185 comment 11
Attachment #8610024 -
Flags: approval-mozilla-aurora?
Comment 49•9 years ago
|
||
Comment 50•9 years ago
|
||
Comment on attachment 8610024 [details] [diff] [review]
for aurora (Mardak will land)
approval-mozilla-aurora+ granted in bug 1140185 comment 14
Attachment #8610024 -
Flags: approval-mozilla-aurora?
Comment 51•9 years ago
|
||
Confirming the fix using the STR from comment 46 and comment 47 on latest Aurora, build ID: 20150528004000.
Testing was performed on Windows 7 64-bit, Mac OS X 10.9.5 and Ubuntu 14.04 32-bit.
QA Contact: cornel.ionce
Updated•3 years ago
|
Flags: qe-verify+
You need to log in
before you can comment on or make changes to this bug.
Description
•