Closed
Bug 1387694
Opened 7 years ago
Closed 7 years ago
Top frecent sites query should dedupe to the more frecent rev_nowww
Categories
(Firefox :: New Tab Page, defect)
Firefox
New Tab Page
Tracking
()
RESOLVED
FIXED
Firefox 57
Tracking | Status | |
---|---|---|
firefox57 | --- | fixed |
People
(Reporter: Mardak, Assigned: Mardak)
References
Details
Attachments
(1 file)
Looks like the query is missing an ORDER for a GROUP and picks the correct most frecent of a given rev_host but then the less frecent of rev_nowww then the most frecent of those hosts.
Comment hidden (mozreview-request) |
Assignee | ||
Comment 2•7 years ago
|
||
Updating patch. I thought this code seemed familiar... :(
nanj fixed this in the Test Pilot version but after we had already exported the query to m-c:
https://github.com/mozilla/activity-stream/pull/2372
As was suggested in the PR, I'll make the query index-fast and do custom logic in JS.
Comment hidden (mozreview-request) |
Assignee | ||
Comment 4•7 years ago
|
||
Yay for unit tests. And SQL can indeed be quite compact line-wise but not necessarily easily performant with complex logic.
Comment 5•7 years ago
|
||
mozreview-review |
Comment on attachment 8894063 [details]
Bug 1387694 - Top frecent sites query should dedupe to the more frecent rev_nowww.
https://reviewboard.mozilla.org/r/165162/#review171196
Overall looks good, just a couple questions in the query and with the regex. Also would be nice to know how all this post-processing stuff performs on a larger number of sites i.e when Object.keys(BlockedLinks.links).length returns a large number and therefore our limit is a large number.
::: toolkit/modules/NewTabUtils.jsm:946
(Diff revision 2)
> - // Which is why subselect orders frecency and last_visit_date backwards.
> - // In general the groupby behavior in the absence of aggregates is not
> - // defined in SQL, hence we are relying on sqlite implementation that may
> - // change in the future.
> -
> - const limit = Object.keys(BlockedLinks.links).length + TOP_SITES_LIMIT;
> + const limit = Object.keys(BlockedLinks.links).length + TOP_SITES_LIMIT * 2 * 10;
> +
> + // Keep this query fast with frecency-indexed lookups (even with excess
> + // rows) and shift the more complex logic to post-processing afterwards
> + const sqlQuery = `SELECT
> + moz_bookmarks.guid AS bookmarkGuid,
Do you need this ```AS bookmarkGuid``` here?
::: toolkit/modules/NewTabUtils.jsm:949
(Diff revision 2)
> - // change in the future.
> -
> - const limit = Object.keys(BlockedLinks.links).length + TOP_SITES_LIMIT;
> - let sqlQuery = `/* do not warn (bug N/A): do not need index */
> - SELECT url, title, SUM(frecency) frecency, guid, bookmarkGuid,
> - last_visit_date / 1000 as lastVisitDate, "history" as type
> + // rows) and shift the more complex logic to post-processing afterwards
> + const sqlQuery = `SELECT
> + moz_bookmarks.guid AS bookmarkGuid,
> + frecency,
> + moz_places.guid,
> + last_visit_date / 1000 AS lastVisitDate,
Same here with ```AS lastVisitDate```. I'm asking because above you don't have ```moz_places.guid AS guid``` yet you're still extracting the guid as a listed column, I'm wondering if you need it here or if you forgot to add ```AS guid``` for ```moz_places.guid```
::: toolkit/modules/NewTabUtils.jsm:1003
(Diff revision 2)
> + }
> +
> + link.type = "history";
> +
> + // First we want to find the best link for an exact host
> + setBetterLink(exactHosts, link, url => url.match(/:\/\/([^\/]+)\//));
Why match on a trailing slash here?
For example, this regex matches on 'http://foo.co.uk/' but wouldn't match on 'http://foo.co.uk', even though they would have the same host, right? So exactHosts wouldn't have the right values in the end.
Unless this case would never happen?
::: toolkit/modules/NewTabUtils.jsm:1009
(Diff revision 2)
> }
> - links = links.slice(0, TOP_SITES_LIMIT);
> +
> + // Clean up exact hosts to dedupe as non-www hosts
> + const hosts = new Map();
> + for (const link of exactHosts.values()) {
> + setBetterLink(hosts, link, url => url.match(/:\/\/(?:www\.)?([^\/]+)\//),
Same question here with the trailing slash; this regex matches on 'http://foo.com/' but not on 'http://www.foo.com', does that affect the result set?
Attachment #8894063 -
Flags: review?(usarracini) → review+
Assignee | ||
Comment 6•7 years ago
|
||
mozreview-review-reply |
Comment on attachment 8894063 [details]
Bug 1387694 - Top frecent sites query should dedupe to the more frecent rev_nowww.
https://reviewboard.mozilla.org/r/165162/#review171196
> Do you need this ```AS bookmarkGuid``` here?
Yup. Otherwise "guid" conflicts with the one from moz_places
> Same here with ```AS lastVisitDate```. I'm asking because above you don't have ```moz_places.guid AS guid``` yet you're still extracting the guid as a listed column, I'm wondering if you need it here or if you forgot to add ```AS guid``` for ```moz_places.guid```
Yup. Need this one to be able to select the computed value. Although, we could do the computation in JS instead but then that would probably require us to clean up the undesired column from the object.
> Why match on a trailing slash here?
> For example, this regex matches on 'http://foo.co.uk/' but wouldn't match on 'http://foo.co.uk', even though they would have the same host, right? So exactHosts wouldn't have the right values in the end.
> Unless this case would never happen?
It doesn't have to, but pretty sure normal places usage will have a trailing slash on top level urls.
sqlite> SELECT COUNT(1) FROM moz_places;
118245
sqlite> SELECT url FROM moz_places WHERE url LIKE 'http%' AND INSTR(SUBSTR(url, 9), "/") = 0 LIMIT 10;
From my own history, there's no pages that start with http that don't have a slash somewhere (other than for the protocol).
I'll just update this to not require the trailing slash anyway.
Comment hidden (mozreview-request) |
Pushed by edilee@gmail.com:
https://hg.mozilla.org/integration/autoland/rev/601590dc02a7
Top frecent sites query should dedupe to the more frecent rev_nowww. r=ursula
Comment 9•7 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 7 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 57
Updated•5 years ago
|
Component: Activity Streams: Newtab → New Tab Page
You need to log in
before you can comment on or make changes to this bug.
Description
•