Closed Bug 1467627 Opened 6 years ago Closed 6 years ago

Calculate origin frecency as the sum of all URL frecencies instead of the max

Categories

(Toolkit :: Places, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla63
Tracking Status
firefox62 + verified
firefox63 + verified

People

(Reporter: adw, Assigned: adw)

References

Details

(Whiteboard: [fxsearch])

Attachments

(2 files)

As discussed over email, we currently calculate an origin's frecency as the max of all its URL frecencies, but we think summing the URL frecencies instead would be better. That will help fix the case where you visit the same origin many times but with different URLs (see bug 1464154's wordreference.com discussion). The origin should have a high frecency in that case.
Blocks: 1464154
Whiteboard: [fxsearch]
I don't recall, do we plan to also change the threshold to be extracted from moz_origins?
Tracking this for 62 since it seems to significantly affect search bar autocomplete quality.
This should be pretty much done, including tests... I'll push to try to see which tests fail, and I'll need to rebase on a more current tree too before asking for review.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=1fc5f10f4bea121082692cebc9e0b123e58523d9 All tests should pass now. Next, I'll rebase on the current tree and push to try again.
Try looks good, here's one more push on a new tree to make sure, and then I'll ask for review https://treeherder.mozilla.org/#/jobs?repo=try&revision=a1fd32238503dbc4d2d5dc1fe5ffb0c162817c73
One important thing I want to call out is that I added a new temp table and trigger for updating origin frecencies, the same way we have temp tables and triggers for adding and removing origins. The reason is to ensure that changes to moz_origins occur in the same order that the corresponding changes to moz_places occur. After inserting a new moz_place, we immediately update its frecency. So when the moz_place represents a new origin that's not in moz_origins, what ends up happening is: (1) insert moz_place (2) trigger on moz_places inserts runs, inserts row into temp table (3) update moz_place's frecency (4) trigger on moz_places.frecency update runs, tries to update corresponding moz_origins.frecency, but we didn't insert the new origin yet (5) delete from the temp table (6) trigger on temp table deletes runs, inserts the new origin into moz_origins Before this patch, at both steps 4 and 6, we took the max(frecency) of all moz_places with the origin to get the origin frecency. With this patch, I wanted to avoid calling an aggregate function over all moz_places (in this case, total()) every time a new origin is inserted or a frecency changes. i.e., at step 6, the new origin is inserted with the frecency that the corresponding moz_place was inserted with. But that means that the origin frecency is wrong after step 6 because the update in step 4 isn't accounted for. Maybe there's some way to write these triggers to account for that without resorting to another temp table and trigger. And I tried, but ultimately it's just easier to understand and reason about if each change to moz_origins corresponds to a change in moz_places, and they occur in the same order. I found this problem as I was writing new tests, specifically the "Sequential" test tasks I added to test_origins.js. Summary of other changes: * I also rearranged some #defines in nsPlacesTriggers.h and nsPlacesTables.h, so the diff is slightly more complex because of that * Removed the v48 frecency migration because it's now obsolete * Modified the v49 migration to not recalculate frecency stats (because the old stats are different and now the new stats are calculated on the new v52 migration) * Added a v52 migration * Frecency stats are now calculated from moz_origins * Renamed the frecency stats moz_meta keys: they're no longer the stats of all moz_places but only the origins, so I renamed them origins_frecency_count, etc. Also renamed the recalculate function * The origin-related triggers need to take into account the possibility that moz_places.origin_id is null. That's because it's possible to insert into the temp tables *before* the trigger that sets moz_places.origin_id runs. (e.g., add a moz_place and then delete it before triggering the triggers via a delete on the temp table.) This isn't strictly related to this bug but something I realized * In test_origins.js, I split the parsing test task out into its own file since it can take a while to run * I moved the one test task from test_frecency_stats.js into test_origins.js and deleted the former file. test_origins.js should really be checking the stats each time it checks the origins * Added test tasks to test_origins.js * Added some garbage-input tests to test_sql_function_origin.js
Comment on attachment 8988634 [details] Bug 1467627 - Calculate origin frecency as the sum of all URL frecencies instead of the max. https://reviewboard.mozilla.org/r/253866/#review260942 I have a few questions. The major pain point here is the fact some origin temp tables now will contain duplicates, killing the perf improvement we created them for (lack of a per statement trigger). I wonder if we shouldn't just recalc and don't try to incrementally update, or keep an incremental update in the temp table itself, using an ON CONFLICT clause. I'd be great if we could keep a single entry per origin in the temp tables, because otherwise any batch change will be a pain. ::: browser/base/content/test/urlbar/browser_urlbarAutoFillTrimURLs.js:25 (Diff revision 4) > uri: "http://www.autofilltrimurl.com/whatever", > - transition: Ci.nsINavHistoryService.TRANSITION_TYPED, > }, { > uri: "https://www.secureautofillurl.com/whatever", > - transition: Ci.nsINavHistoryService.TRANSITION_TYPED, > }]); nit: you can just pass the straight href to PlacesTestUtils.addVisits, without array nor object. ::: toolkit/components/places/Database.cpp:2618 (Diff revision 4) > + "SELECT CAST(TOTAL(frecency) AS INTEGER) " > + "FROM moz_places " > + "WHERE frecency > 0 AND moz_places.origin_id = moz_origins.id " > + ") " > + "WHERE rowid IN ( " > + "SELECT rowid " nit: just use id, when id is an integer primary key, it's just an alias to rowid ::: toolkit/components/places/Database.cpp:2635 (Diff revision 4) > + mOffset += 400; > + > + nsCOMPtr<mozIStorageStatement> selectStmt; > + rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING( > + "SELECT rowid " > + "FROM moz_origins " you should add an ORDER BY here, or, more efficiently, just use a count(*) and compare with mOffset Though, now that you use an offset and don't set frecencies to -1, how do you track where you were arrived, in case of a crash or shutdown during the process? Do we restart from scratch every time? On the other side the only alternative I see would be to store the offset in the pref, (but would require dispatching to the main thread). Or we should in v52 set all origins frecencies to -1 before starting the recalculation. ::: toolkit/components/places/nsNavHistory.cpp:3820 (Diff revision 4) > + nsCOMPtr<mozIStorageAsyncStatement> updateOriginFrecenciesStmt = > + mDB->GetAsyncStatement("DELETE FROM moz_updateoriginsupdate_temp"); > + NS_ENSURE_STATE(updateOriginFrecenciesStmt); > + nsresult rv = > + updateOriginFrecenciesStmt->ExecuteAsync(nullptr, getter_AddRefs(ps)); > + NS_ENSURE_SUCCESS(rv, rv); The more I look at the frecency recalculation code, the more I think bug 1466267 would be great :/ It's getting crazy, when we don't actually need precise-to-the-second values. ::: toolkit/components/places/nsPlacesTables.h:181 (Diff revision 4) > +// Note that origin_id may be null because rows may be added to this table > +// before moz_places.origin_id is set. That's why there are prefix and host > +// columns, too. > #define CREATE_UPDATEORIGINSDELETE_TEMP NS_LITERAL_CSTRING( \ > "CREATE TEMP TABLE moz_updateoriginsdelete_temp ( " \ > - "origin_id INTEGER PRIMARY KEY, " \ > + "origin_id INTEGER, " \ This kills the performance improvement we had thanks to primary key on origin_id, that is the fact on removing many pages from an origin we only had to handle one removal, now it will take one entry per removal, and have to run through each change, that sounds quite expensive. The same is sort-of valid for CREATE_UPDATEORIGINSUPDATE_TEMP, it also tracks every single change. At this point I'm not sure, it may be cheaper to just recalc completely than incremental for large changes... But surely this is a perf hit. ::: toolkit/components/places/nsPlacesTriggers.h:71 (Diff revision 4) > + "UNION " \ > + "SELECT '" MOZ_META_KEY_ORIGIN_FRECENCY_SUM_OF_SQUARES "', " \ > + "IFNULL((SELECT value FROM moz_meta WHERE key = '" \ > + MOZ_META_KEY_ORIGIN_FRECENCY_SUM_OF_SQUARES "'), 0) " \ > + op " (MAX(frecency, 0) * MAX(frecency, 0)) " \ > + "FROM moz_origins WHERE " where "; " let's not include the semicolon here, and instead add it in the triggers themselves. ::: toolkit/components/places/nsPlacesTriggers.h:103 (Diff revision 4) > #define CREATE_PLACES_AFTERINSERT_TRIGGER NS_LITERAL_CSTRING( \ > "CREATE TEMP TRIGGER moz_places_afterinsert_trigger " \ > "AFTER INSERT ON moz_places FOR EACH ROW " \ > "BEGIN " \ > "SELECT store_last_inserted_id('moz_places', NEW.id); " \ > - "INSERT OR IGNORE INTO moz_updateoriginsinsert_temp (place_id, prefix, host) " \ > + "INSERT OR IGNORE INTO moz_updateoriginsinsert_temp (place_id, prefix, host, frecency) " \ what happens if we insert into moz_places with a certain frecency, and then immediately UPDATE that frecency value, and later delete from moz_updateoriginsinsert_temp? Aren't we going to use the wrong frecency? moz_updateoriginsinsert_temp was originally intended to track immutable fields, that made this kind of problem non-existent. ::: toolkit/components/places/nsPlacesTriggers.h:118 (Diff revision 4) > + UPDATE_ORIGIN_FRECENCY_STATS_AFTER_INSERT("-") \ > "INSERT OR IGNORE INTO moz_origins (prefix, host, frecency) " \ > "VALUES (OLD.prefix, OLD.host, 0); " \ > + "UPDATE moz_origins SET frecency = frecency + OLD.frecency " \ > + "WHERE OLD.frecency > 0 AND prefix = OLD.prefix AND host = OLD.host; " \ > + UPDATE_ORIGIN_FRECENCY_STATS_AFTER_INSERT("+") \ Could you please better document what this is doing? it's a bit unreadable, especially what we are subtracting and adding back again from stats. What is "frecency" in UPDATE_ORIGIN_FRECENCY_STATS? is it OLD.frecency, because it doesn't seem to select anywhere a frecency field.
Attachment #8988634 - Flags: review?(mak77)
(In reply to Marco Bonardo [::mak] from comment #14) > I'd be great if we could keep a single entry per origin in the temp tables, > because otherwise any batch change will be a pain. The reason the temp table can contain the same origin more than once is that multiple operations might occur for the same origin before the temp table is emptied. e.g., a new origin is added, its frecency is changed, all places with the origin are removed so the origin now must be removed -- and then finally we delete from the temp table. IOW the temp table is a (temporary) history of origin changes. That said, maybe we could still keep one row per origin, if we just update the row's frecency in response to frecency changes, instead of adding new rows when the frecency changes. Is that what you were suggesting with an ON CONFLICT clause? (I need to read up on that.)
yes, that was the idea. The fact is that one row per origin is far more performant in those cases where, for example, we remove N pages for a given origin, then we run the trigger once of N times when we empty the temp table.
I changed moz_updateoriginsupdate_temp to this: > #define CREATE_UPDATEORIGINSUPDATE_TEMP NS_LITERAL_CSTRING( \ > "CREATE TEMP TABLE moz_updateoriginsupdate_temp ( " \ > "origin_id INTEGER, " \ > "prefix TEXT NOT NULL, " \ > "host TEXT NOT NULL, " \ > "old_frecency INTEGER NOT NULL, " \ > "new_frecency INTEGER NOT NULL, " \ > "PRIMARY KEY (origin_id, prefix, host) " \ > ") " \ > ) And moz_places_afterupdate_frecency_trigger: > #define CREATE_PLACES_AFTERUPDATE_FRECENCY_TRIGGER NS_LITERAL_CSTRING( \ > "CREATE TEMP TRIGGER moz_places_afterupdate_frecency_trigger " \ > "AFTER UPDATE OF frecency ON moz_places FOR EACH ROW " \ > "WHEN NOT is_frecency_decaying() " \ > "BEGIN " \ > "INSERT OR REPLACE INTO moz_updateoriginsupdate_temp (origin_id, prefix, host, old_frecency, new_frecency) " \ > "VALUES (NEW.origin_id, get_prefix(NEW.url), get_host_and_port(NEW.url), OLD.frecency, NEW.frecency); " \ > "END " \ > ) So the temp table will contain at most one row per origin. (In reply to Marco Bonardo [::mak] from comment #14) > ::: browser/base/content/test/urlbar/browser_urlbarAutoFillTrimURLs.js:25 > (Diff revision 4) > > uri: "http://www.autofilltrimurl.com/whatever", > > - transition: Ci.nsINavHistoryService.TRANSITION_TYPED, > > }, { > > uri: "https://www.secureautofillurl.com/whatever", > > - transition: Ci.nsINavHistoryService.TRANSITION_TYPED, > > }]); > > nit: you can just pass the straight href to PlacesTestUtils.addVisits, > without array nor object. You mean call addVisits() for each of the urls? If you pass multiple urls in a call, it looks like you need to wrap them in a { uri } object: https://dxr.mozilla.org/mozilla-central/rev/23885c14f025b61bb74d85845ac4018f05eb39f8/toolkit/components/places/tests/PlacesTestUtils.jsm#54 I'm not sure that calling it per url is better than calling it once with an array of objects, so I left this as is. Let me know if I'm not understanding what you mean. > ::: toolkit/components/places/Database.cpp:2618 > (Diff revision 4) > > + "SELECT CAST(TOTAL(frecency) AS INTEGER) " > > + "FROM moz_places " > > + "WHERE frecency > 0 AND moz_places.origin_id = moz_origins.id " > > + ") " > > + "WHERE rowid IN ( " > > + "SELECT rowid " > > nit: just use id, when id is an integer primary key, it's just an alias to > rowid Done > ::: toolkit/components/places/Database.cpp:2635 > (Diff revision 4) > > + mOffset += 400; > > + > > + nsCOMPtr<mozIStorageStatement> selectStmt; > > + rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING( > > + "SELECT rowid " > > + "FROM moz_origins " > > you should add an ORDER BY here, or, more efficiently, just use a count(*) > and compare with mOffset > > Though, now that you use an offset and don't set frecencies to -1, how do > you track where you were arrived, in case of a crash or shutdown during the > process? > Do we restart from scratch every time? Yes > On the other side the only alternative I see would be to store the offset in > the pref, (but would require dispatching to the main thread). > Or we should in v52 set all origins frecencies to -1 before starting the > recalculation. You're right, I chose to set all origin frecencies to -1 before starting the recalculation. > ::: toolkit/components/places/nsNavHistory.cpp:3820 > (Diff revision 4) > > + nsCOMPtr<mozIStorageAsyncStatement> updateOriginFrecenciesStmt = > > + mDB->GetAsyncStatement("DELETE FROM moz_updateoriginsupdate_temp"); > > + NS_ENSURE_STATE(updateOriginFrecenciesStmt); > > + nsresult rv = > > + updateOriginFrecenciesStmt->ExecuteAsync(nullptr, getter_AddRefs(ps)); > > + NS_ENSURE_SUCCESS(rv, rv); > > The more I look at the frecency recalculation code, the more I think bug > 1466267 would be great :/ It's getting crazy, when we don't actually need > precise-to-the-second values. Yeah... > ::: toolkit/components/places/nsPlacesTables.h:181 > (Diff revision 4) > > +// Note that origin_id may be null because rows may be added to this table > > +// before moz_places.origin_id is set. That's why there are prefix and host > > +// columns, too. > > #define CREATE_UPDATEORIGINSDELETE_TEMP NS_LITERAL_CSTRING( \ > > "CREATE TEMP TABLE moz_updateoriginsdelete_temp ( " \ > > - "origin_id INTEGER PRIMARY KEY, " \ > > + "origin_id INTEGER, " \ > > This kills the performance improvement we had thanks to primary key on > origin_id, that is the fact on removing many pages from an origin we only > had to handle one removal, now it will take one entry per removal, and have > to run through each change, that sounds quite expensive. Yeah, I reverted this > The same is sort-of valid for CREATE_UPDATEORIGINSUPDATE_TEMP, it also > tracks every single change. > At this point I'm not sure, it may be cheaper to just recalc completely than > incremental for large changes... I fixed this as described above > ::: toolkit/components/places/nsPlacesTriggers.h:71 > (Diff revision 4) > > + "UNION " \ > > + "SELECT '" MOZ_META_KEY_ORIGIN_FRECENCY_SUM_OF_SQUARES "', " \ > > + "IFNULL((SELECT value FROM moz_meta WHERE key = '" \ > > + MOZ_META_KEY_ORIGIN_FRECENCY_SUM_OF_SQUARES "'), 0) " \ > > + op " (MAX(frecency, 0) * MAX(frecency, 0)) " \ > > + "FROM moz_origins WHERE " where "; " > > let's not include the semicolon here, and instead add it in the triggers > themselves. Done > ::: toolkit/components/places/nsPlacesTriggers.h:103 > (Diff revision 4) > > #define CREATE_PLACES_AFTERINSERT_TRIGGER NS_LITERAL_CSTRING( \ > > "CREATE TEMP TRIGGER moz_places_afterinsert_trigger " \ > > "AFTER INSERT ON moz_places FOR EACH ROW " \ > > "BEGIN " \ > > "SELECT store_last_inserted_id('moz_places', NEW.id); " \ > > - "INSERT OR IGNORE INTO moz_updateoriginsinsert_temp (place_id, prefix, host) " \ > > + "INSERT OR IGNORE INTO moz_updateoriginsinsert_temp (place_id, prefix, host, frecency) " \ > > what happens if we insert into moz_places with a certain frecency, and then > immediately UPDATE that frecency value, and later delete from > moz_updateoriginsinsert_temp? Aren't we going to use the wrong frecency? The origin will be inserted with the first frecency, but then when we delete from moz_updateoriginsupdate_temp, the origin's frecency will be set to the second frecency. As long as we delete from the temp tables in the same order that we insert/update into moz_places, then origin operations will occur in the same order. > ::: toolkit/components/places/nsPlacesTriggers.h:118 > (Diff revision 4) > > + UPDATE_ORIGIN_FRECENCY_STATS_AFTER_INSERT("-") \ > > "INSERT OR IGNORE INTO moz_origins (prefix, host, frecency) " \ > > "VALUES (OLD.prefix, OLD.host, 0); " \ > > + "UPDATE moz_origins SET frecency = frecency + OLD.frecency " \ > > + "WHERE OLD.frecency > 0 AND prefix = OLD.prefix AND host = OLD.host; " \ > > + UPDATE_ORIGIN_FRECENCY_STATS_AFTER_INSERT("+") \ > > Could you please better document what this is doing? > it's a bit unreadable, especially what we are subtracting and adding back > again from stats. Done > What is "frecency" in UPDATE_ORIGIN_FRECENCY_STATS? is it OLD.frecency, > because it doesn't seem to select anywhere a frecency field. Each of the SELECTs in UPDATE_ORIGIN_FRECENCY_STATS is FROM moz_origins, so it's the frecency from moz_origins
Comment on attachment 8988634 [details] Bug 1467627 - Calculate origin frecency as the sum of all URL frecencies instead of the max. https://reviewboard.mozilla.org/r/253866/#review261280 ::: toolkit/components/places/nsPlacesTables.h:178 (Diff revision 5) > +// This table is used in a similar way to moz_updateoriginsinsert_temp, but for > +// deletes, and triggered via moz_places_afterdelete_trigger. > +// > +// Note that origin_id may be null because rows may be added to this table > +// before moz_places.origin_id is set. That's why there are prefix and host > +// columns, too. origin_id is a primary key now, can it really be null? Anyway, it looks like it would be trivial to always insert an id here, using an IFNULL(OLD.origin_id, (SELECT id FROM moz_origins WHERE prefix = m AND host = n), without complicating this table and making the trigger much simpler (no need to check for null) Maybe the same could be done for the update trigger? ::: toolkit/components/places/nsPlacesTables.h:191 (Diff revision 5) > ) > > -// This table is used in a similar way to moz_updateoriginsdelete_temp, but for > -// inserts, and triggered via moz_places_afterinsert_trigger. > -#define CREATE_UPDATEORIGINSINSERT_TEMP NS_LITERAL_CSTRING( \ > - "CREATE TEMP TABLE moz_updateoriginsinsert_temp ( " \ > +// This table is used in a similar way to moz_updateoriginsinsert_temp, but for > +// updates to places' frecencies, and triggered via > +// moz_places_afterupdate_frecency_trigger. > +// empty comment line ::: toolkit/components/places/nsPlacesTables.h:202 (Diff revision 5) > + "origin_id INTEGER, " \ > "prefix TEXT NOT NULL, " \ > - "host TEXT NOT NULL " \ > + "host TEXT NOT NULL, " \ > + "old_frecency INTEGER NOT NULL, " \ > + "new_frecency INTEGER NOT NULL, " \ > + "PRIMARY KEY (origin_id, prefix, host) " \ So I assume the reason for the compound key is that origin_id may be unknown (null). probably this table could be WITHOUT ROWID ::: toolkit/components/places/nsPlacesTriggers.h:156 (Diff revision 5) > + /* Deduct the origin's current contribution to frecency stats */ \ > + UPDATE_ORIGIN_FRECENCY_STATS_AFTER_UPDATE_OR_DELETE("-") "; " \ > + "UPDATE moz_origins SET frecency = frecency - OLD.frecency " \ > + "WHERE OLD.frecency > 0 AND ( " \ > + "(OLD.origin_id NOTNULL AND id = OLD.origin_id) OR " \ > + "(OLD.origin_id ISNULL AND prefix = OLD.prefix AND host = OLD.host) " \ As I said I'd be great if we could change the triggers to always store the id with a subselect, then we could remove these indirections. ::: toolkit/components/places/nsPlacesTriggers.h:190 (Diff revision 5) > "CREATE TEMP TRIGGER moz_places_afterupdate_frecency_trigger " \ > "AFTER UPDATE OF frecency ON moz_places FOR EACH ROW " \ > - "WHEN NEW.frecency >= 0 AND NOT is_frecency_decaying() " \ > + "WHEN NOT is_frecency_decaying() " \ > "BEGIN " \ > + "INSERT OR REPLACE INTO moz_updateoriginsupdate_temp (origin_id, prefix, host, old_frecency, new_frecency) " \ > + "VALUES (NEW.origin_id, get_prefix(NEW.url), get_host_and_port(NEW.url), OLD.frecency, NEW.frecency); " \ This assumes that there can't be multiple frecency changes for the same origin without a DELETE FROM moz_updateoriginsupdate_temp in the middle? I mean, if there are 2 frecency updates to moz_places, to 2 different pages in the same origin (mozilla.com/1 and mozilla.com/2), wouldn't the second update replace the frecency of the first update? and then a DELETE FROM moz_updateoriginsupdate_temp would only see the last one. Should we store a diff or a list of frecencies instead of just one value? The same question is valid for moz_updateoriginsdelete_temp, if we remove 10 pages with the same origin, the trigger seems to only keep the frecency of the first removed page (since it uses INSERT OR IGNORE). I know you changed from multiple entries per origin to one because I asked for it (perf), but there's still this problem that I'm not sure I understand how is solved.
Attachment #8988634 - Flags: review?(mak77)
(In reply to Marco Bonardo [::mak] from comment #20) > ::: toolkit/components/places/nsPlacesTables.h:178 > (Diff revision 5) > > +// This table is used in a similar way to moz_updateoriginsinsert_temp, but for > > +// deletes, and triggered via moz_places_afterdelete_trigger. > > +// > > +// Note that origin_id may be null because rows may be added to this table > > +// before moz_places.origin_id is set. That's why there are prefix and host > > +// columns, too. > > origin_id is a primary key now, can it really be null? Yes of course, because (1) moz_places.origin_id can be null when new rows are inserted into the temp table, and (2) it's not the primary key by itself, the origin_id + prefix + host are the key all together. > Anyway, it looks like it would be trivial to always insert an id here, using > an IFNULL(OLD.origin_id, (SELECT id FROM moz_origins WHERE prefix = m AND > host = n), without complicating this table and making the trigger much > simpler (no need to check for null) There's no need to look the ID up via a SELECT. We have all the info necessary at the time we insert into the temp table. For new origins, the origin_id will simply be null and we'll fall back to the prefix and host. For new origins, the origin_id will be non-null. > ::: toolkit/components/places/nsPlacesTables.h:191 > (Diff revision 5) > > ) > > > > -// This table is used in a similar way to moz_updateoriginsdelete_temp, but for > > -// inserts, and triggered via moz_places_afterinsert_trigger. > > -#define CREATE_UPDATEORIGINSINSERT_TEMP NS_LITERAL_CSTRING( \ > > - "CREATE TEMP TABLE moz_updateoriginsinsert_temp ( " \ > > +// This table is used in a similar way to moz_updateoriginsinsert_temp, but for > > +// updates to places' frecencies, and triggered via > > +// moz_places_afterupdate_frecency_trigger. > > +// > > empty comment line Yes, it's the start of a new paragraph within the comment > ::: toolkit/components/places/nsPlacesTables.h:202 > (Diff revision 5) > > + "origin_id INTEGER, " \ > > "prefix TEXT NOT NULL, " \ > > - "host TEXT NOT NULL " \ > > + "host TEXT NOT NULL, " \ > > + "old_frecency INTEGER NOT NULL, " \ > > + "new_frecency INTEGER NOT NULL, " \ > > + "PRIMARY KEY (origin_id, prefix, host) " \ > > So I assume the reason for the compound key is that origin_id may be unknown > (null). Yes > probably this table could be WITHOUT ROWID OK > ::: toolkit/components/places/nsPlacesTriggers.h:156 > (Diff revision 5) > > + /* Deduct the origin's current contribution to frecency stats */ \ > > + UPDATE_ORIGIN_FRECENCY_STATS_AFTER_UPDATE_OR_DELETE("-") "; " \ > > + "UPDATE moz_origins SET frecency = frecency - OLD.frecency " \ > > + "WHERE OLD.frecency > 0 AND ( " \ > > + "(OLD.origin_id NOTNULL AND id = OLD.origin_id) OR " \ > > + "(OLD.origin_id ISNULL AND prefix = OLD.prefix AND host = OLD.host) " \ > > As I said I'd be great if we could change the triggers to always store the > id with a subselect, then we could remove these indirections. Addressed above > ::: toolkit/components/places/nsPlacesTriggers.h:190 > (Diff revision 5) > > "CREATE TEMP TRIGGER moz_places_afterupdate_frecency_trigger " \ > > "AFTER UPDATE OF frecency ON moz_places FOR EACH ROW " \ > > - "WHEN NEW.frecency >= 0 AND NOT is_frecency_decaying() " \ > > + "WHEN NOT is_frecency_decaying() " \ > > "BEGIN " \ > > + "INSERT OR REPLACE INTO moz_updateoriginsupdate_temp (origin_id, prefix, host, old_frecency, new_frecency) " \ > > + "VALUES (NEW.origin_id, get_prefix(NEW.url), get_host_and_port(NEW.url), OLD.frecency, NEW.frecency); " \ > > This assumes that there can't be multiple frecency changes for the same > origin without a DELETE FROM moz_updateoriginsupdate_temp in the middle? > > I mean, if there are 2 frecency updates to moz_places, to 2 different pages > in the same origin (mozilla.com/1 and mozilla.com/2), wouldn't the second > update replace the frecency of the first update? and then a DELETE FROM > moz_updateoriginsupdate_temp would only see the last one. > Should we store a diff or a list of frecencies instead of just one value? That's what I thought too at first, but when I tried storing a diff, tests failed. A list of frecencies is what I was doing in the first version of this patch that you looked at: storing a row per frecency change, but you said that's not performant. I'm not sure why tests failed when storing a diff. I'll have to think about it more.
(In reply to Marco Bonardo [::mak] from comment #20) > simpler (no need to check for null) btw the null checks aren't actually necessary, at least in most places. We can just do `origin_id = OLD.origin_id OR (prefix = OLD.prefix AND host = OLD.host)`, etc. I added the explicit null checks to make it more obvious why we're comparing both IDs and prefixes/hosts -- because the ID may be null. I can remove the null checks if you like
(In reply to Drew Willcoxon :adw from comment #21) > > origin_id is a primary key now, can it really be null? > > Yes of course, because (1) moz_places.origin_id can be null when new rows > are inserted into the temp table, and (2) it's not the primary key by > itself, the origin_id + prefix + host are the key all together. Nope, in moz_updateoriginsdelete_temp origin_id is a PRIMARY INTEGER KEY > That's what I thought too at first, but when I tried storing a diff, tests > failed. A list of frecencies is what I was doing in the first version of > this patch that you looked at: storing a row per frecency change, but you > said that's not performant. > > I'm not sure why tests failed when storing a diff. I'll have to think about > it more. a diff can't be used as-is for the square stat because (a-b)^2 != a^2-b^2, as well as you should also track a count diff at that point.
OK, this uses frecency_delta columns in both the delete and frecency-update temp tables. I'm not sure why I couldn't get that to work before, but tests pass with this now. For simplicity and based on your feedback, I also removed the origin_id column from those tables in favor of a prefix + host primary key. That simplifies the triggers. My understanding is that we get indexes on prefix + host "for free" since they're the primary key, and therefore selecting rows in those tables via prefix + host should be as fast as selecting them via an integer ID, right? (Or nearly anyway, since I guess sqlite needs to hash the text values, an additional step, but still it's not a function of table size.) If the origin_id is null at the time that the trigger that inserts into a temp table runs, it's not possible to select the ID out of some other table somewhere, because it is the trigger that runs when the temp table is emptied that creates the origin ID (specifically moz_updateoriginsinsert_afterdelete_trigger). So selecting the origin ID in this way, as I think you suggested?, isn't possible. In moz_updateoriginsdelete_afterdelete_trigger I check whether there are any more moz_places with the affected origin, and not having the origin ID in that case hurts, because otherwise we would need to call get_prefix() and get_host_and_port() on every moz_place. So in that case, I first do a select on moz_origins to get the origin ID, and then from there select on moz_places with that origin ID. (In reply to Marco Bonardo [::mak] from comment #23) > (In reply to Drew Willcoxon :adw from comment #21) > > > origin_id is a primary key now, can it really be null? > > > > Yes of course, because (1) moz_places.origin_id can be null when new rows > > are inserted into the temp table, and (2) it's not the primary key by > > itself, the origin_id + prefix + host are the key all together. > > Nope, in moz_updateoriginsdelete_temp origin_id is a PRIMARY INTEGER KEY You're right, sorry, I wasn't paying close enough attention to your comment. > > That's what I thought too at first, but when I tried storing a diff, tests > > failed. A list of frecencies is what I was doing in the first version of > > this patch that you looked at: storing a row per frecency change, but you > > said that's not performant. > > > > I'm not sure why tests failed when storing a diff. I'll have to think about > > it more. > > a diff can't be used as-is for the square stat because (a-b)^2 != a^2-b^2, > as well as you should also track a count diff at that point. The diffs aren't directly relevant to the stats calculations because all those calculations do is deduct an origin's current frecency from the stats and then add its new contribution back to the stats once its frecency is updated. Of course if the origin's new frecency is calculated incorrectly, then the stats will be wrong too.
A little clean up: Only a single UPDATE_ORIGIN_FRECENCY_STATS macro is necessary now
(In reply to Drew Willcoxon :adw from comment #26) > selecting > rows in those tables via prefix + host should be as fast as selecting them > via an integer ID, right? (Or nearly anyway, since I guess sqlite needs to > hash the text values, an additional step, but still it's not a function of > table size.) It doesn't actually hash strings afaik, it just uses them as blobs, string matching is still quite slower than integer matching. Anyway, you have an index, so it's not horribly slow. > So selecting the origin ID in > this way, as I think you suggested?, isn't possible. Let me sum up, for future reference. The id is created by moz_updateoriginsinsert_afterdelete_trigger that runs when we delete from moz_updateoriginsinsert, that usually happens after we insert into moz_places The tables that may get a null id are moz_updateoriginsdelete_temp that is filled on DELETE From moz_places moz_updateoriginsupdate_temp that is filled on Update of moz_places To always have an id in these tables, we should ensure to delete from moz_updateoriginsinsert after every INSERT, and before running any update or delete. History.cpp doesn't seem to guarantee this, and anyway we'd need full serialization for that. All of this complication would probably go away if we'd get per statement triggers in Sqlite, I'll ping the Sqlite team to listen if they have plans regarding it.
Comment on attachment 8988634 [details] Bug 1467627 - Calculate origin frecency as the sum of all URL frecencies instead of the max. https://reviewboard.mozilla.org/r/253866/#review261572 OK, my only remaining concern is the fact costs of updating frecency are increasing. But we'll work on that in the future, since we have some plans for it already. ::: toolkit/components/places/Database.cpp:2627 (Diff revision 7) > + > + nsCOMPtr<mozIStorageStatement> selectStmt; > + rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING( > + "SELECT count(*) " > + "FROM moz_origins " > + "WHERE frecency < 0 " nit: it may be a little bit more efficient to SELECT 1...LIMIT 1 and just use hasResult to check for existence. ::: toolkit/components/places/nsPlacesTriggers.h:113 (Diff revision 7) > "BEGIN " \ > + /* Deduct the origin's current contribution to frecency stats */ \ > + UPDATE_ORIGIN_FRECENCY_STATS("-") "; " \ > "INSERT OR IGNORE INTO moz_origins (prefix, host, frecency) " \ > "VALUES (OLD.prefix, OLD.host, 0); " \ > + "UPDATE moz_origins SET frecency = frecency + OLD.frecency " \ We can probably use an UPSERT here, to upgrade a failed INSERT into an UPDATE, and get the existing value at the same time. https://www.sqlite.org/lang_UPSERT.html you can do this in a follow-up if you wish, it's not super critical. ::: toolkit/components/places/nsPlacesTriggers.h:133 (Diff revision 7) > + "CREATE TEMP TRIGGER moz_places_afterdelete_trigger " \ > + "AFTER DELETE ON moz_places FOR EACH ROW " \ > + "BEGIN " \ > + "INSERT OR IGNORE INTO moz_updateoriginsdelete_temp (prefix, host, frecency_delta) " \ > + "VALUES (get_prefix(OLD.url), get_host_and_port(OLD.url), 0); " \ > + "UPDATE moz_updateoriginsdelete_temp " \ same UPSERT comment ::: toolkit/components/places/nsPlacesTriggers.h:157 (Diff revision 7) > + "SELECT id " \ > + "FROM moz_origins " \ > + "WHERE prefix = OLD.prefix AND host = OLD.host " \ > + ") " \ > + "LIMIT 1 " \ > + ") ISNULL; " \ wouldn't this do? AND NOT EXISTS( SELECT id FROM moz_places WHERE origin_id = moz_origins.id LIMIT 1 ) ::: toolkit/components/places/nsPlacesTriggers.h:183 (Diff revision 7) > "BEGIN " \ > + "INSERT OR IGNORE INTO moz_updateoriginsupdate_temp (prefix, host, frecency_delta) " \ > + "VALUES (get_prefix(NEW.url), get_host_and_port(NEW.url), 0); " \ > + "UPDATE moz_updateoriginsupdate_temp " \ > + "SET frecency_delta = frecency_delta + MAX(NEW.frecency, 0) - MAX(OLD.frecency, 0) " \ > + "WHERE prefix = get_prefix(NEW.url) AND host = get_host_and_port(NEW.url); " \ UPSERT comment
Attachment #8988634 - Flags: review?(mak77) → review+
(In reply to Marco Bonardo [::mak] from comment #29) > Let me sum up, for future reference. Yes, right
(In reply to Marco Bonardo [::mak] from comment #30) > ::: toolkit/components/places/Database.cpp:2627 > (Diff revision 7) > > + > > + nsCOMPtr<mozIStorageStatement> selectStmt; > > + rv = mDBConn->CreateStatement(NS_LITERAL_CSTRING( > > + "SELECT count(*) " > > + "FROM moz_origins " > > + "WHERE frecency < 0 " > > nit: it may be a little bit more efficient to SELECT 1...LIMIT 1 and just > use hasResult to check for existence. Done > ::: toolkit/components/places/nsPlacesTriggers.h:113 > (Diff revision 7) > > "BEGIN " \ > > + /* Deduct the origin's current contribution to frecency stats */ \ > > + UPDATE_ORIGIN_FRECENCY_STATS("-") "; " \ > > "INSERT OR IGNORE INTO moz_origins (prefix, host, frecency) " \ > > "VALUES (OLD.prefix, OLD.host, 0); " \ > > + "UPDATE moz_origins SET frecency = frecency + OLD.frecency " \ > > We can probably use an UPSERT here, to upgrade a failed INSERT into an > UPDATE, and get the existing value at the same time. > https://www.sqlite.org/lang_UPSERT.html Done > ::: toolkit/components/places/nsPlacesTriggers.h:133 > (Diff revision 7) > > + "CREATE TEMP TRIGGER moz_places_afterdelete_trigger " \ > > + "AFTER DELETE ON moz_places FOR EACH ROW " \ > > + "BEGIN " \ > > + "INSERT OR IGNORE INTO moz_updateoriginsdelete_temp (prefix, host, frecency_delta) " \ > > + "VALUES (get_prefix(OLD.url), get_host_and_port(OLD.url), 0); " \ > > + "UPDATE moz_updateoriginsdelete_temp " \ > > same UPSERT comment Done > ::: toolkit/components/places/nsPlacesTriggers.h:157 > (Diff revision 7) > > + "SELECT id " \ > > + "FROM moz_origins " \ > > + "WHERE prefix = OLD.prefix AND host = OLD.host " \ > > + ") " \ > > + "LIMIT 1 " \ > > + ") ISNULL; " \ > > wouldn't this do? > AND NOT EXISTS( > SELECT id FROM moz_places > WHERE origin_id = moz_origins.id > LIMIT 1 > ) Yes, thanks! > ::: toolkit/components/places/nsPlacesTriggers.h:183 > (Diff revision 7) > > "BEGIN " \ > > + "INSERT OR IGNORE INTO moz_updateoriginsupdate_temp (prefix, host, frecency_delta) " \ > > + "VALUES (get_prefix(NEW.url), get_host_and_port(NEW.url), 0); " \ > > + "UPDATE moz_updateoriginsupdate_temp " \ > > + "SET frecency_delta = frecency_delta + MAX(NEW.frecency, 0) - MAX(OLD.frecency, 0) " \ > > + "WHERE prefix = get_prefix(NEW.url) AND host = get_host_and_port(NEW.url); " \ > > UPSERT comment Done
Pushed by dwillcoxon@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/5c31149518e3 Calculate origin frecency as the sum of all URL frecencies instead of the max. r=mak
Status: ASSIGNED → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Depends on: 1473547
Attached patch Beta/62 patch (deleted) — Splinter Review
Marco, (1) Are you OK with uplifting this? I'd like to. (2) This patch is the one that landed on m-c except in Database.cpp I changed the "Firefox 63 uses schema version 52" comment to say "Firefox 62" since that seems like the right thing to do. Do you agree? If so, we should change that comment on m-c too to match, right?
Attachment #8990136 - Flags: review?(mak77)
(In reply to Drew Willcoxon :adw from comment #36) > (1) Are you OK with uplifting this? I'd like to. Yes. > (2) This patch is the one that landed on m-c except in Database.cpp I > changed the "Firefox 63 uses schema version 52" comment to say "Firefox 62" > since that seems like the right thing to do. Do you agree? If so, we > should change that comment on m-c too to match, right? We should, and remove the double comment for Firefox 62, rs=me on doing that when it's the time.
Attachment #8990136 - Flags: review?(mak77) → review+
Comment on attachment 8990136 [details] [diff] [review] Beta/62 patch Approval Request Comment [Feature/Bug causing the regression]: Bug 1239708 -- improving autofill in the urlbar [User impact if declined]: Bug 1239708 is already on beta. If we decline this patch, autofill will actually be unintentionally worse in some cases for users on 62 compared to what they would have had if bug 1239708 hadn't been fixed. [Is this code covered by automated tests?]: Yes [Has the fix been verified in Nightly?]: No [Needs manual test from QE? If yes, steps to reproduce]: No [List of other uplifts needed for the feature/fix]: None [Is the change risky?]: Low/Medium risk [Why is the change risky/not risky?]: We introduced a new data migration to the places.sqlite database, and we usually don't uplift migrations. But the core functionality is well covered by automated tests. [String changes made/needed]: None
Attachment #8990136 - Flags: approval-mozilla-beta?
This is a big change and I'd like to see it on 62. Is there any verification we could do before uplift? Alternately, we could uplift, then verify on beta.
Flags: needinfo?(adw)
What about asking QA to test the migration from a try build?
You're right, I'll get a beta try build with this patch and some STR and coordinate with QA/Shilpi to test it.
Flags: needinfo?(adw) → qe-verify+
STR for QA testing: build1 (62.0b7) => http://ftp.mozilla.org/pub/firefox/releases/62.0b7/ build2 (custom try build with this patch) => https://treeherder.mozilla.org/#/jobs?repo=try&revision=37909ffbb890c578750eac100646b08e3593a194&selectedJob=187656129 1. Start a new profile in build1 2. Visit example.com 3. Visit example.com again 4. Visit https://bugzilla.mozilla.org/show_bug.cgi?id=1239708 5. Visit https://bugzilla.mozilla.org/show_bug.cgi?id=1467627 6. Visit https://bugzilla.mozilla.org/show_bug.cgi?id=1469651 7. Start typing "example" in the urlbar and verify that it's autofilled to example.com 8. Type "bugzilla" in the urlbar and verify that nothing is autofilled 9. Quit build1, restart the same profile in build2 10. Open about:config and wait until "places.database.migrateV52OriginFrecencies" does not exist (it may not exist at all when you first check, and in that case continue on to the next step) 11. Type "example" in the urlbar and verify that nothing is autofilled 12. Start typing "bugzilla" in the urlbar and verify that bugzilla.mozilla.org is autofilled 13. Visit example.com 14. Type "example" in the urlbar and verify that nothing is autofilled 15. Type "bugzilla" in the urlbar and verify that nothing is autofilled 16. Visit example.com 17. Start typing "example" in the urlbar and verify that it's autofilled to example.com 18. Type "bugzilla" in the urlbar and verify that nothing is autofilled 19. Visit https://bugzilla.mozilla.org/show_bug.cgi?id=1464154 20. Type "example" in the urlbar and verify that nothing is autofilled 21. Type "bugzilla" in the urlbar and verify that nothing is autofilled 22. Visit https://bugzilla.mozilla.org/show_bug.cgi?id=1467631 23. Type "example" in the urlbar and verify that nothing is autofilled 24. Start typing "bugzilla" in the urlbar and verify that bugzilla.mozilla.org is autofilled
Please see STR for beta 62 in the previous comment (comment 42)
Flags: needinfo?(camelia.badau)
I've tested on Windows 7 x64, macos 10.13 and Ubuntu 16.04 x64 using the builds from comment 42 and I have the following results: - on Windows 7 x64 and macos 10.13, everything looks fine according to steps from comment 42 - on Ubuntu 16.04 x64, at steps 14, 15, 20 and 21 from comment 42, example.com and bugzilla.mozilla.org are autofilled when typing "example" or "bugzilla" in the urlbar - it's not according to comment 42
Flags: needinfo?(camelia.badau) → needinfo?(adw)
Thanks Camelia. About the Ubuntu behavior -- I want to make sure that you're seeing those results consistently, and that they don't change over the course of trying these STR several times. Could you please try the STR three or four times on Ubuntu and let me know whether you get the same results each time? Please be sure to let each visit fully load before continuing on with the next step. I'm not sure why Ubuntu would be different. I'll try to reproduce this myself, but it will take a little while since I don't have an install handy. My hunch is that it's not important; what's important is that results are consistent within each system tested, so the fact that both example.com and bugzilla are autofilled consistently at those steps shows that this is probably working OK. But still I'd like to find an explanation.
Flags: needinfo?(adw) → needinfo?(camelia.badau)
The STR actually work correctly for me on Ubuntu 16.04, and the underlying numbers look OK. I tried five or six times. Would you mind trying again a few times?
Blocks: 1475398
I've retested on another 2 machines with Ubuntu 16.04 x64 and Ubuntu 18.04 x64 and it seems that now everything it's OK, according with the STR from comment 42. I've tried several times and it works correctly.
Flags: needinfo?(camelia.badau)
Comment on attachment 8990136 [details] [diff] [review] Beta/62 patch This is a somewhat risky change, but it seems necessary to make sure we ship a quality improvement to the awesomebar. QA has done some extra testing/verification in beta try builds. This should land first, then the work in bug 1469651.
Attachment #8990136 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
Depends on: 1476674
I've tested on Windows 7 x64, macos 10.13 and Ubuntu 16.04 x64 using Firefox 62 Beta 10 and STR from comment 42 and everything looks fine according to steps from this comment. I saw that Firefox 63 is marked as verified fixed: I've tried to verify this issue using Nightly 63.0a1: 2018-07-01 (build which doesn't contain the fix) and latest Nightly 63.0a1: 2018-07-19 (build which should contain the fix) and I saw that there is a different behaviour, it doesn't work according to steps from comment 42 - it is the expected behaviour? Is the fix just for Beta builds and not for Nightly builds?
Flags: needinfo?(adw)
We landed bug 1474755 in the meantime and it changed the autofill calculation. I can still reproduce the STR using an older Nightly that has the fix for this bug but not the fix for bug 1474755, for example: http://ftp.mozilla.org/pub/firefox/nightly/2018/07/2018-07-06-10-02-10-mozilla-central/ With the current Nightly, the STR are a little different (the number of URLs you have to visit is different): 1. Start a new profile in build1 2. Visit example.com 3. Visit example.com again 4. Visit https://bugzilla.mozilla.org/show_bug.cgi?id=1239708 5. Visit https://bugzilla.mozilla.org/show_bug.cgi?id=1467627 6. Visit https://bugzilla.mozilla.org/show_bug.cgi?id=1469651 7. Visit https://bugzilla.mozilla.org/show_bug.cgi?id=1469652 8. Visit https://bugzilla.mozilla.org/show_bug.cgi?id=1469653 9. Visit https://bugzilla.mozilla.org/show_bug.cgi?id=1469654 10. Visit https://bugzilla.mozilla.org/show_bug.cgi?id=1469655 11. Visit https://bugzilla.mozilla.org/show_bug.cgi?id=1469656 12. Visit https://bugzilla.mozilla.org/show_bug.cgi?id=1469657 13. Start typing "example" in the urlbar and verify that it's autofilled to example.com 14. Type "bugzilla" in the urlbar and verify that nothing is autofilled 15. Quit build1, restart the same profile in build2 16. Open about:config and wait until "places.database.migrateV52OriginFrecencies" does not exist (it may not exist at all when you first check, and in that case continue on to the next step) 17. Type "example" in the urlbar and verify that nothing is autofilled 18. Start typing "bugzilla" in the urlbar and verify that bugzilla.mozilla.org is autofilled 19. Visit example.com 20. Type "example" in the urlbar and verify that nothing is autofilled 21. Start typing "bugzilla" in the urlbar and verify that bugzilla.mozilla.org is autofilled 22. Visit example.com 23. Start typing "example" in the urlbar and verify that it's autofilled to example.com 24. Start typing "bugzilla" in the urlbar and verify that bugzilla.mozilla.org is autofilled
Flags: needinfo?(adw)
Thank you, Drew! I've tested on Windows 7 x64, Ubuntu 16.04 x64 and macOS 10.13 using build1: Firefox 62 Beta 7 (comment 42) and build2: latest Nightly 63.0a1 (2018-07-23) and everything looks fine according to STR from comment 51.
Marking VERIFIED FIXED based on comment 50 and comment 52.
Status: RESOLVED → VERIFIED
Flags: qe-verify+
Depends on: 1494471
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: