1239708 - Improve the autofill decisions algorithms

Reporter

Description

•

9 years ago

In bug 769348 we changed the algorithm that figures out the best schema for a host; we decided to go with "use a prefix if all typed pages for that host use that prefix". That came after a long discussion in bug 769994, where users needed a way to override our overly aggressive algorithm, which was forcing a prefix even if just one entry had that prefix.

Let's take an example, using the host "mozilla.org". Before bug 769348 it was enough for just one typed page with host "mozilla.org" to have the "https" schema for us to suggest an https autofill entry. That was nice from a security point of view, but it opened a can of worms (bug 769994), mostly due to https pages returning errors, broken certificates, and other situations we cannot detect in Places. So it was decided in bug 769348 to suggest a given prefix for "mozilla.org" only if all typed pages have that prefix.

The second solution offers a way out to users who can't always use https on a host. On the other hand, it makes it extremely easy to lose the secure prefix just by typing the non-secure host once. "Typing" in reality includes various user actions:
- Opening a tab from "another device" (Sync)
- middle-mouse paste
- typing a URL
- loading one (or multiple) pages from any history view

On second thought, I think both decisions were pretty extreme; what matters is that we should somehow be able to adapt to user/page changes over time.

I think we have 2 problems here.

The first problem is that we are abusing the typed flag, and that pollutes data. It's not even a real typed flag anymore, since clicking on any history view sets it, plus we don't set it for loads from the bookmarks views, which are likely even more important? Due to this pollution, since autofill is based on typed being set, we autofill a page loaded once from history, but not a bookmarked page :( This is sub-optimal.
We could try to restore the typed flag to its original intent, remove browser.urlbar.autoFill.typed (it's the default), and change the algorithm that decides what to autofill. We decided to autofill only typed domains in bug 720258, since otherwise we were autofilling pages that the user had visited just once... Using typed looked like a simple solution, but it's again sub-optimal. A better decision may be based on the frecency of the host: we could autofill only if it's over a given threshold (TBD). This would likely also bring some perf wins, since keeping typed in sync has a cost.

The second problem is that "use a schema only if all typed pages have it" does not adapt over time: a single wrong visit is enough to break the profile forever. Again, I think a better solution would be to base the decision on frecency; for example, we could say "use a schema only if all of the 3 most frecent pages have it".

Both algorithms can be tweaked by changing the frecency threshold and the number of frecent pages.

The work to implement these can be split into 3 to 4 bugs. It's not extremely complex, but it involves changes to the awesomebar behavior. IMO they will end up being improvements, since we will be able to better adapt to each single user, but it's still touching the most common point of interaction and autofill behavior.

I'd like to get some thoughts on these suggested changes. Plus, I'd like to know from managers whether it's worth my spending some time trying to implement them; these affect awesomebar quality.
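To compare the three prefix policies discussed above, here is a minimal Python sketch. It is only an illustration under stated assumptions, not the Places implementation: the `Page` record, the function names, the default threshold of 1000, and the choice to treat a host's frecency as the maximum frecency of its pages are all hypothetical. The description only says the threshold is TBD and suggests 3 as the number of frecent pages.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Page:
    """Hypothetical, simplified history entry for a single host."""
    prefix: str    # e.g. "https://" or "http://"
    frecency: int  # higher means visited more frequently/recently
    typed: bool    # the (currently overloaded) typed flag


def prefix_if_any_typed(pages: List[Page], prefix: str = "https://") -> Optional[str]:
    """Policy before bug 769348: one typed page with the prefix is enough."""
    return prefix if any(p.typed and p.prefix == prefix for p in pages) else None


def prefix_if_all_typed(pages: List[Page], prefix: str = "https://") -> Optional[str]:
    """Policy after bug 769348: every typed page must have the prefix."""
    typed = [p for p in pages if p.typed]
    return prefix if typed and all(p.prefix == prefix for p in typed) else None


def prefix_if_top_frecent(pages: List[Page], prefix: str = "https://",
                          top_n: int = 3) -> Optional[str]:
    """Proposed policy: the top_n most frecent pages must all have the prefix."""
    top = sorted(pages, key=lambda p: p.frecency, reverse=True)[:top_n]
    return prefix if top and all(p.prefix == prefix for p in top) else None


def should_autofill_host(pages: List[Page], threshold: int = 1000) -> bool:
    """Proposed policy: autofill a host only above a frecency threshold.

    Assumption: host frecency is the max page frecency; the threshold
    value is a placeholder, since the real one is still TBD.
    """
    return bool(pages) and max(p.frecency for p in pages) >= threshold


# Example profile for one host: mostly https, plus one stray typed http visit.
pages = [
    Page("https://", 2000, True),
    Page("https://", 1500, False),
    Page("https://", 900, True),
    Page("http://", 100, True),
]
```

With this sample profile, the "any typed" policy suggests https, the "all typed" policy drops it because of the single low-frecency http visit, and the frecency-based policy keeps https because the stray visit is not among the 3 most frecent pages.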

Flags: needinfo?(past)

Flags: needinfo?(paolo.mozmail)

Flags: needinfo?(dolske)

Flags: needinfo?(adw)

Marco Bonardo [:mak]

Reporter

Updated

•

9 years ago

Priority: -- → P2

Justin Dolske [:Dolske]

Comment 1

•

9 years ago

I don't have a strong opinion here. Sounds like a good improvement?

Flags: needinfo?(dolske)

:Paolo Amadini

Comment 2

•

9 years ago

I agree that we should remove the variable of whether a URI was typed using the keyboard or not in our frecency calculations. This may even simplify our code? We may obviously still want to exclude from the calculations (or even from the recording of the visit) some loads without direct interaction, like pinned tabs and tabs restored by session restore, or maybe consider them differently. Selecting the schema based on frecency is also better. We should ensure that the adaptive algorithm still works for schema changes, in other words if I'm suggested HTTP but I manually select an HTTPS base domain, results should adapt for next time. I think this may occur naturally anyways without significant code changes.

Flags: needinfo?(paolo.mozmail)

Marco Bonardo [:mak]

Reporter

Comment 3

•

9 years ago

(In reply to :Paolo Amadini from comment #2)
> We may obviously still want to exclude from the calculations (or even from
> the recording of the visit) some loads without direct interaction, like
> pinned tabs and tabs restored by session restore, or maybe consider them
> differently.

Currently we never register visits to pages restored with a session. That is a bug in itself (bug 613126): if you use a page every day but it's always restored, it will have a very low frecency. It is sort of off-topic here, but the problem exists.

> Selecting the schema based on frecency is also better. We should ensure that
> the adaptive algorithm still works for schema changes, in other words if I'm
> suggested HTTP but I manually select an HTTPS base domain, results should
> adapt for next time. I think this may occur naturally anyways without
> significant code changes.

By using frecency it would happen naturally, but it may take some days for the new prefix to replace the old one. It would still be an improvement over the current situation, where it may take one year, and typing the wrong prefix even just once restarts the "timer". We could then build improvements on top of a better situation.

Panos Astithas (he/him) [:past] (please ni?)

Comment 4

•

9 years ago

I agree that this seems like a good improvement. I'm less certain about the relative priority against other awesomebar work, but I think this should be part of the Places bug list that we talked about this week.

Flags: needinfo?(past)

Marco Bonardo [:mak]

Reporter

Comment 5

•

9 years ago

One of the reasons I filed this is that, due to the regression in bug 1234186, we have basically polluted users' autofill data for the next year.

Panos Astithas (he/him) [:past] (please ni?)

Comment 6

•

9 years ago

Yes, that one is quite nasty. This one certainly seems important too; I just don't know off the top of my head what else we've got.

Marco Bonardo [:mak]

Reporter

Updated

•

9 years ago

Whiteboard: [fxsearch][unifiedcomplete]

Bug 1239708: Improve awesomebar autofill. Part 0: Core changes. 7 years ago Drew Willcoxon :adw (deleted), text/x-review-board-request	mak : review+	Details
Bug 1239708: Improve awesomebar autofill. Part 1: Core follow-ons. 7 years ago Drew Willcoxon :adw (deleted), text/x-review-board-request	mak : review+	Details
Bug 1239708: Improve awesomebar autofill. Part 2: Non-core follow-ons. 7 years ago Drew Willcoxon :adw (deleted), text/x-review-board-request	mak : review+	Details
Bug 1239708: Improve awesomebar autofill. Part 3: Front-end changes. 7 years ago Drew Willcoxon :adw (deleted), text/x-review-board-request	mak : review+	Details
Bug 1239708: Improve awesomebar autofill. Part 4: Frecency stats. 7 years ago Drew Willcoxon :adw (deleted), text/x-review-board-request	mak : review+	Details
Bug 1239708: Improve awesomebar autofill. Part 5: xpcshell tests. 7 years ago Drew Willcoxon :adw (deleted), text/x-review-board-request	mak : review+	Details
Bug 1239708: Improve awesomebar autofill. Part 6: Browser tests. 7 years ago Drew Willcoxon :adw (deleted), text/x-review-board-request	mak : review+	Details