Open Bug 382415 (autocompleteFrecency) Opened 17 years ago Updated 2 years ago

Popularity index of recipient autocomplete doesn't honor timeline (implement frecency algorithm for searching email contacts: weigh how frequently AND recently-used contacts are)

Categories

(MailNews Core :: Address Book, defect)

defect
Not set
major

Tracking

(Not tracked)

People

(Reporter: whimboo, Unassigned)

References

(Depends on 1 open bug, Blocks 5 open bugs, )

Details

(Keywords: helpwanted)

Initially mentioned in bug 279640. Since we have a popularity index for each address book card, we should also honor the timeline. If you have a very active conversation with someone for months, he will have a high popularity index. Under some circumstances you stop writing, or reduce the number of messages to this person. What happens with the popularity index? It still stays at this high rank. We should lower it if you haven't written a message for a longer period of time. We could add an additional element which stores the date of the last message sent to this user, and with a special timeline function we could lower the popularity index over time.
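A minimal sketch of the "lower it over time" idea, assuming each card gains one extra field holding the date of the last message sent to it (as suggested above), and that the stored count is weakened with exponential decay only when results are ranked. The function name and the 90-day half-life are hypothetical choices for illustration, not anything Thunderbird provides.

```python
from datetime import datetime, timedelta


def decayed_popularity(popularity_index: int, last_sent: datetime,
                       now: datetime, half_life_days: float = 90.0) -> float:
    """Return the card's popularity count weakened by the time since the last send.

    The effective score halves for every `half_life_days` (a hypothetical tuning
    constant) that pass without a new message; the stored count is left untouched.
    """
    age_days = max((now - last_sent).total_seconds() / 86400.0, 0.0)
    return popularity_index * 0.5 ** (age_days / half_life_days)


now = datetime(2014, 9, 1)
old_friend = decayed_popularity(1000, now - timedelta(days=3 * 365), now)  # busy 3 years ago
new_friend = decayed_popularity(20, now - timedelta(days=1), now)          # written to yesterday
print(old_friend < new_friend)  # True
```

Because the decay is applied at ranking time, the raw count never has to be rewritten; the cost is one stored date per card and a half-life that would need tuning.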
Summary: Popularity index of autocomplete doesn't honor timeline → Popularity index of autocomplete doesn't honor timeline (use frecency for email contacts)
This is probably going to be useful for the improved search with autocomplete, so nominating for wanted-thunderbird3. Bryan said on IRC: "cool. we'll need to use some kind of decay algorithm, but that also seems pretty advanced. in my understanding it would need to be similar to a logarithmic analysis in order to decay naturally."
Flags: wanted-thunderbird3?
Product: Core → MailNews Core
Flags: wanted-thunderbird3? → wanted-thunderbird3+
It would be interesting to look at the awesomebar algorithm to see if there are interesting tips and tricks that we can glean from it.
Am CC:ing on this so someone from U2U support is in the loop for questions that surface about the frequency index.
I think the important part of the awesome bar algorithm is the promotion of correct selections. They have a weighting algorithm for recent history vs. bookmark items that is supposed to float the right one to the top. That kind of weighting algorithm is always hard to get right, if there is a "right". What I think works best for the awesome bar, and would work equally well for our system, is promoting the correct hits. The awesome bar saves everything you've typed into it such that the half-written words are used as indexes to complete URLs. E.g. you type "slash" and then complete on "http://slashdot.org/". This key/value combo is saved such that the next time you type "slash" the first result is what you last completed on. For emails, where older addresses are less relevant or even broken, I think a similar system would work out well. E.g. you type "clark" and complete on "clarkbw@example.com". Now the phrase "clark" is saved as the key for that email address. But at some point the email address changes. You now type "clark" and instead complete on "bclark@example.com". We should now enter a new key/value combo for building that frequent use. To get the recent part of our "frecency", we just need to look at only the last 3 or so key/value combos when deciding which address goes at the top. After the recent set of key/value combos, we could append the frequent set below.
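The typed-string/completion scheme proposed above can be sketched as follows. This models the proposal only (the next comment doubts that Firefox really works this way); `InputHistory`, `record`, `suggest` and `recent_n` are hypothetical names, and deduplicating the recent picks is an assumption.

```python
from collections import Counter, defaultdict


class InputHistory:
    """Remembers which address the user picked after typing a given string.

    Hypothetical sketch: the last `recent_n` picks for a typed key are listed
    first (newest first, deduplicated), followed by the other addresses ever
    picked for that key, ordered by how often they were picked.
    """

    def __init__(self, recent_n: int = 3):
        self.recent_n = recent_n
        self.picks: dict[str, list[str]] = defaultdict(list)

    def record(self, typed: str, address: str) -> None:
        """Store one (typed string -> completed address) pair."""
        self.picks[typed.lower()].append(address)

    def suggest(self, typed: str) -> list[str]:
        history = self.picks.get(typed.lower(), [])
        # "Recent" part: only the last few key/value combos, newest first.
        recent: list[str] = []
        for address in reversed(history[-self.recent_n:]):
            if address not in recent:
                recent.append(address)
        # "Frequent" part: everything else, most frequently picked first.
        counts = Counter(history)
        frequent = [a for a, _ in counts.most_common() if a not in recent]
        return recent + frequent


h = InputHistory()
for _ in range(5):
    h.record("clark", "clarkbw@example.com")
h.record("clark", "bclark@example.com")  # the address changed
print(h.suggest("clark"))  # ['bclark@example.com', 'clarkbw@example.com']
```

After a single pick of the new address, it already outranks the old one for the key "clark", even though the old one was picked five times.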
(In reply to comment #4) > The awesome bar saves everything > you've typed into it such that the half written words are used as indexes to > complete URLs. I don't think that's actually true. What they do is that any URL you recently completed to gets a higher frecency value, so that it will more likely be at the top of any fitting search in the future. They don't save any incomplete input you type before completing to a URL, though, AFAIK. Or do you know the details of the code to be sure of that?
Actually, I'll consider this a bug, because in scenarios like comment 0, the popularityIndex feature just becomes plain useless without "frecency" (and users do care about this a lot, as seen from many bug reports where they feel TB does not offer the right thing at the top of the list).
Design problems of the current popularityIndex:
- popularityIndex only ever goes up, ad infinitum (and will probably break at some point if you try hard enough) [1]
- it does not involve recency (so an age-old card which you haven't used for the last 3 years will still appear at the top if it was popular in the past, whilst you'll never see the new card topmost until the new card's count exceeds the old card's count)
- if you have multiple cards with the same email address, the first card found (somewhat randomly) gets bumped [2]; restructure your address books or move cards around, and it will probably get messed up
- there is no way to manually control or reset popularity values
- it does not correlate the actual strings of user input with the results the user selects for that string; e.g. when you press m often and recently enough and always pick "mozilla" from the dropdown, FF will learn that and offer "mozilla" at the top (that seems to be the resulting behaviour of the FF awesome bar, not sure how they realize it technically).
Severity: enhancement → major
Whiteboard: [patchlove]
Depends on: 325458
Ftr, bug 543114 mentions that read-only or external ABs (like the Mac AB) do not even use the current popularityIndex.
Adding relevant searchwords to summary.
Summary: Popularity index of autocomplete doesn't honor timeline (use frecency for email contacts) → Popularity index of recipient autocomplete doesn't honor timeline (implement frecency algorithm for searching email contacts: weigh how frequently AND recently-used contacts are)
Blocks: 972690
Alias: autocompleteFrecency
Bug 497722 has worked in the same area and might thus provide a good starting point in code.
(In reply to Thomas D. from comment #14) > Bug 497722 has worked in the same area and might thus provide a good > starting point in code.
Blocks: 497722
Here's my suggestion:
1. The edit box in which the user enters an email (or nick) should be replaced by a combo box, like the address bar in Firefox. Multiple suggestions maximize the chance of guessing what the user wanted.
2. The top item offered in the combo box should not be based on the count. It should be the one where the NICK starts with the typed string.
For example, let's say I have these friends:
CRAZY JOHNNY (johnny@nasa.com) // sent to this address 5 times
JOHN (john@example.com) // I sent to this address 170 times
JENNY (johanna@playboy.com) // only once, long time ago, she never replied
LINDA (linda@johnnylogan.com) // over 1000 times, most frequent item
JOHN (john@mozilla_tb.com) // only 20 times, but it's the most recent item
Now, let's say I enter the string "JOH" in the email box. This is how the suggestions in the combo box should be sorted:
JOHN (john@mozilla_tb.com) // because the nick starts with "JOH" and is most recent among them
JOHN (john@example.com) // because the nick starts with "JOH", though it's not recent
CRAZY JOHNNY (johnny@nasa.com) // because the nick contains "JOH", but doesn't start with it
LINDA (linda@johnnylogan.com) // because the email contains "JOH" and it's most recent among them
JENNY (johanna@playboy.com) // because the email contains "JOH"
So I think the count doesn't matter. Only the date and string position, like so:
a) Primary items are those where the NICK starts with the string. Multiple results sorted by recency.
b) Second come those where the NICK contains the string. Multiple hits sorted by recency.
c) Then cases where the EMAIL starts with the string (sorted by most recent hits).
d) Then cases where the EMAIL contains the string anywhere (sorted by recency).
Why? Because if I want Linda I will type "Linda", not "Johnny". OK, maybe I forgot her nick. Then I may type "johnny" and pick her email from the list of suggestions (it won't be on top).
Also, it enables me to control the search algorithm. For example, to push somebody lower in the list I will change his nick from "John" to "Obsolete John". So when I start typing "Obso" it will list all the obsolete items.
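Rules a) to d) can be sketched as a two-part sort key: match tier first, then most recent use. This is a hypothetical illustration; `Contact`, `match_tier`, `rank` and the `last_used` dates are made up (the proposal stores one "last sent" date per address), and the field the comment calls NICK is modelled as a plain name string.

```python
from dataclasses import dataclass
from datetime import date
from typing import Optional


@dataclass
class Contact:
    name: str        # the field the comment calls "NICK"
    email: str
    last_used: date  # hypothetical per-address "last sent" date


def match_tier(contact: Contact, query: str) -> Optional[int]:
    """Map rules a) to d) onto tiers 0..3, or None if there is no match."""
    q = query.lower()
    name, email = contact.name.lower(), contact.email.lower()
    if name.startswith(q):
        return 0
    if q in name:
        return 1
    if email.startswith(q):
        return 2
    if q in email:
        return 3
    return None


def rank(contacts: list[Contact], query: str) -> list[Contact]:
    """Order matches by tier, then most recently used first; send count is ignored."""
    tiered = [(match_tier(c, query), c) for c in contacts]
    hits = [(tier, c) for tier, c in tiered if tier is not None]
    hits.sort(key=lambda tc: (tc[0], -tc[1].last_used.toordinal()))
    return [c for _, c in hits]


book = [
    Contact("CRAZY JOHNNY", "johnny@nasa.com", date(2014, 3, 1)),
    Contact("JOHN", "john@example.com", date(2013, 1, 1)),
    Contact("JENNY", "johanna@playboy.com", date(2009, 5, 1)),
    Contact("LINDA", "linda@johnnylogan.com", date(2014, 7, 1)),
    Contact("JOHN", "john@mozilla_tb.com", date(2014, 8, 1)),
]
print([c.email for c in rank(book, "JOH")])
# ['john@mozilla_tb.com', 'john@example.com', 'johnny@nasa.com',
#  'johanna@playboy.com', 'linda@johnnylogan.com']
```

Applied literally, rule c) puts JENNY (whose address starts with "joh") above LINDA (whose address merely contains it), which differs from the order given in the example list.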
Thomas, FWIW, TB is behaving reasonably for contact selection these days, and has been for about the last year. At least, as far as I'm concerned.
The algorithm I described above was made to reduce the number of bytes per contact (only 1 extra date per email). It could be improved if we allow say 5 dates per email. In this case every time you send a message to someone, the date-time of sending is stored next to the contact. But since there are only 5 last dates stored, that means the oldest date is removed and replaced with the current one. At the time the date is added, an average of all 5 dates is computed. And that average value of dates is used to determine which contact should be at the top. Meaning, if you're frequently sending emails to someone, this running average of date-times will be close to now. This algorithm would take 6 floats per email (5 dates, plus 1 precomputed average date) and it should give better results, since it's a compromise between last used item and frequency of use.
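A sketch of the five-dates-per-address variant described above, assuming POSIX timestamps as the stored floats and that an address with fewer than five sends is averaged over the sends it has (the comment does not say). `RecentSends` and `by_running_average` are hypothetical names.

```python
from collections import deque
from datetime import datetime, timedelta, timezone


class RecentSends:
    """Last `slots` send times for one address plus their precomputed mean."""

    def __init__(self, slots: int = 5):
        self.times: deque = deque(maxlen=slots)  # oldest entry drops out automatically
        self.mean = 0.0

    def record_send(self, when: datetime) -> None:
        self.times.append(when.timestamp())
        self.mean = sum(self.times) / len(self.times)  # recomputed on every send


def by_running_average(contacts: dict[str, RecentSends]) -> list[str]:
    """Addresses whose average send time is closest to now come first."""
    return sorted(contacts, key=lambda addr: contacts[addr].mean, reverse=True)


now = datetime(2014, 9, 1, tzinfo=timezone.utc)
frequent, lapsed = RecentSends(), RecentSends()
for days_ago in (5, 4, 3, 2, 1):          # written to every day this week
    frequent.record_send(now - timedelta(days=days_ago))
for days_ago in (365, 364, 363, 362, 1):  # busy a year ago, once yesterday
    lapsed.record_send(now - timedelta(days=days_ago))
print(by_running_average({"lapsed@example.com": lapsed,
                          "frequent@example.com": frequent}))
# ['frequent@example.com', 'lapsed@example.com']
```

The lapsed address was used just as recently as the frequent one, yet its average lies about 291 days back versus 3, which is the compromise between "last used" and "frequency of use" the comment aims for.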
(In reply to Zex from comment #16) > JOHN (john@example.com) // I sent to this address 170 times Zex, I think your suggestions have the right intentions and they might even point in the right direction (but I don't have time to check details now). However, pls stop adding confusion about NICKS: a NICK is defined in a separate field on each AB card, with this explicit field name: Nickname. Values of First Name, Last Name, and Display Name are NEVER NICKS. Nickname is an ALIAS name (unique shortcut) which is supposed to be replaced directly with the actual contact, generally in a 1-to-1 relationship, bypassing frecency. This is broken in TB (Bug 325458), but can be worked around with truly unique nicks like "#j01". All the names in your examples of comment 16 look very much like Display Names (or First/Last Names), not like values of the Nickname field. Please use correct terminology to avoid confusion, and specify correct field names with their values.
Ah, I didn't know that nickname has that exact purpose. Bug 325458 should definitely be fixed first. If nickname search worked as it should then we probably wouldn't even talk about email address frequency.
See new Bug 1058583, "Address Book Popularity Index needs to age", and a proposed, simple algorithm there. ====== I think this bug 382415 mis-states the problem, and the restatement could simplify the solution path. The perspective of bug 382415 is that the address autocomplete needs to possess and account for time sensitive data within an address book. A solution to such a denominated problem requires knowledge of when addresses are used, and requires changes to the AB database. The programming for the solution path is complicated. A programmed solution will not be very helpful when someone moves/copies address books. The alternative bug, a restatement at Bug 1058583, focuses on a narrower bug and what I believe is the actual problem -- that the address book does not age. In the absence of aging, a sort of search responses, arising from fixes like Bug 529584, can skew sorts away from commonly and recently used addresses. C#92 in Bug 529584, by Thomas D., and his item #3 in that comment, triggered my thinking on this.
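The concrete algorithm proposed in Bug 1058583 is not reproduced in this thread. As a purely hypothetical illustration of "aging" that needs no per-card dates, a periodic pass (run e.g. monthly; how it is scheduled is left open here) could scale every popularityIndex down by a constant factor. `age_popularity` and `factor` are made-up names.

```python
def age_popularity(indices: dict[str, int], factor: float = 0.9) -> dict[str, int]:
    """One aging pass: scale every card's popularity count by `factor`, rounding down.

    Counts that stop growing shrink geometrically, while cards still in use are
    topped up again by new sends between passes.
    """
    return {card: int(count * factor) for card, count in indices.items()}


# Simulate two years of monthly passes: one old, idle card and one in active use.
book = {"old@example.com": 1000, "new@example.com": 0}
for month in range(24):
    book["new@example.com"] += 20  # 20 sends this month
    book = age_popularity(book)
print(book["new@example.com"] > book["old@example.com"])  # True
```

With integer rounding, small counts lose a whole point per pass (a count of 1 drops straight to 0); keeping the value as a float would avoid that.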
(In reply to john ruskin from comment #22) > See new Bug 1058583, "Address Book Popularity Index needs to age", and a > proposed, simple algorithm there. > John, thank you, I'm curious to look at that. > > C#92 in Bug 529584, by Thomas D., and his item #3 in that comment, triggered > my thinking on this. To allow Bugzilla's auto-linkification to do the trick for comments, use this syntax: From Bug 529584 Comment 92: > 3) For those irritated by different result sets with seemingly different > order, per comment 91, we're exploring ways of improving the order of > autocomplete results in bug 970456. As I explained there, under the > assumption that we do NOT want to break popularityIndex which at least > somewhat helps the most frequently used contacts to be toplisted in > autocomplete results, imo that bug can only satisfy a very small number of > scenarios/contacts (if at all), namely those where popularityIndex is either > not yet set (virgin AB) or high AND identical with other cards. For most > other cases, that bug can't and won't do anything. > > 4) The real culprit of the surprising behaviour seen by users like Matt > (comment 89, comment 90) is not the twin bugs, but age-old design > deficiencies of the current result sorting algorithm: current results are > first ordered by popularityIndex, a dull and ever-increasing counter of > frequency. Write 1000 emails to "Angelina" in 1999 and none ever after, then > 500 emails to "Angus Miller" in 2014, and Thunderbird will stubbornly keep > pushing your old friend "Angelina" to the top all the time: That's Bug > 382415, we need a "frequency + recency" algorithm correlated with your > personal search input and which result you pick from there, instead of just > card-based frequency. Perhaps some ideas of bug 970456 can somehow go into > the weighting/scoring of the frecency algorithm. 
Assuming that you've *recently* > communicated with "Angus Miller" more frequently than with "Angelina > Joyful", and you always pick "Angus Miller" after typing "Ang", TB should > automatically adapt to that use pattern just like FF awesome bar > successfully does for URLs (I never type more than 2 or 3 letters to find my > favourite URLs from thousands in the history).
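To make the "Angelina" vs. "Angus Miller" example concrete: a hypothetical exponentially decayed counter is one common way to get "frequency + recency" from just two stored numbers per card. It is not what Thunderbird or Firefox implement, and it does not cover the correlation with typed input asked for above. `HALF_LIFE_DAYS` and `FrecencyCounter` are illustrative names, and days are counted from an arbitrary epoch.

```python
from dataclasses import dataclass

HALF_LIFE_DAYS = 180.0  # hypothetical: a send's weight halves every ~6 months


@dataclass
class FrecencyCounter:
    """Exponentially decayed send counter: every send adds 1, then fades over time."""
    score: float = 0.0
    last_day: float = 0.0  # day number of the last update

    def value_at(self, day: float) -> float:
        """Score as seen on `day`, decayed since the last update."""
        return self.score * 0.5 ** ((day - self.last_day) / HALF_LIFE_DAYS)

    def bump(self, day: float) -> None:
        """Record one send on `day`: decay the old score, then add 1."""
        self.score = self.value_at(day) + 1.0
        self.last_day = day


angelina, angus = FrecencyCounter(), FrecencyCounter()
for i in range(1000):                    # 1000 emails spread over 1999
    angelina.bump(i * 365 / 1000)
for i in range(500):                     # 500 emails spread over 2014
    angus.bump(15 * 365 + i * 365 / 500)
today = 16 * 365                         # end of 2014
print(angus.value_at(today) > angelina.value_at(today))  # True, despite 500 < 1000 sends
```

A plain counter ranks Angelina first (1000 vs. 500); with decay, fifteen years of silence shrink her score to practically nothing while Angus keeps a score in the hundreds.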
(I wasn't sure if the syntax would pick up the -current- comment , or the comment for the referenced bug...!)
Depends on: 1058583
Blocks: 594401
I write to some mail partners every day. A long time ago, when the search feature was OK, the correct address appeared as soon as I typed the first letter(s). But NOW, the correct address doesn't appear at all, not even anywhere in the selectable menu options anymore!
Blocks: 1133269
Blocks: 1114751
Blocks: 1134986
I just want to know, at what point this stopped working properly and why? For a few months Thunderbird completion has been giving me random addresses. I mean it prefers people to whom I may have sent a few emails 3 years ago rather than people to whom I DO send emails daily for years.
(In reply to SuD from comment #27) > I just want to know, at what point this stopped working properly and why? "This" can't have stopped working because it is not yet implemented. But yes, there have been issues with the autocomplete result sorting algorithm. I believe we've fixed the worst parts of the problem; pls expect improvements in TB 38, which is around the corner. Have you tried defining unique nicknames for your favorite addresses in the Nickname field? Display Name: John Doe, Nickname: "jd#". Then to add John Doe to recipients, just type jd# and press Enter. This works 100% of the time if the nick is unique enough, which is pretty much ensured by using the # sign, but you need to avoid creating duplicate nicks yourself. > For a few months Thunderbird completion has been giving me random addresses. > I mean it prefers people to whom I may have sent a few emails 3 years ago > rather than people to whom I DO send emails daily for years. The causes for this are mostly outside this bug, but this bug should also help avoid such undesired phenomena.
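The nickname workaround relies on an exact, unique match short-circuiting the ranked search. A hypothetical sketch of that lookup order (Thunderbird's actual nickname handling is tracked in Bug 325458 and is not shown here; `Card`, `resolve` and `ranked_search` are made-up names):

```python
from dataclasses import dataclass
from typing import Callable, Optional


@dataclass
class Card:
    display_name: str
    email: str
    nickname: Optional[str] = None


def resolve(query: str, cards: list[Card],
            ranked_search: Callable[[str], list[Card]]) -> list[Card]:
    """Return the single card whose Nickname equals `query`, else fall back.

    A unique, exact nickname hit bypasses any popularity/frecency ranking;
    duplicate nicks (or no hit) fall through to the normal ranked search.
    """
    q = query.strip().lower()
    hits = [c for c in cards if c.nickname and c.nickname.lower() == q]
    if len(hits) == 1:
        return hits
    return ranked_search(query)


cards = [Card("John Doe", "john.doe@example.com", "jd#"),
         Card("Jane Doe", "jane@example.com")]


def fallback(q: str) -> list[Card]:
    """Stand-in for the regular autocomplete search."""
    return [c for c in cards if q.lower() in c.display_name.lower()]


print(resolve("jd#", cards, fallback)[0].email)  # john.doe@example.com
```

This also shows why duplicate nicks defeat the trick: two matching cards make `hits` longer than one, and the query falls back to the ranked search.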
Blocks: 546737
Blocks: 1067681
(In reply to SuD from comment #27) > I just want to know, at what point this stopped working properly and why? > > For a few months Thunderbird completion has been giving me random addresses. > I mean it prefers people to whom I may have sent a few emails 3 years ago > rather than people to whom I DO send emails daily for years. Thunderbird stopped properly making use of the popularity after 31.3.0. I switched back to that version and have been happy since.
(In reply to St Heine from comment #29) > (In reply to SuD from comment #27) > > I just want to know, at what point this stopped working properly and why? > > > > For a few months Thunderbird completion has been giving me random addresses. > > I mean it prefers people to whom I may have sent a few emails 3 years ago > > rather than people to whom I DO send emails daily for years. > > Thunderbird stopped properly making use of the popularity after 31.3.0. I > switched back to that version and have been happy since. Switching back to outdated versions is not a sustainable solution. Instead, pls comment on an existing bug, or file a new bug if nobody has filed it yet. In this case, there's bug 1134986 (fixed for TB 38) and another bug iirc. We've tried to fix the biggest part of this problem for TB 38 in bug 1134986. Starting from TB 38 (!), if your search word matches the beginning of word-like parts of "John Doe (Johnny :jo) <john.doe@asdf.com>", popularity will be used correctly. So it will work for search words like John, Doe, Johnny, jo (or shorter versions of these search words). Only for inner matches (like "ohn", "oe", "ohnny") will popularity be ignored. I've registered my doubts about that, but it might be a reasonable balance until we implement a more intelligent frecency algorithm. There's a chance that until recently we didn't even find the results for which popularity will still be ignored, so it should work as it used to for most users in TB 38.
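A rough sketch of the TB 38 behaviour described above, for illustration only: this is not the code from bug 1134986. Splitting labels into runs of letters/digits and placing inner-only matches after the word-start matches (in their original order) are assumptions; `Result`, `word_parts` and `order_results` are hypothetical names.

```python
import re
from dataclasses import dataclass


@dataclass
class Result:
    label: str       # e.g. 'John Doe (Johnny :jo) <john.doe@asdf.com>'
    popularity: int


def word_parts(label: str) -> list[str]:
    """Split a label into word-like parts: runs of letters/digits."""
    return re.findall(r"[^\W_]+", label.lower())


def order_results(query: str, results: list[Result]) -> list[Result]:
    """Popularity-sort word-start matches; keep inner-only matches unsorted, after them."""
    q = query.lower()
    word_start, inner = [], []
    for r in results:
        if any(part.startswith(q) for part in word_parts(r.label)):
            word_start.append(r)
        elif q in r.label.lower():
            inner.append(r)  # popularity deliberately ignored here
    word_start.sort(key=lambda r: r.popularity, reverse=True)
    return word_start + inner


results = [Result("John Doe (Johnny :jo) <john.doe@asdf.com>", 5),
           Result("Jo Black <jo@black.example>", 50),
           Result("Bohnen <b@example.org>", 99)]
print([r.popularity for r in order_results("jo", results)])   # [50, 5]
print([r.popularity for r in order_results("ohn", results)])  # [5, 99]
```

For "ohn" both hits are inner matches, so the card with popularity 99 does not move ahead of the one with 5.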