Closed Bug 909948 Opened 11 years ago Closed 9 years ago

[Contacts] Normalization improvements for contacts

Categories

(Core Graveyard :: DOM: Contacts, defect, P3)

x86
macOS
defect

Tracking

(Not tracked)

RESOLVED INVALID

People

(Reporter: kgrandon, Unassigned)

References

Details

(Keywords: perf, Whiteboard: [c= p= s= u=])

We need to make improvements in how we normalize contacts in the contacts app. We need to be able to query for contacts which do not have a "normalized" name to display. Two quick ideas: 1 - A new "normalized" column for contacts, and have the mozContacts API manage this column on edit. Questions - Does this belong in the specification? Could the normalization be driven by an app/os specific setting? 2 - Be able to filter on multiple columns and rules. Questions - Do we really want to invent our own query language? Here is some example of how we normalize a contact display name: if (first != '' || second != '') return Normalizer.toAscii(ret.join('')).toUpperCase().trim(); ret.push(contact.org); ret.push(contact.tel && contact.tel.length > 0 ? contact.tel[0].value.trim() : ''); ret.push(contact.email && contact.email.length > 0 ? contact.email[0].value.trim() : ''); ret.push('#'); return Normalizer.toAscii(ret.join('')).toUpperCase().trim();
Can you give a use-case here? Why would we need a 'normalized name' value? Do you mean http://tools.ietf.org/html/rfc6350#section-6.2.1? We already have the 'name' field for it.
(In reply to Gregor Wagner [:gwagner] from comment #1) > Can you give a use-case here? Why would we need a 'normalized name' value? > Do you mean http://tools.ietf.org/html/rfc6350#section-6.2.1? > We already have the 'name' field for it. In contacts, if an object does not have a givenName or familyName, we will sort the contact under either the phone number, organization or email - depending on what data the contact has. It appears that the vCard spec does does not list org or email to under the name field. One option might be to extend this field, by adding the necessary fields to the name value if we have an empty value. I think modifying the name field to handle all of our normalization needs could be a very clean approach.
Here's some additional context at the risk of duplicating what Kevin wrote: Kevin is investigating if we can avoid loading the entire contacts list in bug 909935. In doing this, we are trying to provide feature parity with the current contacts app. One feature that we are having trouble matching without the full list is the way it does name normalization. As Kevin highlights above, if the user does not have a first name or last name the app will display their email if its present. So if their email is dsmith@example.com, it needs to be placed in the 'D' jump list group. Currently this is possible because contacts gets the whole list and will eventually put the contact in the correct group list (albeit in the wrong order within that group). In the system Kevin is investigating, however, we want to just use cursors retrieved from getAll() with a startsWith filter specifying the name begins with 'A', 'B', etc. Contacts without a name, however, won't show up at all so we won't have a chance to consider their email. The question is, how can we deal with this kind of normalization and avoid loading the entire list? To me the options seem to be: a) The contacts API in gecko provides the normalization and life is good. This seems highly unlikely, though, since I think you can make a legitimate argument that this normalization is app specific. Also, it seems like API-side normalization has bitten us in the past with the tel field, etc. b) The contacts API provides a way to perform more complex queries. For example, all contacts without a name that have an email starting with letter 'D'. This is really pushing things towards a relational database style query, though, and also seems unlikely. c) Have the app perform a separate query for contacts without names and treat them specially. They may appear in the list slowly, etc. This may be reasonable given that large lists of contacts are likely to be well organized by name in order to be useful. d) The project could decide that filing contacts with no name, but an email in the correct letter group is not worth the performance costs of the full load or complexities of (c). In this case we could populate all of these contacts in the '#' group even if they have an email. Do you see other options? Any ideas on how we can best handle these partially empty contacts without loading the entire list?
Hi all, Just for adding a bit more of context (if possible ;)), it's not just the email, but a set of fields the one used to set the visibile row, in this case if not given or family name, as you comment, we will try to fix with email, company and phone number, and finally of none of those (like just having an address), the string 'No name' that must be grouped under '#'. My personal approach, sorry Gregor ;P, will be a set of more complex available filters/queries. This has been already required, well, not exactly that, but the fact of being able to filter by other fields, like 'name' in order to achieve some use cases like contacts matching (for doing the merge), see bug 898337. I understand that we cannot make the API app specific, but having richer filters will help us in many situations. Regards, F.
c or d would work for me. For sure, having a rich set of queries and filters would be awesome, but this is imho a lot of work for what is currently an edge case.
We already had endless discussions about the contacts API performance and the solution was always we need one more feature in the API and then we are good :) So understand that my default reaction is always skeptical :) We ended up in a highly complex API implementation and I guess we learned our lesson that complex lookups can't be solve in the API. We should move towards a datastore model even if the actual datastore API might be out a few releases. Even now we have a revision number where an app can see if the content of the contacts DB has changed. The only functionality that is missing is a function like 'give me all changes starting with revision X'. Right now we mostly change contacts within the contacts app and facebook contacts get also updated in there afaik. So the contacts app will be open most of the time the content of the DB changes. How fast is the performance if we just store the HTML list of contacts in the app and after startup we load the whole list at once (maybe just set innerHTML) and update the contact list in the background? My idea of the contacts API right now is that it should provide the very basic functionality so that all apps can load contacts with a cursor or ask for a contact with a given phone number but any app specific problem shouldn't be solved in the API. I think we should also try to prototype these approaches within the contacts app and if we see that it is the right solution we might want to move it into the API but once we add functionality to the API it's almost impossible to get rid of it again and any further changes become exponentially harder :)
(In reply to Gregor Wagner [:gwagner] from comment #6) > We should move towards a datastore model even if the actual datastore API > might be out a few releases. > Even now we have a revision number where an app can see if the content of > the contacts DB has changed. The only functionality that is missing is a > function like 'give me all changes starting with revision X'. Right now we > mostly change contacts within the contacts app and facebook contacts get > also updated in there afaik. So the contacts app will be open most of the > time the content of the DB changes. > > How fast is the performance if we just store the HTML list of contacts in > the app and after startup we load the whole list at once (maybe just set > innerHTML) and update the contact list in the background? You can never load the entire list at once - it kills performance. The best solution for our performance problems would be to load objects into javascript as necessary using a cursor. Any background work of loading the entire list won't work either as you need to be able to use jump links right when the app loads. > My idea of the contacts API right now is that it should provide the very > basic functionality so that all apps can load contacts with a cursor or ask > for a contact with a given phone number but any app specific problem > shouldn't be solved in the API. I think we should also try to prototype > these approaches within the contacts app and if we see that it is the right > solution we might want to move it into the API but once we add functionality > to the API it's almost impossible to get rid of it again and any further > changes become exponentially harder :) The current problem is that we have tried to build an app around an insufficient API. If we remove jump links and special sorting, we would have zero performance problems - and the current API would be sufficient. Perhaps an option would be to remove these features, and build them when the API can support them properly. I don't think UX/product would be too happy with that, but it's something we can ask about.
(In reply to Kevin Grandon :kgrandon from comment #7) > (In reply to Gregor Wagner [:gwagner] from comment #6) > > We should move towards a datastore model even if the actual datastore API > > might be out a few releases. > > Even now we have a revision number where an app can see if the content of > > the contacts DB has changed. The only functionality that is missing is a > > function like 'give me all changes starting with revision X'. Right now we > > mostly change contacts within the contacts app and facebook contacts get > > also updated in there afaik. So the contacts app will be open most of the > > time the content of the DB changes. > > > > How fast is the performance if we just store the HTML list of contacts in > > the app and after startup we load the whole list at once (maybe just set > > innerHTML) and update the contact list in the background? > > You can never load the entire list at once - it kills performance. The best > solution for our performance problems would be to load objects into > javascript as necessary using a cursor. Any background work of loading the > entire list won't work either as you need to be able to use jump links right > when the app loads. Have you profiled all this ?
(In reply to Kevin Grandon :kgrandon from comment #7) > (In reply to Gregor Wagner [:gwagner] from comment #6) > > We should move towards a datastore model even if the actual datastore API > > might be out a few releases. > > Even now we have a revision number where an app can see if the content of > > the contacts DB has changed. The only functionality that is missing is a > > function like 'give me all changes starting with revision X'. Right now we > > mostly change contacts within the contacts app and facebook contacts get > > also updated in there afaik. So the contacts app will be open most of the > > time the content of the DB changes. > > > > How fast is the performance if we just store the HTML list of contacts in > > the app and after startup we load the whole list at once (maybe just set > > innerHTML) and update the contact list in the background? > > You can never load the entire list at once - it kills performance. The best > solution for our performance problems would be to load objects into > javascript as necessary using a cursor. Any background work of loading the > entire list won't work either as you need to be able to use jump links right > when the app loads. I would love to see hard numbers for it :) When reuben and I did some performance testing just for the API call we collected very interesting numbers. It was like 80% of the time is spent in the contacts app incrementally building the contacts list. Inserting contacts into the DOM tree is the expensive part. I can't recall the exact numbers but in our UI test app we timed the getAll call and for 1000 contacts there was a huge difference. Something like 4 sec to retrieve all contacts in the UI test app and 20 sec in the contacts app where we spend all time building the DOM incrementally. We might get away with a single reflow after inserting all contacts without even noticing. > > > > My idea of the contacts API right now is that it should provide the very > > basic functionality so that all apps can load contacts with a cursor or ask > > for a contact with a given phone number but any app specific problem > > shouldn't be solved in the API. I think we should also try to prototype > > these approaches within the contacts app and if we see that it is the right > > solution we might want to move it into the API but once we add functionality > > to the API it's almost impossible to get rid of it again and any further > > changes become exponentially harder :) > > The current problem is that we have tried to build an app around an > insufficient API. If we remove jump links and special sorting, we would have > zero performance problems - and the current API would be sufficient. Perhaps > an option would be to remove these features, and build them when the API can > support them properly. I don't think UX/product would be too happy with > that, but it's something we can ask about. The current contacts API problem is that we have tried to build an API around an insufficient indexedDB API :) So the situation doesn't get much better if we move it in the API implementation.
(In reply to Gregor Wagner [:gwagner] from comment #9) > I would love to see hard numbers for it :) When reuben and I did some > performance testing just for the API call we collected very interesting > numbers. It was like 80% of the time is spent in the contacts app > incrementally building the contacts list. Inserting contacts into the DOM > tree is the expensive part. > I can't recall the exact numbers but in our UI test app we timed the getAll > call and for 1000 contacts there was a huge difference. Something like 4 sec > to retrieve all contacts in the UI test app and 20 sec in the contacts app > where we spend all time building the DOM incrementally. > We might get away with a single reflow after inserting all contacts without > even noticing. The 4 seconds is still about where we are. See bug 888666. I think Kevin's point is if it takes 4 seconds to load all the contacts, it doesn't matter what we do in the app. We can't implement the features required and be responsive. That leaves us with: 1) Drop features from the app. 2) Improve the API to permit implementing the features without loading all the contacts. 3) Declare API bankruptcy and implement our own storage in the app. It seems like (3) is everyone's recommendation right now. This is really disheartening given I thought part of the goal of all this was to build useful APIs that could be widely adopted. Approach (3) leaves 3rd party apps that use contacts API out in the cold and is not a good story for other platforms to adopt it. > The current contacts API problem is that we have tried to build an API > around an insufficient indexedDB API :) So the situation doesn't get much > better if we move it in the API implementation. I agree that a lot of the pain is related to a mismatch between the API features provided/requested and a key-value store like IDB. Personally I think we should investigate using mozStorage instead of IDB in contacts API. It seems like a relational DB would provide a better fit for the features that seem to be needed.
Another solution is slightly modifying the features so that it fits with what's currently possible to do with the Contact API. Having a complex query engine is likely not happening, but we still can implement simpler solutions. As I said before, your c and d solutions work for me, and would need only simple changes in the API (and maybe even no change). Here are additional thoughts: I think what happens with Contacts is really similar to what happens with SMS (unresolved as of now), that I know quite well. Therefore, the big slowdowns are probably: * inserting incrementally contacts into the DOM while the cursor returns data: this is painfully slow because each insertion triggers a style calculation, even if it's hidden (ie: display: none). If it's not hidden, it's worse, since we'll have a reflow. * IPC: the more we return contacts, the more it takes time. That's why I was requesting a profile. I agree it's not exactly the same problem as the SMS app: in the SMS app the user likely wants to see the latest messages, whereas in the Contacts app the user can want to show anything. So clearly we'll want to be able to "jump" to a letter. However, 4 seconds would already be a 5x improvement over the current behavior. I'd love to have this. Also, using the mozContact API to do the Search instead of searching using the DOM would be very enjoyable. So, well, here are my 2 cents.
(In reply to Julien Wajsberg [:julienw] from comment #11) > However, 4 seconds would already be a 5x improvement over the current > behavior. I'd love to have this. Just to address the 5x comment, I believe master/m-c should be able to complete the full load within ~8.5 seconds. I had patches to bring this closer to 6.5 to 7 seconds, but it started to regress scrolling performance too badly. So we're not that far off from the 4 seconds now. Are you seeing different behavior? > Also, using the mozContact API to do the Search instead of searching using > the DOM would be very enjoyable. I think Kevin's approach would favor using the API for search as well. Unfortunately we would probably run into the same problem since contacts app searches against the normalized names.
I'm not dogfooding on master right now so I can't really comment, but this is actually a great news, thanks ! Still haven't seen a profile for the loading though ;)
(In reply to Julien Wajsberg [:julienw] from comment #13) > Still haven't seen a profile for the loading though ;) Which part are you interested in? Its been on my TODO list to profile the API more for bug 888666 for a while now, but have not had time. One of the reasons its been a lower priority for me is that I know a lot of smart people have already been looking at those profiles a lot at prior work weeks, etc. Most of my efforts have simply been to avoid touching the DOM as much as possible during load which I don't really need a profile to pursue. My experience profiling the app during load is that most the work is in style/reflow. Even with the visibility_monitor this is unfortunately the case due to tag_visibility_monitor.js touching scrollTop. Maybe there is a separate bug there. I've also wondered if some of the selectors in our style code need to be improved since we use things like descendant tag selectors all over the place in shared/style, etc. I think this is getting looked at in the style refactor going on now, though. To be honest, though, Kevin's prototype which avoids loading the entire list is so much faster I'm not sure any amount of profiling is going to catch up with it. Its effectively O(c) during launch while any solution requiring a full load of the contacts is O(n). Even if we don't touch the DOM we're still slamming the CPU for a time proportional to the size of the full contact list. Anyway, I think all of this discussion is good, but perhaps far afield from normalization which is the topic of this bug. Maybe we should move the discussion to Kevin's parent bug 909935.
I think touching scrollTop doesn't trigger reflow :) I also agree that loading only a part of the list is definitely faster. In the end, we want to move to a datastore-like model, however we can probably add email to the index if there is no name already. Or add it to name. Why isn't the Contacts app using "name" for such things by the way ? There is another bug about adding "name" as an index. Would that work for you ?
(In reply to Julien Wajsberg [:julienw] from comment #15) > Why isn't the Contacts app using "name" for such things by the way ? There > is another bug about adding "name" as an index. Would that work for you ? I would love to be able to use "name", but we would need to modify the implementation so we wouldn't need to do normalization inside the application. If we added email, and telephone to the name field, I believe we could use it. Basically we just need a field that we can order on for the UI display. Right now we don't have that, but with some work we could probably use the "name" field.
(In reply to Julien Wajsberg [:julienw] from comment #15) > I think touching scrollTop doesn't trigger reflow :) It wasn't so much triggering a reflow, but I thought I read on another bug somewhere that it prevented some optimizations. For example, reading scrollTop forced gecko to perform style calculations immediately instead of deferring them, etc. I really haven't dug into it, though, so this could be completely wrong. In any case, its probably a separate issue from this. > In the end, we want to move to a datastore-like model, however we can > probably add email to the index if there is no name already. Or add it to > name. I guess I'm still waiting to see how datastore materializes. My concern is its API still requires loading all data into memory as far as I can tell. I'm also generally concerned that datastore/caching approaches are counter to our goal of creating good web APIs for domain specific things like contacts. It just feels like papering over the problems without solving them. I worry this will deter other platforms from eventually adopting our contacts API. Hopefully I am just worrying for no reason, though. :-)
(In reply to Ben Kelly [:bkelly] from comment #17) > (In reply to Julien Wajsberg [:julienw] from comment #15) > > I think touching scrollTop doesn't trigger reflow :) > > It wasn't so much triggering a reflow, but I thought I read on another bug > somewhere that it prevented some optimizations. For example, reading > scrollTop forced gecko to perform style calculations immediately instead of > deferring them, etc. Ah yep, reading it could be another story, right. > I worry this will deter other platforms from eventually adopting our contacts API. but having more stuff in our API means it's more difficult for them to implement it in a cross-browser way... (In reply to Kevin Grandon :kgrandon from comment #16) > (In reply to Julien Wajsberg [:julienw] from comment #15) > > Why isn't the Contacts app using "name" for such things by the way ? There > > is another bug about adding "name" as an index. Would that work for you ? > > I would love to be able to use "name", but we would need to modify the > implementation so we wouldn't need to do normalization inside the > application. If we added email, and telephone to the name field, I believe > we could use it. Sorry, I don't get why the contacts app could not choose what to put in "name". Maybe we need another "index" or "search" exposed field or something like that for just this, because "name" is maybe already used for something else (eg for sim card imported contacts) but I strongly feel such index field value should be given by the app. I also think this could be made an array. And we would just throw it into our internal search index, so the app could effectively completely control the internal index in some way.
(In reply to Julien Wajsberg [:julienw] from comment #18) > Sorry, I don't get why the contacts app could not choose what to put in > "name". Other, 3rd party apps can use contacts API to create contacts. The contacts app is not the standardized gatekeeper for the data, the API is. The app can't assume it personally populated all the fields in a particular way on each contact. This is why it needs to be in the API if we want to avoid loading all the data. > Maybe we need another "index" or "search" exposed field or something like > that for just this, because "name" is maybe already used for something else > (eg for sim card imported contacts) but I strongly feel such index field > value should be given by the app. Again, if its given by the app and is not part of the standard, how do 3rd party apps usefully take advantage of the API? > I also think this could be made an array. And we would just throw it into > our internal search index, so the app could effectively completely control > the internal index in some way. Which internal search index is this? Is it in the API implementation or in app-side storage? Thanks!
(In reply to Ben Kelly [:bkelly] from comment #19) > (In reply to Julien Wajsberg [:julienw] from comment #18) > > Sorry, I don't get why the contacts app could not choose what to put in > > "name". > > Other, 3rd party apps can use contacts API to create contacts. The contacts > app is not the standardized gatekeeper for the data, the API is. The app > can't assume it personally populated all the fields in a particular way on > each contact. This is why it needs to be in the API if we want to avoid > loading all the data. > > > Maybe we need another "index" or "search" exposed field or something like > > that for just this, because "name" is maybe already used for something else > > (eg for sim card imported contacts) but I strongly feel such index field > > value should be given by the app. > > Again, if its given by the app and is not part of the standard, how do 3rd > party apps usefully take advantage of the API? Yep this argument makes sense. And in the same time, having everything in the API means applications won't be able to do clever things. I need to think. :=) > > > I also think this could be made an array. And we would just throw it into > > our internal search index, so the app could effectively completely control > > the internal index in some way. > > Which internal search index is this? Is it in the API implementation or in > app-side storage? I was thinking the search index in the API implementation here.
(In reply to Julien Wajsberg [:julienw] from comment #20) > > > I also think this could be made an array. And we would just throw it into > > > our internal search index, so the app could effectively completely control > > > the internal index in some way. > > > > Which internal search index is this? Is it in the API implementation or in > > app-side storage? > > I was thinking the search index in the API implementation here. I must admit I'm still not sure what you mean by using an array that the app can control here, but I do now see that filterBy can be an array of fields. I wasn't aware of this before. An array for filterBy should allow contacts app to search effectively using the API as defined in the spec. It can just specify givenName, familyName, org, email, and tel as the fields. That should be fairly equivalent to the current DOM based solution. If this is the case then the normalization discussed above is really only needed for effective iteration over an ordered list. I guess another wrinkle to consider is that contacts app also wants to order contacts based on either givenName or familyName depending on a setting. This further makes it harder to use a combination field like name. :-\
Sooo should we close this bug after all ?
(In reply to Julien Wajsberg [:julienw] from comment #22) > Sooo should we close this bug after all ? I don't think so, but I need to investigate further. Even if filterBy can accept an array, it's unlikely that we can construct the requests in a way that handles all of the current normalization cases that the app needs. Like Ben said earlier, there's three routes we can take here: > 1) Drop features from the app. > 2) Improve the API to permit implementing the features without loading all the contacts. > 3) Declare API bankruptcy and implement our own storage in the app.
Ben was also saying: c) Have the app perform a separate query for contacts without names and treat them specially. They may appear in the list slowly, etc. This may be reasonable given that large lists of contacts are likely to be well organized by name in order to be useful. d) The project could decide that filing contacts with no name, but an email in the correct letter group is not worth the performance costs of the full load or complexities of (c). In this case we could populate all of these contacts in the '#' group even if they have an email. Would that go into your (1) ? :)
I don't think C is acceptable from a UX point of view. This would still call for loading all contacts to potentially display one in the first or second page. D basically maps to the first point.
(In reply to Kevin Grandon :kgrandon from comment #23) > I don't think so, but I need to investigate further. Even if filterBy can > accept an array, it's unlikely that we can construct the requests in a way > that handles all of the current normalization cases that the app needs. Kevin, can you see how your prototype works using array filterBy requests? I think the reference-workload-heavy contacts should represent close to a worst case. We will be requesting many more contacts in order to normalize, but since all the contacts have names we will just be ignoring the additional values. If that works, it seems we can at least close this normalization bug.
(In reply to Ben Kelly [:bkelly] from comment #26) > Kevin, can you see how your prototype works using array filterBy requests? > I think the reference-workload-heavy contacts should represent close to a > worst case. We will be requesting many more contacts in order to normalize, > but since all the contacts have names we will just be ignoring the > additional values. > > If that works, it seems we can at least close this normalization bug. I'll upload a prototype soon. I don't think just filterBy will cut it though. I'm pretty sure we need supporting sorting and normalization from the API, which would map to your 2nd item. FYI - I'm totally fine with any of the options that you provided above. We should just figure out who would need to sign off on the various options and start talking to them. > 1) Drop features from the app. > 2) Improve the API to permit implementing the features without loading all the contacts. > 3) Declare API bankruptcy and implement our own storage in the app.
(In reply to Kevin Grandon :kgrandon from comment #27) > I'll upload a prototype soon. I don't think just filterBy will cut it > though. I'm pretty sure we need supporting sorting and normalization from > the API, which would map to your 2nd item. Kevin clarified this on IRC for me. The multi-value filterBy is inadequate because sortBy is still single field. In order to use a cursor, without a normalized field, then we would need both multi-value filterBy (which exists) and also sorting to be dependent on whichever field matched the filter.
Sounds like a reasonable request. What do you think Gregor ?
(In reply to Ben Kelly [:bkelly] from comment #28) > (In reply to Kevin Grandon :kgrandon from comment #27) > > I'll upload a prototype soon. I don't think just filterBy will cut it > > though. I'm pretty sure we need supporting sorting and normalization from > > the API, which would map to your 2nd item. > > Kevin clarified this on IRC for me. The multi-value filterBy is inadequate > because sortBy is still single field. In order to use a cursor, without a > normalized field, then we would need both multi-value filterBy (which > exists) and also sorting to be dependent on whichever field matched the > filter. Yeah no reason why we should keep this limitation. It was needed for an earlier implementation but the sortfunction became more mature and it should be easy to support.
Incidentally, the current contacts app sorting solution interacts poorly with the normalization as well. I wrote bug 913183 to track that.
Component: DOM: Device Interfaces → DOM: Contacts
Priority: -- → P3
I think that this bug is no longer valid with the transition plan. Please re-open if you think we should still do this.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → INVALID
Product: Core → Core Graveyard
You need to log in before you can comment on or make changes to this bug.