Closed
Bug 1483667
Opened 6 years ago
Closed 6 years ago
Pocket personalization V2: add text tagger
Categories
(Firefox :: New Tab Page, enhancement, P1)
Tracking
Release | Tracking | Status
---|---|---
firefox64 | --- | fixed
People
(Reporter: nanj, Assigned: jkoren)
Details
### Description
This PR adds the ability to classify text. We define two classifiers: a Naïve Bayes (NB) classifier and a multiclass non-negative matrix factorization (NMF) classifier. Both use bag-of-words TF-IDF vectors as features. The purpose of this code is to let Firefox classify pages into topics by examining the text found on each page.
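To make the NB path concrete, here is a minimal sketch of TF-IDF bag-of-words features feeding a Naïve Bayes classifier. It is not the code that landed in activity-stream. It assumes the following: a whitespace tokenizer, smoothed IDF, multinomial NB fitted on TF-IDF weights with Laplace smoothing, and a made-up four-document training set. All function names (`tokenize`, `idf_table`, `tfidf`, `train_nb`, `classify_nb`) are hypothetical.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    # Assumed tokenizer: lowercase, then split on whitespace.
    return text.lower().split()


def idf_table(docs):
    # Smoothed inverse document frequency: log((1 + N) / (1 + df)) + 1.
    n = len(docs)
    df = Counter()
    for doc in docs:
        df.update(set(tokenize(doc)))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}


def tfidf(text, idf):
    # Bag of words: term frequency times IDF; out-of-vocabulary terms are dropped.
    counts = Counter(t for t in tokenize(text) if t in idf)
    return {term: tf * idf[term] for term, tf in counts.items()}


def train_nb(docs, labels, idf, alpha=1.0):
    # Multinomial Naive Bayes fitted on TF-IDF weights, Laplace smoothing alpha.
    vocab = list(idf)
    weight = defaultdict(Counter)
    for doc, label in zip(docs, labels):
        weight[label].update(tfidf(doc, idf))
    model = {}
    for label, n_docs in Counter(labels).items():
        total = sum(weight[label].values()) + alpha * len(vocab)
        log_lik = {t: math.log((weight[label][t] + alpha) / total) for t in vocab}
        model[label] = (math.log(n_docs / len(docs)), log_lik)
    return model


def classify_nb(text, model, idf):
    # The label with the highest log prior plus TF-IDF-weighted log likelihood wins.
    features = tfidf(text, idf)

    def score(label):
        log_prior, log_lik = model[label]
        return log_prior + sum(w * log_lik[t] for t, w in features.items())

    return max(model, key=score)


# Made-up training data, for illustration only.
train_docs = ["goal match football league", "election vote senate",
              "football score goal", "senate bill vote"]
train_labels = ["sports", "politics", "sports", "politics"]
idf = idf_table(train_docs)
model = train_nb(train_docs, train_labels, idf)
print(classify_nb("late goal wins the match", model, idf))  # sports
```

Weighting each term's log likelihood by its TF-IDF value, rather than by a raw count, means that common words shared across topics contribute less to the decision.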
This code is part of the Pocket Personalization v2 experiment which uses content analysis to locally build interest profiles.
This code is dark (landed but not enabled).
### Testing
Unit tests
This code has no current consumers.
### Related
We reviewed this internally in PR https://github.com/Pocket/activity-stream/pull/1
### See Also
https://docs.google.com/document/d/12OtUZywivIvBnO3hmMNjptmIQ8cQLFOIOzJO4bRLqdg/edit
https://en.wikipedia.org/wiki/Naive_Bayes_classifier
https://en.wikipedia.org/wiki/Non-negative_matrix_factorization
https://en.wikipedia.org/wiki/Tf%E2%80%93idf
Reporter
Updated•6 years ago
Assignee: nobody → jkoren
Reporter
Comment 1•6 years ago
Hey :jkoren, I have a few questions about this feature. Could you clarify?
* What's the input of this tagger? I assume it's Places, correct? If so, which tables/columns will be used? How far back do we want to look in the user's browsing history?
* Are we going to use both NB and NMF? Or just try them first to see which one performs better?
* How often do we want to run this tagging? As you might know, in personalization V1 we build the site affinity profile in the browser's daily-idle handler to minimize the performance impact of the computation. I guess we'd use the same strategy for text tagging as well.
Flags: needinfo?(jkoren)
Assignee
Comment 2•6 years ago
0) This is only the first PR, and it's meant to be fairly general. The code that uses it is coming in another PR. We're trying to queue the PRs up so each part can be reviewed effectively without overwhelming anyone. That said...
1) We are currently planning for two sources of input into the tagger: the title and description fields from the Places database, and the title and description fields of the items received from the Pocket servers. In our prototype we've been going back 30 days. Without any sort of parallelization, the prototype takes 9 to 10 seconds to process. I would not characterize this as "fast". We don't want a short time range, because it doesn't capture enough long-term interests.
2) We're using a two-tier ontology. We've found that the NB algorithm works better on the top level, while the NMF algorithm works better on the lower level. (I don't know why, but it's a lot better.) So we do use them both. Later, we'll probably replace both of these algorithms with something else, but that's what we're using on the Pocket servers now.
3) We've been thinking of every 24 hours or so. We could do incremental updates (which would be faster), but it's easier to just recalculate the whole thing while continuing to use the old one until the new one finishes.
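The two-tier arrangement in item 2 can be sketched as follows. A top-level classifier picks the parent topic. That parent's NMF basis then picks the subtopic. This is an illustration under stated assumptions, not the Pocket implementation. It assumes the topic-by-term basis `H` for each parent was learned offline and shipped. It also assumes a document is assigned a subtopic by fitting non-negative weights `w` with `H` held fixed, using Lee-Seung multiplicative updates, and taking the largest weight. That projection method is a technique we swapped in; the comment only says NMF is used on the lower level. The names `nmf_weights`, `classify_two_tier`, `SUB_MODELS`, and `always_sports` are hypothetical, and the toy basis is invented.

```python
from typing import Callable, Dict, List, Sequence, Tuple


def nmf_weights(v: Sequence[float], H: Sequence[Sequence[float]],
                iters: int = 200, eps: float = 1e-9) -> List[float]:
    """Fit w >= 0 minimizing ||v - w H||^2 for a fixed basis H (K topics x V terms).

    Lee-Seung multiplicative update: w_k <- w_k * (v H^T)_k / (w H H^T)_k.
    """
    k = len(H)
    vHt = [sum(vj * hj for vj, hj in zip(v, H[i])) for i in range(k)]
    HHt = [[sum(a * b for a, b in zip(H[i], H[m])) for m in range(k)] for i in range(k)]
    w = [1.0] * k  # positive start; multiplicative updates keep w nonnegative
    for _ in range(iters):
        wHHt = [sum(w[m] * HHt[m][i] for m in range(k)) for i in range(k)]
        w = [w[i] * vHt[i] / (wHHt[i] + eps) for i in range(k)]
    return w


def classify_two_tier(v: Sequence[float],
                      top_classifier: Callable[[Sequence[float]], str],
                      sub_models: Dict[str, Tuple[List[str], List[List[float]]]]
                      ) -> Tuple[str, str]:
    # Tier 1: the top-level topic comes from a (hypothetical) NB classifier.
    parent = top_classifier(v)
    # Tier 2: project onto that parent's NMF basis; the strongest component wins.
    subtopics, H = sub_models[parent]
    w = nmf_weights(v, H)
    best = max(range(len(w)), key=lambda i: w[i])
    return parent, subtopics[best]


# Toy basis over the vocabulary [goal, football, basket, dunk]; invented for illustration.
SUB_MODELS = {
    "sports": (["soccer", "basketball"],
               [[1.0, 1.0, 0.0, 0.0],
                [0.0, 0.0, 1.0, 1.0]]),
}


def always_sports(v):
    # Stand-in for a trained top-level NB model.
    return "sports"


print(classify_two_tier([0.0, 0.0, 2.0, 1.5], always_sports, SUB_MODELS))
# ('sports', 'basketball')
```

Routing through the parent first means each NMF basis only has to separate subtopics within one branch of the ontology.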
Flags: needinfo?(jkoren)
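Items 1 and 3 of the reply above describe the following approach. A 30-day window over two text sources is fully recomputed about once a day, and the old profile stays in use until the new one finishes. Here is a minimal sketch of that rebuild-then-swap pattern. It assumes each source yields hypothetical `(timestamp_seconds, text)` pairs and that a topic classifier is passed in. The class `InterestProfile`, its methods, and `topic_of` are illustrative names only, and the profile is simply each topic's share of the classified items.

```python
import threading
from collections import Counter
from typing import Callable, Dict, Iterable, Tuple

LOOKBACK_SECONDS = 30 * 24 * 60 * 60  # 30-day window mentioned in item 1

# Hypothetical record shape: (timestamp in seconds, "title description" text).
Item = Tuple[float, str]


class InterestProfile:
    """Keeps serving the last finished profile until a full rebuild completes."""

    def __init__(self, classify: Callable[[str], str]):
        self._classify = classify  # hypothetical text -> topic function
        self._profile: Dict[str, float] = {}
        self._lock = threading.Lock()

    def current(self) -> Dict[str, float]:
        with self._lock:
            return self._profile

    def rebuild(self, history: Iterable[Item], pocket_items: Iterable[Item],
                now: float) -> None:
        # Full recomputation over both sources, restricted to the lookback window.
        cutoff = now - LOOKBACK_SECONDS
        counts: Counter = Counter()
        for source in (history, pocket_items):
            for timestamp, text in source:
                if timestamp >= cutoff:
                    counts[self._classify(text)] += 1
        total = sum(counts.values())
        new_profile = {topic: n / total for topic, n in counts.items()} if total else {}
        # Swap only once the new profile is complete; readers never see a partial one.
        with self._lock:
            self._profile = new_profile


def topic_of(text: str) -> str:
    # Toy classifier standing in for the NB/NMF tagger.
    return "sports" if "goal" in text else "news"


now = 100 * 86400.0
history = [(now - 1 * 86400, "late goal wins"), (now - 40 * 86400, "old goal story")]
pocket = [(now - 2 * 86400, "election results")]
profile = InterestProfile(topic_of)
profile.rebuild(history, pocket, now)
print(profile.current())  # {'sports': 0.5, 'news': 0.5}
```

A scheduler such as the daily-idle handler mentioned in comment 1 would call `rebuild` roughly every 24 hours. Incremental updates would be faster, but a full rebuild keeps the logic simple, which is the trade-off described in item 3.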
Updated•6 years ago
Iteration: --- → 64.1 (Sep 14)
Priority: -- → P1
Comment 3•6 years ago
Commit pushed to master at https://github.com/mozilla/activity-stream
https://github.com/mozilla/activity-stream/commit/9c44263a123ed43fce8aea5a0534768d8ff1ab9f
Fix Bug 1483667: add text taggers for pocket personalization (#4294)
Updated•6 years ago
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Comment 4•6 years ago
status-firefox64: --- → fixed
Target Milestone: --- → Firefox 64
Comment 5•6 years ago
Backout by btara@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/d2e41f2f964d
Backed out changeset 8dde92f89a24 for browser_asrouter_cfr.js failures. a=backout
Relanded:
https://hg.mozilla.org/mozilla-central/rev/581019e9ea70
Updated•5 years ago
Component: Activity Streams: Newtab → New Tab Page