Closed Bug 1483667 Opened 6 years ago Closed 6 years ago

Pocket personalization V2: add text tagger

Status:

RESOLVED FIXED

Milestone:

Firefox 64

Iteration:

64.1 - Sep 14

Tracking Flags:

Tracking

Status

firefox64

---

fixed

People

(Reporter: nanj, Assigned: jkoren)

Nan Jiang [:nanj]

Reporter

Description

•

6 years ago

### Description

This PR adds the ability to classify text. We define two classifiers: a Naïve Bayes (NB) classifier and a multiclass nonnegative matrix factorization (NMF) classifier. Both use bag-of-words TF-IDF vectors as features.

The purpose of this code is to allow Firefox to classify pages into topics by examining the text found on the page. This code is part of the Pocket Personalization v2 experiment, which uses content analysis to build interest profiles locally.

This code is dark.

### Testing

Unit tests.

This code has no current consumers.

### Related

We reviewed this internally on PR https://github.com/Pocket/activity-stream/pull/1

### See Also

https://docs.google.com/document/d/12OtUZywivIvBnO3hmMNjptmIQ8cQLFOIOzJO4bRLqdg/edit
https://en.wikipedia.org/wiki/Naive_Bayes_classifier
https://en.wikipedia.org/wiki/Non-negative_matrix_factorization
https://en.wikipedia.org/wiki/Tf%E2%80%93idf
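For readers unfamiliar with the approach, here is a minimal Python sketch of classifying text with bag-of-words TF-IDF features and a Naive Bayes style linear model. It is not the code from this PR. The function names, the toy vocabulary, the IDF values, and the model parameters are all assumptions made up for illustration, and the model is assumed to hold a log-likelihood for every vocabulary term.

```python
import math
import re
from collections import Counter

def tokenize(text):
    """Lowercase the text and split it into word tokens (bag of words)."""
    return re.findall(r"[a-z0-9]+", text.lower())

def tfidf(tokens, idf):
    """Build an L2-normalized TF-IDF vector; out-of-vocabulary terms are dropped."""
    counts = Counter(t for t in tokens if t in idf)
    vec = {t: c * idf[t] for t, c in counts.items()}
    norm = math.sqrt(sum(v * v for v in vec.values()))
    return {t: v / norm for t, v in vec.items()} if norm else {}

def nb_classify(vec, model):
    """Return (label, probability) for the best-scoring label.

    model maps label -> {"log_prior": float, "log_likelihood": {term: float}}
    and must cover every term in the vocabulary (hypothetical layout).
    """
    scores = {}
    for label, params in model.items():
        score = params["log_prior"]
        for term, weight in vec.items():
            score += weight * params["log_likelihood"][term]
        scores[label] = score
    # Normalize with log-sum-exp so the result is a probability.
    top = max(scores.values())
    log_z = top + math.log(sum(math.exp(s - top) for s in scores.values()))
    best = max(scores, key=scores.get)
    return best, math.exp(scores[best] - log_z)

# Toy parameters, invented for this example only.
IDF = {"python": 1.0, "code": 1.0, "goal": 1.0, "match": 1.0}
MODEL = {
    "tech": {
        "log_prior": math.log(0.5),
        "log_likelihood": {"python": math.log(0.4), "code": math.log(0.4),
                           "goal": math.log(0.1), "match": math.log(0.1)},
    },
    "sports": {
        "log_prior": math.log(0.5),
        "log_likelihood": {"python": math.log(0.1), "code": math.log(0.1),
                           "goal": math.log(0.4), "match": math.log(0.4)},
    },
}

label, prob = nb_classify(tfidf(tokenize("Python code review"), IDF), MODEL)
print(label, round(prob, 3))
```

In a sketch like this the IDF table and the per-label parameters would be trained offline and shipped as data, so the client only tokenizes, vectorizes, and scores.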

Nan Jiang [:nanj]

Reporter

Updated

•

6 years ago

Assignee: nobody → jkoren

URL: https://github.com/mozilla/activity-s...

Nan Jiang [:nanj]

Reporter

Comment 1

•

6 years ago

Hey :jkoren, I have a few questions about this feature. Could you clarify?

* What's the input of this tagger? I assume it's Places, correct? If so, what tables/columns will be used? How far back do we want to look in the user's browsing history?
* Are we going to use both NB and NMF? Or just try them first to see which one performs better?
* How often do we want to run this tagging? As you might know, in personalization V1 we build the site affinity profile in the browser's daily-idle handler to minimize the impact of the calculation. I guess we'd use the same strategy for text tagging as well.

Flags: needinfo?(jkoren)

jonathan koren

Assignee

Comment 2

•

6 years ago

0) This is only the first PR. It's meant to be a bit general. The code that uses it is coming in another PR. We're trying to queue the PRs up so each part can be looked at effectively and won't overwhelm anyone. That said...

1) We are currently planning for two sources of input into the tagger. The first is the title and description fields from the Places DB. The second is the title and description fields from the items received from the Pocket servers. In our prototype we've been going back 30 days. Without any sort of parallelization, the prototype takes 9 to 10 seconds to process. I would not characterize this as "fast". We don't want small time ranges, because they don't capture enough long-term interests.

2) We're using a two-tier ontology. We've found that the NB algorithm works better on the top level, while the NMF algorithm works better on the lower level. (I don't know why, but it's, like, a lot better.) So we do use them both. Later, we'll probably replace both of these algorithms with something else, but that's what we're using on the Pocket servers now.

3) We've been thinking every 24 hours or so. We could do incremental updates (which would be faster), but it's easier just to recalculate the whole thing while continuing to use the old one until it finishes.
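As an illustration of points 2 and 3, the following Python sketch shows one way a two-tier tagger and a full profile rebuild could fit together. It is a sketch under assumptions, not the code from the PR: the function names, the confidence threshold, and the keyword-based toy classifiers (stand-ins for the NB top-level and NMF lower-level models) are all hypothetical.

```python
from collections import Counter

def tag_two_tier(text, top_level, second_level, min_confidence=0.5):
    """Pick a top-level topic, then refine it with that topic's own classifier.

    top_level: callable text -> (label, confidence)
    second_level: dict mapping a top-level label to a callable of the same shape
    """
    parent, p_parent = top_level(text)
    if p_parent < min_confidence:
        return None  # not confident enough to tag at all
    refine = second_level.get(parent)
    if refine is None:
        return (parent, None)  # no lower-level classifier for this topic
    child, _ = refine(text)
    return (parent, child)

def rebuild_profile(items, tagger):
    """Recompute the whole interest profile from scratch.

    The caller keeps using the old profile while this runs and swaps in
    the returned one once it finishes.
    """
    profile = Counter()
    for title, description in items:
        tags = tagger(f"{title} {description}")
        if tags is not None:
            profile[tags] += 1
    return profile

# Toy stand-ins for the real classifiers (invented for this example).
def toy_top(text):
    return ("tech", 0.9) if "code" in text.lower() else ("sports", 0.9)

def toy_tech(text):
    return ("python", 0.8) if "python" in text.lower() else ("other", 0.6)

def tagger(text):
    return tag_two_tier(text, toy_top, {"tech": toy_tech})

# (title, description) pairs, e.g. from the last 30 days of history.
history = [
    ("Python code tips", "Writing cleaner code"),
    ("Cup final", "Match report"),
]

current_profile = Counter()             # old profile stays in use meanwhile
new_profile = rebuild_profile(history, tagger)
current_profile = new_profile           # swap only after the rebuild is done
print(dict(current_profile))
```

Rebuilding into a fresh object and swapping a single reference keeps the old profile consistent for readers during the 9 to 10 second computation, at the cost of redoing work an incremental update would avoid.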

Flags: needinfo?(jkoren)

Tim Spurway [:tspurway]

Updated

•

6 years ago

Iteration: --- → 64.1 (Sep 14)

Priority: -- → P1

[github robot]

Comment 3

•

6 years ago

Commit pushed to master at https://github.com/mozilla/activity-stream

https://github.com/mozilla/activity-stream/commit/9c44263a123ed43fce8aea5a0534768d8ff1ab9f
Fix Bug 1483667: add text taggers for pocket personalization (#4294)

[github robot]

Updated

•

6 years ago

Status: NEW → RESOLVED

Closed: 6 years ago

Resolution: --- → FIXED

Ed Lee :Mardak

Updated

•

6 years ago

Blocks: 1489962

Ed Lee :Mardak

Comment 4

•

6 years ago

https://hg.mozilla.org/mozilla-central/rev/8dde92f89a24

status-firefox64: --- → fixed

Target Milestone: --- → Firefox 64

Ed Lee :Mardak

Comment 5

•

6 years ago

Backout by btara@mozilla.com:
https://hg.mozilla.org/mozilla-central/rev/d2e41f2f964d
Backed out changeset 8dde92f89a24 for browser_asrouter_cfr.js failures. a=backout

Relanded:
https://hg.mozilla.org/mozilla-central/rev/581019e9ea70

Nobody; OK to take it and work on it

Updated

•

5 years ago

Component: Activity Streams: Newtab → New Tab Page

Categories

(Firefox :: New Tab Page, enhancement, P1)
