Closed
Bug 1136977
Opened 10 years ago
Closed 9 years ago
Audience buckets estimator tool (Bucketerer)
Categories
(Content Services Graveyard :: Classification Engine, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
Iteration:
41.3 - Jun 29
People
(Reporter: mruttley, Assigned: mruttley)
References
Details
(Whiteboard: .001)
Attachments
(1 file)
(deleted), text/rtf
We need to abstract the classification system. At the moment it is intertwined with the ID and very difficult to decouple.
Ideally, we would be able (in either Node.js or Python) to do this:
>>> import mozclassifier
>>> mozclassifier.LICAClassify("http://www.google.com")
['Web Search', 0.99]
>>> mozclassifier.DFRClassify("http://www.google.com")
['Web Search', 1]
As simple as that.
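A minimal sketch of what that kind of abstraction could look like, assuming a registry of interchangeable backends behind one `classify` call. Everything here is hypothetical: `toy_domain_classify`, `TOY_DOMAIN_RULES`, `BACKENDS` and `classify` are illustrative names, and the toy lookup is a stand-in, not LICA or DFR.

```python
from typing import Callable, Dict, List, Union
from urllib.parse import urlparse

# A result mirrors the shape shown above: [category, confidence].
Result = List[Union[str, float]]

# Hypothetical stand-in data; the real LICA/DFR rule sets are not reproduced here.
TOY_DOMAIN_RULES: Dict[str, str] = {"google.com": "Web Search"}


def toy_domain_classify(url: str) -> Result:
    """Look up the URL's host in a tiny rule table (illustration only)."""
    host = urlparse(url).hostname or ""
    if host.startswith("www."):
        host = host[len("www."):]
    category = TOY_DOMAIN_RULES.get(host)
    return [category, 1.0] if category else ["uncategorized", 0.0]


# Registry of interchangeable backends, keyed by name (assumed design).
BACKENDS: Dict[str, Callable[[str], Result]] = {"toy": toy_domain_classify}


def classify(url: str, backend: str = "toy") -> Result:
    """Dispatch to a named backend so callers never touch its internals."""
    return BACKENDS[backend](url)
```

With this shape, a real classifier would only need to be registered under its own key in `BACKENDS`; callers would not change.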
Assignee | ||
Comment 1•10 years ago
(This can be used for inventory projection by classifying telemetry data)
Assignee | ||
Updated•10 years ago
Points: --- → 13
Updated•10 years ago
OS: Mac OS X → All
Hardware: x86 → All
Whiteboard: .001
Comment 2•10 years ago
Goal:
1. Classify 1,000 URLs from both the 1st and 2nd Telemetry Experiments using UP categories and other classifiers to generate a URL-to-IAB-category mapping.
2. Create a UI for 1) URL-to-category mapping, 2) category-to-URL mapping and 3) impression estimates for each category/URL combination for each release channel
Rationale:
1. Equips Business Development teams with Related Tiles audience data
2. Allows impression estimation on standard and custom audience buckets
User: Mozilla internal, Content Services
Iteration: --- → 39.1 - 9 Mar
Comment 3•10 years ago
Classifier is decoupled from ID already:
https://github.com/mzhilyaev/pfeed/blob/master/scripts/testDFR.js
This script takes DFR and applies it to the url,tile input read from stdin.
Comment 4•10 years ago
maksik has some work in bug 1136234 to compute inventory for a given set of sites based on the telemetry sites + co-occurrence data.
Depends on: 1136234
Updated•10 years ago
Iteration: 39.1 - 9 Mar → 39.2 - 23 Mar
Assignee | ||
Comment 5•10 years ago
I've done some more extensive testing of LICA and DFR and it seems that LICA still outperforms DFR at a much larger scale (1 million documents): https://github.com/matthewruttley/mozclassify (see table in Performance section where it gets 83.9% Precision). I'm 99% sure this is correct, though there are some slight differences in my Python implementation.
Comment 6•10 years ago
Per comment 2, this bug is about classifying 1000 urls/sites from the telemetry experiment. Where are you getting 1 million documents?
Updated•10 years ago
Iteration: 39.2 - 23 Mar → 39.3 - 30 Mar
Updated•10 years ago
Iteration: 39.3 - 30 Mar → 40.1 - 13 Apr
Updated•10 years ago
Iteration: 40.1 - 13 Apr → 40.2 - 27 Apr
Comment 7•10 years ago
The requirements for this bug have shifted to providing audience estimates for buckets containing URLs. The goal is to give business development an estimate of audience size, with a percentage probability (MaxP), for a given set of URLs.
High level requirements: Using a combination of available URL traffic tracking sources such as Alexa, ComScore and SimilarSites, provide an interface that:
1) Allows selection of one or a set of subdomains and domains
2) Outputs an estimated audience size in unique visitors
- If more than one domain is queried, display both duplicated and unduplicated unique visitors
3) Outputs the probability (MaxP) of a user visiting at least one of the domains
4) Outputs similar sites
Good to have:
- Ability to incorporate Firefox specific audience traffic data
- Ability to save audience bucket names and domains to query at a later time
- Audience buckets and domain impression estimates
- Use structured hierarchical category taxonomy
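To make requirements 2) and 3) concrete, here is a rough sketch of the arithmetic. This comment does not say how MaxP or the unduplicated count should be computed, so the sketch assumes visits to different domains are statistically independent, which real traffic will not satisfy exactly. `estimate_bucket`, `reach_by_domain` and `population` are hypothetical names, and the inputs would have to come from one of the traffic sources listed above.

```python
from math import prod
from typing import Dict


def estimate_bucket(reach_by_domain: Dict[str, float], population: float) -> Dict[str, float]:
    """Estimate audience figures for a bucket of domains.

    reach_by_domain maps a domain to the fraction of the population
    that visits it (0.0 to 1.0). Independence between domains is an
    assumption of this sketch, not something stated in the bug.
    """
    for domain, reach in reach_by_domain.items():
        if not 0.0 <= reach <= 1.0:
            raise ValueError(f"reach for {domain} must be between 0 and 1")

    # Duplicated uniques: a user visiting two domains is counted twice.
    duplicated = sum(reach * population for reach in reach_by_domain.values())

    # P(at least one domain) = 1 - P(none), under independence.
    p_none = prod(1.0 - reach for reach in reach_by_domain.values())
    p_any = 1.0 - p_none

    # Unduplicated uniques: each user counted once.
    unduplicated = p_any * population

    return {"duplicated": duplicated, "unduplicated": unduplicated, "p_any": p_any}
```

For example, two domains that each reach half of a population of 1,000 give 1,000 duplicated visitors, but only 750 unduplicated visitors, because a quarter of the population is expected to visit neither.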
Blocks: 1140185
Summary: Inventory Projection via Classification Abstraction → Audience buckets estimator tool
Updated•10 years ago
Iteration: 40.2 - 27 Apr → 40.3 - 11 May
Comment 8•10 years ago
Background from the Metrics team on how initial audience buckets estimation was created.
Updated•9 years ago
Iteration: 40.3 - 11 May → 41.1 - May 25
Updated•9 years ago
Iteration: 41.1 - May 25 → 41.2 - Jun 8
Updated•9 years ago
Iteration: 41.2 - Jun 8 → 41.3 - Jun 29
Updated•9 years ago
Summary: Audience buckets estimator tool → Audience buckets estimator tool (Bucketerer)
Closing this as resolved fixed.
If there are any follow-up bugs for Bucketerer, please file them as separate bugs.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED