Closed Bug 1136977 Opened 10 years ago Closed 9 years ago

Audience buckets estimator tool (Bucketerer)

Categories

(Content Services Graveyard :: Classification Engine, defect)

defect
Not set
normal
Points:
13

Tracking

(Not tracked)

RESOLVED FIXED
Iteration:
41.3 - Jun 29

People

(Reporter: mruttley, Assigned: mruttley)

References

Details

(Whiteboard: .001)

Attachments

(1 file)

We need to abstract the classification system. At the moment it is intertwined with the ID and very difficult to decouple. An ideal scenario would be us being able (in either nodejs/python) to:

    >>> import mozclassifier
    >>> mozclassifier.LICAClassify("http://www.google.com")
    ['Web Search', 0.99]
    >>> mozclassifier.DFRClassify("http://www.google.com")
    ['Web Search', 1]

As simple as that.
(This can be used for inventory projection by classifying telemetry data)
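A minimal sketch of the interface shape described above, assuming a toy domain-keyword lookup in place of the real LICA and DFR rule sets. The `TOY_RULES` table, the `classify` helper, and the confidence values are illustrative assumptions and are not taken from mozclassifier:

```python
from urllib.parse import urlparse

# Hypothetical rule table (assumption): domain label -> (category, confidence).
TOY_RULES = {
    "google": ("Web Search", 0.99),
    "espn": ("Sports", 0.95),
}


def classify(url, rules=TOY_RULES):
    """Return [category, confidence] for a URL, or None if no rule matches."""
    # hostname is None when the URL has no scheme, so fall back to "".
    host = urlparse(url).hostname or ""
    # Check each dot-separated label of the host against the rule table.
    for label in host.split("."):
        if label in rules:
            category, confidence = rules[label]
            return [category, confidence]
    return None


if __name__ == "__main__":
    print(classify("http://www.google.com"))
```

Keeping the return value a plain `[category, confidence]` pair would let LICA-style and DFR-style classifiers be swapped behind the same call.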
Points: --- → 13
OS: Mac OS X → All
Hardware: x86 → All
Whiteboard: .001
Goal:
1. Classify 1,000 URLs from both 1st and 2nd Telemetry Experiments using UP categories and other classifiers to generate URLs to IAB category mapping.
2. Create UI for 1) URL to category mapping, 2) Category to URL mapping and 3) impression estimates for each category/URL combination for each release channel

Rationale:
1. Equips Business Development teams with Related Tiles audience data
2. Allows impression estimation on standard and custom audience buckets

User: Mozilla internal, Content Services
Iteration: --- → 39.1 - 9 Mar
The classifier is already decoupled from the ID; see https://github.com/mzhilyaev/pfeed/blob/master/scripts/testDFR.js (this script applies DFR to the url,tile input read from stdin).
maksik has some work in bug 1136234 to compute inventory for a given set of sites based on the telemetry sites + co-occurrence data.
Depends on: 1136234
Iteration: 39.1 - 9 Mar → 39.2 - 23 Mar
I've done some more extensive testing of LICA and DFR and it seems that LICA still outperforms DFR at a much larger scale (1 million documents): https://github.com/matthewruttley/mozclassify (see table in Performance section where it gets 83.9% Precision). I'm 99% sure this is correct, though there are some slight differences in my Python implementation.
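For reference, a minimal sketch of how a precision figure like the one above could be computed, assuming precision means the share of URLs the classifier labelled (rather than abstained on) whose label matches the gold label. The `precision` helper is hypothetical, and the mozclassify Performance table may define the metric differently:

```python
def precision(predictions, gold):
    """Fraction of non-abstaining predictions that match the gold label.

    A prediction of None means the classifier abstained (assumption).
    """
    made = [(p, g) for p, g in zip(predictions, gold) if p is not None]
    if not made:
        return 0.0
    correct = sum(1 for p, g in made if p == g)
    return correct / len(made)


if __name__ == "__main__":
    predicted = ["Web Search", None, "Sports", "News"]
    expected = ["Web Search", "Sports", "Sports", "Finance"]
    print(precision(predicted, expected))
```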
Per comment 2, this bug is about classifying 1000 urls/sites from the telemetry experiment. Where are you getting 1 million documents?
Iteration: 39.2 - 23 Mar → 39.3 - 30 Mar
Iteration: 39.3 - 30 Mar → 40.1 - 13 Apr
Iteration: 40.1 - 13 Apr → 40.2 - 27 Apr
The requirements for this bug have shifted to providing audience estimates for buckets containing URLs. The goal is to provide business development with an estimate of audience size, with a percentage probability (MaxP), for a given set of URLs.

High level requirements:
Using a combination of available URL traffic tracking sources like Alexa, ComScore and SimilarSites, provide an interface that:
1) Allows selection of one or a set of subdomains and domains
2) Outputs an estimated audience size in unique visitors
   - If more than one domain is queried, display both duplicated and unduplicated unique visitors
3) Outputs the probability (MaxP) of a user visiting at least one of the domains
4) Outputs a list of similar sites

Good to have:
- Ability to incorporate Firefox specific audience traffic data
- Ability to save audience bucket names and domains to query at a later time
- Audience buckets and domain impression estimates
- Use structured hierarchical category taxonomy
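One way the duplicated and unduplicated figures in these requirements could be estimated is sketched below. It assumes that visits to different domains are statistically independent and that a total population size is known. Neither assumption comes from this bug, and real traffic sources may overlap far more than independence suggests. The `estimate_bucket` name and the example numbers are hypothetical:

```python
from math import prod


def estimate_bucket(unique_visitors, population):
    """Estimate audience size for a bucket of domains.

    unique_visitors: dict mapping domain -> unique visitors in the period.
    population: total users that could have visited (assumed known).
    """
    if population <= 0:
        raise ValueError("population must be positive")
    # Per-domain reach: the chance a random user visited that domain.
    reach = [v / population for v in unique_visitors.values()]
    # ASSUMPTION: independence between domains, so the chance of visiting
    # at least one is 1 minus the chance of visiting none of them.
    p_any = 1.0 - prod(1.0 - r for r in reach)
    return {
        "duplicated": sum(unique_visitors.values()),  # counts overlaps twice
        "unduplicated": p_any * population,           # each user counted once
        "p_any": p_any,
    }


if __name__ == "__main__":
    bucket = {"a.example": 200, "b.example": 500}
    print(estimate_bucket(bucket, population=1000))
```

The duplicated count is a plain sum, so it is an upper bound on the unduplicated count. Co-occurrence data, where available, would replace the independence assumption.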
Blocks: 1140185
Summary: Inventory Projection via Classification Abstraction → Audience buckets estimator tool
Iteration: 40.2 - 27 Apr → 40.3 - 11 May
Background from the Metrics team on how initial audience buckets estimation was created.
Iteration: 40.3 - 11 May → 41.1 - May 25
No longer blocks: 1140185
Iteration: 41.1 - May 25 → 41.2 - Jun 8
Iteration: 41.2 - Jun 8 → 41.3 - Jun 29
Summary: Audience buckets estimator tool → Audience buckets estimator tool (Bucketerer)
Depends on: 1168414
Depends on: 1173569
Closing this as resolved fixed. If there are any follow-up bugs for Bucketerer, please file them as separate bugs.
Status: NEW → RESOLVED
Closed: 9 years ago
Resolution: --- → FIXED