Closed
Bug 1043669
Opened 10 years ago
Closed 10 years ago
Clarify what data to send on click vs view vs fetch
Categories
(Firefox :: New Tab Page, defect)
Firefox
New Tab Page
Tracking
()
RESOLVED
FIXED
People
(Reporter: Mardak, Unassigned)
References
(Blocks 1 open bug)
Details
Bug 975235 added per-click data for directory tiles:
- list id, e.g., server-hosted-list-1
- link index, e.g., 0 for facebook
- tile index, e.g., 0 for top left
- action, e.g., "click" for regular click for visit
- score, e.g., 1000 frecency
- pinned, e.g., true if pinned
Bug 993906 added per-day directory links fetch data:
- locale, e.g., en-US
- optional directoryCount, e.g., {sponsored:1,affiliate:2,organic:3} or undefined
Bug 1042214 will add per-view data for each shown tile:
- link type, e.g., history
- (tile index, e.g., 0 for top left -- implied through the position)
- score, e.g., 12345 frecency
- pinned, e.g., true if pinned
- url, e.g., http://foo.bar
Do we still need directoryCount with the daily fetch request? We get that through the view ping, but is it needed to serve the links data?
For the per-view url data, do we want to cleanse/filter out some urls on the client instead of the server? E.g., passwords or intranet hostnames?
Should the per-view send back "null" if the user set it to blank tiles? Should we distinguish it from "[]" if a user blocked out all tiles even directory tiles?
Reporter | ||
Comment 1•10 years ago
|
||
Sounds like we'll want HLL value from bug 972933 to be sent with the click and view pings. Not sure about fetch?
Blocks: 972933
Reporter | ||
Comment 2•10 years ago
|
||
To be clear, the HLL value is to get cardinality estimates because we don't want to have unique IDs to reidentify users even though partners would really want that ability.
Reporter | ||
Updated•10 years ago
|
Reporter | ||
Comment 3•10 years ago
|
||
clarkbw, I got an update from oyiptong that we'll need the HLL value from bug 972933. And that impression and click pings will look the same (sending all visible tiles data) except click pings will have an "action" value for the appropriate tile.
E.g.,
click ping = [{tile 1 url}, {tile 2 url, action = click}, {tile 3 url}]
click ping = [{tile 1 url}, {tile 2 url}, {tile 3 url, action = block}]
Comment 4•10 years ago
|
||
payload
-------
We went back to our measurement requirements and here is what we'd like to be sent on click and impression, as a POST request with a JSON payload:
{
locale: _, # locale for user agent
urls: [ # metadata for list of urls currently appearing in a newtab page, including history links
{
tile_id: _, # an id provided in the fetch payload, if a tile
url: _, # a url if a history tile
pos: _, # the position of the tile in the grid
pinned: _, # boolean, whether or not the tile is pinned
action: _ # one of ["click", "block", "pin"] or null
},
...
],
hll: { count: _, index: _ } # HLL parameters
}
Can this payload be sent both for impression and click?
For impression, the action will be absent for all urls.
url meta data
-------------
* if the tile is an enhanced history tile, it will have both the tile_id and url
* if the tile is a directory tile, it will only have a tile_id
* if the tile is a history tile, it will have just the url
source attribution
------------------
for tiles that are "enhanced", when a click happen, can we attach a header?
Referer: https://tiles.mozilla.org/?type=directory
Reporter | ||
Comment 5•10 years ago
|
||
(In reply to Olivier Yiptong [:oyiptong] from comment #4)
> source attribution
> for tiles that are "enhanced", when a click happen, can we attach a header?
> Referer: https://tiles.mozilla.org/?type=directory
You're saying if the original link, say bugzilla.mozilla.org enhanced by mozilla.org, the browser should load bugzilla.mozilla.org and set the HTTP Referer header? What is this for?
Reporter | ||
Comment 6•10 years ago
|
||
clarkbw, should I also remove the telemetry probes for shown (per index and per type) and click (per type)?
Flags: needinfo?(clarkbw)
Reporter | ||
Updated•10 years ago
|
Flags: needinfo?(clarkbw)
Comment 7•10 years ago
|
||
In the interest of optimizing the size of the message and making it a bit easier to parse, I suggest replacing the 'action' attribute in each URL. Instead we have a single attribute at the top level that indicates the action ('click', 'block', or 'pin') whose value is the *index* of the tile in the 'urls' array.
Comment 8•10 years ago
|
||
In the interest of optimizing the size of the message, we could remove the 'pos' attribute in the 'urls' array and just rely on the natural index of the tile within the array. Unless, of course this is a sparse array.
Reporter | ||
Comment 9•10 years ago
|
||
(In reply to Tim Spurway [:tspurway] from comment #7)
> I suggest replacing the 'action' attribute in each URL.
My current patch has "action" (as well as "id", "url", "pin") as optional. So only the one tile that has an action will have an "action" entry.
(In reply to Tim Spurway [:tspurway] from comment #8)
> In the interest of optimizing the size of the message, we could remove the
> 'pos' attribute in the 'urls' array and just rely on the natural index of
> the tile within the array. Unless, of course this is a sparse array.
Sounds good. The logic will be if 'pos' would be the same as the natural index, exclude it. E.g., [{id:"foo"},{url:"http://bar",pinned:1,pos:8}] for 2 tiles shown where the first tile is a directory tile and the second tile is a pinned history tile in the bottom right of a 3x3 grid.
Comment 10•10 years ago
|
||
(In reply to Ed Lee :Mardak from comment #5)
> (In reply to Olivier Yiptong [:oyiptong] from comment #4)
> > source attribution
> > for tiles that are "enhanced", when a click happen, can we attach a header?
> > Referer: https://tiles.mozilla.org/?type=directory
> You're saying if the original link, say bugzilla.mozilla.org enhanced by
> mozilla.org, the browser should load bugzilla.mozilla.org and set the HTTP
> Referer header? What is this for?
Yes. At some point, either Sean or Jason spoke about tile targets wanting to know that traffic came from Tiles. One way to do so would be to have a referrer url.
We can probably do that later, tracked by a separate bug. It's also not clear if this became a requirement.
Reporter | ||
Comment 11•10 years ago
|
||
oyiptong, I asked in bug 1042214 comment 14 about including the window client resolution (not the screen resolution) in pings. If we do want that data, should this just appear at the top level next to locale: en-US?
{
"locale": "en-US",
"height": 799,
"width": 1436,
"tiles": [..]
}
Or maybe "res": "1436x799" or "size": [1436,799] ?
Also, if we're saying view/click/fetch all share the same payload, this size information would be optional (in addition to tiles data optional) for fetch. (There's no associated window for fetch and there could be multiple monitors, etc.)
Comment 12•10 years ago
|
||
(In reply to Ed Lee :Mardak from comment #11)
> {
> "locale": "en-US",
> "height": 799,
> "width": 1436,
> "tiles": [..]
> }
>
> Or maybe "res": "1436x799" or "size": [1436,799] ?
>
separate height and width fields are OK.
> Also, if we're saying view/click/fetch all share the same payload, this size
> information would be optional (in addition to tiles data optional) for
> fetch. (There's no associated window for fetch and there could be multiple
> monitors, etc.)
OK. I will make the required changes in the inferno rules and Redshift schema.
Reporter | ||
Comment 13•10 years ago
|
||
clarkbw, I added 2 more action types for bug 1040369: sponsored (click or dismiss the sponsored explanation text), sponsored-link (click on the Learn more link within the explanation area). Do we care about them? If so, tspurway/oyiptong, we'll need to update rules/schema again.
Flags: needinfo?(clarkbw)
Reporter | ||
Comment 14•10 years ago
|
||
Sample payloads:
{"locale":"en-US","tiles":[{"id":5,"url":"https://www.mozilla.org/en-US/about/"},{"id":1},{"id":2},{"id":3},{"id":6},{"id":7},{"id":8},{"id":9},{"id":10}],"sponsored-link":3}
clicked "learn more..." for wired
{"locale":"en-US","tiles":[{"id":5,"url":"https://www.mozilla.org/en-US/about/"},{"id":1},{"id":2},{"id":3},{"id":6},{"id":7},{"id":8},{"id":9},{"id":10}],"sponsored":3}
clicked "sponsored" to show/hide the explanation text
Comment 15•10 years ago
|
||
(In reply to Ed Lee :Mardak from comment #14)
> Sample payloads:
>
> {"locale":"en-US","tiles":[{"id":5,"url":"https://www.mozilla.org/en-US/
> about/"},{"id":1},{"id":2},{"id":3},{"id":6},{"id":7},{"id":8},{"id":9},
> {"id":10}],"sponsored-link":3}
>
> clicked "learn more..." for wired
>
>
> {"locale":"en-US","tiles":[{"id":5,"url":"https://www.mozilla.org/en-US/
> about/"},{"id":1},{"id":2},{"id":3},{"id":6},{"id":7},{"id":8},{"id":9},
> {"id":10}],"sponsored":3}
>
> clicked "sponsored" to show/hide the explanation text
Can I make a request that it be "sponsored_link" instead of "sponsored-link" ? In general, it helps if we keep JSON properties legal identifiers so that we don't need to have a bunch of little transformations as it travels towards the data warehouse.
Reporter | ||
Comment 16•10 years ago
|
||
(In reply to Tim Spurway [:tspurway] from comment #15)
> Can I make a request that it be "sponsored_link" instead of "sponsored-link"
Not a problem. I'm guessing nothing is camelCase, so not sponsoredLink? Or slink / sLink?
> it helps if we keep JSON properties legal identifiers
Curious, what part of the data processing is constraining this identifier? JSON accepts any string I believe, so it could be "sponsored link" too.
Comment 17•10 years ago
|
||
(In reply to Ed Lee :Mardak from comment #16)
> > it helps if we keep JSON properties legal identifiers
> Curious, what part of the data processing is constraining this identifier?
> JSON accepts any string I believe, so it could be "sponsored link" too.
Well, it's not a huge problem, but it's convenient to reuse the JSON keys as column names in the database. It saves one line of code, in this case:
parts['sponsored_link'] = parts['sponsored-link']
The problem becomes more pronounced when there are many such transformations, type conversions, and non-standard naming across multiple log files.
Reporter | ||
Comment 18•10 years ago
|
||
(In reply to Tim Spurway [:tspurway] from comment #17)
> Well, it's not a huge problem, but it's convenient to reuse the JSON keys as
> column names in the database.
I was asking more to just figure out what the actual constraints are. So as table columns, I would assume they're case insensitive?
No "click" for link click and "Click" for sponsored link click? ;) :p So we'll want to avoid camelCasing too.
I've changed the client to use "sponsored_link" as the action.
Comment 19•10 years ago
|
||
(In reply to Ed Lee :Mardak from comment #18)
> I was asking more to just figure out what the actual constraints are. So as
> table columns, I would assume they're case insensitive?
>
> No "click" for link click and "Click" for sponsored link click? ;) :p So
> we'll want to avoid camelCasing too.
>
> I've changed the client to use "sponsored_link" as the action.
right. i would keep the following constraints regarding logfile key names:
- all lowercase
- delimit words with '_'s
- all dictionary keys should be valid identifiers in Javascript, Python and SQL (avoid reserved words if possible)
- name keys precisely. don't worry about key length, compression is your friend
- invent new keys *only* when you can't find a suitable key in your codebase, other logfiles, or database schemas project-wide
- mind your datatypes, use string types only for string data. (BAD: "impression_count": "15")
- consider if using JSON 'null' helps or hinders. an alternative is to make the key optional
- ISO dates are sortable as strings (eg. 2014-10-27)
- timestamps in milliseconds
Comment 20•10 years ago
|
||
(In reply to Ed Lee :Mardak from comment #13)
> clarkbw, I added 2 more action types for bug 1040369: sponsored (click or
> dismiss the sponsored explanation text), sponsored-link (click on the Learn
> more link within the explanation area). Do we care about them? If so,
> tspurway/oyiptong, we'll need to update rules/schema again.
Interesting, but not required right now.
Flags: needinfo?(clarkbw)
You need to log in
before you can comment on or make changes to this bug.
Description
•