Closed Bug 1280630 Opened 8 years ago Closed 8 years ago

Add telemetry probes to know which back-end is used the most

Categories

(Core :: Audio/Video: cubeb, defect, P2)

defect

Tracking

()

RESOLVED FIXED
mozilla50
Tracking Status
firefox50 --- fixed

People

(Reporter: padenot, Assigned: padenot)

References

Details

Attachments

(1 file)

For example, we want to know how often Vista/7/8/10 machines fallback to winmm, or how much alsa users there is compared to pulse on Linux.
This will allow us to answer the following:

- Do we get a lot of audio failures
- Is winmm important to maintain even if we don't support XP
- Is Alsa widely used on Linux
- Is audiotrack still necessary on Android

Review commit: https://reviewboard.mozilla.org/r/59686/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/59686/
Attachment #8763506 - Flags: review?(kinetik)
Comment on attachment 8763506 [details]
Bug 1280630 - Add telemetry probes to know which cubeb back-end is used the most.

Benjamin, we have a number of audio backend that can be used in Firefox, to play back and record audio, we'd like to know if some users are still using the old backends (for example because of bugs on our side or weird hardware), or if we can safely remove old code, and where we should focus our attention.

This is sending back a enum that is the audio back-end in use by Firefox, and is collected the first time the application process is trying to open an audio stream. Failures to get access to the system-level audio subsystem is also recorded.
Attachment #8763506 - Flags: feedback?(benjamin)
Assignee: nobody → padenot
Rank: 25
Priority: -- → P2
Attachment #8763506 - Flags: review?(kinetik)
Comment on attachment 8763506 [details]
Bug 1280630 - Add telemetry probes to know which cubeb back-end is used the most.

https://reviewboard.mozilla.org/r/59686/#review56814

::: dom/media/CubebUtils.cpp:179
(Diff revision 1)
> +    return nullptr;
> +  }
> +
> +  for (uint32_t i = 0; i < ArrayLength(BACKEND_ID_LENGTH); i++) {
> +    if (!strcmp(cubeb_get_backend_id(sCubebContext), backend_id_str[i])) {
> +      Telemetry::Accumulate(Telemetry::AUDIOSTREAM_BACKEND_USED, i);

If this was a keyed histogram we could simply pass the string from cubeb_get_backend_id without maintaining a second table of strings.

::: toolkit/components/telemetry/Histograms.json:98
(Diff revision 1)
>      "high": 10000,
>      "n_buckets": 50,
>      "description": "The length of time (in milliseconds) for the subsequent opens of AudioStream."
>    },
> +  "AUDIOSTREAM_BACKEND_USED": {
> +    "alert_emails": ["padenot@mozilla.com"],

Add me to the alerts, please.
Comment on attachment 8763506 [details]
Bug 1280630 - Add telemetry probes to know which cubeb back-end is used the most.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/59686/diff/1-2/
Attachment #8763506 - Flags: feedback?(benjamin) → review?(kinetik)
Comment on attachment 8763506 [details]
Bug 1280630 - Add telemetry probes to know which cubeb back-end is used the most.

(see comment 2)
Attachment #8763506 - Flags: feedback?(benjamin)
Comment on attachment 8763506 [details]
Bug 1280630 - Add telemetry probes to know which cubeb back-end is used the most.

https://reviewboard.mozilla.org/r/59686/#review56858
Attachment #8763506 - Flags: review?(kinetik) → review+
I don't see why a keyed histogram is required in this case. Keyed histograms are far more expensive to aggregate, if you have a known enumeration of possible items. This should be an enumerated histogram instead. You can leave a few "extra" values in the enumeration to account for possible future backends.
Attachment #8763506 - Flags: feedback?(benjamin) → feedback-
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #7)
> I don't see why a keyed histogram is required in this case. Keyed histograms
> are far more expensive to aggregate, if you have a known enumeration of
> possible items. This should be an enumerated histogram instead. You can
> leave a few "extra" values in the enumeration to account for possible future
> backends.

The docs on MDN (https://developer.mozilla.org/en-US/docs/Mozilla/Performance/Adding_a_new_Telemetry_probe) should probably discourage the use of keyed histograms, then, because the description and example seems to match the use case here quite well.

Paul, if you remove the comment about the table requiring synchonization with the table in cubeb.c and alter the search loop to report backends that aren't in the table as "unknown" I'm fine with the old patch.  I'm concerned that the chance of the tables staying in sync is low given the code lives in different repos.  Alternatively, we can change cubeb's API to return an enum rather than a freeform string to avoid having two tables.
Thinking about it a bit more, I think we want to report the backend ID after the first attempt to create a stream, because we want to differentiate between cases where the fallback also doesn't work and when it does.  For example, on Windows, if there's no audio hardware we'll "fall back" to WinMM, but creating a stream with that backend would also fail.  We need to differentiate that case from a successful fallback to WinMM, otherwise the stats for fallbacks will be artificially high.
Comment on attachment 8763506 [details]
Bug 1280630 - Add telemetry probes to know which cubeb back-end is used the most.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/59686/diff/2-3/
Attachment #8763506 - Attachment description: Bug 1280630 - Add telemetry probes to know which back-end is used the most. → Bug 1280630 - Add telemetry probes to know which cubeb back-end is used the most.
Attachment #8763506 - Flags: feedback-
Hrm this needs a little bit more work.
Comment on attachment 8763506 [details]
Bug 1280630 - Add telemetry probes to know which cubeb back-end is used the most.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/59686/diff/3-4/
Comment on attachment 8763506 [details]
Bug 1280630 - Add telemetry probes to know which cubeb back-end is used the most.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/59686/diff/4-5/
So, this reverts back to using indices. It also add three index that don't correspond to a particular back-end:
- Unknown backend
- Failure to open a stream, the first time a stream is ever being opened in this session
- Failure to open a stream, after a first stream has been successfully opened.
Comment on attachment 8763506 [details]
Bug 1280630 - Add telemetry probes to know which cubeb back-end is used the most.

Benjamin, here is an updated patch. It's collecting the same things (see comment 2), but with a enumerated histogram instead of a keyed histogram.
Attachment #8763506 - Flags: feedback?(benjamin)
Telemetry that expires "never" and is also opt-out has a fairly high bar. Can you explain:

* The value to users that is associated with collecting this data?
* Who is responsible for generating the reports?
* Who is responsible for monitoring the data "forever" and how often that will happen?

Also on a more technical note, please improve this description: "The audio backend used in cubeb for audio playback and input.". Reading this, it's impossible to know what the collected numbers mean. Can we either list them in this description, or reference some enumeration definition in the source code?
Flags: needinfo?(padenot)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #18)
> Telemetry that expires "never" and is also opt-out has a fairly high bar.
> Can you explain:
> 
> * The value to users that is associated with collecting this data?

This lets us know:
- Whether some user experience issues with a particular audio back-end, and fall back to a back-end that is older. For example, we suspect that some users that should be on the WASAPI back-end (that is modern and low latency) are falling back to the WINMM back-end, that is higher latency and less maintained. The people working on Chrome told us they have this issue, but we can't really take action before measuring.
- Whether some back-end that we still maintain are used, and how often (for example, old Linux and Android APIs).
- Whether some user experience failure to open audio streams _after_ some streams have been opened successfully, hinting at driver issues or leaks or the like. We know some VM setups have this issue, but we don't know how often it happens in the wild.

This let us invest developer time where it's really needed instead of guessing. For now, we're investing a lot on Windows' most modern back-end (and have mostly stopped working on the old back-end), because it seems that it is where the users are, but I would be better to have hard data.

> * Who is responsible for generating the reports?

I suppose myself and/or Matthew Gregan (the two people listed in the definition of the histogram).

> * Who is responsible for monitoring the data "forever" and how often that
> will happen?

That was not written anywhere indeed, but the reasoning here is to check the data periodically, and specifically just after the a new version of an operating system has been released.

For example, the last version of OSX (El Capitan) did break a number of software because some audio hardware were made incompatible. Windows 8 introduced new restrictions compared to Windows 7 on a specific API, that might have changed the numbers. We can't know for certain before measuring.

In addition to OS and other software updates, we know that some USB headsets cause issues with Firefox, and could hit the market anytime.

> Also on a more technical note, please improve this description: "The audio
> backend used in cubeb for audio playback and input.". Reading this, it's
> impossible to know what the collected numbers mean. Can we either list them
> in this description, or reference some enumeration definition in the source
> code?

Absolutely, I can rephrase this and point to the enum in the code that explains what the numbers mean. Would:

"The operating system audio back-end used when successfully opening an audio stream, or whether the failure occurred on the first try or not"

be more appropriate ? I tried to model this message from other messages in `Histograms.json`, but I end up with long sentences, not sure if this is a problem.

In addition, I set it "opt-out" because I'm afraid that there is a non-trivial bias between the population that opt in and opt out of those measurements, but I think I should have asked you whether this is the case in practice (or even, if we know that this is the case in practice). Do we have some documentation on the matter ? I think we could make it opt-in if my concerns are unfounded.
Flags: needinfo?(padenot) → needinfo?(benjamin)
> This lets us know:
> - Whether some user experience issues with a particular audio back-end, and
> fall back to a back-end that is older. For example, we suspect that some
> users that should be on the WASAPI back-end (that is modern and low latency)
> are falling back to the WINMM back-end, that is higher latency and less
> maintained. The people working on Chrome told us they have this issue, but
> we can't really take action before measuring.

So you don't need to correlate this against other failure metrics: it's simply enough to know that a user is on the winmm backend?

> - Whether some back-end that we still maintain are used, and how often (for
> example, old Linux and Android APIs).
> - Whether some user experience failure to open audio streams _after_ some
> streams have been opened successfully, hinting at driver issues or leaks or
> the like. We know some VM setups have this issue, but we don't know how
> often it happens in the wild.

That's a pretty clear user benefit. It sounds like you'll have to do some custom data querying and analysis: please file bugs and let RyanVM know if you need training or mentoring for this (we're eager to help).

> > * Who is responsible for monitoring the data "forever" and how often that
> > will happen?
> 
> That was not written anywhere indeed, but the reasoning here is to check the
> data periodically, and specifically just after the a new version of an
> operating system has been released.
> 
> For example, the last version of OSX (El Capitan) did break a number of
> software because some audio hardware were made incompatible. Windows 8
> introduced new restrictions compared to Windows 7 on a specific API, that
> might have changed the numbers. We can't know for certain before measuring.
> 
> In addition to OS and other software updates, we know that some USB headsets
> cause issues with Firefox, and could hit the market anytime.

This sounds a lot more speculative: in particular, even if you started to see changes here, we don't have enough data to correlate it to particular headsets or driver updates or anything like that. Are you sure that doing this properly is worth the effort?

If so, you probably need to do some data engineering to actually alert on changes to these metrics and distributions on a fairly regularly basis (daily or at most weekly). There is no pre-built system to do that currently.

If you can commit to that, I'm happy to approve this as expires: never. Otherwise, I suggest we start out with expires: 55 which gives you six months to examine the data and decide whether it's valuable over a longer period.
 
> "The operating system audio back-end used when successfully opening an audio
> stream, or whether the failure occurred on the first try or not"
> 
> be more appropriate ? I tried to model this message from other messages in
> `Histograms.json`, but I end up with long sentences, not sure if this is a
> problem.

Yes, that's better, if you also link to the enumeration. Long sentences are not a problem at all: they show up properly wrapped on telemetry.mozilla.org and in the autogenerated docs we're working on.

> In addition, I set it "opt-out" because I'm afraid that there is a
> non-trivial bias between the population that opt in and opt out of those
> measurements, but I think I should have asked you whether this is the case
> in practice (or even, if we know that this is the case in practice). Do we
> have some documentation on the matter ? I think we could make it opt-in if
> my concerns are unfounded.

There is a bias of rather unknown amounts. Be aware that the opt-in/prerelease population is reported on telemetry.mozilla.org, but the release population is not (for cost reasons), so you'll have to generate your own reports using either sql.telemetry.mozilla.org or analysis.telemetry.mozilla.org
Flags: needinfo?(benjamin)
(In reply to Benjamin Smedberg  [:bsmedberg] from comment #20)
> So you don't need to correlate this against other failure metrics: it's
> simply enough to know that a user is on the winmm backend?

Yes, for a first cut. We simply want to know if this fallback happens, and in general, what is the distribution between back-ends on each platform.

> That's a pretty clear user benefit. It sounds like you'll have to do some
> custom data querying and analysis: please file bugs and let RyanVM know if
> you need training or mentoring for this (we're eager to help).

Sure, I'll let you or him know.

> This sounds a lot more speculative: in particular, even if you started to
> see changes here, we don't have enough data to correlate it to particular
> headsets or driver updates or anything like that. Are you sure that doing
> this properly is worth the effort?

Well, we don't know how much of a problem we have, so it's a bit premature to put in some other reporting mechanism (à la gfx).

> If so, you probably need to do some data engineering to actually alert on
> changes to these metrics and distributions on a fairly regularly basis
> (daily or at most weekly). There is no pre-built system to do that currently.
> 
> If you can commit to that, I'm happy to approve this as expires: never.
> Otherwise, I suggest we start out with expires: 55 which gives you six
> months to examine the data and decide whether it's valuable over a longer
> period.

I suppose five release is a good start, I'll amend the patch.

I can't commit to do a manual task very often, but I can write some code that do it for me. However I don't want to duplicate some work that is already happening, and would probably be better than my hacked up solution, do we want to work on automatic reporting or something? What are the emails in `Histograms.json` for ?

> > "The operating system audio back-end used when successfully opening an audio
> > stream, or whether the failure occurred on the first try or not"
> > 
> > be more appropriate ? I tried to model this message from other messages in
> > `Histograms.json`, but I end up with long sentences, not sure if this is a
> > problem.
> 
> Yes, that's better, if you also link to the enumeration. Long sentences are
> not a problem at all: they show up properly wrapped on telemetry.mozilla.org
> and in the autogenerated docs we're working on.

Right, I'll put in the link to our enum.

> > In addition, I set it "opt-out" because I'm afraid that there is a
> > non-trivial bias between the population that opt in and opt out of those
> > measurements, but I think I should have asked you whether this is the case
> > in practice (or even, if we know that this is the case in practice). Do we
> > have some documentation on the matter ? I think we could make it opt-in if
> > my concerns are unfounded.
> 
> There is a bias of rather unknown amounts. Be aware that the
> opt-in/prerelease population is reported on telemetry.mozilla.org, but the
> release population is not (for cost reasons), so you'll have to generate
> your own reports using either sql.telemetry.mozilla.org or
> analysis.telemetry.mozilla.org

That's fine, I sure I can figure it out or ask some questions. We're mostly interested in the release population because, again, of population bias.
Comment on attachment 8763506 [details]
Bug 1280630 - Add telemetry probes to know which cubeb back-end is used the most.

Review request updated; see interdiff: https://reviewboard.mozilla.org/r/59686/diff/5-6/
Attachment #8763506 - Flags: feedback?(benjamin)
Comment on attachment 8763506 [details]
Bug 1280630 - Add telemetry probes to know which cubeb back-end is used the most.

This includes the changes from comment 21:
- Activate the probe up to version 55
- Change the description and put a link to the enum
Attachment #8763506 - Flags: feedback?(benjamin)
Comment on attachment 8763506 [details]
Bug 1280630 - Add telemetry probes to know which cubeb back-end is used the most.

https://reviewboard.mozilla.org/r/59686/#review59748

data-review=me (I only reviewed the histogram definition, not the related code).
Attachment #8763506 - Flags: review+
Attachment #8763506 - Flags: feedback?(benjamin)
Pushed by paul@paul.cx:
https://hg.mozilla.org/integration/mozilla-inbound/rev/7ad580ff20db
Add telemetry probes to know which cubeb back-end is used the most. r=kinetik data-review=bsmedberg
https://hg.mozilla.org/mozilla-central/rev/7ad580ff20db
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla50
Depends on: 1286041
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: