Closed Bug 1297556 Opened 8 years ago Closed 8 years ago

Add telemetry to know how many crashes happens in decoding process.

Categories

(Firefox for Android Graveyard :: Audio/Video, defect, P2)

defect

Tracking

(firefox52 fixed)

RESOLVED FIXED
Firefox 52
Tracking Status
firefox52 --- fixed

People

(Reporter: bwu, Assigned: JamesCheng)

References

Details

Attachments

(1 file)

Via telemetry, we should be able to know how many crashes happening in the "decoding process" and we successfully prevent the main process from dying together. This could be doable by checking the death notification from binder.
Blocks: 1257777
Priority: -- → P2
I'm interested to doing this. I take this bug to my working queue.
Assignee: nobody → jacheng
I plan to add the telemetry to record the crash case on both side Decoding Service Process and Main Fennec App when decoding. Do we need to record other information besides crashing on Fennec?
Flags: needinfo?(bwu)
Hi Blake, What do you think whom shall we email to when we got crash issues? like this part https://dxr.mozilla.org/mozilla-central/rev/7e873393cc11d326338779e5a3ed2da031e30936/toolkit/components/telemetry/Histograms.json#6204 Thanks.
(In reply to James Cheng[:JamesCheng] from comment #2) > I plan to add the telemetry to record the crash case on both side Decoding > Service Process and Main Fennec App when decoding. > > Do we need to record other information besides crashing on Fennec? I think at least we should know the number of crashes that we avoid successfully. (In reply to James Cheng[:JamesCheng] from comment #3) > Hi Blake, > What do you think whom shall we email to when we got crash issues? How about John and you? :-)
Flags: needinfo?(bwu)
Attachment #8790564 - Attachment is obsolete: true
Attachment #8790564 - Flags: review?(jolin)
Attachment #8790564 - Flags: review?(benjamin)
Hi bsmedberg, Please help to review the telemetry part, thank you. Hi jolin, Please help to review the patch if I add at the correct point to collect the crash happened. Thanks.
Comment on attachment 8790564 [details] Bug 1297556 - Add telemetry to know how many crashes happens in decoding process. https://reviewboard.mozilla.org/r/78326/#review77080 ::: toolkit/components/telemetry/Histograms.json:6207 (Diff revision 2) > + "MEDIA_APP_CRASH_WHILE_DECODING": { > + "alert_emails": ["bwu@mozilla.com", "jolin@mozilla.com", "jacheng@mozilla.com"], > + "expires_in_version": "57", > + "kind": "count", > + "bug_numbers": [1297556, 1257777], > + "description": "Number of users who encountered fennec app crash while decoding media content.", This is a data review only: I don't know enough about android data collection to be sure this works. This description is not very accurate. This histogram by itself doesn't record the number of *users*: it records a count of crashes. So please rewrite this to describe what is recorded and when. So I'd suggest something like: "Records a value each time Fennec crashes while decoding." But also, what does "crash" mean in this context? Does the FxAndroid app actually stop? If so, how can we record reliable telemetry? I strongly suggest making this description much more detailed. Similar in the second histogram below. Is there a Fennec stability/quality dashboard that you plan to hook this data up to? If so, what's the bug# for that work? r- I'd like to see this again with fixes.
Attachment #8790564 - Flags: review?(benjamin) → review-
(In reply to Benjamin Smedberg [:bsmedberg] from comment #8) > Comment on attachment 8790564 [details] > Bug 1297556 - Add telemetry to know how many crashes happens in decoding > process. > > https://reviewboard.mozilla.org/r/78326/#review77080 > > ::: toolkit/components/telemetry/Histograms.json:6207 > (Diff revision 2) > > + "MEDIA_APP_CRASH_WHILE_DECODING": { > > + "alert_emails": ["bwu@mozilla.com", "jolin@mozilla.com", "jacheng@mozilla.com"], > > + "expires_in_version": "57", > > + "kind": "count", > > + "bug_numbers": [1297556, 1257777], > > + "description": "Number of users who encountered fennec app crash while decoding media content.", > > This is a data review only: I don't know enough about android data > collection to be sure this works. > > This description is not very accurate. This histogram by itself doesn't > record the number of *users*: it records a count of crashes. So please > rewrite this to describe what is recorded and when. > Sorry , I will modify as your suggestion below. > So I'd suggest something like: "Records a value each time Fennec crashes > while decoding." But also, what does "crash" mean in this context? Does the > FxAndroid app actually stop? If so, how can we record reliable telemetry? I > strongly suggest making this description much more detailed. > Android has provided such mechanism [1] for developer that we can be notified when other process which we communicated with is been killed. We have suffered from crashing since we use Android stagefright which is not maintained by Mozilla. The crash means the process is actually dead and we can get the notification by another process which is still alive. [1] DeathRecipient https://developer.android.com/reference/android/os/IBinder.DeathRecipient.html [2] linkToDeath(IBinder.DeathRecipient, int) https://developer.android.com/reference/android/os/IBinder.html#linkToDeath(android.os.IBinder.DeathRecipient, int) > Similar in the second histogram below. > > Is there a Fennec stability/quality dashboard that you plan to hook this > data up to? If so, what's the bug# for that work? > No, we just only want to record how many crashes happened on decoding process or Fennec app itself since we have a recovery mechanism to re-launch the decoding process to continue decoding. We want to know how many user will benefit from this. > r- I'd like to see this again with fixes.
(In reply to James Cheng[:JamesCheng] from comment #7) > Hi jolin, > Please help to review the patch if I add at the correct point to collect the > crash happened. Yes, recording in binderDied() is fine. However, I'm not sure whether tracking the number of app process crashes is helpful here. That's what the crash reports are for, right?
Based on comment 11, It is not necessary to record App crashing. I removed the code and update the patch. Please help to review again. Thank you!
Comment on attachment 8790564 [details] Bug 1297556 - Add telemetry to know how many crashes happens in decoding process. https://reviewboard.mozilla.org/r/78326/#review78414 Recording crash in `binderDied()` is fine, thank you.
Attachment #8790564 - Flags: review?(jolin)
Comment on attachment 8790564 [details] Bug 1297556 - Add telemetry to know how many crashes happens in decoding process. https://reviewboard.mozilla.org/r/78326/#review78432
Attachment #8790564 - Flags: review+
Comment on attachment 8790564 [details] Bug 1297556 - Add telemetry to know how many crashes happens in decoding process. https://reviewboard.mozilla.org/r/78326/#review78532 data-review=me ::: toolkit/components/telemetry/Histograms.json:6204 (Diff revision 4) > "bug_numbers": [1262659] > - > }, > + "MEDIA_DECODING_PROCESS_CRASH": { > + "alert_emails": ["bwu@mozilla.com", "jolin@mozilla.com", "jacheng@mozilla.com"], > + "expires_in_version": "57", If this data turns out to be correct, we should do additional work to permanently monitor and alert on it as part of overall stability metrics. Can you please file a followup bug on this?
Attachment #8790564 - Flags: review?(benjamin) → review+
Thanks for reviewing and I will file a followup bug later.
Keywords: checkin-needed
Pushed by cbook@mozilla.com: https://hg.mozilla.org/integration/fx-team/rev/6fb92ddf4857 Add telemetry to know how many crashes happens in decoding process. r=bsmedberg, r=jolin
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → FIXED
Target Milestone: --- → Firefox 52
Product: Firefox for Android → Firefox for Android Graveyard
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: