Closed
Bug 723802
Opened 13 years ago
Closed 13 years ago
Add telemetry for startup crash detection
Categories
(Toolkit :: Telemetry, enhancement)
Toolkit
Telemetry
Tracking
()
RESOLVED
FIXED
mozilla13
People
(Reporter: MattN, Assigned: MattN)
References
Details
Attachments
(2 files, 1 obsolete file)
(deleted),
patch
|
taras.mozilla
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Dolske
:
review+
taras.mozilla
:
review+
|
Details | Diff | Splinter Review |
Startup crash detection was added in bug 294260. I think the following pieces of telemetry data would be useful:
1) The uptime when start crash tracking began and ended.
** This is so we can measure the proportion of startup crashes that the feature
detects by comparing with uptime data in crash-stats.
2) Whether the user was forced into safe mode by the startup crash detection.
** This would let us know how useful this feature is: to know if/when people are
experiencing N consecutive startup crashes (where N is currently 3).
3) Whether the last startup was a crash.
** This would only be recorded if we didn't crash on the following startup since
it seems telemetry data is only saved at shutdown. Could I have telemetry
save to disk in this case?
** It would be nice to know the number of consecutive startup crashes but then
we'd be counting users more than once with every additional crash.
There are problems collecting some of this data if there is a crash before telemetry is sent or persisted on disk. Any suggestions?
Note that the startup crash detection may detect some crashes which breakpad doesn't but it can only increment the counter if startup reaches the point where it begins tracking. #2 & #3 will be useful to compare to Socorro data.
Is it necessary to call accumulate with a false value for boolean telemetry when we just want to know how often something happens ie. #2? From what I can tell it should be enough to just report the interesting (true) case and then compare that to the number of telemetry reports for the time period. This is what Telemetry::A11Y_INSTANTIATED currently does but it leads to useless bar graphs showing 100% of reporters with it on without showing the total number of reports that didn't specify a value. It seems like boolean telemetry probes should have a default value. The advantage of only reporting in the true case is that it has no perf. impact.
Comment 1•13 years ago
|
||
Filed bug 723846 for making boolean histograms more useful.
We have bug 719167 for saving telemetry more often during a session.
It's certainly possible to save telemetry at other, specific times--I guess that's what you're asking for in #3?
Assignee | ||
Comment 2•13 years ago
|
||
This is to be able to compare to Socorro data (like bug 722566) to see how effective the crash detection is and to be able to compare crash trends between the two data sources.
Attachment #595566 -
Flags: review?(taras.mozilla)
Comment 3•13 years ago
|
||
is the intention here to correlate data points across two data sets (telemetry and socorro) or just monitor trends in a similar fashion?
Assignee | ||
Comment 4•13 years ago
|
||
(In reply to Sid Stamm [:geekboy] from comment #3)
> is the intention here to correlate data points across two data sets
> (telemetry and socorro) or just monitor trends in a similar fashion?
There will be no direct correlation so we would compare the aggregate results to get an approximation for the proportion of startup crashes detected. Then we would compare trends for other uses such as detecting a spike in crashes that Breakpad/Soccorro may not catch.
You can see in the patch we're just reporting two new timestamps.
Comment 5•13 years ago
|
||
Comment on attachment 595566 [details] [diff] [review]
v.1 Implement #1 - timeline portion
This patch needs a comment explaining what startupCrashDetectionBegin/startupCrashDetectionEnd are for. Is the goal to determine how much time elapses between begin/end, whether it sometimes begins but doesn't end?
Attachment #595566 -
Flags: review?(taras.mozilla)
Assignee | ||
Comment 6•13 years ago
|
||
(In reply to Taras Glek (:taras) from comment #5)
Do you prefer the comment in the header or with the Record() calls?
The goal is in comment 0 and comment 2 but I'm not sure if you want me to elaborate here or you just want those questions answered in the code comment.
If startupCrashDetectionEnd is not called then that means we crashed in which case telemetry wouldn't be sent. The elapsed time is not as important as the specific times.
Is there a separate privacy review that is needed for new telemetry or is this bug sufficient?
Comment 7•13 years ago
|
||
(In reply to Matthew N. [:MattN] from comment #6)
> (In reply to Taras Glek (:taras) from comment #5)
>
> Do you prefer the comment in the header or with the Record() calls?
>
> The goal is in comment 0 and comment 2 but I'm not sure if you want me to
> elaborate here or you just want those questions answered in the code comment.
Code comment in the header.
I'm not clear how having these timestamps will help you make decisions.
>
> If startupCrashDetectionEnd is not called then that means we crashed in
> which case telemetry wouldn't be sent. The elapsed time is not as important
> as the specific times.
>
> Is there a separate privacy review that is needed for new telemetry or is
> this bug sufficient?
This is good.
Keywords: privacy-review-needed
Assignee | ||
Comment 8•13 years ago
|
||
(In reply to Taras Glek (:taras) from comment #7)
> I'm not clear how having these timestamps will help you make decisions.
Here are some more details for usage of the timestamps:
Assumption: The StartupTimeline timestamps and Breakpad uptime both record t=0 at about the same time.
Let's define a startup crash for now as a crash within the first minute of uptime.
Scenario 1 (example):
* Socorro data shows that 80% of startup crashes happen with uptime <= 4s.
* Telemetry data shows startupCrashDetectionBegin happens 60% of the time with uptime > 4s.
Potential action: Improve startup crash detection to begin tracking earlier since a large proportion of startup crashes are not being detected (so UI won't be shown to users to offer help).
Assignee | ||
Comment 9•13 years ago
|
||
Attachment #595566 -
Attachment is obsolete: true
Attachment #595601 -
Flags: review?(taras.mozilla)
Updated•13 years ago
|
Attachment #595601 -
Flags: review?(taras.mozilla) → review+
this will help us pinpoint performance problems, the opt-in covers this; removing the privacy flag
Keywords: privacy-review-needed
Assignee | ||
Comment 11•13 years ago
|
||
Attachment #604623 -
Flags: review?(taras.mozilla)
Attachment #604623 -
Flags: review?(dolske)
Comment 12•13 years ago
|
||
Comment on attachment 604623 [details] [diff] [review]
v.1 Implement #2+3. STARTUP_CRASH_DETECTED + SAFE_MODE_USAGE
Review of attachment 604623 [details] [diff] [review]:
-----------------------------------------------------------------
::: toolkit/xre/nsXREDirProvider.cpp
@@ +785,1 @@
>
I'd consider breaking this out into a conditional, just to make it slightly less magical...
int mode = 1;
if (gSafeMode)
if (safeModeNecessary)
mode = 3
else
mode = 2
m::T::A::(blah, mode)
But no biggie either way.
Attachment #604623 -
Flags: review?(dolske) → review+
Comment 13•13 years ago
|
||
Comment on attachment 604623 [details] [diff] [review]
v.1 Implement #2+3. STARTUP_CRASH_DETECTED + SAFE_MODE_USAGE
kudos for using the new flag histograms. bug 732053 will make the linear histogram easier to specify in the future.
Attachment #604623 -
Flags: review?(taras.mozilla) → review+
Assignee | ||
Comment 14•13 years ago
|
||
https://hg.mozilla.org/integration/mozilla-inbound/rev/cdad5c4bfad7
https://hg.mozilla.org/integration/mozilla-inbound/rev/d228248c1b5a
(In reply to Taras Glek (:taras) from comment #13)
> kudos for using the new flag histograms. bug 732053 will make the linear
> histogram easier to specify in the future.
I was the one who brought up the boolean/flag issue so it only makes sense :)
I'm glad we're going to fix linear histograms since the current behaviour was non-obvious especially because the 0 bucket also shows when inspecting the histogramSnapshots property.
Flags: in-testsuite-
Comment 15•13 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/cdad5c4bfad7
https://hg.mozilla.org/mozilla-central/rev/d228248c1b5a
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla13
You need to log in
before you can comment on or make changes to this bug.
Description
•