Closed
Bug 1187270
Opened 9 years ago
Closed 9 years ago
Add Telemetry session id to crash annotations
Categories
(Toolkit :: Telemetry, defect, P2)
Toolkit
Telemetry
Tracking
()
People
(Reporter: rvitillo, Assigned: benjamin)
References
(Blocks 1 open bug)
Details
(Whiteboard: [unifiedTelemetry][measurement:client])
Attachments
(1 file, 4 obsolete files)
(deleted),
text/x-review-board-request
|
gfritzsche
:
review+
ritu
:
approval-mozilla-aurora+
ritu
:
approval-mozilla-beta+
|
Details |
Crash pings are hard to chain/join back with main pings as the info field is missing. Is there a reason why the info section is missing?
Reporter | ||
Updated•9 years ago
|
Flags: needinfo?(gfritzsche)
Comment 1•9 years ago
|
||
Crash pings are separate to the "main" pings and we don't store that session information with the crashes.
I suggest we add a crash annotation with the current session id.
From talking with Roberto, that should be enough and allows us to correlate the crash pings with the specific session they belong to.
Blocks: 1120356
Flags: needinfo?(gfritzsche)
Updated•9 years ago
|
Summary: Missing info field in telemetry crash pings. → Add Telemetry session id to crash annotations
Updated•9 years ago
|
Whiteboard: [unifiedTelemetry] [uplift3]
Comment 2•9 years ago
|
||
Attachment #8639433 -
Flags: review?(benjamin)
Updated•9 years ago
|
Assignee: nobody → gfritzsche
Status: NEW → ASSIGNED
Assignee | ||
Comment 3•9 years ago
|
||
We try not to correlate the crash reports with the telemetry data. It's fine to store it and send it as part of the "crash" telemetry ping, but it shouldn't be sent to crash-stats as part of the metadata.
Assignee | ||
Updated•9 years ago
|
Attachment #8639433 -
Flags: review?(benjamin) → review-
Comment 4•9 years ago
|
||
Good point. Also, we dropped the retentionDays parameter for now, so removing that here.
Attachment #8639819 -
Flags: review?(benjamin)
Updated•9 years ago
|
Attachment #8639433 -
Attachment is obsolete: true
Updated•9 years ago
|
Whiteboard: [unifiedTelemetry] [uplift3] → [unifiedTelemetry] [uplift4]
Assignee | ||
Comment 5•9 years ago
|
||
Comment on attachment 8639819 [details] [diff] [review]
Add Telemetry session id to crash annotations
The metadata is still included in the crash report though. You're only excluding it here from the crash manager.
Attachment #8639819 -
Flags: review?(benjamin) → review-
Updated•9 years ago
|
Iteration: --- → 42.3 - Aug 10
Comment 6•9 years ago
|
||
Ok, figured out my misconceptions. This now always filters out TelemetrySessionId fields from the extra data on crash submissions. I am a bit sad that we dont have proper end-to-end testing from crash generation to submission, but that would take more time to fix than i can justify now. So for now the test coverage is a bit... distributed.
Attachment #8641649 -
Flags: review?(benjamin)
Updated•9 years ago
|
Attachment #8639819 -
Attachment is obsolete: true
Updated•9 years ago
|
Updated•9 years ago
|
Whiteboard: [unifiedTelemetry] [uplift4] → [unifiedTelemetry] [uplift5]
Assignee | ||
Comment 7•9 years ago
|
||
Comment on attachment 8641649 [details] [diff] [review]
Add Telemetry session id to crash annotations
This still only deals with the CrashSubmit.jsm submission path which is primarily used for plugin/content crashes. The native client submission path would still have this.
Overall I think that this approach of annotating and then removing is fragile and we should do something different. Probably add the session ID specifically only to the event file here: http://mxr.mozilla.org/mozilla-central/source/toolkit/crashreporter/nsExceptionHandler.cpp#770
Attachment #8641649 -
Flags: review?(benjamin) → review-
Updated•9 years ago
|
Updated•9 years ago
|
Whiteboard: [unifiedTelemetry] [uplift5] → [unifiedTelemetry]
Comment 8•9 years ago
|
||
Dropping this for now over other work as it's not urgent.
Assignee: gfritzsche → nobody
Status: ASSIGNED → NEW
Updated•9 years ago
|
Priority: -- → P3
Updated•9 years ago
|
Points: --- → 2
Whiteboard: [unifiedTelemetry] → [unifiedTelemetry][measurement:client]
Updated•9 years ago
|
Priority: P3 → P4
Assignee | ||
Updated•9 years ago
|
Assignee: nobody → benjamin
Status: NEW → ASSIGNED
Priority: P4 → P2
Assignee | ||
Comment 9•9 years ago
|
||
Assignee | ||
Comment 10•9 years ago
|
||
Review commit: https://reviewboard.mozilla.org/r/39769/diff/#index_header
See other reviews: https://reviewboard.mozilla.org/r/39769/
Attachment #8730231 -
Flags: review?(gfritzsche)
Assignee | ||
Comment 11•9 years ago
|
||
Comment 12•9 years ago
|
||
Comment on attachment 8730231 [details]
MozReview Request: Bug 1187270 - Add Telemetry session ID to crash annotations, r?gfritzsche
https://reviewboard.mozilla.org/r/39769/#review36555
::: toolkit/components/crashes/CrashManager.jsm:546
(Diff revision 1)
> }
> delete reportMeta.TelemetryEnvironment;
> }
> + if ('TelemetrySessionId' in reportMeta) {
> + sessionId = reportMeta.TelemetrySessionId;
> + delete reportMeta.TelemetrySessionId;
Given your concerns about submitting the session id with crash reports; can we also assert that we are not submitting it with crash reports?
::: toolkit/components/telemetry/docs/crash-ping.rst:14
(Diff revision 1)
> The client ID is submitted with this ping.
>
> Structure::
>
> {
> version: 1,
Given that the addition is in the payload, we probably don't want to bump the version?
::: toolkit/components/telemetry/docs/crash-ping.rst:21
(Diff revision 1)
> ... common ping data
> clientId: <UUID>,
> environment: { ... },
> payload: {
> crashDate: "YYYY-MM-DD",
> + sessionId: <UUID>, // may be missing for crashes that happen early
Let's add a note below on what Firefox version this was added in.
Finding the right historic cutoffs will be a pain otherwise.
::: toolkit/components/telemetry/docs/crash-ping.rst:22
(Diff revision 1)
> clientId: <UUID>,
> environment: { ... },
> payload: {
> crashDate: "YYYY-MM-DD",
> + sessionId: <UUID>, // may be missing for crashes that happen early
> + // in startup
We should also update the schema:
https://github.com/mozilla-services/mozilla-pipeline-schemas/blob/master/telemetry/crash.schema.json
::: toolkit/crashreporter/nsExceptionHandler.cpp:912
(Diff revision 1)
> + WriteLiteral(eventFile, "TelemetrySessionId=");
> + WriteString(eventFile, currentSessionId);
> + WriteLiteral(eventFile, "\n");
This can use `WriteAnnotation()`.
::: toolkit/crashreporter/nsExceptionHandler.cpp:1943
(Diff revision 1)
> free(eventsDirectory);
> eventsDirectory = nullptr;
> }
>
> + if (currentSessionId) {
> + free(currentSessionId);
Our documentation says to use `NS_Free()` for `ToNewCstring()`:
https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Guide/Internal_strings#Getting_a_char_*_buffer_from_a_String
I see that this matches existing code here though.
Is this an oversight or is the documentation outdated?
::: toolkit/crashreporter/nsExceptionHandler.cpp:2664
(Diff revision 1)
> +{
> + if (!gExceptionHandler) {
> + return;
> + }
> + if (currentSessionId) {
> + free(currentSessionId);
Ditto on the `NS_Free()`.
This seems to potentially race with other uses of `currentSessionId`.
Are we sure those can't run concurrently/interleaved?
Currently we only have one call per session for `SetTelemetrySessionId()`, but this seems like a bad surprise to leave in.
::: toolkit/crashreporter/test/unit/test_crashreporter_crash.js:1
(Diff revision 1)
> +Components.utils.import("resource://gre/modules/TelemetrySession.jsm");
Unused import.
::: toolkit/crashreporter/test/unit/test_event_files.js:51
(Diff revision 1)
> Assert.equal(crashes.length, 1);
> let crash = crashes[0];
> Assert.ok(crash.isOfType(cm.PROCESS_TYPE_MAIN, cm.CRASH_TYPE_CRASH));
> Assert.equal(crash.id + ".dmp", basename, "ID recorded properly");
> Assert.equal(crash.metadata.ShutdownProgress, "event-test");
> + Assert.ok("TelemetrySessionId" in crash.metadata);
We could explicitly compare the session id here or at least check for a valid UUID format (to protect against junk or `null` values).
::: xpcom/system/nsICrashReporter.idl:130
(Diff revision 1)
> * @throws NS_ERROR_NOT_INITIALIZED if crash reporting is disabled.
> */
> void saveMemoryReport();
> +
> + /**
> + * Set the telemetry session ID which is recorded in crash metadata.
Lets add a note that this is not to be submitted with crash reports and only used for crash pings etc.
Attachment #8730231 -
Flags: review?(gfritzsche)
Updated•9 years ago
|
Attachment #8641649 -
Attachment is obsolete: true
Updated•9 years ago
|
Attachment #8729014 -
Attachment is obsolete: true
Assignee | ||
Comment 13•9 years ago
|
||
https://reviewboard.mozilla.org/r/39769/#review36555
> Given that the addition is in the payload, we probably don't want to bump the version?
I don't know what bumping the version would mean in practice.
> Our documentation says to use `NS_Free()` for `ToNewCstring()`:
> https://developer.mozilla.org/en-US/docs/Mozilla/Tech/XPCOM/Guide/Internal_strings#Getting_a_char_*_buffer_from_a_String
> I see that this matches existing code here though.
>
> Is this an oversight or is the documentation outdated?
NS_Free is equivalent to free() when inside libxul. It's only an issue if you're writing external code and need the consistent heap. So we actually prefer free() for internal use now.
> Ditto on the `NS_Free()`.
>
> This seems to potentially race with other uses of `currentSessionId`.
> Are we sure those can't run concurrently/interleaved?
>
> Currently we only have one call per session for `SetTelemetrySessionId()`, but this seems like a bad surprise to leave in.
We don't have a safe way to protect from this kind of threading hazard. You can't lock because that would just deadlock. So we live with the slight risk.
Assignee | ||
Comment 14•9 years ago
|
||
Comment on attachment 8730231 [details]
MozReview Request: Bug 1187270 - Add Telemetry session ID to crash annotations, r?gfritzsche
Review request updated; see interdiff: https://reviewboard.mozilla.org/r/39769/diff/1-2/
Attachment #8730231 -
Flags: review?(gfritzsche)
Comment 15•9 years ago
|
||
https://reviewboard.mozilla.org/r/39769/#review36555
> I don't know what bumping the version would mean in practice.
I think this is only interesting right now if we would make `sessionId` required.
As it is optional i think we can just leave it.
> NS_Free is equivalent to free() when inside libxul. It's only an issue if you're writing external code and need the consistent heap. So we actually prefer free() for internal use now.
Ok, updated the wiki.
> We don't have a safe way to protect from this kind of threading hazard. You can't lock because that would just deadlock. So we live with the slight risk.
What about an `atomic.exchange(nullptr)` using `Atomics.h`?
Assignee | ||
Comment 16•9 years ago
|
||
That won't help. The only race here is if we crash while we are setting the session ID. And in that case, using an atomic-set or atomic exchange doesn't make a difference because the word-size pointer set is atomic on all our platforms anyway.
Comment 17•9 years ago
|
||
Comment on attachment 8730231 [details]
MozReview Request: Bug 1187270 - Add Telemetry session ID to crash annotations, r?gfritzsche
https://reviewboard.mozilla.org/r/39769/#review36689
Attachment #8730231 -
Flags: review?(gfritzsche) → review+
Comment 18•9 years ago
|
||
Comment 19•9 years ago
|
||
Build bustage:
/toolkit/crashreporter/nsExceptionHandler.cpp(912) : error C2780: 'void CrashReporter::WriteAnnotation(CrashReporter::PlatformWriter &,const char (&)[N],const char *)' : expects 3 arguments - 2 provided
Sample log:
http://archive.mozilla.org/pub/firefox/tinderbox-builds/mozilla-inbound-win64/1458081472/mozilla-inbound-win64-bm74-build1-build333.txt.gz
Backed out:
https://hg.mozilla.org/integration/mozilla-inbound/rev/e9a5d52d88dd
Flags: needinfo?(benjamin)
Comment 20•9 years ago
|
||
Assignee | ||
Updated•9 years ago
|
Flags: needinfo?(benjamin)
Comment 21•9 years ago
|
||
bugherder |
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
status-firefox48:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Assignee | ||
Comment 22•9 years ago
|
||
Comment on attachment 8730231 [details]
MozReview Request: Bug 1187270 - Add Telemetry session ID to crash annotations, r?gfritzsche
Approval Request Comment
[Feature/regressing bug #]: Not a regression, but a necessary part of the crash analysis for bug 1257321 to implement the new telemetry-based stability dashboards
[Describe test coverage new/current, TreeHerder]: manual and automated tests
[Risks and why]: relatively low risk. Very isolated code paths. The main risk is that we're changing nsICrashReporter, but we're doing that at the end of a vtable so it's unlikely to matter.
[String/UUID change made/needed]: UUID changes are no longer done, but this does add a method to the end of nsICrashReporter.
Attachment #8730231 -
Flags: approval-mozilla-beta?
Attachment #8730231 -
Flags: approval-mozilla-aurora?
status-firefox46:
--- → affected
status-firefox47:
--- → affected
Comment on attachment 8730231 [details]
MozReview Request: Bug 1187270 - Add Telemetry session ID to crash annotations, r?gfritzsche
Another improvement to the crash metadata that will help us make more informed decisions about release quality and severity of crashes. Aurora47+, Beta46+
Attachment #8730231 -
Flags: approval-mozilla-beta?
Attachment #8730231 -
Flags: approval-mozilla-beta+
Attachment #8730231 -
Flags: approval-mozilla-aurora?
Attachment #8730231 -
Flags: approval-mozilla-aurora+
Comment 24•9 years ago
|
||
bugherder uplift |
Comment 25•9 years ago
|
||
bugherder uplift |
You need to log in
before you can comment on or make changes to this bug.
Description
•