Document sources of non-deterministic data collection in FOG
Categories: Toolkit :: Telemetry, task, P3
People: Reporter: chutten, Assigned: chutten
References: Blocks 1 open bug
Attachments: 1 file (deleted), text/x-phabricator-request
Due to its design, FOG (Firefox on Glean) has some non-deterministic behaviour that may influence how much data, and what kinds of data, will be reported.
- IPC flushes on idle, meaning it might miss data from sessions that have fewer idle periods.
- IPC should also be triggered when a process is being taken down in an orderly fashion. File a bug for this, or fix it here.
- Ping-lifetime data is held in-memory (since bug 1729723) and is persisted to the db on idle and on shutdown, meaning a crash in a session with fewer idle periods would result in lost data.
To the best of our ability, we should instrument and document these and other sources of data unreliability, to build confidence in our data collection system in Firefox Desktop.
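To make the failure modes listed above concrete, here is a toy model, offered as a hedged sketch rather than a description of FOG's internals. Every name in it (ChildProcess, ParentProcess, flush_and_persist, run_session) is hypothetical and is not FOG's or Glean's API. It also assumes, for simplicity, that the idle IPC flush and the ping-lifetime persist happen together. It shows that a sample survives a crash only if an idle period (or an orderly shutdown) happened after it was recorded.

```python
from dataclasses import dataclass, field


@dataclass
class ChildProcess:
    """Hypothetical child process: buffers samples until an IPC flush."""
    pending: list = field(default_factory=list)

    def record(self, sample):
        self.pending.append(sample)

    def flush_ipc(self):
        # Hand the buffered batch to the parent and start a fresh buffer.
        batch, self.pending = self.pending, []
        return batch


@dataclass
class ParentProcess:
    """Hypothetical parent: holds ping-lifetime data in memory until persisted."""
    in_memory: list = field(default_factory=list)
    db: list = field(default_factory=list)

    def flush_and_persist(self, child):
        # Pull child data in over "IPC" first, so the persist below includes it.
        self.in_memory.extend(child.flush_ipc())
        self.db.extend(self.in_memory)
        self.in_memory.clear()


def run_session(events):
    """Replay a session's events; return what reached the simulated db."""
    child, parent = ChildProcess(), ParentProcess()
    for event in events:
        if event in ("idle", "shutdown"):
            # Both idle periods and an orderly shutdown trigger flush + persist.
            parent.flush_and_persist(child)
        elif event == "crash":
            # Anything not yet persisted is lost; nothing after a crash runs.
            break
        else:
            child.record(event)
    return parent.db


print(run_session(["a", "idle", "b", "crash"]))  # ['a']
print(run_session(["a", "b", "shutdown"]))       # ['a', 'b']
print(run_session(["a", "b", "crash"]))          # []
```

Flushing IPC data before persisting, inside the same step, mirrors the ordering concern raised later in this bug: if the persist runs first, the child's final batch misses it.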
Comment 1 • 3 years ago (Assignee)
Updated • 3 years ago
Comment 3 • 3 years ago (Assignee)
Marking leave-open, as the landing patch takes care of:
> IPC should also be triggered when a process is being taken down in an orderly fashion. File a bug for this, or fix it here.
but doesn't otherwise instrument or document non-determinism.
Comment 4 • 3 years ago
Backed out for causing frequent AddressSanitizer failures.
Failure log Wc jobs
Failure log WdH1 jobs
Failure log wpt jobs
Failure log Wr jobs
Failure log Wd jobs
Comment 5 • 3 years ago (Assignee)
Dang. It looks like my "while we're here" addendum to the patch, which ensures that delayed ping-lifetime I/O is persisted after the IPC data has been flushed in, has found another way to hit bug 1731595 (the fix is still pending a release and a vendoring before it lands in mozilla-central).
I can take it out and file a follow-up to put it back in. However, if the failure is truly caused by persisting after shutdown, then some of the IPC data we flush on idle is going to miss the bus, which means we may want to rethink how we schedule these at-shutdown flushes.
Comment 7 • 3 years ago
bugherder
Updated • 3 years ago
Comment 8 • 3 years ago (Assignee)
Florian learned that some instrumentation in content child processes comes in too late to be recorded. We should document exactly how late is too late for our subprocess support, and what we do when data arrives too late (which is, for the most part, nothing).
Comment 9 • 2 years ago
The leave-open keyword is set and there has been no activity for 6 months.
:chutten, maybe it's time to close this bug?
For more information, see the auto_nag documentation.
Comment 10 • 2 years ago (Assignee)
Alrighty, auto-nag, you win this one. I still hope to document this in the dev docs, but the non-deterministicity (not a word) of data collection is set to be improved by bug 1641989 in the not-too-distant future, and we haven't bemoaned the lack of documentation yet... so maybe this work isn't needed after all.
Updated • 2 years ago