XPCOMSpinEventLoopStack annotation misalignment with main thread status
Categories
(Core :: XPCOM, defect, P2)
Tracking
()
Tracking | Status | |
---|---|---|
firefox-esr91 | --- | unaffected |
firefox94 | --- | unaffected |
firefox95 | --- | wontfix |
firefox96 | --- | fixed |
People
(Reporter: jstutte, Assigned: jstutte)
References
(Blocks 1 open bug, Regression)
Details
(Keywords: regression)
Attachments
(3 files)
In some crash reports we see a misalignment between the annotation and the real status of the main thread. While the annotation suggests to be stuck in some SpinEventLoopUntil
, we actually seem to just sit and wait in the main event loop.
Some examples are:
https://crash-stats.mozilla.org/report/index/349cb5a1-4bfd-4c94-ba86-552450211112
https://crash-stats.mozilla.org/report/index/5d4233b5-109a-4ccd-8fa1-e45150211111
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 1•3 years ago
|
||
It is probably worth noting, that these are not random reports. There seems to be a stable relation between the annotation's origin and the nature of the stack, that is: For the same annotation I either always find a SpinEventLoopUntil
on the stack (like in bug 1740889) or never (like in bug 1740899).
Nika, all the instances I've seen failing this way so far seemed to be on content
processes (though there are also working examples from content processes). Ignoring how crash annotations really work, is there any chance we have a fallible way of reporting changes to them in content processes? Is this going through the parent process somehow?
Updated•3 years ago
|
Comment 2•3 years ago
|
||
Set release status flags based on info from the regressing bug 1731564
Assignee | ||
Updated•3 years ago
|
Updated•3 years ago
|
Comment 3•3 years ago
|
||
:jstutte do you have a severity and priority for this bug?
Assignee | ||
Comment 4•3 years ago
|
||
Severity for end users is very low, but it could help to diagnose other more nasty bugs. If we can't rely on the annotation (we put quite some effort in it), this would be bad for those bugs.
Comment 5•3 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #1)
Nika, all the instances I've seen failing this way so far seemed to be on
content
processes (though there are also working examples from content processes). Ignoring how crash annotations really work, is there any chance we have a fallible way of reporting changes to them in content processes? Is this going through the parent process somehow?
It looks like the failures which you're seeing where the stack doesn't line up are all IPCError-browser | ShutDownKill
failures, which are caused when a bug causes the parent process to kill the child process during shutdown. In these cases the minidump etc. are generated by the parent process on behalf of the killed child process. I wouldn't be surprised at all if there was a race between when we read the annotations sent by the content process and when we capture the content process' stacks (and I expect that annotations in content processes are sent somewhat asynchronously anyway), so it makes sense that we'd get this annotation out of sync with the captured stack there.
Assignee | ||
Comment 6•3 years ago
|
||
Updated•3 years ago
|
Assignee | ||
Comment 7•3 years ago
|
||
(In reply to Nika Layzell [:nika] (ni? for response) from comment #5)
I wouldn't be surprised at all if there was a race between when we read the annotations sent by the content process and when we capture the content process' stacks (and I expect that annotations in content processes are sent somewhat asynchronously anyway), so it makes sense that we'd get this annotation out of sync with the captured stack there.
Assuming we cannot force a flush of this information from the content process side when unwinding the stack the best we can do is probably just to document this caveat.
Assignee | ||
Comment 8•3 years ago
|
||
I just learned we can filter on crashstats through proto signature
in order to have only "valid" cases. Thanks Randell!
Assignee | ||
Comment 9•3 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #8)
I just learned we can filter on crashstats through
proto signature
in order to have only "valid" cases. Thanks Randell!
Be aware that this can filter out valid cases where SpinEventLoopUntil
is inlined (which happens on some stacks, it seems).
Comment 10•3 years ago
|
||
Comment 11•3 years ago
|
||
bugherder |
Assignee | ||
Comment 12•3 years ago
|
||
We see the annotation for "IPC Launch" unexpectedly on the child process despite we know that it lives only in the parent process. The doubt would be that we actually see the parent process' one here.
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 13•3 years ago
|
||
We suspect to see the parent processes annotation on crash reports from child processes that result from IPC shutdown kills. In order to reduce the confusion we add a prefix for the process type.
Comment 14•3 years ago
|
||
The patch landed in nightly and beta is affected.
:jstutte, is this bug important enough to require an uplift?
If not please set status_beta
to wontfix
.
For more information, please visit auto_nag documentation.
Assignee | ||
Comment 15•3 years ago
|
||
Actually we do not think this is yet fixed at all.
Comment 16•3 years ago
|
||
Comment 17•3 years ago
|
||
bugherder |
Assignee | ||
Comment 18•3 years ago
|
||
Just to follow up here: We have the smoking gun that we actually see the parent process' annotation on crashing child processes if they do not have it.
I assume we should amend our documentation we just changed here in that sense, too.
Assignee | ||
Comment 19•3 years ago
|
||
Actually for the XPCOMSpinEventLoopStack
annotation I would consider this to be a feature that lets us peek into the parent process' state when the IPCError-browser | ShutDownKill
occurs. So while it feels a bit quirky to see parent process' annotations on child process reports, I hesitate to file a bug for it...
Assignee | ||
Comment 20•3 years ago
|
||
Assignee | ||
Updated•3 years ago
|
Comment 21•3 years ago
|
||
(In reply to Jens Stutte [:jstutte] from comment #19)
Actually for the
XPCOMSpinEventLoopStack
annotation I would consider this to be a feature that lets us peek into the parent process' state when theIPCError-browser | ShutDownKill
occurs. So while it feels a bit quirky to see parent process' annotations on child process reports, I hesitate to file a bug for it...
It might be worth adding some form of attribute on the crash annotations specifying whether you want to have this behaviour at least, even if we decide to turn it on for XPCOMSpinEventLoopStack
. I suppose the optimal situation would be where the source process of an annotation is included in the crash report, so in cases like this we can see both, but I imagine that'll be a lot of work for the crash reporting team.
Comment 22•3 years ago
|
||
Assignee | ||
Updated•3 years ago
|
Assignee | ||
Comment 23•3 years ago
|
||
(In reply to Nika Layzell [:nika] (ni? for response) from comment #21)
It might be worth adding some form of attribute on the crash annotations specifying whether you want to have this behaviour at least, even if we decide to turn it on for
XPCOMSpinEventLoopStack
. I suppose the optimal situation would be where the source process of an annotation is included in the crash report, so in cases like this we can see both, but I imagine that'll be a lot of work for the crash reporting team.
I filed bug 1742836 for this. It is definitely too confusing to remain just as is, IMHO.
Comment 24•3 years ago
|
||
bugherder |
Updated•3 years ago
|
Description
•