Ensure GPU process crashes get reported on Android
Categories
(GeckoView :: Sandboxing, task)
Tracking
(firefox97 fixed)
Tracking | Status | |
---|---|---|
firefox97 | --- | fixed |
People
(Reporter: jnicol, Assigned: jnicol)
References
Details
Attachments
(4 files)
Currently on desktop, GPU process crashes get detected and we generate a crash report here[1]. I don't know the crash reporting code at all, but somehow I think these get reported automatically in the background.
I don't believe that is the case on geckoview. We call GenerateCrashReport
as on desktop, but I don't think any reports get submitted automatically. From reading the code it looks to me like we need to execute this code to launch the crash reporter service.
We can do so by:
- In GPUChild::ActorDestroy(), pass a
dumpId
parameter toGenerateCrashReport()
so we know the ID of the generated dump. - Stuff that
dumpId
in a property bag that gets sent to thecompositor:process-aborted
observers - Add a
GpuCrashHandler.jsm
very similar to the existingContentCrashHandler.jsm
. Make it observecompositor:process-aborted
rather thanipc:content-shutdown
. And emitGeckoView:GpuCrashReport
rather thanGeckoView:ContentCrashReport
- In GeckoRuntime handle
GeckoView:GpuCrashReport
messages by launching the crash reporter service.
Does that sound reasonable to you, Agi?
I've tested this works in a local build of Fenix. It shows the same full-screen page that it does for a content process crash (because EXTRA_CRASH_FATAL
is false). I guess it's more a question for the Fenix team what UI they want, but perhaps from the gecko side there could be a field to say a crash is even less fatal than a content crash. Fenix could then show a less invasive dialog, or just submit in the background.
[1] I believe we may actually have a bug here. Sometimes GPUChild::ActorDestroy
does indeed get called with aWhy == AbnormalShutdown
, and we generate the crash report as described. However, sometimes a different actor gets destroyed first, such as CompositorBridgeChild::ActorDestroy()
. This calls GPUProcessManager::NotifyRemoteActorDestroyed()
, which tears down the GPU process. GPUChild::ActorDestroy
consequently ends up getting called with aWhy == **Normal**Shutdown
, so we do not generate the crash report.
Comment 1•3 years ago
|
||
And emit GeckoView:GpuCrashReport rather than GeckoView:ContentCrashReport
this is for the codereview, but I think we can emit the same event? something like GeckoView:ChildProcessCrashReport
, and maybe even reuse ContentCrashHandler.jsm
.
I've tested this works in a local build of Fenix. It shows the same full-screen page that it does for a content process crash (because EXTRA_CRASH_FATAL is false). I guess it's more a question for the Fenix team what UI they want, but perhaps from the gecko side there could be a field to say a crash is even less fatal than a content crash. Fenix could then show a less invasive dialog, or just submit in the background.
Yeah I think we need a new field, something like EXTRA_CRASH_BACKGROUND
perhaps? because Gecko can just restart the GPU process and everything will keep working as usual, right? I think all Fenix would want to do is display a notification that a background process crashed (or perhaps not even that)
Does that sound reasonable to you, Agi?
in general this all sounds reasonable to me
Assignee | ||
Comment 2•3 years ago
|
||
(In reply to Agi Sferro | :agi | ni? for questions | ⏰ PST | he/him from comment #1)
this is for the codereview, but I think we can emit the same event? something like
GeckoView:ChildProcessCrashReport
, and maybe even reuseContentCrashHandler.jsm
.
Sounds good. Yes we should be able to use a shared ChildCrashHandler.jsm
.
Yeah I think we need a new field, something like
EXTRA_CRASH_BACKGROUND
perhaps? because Gecko can just restart the GPU process and everything will keep working as usual, right? I think all Fenix would want to do is display a notification that a background process crashed (or perhaps not even that)
Yes, from the user's perspective they'll usually see a brief flicker and then everything will resume as usual. A new background
field sounds good. I'll update the existing Parent and Content crash tests to expect background == false
, and add a GPU crash test which expects fatal == false && background == true
.
Assignee | ||
Comment 3•3 years ago
|
||
Currently the geckoview crash reporter has an EXTRA_CRASH_FATAL
argument, which indicates whether a crash is fatal, ie it occured in
the parent process and cannot be recovered from. Crashes in content
processes are by contrast deemed non-fatal, as the application can
recover from them.
With the upcoming GPU process on Android we will start collecting GPU
process crashes. These are certainly not fatal, and in fact should
barely be noticed by users. Applications may want a way to distinguish
between non-fatal but prominent content process crashes, and barely
noticeable GPU process crashes, so that they can avoid showing an
intrusive UI for GPU process crashes.
This patch adds a EXTRA_CRASH_BACKGROUND argument to crash intents,
that is used to specify whether a crash occured in a background
process and would have been barely noticeable to the user. It also
updates the existing ContentCrashTest and ParentCrashTest to both
expect the background value to be false.
Updated•3 years ago
|
Assignee | ||
Comment 4•3 years ago
|
||
Rename ContentCrashHandler.jsm to ChildCrashHandler.jsm as it is now
responsible for all types of child process crashes. Have it observe
"compositor:process-aborted" in addition to
"ipc:content-shutdown". Additionally, rename the
"GeckoView:ContentCrashReport" event it sends to
"GeckoView:ChildCrashReport".
In GPUChild::ActorDestroy, provide an out variable for
GenerateCrashReport to return the dump ID, and stuff this in to a
property bag, along with "abnormal: true", sent to
"compositor:process-aborted" observers.
In ChildCrashHandler, set the "background" argument sent with the
GeckoView:ChildCrashReport event to true for GPU process crashes, and
false otherwise.
Depends on D132809
Assignee | ||
Comment 5•3 years ago
|
||
GPU process crash reports are handled by calling GenerateCrashReport()
in GPUChild::ActorDestroy() if the reason is AbnormalShutdown. This
ensures we only create crash report if the process actually crashed,
and not when it was deliberately stopped.
However, sometimes actors other than GPUChild are the first to be
destroyed immediately after a crash, for example CompositorBridgeChild
or UiCompositorControllerChild. If such an actor receives an
ActorDestroy message with AbnormalShutdown as the reason, they will
call GPUProcessManager::NotifyRemoteActorDestroyed(), which leads to
GPUProcessHost::Shutdown(), which will close the PGPU channel. This
creates a race condition after a GPU process crash, where sometimes
the channel gets closed gracefully and ActorDestroy will receive a
NormalShutdown reason rather than AbnormalShutdown.
This patch adds a flag to GPUProcessHost::Shutdown() indicating
whether it is being called in response to an unexpected shutdown being
detected by another actor. If set, it sets a flag on the
GPUChild. When GPUChild::ActorDestroy() eventually gets called, it
knows to act in response to a crash if either the reason is
AbnormalShutdown or this flag has been set.
Depends on D132810
Assignee | ||
Comment 6•3 years ago
|
||
Add a function to GPUProcessManager to force the GPU process to crash,
and expose it through gfxInfo. Expose this to geckoview tests via the
test-support webextension.
Add a junit test GpuCrashTest, which triggers a GPU process crash and
ensures the crash reporter was notified.
Additionally, ensure the TestCrashHandler service is stopped in
between tests, as otherwise only the first crash test to run will be
notified of the crash.
Depends on D132811
Updated•3 years ago
|
Comment 8•3 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/4e4775abdc17
https://hg.mozilla.org/mozilla-central/rev/bc587029a021
https://hg.mozilla.org/mozilla-central/rev/452eef165731
https://hg.mozilla.org/mozilla-central/rev/49bfa085c1ac
Comment 9•2 years ago
|
||
Moving GPU process bugs to the new GeckoView::Sandboxing component.
Description
•