Closed Bug 1743454 Opened 3 years ago Closed 3 years ago

Ensure GPU process crashes get reported on Android

Categories

(GeckoView :: Sandboxing, task)

Unspecified
Android

Tracking

(firefox97 fixed)

RESOLVED FIXED
97 Branch
Tracking Status
firefox97 --- fixed

People

(Reporter: jnicol, Assigned: jnicol)

References

Details

Attachments

(4 files)

Currently on desktop, GPU process crashes get detected and we generate a crash report here[1]. I don't know the crash reporting code at all, but somehow I think these get reported automatically in the background.

I don't believe that is the case on geckoview. We call GenerateCrashReport as on desktop, but I don't think any reports get submitted automatically. From reading the code it looks to me like we need to execute this code to launch the crash reporter service.

We can do so by:

  • In GPUChild::ActorDestroy(), pass a dumpId parameter to GenerateCrashReport() so we know the ID of the generated dump.
  • Stuff that dumpId in a property bag that gets sent to the compositor:process-aborted observers
  • Add a GpuCrashHandler.jsm very similar to the existing ContentCrashHandler.jsm. Make it observe compositor:process-aborted rather than ipc:content-shutdown. And emit GeckoView:GpuCrashReport rather than GeckoView:ContentCrashReport
  • In GeckoRuntime handle GeckoView:GpuCrashReport messages by launching the crash reporter service.

Does that sound reasonable to you, Agi?

I've tested this works in a local build of Fenix. It shows the same full-screen page that it does for a content process crash (because EXTRA_CRASH_FATAL is false). I guess it's more a question for the Fenix team what UI they want, but perhaps from the gecko side there could be a field to say a crash is even less fatal than a content crash. Fenix could then show a less invasive dialog, or just submit in the background.

[1] I believe we may actually have a bug here. Sometimes GPUChild::ActorDestroy does indeed get called with aWhy == AbnormalShutdown, and we generate the crash report as described. However, sometimes a different actor gets destroyed first, such as CompositorBridgeChild::ActorDestroy(). This calls GPUProcessManager::NotifyRemoteActorDestroyed(), which tears down the GPU process. GPUChild::ActorDestroy consequently ends up getting called with aWhy == **Normal**Shutdown, so we do not generate the crash report.

Flags: needinfo?(agi)

And emit GeckoView:GpuCrashReport rather than GeckoView:ContentCrashReport

this is for the codereview, but I think we can emit the same event? something like GeckoView:ChildProcessCrashReport, and maybe even reuse ContentCrashHandler.jsm.

I've tested this works in a local build of Fenix. It shows the same full-screen page that it does for a content process crash (because EXTRA_CRASH_FATAL is false). I guess it's more a question for the Fenix team what UI they want, but perhaps from the gecko side there could be a field to say a crash is even less fatal than a content crash. Fenix could then show a less invasive dialog, or just submit in the background.

Yeah I think we need a new field, something like EXTRA_CRASH_BACKGROUND perhaps? because Gecko can just restart the GPU process and everything will keep working as usual, right? I think all Fenix would want to do is display a notification that a background process crashed (or perhaps not even that)

Does that sound reasonable to you, Agi?

in general this all sounds reasonable to me

Flags: needinfo?(agi)

(In reply to Agi Sferro | :agi | ni? for questions | ⏰ PST | he/him from comment #1)

this is for the codereview, but I think we can emit the same event? something like GeckoView:ChildProcessCrashReport, and maybe even reuse ContentCrashHandler.jsm.

Sounds good. Yes we should be able to use a shared ChildCrashHandler.jsm.

Yeah I think we need a new field, something like EXTRA_CRASH_BACKGROUND perhaps? because Gecko can just restart the GPU process and everything will keep working as usual, right? I think all Fenix would want to do is display a notification that a background process crashed (or perhaps not even that)

Yes, from the user's perspective they'll usually see a brief flicker and then everything will resume as usual. A new background field sounds good. I'll update the existing Parent and Content crash tests to expect background == false, and add a GPU crash test which expects fatal == false && background == true.

Currently the geckoview crash reporter has an EXTRA_CRASH_FATAL
argument, which indicates whether a crash is fatal, ie it occured in
the parent process and cannot be recovered from. Crashes in content
processes are by contrast deemed non-fatal, as the application can
recover from them.

With the upcoming GPU process on Android we will start collecting GPU
process crashes. These are certainly not fatal, and in fact should
barely be noticed by users. Applications may want a way to distinguish
between non-fatal but prominent content process crashes, and barely
noticeable GPU process crashes, so that they can avoid showing an
intrusive UI for GPU process crashes.

This patch adds a EXTRA_CRASH_BACKGROUND argument to crash intents,
that is used to specify whether a crash occured in a background
process and would have been barely noticeable to the user. It also
updates the existing ContentCrashTest and ParentCrashTest to both
expect the background value to be false.

Assignee: nobody → jnicol
Status: NEW → ASSIGNED

Rename ContentCrashHandler.jsm to ChildCrashHandler.jsm as it is now
responsible for all types of child process crashes. Have it observe
"compositor:process-aborted" in addition to
"ipc:content-shutdown". Additionally, rename the
"GeckoView:ContentCrashReport" event it sends to
"GeckoView:ChildCrashReport".

In GPUChild::ActorDestroy, provide an out variable for
GenerateCrashReport to return the dump ID, and stuff this in to a
property bag, along with "abnormal: true", sent to
"compositor:process-aborted" observers.

In ChildCrashHandler, set the "background" argument sent with the
GeckoView:ChildCrashReport event to true for GPU process crashes, and
false otherwise.

Depends on D132809

GPU process crash reports are handled by calling GenerateCrashReport()
in GPUChild::ActorDestroy() if the reason is AbnormalShutdown. This
ensures we only create crash report if the process actually crashed,
and not when it was deliberately stopped.

However, sometimes actors other than GPUChild are the first to be
destroyed immediately after a crash, for example CompositorBridgeChild
or UiCompositorControllerChild. If such an actor receives an
ActorDestroy message with AbnormalShutdown as the reason, they will
call GPUProcessManager::NotifyRemoteActorDestroyed(), which leads to
GPUProcessHost::Shutdown(), which will close the PGPU channel. This
creates a race condition after a GPU process crash, where sometimes
the channel gets closed gracefully and ActorDestroy will receive a
NormalShutdown reason rather than AbnormalShutdown.

This patch adds a flag to GPUProcessHost::Shutdown() indicating
whether it is being called in response to an unexpected shutdown being
detected by another actor. If set, it sets a flag on the
GPUChild. When GPUChild::ActorDestroy() eventually gets called, it
knows to act in response to a crash if either the reason is
AbnormalShutdown or this flag has been set.

Depends on D132810

Add a function to GPUProcessManager to force the GPU process to crash,
and expose it through gfxInfo. Expose this to geckoview tests via the
test-support webextension.

Add a junit test GpuCrashTest, which triggers a GPU process crash and
ensures the crash reporter was notified.

Additionally, ensure the TestCrashHandler service is stopped in
between tests, as otherwise only the first crash test to run will be
notified of the crash.

Depends on D132811

Attachment #9253636 - Attachment description: Bug 1743454 - Add EXTRA_CRASH_BACKGROUND argument to GeckoView crash reporter. r?agi → Bug 1743454 - Add EXTRA_CRASH_PROCESS_TYPE argument to GeckoView crash reporter. r?agi
Pushed by jnicol@mozilla.com: https://hg.mozilla.org/integration/autoland/rev/4e4775abdc17 Add EXTRA_CRASH_PROCESS_TYPE argument to GeckoView crash reporter. r=agi,gsvelto https://hg.mozilla.org/integration/autoland/rev/bc587029a021 Notify GeckoView crash handler of GPU process crashes. r=agi,aosmond https://hg.mozilla.org/integration/autoland/rev/452eef165731 Ensure GPU process crash reports are generated regardless of which IPC actor dies first. r=aosmond https://hg.mozilla.org/integration/autoland/rev/49bfa085c1ac Add junit test to ensure GPU process crash triggers GeckoView crash reporter. r=agi,aosmond

Moving GPU process bugs to the new GeckoView::Sandboxing component.

Component: General → Sandboxing
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: