Closed Bug 1779425 Opened 2 years ago Closed 2 years ago

UI freezes and the GPU process crashes when enabled

Categories

(Core :: Graphics: WebRender, defect)

Firefox 104
defect

Tracking

()

VERIFIED FIXED
104 Branch
Tracking Status
firefox-esr91 --- unaffected
firefox-esr102 --- unaffected
firefox102 --- unaffected
firefox103 --- disabled
firefox104 --- fixed

People

(Reporter: tgnff242, Assigned: rmader)

References

(Regression)

Details

(Keywords: nightly-community, regression)

Attachments

(1 file)

User Agent: Mozilla/5.0 (X11; Linux x86_64; rv:104.0) Gecko/20100101 Firefox/104.0

Steps to reproduce:

  1. Enable the GPU process (layers.gpu-process.enabled:true).
  2. Visit a site with WebGL, like https://webglsamples.org/aquarium/aquarium.html.

Actual results:

The GPU process crashes, the browser UI and content freeze and WebGL fails.

Expected results:

GPU crash report: https://crash-stats.mozilla.org/report/index/bb81a472-c961-4dbd-b200-7c65c0220713

Sometimes the browser does not resume quickly enough (or at all). Here's a crash report taken after sending SIGABRT to the main process in that case: https://crash-stats.mozilla.org/report/index/286eac31-97b0-431a-8c09-f49bb0220713

mozregression result:
https://hg.mozilla.org/integration/autoland/pushloghtml?fromchange=f745c54f85264261404d0be6dd14627788ff6485&tochange=ea595572b3922c6ae4e034b6af82c05750b692f4

Has STR: --- → yes
Regressed by: 1778114

Set release status flags based on info from the regressing bug 1778114

:stransky, since you are the author of the regressor, bug 1778114, could you take a look?
For more information, please visit auto_nag documentation.

Flags: needinfo?(stransky)

This is a regression from Bug 1776563 - we don't create dmabuf device in GetDMABufDevice()->GetGbmDevice() but in gfxPlatform which is apparently not initialized in GPU process. Robert, any idea?

Flags: needinfo?(stransky) → needinfo?(robert.mader)
Regressed by: 1776563
No longer regressed by: 1778114

Triage - setting severity to S3, since this is only enabled on nightly currently.

Severity: -- → S3

The bug has a release status flag that shows some version of Firefox is affected, thus it will be considered confirmed.

Status: UNCONFIRMED → NEW
Ever confirmed: true

(In reply to Ashley Hale from comment #4)

Triage - setting severity to S3, since this is only enabled on nightly currently.

Is that really so? I see the GPU process disabled by default on Nightly (clean profile). It needs to be enabled in about:config.

(In reply to Martin Stránský [:stransky] (ni? me) from comment #3)

This is a regression from Bug 1776563 - we don't create dmabuf device in GetDMABufDevice()->GetGbmDevice() but in gfxPlatform which is apparently not initialized in GPU process. Robert, any idea?

Ah yes, makes sense.

After bug 1776563 we only configure the dmabuf device in gfxPlatformGtk::InitDmabufConfig(), which is only called on the parent process.

Maybe we should add something like

if (XRE_IsParentProcess()) {
  // (leave as it is)
} else if (gfxVars::UseDMABuf()) {
  nsCString failureId;
  GetDMABufDevice()->Configure(failureId);
}

Maybe even add a check for the GPU or RDD process - IIUC those are the only processes where we want drm device access, right?
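The process check suggested here could be sketched roughly as follows. This is only an illustration of the idea in this comment: ProcessKind and ShouldConfigureDMABufLocally are hypothetical names, not Gecko APIs, and the real code would query the process type through XRE helpers instead.

```cpp
// Hypothetical process kinds for illustration; not Gecko's actual
// process-type enum.
enum class ProcessKind { Parent, Content, GPU, RDD };

// Returns true when a process would configure the DMABuf device itself
// under the suggestion above. The parent is excluded because its existing
// setup path is left as it is.
constexpr bool ShouldConfigureDMABufLocally(ProcessKind kind, bool useDMABuf) {
  if (kind == ProcessKind::Parent) {
    return false;
  }
  // Assumption from the comment: only GPU and RDD want drm device access.
  const bool wantsDrmAccess =
      kind == ProcessKind::GPU || kind == ProcessKind::RDD;
  return wantsDrmAccess && useDMABuf;
}
```

With this shape, a content process never touches the device even when DMABuf is enabled, and a GPU or RDD process only does so when the DMABuf setting is on.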

Flags: needinfo?(robert.mader)

(In reply to Martin Stránský [:stransky] (ni? me) from comment #6)

(In reply to Ashley Hale from comment #4)

Triage - setting severity to S3, since this is only enabled on nightly currently.

Is that really so? I see the GPU process disabled by default on Nightly (clean profile). It needs to be enabled in about:config.

Oh sorry for the confusion, I didn't mean to imply it was definitely enabled, only that it could be enabled on nightly.

In some non-standard configurations we unexpectedly end up in these paths
without a GBM device - one example being the GPU process. Fail cleanly
instead of crashing in those cases, triggering fallback paths.

Context: in the past DMABuf usage was tightly coupled to GBM. Since the
introduction of the surfaceless and device EGL platforms that is no
longer the case, thus we can't make checks like IsDMABufWebGLEnabled()
depend on the presence of a GBM device.

Optimally all affected cases get fixed eventually. Until then and also
for future cases it makes sense to fail softly.
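
To illustrate the "fail softly" idea described above, here is a minimal, self-contained sketch. It is not the landed patch: GbmDevice, DMABufDevice, Surface, CreateDMABufSurface and CreateSurfaceWithFallback are simplified, hypothetical stand-ins that only show the pattern of checking for a GBM device before using it and letting the caller take a fallback path.

```cpp
#include <cassert>
#include <memory>
#include <optional>
#include <string>
#include <utility>

// Hypothetical stand-ins; none of these are the real Gecko types.
struct GbmDevice {
  int fd;
};

class DMABufDevice {
 public:
  explicit DMABufDevice(std::unique_ptr<GbmDevice> gbm)
      : mGbm(std::move(gbm)) {}

  // May return nullptr, for example in a process where the device was
  // never configured.
  GbmDevice* GetGbmDevice() const { return mGbm.get(); }

 private:
  std::unique_ptr<GbmDevice> mGbm;
};

struct Surface {
  std::string backing;  // "dmabuf" or "shm" in this sketch
  int width;
  int height;
};

// Returns std::nullopt instead of dereferencing a missing GBM device.
std::optional<Surface> CreateDMABufSurface(const DMABufDevice& device,
                                           int width, int height) {
  GbmDevice* gbm = device.GetGbmDevice();
  if (!gbm) {
    return std::nullopt;  // fail cleanly so the caller can fall back
  }
  return Surface{"dmabuf", width, height};
}

// Caller side: try the DMABuf path first, then use the fallback path.
Surface CreateSurfaceWithFallback(const DMABufDevice& device, int width,
                                  int height) {
  if (auto surface = CreateDMABufSurface(device, width, height)) {
    return *surface;
  }
  return Surface{"shm", width, height};
}

// Small usage example: a device without GBM ends up on the fallback path.
inline void RunExample() {
  DMABufDevice deviceWithoutGbm(nullptr);
  Surface surface = CreateSurfaceWithFallback(deviceWithoutGbm, 64, 64);
  assert(surface.backing == "shm");
}
```

The point of the sketch is that the missing device is reported to the caller as a normal failure, so no code path dereferences a null pointer.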

Assignee: nobody → robert.mader
Status: NEW → ASSIGNED
Pushed by robert.mader@posteo.de: https://hg.mozilla.org/integration/autoland/rev/c29ee933bf30 Check for GbmDevice before using it, r=stransky,jgilbert

Regarding comment 7: that approach doesn't work, as the GPU and RDD processes don't seem to initialize the platform. So the patch just adds some error handling in the affected cases, ensuring we take the fallback paths.

Status: ASSIGNED → RESOLVED
Closed: 2 years ago
Resolution: --- → FIXED
Target Milestone: --- → 104 Branch
Status: RESOLVED → VERIFIED