Closed Bug 1724385 Opened 3 years ago Closed 3 years ago

[vaapi] VA-API does no fallback to SW decode when we fail to create EGLImage

Categories

(Core :: Audio/Video: Playback, defect)

defect

Tracking

()

RESOLVED FIXED
98 Branch
Tracking Status
firefox98 --- fixed

People

(Reporter: emilio, Assigned: stransky)

References

(Blocks 1 open bug)

Details

Crash Data

Attachments

(6 files)

STR:

ER:

  • A video.

AR:

  • Just green. On macOS or without the vaapi pref it works as expected.
Flags: needinfo?(stransky)

Works for me, it's h264 clip...run with MOZ_LOG="PlatformDecoderModule:5" for more info please.

Flags: needinfo?(stransky) → needinfo?(emilio)
Attached file Log (deleted) —

Seems to be doing DMABuf stuff, but still all I see on the screen is green.

Flags: needinfo?(emilio) → needinfo?(stransky)
Attached file about:support (deleted) —

This is a clean build with vaapi pref flipped and MOZ_ENABLE_WAYLAND=1

Attachment #9235105 - Attachment mime type: application/octet-stream → text/plain

Looks like the stream is decoded, perhaps dmabuf textures are not processes / exported correctly. Please run with MOZ_LOG="Dmabuf:5" env set to check any issues. Also please try to set widget.dmabuf-textures.enabled to true at about:config and restart browser and check MOZ_LOG="Dmabuf:5" log.

Flags: needinfo?(emilio)
Attached file Dmabuf log (deleted) —

The dmabuf-textures pref doesn't make any difference on the output afaict.

The interesting bits seem to be "EGLImageKHR creation failed" from here.

Flags: needinfo?(emilio)
Attachment #9235420 - Attachment description: Dma → Dmabuf log with widget.dmabuf-textures.enabled=true
Attachment #9235419 - Attachment mime type: application/octet-stream → text/plain
Attachment #9235420 - Attachment mime type: application/octet-stream → text/plain

Yes, we fail to create a texture over dmabuf buffer and that's also reason why you see green screen - it's empty frame.

Can you try to put a breakpoint to DMABufSurface.cpp#1165 (DMABufSurfaceYUV::CreateTexture in chrome process) and attach content of 'this' ?
Also make sure you have disabled widget.dmabuf-textures.enabled to make sure only vaapi dmabuf frames are uploaded as textures.

I expect the dmabuf has some unusual format or so and we fail to create a texture for it.

Thanks.

Flags: needinfo?(stransky)
Flags: needinfo?(emilio)
(gdb) p *this
$3 = (DMABufSurfaceYUV) {
  <DMABufSurface> = {
    _vptr$DMABufSurface = 0x7f7f9e33fb30 <vtable for DMABufSurfaceYUV+16>,
    mRefCnt = {
      mValue = {
        <std::__atomic_base<unsigned long>> = {
          _M_i = 2
        }, 
      }
    },
    mSurfaceType = DMABufSurface::SURFACE_NV12,
    mBufferModifier = 0,
    mBufferPlaneCount = 2,
    mDmabufFds = {159, 160, -1, -1},
    mDrmFormats = {538982482, 943215175, 0, 0},
    mStrides = {1536, 1536, 0, 0},
    mOffsets = {0, 909312, 0, 0},
    mGbmBufferObject = {0x0, 0x0, 0x0, 0x0},
    mMappedRegion = {0x0, 0x0, 0x0, 0x0},
    mMappedRegionData = {0xe4e4e4e4e4e4e4e4, 0xe4e4e4e4e4e4e4e4, 0xe4e4e4e4e4e4e4e4, 0xe4e4e4e4e4e4e4e4},
    mMappedRegionStride = {0, 0, 0, 0},
    mSyncFd = -1,
    mSync = 0x0,
    mGL = [(mozilla::gl::GLContext *) 0x0],
    mGlobalRefCountFd = 161,
    mUID = 3,
    mSurfaceLock = {
      <mozilla::OffTheBooksMutex> = {
        <mozilla::detail::MutexImpl> = {
          platformData_ = {0x1, 0x1000603fd, 0x2, 0x0, 0x0}
        }, 
        <mozilla::BlockingResourceBase> = {
          mChainPrev = 0x0,
          mName = 0x7f7f93a89b8b "DMABufSurface",
          mType = mozilla::BlockingResourceBase::eMutex,
          mAcquired = true
        }, 
        members of mozilla::OffTheBooksMutex:
        mOwningThread = 0x7f7f66965160
      }, <No data fields>}
  }, 
  members of DMABufSurfaceYUV:
  mWidth = {1280, 640, 0, 0},
  mHeight = {584, 292, 0, 0},
  mEGLImage = {0x0, 0x0, 0x0, 0x0},
  mTexture = {0, 0, 0, 0},
  mColorSpace = mozilla::gfx::YUVColorSpace::BT709
}

Digging a bit more, we fail to create the texture because this check returns false in the DRI driver, because pitch is 768 but surf->u.gfx9.surf_pitch is 640.

pitch comes from stride / 2, and that stride comes from DMABufSurfaceYUV::mStrides (1536).

Does that help?

Flags: needinfo?(emilio) → needinfo?(stransky)

We need to check we can create EGLImage from the dmabuf surface and fallback to SW decode otherwise.

Flags: needinfo?(stransky)
Summary: [vaapi] Recording for bug 1722662 shows just green → [vaapi] VA-API does no fallback to SW decode when we fail to create EGLImage

Set to S4 for now. Feel free to change if needed.

Severity: -- → S4

I think the best solution is to create EGLImage right after image decode in media process and pass EGLImage to IPC bridge instead of dmabuf fds. It may need some work to not break WebGL backend and it may consume some memory on media process to create GL context but we get rid of complex dmabuf setup pipelined through IPC bridge.

Hm, looks like EGLImage can't be shared across processes - that works in single process only when address space is shared. So we need to create the EGLImage for decoded images or pass the failure from compositor to decoder.

To verify EGLImage creation we need to create GL context in RDD process. jld, do you think we can create GL context in RDD process? It's blocked by RDD sandbox right now.

Flags: needinfo?(jld)

I didn't think we were blocking anything that would be needed to create a GL context (as of the patch that allowed DRI and various Mesa device-detection things), but something may be missing. In particular, we may need to allow creating new connections to the display server (like this, for content processes on X11), but it could also be something else like a config file. If there aren't any seccomp-bpf errors, setting MOZ_SANDBOX_LOGGING=1 and reading the broker logs should help. Alternately, if you have a patch I can try locally, I can try to see what's going wrong.

Flags: needinfo?(jld)

Thanks, there are the error code I see on RDD process:

Sandbox: Failed errno -2 op open flags 02000000 path /raid/src2/objdir/dist/bin/libEGL.so
Sandbox: Failed errno -2 op open flags 02000000 path /raid/src2/objdir/dist/bin/libGLdispatch.so.0
Sandbox: Failed errno -2 op open flags 02000000 path /raid/src2/objdir/dist/bin/libGL.so
Sandbox: Failed errno -2 op open flags 02000000 path /raid/src2/objdir/dist/bin/libGLX.so.0
Sandbox: SandboxBroker: denied op=open rflags=2204000 perms=0 path=/etc/glvnd/egl_vendor.d for pid=25285
Sandbox: Failed errno -13 op open flags 02204000 path /etc/glvnd/egl_vendor.d
Sandbox: SandboxBroker: denied op=open rflags=2204000 perms=0 path=/usr/share/glvnd/egl_vendor.d for pid=25285
Sandbox: Failed errno -13 op open flags 02204000 path /usr/share/glvnd/egl_vendor.d

Assignee: nobody → stransky
Status: NEW → ASSIGNED

I see the sandbox breakage with the patch above.
Thanks.

Flags: needinfo?(jld)

(ni? me to file comment 8 as a bug when back at my computer)

Flags: needinfo?(emilio)

There are indeed some things I was forgetting. For one, the content policy allows reading all of /etc and /usr/share/, which takes care of the …/glvnd/eglvendor.d/… files that specify what library to load. It also wants access the Mesa shader cache, and there are various drirc files. I'll need to spend more time looking into this.

Jed, that would be great. With the checks we can enable va-api by default (on tested hardware) as we can reliably detect decode errors and fallback to SW decode in such case.

Martin, in the mesa issue above it was pointed out to me that these two lines are the ones causing the issue. Do you know why they are needed?

Flags: needinfo?(stransky)

Frankly I don't know, I'll look at it.
As for the patch here, I think if we want to enable va-api by default we need such check anyway to fallback to SW decode in case of any possible issue with GL/drivers.

Depends on: 1751363
Flags: needinfo?(stransky)

Keep VADRMPRIMESurfaceDescriptor width/height as is to avoid EGLImage import failures:
https://gitlab.freedesktop.org/mesa/mesa/-/issues/5851

Pushed by stransky@redhat.com: https://hg.mozilla.org/integration/autoland/rev/32af039c697d [Linux] Don't set width/height to VADRMPRIMESurfaceDescriptor r=alwu,media-playback-reviewers
Pushed by stransky@redhat.com: https://hg.mozilla.org/integration/autoland/rev/c3d14f2b234a [Linux] Try to create EGLImage over decoded VA-API video frame r=alwu,emilio,media-playback-reviewers
Status: ASSIGNED → RESOLVED
Closed: 3 years ago
Resolution: --- → FIXED
Target Milestone: --- → 98 Branch
Crash Signature: [@ mozilla::VideoFramePool::ReleaseUnusedVAAPIFrames]
Crash Signature: [@ mozilla::VideoFramePool::ReleaseUnusedVAAPIFrames] → [@ mozilla::gmp::GMPChild::ProcessingError] [@ mozilla::PRDDChild::OtherPid] [@ mozilla::VideoFramePool::ReleaseUnusedVAAPIFrames]
Regressions: 1752282
Regressions: 1752605
Regressions: 1753390
Flags: qe-verify+
Regressions: 1754074

It seems that the issue does not reproduce on my end using Ubuntu 18.04 x64, with an affected Nightly build (92.0a1, Build ID 20210806213901). I had the build opened in a wayland session, and the pref media.ffmpeg.vaapi.enabled set to true.

Emilio, would you mind checking if the fix works for you on latest Beta 98?

Flags: needinfo?(emilio)
No longer depends on: 1751363
Regressions: 1751363

Yeah, this reproed only on AMD, and can confirm is fixed now.

Flags: needinfo?(emilio)
Regressions: 1769499
No longer regressions: 1769499
Flags: needinfo?(jld)
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: