Closed Bug 1341496 Opened 8 years ago Closed 8 years ago

Intermittent damp | application crashed [@ mozilla::CrossProcessSemaphore::CrossProcessSemaphore]

Categories

(Core :: Graphics: Layers, defect)

defect
Not set
normal

Tracking


Status: RESOLVED FIXED
Target Milestone: mozilla55
Tracking Status
firefox-esr52 --- unaffected
firefox53 --- unaffected
firefox54 --- fixed
firefox55 --- fixed

People

(Reporter: intermittent-bug-filer, Assigned: mattwoodrow)

Details

(Keywords: crash, intermittent-failure)

Attachments

(3 files)

matt: since you introduced CrossProcessSemaphore in Bug 1325227 - could you take a look, thanks!
Component: Talos → Graphics: Layers
Flags: needinfo?(matt.woodrow)
Keywords: crash
Product: Testing → Core
Version: Version 3 → unspecified
Thread 0 (crashed)   [task log, 2017-03-16T07:32:31Z]
 0  libxul.so!mozilla::CrossProcessSemaphore::CrossProcessSemaphore [CrossProcessSemaphore_posix.cpp:ab96d8a9e247 : 54 + 0x0]
      rax = 0x000000000061fd00   rdx = 0x0000000000000002
      rcx = 0x00007fd0c5ffeee7   rbx = 0x00007fd0a5984860
      rsi = 0x0000000000000000   rdi = 0x00007fd0d291c140
      rbp = 0x00007fff235bf3f0   rsp = 0x00007fff235bf3d0
       r8 = 0x0000000000000000    r9 = 0x00007fd0a5800000
      r10 = 0x0000000000000000   r11 = 0x0000000000000206
      r12 = 0x00007fd0a7121240   r13 = 0x0000000000000001
      r14 = 0x00007fff235bf610   r15 = 0x0000000000000001
      rip = 0x00007fd0c3c66ea0
      Found by: given as instruction pointer in context
 1  libxul.so!mozilla::layers::TextureClient::EnableBlockingReadLock [TextureClient.cpp:ab96d8a9e247 : 1441 + 0x21]
      rbx = 0x00007fd0a78b8ee0   rbp = 0x00007fff235bf410
      rsp = 0x00007fff235bf400   r12 = 0x00007fd0a5984850
      r13 = 0x00007fff235bf428   r14 = 0x00007fff235bf610
      r15 = 0x0000000000000001   rip = 0x00007fd0c4090d4f
      Found by: call frame info
 2  libxul.so!mozilla::layers::ContentClientRemoteBuffer::CreateBackBuffer [ContentClient.cpp:ab96d8a9e247 : 323 + 0xc]
      rbx = 0x00007fd0a46aeef0   rbp = 0x00007fff235bf450
      rsp = 0x00007fff235bf420   r12 = 0x0000000000000002
      r13 = 0x00007fff235bf428   r14 = 0x00007fff235bf610
      r15 = 0x0000000000000001   rip = 0x00007fd0c4098cec
      Found by: call frame info
 3  libxul.so!mozilla::layers::ContentClientDoubleBuffered::EnsureBackBufferIfFrontBuffer [ContentClient.cpp:ab96d8a9e247 : 631 + 0x5]
      rbx = 0x00007fd0a46aeef0   rbp = 0x00007fff235bf470
      rsp = 0x00007fff235bf460   r12 = 0x00007fff235bf5c0
      r13 = 0x00007fff235bf5c8   r14 = 0x00007fff235bf610
      r15 = 0x0000000000000001   rip = 0x00007fd0c4098ebd
It looks like sem_init is failing, possibly because we are hitting resource limits.

Nical, any ideas what we should do when we can't get any new semaphores? The two obvious (and simple) choices are:

* Treat it as a failure to allocate the texture and bail out of rendering the layer entirely.
* Ignore it, render the layer, but leave it unsynchronized. It's possible that the user would never notice, but they might get weird corruption from races.

There's also other work I think we could do to reduce the chances of this happening:

* Share a single read lock across a component alpha white/black buffer pair.
* When re-allocating buffers (due to a size change), re-use the existing lock rather than allocating a new one.

These may or may not help (depending on what exactly is causing us to run out), and it's not clear whether they're worth the engineering effort right now.
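A minimal sketch of what "make the semaphore allocation fallible" can look like, assuming Linux/glibc and a semaphore placed in an anonymous `MAP_SHARED` mapping (so it is shared with processes forked afterwards). The class `SharedSemaphore` and its methods are hypothetical illustrations, not Gecko's `CrossProcessSemaphore` API. The point is only that a failed `mmap` or `sem_init` yields `nullptr`, which a caller can treat as a failed texture allocation (the first option above) instead of crashing.

```cpp
// Hypothetical sketch, not Gecko code. Assumes Linux/glibc; the semaphore
// lives in an anonymous MAP_SHARED mapping so that processes forked after
// creation can share it.
#include <cassert>
#include <cerrno>
#include <memory>
#include <semaphore.h>
#include <sys/mman.h>

class SharedSemaphore {
 public:
  // Returns nullptr instead of aborting when the OS refuses to give us a
  // semaphore (either the shared mapping or sem_init fails).
  static std::unique_ptr<SharedSemaphore> Create(unsigned initialValue) {
    void* mem = mmap(nullptr, sizeof(sem_t), PROT_READ | PROT_WRITE,
                     MAP_SHARED | MAP_ANONYMOUS, -1, 0);
    if (mem == MAP_FAILED) {
      return nullptr;
    }
    sem_t* sem = static_cast<sem_t*>(mem);
    // pshared = 1: the semaphore may be used by more than one process.
    if (sem_init(sem, /* pshared */ 1, initialValue) != 0) {
      munmap(mem, sizeof(sem_t));
      return nullptr;
    }
    return std::unique_ptr<SharedSemaphore>(new SharedSemaphore(sem));
  }

  ~SharedSemaphore() {
    int rv = sem_destroy(mSem);
    assert(rv == 0);  // destroying a valid, unused semaphore should not fail
    (void)rv;
    munmap(mSem, sizeof(sem_t));
  }

  SharedSemaphore(const SharedSemaphore&) = delete;
  SharedSemaphore& operator=(const SharedSemaphore&) = delete;

  // Blocks until the count is positive; retries if interrupted by a signal.
  bool Wait() {
    while (sem_wait(mSem) != 0) {
      if (errno != EINTR) {
        return false;
      }
    }
    return true;
  }

  // Non-blocking variant: returns false if the count is currently zero.
  bool TryWait() { return sem_trywait(mSem) == 0; }

  void Post() { sem_post(mSem); }

 private:
  explicit SharedSemaphore(sem_t* aSem) : mSem(aSem) {}
  sem_t* mSem;
};
```

A caller would check the result, e.g. `auto lock = SharedSemaphore::Create(1); if (!lock) { /* skip painting this layer */ }`, rather than assuming creation always succeeds.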
Flags: needinfo?(matt.woodrow) → needinfo?(nical.bugzilla)
I would rather avoid potentially unsynchronized texture accesses, because they are hard to debug and the temptation to blame every glitch on them may become strong. If the lock serialization fails, I would rather fall back to the copy-on-write behavior and force a copy the next time we render into that texture, or have a pre-allocated lock per frame and fall back to blocking on it instead of blocking on just that texture. Sharing the lock across buffer pairs sounds like a good idea, and so does recycling locks.
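The copy-on-write fallback described above might be modelled like this. Everything here (`Texture`, `forceCopyOnNextPaint`, `TextureForPainting`) is a hypothetical simplification, not Gecko code: when a texture was shared without a working read lock, the next paint goes into a fresh copy, so the compositor can keep reading the old pixels without any synchronization.

```cpp
// Hypothetical, simplified model; not Gecko's TextureClient.
#include <cassert>
#include <memory>
#include <vector>

struct Texture {
  std::vector<unsigned char> pixels;
  // Set when no read lock could be created at share time, so we cannot
  // tell when the compositor has finished reading this texture.
  bool forceCopyOnNextPaint = false;
};

// Returns the texture that should be painted into for the next frame.
std::shared_ptr<Texture> TextureForPainting(
    const std::shared_ptr<Texture>& current) {
  if (!current->forceCopyOnNextPaint) {
    // Normal path: a read lock tells us when it is safe to paint in place.
    return current;
  }
  // Fallback: leave the old texture untouched for the compositor and paint
  // into a fresh copy. The copy will need its own lock when it is shared.
  auto copy = std::make_shared<Texture>();
  copy->pixels = current->pixels;
  return copy;
}
```

The cost of this design is an extra allocation and copy per frame for affected textures, but a failure only costs performance, never correctness.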
Flags: needinfo?(nical.bugzilla)
Comment on attachment 8855636 [details] Bug 1341496 - Part 3: Make CrossProcessSemaphore allocation fallible. https://reviewboard.mozilla.org/r/127506/#review130512
Attachment #8855636 - Flags: review?(wmccloskey) → review+
Comment on attachment 8855635 [details] Bug 1341496 - Part 2: Don't use a separate ReadLock for the second component alpha texture as they should always be locked/unlocked at the same time. https://reviewboard.mozilla.org/r/127504/#review133650
Attachment #8855635 - Flags: review?(nical.bugzilla) → review+
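Part 2's idea, one read lock shared by the on-black and on-white textures of a component-alpha pair, can be sketched with a reference-counted lock object. The types below (`CountingReadLock`, `PairedTexture`, `ComponentAlphaPair`, `CreatePair`) are hypothetical stand-ins; in the real code each lock is backed by an OS semaphore, which is exactly why halving the number of locks helps with resource limits.

```cpp
// Hypothetical sketch, not Gecko code.
#include <cassert>
#include <memory>

// Stand-in for a texture read lock; it only counts outstanding readers.
class CountingReadLock {
 public:
  void Lock() { ++mReaders; }
  void Unlock() {
    assert(mReaders > 0);
    --mReaders;
  }
  int Readers() const { return mReaders; }

 private:
  int mReaders = 0;
};

struct PairedTexture {
  std::shared_ptr<CountingReadLock> readLock;
};

// Both halves are always locked and unlocked together, so one lock suffices.
struct ComponentAlphaPair {
  PairedTexture onBlack;
  PairedTexture onWhite;
};

ComponentAlphaPair CreatePair() {
  auto lock = std::make_shared<CountingReadLock>();  // one lock, not two
  return ComponentAlphaPair{PairedTexture{lock}, PairedTexture{lock}};
}
```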
Comment on attachment 8855634 [details] Bug 1341496 - Part 1: Don't try to serialize read locks that aren't valid. https://reviewboard.mozilla.org/r/127502/#review133664
Attachment #8855634 - Flags: review?(nical.bugzilla) → review+
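Part 1's idea, not serializing a read lock that isn't valid, amounts to making serialization itself optional. In this sketch, the hypothetical `ShareableReadLock` holds a handle of -1 when the underlying semaphore could not be created, and `Serialize()` returns an empty `std::optional` (C++17) instead of a descriptor for a lock that does not exist. `LockDescriptor` is a made-up payload type, not Gecko's IPDL type.

```cpp
// Hypothetical sketch, not Gecko code. Requires C++17 for std::optional.
#include <cassert>
#include <optional>

// Made-up IPC payload that would let the compositor open the lock.
struct LockDescriptor {
  int handle;
};

class ShareableReadLock {
 public:
  explicit ShareableReadLock(int aHandle) : mHandle(aHandle) {}

  // A handle of -1 means creating the underlying semaphore failed.
  bool IsValid() const { return mHandle >= 0; }

  // Never hand the compositor a descriptor for a lock that does not exist;
  // report "nothing to send" and let the caller pick a fallback.
  std::optional<LockDescriptor> Serialize() const {
    if (!IsValid()) {
      return std::nullopt;
    }
    return LockDescriptor{mHandle};
  }

 private:
  int mHandle;
};
```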
Pushed by mwoodrow@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/c5785434af74
Part 1: Don't try to serialize read locks that aren't valid. r=nical
https://hg.mozilla.org/integration/mozilla-inbound/rev/f4c724034728
Part 2: Don't use a separate ReadLock for the second component alpha texture as they should always be locked/unlocked at the same time. r=nical
https://hg.mozilla.org/integration/mozilla-inbound/rev/f79d7564d39d
Part 3: Make CrossProcessSemaphore allocation fallible. r=billm
Matt, do we need to uplift this to Beta as well?
Assignee: nobody → matt.woodrow
Flags: needinfo?(matt.woodrow)
Comment on attachment 8855634 [details] Bug 1341496 - Part 1: Don't try to serialize read locks that aren't valid.

Approval Request Comment
[Feature/Bug causing the regression]: Bug 1325227
[User impact if declined]: Crashes on pages that introduce lots of layers.
[Is this code covered by automated tests?]: Yes, intermittent failures introduced by the regressing bug have stopped happening.
[Has the fix been verified in Nightly?]: No.
[Needs manual test from QE? If yes, steps to reproduce]: No
[List of other uplifts needed for the feature/fix]: None
[Is the change risky?]: No.
[Why is the change risky/not risky?]: It just adds graceful fallback for when allocation of a system object fails.
[String changes made/needed]: None
Flags: needinfo?(matt.woodrow)
Attachment #8855634 - Flags: approval-mozilla-beta?
Comment on attachment 8855634 [details] Bug 1341496 - Part 1: Don't try to serialize read locks that aren't valid. Fixes an intermittent failure. Beta54+. Should be in 54 beta 3.
Attachment #8855634 - Flags: approval-mozilla-beta? → approval-mozilla-beta+
(In reply to Matt Woodrow (:mattwoodrow) from comment #19)
> [Is this code covered by automated tests?]: Yes, intermittent failures
> introduced by the regressing bug have stopped happening.
> [Has the fix been verified in Nightly?]: No.
> [Needs manual test from QE? If yes, steps to reproduce]: No

Setting qe-verify- based on Matt's assessment of manual testing needs and the fact that this fix has automated coverage.
Flags: qe-verify-
FWIW, this might accidentally fix bug 1345899, where CrossProcessSemaphore isn't functional. I'll make sure to test 54.0b3.
With 54.0b3 on OpenBSD and the default of false for layers.enable-tiles, the window is displayed empty, and the terminal is filled with messages like:

Crash Annotation GraphicsCriticalError: |[0][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 751 (t=3.74234)
|[256][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3699)
|[257][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3843)
|[258][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.4132)
|[259][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.4206)
|[260][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.4337)
|[261][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.4429)
|[262][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.4628)
|[263][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.4736)
|[264][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.4998)
|[250][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3021)
|[251][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3099)
|[252][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3219)
|[253][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3301)
|[254][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3478)
|[255][GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747 (t=22.3554)
[GFX1-]: Failed 2 buffer db=0 dw=0 for 0, 0, 1280, 747

So even if the browser doesn't crash per se (bug 1345899), it isn't usable without enabling tiles. I don't know how related this is to the CrossProcessSemaphore issue; I'm a bit lost in the interdependencies of e10s/gfx/tiles...