Closed Bug 1252237 Opened 9 years ago Closed 9 years ago

[e10s] 780392-1.html crashes/asserts on OSX (Failed to create a valid ShmemTextureHost)

Categories

(Core :: Graphics, defect)

Unspecified
macOS
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla48
Tracking Status
e10s + ---
firefox47 --- wontfix
firefox48 --- fixed

People

(Reporter: RyanVM, Assigned: jerry)

References

(Blocks 1 open bug)

Details

(Keywords: assertion, crash, Whiteboard: [gfx-noted][e10s-orangeblockers])

Attachments

(1 file)

https://treeherder.mozilla.org/logviewer.html#?job_id=17372401&repo=try 12:33:28 INFO - REFTEST TEST-START | file:///builds/slave/test/build/tests/reftest/tests/dom/canvas/crashtests/780392-1.html 12:33:28 INFO - REFTEST TEST-LOAD | file:///builds/slave/test/build/tests/reftest/tests/dom/canvas/crashtests/780392-1.html | 260 / 3008 (8%) 12:33:28 INFO - ++DOMWINDOW == 78 (0x11f893000) [pid = 1719] [serial = 871] [outer = 0x114539800] 12:33:28 INFO - [GFX1-]: Failed to create a SkiaGL DrawTarget, falling back to software 12:33:28 INFO - [Parent 1716] WARNING: Failed to map shared memory (400011264 bytes) into 707, port 10463. (os/kern) invalid argument (4) 12:33:28 INFO - : file /builds/slave/try-m64-d-00000000000000000000/build/src/ipc/glue/SharedMemoryBasic_mach.mm, line 581 12:33:28 INFO - ###!!! [Child][DispatchAsyncMessage] Error: (msgtype=0xFFFB,name=???) Payload error: message could not be deserialized 12:33:29 INFO - [GFX1]: Failed to create a valid ShmemTextureHost 12:33:29 INFO - Assertion failure: false (An assert from the graphics logger), at /builds/slave/try-m64-d-00000000000000000000/build/src/gfx/2d/Logging.h:510 12:33:56 INFO - #01: mozilla::layers::ShmemTextureHost::ShmemTextureHost(mozilla::ipc::Shmem const&, mozilla::layers::BufferDescriptor const&, mozilla::layers::ISurfaceAllocator*, mozilla::layers::TextureFlags) [/usr/include/c++/4.2.1/sstream:558] 12:33:56 INFO - #02: mozilla::layers::CreateBackendIndependentTextureHost(mozilla::layers::SurfaceDescriptor const&, mozilla::layers::ISurfaceAllocator*, mozilla::layers::TextureFlags) [gfx/layers/composite/TextureHost.cpp:664] 12:33:56 INFO - #03: mozilla::layers::TextureHost::Create(mozilla::layers::SurfaceDescriptor const&, mozilla::layers::ISurfaceAllocator*, mozilla::layers::LayersBackend, mozilla::layers::TextureFlags) [gfx/layers/composite/TextureHost.cpp:208] 12:33:56 INFO - #04: mozilla::layers::TextureParent::Init(mozilla::layers::SurfaceDescriptor const&, mozilla::layers::LayersBackend 
const&, mozilla::layers::TextureFlags const&) [mfbt/AlreadyAddRefed.h:116] 12:33:56 INFO - #05: mozilla::layers::TextureHost::CreateIPDLActor(mozilla::layers::CompositableParentManager*, mozilla::layers::SurfaceDescriptor const&, mozilla::layers::LayersBackend, mozilla::layers::TextureFlags) [gfx/layers/composite/TextureHost.cpp:105] 12:33:56 INFO - #06: mozilla::layers::PLayerTransactionParent::OnMessageReceived(IPC::Message const&) [obj-firefox/ipc/ipdl/PLayerTransactionParent.cpp:451] 12:33:56 INFO - #07: mozilla::layers::PCompositorParent::OnMessageReceived(IPC::Message const&) [obj-firefox/ipc/ipdl/PCompositorParent.cpp:496] 12:33:56 INFO - #08: mozilla::ipc::MessageChannel::DispatchAsyncMessage(IPC::Message const&) [ipc/glue/MessageChannel.h:553] 12:33:56 INFO - #09: mozilla::ipc::MessageChannel::DispatchMessage(IPC::Message const&) [ipc/glue/MessageChannel.cpp:1384] 12:33:56 INFO - #10: mozilla::ipc::MessageChannel::OnMaybeDequeueOne() [ipc/glue/MessageChannel.cpp:1353] 12:33:56 INFO - #11: MessageLoop::DeferOrRunPendingTask(MessageLoop::PendingTask const&) [ipc/chromium/src/base/message_loop.cc:365] 12:33:56 INFO - #12: MessageLoop::DoWork() [ipc/chromium/src/base/message_loop.cc:459] 12:33:56 INFO - #13: base::MessagePumpDefault::Run(base::MessagePump::Delegate*) [ipc/chromium/src/base/message_pump_default.cc:34] 12:33:56 INFO - #14: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:520] 12:33:56 INFO - #15: base::Thread::ThreadMain() [ipc/chromium/src/base/thread.cc:175] 12:33:56 INFO - #16: ThreadFunc [ipc/chromium/src/base/platform_thread_posix.cc:36] 12:33:56 INFO - #17: libsystem_pthread.dylib + 0x405a 12:33:56 INFO - #18: libsystem_pthread.dylib + 0x3fd7 12:33:56 INFO - ###!!! [Child][MessageChannel::SendAndWait] Error: Channel error: cannot send/recv 12:33:56 INFO - [Child 1719] ###!!! 
ABORT: Aborting on channel error.: file /builds/slave/try-m64-d-00000000000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1824 12:33:56 INFO - [Child 1719] WARNING: failed to forward Layers transaction: file /builds/slave/try-m64-d-00000000000000000000/build/src/gfx/layers/client/ClientLayerManager.cpp, line 638 12:33:56 INFO - ###!!! [Child][MessageChannel] Error: (msgtype=0x900001,name=PLayer::Msg___delete__) Channel error: cannot send/recv 12:33:56 INFO - ###!!! [Child][MessageChannel] Error: (msgtype=0x900001,name=PLayer::Msg___delete__) Channel error: cannot send/recv 12:33:56 INFO - ###!!! [Child][MessageChannel] Error: (msgtype=0x900001,name=PLayer::Msg___delete__) Channel error: cannot send/recv 12:33:56 INFO - ###!!! [Child][MessageChannel] Error: (msgtype=0x900001,name=PLayer::Msg___delete__) Channel error: cannot send/recv 12:33:56 INFO - #01: mozilla::ipc::ProcessLink::OnChannelError() [xpcom/glue/Monitor.h:36] 12:33:56 INFO - --DOMWINDOW == 44 (0x11b471000) [pid = 1719] [serial = 834] [outer = 0x0] [url = about:blank] 12:33:56 INFO - #02: IPC::Channel::ChannelImpl::OnFileCanReadWithoutBlocking(int) [ipc/chromium/src/chrome/common/ipc_channel_posix.cc:894] 12:33:56 INFO - #03: event_base_loop [ipc/chromium/src/third_party/libevent/event.c:1355] 12:33:56 INFO - #04: base::MessagePumpLibevent::Run(base::MessagePump::Delegate*) [ipc/chromium/src/base/message_pump_libevent.cc:362] 12:33:56 INFO - #05: MessageLoop::Run() [ipc/chromium/src/base/message_loop.cc:520] 12:33:56 INFO - ###!!! 
[Child][OnMaybeDequeueOne] Error: Channel error: cannot send/recv 12:33:56 INFO - Hit MOZ_CRASH(MsgDropped) at /builds/slave/try-m64-d-00000000000000000000/build/src/ipc/glue/BackgroundChildImpl.cpp:136 12:33:56 INFO - #06: base::Thread::ThreadMain() [ipc/chromium/src/base/thread.cc:175] 12:33:56 INFO - #07: ThreadFunc [ipc/chromium/src/base/platform_thread_posix.cc:36] 12:33:56 INFO - #08: libsystem_pthread.dylib + 0x405a 12:33:56 INFO - #09: libsystem_pthread.dylib + 0x3fd7 12:33:56 INFO - [Child 1719] ###!!! ABORT: Aborting on channel error.: file /builds/slave/try-m64-d-00000000000000000000/build/src/ipc/glue/MessageChannel.cpp, line 1824 12:33:56 INFO - Hit MOZ_CRASH() at /builds/slave/try-m64-d-00000000000000000000/build/src/memory/mozalloc/mozalloc_abort.cpp:33 12:33:56 WARNING - TEST-UNEXPECTED-FAIL | file:///builds/slave/test/build/tests/reftest/tests/dom/canvas/crashtests/780392-1.html | application terminated with exit code 1 12:33:56 INFO - REFTEST INFO | Copy/paste: /builds/slave/test/build/macosx64-minidump_stackwalk /var/folders/1r/5dhvqs_52pbg86_h8rkrnzj800000w/T/tmpRgBCSq.mozrunner/minidumps/AC899A0F-8589-41C6-9F72-957657AFA983.dmp /builds/slave/test/build/symbols 12:34:09 INFO - REFTEST INFO | Saved minidump as /builds/slave/test/build/blobber_upload_dir/AC899A0F-8589-41C6-9F72-957657AFA983.dmp 12:34:09 INFO - REFTEST INFO | Saved app info as /builds/slave/test/build/blobber_upload_dir/AC899A0F-8589-41C6-9F72-957657AFA983.extra 12:34:09 ERROR - REFTEST PROCESS-CRASH | file:///builds/slave/test/build/tests/reftest/tests/dom/canvas/crashtests/780392-1.html | application crashed [@ mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::WriteLog(std::string const&)] 12:34:09 INFO - Crash dump filename: /var/folders/1r/5dhvqs_52pbg86_h8rkrnzj800000w/T/tmpRgBCSq.mozrunner/minidumps/AC899A0F-8589-41C6-9F72-957657AFA983.dmp 12:34:09 INFO - Operating system: Mac OS X 12:34:09 INFO - 10.10.5 14F27 12:34:09 INFO - CPU: amd64 12:34:09 INFO - family 6 model 69 
stepping 1 12:34:09 INFO - 4 CPUs 12:34:09 INFO - Crash reason: EXC_BAD_ACCESS / KERN_INVALID_ADDRESS 12:34:09 INFO - Crash address: 0x0 12:34:09 INFO - Process uptime: 42 seconds 12:34:09 INFO - Thread 30 (crashed) 12:34:09 INFO - 0 XUL!mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::WriteLog(std::string const&) [Logging.h:544469683953 : 510 + 0x0] 12:34:09 INFO - rax = 0x0000000000000000 rdx = 0x00007fff741181f8 12:34:09 INFO - rcx = 0x0000000000000000 rbx = 0x00007fff74118c50 12:34:09 INFO - rsi = 0x0000610000006100 rdi = 0x000000010677c875 12:34:09 INFO - rbp = 0x000000011a280330 rsp = 0x000000011a280320 12:34:09 INFO - r8 = 0x000000011a2802d0 r9 = 0x000000011a281000 12:34:09 INFO - r10 = 0x00007fff8e0763ef r11 = 0x00007fff8e0763c0 12:34:09 INFO - r12 = 0x0000000125e50180 r13 = 0x000000011a2807d0 12:34:09 INFO - r14 = 0x000000011a280398 r15 = 0x000000011a2807d0 12:34:09 INFO - rip = 0x0000000102924ac1 12:34:09 INFO - Found by: given as instruction pointer in context 12:34:09 INFO - 1 XUL!mozilla::gfx::Log<1, mozilla::gfx::CriticalLogger>::Flush() [Logging.h:544469683953 : 280 + 0x8] 12:34:09 INFO - rbx = 0x000000011a280380 rbp = 0x000000011a280370 12:34:09 INFO - rsp = 0x000000011a280340 r12 = 0x0000000125e50180 12:34:09 INFO - r13 = 0x000000011a2807d0 r14 = 0x000000011a280398 12:34:09 INFO - r15 = 0x000000011a2807d0 rip = 0x0000000102924977 12:34:09 INFO - Found by: call frame info 12:34:09 INFO - 2 XUL!mozilla::layers::ShmemTextureHost::ShmemTextureHost(mozilla::ipc::Shmem const&, mozilla::layers::BufferDescriptor const&, mozilla::layers::ISurfaceAllocator*, mozilla::layers::TextureFlags) [Logging.h:544469683953 : 272 + 0x5] 12:34:09 INFO - rbx = 0x0000000125e50198 rbp = 0x000000011a280520 12:34:09 INFO - rsp = 0x000000011a280380 r12 = 0x0000000125e50180 12:34:09 INFO - r13 = 0x000000011a2807d0 r14 = 0x0000000114da11f0 12:34:09 INFO - r15 = 0x000000011a2807d0 rip = 0x0000000102ad2cd0 12:34:09 INFO - Found by: call frame info 12:34:09 INFO - 3 
XUL!mozilla::layers::CreateBackendIndependentTextureHost(mozilla::layers::SurfaceDescriptor const&, mozilla::layers::ISurfaceAllocator*, mozilla::layers::TextureFlags) [TextureHost.cpp:544469683953 : 664 + 0x15] 12:34:09 INFO - rbx = 0x000000011a2807a8 rbp = 0x000000011a280570 12:34:09 INFO - rsp = 0x000000011a280530 r12 = 0x0000000125e50180 12:34:09 INFO - r13 = 0x000000011a2807d0 r14 = 0x0000000114da11f0 12:34:09 INFO - r15 = 0x000000011a2805a8 rip = 0x0000000102ad0d7a 12:34:09 INFO - Found by: call frame info 12:34:09 INFO - 4 XUL!mozilla::layers::TextureHost::Create(mozilla::layers::SurfaceDescriptor const&, mozilla::layers::ISurfaceAllocator*, mozilla::layers::LayersBackend, mozilla::layers::TextureFlags) [TextureHost.cpp:544469683953 : 208 + 0xb] 12:34:09 INFO - rbx = 0x000000011a2805a8 rbp = 0x000000011a280590 12:34:09 INFO - rsp = 0x000000011a280580 r12 = 0x000000011a280ad8 12:34:09 INFO - r13 = 0x0000000125e500c0 r14 = 0x000000011a2805d0 12:34:09 INFO - r15 = 0x0000000125e50180 rip = 0x0000000102ad0c7f 12:34:09 INFO - Found by: call frame info 12:34:09 INFO - 5 XUL!mozilla::layers::TextureParent::Init(mozilla::layers::SurfaceDescriptor const&, mozilla::layers::LayersBackend const&, mozilla::layers::TextureFlags const&) [TextureHost.cpp:544469683953 : 819 + 0xf] 12:34:09 INFO - rbx = 0x0000000114762330 rbp = 0x000000011a2805c0 12:34:09 INFO - rsp = 0x000000011a2805a0 r12 = 0x000000011a280ad8 12:34:09 INFO - r13 = 0x0000000125e500c0 r14 = 0x000000011a2805d0 12:34:09 INFO - r15 = 0x0000000125e50180 rip = 0x0000000102ad08f7 12:34:09 INFO - Found by: call frame info 12:34:09 INFO - 6 XUL!mozilla::layers::TextureHost::CreateIPDLActor(mozilla::layers::CompositableParentManager*, mozilla::layers::SurfaceDescriptor const&, mozilla::layers::LayersBackend, mozilla::layers::TextureFlags) [TextureHost.cpp:544469683953 : 105 + 0xb] 12:34:09 INFO - rbx = 0x0000000114762330 rbp = 0x000000011a2805f0 12:34:09 INFO - rsp = 0x000000011a2805d0 r12 = 0x000000011a280ad8 12:34:09 
INFO - r13 = 0x0000000125e500c0 r14 = 0x000000011a2807a8 12:34:09 INFO - r15 = 0x0000000125e50180 rip = 0x0000000102ad0821 12:34:09 INFO - Found by: call frame info 12:34:09 INFO - 7 XUL!mozilla::layers::PLayerTransactionParent::OnMessageReceived(IPC::Message const&) [PLayerTransactionParent.cpp:544469683953 : 451 + 0x22] 12:34:09 INFO - rbx = 0x00000000fffff0b1 rbp = 0x000000011a280830 12:34:09 INFO - rsp = 0x000000011a280600 r12 = 0x000000011a280ad8 12:34:09 INFO - r13 = 0x0000000125e500c0 r14 = 0x2f0079e15239f5b0 12:34:09 INFO - r15 = 0x000000011a280ae0 rip = 0x0000000102244f9f 12:34:09 INFO - Found by: call frame info 12:34:09 INFO - 8 XUL!mozilla::layers::PCompositorParent::OnMessageReceived(IPC::Message const&) [PCompositorParent.cpp:544469683953 : 496 + 0xc] 12:34:09 INFO - rbx = 0x0000000000000006 rbp = 0x000000011a280980 12:34:09 INFO - rsp = 0x000000011a280840 r12 = 0x000000011a280ad8 12:34:09 INFO - r13 = 0x0000000000000000 r14 = 0x0000000123bc9800 12:34:09 INFO - r15 = 0x000000011a280ae0 rip = 0x00000001024a5029 12:34:09 INFO - Found by: call frame info
Skipping 780392-1.html just moves the failure to 789933-1.html instead, so this isn't something we're easily going to disable around without being pretty heavy-handed.
I can reproduce this locally.
This doesn't need the automated test - just open dom/canvas/crashtests/780392-1.html in the browser. The test just attempts to make a 10k x 10k canvas.
Also, this doesn't depend on the canvas type - Skia GL, Skia, and CG all have the same problem. It only depends on e10s being on.
We hit:
https://dxr.mozilla.org/mozilla-central/source/gfx/layers/composite/TextureHost.cpp#771
called from:
https://dxr.mozilla.org/mozilla-central/source/gfx/layers/composite/TextureHost.cpp#259
and it doesn't look like we're ready for this kind of failure. Just dealing with the failure by checking whether the newly created ShmemTextureHost::GetBuffer() returns null is not enough; the test passes (for me), but in the interactive scenario, we eventually crash in:
Assertion failure: mDestroyed, at /Users/msreckovic/Repos/mozilla-central/gfx/layers/IPDLActor.h:108
#01: mozilla::layers::ParentActor<mozilla::layers::PTextureParent>::~ParentActor()
#02: mozilla::layers::TextureParent::~TextureParent()
Sotaro, do you know enough about this code before :nical comes back?
Flags: needinfo?(sotaro.ikeda.g)
(In reply to Milan Sreckovic [:milan] from comment #5)
> We hit:
> https://dxr.mozilla.org/mozilla-central/source/gfx/layers/composite/TextureHost.cpp#771
I know the code. It was added by Bug 1208226. That bug just moved the shmem failure handling from Gecko IPC to gfx layers. But it did not make clear why mapping the shmem can fail in the chrome process when the shmem was successfully allocated in the content process. It might be better to first investigate why this happens. In Bug 1208226, we did not have STR that could be reproduced on any Mac.
Whiteboard: [gfx-noted]
Flags: needinfo?(sotaro.ikeda.g)
Flags: needinfo?(sotaro.ikeda.g)
Failure to map a shmem on the parent side has often been caused by lack of contiguous address space. There might be other reasons.
Any news here? This is one of two bugs keeping OSX 10.10 debug from being green w/ e10s enabled.
Whiteboard: [gfx-noted] → [gfx-noted][e10s-orangeblockers]
I'll get a patch that at least lets me pass the test, even if it leaves another assertion elsewhere, so that we can enable the tests.
Flags: needinfo?(milan)
No such patch anymore, the code has changed enough. Will need a proper solution.
I'm somewhat running out of time to do this, but I will keep looking; Peter, maybe somebody can join in parallel, and whoever gets there first :)
Flags: needinfo?(howareyou322)
(In reply to Nicolas Silva [:nical] from comment #7)
> Failure to map a shmem on the parent side has often been caused by lack of
> contiguous address space. There might be other reasons.
Yes, this is the test that's checking what happens when people ask for too much space (in this case, 400MB). So, we know why the allocation fails, and we expect it to fail with those arguments. We just don't want to crash just because somebody asks for a 10k x 10k canvas :)
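The 400MB figure can be sanity-checked against the "Failed to map shared memory (400011264 bytes)" warning in the log. The sketch below assumes 4 bytes per pixel and a 4096-byte page size; neither is stated in this bug, and the helper names are hypothetical. Under those assumptions a 10000 x 10000 canvas needs 400,000,000 bytes, which rounds up to 400,003,072 bytes of whole pages. The logged 400011264 bytes is itself a whole number of such pages and is 8192 bytes (two pages) larger; that extra is presumably bookkeeping overhead, but this is a guess.

```cpp
#include <cassert>
#include <cstdint>

// Bytes needed for a width x height surface.
// bytesPerPixel = 4 is an assumption (e.g. a BGRA8-style format).
std::uint64_t PixelBufferBytes(std::uint64_t width, std::uint64_t height,
                               std::uint64_t bytesPerPixel) {
  return width * height * bytesPerPixel;
}

// Round a byte count up to a whole number of pages.
// pageSize = 4096 is an assumption about the test machine.
std::uint64_t RoundUpToPages(std::uint64_t bytes, std::uint64_t pageSize) {
  assert(pageSize > 0);
  return (bytes + pageSize - 1) / pageSize * pageSize;
}
```

For example, `RoundUpToPages(PixelBufferBytes(10000, 10000, 4), 4096)` gives 400003072, and `400011264 - 400003072` is 8192.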
Bill, Brad - I understand you may know something about this (pass the request to parent to allocate a shmem) and could suggest the right way to go about digging our way out of the failed shmem allocation.
Flags: needinfo?(wmccloskey)
Flags: needinfo?(howareyou322)
Flags: needinfo?(blassey.bugs)
I'm tracing the allocation flow now. Maybe we don't need the assert when the allocation fails; we could just add proper error handling in the rendering code.
(In reply to Milan Sreckovic [:milan] from comment #13)
> Bill, Brad - I understand you may know something about this (pass the
> request to parent to allocate a shmem) and could suggest the right way to go
> about digging our way out of the failed shmem allocation.
Perhaps I'm reading this wrong, but based on the stacks it looks like the shared memory code is working correctly. Shared memory is limited, so it is possible for the allocations to fail and we need to be able to handle that in the calling code. Let me know if I'm reading this wrong though.
Flags: needinfo?(blassey.bugs)
Both of you are correct - we do not need the assert when the allocation failed, because that can happen. We are handling it in the calling code, but in the wrong places - we hit all kinds of IPC assertions in the process, mostly in the (de)serialization code. Assuming those assertions should stay, we need to catch and handle these problems elsewhere.
Flags: needinfo?(milan)
I just removed the assert for the failed mapping. I also checked the IPC deserialization flow. It's fine with the null mapping in ShmemTextureHost. But locking will fail when we try to lock the ShmemTextureHost, and then there will be nothing on the screen.
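The behavior described in this comment can be sketched as follows. This is a minimal illustration, not the Gecko code: `SketchShmemTextureHost` and `CompositeSketch` are hypothetical names, and the real ShmemTextureHost has a different interface. The point is only that a host built around a null mapping stays valid as an object, refuses to lock, and the compositing step then draws nothing instead of asserting.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical stand-in for a shmem-backed texture host; not the Gecko class.
class SketchShmemTextureHost {
 public:
  // `mapped` is null when mapping the shared memory failed in the parent.
  SketchShmemTextureHost(std::uint8_t* mapped, std::size_t size)
      : mBuffer(mapped), mSize(mapped ? size : 0) {}

  std::uint8_t* GetBuffer() const { return mBuffer; }
  std::size_t GetBufferSize() const { return mSize; }

  // Locking fails when there is no mapping or the host is already locked.
  bool Lock() {
    if (!mBuffer || mLocked) {
      return false;
    }
    mLocked = true;
    return true;
  }

  void Unlock() {
    assert(mLocked);  // Unlock must pair with a successful Lock.
    mLocked = false;
  }

 private:
  std::uint8_t* mBuffer;
  std::size_t mSize;
  bool mLocked = false;
};

// Hypothetical compositing step: returns true only if something was drawn.
bool CompositeSketch(SketchShmemTextureHost& host) {
  if (!host.Lock()) {
    return false;  // Lock failed, so nothing ends up on the screen.
  }
  // A real compositor would read host.GetBuffer() here.
  host.Unlock();
  return true;
}
```

With a null mapping, `CompositeSketch` returns false and no assertion fires, which matches the "nothing on the screen" outcome described above.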
(In reply to Jerry Shih[:jerry] (UTC+8) from comment #17)
> I just remove the assert for that mapping failed. I also check the ipc
> deserialized flow.
> It's fine with the null mapping in ShmemTextureHost. But we will have lock
> failed when we try to lock the ShmemTextureHost. Then there will be nothing
> on the screen.
Perhaps we should still crash debug builds when the requested size is below a reasonable threshold, like here:
https://dxr.mozilla.org/mozilla-central/rev/29d5a4175c8b74f45482276a53985cf2568b4be2/gfx/layers/d3d11/TextureD3D11.cpp#372
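A rough sketch of the threshold idea, under stated assumptions: the function names are hypothetical, the threshold is left as a parameter because no value is agreed in this bug, and this is not what the linked TextureD3D11.cpp code literally does. A failed mapping for a small request is treated as a bug and asserts in debug builds, while a failed mapping for a huge request is tolerated.

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical policy: a mapping failure is only suspicious (and worth a
// debug-build assertion) when the request was smaller than `threshold`.
bool ShouldAssertOnMapFailure(std::size_t requestedBytes,
                              std::size_t threshold) {
  return requestedBytes < threshold;
}

// Returns true if the mapping succeeded, false if the failure is tolerated.
// In debug builds, asserts when a small request failed to map.
bool CheckMapping(const void* mapped, std::size_t requestedBytes,
                  std::size_t threshold) {
  if (mapped) {
    return true;
  }
  assert(!ShouldAssertOnMapFailure(requestedBytes, threshold) &&
         "small shmem mapping failed");
  return false;
}
```

With an illustrative 1 MiB threshold, the 400011264-byte request from the log would fail quietly, while a failed 4096-byte request would trip the assertion in a debug build.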
(In reply to Milan Sreckovic [:milan] from comment #16)
> Both of you are correct - we do not need the assert when the allocation
> failed, because that can happen. We are handling it in the calling code,
> but in the wrong places - we hit all kinds of IPC assertions in the process,
> mostly in the (de)serialization code. Assuming those assertions should
> stay, we need to catch and handle these problems elsewhere.
Actually, with the latest central, I don't get the additional assertions. Looks like something else changed to help us deal with this.
Comment on attachment 8740047
remove the TextureHost shmem mapping failed assert.
Review of attachment 8740047:
Let's definitely do this; in the past, this wasn't enough, but with the latest mozilla-central, I don't get any other assertions, testing or interactive.
Attachment #8740047 - Flags: review+
Flags: needinfo?(wmccloskey)
Assign to Jerry since he is working on this.
Assignee: nobody → hshih
(In reply to Nicolas Silva [:nical] from comment #18)
> (In reply to Jerry Shih[:jerry] (UTC+8) from comment #17)
> > I just remove the assert for that mapping failed. I also check the ipc
> > deserialized flow.
> > It's fine with the null mapping in ShmemTextureHost. But we will have lock
> > failed when we try to lock the ShmemTextureHost. Then there will be nothing
> > on the screen.
>
> Perhaps we should still crash debug builds when the requested size is below
> a reasonable threshold, like here:
> https://dxr.mozilla.org/mozilla-central/rev/29d5a4175c8b74f45482276a53985cf2568b4be2/gfx/layers/d3d11/TextureD3D11.cpp#372
We have a 10k x 10k canvas in the 780392-1.html test, so I'm not sure what a reasonable threshold would be. And we need to pass the test in both debug and release builds.
Status: NEW → ASSIGNED
(In reply to Carsten Book [:Tomcat] from comment #25)
> https://hg.mozilla.org/mozilla-central/rev/aa80c4c2ed4f
10.10 debug C-e10s is green on Ash now :). Given that it's a debug-only fix, I'm calling this wontfix from a backporting standpoint.
Status: ASSIGNED → RESOLVED
Closed: 9 years ago
Keywords: leave-open
Resolution: --- → FIXED
Target Milestone: --- → mozilla48
Flags: needinfo?(sotaro.ikeda.g)
Regressions: 1532870