Closed Bug 1155423 Opened 9 years ago Closed 8 years ago

Intermittent 632781-verybig.html | application crashed [@ mozilla::ipc::MessageChannel::AssertWorkerThread() const]

Categories

(Core :: Graphics: Layers, defect)

39 Branch
All
Linux
defect
Not set
critical

Tracking

()

RESOLVED WORKSFORME
Tracking Status
e10s + ---

People

(Reporter: KWierso, Assigned: mchang)

References

(Blocks 1 open bug)

Details

(Keywords: crash, intermittent-failure, Whiteboard: [gfx-noted])

14:17:19 INFO - REFTEST TEST-START | file:///builds/slave/test/build/tests/reftest/tests/layout/reftests/bugs/632781-verybig.html
14:17:19 INFO - REFTEST TEST-LOAD | file:///builds/slave/test/build/tests/reftest/tests/layout/reftests/bugs/632781-verybig.html | 4112 / 6039 (68%)
14:17:19 INFO - [Child 1916] WARNING: NS_ENSURE_TRUE(node) failed: file /builds/slave/m-in-l64-d-0000000000000000000/build/src/layout/base/nsDocumentViewer.cpp, line 3530
14:17:19 INFO - [Child 1916] WARNING: NS_ENSURE_TRUE(node) failed: file /builds/slave/m-in-l64-d-0000000000000000000/build/src/layout/base/nsDocumentViewer.cpp, line 3530
14:17:19 INFO - [Child 1916] WARNING: NS_ENSURE_TRUE(node) failed: file /builds/slave/m-in-l64-d-0000000000000000000/build/src/layout/base/nsDocumentViewer.cpp, line 3530
14:17:19 INFO - ++DOMWINDOW == 62 (0x7eff96de4400) [pid = 1916] [serial = 13789] [outer = 0x7eff9d190400]
14:17:20 INFO - --DOMWINDOW == 61 (0x7eff932e9400) [pid = 1916] [serial = 13757] [outer = (nil)] [url = http://localhost:58885/1429216986095/165/bugs/621253-helper-externalFilter.xhtml]
14:17:20 INFO - --DOMWINDOW == 60 (0x7eff96cb7000) [pid = 1916] [serial = 13748] [outer = (nil)] [url = data:text/html;charset=UTF-8,%3C%21%2D%2DCLEAR%2D%2D%3E]
14:17:20 INFO - --DOMWINDOW == 59 (0x7eff96bbe800) [pid = 1916] [serial = 13745] [outer = (nil)] [url = http://localhost:58885/1429216986086/164/bugs/621253-helper-externalFilter.xhtml]
14:17:20 INFO - --DOMWINDOW == 58 (0x7eff96e45800) [pid = 1916] [serial = 13751] [outer = (nil)] [url = file:///builds/slave/test/build/tests/reftest/tests/layout/reftests/bugs/621253-helper-internalFilter.xhtml]
14:17:20 INFO - --DOMWINDOW == 57 (0x7eff97645400) [pid = 1916] [serial = 13743] [outer = (nil)] [url = http://localhost:58885/1429216986086/164/bugs/621253-1-externalFilter.html]
14:17:20 INFO - --DOMWINDOW == 56 (0x7eff96bb2800) [pid = 1916] [serial = 13729] [outer = (nil)] [url = http://localhost:58885/1429216986066/162/bugs/615121-2-notref.html]
14:17:20 INFO - --DOMWINDOW == 55 (0x7eff96d98800) [pid = 1916] [serial = 13730] [outer = (nil)] [url = data:text/html;charset=UTF-8,%3C%21%2D%2DCLEAR%2D%2D%3E]
14:17:20 INFO - --DOMWINDOW == 54 (0x7eff96da0c00) [pid = 1916] [serial = 13731] [outer = (nil)] [url = file:///builds/slave/test/build/tests/reftest/tests/layout/reftests/bugs/617242-1.html]
14:17:20 INFO - --DOMWINDOW == 53 (0x7eff932e8800) [pid = 1916] [serial = 13732] [outer = (nil)] [url = file:///builds/slave/test/build/tests/reftest/tests/layout/reftests/bugs/617242-1-ref.html]
14:17:20 INFO - --DOMWINDOW == 52 (0x7eff96bb6000) [pid = 1916] [serial = 13733] [outer = (nil)] [url = data:text/html;charset=UTF-8,%3C%21%2D%2DCLEAR%2D%2D%3E]
14:17:20 INFO - --DOMWINDOW == 51 (0x7eff96dadc00) [pid = 1916] [serial = 13728] [outer = (nil)] [url = http://localhost:58885/1429216986066/162/bugs/615121-2.html]
14:17:20 INFO - --DOMWINDOW == 50 (0x7eff96bbe400) [pid = 1916] [serial = 13746] [outer = (nil)] [url = http://localhost:58885/1429216986086/164/bugs/621253-helper-externalFilter.xhtml]
14:17:20 INFO - --DOMWINDOW == 49 (0x7eff96bbf800) [pid = 1916] [serial = 13736] [outer = (nil)] [url = data:text/html;charset=UTF-8,%3C%21%2D%2DCLEAR%2D%2D%3E]
14:17:20 INFO - --DOMWINDOW == 48 (0x7eff96cbd800) [pid = 1916] [serial = 13740] [outer = (nil)] [url = http://localhost:58885/1429216986080/163/bugs/619511-1.html]
14:17:20 INFO - --DOMWINDOW == 47 (0x7eff96da0800) [pid = 1916] [serial = 13735] [outer = (nil)] [url = file:///builds/slave/test/build/tests/reftest/tests/layout/reftests/bugs/618071-notref.html]
14:17:20 INFO - --DOMWINDOW == 46 (0x7eff96caf400) [pid = 1916] [serial = 13741] [outer = (nil)] [url = http://localhost:58885/1429216986080/163/bugs/619511-1-ref.html]
14:17:20 INFO - --DOMWINDOW == 45 (0x7eff96eb6800) [pid = 1916] [serial = 13747] [outer = (nil)] [url = http://localhost:58885/1429216986086/164/bugs/621253-1-ref.html]
14:17:20 INFO - --DOMWINDOW == 44 (0x7eff96e4a800) [pid = 1916] [serial = 13739] [outer = (nil)] [url = data:text/html;charset=UTF-8,%3C%21%2D%2DCLEAR%2D%2D%3E]
14:17:20 INFO - --DOMWINDOW == 43 (0x7eff96eab800) [pid = 1916] [serial = 13734] [outer = (nil)] [url = file:///builds/slave/test/build/tests/reftest/tests/layout/reftests/bugs/618071.html]
14:17:20 INFO - --DOMWINDOW == 42 (0x7eff96cb2800) [pid = 1916] [serial = 13742] [outer = (nil)] [url = data:text/html;charset=UTF-8,%3C%21%2D%2DCLEAR%2D%2D%3E]
14:17:20 INFO - --DOMWINDOW == 41 (0x7eff96bb5800) [pid = 1916] [serial = 13738] [outer = (nil)] [url = file:///builds/slave/test/build/tests/reftest/tests/layout/reftests/bugs/619117-1-ref.html]
14:17:20 INFO - --DOMWINDOW == 40 (0x7eff96bbd400) [pid = 1916] [serial = 13737] [outer = (nil)] [url = file:///builds/slave/test/build/tests/reftest/tests/layout/reftests/bugs/619117-1.html]
14:17:20 INFO - [Child 1916] WARNING: NS_ENSURE_TRUE(node) failed: file /builds/slave/m-in-l64-d-0000000000000000000/build/src/layout/base/nsDocumentViewer.cpp, line 3530
14:17:20 INFO - [Child 1916] WARNING: NS_ENSURE_TRUE(node) failed: file /builds/slave/m-in-l64-d-0000000000000000000/build/src/layout/base/nsDocumentViewer.cpp, line 3530
14:17:20 INFO - [Child 1916] WARNING: NS_ENSURE_TRUE(node) failed: file /builds/slave/m-in-l64-d-0000000000000000000/build/src/layout/base/nsDocumentViewer.cpp, line 3530
14:17:20 INFO - [Child 1916] WARNING: Most platforms still need an optimized way to share GL cross process.: file /builds/slave/m-in-l64-d-0000000000000000000/build/src/gfx/layers/client/CanvasClient.cpp, line 42
14:17:33 INFO - firefox: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
14:17:33 INFO - plugin-container: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
14:17:33 INFO - nsStringStats
14:17:33 INFO - => mAllocCount: 143
14:17:33 INFO - => mReallocCount: 1
14:17:33 INFO - => mFreeCount: 136 -- LEAKED 7 !!!
14:17:33 INFO - => mShareCount: 297
14:17:33 INFO - => mAdoptCount: 0
14:17:33 INFO - => mAdoptFreeCount: 0
14:17:33 INFO - => Process ID: 1990, Thread ID: 140472436787648
14:17:33 INFO - plugin-container: Fatal IO error 12 (Cannot allocate memory) on X server :0.
14:17:33 INFO - nsStringStats
14:17:33 INFO - => mAllocCount: 3765567
14:17:33 INFO - => mReallocCount: 251630
14:17:33 INFO - => mFreeCount: 3754832 -- LEAKED 10735 !!!
14:17:33 INFO - => mShareCount: 6665733
14:17:33 INFO - => mAdoptCount: 463234
14:17:33 INFO - => mAdoptFreeCount: 463232 -- LEAKED 2 !!!
14:17:33 INFO - => Process ID: 1916, Thread ID: 139636700649920
14:17:33 INFO - Assertion failure: mIOLoop == MessageLoop::current() (not on I/O thread!), at ../../dist/include/mozilla/ipc/MessageLink.h:147
14:17:34 INFO - Hit MOZ_CRASH(Layer not thread-safe) at /builds/slave/m-in-l64-d-0000000000000000000/build/src/gfx/layers/Layers.h:738
14:17:34 INFO - g_dbus_connection_real_closed: Remote peer vanished with error: Underlying GIOStream returned 0 bytes on an async read (g-io-error-quark, 0). Exiting.
14:17:34 WARNING - TEST-UNEXPECTED-FAIL | file:///builds/slave/test/build/tests/reftest/tests/layout/reftests/bugs/632781-verybig.html | application terminated with exit code -15
14:17:44 INFO - mozcrash INFO | Saved minidump as /builds/slave/test/build/blobber_upload_dir/1783d515-4f1e-330c-54f31b09-11abb6b2.dmp
14:17:44 WARNING - PROCESS-CRASH | file:///builds/slave/test/build/tests/reftest/tests/layout/reftests/bugs/632781-verybig.html | application crashed [@ mozilla::ipc::MessageChannel::AssertWorkerThread() const]
14:17:44 INFO - Crash dump filename: /tmp/tmpwHz2ON.mozrunner/minidumps/1783d515-4f1e-330c-54f31b09-11abb6b2.dmp
14:17:44 INFO - Operating system: Linux
14:17:44 INFO - 0.0.0 Linux 3.2.0-76-generic #111-Ubuntu SMP Tue Jan 13 22:16:09 UTC 2015 x86_64
14:17:44 INFO - CPU: amd64
14:17:44 INFO - family 6 model 62 stepping 4
14:17:44 INFO - 1 CPU
14:17:44 INFO - Crash reason: SIGSEGV
14:17:44 INFO - Crash address: 0xc
14:17:44 INFO - Thread 23 (crashed) 
14:17:44 INFO - 0 libxul.so!mozilla::ipc::MessageChannel::AssertWorkerThread() const [MessageChannel.h:c75413a564b2 : 417 + 0x5]
14:17:44 INFO - rbx = 0x000000000000000e r12 = 0x00007fe53b2747f0
14:17:44 INFO - r13 = 0x0000000000000001 r14 = 0x0000000000000701
14:17:44 INFO - r15 = 0x0000000000000000 rip = 0x00007fe564ef6b99
14:17:44 INFO - rsp = 0x00007fe54a6feb50 rbp = 0x00007fe54a6feb60
14:17:44 INFO - Found by: given as instruction pointer in context
14:17:44 INFO - 1 libxul.so!mozilla::ipc::MessageChannel::CxxStackFrame::CxxStackFrame(mozilla::ipc::MessageChannel&, mozilla::ipc::Direction, IPC::Message const*) [MessageChannel.cpp:c75413a564b2 : 219 + 0x7]
14:17:44 INFO - rbx = 0x00007fe54a6febc8 r12 = 0x00007fe53b2747f0
14:17:44 INFO - r13 = 0x0000000000000001 r14 = 0x0000000000000701
14:17:44 INFO - r15 = 0x0000000000000000 rip = 0x00007fe564efd009
14:17:44 INFO - rsp = 0x00007fe54a6feb70 rbp = 0x00007fe54a6febb0
14:17:44 INFO - Found by: call frame info
14:17:44 INFO - 2 libxul.so!mozilla::ipc::MessageChannel::Send(IPC::Message*) [MessageChannel.cpp:c75413a564b2 : 536 + 0x4]
14:17:44 INFO - rbx = 0x00007fe5408bac68 r12 = 0x00007fe53b2747f0
14:17:44 INFO - r13 = 0x00007fe54a6fec58 r14 = 0x0000000000000701
14:17:44 INFO - r15 = 0x0000000000000000 rip = 0x00007fe564f04381
14:17:44 INFO - rsp = 0x00007fe54a6febc0 rbp = 0x00007fe54a6febf0
14:17:44 INFO - Found by: call frame info
14:17:44 INFO - 3 libxul.so!mozilla::layout::PVsyncParent::SendNotify(mozilla::TimeStamp const&) [PVsyncParent.cpp:c75413a564b2 : 64 + 0xb]
14:17:44 INFO - rbx = 0x00007fe5409de0e0 r12 = 0x00007fe53b2747f0
14:17:44 INFO - r13 = 0x00007fe54a6fec58 r14 = 0x0000000000000701
14:17:44 INFO - r15 = 0x0000000000000000 rip = 0x00007fe5650410d4
14:17:44 INFO - rsp = 0x00007fe54a6fec00 rbp = 0x00007fe54a6fec40
14:17:44 INFO - Found by: call frame info
14:17:44 INFO - 4 libxul.so!mozilla::layout::VsyncParent::DispatchVsyncEvent(mozilla::TimeStamp) [VsyncParent.cpp:c75413a564b2 : 71 + 0x8]
14:17:44 INFO - rbx = 0x00007fe54aa87990 r12 = 0x00007fe55267c701
14:17:44 INFO - r13 = 0x00007fe54a6fed0f r14 = 0x0000000000000701
14:17:44 INFO - r15 = 0x0000000000000000 rip = 0x00007fe5665a11da
14:17:44 INFO - rsp = 0x00007fe54a6fec50 rbp = 0x00007fe54a6fec60
14:17:44 INFO - Found by: call frame info
14:17:44 INFO - 5 libxul.so!nsRunnableMethodImpl<void (mozilla::layout::VsyncParent::*)(mozilla::TimeStamp), true, mozilla::TimeStamp>::Run() [nsThreadUtils.h:c75413a564b2 : 615 + 0x16]
14:17:44 INFO - rbx = 0x00007fe54aa87990 r12 = 0x00007fe55267c701
14:17:44 INFO - r13 = 0x00007fe54a6fed0f r14 = 0x0000000000000701
14:17:44 INFO - r15 = 0x0000000000000000 rip = 0x00007fe5665a1208
14:17:44 INFO - rsp = 0x00007fe54a6fec70 rbp = 0x00007fe54a6fec70
14:17:44 INFO - Found by: call frame info
14:17:44 INFO - 6 libxul.so!nsThread::ProcessNextEvent(bool, bool*) [nsThread.cpp:c75413a564b2 : 866 + 0x11]
14:17:44 INFO - rbx = 0x00007fe54aa87990 r12 = 0x00007fe55267c701
14:17:44 INFO - r13 = 0x00007fe54a6fed0f r14 = 0x0000000000000701
14:17:44 INFO - r15 = 0x0000000000000000 rip = 0x00007fe564c811cd
14:17:44 INFO - rsp = 0x00007fe54a6fec80 rbp = 0x00007fe54a6fecf0
14:17:44 INFO - Found by: call frame info
14:17:44 INFO - 7 libxul.so!NS_ProcessNextEvent(nsIThread*, bool) [nsThreadUtils.cpp:c75413a564b2 : 265 + 0xc]
14:17:44 INFO - rbx = 0x00007fe54a7aef01 r12 = 0x00007fe55267c760
14:17:44 INFO - r13 = 0x00007fe54a7aefe8 r14 = 0x0000000000000700
14:17:44 INFO - r15 = 0x00007fe54a7aefe0 rip = 0x00007fe564ca006e
14:17:44 INFO - rsp = 0x00007fe54a6fed00 rbp = 0x00007fe54a6fed20
14:17:44 INFO - Found by: call frame info
14:17:44 INFO - 8 libxul.so!mozilla::ipc::MessagePumpForNonMainThreads::Run(base::MessagePump::Delegate*) [MessagePump.cpp:c75413a564b2 : 368 + 0xd]
14:17:44 INFO - rbx = 0x00007fe54a7aefc0 r12 = 0x00007fe55267c760
14:17:44 INFO - r13 = 0x00007fe54a7aefe8 r14 = 0x0000000000000700
14:17:44 INFO - r15 = 0x00007fe54a7aefe0 rip = 0x00007fe564efe29c
14:17:44 INFO - rsp = 0x00007fe54a6fed30 rbp = 0x00007fe54a6fed80
14:17:44 INFO - Found by: call frame info
14:17:44 INFO - 9 libxul.so!MessageLoop::RunInternal() [message_loop.cc:c75413a564b2 : 233 + 0x16]
14:17:44 INFO - rbx = 0x00007fe55267c760 r12 = 0x00007fe55267c760
14:17:44 INFO - r13 = 0x00007fe54aa879b0 r14 = 0x0000000000000773
14:17:44 INFO - r15 = 0x0000000000000001 rip = 0x00007fe564ee0901
14:17:44 INFO - rsp = 0x00007fe54a6fed90 rbp = 0x00007fe54a6fedc0
14:17:44 INFO - Found by: call frame info
14:17:44 INFO - 10 libxul.so!MessageLoop::Run() [message_loop.cc:c75413a564b2 : 226 + 0x7]
14:17:44 INFO - rbx = 0x00007fe55267c760 r12 = 0x00007fe55267c760
14:17:44 INFO - r13 = 0x00007fe54aa879b0 r14 = 0x0000000000000773
14:17:44 INFO - r15 = 0x0000000000000001 rip = 0x00007fe564ee0928
14:17:44 INFO - rsp = 0x00007fe54a6fedd0 rbp = 0x00007fe54a6fee00
14:17:44 INFO - Found by: call frame info
14:17:44 INFO - 11 libxul.so!nsThread::ThreadFunc(void*) [nsThread.cpp:c75413a564b2 : 364 + 0x7]
14:17:44 INFO - rbx = 0x00007fe54aa87990 r12 = 0x00007fe55267c760
14:17:44 INFO - r13 = 0x00007fe54aa879b0 r14 = 0x0000000000000773
14:17:44 INFO - r15 = 0x0000000000000001 rip = 0x00007fe564c81aaf
14:17:44 INFO - rsp = 0x00007fe54a6fee10 rbp = 0x00007fe54a6fee50
14:17:44 INFO - Found by: call frame info
14:17:44 INFO - 12 libnspr4.so!_pt_root [ptthread.c:c75413a564b2 : 212 + 0x6]
14:17:44 INFO - rbx = 0x00007fe54aaf56a0 r12 = 0x0000000000000000
14:17:44 INFO - r13 = 0x0000000000000773 r14 = 0x0000000000000773
14:17:44 INFO - r15 = 0x0000000000000001 rip = 0x00007fe56aae6ad8
14:17:44 INFO - rsp = 0x00007fe54a6fee60 rbp = 0x00007fe54a6feea0
14:17:44 INFO - Found by: call frame info
14:17:44 INFO - 13 libpthread-2.15.so + 0x7e99
14:17:44 INFO - rbx = 0x0000000000000000 r12 = 0x00007fff3c4e2cb0
14:17:44 INFO - r13 = 0x00007fe54a6ff9c0 r14 = 0x0000000000000000
14:17:44 INFO - r15 = 0x0000000000000003 rip = 0x00007fe56bf2de9a
14:17:44 INFO - rsp = 0x00007fe54a6feeb0 rbp = 0x0000000000000000
14:17:44 INFO - Found by: call frame info
14:17:44 INFO - Thread 0
Severity: normal → critical
Component: Layout → IPC
Keywords: crash
Blocks: e10s-tests
Could you take a look at this Mason? It seems to be some sort of vsync issue.
Component: IPC → Graphics: Layers
Flags: needinfo?(mchang)
Assignee: nobody → mchang
Status: NEW → ASSIGNED
Flags: needinfo?(mchang)
I'm not sure this is actually a vsync issue versus that's what the crash dump caught in the parent process. Before the assertion failure, I see this stack trace from the log in [1].

20:26:10     INFO -  Assertion failure: mIOLoop == MessageLoop::current() (not on I/O thread!), at ../../dist/include/mozilla/ipc/MessageLink.h:147
20:26:10     INFO -  Hit MOZ_CRASH(Layer not thread-safe) at /builds/slave/b2g-in-l64-d-00000000000000000/build/src/gfx/layers/Layers.h:732
20:26:10     INFO -  #01: mozilla::ipc::ProcessLink::OnChannelError() [ipc/glue/MessageLink.cpp:401]
20:26:10     INFO -  #01: std::_Rb_tree<unsigned long, std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState>, std::_Select1st<std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState> > >::_M_erase(std::_Rb_tree_node<std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState> >*) [memory/mozalloc/mozalloc.h:210]
20:26:10     INFO -  #02: std::_Rb_tree<unsigned long, std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState>, std::_Select1st<std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState> > >::_M_erase(std::_Rb_tree_node<std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState> >*) [gfx/layers/ipc/LayerTransactionChild.cpp:141]

Layers are not thread safe ref counted[2]. I wonder if the child process is crashing and then we send a vsync message and get this failure from the parent process. That would make more sense for a couple of reasons.

First, the VsyncParent is on the background thread of the parent process and asserts that it's on the background thread [3]. This test is also the longest running test in the suite, and takes 21 seconds to run according to the log, which means we've been sending vsync messages for 21 seconds already. In addition, this reftest is in the middle of the suite, so it's not a start up / shutdown issue as the other reftests have been using vsync. Unless the background thread changes it's message loop or channel halfway between, we shouldn't have switched message loop threads. 

Also, looking at the log, I checked out thread 0, the main thread,  to see the stack trace and get this:

20:26:23     INFO -  13  libxul.so!mozilla::layers::Layer::Release() [Layers.h:cd229a55ed1c : 732 + 0x17]
20:26:23     INFO -      rip = 0x00007f9af87022fa   rsp = 0x00007fffa3f95eb0
20:26:23     INFO -      rbp = 0x00007fffa3f96270
20:26:23     INFO -      Found by: stack scanning
20:26:23     INFO -  14  libxul.so!std::_Rb_tree<unsigned long, std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState>, std::_Select1st<std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState> > >::_M_erase(std::_Rb_tree_node<std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState> >*) [stl_pair.h:cd229a55ed1c : 88 + 0x8]
20:26:23     INFO -      rbx = 0x00007f9add9c08a0   r12 = 0x00007f9afccb5520
20:26:23     INFO -      rip = 0x00007f9af877071f   rsp = 0x00007fffa3f96280
20:26:23     INFO -      rbp = 0x00007fffa3f962a0
20:26:23     INFO -      Found by: call frame info
20:26:23     INFO -  15  libxul.so!std::_Rb_tree<unsigned long, std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState>, std::_Select1st<std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState> >, std::less<unsigned long>, std::allocator<std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState> > >::_M_erase(std::_Rb_tree_node<std::pair<unsigned long const, mozilla::layers::CompositorParent::LayerTreeState> >*) [stl_tree.h:cd229a55ed1c : 1082 + 0xb]
20:26:23     INFO -      rbx = 0x00007f9add9c0790   r12 = 0x00007f9afccb5520

We're releasing a layer on the main thread. The only thing that has a map<unsigned long, LayerTreeState> is sIndirectLayerTrees [4] and that usually releases a layer on the compositor thread[5]. All the other times we delete sIndirectLayerTree[id].mRoot, which is a layer, is on the compositor thread.

The CrossProcessCompositorParent can still erase a layer at [6] which I'm not sure is the main thread or not. Have a quick test on try to make sure we always erase on the compositor thread:

https://treeherder.mozilla.org/#/jobs?repo=try&revision=4c4d9e2fd16a

[1] http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/b2g-inbound-linux64-debug/1431050459/b2g-inbound_ubuntu64_vm-debug_test-reftest-e10s-1-bm117-tests1-linux64-build5.txt.gz
[2] https://dxr.mozilla.org/mozilla-central/source/gfx/layers/Layers.h#732
[3] https://dxr.mozilla.org/mozilla-central/source/layout/ipc/VsyncParent.cpp?from=VsyncParent.cpp&case=true#63
[4] https://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositorParent.cpp#101
[5] https://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositorParent.cpp#717
[6] https://dxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositorParent.cpp#1889
The logs certainly are pretty confusing. I just noticed that, right before we assert, these messages are printed:

20:26:10     INFO -  plugin-container: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
20:26:10     INFO -  firefox: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.

I wonder if there's some sort of X-level failure that we're not prepared for and it causes all sorts of badness down the line. Given that the test is called 632781-verybig.html, it wouldn't be too surprising that we would get some sort of allocation failure.
Hmm, so looks like on try, reftests-e10s does not run on debug builds explicitly :/. 

@Ryan - Who can we talk to if we want to run reftest-e10s on try? Thanks.

@Nical - Does comment 22 make sense to you? Is there anything we can check on Linux for comment 23? I remember you saying there was an issue where we can take down the X server if something was too big on Linux during the gfx dailies. Where can we check that? Thanks!
Flags: needinfo?(ryanvm)
Flags: needinfo?(nical.bugzilla)
(In reply to Mason Chang [:mchang] from comment #25)
> @Ryan - Who can we talk to if we want to run reftest-e10s on try? Thanks.

File a Release Engineering::General Automation bug for it.
Flags: needinfo?(ryanvm)
A workaround is to use |-u all|. That will run e10s debug reftests.
(In reply to Bill McCloskey (:billm) from comment #23)
> The logs certainly are pretty confusing. I just noticed that, right before
> we assert, these messages are printed:
> 
> 20:26:10     INFO -  plugin-container: Fatal IO error 11 (Resource
> temporarily unavailable) on X server :0.
> 20:26:10     INFO -  firefox: Fatal IO error 11 (Resource temporarily
> unavailable) on X server :0.
> 
> I wonder if there's some sort of X-level failure that we're not prepared for
> and it causes all sorts of badness down the line. Given that the test is
> called 632781-verybig.html, it wouldn't be too surprising that we would get
> some sort of allocation failure.

https://treeherder.mozilla.org/#/jobs?repo=try&revision=f0471b19c66d

That seems to be the case. The assertion I put in to assert that the layer state is removed on the compositor thread never failed, but in the log we got this:

19:27:44     INFO -  firefox: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
19:27:44     INFO -  plugin-container: Fatal IO error 11 (Resource temporarily unavailable) on X server :0.
....
....
19:27:45     INFO -  plugin-container: Fatal IO error 12 (Cannot allocate memory) on X server :0.

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/mchang@mozilla.com-f0471b19c66d/try-linux64-debug/try_ubuntu64_vm-debug_test-reftest-e10s-1-bm115-tests1-linux64-build62.txt.gz
(In reply to Mason Chang [:mchang] from comment #25)
> Hmm, so looks like on try, reftests-e10s does not run on debug builds
> explicitly :/. 
> 
> @Ryan - Who can we talk to if we want to run reftest-e10s on try? Thanks.
> 
> @Nical - Does comment 22 make sense to you?

Yup

> Is there anything we can check
> on Linux for comment 23? I remember you saying there was an issue where we
> can take down the X server if something was too big on Linux during the gfx
> dailies. Where can we check that? Thanks!

My proposition was to throttle the size of canvases to some sane limit, but that's mostly band aid. In the dailies the answer I got was, let's just keep working on removing our dependency to xrender.
As far as I understand we don't have a way to handle this specific allocation failure which happens asynchronously in the x server but I could be wrong (I hope I am wrong). I haven't been able to reproduce this one locally (my x server gets killed by the LMK when I force it to allocate too many big surfaces which is also a lot of fun).
Flags: needinfo?(nical.bugzilla)
Looking at the test, this looks like a giant 10000x10000 canvas - https://dxr.mozilla.org/mozilla-central/source/layout/reftests/bugs/632781-verybig.html

Is there a specific place we can assert that we got an allocation failure for this limit? I just want to double check that we're explicitly hitting this allocation failure and not something else. Thanks!
Flags: needinfo?(nical.bugzilla)
(In reply to Mason Chang [:mchang] from comment #30)
> Looking at the test, this looks like a giant 10000x10000 canvas -
> https://dxr.mozilla.org/mozilla-central/source/layout/reftests/bugs/632781-
> verybig.html
> 
> Is there a specific place we can assert that we got an allocation failure
> for this limit? I just want to double check that we're explicitly hitting
> this allocation failure and not something else. Thanks!

I'd look at all of the places where we allocate the surface in DrawTargetCairo.cpp, as well as whether we return null in TextureClient::CreateForDrawing.
But it could still be a failure that happens asynchronously in the x server and the firefox process gets killed without getting a chance to handle the error (not sure).
Flags: needinfo?(nical.bugzilla)
Is this failure still happening?
Flags: needinfo?(mchang)
Whiteboard: [gfx-noted]
IIRC we don't use xrender anymore either.
Status: ASSIGNED → RESOLVED
Closed: 8 years ago
Flags: needinfo?(mchang)
Resolution: --- → WORKSFORME
You need to log in before you can comment on or make changes to this bug.