Closed
Bug 862324
Opened 12 years ago
Closed 12 years ago
Use after free of PGrallocBufferParent when the child process is killed
Categories
(Core :: Graphics, defect)
Tracking
()
RESOLVED
FIXED
mozilla23
People
(Reporter: cyu, Assigned: bjacob)
References
Details
Attachments
(2 files, 5 obsolete files)
(deleted),
patch
|
nrc
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
bjacob
:
review+
|
Details | Diff | Splinter Review |
Using trunk. STR as in bug 860079.
Here are the stack traces:
> Breakpoint 1, ~PGrallocBufferParent (this=0x44b99e58, __in_chrg=<value optimized out>) at ipc/ipdl/PGrallocBufferParent.cpp:49
> 49 PGrallocBufferParent::~PGrallocBufferParent()
> #0 ~PGrallocBufferParent (this=0x44b99e58, __in_chrg=<value optimized out>) at ipc/ipdl/PGrallocBufferParent.cpp:49
> #1 0x41dfee74 in ~GrallocBufferActor (this=0x44b99e40, __in_chrg=<value optimized out>) at gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp:193
> #2 0x41dfee9a in ~GrallocBufferActor (this=0x44b99e58, __in_chrg=<value optimized out>) at gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp:193
> #3 0x40e01f06 in mozilla::net::NeckoParent::DeallocPCookieService (this=<value optimized out>, cs=0x401ba044) at netwerk/ipc/NeckoParent.cpp:424
> #4 0x419e95ac in mozilla::layers::PLayersParent::DeallocSubtree (this=0x404486a0) at ipc/ipdl/PLayersParent.cpp:819
> #5 0x419defb0 in mozilla::layers::PCompositorParent::DeallocSubtree (this=0x47c61c50) at ipc/ipdl/PCompositorParent.cpp:827
> #6 0x419df1da in mozilla::layers::PCompositorParent::OnChannelError (this=0x47c61c50) at ipc/ipdl/PCompositorParent.cpp:664
> #7 0x41971e4c in mozilla::ipc::AsyncChannel::NotifyMaybeChannelError (this=0x47c61c58) at ipc/glue/AsyncChannel.cpp:570
> #8 0x41973124 in mozilla::ipc::AsyncChannel::OnNotifyMaybeChannelError (this=0x47c61c58) at ipc/glue/AsyncChannel.cpp:535
> #9 0x4194732c in DispatchToMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)()> (this=<value optimized out>) at ipc/chromium/src/base/tuple.h:383
> #10 RunnableMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)(), Tuple0>::Run (this=<value optimized out>) at ipc/chromium/src/base/task.h:307
> #11 0x41d6345e in MessageLoop::RunTask (this=0x478ffdd0, task=0x46c5f060) at ipc/chromium/src/base/message_loop.cc:334
> #12 0x41d63c88 in MessageLoop::DeferOrRunPendingTask (this=0x44b99e58, pending_task=<value optimized out>) at ipc/chromium/src/base/message_loop.cc:342
> #13 0x41d649de in MessageLoop::DoWork (this=0x478ffdd0) at ipc/chromium/src/base/message_loop.cc:442
> #14 0x41d64d5a in base::MessagePumpDefault::Run (this=0x46864160, delegate=0x478ffdd0) at ipc/chromium/src/base/message_pump_default.cc:23
> #15 0x41d63a12 in MessageLoop::RunInternal (this=0x478ffdd0) at ipc/chromium/src/base/message_loop.cc:216
> #16 0x41d63a72 in MessageLoop::RunHandler (this=0x478ffdd0) at ipc/chromium/src/base/message_loop.cc:209
> #17 MessageLoop::Run (this=0x478ffdd0) at ipc/chromium/src/base/message_loop.cc:183
> #18 0x41d6d9a4 in base::Thread::ThreadMain (this=0x4682d3a0) at ipc/chromium/src/base/thread.cc:156
> #19 0x41d7b0d6 in ThreadFunc (closure=0x44b99e58) at ipc/chromium/src/base/platform_thread_posix.cc:39
> #20 0x400dce18 in __thread_entry (func=0x41d7b0cd <ThreadFunc>, arg=0x4682d3a0, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
> #21 0x400dc96c in pthread_create (thread_out=<value optimized out>, attr=0xbeae1240, start_routine=0x41d7b0cd <ThreadFunc>, arg=0x4682d3a0) at bionic/libc/bionic/pthread.c:357
> #22 0x00000000 in ?? ()
>
> Program received signal SIGSEGV, Segmentation fault.
> 0x4278585c in mozalloc_abort (msg=<value optimized out>) at memory/mozalloc/mozalloc_abort.cpp:30
> 30 MOZ_CRASH();
> (gdb) bt
> #0 0x4278585c in mozalloc_abort (msg=<value optimized out>) at memory/mozalloc/mozalloc_abort.cpp:30
> #1 0x41d26e36 in Abort (aSeverity=<value optimized out>, aStr=<value optimized out>, aExpr=<value optimized out>, aFile=<value optimized out>, aLine=34) at xpcom/base/nsDebugImpl.cpp:430
> #2 NS_DebugBreak (aSeverity=<value optimized out>, aStr=<value optimized out>, aExpr=<value optimized out>, aFile=<value optimized out>, aLine=34) at xpcom/base/nsDebugImpl.cpp:387
> #3 0x41aa188c in mozilla::layers::PGrallocBuffer::Transition (from=<value optimized out>, trigger=..., next=<value optimized out>) at ipc/ipdl/PGrallocBuffer.cpp:34
> #4 0x419df80e in mozilla::layers::PGrallocBufferParent::Send__delete__ (actor=0x44b99e58) at ipc/ipdl/PGrallocBufferParent.cpp:82
> #5 0x41dff5c6 in mozilla::layers::ISurfaceAllocator::PlatformDestroySharedSurface (aSurface=0x44b99980) at gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp:274
> #6 0x41df8690 in mozilla::layers::ISurfaceAllocator::DestroySharedSurface (this=0x404486c4, aSurface=0x44b99980) at gfx/layers/ipc/ISurfaceAllocator.cpp:110
> #7 0x41dfbc90 in ~TextureHost (this=0x459f9b80, __in_chrg=<value optimized out>) at gfx/layers/composite/TextureHost.cpp:58
> #8 0x41dfb8fc in ~GrallocTextureHostOGL (this=0x459f9b80, __in_chrg=<value optimized out>) at gfx/layers/opengl/TextureHostOGL.cpp:653
> #9 0x41dfb912 in ~GrallocTextureHostOGL (this=0xa5, __in_chrg=<value optimized out>) at gfx/layers/opengl/TextureHostOGL.cpp:653
> #10 0x41dedf92 in mozilla::RefCounted<mozilla::layers::TextureSource>::Release (this=0x452320e8, t=0x0) at ../../dist/include/mozilla/RefPtr.h:71
> #11 mozilla::RefPtr<mozilla::layers::TextureHost>::unref (this=0x452320e8, t=0x0) at ../../dist/include/mozilla/RefPtr.h:166
> #12 mozilla::RefPtr<mozilla::layers::TextureHost>::assign (this=0x452320e8, t=0x0) at ../../dist/include/mozilla/RefPtr.h:152
> #13 0x41dedfec in mozilla::RefPtr<mozilla::layers::TextureHost>::operator= (this=0x452320e8, t=0x478ff138) at ../../dist/include/mozilla/RefPtr.h:127
> #14 0x41dee136 in mozilla::layers::ContentHostDoubleBuffered::DestroyTextures (this=0x45232080) at gfx/layers/composite/ContentHost.cpp:303
> #15 0x41dee7ba in ~ContentHostDoubleBuffered (this=0xa5, __in_chrg=<value optimized out>) at gfx/layers/composite/ContentHost.cpp:274
> #16 0x41dee7ea in ~ContentHostDoubleBuffered (this=0xa5, __in_chrg=<value optimized out>) at gfx/layers/composite/ContentHost.cpp:276
> #17 0x41dd80dc in mozilla::RefCounted<mozilla::layers::CompositableHost>::Release (this=<value optimized out>) at ../../dist/include/mozilla/RefPtr.h:71
> #18 0x41ddca24 in mozilla::RefPtr<mozilla::layers::ContentHost>::unref (this=0x47d57c00, __in_chrg=<value optimized out>) at ../../dist/include/mozilla/RefPtr.h:166
> #19 ~RefPtr (this=0x47d57c00, __in_chrg=<value optimized out>) at ../../dist/include/mozilla/RefPtr.h:116
> #20 ~ThebesLayerComposite (this=0x47d57c00, __in_chrg=<value optimized out>) at gfx/layers/composite/ThebesLayerComposite.cpp:42
> #21 0x41ddca5a in ~ThebesLayerComposite (this=0xa5, __in_chrg=<value optimized out>) at gfx/layers/composite/ThebesLayerComposite.cpp:42
> #22 0x41dd96ae in mozilla::layers::Layer::Release (this=<value optimized out>, aChild=0x47d57c90) at gfx/layers/Layers.h:588
> #23 mozilla::layers::ContainerLayerComposite::RemoveChild (this=<value optimized out>, aChild=0x47d57c90) at gfx/layers/composite/ContainerLayerComposite.cpp:246
> #24 0x41dd975a in ~ContainerLayerComposite (this=0x45183c00, __in_chrg=<value optimized out>) at gfx/layers/composite/ContainerLayerComposite.cpp:176
> #25 0x41dd97ce in ~ContainerLayerComposite (this=0xa5, __in_chrg=<value optimized out>) at gfx/layers/composite/ContainerLayerComposite.cpp:178
> #26 0x40edf34e in mozilla::layers::Layer::Release (this=0x45183c90) at ../../dist/include/Layers.h:588
> #27 0x41df5326 in ~nsRefPtr (this=0x404486a0, __in_chrg=<value optimized out>) at ../../dist/include/nsAutoPtr.h:880
> #28 ~ShadowLayersParent (this=0x404486a0, __in_chrg=<value optimized out>) at gfx/layers/ipc/ShadowLayersParent.cpp:150
> #29 0x41df535e in ~ShadowLayersParent (this=0xa5, __in_chrg=<value optimized out>) at gfx/layers/ipc/ShadowLayersParent.cpp:150
> #30 0x41deb5e2 in mozilla::layers::CrossProcessCompositorParent::DeallocPLayers (this=<value optimized out>, aLayers=0x404486a0) at gfx/layers/ipc/CompositorParent.cpp:1407
> #31 0x419defd6 in mozilla::layers::PCompositorParent::DeallocSubtree (this=0x47c61c50) at ipc/ipdl/PCompositorParent.cpp:831
> #32 0x419df1da in mozilla::layers::PCompositorParent::OnChannelError (this=0x47c61c50) at ipc/ipdl/PCompositorParent.cpp:664
> #33 0x41971e4c in mozilla::ipc::AsyncChannel::NotifyMaybeChannelError (this=0x47c61c58) at ipc/glue/AsyncChannel.cpp:570
> #34 0x41973124 in mozilla::ipc::AsyncChannel::OnNotifyMaybeChannelError (this=0x47c61c58) at ipc/glue/AsyncChannel.cpp:535
> #35 0x4194732c in DispatchToMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)()> (this=<value optimized out>) at ipc/chromium/src/base/tuple.h:383
> #36 RunnableMethod<mozilla::dom::ContentParent, void (mozilla::dom::ContentParent::*)(), Tuple0>::Run (this=<value optimized out>) at ipc/chromium/src/base/task.h:307
> #37 0x41d6345e in MessageLoop::RunTask (this=0x478ffdd0, task=0x46c5f060) at ipc/chromium/src/base/message_loop.cc:334
> #38 0x41d63c88 in MessageLoop::DeferOrRunPendingTask (this=0xa5, pending_task=<value optimized out>) at ipc/chromium/src/base/message_loop.cc:342
> #39 0x41d649de in MessageLoop::DoWork (this=0x478ffdd0) at ipc/chromium/src/base/message_loop.cc:442
> #40 0x41d64d5a in base::MessagePumpDefault::Run (this=0x46864160, delegate=0x478ffdd0) at ipc/chromium/src/base/message_pump_default.cc:23
> #41 0x41d63a12 in MessageLoop::RunInternal (this=0x478ffdd0) at ipc/chromium/src/base/message_loop.cc:216
> #42 0x41d63a72 in MessageLoop::RunHandler (this=0x478ffdd0) at ipc/chromium/src/base/message_loop.cc:209
> #43 MessageLoop::Run (this=0x478ffdd0) at ipc/chromium/src/base/message_loop.cc:183
> #44 0x41d6d9a4 in base::Thread::ThreadMain (this=0x4682d3a0) at ipc/chromium/src/base/thread.cc:156
> #45 0x41d7b0d6 in ThreadFunc (closure=0xa5) at ipc/chromium/src/base/platform_thread_posix.cc:39
> #46 0x400dce18 in __thread_entry (func=0x41d7b0cd <ThreadFunc>, arg=0x4682d3a0, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
> #47 0x400dc96c in pthread_create (thread_out=<value optimized out>, attr=0xbeae1240, start_routine=0x41d7b0cd <ThreadFunc>, arg=0x4682d3a0) at bionic/libc/bionic/pthread.c:357
> #48 0x00000000 in ?? ()
Comment 1•12 years ago
|
||
Which version of FirefoxOS do you use? From the crash log, the source code seems after gfx refactoring's one. They should be m-c/m-i.
Updated•12 years ago
|
Flags: needinfo?(cyu)
Comment 2•12 years ago
|
||
(In reply to Sotaro Ikeda [:sotaro] from comment #1)
> Which version of FirefoxOS do you use? From the crash log, the source code
> seems after gfx refactoring's one. They should be m-c/m-i.
Oh, it is written as turnk.
Comment 3•12 years ago
|
||
It seems like the crash is regression from gfx layer refactoring.
Flags: needinfo?(cyu)
Comment 4•12 years ago
|
||
Benoit, is this the same as the other ones we're tracking?
Flags: needinfo?(bjacob)
Assignee | ||
Comment 5•12 years ago
|
||
Yes, this is exactly the crash we've been having.
This bug is well-filed, the GDB session in comment 0 shows exactly what's happening, so let's keep this bug as our canonical one about this issue.
Flags: needinfo?(bjacob)
Updated•12 years ago
|
Assignee: nobody → bjacob
Comment 6•12 years ago
|
||
bug 776940 had a very similar problem
Assignee | ||
Comment 7•12 years ago
|
||
Here's what I know, based mostly on earlier work on this bug with Jeff M.
If you look at the two stacks in comment 0, the last stack frame in common is in PCompositorParent::DeallocSubtree().
Here's this function:
void
PCompositorParent::DeallocSubtree()
{
{
// Recursively deleting PLayers kids
InfallibleTArray<PLayersParent*>& kids = mManagedPLayersParent;
for (uint32_t i = 0; (i) < ((kids).Length()); (++(i))) {
(kids[i])->DeallocSubtree(); // <-- first stack, where we destroy
} // the PGrallocBufferParent
for (uint32_t i = 0; (i) < ((kids).Length()); (++(i))) {
DeallocPLayers(kids[i]); // <-- second stack, where we destroy
} // the TextureHost, calling Send__delete__
(mManagedPLayersParent).Clear(); // on the deleted PGrallocBufferParent
}
}
From there I am not sure how to interprete this bug, as I am not sure what design principles our IPC code relies on.
One possible interpretation is that this is almost right, we just are doing this in the wrong order: if the TextureHost were destroyed _before_ the PGrallocBufferParent, we would be in good shape --- according to that interpretation. We've been working with that interpretation in past attempts to fix this bug, but evidently we haven't succeeded.
Another interpretation might be that IPDL structures such as PGrallocBufferParent are NOT to be manually held by non-IPDL-generated code. Here, the TextureHost holds the PGrallocBufferParent indirectly: the TextureHost really holds a SurfaceDescriptor (TextureHost::mBuffer), which wraps a PGrallocBufferParent in this case. In this interpretation, the bug is that non-IPDL-generated code such as TextureHost should not hold on to IPDL stuff such as PGrallocBufferParent (here though the SurfaceDescriptor).
Josh, could you comment on that? Hope that made any sense...
Flags: needinfo?(josh)
Comment 8•12 years ago
|
||
At all times, code using IPDL actors needs to be ensuring that the actors are actually valid. This is the reason ActorDestroy exists; it should be the responsibility of a GrallocBufferParent actor to notify any code that could make use of it that it is no longer a valid actor. It looks like the ShadowLayerForwarder::PlatformDestroySharedSurface code is in the wrong here; the way IPDL deallocates subtrees is correct.
Flags: needinfo?(josh)
> In this interpretation, the bug
> is that non-IPDL-generated code such as TextureHost should not hold on to
> IPDL stuff such as PGrallocBufferParent (here though the SurfaceDescriptor).
I don't see how this could be the case -- you have to be able to hold on to IPDL "stuff" outside of IPDL objects, because that's your entry point to communication with the other end...
Comment 10•12 years ago
|
||
I am a bit confused by ISurfaceAllocator::PlatformDestroySharedSurface. The child case makes sense (Send__delete__). In the parent case, however, I don't get why we are sending a delete there.
Comment 11•12 years ago
|
||
Bug 860442 added that code, I don't think we had this before the refactoring. There should be a symmetry between the two PlatformDestroySharedSurface implementations (ISurfaceAllocator and ShadowLayerForwarder), which is no longer the case, so I would say bug 860442 needs a different fix, and backing out that code should fix this problem here.
Assignee | ||
Comment 12•12 years ago
|
||
The patch in bug 860442 only makes a difference in the child case; here we are in the parent case. So I don't suppose that it would make a difference here, though I could be missing something.
We've just looked into this with :jrmuizel, :sotaro and :nical, and agreed on the following:
The core problem is that TextureHost has a SurfaceDescriptor member (mBuffer). SurfaceDescriptor holds a raw pointer to a gralloc buffer actor, allowing the actor to disappear under its feet. The crash here then occurs when the TextureHost destructor then tries to call Send__delete__ on the already-disappeared actor.
The fix is to have TextureHost no longer have a SurfaceDescriptor member. Instead, concrete TextureHost child classes should hold concrete surfaces and keep them alive; there should be no need to keep the SurfaceDescriptor. Note that GrallocTextureHostOGL already has a android::sp<android::GraphicBuffer> member; the differences is that its base class, TextureHost, should no longer have a SurfaceDescriptor member.
:nical is already working on a sizable change to TextureHost (among other classes) that will in particular accomplish that. But that is only scheduled for Gecko 24.
So we want to go for a short-term fix here, fixing only the B2G path rather than spending much more time fixing TextureHost in general --- as :nical's upcoming changes will overwrite ours anyway.
The short-term fix is going to be to ensure that the mBuffer field of TextureHost is not used for anything else than passing the gralloc buffer to GrallocTextureHostOGL.
Assignee | ||
Comment 13•12 years ago
|
||
Hey, I have a quick hacky patch that /seems/ to work in that the b2g main process no longer crashes on http://nightly-gupshup.herokuapp.com, but I can't test that page yet because of this content process crash still happening:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 1491.1543]
0x40d39b12 in nsDOMCameraManager::AddRef (this=0x45c9dc10) at /hack/mozilla-central/dom/camera/DOMCameraManager.cpp:31
31 NS_IMPL_CYCLE_COLLECTING_ADDREF(nsDOMCameraManager)
(gdb) bt
#0 0x40d39b12 in nsDOMCameraManager::AddRef (this=0x45c9dc10) at /hack/mozilla-central/dom/camera/DOMCameraManager.cpp:31
#1 0x40bac0ec in nsRefPtr (this=0x464ffcfc, aSmartPtr=<value optimized out>) at ../../dist/include/nsAutoPtr.h:901
#2 0x40bad288 in mozilla::MediaManager::GetBackend (this=0x461a3850, aWindowId=3) at /hack/mozilla-central/dom/media/MediaManager.cpp:1153
#3 0x40baf606 in mozilla::GetUserMediaRunnable::Run (this=0x46395720) at /hack/mozilla-central/dom/media/MediaManager.cpp:532
#4 0x414cdcc0 in nsThread::ProcessNextEvent (this=0x45b27a20, mayWait=<value optimized out>, result=<value optimized out>) at /hack/mozilla-central/xpcom/threads/nsThread.cpp:627
#5 0x41491554 in NS_ProcessNextEvent (thread=0xb6, mayWait=true) at /hack/b2g/B2G/objdir-gecko/xpcom/build/nsThreadUtils.cpp:238
#6 0x414ce3e4 in nsThread::ThreadFunc (arg=<value optimized out>) at /hack/mozilla-central/xpcom/threads/nsThread.cpp:265
#7 0x42df728c in _pt_root (arg=<value optimized out>) at /hack/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:191
#8 0x40047e18 in __thread_entry (func=0x42df71f9 <_pt_root>, arg=0x463955e0, tls=<value optimized out>) at bionic/libc/bionic/pthread.c:217
#9 0x4004796c in pthread_create (thread_out=<value optimized out>, attr=0xbe99bb94, start_routine=0x42df71f9 <_pt_root>, arg=0x463955e0) at bionic/libc/bionic/pthread.c:357
#10 0x00000000 in ?? ()
Anyone got a clue about it?
Assignee | ||
Comment 14•12 years ago
|
||
This seems to be running nicely here:
- No compositor crash anymore on http://nightly-gupshup.herokuapp.com, even though the browser process keeps crashing as in the previous comment. This shows that the compositor now survives the content process crash.
- Also tested on some newspaper websites .
Attachment #741525 -
Flags: review?(nical.bugzilla)
Assignee | ||
Comment 15•12 years ago
|
||
Comment on attachment 741525 [details] [diff] [review]
hacky fix for GrallocTextureHostOGL to avoid storing the SurfaceDescriptor, until Nical makes everything nice
Review of attachment 741525 [details] [diff] [review]:
-----------------------------------------------------------------
Since this review is very time-sensitive and it's very late in Europe, let's add Nick as an alternate reviewer; one review is enough.
Attachment #741525 -
Flags: review?(ncameron)
Comment 16•12 years ago
|
||
Comment on attachment 741525 [details] [diff] [review]
hacky fix for GrallocTextureHostOGL to avoid storing the SurfaceDescriptor, until Nical makes everything nice
Review of attachment 741525 [details] [diff] [review]:
-----------------------------------------------------------------
r=me if it works :-p
::: gfx/layers/opengl/TextureHostOGL.cpp
@@ +752,5 @@
> +SurfaceDescriptor*
> +GrallocTextureHostOGL::GetBuffer() const
> +{
> + static SurfaceDescriptor dummyDescriptor;
> + return &dummyDescriptor;
Woah! This works? This kind of scares me. I'll trust you if you say it works, but presumably that is only if this never gets called? In which case please add at least a warning and maybe MOZ_NOT_REACHED
@@ +760,5 @@
> +GrallocTextureHostOGL::SetBuffer(SurfaceDescriptor* aBuffer, ISurfaceAllocator* aAllocator)
> +{
> + mDeAllocator = aAllocator;
> +
> + android::sp<android::GraphicBuffer> buffer = GrallocBufferActor::GetFrom(*aBuffer);
buffer is not used - if this is just to keep aBuffer alive, please add a comment, otherwise delete it.
Attachment #741525 -
Flags: review?(ncameron) → review+
Assignee | ||
Comment 17•12 years ago
|
||
Comment on attachment 741525 [details] [diff] [review]
hacky fix for GrallocTextureHostOGL to avoid storing the SurfaceDescriptor, until Nical makes everything nice
Actually, this only worked on my testing on newspaper sites doing ThebesLayers... I then tested the Camera app which has an ImageHost, and it crashes. The problem is, I was assuming that the old way of double buffering with TextureHost was not used anymore (it is not used anymore by ContentHost) but it is still used by ImageLayer. With the old way to double buffering, the TextureHost actually needs to keep track of the SurfaceDescriptor...
New patch coming up with a different approach...
Attachment #741525 -
Flags: review?(nical.bugzilla)
Comment 18•12 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #17)
> New patch coming up with a different approach...
The new approach has problems that we have no solutions to.
Comment 19•12 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #13)
> Hey, I have a quick hacky patch that /seems/ to work in that the b2g main
> process no longer crashes on http://nightly-gupshup.herokuapp.com, but I
> can't test that page yet because of this content process crash still
> happening:
>
> Program received signal SIGSEGV, Segmentation fault.
> [Switching to Thread 1491.1543]
> 0x40d39b12 in nsDOMCameraManager::AddRef (this=0x45c9dc10) at
> /hack/mozilla-central/dom/camera/DOMCameraManager.cpp:31
> 31 NS_IMPL_CYCLE_COLLECTING_ADDREF(nsDOMCameraManager)
> (gdb) bt
> #0 0x40d39b12 in nsDOMCameraManager::AddRef (this=0x45c9dc10) at
> /hack/mozilla-central/dom/camera/DOMCameraManager.cpp:31
> #1 0x40bac0ec in nsRefPtr (this=0x464ffcfc, aSmartPtr=<value optimized
> out>) at ../../dist/include/nsAutoPtr.h:901
> #2 0x40bad288 in mozilla::MediaManager::GetBackend (this=0x461a3850,
> aWindowId=3) at /hack/mozilla-central/dom/media/MediaManager.cpp:1153
> #3 0x40baf606 in mozilla::GetUserMediaRunnable::Run (this=0x46395720) at
> /hack/mozilla-central/dom/media/MediaManager.cpp:532
> #4 0x414cdcc0 in nsThread::ProcessNextEvent (this=0x45b27a20,
> mayWait=<value optimized out>, result=<value optimized out>) at
> /hack/mozilla-central/xpcom/threads/nsThread.cpp:627
> #5 0x41491554 in NS_ProcessNextEvent (thread=0xb6, mayWait=true) at
> /hack/b2g/B2G/objdir-gecko/xpcom/build/nsThreadUtils.cpp:238
> #6 0x414ce3e4 in nsThread::ThreadFunc (arg=<value optimized out>) at
> /hack/mozilla-central/xpcom/threads/nsThread.cpp:265
> #7 0x42df728c in _pt_root (arg=<value optimized out>) at
> /hack/mozilla-central/nsprpub/pr/src/pthreads/ptthread.c:191
> #8 0x40047e18 in __thread_entry (func=0x42df71f9 <_pt_root>,
> arg=0x463955e0, tls=<value optimized out>) at
> bionic/libc/bionic/pthread.c:217
> #9 0x4004796c in pthread_create (thread_out=<value optimized out>,
> attr=0xbe99bb94, start_routine=0x42df71f9 <_pt_root>, arg=0x463955e0) at
> bionic/libc/bionic/pthread.c:357
>
> #10 0x00000000 in ?? ()
>
> Anyone got a clue about it?
If you take patch in Bug 825110 to enable WebRTC on B2G, I uploaded a new patch to avoid this cycle collector crash.
Assignee | ||
Comment 20•12 years ago
|
||
OK, here is approach #3: implement GrallocBufferActor::ActorDestroy to inform the TextureHost when it's going away. This was pointed out by Bas and :bent as a reasonable short-term fix for this kind of bug. It's not great, because it relies on the possibly nontrivial (?) assumption that the TextureHost is the only thing outside of IPDL-generated code referencing the PGrallocBufferParent.
Anyway, it works here. Contrary to the 2 approaches tried above, it removes the present crash from all the places where I've been able to reproduce it consistently, including the Camera app and WebRTC pages.
Attachment #741985 -
Flags: review?(ncameron)
Attachment #741985 -
Flags: feedback?(bent.mozilla)
Comment 21•12 years ago
|
||
Comment on attachment 741985 [details] [diff] [review]
Let the PGrallocBufferParent inform the TextureHost when it's going away
Review of attachment 741985 [details] [diff] [review]:
-----------------------------------------------------------------
Um, this is awful. But it looks like it should work, just as long as we get rid of it as soon as possible (in fact, please file a bug to do that now).
::: gfx/layers/composite/TextureHost.h
@@ +253,5 @@
> mDeAllocator = aAllocator;
> }
>
> + // used only for hacky fix in gecko 23 for bug 862324
> + void ForgetBuffer();
Can we add this to GrallocTextureHostOGL instead? Then cast in the gralloc actor? I mean if it is not the right type there, we are in all kinds of trouble anyway, right?
@@ +297,5 @@
> + SurfaceDescriptor* mBuffer; // FIXME [bjacob] it's terrible to have a SurfaceDescriptor here,
> + // because SurfaceDescriptor's may have raw pointers to IPDL actors,
> + // which can go away under our feet at any time. This is the cause
> + // of bug 862324 among others. Our current understanding is that
> + // this will be gone in Gecko 24.
Do we have a bug number for Nical's Texture host/client work? Could you add that to all the comments in this file so we know when they should be unnecessary?
::: gfx/layers/ipc/ShadowLayerUtilsGralloc.h
@@ +84,5 @@
> static android::sp<GraphicBuffer>
> GetFrom(const SurfaceDescriptorGralloc& aDescriptor);
>
> + // used only for hacky fix in gecko 23 for bug 862324
> + void ActorDestroy(ActorDestroyReason why);
Does this override something? If so please make it virtual and MOZ_OVERRRIDE
Attachment #741985 -
Flags: review?(ncameron) → review+
Comment 22•12 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #20)
> Created attachment 741985 [details] [diff] [review]
> ...
> because it relies on the possibly nontrivial (?) assumption that the
Is there a way to put in some kinds of guards (asserts?) for this assumption?
Comment 23•12 years ago
|
||
(In reply to Nick Cameron [:nrc] from comment #21)
> Um, this is awful. But it looks like it should work, just as long as we get
> rid of it as soon as possible (in fact, please file a bug to do that now).
This would be bug 858914
Comment 24•12 years ago
|
||
(In reply to Nicolas Silva [:nical] from comment #23)
> (In reply to Nick Cameron [:nrc] from comment #21)
> > Um, this is awful. But it looks like it should work, just as long as we get
> > rid of it as soon as possible (in fact, please file a bug to do that now).
>
> This would be bug 858914
Can we have a specific bug, blocked by bug 858914, for removing this stuff?
Assignee | ||
Comment 25•12 years ago
|
||
Yup, will file the bug.
A quick update. It turns out that the crash is not completely fixed yet, for the following reason. The patch here assumes that the only way that TextureHost::mBuffer can be set by the outside, is through SetBuffer(). That is true as far as the _pointer_ mBuffer is concerned. But we still have a crash were something around ImageHost is changing *mBuffer under our feet --- same mBuffer pointer, no SetBuffer call, but *mBuffer changes. Debugging at the moment.
Assignee | ||
Comment 26•12 years ago
|
||
Got it by breaking on SurfaceDescriptor::operator=. So the other place where the TextureHost *mBuffer can be changed is in SwapTextures.
New patch coming with review comments from :nrc.
Assignee | ||
Comment 27•12 years ago
|
||
carrying forward r+ from nrc.
Attachment #741525 -
Attachment is obsolete: true
Attachment #742084 -
Flags: review+
Assignee | ||
Comment 28•12 years ago
|
||
By the way, filed bug 865908.
Assignee | ||
Comment 29•12 years ago
|
||
Pushed to try:
https://tbpl.mozilla.org/?tree=Try&rev=870ad028ce8c
Feel free to land if it's green.
Assignee | ||
Comment 30•12 years ago
|
||
Regarding this review comment which I didn't apply:
(In reply to Nick Cameron [:nrc] from comment #21)
> ::: gfx/layers/composite/TextureHost.h
> @@ +253,5 @@
> > mDeAllocator = aAllocator;
> > }
> >
> > + // used only for hacky fix in gecko 23 for bug 862324
> > + void ForgetBuffer();
>
> Can we add this to GrallocTextureHostOGL instead? Then cast in the gralloc
> actor? I mean if it is not the right type there, we are in all kinds of
> trouble anyway, right?
Unfortunately, that turned out to be complicated. It meant adding #include "TextureHostOGL.h" to ShadowLayersUtilsGralloc.h which created complicated compilation errors; I gave up after trying and failing a bit.
Assignee | ||
Comment 31•12 years ago
|
||
Orange on Try, with just 'Killed'. Could be out-of-memory; could speculate that's a leak. Maybe caused by ForgetBuffer(). Will look tomorrow.
Assignee | ||
Comment 32•12 years ago
|
||
Try push destroying all what we can in ForgetBuffer()... would help is the TextureHost's lifetime extended significantly past the GrallocBufferActor...
https://tbpl.mozilla.org/?tree=Try&rev=f3bd163f3ef0
Assignee | ||
Comment 33•12 years ago
|
||
Another try push. I dont remember what really was the rationale for skipping the Send__delete__ in ForgetBuffer. Re-adding it. ForgetBuffer may be a misnomer btw.
https://tbpl.mozilla.org/?tree=Try&rev=915dc0ef65d0
Assignee | ||
Comment 34•12 years ago
|
||
Known to be leaking.
Assignee | ||
Comment 35•12 years ago
|
||
Assignee | ||
Comment 36•12 years ago
|
||
Assignee | ||
Comment 37•12 years ago
|
||
The last patch (NOT hold a strong reference) seems to fix the leak in local testing. Time to push again to try:
https://tbpl.mozilla.org/?tree=Try&rev=31d73e4ff5c3
Assignee | ||
Updated•12 years ago
|
Attachment #742540 -
Attachment description: Make GrallocBufferActor NOT hold a strong reference to TextureHost. Testing at the moment. → Make GrallocBufferActor NOT hold a strong reference to TextureHost. Seems to fix the leak locally.
Comment 38•12 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #37)
> The last patch (NOT hold a strong reference) seems to fix the leak in local
> testing. Time to push again to try:
>
> https://tbpl.mozilla.org/?tree=Try&rev=31d73e4ff5c3
We're additionally going to make sure we clear this reference after the TextureHost is gone.
Assignee | ||
Comment 39•12 years ago
|
||
Hah, the patch adding the Send__delete__ to ForgetBuffer is actually bad, as it creates a recursion which ends in an assertion failure:
Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 644.667]
0x426049e0 in mozalloc_abort (msg=<value optimized out>) at /hack/mozilla-central/memory/mozalloc/mozalloc_abort.cpp:30
30 MOZ_CRASH();
(gdb) bt
#0 0x426049e0 in mozalloc_abort (msg=<value optimized out>) at /hack/mozilla-central/memory/mozalloc/mozalloc_abort.cpp:30
#1 0x41c77172 in Abort (aSeverity=<value optimized out>, aStr=<value optimized out>, aExpr=<value optimized out>, aFile=<value optimized out>, aLine=392) at /hack/mozilla-central/xpcom/base/nsDebugImpl.cpp:430
#2 NS_DebugBreak (aSeverity=<value optimized out>, aStr=<value optimized out>, aExpr=<value optimized out>, aFile=<value optimized out>, aLine=392) at /hack/mozilla-central/xpcom/base/nsDebugImpl.cpp:387
#3 0x4199e8ac in mozilla::layers::PGrallocBufferParent::Write (this=<value optimized out>, __v=<value optimized out>, __msg=0x48c86480, __nullable=true) at /hack/b2g/B2G/objdir-gecko/ipc/ipdl/PGrallocBufferParent.cpp:392
#4 0x4199ebb4 in mozilla::layers::PGrallocBufferParent::Send__delete__ (actor=0x49bb9b1c) at /hack/b2g/B2G/objdir-gecko/ipc/ipdl/PGrallocBufferParent.cpp:73
#5 0x41d4d9d6 in mozilla::layers::ISurfaceAllocator::PlatformDestroySharedSurface (aSurface=0x48fd48c0) at /hack/mozilla-central/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp:295
#6 0x41d45fd8 in mozilla::layers::ISurfaceAllocator::DestroySharedSurface (this=0x4873b5c8, aSurface=0x48fd48c0) at /hack/mozilla-central/gfx/layers/ipc/ISurfaceAllocator.cpp:110
#7 0x41d496e4 in mozilla::layers::GrallocTextureHostOGL::ForgetBuffer (this=0x4938ce20) at /hack/mozilla-central/gfx/layers/opengl/TextureHostOGL.h:617
#8 0x41d4cf2e in mozilla::layers::GrallocBufferActor::ActorDestroy (this=<value optimized out>) at /hack/mozilla-central/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp:230
#9 0x41954b0a in mozilla::plugins::PPluginBackgroundDestroyerParent::DestroySubtree (this=0x49bb9b1c, why=mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::Deletion)
at /hack/b2g/B2G/objdir-gecko/ipc/ipdl/PPluginBackgroundDestroyerParent.cpp:332
#10 0x4199ec5a in mozilla::layers::PGrallocBufferParent::Send__delete__ (actor=0x49bb9b1c) at /hack/b2g/B2G/objdir-gecko/ipc/ipdl/PGrallocBufferParent.cpp:88
#11 0x41d4d9d6 in mozilla::layers::ISurfaceAllocator::PlatformDestroySharedSurface (aSurface=0x48fd48c0) at /hack/mozilla-central/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp:295
#12 0x41d45fd8 in mozilla::layers::ISurfaceAllocator::DestroySharedSurface (this=0x4873b5c8, aSurface=0x48fd48c0) at /hack/mozilla-central/gfx/layers/ipc/ISurfaceAllocator.cpp:110
#13 0x41d496e4 in mozilla::layers::GrallocTextureHostOGL::ForgetBuffer (this=0x4938ce20) at /hack/mozilla-central/gfx/layers/opengl/TextureHostOGL.h:617
#14 0x41d4cf2e in mozilla::layers::GrallocBufferActor::ActorDestroy (this=<value optimized out>) at /hack/mozilla-central/gfx/layers/ipc/ShadowLayerUtilsGralloc.cpp:230
#15 0x41954b0a in mozilla::plugins::PPluginBackgroundDestroyerParent::DestroySubtree (this=0x49bb975c, why=mozilla::ipc::IProtocolManager<mozilla::ipc::RPCChannel::RPCListener>::Deletion)
at /hack/b2g/B2G/objdir-gecko/ipc/ipdl/PPluginBackgroundDestroyerParent.cpp:332
So now I remember --- that's why the original patch didn't have the Send__delete__. Will clarify the comment.
Assignee | ||
Comment 40•12 years ago
|
||
Comment on attachment 742539 [details] [diff] [review]
Add Send__delete__ call. Still leaking.
Obsoleting this patch, see previous comment.
Attachment #742539 -
Attachment is obsolete: true
Assignee | ||
Comment 41•12 years ago
|
||
Alright. So the do-not-hold-a-strong-reference alone (without the bad Send__delete__ patch) seems to work:
- no kill in local emulator reftest run,
- no issue in local testing of Camera and Browser on unagi.
--> pushed again to try, hoping for green based on local emulator reftest run:
https://tbpl.mozilla.org/?tree=Try&rev=35832507011d
Can land if it's green!
Attachment #742084 -
Attachment is obsolete: true
Attachment #742538 -
Attachment is obsolete: true
Attachment #742540 -
Attachment is obsolete: true
Attachment #742568 -
Flags: review+
Comment 42•12 years ago
|
||
(In reply to Benoit Jacob [:bjacob] from comment #41)
> Created attachment 742568 [details] [diff] [review]
> Can land if it's green!
This has an orange R9. With a crash in:
libxul.so!IPC::ParamTraits<mozilla::layers::MagicGrallocBufferHandle>::Write(IPC::Message*, mozilla::layers::MagicGrallocBufferHandle const&
This looks suspicious.
Assignee | ||
Comment 43•12 years ago
|
||
Thanks for retriggering; the second run is green. I agree this looks a bit too intimidating to just land now. Let's retrigger more times, try cycles are cheap on Friday nights, and then let's compare to the average frequency of random oranges on B2G emulator reftests.
Assignee | ||
Comment 44•12 years ago
|
||
It's all green so far, including all 11 retriggers of R9.
Also, this orange was actually bug 818103: TBPL suggested it, and looking at the stacks there, the suggestion is correct --- I checked a few stacks and they all involve that MagicGrallocBufferHandle stuff.
That bug 818103 has been starred over 40 times today.
So I think that our one orange R9 out of 12 runs is this existing intermittent.
Landing now.
Assignee | ||
Comment 45•12 years ago
|
||
Retriggers of R2 and R7 have hit bug 818103 too; still no reason to believe it's a regression given the huge volume of bug 818103.
Assignee | ||
Updated•12 years ago
|
Attachment #741985 -
Flags: feedback?(bent.mozilla)
Assignee | ||
Comment 46•12 years ago
|
||
Inbound is closed for bustage :-( and I don't want to have to watch the tree after pushing to central. Will push tomorrow.
Assignee | ||
Comment 47•12 years ago
|
||
Target Milestone: --- → mozilla23
Comment 48•12 years ago
|
||
Status: NEW → RESOLVED
Closed: 12 years ago
Resolution: --- → FIXED
Comment 50•11 years ago
|
||
Adding Nicolas to the questions as well (Benoit is not back for another week), because bug 858914 is close to landing and it affects this area significantly. It may be worth waiting for that instead of undoing this change on its own?
Flags: needinfo?(nical.bugzilla)
Assignee | ||
Comment 51•11 years ago
|
||
Indeed, :nical is likely the right person: if any current/recent change makes the present cset no longer needed, it would be his current work on refactoring texturehost.
Flags: needinfo?(bjacob)
Comment 52•11 years ago
|
||
Patches in Bug 858914 haven't landed yet so I guess we need to wait before undoing that.
We will also need to wait for followups of 858914 because this one only fixes textures with ImageLayers.
Flags: needinfo?(nical.bugzilla)
Comment 53•11 years ago
|
||
Might the crash seen in bug 905152 be related to these changes?
You need to log in
before you can comment on or make changes to this bug.
Description
•