Closed
Bug 910962
Opened 11 years ago
Closed 11 years ago
DeallocShmem abort with multi-process
Categories
(Core :: IPC, defect)
Tracking
()
VERIFIED
FIXED
mozilla29
People
(Reporter: Felipe, Assigned: billm)
References
Details
Attachments
(2 files)
(deleted),
text/plain
|
Details | |
(deleted),
patch
|
benjamin
:
review+
|
Details | Diff | Splinter Review |
With our current multi-process setup, the abort introduced in bug 844996 is always hit when the content process dies.
STR:
- Set browser.tabs.remote to true
- Restart firefox
- Kill the content process that was launched
Result: parent process aborts
I don't know if this is exposing a real problem, or if the code was just not expecting this situation.
Comment 1•11 years ago
|
||
It's exposing a real problem. This likely means you're doing a double free or trying have a shmem somewhere that isn't a shmem at all.
Comment 2•11 years ago
|
||
Are you sure? When you kill the process there really isn't time to free stuff properly.
Comment 3•11 years ago
|
||
I don't understand the word "time" here. The parent process has all the time in the world.
Comment 4•11 years ago
|
||
(In reply to Tom Schuster [:evilpie] from comment #2)
> Are you sure? When you kill the process there really isn't time to free
> stuff properly.
Note that this isn't a 'leak' but most likely a double free.
Comment 5•11 years ago
|
||
nrc pointed out a version of this on desktop Firefox builds:
https://tbpl.mozilla.org/php/getParsedLog.php?id=28086871&tree=Try
It appears that in this test the child is intentionally being killed. The IPDL system will clean up any active ShMem instances when it tears down the actor tree. Client code (the layer system) should be using ActorDestroy to notice that the actor is dead and should not be touching these ShMem instances after that point.
Stack:
03:46:35 INFO - Crash reason: EXCEPTION_BREAKPOINT
03:46:35 INFO - Crash address: 0x7c90120e
03:46:35 INFO - Thread 23 (crashed)
03:46:35 INFO - 0 ntdll.dll + 0x120e
03:46:35 INFO - eip = 0x7c90120e esp = 0x08aff85c ebp = 0x08affc78 ebx = 0x00000040
03:46:35 INFO - esi = 0x10261440 edi = 0x00000000 eax = 0x00000000 ecx = 0x08aff874
03:46:35 INFO - edx = 0x003853dc efl = 0x00000202
03:46:35 INFO - Found by: given as instruction pointer in context
03:46:35 INFO - 1 xul.dll!mozilla::layers::PLayerTransactionParent::DeallocShmem(mozilla::ipc::Shmem &) [PLayerTransactionParent.cpp:226d33cca5ed : 827 + 0x16]
03:46:35 INFO - eip = 0x0328f218 esp = 0x08affc80 ebp = 0x08affc9c
03:46:35 INFO - Found by: previous frame's frame pointer
03:46:35 INFO - 2 xul.dll!mozilla::layers::ISurfaceAllocator::DestroySharedSurface(mozilla::layers::SurfaceDescriptor *) [ISurfaceAllocator.cpp:226d33cca5ed : 138 + 0x10]
03:46:35 INFO - eip = 0x037a3423 esp = 0x08affca4 ebp = 0x08affce8
03:46:35 INFO - Found by: call frame info
03:46:35 INFO - 3 xul.dll!mozilla::layers::DeprecatedTextureHost::~DeprecatedTextureHost() [TextureHost.cpp:226d33cca5ed : 192 + 0x5]
03:46:35 INFO - eip = 0x037c7a93 esp = 0x08affcf0 ebp = 0x08affd00
03:46:35 INFO - Found by: call frame info
03:46:35 INFO - 4 xul.dll!mozilla::layers::DeprecatedTextureHostShmemD3D9::`scalar deleting destructor'(unsigned int) + 0xa
03:46:35 INFO - eip = 0x0375b4d3 esp = 0x08affcfc ebp = 0x08affd00
03:46:35 INFO - Found by: call frame info
03:46:35 INFO - 5 xul.dll!mozilla::detail::RefCounted<mozilla::layers::TextureSource,1>::Release() [RefPtr.h:226d33cca5ed : 82 + 0xa]
03:46:35 INFO - eip = 0x037568d6 esp = 0x08affd08 ebp = 0x08affd28
03:46:35 INFO - Found by: call frame info
03:46:35 INFO - 6 xul.dll!mozilla::layers::ContentHostDoubleBuffered::DestroyTextures() [ContentHost.cpp:226d33cca5ed : 439 + 0xb]
03:46:35 INFO - eip = 0x037a0ff4 esp = 0x08affd10 ebp = 0x08affd28
03:46:35 INFO - Found by: call frame info
03:46:35 INFO - 7 xul.dll!mozilla::layers::ContentHostDoubleBuffered::~ContentHostDoubleBuffered() [ContentHost.cpp:226d33cca5ed : 377 + 0x4]
03:46:35 INFO - eip = 0x037a1765 esp = 0x08affd1c ebp = 0x08affd28
03:46:35 INFO - Found by: call frame info
03:46:35 INFO - 8 xul.dll!mozilla::layers::ContentHostDoubleBuffered::`vector deleting destructor'(unsigned int) + 0xa
03:46:35 INFO - eip = 0x03789e15 esp = 0x08affd24 ebp = 0x08affd28
03:46:35 INFO - Found by: call frame info
03:46:35 INFO - 9 xul.dll!mozilla::detail::RefCounted<mozilla::layers::CompositableHost,1>::Release() [RefPtr.h:226d33cca5ed : 82 + 0xa]
03:46:35 INFO - eip = 0x03780e88 esp = 0x08affd30 ebp = 0x08affd44
03:46:35 INFO - Found by: call frame info
03:46:35 INFO - 10 xul.dll!mozilla::layers::CompositableParent::~CompositableParent() [CompositableHost.cpp:226d33cca5ed : 259 + 0x11]
03:46:35 INFO - eip = 0x0378b9ba esp = 0x08affd38 ebp = 0x08affd44
03:46:35 INFO - Found by: call frame info
03:46:35 INFO - 11 xul.dll!mozilla::layers::CompositableParent::`scalar deleting destructor'(unsigned int) + 0xa
03:46:35 INFO - eip = 0x0378bdc3 esp = 0x08affd40 ebp = 0x08affd44
03:46:35 INFO - Found by: call frame info
03:46:35 INFO - 12 xul.dll!mozilla::layers::LayerTransactionParent::DeallocPCompositableParent(mozilla::layers::PCompositableParent *) [LayerTransactionParent.cpp:226d33cca5ed : 592 + 0xc]
03:46:35 INFO - eip = 0x037b6720 esp = 0x08affd4c ebp = 0x08affd50
03:46:35 INFO - Found by: call frame info
03:46:35 INFO - 13 xul.dll!mozilla::layers::PLayerTransactionParent::DeallocSubtree() [PLayerTransactionParent.cpp:226d33cca5ed : 898 + 0x11]
03:46:35 INFO - eip = 0x032946f1 esp = 0x08affd58 ebp = 0x08affda0
03:46:35 INFO - Found by: call frame info
03:46:35 INFO - 14 xul.dll!mozilla::layers::PCompositorParent::DeallocSubtree() [PCompositorParent.cpp:226d33cca5ed : 893 + 0x12]
03:46:35 INFO - eip = 0x03275150 esp = 0x08affd6c ebp = 0x08affda0
03:46:35 INFO - Found by: call frame info
03:46:35 INFO - 15 xul.dll!mozilla::layers::PCompositorParent::OnChannelError() [PCompositorParent.cpp:226d33cca5ed : 756 + 0x6]
03:46:35 INFO - eip = 0x03276604 esp = 0x08affd7c ebp = 0x08affda0
In this case it seems that CompositableParent::ActorDestroy does call Detach on its CompositableHost, but still holds a ref to ~ContentHostDoubleBuffered (via mFirstTexture? not sure about the object hierarchy here). So probably we need to make sure that ContentHostDoubleBuffered::DestroyTextures() is called from within ActorDestroy. But I'm a little fuzzy on how it all works.
Comment 6•11 years ago
|
||
I was trying to reproduce something else and hit this again. Just clicked on reddit links and went back and forward rather quickly for a few minutes.
Comment 7•11 years ago
|
||
At our last e10s meeting, Bill said he could take a look at this bug.
Assignee: nobody → wmccloskey
Assignee | ||
Comment 8•11 years ago
|
||
This is basically a bogus assertion. It happens when the IPC channel unexpected closes and we tear everything down. The code looks like this:
auto PCompositorParent::OnChannelError() -> void
{
DestroySubtree(AbnormalShutdown);
DeallocSubtree();
DeallocShmems();
}
The code for DeallocSubtree() eventually calls DestroySharedMemory. Currently, that code looks like this:
auto PCompositorParent::DestroySharedMemory(Shmem& shmem) -> bool
{
Shmem::id_t aId = (shmem).Id(Shmem::IHadBetterBeIPDLCodeCallingThis_OtherwiseIAmADoodyhead());
Shmem::SharedMemory* segment = LookupSharedMemory(aId);
if ((!(segment))) {
return false;
}
Message* descriptor = (shmem).UnshareFrom(Shmem::IHadBetterBeIPDLCodeCallingThis_OtherwiseIAmADoodyhead(), OtherProcess(), MSG_ROUTING_CONTROL);
(mShmemMap).Remove(aId);
Shmem::Dealloc(Shmem::IHadBetterBeIPDLCodeCallingThis_OtherwiseIAmADoodyhead(), segment);
return (descriptor) && ((mChannel).Send(descriptor));
}
We fail here when we try to do the Send call, since the channel is already closed. However, everything else is successful.
There doesn't seem to be any reason why we can't allow DestroySharedMemory to be called at this time. Aside from the Send call, there's nothing wrong. This patch just adds a check and avoids the Send() call if the channel is already closed. The new code looks like this:
...
Message* descriptor = ...;
(mShmemMap).Remove(aId);
Shmem::Dealloc(Shmem::IHadBetterBeIPDLCodeCallingThis_OtherwiseIAmADoodyhead(), segment);
if ((!((mChannel).CanSend()))) {
delete descriptor;
return true;
}
return (descriptor) && ((mChannel).Send(descriptor));
}
I had to add an extra CanSend method to the channel. I was hoping to use the protocol's mState to detect if we had already closed. However, that doesn't appear to be updated at all during this entire sequence. At the time we crash, it just contains the normal __Null state.
Attachment #8368244 -
Flags: review?(benjamin)
Updated•11 years ago
|
Attachment #8368244 -
Flags: review?(benjamin) → review+
Assignee | ||
Comment 9•11 years ago
|
||
Comment 10•11 years ago
|
||
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Comment 11•11 years ago
|
||
I tried to reproduce this on Firefox 26, on Mac OS X 10.9. After restarting, I get one extra process: Firefox Plugin Process. When killing it, I get: "Tab crashed
Well, this is embarrassing. We tried to display this Web page, but it's not responding."
Nothing else happens. When hitting "Retry", the process is restarted and the tab is restored.
I see the exact same behavior with Firefox 29b3 on MAC OS X 10.9.
Felipe, is this what you reproduced initially? If not, could you please verify if this bug is fixed for you on Firefox 29?
Flags: needinfo?(felipc)
Keywords: verifyme
Reporter | ||
Comment 12•11 years ago
|
||
Ioana, this bug affected only debug builds of Firefox, not release builds. So that's probably why you weren't able to reproduce it on your test. But I think further verification is not necessary, I do recall the problem going away with the fix.
Flags: needinfo?(felipc)
You need to log in
before you can comment on or make changes to this bug.
Description
•