Closed Bug 910962 Opened 11 years ago Closed 11 years ago

DeallocShmem abort with multi-process

Categories

(Core :: IPC, defect)

x86
macOS
defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla29

People

(Reporter: Felipe, Assigned: billm)

References

Details

Attachments

(2 files)

With our current multi-process setup, the abort introduced in bug 844996 is always hit when the content process dies. STR: - Set browser.tabs.remote to true - Restart firefox - Kill the content process that was launched Result: parent process aborts I don't know if this is exposing a real problem, or if the code was just not expecting this situation.
It's exposing a real problem. This likely means you're doing a double free or trying have a shmem somewhere that isn't a shmem at all.
Blocks: 913155
Are you sure? When you kill the process there really isn't time to free stuff properly.
I don't understand the word "time" here. The parent process has all the time in the world.
(In reply to Tom Schuster [:evilpie] from comment #2) > Are you sure? When you kill the process there really isn't time to free > stuff properly. Note that this isn't a 'leak' but most likely a double free.
nrc pointed out a version of this on desktop Firefox builds: https://tbpl.mozilla.org/php/getParsedLog.php?id=28086871&tree=Try It appears that in this test the child is intentionally being killed. The IPDL system will clean up any active ShMem instances when it tears down the actor tree. Client code (the layer system) should be using ActorDestroy to notice that the actor is dead and should not be touching these ShMem instances after that point. Stack: 03:46:35 INFO - Crash reason: EXCEPTION_BREAKPOINT 03:46:35 INFO - Crash address: 0x7c90120e 03:46:35 INFO - Thread 23 (crashed) 03:46:35 INFO - 0 ntdll.dll + 0x120e 03:46:35 INFO - eip = 0x7c90120e esp = 0x08aff85c ebp = 0x08affc78 ebx = 0x00000040 03:46:35 INFO - esi = 0x10261440 edi = 0x00000000 eax = 0x00000000 ecx = 0x08aff874 03:46:35 INFO - edx = 0x003853dc efl = 0x00000202 03:46:35 INFO - Found by: given as instruction pointer in context 03:46:35 INFO - 1 xul.dll!mozilla::layers::PLayerTransactionParent::DeallocShmem(mozilla::ipc::Shmem &) [PLayerTransactionParent.cpp:226d33cca5ed : 827 + 0x16] 03:46:35 INFO - eip = 0x0328f218 esp = 0x08affc80 ebp = 0x08affc9c 03:46:35 INFO - Found by: previous frame's frame pointer 03:46:35 INFO - 2 xul.dll!mozilla::layers::ISurfaceAllocator::DestroySharedSurface(mozilla::layers::SurfaceDescriptor *) [ISurfaceAllocator.cpp:226d33cca5ed : 138 + 0x10] 03:46:35 INFO - eip = 0x037a3423 esp = 0x08affca4 ebp = 0x08affce8 03:46:35 INFO - Found by: call frame info 03:46:35 INFO - 3 xul.dll!mozilla::layers::DeprecatedTextureHost::~DeprecatedTextureHost() [TextureHost.cpp:226d33cca5ed : 192 + 0x5] 03:46:35 INFO - eip = 0x037c7a93 esp = 0x08affcf0 ebp = 0x08affd00 03:46:35 INFO - Found by: call frame info 03:46:35 INFO - 4 xul.dll!mozilla::layers::DeprecatedTextureHostShmemD3D9::`scalar deleting destructor'(unsigned int) + 0xa 03:46:35 INFO - eip = 0x0375b4d3 esp = 0x08affcfc ebp = 0x08affd00 03:46:35 INFO - Found by: call frame info 03:46:35 INFO - 5 xul.dll!mozilla::detail::RefCounted<mozilla::layers::TextureSource,1>::Release() [RefPtr.h:226d33cca5ed : 82 + 0xa] 03:46:35 INFO - eip = 0x037568d6 esp = 0x08affd08 ebp = 0x08affd28 03:46:35 INFO - Found by: call frame info 03:46:35 INFO - 6 xul.dll!mozilla::layers::ContentHostDoubleBuffered::DestroyTextures() [ContentHost.cpp:226d33cca5ed : 439 + 0xb] 03:46:35 INFO - eip = 0x037a0ff4 esp = 0x08affd10 ebp = 0x08affd28 03:46:35 INFO - Found by: call frame info 03:46:35 INFO - 7 xul.dll!mozilla::layers::ContentHostDoubleBuffered::~ContentHostDoubleBuffered() [ContentHost.cpp:226d33cca5ed : 377 + 0x4] 03:46:35 INFO - eip = 0x037a1765 esp = 0x08affd1c ebp = 0x08affd28 03:46:35 INFO - Found by: call frame info 03:46:35 INFO - 8 xul.dll!mozilla::layers::ContentHostDoubleBuffered::`vector deleting destructor'(unsigned int) + 0xa 03:46:35 INFO - eip = 0x03789e15 esp = 0x08affd24 ebp = 0x08affd28 03:46:35 INFO - Found by: call frame info 03:46:35 INFO - 9 xul.dll!mozilla::detail::RefCounted<mozilla::layers::CompositableHost,1>::Release() [RefPtr.h:226d33cca5ed : 82 + 0xa] 03:46:35 INFO - eip = 0x03780e88 esp = 0x08affd30 ebp = 0x08affd44 03:46:35 INFO - Found by: call frame info 03:46:35 INFO - 10 xul.dll!mozilla::layers::CompositableParent::~CompositableParent() [CompositableHost.cpp:226d33cca5ed : 259 + 0x11] 03:46:35 INFO - eip = 0x0378b9ba esp = 0x08affd38 ebp = 0x08affd44 03:46:35 INFO - Found by: call frame info 03:46:35 INFO - 11 xul.dll!mozilla::layers::CompositableParent::`scalar deleting destructor'(unsigned int) + 0xa 03:46:35 INFO - eip = 0x0378bdc3 esp = 0x08affd40 ebp = 0x08affd44 03:46:35 INFO - Found by: call frame info 03:46:35 INFO - 12 xul.dll!mozilla::layers::LayerTransactionParent::DeallocPCompositableParent(mozilla::layers::PCompositableParent *) [LayerTransactionParent.cpp:226d33cca5ed : 592 + 0xc] 03:46:35 INFO - eip = 0x037b6720 esp = 0x08affd4c ebp = 0x08affd50 03:46:35 INFO - Found by: call frame info 03:46:35 INFO - 13 xul.dll!mozilla::layers::PLayerTransactionParent::DeallocSubtree() [PLayerTransactionParent.cpp:226d33cca5ed : 898 + 0x11] 03:46:35 INFO - eip = 0x032946f1 esp = 0x08affd58 ebp = 0x08affda0 03:46:35 INFO - Found by: call frame info 03:46:35 INFO - 14 xul.dll!mozilla::layers::PCompositorParent::DeallocSubtree() [PCompositorParent.cpp:226d33cca5ed : 893 + 0x12] 03:46:35 INFO - eip = 0x03275150 esp = 0x08affd6c ebp = 0x08affda0 03:46:35 INFO - Found by: call frame info 03:46:35 INFO - 15 xul.dll!mozilla::layers::PCompositorParent::OnChannelError() [PCompositorParent.cpp:226d33cca5ed : 756 + 0x6] 03:46:35 INFO - eip = 0x03276604 esp = 0x08affd7c ebp = 0x08affda0 In this case it seems that CompositableParent::ActorDestroy does call Detach on its CompositableHost, but still holds a ref to ~ContentHostDoubleBuffered (via mFirstTexture? not sure about the object hierarchy here). So probably we need to make sure that ContentHostDoubleBuffered::DestroyTextures() is called from within ActorDestroy. But I'm a little fuzzy on how it all works.
Attached file stacktrace (deleted) —
I was trying to reproduce something else and hit this again. Just clicked on reddit links and went back and forward rather quickly for a few minutes.
At our last e10s meeting, Bill said he could take a look at this bug.
Assignee: nobody → wmccloskey
Attached patch ipc-fix (deleted) — Splinter Review
This is basically a bogus assertion. It happens when the IPC channel unexpected closes and we tear everything down. The code looks like this: auto PCompositorParent::OnChannelError() -> void { DestroySubtree(AbnormalShutdown); DeallocSubtree(); DeallocShmems(); } The code for DeallocSubtree() eventually calls DestroySharedMemory. Currently, that code looks like this: auto PCompositorParent::DestroySharedMemory(Shmem& shmem) -> bool { Shmem::id_t aId = (shmem).Id(Shmem::IHadBetterBeIPDLCodeCallingThis_OtherwiseIAmADoodyhead()); Shmem::SharedMemory* segment = LookupSharedMemory(aId); if ((!(segment))) { return false; } Message* descriptor = (shmem).UnshareFrom(Shmem::IHadBetterBeIPDLCodeCallingThis_OtherwiseIAmADoodyhead(), OtherProcess(), MSG_ROUTING_CONTROL); (mShmemMap).Remove(aId); Shmem::Dealloc(Shmem::IHadBetterBeIPDLCodeCallingThis_OtherwiseIAmADoodyhead(), segment); return (descriptor) && ((mChannel).Send(descriptor)); } We fail here when we try to do the Send call, since the channel is already closed. However, everything else is successful. There doesn't seem to be any reason why we can't allow DestroySharedMemory to be called at this time. Aside from the Send call, there's nothing wrong. This patch just adds a check and avoids the Send() call if the channel is already closed. The new code looks like this: ... Message* descriptor = ...; (mShmemMap).Remove(aId); Shmem::Dealloc(Shmem::IHadBetterBeIPDLCodeCallingThis_OtherwiseIAmADoodyhead(), segment); if ((!((mChannel).CanSend()))) { delete descriptor; return true; } return (descriptor) && ((mChannel).Send(descriptor)); } I had to add an extra CanSend method to the channel. I was hoping to use the protocol's mState to detect if we had already closed. However, that doesn't appear to be updated at all during this entire sequence. At the time we crash, it just contains the normal __Null state.
Attachment #8368244 - Flags: review?(benjamin)
Attachment #8368244 - Flags: review?(benjamin) → review+
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla29
Keywords: verifyme
I tried to reproduce this on Firefox 26, on Mac OS X 10.9. After restarting, I get one extra process: Firefox Plugin Process. When killing it, I get: "Tab crashed Well, this is embarrassing. We tried to display this Web page, but it's not responding." Nothing else happens. When hitting "Retry", the process is restarted and the tab is restored. I see the exact same behavior with Firefox 29b3 on MAC OS X 10.9. Felipe, is this what you reproduced initially? If not, could you please verify if this bug is fixed for you on Firefox 29?
Flags: needinfo?(felipc)
Keywords: verifyme
Ioana, this bug affected only debug builds of Firefox, not release builds. So that's probably why you weren't able to reproduce it on your test. But I think further verification is not necessary, I do recall the problem going away with the fix.
Flags: needinfo?(felipc)
based on comment 12
Status: RESOLVED → VERIFIED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: