Closed
Bug 1137875
Opened 10 years ago
Closed 10 years ago
window.open()/.close() memory leak
Categories
(Core :: Panning and Zooming, defect, P1)
Tracking
()
People
(Reporter: m1, Assigned: kats)
References
Details
(Keywords: regression, Whiteboard: [caf priority: p1][CR 801548][MemShrink:P2])
Attachments
(3 files, 2 obsolete files)
(deleted),
patch
|
nical
:
review+
bajaj
:
approval-mozilla-b2g37+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
nical
:
review+
bajaj
:
approval-mozilla-b2g37+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
nical
:
review+
bajaj
:
approval-mozilla-b2g37+
|
Details | Diff | Splinter Review |
+++ This bug was initially created as a clone of Bug #998504 +++
A memory leak is observed on b2g v2.2 in content process of an app that repeatedly invokes window.open()/close().
USS of the content process grows until the LMK kills the process after a couple hours.
STR:
* Run the test app at bug 964386 attachment 8366455 [details] on a Flame device w/ a recent b2g v2.2 build.
Comment 1•10 years ago
|
||
Can we get memory reports after every 100 or so open/close pairs? (Not sure how many open/close cycles it takes; maybe every 10 or every 1000 would be better) GC/CC logs from those times would also be helpful.
Flags: needinfo?(mvines)
Updated•10 years ago
|
Whiteboard: [MemShrink]
Reporter | ||
Comment 2•10 years ago
|
||
Absolutely, please run the test app from the other bug to reproduce this issue yourself.
Flags: needinfo?(mvines)
Comment 3•10 years ago
|
||
(In reply to Michael Vines [:m1] [:evilmachines] from comment #2)
> Absolutely, please run the test app from the other bug to reproduce this
> issue yourself.
I don't have a Flame to test apps on, nor any idea how to load apps on it. In both of the previous bugs, it looked like logs etc. were provided to help diagnose the memory issues, so that's why I asked.
Reporter | ||
Comment 4•10 years ago
|
||
Got it. I'm sure somebody from Moz QA can help. This isn't the first time (or second time) we have regressed here so it's surprising that this still isn't covered as a part of the normal Moz b2g test plan.
Comment 5•10 years ago
|
||
I have a Flame, and I've gotten it working. It seems to be leaking 2 /dev/ashmem fds per iteration (my thanks to the SystemMemoryReporter open-fds tree). It may be interesting to see what happens to it after about 2¼ hours of runtime, when it hits its fd limit.
Updated•10 years ago
|
Whiteboard: [MemShrink] → [CR 801548][MemShrink]
Updated•10 years ago
|
Whiteboard: [CR 801548][MemShrink] → [caf priority: p1][CR 801548][MemShrink]
Reporter | ||
Updated•10 years ago
|
blocking-b2g: --- → 2.2+
Comment 6•10 years ago
|
||
Dave, you fixed something like this back in 1.3, any thoughts?
Flags: needinfo?(huseby)
Comment 7•10 years ago
|
||
This seems to be where the two fd leaks are coming from:
(gdb) bt
#0 mozilla::ipc::SharedMemoryBasic::SharedMemoryBasic (this=0xb24231a0, aHandle=...)
at ../../../gecko/ipc/glue/SharedMemoryBasic_android.cpp:47
#1 0xb5133510 in mozilla::layers::CompositorChild::SharedFrameMetricsData::SharedFrameMetricsData
(this=0xb2468290, metrics=..., handle=..., aAPZCId=<optimized out>)
at ../../../gecko/gfx/layers/ipc/CompositorChild.cpp:236
#2 0xb5133596 in mozilla::layers::CompositorChild::RecvSharedCompositorFrameMetrics (
this=0xb3d7f8e0, metrics=..., handle=..., aAPZCId=@0xbefcbe60: 79)
at ../../../gecko/gfx/layers/ipc/CompositorChild.cpp:209
#3 0xb4f4063e in mozilla::layers::PCompositorChild::OnMessageReceived (this=0xb3d7f8e0, __msg=...)
at PCompositorChild.cpp:898
#4 0xb4f0203c in mozilla::ipc::MessageChannel::DispatchAsyncMessage (this=0xb3d7f910, aMsg=...)
at ../../../gecko/ipc/glue/MessageChannel.cpp:1218
(gdb) bt
#0 mozilla::ipc::SharedMemoryBasic::SharedMemoryBasic (this=0xb24627a0, aHandle=...)
at ../../../gecko/ipc/glue/SharedMemoryBasic_android.cpp:47
#1 0xb4f03f78 in mozilla::CrossProcessMutex::CrossProcessMutex (this=0xb2cffc20, aHandle=...)
at ../../../gecko/ipc/glue/CrossProcessMutex_posix.cpp:89
#2 0xb5133544 in mozilla::layers::CompositorChild::SharedFrameMetricsData::SharedFrameMetricsData
(this=0xb2468290, metrics=..., handle=..., aAPZCId=<optimized out>)
at ../../../gecko/gfx/layers/ipc/CompositorChild.cpp:238
#3 0xb5133596 in mozilla::layers::CompositorChild::RecvSharedCompositorFrameMetrics (
this=0xb3d7f8e0, metrics=..., handle=..., aAPZCId=@0xbefcbe60: 79)
at ../../../gecko/gfx/layers/ipc/CompositorChild.cpp:209
#4 0xb4f4063e in mozilla::layers::PCompositorChild::OnMessageReceived (this=0xb3d7f8e0, __msg=...)
at PCompositorChild.cpp:898
#5 0xb4f0203c in mozilla::ipc::MessageChannel::DispatchAsyncMessage (this=0xb3d7f910, aMsg=...)
at ../../../gecko/ipc/glue/MessageChannel.cpp:1218
So, here:
229 CompositorChild::SharedFrameMetricsData::SharedFrameMetricsData(
230 const ipc::SharedMemoryBasic::Handle& metrics,
231 const CrossProcessMutexHandle& handle,
232 const uint32_t& aAPZCId) :
233 mMutex(nullptr),
234 mAPZCId(aAPZCId)
235 {
==> 236 mBuffer = new ipc::SharedMemoryBasic(metrics);
237 mBuffer->Map(sizeof(FrameMetrics));
==> 238 mMutex = new CrossProcessMutex(handle);
239 MOZ_COUNT_CTOR(SharedFrameMetricsData);
240 }
Called from here:
209 SharedFrameMetricsData* data = new SharedFrameMetricsData(metrics, handle, aAPZCId);
210 mFrameMetricsTable.Put(data->GetViewID(), data);
Assignee | ||
Comment 8•10 years ago
|
||
I guess the removed SharedFrameMetricsData isn't getting destroyed at [1]? The destructor for that class should clear the fds if it's running.
[1] http://mxr.mozilla.org/mozilla-central/source/gfx/layers/ipc/CompositorChild.cpp?rev=15dea2b0929b#340
Comment 9•10 years ago
|
||
mFrameMetricsTable.Count() seems to be increasing without bound. This looks like a misbalance of SendSharedCompositorFrameMetrics and SendReleaseSharedCompositorFrameMetrics in the parent.
Comment 10•10 years ago
|
||
This, in AsyncPanZoomController::GetSharedFrameMetricsCompositor:
// |state| may be null here if the CrossProcessCompositorParent has already been destroyed.
It is, and this seems to leak the child-side APZC metrics table entry. At this point it might be better for someone who know actually knows this code to look at it.
Component: Performance → Graphics: Layers
Product: Firefox OS → Core
Assignee | ||
Comment 11•10 years ago
|
||
I can take it. Thanks for tracking it down!
Assignee: nobody → bugmail.mozilla
Component: Graphics: Layers → Panning and Zooming
Updated•10 years ago
|
Whiteboard: [caf priority: p1][CR 801548][MemShrink] → [caf priority: p1][CR 801548][MemShrink:P2]
Assignee | ||
Comment 12•10 years ago
|
||
Assignee | ||
Comment 13•10 years ago
|
||
Assignee | ||
Comment 14•10 years ago
|
||
Assignee | ||
Comment 15•10 years ago
|
||
Try push with the above patches: https://treeherder.mozilla.org/#/jobs?repo=try&revision=694bd7a3bf9f
Basically what's happening here is that there is a single child process, with a single CompositorChild instance, which keeps allocating and destroying PLayerTransactions (once with each window open/close). Each PLayerTransaction has a new layers id and so each window open/close creates a new LayerTreeState in the table on the parent-side.
However, the SharedFrameMetrics code doesn't deal with the case where the LayerTreeState for a layers id is removed without the child process itself getting removed. In this scenario it might end up (racy behaviour here) leaking the entries in the mFrameMetricsTable on the child side corresponding to the destroyed PLayerTransaction.
The patches I posted track basically find these dangling SharedFrameMetrics objects on the client side at the time that the PLayerTransaction is destroyed, and remove them. I verified locally that the code is getting hit although I didn't verify that the fd's where actually getting closed. I assume that happens though because it seems to be happening in the normal flow path where we remove these objects via an explicit Release call.
Note that the explicit Release call is still needed, because we want to destroy the SharedFrameMetrics objects when their corresponding APZC goes away, which can happen for reasons other than destroying the PLayerTransaction (for example, if the page content changes so that the scrollframe frame is no longer).
Will flag patches for review once the try results are in, but if anybody wants to try these patches to verify it fixes the memory leak please feel free to do so.
Comment 16•10 years ago
|
||
(In reply to Kartikaya Gupta (email:kats@mozilla.com) from comment #15)
> Will flag patches for review once the try results are in, but if anybody
> wants to try these patches to verify it fixes the memory leak please feel
> free to do so.
I've rebased them to v2.2 and confirmed they fix the file descriptor leak.
I haven't reproduced the memory leak mentioned in comment #0; I conjectured that it would occur when the file descriptor table filled up with leaked ashmem instances, but I didn't let it run that long. I'm currently trying to see if I can reproduce it more easily by lowering the file descriptor limit.
Assignee | ||
Comment 17•10 years ago
|
||
(In reply to Jed Davis [:jld] from comment #16)
> I've rebased them to v2.2 and confirmed they fix the file descriptor leak.
Thanks! The try push also appears green. There was a static analysis build fail which I fixed locally; I'll update the part 1 patch with that and flag for review. Would you prefer if I moved these patches to another bug, if they don't fix the memory leak this bug was filed about?
Comment 18•10 years ago
|
||
(In reply to Michael Vines [:m1] [:evilmachines] from comment #0)
> A memory leak is observed on b2g v2.2 in content process of an app that
> repeatedly invokes window.open()/close().
>
> USS of the content process grows until the LMK kills the process after a
> couple hours.
How sure are we that this is what's going on? I've seen the content process exit due to IPC channel failure after its fd table is filled, but I haven't seen a significant USS increase. (There will be some increase due to the “leaked” objects under the SharedFrameMetricsData, but that shouldn't be very large relative to the phone memory given that the leak can happen at most 4000ish times.)
Flags: needinfo?(mvines)
Reporter | ||
Comment 19•10 years ago
|
||
If you pass me the rebased patch for v2.2 I can give it a shot here
Flags: needinfo?(mvines)
Comment 20•10 years ago
|
||
The three patches, in git-format-patch format, rebased to mozillaorg/v2.2.
Attachment #8572878 -
Flags: feedback?(mvines)
Reporter | ||
Comment 21•10 years ago
|
||
Thanks Jed. I've pulled attachment 8572878 [details] [diff] [review] into our trees and will let automation take a stab at it overnight.
Flags: needinfo?(mvines)
Reporter | ||
Comment 22•10 years ago
|
||
Comment on attachment 8572878 [details] [diff] [review]
Patches, rebased for v2.2, in Git format.
Tests are pretty happy with this patch on v2.2, thanks so much guys.
(FTR: in some devices this bug showed up as a VSS leak, in others as a USS leak. I suspect device differences in display composition with the GPU vs. HWC)
Flags: needinfo?(mvines)
Attachment #8572878 -
Flags: feedback?(mvines) → feedback+
Assignee | ||
Comment 23•10 years ago
|
||
Updated to make the constructor explicit so that it doesn't fail the static analysis build.
Attachment #8572700 -
Attachment is obsolete: true
Attachment #8573258 -
Flags: review?(nical.bugzilla)
Assignee | ||
Updated•10 years ago
|
Attachment #8572701 -
Flags: review?(nical.bugzilla)
Assignee | ||
Updated•10 years ago
|
Attachment #8572702 -
Flags: review?(nical.bugzilla)
Attachment #8572702 -
Flags: feedback?(rbarker)
Updated•10 years ago
|
Attachment #8572701 -
Flags: review?(nical.bugzilla) → review+
Updated•10 years ago
|
Attachment #8572702 -
Flags: review?(nical.bugzilla) → review+
Updated•10 years ago
|
Attachment #8573258 -
Flags: review?(nical.bugzilla) → review+
Assignee | ||
Comment 24•10 years ago
|
||
Comment 25•10 years ago
|
||
https://hg.mozilla.org/mozilla-central/rev/2b53655f3db5
https://hg.mozilla.org/mozilla-central/rev/c0331a879dee
https://hg.mozilla.org/mozilla-central/rev/cc28a1ece6ab
Status: NEW → RESOLVED
Closed: 10 years ago
status-firefox39:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla39
Updated•10 years ago
|
Flags: needinfo?(huseby)
Reporter | ||
Comment 26•10 years ago
|
||
Hey kats, can you please request approval‑mozilla‑b2g37? on what needs to get uplifted here?
Flags: needinfo?(bugmail.mozilla)
Assignee | ||
Comment 27•10 years ago
|
||
Yeah, I wanted to give it a few days to settle just in case. I'll make a rollup and request approval.
Flags: needinfo?(bugmail.mozilla)
Assignee | ||
Updated•10 years ago
|
status-b2g-v2.2:
--- → affected
status-b2g-master:
--- → fixed
Assignee | ||
Comment 28•10 years ago
|
||
Comment on attachment 8572878 [details] [diff] [review]
Patches, rebased for v2.2, in Git format.
Actually the patches apply cleanly to 2.2, so I'd rather just uplift them individually rather than do a rollup.
Attachment #8572878 -
Attachment is obsolete: true
Assignee | ||
Comment 29•10 years ago
|
||
Comment on attachment 8573258 [details] [diff] [review]
Part 1 - Make LayerTransactionChild track the layers id
NOTE: Please see https://wiki.mozilla.org/Release_Management/B2G_Landing to better understand the B2G approval process and landings.
[Approval Request Comment]
Bug caused by (feature/regressing bug #): progressive painting + APZ
User impact if declined: in some cases after opening and closing a window in the child process we will leak some file descriptors. Eventually after prolonged use this can cause an OOM crash
Testing completed: patches verified to fix the problem on master and 2.2 (see comment 22)
Risk to taking this patch (and alternatives if risky): fairly low risk; the first two patches just add a new field and then the last patch with the fix is pretty small. patches apply cleanly, this code hasn't been touched a lot.
String or UUID changes made by this patch: none
Attachment #8573258 -
Flags: approval-mozilla-b2g37?
Assignee | ||
Updated•10 years ago
|
Attachment #8572701 -
Flags: approval-mozilla-b2g37?
Assignee | ||
Updated•10 years ago
|
Attachment #8572702 -
Flags: feedback?(rbarker) → approval-mozilla-b2g37?
Updated•10 years ago
|
Attachment #8572701 -
Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Updated•10 years ago
|
Attachment #8572702 -
Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Updated•10 years ago
|
Attachment #8573258 -
Flags: approval-mozilla-b2g37? → approval-mozilla-b2g37+
Comment 30•10 years ago
|
||
Blocks: CAF-v3.0-FL-metabug
No longer blocks: CAF-v3.0-FL-metabug
You need to log in
before you can comment on or make changes to this bug.
Description
•