Closed
Bug 893012
Opened 11 years ago
Closed 11 years ago
We leak a ContentParent when its corresponding process is SIGKIL'ed
Categories
(Core :: IPC, defect)
Core
IPC
Tracking
()
People
(Reporter: justin.lebar+bug, Assigned: justin.lebar+bug)
References
Details
(Keywords: meta, Whiteboard: [MemShrink:P1])
Attachments
(10 files, 1 obsolete file)
(deleted),
application/x-xz
|
Details | |
(deleted),
application/x-xz
|
Details | |
(deleted),
application/x-compressed-tar
|
Details | |
(deleted),
application/x-compressed-tar
|
Details | |
(deleted),
application/x-compressed-tar
|
Details | |
(deleted),
application/x-compressed-tar
|
Details | |
(deleted),
application/zip
|
Details | |
(deleted),
application/x-compressed-tar
|
Details | |
(deleted),
application/x-compressed-tar
|
Details | |
(deleted),
application/x-compressed-tar
|
Details |
STR:
* Send SIGKIL to a B2G child process
* ContentParent::ActorDestroy runs, but ContentParent::ShutDownProcess does not, and neither does the ContentParent destructor.
This is at least part of the cause of bug 889261 and friends.
Assignee | ||
Updated•11 years ago
|
Assignee: nobody → justin.lebar+bug
Assignee | ||
Updated•11 years ago
|
Whiteboard: [MemShrink]
Comment 1•11 years ago
|
||
If ShutDownProcess is supposed to be called, it probably needs a slightly different name. But yes, ActorDestroy for the root actor is responsible for making sure that it triggers the actions necessary to destroy that actor, because the IPC layer doesn't own it and is therefore not responsible for destroying it as it does for all the child actors.
Updated•11 years ago
|
Assignee | ||
Comment 2•11 years ago
|
||
We leak the ContentParent because its nsFrameLoader keeps a strong ref to it in in |mChildHost|.
Still figuring out why we leak the frame loader.
Smells cycle-y.
Assignee | ||
Comment 4•11 years ago
|
||
We leak the frameloader because we leak BrowserElementParent. We leak BEP in part because it registers an event listener on its window. I fixed that and it seems like we still leak the BEP, and I'm not sure yet what's going on.
But I'm going to open a separate bug for closing the ContentParent's channel upon ActorDestroy. That should take care of the many leaked messages, and it's a prerequisite for not leaking ContentParent.
Assignee | ||
Comment 5•11 years ago
|
||
With the patch from bug 893172, if I remove the mChildHost ref in nsFrameLoader, my memory reporter from bug 893242 shows that we no longer leak the ContentParent.
So that's good confirmation. But removing the mChildHost doesn't really fix the problem, since we're still leaking an nsFrameLoader.
Assignee | ||
Comment 6•11 years ago
|
||
Andrew (or someone), can you help me understand what this means?
The nsFrameLoader I care about is 0x476a8ec0:
> $ python find_roots.py cc-edges.14646.log 0x476a8ec0
> Parsing cc-edges.14646.log. Done loading graph. Reversing graph. Done.
>
> 0x44e66600 [JS Object (XPCWrappedNative_NoHelper)]
> --[js::GetObjectPrivate(obj)]-> 0x4721baf0 [XPCWrappedNative]
> --[mIdentity]-> 0x476a8ec0 [nsFrameLoader]
>
> Root 0x44e66600 is a marked GC object.
So far so good; it's held alive only by JS.
$ python find_roots.py gc-edges.14646.log 0x44e66600
Parsing gc-edges.14646.log. Done loading graph. Reversing graph. Done.
> via resource://gre/modules/BrowserElementParent.jsm :
> 0x4391f0a0 [FakeBackstagePass <no private>]
> --[BrowserElementPromptService]-> 0x4393b2b0 [Object <no private>]
> --[_browserElementParentMap]-> 0x4393c020 [WeakMap 46ebffd0]
> --[WeakMap entry]-> 0x47481b80 [Object <no private>]
> --[_frameLoader]-> 0x44e66600 [XPCWrappedNative_NoHelper 4721baf0]
>
> via resource://gre/modules/BrowserElementPromptService.jsm :
> 0x4391f0b0 [FakeBackstagePass <no private>]
> --[BrowserElementPromptService]-> 0x4393b2b0 [Object <no private>]
> --[_browserElementParentMap]-> 0x4393c020 [WeakMap 46ebffd0]
> --[WeakMap entry]-> 0x47481b80 [Object <no private>]
> --[_frameLoader]-> 0x44e66600 [XPCWrappedNative_NoHelper 4721baf0]
>
> via XPCWrappedNative::mFlatJSObject :
> 0x47455940 [ContentFrameMessageManager 460381f0]
> --[BrowserElementPromptService]-> 0x4394e0a0 [Proxy <no private>]
> --[private]-> 0x4393b2b0 [Object <no private>]
> --[_browserElementParentMap]-> 0x4393c020 [WeakMap 46ebffd0]
> --[WeakMap entry]-> 0x47481b80 [Object <no private>]
> --[_frameLoader]-> 0x44e66600 [XPCWrappedNative_NoHelper 4721baf0]
>
> via jar:file:///system/b2g/omni.ja!/components/Webapps.js :
> 0x4391fa20 [FakeBackstagePass <no private>]
> --[BrowserElementPromptService]-> 0x4393b2b0 [Object <no private>]
> --[_browserElementParentMap]-> 0x4393c020 [WeakMap 46ebffd0]
> --[WeakMap entry]-> 0x47481b80 [Object <no private>]
> --[_frameLoader]-> 0x44e66600 [XPCWrappedNative_NoHelper 4721baf0]
>
> via XPCWrappedNative::mFlatJSObject :
> 0x47455040 [ContentFrameMessageManager 469ea100]
> --[BrowserElementPromptService]-> 0x47725460 [Proxy <no private>]
> --[private]-> 0x4393b2b0 [Object <no private>]
> --[_browserElementParentMap]-> 0x4393c020 [WeakMap 46ebffd0]
> --[WeakMap entry]-> 0x47481b80 [Object <no private>]
> --[_frameLoader]-> 0x44e66600 [XPCWrappedNative_NoHelper 4721baf0]
>
> via nsXPCWrappedJS::mJSObj :
> 0x43944100 [Object <no private>]
> --[_bepMap]-> 0x4393c5a0 [WeakMap 460380a0]
> --[WeakMap entry]-> 0x47481b80 [Object <no private>]
> --[_frameLoader]-> 0x44e66600 [XPCWrappedNative_NoHelper 4721baf0]
>
> via Preserved wrapper :
> 0x43b72580 [HTMLIFrameElement <no private>]
> --[setVisible]-> 0x43b84150 [Proxy <no private>]
> --[private]-> 0x47781f40 [Function]
> --[fun_callscope]-> 0x47481d30 [Call <no private>]
> --[**UNKNOWN SLOT 1**]-> 0x47781f00 [Function BrowserElementParent/defineMethod]
> --[fun_callscope]-> 0x47481b50 [Call <no private>]
> --[self]-> 0x47481b80 [Object <no private>]
> --[_frameLoader]-> 0x44e66600 [XPCWrappedNative_NoHelper 4721baf0]
The only non-weak entry here is the last one. The <iframe mozbrowser> has an expando called setVisible, and that function holds a ref to |self|, which is BrowserElementParent, which holds a ref to _frameLoader.
That makes sense. There are a few other expandos on <iframe mozbrowser> which are functions whose scopes also hold a ref to the BrowserElementParent, but I presume they're being elided here.
So now, why is the iframe alive?
> $ python find_roots.py gc-edges.14646.log 0x43b72580
> Parsing gc-edges.14646.log. Done loading graph. Reversing graph. Done.
>
> via Preserved wrapper :
> 0x43b72580 [HTMLIFrameElement <no private>]
Hm...what's a preserved wrapper? The cc log isn't any more help.
> $ python find_roots.py cc-edges.14646.log 0x43b72580
> Parsing cc-edges.14646.log. Done loading graph. Reversing graph. Done.
>
> 0x43b72580 [JS Object (HTMLIFrameElement)]
>
> Root 0x43b72580 is a marked GC object.
I tried to look at the object in gdb, but 0x43b72580 does not appear to be a
mozilla::dom::HTMLIFrameElement; its vtable pointers don't line up with
anything gdb knows about, and its refcnt is 1136019944. Maybe that pointer is
to one of the object's nsISupports's, or something.
Anyway, I'm not sure what to do now. I could make the expandos on the iframe
hold weak refs to the BrowserElementParent, but then we'd still be leaking the
iframe.
Any ideas?
Assignee | ||
Comment 7•11 years ago
|
||
Ah, I misunderstood those weakmap refs.
They're refs where the BEP is the /value/, which is a strong ref.
However, these are all keyed on the iframe, so when the iframe goes away, the BEP should too. So we're still left with the question of why this iframe is alive.
Assignee | ||
Comment 8•11 years ago
|
||
Without this, both the window and the iframe hold the BEP alive. With this
patch, only the iframe seems to hold the BEP alive (per the above comments),
which is what we want. But we still leak the iframe, it seems.
Comment 9•11 years ago
|
||
Could you upload the GC and CC logs you are talking about? I can look at them.
0x43b72580 is a JS Object (HTMLIFrameElement), which means it is the JS reflector for the HTMLIFrameElement, not the element itself. This is an all-traces log (hopefully), so you want to look up the reflector 0x43b72580 in the CC log, and that should contain a pointer to the C++ object.
When a JS reflector (aka wrapper in the context of preserved wrapper) is created for a C++ object, it owns the C++ object, but not vice versa. If you set an expando on the reflector (or a few other things), we want to ensure the reflector lives as long as the C++ object, so we make the C++ object own its reflector. That's called a "preserved wrapper".
Comment 10•11 years ago
|
||
While a preserved wrapper on a C++ object makes it own its reflector, it shouldn't cause its reflector to be marked black (which means the CC won't collect it), except as a CC optimization (if we decide the C++ object is definitely alive, then if the reflector is preserved, the reflector is definitely alive), so I suspect this is an instance where AllTraces is failing to deoptimize, perhaps in mozilla::dom::TraceBlackJS (I filed bug 893378 for that).
The good news though is that this is really just find_roots getting confused, the information should still be in the log. In general, you might need to manually root around (no pun intended) in the log, but here I think it should suffice to edit the log, and replace the [gc.marked] next to the entry for 0x43b72580 with [gc]. This will void your warranty (eg it may mess stuff up, including but not limited to things this reflector owns), but I think it should be okay, because nothing else is pointing to the reflector.
Assignee | ||
Comment 11•11 years ago
|
||
Assignee | ||
Comment 12•11 years ago
|
||
Assignee | ||
Comment 13•11 years ago
|
||
Okay, with the suggested change to the log from comment 10, I now get
> $ python cc/find_roots.py cc-edges.log 0x43b72580
> Parsing cc-edges.log. Done loading graph. Reversing graph. Done.
>
> 0x43b725b0 [JS Object (HTMLDivElement)]
> --[UnwrapDOMObject(obj)]-> 0x476a8dd0 [FragmentOrElement (xhtml) div id='appframe0' class='appWindow homescreen active' app://system.gaiamobile.org/index.html]
> --[mAttrsAndChildren[i]]-> 0x46b9a880 [FragmentOrElement (xhtml) iframe app://system.gaiamobile.org/index.html]
> --[Preserved wrapper]-> 0x43b72580 [JS Object (HTMLIFrameElement)]
>
> Root 0x43b725b0 is a marked GC object.
>
> 0x43b72610 [JS Object (HTMLDivElement)]
> --[UnwrapDOMObject(obj)]-> 0x47b472e0 [FragmentOrElement (xhtml) div class='screenshot-overlay' app://system.gaiamobile.org/index.html]
> --[GetParent()]-> 0x476a8dd0 [FragmentOrElement (xhtml) div id='appframe0' class='appWindow homescreen active' app://system.gaiamobile.org/index.html]
> --[mAttrsAndChildren[i]]-> 0x46b9a880 [FragmentOrElement (xhtml) iframe app://system.gaiamobile.org/index.html]
> --[Preserved wrapper]-> 0x43b72580 [JS Object (HTMLIFrameElement)]
>
> Root 0x43b72610 is a marked GC object.
>
> 0x47756d90 [JS Object (Proxy)]
> --[private]-> 0x43b72580 [JS Object (HTMLIFrameElement)]
>
> Root 0x47756d90 is a marked GC object.
>
> 0x43b84010 [JS Object (Call)]
> --[iframe]-> 0x43b72580 [JS Object (HTMLIFrameElement)]
>
> Root 0x43b84010 is a marked GC object.
>
> 0x43b6eeb0 [JS Object (Object)]
> --[iframe]-> 0x43b72580 [JS Object (HTMLIFrameElement)]
>
> Root 0x43b6eeb0 is a marked GC object.
I'm not sure what the last three mean, but the first is indicating that we're holding a div element alive, and that div contains the iframe. I think the second is saying that this first div contains a second div, and that second div holds alive the first one by virtue of its parent pointer. It's not clear to me whether this rules out the fact that the second div is held alive by the first one (by virtue of being its child); that is, I don't know if the second div is interesting.
Assignee | ||
Comment 14•11 years ago
|
||
Looking for gc roots of the first div element appears to show that Gaia is leaking the world.
> $ python ~/code/moz/heapgraph/gc/find_roots.py gc-edges.log 0x43b725b0
> Parsing gc-edges.log. Done loading graph. Reversing graph. Done.
>
> via nsXPCWrappedJS::mJSObj :
> 0x43b6dac0 [Function AppError/<]
> --[fun_callscope]-> 0x43b6eee0 [Call <no private>]
> --[self]-> 0x43b6ef10 [Object <no private>]
> --[app]-> 0x43b6eeb0 [Object <no private>]
> --[frame]-> 0x43b725b0 [HTMLDivElement <no private>]
>
> via nsXPCWrappedJS::mJSObj :
> 0x43b6db40 [Function apploaded]
> --[fun_callscope]-> 0x43b84010 [Call <no private>]
> --[homescreenFrame]-> 0x43b725b0 [HTMLDivElement <no private>]
Assignee | ||
Comment 15•11 years ago
|
||
In Gaia's apps/system/js/window.js:
> window.AppError = function AppError(app) {
> var self = this;
> this.app = app;
> this.app.frame.addEventListener('mozbrowsererror', function(evt) {
> if (evt.detail.type != 'other')
> return;
>
> console.warn(
> 'app of [' + self.app.origin + '] got a mozbrowsererror event.');
>
> if (self.injected) {
> self.update();
> } else {
> self.render();
> }
> self.show();
> self.injected = true;
> });
> return this;
> };
We register one mozbrowsererror listener for every frame, but we never remove the listener. The mozbrowsererror listener's closure holds alive self, which is |this|, which holds alive |this.app|, which holds alive |frame|, which holds alive our iframe. Sheesh.
So that's just gonna leak.
In Gaia's apps/system/js/window_manager.js:
> app.frame.addEventListener(type, function apploaded(e) {
> e.target.removeEventListener(e.type, apploaded, true);
>
> var evt = document.createEvent('CustomEvent');
> evt.initCustomEvent('apploadtime', true, false, {
> time: parseInt(Date.now() - iframe.dataset.start),
> type: (e.type == 'appopen') ? 'w' : 'c'
> });
> iframe.dispatchEvent(evt);
> }, true);
This one is a bit weird. If this closure holds alive homescreenFrame (which exists in the closure's scope but is not referenced inside the closure), does that mean the closure is heavyweight? If so, why is it heavyweight?
Let us pause to consider how incredibly difficult it is to write JS that doesn't leak.
Comment 16•11 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #15)
> In Gaia's apps/system/js/window.js:
>
> We register one mozbrowsererror listener for every frame, but we never
> remove the listener. The mozbrowsererror listener's closure holds alive
> self, which is |this|, which holds alive |this.app|, which holds alive
> |frame|, which holds alive our iframe. Sheesh.
>
> So that's just gonna leak.
Does "removeEventListener() before the element is removed" help?
Do you suggest that we should remove event listener before an element is removed from DOM tree?
>
> In Gaia's apps/system/js/window_manager.js:
>
> > app.frame.addEventListener(type, function apploaded(e) {
> > e.target.removeEventListener(e.type, apploaded, true);
> >
> > var evt = document.createEvent('CustomEvent');
> > evt.initCustomEvent('apploadtime', true, false, {
> > time: parseInt(Date.now() - iframe.dataset.start),
> > type: (e.type == 'appopen') ? 'w' : 'c'
> > });
> > iframe.dispatchEvent(evt);
> > }, true);
>
> This one is a bit weird. If this closure holds alive homescreenFrame (which
> exists in the closure's scope but is not referenced inside the closure),
> does that mean the closure is heavyweight? If so, why is it heavyweight?
This one is not invented by me...however I don't get your point here.
Which line is the heavyweight point you mean? Or you are saying the whole WindowManager closure is something bad?
>
> Let us pause to consider how incredibly difficult it is to write JS that
> doesn't leak.
I agree.
(In reply to Justin Lebar [:jlebar] from comment #15)
> In Gaia's apps/system/js/window_manager.js:
>
> > app.frame.addEventListener(type, function apploaded(e) {
> > e.target.removeEventListener(e.type, apploaded, true);
> >
> > var evt = document.createEvent('CustomEvent');
> > evt.initCustomEvent('apploadtime', true, false, {
> > time: parseInt(Date.now() - iframe.dataset.start),
> > type: (e.type == 'appopen') ? 'w' : 'c'
> > });
> > iframe.dispatchEvent(evt);
> > }, true);
>
> This one is a bit weird. If this closure holds alive homescreenFrame (which
> exists in the closure's scope but is not referenced inside the closure),
> does that mean the closure is heavyweight? If so, why is it heavyweight?
I believe there are situations where a closure can keep alive things that appear to be unnecessary without being heavyweight. Let's poke at this tomorrow (today? :-P) in the debugger, and rope in some JS folks if we need them.
> Let us pause to consider how incredibly difficult it is to write JS that
> doesn't leak.
*le sigh*
Comment 18•11 years ago
|
||
> So that's just gonna leak.
Cycle collector should collect this cycle. Why doesn't it? Or is the point that "self" is actually a live window all along or something?
Assignee | ||
Comment 19•11 years ago
|
||
> I don't get your point here.
> Which line is the heavyweight point you mean? Or you are saying the whole WindowManager
> closure is something bad?
In our JS engine, a closure can either be "heavyweight" or "lightweight".
The idea is that a closure can access variables in its containing scope, so if you keep the closure alive, it has to keep alive the variables from the containing scope that it can access.
Normally the JS engine parses the text of the closure, understands which variables the closure might access, and keeps them all alive. When we can do this, we call the closure lightweight.
But sometimes we can't determine statically which variables the closure might keep alive. For example, if the closure calls eval(), the eval'ed code might access anything. In this case, we make the closure "heavyweight" and it keeps alive everything in its containing scope.
All of this is to say, I don't see why this particular closure should be heavyweight. But also, it doesn't access homescreenFrame, so if it's lightweight, I don't see why it's holding that variable alive. So obviously my explanation is incomplete in some way. :) khuey is looking into it.
Assignee | ||
Comment 20•11 years ago
|
||
> Does "removeEventListener() before the element is removed" help?
In this case, yes. Alternatively, we could register just one event listener and use the target of the event to get the iframe.
> Do you suggest that we should remove event listener before an element is removed from DOM
> tree?
I'm not suggesting this as a general rule. The issue is that this particular event listener holds a reference to the DOM element in question. Therefore if we don't remove the event listener when we remove the DOM element from the tree, then the listener leaks the DOM element.
Comment 21•11 years ago
|
||
Again: it shouldn't. The cycle collector knows about event listeners!
Assignee | ||
Comment 22•11 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #18)
> > So that's just gonna leak.
>
> Cycle collector should collect this cycle. Why doesn't it? Or is the point
> that "self" is actually a live window all along or something?
Confusingly enough, |frame| is not the iframe. I think it is an element in the window that stays alive. So I don't think there's a cycle to collect.
However, if we registered the event listener on |this.app.iframe|, that might let the CC collect the closure...
Comment 23•11 years ago
|
||
Oh, so app.frame sticks around because someone is explicitly keeping it alive? Then yeah, its listeners will leak whatever they close over...
Comment 24•11 years ago
|
||
(In reply to Boris Zbarsky (:bz) from comment #23)
> Oh, so app.frame sticks around because someone is explicitly keeping it
> alive? Then yeah, its listeners will leak whatever they close over...
app.frame is the container element, a DIV, of the mozbrowser iframe.
<div class="app-window">
<iframe mozbrowser></iframe>
</div>
So we should use app.iframe to attach the event listener and need not to do remove event listener but just remove the whole frame element and then GC would work correctly? Correct me if I'm wrong. If yes, I'm going to fix this case.
(In reply to Justin Lebar [:jlebar] from comment #19)
> > I don't get your point here.
> > Which line is the heavyweight point you mean? Or you are saying the whole WindowManager
> > closure is something bad?
>
> In our JS engine, a closure can either be "heavyweight" or "lightweight".
>
> The idea is that a closure can access variables in its containing scope, so
> if you keep the closure alive, it has to keep alive the variables from the
> containing scope that it can access.
>
> Normally the JS engine parses the text of the closure, understands which
> variables the closure might access, and keeps them all alive. When we can
> do this, we call the closure lightweight.
>
> But sometimes we can't determine statically which variables the closure
> might keep alive. For example, if the closure calls eval(), the eval'ed
> code might access anything. In this case, we make the closure "heavyweight"
> and it keeps alive everything in its containing scope.
>
> All of this is to say, I don't see why this particular closure should be
> heavyweight. But also, it doesn't access homescreenFrame, so if it's
> lightweight, I don't see why it's holding that variable alive. So obviously
> my explanation is incomplete in some way. :) khuey is looking into it.
Thanks for explanation. I'll try to find out why this is relevant to |homescreenFrame| variable from my point of view.
(In reply to Alive Kuo [:alive] from comment #24)
> (In reply to Justin Lebar [:jlebar] from comment #19)
> > > I don't get your point here.
> > > Which line is the heavyweight point you mean? Or you are saying the whole WindowManager
> > > closure is something bad?
> >
> > In our JS engine, a closure can either be "heavyweight" or "lightweight".
> >
> > The idea is that a closure can access variables in its containing scope, so
> > if you keep the closure alive, it has to keep alive the variables from the
> > containing scope that it can access.
> >
> > Normally the JS engine parses the text of the closure, understands which
> > variables the closure might access, and keeps them all alive. When we can
> > do this, we call the closure lightweight.
> >
> > But sometimes we can't determine statically which variables the closure
> > might keep alive. For example, if the closure calls eval(), the eval'ed
> > code might access anything. In this case, we make the closure "heavyweight"
> > and it keeps alive everything in its containing scope.
> >
> > All of this is to say, I don't see why this particular closure should be
> > heavyweight. But also, it doesn't access homescreenFrame, so if it's
> > lightweight, I don't see why it's holding that variable alive. So obviously
> > my explanation is incomplete in some way. :) khuey is looking into it.
>
> Thanks for explanation. I'll try to find out why this is relevant to
> |homescreenFrame| variable from my point of view.
I'm digging into this with the help of some JS engine folks. Hopefully I'll have an answer soon.
Assignee | ||
Comment 26•11 years ago
|
||
Yes, I think switching to app.iframe would work. I'm testing a patch now, and I'll send a review your way, if that sounds OK to you.
Comment 27•11 years ago
|
||
(In reply to Justin Lebar [:jlebar] from comment #26)
> Yes, I think switching to app.iframe would work. I'm testing a patch now,
> and I'll send a review your way, if that sounds OK to you.
Sure, thank you!
Assignee | ||
Comment 28•11 years ago
|
||
Comment on attachment 775147 [details] [diff] [review]
Part 1: Don't leak BrowserElementParent due to an event listener on the window which contains it.
I'm moving this into its own bug (bug 894147), since the Gaia patches also have their own bugs.
That way, if there are any other surprise fixes we need here, we can continue to attach them off this bug.
Attachment #775147 -
Attachment is obsolete: true
Assignee | ||
Comment 29•11 years ago
|
||
Steps for testing:
1) Apply patches from bug 894081, bug 894135, bug 894147, and bug 893242. I'm working on landing these, but it will take a day or two longer.
2) Load the browser app, go to example.com
3) Load the gallery app
4) adb shell b2g-ps
5) adb shell kill -9 PIDS
where PIDS is equal to the list of all PIDs from step 4 above, not including the main b2g process. For example, if adb shell b2g-ps gives me
>APPLICATION USER PID PPID VSIZE RSS WCHAN PC NAME
>b2g root 2810 158 160808 71504 ffffffff 400f8430 S /system/b2g/b2g
>Usage app_2863 2863 2810 64392 24084 ffffffff 40020430 S /system/b2g/plugin-container
>Homescreen app_2882 2882 2810 69212 26684 ffffffff 40059430 S /system/b2g/plugin-container
>Gallery app_2962 2962 2810 84700 25088 ffffffff 400dd430 S /system/b2g/plugin-container
>Browser app_3004 3004 2810 64072 22204 ffffffff 4002e430 S /system/b2g/plugin-container
>(Preallocated a root 3068 2810 62996 21152 ffffffff 400e3430 S /system/b2g/plugin-container
Then I would run
> $ adb shell kill -9 2863 2882 2962 3004 3068
6) Wait ten seconds, then run
$ B2G/tools/get_about_memory.py --minimize
where B2G is the root B2G checkout (from https://github.com/mozilla-b2g/B2G).
7) In the resulting memory report, search for "content-parent".
If this bug is fixed, you should see only one or two content-parent entries:
One for the homescreen, and maybe one for the preallocated process. Neither
should have pid=-1, and neither should have a closed channel.
If this bug is not fixed, you will see content-parent entries with pid=-1 or
closed channels. You may see content-parent entries for Browser, Gallery, or
Usage, and you may see multiple content-parent entries for the Homescreen.
When I checked with these STR a few minutes ago, everything seemed fixed. But
now I'm seeing things which indicate that this bug may not be fixed. Please
stand by.
Depends on: 893242
Assignee | ||
Comment 30•11 years ago
|
||
Here's an example of a bad memory report, indicating that the bug still exists.
> 0 (100.0%) -- queued-ipc-messages
> ├──0 (100.0%) ── content-parent((Preallocated), pid=3809, open channel, 0x46dd8000, refcnt=14)
> ├──0 (100.0%) ── content-parent(Homescreen, pid=-1, closed channel, 0x43208c00, refcnt=1)
> ├──0 (100.0%) ── content-parent(Homescreen, pid=3756, open channel, 0x45bef800, refcnt=15)
> └──0 (100.0%) ── content-parent(Usage, pid=-1, closed channel, 0x46f0a800, refcnt=1)
We want it instead to look like this.
> 0 (100.0%) -- queued-ipc-messages
> ├──0 (100.0%) ── content-parent((Preallocated), pid=3809, open channel, 0x46dd8000, refcnt=14)
> └──0 (100.0%) ── content-parent(Homescreen, pid=3756, open channel, 0x45bef800, refcnt=15)
Assignee | ||
Comment 31•11 years ago
|
||
It turns out that we "leak" one or two ContentParents under some circumstances
(*). This isn't a big deal -- each one is relatively small. What's important
is that the number of ContentParent objects we leak is bounded by a small
constant.
So to test for this, I need to amend the steps from comment 29 a bit.
A) Do step 1
B) Repeat steps 2, 3, 4, 5 five times.
C) Do steps 6 and 7
D) When looking through the memory report, we pass if there are at most two
ContentParent entries with pid=-1 and/or a closed channel. Otherwise we
fail.
Let me know if this isn't clear.
(*) In our testing, it looks like BrowserElementPanning.js can leak one
ContentParent, via its pointerDownTarget member.
It also looks like Gaia leaks one ContentParent when you kill the Usage
app. It gets a mozbrowsererror when the process crashes, but it apparently
does not remove the old, dead iframe until it wants to show the Usage app
again. These may be bugs (particularly the second one), but I don't think
they're a big deal.
Assignee | ||
Comment 32•11 years ago
|
||
One thing I should add is that I haven't tested any of this in b2g18 yet. All of my experiments have been on trunk.
If you're going to test on trunk, be sure that you have bug 887075; without it, we leak like crazy. As far as I know, this isn't a problem on b2g18.
Depends on: 887075
Assignee | ||
Updated•11 years ago
|
blocking-b2g: --- → leo+
Comment 33•11 years ago
|
||
Comment 34•11 years ago
|
||
Assignee | ||
Comment 35•11 years ago
|
||
Thank you; this is very useful information.
Assignee | ||
Comment 36•11 years ago
|
||
Next time you collect this information, it would be helpful if you'd collect it with get_about_memory.py --minimize.
--minimize forces a GC right before collecting the information. This helps reduce noise in the test results.
Assignee | ||
Comment 37•11 years ago
|
||
> Memory Reports And DMD After 16 hour
Also next time it would be helpful to have a GC/CC log both before and after. We'll work with what we have until you can get us better data, but if that's at all possible, please post the new data to the bug when you get it.
Assignee | ||
Comment 38•11 years ago
|
||
Judging by the DMD logs, there's definitely a ContentParent leak still remaining here.
But even if we fix that, it's not going to fix the whole problem. About 50% of the increase in memory usage is due to JS; that is likely a completely unrelated problem. To diagnose that, we really need before and after GC/CC logs.
My working theory is that maybe ContentParent::ActorDestroy isn't being run. That would definitely cause us to leak. I think we're going to try to catch this in a debugger tomorrow.
Just to be sure for when we try to reproduce this, the tests from comment 33 and 34 are run on b2g-18 and gaia v1-train (plus the patches)?
Flags: needinfo?(jeremy.kim.leo)
Assignee | ||
Comment 40•11 years ago
|
||
By the way, I just pushed a fix to the get_about_memory.py script that should fix getting memory reports from phones running b2g18.
If you git fetch && git merge in the root B2G repository, you should get this new version; hopefully it will work for you!
https://github.com/mozilla-b2g/B2G/commit/1f777756bfd1a80381208ef9851315ce1cec3dc3
Comment 41•11 years ago
|
||
Base Revision : AU_LINUX_GECKO_ICS_STRAWBERRY.01.01.00.019.149
gecko : 499c1f8ea7ad0cdaa7086214278e2944b8a2ea33
gaia : 0d5a9a7577f16b6a72a982148c6f509ee1714ea2
and add those patches.
+ Bug 894081, Bug 894135, Bug 894147, Bug 894191
+ Bug 893242 : except
Part 3, v2: Add a memory reporter counting ContentParents and the number of IPC messages they have outstanding.
Part 1.5: Make sContentParents a StaticAutoPtr.
Flags: needinfo?(jeremy.kim.leo)
Comment 42•11 years ago
|
||
Digging in the logs some more, from the 33 hour CC/GC logs, I see around a thousand BEPs, but I also see about a thousand iframe elements. These iframes are not present in the CC log, so it seems like maybe the CC log it not an AllTraces log, so we can't tell why the event listeners that are holding the iframes alive are being held alive. But I guess the good news here is that it is a similar problem to one we've looked into before. Kyle is trying to reproduce the problem locally.
Assignee | ||
Comment 43•11 years ago
|
||
> so it seems like maybe the CC log it not an AllTraces log
Ack, I must have forgotten to mark bug 890135 as leo+. Sorry. :-/
We were able to reproduce more ContentParent leaks today. These latest leaks appear to be distinct from everything we've fixed so far. We were able to partially address them, but we ran into a bug in our GC heap dumping code where we're not dumping the entire heap (!?!?). We'll pick up on Monday trying to fix the tool so we can see what leaks remain.
Comment 45•11 years ago
|
||
Can you please provide steps to verify this fix?- to perform blackbox testing.
Depends on: 896776
Depends on: 896825
Updated•11 years ago
|
Whiteboard: [MemShrink] → [MemShrink:P1]
Comment 46•11 years ago
|
||
to gather more debugging infomation. this version applied more patchs.
- Bug 893242, Bug 895933, Bug 890135
and this is log of before testing after boot up.
memory-report with minimize
dmd
gc/cc log.
Comment 47•11 years ago
|
||
and this is log of after testing for 14 hour
memory-report with minimize
dmd
gc/cc log.
Thanks, jeremy, that's really useful.
The log from comment 47 has 827 (yes, really) leaked ContentParent objects. Tomorrow I think we will have patches that fix all the ContentParent leaks we have found so far. Hopefully after you retest with those we'll see a much small (maybe even 0!) number.
Depends on: 897707
Depends on: 897701
Hi Jeremy,
B2G-18 tip and gaia v1-train tip plus the patches from bug 897701, bug 896825, and bug 896755 fix all of the ContentParent leaks that we have been able to reproduce. It would be great if leo could test with those patches.
Flags: needinfo?(jeremy.kim.leo)
Comment 50•11 years ago
|
||
Comment 51•11 years ago
|
||
For comparison, I also checked this on the latest nightly on Unagi after 7 Trials
Gecko http://hg.mozilla.org/releases/mozilla-b2g18/rev/91fa9bf5e638
Gaia a64f37c3327e4cf2105a16eedaf68727bb2adbde
BuildID 20130724070208
Version 18.0
Comment 52•11 years ago
|
||
It seems that, at least there, the bug was still occurring.
If you weren't on tip and don't have the patches from comment 49 that's expected.
Comment 54•11 years ago
|
||
Hi Justin,
This is Andre from Leo.
The bug 811636 is observed in Leo as well.
After hours of test, the output channel's queue starts accumulating again.
The child associated to that channel no longer exists, and the main process seems to hold both sides of the pipe. So no channel error is dispatched.
This is exactly what bug 811636 fixes.
I tested that patch in Leo, and the result was positive: no messages accumulating in the output queue.
So I am putting the current issue as dependent of that, since that causes the ContentParent to leak.
Comment 55•11 years ago
|
||
this version applied more patchs.
- Bug 897701, Bug 896825, Bug 896755
and this is log of before testing after boot up.
memory-report with minimize
dmd
gc/cc log.
Flags: needinfo?(jeremy.kim.leo)
Comment 56•11 years ago
|
||
and this is log of after testing for 3 hour
memory-report with minimize
dmd
gc/cc log.
Comment 57•11 years ago
|
||
and this is log of after testing for 15 hours
memory-report with minimize
dmd
gc/cc log.
awesome! there are only 3 ContentParent objects. it looks very greats.
That is great news. No leaked ContentParent objects after 15 hours, and RSS has only increased by 1.5 MB in 12 hours.
Jeremy if leo is happy with these results I'd like to close this bug as RESOLVED FIXED. But if you want to do more QA testing on these patches we can wait until you are done.
Flags: needinfo?(jeremy.kim.leo)
Comment 59•11 years ago
|
||
It is also worth noting that in the main process, from 3 hours to 15 hours, the only real increase in explicit memory (around 0.36mb) is from DMD itself.
Comment 60•11 years ago
|
||
sorry for delay because of weekends.
first of all, thanks for efforts of all.
i agree with Comment 58. it looks clear leaked issue caused by ContentParents.
let's change status of this as RESOLVED-FIXED.
in additionally, our official QA testing will start on next monday(05-AUG-2013) for a week.
it maybe, the official results is possible to take after two weeks.'
Flags: needinfo?(jeremy.kim.leo)
(In reply to leo.bugzilla.gecko from comment #60)
> sorry for delay because of weekends.
No problem. Have to take a break sometime ;-)
> first of all, thanks for efforts of all.
> i agree with Comment 58. it looks clear leaked issue caused by
> ContentParents.
> let's change status of this as RESOLVED-FIXED.
Great!
> in additionally, our official QA testing will start on next
> monday(05-AUG-2013) for a week.
> it maybe, the official results is possible to take after two weeks.'
Looking forward to those results.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Comment 62•11 years ago
|
||
Kyle, i have some question.
currently, i found an abnormal behavior.
after 3~4 days testing this, b2g has some delay when i do something on target.
(for examples, delayed launch application / delayed response as my action)
i checked 'adb shell top -m 10'
----------------------------------------------------------------------------------------------
User 88%, System 11%, IOW 0%, IRQ 0%
User 275 + Nice 0 + Sys 35 + Idle 0 + IOW 0 + IRQ 0 + SIRQ 0 = 310
PID PR CPU% S #THR VSS RSS PCY UID Name
140 0 92% R 57 240296K 82372K fg root /system/b2g/b2g
163 0 5% S 6 5908K 436K fg root /system/bin/sensord
9099 0 1% R 1 1060K 452K fg root top
79 0 0% S 1 0K 0K fg root mmcqd/0
8909 0 0% S 1 0K 0K fg root kworker/0:2
8 0 0% S 1 0K 0K fg root suspend
9 0 0% S 1 0K 0K fg root sync_supers
10 0 0% S 1 0K 0K fg root bdi-default
11 0 0% S 1 0K 0K fg root kblockd
12 0 0% S 1 0K 0K fg root khubd
----------------------------------------------------------------------------------------------
sometimes, b2g process has 92% of CPU per 20 seconds.
so i guess that b2g is enabled some log to gather memory-report/dmd/gc/cc log.
is it possible to cause some delays? (after 3days, cc log is 14mbytes, gc log is 20mbytes)
(if it is right, how i can disable to log data?)
if i builds user-mode, is it disabled logging? (currently, i'm used userdebug-mode.)
Assignee | ||
Comment 63•11 years ago
|
||
There is no way to disable logging, but having logging enabled causes very little overhead, because Firefox has to maintain the data we use to generate the logs, even if we're not using the logs. We have this sort of logging enabled on all desktop Firefox builds.
What you are seeing is very unlikely related to gc/cc/about:memory logs.
Assignee | ||
Comment 64•11 years ago
|
||
You should use a profiler to see why Firefox is spinning. I'm not sure how a userdebug-mode build is made, but if you have debug symbols, you could attach GDB, hit ctrl-c, and observe the backtrace. Do this a bunch of times and you might understand why the build is spinning.
It might also be worth trying to use SPS, but that will probably only be helpful if we're spending time running JavaScript. Also I'm not sure you can use SPS without first rebooting the device, which isn't what you want to do.
https://developer.mozilla.org/en-US/docs/Performance/Profiling_with_the_Built-in_Profiler#Profiling_Boot_to_Gecko_%28with_a_real_device%29
This issue is probably worth a separate bug; I can help you make sure you have the right people cc'ed to help you with this issue.
Comment 65•11 years ago
|
||
ok. thanks Justin.
After attach GDB, When b2g has very high cpu, i hits ctrl-c, at this point, almost, b2g is doing to collect cycle memory. so, i doubt cc logging has a overhead.
and also, when i reboots to enable profile, because this will be cleared, so i can't do this yet.
i understood, i will file a bug as new after more investigation.
Assignee | ||
Comment 66•11 years ago
|
||
Ah, long cycle collection pauses! That is interesting.
Please cc me, :mccr8, and :smaug on the bug. Please also attach the gc and cc logs.
Comment 67•11 years ago
|
||
OK to mark this as status-b2g18:fixed? If so, please do :)
Flags: needinfo?(justin.lebar+bug)
Updated•11 years ago
|
status-b2g18:
--- → fixed
Flags: needinfo?(justin.lebar+bug)
You need to log in
before you can comment on or make changes to this bug.
Description
•