893012 - We leak a ContentParent when its corresponding process is SIGKIL'ed

Assignee

Description

•

11 years ago

STR: * Send SIGKIL to a B2G child process * ContentParent::ActorDestroy runs, but ContentParent::ShutDownProcess does not, and neither does the ContentParent destructor. This is at least part of the cause of bug 889261 and friends.

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Blocks: 886217

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Assignee: nobody → justin.lebar+bug

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Whiteboard: [MemShrink]

Benjamin Smedberg

Comment 1

•

11 years ago

If ShutDownProcess is supposed to be called, it probably needs a slightly different name. But yes, ActorDestroy for the root actor is responsible for making sure that it triggers the actions necessary to destroy that actor, because the IPC layer doesn't own it and is therefore not responsible for destroying it as it does for all the child actors.

Andrew McCreight [:mccr8]

Updated

•

11 years ago

Blocks: 880940, 870588

Justin Lebar (not reading bugmail)

Assignee

Comment 2

•

11 years ago

We leak the ContentParent because its nsFrameLoader keeps a strong ref to it in in |mChildHost|. Still figuring out why we leak the frame loader.

Ben Turner (not reading bugmail, use the needinfo flag!)

Comment 3

•

11 years ago

Smells cycle-y.

Justin Lebar (not reading bugmail)

Assignee

Comment 4

•

11 years ago

We leak the frameloader because we leak BrowserElementParent. We leak BEP in part because it registers an event listener on its window. I fixed that and it seems like we still leak the BEP, and I'm not sure yet what's going on. But I'm going to open a separate bug for closing the ContentParent's channel upon ActorDestroy. That should take care of the many leaked messages, and it's a prerequisite for not leaking ContentParent.

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Depends on: 893172

Justin Lebar (not reading bugmail)

Assignee

Comment 5

•

11 years ago

With the patch from bug 893172, if I remove the mChildHost ref in nsFrameLoader, my memory reporter from bug 893242 shows that we no longer leak the ContentParent. So that's good confirmation. But removing the mChildHost doesn't really fix the problem, since we're still leaking an nsFrameLoader.

Justin Lebar (not reading bugmail)

Assignee

Comment 6

•

11 years ago

Andrew (or someone), can you help me understand what this means? The nsFrameLoader I care about is 0x476a8ec0: > $ python find_roots.py cc-edges.14646.log 0x476a8ec0 > Parsing cc-edges.14646.log. Done loading graph. Reversing graph. Done. > > 0x44e66600 [JS Object (XPCWrappedNative_NoHelper)] > --[js::GetObjectPrivate(obj)]-> 0x4721baf0 [XPCWrappedNative] > --[mIdentity]-> 0x476a8ec0 [nsFrameLoader] > > Root 0x44e66600 is a marked GC object. So far so good; it's held alive only by JS. $ python find_roots.py gc-edges.14646.log 0x44e66600 Parsing gc-edges.14646.log. Done loading graph. Reversing graph. Done. > via resource://gre/modules/BrowserElementParent.jsm : > 0x4391f0a0 [FakeBackstagePass <no private>] > --[BrowserElementPromptService]-> 0x4393b2b0 [Object <no private>] > --[_browserElementParentMap]-> 0x4393c020 [WeakMap 46ebffd0] > --[WeakMap entry]-> 0x47481b80 [Object <no private>] > --[_frameLoader]-> 0x44e66600 [XPCWrappedNative_NoHelper 4721baf0] > > via resource://gre/modules/BrowserElementPromptService.jsm : > 0x4391f0b0 [FakeBackstagePass <no private>] > --[BrowserElementPromptService]-> 0x4393b2b0 [Object <no private>] > --[_browserElementParentMap]-> 0x4393c020 [WeakMap 46ebffd0] > --[WeakMap entry]-> 0x47481b80 [Object <no private>] > --[_frameLoader]-> 0x44e66600 [XPCWrappedNative_NoHelper 4721baf0] > > via XPCWrappedNative::mFlatJSObject : > 0x47455940 [ContentFrameMessageManager 460381f0] > --[BrowserElementPromptService]-> 0x4394e0a0 [Proxy <no private>] > --[private]-> 0x4393b2b0 [Object <no private>] > --[_browserElementParentMap]-> 0x4393c020 [WeakMap 46ebffd0] > --[WeakMap entry]-> 0x47481b80 [Object <no private>] > --[_frameLoader]-> 0x44e66600 [XPCWrappedNative_NoHelper 4721baf0] > > via jar:file:///system/b2g/omni.ja!/components/Webapps.js : > 0x4391fa20 [FakeBackstagePass <no private>] > --[BrowserElementPromptService]-> 0x4393b2b0 [Object <no private>] > --[_browserElementParentMap]-> 0x4393c020 [WeakMap 46ebffd0] > --[WeakMap entry]-> 0x47481b80 [Object <no private>] > --[_frameLoader]-> 0x44e66600 [XPCWrappedNative_NoHelper 4721baf0] > > via XPCWrappedNative::mFlatJSObject : > 0x47455040 [ContentFrameMessageManager 469ea100] > --[BrowserElementPromptService]-> 0x47725460 [Proxy <no private>] > --[private]-> 0x4393b2b0 [Object <no private>] > --[_browserElementParentMap]-> 0x4393c020 [WeakMap 46ebffd0] > --[WeakMap entry]-> 0x47481b80 [Object <no private>] > --[_frameLoader]-> 0x44e66600 [XPCWrappedNative_NoHelper 4721baf0] > > via nsXPCWrappedJS::mJSObj : > 0x43944100 [Object <no private>] > --[_bepMap]-> 0x4393c5a0 [WeakMap 460380a0] > --[WeakMap entry]-> 0x47481b80 [Object <no private>] > --[_frameLoader]-> 0x44e66600 [XPCWrappedNative_NoHelper 4721baf0] > > via Preserved wrapper : > 0x43b72580 [HTMLIFrameElement <no private>] > --[setVisible]-> 0x43b84150 [Proxy <no private>] > --[private]-> 0x47781f40 [Function] > --[fun_callscope]-> 0x47481d30 [Call <no private>] > --[**UNKNOWN SLOT 1**]-> 0x47781f00 [Function BrowserElementParent/defineMethod] > --[fun_callscope]-> 0x47481b50 [Call <no private>] > --[self]-> 0x47481b80 [Object <no private>] > --[_frameLoader]-> 0x44e66600 [XPCWrappedNative_NoHelper 4721baf0] The only non-weak entry here is the last one. The <iframe mozbrowser> has an expando called setVisible, and that function holds a ref to |self|, which is BrowserElementParent, which holds a ref to _frameLoader. That makes sense. There are a few other expandos on <iframe mozbrowser> which are functions whose scopes also hold a ref to the BrowserElementParent, but I presume they're being elided here. So now, why is the iframe alive? > $ python find_roots.py gc-edges.14646.log 0x43b72580 > Parsing gc-edges.14646.log. Done loading graph. Reversing graph. Done. > > via Preserved wrapper : > 0x43b72580 [HTMLIFrameElement <no private>] Hm...what's a preserved wrapper? The cc log isn't any more help. > $ python find_roots.py cc-edges.14646.log 0x43b72580 > Parsing cc-edges.14646.log. Done loading graph. Reversing graph. Done. > > 0x43b72580 [JS Object (HTMLIFrameElement)] > > Root 0x43b72580 is a marked GC object. I tried to look at the object in gdb, but 0x43b72580 does not appear to be a mozilla::dom::HTMLIFrameElement; its vtable pointers don't line up with anything gdb knows about, and its refcnt is 1136019944. Maybe that pointer is to one of the object's nsISupports's, or something. Anyway, I'm not sure what to do now. I could make the expandos on the iframe hold weak refs to the BrowserElementParent, but then we'd still be leaking the iframe. Any ideas?

Justin Lebar (not reading bugmail)

Assignee

Comment 7

•

11 years ago

Ah, I misunderstood those weakmap refs. They're refs where the BEP is the /value/, which is a strong ref. However, these are all keyed on the iframe, so when the iframe goes away, the BEP should too. So we're still left with the question of why this iframe is alive.

Justin Lebar (not reading bugmail)

Assignee

Comment 8

•

11 years ago

Attached patch Part 1: Don't leak BrowserElementParent due to an event listener on the window which contains it. (obsolete) (deleted) — Details — Splinter Review

Without this, both the window and the iframe hold the BEP alive. With this patch, only the iframe seems to hold the BEP alive (per the above comments), which is what we want. But we still leak the iframe, it seems.

Andrew McCreight [:mccr8]

Comment 9

•

11 years ago

Could you upload the GC and CC logs you are talking about? I can look at them. 0x43b72580 is a JS Object (HTMLIFrameElement), which means it is the JS reflector for the HTMLIFrameElement, not the element itself. This is an all-traces log (hopefully), so you want to look up the reflector 0x43b72580 in the CC log, and that should contain a pointer to the C++ object. When a JS reflector (aka wrapper in the context of preserved wrapper) is created for a C++ object, it owns the C++ object, but not vice versa. If you set an expando on the reflector (or a few other things), we want to ensure the reflector lives as long as the C++ object, so we make the C++ object own its reflector. That's called a "preserved wrapper".

Andrew McCreight [:mccr8]

Comment 10

•

11 years ago

While a preserved wrapper on a C++ object makes it own its reflector, it shouldn't cause its reflector to be marked black (which means the CC won't collect it), except as a CC optimization (if we decide the C++ object is definitely alive, then if the reflector is preserved, the reflector is definitely alive), so I suspect this is an instance where AllTraces is failing to deoptimize, perhaps in mozilla::dom::TraceBlackJS (I filed bug 893378 for that). The good news though is that this is really just find_roots getting confused, the information should still be in the log. In general, you might need to manually root around (no pun intended) in the log, but here I think it should suffice to edit the log, and replace the [gc.marked] next to the entry for 0x43b72580 with [gc]. This will void your warranty (eg it may mess stuff up, including but not limited to things this reflector owns), but I think it should be okay, because nothing else is pointing to the reflector.

Justin Lebar (not reading bugmail)

Assignee

Comment 11

•

11 years ago

Attached file cc edges from comment 6 (deleted) — Details

Justin Lebar (not reading bugmail)

Assignee

Comment 12

•

11 years ago

Attached file gc edges from comment 6 (deleted) — Details

Justin Lebar (not reading bugmail)

Assignee

Comment 13

•

11 years ago

Okay, with the suggested change to the log from comment 10, I now get > $ python cc/find_roots.py cc-edges.log 0x43b72580 > Parsing cc-edges.log. Done loading graph. Reversing graph. Done. > > 0x43b725b0 [JS Object (HTMLDivElement)] > --[UnwrapDOMObject(obj)]-> 0x476a8dd0 [FragmentOrElement (xhtml) div id='appframe0' class='appWindow homescreen active' app://system.gaiamobile.org/index.html] > --[mAttrsAndChildren[i]]-> 0x46b9a880 [FragmentOrElement (xhtml) iframe app://system.gaiamobile.org/index.html] > --[Preserved wrapper]-> 0x43b72580 [JS Object (HTMLIFrameElement)] > > Root 0x43b725b0 is a marked GC object. > > 0x43b72610 [JS Object (HTMLDivElement)] > --[UnwrapDOMObject(obj)]-> 0x47b472e0 [FragmentOrElement (xhtml) div class='screenshot-overlay' app://system.gaiamobile.org/index.html] > --[GetParent()]-> 0x476a8dd0 [FragmentOrElement (xhtml) div id='appframe0' class='appWindow homescreen active' app://system.gaiamobile.org/index.html] > --[mAttrsAndChildren[i]]-> 0x46b9a880 [FragmentOrElement (xhtml) iframe app://system.gaiamobile.org/index.html] > --[Preserved wrapper]-> 0x43b72580 [JS Object (HTMLIFrameElement)] > > Root 0x43b72610 is a marked GC object. > > 0x47756d90 [JS Object (Proxy)] > --[private]-> 0x43b72580 [JS Object (HTMLIFrameElement)] > > Root 0x47756d90 is a marked GC object. > > 0x43b84010 [JS Object (Call)] > --[iframe]-> 0x43b72580 [JS Object (HTMLIFrameElement)] > > Root 0x43b84010 is a marked GC object. > > 0x43b6eeb0 [JS Object (Object)] > --[iframe]-> 0x43b72580 [JS Object (HTMLIFrameElement)] > > Root 0x43b6eeb0 is a marked GC object. I'm not sure what the last three mean, but the first is indicating that we're holding a div element alive, and that div contains the iframe. I think the second is saying that this first div contains a second div, and that second div holds alive the first one by virtue of its parent pointer. It's not clear to me whether this rules out the fact that the second div is held alive by the first one (by virtue of being its child); that is, I don't know if the second div is interesting.

Justin Lebar (not reading bugmail)

Assignee

Comment 14

•

11 years ago

Looking for gc roots of the first div element appears to show that Gaia is leaking the world. > $ python ~/code/moz/heapgraph/gc/find_roots.py gc-edges.log 0x43b725b0 > Parsing gc-edges.log. Done loading graph. Reversing graph. Done. > > via nsXPCWrappedJS::mJSObj : > 0x43b6dac0 [Function AppError/<] > --[fun_callscope]-> 0x43b6eee0 [Call <no private>] > --[self]-> 0x43b6ef10 [Object <no private>] > --[app]-> 0x43b6eeb0 [Object <no private>] > --[frame]-> 0x43b725b0 [HTMLDivElement <no private>] > > via nsXPCWrappedJS::mJSObj : > 0x43b6db40 [Function apploaded] > --[fun_callscope]-> 0x43b84010 [Call <no private>] > --[homescreenFrame]-> 0x43b725b0 [HTMLDivElement <no private>]

Justin Lebar (not reading bugmail)

Assignee

Comment 15

•

11 years ago

In Gaia's apps/system/js/window.js: > window.AppError = function AppError(app) { > var self = this; > this.app = app; > this.app.frame.addEventListener('mozbrowsererror', function(evt) { > if (evt.detail.type != 'other') > return; > > console.warn( > 'app of [' + self.app.origin + '] got a mozbrowsererror event.'); > > if (self.injected) { > self.update(); > } else { > self.render(); > } > self.show(); > self.injected = true; > }); > return this; > }; We register one mozbrowsererror listener for every frame, but we never remove the listener. The mozbrowsererror listener's closure holds alive self, which is |this|, which holds alive |this.app|, which holds alive |frame|, which holds alive our iframe. Sheesh. So that's just gonna leak. In Gaia's apps/system/js/window_manager.js: > app.frame.addEventListener(type, function apploaded(e) { > e.target.removeEventListener(e.type, apploaded, true); > > var evt = document.createEvent('CustomEvent'); > evt.initCustomEvent('apploadtime', true, false, { > time: parseInt(Date.now() - iframe.dataset.start), > type: (e.type == 'appopen') ? 'w' : 'c' > }); > iframe.dispatchEvent(evt); > }, true); This one is a bit weird. If this closure holds alive homescreenFrame (which exists in the closure's scope but is not referenced inside the closure), does that mean the closure is heavyweight? If so, why is it heavyweight? Let us pause to consider how incredibly difficult it is to write JS that doesn't leak.

(inactive after 6/18) Alive Kuo [:alive]

Comment 16

•

11 years ago

(In reply to Justin Lebar [:jlebar] from comment #15) > In Gaia's apps/system/js/window.js: > > We register one mozbrowsererror listener for every frame, but we never > remove the listener. The mozbrowsererror listener's closure holds alive > self, which is |this|, which holds alive |this.app|, which holds alive > |frame|, which holds alive our iframe. Sheesh. > > So that's just gonna leak. Does "removeEventListener() before the element is removed" help? Do you suggest that we should remove event listener before an element is removed from DOM tree? > > In Gaia's apps/system/js/window_manager.js: > > > app.frame.addEventListener(type, function apploaded(e) { > > e.target.removeEventListener(e.type, apploaded, true); > > > > var evt = document.createEvent('CustomEvent'); > > evt.initCustomEvent('apploadtime', true, false, { > > time: parseInt(Date.now() - iframe.dataset.start), > > type: (e.type == 'appopen') ? 'w' : 'c' > > }); > > iframe.dispatchEvent(evt); > > }, true); > > This one is a bit weird. If this closure holds alive homescreenFrame (which > exists in the closure's scope but is not referenced inside the closure), > does that mean the closure is heavyweight? If so, why is it heavyweight? This one is not invented by me...however I don't get your point here. Which line is the heavyweight point you mean? Or you are saying the whole WindowManager closure is something bad? > > Let us pause to consider how incredibly difficult it is to write JS that > doesn't leak. I agree.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 17

•

11 years ago

(In reply to Justin Lebar [:jlebar] from comment #15) > In Gaia's apps/system/js/window_manager.js: > > > app.frame.addEventListener(type, function apploaded(e) { > > e.target.removeEventListener(e.type, apploaded, true); > > > > var evt = document.createEvent('CustomEvent'); > > evt.initCustomEvent('apploadtime', true, false, { > > time: parseInt(Date.now() - iframe.dataset.start), > > type: (e.type == 'appopen') ? 'w' : 'c' > > }); > > iframe.dispatchEvent(evt); > > }, true); > > This one is a bit weird. If this closure holds alive homescreenFrame (which > exists in the closure's scope but is not referenced inside the closure), > does that mean the closure is heavyweight? If so, why is it heavyweight? I believe there are situations where a closure can keep alive things that appear to be unnecessary without being heavyweight. Let's poke at this tomorrow (today? :-P) in the debugger, and rope in some JS folks if we need them. > Let us pause to consider how incredibly difficult it is to write JS that > doesn't leak. *le sigh*

Boris Zbarsky [:bzbarsky]

Comment 18

•

11 years ago

> So that's just gonna leak. Cycle collector should collect this cycle. Why doesn't it? Or is the point that "self" is actually a live window all along or something?

Justin Lebar (not reading bugmail)

Assignee

Comment 19

•

11 years ago

> I don't get your point here. > Which line is the heavyweight point you mean? Or you are saying the whole WindowManager > closure is something bad? In our JS engine, a closure can either be "heavyweight" or "lightweight". The idea is that a closure can access variables in its containing scope, so if you keep the closure alive, it has to keep alive the variables from the containing scope that it can access. Normally the JS engine parses the text of the closure, understands which variables the closure might access, and keeps them all alive. When we can do this, we call the closure lightweight. But sometimes we can't determine statically which variables the closure might keep alive. For example, if the closure calls eval(), the eval'ed code might access anything. In this case, we make the closure "heavyweight" and it keeps alive everything in its containing scope. All of this is to say, I don't see why this particular closure should be heavyweight. But also, it doesn't access homescreenFrame, so if it's lightweight, I don't see why it's holding that variable alive. So obviously my explanation is incomplete in some way. :) khuey is looking into it.

Justin Lebar (not reading bugmail)

Assignee

Comment 20

•

11 years ago

> Does "removeEventListener() before the element is removed" help? In this case, yes. Alternatively, we could register just one event listener and use the target of the event to get the iframe. > Do you suggest that we should remove event listener before an element is removed from DOM > tree? I'm not suggesting this as a general rule. The issue is that this particular event listener holds a reference to the DOM element in question. Therefore if we don't remove the event listener when we remove the DOM element from the tree, then the listener leaks the DOM element.

Boris Zbarsky [:bzbarsky]

Comment 21

•

11 years ago

Again: it shouldn't. The cycle collector knows about event listeners!

Justin Lebar (not reading bugmail)

Assignee

Comment 22

•

11 years ago

(In reply to Boris Zbarsky (:bz) from comment #18) > > So that's just gonna leak. > > Cycle collector should collect this cycle. Why doesn't it? Or is the point > that "self" is actually a live window all along or something? Confusingly enough, |frame| is not the iframe. I think it is an element in the window that stays alive. So I don't think there's a cycle to collect. However, if we registered the event listener on |this.app.iframe|, that might let the CC collect the closure...

Boris Zbarsky [:bzbarsky]

Comment 23

•

11 years ago

Oh, so app.frame sticks around because someone is explicitly keeping it alive? Then yeah, its listeners will leak whatever they close over...

(inactive after 6/18) Alive Kuo [:alive]

Comment 24

•

11 years ago

(In reply to Boris Zbarsky (:bz) from comment #23) > Oh, so app.frame sticks around because someone is explicitly keeping it > alive? Then yeah, its listeners will leak whatever they close over... app.frame is the container element, a DIV, of the mozbrowser iframe. <div class="app-window"> <iframe mozbrowser></iframe> </div> So we should use app.iframe to attach the event listener and need not to do remove event listener but just remove the whole frame element and then GC would work correctly? Correct me if I'm wrong. If yes, I'm going to fix this case. (In reply to Justin Lebar [:jlebar] from comment #19) > > I don't get your point here. > > Which line is the heavyweight point you mean? Or you are saying the whole WindowManager > > closure is something bad? > > In our JS engine, a closure can either be "heavyweight" or "lightweight". > > The idea is that a closure can access variables in its containing scope, so > if you keep the closure alive, it has to keep alive the variables from the > containing scope that it can access. > > Normally the JS engine parses the text of the closure, understands which > variables the closure might access, and keeps them all alive. When we can > do this, we call the closure lightweight. > > But sometimes we can't determine statically which variables the closure > might keep alive. For example, if the closure calls eval(), the eval'ed > code might access anything. In this case, we make the closure "heavyweight" > and it keeps alive everything in its containing scope. > > All of this is to say, I don't see why this particular closure should be > heavyweight. But also, it doesn't access homescreenFrame, so if it's > lightweight, I don't see why it's holding that variable alive. So obviously > my explanation is incomplete in some way. :) khuey is looking into it. Thanks for explanation. I'll try to find out why this is relevant to |homescreenFrame| variable from my point of view.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 25

•

11 years ago

(In reply to Alive Kuo [:alive] from comment #24) > (In reply to Justin Lebar [:jlebar] from comment #19) > > > I don't get your point here. > > > Which line is the heavyweight point you mean? Or you are saying the whole WindowManager > > > closure is something bad? > > > > In our JS engine, a closure can either be "heavyweight" or "lightweight". > > > > The idea is that a closure can access variables in its containing scope, so > > if you keep the closure alive, it has to keep alive the variables from the > > containing scope that it can access. > > > > Normally the JS engine parses the text of the closure, understands which > > variables the closure might access, and keeps them all alive. When we can > > do this, we call the closure lightweight. > > > > But sometimes we can't determine statically which variables the closure > > might keep alive. For example, if the closure calls eval(), the eval'ed > > code might access anything. In this case, we make the closure "heavyweight" > > and it keeps alive everything in its containing scope. > > > > All of this is to say, I don't see why this particular closure should be > > heavyweight. But also, it doesn't access homescreenFrame, so if it's > > lightweight, I don't see why it's holding that variable alive. So obviously > > my explanation is incomplete in some way. :) khuey is looking into it. > > Thanks for explanation. I'll try to find out why this is relevant to > |homescreenFrame| variable from my point of view. I'm digging into this with the help of some JS engine folks. Hopefully I'll have an answer soon.

Justin Lebar (not reading bugmail)

Assignee

Comment 26

•

11 years ago

Yes, I think switching to app.iframe would work. I'm testing a patch now, and I'll send a review your way, if that sounds OK to you.

(inactive after 6/18) Alive Kuo [:alive]

Comment 27

•

11 years ago

(In reply to Justin Lebar [:jlebar] from comment #26) > Yes, I think switching to app.iframe would work. I'm testing a patch now, > and I'll send a review your way, if that sounds OK to you. Sure, thank you!

Andrew McCreight [:mccr8]

Updated

•

11 years ago

No longer blocks: 870588

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Depends on: 894081

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Depends on: 894135

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Depends on: 894147

Justin Lebar (not reading bugmail)

Assignee

Comment 28

•

11 years ago

Comment on attachment 775147 [details] [diff] [review] Part 1: Don't leak BrowserElementParent due to an event listener on the window which contains it. I'm moving this into its own bug (bug 894147), since the Gaia patches also have their own bugs. That way, if there are any other surprise fixes we need here, we can continue to attach them off this bug.

Attachment #775147 - Attachment is obsolete: true

Justin Lebar (not reading bugmail)

Assignee

Comment 29

•

11 years ago

Steps for testing: 1) Apply patches from bug 894081, bug 894135, bug 894147, and bug 893242. I'm working on landing these, but it will take a day or two longer. 2) Load the browser app, go to example.com 3) Load the gallery app 4) adb shell b2g-ps 5) adb shell kill -9 PIDS where PIDS is equal to the list of all PIDs from step 4 above, not including the main b2g process. For example, if adb shell b2g-ps gives me >APPLICATION USER PID PPID VSIZE RSS WCHAN PC NAME >b2g root 2810 158 160808 71504 ffffffff 400f8430 S /system/b2g/b2g >Usage app_2863 2863 2810 64392 24084 ffffffff 40020430 S /system/b2g/plugin-container >Homescreen app_2882 2882 2810 69212 26684 ffffffff 40059430 S /system/b2g/plugin-container >Gallery app_2962 2962 2810 84700 25088 ffffffff 400dd430 S /system/b2g/plugin-container >Browser app_3004 3004 2810 64072 22204 ffffffff 4002e430 S /system/b2g/plugin-container >(Preallocated a root 3068 2810 62996 21152 ffffffff 400e3430 S /system/b2g/plugin-container Then I would run > $ adb shell kill -9 2863 2882 2962 3004 3068 6) Wait ten seconds, then run $ B2G/tools/get_about_memory.py --minimize where B2G is the root B2G checkout (from https://github.com/mozilla-b2g/B2G). 7) In the resulting memory report, search for "content-parent". If this bug is fixed, you should see only one or two content-parent entries: One for the homescreen, and maybe one for the preallocated process. Neither should have pid=-1, and neither should have a closed channel. If this bug is not fixed, you will see content-parent entries with pid=-1 or closed channels. You may see content-parent entries for Browser, Gallery, or Usage, and you may see multiple content-parent entries for the Homescreen. When I checked with these STR a few minutes ago, everything seemed fixed. But now I'm seeing things which indicate that this bug may not be fixed. Please stand by.

Depends on: 893242

Justin Lebar (not reading bugmail)

Assignee

Comment 30

•

11 years ago

Here's an example of a bad memory report, indicating that the bug still exists. > 0 (100.0%) -- queued-ipc-messages > ├──0 (100.0%) ── content-parent((Preallocated), pid=3809, open channel, 0x46dd8000, refcnt=14) > ├──0 (100.0%) ── content-parent(Homescreen, pid=-1, closed channel, 0x43208c00, refcnt=1) > ├──0 (100.0%) ── content-parent(Homescreen, pid=3756, open channel, 0x45bef800, refcnt=15) > └──0 (100.0%) ── content-parent(Usage, pid=-1, closed channel, 0x46f0a800, refcnt=1) We want it instead to look like this. > 0 (100.0%) -- queued-ipc-messages > ├──0 (100.0%) ── content-parent((Preallocated), pid=3809, open channel, 0x46dd8000, refcnt=14) > └──0 (100.0%) ── content-parent(Homescreen, pid=3756, open channel, 0x45bef800, refcnt=15)

Justin Lebar (not reading bugmail)

Assignee

Comment 31

•

11 years ago

It turns out that we "leak" one or two ContentParents under some circumstances (*). This isn't a big deal -- each one is relatively small. What's important is that the number of ContentParent objects we leak is bounded by a small constant. So to test for this, I need to amend the steps from comment 29 a bit. A) Do step 1 B) Repeat steps 2, 3, 4, 5 five times. C) Do steps 6 and 7 D) When looking through the memory report, we pass if there are at most two ContentParent entries with pid=-1 and/or a closed channel. Otherwise we fail. Let me know if this isn't clear. (*) In our testing, it looks like BrowserElementPanning.js can leak one ContentParent, via its pointerDownTarget member. It also looks like Gaia leaks one ContentParent when you kill the Usage app. It gets a mozbrowsererror when the process crashes, but it apparently does not remove the old, dead iframe until it wants to show the Usage app again. These may be bugs (particularly the second one), but I don't think they're a big deal.

Justin Lebar (not reading bugmail)

Assignee

Comment 32

•

11 years ago

One thing I should add is that I haven't tested any of this in b2g18 yet. All of my experiments have been on trunk. If you're going to test on trunk, be sure that you have bug 887075; without it, we leak like crazy. As far as I know, this isn't a problem on b2g18.

Depends on: 887075

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

blocking-b2g: --- → leo+

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Depends on: 894597

Brian Hackett [Laid off!]

Updated

•

11 years ago

Depends on: 894669

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Depends on: 894191

jeremy.kim.leo [:jeremykimleo]

Comment 33

•

11 years ago

Attached file Memory Reports And DMD After 16 hour (deleted) — Details

this log applied Bug 894081, Bug 894135, Bug 894147, Bug 893242 this was taken after 16 hour.

jeremy.kim.leo [:jeremykimleo]

Comment 34

•

11 years ago

Attached file Memory Reports And DMD And GC/CC log After 33 hour (deleted) — Details

this log applied Bug 894081, Bug 894135, Bug 894147, Bug 893242 this was taken after 33 hour.

Justin Lebar (not reading bugmail)

Assignee

Comment 35

•

11 years ago

Thank you; this is very useful information.

Justin Lebar (not reading bugmail)

Assignee

Comment 36

•

11 years ago

Next time you collect this information, it would be helpful if you'd collect it with get_about_memory.py --minimize. --minimize forces a GC right before collecting the information. This helps reduce noise in the test results.

Justin Lebar (not reading bugmail)

Assignee

Comment 37

•

11 years ago

> Memory Reports And DMD After 16 hour Also next time it would be helpful to have a GC/CC log both before and after. We'll work with what we have until you can get us better data, but if that's at all possible, please post the new data to the bug when you get it.

Justin Lebar (not reading bugmail)

Assignee

Comment 38

•

11 years ago

Judging by the DMD logs, there's definitely a ContentParent leak still remaining here. But even if we fix that, it's not going to fix the whole problem. About 50% of the increase in memory usage is due to JS; that is likely a completely unrelated problem. To diagnose that, we really need before and after GC/CC logs. My working theory is that maybe ContentParent::ActorDestroy isn't being run. That would definitely cause us to leak. I think we're going to try to catch this in a debugger tomorrow.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 39

•

11 years ago

Just to be sure for when we try to reproduce this, the tests from comment 33 and 34 are run on b2g-18 and gaia v1-train (plus the patches)?

Flags: needinfo?(jeremy.kim.leo)

Justin Lebar (not reading bugmail)

Assignee

Comment 40

•

11 years ago

By the way, I just pushed a fix to the get_about_memory.py script that should fix getting memory reports from phones running b2g18. If you git fetch && git merge in the root B2G repository, you should get this new version; hopefully it will work for you! https://github.com/mozilla-b2g/B2G/commit/1f777756bfd1a80381208ef9851315ce1cec3dc3

jeremy.kim.leo [:jeremykimleo]

Comment 41

•

11 years ago

Base Revision : AU_LINUX_GECKO_ICS_STRAWBERRY.01.01.00.019.149 gecko : 499c1f8ea7ad0cdaa7086214278e2944b8a2ea33 gaia : 0d5a9a7577f16b6a72a982148c6f509ee1714ea2 and add those patches. + Bug 894081, Bug 894135, Bug 894147, Bug 894191 + Bug 893242 : except Part 3, v2: Add a memory reporter counting ContentParents and the number of IPC messages they have outstanding. Part 1.5: Make sContentParents a StaticAutoPtr.

Flags: needinfo?(jeremy.kim.leo)

Andrew McCreight [:mccr8]

Comment 42

•

11 years ago

Digging in the logs some more, from the 33 hour CC/GC logs, I see around a thousand BEPs, but I also see about a thousand iframe elements. These iframes are not present in the CC log, so it seems like maybe the CC log it not an AllTraces log, so we can't tell why the event listeners that are holding the iframes alive are being held alive. But I guess the good news here is that it is a similar problem to one we've looked into before. Kyle is trying to reproduce the problem locally.

Andrew McCreight [:mccr8]

Updated

•

11 years ago

Depends on: 895933

Justin Lebar (not reading bugmail)

Assignee

Comment 43

•

11 years ago

> so it seems like maybe the CC log it not an AllTraces log Ack, I must have forgotten to mark bug 890135 as leo+. Sorry. :-/

Till Schneidereit [:till]

Updated

•

11 years ago

Depends on: 896088

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 44

•

11 years ago

We were able to reproduce more ContentParent leaks today. These latest leaks appear to be distinct from everything we've fixed so far. We were able to partially address them, but we ran into a bug in our GC heap dumping code where we're not dumping the entire heap (!?!?). We'll pick up on Monday trying to fix the tool so we can see what leaks remain.

dkumar

Comment 45

•

11 years ago

Can you please provide steps to verify this fix?- to perform blackbox testing.

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Depends on: 896755

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Updated

•

11 years ago

Depends on: 896776

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Depends on: 896773

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Updated

•

11 years ago

Depends on: 896825

Nicholas Nethercote [inactive]

Updated

•

11 years ago

Whiteboard: [MemShrink] → [MemShrink:P1]

jeremy.kim.leo [:jeremykimleo]

Comment 46

•

11 years ago

Attached file memory report before testing. (deleted) — Details

to gather more debugging infomation. this version applied more patchs. - Bug 893242, Bug 895933, Bug 890135 and this is log of before testing after boot up. memory-report with minimize dmd gc/cc log.

jeremy.kim.leo [:jeremykimleo]

Comment 47

•

11 years ago

Attached file memory report after testing for 14 hour. (deleted) — Details

and this is log of after testing for 14 hour memory-report with minimize dmd gc/cc log.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 48

•

11 years ago

Thanks, jeremy, that's really useful. The log from comment 47 has 827 (yes, really) leaked ContentParent objects. Tomorrow I think we will have patches that fix all the ContentParent leaks we have found so far. Hopefully after you retest with those we'll see a much small (maybe even 0!) number.

Andrew McCreight [:mccr8]

Updated

•

11 years ago

No longer depends on: 896088

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Updated

•

11 years ago

Depends on: 897707

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Updated

•

11 years ago

Depends on: 897701

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 49

•

11 years ago

Hi Jeremy, B2G-18 tip and gaia v1-train tip plus the patches from bug 897701, bug 896825, and bug 896755 fix all of the ContentParent leaks that we have been able to reproduce. It would be great if leo could test with those patches.

Flags: needinfo?(jeremy.kim.leo)

John Hammink

Comment 50

•

11 years ago

Attached file memory reports (get_about_memory.py --minimize output) from unagi after 7 STRs (as outlined in comment 31) (deleted) — Details

John Hammink

Comment 51

•

11 years ago

For comparison, I also checked this on the latest nightly on Unagi after 7 Trials Gecko http://hg.mozilla.org/releases/mozilla-b2g18/rev/91fa9bf5e638 Gaia a64f37c3327e4cf2105a16eedaf68727bb2adbde BuildID 20130724070208 Version 18.0

John Hammink

Comment 52

•

11 years ago

It seems that, at least there, the bug was still occurring.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 53

•

11 years ago

If you weren't on tip and don't have the patches from comment 49 that's expected.

Andre Graziani (:graziani)

Comment 54

•

11 years ago

Hi Justin, This is Andre from Leo. The bug 811636 is observed in Leo as well. After hours of test, the output channel's queue starts accumulating again. The child associated to that channel no longer exists, and the main process seems to hold both sides of the pipe. So no channel error is dispatched. This is exactly what bug 811636 fixes. I tested that patch in Leo, and the result was positive: no messages accumulating in the output queue. So I am putting the current issue as dependent of that, since that causes the ContentParent to leak.

leo.bugzilla.gecko

Updated

•

11 years ago

Depends on: 811636

jeremy.kim.leo [:jeremykimleo]

Comment 55

•

11 years ago

Attached file memory report before testing. (deleted) — Details

this version applied more patchs. - Bug 897701, Bug 896825, Bug 896755 and this is log of before testing after boot up. memory-report with minimize dmd gc/cc log.

Flags: needinfo?(jeremy.kim.leo)

jeremy.kim.leo [:jeremykimleo]

Comment 56

•

11 years ago

Attached file memory report after testing for 3 hour. (deleted) — Details

and this is log of after testing for 3 hour memory-report with minimize dmd gc/cc log.

jeremy.kim.leo [:jeremykimleo]

Comment 57

•

11 years ago

Attached file memory report after testing for 15 hours (deleted) — Details

and this is log of after testing for 15 hours memory-report with minimize dmd gc/cc log. awesome! there are only 3 ContentParent objects. it looks very greats.

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 58

•

11 years ago

That is great news. No leaked ContentParent objects after 15 hours, and RSS has only increased by 1.5 MB in 12 hours. Jeremy if leo is happy with these results I'd like to close this bug as RESOLVED FIXED. But if you want to do more QA testing on these patches we can wait until you are done.

Flags: needinfo?(jeremy.kim.leo)

Justin Lebar (not reading bugmail)

Assignee

Updated

•

11 years ago

Keywords: meta

Andrew McCreight [:mccr8]

Comment 59

•

11 years ago

It is also worth noting that in the main process, from 3 hours to 15 hours, the only real increase in explicit memory (around 0.36mb) is from DMD itself.

Andrew McCreight [:mccr8]

Updated

•

11 years ago

No longer depends on: 896776

leo.bugzilla.gecko

Comment 60

•

11 years ago

sorry for delay because of weekends. first of all, thanks for efforts of all. i agree with Comment 58. it looks clear leaked issue caused by ContentParents. let's change status of this as RESOLVED-FIXED. in additionally, our official QA testing will start on next monday(05-AUG-2013) for a week. it maybe, the official results is possible to take after two weeks.'

Flags: needinfo?(jeremy.kim.leo)

Kyle Huey (Exited; not receiving bugmail, old account, do not use)

Comment 61

•

11 years ago

(In reply to leo.bugzilla.gecko from comment #60) > sorry for delay because of weekends. No problem. Have to take a break sometime ;-) > first of all, thanks for efforts of all. > i agree with Comment 58. it looks clear leaked issue caused by > ContentParents. > let's change status of this as RESOLVED-FIXED. Great! > in additionally, our official QA testing will start on next > monday(05-AUG-2013) for a week. > it maybe, the official results is possible to take after two weeks.' Looking forward to those results.

Status: NEW → RESOLVED

Closed: 11 years ago

Resolution: --- → FIXED

leo.bugzilla.gecko

Comment 62

•

11 years ago

Kyle, i have some question. currently, i found an abnormal behavior. after 3~4 days testing this, b2g has some delay when i do something on target. (for examples, delayed launch application / delayed response as my action) i checked 'adb shell top -m 10' ---------------------------------------------------------------------------------------------- User 88%, System 11%, IOW 0%, IRQ 0% User 275 + Nice 0 + Sys 35 + Idle 0 + IOW 0 + IRQ 0 + SIRQ 0 = 310 PID PR CPU% S #THR VSS RSS PCY UID Name 140 0 92% R 57 240296K 82372K fg root /system/b2g/b2g 163 0 5% S 6 5908K 436K fg root /system/bin/sensord 9099 0 1% R 1 1060K 452K fg root top 79 0 0% S 1 0K 0K fg root mmcqd/0 8909 0 0% S 1 0K 0K fg root kworker/0:2 8 0 0% S 1 0K 0K fg root suspend 9 0 0% S 1 0K 0K fg root sync_supers 10 0 0% S 1 0K 0K fg root bdi-default 11 0 0% S 1 0K 0K fg root kblockd 12 0 0% S 1 0K 0K fg root khubd ---------------------------------------------------------------------------------------------- sometimes, b2g process has 92% of CPU per 20 seconds. so i guess that b2g is enabled some log to gather memory-report/dmd/gc/cc log. is it possible to cause some delays? (after 3days, cc log is 14mbytes, gc log is 20mbytes) (if it is right, how i can disable to log data?) if i builds user-mode, is it disabled logging? (currently, i'm used userdebug-mode.)

Justin Lebar (not reading bugmail)

Assignee

Comment 63

•

11 years ago

There is no way to disable logging, but having logging enabled causes very little overhead, because Firefox has to maintain the data we use to generate the logs, even if we're not using the logs. We have this sort of logging enabled on all desktop Firefox builds. What you are seeing is very unlikely related to gc/cc/about:memory logs.

Justin Lebar (not reading bugmail)

Assignee

Comment 64

•

11 years ago

You should use a profiler to see why Firefox is spinning. I'm not sure how a userdebug-mode build is made, but if you have debug symbols, you could attach GDB, hit ctrl-c, and observe the backtrace. Do this a bunch of times and you might understand why the build is spinning. It might also be worth trying to use SPS, but that will probably only be helpful if we're spending time running JavaScript. Also I'm not sure you can use SPS without first rebooting the device, which isn't what you want to do. https://developer.mozilla.org/en-US/docs/Performance/Profiling_with_the_Built-in_Profiler#Profiling_Boot_to_Gecko_%28with_a_real_device%29 This issue is probably worth a separate bug; I can help you make sure you have the right people cc'ed to help you with this issue.

leo.bugzilla.gecko

Comment 65

•

11 years ago

ok. thanks Justin. After attach GDB, When b2g has very high cpu, i hits ctrl-c, at this point, almost, b2g is doing to collect cycle memory. so, i doubt cc logging has a overhead. and also, when i reboots to enable profile, because this will be cleared, so i can't do this yet. i understood, i will file a bug as new after more investigation.

Justin Lebar (not reading bugmail)

Assignee

Comment 66

•

11 years ago

Ah, long cycle collection pauses! That is interesting. Please cc me, :mccr8, and :smaug on the bug. Please also attach the gc and cc logs.

Ryan VanderMeulen [:RyanVM]

Comment 67

•

11 years ago

OK to mark this as status-b2g18:fixed? If so, please do :)

Flags: needinfo?(justin.lebar+bug)

Andrew McCreight [:mccr8]

Updated

•

11 years ago

status-b2g18: --- → fixed

Flags: needinfo?(justin.lebar+bug)

Andrew McCreight [:mccr8]

Updated

•

11 years ago

Depends on: 739681

Part 1: Don't leak BrowserElementParent due to an event listener on the window which contains it. 11 years ago Justin Lebar (not reading bugmail) (deleted), patch		Details \| Diff \| Splinter Review
cc edges from comment 6 11 years ago Justin Lebar (not reading bugmail) (deleted), application/x-xz		Details
gc edges from comment 6 11 years ago Justin Lebar (not reading bugmail) (deleted), application/x-xz		Details
Memory Reports And DMD After 16 hour 11 years ago jeremy.kim.leo [:jeremykimleo] (deleted), application/x-compressed-tar		Details
Memory Reports And DMD And GC/CC log After 33 hour 11 years ago jeremy.kim.leo [:jeremykimleo] (deleted), application/x-compressed-tar		Details
memory report before testing. 11 years ago jeremy.kim.leo [:jeremykimleo] (deleted), application/x-compressed-tar		Details
memory report after testing for 14 hour. 11 years ago jeremy.kim.leo [:jeremykimleo] (deleted), application/x-compressed-tar		Details
memory reports (get_about_memory.py --minimize output) from unagi after 7 STRs (as outlined in comment 31) 11 years ago John Hammink (deleted), application/zip		Details
memory report before testing. 11 years ago jeremy.kim.leo [:jeremykimleo] (deleted), application/x-compressed-tar		Details
memory report after testing for 3 hour. 11 years ago jeremy.kim.leo [:jeremykimleo] (deleted), application/x-compressed-tar		Details
memory report after testing for 15 hours 11 years ago jeremy.kim.leo [:jeremykimleo] (deleted), application/x-compressed-tar		Details