Closed Bug 241518 Opened 21 years ago Closed 19 years ago

calling addEventListener with a closure holding a content node leaks the document

Categories

(Core :: DOM: Events, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla1.9alpha1

People

(Reporter: spaceshaker.geo, Assigned: dbaron)

References

Details

(Keywords: fixed1.8.1, memory-leak, testcase, Whiteboard: [patch])

Attachments

(5 files, 16 obsolete files)

(deleted), text/html; charset=UTF-8
Details
(deleted), text/html; charset=UTF-8
Details
(deleted), patch
Details | Diff | Splinter Review
(deleted), patch
bzbarsky
: review+
Details | Diff | Splinter Review
(deleted), patch
bzbarsky
: review+
bzbarsky
: superreview+
Details | Diff | Splinter Review
User-Agent: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040115 Build Identifier: Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.6) Gecko/20040115 Adding an event listener to a DOM element using addEventListener() without ever calling removeEventListener() before the page is unloaded causes a memory leak in nsGenericElement. The memory for the event handlers is _never_ being cleaned up and is being indicated by an assertion at nsGenericElement.cpp:820 (Moz1.6) at shutdown. Reproducible: Always Steps to Reproduce: 1. Make sure mozilla is compiled with debug symbols. 2. Call addEventListener() on a DOM object with a non-empty anonymous function as the handler. 3. Load the page in the browser. 4. Close mozilla. Notice the assertion Actual Results: ###!!! ASSERTION: nsGenericElement's event listener manager hash not empty at shutdown!: 'sEventListenerManagersHash.entryCount == 0', file nsGenericElement.cpp, line 820
Use the following page as a test case: <html> <head> <script type="text/javascript"> function bodyLoad() { var test = document.getElementById("test"); var func = function() { alert('Test'); } test.addEventListener('click', func, false); //Uncomment the below line if you want to see the problem disappear test.removeEventListener('click', func, false); } </script> </head> <body onload="bodyLoad();"> <input type="Button" id="test"/> </body> </html>
If you use GDB and set a break point at nsGenericElement.cpp at line 820 (or so) and you run the above script you find that entryCount is equal to 1. If you make a small patch (see below) to the test page you notice that for each call to addEventListener() entryCount is incremented. for(var i = 0; i < 1000; ++i) { var node = document.createElement('input'); node.type = 'button'; node.addEventListener('click', function() { alert('test'); }, false); document.body.appendChild(node); } You'll also note that once the memory is created for event listeners it is not released until shutdown. Several of my collegues have successfully reproduced this on a number of version of Mozilla (and IE for that matter). Our javascript code base (over 1 meg of JS) would leak memory until until "Out of memory" errors (I've seen it over 150MB). We added a patch to our code base to use obj.onclick instead of obj.addEventListener() and this successfully solved the leak (gdb reports entryCount == 0).
Attached file reporter's testcase (deleted) —
Well, at least one of the problematic cycles is document -> content -> event listener manager -> event listener and then the following cycle back via JS: 08d2ee30 object 0x8fd2520 HTMLDocument via nsXPCWrappedJS::mJSObj[nsIDOMEventListener,0x8fee3c0,0x8d2ffc0](Function).__parent__(Call).test(HTMLInputElement).__parent__(HTMLDocument).
Status: UNCONFIRMED → NEW
Ever confirmed: true
Oh, I see. The problem is the closure. Where the "line to be uncommented" is, another thing that would fix the leak is: test = null; This is because the test variable is in the scope chain for the anonymous function (i.e., the function is a closure containing both code and variable scope), so the function keeps the element alive, and keeping the element's JS context alive keeps the document alive. This makes me think we really do need to unhook event listeners in SetDocument(nsnull).
(In reply to comment #5) > Oh, I see. The problem is the closure. Where the "line to be uncommented" is, > another thing that would fix the leak is: > > test = null; > > > This is because the test variable is in the scope chain for the anonymous > function (i.e., the function is a closure containing both code and variable > scope), so the function keeps the element alive, and keeping the element's JS > context alive keeps the document alive. > > This makes me think we really do need to unhook event listeners in > SetDocument(nsnull). I will test the above to verify.
> This makes me think we really do need to unhook event listeners in > SetDocument(nsnull). Unfortunately, nodes not in a document can still be targeted with events per the DOM spec, as I understand it...
Per DOM you can't have nodes not in a document. So if ownerDocument has been set to null, we're already past the point of non-compliance anyway.
(In reply to comment #8) > Per DOM you can't have nodes not in a document. So if ownerDocument has been set > to null, we're already past the point of non-compliance anyway. SetDocument(null) doesn't set the ownerDocument to null. By "nodes not in a document" bz probably means nodes having document and parent set to null but ownerDocument not null.
(In reply to comment #9) > SetDocument(null) doesn't set the ownerDocument to null. By "nodes not in a > document" bz probably means nodes having document and parent set to null but > ownerDocument not null. Sigh, I meant to say "nodes having parentNode equal to null but ownerDocument not null".
Yes, I'm talking about nodes whose parentNode chain does not include a document node. Put another way: var foo = new Image(); foo.onerror = bar; foo.onload = baz; foo.src = "whatever"; One of those two event handlers should fire.
(In reply to comment #11) > Yes, I'm talking about nodes whose parentNode chain does not include a document > node. Put another way: > > var foo = new Image(); > foo.onerror = bar; > foo.onload = baz; > foo.src = "whatever"; > > One of those two event handlers should fire. I may be stating the obvious here but as per my initial comments, setting the event handler as a property (foo.onload) does not cause a leak. It only occurs when using foo.addEventListener().
Yes, but the point is that event listeners should fire in general on detached nodes, which means we probably can't clear out the hash when nodes are detached...
Well, the way to fix this that would provide the broadest API would be to add additional C++ code to participate in the GC's mark phase, roughly as follows: * all of the roots (XPCWrappedNative or XPCWrappedJS, and probably others) established by things "in" the document wouldn't be true GC roots, but would be maintained by the document (this requires hacking XPConnect) * if any of the JS objects "in" the document is marked during the mark phase OR if the document is currently being displayed, then all the documents roots are marked. (The case of the document currently being displayed could perhaps be handled at the beginning of the GC to save a bit of work.) (I put "in" in quotes because I've left it undefined, and defining it might be a bit of work.) I don't see any less complicated approach with which we don't either (a) have to disconnect things at a given time or (b) leak. Am I missing something?
Keywords: mlk
Summary: Memory Leak in nsGenericElement.cpp caused by addEventListener() → calling addEventListener with a closure holding a content node leaks the document
We could make document participate in the mark phase, for sure. Isn't it sufficient to know that any wrapper is "in" the document by checking its XPConnect scope? Cc'ing dbradley. /be
Keywords: testcase
Perhaps I'm missing something, or misunderstanding something, but why should there be any worry about raising events on detached elements? If there is still a reference (or chain of references) to a detached element, it can't be GC'd, so its listeners should remain in place. If there is no (chain of) reference to a detached node, how can an event be raised on it? The element is not in the document, so the user can't click on it, keypress on it, or anything else, and general document events can't traverse the document to reach the element. Also, there are no references to the element, so an event cannot be foisted on the element by any JavaScript code because the code can't reach the element to do the foisting. Such an element should be GC'd along with all of its listeners, no? (Well, a given function may be attached to more than one element, so maybe some of the listeners will survive, but their ref-count should drop by one, or whatever.) I'm not sure I've clearly explained myself, but I think the ideas that I meant to express above would satisfy this bug, if implemented. The memory footprint of the project Curtis is mentioning grows most quickly when developers are continually refreshing the browser to test changes to the code (I work with Curtis, so I saw all the OOM errors). If all the nodes (detached or otherwise) were GC'd, along with their listeners, on document unload, most of this problem would go away, or at least be mitigated. If detached, unreachable nodes were GC'd along with their listeners, I think everything about this bug would be solved. (Of course, the likelyhood that IE will suddenly fix its matching bug is pretty close to nil, so we'll still have to use our workaround, but whatever.) Does it break the DOM spec to GC detached, unreachable DOM nodes? Ian
Depends on: 283129
Blocks: 71623, 253688
FWIW, I think this is hitting http://maps.google.com/ in a big way.
Oh, Re comment 16: the issue that makes this hard is that we do garbage collection on the wrapper objects that reflect the DOM nodes into JavaScript, but we don't garbage-collect the nodes themselves -- they're reference counted. A cycle that's partly reference-counted and partly garbage collected is just as bad as one that's fully reference-counted. See http://www.mozilla.org/scriptable/avoiding-leaks.html (and also bug 283129 comment 0, but please don't add unnecessary noise to that bug -- or, for that matter, this one).
Attached patch work in progress (obsolete) (deleted) — Splinter Review
So I'm thinking that in the current world where we disconnect the global object at a set time (likely to change with bryner's back/forward changes, I think), there's no harm in disconnecting event listeners too. This needs a lot more testing, and I need to figure out why the assertion sometimes fires.
Assignee: events → dbaron
Status: NEW → ASSIGNED
Whiteboard: [patch]
Target Milestone: --- → mozilla1.8beta2
Attached patch work in progress (obsolete) (deleted) — Splinter Review
Removal and reinsertion is a real problem, though. I wish we had a "document going away" instead of SetDocument.
Attachment #178015 - Attachment is obsolete: true
And this doesn't fix http://maps.google.com/ . I'm guessing some of the elements there were never in the document...
I just discussed this a bit with jst. I think the thing to do here is, once bug 283129 lands: * make nsEventListenerManager QI added nsIDOMEventListener* objects to nsIXPConnectWrappedJS , and for those that implement it, QI them to nsISupportsWeakReference. If they support nsISupportsWeakReference, store the weak reference instead and store the nsIXPConnectJSObjectHolder (ancestor of nsIXPConnectWrappedJS) in the preserved wrappers table created in bug 283129. * make the table of preserved wrappers handle things other than DOM nodes -- in particular, the API needs to change nsIXPConnectWrappedNative to nsIXPConnectJSObjectHolder, it needs to allow more than one wrapper per node (probably just make the wrapper the key instead of the DOM node and change ReleaseWrapper appropriately) * I need to find a way to call ReleaseWrapper so the preserved wrapper table doesn't get bloated. Not sure about this yet.
And it would be good to do this in a way that's reusable by other consumers of interfaces that are marked [function] in the IDL.
Attached patch work in progress (obsolete) (deleted) — Splinter Review
Attachment #178016 - Attachment is obsolete: true
Attached patch work in progress (obsolete) (deleted) — Splinter Review
Attachment #178914 - Attachment is obsolete: true
Attached patch patch (obsolete) (deleted) — Splinter Review
Attachment #178978 - Attachment is obsolete: true
Comment on attachment 179236 [details] [diff] [review] patch This fixes both the testcase in this bug and the http://maps.google.com/ leak, although it may be a little overkill as well.
Attachment #179236 - Attachment description: work in progress → patch
Comment on attachment 179236 [details] [diff] [review] patch Oh, and this patch doesn't technically compile since it requires tons of REQUIRES changes. The new header should probably be in content simply to avoid that problem, but I wanted to discuss that...
Attached patch patch (obsolete) (deleted) — Splinter Review
This one compiles since the new header is in content. That may not be the ideal location, but it reduces the need for REQUIRES changes. Requesting review to at least get comments on this approach, if not actual review.
Attachment #179236 - Attachment is obsolete: true
Attachment #179404 - Flags: review?(jst)
Blocks: 286619
This one ought to work in IE.
Attachment #181808 - Attachment is obsolete: true
Depends on: 292027
This version actually works on WinIE (setTimeout syntax).
Attachment #181812 - Attachment is obsolete: true
Depends on: 292146
Attachment #182012 - Attachment description: testcase for recent (since 1.7) behavior change related to crash caused by this patch → testcase for recent (since 1.7) behavior change related to crash caused by this patch (filed as bug 292146)
Attached patch patch (obsolete) (deleted) — Splinter Review
Merged to trunk.
Attachment #179404 - Attachment is obsolete: true
Blocks: 295269
Attached patch patch (obsolete) (deleted) — Splinter Review
Merged to trunk, and with null check to work around crash.
Attachment #182723 - Attachment is obsolete: true
Attached patch patch (obsolete) (deleted) — Splinter Review
Merged to trunk.
Attachment #187912 - Attachment is obsolete: true
Attached patch patch (no longer works) (obsolete) (deleted) — Splinter Review
After merging with the splitwindow changes, this crashes on startup. I've removed these changes from my primary tree after running with them for over three months.
I'd also note that I think I probably need to fix the XPConnect change so that it doesn't allow all functions to support weak references, only certain ones. Not sure how, though.
(In reply to comment #36) > After merging with the splitwindow changes, this crashes on startup. I've > removed these changes from my primary tree after running with them for over > three months. That actually probably wasn't from the merge, but was from the other nsGlobalWindow::QueryInterface change that slipped into that patch.
The reason I'm crashing is that my marked JS function holder needs to get a finalize callback to call ReleaseWrapper -- it's leaving bad wrappers in the preserved wrapper table.
Attachment #179404 - Flags: review?(jst)
Blocks: 305536
No longer blocks: 253688
Blocks: 253688
Attached patch patch (obsolete) (deleted) — Splinter Review
Attachment #190595 - Attachment is obsolete: true
Attachment #191158 - Attachment is obsolete: true
Attached patch patch (doesn't run) (obsolete) (deleted) — Splinter Review
I think this is closer to what I want, but it doesn't run because the refcounting of XPConnect wrappers caused by HolderToWrappedJS pokes rt->gcPoke (by adding and removing a root) in the middle of every GC.
Attachment #199489 - Attachment is obsolete: true
Attached patch patch (obsolete) (deleted) — Splinter Review
This patch works. I decided to use the simplest ugly hack I could think of, which was an extra interface with dont-refcount semantics.
Attachment #202185 - Attachment is obsolete: true
Flags: blocking1.8.1?
Attached patch patch (obsolete) (deleted) — Splinter Review
This has a few additional changes based on discussion with jst and mrbkap, and some additional comments.
Attachment #203057 - Attachment is obsolete: true
Attachment #203072 - Flags: superreview?(jst)
Attachment #203072 - Flags: review?(mrbkap)
Comment on attachment 203072 [details] [diff] [review] patch r=mrbkap
Attachment #203072 - Flags: review?(mrbkap) → review+
Comment on attachment 203072 [details] [diff] [review] patch Some nits I found: Index: js/src/xpconnect/src/xpcwrappedjs.cpp + // This interface is a hack that says "don't AddRef me". + if(aIID.Equals(NS_GET_IID(nsWeakRefToIXPConnectWrappedJS))) { nit: { gets its own line in xpconnect. + *aInstancePtr = NS_STATIC_CAST(nsWeakRefToIXPConnectWrappedJS*, this); + return NS_OK; Index: dom/src/base/nsDOMClassInfo.cpp WrapperSCCEntry *entry = NS_STATIC_CAST(WrapperSCCEntry*, hdr); - new (entry) WrapperSCCEntry(NS_STATIC_CAST(nsIDOMNode*, + new (entry) WrapperSCCEntry(NS_STATIC_CAST(nsIDOMGCParticipant*, NS_CONST_CAST(void*, key))); nit: these don't line up. Index: dom/src/base/nsDOMClassInfo.h + /** + * Easier way to call the above just for DOM nodes (and better, since + * we get the performance benefits of having the same identity function. nit: missing ). Index: dom/src/base/nsGlobalWindow.cpp // Clear our mutation bitfield. + // XXX We don't always clear mutation listeners (aRemoveEventListeners)! I'll fix this soon. Index: dom/src/base/nsWindowRoot.cpp + mListenerManager->SetListenerTarget( + NS_STATIC_CAST(nsIDOMEventReceiver*,this)); Uber-nit: space after the comma, please.
Attached patch patch (deleted) — Splinter Review
This is merged to the trunk, and addresses mrbkap's comments.
Attachment #203072 - Attachment is obsolete: true
Attachment #203862 - Flags: superreview?(jst)
Attachment #203072 - Flags: superreview?(jst)
Comment on attachment 203862 [details] [diff] [review] patch - In js/src/xpconnect/idl/nsIXPConnect.idl: + * This interface is a complete hack. It is used by the DOM code to get + * call QueryReferent on a weak reference to a wrapped JS object without remove the "get" - In nsEventReceiverSH::Mark(): + nsISupports *native = wrapper->Native(); + nsCOMPtr<nsIDOMGCParticipant> participant(do_QueryInterface(native)); Use do_QueryWrappedNative() here? - In nsGlobalWindow::SetNewDocument(): // Clear our mutation bitfield. + // XXX We don't always clear mutation listeners (aRemoveEventListeners)! mMutationBits = 0; The aRemoveEventListeners argument no longer exists, and this does the right thing either way already (mutation bits are really only kept on the inner window, so clearing it on the outer does no harm). So remove that comment. - In nsJSUtils.h: +/** + * XXX WRITE ME + */ Yes, please do :) The rest looks good to me. sr=jst
Attachment #203862 - Flags: superreview?(jst) → superreview+
Great patch, looking forward to it landing. Word on the street is that closures as XMLHttpRequest handlers, probably again only if they entrain a DOM element in their environment, cause major leaks. So we'd want to use nsMarkedJSFunctionHolder in more than just the ELM. /be
(In reply to comment #48) > Great patch, looking forward to it landing. Word on the street is that > closures as XMLHttpRequest handlers, probably again only if they entrain a DOM > element in their environment, cause major leaks. So we'd want to use > nsMarkedJSFunctionHolder in more than just the ELM. I think the XMLHttpRequest leak (is there a bug on that?) would be better solved by nulling out the 4 pointers (3 nsIDOMEventListener, 1 nsIOnReadyStateChangeHandler) at an appropriate time. Presumably (not being at all an expert on XMLHttpRequest) while the HTTP request is being made there's no requirement that something hold onto the JS object for the XMLHttpRequest in order to get notifications when the XML loads; likewise there's a concrete point (OnStopRequest?) at which we know that the load either succeeded or failed and we'll never again notify the listeners.
Fix checked in to trunk.
Status: ASSIGNED → RESOLVED
Closed: 19 years ago
Resolution: --- → FIXED
Target Milestone: mozilla1.8beta2 → mozilla1.9alpha
Hrm. Well, we do that already, actually, but that leaves bug 206520 plus perhaps a similar issue with successful multipart requests.
*** Bug 317818 has been marked as a duplicate of this bug. ***
Should nsGenericElement and nsGenericDOMDataNode append their ownerDocument to the reachable list?
That's a little scary to do. Wouldn't that cause leaking an element to leak the entire document?
(In reply to comment #53) > Should nsGenericElement and nsGenericDOMDataNode append their ownerDocument to > the reachable list? Yes, they should. (In reply to comment #54) > That's a little scary to do. Wouldn't that cause leaking an element to leak the > entire document? No, and it only makes a difference for elements that have been removed from the document anyway. (The document is still reachable from them thanks to ownerDocument; they're not reachable from it.)
Not tested yet.
One other thought: for nsIContent impls, both AppendReachableList() and GetSCCIndex() use only nsIContent apis, so we could even declare them non-pure-virtual on nsIContent and implement nsIContent::GetSCCIndex/AppendReachableList somewhere (eg in nsGenericElement.cpp?) instead of having two copies of that code in generic element and generic DOM data node.
we really need an nsIContent.cpp file, or somewhere where we place all the virtual nsIContent functions that have default implementations.
Attachment #205010 - Flags: superreview?(jst)
Attachment #205010 - Flags: review?(bzbarsky)
Comment on attachment 205010 [details] [diff] [review] add ownerDocument to reachable list Looks good to me. It looks like the reachable list must not contain null pointers, right? In that case, nsGlobalWindow::AppendReachableList might need to null-check some things (eg an outer window is not guaranteed to have an inner, and we might end up in a situation where an inner has no outer).
Attachment #205010 - Flags: review?(bzbarsky) → review+
I think I should probably just null-check in nsDOMClassInfo.cpp after the call to AppendReachableList, i.e., - MarkReachablePreservedWrappers(reachable[i], cx, arg); + if (reachable[i]) + MarkReachablePreservedWrappers(reachable[i], cx, arg);
Attached patch null-check (deleted) — Splinter Review
Attachment #205073 - Flags: superreview?(bzbarsky)
Attachment #205073 - Flags: review?(bzbarsky)
> I think the XMLHttpRequest leak (is there a bug on that?) would be better > solved by nulling out the 4 pointers (3 nsIDOMEventListener, 1 > nsIOnReadyStateChangeHandler) at an appropriate time. The "XMLHttpRequest leak" was merely a theory... not yet confirmed, so there is no bug on file (that I know of).
Bug 206520 is on file, and I think there may be additional problems with multipart -- see comment 51.
Comment on attachment 205010 [details] [diff] [review] add ownerDocument to reachable list sr=jst
Attachment #205010 - Flags: superreview?(jst) → superreview+
Comment on attachment 205073 [details] [diff] [review] null-check r+sr=bzbarsky
Attachment #205073 - Flags: superreview?(bzbarsky)
Attachment #205073 - Flags: superreview+
Attachment #205073 - Flags: review?(bzbarsky)
Attachment #205073 - Flags: review+
Both of those patches checked in to trunk.
Depends on: 319293
Depends on: 319642
Blocks: mlk1.8
Flags: blocking1.8.0.1?
Depends on: 320946
Depends on: 321054
Blocks: 206520
Would love this for a 1.8.0.x release, but the still-open regressions are very scary.
Flags: blocking1.8.0.2?
Flags: blocking1.8.0.1?
Flags: blocking1.8.0.1-
Blocks: 323534
Depends on: 323807
*** Bug 323074 has been marked as a duplicate of this bug. ***
Is this (and regression fixes) ever going to be appropriate for an .0.x release, or should we just wait for 1.8.1?
(For what it's worth, I think http://maps.google.com/ has changed their code so they don't trigger this leak anymore.)
[oops, I thought I submitted this before the last comment, which was intended as an aside to this one] Probably not, and perhaps not even for 1.8.1.
> (For what it's worth, I think http://maps.google.com/ has changed their code so > they don't trigger this leak anymore.) Yup, confirmed from my internal sources ;-)
Depends on: 323371, 325279
Depends on: 326646
This is too risky for the 1.8.0 branch, and still has open regressions.
Flags: blocking1.8.0.2? → blocking1.8.0.2-
*** Bug 334005 has been marked as a duplicate of this bug. ***
Is 321054 the only open, known regression remaining on the trunk? This is a highly desireable fix for 1.8.1, I have to say...
Known to me, yes.
Blocks: 336791
*** Bug 337405 has been marked as a duplicate of this bug. ***
Flags: blocking1.8.1? → blocking1.8.1+
Fixed on MOZILLA_1_8_BRANCH by checkin of bug 336791.
Keywords: fixed1.8.1
Depends on: 344699
*** Bug 351101 has been marked as a duplicate of this bug. ***
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: