Closed Bug 957109 Opened 11 years ago Closed 10 years ago

Intermittent TEST-UNEXPECTED-FAIL | leakcheck | 384139 bytes leaked (AsyncLatencyLogger, AsyncStatement, AtomImpl, BodyRule, CacheObserver, ...)

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla33
Tracking Status
firefox30 --- wontfix
firefox31 --- disabled
firefox32 --- fixed
firefox33 --- fixed
firefox-esr24 --- unaffected
b2g-v1.3 --- unaffected
b2g-v1.3T --- unaffected
b2g-v1.4 --- disabled
b2g-v2.0 --- fixed
b2g-v2.1 --- fixed

People

(Reporter: cbook, Assigned: wchen)

References

()

Details

(Keywords: intermittent-failure, memory-leak)

Attachments

(4 files, 1 obsolete file)

WINNT 6.2 mozilla-inbound debug test mochitest-4 on 2014-01-07 01:46:27 PST for push 5734ebc705fd slave: t-w864-ix-105 https://tbpl.mozilla.org/php/getParsedLog.php?id=32627456&tree=Mozilla-Inbound TEST-UNEXPECTED-FAIL | leakcheck | 384139 bytes leaked (AsyncLatencyLogger, AsyncStatement, AtomImpl, BodyRule, CacheObserver, ...)
Keywords: mlk
I guess it isn't a good sign that the two instances of this in the last three months are on my try push.
ICC is hitting this 8 out of 21 M4 L64 debug runs on try. :(
Blocks: 911246
I don't know how useful this is, but this is what is holding alive the document: 0x7f2fd9ace000 [nsDocument normal (xhtml) http://mochi.test:8888/tests/dom/tests/mochitest/webcomponents/test_nested_content_element.html] Root 0x7f2fd9ace000 is a ref counted object with 4 unknown edge(s). known edges: 0x7f2fd4b8e3e0 [nsNodeInfoManager] --[mDocument]--> 0x7f2fd9ace000 At least that points me to a test.
Running that directory results in this: 0:11.18 [23900] ###!!! ABORT: You can't dereference a NULL nsCOMPtr with operator->().: 'mRawPtr != 0', file ../../dist/include/nsCOMPtr.h, line 870 0:19.39 nsXBLPrototypeBinding::FlushSkinSheets() [/home/amccreight/mc/dom/xbl/nsXBLPrototypeBinding.cpp:249] 0:19.39 mozilla::dom::ShadowRoot::Restyle() [/home/amccreight/mc/content/base/src/ShadowRoot.cpp:122] 0:19.39 nsStyleLinkElement::DoUpdateStyleSheet(nsIDocument*, mozilla::dom::ShadowRoot*, nsICSSLoaderObserver*, bool*, bool*, bool) [/home/amccreight/mc/content/base/src/nsStyleLinkElement.cpp:308] 0:19.39 nsStyleLinkElement::UpdateStyleSheetInternal(nsIDocument*, mozilla::dom::ShadowRoot*, bool) [/home/amccreight/mc/content/base/src/nsStyleLinkElement.cpp:190] 0:19.39 mozilla::dom::HTMLStyleElement::UnbindFromTree(bool, bool) [/home/amccreight/mc/content/html/content/src/HTMLStyleElement.cpp:163] 0:19.39 mozilla::dom::FragmentOrElement::cycleCollection::Unlink(void*) [/home/amccreight/mc/content/base/src/FragmentOrElement.cpp:1277] 0:19.39 mozilla::dom::ShadowRoot::cycleCollection::Unlink(void*) [/tmp/../../../dist/include/nsCOMPtr.h:851] 0:19.39 nsCycleCollector::CollectWhite() [/home/amccreight/mc/xpcom/base/nsCycleCollector.cpp:2967] 0:19.39 nsCycleCollector::Collect(ccType, js::SliceBudget&, nsICycleCollectorListener*) [/home/amccreight/mc/xpcom/base/nsCycleCollector.cpp:3269] 0:19.39 nsCycleCollector::ShutdownCollect() [/home/amccreight/mc/xpcom/base/nsCycleCollector.cpp:3211] 0:19.39 nsCycleCollector_shutdown() [/home/amccreight/mc/obj-dbg/xpcom/base/../../dist/include/nsAutoPtr.h:879] Maybe that is related. Maybe not.
Note that comment 15 happened on trunk, so we are still (very) rarely hitting this even without ICC. I tried disabling the CC optimization in nsXBLDocumentInfo's Traverse method, and it looks like it reduces the frequency of this with ICC from about 1/3 of the time to 1/6 of the time. https://hg.mozilla.org/try/rev/d0758d71319e I currently have two theories. One is that document marking is interacting badly with ICC in some way. I admit I haven't thought about what might happen in great detail, but it does seem odd this would only show up in one place. To test this, I've tried globally disabling all marking CC optimizations, to see what would happen there. https://tbpl.mozilla.org/?tree=Try&rev=1dd6a4601602 In support of this theory, disabling some of it seems to have halved the frequency. The problem with this theory is that it persists even through shutdown, when marking is already being ignored. The other theory is that XBL teardown is weird in some kind of way such that we end up not properly breaking it down (maybe incomplete unlinking). In support of this, I noticed that when I locally reproduced the problem, we end up CCing 5 times. The last 4 times, there are the same 6 "FragmentOrElement (xhtml) span" that get marked as garbage. Clearly, they don't actually get freed. That suggests there's some incomplete unlinking. I'm going to investigate the ownership model for this XBL stuff.
Disabling CC optimizations by always returning false for InGeneration did not help. https://hg.mozilla.org/try/rev/d98b969ad385
Bug 508725 landed on the 1/9, which is almost a perfect explanation for where this came from, except that this first happened on 1/7...
Attached file leaky log (obsolete) (deleted) —
Here's a log from where I locally managed to reproduce this. What I did was I tracked every node and document that is every created. Once I run the first two shutdown CCs, I add any that are still alive back to the graph. When I was only adding documents to the graph, it was still leaking, but adding both documents and nodes seems to fix the leak in the extra rounds of CC I add. In the doc-only run, there were two missing edges to the document. It appears like these are both nsXBLDocumentInfo, which are being held alive by some kind of XBL-y stuff. As I said, with all nodes and documents added to the graph, there's no leak, so it isn't immediately obvious what is going wrong. I'll try logging all CCs during the mochitest run and see if that turns up anything.
Assignee: nobody → continuation
Component: General → XBL
Component: XBL → DOM
Attachment #8402119 - Attachment is obsolete: true
Attachment #8402155 - Attachment description: CC log from the last time the ShadowRoot appears → CC log 35, the last time the ShadowRoot appears
Attachment #8402156 - Attachment description: CC log from post-shutdown that has all the edges → CC log 40, from post-shutdown that has all the edges
We're leaking an nsDocument 0x7f37affa0800 for http://mochi.test:8888/tests/dom/tests/mochitest/webcomponents/test_shadowroot3.html through shutdown. Log 38 is the last time it appears, and the document can't be freed because there are two missing edges. From log 40, we can see that these two missing edges are nsXBLDocumentInfos, which last showed up in log 35. Those in turn are being held alive by a nsXBLBinding. One of them at least is being held alive by a document fragment: 0x7f37b5640e10 [rc=3] FragmentOrElement ([none]) #document-fragment http://mochi.test:8888/tests/dom/tests/mochitest/webcomponents/test_shadowroot3.html > 0x7f37b7317ca0 mNodeInfo > 0x7f37b5640e10 mSlots->mContainingShadow > 0x7f37b8c991a0 mPoolHost > 0x7f37b5b0ab00 mAssociatedBinding From looking at it, I think it is actually a ShadowRoot (CC logging doesn't show the proper names for subclasses). This in turn is in a little cycle with a div 0x7f37b8c991a0. But if you look in log 35, this ShadowRoot is there, but it doesn't have any outgoing edges, so we don't see the edge to the div, and none of this stuff is torn down. The lack of outgoing edges is usually a sign that nsINode::Traverse returned false, so we decided the node was alive and skipped the rest of the traverse. The first thing that nsINode::Traverse does is this: nsIDocument *currentDoc = tmp->GetCurrentDoc(); if (currentDoc && nsCCUncollectableMarker::InGeneration(currentDoc->GetMarkedCCGeneration())) { return false; } The constructor of ShadowRoot does this: // ShadowRoot isn't really in the document but it behaves like it is. SetInDocument(); So, because of the SetInDocument(), GetCurrentDoc() will return OwnerDoc(), and the document is currently being displayed in log 35, so I think we end up not traversing it. But in this case, that optimization is bad, because it seems like the document does not actually hold a strong reference to the ShadowRoot. Does my analysis seem right? I'm not sure how this should be fixed. Somehow the interaction of ShadowRoot and CC optimizations needs to be adjusted.
Oh, we can't have SetInDocument(); being used that way. We rely on IsInDoc() to be true everywhere. I would be surprised if that SetInDocument() doesn't lead to crashes elsewhere.
FWIW, this appears to be the case of nsINode::Traverse that triggers in the case where we end up leaking: nsIDocument *currentDoc = tmp->GetCurrentDoc(); [...] if (!tmp->UnoptimizableCCNode()) { // If we're in a black document, return early. if ((currentDoc && currentDoc->IsBlack())) { return false; } I'm trying to put together a test case that will leak without ICC, but I haven't managed yet.
Unsurprisingly, adding NODE_IS_IN_SHADOW_TREE to UnoptimizableCCNode() seems to fix the leak, though that's probably more heavy-handed than we want.
We could do that for now, I think. IsInDoc() handling of ShadowRoot need to be fixed, but it isn't clear to me how it should work. Currently it certainly is wrong since document.createElement("div").createShadowRoot() creates a thing which is in document, although the shadow host isn't in document. (Shadow DOM isn't enabled by default.)
With this patch, and ICC enabled, something that looks like bug 992571 is permaorange on WinXP and happening at least sometimes on L64 debug. I don't know if this patch is at fault or that is. I guess I'll try a push without ICC.
Depends on: 992571
Hmm. A try run with my patch here has an instance of this leak on XP, so I guess it isn't a complete fix. https://tbpl.mozilla.org/?tree=Try&rev=a68cdbaea22e
There's some more weirdness with this XBL kind of ownership that might be a problem. 1. nsXBLBinding's ctor does an ADDREF of mPrototypeBinding->XBLDocumentInfo(), and the dtor does a RELEASE. That seems weird, and we should hopefully be able to turn it into a regular CCed field of nsXBLBinding. There's a comment in nsXBLBinding's Unlink that says "Probably can't unlink mPrototypeBinding->XBLDocumentInfo(), because mPrototypeBinding is weak." That comment is bogus, because we rely on mPrototypeBinding being around for the dtor, so we could possibly unlink it. I locally was testing out a patch that converts it to a regular CCed field, and I was getting some errors, but those might be related to another patch. 2. The binding manager holds references to FragmentOrElement's. This is traversed, in the traverse of FoE, by tmp->OwnerDoc()->BindingManager()->Traverse(tmp, cc);, but there's no corresponding unlink for this edge. In one leaking graph for this bug, I saw some elements repeatedly be identified as garbage by the CC, but they were being kept alive by the binding manager. I didn't think that was the root cause of the leak, but I could be wrong. My local patch was causing some errors, but I think I have a version that is less bad.
Depends on: 996151
Depends on: 996346
Depends on: 1000974
OS: Windows 8 → All
Hardware: x86 → All
Bug 992521 and bug 996346 don't appear to be necessary to fix this intermittent leak, at least judging by this WinXP try run with ICC enabled: https://tbpl.mozilla.org/?tree=Try&rev=98d6571b6f4c
No longer depends on: 992521, 996346
Hrmph. That happened on trunk after bug 996151 merged, so I guess there's still something wrong.
The last 3 times it happened were all on 10.6. Maybe I should try a bunch of ICC retriggers on M4 for 10.6.
At some point, somebody should figure out what these residual leaks are.
Assignee: continuation → nobody
This has only happened once on trunk in the last 6 days, so I think it was fixed by bug 1025738. I'll give it another week before I declare this totally fixed.
Assignee: nobody → wchen
Status: NEW → RESOLVED
Closed: 10 years ago
Depends on: 1025738
Resolution: --- → FIXED
Target Milestone: --- → mozilla33
ShadowRoot tests disabled on beta/b2g30 per discussion in bug 1025738. We'll backport the leak fixes to Aurora and leave them enabled there. https://hg.mozilla.org/releases/mozilla-beta/rev/a86f9e861984 https://hg.mozilla.org/releases/mozilla-b2g30_v1_4/rev/6bda64f1ec0a
bug 1033464 may or may not be the same leak
Component: DOM → DOM: Core & HTML
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: