Closed Bug 374096 Opened 18 years ago Closed 18 years ago

Cycle collector doesn't collect all the cycles it could

Categories

(Core :: XPCOM, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED

People

(Reporter: peterv, Assigned: peterv)

References

Details

Attachments

(1 file, 1 obsolete file)

I've found two problems with the cycle collector that when fixed seem to resolve most of the leaks. The first of the two problems I consider a bug, the second one seems more like an optimization.

1) Once an object has been marked purple and then scanned it will not be marked purple again until it is addref'ed. This is because we store a bit in the refcount to signal whether the object has added itself to the purple buffer. This bit is only cleared from AddRef, but objects can be removed from the purple buffer by the cycle collector itself after they have been scanned. The scenario I'm seeing is:
  * object A is in a cycle, and held through a strong ref by object B and C,
    which are not in the cycle
  * B goes away, causing A to add itself to the purple buffer and set its
    purple bit
  * cycle collector scans A, but can't collect it because it's held by C
  * after the scan, A is removed from the purple buffer without clearing its
    purple bit
  * C goes away, but A will not add itself to the purple buffer because its
    purple bit is still set
  * A is in an uncollectable cycle

The solution is to clear the purple bit from objects that are black after scanning (if white objects aren't collected we're just missing Unlink implementations and we need to solve that instead of trying to recollect).

2) At shutdown we'll release cycles that can drop references to objects that are in other cycles that we don't know about because of unknown edges. In an ideal situation we'll know about all edges, but I don't think we'll ever be in that situation, so I think we should run multiple iterations of cycle collection until nothing is added anymore to the purple buffer. Because all of our Unlink implementations aren't complete, we should limit the number of final collections to avoid an infinite loop. I found 5 final collections to be a good number for now, hopefully we can lower it in the future when we have better Unlink implementations.

I have a patch for both problems, I'll attach it. I did see a crash earlier, but I haven't been able to reproduce it at all, so I hope it's unrelated to this patch.
Attached patch v1 (obsolete) (deleted) — Splinter Review
Attachment #258687 - Flags: review?(graydon)
(In reply to comment #0)
> collection until nothing is added anymore to the purple buffer. Because all of
> our Unlink implementations aren't complete, we should limit the number of final
> collections to avoid an infinite loop. I found 5 final collections to be a good
> number for now, hopefully we can lower it in the future when we have better
> Unlink implementations.

Instead of hard-coding the number 5, could you instead look at the number of objects collected (marked white) rather than the number added to the purple buffer?  If nothing is collected in an iteration, then that iteration won't create new garbage.
Actually, that wouldn't work if the lack of a valid Unlink implementation prevents the cycle from being collected, and we can't detect that.
Comment on attachment 258687 [details] [diff] [review]
v1

Nice catch on the clearing-purple-bit.
Attachment #258687 - Flags: review?(graydon) → review+
What is NS_DECL_CYCLE_COLLECTION_UNMARK_PURPLE_STUB for?  I don't see any uses.

Why don't you need to call Forget on non-C++ objects?  Doesn't that make them stay in the purple buffer forever?
There are no non-C++ objects in the purple buffer.

The stub is for objects that do not really track their purple-ness using the cycle-collection addref/release methods. XPCWrappedNatives are like this, for example.
Attached patch v1.1 (deleted) — Splinter Review
Forgot to include the one-line patch to add NS_DECL_CYCLE_COLLECTION_UNMARK_PURPLE_STUB(nsXPCWrappedJS).

Currently only XPCOM C++ objects are put in the purple buffer. If that changes we can update that code imho, this avoids a bunch of forget calls that do nothing. I could add an assertion that checks if an object exists in the purple buffer if it's not CPLUSPLUS, since we'd need to change the code then.

I do hope we can tune the number of collections on shutdown in the future, maybe using numbers like number of white objects collected. Currently the hard limit is necessary, but we should definitely look into adding debug code that detects missing unlink calls and fix those. I found for example that adding unlink code for mScriptContext in nsXBLDocGlobalObject did away with at least 2 collections on shutdown, I'll file a separate bug on that.
Attachment #258687 - Attachment is obsolete: true
Attachment #258699 - Flags: superreview?
Attachment #258699 - Flags: review+
Attachment #258699 - Flags: superreview? → superreview?(dbaron)
(In reply to comment #6)
> The stub is for objects that do not really track their purple-ness using the
> cycle-collection addref/release methods. XPCWrappedNatives are like this, for
> example.

Actually, XPCWrappedNative uses NS_DECL_CYCLE_COLLECTING_ISUPPORTS, we should probably reconsider that. The only class that currently needs NS_DECL_CYCLE_COLLECTION_UNMARK_PURPLE_STUB is nsXPCWrappedJS.
Comment on attachment 258699 [details] [diff] [review]
v1.1

sr=dbaron
Attachment #258699 - Flags: superreview?(dbaron) → superreview+
I checked this in with a fix in xpcom/glue/nsISupportsImpl.h for the crash I was seeing:

@@ -345,7 +355,8 @@ NS_IMETHODIMP_(nsrefcnt) _class::AddRef(
 {                                                                             \
   NS_PRECONDITION(PRInt32(mRefCnt) >= 0, "illegal refcnt");                   \
   NS_ASSERT_OWNINGTHREAD(_class);                                             \
-  nsrefcnt count = mRefCnt.incr(NS_STATIC_CAST(_basetype *, this));           \
+  nsrefcnt count =                                                            \
+    mRefCnt.incr(NS_CYCLE_COLLECTION_CLASSNAME(_class)::Upcast(this));        \
   NS_LOG_ADDREF(this, count, #_class, sizeof(*this));                         \
   return count;                                                               \
 }
@@ -358,10 +369,11 @@ NS_IMETHODIMP_(nsrefcnt) _class::Release
 {                                                                             \
   NS_PRECONDITION(0 != mRefCnt, "dup release");                               \
   NS_ASSERT_OWNINGTHREAD(_class);                                             \
-  nsrefcnt count = mRefCnt.decr(NS_STATIC_CAST(_basetype *, this));           \
+  nsISupports *base = NS_CYCLE_COLLECTION_CLASSNAME(_class)::Upcast(this);    \
+  nsrefcnt count = mRefCnt.decr(base);                                        \
   NS_LOG_RELEASE(this, count, #_class);                                       \
   if (count == 0) {                                                           \
-    mRefCnt.stabilizeForDeletion(NS_STATIC_CAST(_basetype *, this));          \
+    mRefCnt.stabilizeForDeletion(base);                                       \
     _destroy;                                                                 \
     return 0;                                                                 \
   }                                                                           \

This makes sure that we always use the canonical pointer when adding to or removing from the purple buffer. One example where I used different arguments to the macros is nsDOMAttribute which has NS_DECL_CYCLE_COLLECTION_CLASS_AMBIGUOUS(nsDOMAttribute, nsIAttribute) and NS_IMPL_CYCLE_COLLECTING_ADDREF_AMBIGUOUS(nsDOMAttribute, nsIDOMAttr). To avoid problems like that the patch does away with the _basetype parameter in NS_IMPL_CYCLE_COLLECTING_ADDREF_AMBIGUOUS and NS_IMPL_CYCLE_COLLECTING_RELEASE_FULL, which makes those macros dependant on the nsCycleCollectionParticipant helper. I think that is ok, but am willing to look for another solution if it is not. We should probably go through the tree and change all the NS_IMPL_CYCLE_COLLECTING_ADDREF_AMBIGUOUS and NS_IMPL_CYCLE_COLLECTING_RELEASE_AMBIGUOUS_* macros to NS_IMPL_CYCLE_COLLECTING_ADDREF/NS_IMPL_CYCLE_COLLECTING_RELEASE_*.

Rlk on fxdbug-linux-tbox dropped to 6.03KB (from 717KB). No drop on nye but that is a debug build, the fix for bug 368869 seems to make things worse in debug builds.

qm-xserve01 has gone orange, mochitest fails again (see also bug 372960, comment 13). Waiting one more cycle and then I'll back out. I haven't yet been able to reproduce that problem, so still not really sure what's going on.
qm-xserve01 went green after one cycle.
Status: NEW → RESOLVED
Closed: 18 years ago
Resolution: --- → FIXED
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: