Closed Bug 490695 Opened 15 years ago Closed 15 years ago

implement cycle collector's purple buffer with entries that can be cheaply removed from object in buffer

Categories

(Core :: XPCOM, defect, P3)

defect

Tracking

()

RESOLVED FIXED
mozilla1.9.1

People

(Reporter: dbaron, Assigned: dbaron)

References

Details

(Keywords: fixed1.9.1, perf)

Attachments

(2 files, 3 obsolete files)

pldhash's SearchTable called from within nsCycleCollector::Suspect and Forget shows up in profiles a bit.  We can fix this by replacing the purple buffer with a circularly linked list:

 * Each entry in the list has:
   + pointer to object
   + next and prev entry pointers
   + object's refcount

 * nsCycleCollectingAutoRefcnt replaces its mValue with a tagged value:  when untagged it's a reference count, when tagged it's a pointer to the entry in the purple buffer list, which contains the reference count
> When untagged it's a reference count

I'd think we want it the other way around.  If the tagged value is the refcount, then we start with a test; if it's tagged we just need to shift to get the value, and if not we directly deref the pointer.  If the tagged value is the pointer, we still need to shift in the untagged case, but in the tagged case we have to mask and deref.
Do we perhaps want to recycle entries in the list? And/or use arena allocation for
them?
Huh. I was actually part way through writing up a task-specific container for the CC -- a task-specific digital trie -- but this approach sounds faster still. At least for the purple buffer.

(Perhaps that's the only place the speed of pointer-based associative lookup counts? Maybe both..)

(In reply to comment #2)
> Do we perhaps want to recycle entries in the list? And/or use arena allocation
> for them?

I think we'd want a freelist of cells next to the active list -- pointers enter and leave the purple buffer at very high speed, constantly -- but I doubt there'd be a benefit from an arena. I don't imagine the resident-count is terribly bursty. I imagine it's reasonably-near-average for a long while, and then drops to zero during collection, and then quickly trends back up to near-average. 

But of course, real measurements are the only way to tell.
> I don't imagine the resident-count is terribly bursty.

I think it actually is....  At least if I increment a counter in Put() and decrement it in Remove() and log the high-water mark value, I get something like 600k or so on the testcase I used for my profiling talk.  That testcase has aroung 1500k total Put() calls.  I suspect the issue is that each DOM node we work with ends up suspect when we're done with it, since the last thing that happens is a release.  Then we cc sometime after we're done running the testcase.
Attached patch patch (obsolete) (deleted) — Splinter Review
Here's a patch that's getting close to something I'm happy with.

I'm unhappy with the fact that we need to worry about binary compatibility here; nsXPCOMCycleCollectionParticipant is effectively frozen, but I think the changes that that forced me to undo probably were better off undone anyway.  This leaves a chunk of old code for backwards compatibility with users of old XPCOM glue that I'm really not sure will get enough testing going forward, though.
next and prev pointers seem like a waste of space and time to update.  Order in
the list is not important, right?  In which case just put the pointer to Obj
and refcount into a dense array that you can iterate through during cycle
collection. When you delete something put it on a freelist, when you allocate
take it from the freelist if non-empty, otherwise extend the array.  (the array
could be in chunks with all items in a chunk initially added to the freelist
when the chunk is allocated. Obviously you then need some list of the chunks)

If it turns out that a large fraction of all objects are in the purple buffer
at some point in their life then it would make the code a lot simpler and
faster to always keep the refcount in the object itself, and just make each
object 4 bytes larger. That will also save a pointer chase just find find the
refcount (not to mention conditional code to decide where the refcount is)
Hmmm.  I've written the patch before seeing these comments.

(In reply to comment #1)
> > When untagged it's a reference count
> 
> I'd think we want it the other way around.  If the tagged value is the
> refcount, then we start with a test; if it's tagged we just need to shift to
> get the value, and if not we directly deref the pointer.  If the tagged value
> is the pointer, we still need to shift in the untagged case, but in the tagged
> case we have to mask and deref.

I came to the same conclusion without seeing your comment. :-)

(In reply to comment #2)
> Do we perhaps want to recycle entries in the list? And/or use arena allocation
> for
> them?

I allocated them in blocks, but decided to free the blocks each time we empty the list for a collection since I suspect using freelists for a long time is bad for memory locality.  I haven't measured anything here yet, though.
(In reply to comment #6)
> next and prev pointers seem like a waste of space and time to update.  Order in
> the list is not important, right?  In which case just put the pointer to Obj
> and refcount into a dense array that you can iterate through during cycle
> collection.

Yeah, I think you're probably right.  I suspect it's rare that the purple buffer shrinks substantially from its maximum size between collections before the next collection happens.
Attached patch patch (obsolete) (deleted) — Splinter Review
Here's a patch using arrays rather than a linked list (we just step over the items on the free list when we enumerate).

I might have performance comparison numbers from the try server by... well, tomorrow morning, most likely.
Attachment #375239 - Attachment is obsolete: true
There's something wrong with the second patch (but not the first) that makes it leak.
Attached patch patch (obsolete) (deleted) — Splinter Review
Corrected version of previous patch, in which I actually link new blocks into the list when I allocate them.
Attachment #375255 - Attachment is obsolete: true
Performance results from try server talos show that:

 (1) [Based on the 2 runs Friday morning] This patch makes Twinopen about 2-3% faster.

 (2) [Based on the 2 runs Thursday] the array might be slightly faster than the linked list, although that could have been confounded by the leak of Blocks in the linked list run.  (8 of the 10 runs were faster for Twinopen)

 (3) [Based on 2 runs Friday morning] It looks like this makes Tp slightly faster, but not much.  (8 of 10 runs were faster with the patch than without; the 2 that weren't were the 2 Vista runs)
Summary: implement cycle collector's purple buffer using circularly linked list → implement cycle collector's purple buffer with entries that can be cheaply removed from object in buffer
Blocks: 470684
Blocks: 377787
Attached patch patch (deleted) — Splinter Review
Polished up a few things, and now requesting review.
Attachment #375302 - Attachment is obsolete: true
Attachment #375735 - Flags: superreview?(peterv)
Attachment #375735 - Flags: review?(peterv)
Comment on attachment 375735 [details] [diff] [review]
patch

And requesting additional review from Benjamin regarding the XPCOM glue code and the frozenness of things (including nsPurpleBufferEntry).  Also note that we might want to consider this for 1.9.1.
Attachment #375735 - Flags: review?(benjamin)
Comment on attachment 375735 [details] [diff] [review]
patch

Ugh... maybe we should have hidden CC from the glue altogether: I forget why we added it in the first place.
Attachment #375735 - Flags: review?(benjamin) → review+
(In reply to comment #15)
> I forget why we
> added it in the first place.
Maybe because it is useful for binary extensions.
Comment on attachment 375735 [details] [diff] [review]
patch

>diff --git a/xpcom/base/nsCycleCollector.cpp b/xpcom/base/nsCycleCollector.cpp

> nsPurpleBuffer::NoteAll(GCGraphBuilder &builder)
> {
>-    SpillAll();
>-    mBackingStore.Enumerate(noteAllCallback, &builder);
>+    mCompatObjects.EnumerateEntries(selectionCallback, &builder);

s/selectionCallback/noteAllCallback/

>@@ -2164,19 +2188,18 @@ nsCycleCollector::Suspect(nsISupports *n

>+    // Caller is responsible for filling in result's mRefCnt.

This isn't needed for compat objects, right?

>+    return mPurpleBuf.PutCompatObject(n);

>diff --git a/xpcom/glue/nsISupportsImpl.h b/xpcom/glue/nsISupportsImpl.h

>   nsrefcnt decr(nsISupports *owner)

>+      if (NS_UNLIKELY(refcount == 0)) {
>+        if (NS_UNLIKELY(!NS_CycleCollectorForget2(e))) {
>+          NS_NOTREACHED("forget should not fail when reference count hits 0");
>+        }
>+        mTagged = NS_CCAR_REFCNT_TO_TAGGED(refcount);

or mTagged = NS_CCAR_TAGGED_STABILIZED_REFCNT?
Attachment #375735 - Flags: superreview?(peterv)
Attachment #375735 - Flags: superreview+
Attachment #375735 - Flags: review?(peterv)
Attachment #375735 - Flags: review+
(In reply to comment #17)
> >+      if (NS_UNLIKELY(refcount == 0)) {
> >+        if (NS_UNLIKELY(!NS_CycleCollectorForget2(e))) {
> >+          NS_NOTREACHED("forget should not fail when reference count hits 0");
> >+        }
> >+        mTagged = NS_CCAR_REFCNT_TO_TAGGED(refcount);
> 
> or mTagged = NS_CCAR_TAGGED_STABILIZED_REFCNT?

I'd rather keep the stabilizeForDeletion logic separate, as it was before.  (These values are actually different; the stabilized refcount is PURPLE_ENTRY_TO_TAGGED(0) rather than REFCNT_TO_TAGGED(0).)

Other comments addressed, though.
Fine by me.
http://hg.mozilla.org/mozilla-central/rev/f152b230cd48
Status: ASSIGNED → RESOLVED
Closed: 15 years ago
Priority: -- → P3
Resolution: --- → FIXED
There's also a Tp3 improvement (around 1%) visible in the graphs for all platforms other than Vista (which is way too noisy to detect a <5% change).
Depends on: 492324
Comment on attachment 375735 [details] [diff] [review]
patch

This gets a measurable somewhat-across-the-board performance improvement, most noticeable on Twinopen and probably some "DHTML" testcases.
Attachment #375735 - Flags: approval1.9.1?
Attachment #375735 - Flags: approval1.9.1? → approval1.9.1+
(please land this today so we can have some extra time to shake it out for topcrashes)
I'm unable to land it today, but I will probably be able to land it tomorrow morning EDT (UTC-4).  If somebody else could land it today that would be helpful.
Keywords: checkin-needed
(I'll have it merged by this evening if somebody else can watch the tree for me, though.)
http://hg.mozilla.org/releases/mozilla-1.9.1/rev/0866be0a6600

(I still need to fix up the comments about version numbers in the XPCOM frozen functions, on both branches, to reflect that this landed on 1.9.1.)
Target Milestone: mozilla1.9.2a1 → mozilla1.9.1
Yesterday afternoon I backed this out temporarily from 1.9.1 and then relanded it as part of the investigation of bug 494255.  It looks like it did not cause that bug; however, it appears that this patch did hurt some performance numbers on Windows (Tdhtml on XP and Vista; Tp3 on XP) while helping them on Linux and Mac (Txul and Tdhtml (and maybe also Tp3) on Linux, Txul on Mac OS X 10.4 and 10.5, Tp3 on Mac OS X 10.4)

Perhaps that means I should back it out permanently, though I really don't see why it would cause a regression.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: