Closed Bug 377787 Opened 18 years ago Closed 13 years ago

Cycle collector performance tracking

Categories

(Core Graveyard :: Tracking, defect)

defect
Not set
normal

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: bzbarsky, Assigned: chofmann)

References

(Blocks 1 open bug)

Details

(Keywords: meta, perf)

Attachments

(1 file, 1 obsolete file)

This is a tracking bug for performance work on the cycle collector.
Depends on: 377606
Keywords: meta, perf
OS: Linux → All
Hardware: PC → All
Attached file Zipped-up profile as of today (obsolete) (deleted)
This is a profile of a build with the patches to bug 373693 applied. To get this, I loaded a large bonsai query in one window, and browsed in another one. Every so often I moused over the bonsai window to trigger traversal of all that stuff. I started and stopped jprof programmatically around the three calls to nsCycleCollector_collect in nsJSEnvironment; if we have other entry points into it (other than shutdown), I'd love to know. The profile largely shows hashtable stuff, actually. I should try the patch in bug 377606; didn't realize it was there.
Actually, looks like I can't apply that on top of the patch to bug 373693. I'll just wait for more stuff to land.
Attachment #261877 - Attachment is patch: false
Attachment #261877 - Attachment mime type: text/plain → application/zip
Changing browser.sessionhistory.max_total_viewers in about:config from -1 to 3 produced a noticeable performance increase. Perhaps a temporary workaround, at least, would be to submit a patch that sets it to (at least) 3 on install and, if it's possible, also changes it where Firefox is already installed.
Mozilla/5.0 (X11; U; Linux i686; en-US; rv:1.9a4pre) Gecko/20070417 Minefield/3.0a4pre - Build ID: 2007041704
Depends on: 377884
Depends on: 372110
Attached file Updated profile (deleted)
This was generated the same way that the profile in comment 1 was generated, but in a build with the patches for bug 372110, bug 373693, bug 374872, bug 377606 applied (basically, all the patches checked in as of right now as far as I can tell). Every so often I hovered the bonsai page to make sure it was purple (from the mousemove event dispatch).

There are 455673 hits under nsCycleCollector::Collect, broken down mostly as:

  191121 nsCycleCollector::MarkRoots()
  164912 nsCycleCollector::ScanRoots()
   43962 nsCycleCollector::CollectWhite()
   31329 nsXPConnect::BeginCycleCollection()
    9144 PL_DHashTableFinish
    7560 nsXPConnect::FinishCycleCollection()
    6565 memset

The time in nsXPConnect::BeginCycleCollection is spent under JS_GC, memset (allocating the hashtable, I guess), and PL_DHashTableFinish.

nsCycleCollector::CollectWhite just calls PL_DHashTableEnumerate, which pretty much just calls FindWhiteCallback. About 40% of the time is spent in FindWhiteCallback itself, another 30% under nsCycleCollector::Forget, 5% QI on nsTextNodes, 5% UnmarkPurple on generic elements and text nodes, 10% in QI on various HTML elements.

nsCycleCollector::ScanRoots basically calls GraphWalker::Walk, which more or less calls nsCycleCollectionXPCOMRuntime::Traverse (there is a detour that gets there through nsXPConnect::Traverse calling GraphWalker::DescribeNode calling scanWalker::VisitNode calling GraphWalker::Walk). The time under there is basically split between:

  canonicalize() -- 24% of ScanRoots
  PL_DHashTableOperate -- 27% of ScanRoots, split up by caller more or less as:
    50% -- GraphWalker::NoteXPCOMChild
    25% -- GraphWalker::Walk
    10% -- nsDocument::GetReference
    10% -- nsBindingManager::GetBinding
     3% -- GraphWalker::NoteScriptChild
     2% -- nsContentUtils::TraverseListenerManager
  ToParticipant -- 10% of ScanRoots.
  nsDeque push and pop -- 8% of ScanRoots (from Walk and NoteXPCOMChild).
  ScanBlackWalker::ShouldVisitNode -- 2.5% of ScanRoots
  In (not under) GraphWalker::NoteXPCOMChild -- 2.5% of ScanRoots
  In (not under) nsGenericElement::cycleCollection::Traverse -- 4% of ScanRoots
  Various other non-hotspotty stuff.

nsCycleCollector::MarkRoots calls GraphWalker::Walk, which mostly calls nsCycleCollectionXPCOMRuntime::Traverse, etc. From the bottom up, we have:

  canonicalize() -- 20% of MarkRoots
  PL_DHashTableOperate -- 37% of MarkRoots, split up by caller more or less as:
    66% -- GraphWalker::NoteXPCOMChild
    16% -- GraphWalker::Walk
     6% -- nsDocument::GetReference
     4% -- nsBindingManager::GetBinding
     2% -- GraphWalker::NoteScriptChild
     2% -- nsContentUtils::TraverseListenerManager
  ToParticipant -- 9% of MarkRoots.
  nsDeque push and pop -- 6% of MarkRoots (from Walk and NoteXPCOMChild).
  In (not under) GraphWalker::NoteXPCOMChild -- 2.5% of MarkRoots
  In (not under) nsGenericElement::cycleCollection::Traverse -- 3% of ScanRoots
  Various other non-hotspotty stuff, I think.

Things that jump out at me:

canonicalize() and ToParticipant() together make up 24% of the total cycle collection time. I wonder whether there's some way to at least cache the results between ScanRoots and MarkRoots.

PL_DHashTableOperate makes up another 25% of the total. Some of these tables we could try to optimize a bit; for example, could we use bits on nodes to indicate whether they have a binding or a preserved wrapper so that we don't have to hit the hashtable for all the ones that do not?

Basic problem really is that we're walking a _lot_ of stuff here. :( The pause I was profiling is about 3 seconds long on my machine, without jprof running.

Note: I got some of the data above using -i and -e options to jprof; I'm not sure it's possible to get some of those numbers directly out of the profile that I'm attaching.
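For anyone wanting to check the "24%" and "25%" totals in the comment above, here is a small sketch of the arithmetic. It only weights the per-phase percentages by the MarkRoots and ScanRoots hit counts quoted in the comment; the variable and function names are made up for illustration, and it assumes the contribution from the other phases is negligible.

```python
# Hit counts and percentages copied from the profile breakdown in the comment.
TOTAL_HITS = 455673
PHASE_HITS = {"MarkRoots": 191121, "ScanRoots": 164912}

# Fraction of each phase attributed to each function, as reported in the comment.
PHASE_SHARE = {
    "canonicalize": {"MarkRoots": 0.20, "ScanRoots": 0.24},
    "ToParticipant": {"MarkRoots": 0.09, "ScanRoots": 0.10},
    "PL_DHashTableOperate": {"MarkRoots": 0.37, "ScanRoots": 0.27},
}


def share_of_total(function_name):
    """Estimate a function's share of all hits under nsCycleCollector::Collect."""
    hits = sum(PHASE_HITS[phase] * share
               for phase, share in PHASE_SHARE[function_name].items())
    return hits / TOTAL_HITS


canon_and_participant = (share_of_total("canonicalize")
                         + share_of_total("ToParticipant"))
hashtable = share_of_total("PL_DHashTableOperate")

print(f"canonicalize + ToParticipant: {canon_and_participant:.1%}")
print(f"PL_DHashTableOperate: {hashtable:.1%}")
```

Both results land between 24% and 26% of the total, which agrees with the rounded figures in the comment.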
Attachment #261877 - Attachment is obsolete: true
Filed bug 378389 on the preserved wrapper table and bug 378390 on the binding manager table.
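To illustrate the "bits on nodes" idea from comment 4 that these two bugs cover, here is a minimal sketch in Python. It is a toy model only, not Gecko code: `Node`, `BindingTable`, and the `HAS_BINDING` bit are hypothetical names, and a plain dict stands in for the PLDHashTable. The point is that a cheap flag test lets the common "no binding" case skip the hashtable entirely.

```python
class Node:
    """Hypothetical stand-in for a DOM node; not a real Gecko class."""
    __slots__ = ("name", "flags")
    HAS_BINDING = 1 << 0  # hypothetical flag bit

    def __init__(self, name):
        self.name = name
        self.flags = 0


class BindingTable:
    """Toy side table mapping nodes to bindings, counting real lookups."""

    def __init__(self):
        self._table = {}
        self.lookups = 0

    def set_binding(self, node, binding):
        self._table[node] = binding
        node.flags |= Node.HAS_BINDING  # remember that the table has an entry

    def get_binding(self, node):
        # Fast path: the flag says there is no entry, so skip the table.
        if not node.flags & Node.HAS_BINDING:
            return None
        self.lookups += 1
        return self._table.get(node)


table = BindingTable()
nodes = [Node(f"n{i}") for i in range(1000)]
table.set_binding(nodes[0], "example-binding")
found = [table.get_binding(n) for n in nodes]
print(table.lookups)
```

In this model only the one node that actually has a binding touches the table. A real implementation would also need to clear the bit when the entry is removed, which this sketch leaves out.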
Depends on: 378389, 378390
Depends on: 378514
(In reply to comment #4)
> 5% QI on nsTextNodes, 10% in QI on various HTML elements.

Could we improve QIing by changing the order of the NS_INTERFACE_MAP_ENTRYs?
Yes, we could. That could hurt performance elsewhere, of course....
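The trade-off in the two comments above can be sketched with a toy model. This assumes, for illustration only, that a QueryInterface call compares the requested IID against the map entries one at a time in declaration order; the IID names and orderings below are invented and do not come from any real class.

```python
def query_interface(interface_map, wanted_iid):
    """Scan the entries in order; return (found, number of comparisons)."""
    for comparisons, iid in enumerate(interface_map, start=1):
        if iid == wanted_iid:
            return True, comparisons
    return False, len(interface_map)


# Hypothetical interface map orderings for one class.
CC_IID = "IID_CYCLE_COLLECTION"
original_order = ["IID_A", "IID_B", "IID_C", "IID_D", CC_IID]
cc_first_order = [CC_IID, "IID_A", "IID_B", "IID_C", "IID_D"]

# Cost of the cycle collector's QI before and after moving its entry up.
cc_cost_before = query_interface(original_order, CC_IID)[1]
cc_cost_after = query_interface(cc_first_order, CC_IID)[1]

# Cost of a QI to the interface that used to be first.
a_cost_before = query_interface(original_order, "IID_A")[1]
a_cost_after = query_interface(cc_first_order, "IID_A")[1]

print(cc_cost_before, cc_cost_after, a_cost_before, a_cost_after)
```

Moving the cycle collection entry to the front makes that lookup cheaper, while every entry it jumps over costs one more comparison, which is the "could hurt performance elsewhere" concern.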
Yes, improving QI is 1 of the 3 things we're going to try next:

1) Don't traverse some root objects (probably nsGlobalWindow and nsDocument) that hang on to a large number of other objects, if we can know for sure that they are alive (black). Sicking will file a bug and make a patch.
2) Improve QI by moving cycle collection entries. I will file a bug and make a patch.
3) Record the traversal, so that we can play it back instead of traversing multiple times. Dbaron filed bug 378514 for this.
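As a rough illustration of idea 3, the sketch below records the edges reported by one traversal and then replays them for later passes. It is a toy model under stated assumptions, not the approach taken in bug 378514: `CountingGraph`, `record_traversal`, and `reachable` are hypothetical names, and string labels stand in for objects.

```python
class CountingGraph:
    """Toy object graph whose traverse() counts how often it is called."""

    def __init__(self, edges):
        self._edges = edges
        self.traverse_calls = 0

    def traverse(self, node):
        # Stands in for the expensive per-object Traverse callback.
        self.traverse_calls += 1
        return self._edges.get(node, [])


def record_traversal(graph, roots):
    """Walk the live graph once from the roots, recording each node's children."""
    recorded = {}
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node in recorded:
            continue
        children = list(graph.traverse(node))
        recorded[node] = children
        stack.extend(children)
    return recorded


def reachable(recorded, roots):
    """Replay the recorded edges without calling back into the live objects."""
    seen = set()
    stack = list(roots)
    while stack:
        node = stack.pop()
        if node not in seen:
            seen.add(node)
            stack.extend(recorded.get(node, []))
    return seen


graph = CountingGraph({"a": ["b", "c"], "b": ["a"], "c": []})
recorded = record_traversal(graph, ["a"])
first_pass = reachable(recorded, ["a"])   # e.g. a mark-style pass
second_pass = reachable(recorded, ["a"])  # e.g. a scan-style pass
print(graph.traverse_calls)
```

Each object is traversed once no matter how many passes run over the recorded edges, at the cost of the memory needed to hold the recording.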
Depends on: 378595
Depends on: 378987
Depends on: 379593
Can we mark this fixed, or is there more work to do?
Depends on: 491364
Depends on: 490695
I'm going to mark this fixed as it hasn't been touched in a few years. Later CC perf tracking bugs: bug 549533 and bug 698919.
Blocks: 698919
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
No longer blocks: 698919
Blocks: 698919
Product: Core → Core Graveyard