Closed Bug 140758 Opened 23 years ago Closed 23 years ago

[FIX]Cache results of getElementsByTagName on the document

Categories

(Core :: DOM: Core & HTML, defect, P1)

defect

Tracking

()

VERIFIED FIXED
mozilla1.0

People

(Reporter: bzbarsky, Assigned: bzbarsky)

References

Details

(Keywords: embed, perf, topembed-)

Attachments

(2 files, 8 obsolete files)

Right now, every call to getElementsByTagName produces a new nsContentList object. This is very expensive if getElementsByTagName is being called in a loop or even just often between garbage collections. Per discussion with jst, we should cache these content lists and return them, instead of creating new objects.
adding some bugs that show how we could benefit from this to dependency list. In particular, the table at http://bugzilla.mozilla.org/show_bug.cgi?id=118933#c72 shows how much we have to go and the comments at http://bugzilla.mozilla.org/show_bug.cgi?id=132974#c5 shows a possible win from this caching.
Blocks: 118933, 132974
Keywords: perf
Priority: -- → P1
Target Milestone: --- → mozilla1.2alpha
Attached patch Patch v 1.0 (obsolete) (deleted) — Splinter Review
This patch makes us about 2.5 times faster on tight loops which call getElementsByTagName (tested on http://www.formula-one.nu/mozilla/domtestcases/getElementsByTagName.htm and the page in bug 132974). Reviews? Testing? I have not run this through a rigorous "did I break getElementsByTagName?" test yet, so if someone wants to do that... :)
I don't see how you keep the cache consistent when someone adds or removes tags from the page dynamically. Am I missing something? I don't know much about this code.
Addition/removal of tags is taken care of by the content lists themselves (they observe the document). This patch just changes the following situation: var a = document.getElementsByTagName("a"); var b = document.getElementsByTagName("a"); Before this change |a| and |b| would end up as two separate objects, both observing the document and responding to content changes. With this patch, we cache the content list on the first call and just pass back a pointer to the cached content list on the second call, so |a| and |b| end up being the same object.
Comment on attachment 82313 [details] [diff] [review] Patch v 1.0 >+ nsContentList* contentList = entry->mContentList; >+ >+ if (contentList && >+ contentList->GetDocument() == aDocument && >+ contentList->GetMatchAtom() == aMatchAtom && >+ contentList->GetMatchNamespaceId() == aMatchNameSpaceId && >+ contentList->GetRootContent() == aRootContent) { >+ *aInstancePtrResult = contentList; >+ } Why are all those checks needed? The hash key encodes all those properties of the content content list, so they must be equal, must they not? Apart from the fact that I don't know the hash table code, this looks great :-)
> The hash key encodes all those properties of the content > content list, so they must be equal, must they not? Well, no. There's the possibility of hash collisions.... Not very likely, but possible.
Attached patch Patch v 1.1 (obsolete) (deleted) — Splinter Review
Same thing, but doesn't touch nsIContentList; no one who doesn't use raw nsContentList objects needs to know about this.
Attachment #82313 - Attachment is obsolete: true
Comment on attachment 82421 [details] [diff] [review] Patch v 1.1 +PR_STATIC_CALLBACK(PLDHashNumber) +ContentListHashHashKey(PLDHashTable *table, const void *key) Calling these functions |ContentListHashTable...etc| will avoid a weirdname such as the above. ================ + if (--gContentListCount == 0 && gContentListHashTableInited) { + PL_DHashTableFinish(&gContentListHashTable); + gContentListHashTableInited = PR_FALSE; + } The patch is not adding all content lists to the HT (only one constructor is calling AddToHashTable(), so |gContentListCount| is not guaranteed to be in sync with what is in the HT. Are there any users of the other constructors? Since you now want users to create a list via the single entry-point of the factory function that you just added, you might as well fold the other constructors under the umbrella of the factory to avoid confusion -- even if they don't add to the hash. ================ nsContentList::~nsContentList() { + RemoveFromHashtable(); if (mDocument) { NS_IMETHODIMP nsContentList::DocumentWillBeDestroyed(nsIDocument *aDocument) { if (mDocument) { + // Our key will change... Best remove ourselves before that happens. + RemoveFromHashtable(); Why is the first one removed unconditionnally and the second one isn't? In the first one, what happens in the case of a shared list as you noted in comment #5 (in a C++ side, |b| could die when going out of certain scope for example). You should perhaps refcount the entries instead? (this could cater for the |gContentListCount| mentioned above). ================ + inline nsIDocument* GetDocument(void) { return mDocument; } + inline nsIAtom* GetMatchAtom(void) { return mMatchAtom; } + inline PRInt32 GetMatchNamespaceId(void) { return mMatchNameSpaceId; } + inline nsIContent* GetRootContent(void) { return mRootContent; } Not necessary to add the |inline| keyword since they are already inline by virtue of being unrolled in the class declaration itself.
Attachment #82421 - Flags: needs-work+
Comment on attachment 82421 [details] [diff] [review] Patch v 1.1 >+// Hashtable for storing nsContentLists >+static PLDHashTable gContentListHashTable; >+static PRBool gContentListHashTableInited = PR_FALSE; Since the standard guarantees null-ness of static memory, you can test gContentListHashTable.ops to see whether it's been initialized. (If you maintain the invariant that you destroy the table when its size hits zero, you could also use entryCount.) >+static PRUint32 gContentListCount = 0; Can you just use gContentListHashTable.entryCount ? >+class ContentListHashEntry : public PLDHashEntryHdr Perhaps |struct|? >+{ >+public: >+ ContentListHashEntry(const PRUint32 aKey) >+ : mKey(aKey), mContentList(nsnull) >+ { >+ } pldhash will zero-initialize the entry space for you (although your customized clearEntry breaks that), so you don't need to null-initialize. It's more traditional not to use ctors/dtors, skip the initEntry callback, and just fill in your key manually where you fill in the value. >+ ~ContentListHashEntry() >+ { >+ } Why bother writing an empty destructor just so you can write a customized clearEntry callback that calls it? >+PR_STATIC_CALLBACK(const void *) >+ContentListHashGetKey(PLDHashTable *table, PLDHashEntryHdr *entry) >+{ >+ ContentListHashEntry *e = NS_STATIC_CAST(ContentListHashEntry *, entry); >+ return &e->mKey; In a patch that I haven't got working yet, I used NS_INT32_TO_PTR here (and it and NS_PTR_TO_INT32 in other places, rather than using pointer-to-integer). However, this is a moot point given the advice below. >+// Calculate the hash key. >+ >+// The most variable part of the key will likely be the aMatchAtom, so >+// take the low 12 bits of that, drop the last two, and use those 10 >+// as the low 10 bits of our key. >+// The next 6 bits of the key will be the six low bits of the namespace id >+// The next 8 bits are the 8 low bits (after dropping the lowest 2) of >+// aRootContent >+// The next 8 bits are the 8 low bits (after dropping the lowest 2) of >+// aDocument >+static const PRUint32 >+CalculateHashKey(nsIDocument* aDocument, nsIAtom* aMatchAtom, >+ PRInt32 aMatchNameSpaceId, nsIContent* aRootContent) >+{ >+ NS_ASSERTION(sizeof(PRWord) == sizeof(void *), >+ "PRWord and void* not the same, so this is likely going to go astray"); >+ return >+ ((NS_REINTERPRET_CAST(PRWord, aMatchAtom) & 0xFFF) >> 2) | >+ ((aMatchNameSpaceId & 0x3F) << 10) | >+ (((NS_REINTERPRET_CAST(PRWord, aRootContent) & 0x3FF) >> 2) << 16) | >+ (((NS_REINTERPRET_CAST(PRWord, aDocument) & 0x3FF) >> 2) << 24); >+} OK, now I see what you're doing. So what you really want to do here is have 2-word entries rather than three word entries, and you can use more of the stub ops since you'll look exactly like PLDHashEntryStub (except for a typed |ContentList* mContentList| instead of void *key). You'll then use the nsContentList as your key, essentially, although I think the best way to do this might to create |nsContentListKey| with those 4 members, and make nsContentList derive from that (protected inheritance, with a |GetHashKey()| member like the ones you added), and make your |getKey| callback do an NS_STATIC_CAST to nsContentListKey* from the content list, and then you can put the hash calculation stuff into the |hashKey| callback and just construct a temporary nsContentListKey on the stack when you want to do a lookup. This seems a bit simpler / more elegant to me, although it does have the key construction on the stack (which should be simple enough, though). (BTW, you're probably better off not ignoring so many bits and just XORing on top of things.) In fact, you *have* to do this because otherwise key collisions within your 32-bit int space will cause entries to be lost. >+NS_EXPORT nsresult >+NS_GetContentList(nsContentList** aInstancePtrResult, >+ nsIDocument* aDocument, nsIAtom* aMatchAtom, >+ PRInt32 aMatchNameSpaceId, nsIContent* aRootContent) >+{ >+ *aInstancePtrResult = nsnull; >+ // First we look in our hashtable. Then we create a content list if needed >+ if (gContentListHashTableInited) { Again, check .ops here. >+ const PRUint32 hashKey = CalculateHashKey(aDocument, aMatchAtom, >+ aMatchNameSpaceId, aRootContent); >+ >+ ContentListHashEntry *entry = >+ NS_STATIC_CAST(ContentListHashEntry *, >+ PL_DHashTableOperate(&gContentListHashTable, >+ &hashKey, >+ PL_DHASH_LOOKUP)); >+ if (PL_DHASH_ENTRY_IS_LIVE(entry)) { >+ nsContentList* contentList = entry->mContentList; >+ >+ if (contentList && >+ contentList->GetDocument() == aDocument && >+ contentList->GetMatchAtom() == aMatchAtom && >+ contentList->GetMatchNamespaceId() == aMatchNameSpaceId && >+ contentList->GetRootContent() == aRootContent) { Every single one of these checks should be true if you write matchEntry correctly, given the above changes to the entry structure. >+ *aInstancePtrResult = contentList; >+ } >+ } >+ } >+ >+ if (!*aInstancePtrResult) { >+ // We need to create one >+ *aInstancePtrResult = new nsContentList(aDocument, aMatchAtom, >+ aMatchNameSpaceId, aRootContent); Do you want to add it to the hashtable here, in general? (If so, then the simple solution is to do a PL_DHASH_ADD instead of a PL_DHASH_LOOKUP, and just fill it in if you have to create it.) >+ if (--gContentListCount == 0 && gContentListHashTableInited) { >+ PL_DHashTableFinish(&gContentListHashTable); >+ gContentListHashTableInited = PR_FALSE; If this is all you're using the |gContentListCount| for, then it certainly could just be the entryCount on the table. >+// Support functions for hashing content lists >+void >+nsContentList::AddToHashtable() >+{ >+ static PLDHashTableOps hash_table_ops = >+ { >+ PL_DHashAllocTable, >+ PL_DHashFreeTable, >+ ContentListHashGetKey, >+ ContentListHashHashKey, >+ ContentListHashMatchEntry, >+ PL_DHashMoveEntryStub, >+ ContentListHashClearEntry, >+ PL_DHashFinalizeStub, >+ ContentListHashInitEntry >+ }; >+ >+ if (!gContentListHashTableInited) { You can check against the table's .ops here. >+ gContentListHashTableInited = PL_DHashTableInit(&gContentListHashTable, >+ &hash_table_ops, nsnull, >+ sizeof(ContentListHashEntry), >+ 16); >+ >+ if (!gContentListHashTableInited) You'll want to null .ops on failure here, though. >+ return; >+ } >+ >+ const PRUint32 hashKey = CalculateHashKey(mDocument, mMatchAtom, >+ mMatchNameSpaceId, mRootContent); >+ >+ ContentListHashEntry *entry = >+ NS_STATIC_CAST(ContentListHashEntry *, >+ PL_DHashTableOperate(&gContentListHashTable, >+ &hashKey, PL_DHASH_ADD)); >+ if (!entry) >+ return; >+ >+ entry->mContentList = this; Again, this could be much less complicated (see above). >+void >+nsContentList::RemoveFromHashtable() >+{ >+ if (!gContentListHashTableInited) >+ return; >+ >+ const PRUint32 hashKey = CalculateHashKey(mDocument, mMatchAtom, >+ mMatchNameSpaceId, mRootContent); >+ >+ ContentListHashEntry *entry = >+ NS_STATIC_CAST(ContentListHashEntry *, >+ PL_DHashTableOperate(&gContentListHashTable, >+ &hashKey, >+ PL_DHASH_LOOKUP)); >+ if (!PL_DHASH_ENTRY_IS_LIVE(entry)) >+ return; >+ >+ if (entry->mContentList != this) >+ return; // Not us! >+ >+ PL_DHashTableRawRemove(&gContentListHashTable, entry); Given the changes in key structure, this could just be a PL_DHASH_REMOVE.
Attachment #82421 - Flags: needs-work+
Comment on attachment 82421 [details] [diff] [review] Patch v 1.1 Re: comment 9: > |gContentListCount| is not guaranteed to be in sync with what is in the HT. It need not be. I just wanted to make sure we called finish() on the hashtable before shutdown.. think of it as a poor man's shutdown observer... :) David's comments supercede this, in any case. > Are there any users of the other constructors? There are, but I'm not sure I want to define multiple overloaded copies of this function most of which will do nothing but call the constructor. Also, note that this is not exactly a factory function because it does not addref the out param, hence it not being called NS_New*. > Why is the first one removed unconditionnally and the second one isn't? Because there is no reason to RemoveFromHashtable() in the case when our hash key is not going to change? On the other hand, removing ourselves in the destructor is a must -- the hash table holds no reference and we want to avoid stale pointers here. > In the first one, what happens in the case of a shared list Absolutely nothing. As long as _someone_'s holding a reference to the nsContentList (these are refcounted, btw) the destructor is not going to be called. If the destructor is being called that means no one is holding a reference to the content list anymore. Good call on the inline thing. Re: Comment 10 and comment 11: Ooh. Nice. :) Updated patch coming up.
Attachment #82421 - Flags: needs-work+
Attached patch Patch v 1.2 (obsolete) (deleted) — Splinter Review
Changes: Address David's comments, using the nsContentList as the key. Use .ops to test whether the table is initialized and .entryCount to decide when to finish the table. Move all the hash-addition code to NS_GetContentList, so people who want to use the hashtable can call that and people who don't can use the raw |new|.
Attachment #82421 - Attachment is obsolete: true
Comment on attachment 82565 [details] [diff] [review] Patch v 1.2 >+#define ROTATE(num, bits) (((num) << bits) | ((num) >> (32-bits))) >+ >+PRUint32 >+nsContentListKey::GetHash(void) const >+{ >+ // mMatchNameSpaceId is not likely to get big, so give it only 6 bits. >+ // mMatchAtom is likely the most variable part, so give it 10 bits. >+ // The other two pointers get 8 bits each. >+ return >+ NS_PTR_TO_INT32(mMatchAtom.get()) ^ >+ ROTATE(mMatchNameSpaceId, 10) ^ >+ ROTATE(NS_PTR_TO_INT32(mRootContent), 16) ^ >+ ROTATE(NS_PTR_TO_INT32(mDocument), 24); >+} It's probably ok to just do return NS_PTR_TO_INT32(mMatchAtom.get()) ^ (NS_PTR_TO_INT32(mRootContent) << 8) ^ (NS_PTR_TO_INT32(mDocument) << 16) ^ mMatchNameSpaceId << 24; >+PR_STATIC_CALLBACK(const void *) >+ContentListHashtableGetKey(PLDHashTable *table, PLDHashEntryHdr *entry) >+{ >+ ContentListHashEntry *e = NS_STATIC_CAST(ContentListHashEntry *, entry); >+ return e->mContentList; Hmm. I would think this needs to be return NS_STATIC_CAST(nsContentListKey*, e->mContentList); although actually you might need a member variable on nsContentList to do that since it's protected inheritance. I'm surprised this worked (or did this cost you the perf gain?). >+ ContentListHashEntry *entry = nsnull; >+ // First we look in our hashtable. Then we create a content list if needed >+ if (gContentListHashTable.ops) { >+ nsContentListKey hashKey(aDocument, aMatchAtom, >+ aMatchNameSpaceId, aRootContent); >+ >+ // A PL_DHASH_ADD is equivalent to a PL_DHASH_LOOKUP for cases >+ // when the entry is already in the hashtable. >+ entry = >+ NS_STATIC_CAST(ContentListHashEntry *, >+ PL_DHashTableOperate(&gContentListHashTable, >+ &hashKey, >+ PL_DHASH_ADD)); >+ *aInstancePtrResult = entry->mContentList; This is misindented, and it assumes you didn't hit out-of-memory by dereferencing entry (which you're more carefuly about below). >+void >+nsContentList::RemoveFromHashtable() >+{ >+ if (!gContentListHashTable.ops) >+ return; >+ >+ nsContentListKey hashKey(mDocument, mMatchAtom, mMatchNameSpaceId, mRootContent); You don't need to construct a key -- you can just use NS_STATIC_CAST(nsContentListKey*, this) > class nsContentList : public nsBaseContentList, >+ protected nsContentListKey, Given the |protected| members above, maybe this could just be |public|, at least if you give the |public| member functions les confusing names.
> I'm surprised this worked I am too, now that you point it out... :) This is why I was using stack-allocated keys when it was not really needed -- otherwise it did _not_ work. Thanks for pointing out why.... I hate casts, some days... Patch with all that fixed coming up.
Attached patch Patch v 1.3 (obsolete) (deleted) — Splinter Review
Addresses comment 13's issues on casting and OOM. In fact, I've dropped my custom hash entry struct and the custom getKey callback, since the stubs work just fine once I do the casting right. I'd rather keep the inheritance |protected|, but it's jst's call on what he wants in his module...
Attachment #82565 - Attachment is obsolete: true
How can you drop the getKey callback? The one you had before was equivalent to the stub, but I said in comment 13 that it was broken.
The brokenness was in having an nsContentList* member that I was returning without casting to nsContentListKey*. Since nsContentListKey is not the first class nsContentList inherits from, taking that return value and casting to nsContentListKey* was bad (it gave a pointer to the wrong data). Essentially, it was doing NS_STATIC_CAST(nsContentListKey*, NS_STATIC_CAST(void*, nsContentList* foo)) which is bad. The new version makes sure that I always cast to nsContentListKey* before writing the key and always cast to nsContentListKey* after reading the key. So the conversion is always nsContentListKey* --> void* or the other way, which is consistent. So this should be correct, unless I deeply misunderstand how casting works in C++ (which is entirely possible, btw).
You still need a |getKey| callback. It's only used when resizing the table, so you might not notice the bugs that the lack of it would cause. And, as I mentioned yesterday, you might want to switch which of the things you shift when computing the hash, since you know that only the low bits of the namespace are significant, and you can expect that the document is somewhat less significant given that you're already hashing on the root content node.
(The reason you need the getKey callback is to enforce the casting that you mention in comment 17. And it should just be what I mentioned in comment 13.)
Attached patch Patch v 1.4 (obsolete) (deleted) — Splinter Review
OK, that makes sense. Adds the getKey callback.
Attachment #82586 - Attachment is obsolete: true
Comment on attachment 82596 [details] [diff] [review] Patch v 1.4 OK, now that I look more closely, it still isn't right, because you're actually using the PLDHashEntryStub type, which has a |void *key|. You're probably better off defining your own type that has the exact same layout as the stub type (and is derived from PLDHashEntryHdr), but that uses a typed |nsContentList *mContentList|, so that you can use fewer casts.
Attached patch kind of like this? (obsolete) (deleted) — Splinter Review
Moves all the icky casting into the callback functions.
Comment on attachment 82608 [details] [diff] [review] kind of like this? Yeah, the hashtable stuff looks good to me now. >+ if (entry) >+ entry->mContentList = *aInstancePtrResult; If you're worried about allocation failure for the content list, you might want to change this to be: if (!*aInstancePtrResult) { if (entry) PL_DHashTableRawRemove(gContentListTable, entry) return NS_ERROR_OUT_OF_MEMORY; } if (entry) entry->mContentList = *aInstancePtrResult; (and perhaps even use a local variable instead of *aInstancePtrResult). >+ inline PRBool Equals(const nsContentListKey& aContentListKey) const; >+ inline PRUint32 GetHash(void) const; Maybe just write the inline functions inline in the class? Or at least use the inline keyword at the definition as well? Other than that, r=dbaron if you get module-owner review as your sr=.
Attached patch Patch v 1.73 (obsolete) (deleted) — Splinter Review
Move inlines to header, remove the entries if a new list cannot be allocated.
Attachment #82596 - Attachment is obsolete: true
Attachment #82608 - Attachment is obsolete: true
Comment on attachment 82681 [details] [diff] [review] Patch v 1.73 >+ list = new nsContentList(aDocument, aMatchAtom, >+ aMatchNameSpaceId, aRootContent); weird indentation. Other than that r=dbaron, as I said 2 comments back.
Attachment #82681 - Flags: review+
Comment on attachment 82681 [details] [diff] [review] Patch v 1.73 +nsContentListKey::nsContentListKey(nsIDocument *aDocument, + nsIAtom* aMatchAtom, + PRInt32 aMatchNameSpaceId, + nsIContent* aRootContent) + : mMatchAtom(aMatchAtom), + mMatchNameSpaceId(aMatchNameSpaceId), + mDocument(aDocument), + mRootContent(aRootContent) +{ +} + +nsContentListKey::nsContentListKey(const nsContentListKey& aContentListKey) + : mMatchAtom(aContentListKey.mMatchAtom), + mMatchNameSpaceId(aContentListKey.mMatchNameSpaceId), + mDocument(aContentListKey.mDocument), + mRootContent(aContentListKey.mRootContent) +{ +} I vould've made the above inline, but either way... - In NS_GetContentList(): + nsContentList* list; You'll need to initialize |list| to null or you could end up accessing it w/o ever having set it in error cases. And NS_GetContentList() really needs to refcount the list that it returns, not doing that in a method whose signature looks like it really would do that (even if the type of the out param is a concrete class) is not cool. Also, you're potentially passing back shared content lists, so there's no way for the caller to do the right thing in error cases. + list = new nsContentList(aDocument, aMatchAtom, + aMatchNameSpaceId, aRootContent); Fix the next-line indentation. - In nsContentList::RemoveFromHashtable(): + (void) PL_DHashTableOperate(&gContentListHashTable, + NS_STATIC_CAST(nsContentListKey*, this), + PL_DHASH_REMOVE); Loose the (void) noncense here :-) - In nsContentList.h: + nsIDocument* mDocument; + nsIContent* mRootContent; Add "// Weak" comments after these... Other than that, this looks good so far. Fix the above and I'll have one more look...
Attachment #82681 - Flags: review+ → needs-work+
Attached patch Patch v 1.732 (obsolete) (deleted) — Splinter Review
Fix those issues.
Attachment #82681 - Attachment is obsolete: true
Comment on attachment 82739 [details] [diff] [review] Patch v 1.732 sr=jst
Attachment #82739 - Flags: superreview+
So... the Win32 compiler won't let me cast entry->mContentList to nsContentListKey*, presumably because of the protected inheritance... I suppose I could just make it publicly inherited; any reason not to do that?
The simple alternative is to add a GetKey member function: nsContentListKey* GetKey() { return NS_STATIC_CAST(nsContentListKey*, this); } The only reason I can think of to avoid making it public is that the public member functions of the key class might not make sense as names of members of nsContentList.
dbaron's suggestion sounds good to me.
Attached patch Patch v 1.73205 (deleted) — Splinter Review
Make this compile on Windows/Mac/Irix/etc, per dbaron's suggestion.
Attachment #82739 - Attachment is obsolete: true
Comment on attachment 82782 [details] [diff] [review] Patch v 1.73205 sr=jst
Attachment #82782 - Flags: superreview+
Er.. the last part of that diff (to nsRuleNode.cpp) has nothing to do with this bug. The rest has been checked in on the trunk; I'll ask for branch approval in a day or two.
nominating for 1.0.0 and making topembed.
Keywords: mozilla1.0, topembed
Target Milestone: mozilla1.2alpha → mozilla1.0
Summary: Cache results of getElementsByTagName on the document → [FIX]Cache results of getElementsByTagName on the document
topembed-, embed. not directly impacting embedding customers but it'd be a good fix for performance.
Keywords: topembedembed, topembed-
nominating for nsbeta1, making us faster than IE is always good for releases :)
Keywords: nsbeta1
Drivers are leery of taking this for 1.0. Marking fixed to get off my radar, as it's fixed on the trunk.
Status: NEW → RESOLVED
Closed: 23 years ago
Resolution: --- → FIXED
Cc:ing boullet who did some timings of Mozilla vs. IE in bug 118933 comment 92 (where |getElementsByTagName| is about 4x slower compared to IE 5.5), maybe getting some real numbers here can help drivers to quantify the benefit/risk w.r.t. the competition.
As a note, I sent information on that in the approval request (I had people on #mozillazine test both IE and Mozilla on those exact testcases). This may be landing for 1.0.1, but 1.0 is being very much locked down at this point.
It would be interesting to have an idea of much improvement the patch brought about (there is still life after m1.0 :-) Got some measurements to share with us?
Sure thing. Testing with stock nightly builds from the morning and the evening of the day I checked this in, on Linux, on the testcase at http://www.formula-one.nu/mozilla/domtestcases/getElementsByTagName.htm: 2002-05-08-09 Average (1runs) :1655ms Average (5runs) :1876ms Average (10runs) :2047ms 2002-05-08-21 Average (1runs) :787ms Average (5runs) :736ms Average (10runs) :739ms So a factor of 2-3 improvement.
... and that's on a tiny document, the savings on a larger document should be an order of magnitude in normal useage (depending on the useage patterns, of course). Oh, and by the way Mr. "not reading bugmail till June 12", stop reading your bugmail! :-)
Here are the results with 2002052008 Mozilla is now very close to IE5.5 on getElementByTagName and I don't see anymore the GC effect that was making each run time very different. +---------------------+--------+----------+-------+----------+ |Test | before | w/ 82782 | Gain% | IE5.5 | |---------------------|--------|----------|-------|----------| |getElementById |1169 ms | 1095 ms | --- | 972 ms | |---------------------|--------|----------|-------|----------| |getElementsByTagName |4119 ms | 1185 ms | 71% | 1093 ms | |---------------------|--------|----------|-------|----------| |createElement |2193 ms | 2116 ms | --- | 760 ms | |---------------------|--------|----------|-------|----------| |getAttribute |1472 ms | 1398 ms | --- | 496 ms | |---------------------|--------|----------|-------|----------| |setAttribute |1425 ms | 1336 ms | --- | 541 ms | +---------------------+--------+----------+-------+----------+
That's awesome! The "GC effect" that we used to see here was due to the fact that we create so many new content lists and GC'ing all those took a long time, and the fact that we created so many of them could also have been the reason for running the GC in the first place. Either way, even if we still end up running the GC during these tests, it should be really really fast, and even if it does potentially (and intentionally) invalidate our cached result from getElementsByTagName(), we're still much much faster since even in a trivial case where we execute one getElementsByTagName() per branch we'd now end up walking the tree looking for those elements once every 4096 branch executions (assuming the GC is run that often even if we create no new objects per branch execution), where we used to walk the tree for every single branch execution (and create a new JSObject, not to mention adding one document observer per branch execution as well). We could, and I think we might want to do this, create an internal cache that would hold on to the last requested content list that we get from getElementsByTagName(), that way the last created content list would survive a GC run, even if the XPConnect wrapper for that content list might go away. That would save us the tree walking-after-GC that we might now end up doing in these testcases at least. Not a big deal on real-life pages I would imagine, but doing this would also have the benefit of possibly helping poorly written code that's running in a non-GC environment (read C++), i.e. a similar testcase written in C++ where getElementsByTagName() would be called in a tight loop and the result would be released between every call, the current cache would do us no good in such a case, but if we'd cache the last requested content list, we'd help code like that. Thoughts? Bz (or someone else), wanna add one more tiny cache to this code?
Would this "one-list cache" be global, per-document, or what?
I'd say a global one-list cache would give us pretty good mileage. I couldn't resist trying this out, so I have a patch. Not tested yet though...
Verified on 2002-05-23-08-trunk.
Status: RESOLVED → VERIFIED
Sent drivers an approval request for 1.0.1
Keywords: mozilla1.0.1
please checkin to the 1.0.1 branch ASAP. once there please remove the mozilla1.0.1+ keyword and add the fixed1.0.1 keyword.
Attachment #82782 - Flags: approval+
Checked in on branch.
Comment on attachment 84447 [details] [diff] [review] Add a global one-list cache that holds on to the last requested list... r=bzbarsky. Let's land that.
Attachment #84447 - Flags: review+
Attachment #84447 - Flags: superreview+
Comment on attachment 84447 [details] [diff] [review] Add a global one-list cache that holds on to the last requested list... sr=rbs
Checked in that one-list cache on trunk.
There is an improvement on windows but on linux there is no considerable improvement. Here are the results for 10 testruns with 2002-07-10-07-1.0 and 2002-07-11-08-1.0 branch builds. +---------------------------------------------------------+ |Test | w/2002-05-08 | Windows | Linux | | | comment # 44 | | | |---------------------|-------------------------|---------| |getElementById | 1095 ms | 961ms | 1474ms | |---------------------|-------------------------|---------| |getElementsByTagName | 1185 ms | 1096ms | 1788ms | |---------------------|-------------------------|---------| |createElement | 2116 ms | 2136ms | 2483ms | |---------------------|-------------------------|---------| |getAttribute | 1398 ms | 1252ms | 1966ms | |---------------------|-------------------------|---------| |setAttribute | 1336 ms | 1205ms | 1917ms | +---------------------------------------------------------+
I would expect absolutely no effect from that one-list cache for scripts run from JS, on any platform. It'll only be noticeable if someone calls getElementsByTagName from C++ code in a particularly dumb way. ;)
marking verified.
Component: DOM: Core → DOM: Core & HTML
QA Contact: stummala → general
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: