Closed Bug 213813 Opened 21 years ago Closed 16 years ago

FindMember could probably use optimizing

Categories

(Core :: XPConnect, enhancement)


Tracking


RESOLVED WONTFIX
Future

People

(Reporter: dbradley, Assigned: dbradley)

References

(Blocks 1 open bug)

Details

(Keywords: perf)

Attachments

(1 file, 3 obsolete files)

FindMember has shown up in profiles in some cases where repeated calls into XPConnect occur, involving what I believe are objects with many interfaces and/or interfaces with many members. Currently this is a two-phase linear search: one over interface names, and another over method names within each interface.
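Roughly, the shape of that search is the following (illustrative declarations, not the actual XPConnect types):

struct Member    { jsval mName; };
struct Interface { Member* mMembers; PRUint16 mMemberCount; };

// Every miss walks every member of every interface in the set.
static JSBool
FindMemberLinear(Interface* const* interfaces, PRUint16 interfaceCount,
                 jsval name, Member** pMember, PRUint16* pInterfaceIndex)
{
    for(PRUint16 i = 0; i < interfaceCount; ++i)      // phase 1: interfaces
    {
        Member* member = interfaces[i]->mMembers;
        Member* end = member + interfaces[i]->mMemberCount;
        for(; member < end; ++member)                 // phase 2: members
        {
            if(member->mName == name)
            {
                *pMember = member;
                *pInterfaceIndex = i;
                return JS_TRUE;
            }
        }
    }
    return JS_FALSE;
}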
Blocks: 212831
Status: NEW → ASSIGNED
Summary: FindMember could probably use a hash → FindMember could probably use optimizing
Attached patch Something to play with (obsolete) (deleted) — Splinter Review
I've attached a patch for people with DHTML issues to play with, to see if they notice any benefit.

Originally I was thinking of using a hash, but I think this might be a better solution. The idea is to "cache" the most commonly hit methods. The assumption is that most objects have just a few methods at a time that get hit frequently. My fear was that a cache might create too much overhead. The cache basically tracks the 4 last-used methods, making searching the cache extremely quick and misses not too expensive. The downside is that this adds another 52 bytes per XPCNativeSet. There are generally no more than 300 of these, at least for the runs I've done so far, so we're talking an additional 15,600 bytes (assuming 300 are active at any given time), which I don't think is all that much in the grand scheme of things.

The cache also tracks the number of misses. The idea here is that if we're getting several misses in a row, we need to turn the cache off.

I don't have any way to accurately profile this right now, so I thought I'd get it out so others can take a look, maybe with jprof or the like.

The patch also optimizes some of the loops in XPCNativeInterface and XPCNativeSet. Many of the loops incremented a pointer as well as an index; I removed the index and changed the code to work with just the pointer. It's possible some clever compiler's optimizer consolidated this, but I doubt it. It also makes the loops more like the container/iterator pattern, which I think is a plus. I've run the code, so I don't think I broke anything in doing this, but let me know if you encounter crashes or screwy behavior.

In just starting up and shutting down I don't see a lot of improvement. Hits are slightly fewer than misses. I'm hoping that at least the gains from the hits cover the overhead of the misses; it's hard to say, because the stats I'm currently keeping don't tell me that. I'm going to try to add more stats to better determine how effective this approach is. Hashing might still be a better alternative, but I was thinking this might be a simpler approach with less memory allocation overhead.

Stats are kept only if you compile with DEBUG_xpc_hacker defined; otherwise no stats are kept. They are printed at program termination.
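A minimal sketch of the cache layout being described (the field and constant names follow the review quotes in the next comment; the MAX_MISSES value is an assumption, since the patch's value isn't shown):

// Reconstruction of the per-XPCNativeSet cache described above; not the
// literal patch.
#define MEMBER_CACHE_SIZE 4   // the "4 last used methods"
#define MAX_MISSES 8          // assumed: consecutive misses before turning off

struct MemberCacheEntry
{
    jsval            mName;    // JSVAL_NULL = end of used entries;
                               // JSVAL_VOID in slot 0 = cache turned off
    XPCNativeMember* mMember;  // resolved member for a hit
    PRInt16          mIndex;   // interface index, or -1 for a cached miss
};

// Per-set state: roughly the 52 bytes of overhead mentioned above
// (4 entries plus the miss counter).
MemberCacheEntry mMemberCache[MEMBER_CACHE_SIZE];
PRUint16         mMisses;     // consecutive-miss count; reset on each hit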
Attached patch Fixed release build issues (obsolete) (deleted) — Splinter Review
Comment on attachment 128862 [details] [diff] [review]
Fixed release build issues

>-    const XPCNativeMember* member = mMembers;
>-    for(int i = (int) mMemberCount; i > 0; i--, member++)
>+    for(XPCNativeMember const * member = mMembers; member < mMembers + mMemberCount;
>+        ++member)

Decent compilers can tell that mMembers and mMemberCount are loop-invariant, but you might want to store mMembers + mMemberCount in an end local, similar to the way you do later, just to be sure.

>-    XPCNativeMember* member = mMembers;
>-    for(int i = (int) mMemberCount; i > 0; i--, member++)
>+    for(XPCNativeMember* member = mMembers; member < mMembers + mMemberCount;
>+        ++member)

Ditto.

>+    inline
>+    void TurnCacheOff() const
>+    {
>+#ifdef DEBUG_xpc_hacker
>+        ++NonConst()->mCacheOff;
>+#endif
>+        NonConst()->mMemberCache->mName = JSVAL_VOID;

Nit: mMemberCache[0].mName might be better style, since mMemberCache is an array.

>+        MemberCacheEntry * entry;
>+        if(IsCacheOn())
>+        {
>+            MemberCacheEntry const * const end = mMemberCache + MEMBER_CACHE_SIZE;
>+            for(entry = NonConst()->mMemberCache; entry < end; ++entry)
>+            {
>+                if(name == entry->mName)
>+                {
>+                    if(entry->mIndex == -1)
>+                    {
>+                        CacheHit();
>+                        NonConst()->mMisses = 0;
>+                        return JS_FALSE;
>+                    }
>+                    if(pMember)
>+                        *pMember = entry->mMember;
>+                    if(pInterfaceIndex)
>+                        *pInterfaceIndex = entry->mIndex;
>+                    CacheHit();
>+                    NonConst()->mMisses = 0;
>+                    return JS_TRUE;
>+                }
>+                // If we're at the end of the list
>+                else if(entry->mName == JSVAL_NULL)

No sense in else after return.

>+                    break;
>+            }
>+            // If we're missing too many turn off the cache
>+            if (entry == mMemberCache + MEMBER_CACHE_SIZE &&
>+                ++NonConst()->mMisses == MAX_MISSES)
>+            {
>+                TurnCacheOff();
>+            }
>+            CacheMiss();

There's no way to turn the cache back on when you start getting better locality of reference again.

Idea: use a direct mapped cache, not linear search through a small array. Use a good hash function and do not probe on collision, replacing instead. Then the cache cost is constant and smaller (it's nearly constant now, but may be bigger than you like, especially if MEMBER_CACHE_SIZE is tuned up).

With a direct-mapped cache, you could increase MEMBER_CACHE_SIZE to a larger size, depending on your bloat tolerance, and not have to worry about time complexity getting out of hand.

/be
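To make the proposal concrete, here is a sketch of a direct-mapped lookup, assuming names are 32-bit jsvals and a power-of-two table. SlotFor and CacheLookup are hypothetical helpers, and the golden-ratio multiplier is just one choice of "good hash function", not something specified here:

// Direct-mapped variant: hash once, probe exactly one slot, and on a
// collision simply replace the slot instead of searching further.
#define MEMBER_CACHE_BITS 4
#define MEMBER_CACHE_SIZE (1 << MEMBER_CACHE_BITS)   // e.g. 16 slots

inline static PRUint32 SlotFor(jsval name)
{
    // Multiplicative hash by the 32-bit golden-ratio constant; taking the
    // high bits spreads nearby jsvals across the table.
    PRUint32 h = NS_STATIC_CAST(PRUint32, name) * 0x9E3779B9u;
    return h >> (32 - MEMBER_CACHE_BITS);
}

inline JSBool CacheLookup(MemberCacheEntry* cache, jsval name,
                          XPCNativeMember** pMember)
{
    MemberCacheEntry* entry = &cache[SlotFor(name)];
    if(entry->mName == name)        // constant cost: one compare per lookup
    {
        *pMember = entry->mMember;
        return JS_TRUE;
    }
    // Miss: the caller falls back to the linear search, then overwrites
    // *entry with the result, so the cache never needs turning off.
    return JS_FALSE;
}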
Thanks for the ideas, Brendan. I meant to use "end" in all cases; it probably got lost when I zapped my tree, forgot I hadn't saved my most recent changes, and reverted back to my last saved patch.

Yes, the direct-mapped cache would most likely yield better results. I was going simple for now to see if this would even be worth pursuing. I'm trying to figure out how you would know when to turn the cache back on. Wouldn't the direct-mapped cache eliminate the need for this anyway? You'd need to know that the misses have subsided, and I'm not sure how to do that without utilizing the cache. I did notice the cache gets turned off a fair bit; whether that forgoes a lot of potential cache hits, I don't have data to say. One thought might be to turn the cache back on after a GC during the mark phase, or maybe when the set gets reused through GetNewOrUsed.
Would be great if you could provide a win32 build somewhere for testing.
Blocks: 195408
Attached patch FindMember using a simple hash mechanism. (obsolete) (deleted) — Splinter Review
This uses a simple hash. I'm looking for input on a better algorithm: I'm basically stripping off the tag bits of the jsval and then AND'ing the last few bits to create an index into the cache. I'm seeing a fair number of collisions, but that may just be par for the course. I'm still experimenting with the cache size and the minimum number of members. This seems to give some DHTML pages a 25% boost in speed. I suspect we'll see similar gains for any JavaScript-based loop that makes native calls to relatively inexpensive native functions. I've been running with the patch for a while now and it seems pretty stable.
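For reference, the described hash reduces to roughly this (reconstructed, not quoted from the patch; MEMBER_CACHE_SIZE is assumed to be a power of two, and HashName is the function the FNV-1a suggestion below targets):

// "Strip off the tag bits of the jsval, then AND the last few bits":
// jsvals carry type tags in their low JSVAL_TAGBITS bits, so those are
// shifted away before masking down to a cache index.
inline static PRUint32 HashName(jsval name)
{
    return (NS_STATIC_CAST(PRUint32, name) >> JSVAL_TAGBITS) &
           PRUint32(MEMBER_CACHE_SIZE - 1);
}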
Attachment #128815 - Attachment is obsolete: true
Attachment #128862 - Attachment is obsolete: true
You could try out this implementation of HashName (I just happened to see this bug at about the same time I wanted to evaluate the FNV-1a hash, so...)

So, the two questions here are:
1) Does this appreciably decrease collisions and misses?
2) Does the expense of this HashName() compared to the old version outweigh the benefits gained from a greater hit ratio?

I don't have a reasonable testcase or instrumentation to answer those questions, I think.

inline static PRUint32
HashName(jsval name)
{
    // fnv-1a hash

    // fnv magic numbers
#define FNV_32_PRIME ((PRUint32)0x01000193)
#define FNV1_32_INIT ((PRUint32)0x811c9dc5)

    PRUint32 rtn = FNV1_32_INIT;
    PRUint32 value = NS_STATIC_CAST(PRUint32, name) >> JSVAL_TAGBITS;
    int i = sizeof(value);              // number of bytes of value to mix
    while (i--)
    {
        rtn ^= (unsigned char) value;   // mix in low 8 bits of value
        value >>= 8;
        rtn *= FNV_32_PRIME;
    }
    rtn ^= rtn >> 16;                   // fold some of the higher-bit entropy down
    rtn = rtn & PRUint32(MEMBER_CACHE_SIZE - 1);
    return rtn;
}
Blocks: 213943
David,

Though I haven't taken a look at the XPCOM code (haven't got any time, sorry), I have an algorithm you might be interested in. I assume that in the two-phase linear search you're talking about in your original report you are performing a string compare. I've used this algorithm successfully in a .Net environment, but it should work in C++ as well (though C++ is already more efficient by nature). It's fast, but I'm not sure if it beats a hashtable; that perhaps depends on the number of items in your list.

I did the following. I generated a 32-bit code for every string in my list in the following way:

Turn on bit 0 if the string contains an 'a' or 'A'
        bit 1 if the string contains a 'b' or 'B'
        ...
        bit 25 if the string contains a 'z' or 'Z'
        bit 26 if the string contains a digit
        bit 27 if the string contains a '_' or '+' or '-'
        ...

Other schemes are of course also possible, whatever suits your needs. This code works best if the strings in your list are sufficiently different. The code can be generated very fast by scanning the string once. For even more speed you can use a small 128-byte lookup table containing the bit values, which you can OR in.

Once you have to find a string in the list of strings, you generate this same code for the string to be found. You can now compare this code against the other codes. Once two codes match, you still need to check if the strings are really equal, but in general this reduces the string compare to an int compare, which is usually much faster. If you keep the list of codes sorted you can even do a binary search (beware that a given code can occur multiple times); this massively reduces the number of compares to be done if the list is sufficiently large.

Good luck.

ps. This algorithm works *very* well if you have to find a substring in a list of strings (you & the codes in the compare to see if all the characters occur in the string).
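A compact sketch of this scheme, shown only to make the bit layout concrete (StringSignature is a hypothetical helper, not XPConnect code):

// One pass per string; two strings can only be equal if their
// signatures are equal.
static PRUint32 StringSignature(const char* s)
{
    PRUint32 sig = 0;
    for(; *s; ++s)
    {
        char c = *s;
        if(c >= 'a' && c <= 'z')
            sig |= 1u << (c - 'a');      // bits 0-25: which letters occur
        else if(c >= 'A' && c <= 'Z')
            sig |= 1u << (c - 'A');
        else if(c >= '0' && c <= '9')
            sig |= 1u << 26;             // bit 26: contains a digit
        else if(c == '_' || c == '+' || c == '-')
            sig |= 1u << 27;             // bit 27: contains '_', '+' or '-'
    }
    return sig;
}

// Usage: (sigA == sigB && strcmp(a, b) == 0) reduces most comparisons to
// a single int compare; for the substring variant described in the ps.,
// (sigNeedle & sigHaystack) == sigNeedle filters out impossible matches.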
FindMember is comparing jsvals, which are basically integers and unique within the system, so the simple hash of jsvals is pretty fast. They essentially represent strings.
Quick update: I'm hoping within the next week to have some performance numbers and recommendations for setting the couple of constants in this patch. Then hopefully we can get this in; I think we'll see some marked improvements in DHTML as well as other areas.
Please compare the drop-in fnv-1a hash in comment 7 too. I'm curious as to whether the better distribution outweighs the increased complexity. I suspect not, but I don't have adequate instrumentation.
I did a while back, and the improvement to the hit/collision rate was small to nonexistent compared with using the magic number Brendan mentioned. And I'm pretty sure the code generated was more expensive. I'll double-check, though, as it's been a while and my memory may be in error.
I've no doubt that the hash is more expensive -- the one it replaces is super-light. If there's little discernable improvement in distribution then there's no benefit at all.
Blocks: 139986, 203448
Blocks: 210602
Blocks: 21762
Any chance of trying this for 1.7a?
I ran 100 runs of browser startup/shutdown for each cache setting. This setting was the best; it also performed better than the version without the patch. The improvement in startup/shutdown time is minimal, around half a percent. I'd still be interested in feedback from people who apply this patch and run various DHTML sites. I'd like to leave the hacker code in; it will only affect people mucking with XPConnect. So the patch is ready for review. Also, this patch is more than just the FindMember cache optimization; it also optimizes many of the FindMember loops.
Attachment #129404 - Attachment is obsolete: true
I've installed the patch and everything seems to work fine. I tried the following sites. I couldn't measure any significant improvement, but at least I didn't see any regression ;)

http://www.world-direct.com/mozilla/dhtml/3dcube/index.htm
http://www.world-direct.com/mozilla/dhtml/funo/dhtmltest_10to20ms.htm
http://www.world-direct.com/mozilla/dhtml/75121/anim-test.htm
http://www.world-direct.com/mozilla/dhtml/timeouttest_jpg_withoutscaling.htm

It might of course be that these tests hardly exercise the new code. If so, please point me to some tests where this patch can show its muscles, and I'll be happy to give you some numbers.
Should it speed up getElementById ?
Yes, I would expect improvement from repeated calls from JavaScript to getElementById, unless the cache entry gets overwritten. This is a simple direct hash, so if another function with the same hash value as getElementById were called in the same loop, you'd actually see a slight degradation, since it would be a constant miss.
Attachment #139865 - Flags: review?(jband_mozilla)
Are we ready to bake that on the trunk yet?
Is this still planned for the future, or is it a lost cause?
I think there may be some value in doing this, but there hasn't been any definitive proof this helps anything. If someone steps up and says this boosts some aspect of performance then I'd be willing to move forward with it. It's not a trivial change so there needs to be some reward for this risk.
Target Milestone: --- → Future
Are there already results available of a build with this patch running the DHTML tinderbox tests?
Has anyone run the DHTML tinderbox tests with this patch applied?
QA Contact: pschwartau → xpconnect
Comment on attachment 139865 [details] [diff] [review]
Same patch as before, but with cache settings set to optimal values

No need to review right now, until someone identifies a case where this provides a benefit.
Attachment #139865 - Flags: review?(jband_mozilla)
(In reply to comment #24)
> (From update of attachment 139865 [details] [diff] [review] [edit])
> No need to review right now, until someone identifies a case where this
> provides a benefit

David, can you supply a win32 build?
I'll see. I suspect this patch is a bit stale, but shouldn't be too hard to update. Then the next hurdle is finding some place to host the build.
(In reply to comment #26)
> I'll see. I suspect this patch is a bit stale, but shouldn't be too hard to
> update. Then the next hurdle is finding some place to host the build.

The top hit on Google looks good: http://www.mediafire.com/

100% Free Unlimited Disk Space
Up to 100MB per File
No Sign Up Required
Or you can just cheat and submit your patch to the Try server and let mozilla.org host the build: http://wiki.mozilla.org/Build:TryServer as a bonus, you get builds for Win32/Mac/Linux from that.
Can someone attach a minimal testcase for this bug, or set testcase-needed in Whiteboard?
Ah, I forgot: David Bradley, are you still working on this bug? If not, please reassign it to NOBODY.
I had hoped to see some feedback on the patch as to whether it improved any specific paths. From my findings there was a slight degradation in overall performance, with only modest gains in certain paths. At this point, unless someone with perf numbers can prove otherwise, I think this is a won't fix, at least using the tiny hash approach.
Status: ASSIGNED → RESOLVED
Closed: 16 years ago
Resolution: --- → WONTFIX