Closed Bug 916851 Opened 11 years ago Closed 11 years ago

Peacekeeper getElementsByTagName slower than in Chrome

Categories

(Core :: DOM: Core & HTML, defect)

defect
Not set
normal

Tracking

()

VERIFIED FIXED
mozilla32

People

(Reporter: jandem, Assigned: jandem)

References

(Blocks 1 open bug, )

Details

(Keywords: perf)

Attachments

(1 file)

Attached file Test case (deleted) —
Attached is a testcase reduced from http://www.peacekeeper.therichins.net/test15.html It calls document.getElementsByTagName like this: document.getElementsByTagName("a"); document.getElementsByTagName("h1"); document.getElementsByTagName("div"); Some numbers for this attachment: Chrome 31: 485 ms Nightly 26: 780 ms
Thank you for the testcase. A profile shows the following: 1) TI fail for DOM proxies means we take the slow this-checking path here. That takes about 25% of the total time in this case. See bug 820848. 2) 15% of the time is spent doing WrapNewBindingObjectHelper::Wrap. Some of this is call overhead (it's not being inlined). Some is the fact that nsIHTMLCollection::GetWrapperPreserveColor has to QI. We should really fix that somehow. Peter? 3) About 10% of the time is jitcode. 4) 30% of the time is the hashtable lookup. Should we have a faster hashtable here? Or a faster key umpl? 5.4% is in computing the hashcode and another 8% is in the MatchEntry function. Maybe it's worth having separate caches for different list types, with simpler keys? 5) 14% of the time is nsContentList::AddRef in NS_GetContentList. Most of this is PR_GetCurrentThread. Is this just a nightly-only thing? 5) NS_GetContentList itself is about 5%; lots of code there. :(
Flags: needinfo?(peterv)
Flags: needinfo?(bugs)
(In reply to Boris Zbarsky [:bz] from comment #1) > 5) 14% of the time is nsContentList::AddRef in NS_GetContentList. Most of > this is > PR_GetCurrentThread. Is this just a nightly-only thing? That should be the thing khuey added. Makes profiling on Nightlies harder.
Flags: needinfo?(bugs) → needinfo?(khuey)
Hrm, yeah. Kyle, can I have a configure option for turning that off, please?
(In reply to Boris Zbarsky [:bz] from comment #3) > Hrm, yeah. Kyle, can I have a configure option for turning that off, please? Would a pref also work if it doesn't add too much overhead? As a SpiderMonkey hacker I don't do opt builds of the browser very often and use nightlies to run benchmarks..
Prefs are not really feasible but we plan to make it go away when --enable-profiling is set. I'll make that happen in bug 919380.
Flags: needinfo?(khuey)
Depends on: 820848
No longer blocks: 932322
So once bug 935274 lands later today, I get locally 480 and chrome gives 400. (Nightly builds seem to be slower than my local opt builds, although both are compiled using gcc 4.7.x. ) I guess we really need bug 820848.
Nightly is compiled with -O3 and my local builds were with -O2. I guess Nightly is also pgo'ed and that affects to something, since even with -O3 I get faster times. But I'll retry once bug 935274 is in Nightlies.
So I just tried this with a current inbound build. I get times in the 460ms range on the attached testcase. Chrome dev gets about 500ms, both on Mac. For comparison, Firefox 25 on this testcase is at 600ms. This is all on Mac. A profile shows we're spending 81% of our time under DocumentBinding::genericMethod (the rest is jitcode), and 15% of our time _in_ genericMethod. That leaves 66% under DocumentBinding::getElementsByTagName. Under there we have 5-6% getting string chars, 3% releasing content lists, 9% under Wrap() (of which 1.5% is GetWrapperPreserveColor), 40% under NS_GetContentList (mostly self time, with 6% string Equals() calls). So bug 820848 will save us at least 15% here. But even without that at least on Mac we're faster than Chrome.
Depends on: 939581
Depends on: 892687
So I figured I'd check how much bug 820848 would really help us here. Some numbers on Mac: Current inbound: Testcase in bug: ~450ms Original testcase: About 2,100,000 runs/s With HTMLDocument not a proxy (simulating a fix for bug 820848): Testcase in bug: ~360ms Original testcase: About 2,500,000 runs/s With also the fix for bug 938294 and bug 939581 and assuming bug 892687 gets fixed (aka "optimize all the real work away"): Testcase in bug: ~22ms Original testcase: About 8,000,000 runs/s Chrome: Testcase in bug: ~500ms Original testcase: About 2,300,000 runs/s
Keywords: perf
Summary: getElementsByTagName slower than in Chrome → Peacekeeper getElementsByTagName slower than in Chrome
No longer depends on: 892687
I just tested Nightly 29 (still without bug 939581), Firefox 27 (beta) and Chrome 31. On both Firefox versions I have the same add-ons, so I don't think this can be a problem. I saw a regression on Nightly! No one else did? Results: Nightly - 1950ms and 524,242 runs/s Beta - 1350ms and 684,504 runs/s Chrome - 830ms and 1,304,597 runs/s
Bug 939581 won't affect this bug until bug 820848 is fixed. Do you see a regression on nightly compared to nightlies when beta branched? In general, if you're testing in a 32-bit build I'd expect beta to be a bit faster than the corresponding nightly, because it has an extra register to use.
So I just tried today's nightly (on Mac, so 64-bit). I get times in the 446-454 range on the attached testcase. A nightly from 2013-09-16 (so about when beta 27 branched) gives me 697-706, which is about what I expected given that bug 935274 and bug 934946 were both fixed in the 28 cycle...
I can reproduce the huge difference on Win7. We need some profiling there.
On Linux using Zoom as profiler stuff under DocumentBinding::getElementsByTagName (100%) is dominated by NS_GetContentList (69,8%) and nsContentListKey ctor (12.1%) WrapNewBindingObjectHelper<>::Wrap is 8.7% On Win7 + Gecko profiler the profile looks quite different. WrapNewBindingObjectHelper<>::Wrap is 26.6% (most of that in ::Wrap itself) Gecko profiler shows also tons of stuff spent in Release and nsRefPtr::operator= (although it seems to screw up the classname, I assume it is nsIHTMLCollection). AddRef under NS_GetContentList seems to take about the same amount % both on linux and win. Do we end up having lots of cache misses or what here?
(And btw, just having Gecko profiler running makes the testcase take 2x more time.)
So on Windows you can reproduce the regression from 27 to tip? Might be worth instrumenting the cache hit/miss situation on Windows...
Wait... odd. Now I get 721 Chromium, 956 Nightly. Gecko profiler had screwed something, I think. If it had had the JS profiling on at some point that killed the performance and stopping the profiling didn't actually stop everything, I guess.
So, I don't trust those Windows profile numbers too much after all. Looks like I get pretty similar results now on Linux and Windows Nightlies comparing to Chromium. We're ~30% slower. (Not testing local builds.)
With today's nightly I got 1159 and 714,157 runs/s. Are you sure bug 939581 doesn't affect the results? If it doesn't, so it was a problem between yesterday's nightly and my machine, maybe.
> Are you sure bug 939581 doesn't affect the results? 99% sure, yes. Note that on Mac we're about 20% faster than Chrome here (but comparing our 64-bit build to their 32-bit one).... I wonder what's going on with Linux/Windows.
But I can confirm that on Windows we're a lot slower than Chrome here...
So I tried comparing a 32-bit Mac build of Gecko to a 64-bit Mac build (by starting the nightly with arch -i386), and it's about 10% slower. Nowhere close to the difference we're seeing on Windows...
With bug 822442 fixed we seem to be pretty fast here... but again on Mac. Guillherme, are you willing to test on Windows?
Flags: needinfo?(guijoselito)
On Windows 8.1 Update 1 (64-bit) with tinderbox mozilla-inbound-win32-pgo build: On Peacekeeper Original Testcase: 846240 ops/sec On attached Testcase: 5 ms (Yeah! dialog appeared and it showed 5 ms) does it some kind of caching? Build is from: https://hg.mozilla.org/integration/mozilla-inbound/rev/b73ad6f8bf26 It has bug fix you mentioned. PC is i7 2600K (not overclocked) if it matters. Fresh profile.
zlip792, what numbers do you see on the original testcase in a build from before bug 822442 and in Chrome? > On attached Testcase: 5 ms (Yeah! dialog appeared and it showed 5 ms) does it some kind > of caching? Most of the attached testcase now gets loop-hoisted. ;)
On New Profile on today's (April 16 - CSET: dd50745d7f35) Nightly, I get these numbers: Original Testcase: 614929.5 op/sec Attached Testcase: 748 ms I have Chromium snapshots which I don't think have PGO so they might be slower but these are numbers what I get: Original Testcase: 1562052 op/sec Attached Testcase: 534 ms Chromium Version 36.0.1942.0 (264038) with V8 3.26.15 If need any help.. let me know...
Chrome doesn't do PGO in general, so that part is OK. Thanks for those numbers! It sounds like we need to do something else here, but on Windows only. :( On Mac, we're 4x faster than Chrome on the original testcase here, while on Windows we're 2x slower! We need to measure/profile on Windows somehow to see what the heck is going on. Jan, can you do that for the JIT side? I'll see what I can figure out for the DOM side.
Flags: needinfo?(guijoselito) → needinfo?(jdemooij)
(In reply to Boris Zbarsky [:bz] from comment #27) > Chrome doesn't do PGO in general, so that part is OK. Are you sure? There's definitely a difference between Chromium snapshots and Chrome Canary - I believe they compile the latter with assertions disabled and I thought they also used PGO (I know h4writer looked into it for getting browser numbers on AWFY, but I'm not sure of the details anymore). The difference generally isn't that big though.
> Are you sure? Not 100% positive, but last I checked that was the case. In any case, I'm working on seeing what numbers I can get out of Windows here.
OK, so I tried this on Windows... Using the build linked in comment 24, I get 1773760.5 ops/second. Chrome release on the same machine gets 1602994.5 ops/second. That already doesn't match the numbers reported in comment 24 very well. :( If I then comment out all the getElementsByTagName calls in the benchmark, Chrome release ends up at 4300533.5 ops/second. The build linked in comment 24 ends up at 1887636.5 ops/second. So whatever is going on there is a pure JS/JIT issue, seems like.
For comparison, on my Mac the numbers for the testcase with the getElementsByTagName calls commented out are 7926881 in my local opt build and 7100918.5 in Chrome. Without it commented out Chrome is at about 2005234.5 and we're at about 6056112.5. So in both cases Chrome gets about 3-4x faster when commenting out those calls, while we get about 10-20% faster. It's just that our baseline on Windows is way worse for some reason. :(
Just to confirm I see the same numbers as you on Windows: Nightly pre-822442 testcase 1195ms original test 502413 ops/s Chrome 34 testcase 820ms original test 1343121 ops/s Nightly pos-822442 testcase 7ms original test 1202209 ops/s
(In reply to Boris Zbarsky [:bz] (on PTO, reviews are slow) from comment #27) > We need to measure/profile on Windows somehow to see what the heck is going > on. Jan, can you do that for the JIT side? I'll see what I can figure out > for the DOM side. Ouch, looking at benchmark.js I'm pretty sure this is bug 603872 :( I'll try to fix that this week, this is ridiculous.
Flags: needinfo?(jdemooij)
bz can you confirm this is fixed with a recent inbound build (one that includes bug 1004923)?
Flags: needinfo?(bzbarsky)
On the same machine as the testing in comment 30, I see about the same numbers as before with Chrome and an inbound build from before bug 1004923. An inbound build from after bug 1004923 is more like 3634705 runs/s. I'd call that fixed. ;)
Flags: needinfo?(bzbarsky)
Flags: needinfo?(peterv)
Thanks Boris, will close this when bug 1004923 is on m-c.
Depends on: 1004923
This should be fixed on all platforms now.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Assignee: nobody → jdemooij
Target Milestone: --- → mozilla32
I'm using the latest Beta 32.0b9 and Ubuntu 13.04 x32: The results are still twice as better for Chrome than Firefox eg: Firefox: benchmark.submitResult: Submit ok. benchmark.submitResult: Submitting test arrayWeighted from suite array result: 128753 benchmark: Elapsed time 2000. Operations: 257506. Operations per second: 128753 Chrome: benchmark.submitResult: Submit ok. benchmark.submitResult: Submitting test arrayWeighted from suite array result: 323186.5 benchmark: Elapsed time 2000. Operations: 646373. Operations per second: 323186.5
Catalin, that's using the 32-bit build of Firefox on Ubuntu, right? If there remaining issue is Linux-only (or worse yet 32-bit-Linux-only), we should file a separate bug for it; if commonly used configurations are affected it might be worth reopening this one.
Flags: needinfo?(catalin.varga)
Boris, that is indeed the 32 bits Ubuntu but the same behavior was reproduced on Mac 10.9.4 using FF32.0b9: Chrome: benchmark.submitResult: Submitting test arrayWeighted from suite array result: 333235 benchmark: Elapsed time 2000. Operations: 666470. Operations per second: 333235 Firefox: benchmark.submitResult: Submitting test arrayWeighted from suite array result: 180253 benchmark: Elapsed time 2000. Operations: 360506. Operations per second: 180253
Flags: needinfo?(catalin.varga)
This are the values for FF32.0b8 and Win 7 x64 (I've used FF32.0b8 because FF32.0b9 for windows was not created yet): Firefox: benchmark.submitResult: Submitting test arrayWeighted from suite array result: 123810.5 benchmark: Elapsed time 2000. Operations: 247621. Operations per second: 123810.5 Chrome: benchmark.submitResult: Submitting test arrayWeighted from suite array result: 278507 benchmark: Elapsed time 2000. Operations: 557014. Operations per second: 278507
Catalin, thank you for testing that. Here's what I see on Mac: Chrome (38.0.2125.8 (Official Build 290962) dev): benchmark.submitResult: Submitting test arrayWeighted from suite array result: 489359 benchmark: Elapsed time 2000. Operations: 978718. Operations per second: 489359 Firefox 32 beta (whatever is latest on the beta channel): benchmark.submitResult: Submitting test arrayWeighted from suite array result: 290255 benchmark: Elapsed time 2000. Operations: 580510. Operations per second: 290255 Firefox 33 aurora (whatever is latest on that channel): benchmark.submitResult: Submitting test arrayWeighted from suite array result: 307812 benchmark: Elapsed time 2000. Operations: 615624. Operations per second: 307812 Firefox 34 nightly (today's): benchmark.submitResult: Submitting test arrayWeighted from suite array result: 551991 benchmark: Elapsed time 2000. Operations: 1103982. Operations per second: 551991 Firefox 32 nightly from immediately before 32 went to Aurora: benchmark.submitResult: Submitting test arrayWeighted from suite array result: 236820.5 benchmark: Elapsed time 2000. Operations: 473641. Operations per second: 236820.5 Firefox 33 nightly from immediately before 33 went to Aurora: benchmark.submitResult: Submitting test arrayWeighted from suite array result: 292173 benchmark: Elapsed time 2000. Operations: 584346. Operations per second: 292173 I should note that the numbers reported earlier in this bug are about an order of magniture higher for both Chrome and Firefox, so it's possible the testcase was changed partway through...
So from what I understand Latest 34 Nightly build is slightly faster than Chrome, while Beta 32 and Aurora 33 are still about 2 times slower. Since this is targeted for 32, but appears to only be fixed in 34, what is the call here? Shall we call 32 and 33 affected and 34 fixed? Jan, what's your take on this?
Flags: needinfo?(jdemooij)
(In reply to Florin Mezei, QA (:FlorinMezei) from comment #43) > Since this is targeted for 32, but appears to only be fixed in 34, what is > the call here? Shall we call 32 and 33 affected and 34 fixed? Jan, what's > your take on this? (In reply to Boris Zbarsky [:bz] from comment #42) > benchmark.submitResult: Submitting test arrayWeighted from suite array > > I should note that the numbers reported earlier in this bug are about an > order of magniture higher for both Chrome and Firefox, so it's possible the > testcase was changed partway through... Hm, it looks like the URL in comment 0 now points to the arrayWeighted test instead of the getElementsByTagName thing. Copy-on-write arrays (bug 934450) is why arrayWeighted got so much faster in Firefox 34. Apparently domGetElements is test16 now :( http://www.peacekeeper.therichins.net/test16.html
Flags: needinfo?(jdemooij)
I verified the bug using http://www.peacekeeper.therichins.net/test16.html test and this are the values found: Windows 7 x64: Chrome benchmark.submitResult: Submitting test domGetElements from suite dom result: 1524270.5 benchmark: Elapsed time 2000. Operations: 3048541. Operations per second: 1524270.5 Firefox 32.0b9 benchmark.submitResult: Submitting test domGetElements from suite dom result: 2225822.5 benchmark: Elapsed time 2000. Operations: 4451645. Operations per second: 2225822.5 Mas Os X 10.9.4 Chrome benchmark.submitResult: Submitting test domGetElements from suite dom result: 2019774 benchmark: Elapsed time 2000. Operations: 4039548. Operations per second: 2019774 Firefox 32.0b9 benchmark.submitResult: Submitting test domGetElements from suite dom result: 3353293.5 benchmark: Elapsed time 2000. Operations: 6706587. Operations per second: 3353293.5 Ubuntu 14.04 x 64 Chrome benchmark.submitResult: Submitting test domGetElements from suite dom result: 1994682 benchmark: Elapsed time 2000. Operations: 3989364. Operations per second: 1994682 Firefox 32.0b9 benchmark.submitResult: Submitting test domGetElements from suite dom result: 3019200 benchmark: Elapsed time 2000. Operations: 6038400. Operations per second: 3019200 on all operating systems Firefox is performing more operations than Chrome on the same allocated time.
Status: RESOLVED → VERIFIED
Keywords: verifyme
> Hm, it looks like the URL in comment 0 now points to the arrayWeighted test Ouch. This is why we should attach unreduced perf testcases to bugs too. :(
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: