Closed
Bug 916851
Opened 11 years ago
Closed 11 years ago
Peacekeeper getElementsByTagName slower than in Chrome
Categories
(Core :: DOM: Core & HTML, defect)
Core
DOM: Core & HTML
Tracking
()
VERIFIED
FIXED
mozilla32
People
(Reporter: jandem, Assigned: jandem)
References
(Blocks 1 open bug, )
Details
(Keywords: perf)
Attachments
(1 file)
(deleted),
text/html
|
Details |
Attached is a testcase reduced from http://www.peacekeeper.therichins.net/test15.html It calls document.getElementsByTagName like this:
document.getElementsByTagName("a");
document.getElementsByTagName("h1");
document.getElementsByTagName("div");
Some numbers for this attachment:
Chrome 31: 485 ms
Nightly 26: 780 ms
Comment 1•11 years ago
|
||
Thank you for the testcase.
A profile shows the following:
1) TI fail for DOM proxies means we take the slow this-checking path here. That takes
about 25% of the total time in this case. See bug 820848.
2) 15% of the time is spent doing WrapNewBindingObjectHelper::Wrap. Some of this is call
overhead (it's not being inlined). Some is the fact that
nsIHTMLCollection::GetWrapperPreserveColor has to QI. We should really fix that
somehow. Peter?
3) About 10% of the time is jitcode.
4) 30% of the time is the hashtable lookup. Should we have a faster hashtable here? Or
a faster key umpl? 5.4% is in computing the hashcode and another 8% is in the
MatchEntry function. Maybe it's worth having separate caches for different list
types, with simpler keys?
5) 14% of the time is nsContentList::AddRef in NS_GetContentList. Most of this is
PR_GetCurrentThread. Is this just a nightly-only thing?
5) NS_GetContentList itself is about 5%; lots of code there. :(
Flags: needinfo?(peterv)
Flags: needinfo?(bugs)
Comment 2•11 years ago
|
||
(In reply to Boris Zbarsky [:bz] from comment #1)
> 5) 14% of the time is nsContentList::AddRef in NS_GetContentList. Most of
> this is
> PR_GetCurrentThread. Is this just a nightly-only thing?
That should be the thing khuey added. Makes profiling on Nightlies harder.
Flags: needinfo?(bugs) → needinfo?(khuey)
Comment 3•11 years ago
|
||
Hrm, yeah. Kyle, can I have a configure option for turning that off, please?
Assignee | ||
Comment 4•11 years ago
|
||
(In reply to Boris Zbarsky [:bz] from comment #3)
> Hrm, yeah. Kyle, can I have a configure option for turning that off, please?
Would a pref also work if it doesn't add too much overhead? As a SpiderMonkey hacker I don't do opt builds of the browser very often and use nightlies to run benchmarks..
Prefs are not really feasible but we plan to make it go away when --enable-profiling is set. I'll make that happen in bug 919380.
Flags: needinfo?(khuey)
Comment 6•11 years ago
|
||
So once bug 935274 lands later today, I get locally 480 and chrome gives 400.
(Nightly builds seem to be slower than my local opt builds, although both are compiled using
gcc 4.7.x. )
I guess we really need bug 820848.
Comment 7•11 years ago
|
||
Nightly is compiled with -O3 and my local builds were with -O2.
I guess Nightly is also pgo'ed and that affects to something, since even with -O3 I get faster times.
But I'll retry once bug 935274 is in Nightlies.
Comment 8•11 years ago
|
||
So I just tried this with a current inbound build. I get times in the 460ms range on the attached testcase. Chrome dev gets about 500ms, both on Mac. For comparison, Firefox 25 on this testcase is at 600ms. This is all on Mac.
A profile shows we're spending 81% of our time under DocumentBinding::genericMethod (the rest is jitcode), and 15% of our time _in_ genericMethod. That leaves 66% under DocumentBinding::getElementsByTagName.
Under there we have 5-6% getting string chars, 3% releasing content lists, 9% under Wrap() (of which 1.5% is GetWrapperPreserveColor), 40% under NS_GetContentList (mostly self time, with 6% string Equals() calls).
So bug 820848 will save us at least 15% here. But even without that at least on Mac we're faster than Chrome.
Comment 9•11 years ago
|
||
So I figured I'd check how much bug 820848 would really help us here. Some numbers on Mac:
Current inbound:
Testcase in bug: ~450ms
Original testcase: About 2,100,000 runs/s
With HTMLDocument not a proxy (simulating a fix for bug 820848):
Testcase in bug: ~360ms
Original testcase: About 2,500,000 runs/s
With also the fix for bug 938294 and bug 939581 and assuming bug 892687 gets fixed (aka "optimize all the real work away"):
Testcase in bug: ~22ms
Original testcase: About 8,000,000 runs/s
Chrome:
Testcase in bug: ~500ms
Original testcase: About 2,300,000 runs/s
Updated•11 years ago
|
Keywords: perf
Summary: getElementsByTagName slower than in Chrome → Peacekeeper getElementsByTagName slower than in Chrome
Comment 10•11 years ago
|
||
I just tested Nightly 29 (still without bug 939581), Firefox 27 (beta) and Chrome 31. On both Firefox versions I have the same add-ons, so I don't think this can be a problem.
I saw a regression on Nightly! No one else did?
Results:
Nightly - 1950ms and 524,242 runs/s
Beta - 1350ms and 684,504 runs/s
Chrome - 830ms and 1,304,597 runs/s
Comment 11•11 years ago
|
||
Bug 939581 won't affect this bug until bug 820848 is fixed.
Do you see a regression on nightly compared to nightlies when beta branched? In general, if you're testing in a 32-bit build I'd expect beta to be a bit faster than the corresponding nightly, because it has an extra register to use.
Comment 12•11 years ago
|
||
So I just tried today's nightly (on Mac, so 64-bit). I get times in the 446-454 range on the attached testcase. A nightly from 2013-09-16 (so about when beta 27 branched) gives me 697-706, which is about what I expected given that bug 935274 and bug 934946 were both fixed in the 28 cycle...
Comment 13•11 years ago
|
||
I can reproduce the huge difference on Win7. We need some profiling there.
Comment 14•11 years ago
|
||
On Linux using Zoom as profiler stuff under DocumentBinding::getElementsByTagName (100%)
is dominated by NS_GetContentList (69,8%) and nsContentListKey ctor (12.1%)
WrapNewBindingObjectHelper<>::Wrap is 8.7%
On Win7 + Gecko profiler the profile looks quite different.
WrapNewBindingObjectHelper<>::Wrap is 26.6% (most of that in ::Wrap itself)
Gecko profiler shows also tons of stuff spent in Release and nsRefPtr::operator= (although it seems to screw up
the classname, I assume it is nsIHTMLCollection).
AddRef under NS_GetContentList seems to take about the same amount % both on linux and win.
Do we end up having lots of cache misses or what here?
Comment 15•11 years ago
|
||
(And btw, just having Gecko profiler running makes the testcase take 2x more time.)
Comment 16•11 years ago
|
||
So on Windows you can reproduce the regression from 27 to tip?
Might be worth instrumenting the cache hit/miss situation on Windows...
Comment 17•11 years ago
|
||
Wait... odd.
Now I get 721 Chromium, 956 Nightly.
Gecko profiler had screwed something, I think. If it had had the JS profiling on at some point
that killed the performance and stopping the profiling didn't actually stop everything, I guess.
Comment 18•11 years ago
|
||
So, I don't trust those Windows profile numbers too much after all.
Looks like I get pretty similar results now on Linux and Windows Nightlies comparing to Chromium.
We're ~30% slower.
(Not testing local builds.)
Comment 19•11 years ago
|
||
With today's nightly I got 1159 and 714,157 runs/s. Are you sure bug 939581 doesn't affect the results?
If it doesn't, so it was a problem between yesterday's nightly and my machine, maybe.
Comment 20•11 years ago
|
||
> Are you sure bug 939581 doesn't affect the results?
99% sure, yes.
Note that on Mac we're about 20% faster than Chrome here (but comparing our 64-bit build to their 32-bit one).... I wonder what's going on with Linux/Windows.
Comment 21•11 years ago
|
||
But I can confirm that on Windows we're a lot slower than Chrome here...
Comment 22•11 years ago
|
||
So I tried comparing a 32-bit Mac build of Gecko to a 64-bit Mac build (by starting the nightly with arch -i386), and it's about 10% slower. Nowhere close to the difference we're seeing on Windows...
Comment 23•11 years ago
|
||
With bug 822442 fixed we seem to be pretty fast here... but again on Mac. Guillherme, are you willing to test on Windows?
Flags: needinfo?(guijoselito)
Comment 24•11 years ago
|
||
On Windows 8.1 Update 1 (64-bit) with tinderbox mozilla-inbound-win32-pgo build:
On Peacekeeper Original Testcase: 846240 ops/sec
On attached Testcase: 5 ms (Yeah! dialog appeared and it showed 5 ms) does it some kind of caching?
Build is from: https://hg.mozilla.org/integration/mozilla-inbound/rev/b73ad6f8bf26
It has bug fix you mentioned.
PC is i7 2600K (not overclocked) if it matters. Fresh profile.
Comment 25•11 years ago
|
||
zlip792, what numbers do you see on the original testcase in a build from before bug 822442 and in Chrome?
> On attached Testcase: 5 ms (Yeah! dialog appeared and it showed 5 ms) does it some kind
> of caching?
Most of the attached testcase now gets loop-hoisted. ;)
Comment 26•11 years ago
|
||
On New Profile on today's (April 16 - CSET: dd50745d7f35) Nightly, I get these numbers:
Original Testcase: 614929.5 op/sec
Attached Testcase: 748 ms
I have Chromium snapshots which I don't think have PGO so they might be slower but these are numbers what I get:
Original Testcase: 1562052 op/sec
Attached Testcase: 534 ms
Chromium Version 36.0.1942.0 (264038) with V8 3.26.15
If need any help.. let me know...
Comment 27•11 years ago
|
||
Chrome doesn't do PGO in general, so that part is OK.
Thanks for those numbers!
It sounds like we need to do something else here, but on Windows only. :( On Mac, we're 4x faster than Chrome on the original testcase here, while on Windows we're 2x slower!
We need to measure/profile on Windows somehow to see what the heck is going on. Jan, can you do that for the JIT side? I'll see what I can figure out for the DOM side.
Flags: needinfo?(guijoselito) → needinfo?(jdemooij)
Comment 28•11 years ago
|
||
(In reply to Boris Zbarsky [:bz] from comment #27)
> Chrome doesn't do PGO in general, so that part is OK.
Are you sure? There's definitely a difference between Chromium snapshots and Chrome Canary - I believe they compile the latter with assertions disabled and I thought they also used PGO (I know h4writer looked into it for getting browser numbers on AWFY, but I'm not sure of the details anymore). The difference generally isn't that big though.
Comment 29•11 years ago
|
||
> Are you sure?
Not 100% positive, but last I checked that was the case.
In any case, I'm working on seeing what numbers I can get out of Windows here.
Comment 30•11 years ago
|
||
OK, so I tried this on Windows...
Using the build linked in comment 24, I get 1773760.5 ops/second. Chrome release on the same machine gets 1602994.5 ops/second. That already doesn't match the numbers reported in comment 24 very well. :(
If I then comment out all the getElementsByTagName calls in the benchmark, Chrome release ends up at 4300533.5 ops/second. The build linked in comment 24 ends up at 1887636.5 ops/second.
So whatever is going on there is a pure JS/JIT issue, seems like.
Comment 31•11 years ago
|
||
For comparison, on my Mac the numbers for the testcase with the getElementsByTagName calls commented out are 7926881 in my local opt build and 7100918.5 in Chrome. Without it commented out Chrome is at about 2005234.5 and we're at about 6056112.5.
So in both cases Chrome gets about 3-4x faster when commenting out those calls, while we get about 10-20% faster. It's just that our baseline on Windows is way worse for some reason. :(
Comment 32•11 years ago
|
||
Just to confirm I see the same numbers as you on Windows:
Nightly pre-822442
testcase 1195ms
original test 502413 ops/s
Chrome 34
testcase 820ms
original test 1343121 ops/s
Nightly pos-822442
testcase 7ms
original test 1202209 ops/s
Assignee | ||
Comment 33•11 years ago
|
||
(In reply to Boris Zbarsky [:bz] (on PTO, reviews are slow) from comment #27)
> We need to measure/profile on Windows somehow to see what the heck is going
> on. Jan, can you do that for the JIT side? I'll see what I can figure out
> for the DOM side.
Ouch, looking at benchmark.js I'm pretty sure this is bug 603872 :(
I'll try to fix that this week, this is ridiculous.
Flags: needinfo?(jdemooij)
Assignee | ||
Comment 34•11 years ago
|
||
bz can you confirm this is fixed with a recent inbound build (one that includes bug 1004923)?
Flags: needinfo?(bzbarsky)
Comment 35•11 years ago
|
||
On the same machine as the testing in comment 30, I see about the same numbers as before with Chrome and an inbound build from before bug 1004923. An inbound build from after bug 1004923 is more like 3634705 runs/s. I'd call that fixed. ;)
Flags: needinfo?(bzbarsky)
Updated•11 years ago
|
Flags: needinfo?(peterv)
Assignee | ||
Comment 36•11 years ago
|
||
Thanks Boris, will close this when bug 1004923 is on m-c.
Depends on: 1004923
Assignee | ||
Comment 37•11 years ago
|
||
This should be fixed on all platforms now.
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
Assignee: nobody → jdemooij
Target Milestone: --- → mozilla32
Comment 38•10 years ago
|
||
I'm using the latest Beta 32.0b9 and Ubuntu 13.04 x32:
The results are still twice as better for Chrome than Firefox eg:
Firefox:
benchmark.submitResult: Submit ok.
benchmark.submitResult: Submitting test arrayWeighted from suite array result: 128753
benchmark: Elapsed time 2000. Operations: 257506. Operations per second: 128753
Chrome:
benchmark.submitResult: Submit ok.
benchmark.submitResult: Submitting test arrayWeighted from suite array result: 323186.5
benchmark: Elapsed time 2000. Operations: 646373. Operations per second: 323186.5
Comment 39•10 years ago
|
||
Catalin, that's using the 32-bit build of Firefox on Ubuntu, right?
If there remaining issue is Linux-only (or worse yet 32-bit-Linux-only), we should file a separate bug for it; if commonly used configurations are affected it might be worth reopening this one.
Updated•10 years ago
|
Flags: needinfo?(catalin.varga)
Comment 40•10 years ago
|
||
Boris, that is indeed the 32 bits Ubuntu but the same behavior was reproduced on Mac 10.9.4 using FF32.0b9:
Chrome:
benchmark.submitResult: Submitting test arrayWeighted from suite array result: 333235
benchmark: Elapsed time 2000. Operations: 666470. Operations per second: 333235
Firefox:
benchmark.submitResult: Submitting test arrayWeighted from suite array result: 180253
benchmark: Elapsed time 2000. Operations: 360506. Operations per second: 180253
Flags: needinfo?(catalin.varga)
Comment 41•10 years ago
|
||
This are the values for
FF32.0b8 and Win 7 x64 (I've used FF32.0b8 because FF32.0b9 for windows was not created yet):
Firefox:
benchmark.submitResult: Submitting test arrayWeighted from suite array result: 123810.5
benchmark: Elapsed time 2000. Operations: 247621. Operations per second: 123810.5
Chrome:
benchmark.submitResult: Submitting test arrayWeighted from suite array result: 278507
benchmark: Elapsed time 2000. Operations: 557014. Operations per second: 278507
Comment 42•10 years ago
|
||
Catalin, thank you for testing that.
Here's what I see on Mac:
Chrome (38.0.2125.8 (Official Build 290962) dev):
benchmark.submitResult: Submitting test arrayWeighted from suite array result: 489359
benchmark: Elapsed time 2000. Operations: 978718. Operations per second: 489359
Firefox 32 beta (whatever is latest on the beta channel):
benchmark.submitResult: Submitting test arrayWeighted from suite array result: 290255
benchmark: Elapsed time 2000. Operations: 580510. Operations per second: 290255
Firefox 33 aurora (whatever is latest on that channel):
benchmark.submitResult: Submitting test arrayWeighted from suite array result: 307812
benchmark: Elapsed time 2000. Operations: 615624. Operations per second: 307812
Firefox 34 nightly (today's):
benchmark.submitResult: Submitting test arrayWeighted from suite array result: 551991
benchmark: Elapsed time 2000. Operations: 1103982. Operations per second: 551991
Firefox 32 nightly from immediately before 32 went to Aurora:
benchmark.submitResult: Submitting test arrayWeighted from suite array result: 236820.5
benchmark: Elapsed time 2000. Operations: 473641. Operations per second: 236820.5
Firefox 33 nightly from immediately before 33 went to Aurora:
benchmark.submitResult: Submitting test arrayWeighted from suite array result: 292173
benchmark: Elapsed time 2000. Operations: 584346. Operations per second: 292173
I should note that the numbers reported earlier in this bug are about an order of magniture higher for both Chrome and Firefox, so it's possible the testcase was changed partway through...
Comment 43•10 years ago
|
||
So from what I understand Latest 34 Nightly build is slightly faster than Chrome, while Beta 32 and Aurora 33 are still about 2 times slower.
Since this is targeted for 32, but appears to only be fixed in 34, what is the call here? Shall we call 32 and 33 affected and 34 fixed? Jan, what's your take on this?
Flags: needinfo?(jdemooij)
Assignee | ||
Comment 44•10 years ago
|
||
(In reply to Florin Mezei, QA (:FlorinMezei) from comment #43)
> Since this is targeted for 32, but appears to only be fixed in 34, what is
> the call here? Shall we call 32 and 33 affected and 34 fixed? Jan, what's
> your take on this?
(In reply to Boris Zbarsky [:bz] from comment #42)
> benchmark.submitResult: Submitting test arrayWeighted from suite array
>
> I should note that the numbers reported earlier in this bug are about an
> order of magniture higher for both Chrome and Firefox, so it's possible the
> testcase was changed partway through...
Hm, it looks like the URL in comment 0 now points to the arrayWeighted test instead of the getElementsByTagName thing. Copy-on-write arrays (bug 934450) is why arrayWeighted got so much faster in Firefox 34.
Apparently domGetElements is test16 now :(
http://www.peacekeeper.therichins.net/test16.html
Flags: needinfo?(jdemooij)
Comment 45•10 years ago
|
||
I verified the bug using http://www.peacekeeper.therichins.net/test16.html test and this are the values found:
Windows 7 x64:
Chrome
benchmark.submitResult: Submitting test domGetElements from suite dom result: 1524270.5
benchmark: Elapsed time 2000. Operations: 3048541. Operations per second: 1524270.5
Firefox 32.0b9
benchmark.submitResult: Submitting test domGetElements from suite dom result: 2225822.5
benchmark: Elapsed time 2000. Operations: 4451645. Operations per second: 2225822.5
Mas Os X 10.9.4
Chrome
benchmark.submitResult: Submitting test domGetElements from suite dom result: 2019774
benchmark: Elapsed time 2000. Operations: 4039548. Operations per second: 2019774
Firefox 32.0b9
benchmark.submitResult: Submitting test domGetElements from suite dom result: 3353293.5
benchmark: Elapsed time 2000. Operations: 6706587. Operations per second: 3353293.5
Ubuntu 14.04 x 64
Chrome
benchmark.submitResult: Submitting test domGetElements from suite dom result: 1994682
benchmark: Elapsed time 2000. Operations: 3989364. Operations per second: 1994682
Firefox 32.0b9
benchmark.submitResult: Submitting test domGetElements from suite dom result: 3019200
benchmark: Elapsed time 2000. Operations: 6038400. Operations per second: 3019200
on all operating systems Firefox is performing more operations than Chrome on the same allocated time.
Status: RESOLVED → VERIFIED
Keywords: verifyme
Comment 46•10 years ago
|
||
> Hm, it looks like the URL in comment 0 now points to the arrayWeighted test
Ouch. This is why we should attach unreduced perf testcases to bugs too. :(
Updated•10 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•