Closed
Bug 606648
Opened 14 years ago
Closed 11 years ago
typeof(this.data[x]) != "undefined" is slow
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
FIXED
mozilla26
People
(Reporter: developer, Assigned: jandem)
References
(Blocks 2 open bugs, )
Details
(Keywords: perf, Whiteboard: [jsperf])
Attachments
(4 files)
User-Agent: Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.9.2.11) Gecko/20101012 Firefox/3.6.11
Build Identifier: Mozilla/5.0 (Windows NT 5.1; rv:2.0b8pre) Gecko/20101022 Firefox/4.0b8pre
In the peacekeeper performance tests, the "ripple" tests use typeof(this.data[x]) != "undefined" and typeof(this.data[x][y]) != "undefined" over and over again. If I swap out those checks for another equivalent test (and comment out the canvas render code), the fps almost double (27 fps to 51 fps) on Windows XP.
I've made a page that shows the slowdown. If you go to http://www.peacekeeper.therichins.net/test10b.html, you'll see the time it took to do just 100 iterations using the typeof operator. If you go to http://www.peacekeeper.therichins.net/test10c.html, you'll see the same test without the typeof operator.
On my box, it goes from about 2s to sub 1s. Chrome is sub 1s for both of the versions.
The lines that are modified are:
if (typeof(this.data[x]) != "undefined" && typeof(this.data[x][y]) != "undefined") {
to:
if (x > 0 && x < this.width && y > 0 && y < this.height) {
Both are equivalent in this test.
I originally though that this was the same as bug 470143 but they said it looked like it was a different bug. This should probably block 499198 as it is related to Peacekeeper.
Reproducible: Always
Steps to Reproduce:
1. Load http://www.peacekeeper.therichins.net/test10b.html
2. Load http://www.peacekeeper.therichins.net/test10c.html
3. Compare the times
Actual Results:
Half the time spent in test10c
Expected Results:
For test10b to much faster.
Comment 1•14 years ago
|
||
Thanks for the report! Having the typeof on the out-of-bounds access means we have to generate a separate trace due to the guard on being out-of-bounds. Which in this case means we end up having a somewhat branchy tree, I bet, since there are lots of get() calls in a row.
That said, over here I see us take about 240ms on the slow version, 180ms on the fast one, if we trace the whole thing. That's much faster than Chrome (over 2x faster).
I can confirm that in Firefox 3.6 those testcases are a lot slower (750ms for the fast one, 1100ms for the slow one over here).
I filed bug 606650 on the fact that right this second this isn't tracing due to the heuristics patch.
Blocks: peacekeeper
Comment 2•14 years ago
|
||
Oh, in practice I guess we never end up in the undefined case.... So we don't end up with a branchy trace after all.
Note bug 591172.
Updated•14 years ago
|
Whiteboard: [jsperf]
Comment 3•14 years ago
|
||
So at this point we do _faster_ on test10b than on test10c, because jitprofiling makes us trace the former but not the latter (see bug 607201). The tracer kicks V8's butt here, but JM is way slower than V8. I tried JM+TI and it's faster than just JM, but nowhere close to V8, much less the tracer....
Brian, want to take a look at this for TI? Per comments in bug 607201 at least the test10c case should be doable with better interval analysis.
Comment 4•14 years ago
|
||
Comment 5•14 years ago
|
||
Updated•14 years ago
|
Blocks: infer-perf-regress
Comment 6•14 years ago
|
||
From profiling the JS, we seem to spend most of the time in processBuffers. After editing the testcase to see what helps, I think doing the below would get JM+TI roughly on par with TM.
- We need to inline the get() and set() calls --- JM+TI only inlines when the called function is a singleton (there can only ever be one in existence). This shouldn't be hard to fix, we just need to make sure we can expand the inlined frames correctly.
- 'this.a' and 'this.b' are inferred as either bools or Buffers (due to the 'false' initializations of 'a' and 'b'). We need read barriers on these accesses as the only type that could be observed here is Buffer.
- The 'this.a' and 'this.b' accesses in the processBuffers loop need to be LICM'ed, along with the read barriers on these accesses.
- The Math.round call in the 'set' function can overflow (it can produce -0 at the very least), which causes us to never inline this call. We should be more robust.
Comment 7•13 years ago
|
||
So, I'm using Firefox 9.0.1 and both testcases take the same amount of time.
Comment 8•13 years ago
|
||
Yes. The bug sort of morphed once the tracer went away... Maybe it's better to file a new bug on the remaining issue.
Comment 9•11 years ago
|
||
So, there was a problem, the problem partially went away, and now it's back again.
Nightly 26.0a1 (2013-09-05)
test10b - 1362ms
test10c - 385ms
Comment 10•11 years ago
|
||
Guilherme, are you willing to hunt down when 10b became slower than 10c (the nightly range should be fine)?
Flags: needinfo?(guijoselito)
Comment 11•11 years ago
|
||
I didn't do exactly what you asked.
I tested a few versions, trying the avaliable combinations.
The results are (10b/10c)
Firefox 17 - 990/885 (JM+TI) - 1275/1210 (TI-OFF)
Firefox 18 - 2400/1230 (JM+TI+ION) - 2100/910 (JM+TI) - 1700/920 (ION+TI)
Firefox 23 - 1480/860 (BC+ION) - 2700/920 (JM)
Firefox 24 - 1500/800 (BC+ION) - 5200/3700 (BC)
Nightly 26 - 1360/380 (BC+ION)
It seems to me that Ion just never was fast on 10b. And recently something made 10c much faster.
If you want me to get one of the exactly regressions, just ask.
Flags: needinfo?(guijoselito)
Comment 12•11 years ago
|
||
Thanks. That's good enough. ;)
So what I see in a profiler is a js::TypeOfOperation stubcall. I wonder whether we can get rid of that, esp in the != undefined case.....
Flags: needinfo?(jdemooij)
Assignee | ||
Comment 13•11 years ago
|
||
Assignee | ||
Comment 14•11 years ago
|
||
Assignee | ||
Comment 15•11 years ago
|
||
For the shell tests I get:
10b:
SM : 584 ms
d8 : 375 ms
10c:
SM : 127 ms
d8 : 305 ms
So we're doing really well on the modified version, but 10b is the one that matters. Looking into that now.
Assignee | ||
Comment 16•11 years ago
|
||
Number of things we can do here, most important first:
(1) LTypeOfV should not use a VM call for objects. Thanks to Tom, the object typeof hook is gone, so the remaining interesting cases are:
obj->isCallable() => "function"
EmulatesUndefined(obj) => "undefined"
We could inline these checks, but thanks to TypeObject determining Class now, we should be able to guard against these statically as well and just use "object" if the type is an object. I will try this first, hopefully it will work here and in most cases.
(2) We should optimize typeof x == "string" (bug 725966).
(3) LTypeOfV should not emit code for types that are statically known to be impossible.
Depends on: 725966
Assignee | ||
Comment 17•11 years ago
|
||
Clearing needinfo. With the patches in bug 914132 the difference is only 12 ms and almost 3x faster than v8.
Flags: needinfo?(jdemooij)
Assignee | ||
Comment 18•11 years ago
|
||
(In reply to Guilherme Lima from comment #11)
> Nightly 26 - 1360/380 (BC+ION)
Previous nightly - 557/112 ms
Latest nightly - 78/112 ms
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → WORKSFORME
Updated•11 years ago
|
Resolution: WORKSFORME → FIXED
Updated•11 years ago
|
Assignee: general → jdemooij
Target Milestone: --- → mozilla26
You need to log in
before you can comment on or make changes to this bug.
Description
•