Closed
Bug 654410
Opened 14 years ago
Closed 8 years ago
NES emulator 3X faster in Chrome
Categories
(Core :: JavaScript Engine, defect)
RESOLVED
WORKSFORME
People
(Reporter: dvander, Unassigned)
References
(Blocks 1 open bug)
Details
Attachments
(2 files)
Author said his NES emulator runs 3X faster in Chrome, so filing a tracking bug to investigate.
Comment 1•14 years ago
Here's a free program to use with the emulator so that people don't need to use infringing copies. The archive contains spritecans.nes (the NES executable) and complete corresponding source code.
Comment 2•14 years ago
According to http://nesdev.parodius.com/bbs/viewtopic.php?p=77416#77416
the emulator is a fork of JSNES, for which we have bug 509986.
Comment 3•14 years ago
On spritecans, I get 30fps in Fx4, 60fps in Chrome.
xperf confirms that 2/3 of our time here is in JS, 1/3 in "unknown", but almost none in gfx, so definitely a JS issue. We should try our JS profilers on this.
Comment 4•14 years ago
I've tracked down the primary performance problem to the emulate: function. The problem is probably the JIT not compiling the switches into jump tables. This is the "ideal" method to implement the emulator. The other possibility is to have a separate function for each op, but that adds function call setup overhead which is non-trivial. Overall I've found the switch implementation method to be faster in chrome thus far, but there are some possibilities for the function call method that I haven't explored.
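For readers following along, here is a minimal sketch of the two dispatch styles the comment above compares. The three-opcode machine and every name in it (makeCpu, stepSwitch, stepTable, handlers, run) are hypothetical and are not taken from the emulator's source. Which style is faster depends on the engine and version, so this only illustrates the shape of the code.

```javascript
// Hypothetical three-opcode machine; not the emulator's real instruction set.
function makeCpu() {
  return { a: 0, pc: 0 };
}

// Style 1: a single switch over the opcode inside the emulation loop.
function stepSwitch(cpu, opcode) {
  switch (opcode) {
    case 0: cpu.a = (cpu.a + 1) & 0xff; break; // INC
    case 1: cpu.a = (cpu.a - 1) & 0xff; break; // DEC
    case 2: cpu.a = 0; break;                  // CLR
    default: throw new Error("unknown opcode " + opcode);
  }
  cpu.pc++;
}

// Style 2: one function per opcode, looked up in an array.
// Each step pays for an extra function call.
var handlers = [
  function inc(cpu) { cpu.a = (cpu.a + 1) & 0xff; },
  function dec(cpu) { cpu.a = (cpu.a - 1) & 0xff; },
  function clr(cpu) { cpu.a = 0; }
];

function stepTable(cpu, opcode) {
  var handler = handlers[opcode];
  if (!handler) throw new Error("unknown opcode " + opcode);
  handler(cpu);
  cpu.pc++;
}

// Runs a program (an array of opcodes) with the given step function.
function run(step, program) {
  var cpu = makeCpu();
  for (var i = 0; i < program.length; i++) {
    step(cpu, program[i]);
  }
  return cpu;
}
```

Both styles produce the same machine state for the same program, so they can be swapped and timed against each other in whichever engine is under test.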
Comment 5•14 years ago
(In reply to comment #4)
> I've tracked down the primary performance problem to the emulate: function.
> The problem is probably the JIT not compiling the switches into jump tables.
> This is the "ideal" method to implement the emulator. The other possibility
> is to have a separate function for each op, but that adds function call
> setup overhead which is non-trivial. Overall I've found the switch
> implementation method to be faster in chrome thus far, but there are some
> possibilities for the function call method that I haven't explored.
I think we do compile switches to jump tables (except on ARM), if the switch is done with JSOP_TABLESWITCH by the bytecode compiler. So maybe it's getting compiled as JSOP_LOOKUPSWITCH, or maybe the problem is something else.
Comment 6•14 years ago
In the case of switches over symbolic constants which are properties of the same object, could we maybe guard on the shape of the holder object and then generate a table?
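A small sketch of the distinction being discussed, with hypothetical names (Op, stepLiteral, stepSymbolic). In JavaScript, case labels are expressions evaluated at run time, so a switch over `Op.INC`-style properties does not present dense integer constants to the bytecode compiler the way literal labels do. Whether either form actually becomes a jump table is engine-specific; this only shows the two source shapes.

```javascript
// Hypothetical holder object for symbolic opcode constants.
var Op = { INC: 0, DEC: 1, CLR: 2 };

// Literal integer labels: the label values are known when the script is compiled.
function stepLiteral(a, op) {
  switch (op) {
    case 0: return (a + 1) & 0xff;
    case 1: return (a - 1) & 0xff;
    case 2: return 0;
    default: throw new Error("unknown opcode " + op);
  }
}

// Symbolic labels: each label is a property read on Op, evaluated when the
// switch runs, so its value could in principle change between calls.
function stepSymbolic(a, op) {
  switch (op) {
    case Op.INC: return (a + 1) & 0xff;
    case Op.DEC: return (a - 1) & 0xff;
    case Op.CLR: return 0;
    default: throw new Error("unknown opcode " + op);
  }
}
```

The two functions behave identically as long as nobody mutates `Op`, which is the condition a shape guard on the holder object would be protecting.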
Comment 7•14 years ago
(In reply to comment #5)
>
> I think we do compile switches to jump tables (except on ARM), if the switch
> is done with JSOP_TABLESWITCH by the bytecode compiler
I hope you're talking about the method JIT, because we recently removed table switch support from the trace JIT (bug 620757).
Comment 8•14 years ago
Here's a shell version + the free ROM. It runs 100 frames like this:
--
100 frames: 3042ms.
32.9 fps
--
It reads the rom using the shiny new snarf(.., "binary").
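The attached shell test is not reproduced in this bug, so the following is only a sketch of a harness that prints output in the same form. `renderFrame` is a hypothetical stand-in for whatever advances the emulator by one frame, and rounding to one decimal place is an assumption that happens to reproduce the fps figures quoted in this thread.

```javascript
// Formats a timing result in the "N frames: Xms." / "Y fps" form used above.
function formatResult(frames, elapsedMs) {
  // Frames per second, rounded to one decimal place.
  var fps = Math.round((frames * 1000 / elapsedMs) * 10) / 10;
  return frames + " frames: " + elapsedMs + "ms.\n" + fps + " fps";
}

// Calls renderFrame (hypothetical) the given number of times and reports the timing.
function timeFrames(renderFrame, frames) {
  var start = Date.now();
  for (var i = 0; i < frames; i++) {
    renderFrame();
  }
  return formatResult(frames, Date.now() - start);
}
```

For example, `formatResult(100, 3042)` yields the two lines shown in the comment above.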
Comment 9•14 years ago
Most time is spent in GetElem/SetElem stubs. Shark shows 23.5% under js::PropertyTable::search...
The switch statements here look like table switches, and I also see no switch-related stub calls in the profile.
Reporter
Comment 10•14 years ago
On some NES ROMs, with JSNES, the arrays fill in an order that causes them to become sparse. Could something similar be happening here?
Comment 11•14 years ago
(In reply to comment #10)
> On some NES ROMs, with JSNES, the arrays fill in an order that causes them
> to become sparse. Could something similar be happening here?
Yeah, if I change this:
--
var i = 256*240;
while(i--) {
    buffer[i] = bgColor;
}
var pixrendered = this.pixrendered;
i = pixrendered.length;
while(i--) {
    pixrendered[i]=65;
}
--
to this:
--
for(var i=0; i<256*240; i++) {
    buffer[i] = bgColor;
}
var pixrendered = this.pixrendered;
for(i = 0; i < pixrendered.length; i++) {
    pixrendered[i]=65;
}
--
we're almost twice as fast:
100 frames: 1753ms.
57 fps
Reporter
Comment 12•14 years ago
Nice. My kingdom for bug 586842!
Comment 13•14 years ago
I've updated the code everywhere from a simple new Array(size) to a function which new's the array, then initializes all its elements in 0 .. size-1 order (rather than reverse order).
It is indeed faster. Before I was getting 10-15fps on my MacBook Air, now it's more like 23-25fps. Still not 60fps like Chrome, but definitely an improvement!
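The helper itself is not shown in this bug, so here is a minimal sketch of what such a function could look like. The name `makeFilledArray` and its signature are assumptions, not the emulator's actual code.

```javascript
// Allocates an array and fills it in ascending index order (0 .. size-1),
// which is the order reported above to keep the array on the fast path.
function makeFilledArray(size, value) {
  var arr = new Array(size);
  for (var i = 0; i < size; i++) {
    arr[i] = value;
  }
  return arr;
}

// Usage sketch: a 256x240 frame buffer initialized to a background color of 0.
var buffer = makeFilledArray(256 * 240, 0);
```

Call sites that previously did `new Array(size)` followed by a reverse fill would call the helper instead.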
Comment 14•14 years ago
This is definitely something I can work with :) I'll swat at the code tonight and see if I can squeeze out some perf =D Thanks guys!
Comment 15•14 years ago
Jon: if/as you find any more performance faults, please let us know so (as dvander did with bug 586842) we can compile use cases to guide our optimization efforts.
Comment 16•14 years ago
Initializing an array via
{ someArray: [ v1, v2, v3, v4] }
seems much faster when using the array after initialization than initing via
this.someArray = new Array(4)
this.someArray[0] = v1;
this.someArray[1] = v2;
this.someArray[2] = v3;
this.someArray[3] = v4;
Am I hallucinating a performance improvement here? (obviously with much larger than 4 elements and with random accesses)
So far I've improved perf by 40% or so in FF4. Still looking for places to improve and tricks to get things faster.
Comment 17•14 years ago
That example isn't 100% accurate
{foo: [{a:0, b:0}, {a:0, b:0}, {a:0, b:0}, {a:0, b:0}]}
vs
this.foo = new Array(4);
this.foo[0] = [];
this.foo[0].a = 0;
this.foo[1] = [];
this.foo[1].a = 0;
this.foo[2] = [];
this.foo[2].a = 0;
this.foo[3] = [];
this.foo[3].a = 0;
this.foo[0].b = 0;
this.foo[1].b = 0;
this.foo[2].b = 0;
this.foo[3].b = 0;
Comment 18•13 years ago
(In reply to comment #16)
> Initializing an array via
> { someArray: [ v1, v2, v3, v4] }
>
> seems much faster when using the array after initialization than initing via
> this.someArray = new Array(4)
> this.someArray[0] = v1;
> this.someArray[1] = v2;
> this.someArray[2] = v3;
> this.someArray[3] = v4;
>
> Am I hallucinating a performance improvement here? (obviously with much
> larger than 4 elements and with random accesses)
For those particular examples, I get a dense array in either one. I also tried your code in comment 17 and that gave me a dense array, too. But if you initialize the elements of a long array in random order, then it will probably get demoted to a sparse array.
Btw, I check whether the array is dense or not using the |dumpObject| function of the JS shell. Example of a dense array:
js> x = [ 1, 2, 3 ]
[1, 2, 3]
js> dumpObject(x)
object 00D0C048
class 016A8D48 Array
flags: none
elements
0: 1
1: 2
2: 3
Example of a sparse array:
js> y = new Array(4)
[, , , ,]
js> y[1000000] = 8
8
js> dumpObject(y)
object 00D0C090
class 016A8F60 Array
flags: indexed
proto <Array object at 00D02118>
parent <global object at 00D02028>
private 000F4241
properties:
((Shape *) 00D09AA0) permanent shared getterOp=0130B880 setterOp=0130B8D0 "length": slot -1
((Shape *) 00D09AC8) enumerate 1000000: slot 0 = 8
Only the sparse (slow) array has proto, parent, or properties. Only the dense array has |elements|. The |class| value is different, but it doesn't tell you which is which so that's less useful.
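To try the "random order" case from this comment, something like the following sketch could be used. All function names here are hypothetical. It builds two arrays with identical contents, one filled in ascending order and one in shuffled order, and times a read pass over each. `dumpObject` is only called if the host provides it (as the JS shell does per the comment above); in other hosts the timing is the only signal, and newer engines may show no difference at all.

```javascript
// Fills indices 0 .. size-1 in ascending order.
function fillAscending(size, value) {
  var arr = new Array(size);
  for (var i = 0; i < size; i++) arr[i] = value;
  return arr;
}

// Returns the indices 0 .. size-1 in random order (Fisher-Yates shuffle).
function shuffledIndices(size) {
  var idx = [];
  for (var i = 0; i < size; i++) idx.push(i);
  for (var j = size - 1; j > 0; j--) {
    var k = Math.floor(Math.random() * (j + 1));
    var tmp = idx[j]; idx[j] = idx[k]; idx[k] = tmp;
  }
  return idx;
}

// Fills the same indices, but in shuffled order.
function fillShuffled(size, value) {
  var arr = new Array(size);
  var order = shuffledIndices(size);
  for (var i = 0; i < size; i++) arr[order[i]] = value;
  return arr;
}

// Sums every element; used both as a correctness check and as the timed workload.
function sumAll(arr) {
  var total = 0;
  for (var i = 0; i < arr.length; i++) total += arr[i];
  return total;
}

// Times `rounds` read passes over arr and returns elapsed milliseconds.
function timeReads(arr, rounds) {
  var start = Date.now();
  for (var r = 0; r < rounds; r++) sumAll(arr);
  return Date.now() - start;
}

var ascending = fillAscending(20000, 1);
var shuffled = fillShuffled(20000, 1);
if (typeof dumpObject === "function") {
  // Only available in hosts such as the JS shell.
  dumpObject(shuffled);
}
```

Comparing `timeReads(ascending, 50)` with `timeReads(shuffled, 50)` gives a rough indication of whether the fill order matters in a given build.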
Reporter
Comment 19•13 years ago
(In reply to comment #16)
> Initializing an array via
> { someArray: [ v1, v2, v3, v4] }
>
> seems much faster when using the array after initialization than initing via
> this.someArray = new Array(4)
> this.someArray[0] = v1;
> this.someArray[1] = v2;
> this.someArray[2] = v3;
> this.someArray[3] = v4;
>
> Am I hallucinating a performance improvement here? (obviously with much
> larger than 4 elements and with random accesses)
At least in JaegerMonkey, array initializers are very fast - it knows the layout up-front and can poke directly into the slots. In the latter example there are a lot more instructions and memory traffic needed.
Comment 20•13 years ago
I know this isn't the right place for this. But for the next HTML5 spec, I'd like to request support for gamepads from HTML5 devices.
Comment 21•12 years ago
Bug 827490 just landed. It might help here.
Comment 22•12 years ago
I just ran the shell test case on my Linux64 box and got:
100 frames: 1803ms.
55.5 fps
Then I ran an old (pre-bug 827490) build and got almost identical numbers.
As for the browser, I tried http://zelex.net/nezulator/ with spritecans.nes but I couldn't get it to do anything useful (i.e. the FPS was stuck at 0). Can someone who knows how to run it try it again with a Nightly build?
Assignee
Updated•10 years ago
Assignee: general → nobody
Comment 23•8 years ago
According to https://arewefastyet.com/#machine=11&view=single&suite=misc&subtest=bugs-654410-nezulator this bug got fixed in the first half of 2013.
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WORKSFORME