Open Bug 1111000 Opened 10 years ago Updated 2 years ago

2x slower than v8 on BBC Micro emulator

Categories

(Core :: JavaScript Engine, defect)

x86_64
Linux
defect

Tracking

()

People

(Reporter: azakai, Unassigned)

References

(Depends on 1 open bug, Blocks 1 open bug)

Details

(Keywords: perf, Whiteboard: [platform-rel-BBC])

Attachments

(2 files)

Attached file a.js (deleted) —
Attached is a standalone shell benchmark from jsbeeb, a BBC Micro emulator. v8 is almost twice as fast as sm on this.

Note that jsbeeb has two possible modes, search for |if (1)| in the code. When the if is taken, v8 is almost twice as fast. If the else is taken (edit it to |0|) then they are about the same, and it uses typed array .set(). In other words, using .set() the two engines are about the same here. But when the if is taken, it does individual typed array operations (blitFb8, etc.), and v8 is much faster.
Blocks: gecko-games
We spend pretty much all time in JIT code, so that's good, but the JIT code has a bunch of type barriers and stuff that's not very efficient.
Flags: needinfo?(jdemooij)
Depends on: 1111218
Bug 1111218 helps a bit (2.0 ms/frame -> 1.74 ms/frame).

However there are some other issues:

(1) The test uses some Uint32Arrays, unfortunately those are slower than Int32Array for us. In blitFb8 we end converting values we read to double and immediately back to int32 to store them :( This is bug 750947 and we should really fix that soonish...

(2) It accesses many closure variables. It might be faster to make them globals or use a "run-once closure":

(function() {
   ...
})();

I'll look more next week, but pretty sure (1) is the main problem here.
Blocks: 750947
No longer blocks: 750947
Depends on: 750947
Flags: needinfo?(jdemooij)
Flags: needinfo?(jdemooij)
(In reply to Jan de Mooij [:jandem] from comment #2)
> (2) It accesses many closure variables. It might be faster to make them
> globals or use a "run-once closure":

Or make them properties of the Video object and use methods + |this|. Might be faster in all engines but hard to say...
Hi all, original author here. First up; thanks so much for taking a look: I'm so excited to be get help on the performance of my emulator.

The Uint32Array thing is very interesting: some other parts of the emulator also used unsigned arrays; is the issue the unsignedness throughout or just 32-bit unsigned?

I can definitely try making member methods and |this| instead of captures; but in my original benchmarking with jsperf, it seemed like it was roughly the same cost.  Glancing over some other jsperf results (e.g. http://jsperf.com/prototype-vs-closures/64) it appears that's no longer the case, and prototype-style wins out.

Please let me know if I can assist in any way, thanks!
(In reply to Matt Godbolt from comment #4)
> The Uint32Array thing is very interesting: some other parts of the emulator
> also used unsigned arrays; is the issue the unsignedness throughout or just
> 32-bit unsigned?

The engine can represent numbers as int32 or double. This means Uint32Array is fine as long as you read values that fit in int32 out of it, but if somewhere we read a value that's too big for an int32 we will convert all values we read there to double (and in this case we have to convert immediately back to int32 to store it to another uint32 array). That's slow.

Once we fix bug 750947, our optimizing JIT can deal with uint32 values directly, without converting them to either double or int32. In the meantime (and for other engines), it might be beneficial to use an Int32Array, or at least for the parts where you simply copy values (signedness doesn't matter for that).

(To be complete: the JIT can also optimize float32 values in some cases, as float32 operations are typically faster than the double ones, but in general it's int32 or double.)

> Please let me know if I can assist in any way, thanks!

Please file bugs or send me an email whenever you run into other performance issues in Firefox (and tell other people to do the same), I'd be happy to investigate.

Note that bug 1111218, which I found and fixed thanks to the test in this bug, also improved various other benchmarks on arewefastyet.com :)
Flags: needinfo?(jdemooij)
Thanks so much for the explanation. I tried wrapping all Uint32Arrays with a shim function:

        function makeFast(array) {
            return new Int32Array(array.buffer);
            //return array;
        }

but it didn't seem to make much difference. That being said; I'm on a train on my laptop on battery power so it's hard to be certain. I'll re-post a new sample once I've tested on a proper machine.

> Please file bugs or send me an email whenever you run into other performance issues in Firefox (and tell other people to do the same), I'd be happy to investigate.

I certainly will do! The CPU emulation is also slower than on v8: that's a little more involved to get isolated into a neat little file but I will definitely give it a go.

(Also, if you're interested, I have a little background on the whole project at http://xania.org/Emulation-archive - and full source is on https://github.com/mattgodbolt/jsbeeb should you wish to see the program in context)
I just ran while actually plugged in, racing the code in the attachment against the same code with 's/Uint32/Int32/g' applied. Running three times each and taking the minimum, there's a 2% speedup using Int32. That's quite nice, but the variance in the runs is so high I'm suspicious of this being not just a fluke.
(In reply to Matt Godbolt from comment #7)
> I just ran while actually plugged in, racing the code in the attachment
> against the same code with 's/Uint32/Int32/g' applied. Running three times
> each and taking the minimum, there's a 2% speedup using Int32.

Is this with a build that includes the fix for bug 1111218? I'd expect much more than a 2% speedup.
(In reply to Jan de Mooij [:jandem] from comment #8)
> 
> Is this with a build that includes the fix for bug 1111218? I'd expect much
> more than a 2% speedup.

Ah; no. It was with the same build as I had previously tested. I'll be sure to re-test with a recent build!
Just tried with a nightly build (built Dec 15 2014 at 07:27:52) and my somewhat questionable benchmarking methodology says the Int32 codepath is 5% faster now, which is definitely worth investigating more on my side - thanks!

> Or make them properties of the Video object and use methods + |this|. 

I'll also give the prototype-style thing a go; although I have a lot of "private" variables I realise now are part of the closure; Jan, what's your feeling on performance best practices: no closure variables at all (putting everything in "this")? Should I use "Video.prototype.XXX" instead of "this.XXX" ?  I've not found a compelling answer elsewhere, but I may not be looking in the right place.
(In reply to Matt Godbolt from comment #10)
> Just tried with a nightly build (built Dec 15 2014 at 07:27:52) and my
> somewhat questionable benchmarking methodology says the Int32 codepath is 5%
> faster now, which is definitely worth investigating more on my side - thanks!

Hm, that probably still doesn't include bug 1111218, it was merged to mozilla-central around 11am PST. It should be in the next nightly for sure. 5% is still less than I expected. Are we still a lot slower than V8?

> I'll also give the prototype-style thing a go; although I have a lot of
> "private" variables I realise now are part of the closure; Jan, what's your
> feeling on performance best practices: no closure variables at all (putting
> everything in "this")?

this.foo should work I think, especially if you keep the constructor very simple and initialize all variables there like: this.foo = bar; The JITs then know that all Video instances definitely have those properties...

Not sure about Video.prototype.XXX, it might be a bit faster but hard to say offhand...
Good news: now with today's nightly (built on Dec 16 2014 at 07:45:53) vs d8 3.31.0:

Best of three runs of the original code:

d8: 11434ms
js: 13704ms (19.9% slower than d8)

Now, original code modified to use prototype-style only (no closure variables or functions at all):

d8: 11465ms
js: 10658ms (7% faster than d8)

Finally, that plus changing all the Uint32Arrays to Int32Arrays:

d8: 11410ms
js: 10089ms (12% faster than d8)

The Uint->Int change itself is a 5.4% speedup.

I'll attach my latest code. Thanks so much for the help here: seems there's a little ways to go to work as fast as v8 in code that's closure-variable-heavy, but for prototype-style code, SpiderMonkey is faster! Great work :)
Attached file prototype-based, Int32-using codepath (deleted) —
Code as referenced in Comment 12
(In reply to Matt Godbolt from comment #12)
> Thanks so much for the help here: seems there's
> a little ways to go to work as fast as v8 in code that's
> closure-variable-heavy, but for prototype-style code, SpiderMonkey is
> faster! Great work :)

Awesome :) Please let us know if you run into other performance issues. I'll keep this bug open for the original testcase, as ideally it should be as fast as the second version.
Keywords: perf
Depends on: 1182668
Whiteboard: [platform-rel-BBC]
platform-rel: --- → ?
platform-rel: ? → ---
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: