Closed Bug 769765 Opened 12 years ago Closed 10 years ago

2.5x Slower than Chrome on Compiled Sundown

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED WORKSFORME

People

(Reporter: azakai, Unassigned)

References

(Depends on 1 open bug, )

Details

(Whiteboard: [js:t])

Attachments

(1 obsolete file)

Someone compiled the Sundown markdown parser with Emscripten. It runs much faster on Chrome than on Nightly or IonMonkey Nightly.

Firefox gets:
emscript x 2,517 ops/sec ±3.66% (8 runs sampled) (IonMonkey is slightly faster, 2,759 ops/sec)
pagedown x 750 ops/sec ±2.16% (8 runs sampled)

while Chrome gets:
emscript x 6,062 ops/sec ±0.55% (9 runs sampled)
pagedown x 521 ops/sec ±2.26% (9 runs sampled)

emscript is Emscripten-compiled Sundown; pagedown is the handwritten pagedown parser. For more background see https://github.com/github/gollum/issues/404#issuecomment-6670367

So Firefox is a bit faster on the handwritten one, but on the high-performance one we are more than 2x slower.
Per the JIT inspector (https://addons.mozilla.org/en-US/firefox/addon/jit-inspector/) the problem here is that we're taking a bazillion stub calls for some x >>> 0 operations. Looks like some more fast paths would be handy.
So the issue here is that we always stub x >>> 0 accesses, regardless of the type of x or of the result. That could be improved to inline the opcode and convert to double if necessary, but the fastest thing would be to sniff out the uint32 comparison that is happening here, e.g. x >>> 0 >= y >>> 0, generate plain int32s for the lhs/rhs, and do an unsigned comparison. Alon, what are the possible cases in which Emscripten generates x >>> 0?
x >>> 0 is generated:
* When we need to use an i32 value in an unsigned manner, i.e. for unsigned i32 division (/) or remainder (%), as well as comparisons (<, >, <=, >=, ==, !=). This is the most important case. Here you see things like x >>> 0 >= y >>> 0.
* When converting floats to unsigned i32s, and in other conversions.
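The patterns above can be sketched in hand-written form (function and variable names here are made up for illustration; Emscripten's actual output is machine-generated):

```javascript
// Illustrative analogues of the patterns Emscripten emits. All values
// live in int32 variables; >>> 0 reinterprets them as unsigned just for
// the operation at hand.

// Unsigned division: both operands reinterpreted as uint32, result
// truncated back to uint32.
function udiv(x, y) {
  return ((x >>> 0) / (y >>> 0)) >>> 0;
}

// Unsigned comparison, the most important case per the comment above.
function uge(x, y) {
  return (x >>> 0) >= (y >>> 0);
}
```

For example, udiv(-2, 2) treats -2 as the uint32 4294967294 and yields 2147483647, whereas signed division would yield -1.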
Blocks: WebJSPerf
Whiteboard: [js:t]
Attached patch patch (8cc01c494f6a) (obsolete) (deleted) — Splinter Review
Peephole optimize comparisons of the form (X >>> 0) op (Y >>> 0). The >>> 0 operations are treated as if they are | 0, and unsigned comparisons are used for the operation on the resulting signed int32s. This will need a bit of port work to merge to IM.

function foo(n) {
  for (var i = 0; i >>> 0 <= n >>> 0; i++) {}
}
foo(1000000);

Before: 479
After: 2
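One way to see why the rewrite is sound: for int32 operands, the unsigned-compare result can be reproduced without ever materializing a double, e.g. by flipping the sign bit of both operands and comparing as signed. This is a sketch of the underlying equivalence, not the patch's actual codegen:

```javascript
// Reference semantics: (x >>> 0) produces a value in [0, 2^32), which
// is a double for large inputs.
function referenceGE(x, y) {
  return (x >>> 0) >= (y >>> 0);
}

// Sketch: stay in int32 the whole time. XORing with 0x80000000 flips
// the sign bit, which maps uint32 order onto int32 order, so a plain
// signed comparison then gives the unsigned result.
function unsignedGE(x, y) {
  return ((x ^ 0x80000000) | 0) >= ((y ^ 0x80000000) | 0);
}
```

For example, with x = -1 (uint32 4294967295) and y = 1, both functions return true, while a naive signed comparison of -1 >= 1 would return false.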
Assignee: general → bhackett1024
Attachment #638120 - Flags: review?(dvander)
Comment on attachment 638120 [details] [diff] [review] patch (8cc01c494f6a) It's unfortunate that optimizing this requires lots of new instrumentation in the interpreter. It seems like it could be done entirely within a compiler like IonMonkey. Is there something about type inference that makes this necessary?
Keeping information in sync between the compiler and interpreter is good for ensuring a consistent representation in the presence of invalidation. What happens with an IM optimization on code like: i >>> 0 >= somethingComplicated() >>> 0 If somethingComplicated() invalidates the calling script then we end up in the interpreter for the comparison, and if the interpreter's representation of i >>> 0 does not account for an overflow that occurred then the comparison may compute the wrong result.
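The hazard described above can be sketched concretely (somethingComplicated is a stand-in name from the comment, defined here only for illustration). If the JIT represents (i >>> 0) as a raw int32 and invalidation during the call forces the comparison to finish in the interpreter, the interpreter must still treat that int32 as unsigned:

```javascript
// Stand-in for a call that could invalidate the calling script.
function somethingComplicated() {
  return 10;
}

function compare(i) {
  // If i is -1, (i >>> 0) is 4294967295, so this comparison is true.
  // A signed reinterpretation of the same bits (-1 >= 10) would
  // wrongly yield false.
  return i >>> 0 >= somethingComplicated() >>> 0;
}
```

So compare(-1) must be true, even if the engine switched from compiled code to the interpreter partway through evaluating the expression.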
For IonMonkey at least, that shouldn't be a problem; we could choose to not speculate if the whole expression wasn't idempotent. Otherwise, we could introduce uint32 types in the compiler, and then bailouts could be taught to handle those pretty easily.
Eh, choosing to not optimize this for anything other than the simplest ops seems a hack and will hurt robustness. Adding uint32 types doesn't seem any simpler than what's already happening here. This is a sideshow though, this bug is about the performance of the browser we're shipping today.
The difference is that solving this completely in the JIT wouldn't need to add new invariants to the VM.
I'm not sure why JIT complexity is a better thing to have than VM complexity. This change is pretty isolated, mirrors stuff we've done in the past (e.g. lazy arguments optimization) and preserves coherence between the JIT and interpreter, which *is* a good property to have both for cleanliness of implementation and simplicity of design.
The lazy arguments optimization is not a good model of simplicity. If it wasn't for the gnarly interactions with scopes in the VM, I would've tried to make it a simple jit-local thing. Having done the rest of the scope changes now, I wouldn't be surprised if there was a simple jit-local optimization strategy... I should revisit this...
Attachment #638120 - Attachment is obsolete: true
Attachment #638120 - Flags: review?(dvander)
Assignee: bhackett1024 → general
It seems V8 is making progress on optimizing uint32s (http://code.google.com/p/v8/issues/detail?id=2097), which should make it even faster on this type of code, I believe.
Depends on: 750947
Would the potential performance improvements resulting from this bug be considered "low hanging fruit"? If so, what is preventing the fruit from being picked? There are a multitude of Ion performance bugs, and Ion will be shipping in a released browser relatively soon.
Right now:

Nightly 32.0a1 (2014-05-28)
emscript x 3.49 ops/sec ±19.57% (8 runs sampled)
pagedown x 0.23 ops/sec ±0.89% (6 runs sampled)
markdown deep x 2.37 ops/sec ±20.22% (9 runs sampled)

Chrome 35
emscript x 2.18 ops/sec ±5.81% (8 runs sampled)
pagedown x 0.24 ops/sec ±1.76% (6 runs sampled)
markdown deep x 7.01 ops/sec ±1.56% (9 runs sampled)

Should this bug remain open because of 'markdown deep'?
Flags: needinfo?(azakai)
I think markdown deep is not Emscripten-compiled? In any case, unless these were refreshed, I assume the compiled code here is from 2 years ago, and Firefox does well on the Emscripten version anyhow, so I think we can close this.
Status: NEW → RESOLVED
Closed: 10 years ago
Flags: needinfo?(azakai)
Resolution: --- → WORKSFORME