Closed Bug 769765 Opened 12 years ago Closed 10 years ago

2.5x Slower than Chrome on Compiled Sundown

Tracking

()

Status:

RESOLVED WORKSFORME

People

(Reporter: azakai, Unassigned)

References

(Depends on 1 open bug,
URL
)

Details

(Whiteboard: [js:t])

Attachments

(1 obsolete file)

patch (8cc01c494f6a) 12 years ago Brian Hackett [Laid off!] (deleted), patch		Details \| Diff \| Splinter Review

Alon Zakai (:azakai)

Reporter

Description

•

12 years ago

Someone compiled the Sundown markdown parser with Emscripten. It runs much faster on Chrome than Nightly or IonMonkey Nightly, Firefox gets emscript x 2,517 ops/sec ±3.66% (8 runs sampled) (IonMonkey is slightly faster, 2,759 ops/sec) pagedown x 750 ops/sec ±2.16% (8 runs sampled) while Chrome gets emscript x 6,062 ops/sec ±0.55% (9 runs sampled) pagedown x 521 ops/sec ±2.26% (9 runs sampled) emscript is Emscripten compiled Sundown, pagedown is the handwritten pagedown parser. For more background see https://github.com/github/gollum/issues/404#issuecomment-6670367 So Firefox is a bit faster on the handwritten one, but on the high-performance one we are more than 2x slower.

Alon Zakai (:azakai)

Reporter

Updated

•

12 years ago

URL: http://bootstraponline.github.com/liv...

Brian Hackett [Laid off!]

Comment 1

•

12 years ago

Per the JIT inspector (https://addons.mozilla.org/en-US/firefox/addon/jit-inspector/) the problem here is that we're taking a bazillion stub calls for some x >>> 0 operations. Looks like some more fast paths would be handy.

Brian Hackett [Laid off!]

Comment 2

•

12 years ago

So the issue here is that we always stub x >>> 0 accesses, regardless of the type of x or of the result. That could be improved to inline the opcode and convert to double if necessary, but the fastest thing would be to sniff the uint32 comparison which is happening here, e.g. x >>> 0 >= y >>> 0, generate plain int32s for the lhs/rhs and do an unsigned comparison. Alon, what are the possible cases when Emscripten generates x >>> 0 ?

Alon Zakai (:azakai)

Reporter

Comment 3

•

12 years ago

x >>> 0 is generated * When we need to use an i32 value in an unsigned manner, so for unsigned i32 / or % as well as comparisons <>=!=. This is the most important case. Here you see things like x >>> 0 >= y >>> 0. * When converting floats to unsigned i32s and other conversions.

David Mandelin [:dmandelin]

Updated

•

12 years ago

Blocks: WebJSPerf

Whiteboard: [js:t]

Brian Hackett [Laid off!]

Comment 4

•

12 years ago

Attached patch patch (8cc01c494f6a) (obsolete) (deleted) — Details — Splinter Review

Peephole optimize comparisons of the form (X >>> 0) op (Y >>> 0). The >>> 0 operations are treated as if they are | 0, and unsigned comparisons are used for the operation on the resulting signed int32s. This will need a bit of port work to merge to IM. function foo(n) { for (var i = 0; i >>> 0 <= n >>> 0; i++) {} } foo(1000000); Before: 479 After: 2

Assignee: general → bhackett1024

Attachment #638120 - Flags: review?(dvander)

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 5

•

12 years ago

Comment on attachment 638120 [details] [diff] [review] patch (8cc01c494f6a) It's unfortunate that optimizing this requires lots of new instrumentation in the interpreter. It seems like it could be done entirely within a compiler like IonMonkey. Is there something about type inference that makes this necessary?

Brian Hackett [Laid off!]

Comment 6

•

12 years ago

Keeping information in sync between the compiler and interpreter is good for ensuring a consistent representation in the presence of invalidation. What happens with an IM optimization on code like: i >>> 0 >= somethingComplicated() >>> 0 If somethingComplicated() invalidates the calling script then we end up in the interpreter for the comparison, and if the interpreter's representation of i >>> 0 does not account for an overflow that occurred then the comparison may compute the wrong result.

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 7

•

12 years ago

For IonMonkey at least, that shouldn't be a problem; we could choose to not speculate if the whole expression wasn't idempotent. Otherwise, we could introduce uint32 types in the compiler, and then bailouts could be taught to handle those pretty easily.

Brian Hackett [Laid off!]

Comment 8

•

12 years ago

Eh, choosing to not optimize this for anything other than the simplest ops seems a hack and will hurt robustness. Adding uint32 types doesn't seem any simpler than what's already happening here. This is a sideshow though, this bug is about the performance of the browser we're shipping today.

David Anderson [:dvander] - inactive, e-mail if emergency

Comment 9

•

12 years ago

The difference is that solving this completely in the JIT wouldn't need to add new invariants to the VM.

Brian Hackett [Laid off!]

Comment 10

•

12 years ago

I'm not sure why JIT complexity is a better thing to have than VM complexity. This change is pretty isolated, mirrors stuff we've done in the past (e.g. lazy arguments optimization) and preserves coherence between the JIT and interpreter, which *is* a good property to have both for cleanliness of implementation and simplicity of design.

Luke Wagner [:luke]

Comment 11

•

12 years ago

The lazy arguments optimization is not a good model of simplicity. If it wasn't for the gnarly interactions with scopes in the VM, I would've tried to make it a simple jit-local thing. Having done the rest of the scope changes now, I wouldn't be surprised if there was a simple jit-local optimization strategy... I should revisit this...

Brian Hackett [Laid off!]

Updated

•

12 years ago

Attachment #638120 - Attachment is obsolete: true

Attachment #638120 - Flags: review?(dvander)

Brian Hackett [Laid off!]

Updated

•

12 years ago

Assignee: bhackett1024 → general

Alon Zakai (:azakai)

Reporter

Comment 12

•

12 years ago

Seems v8 is making progress on optimizing uint32s, http://code.google.com/p/v8/issues/detail?id=2097 which should make them even faster on this type of code I believe.

Tom S [:evilpie]

Updated

•

12 years ago

Depends on: 750947

Paul Wright

Comment 13

•

12 years ago

Would the potential performance improvements resulting from this bug be considered "low hanging fruit"? If so, what is preventing the fruit from being picked? There are a multitude of Ion performance bugs, and Ion will be shipping in a released browser relatively soon.

Guilherme Lima

Comment 14

•

10 years ago

Right now: Nightly 32.0a1 (2014-05-28) emscript x 3.49 ops/sec ±19.57% (8 runs sampled) pagedown x 0.23 ops/sec ±0.89% (6 runs sampled) markdown deep x 2.37 ops/sec ±20.22% (9 runs sampled) Chrome 35 emscript x 2.18 ops/sec ±5.81% (8 runs sampled) pagedown x 0.24 ops/sec ±1.76% (6 runs sampled) markdown deep x 7.01 ops/sec ±1.56% (9 runs sampled) Should this bug remain open because of the 'markdown deep'?

Flags: needinfo?(azakai)

Alon Zakai (:azakai)

Reporter

Comment 15

•

10 years ago

I think markdown deep is not emscripten-compiled? In any case, unless these were refreshed, I assume the compiled code here is from 2 years ago, and firefox does well on the emscripten version anyhow, so I think we can close this.

Status: NEW → RESOLVED

Closed: 10 years ago

Flags: needinfo?(azakai)

Resolution: --- → WORKSFORME

You need to log in before you can comment on or make changes to this bug.