Bug 771106 (Open) - Opened 12 years ago, Updated 2 years ago

Meta: improve memory access performance in Emscripten-translated code

Categories

(Core :: JavaScript Engine, defect)

Version: Other Branch
Hardware: x86
OS: macOS
Type: defect

People

(Reporter: bhackett1024, Unassigned)

References

(Depends on 2 open bugs, Blocks 1 open bug)

Details

(Whiteboard: [js:t])

Attachments

(4 files)

Currently, memory access performance is a major gap between native code and Emscripten-translated JS (and other autotranslators). A native access like:

x[i]

is compiled from C to a single base+index instruction. The translated JS for the same access looks something like:

Mem[(x + (i << 2)) >> 2]

In some cases new variables can be introduced by optimizations (I don't know whether these live in Emscripten or the Closure compiler) which can eliminate one or both of these shifts, but those don't seem to be widely applicable and aren't used much on the (simple) fannkuch benchmark.

This metabug is about changes to the JITs and/or Emscripten to allow the translation of x[i] to be compiled by the JIT to a base+index instruction plus a bounds check (hopefully hoisted out of loops). While Emscripten is the focus here, the intent is not to overly constrain the input, so that other autotranslators can adapt to (or be adapted for) the same approach.
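To make the pattern concrete, here is a minimal sketch (not taken from any attachment; the Mem/readElem names and heap size are just for illustration) of the translated access under discussion, with a typed-array view standing in for Emscripten's linear memory:

var Mem = new Int32Array(new ArrayBuffer(1 << 16));

function readElem(x, i) {
  // Translated form of C's x[i] for an int* x: the << 2 scales the element
  // index to bytes, and the >> 2 converts the byte address back into an
  // Int32Array index. Native code would do this with one base+index load.
  return Mem[(x + (i << 2)) >> 2];
}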
Attached file basic translated fannkuch (deleted) —
Translated version of the fannkuch-11 benchmark, compiled with emscripten -O3 (the -O3 code is 1s faster than -O2). I currently get 10.2s in JM and 10.0s in IM.
Attached file hand-optimized translated fannkuch (deleted) —
The above benchmark, hand-optimized to eliminate most shifts. I get 9.3s in JM and 23.9s in IM (a weird performance fault that needs investigating). This changes the representation of 'x' in the above to be an index into Mem[] rather than the absolute byte offset. I don't know how hard this change would be to make in Emscripten, but generating code like this would require less pattern matching in the JIT and would apply more easily to other JS engines.
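As a rough sketch (assumed code, not the attached file) of the representation change described above: 'x' stops being a byte address and becomes an element index into Mem[], so the inner access no longer needs any shifts:

var Mem = new Int32Array(new ArrayBuffer(1 << 16));

// Current representation: x is a byte address, so the access needs shifts.
function loadByteOffset(x, i) {
  return Mem[(x + (i << 2)) >> 2];
}

// Hand-optimized representation: x2 (= x >> 2) is maintained as an index
// into Mem, so the access is a plain base+index load.
function loadElemIndex(x2, i) {
  return Mem[x2 + i];
}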
Attached file C++ fannkuch (deleted) —
Original C++. gcc -O3 is 3.3s for me.
Attachment #639312 - Attachment is patch: false
Depends on: 771285
Depends on: 771383
Regarding things like Mem[(x + (i << 2)) >> 2]: it is possible to split off x >> 2 and, if that recurs, to define x2 = x >> 2. However, for (i << 2) >> 2 it is not trivial to replace it with, say, i | 0, since the shift pair can zero out the top few bits, which | 0 would not reproduce. This might become easier once emscripten has a C++ LLVM backend (see bug 771285 comment 5), because inside LLVM it is likely straightforward to detect when such operations are on pointers (where we can reasonably assume the top few bits are not needed).
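A small worked example (run in the js shell; the value is made up) of why (i << 2) >> 2 cannot in general be replaced by i | 0:

var i = 0x40000001;        // bit 30 set
print((i << 2) >> 2);      // 1: the top bits were shifted out
print(i | 0);              // 1073741825

For genuine pointers the discarded top bits are known not to matter, which is why having that knowledge at the LLVM level would make the transformation safe to apply.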
Attached file fannkuches (deleted) —
Ok, we already have the infrastructure in the current compiler to optimize similar expressions, so I did some tests now. Attached are 4 versions of fannkuch, with an example of the generated code from each (Closure was not run, to keep things readable):

src.0.js  HEAP32[($i_23_i << 2 >> 2) + $20$s2]
src.1.js  HEAP32[($i_23_i & 1073741823) + $20$s2]
src.2.js  HEAP32[($i_23_i | 0) + $20$s2]
src.3.js  HEAP32[$i_23_i + $20$s2]

src.0.js is the output of the original, unmodified compiler. src.1.js replaces the << >> pair with a single & using the proper mask, which is a safe transformation that seems like it could be useful (1 operation instead of 2). src.2.js does an unsafe transformation of << >> into | 0, which is valid for pointers and happens to be ok here. Finally, src.3.js is the same as the previous one but without the | 0, for the smallest possible code.

time mozjs --ion -n src.*.js 11 gives:

src.0.js  9.869 seconds
src.1.js  9.797
src.2.js  9.793
src.3.js  10.273

1 and 2 give slightly less than a 1% speedup. That is very different from the hand-optimized version mentioned earlier, so I guess the hand-optimized one did something important that these simple optimizations do not. Note that 1% is not a bad thing in itself, but 1 generates larger code and 2 is unsafe, so for now I won't use these optimizations in emscripten. 3 is slower, which is not surprising I guess: without the | 0 the JIT needs to add checks on the type of the variable used to index into HEAP.
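For reference, a standalone snippet (run in the js shell; the values are made-up, in-range ones) showing that the four index forms above compute the same element index for a typical small pointer and loop index, which is why the unsafe src.2.js/src.3.js forms happen to be ok here:

var $20$s2 = 100;     // a pointer that has already been shifted by >> 2
var $i_23_i = 7;      // a loop index
print(($i_23_i << 2 >> 2) + $20$s2);     // src.0.js form: 107
print(($i_23_i & 1073741823) + $20$s2);  // src.1.js form: 107
print(($i_23_i | 0) + $20$s2);           // src.2.js form: 107
print($i_23_i + $20$s2);                 // src.3.js form: 107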
Btw, $20$s2 is a helper variable the optimizer generated, after seeing that $20 was used through >> 2 several times, so it defined $20$s2 = $20 >> 2.
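A tiny sketch (assumed code, not from the attachments) of what that helper-variable optimization looks like for repeated accesses through the same pointer:

var HEAP32 = new Int32Array(new ArrayBuffer(1 << 16));

function zeroPair($20) {
  // Before the optimization, each access repeats the shift:
  //   HEAP32[$20 >> 2] = 0;
  //   HEAP32[($20 >> 2) + 1] = 0;
  // After it, the shifted pointer is computed once and reused:
  var $20$s2 = $20 >> 2;
  HEAP32[$20$s2] = 0;
  HEAP32[$20$s2 + 1] = 0;
}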
Ah, what are the times with -m -n? Ion still behaves weirdly here and is slower on the hand-optimized benchmark.
Ok, with -m -n I get:

src.0.js  9.761
src.1.js  9.713
src.2.js  9.749
src.3.js  10.141
Depends on: 771835
Depends on: 771864
Blocks: 767238
Whiteboard: [js:t]
Assignee: general → nobody
Severity: normal → S3