Open
Bug 771106
Opened 12 years ago
Updated 2 years ago
Meta: improve memory access performance in Emscripten-translated code
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
NEW
People
(Reporter: bhackett1024, Unassigned)
References
(Depends on 2 open bugs, Blocks 1 open bug)
Details
(Whiteboard: [js:t])
Attachments
(4 files)
Currently, memory access performance is a major gap between native code and Emscripten-translated JS (and other autotranslators). For a native access like:
x[i]
From C this is compiled to a single base+index instruction. The translated JS for this looks something like:
Mem[(x + (i << 2)) >> 2]
In some cases new variables can be introduced by optimizations (don't know if these are in Emscripten or the Closure compiler) which can eliminate one or both of these shifts, but those don't seem to be widely applicable and aren't used much on the (simple) fannkuch benchmark.
This metabug is about changes to the JITs and/or Emscripten to allow the translation of x[i] to be compiled by the JIT to a base+index instruction and (hopefully hoisted from loops) bounds check.
While Emscripten is the focus here, not looking to overly constrain the input and allow adaptation to/by other autotranslators.
Reporter | ||
Comment 1•12 years ago
|
||
Translated version of fannkuch-11 benchmark by emscripten -O3 (code is 1s faster than -O2). I currently get 10.2s in JM and 10.0s in IM.
Reporter | ||
Comment 2•12 years ago
|
||
Above benchmark hand optimized to eliminate most shifts. I get 9.3s in JM and 23.9s in IM (weird perf fault, needs investigating).
This changes the representation of 'x' in the above to be an index into Mem[] rather than the absolute offset. Don't know how hard this change would be to make to Emscripten, but generating code like this would require less pattern matching in the JIT and would apply more easily to other JS engines.
Reporter | ||
Comment 3•12 years ago
|
||
Original C++. gcc -O3 is 3.3s for me.
Reporter | ||
Updated•12 years ago
|
Attachment #639312 -
Attachment is patch: false
Comment 4•12 years ago
|
||
Regarding things like
Mem[(x + (i << 2)) >> 2]
it is possible to split off x >> 2 and if that recurs to define x2 = x >> 2. However for (i << 2) >> 2 it is not trivial to replace it with say i | 0 since a few bits can get zeroed out here. This might become easier though once emscripten has a C++ LLVM backend (see bug 771285 comment 5) because likely inside LLVM it is straightforward to detect when such operations are on pointers (and we can reasonably assume the top few bits are not needed).
Comment 5•12 years ago
|
||
Ok, we already have the infrastructure in the current compiler to optimize similar expressions, I did some tests now. Attached are 4 versions of fannkuch, with an example of the code in each. Closure was not run to keep things readable.
src.0.js HEAP32[($i_23_i << 2 >> 2) + $20$s2]
src.1.js HEAP32[($i_23_i & 1073741823) + $20$s2]
src.2.js HEAP32[($i_23_i | 0) + $20$s2]
src.3.js HEAP32[$i_23_i + $20$s2]
So, src.0.js is the original unmodified compiler. src.1.js replaces << >> with a single & with the proper mask, which is a safe transformation that seems like it could be useful (1 operation instead of 2). src.2.js does an unsafe transformation of << >> to | 0, which is valid for pointers and happens to be ok here. Finally, src.3.js is the same as the previous one but without the |0, for the smallest possible code.
time mozjs --ion -n src.*.js 11
gives
src.0.js 9.869 seconds
src.1.js 9.797
src.2.js 9.793
src.3.js 10.273
1 and 2 give slightly less than a 1% speedup. Very different than the hand-optimized version mentioned earlier so I guess that hand-optimized one did something important that these simple optimizations did not.
Note that 1% is not a bad thing in itself. However 1 generates larger code and 2 is unsafe, so for now I won't use these optimizations in emscripten.
3 is slower, not surprising I guess, the JIT needs to add checks on the type of the variable inside HEAP.
Comment 6•12 years ago
|
||
Btw, $20$s2 is a helper variable the optimizer generated, after seeing that $20 was used through >> 2 several times, so it defined $20$s2 = $20 >> 2.
Reporter | ||
Comment 7•12 years ago
|
||
Ah, what are the times with -m -n? ion still behaves weirdly here and is slower on the optimized benchmark.
Comment 8•12 years ago
|
||
Ok, with -m -n I get
src.0.js 9.761
src.1.js 9.713
src.2.js 9.749
src.3.js 10.141
Assignee | ||
Updated•10 years ago
|
Assignee: general → nobody
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•