Closed
Bug 607874
Opened 14 years ago
Closed 13 years ago
TM: unnecessary integer overflow checks suck
Categories
(Core :: JavaScript Engine, defect)
Tracking
()
RESOLVED
WONTFIX
People
(Reporter: n.nethercote, Assigned: n.nethercote)
Details
The most important part of imaging-gaussian-blur in Kraken looks like this:
function gaussianBlur() {
for (var y = 0; y < height; ++y) {
for (var x = 0; x < width; ++x) {
var r = 0, g = 0, b = 0, a = 0;
for (var j = 1 - kernelSize; j < kernelSize; ++j) {
if (y + j < 0 || y + j >= height) continue;
for (var i = 1 - kernelSize; i < kernelSize; ++i) {
if (x + i < 0 || x + i >= width) continue;
r += squidImageData[4 * ((y + j) * width + (x + i)) + 0] * k
ernel[Math.abs(j)][Math.abs(i)];
g += squidImageData[4 * ((y + j) * width + (x + i)) + 1] * k
ernel[Math.abs(j)][Math.abs(i)];
b += squidImageData[4 * ((y + j) * width + (x + i)) + 2] * k
ernel[Math.abs(j)][Math.abs(i)];
a += squidImageData[4 * ((y + j) * width + (x + i)) + 3] * k
ernel[Math.abs(j)][Math.abs(i)];
}
}
squidImageData[4 * (y * width + x) + 0] = r / kernelSum;
squidImageData[4 * (y * width + x) + 1] = g / kernelSum;
squidImageData[4 * (y * width + x) + 2] = b / kernelSum;
squidImageData[4 * (y * width + x) + 3] = a / kernelSum;
}
}
return squidImageData;
}
and it's the inner loop that's most important. Nanojit CSEs the |4 * ((y + j) * width + (x + i))| expression, which is good.
However, the two multiplies and three adds in that expression all have integer overflow checks. But to a human it's easy to see these checks aren't needed. ('width' is set to 400 and 'kernelSize' is set to 7 elsewhere).
Now, it's not the overflow checks themselves that is the problem, so much as the fact that guards keep stack stores alive. Kraken runs about 1.05x faster if I (unsafely) disable overflow checks. Here are the corresponding instruction counts:
---------------------------------------------------------------
| millions of instructions executed |
| total | on-trace (may overestimate) |
---------------------------------------------------------------
| 4574.490 4574.600 (------) | 4286.153 4286.156 (------) | ai-astar
| 3058.918 3049.939 (1.003x) | 1460.148 1449.772 (1.007x) | audio-beat-det
| 1339.516 1313.584 (1.020x) | 1138.747 1112.893 (1.023x) | audio-dft
| 2806.632 2797.575 (1.003x) | 1412.124 1401.952 (1.007x) | audio-fft
| 2787.669 2754.894 (1.012x) | 1852.108 1819.321 (1.018x) | audio-oscillat
| 6931.230 6193.816 (1.119x) | 4784.156 4047.032 (1.182x) | imaging-gaussi
| 3308.192 3268.408 (1.012x) | 761.520 722.577 (1.054x) | imaging-darkro
| 6145.874 5825.529 (1.055x) | 4006.649 3686.355 (1.087x) | imaging-desatu
| 676.676 676.454 (------) | 10.073 10.072 (------) | json-parse-fin
| 498.369 498.233 (------) | 5.954 5.954 (------) | json-stringify
| 1326.425 1325.770 (------) | 782.597 782.568 (------) | stanford-crypt
| 764.065 719.947 (1.061x) | 369.166 360.748 (1.023x) | stanford-crypt
| 1157.988 1097.978 (1.055x) | 591.951 555.279 (1.066x) | stanford-crypt
| 507.230 429.067 (1.182x) | 201.616 183.557 (1.098x) | stanford-crypt
-------
| 35883.279 34525.802 (1.039x) | 21662.967 20424.241 (1.061x) | all
Nanojit has an interval analysis that would eliminate the overflow checks, if only it knew the bounds of i, j, x, y, and width. But for the fragment representing the inner loop it just sees each of those variables as loads from the stack. It would be great if we could infer the bounds and then find a way to communicate that to the tracer.
(Nb: Bug 536641 is similar to this one, but only encompasses overflow checks for the loop counter increments. Those actually aren't that important because they are near the end of fragments and so the guards don't keep much stuff alive. Also, that bug is kind of stale.)
Rather than eliminating the checks, could you just move them to the top of the fragment? Maybe I misunderstand, but it seems like that would also eliminate many of the stack stores.
Assignee | ||
Comment 2•14 years ago
|
||
(In reply to comment #1)
> Rather than eliminating the checks, could you just move them to the top of the
> fragment? Maybe I misunderstand, but it seems like that would also eliminate
> many of the stack stores.
It's not a bad idea. And you don't necesarily need to move them to the top -- you just want to move them to a place where the VM stack is as close to empty as possible, so that as few stack stores are kept alive as possible.
There are two complications:
- The guard obviously has to come after its operands are defined. And they're usually defined only shortly before. So that limits the scope of guard movement greatly.
- Nanojit's design makes it difficult to rewrite LIR in a second pass, and I can't see how you'd do this without a second pass. (Bug 545406 has a draft patch implementing LICM which demonstrates the minor heroics required to do rewriting.)
----
Relatedly, someone will undoubtedly mention sinking the stores to side-exits if I don't do it myself. We have bug 537842 open for it, but it's complicated, the right design is not clear, and it would also require significant Nanojit changes. No guard is better than a sunk guard!
Assignee | ||
Comment 3•14 years ago
|
||
(In reply to comment #2)
>
> - Nanojit's design makes it difficult to rewrite LIR in a second pass, and I
> can't see how you'd do this without a second pass. (Bug 545406 has a draft
> patch implementing LICM which demonstrates the minor heroics required to do
> rewriting.)
I just realized that the rewriting heroics aren't necessary. LIR has a 'skip' instruction which simply points to another instruction; it's used to link together chunks of LIR.
You could overwrite an existing LIR instruction with a 'skip', which would trampoline you to an out-of-line chunk holding the replacement chunk, and then skip back.
Comment 4•14 years ago
|
||
Brian and I talked about this a bit. The code in comment 0 can skip guards if it knows the value ranges of width, height, and kernelsize. The problem is that there's no way to know them at trace-time, nor in general. For example, chrome script could reach into this window, change the global variable values, and run the function again; that would pick up the existing trace if we have one.
So the trace either needs to guard on value ranges for the free variables there (which value ranges?) or be specialized to exact values of those free variables (could hurt in some cases) or something....
Assignee | ||
Comment 5•14 years ago
|
||
(In reply to comment #4)
>
> So the trace either needs to guard on value ranges for the free variables there
> (which value ranges?) or be specialized to exact values of those free variables
> (could hurt in some cases) or something....
Thanks for the info, that's good to know. The latter sounds like a bad idea. But extra guards might be possible. I recently realized that LIR's 'skip' instruction makes it possible to patch in extra instructions into a LIR chunk by replacing an existing instruction with a skip that trampolines to an out-of-line LIR sequence and then skips back.
On a related note, LICM (bug 545406) has the potential to reduce the number of overflow checks. Consider the expression |4 * ((y + j) * width + (x + i))| from comment 0. In the inner loop all the variables except 'i' are loop-invariant. Which means that the |(y + j) * width| sub-expression is loop-invariant, so two overflow checks per iteration can be avoided.
Assignee | ||
Comment 6•13 years ago
|
||
TM's days are numbered: WONTFIX.
Status: ASSIGNED → RESOLVED
Closed: 13 years ago
Resolution: --- → WONTFIX
You need to log in
before you can comment on or make changes to this bug.
Description
•