Closed
Bug 606561
Opened 14 years ago
Closed 13 years ago
optimize various Math calls
Categories
(Tamarin Graveyard :: Virtual Machine, defect, P2)
Tamarin Graveyard
Virtual Machine
Tracking
(Not tracked)
RESOLVED
FIXED
Q4 11 - Anza
People
(Reporter: wsharp, Assigned: wmaddox)
References
Details
(Whiteboard: PACMAN WE:3001900)
Attachments
(6 files, 5 obsolete files)
(deleted),
patch
|
wmaddox
:
review-
|
Details | Diff | Splinter Review |
(deleted),
patch
|
edwsmith
:
superreview+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(v4) Simplify emitFunction calling convention with member pointer, don't inline non-integral min/max
(deleted),
patch
|
rreitmai
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
rreitmai
:
review+
|
Details | Diff | Splinter Review |
(deleted),
patch
|
rreitmai
:
review+
|
Details | Diff | Splinter Review |
From the brightspot data from bug 588922, we should optimize various Math routines to either emit custom JIT code or to directly bind to the routine (no thunk). The potential list:
Math.round
Math.floor
Math._max (2 arg)
Math.abs
Math._min (2 arg)
Math.sin
Math.random
Math.ceil
Math.cos
Math.pow
Math.sqrt
Reporter | ||
Updated•14 years ago
|
Whiteboard: PACMAN
Reporter | ||
Comment 1•14 years ago
|
||
Patch which changes inlineBuiltinFunction to be table driven. Add helper functions for isNaN, Math.min, Math.max. Add table entries for charCodeAt, Math.sin.
Looking for feedback on the basic approach related to discussion in bug 598322.
Attachment #485380 -
Flags: feedback?(wmaddox)
Attachment #485380 -
Flags: feedback?(rreitmai)
Assignee | ||
Comment 2•14 years ago
|
||
(In reply to comment #1)
> Created attachment 485380 [details] [diff] [review]
> Looking for feedback on the basic approach related to discussion in bug 598322.
We need not search the entirety of specializedFunctions[], as we have already done a switch on the method id and dispatched to a case label specific to that id, and can simply invoke a specialization method with the subset of the table that is applicable to that id. It is then necessary only to iterate over the
possible argument matches, making the lookup time insensitive to the number of functions for which specializations are defined.
It is probably possible to simplify the argument matching and specialization code a bit. Taking advantage of the the initial switch on the method id, we could invoke a separate specialization method for each arity, reducing the internal branchiness of the specialization method(s). I have an inkling that further streamlining is possible -- I'll take a closer look and get back to you with with a more careful analysis.
Reporter | ||
Comment 3•14 years ago
|
||
I don't think as this work progresses that we want to have a large switch statement with 30+ cases to handle each specialization, even if they are table driven. (Or 30 individual tables, one for each native ID.) It would be better to generate a HashTable to find a match. We definitely can't linearly search for a match on each native which is why I started the top-level case to begin with.
HashKey could be (id, argCount, argType0, argType1) but the multiple matches for argType could be tricky. An arg could match an int/uint/Number function signature and we want the most optimized one to be hit first.
Maybe a bit on MethodInfo can be set for a native method that has a specialization entry. Some investigation is still required.
Reporter | ||
Updated•14 years ago
|
Assignee: nobody → wsharp
Reporter | ||
Comment 4•14 years ago
|
||
Follow up to last patch, remove the specializeOneArgFunction function and switch to a table driven approach. A HashMap of <method id, argCount> keys points to starting entries in our static table. We iterate over all matching entries in the table looking for matching argument types. This adds direct calls support for nearly all the Math routines, String.length plus the existing charAt/charCodeAt/isNaN. Math.abs, min, max are emitted as pure LIR ops.
Remove Math._min/_max and Verifier::findMathFunction logic which is obsolete now.
Faster Windows MathUtils::round which used to just call MathUtils.floor(value+0.5)
Microbenchmarks:
wsharp@wsharp-PC ~
$ $AVM test.abc
int = Math.abs(int) elapsed time is 803
Number = Math.abs(Number) elapsed time is 912
Math.max(int,int) elapsed time is 831
Math.max(Number,Number) elapsed time is 830
Math.sqrt(Number) elapsed time is 1056
Math.round(Number) elapsed time is 1830
Math.cos(Number) elapsed time is 4533
Math.sin(Number) elapsed time is 5024
Math.floor(Number) elapsed time is 1791
Math.ceil(Number) elapsed time is 1823
wsharp@wsharp-PC ~
$ $AVM2 test.abc
int = Math.abs(int) elapsed time is 2720
Number = Math.abs(Number) elapsed time is 2229
Math.max(int,int) elapsed time is 2671
Math.max(Number,Number) elapsed time is 2951
Math.sqrt(Number) elapsed time is 2558
Math.round(Number) elapsed time is 3244
Math.cos(Number) elapsed time is 6683
Math.sin(Number) elapsed time is 6791
Math.floor(Number) elapsed time is 3936
Math.ceil(Number) elapsed time is 3840
Attachment #485380 -
Attachment is obsolete: true
Attachment #487662 -
Flags: superreview?(edwsmith)
Attachment #487662 -
Flags: review?(wmaddox)
Attachment #485380 -
Flags: feedback?(wmaddox)
Attachment #485380 -
Flags: feedback?(rreitmai)
Reporter | ||
Comment 5•14 years ago
|
||
Running acceptance after last minute change showed two problems. One typo and also that we should not treat negative zero as an integer.
Attachment #487680 -
Flags: superreview?(edwsmith)
Attachment #487680 -
Flags: review?(wmaddox)
Flags: flashplayer-qrb+
Priority: -- → P2
Target Milestone: --- → flash10.x - Serrano
Reporter | ||
Updated•14 years ago
|
Attachment #487662 -
Flags: superreview?(edwsmith)
Attachment #487662 -
Flags: review?(wmaddox)
Assignee | ||
Comment 6•14 years ago
|
||
Comment on attachment 487680 [details] [diff] [review]
slight update to last patch
R+, with comments added/fixed per notes below:
The second part of the psuedocode comment in emitMathMin() doesn't make sense, and doesn't agree with the expression it is supposed to be equivalent to. It does not appear that the 'y' argument can ever be the result.
It may be cheaper to explicitly OR the result of the comparision and the null check and use only one insChoose() on targets that do not implement cmov.
Then again, you may lose on targets that can't generate a boolean result without a branch. Just something to consider.
Similarly, for emitMathMax().
Need header comment on specializedFunctions[] array describing the interpretation of the fields of each entry.
Attachment #487680 -
Flags: review?(wmaddox) → review+
Reporter | ||
Comment 7•14 years ago
|
||
LIR_cmovd (and/or multiple uses of it in a row) is terribly buggy on various platforms. Switch our abs/min/max code to use if/else cases to get all sandboxes happy (including -Dnosse).
Attachment #487662 -
Attachment is obsolete: true
Attachment #487680 -
Attachment is obsolete: true
Attachment #489208 -
Flags: superreview?(edwsmith)
Attachment #489208 -
Flags: review?(wmaddox)
Attachment #487680 -
Flags: superreview?(edwsmith)
Comment 8•14 years ago
|
||
In case I missed an earlier discussion -- what's the problem with cmovd? is it buggy implementations in Nanojit backends? fine if so, but we should write bugs when possible.
Reporter | ||
Comment 9•14 years ago
|
||
PowerPC - not implemented
-DnoSSE - can't find a float register error (two cmovd's active at once?)
MIPS - unknown problems (failed sandbox acceptance tests)
SH4 - unknown problems (failed sandbox acceptance tests)
Comment 10•14 years ago
|
||
Cool. I copied these notes over to bug 584270.
Assignee | ||
Comment 11•14 years ago
|
||
Comment on attachment 489208 [details] [diff] [review]
dont rely on buggy cmovd
1) In emitMathMin() and emitMathMax(), you no longer use insChoose() for the non-integer case. Based on some remarks from Moh a while back, I would not be the least bit surprised if this code is faster on Intel CPUs. Intel's performance recommendations prefer conditional control over CMOV for cases that predict well. The situation may likely be different on other CPUs, e.g., ARM. If the change was motivated strictly to work around buggy cmov implementations, and has not been experimentally validated for better performance across platforms, please leave a comment so we might revisit the issue when the backends are fixed.
2) Math.abs() is inlined as follows:
if (value >= 0)
return i2d(value);
else
return i2d(-value);
This will fail for MIN_INT, as the expression -value will then overflow.
You should convert value as-is, and negate the resulting float.
3) I am concerned that testing did not catch this, perhaps because, in the absence of this optimization, Math.abs(MIN_INT) is not an interesting boundary case. New white-box tests should be added to cover such cases. While on the subject of tests, note also the usage of JIT_EVENT() in the addition fastpath and elswhere. This can be used to validate tests for branch coverage in the generated code.
Attachment #489208 -
Flags: review?(wmaddox) → review-
Updated•14 years ago
|
Attachment #489208 -
Flags: superreview?(edwsmith)
Reporter | ||
Comment 12•14 years ago
|
||
Remove integer path from Math.abs. Adding checks for 0x80000000 make it just as slow as the number path. Add test case to catch this case in ecma3 Math.abs test.
Related to cmovd, x86+x64 both use branches for cmovd (see Assembler::asm_cmov). I'm not sure if there is a conditional move instruction for SSE that we could use but currently nanojit is just a branch. The new code is a couple percent slower than the older code in micobenchmarks. (win32 x86). Previously I did try to simplify the code to only use one insChoose instruction but that did not work correctly with ORing the intermediate boolean checks.
Also re-enable Math.min/max inlining for Math_private__min and Math_private__max. Interpreter execution requires we keep our _min and _max code for negative zero bug so JIT code will see those as well as regular min/max functions.
Attachment #491529 -
Flags: superreview?(edwsmith)
Attachment #491529 -
Flags: review?(wmaddox)
Comment 13•14 years ago
|
||
(In reply to comment #12)
> Related to cmovd, x86+x64 both use branches for cmovd (see
> Assembler::asm_cmov). I'm not sure if there is a conditional move instruction
> for SSE that we could use but currently nanojit is just a branch. The new code
> is a couple percent slower than the older code in micobenchmarks. (win32 x86).
> Previously I did try to simplify the code to only use one insChoose instruction
> but that did not work correctly with ORing the intermediate boolean checks.
Aside from bugs in the backend, frontends should never hesitate to use a LIR_cmov in situations when it can. A simple pure 3-ary operator is always better than a diamond or something. (see bug 586472).
Assignee | ||
Comment 14•14 years ago
|
||
Werner: RE review request - I'm in the middle of a release "firefight", and will not be able to get to this for a few more days.
Reporter | ||
Updated•14 years ago
|
Assignee: wsharp → nobody
Updated•14 years ago
|
Assignee: nobody → wmaddox
Comment 15•14 years ago
|
||
(In reply to comment #9)
> PowerPC - not implemented
> -DnoSSE - can't find a float register error (two cmovd's active at once?)
> MIPS - unknown problems (failed sandbox acceptance tests)
> SH4 - unknown problems (failed sandbox acceptance tests)
I think the -Dnosse bug is addressed by bug 580515.
Comment 16•14 years ago
|
||
Here's the modified code (emitMathAbs() is the only change). Not requesting review for now, in case there is feedback.
Comment 17•14 years ago
|
||
Comment on attachment 491529 [details] [diff] [review]
fix math.abs(0x80000000)
This all looks basically okay. I uploaded a modified patch that is slightly smaller for Math.abs, but haven't tested performance. (take it or leave it). I didn't review the other cases in detail for correctness.
Attachment #491529 -
Flags: superreview?(edwsmith) → superreview+
Assignee | ||
Comment 18•14 years ago
|
||
Comment on attachment 491529 [details] [diff] [review]
fix math.abs(0x80000000)
Removing myself as reviewer, since I'm now the owner of the bug.
Attachment #491529 -
Flags: review?(wmaddox)
Updated•14 years ago
|
Flags: flashplayer-bug-
Updated•13 years ago
|
Attachment #494492 -
Flags: review?(wmaddox)
Updated•13 years ago
|
Flags: flashplayer-injection-
Assignee | ||
Comment 19•13 years ago
|
||
By using a member function pointer, the emit functions can be written as ordinary non-static CodegenLIR methods, with implicit access to 'this'. The CodegenLIR* argument, and the repeated references to it, looked ugly.
The simple inline code emitted for min and max in the v3 patch for the Number/Number case did not handle -0 correctly, intentionally replicating a bug in the builtin class. That bug (bug 551587) has now been fixed, however, and the corrected code is likely more than we want to inline. I have thus removed the inlining specialization for that case. It would probably be worthwhile to consider a direct call to an optimized out-of-line handler, however.
Assignee | ||
Updated•13 years ago
|
Attachment #494492 -
Flags: review?(wmaddox)
Assignee | ||
Comment 20•13 years ago
|
||
Comment on attachment 527712 [details] [diff] [review]
(v4) Simplify emitFunction calling convention with member pointer, don't inline non-integral min/max
Werner and Edwin's v3 patch looked good to me at the time, so what needs to be reviewed here for v4 are primarily the changes noted above.
Attachment #527712 -
Flags: review?(rreitmai)
Comment 21•13 years ago
|
||
Comment on attachment 527712 [details] [diff] [review]
(v4) Simplify emitFunction calling convention with member pointer, don't inline non-integral min/max
Review of attachment 527712 [details] [diff] [review]:
r+ nicely done!
::: core/CodegenLIR.cpp
@@ +3865,5 @@
+ }
+ else {
+ localSet(op1-1, binaryIns(LIR_eqi, binaryIns(LIR_eqd, f, f), InsConst(0)), result);
+ }
+ }
I realize that this code existed previously, but it might be a good time to add comments on the eqi/eqd usage here, as its probably not obvious to new-comers what is going on.
@@ +3958,5 @@
+ }
+ else if (bt == BUILTIN_int) {
+ LIns *arg = localGet(argOffset);
+ if (arg->isImmI() && arg->immI() >= 0)
+ btMask |= 1 << BUILTIN_uint;
could we not set btMask BUILTIN_int if >= 0 fails?
@@ +4071,5 @@
+ funcKey = newFuncKey;
+ builtinFunctionOptimizerHashMap->put(funcKey, i);
+ }
+ }
+ }
Maybe beyond the scope of this fix , but I wonder if we should start thinking about partitioning the 'static' portion of CodegenLIR's data.
E.g. this map could be built once and shared for all CodegenLIR's.
Attachment #527712 -
Flags: review?(rreitmai) → review+
Comment 22•13 years ago
|
||
changeset: 6220:e5e1279cf7a4
user: William Maddox <wmaddox@adobe.com>
summary: Bug 606561 - Specialization framework, including optimizing various Math calls
http://hg.mozilla.org/tamarin-redux/rev/e5e1279cf7a4
Assignee | ||
Comment 23•13 years ago
|
||
(In reply to comment #21)
> Comment on attachment 527712 [details] [diff] [review]
> r+ nicely done!
This was mostly Werner's work, with a contribution by Edwin as well.
> I realize that this code existed previously, but it might be a good time to add
> comments on the eqi/eqd usage here, as its probably not obvious to new-comers
> what is going on.
Done.
> could we not set btMask BUILTIN_int if >= 0 fails?
This case is handled by the earlier line:
int32_t btMask = 1 << bt;
> Maybe beyond the scope of this fix , but I wonder if we should start thinking
> about partitioning the 'static' portion of CodegenLIR's data.
>
> E.g. this map could be built once and shared for all CodegenLIR's.
Good idea! I don't want to hold up getting this in for the integration, but there really is no need to duplicate the initialization for every method compiled. Will follow up.
Assignee | ||
Comment 24•13 years ago
|
||
(In reply to comment #21)
> Comment on attachment 527712 [details] [diff] [review]
> E.g. this map could be built once and shared for all CodegenLIR's.
At first blush, it looked like this might be as easy as simply making
builtinFunctionOptimizerHashMap static. The real problem is that HashMap
requires a nanojit::Allocator, and the only one conveniently at hand is an
arena allocator whose storage is reclaimed along with the CodegenLIR object.
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → FIXED
Comment 25•13 years ago
|
||
Reopening this bug since this injected an issue with Math.abs(0)
Attachment #567701 -
Flags: review?(gchaney)
Updated•13 years ago
|
Status: RESOLVED → REOPENED
Flags: flashplayer-qrb+ → flashplayer-qrb?
Resolution: FIXED → ---
Whiteboard: PACMAN → PACMAN WE:3001900
Comment 26•13 years ago
|
||
tweak the failconfig.txt so that it is not an expectedfail on debugger runs and skip during differential testing
Attachment #567701 -
Attachment is obsolete: true
Attachment #567709 -
Flags: review?(gchaney)
Attachment #567701 -
Flags: review?(gchaney)
Assignee | ||
Comment 27•13 years ago
|
||
Do not negate a zero-valued argument unless it is -0, not +0.
Attachment #567889 -
Flags: review?(rreitmai)
Comment 28•13 years ago
|
||
Comment on attachment 567709 [details] [diff] [review]
v2. Math.abs(0) ensure is +0
patch obsoleted by attachment# 567889 [details] [diff] [review]
Attachment #567709 -
Attachment is obsolete: true
Attachment #567709 -
Flags: review?(gchaney)
Flags: flashplayer-qrb? → flashplayer-qrb+
Target Milestone: Q3 11 - Serrano → Q4 11 - Anza
Updated•13 years ago
|
Attachment #567889 -
Flags: review?(rreitmai) → review+
Assignee | ||
Comment 29•13 years ago
|
||
Somehow, I convinced myself that the last patch worked. It showed up quite broken in a sandbox build, however, and the reason is fundamental: The standard floating point comparisons do not distinguish -0 and 0, despite the intuition that -0 < 0, even if -0 == 0. Correct inline code would need to examine the bitwise representation, or use a hardware instruction such as FABS on x86.
This patch removes the inlining, and simply specializes Math::abs() to a direct call into MathUtils, as we do for trig, etc.
Attachment #568335 -
Flags: review?(rreitmai)
Comment 30•13 years ago
|
||
Do we have a handle on what release branches already shipped with the bug? (backports? ug)
Assignee | ||
Comment 31•13 years ago
|
||
The original bug was injected into FRMain in April. It looks like the bug is in Serrano, but not in Wasabi.
Updated•13 years ago
|
Attachment #568335 -
Flags: review?(rreitmai) → review+
Assignee | ||
Comment 32•13 years ago
|
||
Pushed to TR:
http://hg.mozilla.org/tamarin-redux/rev/28ea922b25b9
Status: REOPENED → RESOLVED
Closed: 13 years ago → 13 years ago
Resolution: --- → FIXED
You need to log in
before you can comment on or make changes to this bug.
Description
•