Closed Bug 570476 Opened 14 years ago Closed 14 years ago

Specializer could support integer division for non-zero integer constants.

Categories

(Tamarin Graveyard :: Baseline JIT (CodegenLIR), defect, P3)

x86
Windows 7
defect

Tracking

(Not tracked)

RESOLVED FIXED
Q3 11 - Serrano

People

(Reporter: wsharp, Unassigned)

References

Details

(Whiteboard: PACMAN, fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin)

Attachments

(1 file, 3 obsolete files)

No description provided.
We can not promote all division operations to integer division without supporting divide by zero: a:int b:int c:int c = a / b; // if b is zero, we need to handle this: but it would easy enough to handle when b was a non-zero constant: c = a / 5; Changing the specializer code to handle this case doubles the performance of division on my x86 XEON processor. Code for patch: else if (op == LIR_divd) { LIns *a = v->oprnd1(); LIns *b = v->oprnd2(); a = isPromote(a->opcode()) ? a->oprnd1() : imm2Int(a); b = imm2Int(b); if (a && b && b->immI() != 0) return out->ins2(LIR_divi, a, b); }
Whiteboard: PACMAN
Sounds nice. I wonder if it would be profitable to handle non-constant cases here too, with a check-for-zero at runtime? (Results would all have to be promoted to Number since 0 -> Inf but the time savings might still be worthwhile)
The case I was optimize for was when the result stays as integer. In that case, a divide by zero equals zero. To simulate a runtime check, I added AS3 code in my loop... if (!d) c = 0 else c = a / d; This adds a slight bit of overhead to the optimization that we could theoretically emit in the JITed code. One interesting thing is integer division on my Xeon is about 2x faster for small numbers (99/5), about 30% faster for larger numbers and only 10% faster for very large numbers (1000000+). So the big win in really only with small integer values, at least on x86.
Assignee: nobody → wsharp
Status: NEW → ASSIGNED
Flags: flashplayer-qrb+
Priority: -- → P3
Target Milestone: --- → flash10.2
Assignee: wsharp → nobody
Summary: Specializer could support integer division for positive integer constants. → Specializer could support integer division for non-zero integer constants.
Brightspot data from the specializer where we have an integer_d following a LIR_divd 0/1 values are: a->isPromote(), b->isPromote(), a->imm2Int() != 0, b->imm2Int() != 0, a && b, our integer constant is non-zero. So 216 divide ops have a int->double promotion on one side plus a non-zero constant on the other. These we can optimize to an integer divide: 216 DIVD 1 0 0 1 1 (nonzero const=1) int / constant int - we can optimize 208 DIVD 0 0 0 1 0 (nonzero const=1) float / constant int 149 DIVD 1 0 0 0 0 (nonzero const=0) int / float 111 DIVD 0 0 0 0 0 (nonzero const=0) float / float 20 DIVD 0 1 0 0 0 (nonzero const=0) float / int 19 DIVD 1 1 0 0 1 (nonzero const=0) int but not a constant divisor 9 DIVD 0 0 1 0 0 (nonzero const=0) constant int / float 6 DIVD 0 1 1 0 1 (nonzero const=0) constant / int And a dump of what our constants are: 142 DIVD 1 0 0 1 1 (nonzero const=2) 11 DIVD 1 0 0 1 1 (nonzero const=1000) 8 DIVD 1 0 0 1 1 (nonzero const=3) 7 DIVD 1 0 0 1 1 (nonzero const=4) 7 DIVD 1 0 0 1 1 (nonzero const=1024) 5 DIVD 1 0 0 1 1 (nonzero const=8) 4 DIVD 1 0 0 1 1 (nonzero const=60) 3 DIVD 1 0 0 1 1 (nonzero const=16) 1 DIVD 1 0 0 1 1 (nonzero const=108) 1 DIVD 1 0 0 1 1 (nonzero const=18) 1 DIVD 1 0 0 1 1 (nonzero const=10) 1 DIVD 1 0 0 1 1 (nonzero const=60000) 1 DIVD 1 0 0 1 1 (nonzero const=5) Perhaps we should specialize the /2 as >>1 beyond optimizing the integer division itself.
If our divisor is a non-zero integer constant, optimize it to an integer divide. If our dividend is unsigned and our divisor is 2, use a rshlui. Out of 249 divide by 2 calls, 36 of them were an unsigned dividend. right-shift does not work for signed variables since (-1>>1 != -1/2).
Attachment #478843 - Flags: superreview?(edwsmith)
Attachment #478843 - Flags: review?(wmaddox)
(In reply to comment #4) > Perhaps we should specialize the /2 as >>1 beyond optimizing the integer > division itself. Not too hard to make this more general: if exactlyOneBit(rhs) == true, then the RHS is a power of 2. Then, nanojit::msbSet32 or lsbSet32() will give you the shift amount.
Also beware LIR_divi is currently only implemented on x86-32
Comment on attachment 478843 [details] [diff] [review] support for faster integer divide specialization Looks good. R+ It would be possible to avoid the call imm2Int(b) if a == NULL, rather than computing both a and b and testing a && b. The same could be said for the preceding addd/subd/muld case as well, though. Do any of the performance suite benchmarks show significant improvement?
Attachment #478843 - Flags: review?(wmaddox) → review+
Made changes suggested from both Bill and Ed. IA32 and AMD64 are the two backends that support LIR_divi currently. I have only tested this with micro benchmarks: int/2 - float 497 msec, int 456 msec unsigned/2 - float 614 msec, int shift 266 msec int/3 - float 1172, int 758 msec uint/1024 - float 614 msec, int shift 268 msec
Attachment #478843 - Attachment is obsolete: true
Attachment #479044 - Flags: superreview?(edwsmith)
Attachment #479044 - Flags: review?(wmaddox)
Attachment #478843 - Flags: superreview?(edwsmith)
Comment on attachment 479044 [details] [diff] [review] incorporate review suggestions. #ifdef ia32 and amd64 Other CPU-specific features have things like NJ_EXPANDED_LOADSTORE_SUPPORTED, NJ_F2I_SUPPORTED, etc... if there isn't one for LIR_divi, we should add one and use it, rather than adding CPU-specific ifdefs here
Attached patch Update patch with NJ_DIVI_SUPPORTED flag (obsolete) (deleted) — Splinter Review
Attachment #479044 - Attachment is obsolete: true
Attachment #479521 - Flags: superreview?(edwsmith)
Attachment #479521 - Flags: review?(wmaddox)
Attachment #479044 - Flags: superreview?(edwsmith)
Attachment #479044 - Flags: review?(wmaddox)
Comment on attachment 479521 [details] [diff] [review] Update patch with NJ_DIVI_SUPPORTED flag clearing SR flag until rebased (Specializer went away in bug 600649).
Attachment #479521 - Flags: superreview?(edwsmith)
See also https://bugzilla.mozilla.org/show_bug.cgi?id=600459, "better codegen for div/mod by a constant"
Comment on attachment 479521 [details] [diff] [review] Update patch with NJ_DIVI_SUPPORTED flag R-, since the patch needs rebasing as Edwin noted above. While you are at it, you might consider multiplication by powers of two. Also, the integer-specific opcode multiply_i would benefit from the same optimization. (There is no divide_i.) I suspect we are more likely to see int rather than uint arguments, so it might be slightly preferable to reverse the operands to the '&&' below: if (exactlyOneBit(intConst) && aOpcode == LIR_ui2d) {
Attachment #479521 - Flags: review?(wmaddox) → review-
Attached patch rebase patch (deleted) — Splinter Review
Rebased to current codebase. Please review in isolation from bug 600585 which if it lands first will modify this patch ever so slightly. (a trivial change that I don't think requires a re-review).
Attachment #479521 - Attachment is obsolete: true
Attachment #482396 - Flags: superreview?(edwsmith)
Attachment #482396 - Flags: review?(wmaddox)
Comment on attachment 482396 [details] [diff] [review] rebase patch R+ Nit: You rely on the default in Native.h to define NJ_DIVI_SUPPORTED on unsupported platforms. In the machine-specific NativeXXX.h files, it appears that the convention is to enumerate the unsuported options as well, explicitly giving them a value of zero. I have not verified whether this is done with complete consistency. Following this convention while having a transparent fallback to a default seems to be just asking for inconsistent application of the convention to creep int. I'm not sure what the policy should be, but perhaps it is best to go with the flow for the purposes of this patch.
Attachment #482396 - Flags: review?(wmaddox) → review+
Attachment #482396 - Flags: superreview?(edwsmith)
Status: ASSIGNED → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Is it silly to ask why only the x86 backend supports LIR_divi? (Surely ARM has an integer division instruction?)
ARM has no hardware support for integer division. It's done in a C++ helper.
Whiteboard: PACMAN → PACMAN, fixed-in-nanojit, fixed-in-tracemonkey
Whiteboard: PACMAN, fixed-in-nanojit, fixed-in-tracemonkey → PACMAN, fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin
Flags: flashplayer-bug+
Depends on: 642535
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: