Closed Bug 570476 Opened 14 years ago Closed 14 years ago

Specializer could support integer division for non-zero integer constants.

Tracking

(Not tracked)

Status:

RESOLVED FIXED

Milestone:

Q3 11 - Serrano

People

(Reporter: wsharp, Unassigned)

References

Details

(Whiteboard: PACMAN, fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin)

Attachments

(1 file, 3 obsolete files)

support for faster integer divide specialization 14 years ago Werner Sharp (deleted), patch	wmaddox : review+	Details \| Diff \| Splinter Review
incorporate review suggestions. #ifdef ia32 and amd64 14 years ago Werner Sharp (deleted), patch		Details \| Diff \| Splinter Review
Update patch with NJ_DIVI_SUPPORTED flag 14 years ago Werner Sharp (deleted), patch	wmaddox : review-	Details \| Diff \| Splinter Review
rebase patch 14 years ago Werner Sharp (deleted), patch	wmaddox : review+	Details \| Diff \| Splinter Review

Werner Sharp

Reporter

Description

•

14 years ago

No description provided.

Werner Sharp

Reporter

Comment 1

•

14 years ago

We can not promote all division operations to integer division without supporting divide by zero: a:int b:int c:int c = a / b; // if b is zero, we need to handle this: but it would easy enough to handle when b was a non-zero constant: c = a / 5; Changing the specializer code to handle this case doubles the performance of division on my x86 XEON processor. Code for patch: else if (op == LIR_divd) { LIns *a = v->oprnd1(); LIns *b = v->oprnd2(); a = isPromote(a->opcode()) ? a->oprnd1() : imm2Int(a); b = imm2Int(b); if (a && b && b->immI() != 0) return out->ins2(LIR_divi, a, b); }

Whiteboard: PACMAN

Steven Johnson

Comment 2

•

14 years ago

Sounds nice. I wonder if it would be profitable to handle non-constant cases here too, with a check-for-zero at runtime? (Results would all have to be promoted to Number since 0 -> Inf but the time savings might still be worthwhile)

Werner Sharp

Reporter

Comment 3

•

14 years ago

The case I was optimize for was when the result stays as integer. In that case, a divide by zero equals zero. To simulate a runtime check, I added AS3 code in my loop... if (!d) c = 0 else c = a / d; This adds a slight bit of overhead to the optimization that we could theoretically emit in the JITed code. One interesting thing is integer division on my Xeon is about 2x faster for small numbers (99/5), about 30% faster for larger numbers and only 10% faster for very large numbers (1000000+). So the big win in really only with small integer values, at least on x86.

Dan Smith

Updated

•

14 years ago

Assignee: nobody → wsharp

Status: NEW → ASSIGNED

Flags: flashplayer-qrb+

Priority: -- → P3

Target Milestone: --- → flash10.2

Werner Sharp

Reporter

Updated

•

14 years ago

Assignee: wsharp → nobody

Werner Sharp

Reporter

Updated

•

14 years ago

Summary: Specializer could support integer division for positive integer constants. → Specializer could support integer division for non-zero integer constants.

Werner Sharp

Reporter

Comment 4

•

14 years ago

Brightspot data from the specializer where we have an integer_d following a LIR_divd 0/1 values are: a->isPromote(), b->isPromote(), a->imm2Int() != 0, b->imm2Int() != 0, a && b, our integer constant is non-zero. So 216 divide ops have a int->double promotion on one side plus a non-zero constant on the other. These we can optimize to an integer divide: 216 DIVD 1 0 0 1 1 (nonzero const=1) int / constant int - we can optimize 208 DIVD 0 0 0 1 0 (nonzero const=1) float / constant int 149 DIVD 1 0 0 0 0 (nonzero const=0) int / float 111 DIVD 0 0 0 0 0 (nonzero const=0) float / float 20 DIVD 0 1 0 0 0 (nonzero const=0) float / int 19 DIVD 1 1 0 0 1 (nonzero const=0) int but not a constant divisor 9 DIVD 0 0 1 0 0 (nonzero const=0) constant int / float 6 DIVD 0 1 1 0 1 (nonzero const=0) constant / int And a dump of what our constants are: 142 DIVD 1 0 0 1 1 (nonzero const=2) 11 DIVD 1 0 0 1 1 (nonzero const=1000) 8 DIVD 1 0 0 1 1 (nonzero const=3) 7 DIVD 1 0 0 1 1 (nonzero const=4) 7 DIVD 1 0 0 1 1 (nonzero const=1024) 5 DIVD 1 0 0 1 1 (nonzero const=8) 4 DIVD 1 0 0 1 1 (nonzero const=60) 3 DIVD 1 0 0 1 1 (nonzero const=16) 1 DIVD 1 0 0 1 1 (nonzero const=108) 1 DIVD 1 0 0 1 1 (nonzero const=18) 1 DIVD 1 0 0 1 1 (nonzero const=10) 1 DIVD 1 0 0 1 1 (nonzero const=60000) 1 DIVD 1 0 0 1 1 (nonzero const=5) Perhaps we should specialize the /2 as >>1 beyond optimizing the integer division itself.

Werner Sharp

Reporter

Comment 5

•

14 years ago

Attached patch support for faster integer divide specialization (obsolete) (deleted) — Details — Splinter Review

If our divisor is a non-zero integer constant, optimize it to an integer divide. If our dividend is unsigned and our divisor is 2, use a rshlui. Out of 249 divide by 2 calls, 36 of them were an unsigned dividend. right-shift does not work for signed variables since (-1>>1 != -1/2).

Attachment #478843 - Flags: superreview?(edwsmith)

Attachment #478843 - Flags: review?(wmaddox)

Edwin Smith

Comment 6

•

14 years ago

(In reply to comment #4) > Perhaps we should specialize the /2 as >>1 beyond optimizing the integer > division itself. Not too hard to make this more general: if exactlyOneBit(rhs) == true, then the RHS is a power of 2. Then, nanojit::msbSet32 or lsbSet32() will give you the shift amount.

Edwin Smith

Comment 7

•

14 years ago

Also beware LIR_divi is currently only implemented on x86-32

William Maddox

Comment 8

•

14 years ago

Comment on attachment 478843 [details] [diff] [review] support for faster integer divide specialization Looks good. R+ It would be possible to avoid the call imm2Int(b) if a == NULL, rather than computing both a and b and testing a && b. The same could be said for the preceding addd/subd/muld case as well, though. Do any of the performance suite benchmarks show significant improvement?

Attachment #478843 - Flags: review?(wmaddox) → review+

Werner Sharp

Reporter

Comment 9

•

14 years ago

Attached patch incorporate review suggestions. #ifdef ia32 and amd64 (obsolete) (deleted) — Details — Splinter Review

Made changes suggested from both Bill and Ed. IA32 and AMD64 are the two backends that support LIR_divi currently. I have only tested this with micro benchmarks: int/2 - float 497 msec, int 456 msec unsigned/2 - float 614 msec, int shift 266 msec int/3 - float 1172, int 758 msec uint/1024 - float 614 msec, int shift 268 msec

Attachment #478843 - Attachment is obsolete: true

Attachment #479044 - Flags: superreview?(edwsmith)

Attachment #479044 - Flags: review?(wmaddox)

Attachment #478843 - Flags: superreview?(edwsmith)

Steven Johnson

Comment 10

•

14 years ago

Comment on attachment 479044 [details] [diff] [review] incorporate review suggestions. #ifdef ia32 and amd64 Other CPU-specific features have things like NJ_EXPANDED_LOADSTORE_SUPPORTED, NJ_F2I_SUPPORTED, etc... if there isn't one for LIR_divi, we should add one and use it, rather than adding CPU-specific ifdefs here

Werner Sharp

Reporter

Comment 11

•

14 years ago

Attached patch Update patch with NJ_DIVI_SUPPORTED flag (obsolete) (deleted) — Details — Splinter Review

Attachment #479044 - Attachment is obsolete: true

Attachment #479521 - Flags: superreview?(edwsmith)

Attachment #479521 - Flags: review?(wmaddox)

Attachment #479044 - Flags: superreview?(edwsmith)

Attachment #479044 - Flags: review?(wmaddox)

Edwin Smith

Comment 12

•

14 years ago

Comment on attachment 479521 [details] [diff] [review] Update patch with NJ_DIVI_SUPPORTED flag clearing SR flag until rebased (Specializer went away in bug 600649).

Attachment #479521 - Flags: superreview?(edwsmith)

Steven Johnson

Comment 13

•

14 years ago

See also https://bugzilla.mozilla.org/show_bug.cgi?id=600459, "better codegen for div/mod by a constant"

William Maddox

Comment 14

•

14 years ago

Comment on attachment 479521 [details] [diff] [review] Update patch with NJ_DIVI_SUPPORTED flag R-, since the patch needs rebasing as Edwin noted above. While you are at it, you might consider multiplication by powers of two. Also, the integer-specific opcode multiply_i would benefit from the same optimization. (There is no divide_i.) I suspect we are more likely to see int rather than uint arguments, so it might be slightly preferable to reverse the operands to the '&&' below: if (exactlyOneBit(intConst) && aOpcode == LIR_ui2d) {

Attachment #479521 - Flags: review?(wmaddox) → review-

Werner Sharp

Reporter

Comment 15

•

14 years ago

Attached patch rebase patch (deleted) — Details — Splinter Review

Rebased to current codebase. Please review in isolation from bug 600585 which if it lands first will modify this patch ever so slightly. (a trivial change that I don't think requires a re-review).

Attachment #479521 - Attachment is obsolete: true

Attachment #482396 - Flags: superreview?(edwsmith)

Attachment #482396 - Flags: review?(wmaddox)

William Maddox

Comment 16

•

14 years ago

Comment on attachment 482396 [details] [diff] [review] rebase patch R+ Nit: You rely on the default in Native.h to define NJ_DIVI_SUPPORTED on unsupported platforms. In the machine-specific NativeXXX.h files, it appears that the convention is to enumerate the unsuported options as well, explicitly giving them a value of zero. I have not verified whether this is done with complete consistency. Following this convention while having a transparent fallback to a default seems to be just asking for inconsistent application of the convention to creep int. I'm not sure what the policy should be, but perhaps it is best to go with the flow for the purposes of this patch.

Attachment #482396 - Flags: review?(wmaddox) → review+

Werner Sharp

Reporter

Updated

•

14 years ago

Attachment #482396 - Flags: superreview?(edwsmith)

Werner Sharp

Reporter

Comment 17

•

14 years ago

TR part: http://hg.mozilla.org/tamarin-redux/rev/4cc9be867ef8

Werner Sharp

Reporter

Comment 18

•

14 years ago

NJ part: http://hg.mozilla.org/projects/nanojit-central/rev/064dc57a5763

Status: ASSIGNED → RESOLVED

Closed: 14 years ago

Resolution: --- → FIXED

Steven Johnson

Comment 19

•

14 years ago

Is it silly to ask why only the x86 backend supports LIR_divi? (Surely ARM has an integer division instruction?)

Werner Sharp

Reporter

Comment 20

•

14 years ago

ARM has no hardware support for integer division. It's done in a C++ helper.

Nicholas Nethercote [inactive]

Comment 21

•

14 years ago

http://hg.mozilla.org/tracemonkey/rev/7eadec8c91c6

Whiteboard: PACMAN → PACMAN, fixed-in-nanojit, fixed-in-tracemonkey

Rick Reitmaier

Comment 22

•

14 years ago

http://hg.mozilla.org/tamarin-redux/rev/b986959fbce5

Whiteboard: PACMAN, fixed-in-nanojit, fixed-in-tracemonkey → PACMAN, fixed-in-nanojit, fixed-in-tracemonkey, fixed-in-tamarin

Trevor Baker

Updated

•

14 years ago

Flags: flashplayer-bug+

Edwin Smith

Updated

•

14 years ago

Depends on: 642535

You need to log in before you can comment on or make changes to this bug.