Closed Bug 911254 Opened 11 years ago Closed 11 years ago

Odinmonkey: (ARM) use the 'compare immediate' instruction for bounds checks

Categories

(Core :: JavaScript Engine, defect)

ARM
All
defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla27
Tracking Status
firefox26 --- fixed
firefox27 --- fixed

People

(Reporter: dougc, Assigned: dougc)

References

(Blocks 1 open bug)

Details

(Whiteboard: [games])

Attachments

(1 file, 6 obsolete files)

The ARM backend currently uses a logical shift and comparison to zero for bounds checks but this only works for heap lengths that are a power of two. Using the 'compare immediate' instruction allows a wider range of heap lengths. The immediate value to this instruction is an eight bit value that can be shifted in multiples of two bits. Thus the quantum by which the heap length can be increased depends on the heap length as follows: up to 4080M: 16M up to 1020M: 4M up to 255M: 1M up to 65280k: 256k up to 16320k: 64k up to 4080k: 16k up to 1020k: 4k It seems unfortunate to expose the asm.js spec. to such ARM constraints, but it's better than the power-of-two restriction. Perhaps a subset of these could be exposed. There is also a compare-negative-immediate instruction that could add some other supported lengths but it's not clear that these would be particularly useful.
Attached patch Backend support. (obsolete) (deleted) — Splinter Review
This patch has the backend support and switches to using the compare-immediate instruction. TODO update the AsmJS linker to accept some range of the supported heap lengths and add tests for these.
The max size of an ArrayBuffer is INT32_MAX at the moment, so I think that means we only need a quanta of 8MB to support the largest size. I'm inclined to *always* require a quanta of 8MB since this is conveniently close and bigger than the "huge page" size option on Linux/Windows which conceivably some day we'd like to support (right now we'll just disable asm.js compilation on x64 if the system page size isn't 4KB).
Another question: is there any performance difference on the awfy asmjs-apps benchmarks?
(In reply to Luke Wagner [:luke] from comment #2) > The max size of an ArrayBuffer is INT32_MAX at the moment, so I think that > means we only need a quanta of 8MB to support the largest size. I'm > inclined to *always* require a quanta of 8MB since this is conveniently > close and bigger than the "huge page" size option on Linux/Windows which > conceivably some day we'd like to support (right now we'll just disable > asm.js compilation on x64 if the system page size isn't 4KB). The instruction encoding only supports shifts of the 8 bits that are a multiple of two, so a quanta of 8MB might not achievable for a maximum heap size of INT32_MAX. E.g. the encoding options are: abcdefgh 00000000 00000000 00000000 00abcdef gh000000 00000000 00000000 ...
Ah, I missed the "multiple of two" part. Yes, so then perhaps we could start with a simple 16MB quanta.
Support for non-power-of-two heap lengths can help with the optimization of heap access, particularly when using masking for safe heap access. For example, HEAP8[(i&0x07FFFFFF)+96] could be proven to be a safe heap access if the heap length is 128M+1M. The top 1M would be a buffer zone that could still be accessed by a non-masked index. Similarly for a base pointer plus an offset proven to have a limited range. For example, HEAP8[(i&0x07FFFFFF)+(k&0xff)] could proven to be a safe heap access. From the perspective of this use, the table in comment 0 could be rewritten to relate the mask heap length to the minimum possible buffer zone size beyond the masked length: Masked length, minimum quanta 2048M 16M ?? 1024M 16M 512M 4M 256M 4M 128M 1M 64M 1M 32M 256k 16M 256k 8M 64k 4M 64k 2M 16k 1M 16k 512k 4k This scheme hits a problem for a 2048M masked heap length if the maximum heap length is also 2048M. An alternative is to use memory protection to implement such a buffer zone beyond the heap length, and this would only be restricted by the pages size. Both approaches might be offered, giving the programmer the option to choose an explicit buffer zone length or use a default implicit buffer zone using memory protection.
(In reply to Luke Wagner [:luke] from comment #3) > Another question: is there any performance difference on the awfy asmjs-apps > benchmarks? Testing on a Nexus 4 (Krait) shows a good gain in performance! For example, almost a 10% reduction in run time on memops. This is an unexpected result, but welcomed. asmjs-ubench memops: mean 42.962 vs 38.955 std 0.29575 seconds Double checked the code, and it looks ok. Here's the hottest block: 0xb5ffa614: adds r4, r0, r3 0xb5ffa618: cmp r4, #134217728 ; 0x8000000 <<< bounds check 0xb5ffa61c: ldrsbcc r4, [r11, r4] 0xb5ffa620: movcs r4, #0 0xb5ffa624: and r5, r4, #1 0xb5ffa628: adds r4, r5, r2 0xb5ffa62c: adds r2, r3, #1 0xb5ffa630: cmp r2, #1048576 ; 0x100000 0xb5ffa634: bge 0xb5ffa644
Comment on attachment 797918 [details] [diff] [review] Backend support. Requesting approval to land the backend changes on the basis that the change gives a useful performance improvement. Could we keep the bug open and land a separate patch to enable the use of some non-power-of-two heaps sizes once the spec. changes are decided?
Attachment #797918 - Flags: review?(luke)
Comment on attachment 797918 [details] [diff] [review] Backend support. It appears we can do even better performance wise for the load-heap sequence. Let me explore this and double check.
Attachment #797918 - Flags: review?(luke)
Testing shows that swapping the two load instructions gives a further performance gain. For the memops benchmark the gain is around 10% and for a particular game the gain is 2% on significant noise but still trending in the right direction. Tested on the Krait processor in the Nexus 4. This patch includes only the ARM backend changes related to this bug. It assumes bug 865516 has landed.
Assignee: general → dtc-moz
Attachment #797918 - Attachment is obsolete: true
Attachment #799465 - Flags: review?(mrosenberg)
Nice! Could you describe what the new instruction sequence is?
(In reply to Luke Wagner [:luke] from comment #11) > Nice! Could you describe what the new instruction sequence is? The ARM uses a test (now compare-immediate) followed by two conditional loads, one load for the in-bounds case and another for the out-of-bounds case. The latest change just swapped the order of these two loads, so that the out-of-bounds conditional load is now first.
Comment on attachment 799465 [details] [diff] [review] Further optimize the load-heap instruction sequence. Review of attachment 799465 [details] [diff] [review]: ----------------------------------------------------------------- ::: js/src/jit/arm/CodeGenerator-arm.cpp @@ +1844,3 @@ > } else { > + Register d = ToRegister(ins->output()); > + masm.ma_mov(Imm32(0), d, NoSetCond, Assembler::AboveOrEqual); I was thinking of just making this unconditional. Can you see if that has any effect on perf?
Attachment #799465 - Flags: review?(mrosenberg) → review+
(In reply to Marty Rosenberg [:mjrosenb] from comment #13) > Comment on attachment 799465 [details] [diff] [review] > Further optimize the load-heap instruction sequence. ... > > + Register d = ToRegister(ins->output()); > > + masm.ma_mov(Imm32(0), d, NoSetCond, Assembler::AboveOrEqual); > > I was thinking of just making this unconditional. Can you see if that has > any effect on perf? Yes, this was tried but slowed performance.
Attached patch Rebase ARM backend changes. (obsolete) (deleted) — Splinter Review
Carry forward r+
Attachment #799465 - Attachment is obsolete: true
Attachment #801054 - Flags: review+
Attached patch Fix rebase. (obsolete) (deleted) — Splinter Review
Sorry, rebase was broken. Carrying forward r+.
Attachment #801054 - Attachment is obsolete: true
Attachment #801059 - Flags: review+
Keywords: checkin-needed
Whiteboard: leave open
This seems to be the cause of 23% performance regression on the awft asmjs-ubench-memops Tegra-2 benchmark. Performance of the awft asm.js apps seems to be more positive: asmjs-apps-bullet and asmjs-apps-zlib seem to have improved. So this does not appear to be as simple as reverting back to using 'mov' for some processors. A patch that supports both mov and cmp for use as a bounds check plus some instruction order variations has been code up, and I will test on a wider range of processors.
This patch enables use of all possible heap lengths as constrained by the x64 page size restriction and the ARM 'cmp immediate' instruction encoding. If you want to enable use of just a subset then it should be trivial so just let me know. The heap length policy has been abstracted into two functions: roundUpAsmJSHeapLength, and IsValidAsmJSHeapLength. On a link failure due to an invalid heap length the message reports the failing length and makes a suggestion by rounding up to the next valid length.
Attachment #804559 - Flags: review?(luke)
Forgot to test for heap sizes less than 4096 bytes. Added this and reworked the code to hopefully be a little clearer. Note the sparc has a page size of 8192 bytes, and perhaps it would be prudent to increase the minimum to 8192 and depreciate the 4096 minimum heap length. It's not clear how important this is to ff26 at this stage.
Attachment #804559 - Attachment is obsolete: true
Attachment #804559 - Flags: review?(luke)
Attachment #805144 - Flags: review?(luke)
Comment on attachment 805144 [details] [diff] [review] Catch heap sizes less than 4096 bytes, and rework the code for improved clarity. Review of attachment 805144 [details] [diff] [review]: ----------------------------------------------------------------- Great job! ::: js/src/jit/AsmJS.cpp @@ +929,5 @@ > +// 000000ab cdefgh00 00000000 00000000 > +// 00000000 abcdefgh 00000000 00000000 > +// 00000000 00abcdef gh000000 00000000 > +// 00000000 0000abcd efgh0000 00000000 > +// ... On the ellipsis: are there any other immediates supported? @@ +932,5 @@ > +// 00000000 0000abcd efgh0000 00000000 > +// ... > +// > +// ARM Thumb T2 encoding supports the following immediate constants and although > +// this mode is not currently used it might be worth considering as a constraint: If I'm reading correctly, ARM Thumb T2 seems like a strict generalization of the immediate constraints on ARMv7. Is that right? If so, then it seems like you could remove the chart and comment. @@ +961,5 @@ > +// Heap length 0x00100000 to 0x003fc000 quanta 0x00004000 > +// Heap length 0x00001000 to 0x000ff000 quanta 0x00001000 > +// > +uint32_t > +js::roundUpAsmJSHeapLength(uint32_t length) { { on new line @@ +977,5 @@ > + return (length + 0x000fffff) & ~0x000fffff; > + if (length < 0x10000000u) // < 1024M quanta 4M > + return (length + 0x003fffff) & ~0x003fffff; > + // < 4096M quanta 16M. Note zero is returned if over 0xff000000 but such > + // length are not currently valid. Can you JS_ASSERT(length <= 0xff000000)? @@ +982,5 @@ > + return (length + 0x00ffffff) & ~0x00ffffff; > +} > + > +bool > +js::IsValidAsmJSHeapLength(uint32_t length) { { on new line ::: js/src/jit/AsmJS.h @@ +79,5 @@ > static const size_t AsmJSAllocationGranularity = 4096; > > +// These functions define the valid heap lengths. > +extern uint32_t > +roundUpAsmJSHeapLength(uint32_t length); Capitalize 'r' and also, how about 'RoundUpToNextValidAsmJSHeapLength'? Otherwise 'round' is a bit ambiguous. ::: js/src/jit/AsmJSLink.cpp @@ +214,5 @@ > return LinkFail(cx, "bad ArrayBuffer argument"); > > heap = &bufferVal.toObject().as<ArrayBufferObject>(); > > + if (!IsValidAsmJSHeapLength(heap->byteLength()) || heap->byteLength() < AsmJSAllocationGranularity) { Can you move the >= AsmJSAllocationGranularity check into IsValidAsmJSHeapLength? @@ +216,5 @@ > heap = &bufferVal.toObject().as<ArrayBufferObject>(); > > + if (!IsValidAsmJSHeapLength(heap->byteLength()) || heap->byteLength() < AsmJSAllocationGranularity) { > + return LinkFail(cx, JS_smprintf("ArrayBuffer byteLength 0x%x is not a valid heap length. The next valid length is 0x%x", > + heap->byteLength(), roundUpAsmJSHeapLength(heap->byteLength()))); Nice message!
Attachment #805144 - Flags: review?(luke) → review+
Attached patch Address reviewer feedback. (deleted) — Splinter Review
Address reviewer feedback. Carry forward r+
Attachment #805144 - Attachment is obsolete: true
Attachment #805612 - Flags: review+
(In reply to Luke Wagner [:luke] from comment #22) > Comment on attachment 805144 [details] [diff] [review] > Catch heap sizes less than 4096 bytes, and rework the code for improved > clarity. > > Review of attachment 805144 [details] [diff] [review]: > ----------------------------------------------------------------- Thank you for the quick review. > ::: js/src/jit/AsmJS.cpp > @@ +929,5 @@ > > +// 000000ab cdefgh00 00000000 00000000 > > +// 00000000 abcdefgh 00000000 00000000 > > +// 00000000 00abcdef gh000000 00000000 > > +// 00000000 0000abcd efgh0000 00000000 > > +// ... > > On the ellipsis: are there any other immediates supported? Yes, smaller immediates are supported but this is all we need because the minimum granularity is 4096 or 0x1000. There is also a compare-negative-immediate instruction that could add some more, but it is not clear if they are useful. > @@ +932,5 @@ > > +// 00000000 0000abcd efgh0000 00000000 > > +// ... > > +// > > +// ARM Thumb T2 encoding supports the following immediate constants and although > > +// this mode is not currently used it might be worth considering as a constraint: > > If I'm reading correctly, ARM Thumb T2 seems like a strict generalization of > the immediate constraints on ARMv7. Is that right? If so, then it seems > like you could remove the chart and comment. Ok, the table has been removed, and a small comment added noting that it is also compatible with the Thumb T2 encodeing. > @@ +961,5 @@ > > +// Heap length 0x00100000 to 0x003fc000 quanta 0x00004000 > > +// Heap length 0x00001000 to 0x000ff000 quanta 0x00001000 > > +// > > +uint32_t > > +js::roundUpAsmJSHeapLength(uint32_t length) { > > { on new line Done > @@ +977,5 @@ > > + return (length + 0x000fffff) & ~0x000fffff; > > + if (length < 0x10000000u) // < 1024M quanta 4M > > + return (length + 0x003fffff) & ~0x003fffff; > > + // < 4096M quanta 16M. Note zero is returned if over 0xff000000 but such > > + // length are not currently valid. > > Can you JS_ASSERT(length <= 0xff000000)? Done. > @@ +982,5 @@ > > + return (length + 0x00ffffff) & ~0x00ffffff; > > +} > > + > > +bool > > +js::IsValidAsmJSHeapLength(uint32_t length) { > > { on new line Done. > ::: js/src/jit/AsmJS.h > @@ +79,5 @@ > > static const size_t AsmJSAllocationGranularity = 4096; > > > > +// These functions define the valid heap lengths. > > +extern uint32_t > > +roundUpAsmJSHeapLength(uint32_t length); > > Capitalize 'r' and also, how about 'RoundUpToNextValidAsmJSHeapLength'? > Otherwise 'round' is a bit ambiguous. Done. > ::: js/src/jit/AsmJSLink.cpp > @@ +214,5 @@ > > return LinkFail(cx, "bad ArrayBuffer argument"); > > > > heap = &bufferVal.toObject().as<ArrayBufferObject>(); > > > > + if (!IsValidAsmJSHeapLength(heap->byteLength()) || heap->byteLength() < AsmJSAllocationGranularity) { > > Can you move the >= AsmJSAllocationGranularity check into > IsValidAsmJSHeapLength? Good suggestion. Done.
Whiteboard: [leave open]
Keywords: checkin-needed
Status: NEW → RESOLVED
Closed: 11 years ago
Flags: in-testsuite+
Resolution: --- → FIXED
Target Milestone: --- → mozilla27
Comment on attachment 805612 [details] [diff] [review] Address reviewer feedback. [Approval Request Comment] Bug caused by (feature/regressing bug #): User impact if declined: asm.js code will not be able to use the new non-power-of-two heap lengths and backwards compatibility requirements would prevent app developers using in future until ff26 is history. This feature might help reduce memory usage and also improve performance of apps. Testing completed (on m-c, etc.): includes tests Risk to taking this patch (and alternatives if risky): low String or IDL/UUID changes made by this patch:
Attachment #805612 - Flags: approval-mozilla-aurora?
Depends on: 917800
Whiteboard: [games]
Attachment #805612 - Flags: approval-mozilla-aurora? → approval-mozilla-aurora+
Attachment #801059 - Attachment is obsolete: true
Blocks: gecko-games
Depends on: CVE-2015-0817
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: