Closed Bug 865516 Opened 12 years ago Closed 11 years ago

Asm.js: optimize heap access when the index and length range are known.

Categories

(Core :: JavaScript Engine, defect)


Tracking


RESOLVED FIXED
mozilla26

People

(Reporter: dougc, Assigned: dougc)

References

(Blocks 1 open bug)

Details

(Whiteboard: [games])

Attachments

(1 file, 14 obsolete files)

(deleted), patch
dougc: review+
The asm.js heap buffer is guaranteed to be at least 4096 bytes, and this knowledge can be used to avoid inline bounds checks on the index (x86 and ARM) when the index is a constant referencing an element within the first 4096 bytes. There are also opportunities to use more compact instruction sequences by encoding an immediate offset or displacement: ARM and x64 can encode small offsets directly in the load or store instruction, and on x86 the offset can be recorded and applied when patching the instruction displacement. This minimizes the number of instructions required for these loads and stores. Asm.js applications could exploit this by packing frequently used static variables into the first page of the buffer.
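For illustration, a minimal asm.js-style sketch of the access pattern described above; the module shape, names and indices are hypothetical, chosen only so that the constant byte offsets fall within the first 4096 bytes of the heap:

  function Module(stdlib, foreign, buffer) {
      "use asm";
      var i32 = new stdlib.Int32Array(buffer);  // the buffer is at least 4096 bytes
      function f(x) {
          x = x|0;
          // Constant element indices 0 and 16 address bytes 0 and 64, both inside
          // the first page, so no inline bounds check is needed for these accesses.
          i32[0] = x;
          i32[16] = (i32[0]|0) + 1;
          return i32[16]|0;
      }
      return f;
  }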
Hardware: ARM → All
Summary: Optimize asm.js access to the first page of the heap. → Optimize asm.js access to heap when the index is a constant.
Attached patch Support for the ARM, x86 and x64 archs. (obsolete) (deleted) — Splinter Review
Here's a first cut at the support. There is something for all the backends: the ARM and x86 avoid unnecessary bounds checks, and all the backends use an immediate constant offset when possible, which potentially frees a register and reduces code size. The ARM and x64 avoid accounting for the access when the index is a constant, which is an extra saving. For the x86, the constant index is stored in the instruction and the AsmJS linker adds the heap base when it is known. The maximum constant index seen is noted, and the AsmJS linker checks that the buffer is large enough and bails out if not. Without this it would only be possible to optimize access to the first 4096 bytes of the heap. Thanks for the suggestion, Luke.

There would appear to be further opportunities to combine knowledge of the maximum heap index with range analysis to optimize away bounds checks. An app could include a high constant-index heap access early to effectively declare a minimum heap size, and combined with masks etc. on variable indexes declaring their range, this could allow bounds checks to be optimized away (see the sketch below). This may be more effective if the maximum index were known earlier - currently it is recorded when the instructions are emitted.

The jit-tests have been updated and extended to test all the new code paths. The AsmJS link failure test is not fully effective; perhaps the warning emitted upon successful compilation of asm.js code makes it difficult to detect asm.js compilation-failure warnings.

The combination of avoiding the bounds checks and using shorter instruction sequences gives a useful win. For this to be most effective, app developers would want to place the most frequently accessed state at the start of the heap so that the indexes are small and can be encoded in one instruction - the details vary between archs. But there is some benefit for all constant-index accesses to the heap.
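A hedged sketch of the 'declare a minimum heap size' idea mentioned above; the names and the 0xfffff constant are illustrative only, and whether the masked access in get() actually avoids its bounds check depends on the range analysis work discussed later in this bug:

  function Module(stdlib, foreign, buffer) {
      "use asm";
      var u8 = new stdlib.Uint8Array(buffer);
      function declareSize() {
          // A high constant index early in the module effectively requires, at
          // link time, a heap of at least 0x100000 bytes.
          return u8[0xfffff]|0;
      }
      function get(i) {
          i = i|0;
          // A masked variable index whose range is bounded by the declared heap
          // size; the hope is that range analysis can then drop the bounds check.
          return u8[i & 0xfffff]|0;
      }
      return { declareSize: declareSize, get: get };
  }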
Attachment #744582 - Flags: feedback?(luke)
Comment on attachment 744582 [details] [diff] [review] Support for the ARM, x86 and x64 archs. Review of attachment 744582 [details] [diff] [review]: ----------------------------------------------------------------- Awesome, this is looking great. I'd be interested to see the speedups on a few microbenchmarks. In particular, it would be useful to see what sort of speed difference there is when using small constant global indices (Alon was interested in this because he could change Emscripten to do this). ::: js/src/ion/AsmJSLink.cpp @@ +189,5 @@ > if (!IsPowerOfTwo(heap->byteLength()) || heap->byteLength() < AsmJSAllocationGranularity) > return LinkFail(cx, "ArrayBuffer byteLength must be a power of two greater than or equal to 4096"); > > + if (heap->byteLength() <= module.maxConstantHeapAccess()) > + return LinkFail(cx, "ArrayBuffer byteLength must be greater than constant pointer accesses into the array."); Could you add a comment explaining why this check is sufficient without considering the size of the loaded datum (the reason being that the load starts on an aligned boundary and the byteLength is a power of 2)? ::: js/src/ion/Lowering.cpp @@ +2531,5 @@ > + LAllocation ptrAlloc; > + if (ptr->type() == MIRType_Int32 && ptr->isConstant()) > + ptrAlloc = LAllocation(ptr->toConstant()->vp()); > + else > + ptrAlloc = useRegisterAtStart(ptr); In all the cases for lowering, I think you can use useRegisterOrConstant/useRegisterOrConstantAtStart instead of this manual branching. (Also, you can depend on ptr->type() == MIRType_Int32; this is implied by the asm.js type rules.) ::: js/src/ion/RegisterSets.h @@ +758,1 @@ > AsmJSHeapAccess(uint32_t cmp, uint32_t offset, uint32_t after, ArrayBufferView::ViewType vt, s/'cmp/'cmp'/ and s/'offset/'offset'/ @@ +792,5 @@ > bool isFloat32Load() const { return isFloat32Load_; } > ion::AnyRegister loadedReg() const { return ion::AnyRegister::FromCode(loadedReg_); } > > #if defined(JS_CPU_X86) > + bool isPatchLength() const { return cmpDelta_ > 0; } How about naming this "hasLengthCheck". ::: js/src/ion/x64/CodeGenerator-x64.cpp @@ +416,5 @@ > + return true; > + } > + > + switch (vt) { > + case ArrayBufferView::TYPE_INT8: masm.movxbl(srcAddr, ToRegister(ins->output())); break; General style nit: the cases of a switch are indented 2 chars from the start of the switch. (In case you haven't seen it, here's the SpiderMonkey style guide: https://wiki.mozilla.org/JavaScript:SpiderMonkey:Coding_Style.) ::: js/src/ion/x86/Assembler-x86.h @@ +469,5 @@ > CodeOffsetLabel movxblWithPatch(Address src, Register dest) { > masm.movxbl_mr_disp32(src.offset, src.base.code(), dest.code()); > return masm.currentOffset(); > } > + CodeOffsetLabel movxblWithPatch(const AbsoluteAddress &src, Register dest) { The above "Load from *(base + disp32) ..." comment doesn't apply to all these new AbsoluteAddress overloads that you're adding, so could you put them all together and give them a new comment "Load from *address where address can be patched"? ::: js/src/ion/x86/CodeGenerator-x86.cpp @@ +471,5 @@ > const MAsmJSLoadHeap *mir = ins->mir(); > ArrayBufferView::ViewType vt = mir->viewType(); > + const LAllocation *ptr = ins->ptr(); > + const LDefinition *out = ins->output(); > + We try to remove all trailing whitespace (although you can certainly find cases where we missed it). There are a few in this patch. An easy way to catch them is in bugzilla's diff viewer (they look red) or in hg's diff viewer, if you add [color] diff.trailingwhitespace = bold red_background
Attachment #744582 - Flags: feedback?(luke) → feedback+
I should have ended the last sentence in the previous comment with "to your .hgrc".
This revision tries to address the feedback comments, and extends the optimization to leverage the existing Ion range analysis. The maximum of the heap index range lower bounds is noted and rounded up to a power-of-two heap size. This gives a lower bound for the heap size while compiling, and the linker enforces this bound. The code generator can then compare a heap index range to the inferred minimum heap length, and when the access is proven to always be within the bounds of the heap the bounds check can be omitted.

Since all code in a module uses the same heap, the inferred minimum size noted when compiling each function is progressively combined into a bound for the module. With parallel compilation of functions this could lead to non-deterministic optimization; a solution would be to support a heap size declaration. An issue is that an errant heap access could set a large minimum heap length. Using constant indexes alone has the same problem. This would be a static issue that the asm.js code developer could test for, so it might be acceptable.

The patch also optimizes constant-index heap access and tries to use shorter instruction sequences when possible, and this can be exploited by packing frequently accessed static state at the start of the heap.

The range analysis does not appear to assist some of the current benchmarks, but would give tool developers a new opportunity to hoist bounds checks or masks out of loops and to start exploring strategies for doing so. For example, the heap buffer could be doubled in length beyond that required and all accesses offset by a quarter of the buffer length. If pointers were masked to half the buffer length then small offsets from these pointers could be proven to be within range of the heap and the bounds checks optimized away (a sketch follows below). Perhaps the offsets could be hoisted too. Further work could provide implicit buffer zones beyond the requested heap length to assist the removal of bounds checks through range analysis without the need to explicitly offset heap access indexes. This might use an offset to the heap base or memory protection.
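A rough sketch of the doubled-buffer idea above; the sizes, mask and offset are hypothetical and only illustrate how a mask plus a small constant offset could be proven to stay within the heap:

  function Module(stdlib, foreign, buffer) {
      "use asm";
      // Suppose the buffer is 2 MiB even though only 1 MiB is really needed.
      var i32 = new stdlib.Int32Array(buffer);
      function load(p) {
          p = p|0;
          // Mask the pointer to 1 MiB (half the buffer).  Adding an offset of a
          // quarter of the buffer (0x80000, i.e. 512 KiB) keeps the byte index
          // below 0x180000, still inside the 2 MiB buffer, so range analysis
          // could in principle prove the access in bounds and drop the check.
          p = p & 0xfffff;
          return i32[(p + 0x80000) >> 2]|0;
      }
      return load;
  }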
Attachment #744582 - Attachment is obsolete: true
Attachment #746929 - Flags: feedback?(luke)
Summary: Optimize asm.js access to heap when the index is a constant. → Asm.js: optimize heap access when the index and length range are known.
Rebase needed.
Attachment #746929 - Attachment is obsolete: true
Attachment #746929 - Flags: feedback?(luke)
Attachment #746949 - Flags: feedback?(luke)
Comment on attachment 746949 [details] [diff] [review] Asm.js: optimize heap access when the index and length range are known. Review of attachment 746949 [details] [diff] [review]: -----------------------------------------------------------------

I do think we should do more with Range analysis to remove bounds checks (in the ways we've already discussed), but I don't think we should use it in this particular case. The reason is that the results of range analysis will affect the link-time validation behavior in a way that is difficult to formally define in the asm.js spec. That is, the asm.js spec can say "link time validation requires that the heap length is a power of 2 greater than 4096 and greater than the largest i in a constant heap access expression HEAPX[i]", but if we use range analysis, then you could have a program where a constant happens to flow into a HEAPX access and then Odin would fail to validate even though the asm.js spec says it should. Instead, I think we can accumulate the max constant heap access index (seen during asm.js type checking) and this can be observed by lowering: if the input is a constant AND it is less than the max constant heap access observed during type checking, emit the bounds-check-free global heap access (comment on this below).

::: js/src/ion/MIRGenerator.h @@ +129,5 @@ > + if (index > maxAsmJSHeapIndex_) { > + // The heap length is always a power of two, so round up. > + int32_t shift; > + JS_FLOOR_LOG2(shift, index); > + maxAsmJSHeapIndex_ = index | ((1 << shift) - 1); You can use RoundUpPow2 here.

::: js/src/ion/MIRGraph.cpp @@ +24,5 @@ > error_(false), > cancelBuild_(0), > maxAsmJSStackArgBytes_(0), > + performsAsmJSCall_(false), > + maxAsmJSHeapIndex_(4095) Could you replace this constant with AsmJSAllocationGranularity-1? (The name is a bit off, AsmJSMinimumHeapSize would be better; you can rename this if you'd like.)

::: js/src/ion/arm/Lowering-arm.cpp @@ +462,5 @@ > + case ArrayBufferView::TYPE_INT8: case ArrayBufferView::TYPE_UINT8: > + case ArrayBufferView::TYPE_INT16: case ArrayBufferView::TYPE_UINT16: > + case ArrayBufferView::TYPE_INT32: case ArrayBufferView::TYPE_UINT32: > + case ArrayBufferView::TYPE_FLOAT32: case ArrayBufferView::TYPE_FLOAT64: > + lir = new LAsmJSStoreHeap(LAllocation(ptr->toConstant()->vp()), I was thinking, since there is a good bit of duplication in the CodeGenerators anyway, what if we created a new LAsmJS(Load|Store)HeapGlobal? (I think this would also be good because it would allow LAsmJSLoadHeapGlobal to be an LInstructionHelper<1,0,0> and LAsmJSStoreHeapGlobal to be a <0,1,0>, decreasing register pressure.)
Attachment #746949 - Flags: feedback?(luke)
Recording the maximum heap access constant indexes when checking the asm.js code seems like a good approach, and I'll rework the patch to do this. It turns out that the range information available to the backend is rather degraded, so for now it will probably be necessary to move the decision to skip the bounds check into the range analysis stage. Let me explore this further. Regarding the 'register pressure': when the index is a constant no register is allocated for the index, and it's not immediately obvious that the register usage can be reduced further. Sorry, it's not clear what purpose LAsmJS(Load|Store)HeapGlobal would serve? Are these intended to handle a constant index, or more generally an access with no bounds check? For some constant index heap accesses it might be possible to transform directly to using these when checking the asm.js code, otherwise later during range analysis.
Attached patch Rebased, and reworked. (obsolete) (deleted) — Splinter Review
* Added a 'skipBoundsCheck' flag to the MIR heap access ops. This is read by the backend to emit heap accesses without bounds checks.
* Moved the range based code into the range analysis stage. This sets the 'skipBoundsCheck' flag if a bounds check can be proven to be unnecessary. The range state degrades after the range analysis stage so this seems best.
* The minimum heap length constraint is now noted when checking functions and converting to MIR. The cumulative constraint is passed to the compiler analysis stages, but the results of range analysis no longer update the heap length constraint. The heap length constraint should follow simple deterministic rules now.
* Also accept u32[nnn>>2], where nnn is a literal constant, as defining a heap length constraint, and fold this early when checking functions (see the sketch below). This requires a change to the asm.js spec. and needs review.
* This patch includes much of the infrastructure needed for bug 897337. It remains a simple matter to set the skipBoundsCheck flag and note a heap length constraint in order to have a mask eliminate a bounds check.
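As a concrete illustration of the two constant-index forms, written against the behaviour proposed in this revision (both forms impose a link-time heap length constraint); a later comment in this bug drops the constraint for the shifted form, so this sketch only matches this intermediate revision:

  function Module(stdlib, foreign, buffer) {
      "use asm";
      var u32 = new stdlib.Uint32Array(buffer);
      function f() {
          // Both forms refer to byte offset 4936; under this revision each notes
          // a minimum heap length, which is rounded up to a power of two (8192),
          // and the linker rejects any smaller buffer.
          u32[1234] = 1;          // literal element index
          return u32[4936>>2]|0;  // literal byte offset, folded while checking
      }
      return f;
  }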
Attachment #746949 - Attachment is obsolete: true
Blocks: 897337
Attached patch Rework the tests to use less memory. (obsolete) (deleted) — Splinter Review
Rework the tests to use less memory, for the benefit of testing on limited-memory ARM devices.

One issue to consider is that this adds a constraint on the heap size based on constant heap indexes of the form u32[1234] and u32[1234>>2], and this will need coordination with the asm.js specification. Do you want the latter syntax to add a constraint? If it is not added now then code might be written to depend on it not adding a constraint, and might break if this were added later. Looking ahead to the use of 'const' variables, it is handy to be able to use them in these contexts, and the latter syntax is handy for such use.

This patch also adds much of the support needed for the high-bit masking, adding a flag on the MIR objects to skip the bounds check. This flag is set for constant indexes, and when range analysis proves that the bounds check is unnecessary, and hand-written code might be able to take advantage of this to avoid bounds checks in loops. The results of range analysis no longer affect the constraint on the heap size, even if range analysis proves that code accesses the heap out of the expected bounds.
Attachment #785747 - Attachment is obsolete: true
Attachment #790775 - Flags: review?(luke)
Comment on attachment 790775 [details] [diff] [review] Rework the tests to use less memory. Review of attachment 790775 [details] [diff] [review]: -----------------------------------------------------------------

First of all, this is looking great! I'm definitely excited about landing this: it looks like this removes about 10% of total heap access bounds checks (9% from constant indices, 1% via RangeAnalysis) on Citadel and 3.5% of total heap accesses from WMW (half from constant indices, half via range analysis).

(In reply to Douglas Crosher [:dougc] from comment #9) > One issue to consider is that this adds a constraint on the heap size based > on constant heap indexes of the form u32[1234] and u32[1234>>2] and this > will need coordination with the asm.js specification. Do you want the latter > syntax to add a constraint? If it is not added now then code might be > written to depend on it not adding a constraint and might break if this were > added later. Looking ahead to the use of 'const' variables, it is handy to > be able to use them in these contexts and the latter syntax is handy for such > use.

I had been thinking that the rule "if you use >>, there is no link-time constraint" would actually be useful since it would let code generators reliably know that they wouldn't trigger link-time failures from code which perhaps was never executed (by some dynamic means).

::: js/src/jit/AsmJS.cpp @@ +1162,5 @@ > > int64_t usecBefore_; > SlowFunctionVector slowFunctions_; > > + uint32_t minHeapLength_; Instead of maintaining this state locally in the ModuleCompiler and propagating to the AsmJSModule at the end, how about just having ModuleCompiler::noteMinHeapLength call AsmJSModule::noteMinHeapLength? (Could then remove ModuleCompiler::minHeapLength.)

@@ +1996,5 @@ > + load->setSkipBoundsCheck(true); > + // It is adequate to note the ptrValue+1 rather than rounding up to the next > + // access-size boundary because access is always aligned and the constraint will be > + // rounded up to the nearest power of two later. > + m().noteMinHeapLength((uint32_t) ptrValue + 1); I think this is technically fine, but I'd much rather do the noteMinHeapLength call from CheckArrayAccess in the IsLiteralUint32 branch so that way it is logically part of the type checking rules; FunctionCompiler methods should basically just do unsurprising MIR-gen implementation stuff. But then it'd be bad to have the load->setSkipBoundsCheck(true) here (b/c it needs to match up exactly with the noteMinHeapLength call), so what I propose is that we have an enum: enum NeedsBoundsCheck { NEEDS_BOUNDS_CHECK, NO_BOUNDS_CHECK }; and change the signature of CheckArrayAccess to take a NeedsBoundsCheck* outparam which is then passed to loadHeap() so loadHeap() calls setSkipBoundsCheck(true) if it gets NO_BOUNDS_CHECK. Second, to emphasize the link-time check, perhaps we could rename noteMinHeapLength to requireHeapLengthToBeAtLeast(n)?

@@ +2013,5 @@ > + int32_t ptrValue = ptr->toConstant()->value().toInt32(); > + if (ptrValue >= 0) { > + store->setSkipBoundsCheck(true); > + // See comment above in loadHeap. > + m().noteMinHeapLength((uint32_t) ptrValue + 1); Ditto

@@ +4773,5 @@ > m.parser().release(mark); > > + // Copy the cumulative minimum heap size constraint to the MIR for use in analysis. > + // The length is also constrained to be a power of two, so firstly round up. > + size_t len = mozilla::RoundUpPow2((size_t) m.minHeapLength()); Given that AsmJSLink already does an IsPowerOfTwo check, it seems like we could leave out this detail here and keep the checks orthogonal.

::: js/src/jit/AsmJSModule.h @@ +663,5 @@ > + } > + uint32_t minHeapLength() const { > + return minHeapLength_; > + } > + Trailing whitespace (there's a few others, so check the patch with the .hgrc changes mentioned in an earlier comment so that they show up easily).

::: js/src/jit/MIR.h @@ +8264,5 @@ > > ArrayBufferView::ViewType viewType() const { return viewType_; } > MDefinition *ptr() const { return getOperand(0); } > + bool skipBoundsCheck() const { return skipBoundsCheck_; } > + void setSkipBoundsCheck(bool v) { skipBoundsCheck_ = v; } Do you think it'd make sense to factor out a base class MAsmJSHeapAccess (derived by Load/Store) that held skipBoundsCheck, viewType, and ptr?

::: js/src/jit/x64/Lowering-x64.cpp @@ +154,5 @@ > + switch (ins->viewType()) { > + case ArrayBufferView::TYPE_INT8: case ArrayBufferView::TYPE_UINT8: > + case ArrayBufferView::TYPE_INT16: case ArrayBufferView::TYPE_UINT16: > + case ArrayBufferView::TYPE_INT32: case ArrayBufferView::TYPE_UINT32: > + lir = new LAsmJSStoreHeap(LAllocation(ptr->toConstant()->vp()), I'm probably missing something, but why is this extra path necessary? Couldn't you change the existing two cases to useRegisterOrConstantAtStart(ins->ptr())?
Attached patch Address many of the review comments. (obsolete) (deleted) — Splinter Review
The update attempts to address many of the review comments. It will need some more testing after these changes.
Attachment #790775 - Attachment is obsolete: true
Attachment #790775 - Flags: review?(luke)
(In reply to Luke Wagner [:luke] from comment #10) > Comment on attachment 790775 [details] [diff] [review] ... > (In reply to Douglas Crosher [:dougc] from comment #9) > > One issue to consider is that this adds a constraint on the heap size based > > on constant heap indexes of the form u32[1234] and u32[1234>>2] and this > > will need coordination with the asm.js specification. Do you want the later > > syntax to add a constraint? If it is not added now then code might be > > written to depend on it not adding a constraint and might break if this were > > added later. Looking ahead to the use of 'const' variables, it is handy to > > be able to use them in these contexts and the later syntax is handy for such > > use. > > I had been thinking that the rule "if you use >>, there is no link-time > constraint" would actually be useful since it would let codegenerators > reliably know that they wouldn't trigger link-time failures from code which > perhaps was never executed (by some dynamic means). One example where the >> case was handy was replacing a var holding a pointer into the heap with a 'const' var when appropriate, such as the tempDouble location in emscripten. I have left it in for now, but it is trivial to remove once a final decision is made. Looking at the non->> case raises a concern that this might not be handled correctly in the current code. For example, it appears the u32[0x40000000] would be converted to u32[0]? Let me double check the current code in this areas too. > ::: js/src/jit/AsmJS.cpp > @@ +1162,5 @@ > > > > int64_t usecBefore_; > > SlowFunctionVector slowFunctions_; > > > > + uint32_t minHeapLength_; > > Instead of maintaining this state locally in the ModuleCompiler and > propagating to the AsmJSModule at the end, how about just have > ModuleCompiler::noteMinHeapLength call AsmJSModule::noteMinHeapLength? > (Could then remove ModuleCompiler::minHeapLength.) Good idea, done. > @@ +1996,5 @@ > > + load->setSkipBoundsCheck(true); > > + // It is adequate to note the ptrValue+1 rather than rounding up to the next > > + // access-size boundary because access is always aligned and the constraint will be > > + // rounded up to the nearest power of two later. > > + m().noteMinHeapLength((uint32_t) ptrValue + 1); > > I think this technically fine, but I'd much rather do the noteMinHeapLength > call from CheckArrayAccess in the IsLiteralUint32 branch so that way it is > logically part of the type checking rules; FunctionCompiler methods should > basically just do unsurprising MIR-gen implementation stuff. But then it'd > be bad to have the load->setSkipBoundsCheck(true) here (b/c it needs to > match up exactly with the noteMinHeapLength call), so what I propose is that > we have an enum: > enum NeedsBoundsCheck { NEEDS_BOUNDS_CHECK, NO_BOUNDS_CHECK }; > and change the signature of CheckArrayAccess to take a NeedsBoundsCheck* > outparam which is then passed to loadHeap() so loadHeap() calls > setSkipBoundsCheck(true) if it gets NO_BOUNDS_CHECK. Done. > Second, to emphasize the link-time check, perhaps we could rename > noteMinHeapLength to requireHeapLengthToBeAtLeast(n)? Done. ... > @@ +4773,5 @@ > > m.parser().release(mark); > > > > + // Copy the cumulative minimum heap size constraint to the MIR for use in analysis. > > + // The length is also constrained to be a power of two, so firstly round up. 
> > + size_t len = mozilla::RoundUpPow2((size_t) m.minHeapLength()); > > Given that AsmJSLink already does an IsPowerOfTwo check, it seems like we > could leave out this detail here and keep the checks orthogonal. The purpose of rounding up here is to give range analysis the largest known minimum heap length, as this could allow more bounds checks to be avoided. > ::: js/src/jit/AsmJSModule.h > @@ +663,5 @@ > > + } > > + uint32_t minHeapLength() const { > > + return minHeapLength_; > > + } > > + > > Trailing whitespace (there's a few others, so check the patch with .hgrc > changes mentioned in earlier comment so that they show up easily). Thanks. Got this working now. Should be fixed. > ::: js/src/jit/MIR.h > @@ +8264,5 @@ > > > > ArrayBufferView::ViewType viewType() const { return viewType_; } > > MDefinition *ptr() const { return getOperand(0); } > > + bool skipBoundsCheck() const { return skipBoundsCheck_; } > > + void setSkipBoundsCheck(bool v) { skipBoundsCheck_ = v; } > > Do you think it'd make sense to factor out a base class > MAsmJSHeapAccess (derived by Load/Store) that held skipBoundsCheck, > viewType, and ptr? Probably. Let me look into this one for the next patch update. > ::: js/src/jit/x64/Lowering-x64.cpp > @@ +154,5 @@ > > + switch (ins->viewType()) { > > + case ArrayBufferView::TYPE_INT8: case ArrayBufferView::TYPE_UINT8: > > + case ArrayBufferView::TYPE_INT16: case ArrayBufferView::TYPE_UINT16: > > + case ArrayBufferView::TYPE_INT32: case ArrayBufferView::TYPE_UINT32: > > + lir = new LAsmJSStoreHeap(LAllocation(ptr->toConstant()->vp()), > > I'm probably missing something, but why is this extra path necessary? > Couldn't you change the existing two cases to > useRegisterOrConstantAtStart(ins->ptr())? The issue is that for the x64 it is necessary to avoid using an immediate constant encoded in the addressing mode when the constant is negative. A negative constant here would be added using 64-bit arithmetic and would not wrap back into the protected region, which would not be good. An attempt has been made to better document these code paths.
(In reply to Douglas Crosher [:dougc] from comment #12) > One example where the >> case was handy was replacing a var holding a pointer > into the heap with a 'const' var when appropriate, such as the tempDouble > location in emscripten. I have left it in for now, but it is trivial to > remove once a final decision is made. If we added 'const', couldn't we have u32[K] (where K is the const) generate the link-time check and u32[K>>0] use the bounds check? Since you can't use const now, we wouldn't have to worry about breaking old validating asm.js. > Looking at the non->> case raises a concern that this might not be handled > correctly in the current code. For example, it appears the u32[0x40000000] would be > converted to u32[0]? Let me double check the current code in this areas too. You're right! Care to file the bug? > The purpose of rounding up here is to give range analysis the largest known > minimum heap length as this could allow more bounds check to be avoided. Ah, makes sense. Could you put this in a comment? Related question: in my measurements it didn't look like range analysis was eliminating that many bounds checks (that weren't already eliminated in Odin). In the context of bug 901109, I've been thinking that it might be a good idea to turn off a bunch of these passes since they tend to not help that much (more measurement needed, though). Do you think there would be that much to gain in followup work on range analysis? I know you have bug 870743, but I was hoping bug 897337 could be used to avoid these bounds checks altogether. > The issue is that for the x64 it is necessary to avoid using an immediate > constant encoded in the addressing mode when the constant is negative. Good point. What about adding a useRegisterOrNonNegativeConstant?
(In reply to Luke Wagner [:luke] from comment #13) > You're right! Care to file the bug? Actually, the fix to this will naturally fall out of your patch (it'll just be a link-time validation error), as long as we handle the int32 overflow correctly :)
Attached patch Address review comments. (obsolete) (deleted) — Splinter Review
* Add base class MAsmJSHeapAccess. * Add useRegisterOrNonNegativeConstant and use on the x64 backend. * Fix u32[0x40000000] bug etc. * Rebase.
Attachment #791298 - Attachment is obsolete: true
(In reply to Luke Wagner [:luke] from comment #13) > (In reply to Douglas Crosher [:dougc] from comment #12) Thank you for the feedback. > > One example where the >> case was handy was replacing a var holding a pointer > > into the heap with a 'const' var when appropriate, such as the tempDouble > > location in emscripten. I have left it in for now, but it is trivial to > > remove once a final decision is made. > > If we added 'const', couldn't we have u32[K] (where K is the const) generate > the link-time check and u32[K>>0] use the bounds check? Since you can't use > const now, we wouldn't have to worry about breaking old validating asm.js. The case that needs a decision is u32[C>>2]? The current patch adds a heap length restriction for this but it is trivial to remove this. There seems to be some merit in both alternatives: * If there were a way to declare the heap length restriction then none of these cases would need to add a heap length restriction. Some simple constant folding and a comparison to the declared minimum length would allow the accesses to optimize away the bounds check if within the declared heap length. The u32[C] syntax would suffice and it might become common asm.js coding style to make the first function include such an access to effectively declare the minimum heap length. * Perhaps given that these do not currently add restrictions there is a good case to argue that for now u32[C>>2] should not. * It could be argued that the writer of asm.js code should be responsible for constant folding optimizations, and thus u32[C>>2] would be unexpected anyway. However it could also be argued that it is a syntactic distinction. * An advantage of adding the restriction now is that it can always be relaxed in future with little chance of breaking code. It might be convenient to accept u32[C>>2] as adding a heap length restriction and being optimized to skip the bounds checks without depending on folding or range analysis. It also allows a 'var' to be replaced by a 'const', and visa versa, in asm.js code without rewriting all uses to ensure they are all optimize - however an earlier u32[C] plus range analysis could suffice. If it is decided to not have u32[C>>2] add a length restriction then perhaps it would be best to at least fold the op in CheckArrayAccess and compare it against the current known minimum heap length so that the bounds check can be optimized away even without range analysis? > > Looking at the non->> case raises a concern that this might not be handled > > correctly in the current code. For example, it appears the u32[0x40000000] would be > > converted to u32[0]? Let me double check the current code in this areas too. > > You're right! Care to file the bug? As mentioned in comment 14 this can be corrected in this patch, and has been. Could the asm.js spec. be clarified to note that only constant indexes in the range 0 to 0x7fffffff inclusive (after shifting if supported) add a heap length constraint? Some extra tests can be added if this is accepted. > > The purpose of rounding up here is to give range analysis the largest known > > minimum heap length as this could allow more bounds check to be avoided. > > Ah, makes sense. Could you put this in a comment? Done. > Related question: in my measurements it didn't look like range analysis was > eliminating that many bounds checks (that weren't already eliminated in > Odin). 
In the context of bug 901109, I've been thinking that it might be a > good idea to turn off a bunch of these passes since they tend to not help > that much (more measurement needed, though). Do you think there would be > that much to gain in followup work on range analysis? I know you have bug > 870743, but I was hoping bug 897337 could be used to avoid these bounds > checks altogether. For range analysis to be useful requires the code author to hoist tests that guarantee that the accesses are within range, and for the range analysis to understand them. The patch is a start that supports further improvements to the range analysis and experimentation by asm.js code writers. FWIW the patch has been written to work with range analysis disabled. Bug 897337 will allow the code author to choose to use masking rather than bounds checks, but the masking still needs to be optimized away and do so will need range analysis. Regarding bug 901109, disabling optimizations and using a less effective register allocator, does not appear to be a useful path unless support for 'compiler optimization level directives' is added. Perhaps the more useful question to ask is: how much code optimization can be pushed back to the code writer (emscripten etc), and what subset is really necessary for the generation of efficient asm.js code? > > The issue is that for the x64 it is necessary to avoid using an immediate > > constant encoded in the addressing mode when the constant is negative. > > Good point. What about adding a useRegisterOrNonNegativeConstant? Good idea, done.
(In reply to Douglas Crosher [:dougc] from comment #16) > The case that needs a decision is u32[C>>2]? Yes > * Perhaps given that these do not currently add restrictions there > is a good case to argue that for now u32[C>>2] should not. Agreed > * It could be argued that the writer of asm.js code should be > responsible for constant folding optimizations, and thus u32[C>>2] > would be unexpected anyway. Agreed > However it could also be argued that it is a syntactic distinction. I think that is fine and in line with asm.js's general approach. > It also allows a 'var' to be replaced > by a 'const', and visa versa, in asm.js code without rewriting all > uses to ensure they are all optimize - however an earlier u32[C] > plus range analysis could suffice. I don't think we need to worry about the hand-written-asm.js-being-mutated use case. > Could the asm.js spec. be clarified to note that only constant indexes in the > range 0 to 0x7fffffff inclusive (after shifting if supported) add a heap > length constraint? I don't think that exception would be necessary: if the constant index is bigger than 0x7fffffff, it'll just generate a length check that always fails. > Bug 897337 will allow the code author to choose to use masking rather > than bounds checks, but the masking still needs to be optimized away > and do so will need range analysis. Removal needs range analysis, but I think hoisting (so there is one mask done for many loads/stores in the loop) could be done w/o range analysis. > Regarding bug 901109, disabling optimizations and using a less effective > register allocator, does not appear to be a useful path unless support > for 'compiler optimization level directives' is added. Worse regalloc would only be applied for super-large functions (which end up dominating compile time b/c of the apparent polynomial behavior of regalloc). Optimizations would be disabled if they weren't expected to be helpful for Emscripten code (which has already passed through LLVM). > Perhaps the more useful question to ask is: how much code optimization > can be pushed back to the code writer (emscripten etc), and what subset > is really necessary for the generation of efficient asm.js code? I think that is the key question and I think we should "a lot". Based on this, I just don't see any win from having u32[C>>2] generate a length check and I see several sources of lose, so I'd like to leave it out.
Random thought -- if you anticipate greater variety in asm.js producers, an asm.js-to-asm.js optimizer utility may make sense -- perhaps something like Emscripten's js-optimizer.js factored out for use by other tools as well. Or perhaps something new using a simple asm.js-oriented IR rather than a general JS AST, which could be fairly powerful.
Great point! Alon already has the relooper factored out in a separate, reusable program, so what you say could be considered an expansion of that project.
Attached patch Address reviewer comments. (obsolete) (deleted) — Splinter Review
* u32[C>>2] etc. no longer add a heap length constraint, but this usage is folded immediately and bounds checks are skipped when possible even without range analysis.
* A literal constant index up to 0xffffffff is now recorded and adds a heap length constraint, so high indexes in the range 0x80000000 to 0xffffffff will cause a link-time failure (see the sketch below). The representation of the heap length constraint has been changed to handle this, and now stores the maximum index rather than the length. Note that byte indexes beyond 0xffffffff result in an immediate failure, which is a minor change to the asm.js spec., which previously allowed u32[0xffffffff] etc. If this is not the desired behaviour then the heap length restriction could saturate at 0xffffffff for such larger indexes and also result in a link-time error? It might be more useful to fail while parsing if given such large indexes, to give the developer better feedback.
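A hypothetical sketch of the link-time failure described in the second point, in the style of the jit-tests quoted earlier in this bug (USE_ASM, asmCompile and the 'glob', 'imp', 'b' parameters come from that harness; assertAsmLinkFail is assumed to be the matching link-failure helper). Note that the next revision instead rejects unshifted constants above 2^31-1 at compile time:

  var m = asmCompile('glob', 'imp', 'b', USE_ASM +
      'var u8 = new glob.Uint8Array(b); ' +
      'function f() { return u8[0x80000000]|0 } return f');
  // A constant byte index of 0x80000000 requires a heap longer than 2^31 bytes,
  // so linking against a small (but otherwise valid) 4096-byte buffer should
  // fail at link time rather than at compile time.
  assertAsmLinkFail(m, this, null, new ArrayBuffer(4096));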
Attachment #792084 - Attachment is obsolete: true
Comment on attachment 794083 [details] [diff] [review] Address reviewer comments. Review of attachment 794083 [details] [diff] [review]: ----------------------------------------------------------------- This is looking good. Mostly nits and some requests to minimize code duplication in Lowering/CodeGenerator: ::: js/src/jit-test/tests/asm.js/testZOOB.js @@ +9,2 @@ > assertEq(asmLink(asmCompile('glob', 'imp', 'b', USE_ASM + 'var arr=new glob.Int8Array(b); function f() {return arr[0x8fffffff]|0 } return f'), this, null, buf)(), 0); > +assertEq(asmLink(asmCompile('glob', 'imp', 'b', USE_ASM + 'var arr=new glob.Int32Array(b); function f() {return arr[0x8fffffff>>2]|0 } return f'), this, null, buf)(), 0); Can you also test with an Int32Array with arr[0x3fffffff] and similar corner-case constants? ::: js/src/jit/AsmJS.cpp @@ +3104,5 @@ > *viewType = global->viewType(); > > uint32_t pointer; > if (IsLiteralUint32(indexExpr, &pointer)) { > + if (pointer > (0xffffffffU >> TypedArrayShift(*viewType))) Can you s/0xffffffffU/UINT32_MAX/ ? @@ +3172,5 @@ > { > ArrayBufferView::ViewType viewType; > MDefinition *pointerDef; > + NeedsBoundsCheck needsBoundsCheck = NEEDS_BOUNDS_CHECK; > + if (!CheckArrayAccess(f, elem, &viewType, &pointerDef, &needsBoundsCheck)) Instead of having CheckArrayAccess depend on the caller to initialize needsBoundsCheck, can you put "*needsBoundsCHeck = NEEDS_BOUNDS_CHECK;" as the first line of CheckArrayAccess and remove the initialization from the callsites? @@ +4734,5 @@ > + // The length is also constrained to be a power of two, so firstly round up - a larger > + // 'heap required length' can help range analysis to prove that bounds checks are not > + // needed. Note rounding up here may overflow a 32 bit integer, but only by one, and > + // after subtraction of one it gives the appropriate result. > + size_t index = (size_t(1) << mozilla::CeilingLog2((size_t) m.minHeapIndex() + 1)) - 1; If you instead store 'length' and use a uint64_t (as proposed below), this could be RoundUpPow2, but you'll need to add a uint64_t overload to MathAlgorithms.h. ::: js/src/jit/AsmJSModule.h @@ +339,5 @@ > GlobalVector globals_; > ExitVector exits_; > ExportedFunctionVector exports_; > HeapAccessVector heapAccesses_; > + uint32_t minHeapIndex_; How about just store a uint64_t and call it minHeapLength_ again? In the future, we'd like to support ArrayBuffers bigger than 2GB so this will be necessary eventually. ::: js/src/jit/Lowering.cpp @@ +2747,5 @@ > LIRGenerator::visitAsmJSLoadHeap(MAsmJSLoadHeap *ins) > { > + MDefinition *ptr = ins->ptr(); > + JS_ASSERT(ptr->type() == MIRType_Int32); > +#if defined(JS_CPU_X86) || defined(JS_CPU_ARM) Instead of using an #ifdef, can you split these out into LIRGenerator{X86,ARM,X64}::visitAsmJSLoadHeap? ::: js/src/jit/arm/Lowering-arm.cpp @@ +508,3 @@ > LAsmJSStoreHeap *lir; > + JS_ASSERT(ptr->type() == MIRType_Int32); > + if (ptr->isConstant() && ins->skipBoundsCheck()) { IIUC, you could write: LAllocation ptrAlloc = ins->skipBoundCheck() ? useRegisterOrNonNegativeConstant(ptr) : useRegisterAtStart(ptr); and then avoid the rest of the duplication. ::: js/src/jit/x86/CodeGenerator-x86.cpp @@ +473,2 @@ > > + if (ptr->isConstant()) { It seems like we could avoid duplicating the following 50 lines if you wrapped the OOL path and cmpl in an 'if (mir->skipBoundsCheck())' and similarly for srcAddr: Operand srcAddr; if (ptr->isConstant()) { ... 
assertions srcAddr = AbsoluteAddress((void *) ptrImm); } else { srcAddr = Operand(ptrReg, 0); } Could you factor the store path as well as x64 the same way? (ARM didn't look quite as straightforward, but perhaps too if you can.) ::: js/src/jit/x86/Lowering-x86.cpp @@ +253,5 @@ > // It's a trap! On x86, the 1-byte store can only use one of > // {al,bl,cl,dl,ah,bh,ch,dh}. That means if the register allocator > // gives us one of {edi,esi,ebp,esp}, we're out of luck. (The formatter > // will assert on us.) Ideally, we'd just ask the register allocator to > // give us one of {al,bl,cl,dl}. For now, just useFixed(al). We probably only need this comment once in this function.
Attached patch Address reviewer comments (obsolete) (deleted) — Splinter Review
* Revert to using requireHeapLengthToBeAtLeast, and store the length in a uint64_t.
* Revert to generating a compile-time error for unshifted literal constants larger than 2^31-1, rather than 2^32-1. The compiler paths are not uint32_t safe, so it is not safe to pass in such constants. This could be considered a limitation of the current implementation or written into the asm.js spec.
Attachment #794083 - Attachment is obsolete: true
(In reply to Luke Wagner [:luke] from comment #21) ... Thank you for the feedback, this helped clean the code up a bit. I ran into problems trying to pass a uint32 constant through the compiler. They are being converted to a int32 and causing assertion failures. It seems best to just keep these out of the compiler for now and generate an error if seen. This could be written into the asm.js spec. or considered a limitation of the current implementation. Tests have been added which might help clarify the cases. > ::: js/src/jit-test/tests/asm.js/testZOOB.js > @@ +9,2 @@ > > assertEq(asmLink(asmCompile('glob', 'imp', 'b', USE_ASM + 'var arr=new glob.Int8Array(b); function f() {return arr[0x8fffffff]|0 } return f'), this, null, buf)(), 0); > > +assertEq(asmLink(asmCompile('glob', 'imp', 'b', USE_ASM + 'var arr=new glob.Int32Array(b); function f() {return arr[0x8fffffff>>2]|0 } return f'), this, null, buf)(), 0); > > Can you also test with an Int32Array with arr[0x3fffffff] and similar > corner-case constants? Done. > ::: js/src/jit/AsmJS.cpp > @@ +3104,5 @@ > > *viewType = global->viewType(); > > > > uint32_t pointer; > > if (IsLiteralUint32(indexExpr, &pointer)) { > > + if (pointer > (0xffffffffU >> TypedArrayShift(*viewType))) > > Can you s/0xffffffffU/UINT32_MAX/ ? Done, but using INT32_MAX now. > @@ +3172,5 @@ > > { > > ArrayBufferView::ViewType viewType; > > MDefinition *pointerDef; > > + NeedsBoundsCheck needsBoundsCheck = NEEDS_BOUNDS_CHECK; > > + if (!CheckArrayAccess(f, elem, &viewType, &pointerDef, &needsBoundsCheck)) > > Instead of having CheckArrayAccess depend on the caller to initialize > needsBoundsCheck, can you put "*needsBoundsCHeck = NEEDS_BOUNDS_CHECK;" as > the first line of CheckArrayAccess and remove the initialization from the > callsites? Done. > @@ +4734,5 @@ > > + // The length is also constrained to be a power of two, so firstly round up - a larger > > + // 'heap required length' can help range analysis to prove that bounds checks are not > > + // needed. Note rounding up here may overflow a 32 bit integer, but only by one, and > > + // after subtraction of one it gives the appropriate result. > > + size_t index = (size_t(1) << mozilla::CeilingLog2((size_t) m.minHeapIndex() + 1)) - 1; > > If you instead store 'length' and use a uint64_t (as proposed below), this > could be RoundUpPow2, but you'll need to add a uint64_t overload to > MathAlgorithms.h. Done. But did not add a uint64_t version of RoundUpPow2. > ::: js/src/jit/AsmJSModule.h > @@ +339,5 @@ > > GlobalVector globals_; > > ExitVector exits_; > > ExportedFunctionVector exports_; > > HeapAccessVector heapAccesses_; > > + uint32_t minHeapIndex_; > > How about just store a uint64_t and call it minHeapLength_ again? In the > future, we'd like to support ArrayBuffers bigger than 2GB so this will be > necessary eventually. Done. > ::: js/src/jit/Lowering.cpp > @@ +2747,5 @@ > > LIRGenerator::visitAsmJSLoadHeap(MAsmJSLoadHeap *ins) > > { > > + MDefinition *ptr = ins->ptr(); > > + JS_ASSERT(ptr->type() == MIRType_Int32); > > +#if defined(JS_CPU_X86) || defined(JS_CPU_ARM) > > Instead of using an #ifdef, can you split these out into > LIRGenerator{X86,ARM,X64}::visitAsmJSLoadHeap? Done. > ::: js/src/jit/arm/Lowering-arm.cpp > @@ +508,3 @@ > > LAsmJSStoreHeap *lir; > > + JS_ASSERT(ptr->type() == MIRType_Int32); > > + if (ptr->isConstant() && ins->skipBoundsCheck()) { > > IIUC, you could write: > > LAllocation ptrAlloc = ins->skipBoundCheck() ? 
> useRegisterOrNonNegativeConstant(ptr) : useRegisterAtStart(ptr); > > and then avoid the rest of the duplication. Done. > ::: js/src/jit/x86/CodeGenerator-x86.cpp > @@ +473,2 @@ > > > > + if (ptr->isConstant()) { > > It seems like we could avoid duplicating the following 50 lines if you > wrapped the OOL path and cmpl in an 'if (mir->skipBoundsCheck())' and > similarly for srcAddr: > > Operand srcAddr; > if (ptr->isConstant()) { > ... assertions > srcAddr = AbsoluteAddress((void *) ptrImm); > } else { > srcAddr = Operand(ptrReg, 0); > } > > Could you factor the store path as well as x64 the same way? (ARM didn't > look quite as straightforward, but perhaps too if you can.) This helped for the x64, thanks, but does not appear to be practical for the x86 without more work on the Assembler and perhaps such work could be handled in a separate bug. Some other re-factoring has been done on the x86 which improves it a little. > ::: js/src/jit/x86/Lowering-x86.cpp > @@ +253,5 @@ > > // It's a trap! On x86, the 1-byte store can only use one of > > // {al,bl,cl,dl,ah,bh,ch,dh}. That means if the register allocator > > // gives us one of {edi,esi,ebp,esp}, we're out of luck. (The formatter > > // will assert on us.) Ideally, we'd just ask the register allocator to > > // give us one of {al,bl,cl,dl}. For now, just useFixed(al). > > We probably only need this comment once in this function. Done.
Comment on attachment 794655 [details] [diff] [review] Address reviewer comments Review of attachment 794655 [details] [diff] [review]: ----------------------------------------------------------------- Great, thanks for addressing my comments! Only a few nits below: ::: js/src/jit/AsmJS.cpp @@ +1479,5 @@ > + module_->requireHeapLengthToBeAtLeast(len); > + } > + uint64_t minHeapLength() const { > + return module_->minHeapLength(); > + } Given your decision to limit constant heap accesses to INT32_MAX (which makes sense), I think we could flip this back to uint32_t (now with plenty of space in the INT32_MAX edge case). @@ +3104,5 @@ > > uint32_t pointer; > if (IsLiteralUint32(indexExpr, &pointer)) { > + if (pointer > (INT32_MAX >> TypedArrayShift(*viewType))) > + return f.fail(indexExpr, "index out of range"); How about "constant heap index out of range"? @@ +3109,5 @@ > pointer <<= TypedArrayShift(*viewType); > + // It is adequate to note pointer+1 rather than rounding up to the next > + // access-size boundary because access is always aligned and the constraint > + // will be rounded up to the nearest power of two later. > + f.m().requireHeapLengthToBeAtLeast((uint64_t) pointer + uint64_t(1)); As mentioned above, requireHeapLengthToBeAtLeast can take a uint32_t and can't overflow from this addition so we can drop the uint64_t casts. @@ +4732,5 @@ > > + // Copy the cumulative minimum heap size constraint to the MIR for use in analysis. The length > + // is also constrained to be a power of two, so firstly round up - a larger 'heap required > + // length' can help range analysis to prove that bounds checks are not needed. > + uint64_t len = uint64_t(1) << mozilla::CeilingLog2(m.minHeapLength()); With a uint32_t value < INT32_MAX, RoundUpPow2 should work fine. @@ +4739,2 @@ > *mir = f.extractMIR(); > + (*mir)->requireAsmJSHeapLengthToBeAtLeast(len); In the case of the MIRGenerator, we aren't requiring anything, we're just informing the MIRGenerator what it can assume, so probably some different name makes sense "noteAsmJSMinHeapLength" perhaps? ::: js/src/jit/AsmJSLink.cpp @@ +221,5 @@ > return LinkFail(cx, "ArrayBuffer byteLength must be a power of two greater than or equal to 4096"); > > + // This check is sufficient without considering the size of the loaded datum because heap > + // loads and stores start on an aligned boundary and the heap byteLength is a power of two. > + if (((heap->byteLength() - 1) <= INT32_MAX) && (heap->byteLength() < module.minHeapLength())) Given that (minHeapLength()-1) <= INT32_MAX, it seems like you could just remove the first conjunct. ::: js/src/jit/RangeAnalysis.cpp @@ +1613,5 @@ > + if (i->isAsmJSLoadHeap()) { > + MAsmJSLoadHeap *ins = i->toAsmJSLoadHeap(); > + Range *range = ins->ptr()->range(); > + if (range && range->lower() >= 0 && > + (uint64_t) range->upper() < mir->minAsmJSHeapLength()) No (uint64_t) needed now here (and below). Second, I'm no expert on range analysis, but I think, from what I'm looking at, you also have to check for range->isLowerInfinite()/range->isUpperInfinite(). I'm not sure though; perhaps asm.js constraints precludes this (in which case you should JS_ASSERT). Let's get a second opinion from Marty. 
::: js/src/jit/arm/Lowering-arm.cpp @@ +539,5 @@ > case ArrayBufferView::TYPE_INT8: case ArrayBufferView::TYPE_UINT8: > case ArrayBufferView::TYPE_INT16: case ArrayBufferView::TYPE_UINT16: > case ArrayBufferView::TYPE_INT32: case ArrayBufferView::TYPE_UINT32: > + case ArrayBufferView::TYPE_FLOAT32: case ArrayBufferView::TYPE_FLOAT64: > + lir = new LAsmJSStoreHeap(ptrAlloc, useRegisterAtStart(ins->value())); Hey wait, I think the entire switch can be removed :) ::: js/src/jit/x64/CodeGenerator-x64.cpp @@ +395,5 @@ > MAsmJSLoadHeap *mir = ins->mir(); > ArrayBufferView::ViewType vt = mir->viewType(); > + const LAllocation *ptr = ins->ptr(); > + // No need to note the access if it will never fault. > + bool skipNote = mir->skipBoundsCheck(); Can you put a \n before and after this statement? @@ +413,5 @@ > uint32_t before = masm.size(); > masm.movss(srcAddr, dest); > uint32_t after = masm.size(); > masm.cvtss2sd(dest, dest); > + return skipNote ? true : gen->noteHeapAccess(AsmJSHeapAccess(before, after, vt, ToAnyRegister(ins->output()))); I think this would be a tiny bit simpler as: return skipNote || gen->noteHeapAccess(...); (here and 3 times below) ::: js/src/jit/x86/Lowering-x86.cpp @@ +228,5 @@ > + // A bounds check is only skipped for a positive index. > + JS_ASSERT(ptrValue >= 0); > + ptrAlloc = LAllocation(ptr->toConstant()->vp()); > + } else > + ptrAlloc = useRegisterAtStart(ptr); Need { } around else branch since the if branch has { }
Attachment #794655 - Flags: review+
Comment on attachment 794655 [details] [diff] [review] Address reviewer comments Marty: could you review just the little blob in RangeAnalysis.cpp? I was wondering whether it needed checks for, e.g., isLowerInfinite.
Attachment #794655 - Flags: review?(mrosenberg)
Attachment #794655 - Flags: review?(luke)
Attachment #794655 - Flags: review+
Comment on attachment 794655 [details] [diff] [review] Address reviewer comments Silly bugzilla.
Attachment #794655 - Flags: review?(luke) → review+
(In reply to Luke Wagner [:luke] from comment #24) > ::: js/src/jit/RangeAnalysis.cpp > @@ +1613,5 @@ > > + if (i->isAsmJSLoadHeap()) { > > + MAsmJSLoadHeap *ins = i->toAsmJSLoadHeap(); > > + Range *range = ins->ptr()->range(); > > + if (range && range->lower() >= 0 && > > + (uint64_t) range->upper() < mir->minAsmJSHeapLength()) > > No (uint64_t) needed now here (and below). > > Second, I'm no expert on range analysis, but I think, from what I'm looking > at, you also have to check for > range->isLowerInfinite()/range->isUpperInfinite(). I'm not sure though; > perhaps asm.js constraints precludes this (in which case you should > JS_ASSERT). Let's get a second opinion from Marty. isUpperInfinite is used when the Range cannot encode a value above INT32_MAX, in which case upper() is set to INT32_MAX. So if the maximum of minAsmJSHeapLength is below or equal to (INT32_MAX + 1), then doing a signed 64-bit comparison is safe here. As lower is non-negative, upper is also non-negative, so doing an unsigned 32-bit comparison should also be safe based on the previous assumption. I guess that AsmJS ensures that ins->ptr() cannot be a double; otherwise you will also need to check for hasRoundingErrors.
(In reply to Nicolas B. Pierron [:nbp] from comment #27) Ahh, so then we are right at the razor's edge of being correct. To be future-friendly (when we allow bigger arrays and constants), I think we should check isUpperInfinite. Also it looks like my previous comment wasn't right: a cast is needed, but I think (uint32_t) is sufficient.
Attached patch Address reviewer feedback. (obsolete) (deleted) — Splinter Review
Attachment #794655 - Attachment is obsolete: true
Attachment #794655 - Flags: review?(mrosenberg)
(In reply to Luke Wagner [:luke] from comment #24) > Comment on attachment 794655 [details] [diff] [review] ... > ::: js/src/jit/AsmJS.cpp > @@ +1479,5 @@ > > + module_->requireHeapLengthToBeAtLeast(len); > > + } > > + uint64_t minHeapLength() const { > > + return module_->minHeapLength(); > > + } > > Given your decision to limit constant heap accesses to INT32_MAX (which > makes sense), I think we could flip this back to uint32_t (now with plenty > of space in the INT32_MAX edge case). Done. > @@ +3104,5 @@ > > > > uint32_t pointer; > > if (IsLiteralUint32(indexExpr, &pointer)) { > > + if (pointer > (INT32_MAX >> TypedArrayShift(*viewType))) > > + return f.fail(indexExpr, "index out of range"); > > How about "constant heap index out of range"? Done. > @@ +3109,5 @@ > > pointer <<= TypedArrayShift(*viewType); > > + // It is adequate to note pointer+1 rather than rounding up to the next > > + // access-size boundary because access is always aligned and the constraint > > + // will be rounded up to the nearest power of two later. > > + f.m().requireHeapLengthToBeAtLeast((uint64_t) pointer + uint64_t(1)); > > As mentioned above, requireHeapLengthToBeAtLeast can take a uint32_t and > can't overflow from this addition so we can drop the uint64_t casts. Reverted. > @@ +4732,5 @@ > > > > + // Copy the cumulative minimum heap size constraint to the MIR for use in analysis. The length > > + // is also constrained to be a power of two, so firstly round up - a larger 'heap required > > + // length' can help range analysis to prove that bounds checks are not needed. > > + uint64_t len = uint64_t(1) << mozilla::CeilingLog2(m.minHeapLength()); > > With a uint32_t value < INT32_MAX, RoundUpPow2 should work fine. > > @@ +4739,2 @@ > > *mir = f.extractMIR(); > > + (*mir)->requireAsmJSHeapLengthToBeAtLeast(len); > > In the case of the MIRGenerator, we aren't requiring anything, we're just > informing the MIRGenerator what it can assume, so probably some different > name makes sense "noteAsmJSMinHeapLength" perhaps? Done, reverted. > ::: js/src/jit/AsmJSLink.cpp > @@ +221,5 @@ > > return LinkFail(cx, "ArrayBuffer byteLength must be a power of two greater than or equal to 4096"); > > > > + // This check is sufficient without considering the size of the loaded datum because heap > > + // loads and stores start on an aligned boundary and the heap byteLength is a power of two. > > + if (((heap->byteLength() - 1) <= INT32_MAX) && (heap->byteLength() < module.minHeapLength())) > > Given that (minHeapLength()-1) <= INT32_MAX, it seems like you could just > remove the first conjunct. Done, but also added an assertion check. > ::: js/src/jit/RangeAnalysis.cpp > @@ +1613,5 @@ > > + if (i->isAsmJSLoadHeap()) { > > + MAsmJSLoadHeap *ins = i->toAsmJSLoadHeap(); > > + Range *range = ins->ptr()->range(); > > + if (range && range->lower() >= 0 && > > + (uint64_t) range->upper() < mir->minAsmJSHeapLength()) > > No (uint64_t) needed now here (and below). > > Second, I'm no expert on range analysis, but I think, from what I'm looking > at, you also have to check for > range->isLowerInfinite()/range->isUpperInfinite(). I'm not sure though; > perhaps asm.js constraints precludes this (in which case you should > JS_ASSERT). Let's get a second opinion from Marty. Added checks for isLowerInfinite() and isUpperInfinite(). 
> ::: js/src/jit/arm/Lowering-arm.cpp > @@ +539,5 @@ > > case ArrayBufferView::TYPE_INT8: case ArrayBufferView::TYPE_UINT8: > > case ArrayBufferView::TYPE_INT16: case ArrayBufferView::TYPE_UINT16: > > case ArrayBufferView::TYPE_INT32: case ArrayBufferView::TYPE_UINT32: > > + case ArrayBufferView::TYPE_FLOAT32: case ArrayBufferView::TYPE_FLOAT64: > > + lir = new LAsmJSStoreHeap(ptrAlloc, useRegisterAtStart(ins->value())); > > Hey wait, I think the entire switch can be removed :) Done. > ::: js/src/jit/x64/CodeGenerator-x64.cpp > @@ +395,5 @@ > > MAsmJSLoadHeap *mir = ins->mir(); > > ArrayBufferView::ViewType vt = mir->viewType(); > > + const LAllocation *ptr = ins->ptr(); > > + // No need to note the access if it will never fault. > > + bool skipNote = mir->skipBoundsCheck(); > > Can you put a \n before and after this statement? Done. > @@ +413,5 @@ > > uint32_t before = masm.size(); > > masm.movss(srcAddr, dest); > > uint32_t after = masm.size(); > > masm.cvtss2sd(dest, dest); > > + return skipNote ? true : gen->noteHeapAccess(AsmJSHeapAccess(before, after, vt, ToAnyRegister(ins->output()))); > > I think this would be a tiny bit simpler as: > return skipNote || gen->noteHeapAccess(...); > (here and 3 times below) Agreed, done. > ::: js/src/jit/x86/Lowering-x86.cpp > @@ +228,5 @@ > > + // A bounds check is only skipped for a positive index. > > + JS_ASSERT(ptrValue >= 0); > > + ptrAlloc = LAllocation(ptr->toConstant()->vp()); > > + } else > > + ptrAlloc = useRegisterAtStart(ptr); > > Need { } around else branch since the if branch has { } Done. There has also been a minor change in CheckArrayAccess for the u32[c>>2] code path - it now only optimizes this if c is <= INT32_MAX.
Optimize some common bitwise-and heap access index patterns in CheckArrayAccess so that bounds checks can be optimized away even without range analysis. Some more tests will be added to exercise the new code paths. The asm.js of some of the larger demos has been converted to use masking on heap accesses for testing, including BananaBench and Citadel, and it looks good so far.
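For reference, here is a minimal sketch of the pattern this targets (not code from the patch or the demos; the 1 MiB figure and the names are just illustrative). A constant access pins the minimum heap length at validation time, and a literal mask then keeps every variable index below that length, so the bounds check on the masked accesses can be dropped:

  function Demo(stdlib, foreign, heap) {
    "use asm";
    var HEAP32 = new stdlib.Int32Array(heap);
    // Constant access: forces the linker to require byteLength >= 0x100000 (1 MiB).
    function init() {
      return HEAP32[0xffffc >> 2] | 0;
    }
    // Masked access: (i & 0xfffff) can never index past 1 MiB, so with the
    // constraint above no bounds check is needed.
    function get(i) {
      i = i | 0;
      return HEAP32[(i & 0xfffff) >> 2] | 0;
    }
    return { init: init, get: get };
  }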
Attachment #795426 - Attachment is obsolete: true
Excellent. What results are you seeing? Also, are you saying you modified Emscripten (or perhaps sed'ed the output) to emit HEAP32[(i&M)>>2] and then put a single HEAP32[TOTAL_HEAP_SIZE-1] at the top?
Attached patch Rebase. (obsolete) (deleted) — Splinter Review
Attachment #796706 - Attachment is obsolete: true
(In reply to Luke Wagner [:luke] from comment #32) > Excellent. What results are you seeing? Have just been running debug builds for testing. Some of the examples do not spend a lot of time in asm.js code so the improvement will not be large for these. Shall look into the performance soon. > Also, are you saying you modified Emscripten (or perhaps sed'ed the output) > to emit emit HEAP32[(i&M)>>2] and then put a single > HEAP32[TOTAL_HEAP_SIZE-1] at the top? A hack script does a source-to-source transformation adding the masking at heap accesses, and adds a new function at the start with HEAP8[TOTAL_HEAP_SIZE-1]. I don't have the source for some of the examples, so this is the best that can be done for now for these.
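To make the transformation concrete, it does roughly the following (a hypothetical illustration, not the actual script output; the 16 MiB figure and the function name are assumptions):

  // Added as the first function in the module; the constant access makes
  // the minimum heap length (here 16 MiB) visible to the compiler.
  function _declareHeapSize() {
    return HEAP8[0xffffff] | 0;
  }

  // Every other heap access then gets a literal mask, e.g.
  //   before:  HEAP32[p >> 2] = v;
  //   after:   HEAP32[(p & 0xffffff) >> 2] = v;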
Attachment #797252 - Attachment is obsolete: true
Attachment #798226 - Flags: review?(luke)
Comment on attachment 798226 [details] [diff] [review] Add tests for the bitwise-and optimization paths. Review of attachment 798226 [details] [diff] [review]: ----------------------------------------------------------------- This looks really good, almost ready to land; this should be the last round of comments, but I'd like to see how a few of the suggested refactorings work. ::: js/src/jit/AsmJS.cpp @@ +3097,5 @@ > *viewType = global->viewType(); > > uint32_t pointer; > if (IsLiteralUint32(indexExpr, &pointer)) { > + if (pointer > ((uint32_t) INT32_MAX >> TypedArrayShift(*viewType))) Can you write uint32_t(INT32_MAX) so I don't have to ask whether >> or (cast) has higher associativity? @@ +3103,5 @@ > pointer <<= TypedArrayShift(*viewType); > + // It is adequate to note pointer+1 rather than rounding up to the next > + // access-size boundary because access is always aligned and the constraint > + // will be rounded up to the nearest power of two later. > + f.m().requireHeapLengthToBeAtLeast(pointer + 1U); Could you write (uint32_t(pointer) + 1) so I don't have to wonder whether int + unsigned does int or unsigned arithmetic? @@ +3131,5 @@ > + if (pointerNode->isKind(PNK_BITAND)) { > + ParseNode *indexNode = BinaryLeft(pointerNode); > + ParseNode *maskNode = BinaryRight(pointerNode); > + uint32_t mask2; > + if (IsLiteralUint32(maskNode, &mask2) && mask2 <= (uint32_t) INT32_MAX) { I don't think the second "mask2 <= INT32_MAX" test is necessary: if mask2 is > INT32_MAX, CountLeadingZeroes32(mask2) will be 0 which works out fine below. (Same comment for the INT32_MAX test in the else branch below. @@ +3152,5 @@ > + > + // Fold a 'literal constant right shifted' now, and skip the bounds check if > + // currently possible. This handles the optimization of many of these uses > + // without the need for range analysis. > + if (IsLiteralUint32(pointerNode, &pointer) && pointer <= (uint32_t) INT32_MAX) { If we are definitely running range analysis (I've dropped the idea of removing it), then I don't think doing this here buys us anything: it doesn't save MIR nodes, it shouldn't catch any cases range analysis misses, and range analysis will be looking anyways, so it seems like we could remove this code w/o losing anything. @@ +3156,5 @@ > + if (IsLiteralUint32(pointerNode, &pointer) && pointer <= (uint32_t) INT32_MAX) { > + pointer &= mask; > + if (pointer < f.m().minHeapLength()) > + *needsBoundsCheck = NO_BOUNDS_CHECK; > + *def = f.constant(Int32Value(pointer)); Also: I think this would need to run before the CheckExpr or else we'd be emitting the MConstant twice. @@ +3171,5 @@ > + if (IsLiteralUint32(maskNode, &mask2) && mask2 <= (uint32_t) INT32_MAX) { > + // Flag the access to skip the bounds check if the mask ensures that an 'out of > + // bounds' access can not occur based on the current heap length constraint. > + if (mozilla::CountLeadingZeroes32(f.m().minHeapLength() - 1) <= mozilla::CountLeadingZeroes32(mask2)) > + *needsBoundsCheck = NO_BOUNDS_CHECK; I'd like to factor this logic out into a new Check function: static bool CheckMaskedHeapAccess(FunctionCompiler &f, ParseNode *indexExpr, int32_t *mask, MDefinition *def); that does all the type checking for indexExpr (where indexExpr is a PKN_BITAND) (whether or not the rhs is a constant), such that the two callers can write: if (indexExpr->isKind(PNK_BITAND)) { if (!CheckMaskedHeapAccess(f, indexExpr, &mask, def)) return false; } else { ... just like before } Does this work? 
::: js/src/jit/MIR.h @@ +8133,5 @@ > +{ > + ArrayBufferView::ViewType viewType_; > + bool skipBoundsCheck_; > + > + Remove extra \n ::: js/src/jit/x64/CodeGenerator-x64.cpp @@ +415,5 @@ > uint32_t before = masm.size(); > masm.movss(srcAddr, dest); > uint32_t after = masm.size(); > masm.cvtss2sd(dest, dest); > + return skipNote ? true : gen->noteHeapAccess(AsmJSHeapAccess(before, after, vt, ToAnyRegister(ins->output()))); Could you switch to || here to match below? ::: js/src/jit/x64/Lowering-x64.cpp @@ +153,5 @@ > + // negative offset encoded as an offset in the addressing mode would not wrap > + // back into the protected area reserved for the heap. > + if (ptr->isConstant()) { > + int32_t ptrValue = ptr->toConstant()->value().toInt32(); > + if (ptrValue >= 0) { Since ptrValue is only used once, might as well write if (ptr->toConstant()->value().toInt32() >= 0) ::: js/src/jit/x86/CodeGenerator-x86.cpp @@ +464,5 @@ > CodeGeneratorX86::visitAsmJSLoadHeap(LAsmJSLoadHeap *ins) > { > // This is identical to LoadTypedArrayElementStatic, except that the > // array's base and length are not known ahead of time and can be patched > // later on, and the instruction is always infallible. Can you remove this comment? @@ +473,2 @@ > > + if (ptr->isConstant()) { If I read correctly, we can still factor out more duplication by: 1. merging the mir->skipBoundsCheck() branch with the !mir->skipBoundsCheck() branch by having: OutOfLineLoadTypedArrayBounds *ool = NULL; if (!mir->skipBoundsCheck()) { ool = new OutOfLineLoadTypedArrayOutOfBounds... if (!addOutOfLineCode(ool)) return false; } and later having if (ool) masm.bind(ool->rejoin()); 2. making loadViewTypeElement templated on the srcAddr arg so that either an Address or AbsoluteAddress can be passed and calling this from the ptr->isConstant() case. 3. factoring out a new CodeGeneratorX86::loadAsmJSHeap that contains the float32 case and is also templated on srcAddr and calling this from the remaining ptr->isConstant()/!ptr->isConstant() branches. IIUC, the same factoring can be done for visitAsmJSStoreHeap. ::: js/src/jit/x86/Lowering-x86.cpp @@ +229,5 @@ > + // A bounds check is only skipped for a positive index. > + JS_ASSERT(ptrValue >= 0); > + ptrAlloc = LAllocation(ptr->toConstant()->vp()); > + } else > + ptrAlloc = useRegisterAtStart(ptr); { } around else-branch since then-branch has { }
Attached patch Address reviewer comments. Re-factor. (obsolete) (deleted) — Splinter Review
Attachment #798226 - Attachment is obsolete: true
Attachment #798226 - Flags: review?(luke)
Attachment #800123 - Flags: review?(luke)
(In reply to Luke Wagner [:luke] from comment #36) > Comment on attachment 798226 [details] [diff] [review] Thank you for the feedback. This helped cleanup CheckArrayAccess and the x86 backend support. ... > ::: js/src/jit/AsmJS.cpp > @@ +3097,5 @@ > > *viewType = global->viewType(); > > > > uint32_t pointer; > > if (IsLiteralUint32(indexExpr, &pointer)) { > > + if (pointer > ((uint32_t) INT32_MAX >> TypedArrayShift(*viewType))) > > Can you write uint32_t(INT32_MAX) so I don't have to ask whether >> or > (cast) has higher associativity? Done. > @@ +3103,5 @@ > > pointer <<= TypedArrayShift(*viewType); > > + // It is adequate to note pointer+1 rather than rounding up to the next > > + // access-size boundary because access is always aligned and the constraint > > + // will be rounded up to the nearest power of two later. > > + f.m().requireHeapLengthToBeAtLeast(pointer + 1U); > > Could you write (uint32_t(pointer) + 1) so I don't have to wonder whether > int + unsigned does int or unsigned arithmetic? Done. > @@ +3131,5 @@ > > + if (pointerNode->isKind(PNK_BITAND)) { > > + ParseNode *indexNode = BinaryLeft(pointerNode); > > + ParseNode *maskNode = BinaryRight(pointerNode); > > + uint32_t mask2; > > + if (IsLiteralUint32(maskNode, &mask2) && mask2 <= (uint32_t) INT32_MAX) { > > I don't think the second "mask2 <= INT32_MAX" test is necessary: if mask2 is > > INT32_MAX, CountLeadingZeroes32(mask2) will be 0 which works out fine > below. (Same comment for the INT32_MAX test in the else branch below. Done. The intention was to keep potentially negative indexes out of these paths as they are not an optimization target. > @@ +3152,5 @@ > > + > > + // Fold a 'literal constant right shifted' now, and skip the bounds check if > > + // currently possible. This handles the optimization of many of these uses > > + // without the need for range analysis. > > + if (IsLiteralUint32(pointerNode, &pointer) && pointer <= (uint32_t) INT32_MAX) { > > If we are definitely running range analysis (I've dropped the idea of > removing it), then I don't think doing this here buys us anything: it > doesn't save MIR nodes, it shouldn't catch any cases range analysis misses, > and range analysis will be looking anyways, so it seems like we could remove > this code w/o losing anything. It still saves the creation of the MBitAnd op and its later folding. It can be done very efficiently here. It also helps stake out a set of patterns with simple optimizations that code authors could depend on in an asm.js implementation. The importance of this pattern is that it is necessary for a constant index that does not add a heap length constraint - the code writer can not fold it without changing the semantics. > @@ +3156,5 @@ > > + if (IsLiteralUint32(pointerNode, &pointer) && pointer <= (uint32_t) INT32_MAX) { > > + pointer &= mask; > > + if (pointer < f.m().minHeapLength()) > > + *needsBoundsCheck = NO_BOUNDS_CHECK; > > + *def = f.constant(Int32Value(pointer)); > > Also: I think this would need to run before the CheckExpr or else we'd be > emitting the MConstant twice. Good point, thanks. Fixed. > @@ +3171,5 @@ > > + if (IsLiteralUint32(maskNode, &mask2) && mask2 <= (uint32_t) INT32_MAX) { > > + // Flag the access to skip the bounds check if the mask ensures that an 'out of > > + // bounds' access can not occur based on the current heap length constraint. 
> > + if (mozilla::CountLeadingZeroes32(f.m().minHeapLength() - 1) <= mozilla::CountLeadingZeroes32(mask2)) > > + *needsBoundsCheck = NO_BOUNDS_CHECK; > > I'd like to factor this logic out into a new Check function: ... > Does this work? Tried a few approaches, and factoring out the bitwise-and folding seems to work well. > ::: js/src/jit/MIR.h > @@ +8133,5 @@ > > +{ > > + ArrayBufferView::ViewType viewType_; > > + bool skipBoundsCheck_; > > + > > + > > Remove extra \n Done > ::: js/src/jit/x64/CodeGenerator-x64.cpp > @@ +415,5 @@ > > uint32_t before = masm.size(); > > masm.movss(srcAddr, dest); > > uint32_t after = masm.size(); > > masm.cvtss2sd(dest, dest); > > + return skipNote ? true : gen->noteHeapAccess(AsmJSHeapAccess(before, after, vt, ToAnyRegister(ins->output()))); > > Could you switch to || here to match below? Yes, missed this one last time, fixed. > ::: js/src/jit/x64/Lowering-x64.cpp > @@ +153,5 @@ > > + // negative offset encoded as an offset in the addressing mode would not wrap > > + // back into the protected area reserved for the heap. > > + if (ptr->isConstant()) { > > + int32_t ptrValue = ptr->toConstant()->value().toInt32(); > > + if (ptrValue >= 0) { > > Since ptrValue is only used once, might as well write > if (ptr->toConstant()->value().toInt32() >= 0) Yes, done. > ::: js/src/jit/x86/CodeGenerator-x86.cpp > @@ +464,5 @@ > > CodeGeneratorX86::visitAsmJSLoadHeap(LAsmJSLoadHeap *ins) > > { > > // This is identical to LoadTypedArrayElementStatic, except that the > > // array's base and length are not known ahead of time and can be patched > > // later on, and the instruction is always infallible. > > Can you remove this comment? Done. > @@ +473,2 @@ > > > > + if (ptr->isConstant()) { > > If I read correctly, we can still factor out more duplication by: > 1. merging the mir->skipBoundsCheck() branch with the > !mir->skipBoundsCheck() branch by having: > > OutOfLineLoadTypedArrayBounds *ool = NULL; > if (!mir->skipBoundsCheck()) { > ool = new OutOfLineLoadTypedArrayOutOfBounds... > if (!addOutOfLineCode(ool)) > return false; > } > > and later having > > if (ool) > masm.bind(ool->rejoin()); > Tried this, but the resulting function's got too many knobs and is out of context, so it seems more readable to leave it inline. > 2. making loadViewTypeElement templated on the srcAddr arg so that either > an Address or AbsoluteAddress can be passed and calling this from the > ptr->isConstant() case. > 3. factoring out a new CodeGeneratorX86::loadAsmJSHeap that contains the > float32 case and is also templated on srcAddr and calling this from the > remaining ptr->isConstant()/!ptr->isConstant() branches. Great suggestions, these helped a lot. > IIUC, the same factoring can be done for visitAsmJSStoreHeap. Done. > ::: js/src/jit/x86/Lowering-x86.cpp > @@ +229,5 @@ > > + // A bounds check is only skipped for a positive index. > > + JS_ASSERT(ptrValue >= 0); > > + ptrAlloc = LAllocation(ptr->toConstant()->vp()); > > + } else > > + ptrAlloc = useRegisterAtStart(ptr); > > { } around else-branch since then-branch has { } Done.
Comment on attachment 800123 [details] [diff] [review] Address reviewer comments. Re-factor. Review of attachment 800123 [details] [diff] [review]: ----------------------------------------------------------------- Awesome! I agree with the points you made. Thanks for being so receptive to my comments! ::: js/src/jit/AsmJS.cpp @@ +3090,5 @@ > + ParseNode *indexNode = BinaryLeft(*indexExpr); > + ParseNode *maskNode = BinaryRight(*indexExpr); > + uint32_t mask2; > + > + if (IsLiteralUint32(maskNode, &mask2)) { Could you put the \n before mask2 instead of after? ::: js/src/jit/x86/CodeGenerator-x86.cpp @@ +407,2 @@ > void > +CodeGeneratorX86::loadPartViewTypeElement(ArrayBufferView::ViewType vt, const T &srcAddr, How about loadNonFloat32ViewTypeElement (and storeNonFloat32ViewTypeElement below) ?
Attachment #800123 - Flags: review?(luke) → review+
Carrying over r+
Attachment #800123 - Attachment is obsolete: true
Attachment #800425 - Flags: review+
Keywords: checkin-needed
Status: ASSIGNED → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla26
Whiteboard: [games]
Blocks: gecko-games
No longer depends on: CVE-2015-0817