Closed Bug 891739 Opened 11 years ago Closed 11 years ago

Range::or_ can call js_bitscan_clz32 with an argument 0

Categories

(Core :: JavaScript Engine, defect)

defect
Not set
normal

Tracking

()

RESOLVED FIXED
mozilla25

People

(Reporter: Waldo, Assigned: sunfish)

References

Details

Attachments

(1 file, 4 obsolete files)

Attached patch Patch (obsolete) (deleted) — Splinter Review
js_bitscan_clz32 (soon to be renamed to mozilla::CountLeadingZeroes32 in bug 891177, with a != 0 assertion added at the same time, that detected this) can't be called with an argument of 0, as gcc's __builtin_clz(0) has undefined behavior.  The existing jit-test ion/bug882565-1.js with --ion-eager triggers that assertion for me.

I'm not sure there's an elegant way to structure the code to fix this.  If you see a better way, clue me in.
Attachment #773112 - Flags: review?(sunfish)
Hmm, I guess xor_ needs the same treatment.  And, if tinderbox is accurate, may not have an existing test.  I'll sweep that up too in the morning.
Attached patch fix all the cases in or_ and xor_ (obsolete) (deleted) — Splinter Review
Yes, and the other part of or_ needs the same treatment as well. Here's a patch which fixes both the cases in or_, and all the cases in xor_.

It's tempting to fix this by just special-casing zero in js_bitscan_clz32/CountLeadingZeroes32 and making them return 32, however this wouldn't be sufficient here, because then the code could perform a shift by 32, which would also have undefined behavior.

I'm investigating whether the xor or the other or case can be triggered from plain JavaScript, but so far I haven't found anything. It appears most cases where one operand of an or or xor is statically known to be 0 or -1 get folded away.
Attachment #773342 - Flags: review?(jwalden+bmo)
Comment on attachment 773342 [details] [diff] [review]
fix all the cases in or_ and xor_

Review of attachment 773342 [details] [diff] [review]:
-----------------------------------------------------------------

I love range analysis.

I think the adjusted patch wants a second review from nbp or someone, in addition to mine.  This stuff is *seriously* gnarly.

::: js/src/ion/RangeAnalysis.cpp
@@ +467,5 @@
>      int64_t lower = INT32_MIN;
>      int64_t upper = INT32_MAX;
>  
>      if (lhs->lower_ >= 0 && rhs->lower_ >= 0) {
>          // Both operands are non-negative, so the result won't be greater than either.

This comment reads wrong, as I read it.  The result must be one operand possibly with additional bits turned on, so the result can be greater than either.  (5 | 2) is 7, greater than either.  Did this mean to say something like "the result can't be less than either operand"?

@@ +471,5 @@
>          // Both operands are non-negative, so the result won't be greater than either.
>          lower = Max(lhs->lower_, rhs->lower_);
>          // The result will have leading zeros where both operands have leading zeros.
> +        if (lhs->upper_ == 0 || rhs->upper_ == 0)
> +            upper = UINT32_MAX;

This could be expanded to

  if (lhs->upper_ == 0)
      upper = rhs->upper_;
  else if (rhs->upper_ == 0)
      upper = lhs->upper_;
  else
      upper = UINT32_MAX >> ...;

if we cared, right?  Why don't we?

Also, if we were to do it the way you have it, we'd want INT32_MAX, not UINT32_MAX.

@@ +479,2 @@
>      } else {
>          // The result will have leading ones where either operand has leading ones.

This comment got me thinking.  What's wrong with this alternative, simpler suggestion below, to replace the four lhs/rhs-negative bit-scanning lines here?

  // If an operand is a known-negative number, or-ing will take that operand and maybe
  // set additional bits in it, but it'll never unset any -- producing a value greater
  // than the negative operand's lower bound.  Therefore we can add in all the bits set
  // in the known-negative lower bounds to compute a more-optimal lower bound.
  if (lhs->upper_ < 0)
    lower |= lhs->lower_;
  if (rhs->upper _ < 0)
    lower |= rhs->lower_;

@@ +497,5 @@
>      int64_t upper = INT32_MAX;
>  
>      if (lhs->lower_ >= 0 && rhs->lower_ >= 0) {
>          // Both operands are non-negative. The result will be non-negative and
>          // not greater than either.

This comment too seems wrong.  Supposing lhs = [0, 4] and rhs = [0, 1], (4 ^ 1) == 5 which is greater than either.  Shouldn't it be something like "The result will be non-negative and less than the smallest power of two greater than both operands"?

@@ +500,5 @@
>          // Both operands are non-negative. The result will be non-negative and
>          // not greater than either.
>          lower = 0;
> +        if (lhs->upper_ == 0 || rhs->upper_ == 0)
> +            upper = UINT32_MAX;

Again UINT32_MAX looks wrong.

Can't this similarly be expanded like so:

  if (lhs->upper_ == 0)
    upper = rhs->upper_;
  else if (rhs->upper_ == 0)
    upper = lhs->upper_;
  else
    upper = ....;

Oh, style rules will have you adding braces around these single-line bodies, because the last body occupies multiple lines.

@@ +512,5 @@
> +        if (~lhs->lower_ == 0 || ~rhs->lower_ == 0)
> +            upper = UINT32_MAX;
> +        else
> +            upper = UINT32_MAX >> Min(js_bitscan_clz32(~lhs->lower_),
> +                                      js_bitscan_clz32(~rhs->lower_));

This would read nicer, I think, like so:

  uint8_t lhsOnes = (~lhs->lower_ != 0)
                    ? js_bitscan_clz32(~lhs->lower_)
                    : 32;
  uint8_t rhsOnes = (~rhs->lower_ != 0)
                    ? js_bitscan_clz32(~rhs->lower_)
                    : 32;
  upper = UINT32_MAX >> Min(lhsOnes, rhsOnes);

Also UINT32_MAX would again be wrong here.

@@ +517,1 @@
>      } else if (lhs->upper_ < 0 && rhs->lower_ >= 0) {

Seeing as the negative/non-negative code is repeated, reversed, for non-negative/negative -- and how could we ever possibly have a bug in such simple code as this? -- let's do this:

} else {
    if (lhs->lower >= 0 && rhs->upper_ < 0)
        Swap(lhs, rhs);

    if (lhs->upper_ < 0 && rhs->lower >= 0) {
        ...code handling different-sign xor...
    }
}

@@ +519,5 @@
>          // will have leading ones where the negative operand has leading ones
>          // and the non-negative operand has leading zeros.
>          upper = -1;
> +        if (~lhs->lower_ == 0 || rhs->upper_ == 0)
> +            upper = 0;

If the sign bits differ the output can't possibly ever be 0.  This can't be right.

@@ +522,5 @@
> +        if (~lhs->lower_ == 0 || rhs->upper_ == 0)
> +            upper = 0;
> +        else
> +            lower = (int32_t)~(UINT32_MAX >> Min(js_bitscan_clz32(~lhs->lower_),
> +                                                 js_bitscan_clz32(rhs->upper_)));

I think this would read more clearly as (roughly paralleling the both-negative code):

  uint8_t lhsOnes = (~lhs->lower_ != 0)
                    ? js_bitscan_clz32(~lhs->lower_)
                    : 32;
  uint8_t rhsZeroes = (rhs->upper_ != 0)
                      ? js_bitscan_clz32(rhs->upper_)
                      : 32;
  upper = ~int32_t(UINT32_MAX >> Min(lhsOnes, rhsZeroes));

Note that technically casting, say, ~(UINT32_MAX >> 5) to int32_t invokes undefined behavior, because cast-unsigned-to-signed is only defined if the unsigned value is in the signed type's range.  Casting to int32_t before inverting doesn't have this problem.
Attachment #773342 - Flags: review?(jwalden+bmo)
Attachment #773112 - Flags: review?(sunfish)
Blocks: 891177
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #3)
> Comment on attachment 773342 [details] [diff] [review]
> fix all the cases in or_ and xor_
> 
> Review of attachment 773342 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> I love range analysis.
> 
> I think the adjusted patch wants a second review from nbp or someone, in
> addition to mine.  This stuff is *seriously* gnarly.

Ok. Thanks for the careful review!

> ::: js/src/ion/RangeAnalysis.cpp
> @@ +467,5 @@
> >      int64_t lower = INT32_MIN;
> >      int64_t upper = INT32_MAX;
> >  
> >      if (lhs->lower_ >= 0 && rhs->lower_ >= 0) {
> >          // Both operands are non-negative, so the result won't be greater than either.
> 
> This comment reads wrong, as I read it.  The result must be one operand
> possibly with additional bits turned on, so the result can be greater than
> either.  (5 | 2) is 7, greater than either.  Did this mean to say something
> like "the result can't be less than either operand"?

Yes, that is what the comment should say.

> @@ +471,5 @@
> >          // Both operands are non-negative, so the result won't be greater than either.
> >          lower = Max(lhs->lower_, rhs->lower_);
> >          // The result will have leading zeros where both operands have leading zeros.
> > +        if (lhs->upper_ == 0 || rhs->upper_ == 0)
> > +            upper = UINT32_MAX;
> 
> This could be expanded to
> 
>   if (lhs->upper_ == 0)
>       upper = rhs->upper_;
>   else if (rhs->upper_ == 0)
>       upper = lhs->upper_;
>   else
>       upper = UINT32_MAX >> ...;
> 
> if we cared, right?  Why don't we?

I was thinking more about conservatively fixing the bug than getting the most precise possible answer. Your suggestion looks good.

> 
> Also, if we were to do it the way you have it, we'd want INT32_MAX, not
> UINT32_MAX.

Agreed. UINT32_MAX works but is over-conservative. INT32_MAX is the right value.

> 
> @@ +479,2 @@
> >      } else {
> >          // The result will have leading ones where either operand has leading ones.
> 
> This comment got me thinking.  What's wrong with this alternative, simpler
> suggestion below, to replace the four lhs/rhs-negative bit-scanning lines
> here?
> 
>   // If an operand is a known-negative number, or-ing will take that operand
> and maybe
>   // set additional bits in it, but it'll never unset any -- producing a
> value greater
>   // than the negative operand's lower bound.  Therefore we can add in all
> the bits set
>   // in the known-negative lower bounds to compute a more-optimal lower
> bound.
>   if (lhs->upper_ < 0)
>     lower |= lhs->lower_;
>   if (rhs->upper _ < 0)
>     lower |= rhs->lower_;

This doesn't work if rhs can be less than lhs.

> 
> @@ +497,5 @@
> >      int64_t upper = INT32_MAX;
> >  
> >      if (lhs->lower_ >= 0 && rhs->lower_ >= 0) {
> >          // Both operands are non-negative. The result will be non-negative and
> >          // not greater than either.
> 
> This comment too seems wrong.  Supposing lhs = [0, 4] and rhs = [0, 1], (4 ^
> 1) == 5 which is greater than either.  Shouldn't it be something like "The
> result will be non-negative and less than the smallest power of two greater
> than both operands"?

Yes, it should be.

> 
> @@ +500,5 @@
> >          // Both operands are non-negative. The result will be non-negative and
> >          // not greater than either.
> >          lower = 0;
> > +        if (lhs->upper_ == 0 || rhs->upper_ == 0)
> > +            upper = UINT32_MAX;
> 
> Again UINT32_MAX looks wrong.
> 
> Can't this similarly be expanded like so:
> 
>   if (lhs->upper_ == 0)
>     upper = rhs->upper_;
>   else if (rhs->upper_ == 0)
>     upper = lhs->upper_;
>   else
>     upper = ....;
> 
> Oh, style rules will have you adding braces around these single-line bodies,
> because the last body occupies multiple lines.

Ok.

> 
> @@ +512,5 @@
> > +        if (~lhs->lower_ == 0 || ~rhs->lower_ == 0)
> > +            upper = UINT32_MAX;
> > +        else
> > +            upper = UINT32_MAX >> Min(js_bitscan_clz32(~lhs->lower_),
> > +                                      js_bitscan_clz32(~rhs->lower_));
> 
> This would read nicer, I think, like so:
> 
>   uint8_t lhsOnes = (~lhs->lower_ != 0)
>                     ? js_bitscan_clz32(~lhs->lower_)
>                     : 32;
>   uint8_t rhsOnes = (~rhs->lower_ != 0)
>                     ? js_bitscan_clz32(~rhs->lower_)
>                     : 32;
>   upper = UINT32_MAX >> Min(lhsOnes, rhsOnes);

That shift would invoke undefined behavior in the case where lhs->lower_ and rhs->lower_ are both -1.

> Also UINT32_MAX would again be wrong here.

As above.

> @@ +517,1 @@
> >      } else if (lhs->upper_ < 0 && rhs->lower_ >= 0) {
> 
> Seeing as the negative/non-negative code is repeated, reversed, for
> non-negative/negative -- and how could we ever possibly have a bug in such
> simple code as this? -- let's do this:
> 
> } else {
>     if (lhs->lower >= 0 && rhs->upper_ < 0)
>         Swap(lhs, rhs);
> 
>     if (lhs->upper_ < 0 && rhs->lower >= 0) {
>         ...code handling different-sign xor...
>     }
> }

Ok.

> @@ +519,5 @@
> >          // will have leading ones where the negative operand has leading ones
> >          // and the non-negative operand has leading zeros.
> >          upper = -1;
> > +        if (~lhs->lower_ == 0 || rhs->upper_ == 0)
> > +            upper = 0;
> 
> If the sign bits differ the output can't possibly ever be 0.  This can't be
> right.

It's slightly over-conservative. The upper value can be -1 when ~lhs->lower == 0.

> @@ +522,5 @@
> > +        if (~lhs->lower_ == 0 || rhs->upper_ == 0)
> > +            upper = 0;
> > +        else
> > +            lower = (int32_t)~(UINT32_MAX >> Min(js_bitscan_clz32(~lhs->lower_),
> > +                                                 js_bitscan_clz32(rhs->upper_)));
> 
> I think this would read more clearly as (roughly paralleling the
> both-negative code):
> 
>   uint8_t lhsOnes = (~lhs->lower_ != 0)
>                     ? js_bitscan_clz32(~lhs->lower_)
>                     : 32;
>   uint8_t rhsZeroes = (rhs->upper_ != 0)
>                       ? js_bitscan_clz32(rhs->upper_)
>                       : 32;
>   upper = ~int32_t(UINT32_MAX >> Min(lhsOnes, rhsZeroes));

As above, the shift could have undefined behavior this way.

> Note that technically casting, say, ~(UINT32_MAX >> 5) to int32_t invokes
> undefined behavior, because cast-unsigned-to-signed is only defined if the
> unsigned value is in the signed type's range.  Casting to int32_t before
> inverting doesn't have this problem.

Cast-unsigned-to-signed is implementation-defined, not undefined, and all meaningful implementations define it with two's complement rules. That said, rewriting it to the way you suggest is fine.

I'll submit an updated patch with the changes discussed above shortly.
Attached patch updated and improved patch (obsolete) (deleted) — Splinter Review
Here's an updated patch. I took a different approach to dealing with single-element ranges which factors out the handling better, and which is more precise as a bonus. Also, while addressing the other review comments, I made a few additional adjustments to improve precision.
Assignee: general → sunfish
Attachment #773112 - Attachment is obsolete: true
Attachment #773342 - Attachment is obsolete: true
Attachment #774111 - Flags: review?(jwalden+bmo)
(In reply to Dan Gohman [:sunfish] from comment #4)
> This doesn't work if rhs can be less than lhs.

Could you provide an example of an lhs/rhs range pair that would run afoul of this?  I don't follow this.

> >   uint8_t lhsOnes = (~lhs->lower_ != 0)
> >                     ? js_bitscan_clz32(~lhs->lower_)
> >                     : 32;
> >   uint8_t rhsOnes = (~rhs->lower_ != 0)
> >                     ? js_bitscan_clz32(~rhs->lower_)
> >                     : 32;
> >   upper = UINT32_MAX >> Min(lhsOnes, rhsOnes);
> 
> That shift would invoke undefined behavior in the case where lhs->lower_ and
> rhs->lower_ are both -1.

Ugh, I was off by one in how far a value can be legally shifted.  :-\
Comment on attachment 774111 [details] [diff] [review]
updated and improved patch

Review of attachment 774111 [details] [diff] [review]:
-----------------------------------------------------------------

Mostly nits and stuff, but the more-substantial changes you made, beyond what I'd suggested, I don't grasp the value of.  Examples of when the old algorithm was sub-optimal, and when the new one is more optimal, would help a lot with reviewing those parts.  I'm not convinced they buy any more precision, although I could easily be missing something.

::: js/src/ion/RangeAnalysis.cpp
@@ +463,5 @@
>  
>  Range *
>  Range::or_(const Range *lhs, const Range *rhs)
>  {
> +    // When one operand is a known value, its a special case where we can

it's

@@ +467,5 @@
> +    // When one operand is a known value, its a special case where we can
> +    // compute a fully precise result. Handling these up front also protects
> +    // the code below from calling js_bitscan_clz32 with a zero operand or
> +    // from shifting an int32_t by 32.
> +    if (lhs->lower_ == lhs->upper_) {

Let's do the Swap trick here, too, to cut duplication:

  if (rhs->lower_ == rhs->upper_)
      Swap(lhs, rhs);
  if (lhs->lower_ == lhs->upper_) {
      ...
  }

@@ +470,5 @@
> +    // from shifting an int32_t by 32.
> +    if (lhs->lower_ == lhs->upper_) {
> +        int32_t a = rhs->lower_ | lhs->lower_;
> +        int32_t b = rhs->upper_ | lhs->lower_;
> +        return new Range(Min(a, b), Max(a, b));

As noted in the previous comment, I still don't follow the math here, and why this can't be

  return new Range(rhs->lower_ | lhs->lower_, rhs->upper_ | lhs->lower_);

@@ +491,5 @@
>      } else {
>          // The result will have leading ones where either operand has leading ones.
> +        if (lhs->upper_ < 0) {
> +            int leadingOnes = js_bitscan_clz32(~lhs->lower_);
> +            lower = Max(lower, (int64_t)~(int32_t)(UINT32_MAX >> leadingOnes));

Use unsigned for leadingOnes, as the value is always non-negative.  (I will resist the urge to go on a diatribe about how Google's style guide is wrong on this point, at least not for more than a sentence.  ;-) )

C++-style casts, so

  int64_t(~int32_t(UINT32_MAX >> leadingOnes));

@@ +507,5 @@
>  
>  Range *
>  Range::xor_(const Range *lhs, const Range *rhs)
>  {
> +    // When one operand is always 0 or always -1, its a special case where we

it's

Yeah, these are good special cases to handle.

@@ +511,5 @@
> +    // When one operand is always 0 or always -1, its a special case where we
> +    // can compute a fully precise result. Handling these up front also protects
> +    // the code below from calling js_bitscan_clz32 with a zero operand or from
> +    // shifting an int32_t by 32.
> +    if (lhs->lower_ == lhs->upper_) {

if (rhs->lower_ == rhs->upper_)
    Swap(lhs, rhs);
if (lhs->lower_ == lhs->upper_) {
    ...
}

@@ +513,5 @@
> +    // the code below from calling js_bitscan_clz32 with a zero operand or from
> +    // shifting an int32_t by 32.
> +    if (lhs->lower_ == lhs->upper_) {
> +        if (lhs->lower_ == 0)
> +            return new Range(*rhs);

I assume there's some requirement that we have a new Range here, and that we couldn't just return rhs directly?

@@ +535,5 @@
> +        // other to one.
> +        int lhsLeadingZeros = js_bitscan_clz32(lhs->upper_);
> +        int rhsLeadingZeros = js_bitscan_clz32(rhs->upper_);
> +        upper = Min(rhs->upper_ | (UINT32_MAX >> lhsLeadingZeros),
> +                    lhs->upper_ | (UINT32_MAX >> rhsLeadingZeros));

Why not this, as it was before:

  upper = UINT32_MAX >> Min(lhsLeadingZeroes, rhsLeadingZeroes);

What ranges are handled more optimally by this than by the previous code?  Please provide examples; I think they come to the same thing, myself.

@@ +543,5 @@
> +        // To compute the upper value, take the negation of each operand's lower
> +        // value and set all bits that don't correspond to leading one bits in
> +        // the other to one.
> +        int lhsLeadingOnes = js_bitscan_clz32(~lhs->lower_);
> +        int rhsLeadingOnes = js_bitscan_clz32(~rhs->lower_);

unsigned here too

@@ +545,5 @@
> +        // the other to one.
> +        int lhsLeadingOnes = js_bitscan_clz32(~lhs->lower_);
> +        int rhsLeadingOnes = js_bitscan_clz32(~rhs->lower_);
> +        upper = Min(~rhs->lower_ | (UINT32_MAX >> lhsLeadingOnes),
> +                    ~lhs->lower_ | (UINT32_MAX >> rhsLeadingOnes));

Again, why not as before:

  upper = UINT32_MAX >> Min(lhsLeadingOnes, rhsLeadingOnes);

More examples wanted.

@@ +550,4 @@
>          lower = 0;
> +    } else {
> +        if (lhs->lower_ >= 0 && rhs->upper_ < 0)
> +            mozilla::Swap(lhs, rhs);

Add a |using mozilla::Swap| at the top of the file, then just use Swap directly everywhere.

@@ +551,5 @@
> +    } else {
> +        if (lhs->lower_ >= 0 && rhs->upper_ < 0)
> +            mozilla::Swap(lhs, rhs);
> +        if (lhs->upper_ < 0 && rhs->lower_ >= 0) {
> +            // Lhs is negative and rhs is non-negative. The result be negative.

"The sign bits differ, so the result's sign bit will make the result negative."

@@ +561,5 @@
> +            int rhsLeadingZeros = js_bitscan_clz32(rhs->upper_);
> +            int32_t lhsMask = ~int32_t(UINT32_MAX >> lhsLeadingOnes);
> +            int32_t rhsMask = ~int32_t(UINT32_MAX >> rhsLeadingZeros);
> +            lower = Max((rhs->upper_ ^ lhsMask) & lhsMask,
> +                        lhs->lower_ & rhsMask);

Again I'm not sure why

  lower = ~int32_t(UINT32_MAX >> Min(lhsLeadingOnes, rhsLeadingZeroes));

as was done before is incorrect, or if it wasn't, what the extra complexity here buys us.  (More examples!)
Attachment #774111 - Flags: review?(jwalden+bmo)
(In reply to Jeff Walden [:Waldo] (remove +bmo to email) from comment #7)
> Comment on attachment 774111 [details] [diff] [review]
> updated and improved patch
> 
> Review of attachment 774111 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Mostly nits and stuff, but the more-substantial changes you made, beyond
> what I'd suggested, I don't grasp the value of.  Examples of when the old
> algorithm was sub-optimal, and when the new one is more optimal, would help
> a lot with reviewing those parts.  I'm not convinced they buy any more
> precision, although I could easily be missing something.
> 
> ::: js/src/ion/RangeAnalysis.cpp
> @@ +463,5 @@
> >  
> >  Range *
> >  Range::or_(const Range *lhs, const Range *rhs)
> >  {
> > +    // When one operand is a known value, its a special case where we can
> 
> it's

Oops, fixed.

> @@ +467,5 @@
> > +    // When one operand is a known value, its a special case where we can
> > +    // compute a fully precise result. Handling these up front also protects
> > +    // the code below from calling js_bitscan_clz32 with a zero operand or
> > +    // from shifting an int32_t by 32.
> > +    if (lhs->lower_ == lhs->upper_) {
> 
> Let's do the Swap trick here, too, to cut duplication:
> 
>   if (rhs->lower_ == rhs->upper_)
>       Swap(lhs, rhs);
>   if (lhs->lower_ == lhs->upper_) {
>       ...
>   }

Ok.

> @@ +470,5 @@
> > +    // from shifting an int32_t by 32.
> > +    if (lhs->lower_ == lhs->upper_) {
> > +        int32_t a = rhs->lower_ | lhs->lower_;
> > +        int32_t b = rhs->upper_ | lhs->lower_;
> > +        return new Range(Min(a, b), Max(a, b));
> 
> As noted in the previous comment, I still don't follow the math here, and
> why this can't be
> 
>   return new Range(rhs->lower_ | lhs->lower_, rhs->upper_ | lhs->lower_);

For example, when lhs is [2,2] and rhs is [1,2], that would return [3,2], where the lower value is greater than the upper value. The Min and Max effectively swap the bounds in this case to fix that. I added a comment about that.

> @@ +491,5 @@
> >      } else {
> >          // The result will have leading ones where either operand has leading ones.
> > +        if (lhs->upper_ < 0) {
> > +            int leadingOnes = js_bitscan_clz32(~lhs->lower_);
> > +            lower = Max(lower, (int64_t)~(int32_t)(UINT32_MAX >> leadingOnes));
> 
> Use unsigned for leadingOnes, as the value is always non-negative.  (I will
> resist the urge to go on a diatribe about how Google's style guide is wrong
> on this point, at least not for more than a sentence.  ;-) )

Ok ;-).

> C++-style casts, so
> 
>   int64_t(~int32_t(UINT32_MAX >> leadingOnes));

Ok.

> @@ +507,5 @@
> >  
> >  Range *
> >  Range::xor_(const Range *lhs, const Range *rhs)
> >  {
> > +    // When one operand is always 0 or always -1, its a special case where we
> 
> it's

Ok.

> Yeah, these are good special cases to handle.
> 
> @@ +511,5 @@
> > +    // When one operand is always 0 or always -1, its a special case where we
> > +    // can compute a fully precise result. Handling these up front also protects
> > +    // the code below from calling js_bitscan_clz32 with a zero operand or from
> > +    // shifting an int32_t by 32.
> > +    if (lhs->lower_ == lhs->upper_) {
> 
> if (rhs->lower_ == rhs->upper_)
>     Swap(lhs, rhs);
> if (lhs->lower_ == lhs->upper_) {
>     ...
> }

This xor code is a little different than the or code. It doesn't handle all possible values up front, just 0 and -1. If lhs is [0,0] or [-1,-1] and rhs is [2,2] or any other single value besides 0 or -1, then the swap will happen and the code won't catch the [0,0] or [-1,-1] now in the rhs.

I added some extra asserts and a comment about this.

> @@ +513,5 @@
> > +    // the code below from calling js_bitscan_clz32 with a zero operand or from
> > +    // shifting an int32_t by 32.
> > +    if (lhs->lower_ == lhs->upper_) {
> > +        if (lhs->lower_ == 0)
> > +            return new Range(*rhs);
> 
> I assume there's some requirement that we have a new Range here, and that we
> couldn't just return rhs directly?

Yes; all these functions have const Range * arguments and they have plain Range * as their return type. This could probably be improved, though I'd prefer to leave that for another patch.

> @@ +535,5 @@
> > +        // other to one.
> > +        int lhsLeadingZeros = js_bitscan_clz32(lhs->upper_);
> > +        int rhsLeadingZeros = js_bitscan_clz32(rhs->upper_);
> > +        upper = Min(rhs->upper_ | (UINT32_MAX >> lhsLeadingZeros),
> > +                    lhs->upper_ | (UINT32_MAX >> rhsLeadingZeros));
> 
> Why not this, as it was before:
> 
>   upper = UINT32_MAX >> Min(lhsLeadingZeroes, rhsLeadingZeroes);
> 
> What ranges are handled more optimally by this than by the previous code? 
> Please provide examples; I think they come to the same thing, myself.

Another way of wording the logic is: for each side, take the upper value and then or in all the possible one bits it could get from the other side. We're in the both-non-negative case here, so the result can't be greater than this. Doing this for both sides gives us two values that the result can't be greater than, so we get to take the minimum, which is advantageous.

For example, when lhs and rhs are [4,16] and [2,6], the old code returns [0,31], while the new code returns [0,23].

> @@ +543,5 @@
> > +        // To compute the upper value, take the negation of each operand's lower
> > +        // value and set all bits that don't correspond to leading one bits in
> > +        // the other to one.
> > +        int lhsLeadingOnes = js_bitscan_clz32(~lhs->lower_);
> > +        int rhsLeadingOnes = js_bitscan_clz32(~rhs->lower_);
> 
> unsigned here too

Ok.

> @@ +545,5 @@
> > +        // the other to one.
> > +        int lhsLeadingOnes = js_bitscan_clz32(~lhs->lower_);
> > +        int rhsLeadingOnes = js_bitscan_clz32(~rhs->lower_);
> > +        upper = Min(~rhs->lower_ | (UINT32_MAX >> lhsLeadingOnes),
> > +                    ~lhs->lower_ | (UINT32_MAX >> rhsLeadingOnes));
> 
> Again, why not as before:
> 
>   upper = UINT32_MAX >> Min(lhsLeadingOnes, rhsLeadingOnes);
> 
> More examples wanted.

This is similar to the previous code, but inverted since we're in the both-negative case here.

For example, when lhs and rhs are [-17,-5] and [-7,-3], the old code returns [0,31], while the new code returns [0,23].

> @@ +550,4 @@
> >          lower = 0;
> > +    } else {
> > +        if (lhs->lower_ >= 0 && rhs->upper_ < 0)
> > +            mozilla::Swap(lhs, rhs);
> 
> Add a |using mozilla::Swap| at the top of the file, then just use Swap
> directly everywhere.

Ok.

> @@ +551,5 @@
> > +    } else {
> > +        if (lhs->lower_ >= 0 && rhs->upper_ < 0)
> > +            mozilla::Swap(lhs, rhs);
> > +        if (lhs->upper_ < 0 && rhs->lower_ >= 0) {
> > +            // Lhs is negative and rhs is non-negative. The result be negative.
> 
> "The sign bits differ, so the result's sign bit will make the result
> negative."

How about "The sign bits differ, so the result will have its sign bit set, making it be negative." ?

> @@ +561,5 @@
> > +            int rhsLeadingZeros = js_bitscan_clz32(rhs->upper_);
> > +            int32_t lhsMask = ~int32_t(UINT32_MAX >> lhsLeadingOnes);
> > +            int32_t rhsMask = ~int32_t(UINT32_MAX >> rhsLeadingZeros);
> > +            lower = Max((rhs->upper_ ^ lhsMask) & lhsMask,
> > +                        lhs->lower_ & rhsMask);
> 
> Again I'm not sure why
> 
>   lower = ~int32_t(UINT32_MAX >> Min(lhsLeadingOnes, rhsLeadingZeroes));
> 
> as was done before is incorrect, or if it wasn't, what the extra complexity
> here buys us.  (More examples!)

This is the different-signs version of the above code. rhs is non-negative, so the bits where lhs has leading ones will simply be inverted, and the rest of the bits are conservatively zeroed, since we're looking for the lowest possible result value. For lhs, it's negative, so we conservatively zero out the bits where rhs doesn't have leading zeros. Both of those give us a bound, so we get to take the Max, which is advantageous.

For example, when lhs and rhs are [4,16] and [-7,-3], the old code returns [-32,-1], while the new code returns [-24,-1].
Attached patch updated per review feedback (obsolete) (deleted) — Splinter Review
Attachment #774111 - Attachment is obsolete: true
Attachment #774485 - Flags: review?(jwalden+bmo)
Comment on attachment 774485 [details] [diff] [review]
updated per review feedback

Review of attachment 774485 [details] [diff] [review]:
-----------------------------------------------------------------

(In reply to Dan Gohman [:sunfish] from comment #8)
> > @@ +470,5 @@
> > > +    // from shifting an int32_t by 32.
> > > +    if (lhs->lower_ == lhs->upper_) {
> > > +        int32_t a = rhs->lower_ | lhs->lower_;
> > > +        int32_t b = rhs->upper_ | lhs->lower_;
> > > +        return new Range(Min(a, b), Max(a, b));
> > 
> > As noted in the previous comment, I still don't follow the math here, and
> > why this can't be
> > 
> >   return new Range(rhs->lower_ | lhs->lower_, rhs->upper_ | lhs->lower_);
> 
> For example, when lhs is [2,2] and rhs is [1,2], that would return [3,2],
> where the lower value is greater than the upper value. The Min and Max
> effectively swap the bounds in this case to fix that.

Ah!  I think I was trying a bit hard in applying it in the all-numbers case, when the original was just negative numbers.  But it turns out there was an issue in the negative-number case, too (using 4-bit numbers for sanity :-) ):

  [1100, 1100] and [1001, 1100]  (or [-4,-4] and [-7,-4])
  producing
  [1101, 1100]

In the negatives-only case we could have done |lower |= Min(lhs->lower_, rhs->lower_)|.  But now that we're handling all numbers here, that simplification doesn't work, because positives follow different rules from negatives in these respects.

Gah, what a mess.

> This xor code is a little different than the or code. It doesn't handle all
> possible values up front, just 0 and -1. If lhs is [0,0] or [-1,-1] and rhs
> is [2,2] or any other single value besides 0 or -1, then the swap will
> happen and the code won't catch the [0,0] or [-1,-1] now in the rhs.

Ah, right.  It's a good thing *someone* is thinking through this stuff!

> This could probably be improved, though I'd prefer to leave that for
> another patch.

Most definitely.

> Another way of wording the logic is: for each side, take the upper value and
> then or in all the possible one bits it could get from the other side. We're
> in the both-non-negative case here, so the result can't be greater than
> this. Doing this for both sides gives us two values that the result can't be
> greater than, so we get to take the minimum, which is advantageous.
> 
> For example, when lhs and rhs are [4,16] and [2,6], the old code returns
> [0,31], while the new code returns [0,23].

Yeesh.  I get it now.  Thanks for the explanation!

> This is similar to the previous code, but inverted since we're in the
> both-negative case here.
> 
> For example, when lhs and rhs are [-17,-5] and [-7,-3], the old code returns
> [0,31], while the new code returns [0,23].

Yeesh**2.

> How about "The sign bits differ, so the result will have its sign bit set,
> making it be negative." ?

Fine by me.  I was aiming for maximal compactness, but only because I assumed you'd want the same.  :-)  Clarity is good.

> This is the different-signs version of the above code. rhs is non-negative,
> so the bits where lhs has leading ones will simply be inverted, and the rest
> of the bits are conservatively zeroed, since we're looking for the lowest
> possible result value. For lhs, it's negative, so we conservatively zero out
> the bits where rhs doesn't have leading zeros. Both of those give us a
> bound, so we get to take the Max, which is advantageous.
> 
> For example, when lhs and rhs are [4,16] and [-7,-3], the old code returns
> [-32,-1], while the new code returns [-24,-1].

Yeesh**3.

I hope a lot of crypto code gets sped up by all this complicated xor range analysis!  Er, wait... ;-)
Attachment #774485 - Flags: review?(nicolas.b.pierron)
Attachment #774485 - Flags: review?(jwalden+bmo)
Attachment #774485 - Flags: review+
Comment on attachment 774485 [details] [diff] [review]
updated per review feedback

Review of attachment 774485 [details] [diff] [review]:
-----------------------------------------------------------------

Ok, we are getting closer.  There is still one error on the Range::or_ case, and I think we can factor the Range::xor_ case to make it easier to understand.

::: js/src/ion/RangeAnalysis.cpp
@@ +475,5 @@
> +        int32_t a = rhs->lower_ | lhs->lower_;
> +        int32_t b = rhs->upper_ | lhs->lower_;
> +        // Use Min and Max, to effectively swap the operands in the case
> +        // where a > b after the ors.
> +        return new Range(Min(a, b), Max(a, b));

I do not think this is true.  Here is an opposite example:

  lhs = {0b0110}
  rhs = [0b0010 .. 0b0100] (= {0b0010, 0b0011, 0b0100})

  result = [0b0110 .. 0b0111]

because the max would be reached with (0b0110 or 0b0011) == 0b0111.

@@ +483,5 @@
> +    // operand is 0. We rely on the code above to protect it.
> +    JS_ASSERT(lhs->lower_ != 0 || lhs->upper_ != 0);
> +    JS_ASSERT(rhs->lower_ != 0 || rhs->upper_ != 0);
> +    JS_ASSERT(lhs->lower_ != -1 || lhs->upper_ != -1);
> +    JS_ASSERT(rhs->lower_ != -1 || rhs->upper_ != -1);

Just by reading this assertions, I am not sure how this code implies that the code below is secure. I would have written:

  JS_ASSERT_IF(lhs->lower_ >= 0, lhs->upper_ != 0);
  JS_ASSERT_IF(rhs->lower_ >= 0, rhs->upper_ != 0);
  JS_ASSERT_IF(lhs->upper_ < 0, lhs->lower != -1);
  JS_ASSERT_IF(rhs->upper_ < 0, rhs->lower != -1);

Which, I think, clearly highlight how this protect the inputs of js_bitscan_clz32 for being called with 0, and makes easier to understand how the previous code protects against it.

@@ +519,5 @@
> +    // can compute a fully precise result. Handling these up front also protects
> +    // the code below from calling js_bitscan_clz32 with a zero operand or from
> +    // shifting an int32_t by 32.
> +    if (lhs->lower_ == lhs->upper_) {
> +        if (lhs->lower_ == 0)

Or you can just swap the operands as you did in Range::or_.  Which would avoid code duplication.

@@ +536,5 @@
> +    // operand is 0. We rely on the code above to protect it.
> +    JS_ASSERT(lhs->lower_ != 0 || lhs->upper_ != 0);
> +    JS_ASSERT(rhs->lower_ != 0 || rhs->upper_ != 0);
> +    JS_ASSERT(lhs->lower_ != -1 || lhs->upper_ != -1);
> +    JS_ASSERT(rhs->lower_ != -1 || rhs->upper_ != -1);

Same thing as with Range::or_.  Make the assertion such as it is easier to see the link with js_bitscan_clz32 and the previous conditions.

@@ +542,5 @@
>      int64_t lower = INT32_MIN;
>      int64_t upper = INT32_MAX;
>  
>      if (lhs->lower_ >= 0 && rhs->lower_ >= 0) {
> +        // Both operands are non-negative. The result will be non-negative.

move "lower = 0;" just below this comment.

@@ +545,5 @@
>      if (lhs->lower_ >= 0 && rhs->lower_ >= 0) {
> +        // Both operands are non-negative. The result will be non-negative.
> +        // To compute the upper value, take each operand's upper value and
> +        // set all bits that don't correspond to leading zero bits in the
> +        // other to one.

I suggest to use the following comment, which partially explain why we have a Min function involved:

All leading zeros of the smallest upper value are keeping the bits identical in the other range.  The non-leading zero bits are approximated to the smallest bit-mask including all bits of the range having the smallest upper value. 

And I think, after having written this comment, that it would make sense to lift the comparisons of the 2 upper value.

int32_t minUpper = lhs->upper_;
int32_t maxUpper = rhs->upper_;
if (minUpper > maxUpper)
  Swap(minUpper, maxUpper);

minLeadingZeros = js_bitscan_clz32(minUpper);
upper = maxUpper | (UINT32_MAX >> minLeadingZeros);

@@ +555,1 @@
>      } else if (lhs->upper_ < 0 && rhs->upper_ < 0) {

Add a comment to explain that:
  A xor B == (not A) xor (not B)
  not [l .. u] == [(not u) .. (not l)]

And factor this case with the previous one.

@@ +574,5 @@
> +            unsigned lhsLeadingOnes = js_bitscan_clz32(~lhs->lower_);
> +            unsigned rhsLeadingZeros = js_bitscan_clz32(rhs->upper_);
> +            int32_t lhsMask = ~int32_t(UINT32_MAX >> lhsLeadingOnes);
> +            int32_t rhsMask = ~int32_t(UINT32_MAX >> rhsLeadingZeros);
> +            lower = Max((rhs->upper_ ^ lhsMask) & lhsMask,

I think it would be better to fallback on the first case by explaining the following relation on sets:
  A xor B == not (A xor (not B))
  not [l .. u] == [(not u) .. (not l)]
Attachment #774485 - Flags: review?(nicolas.b.pierron)
Comment on attachment 774485 [details] [diff] [review]
updated per review feedback

Review of attachment 774485 [details] [diff] [review]:
-----------------------------------------------------------------

Set review flag to r- instead of cancelling the review, as there is another reviewer involved.
Attachment #774485 - Flags: review-
(In reply to Nicolas B. Pierron [:nbp] from comment #11)
> Comment on attachment 774485 [details] [diff] [review]
> updated per review feedback
> 
> Review of attachment 774485 [details] [diff] [review]:
> -----------------------------------------------------------------
> 
> Ok, we are getting closer.  There is still one error on the Range::or_ case,
> and I think we can factor the Range::xor_ case to make it easier to
> understand.
> 
> ::: js/src/ion/RangeAnalysis.cpp
> @@ +475,5 @@
> > +        int32_t a = rhs->lower_ | lhs->lower_;
> > +        int32_t b = rhs->upper_ | lhs->lower_;
> > +        // Use Min and Max, to effectively swap the operands in the case
> > +        // where a > b after the ors.
> > +        return new Range(Min(a, b), Max(a, b));
> 
> I do not think this is true.  Here is an opposite example:
> 
>   lhs = {0b0110}
>   rhs = [0b0010 .. 0b0100] (= {0b0010, 0b0011, 0b0100})
> 
>   result = [0b0110 .. 0b0111]
> 
> because the max would be reached with (0b0110 or 0b0011) == 0b0111.

Ah, right.

This part is getting beyond the original problem I set out to solve, so I'll change this to just handle [0,0] and [-1,-1], which are the two cases that are really needed here.

> 
> @@ +483,5 @@
> > +    // operand is 0. We rely on the code above to protect it.
> > +    JS_ASSERT(lhs->lower_ != 0 || lhs->upper_ != 0);
> > +    JS_ASSERT(rhs->lower_ != 0 || rhs->upper_ != 0);
> > +    JS_ASSERT(lhs->lower_ != -1 || lhs->upper_ != -1);
> > +    JS_ASSERT(rhs->lower_ != -1 || rhs->upper_ != -1);
> 
> Just by reading this assertions, I am not sure how this code implies that
> the code below is secure. I would have written:
> 
>   JS_ASSERT_IF(lhs->lower_ >= 0, lhs->upper_ != 0);
>   JS_ASSERT_IF(rhs->lower_ >= 0, rhs->upper_ != 0);
>   JS_ASSERT_IF(lhs->upper_ < 0, lhs->lower != -1);
>   JS_ASSERT_IF(rhs->upper_ < 0, rhs->lower != -1);
> 
> Which, I think, clearly highlight how this protect the inputs of
> js_bitscan_clz32 for being called with 0, and makes easier to understand how
> the previous code protects against it.

Ok.

> @@ +519,5 @@
> > +    // can compute a fully precise result. Handling these up front also protects
> > +    // the code below from calling js_bitscan_clz32 with a zero operand or from
> > +    // shifting an int32_t by 32.
> > +    if (lhs->lower_ == lhs->upper_) {
> > +        if (lhs->lower_ == 0)
> 
> Or you can just swap the operands as you did in Range::or_.  Which would
> avoid code duplication.

This code needs to handle [0,0] and [-1,-1] in either operand, even when the other operand is also a single-value range. I think this code is easier to understand than code using a swap if it has to check (rhs is single-value && (lhs is not single-value || (lhs isn't [0,0] && lhs isn't [1,1]))).

> @@ +536,5 @@
> > +    // operand is 0. We rely on the code above to protect it.
> > +    JS_ASSERT(lhs->lower_ != 0 || lhs->upper_ != 0);
> > +    JS_ASSERT(rhs->lower_ != 0 || rhs->upper_ != 0);
> > +    JS_ASSERT(lhs->lower_ != -1 || lhs->upper_ != -1);
> > +    JS_ASSERT(rhs->lower_ != -1 || rhs->upper_ != -1);
> 
> Same thing as with Range::or_.  Make the assertion such as it is easier to
> see the link with js_bitscan_clz32 and the previous conditions.

Ok.

> @@ +542,5 @@
> >      int64_t lower = INT32_MIN;
> >      int64_t upper = INT32_MAX;
> >  
> >      if (lhs->lower_ >= 0 && rhs->lower_ >= 0) {
> > +        // Both operands are non-negative. The result will be non-negative.
> 
> move "lower = 0;" just below this comment.

Ok.

> @@ +545,5 @@
> >      if (lhs->lower_ >= 0 && rhs->lower_ >= 0) {
> > +        // Both operands are non-negative. The result will be non-negative.
> > +        // To compute the upper value, take each operand's upper value and
> > +        // set all bits that don't correspond to leading zero bits in the
> > +        // other to one.
> 
> I suggest to use the following comment, which partially explain why we have
> a Min function involved:
> 
> All leading zeros of the smallest upper value are keeping the bits identical
> in the other range.  The non-leading zero bits are approximated to the
> smallest bit-mask including all bits of the range having the smallest upper
> value. 
> 
> And I think, after having written this comment, that it would make sense to
> lift the comparisons of the 2 upper value.
> 
> int32_t minUpper = lhs->upper_;
> int32_t maxUpper = rhs->upper_;
> if (minUpper > maxUpper)
>   Swap(minUpper, maxUpper);

That's not quite what this code is doing. It's looking at each upper value separately. For each side, this gives an upper bound for the overall result. With two upper bounds, a Min picks the best one. I'll add a comment about this.

> minLeadingZeros = js_bitscan_clz32(minUpper);
> upper = maxUpper | (UINT32_MAX >> minLeadingZeros);
> 
> @@ +555,1 @@
> >      } else if (lhs->upper_ < 0 && rhs->upper_ < 0) {
> 
> Add a comment to explain that:
>   A xor B == (not A) xor (not B)
>   not [l .. u] == [(not u) .. (not l)]
> 
> And factor this case with the previous one.

See below.

> @@ +574,5 @@
> > +            unsigned lhsLeadingOnes = js_bitscan_clz32(~lhs->lower_);
> > +            unsigned rhsLeadingZeros = js_bitscan_clz32(rhs->upper_);
> > +            int32_t lhsMask = ~int32_t(UINT32_MAX >> lhsLeadingOnes);
> > +            int32_t rhsMask = ~int32_t(UINT32_MAX >> rhsLeadingZeros);
> > +            lower = Max((rhs->upper_ ^ lhsMask) & lhsMask,
> 
> I think it would be better to fallback on the first case by explaining the
> following relation on sets:
>   A xor B == not (A xor (not B))
>   not [l .. u] == [(not u) .. (not l)]

That's a great idea. The xor code is significantly less complex this way.
Attachment #774485 - Attachment is obsolete: true
Attachment #776765 - Flags: review?(nicolas.b.pierron)
Comment on attachment 776765 [details] [diff] [review]
an updated patch, following review feedback

Review of attachment 776765 [details] [diff] [review]:
-----------------------------------------------------------------

Awesome :)
Attachment #776765 - Flags: review?(nicolas.b.pierron) → review+
Comment on attachment 776765 [details] [diff] [review]
an updated patch, following review feedback

Review of attachment 776765 [details] [diff] [review]:
-----------------------------------------------------------------

Looks good to me too, with a slight nitpick.  Land it!  :-)

::: js/src/ion/RangeAnalysis.cpp
@@ +562,5 @@
> +        // result, so we can take the minimum between the two.
> +        unsigned lhsLeadingZeros = js_bitscan_clz32(lhsUpper);
> +        unsigned rhsLeadingZeros = js_bitscan_clz32(rhsUpper);
> +        upper = Min(rhsUpper | (UINT32_MAX >> lhsLeadingZeros),
> +                    lhsUpper | (UINT32_MAX >> rhsLeadingZeros));

Please add a |int32_t| in front of the two (UINT32_MAX >> ...) here.  Integral promotion makes the bitwise-or operation produce uint32_t, meaning the Min will produce uint32_t, which will trigger -Wconversion warnings assigning to an int32_t.  (We don't have -Wconversion enabled now, but I'd like to reenable it eventually.)
Attachment #776765 - Flags: review+
awfy appears to show that this regressed non-odin asm-primes by 13.5% and non-odin asm-skinning by 3.6%.
That's on 32-bit; on 64-bit, non-odin asm-primes got 5.9% better, and non-odin asm-skinning got 2.9% better. My initial guess is that neither of these are actually meaningful.

We recently saw asm-primes be highly sensitive to an as-yet unidentified micro-architectural quirk, with odin (bug 885186), so it wouldn't be too surprising to see something similar with non-odin. I'm working on some scripts to look at hardware counters, so hopefully I'll be able to put a name on this phenomenon and recognize it when it comes up.
Yeah, I figured as much, just wanted to comment here to make sure it was looked at.

Hardware counters stuff sounds interesting.
https://hg.mozilla.org/mozilla-central/rev/aeff293dadd3
Status: NEW → RESOLVED
Closed: 11 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla25
Hardware counters seem to say that primes is falling out of the Decoded ICache a lot, and it looks related to the density of branches. It turns out that ModI is a very branchy place. It checks for mod-by-zero, it checks for mod-of-negative, it checks for mod-by-power-of-two, it checks for INT32_MIN%-1 (two branches), it checks for a -0 result, and it has branches to step around all the special cases. In a tight loop which has even more branches, this all appears to do bad things to the CPU.

I experimented with moving some of ModI's branches into OutOfLineCode regions, though I don't actually get much speedup on primes unless I also disable the power-of-two optimization (bug 864400), and then performance jumps up by about 25%.
Bug 898461 has my patch to move some of ModI's branches out of line.
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: