948321 - Differential Testing: Different division results on x86 platforms

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Description

•

11 years ago

+++ This bug was initially created as a clone of Bug #946969 +++ function f() { return Math.sin((Math.asinh(Math.PI) / Number.MAX_VALUE)) } x = []; x.push(f()) x.push(f()) print(x) $ ./js-opt-32-dm-ts-linux-75c0c92d7fa4 w1529-cj-in.js 1.035936393812552e-308,1.035936393812552e-308 $ ./js-opt-32-dm-ts-linux-75c0c92d7fa4 --ion-eager w1529-cj-in.js 1.035936393812552e-308,1.0359363938125513e-308 Tested on js 32-bit opt threadsafe shell on m-c changeset 75c0c92d7fa4 on Ubuntu Linux. My configure flags are: CC="gcc -m32" AR=ar PKG_CONFIG_LIBDIR=/usr/lib/pkgconfig CXX="g++ -m32" sh ./configure --target=i686-pc-linux --enable-optimize --disable-debug --enable-profiling --enable-gczeal --enable-debug-symbols --enable-methodjit --enable-type-inference --disable-tests --enable-more-deterministic --with-ccache --enable-threadsafe <other NSPR options> autoBisect shows this is probably related to the following changeset: The first bad revision is: changeset: http://hg.mozilla.org/mozilla-central/rev/348b2ba27515 user: David Caabeiro date: Mon Jul 15 10:03:14 2013 -0500 summary: Bug 717379, part 1 - Implement the new ES6 math functions. r=jorendorff. This may have been a bug from whenever ES6 math functions landed in bug 717379. Setting needinfo? from jandem, as this may be a JIT issue.

Flags: needinfo?(jdemooij)

Jan de Mooij [:jandem]

Comment 1

•

11 years ago

Gary, are you still seeing this? Do you also see this on OS X or only Linux?

Flags: needinfo?(jdemooij) → needinfo?(gary)

Benjamin Bouvier [:bbouvier] (inactive)

Comment 2

•

11 years ago

Can reproduce with --ion-parallel-compile=off --ion-eager on 32 bits builds, on Linux at least.

Flags: needinfo?(gary)

Benjamin Bouvier [:bbouvier] (inactive)

Comment 3

•

11 years ago

Simpler test case: function f() { var x = Math.asinh(Math.PI); var y = Number.MAX_VALUE; var z = x / y; return z; } assertEq(f(), f()) So I have investigated a little bit and while the generated JIT code uses divsd, the compiled NumberDiv uses the fdiv instruction (specialized division for the FPU). Maybe we could try to use the fdiv instruction on x86 if it's available (SSE3?), otherwise there will be some small errors like this for sure. FWIW, there was apparently a bug with the FDIV instruction at some point, see also [0]. Maybe we should ask some Intel engineers if that's expected or if that's a bug? [0] https://en.wikipedia.org/wiki/Pentium_FDIV_bug

Jan de Mooij [:jandem]

Comment 4

•

11 years ago

(In reply to Benjamin Bouvier [:bbouvier] from comment #3) > So I have investigated a little bit and while the generated JIT code uses > divsd, the compiled NumberDiv uses the fdiv instruction (specialized > division for the FPU). Maybe we could try to use the fdiv instruction on x86 > if it's available (SSE3?), otherwise there will be some small errors like > this for sure. fdiv is part of the x87 instruction set that's much older than SSE2/divsd. x87 has its own stack and is a pain to use compared to SSE; we try to avoid it as much as possible. We do use it for the fisttp instruction for instance because there's no SSE equivalent on x86. Unfortunately, fdiv uses 80-bit double-extended precision and divsd/SSE2 uses 64-bit double precision. That's probably causing the difference here. This sucks because it's not very easy to work around I'm afraid. For differential testing we could try to use a compiler switch so that it's free to emit SSE2+ instructions; maybe that will keep it from emitting fdiv, or maybe it won't.. Is this also a problem with 64-bit builds?

Benjamin Bouvier [:bbouvier] (inactive)

Comment 5

•

11 years ago

(In reply to Jan de Mooij [:jandem] from comment #4) > Unfortunately, fdiv uses 80-bit double-extended precision and divsd/SSE2 > uses 64-bit double precision. That's probably causing the difference here. > This sucks because it's not very easy to work around I'm afraid. > > For differential testing we could try to use a compiler switch so that it's > free to emit SSE2+ instructions; maybe that will keep it from emitting fdiv, > or maybe it won't.. Is this also a problem with 64-bit builds? I can't tell for other compilers, but gcc 4.8-real on ubuntu generates the divsd instruction in NumberDiv, so the test passes.

Benjamin Bouvier [:bbouvier] (inactive)

Updated

•

11 years ago

Summary: Differential Testing: Different output message involving Math.sin and Math.asinh → Differential Testing: Different division results on x86 platforms

Benjamin Bouvier [:bbouvier] (inactive)

Comment 6

•

11 years ago

After trying locally, it didn't make a difference when using the gcc compiler flags -msse2, -mfpmath=sse or -ffast-math, the compiled NumberDiv function still contains x87 instructions. Dan, do you have any idea of any combination of compiler flags that could prevent gcc from emitting x87 instructions?

Flags: needinfo?(sunfish)

Dan Gohman [:sunfish]

Comment 7

•

11 years ago

If you use both -msse2 and -mfpmath=sse at the same time, it should use SSE2 instructions.

Flags: needinfo?(sunfish)

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 8

•

11 years ago

Setting needinfo? from myself to get this added to both our GCC and Clang parameters, and I need to check if Windows shows this issue as well.

Flags: needinfo?(gary)

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 9

•

11 years ago

I have some questions after thinking about this a little more: (1) Is this 32-bit only? (2) Does this affect ARM/ARM64?

Flags: needinfo?(sunfish)

Dan Gohman [:sunfish]

Comment 10

•

11 years ago

For (1), all x86-64 hardware supports SSE2, and every C++ compiler I know of uses it by default when targetting x86-64, so yes, the x87 vs sse issue is 32-bit only here. For (2), ARM doesn't have anything like 80-bit floating-point, but I believe NEON does not implement IEEE-754 subnormals [0], while VFP does is some configurations, and I don't know what soft-float implementations do. You could potentially see different results when switching from a compiler/platform that supports subnormals to one that doesn't. I don't know how big of a problem this will be for you in practice though. [0] http://en.wikipedia.org/wiki/Denormal_number

Flags: needinfo?(sunfish)

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 11

•

11 years ago

Thanks Dan! I have verified that adding "-msse2 -mfpmath=sse" to GCC compiler flags fixed the issue. Mac 32-bit didn't seem to be affected (Mac uses Clang), but adding the flag didn't seem to cause any hurt. Windows 32-bit is also affected, and I was pointed to http://msdn.microsoft.com/en-us/library/7t5yh4fd%28v=vs.110%29.aspx Todos: 1. Recheck ARM on the pandaboard. 2. Fix this for Win32.

OS: Linux → All

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 12

•

11 years ago

> 1. Recheck ARM on the pandaboard. ARM is not affected, and does not need the extra "-msse2 -mfpmath=sse" flags. Final todo: fix this on Win32.

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Updated

•

11 years ago

Depends on: 998577

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 13

•

11 years ago

Clearing needinfo, waiting on bug 998577.

Flags: needinfo?(gary)

Julian Seward [:jseward]

Comment 14

•

10 years ago

Could we just switch the x87 FPU into doing arithmetic at 53 bit mantissa precision (that is, to 64 bit double accuracy) instead of 64 bit mantissas (that is, 80 bit extended doubles)? That is, set %fpucw to 0x027F instead of the default 0x037F? Theoretically I think we could do that for our own JIT generated code, but in practice I suspect this would cause breakage in other libraries. So, probably a bad idea.

Benjamin Bouvier [:bbouvier] (inactive)

Comment 15

•

10 years ago

(In reply to Julian Seward from comment #14) > Theoretically I think we could do that for our own JIT generated > code, but in practice I suspect this would cause breakage in other > libraries. So, probably a bad idea. It's not a problem for JITted code, as it generates SSE2 instructions (in this case, divsd), while the interpreter (which is plain C++ and compiled by gcc) uses FPU (in this case, fdivl). Maybe we could set the control word before making arithmetic operations and resetting it afterwards (to avoid issues with other libraries), in more deterministic builds. This might have a perf cost but it wouldn't be a blocker if it's low, as it'd be only for fuzzing builds. Actually, after some digging and testing: 1) bug 496816 (that Gary also filed a few years ago!) shows the same kind of error and a patch that is setting the FPU control word so that double precision is used (see also [1]). Copying the FIX_FPU function in js::NumberDiv doesn't solve the issue. 2) in js::NumberDiv, setting the control word manually to 0x037F (i.e. use extended precision, 80 bits doubles) fixes the assertion (!). Not sure which one (divsd or fdivl) yields the right result. [1] http://dxr.mozilla.org/mozilla-central/source/js/src/jsnum.cpp#1116

Benjamin Bouvier [:bbouvier] (inactive)

Comment 16

•

10 years ago

Attached file C test case (deleted) — Details

Fwiw, in plain C: - on x64 (with or without setting the FPU flag), the result is correct. - on x86 without setting the FPU flag (i.e. having it to 0x037F in the first place), the result is correct. - on x86 with setting the FPU flag (i.e. as we do in FIX_FPU on x86 platforms), the result is wrong (by one bit, I think).

Dan Gohman [:sunfish]

Comment 17

•

10 years ago

Interesting. It reproduces for me too. You can reproduce the x86 behavior on x64 too by compiling with -mfpmath=387. I looked up the FPU flags in Intel's documentation. Setting bit 9 (and clearing bit 8) is indeed documented to select double-precision mode. However, if I change the FIX_FPU code to set both bits 8 and 9 (changing 0x2f3 to 0x3f3), that should select double-extended-precision mode, and I get the "correct" result. Is the x87's double-precision mode rounded differently than SSE2's? As an aside, FIX_FPU's comment also says "Raise bits 0-5" but the code actually does "control |= 0x2f3" which doesn't raise bits 2 or 3, but it isn't actually important because those flags are set by default these days. It also sets bits 6 and 7, which are "reserved", but that doesn't seem to be affecting anything.

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 18

•

10 years ago

This still affects MSVC 2013 Update 3 even though SSE mode is now compiled by default. (as per http://msdn.microsoft.com/en-us/library/7t5yh4fd%28v=vs.120%29.aspx ) $ ./js-dbg-opt-32-dm-nsprBuild-windows-d7ec36a77534.exe --fuzzing-safe --ion-eager 948321.js 1.035936393812552e-308,1.0359363938125513e-308 $ ./js-dbg-opt-32-dm-nsprBuild-windows-d7ec36a77534.exe --fuzzing-safe 948321.js 1.035936393812552e-308,1.035936393812552e-308 Dan/Benjamin, any suggestions on how to move forward?

Flags: needinfo?(sunfish)

Flags: needinfo?(benj)

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 19

•

10 years ago

Debug configure options for comment 18: MAKE=mozmake AR=ar sh c://Users//fuzz1win8//trees//mozilla-central//js//src//configure --enable-debug --enable-optimize --enable-nspr-build --enable-more-deterministic --enable-gczeal --enable-debug-symbols --disable-tests Tested on Windows 8.1, MSVC 2013 Update 3, MozillaBuild 1.10.0 and m-c rev d7ec36a77534.

Dan Gohman [:sunfish]

Comment 20

•

10 years ago

To answer my question in comment 17, with FIX_FPU(), the x87 divide *does* round differently than the SSE2 divide in this case: $ gcc -m32 f.c -msse2 -mfpmath=387 && ./a.out 0x0.772fda7c4001cp-1022 $ gcc -m32 f.c -msse2 -mfpmath=sse && ./a.out 0x0.772fda7c4001bp-1022 $ cat f.c #include <stdio.h> volatile double x = 0x1.dcbf69f10006dp+0; volatile double y = 0x1.fffffffffffffp+1023; volatile double z; int main() { short control; asm("fstcw %0" : "=m" (control) : ); control &= ~0x300; // Lower bits 8 and 9 (precision control). control |= 0x2f3; // Raise bits 0-5 (exception masks) and 9 (64-bit precision). asm("fldcw %0" : : "m" (control) ); z = x / y; printf("%a\n", z); }

Benjamin Bouvier [:bbouvier] (inactive)

Comment 21

•

10 years ago

I don't see what else we could do. Gary, did using the switch work fine for former versions of Microsoft Visual Studio? What if you force using the /arch:SSE2 compiler option? Does changing any of the compiler options indicated in [0] avoids the assertion to be triggered? [0] http://msdn.microsoft.com/en-us/library/e7s85ffb.aspx

Flags: needinfo?(benj)

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 22

•

10 years ago

(In reply to Benjamin Bouvier [:bbouvier] from comment #21) > I don't see what else we could do. Gary, did using the switch work fine for > former versions of Microsoft Visual Studio? What if you force using the > /arch:SSE2 compiler option? Does changing any of the compiler options > indicated in [0] avoids the assertion to be triggered? > > [0] http://msdn.microsoft.com/en-us/library/e7s85ffb.aspx I was unable to get the /arch:SSE2 (or other compiler) option(s) to work properly in our build system, so it never worked with MSVC 2010. I switched to MSVC 2013 since it's the default option there, and this bug still shows.

Emanuel Hoogeveen [:ehoogeveen]

Comment 23

•

10 years ago

(In reply to Gary Kwong [:gkw] [:nth10sd] GMT-4 Sep 29 - Oct 3 from comment #22) > I switched to MSVC 2013 since it's the default option there, and this bug > still shows. We explicitly use -arch:IA32 for 32-bit builds since bug 836658. So you could probably undo attachment 8454247 [details] [diff] [review] or otherwise use it for inspiration.

Emanuel Hoogeveen [:ehoogeveen]

Comment 24

•

10 years ago

Oh, and I guess attachment 8454247 [details] [diff] [review] from bug 1043108 landed on top of that to fix x64.

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 25

•

10 years ago

Thanks Emanuel! This is a big clue on how to move forward... I'll look into seeing if removing those lines causes the bug to still occur, so setting a needinfo? for myself. Also, I wonder if configure.in can accept an #ifdef for --fuzzing-safe. Does anyone know?

Flags: needinfo?(gary)

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Updated

•

10 years ago

Depends on: 1077074

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 26

•

10 years ago

Filed bug 1077074 to get a --enable-sse2 configure option, especially since I already pass in additional options for Mac and Linux (-msse2 -mfpmath=sse)

Dan Gohman [:sunfish]

Comment 27

•

10 years ago

I don't think I know anything else that's in question here now.

Flags: needinfo?(sunfish)

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 28

•

10 years ago

Sadly, undo-ing the patches in bug 836658 and bug 1043108 didn't seem to cause the issue to go away. I guess I'll wait for bug 1077074 to land and see if it helps.

Flags: needinfo?(gary)

Mark Straver

Comment 29

•

10 years ago

Just my $0.02 here, but ES is only required to provide an approximation of mathematical functions. If you really need 80-bit precision, then you should consider either using a library written in JS that provides more accuracy, or not rely on a scripting language at all. The spec says: "For other argument values, these functions are intended to compute approximations to the results of familiar mathematical functions, but some latitude is allowed in the choice of approximation algorithms. The general intent is that an implementer should be able to use the same mathematical library for ECMAScript on a given hardware platform that is available to C programmers on that platform." Since what is implemented is what is available to C programmers, what is currently present should suffice, even if there are small accuracy differences between generated SIMD and x87 code by the compiler in use. MSVC also makes no guarantee about which path will be chosen by the compiler. even with /arch:SSE2, there is no guarantee that the compiler will always use SSE2 instructions, since the compiler decides at compile-time what to use -- which, in the case of PGO especially, will simply be unknown.

Jesse Ruderman

Comment 30

•

10 years ago

Leaving precision up to the whims of PGO and JIT compilation makes me sad. Would it make sense to replace the C div in js::NumberDiv with asm for the exact division instruction we want?

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 31

•

10 years ago

:sunfish requests me to needinfo? him this time. We have some sort of a likely solution on how to proceed.

Flags: needinfo?(sunfish)

Dan Gohman [:sunfish]

Comment 32

•

10 years ago

Attached patch deterministic-div.patch (deleted) — Details — Splinter Review

Here's a patch implementing my idea. When SSE2 is available but not necessarily used for all math, use inline assembly to explicitly perform an SSE2 divide. Under JS_MORE_DETERMINISTIC only. I've not extensively tested it, but in theory this ought to work.

Flags: needinfo?(sunfish)

Mark Straver

Comment 33

•

10 years ago

(In reply to Dan Gohman [:sunfish] from comment #32) > Created attachment 8560580 [details] [diff] [review] > deterministic-div.patch Since I'm interested in this bug I had a look at the patch, and something stood out to me: 76 # else 77 # error "deterministic divide implemented yet" 78 # endif 79 return a; If you're neither having GNUC nor MSVC, you hit this error and looks like you return the UNdivided result (just a) instead of a / b.

Dan Gohman [:sunfish]

Comment 34

•

10 years ago

(In reply to Mark Straver from comment #33) > 76 # else > 77 # error "deterministic divide implemented yet" > 78 # endif > 79 return a; > > If you're neither having GNUC nor MSVC, you hit this error and looks like > you return the UNdivided result (just a) instead of a / b. #error means you get a build error, so it never gets far enough to be wrong. I did miss a "not" in the error string though -- I've fixed that in my local copy.

Mark Straver

Comment 35

•

10 years ago

Right, so, what if I wanted to build with, say, the Intel compiler?

Jeff Walden [:Waldo]

Comment 36

•

10 years ago

1. Patches welcome. :-) 2. It's not actualy an issue. Many compilers (including clang and icc) competing against gcc, also define __GNUC__ and attempt to support most of the things gcc supports. Assembly as here is definitely part of that.

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 37

•

10 years ago

Comment on attachment 8560580 [details] [diff] [review] deterministic-div.patch Putting this on my plate to test.

Attachment #8560580 - Flags: feedback?(gary)

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 38

•

10 years ago

Attached file compile failure (deleted) — Details

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 39

•

10 years ago

Comment on attachment 8560580 [details] [diff] [review] deterministic-div.patch See compile failure log.

Attachment #8560580 - Flags: feedback?(gary) → feedback-

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Updated

•

10 years ago

Flags: needinfo?(sunfish)

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 40

•

10 years ago

This is probably the issue: c:\users\mozillaadmin\trees\mozilla-central\js\src\jslibmath.h(69) : error C2039: 'supports_sse2' : is not a member of 'mozilla' c:\users\mozillaadmin\trees\mozilla-central\js\src\jslibmath.h(69) : error C3861: 'supports_sse2': identifier not found c:\users\mozillaadmin\trees\mozilla-central\js\src\jslibmath.h(74) : error C2415: improper operand type

Dan Gohman [:sunfish]

Comment 41

•

10 years ago

Oops, the first issue needs a #include "mozilla/SSE.h". The second issue is that compilers apparently don't permit inline asm to require xmm registers when SSE is disabled. We could theoretically hack around this restriction by passing the values through memory, and using an xmm register covertly, with something like this in GNU-style inline asm: double temp[2]; asm("movups %%xmm0, %0;" "movsd %1, %%xmm0;" "divsd %2, %%xmm0;" "movsd %%xmm0, %1;" "movups %0, %%xmm0" : "=m"(temp), "+m"(a) : "m"(b)); Does anyone know how to do something like this in Microsoft-style inline asm?

Flags: needinfo?(sunfish)

Mark Straver

Comment 42

•

10 years ago

Sorry if I don't understand properly, but if you know the significant width of the result from an SSE instruction, and you know if it either rounds or truncs the result (which I think is likely the culprit here), can't you just make sure the result is treated the same way? So: even if x87 instructions (double-extended) are used, round or trunc the result accordingly to double, to match what would be the result from SSE division, instead of trying to force SSE manipulation of the operation?

Dan Gohman [:sunfish]

Comment 43

•

10 years ago

x87 and sse are both set to round-to-nearest, so in theory we're already doing that. And they do get the same results in most cases; it's just these extreme subnormal cases like in comment #20 where we're seeing a difference.

Jesse Ruderman

Comment 44

•

9 years ago

jsfunfuzz works around this by not using compareJIT on Windows if the testcase looks like it uses division: https://github.com/MozillaSecurity/funfuzz/blob/6b21e57589c1e1236fa4f0a24a096d45077932f7/js/loopjsfunfuzz.py#L234

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 45

•

9 years ago

This still happens with Visual Studio 2015 with default configure parameters (e.g. --enable-debug --enable-more-deterministic) on 32-bit js shell builds.

Lars T Hansen [:lth]

Updated

•

8 years ago

Priority: -- → P5

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 46

•

7 years ago

With Visual Studio 2017, this seems to be resolved: autoBisect shows this is probably related to the following changeset: The first good revision is: changeset: https://hg.mozilla.org/mozilla-central/rev/21edf7e4f6ad user: Jan de Mooij date: Thu Sep 14 10:03:09 2017 +0200 summary: Bug 1399471 - Add a mechanism to check callWithABI invariants in debug builds. r=nbp Jan, is bug 1399471 a likely fix?

Flags: needinfo?(jdemooij)

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 47

•

7 years ago

fwiw, this fix was found while testing with 32-bit deterministic builds on Windows 10 compiled with MSVC 2017. Tested with the testcase in comment 3.

Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now)

Reporter

Comment 48

•

7 years ago

Jan and I spoke f2f at Austin and while it seems that his patch is not likely to be the fix, SSE-only situations are on their way out anyway, and I don't see the bug anymore with the testcases above anymore. Thus, resolving WORKSFORME and we can file issues as needed where necessary in the future.

Status: NEW → RESOLVED

Closed: 7 years ago

Flags: needinfo?(jdemooij)

Resolution: --- → WORKSFORME

C test case 10 years ago Benjamin Bouvier [:bbouvier] (inactive) (deleted), text/x-c++src		Details
deterministic-div.patch 10 years ago Dan Gohman [:sunfish] (deleted), patch	gkw : feedback-	Details \| Diff \| Splinter Review
compile failure 10 years ago Gary Kwong [:gkw] [:nth10sd] (NOT official MoCo now) (deleted), text/plain		Details