1729459 - Floating-Point Normalization breaks build on 32bit Linux

sanketh

Reporter

Description

•

3 years ago

See crash log at https://bugzilla.mozilla.org/show_bug.cgi?id=531915#c44

sanketh

Reporter

Comment 1

•

3 years ago

(In reply to Tooru Fujisawa [:arai] from comment #48 on Bug 531915)

given that what we want to do there is just use double, regardless of system's definition (double or long double),
I think we'd better workaround the conflict by not using the double_t or __double_t identifiers.

We can replace double_t and __double_t to some other names, or maybe just use double itself.

I don't think that would resolve the issue. In math libraries, for uniformity we want to use 64bit doubles on all systems---on some systems a double might be 32bit---which we why we define a separate double_t.

On Windows, a double is always 64bit so we don't have an issue. However, that seems to be not true on the specified linux as per the crashlog because the math library in glibc is defining double_t to be long double.

We could check if double_t is defined and use that but, from a quick google search, that doesn't seem easy to do in cpp.

Flags: needinfo?(tom)

Flags: needinfo?(arai.unmht)

Tom Ritter [:tjr] (Back 9/5)

Comment 2

•

3 years ago

AFAICT this isn't a runtime crash it's a build break

Summary: Floating-Point Normalization Crashes on 32bit Linux → Floating-Point Normalization breaks build on 32bit Linux

Tom Ritter [:tjr] (Back 9/5)

Comment 3

•

3 years ago

So just to confirm, this didn't break all Firefox x86 builds; we're still green on mozilla-central. Those builds are called 'Linux' but don't have an explicit x86 or 32-bit id in their name. So it's only affecting a subset of Linux builds.

Tooru Fujisawa [:arai]

Comment 4

•

3 years ago

Just to make sure, does sizeof(double) == 4 happen also on C++ ?
the original code is C, but we're compiling it as C++.

Flags: needinfo?(arai.unmht) → needinfo?(me)

sanketh

Reporter

Comment 5

•

3 years ago

One potential solution adapted from your original idea of just using the system supplied double, that sidesteps this issue, is to not define double_t for 32-bit Linux and only define it for other platforms. Since glibc is guaranteed to define double_t (cf. these lines in math.h) it should, in theory, work (and it shouldn't break any existing behaviour on other platforms).

I am thinking something like

#if defined(__linux__) && defined(__i386__)
// do nothing, rely on glibc's double_t
#else
// keep the original code
#endif

Flags: needinfo?(me)

Mike Hommey [:glandium]

Comment 6

•

3 years ago

(In reply to Tom Ritter [:tjr] (ni? for response to CVE/sec-approval/advisories/etc) from comment #3)

So just to confirm, this didn't break all Firefox x86 builds; we're still green on mozilla-central.

mozilla-central builds using an old glibc.

André Bargull [:anba]

Comment 7

•

3 years ago

(In reply to sanketh from comment #1)

I don't think that would resolve the issue. In math libraries, for uniformity we want to use 64bit doubles on all systems---on some systems a double might be 32bit---which we why we define a separate double_t.

There is some misunderstanding here, double is always 64-bit, whereas long double is at least 64-bit. See also https://en.cppreference.com/w/cpp/language/types#Floating-point_types and https://en.cppreference.com/w/cpp/types/climits/FLT_EVAL_METHOD.

BugBot [:suhaib / :marco/ :calixte]

Updated

•

3 years ago

Keywords: regression

BugBot [:suhaib / :marco/ :calixte]

Comment 8

•

3 years ago

Set release status flags based on info from the regressing bug 531915

status-firefox92: --- → unaffected

status-firefox93: --- → affected

status-firefox94: --- → affected

status-firefox-esr78: --- → unaffected

status-firefox-esr91: --- → affected

sanketh

Reporter

Comment 9

•

3 years ago

(In reply to André Bargull [:anba] from comment #7)

(In reply to sanketh from comment #1)

I don't think that would resolve the issue. In math libraries, for uniformity we want to use 64bit doubles on all systems---on some systems a double might be 32bit---which we why we define a separate double_t.

There is some misunderstanding here, double is always 64-bit, whereas long double is at least 64-bit. See also https://en.cppreference.com/w/cpp/language/types#Floating-point_types and https://en.cppreference.com/w/cpp/types/climits/FLT_EVAL_METHOD.

Hmm, interesting, why glibc is doing then. 🤔

On 32-bit x86, unless explicitly overridden by someone setting __FLT_EVAL_METHOD__, it sets __GLIBC_FLT_EVAL_METHOD=2 and that setting causes double_t = long double in math.h.

I am really confused now. I don't have access to a 32-bit linux distro right now but I'd be curious to know what sizeof(double) and sizeof(long double) return. (fwiw, on 32-bit and 64-bit Windows, they both return 8, and on 64-bit linux sizeof(double) = 8 and sizeof(long double) = 16 (I got these numbers from the Compiler Explorer)).

André Bargull [:anba]

Comment 10

•

3 years ago

sizeof(double) is always 8. (We don't support any exotic platform where sizeof(double) is anything other than 8.)

sizeof(long double) with be most likely:

Windows (any platform): 8
x86: 12
x86-64: 16
ARM32: 8
ARM64: 16

Looking at our custom patches, https://searchfox.org/mozilla-central/source/modules/fdlibm/patches/20_emulate_freebsd_internal_double_types.patch looks suspicious:

Defining __double_t makes this code UB, cf. https://en.cppreference.com/w/cpp/language/identifiers#In_declarations.
Redefining double_t from <math.h> is also bad, cf. https://en.cppreference.com/w/cpp/header/cmath#Types.

Furthermore the patch links to FreeBSD, but FreeBSD actually differentiates between different platforms, whereas our code simply takes the definitions for __LP64__:

sanketh

Reporter

Comment 11

•

3 years ago

(In reply to André Bargull [:anba] from comment #10)

sizeof(double) is always 8. (We don't support any exotic platform where sizeof(double) is anything other than 8.)

sizeof(long double) with be most likely:

Windows (any platform): 8

x86: 12

x86-64: 16

ARM32: 8

ARM64: 16

Cool, thank you so much for looking into this. In that case, I guess we can continue with arai's suggestion of using double everywhere and not defining the freebsd internal types, and it would save us a lot of pain in the future.

Looking at our custom patches, https://searchfox.org/mozilla-central/source/modules/fdlibm/patches/20_emulate_freebsd_internal_double_types.patch looks suspicious:

Defining __double_t makes this code UB, cf. https://en.cppreference.com/w/cpp/language/identifiers#In_declarations.

In retrospect, that was a bad idea, at the time, I was just thinking about keeping the patch size small and it didn't hit me that we shouldn't be defining identifiers that start with double-underscore.

Redefining double_t from <math.h> is also bad, cf. https://en.cppreference.com/w/cpp/header/cmath#Types.

That's fair.

Furthermore the patch links to FreeBSD, but FreeBSD actually differentiates between different platforms, whereas our code simply takes the definitions for __LP64__:

https://github.com/freebsd/freebsd-src/blob/9bb8a4091c4f63dacda5108f4f994f7f6b35bbf1/sys/x86/include/_types.h#L58-L61

https://github.com/freebsd/freebsd-src/blob/9bb8a4091c4f63dacda5108f4f994f7f6b35bbf1/sys/x86/include/_types.h#L65-L68

https://github.com/freebsd/freebsd-src/blob/9bb8a4091c4f63dacda5108f4f994f7f6b35bbf1/sys/arm64/include/_types.h#L47-L50

https://github.com/freebsd/freebsd-src/blob/9bb8a4091c4f63dacda5108f4f994f7f6b35bbf1/sys/arm/include/_types.h#L53-L56

Originally, the patch was supposed to differentiate between platforms, but it started to get unwieldy once we got Windows into the mix (since we needed to differentiate between windows 32-bit and non-windows 32-bit), we would have a bunch of cases and in all cases except one (linux 32-bit) it was the same outcome (double_t = double), and the simple version (the current one) passed all the try tests so it seemed like it was working.

Jan Honza Odvarko [:Honza] (always need-info? me)

Updated

•

3 years ago

status-firefox93: affected → wontfix

Christoph Moench-Tegeder

Comment 12

•

3 years ago

Attached patch fix build on 32bit (at least for FreeBSD) (deleted) — Details — Splinter Review

I suggest something along these lines: that brings the double_t-related typedefs in line with FreeBSD's definitions with/out LP_64 https://cgit.freebsd.org/src/tree/sys/x86/include/_types.h#n55 and basically reverts modules/fdlibm/patches/19_remove_unneeded_round_to_integer_helpers.patch as on 32bit these helpers are not as unused as they were assumed to be.

Ryan VanderMeulen [:RyanVM]

Updated

•

3 years ago

status-firefox94: affected → wontfix

status-firefox95: --- → fix-optional

status-firefox-esr91: affected → fix-optional

acmondor

Comment 14

•

3 years ago

This bug affects 32bit (x86) builds on Gentoo as well. That is documented in https://bugs.gentoo.org/816975

I tried Christoph's patch but it would not work as is. The following new issues were introduced:

8 0:48.08 In file included from /var/tmp/portage/www-client/firefox-93.0/work/firefox-93.0/modules/fdlibm/src/e_acos.cpp:44:
8 0:48.08 /var/tmp/portage/www-client/firefox-93.0/work/firefox-93.0/modules/fdlibm/src/math_private.h:638:8: error: unknown type name '__float_t'; did you mean 'float_t'?
8 0:48.08 rnintf(__float_t x)
8 0:48.08 ^~~~~~~~~
8 0:48.09 float_t
8 0:48.09 /usr/include/math.h:155:21: note: 'float_t' declared here
8 0:48.09 typedef long double float_t;
8 0:48.09 ^
8 0:48.09 In file included from /var/tmp/portage/www-client/firefox-93.0/work/firefox-93.0/modules/fdlibm/src/e_acos.cpp:44:
8 0:48.10 /var/tmp/portage/www-client/firefox-93.0/work/firefox-93.0/modules/fdlibm/src/math_private.h:662:14: error: exponent has no digits
8 0:48.10 return (x + __CONCAT(0x1.8p, LDBL_MANT_DIG) / 2 -
8 0:48.10 ^
8 0:48.10 /usr/include/sys/cdefs.h:109:23: note: expanded from macro '__CONCAT'
8 0:48.11 #define __CONCAT(x,y) x ## y
8 0:48.11 ^
8 0:48.11 <scratch space>:283:6: note: expanded from here
8 0:48.11 0x1.8pLDBL_MANT_DIG
8 0:48.12 ^
8 0:48.12 In file included from /var/tmp/portage/www-client/firefox-93.0/work/firefox-93.0/modules/fdlibm/src/e_acos.cpp:44:
8 0:48.12 /var/tmp/portage/www-client/firefox-93.0/work/firefox-93.0/modules/fdlibm/src/math_private.h:663:3: error: exponent has no digits
8 0:48.13 __CONCAT(0x1.8p, LDBL_MANT_DIG) / 2);
8 0:48.13 ^
8 0:48.13 /usr/include/sys/cdefs.h:109:23: note: expanded from macro '__CONCAT'
8 0:48.13 #define __CONCAT(x,y) x ## y
8 0:48.13 ^
8 0:48.14 <scratch space>:284:6: note: expanded from here
8 0:48.14 0x1.8pLDBL_MANT_DIG
8 0:48.14 ^
8 0:48.15 3 errors generated.

I managed to modify Christoph's patch to resolve the above issues (see attachment), although my changes may not be the best.

acmondor

Comment 15

•

3 years ago

Attached patch firefox-93.0_x86_math_private.patch (deleted) — Details — Splinter Review

Working patch for 32bit (x86) builds on Gentoo. My changes very likely can be improved upon, but they do work.

Carsten Schoenert

Comment 16

•

3 years ago

This patch does not work on arm64, ppc64el and s390x and probably also not on mips64el.
I only tested a build for i386 and amd64 locally so I didn't noticed early enough the breakage on the other architectures before I did the upload.

https://buildd.debian.org/status/logs.php?pkg=thunderbird&ver=1%3A91.2.1-1

acmondor

Comment 17

•

3 years ago

Carsten, Which patch are you referring too? Christoph's patch or my modified version of that?

I looked at all the build logs for the "Maybe-Failed" builds and they all fail the same way, everyone of them is complaining about the "#define __CONCAT(x,y) ...." like what I saw for x86 on Gentoo (see above). That suggests you didn't actually use my version of the patch, or LDBL_MANT_DIG is not 64 like it is for x86 on Gentoo. Either way, it seems the macro expansion is done differently by the compiler/pre-processor on freebsd then the systems we tried to patch.

The compiler/pre-processor on freebsd seems to replace LDBL_MANT_DIG with its actual value before processing the __CONCAT() macro, whereas the systems we tried it on do not. That is, if LDBL_MANT_DIG is 64, __CONCAT(0x1.8p, LDBL_MANT_DIG) will get properly expanded to 0x1.8p64 on freebsd (or so it seems) and on the systems we used it gets expanded to 0x1.8pLDBL_MANT_DIG, which is invalid. From some test code I coubled together the behavior was the same for both gcc/g++ and clang/clang++. I was not able to resolve this macro expansion issue, that's why my version of the patch used the following ugly hack to avoid the problem, but it only works when LDBL_MANT_DIG is 64:

#if (LDBL_MANT_DIG == 64)
return (x + __CONCAT(0x1.8p, 64) / 2 -
__CONCAT(0x1.8p, 64) / 2);
#else
return (x + __CONCAT(0x1.8p, LDBL_MANT_DIG) / 2 -
__CONCAT(0x1.8p, LDBL_MANT_DIG) / 2);
#endif

Carsten Schoenert

Comment 18

•

3 years ago

I mean and used your patch (modified) patch.

I've tried to play a bit around with the pre-processor variables, but I'm lost in the pool of possible options here. I'm to stupid to dig further into this, it's to long ago I've done some C or C++ hacking.

pp

Comment 19

•

3 years ago

I'm curious... how does Mozilla themselves manage to build the x86 version available for download below?

https://ftp.mozilla.org/pub/firefox/releases/94.0.1/linux-i686/en-US/firefox-94.0.1.tar.bz2

Mike Hommey [:glandium]

Comment 20

•

3 years ago

(In reply to pp from comment #19)

I'm curious... how does Mozilla themselves manage to build the x86 version available for download below?

https://ftp.mozilla.org/pub/firefox/releases/94.0.1/linux-i686/en-US/firefox-94.0.1.tar.bz2

It's built against a relatively old glibc that either doesn't define double_t or defines it to that same type (probably the former).

acmondor

Comment 21

•

3 years ago

Attached patch firefox-93.0_x86_math_private_20211110.patch (deleted) — Details — Splinter Review

This new patch will resolve the __CONCAT() issue encountered by Carsten on Debian, which is the same as I previously encountered with x86 on Gentoo (see comments #14 and #16). The original workaround I had for that issue was specific to x86 and was ugly. That was due to a lack of understanding of the preprocessor macros and specifically the __CONCAT() macro.

It turns out that __CONCAT() is defined differently on FreeBSD and non-FreeBSD targets. On FreeBSD __CONCAT() is defined as:

#define __CONCAT1(x,y) x ## y
#define __CONCAT(x,y) __CONCAT1(x,y)

On non-FreeBSD targets it is defined as:

#define __CONCAT(x,y) x ## y

That multi-level implementation on FreeBSD matters in this case due to the hexadecimal floating-point values being created and without it the macro does not get expanded properly as we saw in the build failures.

The new patch uses a new macro to wrap __CONCAT() and ensure the result is always expanded properly.

See the following for more details:

https://gcc.gnu.org/onlinedocs/gcc-3.0.1/cpp_3.html#SEC32
https://sources.debian.org/src/glibc/2.32-3/misc/sys/cdefs.h/
https://github.com/freebsd/freebsd-src/blob/main/sys/sys/cdefs.h

Carsten Schoenert

Comment 22

•

3 years ago

I can confirm that this patch fixes at least the compilation issues I had previously for 91.2.0 on Debian, I've added the above patch to the build chain for 91.3.2 in Debian unstable and all critical platform have build successfully binary packages.

https://buildd.debian.org/status/logs.php?pkg=thunderbird&ver=1%3A91.3.2-1

Thanks for working on the patch!

Mike Hommey [:glandium]

Comment 23

•

3 years ago

So here's another funny fact: __FTL_EVAL_METHOD__ is set by the compiler. GCC sets it to 2 on x86, but clang sets it to 0 (it's actually more complicated, clang gives different values on netbsd). Which means depending on the compiler, double_t can be different. So I would actually expect that last patch to still be failing in some situations.

Mike Hommey [:glandium]

Comment 24

•

3 years ago

So I would actually expect that last patch to still be failing in some situations.

Confirmed: the patch breaks on Mozilla CI.

Mike Hommey [:glandium]

Comment 25

•

3 years ago

The following makes things build on x86 for me with both GCC and clang, but the test added in bug 531915 doesn't pass when compiling with GCC (but does when compiling with clang)

diff --git a/modules/fdlibm/src/math_private.h b/modules/fdlibm/src/math_private.h
index 51d79f9c2ec59..fafd7d6fc1e0d 100644
--- a/modules/fdlibm/src/math_private.h
+++ b/modules/fdlibm/src/math_private.h
@@ -30,7 +30,11 @@
  * Adapted from https://github.com/freebsd/freebsd-src/search?q=__double_t
  */
 
+#if defined __FLT_EVAL_METHOD__ && (__FLT_EVAL_METHOD__ == 2)
+typedef long double      __double_t;
+#else
 typedef double      __double_t;
+#endif
 typedef __double_t  double_t;
 
 /*

Mike Hommey [:glandium]

Comment 26

•

3 years ago

(In reply to Mike Hommey [:glandium] from comment #25)

The following makes things build on x86 for me with both GCC and clang, but the test added in bug 531915 doesn't pass when compiling with GCC (but does when compiling with clang)

The same applies to the patch from comment 21 adjusted with the patch from comment 25.

Matthew Gaudet (he/him) [:mgaudet]

Updated

•

3 years ago

Blocks: sm-meta

Severity: -- → S4

Priority: -- → P3

BMO Automation

Updated

•

3 years ago

Has Regression Range: --- → yes

pierrot.labastie

Comment 27

•

2 years ago

What about setting __FLT_EVAL_METHOD__ to 0 before including math.h? I'm not sure all compilers would accept it, but it would always set double_t the same type as double...
At least gcc and clang accept:

#undef __FLT_EVAL_METHOD__
#define __FLT_EVAL_METHOD__ 0
#include <math.h>

But other compilers may behave differently

André Bargull [:anba]

Updated

•

2 years ago

Duplicate of this bug: 1802067

John Hein

Comment 30

•

2 years ago

(In reply to Mike Hommey [:glandium] from comment #25)

The following makes things build on x86 for me with both GCC and clang, but the test added in bug 531915 doesn't pass when compiling with GCC (but does when compiling with clang)

.
.

+#if defined __FLT_EVAL_METHOD__ && (__FLT_EVAL_METHOD__ == 2)
+typedef long double      __double_t;
+#else
 typedef double      __double_t;
+#endif
 typedef __double_t  double_t;

FWIW, FreeBSD does defines FLT_EVAL_METHOD differently (regardless of gcc or clang)... In /usr/include/x86/float.h:

#ifdef __LP64__
#define FLT_EVAL_METHOD 0               /* no promotions */
#else
#define FLT_EVAL_METHOD (-1)            /* i387 semantics are...interesting */
#endif

But for the builtin __FLT_EVAL_METHOD__, it is 2 on i386 and 0 on amd64 (again, regardless of gcc or clang - at least on FreeBSD 12.x). So the patch based on __FLT_EVAL_METHOD__ would get past the i386 compiling failure (without local patch to fix it) on FreeBSD.

It would be interesting to understand more details on the test that is failing (described in comment 25). I couldn't tell how to [try to] reproduce that from reading bug 531915.

fix build on 32bit (at least for FreeBSD) 3 years ago Christoph Moench-Tegeder (deleted), patch		Details \| Diff \| Splinter Review
firefox-93.0_x86_math_private.patch 3 years ago acmondor (deleted), patch		Details \| Diff \| Splinter Review
firefox-93.0_x86_math_private_20211110.patch 3 years ago acmondor (deleted), patch		Details \| Diff \| Splinter Review