Closed
Bug 1448976
Opened 7 years ago
Closed 6 years ago
Figure out Windows ThinLTO undefined symbol issues
Categories
(Firefox Build System :: General, defect)
Tracking
(firefox63 fixed)
RESOLVED
FIXED
mozilla63
Tracking | Status | |
---|---|---|
firefox63 | --- | fixed |
People
(Reporter: away, Assigned: away)
References
Details
Attachments
(1 file, 2 obsolete files)
(deleted),
patch
|
froydnj
:
review+
|
Details | Diff | Splinter Review |
Currently in my local ThinLTO experiments I have to pass -FORCE in LDFLAGS to get around some undefined symbols in FlagRegisterer and swprintf_s.
This is very likely the same root cause as https://bugs.chromium.org/p/chromium/issues/detail?id=786127
lld-link.exe: warning: lto.tmp: undefined symbol: ?ParseCommandLineFlags@google@@YAIPEAHPEAPEAPEAD_N@Z
lld-link.exe: warning: lto.tmp: undefined symbol: ??0FlagRegisterer@google@@QEAA@PEBD000PEAX1@Z
Unsurprisingly, ignoring these errors with -FORCE is not safe. This is responsible for at least a few mochitest failures.
For example, the _sqrtf problem in comment 2 causes this code
const float std_max_bit_rate = sqrt(var_max_bitrate_kbps_ *
avg_max_bitrate_kbps_);
to turn into `call <middle_of_nowhere>`
inglorion, I see that you've got a very similar bug over in Chromium. Is there anyone looking into this on your side, and/or is there any information I could provide that might help the investigation? Thanks!
I have been looking into that bug. I was out last week but back at it today. I still haven't been able to pinpoint the root cause, and it doesn't reproduce reliably for me - sometimes, the build succeeds, and when it fails, it doesn't always fail in the same way. It also only fails on large link jobs. If you have something that reliably fails in the same way, or can be built quickly and fails, that would help me. I've started looking at it from a different angle, cross-compiling from Linux so that I can run with sanitizers. This has turned up a data race, which I fixed, but the undefined symbol failures are still there.
Thanks. FWIW I get the FlagRegisterer and _sqrtf errors very reliably, when linking xul.dll and webrtc-gtest.exe (which are both huge). I haven't had much luck reducing it so far, but I'll try to give it another attempt.
David, is there a quick way for me to reproduce the failures you're seeing? I have the Firefox source code checked out, but I'm not sure what I would do to build just one of the failing targets, with the relevant options to use Clang, LLD, and ThinLTO.
Unfortunately it's not very turnkey at the moment. (Bug 1448980 ought to make this easier one day)
https://developer.mozilla.org/en-US/docs/Mozilla/Developer_guide/Build_Instructions/Building_Firefox_on_Windows_with_clang-cl has instructions on setting clang-cl and lld-link.
Then apply this patch to enable ThinLTO: https://hg.mozilla.org/try/rev/7915a85d2ef4c31605285d9f7ef1e5e05067237d -- except take out the "-FORCE" from both files so that you stop immediately at the error.
Our build system isn't great at building a single target from scratch. Easiest route is probably just to "mach build" and wait until it gets to one of the failing binaries. I _think_ the first one to fail will be gdb-tests.exe. After that you can repeat the linking with "mach build obj-whatever/dist/bin/gdb-tests.exe".
> Our build system isn't great at building a single target from scratch.
This probably comes too late, but having just overheard some discussion in our build channel, it seems that this is possible after all:
mach build pre-export export && mach build -C . media/webrtc/trunk/gtest/target
Assignee | ||
Comment 10•6 years ago
|
||
I've been following the crbug and tried the patch at https://reviews.llvm.org/D48051 but it didn't fix the undefined symbols on our side. (I didn't expect it to; our symbol errors have always been consistently reproducible.)
Assignee | ||
Comment 11•6 years ago
|
||
Interesting, with Full-LTO, I get a different set of messages about these functions.
1:55.38 Global is external, but doesn't have external or weak linkage!
1:55.38 %"class.google::FlagRegisterer"* (%"class.google::FlagRegisterer"*, i8*, i8*, i8*, i8*, i8*, i8*)* @"??0FlagRegisterer@google@@QEAA@PEBD000PEAX1@Z"
1:55.38 Global is external, but doesn't have external or weak linkage!
1:55.38 i32 (i32*, i8***, i1)* @"?ParseCommandLineFlags@google@@YAIPEAHPEAPEAPEAD_N@Z"
1:55.47 LLVM ERROR: Broken module found, compilation aborted!
(I was pretty sure that Full-LTO builds were successful circa January, but maybe I'm misremembering)
Assignee | ||
Comment 12•6 years ago
|
||
Also, when I add a call to ParseCommandLineNonHelpFlags to test_main.cc, I get no complaints about that, even though it has identical signature to ParseCommandLineFlags and identical implementation except for the value of one boolean. (And even if I flip that boolean, still no complaint.)
Perhaps ParseCommandLineFlags has another use somewhere in the module, that is influencing its definition, where ParseCommandLineNonHelpFlags does not.
Assignee | ||
Comment 13•6 years ago
|
||
(Oh no, I had this comment sitting here for days and forgot to submit it!)
This may be a clue. I noticed this scroll past on a clang-cl + MSVC link.exe build:
15:15:48 INFO - Unified_cpp_webrtc_trunk_gtest13.obj : warning LNK4217: locally defined symbol ?AllowCommandLineReparsing@google@@YAXXZ (void __cdecl google::AllowCommandLineReparsing(void)) imported in function _main
15:15:48 INFO - Unified_cpp_webrtc_trunk_gtest13.obj : warning LNK4217: locally defined symbol ?ParseCommandLineFlags@google@@YAIPAHPAPAPAD_N@Z (unsigned int __cdecl google::ParseCommandLineFlags(int *,char * * *,bool)) imported in function _main
15:15:48 INFO - Unified_cpp_webrtc_trunk_gtest13.obj : warning LNK4217: locally defined symbol ??0FlagRegisterer@google@@QAE@PBD000PAX1@Z (public: __thiscall google::FlagRegisterer::FlagRegisterer(char const *,char const *,char const *,char const *,void *,void *)) imported in function __GLOBAL__sub_I_Unified_cpp_webrtc_trunk_gtest13.cpp
These are the same functions that give clang LTO trouble. Maybe this local use of dllimports is triggering the codepath that messes up the linkage issues in comment 11.
Assignee | ||
Comment 14•6 years ago
|
||
And with the help of that clue, I managed to reduce this to a tiny repro!
I've filed https://bugs.llvm.org/show_bug.cgi?id=38105
Assignee | ||
Comment 15•6 years ago
|
||
The patch in https://bugs.llvm.org/show_bug.cgi?id=38105 fixes all but one of our undefined symbols.
The one remaining is an undefined reference to `ldexpf` that comes from this use of `pow`, possibly by way of some #defines or inlined functions: https://searchfox.org/mozilla-central/rev/28daa2806c89684b3dfa4f0b551db1d099dda7c2/gfx/angle/checkout/src/common/mathutil.cpp#68
I'll attempt to reduce it. Leaving this open for now.
Assignee | ||
Comment 16•6 years ago
|
||
creduce is the best.
$ cat mathutil.cpp
extern "C" float powf(float, float);
int main(int argc) {
return powf(2.0f, argc);
}
$ clang-cl -O1 -c mathutil.cpp -W0 -flto=thin
$ lld-link -nodefaultlib -entry:main mathutil.obj
lld-link: error: lto.tmp: undefined symbol: ldexpf
Assignee | ||
Comment 17•6 years ago
|
||
Filed https://bugs.llvm.org/show_bug.cgi?id=38139 for the ldexpf issue.
Assignee | ||
Comment 18•6 years ago
|
||
With these two patches, our ThinLTO builds work!
Really excited that we finally got these figured out, after months of staring at those build failures.
Assignee: nobody → dmajor
Attachment #8992336 -
Flags: review?(nfroyd)
Comment 19•6 years ago
|
||
Comment on attachment 8992336 [details] [diff] [review]
Bug 1448976: Pick up two LLVM fixes for ThinLTO undefined symbols on Windows.
Review of attachment 8992336 [details] [diff] [review]:
-----------------------------------------------------------------
As cool as it is to have this figured out, is it worth getting them in ahead of them being landed in LLVM?
At the very least, we should probably address the below.
::: build/build-clang/bug38105.patch
@@ +9,5 @@
> +
> +- auto &GlobalRes = GlobalResolutions[Sym.getName()];
> ++ StringRef Name = Sym.getName();
> ++ if (Name.startswith("__imp_"))
> ++ Name = Name.substr(strlen("__imp_"));
It looks like https://reviews.llvm.org/D49138 has had some changes here to restrict this to COFF-only.
Attachment #8992336 -
Flags: review?(nfroyd)
Assignee | ||
Comment 20•6 years ago
|
||
> As cool as it is to have this figured out, is it worth getting them in ahead
> of them being landed in LLVM?
I have no assurance that the landings will occur in any particular timeframe, and it would be great to get ThinLTO enabled on some nightlies before we get too late in the cycle for big changes.
> It looks like https://reviews.llvm.org/D49138 has had some changes here to
> restrict this to COFF-only.
I didn't bother because we're only patching our Windows build of clang, but OK, sure.
Assignee | ||
Comment 21•6 years ago
|
||
Picked up the updated patch with the COFF restriction.
Attachment #8992336 -
Attachment is obsolete: true
Attachment #8992357 -
Flags: review?(nfroyd)
Assignee | ||
Comment 22•6 years ago
|
||
Comment on attachment 8992357 [details] [diff] [review]
Bug 1448976: Pick up two LLVM fixes for ThinLTO undefined symbols on Windows.
Or not! Monday morning...
Attachment #8992357 -
Attachment is obsolete: true
Attachment #8992357 -
Flags: review?(nfroyd)
Assignee | ||
Comment 23•6 years ago
|
||
Ok, now really with the COFF update
Attachment #8992359 -
Flags: review?(nfroyd)
Updated•6 years ago
|
Attachment #8992359 -
Flags: review?(nfroyd) → review+
Comment 24•6 years ago
|
||
Pushed by dmajor@mozilla.com:
https://hg.mozilla.org/integration/mozilla-inbound/rev/530bee63a63c
Pick up two LLVM fixes for ThinLTO undefined symbols on Windows. r=froydnj
Comment 25•6 years ago
|
||
bugherder |
Status: NEW → RESOLVED
Closed: 6 years ago
status-firefox63:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Updated•6 years ago
|
Version: Version 3 → 3 Branch
You need to log in
before you can comment on or make changes to this bug.
Description
•