Closed Bug 1458109 Opened 7 years ago Closed 6 years ago

lld-link seems to have an over-aggressive link-time inlining even for debug build

Categories

(Firefox Build System :: General, enhancement)

3 Branch
enhancement
Not set
normal

Tracking

(firefox63 fixed)

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: xidorn, Assigned: away)

References

(Blocks 1 open bug)

Details

Attachments

(1 file)

I'm currently using lld-link for its speed of linking (but I'm still using cl rather than clang-cl due to the SDK troublesome). One problem I just reveal is that, lld-link seems to have a quite aggressive LTO for debug build, which may affect some of functions for debugging. Specifically, in a build with lld-link, I cannot invoke nsIFrame::DumpFrameTree(). MSVC complains > Function nsIFrame::DumpFrameTree has no address, possibly due to compiler optimizations. and also I cannot use DumpJSStack(), which MSVC complains > identifier "DumpJSStack" is undefined I suspect they are related to LTO in lld-link, since I didn't have this problem with link itself before. Note that my mozconfig has both --enable-debug and --enable-optimize (because I want debug assertions but non-opt build runs too slow). If it's indeed related to LTO, I guess we probably want to disable that when doing debug build.
lld can't do lto if the compiler doesn't output lto-able objects for it, which msvc clearly doesn't do. So your actual problem is that *msvc* is doing the inlining.
(or lld is not creating symbols for those functions)
Hmmm... I don't think I have this problem before... I'm currently doing a build with link.exe and I'll see whether that would work. Another possibility is that the debug-related options required by sccache don't work well for debugging. I switched to lld-link mostly because after using sccache, link time seems to be much longer than before, likely because of the option to disable separate pdb files.
Indeed this problem is because of lld-link. I can use both DumpFrameTree() and DumpJSStack() just with switching back to link.exe, and it also seems that MSVC works better with binary from link.exe than lld-link, in the sense that it doesn't need to evaluate value when trapped on break point, and local values are tracked correctly during debugging.
If you're using cl.exe to compile, then lld is not doing LTO and it's not even doing any optimization. More likely this is a symbols problem. What is your SVN revision? (You'll need to run `clang-cl --version` because `lld --version` is pretty terse.)
It's not clear to me which one the build system is using. I have a LLVM installed in PATH, and one from mach bootstrap. The local installed one is > clang version 6.0.0 (tags/RELEASE_600/final) and the one from mach bootstrap is > clang version 7.0.0 (trunk 328769)
In my mozconfig, I just specify > export LINKER="lld-link" and everything seems to just work, so I didn't dig deeper into which is being used.
Given the path searching code in toolchain.configure[1] I suspect it uses the installed one. I can probably try removing it from the path and see if the one in ~/.mozbuild works better. [1] https://searchfox.org/mozilla-central/rev/8837610b6c999451435695e800f38d4acbc0a644/build/moz.configure/toolchain.configure#639-654
You can probably dig the full path out of config.status or something, but a quick-and-dirty test would be to temporarily rename your 6.0.0 binary and see what happens.
In my objdir/config.status I see: 'LINKER': 'c:/Users/vm/.mozbuild/clang/bin/lld-link.exe',
So yes, it was using the installed lld-link rather than the one in .mozbuild. I'm trying to use the mozbuild linker and see if that helps.
I tried to explicitly set linker to the one under .mozbuild, and I can confirm that config.status now shows that one. But nothing mentioned in comment 0 gets improved. DumpFrameTree and DumpJSStack still don't work, and it still takes Visual Studio much time for "Evaluating local variables" when trapping at a breakpoint.
Per discussion with dmajor, I tried to build with clang-cl + lld-link (rather than cl + lld-link) to test whether this is still an issue. And unfortunately, as far as I can see, this is still an issue with that configuration. I still cannot invoke DumpJSStack or DumpFrameTree from Visual Studio. I'm getting the same message as what I mentioned in comment 0. My mozconfig for clang is the following: > ac_add_options --enable-optimize > ac_add_options --enable-debug > > ac_add_options --target=x86_64-pc-mingw32 > ac_add_options --host=x86_64-pc-mingw32 > > export CC="C:/Users/upsuper/.mozbuild/clang/bin/clang-cl.exe" > export CXX="C:/Users/upsuper/.mozbuild/clang/bin/clang-cl.exe" > export LINKER="C:/Users/upsuper/.mozbuild/clang/bin/lld-link.exe" > export WINDOWSSDKDIR="C:/winsdk-for-clang" > > mk_add_options MOZ_OBJDIR=@TOPSRCDIR@/obj-firefox-clang The corresponding clang version is: > clang version 7.0.0 (trunk 328769) > Target: x86_64-pc-windows-msvc > Thread model: posix > InstalledDir: c:\Users\upsuper\.mozbuild\clang\bin
(The good side is that, it seems variable watching is no longer an issue with clang-cl + lld-link.)
For the issue around invoking debug functions in immediate window: r328769 with clang-cl works. r328769 with lld-link doesn't work. r328769 with clang-cl and lld-link doesn't work. r332095 with clang-cl works. r332095 with lld-link doesn't work. r332095 with clang-cl and lld-link doesn't work. So it seems lld-link still has problem, even with updated version. In addition to that, cl with lld-link seems to be especially slow to debug (stepping takes long time to proceed), and clang-cl with lld-link is faster but still slower than just clang-cl with link. I suspect lld-link doesn't generate all debug information VS wants.
Flags: needinfo?(xidorn+moz)
Thank you for the thorough testing. I agree this points to a problem with lld-link. We'll need to file this over at bugs.llvm.org. I can do this for you if you don't have an account. To help the LLVM developers avoid dealing with multi-hundred-megabyte xul.pdb files, could you try looking around for a similarly broken function in some smaller binary, maybe firefox.exe or mozglue.dll?
It seems I don't have account at bugs.llvm.org. I'm wondering whether it is possible to have some small reproducible example code directly... DumpJSStack seems to be > extern "C" __declspec(dllexport) void DumpJSStack() { ... } and its file nsXPConnect.cpp is in unified source, so it's likely wrapped in an anonymous namespace. So probably something like that and wrap in "namespace {}" is enough to reproduce. I suppose anything with dllexport shouldn't normally be removed, and something with |extern "C"| should exist in the global namespace. DumpFrameTree is a virtual function of nsIFrame, which has only one implementation and one callsite. This is pretty likely just a victim of some kind of (simple) link-time optimization...
Actually, I built a small program trying to reproduce the issues mentioned in comment 0, but failed to. It seems that I can invoke both an extern-C-dllexport function and a virtual function from the debugger. So maybe it is a matter of linker flags being used? I have /debug in the linker flags (otherwise I cannot even put a breakpoint anywhere even with pdb file).
This build seems to be fine for me in WinDBG: https://treeherder.mozilla.org/#/jobs?repo=try&revision=7a47fe24a223e356cda07b70cdb2d46d59ebb85e&filter-tier=3&filter-tier=1&filter-tier=2&selectedJob=180978688 0:000> x xul!*dumpframetree 00000001`838d2660 xul!nsIFrame::DumpFrameTree (void) 0:000> x xul!dumpjsstack 00000001`80d3a730 xul!DumpJSStack (<no parameter info>) Does it work for you in MSVC?
Flags: needinfo?(xidorn+moz)
No, it doesn't. VS reports the same error as comment 0. I grepped .pdb file I downloaded, and I can see DumpJSStack and nsIFrame::DumpFrameTree in it, but VS insists that the former is undefined and the latter doesn't have address.
Flags: needinfo?(xidorn+moz)
On IRC, Xidorn tried a few more functions, and it seems that no functions in xul.dll are callable, but some in mozglue.dll are ok. I filed this as https://bugs.llvm.org/show_bug.cgi?id=37634.
Hmmm... so actually I found at least one member function callable in mozglue.dll, which is TimeStamp::Now (others don't seem to be callable still). There are several non-member functions in WindowsDllBlocklist.cpp are callable like DllBlocklist_Initialize and ShouldBlockThread.
I restarted Visual Studio, and now the address of TimeStamp::Now is lost, and I can no longer call it anymore. Probably it became callable via some other mechanism when I was trying other stuff or so...
> I restarted Visual Studio, and now the address of TimeStamp::Now is lost, Yikes, that doesn't sound very reassuring.
Xidorn, can you please create an account on llvm's bugzilla and reply to the questions at https://bugs.llvm.org/show_bug.cgi?id=37634#c13 ? I don't really want to be the middleman for this.
Flags: needinfo?(xidorn+moz)
Sent the email to request account.
Flags: needinfo?(xidorn+moz)
Does the proposed fix in the LLVM bug address the issue for you?
Flags: needinfo?(xidorn+moz)
Do you have a packed toolchain I can test with?
Flags: needinfo?(xidorn+moz) → needinfo?(dmajor)
The try link is in that bug: https://treeherder.mozilla.org/#/jobs?repo=try&revision=4ea181f6e52c9b07d78ebacf89e594951f7a3b2c I figured you would just want to test that Firefox build, but if you specifically want the toolchain, you can get it from `TW64(clang-cl)` under the `Toolchains` line.
Flags: needinfo?(dmajor) → needinfo?(xidorn+moz)
Yeah I just need the build. I'll test tomorrow.
No, that build solves neither of the problems, unfortunately. I'm still seeing the same result as comment 0.
Flags: needinfo?(xidorn+moz)
This should fix the symbols bug https://bugs.llvm.org/show_bug.cgi?id=37634 and also picks up a ccov fix in https://reviews.llvm.org/D48538. Xidorn, it's looking very much like this is fixed now, but can you please check once more? https://treeherder.mozilla.org/#/jobs?repo=try&revision=a4642cbf4a417e7e6a37a489cbd4b719245070dd
Assignee: nobody → dmajor
Attachment #8990423 - Flags: review?(core-build-config-reviews)
Attachment #8990423 - Flags: feedback?(xidorn+moz)
Comment on attachment 8990423 [details] [diff] [review] Update to LLVM r336407 on Windows That try build works! Thanks!
Attachment #8990423 - Flags: feedback?(xidorn+moz) → feedback+
Attachment #8990423 - Flags: review?(core-build-config-reviews) → review+
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Version: Version 3 → 3 Branch
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: