Closed Bug 1423822 Opened 7 years ago Closed 6 years ago

elfhack doesn't work with lld-linked binaries

Categories

(Firefox Build System :: General, defect)

defect
Not set
normal

Tracking

(firefox63 fixed)

RESOLVED FIXED
mozilla63
Tracking Status
firefox63 --- fixed

People

(Reporter: glandium, Assigned: glandium)

References

(Blocks 1 open bug)

Details

Attachments

(7 files)

+++ This bug was initially created as a clone of Bug #1423821 +++ [This is where we should actually fix the issue somehow] (Found while trying to make elfhack -r work with lld-linked binaries) STR: - Take one of the .so files attached to bug 1385783 - Run elfhack on it Observe the ELF program headers before and after: Before: PHDR 0x000040 0x0000000000000040 0x0000000000000040 0x000230 0x000230 R 0x8 LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x0042e8 0x0042e8 R 0x1000 LOAD 0x005000 0x0000000000005000 0x0000000000005000 0x003040 0x003040 R E 0x1000 LOAD 0x009000 0x0000000000009000 0x0000000000009000 0x0021c8 0x003010 RW 0x1000 TLS 0x00a3a0 0x000000000000a3a0 0x000000000000a3a0 0x000000 0x001010 R 0x10 DYNAMIC 0x00b010 0x000000000000b010 0x000000000000b010 0x0001a0 0x0001a0 RW 0x8 GNU_RELRO 0x00b000 0x000000000000b000 0x000000000000b000 0x0001c8 0x001000 R 0x1 GNU_EH_FRAME 0x004240 0x0000000000004240 0x0000000000004240 0x000024 0x000024 R 0x1 GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0 NOTE 0x00046e 0x000000000000046e 0x000000000000046e 0x000018 0x000018 R 0x1 After: PHDR 0x000040 0x0000000000000040 0x0000000000000040 0x000268 0x000268 R 0x8 LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x000810 0x000810 R 0x1000 LOAD 0x001240 0x0000000000004240 0x0000000000004240 0x0000a8 0x0000a8 R 0x1000 LOAD 0x0012e8 0x00000000000042e8 0x00000000000042e8 0x003d58 0x003d58 R E 0x1000 LOAD 0x006000 0x0000000000009000 0x0000000000009000 0x0021c8 0x003010 RW 0x1000 TLS 0x0073a0 0x000000000000a3a0 0x000000000000a3a0 0x000000 0x001010 R 0x10 DYNAMIC 0x008010 0x000000000000b010 0x000000000000b010 0x0001a0 0x0001a0 RW 0x8 GNU_RELRO 0x008000 0x000000000000b000 0x000000000000b000 0x0001c8 0x001000 R 0x1 GNU_EH_FRAME 0x001240 0x0000000000004240 0x0000000000004240 0x000024 0x000024 R 0x1 GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0 LOPROC+0x36968 0x6f00747365740061 0x676e697274730066 0x6600796172726100 0x770065737500726f 0x68666c6500687469 0x73006f74006b6361 The problem is that elfhack is adding a program header item, growing the program header by 56 bytes, but the data section that directly follows that is .rodata, which can't have its address moved (lld is being too smart for its own good).
Ideas: - Move the program headers to some location where there's space. This could possibly break things that assume that mapping the first page of the binary allows to read the program headers (as a matter of fact, IIRC our custom dynamic linker does this assumption) - Move the non-relocatable data that follows the program headers by one page, creating a one-page gap, and adjust the PT_LOAD to load it at the right address. - Somehow take advantage of the fact that .rel.dyn and .text are not in the same PT_LOAD, and place the elfhack sections in a way that doesn't require splitting further. The current problem is that elfhack assumes it can't move the eh_frame data address (which /might/ be wrong) and since it places the elfhack data section before that, it needs to split the PT_LOAD to allow the binary to shrink in size.
Product: Core → Firefox Build System
Blocks: 1457482
I do like the option of fixing lld rather than moving to another linker. I'm not sure it's easier than modifying elfhack, but I'll start somewhere. Using readelf -l, on my local machine my linker is 'GNU ld (GNU Binutils for Ubuntu) 2.30' (bfd) and produces the following for test-array.so > Type Offset VirtAddr PhysAddr > FileSiz MemSiz Flags Align > LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000 > 0x000000000000af10 0x000000000000af10 R E 0x200000 > LOAD 0x000000000000be40 0x000000000020be40 0x000000000020be40 > 0x00000000000015b0 0x00000000000015b8 RW 0x200000 > DYNAMIC 0x000000000000be50 0x000000000020be50 0x000000000020be50 > 0x0000000000000190 0x0000000000000190 RW 0x8 > NOTE 0x0000000000000200 0x0000000000000200 0x0000000000000200 > 0x0000000000000024 0x0000000000000024 R 0x4 > TLS 0x000000000000be40 0x000000000020be40 0x000000000020be40 > 0x0000000000000000 0x0000000000001010 R 0x10 > GNU_EH_FRAME 0x000000000000ae40 0x000000000000ae40 0x000000000000ae40 > 0x000000000000002c 0x000000000000002c R 0x4 > GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000 > 0x0000000000000000 0x0000000000000000 RW 0x10 > GNU_RELRO 0x000000000000be40 0x000000000020be40 0x000000000020be40 > 0x00000000000001c0 0x00000000000001c0 R 0x1 > > Section to Segment mapping: > Segment Sections... > 00 .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .plt .text .rodata .eh_frame_hdr .eh_frame > 01 .init_array .dynamic .got .got.plt .data .bss > 02 .dynamic > 03 .note.gnu.build-id > 04 .tbss > 05 .eh_frame_hdr > 06 > 07 .init_array .dynamic .got The lld-produced test-array.so from Bug 1385783 (and a local one I just generated) looks like this: > Type Offset VirtAddr PhysAddr > FileSiz MemSiz Flags Align > PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040 > 0x0000000000000230 0x0000000000000230 R 0x8 > LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000 > 0x00000000000042e8 0x00000000000042e8 R 0x1000 > LOAD 0x0000000000005000 0x0000000000005000 0x0000000000005000 > 0x0000000000003040 0x0000000000003040 R E 0x1000 > LOAD 0x0000000000009000 0x0000000000009000 0x0000000000009000 > 0x00000000000021c8 0x0000000000003010 RW 0x1000 > TLS 0x000000000000a3a0 0x000000000000a3a0 0x000000000000a3a0 > 0x0000000000000000 0x0000000000001010 R 0x10 > DYNAMIC 0x000000000000b010 0x000000000000b010 0x000000000000b010 > 0x00000000000001a0 0x00000000000001a0 RW 0x8 > GNU_RELRO 0x000000000000b000 0x000000000000b000 0x000000000000b000 > 0x00000000000001c8 0x0000000000001000 R 0x1 > GNU_EH_FRAME 0x0000000000004240 0x0000000000004240 0x0000000000004240 > 0x0000000000000024 0x0000000000000024 R 0x1 > GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000 > 0x0000000000000000 0x0000000000000000 RW 0x0 > NOTE 0x000000000000046e 0x000000000000046e 0x000000000000046e > 0x0000000000000018 0x0000000000000018 R 0x1 > > Section to Segment mapping: > Segment Sections... > 00 > 01 .rodata .note.gnu.build-id .dynsym .gnu.version .gnu.version_r .gnu.hash .hash .dynstr .rela.dyn .rela.plt .eh_frame_hdr .eh_frame > 02 .text .plt > 03 .data .got.plt .init_array .dynamic .got .bss > 04 .tbss > 05 .dynamic > 06 .init_array .dynamic .got > 07 .eh_frame_hdr > 08 > 09 .note.gnu.build-id As I understand it (some of this is probably wrong): elfhack is shrinking the .rela.plt section, and using the reclaimed space to insert it's own two sections. By using the reclaimed space it can avoid relocating anything subsequent to it (like .rodata) which means it doesn't need to relocate something non-relocatable. When doing so it creates a new segment and moves all subsequent sections to it's sections into the new segment. (I'm not sure why it does this.) PHDR is not needed; and bfd doesn't produce it. Because lld does produce it, we need to edit it. Editing it increases its size. That requires something right after it to moved, and that's not possible. Additionally, we need our .elfhack sections to be in an executable segment. In theory, possible solutions include: a) Figuring out how to use less space in PHDR so we don't need to increase its size b) moving .rela.dyn .rela.plt to the beginning (right after PHDR) so when we shrink them, we can use the reclaimed space c) Not generating PHDR and generating the first segment as executable d) others...
More conversation: d) Using a linker script to order sections more helpfully e) Using --sort-sections if that will do what we want - it probably won't f) Using --symbol-ordering-file which "will map symbols to sections and sort sections in output according to symbol ordering file" I'm going to investigate (f) first since this need was its intended purpose.
PHDR is that thing you are displaying. LLD is just putting a self-reference in it. So of course, to add a PT_LOAD, there needs to be one entry added, which means it *has* to grow. But, actually, this gives another opportunity: g) reuse one of the useless program headers so that PHDR doesn't have to grow. Now, for more explanation wrt comment 3: elfhack shrinks .rel* as you said, and to reclaim the "freed" space, needs to move the sections that appear after that. But you can only do so if you make relative addresses in the entire binary be the same, so you need to ensure the VADDRs stay the same. Which mean elfhack has to split a PT_LOAD in two so that VADDRs don't change. Some section, however, don't really care about their VADDR, and maybe eh_frame is one of them, and if it is, then there is no need to split a PT_LOAD with lld, since elfhack could move eh_frame to follow the resized .rel*, and place its own executable section before .text. That's the 3 item in comment 1. I'd advise against f) because it requires listing all .rodata symbols, or even more. b) and c) are not valid options.
I'd got two pretty good paths forward on the clang side. Both involve the --no-rosegment argument for lld to change its segments/sections to behave like bfd's. But that doesn't solve the PHDR problem. 1) Patch clang not to output PHDR 2) Offset the first section after PHDR with a linker script so there is room for it to grow. Unfortunately; both break rust.
If --no-rosegment makes it behave more like bfd, why is there a phdr problem still?
Ah; there may not be after all. https://treeherder.mozilla.org/#/jobs?repo=try&revision=a5cff261dda30045133faf41e3c0a87b1830153c&selectedJob=179725669 I'm hitting 'No gain, skipping': https://searchfox.org/mozilla-central/source/build/unix/elfhack/elfhack.cpp#796 I thought this might be fine; but the comment there makes me unsure. Additionally, in the build log there are a lot of 'readelf: Error: bad dynamic symbol' but I'm not sure where it's coming from or if it's a real issue or just a 'readelf can't process the object file output at that point at the job that does it runs unconditionally' type error.
Assignee: nobody → mh+mozilla
Blocks: android-lto
Attachment #8996648 - Flags: review?(nfroyd)
The eh_frame thing, at the moment, breaks the profiler.
Comment on attachment 8996643 [details] Bug 1423822 - Make `elfhack -f` work in all cases where no gain would happen. https://reviewboard.mozilla.org/r/260730/#review267740
Attachment #8996643 - Flags: review?(nfroyd) → review+
Attachment #8996646 - Flags: review?(nfroyd) → review+
Comment on attachment 8996647 [details] Bug 1423822 - Set the address for the elfhack code section based on that of the section it is attached to. https://reviewboard.mozilla.org/r/260738/#review267794
Attachment #8996647 - Flags: review?(nfroyd) → review+
Comment on attachment 8996644 [details] Bug 1423822 - Change how elfhack determines it's not worth trying. https://reviewboard.mozilla.org/r/260732/#review267802
Attachment #8996644 - Flags: review?(nfroyd) → review+
Attachment #8996649 - Flags: review?(nfroyd) → review+
Comment on attachment 8996645 [details] Bug 1423822 - Handle more cases of pointer reuse in DT_INIT_ARRAY. https://reviewboard.mozilla.org/r/260734/#review267806
Attachment #8996645 - Flags: review?(nfroyd) → review+
So, for the record, it turns out llvm-dwarfdump is a little deceptive, in that it presents the raw data, and doesn't care to present "PC"-relative pointers in a resolved fashion, so the previous patch, which just moved the sections, didn't show any difference, but didn't actually work because the "PC"-relative pointers did need to be updated. Dwarfdump -F, however, *does* show the resolved pointers, and made it more obvious the previous patch was wrong. With the new patch, there is (almost) no difference in output for `dwarfdump -F` before and after the patch. The only difference comes from the fact that dwarfdump shows LSDA pointers as raw data, rather than interpreting them. They are pointers into the .gcc_except_table section and do need the same adjustments.
> before and after the patch I mean before and after running elfhack.
Comment on attachment 8996648 [details] Bug 1423822 - Allow to relocate eh_frame. https://reviewboard.mozilla.org/r/260740/#review268048 Fun, fun. ::: build/unix/elfhack/elfhack.cpp:1090 (Diff revision 3) > + // The .eh_frame section usually follows the eh_frame_hdr section. > + ElfSection* eh_frame = eh_frame_hdr ? eh_frame_hdr->getNext() : nullptr; "Usually follows" doesn't seem like a very strong guarantee. Should we throw an error here if `eh_frame_hdr && !eh_frame`, just so we can refine this later if need be?
Attachment #8996648 - Flags: review?(nfroyd) → review+
Pushed by mh@glandium.org: https://hg.mozilla.org/integration/autoland/rev/2698872fcdef Make `elfhack -f` work in all cases where no gain would happen. r=froydnj https://hg.mozilla.org/integration/autoland/rev/a5792a8f6568 Change how elfhack determines it's not worth trying. r=froydnj https://hg.mozilla.org/integration/autoland/rev/6d74232cf5ef Handle more cases of pointer reuse in DT_INIT_ARRAY. r=froydnj https://hg.mozilla.org/integration/autoland/rev/172feaea900c Check segments overlapping later. r=froydnj https://hg.mozilla.org/integration/autoland/rev/8993d26433d8 Set the address for the elfhack code section based on that of the section it is attached to. r=froydnj https://hg.mozilla.org/integration/autoland/rev/83e147e83538 Allow to relocate eh_frame. r=froydnj https://hg.mozilla.org/integration/autoland/rev/a973e7f82da4 Stop disabling elfhack on lto builds. r=froydnj
Depends on: 1480617
Depends on: 1480654
Blocks: 1480688
Blocks: 1481727
Depends on: 1499915
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Created:
Updated:
Size: