Closed
Bug 1423822
Opened 7 years ago
Closed 6 years ago
elfhack doesn't work with lld-linked binaries
Categories
(Firefox Build System :: General, defect)
Firefox Build System
General
Tracking
(firefox63 fixed)
RESOLVED
FIXED
mozilla63
Tracking | Status | |
---|---|---|
firefox63 | --- | fixed |
People
(Reporter: glandium, Assigned: glandium)
References
(Blocks 1 open bug)
Details
Attachments
(7 files)
(deleted),
text/x-review-board-request
|
froydnj
:
review+
|
Details |
(deleted),
text/x-review-board-request
|
froydnj
:
review+
|
Details |
(deleted),
text/x-review-board-request
|
froydnj
:
review+
|
Details |
(deleted),
text/x-review-board-request
|
froydnj
:
review+
|
Details |
(deleted),
text/x-review-board-request
|
froydnj
:
review+
|
Details |
(deleted),
text/x-review-board-request
|
froydnj
:
review+
|
Details |
(deleted),
text/x-review-board-request
|
froydnj
:
review+
|
Details |
+++ This bug was initially created as a clone of Bug #1423821 +++
[This is where we should actually fix the issue somehow]
(Found while trying to make elfhack -r work with lld-linked binaries)
STR:
- Take one of the .so files attached to bug 1385783
- Run elfhack on it
Observe the ELF program headers before and after:
Before:
PHDR 0x000040 0x0000000000000040 0x0000000000000040 0x000230 0x000230 R 0x8
LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x0042e8 0x0042e8 R 0x1000
LOAD 0x005000 0x0000000000005000 0x0000000000005000 0x003040 0x003040 R E 0x1000
LOAD 0x009000 0x0000000000009000 0x0000000000009000 0x0021c8 0x003010 RW 0x1000
TLS 0x00a3a0 0x000000000000a3a0 0x000000000000a3a0 0x000000 0x001010 R 0x10
DYNAMIC 0x00b010 0x000000000000b010 0x000000000000b010 0x0001a0 0x0001a0 RW 0x8
GNU_RELRO 0x00b000 0x000000000000b000 0x000000000000b000 0x0001c8 0x001000 R 0x1
GNU_EH_FRAME 0x004240 0x0000000000004240 0x0000000000004240 0x000024 0x000024 R 0x1
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0
NOTE 0x00046e 0x000000000000046e 0x000000000000046e 0x000018 0x000018 R 0x1
After:
PHDR 0x000040 0x0000000000000040 0x0000000000000040 0x000268 0x000268 R 0x8
LOAD 0x000000 0x0000000000000000 0x0000000000000000 0x000810 0x000810 R 0x1000
LOAD 0x001240 0x0000000000004240 0x0000000000004240 0x0000a8 0x0000a8 R 0x1000
LOAD 0x0012e8 0x00000000000042e8 0x00000000000042e8 0x003d58 0x003d58 R E 0x1000
LOAD 0x006000 0x0000000000009000 0x0000000000009000 0x0021c8 0x003010 RW 0x1000
TLS 0x0073a0 0x000000000000a3a0 0x000000000000a3a0 0x000000 0x001010 R 0x10
DYNAMIC 0x008010 0x000000000000b010 0x000000000000b010 0x0001a0 0x0001a0 RW 0x8
GNU_RELRO 0x008000 0x000000000000b000 0x000000000000b000 0x0001c8 0x001000 R 0x1
GNU_EH_FRAME 0x001240 0x0000000000004240 0x0000000000004240 0x000024 0x000024 R 0x1
GNU_STACK 0x000000 0x0000000000000000 0x0000000000000000 0x000000 0x000000 RW 0
LOPROC+0x36968 0x6f00747365740061 0x676e697274730066 0x6600796172726100 0x770065737500726f 0x68666c6500687469 0x73006f74006b6361
The problem is that elfhack is adding a program header item, growing the program header by 56 bytes, but the data section that directly follows that is .rodata, which can't have its address moved (lld is being too smart for its own good).
Assignee | ||
Comment 1•7 years ago
|
||
Ideas:
- Move the program headers to some location where there's space. This could possibly break things that assume that mapping the first page of the binary allows to read the program headers (as a matter of fact, IIRC our custom dynamic linker does this assumption)
- Move the non-relocatable data that follows the program headers by one page, creating a one-page gap, and adjust the PT_LOAD to load it at the right address.
- Somehow take advantage of the fact that .rel.dyn and .text are not in the same PT_LOAD, and place the elfhack sections in a way that doesn't require splitting further. The current problem is that elfhack assumes it can't move the eh_frame data address (which /might/ be wrong) and since it places the elfhack data section before that, it needs to split the PT_LOAD to allow the binary to shrink in size.
Updated•7 years ago
|
Blocks: linker-lld
Updated•7 years ago
|
Product: Core → Firefox Build System
Comment 3•7 years ago
|
||
I do like the option of fixing lld rather than moving to another linker. I'm not sure it's easier than modifying elfhack, but I'll start somewhere.
Using readelf -l, on my local machine my linker is 'GNU ld (GNU Binutils for Ubuntu) 2.30' (bfd) and produces the following for test-array.so
> Type Offset VirtAddr PhysAddr
> FileSiz MemSiz Flags Align
> LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
> 0x000000000000af10 0x000000000000af10 R E 0x200000
> LOAD 0x000000000000be40 0x000000000020be40 0x000000000020be40
> 0x00000000000015b0 0x00000000000015b8 RW 0x200000
> DYNAMIC 0x000000000000be50 0x000000000020be50 0x000000000020be50
> 0x0000000000000190 0x0000000000000190 RW 0x8
> NOTE 0x0000000000000200 0x0000000000000200 0x0000000000000200
> 0x0000000000000024 0x0000000000000024 R 0x4
> TLS 0x000000000000be40 0x000000000020be40 0x000000000020be40
> 0x0000000000000000 0x0000000000001010 R 0x10
> GNU_EH_FRAME 0x000000000000ae40 0x000000000000ae40 0x000000000000ae40
> 0x000000000000002c 0x000000000000002c R 0x4
> GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
> 0x0000000000000000 0x0000000000000000 RW 0x10
> GNU_RELRO 0x000000000000be40 0x000000000020be40 0x000000000020be40
> 0x00000000000001c0 0x00000000000001c0 R 0x1
>
> Section to Segment mapping:
> Segment Sections...
> 00 .note.gnu.build-id .gnu.hash .dynsym .dynstr .gnu.version .gnu.version_r .rela.dyn .rela.plt .plt .text .rodata .eh_frame_hdr .eh_frame
> 01 .init_array .dynamic .got .got.plt .data .bss
> 02 .dynamic
> 03 .note.gnu.build-id
> 04 .tbss
> 05 .eh_frame_hdr
> 06
> 07 .init_array .dynamic .got
The lld-produced test-array.so from Bug 1385783 (and a local one I just generated) looks like this:
> Type Offset VirtAddr PhysAddr
> FileSiz MemSiz Flags Align
> PHDR 0x0000000000000040 0x0000000000000040 0x0000000000000040
> 0x0000000000000230 0x0000000000000230 R 0x8
> LOAD 0x0000000000000000 0x0000000000000000 0x0000000000000000
> 0x00000000000042e8 0x00000000000042e8 R 0x1000
> LOAD 0x0000000000005000 0x0000000000005000 0x0000000000005000
> 0x0000000000003040 0x0000000000003040 R E 0x1000
> LOAD 0x0000000000009000 0x0000000000009000 0x0000000000009000
> 0x00000000000021c8 0x0000000000003010 RW 0x1000
> TLS 0x000000000000a3a0 0x000000000000a3a0 0x000000000000a3a0
> 0x0000000000000000 0x0000000000001010 R 0x10
> DYNAMIC 0x000000000000b010 0x000000000000b010 0x000000000000b010
> 0x00000000000001a0 0x00000000000001a0 RW 0x8
> GNU_RELRO 0x000000000000b000 0x000000000000b000 0x000000000000b000
> 0x00000000000001c8 0x0000000000001000 R 0x1
> GNU_EH_FRAME 0x0000000000004240 0x0000000000004240 0x0000000000004240
> 0x0000000000000024 0x0000000000000024 R 0x1
> GNU_STACK 0x0000000000000000 0x0000000000000000 0x0000000000000000
> 0x0000000000000000 0x0000000000000000 RW 0x0
> NOTE 0x000000000000046e 0x000000000000046e 0x000000000000046e
> 0x0000000000000018 0x0000000000000018 R 0x1
>
> Section to Segment mapping:
> Segment Sections...
> 00
> 01 .rodata .note.gnu.build-id .dynsym .gnu.version .gnu.version_r .gnu.hash .hash .dynstr .rela.dyn .rela.plt .eh_frame_hdr .eh_frame
> 02 .text .plt
> 03 .data .got.plt .init_array .dynamic .got .bss
> 04 .tbss
> 05 .dynamic
> 06 .init_array .dynamic .got
> 07 .eh_frame_hdr
> 08
> 09 .note.gnu.build-id
As I understand it (some of this is probably wrong):
elfhack is shrinking the .rela.plt section, and using the reclaimed space to insert it's own two sections. By using the reclaimed space it can avoid relocating anything subsequent to it (like .rodata) which means it doesn't need to relocate something non-relocatable. When doing so it creates a new segment and moves all subsequent sections to it's sections into the new segment. (I'm not sure why it does this.)
PHDR is not needed; and bfd doesn't produce it.
Because lld does produce it, we need to edit it. Editing it increases its size. That requires something right after it to moved, and that's not possible.
Additionally, we need our .elfhack sections to be in an executable segment.
In theory, possible solutions include:
a) Figuring out how to use less space in PHDR so we don't need to increase its size
b) moving .rela.dyn .rela.plt to the beginning (right after PHDR) so when we shrink them, we can use the reclaimed space
c) Not generating PHDR and generating the first segment as executable
d) others...
Comment 4•7 years ago
|
||
More conversation:
d) Using a linker script to order sections more helpfully
e) Using --sort-sections if that will do what we want - it probably won't
f) Using --symbol-ordering-file which "will map symbols to sections and sort sections in output according to symbol ordering file"
I'm going to investigate (f) first since this need was its intended purpose.
Assignee | ||
Comment 5•7 years ago
|
||
PHDR is that thing you are displaying. LLD is just putting a self-reference in it. So of course, to add a PT_LOAD, there needs to be one entry added, which means it *has* to grow. But, actually, this gives another opportunity:
g) reuse one of the useless program headers so that PHDR doesn't have to grow.
Now, for more explanation wrt comment 3:
elfhack shrinks .rel* as you said, and to reclaim the "freed" space, needs to move the sections that appear after that. But you can only do so if you make relative addresses in the entire binary be the same, so you need to ensure the VADDRs stay the same. Which mean elfhack has to split a PT_LOAD in two so that VADDRs don't change.
Some section, however, don't really care about their VADDR, and maybe eh_frame is one of them, and if it is, then there is no need to split a PT_LOAD with lld, since elfhack could move eh_frame to follow the resized .rel*, and place its own executable section before .text. That's the 3 item in comment 1.
I'd advise against f) because it requires listing all .rodata symbols, or even more.
b) and c) are not valid options.
Comment 6•7 years ago
|
||
I'd got two pretty good paths forward on the clang side. Both involve the --no-rosegment argument for lld to change its segments/sections to behave like bfd's. But that doesn't solve the PHDR problem.
1) Patch clang not to output PHDR
2) Offset the first section after PHDR with a linker script so there is room for it to grow.
Unfortunately; both break rust.
See Also: → https://github.com/rust-lang/rust/issues/50975
Assignee | ||
Comment 7•7 years ago
|
||
If --no-rosegment makes it behave more like bfd, why is there a phdr problem still?
Comment 8•7 years ago
|
||
Ah; there may not be after all.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=a5cff261dda30045133faf41e3c0a87b1830153c&selectedJob=179725669
I'm hitting 'No gain, skipping': https://searchfox.org/mozilla-central/source/build/unix/elfhack/elfhack.cpp#796
I thought this might be fine; but the comment there makes me unsure.
Additionally, in the build log there are a lot of 'readelf: Error: bad dynamic symbol' but I'm not sure where it's coming from or if it's a real issue or just a 'readelf can't process the object file output at that point at the job that does it runs unconditionally' type error.
Assignee | ||
Updated•6 years ago
|
Assignee: nobody → mh+mozilla
Assignee | ||
Updated•6 years ago
|
Blocks: android-lto
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Updated•6 years ago
|
Attachment #8996648 -
Flags: review?(nfroyd)
Assignee | ||
Comment 16•6 years ago
|
||
The eh_frame thing, at the moment, breaks the profiler.
Comment 17•6 years ago
|
||
mozreview-review |
Comment on attachment 8996643 [details]
Bug 1423822 - Make `elfhack -f` work in all cases where no gain would happen.
https://reviewboard.mozilla.org/r/260730/#review267740
Attachment #8996643 -
Flags: review?(nfroyd) → review+
Comment 18•6 years ago
|
||
mozreview-review |
Comment on attachment 8996646 [details]
Bug 1423822 - Check segments overlapping later.
https://reviewboard.mozilla.org/r/260736/#review267792
Attachment #8996646 -
Flags: review?(nfroyd) → review+
Comment 19•6 years ago
|
||
mozreview-review |
Comment on attachment 8996647 [details]
Bug 1423822 - Set the address for the elfhack code section based on that of the section it is attached to.
https://reviewboard.mozilla.org/r/260738/#review267794
Attachment #8996647 -
Flags: review?(nfroyd) → review+
Comment 20•6 years ago
|
||
mozreview-review |
Comment on attachment 8996644 [details]
Bug 1423822 - Change how elfhack determines it's not worth trying.
https://reviewboard.mozilla.org/r/260732/#review267802
Attachment #8996644 -
Flags: review?(nfroyd) → review+
Comment 21•6 years ago
|
||
mozreview-review |
Comment on attachment 8996649 [details]
Bug 1423822 - Stop disabling elfhack on lto builds.
https://reviewboard.mozilla.org/r/260742/#review267804
Attachment #8996649 -
Flags: review?(nfroyd) → review+
Comment 22•6 years ago
|
||
mozreview-review |
Comment on attachment 8996645 [details]
Bug 1423822 - Handle more cases of pointer reuse in DT_INIT_ARRAY.
https://reviewboard.mozilla.org/r/260734/#review267806
Attachment #8996645 -
Flags: review?(nfroyd) → review+
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Assignee | ||
Comment 30•6 years ago
|
||
So, for the record, it turns out llvm-dwarfdump is a little deceptive, in that it presents the raw data, and doesn't care to present "PC"-relative pointers in a resolved fashion, so the previous patch, which just moved the sections, didn't show any difference, but didn't actually work because the "PC"-relative pointers did need to be updated.
Dwarfdump -F, however, *does* show the resolved pointers, and made it more obvious the previous patch was wrong.
With the new patch, there is (almost) no difference in output for `dwarfdump -F` before and after the patch. The only difference comes from the fact that dwarfdump shows LSDA pointers as raw data, rather than interpreting them. They are pointers into the .gcc_except_table section and do need the same adjustments.
Assignee | ||
Comment 31•6 years ago
|
||
> before and after the patch
I mean before and after running elfhack.
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 34•6 years ago
|
||
mozreview-review |
Comment on attachment 8996648 [details]
Bug 1423822 - Allow to relocate eh_frame.
https://reviewboard.mozilla.org/r/260740/#review268048
Fun, fun.
::: build/unix/elfhack/elfhack.cpp:1090
(Diff revision 3)
> + // The .eh_frame section usually follows the eh_frame_hdr section.
> + ElfSection* eh_frame = eh_frame_hdr ? eh_frame_hdr->getNext() : nullptr;
"Usually follows" doesn't seem like a very strong guarantee. Should we throw an error here if `eh_frame_hdr && !eh_frame`, just so we can refine this later if need be?
Attachment #8996648 -
Flags: review?(nfroyd) → review+
Comment hidden (mozreview-request) |
Comment hidden (mozreview-request) |
Comment 37•6 years ago
|
||
Pushed by mh@glandium.org:
https://hg.mozilla.org/integration/autoland/rev/2698872fcdef
Make `elfhack -f` work in all cases where no gain would happen. r=froydnj
https://hg.mozilla.org/integration/autoland/rev/a5792a8f6568
Change how elfhack determines it's not worth trying. r=froydnj
https://hg.mozilla.org/integration/autoland/rev/6d74232cf5ef
Handle more cases of pointer reuse in DT_INIT_ARRAY. r=froydnj
https://hg.mozilla.org/integration/autoland/rev/172feaea900c
Check segments overlapping later. r=froydnj
https://hg.mozilla.org/integration/autoland/rev/8993d26433d8
Set the address for the elfhack code section based on that of the section it is attached to. r=froydnj
https://hg.mozilla.org/integration/autoland/rev/83e147e83538
Allow to relocate eh_frame. r=froydnj
https://hg.mozilla.org/integration/autoland/rev/a973e7f82da4
Stop disabling elfhack on lto builds. r=froydnj
Comment 38•6 years ago
|
||
bugherder |
https://hg.mozilla.org/mozilla-central/rev/2698872fcdef
https://hg.mozilla.org/mozilla-central/rev/a5792a8f6568
https://hg.mozilla.org/mozilla-central/rev/6d74232cf5ef
https://hg.mozilla.org/mozilla-central/rev/172feaea900c
https://hg.mozilla.org/mozilla-central/rev/8993d26433d8
https://hg.mozilla.org/mozilla-central/rev/83e147e83538
https://hg.mozilla.org/mozilla-central/rev/a973e7f82da4
Status: NEW → RESOLVED
Closed: 6 years ago
status-firefox63:
--- → fixed
Resolution: --- → FIXED
Target Milestone: --- → mozilla63
Updated•6 years ago
|
You need to log in
before you can comment on or make changes to this bug.
Description
•