Open Bug 241081 Opened 21 years ago Updated 2 years ago

Cleanup for mozilla/security/nss/lib/freebl/mpi/mpi_x86.s

Categories

(NSS :: Libraries, defect, P3)

x86
Linux

Tracking

(Not tracked)

People

(Reporter: tenthumbs, Unassigned)

References

Details

(Whiteboard: awaiting consensus of interested parties)

Attachments

(3 files, 2 obsolete files)

It's missing some .size directives which help the linker. A patch is attached. Also, if speed really is a consideration then s_mpv_div_2dx1d at least could be a gcc inline function which would be even faster. Something like this. static inline mp_err s_mpv_div_2dx1d (mp_digit nhi, mp_digit nlo, mp_digit den, mp_digit *qp, mp_digit *rp) { __asm__ __volatile__ ( "divl %4 \n\t" :"=a" (*qp), "=d" (*rp) : "a" (nlo), "d" (nhi), "rm" (den) : "cc" ); return 0; }
Attached patch patch (obsolete) (deleted) — Splinter Review
Assignee: wchang0222 → MisterSSL
Comment on attachment 146587 [details] [diff] [review] patch >- # Magic indicating no need for an executable stack >-.section .note.GNU-stack, "", @progbits >-.previous Did you intentionally delete these? You did not describe this change in the bug report.
> Did you intentionally delete these? Yes I did. They would affect an entire executable/shared object that contained this file. Are you sure you want to do that in an obscure utility file? I would think you should allow the compiler to control this. Leave it in if you want. You can just delete the .previous line, it's uesless at the end of a file.
What is the net effect of this change? smaller shared lib? or ? Is there any reason NOT to disable executable stacks? Executable stacks strike me as a vulnerability. mpi is linked into nss's libsoftoken, which has no need of an executable stack.
Priority: -- → P3
Target Milestone: --- → 3.10
The .size directive gives hints to ld which may or may not use the info to move the code to a more efficient location. It's highly platform dependent but gcc always emits it. There's no reason mot to and it's harmless. As for GNU-stack, it's not at all clear it actually does anything. See <http://gcc.gnu.org/ml/gcc-patches/2003-06/msg00302.html> and <http://gcc.gnu.org/ml/gcc-patches/2003-11/msg00975.html>. It's just a hint to the kernel. In fact, the linker in binutils-2.15.90.0.2-2.15.90.0.3 from kernel.org just discards the section. Since you have no control over any linker but your own it guarantess nothing. And can you guarantee that this won't adversely affect some gcc version out there that might want an executable stack? And, of course, there's nothing preventing some other part of a particular process from changing the permissions on a peice of memory. It's just a couple of lines that make you feel good without necessarily doing anything.
Comment on attachment 146587 [details] [diff] [review] patch ># Magic indicating no need for an executable stack >.section .note.GNU-stack, "", @progbits >.previous Just wanted to provide the history of these three lines. I received these three lines from a Red Hat Linux developer (via Chris Blizzard).
note.GNU-stack is very much a Redhat thing so it's not surprising their people think it's the usual thing to do. This is all part of the "exec shield" package but that's not part of the official Linux kernel (at least not yet). Other distributions may choose to do things differently so there's the potential for conflicts. Since the stack belongs to the application putting this is a shared library is probably superfluous. Linking against a static library might be a problem if the app needs an executable stack but your library says no. This isn't worth getting too excited about. It's probably safe to leave it if you want as long as you realize that it doesn't do much for you and that you might have some problems.
Comment on attachment 146587 [details] [diff] [review] patch >- # Magic indicating no need for an executable stack >-.section .note.GNU-stack, "", @progbits My guess is that this doesn't force a non-executable stack, but rather avoids forcing an executable stack. I'm inclined to leave it in if RH recommends it and it doesn't hurt anyone else. Wan-Teh, if you approve this patch, I'll check it in on the trunk.
Attachment #146587 - Flags: review?(wchang0222)
Those 2 lines did hurt OS/2 (see bug 240775 ).
Comment on attachment 146587 [details] [diff] [review] patch r=wtc, assuming the executable stack stuff at the end of this file is left alone. I don't know the syntax of the .size directive, so I don't know if this patch is correct. I'll trust the submitter of this patch. If I compile the following C file using "gcc -c -O2 -S foo.c": int incr(int a) { return ++a; } int decr(int a) { return --a; } the compiler generates the following assembly code: .file "foo.c" .text .p2align 2,,3 .globl incr .type incr,@function incr: pushl %ebp movl %esp, %ebp movl 8(%ebp), %eax incl %eax leave ret .Lfe1: .size incr,.Lfe1-incr .p2align 2,,3 .globl decr .type decr,@function decr: pushl %ebp movl %esp, %ebp movl 8(%ebp), %eax decl %eax leave ret .Lfe2: .size decr,.Lfe2-decr .ident "GCC: (GNU) 3.2.2 20030222 (Red Hat Linux 3.2.2-5)" So my question is what "." means in a .size directive?
Attachment #146587 - Flags: review?(wchang0222) → review+
I think Julien would like to see those last 3 lines removed from the source, (as this patch presently does), and Wan-Teh would like to see them remain in the source. I just wanna do whatever makes the .s usable on most platforms, or else make another source file for platforms with odd requirements. Please advise.
I am not really qualified to review this patch, so you can ignore my comments. I tried to find documentation of .size, but what I found is not very helpful: http://www.gnu.org/software/binutils/manual/gas-2.9.1/html_chapter/as_7.html#SEC123
Wan-Teh Chang: the '.' is the current location counter which, in this case, is the location immediately after the ret. The compiler always puts a local label at this point which always has the same value as '.'. The label is superfluous. Nelson Bolyard: if you use a special make rule for this file you can use "gcc -x assembler-with-cpp" as I suggested and the wrap the two lines in an ifdef. You may just need to redefine make's AS variable.
That last message was confusing. I didn't mention gcc is this bug. The idea is that gcc will pass the file through the preprocessor so you can use the usual tricks.
Comment on attachment 146587 [details] [diff] [review] patch Thank you for the explanation, tenthumbs. I agree the .size directives in this patch are correct. We should check in those changes. As for removing the executable stack stuff, it is a good idea to have Chris Blizzard find a right person to review it. Does anyone know why we had "nop" instructions after "ret" instructions?
Wan-Teh : Maybe the nops after ret were there to align the code of the next function on some optimized boundary which causes faster execution ? They would probably have to be changed whenever the asm code does.
In answer to comment 15: If I recall correctly, I first wrote mpi_x86.s, and then derived mpi_x86.asm and (much later) mpi_i86.s from it. On the linux platform, I started by compiling a c implementation with optimization, and examining the compiler's assembly code output, to see what pseudo-ops were appropriate to use. Then I replaced most of the assembly code, following the style of the original pseudo ops. When I was nearly done, I linked my assembly code into a shared library, and then disassembled the result to see if the assembler had reordered instructions or done anything else unexpected. There I found those NOPs had been put in, even though they were not in my original code. So, figuring that whatever tool put them there knew what it was doing, I copied them into the source, so that the source exactly matched the assembled object code. I never did figure out why those NOPs were added (presumably by the linker). I figured they didn't hurt and might even help in some way, (e.g. alignment) so I left them in. Here are some guesses as to how/why those NOPs might have been put in: a. Align the functions to begin on "paragraph" boundaries (cache line boundaries), to minimize the number of cache line fetches required to run the function. b. If the processor does instruction pre-fetching and partial decoding of instructions following the current instruction, NOPs minimize the amount of work done there. (it would be pretty bad branch-prediction logic that failed to notice a return!)
I was looking through some disassemblies of linux libc code today, and I noticed that immediately after nearly every unconditional jump or return there was some form of nop instruction. among the opcodes I saw were: nop (1 byte) mov %esi,%esi (2 bytes) lea 0(%esi),%esi (3 bytes) In all cases, the instruction following the nop was aligned on a 4-byte boundary. I found one case where the instruction immediately after a return was already on a 4-byte boundary, and there was no nop after that return. Of course, the next non-nop instruction after a return or unconditional jump is a jump target. So, I think the intent is clear. It is to put jump targets on 4-byte boundaries. Our assembler sources should probably do something to force alignment of the first function, and then add nops to keep all subsequent jump targets 4-byte aligned.
Attached patch patch with align directives (obsolete) (deleted) — Splinter Review
Aligning is easy. This is the previous patch + suitable directives. These are the ones gcc uses when tuning for an i486. If you tune for a pentium4 then gcc doesn't emit them. They will increase the code size slightly but should be otherwise harmless.
Attachment #146587 - Attachment is obsolete: true
The latest patch just aligns the functions. It's not hard to align loops but you should really test if loop alignment actually does any good on your cpu of choice.
Attached patch patch 3 (deleted) — Splinter Review
Yet more alignment. This reproduces what gcc does. There's no gurantee this will actually improve the speed. You need to test it.
Attachment #146848 - Attachment is obsolete: true
Looking through the gcc source code shows that it jumps through hoops to figure out what the assembler is capable of. I doubt the alignment stuff will work on all platforms, like the BSD's. It may be better not to try.
Attached patch patch 4 (deleted) — Splinter Review
If anyone still cares (and I don't mind if they don't) here's a patch with some ifdefs to change the symbol names.
Attached patch a completely different patch (deleted) — Splinter Review
A completely different approach, a common file with the function bodies and some small wrappers that change names and make them global. No ifdefs. This works on Linux but that's the only one I can test.
tenthumbs: we have already figured out how to make gcc use the same symbol names on OS/2. So we only need to delete the non-executable stack directive in order to share mpi_x86.s between Linux and OS/2.
Folks, Thank you all for your contributions to this bug. I'd like to see you all collaborate and produce a single patch that solves the issues of this bug (bug 241081) and of bug 240775. When we have one patch with which everyone here is satisfied, I will be happy to check it in. I'm not clear at the moment about the resolutions of several aspects of these bugs: 1. Do we want to keep the non-executable stack directive at all? Perhaps only on certain platforms? I was under the impression that Wan-Teh wanted to keep it at least for red hat. 2. Do we want the .size directives? 3. Do we want any alignment directives? 4. Do we want any no-ops?
If you're using the same file on multiple platforms then I think you want to include as much information as possible so that the platform can perform as many optimizations as it can. That's why I suggested the .size directives. The executable stack thing is really an question of whether a given platform supports arbitrary code sections, ELF does and OS/2 apparently does not. Given that you're using gcc for the rest of the code (which will insert such things if it wants to) it's probably not necessary for this one file. Nops and alignment directives are really the same issue. Whether they have any effect depends on the run-time cpu (which you don't know). The gas docs say that different platform assemblers behave differently so it's difficult to do the right thing. I don't think it's worth the trouble. Presumably, the assembler code is already significantly faster than gcc code so I don't think it's worth getting too greedy.
Blocks: 240775
Tenthumbs - you hit a hot button issue here . The RSA code can always be faster, especially in NSS server applications. I don't have time to benchmark the effect of alignment on the various CPUs. However, presumably, alignment, whether taken care of through NOPs or through alignment directives, would not hurt the performance. It could only benefit for certain CPUs. So I would be in favor of having it done one way or the other. Perhaps we would align on the largest possible boundary needed for current CPUs to run code optimally (eg. maybe a 64 bit boundary ?), which wouldn't hurt on the other CPU that don't need it - except for a few wasted NOP bytes of memory, which never get executed anyway (might as well use DB FF ). Since the alignment directives aren't portable, there are 2 solutions: 1. split the mpi_x86.s source files into two - one with directives, one with manual alignment done through NOPs . This might also allow using the other directives. 2. just use one source file with manual alignment done through NOPs, and don't use any directives
I exchanged email with Chris Blizzard on Monday. He said that we should keep the non-executable stack directive for Linux. So we need to either add #ifdef LINUX to mpi_x86.s and assemble it using gcc (so we can use C preprocessor macros), or make a copy of mpi_x86.s for OS/2.
If speed is a major issue then measurements are essential. I've found that predictions are almost always wrong. People just don't think like machines. As for alignment, yes nops can hurt. They take time and space. You absolutely have to measure the performance difference on typical data. An improvement on a loop of 1000 may have little (or a bad) effect a loop of 100. Then there's the cpu. For a 32-bit Intel chip and clones, gcc-3.4.0 can tune for 17 different flavors. Ultimate speed would mean a lot of files. If you assume that the Intel Pentium4 is the most common chip sold and used these days and if you believe gcc's optimizations for that chip then you don't want any alignment at all. Inserting manual nops is a bad idea. You wind up with code that's impossible to maintain. If you put in big nasty warnings about why the code does what it does then you wind up with code that no one will touch and it doesn't get updated when it should. Or someone will ignore the warnings and screw up all the careful work. Obviously I think this is a bad idea. If performance is critical then you should consider gcc's inline assembly to avoid function calls but that's harder than you think. I'm not a big fan of ifdefs. In one of the K&R books there's a caution noting that N ifdefs imply 2^N complexity. I would prefer separate files. Here you have assembler code for non-ELF, ELF on Linux, and ELF for everyone else. You may well need three different versions. See my last patch, though. On Linux you can include one assembler file in another so you can keep the Linux-specific stuff separate. BTW, there are a few tweaks that could improve speed marginally; e.g., the usual idiom for testing whether a register contains 0 is to use the test instruction rather than comparing with 0. I can do that if anyone's interested.
tenthumbs, n ifdefs only imply 2^n complexity if each ifdef is for a different condition. I don't like the ifdefs either much in this case, but I think we are only talking about using them in order to hide the directives that aren't supported on some platforms; ie. there would be only two states - directives supported or not. But of course once you start doing this you open the door to more states. Your points on measurements are well taken. I think later this year I will have cycles to do more performance measurements for RSA on x86 platforms. Most likely x86-64 will be involved, and there will be another body of assembly code for it, though.
*** Bug 240775 has been marked as a duplicate of this bug. ***
Folks, the reason this bug isn't progressing, is that the various contributors (y'all) seem to be in disagreement about how to proceed. The questions I asked in comment 26 are still unsettled. As soon as consensus emerges about those 4 questions, I will proceed.
Whiteboard: awaiting consensus of interested parties
All I wanted to do originally was to add size information as an aid to the linker. The question then arose whether this code could/should be shared among various platforms which, in turn, depends on whether a given platform handles arbitrary sections and the platform naming conventions. We then got diverted into an optimization discussion. As I see it, there are just a few real issues here. Assuming this is a good idea then the first question is whether there should be separate files for each platform. If so, then we're essentially finished. If not, then the next question is how to mangle the files. The usual method is to use the C preprocessor. That's straightforward too. Someone has to decide on which approach to use. The other stuff probably belongs in another bug.
QA Contact: bishakhabanerjee → jason.m.reid
3.11 is designated as a performance enhancement release, so I'd like to resolve this bug once and for all in that release.
Target Milestone: 3.10 → 3.11
QA Contact: jason.m.reid → libraries
Any chance of getting a freebl/Makefile patch in? Perhaps in a separate bug? I'm trying to get a 2.0 Firefox build on x64 and ran into this problem (and a few others of course).
Assignee: nelson → nobody
Target Milestone: 3.11 → ---
Severity: normal → S3
You need to log in before you can comment on or make changes to this bug.

Attachment

General

Creator:
Created:
Updated:
Size: