Open
Bug 241081
Opened 21 years ago
Updated 2 years ago
Cleanup for mozilla/security/nss/lib/freebl/mpi/mpi_x86.s
Categories
(NSS :: Libraries, defect, P3)
Tracking
(Not tracked)
NEW
People
(Reporter: tenthumbs, Unassigned)
References
Details
(Whiteboard: awaiting consensus of interested parties)
Attachments
(3 files, 2 obsolete files)
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review | |
(deleted),
patch
|
Details | Diff | Splinter Review |
It's missing some .size directives which help the linker. A patch is attached.
Also, if speed really is a consideration then s_mpv_div_2dx1d at least could be
a gcc inline function which would be even faster. Something like this.
static inline mp_err s_mpv_div_2dx1d (mp_digit nhi, mp_digit nlo, mp_digit den,
mp_digit *qp, mp_digit *rp)
{
__asm__ __volatile__
(
"divl %4 \n\t"
:"=a" (*qp), "=d" (*rp)
: "a" (nlo), "d" (nhi), "rm" (den)
: "cc"
);
return 0;
}
Updated•21 years ago
|
Assignee: wchang0222 → MisterSSL
Comment 2•21 years ago
|
||
Comment on attachment 146587 [details] [diff] [review]
patch
>- # Magic indicating no need for an executable stack
>-.section .note.GNU-stack, "", @progbits
>-.previous
Did you intentionally delete these? You did not
describe this change in the bug report.
> Did you intentionally delete these?
Yes I did. They would affect an entire executable/shared object that contained
this file. Are you sure you want to do that in an obscure utility file? I
would think you should allow the compiler to control this. Leave it in if you
want. You can just delete the .previous line, it's uesless at the end of a
file.
Comment 4•21 years ago
|
||
What is the net effect of this change? smaller shared lib? or ?
Is there any reason NOT to disable executable stacks? Executable stacks
strike me as a vulnerability. mpi is linked into nss's libsoftoken,
which has no need of an executable stack.
Priority: -- → P3
Target Milestone: --- → 3.10
The .size directive gives hints to ld which may or may not use the info to
move the code to a more efficient location. It's highly platform dependent but
gcc always emits it. There's no reason mot to and it's harmless.
As for GNU-stack, it's not at all clear it actually does anything. See
<http://gcc.gnu.org/ml/gcc-patches/2003-06/msg00302.html> and
<http://gcc.gnu.org/ml/gcc-patches/2003-11/msg00975.html>. It's just a hint to
the kernel. In fact, the linker in binutils-2.15.90.0.2-2.15.90.0.3 from
kernel.org just discards the section. Since you have no control over any
linker but your own it guarantess nothing. And can you guarantee that this
won't adversely affect some gcc version out there that might want an
executable stack?
And, of course, there's nothing preventing some other part of a particular
process from changing the permissions on a peice of memory.
It's just a couple of lines that make you feel good without necessarily doing
anything.
Comment 6•21 years ago
|
||
Comment on attachment 146587 [details] [diff] [review]
patch
># Magic indicating no need for an executable stack
>.section .note.GNU-stack, "", @progbits
>.previous
Just wanted to provide the history of these three
lines. I received these three lines from a Red Hat
Linux developer (via Chris Blizzard).
note.GNU-stack is very much a Redhat thing so it's not surprising their
people think it's the usual thing to do. This is all part of the "exec
shield" package but that's not part of the official Linux kernel (at least
not yet). Other distributions may choose to do things differently so there's
the potential for conflicts.
Since the stack belongs to the application putting this is a shared library
is probably superfluous. Linking against a static library might be a problem
if the app needs an executable stack but your library says no.
This isn't worth getting too excited about. It's probably safe to leave it
if you want as long as you realize that it doesn't do much for you and that
you might have some problems.
Comment 8•21 years ago
|
||
Comment on attachment 146587 [details] [diff] [review]
patch
>- # Magic indicating no need for an executable stack
>-.section .note.GNU-stack, "", @progbits
My guess is that this doesn't force a non-executable stack, but
rather avoids forcing an executable stack. I'm inclined to
leave it in if RH recommends it and it doesn't hurt anyone else.
Wan-Teh, if you approve this patch, I'll check it in on the trunk.
Attachment #146587 -
Flags: review?(wchang0222)
Comment 9•21 years ago
|
||
Those 2 lines did hurt OS/2 (see bug 240775 ).
Comment 10•21 years ago
|
||
Comment on attachment 146587 [details] [diff] [review]
patch
r=wtc, assuming the executable stack stuff at
the end of this file is left alone.
I don't know the syntax of the .size directive,
so I don't know if this patch is correct. I'll
trust the submitter of this patch.
If I compile the following C file using
"gcc -c -O2 -S foo.c":
int incr(int a)
{
return ++a;
}
int decr(int a)
{
return --a;
}
the compiler generates the following assembly code:
.file "foo.c"
.text
.p2align 2,,3
.globl incr
.type incr,@function
incr:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
incl %eax
leave
ret
.Lfe1:
.size incr,.Lfe1-incr
.p2align 2,,3
.globl decr
.type decr,@function
decr:
pushl %ebp
movl %esp, %ebp
movl 8(%ebp), %eax
decl %eax
leave
ret
.Lfe2:
.size decr,.Lfe2-decr
.ident "GCC: (GNU) 3.2.2 20030222 (Red Hat Linux 3.2.2-5)"
So my question is what "." means in a .size directive?
Attachment #146587 -
Flags: review?(wchang0222) → review+
Comment 11•21 years ago
|
||
I think Julien would like to see those last 3 lines removed from the source,
(as this patch presently does), and Wan-Teh would like to see them remain
in the source. I just wanna do whatever makes the .s usable on most platforms,
or else make another source file for platforms with odd requirements.
Please advise.
Comment 12•21 years ago
|
||
I am not really qualified to review this
patch, so you can ignore my comments.
I tried to find documentation of .size, but
what I found is not very helpful:
http://www.gnu.org/software/binutils/manual/gas-2.9.1/html_chapter/as_7.html#SEC123
Reporter | ||
Comment 13•21 years ago
|
||
Wan-Teh Chang: the '.' is the current location counter which, in this case, is
the location immediately after the ret. The compiler always puts a local label
at this point which always has the same value as '.'. The label is
superfluous.
Nelson Bolyard: if you use a special make rule for this file you can use "gcc
-x assembler-with-cpp" as I suggested and the wrap the two lines in an ifdef.
You may just need to redefine make's AS variable.
Reporter | ||
Comment 14•21 years ago
|
||
That last message was confusing. I didn't mention gcc is this bug. The idea is
that gcc will pass the file through the preprocessor so you can use the usual
tricks.
Comment 15•21 years ago
|
||
Comment on attachment 146587 [details] [diff] [review]
patch
Thank you for the explanation, tenthumbs.
I agree the .size directives in this patch
are correct. We should check in those changes.
As for removing the executable stack stuff,
it is a good idea to have Chris Blizzard find
a right person to review it.
Does anyone know why we had "nop" instructions
after "ret" instructions?
Comment 16•21 years ago
|
||
Wan-Teh :
Maybe the nops after ret were there to align the code of the next function on
some optimized boundary which causes faster execution ? They would probably have
to be changed whenever the asm code does.
Comment 17•21 years ago
|
||
In answer to comment 15:
If I recall correctly, I first wrote mpi_x86.s, and then derived
mpi_x86.asm and (much later) mpi_i86.s from it.
On the linux platform, I started by compiling a c implementation with
optimization, and examining the compiler's assembly code output, to see
what pseudo-ops were appropriate to use. Then I replaced most of the
assembly code, following the style of the original pseudo ops.
When I was nearly done, I linked my assembly code into a shared library,
and then disassembled the result to see if the assembler had reordered
instructions or done anything else unexpected. There I found those NOPs
had been put in, even though they were not in my original code.
So, figuring that whatever tool put them there knew what it was doing,
I copied them into the source, so that the source exactly matched the
assembled object code.
I never did figure out why those NOPs were added (presumably by the
linker). I figured they didn't hurt and might even help in some way,
(e.g. alignment) so I left them in.
Here are some guesses as to how/why those NOPs might have been put in:
a. Align the functions to begin on "paragraph" boundaries (cache line
boundaries), to minimize the number of cache line fetches required to
run the function.
b. If the processor does instruction pre-fetching and partial decoding
of instructions following the current instruction, NOPs minimize the
amount of work done there. (it would be pretty bad branch-prediction
logic that failed to notice a return!)
Comment 18•21 years ago
|
||
I was looking through some disassemblies of linux libc code today, and I
noticed that immediately after nearly every unconditional jump or return
there was some form of nop instruction. among the opcodes I saw were:
nop (1 byte)
mov %esi,%esi (2 bytes)
lea 0(%esi),%esi (3 bytes)
In all cases, the instruction following the nop was aligned on a 4-byte
boundary. I found one case where the instruction immediately after a
return was already on a 4-byte boundary, and there was no nop after
that return. Of course, the next non-nop instruction after a return or
unconditional jump is a jump target. So, I think the intent is clear.
It is to put jump targets on 4-byte boundaries.
Our assembler sources should probably do something to force alignment of
the first function, and then add nops to keep all subsequent jump targets
4-byte aligned.
Reporter | ||
Comment 19•21 years ago
|
||
Aligning is easy. This is the previous patch + suitable directives. These are
the ones gcc uses when tuning for an i486. If you tune for a pentium4 then gcc
doesn't emit them. They will increase the code size slightly but should be
otherwise harmless.
Attachment #146587 -
Attachment is obsolete: true
Reporter | ||
Comment 20•21 years ago
|
||
The latest patch just aligns the functions. It's not hard to align loops but
you should really test if loop alignment actually does any good on your cpu of
choice.
Reporter | ||
Comment 21•21 years ago
|
||
Yet more alignment. This reproduces what gcc does. There's no gurantee this
will actually improve the speed. You need to test it.
Attachment #146848 -
Attachment is obsolete: true
Reporter | ||
Comment 22•21 years ago
|
||
Looking through the gcc source code shows that it jumps through hoops to
figure out what the assembler is capable of. I doubt the alignment stuff will
work on all platforms, like the BSD's. It may be better not to try.
Reporter | ||
Comment 23•21 years ago
|
||
If anyone still cares (and I don't mind if they don't) here's a patch with some
ifdefs to change the symbol names.
Reporter | ||
Comment 24•21 years ago
|
||
A completely different approach, a common file with the function bodies and
some small wrappers that change names and make them global. No ifdefs. This
works on Linux but that's the only one I can test.
Comment 25•21 years ago
|
||
tenthumbs: we have already figured out how to make
gcc use the same symbol names on OS/2. So we only
need to delete the non-executable stack directive
in order to share mpi_x86.s between Linux and OS/2.
Comment 26•21 years ago
|
||
Folks, Thank you all for your contributions to this bug.
I'd like to see you all collaborate and produce a single patch that
solves the issues of this bug (bug 241081) and of bug 240775.
When we have one patch with which everyone here is satisfied,
I will be happy to check it in.
I'm not clear at the moment about the resolutions of several aspects of
these bugs:
1. Do we want to keep the non-executable stack directive at all?
Perhaps only on certain platforms? I was under the impression that
Wan-Teh wanted to keep it at least for red hat.
2. Do we want the .size directives?
3. Do we want any alignment directives?
4. Do we want any no-ops?
Reporter | ||
Comment 27•21 years ago
|
||
If you're using the same file on multiple platforms then I think you want to
include as much information as possible so that the platform can perform as
many optimizations as it can. That's why I suggested the .size directives.
The executable stack thing is really an question of whether a given platform
supports arbitrary code sections, ELF does and OS/2 apparently does not. Given
that you're using gcc for the rest of the code (which will insert such things
if it wants to) it's probably not necessary for this one file.
Nops and alignment directives are really the same issue. Whether they have any
effect depends on the run-time cpu (which you don't know). The gas docs say
that different platform assemblers behave differently so it's difficult to do
the right thing. I don't think it's worth the trouble. Presumably, the
assembler code is already significantly faster than gcc code so I don't think
it's worth getting too greedy.
Comment 28•21 years ago
|
||
Tenthumbs - you hit a hot button issue here . The RSA code can always be faster,
especially in NSS server applications. I don't have time to benchmark the effect
of alignment on the various CPUs. However, presumably, alignment, whether taken
care of through NOPs or through alignment directives, would not hurt the
performance. It could only benefit for certain CPUs. So I would be in favor of
having it done one way or the other. Perhaps we would align on the largest
possible boundary needed for current CPUs to run code optimally (eg. maybe a 64
bit boundary ?), which wouldn't hurt on the other CPU that don't need it -
except for a few wasted NOP bytes of memory, which never get executed anyway
(might as well use DB FF ).
Since the alignment directives aren't portable, there are 2 solutions:
1. split the mpi_x86.s source files into two - one with directives, one with
manual alignment done through NOPs . This might also allow using the other
directives.
2. just use one source file with manual alignment done through NOPs, and don't
use any directives
Comment 29•21 years ago
|
||
I exchanged email with Chris Blizzard on Monday.
He said that we should keep the non-executable
stack directive for Linux.
So we need to either add #ifdef LINUX to mpi_x86.s
and assemble it using gcc (so we can use C
preprocessor macros), or make a copy of mpi_x86.s
for OS/2.
Reporter | ||
Comment 30•21 years ago
|
||
If speed is a major issue then measurements are essential. I've found that
predictions are almost always wrong. People just don't think like machines.
As for alignment, yes nops can hurt. They take time and space. You
absolutely have to measure the performance difference on typical data. An
improvement on a loop of 1000 may have little (or a bad) effect a loop of
100.
Then there's the cpu. For a 32-bit Intel chip and clones, gcc-3.4.0 can tune
for 17 different flavors. Ultimate speed would mean a lot of files.
If you assume that the Intel Pentium4 is the most common chip sold and used
these days and if you believe gcc's optimizations for that chip then you
don't want any alignment at all.
Inserting manual nops is a bad idea. You wind up with code that's impossible
to maintain. If you put in big nasty warnings about why the code does what
it does then you wind up with code that no one will touch and it doesn't get
updated when it should. Or someone will ignore the warnings and screw up all
the careful work.
Obviously I think this is a bad idea. If performance is critical then you
should consider gcc's inline assembly to avoid function calls but that's
harder than you think.
I'm not a big fan of ifdefs. In one of the K&R books there's a caution
noting that N ifdefs imply 2^N complexity. I would prefer separate files.
Here you have assembler code for non-ELF, ELF on Linux, and ELF for everyone
else. You may well need three different versions. See my last patch, though.
On Linux you can include one assembler file in another so you can keep the
Linux-specific stuff separate.
BTW, there are a few tweaks that could improve speed marginally; e.g., the
usual idiom for testing whether a register contains 0 is to use the test
instruction rather than comparing with 0. I can do that if anyone's
interested.
Comment 31•21 years ago
|
||
tenthumbs, n ifdefs only imply 2^n complexity if each ifdef is for a different
condition. I don't like the ifdefs either much in this case, but I think we are
only talking about using them in order to hide the directives that aren't
supported on some platforms; ie. there would be only two states - directives
supported or not. But of course once you start doing this you open the door to
more states.
Your points on measurements are well taken. I think later this year I will have
cycles to do more performance measurements for RSA on x86 platforms. Most likely
x86-64 will be involved, and there will be another body of assembly code for it,
though.
Comment 32•20 years ago
|
||
*** Bug 240775 has been marked as a duplicate of this bug. ***
Comment 33•20 years ago
|
||
Folks, the reason this bug isn't progressing, is that the various
contributors (y'all) seem to be in disagreement about how to proceed.
The questions I asked in comment 26 are still unsettled.
As soon as consensus emerges about those 4 questions, I will proceed.
Whiteboard: awaiting consensus of interested parties
Reporter | ||
Comment 34•20 years ago
|
||
All I wanted to do originally was to add size information as an aid to the
linker. The question then arose whether this code could/should be shared
among various platforms which, in turn, depends on whether a given platform
handles arbitrary sections and the platform naming conventions. We then got
diverted into an optimization discussion.
As I see it, there are just a few real issues here. Assuming this is a good
idea then the first question is whether there should be separate files for
each platform. If so, then we're essentially finished. If not, then the next
question is how to mangle the files. The usual method is to use the C
preprocessor. That's straightforward too.
Someone has to decide on which approach to use.
The other stuff probably belongs in another bug.
Updated•19 years ago
|
QA Contact: bishakhabanerjee → jason.m.reid
Comment 35•19 years ago
|
||
3.11 is designated as a performance enhancement release, so I'd like to
resolve this bug once and for all in that release.
Target Milestone: 3.10 → 3.11
Updated•19 years ago
|
QA Contact: jason.m.reid → libraries
Comment 36•18 years ago
|
||
Any chance of getting a freebl/Makefile patch in? Perhaps in a separate bug?
I'm trying to get a 2.0 Firefox build on x64 and ran into this problem (and a few
others of course).
Updated•18 years ago
|
Assignee: nelson → nobody
Updated•18 years ago
|
Target Milestone: 3.11 → ---
Updated•2 years ago
|
Severity: normal → S3
You need to log in
before you can comment on or make changes to this bug.
Description
•