805604 - Efficient AES-GCM implementation that uses Intel's AES and PCLMULQDQ instructions (AES-NI) and the Advanced Vector Extension (AVX) architecture.

Reporter

Description

12 years ago

Attached patch: Proposed patch by Shay Gueron (obsolete) (deleted)

This bug report was split off from bug 373108. The attached patch was originally submitted by Shay Gueron of Intel in bug 373108 comment 55, with the following description:

Hello all - This patch offers an efficient implementation of AES-GCM. The implementation uses Intel's AES and PCLMULQDQ instructions (AES-NI), and is designed for the current (and future) Intel Core processors with the AVX instruction set (the 2nd, the 3rd, and the (future) 4th Generation Intel Core). The algorithms and methods that underlie this code are detailed in references [1-4]:

[1] Shay Gueron, Michael E. Kounavis: Intel® Carry-Less Multiplication Instruction and its Usage for Computing the GCM Mode (Rev. 2.01). http://software.intel.com/sites/default/files/article/165685/clmul-wp-rev-2.01-2012-09-21.pdf
[2] S. Gueron, M. E. Kounavis: Efficient Implementation of the Galois Counter Mode Using a Carry-less Multiplier and a Fast Reduction Algorithm. Information Processing Letters 110: 549–553 (2010).
[3] S. Gueron: AES Performance on the 2nd Generation Intel® Core™ Processor Family (to be posted) (2012).
[4] S. Gueron: Fast GHASH computations for speeding up AES-GCM (to be published) (2012).
The patch assumes NSS version 3.14 RC0.

The performance
===============
The performance of this patch was measured using the built-in bltest utility as follows:

bltest -E -m aes_gcm -p 10000 -o ct.txt -g 16 -b 16384

(measuring authenticated encryption performance for a 16KB buffer, repeated 10000 times)

                                        NSS 3.14 RC0   This patch    Speedup
----------------------------------------------------------------------------
Performance in GB/sec:
Core i7-2600K @3.4GHz ("Sandy Bridge")  0.057 GB/sec   1.17 GB/sec   20.5x
Core i7-3770 @3.4GHz ("Ivy Bridge")     0.059 GB/sec   1.19 GB/sec   20.2x
----------------------------------------------------------------------------
Performance in cycles per byte (C/B):
Core i7-2600K @3.4GHz ("Sandy Bridge")  55.42 C/B      2.70 C/B      20.5x
Core i7-3770 @3.4GHz ("Ivy Bridge")     53.67 C/B      2.66 C/B      20.2x
----------------------------------------------------------------------------

Comments:
=========
This implementation is designed for the Intel(R) Core(TM) processors that support the AES and PCLMULQDQ instructions, as well as the AVX instruction set. (It also works, with high performance, on the AMD Bulldozer processor, which has these instructions.) With this patch, AES-GCM becomes the fastest TLS authenticated encryption combination provided by the NSS library.

The patch integrates seamlessly with the existing NSS implementation: it selects the appropriate code path by checking the CPUID bits to detect AES-NI, PCLMULQDQ, and AVX support. For processors that do not support these instructions, the implementation falls back to the original GCM code. The code was tested on NSS version 3.14 RC0.

Thanks to Wan-Teh Chang, Bob Relyea, Brian Smith, and Eric Rescorla for preliminary discussions on the patch and its integration.
Developers and authors:
************************************************************************
Shay Gueron (1, 2), and Vlad Krasnov (1)
(1) Intel Corporation, Israel Development Center, Haifa, Israel
(2) University of Haifa, Israel
************************************************************************
Copyright(c) 2012, Intel Corp.
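The CPUID-based dispatch described above can be sketched roughly as follows. This is an illustration under stated assumptions, not the patch's code: it assumes a GCC or Clang toolchain (for `<cpuid.h>` and `__get_cpuid`), and the function name `can_use_hw_gcm` is hypothetical. It tests the same CPUID leaf 1 ECX bits that the rijndael.c code in this bug checks (bit 25 for AES, bit 1 for PCLMULQDQ, bit 28 for AVX), and additionally checks OSXSAVE and XCR0, because the AVX CPUID bit alone does not guarantee that the operating system saves the YMM register state.

```c
#include <stdint.h>

#if defined(__x86_64__) || defined(__i386__)
#include <cpuid.h>
#endif

/* CPUID leaf 1, ECX feature bits. */
#define ECX_PCLMULQDQ (1u << 1)
#define ECX_AESNI     (1u << 25)
#define ECX_OSXSAVE   (1u << 27)
#define ECX_AVX       (1u << 28)

/* Hypothetical helper: returns 1 if an AES-NI + PCLMULQDQ + AVX code path
 * may be used on this CPU and OS, 0 if the generic code must be used. */
int can_use_hw_gcm(void)
{
#if defined(__x86_64__) || defined(__i386__)
    unsigned int eax, ebx, ecx, edx;
    uint32_t xcr0_lo, xcr0_hi;
    const unsigned int need = ECX_PCLMULQDQ | ECX_AESNI | ECX_OSXSAVE | ECX_AVX;

    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;                          /* CPUID leaf 1 unavailable */
    if ((ecx & need) != need)
        return 0;                          /* a required feature is missing */

    /* XGETBV is only legal once OSXSAVE is confirmed (checked above).
     * XCR0 bit 1 = SSE (XMM) state, bit 2 = AVX (upper YMM) state. */
    __asm__ volatile("xgetbv" : "=a"(xcr0_lo), "=d"(xcr0_hi) : "c"(0));
    (void)xcr0_hi;
    return (xcr0_lo & 0x6u) == 0x6u;
#else
    return 0;                              /* not an x86 target */
#endif
}
```

A caller would run this once, for example while setting up the AES context as suggested for rijndael.c, and fall back to the generic GCM code when it returns 0.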

Attachment #675278 - Flags: review?(rrelyea)

Robert Relyea

Comment 1

12 years ago

OK, I'm still in the process of reviewing the actual code, but I do have some review comments and lots of questions, plus there are a few things that would require a new patch, so I thought I'd get my comments and questions out right now. I currently only have partial reviews of the code under USE_HW_AES in gcm.c and intel-gcm.s.

---------------------------------------------------------------------------------------------
gcm.c

First: the patch includes some changes that would back out changes wtc has made to the trunk of NSS. These obviously need to be reverted. That is, all the changes to gcm.c not inside the #ifdef USE_HW_AES blocks.

Now some general style comments:

Pretty much everything inside #ifdef USE_HW_AES is both self-contained and specific to both AES and Intel HW, so I think it makes sense for it to be in its own .c file (intel-aes-gcm.c or something like that). It also makes sense to move the new function declarations for them from gcm.h to intel-gcm.h. The upshot is that the new patch probably shouldn't have any changes to gcm.c (though I won't preclude it). The remaining comments are for the #ifdef USE_HW_AES portion of gcm.c, which I'm recommending go into a new file.

Minor nit. We probably should add AES to the names of these functions, since they are all AES specific (that is, if we ever supported, say, RC-5 GCM, we wouldn't be able to use these functions).

Minor nit. There are a lot of places where we use the magic number '16' in the code. I think it's fine, as long as we have a comment that says the block size is 16 to explain it. Replacing the magic number with an AES_BLOCKSIZE #define is fine as well.

NIT: We have a lot of places where an intermediate variable would add to readability, like

    if (gcmParams->ulIvLen%16) {
        for (j=0; j < gcmParams->ulIvLen%16; j++) ..
I'm pretty sure the compiler will notice we did the same calculation and collapse it, but I just feel more comfortable adding a temp:

    unsigned int remainder = gcmParams->ulIvLen%16;
    .
    .
    if (remainder) {
        for (j=0; j < remainder; j++ )...

Be sure to keep 'C' style declarations rather than C++ style here.

NIT: There are a lot of memory copies done with for loops rather than PORT_Memcpy (which NSS #defines to the OS memcpy, which is often a compiler intrinsic). PORT_Memcpy should be both faster and more readable than a for loop copying a byte at a time.

NIT: same comment as above, except replace PORT_Memcpy with PORT_Memset when setting memory.

Bug: intel_gcmTAG returns the whole tag, which is fine, but intel_GCM_EncryptUpdate only handles truncating the tag to a byte boundary rather than a bit boundary (see gcm_GetTag in gcm.c for an example). Same issue in intel_GCM_DecryptUpdate, except it is less of an issue there because we are comparing rather than returning, so missing the last 1-7 bits in the compare would only lead to a very rare false positive.

I still haven't reviewed the _mm_ compiler intrinsics, but other than that the rest of the C code is fine.

--------------------------------------------------------------------------------------------
rijndael.c

Only one style issue: at line 1005 in the existing code, we use the cpuid command to find out whether aes_hw is available. The command fills in a static global that is only visible to aes_InitContext. I think it would be better to just make that a file-wide static global and fill in all the globals from that one cpuid command, rather than replicating the code again in AES_InitContext() (actually, it may make more sense to move both tests to AES_InitContext and make sure they are run before aes_InitContext()). In any case, it should be relatively easy to make this call once for both functions and set all the cpuid variables we need.

Minor NIT: the new block needs to be indented to match the standard indent for the case statement.
The rest of the rijndael.c changes are fine.

-------------------------------------------------------------------------------------
The Makefile changes are fine.

-------------------------------------------------------------------------------------
intel-aes.s

I've barely scratched the surface here, and I have lots of questions:

First, there is a lot of use of 'v' instructions (vmovdqu versus movdqu, vpshufb versus pshufb, etc.). I'm not clear on the real reason for the 'v' version versus the non-'v' version. I know in some cases the 'v' version can handle 256-bit registers (ymm0 versus xmm0), but I don't think we are using any 256-bit registers in this code. I also know some 'v' versions have non-destructive forms (A op B put into C rather than A op B put into B), but I see a lot of use of 'v' instructions where the 2nd and 3rd operands are the same (so they function the same as the non-'v' instructions). Is there some sort of intrinsic performance advantage to the 'v' instructions?

In intel_gcmTag, we use the 'u' (unaligned, versus 'a' or aligned) form of the move instructions. The data we operate on are elements in our context, so we could arrange for those elements to be properly aligned. If we did so, would there be a performance advantage to using the 'a' form for these?

In GFMUL, it looks like we use the pclmulqdq instruction rather than shifts to do our mod reduction. I was unable to find an example of this in your paper, or a description of how it works. I'm working on a commented version of this to help explain how it works in case someone needs to modify it. Your paper, BTW, was extremely useful in helping me understand how the multiply function worked.

I still have the bulk of the .s file to review.
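To make the byte-versus-bit truncation point concrete, here is a minimal sketch of taking the leftmost t bits of a 128-bit GCM tag (MSB_t in SP 800-38D terms). It is not NSS's gcm_GetTag; the name `gcm_truncate_tag` is hypothetical. Bits are taken most-significant-first within each byte, and any unused low-order bits of the last output byte are cleared.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define GCM_TAG_BYTES 16

/* Hypothetical helper: copies the leftmost tag_bits bits of full_tag into
 * out and returns the number of bytes written, i.e. ceil(tag_bits / 8). */
size_t gcm_truncate_tag(uint8_t *out, const uint8_t full_tag[GCM_TAG_BYTES],
                        unsigned int tag_bits)
{
    size_t nbytes = (tag_bits + 7) / 8;
    unsigned int extra = tag_bits % 8;      /* bits used in the last byte */

    assert(tag_bits <= 8 * GCM_TAG_BYTES);
    memcpy(out, full_tag, nbytes);
    if (extra != 0) {
        /* keep the 'extra' most-significant bits, clear the rest */
        out[nbytes - 1] &= (uint8_t)(0xFFu << (8 - extra));
    }
    return nbytes;
}
```

On the decrypt side, one way to check all t bits is to apply the same masking to both the computed and the received tag before comparing ceil(t/8) bytes.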

Shay Gueron

Assignee

Comment 2

12 years ago

Bob,

Thanks for the detailed review. Here are some comments and answers.

>>> gcm.c
>>> First: The patch includes some changes that will back out changes wtc has
>>> made to the trunk of NSS.

The patch was built on the latest available version (as declared). But it is no problem to build it on top of another version and not override changes made by Wan-Teh.

The nits: no problem to modify.

>>> Bug: intel_gcmTAG returns the whole tag, which is fine, but
>>> intel_GCM_EncryptUpdate only
>>> handles truncating the tag to a byte boundary rather than a bit boundary
>>> (see gcm_GetTag in gcm.c for an example).
>>> Same issue in intel_GCM_DecryptUpdate, except there is less of an issue because
>>> we are
>>> comparing rather than returning, so missing the last 1-7 bits in the compare would
>>> only lead to a very rare false positive.

No, this is not a bug. The NIST spec (SP 800-38D) states on page 9 that "The bit length of the tag, denoted t, is a security parameter, as discussed in Appendix B. In general, t may be any one of the following five values: 128, 120, 112, 104, or 96. For certain applications, t may be 64 or 32..."

So truncating the tag to a bit length that is not divisible by 8 does not adhere to the spec, and thus does not need to be supported (as is the case in the discussed patch). Furthermore, on that page of the spec it is stated:

"An implementation shall not support values for t that are different from the seven choices in the preceding paragraph."

In this sense, I should be even stricter and not allow truncation to "any" byte length (only to the 7 allowed cases). I believe that the NSS implementation should do this as well.
>>> rijndael.c

Style and nits: no problem to modify.

>>> intel-aes.s
>>> I've barely scratched the surface here, and I have lots of questions:
>>> First there are lots of use of 'v' instructions
>>> (vmovdqu versus movdqu, vpshufb versus pshufb,

The AVX instructions are used for their non-destructive destination property. In the code, there are many instructions where the 2nd and 3rd operands are NOT the same, which saves moves and considerably improves performance (by ~10%). As for the other cases, where the second and third operands are the same: there is a performance penalty associated with mixing SSE and AVX instructions, and to avoid it, all of them are coded as AVX.

>>> in intel_gcmTag, we use the 'u' (unaligned, verses 'a' or aligned) form of
>>> the move instructions.

There is no difference in the performance of aligned and unaligned moves.

>>> in GFMUL, it looks like we use the pclmulqdq instruction rather than shifts to
>>> do our mod reduction.
>>> I was unable to find an example of this in your paper, or a description for
>>> how it works. I'm working on a commented version of this to help explain how it
>>> works in case
>>> someone needs to modify it. Your paper, BTW, was extremely useful in helping
>>> me understand how the multiply function worked.

This is a new reduction method, and it will be described and proved in my coming paper (Ref. [4] cited in the patch body). It uses a very short sequence of instructions, and its efficiency increases with the performance of the pclmulqdq instruction.
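For anyone following GFMUL, a slow bit-by-bit reference multiply can serve as an independent cross-check. The sketch below follows Algorithm 1 (multiplication of blocks) in SP 800-38D, using GCM's reflected bit order (bit 0 is the most significant bit of byte 0) and R = 11100001 || 0^120. It is deliberately not the PCLMULQDQ-based method or the new reduction described above; the name `gf128_mul` is hypothetical.

```c
#include <stdint.h>

static uint64_t load_be64(const uint8_t *p)
{
    uint64_t v = 0;
    for (int i = 0; i < 8; i++)
        v = (v << 8) | p[i];
    return v;
}

static void store_be64(uint8_t *p, uint64_t v)
{
    for (int i = 7; i >= 0; i--) {
        p[i] = (uint8_t)v;
        v >>= 8;
    }
}

/* Hypothetical reference: z = x * y in GF(2^128), GCM bit order.
 * z may alias x or y, since z is written only at the end. */
void gf128_mul(uint8_t z[16], const uint8_t x[16], const uint8_t y[16])
{
    uint64_t zh = 0, zl = 0;
    uint64_t vh = load_be64(y), vl = load_be64(y + 8);

    for (int i = 0; i < 128; i++) {
        if ((x[i / 8] >> (7 - i % 8)) & 1) {   /* bit x_i set: Z ^= V */
            zh ^= vh;
            zl ^= vl;
        }
        uint64_t lsb = vl & 1;                 /* bit 127 of V */
        vl = (vl >> 1) | (vh << 63);           /* V >> 1 across 128 bits */
        vh >>= 1;
        if (lsb)
            vh ^= 0xE100000000000000ULL;       /* reduce by R */
    }
    store_be64(z, zh);
    store_be64(z + 8, zl);
}
```

Because it processes one bit per iteration it is far too slow for production use; its value is as an oracle when testing an optimized GFMUL.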

Robert Relyea

Comment 3

12 years ago

> No - this is not a bug. The NIST spec (SP 800-38D) states on page 9 that "The bit length of the tag, denoted t, is a security parameter, as discussed in Appendix B. In general, t may be any one of the following five values: 128, 120, 112, 104, or 96. For certain applications, t may be 64 or 32..."
> So, truncation the tag to a bit-length that is not divisible by 8 is not adhering to the spec,
> and thus does not need to be supported (as is the case in the discussed patch). Furthermore,
> on that page of the spec it is stated:
>
> "An implementation shall not support values for t that are different from the seven choices
> in the preceding paragraph."
>
> In this sense, I should even go more strictly and not allow truncation to "any" byte-length
> (only to the 7 allowed caes).

We're implementing to the PKCS #11 v2.30 draft 7. The draft specifies ulTagBits to be 0 to 128 bits. Unfortunately, the reference for the GCM description is [GCM], and it's not clear whether it means McGrew/Viega or the NIST document. If it's the NIST document, we should just disallow the unsupported lengths.

> I believe that the NSS implementation should do this as well.

The difference is why I flagged the issue. We shouldn't have different behavior between the generic NSS GCM and the accelerated version.

bob

Shay Gueron

Assignee

Comment 4

12 years ago

Indeed, there is a discrepancy between the NIST spec [NIST] and the PKCS spec [PKCS#11]. Here is my analysis of the situation:

1. The reference to GCM in [PKCS#11] points to the McGrew/Viega paper from 2004. This paper was an (initial) proposal.
2. While NIST was soliciting comments, it was pointed out that truncating the MAC tag to a short tag has serious security implications [Ferguson].
3. This comment [Ferguson] is cited in the NIST spec [NIST] from 2007. While the restriction of the truncation to only 7 sizes is not motivated in [NIST], I assume that it was influenced by the attack in [Ferguson].
4. Allowing general truncation, to any bit length from 0 to 128, would include lengths that have been shown to be undesirable.

I therefore tend to suggest converging to the NIST spec and supporting truncation only to the 7 allowed values: 128, 120, 112, 104, 96, 64, 32. Of course, I agree that whatever is decided should be consistent in both NSS implementations.

Shay

[NIST] NIST Special Publication 800-38D (2007)
[PKCS#11] PKCS #11 v2.30 draft 7 (2009)
[Ferguson] Ferguson, N., Authentication Weaknesses in GCM, Natl. Inst. Stand. Technol. [Web page], http://csrc.nist.gov/groups/ST/toolkit/BCM/documents/comments/CWC-GCM/Ferguson2.pdf (2005)

Robert Relyea

Comment 5

•

12 years ago

Let's just go with the truncation. I'll add a separate patch to make sure the args are full bytes. That is the simplest code, and most likely configuration, given the NIST recommendation. bob
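The two policies discussed above (byte-granular tag lengths, as agreed here, versus the stricter SP 800-38D set of seven values, where 64 and 32 are meant only for certain applications) can be expressed as a small validation function. This is a sketch only: the function name, the `strict` switch, and the choice to reject a 0-bit tag are assumptions, not NSS behavior.

```c
#include <stddef.h>

/* Hypothetical check of a requested tag length, in bits.
 * strict == 0: any whole number of bytes from 1 to 16 (8..128 bits).
 * strict != 0: only the seven lengths listed in NIST SP 800-38D.
 * Returns 1 if acceptable, 0 otherwise. */
int gcm_tag_bits_ok(unsigned long tag_bits, int strict)
{
    static const unsigned long allowed[] = { 128, 120, 112, 104, 96, 64, 32 };
    size_t i;

    if (!strict)
        return tag_bits != 0 && tag_bits <= 128 && tag_bits % 8 == 0;
    for (i = 0; i < sizeof(allowed) / sizeof(allowed[0]); i++) {
        if (tag_bits == allowed[i])
            return 1;
    }
    return 0;
}
```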

Robert Relyea

Comment 6

12 years ago

Comment on attachment 675278 [details] [diff] [review]
Proposed patch by Shay Gueron

r-

Shay is already working on a new patch to address the comments I already made in this bug, as well as the following, which I've sent him:

I only found one bug and one potential problem in the .s file.

The potential problem is the vpxor (PT), TMP1, TMP0 (at line 888 in intel_gcmENC, near the .LLast 4 label). You have a comment that says "We assume that although the last block is partial, we can still read the whole block". The potential problem is that this buffer comes all the way from the user. The user may have passed us a buffer that ends right at the end of a page. This is likely to be extremely rare, which means that when we hit it we will get unexpected crashes that could be difficult to debug. The write to (CT) is fine because, as you said in your comment, we have to have space for the authentication block, so we know CT can hold at least another 128 bits (16 bytes) of additional data. I think we need to bite the bullet and handle loading the right number of bytes of (PT) before the xor.

The bug is the converse case in intel_gcmDEC (line 1257, near .LDECLast3). Here the vpxor (CT) is fine, since we know that CT includes the auth data. The problem is that we can't write past the end of PT. In some cases the buffer that is passed to us is exactly what was allocated, and writing past its end could trash the memory allocator's data structures. The other problematic case is when CT and PT are the same (it is permissible to pass the same pointer in). In this case we will trash our authentication data before we actually use it.

The rest of the .s file looks fine to me. I also asked him for some specific comments in the .s file to guide future reviewers/modifiers of the code.

bob
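The fix requested for intel_gcmENC amounts to touching exactly the remaining bytes of the last partial block. A minimal C sketch, assuming the keystream block (the encrypted counter) has already been computed and that `ghash_in` is a separate 16-byte buffer (all names here are hypothetical): only `len` bytes of plaintext are read, only `len` bytes of output are written, and the ciphertext is zero-padded into a full block for GHASH, as GCM specifies for a final partial block. Since all plaintext bytes are read before any output byte is written, this also works when `ct` and `pt` point to the same buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AES_BLOCK_SIZE 16

/* Hypothetical helper: encrypts the final partial block (len < 16) without
 * reading or writing past the end of pt/ct. ghash_in receives the
 * zero-padded ciphertext block and must not overlap pt or ct. */
void gcm_encrypt_last_partial(uint8_t *ct, const uint8_t *pt, size_t len,
                              const uint8_t keystream[AES_BLOCK_SIZE],
                              uint8_t ghash_in[AES_BLOCK_SIZE])
{
    size_t i;

    assert(len < AES_BLOCK_SIZE);
    memset(ghash_in, 0, AES_BLOCK_SIZE);      /* zero padding for GHASH */
    for (i = 0; i < len; i++)
        ghash_in[i] = pt[i] ^ keystream[i];   /* reads exactly len bytes of pt */
    memcpy(ct, ghash_in, len);                /* writes exactly len bytes of ct */
}
```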

Attachment #675278 - Flags: review?(rrelyea) → review-

Wan-Teh Chang

Reporter

Updated

12 years ago

Target Milestone: 3.14.1 → 3.14.2

Shay Gueron

Assignee

Comment 7

12 years ago

Attached patch: Efficient AES-GCM implementation that uses Intel's AES and PCLMULQDQ instructions (AES-NI) and the Advanced Vector Extension (AVX) architecture --- Rev. 2 (obsolete) (deleted)

Hello everyone,

Here is Rev. 2 of the patch, taking in all of Bob's comments. The changes (as requested):

1. Patch applied to trunk (HEAD).
2. Moved all the definitions to intel-gcm.h.
3. Moved all C code to a new file, intel-gcm-wrap.c.
4. Changed the names of the functions to aes_gcm (to avoid confusion).
5. Replaced 16 with AES_BLOCK_SIZE.
6. Added intermediate variables to improve readability.
7. Now using PORT_Memcpy and PORT_Memset instead of loops.
8. NOT CHANGED: the tag is returned at byte granularity, as before. Note: our discussions led to this decision, and the "non AES-NI" NSS AES-GCM code will be patched to behave similarly.
9. rijndael.c: the CPUID check is now global, and is performed once.
10. Indents fixed.
11. Partial blocks are now copied at the exact boundary of the buffer; the code no longer assumes there is more space.
12. Regarding the -msse4 flag: now using the -mssse3 flag.

Please review this version and see if it answers all the concerns.

Thanks,
Shay
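Items 5-7 (AES_BLOCK_SIZE, intermediate variables, memcpy/memset instead of loops) can be illustrated on the IV-hashing path quoted in the first review. For an IV that is not 96 bits long, SP 800-38D derives the pre-counter block as J0 = GHASH(IV || 0^(s+64) || [len(IV)]_64), so the code has to zero-pad the last partial IV block and build a final length block. The sketch uses plain memcpy/memset (which, per the review, is what PORT_Memcpy/PORT_Memset map to); the function name is hypothetical and GHASH itself is left out.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AES_BLOCK_SIZE 16

/* Hypothetical helper preparing the last GHASH inputs for a non-96-bit IV:
 *   tail      - the final partial IV block, zero-padded (meaningful only
 *               when the function returns 1)
 *   len_block - 0^64 || [len(IV) in bits]_64, big-endian
 * Returns 1 if iv_len is not a multiple of the block size, else 0. */
int gcm_iv_final_blocks(const uint8_t *iv, size_t iv_len,
                        uint8_t tail[AES_BLOCK_SIZE],
                        uint8_t len_block[AES_BLOCK_SIZE])
{
    size_t remainder = iv_len % AES_BLOCK_SIZE;   /* computed once */
    uint64_t iv_bits = (uint64_t)iv_len * 8;
    int i;

    memset(tail, 0, AES_BLOCK_SIZE);
    if (remainder)
        memcpy(tail, iv + (iv_len - remainder), remainder);

    memset(len_block, 0, AES_BLOCK_SIZE);
    for (i = 0; i < 8; i++)
        len_block[AES_BLOCK_SIZE - 1 - i] = (uint8_t)(iv_bits >> (8 * i));
    return remainder != 0;
}
```

With these two blocks in hand, J0 would be the GHASH of the full IV blocks, then the tail block (if any), then len_block.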

Attachment #686679 - Flags: review+

Robert Relyea

Comment 8

12 years ago

Comment on attachment 686679 [details] [diff] [review]
Efficient AES-GCM implementation that uses Intel's AES and PCLMULQDQ instructions (AES-NI) and the Advanced Vector Extension (AVX) architecture --- Rev. 2

Make sure it gets into my review queue.

Attachment #686679 - Flags: review+ → review?(rrelyea)

Robert Relyea

Comment 9

12 years ago

Comment on attachment 686679 [details] [diff] [review]
Efficient AES-GCM implementation that uses Intel's AES and PCLMULQDQ instructions (AES-NI) and the Advanced Vector Extension (AVX) architecture --- Rev. 2

r+ rrelyea...

I'm going to r+ this, but there is still a caveat. We can't add -msse3 in general either. We can add it specifically for the intel-aes.s file, because we don't call that .o unless we've already verified we have sse3 (or actually sse4).

bob

Attachment #686679 - Flags: review?(rrelyea) → review+

Robert Relyea

Comment 10

12 years ago

Attached patch: GCM patch as checked in. (obsolete) (deleted)

The changes to the r+ patch were:
1) fixed incorrect addressing of a fixed buffer, which led to wrong data being hashed in intel-gcm-wrap.c
2) added missing jmp statements at the bottom of assembler loops in intel-gcm.s
3) gcm_decrypt was hashing the decrypted data, not the encrypted data, when dealing with the partial block. That is fixed.
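Fix 3 above is easy to get wrong, because in GCM the GHASH input is always the ciphertext, for decryption as well as encryption. A sketch of the decrypt-side final partial block (names hypothetical, keystream assumed precomputed): the zero-padded ciphertext is captured for GHASH before any plaintext is written, which also keeps in-place decryption (pt == ct) from destroying data that still has to be authenticated.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define AES_BLOCK_SIZE 16

/* Hypothetical helper: decrypts the final partial block (len < 16).
 * ghash_in receives the zero-padded *ciphertext* block and must not
 * overlap pt or ct. Writes exactly len bytes of pt. */
void gcm_decrypt_last_partial(uint8_t *pt, const uint8_t *ct, size_t len,
                              const uint8_t keystream[AES_BLOCK_SIZE],
                              uint8_t ghash_in[AES_BLOCK_SIZE])
{
    size_t i;

    assert(len < AES_BLOCK_SIZE);
    memset(ghash_in, 0, AES_BLOCK_SIZE);
    memcpy(ghash_in, ct, len);                 /* GHASH input = ciphertext */
    for (i = 0; i < len; i++)
        pt[i] = ghash_in[i] ^ keystream[i];    /* decrypt from the saved copy */
}
```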

Robert Relyea

Comment 11

12 years ago

Checking in Makefile;
/cvsroot/mozilla/security/nss/lib/freebl/Makefile,v <-- Makefile
new revision: 1.124; previous revision: 1.123
done
RCS file: /cvsroot/mozilla/security/nss/lib/freebl/intel-gcm.h,v
done
Checking in intel-gcm.h;
/cvsroot/mozilla/security/nss/lib/freebl/intel-gcm.h,v <-- intel-gcm.h
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/security/nss/lib/freebl/intel-gcm.s,v
done
Checking in intel-gcm.s;
/cvsroot/mozilla/security/nss/lib/freebl/intel-gcm.s,v <-- intel-gcm.s
initial revision: 1.1
done
RCS file: /cvsroot/mozilla/security/nss/lib/freebl/intel-gcm-wrap.c,v
done
Checking in intel-gcm-wrap.c;
/cvsroot/mozilla/security/nss/lib/freebl/intel-gcm-wrap.c,v <-- intel-gcm-wrap.c
initial revision: 1.1
done
Checking in manifest.mn;
/cvsroot/mozilla/security/nss/lib/freebl/manifest.mn,v <-- manifest.mn
new revision: 1.66; previous revision: 1.65
done
Checking in rijndael.c;
/cvsroot/mozilla/security/nss/lib/freebl/rijndael.c,v <-- rijndael.c
new revision: 1.29; previous revision: 1.28
done

Comment 21

12 years ago

Comment on attachment 702474 [details] [diff] [review]
nspr configure test for AVX support

This approach doesn't seem to help. I had assumed that we're able to set variables during NSPR configure and reuse such variables in NSS makefiles, but it seems I'm wrong.

I found a changelog file for the GNU assembler, which says that support for AVX was added in version 2.19. I propose that the NSS makefile be enhanced to check for that version number, and if the available assembler is older, disable the use of the new optimization.
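The proposed test reduces to "major > 2, or major == 2 and minor >= 19", which is what the later Makefile patch in this bug computes with shell comparisons. A C rendering of that comparison, for clarity only: it assumes the version number has already been isolated as a string such as "2.22" or "2.17.50.0.6" (extracting it from `as --version` output is outside the sketch), and the function name is hypothetical.

```c
#include <stdio.h>

/* Hypothetical helper: returns 1 if a "major.minor[...]" version string is
 * at least 2.19 (the GNU as release that, per the changelog mentioned
 * above, added AVX support), 0 if it is older, -1 if it cannot be parsed. */
int gas_version_has_avx(const char *version)
{
    int major, minor;

    if (sscanf(version, "%d.%d", &major, &minor) != 2)
        return -1;
    return major > 2 || (major == 2 && minor >= 19);
}
```

Comparing the components numerically avoids the pitfall of a lexical string comparison, under which "2.9" would sort after "2.19".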

Attachment #702474 - Flags: feedback?(wtc)

Kai Engert (:KaiE:)

Updated

12 years ago

Attachment #702474 - Attachment is obsolete: true

Robert Relyea

Comment 22

Wan-Teh Chang

Reporter

Updated

12 years ago

Attachment #706820 - Flags: checked-in+

Wan-Teh Chang

Reporter

12 years ago

Attached patch: Use a GNU make feature to add an extra compiler or assembler flag to just one source file (obsolete) (deleted)

In bug 835050 comment 9, Jan Beich suggested a succinct way to add an extra compiler or assembler flag to just one source file. I verified that the technique works. I am using GNU Make 3.81. Since I am quite familiar with GNU Make but didn't know about this trick, I suspect it is a new GNU Make feature.

This patch also fixes two minor issues:
- Add or remove white space in variable definitions.
- Fix a typo (-msse4 vs. -mssse3) in a comment.

Attachment #707229 - Flags: superreview?(rrelyea)

Attachment #707229 - Flags: review?(kaie)

Kai Engert (:KaiE:)

Comment 30

12 years ago

Comment on attachment 707229 [details] [diff] [review] Use a GNU make feature to add an extra compiler or assembler flag to just one source file r=kaie

Attachment #707229 - Flags: review?(kaie) → review+

Wan-Teh Chang

Reporter

Comment 31

12 years ago

Comment on attachment 707229 [details] [diff] [review]
Use a GNU make feature to add an extra compiler or assembler flag to just one source file

The GNU Make feature I used in this patch is called target-specific variable values. I found it documented in the GNU Make 3.77 manual: ftp://ftp.gnu.org/old-gnu/Manuals/make-3.77/html_node/make_69.html#SEC68

GNU Make 3.77 was released in May 1998, so it is safe to use this feature.

Patch checked in on the NSS trunk (NSS 3.14.2).

Checking in Makefile;
/cvsroot/mozilla/security/nss/lib/freebl/Makefile,v <-- Makefile
new revision: 1.126; previous revision: 1.125
done

Attachment #707229 - Flags: checked-in+

Robert Relyea

Comment 32

12 years ago

Checking to update the license header...
/cvsroot/mozilla/security/nss/lib/freebl/intel-gcm-wrap.c,v <-- intel-gcm-wrap.c
new revision: 1.2; previous revision: 1.1
done
Checking in intel-gcm.h;
/cvsroot/mozilla/security/nss/lib/freebl/intel-gcm.h,v <-- intel-gcm.h
new revision: 1.2; previous revision: 1.1
done
Checking in intel-gcm.s;
/cvsroot/mozilla/security/nss/lib/freebl/intel-gcm.s,v <-- intel-gcm.s
new revision: 1.2; previous revision: 1.1
done

Robert Relyea

Comment 33

12 years ago

Attached patch: license patch [checked in] (deleted)

Robert Relyea

Comment 34

12 years ago

Comment on attachment 707229 [details] [diff] [review] Use a GNU make feature to add an extra compiler or assembler flag to just one source file r+ like it!

Attachment #707229 - Flags: superreview?(rrelyea) → superreview+

Shay Gueron

Assignee

Comment 35

12 years ago

" "

Shay Gueron

Assignee

Comment 36

12 years ago

The issue has been resolved by Bob/WTC.

Franziskus Kiefer [:franziskus]

Updated

8 years ago

Attachment #675278 - Attachment is obsolete: true

Franziskus Kiefer [:franziskus]

Updated

8 years ago

Attachment #686679 - Attachment is obsolete: true

Franziskus Kiefer [:franziskus]

Updated

8 years ago

Attachment #702114 - Attachment is obsolete: true

Comment hidden (obsolete)

Comment on attachment 702549 [details] [diff] [review]
bustage fix: On Linux, only optimize if installed assembler is >= 2.19

>Index: mozilla/security/nss/lib/freebl/Makefile
>===================================================================
>RCS file: /cvsroot/mozilla/security/nss/lib/freebl/Makefile,v
>retrieving revision 1.124
>diff -u -u -r1.124 Makefile
>--- mozilla/security/nss/lib/freebl/Makefile 15 Jan 2013 02:36:11 -0000 1.124
>+++ mozilla/security/nss/lib/freebl/Makefile 15 Jan 2013 23:17:34 -0000
>@@ -187,9 +187,12 @@
> # DEFINES += -DMPI_AMD64_ADD
> # comment the next two lines to turn off intel HW accelleration
> DEFINES += -DUSE_HW_AES
>- ASFILES += intel-aes.s intel-gcm.s
>- EXTRA_SRCS += intel-gcm-wrap.c
>- INTEL_GCM=1
>+ ASFILES += intel-aes.s
>+ ifeq ($(HAVE_AS_AVX),1)
>+ ASFILES += intel-gcm.s
>+ EXTRA_SRCS += intel-gcm-wrap.c
>+ INTEL_GCM=1
>+ endif
> MPI_SRCS += mpi_amd64.c mp_comba.c
> endif
> ifeq ($(CPU_ARCH),x86)
>@@ -444,9 +447,12 @@
> DEFINES += -DNSS_USE_COMBA -DMP_CHAR_STORE_SLOW -DMP_IS_LITTLE_ENDIAN
> # comment the next two lines to turn off intel HW accelleration
> DEFINES += -DUSE_HW_AES
>- ASFILES += intel-aes.s intel-gcm.s
>- EXTRA_SRCS += intel-gcm-wrap.c
>- INTEL_GCM=1
>+ ASFILES += intel-aes.s
>+ ifeq ($(HAVE_AS_AVX),1)
>+ ASFILES += intel-gcm.s
>+ EXTRA_SRCS += intel-gcm-wrap.c
>+ INTEL_GCM=1
>+ endif
> MPI_SRCS += mpi_amd64.c
> else
> # Solaris x86
>Index: mozilla/security/nss/lib/freebl/config.mk
>===================================================================
>RCS file: /cvsroot/mozilla/security/nss/lib/freebl/config.mk,v
>retrieving revision 1.29
>diff -u -u -r1.29 config.mk
>--- mozilla/security/nss/lib/freebl/config.mk 14 Nov 2012 01:14:10 -0000 1.29
>+++ mozilla/security/nss/lib/freebl/config.mk 15 Jan 2013 23:17:34 -0000
>@@ -94,4 +94,27 @@
> EXTRA_SHARED_LIBS += -dylib_file @executable_path/libplc4.dylib:$(DIST)/lib/libplc4.dylib -dylib_file @executable_path/libplds4.dylib:$(DIST)/lib/libplds4.dylib
> endif
>
>+ifeq ($(OS_TARGET),Linux)
>+ifeq ($(CPU_ARCH),x86_64)
>+
>+AS_VER_MAJOR := $(shell as --version | head -1 | sed 's/^.* version //' | cut -f1 -d.)
>+AS_VER_MINOR := $(shell as --version | head -1 | sed 's/^.* version //' | cut -f2 -d.)
>+
>+ASMAJ_EQ_2 := $(shell [ $(AS_VER_MAJOR) -eq 2 ] && echo true)
>+ASMAJ_GT_2 := $(shell [ $(AS_VER_MAJOR) -gt 2 ] && echo true)
>+ASMIN_GT_18 := $(shell [ $(AS_VER_MINOR) -gt 18 ] && echo true)
>+
>+ifeq ($(ASMAJ_GT_2),true)
>+HAVE_AS_AVX=1
>+else
>+ifeq ($(ASMAJ_EQ_2),true)
>+ifeq ($(ASMIN_GT_18),true)
>+HAVE_AS_AVX=1
>+endif
>+endif
>+endif
>+
>+endif # x86_64
>+endif # Linux
>+
> endif
>Index: mozilla/security/nss/lib/freebl/rijndael.c
>===================================================================
>RCS file: /cvsroot/mozilla/security/nss/lib/freebl/rijndael.c,v
>retrieving revision 1.29
>diff -u -u -r1.29 rijndael.c
>--- mozilla/security/nss/lib/freebl/rijndael.c 15 Jan 2013 02:36:11 -0000 1.29
>+++ mozilla/security/nss/lib/freebl/rijndael.c 15 Jan 2013 23:17:34 -0000
>@@ -1014,7 +1014,11 @@
> freebl_cpuid(1, &eax, &ebx, &ecx, &edx);
> has_intel_aes = (ecx & (1 << 25)) != 0 ? 1 : -1;
> has_intel_clmul = (ecx & (1 << 1)) != 0 ? 1 : -1;
>+#ifdef HAVE_AS_AVX
> has_intel_avx = (ecx & (1 << 28)) != 0 ? 1 : -1;
>+#else
>+ has_intel_avx = -1;
>+#endif
> } else {
> has_intel_aes = -1;
> has_intel_avx = -1;
>@@ -1127,7 +1131,7 @@
> cx->isBlock = PR_FALSE;
> break;
> case NSS_AES_GCM:
>-#if USE_HW_AES
>+#if defined(USE_HW_AES) && defined(HAVE_AS_AVX)
> if(use_hw_gcm) {
> cx->worker_cx = intel_AES_GCM_CreateContext(cx, cx->worker, iv, blocksize);
> cx->worker = (freeblCipherFunc)

Franziskus Kiefer [:franziskus]

Updated

8 years ago

Attachment #706820 - Attachment is obsolete: true

Attachment #706820 - Flags: feedback?(shay.gueron)

Franziskus Kiefer [:franziskus]

Updated

8 years ago

Attachment #707229 - Attachment is obsolete: true

Proposed patch by Shay Gueron | 12 years ago | Wan-Teh Chang | (deleted), patch | rrelyea: review-
Efficient AES-GCM implementation that uses Intel's AES and PCLMULQDQ instructions (AES-NI) and the Advanced Vector Extension (AVX) architecture --- Rev. 2 | 12 years ago | Shay Gueron | (deleted), patch | rrelyea: review+
GCM patch as checked in. | 12 years ago | Robert Relyea | (deleted), patch
nspr configure test for AVX support | 12 years ago | Kai Engert (:KaiE:) | (deleted), patch
bustage fix: On Linux, only optimize if installed assembler is >= 2.19 | 12 years ago | Kai Engert (:KaiE:) | (deleted), patch | rrelyea: review-
Work around the problem with Clang's integrated assembler (bug 835050) | 12 years ago | Wan-Teh Chang | (deleted), patch | KaiE: review+, rrelyea: superreview+, wtc: checked-in+
Use a GNU make feature to add an extra compiler or assembler flag to just one source file | 12 years ago | Wan-Teh Chang | (deleted), patch | KaiE: review+, rrelyea: superreview+, wtc: checked-in+
license patch [checked in] | 12 years ago | Robert Relyea | (deleted), patch