freebl: POWER poly1305 mac vector acceleration
Categories
(NSS :: Libraries, enhancement, P5)
Tracking
(Not tracked)
People
(Reporter: johnjmar, Unassigned, NeedInfo)
References
(Blocks 1 open bug)
Details
(Whiteboard: [nss-fx])
Attachments
(3 files, 3 obsolete files)
User Agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:72.0) Gecko/20100101 Firefox/72.0
Expected results:
Would like to reach community to help improve the performance of Poly1305
using POWER8 and POWER9 ISA enhancements, and to provide proof that
achieved performance of Poly1305 is close to optimal for the platform
(i.e matching or beating OpenSSL performance *)
Optimized assembly implementations in the Cryptogams repository [1] may serve
as useful reference. C-based intrinsic implementations are also acceptable
given prior conditions (*).
Financial bounty upon completion, community acceptance of patches, and reviewer
input.
Moreover, free access to a ppc64le VM or direct Jenkins CI account can be
obtained at OSU Open Source Lab: https://osuosl.org/services/powerdev/
You can list me (johnjmar@linux.vnet.ibm.com) as the IBM Advocate.
Updated•5 years ago
|
Updated•5 years ago
|
Comment hidden (off-topic) |
Comment 2•4 years ago
|
||
Hi. The original submitter of this bug, John Martinez, left IBM. For now, I'll be the designated contact from IBM.
Comment 3•4 years ago
|
||
Note that we decided to bring support for vectorization for the IBM platforms directly via HACL*.
B. Beurdouche within : https://bugzilla.mozilla.org/show_bug.cgi?id=1613235#c28
I don't think we would take the assembly for Poly1305, but fortunately no patch was provided and there is vectorization support in progress on the HACL* side, so I am sure that this code will be pretty fast...
related: https://github.com/project-everest/hacl-star/issues/404
@gcwilson, it looks like this issue (and the related bounty: https://www.bountysource.com/issues/88058247-freebl-power-poly1305-mac-vector-acceleration) are superseded by
issue: https://github.com/project-everest/hacl-star/issues/404
bounty: https://www.bountysource.com/issues/96585648-enable-build-system-and-optimize-libintvector-h-for-ppc64le-linux
If so, then this issue (and the related bounty) should be closable.
(@aoeuh, you may take a look at the hacl-star bounty, as you know, sometimes bounties are started but never finished, so possibly you can still solve-and-claim the hacl-star bounty.)
Hi Benjamin,
Bringing vectorized implementations from HACL* is good, specifically with the upcoming ppc64le support of libintvector.h would bring a considerable enhancement to the Poyl1305 mac on ppc64le. However, HACL* vectorization via libintvector.h uses one template to vectorize algorithms for all supported architectures using pre-defined C intrinsics which doesn't yield the same performance margin among supported architectures because the way out-of-order executution (OoOE) working differs from one architecture to another. There are cases in HACL* vectorized template where data alternates between vector and scalar which hold the performance a bit back on powerpc architecture. I made a benchmark of assembly implementation and HACL* vectorization on POWER9 to determine the performance gap, I got 1.45 cycles per byte for Poly1305 update function of native implementation and 3.88 cycles per byte for Poly1305 update function of HACL* vectorization. With that said, could you reconsider implementing Poly1305 mac the same way previous algorithms have been optimized to get the optimal performance on PowerPC architecture?
Comment 7•4 years ago
|
||
Hi Mamone,
The current goal of NSS for its low level crypto is to focus on memory safety, functional correctness and secret independence.
To achieve these goals, we are trying to remove as much code as possible that is handwritten. Especially, we try to reduce the duplication of code between architectures and focus on having a very limited number of implementations which can provide stronger guarantees and can be audited more easily and frequently than the current code. In that process, performance is not something we are looking at much, except to make sure we don't cause significant regressions.
For the specific case that you are mentioning, we already have high quality code for it in HACL* and the vectorization via libintvector.h would provide unoptimal, yet good enough performance for NSS. However, I think it would be interesting for you and George to discuss your observations with my HACL* co-authors [0] and see if they could generate a different version of the code which could directly leverage calls to the power assembly and intrinsics without the "manual" code duplication that we have in the Chacha20Poly1305 case [1].
[0] hacl-star-maintainers@lists.gforge.inria.fr
[1] https://searchfox.org/nss/source/lib/freebl/chacha20poly1305-ppc.c
Updated•4 years ago
|
Comment 8•3 years ago
|
||
Hi Benjamin,
I am from IBM and work with George Wilson. I am curious on the next plan for getting Poly1305 code into NSS for ppc64le.
I understand that the idea is to take the code from hacl-star. Do you think the code is now good enough there to pull from ? Or is there some development left in hacl-star before it can be taken into this ?
Thanks & Regards,
- Nayna
Comment 9•2 years ago
|
||
Redirect a needinfo that is pending on an inactive user to the triage owner.
:beurdouche, since the bug has recent activity, could you please find another way to get the information or close the bug as INCOMPLETE
if it is not actionable?
For more information, please visit auto_nag documentation.
Comment 10•2 years ago
|
||
Comment 11•2 years ago
|
||
[PowerPC] Process leftovers and message length in poly1305 update
Comment 12•2 years ago
|
||
I pushed a patch to phabricator that enables 128-bit vector Poly1305 implementation from HACL* portion for PowerPC arch. It implies the following PowerPC-specific features.
- Import vectorization directives from HACL* upstream to libintvector.h fork in freebl.
- Enable 128-bit Poly1305 cores in GYP and Make build systems.
- Utilize 128-bit poly1305 implementation and remove legacy 32-bit one.
Performance test (measuring Chacha20_Poly1305 encryption performance for a 16KB buffer, repeated 10000 times)
bltest -E -m chacha20_poly1305 -p 10000 -o /dev/null -v tests/chacha20_poly1305/iv0 -k tests/chacha20_poly1305/key0 -b 16384
(Upstream. VSX implementation of Chacha20/32-bit Poly1305 ported from HACL*)
# mode in opreps cxreps context op time(sec) thrgput
chacha20_poly1305_e 156Mb 256Kb 10T 0 0.000 360.000 0.360 434Mb 28Kb
(This patch. Same VSX implementation of Chacha20/128-bit Poly1305 ported from HACL*)
# mode in opreps cxreps context op time(sec) thrgput
chacha20_poly1305_e 156Mb 256Kb 10T 0 0.000 340.000 0.341 458Mb 216Kb
The patch passes all relative tests that in tests/all.sh
on POWER9.
Updated•2 years ago
|
Comment 13•2 years ago
|
||
Hi all. Is there anything we can to to facilitate review of Mamone's work?
Comment 14•2 years ago
|
||
Hi,
what's the difference between the two patches?
Comment 15•2 years ago
|
||
Hi,
I haven't figured out yet how to push commits on top of existing patch so I have a standalone patch for every update . You need to ignore the old patch that marked as legacy and consider the last one.
Comment 16•2 years ago
|
||
Phabricator can handle new updates, you just need the phabricator link in the comment of the commit, and use the mozphap command you use to create the initial patch. I use hg 'q' interface to create a single commit that keeps the same commit comment (Phabricator automatically adds the phabricator link to your commit comment.). If you use git, I believe you can use the git squash to merge your commit. You just need to preserve the phabricator link from the initial commit in your new squashed commit.
When you push a patch, be sure to update the reviewers and request a review.
Comment 18•2 years ago
|
||
Hi,
we are currently working on the chain of patches that update the currently supported HACL* code. One of the patches also update the libintvector.h library. So, let's do the following:
- we will try to have the patches submitted as soon as we can
- I would ask you to rebase the code to support the modifications
- I proceed with this patch.
Does it work for you?
Comment 19•2 years ago
|
||
It does. You can let me know once the patches are merged into the mainstream so I can rebase the code.
Updated•2 years ago
|
Comment 20•2 years ago
|
||
@nkulatova Have you got any progress on updating HACL* code?
Comment 22•2 years ago
|
||
Comment 23•2 years ago
|
||
Thank you for updating HACL* base code. I updated the patch to remove HACL* import part as well it originally cuts a fair amount of legacy code that imported improperly from HACL*, reviewing the changes is pretty straightforward with the current shape.
Comment 25•2 years ago
|
||
Comment 26•2 years ago
|
||
Comment 27•2 years ago
|
||
Updated•2 years ago
|
Updated•1 years ago
|
Comment 28•1 years ago
|
||
Hi, could you precise which out of 4 patches we need to take a look at? All 4? the latest?
Description
•