This is to implement #19932, it adds enc-then-mac aes-cbc-hmac-sha512 on
aarch64, aes-cbc and hmac-sha512 are interleaved to achieve better
performance.It only supports non-padding mode that means the length of
input data should be multiple of 16 bytes.
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com>
(Merged from https://github.com/openssl/openssl/pull/22949)
This is to implement #19932, it adds enc-then-mac aes-cbc-hmac-sha1/256,
aes-cbc and hmac-sha1/256 are interleaved to achieve better performance.
It only supports non-padding mode that means the length of input data
should be multiple of 16 bytes.
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com>
(Merged from https://github.com/openssl/openssl/pull/22949)
Reviewed-by: Todd Short <todd.short@me.com>
Reviewed-by: Paul Dale <ppzgs1@gmail.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/24518)
Reviewed-by: Neil Horman <nhorman@openssl.org>
Release: yes
(cherry picked from commit 0ce7d1f355)
Reviewed-by: Hugo Landau <hlandau@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/24034)
Current PowerPC-related defines omit Darwin ppc64 case.
Use __POWERPC__ in place of __ppc__ + __ppc64__
Fixes#23220
CLA: trivial
Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/23245)
Fixes#22818
Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Todd Short <todd.short@me.com>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/22860)
To accelerate the performance of the AES-XTS mode, in this patch, we
have the specialized multi-block implementation for AES-128-XTS and
AES-256-XTS.
Signed-off-by: Jerry Shih <jerry.shih@sifive.com>
Signed-off-by: Phoebe Chen <phoebe.chen@sifive.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Hugo Landau <hlandau@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/21923)
To enhance test coverage for AES-GCM mode, we provided longer additional
testing patterns for AES-GCM testing.
Signed-off-by: Phoebe Chen <phoebe.chen@sifive.com>
Signed-off-by: Jerry Shih <jerry.shih@sifive.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Hugo Landau <hlandau@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/21923)
To accelerate the performance of the AES-GCM mode, in this patch, we
have the specialized multi-block implementations for AES-128-GCM,
AES-192-GCM and AES-256-GCM.
Signed-off-by: Phoebe Chen <phoebe.chen@sifive.com>
Signed-off-by: Jerry Shih <jerry.shih@sifive.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Hugo Landau <hlandau@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/21923)
Support zvbb-zvkned based rvv AES-128/192/256-CTR encryption.
Signed-off-by: Phoebe Chen <phoebe.chen@sifive.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Hugo Landau <hlandau@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/21923)
Replace old CBC implementation with optimized AES-128/192/256-CBC in
this patch.
Signed-off-by: Phoebe Chen <phoebe.chen@sifive.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Hugo Landau <hlandau@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/21923)
The upcoming RISC-V vector crypto extensions provide
the Zvkned extension, that provides a AES-specific instructions.
This patch provides an implementation that utilizes this
extension if available.
Tested on QEMU and no regressions observed.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Hugo Landau <hlandau@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/21923)
Found by running the checkpatch.pl Linux script to enforce coding style.
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/21468)
Running LSX instructions requires both the hardware support and the
kernel support. The `cpucfg` instruction only tests the hardware
support, causing a SIGILL if the hardware supports LSX but the kernel
does not.
Use `getauxval(AT_HWCAP)` as the ["Software Development and Build
Convention for LoongArch Architectures"][1] manual suggests.
The LOONGARCH_HWCAP_LSX and LOONGARCH_HWCAP_LASX bits are copied from
the manual too. In Glibc 2.38 they'll be provided by <sys/auxv.h> as
well, but they are unavailable in earlier Glibc versions so we cannot
rely on it.
The getauxval syscall and Glibc wrapper are available since day one
(Linux-5.19 and Glibc-2.36) for LoongArch.
Fixes#21508.
[1]:https://github.com/loongson/la-softdev-convention/blob/master/la-softdev-convention.adoc#kernel-constraints
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Tom Cosgrove <tom.cosgrove@arm.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/21509)
In RISC-V we have multiple extensions, that can be
used to accelerate processing.
The known extensions are defined in riscv_arch.def.
From that file test functions of the following
form are generated: RISCV_HAS_$ext().
In recent commits new ways to define the availability
of these test macros have been defined. E.g.:
#define RV32I_ZKND_ZKNE_CAPABLE \
(RISCV_HAS_ZKND() && RISCV_HAS_ZKNE())
[...]
#define RV64I_ZKND_ZKNE_CAPABLE \
(RISCV_HAS_ZKND() && RISCV_HAS_ZKNE())
This leaves us with two different APIs to test capabilities.
Further, creating the same macros for RV32 and RV64 results
in duplicated code (see example above).
This inconsistent situation makes it hard to integrate
further code. So let's clean this up with the following steps:
* Replace RV32I_* and RV64I_* macros by RICSV_HAS_* macros
* Move all test macros into riscv_arch.h
* Use "AND" and "OR" to combine tests with more than one extension
* Rename include files for accelerated processing (remove extension
postfix).
We end up with compile time tests for RV32/RV64 and run-time tests
for available extensions. Adding new routines (e.g. for vector crypto
instructions) should be straightforward.
Testing showed no regressions.
Signed-off-by: Christoph Müllner <christoph.muellner@vrull.eu>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/20078)
These aren't currently checked when they are called in cipher_aes_gcm_hw_armv8.inc,
but they are declared as returning as size_t the number of bytes they have processed,
and the aes_gcm_*_*_kernel (unroll by 4) versions of these do return the correct
values.
Change-Id: Ic3eaf139e36e29e8779b5bd8b867c08fde37a337
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/20191)
Add 128 bit lsx vector expansion optimization code of Loongarch64 architecture
to AES. The test result on the 3A5000 improves performance by about 40%~50%.
Signed-off-by: zhuchen <zhuchen@loongson.cn>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/19364)
Properly fallback to the default implementation on CPUs
missing necessary instructions.
Fixes#19163
Reviewed-by: Dmitry Belyavskiy <beldmit@gmail.com>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/19182)
Increase the block numbers to 8 for every iteration. Increase the hash
table capacity. Make use of EOR3 instruction to improve the performance.
This can improve performance 25-40% on out-of-order microarchitectures
with a large number of fast execution units, such as Neoverse V1. We also
see 20-30% performance improvements on other architectures such as the M1.
Assembly code reviewd by Tom Cosgrove (ARM).
Reviewed-by: Bernd Edlinger <bernd.edlinger@hotmail.de>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15916)
Assembly code reviewed by Shricharan Srivatsan <ssrivat@us.ibm.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/16854)
This fixes "undefined reference to `aes_gcm_dec_128_kernel' in function
`armv8_aes_gcm_decrypt'" and similar
Fixes#16949
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/16951)
Reviewed-by: Richard Levitte <levitte@openssl.org>
Reviewed-by: Shane Lontis <shane.lontis@oracle.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/15535)
Aes-xts mode can be optimized by interleaving cipher operation on
several blocks and loop unrolling. Interleaving needs one ideal
unrolling factor, here we adopt the same factor with aes-cbc,
which is described as below:
If blocks number > 5, select 5 blocks as one iteration,every
loop, decrease the blocks number by 5.
If left blocks < 5, treat them as tail blocks.
Detailed implementation has a little adjustment for squeezing
code space.
With this way, for small size such as 16 bytes, the performance is
similar as before, but for big size such as 16k bytes, the performance
improves a lot, even reaches to 2x uplift, for some arches such as A57,
the improvement even reaches more than 2x uplift. We collect many
performance datas on different micro-archs such as thunderx2,
ampere-emag, a72, a75, a57, a53 and N1, all of which reach 0.5-2x uplift.
The following table lists the encryption performance data on aarch64,
take a72, a75, a57, a53 and N1 as examples. Performance value takes the
unit of cycles per byte, takes the format as comparision of values.
List them as below:
A72:
Before optimization After optimization Improve
evp-aes-128-xts@16 8.899913518 5.949087263 49.60%
evp-aes-128-xts@64 4.525512668 3.389141845 33.53%
evp-aes-128-xts@256 3.502906908 1.633573479 114.43%
evp-aes-128-xts@1024 3.174210419 1.155952639 174.60%
evp-aes-128-xts@8192 3.053019303 1.028134888 196.95%
evp-aes-128-xts@16384 3.025292462 1.02021169 196.54%
evp-aes-256-xts@16 9.971105023 6.754233758 47.63%
evp-aes-256-xts@64 4.931479093 3.786527393 30.24%
evp-aes-256-xts@256 3.746788153 1.943975947 92.74%
evp-aes-256-xts@1024 3.401743802 1.477394648 130.25%
evp-aes-256-xts@8192 3.278769327 1.32950421 146.62%
evp-aes-256-xts@16384 3.27093296 1.325276257 146.81%
A75:
Before optimization After optimization Improve
evp-aes-128-xts@16 8.397965173 5.126839098 63.80%
evp-aes-128-xts@64 4.176860631 2.59817764 60.76%
evp-aes-128-xts@256 3.069126585 1.284561028 138.92%
evp-aes-128-xts@1024 2.805962699 0.932754655 200.83%
evp-aes-128-xts@8192 2.725820131 0.829820397 228.48%
evp-aes-128-xts@16384 2.71521905 0.823251591 229.82%
evp-aes-256-xts@16 11.24790935 7.383914448 52.33%
evp-aes-256-xts@64 5.294128847 3.048641998 73.66%
evp-aes-256-xts@256 3.861649617 1.570359905 145.91%
evp-aes-256-xts@1024 3.537646797 1.200493533 194.68%
evp-aes-256-xts@8192 3.435353012 1.085345319 216.52%
evp-aes-256-xts@16384 3.437952563 1.097963822 213.12%
A57:
Before optimization After optimization Improve
evp-aes-128-xts@16 10.57455446 7.165438012 47.58%
evp-aes-128-xts@64 5.418185447 3.721241202 45.60%
evp-aes-128-xts@256 3.855184592 1.747145379 120.66%
evp-aes-128-xts@1024 3.477199757 1.253049735 177.50%
evp-aes-128-xts@8192 3.36768104 1.091943159 208.41%
evp-aes-128-xts@16384 3.360373443 1.088942789 208.59%
evp-aes-256-xts@16 12.54559459 8.745489036 43.45%
evp-aes-256-xts@64 6.542808937 4.326387568 51.23%
evp-aes-256-xts@256 4.62668822 2.119908754 118.25%
evp-aes-256-xts@1024 4.161716505 1.557335554 167.23%
evp-aes-256-xts@8192 4.032462227 1.377749511 192.68%
evp-aes-256-xts@16384 4.023293877 1.371558933 193.34%
A53:
Before optimization After optimization Improve
evp-aes-128-xts@16 18.07842135 13.96980808 29.40%
evp-aes-128-xts@64 7.933818397 6.07159276 30.70%
evp-aes-128-xts@256 5.264604704 2.611155744 101.60%
evp-aes-128-xts@1024 4.606660117 1.722713454 167.40%
evp-aes-128-xts@8192 4.405160115 1.454379201 202.90%
evp-aes-128-xts@16384 4.401592028 1.442279392 205.20%
evp-aes-256-xts@16 20.07084054 16.00803726 25.40%
evp-aes-256-xts@64 9.192647294 6.883876732 33.50%
evp-aes-256-xts@256 6.336143161 3.108140452 103.90%
evp-aes-256-xts@1024 5.62502952 2.097960651 168.10%
evp-aes-256-xts@8192 5.412085608 1.807294191 199.50%
evp-aes-256-xts@16384 5.403062591 1.790135764 201.80%
N1:
Before optimization After optimization Improve
evp-aes-128-xts@16 6.48147613 4.209415473 53.98%
evp-aes-128-xts@64 2.847744115 1.950757468 45.98%
evp-aes-128-xts@256 2.085711968 1.061903238 96.41%
evp-aes-128-xts@1024 1.842014669 0.798486302 130.69%
evp-aes-128-xts@8192 1.760449052 0.713853939 146.61%
evp-aes-128-xts@16384 1.760763546 0.707702009 148.80%
evp-aes-256-xts@16 7.264142817 5.265970454 37.94%
evp-aes-256-xts@64 3.251356212 2.41176323 34.81%
evp-aes-256-xts@256 2.380488469 1.342095742 77.37%
evp-aes-256-xts@1024 2.08853022 1.041718215 100.49%
evp-aes-256-xts@8192 2.027432668 0.944571334 114.64%
evp-aes-256-xts@16384 2.00740782 0.941991415 113.10%
Add more XTS test cases to cover the cipher stealing mode and cases of different
number of blocks.
CustomizedGitHooks: yes
Change-Id: I93ee31b2575e1413764e27b599af62994deb4c96
Reviewed-by: Paul Dale <paul.dale@oracle.com>
Reviewed-by: Tomas Mraz <tmraz@fedoraproject.org>
(Merged from https://github.com/openssl/openssl/pull/11399)
Also Add ability for providers to dynamically exclude cipher algorithms.
Cipher algorithms are only returned from providers if their capable() method is either NULL,
or the method returns 1.
This is mainly required for ciphers that only have hardware implementations.
If there is no hardware support, then the algorithm needs to be not available.
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/10146)
aes_platform.h
cmll_platform.h
des_platform.h
To make this possible, we must also define DES_ASM and CMLL_ASM to
indicate that we have the necessary internal support.
Reviewed-by: Shane Lontis <shane.lontis@oracle.com>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/10662)