| 
									
										
										
										
											2004-07-27 04:18:55 +08:00
										 |  |  | =pod | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =head1 NAME | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2016-06-17 22:09:38 +08:00
										 |  |  | OPENSSL_ia32cap - the x86[_64] processor capabilities vector | 
					
						
							| 
									
										
										
										
											2004-07-27 04:18:55 +08:00
										 |  |  | 
 | 
					
						
							|  |  |  | =head1 SYNOPSIS | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2016-06-17 22:09:38 +08:00
										 |  |  |  env OPENSSL_ia32cap=... <application> | 
					
						
							| 
									
										
										
										
											2004-07-27 04:18:55 +08:00
										 |  |  | 
 | 
					
						
							|  |  |  | =head1 DESCRIPTION | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2016-06-17 22:09:38 +08:00
										 |  |  | OpenSSL supports a range of x86[_64] instruction set extensions. These | 
					
						
							|  |  |  | extensions are denoted by individual bits in capability vector returned | 
					
						
							|  |  |  | by processor in EDX:ECX register pair after executing CPUID instruction | 
					
						
							|  |  |  | with EAX=1 input value (see Intel Application Note #241618). This vector | 
					
						
							|  |  |  | is copied to memory upon toolkit initialization and used to choose | 
					
						
							|  |  |  | between different code paths to provide optimal performance across wide | 
					
						
							|  |  |  | range of processors. For the moment of this writing following bits are | 
					
						
							|  |  |  | significant: | 
					
						
							| 
									
										
										
										
											2011-05-17 04:35:11 +08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2017-04-04 03:39:09 +08:00
										 |  |  | =over 4 | 
					
						
							| 
									
										
										
										
											2013-06-13 06:42:08 +08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2011-05-17 04:35:11 +08:00
										 |  |  | =item bit #4 denoting presence of Time-Stamp Counter. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =item bit #19 denoting availability of CLFLUSH instruction; | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =item bit #20, reserved by Intel, is used to choose among RC4 code paths; | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =item bit #23 denoting MMX support; | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =item bit #24, FXSR bit, denoting availability of XMM registers; | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =item bit #25 denoting SSE support; | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =item bit #26 denoting SSE2 support; | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2013-06-13 06:42:08 +08:00
										 |  |  | =item bit #28 denoting Hyperthreading, which is used to distinguish | 
					
						
							|  |  |  | cores with shared cache; | 
					
						
							| 
									
										
										
										
											2011-05-17 04:35:11 +08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2011-05-27 23:32:43 +08:00
										 |  |  | =item bit #30, reserved by Intel, denotes specifically Intel CPUs; | 
					
						
							| 
									
										
										
										
											2011-05-17 04:35:11 +08:00
										 |  |  | 
 | 
					
						
							|  |  |  | =item bit #33 denoting availability of PCLMULQDQ instruction; | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =item bit #41 denoting SSSE3, Supplemental SSE3, support; | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2011-05-27 23:32:43 +08:00
										 |  |  | =item bit #43 denoting AMD XOP support (forced to zero on non-AMD CPUs); | 
					
						
							| 
									
										
										
										
											2011-05-17 04:35:11 +08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2016-06-12 18:31:40 +08:00
										 |  |  | =item bit #54 denoting availability of MOVBE instruction; | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2011-05-17 04:35:11 +08:00
										 |  |  | =item bit #57 denoting AES-NI instruction set extension; | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2016-06-12 18:31:40 +08:00
										 |  |  | =item bit #58, XSAVE bit, lack of which in combination with MOVBE is used | 
					
						
							|  |  |  | to identify Atom Silvermont core; | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2011-05-17 04:35:11 +08:00
										 |  |  | =item bit #59, OSXSAVE bit, denoting availability of YMM registers; | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =item bit #60 denoting AVX extension; | 
					
						
							| 
									
										
										
										
											2007-04-02 01:28:08 +08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2011-06-04 20:20:45 +08:00
										 |  |  | =item bit #62 denoting availability of RDRAND instruction; | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2013-06-13 06:42:08 +08:00
										 |  |  | =back | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2016-06-12 18:31:40 +08:00
										 |  |  | For example, in 32-bit application context clearing bit #26 at run-time | 
					
						
							|  |  |  | disables high-performance SSE2 code present in the crypto library, while | 
					
						
							|  |  |  | clearing bit #24 disables SSE2 code operating on 128-bit XMM register | 
					
						
							|  |  |  | bank. You might have to do the latter if target OpenSSL application is | 
					
						
							|  |  |  | executed on SSE2 capable CPU, but under control of OS that does not | 
					
						
							| 
									
										
										
										
											2016-06-17 22:09:38 +08:00
										 |  |  | enable XMM registers. Historically address of the capability vector copy | 
					
						
							|  |  |  | was exposed to application through OPENSSL_ia32cap_loc(), but not | 
					
						
							|  |  |  | anymore. Now the only way to affect the capability detection is to set | 
					
						
							| 
									
										
										
										
											2016-06-29 04:39:55 +08:00
										 |  |  | OPENSSL_ia32cap environment variable prior target application start. To | 
					
						
							| 
									
										
										
										
											2016-06-17 22:09:38 +08:00
										 |  |  | give a specific example, on Intel P4 processor 'env | 
					
						
							|  |  |  | OPENSSL_ia32cap=0x16980010 apps/openssl', or better yet 'env | 
					
						
							|  |  |  | OPENSSL_ia32cap=~0x1000000 apps/openssl' would achieve the desired | 
					
						
							|  |  |  | effect. Alternatively you can reconfigure the toolkit with no-sse2 | 
					
						
							| 
									
										
										
										
											2016-06-12 18:31:40 +08:00
										 |  |  | option and recompile. | 
					
						
							| 
									
										
										
										
											2004-07-27 04:18:55 +08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2016-06-17 22:09:38 +08:00
										 |  |  | Less intuitive is clearing bit #28, or ~0x10000000 in the "environment | 
					
						
							|  |  |  | variable" terms. The truth is that it's not copied from CPUID output | 
					
						
							|  |  |  | verbatim, but is adjusted to reflect whether or not the data cache is | 
					
						
							|  |  |  | actually shared between logical cores. This in turn affects the decision | 
					
						
							|  |  |  | on whether or not expensive countermeasures against cache-timing attacks | 
					
						
							|  |  |  | are applied, most notably in AES assembler module. | 
					
						
							| 
									
										
										
										
											2012-11-18 03:04:15 +08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2016-06-12 18:31:40 +08:00
										 |  |  | The capability vector is further extended with EBX value returned by | 
					
						
							|  |  |  | CPUID with EAX=7 and ECX=0 as input. Following bits are significant: | 
					
						
							| 
									
										
										
										
											2012-11-18 03:04:15 +08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2017-04-04 03:39:09 +08:00
										 |  |  | =over 4 | 
					
						
							| 
									
										
										
										
											2013-06-13 06:42:08 +08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2012-11-18 03:04:15 +08:00
										 |  |  | =item bit #64+3 denoting availability of BMI1 instructions, e.g. ANDN; | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =item bit #64+5 denoting availability of AVX2 instructions; | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2016-06-12 18:31:40 +08:00
										 |  |  | =item bit #64+8 denoting availability of BMI2 instructions, e.g. MULX | 
					
						
							| 
									
										
										
										
											2013-06-13 06:42:08 +08:00
										 |  |  | and RORX; | 
					
						
							| 
									
										
										
										
											2012-11-18 03:04:15 +08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2016-06-12 18:31:40 +08:00
										 |  |  | =item bit #64+16 denoting availability of AVX512F extension; | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2012-11-18 03:04:15 +08:00
										 |  |  | =item bit #64+18 denoting availability of RDSEED instruction; | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2013-06-13 06:42:08 +08:00
										 |  |  | =item bit #64+19 denoting availability of ADCX and ADOX instructions; | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2016-12-14 20:33:40 +08:00
										 |  |  | =item bit #64+21 denoting availability of VPMADD52[LH]UQ instructions, | 
					
						
							|  |  |  | a.k.a. AVX512IFMA extension; | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2016-06-12 18:31:40 +08:00
										 |  |  | =item bit #64+29 denoting availability of SHA extension; | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =item bit #64+30 denoting availability of AVX512BW extension; | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =item bit #64+31 denoting availability of AVX512VL extension; | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2017-11-06 03:03:17 +08:00
										 |  |  | =item bit #64+41 denoting availability of VAES extension; | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =item bit #64+42 denoting availability of VPCLMULQDQ extension; | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2013-06-13 06:42:08 +08:00
										 |  |  | =back | 
					
						
							| 
									
										
										
										
											2016-05-18 22:16:40 +08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2016-06-12 18:31:40 +08:00
										 |  |  | To control this extended capability word use ':' as delimiter when | 
					
						
							|  |  |  | setting up OPENSSL_ia32cap environment variable. For example assigning | 
					
						
							|  |  |  | ':~0x20' would disable AVX2 code paths, and ':0' - all post-AVX | 
					
						
							|  |  |  | extensions. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | It should be noted that whether or not some of the most "fancy" | 
					
						
							|  |  |  | extension code paths are actually assembled depends on current assembler | 
					
						
							|  |  |  | version. Base minimum of AES-NI/PCLMULQDQ, SSSE3 and SHA extension code | 
					
						
							| 
									
										
										
										
											2018-03-21 23:20:59 +08:00
										 |  |  | paths are always assembled. Apart from that, minimum assembler version | 
					
						
							| 
									
										
										
										
											2016-06-12 18:31:40 +08:00
										 |  |  | requirements are summarized in below table: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |    Extension   | GNU as | nasm   | llvm | 
					
						
							|  |  |  |    ------------+--------+--------+-------- | 
					
						
							|  |  |  |    AVX         | 2.19   | 2.09   | 3.0 | 
					
						
							|  |  |  |    AVX2        | 2.22   | 2.10   | 3.1 | 
					
						
							| 
									
										
										
										
											2016-10-10 04:06:12 +08:00
										 |  |  |    ADCX/ADOX   | 2.23   | 2.10   | 3.3 | 
					
						
							| 
									
										
										
										
											2016-12-14 20:33:40 +08:00
										 |  |  |    AVX512      | 2.25   | 2.11.8 | see NOTES | 
					
						
							|  |  |  |    AVX512IFMA  | 2.26   | 2.11.8 | see NOTES | 
					
						
							| 
									
										
										
										
											2018-03-21 23:20:59 +08:00
										 |  |  |    VAES        | 2.30   | 2.13.3 | | 
					
						
							| 
									
										
										
										
											2016-12-14 20:33:40 +08:00
										 |  |  | 
 | 
					
						
							|  |  |  | =head1 NOTES | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Even though AVX512 support was implemented in llvm 3.6, compilation of | 
					
						
							|  |  |  | assembly modules apparently requires explicit -march flag. But then | 
					
						
							|  |  |  | compiler generates processor-specific code, which in turn contradicts | 
					
						
							|  |  |  | the mere idea of run-time switch execution facilitated by the variable | 
					
						
							|  |  |  | in question. Till the limitation is lifted, it's possible to work around | 
					
						
							|  |  |  | the problem by making build procedure use following script: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |    #!/bin/sh | 
					
						
							|  |  |  |    exec clang -no-integrated-as "$@" | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | instead of real clang. In which case it doesn't matter which clang | 
					
						
							|  |  |  | version is used, as it is GNU assembler version that will be checked. | 
					
						
							| 
									
										
										
										
											2016-06-12 18:31:40 +08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2018-01-03 01:07:57 +08:00
										 |  |  | =head1 RETURN VALUES | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Not available. | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2016-05-18 23:44:05 +08:00
										 |  |  | =head1 COPYRIGHT | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2018-01-19 17:49:22 +08:00
										 |  |  | Copyright 2004-2018 The OpenSSL Project Authors. All Rights Reserved. | 
					
						
							| 
									
										
										
										
											2016-05-18 23:44:05 +08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2018-12-06 21:04:44 +08:00
										 |  |  | Licensed under the Apache License 2.0 (the "License").  You may not use | 
					
						
							| 
									
										
										
										
											2016-05-18 23:44:05 +08:00
										 |  |  | this file except in compliance with the License.  You can obtain a copy | 
					
						
							|  |  |  | in the file LICENSE in the source distribution or at | 
					
						
							|  |  |  | L<https://www.openssl.org/source/license.html>. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =cut |