| 
									
										
										
										
											2018-05-05 06:06:07 +08:00
										 |  |  | =pod | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =encoding utf8 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =head1 NAME | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2018-06-13 06:29:48 +08:00
										 |  |  | passphrase-encoding | 
					
						
							| 
									
										
										
										
											2018-05-05 06:06:07 +08:00
										 |  |  | - How diverse parts of OpenSSL treat pass phrases character encoding | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =head1 DESCRIPTION | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | In a modern world with all sorts of character encodings, the treatment of pass | 
					
						
							|  |  |  | phrases has become increasingly complex. | 
					
						
							|  |  |  | This manual page attempts to give an overview over how this problem is | 
					
						
							|  |  |  | currently addressed in different parts of the OpenSSL library. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =head2 The general case | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | The OpenSSL library doesn't treat pass phrases in any special way as a general | 
					
						
							|  |  |  | rule, and trusts the application or user to choose a suitable character set | 
					
						
							|  |  |  | and stick to that throughout the lifetime of affected objects. | 
					
						
							|  |  |  | This means that for an object that was encrypted using a pass phrase encoded in | 
					
						
							|  |  |  | ISO-8859-1, that object needs to be decrypted using a pass phrase encoded in | 
					
						
							|  |  |  | ISO-8859-1. | 
					
						
							|  |  |  | Using the wrong encoding is expected to cause a decryption failure. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =head2 PKCS#12 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | PKCS#12 is a bit different regarding pass phrase encoding. | 
					
						
							|  |  |  | The standard stipulates that the pass phrase shall be encoded as an ASN.1 | 
					
						
							|  |  |  | BMPString, which consists of the code points of the basic multilingual plane, | 
					
						
							|  |  |  | encoded in big endian (UCS-2 BE). | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | OpenSSL tries to adapt to this requirements in one of the following manners: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =over 4 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =item 1. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Treats the received pass phrase as UTF-8 encoded and tries to re-encode it to | 
					
						
							|  |  |  | UTF-16 (which is the same as UCS-2 for characters U+0000 to U+D7FF and U+E000 | 
					
						
							|  |  |  | to U+FFFF, but becomes an expansion for any other character), or failing that, | 
					
						
							|  |  |  | proceeds with step 2. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =item 2. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Assumes that the pass phrase is encoded in ASCII or ISO-8859-1 and | 
					
						
							|  |  |  | opportunistically prepends each byte with a zero byte to obtain the UCS-2 | 
					
						
							|  |  |  | encoding of the characters, which it stores as a BMPString. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Note that since there is no check of your locale, this may produce UCS-2 / | 
					
						
							|  |  |  | UTF-16 characters that do not correspond to the original pass phrase characters | 
					
						
							|  |  |  | for other character sets, such as any ISO-8859-X encoding other than | 
					
						
							|  |  |  | ISO-8859-1 (or for Windows, CP 1252 with exception for the extra "graphical" | 
					
						
							|  |  |  | characters in the 0x80-0x9F range). | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =back | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | OpenSSL versions older than 1.1.0 do variant 2 only, and that is the reason why | 
					
						
							|  |  |  | OpenSSL still does this, to be able to read files produced with older versions. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | It should be noted that this approach isn't entirely fault free. | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2018-06-13 06:29:48 +08:00
										 |  |  | A pass phrase encoded in ISO-8859-2 could very well have a sequence such as | 
					
						
							| 
									
										
										
										
											2018-05-05 06:06:07 +08:00
										 |  |  | 0xC3 0xAF (which is the two characters "LATIN CAPITAL LETTER A WITH BREVE" | 
					
						
							|  |  |  | and "LATIN CAPITAL LETTER Z WITH DOT ABOVE" in ISO-8859-2 encoding), but would | 
					
						
							|  |  |  | be misinterpreted as the perfectly valid UTF-8 encoded code point U+00EF (LATIN | 
					
						
							| 
									
										
										
										
											2020-02-03 05:54:01 +08:00
										 |  |  | SMALL LETTER I WITH DIAERESIS) I<if the pass phrase doesn't contain anything that | 
					
						
							| 
									
										
										
										
											2018-05-05 06:06:07 +08:00
										 |  |  | would be invalid UTF-8>. | 
					
						
							|  |  |  | A pass phrase that contains this kind of byte sequence will give a different | 
					
						
							|  |  |  | outcome in OpenSSL 1.1.0 and newer than in OpenSSL older than 1.1.0. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  |  0x00 0xC3 0x00 0xAF                    # OpenSSL older than 1.1.0 | 
					
						
							|  |  |  |  0x00 0xEF                              # OpenSSL 1.1.0 and newer | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | On the same accord, anything encoded in UTF-8 that was given to OpenSSL older | 
					
						
							|  |  |  | than 1.1.0 was misinterpreted as ISO-8859-1 sequences. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =head2 OSSL_STORE | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | L<ossl_store(7)> acts as a general interface to access all kinds of objects, | 
					
						
							|  |  |  | potentially protected with a pass phrase, a PIN or something else. | 
					
						
							| 
									
										
										
										
											2018-06-05 21:58:43 +08:00
										 |  |  | This API stipulates that pass phrases should be UTF-8 encoded, and that any | 
					
						
							|  |  |  | other pass phrase encoding may give undefined results. | 
					
						
							|  |  |  | This API relies on the application to ensure UTF-8 encoding, and doesn't check | 
					
						
							|  |  |  | that this is the case, so what it gets, it will also pass to the underlying | 
					
						
							|  |  |  | loader. | 
					
						
							| 
									
										
										
										
											2018-05-05 06:06:07 +08:00
										 |  |  | 
 | 
					
						
							|  |  |  | =head1 RECOMMENDATIONS | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | This section assumes that you know what pass phrase was used for encryption, | 
					
						
							|  |  |  | but that it may have been encoded in a different character encoding than the | 
					
						
							|  |  |  | one used by your current input method. | 
					
						
							|  |  |  | For example, the pass phrase may have been used at a time when your default | 
					
						
							|  |  |  | encoding was ISO-8859-1 (i.e. "naïve" resulting in the byte sequence 0x6E 0x61 | 
					
						
							|  |  |  | 0xEF 0x76 0x65), and you're now in an environment where your default encoding | 
					
						
							|  |  |  | is UTF-8 (i.e. "naïve" resulting in the byte sequence 0x6E 0x61 0xC3 0xAF 0x76 | 
					
						
							|  |  |  | 0x65). | 
					
						
							|  |  |  | Whenever it's mentioned that you should use a certain character encoding, it | 
					
						
							|  |  |  | should be understood that you either change the input method to use the | 
					
						
							|  |  |  | mentioned encoding when you type in your pass phrase, or use some suitable tool | 
					
						
							|  |  |  | to convert your pass phrase from your default encoding to the target encoding. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Also note that the sub-sections below discuss human readable pass phrases. | 
					
						
							|  |  |  | This is particularly relevant for PKCS#12 objects, where human readable pass | 
					
						
							|  |  |  | phrases are assumed. | 
					
						
							|  |  |  | For other objects, it's as legitimate to use any byte sequence (such as a | 
					
						
							| 
									
										
										
										
											2021-07-31 00:40:27 +08:00
										 |  |  | sequence of bytes from F</dev/urandom> that's been saved away), which makes any | 
					
						
							| 
									
										
										
										
											2018-05-05 06:06:07 +08:00
										 |  |  | character encoding discussion irrelevant; in such cases, simply use the same | 
					
						
							|  |  |  | byte sequence as it is. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =head2 Creating new objects | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | For creating new pass phrase protected objects, make sure the pass phrase is | 
					
						
							|  |  |  | encoded using UTF-8. | 
					
						
							|  |  |  | This is default on most modern Unixes, but may involve an effort on other | 
					
						
							|  |  |  | platforms. | 
					
						
							|  |  |  | Specifically for Windows, setting the environment variable | 
					
						
							| 
									
										
										
										
											2019-10-10 03:45:12 +08:00
										 |  |  | B<OPENSSL_WIN32_UTF8> will have anything entered on [Windows] console prompt | 
					
						
							| 
									
										
										
										
											2018-05-05 06:06:07 +08:00
										 |  |  | converted to UTF-8 (command line and separately prompted pass phrases alike). | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =head2 Opening existing objects | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | For opening pass phrase protected objects where you know what character | 
					
						
							|  |  |  | encoding was used for the encryption pass phrase, make sure to use the same | 
					
						
							|  |  |  | encoding again. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | For opening pass phrase protected objects where the character encoding that was | 
					
						
							|  |  |  | used is unknown, or where the producing application is unknown, try one of the | 
					
						
							|  |  |  | following: | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =over 4 | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =item 1. | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2018-06-13 06:29:48 +08:00
										 |  |  | Try the pass phrase that you have as it is in the character encoding of your | 
					
						
							| 
									
										
										
										
											2018-05-05 06:06:07 +08:00
										 |  |  | environment. | 
					
						
							|  |  |  | It's possible that its byte sequence is exactly right. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =item 2. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Convert the pass phrase to UTF-8 and try with the result. | 
					
						
							|  |  |  | Specifically with PKCS#12, this should open up any object that was created | 
					
						
							|  |  |  | according to the specification. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =item 3. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | Do a naïve (i.e. purely mathematical) ISO-8859-1 to UTF-8 conversion and try | 
					
						
							|  |  |  | with the result. | 
					
						
							|  |  |  | This differs from the previous attempt because ISO-8859-1 maps directly to | 
					
						
							|  |  |  | U+0000 to U+00FF, which other non-UTF-8 character sets do not. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | This also takes care of the case when a UTF-8 encoded string was used with | 
					
						
							|  |  |  | OpenSSL older than 1.1.0. | 
					
						
							|  |  |  | (for example, C<ï>, which is 0xC3 0xAF when encoded in UTF-8, would become 0xC3 | 
					
						
							|  |  |  | 0x83 0xC2 0xAF when re-encoded in the naïve manner. | 
					
						
							|  |  |  | The conversion to BMPString would then yield 0x00 0xC3 0x00 0xA4 0x00 0x00, the | 
					
						
							|  |  |  | erroneous/non-compliant encoding used by OpenSSL older than 1.1.0) | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =back | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =head1 SEE ALSO | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | L<evp(7)>, | 
					
						
							|  |  |  | L<ossl_store(7)>, | 
					
						
							|  |  |  | L<EVP_BytesToKey(3)>, L<EVP_DecryptInit(3)>, | 
					
						
							|  |  |  | L<PEM_do_header(3)>, | 
					
						
							|  |  |  | L<PKCS12_parse(3)>, L<PKCS12_newpass(3)>, | 
					
						
							|  |  |  | L<d2i_PKCS8PrivateKey_bio(3)> | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =head1 COPYRIGHT | 
					
						
							|  |  |  | 
 | 
					
						
							| 
									
										
										
										
											2021-09-07 19:29:33 +08:00
										 |  |  | Copyright 2018-2021 The OpenSSL Project Authors. All Rights Reserved. | 
					
						
							| 
									
										
										
										
											2018-05-05 06:06:07 +08:00
										 |  |  | 
 | 
					
						
							| 
									
										
										
										
											2018-12-06 21:05:22 +08:00
										 |  |  | Licensed under the Apache License 2.0 (the "License").  You may not use | 
					
						
							| 
									
										
										
										
											2018-05-05 06:06:07 +08:00
										 |  |  | this file except in compliance with the License.  You can obtain a copy | 
					
						
							|  |  |  | in the file LICENSE in the source distribution or at | 
					
						
							|  |  |  | L<https://www.openssl.org/source/license.html>. | 
					
						
							|  |  |  | 
 | 
					
						
							|  |  |  | =cut |