RFC4514: String Representation of Distinguished Names
=====================================================

Introduction
------------

[RFC4514], which obsoletes [RFC2253], defines the standard string format for
representing *Distinguished Names* (**DN**s) and *Relative Distinguished Names*
(**RDN**s) within LDAP. The format is also used more broadly, e.g., as the
string representation of issuer and subject names in X.509 certificates.

Distinguished Names (DNs)
-------------------------

A *Distinguished Name* (**DN**) is a sequence of *Relative Distinguished Names*
(**RDNs**).
### String representation of DNs

The string representation of a DN consists of the strings representing each of
its `RDNs`, separated by commas (`,`). The [RFC4514] specification lists the
RDNs **in reverse order** relative to their order in the encoded DN. Since the
expected physical order of `RDNs` within a `DN` lists the most general names
first, the most specific elements, such as `CommonName`, are output first, and
the most general, such as `CountryName`, last.

Empty DNs are represented by an **empty** string.

**Example:** `cn=John Doe,ou=People,dc=example,dc=com`
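
The reversal can be illustrated with a minimal C sketch. It assumes each RDN has already been rendered and escaped as a string, and `dn_to_string()` is a hypothetical helper, not an OpenSSL API. It walks the RDN sequence from the last element to the first, joining the elements with commas, so an empty sequence yields the empty string. In OpenSSL itself, printing with `X509_NAME_print_ex()` and the `XN_FLAG_RFC2253` flags (which include `XN_FLAG_DN_REV`) produces this ordering.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: join already-escaped RDN strings into an RFC 4514 DN.
 * rdns[] is in physical (encoded) order, most general first; the output
 * lists them last-to-first, separated by commas. An empty sequence yields
 * an empty string. Returns 0 on success, -1 if out is too small.
 */
int dn_to_string(const char *const rdns[], size_t n, char *out, size_t outlen)
{
    size_t used = 0;

    if (outlen == 0)
        return -1;
    out[0] = '\0';
    for (size_t i = n; i-- > 0;) {
        size_t len = strlen(rdns[i]);
        size_t sep = (i + 1 < n) ? 1 : 0;   /* comma before all but the first */

        if (used + sep + len + 1 > outlen)
            return -1;
        if (sep)
            out[used++] = ',';
        memcpy(out + used, rdns[i], len);
        used += len;
        out[used] = '\0';
    }
    return 0;
}
```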
### Attributes and Values

Each RDN is composed of one or more attribute-value pairs. An attribute-value
pair is represented as `type=value`. The type names are not case-sensitive.

**Example:** `cn=John Doe`. Here `cn` (CommonName) is the attribute type and
`John Doe` is the attribute value.
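
A minimal sketch of splitting a single pair, assuming the input holds exactly one `type=value` pair. `split_ava()` is a hypothetical helper: it lower-cases the type (ASCII only) so that `CN`, `cn`, and `Cn` compare equal, and it leaves the value, including any escapes, untouched. Splitting at the first `=` is safe because a type name cannot contain one, while an unescaped `=` may still appear in the value.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: split a single "type=value" pair at the first '='.
 * The type is folded to lower case (ASCII only) so that differently cased
 * type names compare equal; the value is copied unchanged (still escaped).
 * Returns 0 on success, -1 on a missing '=', an empty type, or a buffer
 * that is too small.
 */
int split_ava(const char *ava, char *type, size_t typelen,
              char *value, size_t valuelen)
{
    const char *eq = strchr(ava, '=');
    size_t tlen, vlen;

    if (eq == NULL || eq == ava)
        return -1;
    tlen = (size_t)(eq - ava);
    vlen = strlen(eq + 1);
    if (tlen + 1 > typelen || vlen + 1 > valuelen)
        return -1;
    for (size_t i = 0; i < tlen; i++)
        type[i] = (char)tolower((unsigned char)ava[i]);
    type[tlen] = '\0';
    memcpy(value, eq + 1, vlen + 1);    /* includes the terminating NUL */
    return 0;
}
```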

Relative Distinguished Names (RDNs)
-----------------------------------

A Relative Distinguished Name (RDN) identifies an entry uniquely within its
immediate superior entry. An RDN can consist of a single attribute-value pair
or a set of multiple attribute-value pairs.

### Multiple Attribute-Value Pairs in an RDN

The string representation of multiple attribute-value pairs within a single RDN
separates these by plus signs (`+`). In most cases each RDN consists of just a
single attribute-value pair. The order of these pairs within an RDN is not
significant (the ASN.1 abstract syntax designates them as a **SET** rather than
as a **SEQUENCE**).

**Example:** `cn=John Doe+uid=jdoe`
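
Because a literal `+` inside a value must be escaped, the pairs of an RDN can be separated by scanning for unescaped plus signs. The hypothetical `rdn_ava_count()` below only counts them. It is a sketch that assumes well-formed RFC4514 input and does not handle the quoted values accepted by the older [RFC2253] grammar.

```c
#include <stddef.h>

/*
 * Hypothetical helper: count the attribute-value pairs in one RDN string by
 * counting '+' separators that are not escaped. A backslash always consumes
 * the following character, so an escaped plus ("\+") is never counted; a
 * hex-escaped plus ("\2b") contains no literal '+' at all.
 */
size_t rdn_ava_count(const char *rdn)
{
    size_t count = 1;

    for (const char *p = rdn; *p != '\0'; p++) {
        if (*p == '\\') {
            if (p[1] == '\0')
                break;          /* dangling backslash: malformed, stop */
            p++;                /* skip the escaped character */
        } else if (*p == '+') {
            count++;
        }
    }
    return count;
}
```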

String Representation Rules
---------------------------

RFC4514 specifies detailed rules for the string representation of DNs and RDNs
to handle special characters and ensure unambiguous parsing.

### Escaping Special Characters

Certain characters have special meaning in DN strings and must be escaped if
they appear in an attribute value. The special characters are:

* Comma (`,`)
* Plus (`+`)
* Double Quote (`"`)
* Backslash (`\`)
* Less than (`<`)
* Greater than (`>`)
* Semicolon (`;`)
* Leading hash (`#`)
* Leading or trailing space
* Null (U+0000), which can only be escaped in hexadecimal form (`\00`)
* Optionally escaped: equals (`=`), non-leading hash (`#`)

With the exception of the null character, these characters are escaped by
preceding them with a backslash (`\`). Other characters **may** be escaped by
encoding **each octet** of their UTF-8 encoding as two hexadecimal digits
preceded by a backslash.

**Example:** `cn=Doe\, John` (escaping a comma in the value)
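
The escaping rules above can be sketched in C as follows. `escape_value()` is a hypothetical helper that takes a value that is already UTF-8, as a pointer and length (so embedded NUL octets are possible). It backslash-escapes the special characters and the leading or trailing cases, and writes a NUL as `\00`. It escapes only what is required and leaves all other octets, including non-ASCII ones, as they are.

```c
#include <stddef.h>
#include <string.h>

/* Characters that always need a backslash prefix (NUL excluded). */
static int is_special(unsigned char c)
{
    return c != '\0' && strchr(",+\"\\<>;", c) != NULL;
}

/*
 * Hypothetical helper: escape a UTF-8 attribute value. Special characters,
 * a leading '#' or space, and a trailing space get a backslash prefix; NUL
 * cannot be backslash-prefixed, so it is written as "\00". Returns the
 * output length (output is NUL-terminated), or -1 if out is too small.
 */
long escape_value(const unsigned char *in, size_t len, char *out, size_t outlen)
{
    size_t o = 0;

    for (size_t i = 0; i < len; i++) {
        unsigned char c = in[i];
        int backslash = is_special(c)
                        || (i == 0 && (c == ' ' || c == '#'))
                        || (i == len - 1 && c == ' ');

        if (c == '\0') {                 /* NUL: hex escape only */
            if (o + 3 >= outlen)
                return -1;
            memcpy(out + o, "\\00", 3);
            o += 3;
            continue;
        }
        if (o + (backslash ? 2 : 1) >= outlen)
            return -1;
        if (backslash)
            out[o++] = '\\';
        out[o++] = (char)c;
    }
    if (o >= outlen)
        return -1;
    out[o] = '\0';
    return (long)o;
}
```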
### Hexadecimal Escaping

Any character **may** be represented by separately encoding each byte of its
UTF-8 encoding as its hexadecimal value preceded by a backslash (`\`). This is
particularly applicable to non-ASCII characters.

**Example:** `cn=John\20Doe` (representing a space using its hexadecimal code)

**Example:** `cn=Doe\2c John` (escaping a comma in the value)

**Example:** `cn=Виктор \d0\94\d1\83\d1\85\d0\be\d0\b2\d0\bd\d1\8b\d0\b9`
(escaping each UTF-8 byte of the last name).
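
On input, both escape forms have to be undone. The following minimal decoder sketch uses a hypothetical helper, `unescape_value()`. A backslash followed by two hex digits yields that octet, so a multi-byte UTF-8 character is rebuilt one byte at a time, and a backslash followed by any other character yields that character. The sketch does not check that the result is well-formed UTF-8.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* Value of one hex digit (either case), or -1. */
static int hexval(int c)
{
    static const char digits[] = "0123456789abcdef";
    const char *p;

    if (c == '\0')
        return -1;
    p = strchr(digits, tolower(c));
    return p != NULL ? (int)(p - digits) : -1;
}

/*
 * Hypothetical helper: undo RFC 4514 escaping. "\XX" becomes the octet
 * 0xXX; "\c" for any other character c becomes c. Returns the number of
 * octets written (output is also NUL-terminated), or -1 on a dangling
 * backslash or if out is too small.
 */
long unescape_value(const char *in, unsigned char *out, size_t outlen)
{
    size_t o = 0;

    while (*in != '\0') {
        int c = (unsigned char)*in++;

        if (c == '\\') {
            int hi, lo;

            if (*in == '\0')
                return -1;                  /* dangling backslash */
            hi = hexval((unsigned char)in[0]);
            lo = hi < 0 ? -1 : hexval((unsigned char)in[1]);
            if (hi >= 0 && lo >= 0) {       /* hex pair, e.g. "\2c" */
                c = hi << 4 | lo;
                in += 2;
            } else {                        /* e.g. "\," "\+" "\ " */
                c = (unsigned char)*in++;
            }
        }
        if (o + 1 >= outlen)
            return -1;
        out[o++] = (unsigned char)c;
    }
    if (outlen == 0)
        return -1;
    out[o] = '\0';
    return (long)o;
}
```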
### Leading and Trailing Spaces, or a Leading Hash Mark

Leading or trailing spaces and any leading hash mark in an attribute value must
be escaped. Spaces in the middle of a value do not need to be escaped.

**Example:** `cn=\ John Doe\ ` (escaping leading and trailing spaces)
### Character Sets

The string must first be converted to UTF-8, prior to any escaping. In
particular, some strings in X.509 certificates may be encoded in 16-bit Unicode
(BMP) form; as a first step, these need to be converted to UTF-8.

Tests should include some examples of non-ASCII, non-UTF-8 strings that require
conversion to UTF-8 as part of encoding. The output should not contain the
`\U<xxxx>` or `\W<xxxxxxxx>` forms produced by `do_esc_char()`.
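
Within OpenSSL, `ASN1_STRING_to_UTF8()` already performs this conversion for the supported string types. To illustrate what is involved for BMP strings, here is a standalone sketch (the `bmp_to_utf8()` helper is hypothetical) that converts big-endian UCS-2 content octets to UTF-8, rejecting odd lengths and surrogate code units.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: convert BMPString content octets (big-endian UCS-2)
 * to UTF-8. Code points U+0000..U+FFFF need at most three UTF-8 octets.
 * Surrogate code units (U+D800..U+DFFF) are not characters in UCS-2 and
 * are rejected. Returns the UTF-8 length (output NUL-terminated), or -1.
 */
long bmp_to_utf8(const unsigned char *in, size_t len, char *out, size_t outlen)
{
    size_t o = 0;

    if (len % 2 != 0 || outlen == 0)
        return -1;
    for (size_t i = 0; i < len; i += 2) {
        unsigned int cp = (unsigned int)in[i] << 8 | in[i + 1];
        unsigned char buf[3];
        size_t n;

        if (cp >= 0xD800 && cp <= 0xDFFF)
            return -1;
        if (cp < 0x80) {                        /* 1 octet: 0xxxxxxx */
            buf[0] = (unsigned char)cp;
            n = 1;
        } else if (cp < 0x800) {                /* 2 octets: 110xxxxx 10xxxxxx */
            buf[0] = (unsigned char)(0xC0 | (cp >> 6));
            buf[1] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 2;
        } else {                                /* 3 octets: 1110xxxx 10xxxxxx 10xxxxxx */
            buf[0] = (unsigned char)(0xE0 | (cp >> 12));
            buf[1] = (unsigned char)(0x80 | ((cp >> 6) & 0x3F));
            buf[2] = (unsigned char)(0x80 | (cp & 0x3F));
            n = 3;
        }
        if (o + n >= outlen)
            return -1;
        memcpy(out + o, buf, n);
        o += n;
    }
    out[o] = '\0';
    return (long)o;
}
```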

Attribute Type Names
--------------------

The core attribute type names "c", "l", "o", "ou", etc., are specified directly
in [RFC4519] Sections 2 and 4. These names are not case-sensitive. We may wish
to expand the set of recognised type names to include some that are new in
[RFC4519] or in the IANA [LDAP descriptor registry].

Only the registry entries of type "A" (Attribute Type) are potentially
relevant. All the *mainstream* attribute types are already listed in
`crypto/objects/objects.txt` and should already be supported:

| Attribute Name | OID | Reference |
|---|---|---|
| uid | 0.9.2342.19200300.100.1.1 | [RFC4519] |
| userId | 0.9.2342.19200300.100.1.1 | [RFC4519] |
| mail | 0.9.2342.19200300.100.1.3 | [RFC4524] |
| RFC822Mailbox | 0.9.2342.19200300.100.1.3 | [RFC4524] |
| DC | 0.9.2342.19200300.100.1.25 | [RFC4519] |
| domainComponent | 0.9.2342.19200300.100.1.25 | [RFC4519] |
| email | 1.2.840.113549.1.9.1 | [RFC3280] |
| emailAddress | 1.2.840.113549.1.9.1 | [RFC3280] |
| cn | 2.5.4.3 | [RFC4519] |
| commonName | 2.5.4.3 | [RFC4519] |
| sn | 2.5.4.4 | [RFC4519] |
| surname | 2.5.4.4 | [RFC4519] |
| serialNumber | 2.5.4.5 | [RFC4519] |
| c | 2.5.4.6 | [RFC4519] |
| countryName | 2.5.4.6 | [RFC4519] |
| L | 2.5.4.7 | [RFC4519] |
| localityName | 2.5.4.7 | [RFC4519] |
| st | 2.5.4.8 | [RFC4519] |
| stateOrProvinceName | 2.5.4.8 | [RFC2256] |
| street | 2.5.4.9 | [RFC4519] |
| streetAddress | 2.5.4.9 | [RFC2256] |
| o | 2.5.4.10 | [RFC4519] |
| organizationName | 2.5.4.10 | [RFC4519] |
| ou | 2.5.4.11 | [RFC4519] |
| organizationalUnitName | 2.5.4.11 | [RFC4519] |
| title | 2.5.4.12 | [RFC4519] |
| description | 2.5.4.13 | [RFC4519] |
| businessCategory | 2.5.4.15 | [RFC4519] |
| postalAddress | 2.5.4.16 | [RFC4519] |
| postalCode | 2.5.4.17 | [RFC4519] |
| postOfficeBox | 2.5.4.18 | [RFC4519] |
| physicalDeliveryOfficeName | 2.5.4.19 | [RFC4519] |
| telephoneNumber | 2.5.4.20 | [RFC4519] |
| name | 2.5.4.41 | [RFC4519] |
| givenName | 2.5.4.42 | [RFC4519] |
| initials | 2.5.4.43 | [RFC4519] |
| generationQualifier | 2.5.4.44 | [RFC4519] |
| pseudonym | 2.5.4.65 | [RFC3280] |
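
A case-insensitive lookup over such a table might look like the following sketch. The `attr_names` array holds only a hypothetical subset of the rows above, and `attr_name_to_oid()` is illustrative, not a description of how OpenSSL's tables generated from `crypto/objects/objects.txt` are searched. Aliases such as `cn` and `commonName` map to the same OID.

```c
#include <ctype.h>
#include <stddef.h>
#include <string.h>

/* ASCII case-insensitive string equality (returns 1 if equal). */
static int ci_equal(const char *a, const char *b)
{
    size_t n = strlen(a);

    if (n != strlen(b))
        return 0;
    for (size_t i = 0; i < n; i++)
        if (tolower((unsigned char)a[i]) != tolower((unsigned char)b[i]))
            return 0;
    return 1;
}

/* Hypothetical subset of the table above, for illustration only. */
static const struct {
    const char *name;
    const char *oid;
} attr_names[] = {
    {"cn", "2.5.4.3"}, {"commonName", "2.5.4.3"},
    {"c", "2.5.4.6"}, {"countryName", "2.5.4.6"},
    {"o", "2.5.4.10"}, {"organizationName", "2.5.4.10"},
    {"ou", "2.5.4.11"}, {"organizationalUnitName", "2.5.4.11"},
    {"dc", "0.9.2342.19200300.100.1.25"},
    {"domainComponent", "0.9.2342.19200300.100.1.25"},
    {"uid", "0.9.2342.19200300.100.1.1"},
};

/* Map an attribute type name (in any case) to its OID, or NULL if unknown. */
const char *attr_name_to_oid(const char *name)
{
    for (size_t i = 0; i < sizeof(attr_names) / sizeof(attr_names[0]); i++)
        if (ci_equal(name, attr_names[i].name))
            return attr_names[i].oid;
    return NULL;
}
```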

When an attribute type OID is not one of the known values, it is represented in
its dotted-decimal form, and the attribute value must then be encoded as a
leading `#` character followed by the hexadecimal encoding of the DER-encoded
value (see Section 2.4 of [RFC4514]). This form may also be used when the value
has no suitable string representation.
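
The hex form can be sketched as follows. The hypothetical `format_unknown_attr()` assumes the caller already has the complete encoding of the value (tag, length, and contents octets) and only does the formatting. For example, an OCTET STRING containing `Hi` (encoded as `04 02 48 69`) under the OID `1.3.6.1.4.1.1466.0` becomes `1.3.6.1.4.1.1466.0=#04024869`.

```c
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper: render an attribute known only by its OID as
 * "<dotted-decimal OID>=#<hex>", where <hex> is two lower-case hex digits
 * per octet of the value's complete encoding (tag, length and contents),
 * which the caller must supply. Returns the output length, or -1 if out is
 * too small.
 */
long format_unknown_attr(const char *oid, const unsigned char *enc,
                         size_t enclen, char *out, size_t outlen)
{
    static const char hex[] = "0123456789abcdef";
    size_t oidlen = strlen(oid);
    size_t o;

    /* OID, "=#", two digits per octet, terminating NUL */
    if (oidlen + 2 + 2 * enclen + 1 > outlen)
        return -1;
    memcpy(out, oid, oidlen);
    o = oidlen;
    out[o++] = '=';
    out[o++] = '#';
    for (size_t i = 0; i < enclen; i++) {
        out[o++] = hex[enc[i] >> 4];
        out[o++] = hex[enc[i] & 0x0F];
    }
    out[o] = '\0';
    return (long)o;
}
```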

I have not checked whether we implement case-insensitive string comparison for
any of the attributes for which this is expected in LDAP. In certificates, I do
not expect to find case variants of RDNs that need to be considered equivalent
when comparing subject and issuer DNs.
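
If case-insensitive matching turns out to be needed, the following sketch is a rough starting point. The hypothetical `ci_value_match()` is only an ASCII approximation of LDAP's `caseIgnoreMatch` matching rule: it folds case, ignores leading and trailing spaces, and treats internal runs of spaces as one. It deliberately does not attempt the full string preparation of RFC 4518 (Unicode case folding and normalization).

```c
#include <ctype.h>
#include <stddef.h>

/* Advance past any spaces. */
static const char *skip_spaces(const char *p)
{
    while (*p == ' ')
        p++;
    return p;
}

/*
 * Rough, ASCII-only approximation of caseIgnoreMatch: case is folded,
 * leading/trailing spaces are ignored and each internal run of spaces
 * counts as a single space. Returns 1 if the values match, 0 otherwise.
 */
int ci_value_match(const char *a, const char *b)
{
    a = skip_spaces(a);
    b = skip_spaces(b);
    while (*a != '\0' && *b != '\0') {
        if (*a == ' ' && *b == ' ') {
            /* collapse a run of spaces on both sides */
            a = skip_spaces(a);
            b = skip_spaces(b);
            continue;
        }
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    /* whatever is left on either side must be spaces only */
    return *skip_spaces(a) == '\0' && *skip_spaces(b) == '\0';
}
```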

Parsing of Names
----------------

The parsing of X.509 directory names (e.g. the `-subj` option of the `x509`
command) is performed by the `parse_name()` function in `apps/lib/apps.c`.
This currently assumes that the input is in the format produced by the legacy
`X509_NAME_oneline()` function. That format always starts with a `/`
character. A single slash by itself represents an **empty** RDN sequence.
The `parse_name()` function is used in the `ca`, `cmp`, `req`, `storeutl`, and
`x509` commands.

If or when we switch to outputting the [RFC4514] format, we will also need to
accept it on input. Therefore, `parse_name()` needs to be updated to treat
strings starting with a `/` as the legacy oneline form, and other strings as
the RFC4514 format.
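
The proposed dispatch can be sketched as follows, using hypothetical names throughout (`classify_name()`, `legacy_to_rfc4514()`). A leading `/` selects the legacy syntax, and anything else, including the empty string for an empty DN, is treated as RFC4514. The converter handles only the simplest legacy inputs: it assumes no escapes, no multi-valued RDNs, and no characters that would need RFC4514 escaping. It mainly shows that the legacy form lists RDNs most general first, so the components have to be reversed.

```c
#include <stddef.h>
#include <string.h>

enum name_syntax { NAME_LEGACY_ONELINE, NAME_RFC4514 };

/* Hypothetical dispatch: a leading '/' means the legacy oneline syntax. */
enum name_syntax classify_name(const char *s)
{
    return s[0] == '/' ? NAME_LEGACY_ONELINE : NAME_RFC4514;
}

/*
 * Hypothetical converter for the simplest legacy inputs, e.g.
 * "/C=UK/O=Org/CN=Name" -> "CN=Name,O=Org,C=UK".
 * ASSUMPTION: no escapes, no '+' (multi-valued RDNs) and no characters that
 * need RFC 4514 escaping; real code would have to handle all of these.
 * Returns 0 on success, -1 on non-legacy input or if out is too small.
 */
int legacy_to_rfc4514(const char *in, char *out, size_t outlen)
{
    size_t end = strlen(in), o = 0;

    if (classify_name(in) != NAME_LEGACY_ONELINE || outlen == 0)
        return -1;
    /* Walk "/comp" segments from the end: the last (most specific) first. */
    while (end > 0) {
        size_t start = end, n;

        while (in[start - 1] != '/')   /* stops at the latest at in[0] */
            start--;
        n = end - start;
        if (n > 0) {                    /* skip empty segments, e.g. "/" */
            size_t sep = o > 0 ? 1 : 0;

            if (o + sep + n >= outlen)
                return -1;
            if (sep)
                out[o++] = ',';
            memcpy(out + o, in + start, n);
            o += n;
        }
        end = start - 1;               /* step over the '/' */
    }
    out[o] = '\0';
    return 0;
}
```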

Parsing of [RFC4514] syntax is covered in Section 3 of that RFC. Currently, our
parser does not support RDNs with ad hoc dotted-decimal OIDs; only known, named
attribute types are supported. We should consider allowing explicit
dotted-decimal OIDs and using `X509_NAME_add_entry_by_OBJ()` to add these.
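
Accepting explicit OIDs first requires recognising them. RFC 4512 defines the `numericoid` form as two or more decimal arcs separated by dots, with no leading zeros. The hypothetical `is_numericoid()` below checks for that form. One possible route, not shown here, would then be to convert a matching string with `OBJ_txt2obj(type, 1)` (where `no_name` = 1 accepts only the numerical form) and pass the result to `X509_NAME_add_entry_by_OBJ()`.

```c
#include <ctype.h>

/*
 * Hypothetical check for the RFC 4512 "numericoid" form: two or more
 * decimal arcs separated by single dots, no leading zeros (a lone "0" is
 * fine). Returns 1 for e.g. "2.5.4.3", 0 for "cn", "1..2", "01.2" or "2".
 */
int is_numericoid(const char *s)
{
    int arcs = 0;

    for (;;) {
        if (!isdigit((unsigned char)*s))
            return 0;                   /* empty arc or non-digit */
        if (s[0] == '0' && isdigit((unsigned char)s[1]))
            return 0;                   /* leading zero */
        while (isdigit((unsigned char)*s))
            s++;
        arcs++;
        if (*s == '\0')
            return arcs >= 2;
        if (*s != '.')
            return 0;
        s++;                            /* skip the dot */
    }
}
```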

Names in the configuration file
-------------------------------

In configuration files, we represent directory names as a "section" with one
"attr = value" line per RDN component. Relevant documentation is in
`x509v3_config(5)` and `openssl-req(1)`. For example:

```
subjectAltName = dirName:dir_sect

[dir_sect]
C = UK
O = My Organization
OU = My Unit
CN = My Name
```

So in the configuration file, we only have to handle the syntax of the
individual value elements; the DN as a whole is not parsed. The `string_mask`
setting affects the encoding of the various strings, and defaults to
`utf8only` (other values are not recommended).

Only the `ca` and `req` commands process the string mask, though user
applications can do the same by calling `ASN1_STRING_set_default_mask_asc()`,
which is an undocumented and non-thread-safe function. The comments above the
code say:

```c
/*-
 * This function sets the default to various "flavours" of configuration.
 * based on an ASCII string. Currently this is:
 * MASK:XXXX : a numerical mask value.
 * default : use Printable, IA5, T61, BMP, and UTF8 string types
 * nombstr : any string type except variable-sized BMPStrings or UTF8Strings
 * pkix : PKIX recommendation in RFC 5280
 * utf8only : this is the default, use UTF8Strings
 */
```

The bottom line is that, for most users, the DN components in the
configuration file are already UTF-8-friendly. The only thing to check is
whether we support the desired set of attribute type names, both in the
configuration file and while parsing a string representation of a complete DN.
<!-- Links -->
[RFC2253]:
<https://www.rfc-editor.org/rfc/rfc2253.html>
[RFC2256]:
<https://www.rfc-editor.org/rfc/rfc2256.html>
[RFC3280]:
<https://www.rfc-editor.org/rfc/rfc3280.html>
[RFC4514]:
<https://www.rfc-editor.org/rfc/rfc4514.html>
[RFC4519]:
<https://www.rfc-editor.org/rfc/rfc4519.html>
[RFC4524]:
<https://www.rfc-editor.org/rfc/rfc4524.html>
[LDAP descriptor registry]:
<https://www.iana.org/assignments/ldap-parameters/ldap-parameters.xhtml#ldap-parameters-3>