RFC4514: String Representation of Distinguished Names
Introduction
RFC4514, which obsoletes RFC2253, defines the standard string format for representing Distinguished Names (DNs) and Relative Distinguished Names (RDNs) within LDAP. The format is also used more broadly, e.g., as the string representation of issuer and subject names in X.509 certificates.
Distinguished Names (DNs)
A Distinguished Name (DN) is a sequence of Relative Distinguished Names (RDNs).
String representation of DNs
The string representation of a DN consists of strings representing each of its
RDNs separated by commas (,). The RFC4514 specification lists the RDNs in
reverse order (as a result, the most specific elements, such as CommonName, are
output first, and the most general, such as the CountryName, last). The
expected physical order of RDNs within a DN is to list the most general names
first. Empty DNs are represented by an empty string.
Example: cn=John Doe,ou=People,dc=example,dc=com
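The ordering rule can be illustrated with a minimal sketch. It assumes the RDNs are already available as escaped strings in their physical (encoded) order; the function name dn_to_string is hypothetical and not part of any existing API.

```python
def dn_to_string(rdns_in_encoded_order):
    """Join RDN strings into an RFC4514-style DN string.

    The input lists the most general RDN first (the physical order);
    the output lists the most specific RDN first.
    """
    return ",".join(reversed(rdns_in_encoded_order))


# Physical order: most general name first.
encoded_order = ["dc=com", "dc=example", "ou=People", "cn=John Doe"]
dn_string = dn_to_string(encoded_order)

# An empty DN yields an empty string.
empty_dn_string = dn_to_string([])
```

Here dn_string matches the example above, and empty_dn_string is the empty string.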
Attributes and Values
Each RDN is composed of one or more attribute-value pairs. An attribute-value
pair is represented as type=value. The type names are not case-sensitive.
Example: cn=John Doe. Here cn (CommonName) is the attribute type and John Doe
is the attribute value.
Relative Distinguished Names (RDNs)
A Relative Distinguished Name (RDN) identifies an entry uniquely within its immediate superior entry. An RDN can consist of a single attribute-value pair or a set of multiple attribute-value pairs.
Multiple Attribute-Value Pairs in an RDN
The string representation of multiple attribute-value pairs within a single RDN
separates these by plus signs (+). In most cases each RDN consists of just a
single attribute-value pair. The order of these pairs within an RDN is not
significant (the ASN.1 abstract syntax designates them as a SET rather than
as a SEQUENCE).
Example: cn=John Doe+uid=jdoe
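As a rough sketch of the SET semantics, a multi-valued RDN can be modelled as a collection of (type, value) pairs that is joined with plus signs for output and compared without regard to order. The helper names are hypothetical. Values are assumed to be already escaped, and they are compared exactly here, which is a simplification.

```python
def rdn_to_string(pairs):
    """Join (type, value) pairs of one RDN with plus signs."""
    return "+".join(f"{attr_type}={value}" for attr_type, value in pairs)


def rdn_equal(a, b):
    """Compare two RDNs as sets: pair order is ignored and type names
    are compared case-insensitively."""
    def normalise(pairs):
        return {(attr_type.lower(), value) for attr_type, value in pairs}
    return normalise(a) == normalise(b)


rdn = [("cn", "John Doe"), ("uid", "jdoe")]
rdn_string = rdn_to_string(rdn)
same = rdn_equal(rdn, [("UID", "jdoe"), ("CN", "John Doe")])
```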
String Representation Rules
RFC4514 specifies detailed rules for the string representation of DNs and RDNs to handle special characters and ensure unambiguous parsing.
Escaping Special Characters
Certain characters have special meaning in DN strings and must be escaped if they appear in an attribute value. The special characters are:
- Comma (,)
- Plus (+)
- Double Quote (")
- Backslash (\)
- Less than (<)
- Greater than (>)
- Semicolon (;)
- Leading hash (#)
- Leading or trailing space
- Optionally escaped: equals (=), non-leading hash (#)
These characters are escaped by preceding them with a backslash (\). Other
characters may be escaped by encoding each octet of their UTF-8
encoding as two hexadecimal digits preceded by a backslash.
Example: cn=Doe\, John (escaping a comma in the value)
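The escaping rules listed above could be sketched as follows. This is a minimal illustration rather than a complete encoder: the function name escape_value is hypothetical, the optional escapes (equals, non-leading hash) are left unescaped, and the handling of the NUL character as \00 follows Section 2.4 of RFC4514 rather than the list above.

```python
def escape_value(value):
    """Escape an attribute value for use in an RFC4514-style DN string."""
    always_escaped = set(',+"\\<>;')
    last = len(value) - 1
    out = []
    for i, ch in enumerate(value):
        if ch in always_escaped:
            out.append("\\" + ch)
        elif ch == "#" and i == 0:
            # Only a leading hash must be escaped.
            out.append("\\#")
        elif ch == " " and (i == 0 or i == last):
            # Only leading and trailing spaces must be escaped.
            out.append("\\ ")
        elif ch == "\x00":
            # NUL is written as a hex pair (RFC4514 Section 2.4).
            out.append("\\00")
        else:
            out.append(ch)
    return "".join(out)


escaped_comma = escape_value("Doe, John")
escaped_spaces = escape_value(" John Doe ")
escaped_hash = escape_value("#1 fan")
```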
Hexadecimal Escaping
Any character may be represented by separately encoding each byte of its UTF-8
encoding as its hexadecimal value preceded by a backslash (\). This is
particularly applicable to non-ASCII characters.
Example: cn=John\20Doe (representing a space using its hexadecimal code)
Example: cn=Doe\2c John (escaping a comma in the value)
Example: cn=Виктор \d0\94\d1\83\d1\85\d0\be\d0\b2\d0\bd\d1\8b\d0\b9
(escaping each UTF-8 byte of the last name).
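To make the hexadecimal form concrete, the sketch below hex-escapes every non-ASCII character of a value and decodes both escape styles back to text. The function names are hypothetical, and the decoder assumes well-formed input: a backslash followed by two hexadecimal digits is treated as one byte, and any other escaped character is taken literally.

```python
import string


def hex_escape_non_ascii(value):
    """Replace each non-ASCII character by backslash-hex pairs, one per
    byte of its UTF-8 encoding."""
    out = []
    for ch in value:
        if ord(ch) < 0x80:
            out.append(ch)
        else:
            out.extend(f"\\{byte:02x}" for byte in ch.encode("utf-8"))
    return "".join(out)


def unescape_value(escaped):
    """Decode backslash escapes (single character or hex pair) to text."""
    raw = bytearray()
    i = 0
    while i < len(escaped):
        ch = escaped[i]
        if ch != "\\":
            raw.extend(ch.encode("utf-8"))
            i += 1
            continue
        if i + 1 >= len(escaped):
            raise ValueError("dangling backslash at end of value")
        pair = escaped[i + 1:i + 3]
        if len(pair) == 2 and all(c in string.hexdigits for c in pair):
            raw.append(int(pair, 16))
            i += 3
        else:
            raw.extend(escaped[i + 1].encode("utf-8"))
            i += 2
    return raw.decode("utf-8")


escaped_name = hex_escape_non_ascii("Духовный")
decoded_space = unescape_value("John\\20Doe")
```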
Leading and Trailing Spaces, or a leading hash mark
Leading or trailing spaces and any leading hash mark in an attribute value must be escaped. Spaces in the middle of a value do not need to be escaped.
Example: cn=\ John Doe\  (escaping leading and trailing spaces; the final backslash is followed by a space)
Character Sets
The string must be converted to UTF-8 before any escaping. In particular, some strings in X.509 certificates may be encoded in 16-bit Unicode (BMP) form; as a first step, these need to be converted to UTF-8.
Tests should include some examples of non-ASCII, non-UTF8 strings that require
conversion to UTF-8 as part of encoding. The output should not contain the
\U<xxxx> or \W<xxxxxxxx> forms seen in do_esc_char().
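The conversion step might look like the following sketch, which assumes the BMP string content is available as raw big-endian 16-bit code units. The function name is hypothetical, and Python's utf-16-be codec is used as a stand-in for a strict UCS-2 decoder (it additionally accepts surrogate pairs).

```python
def bmp_to_utf8(raw):
    """Convert the big-endian 16-bit content of a BMP string to UTF-8."""
    return raw.decode("utf-16-be").encode("utf-8")


# U+0412 (Cyrillic capital letter Ve) as one 16-bit code unit.
bmp_bytes = b"\x04\x12"
utf8_bytes = bmp_to_utf8(bmp_bytes)

# Any hex escaping is then applied to the UTF-8 bytes, not to the
# original 16-bit code units.
hex_escaped = "".join(f"\\{byte:02x}" for byte in utf8_bytes)
```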
Attribute Type Names
The core attribute type names "c", "l", "o", "ou", etc., are specified directly in RFC4519 Sections 2 and 4. These names are not case sensitive. We may wish to expand the set of recognised type names to include some that are new in RFC4519 or in the IANA LDAP descriptor registry.
Only the entries of type "A" (Attribute Type) are potentially relevant. All
the mainstream attribute types are already listed in
crypto/objects/objects.txt and should be already supported:
Attribute Name | OID | Reference |
---|---|---|
uid | 0.9.2342.19200300.100.1.1 | RFC4519 |
userId | 0.9.2342.19200300.100.1.1 | RFC4519 |
mail | 0.9.2342.19200300.100.1.3 | RFC4524 |
RFC822Mailbox | 0.9.2342.19200300.100.1.3 | RFC4524 |
DC | 0.9.2342.19200300.100.1.25 | RFC4519 |
domainComponent | 0.9.2342.19200300.100.1.25 | RFC4519 |
email | 1.2.840.113549.1.9.1 | RFC3280 |
emailAddress | 1.2.840.113549.1.9.1 | RFC3280 |
cn | 2.5.4.3 | RFC4519 |
commonName | 2.5.4.3 | RFC4519 |
sn | 2.5.4.4 | RFC4519 |
surname | 2.5.4.4 | RFC4519 |
serialNumber | 2.5.4.5 | RFC4519 |
c | 2.5.4.6 | RFC4519 |
countryName | 2.5.4.6 | RFC4519 |
L | 2.5.4.7 | RFC4519 |
localityName | 2.5.4.7 | RFC4519 |
st | 2.5.4.8 | RFC4519 |
stateOrProvinceName | 2.5.4.8 | RFC2256 |
street | 2.5.4.9 | RFC4519 |
streetAddress | 2.5.4.9 | RFC2256 |
o | 2.5.4.10 | RFC4519 |
organizationName | 2.5.4.10 | RFC4519 |
ou | 2.5.4.11 | RFC4519 |
organizationalUnitName | 2.5.4.11 | RFC4519 |
title | 2.5.4.12 | RFC4519 |
description | 2.5.4.13 | RFC4519 |
businessCategory | 2.5.4.15 | RFC4519 |
postalAddress | 2.5.4.16 | RFC4519 |
postalCode | 2.5.4.17 | RFC4519 |
postOfficeBox | 2.5.4.18 | RFC4519 |
physicalDeliveryOfficeName | 2.5.4.19 | RFC4519 |
telephoneNumber | 2.5.4.20 | RFC4519 |
name | 2.5.4.41 | RFC4519 |
givenName | 2.5.4.42 | RFC4519 |
initials | 2.5.4.43 | RFC4519 |
generationQualifier | 2.5.4.44 | RFC4519 |
pseudonym | 2.5.4.65 | RFC3280 |
When an attribute type OID is not one of the known values, it is represented by
its dotted-decimal form, and the attribute value must then be encoded with a
leading # character followed by the hexadecimal encoding of the DER-encoded
value (see Section 2.4 of RFC4514). This form may also be used when the value
has no suitable string representation.
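A minimal sketch of this form follows. The helper names are hypothetical, the DER encoder only covers a UTF8String value (tag 0x0c) for illustration, and the OID is an arbitrary illustrative value.

```python
def der_utf8string(text):
    """DER-encode text as a UTF8String: tag 0x0c, length, content."""
    content = text.encode("utf-8")
    n = len(content)
    if n < 0x80:
        # Short form: a single length byte.
        length = bytes([n])
    else:
        # Long form: 0x80 | number of length bytes, then the length.
        n_bytes = n.to_bytes((n.bit_length() + 7) // 8, "big")
        length = bytes([0x80 | len(n_bytes)]) + n_bytes
    return b"\x0c" + length + content


def unknown_attr_to_string(dotted_oid, der_value):
    """Format an attribute with an unrecognised type as OID=#hex."""
    return f"{dotted_oid}=#{der_value.hex()}"


ava = unknown_attr_to_string("1.3.6.1.4.1.1466.0", der_utf8string("Hi"))
```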
I have not checked whether we implement case-insensitive string comparison for any of the attributes for which this is expected in LDAP. In certificates I do not expect to find case-variants of RDNs that need to be considered equivalent when comparing subject and issuer DNs.
Parsing of Names
The parsing of X.509 directory names (e.g. the -subj option of the x509
command) is performed by the parse_name() function in apps/lib/apps.c.
This currently assumes that the input uses the output format of the legacy
X509_NAME_oneline() function. That format always starts with a / character.
A single slash by itself represents an empty RDN sequence.
The parse_name() function is used in the ca, cmp, req, storeutl, and x509
commands.
If or when we switch to output the RFC4514 format, we need to also accept
it on input. Therefore, parse_name() needs to be updated to treat strings
starting with a / as legacy oneline forms, and other strings as the RFC4514
format.
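The proposed dispatch rule is small enough to sketch directly. The Python below only illustrates the decision, not the C implementation of parse_name(); both function names are hypothetical.

```python
def detect_name_format(name):
    """Classify an input name by its first character."""
    return "oneline" if name.startswith("/") else "rfc4514"


def is_empty_name(name):
    """A lone slash (legacy) or an empty string (RFC4514) is an empty DN."""
    return name in ("/", "")


legacy = detect_name_format("/C=UK/O=My Organization/CN=My Name")
modern = detect_name_format("CN=My Name,O=My Organization,C=UK")
```

One consequence of this rule is that an RFC4514 string whose first attribute type begins with a slash cannot occur, since type names start with a letter or a digit, so the leading character is an unambiguous discriminator.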
Parsing of RFC4514 syntax is covered in Section 3 of RFC4514. Currently, our
parser does not support RDNs with ad hoc dotted-decimal OIDs; only known named
attribute types are supported. We should consider allowing explicit
dotted-decimal OIDs and using X509_NAME_add_entry_by_OBJ() to add these.
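A parser that accepts both named types and dotted-decimal OIDs could be structured roughly as in the sketch below. It is an illustration under simplifying assumptions, not a description of the existing parser: all names are hypothetical, values are returned still escaped, and whitespace around separators is not tolerated.

```python
import re

OID_RE = re.compile(r"(0|[1-9][0-9]*)(\.(0|[1-9][0-9]*))+")
NAME_RE = re.compile(r"[A-Za-z][A-Za-z0-9-]*")


def split_unescaped(text, sep):
    """Split text on sep, ignoring separators preceded by a backslash."""
    parts, current, i = [], [], 0
    while i < len(text):
        if text[i] == "\\" and i + 1 < len(text):
            current.append(text[i:i + 2])  # keep the escape as-is
            i += 2
        elif text[i] == sep:
            parts.append("".join(current))
            current = []
            i += 1
        else:
            current.append(text[i])
            i += 1
    parts.append("".join(current))
    return parts


def parse_dn(dn):
    """Return RDNs in physical order (most general first); each RDN is
    a list of (kind, type, escaped_value) tuples."""
    if dn == "":
        return []
    rdns = []
    for rdn in split_unescaped(dn, ","):
        pairs = []
        for ava in split_unescaped(rdn, "+"):
            attr_type, sep, value = ava.partition("=")
            if not sep:
                raise ValueError(f"missing '=' in {ava!r}")
            if OID_RE.fullmatch(attr_type):
                kind = "oid"
            elif NAME_RE.fullmatch(attr_type):
                kind = "name"
            else:
                raise ValueError(f"bad attribute type {attr_type!r}")
            pairs.append((kind, attr_type, value))
        rdns.append(pairs)
    rdns.reverse()  # the string form lists the most specific RDN first
    return rdns


parsed = parse_dn("cn=Doe\\, John+uid=jdoe,1.3.6.1.4.1.1466.0=#04024869")
```

Entries of kind "oid" are the ones that would need to be added by object rather than by name.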
Names in the configuration file
In configuration files, we represent directory names as a "section" with one
"attr = value" line per RDN component. Relevant documentation is in
x509v3_config(3) and openssl-req(1). For example:
subjectAltName = dirName:dir_sect
[dir_sect]
C = UK
O = My Organization
OU = My Unit
CN = My Name
So in the configuration file, we only have to handle the syntax of the
individual value elements; the DN as a whole is not parsed. The string_mask
affects the encoding of the various strings, and defaults to utf8only (other
values are not recommended).
Only the ca and req commands process the string mask, though user
applications can do the same by calling ASN1_STRING_set_default_mask_asc(),
which is an undocumented and non-thread-safe function. The comments above the
code say:
/*-
* This function sets the default to various "flavours" of configuration.
* based on an ASCII string. Currently this is:
* MASK:XXXX : a numerical mask value.
* default : use Printable, IA5, T61, BMP, and UTF8 string types
* nombstr : any string type except variable-sized BMPStrings or UTF8Strings
* pkix : PKIX recommendation in RFC 5280
* utf8only : this is the default, use UTF8Strings
*/
The bottom line is that, for most users, the DN components in the configuration file are already UTF-8-friendly. The only thing to check is whether we support the desired set of attribute type names, both in the configuration file and while parsing a string representation of a complete DN.