RFC4514: String Representation of Distinguished Names

Introduction

RFC4514, which obsoletes RFC2253, defines the standard string format for representing Distinguished Names (DNs) and Relative Distinguished Names (RDNs) within LDAP. The format is also used more broadly, e.g., as the string representation of issuer and subject names in X.509 certificates.

Distinguished Names (DNs)

A Distinguished Name (DN) is a sequence of Relative Distinguished Names (RDNs).

String representation of DNs

The string representation of a DN consists of the strings representing each of its RDNs, separated by commas (,). RFC4514 lists the RDNs in the reverse of their encoded order, so the most specific elements, such as CommonName, are output first and the most general, such as CountryName, last. The expected physical order of RDNs within an encoded DN is the opposite: the most general names come first.

Empty DNs are represented by an empty string.

Example: cn=John Doe,ou=People,dc=example,dc=com
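The ordering rule can be sketched in a few lines of C. This is only an illustration under stated assumptions: dn_join_reversed() is a hypothetical helper, not an OpenSSL function, and it assumes that each RDN string has already been formatted and escaped.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not an OpenSSL API): joins already-formatted RDN
 * strings into a DN string. `rdns` is given in encoded order (most
 * general first); the output lists them in reverse, separated by commas.
 * Returns 0 on success, -1 if `out` is too small.
 */
static int dn_join_reversed(const char *const *rdns, size_t count,
                            char *out, size_t outlen)
{
    size_t used = 0;

    assert(out != NULL && outlen > 0);
    out[0] = '\0';                      /* an empty DN is the empty string */
    for (size_t i = count; i-- > 0; ) {
        size_t len = strlen(rdns[i]);
        size_t sep = (i + 1 < count) ? 1 : 0; /* no comma before the first RDN */

        if (used + sep + len + 1 > outlen)
            return -1;
        if (sep)
            out[used++] = ',';
        memcpy(out + used, rdns[i], len);
        used += len;
        out[used] = '\0';
    }
    return 0;
}
```

Given the encoded order dc=com, dc=example, ou=People, cn=John Doe, this yields the example string shown above; a count of zero yields the empty string.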

Attributes and Values

Each RDN is composed of one or more attribute-value pairs. An attribute-value pair is represented as type=value. The type names are not case-sensitive.

Example: cn=John Doe. Here cn (CommonName) is the attribute type and John Doe is the attribute value.

Relative Distinguished Names (RDNs)

A Relative Distinguished Name (RDN) identifies an entry uniquely within its immediate superior entry. An RDN can consist of a single attribute-value pair or a set of multiple attribute-value pairs.

Multiple Attribute-Value Pairs in an RDN

The string representation of multiple attribute-value pairs within a single RDN separates these by plus signs (+). In most cases each RDN consists of just a single attribute-value pair. The order of these pairs within an RDN is not significant (the ASN.1 abstract syntax designates them as a SET rather than as a SEQUENCE).

Example: cn=John Doe+uid=jdoe

String Representation Rules

RFC4514 specifies detailed rules for the string representation of DNs and RDNs to handle special characters and ensure unambiguous parsing.

Escaping Special Characters

Certain characters have special meaning in DN strings and must be escaped if they appear in an attribute value. The special characters are:

  • Comma (,)
  • Plus (+)
  • Double Quote (")
  • Backslash (\)
  • Less than (<)
  • Greater than (>)
  • Semicolon (;)
  • Leading hash (#)
  • Leading or trailing space
  • Optionally escaped: equals (=), non-leading hash (#)

These characters are escaped by preceding them with a backslash (\). Other characters may be escaped by encoding each octet of their UTF-8 encoding as two hexadecimal digits preceded by a backslash.

Example: cn=Doe\, John (escaping a comma in the value)
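A minimal sketch of the mandatory backslash escapes follows. The function name rfc4514_escape() is hypothetical and not an OpenSSL API. The sketch assumes the input is already UTF-8 and NUL-terminated, and it leaves the optionally escaped characters (equals, non-leading hash) untouched.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Hypothetical helper (not an OpenSSL API): escapes a UTF-8 attribute
 * value using the mandatory backslash escapes listed above.
 * Returns the output length, or (size_t)-1 if `out` is too small.
 */
static size_t rfc4514_escape(const char *val, char *out, size_t outlen)
{
    size_t n = strlen(val), used = 0;

    assert(out != NULL);
    if (outlen == 0)
        return (size_t)-1;
    for (size_t i = 0; i < n; i++) {
        char c = val[i];
        int esc = strchr(",+\"\\<>;", c) != NULL   /* always special */
            || (i == 0 && (c == '#' || c == ' '))  /* leading hash or space */
            || (i == n - 1 && c == ' ');           /* trailing space */

        /* room for the optional backslash, the character and the NUL */
        if (used + (esc ? 2 : 1) + 1 > outlen)
            return (size_t)-1;
        if (esc)
            out[used++] = '\\';
        out[used++] = c;
    }
    out[used] = '\0';
    return used;
}
```

With this sketch the value "Doe, John" becomes "Doe\, John", while a value such as "a#b=c" is returned unchanged because none of its characters require escaping in those positions.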

Hexadecimal Escaping

Any character may be represented by separately encoding each byte of its UTF-8 encoding as its two-digit hexadecimal value preceded by a backslash (\). This is particularly applicable to non-ASCII characters.

Example: cn=John\20Doe (representing a space using its hexadecimal code)

Example: cn=Doe\2c John (escaping a comma in the value)

Example: cn=Виктор \d0\94\d1\83\d1\85\d0\be\d0\b2\d0\bd\d1\8b\d0\b9 (escaping each UTF-8 byte of the last name)
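One possible policy is to hex-escape every non-ASCII byte and leave ASCII bytes alone. The sketch below illustrates that policy only; hex_escape_non_ascii() is a hypothetical name, the choice of lowercase hex digits is an assumption made to match the example above, and the mandatory escapes for special ASCII characters are deliberately not handled here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not an OpenSSL API): writes each byte >= 0x80 of
 * a UTF-8 string as a backslash followed by two hex digits; ASCII bytes
 * are copied unchanged.
 * Returns the output length, or (size_t)-1 if `out` is too small.
 */
static size_t hex_escape_non_ascii(const char *val, char *out, size_t outlen)
{
    static const char hexdig[] = "0123456789abcdef";
    size_t used = 0;

    assert(val != NULL && out != NULL);
    if (outlen == 0)
        return (size_t)-1;
    for (const unsigned char *p = (const unsigned char *)val; *p != '\0'; p++) {
        size_t need = (*p >= 0x80) ? 3 : 1;

        if (used + need + 1 > outlen)   /* +1 for the terminating NUL */
            return (size_t)-1;
        if (*p >= 0x80) {
            out[used++] = '\\';
            out[used++] = hexdig[*p >> 4];
            out[used++] = hexdig[*p & 0x0f];
        } else {
            out[used++] = (char)*p;
        }
    }
    out[used] = '\0';
    return used;
}
```

For instance, the two UTF-8 bytes 0xd0 0x94 (the letter Д) come out as \d0\94.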

Leading and Trailing Spaces, or a leading hash mark

Leading or trailing spaces and any leading hash mark in an attribute value must be escaped. Spaces in the middle of a value do not need to be escaped.

Example: cn=\ John Doe\  (escaping leading and trailing spaces)

Character Sets

The string must first be converted to UTF-8, prior to any escaping. In particular, some strings in X.509 certificates may be encoded in 16-bit Unicode (BMP) form; as a first step, these need to be converted to UTF-8.

Tests should include some examples of non-ASCII, non-UTF8 strings that require conversion to UTF-8 as part of encoding. The output should not contain the \U<xxxx> or \W<xxxxxxxx> forms seen in do_esc_char().
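The conversion step for 16-bit strings could look roughly like the following. This is a sketch, not the OpenSSL implementation: bmp_to_utf8() is a hypothetical name, the input is assumed to be big-endian 16-bit code units as found in a BMPString, and surrogate code units are rejected on the assumption that BMP strings do not carry them.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not an OpenSSL API): converts big-endian 16-bit
 * code units to UTF-8. Surrogates (U+D800..U+DFFF) are rejected.
 * Returns the output length, or (size_t)-1 on error.
 */
static size_t bmp_to_utf8(const unsigned char *in, size_t inlen,
                          char *out, size_t outlen)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);
    if (inlen % 2 != 0 || outlen == 0)
        return (size_t)-1;
    for (size_t i = 0; i < inlen; i += 2) {
        unsigned int cp = ((unsigned int)in[i] << 8) | in[i + 1];
        size_t need = cp < 0x80 ? 1 : cp < 0x800 ? 2 : 3;

        if (cp >= 0xD800 && cp <= 0xDFFF)
            return (size_t)-1;
        if (used + need + 1 > outlen)   /* +1 for the terminating NUL */
            return (size_t)-1;
        if (need == 1) {
            out[used++] = (char)cp;
        } else if (need == 2) {
            out[used++] = (char)(0xC0 | (cp >> 6));
            out[used++] = (char)(0x80 | (cp & 0x3F));
        } else {
            out[used++] = (char)(0xE0 | (cp >> 12));
            out[used++] = (char)(0x80 | ((cp >> 6) & 0x3F));
            out[used++] = (char)(0x80 | (cp & 0x3F));
        }
    }
    out[used] = '\0';
    return used;
}
```

A test along the lines suggested above could feed in the code units 0x0041, 0x0414 and 0x20AC and check that the result is the UTF-8 byte sequence 41 d0 94 e2 82 ac, with no \U or \W forms appearing after escaping.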

Attribute Type Names

The core attribute type names "c", "l", "o", "ou", etc., are specified directly in RFC4519 Sections 2 and 4. These names are not case sensitive. We may wish to expand the set of recognised type names to include some that are new in RFC4519 or in the IANA LDAP descriptor registry.

Only the entries of type "A" (Attribute Type) are potentially relevant. All the mainstream attribute types are already listed in crypto/objects/objects.txt and should be already supported:

Attribute Name              OID                         Reference
uid                         0.9.2342.19200300.100.1.1   RFC4519
userId                      0.9.2342.19200300.100.1.1   RFC4519
mail                        0.9.2342.19200300.100.1.3   RFC4524
RFC822Mailbox               0.9.2342.19200300.100.1.3   RFC4524
DC                          0.9.2342.19200300.100.1.25  RFC4519
domainComponent             0.9.2342.19200300.100.1.25  RFC4519
email                       1.2.840.113549.1.9.1        RFC3280
emailAddress                1.2.840.113549.1.9.1        RFC3280
cn                          2.5.4.3                     RFC4519
commonName                  2.5.4.3                     RFC4519
sn                          2.5.4.4                     RFC4519
surname                     2.5.4.4                     RFC4519
serialNumber                2.5.4.5                     RFC4519
c                           2.5.4.6                     RFC4519
countryName                 2.5.4.6                     RFC4519
L                           2.5.4.7                     RFC4519
localityName                2.5.4.7                     RFC4519
st                          2.5.4.8                     RFC4519
stateOrProvinceName         2.5.4.8                     RFC2256
street                      2.5.4.9                     RFC4519
streetAddress               2.5.4.9                     RFC2256
o                           2.5.4.10                    RFC4519
organizationName            2.5.4.10                    RFC4519
ou                          2.5.4.11                    RFC4519
organizationalUnitName      2.5.4.11                    RFC4519
title                       2.5.4.12                    RFC4519
description                 2.5.4.13                    RFC4519
businessCategory            2.5.4.15                    RFC4519
postalAddress               2.5.4.16                    RFC4519
postalCode                  2.5.4.17                    RFC4519
postOfficeBox               2.5.4.18                    RFC4519
physicalDeliveryOfficeName  2.5.4.19                    RFC4519
telephoneNumber             2.5.4.20                    RFC4519
name                        2.5.4.41                    RFC4519
givenName                   2.5.4.42                    RFC4519
initials                    2.5.4.43                    RFC4519
generationQualifier         2.5.4.44                    RFC4519
pseudonym                   2.5.4.65                    RFC3280
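Because type names are not case-sensitive, any lookup from name to OID has to compare names without regard to case. The sketch below shows one way to do that with a small excerpt of the table above; attr_type_to_oid() and ascii_ieq() are hypothetical helpers, and a real implementation would presumably go through the object database built from crypto/objects/objects.txt rather than a hand-written array.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Case-insensitive comparison of two ASCII strings: 1 if equal, else 0. */
static int ascii_ieq(const char *a, const char *b)
{
    while (*a != '\0' && *b != '\0') {
        if (tolower((unsigned char)*a) != tolower((unsigned char)*b))
            return 0;
        a++;
        b++;
    }
    return *a == *b;    /* equal only if both strings ended together */
}

/* Excerpt of the table above, for illustration only. */
static const struct {
    const char *name;
    const char *oid;
} attr_types[] = {
    { "uid", "0.9.2342.19200300.100.1.1" },
    { "DC",  "0.9.2342.19200300.100.1.25" },
    { "cn",  "2.5.4.3" },
    { "c",   "2.5.4.6" },
    { "L",   "2.5.4.7" },
    { "o",   "2.5.4.10" },
    { "ou",  "2.5.4.11" },
};

/* Hypothetical lookup: returns the dotted-decimal OID, or NULL if unknown. */
static const char *attr_type_to_oid(const char *name)
{
    assert(name != NULL);
    for (size_t i = 0; i < sizeof(attr_types) / sizeof(attr_types[0]); i++)
        if (ascii_ieq(name, attr_types[i].name))
            return attr_types[i].oid;
    return NULL;
}
```

With this excerpt, "CN", "cn" and "Cn" all map to 2.5.4.3, and an unlisted name returns NULL.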

When an attribute type OID is not one of the known values, it is represented by its dotted-decimal form, and the attribute value must then be encoded as a leading # character followed by the hexadecimal encoding of the DER-encoded value (see Section 2.4 of RFC4514). This form may also be used when the value has no suitable string representation.
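The #-prefixed form is straightforward to produce once the DER encoding of the value is available. The following is a sketch under that assumption; rfc4514_hex_value() is a hypothetical name, and obtaining the DER bytes (for example from the ASN.1 value in the name entry) is outside its scope.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper (not an OpenSSL API): writes '#' followed by the
 * hexadecimal encoding of the DER-encoded value.
 * Returns the output length, or (size_t)-1 if `out` is too small.
 */
static size_t rfc4514_hex_value(const unsigned char *der, size_t derlen,
                                char *out, size_t outlen)
{
    static const char hexdig[] = "0123456789ABCDEF";
    size_t used = 0;

    assert(der != NULL && out != NULL);
    /* need '#', two digits per byte, and the terminating NUL */
    if (outlen < 2 || derlen > (outlen - 2) / 2)
        return (size_t)-1;
    out[used++] = '#';
    for (size_t i = 0; i < derlen; i++) {
        out[used++] = hexdig[der[i] >> 4];
        out[used++] = hexdig[der[i] & 0x0f];
    }
    out[used] = '\0';
    return used;
}
```

For the DER bytes 04 02 48 69 (an OCTET STRING holding "Hi") this produces #04024869, which matches the example 1.3.6.1.4.1.1466.0=#04024869 given in RFC4514.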

I have not checked whether we implement case-insensitive string comparison for any of the attributes for which this is expected in LDAP. In certificates I do not expect to find case-variants of RDNs that need to be considered equivalent when comparing subject and issuer DNs.

Parsing of Names

The parsing of X.509 directory names (e.g. the -subj option of the x509 command) is performed by the parse_name() function in apps/lib/apps.c. This currently assumes that its input is in the output format of the legacy X509_NAME_oneline() function. That format always starts with a / character. A single slash by itself represents an empty RDN sequence.

The parse_name() function is used in the ca, cmp, req, storeutl, and x509 commands.

If or when we switch to outputting the RFC4514 format, we also need to accept it on input. Therefore, parse_name() needs to be updated to treat strings starting with a / as the legacy oneline form, and all other strings as the RFC4514 format.
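The dispatch itself is a one-character check. The sketch below uses hypothetical names (name_format, detect_name_format()) and is not taken from apps/lib/apps.c; it only illustrates the rule described above.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical classification of a name string given on input. */
typedef enum {
    NAME_FMT_LEGACY_ONELINE,    /* starts with '/', e.g. "/C=UK/CN=My Name" */
    NAME_FMT_RFC4514            /* anything else, including the empty string */
} name_format;

static name_format detect_name_format(const char *s)
{
    assert(s != NULL);
    return s[0] == '/' ? NAME_FMT_LEGACY_ONELINE : NAME_FMT_RFC4514;
}
```

The check is unambiguous because an RFC4514 string begins with an attribute type, which is either a descriptor starting with a letter or a dotted-decimal OID starting with a digit, so it never begins with a slash. Both "/" and the empty string denote an empty name, in the legacy and RFC4514 formats respectively.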

Parsing of RFC4514 syntax is covered in Section 3 of RFC4514. Currently, our parser does not support RDNs with ad hoc dotted-decimal OIDs; only known named attribute types are supported. We should consider allowing explicit dotted-decimal OIDs and using X509_NAME_add_entry_by_OBJ() to add these.
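On input, each attribute value has to be unescaped once the RDN and type=value boundaries have been found. The sketch below covers only that unescaping step; rfc4514_unescape() and hexval() are hypothetical helpers, and splitting the string at unescaped commas, plus signs and equals signs is assumed to have been done already.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Value of a hexadecimal digit, or -1 if `c` is not one. */
static int hexval(int c)
{
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;
    return -1;
}

/*
 * Hypothetical helper (not an OpenSSL API): reverses backslash escapes
 * in one attribute value. A backslash must be followed either by two hex
 * digits or by one of the escapable special characters.
 * Returns the output length, or (size_t)-1 on a malformed escape or if
 * `out` is too small.
 */
static size_t rfc4514_unescape(const char *in, char *out, size_t outlen)
{
    size_t used = 0;

    assert(in != NULL && out != NULL);
    while (*in != '\0') {
        char c = *in++;

        if (c == '\\') {
            int hi = hexval((unsigned char)in[0]);
            int lo = hi >= 0 ? hexval((unsigned char)in[1]) : -1;

            if (hi >= 0 && lo >= 0) {           /* \xx hex pair */
                c = (char)(hi * 16 + lo);
                in += 2;
            } else if (in[0] != '\0'
                       && strchr("\"+,;<>\\ #=", in[0]) != NULL) {
                c = *in++;                      /* \<special> */
            } else {
                return (size_t)-1;
            }
        }
        if (used + 2 > outlen)                  /* byte plus final NUL */
            return (size_t)-1;
        out[used++] = c;
    }
    if (used + 1 > outlen)
        return (size_t)-1;
    out[used] = '\0';
    return used;
}
```

Under this sketch the values Doe\, John and Doe\2c John both decode to "Doe, John", and a trailing lone backslash is reported as an error. The length is returned separately because a \00 escape decodes to an embedded NUL byte.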

Names in the configuration file

In configuration files, we represent directory names as a "section" with one "attr = value" line per RDN component. Relevant documentation is in x509v3_config(3) and openssl-req(1). For example:

subjectAltName = dirName:dir_sect
[dir_sect]
C = UK
O = My Organization
OU = My Unit
CN = My Name

So in the configuration file we only have to handle the syntax of the individual value elements; the DN as a whole is not parsed. The string_mask setting affects the encoding of the various strings, and defaults to utf8only (other values are not recommended).

Only the ca and req commands process the string mask, though user applications can do the same by calling ASN1_STRING_set_default_mask_asc(), which is an undocumented and non-thread-safe function. The comments above the code say:

/*-
 * This function sets the default to various "flavours" of configuration.
 * based on an ASCII string. Currently this is:
 * MASK:XXXX : a numerical mask value.
 * default   : use Printable, IA5, T61, BMP, and UTF8 string types
 * nombstr   : any string type except variable-sized BMPStrings or UTF8Strings
 * pkix      : PKIX recommendation in RFC 5280
 * utf8only  : this is the default, use UTF8Strings
 */

The bottom line is that for most users the DN components in the configuration file are already UTF8-friendly. The only thing to check is whether we support the desired set of attribute type names, both in the configuration file and while parsing a string representation of a complete DN.