Recently, our overnight QUIC interop runs began failing in CI when an
openssl server was tested against an ngtcp2 client:
https://github.com/openssl/openssl/actions/runs/16739736813
The underlying cause bears some explination for historical purposes
The problem began happening with a recent update to ngtcp2 in which
ngtcp2 updated its wolfssl tls backend to support ML-KEM, which caused
ngtcp to emit a client hello message that offered several groups
(including X25519MLKEM768) but only provided a keyshare for x25519.
This in turn triggered the openssl server to respond with a hello retry
request (HRR), requesting an ML-KEM keyshare instead, which ngtcp2
obliged. However all subsequent frames from the client were discarded by
the server, due to failing packet body decryption.
The problem was tracked down to a mismatch in the initial vectors used
by the client and server, leading to an AEAD tag mismatch.
Packet protection keys generate their IV's in QUIC by xoring the packet
number of the received frame with the base IV as derived via HKDF in the
tls layer.
The underlying problem was that openssl hit a very odd corner case with
how we compute the packet number of the received frame. To save space,
QUIC encodes packet numbers using a variable length integer, and only
sends the changed bits in the packet number. This requires that the
receiver (openssl) store the largest received pn of the connection,
which we nominally do.
However, in default_port_packet_handler (where QUIC frames are processed
prior to having an established channel allocated) we use a temporary qrx
to validate the packet protection of those frames. This temporary qrx
may be incorporated into the channel in some cases, but is not in the
case of a valid frame that generates an HRR at the TLS layer. In this
case, the channel allocates its own qrx independently. When this
occurs, the largest_pn value of the temporary qrx is lost, and
subsequent frames are unable to be received, as the newly allocated qrx
belives that the larges_pn for a given pn_space is 0, rather than the
value received in the initial frame (which was a complete 32 bit value,
rather than just the changed lower 8 bits). As a result the IV
construction produced the wrong value, and the decrypt failed on those
subsequent frames.
Up to this point, that wasn't even a problem, as most quic
implementations start their packet numbering at 0, so the next packet
could still have its packet number computed properly. The combination
of ngtcp using large random values for initial packet numbers, along
with the HRR triggering a separate qrx creation on a channel led to the
discovery of this discrepancy.
The fix seems pretty straightforward. When we detect in
port_default_packet_handler, that we have a separate qrx in the new
channel, we migrate processed packets from the temporary qrx to the
canonical channel qrx. In addition to doing that, we also need to
migrate the largest_pn array from the temporary qrx to the channel_qrx
so that subsequent frame reception is guaranteed to compute the received
frame packet number properly, and as such, compute the proper IV for
packet protection decryption.
Fixesopenssl/project#1296
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/28115)
Run-checker merge / run-checker (enable-weak-ssl-ciphers) (push) Waiting to runDetails
Run-checker merge / run-checker (enable-zlib) (push) Waiting to runDetails
Run-checker merge / run-checker (no-dso) (push) Waiting to runDetails
Run-checker merge / run-checker (no-dynamic-engine) (push) Waiting to runDetails
Run-checker merge / run-checker (no-ec2m enable-fips) (push) Waiting to runDetails
Run-checker merge / run-checker (no-engine no-shared) (push) Waiting to runDetails
Run-checker merge / run-checker (no-err) (push) Waiting to runDetails
Run-checker merge / run-checker (no-filenames) (push) Waiting to runDetails
Run-checker merge / run-checker (no-integrity-only-ciphers) (push) Waiting to runDetails
Run-checker merge / run-checker (no-module) (push) Waiting to runDetails
Run-checker merge / run-checker (no-ocsp) (push) Waiting to runDetails
Run-checker merge / run-checker (no-pinshared) (push) Waiting to runDetails
Run-checker merge / run-checker (no-srp) (push) Waiting to runDetails
Run-checker merge / run-checker (no-srtp) (push) Waiting to runDetails
Run-checker merge / run-checker (no-ts) (push) Waiting to runDetails
Run-checker merge / jitter (push) Waiting to runDetails
Run-checker merge / threads_sanitizer_atomic_fallback (push) Waiting to runDetails
Windows GitHub CI / shared (map[arch:win32 config:--strict-warnings no-fips os:windows-2022]) (push) Waiting to runDetails
Windows GitHub CI / shared (map[arch:win64 config:enable-fips no-thread-pool no-quic os:windows-2022]) (push) Waiting to runDetails
Windows GitHub CI / shared (map[arch:win64 config:enable-fips os:windows-2019]) (push) Waiting to runDetails
Windows GitHub CI / plain (windows-2022) (push) Waiting to runDetails
Windows GitHub CI / minimal (windows-2019) (push) Waiting to runDetails
Windows GitHub CI / cygwin (windows-2019, map[arch:win64 config:-DCMAKE_C_COMPILER=gcc --strict-warnings enable-demos no-fips]) (push) Waiting to runDetails
Windows Compression GitHub CI / zstd (push) Waiting to runDetails
Windows Compression GitHub CI / brotli (push) Waiting to runDetails
Also make port_new_handshake_layer processing clearer.
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/27562)
Run-checker merge / run-checker (enable-weak-ssl-ciphers) (push) Waiting to runDetails
Run-checker merge / run-checker (enable-zlib) (push) Waiting to runDetails
Run-checker merge / run-checker (no-dso) (push) Waiting to runDetails
Run-checker merge / run-checker (no-dynamic-engine) (push) Waiting to runDetails
Run-checker merge / run-checker (no-ec2m enable-fips) (push) Waiting to runDetails
Run-checker merge / run-checker (no-engine no-shared) (push) Waiting to runDetails
Run-checker merge / run-checker (no-err) (push) Waiting to runDetails
Run-checker merge / run-checker (no-filenames) (push) Waiting to runDetails
Run-checker merge / run-checker (no-integrity-only-ciphers) (push) Waiting to runDetails
Run-checker merge / run-checker (no-module) (push) Waiting to runDetails
Run-checker merge / run-checker (no-ocsp) (push) Waiting to runDetails
Run-checker merge / run-checker (no-pinshared) (push) Waiting to runDetails
Run-checker merge / run-checker (no-srp) (push) Waiting to runDetails
Run-checker merge / run-checker (no-srtp) (push) Waiting to runDetails
Run-checker merge / run-checker (no-ts) (push) Waiting to runDetails
Run-checker merge / jitter (push) Waiting to runDetails
Run-checker merge / threads_sanitizer_atomic_fallback (push) Waiting to runDetails
Windows GitHub CI / shared (map[arch:win32 config:--strict-warnings no-fips os:windows-2022]) (push) Waiting to runDetails
Windows GitHub CI / shared (map[arch:win64 config:enable-fips no-thread-pool no-quic os:windows-2022]) (push) Waiting to runDetails
Windows GitHub CI / shared (map[arch:win64 config:enable-fips os:windows-2019]) (push) Waiting to runDetails
Windows GitHub CI / plain (windows-2022) (push) Waiting to runDetails
Windows GitHub CI / minimal (windows-2019) (push) Waiting to runDetails
Windows GitHub CI / cygwin (windows-2019, map[arch:win64 config:-DCMAKE_C_COMPILER=gcc --strict-warnings enable-demos no-fips]) (push) Waiting to runDetails
Windows Compression GitHub CI / zstd (push) Waiting to runDetails
Windows Compression GitHub CI / brotli (push) Waiting to runDetails
Trigger docs.openssl.org deployment / trigger (push) Has been cancelledDetails
Used RAND_priv_bytes_ex instead of RAND_bytes_ex to guarantee higher isolation
for cryptographic keys.
Replaced OPENSSL_free with OPENSSL_clear_free to wipe sensitive data and free
it.
Reviewed-by: Paul Dale <ppzgs1@gmail.com>
Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Paul Yang <kaishen.yy@antfin.com>
(Merged from https://github.com/openssl/openssl/pull/27029)
Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/27091)
With the addition of larger client hellos, stemming from the use of
larger PQC key shares, it may happen that we get a client hello accross
multiple datagrams. Normally this is not a problem as
port_default_packet_handler allocates a qrx and initializes its initial
secret immediately. But if server address validation is disabled, then
the channel creates the qrx in port_bind_channel itself, without initial
secrets. As a result, we validate the first datagram in
port_default_packet_handler, but the subsequent datagrams containing the
remaining client hello fragments fail decode.
Fix it by ensuring that we add the initial secret in port_bind_channel
if we don't give it a preconfigured qrx
Fixesopenssl/project#1131
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Viktor Dukhovni <viktor@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/27006)
The interoperability tests disable client ip address
validation done by RETRY packet. All tests done in CI
take code path which sends a retry packet.
The first initial packet sent by client uses a different
initial encryption level keys to protect packet integrity.
The keys are derived from DCID chosen by client.
When server accepts connection on behalf of initial packet,
the 'DCID' gets changed which means the initial level encryption keys
are changing too. So when server skips sending a retry packet,
it must forget the qrx which was used to validate initial
packet sent by client.
Forgetting qrx is not straightforward, we must salvage the
unencrypted packets left there after they were validated.
Those unencrypted packets must be injected to newly created channel.
Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26808)
We let port to create qrx object and use it for
packet validation. If packet validates, we then
create channel and pass pre-created qrx to channel's
constructor.
Co-authored-by: Andrew Dinh <andrewd@openssl.org>
Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26808)
QUIC interoperability tests discovered bugs in my earlier commit #59e7c2313be7cff.
This change reverts everything out.
Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Tim Hudson <tjh@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26748)
We let port to create qrx object and use it for
packet validation. If packet validates, we then
create channel and pass pre-created qrx to channel's
constructor.
Co-authored-by: Andrew Dinh <andrewd@openssl.org>
Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26610)
Currently, we send a NEW_TOKEN frame on every new validated connection,
but thats not necessecary. Since NEW_TOKEN tokens have a lifetime of 1
hour currently, we really only need to send a NEW_TOKEN if:
1) We validated a RETRY token
or
2) We validated a NEW_TOKEN for which the lifetime is nearing its limit
So lets do that. When we validate a token, only generate a NEW_TOKEN if
the current token is a RETRY token, or if its a NEW_TOKEN, and there is
less than 10% of the tokens lifetime remaining.
This lets clients use NEW_TOKENS repeatedly (as per the RFC), and saves
us some network bandwith.
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26517)
Just realized that NEW_TOKEN tokens don't need a reserved rscid.
Because a client might use a received NEW_TOKEN for multiple subsequent
connections, we allocate a cid when we validate the token on new
connection establishment (in fact we just use the one that the client
sends). As such the allocated rscid never gets used, and just sits
there until it ages out.
Instead, fill the rscid with random data to mutate subsequently
generated NEW_TOKENS's, since it won't ever be part of the validation
process anyway.
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26517)
We don't want to schedule the NEW_TOKEN frame until such time as the
handshake is complete, otherwise we risk giving a token to validate a
future connection to a peer we haven't decided to trust yet
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26517)
Start assiging initial tokens, and validating them on receipt
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26517)
Start storing new tokens in our new cache
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26517)
When we bind a channel, create a NEW_TOKEN token to be sent on the next
available datagram, once the channel is validated
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26517)
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26361)
We have a chicken and egg problem.
Normally when we create a connection object in quic, we associate it
with a listener, and up the ref on the parent listener, which is fine.
However, now that we are pre-allocating user_ssl objects for incomming
connections we have a situation in which:
1) The pre-alocated connection object holds a ref on the listener
2) The application has no awareness of the quic connection object (and
so can't free it)
3) The freeing of the listener object never calls into the quic stack,
because its reference count may hold references from connections that
haven't been accepted yet
We could require that applications register a function for the
new_pending_conn callback, and track/free these pending connections, but
that seems like alot of extra unneeded work to place on the application
Instead:
a) add a quic_conn_st flag named accepted
b) When pre-allocating connections, clear the flag in (a) and _dont_
hold a reference to the parent listener
c) in SSL_accept_connection, set the accepted flag and reference the
listener
d) in ossl_quic_free drop the listener reference only if the accepted
flag is set
c) expressly free all user_ssl objects in ossl_quic_port_drop_incoming
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26361)
Make it clear its only announcing connections, not streams
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26361)
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26361)
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26361)
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26361)
Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Paul Dale <ppzgs1@gmail.com>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26333)
Store an EVP_CIPHER_CTX context with an ephemeral key set in port
and use it to encrypt/decrypt the validation token.
Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26165)
Disabling server address validation here only relates to new connections
that arrive without a token. Future connections using tokens provided
by the server via NEW_TOKEN frames will still be validated
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26114)
Give us the infrastrucute to skip addr validation on the server
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26114)
If we opt not to do server address validation, we have no odcid
and therefore never reserved a local cid
We need to follow the initial code path to generate one
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26114)
Adds fields to the QUIC RETRY packet validation token:
timestamp, remote_addr, odcid, & rscid.
Also adds functionality to validate the token once returned by the client.
Note that this does not encrypt the token yet.
Also check that the RSCID stored in the RETRY validation
token matches the DCID in the header.
Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26048)
RFC says we should only accept datagrams of at least 1200 bytes, so the
check should discard anything under that, not over that
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
Reviewed-by: Tim Hudson <tjh@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26000)
@t8m pointed out that versino negotiation packets weren't guaranteeing
network byte ordering in the array of supported versions.
Convert the client to use network byte order on send and receipt.
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/25968)
If the server receives an Initial packet with a version number we don't
support (currently a fixed check for QUIC_VERSION_1), instead of
dropping it, respond with a version negotiation packet to the peer
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/25968)
In preparation for supporting the handling of version negotiation, we
need to be able to detect why the decoding of quic header failed.
Specifically, ossl_quic_wire_decode_pkt_hdr fails if the version
provided in the header isn't QUIC_VERSION_1. We want to keep that, as
we don't support anything else, but the server code needs to
differentiate when we fail decode because of a version problem, vs some
other more fatal malforming issue.
So add a uint64_t *fail_cause pointer that gets filled out with a
failure cause. We only use VERSION failures right now, but we can
expand this later if needed
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/25968)
RFC 9000 describes a method for preforming server address validation on
QUIC using retry packets. Based on:
https://datatracker.ietf.org/doc/html/rfc9000#section-17.2.5.2
We do the following:
1) Client sends an Initial packet without a retry token
2) Server abandons the initial packet and responds with a retry frame
which includes a retry token and integrity tag and new SCID
3) Client send the initial packet again, updating the encryption keys
for the connection based on the SCID sent in (2), using it as the new
DCID, including the retry token/tag provided in (2).
4) Server validates the token in (3) and creates a new connection using
the updated DCID from the client to generate its encryption keys
Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/25890)