Recently, our overnight QUIC interop runs began failing in CI when an
openssl server was tested against an ngtcp2 client:
https://github.com/openssl/openssl/actions/runs/16739736813
The underlying cause bears some explination for historical purposes
The problem began happening with a recent update to ngtcp2 in which
ngtcp2 updated its wolfssl tls backend to support ML-KEM, which caused
ngtcp to emit a client hello message that offered several groups
(including X25519MLKEM768) but only provided a keyshare for x25519.
This in turn triggered the openssl server to respond with a hello retry
request (HRR), requesting an ML-KEM keyshare instead, which ngtcp2
obliged. However all subsequent frames from the client were discarded by
the server, due to failing packet body decryption.
The problem was tracked down to a mismatch in the initial vectors used
by the client and server, leading to an AEAD tag mismatch.
Packet protection keys generate their IV's in QUIC by xoring the packet
number of the received frame with the base IV as derived via HKDF in the
tls layer.
The underlying problem was that openssl hit a very odd corner case with
how we compute the packet number of the received frame. To save space,
QUIC encodes packet numbers using a variable length integer, and only
sends the changed bits in the packet number. This requires that the
receiver (openssl) store the largest received pn of the connection,
which we nominally do.
However, in default_port_packet_handler (where QUIC frames are processed
prior to having an established channel allocated) we use a temporary qrx
to validate the packet protection of those frames. This temporary qrx
may be incorporated into the channel in some cases, but is not in the
case of a valid frame that generates an HRR at the TLS layer. In this
case, the channel allocates its own qrx independently. When this
occurs, the largest_pn value of the temporary qrx is lost, and
subsequent frames are unable to be received, as the newly allocated qrx
belives that the larges_pn for a given pn_space is 0, rather than the
value received in the initial frame (which was a complete 32 bit value,
rather than just the changed lower 8 bits). As a result the IV
construction produced the wrong value, and the decrypt failed on those
subsequent frames.
Up to this point, that wasn't even a problem, as most quic
implementations start their packet numbering at 0, so the next packet
could still have its packet number computed properly. The combination
of ngtcp using large random values for initial packet numbers, along
with the HRR triggering a separate qrx creation on a channel led to the
discovery of this discrepancy.
The fix seems pretty straightforward. When we detect in
port_default_packet_handler, that we have a separate qrx in the new
channel, we migrate processed packets from the temporary qrx to the
canonical channel qrx. In addition to doing that, we also need to
migrate the largest_pn array from the temporary qrx to the channel_qrx
so that subsequent frame reception is guaranteed to compute the received
frame packet number properly, and as such, compute the proper IV for
packet protection decryption.
Fixesopenssl/project#1296
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/28189)
It may occur that the qrx we allocate in port_default_packet handler to
do AEAD validation isn't the one the channel ultimately uses (like if we
turn off address validation). In that event, we need to ensure that
anything we have on that qrx isn't returned to its free list to avoid
early freeing when we free the qrx at the end of
port_default_packet_handler, while those frames are still pending on the
channel qrx
Reviewed-by: Tim Hudson <tjh@openssl.org>
Reviewed-by: Saša Nedvědický <sashan@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/27004)
The interoperability tests disable client ip address
validation done by RETRY packet. All tests done in CI
take code path which sends a retry packet.
The first initial packet sent by client uses a different
initial encryption level keys to protect packet integrity.
The keys are derived from DCID chosen by client.
When server accepts connection on behalf of initial packet,
the 'DCID' gets changed which means the initial level encryption keys
are changing too. So when server skips sending a retry packet,
it must forget the qrx which was used to validate initial
packet sent by client.
Forgetting qrx is not straightforward, we must salvage the
unencrypted packets left there after they were validated.
Those unencrypted packets must be injected to newly created channel.
Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26808)
We let port to create qrx object and use it for
packet validation. If packet validates, we then
create channel and pass pre-created qrx to channel's
constructor.
Co-authored-by: Andrew Dinh <andrewd@openssl.org>
Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26808)
QUIC interoperability tests discovered bugs in my earlier commit #59e7c2313be7cff.
This change reverts everything out.
Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Tim Hudson <tjh@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26748)
We let port to create qrx object and use it for
packet validation. If packet validates, we then
create channel and pass pre-created qrx to channel's
constructor.
Co-authored-by: Andrew Dinh <andrewd@openssl.org>
Reviewed-by: Neil Horman <nhorman@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26610)
Need an api to fetch the configured conn id len for short headers, add
that in here
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/26592)
Reviewed-by: Neil Horman <nhorman@openssl.org>
Release: yes
(cherry picked from commit 0ce7d1f355)
Reviewed-by: Hugo Landau <hlandau@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/24034)
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/21135)
Make sure PN duplicate suppression is side-channel safe by doing
the duplicate test after AEAD verification.
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/21135)
This is needed to support key update validation on the receive side.
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/21029)
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
Reviewed-by: Paul Dale <pauli@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/21029)
We create setter functions for the msg_callback and msg_callback_arg so
that these values can be properly propagated to the QRX/QTX/TXP even
after the channel has been created.
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Hugo Landau <hlandau@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/20914)
At this stage we just support msg_callback on receipt of a datagram.
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Hugo Landau <hlandau@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/20914)
Ordinarily we should not allow ELs to be rekeyed as it makes no sense to
do so. However the INITIAL EL can need to be rekeyed if a connection
retry occurs. Modify the QRL to allow this.
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/19703)
Previously, the QRX filled in a OSSL_QRX_PKT structure provided by the
caller. This necessitated the caller managing reference counting itself
using a OSSL_QRX_PKT_WRAP structure. The need for this structure has
been eliminated by adding refcounting support to the QRX itself. The QRX
now outputs a pointer to an OSSL_QRX_PKT instead of filling in a
structure provided by the caller. The OSSL_QRX_PKT_WRAP structure has
been eliminated.
Reviewed-by: Tomas Mraz <tomas@openssl.org>
Reviewed-by: Matt Caswell <matt@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/19703)
- Adds an RX time field to the OSSL_QRX_PKT structure.
- Adds a timekeeping argument to ossl_demux_new which is used to determine
packet reception time.
- Adds a decoded PN field to the OSSL_QRX_PKT structure.
This has to be decoded by the QRX anyway, and its omission was an oversight.
- Key update support for the TX side.
- Minor refactoring.
Reviewed-by: Paul Dale <pauli@openssl.org>
Reviewed-by: Tomas Mraz <tomas@openssl.org>
(Merged from https://github.com/openssl/openssl/pull/18949)