Left as it was, a failure while enabling the feature flags leaves the
cluster in an inconsistent state: the already-clustered nodes think
the joining node is already a member, but the joining node
believes it is a standalone node. As a result, later `join_cluster`
commands fail with an inconsistent cluster error.
Fixes #8129
The query parameter `password` in an AMQP URI should only be used to set
a certificate password, *not* the login password. The login password is
set in the userinfo part of the `amqp_authority` component, as defined in
https://www.rabbitmq.com/uri-spec.html
* Add a test that demonstrates the issue in #8129
* Modify the code to fix the test
* Modify `amqp_uri` so that the test passes
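For illustration only (host, vhost, file paths and passwords below are made
up), this is the kind of URI the distinction is about: the login password
travels in the userinfo part of `amqp_authority`, while the `password` query
parameter only carries the certificate key password.

```erlang
%% Sketch with invented values, not a test from this change.
%% "login-secret" is the login password (part of amqp_authority); the
%% `password' query parameter only sets the certificate key password.
{ok, _Params} =
    amqp_uri:parse("amqps://guest:login-secret@myhost:5671/myvhost"
                   "?certfile=/tmp/client.pem&keyfile=/tmp/client.key"
                   "&password=cert-key-secret").
```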
[Why]
The feature flags registry is implemented as a module called
`rabbit_ff_registry` which is recompiled and reloaded at runtime.
There is a copy on disk which is a stub responsible for triggering the
first initialization of the real registry and for pleasing Dialyzer. Once
the initialization is done, this stub calls `rabbit_ff_registry` again to
get an actual return value. This is kind of recursive: the on-disk
`rabbit_ff_registry` copy calls the `rabbit_ff_registry` copy generated
at runtime.
Early during RabbitMQ startup, there could be multiple processes
indirectly calling `rabbit_ff_registry` and possibly triggering that
first initialization concurrently. Unfortunately, there is a slight
chance of a race condition and a deadlock:
0. No `rabbit_ff_registry` is loaded yet.
1. Both processes A and B call `rabbit_ff_registry:something()` indirectly,
   which triggers two initializations in parallel.
2. Process A acquires the lock first and finishes the initialization. A
new registry is loaded and the old `rabbit_ff_registry` module copy
is marked as "old". At this point, process B still references that
old copy because `rabbit_ff_registry:something()` is up above in its
call stack.
3. Process B acquires the lock, prepares the new registry and tries to
soft-purge the old `rabbit_ff_registry` copy before loading the new
one.
This is where the deadlock happens: process B requests the Code server
to purge the old copy, but the Code server waits for process B to stop
using it.
[How]
With this commit, process B calls `erlang:check_process_code/2` before
asking for a soft purge. If it is using an old copy, it skips the purge
because it would deadlock anyway.
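A minimal sketch of that guard (the helper name is invented; the real
feature flags code differs):

```erlang
%% Sketch only: maybe_purge_old_registry/0 is an illustrative name.
maybe_purge_old_registry() ->
    case erlang:check_process_code(self(), rabbit_ff_registry) of
        false ->
            %% This process does not reference the old copy of
            %% rabbit_ff_registry, so a soft purge cannot deadlock on it.
            _ = code:soft_purge(rabbit_ff_registry),
            ok;
        true ->
            %% The old copy is still on this process' call stack: a purge
            %% request would deadlock against the code server, so skip it.
            ok
    end.
```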
As requested in https://github.com/rabbitmq/rabbitmq-server/discussions/6331#discussioncomment-5796154
include all infos that were emitted in the MQTT connection created event
in the MQTT connection closed event as well.
This ensures infos such as the MQTT client ID are part of the connection
closed event, making it easy for the user to correlate the two event
types.
Note that the MQTT plugin emits connection created and connection closed
events only if the CONNECT packet was successfully processed, i.e. only if
authentication was successful.
Remove the `disconnected_at` property because it was never used;
rabbit_event already adds a timestamp to every event.
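A minimal sketch of the intent (the `infos/1` helper and the exact info keys
are assumptions, not the real plugin code):

```erlang
%% Sketch only: infos/1 is an assumed helper returning the same infos
%% (client ID, etc.) that were emitted with the connection_created event.
emit_connection_closed_event(State) ->
    Infos = infos(State),
    rabbit_event:notify(connection_closed, Infos).
```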
We were already using Cowlib 2.12.1 and therefore were
compatible with OTP-26. This simply updates Cowboy to
the version that depends on Cowlib 2.12.1.
Up to 3.11.x, an MQTT client ID is tracked in Ra
as a list of bytes, as returned by `binary_to_list/1` in
48467d6e12/deps/rabbitmq_mqtt/src/rabbit_mqtt_frame.erl (L137).
This has two downsides:
1. Lists consume more memory than binaries (when tracking many clients).
2. It violates the MQTT spec which states
"The ClientId MUST be a UTF-8 encoded string as defined in Section 1.5.3 [MQTT-3.1.3-4]." [v4 3.1.3.1]
Therefore, the original idea was to always store MQTT client IDs as
binaries starting with Native MQTT in 3.12.
However, this leads to client ID tracking misbehaving in mixed-version
clusters, since new nodes would register client IDs as binaries and old
nodes would register client IDs as lists. As a result, a client
registering on a new node with the same client ID as an existing connection
on an old node would not terminate that connection on the old node.
Therefore, for backwards compatibility, we leave the client ID as a list of
bytes in the Ra machine state: the feature flag `delete_ra_cluster_mqtt_node`
introduced in v3.12 will delete the Ra cluster anyway, and
the new client ID tracking via pg local stores client IDs as
binaries.
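A minimal sketch of that compatibility choice, with invented helper names
(`track_client_id/2`, `do_register/2`) rather than the real
rabbit_mqtt_collector API:

```erlang
%% Sketch only: helper names are invented for illustration.
track_client_id(ClientId, ConnectionPid) when is_binary(ClientId) ->
    %% Store client IDs as lists of bytes in the Ra machine state, exactly
    %% as 3.11.x nodes do, so tracking stays consistent in mixed-version
    %% clusters.
    do_register(binary_to_list(ClientId), ConnectionPid);
track_client_id(ClientId, ConnectionPid) when is_list(ClientId) ->
    do_register(ClientId, ConnectionPid).
```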
An interesting side note learned here is that the compiled code of
rabbit_mqtt_collector must not change. This commit only modifies
function specs. However, as soon as the compiled code changes, this
module becomes a new version. The new version causes the anonymous Ra query
function to fail in mixed clusters: when the old node does a
ra:leader_query and the leader is on the new node, the query function
fails on the new node with `badfun` because the new node does not have
the same module version. For more context, read:
https://web.archive.org/web/20181017104411/http://www.javalimit.com/2010/05/passing-funs-to-other-erlang-nodes.html
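A minimal sketch of the failure mode (the Ra server name and machine state
shape are invented, not the real rabbit_mqtt_collector code):

```erlang
%% Sketch only: mqtt_node and the map-shaped state are assumptions.
lookup_connection_pid(ClientId) ->
    QueryFun = fun(MachineState) ->
                       maps:get(ClientId, MachineState, undefined)
               end,
    %% The fun is bound to the exact version of the module that defines it
    %% on the calling node. If the leader node runs a different version of
    %% that module, it cannot resolve the fun and fails with `badfun'.
    ra:leader_query(mqtt_node, QueryFun).
```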
Due to problems with TLS 1.3 clients in OTP-25, we have to continue
using TLS 1.2 until we can drop OTP-25. Similarly, certificate
chain verification is disabled in tests (verify_none) until we
can drop OTP-25.
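A minimal sketch of the kind of client TLS options meant here (host, port and
the exact option list are assumptions, not the test suite code):

```erlang
%% Sketch only: host and port are invented; the real tests set more options.
connect_tls() ->
    TlsOpts = [{versions, ['tlsv1.2']}, %% stay on TLS 1.2 while OTP-25 is supported
               {verify, verify_none}],  %% chain verification disabled in tests
    ssl:connect("localhost", 8883, TlsOpts).
```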
The test failure was caused by a certificate generated with
an insecure digest and cipher, which resulted in the client
not sending the certificate to the server.
The client will now do a CA check of the server it connects to.
The TLS version used by the client is now left at the default and
will likely be TLS 1.3. Note that client CA verification
is unrelated to the trust store certificate verification.
This change should be reverted once emqx/emqtt is OTP26 compatible.
Our fork/branch isn't fully OTP26 compatible at this point either, but it at
least partially works. Let's use this branch for now to uncover server-side OTP26
incompatibilities (and continue working on OTP26 support for emqtt of
course).