... are being used at the same time.
[Why]
Depending on which node clusters with which, a node running an older
version of the Khepri Ra machine may not be able to apply Ra commands
and could get stuck.
There is no real solution, and this is clearly an unsupported scenario:
an old node won't always be able to join a newer cluster.
[How]
In the testsuites, we skip clustering tests if we detect that multiple
Khepri Ra machine versions are being used.
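A minimal sketch of the skip check, assuming the suite has already collected each node's Khepri Ra machine version as an integer. The function name and the input mapping are hypothetical; the actual test suites are Erlang Common Test suites and gather this information in their own way.

```python
from typing import Mapping, Optional


def mixed_khepri_versions_skip_reason(
    machine_versions: Mapping[str, int],
) -> Optional[str]:
    """Return a skip reason when nodes run different Khepri Ra machine
    versions, or None when clustering tests can safely run.

    `machine_versions` is a hypothetical mapping of node name -> machine
    version; how it is obtained is outside the scope of this sketch.
    """
    distinct = sorted(set(machine_versions.values()))
    if len(distinct) > 1:
        return f"mixed Khepri Ra machine versions in use: {distinct}"
    return None


# Example: node 1 on a newer machine version than nodes 2 and 3.
print(mixed_khepri_versions_skip_reason({"rmq-1": 2, "rmq-2": 1, "rmq-3": 1}))
```

Comparing the set of distinct versions, rather than each node against one reference node, also covers clusters where more than two versions coexist.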
The following test flaked in CI under Khepri in mixed-version mode:
```
make -C deps/rabbitmq_mqtt ct-v5 t=cluster_size_3:will_delay_node_restart RABBITMQ_METADATA_STORE=khepri SECONDARY_DIST=rabbitmq_server-4.0.5 FULL=1
```
The first node took exactly 30 seconds to drain:
```
2025-02-10 15:00:09.550824+00:00 [debug] <0.1449.0> MQTT accepting TCP connection <0.1449.0> (127.0.0.1:33376 -> 127.0.0.1:27005)
2025-02-10 15:00:09.550992+00:00 [debug] <0.1449.0> Received a CONNECT, client ID: sub0, username: undefined, clean start: true, protocol version: 5, keepalive: 60, property names: ['Session-Expiry-Interval']
2025-02-10 15:00:09.551134+00:00 [debug] <0.1449.0> MQTT connection 127.0.0.1:33376 -> 127.0.0.1:27005 picked vhost using plugin_configuration_or_default_vhost
2025-02-10 15:00:09.551219+00:00 [debug] <0.1449.0> User 'guest' authenticated successfully by backend rabbit_auth_backend_internal
2025-02-10 15:00:09.551530+00:00 [info] <0.1449.0> Accepted MQTT connection 127.0.0.1:33376 -> 127.0.0.1:27005 for client ID sub0
2025-02-10 15:00:09.551651+00:00 [debug] <0.1449.0> Received a SUBSCRIBE with subscription(s) [{mqtt_subscription,<<"my/topic">>,
2025-02-10 15:00:09.551651+00:00 [debug] <0.1449.0> {mqtt_subscription_opts,0,false,
2025-02-10 15:00:09.551651+00:00 [debug] <0.1449.0> false,0,undefined}}]
2025-02-10 15:00:09.556233+00:00 [debug] <0.896.0> RabbitMQ metadata store: follower leader cast - redirecting to {rabbitmq_metadata,'rmq-ct-mqtt-cluster_size_3-2-27054@localhost'}
2025-02-10 15:00:09.561518+00:00 [debug] <0.1456.0> MQTT accepting TCP connection <0.1456.0> (127.0.0.1:33390 -> 127.0.0.1:27005)
2025-02-10 15:00:09.561634+00:00 [debug] <0.1456.0> Received a CONNECT, client ID: will, username: undefined, clean start: true, protocol version: 5, keepalive: 60, property names: ['Session-Expiry-Interval']
2025-02-10 15:00:09.561715+00:00 [debug] <0.1456.0> MQTT connection 127.0.0.1:33390 -> 127.0.0.1:27005 picked vhost using plugin_configuration_or_default_vhost
2025-02-10 15:00:09.561828+00:00 [debug] <0.1456.0> User 'guest' authenticated successfully by backend rabbit_auth_backend_internal
2025-02-10 15:00:09.562596+00:00 [info] <0.1456.0> Accepted MQTT connection 127.0.0.1:33390 -> 127.0.0.1:27005 for client ID will
2025-02-10 15:00:09.565743+00:00 [warning] <0.1460.0> This node is being put into maintenance (drain) mode
2025-02-10 15:00:09.565833+00:00 [debug] <0.1460.0> Marking the node as undergoing maintenance
2025-02-10 15:00:09.570772+00:00 [info] <0.1460.0> Marked this node as undergoing maintenance
2025-02-10 15:00:09.570904+00:00 [info] <0.1460.0> Asked to suspend 9 client connection listeners. No new client connections will be accepted until these listeners are resumed!
2025-02-10 15:00:09.572268+00:00 [warning] <0.1460.0> Suspended all listeners and will no longer accept client connections
2025-02-10 15:00:09.572317+00:00 [warning] <0.1460.0> Closed 0 local client connections
2025-02-10 15:00:09.572418+00:00 [warning] <0.1449.0> MQTT disconnecting client <<"127.0.0.1:33376 -> 127.0.0.1:27005">> with client ID 'sub0', reason: maintenance
2025-02-10 15:00:09.572414+00:00 [warning] <0.1000.0> Closed 2 local (Web) MQTT client connections
2025-02-10 15:00:09.572499+00:00 [warning] <0.1456.0> MQTT disconnecting client <<"127.0.0.1:33390 -> 127.0.0.1:27005">> with client ID 'will', reason: maintenance
2025-02-10 15:00:09.572866+00:00 [alert] <0.1000.0> Closed 0 local STOMP client connections
2025-02-10 15:00:09.577432+00:00 [debug] <0.1456.0> scheduled delayed Will Message to topic my/topic for MQTT client ID will to be sent in 10000 ms
2025-02-10 15:00:12.991328+00:00 [debug] <0.1469.0> Will reconcile virtual host processes on all cluster members...
2025-02-10 15:00:12.991443+00:00 [debug] <0.1469.0> Will make sure that processes of 1 virtual hosts are running on all reachable cluster nodes
2025-02-10 15:00:12.992497+00:00 [debug] <0.1469.0> Done with virtual host processes reconciliation (run 3)
2025-02-10 15:00:16.511733+00:00 [debug] <0.1476.0> Will reconcile virtual host processes on all cluster members...
2025-02-10 15:00:16.511864+00:00 [debug] <0.1476.0> Will make sure that processes of 1 virtual hosts are running on all reachable cluster nodes
2025-02-10 15:00:16.514293+00:00 [debug] <0.1476.0> Done with virtual host processes reconciliation (run 4)
2025-02-10 15:00:24.897477+00:00 [debug] <0.1479.0> Will reconcile virtual host processes on all cluster members...
2025-02-10 15:00:24.897607+00:00 [debug] <0.1479.0> Will make sure that processes of 1 virtual hosts are running on all reachable cluster nodes
2025-02-10 15:00:24.898483+00:00 [debug] <0.1479.0> Done with virtual host processes reconciliation (run 5)
2025-02-10 15:00:24.898527+00:00 [debug] <0.1479.0> Will reschedule virtual host process reconciliation after 30 seconds
2025-02-10 15:00:32.994347+00:00 [debug] <0.1484.0> Will reconcile virtual host processes on all cluster members...
2025-02-10 15:00:32.994474+00:00 [debug] <0.1484.0> Will make sure that processes of 1 virtual hosts are running on all reachable cluster nodes
2025-02-10 15:00:32.996539+00:00 [debug] <0.1484.0> Done with virtual host processes reconciliation (run 6)
2025-02-10 15:00:32.996585+00:00 [debug] <0.1484.0> Will reschedule virtual host process reconciliation after 30 seconds
2025-02-10 15:00:39.576325+00:00 [info] <0.1460.0> Will transfer leadership of 0 quorum queues with current leader on this node
2025-02-10 15:00:39.576456+00:00 [info] <0.1460.0> Leadership transfer for quorum queues hosted on this node has been initiated
2025-02-10 15:00:39.576948+00:00 [info] <0.1460.0> Will stop local follower replicas of 0 quorum queues on this node
2025-02-10 15:00:39.576990+00:00 [info] <0.1460.0> Stopped all local replicas of quorum queues hosted on this node
2025-02-10 15:00:39.577120+00:00 [info] <0.1460.0> Will transfer leadership of metadata store with current leader on this node
2025-02-10 15:00:39.577282+00:00 [info] <0.1460.0> Khepri clustering: transferring leadership to node 'rmq-ct-mqtt-cluster_size_3-2-27054@localhost'
2025-02-10 15:00:39.577424+00:00 [info] <0.1460.0> Khepri clustering: skipping leadership transfer, leader is already in node 'rmq-ct-mqtt-cluster_size_3-2-27054@localhost'
2025-02-10 15:00:39.577547+00:00 [info] <0.1460.0> Leadership transfer for metadata store on this node has been done. The new leader is 'rmq-ct-mqtt-cluster_size_3-2-27054@localhost'
2025-02-10 15:00:39.577674+00:00 [info] <0.1460.0> Node is ready to be shut down for maintenance or upgrade
2025-02-10 15:00:39.595638+00:00 [notice] <0.64.0> SIGTERM received - shutting down
2025-02-10 15:00:39.595638+00:00 [notice] <0.64.0>
2025-02-10 15:00:39.595758+00:00 [debug] <0.44.0> Running rabbit_prelaunch:shutdown_func() as part of `kernel` shutdown
```
Running the same test locally revealed that [rabbit_maintenance:status_consistent_read/1](55ae918094/deps/rabbit/src/rabbit_maintenance.erl (L131))
takes exactly 30 seconds to complete.
The test case assumes a Will Delay higher than the time it takes to
drain and shut down the node. Hence, this commit increases the Will
Delay time from 10 seconds to 40 seconds.
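The timestamps in the log excerpt make the arithmetic concrete. This Python sketch only compares the moment the Will Message was scheduled (plus the Will Delay) with the moment the node logged that it was ready to shut down; it is a simplified model, not what the test case literally asserts.

```python
from datetime import datetime, timedelta

# Timestamps copied from the first node's log excerpt.
drain_started = datetime.fromisoformat("2025-02-10 15:00:09.565743+00:00")
will_scheduled = datetime.fromisoformat("2025-02-10 15:00:09.577432+00:00")
ready_for_shutdown = datetime.fromisoformat("2025-02-10 15:00:39.577674+00:00")

drain_duration = (ready_for_shutdown - drain_started).total_seconds()


def will_due_before_shutdown(will_delay_s: float) -> bool:
    """True if the delayed Will Message would be due before the node
    reports that it is ready to shut down (a simplified model)."""
    return will_scheduled + timedelta(seconds=will_delay_s) < ready_for_shutdown


print(f"drain took {drain_duration:.3f} s")
for delay in (10, 40):
    print(f"Will Delay {delay} s due before shutdown: {will_due_before_shutdown(delay)}")
```

With a 10 second delay the Will Message is due about 20 seconds before the node is ready to shut down; with 40 seconds it is due roughly 10 seconds after.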
[Why]
Some test cases used node 1 as the clustering seed node. With
mixed-version testing, this could cause issues because node 1 would
start with a newer version of Ra than node 2, and node 2 could fail to
join.
[How]
By using node 2 as the seed node, node 1, which runs a newer version of
Ra, should be able to join because it supports talking to an older
version.
[Why]
The `force_reset` command simply removes local files on disk for the
local node.
In the case of Ra, this can't work because the rest of the cluster does
not know about the forced-reset node. Therefore the leader will continue
to send `append_entry` commands to the reset node.
If that forced-reset node restarts and receives these messages, it will
either join the cluster again (because it's on an older Raft term) or it
will hit an assertion and exit (because it's on the same Raft term).
[How]
Given we can't really support this scenario and it has little value, the
command will now return an error if someone attempts a `force_reset` on
a node running Khepri.
This also deprecates the command: once Mnesia support is removed, the
command will be removed at the same time. This is noted in the
rabbitmqctl.8 manpage.
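The failure mode and the new guard can be illustrated with a toy model. Everything here is hypothetical (class names, function names, the error text); it is not Ra's or RabbitMQ's code. It only shows why wiping local state is not enough: the rest of the cluster still lists the reset node as a member, so it remains an `append_entry` target.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional, Tuple


@dataclass
class ToyRaCluster:
    """Toy model: membership as known by the rest of the cluster, plus
    each node's local on-disk state."""
    members: List[str]
    local_state: Dict[str, dict] = field(default_factory=dict)

    def wipe_local_files(self, node: str) -> None:
        # What force_reset amounts to: local files go away, nobody is told.
        self.local_state[node] = {}

    def append_entry_targets(self, leader: str) -> List[str]:
        # The leader keeps replicating to every member it knows about.
        return [m for m in self.members if m != leader]


def force_reset(metadata_store: str) -> Tuple[str, Optional[str]]:
    """Sketch of the new behaviour: refuse when the node runs Khepri."""
    if metadata_store == "khepri":
        return ("error", "force_reset is not supported with Khepri")
    # Mnesia: local files would be wiped here (command is deprecated).
    return ("ok", None)


cluster = ToyRaCluster(["n1", "n2", "n3"])
cluster.wipe_local_files("n3")
print(cluster.append_entry_targets("n1"))  # n3 is still a target
print(force_reset("khepri"))
```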
* Redesigned k8s peer discovery
Rather than querying the Kubernetes API, just check the local node name
and try to connect to the pod with the `-0` suffix (or the configured
`ordinal_start` value). Only the pod with the lowest ordinal can form
a new cluster; all other pods will wait forever.
This should prevent any race conditions and incorrectly formed clusters.
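The seed selection can be sketched as a pure function of the local node name. This assumes a StatefulSet-style hostname of the form `<name>-<ordinal>` inside a `rabbit@<host>[.<domain>]` node name; the function name and example names are hypothetical, and the actual peer discovery plugin is implemented in Erlang and may handle more cases.

```python
import re
from typing import Tuple


def k8s_seed_node(local_node: str, ordinal_start: int = 0) -> Tuple[str, bool]:
    """Return (seed node name, whether the local node is the seed).

    Assumes node names like 'rabbit@<statefulset>-<ordinal>[.<domain>]'.
    """
    prefix, _, host = local_node.partition("@")
    short_host, dot, domain = host.partition(".")
    match = re.fullmatch(r"(.+)-(\d+)", short_host)
    if match is None:
        raise ValueError(f"no pod ordinal found in {local_node!r}")
    base, ordinal = match.group(1), int(match.group(2))
    # The seed is the same name with the ordinal replaced by ordinal_start.
    seed = f"{prefix}@{base}-{ordinal_start}{dot}{domain}"
    return seed, ordinal == ordinal_start


seed, is_seed = k8s_seed_node("rabbit@my-rabbit-server-2.my-rabbit-nodes.default")
print(seed, is_seed)
```

Only when `is_seed` is true may the node form a new cluster; every other node only ever tries to join the seed, which is what removes the race between pods forming separate clusters.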
This commit contains the following changes:
1. Simplify .NET suite
2. Simplify Java package naming
3. Extract JMS tests into a separate suite. This way, it's easier to run,
debug, and add new tests compared to the previous suite, which mixed
.NET tests with JMS tests.
4. Add tests for different JMS message types
for the backends that support it in the first place.
When forming a cluster, registration of the node joining the cluster
might be left to (container) orchestration tools like Nomad or
Kubernetes.
This PR adds a new configuration option,
`cluster_formation.registration.enable`, which defaults to true.
When set to false, node registration is skipped.
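In `rabbitmq.conf`, disabling registration would be a single line, `cluster_formation.registration.enable = false`. The helper below is a hypothetical sketch that reads simple `key = value` lines and applies the default of true; RabbitMQ parses its configuration through its own schema, so this only illustrates the option's semantics.

```python
def registration_enabled(conf_text: str) -> bool:
    """Effective value of cluster_formation.registration.enable.

    Hypothetical helper: handles only 'key = value' lines and full-line
    '#' comments; RabbitMQ's real config parsing is more involved.
    """
    value = "true"  # the option defaults to true
    for raw_line in conf_text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, val = line.partition("=")
        if key.strip() == "cluster_formation.registration.enable":
            value = val.strip().lower()
    if value not in ("true", "false"):
        raise ValueError(f"expected true or false, got {value!r}")
    return value == "true"


conf = """
# Registration is handled by Nomad/Consul instead of RabbitMQ.
cluster_formation.registration.enable = false
"""
print(registration_enabled(conf))  # False
```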
There is at least one important advantage to using a tool such as Nomad
(plus Consul) over the application (RabbitMQ) doing the registration.
When the application is not stopped gracefully for any reason, e.g. it
is OOM-killed, it cannot deregister the service/node. This leaves behind
a stale service entry in the registry.
This problem is fundamentally avoided by allowing Nomad (or similar
tools) to register the node's service.
See #11233 and #11045 for prior discussions.
Co-authored-by: Frederik Bosch <f.bosch@genkgo.nl>
Contains a fix for a bug that could crash at-most-once dead lettering
during node restarts.
Reduces excessive debug logging around the Ra log.
Fixes an issue that could make leader transfers take 5s+ to complete.
As described in section 7.1 of filtex-v1.0-wd09:
> Impose a limit on the complexity of each filter expression.
Here, we hard-code the maximum number of properties within a filter
expression to 16. There should never be a use case that requires
filtering on more than 16 different properties.
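A sketch of such a limit check, assuming a filter expression is represented as a map from property name to expected value. The function names are hypothetical; RabbitMQ's actual check is written in Erlang and may count or report differently.

```python
from typing import Any, Mapping

# Hard-coded limit on the number of properties in one filter expression.
MAX_FILTER_PROPERTIES = 16


def within_filter_limit(filter_expression: Mapping[str, Any]) -> bool:
    """True if the expression references at most MAX_FILTER_PROPERTIES."""
    return len(filter_expression) <= MAX_FILTER_PROPERTIES


def check_filter_complexity(filter_expression: Mapping[str, Any]) -> None:
    """Reject filter expressions that reference too many properties."""
    if not within_filter_limit(filter_expression):
        raise ValueError(
            f"filter expression has {len(filter_expression)} properties; "
            f"at most {MAX_FILTER_PROPERTIES} are allowed"
        )


check_filter_complexity({f"prop{i}": i for i in range(16)})  # accepted
try:
    check_filter_complexity({f"prop{i}": i for i in range(17)})
except ValueError as err:
    print(err)
```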
to match that used with Mnesia.
In the case of Mnesia, there are 10 retries with a 30-second delay
each. For Khepri, a single timeout is used, so it must be ten times as
long as one Mnesia retry delay.
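The equivalence is plain arithmetic. Milliseconds are used here only because Erlang timeouts are usually expressed in them; the constant names are hypothetical, and the sketch ignores the time each Mnesia attempt itself takes.

```python
MNESIA_RETRIES = 10
MNESIA_RETRY_DELAY_MS = 30_000  # 30 seconds per retry

# One Khepri timeout covering the same worst-case wait as all Mnesia retries.
KHEPRI_TIMEOUT_MS = MNESIA_RETRIES * MNESIA_RETRY_DELAY_MS

print(KHEPRI_TIMEOUT_MS, "ms =", KHEPRI_TIMEOUT_MS // 60_000, "minutes")
```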