Commit Graph

58716 Commits

Author SHA1 Message Date
Jean-Sébastien Pédron 1f1a13521b
Skip peer discovery clustering tests if multiple Khepri machine versions
... are being used at the same time.

[Why]
Depending on which node clusters with which, a node running an older
version of the Khepri Ra machine may not be able to apply Ra commands
and could be stuck.

There is no real solution and this is clearly an unsupported scenario. An
old node won't always be able to join a newer cluster.

[How]
In the testsuites, we skip clustering tests if we detect that multiple
Khepri Ra machine versions are being used.
2025-02-12 17:13:24 +01:00
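
The skip condition described in the commit above can be sketched as follows. This is a minimal illustration, not the test suite's actual code: the function name, the node names, and the version numbers are all hypothetical, and the real suites are written in Erlang.

```python
def should_skip_clustering_tests(machine_versions):
    """Return True when the nodes report more than one Khepri Ra machine version.

    `machine_versions` maps a node name to the machine version it runs.
    Both the shape of this mapping and the name of this helper are assumptions.
    """
    return len(set(machine_versions.values())) > 1


# Hypothetical node names and version numbers, for illustration only.
same_version = {"node1": 1, "node2": 1, "node3": 1}
mixed_versions = {"node1": 2, "node2": 1, "node3": 2}

print(should_skip_clustering_tests(same_version))    # False
print(should_skip_clustering_tests(mixed_versions))  # True
```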
Jean-Sébastien Pédron e8a302a249
Merge pull request #13234 from rabbitmq/adapt-rabbit_stream_queue_SUITE-to-khepri-0.17.0
rabbit_stream_queue_SUITE: Swap uses of node 2 and 3 in `format` test
2025-02-12 11:39:24 +01:00
Michael Klishin 9f4853c112
Merge pull request #13218 from rabbitmq/identity-metric-endpoint-label
Add rabbitmq_endpoint label to rabbitmq_identity_info
2025-02-11 16:13:22 -05:00
Michael Klishin 3e64d46c2b
Merge pull request #13235 from rabbitmq/mqtt-khepri-flake
Fix MQTT test flake in Khepri mixed version mode
2025-02-11 13:22:12 -05:00
David Ansari 38cba9d63d Fix MQTT test flake in Khepri mixed version mode
The following test flaked in CI under Khepri in mixed version mode:
```
make -C deps/rabbitmq_mqtt ct-v5 t=cluster_size_3:will_delay_node_restart RABBITMQ_METADATA_STORE=khepri SECONDARY_DIST=rabbitmq_server-4.0.5 FULL=1
```

The first node took exactly 30 seconds for draining:
```
2025-02-10 15:00:09.550824+00:00 [debug] <0.1449.0> MQTT accepting TCP connection <0.1449.0> (127.0.0.1:33376 -> 127.0.0.1:27005)
2025-02-10 15:00:09.550992+00:00 [debug] <0.1449.0> Received a CONNECT, client ID: sub0, username: undefined, clean start: true, protocol version: 5, keepalive: 60, property names: ['Session-Expiry-Interval']
2025-02-10 15:00:09.551134+00:00 [debug] <0.1449.0> MQTT connection 127.0.0.1:33376 -> 127.0.0.1:27005 picked vhost using plugin_configuration_or_default_vhost
2025-02-10 15:00:09.551219+00:00 [debug] <0.1449.0> User 'guest' authenticated successfully by backend rabbit_auth_backend_internal
2025-02-10 15:00:09.551530+00:00 [info] <0.1449.0> Accepted MQTT connection 127.0.0.1:33376 -> 127.0.0.1:27005 for client ID sub0
2025-02-10 15:00:09.551651+00:00 [debug] <0.1449.0> Received a SUBSCRIBE with subscription(s) [{mqtt_subscription,<<"my/topic">>,
2025-02-10 15:00:09.551651+00:00 [debug] <0.1449.0>                                             {mqtt_subscription_opts,0,false,
2025-02-10 15:00:09.551651+00:00 [debug] <0.1449.0>                                              false,0,undefined}}]
2025-02-10 15:00:09.556233+00:00 [debug] <0.896.0> RabbitMQ metadata store: follower leader cast - redirecting to {rabbitmq_metadata,'rmq-ct-mqtt-cluster_size_3-2-27054@localhost'}
2025-02-10 15:00:09.561518+00:00 [debug] <0.1456.0> MQTT accepting TCP connection <0.1456.0> (127.0.0.1:33390 -> 127.0.0.1:27005)
2025-02-10 15:00:09.561634+00:00 [debug] <0.1456.0> Received a CONNECT, client ID: will, username: undefined, clean start: true, protocol version: 5, keepalive: 60, property names: ['Session-Expiry-Interval']
2025-02-10 15:00:09.561715+00:00 [debug] <0.1456.0> MQTT connection 127.0.0.1:33390 -> 127.0.0.1:27005 picked vhost using plugin_configuration_or_default_vhost
2025-02-10 15:00:09.561828+00:00 [debug] <0.1456.0> User 'guest' authenticated successfully by backend rabbit_auth_backend_internal
2025-02-10 15:00:09.562596+00:00 [info] <0.1456.0> Accepted MQTT connection 127.0.0.1:33390 -> 127.0.0.1:27005 for client ID will
2025-02-10 15:00:09.565743+00:00 [warning] <0.1460.0> This node is being put into maintenance (drain) mode
2025-02-10 15:00:09.565833+00:00 [debug] <0.1460.0> Marking the node as undergoing maintenance
2025-02-10 15:00:09.570772+00:00 [info] <0.1460.0> Marked this node as undergoing maintenance
2025-02-10 15:00:09.570904+00:00 [info] <0.1460.0> Asked to suspend 9 client connection listeners. No new client connections will be accepted until these listeners are resumed!
2025-02-10 15:00:09.572268+00:00 [warning] <0.1460.0> Suspended all listeners and will no longer accept client connections
2025-02-10 15:00:09.572317+00:00 [warning] <0.1460.0> Closed 0 local client connections
2025-02-10 15:00:09.572418+00:00 [warning] <0.1449.0> MQTT disconnecting client <<"127.0.0.1:33376 -> 127.0.0.1:27005">> with client ID 'sub0', reason: maintenance
2025-02-10 15:00:09.572414+00:00 [warning] <0.1000.0> Closed 2 local (Web) MQTT client connections
2025-02-10 15:00:09.572499+00:00 [warning] <0.1456.0> MQTT disconnecting client <<"127.0.0.1:33390 -> 127.0.0.1:27005">> with client ID 'will', reason: maintenance
2025-02-10 15:00:09.572866+00:00 [alert] <0.1000.0> Closed 0 local STOMP client connections
2025-02-10 15:00:09.577432+00:00 [debug] <0.1456.0> scheduled delayed Will Message to topic my/topic for MQTT client ID will to be sent in 10000 ms
2025-02-10 15:00:12.991328+00:00 [debug] <0.1469.0> Will reconcile virtual host processes on all cluster members...
2025-02-10 15:00:12.991443+00:00 [debug] <0.1469.0> Will make sure that processes of 1 virtual hosts are running on all reachable cluster nodes
2025-02-10 15:00:12.992497+00:00 [debug] <0.1469.0> Done with virtual host processes reconciliation (run 3)
2025-02-10 15:00:16.511733+00:00 [debug] <0.1476.0> Will reconcile virtual host processes on all cluster members...
2025-02-10 15:00:16.511864+00:00 [debug] <0.1476.0> Will make sure that processes of 1 virtual hosts are running on all reachable cluster nodes
2025-02-10 15:00:16.514293+00:00 [debug] <0.1476.0> Done with virtual host processes reconciliation (run 4)
2025-02-10 15:00:24.897477+00:00 [debug] <0.1479.0> Will reconcile virtual host processes on all cluster members...
2025-02-10 15:00:24.897607+00:00 [debug] <0.1479.0> Will make sure that processes of 1 virtual hosts are running on all reachable cluster nodes
2025-02-10 15:00:24.898483+00:00 [debug] <0.1479.0> Done with virtual host processes reconciliation (run 5)
2025-02-10 15:00:24.898527+00:00 [debug] <0.1479.0> Will reschedule virtual host process reconciliation after 30 seconds
2025-02-10 15:00:32.994347+00:00 [debug] <0.1484.0> Will reconcile virtual host processes on all cluster members...
2025-02-10 15:00:32.994474+00:00 [debug] <0.1484.0> Will make sure that processes of 1 virtual hosts are running on all reachable cluster nodes
2025-02-10 15:00:32.996539+00:00 [debug] <0.1484.0> Done with virtual host processes reconciliation (run 6)
2025-02-10 15:00:32.996585+00:00 [debug] <0.1484.0> Will reschedule virtual host process reconciliation after 30 seconds
2025-02-10 15:00:39.576325+00:00 [info] <0.1460.0> Will transfer leadership of 0 quorum queues with current leader on this node
2025-02-10 15:00:39.576456+00:00 [info] <0.1460.0> Leadership transfer for quorum queues hosted on this node has been initiated
2025-02-10 15:00:39.576948+00:00 [info] <0.1460.0> Will stop local follower replicas of 0 quorum queues on this node
2025-02-10 15:00:39.576990+00:00 [info] <0.1460.0> Stopped all local replicas of quorum queues hosted on this node
2025-02-10 15:00:39.577120+00:00 [info] <0.1460.0> Will transfer leadership of metadata store with current leader on this node
2025-02-10 15:00:39.577282+00:00 [info] <0.1460.0> Khepri clustering: transferring leadership to node 'rmq-ct-mqtt-cluster_size_3-2-27054@localhost'
2025-02-10 15:00:39.577424+00:00 [info] <0.1460.0> Khepri clustering: skipping leadership transfer, leader is already in node 'rmq-ct-mqtt-cluster_size_3-2-27054@localhost'
2025-02-10 15:00:39.577547+00:00 [info] <0.1460.0> Leadership transfer for metadata store on this node has been done. The new leader is 'rmq-ct-mqtt-cluster_size_3-2-27054@localhost'
2025-02-10 15:00:39.577674+00:00 [info] <0.1460.0> Node is ready to be shut down for maintenance or upgrade
2025-02-10 15:00:39.595638+00:00 [notice] <0.64.0> SIGTERM received - shutting down
2025-02-10 15:00:39.595638+00:00 [notice] <0.64.0>
2025-02-10 15:00:39.595758+00:00 [debug] <0.44.0> Running rabbit_prelaunch:shutdown_func() as part of `kernel` shutdown
```

Running the same test locally revealed that [rabbit_maintenance:status_consistent_read/1](55ae918094/deps/rabbit/src/rabbit_maintenance.erl (L131))
takes exactly 30 seconds to complete.

The test case assumes a Will Delay higher than the time it takes to
drain and shut down the node. Hence, this commit increases the Will
Delay time from 10 seconds to 40 seconds.
2025-02-11 18:34:36 +01:00
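
The 30 second figure in the commit above can be checked against the log excerpt. The sketch below takes the two timestamps from the "put into maintenance (drain) mode" and "ready to be shut down" lines and compares the result with the old and new Will Delay values. The helper name is hypothetical, and the check ignores the extra time needed to shut the node down after draining.

```python
from datetime import datetime

# Timestamps copied from the log excerpt in the commit message.
drain_started = datetime.fromisoformat("2025-02-10 15:00:09.565743+00:00")
drain_finished = datetime.fromisoformat("2025-02-10 15:00:39.577674+00:00")

drain_seconds = (drain_finished - drain_started).total_seconds()


def will_delay_outlasts_drain(will_delay_seconds, drain_seconds):
    """The test case only holds if the Will Delay is longer than the drain."""
    return will_delay_seconds > drain_seconds


print(round(drain_seconds, 3))                       # 30.012
print(will_delay_outlasts_drain(10, drain_seconds))  # False: old value
print(will_delay_outlasts_drain(40, drain_seconds))  # True: new value
```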
Jean-Sébastien Pédron 5cbda4c838
rabbit_stream_queue_SUITE: Swap uses of node 2 and 3 in `format`
[Why]
We hit some transient errors with the previous order when doing
mixed-version testing. Swapping the nodes seems to fix the problem.
2025-02-11 15:50:07 +01:00
Jean-Sébastien Pédron 55ae918094
Merge pull request #13232 from rabbitmq/adapt-feature_flags_SUITE-to-khepri-0.17.0
feature_flags_SUITE: Change clustering seed node in few tests
2025-02-11 15:48:59 +01:00
Jean-Sébastien Pédron 30fb8a719f
feature_flags_SUITE: Change clustering seed node in few tests
[Why]
Some testcases used to use node 1 as the clustering seed node. With
mixed-version testing, it could cause issues because node 1 would start
with a new version of Ra compared to node 2 and node 2 could fail to
join.

[How]
By using node 2 as the seed node, node 1 running a newer version of Ra
should be able to join because it supports talking to an older version.
2025-02-11 12:10:33 +01:00
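
The reasoning in the commit above amounts to choosing the node with the oldest Ra version as the seed, so that newer nodes join an older one rather than the reverse. A minimal sketch, assuming versions are available as comparable tuples; the function name and the version numbers are hypothetical, and the actual test suite simply hard-codes node 2.

```python
def pick_seed_node(ra_versions):
    """Pick the node running the oldest Ra version as the clustering seed.

    `ra_versions` maps a node name to a version tuple. A newer node is
    expected to be able to talk to an older seed, but not the other way round.
    """
    return min(ra_versions, key=lambda node: ra_versions[node])


# Hypothetical versions: node 1 runs the newer Ra, node 2 the older one.
ra_versions = {"node1": (2, 16, 1), "node2": (2, 15, 0)}
seed = pick_seed_node(ra_versions)
print(seed)  # node2
```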
Michael Klishin 631a205210
4.0.6 release notes: a missing link 2025-02-10 23:44:58 -05:00
Michael Klishin 7ba05db808
Bump 4.1.0 beta version in release notes 2025-02-10 23:41:25 -05:00
Michael Klishin 4d1b903427
4.1.0-beta.4 release notes: a typo 2025-02-10 23:40:14 -05:00
Michael Klishin 428399dcec
4.0.6 release notes: a typo 2025-02-10 23:40:00 -05:00
Michael Klishin b341a39e65
Update 4.1.0 release notes 2025-02-10 23:38:35 -05:00
Michael Klishin f920baf572
Wording 2025-02-10 23:25:15 -05:00
Michael Klishin e413907a3f
4.0.6 release notes 2025-02-10 21:36:52 -05:00
Michael Klishin a87036b197
Update SERVER_RELEASES.md 2025-02-10 14:16:51 -05:00
Jean-Sébastien Pédron 839a485a0e
Merge pull request #13217 from rabbitmq/force_reset-command-unsupported-with-khepri
rabbit_db: `force_reset` command is unsupported with Khepri
2025-02-10 19:55:44 +01:00
Michael Klishin a4f9babe2d
Update Mergify for v4.1.x
Pair: @the-mikedavis
(cherry picked from commit 1af6c4d2f4)
2025-02-10 10:33:19 -05:00
Jean-Sébastien Pédron c78aec7d48
rabbit_db: `force_reset` command is unsupported with Khepri
[Why]
The `force_reset` command simply removes local files on disk for the
local node.

In the case of Ra, this can't work because the rest of the cluster does
not know about the forced-reset node. Therefore the leader will continue
to send `append_entry` commands to the reset node.

If that forced-reset node restarts and receives these messages, it will
either join the cluster again (because it's on an older Raft term) or it
will hit an assertion and exit (because it's on the same Raft term).

[How]
Given we can't really support this scenario and it has little value, the
command will now return an error if someone attempts a `force_reset` with
a node running Khepri.

This also deprecates the command: once Mnesia support is removed, the
command will be removed at the same time. This is noted in the
rabbitmqctl.8 manpage.
2025-02-10 15:09:36 +01:00
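
The new behaviour described in the commit above is a guard on the metadata store in use. The sketch below is only an illustration of that guard: the function signature, the return values, and the error text are assumptions, not the actual `rabbit_db` code.

```python
def force_reset(metadata_store):
    """Refuse a forced reset when the node runs Khepri.

    Returns an (status, reason) pair. The shape of the return value and the
    wording of the error are hypothetical.
    """
    if metadata_store == "khepri":
        # The rest of the Ra cluster would still know about this node and
        # keep sending it commands, so the reset cannot be supported.
        return ("error", "force_reset is unsupported with Khepri")
    # With Mnesia, the command removes the local node's files on disk
    # (not modelled here).
    return ("ok", None)


print(force_reset("khepri"))  # ('error', 'force_reset is unsupported with Khepri')
print(force_reset("mnesia"))  # ('ok', None)
```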
Michael Klishin 211fc5b45f
Merge pull request #13219 from zhongwencool/mk-bump-observer-cli
Bump observer_cli to 1.8.2
2025-02-08 14:36:51 -05:00
zhongwencool b367b40786 Bump observer_cli to 1.8.2 2025-02-08 17:19:37 +08:00
Michal Kuratczyk 703ee8529e
Add rabbitmq_endpoint label to rabbitmq_identity_info 2025-02-07 15:51:40 +01:00
Michael Klishin 3a17473a95
Merge pull request #13214 from rabbitmq/dependabot/maven/deps/rabbitmq_mqtt/test/java_SUITE_data/main/com.rabbitmq-amqp-client-5.25.0
build(deps-dev): bump com.rabbitmq:amqp-client from 5.24.0 to 5.25.0 in /deps/rabbitmq_mqtt/test/java_SUITE_data
2025-02-06 14:28:59 -05:00
dependabot[bot] 27c5f0ae8c
build(deps-dev): bump com.rabbitmq:amqp-client
Bumps [com.rabbitmq:amqp-client](https://github.com/rabbitmq/rabbitmq-java-client) from 5.24.0 to 5.25.0.
- [Release notes](https://github.com/rabbitmq/rabbitmq-java-client/releases)
- [Commits](https://github.com/rabbitmq/rabbitmq-java-client/compare/v5.24.0...v5.25.0)

---
updated-dependencies:
- dependency-name: com.rabbitmq:amqp-client
  dependency-type: direct:development
  update-type: version-update:semver-minor
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-06 19:03:19 +00:00
Michael Klishin 9bc3d5e781
Merge pull request #13210 from frederikbosch/disable_health_check
Peer discovery: disable consul health check helper when registration is disabled
2025-02-06 13:56:49 -05:00
Michael Klishin ee78e27958
Merge pull request #13211 from rabbitmq/local-cluster-name
Set cluster_name to "localhost" when running with make
2025-02-06 10:51:59 -05:00
Michal Kuratczyk ce094f6333
Set cluster_name to "localhost" when running with make 2025-02-06 16:31:23 +01:00
Frederik Bosch b821e52329 disable consul health check helper when registration is disabled 2025-02-06 14:18:18 +01:00
Michael Klishin 6515e9a0cd
Merge pull request #13206 from rabbitmq/gazelle-main
bazel run gazelle
2025-02-04 23:59:34 -05:00
GitHub c6cdc40f1a bazel run gazelle 2025-02-05 04:02:24 +00:00
Michal Kuratczyk a9d5b6001a
k8s peer discovery v2 (#13050)
* Redesigned k8s peer discovery

Rather than querying the Kubernetes API, just check the local node name
and try to connect to the pod with `-0` suffix (or configured
`ordinal_start` value). Only the pod with the lowest ordinal can form
a new cluster - all other pods will wait forever.

This should prevent any race conditions and incorrectly formed clusters.
2025-02-04 17:07:27 +01:00
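
The redesigned discovery in the commit above can be sketched as a pure string computation on the local node name. This is a simplified illustration under stated assumptions: the node name format (`name@pod-N.domain`), the function names, and the example hostnames are hypothetical. Only the `-0` suffix rule and the `ordinal_start` setting come from the commit message.

```python
import re


def seed_node(local_node, ordinal_start=0):
    """Derive the seed node name by replacing the pod ordinal with `ordinal_start`."""
    name, host = local_node.split("@", 1)
    pod, sep, domain = host.partition(".")
    match = re.fullmatch(r"(.+)-(\d+)", pod)
    if match is None:
        raise ValueError(f"pod name {pod!r} has no ordinal suffix")
    return f"{name}@{match.group(1)}-{ordinal_start}{sep}{domain}"


def can_form_new_cluster(local_node, ordinal_start=0):
    """Only the pod with the lowest ordinal may form a new cluster."""
    return local_node == seed_node(local_node, ordinal_start)


# Hypothetical node names.
print(seed_node("rabbit@rmq-2.rmq-nodes.default"))             # rabbit@rmq-0.rmq-nodes.default
print(can_form_new_cluster("rabbit@rmq-0.rmq-nodes.default"))  # True
print(can_form_new_cluster("rabbit@rmq-2.rmq-nodes.default"))  # False
```

Because every pod computes the same seed name without consulting the Kubernetes API, two pods cannot both decide to form a cluster.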
David Ansari 7bc3ab8cd4 Add tests for different JMS message types
This commit contains the following changes:
1. Simplify .NET suite
2. Simplify Java package naming
3. Extract JMS tests into separate suite. This way, it's easier to run,
debug, and add new tests compared to the previous suite which mixed
.NET tests with JMS tests.
4. Add tests for different JMS message types
2025-02-04 14:46:49 +01:00
Michael Klishin b4f5d3d51a
Core config_schema_SUITE: cosmetics 2025-02-03 19:14:25 -05:00
Michael Klishin 491971f08d
Merge pull request #13201 from rabbitmq/rabbitmq-server-13194
By @frederikbosch: Peer discovery: add an option to opt out of registration
2025-02-03 16:18:47 -05:00
Michael Klishin 1432b6cc12
Merge pull request #13199 from rabbitmq/dependabot/github_actions/main/google-github-actions/auth-2.1.8
build(deps): bump google-github-actions/auth from 2.1.7 to 2.1.8
2025-02-03 14:07:27 -05:00
Michael Klishin 269685dd6e
Make it possible to opt out of peer discovery registration
for the backends that support it in the first place.

When forming a cluster, registration of the node
joining the cluster might be left to (container)
orchestration tools like Nomad or Kubernetes.

This PR adds a new configuration option,
'cluster_formation.registration.enable',
which defaults to true.
When set to false node registration will be skipped.

There is at least one important advantage to using a
tool such as Nomad (plus Consul) over the application
(RabbitMQ) doing the registration.

When the application is not stopped gracefully for
any reason, e.g. it is OOM-killed,
it cannot deregister the service/node.

This leaves behind an unlinked service entry in the registry.
This problem is fundamentally avoided by allowing
Nomad (or similar tools) to register the
node's service.

See #11233 and #11045 for prior discussions.

Co-authored-by: Frederik Bosch <f.bosch@genkgo.nl>
2025-02-03 13:58:17 -05:00
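
The effect of the new option from the commit above can be sketched as a lookup with a default of true. The key name `cluster_formation.registration.enable` and its default come from the commit message; the parser below is a deliberately simplified stand-in for the real configuration handling, and the function name is hypothetical.

```python
REGISTRATION_KEY = "cluster_formation.registration.enable"


def registration_enabled(conf_text):
    """Return whether node registration should run, defaulting to True."""
    enabled = True
    for raw_line in conf_text.splitlines():
        line = raw_line.split("#", 1)[0].strip()  # drop comments and whitespace
        if "=" not in line:
            continue
        key, value = (part.strip() for part in line.split("=", 1))
        if key == REGISTRATION_KEY:
            enabled = value.lower() == "true"
    return enabled


opt_out_conf = """
# Let the orchestrator register the node instead of RabbitMQ.
cluster_formation.registration.enable = false
"""

print(registration_enabled(""))            # True: option not set
print(registration_enabled(opt_out_conf))  # False: registration is skipped
```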
dependabot[bot] 97f333569e
build(deps): bump google-github-actions/auth from 2.1.7 to 2.1.8
Bumps [google-github-actions/auth](https://github.com/google-github-actions/auth) from 2.1.7 to 2.1.8.
- [Release notes](https://github.com/google-github-actions/auth/releases)
- [Changelog](https://github.com/google-github-actions/auth/blob/main/CHANGELOG.md)
- [Commits](https://github.com/google-github-actions/auth/compare/v2.1.7...v2.1.8)

---
updated-dependencies:
- dependency-name: google-github-actions/auth
  dependency-type: direct:production
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-02-03 18:11:07 +00:00
Karl Nilsson 600bb22939
Ra 2.16.1
Contains a fix for a bug that would crash at-most-once dead lettering
during node restarts.

Less excessive debug logging around ra log.

Fix issue that could make leader transfers take 5s+ to complete.
2025-02-03 11:57:43 -05:00
David Ansari 0b1cfc6f04
Impose limit on AMQP filter complexity
As described in section 7.1 of filtex-v1.0-wd09:
> Impose a limit on the complexity of each filter expression.

Here, we hard-code the maximum number of properties within a filter
expression to 16. There should never be a use case that requires
filtering on more than 16 different properties.
2025-02-03 11:57:43 -05:00
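
The limit described in the commit above can be illustrated with a simple size check. This is a sketch only: the function name, the error type, and the representation of a filter as a dictionary are assumptions. Only the limit of 16 properties comes from the commit message.

```python
MAX_FILTER_PROPERTIES = 16  # hard-coded limit stated in the commit message


def validate_filter(properties):
    """Reject a filter expression that names more than 16 properties."""
    if len(properties) > MAX_FILTER_PROPERTIES:
        raise ValueError(
            f"filter has {len(properties)} properties, "
            f"maximum is {MAX_FILTER_PROPERTIES}"
        )
    return properties


# Hypothetical filters built from placeholder property names.
at_limit = {f"prop{i}": i for i in range(16)}
over_limit = {f"prop{i}": i for i in range(17)}

validate_filter(at_limit)  # accepted
try:
    validate_filter(over_limit)
except ValueError as error:
    print(error)  # filter has 17 properties, maximum is 16
```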
David Ansari 4bdddc7a98
Bump Qpid JMS AMQP 1.0 client 2025-02-03 11:57:43 -05:00
Michael Klishin df424e5edf
Bump Khepri cluster formation timeout
to match that used with Mnesia.

In the case of Mnesia, there are 10 retries
with a 30 second delay each.

For Khepri, a single timeout is used, so it
must be ten times as long.
2025-02-03 11:57:42 -05:00
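
The arithmetic behind the new timeout in the commit above is short enough to spell out. The constant names are hypothetical; the retry count and delay are the Mnesia values quoted in the commit message.

```python
MNESIA_RETRIES = 10
MNESIA_RETRY_DELAY_SECONDS = 30

# Khepri uses a single timeout, so it has to cover all Mnesia retries at once.
khepri_timeout_seconds = MNESIA_RETRIES * MNESIA_RETRY_DELAY_SECONDS

print(khepri_timeout_seconds)       # 300
print(khepri_timeout_seconds / 60)  # 5.0 (minutes)
```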
Michael Klishin 40172aa183
Merge pull request #13190 from rabbitmq/ra-v2.16.1
Ra 2.16.1
2025-02-03 11:27:28 -05:00
Karl Nilsson 7931797761 Ra 2.16.1
Contains a fix for a bug that would crash at-most-once dead lettering
during node restarts.

Less excessive debug logging around ra log.

Fix issue that could make leader transfers take 5s+ to complete.
2025-02-03 14:51:05 +00:00
David Ansari 387391a5e7 Impose limit on AMQP filter complexity
As described in section 7.1 of filtex-v1.0-wd09:
> Impose a limit on the complexity of each filter expression.

Here, we hard-code the maximum number of properties within a filter
expression to 16. There should never be a use case that requires
filtering on more than 16 different properties.
2025-02-03 14:21:18 +01:00
David Ansari 60c2bdff85 Bump Qpid JMS AMQP 1.0 client 2025-02-03 10:38:22 +01:00
Michael Klishin 2c9950648f
Merge pull request #13187 from rabbitmq/mk-bump-khepri-timeout
Bump Khepri cluster formation timeout to the standard 5 minutes
2025-02-01 20:05:41 -05:00
Frederik Bosch 481ffb2d6c add option to disable registration of node during cluster formation 2025-02-01 14:09:03 +01:00
Frederik Bosch 8355bc691e add option to disable registration of node during cluster formation 2025-02-01 13:36:22 +01:00
Michael Klishin 082939c428
Merge pull request #13193 from rabbitmq/dependabot/maven/deps/rabbitmq_stream_management/test/http_SUITE_data/main/com.google.code.gson-gson-2.12.1
build(deps-dev): bump com.google.code.gson:gson from 2.12.0 to 2.12.1 in /deps/rabbitmq_stream_management/test/http_SUITE_data
2025-01-31 22:59:58 -05:00
dependabot[bot] 0371958341
build(deps-dev): bump com.google.code.gson:gson
Bumps [com.google.code.gson:gson](https://github.com/google/gson) from 2.12.0 to 2.12.1.
- [Release notes](https://github.com/google/gson/releases)
- [Changelog](https://github.com/google/gson/blob/main/CHANGELOG.md)
- [Commits](https://github.com/google/gson/compare/gson-parent-2.12.0...gson-parent-2.12.1)

---
updated-dependencies:
- dependency-name: com.google.code.gson:gson
  dependency-type: direct:development
  update-type: version-update:semver-patch
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-01-31 18:39:19 +00:00