30804 Commits

Author SHA1 Message Date
Karl Nilsson 274f12f063 Start the coordination Ra system before quorum_queues
This ensures that quorum_queues shuts down _before_
the coordination system that Khepri runs inside.
Quorum queues depend on Khepri, so they need to be shut down first.
2025-04-09 12:53:34 +01:00
Jean-Sébastien Pédron dc5a703c23
Merge pull request #12753 from rabbitmq/md/khepri-0-17
Bump Khepri to 0.17.0
2025-04-09 10:26:53 +02:00
Michael Klishin 9bb5dc2ef0
Merge pull request #13698 from rabbitmq/loic-require-auth-api-desc-page
Add new option require_auth_for_api_desc_page to mgmt
2025-04-09 02:29:47 -04:00
Michael Klishin 20188a770e
rabbitmq.conf schema and tests for #13698 2025-04-09 02:02:47 -04:00
Michael Klishin b4fe2cc661
Merge pull request #13703 from rabbitmq/qq-handle-tick-tweaks
QQ: handle tick tweaks
2025-04-08 15:57:59 -04:00
Jean-Sébastien Pédron c8fafa3772
rabbit_db: Note that rabbit_db_msup:create_or_update() is not atomic
... with Khepri.
2025-04-08 18:47:27 +02:00
Jean-Sébastien Pédron 440eb5b355
Khepri: Export `fence/1` 2025-04-08 18:47:27 +02:00
Jean-Sébastien Pédron b4cda4a96a
Improve many testsuites to make them work with mixed versions of Khepri 2025-04-08 18:47:27 +02:00
Jean-Sébastien Pédron 124467e620
rabbitmq_ct_helpers: Use node 2 as seed node, even with secondary umbrella
[Why]
This makes sure that nodes are clustered the same way, whether the tests
are executed with or without a secondary umbrella.
2025-04-08 18:47:27 +02:00
Jean-Sébastien Pédron 4811fd44fd
Khepri: Don't sync cluster if the node is already clustered in `khepri_db` enable function
[Why]
The feature flag enable function is called during the initial migration
or when a node is later added to a cluster.

In this latter situation, the cluster is already formed and the Mnesia
tables were already migrated. Syncing the cluster in this specific
situation might kick another node that is currently unreachable.

[How]
If the node running the enable function is already clustered, we skip
the cluster sync.
2025-04-08 18:47:27 +02:00
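The guard described in 4811fd44fd can be pictured with a small sketch. This is an illustrative Python model under stated assumptions, not the Erlang code from the commit: `Node`, `enable_khepri_db` and `sync_cluster` are hypothetical names chosen for the example.

```python
from dataclasses import dataclass, field


@dataclass
class Node:
    """Hypothetical stand-in for a RabbitMQ node; not a real API."""
    name: str
    clustered: bool = False
    log: list = field(default_factory=list)

    def sync_cluster(self) -> None:
        # In the scenario from the commit message, this is the step that
        # might kick a member that is merely unreachable.
        self.log.append("sync_cluster")


def enable_khepri_db(node: Node) -> None:
    """Model of the enable function: skip the sync on clustered nodes."""
    if node.clustered:
        node.log.append("skip_sync")
        return
    node.sync_cluster()
```

Calling `enable_khepri_db` on a node created with `clustered=True` records only `skip_sync`, which mirrors the "node later added to a cluster" case from the message.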
Michael Davis f5805b83d2
Khepri: Handle breaking change in khepri adv API return type
[Why]
All callers of `khepri_adv` and `khepri_tx_adv` need updates to handle
the now uniform return type of `khepri:node_props_map()` in Khepri
0.17.0.

[How]
We don't need any compatibility code to handle "either the old return
type or the new return type" from the khepri_adv API because the
translation is done entirely in the "client side" code in Khepri -
meaning that the return value from the Ra server is the same but it is
translated differently by the functions in `khepri_adv`.

However, we need to adapt transaction functions because they may be
executed on different versions of Khepri and the behaviour of
`khepri_tx_adv` can be different. To take the possible change of return
value format, we use the new `khepri_tx:does_api_comply_with/1` to know
what to expect.
2025-04-08 18:47:27 +02:00
Michael Davis 9b5ab14faf
Khepri: Adapt to new khepri_cluster:members/2 API
[Why]
In Khepri 0.17.0, `khepri_cluster:locally_known_members/1` and
`khepri_cluster:locally_known_node/1` were replaced with
`khepri_cluster:members/2` and `khepri_cluster:nodes/2` with `favor` set
to `low_latency` - this matches the interface for queries in Khepri.
2025-04-08 18:47:26 +02:00
Karl Nilsson 27ef97ecd7 QQ: handle_tick improvements
Move leader repair earlier in tick function to ensure more
timely update of meta data store record after leader change.

Also use RPC_TIMEOUT macro for metric/stats multicalls to improve
liveness when a node is connected but partitioned / frozen.
2025-04-08 15:39:20 +01:00
Michal Kuratczyk 6513d028e3
Avoid crash when reporting federation status
This should address crashes like this one (found in a user's logs):
```
exception error: no case clause matching
                  [[{connection_details,[]},
                    {name,<<"10.0.13.41:50497 -> 10.2.230.128:5671 (1)">>},
                    {node,rabbit@foobar},
                    {number,1},
                    {user,<<"...">>},
                    {user_who_performed_action,<<"...">>},
                    {vhost,<<"/">>}],
                   [{connection_details,[]},
                    {name,<<"10.0.13.41:50142 -> 10.2.230.128:5671 (1)">>},
                    {node,rabbit@foobar},
                    {number,1},
                    {user,<<"...">>},
                    {user_who_performed_action,<<"...">>},
                    {vhost,<<"/">>}]]
   in function  rabbit_federation_mgmt:format/3 (rabbit_federation_mgmt.erl, line 100)
   in call from rabbit_federation_mgmt:'-status/3-lc$^0/1-0-'/4 (rabbit_federation_mgmt.erl, line 89)
   in call from rabbit_federation_mgmt:'-status/4-lc$^0/1-0-'/3 (rabbit_federation_mgmt.erl, line 82)
   in call from rabbit_federation_mgmt:'-status/4-lc$^0/1-0-'/3 (rabbit_federation_mgmt.erl, line 82)
   in call from rabbit_federation_mgmt:status/4 (rabbit_federation_mgmt.erl, line 82)
   in call from rabbit_federation_mgmt:to_json/2 (rabbit_federation_mgmt.erl, line 57)
   in call from cowboy_rest:call/3 (src/cowboy_rest.erl, line 1590)
   in call from cowboy_rest:set_resp_body/2 (src/cowboy_rest.erl, line 1473)
```
2025-04-08 12:05:10 +02:00
Arnaud Cogoluègnes f10e084c51
Bump Logback to 1.5.18 in JMS-over-AMQP tests
The project uses SLF4J 2.x, Logback 1.5.x is compatible with it.
2025-04-08 09:20:20 +02:00
Arnaud Cogoluègnes 12d094bdb3
Use Netty version from AMQP client in JMS-over-AMQP tests
AMQP Java client uses Netty 4.2, Qpid JMS uses Netty 4.1. This commit
forces the use of Netty 4.2 (which is backward-compatible with 4.1).
2025-04-08 09:19:49 +02:00
David Ansari 561376052e Fix type spec for AMQP 1.0 address
The target address can be null which denotes the anonymous terminus.
https://docs.oasis-open.org/amqp/anonterm/v1.0/anonterm-v1.0.html
2025-04-07 16:37:17 +02:00
David Ansari 35b5ab3cdc Determine queue topology without checking queue type
## What?
This commit determines the queue topology without checking the queue type.

## Why?
This way, checking leader and replicas works the same across all queue
types without the need to introduce other rabbit_queue_type behaviour as
suggested in other PRs.

## How?
pid is the leader, nodes in queue_type_states are the members/replicas.

This commit results in an unknown stream leader during queue
declaration. However the correct leader will be returned eventually when
calling GET on the stream.
2025-04-07 16:37:03 +02:00
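The rule from 35b5ab3cdc ("pid is the leader, nodes in queue_type_states are the members/replicas") could be sketched roughly as follows. The dictionary shape used here is an assumption made for illustration; the real queue record is an Erlang term and is not shown in the commit message.

```python
from typing import Optional


def queue_topology(queue: dict) -> tuple[Optional[str], list[str]]:
    """Return (leader_node, member_nodes) without looking at the queue type.

    Assumed (hypothetical) shape: `pid` is a (node, process_id) pair or
    None, and `queue_type_states` holds a "nodes" list of member nodes.
    """
    pid = queue.get("pid")
    leader = pid[0] if pid is not None else None
    members = list(queue.get("queue_type_states", {}).get("nodes", []))
    return leader, members
```

A record whose `pid` is not yet known yields `None` as the leader, which corresponds to the "unknown stream leader during queue declaration" caveat in the message.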
Loïc Hoguin 400e8006e5
Add new option require_auth_for_api_desc_page to mgmt
This allows restricting access to the /api/index.html and
the /cli/index.html page to authenticated users should the
user really want to. This can be enabled via advanced.config.
2025-04-07 15:59:13 +02:00
Arnaud Cogoluègnes 6f5c8e0c7f
Pin Java AMQP 1.0 client to 0.5.0
Because of a Netty version mismatch with Qpid JMS.
2025-04-07 14:54:31 +02:00
dependabot[bot] 74d7fbe3a2
[skip ci] Bump the prod-deps group across 4 directories with 1 update
Bumps the prod-deps group with 1 update in the /deps/rabbit/test/amqp_jms_SUITE_data directory: [org.apache.maven.plugins:maven-surefire-plugin](https://github.com/apache/maven-surefire).
Bumps the prod-deps group with 1 update in the /deps/rabbitmq_mqtt/test/java_SUITE_data directory: [org.apache.maven.plugins:maven-surefire-plugin](https://github.com/apache/maven-surefire).
Bumps the prod-deps group with 1 update in the /deps/rabbitmq_stream/test/rabbit_stream_SUITE_data directory: [org.apache.maven.plugins:maven-surefire-plugin](https://github.com/apache/maven-surefire).
Bumps the prod-deps group with 1 update in the /deps/rabbitmq_stream_management/test/http_SUITE_data directory: [org.apache.maven.plugins:maven-surefire-plugin](https://github.com/apache/maven-surefire).


Updates `org.apache.maven.plugins:maven-surefire-plugin` from 3.5.2 to 3.5.3
- [Release notes](https://github.com/apache/maven-surefire/releases)
- [Commits](https://github.com/apache/maven-surefire/compare/surefire-3.5.2...surefire-3.5.3)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-surefire-plugin
  dependency-version: 3.5.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-05 18:51:59 +00:00
Jean-Sébastien Pédron 9704d230fa
quorum_queue_SUITE: Improve reliability of a test
... by waiting for a state.
2025-04-04 18:46:29 +02:00
Michael Klishin e23266253e
Merge pull request #13674 from rabbitmq/avoid-crash-on-stream-connections
Ignore stream connections in unexpected states
2025-04-02 22:56:20 -04:00
Michal Kuratczyk 09ed8fdc07
Ignore stream connections in unexpected states
A connection which terminated before it was fully established
would lead to a function_clause, since metadata is not available
to really call notify_connection_closed. We can just ignore such
connections and not notify about them.

Resolves https://github.com/rabbitmq/rabbitmq-server/discussions/13670
2025-04-02 23:38:55 +02:00
Simon Unge b7c4f66a69 Added 'unlimited' config setting for peer_discovery_retry_limit 2025-04-02 18:34:32 +00:00
Michael Klishin e83c286367
Merge pull request #13643 from rabbitmq/su_aws/try_to_leave_cluster_before_joining
Allow a previously reset node to rejoin its original cluster
2025-04-01 13:20:26 -04:00
Michael Klishin e6bc6a451f
Naming #13643 2025-04-01 12:13:43 -04:00
Marcial Rosales 8dfcfa61e4 Use relative path for the path linked to the cookie
used by the management UI OAuth logic to store the
token until it is moved onto the local storage
2025-04-01 14:02:51 +02:00
Simon Unge 36eb6cafc1 Update spec, noconnection is also a possible error 2025-03-31 21:54:02 +00:00
Simon Unge cdeabe22bc Don't handle the exception, just let it propagate 2025-03-31 21:16:06 +00:00
Simon Unge e1f2865eae Return the exception 2025-03-31 17:55:49 +00:00
Simon Unge 9ba545cbef Fix dialyzer issue. 2025-03-31 17:52:01 +00:00
Arnaud Cogoluègnes 602b6acd7d
Re-evaluate stream SAC group after connection down event
The same connection can contain several consumers belonging to a SAC
group (group key = vhost + stream + consumer name). The whole new group
must be re-evaluated to select a new active consumer after the consumers
of the down connection are removed from it.

The previous behavior would not re-evaluate the new group and could
select a consumer from the down connection, leaving the group with only
inactive consumers, as the selected active consumer would never receive
the activation message from the stream SAC coordinator.

This commit fixes this problem by removing the consumers of the
down connection from the affected groups and then performing the
appropriate operations for the groups to keep on consuming (e.g.
notifying an active consumer that it needs to step down).

References #13372
2025-03-31 14:59:59 +02:00
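The re-evaluation described in 602b6acd7d can be modelled in a few lines. This is a simplified sketch, not the stream SAC coordinator code: `Consumer` and `handle_connection_down` are hypothetical names, the "promote the first remaining consumer" policy is an assumption, and notifications such as asking an active consumer to step down are not modelled.

```python
from dataclasses import dataclass


@dataclass
class Consumer:
    connection: str
    name: str
    active: bool = False


def handle_connection_down(group: list[Consumer], connection: str) -> list[Consumer]:
    """Drop consumers of the down connection, then re-evaluate the group."""
    remaining = [c for c in group if c.connection != connection]
    # Re-evaluate the remaining group as a whole: if no active consumer
    # survived, promote one. Picking the first is an assumption; the
    # commit message does not describe the selection policy.
    if remaining and not any(c.active for c in remaining):
        remaining[0].active = True
    return remaining
```

Because the down connection's consumers are removed before the new active consumer is chosen, a consumer from the down connection can never be selected, which is the failure mode the message describes.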
dependabot[bot] d5fcab2af2
[skip ci] Bump com.google.googlejavaformat:google-java-format
Bumps the dev-deps group with 1 update in the /deps/rabbit/test/amqp_jms_SUITE_data directory: [com.google.googlejavaformat:google-java-format](https://github.com/google/google-java-format).


Updates `com.google.googlejavaformat:google-java-format` from 1.25.2 to 1.26.0
- [Release notes](https://github.com/google/google-java-format/releases)
- [Commits](https://github.com/google/google-java-format/compare/v1.25.2...v1.26.0)

---
updated-dependencies:
- dependency-name: com.google.googlejavaformat:google-java-format
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: dev-deps
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-29 18:07:51 +00:00
Simon Unge dd49cbe6c3 Mnesia: Ask to leave the cluster and retry joining if the cluster already considers the node a member.
Khepri: no-op. Khepri is less strict already, and rabbit_khepri:can_join would accept a join request from a node that is already a member.
2025-03-28 21:24:08 +00:00
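The Mnesia path from dd49cbe6c3 (leave, then retry the join) might look like this in outline. The `Cluster` class, `AlreadyMember` exception and `join_with_retry` function are hypothetical and exist only to illustrate the control flow; they are not RabbitMQ APIs.

```python
class AlreadyMember(Exception):
    """Raised by the hypothetical cluster when the joiner is still listed."""


class Cluster:
    def __init__(self, members):
        self.members = set(members)

    def join(self, node):
        if node in self.members:
            raise AlreadyMember(node)
        self.members.add(node)

    def leave(self, node):
        self.members.discard(node)


def join_with_retry(cluster: Cluster, node: str) -> str:
    """Try to join; if still considered a member, leave and retry once."""
    try:
        cluster.join(node)
        return "joined"
    except AlreadyMember:
        # A reset node may still be listed by its old cluster.
        cluster.leave(node)
        cluster.join(node)
        return "rejoined"
```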
Michael Klishin cbb23d65bf
Merge pull request #13648 from rabbitmq/fix-flake-in-rabbit-fifo-int-SUITE
Fix flake(s) in rabbit_fifo_int_SUITE
2025-03-28 14:21:54 -04:00
Karl Nilsson e71fa51925 Speculative flake fix for amqpl_consumer_ack_SUITE.erl 2025-03-28 16:51:32 +00:00
Michal Kuratczyk 9699393da7
[skip ci] fix debug log formatting 2025-03-28 17:47:13 +01:00
Karl Nilsson 1d9f179562 Fix flake(s) in rabbit_fifo_int_SUITE
The start_cluster helper used the same UID (!!) for all members
in the local cluster. This resulted in shared mem tables and
all sorts of havoc.
2025-03-28 13:37:18 +00:00
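The fix in 1d9f179562 comes down to giving each member its own UID. A minimal sketch of that idea, assuming a hypothetical `start_cluster` helper that maps member names to UIDs (the real helper is Erlang test code and is not shown in the log):

```python
import uuid


def start_cluster(member_names):
    """Give every member its own UID so no two members share state."""
    return {name: uuid.uuid4().hex for name in member_names}
```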
Michal Kuratczyk f0976b48b2
queue info metric: guard against whereis returning `undefined` (#13646) 2025-03-28 12:37:42 +01:00
Michael Klishin 3756775ebe
Revert "Redirect to end_session_endpoint for idp_initiated logon when it is configured" 2025-03-28 02:31:09 -04:00
Michael Klishin ab8799a739
Redirect to end_session_endpoint for idp-initiated logon
Conflicts:
	selenium/bin/components/fakeportal
2025-03-27 19:01:01 -04:00
Michael Klishin 860bb7c47b
Merge pull request #13638 from rabbitmq/ra-2.16.5 2025-03-27 14:33:19 -04:00
Karl Nilsson 4fe96dfd27 Ra 2.16.5 - bug fixes and minor improvements
Ra improvements:

* Don't allow a non-voter to start elections
* Register with ra directory before initialising ra server.
* Trigger tick_timeout immediately after entering leader state.
* Set a configurable segment max size

This commit also turns the quorum queue become leader callback
into a noop and instead relies on the more prompt tick_handler
to handle the meta data store update after a leader election.

This more prompt tick update means there should be a much shorter
gap between the queue metrics being deleted from the old leader
node and their becoming available again on the new node, resulting
in smoother message count metrics.

Fix test that relied on waiting on too simplistic a property
before asserting.
2025-03-27 17:06:31 +00:00
Michal Kuratczyk 2a93bbcebd
RMQ-1460: Emit queue_info metric (#13583)
To allow filtering on queue type or membership status,
we need an info metric for queues; see
https://grafana.com/blog/2021/08/04/how-to-use-promql-joins-for-more-effective-queries-of-prometheus-metrics-at-scale/#info-metrics

With this change, per-object metrics and the detailed metrics
(if queue-related families are requested) will contain
rabbitmq_queue_info / rabbitmq_detailed_queue_info with a value of 1
and labels including the queue name, vhost, queue type and membership
status.
2025-03-27 15:54:26 +01:00
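An info metric as described in 2a93bbcebd carries its data in labels and always has the value 1. The sketch below renders one such line in the Prometheus text exposition format. The label names in the usage example (`queue`, `vhost`, `queue_type`, `membership`) are assumptions for illustration; the commit message names the kinds of labels but not their exact keys.

```python
def info_metric(name: str, labels: dict) -> str:
    """Render an info-style metric line: constant value 1, data in labels."""
    def escape(value: str) -> str:
        # Label values escape backslash, double quote and newline.
        return value.replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")

    rendered = ",".join(f'{key}="{escape(str(val))}"' for key, val in labels.items())
    return f"{name}{{{rendered}}} 1"
```

For example, `info_metric("rabbitmq_queue_info", {"queue": "orders", "vhost": "/", "queue_type": "quorum", "membership": "leader"})` produces a single line that can be joined against other queue metrics on the shared labels.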
David Ansari c151806f7c Apply PR formatting feedback
https://github.com/rabbitmq/rabbitmq-server/pull/13625#discussion_r2016008850
https://github.com/rabbitmq/rabbitmq-server/pull/13625#discussion_r2016010107
2025-03-27 11:30:23 +01:00
David Ansari ef1a595a13 Fix crash when consuming from unavailable quorum queue
Prior to this commit, when a client consumed from an unavailable quorum
queue, the following crash occurred:
```
{badmatch,{error,noproc}}
[{rabbit_quorum_queue,consume,3,[{file,\"rabbit_quorum_queue.erl\"},{line,993}]}
```

This commit fixes this bug by returning any error when registering a
quorum queue consumer to rabbit_queue_type.

This commit also refactors errors returned by
rabbit_queue_type:consume/3 to simplify and ensure separation of
concerns.

For example, prior to this commit, the channel did error
formatting specifically for consuming from streams. It's better if
the channel is unaware of what queue type it consumes from and each
queue type implementation formats its own errors.
2025-03-27 11:30:23 +01:00
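The change in ef1a595a13 replaces a hard match on success with passing the error back to the caller. A rough Python analogue of that pattern, with hypothetical function names and tuple shapes standing in for the Erlang `{ok, ...}` / `{error, ...}` returns:

```python
def register_consumer(queue_available: bool):
    """Hypothetical queue-type call: ("ok", state) or ("error", reason)."""
    if not queue_available:
        return ("error", "noproc")
    return ("ok", {"consumers": 1})


def consume(queue_available: bool):
    """Pass any registration error back to the caller instead of crashing."""
    result = register_consumer(queue_available)
    if result[0] == "error":
        return result  # the caller decides how to report it
    return ("ok", result[1])
```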
David Ansari 44657cd393 Bump timeout in RabbitMQ AMQP 1.0 Erlang client
Bump the timeout for management operations and link attachments from 20s
to 30s. We've seen timeouts in CI.

We bump the poll interval of the `?awaitMatch` macro because CI
sometimes flaked by crashing in
0e803de6dd/deps/rabbitmq_amqp_client/src/rabbitmq_amqp_client.erl (L411)
which indicates that the client lib received a response from a previous
request.
2025-03-27 10:48:49 +01:00
Iliia Khaprov 9efa0d9ffe
RMQ-1263: Shovel Management - add help strings for shovel counters
(cherry picked from commit 8e79a7f500c2df355f3ec7ac1fa1bdd3a8dff6a4)
2025-03-27 00:49:46 -04:00
Iliia Khaprov 6e871f6ab3
RMQ-1263: Shovels Management: show metrics (incl. forwarded counter) in the Shovel Status page
(cherry picked from commit f90dab71f147548c5e9ad921a0bc618179bd34c2)

Conflicts:
	deps/rabbitmq_shovel_management/src/rabbit_shovel_mgmt_util.erl
2025-03-27 00:49:08 -04:00