Commit Graph

3281 Commits

Author SHA1 Message Date
Karl Nilsson 274f12f063 Start the coordination Ra system before quorum_queues
This ensures that quorum_queues shuts down _before_
coordination where khepri run inside.
Quorum queues depend on khepri so need to be shut down first.
2025-04-09 12:53:34 +01:00
Jean-Sébastien Pédron dc5a703c23
Merge pull request #12753 from rabbitmq/md/khepri-0-17
Bump Khepri to 0.17.0
2025-04-09 10:26:53 +02:00
Jean-Sébastien Pédron c8fafa3772
rabbit_db: Note that rabbit_db_msup:create_or_update() is not atomic
... with Khepri.
2025-04-08 18:47:27 +02:00
Jean-Sébastien Pédron 440eb5b355
Khepri: Export `fence/1` 2025-04-08 18:47:27 +02:00
Jean-Sébastien Pédron b4cda4a96a
Improve many testsuites to make them work with mixed versions of Khepri 2025-04-08 18:47:27 +02:00
Jean-Sébastien Pédron 4811fd44fd
Khepri: Don't sync cluster if the node is already clustered in `khepri_db` enable function
[Why]
The feature flag enable function is called during the initial migration
or when a node is later added to a cluster.

In this latter situation, the cluster is already formed and the Mnesia
tables were already migrated. Syncing the cluster in this specific
situation might kick another node that is currently unreachable.

[How]
If the node running the enable function is already clustered, we skip
the cluster sync.
2025-04-08 18:47:27 +02:00
Michael Davis f5805b83d2
Khepri: Handle breaking change in khepri adv API return type
[Why]
All callers of `khepri_adv` and `khepri_tx_adv` need updates to handle
the now uniform return type of `khepri:node_props_map()` in Khepri
0.17.0.

[How]
We don't need any compatibility code to handle "either the old return
type or the new return type" from the khepri_adv API because the
translation is done entirely in the "client side" code in Khepri -
meaning that the return value from the Ra server is the same but it is
translated differently by the functions in `khepri_adv`.

However, we need to adapt transaction functions because they may be
executed on different versions of Khepri and the behaviour of
`khepri_tx_adv` can be different. To take the possible change of return
value format, we use the new `khepri_tx:does_api_comply_with/1` to know
what to expect.
2025-04-08 18:47:27 +02:00
Michael Davis 9b5ab14faf
Khepri: Adapt to new khepri_cluster:members/2 API
[Why]
In Khepri 0.17.0, `khepri_cluster:locally_known_members/1` and
`khepri_cluster:locally_known_node/1` were replaced with
`khepri_cluster:members/2` and `khepri_cluster:nodes/2` with `favor` set
to `low_latency` - this matches the interface for queries in Khepri.
2025-04-08 18:47:26 +02:00
Karl Nilsson 27ef97ecd7 QQ: handle_tick improvements
Move leader repair earlier in tick function to ensure more
timely update of meta data store record after leader change.

Also use RPC_TIMEOUT macro for metric/stats multicalls to improve
liveness when a node is connected but partitioned / frozen.
2025-04-08 15:39:20 +01:00
Arnaud Cogoluègnes f10e084c51
Bump Logback to 1.5.18 in JMS-over-AMQP tests
The project uses SLF4J 2.x, Logback 1.5.x is compatible with it.
2025-04-08 09:20:20 +02:00
Arnaud Cogoluègnes 12d094bdb3
Use Netty version from AMQP client in JMS-over-AMQP tests
AMQP Java client uses Netty 4.2, QPid JMS uses Netty 4.1. This commit
forces the use of Netty 4.2 (which is backward-compatible with 4.1).
2025-04-08 09:19:49 +02:00
David Ansari 35b5ab3cdc Determine queue topology without checking queue type
## What?
This commit determines the queue topology without checking the queue type.

 ## Why?
This way, checking leader and replicas works the same across all queue
types without the need to introduce other rabbit_queue_type behaviour as
suggested in other PRs.

 ## How?
pid is the leader, nodes in queue_type_states are the members/replicas.

This commit results in an unknown stream leader during queue
declaration. However the correct leader will be returned eventually when
calling GET on the stream.
2025-04-07 16:37:03 +02:00
Arnaud Cogoluègnes 6f5c8e0c7f
Pin Java AMQP 1.0 client to 0.5.0
Because of Netty version mismatch with QPid JMS.
2025-04-07 14:54:31 +02:00
dependabot[bot] 74d7fbe3a2
[skip ci] Bump the prod-deps group across 4 directories with 1 update
Bumps the prod-deps group with 1 update in the /deps/rabbit/test/amqp_jms_SUITE_data directory: [org.apache.maven.plugins:maven-surefire-plugin](https://github.com/apache/maven-surefire).
Bumps the prod-deps group with 1 update in the /deps/rabbitmq_mqtt/test/java_SUITE_data directory: [org.apache.maven.plugins:maven-surefire-plugin](https://github.com/apache/maven-surefire).
Bumps the prod-deps group with 1 update in the /deps/rabbitmq_stream/test/rabbit_stream_SUITE_data directory: [org.apache.maven.plugins:maven-surefire-plugin](https://github.com/apache/maven-surefire).
Bumps the prod-deps group with 1 update in the /deps/rabbitmq_stream_management/test/http_SUITE_data directory: [org.apache.maven.plugins:maven-surefire-plugin](https://github.com/apache/maven-surefire).


Updates `org.apache.maven.plugins:maven-surefire-plugin` from 3.5.2 to 3.5.3
- [Release notes](https://github.com/apache/maven-surefire/releases)
- [Commits](https://github.com/apache/maven-surefire/compare/surefire-3.5.2...surefire-3.5.3)

Updates `org.apache.maven.plugins:maven-surefire-plugin` from 3.5.2 to 3.5.3
- [Release notes](https://github.com/apache/maven-surefire/releases)
- [Commits](https://github.com/apache/maven-surefire/compare/surefire-3.5.2...surefire-3.5.3)

Updates `org.apache.maven.plugins:maven-surefire-plugin` from 3.5.2 to 3.5.3
- [Release notes](https://github.com/apache/maven-surefire/releases)
- [Commits](https://github.com/apache/maven-surefire/compare/surefire-3.5.2...surefire-3.5.3)

Updates `org.apache.maven.plugins:maven-surefire-plugin` from 3.5.2 to 3.5.3
- [Release notes](https://github.com/apache/maven-surefire/releases)
- [Commits](https://github.com/apache/maven-surefire/compare/surefire-3.5.2...surefire-3.5.3)

---
updated-dependencies:
- dependency-name: org.apache.maven.plugins:maven-surefire-plugin
  dependency-version: 3.5.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
- dependency-name: org.apache.maven.plugins:maven-surefire-plugin
  dependency-version: 3.5.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
- dependency-name: org.apache.maven.plugins:maven-surefire-plugin
  dependency-version: 3.5.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
- dependency-name: org.apache.maven.plugins:maven-surefire-plugin
  dependency-version: 3.5.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-04-05 18:51:59 +00:00
Jean-Sébastien Pédron 9704d230fa
quorum_queue_SUITE: Improve reliability of a test
... by waiting for a state.
2025-04-04 18:46:29 +02:00
Simon Unge b7c4f66a69 Added 'unlimited' config setting for peer_discovery_retry_limit 2025-04-02 18:34:32 +00:00
Michael Klishin e83c286367
Merge pull request #13643 from rabbitmq/su_aws/try_to_leave_cluster_before_joining
Allow a previously reset node to rejoin its original cluster
2025-04-01 13:20:26 -04:00
Michael Klishin e6bc6a451f
Naming #13643 2025-04-01 12:13:43 -04:00
Simon Unge 36eb6cafc1 Update spec, noconnection is also a possible error 2025-03-31 21:54:02 +00:00
Simon Unge cdeabe22bc Dont handle the exception just let it out there 2025-03-31 21:16:06 +00:00
Simon Unge e1f2865eae Return the exception 2025-03-31 17:55:49 +00:00
Simon Unge 9ba545cbef Fix dialyzer issue. 2025-03-31 17:52:01 +00:00
Arnaud Cogoluègnes 602b6acd7d
Re-evaluate stream SAC group after connection down event
The same connection can contain several consumers belonging to a SAC
group (group key = vhost + stream + consumer name). The whole new group
must be re-evaluated to select a new active consumer after the consumers
of the down connection are removed from it.

The previous behavior would not re-evaluate the new group and could
select a consumer from the down connection, letting the group with only
inactive consumers, as the selected active consumer would never receive
the activation message from the stream SAC coordinator.

This commit fixes this problem by removing the consumers of the down
down connection from the affected groups and then performing the
appropriate operations for the groups to keep on consuming (e.g.
notifying an active consumer that it needs to step down).

References #13372
2025-03-31 14:59:59 +02:00
dependabot[bot] d5fcab2af2
[skip ci] Bump com.google.googlejavaformat:google-java-format
Bumps the dev-deps group with 1 update in the /deps/rabbit/test/amqp_jms_SUITE_data directory: [com.google.googlejavaformat:google-java-format](https://github.com/google/google-java-format).


Updates `com.google.googlejavaformat:google-java-format` from 1.25.2 to 1.26.0
- [Release notes](https://github.com/google/google-java-format/releases)
- [Commits](https://github.com/google/google-java-format/compare/v1.25.2...v1.26.0)

---
updated-dependencies:
- dependency-name: com.google.googlejavaformat:google-java-format
  dependency-type: direct:development
  update-type: version-update:semver-minor
  dependency-group: dev-deps
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-03-29 18:07:51 +00:00
Simon Unge dd49cbe6c3 Mnesia: Ask to leave a cluster and retry to join if cluster already consider node a member.
Khepri: no-op. Khepri is less strict already, and rabbit_khepri:can_join would accept a join request from a node that is already a member
2025-03-28 21:24:08 +00:00
Michael Klishin cbb23d65bf
Merge pull request #13648 from rabbitmq/fix-flake-in-rabbit-fifo-int-SUITE
Fix flake(s) in rabbit_fifo_int_SUITE
2025-03-28 14:21:54 -04:00
Karl Nilsson e71fa51925 Speculative flake fix for amqpl_consumer_ack_SUITE.erl 2025-03-28 16:51:32 +00:00
Michal Kuratczyk 9699393da7
[skip ci] fix debug log formatting 2025-03-28 17:47:13 +01:00
Karl Nilsson 1d9f179562 Fix flake(s) in rabbit_fifo_int_SUITE
The start_cluster helper used the same UID (!!) for all members
in the local cluster. This resulted in shared mem tables and
all sorts of havoc.
2025-03-28 13:37:18 +00:00
Michael Klishin 860bb7c47b
Merge pull request #13638 from rabbitmq/ra-2.16.5 2025-03-27 14:33:19 -04:00
Karl Nilsson 4fe96dfd27 Ra 2.16.5 - bug fixes and minor improvements
Ra improvements:

* Don't allow a non-voter to start elections
* Register with ra directory before initialising ra server.
* Trigger tick_timeout immediately after entering leader state.
* Set a configurable segment max size

This commit also includes a change to turn the quorum queue
become leader callback to become a noop and instead rely on
the more promptly tick_handler to handle the meta data store
update after a leader election.

This more prompt tick update means there should be a much shorter
gap between the queue metrics being deleted from the old leader
node to them being available again on the new node resulting
in smoother message count metrics.

Fix test that relied on waiting on too simplistic a property
before asserting.
2025-03-27 17:06:31 +00:00
David Ansari c151806f7c Apply PR formatting feedback
https://github.com/rabbitmq/rabbitmq-server/pull/13625#discussion_r2016008850
https://github.com/rabbitmq/rabbitmq-server/pull/13625#discussion_r2016010107
2025-03-27 11:30:23 +01:00
David Ansari ef1a595a13 Fix crash when consuming from unavailable quorum queue
Prior to this commit, when a client consumed from an unavailable quorum
queue, the following crash occurred:
```
{badmatch,{error,noproc}}
[{rabbit_quorum_queue,consume,3,[{file,\"rabbit_quorum_queue.erl\"},{line,993}]}
```

This commit fixes this bug by returning any error when registering a
quorum queue consumer to rabbit_queue_type.

This commit also refactors errors returned by
rabbit_queue_type:consume/3 to simplify and ensure seperation of
concerns.

For example prior to this commit, the channel did error
formatting specifically for consuming from streams. It's better if
the channel is unaware of what queue type it consumes from and have each
queue type implementation format their own errors.
2025-03-27 11:30:23 +01:00
David Ansari 44657cd393 Bump timeout in RabbitMQ AMQP 1.0 Erlang client
Bump the timeout for management operations and link attachments from 20s
to 30s. We've seen timeouts in CI.

We bump the poll interval of the `?awaitMatch` macro because CI
sometimes flaked by crashing in
0e803de6dd/deps/rabbitmq_amqp_client/src/rabbitmq_amqp_client.erl (L411)
which indicates that the client lib received a response from a previous
request.
2025-03-27 10:48:49 +01:00
Loïc Hoguin fb985bb8b9
Fix the CLI's main module on Windows 2025-03-26 16:32:38 +01:00
Karl Nilsson 26fa541e2c
Merge pull request #13587 from rabbitmq/qq-checkpointing-tweaks-2
QQ: Revise checkpointing logic to take more frequent checkpoints for large message workloads
2025-03-26 10:43:50 +00:00
Karl Nilsson 6695282640 QQ: Revise checkpointing logic
To take more frequent checkpoints for large message workload

Lower the min_checkpoint_interval substantially to allow quorum queues
better control over when checkpoints are taken.

Track bytes enqueued in the aux state and suggest a checkpoint after
every 64MB enqueued (this value is scaled according to backlog just
like the indexes condition).
This should help with more timely checkpointing when very large
messages is used.

Try evaluating byte size independently of time window

also increase max size
2025-03-26 08:23:52 +00:00
Michael Klishin 3a30917809
Merge pull request #13603 from rabbitmq/remove-redundant-queue-type-function
Remove redundant rabbit_queue_type APIs
2025-03-25 17:43:43 -04:00
Iliia Khaprov 8ae0163643 Switch is_<queue_type> to using queue.type field
Also, since queue.type field rendered by QueueMod:format and all queues had it hard-coded here,
I unhardcode them here to use Type name.
2025-03-24 19:15:20 +01:00
Karl Nilsson 0410b7e4a6 Remove rabbit_queue_type:to_binary/1
As it is covered by rabbit_queue_type:short_alias_of/1
2025-03-24 16:28:35 +00:00
Karl Nilsson 73c6f9686f Remove rabbit_queue_type:feature_flag_name/1
As this functionality is covered by the rabbit_queue_type:is_enabled/1
API.
2025-03-24 14:49:54 +00:00
David Ansari 32854e8d34 Auto widen session incoming-window in AMQP 1.0 client
This commit fixes a bug in the Erlang AMQP 1.0 client.

Prior to this commit, to repro this bug:
1. Send more than 2^16 messages to a queue.
2. Grant more than a total of 2^16 link credit initially (on a single link
   or across multiple links) on a single session without any
   auto or manual link credit renewal.

The expectation is that thanks to sufficiently granted initial link-credit,
the client will receive all messages.
However, consumption stops after exactly 2^16-1 messages.

That's because the client lib was never sending a flow frame to the server.
So, after the client received all 2^16-1 messages (the initial
incoming-window set by the client), the server's remote-incoming-window
reached 0 causing the server to stop delivering messages.

The expectation is that the client lib automatically handles session
flow control without any manual involvement of the client app.

This commit implements this fix:
* We keep the server's remote-incoming window always large by default as
  explained in https://www.rabbitmq.com/blog/2024/09/02/amqp-flow-control#incoming-window
* Hence, the client lib sets its incoming-window to 100,000 initially.
* The client lib tracks its incoming-window decrementing it by 1 for
  every transfer it received. (This wasn't done prior to this commit.)
* Whenever this window shrinks below 50,000, the client sends a flow
  frame without any link information widening its incoming-window back to 100,000.
* For test cases (maybe later for apps as well), there is a new function
  `amqp10_client_session:flow/3`, which allows for a test case to do manual
  session flow control. Its API is designed very similar to
  `amqp10_client_session:flow_link/4` in that the test can optionally request
  the lib to auto widen the session window whenever it falls below a certain threshold.
2025-03-19 16:29:02 +00:00
Michael Klishin e93afc5c5b
Merge pull request #13565 from Ayanda-D/extend-amqqueue-tests
Extend rabbit_amqqueue_SUITE and add amqqueue:make_internal/{1,2} type specs
2025-03-19 02:54:45 -04:00
Jean-Sébastien Pédron 14d53f83cd
Merge pull request #13533 from rabbitmq/remove-setup-retries-in-rabbit_khepri
Khepri: Remove setup retries
2025-03-18 14:29:32 +01:00
David Ansari 5bfccbaa28 Improve log message for non-AMQP clients on AMQP port
This is a follow up to #13559 addressing the feedback in
https://github.com/rabbitmq/rabbitmq-server/pull/13559#discussion_r2000439237

The improved logs look as follows:
```
openssl s_client -connect localhost:5672 -tls1_3

[info] <0.946.0> accepting AMQP connection [::1]:49321 -> [::1]:5672
[error] <0.946.0> closing AMQP connection [::1]:49321 -> [::1]:5672 (duration: '0ms'):
[error] <0.946.0> TLS client detected on non-TLS AMQP port. Ensure the client is connecting to the correct port.
```

```
curl http://localhost:5672

[info] <0.954.0> accepting AMQP connection [::1]:49402 -> [::1]:5672
[error] <0.954.0> closing AMQP connection [::1]:49402 -> [::1]:5672 (duration: '0ms'):
[error] <0.954.0> HTTP GET request detected on AMQP port. Ensure the client is connecting to the correct port
```

```
telnet localhost 5672
Trying ::1...
Connected to localhost.
Escape character is '^]'.
hello

[info] <0.946.0> accepting AMQP connection [::1]:49664 -> [::1]:5672
[error] <0.946.0> closing AMQP connection [::1]:49664 -> [::1]:5672 (duration: '2s'):
[error] <0.946.0> client did not start with AMQP protocol header: <<"hello\r\n\r">>
```
2025-03-18 13:36:12 +01:00
Ayanda Dube 762c2ee65a extend rabbit_amqqueue_SUITE with internal_no_owner_queue_delete_with/1 and add amqqueue:make_internal/{1,2} type specs 2025-03-18 11:49:58 +00:00
Loïc Hoguin d52b333649
Merge pull request #12922 from rabbitmq/loic-native-elixir
Native Elixir support in Erlang.mk
2025-03-18 11:03:04 +01:00
Loïc Hoguin c5d150a7ef
Use Erlang.mk's native Elixir support for CLI
This avoids using Mix while compiling which simplifies
a number of things and let us do further build improvements
later on.

Elixir is only enabled from within rabbitmq_cli currently.

Eunit is disabled since there are only Elixir tests.

Dialyzer will force-enable Elixir in order to process
Elixir-compiled beam files.

This commit also includes a few changes that are
related:

 * The Erlang distribution will now be started for parallel-ct

 * Many unnecessary PROJECT_MOD lines have been removed

 * `eunit_formatters` has been removed, it provides little value

 * The new `maybe_flock` Erlang.mk function is used where possible

 * Build test deps when testing rabbitmq_cli (Mix won't do it anymore)

 * rabbitmq_ct_helpers now use the early plugins to have Dialyzer
   properly set up
2025-03-18 10:02:49 +01:00
David Ansari 11e56bdd2d Detect misconfigured HTTP clients
It also happens from time to time that HTTP clients use the wrong port
5672. Like for TLS clients connecting to 5672, RabbitMQ now prints a
more descriptive log message.

For example
```
curl http://localhost:5672
```
will log
```
[info] <0.946.0> accepting AMQP connection [::1]:57736 -> [::1]:5672
[error] <0.946.0> closing AMQP connection <0.946.0> ([::1]:57736 -> [::1]:5672, duration: '1ms'):
[error] <0.946.0> {detected_unexpected_http_header,<<"GET / HT">>}
```

We only check here for GET and not for all other HTTP methods, since
that's the most common case.
2025-03-17 23:43:07 +01:00
David Ansari 7ed3a0b0d8 Log clearer message if TLS client connects to AMQP port
## What?

If a TLS client app is misconfigured trying to connect to AMQP port 5672
instead to the AMQPS port 5671, this commit makes RabbitMQ log a more
descriptive error message.

```
openssl s_client -connect localhost:5672 -tls1_3
openssl s_client -connect localhost:5672 -tls1_2
```

RabbitMQ logs prior to this commit:
```
[info] <0.1073.0> accepting AMQP connection [::1]:53535 -> [::1]:5672
[error] <0.1073.0> closing AMQP connection <0.1073.0> ([::1]:53535 -> [::1]:5672, duration: '0ms'):
[error] <0.1073.0> {bad_header,<<22,3,1,0,192,1,0,0>>}

[info] <0.1080.0> accepting AMQP connection [::1]:53577 -> [::1]:5672
[error] <0.1080.0> closing AMQP connection <0.1080.0> ([::1]:53577 -> [::1]:5672, duration: '1ms'):
[error] <0.1080.0> {bad_header,<<22,3,1,0,224,1,0,0>>}
```

RabbitMQ logs after this commit:
```
[info] <0.969.0> accepting AMQP connection [::1]:53632 -> [::1]:5672
[error] <0.969.0> closing AMQP connection <0.969.0> ([::1]:53632 -> [::1]:5672, duration: '0ms'):
[error] <0.969.0> {detected_unexpected_tls_header,<<22,3,1,0,192,1,0,0>>

[info] <0.975.0> accepting AMQP connection [::1]:53638 -> [::1]:5672
[error] <0.975.0> closing AMQP connection <0.975.0> ([::1]:53638 -> [::1]:5672, duration: '1ms'):
[error] <0.975.0> {detected_unexpected_tls_header,<<22,3,1,0,224,1,0,0>>}
```

 ## Why?

I've seen numerous occurrences in the past few years where misconfigured TLS apps
connected to the wrong port. Therefore, RabbitMQ trying to detect a TLS client
and providing a more descriptive log message seems appropriate to me.

 ## How?

The first few bytes of any TLS connection are:

Record Type (1 byte):
Always 0x16 (22 in decimal) for a Handshake message.

Version (2 bytes):
This represents the highest version of TLS that the client supports. Common values:
0x0301 → TLS 1.0 (or SSL 3.1)
0x0302 → TLS 1.1
0x0303 → TLS 1.2
0x0304 → TLS 1.3

Record Length (2 bytes):
Specifies the length of the following handshake message.

Handshake Type (1 byte, usually the 6th byte overall):
Always 0x01 for ClientHello.
2025-03-17 22:48:42 +01:00