Commit Graph

2232 Commits

Author SHA1 Message Date
Arnaud Cogoluègnes cad8b70ee8
Fix partition index conflict in stream SAC coordinator
Consumers with the same name, consuming from the same stream, should have
the same partition index. This commit adds a check to enforce this rule
and makes the subscription fail if it does not comply.

Fixes #13835
2025-05-06 16:11:46 +02:00
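The rule in the entry above can be illustrated with a minimal sketch. It is not the coordinator's Erlang code: `register_consumer`, `PartitionIndexConflict`, and the `groups` dictionary are hypothetical names, and the `(vhost, stream, name)` key is borrowed from the SAC group description elsewhere in this log.

```python
class PartitionIndexConflict(Exception):
    """Raised when a consumer's partition index differs from its group's."""


def register_consumer(groups, vhost, stream, name, partition_index):
    """Register a consumer, rejecting a conflicting partition index.

    `groups` (hypothetical) maps a (vhost, stream, name) key to the
    partition index shared by every consumer already in that group.
    """
    key = (vhost, stream, name)
    # The first consumer of a group fixes the partition index for the group.
    existing = groups.setdefault(key, partition_index)
    if existing != partition_index:
        # Fail the subscription instead of silently accepting the mismatch.
        raise PartitionIndexConflict(
            f"group {key} uses partition index {existing}, got {partition_index}"
        )
    return key
```

A second consumer that joins the same group with the same index is accepted. One that joins with a different index raises the error, which corresponds to a failed subscription.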
Michael Klishin 53f511fa15
Merge pull request #13837 from rabbitmq/dqt-export-fix
Modify default queue type injection logic
2025-05-05 21:20:24 +04:00
Michal Kuratczyk 435274bc83
Fix formatter crash in rabbit_reader 2025-05-02 14:56:07 +02:00
Michal Kuratczyk 9bd11b449f
Set the DQT in rabbit_vhost:do_add 2025-05-02 13:58:00 +02:00
Michal Kuratczyk 9d0f01b45b
Add DQT to vhost metadata on recovery
Vhosts that currently don't have their own default queue type now
inherit it from the node configuration and store it in their metadata
going forward.
2025-05-01 17:28:32 +02:00
Michal Kuratczyk 3c95bf32e7
vhost inherits DQT from node
Rather than injecting node-level DQT when exporting definitions,
inject it into vhost's metadata when a vhost is created.
2025-05-01 17:28:32 +02:00
Michal Kuratczyk 73da2a3fbb
Fix DQT in definition export (redundant property)
The correct place for the `default_queue_type` property
is inside the `metadata` block. However, right now we'd
always export the value outside of `metadata` AND only
export it inside `metadata`, if it was not `undefined`.

This value outside of `metadata` was just misleading:
if a user exported the definitions from a fresh node,
changed `classic` to `quorum` and imported such modified
values, the DQT would still be `classic`, because RMQ looks
for the value inside `metadata`. Just to make it more confusing,
if the DQT was changed successfully one way or another, the
value outside of `metadata` would reflect that
(it always shows the correct value, but is ignored on import).
2025-05-01 17:28:32 +02:00
Michael Klishin 164d495dfc
Merge pull request #13818 from rabbitmq/rabbitmq-server-13767
By @aaron-seo: Adds a new auth backend that only accepts loopback connections
2025-04-27 12:58:02 +04:00
Aaron Seo 3bcdc0f359
Fallback to original implementation of plain auth_mechanism if socket is not provided 2025-04-24 13:41:57 -07:00
Jean-Sébastien Pédron 5300076e33
Khepri: Clean up the proxy functions of the integration code
[Why]
The `rabbit_khepri` module grew during the work to add Khepri support to
RabbitMQ and while Khepri was itself written. The current code is
therefore unorganized.

[How]
This commit tries to change proxy functions to be close to their Khepri
equivalent.

The module continues to set non-default options for write functions. We
also add the variants that take an option map to be consistent and not
have to deal with that in the future.

Several legacy functions were removed, either because they were no
longer called or because they were replaced by a regular Khepri call.
2025-04-24 16:06:20 +02:00
Jean-Sébastien Pédron bd3aee35b4
Khepri: Clean up the setup/clustering code of the integration code
[Why]
The `rabbit_khepri` module grew during the work to add Khepri support to
RabbitMQ and while Khepri was itself written. The current code is
therefore unorganized.

[How]
This commit tries to sort the code that manages the setup of Khepri and
the functions that deal with the Khepri cluster. It also groups functions
which provide support for CLI commands.

It also adds documentation to several functions.

Finally, when a node joins a cluster, we stop displaying the content of
the Khepri tree.
2025-04-24 11:57:51 +02:00
David Ansari 77e73deede Intercept outgoing just before conversion
Intercept outgoing message just before conversion to target protocol as
this will give most flexibility to 3rd party plugins.
2025-04-23 14:01:42 +02:00
David Ansari a24ba55d45 Store message interceptor context in MQTT proc state
It's a tradeoff between building the map for each incoming and outgoing
message (now that there are also outgoing interceptors) vs increased
memory usage for the MQTT proc state.

Connecting with MQTT 5.0 and client ID "xxxxxxxx", the number of words
is 201 before this commit vs 235 after this commit as determined by:
```
S = sys:get_state(MQTTConnectionPid),
erts_debug:size(S).
```
Therefore, this commit requires 34 words * 8 bytes = 272 bytes more per MQTT
connection, that is 272 MB more for 1,000,000 MQTT connections.
2025-04-23 14:01:42 +02:00
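The memory estimate in the entry above can be re-derived step by step. This is a small worked check, assuming an 8-byte word (a 64-bit Erlang VM) and decimal megabytes. The variable names are illustrative only.

```python
WORD_SIZE_BYTES = 8  # assumption: 64-bit Erlang VM, one word = 8 bytes

words_before = 201  # erts_debug:size/1 result before the commit
words_after = 235   # erts_debug:size/1 result after the commit

extra_words = words_after - words_before
extra_bytes_per_connection = extra_words * WORD_SIZE_BYTES

connections = 1_000_000
# Decimal megabytes: 1 MB = 1,000,000 bytes.
extra_total_mb = extra_bytes_per_connection * connections / 1_000_000

print(extra_words, extra_bytes_per_connection, extra_total_mb)
```

The three printed values are the word delta, the per-connection overhead in bytes, and the total overhead in MB for one million connections.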
David Ansari 21bd300d61 Support outgoing message interceptors 2025-04-23 14:01:42 +02:00
David Ansari 6ade94f50b Improve message interceptors
1. Force the config for timestamp and routing node message interceptors
   to be configured with the overwrite boolean() to avoid defining
   multiple default values throughout the code.
2. Add type specs
3. Extend existing test case for new MQTT client ID interceptor
4. routing node and timestamp should only set the annotation for
   incoming_message_interceptors group
5. Fix `rabbitmq.conf`.
   Prior to this commit there were several issues:
   a.) Setting the right configuration was too user unfriendly, e.g. the user had to set
   ```
   message_interceptor.incoming.rabbit_mqtt_message_interceptor_client_id.annotation_key = x-opt-mqtt-client-id
   ```
   just to enable the MQTT message interceptor.
   b.) The code that parses was too difficult to understand
   c.) MQTT plugin was setting the env for app rabbit, which is an anti-pattern
   d.) disabling a plugin (e.g. MQTT), left its message interceptors still in place
   This is now all fixed, the user sets the rabbitmq.conf as follows:
   ```
   message_interceptors.incoming.set_header_timestamp.overwrite = true
   message_interceptors.incoming.set_header_routing_node.overwrite = false
   mqtt.message_interceptors.incoming.set_client_id_annotation.enabled = true
   ```
   Note that the first two lines use the same format as for RabbitMQ 4.0
   for backwards compatibility. The last line (MQTT) follows a similar
   pattern.
2025-04-23 14:01:42 +02:00
Lois Soto Lopez 9936b8de69 Add incoming message interceptors
This commit enables users to provide custom message interceptor modules,
i.e. modules to process incoming and outgoing messages. The
`rabbit_message_interceptor` behaviour defines an `intercept/4` callback,
for those modules to implement.

Co-authored-by: Péter Gömöri <gomoripeti@users.noreply.github.com>
2025-04-23 14:01:42 +02:00
Jean-Sébastien Pédron a528a415d3
Khepri: Mark `khepri_db` as stable
[Why]
The intent is to have it stable and enabled by default for new
deployments in RabbitMQ 4.1.x.

To prepare for this goal, it is time to mark the feature flag as stable
to let us iron out the library and its integration into RabbitMQ.

This is not a commitment at this stage: we will revisit this near the
beginning of the release cycle and commit to it or revert to
experimental.
2025-04-23 11:34:32 +02:00
Loïc Hoguin 7138e8a0cc
CQ: Fix rare eof crash of message store with fanout 2025-04-18 13:50:57 +02:00
Michael Klishin 596e3ef41a
Cosmetics 2025-04-15 00:57:39 -04:00
Michal Kuratczyk 589e0b578c
Remove log level tests (#13723)
When debug logging is enabled, we log something at each log level
to test if logs are emitted. I don't think this is particularly useful,
but it's certainly annoying, because I constantly need to filter
out these logs when searching if any errors happened during tests.
2025-04-11 12:13:06 +02:00
David Ansari 6eb1f87e14
Fix concurrent AMQP queue declarations (#13727)
* Fix concurrent AMQP queue declarations

Prior to this commit, when AMQP clients declared the same queues
concurrently, the following crash occurred:
```
  │ *Error{Condition: amqp:internal-error, Description: {badmatch,{<<"200">>,
  │            {map,[{{utf8,<<"leader">>},{utf8,<<"rabbit-2@carrot">>}},
  │                  {{utf8,<<"message_count">>},{ulong,0}},
  │                  {{utf8,<<"consumer_count">>},{uint,0}},
  │                  {{utf8,<<"name">>},{utf8,<<"cq-145">>}},
  │                  {{utf8,<<"vhost">>},{utf8,<<"/">>}},
  │                  {{utf8,<<"durable">>},{boolean,true}},
  │                  {{utf8,<<"auto_delete">>},{boolean,false}},
  │                  {{utf8,<<"exclusive">>},{boolean,false}},
  │                  {{utf8,<<"type">>},{utf8,<<"classic">>}},
  │                  {{utf8,<<"arguments">>},
  │                   {map,[{{utf8,<<"x-queue-type">>},{utf8,<<"classic">>}}]}},
  │                  {{utf8,<<"replicas">>},
  │                   {array,utf8,[{utf8,<<"rabbit-2@carrot">>}]}}]},
  │            {[{{resource,<<"/">>,queue,<<"cq-145">>},configure},
  │              {{resource,<<"/">>,queue,<<"cq-144">>},configure},
  │              {{resource,<<"/">>,queue,<<"cq-143">>},configure},
  │              {{resource,<<"/">>,queue,<<"cq-142">>},configure},
  │              {{resource,<<"/">>,queue,<<"cq-141">>},configure},
  │              {{resource,<<"/">>,queue,<<"cq-140">>},configure},
  │              {{resource,<<"/">>,queue,<<"cq-139">>},configure},
  │              {{resource,<<"/">>,queue,<<"cq-138">>},configure},
  │              {{resource,<<"/">>,queue,<<"cq-137">>},configure},
  │              {{resource,<<"/">>,queue,<<"cq-136">>},configure},
  │              {{resource,<<"/">>,queue,<<"cq-135">>},configure},
  │              {{resource,<<"/">>,queue,<<"cq-134">>},configure}],
  │             []}}}
  │ [{rabbit_amqp_management,handle_http_req,8,
  │                          [{file,"rabbit_amqp_management.erl"},{line,130}]},
  │  {rabbit_amqp_management,handle_request,5,
  │                          [{file,"rabbit_amqp_management.erl"},{line,43}]},
  │  {rabbit_amqp_session,incoming_mgmt_link_transfer,3,
  │                       [{file,"rabbit_amqp_session.erl"},{line,2317}]},
  │  {rabbit_amqp_session,handle_frame,2,
  │                       [{file,"rabbit_amqp_session.erl"},{line,963}]},
  │  {rabbit_amqp_session,handle_cast,2,
  │                       [{file,"rabbit_amqp_session.erl"},{line,539}]},
  │  {gen_server,try_handle_cast,3,[{file,"gen_server.erl"},{line,2371}]},
  │  {gen_server,handle_msg,6,[{file,"gen_server.erl"},{line,2433}]},
  │  {proc_lib,init_p_do_apply,3,[{file,"proc_lib.erl"},{line,329}]}], Info: map[]}
```

To repro, run the following command in parallel in two separate terminals:
```
./omq amqp -x 10000 -t /queues/cq-%d -y 0 -C 0 --queues classic  classic
```

* Simplify
2025-04-11 12:04:00 +02:00
Aaron Seo 6d24aef9b0
Adds rabbit_auth_backend_internal_loopback
This auth backend behaves the same as the internal backend provided in
the core broker, but it only accepts loopback connections. External
connection attempts will receive an error.
2025-04-09 12:53:29 -07:00
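The loopback-only rule described in the entry above can be sketched as follows. This is not the `rabbit_auth_backend_internal_loopback` implementation, which is written in Erlang. `is_loopback` and `check_connection` are hypothetical helpers built on Python's standard `ipaddress` module.

```python
import ipaddress


def is_loopback(peer_address: str) -> bool:
    """Return True if the peer address is a loopback address (IPv4 or IPv6)."""
    return ipaddress.ip_address(peer_address).is_loopback


def check_connection(peer_address: str):
    """Accept loopback peers; refuse every other peer with an error reason."""
    if is_loopback(peer_address):
        return ("ok", None)
    return ("refused", "non-loopback connections are not accepted")
```

For example, `check_connection("127.0.0.1")` and `check_connection("::1")` are accepted, while an address such as `192.168.1.10` is refused before any credential check would take place.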
Karl Nilsson 274f12f063 Start the coordination Ra system before quorum_queues
This ensures that quorum_queues shuts down _before_
coordination, inside which Khepri runs.
Quorum queues depend on Khepri, so they need to be shut down first.
2025-04-09 12:53:34 +01:00
Jean-Sébastien Pédron dc5a703c23
Merge pull request #12753 from rabbitmq/md/khepri-0-17
Bump Khepri to 0.17.0
2025-04-09 10:26:53 +02:00
Jean-Sébastien Pédron c8fafa3772
rabbit_db: Note that rabbit_db_msup:create_or_update() is not atomic
... with Khepri.
2025-04-08 18:47:27 +02:00
Jean-Sébastien Pédron 440eb5b355
Khepri: Export `fence/1` 2025-04-08 18:47:27 +02:00
Jean-Sébastien Pédron 4811fd44fd
Khepri: Don't sync cluster if the node is already clustered in `khepri_db` enable function
[Why]
The feature flag enable function is called during the initial migration
or when a node is later added to a cluster.

In this latter situation, the cluster is already formed and the Mnesia
tables were already migrated. Syncing the cluster in this specific
situation might kick another node that is currently unreachable.

[How]
If the node running the enable function is already clustered, we skip
the cluster sync.
2025-04-08 18:47:27 +02:00
Michael Davis f5805b83d2
Khepri: Handle breaking change in khepri adv API return type
[Why]
All callers of `khepri_adv` and `khepri_tx_adv` need updates to handle
the now uniform return type of `khepri:node_props_map()` in Khepri
0.17.0.

[How]
We don't need any compatibility code to handle "either the old return
type or the new return type" from the khepri_adv API because the
translation is done entirely in the "client side" code in Khepri -
meaning that the return value from the Ra server is the same but it is
translated differently by the functions in `khepri_adv`.

However, we need to adapt transaction functions because they may be
executed on different versions of Khepri and the behaviour of
`khepri_tx_adv` can be different. To account for the possible change of return
value format, we use the new `khepri_tx:does_api_comply_with/1` to know
what to expect.
2025-04-08 18:47:27 +02:00
Michael Davis 9b5ab14faf
Khepri: Adapt to new khepri_cluster:members/2 API
[Why]
In Khepri 0.17.0, `khepri_cluster:locally_known_members/1` and
`khepri_cluster:locally_known_node/1` were replaced with
`khepri_cluster:members/2` and `khepri_cluster:nodes/2` with `favor` set
to `low_latency` - this matches the interface for queries in Khepri.
2025-04-08 18:47:26 +02:00
Karl Nilsson 27ef97ecd7 QQ: handle_tick improvements
Move leader repair earlier in tick function to ensure more
timely update of meta data store record after leader change.

Also use RPC_TIMEOUT macro for metric/stats multicalls to improve
liveness when a node is connected but partitioned / frozen.
2025-04-08 15:39:20 +01:00
David Ansari 35b5ab3cdc Determine queue topology without checking queue type
## What?
This commit determines the queue topology without checking the queue type.

## Why?
This way, checking leader and replicas works the same across all queue
types without the need to introduce other rabbit_queue_type behaviour as
suggested in other PRs.

## How?
pid is the leader, nodes in queue_type_states are the members/replicas.

This commit results in an unknown stream leader during queue
declaration. However the correct leader will be returned eventually when
calling GET on the stream.
2025-04-07 16:37:03 +02:00
Michael Klishin e83c286367
Merge pull request #13643 from rabbitmq/su_aws/try_to_leave_cluster_before_joining
Allow a previously reset node to rejoin its original cluster
2025-04-01 13:20:26 -04:00
Michael Klishin e6bc6a451f
Naming #13643 2025-04-01 12:13:43 -04:00
Simon Unge 36eb6cafc1 Update spec, noconnection is also a possible error 2025-03-31 21:54:02 +00:00
Simon Unge cdeabe22bc Don't handle the exception, just let it out there 2025-03-31 21:16:06 +00:00
Simon Unge e1f2865eae Return the exception 2025-03-31 17:55:49 +00:00
Simon Unge 9ba545cbef Fix dialyzer issue. 2025-03-31 17:52:01 +00:00
Arnaud Cogoluègnes 602b6acd7d
Re-evaluate stream SAC group after connection down event
The same connection can contain several consumers belonging to a SAC
group (group key = vhost + stream + consumer name). The whole new group
must be re-evaluated to select a new active consumer after the consumers
of the down connection are removed from it.

The previous behavior would not re-evaluate the new group and could
select a consumer from the down connection, leaving the group with only
inactive consumers, as the selected active consumer would never receive
the activation message from the stream SAC coordinator.

This commit fixes this problem by removing the consumers of the down
connection from the affected groups and then performing the
appropriate operations for the groups to keep on consuming (e.g.
notifying an active consumer that it needs to step down).

References #13372
2025-03-31 14:59:59 +02:00
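The order of operations described in the entry above (remove all consumers of the down connection first, then re-evaluate each affected group) can be sketched as below. This is a simplified illustration, not the coordinator's Erlang code. `Consumer`, `Group`, and `handle_connection_down` are hypothetical names, and picking the first remaining consumer is an assumed selection policy.

```python
from dataclasses import dataclass, field


@dataclass
class Consumer:
    connection: str
    subscription_id: int
    active: bool = False


@dataclass
class Group:
    consumers: list = field(default_factory=list)


def handle_connection_down(groups, connection):
    """Drop the down connection's consumers, then re-evaluate affected groups.

    `groups` maps a (vhost, stream, name) key to a Group. Returns the
    (key, consumer) pairs that must receive an activation message.
    """
    to_activate = []
    for key, group in groups.items():
        remaining = [c for c in group.consumers if c.connection != connection]
        if len(remaining) == len(group.consumers):
            continue  # group had no consumer on the down connection
        # Remove first, so a consumer of the down connection is never selected.
        group.consumers = remaining
        if remaining and not any(c.active for c in remaining):
            remaining[0].active = True  # assumed policy: first remaining consumer
            to_activate.append((key, remaining[0]))
    return to_activate
```

Because removal happens before selection, the newly active consumer always belongs to a live connection, which avoids the situation where a group is left with only inactive consumers.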
Simon Unge dd49cbe6c3 Mnesia: Ask to leave a cluster and retry to join if cluster already considers node a member.
Khepri: no-op. Khepri is less strict already, and rabbit_khepri:can_join would accept a join request from a node that is already a member
2025-03-28 21:24:08 +00:00
Michal Kuratczyk 9699393da7
[skip ci] fix debug log formatting 2025-03-28 17:47:13 +01:00
Michael Klishin 860bb7c47b
Merge pull request #13638 from rabbitmq/ra-2.16.5 2025-03-27 14:33:19 -04:00
Karl Nilsson 4fe96dfd27 Ra 2.16.5 - bug fixes and minor improvements
Ra improvements:

* Don't allow a non-voter to start elections
* Register with ra directory before initialising ra server.
* Trigger tick_timeout immediately after entering leader state.
* Set a configurable segment max size

This commit also includes a change to turn the quorum queue
become-leader callback into a no-op and instead rely on
the more prompt tick_handler to handle the meta data store
update after a leader election.

This more prompt tick update means there should be a much shorter
gap between the queue metrics being deleted from the old leader
node to them being available again on the new node resulting
in smoother message count metrics.

Fix test that relied on waiting on too simplistic a property
before asserting.
2025-03-27 17:06:31 +00:00
David Ansari c151806f7c Apply PR formatting feedback
https://github.com/rabbitmq/rabbitmq-server/pull/13625#discussion_r2016008850
https://github.com/rabbitmq/rabbitmq-server/pull/13625#discussion_r2016010107
2025-03-27 11:30:23 +01:00
David Ansari ef1a595a13 Fix crash when consuming from unavailable quorum queue
Prior to this commit, when a client consumed from an unavailable quorum
queue, the following crash occurred:
```
{badmatch,{error,noproc}}
[{rabbit_quorum_queue,consume,3,[{file,\"rabbit_quorum_queue.erl\"},{line,993}]}
```

This commit fixes this bug by returning any error when registering a
quorum queue consumer to rabbit_queue_type.

This commit also refactors errors returned by
rabbit_queue_type:consume/3 to simplify and ensure separation of
concerns.

For example prior to this commit, the channel did error
formatting specifically for consuming from streams. It's better if
the channel is unaware of what queue type it consumes from and have each
queue type implementation format their own errors.
2025-03-27 11:30:23 +01:00
Karl Nilsson 26fa541e2c
Merge pull request #13587 from rabbitmq/qq-checkpointing-tweaks-2
QQ: Revise checkpointing logic to take more frequent checkpoints for large message workloads
2025-03-26 10:43:50 +00:00
Karl Nilsson 6695282640 QQ: Revise checkpointing logic
To take more frequent checkpoints for large message workloads

Lower the min_checkpoint_interval substantially to allow quorum queues
better control over when checkpoints are taken.

Track bytes enqueued in the aux state and suggest a checkpoint after
every 64MB enqueued (this value is scaled according to backlog just
like the indexes condition).
This should help with more timely checkpointing when very large
messages are used.

Try evaluating byte size independently of time window

also increase max size
2025-03-26 08:23:52 +00:00
Michael Klishin 3a30917809
Merge pull request #13603 from rabbitmq/remove-redundant-queue-type-function
Remove redundant rabbit_queue_type APIs
2025-03-25 17:43:43 -04:00
Iliia Khaprov 8ae0163643 Switch is_<queue_type> to using queue.type field
Also, since queue.type field rendered by QueueMod:format and all queues had it hard-coded here,
I unhardcode them here to use Type name.
2025-03-24 19:15:20 +01:00
Karl Nilsson 0410b7e4a6 Remove rabbit_queue_type:to_binary/1
As it is covered by rabbit_queue_type:short_alias_of/1
2025-03-24 16:28:35 +00:00
Karl Nilsson 73c6f9686f Remove rabbit_queue_type:feature_flag_name/1
As this functionality is covered by the rabbit_queue_type:is_enabled/1
API.
2025-03-24 14:49:54 +00:00