Since https://github.com/rabbitmq/rabbitmq-server/pull/13242 updated
Cowlib to v2.14.0, this commit deletes rabbit_uri as written in the
comments of rabbit_uri.erl:
```
This file is a partial copy of
https://github.com/ninenines/cowlib/blob/optimise-urldecode/src/cow_uri.erl
We use this copy because:
1. uri_string:unquote/1 is lax: It doesn't validate that characters that are
required to be percent encoded are indeed percent encoded. In RabbitMQ,
we want to enforce that proper percent encoding is done by AMQP clients.
2. uri_string:unquote/1 and cow_uri:urldecode/1 in cowlib v2.13.0 are both
slow because they allocate a new binary for the common case where no
character was percent encoded.
When a new cowlib version is released, we should make app rabbit depend on
app cowlib calling cow_uri:urldecode/1 and delete this file (rabbit_uri.erl).
```
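The two points in that comment (strict validation, and no new binary when nothing is percent encoded) can be illustrated with a minimal Python sketch. It is not the cowlib implementation: the function name `strict_urldecode` is hypothetical, and the set of characters allowed to appear unencoded is an assumption made for the example.

```python
import string

# Assumption for this sketch: only RFC 3986 "unreserved" characters may appear
# unencoded. The rules applied by cow_uri:urldecode/1 may differ.
_UNRESERVED = frozenset((string.ascii_letters + string.digits + "-._~").encode())
_HEX = frozenset(b"0123456789abcdefABCDEF")


def strict_urldecode(data: bytes) -> bytes:
    """Decode percent-encoding, rejecting bytes that should have been encoded."""
    if b"%" not in data:
        # Fast path: nothing to decode, so validate and return the input
        # object itself instead of allocating a copy.
        for byte in data:
            if byte not in _UNRESERVED:
                raise ValueError(f"byte {byte:#04x} must be percent encoded")
        return data

    out = bytearray()
    i = 0
    while i < len(data):
        byte = data[i]
        if byte == 0x25:  # "%"
            pair = data[i + 1:i + 3]
            if len(pair) != 2 or pair[0] not in _HEX or pair[1] not in _HEX:
                raise ValueError(f"malformed percent escape at offset {i}")
            out.append(int(pair, 16))
            i += 3
        elif byte in _UNRESERVED:
            out.append(byte)
            i += 1
        else:
            raise ValueError(f"byte {byte:#04x} must be percent encoded")
    return bytes(out)
```

In the common case where no character is percent encoded, the caller gets back the same object it passed in.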
Cowboy 2.13 contains the Websocket optimisations as well
as the ability to set the Websocket max_frame_size option
dynamically, plus plenty of other improvements.
Cowlib was added as a test dep to rabbitmq_mqtt to make
sure emqtt doesn't pull the wrong Cowlib version for Cowboy.
Only large messages delivered to multiple CQs are stored once for
multiple queues.
Non-durable queues are deprecated and will be removed, so don't even
mention them.
We don't "page out" messages anymore.
When trying to use OTP28.0-rc1, Elixir fails to compile these modules
because a module attribute cannot be a regex. It is not yet clear
whether it's something to be fixed in Elixir for OTP28 compatibility
or something that accidentally worked in the past, but either way,
using a string as an attribute is equally good and works on all OTP
versions, including OTP28.0-rc1.
```
== Compilation error in file lib/rabbitmq/cli/core/command_modules.ex ==
** (ArgumentError) cannot inject attribute @commands_ns into function/macro because cannot escape #Reference<0.2201422310.1333657602.13657>. The supported values are: lists, tuples, maps, atoms, numbers, bitstrings, PIDs and remote functions in the format &Mod.fun/arity
(elixir 1.18.2) lib/kernel.ex:3729: Kernel.do_at/5
(elixir 1.18.2) expanding macro: Kernel.@/1
lib/rabbitmq/cli/core/command_modules.ex:133: RabbitMQ.CLI.Core.CommandModules.make_module_map/2
```
Since 4.0.0 (commit d45fbc3d) the shared message store writes large
messages into their own rdq files. This information can be utilised
when scanning rdq files during recovery to avoid reading in the whole
message body into memory unnecessarily.
This commit addresses the same issue that was addressed in 3.13.x by
commit baeefbec (i.e. appending a large binary together from 4MB chunks
leaves a lot of garbage and memory fragmentation behind) but even more
efficiently.
Large messages which were written before 4.0.0, which don't fully fill
the rdq file, are still handled as before.
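The general idea of not pulling message bodies into memory while scanning can be sketched in Python as below. The entry layout used here (64-bit big-endian size, 16-byte message id, body, one trailing `0xFF` byte) and the name `scan_entries` are assumptions for illustration, not details taken from this commit.

```python
import io
import struct

MSG_ID_BYTES = 16
SIZE_PREFIX = struct.Struct(">Q")  # assumed 64-bit big-endian size prefix


def scan_entries(f):
    """Return [(msg_id, size)] for each valid entry, seeking past the bodies."""
    entries = []
    while True:
        header = f.read(SIZE_PREFIX.size)
        if len(header) < SIZE_PREFIX.size:
            break
        (size,) = SIZE_PREFIX.unpack(header)
        msg_id = f.read(MSG_ID_BYTES)
        if size < MSG_ID_BYTES or len(msg_id) < MSG_ID_BYTES:
            break
        # Skip the body instead of reading it into memory.
        f.seek(size - MSG_ID_BYTES, io.SEEK_CUR)
        if f.read(1) != b"\xff":  # assumed end-of-entry marker
            break
        entries.append((msg_id, size))
    return entries
```

Only the fixed-size header, the message id and the one-byte marker are read, so memory use does not grow with the message size.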
```
make -C deps/rabbit ct-rabbit_stream_queue t=cluster_size_3_parallel_1 RABBITMQ_METADATA_STORE=mnesia
```
flaked prior to this commit locally on Ubuntu with the following error after 11 runs:
```
rabbit_stream_queue_SUITE > cluster_size_3_parallel_1 > consume_from_replica
{error,
{{shutdown,
{server_initiated_close,406,
<<"PRECONDITION_FAILED - stream queue 'consume_from_replica' in vhost '/' does not have a running replica on the local node">>}},
{gen_server,call,
[<0.8365.0>,
{subscribe,
{'basic.consume',0,<<"consume_from_replica">>,
<<"ctag">>,false,false,false,false,
[{<<"x-stream-offset">>,long,0}]},
<0.8151.0>},
infinity]}}}
```
Initialising a message container from data stored in a
stream is a special case where we need to recover exchange
and routing key information from the following message annotations:
* x-exchange
* x-routing-keys
* x-cc
We do not want to do this when initialising a message container
from AMQP data just received from a publisher.
This commit introduces a new function `mc_amqp:init_from_stream/2`
that is to be used when needing a message container from a stream
message.
[Why]
During mixed-version testing, the old node might not be able to join or
rejoin a cluster if the other nodes run a newer Khepri machine version.
[How]
The old node is used as the cluster seed node and is never touched
otherwise. Other nodes are restarted or join the cluster later.
## What?
Support the `dynamic` field of sources and targets.
## Why?
1. This allows AMQP clients to dynamically create exclusive queues, which
can be useful for RPC workloads.
2. Support creation of JMS temporary queues over AMQP using the Qpid JMS
client. Exclusive queues map very nicely to JMS temporary queues
because:
> Although sessions are used to create temporary destinations, this is only
for convenience. Their scope is actually the entire connection. Their
lifetime is that of their connection and any of the connection’s sessions
are allowed to create a consumer for them.
https://jakarta.ee/specifications/messaging/3.1/jakarta-messaging-spec-3.1#creating-temporary-destinations
## How?
If the terminus contains the capability `temporary-queue` as defined in
[amqp-bindmap-jms-v1.0-wd10](https://groups.oasis-open.org/higherlogic/ws/public/document?document_id=67638)
[5.2] and as sent by Qpid JMS client,
RabbitMQ will create an exclusive queue.
(This allows a future commit to take other actions if capability
`temporary-topic` is used, such as the additional creation of bindings.)
No matter what the desired node properties are, RabbitMQ will set the
lifetime policy delete-on-close deleting the exclusive queue when the
link which caused its creation ceases to exist. This means the exclusive
queue will be deleted if either:
* the link gets detached, or
* the session ends, or
* the connection closes
Although the AMQP JMS Mapping and Qpid JMS create only a **sending** link
with `dynamic=true`, this commit also supports **receiving** links with
`dynamic=true` for non-JMS AMQP clients.
RabbitMQ is free to choose the generated queue name. As suggested by the
AMQP spec, the generated queue name will contain the container-id and link
name unless they are very long.
Co-authored-by: Arnaud Cogoluègnes <acogoluegnes@gmail.com>
Make AMQP 1.0 connection shut down its sessions before sending the
close frame to the client similar to how the AMQP 0.9.1 connection
shuts down its channels before closing the connection.
This commit avoids concurrent deletion of exclusive queues by the session process
and the classic queue process.
This commit should also fix https://github.com/rabbitmq/rabbitmq-server/issues/2596
* Separate invalid client test from the valid one
* Apply same changes from pr #13197
* Deal with stale references caused by timing issues
looking up objects in the DOM
* Unlink before assertion
... are being used at the same time.
[Why]
Depending on which node clusters with which, a node running an older
version of the Khepri Ra machine may not be able to apply Ra commands
and could be stuck.
There is no real solution and this is clearly an unsupported scenario. An
old node won't always be able to join a newer cluster.
[How]
In the testsuites, we skip clustering tests if we detect that multiple
Khepri Ra machine versions are being used.
The following test flaked in CI under Khepri in mixed version mode:
```
make -C deps/rabbitmq_mqtt ct-v5 t=cluster_size_3:will_delay_node_restart RABBITMQ_METADATA_STORE=khepri SECONDARY_DIST=rabbitmq_server-4.0.5 FULL=1
```
The first node took exactly 30 seconds for draining:
```
2025-02-10 15:00:09.550824+00:00 [debug] <0.1449.0> MQTT accepting TCP connection <0.1449.0> (127.0.0.1:33376 -> 127.0.0.1:27005)
2025-02-10 15:00:09.550992+00:00 [debug] <0.1449.0> Received a CONNECT, client ID: sub0, username: undefined, clean start: true, protocol version: 5, keepalive: 60, property names: ['Session-Expiry-Interval']
2025-02-10 15:00:09.551134+00:00 [debug] <0.1449.0> MQTT connection 127.0.0.1:33376 -> 127.0.0.1:27005 picked vhost using plugin_configuration_or_default_vhost
2025-02-10 15:00:09.551219+00:00 [debug] <0.1449.0> User 'guest' authenticated successfully by backend rabbit_auth_backend_internal
2025-02-10 15:00:09.551530+00:00 [info] <0.1449.0> Accepted MQTT connection 127.0.0.1:33376 -> 127.0.0.1:27005 for client ID sub0
2025-02-10 15:00:09.551651+00:00 [debug] <0.1449.0> Received a SUBSCRIBE with subscription(s) [{mqtt_subscription,<<"my/topic">>,
2025-02-10 15:00:09.551651+00:00 [debug] <0.1449.0> {mqtt_subscription_opts,0,false,
2025-02-10 15:00:09.551651+00:00 [debug] <0.1449.0> false,0,undefined}}]
2025-02-10 15:00:09.556233+00:00 [debug] <0.896.0> RabbitMQ metadata store: follower leader cast - redirecting to {rabbitmq_metadata,'rmq-ct-mqtt-cluster_size_3-2-27054@localhost'}
2025-02-10 15:00:09.561518+00:00 [debug] <0.1456.0> MQTT accepting TCP connection <0.1456.0> (127.0.0.1:33390 -> 127.0.0.1:27005)
2025-02-10 15:00:09.561634+00:00 [debug] <0.1456.0> Received a CONNECT, client ID: will, username: undefined, clean start: true, protocol version: 5, keepalive: 60, property names: ['Session-Expiry-Interval']
2025-02-10 15:00:09.561715+00:00 [debug] <0.1456.0> MQTT connection 127.0.0.1:33390 -> 127.0.0.1:27005 picked vhost using plugin_configuration_or_default_vhost
2025-02-10 15:00:09.561828+00:00 [debug] <0.1456.0> User 'guest' authenticated successfully by backend rabbit_auth_backend_internal
2025-02-10 15:00:09.562596+00:00 [info] <0.1456.0> Accepted MQTT connection 127.0.0.1:33390 -> 127.0.0.1:27005 for client ID will
2025-02-10 15:00:09.565743+00:00 [warning] <0.1460.0> This node is being put into maintenance (drain) mode
2025-02-10 15:00:09.565833+00:00 [debug] <0.1460.0> Marking the node as undergoing maintenance
2025-02-10 15:00:09.570772+00:00 [info] <0.1460.0> Marked this node as undergoing maintenance
2025-02-10 15:00:09.570904+00:00 [info] <0.1460.0> Asked to suspend 9 client connection listeners. No new client connections will be accepted until these listeners are resumed!
2025-02-10 15:00:09.572268+00:00 [warning] <0.1460.0> Suspended all listeners and will no longer accept client connections
2025-02-10 15:00:09.572317+00:00 [warning] <0.1460.0> Closed 0 local client connections
2025-02-10 15:00:09.572418+00:00 [warning] <0.1449.0> MQTT disconnecting client <<"127.0.0.1:33376 -> 127.0.0.1:27005">> with client ID 'sub0', reason: maintenance
2025-02-10 15:00:09.572414+00:00 [warning] <0.1000.0> Closed 2 local (Web) MQTT client connections
2025-02-10 15:00:09.572499+00:00 [warning] <0.1456.0> MQTT disconnecting client <<"127.0.0.1:33390 -> 127.0.0.1:27005">> with client ID 'will', reason: maintenance
2025-02-10 15:00:09.572866+00:00 [alert] <0.1000.0> Closed 0 local STOMP client connections
2025-02-10 15:00:09.577432+00:00 [debug] <0.1456.0> scheduled delayed Will Message to topic my/topic for MQTT client ID will to be sent in 10000 ms
2025-02-10 15:00:12.991328+00:00 [debug] <0.1469.0> Will reconcile virtual host processes on all cluster members...
2025-02-10 15:00:12.991443+00:00 [debug] <0.1469.0> Will make sure that processes of 1 virtual hosts are running on all reachable cluster nodes
2025-02-10 15:00:12.992497+00:00 [debug] <0.1469.0> Done with virtual host processes reconciliation (run 3)
2025-02-10 15:00:16.511733+00:00 [debug] <0.1476.0> Will reconcile virtual host processes on all cluster members...
2025-02-10 15:00:16.511864+00:00 [debug] <0.1476.0> Will make sure that processes of 1 virtual hosts are running on all reachable cluster nodes
2025-02-10 15:00:16.514293+00:00 [debug] <0.1476.0> Done with virtual host processes reconciliation (run 4)
2025-02-10 15:00:24.897477+00:00 [debug] <0.1479.0> Will reconcile virtual host processes on all cluster members...
2025-02-10 15:00:24.897607+00:00 [debug] <0.1479.0> Will make sure that processes of 1 virtual hosts are running on all reachable cluster nodes
2025-02-10 15:00:24.898483+00:00 [debug] <0.1479.0> Done with virtual host processes reconciliation (run 5)
2025-02-10 15:00:24.898527+00:00 [debug] <0.1479.0> Will reschedule virtual host process reconciliation after 30 seconds
2025-02-10 15:00:32.994347+00:00 [debug] <0.1484.0> Will reconcile virtual host processes on all cluster members...
2025-02-10 15:00:32.994474+00:00 [debug] <0.1484.0> Will make sure that processes of 1 virtual hosts are running on all reachable cluster nodes
2025-02-10 15:00:32.996539+00:00 [debug] <0.1484.0> Done with virtual host processes reconciliation (run 6)
2025-02-10 15:00:32.996585+00:00 [debug] <0.1484.0> Will reschedule virtual host process reconciliation after 30 seconds
2025-02-10 15:00:39.576325+00:00 [info] <0.1460.0> Will transfer leadership of 0 quorum queues with current leader on this node
2025-02-10 15:00:39.576456+00:00 [info] <0.1460.0> Leadership transfer for quorum queues hosted on this node has been initiated
2025-02-10 15:00:39.576948+00:00 [info] <0.1460.0> Will stop local follower replicas of 0 quorum queues on this node
2025-02-10 15:00:39.576990+00:00 [info] <0.1460.0> Stopped all local replicas of quorum queues hosted on this node
2025-02-10 15:00:39.577120+00:00 [info] <0.1460.0> Will transfer leadership of metadata store with current leader on this node
2025-02-10 15:00:39.577282+00:00 [info] <0.1460.0> Khepri clustering: transferring leadership to node 'rmq-ct-mqtt-cluster_size_3-2-27054@localhost'
2025-02-10 15:00:39.577424+00:00 [info] <0.1460.0> Khepri clustering: skipping leadership transfer, leader is already in node 'rmq-ct-mqtt-cluster_size_3-2-27054@localhost'
2025-02-10 15:00:39.577547+00:00 [info] <0.1460.0> Leadership transfer for metadata store on this node has been done. The new leader is 'rmq-ct-mqtt-cluster_size_3-2-27054@localhost'
2025-02-10 15:00:39.577674+00:00 [info] <0.1460.0> Node is ready to be shut down for maintenance or upgrade
2025-02-10 15:00:39.595638+00:00 [notice] <0.64.0> SIGTERM received - shutting down
2025-02-10 15:00:39.595638+00:00 [notice] <0.64.0>
2025-02-10 15:00:39.595758+00:00 [debug] <0.44.0> Running rabbit_prelaunch:shutdown_func() as part of `kernel` shutdown
```
Running the same test locally revealed that [rabbit_maintenance:status_consistent_read/1](55ae918094/deps/rabbit/src/rabbit_maintenance.erl (L131))
takes exactly 30 seconds to complete.
The test case assumes a Will Delay higher than the time it takes to
drain and shut down the node. Hence, this commit increases the Will
Delay time from 10 seconds to 40 seconds.
That was not the Keycloak format. It was an
extension to the OAuth spec introduced
a few years ago. To get a token from
Keycloak using this format, a.k.a.
requesting party token, one has to specify
a different grant type called
urn:ietf:params:oauth:grant-type:uma-ticket
[Why]
Some testcases used to use node 1 as the clustering seed node. With
mixed-version testing, it could cause issues because node 1 would start
with a new version of Ra compared to node 2 and node 2 could fail to
join.
[How]
By using node 2 as the seed node, node 1 running a newer version of Ra
should be able to join because it supports talking to an older version.
[Why]
The `force_reset` command simply removes local files on disk for the
local node.
In the case of Ra, this can't work because the rest of the cluster does
not know about the forced-reset node. Therefore the leader will continue
to send `append_entry` commands to the reset node.
If that forced-reset node restarts and receives these messages, it will
either join the cluster again (because it's on an older Raft term) or it
will hit an assertion and exit (because it's on the same Raft term).
[How]
Given we can't really support this scenario and it has little value, the
command will now return an error if someone attempts a `force_reset` with
a node running Khepri.
This also deprecates the command: once Mnesia support is removed, the
command will be removed at the same time. This is noted in the
rabbitmqctl.8 manpage.
* Redesigned k8s peer discovery
Rather than querying the Kubernetes API, just check the local node name
and try to connect to the pod with `-0` suffix (or configured
`ordinal_start` value). Only the pod with the lowest ordinal can form
a new cluster - all other pods will wait forever.
This should prevent any race conditions and incorrectly formed clusters.
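A minimal Python sketch of the name derivation described above, assuming node names of the form `name@pod-N.domain`. The function names and the example hostnames are hypothetical, not taken from the plugin.

```python
import re


def seed_node(local_node: str, ordinal_start: int = 0) -> str:
    """Replace the pod ordinal in the local node name with ordinal_start."""
    name, at, host = local_node.partition("@")
    pod, dot, domain = host.partition(".")
    match = re.fullmatch(r"(.+)-(\d+)", pod)
    if not at or match is None:
        raise ValueError(f"no pod ordinal found in {local_node!r}")
    return f"{name}@{match.group(1)}-{ordinal_start}{dot}{domain}"


def may_form_cluster(local_node: str, ordinal_start: int = 0) -> bool:
    """Only the pod with the lowest ordinal is allowed to form a new cluster."""
    return seed_node(local_node, ordinal_start) == local_node
```

For example, `seed_node("rabbit@rmq-server-2.rmq-nodes.default")` yields `rabbit@rmq-server-0.rmq-nodes.default`, and every pod other than that one would keep trying to join it.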
This commit contains the following changes:
1. Simplify .NET suite
2. Simplify Java package naming
3. Extract JMS tests into separate suite. This way, it's easier to run,
debug, and add new tests compared to the previous suite which mixed
.NET tests with JMS tests.
4. Add tests for different JMS message types
for the backends that support it in the first place.
When forming a cluster, registration of the node
joining the cluster might be left to (container)
orchestration tools like Nomad or Kubernetes.
This PR adds a new configuration option,
'cluster_formation.registration.enable',
which defaults to true.
When set to false node registration will be skipped.
There is at least one important advantage using a
tool such as Nomad (plus Consul) over the application
(RabbitMQ) doing the registration.
When the application is not stopped gracefully for
any reason, e.g. it is OOM-killed,
it cannot deregister the service/node.
This leaves behind an unlinked service entry in the registry.
This problem is fundamentally avoided by allowing
Nomad (or similar tools) to register the
node/service.
See #11233, #11045 for prior discussions.
Co-authored-by: Frederik Bosch <f.bosch@genkgo.nl>
Consumer count is already returned by the /channels API endpoint. Now
the consumer count column can be shown in the channels table but it is
hidden by default.
As described in section 7.1 of filtex-v1.0-wd09:
> Impose a limit on the complexity of each filter expression.
Here, we hard code the maximum properties within a filter expression to 16.
There should never be a use case requiring to filter on more than 16
different properties.
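A minimal sketch of such a hard-coded limit, in Python for brevity. The function name and the error handling are hypothetical; only the limit of 16 comes from this commit.

```python
MAX_FILTER_PROPERTIES = 16  # hard-coded limit described above


def validate_filter(properties: dict) -> dict:
    """Reject filter expressions that reference too many properties."""
    if len(properties) > MAX_FILTER_PROPERTIES:
        raise ValueError(
            f"filter has {len(properties)} properties, "
            f"maximum is {MAX_FILTER_PROPERTIES}"
        )
    return properties
```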
to match that used with Mnesia.
In the case of Mnesia, there are 10 retries
with a 30 second delay each.
For Khepri, a single timeout is used, so it
must be ten times as long.
`rabbitmq_management` is missing one suite definition and `rabbit_mqtt`
is missing two. `assert_suites` causes a build failure because of the
missing suites. This change comments out `assert_suites` for these apps
instead of adding the missing suite definitions because Bazel is no
longer used to test these apps.
## What?
Implement the AMQP over WebSocket Binding Committee Specification 01 in
the AMQP 1.0 Erlang client:
https://docs.oasis-open.org/amqp-bindmap/amqp-wsb/v1.0/cs01/amqp-wsb-v1.0-cs01.html
## Why?
1. This allows writing integration tests for the server implementation
of AMQP over WebSocket.
2. Erlang and Elixir clients can use AMQP over WebSocket in environments
where firewalls prohibit access to the AMQP port.
## How?
Use gun as WebSocket client.
The new module `amqp10_client_socket` handles socket operations (open, close, send) for:
* TCP sockets
* SSL sockets
* WebSockets
Prior to this commit, the amqp10_client_connection process closed only the
write end of the socket after it sent the AMQP close performative.
This commit removes this premature socket closure because:
1. There is no equivalent feature provided in Gun since sending a
WebSocket close frame causes Gun to cleanly close the connection for
both writing and reading.
2. It's unnecessary and can result in unexpected and confusing behaviour on the server.
3. It's better practice to keep the TCP connection fully open until
the AMQP closing handshake completes.
4. When amqp10_client_frame_reader terminates, it will cleanly close
the socket for both writing and reading.
from rabbit_fifo version 0.
The same was also implemented for the stream coordinator.
QQ: avoid deadlock in queue federation.
When processing the queue federation startup event, the process
may call back into the ra process, causing a deadlock. In this
case we spawn a temporary process to avoid this.
This offloads the work of reading messages from on-disk segments
to the interacting process rather than doing this blocking, performance
affecting work in the ra server process.
QQ: ensure opened segments are closed after some time of inactivity
Processes that have received messages that had to be read from disk
may keep a segment open indefinitely. This introduces a timer which
after some time of inactivity will close all opened segments to ensure
file descriptors are not kept open indefinitely.
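The inactivity timer can be sketched as follows, in Python. This is an illustration of the idea only: the class `SegmentCache`, its `tick` method and the injectable `opener`/`clock` parameters are hypothetical and do not mirror the Ra code.

```python
import time


class SegmentCache:
    """Keeps segment files open for reads and closes them after inactivity."""

    def __init__(self, idle_timeout, opener=open, clock=time.monotonic):
        self._idle_timeout = idle_timeout
        self._opener = opener
        self._clock = clock
        self._open = {}  # path -> open file handle
        self._last_used = clock()

    def read(self, path, offset, length):
        handle = self._open.get(path)
        if handle is None:
            handle = self._opener(path, "rb")
            self._open[path] = handle
        self._last_used = self._clock()
        handle.seek(offset)
        return handle.read(length)

    def open_count(self):
        return len(self._open)

    def tick(self):
        """Called by a periodic timer; closes all segments once idle."""
        idle_for = self._clock() - self._last_used
        if self._open and idle_for >= self._idle_timeout:
            for handle in self._open.values():
                handle.close()
            self._open.clear()
            return True
        return False
```

Any read resets the idle period, so segments stay open while a consumer is actively reading and are released only once it goes quiet.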
[Why]
When running mixed-version tests, nodes 1/3/5/... are using the primary
umbrella, so usually the newest version. Nodes 2/4/6/... are using the
secondary umbrella, thus the old version.
When clustering, we used to use node 1 (running a new version) as the
seed node, meaning other nodes would join it.
This complicates things with feature flags because we have to make sure
that we start node 1 with new stable feature flags disabled to allow old
nodes to join.
This is also a problem with Khepri machine versions because the cluster
would start with the latest version, which old nodes might not have.
[How]
This patch changes the logic to use a node running the secondary
umbrella as the seed node instead. If there is no node running it, we
pick the first node as before.
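The selection rule can be sketched in a few lines of Python. The function name and the predicate are hypothetical; the real helper lives in the Erlang test helpers.

```python
def pick_seed_node(nodes, uses_secondary_umbrella):
    """Prefer a node running the secondary umbrella, else the first node."""
    for node in nodes:
        if uses_secondary_umbrella(node):
            return node
    return nodes[0]
```

With nodes numbered from 1 and even-numbered nodes running the secondary umbrella, `pick_seed_node([1, 2, 3], lambda n: n % 2 == 0)` selects node 2. Outside mixed-version testing the predicate is never true and node 1 is picked as before.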
V2: Revert part of "rabbitmq_ct_helpers: Fix how we set
`$RABBITMQ_FEATURE_FLAGS` in tests" (commit
57ed962ef6). These changes are no
longer needed with the new logic.
V3: The check that verifies that the correct metadata store is used has
a special case for nodes that use the secondary umbrella: if Khepri
is supposed to be used but it's not, the feature flag is enabled.
The reason is that the `v4.0.x` branch doesn't know about the `rel`
configuration of `forced_feature_flags_on_init`. The nodes will
have ignored this parameter and booted with the stable feature
flags only.
Many testsuites are adapted to the new clustering order. If they
manage which node joins which node, either the order is changed in
the testcases, or nodes are started with only required feature
flags. For testsuites that rely on peer discovery where the order is
unknown, nodes are started with only required feature flags.
[How]
1. Use feature flags correctly: the code shouldn't test if a feature
flag is enabled, assuming something else enabled it. It should enable
it and react to an error.
2. Use `close_connection_sync/1` instead of the asynchronous
`amqp10_client:close_connection/1` to make sure they are really
closed. The wait in `end_per_testcase/2` was not enough apparently.
3. For the two testcases that flake the most for me, enclose the code in
a try/after and make sure to close the connection at the end,
regardless of the result. This should be done for all testcases
because the testgroup uses a single set of RabbitMQ nodes for all
testcases, therefore testcases are supposed to clean up after themselves.
This commit is no change in functionality and mostly deletes dead code.
1. Code targeting Erlang 22 and below is deleted since the minimum
required Erlang version is higher nowadays.
"In OTP 23 distribution flag DFLAG_BIG_CREATION became mandatory. All
pids are now encoded using NEW_PID_EXT, even external pids received
as PID_EXT from older nodes."
https://www.erlang.org/doc/apps/erts/erl_ext_dist.html#new_pid_ext
2. All v1 encoding and decoding of the Pid is deleted since the lower
version RabbitMQ node supports the v2 encoding nowadays.
Exits with reason "killed" only occur "naturally" in OTP
when a supervisor tries to shut a child down and it times out.
It is used for failure simulation in tests quite frequently however.
When a leader changes, all enqueuer and consumer processes are notified
from the `state_enter(leader,` callback. However a new leader may not
yet have applied all commands that the old leader had. If any of those
commands is a checkout or a register_enqueuer command these processes
will not be notified of the new leader and thus may never resend their
pending commands.
The new leader will however send an applied notification when it does
apply these entries and these are always sent from the leader process
so can also be used to trigger pending resends. This commit implements
that.
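A minimal Python sketch of the client-side behaviour, assuming a client that tracks pending commands by sequence number. The class and method names are hypothetical and the real logic lives in the Erlang queue client.

```python
class QueueClient:
    """Tracks unapplied commands and resends them when the leader changes."""

    def __init__(self, leader, send):
        self.leader = leader
        self._send = send  # callable(leader, seq, command)
        self._next_seq = 0
        self.pending = {}  # seq -> command not yet applied

    def enqueue(self, command):
        seq = self._next_seq
        self._next_seq += 1
        self.pending[seq] = command
        self._send(self.leader, seq, command)
        return seq

    def handle_applied(self, sender, seqs):
        for seq in seqs:
            self.pending.pop(seq, None)
        # Applied notifications are always sent by the leader, so a
        # notification from an unknown sender reveals a leader change even
        # if no explicit leader change notification was received.
        if sender != self.leader:
            self.leader = sender
            for seq in sorted(self.pending):
                self._send(self.leader, seq, self.pending[seq])
```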
## What?
This commit fixes #13040.
Prior to this commit, exchange federation crashed if the MQTT topic exchange
(`amq.topic` by default) got federated and MQTT 5.0 clients subscribed on the
downstream. That's because the federation plugin sends bindings from downstream
to upstream via AMQP 0.9.1. However, binding arguments containing Erlang record
`mqtt_subscription_opts` (henceforth binding args v1) cannot be encoded in AMQP 0.9.1.
## Why?
Federating the MQTT topic exchange could be useful for warm standby use cases.
## How?
This commit makes binding arguments a valid AMQP 0.9.1 table (henceforth
binding args v2).
Binding args v2 can only be used if all nodes support it. Hence binding
args v2 comes with feature flag `rabbitmq_4.1.0`. Note that the AMQP
over WebSocket
[PR](https://github.com/rabbitmq/rabbitmq-server/pull/13071) already
introduces this same feature flag. Although the feature flag subsystem
supports plugins to define their own feature flags, and the MQTT plugin
defined its own feature flags in the past, reusing feature flag
`rabbitmq_4.1.0` is simpler.
This commit also avoids database migrations for both Mnesia and Khepri
if feature flag `rabbitmq_4.1.0` gets enabled. Instead, it's simpler to
migrate binding args v1 to binding args v2 at MQTT connection establishment
time if the feature flag is enabled. (If the feature flag is disabled at
connection establishment time, but gets enabled during the connection
lifetime, the connection keeps using bindings args v1.)
This commit adds two new suites:
1. `federation_SUITE` which tests that federating the MQTT topic
exchange works, and
2. `feature_flag_SUITE` which tests the binding args migration from v1 to v2.
Visualise busy links from publisher to RabbitMQ. If the link credit
reaches 0, we set a yellow background colour in the cell.
Note that these credit values can change many times per second while the
management UI refreshes only every few seconds. However, it may still
give a user an idea of what links are currently busy.
We use yellow since that's consistent with the `flow` state in AMQP
0.9.1, which is also set to yellow.
We do not want to highlight **outgoing** links with credit 0 as
that might be a paused consumer, and therefore not a busy link.
We also use yellow background color if incoming-window is 0 (in case of
a cluster-wide memory or disk alarm) or if remote-incoming-window is 0
as consumers should try to keep their incoming-window open and instead
use link credit if they want to pause consumption.
Additionally we set a grey background colour for the `/management`
address just to highlight them slightly since these are "special" link
pairs.
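The colouring rules above can be summarised with a small Python sketch. It assumes each value is rendered in its own table cell and uses hypothetical function names; the management UI itself is JavaScript and its structure is not shown in this commit message.

```python
YELLOW = "yellow"
GREY = "grey"


def credit_cell_colour(direction, link_credit):
    """Only incoming links (publisher to RabbitMQ) with credit 0 count as busy."""
    if direction == "incoming" and link_credit == 0:
        return YELLOW
    return None  # an outgoing link with credit 0 may just be a paused consumer


def window_cell_colour(window):
    """Applies to both incoming-window and remote-incoming-window."""
    return YELLOW if window == 0 else None


def address_cell_colour(address):
    return GREY if address == "/management" else None
```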
msg_store_io_batch_size is no longer used
msg_store_credit_disc_bound appears to be used in the code, but I don't
see any impact of that value on the performance. It should be properly
investigated and either removed completely or fixed, because there's
hardly any point in warning about the values configured
(plus, this setting is hopefully almost never used anyway)
According to the `rabbit_backing_queue` behaviour it must always
return `ok`, but it used to return a list of results one for each
priority. That caused the below crash further up the call chain.
```
> rabbit_classic_queue:delete_crashed(Q)
** exception error: no case clause matching [ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok]
in function rabbit_classic_queue:delete_crashed/2 (rabbit_classic_queue.erl, line 516)
```
Other backing_queue implementations (`rabbit_variable_queue`) just
exit with a badmatch upon error.
This (very minor) issue is present since 3.13.0 when
`rabbit_classic_queue:delete_crashed_in_backing_queue/1` was
introduced with Khepri in commit 5f0981c5. Before that the result of
`BQ:delete_crashed/1` was simply ignored.
Include monitored session pids in format_status/1 of rabbit_amqp_writer.
They could be useful when debugging.
The maximum number of sessions per connection is limited, hence the
output won't be too large.
[Why]
In order to make `khepri_db` the default in the future, the handling of
`$RABBITMQ_FEATURE_FLAGS` had to be adapted to be able to *disable*
Khepri instead.
Unfortunately I broke the behavior with stable feature flags that are
only available in the primary umbrella. In this case, they were
automatically enabled and thus, clustering with an old umbrella that did
not have these feature flags failed with `incompatible_feature_flags`.
[How]
The solution is to always use an absolute list of feature flags, not the
new relative list.
V2: Allow a testsuite to skip the configuration of the metadata store.
This is needed for the feature_flags_SUITE testsuite because it
tests the default behavior and the configuration of the metadata
store changes that behavior.
While here, fix a ct log message where variables were swapped
compared to the format string expectation.
V3: Enable `rabbitmq_4.0.0` feature flag in rabbit_mgmt_http_SUITE. This
testsuite apparently requires it and if it's not enabled, it fails.
The connection cannot return some information while initializing, so we
just return no information.
The CLI info call was supported only in the open gen_statem callback, so
such a call during the connection init would make it crash. This can
happen when several stream connections get closed and the user calls
list_stream_consumers or list_stream_connections while the connections
are recovering.
This commit adds a clause for CLI info calls in all the gen_statem
callbacks and returns actual information only when appropriate.
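The behaviour can be modelled with a small Python sketch: the info call is accepted in every state, but only the open state returns data. The class and the state names other than `open` are hypothetical.

```python
class StreamConnection:
    """Toy model of a connection state machine that answers CLI info calls."""

    def __init__(self):
        self.state = "initializing"  # hypothetical name for a pre-open state
        self.details = {}

    def open(self, details):
        self.state = "open"
        self.details = dict(details)

    def handle_info_call(self, keys):
        # Handled in every state so the call can never crash the process.
        if self.state != "open":
            return []  # nothing meaningful to report yet
        return [(key, self.details.get(key)) for key in keys]
```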
Without this change, consumers using protocols other than the stream
protocol would display as inactive in the Management UI/API and CLI
commands, even though they were receiving messages.
This follows the decision that was made for
'rabbitmq-diagnostics node_health_check' which
is a no-op as of 4.0.0 following a few years of
deprecation.
The justification is very similar:
1. There is no such thing as "One True Health Check".
A single health check is too coarse-grained to
explain what specifically is not right about
cluster state
2. Individual fine-grained health checks have been
available for a few years now, see
https://www.rabbitmq.com/docs/monitoring#health-checks
3. This particular check tests something that
effectively never fails, based on my 14+
years of RabbitMQ contributions and user support
of all shapes and forms
4. This check uses a deprecated feature: non-exclusive
non-durable/transient classic queues
If something about this health check is worth
preserving, we can always add a new one
under GET /api/health/checks/*
Closes #13047.
Accidental "fat finger" virtual host deletions
would be easier to avoid if there was a protection mechanism
that would apply equally even to CLI tools and external
applications that do not use confirmations for deletion
operations.
This introduces the following changes:
* Virtual host metadata now supports a new key,
'protected_from_deletion', which, when set,
will be considered by key virtual host deletion function(s)
* DELETE /api/vhosts/{name} was adapted to handle
such blocked deletion attempts to respond with
a 412 Precondition Failed status
* 'rabbitmqctl list_vhosts' and 'rabbitmqctl delete_vhost'
were adapted accordingly
* DELETE /api/vhosts/{name}/deletion/protection
is a new endpoint that can be used to remove
the protective seal (the metadata key)
* POST /api/vhosts/{name}/deletion/protection
marks the virtual host as protected
In the case of the HTTP API, all operations on
virtual host metadata require administrative
privileges from the target user.
Other considerations:
* When a virtual host does not exist, the behavior
remains the same: the original, protection-unaware
code path is used to preserve backwards compatibility
References #12772.
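The deletion guard can be sketched in Python as below. The in-memory `vhosts` dict and the function names are hypothetical, and the 404 and 204 status codes are assumptions for the example; only the 412 response and the `protected_from_deletion` key come from this commit message.

```python
def delete_vhost(vhosts, name):
    """Return the HTTP status a DELETE /api/vhosts/{name} handler might use."""
    metadata = vhosts.get(name)
    if metadata is None:
        return 404  # assumed: unknown vhost, protection is not consulted
    if metadata.get("protected_from_deletion"):
        return 412  # Precondition Failed: the protective seal is still set
    del vhosts[name]
    return 204  # assumed success status


def set_protection(vhosts, name, protected):
    """Models POST (protected=True) and DELETE (protected=False) on .../deletion/protection."""
    vhosts[name]["protected_from_deletion"] = protected
```

A protected virtual host therefore needs two deliberate steps to delete: removing the protection first, then issuing the deletion.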