The credit_flow between publishing AMQP 0.9.1 channel (or MQTT
connection) and (non-mirrored) classic queue processes was
unintentionally removed in 4.0 together with anything else related to
CQ mirroring.
By default we restore the 3.x behaviour for non-mirored classic
queues. It is possible to disable flow-control (the earlier 4.0.x
behaviour) with the new env `classic_queue_flow_control`. In 3.x this
was possible with the config `mirroring_flow_control`.
(cherry picked from commit d65bd7d07a)
* MQTT: avoid an exception
when an AMQP 0-9-1 publisher publishes a message
that has expiration set.
Stack trace was contributed in #12707 by @rdsilio.
* mc_mqtt_SUITE test for #12707#12710
* MQTT protocol_interop_SUITE: new test for #12710#12707
* Simplify tests
---------
Co-authored-by: David Ansari <david.ansari@gmx.de>
Prior to this commit, test
```
make -C deps/rabbitmq_mqtt ct-mqtt_shared t=[mqtt,cluster_size_1,v4]:non_clean_sess_reconnect_qos0_and_qos1
```
flaked in CI with error:
```
{mqtt_shared_SUITE,non_clean_sess_reconnect_qos0_and_qos1,972}
{badmatch,{publish_not_received,<<"msg-0">>}}
```
The problem was the following race condition:
* The MQTT v4 client sends an async DISCONNECT
* The global MQTT consumer metric got decremented. However, the classic
queue still has the MQTT connection proc registered as consumer.
* The test case sends a message
* The classic queue checks out the message to the old connection instead
of checking out the message to the new connection.
The solution in this commit is to check the consumer count of the
classic queue before proceeding to send the message after disconnection.
Support x-cc message annotation
Support an `x-cc` message annotation in AMQP 1.0
similar to the [CC](https://www.rabbitmq.com/docs/sender-selected) header in AMQP 0.9.1.
The value of the `x-cc` message annotation must by a list of strings.
A message annotation is used since application properties allow only simple types.
This commit attempts to eliminate the test flake described in
https://github.com/rabbitmq/rabbitmq-server/issues/12413#issuecomment-2385449940
```
rabbitmq_mqtt > parallel-ct-set-1 > mqtt_shared_SUITE > cluster_size_3 > v4 rabbit_mqtt_qos0_queue_kill_node
=== Ended at 2024-10-01 09:59:52
=== Location: [{mqtt_shared_SUITE,rabbit_mqtt_qos0_queue_kill_node,[1165](https://github.com/rabbitmq/rabbitmq-server/issues/mqtt_shared_suite.src.html#1165)},
{test_server,ts_tc,1793},
{test_server,run_test_case_eval1,1302},
{test_server,run_test_case_eval,1234}]
=== === Reason: no match of right hand side value {publish_not_received,
<<"m1">>}
in function mqtt_shared_SUITE:rabbit_mqtt_qos0_queue_kill_node/1 (mqtt_shared_SUITE.erl, line 1165)
in call from test_server:ts_tc/3 (test_server.erl, line 1793)
in call from test_server:run_test_case_eval1/6 (test_server.erl, line 1302)
in call from test_server:run_test_case_eval/9 (test_server.erl, line 1234)
```
This flake could not be reproduced locally.
This commit also assumes that this flake occurred under Khepri but not
under Mnesia.
The hypothesis is the following:
* Node 0 is down
* MQTT client creates binding on node 1
* Khepri commits since the binding is replicated and persisted on node 1
and node 2. However the binding isn't reflected yet in node 2's
routing projecting table.
* Publishing a message to node 2 routes to nowhere.
The shared test suite was renamed only for clarity, but the
Web-MQTT test suites were renamed out of necessity: since
we are now adding the MQTT test directory to the code path
we need test suites to have different names to avoid
conflicts. We can't (easily) addpath only for this test suite
either since CT hooks don't call functions in a predictable
enough manner; it would always be hacky.
This is a proof of concept that mostly works but is missing
some tests, such as rabbitmq_mqtt or rabbitmq_cli. It also
doesn't apply to mixed version testing yet.
* Add global histogram metrics for received message sizes per-protocol
fixup: add new files to bazel
fixup: expose message_size_bytes as prometheus classic histogram type
`rabbit_msg_size_metrics` does not use `seshat` any more, but
`counters` directly.
fixup: add msg_size_metrics unit test
* Improve message size histogram
1.
Avoid unnecessary time series emitted for stream protocol
The stream protocol cannot observe message sizes.
This commit ensures that the following time series are omitted:
```
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="64"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="256"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="1024"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="4096"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="16384"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="65536"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="262144"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="1048576"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="4194304"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="16777216"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="67108864"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="268435456"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="+Inf"} 0
rabbitmq_global_message_size_bytes_count{protocol="stream"} 0
rabbitmq_global_message_size_bytes_sum{protocol="stream"} 0
```
This reduces the number of time series by 15.
2.
Further reduce the number of time series by reducing the number of
buckets. Instead of 13 bucktes, emit only 9 buckets. Buckets are not
free, each is an extra time series stored.
Prior to this commit:
```
curl -s -u guest:guest localhost:15692/metrics | ag message_size | wc -l
92
```
After this commit:
```
curl -s -u guest:guest localhost:15692/metrics | ag message_size | wc -l
57
```
3.
The emitted metric should be called
`rabbitmq_message_size_bytes_bucket` instead of `rabbitmq_global_message_size_bytes_bucket`.
The latter is poor naming. There is no need to use `global` in
the metric name given that this metric doesn't exist in the old flawed
aggregated metrics.
4.
This commit simplies module `rabbit_global_counters`.
5.
Avoid garbage collecting the 10-elements list of buckets per message
being received.
---------
Co-authored-by: Péter Gömöri <peter@84codes.com>
The default of 0.4 was very conservative even when it was
set years ago. Since then:
- we moved to CQv2, which have much more predictable memory usage than (non-lazy) CQv1 used to
- we removed CQ mirroring which caused large sudden memory spikes in some situations
- we removed the option to store message payload in memory in quorum queues
For the past two years or so, we've been running all our internal tests and benchmarks
using the value of 0.8 with no OOMkills at all (note: we do this on
Kubernetes where the Cluster Operators overrides the available memory
levaing some additional headroom, but effectively we are still using more than
0.6 of memory).
1. Only run the CLI tests on a single node cluster. The shared_SUITE is
already very big. Testing the same CLI commands against node-0 on a
3-node cluster brings no benefit.
2. Move the two new CLI test cases in front of
management_plugin_connection because they are similar in that all
three tests close the MQTT connection.
3. There is no need to query the HTTP API for the two new CLI test
cases.
4. There is no need to set keepalive in the two new CLI test cases.
This commit is a breaking change in RabbitMQ 4.0.
## What?
Remove mqtt.default_user and mqtt.default_pass
Instead, rabbit.anonymous_login_user and rabbit.anonymous_login_pass
should be used.
## Why?
RabbitMQ 4.0 simplifies anonymous logins.
There should be a single configuration place
```
rabbit.anonymous_login_user
rabbit.anonymous_login_pass
```
that is used for anonymous logins for any protocol.
Anonymous login is orthogonal to the protocol the client uses.
Hence, there should be a single configuration place which can then be
used for MQTT, AMQP 1.0, AMQP 0.9.1, and RabbitMQ Stream protocol.
This will also simplify switching to SASL for MQTT 5.0 in the future.
This commit contains the following new quorum queue features:
* Fair share high/low priorities
* SAC consumers honour consumer priorities
* Credited consumer refactoring to meet AMQP requirements.
* Use checkpoints feature to reduce memory use for queues with long backlogs
* Consumer cancel option that immediately removes consumer and returns all pending messages.
* More compact commands of the most common commands such as enqueue, settle and credit
* Correctly track the delivery-count to be compatible with the AMQP spec
* Support the "modified" AMQP 1.0 outcome better.
Commits:
* Quorum queues v4 scaffolding.
Create the new version but not including any changes yet.
QQ: force delete followers after leader has terminated.
Also try a longer sleep for mqtt_shared_SUITE so that the
delete operation stands a chance to time out and move on
to the forced deletion stage.
In some mixed machine version scenarios some followers will never
apply the poison pill command so we may as well force delete them
just in case.
QQ: skip test in amqp_client that cannot pass with mixed machine versions
QQ: remove dead code
Code relating to prior machine versions and state conversions.
rabbit_fifo_prop_SUITE fixes
* QQ: add v4 ff and new more compact enqueue command.
Also update rabbit_fifo_* suites to test more relevant code versions
where applicable.
QQ: always use the updated credit mode format
QQv4: use more compact consumer reference in settle, credit, return
This introudces a new type: consumer_key() which is either the consumer_id
or the raft index the checkout was processed at. If the consumer is
using one of the updated credit spec formats rabbit_fifo will use the
raft index as the primary key for the consumer such that the rabbit
fifo client can then use the more space efficient integer index
instead of the full consumer id in subsequent commands.
There is compatibility code to still accept the consumer id in
settle, return, discard and credit commands but this is slighlyt
slower and of course less space efficient.
The old form will be used in cases where the fifo client may have
already remove the local consumer state (as happens after a cancel).
Lots of test refactorings of the rabbit_fifo_SUITE to begin to use
the new forms.
* More test refactoring and new API fixes
rabbit_fifo_prop_SUITE refactoring and other fixes.
* First pass SAC consumer priority implementation.
Single active consumers will be activated if they have a higher priority
than the currently active consumer. if the currently active consumer
has pending messages, no further messages will be assigned to the
consumer and the activation of the new consumer will happen once
all pending messages are settled. This is to ensure processing order.
Consumers with the same priority will internally be ordered to
favour those with credit then those that attached first.
QQ: add SAC consumer priority integration tests
QQ: add check for ff in tests
* QQ: add new consumer cancel option: 'remove'
This option immediately removes and returns all messages for a
consumer instead of the softer 'cancel' option which keeps the
consumer around until all pending messages have been either
settled or returned.
This involves a change to the rabbit_queue_type:cancel/5 API
to rabbit_queue_type:cancel/3.
* QQ: capture checked out time for each consumer message.
This will form the basis for queue initiated consumer timeouts.
* QQ: Refactor to use the new ra_machine:handle_aux/5 API
Instead of the old ra_machine:handle_aux/6 callback.
* QQ hi/lo priority queue
* QQ: Avoid using mc:size/1 inside rabbit_fifo
As we dont want to depend on external functions for things that may
change the state of the queue.
* QQ bug fix: Maintain order when returning multiple
Prior to this commit, quorum queues requeued messages in an undefined
order, which is wrong.
This commit fixes this bug and requeues messages always in the order as
nacked / rejected / released by the client.
We ensure that order of requeues is deterministic from the client's
point of view and doesn't depend on whether the quorum queue soft limit
was exceeded temporarily.
So, even when rabbit_fifo_client batches requeues, the order as nacked
by the client is still maintained.
* Simplify
* Add rabbit_quorum_queue:file_handle* functions back.
For backwards compat.
* dialyzer fix
* dynamic_qq_SUITE: avoid mixed versions failure.
* QQ: track number of requeues for message.
To be able to calculate the correct value for the AMQP delivery_count
header we need to be able to distinguish between messages that were
"released" or returned in QQ speak and those that were returned
due to errors such as channel termination.
This commit implement such tracking as well as the calculation
of a new mc annotations `delivery_count` that AMQP makes use
of to set the header value accordingly.
* Use QQ consumer removal when AMQP client detaches
This enables us to unskip some AMQP tests.
* Use AMQP address v2 in fsharp-tests
* QQ: track number of requeues for message.
To be able to calculate the correct value for the AMQP delivery_count
header we need to be able to distinguish between messages that were
"released" or returned in QQ speak and those that were returned
due to errors such as channel termination.
This commit implement such tracking as well as the calculation
of a new mc annotations `delivery_count` that AMQP makes use
of to set the header value accordingly.
* rabbit_fifo: Use Ra checkpoints
* quorum queues: Use a custom interval for checkpoints
* rabbit_fifo_SUITE: List actual effects in ?ASSERT_EFF failure
* QQ: Checkpoints modifications
* fixes
* QQ: emit release cursors on tick for followers and leaders
else followers could end up holding on to segments a bit longer
after traffic stops.
* Support draining a QQ SAC waiting consumer
By issuing drain=true, the client says "either send a transfer or a flow frame".
Since there are no messages to send to an inactive consumer, the sending
queue should advance the delivery-count consuming all link-credit and send
a credit_reply with drain=true to the session proc which causes the session
proc to send a flow frame to the client.
* Extract applying #credit{} cmd into 2 functions
This commit is only refactoring and doesn't change any behaviour.
* Fix default priority level
Prior to this commit, when a message didn't have a priority level set,
it got enqueued as high prio.
This is wrong because the default priority is 4 and
"for example, if 2 distinct priorities are implemented,
then levels 0 to 4 are equivalent, and levels 5 to 9 are equivalent
and levels 4 and 5 are distinct."
Hence, by default a message without priority set, must be enqueued as
low prio.
* bazel run gazelle
* Avoid deprecated time unit
* Fix aux_test
* Delete dead code
* Fix rabbit_fifo_q:get_lowest_index/1
* Delete unused normalize functions
* Generate less garbage
* Add integration test for QQ SAC with consumer priority
* Improve readability
* Change modified outcome behaviour
With the new quorum queue v4 improvements where a requeue counter was
added in addition to the quorum queue delivery counter, the following
sentence from https://github.com/rabbitmq/rabbitmq-server/pull/6292#issue-1431275848
doesn't apply anymore:
> Also the case where delivery_failed=false|undefined requires the release of the
> message without incrementing the delivery_count. Again this is not something
> that our queues are able to do so again we have to reject without requeue.
Therefore, we simplify the modified outcome behaviour:
RabbitMQ will from now on only discard the message if the modified's
undeliverable-here field is true.
* Introduce single feature flag rabbitmq_4.0.0
## What?
Merge all feature flags introduced in RabbitMQ 4.0.0 into a single
feature flag called rabbitmq_4.0.0.
## Why?
1. This fixes the crash in
https://github.com/rabbitmq/rabbitmq-server/pull/10637#discussion_r1681002352
2. It's better user experience.
* QQ: expose priority metrics in UI
* Enable skipped test after rebasing onto main
* QQ: add new command "modify" to better handle AMQP modified outcomes.
This new command can be used to annotate returned or rejected messages.
This commit also retains the delivery-count across dead letter boundaries
such that the AMQP header delivery-count field can now include _all_ failed
deliver attempts since the message was originally received.
Internally the quorum queue has moved it's delivery_count header to
only track the AMQP protocol delivery attempts and now introduces
a new acquired_count to track all message acquisitions by consumers.
* Type tweaks and naming
* Add test for modified outcome with classic queue
* Add test routing on message-annotations in modified outcome
* Skip tests in mixed version tests
Skip tests in mixed version tests because feature flag
rabbitmq_4.0.0 is needed for the new #modify{} Ra command
being sent to quorum queues.
---------
Co-authored-by: David Ansari <david.ansari@gmx.de>
Co-authored-by: Michael Davis <mcarsondavis@gmail.com>
Test case rabbit_mqtt_qos0_queue_kill_node flaked because after an
MQTT client subscribes on node 0, RabbitMQ returns success
and replicated the new binding to node 0 and node 1, but not
yet to node 2. Another MQTT client then publishes on node 2
without the binding being present yet on node 2, and the
message therefore isn't routed.
This commit attempts to eliminate this flake.
It adds a function to rabbit_ct_broker_helpers which waits until a given
node has caught up with the leader node.
We can reuse that function in future to eliminate more test flakes.
Configuring the mock authentication backend blocks
and generates an error in the test process when the
broker goes down. The error report makes the test fail
in some environments.
The process where the setup takes place must stay up
otherwise the ETS table used will go away.
This commit makes sure the broker-side authentication backend
setup returns at the end of the test. This way the calling
process terminates in a normal way.
Require all MQTT feature flags and remove their compatibility code:
* delete_ra_cluster_mqtt_node
* rabbit_mqtt_qos0_queue
* mqtt_v5
These feature flags were introduced in or before 3.13.0.
This commit is a follow up of https://github.com/rabbitmq/rabbitmq-server/pull/11604
This commit changes the AMQP address format v2 from
```
/e/:exchange/:routing-key
/e/:exchange
/q/:queue
```
to
```
/exchanges/:exchange/:routing-key
/exchanges/:exchange
/queues/:queue
```
Advantages:
1. more user friendly
2. matches nicely with the plural forms of HTTP API v1 and HTTP API v2
This plural form is still non-overlapping with AMQP address format v1.
Although it might feel unusual at first to send a message to `/queues/q1`,
if you think about `queues` just being a namespace or entity type, this
address format makes sense.
to distinguish between v1 and v2 address formats.
Previously, v1 and v2 address formats overlapped and behaved differently
for example for:
```
/queue/:queue
/exchange/:exchange
```
This PR changes the v2 format to:
```
/e/:exchange/:routing-key
/e/:exchange
/q/:queue
```
to distinguish between v1 and v2 addresses.
This allows to call `rabbit_deprecated_features:is_permitted(amqp_address_v1)`
only if we know that the user requests address format v1.
Note that `rabbit_deprecated_features:is_permitted/1` should only
be called when the old feature is actually used.
Use percent encoding / decoding for address URI format v2.
This allows to use any UTF-8 encoded characters including slashes (`/`)
in routing keys, exchange names, and queue names and is more future
safe.
* MQTT: speed up shared_SUITE:many_qos1_messages
* speed up block_only_publisher
* MQTT: reorganise tests groups
To avoid starting a new broker for each protocol group (v3,v4,v5).
Instead we run all protocol groups under a single cluster configuration
group.
* MQTT: speed up publish_to_all_queue_types_qos* tests.
* Remove separate mnesia_store group
to speed up the test suite
* Fix wrong_shard_count
* Remove unused field
* Run subset of v3 tests
The code being tested under v3 and v4 is almost identical.
To save time in CI, we therefore run only a very small subset of tests in v3.
This cuts the total time reported by CT for the shared_SUITE from 898
seconds to 614 seconds.
Also, the java_SUITE tests in v3.
* Fix wrong_shard_count
* Fix mixed version failure
---------
Co-authored-by: David Ansari <david.ansari@gmx.de>