* Call `Mod:format_state/1` if exported to possibly truncate huge states
* Add more information about truncated ram_pending_ack and disk_pending_ack
* Add `log.error_logger_format_depth` cuttlefish schema value
* Add `format_state/1` to `rabbit_channel`
* Add `log.summarize_process_state`, default is `false`, to enable summarizing process state for crash logs.
* Add `format_state/1` to `rabbit_classic_queue_index_v2` and `rabbit_classic_queue_store_v2`
* Ensure `rabbit_channel:format_state/1` uses `summarize_process_state_when_logged`
* Do not set `summarize_process_state_when_logged` value by default.
* Type specs
# What?
* Support Direct Reply-To for AMQP 1.0
* Compared to AMQP 0.9.1, this PR allows for multiple volatile queues on a single
AMQP 1.0 session. Use case: JMS clients can create multiple temporary queues on
the same JMS/AMQP session:
* https://jakarta.ee/specifications/messaging/3.1/apidocs/jakarta.messaging/jakarta/jms/session#createTemporaryQueue()
* https://jakarta.ee/specifications/messaging/3.1/apidocs/jakarta.messaging/jakarta/jms/jmscontext#createTemporaryQueue()
* Fix missing metrics for Direct Reply-To in AMQP 0.9.1, e.g.
`messages_delivered_total`
* Fix missing metrics (even without using Direct Reply-To) in AMQP 0.9.1:
If stats level is not `fine`, global metrics `rabbitmq_global_messages_delivered_*` should still be incremented.
# Why?
* Allow for scalable at-most-once RPC reply delivery
Example use case: thousands of requesters connect, send a single
request, wait for a single reply, and disconnect.
This PR won't create any queue and won't write to the metadata store.
Therefore, there's less pressure on the metadata store, less pressure
on the Management API when listing all queues, less pressure on the
metrics subsystem, etc.
* Feature parity with AMQP 0.9.1
# How?
This PR extracts the previously channel-specific Direct Reply-To code
into a new queue type: `rabbit_volatile_queue`.
"Volatile" describes the semantics, not a use-case. It signals non-durable,
zero-buffer, at-most-once, may-drop, and "not stored in Khepri."
This new queue type is then used for AMQP 1.0 and AMQP 0.9.1.
Sending to the volatile queue is stateless, as was previously the case with Direct Reply-To in
AMQP 0.9.1 and as is done for the MQTT QoS 0 queue.
This allows for use cases where a single responder replies to e.g. 100k different requesters.
RabbitMQ will automatically grant new link-credit to the responder because the new queue type confirms immediately.
The key gets implicitly checked by the channel/session:
If the queue name (including the key) doesn’t exist, the `handle_event` callback for this queue isn’t invoked and therefore
no delivery will be sent to the responder.
This commit supports Direct Reply-To across AMQP 1.0 and 0.9.1. In other
words, the requester can be an AMQP 1.0 client while the responder is an
AMQP 0.9.1 client or vice versa.
RabbitMQ will internally convert between AMQP 0.9.1 `reply_to` and AMQP
1.0 `/queues/<queue>` address. The AMQP 0.9.1 `reply_to` property is
expected to contain a queue name. That's in line with the AMQP 0.9.1
spec:
> One of the standard message properties is Reply-To, which is designed
specifically for carrying the name of reply queues.
Compared to AMQP 0.9.1 where the requester sets the `reply_to` property
to `amq.rabbitmq.reply-to` and RabbitMQ modifies this field when
forwarding the message to the request queue, in AMQP 1.0 the requester
learns about the queue name from the broker at link attachment time.
The requester has to set the reply-to property to the server generated
queue name. That's because the server isn't allowed to modify the bare
message.
During link attachment time, the client has to set certain fields.
These fields are expected to be set by the RabbitMQ client libraries.
Here is an Erlang example:
```erl
Source = #{address => undefined,
           durable => none,
           expiry_policy => <<"link-detach">>,
           dynamic => true,
           capabilities => [<<"rabbitmq:volatile-queue">>]},
AttachArgs = #{name => <<"receiver">>,
               role => {receiver, Source, self()},
               snd_settle_mode => settled,
               rcv_settle_mode => first},
{ok, Receiver} = amqp10_client:attach_link(Session, AttachArgs),
AddressReplyQ = receive {amqp10_event, {link, Receiver, {attached, Attach}}} ->
                    #'v1_0.attach'{source = #'v1_0.source'{address = {utf8, Addr}}} = Attach,
                    Addr
                end,
```
The client then sends the message by setting the reply-to address as
follows:
```erl
amqp10_client:send_msg(
  SenderRequester,
  amqp10_msg:set_properties(
    #{message_id => <<"my ID">>,
      reply_to => AddressReplyQ},
    amqp10_msg:new(<<"tag">>, <<"request">>))),
```
If the responder attaches to the queue target in the reply-to field,
RabbitMQ will check if the requester link is still attached. If the
requester detached, the link will be refused.
The responder can also attach to the anonymous null target and set the
`to` field to the `reply-to` address.
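For illustration, assuming `SenderResponder` is a sender link attached to the anonymous null
target and `ReplyTo` is the reply-to address taken from the request, the responder could send
a reply like this (property values are illustrative):
```erl
amqp10_client:send_msg(
  SenderResponder,
  amqp10_msg:set_properties(
    #{to => ReplyTo,
      correlation_id => <<"my ID">>},
    amqp10_msg:new(<<"reply-tag">>, <<"reply">>))),
```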
If RabbitMQ cannot deliver a reply, instead of buffering the reply,
RabbitMQ will drop the reply and increment the following Prometheus metric:
```
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_volatile_queue",dead_letter_strategy="disabled"} 0.0
```
That's in line with the MQTT QoS 0 queue type.
A reply message could be dropped for a variety of reasons:
1. The requester ran out of link-credit. It's therefore the requester's
responsibility to grant sufficient link-credit on its receiving link.
2. RabbitMQ isn't allowed to deliver any message due to session flow
control. It's the requester's responsibility to keep the session window
large enough.
3. The requester doesn't consume messages fast enough, causing TCP
backpressure to be applied, or the RabbitMQ AMQP writer proc isn't
scheduled quickly enough. The latter can happen, for example, if
RabbitMQ runs with a single scheduler (i.e. is assigned a single CPU
core). In either case, RabbitMQ internal flow control causes the
volatile queue to drop messages.
Therefore, if high throughput is required while message loss is undesirable, a classic queue should be used
instead of a volatile queue since the former buffers messages while the
latter doesn't.
The main difference between the volatile queue and the MQTT QoS 0 queue
is that the former isn't written to the metadata store.
# Breaking Change
Prior to this PR the following [documented caveat](https://www.rabbitmq.com/docs/4.0/direct-reply-to#limitations) applied:
> If the RPC server publishes with the mandatory flag set then `amq.rabbitmq.reply-to.*`
is treated as **not** a queue; i.e. if the server only publishes to this name then the message
will be considered "not routed"; a `basic.return` will be sent if the mandatory flag was set.
This PR removes this caveat.
This PR introduces the following new behaviour:
> If the RPC server publishes with the mandatory flag set, then `amq.rabbitmq.reply-to.*`
is treated as a queue (assuming this queue name is encoded correctly). However,
whether the requester is still there to consume the reply is not checked at routing time.
In other words, if the RPC server only publishes to this name, then the message will be
considered "routed" and RabbitMQ will therefore not send a `basic.return`.
When validation fails for a policy parameter, the resulting popup can't
be read due to an extra binary encoding as well as code that escapes
HTML entities. Since the EJS template uses `<%= %>` for the popup, it will
display the text as-is, and not render any HTML.
https://github.com/erlang/otp/issues/9739
In OTP 28+, splitting an empty string returns an empty list, not an empty
string (the input).
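The affected call site isn't shown here; purely as an illustration, code that splits
possibly-empty input can be written to accept both return shapes:
```erl
%% Illustrative only: tolerate both the pre-OTP-28 shape ([""]) and the
%% OTP 28+ shape ([]) when the input string is empty.
split_nonempty(S, Sep) ->
    case string:split(S, Sep, all) of
        [] -> [];
        Parts -> [P || P <- Parts, P =/= ""]
    end.
```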
Additionally, the `street-address` macro was removed in OTP 28, so it is
replaced with the value it used to expand to.
Lastly, rabbitmq_auth_backend_oauth2 has an MQTT test, so rabbitmq_mqtt is
added to TEST_DEPS.
[Why]
They were moved from `rabbit` to `rabbit_common` several years ago to
solve a dependency issue because `amqp_client` depended on the file
handle cache. This is not the case anymore.
[How]
The modules are moved back to `rabbit`.
`rabbit_common` doesn't need to depend on `os_mon` anymore. `rabbit`
already depends on it, so no changes needed here.
`include/rabbit_memory.hrl` and some test cases are moved as well to
follow the `vm_memory_monitor` module.
* Redesigned k8s peer discovery
Rather than querying the Kubernetes API, just check the local node name
and try to connect to the pod with `-0` suffix (or configured
`ordinal_start` value). Only the pod with the lowest ordinal can form
a new cluster - all other pods will wait forever.
This should prevent any race conditions and incorrectly formed clusters.
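A minimal sketch of deriving the seed node from the local node name (the function name and
regex are illustrative, not the actual implementation):
```erl
%% Replace this pod's ordinal with the configured start ordinal (default 0)
%% to obtain the only node that is allowed to form a new cluster.
seed_node(OrdinalStart) ->
    NodeBin = atom_to_binary(node(), utf8),
    %% e.g. <<"rabbit@myrabbit-server-2.myrabbit-nodes.default">>
    {match, [Prefix, _Ordinal, Suffix]} =
        re:run(NodeBin, <<"^([^@]+@[^.]*-)(\\d+)(\\..*|)$">>,
               [{capture, all_but_first, binary}]),
    binary_to_atom(<<Prefix/binary,
                     (integer_to_binary(OrdinalStart))/binary,
                     Suffix/binary>>, utf8).
```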
`rabbit_binary_generator:map_exception/3` will crash when there are
unicode characters in the `explanation` field of the `Reason#amqp_error`
parameter. The explanation string (list) is assumed to be ASCII, with
each character/member in the range of a byte. Any unicode characters
in the string will trigger a `badarg` crash of `list_to_binary/1` in
`rabbit_binary_generator:amqp_exception_explanation/2`.
An AMQP 0.9.1 shovel crash caused by this was reported in
https://github.com/rabbitmq/rabbitmq-server/discussions/12874
When a queue used as a shovel source/destination does not exist and its
name contains non-ASCII characters, the explanation of the `amqp_error`
will be something like `no queue non_ascii_name_😍 in vhost /`. It will
subsequently crash and even affect the management console.
To fix this, `unicode:characters_to_binary/1` is used instead of
`list_to_binary/1`, and unicode-safe truncation of long explanations via
the `chars_limit` option of `io_lib:format/3` replaces direct truncation of bytes.
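A minimal sketch of the unicode-safe conversion (the character limit below is illustrative):
```erl
%% Format with a chars_limit so long explanations are truncated on character
%% boundaries, then convert the result to a UTF-8 binary.
explanation_to_binary(Explanation) ->
    Limited = io_lib:format("~ts", [Explanation], [{chars_limit, 256}]),
    unicode:characters_to_binary(Limited).
```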
Expose the same metrics for AMQP 1.0 connections as for AMQP 0.9.1 connections.
Display the following AMQP 1.0 metrics on the Management UI:
* Network bytes per second from/to client on connections page
* Number of sessions/channels on connections page
* Network bytes per second from/to client graph on connection page
* Reductions graph on connection page
* Garbage collection info on connection page
Expose the following AMQP 1.0 per-object Prometheus metrics:
* rabbitmq_connection_incoming_bytes_total
* rabbitmq_connection_outgoing_bytes_total
* rabbitmq_connection_process_reductions_total
* rabbitmq_connection_incoming_packets_total
* rabbitmq_connection_outgoing_packets_total
* rabbitmq_connection_pending_packets
* rabbitmq_connection_channels
The rabbit_amqp_writer proc:
* notifies the rabbit_amqp_reader proc if it sent frames
* hibernates eventually if it doesn't send any frames
The rabbit_amqp_reader proc:
* does not emit stats (update ETS tables) if no frames are received
or sent, to save resources when there are many idle connections.
Provides a specific function to fix client SSL options, i.e. apply all
fixes that were applied to TLS listeners and clients in previous
versions, but also set the `cacerts` option to the CA certificates obtained by
`public_key:cacerts_get/0`, only when no `cacertfile` or `cacerts` are
provided.
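A hedged sketch of that behaviour (the function name is illustrative):
```erl
%% Only add the OS trust store CAs when the caller supplied neither
%% cacertfile nor cacerts.
maybe_add_cacerts(SslOpts) ->
    case proplists:is_defined(cacertfile, SslOpts) orelse
         proplists:is_defined(cacerts, SslOpts) of
        true  -> SslOpts;
        false -> [{cacerts, public_key:cacerts_get()} | SslOpts]
    end.
```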
OTP 27 reset all assumptions on how the VM reacts to processes that
buffer and process a lot of large binaries.
Substantially increasing the vheap sizes for such processes restores
most of the same performance by allowing processes to hold more binary
data before major garbage collections are triggered.
This introduces a new module to capture process flag configurations.
The new vheap sizes are only applied when running on OTP 27 or
above.
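Purely as an illustration (the flag value below is a placeholder, not the size RabbitMQ
actually picks), such a configuration could be applied like this:
```erl
%% Raise the binary virtual heap of the calling process, but only on OTP 27+.
maybe_raise_bin_vheap() ->
    case list_to_integer(erlang:system_info(otp_release)) >= 27 of
        true  -> erlang:process_flag(min_bin_vheap_size, 46422);  %% words
        false -> ok
    end.
```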
[Why]
Before this patch, the $RABBITMQ_FEATURE_FLAGS environment variable took
an exhaustive list of feature flags to enable. This list overrode the
default of enabling all stable feature flags.
This made it inconvenient when a user wanted to enable an experimental
feature flag like `khepri_db` while otherwise keeping the default behavior.
[How]
$RABBITMQ_FEATURE_FLAGS now accepts the following syntax:
RABBITMQ_FEATURE_FLAGS=+feature1,-feature2
This will start RabbitMQ with all stable feature flags, plus `feature1`,
but without `feature2`.
For users setting `forced_feature_flags_on_init` in the config, the
corresponding syntax is:
{forced_feature_flags_on_init, {rel, [feature1], [feature2]}}
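For example, in `advanced.config` (the flag names other than `khepri_db` are illustrative):
```erl
[{rabbit,
  [{forced_feature_flags_on_init,
    {rel,
     [khepri_db],          %% enable on top of all stable feature flags
     [some_other_flag]}}   %% explicitly disable
  ]}].
```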
The prior code skirted transactions because the filter function might
cause Khepri to call itself. We want to use the same idea as the old
code - get all queues, filter them, then delete them - but we want to
perform the deletion in a transaction and fail the transaction if any
queues changed since we read them.
This fixes a bug - that the call to `delete_in_khepri/2` could return
an error tuple that would be improperly recognized as `Deletions` -
but should also make deleting transient queues atomic and fast.
Each call to `delete_in_khepri/2` needed to wait on Ra to replicate
because the deletion is an individual command sent from one process.
Performing all deletions at once means we only need to wait for one
command to be replicated across the cluster.
We also bubble up any errors from the delete operation now rather than storing them as
deletions. This fixes a crash that occurs on node down when Khepri is
in a minority.
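A hedged sketch of the read-filter-then-transact pattern described above, assuming the
`khepri_tx` API (`khepri_tx:get/1`, `khepri_tx:delete/1`, `khepri_tx:abort/1`) and a
`rabbit_khepri:transaction/2` wrapper; paths, values and the function name are illustrative:
```erl
delete_transient_in_tx(QueuesWithPaths) ->
    rabbit_khepri:transaction(
      fun() ->
              lists:foreach(
                fun({Path, QueueAsRead}) ->
                        case khepri_tx:get(Path) of
                            {ok, QueueAsRead} -> ok = khepri_tx:delete(Path);
                            %% the queue changed since we read it: fail the tx
                            _Changed          -> khepri_tx:abort(queue_changed)
                        end
                end, QueuesWithPaths)
      end, rw).
```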
Add aggregated copies of some per-object metrics that are currently labeled
per-channel, to reduce cardinality. These metrics are valuable and
easier to process if exposed on a per-exchange and per-queue basis.
* Deprecate queue-master-locator
This should not be a breaking change - all validation should still pass
* CQs can now use `queue-leader-locator`
* `queue-leader-locator` takes precedence over `queue-master-locator` if both are used
* regardless of which name is used, effectively there are only two values: `client-local` (default) or `balanced`
* other values (`min-masters`, `random`, `least-leaders`) are mapped to `balanced`
* Management UI no longer shows `master-locator` fields when declaring a queue/policy, but such arguments can still be used manually (unless not permitted)
* exclusive queues are always declared locally, as before
They are no longer used.
This removes a couple of file_handle_cache:info/1 calls.
We are not removing them from the HTTP API to avoid
breaking things unintentionally.
Stats were not removed, including management UI stats
relating to FDs.
Web-MQTT and Web-STOMP configuration relating to FHC
were not removed.
The file_handle_cache itself must be kept until we
remove CQv1.
It will always use the ETS index. This change lets us
do optimisations that would otherwise not be possible,
including 81b2c39834953d9e1bd28938b7a6e472498fdf13.
A small functional change is included in this commit:
we now always use ets:update_counter to update the
ref_count, instead of a mix of update_{counter,fields}.
When upgrading to 4.0, the index will be rebuilt for
all users that were using a custom index module.
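For reference, the `ets:update_counter` pattern referred to above looks like this (the table
layout is illustrative, not the actual index schema):
```erl
%% Column 3 plays the role of ref_count here.
Tab = ets:new(example_index, [set]),
true = ets:insert(Tab, {<<"msg-id">>, some_location, 1}),
2 = ets:update_counter(Tab, <<"msg-id">>, {3, 1}).
```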
## What?
Introduce RabbitMQ internal flow control for messages sent to AMQP
clients.
Prior to this PR, when an AMQP client granted a large amount of link
credit (e.g. 100k) to the sending queue, the sending queue sent
that amount of messages to the session process no matter what.
This becomes problematic for memory usage when the session process
cannot send out messages fast enough to the AMQP client, especially if
1. The writer proc cannot send fast enough. This can happen when
the AMQP client does not receive fast enough and causes TCP
back-pressure to the server. Or
2. The server session proc is limited by remote-incoming-window.
Both scenarios are now added as test cases.
Tests
* tcp_back_pressure_rabbitmq_internal_flow_quorum_queue
* tcp_back_pressure_rabbitmq_internal_flow_classic_queue
cover scenario 1.
Tests
* incoming_window_closed_rabbitmq_internal_flow_quorum_queue
* incoming_window_closed_rabbitmq_internal_flow_classic_queue
cover scenario 2.
This PR sends messages from queues to AMQP clients in a more controlled
manner.
To illustrate:
```
make run-broker PLUGINS="rabbitmq_management" RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+S 4"
observer_cli:start()
mq
```
where `mq` sorts by message queue length.
Create a stream:
```
deps/rabbitmq_management/bin/rabbitmqadmin declare queue name=s1 queue_type=stream durable=true
```
Next, send and receive from the Stream via AMQP.
Grant a large number of link credit to the sending stream:
```
docker run -it --rm --add-host host.docker.internal:host-gateway ssorj/quiver:latest
bash-5.1# quiver --version
quiver 0.4.0-SNAPSHOT
bash-5.1# quiver //host.docker.internal//queue/s1 --durable -d 30s --credit 100000
```
**Before** this PR:
```
RESULTS
Count ............................................... 100,696 messages
Duration ............................................... 30.0 seconds
Sender rate ......................................... 120,422 messages/s
Receiver rate ......................................... 3,363 messages/s
End-to-end rate ....................................... 3,359 messages/s
```
We observe that all 100k link credit worth of messages are buffered in the
writer proc's mailbox:
```
|No | Pid | MsgQueue |Name or Initial Call | Memory | Reductions |Current Function |
|1 |<0.845.0> |100001 |rabbit_amqp_writer:init/1 | 126.0734 MB| 466633491 |prim_inet:send/5 |
```
**After** this PR:
```
RESULTS
Count ............................................. 2,973,440 messages
Duration ............................................... 30.0 seconds
Sender rate ......................................... 123,322 messages/s
Receiver rate ........................................ 99,250 messages/s
End-to-end rate ...................................... 99,148 messages/s
```
We observe that the message queue lengths of both writer and session
procs are low.
## How?
Our goal is to have queues send out messages in a controlled manner
without overloading RabbitMQ itself.
We want RabbitMQ internal flow control between:
```
AMQP writer proc <--- session proc <--- queue proc
```
A similar concept exists for classic queues sending via AMQP 0.9.1.
We want an approach that applies to AMQP and works generically for all queue
types.
For the interaction between the AMQP writer proc and the session proc, we use a
simple credit-based approach reusing the `credit_flow` module.
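Roughly, the `credit_flow` pattern between the two processes looks like this (call sites,
message shapes, and helper names are illustrative, not the actual module code):
```erl
%% Session proc (sender side): account for each frame handed to the writer
%% and check whether the writer has exerted back-pressure.
maybe_send_to_writer(WriterPid, Frame) ->
    case credit_flow:blocked() of
        true  -> {stashed, Frame};          %% keep it in outgoing-pending
        false -> credit_flow:send(WriterPid),
                 WriterPid ! {send, Frame},
                 sent
    end.

%% Writer proc (receiver side): grant credit back once the frame went out.
after_frame_written(SessionPid) ->
    credit_flow:ack(SessionPid).
```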
For the interaction between session proc and queue proc, the following options
exist:
### Option 1
The session process provides explicit feedback to the queue after it
has sent N messages.
This approach is implemented in
https://github.com/ansd/rabbitmq-server/tree/amqp-flow-control-poc-1
and works well.
A new `rabbit_queue_type:sent/4` API was added which lets the queue proc know
that it can send further messages to the session proc.
Pros:
* Will work equally well for AMQP 0.9.1, e.g. when quorum queues send messages
in auto ack mode to AMQP 0.9.1 clients.
* Simple for the session proc
Cons:
* Slightly added complexity in every queue type implementation
* Multiple Ra commands (settle, credit, sent) to decide when a quorum
queue sends more messages.
### Option 2
A dual-link approach where two AMQP links exist between
```
AMQP client <---link--> session proc <---link---> queue proc
```
When the client grants a large amount of credits, the session proc will
top up credits to the queue proc periodically in smaller batches.
Pros:
* No queue type modifications required.
* Re-uses AMQP link flow control
Cons:
* Significant added complexity in the session proc. A client can
dynamically decrease or increase credits and dynamically change the drain
mode while the session tops up credit to the queue.
### Option 3
Credit is a 32-bit unsigned integer.
The spec mandates that the receiver independently chooses a credit.
Nothing in the spec prevents the receiver from choosing a credit of 1 billion.
However the credit value is merely a **maximum**:
> The link-credit variable defines the current maximum legal amount that the delivery-count can be increased by.
Therefore, the server is not required to send all available messages to this
receiver.
For delivery-count:
> Only the sender MAY independently modify this field.
"independently" could be interpreted as the sender could add to the delivery-count
irrespective of what the client chose for drain and link-credit.
Option 3: The queue proc could, at credit time, already consume credit
and advance the delivery-count if the credit is too large, before checking out any messages.
For example, if credit is 100k but the queue only wants to send 1k, the queue could
consume 99k of the credits, advance the delivery-count, and subsequently send at most 1k messages.
If the queue advanced the delivery-count, RabbitMQ must send a FLOW to the receiver,
otherwise the receiver wouldn’t know that it ran out of link-credit.
Pros:
* Very simple
Cons:
* Possibly unexpected behaviour for receiving AMQP clients
* Possibly poor end-to-end throughput in auto-ack mode because the queue
would send a batch of messages followed by a FLOW containing the advanced
delivery-count. Only thereafter will the client learn that it ran out of
credits and top up again. This feels like synchronously pulling a batch
of messages. In contrast, option 2 sends out more messages as soon as
the previous messages have left RabbitMQ, without requiring another credit
top-up from the receiver.
* Drain mode with large credits requires the queue to send all available
messages and only thereafter advance the delivery-count. Therefore,
drain mode breaks option 3 somewhat.
### Option 4
Session proc drops message payload when its outgoing-pending queue gets
too large and re-reads payloads from the queue once the message can be
sent (see `get_checked_out` Ra command for quorum queues).
Cons:
* Would need to be implemented for every queue type, especially classic queues
* Doesn't limit the amount of message metadata in the session proc's
outgoing-pending queue
### Decision: Option 2
This commit implements option 2 to avoid any queue type modification.
At most one credit request is in-flight between session process and
queue process for a given queue consumer.
If the AMQP client sends another FLOW in between, the session proc
stashes the FLOW until it processes the previous credit reply.
A delivery is only sent from the outgoing-pending queue if the
session proc is not blocked by
1. writer proc, or
2. remote-incoming-window
The credit reply is placed into the outgoing-pending queue.
This ensures that the session proc will only top up the next batch of
credits if sufficient messages were sent out to the writer proc.
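A minimal sketch of the "at most one in-flight credit request" bookkeeping described above
(map keys and the stubbed helper are illustrative):
```erl
%% Illustrative stub: in the real code this issues a credit request towards
%% the queue process via the queue type interface.
request_credit_from_queue(_Flow) -> ok.

handle_client_flow(Flow, #{credit_req_in_flight := true} = Consumer) ->
    %% a credit request is already on its way to the queue: stash this FLOW
    Consumer#{stashed_flow => Flow};
handle_client_flow(Flow, Consumer) ->
    ok = request_credit_from_queue(Flow),
    Consumer#{credit_req_in_flight => true}.
```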
A future commit could additionally have each queue limit the number of
unacked messages for a given AMQP consumer, or alternatively make use
of session outgoing-window.
Put the `credit_flow_default_credit` configuration into a persistent term so
that the tuple doesn't have to be copied on the hot path.
Also, change persistent term keys from `{rabbit, AtomKey}` to `AtomKey`
so that hashing becomes cheaper.
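Illustratively (the fallback tuple below is a placeholder, not necessarily the configured default):
```erl
%% At boot: cache the configured tuple under a plain atom key.
Default = application:get_env(rabbit, credit_flow_default_credit, {400, 200}),
persistent_term:put(credit_flow_default_credit, Default),

%% On the hot path: a cheap lookup, no tuple copied from the application env.
{InitialCredit, _MoreCreditAfter} =
    persistent_term:get(credit_flow_default_credit),
```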
In certain shutdown scenarios this function on
Erlang 26 runs into exceptions that stem from
application_controller.
The objective of this function is to be
an exception-safe version of
application:which_applications/1, so let's
handle more cases.
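A hedged sketch of such a wrapper (the catch-all here may be broader than what the real helper does):
```erl
%% Return [] instead of letting application_controller exceptions propagate,
%% e.g. during system shutdown.
which_applications() ->
    try
        application:which_applications(infinity)
    catch
        _:_ -> []
    end.
```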
This helps certain test suites avoid exceptions
(process crash reports) logged during shutdown,
which make CT helpers fail the test run even
though there were no exceptions in RabbitMQ
itself; all the exception indicates is a
certain edge case (during system shutdown)
that application_controller does not care to handle.
[Why]
This process failed to properly implement the OTP principles. For
instance, the main loop always kept a reference to the module because it
was not tail-recursive.
This prevents the module from being reloaded at runtime: because the
process always keeps that reference to the module, it is killed by the
code server as part of code reloading.
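The usual fix is a fully-qualified, tail-recursive loop; an illustrative contrast (not the
actual module):
```erl
%% Problematic shape: the recursive call is not in tail position, so the
%% process always has a frame referring to the current module version. When
%% the module is loaded twice, the code server kills the process.
loop_old(State) ->
    Msg = receive M -> M end,
    loop_old(handle(Msg, State)),
    ok.

%% Reload-friendly shape: a fully-qualified tail call (loop_new/1 must be
%% exported); the process switches to newly loaded code on the next iteration.
loop_new(State) ->
    Msg = receive M -> M end,
    ?MODULE:loop_new(handle(Msg, State)).

handle(_Msg, State) -> State.   %% illustrative stub
```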
[Why]
Before, the backend would always return a list of nodes and the
subsystem would select one based on their uptimes, the nodes they are
already clustered with, and the readiness of their database.
This works well in general but has some limitations. For instance with
the Consul backend, the discoverability of nodes depends on when each
one registered and in which order. Therefore, the node with the highest
uptime might not be the first that registers. In this case, the one that
registers first will only discover itself and boot as a standalone node.
However, the one with the highest uptime that registered after will
discover both nodes. It will then select itself as the node to join
because it has the highest uptime. In the end both nodes form distinct
clusters.
Another example is the Kubernetes backend. The current solution works
fine but it could be optimized: the backend knows we always want to join
the first node ("$node-0") regardless of the order in which they are
started because picking the first node alphabetically is fine.
Therefore we want to let the backend select the node to join if it
wants.
[How]
The `list_nodes()` callback can now return the following term:
{ok, {SelectedNode :: node(), NodeType}}
If the subsystem sees this return value, it will consider that the
returned node is the one to join. It will still query properties because
we want to make sure the node's database is ready before joining it.
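For example, a backend that already knows which node should be joined could return (node name
and type are illustrative):
```erl
list_nodes() ->
    %% New-style return value: the backend itself selects the node to join.
    {ok, {'rabbit@myrabbit-server-0', disc}}.
```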
[Why]
The two backends that use registration are Consul and etcd. The
discovery process relies on the registered nodes: they return whatever
was previously registered.
With the new checks and failsafes added in peer discovery in RabbitMQ
3.13.0, the fact that registration happens after running discovery
breaks the Consul and etcd backends.
It used to work before because the first node would eventually time out
waiting for a non-empty list of nodes from the backend and proceed as a
standalone node, registering itself on the way. Following nodes would
then discover that first node.
Among the new checks, the node running discovery expects to find itself
in the list of discovered nodes. Because it hasn't registered yet, it will
never find itself.
[How]
The solution is to register first, then run discovery. The node should
at least get itself in the discovered nodes.