[Why]
The `rabbit_khepri` module grew during the work to add Khepri support to
RabbitMQ and while Khepri was itself written. The current code is
therefore unorganized.
[How]
This commit tries to change proxy functions to be closer to their Khepri
equivalents.
The module continues to set non-default options for write functions. We
also add variants that take an options map, to be consistent and to
avoid having to deal with that in the future.
Several legacy functions were removed, either because they were no
longer called or because they were replaced by a regular Khepri call.
[Why]
The `rabbit_khepri` module grew during the work to add Khepri support to
RabbitMQ and while Khepri was itself written. The current code is
therefore unorganized.
[How]
This commit tries to sort the code that manages the setup of Khepri and
the functions that deal with the Khepri cluster. It also groups functions
which provide support for CLI commands.
It also adds documentation to several functions.
Finally, when a node joins a cluster, we stop displaying the content of
the Khepri tree.
It's a tradeoff between building the map for each incoming and outgoing
message (now that there are also outgoing interceptors) vs increased
memory usage for the MQTT proc state.
Connecting with MQTT 5.0 and client ID "xxxxxxxx", the proc state size
is 201 words before this commit vs 235 words after, as determined by:
```
S = sys:get_state(MQTTConnectionPid),
erts_debug:size(S).
```
Therefore, this commit requires 34 words * 8 bytes = 272 bytes more per MQTT
connection, that is 272 MB more for 1,000,000 MQTT connections.
1. Force the config for timestamp and routing node message interceptors
to be configured with the overwrite boolean() to avoid defining
multiple default values throughout the code.
2. Add type specs
3. Extend existing test case for new MQTT client ID interceptor
4. routing node and timestamp should only set the annotation for
incoming_message_interceptors group
5. Fix `rabbitmq.conf`.
Prior to this commit there were several issues:
a.) Setting the right configuration was too user-unfriendly, e.g. the user had to set
```
message_interceptor.incoming.rabbit_mqtt_message_interceptor_client_id.annotation_key = x-opt-mqtt-client-id
```
just to enable the MQTT message interceptor.
b.) The code that parses the configuration was too difficult to understand
c.) The MQTT plugin was setting the application environment for the rabbit app, which is an anti-pattern
d.) Disabling a plugin (e.g. MQTT) left its message interceptors in place
This is now all fixed, the user sets the rabbitmq.conf as follows:
```
message_interceptors.incoming.set_header_timestamp.overwrite = true
message_interceptors.incoming.set_header_routing_node.overwrite = false
mqtt.message_interceptors.incoming.set_client_id_annotation.enabled = true
```
Note that the first two lines use the same format as for RabbitMQ 4.0
for backwards compatibility. The last line (MQTT) follows a similar
pattern.
This commit enables users to provide custom message interceptor modules,
i.e. modules that process incoming and outgoing messages. The
`rabbit_message_interceptor` behaviour defines an `intercept/4` callback
for those modules to implement.
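As a rough illustration only (the callback argument names and the
`mc:set_annotation/3` usage below are assumptions, not the documented
callback contract), an interceptor module implementing the behaviour
could look roughly like this:
```
-module(my_message_interceptor).
-behaviour(rabbit_message_interceptor).

-export([intercept/4]).

%% Sketch only: the argument names and order are assumptions; consult the
%% rabbit_message_interceptor behaviour for the real callback contract.
intercept(Msg, _Context, _Group, _Config) ->
    %% Example transformation: stamp the message with a custom annotation.
    mc:set_annotation(<<"x-opt-example">>, <<"intercepted">>, Msg).
```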
Co-authored-by: Péter Gömöri <gomoripeti@users.noreply.github.com>
[Why]
The intent is to have it stable and enabled by default for new
deployments in RabbitMQ 4.1.x.
To prepare for this goal, it is time to mark the feature flag as stable
to let us iron out the library and its integration into RabbitMQ.
This is not a commitment at this stage: we will revisit this near the
beginning of the release cycle and commit to it or revert to
experimental.
Key changes:
- endpoint variable to handle scraping multiple endpoints
- message size panels (new metric in 4.1)
- panels at the top of the Overview dashboard should be more up to date
(they show the latest value)
- values should be accurate if multiple endpoints are scraped
(previously, many would be doubled)
- Nodes table shows fewer columns and shows node uptime
When debug logging is enabled, we log something at each log level
to test whether logs are emitted. I don't think this is particularly useful,
but it's certainly annoying, because I constantly need to filter
out these logs when checking whether any errors happened during tests.
This auth backend behaves the same as the internal backend provided in
the core broker, but it only accepts loopback connections. External
connection attempts will receive an error.
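A hedged `advanced.config` sketch of how such a backend might be enabled;
the backend module name below is an assumption for illustration, not
necessarily the plugin's real module name:
```
%% advanced.config (sketch). The module name is assumed for illustration.
[
 {rabbit, [
   {auth_backends, [rabbit_auth_backend_internal_loopback]}
 ]}
].
```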
This ensures that quorum_queues shuts down _before_
coordination, which Khepri runs inside.
Quorum queues depend on Khepri, so they need to be shut down first.
[Why]
The feature flag enable function is called during the initial migration
or when a node is later added to a cluster.
In this latter situation, the cluster is already formed and the Mnesia
tables were already migrated. Syncing the cluster in this specific
situation might kick another node that is currently unreachable.
[How]
If the node running the enable function is already clustered, we skip
the cluster sync.
[Why]
All callers of `khepri_adv` and `khepri_tx_adv` need updates to handle
the now uniform return type of `khepri:node_props_map()` in Khepri
0.17.0.
[How]
We don't need any compatibility code to handle "either the old return
type or the new return type" from the khepri_adv API because the
translation is done entirely in the "client side" code in Khepri -
meaning that the return value from the Ra server is the same but it is
translated differently by the functions in `khepri_adv`.
However, we need to adapt transaction functions because they may be
executed on different versions of Khepri and the behaviour of
`khepri_tx_adv` can be different. To take the possible change of return
value format, we use the new `khepri_tx:does_api_comply_with/1` to know
what to expect.
[Why]
In Khepri 0.17.0, `khepri_cluster:locally_known_members/1` and
`khepri_cluster:locally_known_node/1` were replaced with
`khepri_cluster:members/2` and `khepri_cluster:nodes/2` with `favor` set
to `low_latency` - this matches the interface for queries in Khepri.
Move leader repair earlier in the tick function to ensure a more
timely update of the metadata store record after a leader change.
Also use the RPC_TIMEOUT macro for metric/stats multicalls to improve
liveness when a node is connected but partitioned / frozen.
This should address crashes like this one (found in a user's logs):
```
exception error: no case clause matching
[[{connection_details,[]},
{name,<<"10.0.13.41:50497 -> 10.2.230.128:5671 (1)">>},
{node,rabbit@foobar},
{number,1},
{user,<<"...">>},
{user_who_performed_action,<<"...">>},
{vhost,<<"/">>}],
[{connection_details,[]},
{name,<<"10.0.13.41:50142 -> 10.2.230.128:5671 (1)">>},
{node,rabbit@foobar},
{number,1},
{user,<<"...">>},
{user_who_performed_action,<<"...">>},
{vhost,<<"/">>}]]
in function rabbit_federation_mgmt:format/3 (rabbit_federation_mgmt.erl, line 100)
in call from rabbit_federation_mgmt:'-status/3-lc$^0/1-0-'/4 (rabbit_federation_mgmt.erl, line 89)
in call from rabbit_federation_mgmt:'-status/4-lc$^0/1-0-'/3 (rabbit_federation_mgmt.erl, line 82)
in call from rabbit_federation_mgmt:'-status/4-lc$^0/1-0-'/3 (rabbit_federation_mgmt.erl, line 82)
in call from rabbit_federation_mgmt:status/4 (rabbit_federation_mgmt.erl, line 82)
in call from rabbit_federation_mgmt:to_json/2 (rabbit_federation_mgmt.erl, line 57)
in call from cowboy_rest:call/3 (src/cowboy_rest.erl, line 1590)
in call from cowboy_rest:set_resp_body/2 (src/cowboy_rest.erl, line 1473)
```
## What?
This commit determines the queue topology without checking the queue type.
## Why?
This way, checking leader and replicas works the same across all queue
types without the need to introduce other rabbit_queue_type behaviour as
suggested in other PRs.
## How?
The pid is the leader; the nodes in queue_type_states are the members/replicas.
This commit results in an unknown stream leader during queue
declaration. However, the correct leader will be returned eventually when
calling GET on the stream.
This allows restricting access to the /api/index.html and
the /cli/index.html pages to authenticated users should the
user really want to. This can be enabled via advanced.config.
A connection which terminated before it was fully established
would lead to a function_clause, since the metadata needed to
call notify_connection_closed is not available. We can just ignore such
connections and not notify about them.
Resolves https://github.com/rabbitmq/rabbitmq-server/discussions/13670
The same connection can contain several consumers belonging to a SAC
group (group key = vhost + stream + consumer name). The whole new group
must be re-evaluated to select a new active consumer after the consumers
of the down connection are removed from it.
The previous behavior would not re-evaluate the new group and could
select a consumer from the down connection, leaving the group with only
inactive consumers, as the selected active consumer would never receive
the activation message from the stream SAC coordinator.
This commit fixes this problem by removing the consumers of the down
connection from the affected groups and then performing the
appropriate operations for the groups to keep on consuming (e.g.
notifying an active consumer that it needs to step down).
References #13372
Ra improvements:
* Don't allow a non-voter to start elections
* Register with ra directory before initialising ra server.
* Trigger tick_timeout immediately after entering leader state.
* Set a configurable segment max size
This commit also includes a change that turns the quorum queue
'become leader' callback into a noop and instead relies on
the more prompt tick handler to handle the metadata store
update after a leader election.
This more prompt tick update means there should be a much shorter
gap between the queue metrics being deleted from the old leader
node and them becoming available again on the new node, resulting
in smoother message count metrics.
Fix test that relied on waiting on too simplistic a property
before asserting.
Prior to this commit, when a client consumed from an unavailable quorum
queue, the following crash occurred:
```
{badmatch,{error,noproc}}
[{rabbit_quorum_queue,consume,3,[{file,\"rabbit_quorum_queue.erl\"},{line,993}]}
```
This commit fixes this bug by returning any error when registering a
quorum queue consumer to rabbit_queue_type.
This commit also refactors errors returned by
rabbit_queue_type:consume/3 to simplify and ensure separation of
concerns.
For example, prior to this commit, the channel did error
formatting specifically for consuming from streams. It's better if
the channel is unaware of what queue type it consumes from and each
queue type implementation formats its own errors.
Bump the timeout for management operations and link attachments from 20s
to 30s. We've seen timeouts in CI.
We bump the poll interval of the `?awaitMatch` macro because CI
sometimes flaked by crashing in
0e803de6dd/deps/rabbitmq_amqp_client/src/rabbitmq_amqp_client.erl (L411)
which indicates that the client lib received a response from a previous
request.
To take more frequent checkpoints for large message workloads
Lower the min_checkpoint_interval substantially to allow quorum queues
better control over when checkpoints are taken.
Track bytes enqueued in the aux state and suggest a checkpoint after
every 64MB enqueued (this value is scaled according to backlog just
like the indexes condition).
This should help with more timely checkpointing when very large
messages are used.
Try evaluating byte size independently of the time window.
Also increase the max size.
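A minimal sketch of the byte-tracking idea, with an assumed aux-state
record and names (illustration only, not the actual quorum queue code):
```
-module(checkpoint_bytes_sketch).
-export([handle_enqueue/2]).

%% Suggest a checkpoint once roughly 64MB have been enqueued since the
%% last checkpoint. Record and names are assumptions for illustration.
-define(CHECKPOINT_BYTES, 64 * 1024 * 1024).

-record(aux, {bytes_since_checkpoint = 0 :: non_neg_integer()}).

handle_enqueue(MsgSize, #aux{bytes_since_checkpoint = Bytes0} = Aux) ->
    Bytes = Bytes0 + MsgSize,
    case Bytes >= ?CHECKPOINT_BYTES of
        true ->
            {suggest_checkpoint, Aux#aux{bytes_since_checkpoint = 0}};
        false ->
            {no_checkpoint, Aux#aux{bytes_since_checkpoint = Bytes}}
    end.
```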
Delayed queues can automatically create associated Shovels to transfer Ready messages
to the desired destination. This adds a forwarded-messages counter which will be used
in the Management UI for better Shovel internals visibility.
(cherry picked from commit a8800b6cd75d8dc42a91f88655058f2ffa3b6ea6)
for the check introduced in #13487.
Note that encoding a regular expression pattern
with percent encoding is a pain (e.g. '.*' = '.%2a'),
so these endpoints fall back to a default pattern
value that matches all queues.
This commit fixes a bug in the Erlang AMQP 1.0 client.
Prior to this commit, to repro this bug:
1. Send more than 2^16 messages to a queue.
2. Grant more than a total of 2^16 link credit initially (on a single link
or across multiple links) on a single session without any
auto or manual link credit renewal.
The expectation is that thanks to sufficiently granted initial link-credit,
the client will receive all messages.
However, consumption stops after exactly 2^16-1 messages.
That's because the client lib was never sending a flow frame to the server.
So, after the client received all 2^16-1 messages (the initial
incoming-window set by the client), the server's remote-incoming-window
reached 0 causing the server to stop delivering messages.
The expectation is that the client lib automatically handles session
flow control without any manual involvement of the client app.
This commit implements this fix:
* We keep the server's remote-incoming window always large by default as
explained in https://www.rabbitmq.com/blog/2024/09/02/amqp-flow-control#incoming-window
* Hence, the client lib sets its incoming-window to 100,000 initially.
* The client lib tracks its incoming-window, decrementing it by 1 for
every transfer it receives. (This wasn't done prior to this commit.)
* Whenever this window shrinks below 50,000, the client sends a flow
frame without any link information, widening its incoming-window back to
100,000 (see the sketch after this list).
* For test cases (maybe later for apps as well), there is a new function
`amqp10_client_session:flow/3`, which allows for a test case to do manual
session flow control. Its API is designed very similar to
`amqp10_client_session:flow_link/4` in that the test can optionally request
the lib to auto widen the session window whenever it falls below a certain threshold.
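A rough sketch of that window-tracking logic, using assumed state and
function names (the real code in `amqp10_client_session` is structured
differently):
```
-module(session_flow_sketch).
-export([handle_transfer/1]).

%% Sketch only: the state record and send_session_flow/1 are assumptions
%% for illustration; the real amqp10_client_session code differs.
-define(INITIAL_INCOMING_WINDOW, 100000).
-define(RENEW_THRESHOLD, 50000).

-record(session, {incoming_window = ?INITIAL_INCOMING_WINDOW :: non_neg_integer()}).

%% Called for every transfer received on the session.
handle_transfer(#session{incoming_window = Win0} = Session) ->
    Win = Win0 - 1,
    case Win < ?RENEW_THRESHOLD of
        true ->
            %% Widen the window and notify the server with a session flow
            %% frame that carries no link information.
            send_session_flow(?INITIAL_INCOMING_WINDOW),
            Session#session{incoming_window = ?INITIAL_INCOMING_WINDOW};
        false ->
            Session#session{incoming_window = Win}
    end.

send_session_flow(_NewIncomingWindow) ->
    %% Placeholder: the real client encodes and sends an AMQP flow performative.
    ok.
```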
This is a squashed commit that includes the following changes by @efimov90:
* Initial-theme-fix
Added light.css
Added dark.css
Added link for light.css and dark.css with media attribute
Added switcher
* Rework-light-style
* dark theme
* Removed not needed div
* Fix folder name
* Color scheme fix
Removes color-scheme from main.css
Added color-scheme: dark to dark.css
Added color-scheme: light to light.css
* Fixed theme switch bug with sammy.js
Adapts code to work with sammy.js
* Icons update
* Reworked theme switcher
* Fix updating attributes
---------
Authored-by: Sergey Efimov <efimov90@gmail.com>
[Why]
Khepri already manages retries if needed; we can just use a timeout.
Note that the timeout was already bumped to a more appropriate 5
minutes, which also matches what we had with Mnesia. However, with 10
retries by default, the effective timeout at the end of `init/1` would
be 5 * 10 = 50 minutes.
This avoids using Mix while compiling, which simplifies
a number of things and lets us do further build improvements
later on.
Elixir is only enabled from within rabbitmq_cli currently.
Eunit is disabled since there are only Elixir tests.
Dialyzer will force-enable Elixir in order to process
Elixir-compiled beam files.
This commit also includes a few changes that are
related:
* The Erlang distribution will now be started for parallel-ct
* Many unnecessary PROJECT_MOD lines have been removed
* `eunit_formatters` has been removed, it provides little value
* The new `maybe_flock` Erlang.mk function is used where possible
* Build test deps when testing rabbitmq_cli (Mix won't do it anymore)
* rabbitmq_ct_helpers now use the early plugins to have Dialyzer
properly set up
It also happens from time to time that HTTP clients use the wrong port,
5672. As with TLS clients connecting to 5672, RabbitMQ now prints a
more descriptive log message.
For example
```
curl http://localhost:5672
```
will log
```
[info] <0.946.0> accepting AMQP connection [::1]:57736 -> [::1]:5672
[error] <0.946.0> closing AMQP connection <0.946.0> ([::1]:57736 -> [::1]:5672, duration: '1ms'):
[error] <0.946.0> {detected_unexpected_http_header,<<"GET / HT">>}
```
We only check here for GET and not for all other HTTP methods, since
that's the most common case.
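Illustration only (the actual check lives in the AMQP reader and
differs): recognizing an HTTP GET request arriving on the AMQP port
boils down to a binary pattern match on the first bytes, roughly:
```
-module(http_on_amqp_port_sketch).
-export([looks_like_http_get/1]).

%% Sketch: the AMQP handshake reads 8 bytes; "GET / HT" is what an HTTP
%% GET request looks like in that window.
looks_like_http_get(<<"GET ", _/binary>>) -> true;
looks_like_http_get(_) -> false.
```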
## What?
If a TLS client app is misconfigured, trying to connect to the AMQP port 5672
instead of the AMQPS port 5671, this commit makes RabbitMQ log a more
descriptive error message.
```
openssl s_client -connect localhost:5672 -tls1_3
openssl s_client -connect localhost:5672 -tls1_2
```
RabbitMQ logs prior to this commit:
```
[info] <0.1073.0> accepting AMQP connection [::1]:53535 -> [::1]:5672
[error] <0.1073.0> closing AMQP connection <0.1073.0> ([::1]:53535 -> [::1]:5672, duration: '0ms'):
[error] <0.1073.0> {bad_header,<<22,3,1,0,192,1,0,0>>}
[info] <0.1080.0> accepting AMQP connection [::1]:53577 -> [::1]:5672
[error] <0.1080.0> closing AMQP connection <0.1080.0> ([::1]:53577 -> [::1]:5672, duration: '1ms'):
[error] <0.1080.0> {bad_header,<<22,3,1,0,224,1,0,0>>}
```
RabbitMQ logs after this commit:
```
[info] <0.969.0> accepting AMQP connection [::1]:53632 -> [::1]:5672
[error] <0.969.0> closing AMQP connection <0.969.0> ([::1]:53632 -> [::1]:5672, duration: '0ms'):
[error] <0.969.0> {detected_unexpected_tls_header,<<22,3,1,0,192,1,0,0>>}
[info] <0.975.0> accepting AMQP connection [::1]:53638 -> [::1]:5672
[error] <0.975.0> closing AMQP connection <0.975.0> ([::1]:53638 -> [::1]:5672, duration: '1ms'):
[error] <0.975.0> {detected_unexpected_tls_header,<<22,3,1,0,224,1,0,0>>}
```
## Why?
I've seen numerous occurrences in the past few years where misconfigured TLS apps
connected to the wrong port. Therefore, RabbitMQ trying to detect a TLS client
and providing a more descriptive log message seems appropriate to me.
## How?
The first few bytes of any TLS connection are:
Record Type (1 byte):
Always 0x16 (22 in decimal) for a Handshake message.
Version (2 bytes):
This represents the highest version of TLS that the client supports. Common values:
0x0301 → TLS 1.0 (or SSL 3.1)
0x0302 → TLS 1.1
0x0303 → TLS 1.2
0x0304 → TLS 1.3
Record Length (2 bytes):
Specifies the length of the following handshake message.
Handshake Type (1 byte, usually the 6th byte overall):
Always 0x01 for ClientHello.
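Illustration only (not the actual reader code): the byte layout above
means a TLS handshake can be recognized from the first bytes with a
binary pattern such as:
```
-module(tls_on_amqp_port_sketch).
-export([looks_like_tls_handshake/1]).

%% Sketch: content type 22 (handshake) followed by a 3,x record version,
%% as in the <<22,3,1,...>> headers shown in the logs above.
looks_like_tls_handshake(<<22, 3, Minor, _/binary>>) when Minor =< 4 ->
    true;
looks_like_tls_handshake(_) ->
    false.
```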
Before the client authenticates, the standard
frame_max is not used. Instead, the limit is
a special constant.
This is fine for password or x.509 certificate-based
authentication but not for some JWT tokens,
which can vary in size, and take multiple
kilobytes.
8 kB specifically is the default HTTP header
length limit used by Nginx.
Sounds like this value was good enough
for a lot of Bearer headers with JWT tokens.
Closes #13541.
It is very hard right now to distinguish different tabs. With this addition
we get titles like 'RabbitMQ - Queue vhost/name' and 'RabbitMQ - Exchanges'.
To be continued...
otherwise we end up with two copies of the compiled
module on the code path some of the time.
We don't need to mix Erlang and Elixir even
more to bring in one constant that hasn't changed
since its introduction some eight years ago.
(cherry picked from commit c32b948258f226a86be91cab80448d7a536afd7d)
RMQ-1263: Add a --force option to rabbitmqctl delete_queue command.
This work was originally done by Iliia Khaprov <iliia.khaprov@broadcom.net>.
---------
Co-authored-by: Iliia Khaprov <iliia.khaprov@broadcom.net>
Co-authored-by: Michael Klishin <klishinm@vmware.com>
(cherry picked from commit d9522d3ee708250cc84443af5c3556b14f7c5ab9)
* RMQ-1263: Check if a queue is protected from deletion inside rabbit_amqqueue:with_delete
Delayed exchange automatically manages associated Delayed Queue. We don't want users to delete it accidentally.
If the queue is indeed protected, its removal can be forced by calling with
?INTERNAL_USER as the ActingUser.
* RMQ-1263: Correct a type spec of amqqueue:internal_owner/1
* RMQ-1263: Add protected queues test
---------
Co-authored-by: Iliia Khaprov <iliia.khaprov@broadcom.net>
Co-authored-by: Michael Klishin <klishinm@vmware.com>
(cherry picked from commit 97f44adfad6d0d98feb1c3a47de76e72694c19e0)
[Why]
The CLI sometimes crashes early because it fails to configure the Erlang
distribution.
Because we use two CLI commands to watch the start of RabbitMQ, if one
of them fails, the Make recipe will exit with an error, leaving the
RabbitMQ node running.
[How]
We use a shell trap to stop the node if the shell is about to exit with
an error.
While here, we retry the `await_startup` CLI command several times
because this is the one failing the most. This is until the crash is
understood and a proper fix is committed.
... and cache it.
[Why]
It happens at least in CI that the computed start time varies by a few
seconds. I think this comes from the Erlang time offset which might be
adjusted over time.
This affects peer discovery's sorting of RabbitMQ nodes which uses that
start time to determine the oldest node. When the start time of a node
changes, it could be considered the seed node to join by some nodes but
ignored by the other nodes, leading to trouble with cluster formation.
[Why]
It happens in CI from time to time and it was crashing the channel
process. There is always a `channel.close` method pending in the
channel mailbox.
[How]
For now, log something and ignore the DOWN message. The channel will
exit after handling the pending `channel.close` method anyway.
* Implement rabbitmq-queues leader_health_check command for quorum queues
(cherry picked from commit c26edbef33)
* Tests for rabbitmq-queues leader_health_check command
(cherry picked from commit 6cc03b0009)
* Ensure calling ParentPID in leader health check execution and
reuse and extend formatting API, with amqqueue:to_printable/2
(cherry picked from commit 76d66a1fd7)
* Extend core leader health check tests and update badrpc error handling in cli tests
(cherry picked from commit 857e2a73ca)
* Refactor leader_health_check command validators and ignore vhost arg
(cherry picked from commit 6cf9339e49)
* Update leader_health_check_command description and banner
(cherry picked from commit 96b8bced2d)
* Improve output formatting for healthy leaders and support
silent mode in rabbitmq-queues leader_health_check command
(cherry picked from commit 239a69b404)
* Support global flag to run leader health check for
all queues in all vhosts on local node
(cherry picked from commit 48ba3e161f)
* Return immediately for leader health checks on empty vhosts
(cherry picked from commit 7873737b35)
* Rename leader health check timeout refs
(cherry picked from commit b7dec89b87)
* Update banner message for global leader health check
(cherry picked from commit c7da4d5b24)
* QQ leader-health-check: check_process_limit_safety before spawning leader checks
(cherry picked from commit 17368454c5)
* Log leader health check result in broker logs (if any leaderless queues)
(cherry picked from commit 1084179a2c)
* Ensure check_passed result for leader health internal calls
(cherry picked from commit 68739a6bd2)
* Extend CLI format output to process check_passed payload
(cherry picked from commit 5f5e9922bd)
* Format leader healthcheck result log and function exports
(cherry picked from commit ebffd7d8a4)
* Change leader_health_check command scope from queues to diagnostics
(cherry picked from commit 663fc9846e)
* Update (c) line year
(cherry picked from commit df82f12a70)
* Rename command to check_for_quorum_queues_without_an_elected_leader
and use across_all_vhosts option for global checks
(cherry picked from commit b2acbae28e)
* Use rabbit_db_queue for qq leader health check lookups
and introduce rabbit_db_queue:get_all_by_type_and_vhost/2.
Update leader health check timeout to 5s and process limit
threshold to 20% of node's process_limit.
(cherry picked from commit 7a8e166ff6)
* Update tests: quorum_queue_SUITE and rabbit_db_queue_SUITE
(cherry picked from commit 9bdb81fd79)
* Fix typo (cli test module)
(cherry picked from commit 615856853a)
* Small refactor - simpler final leader health check result return on function head match
(cherry picked from commit ea07938f3d)
* Clear dialyzer warning & fix type spec
(cherry picked from commit a45aa81bd2)
* Ignore result without strict match to avoid dialyzer warning
(cherry picked from commit bb43c0b929)
* 'rabbitmq-diagnostics check_for_quorum_queues_without_an_elected_leader' documentation edits
(cherry picked from commit 845230b0b380a5f5bad4e571a759c10f5cc93b91)
* 'rabbitmq-diagnostics check_for_quorum_queues_without_an_elected_leader' output copywriting
(cherry picked from commit 235f43bad58d3a286faa0377b8778fcbe6f8705d)
* diagnostics check_for_quorum_queues_without_an_elected_leader: behave like a health check w.r.t. error reporting
(cherry picked from commit db7376797581e4716e659fad85ef484cc6f0ea15)
* check_for_quorum_queues_without_an_elected_leader: handle --quiet and --silent
plus simplify function heads.
References #13433.
(cherry picked from commit 7b392315d5e597e5171a0c8196230d92b8ea8e92)
---------
Co-authored-by: Ayanda Dube <adube14@bloomberg.net>
observer_cli (and its dependency recon) was declared as a dependency
of rabbitmq_cli and as a consequence was included in all escripts. However,
the major part of observer_cli runs in the broker. The CLI side only
used `observer_cli:rpc_start/2`, which is just an rpc call into the
target node.
By using a plain rpc call we can remove observer_cli and recon from the
escripts. This can be considered a minor improvement based on the
philosophy "simpler is better".
As an additional benefit, auto-completion of recon functions
now works in `rabbitmq-diagnostics remote_shell`
(e.g. `recon:proc_c<TAB>`).
CI sometimes failed with the following error:
```
v5_SUITE:session_upgrade_v3_v5_qos failed on line 1068
Reason: {test_case_failed,Received unexpected PUBLISH payload. Expected: <<"2">> Got: <<"3">>}
```
The emqtt client auto-acks by default.
Therefore, if the Subv3 client was able to successfully auto-ack message 2
before Subv3 disconnected, the Subv5 client did not receive message 2.
This commit fixes this flake by making sure that Subv3 does not ack
message 2.
Fix crash in close_sent since the client might receive the open frame if
it previously sent the close frame in state open_sent.
We explicitly ignore the open frame. The alternative would be to add another
gen_statem state, CLOSE_PIPE, which might be overkill, however.
This commit also fixes a wrong comment: No sessions have begun if the
app requests the connection to be closed in state open_sent.
[Why]
This testsuite is very unstable and it is difficult to debug while it is
part of a `parallel-ct` group. It also forced us to re-run the entire
`parallel-ct` group just to retry that one testsuite.
The `buffer` socket option will be changed dynamically
based on how much data is received.
This is restricted to AMQP protocols (old and 1.0).
The algorithm is a little different from the one in Cowboy 2.13.
The moving average is less reactive (div 8 instead of 2)
and floats are used so that using smaller lower buffer
values is possible (otherwise the rounding prevents
increasing buffer sizes). The lower buffer size was
set to 128 as a result.
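A rough sketch of that moving-average idea, with assumed names and an
assumed headroom factor (illustration, not the shipped reader code):
```
-module(dynamic_buffer_sketch).
-export([update/2]).

-define(MIN_BUFFER, 128).
-define(MAX_BUFFER, 131072).

%% Sketch: smooth the observed read sizes with a float moving average
%% (div 8 style smoothing) and derive the next `buffer` size from it.
%% The 2x headroom factor is an assumption for illustration.
update(BytesReceived, MovingAvg0) ->
    MovingAvg = MovingAvg0 + (BytesReceived - MovingAvg0) / 8,
    Buffer = min(?MAX_BUFFER, max(?MIN_BUFFER, round(MovingAvg * 2))),
    {Buffer, MovingAvg}.
```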
Compared to the previous behavior, which was effectively to set
`buffer` to `rcvbuf`, often 131072 on Linux for example,
performance sees a slight improvement in various scenarios
for all message sizes using AMQP-0.9.1, as well as lower
memory usage. But
the difference is small in the benchmarks we have
run (5% to 10%), whereas Cowboy saw a huge improvement
because its default was very small (1460).
For AMQP-1.0 this seems to be no worse but we didn't
detect a clear improvement. We saw scenarios where
small message sizes showed improvement, and large
message sizes showed a regression. But we are even
less confident with these results. David (AMQP-1.0
native developer) ran a few tests and didn't see a
regression.
The dynamic buffer code is currently identical for
old and 1.0 AMQP. But we might tweak them differently
in the future, so they're left as duplicates for now.
This is because different protocols have different
behaviors and so the algorithm may need to be tweaked
differently for each protocol.
The `msg` record was used in 3.13. This commit makes 4.x understand
this record for backward compatibility, specifically for the rare case where:
1. a 3.13 node internally parsed a message from a stream via
```
Message = mc:init(mc_amqp, amqp10_framing:decode_bin(Bin), #{})
```
2. published this Message to a queue
3. RabbitMQ got upgraded to 4.x
(This commit can be reverted in some future RabbitMQ version once it's
safe to assume that these upgraded messages have been consumed.)
The changes were manually tested as described in Jira RMQ-1525.
We must consider whether the previous current file is empty
(has data written, but was already removed) when writing
large messages and opening a file specifically for the large
message. If we don't, then the file will never get deleted
as we only consider files for deletion when a message gets
removed (and there are none).
This is only an issue for large messages. Small messages
write a message and then roll over to a new file, so there is
at least one valid message. Large messages close the current
file first, regardless of there being a valid message.
Prior to this commit, if the WebSocket client received multiple
WebSocket frames in a single Erlang message by gen_tcp, the WebSocket
client sent only the first received WebSocket frame to the application.
This commit fixes this bug by having the WebSocket client send all
WebSocket frames to the application.
The `rabbit_registry` boot step starts up the `rabbit_registry` gen
server from `rabbit_common`. This is a registry somewhat similar to
the feature flag registry - it's meant to protect an ETS table used for
looking up implementers of behaviors. The registry and its ETS table
should be available as early as possible: the step should enable
external_infrastructure rather than require it.
The previous behaviour passed solely the message ID, making
queue implementations such as, for example, the priority one hard
to fulfil.
Signed-off-by: Matteo Cafasso <noxdafox@gmail.com>
(cherry picked from commit 1f7a27c51d)
Since https://github.com/rabbitmq/rabbitmq-server/pull/13242 updated
Cowlib to v2.14.0, this commit deletes rabbit_uri as written in the
comments of rabbit_uri.erl:
```
This file is a partial copy of
https://github.com/ninenines/cowlib/blob/optimise-urldecode/src/cow_uri.erl
We use this copy because:
1. uri_string:unquote/1 is lax: It doesn't validate that characters that are
required to be percent encoded are indeed percent encoded. In RabbitMQ,
we want to enforce that proper percent encoding is done by AMQP clients.
2. uri_string:unquote/1 and cow_uri:urldecode/1 in cowlib v2.13.0 are both
slow because they allocate a new binary for the common case where no
character was percent encoded.
When a new cowlib version is released, we should make app rabbit depend on
app cowlib calling cow_uri:urldecode/1 and delete this file (rabbit_uri.erl).
```
Cowboy 2.13 contains the Websocket optimisations as well
as the ability to set the Websocket max_frame_size option
dynamically, plus plenty of other improvements.
Cowlib was added as a test dep to rabbitmq_mqtt to make
sure emqtt doesn't pull the wrong Cowlib version for Cowboy.
Only large messages delivered to multiple CQs are stored once for
multiple queues.
Non-durable queues are deprecated and will be removed, so don't even
mention them.
We don't "page out" messages anymore.
When trying to use OTP28.0-rc1, Elixir fails to compile these modules
because a module attribute cannot be a regex. It is not yet clear
whether it's something to be fixed in Elixir for OTP28 compatibility
or something that accidentally worked in the past, but either way,
using a string as an attribute is equally good and works on all OTP
versions, including OTP28.0-rc1.
```
== Compilation error in file lib/rabbitmq/cli/core/command_modules.ex ==
** (ArgumentError) cannot inject attribute @commands_ns into function/macro because cannot escape #Reference<0.2201422310.1333657602.13657>. The supported values are: lists, tuples, maps, atoms, numbers, bitstrings, PIDs and remote functions in the format &Mod.fun/arity
(elixir 1.18.2) lib/kernel.ex:3729: Kernel.do_at/5
(elixir 1.18.2) expanding macro: Kernel.@/1
lib/rabbitmq/cli/core/command_modules.ex:133: RabbitMQ.CLI.Core.CommandModules.make_module_map/2
```
Since 4.0.0 (commit d45fbc3d) the shared message store writes large
messages into their own rdq files. This information can be utilised
when scanning rdq files during recovery to avoid reading the whole
message body into memory unnecessarily.
This commit addresses the same issue that was addressed in 3.13.x by
commit baeefbec (i.e. appending a large binary together from 4MB chunks
leaves a lot of garbage and memory fragmentation behind) but even more
efficiently.
Large messages which were written before 4.0.0, which don't fully fill
the rdq file, are still handled as before.
```
make -C deps/rabbit ct-rabbit_stream_queue t=cluster_size_3_parallel_1 RABBITMQ_METADATA_STORE=mnesia
```
flaked prior to this commit locally on Ubuntu with the following error after 11 runs:
```
rabbit_stream_queue_SUITE > cluster_size_3_parallel_1 > consume_from_replica
{error,
{{shutdown,
{server_initiated_close,406,
<<"PRECONDITION_FAILED - stream queue 'consume_from_replica' in vhost '/' does not have a running replica on the local node">>}},
{gen_server,call,
[<0.8365.0>,
{subscribe,
{'basic.consume',0,<<"consume_from_replica">>,
<<"ctag">>,false,false,false,false,
[{<<"x-stream-offset">>,long,0}]},
<0.8151.0>},
infinity]}}}
```
Initialising a message container from data stored in a
stream is a special case where we need to recover exchange
and routing key information from the following message annotations:
* x-exchange
* x-routing-keys
* x-cc
We do not want to do this when initialising a message container
from AMQP data just received from a publisher.
This commit introduces a new function `mc_amqp:init_from_stream/2`
that is to be used when needing a message container from a stream
message.
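A hedged usage sketch, assuming the two arguments are the decoded AMQP
sections and the stream message annotations (argument shapes are guesses,
not the documented API):
```
-module(stream_mc_sketch).
-export([to_mc/2]).

%% Sketch only: argument shapes are assumptions. Bin is the raw AMQP
%% payload read from the stream; Anns carries the recovered routing
%% information (x-exchange, x-routing-keys, x-cc).
to_mc(Bin, Anns) ->
    Sections = amqp10_framing:decode_bin(Bin),
    mc_amqp:init_from_stream(Sections, Anns).
```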