The rabbit_fifo_dlx_worker should be co-located with the quorum
queue leader.
If a new leader on a different node gets elected before the
rabbit_fifo_dlx_worker initialises (i.e. registers itself as a
consumer), it should stop itself normally, such that it is not restarted
by rabbit_fifo_dlx_sup.
Another rabbit_fifo_dlx_worker should be created on the new quorum
queue leader node.
Previously, it used the default intensity:
"intensity defaults to 1 and period defaults to 5."
However, that is quite low given there can be dozens or hundreds of DLX
workers: if only 2 fail within 5 seconds, the whole supervisor
terminates.
Even with the new values, there shouldn't be an infinite loop of the
supervisor terminating and restarting children, because the
rabbit_fifo_dlx_worker is terminated and started very quickly
given that the (slow) consumer registration happens in
rabbit_fifo_dlx_worker:handle_continue/2.
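For illustration, raising the restart intensity is expressed in a supervisor's `init/1` roughly as below; the strategy, numbers and child spec here are only a sketch, not necessarily the values this commit picks:
```
init([]) ->
    SupFlags = #{strategy => simple_one_for_one,
                 intensity => 10,   %% allow up to 10 worker restarts ...
                 period => 1},      %% ... within any 1-second window
    ChildSpec = #{id => rabbit_fifo_dlx_worker,
                  start => {rabbit_fifo_dlx_worker, start_link, []},
                  restart => transient},
    {ok, {SupFlags, [ChildSpec]}}.
```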
[Jepsen dead lettering tests](5977f587e2/jepsen/scripts/qq-jepsen-test.sh (L108))
of job `qq-jepsen-test-3-12` of Concourse pipeline `jepsen-tests`
sometimes fail with the following error:
```
{{:try_clause, [{:undefined, #PID<12128.3596.0>, :worker, [:rabbit_fifo_dlx_worker]}, {:undefined, #PID<12128.10212.0>, :worker, [:rabbit_fifo_dlx_worker]}]}, [{:erl_eval, :try_clauses, 10, [file: 'erl_eval.erl', line: 995]}, {:erl_eval, :exprs, 2, []}]}
```
At the end of the Jepsen test, there are 2 DLX workers on the same node.
Analysing the logs reveals the following:
Source quorum queue node becomes leader and starts its DLX worker:
```
2023-03-18 12:14:04.365295+00:00 [debug] <0.1645.0> started rabbit_fifo_dlx_worker <0.3596.0> for queue 'jepsen.queue' in vhost '/'
```
Less than 1 second later, Mnesia reports a network partition (introduced
by Jepsen).
The DLX worker does not succeed in registering as a consumer of its source quorum queue because the Ra command times out:
```
2023-03-18 12:15:04.365840+00:00 [warning] <0.3596.0> Failed to process command {dlx,{checkout,<0.3596.0>,32}} on quorum queue leader {'%2F_jepsen.queue',
2023-03-18 12:15:04.365840+00:00 [warning] <0.3596.0> 'rabbit@concourse-qq-jepsen-312-3'}: {timeout,
2023-03-18 12:15:04.365840+00:00 [warning] <0.3596.0> {'%2F_jepsen.queue',
2023-03-18 12:15:04.365840+00:00 [warning] <0.3596.0> 'rabbit@concourse-qq-jepsen-312-3'}}
2023-03-18 12:15:04.365840+00:00 [warning] <0.3596.0> Trying 5 more time(s)...
```
3 seconds after the DLX worker was created, the local source quorum queue node is no longer the leader:
```
2023-03-18 12:14:07.289213+00:00 [notice] <0.1645.0> queue 'jepsen.queue' in vhost '/': leader -> follower in term: 17 machine version: 3
```
But because the DLX worker had not yet succeeded in registering as a consumer at this point, it will not be terminated in
865d533863/deps/rabbit/src/rabbit_fifo_dlx.erl (L264-L275)
Eventually, when the local node becomes leader again, that DLX worker succeeds in registering
as a consumer (due to retries in 865d533863/deps/rabbit/src/rabbit_fifo_dlx_client.erl (L41-L58))
and stays alive. At that point there are two active DLX workers, because a second one
was started when the local quorum queue node transitioned to leader.
This commit prevents this issue:
the last consumer that performs a `#checkout{}` wins and the “old” one has to terminate.
CQv2 is significantly more efficient (2-4x on some workloads),
has lower and more predictable memory footprint, and eliminates
the need to make classic queues lazy to achieve that predictability.
Per several discussions with the team.
The test always succeeds on `main` branch.
The test also always succeeds on `mc` branch when running remotely:
```
bazel test //deps/rabbitmq_mqtt:reader_SUITE --test_env FOCUS="-group tests -case rabbit_mqtt_qos0_queue_overflow" --config=rbe-25 -t- --runs_per_test=50
```
However, the test flakes when running on the `mc` branch locally on a Mac:
```
make -C deps/rabbitmq_mqtt ct-reader t=tests:rabbit_mqtt_qos0_queue_overflow FULL=1
```
with the following local changes:
```
diff --git a/deps/rabbitmq_mqtt/test/reader_SUITE.erl b/deps/rabbitmq_mqtt/test/reader_SUITE.erl
index fb71eae375..21377a2e73 100644
--- a/deps/rabbitmq_mqtt/test/reader_SUITE.erl
+++ b/deps/rabbitmq_mqtt/test/reader_SUITE.erl
@@ -27,7 +27,7 @@ all() ->
groups() ->
[
- {tests, [],
+ {tests, [{repeat_until_any_fail, 30}],
[
block_connack_timeout,
handle_invalid_packets,
@@ -43,7 +43,7 @@ groups() ->
].
suite() ->
- [{timetrap, {seconds, 60}}].
+ [{timetrap, {minutes, 60}}].
%% -------------------------------------------------------------------
%% Testsuite setup/teardown.
```
fails prior to this commit after the 2nd run and does not fail after
this commit.
This is useful for understanding if a deleted queue was matching any
policies given the more selective policies introduced in #7601.
Does not apply to bulk deletion of transient queues on node down.
Not taking the credits can starve the subscription,
making it permanently under its credit send limit.
The subscription then never dispatches messages when
it becomes active again.
This happens in an active-inactive-active cycle, especially
with slow consumers.
Rather than relying on queue name conventions, allow applying policies
based on the queue type. For example, this allows multiple policies that
apply to all queue names (".*") that specify different parameters for
different queue types.
Deploying a 5-node RabbitMQ cluster with the rabbitmq_mqtt plugin enabled
using the cluster-operator with rabbitmq image
rabbitmq:3.12.0-beta.1-management often fails with the following
error:
```
Feature flags: failed to enable `delete_ra_cluster_mqtt_node`:
{error,
{no_more_servers_to_try,
[{error,
noproc},
{error,
noproc},
{timeout,
{mqtt_node,
'rabbit@mqtt-rabbit-3-12-server-2.mqtt-rabbit-3-12-nodes.default'}}]}}
```
During rabbitmq_mqtt plugin start, the plugin decides whether it should
create a Ra cluster:
If feature flag delete_ra_cluster_mqtt_node is enabled, no Ra cluster
gets created.
If this feature flag is disabled, a Ra cluster is created.
Even though all feature flags are enabled by default for a fresh
cluster, the feature flag subsystem cannot make any promise about when feature
flag migration functions run during the boot process:
The migration functions can run before plugins are started or many
seconds after plugins are started.
There is also no API that tells when feature flag initialisation on the
local node is completed.
Therefore, during a fresh 3.12 cluster start with rabbitmq_mqtt enabled,
some nodes decide to start a Ra cluster (because not all feature flags
are enabled yet when the rabbitmq_mqtt plugin is started).
Prior to this commit, when the feature flag delete_ra_cluster_mqtt_node
got enabled, Ra cluster deletion timed out because the Ra cluster never got
initialised successfully: Members never proceeded past the pre-vote
phase because their Ra peers already had feature flag delete_ra_cluster_mqtt_node
enabled and therefore did not participate.
One possible fix is to remove the feature flag delete_ra_cluster_mqtt_node
and have the newly upgraded 3.12 node delete its Ra membership during a
rolling update. However, this approach is too risky because during a
rolling update of a 3 node cluster A, B, and C the following issue
arises:
1. C upgrades successfully and deletes its Ra membership
2. B shuts down. At this point A and B are members of the Ra cluster
while only A is online (i.e. not a majority). MQTT clients which try
to re-connect to A fail because A times out registering the client in
ad5cc7e250/deps/rabbitmq_mqtt/src/rabbit_mqtt_processor.erl (L174-L179)
Therefore this commit fixes 3.12 cluster creation as follows:
The Ra cluster deletion timeout is reduced from 60 seconds to 15 seconds,
and the Ra server is force deleted if Ra cluster deletion fails.
Force deleting the server will wipe the Ra cluster data on disk.
The `provided_by` field is required. This made it possible to discover a wrong
spec of `rabbit_ff_registry:get/1`.
While here, sort `feature_props_extended()` fields like
`feature_props()`.
The memory of the rabbit_stream_coordinator process was reported as "other"
because there was a mismatch in ancestors. It is no longer under
`ra_server_sup_sup` but under:
`[_,ra_coordination_server_sup_sup,_,ra_systems_sup,ra_sup,_]`
Instead of its ancestors, use its registered name to categorize it.
[Why]
If Mnesia is stopped and the local node is a disk-less member, the list
returned by `mnesia:system_info(db_nodes)` will be empty. This is
obviously problematic for any user of this list of members.
[How]
The `members_using_mnesia/0` function copies the logic of
`rabbit_mnesia:cluster_status/1` but simplifies it to only work with all
members (not running members or disk members). Therefore, if Mnesia is
not running, we look at the cluster status files stored on disk. If they
are missing, the code falls back to a list made of the local node name
only.
* CQ: Optimise shared store remove when nothing to remove
The message(s) were already optimised out during the write
when the client acked faster than we could process the
write message.
* Optimise sets:subtract in single-confirm case
This happens due to message store optimisations that try
to confirm as fast as possible on write or ack, even if that
means processing a single confirm. The ack scenario is common
because clients tend to not be built for multi-ack.
The optimisation avoids calling an expensive sets:subtract/2
when there is a single new confirm and instead does a
sets:del_element/2 of the first set.
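As a sketch of that shortcut (the function name and set construction are illustrative, not the exact code):
```
%% Deleting one element is much cheaper than subtracting a whole set.
confirmed(MsgIds, Unconfirmed) ->
    case MsgIds of
        [MsgId] ->
            sets:del_element(MsgId, Unconfirmed);
        _ ->
            sets:subtract(Unconfirmed, sets:from_list(MsgIds, [{version, 2}]))
    end.
```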
Instead of using atom `undefined`, use an integer as correlation term
when sending the will message to destination queues.
Classic queue clients, for example, expect a non-negative integer.
Quorum queues expect any term.
Allow a list of preferred_username_claims in cuttlefish
config style.
Use the new config style in two Selenium test suites.
Test the oauth2 backend's config schema and the oauth2 management
config schema.
"If the Client supplies a zero-byte ClientId with CleanSession set to 0,
the Server MUST respond to the CONNECT Packet with a CONNACK return code 0x02
(Identifier rejected) and then close the Network Connection" [MQTT-3.1.3-8].
In Web MQTT, the CONNACK was not sent to the client because the Web MQTT
connection process terminated before sending the CONNACK to the
client.
This limits the amount of message data that is added
to the process heap at one time to around 128 KB.
Large prefetch values combined with large messages could cause
excessive garbage collection work (for example, a prefetch of 1,000
with 10 kB messages could otherwise place roughly 10 MB on the heap at once).
Also simplify the intermediate delivery message format to avoid
unnecessary allocations.
This new module sits on top of `rabbit_mnesia` and provides an API with
all cluster-related functions.
`rabbit_mnesia` should be called directly inside Mnesia-specific code
only, for instance `rabbit_mnesia_rename` or classic mirrored queues.
Otherwise, `rabbit_db_cluster` must be used.
Several modules, in particular in `rabbitmq_cli`, continue to call
`rabbit_mnesia` as a fallback option if the `rabbit_db_cluster` module
is unavailable. This will be the case when the CLI interacts with an
older RabbitMQ version.
This will help with the introduction of a new database backend.
These functions extend the functionality of `erlang:is_process_alive/1`
to take into account the node a process is running on and its cluster
membership.
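A minimal sketch of the idea (the actual functions live in `rabbit` and check RabbitMQ cluster membership; the membership check below is only a stand-in using connected nodes):
```
is_process_alive(Pid) when is_pid(Pid) ->
    Node = node(Pid),
    %% the process counts as alive only if its node is known to us
    %% and the process is alive on that node
    lists:member(Node, [node() | nodes()]) andalso
        rpc:call(Node, erlang, is_process_alive, [Pid]) =:= true.
```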
These functions are moved away from `rabbit_mnesia` because we don't
want `rabbit_mnesia` to be a central piece of RabbitMQ.
Classic-mirrored-queue-related modules continue to use `rabbit_mnesia`
functions, therefore relying on Mnesia, because they depend entirely on
Mnesia anyway. They will go away at the same time as our use of Mnesia.
So by keeping this code untouched, we avoid possible regressions.
[Why]
If a plugin was already enabled when RabbitMQ starts, its required
feature flags were correctly handled and thus enabled. However, this was
not the case for a plugin enabled at runtime.
Here is an example with the `drop_unroutable_metric` feature flag from the
rabbitmq_management_agent plugin:
```
Feature flags: `drop_unroutable_metric`: required feature flag not
enabled! It must be enabled before upgrading RabbitMQ.
```
Supporting required feature flags in a plugin is trickier than in the
core broker. Indeed, with the broker, we know when this is the first
time the broker is started. Therefore we are sure that a required
feature flag can be enabled directly, there is no existing data/context
that could conflict with the code behind the required feature flag.
For plugins, this is different: a plugin can be enabled/disabled at
runtime and between broker restarts (and thus upgrades). So, when a
plugin is enabled and it has a required feature flag, we have no way to
make sure that there is no existing and conflicting data/context.
[How]
In this patch, if the required feature flag is provided by a plugin
(i.e. not `rabbit`), we always mark it as enabled.
The plugin is responsible for handling any existing data/context and
performing any cleanup/conversion.
Reported by: @ansd
Every ~30 runs, test case `sessionRedelivery` was failing with error:
```
[ERROR] sessionRedelivery{TestInfo} Time elapsed: 1.298 s <<< ERROR!
org.eclipse.paho.client.mqttv3.MqttException: Client is currently disconnecting
at com.rabbitmq.mqtt.test.MqttTest.sessionRedelivery(MqttTest.java:535)
```
The problem was that the Java client was still in connection state
`DISCONNECTING` which throws a Java exception when `connect()`ing.
So, the problem was client side.
We already check for `isConnected()` to be `false` which internally
checks for
```
conState == CONNECTED
```
However, there is no public client API to check for other connection
states. Therefore just waiting for a few milliseconds fixes the flake.
* Faster node startup with many classic queues
On my machines, in a test with 100k queues, node startup goes down from
4-5 minutes to roughly 1 minute. The difference will be even larger with
more queues as `lists:member` gets very expensive with a long list.
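The idea, as an illustrative sketch (not necessarily this commit's exact change): checking each of N names against a list with `lists:member/2` is O(N²) overall, while building a set once keeps it roughly linear.
```
recovered_queues(AllNames, RecoveredNames) ->
    %% build the set once, then each lookup is effectively constant time
    Recovered = sets:from_list(RecoveredNames, [{version, 2}]),
    [Name || Name <- AllNames, sets:is_element(Name, Recovered)].
```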
We see sporadic test failures where a test case hangs in the
receive until the Bazel suite timeout is reached.
There is no point in a test case waiting forever for an AMQP 0.9.1
connection to be established. Let's time out after 1 minute.
This will make the test case fail faster.
This is the last commit in the series; it fixes (almost) all the
problems with missing and circular dependencies for typing.
The only unsolved problems are:
- `lg` dependency for `rabbit` - the problem is that it's the only
dependency that contains a NIF. And there is no way to make dialyzer
ignore it - it looks like the "unknown" check is not suppressible by dialyzer
directives. In the future, making `lg` a proper dependency could be a
good thing anyway.
- some missing Elixir functions in `rabbitmq_cli` (CSV, JSON and
logging related).
- `eetcd` dependency for `rabbitmq_peer_discovery_etcd` - this one
uses sub-directories in `src/`, which confuses dialyzer (or our bazel
machinery is not able to handle it properly). I've tried the latest
rules_erlang, which flattens the directory for .beam files, but it wasn't
enough for dialyzer - it wasn't able to find the Core Erlang files. This
is a niche plugin and an unusual dependency, so it's probably not worth
investigating further.
So far, we had the following functions to list nodes in a RabbitMQ
cluster:
* `rabbit_mnesia:cluster_nodes/1` to get members of the Mnesia cluster;
the argument was used to select members (all members or only those
running Mnesia and participating in the cluster)
* `rabbit_nodes:all/0` to get all members of the Mnesia cluster
* `rabbit_nodes:all_running/0` to get all members who currently run
Mnesia
Basically:
* `rabbit_nodes:all/0` calls `rabbit_mnesia:cluster_nodes(all)`
* `rabbit_nodes:all_running/0` calls `rabbit_mnesia:cluster_nodes(running)`
We also have:
* `rabbit_node_monitor:alive_nodes/1` which filters the given list of
nodes to only select those currently running Mnesia
* `rabbit_node_monitor:alive_rabbit_nodes/1` which filters the given
list of nodes to only select those currently running RabbitMQ
Most of the code uses `rabbit_mnesia:cluster_nodes/1` or the
`rabbit_nodes:all*/0` functions. `rabbit_mnesia:cluster_nodes(running)`
or `rabbit_nodes:all_running/0` is often used as a close approximation
of "all cluster members running RabbitMQ". This list might be incorrect
in times where a node is joining the clustered or is being worked on
(i.e. Mnesia is running but not RabbitMQ).
With Khepri, there won't be the same possible approximation because we
will try to keep Khepri/Ra running even if RabbitMQ is stopped to
expand/shrink the cluster.
So in order to clarify what we want when we query a list of nodes, this
patch introduces the following functions:
* `rabbit_nodes:list_members/0` to get all cluster members, regardless
of their state
* `rabbit_nodes:list_reachable/0` to get all cluster members we can
reach using Erlang distribution, regardless of the state of RabbitMQ
* `rabbit_nodes:list_running/0` to get all cluster members who run
RabbitMQ, regardless of the maintenance state
* `rabbit_nodes:list_serving/0` to get all cluster members who run
RabbitMQ and are accepting clients
In addition to the list functions, there are the corresponding
`rabbit_nodes:is_*(Node)` checks and `rabbit_nodes:filter_*(Nodes)`
filtering functions.
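Illustrative use of the new API (not taken from the commit): callers pick the list that matches their intent instead of approximating with "nodes running Mnesia".
```
Members = rabbit_nodes:list_members(),            %% all members, even stopped ones
Serving = rabbit_nodes:list_serving(),            %% members accepting clients
Stopped = Members -- rabbit_nodes:list_running().
```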
The code is modified to use these new functions. One possible
significant change is that the new list functions will perform RPC calls
to query the nodes' state, unlike `rabbit_mnesia:cluster_nodes(running)`.
The queue type being created for MQTT connections is solely determined
by the rabbitmq_mqtt plugin, not by per vhost defaults.
If the per vhost default queue type is configured to be a quorum queue,
we still want to create classic queues for MQTT connections.
Let's decrease the mailbox_soft_limit from 1000 to 200.
Obviously, both values are a bit arbitrary.
However, MQTT workloads usually do not have high throughput patterns for
a single MQTT connection. The only valid scenario where an MQTT
connection's process mailbox could have many messages is in large fan-in
scenarios where many MQTT devices send messages at once to a single MQTT
device - which is rather unusual.
It makes more sense to protect against cluster wide memory alarms by
decreasing the mailbox_soft_limit.
If we don't pass arguments, the `?LOG_*()` macros don't treat the passed string
as a format string, but as a plain message. Therefore, `~n` was not
interpreted.
While here, use `io_lib:format/2` instead of `rabbit_misc:format/2` as
we do a `lists:flatten/1` in the end anyway.
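For illustration (assuming `-include_lib("kernel/include/logger.hrl").` is in scope; the function name exists only for the example):
```
log_example() ->
    ?LOG_WARNING("failed~nretrying"),      %% one argument: logged as-is, "~n" stays literal
    ?LOG_WARNING("failed~nretrying", []).  %% format string + args: "~n" becomes a newline
```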
Always enable feature flag rabbit_mqtt_qos0_queue
in test case rabbit_mqtt_qos0_queue_overflow because this test case does
not make sense without the mqtt_qos0 queue type.
Note that enabling the feature flag should always succeed because this
test case runs on a single node, and therefore on a new version in mixed
version tests.
In the MQTT test assertions, instead of checking whether the test runs
in mixed version mode where all non-required feature flags are disabled
by default, check whether the given feature flag is enabled.
Prior to this commit, once feature flag rabbit_mqtt_qos0_queue becomes
required, the test cases would have failed.
RabbitMQ 3.12 requires feature flag `feature_flags_v2` which got
introduced in 3.11.0 (see
https://github.com/rabbitmq/rabbitmq-server/pull/6810).
Therefore, we can mark all feature flags that got introduced in 3.11.0
or earlier as required because users will have to upgrade to
3.11.x first, before upgrading to 3.12.x.
The advantage of marking these feature flags as required is that we can
start deleting any compatibility code for these feature flags, similarly to
what was done in https://github.com/rabbitmq/rabbitmq-server/issues/5215
This list shows when a given feature flag was first introduced:
```
classic_mirrored_queue_version 3.11.0
stream_single_active_consumer 3.11.0
direct_exchange_routing_v2 3.11.0
listener_records_in_ets 3.11.0
tracking_records_in_ets 3.11.0
empty_basic_get_metric 3.8.10
drop_unroutable_metric 3.8.10
```
In this commit, we also force all required feature flags in Erlang
application `rabbit` to be enabled in mixed version cluster testing
and delete any tests that were about a feature flag starting as disabled.
Furthermore, this commit already deletes the callback (migration) functions
given they do not run anymore in 3.12.x.
All other clean up (i.e. branching depending on whether a feature flag
is enabled) will be done in separate commits.
Nowadays, the old RabbitMQ nodes in mixed version cluster
tests on `main` branch run in version 3.11.7.
Since maintenance mode was wrongly closing cluster-wide MQTT connections
only in RabbitMQ <3.11.2 (and <3.10.10), we can re-enable this mixed
version test.
Testcases are executed in a random order. Unfortunately, this testcase
depended on side effects of other testcases. If this testcase was
executed first, then there were no permissions set and the testcase
would fail.
It now lists permissions before and after the actual test and compares
both.
such that MQTT and WebMQTT tests of the shared_SUITE can run in parallel.
Before this commit, the shared_SUITE took 14 minutes; after this commit
it takes 4 minutes in GitHub Actions.
AMQP 0.9.1 header x-mqtt-dup was determined by the incoming MQTT PUBLISH
packet's DUP flag. Its only use was to determine the outgoing MQTT
PUBLISH packet's DUP flag. However, that's wrong behaviour because
the MQTT 3.1.1 protocol spec mandates:
"The value of the DUP flag from an incoming PUBLISH packet is not
propagated when the PUBLISH Packet is sent to subscribers by the Server.
The DUP flag in the outgoing PUBLISH packet is set independently to the
incoming PUBLISH packet, its value MUST be determined solely by whether
the outgoing PUBLISH packet is a retransmission."
[MQTT-3.3.1-3]
Native MQTT fixes this wrong behaviour. Therefore, we can delete this
AMQP 0.9.1 header.
Native MQTT introduced a regression where the "{username}" and "{vhost}"
variables were not expanded in permission patterns.
This regression went unnoticed because the java_SUITE's
topicAuthorisationVariableExpansion test was wrongly passing because
its topic started with "test-topic", which matched another allow-listed
topic (namely "test-topic") instead of the pattern
"{username}.{client_id}.a".
This other java_SUITE regression got introduced by commit
26a17e8530
This commit fixes both the buggy Java test and the actual regression
introduced in Native MQTT.
This commit is pure refactoring making the code base more maintainable.
Replace rabbit_misc:pipeline/3 with the new OTP 25 experimental maybe
expression because
"Frequent ways in which people work with sequences of failable
operations include folds over lists of functions, and abusing list
comprehensions. Both patterns have heavy weaknesses that makes them less
than ideal."
https://www.erlang.org/eeps/eep-0049#obsoleting-messy-patterns
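A minimal sketch of such a `maybe` expression (module and function names are illustrative, not this commit's code):
```
-module(connect_example).
-feature(maybe_expr, enable).  %% experimental feature in OTP 25
-export([connect/2]).

connect(Packet, State0) ->
    maybe
        {ok, State1} ?= check_protocol_version(Packet, State0),
        {ok, State2} ?= authenticate(Packet, State1),
        {ok, State2}
    else
        {error, _} = Err -> Err
    end.

%% Stubs standing in for the real checks.
check_protocol_version(_Packet, State) -> {ok, State}.
authenticate(_Packet, State) -> {ok, State}.
```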
Additionally, this commit is more restrictive in the type spec of
rabbit_mqtt_processor state fields.
Specifically, many fields were defined as `undefined | T` where
`undefined` was only used temporarily until the first CONNECT packet was
processed by the processor.
It's better to initialise the MQTT processor upon first CONNECT packet
because there is no point in having a processor without having received
any packet.
This allows many type specs in the processor to change from `undefined |
T` to just `T`.
Additionally, memory is saved by removing the `received_connect_packet`
field from the `rabbit_mqtt_reader` and `rabbit_web_mqtt_handler`.
Include API functions in the rabbit_mqtt_retained_msg_store
behaviour module.
"There is a best practice to have the behaviour module include
the API also as it helps other parts of the code to be correct
and a bit more dialyzable."
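As a sketch of that practice (module, callback and function names are illustrative, not the actual rabbit_mqtt_retained_msg_store API):
```
-module(example_store).

-callback insert(Topic :: binary(), Msg :: term(), State :: term()) -> ok.

-export([insert/4]).

%% The behaviour module also exposes the API and dispatches to the
%% callback module chosen at runtime, giving dialyzer one typed entry
%% point to check callers against.
insert(Mod, Topic, Msg, State) ->
    ok = Mod:insert(Topic, Msg, State).
```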
This commit also fixes a bug where the retainer process had only
60 milliseconds shutdown time before being unconditionally killed.
60 milliseconds can be too short to dump a large ETS table containing
many retained messages to disk.
Some modules do not pass empty credentials (i.e. `[]`),
other modules pass a `{password, none}` credential,
and others pass no password but `{rabbit_auth_backend_internal, Impl}`
when the user has already been authenticated by the internal
backend.
- Use the same base .plt everywhere, so there is no need to list
standard apps everywhere
- Fix typespecs: some typos and the use of not-exported types
Prior to this commit, test `deps.rabbitmq_mqtt.cluster_SUITE`
`connection_id_tracking_with_decommissioned_node` was flaky and sometimes
failed with
```
{cluster_SUITE,connection_id_tracking_with_decommissioned_node,160}
{test_case_failed,failed to match connection count 0}
```
The return value was incorrectly specified and documented: it can return
`undefined` when a feature flag name is passed and that feature flag is
unknown.
So far, this function is always called with a feature flag properties
map, in which case it doesn't return `undefined`.
When we collect feature flag properties from all nodes, we start with an
empty cluster inventory (a common Erlang recursion pattern). This means
that all feature flags are unknown at first.
In `merge_feature_flags()`, we must compute a global stability level for
each feature flag, in case all nodes are not on the same page (like one
node considers a feature flag experimental, but another one marks it as
stable). That's why we rank stability levels: required > stable >
experimental.
This ranking had one issue: `rabbit_feature_flags:get_stability/1`
defaults to `stable` if a feature flag has no explicit stability set.
Therefore, with our empty starting inventory, the starting stability
would be `stable`. And it would supersede an experimental feature flag
stability level, even though all nodes agree on that.
Now, if a feature flag is missing from our inventory being collected, we
consider its stability level to be experimental. This is different from
a known feature flag with no explicit stability level. This way, we are
sure that feature flags marked as experimental everywhere will be
considered experimental globally.
The issue is that users retrieved with
the intention of listing them in the limits view
are not paged, hence they are not wrapped
in a paging struct where users would be
under the items attribute.
Selenium tests are pending.
Classic queues used a different format for the `{send_drained, _}`
queue type action which was missed originally. This change handles both
formats in the channel for backwards compatibility
as well as changes classic queues to conform to the same format when
sending the queue event.
Whilst adding tests for this in the amqp10 plugin, another issue around
the amqp10_client and filters was discovered, and this commit also includes
improvements in this area, such as more lenient support of source filters.
* Mark AMQP 1.0 properties chunk as binary
It is marked as a UTF-8 string, which it is not, so
strict AMQP 1.0 codecs can fail.
* Re-use AMQP 1.0 binary chunks if available
Instead of converting from AMQP 0.9.1 back to AMQP 1.0.
This is for AMQP 1.0 properties, application properties,
and message annotations.
* Test AMQP 1.0 binary chunk reuse
* Support AMQP 1.0 multi-value body better
In the rabbit_msg_record module, mostly. Before this commit,
only one Data section was supported. Now multiple Data sections,
multiple Sequence sections, and an AMQP value section are supported.
* Add test for non-single-data-section AMQP 1.0 message
* Squash some Dialyzer warnings
* Silence dialyzer for a function for now
* Fix type declaration, use type, not atom
* Address review comments
* Add rabbitmq_cli dialyze to bazel
and fix a number of warnings
Because we stop mix from recompiling rabbit_common in bazel, many
unknown functions are reported, so this dialyzer analysis is somewhat
incomplete.
* Use erlang dialyzer for rabbitmq_cli rather than mix dialyzer
Since this resolves all of the rabbit functions, there are far fewer
unknown functions.
Requires the yet-to-be-released rules_erlang 3.9.2
* Temporarily use pre-release rules_erlang
So that checks can run on this PR without a release
* Fix additional dialyzer warnings in rabbitmq_cli
* rabbitmq_cli: mix format
* Additional fixes for ignored return values
* Revert "Temporarily use pre-release rules_erlang"
This reverts commit c16b5b6815.
* Use rules_erlang 3.9.2
Use the outcome from the first authentication,
stored in #user.authz_backends, to authenticate
subsequent attempts which occur when a session is
opened.
In particular, during the first authentication attempt,
which occurs during the SASL handshake, the AMQP 1.0
plugin reads and validates the JWT token present in the
password field.
When a new AMQP 1.0 session is opened, the plugin creates
an internal AMQP connection which triggers a second/nth
authentication. For this second/nth authentication, the
plugin propagates as Authentication Credentials the outcome
from the first authentication which is stored in the
`#user.authz_backends`.
The OAuth2 backend first attempts to authenticate using
the password credentials; otherwise it uses the credential with the
key `rabbit_auth_backend_oauth2`, which holds a function that
returns the decoded token.
as it seems to always match peer_host.
Commit 7e09b85426 adds the peer address
provided by the Web MQTT plugin.
However, this seems unnecessary since rabbit_net:peername/1 on
the unwrapped socket provides the same address.
The peer address was the address of the proxy if the proxy protocol is
enabled.
This commit simplifies code and reduces memory consumption.
Multiple users reported the following error when starting RabbitMQ in
containers:
```
Feature flags: failed to enable `direct_exchange_routing_v2`: {error,
{badarg,
[{ets,lookup,
[rabbit_exchange,
{resource,
<<"/">>,
exchange,
<<"messages">>}],
[{error_info,
#{cause =>
id,
module =>
erl_stdlib_errors}}]},
{rabbit_misc,
dirty_read,
1,
[{file,
"rabbit_misc.erl"},
{line,
372}]},
{rabbit_binding,
'-populate_index_route_table/0-fun-0-',
1,
[{file,
"rabbit_binding.erl"},
{line,
757}]},
...
```
Although Mnesia table rabbit_exchange is present when the migration
function for direct_exchange_routing_v2 runs, its ETS table is not yet
present because there is no table copy on the local node.
The table copy was added later after all feature flags were synced.
Fixes #7068.
In MQTT 3.1.1, the CONNECT packet consists of
1. 10 bytes variable header
2. ClientId (up to 23 bytes must be supported)
3. Will Topic
4. Will Message (maximum length 2^16 bytes)
5. User Name
6. Password
Restricting the CONNECT packet size to 2^16 = 65,536 bytes
seems to be a reasonable default.
The value is configurable via the MQTT app parameter
`max_packet_size_unauthenticated`.
Instead of being called `max_packet_size_connect`, the name
`max_packet_size_unauthenticated` was chosen because it is more generic,
given that MQTT 5 introduces an AUTH packet type.
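An illustrative `advanced.config` snippet setting this parameter to the default mentioned above:
```
[
 {rabbitmq_mqtt, [
     %% maximum packet size in bytes accepted before authentication,
     %% i.e. for the CONNECT packet
     {max_packet_size_unauthenticated, 65536}
 ]}
].
```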
* Fixes #6969 by
* tracking channels using a map rather than
process dictionary
* modifying Program.cs so that it opens 5
simultaneous sessions to send messages before
closing them, and then repeats
* Address PR feedback: formatting, code style
When a single field in a record is updated, all remaining
fields' pointers are copied. Hence, if the record is large,
a lot will be copied.
Therefore, put static or rarely changing fields into their own record.
The same was done for the state in rabbit_channel or rabbit_fifo
for example.
Also, merge the #info{} record into the new #cfg{} record,
as it was unnecessary to introduce it in the first place.
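A sketch of the idea (field names are illustrative): fields that never or rarely change live in `#cfg{}`, so updating the frequently changing fields of `#state{}` copies fewer pointers.
```
-record(cfg, {client_id, vhost, socket, proto_ver}).   %% set once at CONNECT
-record(state, {cfg :: #cfg{},                         %% static part
                unacked_msgs = #{},                    %% changes per publish/ack
                queue_states}).
```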
Remove the queue name from all queue type clients and pass the queue
name to the queue type callbacks that need it.
We have to leave feature flag classic_queue_type_delivery_support
required because we removed the monitor registry
1fd4a6d353/deps/rabbit/src/rabbit_queue_type.erl (L322-L325)
Implements review from Karl:
"rather than changing the message format we could amend the queue type
callbacks involved with the stateful operation to also take the queue
name record as an argument. This way we don't need to maintain the extra
queue name (which uses memory for known but obscurely technical reasons
with how maps work) in the queue type state (as it is used in the queue
type state map as the key)"
Instead of having optional rabbit_queue_type callbacks, add stub
implementations to rabbit_mqtt_qos0_queue throwing an exception.
The exception uses erlang:error/2 including stack trace and arguments
of the unsupported functions to ease debugging in case these functions
were ever to be called.
Dialyzer suppressions are added for these functions such that dialyzer
won't complain about:
```
rabbit_mqtt_qos0_queue.erl:244:1: Function init/1 only terminates with explicit exception
```
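A sketch of such a stub, using `init/1` from the warning above (the spec and error reason are illustrative):
```
%% suppress the "only terminates with explicit exception" warning
-dialyzer({nowarn_function, init/1}).

-spec init(amqqueue:amqqueue()) -> no_return().
init(Q) ->
    %% erlang:error/2 records the argument list in the stacktrace,
    %% which helps debugging if this is ever called
    erlang:error(unsupported, [Q]).
```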
For example, when at-most-once dead lettering does a fan-out to many
target classic queues, this commit will reduce inter-node data traffic by
using delegate.
Use delegate.
For large fan-outs with medium to large message size,
this commit will reduce inter-node data traffic by
multiple orders of magnitude preventing busy distribution
ports.
We want the build to fail if there are any dialyzer warnings in
rabbitmq_mqtt or rabbitmq_web_mqtt. Otherwise we rely on people manually
executing and checking the results of dialyzer.
Also, we want any flaky test to fail.
Flaky tests can indicate subtle errors in either test or program execution.
Instead of marking them as flaky, we should understand and - if possible -
fix the underlying root cause.
Fix OTP 25.0 dialyzer warning
Type gen_server:format_status() is known in OTP 25.2, but not in 25.0