The stream manager does not need to be a gen_server (no cast, no state)
and the gen_server can create contention for large stream deployments
(some functions make cluster-wide calls that can take some time).
This commit fixes two different bugs/crashes.
To reproduce, prior to this commit:
1. Create an AMQP 1.0 connection on node-1.
2. Open the Management UI on node-2 and open the connection page of this
single AMQP 1.0 connection.
The first crash was the following:
```
[error] <0.1297.0> crasher:
[error] <0.1297.0> initial call: cowboy_stream_h:request_process/3
[error] <0.1297.0> pid: <0.1297.0>
[error] <0.1297.0> registered_name: []
[error] <0.1297.0> exception error: no case clause matching
[error] <0.1297.0> {badrpc,
[error] <0.1297.0> {'EXIT',
[error] <0.1297.0> {undef,
[error] <0.1297.0> [{rabbit_connection_tracking,lookup,
[error] <0.1297.0> [<<"[::1]:51729 -> [::1]:5672">>,
[error] <0.1297.0> ['rabbit-1@ABCDDDEEAA']],
[error] <0.1297.0> []}]}}}
[error] <0.1297.0> in function rabbit_connection_tracking:lookup/2 (rabbit_connection_tracking.erl, line 235)
[error] <0.1297.0> in call from rabbit_mgmt_wm_connection_sessions:conn/1 (rabbit_mgmt_wm_connection_sessions.erl, line 72)
[error] <0.1297.0> in call from rabbit_mgmt_wm_connection_sessions:is_authorized/2 (rabbit_mgmt_wm_connection_sessions.erl, line 63)
[error] <0.1297.0> in call from cowboy_rest:call/3 (src/cowboy_rest.erl, line 1590)
[error] <0.1297.0> in call from cowboy_rest:is_authorized/2 (src/cowboy_rest.erl, line 368)
[error] <0.1297.0> in call from cowboy_rest:upgrade/4 (src/cowboy_rest.erl, line 284)
[error] <0.1297.0> in call from cowboy_stream_h:execute/3 (src/cowboy_stream_h.erl, line 306)
[error] <0.1297.0> in call from cowboy_stream_h:request_process/3 (src/cowboy_stream_h.erl, line 295)
```
The second crash was the following:
```
[error] <0.1132.0> crasher:
[error] <0.1132.0> initial call: cowboy_stream_h:request_process/3
[error] <0.1132.0> pid: <0.1132.0>
[error] <0.1132.0> registered_name: []
[error] <0.1132.0> exception error: no case clause matching
[error] <0.1132.0> {tracked_connection,
[error] <0.1132.0> {'rabbit-1@ABCDDDEEAA',
[error] <0.1132.0> <<"[::1]:65505 -> [::1]:5672">>},
[error] <0.1132.0> 'rabbit-1@ABCDDDEEAA',<<"/">>,
[error] <0.1132.0> <<"[::1]:65505 -> [::1]:5672">>,<13661.1110.0>,
[error] <0.1132.0> {1,0},
[error] <0.1132.0> network,
[error] <0.1132.0> {0,0,0,0,0,0,0,1},
[error] <0.1132.0> 65505,<<"guest">>,1730908606089}
[error] <0.1132.0> in function rabbit_connection_tracking:lookup/2 (rabbit_connection_tracking.erl, line 235)
[error] <0.1132.0> in call from rabbit_mgmt_wm_connection_sessions:conn/1 (rabbit_mgmt_wm_connection_sessions.erl, line 72)
[error] <0.1132.0> in call from rabbit_mgmt_wm_connection_sessions:is_authorized/2 (rabbit_mgmt_wm_connection_sessions.erl, line 63)
[error] <0.1132.0> in call from cowboy_rest:call/3 (src/cowboy_rest.erl, line 1590)
[error] <0.1132.0> in call from cowboy_rest:is_authorized/2 (src/cowboy_rest.erl, line 368)
[error] <0.1132.0> in call from cowboy_rest:upgrade/4 (src/cowboy_rest.erl, line 284)
[error] <0.1132.0> in call from cowboy_stream_h:execute/3 (src/cowboy_stream_h.erl, line 306)
[error] <0.1132.0> in call from cowboy_stream_h:request_process/3 (src/cowboy_stream_h.erl, line 295)
## What?
On the connection page in the Management UI, display detailed session and
link information including:
* Link names
* Link target and source addresses
* Link flow control state
* Session flow control state
* Number of unconfirmed and unacknowledged messages
## How?
A new HTTP API endpoint is added:
```
/connections/:connection_name/sessions
```
The HTTP handler first queries the Erlang connection process to find out about
all session Pids. The handler then queries each Erlang session process
of this connection.
(The table auto-refreshes every 5 seconds by default. Querying a single
connection with 60 idle sessions, each with 250 links, takes ~100 ms.)
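A minimal sketch of that fan-out, assuming gen_server-style calls and illustrative request terms (the real processes may use a different protocol):
```
-module(session_infos_sketch).
-export([session_infos/1]).

%% Ask the connection process for its session pids, then query each
%% session process for its link and flow control state.
session_infos(ConnectionPid) ->
    SessionPids = gen_server:call(ConnectionPid, session_pids, 5000),
    [gen_server:call(Pid, infos, 5000) || Pid <- SessionPids].
```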
For better user experience in the Management UI, this commit also makes the
session process store and expose link names as well as source/target addresses.
[Why]
The "Feature flags" admin section had several issues:
* It was not designed for experimental feature flags. What was done for
RabbitMQ 4.0.0 still left it unclear what a user should expect from
experimental feature flags.
* The UI used synchronous requests from the browser's main thread. This
meant that for a feature flag with a long-running migration callback,
the browser tab could freeze for a very long time.
[How]
The feature flags table is reworked and now displays:
* a series of icons to highlight the following:
* a feature flag that has a migration function and can thus take time
to be enabled
* a feature flag that is experimental
* whether this experimental feature flag is supported or not
* a toggle to quickly show if a feature flag is enabled or not and let
the user enable it at the same time.
For stable feature flags, when a user clicks the toggle, the toggle
goes into an intermediate state while waiting for the response from the
broker. If the response is successful, the toggle is green. Otherwise it
goes back to red and the error is displayed in a popup as before.
For experimental feature flags, when a user clicks the toggle, a popup
is displayed to let the user know of the possible constraints and
consequences, with one or two required checkboxes to tick so the user
confirms they understand the message. The feature flag is enabled only
after the user validates the popup. The displayed message and the
checkboxes depend on whether the experimental feature flag is supported
or not (the support level is a new attribute of experimental feature flags).
The request to enable feature flags now uses the modern `fetch()` API.
Therefore it uses JavaScript promises and does not block the main
thread: the UI remains responsive while a migration callback runs.
Finally, an "Enable all stable feature flags" button has been added to
the warning that tells the user some stable feature flags are still
disabled.
V2: Pause auto-refresh while a feature flag is being handled. This fixes
some display inconsistencies.
[Why]
The previous implementation used the blocking `is_enabled/1` API.
This meant that if a feature flag was being enabled and the enable
callback took time, the CLI's `list_feature_flags` command or any use of
the management UI would block until the feature flag was enabled.
[How]
`get_state/1` now uses the non-blocking API. However, it can now return
a new possible value: `state_changing`.
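A hedged sketch of what callers now have to handle; the returned state is passed in here so the sketch stays self-contained:
```
-module(ff_state_sketch).
-export([describe/1]).

%% The non-blocking get_state/1 can now return `state_changing` in
%% addition to `enabled` and `disabled`; callers must handle it.
describe(State) ->
    case State of
        enabled        -> "Enabled";
        disabled       -> "Disabled";
        state_changing -> "Enabling in progress"
    end.
```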
[Why]
During the development of Khepri, it was difficult to communicate that
it was unsupported in RabbitMQ 3.13.x but then supported in 4.0.x, even
though it was still experimental.
[How]
The feature flag definition now exposes that support level in a new
attribute called `experiment_level`. It can be `unsupported` or
`supported`.
We can use this new attribute in the CLI or the web UI to convey the
level of support to the end user.
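For illustration, a definition carrying the new attribute might look as follows (a sketch; the flag name is hypothetical):
```
-rabbit_feature_flag(
   {my_experimental_flag,
    #{stability        => experimental,
      experiment_level => supported
     }}).
```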
In the future, we could imagine that an experimental feature flag
becomes abandoned; upgrading from a node that has it enabled to a
version that marks the feature flag as abandoned would not be possible.
It is possible for a slow-running follower with local consumers
to crash after a snapshot installation as it tries to read an entry
from its log that is no longer there (as it has been consumed and
completed by another node but still refers to prior consumers on the
current node).
This commit makes the log effect callback function more defensive
to check that the number of commands returned by the log effect
isn't different from what was requested. If it is different, we
consider this a stale read request and return no further effects.
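A hedged sketch of the shape of that check (names are illustrative, not the actual Ra/queue code):
```
-module(log_read_sketch).
-export([handle_log_result/3]).

%% All requested entries were read: produce the follow-up effects.
handle_log_result(NumRequested, Commands, State)
  when length(Commands) =:= NumRequested ->
    {State, [{log_applied, Commands}]};
%% Fewer entries than requested: the log was truncated by a snapshot
%% installation, so treat this as a stale read and emit no effects.
handle_log_result(_NumRequested, _Commands, State) ->
    {State, []}.
```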
Conflicts:
deps/rabbit/test/quorum_queue_SUITE.erl
Closes #9259.
## What?
Allow an AMQP 1.0 client to renew an OAuth 2.0 token before it expires.
## Why?
This allows clients to keep the AMQP connection open instead of having
to create a new connection whenever the token expires.
## How?
As explained in https://github.com/rabbitmq/rabbitmq-server/issues/9259#issuecomment-2437602040
the client can `PUT` a new token on HTTP API v2 path `/auth/tokens`.
RabbitMQ will then:
1. Store the new token on the given connection.
2. Recheck access to the connection's vhost.
3. Clear all permission caches in the AMQP sessions.
4. Recheck write permissions to exchanges for links publishing to
RabbitMQ, and recheck read permissions from queues for links
consuming from RabbitMQ. The latter complies with the user
expectation in #11364.
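A hedged sketch of those four steps on the server side; every name below is illustrative rather than the actual implementation:
```
-module(token_renewal_sketch).
-export([renew/4]).

renew(ConnPid, SessionPids, NewToken, Vhost) ->
    ok = store_token(ConnPid, NewToken),                  %% step 1
    ok = recheck_vhost_access(NewToken, Vhost),           %% step 2
    [Pid ! clear_permission_cache || Pid <- SessionPids], %% step 3
    %% Step 4 happens inside each session: every link rechecks its
    %% exchange write or queue read permission with the new token.
    ok.

%% Stubs so the sketch compiles.
store_token(_ConnPid, _Token) -> ok.
recheck_vhost_access(_Token, _Vhost) -> ok.
```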
For example, if the first restarted node doesn't start,
don't try to restart the other nodes. This mimics what
orchestrators such as Kubernetes or BOSH would do
(although they perform this check differently)
[Why]
Before this patch, required feature flags were basically checked during
boot: they must have been enabled when they were mere stable feature
flags. If they were not, the node refused to boot.
This was easy for the developer because making a feature flag required
allowed removing the entire compatibility code. Very satisfying.
Unfortunately, this was a pain point to end users, especially those who
did not pay attention to RabbitMQ and the release notes and were just
asking their package manager to update everything. They could end up
with a node that refuses to boot. The only solution was to downgrade,
enable the disabled stable feature flags, then upgrade again.
[How]
This patch introduces two levels of requirement to required feature
flags:
* `hard`: this corresponds to the existing behavior where a node will
refuse to boot if a hard required feature flag is not enabled before
the upgrade.
* `soft`: such a required feature flag will be automatically enabled
during the upgrade to a version where it is marked as required.
The level of requirement is set in the feature flag definition:
```
-rabbit_feature_flag(
   {my_feature_flag,
    #{stability => required,
      require_level => hard
     }}).
```
The default requirement level is `soft`. All existing required feature
flags now have a requirement level of `hard`.
The handling of soft required feature flags is done when the cluster
feature flags states are verified and synchronized. If a required
feature flag is not enabled yet, it is enabled at that time.
This means that as developers, we will have to keep compatibility code
forever for every soft required feature flag, like the feature flag
definition itself.
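A hedged sketch of the decision taken during that synchronization (illustrative names):
```
-module(require_level_sketch).
-export([check_required/3]).

%% Invoked while cluster feature flag states are verified and
%% synchronized, for a feature flag whose stability is `required`.
check_required(_FeatureName, _RequireLevel, true) ->
    ok;                                         %% already enabled
check_required(FeatureName, soft, false) ->
    {enable_now, FeatureName};                  %% auto-enable during sync
check_required(FeatureName, hard, false) ->
    {error, {refuse_to_boot, FeatureName}}.
```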
`init_per_group/3`, which starts the broker, was already called earlier
in the function.
This fixes a bug where the node couldn't be stopped in `end_per_group/2`,
affecting the next group's ability to start one.
This test flakes in CI as described in
https://github.com/rabbitmq/rabbitmq-server/issues/12413#issuecomment-2419293869
The test case fails with
```
Node: rabbit_shard2@localhost
Case: amqp_system_SUITE:access_failure
Reason: {error,{{badmatch,{error,134,
"Unhandled exception. System.Exception: expected exception not received
at Program.Test.accessFailure(String uri) in /home/runner/work/rabbitmq-server/rabbitmq-server/deps/rabbit/test/amqp_system_SUITE_data/fsharp-tests/Program.fs:line 477
at Program.main(String[] argv) in /home/runner/work/rabbitmq-server/rabbitmq-server/deps/rabbit/test/amqp_system_SUITE_data/fsharp-tests/Program.fs:line 509\n"}},
[{amqp_system_SUITE,run_dotnet_test,2,
[{file,"amqp_system_SUITE.erl"},
{line,257}]},
```
However, RabbitMQ closes the session as expected due to the missing read
permissions to the queue as shown in the RabbitMQ logs:
```
[debug] <0.1321.0> Asked to create a new user 'access_failure', password length in bytes: 24
[info] <0.1321.0> Created user 'access_failure'
[debug] <0.1324.0> Asked to set permissions for user 'access_failure' in virtual host '/' to '.*', '^banana.*', '^banana.*'
[info] <0.1324.0> Successfully set permissions for user 'access_failure' in virtual host '/' to '.*', '^banana.*', '^banana.*'
[info] <0.1333.0> accepting AMQP connection 127.0.0.1:36248 -> 127.0.0.1:25000
[debug] <0.1333.0> User 'access_failure' authenticated successfully by backend rabbit_auth_backend_internal
[info] <0.1333.0> Connection from AMQP 1.0 container 'AMQPNetLite-101d7d51': user 'access_failure' authenticated using SASL mechanism PLAIN and granted access to vhost '/'
[debug] <0.1333.0> AMQP 1.0 connection.open frame: hostname = 127.0.0.1, extracted vhost = /, idle-time-out = undefined
[debug] <0.1333.0> AMQP 1.0 created session process <0.1338.0> for channel number 0
[warning] <0.1338.0> Closing session for connection <0.1333.0>: {'v1_0.error',
[warning] <0.1338.0> {symbol,
[warning] <0.1338.0> <<"amqp:unauthorized-access">>},
[warning] <0.1338.0> {utf8,
[warning] <0.1338.0> <<"read access to queue 'test' in vhost '/' refused for user 'access_failure'">>},
[warning] <0.1338.0> undefined}
[debug] <0.1333.0> AMQP 1.0 closed session process <0.1338.0> with channel number 0
[warning] <0.1333.0> closing AMQP connection <0.1333.0> (127.0.0.1:36248 -> 127.0.0.1:25000, duration: '269ms'):
[warning] <0.1333.0> client unexpectedly closed TCP connection
```
```
let receiver = ReceiverLink(ac.Session, "test-receiver", src)
```
uses a null constructor for the onAttached callback.
ReceiverLink doesn't seem to block.
Given that the exact same authorization error is already tested in test
case attach_source_queue of amqp_auth_SUITE, it's safe to delete this F#
test.
Prior to this commit, tests
* leader_transfer_quorum_queue_credit_single
* leader_transfer_quorum_queue_credit_batches
flaked in CI during 4.1 (main) and 4.0 mixed version testing.
The following error occurred on node 0:
```
[error] <0.1950.0> Timed out waiting for credit reply from quorum queue 'leader_transfer_quorum_queue_credit_batches' in vhost '/'. Hint: Enable feature flag rabbitmq_4.0.0
[warning] <0.1950.0> Closing session for connection <0.1945.0>: {'v1_0.error',
[warning] <0.1950.0> {symbol,<<"amqp:internal-error">>},
[warning] <0.1950.0> {utf8,
[warning] <0.1950.0> <<"Timed out waiting for credit reply from quorum queue 'leader_transfer_quorum_queue_credit_batches' in vhost '/'. Hint: Enable feature flag rabbitmq_4.0.0">>},
[warning] <0.1950.0> undefined}
```
Therefore we enable this feature flag for both tests.
This commit also simplifies some test setups that were necessary for
4.0/3.13 mixed version testing but are no longer necessary for 4.1/4.0
mixed version testing.
Support x-cc message annotation
Support an `x-cc` message annotation in AMQP 1.0
similar to the [CC](https://www.rabbitmq.com/docs/sender-selected) header in AMQP 0.9.1.
The value of the `x-cc` message annotation must be a list of strings.
A message annotation is used since application properties allow only simple types.
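A hedged sketch of setting the annotation with the Erlang `amqp10_client`; the exact wrapping expected for the list value is an assumption here:
```
%% `Sender` is assumed to be an already-attached sender link.
Msg0 = amqp10_msg:new(<<"dtag-1">>, <<"hello">>, false),
Msg  = amqp10_msg:set_message_annotations(
         #{<<"x-cc">> => {list, [{utf8, <<"key.1">>},
                                 {utf8, <<"key.2">>}]}},
         Msg0),
ok   = amqp10_client:send_msg(Sender, Msg).
```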
In order to troubleshoot the flake described in
https://github.com/rabbitmq/rabbitmq-server/issues/12413#issuecomment-2419293869
```
Node: rabbit_shard2@localhost
Case: amqp_system_SUITE:access_failure
Reason: {error,{{badmatch,{error,134,
"Unhandled exception. System.Exception: expected exception not received\n
at Program.Test.accessFailure(String uri) in /home/runner/work/rabbitmq-server/rabbitmq-server/deps/rabbit/test/amqp_system_SUITE_data/fsharp-tests/Program.fs:line 477\n
at Program.main(String[] argv) in /home/runner/work/rabbitmq-server/rabbitmq-server/deps/rabbit/test/amqp_system_SUITE_data/fsharp-tests/Program.fs:line 509\n"}},
[{amqp_system_SUITE,run_dotnet_test,2,
[{file,"amqp_system_SUITE.erl"},
{line,257}]},
```
As described in https://github.com/rabbitmq/rabbitmq-server/issues/12413#issuecomment-2385379386
test case queue_topology flaked in CI with the following error:
```
rabbitmq_amqp_client > management_SUITE > cluster_size_3 > queue_topology
#1. {error,{test_case_failed,{824,
<<"rmq-ct-cluster_size_3-1-21000@localhost">>}}}
```
This flake could not be reproduced locally (neither with Mnesia nor with Khepri).
Expose the same metrics for AMQP 1.0 connections as for AMQP 0.9.1 connections.
Display the following AMQP 1.0 metrics on the Management UI:
* Network bytes per second from/to client on connections page
* Number of sessions/channels on connections page
* Network bytes per second from/to client graph on connection page
* Reductions graph on connection page
* Garbage collection info on connection page
Expose the following AMQP 1.0 per-object Prometheus metrics:
* rabbitmq_connection_incoming_bytes_total
* rabbitmq_connection_outgoing_bytes_total
* rabbitmq_connection_process_reductions_total
* rabbitmq_connection_incoming_packets_total
* rabbitmq_connection_outgoing_packets_total
* rabbitmq_connection_pending_packets
* rabbitmq_connection_channels
The rabbit_amqp_writer process:
* notifies the rabbit_amqp_reader process if it sent frames
* eventually hibernates if it doesn't send any frames
The rabbit_amqp_reader process:
* does not emit stats (update ETS tables) if no frames are received
or sent, to save resources when there are many idle connections.
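The eventual hibernation follows the usual idle-timeout pattern; a hedged sketch (not the actual rabbit_amqp_writer code):
```
-module(idle_writer_sketch).
-export([loop/1]).

-define(IDLE_TIMEOUT, 5000).

%% Send frames while they keep coming; after ?IDLE_TIMEOUT ms of
%% inactivity, hibernate to compact the heap. The next frame to
%% send wakes the process up again.
loop(State) ->
    receive
        {send, Frame} ->
            loop(do_send(Frame, State))
    after ?IDLE_TIMEOUT ->
            erlang:hibernate(?MODULE, loop, [State])
    end.

do_send(_Frame, State) -> State.
```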
build(deps): bump org.springframework.boot:spring-boot-starter-parent from 3.3.4 to 3.3.5 in /deps/rabbitmq_auth_backend_http/examples/rabbitmq_auth_backend_spring_boot_kotlin
Removes the ShouldLog parameter from several functions and limits
logging of the warning about the delivery_limit not being set to the
moment of queue declaration.
Instead of checking the current configuration's values, represented in
`rabbit_quorum_queue:handle_tick` by the `Overview` variable, against
the effective policy, just regenerate the configuration and compare it
with the current one.
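A hedged sketch of the simplified comparison (function names are illustrative):
```
-module(tick_conf_sketch).
-export([maybe_update_config/2]).

%% Rebuild the desired configuration from the queue record (and thus
%% the effective policy), then compare it wholesale with the current
%% configuration instead of diffing individual overview values.
maybe_update_config(Q, CurrentConf) ->
    case make_ra_conf(Q) of
        CurrentConf -> ok;                       %% nothing changed
        DesiredConf -> {update_config, DesiredConf}
    end.

%% Stub standing in for the real configuration generator.
make_ra_conf(_Q) -> #{}.
```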
This commit attempts to eliminate the test flake described in
https://github.com/rabbitmq/rabbitmq-server/issues/12413#issuecomment-2385449940
```
rabbitmq_mqtt > parallel-ct-set-1 > mqtt_shared_SUITE > cluster_size_3 > v4 rabbit_mqtt_qos0_queue_kill_node
=== Ended at 2024-10-01 09:59:52
=== Location: [{mqtt_shared_SUITE,rabbit_mqtt_qos0_queue_kill_node,1165},
{test_server,ts_tc,1793},
{test_server,run_test_case_eval1,1302},
{test_server,run_test_case_eval,1234}]
=== === Reason: no match of right hand side value {publish_not_received,
<<"m1">>}
in function mqtt_shared_SUITE:rabbit_mqtt_qos0_queue_kill_node/1 (mqtt_shared_SUITE.erl, line 1165)
in call from test_server:ts_tc/3 (test_server.erl, line 1793)
in call from test_server:run_test_case_eval1/6 (test_server.erl, line 1302)
in call from test_server:run_test_case_eval/9 (test_server.erl, line 1234)
```
This flake could not be reproduced locally.
This commit also assumes that this flake occurred under Khepri but not
under Mnesia.
The hypothesis is the following:
* Node 0 is down
* MQTT client creates binding on node 1
* Khepri commits since the binding is replicated and persisted on node 1
and node 2. However, the binding isn't reflected yet in node 2's
routing projection table.
* Publishing a message to node 2 routes to nowhere.
Provides a specific function to fix client SSL options, i.e.: apply all
fixes that were applied to TLS listeners and clients in previous
versions, but also set the `cacerts` option to the CA certificates
obtained by `public_key:cacerts_get/0`, only when no `cacertfile` or
`cacerts` is provided.
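A hedged sketch of that conditional default, assuming proplist-style SSL options (the function name is illustrative; `public_key:cacerts_get/0` is the OTP API mentioned above):
```
-module(ssl_opts_sketch).
-export([maybe_add_cacerts/1]).

%% Set `cacerts` to the OS trust store only when the caller provided
%% neither `cacertfile` nor `cacerts`.
maybe_add_cacerts(Opts) ->
    case proplists:is_defined(cacertfile, Opts)
        orelse proplists:is_defined(cacerts, Opts) of
        true  -> Opts;
        false -> [{cacerts, public_key:cacerts_get()} | Opts]
    end.
```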
build(deps-dev): bump org.junit.jupiter:junit-jupiter-params from 5.11.2 to 5.11.3 in /deps/rabbitmq_auth_backend_http/examples/rabbitmq_auth_backend_spring_boot
Application-properties keys are restricted to be strings.
Prior to this commit, a function_clause error occurred if the client
requested an invalid filter:
```
│ *Error{Condition: amqp:internal-error, Description: Session error: function_clause
│ [{rabbit_amqp_filtex,'-validate0/2-fun-0-',
│ [{{symbol,<<"subject">>},{utf8,<<"var">>}}],
│ [{file,"rabbit_amqp_filtex.erl"},{line,119}]},
│ {lists,map,2,[{file,"lists.erl"},{line,2077}]},
│ {rabbit_amqp_filtex,validate0,2,[{file,"rabbit_amqp_filtex.erl"},{line,119}]},
│ {rabbit_amqp_filtex,validate,1,[{file,"rabbit_amqp_filtex.erl"},{line,28}]},
│ {rabbit_amqp_session,parse_filters,2,
│ [{file,"rabbit_amqp_session.erl"},{line,3068}]},
│ {rabbit_amqp_session,parse_filter,1,
│ [{file,"rabbit_amqp_session.erl"},{line,3014}]},
│ {rabbit_amqp_session,'-handle_attach/2-fun-0-',21,
│ [{file,"rabbit_amqp_session.erl"},{line,1371}]},
│ {rabbit_misc,with_exit_handler,2,[{file,"rabbit_misc.erl"},{line,465}]}],
Info: map[]}
```
After this commit, the invalid filter simply doesn't take effect, and no crash occurs.
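A hedged sketch of the defensive handling (illustrative names): invalid filters are dropped instead of raising `function_clause`, so only validated filters remain in effect.
```
-module(filter_validate_sketch).
-export([keep_valid/1]).

%% Keep only the filters that validate; drop the rest silently.
keep_valid(FilterSet) ->
    maps:filter(fun(_Name, Value) -> validate(Value) =:= ok end,
                FilterSet).

%% A filter value must be null or of a described type.
validate(null) -> ok;
validate({described, _Descriptor, _Value}) -> ok;
validate(_Other) -> {error, invalid_filter}.
```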
Supersedes #12520
This commit notifies the client app with the AMQP performative if
connection config `notify_with_performative` is set to `true`.
This allows the client app to learn about all fields including
properties and capabilities returned by the AMQP server.
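With the Erlang `amqp10_client`, usage might look like the following sketch (the exact event payload shape is an assumption):
```
OpenConf = #{address => "localhost",
             port => 5672,
             container_id => <<"my-container">>,
             sasl => anon,
             notify_with_performative => true},
{ok, Connection} = amqp10_client:open_connection(OpenConf),
receive
    %% With the flag set, the event carries the server's open
    %% performative, including its properties and capabilities.
    {amqp10_event, {connection, Connection, {opened, OpenPerformative}}} ->
        OpenPerformative
after 5000 ->
    exit(connection_timed_out)
end.
```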
* Add BEAM dashboard
Also update the other dashboards by opening in Grafana v11.2.2 and ensuring they work as expected.
* Update the Erlang-Distributions-Compare dashboard
* Update the RabbitMQ-Overview dashboard
* Update the RabbitMQ-Quorum-Queues-Raft dashboard
* Update the RabbitMQ-Stream dashboard
* Update distribution link status panel
---------
Co-authored-by: Michal Kuratczyk <mkuratczyk@vmware.com>
Instead of every time we run Make for these applications.
This means that during development we are free to modify
these values or create new test suites without having to
worry about the check. If we forget to add the test suites to
PARALLEL_CT, the workflow will tell us.
Prior to this commit, if dotnet or mvnw failed to fetch test
dependencies, for example because dotnet isn't installed, the test setup
crashed in an unexpected way:
```
amqp_system_SUITE > dotnet
{'EXIT',
{badarg,
[{lists,keysearch,
[rmq_nodes,1,
{skip,
"Failed to fetch .NET Core test project dependencies"}],
[{error_info,#{module => erl_stdlib_errors}}]},
{test_server,lookup_config,2,
[{file,"test_server.erl"},{line,1779}]},
{rabbit_ct_broker_helpers,get_node_configs,2,
[{file,"rabbit_ct_broker_helpers.erl"},{line,1411}]},
{rabbit_ct_broker_helpers,enable_feature_flag,2,
[{file,"rabbit_ct_broker_helpers.erl"},{line,1999}]},
{amqp_system_SUITE,init_per_group,2,
[{file,"amqp_system_SUITE.erl"},{line,77}]},
{test_server,ts_tc,3,[{file,"test_server.erl"},{line,1794}]},
{test_server,run_test_case_eval1,6,
[{file,"test_server.erl"},{line,1391}]},
{test_server,run_test_case_eval,9,
[{file,"test_server.erl"},{line,1235}]}]}}
```
This commit improves the error message instead of failing with `badarg`.
This commit also decides to fail the test setup instead of skipping the
suite because we always want CI to execute this test and be notified
instead of silently skipping if the test can't be run.
This commit fixes the CI error on `main` branch where
amqp_system_SUITE failed with the following error:
```
Process terminated. Couldn't find a valid ICU package installed on the system. Set the configuration flag System.Globalization.Invariant to true if you want to run with no globalization support.
at System.Environment.FailFast(System.String)
at System.Globalization.GlobalizationMode.GetGlobalizationInvariantMode()
at System.Globalization.GlobalizationMode..cctor()
at System.Globalization.CultureData.CreateCultureWithInvariantData()
at System.Globalization.CultureData.get_Invariant()
at System.Globalization.CultureInfo..cctor()
at System.String.ToLowerInvariant()
at Microsoft.DotNet.PlatformAbstractions.RuntimeEnvironment.GetArch()
at Microsoft.DotNet.PlatformAbstractions.RuntimeEnvironment..cctor()
at Microsoft.DotNet.PlatformAbstractions.RuntimeEnvironment.GetRuntimeIdentifier()
at Microsoft.DotNet.Cli.MulticoreJitProfilePathCalculator.CalculateProfileRootPath()
at Microsoft.DotNet.Cli.MulticoreJitActivator.StartCliProfileOptimization()
at Microsoft.DotNet.Cli.MulticoreJitActivator.TryActivateMulticoreJit()
at Microsoft.DotNet.Cli.Program.Main(System.String[])
Exit code: 134 (pid <0.1533.0>)
```
As described in https://docs.oasis-open.org/amqp/core/v1.0/os/amqp-core-messaging-v1.0-os.html#type-annotations
> The annotations type is a map where the keys are restricted to be of type symbol or of type ulong.
> All ulong keys, and all symbolic keys except those beginning with "x-" are reserved.
Prior to this commit, if an AMQP client used a reserved annotation key,
the entire AMQP connection terminated with a function_clause error
message that might be difficult to understand for client libs:
```
<<"Session error: function_clause\n[{amqp10_framing,'-decode_annotations/1-fun-0-',\n [{{symbol,<<\"aa\">>},{utf8,<<\"bbb\">>}}],\n [{file,\"amqp10_framing.erl\"},{line,158}]},\n {lists,map,2,[{file,\"lists.erl\"},{line,1559}]},\n {amqp10_framing,decode,1,[{file,\"amqp10_framing.erl\"},{line,127}]},\n {lists,map_1,2,[{file,\"lists.erl\"},{line,1564}]},\n {lists,map,2,[{file,\"lists.erl\"},{line,1559}]},\n {mc_amqp,init,1,[{file,\"mc_amqp.erl\"},{line,102}]},\n {mc,init,4,[{file,\"mc.erl\"},{line,150}]},\n {rabbit_amqp_session,incoming_link_transfer,4,\n [{file,\"rabbit_amqp_session.erl\"},{line,2341}]}]">>
```
This commit ends only the session and provides a clearer error message.
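A hedged sketch of the key check the spec implies (illustrative, not the actual decoder code):
```
-module(annotation_key_sketch).
-export([check_key/1]).

%% Clients may only set symbolic keys beginning with "x-"; all ulong
%% keys and all other symbolic keys are reserved.
check_key({symbol, <<"x-", _/binary>>}) -> ok;
check_key(Key) -> {error, {reserved_annotation_key, Key}}.
```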
Currently this function always falls back to the compatibility code
and never gets the benefit of using ra:key_metrics/1, due to incorrect
use of the map update operator `:=` instead of the insert operator
`=>`.
Test the use case described in https://github.com/rabbitmq/rabbitmq-website/pull/2095:
> Rather than relying solely on RabbitMQ's built-in dead lettering tracking via x-opt-deaths,
consumers can customise dead lettering event tracking.
Support tracking the requeue history as described in
https://github.com/rabbitmq/rabbitmq-website/pull/2095
This commit:
1. adds a test case tracing the requeue history via AMQP 1.0
using the modified outcome and
2. fixes bugs in the broker which crashed if a modified message
annotation value is an AMQP 1.0 list, map, or array.
Complex modified annotation values (list, map, array) are stored as tagged values from now on.
This means AMQP 0.9.1 consumers will not receive modified annotations of
type list, map, or array (which is okay).
OTP 27 reset all assumptions on how the VM reacts to processes that
buffer and process a lot of large binaries.
Substantially increasing the vheap sizes for such process restores
most of the same performance by allowing processes to hold more binary
data before major garbage collections are triggered.
This introduces a new module to capture process flag configurations.
The new vheap sizes are only applied when running on OTP 27 or
above.
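A hedged sketch of applying such a flag conditionally (the size value is illustrative, not the one chosen by this commit):
```
-module(vheap_sketch).
-export([maybe_raise_vheap/0]).

%% On OTP 27+, raise the binary virtual heap size (in words) so that
%% the process can accumulate more binary data before a major GC.
maybe_raise_vheap() ->
    case list_to_integer(erlang:system_info(otp_release)) >= 27 of
        true  -> process_flag(min_bin_vheap_size, 4194304);
        false -> ok
    end.
```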
It was still possible, although rare, to have message store
files lose message data, when the following conditions were
met:
* the message data contains byte values 255
(255 is used as an OK marker after a message)
* the message is located after a 0-filled hole in the file
* the length of the data is at least 4096 bytes and
if we misread it (as detailed below) we encounter
a 255 byte where we expect the OK marker
The trick for the code to previously misread the length can
be explained as follows:
A message is stored in the following format:
<<Len:64, MsgIdAndMsg:Len/unit:8, 255>>
With MsgId always being 16 bytes in length. So Len is always at
least 16: exactly 16 if the message data Msg is empty, which in
practice it never is.
Now if we have a zero filled hole just before this message,
we may end up with this:
<<0, Len:64, MsgIdAndMsg:Len/unit:8, 255>>
When we are scanning we are testing bytes to see if there is
a message there or not. We look for a Len that gives us byte
255 after MsgIdAndMsg.
Len of value 4096 looks like this in binary:
<<0:48, 16, 0>>
Problem is if we have leading zeroes, Len may look like this:
<<0, 0:48, 16, 0>>
If we take the first 64 bits we get a potential length of 16.
We look at the byte after the next 16 bytes. If it is 255, we
think this is a message and skip by this amount of bytes, and
mistakenly miss the real message.
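The misread is easy to demonstrate in a shell: 4096 encoded on 64 bits starts with six zero bytes, so sliding the read window one zero byte to the left yields a plausible-looking length of 16.
```
1> <<4096:64>>.
<<0,0,0,0,0,0,16,0>>
2> Hole = <<0, 4096:64>>.   % a zero byte from the hole, then Len
<<0,0,0,0,0,0,0,16,0>>
3> <<MisreadLen:64, _/binary>> = Hole, MisreadLen.
16
```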
Solving this by changing the file format would be simple enough,
but we don't have that luxury. A different solution
was found, which is to combine file scanning with checking that
the message exists in the message store index (populated from
queues at startup, and kept up to date over the lifetime of
the store). Then we know for sure that the message above
doesn't exist, because the MsgId won't be found in the index.
If it is, then the file number and offset will not match,
and the check will fail.
There remains a small chance that we get it wrong during dirty
recovery. Only a better file format would improve that.
Add JavaScript unit tests: given the amount of
JavaScript code, it is difficult to get good coverage
with just end-to-end tests.
The tests are not running yet because I need to learn
how to use Babel to convert ES modules into Node.js modules;
otherwise it is not possible, because all the source modules
use ES modules whereas the tests run from Node.js, which requires
CommonJS.
Split rabbit_oauth2_config into
- rabbit_oauth2_resource_server
- rabbit_oauth2_oauth_provider
and their respective test modules.
Signing keys are an OAuth provider
concern, hence they stay with the
oauth_provider module.
* Support AMQP filter expressions
## What?
This PR implements the following property filter expressions for AMQP clients
consuming from streams as defined in
[AMQP Filter Expressions Version 1.0 Working Draft 09](https://groups.oasis-open.org/higherlogic/ws/public/document?document_id=66227):
* properties filters [section 4.2.4]
* application-properties filters [section 4.2.5]
String prefix and suffix matching is also supported.
This PR also fixes a bug where RabbitMQ would accept wrong filters.
Specifically, prior to this PR the values of the filter-set's map were
allowed to be symbols. However, "every value MUST be either null or of a
described type which provides the archetype filter."
## Why?
This feature gives RabbitMQ the ability to have multiple concurrent clients
each consuming only a subset of messages while maintaining message order.
This feature also reduces network traffic between RabbitMQ and clients by
only dispatching those messages that the clients are actually interested in.
Note that AMQP filter expressions are more fine-grained than the [bloom filter based
stream filtering](https://www.rabbitmq.com/blog/2023/10/16/stream-filtering) because
* they do not suffer from false positives
* the unit of filtering is per-message instead of per-chunk
* matching can be performed on **multiple** values in the properties and
application-properties sections
* prefix and suffix matching on the actual values is supported.
Both AMQP filter expressions and Bloom filters can be used together.
## How?
If a filter isn't valid, RabbitMQ ignores the filter. RabbitMQ only
replies with filters it actually supports and validated successfully to
comply with:
"The receiving endpoint sets its desired filter, the sending endpoint
[RabbitMQ] sets the filter actually in place (including any filters defaulted at
the node)."
* Delete streams test case
The test suite constructed a wrong filter-set.
Specifically, the value of the filter-set didn't use a described type as
mandated by the spec.
Using https://azure.github.io/amqpnetlite/api/Amqp.Types.DescribedValue.html
throws errors that the descriptor can't be encoded. Given that this code
path is already tested via the amqp_filtex_SUITE, this F# test is
therefore deleted.
* Re-introduce the AMQP filter-set bug
Since clients might rely on the wrong filter-set value type, we keep
supporting the bug behind a deprecated feature flag and will gradually
remove support for it.
* Revert "Delete streams test case"
This reverts commit c95cfeaef7.