Commit Graph

3414 Commits

Author SHA1 Message Date
Jean-Sébastien Pédron 5cbda4c838
rabbit_stream_queue_SUITE: Swap uses of node 2 and 3 in `format`
[Why]
We hit some transient errors with the previous order when doing
mixed-version testing. Swapping the nodes seems to fix the problem.
2025-02-11 15:50:07 +01:00
Jean-Sébastien Pédron 30fb8a719f
feature_flags_SUITE: Change clustering seed node in few tests
[Why]
Some testcases used to use node 1 as the clustering seed node. With
mixed-version testing, it could cause issues because node 1 would start
with a new version of Ra compared to node 2 and node 2 could fail to
join.

[How]
By using node 2 as the seed node, node 1 running a newer version of Ra
should be able to join because it supports talking to an older version.
2025-02-11 12:10:33 +01:00
Jean-Sébastien Pédron c78aec7d48
rabbit_db: `force_reset` command is unsupported with Khepri
[Why]
The `force_reset` command simply removes local files on disk for the
local node.

In the case of Ra, this can't work because the rest of the cluster does
not know about the forced-reset node. Therefore the leader will continue
to send `append_entry` commands to the reset node.

If that forced-reset node restarts and receives these messages, it will
either join the cluster again (because it's on an older Raft term) or it
will hit an assertion and exit (because it's on the same Raft term).

[How]
Given we can't really support this scenario and it has little value, the
command will now return an error if someone attemps a `force_reset` with
a node running Khepri.

This also deprecates the command: once Mnesia support is removed, the
command will be removed at the same time. This is noted in the
rabbitmqctl.8 manpage.
2025-02-10 15:09:36 +01:00
Frederik Bosch b821e52329 disable consul health check helper when registration is disabled 2025-02-06 14:18:18 +01:00
GitHub c6cdc40f1a bazel run gazelle 2025-02-05 04:02:24 +00:00
Michal Kuratczyk a9d5b6001a
k8s peer discovery v2 (#13050)
* Redesigned k8s peer discovery

Rather than querying the Kubernetes API, just check the local node name
and try to connect to the pod with `-0` suffix (or configured
`ordinal_start` value). Only the pod with the lowest ordinal can form
a new cluster - all other pods will wait forever.

This should prevent any race conditions and incorrectly formed clusters.
2025-02-04 17:07:27 +01:00
David Ansari 7bc3ab8cd4 Add tests for different JMS message types
This commit contains the following changes:
1. Simplify .NET suite
2. Simplify Java package naming
3. Extract JMS tests into separate suite. This way, it's easier to run,
debug, and add new tests compared to the previous suite which mixed
.NET tests with JMS tests.
4. Add tests for different JMS message types
2025-02-04 14:46:49 +01:00
Michael Klishin b4f5d3d51a
Core config_schema_SUITE: cosmetics 2025-02-03 19:14:25 -05:00
Michael Klishin 269685dd6e
Make it possible to opt out of peer discovery registration
for the backends that support it in the first place.

When forming a cluster, registration of the node
joining the cluster might be left to (container)
orchestration tools like Nomad or Kubernetes.

This PR add a new configuration option,
'cluster_formation.registration.enable',
which defaults to true.
When set to false node registration will be skipped.

There is at least one important advantage using a
tool such as Nomad (plus Consul) over the application
(RabbitMQ) doing the registration.

When the application is not stopped gracefully for
any reason, e.g. its OOM killed,
it cannot deregister the service/node.

This leaves behind an unlinked service entry in the registry.
This problem is fundamentally avoided by allowing
Nomad (or similar tools) to register the
node'service.

See #11233  #11045 for prior discussions.

Co-authored-by: Frederik Bosch <f.bosch@genkgo.nl>
2025-02-03 13:58:17 -05:00
David Ansari 0b1cfc6f04
Impose limit on AMQP filter complexity
As described in section 7.1 of filtex-v1.0-wd09:
> Impose a limit on the complexity of each filter expression.

Here, we hard code the maximum properties within a filter expression to 16.
There should never be a use case requiring to filter on more than 16
different properties.
2025-02-03 11:57:43 -05:00
David Ansari 4bdddc7a98
Bump Qpid JMS AMQP 1.0 client 2025-02-03 11:57:43 -05:00
Michael Klishin df424e5edf
Bump Khepri cluster formation timeout
to match that used with Mnesia.

In the case of Mnesia, there are 10 retries
with a 30 second delay each.

For Khepri, a single timeout is used, so it
must be ten times as long.
2025-02-03 11:57:42 -05:00
Frederik Bosch 481ffb2d6c add option to disable registration of node during cluster formation 2025-02-01 14:09:03 +01:00
Frederik Bosch 8355bc691e add option to disable registration of node during cluster formation 2025-02-01 13:36:22 +01:00
Michal Kuratczyk 198d8f93d3
Make sure there are no duplicates on the nodes list
This is probably more of a CI bug than anything else - in CI we use
a lower tick value, which increases the odds of the periodic
repair triggering very early.

Failure:
https://github.com/rabbitmq/rabbitmq-server/actions/runs/13013004042/job/36295172214?pr=13050
2025-01-28 17:10:13 +01:00
Karl Nilsson 7862c2f501
Merge pull request #12713 from rabbitmq/ra-2.16.0
Ra v2.16.0
2025-01-28 16:00:08 +01:00
GitHub a013bb1d1c bazel run gazelle 2025-01-28 04:02:47 +00:00
David Ansari 579c58603e Support AMQP over WebSocket (OSS part) 2025-01-27 17:50:47 +01:00
Karl Nilsson e58b8ebc1d QQ: refactor add_member method to pass dialyzer
And be less confusing around the arguments that add_member/4 actually
takes.
2025-01-27 13:50:38 +00:00
Karl Nilsson 113b16bbc4 Use initial_machine_version config to avoid initalising
from rabbit_fifo version 0.

The same was also implemented for the stream coordinator.

QQ: avoid dead lock in queue federation.

When processing the queue federation startup even the process
may call back into the ra process causing a deadlock. in this
case we spawn a temporary process to avoid this.
2025-01-27 13:50:38 +00:00
Karl Nilsson f2b1f37331 QQ: Use new log_ext effect
This offloads the work of reading messages from on-disk segments
to the interacting process rather than doing this blocking, performance
affecting work in the ra server process.

QQ: ensure opened segments are closed after some time of inactivity

Processes that havea received messages that had to be read from disks
may keep a segment open indefinitely. This introduces a timer which
after some time of inactivity will close all opened segments to ensure
file descriptors are not kept open indefinitely.
2025-01-27 13:50:38 +00:00
Jean-Sébastien Pédron f549425615
rabbitmq_ct_broker_helpers: Use node 2 as the cluster seed node
[Why]
When running mixed-version tests, nodes 1/3/5/... are using the primary
umbrella, so usually the newest version. Nodes 2/4/6/... are using the
secondary umbrella, thus the old version.

When clustering, we used to use node 1 (running a new version) as the
seed node, meaning other nodes would join it.

This complicates things with feature flags because we have to make sure
that we start node 1 with new stable feature flags disabled to allow old
nodes to join.

This is also a problem with Khepri machine versions because the cluster
would start with the latest version, which old nodes might not have.

[How]
This patch changes the logic to use a node running the secondary
umbrella as the seed node instead. If there is no node running it, we
pick the first node as before.

V2: Revert part of "rabbitmq_ct_helpers: Fix how we set
    `$RABBITMQ_FEATURE_FLAGS` in tests" (commit
    57ed962ef6). These changes are no
    longer needed with the new logic.

V3: The check that verifies that the correct metadata store is used has
    a special case for nodes that use the secondary umbrella: if Khepri
    is supposed to be used but it's not, the feature flag is enabled.
    The reason is that the `v4.0.x` branch doesn't know about the `rel`
    configuration of `forced_feature_flags_on_init`. The nodes will
    have ignored thies parameter and booted with the stable feature
    flags only.

    Many testsuites are adapted to the new clustering order. If they
    manage which node joins which node, either the order is changed in
    the testcases, or nodes are started with only required feature
    flags. For testsuites that rely on peer discovery where the order is
    unknown, nodes are started with only required feature flags.
2025-01-27 12:08:12 +01:00
Michael Klishin 28602bea37
scripts/rabbitmqctl: allow standard input reads for 'import_definitions'
It was not listed in 7da7d4e1e, even though the command
accepts definitions via standard input.

References #10268.
Closes #13157.
2025-01-26 18:36:08 -05:00
Michael Klishin a3decfa69e
Merge pull request #13115 from rabbitmq/issue-13040
Support exchange federation with MQTT 5.0 subscribers
2025-01-25 16:25:26 -05:00
Simon c9f060d921
Merge branch 'main' into su_aws/limit_logging_error_with_tests 2025-01-24 15:55:37 -08:00
Simon Unge 3702b00471 Log incorrectly claims the limit is per node, but the component count is over all vhost in the cluster 2025-01-24 23:54:11 +00:00
Michael Klishin de088f8947
Revert "Log incorrectly claims the limit is per node," 2025-01-24 16:42:52 -05:00
Michael Klishin 79dd5038f7
Merge pull request #13149 from rabbitmq/su_aws/limit_logging_error
Log incorrectly claims the limit is per node,
2025-01-24 14:10:00 -05:00
Simon Unge c93cacf477 Log incorrectly claims the limit is per node, but the component count is over all vhost in the cluster 2025-01-24 18:22:57 +00:00
Arnaud Cogoluègnes a67634fcf1
Merge pull request #13146 from rabbitmq/queue-purge-return-not-found-if-queue-does-not-exist
Return 404 in AMQP management queue purge for non-existing queue
2025-01-24 15:59:17 +00:00
Jean-Sébastien Pédron aeca23c69d
amqp_client_SUITE: Fix several test flakes
[How]
1. Use feature flags correctly: the code shouldn't test if a feature
   flag is enabled, assuming something else enabled it. It should enable
   it and react to an error.
2. Use `close_connection_sync/1` instead of the asynchronous
   `amqp10_client:close_connection/1` to make sure they are really
   closed. The wait in `end_per_testcase/2` was not enough apparently.
3. For the two testcases that flake the most for me, enclose the code in
   a try/after and make sure to close the connection at the end,
   regardless of the result. This should be done for all testcases
   because the testgroup use a single set of RabbitMQ nodes for all
   testcases, therefore testcases are supposed to clean up after them...
2025-01-24 15:38:11 +01:00
Arnaud Cogoluègnes 7bfe2fd66f
Return 404 in AMQP management queue purge for non-existing queue 2025-01-24 14:57:44 +01:00
David Ansari 1267d5986d Simplify Direct Reply-To
This commit is no change in functionality and mostly deletes dead code.

1. Code targeting Erlang 22 and below is deleted since the mininmum
   required Erlang version is higher nowadays.
   "In OTP 23 distribution flag DFLAG_BIG_CREATION became mandatory. All
   pids are now encoded using NEW_PID_EXT, even external pids received
   as PID_EXT from older nodes."
   https://www.erlang.org/doc/apps/erts/erl_ext_dist.html#new_pid_ext
2. All v1 encoding and decoding of the Pid is deleted since the lower
   version RabbitMQ node supports the v2 encoding nowadays.
2025-01-23 19:16:30 +01:00
Karl Nilsson 2f89bd9122
Merge pull request #13095 from rabbitmq/qq-resend-pending-commands-on-applied
QQ: resend pending commands when new leader detected on applied notif…
2025-01-22 20:43:31 +01:00
Karl Nilsson d31b9aa8a3 QQ: resend pending commands when new leader detected on applied notification.
When a leader changes all enqueuer and consumer processes are notified
from the `state_enter(leader,` callback. However a new leader may not
yet have applied all commands that the old leader had. If any of those
commands is a checkout or a register_enqueuer command these processes
will not be notified of the new leader and thus may never resend their
pending commands.

The new leader will however send an applied notification when it does
apply these entries and these are always sent from the leader process
so can also be used to trigger pending resends. This commit implements
that.
2025-01-22 13:38:21 +00:00
David Ansari 3a65695d0a Support exchange federation with MQTT 5.0 subscribers
## What?
This commit fixes #13040.

Prior to this commit, exchange federation crashed if the MQTT topic exchange
(`amq.topic` by default) got federated and MQTT 5.0 clients subscribed on the
downstream. That's because the federation plugin sends bindings from downstream
to upstream via AMQP 0.9.1. However, binding arguments containing Erlang record
`mqtt_subscription_opts` (henceforth binding args v1) cannot be encoded in AMQP 0.9.1.

 ## Why?
Federating the MQTT topic exchange could be useful for warm standby use cases.

 ## How?
This commit makes binding arguments a valid AMQP 0.9.1 table (henceforth
binding args v2).

Binding args v2 can only be used if all nodes support it. Hence binding
args v2 comes with feature flag `rabbitmq_4.1.0`. Note that the AMQP
over WebSocket
[PR](https://github.com/rabbitmq/rabbitmq-server/pull/13071) already
introduces this same feature flag. Although the feature flag subsystem
supports plugins to define their own feature flags, and the MQTT plugin
defined its own feature flags in the past, reusing feature flag
`rabbitmq_4.1.0` is simpler.

This commit also avoids database migrations for both Mnesia and Khepri
if feature flag `rabbitmq_4.1.0` gets enabled. Instead, it's simpler to
migrate binding args v1 to binding args v2 at MQTT connection establishment
time if the feature flag is enabled. (If the feature flag is disabled at
connection etablishment time, but gets enabled during the connection
lifetime, the connection keeps using bindings args v1.)

This commit adds two new suites:
1. `federation_SUITE` which tests that federating the MQTT topic
   exchange works, and
2. `feature_flag_SUITE` which tests the binding args migration from v1 to v2.
2025-01-22 11:04:36 +01:00
GitHub a3d0a5af4f bazel run gazelle 2025-01-22 04:02:32 +00:00
Arnaud Cogoluègnes 31a4d611f1
Emit events on stream consume and cancel 2025-01-20 17:24:55 +01:00
Arnaud Cogoluègnes 1886caec0a
Merge pull request #13092 from rabbitmq/stream-consumer-cancel-event
Emit cancellation event only when stream consumer is cancelled
2025-01-17 14:13:08 +00:00
Arnaud Cogoluègnes 69d0382dd2
Emit cancellation event only when stream consumer is cancelled
Not when the channel or the connection is closed.

References #13085, #9356
2025-01-17 14:42:40 +01:00
Michal Kuratczyk 3ff90deb7c
Merge pull request #13069 from rabbitmq/update-startup-checks
Remove deprecated/unused/old startup checks
2025-01-17 14:28:50 +01:00
Michal Kuratczyk 14171fb035
Remove msg_store_io_batch_size and msg_store_credit_disc_bound checks
msg_store_io_batch_size is no longer used

msg_store_credit_disc_bound appears to be used in the code, but I don't
see any impact of that value on the performance. It should be properly
investigated and either removed completely or fixed, because there's
hardly any point in warning about the values configured
(plus, this settings is hopefully almost never used anyway)
2025-01-17 13:38:43 +01:00
Michal Kuratczyk 954b861db7
Don't warn about dirty I/O scheduler count 2025-01-16 17:42:13 +01:00
Péter Gömöri efd4e45ed8 Fix return value of `rabbit_priority_queue:delete_crashed/1`
According to the `rabbit_backing_queue` behavious it must always
return `ok`, but it used to return a list of results one for each
priority. That caused the below crash further up the call chain.

```
> rabbit_classic_queue:delete_crashed(Q)
** exception error: no case clause matching [ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok]
     in function  rabbit_classic_queue:delete_crashed/2 (rabbit_classic_queue.erl, line 516)
```

Other backing_queue implementations (`rabbit_variable_queue`) just
exit with a badmatch upon error.

This (very minor) issue is present since 3.13.0 when
`rabbit_classic_queue:delete_crashed_in_backing_queue/1` was
instroduced with Khepri in commit 5f0981c5. Before that the result of
`BQ:delete_crashed/1` was simply ignored.
2025-01-16 17:34:19 +01:00
Arnaud Cogoluègnes 114a5c220f
Delete stream consumer metrics when AMQP 091 connection closes (#13085)
To avoid rogue consumer records.
2025-01-16 15:40:06 +01:00
Karl Nilsson e7c624dd46 QQ: improve fifo client log message on leader change
to capture the number of pending commands that will be resent
2025-01-16 09:36:23 +00:00
David Ansari 290889b936 Include sessions in format_status/1
Include monitored session pids in format_status/1 of rabbit_amqp_writer.
They could be useful when debugging.
The maximum number of sessions per connection is limited, hence the
output won't be too large.
2025-01-16 10:06:24 +01:00
Jean-Sébastien Pédron 57ed962ef6
rabbitmq_ct_helpers: Fix how we set `$RABBITMQ_FEATURE_FLAGS` in tests
[Why]
In order to make `khepri_db` the default in the future, the handling of
`$RABBITMQ_FEATURE_FLAGS` had to be adapted to be able to *disable*
Khepri instead.

Unfortunately I broke the behavior with stable feature flags that are
only available in the primary umbrella. In this case, they were
automatically enabled and thus, clustering with an old umbrella that did
not have these feature flags failed with `incompatible_feature_flags`.

[How]
The solution is to always use an absolute list of feature flags, not the
new relative list.

V2: Allow a testsuite to skip the configuration of the metadata store.
    This is needed for the feature_flags_SUITE testsuite because it
    tests the default behavior and the configuration of the metadata
    store changes that behavior.

    While here, fix a ct log message where variables were swapped
    compared to the format strieg expectation.

V3: Enable `rabbitmq_4.0.0` feature flag in rabbit_mgmt_http_SUITE. This
    testsuite apparently requires it and if it's not enabled, it fails.
2025-01-15 20:43:41 +01:00
Michal Kuratczyk a4634d3f70
Allow InitialCredit/MoreCreditAfter of zero (#13067)
https://github.com/rabbitmq/rabbitmq-server/pull/13046
introduced additional checks which prevent setting
`{credit_flow_default_credit,{0,0}}`.

Setting credits to zero allows disabling the credit flow mechanism
(we use it in our benchmarks and mention for example in
https://www.rabbitmq.com/blog/2023/03/21/native-mqtt)
2025-01-14 16:12:05 +01:00
Michal Kuratczyk 1077a55194
Stream queue: consumers are active by default
Without this change, consumers using protocols other than the stream
protocol would display as inactive in the Management UI/API and CLI
commands, even though they were receiving messages.
2025-01-13 15:51:39 +01:00
Michael Klishin 208d3d6b59
Follow-up to #13046 #13055: accept MoreCreditAfter that's equal to InitialCredit 2025-01-12 16:48:53 -05:00
Michael Klishin 3dd3433722
Finish off #13046 2025-01-11 19:15:00 -05:00
Michael Klishin e5fe7247dc
rabbitmq.conf.example: a typo #8076 2025-01-11 17:36:51 -05:00
Michael Klishin 82c93ceb23
rabbitmq.conf.example: document quorum_queue.property_equivalence.relaxed_checks_on_redeclaration #8076 2025-01-11 17:36:50 -05:00
Michael Klishin 4603d3597e
rabbitmq.conf.example: suggest Discussions and Discord for questions 2025-01-11 17:36:50 -05:00
jimmy 0135f62528 Fix typo 2025-01-11 14:15:35 +08:00
jimmy 2394d427e3 Fix typo 2025-01-11 13:19:31 +08:00
jimmy e6954c8720 Change to throw an error when credit_flow_default_credit invalid 2025-01-11 13:13:12 +08:00
root e589c42476 Change to throw an error when credit_flow_default_credit invalid 2025-01-11 13:10:17 +08:00
JimmyWang6 81648759f8 Fix typo 2025-01-10 15:50:38 +08:00
jimmy wang 5b244e7504 Fix MoreCreditAfter could larger than InitialCredit 2025-01-10 14:42:10 +08:00
Michael Klishin 1d88a9d28f
definition_import_SUITE: a new case
that features a deletion-protected virtual host.
2025-01-03 19:07:57 -05:00
Michael Klishin 315c247231
Make sure protected_from_deletion is included into virtual host definitions exported over the HTTP API 2025-01-03 18:49:11 -05:00
Michael Klishin 3f5b13d47f
Merge branch 'main' into mk-virtual-host-protection-from-accidental-deletion 2025-01-02 17:01:54 -05:00
Michael Klishin f62d46c286
Introduce a way to protect a virtual host from deletion
Accidental "fat finger" virtual deletion accidents
would be easier to avoid if there was a protection mechanism
that would apply equally even to CLI tools and external
applications that do not use confirmations for deletion
operations.

This introduce the following changes:

 * Virtual host metadata now supports a new queue,
   'protected_from_deletion', which, when set,
   will be considered by key virtual host deletion function(s)
 * DELETE /api/vhosts/{name} was adapted to handle
   such blocked deletion attempts to respond with
   a 412 Precondition Failed status
 * 'rabbitmqctl list_vhosts' and 'rabbitmqctl delete_vhost'
   were adapted accordingly
 * DELETE /api/vhosts/{name}/deletion/protection
   is a new endpoint that can be used to remove
   the protective seal (the metadata key)
 * POST /api/vhosts/{name}/deletion/protection
   marks the virtual host as protected

In the case of the HTTP API, all operations on
virtual host metadata require administrative
privileges from the target user.

Other considerations:

 * When a virtual host does not exist, the behavior
  remains the same: the original, protection-unaware
  code path is used to preserve backwards compatibility

References #12772.
2025-01-02 16:50:51 -05:00
Michael Klishin ac7dcc9abe
Merge pull request #13010 from johanrhodin/link-fix-patch-1
Update doc link
2025-01-02 12:19:04 -05:00
Johan Rhodin e35edf789d
Update doc link 2025-01-02 10:53:19 -06:00
Michael Klishin 2aed29709e
mirrored_supervisor_SUITE: don't search logs for exceptions #13008 2025-01-02 11:05:49 -05:00
Michael Klishin 968eefa1bb
Bump (c) line year
There are no functional changes to this massive diff.
2025-01-01 17:54:10 -05:00
David Ansari 42ede4a258 Speed up tests
Multiple test cases were recently slowed down by up to 30 seconds.
This commit reverts these changes.
2024-12-30 16:56:18 +00:00
Michal Kuratczyk 68de3fdb77
Fix channel crash when publishing to a new stream (#12969)
The following scenario led to a channel crash:
1. Publish to a non-existing stream: `perf-test -y 0 -p -e amq.default -t direct -k stream`
2. Declare the stream: `rabbitmqadmin declare queue name=stream queue_type=stream`

There is no pid yet, so we got a function_clause with `none`
```
{function_clause,
   [{osiris_writer,write,
        [none,<0.877.0>,<<"<0.877.0>_-65ZKFz18ll5lau0phi7CsQ">>,1,
         [[0,"Sp",[192,6,5,"B@@AC"]],
          [0,"Sr",
           [193,38,4,
            [[[163,10,<<"x-exchange">>],[161,0,<<>>]],
             [[163,13,<<"x-routing-key">>],[161,6,<<"stream">>]]]]],
          [0,"Su",[160,12,[<<0,19,252,1,0,0,98,171,20,16,108,167>>]]]]],
        [{file,"src/osiris_writer.erl"},{line,158}]},
    {rabbit_stream_queue,deliver0,4,
        [{file,"rabbit_stream_queue.erl"},{line,540}]},
    {rabbit_stream_queue,'-deliver/3-fun-0-',4,
        [{file,"rabbit_stream_queue.erl"},{line,526}]},
    {lists,foldl,3,[{file,"lists.erl"},{line,2146}]},
    {rabbit_queue_type,'-deliver0/4-fun-5-',5,
        [{file,"rabbit_queue_type.erl"},{line,707}]},
    {maps,fold_1,4,[{file,"maps.erl"},{line,860}]},
    {rabbit_queue_type,deliver0,4,
        [{file,"rabbit_queue_type.erl"},{line,704}]},
    {rabbit_queue_type,deliver,4,
        [{file,"rabbit_queue_type.erl"},{line,662}]}]}
```

Co-authored-by: Karl Nilsson <kjnilsson@gmail.com>
2024-12-20 08:56:25 +01:00
Jean-Sébastien Pédron ea2c8db2d1
rabbit_feature_flags: Add testcase after issue #12963
[Why]
Up-to RabbitMQ 3.13.x, there was a case where if:
1. you enabled a plugin
2. you enabled its feature flags
3. you disabled the plugin
4. you restarted a node (or upgraded it)

... the node could crash on startup because it had a feature flag marked
as enabled that it didn't know about:

    error:{badmatch,#{feature_flags => ...

        rabbit_ff_controller:-check_one_way_compatibility/2-fun-0-/3, line 514
        lists:all_1/2, line 1520
        rabbit_ff_controller:are_compatible/2, line 496
        rabbit_ff_controller:check_node_compatibility_task1/4, line 437
        rabbit_db_cluster:check_compatibility/1, line 376

This was "fixed" by the new way of keeping the registry in memory
(#10988) because it introduces a slight change of behavior. Indeed, the
old way walked through the `FeatureFlags` map and looked up the state in
the `FeatureStates` map to create the `is_enabled/1` function. The new
way just looks up the state in `FeatureStates`.

[How]
The new testcase succeeds on 4.0.x and `main`, but would fail on 3.13.x
with the aforementionne crash.
2024-12-19 16:33:43 +01:00
Jean-Sébastien Pédron 3325def8eb
rabbit_feature_flags: Take callback definition from correct node
[Why]
The feature flag controller that is responsible for enabling a feature
flag may be on a node that doesn't know this feature flag. This is
supported by there is a bug when it queries the callback definition for
that feature flag: it uses its own registry which does not have anything
about this feature flag.

This leads to a crash because the `run_callback/5` funtion tries to use
the `undefined` atom returned by the registry as a map:

    crasher:
      initial call: rabbit_ff_controller:init/1
      pid: <0.374.0>
      registered_name: rabbit_ff_controller
      exception error: bad map: undefined
        in function  rabbit_ff_controller:run_callback/5
        in call from rabbit_ff_controller:do_enable/3 (rabbit_ff_controller.erl, line 1244)
        in call from rabbit_ff_controller:update_feature_state_and_enable/2 (rabbit_ff_controller.erl, line 1180)
        in call from rabbit_ff_controller:enable_with_registry_locked/2 (rabbit_ff_controller.erl, line 1050)
        in call from rabbit_ff_controller:enable_many_locked/2 (rabbit_ff_controller.erl, line 991)
        in call from rabbit_ff_controller:enable_many/2 (rabbit_ff_controller.erl, line 979)
        in call from rabbit_ff_controller:updating_feature_flag_states/3 (rabbit_ff_controller.erl, line 307)
        in call from gen_statem:loop_state_callback/11 (gen_statem.erl, line 3735)

[How]
The callback definition is now queried from the first node in the list
given as argument. For the common use case where all nodes know about a
feature flag, the first node is the local one, so there should be no
latency caused by the RPC.

See #12963.
2024-12-19 13:45:27 +01:00
Jean-Sébastien Pédron dbec429fba
rabbit_feature_flags: Fix function name in the controller
[Why]
`state_after_virtual_state()` meant nothing.

`state_after_virtual_reset()` was the name I had in mind.
2024-12-19 11:54:25 +01:00
Jean-Sébastien Pédron debe2a118c
rabbitmq_ct_helpers: Change how Mnesia/Khepri is selected
[Why]
Once `khepr_db` is enabled by default, we need another way to disable it
to select Mnesia instead.

[How]
We use the new relative forced feature flags mechanism to indicate if we
want to explicitly enable or disable `khepri_db`. This way, we don't
touch other stable feature flags and only mess with Khepri.

However, this mechanism is not supported by RabbitMQ 4.0.x and older.
They will ignore the setting. Therefore, to make this work in
mixed-version testing, we set the `$RABBITMQ_FEATURE_FLAGS` variable for
the secondary umbrella. This part will go away once we test against
RabbitMQ 4.1.x as the secondary umbrella in the future.

At the end, we compare the effective metadata store to the expected one.
If they don't match, we skip the test.

While here, change `rjms_topic_selector_SUITE` to only choose Khepri
without specifying any feature flags.
2024-12-17 09:56:54 +01:00
Michael Klishin 0db3d7b014
Merge pull request #12950 from rabbitmq/qq-handle-tick
Quorum queues: ignore handle_tick with an old overview format
2024-12-16 11:31:55 -05:00
Michael Klishin 62ce1c954a
Merge pull request #12948 from rabbitmq/fix-flakes
Test fixes for a few more CI flakes
2024-12-16 11:24:10 -05:00
Diana Parra Corbacho a97ec92785 Quorum queues: ignore handle_tick with an old overview format
If handle_tick is called before the machine has finished the upgrade
process, it could receive an old overview format (stats tuple vs map).
Let's ignore it and the next handle tick should be fine.

Unlikely to happen in production, detected on CI with a very low tick timeout
2024-12-16 15:39:39 +01:00
Diana Parra Corbacho fe7a141331 Test: Increase receive timeout in all rabbit test suites 2024-12-16 11:58:05 +01:00
GitHub 0d750769f9 bazel run gazelle 2024-12-14 04:02:32 +00:00
David Ansari b6027ece28 Fix dead lettering crash
Fixes #12933

The assumption that `x-last-death-*` annotations must have been set
whenever the `deaths` annotation is set was wrong.

Reproducation steps, Option 1:
1. In v3.13.7, dead letter a message from Q1 to Q2 (both can be classic queues).
2. Re-publish the message including its x-death header from Q2 back to Q1.
(RabbitMQ 3.13.7 will interpret this x-death header and set the deaths annotation.)
3. Upgrade to v4.0.4
4. Dead letter the message from Q1 to Q2 will cause the following crash:
```
crasher:
  initial call: rabbit_amqqueue_process:init/1
  pid: <0.577.0>
  registered_name: []
  exception exit: {{badkey,<<"x-last-death-exchange">>},
                   [{mc,record_death,4,[{file,"mc.erl"},{line,410}]},
                    {rabbit_dead_letter,publish,5,
                        [{file,"rabbit_dead_letter.erl"},{line,38}]},
                    {rabbit_amqqueue_process,'-dead_letter_msgs/4-fun-0-',
                        7,
                        [{file,"rabbit_amqqueue_process.erl"},{line,1060}]},
                    {rabbit_variable_queue,'-ackfold/4-fun-0-',3,
                        [{file,"rabbit_variable_queue.erl"},{line,655}]},
                    {lists,foldl,3,[{file,"lists.erl"},{line,2146}]},
                    {rabbit_variable_queue,ackfold,4,
                        [{file,"rabbit_variable_queue.erl"},{line,652}]},
                    {rabbit_priority_queue,ackfold,4,
                        [{file,"rabbit_priority_queue.erl"},{line,309}]},
                    {rabbit_amqqueue_process,
                        '-dead_letter_rejected_msgs/3-fun-0-',5,
                        [{file,"rabbit_amqqueue_process.erl"},
                         {line,1038}]}]}
```

Reproduction steps, Option 2:
1. Run a 4.0.4 / 3.13.7 mixed version cluster where both queues Q1 and Q2
   are hosted on the 4.0.4 node.
2. Send a message to Q1 which dead letters to Q2.
3. Re-publish a message with the x-death AMQP 0.9.1 header from Q2 to
   Q1. However, this time make sure to publish to the 3.13.7 node which
   forwards this message to Q1 on the 4.0.4 node.
4. Subsequently dead lettering this message from Q1 to Q2 (happening on
   the 4.0.4 node) will also cause the crash.

The modified test case in this commit was able to repro this crash via
Option 2 in the mixed version cluster tests on the `v4.0.x` branch.
2024-12-13 19:25:43 +01:00
Matteo Cafasso 8d7535e0b1
amqqueue_process: adopt new `is_duplicate` backing queue callback
As the de-duplication plugin is the only adopter of the `is_duplicate`
callback, we now use a simpler signature.

When a message is deemed duplicated, we discard it and re-route it to
dead letter exchange.

Signed-off-by: Matteo Cafasso <noxdafox@gmail.com>
(cherry picked from commit f93baa35cb)
2024-12-11 19:43:45 -05:00
Matteo Cafasso 6a6e760107
backing_queue: simplify `is_duplicate` callback signature
`is_duplicate` callback signature was changed in order to support both
the mirroring queues as well as the de-duplication ones.

As the mirroring queues are now deprecated and removed, we can fall
back to a simpler boolean as return value.

Signed-off-by: Matteo Cafasso <noxdafox@gmail.com>
(cherry picked from commit c927446e17)
2024-12-11 19:43:38 -05:00
David Ansari 9d8ae14e27 Use correct AMQP filter expression string modifier prefix
Section 4.1.1 of AMQP Filter Expressions Working Draft 09
defines `&` (ampersand) instead of `$` (dollar) as the string modifier prefix.
2024-12-11 16:48:56 +01:00
Michael Klishin b84483ab5c
Merge pull request #12907 from rabbitmq/rabbitmq-server-12906
By @gomoripeti: Restore credit_flow between AMQP 0.9.1 channel/MQTT connection -> CQ processes
2024-12-10 10:03:47 -05:00
David Ansari 0d34ef6047 Set a floor of zero for incoming-window
Prior to this commit, when the sending client overshot RabbitMQ's incoming-window
(which is allowed in the event of a cluster wide memory or disk alarm),
and RabbitMQ sent a FLOW frame to the client, RabbitMQ sent a negative
incoming-window field in the FLOW frame causing the following crash in
the writer proc:
```
crasher:
  initial call: rabbit_amqp_writer:init/1
  pid: <0.19353.0>
  registered_name: []
  exception error: bad argument
    in function  iolist_size/1
       called as iolist_size([<<112,0,0,23,120>>,
                              [82,-15],
                              <<"pÿÿÿü">>,<<"pÿÿÿÿ">>,67,
                              <<112,0,0,23,120>>,
                              "Rª",64,64,64,64])
       *** argument 1: not an iodata term
    in call from amqp10_binary_generator:generate1/1 (amqp10_binary_generator.erl, line 141)
    in call from amqp10_binary_generator:generate1/1 (amqp10_binary_generator.erl, line 88)
    in call from amqp10_binary_generator:generate/1 (amqp10_binary_generator.erl, line 79)
    in call from rabbit_amqp_writer:assemble_frame/3 (rabbit_amqp_writer.erl, line 206)
    in call from rabbit_amqp_writer:internal_send_command_async/3 (rabbit_amqp_writer.erl, line 189)
    in call from rabbit_amqp_writer:handle_cast/2 (rabbit_amqp_writer.erl, line 110)
    in call from gen_server:try_handle_cast/3 (gen_server.erl, line 1121)
```

This commit fixes this crash by maintaning a floor of zero for
incoming-window in the FLOW frame.

Fixes #12816
2024-12-10 09:39:21 +01:00
Péter Gömöri 2c1f1a1387
Restore credit_flow between channel/MQTT connection -> CQ processes
The credit_flow between publishing AMQP 0.9.1 channel (or MQTT
connection) and (non-mirrored) classic queue processes was
unintentionally removed in 4.0 together with anything else related to
CQ mirroring.

By default we restore the 3.x behaviour for non-mirored classic
queues. It is possible to disable flow-control (the earlier 4.0.x
behaviour) with the new env `classic_queue_flow_control`. In 3.x this
was possible with the config `mirroring_flow_control`.

(cherry picked from commit d65bd7d07a)
2024-12-09 22:33:47 -05:00
Jean-Sébastien Pédron 56f90a51a9
rabbit_db: Return error from `force_boot_command_test/0` with Khepri 2024-12-02 13:33:08 +01:00
Jean-Sébastien Pédron df9882417c
rabbit_khepri: Report no partitions from `cli_cluster_status/0` 2024-11-29 16:53:55 +01:00
Jean-Sébastien Pédron 4621fe7730
mirrored_supervisor: Catch timeout from Khepri in `hanlde_info/2`
[Why]
The code assumed that the transaction would always succeed. It was kind
of the case with Mnesia because it would throw an exception if it
failed.

Khepri returns an error instead. The code has to handle it. In
particular, we see timeouts in CI and before this patch, they caused a
crash because the list comprehension was asked to work on a tuple.

[How]
We now retry a few times for 10 seconds.
2024-11-29 12:03:59 +01:00
Jean-Sébastien Pédron 913bd9fa42
rabbit_db: Fix `rabbit_db_msup:update_all/2` spec
[Why]
It can return an error.
2024-11-29 12:03:35 +01:00
Jean-Sébastien Pédron ae9fbb7bd5
Pin Horus to 0.3.1 temporarily
[Why]
We pin a version of Horus even if we don't use it directly (it is a
dependency of Khepri). But currently, we can't update Khepri while still
needing the fix in Horus 0.3.1.

Horus 0.3.1 works around a crash in `cover` that mostly affects CI for
now.

This pinning will have to go away with the next update of Khepri.
2024-11-29 09:50:08 +01:00
Jean-Sébastien Pédron 99d8e90df3
rabbit_quorum_queue: Wait for member add in `add_member/4`
[Why]
The `ra:member_add/3` call returns before the change is committed. This
is ok for that addition but any follow-up changes to the cluster might
be rejected with the `cluster_change_not_permitted` error.

[How]
Instead of changing other places to wait or retry their cluster
membership change, this patch waits for the current add to be applied
before proceeding and returning.

This fixes some transient failures in CI where such follow-up changes
are rejected and not retried, leaving the cluster in an unexpected state
for the testcase.

An example is with
`quorum_queue_SUITE:force_shrink_member_to_current_member/1`
2024-11-28 11:27:40 +01:00
Michal Kuratczyk 46259b5a48
Fix invalid warning about transient queues being used
This fixes the issue where RabbitMQ would warn about
transient queues being used in a cluster with no transient queues.

Fixes https://github.com/rabbitmq/rabbitmq-server/issues/12802
2024-11-27 22:04:01 +01:00
Michal Kuratczyk 1552f89dd7
Skip are_transient_nonexcl_used check on virin node
This check fails on a virin node, because the metadata store
is not yet ready to handle the query. However, a virin
node by definition can't have any queues, so let's just return
false without asking.
2024-11-27 22:03:53 +01:00
Michael Klishin 1cae417dbf
Merge pull request #12821 from rabbitmq/rabbitmq-server-12776
Definition export: inject default queue type into virtual host metadata
2024-11-27 14:53:25 -05:00
Michael Klishin 8a5ea76fe4
Inject DQT into 'ctl export_definitions' 2024-11-27 12:29:48 -05:00
Diana Parra Corbacho d004d69200 Tests: feature_flags_v2_SUITE ignore peer:stop/1 return value 2024-11-27 15:45:58 +01:00
Michael Klishin 090d11818f
HTTP API tests for injected default queue type 2024-11-26 18:00:37 -05:00
Michael Klishin 51e6004840
Inject DQT into GET /api/definitions and /api/vhosts
References #12776
2024-11-26 02:04:30 -05:00
Jean-Sébastien Pédron f6314d06b3
rabbit_peer_discovery: Retry RPC calls
[Why]
In CI, we observe some timeouts in the Erlang distribution connections
between the temporary hidden node and the nodes it queries. This affects
peer discovery obviously.

[How]
We introduce some query retries to reduce the risk of an incomplete
query.

While here, we move the sorting of queried nodes from the
`query_node_props2/3` last clause (executed in the temporary hidden
node) to the function setting the temporary hidden node and asking for
these queries. This way the debug messages from that sorting are logged
by RabbitMQ out of the box.
2024-11-25 16:16:16 +01:00
Jean-Sébastien Pédron 4d4985f254
rabbit_peer_discovery: Fix non-tail-recursive `query_node_props2()`
[Why]
This impacts what is reported by the catch because it caught exceptions
emitted by code supposedly called later. An example is the assert
in `query_node_props2/3` last clause.
2024-11-25 16:16:15 +01:00
Jean-Sébastien Pédron 62f22a7655
rabbit_peer_discovery: Remove the use of group leader proxy
[Why]
This was the first solution put in place to prevent that the temporary
hidden node connects to the node that started it to write any printed
messages. Because of this, the nodes that the temporary hidden node
queried found out about the parent node and they opened an Erlang
distribution connection to it. This polluted the known nodes list.

However later, the temporary hidden node was started with the
`standard_io` connection option. This prevented the temporary hidden
node from knowing about the node that started it, solving the problem in
a cleaner way.

[How]
This commit garbage-collects that piece of code that is now useless. It
makes the query code way simpler to understand.
2024-11-25 16:16:12 +01:00
D Corbacho 1fa4fe2735
Merge pull request #12775 from rabbitmq/fix-flakes
Fixes for test flakes
2024-11-25 16:12:29 +01:00
Jean-Sébastien Pédron fe2061b13b
quorum_queue_member_reconciliation_SUITE: Improve `reset_nodes/2`
[How]
The function now accepts that the node to reset is already out of the
cluster. This avoids a mismatch exception for a situation that is ok.
2024-11-25 12:55:26 +01:00
Jean-Sébastien Pédron 03f9d36988
rabbit_vhosts: Don't reconcile vhosts if `rabbit` is stopped
[Why]
That timer was started during boot and continued regardless if `rabbit`
was running or stopped.

This caused the reconsiliation to crash if the `rabbit` app was stopped
before the it ended because it tried to access the database even though
it was stopped or even reset.

[How]
We just check if `rabbit` is running before running one reconciliation
and scheduling a new one.
2024-11-25 12:39:13 +01:00
Diana Parra Corbacho 73924ba08e Tests: amqp_client_SUITE delete all queues on end per testcase 2024-11-25 09:06:33 +01:00
Diana Parra Corbacho a35f56fdc2 Tests: amqp_filtex_SUITE wait for link attachment and longer timeouts 2024-11-25 09:06:32 +01:00
GitHub 4e8d0f3ac2 bazel run gazelle 2024-11-23 04:02:38 +00:00
Michael Davis c3c7675bda
rabbit_khepri: Add macros for path patterns 2024-11-22 11:21:11 -05:00
Michael Davis e8fb9b6889
rabbit: Move include/{khepri.hrl => rabbit_khepri.hrl}
This fixes erlang_ls's header resolution. Previously it would confuse
the include_lib of the `khepri.hrl` from Khepri with this header in
the rabbit app.

This header is also specific to how rabbit uses Khepri so I think the
new name fits better.
2024-11-22 11:21:11 -05:00
Michael Klishin ea58fb1b48
crashing_queues_SUITE: squash a compiler warning 2024-11-17 17:23:00 -05:00
Michael Klishin 9f026f7a4b
Merge pull request #12727 from rabbitmq/rabbitmq-server-12709
By @Ayanda-D: Ensure only alive leaders and followers when fetching QQ replica states
2024-11-15 13:53:41 -05:00
Ayanda Dube 53cc8f8f2b
Update unit_quorum_queue_SUITE to use temporary alive & registered
test queue processes (since we now check/return only alive members
when fetching replica states)

(cherry picked from commit ebc0387b81)
2024-11-15 12:49:55 -05:00
David Ansari 6e8b566323 Deduplicate AMQP type inference
Introduce a single place in the AMQP 1.0 Erlang client that infers the AMQP 1.0 type.

Erlang integers are inferred to be AMQP type `long` to avoid overflow surprises.
2024-11-15 17:40:36 +01:00
Jean-Sébastien Pédron 2938338182
rabbit_khepri: Do not hard-code `coordination`, use the constant instead 2024-11-15 16:41:16 +01:00
Jean-Sébastien Pédron 05717ccccf
rabbit_khepri: Remove serial file during reset 2024-11-15 16:40:50 +01:00
Jean-Sébastien Pédron e41d766b29
rabbit_khepri: Ensure RabbitMQ is stopped before resetting with Khepri 2024-11-15 16:40:45 +01:00
Jean-Sébastien Pédron 7e2e7b79f2
rabbit_feature_flags: Support relative setting in `forced_feature_flags_on_init`
[Why]
We already support that from the environment variable, it is easy to add
to the configuration setting.
2024-11-15 14:50:35 +01:00
Michael Klishin 3e509c9f30
Merge pull request #12714 from rabbitmq/amqp-event-exchange
Support publishing AMQP 1.0 to Event Exchange
2024-11-14 18:09:19 -05:00
Ayanda Dube 3ecb3b61d4
Use whereis/1 instead of rabbit_process helper, and lists:filtermap/2 in
rabbit_quorum_queue:all_replica_states/0

(cherry picked from commit 19cc2d0608)
2024-11-14 14:05:17 -05:00
Ayanda Dube 6bb4c89c71
Add test for rabbit_quorum_queue:all_replica_states/0
and ensure non-existent/inactive/noproc QQ members are
not reported.

(cherry picked from commit 4e2c62b6af)
2024-11-14 14:05:12 -05:00
Ayanda Dube 9070e394d3
Ensure only alive QQ replica states are reported
when checking replica states to help avoid missing
inactive replicas e.g. on QQ checks from cli tools

(cherry picked from commit 491485092c)
2024-11-14 14:05:08 -05:00
Michael Klishin 15d3d5a8a1
Merge pull request #12674 from rabbitmq/add-is_feature_used-callback-to-transient_nonexcl_queues-depr-feature
rabbit_amqqueue: Add `is_feature_used` callback to `transient_nonexcl_queues` depr. feature
2024-11-14 13:57:32 -05:00
Michael Klishin c888689cca
Merge pull request #12722 from rabbitmq/fix-flakes
Fix flakes
2024-11-14 13:36:17 -05:00
Loïc Hoguin db50739ad8
CQ: Fix flakes in the store file scan test
We don't expect random bytes to be there in the current
version of the message store as we overwrite empty spaces
with zeroes when moving messages around.

We also don't expect messages to be false flagged when
the broker is running because it checks for message
validity in the index. Therefore make sure message bodies
in the tests don't contain byte 255.
2024-11-14 15:04:49 +01:00
Diana Parra Corbacho 6e7269994d Tests: per_node_limit_SUITE cleanup
Catch exceptions when closing connections during cleanup
2024-11-14 15:02:47 +01:00
Diana Parra Corbacho 5ef4fba851 tests: amqp_client_SUITE longer wait on receive for CI 2024-11-14 15:02:47 +01:00
Diana Parra Corbacho 2d025b579b Tests: amqpl_consumer_ack use unmanaged connection 2024-11-14 15:02:47 +01:00
David Ansari de804d1fa7 Support publishing AMQP 1.0 to Event Exchange
## What?

Prior to this commit, the `rabbitmq_event_exchange` internally published
always AMQP 0.9.1 messages to the `amq.rabbitmq.event` topic exchange.
This commit allows users to configure the plugin to publish AMQP 1.0
messages instead.

 ## Why?

Prior to this commit, when an AMQP 1.0 client consumed events,
event properties that are lists were omitted. For example property
`client_properties` of event `connection.created` or property
`arguments` of event `queue.created` were omitted because of the following sequence:
1. The event exchange plugins listens for all kind of internal events.
2. The event exchange plugin re-publishes all events as AMQP 0.9.1 message to the event exchange.
3. Later, when an AMQP 1.0 client consumes this message, the broker must translate the message from AMQP 0.9.1 to AMQP 1.0.
4. This translation follows the rules outlined in https://www.rabbitmq.com/docs/conversions#amqpl-amqp
5. Specifically, in this table the row before the last one describes the rule we're hitting here. It says that if the AMQP 0.9.1
header value is not an `x-` prefixed header and its value is an array or table, then this header is not converted.
That's because AMQP 1.0 application-properties must be simple types as mandated in https://docs.oasis-open.org/amqp/core/v1.0/os/amqp-core-messaging-v1.0-os.html#type-application-properties

 ## How?

The user can configure the plugin as follows to have the plugin
internally publish AMQP 1.0 messages:
```
event_exchange.protocol = amqp_1_0
```

To support complex types such as lists, the plugin sets all event
properties as AMQP 1.0 message-annotations. The plugin prefixes all message
annotation keys with `x-opt-` to comply with the AMQP 1.0 spec.

 ## Alternative Design

An alternative design would have been to format all event properties
e.g. as JSON within the message body. However, this breaks routing on
specific event property values via a headers exchange.

 ## Documentation
https://github.com/rabbitmq/rabbitmq-website/pull/2129
2024-11-14 12:52:09 +01:00
Karl Nilsson bfa293ab3b QQ: reduce memory use when dropping many messages at once.
As may happen when a max_length configuration change is made
when there are many messages on the queue.
2024-11-13 09:07:40 +00:00
Michael Klishin aba62b9d12
Mention node_tags #12702 in rabbitmq.conf.example 2024-11-11 22:56:47 -05:00
Simon Unge 3d35416635 Node tags local to broker, add to /api/overview output and ctl status command 2024-11-11 20:49:21 +00:00
Jean-Sébastien Pédron 638e3a4b08
rabbit_amqqueue: Add `is_feature_used` callback to `transient_nonexcl_queues` depr. feature
[Why]
Without this callback, the deprecated features subsystem can't report if
the feature is used or not.

This reduces the usefulness of the HTTP API endpoint or the CLI command
that help verify if a cluster is using deprecated features.

[How]
The callback counts transient non-exclusive queues and return `true` if
there are one or more of them.

References #12619.
2024-11-11 15:57:52 +01:00
Michael Klishin b43a7263f5
List cluster_tags in rabbitmq.conf.example #12552 #12659 #12699 2024-11-10 20:26:24 -05:00
Michael Klishin f5801be6db
Merge pull request #12659 from rabbitmq/su_aws/cluster_tag
Make it possible to set some cluster metadata besides the name using tags
2024-11-10 19:17:53 -05:00
Michael Klishin 7c66fba0c3
Make it possible to clear cluster_tags via rabbitmq.conf 2024-11-10 14:38:34 -05:00
Michael Klishin 9e649aefc0
We no longer use 'maybe' in this module 2024-11-10 14:35:14 -05:00
Michael Klishin e5d805ea6d
Cluster tags: set unconditionally
Otherwise once set, it would not be possible
to change them by updating rabbitmq.conf
2024-11-10 14:30:51 -05:00
Michael Klishin 6b614fc879
rabbitmq.conf.example: add management.http.max_body_size 2024-11-09 18:02:16 -05:00
Simon Unge eeea517da5 Store tags in global parameters 2024-11-08 21:41:49 +00:00
Simon Unge f5ef64ad06 Add cluster tag config that is exposed via HTTP /api/overview and CTL cluster_status 2024-11-08 21:05:02 +00:00
Arnaud Cogoluègnes ca70f20bca
Merge pull request #12686 from rabbitmq/stream-coordinator-ra-local-query-infinity-timeout
Use infinity timout for RA local query in stream coordinator
2024-11-08 15:08:16 +01:00
David Ansari c839409599 Fix test flake properties_section
This test flaked in CI with the following error:
```
=== === Reason: no match of right hand side value {error,half_attached}
  in function  amqp_utils:detach_link_sync/1 (amqp_utils.erl, line 100)
  in call from amqp_filtex_SUITE:properties_section/1 (amqp_filtex_SUITE.erl, line 187)
  in call from test_server:ts_tc/3 (test_server.erl, line 1793)
  in call from test_server:run_test_case_eval1/6 (test_server.erl, line 1302)
  in call from test_server:run_test_case_eval/9 (test_server.erl, line 1234)
```
2024-11-08 11:46:19 +01:00
David Ansari 9095f7d961 Fix test flake
Increase waiting for credit being applied as described in commit
aeedad7b51 since this test case still flakes rarely with:
```
=== === Reason: {assertEqual,[{module,amqp_client_SUITE},
                               {line,3030},
                               {expression,"amqp10_msg : body ( Msg1 )"},
                               {expected,[<<"1">>]},
                               {value,[<<"2">>]}]}
  in function  amqp_client_SUITE:detach_requeues_two_connections/2 (amqp_client_SUITE.erl, line 3030)
  in call from test_server:ts_tc/3 (test_server.erl, line 1793)
  in call from test_server:run_test_case_eval1/6 (test_server.erl, line 1302)
  in call from test_server:run_test_case_eval/9 (test_server.erl, line 1234)
```
2024-11-08 11:35:04 +01:00
Arnaud Cogoluègnes 1634adbff3
Use infinity timout for RA local query in stream coordinator
The 5-second default timeout is too short.
2024-11-08 09:26:24 +01:00
David Ansari 124ef694bc Fix crashes
This commit fixes two different bugs/crashes.

To repro, prior to this commit:
1. Create an AMQP 1.0 connection on node-1.
2. Open the Management UI on node-2 and open the connection page of this
   single AMQP 1.0 connection.

The first crash was the following:
```
[error] <0.1297.0>   crasher:
[error] <0.1297.0>     initial call: cowboy_stream_h:request_process/3
[error] <0.1297.0>     pid: <0.1297.0>
[error] <0.1297.0>     registered_name: []
[error] <0.1297.0>     exception error: no case clause matching
[error] <0.1297.0>                      {badrpc,
[error] <0.1297.0>                          {'EXIT',
[error] <0.1297.0>                              {undef,
[error] <0.1297.0>                                  [{rabbit_connection_tracking,lookup,
[error] <0.1297.0>                                       [<<"[::1]:51729 -> [::1]:5672">>,
[error] <0.1297.0>                                        ['rabbit-1@ABCDDDEEAA']],
[error] <0.1297.0>                                       []}]}}}
[error] <0.1297.0>       in function  rabbit_connection_tracking:lookup/2 (rabbit_connection_tracking.erl, line 235)
[error] <0.1297.0>       in call from rabbit_mgmt_wm_connection_sessions:conn/1 (rabbit_mgmt_wm_connection_sessions.erl, line 72)
[error] <0.1297.0>       in call from rabbit_mgmt_wm_connection_sessions:is_authorized/2 (rabbit_mgmt_wm_connection_sessions.erl, line 63)
[error] <0.1297.0>       in call from cowboy_rest:call/3 (src/cowboy_rest.erl, line 1590)
[error] <0.1297.0>       in call from cowboy_rest:is_authorized/2 (src/cowboy_rest.erl, line 368)
[error] <0.1297.0>       in call from cowboy_rest:upgrade/4 (src/cowboy_rest.erl, line 284)
[error] <0.1297.0>       in call from cowboy_stream_h:execute/3 (src/cowboy_stream_h.erl, line 306)
[error] <0.1297.0>       in call from cowboy_stream_h:request_process/3 (src/cowboy_stream_h.erl, line 295)
```

The second crash was the following:
```
[error] <0.1132.0>   crasher:
[error] <0.1132.0>     initial call: cowboy_stream_h:request_process/3
[error] <0.1132.0>     pid: <0.1132.0>
[error] <0.1132.0>     registered_name: []
[error] <0.1132.0>     exception error: no case clause matching
[error] <0.1132.0>                      {tracked_connection,
[error] <0.1132.0>                          {'rabbit-1@ABCDDDEEAA',
[error] <0.1132.0>                              <<"[::1]:65505 -> [::1]:5672">>},
[error] <0.1132.0>                          'rabbit-1@ABCDDDEEAA',<<"/">>,
[error] <0.1132.0>                          <<"[::1]:65505 -> [::1]:5672">>,<13661.1110.0>,
[error] <0.1132.0>                          {1,0},
[error] <0.1132.0>                          network,
[error] <0.1132.0>                          {0,0,0,0,0,0,0,1},
[error] <0.1132.0>                          65505,<<"guest">>,1730908606089}
[error] <0.1132.0>       in function  rabbit_connection_tracking:lookup/2 (rabbit_connection_tracking.erl, line 235)
[error] <0.1132.0>       in call from rabbit_mgmt_wm_connection_sessions:conn/1 (rabbit_mgmt_wm_connection_sessions.erl, line 72)
[error] <0.1132.0>       in call from rabbit_mgmt_wm_connection_sessions:is_authorized/2 (rabbit_mgmt_wm_connection_sessions.erl, line 63)
[error] <0.1132.0>       in call from cowboy_rest:call/3 (src/cowboy_rest.erl, line 1590)
[error] <0.1132.0>       in call from cowboy_rest:is_authorized/2 (src/cowboy_rest.erl, line 368)
[error] <0.1132.0>       in call from cowboy_rest:upgrade/4 (src/cowboy_rest.erl, line 284)
[error] <0.1132.0>       in call from cowboy_stream_h:execute/3 (src/cowboy_stream_h.erl, line 306)
[error] <0.1132.0>       in call from cowboy_stream_h:request_process/3 (src/cowboy_stream_h.erl, line 295)
2024-11-07 15:11:42 +01:00
David Ansari 9d0c851df2 Show session and link details for AMQP 1.0 connection
## What?

On the connection page in the Management UI, display detailed session and
link information including:
* Link names
* Link target and source addresses
* Link flow control state
* Session flow control state
* Number of unconfirmed and unacknowledged messages

 ## How?

A new HTTP API endpoint is added:
```
/connections/:connection_name/sessions
```

The HTTP handler first queries the Erlang connection process to find out about
all session Pids. The handler then queries each Erlang session process
of this connection.

(The table auto-refreshes by default every 5 seconds. The handler querying a single
connection with 60 idle sessions with each 250 links takes ~100 ms.)

For better user experience in the Management UI, this commit also makes the
session process store and expose link names as well as source/target addresses.
2024-11-07 15:11:42 +01:00
Jean-Sébastien Pédron d2d608211a
rabbit_feature_flags: Use non-blocking call in `get_state/1`
[Why]
The previous implementation was using the blocking `is_enabled/1` API.
This meant that if a feature flag was being enabled and the enable
callback took time, the CLI's `list_feature_flag` command or any use of
the management UI would block until the feature flag was enabled.

[How]
`get_state/1` now uses the non-blocking API. However it returns a now
possible value: `state_changing`.
2024-11-06 11:35:14 +01:00
Jean-Sébastien Pédron f90cb869cc
rabbit_feature_flags: Expose more feature flag properties to the management API
[Why]
It allows to better communicate each feature flag specificities and make
a better more user-friendly management UI.
2024-11-06 11:35:14 +01:00
Jean-Sébastien Pédron 724705ca3b
rabbit_feature_flags: Declare if an experimental feature flag is supported or not
[Why]
Durint the development of Khepri, it was difficult to communicate that
it was unsupported in RabbitMQ 3.13.x but was then supported in 4.0.x
even though it was still experimental.

[How]
The feature flag definition now exposes that support level in a now
attribute called `experiment_level`. It can be `unsupported` or
`supported`.

We can use this now attribute in the CLI or the web UI to convey the
level of support to the end user.

In the future, we could imagine that an experimental feature flag
becomes abandoned, where upgraded from a node that has it enabled to a
version that marks the feature flag as abandoned is not possible.
2024-11-06 11:35:14 +01:00
Lois Soto Lopez 4819801a33 Exclude policy_repair QQ test on mixed versions 2024-11-05 16:38:18 +01:00
GitHub 71bdd1a78c bazel run gazelle 2024-11-05 04:02:22 +00:00
Michael Klishin 508ec97c88
Merge pull request #12646 from rabbitmq/metrics-flakes
metrics_SUITE: wait for tables in proper test
2024-11-04 12:08:26 -05:00
David Ansari 6034f3c411
Merge pull request #12638 from rabbitmq/amqp-connection-metrics
Expose AMQP connection metrics
2024-11-04 17:57:29 +01:00
Diana Parra Corbacho 054fcd676c metrics_SUITE: wait for tables in proper test 2024-11-04 16:46:23 +01:00
Michael Klishin 734f6853bc
Merge branch 'main' into rabbitmq-server-12412 2024-11-04 00:39:51 -05:00
Michael Klishin 8ea7e65e34
QQ: handle case where a stale read request results in member crash.
It is possible for a slow running follower with local consumers
to crash after a snapshot installation as it tries to read an entry
from its log that is no longer there (as it has been consumed and
completed by another node but still refers to prior consumers on the
current node).

This commit makes the log effect callback function more defensive
to check that the number of commands returned by the log effect
isn't different from what was requested. if it is different we
consider this a stale read request and return no further effects.

Conflicts:
	deps/rabbit/test/quorum_queue_SUITE.erl
2024-11-04 00:36:48 -05:00
David Ansari af876ed6d1
Use log macros for AMQP
Using a log macro has the benefit that location data is added as
explained in https://www.erlang.org/doc/apps/kernel/logger.html#t:metadata/0
2024-11-04 00:34:51 -05:00
David Ansari 6fde076707
Support AMQP 1.0 token renewal
Closes #9259.

 ## What?
Allow an AMQP 1.0 client to renew an OAuth 2.0 token before it expires.

 ## Why?
This allows clients to keep the AMQP connection open instead of having
to create a new connection whenever the token expires.

 ## How?
As explained in https://github.com/rabbitmq/rabbitmq-server/issues/9259#issuecomment-2437602040
the client can `PUT` a new token on HTTP API v2 path `/auth/tokens`.
RabbitMQ will then:
1. Store the new token on the given connection.
2. Recheck access to the connection's vhost.
3. Clear all permission caches in the AMQP sessions.
4. Recheck write permissions to exchanges for links publishing to
   RabbitMQ, and recheck read permissions from queues for links
   consuming from RabbitMQ. The latter complies with the user
   expectation in #11364.
2024-11-04 00:34:51 -05:00
Diana Parra Corbacho ff44f4d355
Test: metrics_SUITE queue_idemp wait for queue metrics 2024-11-04 00:34:51 -05:00
Diana Parra Corbacho ab9d225502
Tests: wait for connection closed in metrics_SUITE 2024-11-04 00:34:50 -05:00
Jean-Sébastien Pédron fe7beea4b8
rabbit_feature_flags: Log controller task on a single line 2024-11-04 00:34:50 -05:00
Jean-Sébastien Pédron 9802348683
rabbit_feature_flags: Report feature flags init error reason
[Why]
`failed_to_initialize_feature_flags_registry` was a little too vague.
2024-11-04 00:34:50 -05:00
Jean-Sébastien Pédron 937ca915c9
rabbit_feature_flags: Introduce hard vs. soft required feature flags
[Why]
Before this patch, required feature flags were basically checked during
boot: they must have been enabled when they were mere stable feature
flags. If they were not, the node refused to boot.

This was easy for the developer because making a feature flag required
allowed to remove the entire compatibility code. Very satisfying.

Unfortunately, this was a pain point to end users, especially those who
did not pay attention to RabbitMQ and the release notes and were just
asking their package manager to update everything. They could end up
with a node that refuse to boot. The only solution was to downgrade,
enable the disabled stable feature flags, upgrade again.

[How]
This patch introduces two levels of requirement to required feature
flags:
* `hard`: this corresponds to the existing behavior where a node will
  refuse to boot if a hard required feature flag is not enabled before
  the upgrade.
* `soft`: such a required feature flag will be automatically enabled
  during the upgrade to a version where it is marked as required.

The level of requirement is set in the feature flag definition:

    -rabbit_feature_flag(
       {my_feature_flag,
        #{stability     => required,
	  require_level => hard
         }}).

The default requirement level is `soft`. All existing required feature
flags have now a requirement level of `hard`.

The handling of soft required feature flag is done when the cluster
feature flags states are verified and synchronized. If a required
feature flag is not enabled yet, it is enabled at that time.

This means that as developers, we will have to keep compatibility code
forever for every soft required feature flag, like the feature flag
definition itself.
2024-11-04 00:34:49 -05:00
Diana Parra Corbacho ef06f80bb8
Fix metrics_SUITE connection_metrics flake 2024-11-04 00:34:48 -05:00
Loïc Hoguin 2235492d28
Make CI: Add mixed version testing
This is enabled on main and for pull requests. Bazel remains
used in previous branches.
2024-11-04 00:34:47 -05:00
David Ansari 238ce77585
Delete test access_failure
This test flakes in CI as described in
https://github.com/rabbitmq/rabbitmq-server/issues/12413#issuecomment-2419293869

The test case fails with
```
Node: rabbit_shard2@localhost
Case: amqp_system_SUITE:access_failure
Reason: {error,{{badmatch,{error,134,
                                 "Unhandled exception. System.Exception: expected exception not received
                                 at Program.Test.accessFailure(String uri) in /home/runner/work/rabbitmq-server/rabbitmq-server/deps/rabbit/test/amqp_system_SUITE_data/fsharp-tests/Program.fs:line 477
                                 at Program.main(String[] argv) in /home/runner/work/rabbitmq-server/rabbitmq-server/deps/rabbit/test/amqp_system_SUITE_data/fsharp-tests/Program.fs:line 509\n"}},
                [{amqp_system_SUITE,run_dotnet_test,2,
                                    [{file,"amqp_system_SUITE.erl"},
                                     {line,257}]},
```

However, RabbitMQ closes the session as expected due to the missing read
permissions to the queue as shown in the RabbitMQ logs:
```
[debug] <0.1321.0> Asked to create a new user 'access_failure', password length in bytes: 24
[info] <0.1321.0> Created user 'access_failure'
[debug] <0.1324.0> Asked to set permissions for user 'access_failure' in virtual host '/' to '.*', '^banana.*', '^banana.*'
[info] <0.1324.0> Successfully set permissions for user 'access_failure' in virtual host '/' to '.*', '^banana.*', '^banana.*'
[info] <0.1333.0> accepting AMQP connection 127.0.0.1:36248 -> 127.0.0.1:25000
[debug] <0.1333.0> User 'access_failure' authenticated successfully by backend rabbit_auth_backend_internal
[info] <0.1333.0> Connection from AMQP 1.0 container 'AMQPNetLite-101d7d51': user 'access_failure' authenticated using SASL mechanism PLAIN and granted access to vhost '/'
[debug] <0.1333.0> AMQP 1.0 connection.open frame: hostname = 127.0.0.1, extracted vhost = /, idle-time-out = undefined
[debug] <0.1333.0> AMQP 1.0 created session process <0.1338.0> for channel number 0
[warning] <0.1338.0> Closing session for connection <0.1333.0>: {'v1_0.error',
[warning] <0.1338.0>                                             {symbol,
[warning] <0.1338.0>                                              <<"amqp:unauthorized-access">>},
[warning] <0.1338.0>                                             {utf8,
[warning] <0.1338.0>                                              <<"read access to queue 'test' in vhost '/' refused for user 'access_failure'">>},
[warning] <0.1338.0>                                             undefined}
[debug] <0.1333.0> AMQP 1.0 closed session process <0.1338.0> with channel number 0
[warning] <0.1333.0> closing AMQP connection <0.1333.0> (127.0.0.1:36248 -> 127.0.0.1:25000, duration: '269ms'):
[warning] <0.1333.0> client unexpectedly closed TCP connection
```

```
let receiver = ReceiverLink(ac.Session, "test-receiver", src)
```
uses a null constructur for the onAttached callback.
ReceiverLink doesn't seem to block.

Given that the exact same authorization error is already tested in test
case attach_source_queue of amqp_auth_SUITE, it's safe to delete this F#
test.
2024-11-04 00:34:47 -05:00
David Ansari 52b6419876
Remove test flake
Prior to this commit tests
* leader_transfer_quorum_queue_credit_single
* leader_transfer_quorum_queue_credit_batches
flaked in CI during 4.1 (main) and 4.0 mixed version testing.

The follwing error occurred on node 0:
```
[error] <0.1950.0> Timed out waiting for credit reply from quorum queue 'leader_transfer_quorum_queue_credit_batches' in vhost '/'. Hint: Enable feature flag rabbitmq_4.0.0
[warning] <0.1950.0> Closing session for connection <0.1945.0>: {'v1_0.error',
[warning] <0.1950.0>                                             {symbol,<<"amqp:internal-error">>},
[warning] <0.1950.0>                                             {utf8,
[warning] <0.1950.0>                                              <<"Timed out waiting for credit reply from quorum queue 'leader_transfer_quorum_queue_credit_batches' in vhost '/'. Hint: Enable feature flag rabbitmq_4.0.0">>},
[warning] <0.1950.0>                                             undefined}
```

Therefore we enable this feature flag for both tests.

This commit also simplifies some test setups that were necessary for
4.0/3.13 mixed version testing, but isn't necessary anymore for 4.1/4.0
mixed version testing.
2024-11-04 00:34:47 -05:00
David Ansari 70597737e4
Support x-cc message annotation (#12559)
Support x-cc message annotation

Support an `x-cc` message annotation in AMQP 1.0
similar to the [CC](https://www.rabbitmq.com/docs/sender-selected) header in AMQP 0.9.1.

The value of the `x-cc` message annotation must by a list of strings.
A message annotation is used since application properties allow only simple types.
2024-11-04 00:34:47 -05:00
David Ansari 9c2ee91a3c
Validate setting permissions works
in order to troubleshoot the flake described in
https://github.com/rabbitmq/rabbitmq-server/issues/12413#issuecomment-2419293869
```
Node: rabbit_shard2@localhost
Case: amqp_system_SUITE:access_failure
Reason: {error,{{badmatch,{error,134,
                                 "Unhandled exception. System.Exception: expected exception not received\n
                                 at Program.Test.accessFailure(String uri) in /home/runner/work/rabbitmq-server/rabbitmq-server/deps/rabbit/test/amqp_system_SUITE_data/fsharp-tests/Program.fs:line 477\n
                                 at Program.main(String[] argv) in /home/runner/work/rabbitmq-server/rabbitmq-server/deps/rabbit/test/amqp_system_SUITE_data/fsharp-tests/Program.fs:line 509\n"}},
                [{amqp_system_SUITE,run_dotnet_test,2,
                                    [{file,"amqp_system_SUITE.erl"},
                                     {line,257}]},
```
2024-11-04 00:34:46 -05:00
David Ansari 3db4a97cfb Expose AMQP connection metrics
Expose the same metrics for AMQP 1.0 connections as for AMQP 0.9.1 connections.

Display the following AMQP 1.0 metrics on the Management UI:
* Network bytes per second from/to client on connections page
* Number of sessions/channels on connections page
* Network bytes per second from/to client graph on connection page
* Reductions graph on connection page
* Garbage colletion info on connection page

Expose the following AMQP 1.0 per-object Prometheus metrics:
* rabbitmq_connection_incoming_bytes_total
* rabbitmq_connection_outgoing_bytes_total
* rabbitmq_connection_process_reductions_total
* rabbitmq_connection_incoming_packets_total
* rabbitmq_connection_outgoing_packets_total
* rabbitmq_connection_pending_packets
* rabbitmq_connection_channels

The rabbit_amqp_writer proc:
* notifies the rabbit_amqp_reader proc if it sent frames
* hibernates eventually if it doesn't send any frames

The rabbit_amqp_reader proc:
* does not emit stats (update ETS tables) if no frames are received
or sent to save resources when there are many idle connections.
2024-11-02 19:08:24 +01:00
Karl Nilsson 94e677987f QQ: handle case where a stale read request results in member crash.
It is possible for a slow running follower with local consumers
to crash after a snapshot installation as it tries to read an entry
from its log that is no longer there (as it has been consumed and
completed by another node but still refers to prior consumers on the
current node).

This commit makes the log effect callback function more defensive
to check that the number of commands returned by the log effect
isn't different from what was requested. if it is different we
consider this a stale read request and return no further effects.
2024-11-01 11:37:20 +00:00
Michael Klishin 67bc9504ed
Merge pull request #12617 from rabbitmq/amqp-log-macro
Use log macros for AMQP
2024-10-31 14:25:42 -04:00
Diana Parra Corbacho 0df71d54cb Test: metrics_SUITE queue_idemp wait for queue metrics 2024-10-31 09:39:44 +01:00
Diana Parra Corbacho 34c1fd13d9 Tests: wait for connection closed in metrics_SUITE 2024-10-31 09:05:30 +01:00
David Ansari dbd9ede67b Use log macros for AMQP
Using a log macro has the benefit that location data is added as
explained in https://www.erlang.org/doc/apps/kernel/logger.html#t:metadata/0
2024-10-30 14:50:05 +01:00
Jean-Sébastien Pédron 3c15d7e3e6
rabbit_feature_flags: Log controller task on a single line 2024-10-30 11:12:40 +01:00
Jean-Sébastien Pédron 2abec68708
rabbit_feature_flags: Report feature flags init error reason
[Why]
`failed_to_initialize_feature_flags_registry` was a little too vague.
2024-10-30 11:12:40 +01:00
Jean-Sébastien Pédron ea899602b0
rabbit_feature_flags: Introduce hard vs. soft required feature flags
[Why]
Before this patch, required feature flags were basically checked during
boot: they must have been enabled when they were mere stable feature
flags. If they were not, the node refused to boot.

This was easy for the developer because making a feature flag required
allowed to remove the entire compatibility code. Very satisfying.

Unfortunately, this was a pain point to end users, especially those who
did not pay attention to RabbitMQ and the release notes and were just
asking their package manager to update everything. They could end up
with a node that refuse to boot. The only solution was to downgrade,
enable the disabled stable feature flags, upgrade again.

[How]
This patch introduces two levels of requirement to required feature
flags:
* `hard`: this corresponds to the existing behavior where a node will
  refuse to boot if a hard required feature flag is not enabled before
  the upgrade.
* `soft`: such a required feature flag will be automatically enabled
  during the upgrade to a version where it is marked as required.

The level of requirement is set in the feature flag definition:

    -rabbit_feature_flag(
       {my_feature_flag,
        #{stability     => required,
	  require_level => hard
         }}).

The default requirement level is `soft`. All existing required feature
flags have now a requirement level of `hard`.

The handling of soft required feature flag is done when the cluster
feature flags states are verified and synchronized. If a required
feature flag is not enabled yet, it is enabled at that time.

This means that as developers, we will have to keep compatibility code
forever for every soft required feature flag, like the feature flag
definition itself.
2024-10-30 11:12:18 +01:00
David Ansari 1778bc22aa Support AMQP 1.0 token renewal
Closes #9259.

 ## What?
Allow an AMQP 1.0 client to renew an OAuth 2.0 token before it expires.

 ## Why?
This allows clients to keep the AMQP connection open instead of having
to create a new connection whenever the token expires.

 ## How?
As explained in https://github.com/rabbitmq/rabbitmq-server/issues/9259#issuecomment-2437602040
the client can `PUT` a new token on HTTP API v2 path `/auth/tokens`.
RabbitMQ will then:
1. Store the new token on the given connection.
2. Recheck access to the connection's vhost.
3. Clear all permission caches in the AMQP sessions.
4. Recheck write permissions to exchanges for links publishing to
   RabbitMQ, and recheck read permissions from queues for links
   consuming from RabbitMQ. The latter complies with the user
   expectation in #11364.
2024-10-30 10:42:40 +01:00
Lois Soto Lopez 2577b7e284 Remove extra keys from `gather_policy_config` out 2024-10-28 12:54:00 +01:00
Diana Parra Corbacho 4e92841a9f Fix metrics_SUITE connection_metrics flake 2024-10-25 18:07:41 +02:00
Loïc Hoguin f68fc8bb94
Make CI: Add mixed version testing
This is enabled on main and for pull requests. Bazel remains
used in previous branches.
2024-10-25 13:50:05 +02:00
David Ansari b1169d06ba Delete test access_failure
This test flakes in CI as described in
https://github.com/rabbitmq/rabbitmq-server/issues/12413#issuecomment-2419293869

The test case fails with
```
Node: rabbit_shard2@localhost
Case: amqp_system_SUITE:access_failure
Reason: {error,{{badmatch,{error,134,
                                 "Unhandled exception. System.Exception: expected exception not received
                                 at Program.Test.accessFailure(String uri) in /home/runner/work/rabbitmq-server/rabbitmq-server/deps/rabbit/test/amqp_system_SUITE_data/fsharp-tests/Program.fs:line 477
                                 at Program.main(String[] argv) in /home/runner/work/rabbitmq-server/rabbitmq-server/deps/rabbit/test/amqp_system_SUITE_data/fsharp-tests/Program.fs:line 509\n"}},
                [{amqp_system_SUITE,run_dotnet_test,2,
                                    [{file,"amqp_system_SUITE.erl"},
                                     {line,257}]},
```

However, RabbitMQ closes the session as expected due to the missing read
permissions to the queue as shown in the RabbitMQ logs:
```
[debug] <0.1321.0> Asked to create a new user 'access_failure', password length in bytes: 24
[info] <0.1321.0> Created user 'access_failure'
[debug] <0.1324.0> Asked to set permissions for user 'access_failure' in virtual host '/' to '.*', '^banana.*', '^banana.*'
[info] <0.1324.0> Successfully set permissions for user 'access_failure' in virtual host '/' to '.*', '^banana.*', '^banana.*'
[info] <0.1333.0> accepting AMQP connection 127.0.0.1:36248 -> 127.0.0.1:25000
[debug] <0.1333.0> User 'access_failure' authenticated successfully by backend rabbit_auth_backend_internal
[info] <0.1333.0> Connection from AMQP 1.0 container 'AMQPNetLite-101d7d51': user 'access_failure' authenticated using SASL mechanism PLAIN and granted access to vhost '/'
[debug] <0.1333.0> AMQP 1.0 connection.open frame: hostname = 127.0.0.1, extracted vhost = /, idle-time-out = undefined
[debug] <0.1333.0> AMQP 1.0 created session process <0.1338.0> for channel number 0
[warning] <0.1338.0> Closing session for connection <0.1333.0>: {'v1_0.error',
[warning] <0.1338.0>                                             {symbol,
[warning] <0.1338.0>                                              <<"amqp:unauthorized-access">>},
[warning] <0.1338.0>                                             {utf8,
[warning] <0.1338.0>                                              <<"read access to queue 'test' in vhost '/' refused for user 'access_failure'">>},
[warning] <0.1338.0>                                             undefined}
[debug] <0.1333.0> AMQP 1.0 closed session process <0.1338.0> with channel number 0
[warning] <0.1333.0> closing AMQP connection <0.1333.0> (127.0.0.1:36248 -> 127.0.0.1:25000, duration: '269ms'):
[warning] <0.1333.0> client unexpectedly closed TCP connection
```

```
let receiver = ReceiverLink(ac.Session, "test-receiver", src)
```
uses a null constructur for the onAttached callback.
ReceiverLink doesn't seem to block.

Given that the exact same authorization error is already tested in test
case attach_source_queue of amqp_auth_SUITE, it's safe to delete this F#
test.
2024-10-24 18:34:25 +02:00
David Ansari c476540bbc Remove test flake
Prior to this commit tests
* leader_transfer_quorum_queue_credit_single
* leader_transfer_quorum_queue_credit_batches
flaked in CI during 4.1 (main) and 4.0 mixed version testing.

The follwing error occurred on node 0:
```
[error] <0.1950.0> Timed out waiting for credit reply from quorum queue 'leader_transfer_quorum_queue_credit_batches' in vhost '/'. Hint: Enable feature flag rabbitmq_4.0.0
[warning] <0.1950.0> Closing session for connection <0.1945.0>: {'v1_0.error',
[warning] <0.1950.0>                                             {symbol,<<"amqp:internal-error">>},
[warning] <0.1950.0>                                             {utf8,
[warning] <0.1950.0>                                              <<"Timed out waiting for credit reply from quorum queue 'leader_transfer_quorum_queue_credit_batches' in vhost '/'. Hint: Enable feature flag rabbitmq_4.0.0">>},
[warning] <0.1950.0>                                             undefined}
```

Therefore we enable this feature flag for both tests.

This commit also simplifies some test setups that were necessary for
4.0/3.13 mixed version testing, but isn't necessary anymore for 4.1/4.0
mixed version testing.
2024-10-24 18:16:11 +02:00
David Ansari 2c0cdee7d2
Support x-cc message annotation (#12559)
Support x-cc message annotation

Support an `x-cc` message annotation in AMQP 1.0
similar to the [CC](https://www.rabbitmq.com/docs/sender-selected) header in AMQP 0.9.1.

The value of the `x-cc` message annotation must by a list of strings.
A message annotation is used since application properties allow only simple types.
2024-10-24 13:03:05 +02:00
David Ansari 0c905f9b17 Validate setting permissions works
in order to troubleshoot the flake described in
https://github.com/rabbitmq/rabbitmq-server/issues/12413#issuecomment-2419293869
```
Node: rabbit_shard2@localhost
Case: amqp_system_SUITE:access_failure
Reason: {error,{{badmatch,{error,134,
                                 "Unhandled exception. System.Exception: expected exception not received\n
                                 at Program.Test.accessFailure(String uri) in /home/runner/work/rabbitmq-server/rabbitmq-server/deps/rabbit/test/amqp_system_SUITE_data/fsharp-tests/Program.fs:line 477\n
                                 at Program.main(String[] argv) in /home/runner/work/rabbitmq-server/rabbitmq-server/deps/rabbit/test/amqp_system_SUITE_data/fsharp-tests/Program.fs:line 509\n"}},
                [{amqp_system_SUITE,run_dotnet_test,2,
                                    [{file,"amqp_system_SUITE.erl"},
                                     {line,257}]},
```
2024-10-24 12:34:03 +02:00
Lois Soto Lopez 9dc9f974b5 Remove ShouldLog & limit deliv. limit not set logg
Removes the usage of a ShouldLog parameter on several functions
and limits the logging of the message warning about the delivery_limit
not being set to the moment of queueDeclaration
2024-10-24 07:28:03 +02:00
Lois Soto Lopez 3b5069fdc5 Simplify publish_confirm_many 2024-10-24 07:28:03 +02:00
Lois Soto Lopez 42b58c7c01 Use wait_for_messages_ready 2024-10-24 07:28:03 +02:00
Lois Soto Lopez df14b4a9ac Use local function for ensuring qq proc dead 2024-10-24 07:28:03 +02:00
Lois Soto Lopez 51abb5c73f Consider QQs may let pass 1st overflowing msg 2024-10-24 07:28:02 +02:00
Lois Soto Lopez dc9ab1d8cf Move tests to main qq SUITE & refactor a bit 2024-10-24 07:27:47 +02:00
Péter Gömöri ccd854878b Refactoring suggestion
(some of this is just reverting to the original format to reduce the
diff against main)
2024-10-24 07:23:02 +02:00
Lois Soto Lopez ec87ef1ceb Use ra_machine_config but limit keys to check 2024-10-24 07:23:02 +02:00
Lois Soto Lopez fabe54db94 Use `ra_machine_config` to gen a comparable config
Instead of checking the values for current configuration, represented in
`rabbit_quorum_queue:handle_tick` by the `Overview` variable, against
the effective policy, just regenerate the configuration and compare with
the current configuration.
2024-10-24 07:23:02 +02:00
Lois Soto Lopez b408351d9e Add test for QQ policy repair feature 2024-10-24 07:23:02 +02:00
Lois Soto Lopez f9179d1090 Add QQ periodic policy repair 2024-10-24 07:23:02 +02:00
Simon Unge dacdeb024d Fix so that the code handles a timeout return 2024-10-23 23:12:36 +00:00
David Ansari 814d44dd82 Convert array from AMQP 1.0 to AMQP 0.9.1
Fix the following crash when an AMQP 0.9.1 client consumes an AMQP 1.0
encoded message that contains an array value in message annotations:
```
crasher:
  initial call: rabbit_channel:init/1
  pid: <0.685.0>
  registered_name: []
  exception exit: {function_clause,
                      [{mc_amqpl,to_091,
                           [<<"x-array">>,
                            {array,utf8,[{utf8,<<"e1">>},{utf8,<<"e2">>}]}],
                           [{file,"mc_amqpl.erl"},{line,737}]},
                       {mc_amqpl,'-convert_from/3-fun-3-',1,
                           [{file,"mc_amqpl.erl"},{line,168}]},
                       {lists,filtermap_1,2,
                           [{file,"lists.erl"},{line,2279}]},
                       {mc_amqpl,convert_from,3,
                           [{file,"mc_amqpl.erl"},{line,158}]},
                       {mc,convert,3,[{file,"mc.erl"},{line,332}]},
                       {rabbit_channel,handle_deliver0,4,
                           [{file,"rabbit_channel.erl"},{line,2619}]},
                       {lists,foldl_1,3,[{file,"lists.erl"},{line,2151}]},
                       {lists,foldl,3,[{file,"lists.erl"},{line,2146}]}]}
```
2024-10-22 12:16:19 +02:00
David Ansari cbe5551cf1 bazel run gazelle 2024-10-20 12:41:23 +02:00
David Ansari dc9ebc5b81 Check topic permissions of CC and BCC headers 2024-10-20 11:41:29 +02:00
David Ansari 97512b0a9e
Merge pull request #12549 from rabbitmq/amqp-filtex-errors
Prevent crash for invalid application-properties filter
2024-10-18 17:46:03 +02:00
David Ansari c5b6e7f297 bazel run gazelle 2024-10-18 16:09:00 +02:00
David Ansari 1827df811a Prevent crash for invalid application-properties filter
application-properties keys are restricted to be strings.

Prior to this commit, a function_clause error occurred if the client
requested an invalid filter:
```
  │ *Error{Condition: amqp:internal-error, Description: Session error: function_clause
  │ [{rabbit_amqp_filtex,'-validate0/2-fun-0-',
  │                      [{{symbol,<<"subject">>},{utf8,<<"var">>}}],
  │                      [{file,"rabbit_amqp_filtex.erl"},{line,119}]},
  │  {lists,map,2,[{file,"lists.erl"},{line,2077}]},
  │  {rabbit_amqp_filtex,validate0,2,[{file,"rabbit_amqp_filtex.erl"},{line,119}]},
  │  {rabbit_amqp_filtex,validate,1,[{file,"rabbit_amqp_filtex.erl"},{line,28}]},
  │  {rabbit_amqp_session,parse_filters,2,
  │                       [{file,"rabbit_amqp_session.erl"},{line,3068}]},
  │  {rabbit_amqp_session,parse_filter,1,
  │                       [{file,"rabbit_amqp_session.erl"},{line,3014}]},
  │  {rabbit_amqp_session,'-handle_attach/2-fun-0-',21,
  │                       [{file,"rabbit_amqp_session.erl"},{line,1371}]},
  │
  {rabbit_misc,with_exit_handler,2,[{file,"rabbit_misc.erl"},{line,465}]}],
  Info: map[]}
```

After this commit, the filter won't actually take effect without a crash occurring.

Supersedes #12520
2024-10-18 15:37:28 +02:00
Michael Klishin 17dd95a801
Merge pull request #12543 from rabbitmq/su_aws/fix_desctiption_qq_target_group_size_policy
Fix module mentioned in target group size description
2024-10-18 09:01:19 -04:00
David Ansari d1d7d7bad4 Optionally notify client app with AMQP 1.0 performative
This commit notifies the client app with the AMQP performative if
connection config `notify_with_performative` is set to `true`.

This allows the client app to learn about all fields including
properties and capabilities returned by the AMQP server.
2024-10-18 13:51:35 +02:00
Simon Unge 691a0368ba Fix module mentioned in target group size description 2024-10-17 22:52:00 +00:00
Loïc Hoguin 469c3a0791
Make CI: Check that CI knows about all CT_SUITES in CI
Instead of every time we run Make for these applications.

This means that during development we are free to modify
these values or create new test suites without having to
worry about the check. If we forget to then add the test
suites in PARALLEL_CT the workflow will tell us.
2024-10-17 10:52:28 +02:00
David Ansari ab8814ad7d Fix error message
Prior to this commit if dotnet or mvnw failed to fetch test
dependencies, for example because dotnet isn't installed, the test setup
crashed in an unexpected way:
```
amqp_system_SUITE > dotnet
    {'EXIT',
        {badarg,
            [{lists,keysearch,
                 [rmq_nodes,1,
                  {skip,
                      "Failed to fetch .NET Core test project dependencies"}],
                 [{error_info,#{module => erl_stdlib_errors}}]},
             {test_server,lookup_config,2,
                 [{file,"test_server.erl"},{line,1779}]},
             {rabbit_ct_broker_helpers,get_node_configs,2,
                 [{file,"rabbit_ct_broker_helpers.erl"},{line,1411}]},
             {rabbit_ct_broker_helpers,enable_feature_flag,2,
                 [{file,"rabbit_ct_broker_helpers.erl"},{line,1999}]},
             {amqp_system_SUITE,init_per_group,2,
                 [{file,"amqp_system_SUITE.erl"},{line,77}]},
             {test_server,ts_tc,3,[{file,"test_server.erl"},{line,1794}]},
             {test_server,run_test_case_eval1,6,
                 [{file,"test_server.erl"},{line,1391}]},
             {test_server,run_test_case_eval,9,
                 [{file,"test_server.erl"},{line,1235}]}]}}
```

This commit improves the error message instead of failing with `badarg`.

This commit also decides to fail the test setup instead of skipping the
suite because we always want CI to execute this test and be notified
instead of silently skipping if the test can't be run.
2024-10-16 17:53:17 +02:00
David Ansari 8c0cd1b78c Bump dotnet
This commit fixes the CI error on `main` branch where
amqp_system_SUITE failed with the following error:
```
Process terminated. Couldn't find a valid ICU package installed on the system. Set the configuration flag System.Globalization.Invariant to true if you want to run with no globalization support.
   at System.Environment.FailFast(System.String)
   at System.Globalization.GlobalizationMode.GetGlobalizationInvariantMode()
   at System.Globalization.GlobalizationMode..cctor()
   at System.Globalization.CultureData.CreateCultureWithInvariantData()
   at System.Globalization.CultureData.get_Invariant()
   at System.Globalization.CultureInfo..cctor()
   at System.String.ToLowerInvariant()
   at Microsoft.DotNet.PlatformAbstractions.RuntimeEnvironment.GetArch()
   at Microsoft.DotNet.PlatformAbstractions.RuntimeEnvironment..cctor()
   at Microsoft.DotNet.PlatformAbstractions.RuntimeEnvironment.GetRuntimeIdentifier()
   at Microsoft.DotNet.Cli.MulticoreJitProfilePathCalculator.CalculateProfileRootPath()
   at Microsoft.DotNet.Cli.MulticoreJitActivator.StartCliProfileOptimization()
   at Microsoft.DotNet.Cli.MulticoreJitActivator.TryActivateMulticoreJit()
   at Microsoft.DotNet.Cli.Program.Main(System.String[])

Exit code: 134 (pid <0.1533.0>)
```
2024-10-16 16:35:37 +02:00
David Ansari 358ff79611 Provide clear error message for reserved annotation keys
As described in https://docs.oasis-open.org/amqp/core/v1.0/os/amqp-core-messaging-v1.0-os.html#type-annotations
> The annotations type is a map where the keys are restricted to be of type symbol or of type ulong.
> All ulong keys, and all symbolic keys except those beginning with "x-" are reserved.

Prior to this commit, if an AMQP client used a reserved annotation key,
the entire AMQP connection terminated with a function_clause error
message that might be difficult to understand for client libs:
```
<<"Session error: function_clause\n[{amqp10_framing,'-decode_annotations/1-fun-0-',\n                 [{{symbol,<<\"aa\">>},{utf8,<<\"bbb\">>}}],\n                 [{file,\"amqp10_framing.erl\"},{line,158}]},\n {lists,map,2,[{file,\"lists.erl\"},{line,1559}]},\n {amqp10_framing,decode,1,[{file,\"amqp10_framing.erl\"},{line,127}]},\n {lists,map_1,2,[{file,\"lists.erl\"},{line,1564}]},\n {lists,map,2,[{file,\"lists.erl\"},{line,1559}]},\n {mc_amqp,init,1,[{file,\"mc_amqp.erl\"},{line,102}]},\n {mc,init,4,[{file,\"mc.erl\"},{line,150}]},\n {rabbit_amqp_session,incoming_link_transfer,4,\n                      [{file,\"rabbit_amqp_session.erl\"},{line,2341}]}]">>
```

This commit ends only the session and provides a clearer error message.
2024-10-16 14:14:04 +02:00
Loïc Hoguin e4d20bba51
Merge pull request #12502 from rabbitmq/loic-ct-master-patching
Make CI: Fix and enhance ct_master
2024-10-16 12:40:06 +02:00
Karl Nilsson 3b1ef8f529 QQ: fix the key_metrics_rpc function.
Currently this function always falls back to the compatibility code
and never gets the benefit of using ra:key_metrics/1 due to incorrect
use of the map update operatior ":=" instead of the insert operator
"=>".
2024-10-15 16:30:39 +01:00
Loïc Hoguin 655caf6d1a
Make CI: Have ct_master return the test results
Instead of having a CT hook just to know whether our tests failed.
2024-10-15 14:57:42 +02:00
Loïc Hoguin 6cdc32f558
Make CI: Make ct_master handle all testspec instructions 2024-10-15 14:57:42 +02:00
Michael Klishin 1726064579
Merge pull request #12208 from rabbitmq/qq-otp27-conf
Adjust vheap sizes for message handling processes in OTP 27
2024-10-11 09:59:40 -04:00
David Ansari b1064fddba Support negative integers in modified annotations 2024-10-11 14:43:31 +02:00
David Ansari 2e90619a62 Add custom dead letter history test
Test the use case described in https://github.com/rabbitmq/rabbitmq-website/pull/2095:

> Rather than relying solely on RabbitMQ's built-in dead lettering tracking via x-opt-deaths,
consumers can customise dead lettering event tracking.
2024-10-11 13:00:25 +02:00
David Ansari 855a32ab28 Add alternate exchange test assertion
Test the use case described in
https://github.com/rabbitmq/rabbitmq-website/pull/2095
2024-10-11 12:23:00 +02:00
David Ansari e6818f0040 Track requeue history
Support tracking the requeue history as described in
https://github.com/rabbitmq/rabbitmq-website/pull/2095

This commit:
1. adds a test case tracing the requeue history via AMQP 1.0
   using the modified outcome and
2. fixes bugs in the broker which crashed if a modified message
   annotation value is an AMQP 1.0 list, map, or array.

Complex modified annotation values (list, map, array) are stored as tagged values from now on.
This means AMQP 0.9.1 consumers will not receive modified annotations of
type list, map, or array (which is okay).
2024-10-11 12:21:28 +02:00
Michal Kuratczyk 6a7f8d0d1e Remove redundant copy of adjust_for_message_handling_proc/0 2024-10-09 20:08:34 -04:00
Karl Nilsson 465b19e8e8 Adjust vheap sizes for message handling processes in OTP 27
OTP 27 reset all assumptions on how the vm reacts to processes that
buffer and process a lot of large binaries.

Substantially increasing the vheap sizes for such process restores
most of the same performance by allowing processes to hold more binary
data before major garbage collections are triggered.

This introduces a new module to capture process flag configurations.

The new vheap sizes are only applied when running on OTP 27 or
above.
2024-10-09 20:08:34 -04:00
Michael Klishin 5f87cc9c4f
Merge pull request #12487 from rabbitmq/gazelle-main
bazel run gazelle
2024-10-08 06:55:42 -04:00
Michael Klishin 6feca5f72b
Merge pull request #12392 from rabbitmq/loic-fix-cq-scan
CQ: Fix shared store scanner missing messages
2024-10-08 06:49:24 -04:00
GitHub b9bb3014c0 bazel run gazelle 2024-10-08 04:02:25 +00:00
Ayanda Dube 16170d093b
Update QQ force-shrink logging
(cherry picked from commit dd5ec3ccc0)
2024-10-07 14:26:18 -04:00
Ayanda Dube c9d97e61de
Add test for QQ force_vhost_queues_shrink_member_to_current_member/1
(cherry picked from commit de0c0dbd89)
2024-10-07 14:26:10 -04:00
Ayanda Dube b03637f8ec
Implement force_vhost_queues_shrink_member_to_current_member/1
(cherry picked from commit c26aa3b1c7)
2024-10-07 14:26:04 -04:00
Ayanda Dube 10dbde1f71
QQ tests for force-shrink to current member operations
(cherry picked from commit 60ee35ea7e)
2024-10-07 14:25:57 -04:00
Ayanda Dube d9de6d989c
Shutdown peer QQ FSMs on connected nodes on force-shrink execution for cluster
wide consistency, ensuring only the leader is active/running

(cherry picked from commit b675ce29f0)
2024-10-07 14:25:49 -04:00
Jean-Sébastien Pédron 67af27b1b0
Merge pull request #12445 from rabbitmq/accept-relative-feature-flags-list-in-RABBITMQ_FEATURE_FLAGS-env
rabbit_feature_flags: Accept "+feature1,-feature2" in `$RABBITMQ_FEATURE_FLAGS`
2024-10-07 17:20:51 +02:00
David Ansari df59a52b70
Support AMQP filter expressions (#12415)
* Support AMQP filter expressions

 ## What?

This PR implements the following property filter expressions for AMQP clients
consuming from streams as defined in
[AMQP Filter Expressions Version 1.0 Working Draft 09](https://groups.oasis-open.org/higherlogic/ws/public/document?document_id=66227):
* properties filters [section 4.2.4]
* application-properties filters [section 4.2.5]

String prefix and suffix matching is also supported.

This PR also fixes a bug where RabbitMQ would accept wrong filters.
Specifically, prior to this PR the values of the filter-set's map were
allowed to be symbols. However, "every value MUST be either null or of a
described type which provides the archetype filter."

 ## Why?

This feature adds the ability to RabbitMQ to have multiple concurrent clients
each consuming only a subset of messages while maintaining message order.

This feature also reduces network traffic between RabbitMQ and clients by
only dispatching those messages that the clients are actually interested in.

Note that AMQP filter expressions are more fine grained than the [bloom filter based
stream filtering](https://www.rabbitmq.com/blog/2023/10/16/stream-filtering) because
* they do not suffer false positives
* the unit of filtering is per-message instead of per-chunk
* matching can be performed on **multiple** values in the properties and
  application-properties sections
* prefix and suffix matching on the actual values is supported.

Both, AMQP filter expressions and bloom filters can be used together.

 ## How?

If a filter isn't valid, RabbitMQ ignores the filter. RabbitMQ only
replies with filters it actually supports and validated successfully to
comply with:
"The receiving endpoint sets its desired filter, the sending endpoint
[RabbitMQ] sets the filter actually in place (including any filters defaulted at
the node)."

* Delete streams test case

The test suite constructed a wrong filter-set.
Specifically the value of the filter-set didn't use a described type as
mandated by the spec.
Using https://azure.github.io/amqpnetlite/api/Amqp.Types.DescribedValue.html
throws errors that the descriptor can't be encoded. Given that this code
path is already tests via the amqp_filtex_SUITE, this F# test gets
therefore deleted.

* Re-introduce the AMQP filter-set bug

Since clients might rely on the wrong filter-set value type, we support
the bug behind a deprecated feature flag and gradually remove support
this bug.

* Revert "Delete streams test case"

This reverts commit c95cfeaef7.
2024-10-07 17:12:26 +02:00
Michael Klishin 6c07e70042
Merge pull request #12442 from rabbitmq/gh_12424
QQ: fix bug with discards using a consumer_id()
2024-10-07 10:38:53 -04:00
Jean-Sébastien Pédron 6a0008b06c
rabbit_feature_flags: Accept "+feature1,-feature2" in $RABBITMQ_FEATURE_FLAGS
[Why]
Before this patch, the $RABBITMQ_FEATURE_FLAGS environment variable took
an exhaustive list of feature flags to enable. This list overrode the
default of enabling all stable feature flags.

It made it inconvenient when a user wanted to enable an experimental
feature flag like `khepri_db` while still leaving the default behavior.

[How]
$RABBITMQ_FEATURE_FLAGS now acceps the following syntax:

    RABBITMQ_FEATURE_FLAGS=+feature1,-feature2

This will start RabbitMQ with all stable feature flags, plus `feature1`,
but without `feature2`.

For users setting `forced_feature_flags_on_init` in the config, the
corresponding syntax is:

    {forced_feature_flags_on_init, {rel, [feature1], [feature2]}}
2024-10-07 14:02:50 +02:00
Loïc Hoguin 9645fb1275
Make parallel-ct properly detect test failures
The problem comes from `ct_master` which doesn't tell us
in the return value whether the tests succeeded. In order
to get that information a CT hook was created. But then
we run into another problem: despite its documentation
claiming otherwise, `ct_master` does not handle `ct_hooks`
instructions in the test spec.

So for the time being we fork `ct_master` into a new
`ct_master_fork` module and insert our hook directly
in the code. Later on we will submit patches to OTP.
2024-10-07 13:30:32 +02:00
Loïc Hoguin 639e905aea
CQ: Fix shared store scanner missing messages
It was still possible, although rare, to have message store
files lose message data, when the following conditions were
met:

 * the message data contains byte values 255
   (255 is used as an OK marker after a message)
 * the message is located after a 0-filled hole in the file
 * the length of the data is at least 4096 bytes and
   if we misread it (as detailed below) we encounter
   a 255 byte where we expect the OK marker

The trick for the code to previously misread the length can
be explained as follow:

A message is stored in the following format:

  <<Len:64, MsgIdAndMsg:Len/unit:8, 255>>

With MsgId always being 16 bytes in length. So Len is always
at least 16, if the message data Msg is empty. But technically
it never is.

Now if we have a zero filled hole just before this message,
we may end up with this:

  <<0, Len:64, MsgIdAndMsg:Len/unit:8, 255>>

When we are scanning we are testing bytes to see if there is
a message there or not. We look for a Len that gives us byte
255 after MsgIdAndMsg.

Len of value 4096 looks like this in binary:

  <<0:48, 16, 0>>

Problem is if we have leading zeroes, Len may look like this:

  <<0, 0:48, 16, 0>>

If we take the first 64 bits we get a potential length of 16.
We look at the byte after the next 16 bytes. If it is 255, we
think this is a message and skip by this amount of bytes, and
mistakenly miss the real message.

Solving this by changing the file format would be simple enough,
but we don't have the luxury to afford that. A different solution
was found, which is to combine file scanning with checking that
the message exists in the message store index (populated from
queues at startup, and kept up to date over the life time of
the store). Then we know for sure that the message above
doesn't exist, because the MsgId won't be found in the index.
If it is, then the file number and offset will not match,
and the check will fail.

There remains a small chance that we get it wrong during dirty
recovery. Only a better file format would improve that.
2024-10-07 13:23:12 +02:00
Michael Klishin 3540541cbc
Merge pull request #12458 from rabbitmq/rabbitmq-server-12449
By @gomoripeti: Don't start invalid but enabled plugins at startup
2024-10-05 00:08:19 -04:00
Péter Gömöri 2194822b36 Don't start invalid but enabled plugins at startup
For example during the startup after RabbitMQ was upgraded but an
enabled community plugin wasn't, and the plugin's broker version
requirement isn't met any more, RabbitMQ still started the plugin
after logging an error.
2024-10-04 20:53:13 -04:00
Michael Klishin 80f4797e76 Remove multiple mentions of global prefetch
As suggested by @johanrhodin in #12454.

This keeps the Prometheus plugin part but
marks it as deprecated. We can remove it in
4.1.
2024-10-04 20:47:37 -04:00
Jean-Sébastien Pédron d8ae8afe50
rabbit_feature_flags: Fix style 2024-10-03 18:00:12 +02:00
Jean-Sébastien Pédron 160180883b
rabbit_feature_flags: Log a inventory matrix instead of dumping the map
[Why]
The inventory map is huge and difficult to read when it is logged as
is.

[How]
Logging a matrix is much more compact and to the point.

Before:
    Feature flags: inventory of node `rabbit-1@giotto`:
    #{feature_flags =>
          #{rabbit_exchange_type_local_random =>
                #{name => rabbit_exchange_type_local_random,
                  desc => "Local random exchange",stability => stable,
                  provided_by => rabbit},
            message_containers_deaths_v2 =>
                #{name => message_containers_deaths_v2,
                  desc => "Bug fix for dead letter cycle detection",
    ...

After:
    Feature flags: inventory queried from node `rabbit-2@giotto`:
                                           ,-- rabbit-2@giotto
                                           |
                         amqp_address_v1:
          classic_mirrored_queue_version:  x
                 classic_queue_mirroring:  x
     classic_queue_type_delivery_support:  x
    ...
2024-10-03 18:00:02 +02:00
Jean-Sébastien Pédron 5370370d1d
rabbit_feature_flags: Hide required feature flags from the registry init logged list
[Why]
Showing that required feature flags are enabled over and over is not
useful and only adds noise to the logs.

[How]
Required feature flags and removed deprecated features are not lists
explicitly. We just log their respective numbers to still be clear that
they exist.

Before:
    list of feature flags found:
      [x] classic_mirrored_queue_version
      [x] classic_queue_type_delivery_support
      [x] direct_exchange_routing_v2
      [x] feature_flags_v2
      [x] implicit_default_bindings
      [ ] khepri_db
      [x] message_containers_deaths_v2
      [x] quorum_queue_non_voters
      [~] rabbit_exchange_type_local_random
      [x] rabbitmq_4.0.0
      ...
    list of deprecated features found:
      [ ] amqp_address_v1
      [x] classic_queue_mirroring
      [ ] global_qos
      [ ] queue_master_locator
      [ ] ram_node_type
      [ ] transient_nonexcl_queues

After:
    list of feature flags found:
      [ ] khepri_db
      [x] message_containers_deaths_v2
      [x] quorum_queue_non_voters
      [~] rabbit_exchange_type_local_random
      [x] rabbitmq_4.0.0
    list of deprecated features found:
      [ ] amqp_address_v1
      [ ] global_qos
      [ ] queue_master_locator
      [ ] ram_node_type
      [ ] transient_nonexcl_queues
    required feature flags not listed above: 18
    removed deprecated features not listed above: 1
2024-10-03 17:56:38 +02:00
Karl Nilsson 2339401abe QQ: fix bug with discards using a consumer_id()
Fixes a pattern matching bug for discards that come in after a consumer
has been cancelled. Because the rabbit_fifo_client does not keep
the integer consumer key after cancellation, late acks, returns, and
discards use the full {CTag, Pid} consumer id version.

As this is a state machine change the machine version has been
increased to 5.

The same bug is present for the `modify` command also however as
AMQP does not allow late settlements we don't have to make this
fix conditional on the machine version as it cannot happen.
2024-10-03 13:26:41 +01:00
Jean-Sébastien Pédron e4abbfd6c2
rabbit_feature_flags: Fix copyright year
The subsystem didn't exist before 2019. The deprecated features support
was added in 2023.
2024-10-03 13:03:58 +02:00
Jean-Sébastien Pédron 2f67d19bec
rabbit_feature_flags: Lock registry once and enable many feature flags
[Why]
Before this change, the controller was looping on all feature flags to
enable, then for each:
1. it checked if it was supported
2. it acquired the registry lock
3. it enabled the feature flag
4. it released the registry lock

It was done this way to not acquire the log if the feature flag was
unsupported in the first place.

However, this put more load on the lock mechanism.

[How]
This commit changes the order. The controller acquires the registry lock
once, then loops on feature flags to enable. The support check is now
under the registry lock.
2024-10-03 13:03:42 +02:00
Michael Klishin bc1e0ad2ab
Merge pull request #12437 from rabbitmq/rabbitmq-server-12422
By @ayanda-D: make sure (non-replicated) CQs (classic queues) emit leader and members metrics, just like replicated QQs
2024-10-03 01:52:02 -04:00
Michael Klishin 232798c12a This assertion does not belong to this leader-locator test 2024-10-03 01:12:39 -04:00
Michael Davis 537fe75483
Merge pull request #12416 from rabbitmq/md/binding-deletions-map-refactor 2024-10-02 15:27:07 -04:00
Ayanda Dube 71f921c090 support members info item in classic queues, which will always be the leader 2024-10-02 14:57:30 -04:00
Ayanda Dube f9e5d349df test and assert new classic queue leader info item 2024-10-02 14:57:30 -04:00
Ayanda Dube 22b433efb4 add support for the leader info item in classic queues 2024-10-02 14:57:30 -04:00
Michael Klishin 1f98ab6026 Add a rabbit.license_line default
so that products that build on top could adjust
what's printed in the standard banner.

References #12390
2024-10-01 20:25:25 -04:00
Michael Davis 4aa68ca4dd
Represent `rabbit_binding:deletions()` with a map instead of dict
The `dict:dict()` typing of `rabbit_binding` appears to be a historical
artifact. `dict` has been superseded by `maps`. Switching to a map
makes deletions easier to inspect manually and faster. Though if
deletions grow so large that the map representation is important,
manipulation of the deletions is unlikely to be expensive compared to
any other operations that produced them, so performance is probably
irrelevant.

This commit refactors the bottom section of the `rabbit_binding` module
to switch to a map, switch the `deletions()` type to an opaque,
eliminating a TODO created when using Erlang/OTP 17.1, and the deletion
value to a record. We eliminate some historical artifacts and "cruft":

* Deletions taking multiple forms needlessly, specifically the shape
  `{X, deleted | not_deleted, Bindings, none}` no longer being
  handled. `process_deletions/2` was responsible for creating this
  shape. Instead we now use a record to clearly define the fields.
* Clauses to catch `{error, not_found}` are unnecessary after minor
  refactors of the callers. Removing them makes the type specs cleaner.
* `rabbit_binding:process_deletions/1` has no need to update or change
  the deletions. This function uses `maps:foreach/2` instead and returns
  `ok` instead of mapped deletions.
* Remove `undefined` from the typespec of deletions. This value is no
  longer possible with a refactor to `maybe_auto_delete_exchange_in_*`
  functions for Mnesia and Khepri. The value was nonsensical since you
  cannot delete bindings for an exchange that does not exist.
2024-10-01 14:36:34 -04:00
Jean-Sébastien Pédron f69c082b58
rabbit_feature_flags: New `check_node_compatibility/2` variant
... that considers the local node as if it was reset.

[Why]
When a node joins a cluster, we check its compatibility with the
cluster, reset the node, copy the feature flags states from the remote
cluster and add that node to the cluster.

However, the compatibility check is performed with the current feature
flags states, even though they are about to be reset. Therefore, a node
with an enabled feature flag that is unsupported by the cluster will
refuse to join. It's incorrect because after the reset and the states
copy, it could have join the cluster just fine.

[How]
We introduce a new variant of `check_node_compatibility/2` that takes an
argument to indicate if the local node should be considered as a virgin
node (i.e. like after a reset).

This way, the joining node will always be able to join, regardless of
its initial feature flags states, as long as it doesn't require a
feature flag that is unsupported by the cluster.

This also removes the need to use `$RABBITMQ_FEATURE_FLAGS` environment
variable to force a new node to leave stable feature flags disabled to
allow it to join a cluster running an older version.

References #9677.
2024-10-01 10:47:50 +02:00
Jean-Sébastien Pédron 30ab653561
rabbit_{mnesia,khepri}: Skip generic compat check if `CheckNodesConsistency` is false
[Why]
`CheckNodesConsistency` is set to false when the
`check_cluster_consistency()` is called as part of a node joining a
cluster. And the generic compatibility check was already executed by
`rabbit_db_cluster`.

There is no need to run it again. This is even counter-productive with
the improvement to `rabbit_feature_flags:check_node_compatibility/2`
that follows.
2024-10-01 10:47:49 +02:00
Jean-Sébastien Pédron 8126950ade
rabbit_mnesia: Make some functions backward-compatible
... with older RabbitMQ versions which don't know about Khepri.

[Why]
When an older node wants to join a cluster, it calls `node_info/0` and
`cluster_status_from_mnesia/0` directly using RPC calls. If it does that
against a node already using Khepri, t will get an error telling it that
Mnesia is not running. The error is reported to the end user, making it
difficult to understand the problem: both nodes are simply incompatible.

It's better to leave the final decision to the Feature flags subsystem,
but for that, `rabbit_mnesia` on the newer Khepri-based node still needs
to return something the older version can accept.

[How]
`cluster_status_from_mnesia/0` and `node_info/0` are modified to verify
if Khepri is enabled and if it is, return a value based on Khepri's
status as if it was from Mnesia.

This will let the remote older node to continue all its checks and
eventually refuse to join because the Feature flags subsystem will
indicate they are incompatible.
2024-10-01 10:47:42 +02:00
Loïc Hoguin 9fed03a6d6
Add missing suites to non-CI parallel-ct 2024-09-30 12:37:24 +02:00
Loïc Hoguin addb0607fd
Make rabbit_global_counters:overview/0 generally available
Previously it was only available when TEST=1 was set.
2024-09-30 12:35:44 +02:00
Loïc Hoguin 4530fb5d97
make: Add new CT suites and clarify check on CT_SUITES 2024-09-30 12:35:43 +02:00
Loïc Hoguin aee0cd0079
make & make CI: Small cleanups 2024-09-30 12:35:43 +02:00
Loïc Hoguin ae984cc364
make: Set CT_LOGS_DIR to top-level logs/ directory
All CT logs will now be under <toplevel>/logs. An improved
test workflow would be to always keep the logs/all_runs.html
page open in the browser and refresh it whenever tests are
run in any of the rabbit applications.
2024-09-30 12:35:43 +02:00
Loïc Hoguin 85e358642b
Fix OTP-27 Dialyzer errors in rabbit 2024-09-30 12:35:42 +02:00
Loïc Hoguin 5086553bdd
make: Correct rabbitmq_prelaunch/rabbitmq_stream_common deps 2024-09-30 12:35:42 +02:00
Loïc Hoguin f4f375c6a9
Use Make in CI
This is a proof of concept that mostly works but is missing
some tests, such as rabbitmq_mqtt or rabbitmq_cli. It also
doesn't apply to mixed version testing yet.
2024-09-30 12:35:41 +02:00
Loïc Hoguin 645942cf95
make: Move dep_osiris in rabbitmq-components.mk
Otherwise some plugins can't build if we try to run tests
directly after checkout. This is because the plugins
depend on osiris as well as rabbit, but there is no
dep_osiris defined in the plugin itself.
2024-09-30 12:35:41 +02:00
Loïc Hoguin 30a8de3287
rabbit tests: Delete some temporary files to reduce log sizes 2024-09-30 12:35:41 +02:00
Loïc Hoguin 14fe08152f
Merge feature_flags_with_unpriveleged_user_SUITE back in ff_SUITE
On GH Actions we run as an unprivileged user by default.
2024-09-30 12:35:41 +02:00
David Ansari 36a84f4cde Fix function_clause
Fixes https://github.com/rabbitmq/rabbitmq-server/issues/12398

To repro this crash:
1. Start RabbitMQ v3.13.7 with feature flag message_containers disabled:
```
make run-broker TEST_TMPDIR="$HOME/scratch/rabbit/test" RABBITMQ_FEATURE_FLAGS=quorum_queue,implicit_default_bindings,virtual_host_metadata,maintenance_mode_status,user_limits,feature_flags_v2,stream_queue,classic_queue_type_delivery_support,classic_mirrored_queue_version,stream_single_active_consumer,direct_exchange_routing_v2,listener_records_in_ets,tracking_records_in_ets
```
In the Management UI
2. Create a quorum queue with x-delivery-limit=10
3. Publish a message to this queue.
4. Requeue this message two times.
5. ./sbin/rabbitmqctl enable_feature_flag all
6. Stop the node
7. git checkout v4.0.2
8. make run-broker TEST_TMPDIR="$HOME/scratch/rabbit/test"
9. Again in the Management UI, Get Message with Automatic Ack leads to above crash:

```
[error] <0.1185.0> ** Reason for termination ==
[error] <0.1185.0> ** {function_clause,
[error] <0.1185.0>        [{mc_compat,set_annotation,
[error] <0.1185.0>             [delivery_count,2,
[error] <0.1185.0>              {basic_message,
[error] <0.1185.0>                  {resource,<<"/">>,exchange,<<>>},
[error] <0.1185.0>                  [<<"qq1">>],
[error] <0.1185.0>                  {content,60,
[error] <0.1185.0>                      {'P_basic',undefined,undefined,
[error] <0.1185.0>                          [{<<"x-delivery-count">>,long,2}],
[error] <0.1185.0>                          2,undefined,undefined,undefined,undefined,undefined,
[error] <0.1185.0>                          undefined,undefined,undefined,undefined,undefined},
[error] <0.1185.0>                      none,none,
[error] <0.1185.0>                      [<<"m1">>]},
[error] <0.1185.0>                  <<230,146,94,58,177,125,64,163,30,18,177,132,53,207,69,103>>,
[error] <0.1185.0>                  true}],
[error] <0.1185.0>             [{file,"mc_compat.erl"},{line,61}]},
[error] <0.1185.0>         {rabbit_fifo_client,add_delivery_count_header,2,
[error] <0.1185.0>             [{file,"rabbit_fifo_client.erl"},{line,228}]},
[error] <0.1185.0>         {rabbit_fifo_client,dequeue,4,
[error] <0.1185.0>             [{file,"rabbit_fifo_client.erl"},{line,211}]},
[error] <0.1185.0>         {rabbit_queue_type,dequeue,5,
[error] <0.1185.0>             [{file,"rabbit_queue_type.erl"},{line,755}]},
[error] <0.1185.0>         {rabbit_misc,with_exit_handler,2,
[error] <0.1185.0>             [{file,"rabbit_misc.erl"},{line,526}]},
[error] <0.1185.0>         {rabbit_channel,handle_method,3,
[error] <0.1185.0>             [{file,"rabbit_channel.erl"},{line,1257}]},
[error] <0.1185.0>         {rabbit_channel,handle_cast,2,
[error] <0.1185.0>             [{file,"rabbit_channel.erl"},{line,629}]},
[error] <0.1185.0>         {gen_server2,handle_msg,2,[{file,"gen_server2.erl"},{line,1056}]}]}
```

The mc annotation `delivery_count` is a new mc annotation specifically
used in the header section of AMQP 1.0 messages:
https://docs.oasis-open.org/amqp/core/v1.0/os/amqp-core-messaging-v1.0-os.html#type-header

Hence, we can ignore this annotation for the old `#basic_message{}`.
2024-09-30 11:35:22 +02:00
Arnaud Cogoluègnes 788879969e
Merge pull request #12391 from rabbitmq/anon-term-errors
Comply with §2.2.2 of Anonymous Terminus extension
2024-09-27 09:24:47 +02:00
Michael Klishin cf0a4e8e11
Merge pull request #12390 from rabbitmq/issue-12374
Remove duplicate stats keys in quorum queues
2024-09-27 01:49:58 -04:00
Michael Klishin b815902585 Make it possible to override the license line in the startup banner
This is for Tanzu RabbitMQ, nothing changes for
the open source edition.
2024-09-26 23:45:17 -04:00
David Ansari 6863ae14dd Comply with §2.2.2 of Anonymous Terminus extension
Comply with section 2.2.2 Routing Errors:
https://docs.oasis-open.org/amqp/anonterm/v1.0/cs01/anonterm-v1.0-cs01.html#doc-routingerrors
2024-09-26 16:45:18 +02:00
Diana Parra Corbacho d860efaccc Remove duplicate stats keys in quorum queues
Messages, messages_ready and messages_unacknowledged are duplicated
during management stats collection, resulting in internal errors
when sorting queues in the management UI.
These should not be part of rabbit_core_metrics:queue_stats/2
2024-09-26 12:52:08 +02:00
David Ansari 9d7ebf32a9 Enforce correct transfer settled flag
For messages published to RabbitMQ, RabbitMQ honors the transfer `settled`
field, no matter what value the sender settle mode was set to in the attach
frame.

Therefore, prior to this commit, a client could send a transfer with
`settled=true` even though sender settle mode was set to `unsettled` in the
attach frame.

This commit enforces that the publisher sets only transfer `settled` fields
that are valid with the spec.

If sender settle mode is:
* `unsettled`, the transfer `settled` flag must be `false`.
* `settled`, the transfer `settled` flag must be `true`.
* `mixed`, the transfer `settled` flag can be `true` or `false`.
2024-09-25 18:06:22 +02:00
David Ansari 1245119972 Delete unsupported setting
see https://github.com/rabbitmq/rabbitmq-server/pull/11999 for context
2024-09-25 17:53:35 +02:00
David Ansari 960808e6b2
Emit histogram metric for received message sizes per protocol (#12342)
* Add global histogram metrics for received message sizes per-protocol

fixup: add new files to bazel

fixup: expose message_size_bytes as prometheus classic histogram type

`rabbit_msg_size_metrics` does not use `seshat` any more, but
`counters` directly.

fixup: add msg_size_metrics unit test

* Improve message size histogram

1.
Avoid unnecessary time series emitted for stream protocol
The stream protocol cannot observe message sizes.
This commit ensures that the following time series are omitted:
```
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="64"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="256"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="1024"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="4096"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="16384"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="65536"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="262144"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="1048576"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="4194304"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="16777216"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="67108864"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="268435456"} 0
rabbitmq_global_message_size_bytes_bucket{protocol="stream",le="+Inf"} 0
rabbitmq_global_message_size_bytes_count{protocol="stream"} 0
rabbitmq_global_message_size_bytes_sum{protocol="stream"} 0
```

This reduces the number of time series by 15.

2.
Further reduce the number of time series by reducing the number of
buckets. Instead of 13 bucktes, emit only 9 buckets. Buckets are not
free, each is an extra time series stored.

Prior to this commit:
```
curl -s -u guest:guest localhost:15692/metrics | ag message_size | wc -l
      92
```

After this commit:
```
curl -s -u guest:guest localhost:15692/metrics | ag message_size | wc -l
      57
```

3.
The emitted metric should be called
`rabbitmq_message_size_bytes_bucket` instead of `rabbitmq_global_message_size_bytes_bucket`.
The latter is poor naming. There is no need to use `global` in
the metric name given that this metric doesn't exist in the old flawed
aggregated metrics.

4.
This commit simplies module `rabbit_global_counters`.

5.
Avoid garbage collecting the 10-elements list of buckets per message
being received.

---------

Co-authored-by: Péter Gömöri <peter@84codes.com>
2024-09-24 18:08:24 +02:00
Jean-Sébastien Pédron 8268a11eb2
Merge pull request #12357 from rabbitmq/fix-non-canonical-links
Fix non-canonical and broken links
2024-09-24 09:27:10 +02:00
Karl Nilsson 2ae4dbeb1a QQ: fix off-by-one bug in release cursor effects.
{release_cursor, Idx} effects promote checkpoints with an index
lower or _equal_ to the release cursor index. rabbit_fifo is emitting
the smallest active raft index instead which could cause the log to truncate
one index too many after a checkpoint promotion.
2024-09-23 17:20:17 +01:00
Jean-Sébastien Pédron 5158460cc6
rabbitmqctl.8: Point to 3.13.x mirroring guide
[Why]
Classic queue mirroring was removed from RabbitMQ 4.0.x.
2024-09-23 13:25:39 +02:00
Jean-Sébastien Pédron fe10cd88c3
rabbit/Makefile: Delete `.html` from local URL in manpages 2024-09-23 13:24:54 +02:00
Jean-Sébastien Pédron 89fc33a0f2
Use the new URLs of the `www.rabbitmq.com` website
They changed with the switch to Docusaurus. This avoids a redirect and
gives cleaner search results.
2024-09-23 11:34:54 +02:00
Jean-Sébastien Pédron e7784df169
Use the canonical `www.rabbitmq.com` domain
Using `rabbitmq.com` works and redirects to `www.rabbitmq.com`, but it
is preferable to use the canonical domain to have cleaner search
results.

This is important for manpages because we have an HTML copy in the
website.
2024-09-23 11:13:08 +02:00
Diana Parra Corbacho 430a6b469b Make rabbit_table:wait/2 silent when checking if cmq are used 2024-09-19 17:23:28 +02:00
David Ansari b1eb354385 Strictly validate annotations 2024-09-18 12:42:27 +02:00
David Ansari cd600bef8b Fix modified annotations
```
<field name="message-annotations" type="fields"/>
```

Prior to this commit integration tests succeeded because both Erlang
client and RabbitMQ server contained a bug.

This bug was noticed by a Java client test suite.
2024-09-18 09:38:44 +02:00
Michael Klishin 27dac87a20
Khepri feature flag: add a documentation URL
That links to the vNext version of the site for
now. In 4.0.x, we can change it to the vCurrent
version.
2024-09-17 03:38:26 -04:00
Michael Davis a65ceb6372
rabbit_amqqueue: Catch exits when reading classic Q `consumers/1`
`delegate:invoke/2` catches errors but not exits of the delegate
process. Another process might query for a classic queue's consumers
while the classic queue is being deleted or otherwise terminating and
that would result in an exit of the calling process previously.
2024-09-16 14:43:27 -04:00
Michael Davis 9627903716
rabbit_queue_type: Add `{error,timeout}` to delete/4 callback spec
This return value was already possible since a classic queue will return
it during termination if `rabbit_amqqueue:internal_delete/2` fails with
that value.

`rabbit_amqqueue:delete/4` already handles this value and converts it
into a protocol error and channel exit. The other caller (MQTT
processor) will be updated in a child commit.

This commit also replaces eager conversions to protocol errors in
rabbit_classic_queue, rabbit_quorum_queue and rabbit_stream_coordinator:
we should return `{error, timeout}` consistently and not hide it in
protocol errors.
2024-09-16 14:43:24 -04:00
Diana Parra Corbacho 05f0e03c38 Quorum queues: unblock publishers when clearing max-length policy 2024-09-16 12:19:01 +02:00
Michal Kuratczyk ea976e5b86 Failing test for max-length policy deletion
Clearing a max-length policy doesn't unblock existing
publishers. When a new publisher connects, it can publish
to the queue.
2024-09-16 12:19:01 +02:00
Michael Klishin 4ec0f5e300
Merge pull request #12303 from rabbitmq/issue-1049
forget_cluster_node: delete all local classic queues when using Khepri store
2024-09-13 15:39:36 -04:00
Michal Kuratczyk b64ebf1a91
Fix formatter crash
Before:
```
FORMATTER CRASH: {"Waiting for ~ts queues and streams to have quorum+1 replicas online.You can list them with `rabbitmq-diagnostics check_if_node_is_quorum_critical`","\t"}
```
After:
```
Waiting for 9 queues and streams to have quorum+1 replicas online. You can list them with `rabbitmq-diagnostics check_if_node_is_quorum_critical`
```
2024-09-13 18:09:08 +02:00
Michal Kuratczyk f0f7500f6a
Revert "Log errors from `ranch:handshake`" (#12304)
This reverts commit 620fff22f1.

It intoduced a regression in another area - a TCP health check,
such as the default (with cluster-operator) readinessProbe,
on a TLS-enabled instance would log a `rabbit_reader` crash
every few seconds:
```
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>   crasher:
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>     initial call: rabbit_reader:init/3
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>     pid: <0.999.0>
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>     registered_name: []
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>     exception error: no match of right hand side value {error, handshake_failed}
tls-server-0 rabbitmq 2024-09-13 09:03:13.010115+00:00 [error] <0.999.0>       in function  rabbit_reader:init/3 (rabbit_reader.erl, line 171)
```
2024-09-13 17:07:57 +02:00
David Ansari f78f14ab1d Display container-id in the UI and CLI 2024-09-13 17:05:46 +02:00
Michael Klishin a1893fb28a
Tweak a log message when all classic queues on a node are being removed 2024-09-13 10:52:11 -04:00
Diana Parra Corbacho 29bfaa9ac7 Test remove classic queues when node is removed 2024-09-13 15:01:24 +02:00
Michal Kuratczyk 1db3fd391a
Log when deleting all queues on a forgotten node 2024-09-13 14:48:35 +02:00
Diana Parra Corbacho 990e6d2dc7 forget_cluster_node: delete all local classic queues when using Khepri store
When a cluster node is removed, all classic queues hosted on it should be
removed. This was done for Mnesia but not for the new Khepri metadata store
2024-09-13 13:46:53 +02:00