Commit Graph

865 Commits

Author SHA1 Message Date
Michael Klishin 6d3f008336
CLI: mix format 2023-10-10 09:24:29 -04:00
Michael Klishin 17fded2e12
CLI: support for pre-hashed passwords
Providing a pre-hashed and salted password is
not significantly more secure but satisfies those
who cannot pass clear text passwords on the command
line for regulatory reasons.

Note that the optimal way of seeding users is still
definition import on node boot, not scripting with
CLI tools.

Closes #9166
2023-10-09 23:52:34 -04:00
Alex Valiushko 4df3080fa1 New quorum queue members join as temporary non-voters
Because both `add_member` and `grow` default to Membership status `promotable`,
new members will have to catch up before they are considered cluster members.
This can be overridden with either `voter` or (permanent `non_voter` statuses.
The latter one is useless without additional tooling so kept undocumented.

- non-voters do not affect quorum size for election purposes
- `observer_cli` reports their status with lowercase 'f'
- `rabbitmq-queues check_if_node_is_quorum_critical` takes voter status into
account
2023-10-04 11:14:07 -07:00
Diana Parra Corbacho 5f0981c5a3
Allow to use Khepri database to store metadata instead of Mnesia
[Why]

Mnesia is a very powerful and convenient tool for Erlang applications:
it is a persistent disc-based database, it handles replication accross
multiple Erlang nodes and it is available out-of-the-box from the
Erlang/OTP distribution. RabbitMQ relies on Mnesia to manage all its
metadata:

* virtual hosts' properties
* intenal users
* queue, exchange and binding declarations (not queues data)
* runtime parameters and policies
* ...

Unfortunately Mnesia makes it difficult to handle network partition and,
as a consequence, the merge conflicts between Erlang nodes once the
network partition is resolved. RabbitMQ provides several partition
handling strategies but they are not bullet-proof. Users still hit
situations where it is a pain to repair a cluster following a network
partition.

[How]

@kjnilsson created Ra [1], a Raft consensus library that RabbitMQ
already uses successfully to implement quorum queues and streams for
instance. Those queues do not suffer from network partitions.

We created Khepri [2], a new persistent and replicated database engine
based on Ra and we want to use it in place of Mnesia in RabbitMQ to
solve the problems with network partitions.

This patch integrates Khepri as an experimental feature. When enabled,
RabbitMQ will store all its metadata in Khepri instead of Mnesia.

This change comes with behavior changes. While Khepri remains disabled,
you should see no changes to the behavior of RabbitMQ. If there are
changes, it is a bug. After Khepri is enabled, there are significant
changes of behavior that you should be aware of.

Because it is based on the Raft consensus algorithm, when there is a
network partition, only the cluster members that are in the partition
with at least `(Number of nodes in the cluster ÷ 2) + 1` number of nodes
can "make progress". In other words, only those nodes may write to the
Khepri database and read from the database and expect a consistent
result.

For instance in a cluster of 5 RabbitMQ nodes:
* If there are two partitions, one with 3 nodes, one with 2 nodes, only
  the group of 3 nodes will be able to write to the database.
* If there are three partitions, two with 2 nodes, one with 1 node, none
  of the group can write to the database.

Because the Khepri database will be used for all kind of metadata, it
means that RabbitMQ nodes that can't write to the database will be
unable to perform some operations. A list of operations and what to
expect is documented in the associated pull request and the RabbitMQ
website.

This requirement from Raft also affects the startup of RabbitMQ nodes in
a cluster. Indeed, at least a quorum number of nodes must be started at
once to allow nodes to become ready.

To enable Khepri, you need to enable the `khepri_db` feature flag:

    rabbitmqctl enable_feature_flag khepri_db

When the `khepri_db` feature flag is enabled, the migration code
performs the following two tasks:
1. It synchronizes the Khepri cluster membership from the Mnesia
   cluster. It uses `mnesia_to_khepri:sync_cluster_membership/1` from
   the `khepri_mnesia_migration` application [3].
2. It copies data from relevant Mnesia tables to Khepri, doing some
   conversion if necessary on the way. Again, it uses
   `mnesia_to_khepri:copy_tables/4` from `khepri_mnesia_migration` to do
   it.

This can be performed on a running standalone RabbitMQ node or cluster.
Data will be migrated from Mnesia to Khepri without any service
interruption. Note that during the migration, the performance may
decrease and the memory footprint may go up.

Because this feature flag is considered experimental, it is not enabled
by default even on a brand new RabbitMQ deployment.

More about the implementation details below:

In the past months, all accesses to Mnesia were isolated in a collection
of `rabbit_db*` modules. This is where the integration of Khepri mostly
takes place: we use a function called `rabbit_khepri:handle_fallback/1`
which selects the database and perform the query or the transaction.
Here is an example from `rabbit_db_vhost`:

* Up until RabbitMQ 3.12.x:

        get(VHostName) when is_binary(VHostName) ->
            get_in_mnesia(VHostName).

* Starting with RabbitMQ 3.13.0:

        get(VHostName) when is_binary(VHostName) ->
            rabbit_khepri:handle_fallback(
              #{mnesia => fun() -> get_in_mnesia(VHostName) end,
                khepri => fun() -> get_in_khepri(VHostName) end}).

This `rabbit_khepri:handle_fallback/1` function relies on two things:
1. the fact that the `khepri_db` feature flag is enabled, in which case
   it always executes the Khepri-based variant.
4. the ability or not to read and write to Mnesia tables otherwise.

Before the feature flag is enabled, or during the migration, the
function will try to execute the Mnesia-based variant. If it succeeds,
then it returns the result. If it fails because one or more Mnesia
tables can't be used, it restarts from scratch: it means the feature
flag is being enabled and depending on the outcome, either the
Mnesia-based variant will succeed (the feature flag couldn't be enabled)
or the feature flag will be marked as enabled and it will call the
Khepri-based variant. The meat of this function really lives in the
`khepri_mnesia_migration` application [3] and
`rabbit_khepri:handle_fallback/1` is a wrapper on top of it that knows
about the feature flag.

However, some calls to the database do not depend on the existence of
Mnesia tables, such as functions where we need to learn about the
members of a cluster. For those, we can't rely on exceptions from
Mnesia. Therefore, we just look at the state of the feature flag to
determine which database to use. There are two situations though:

* Sometimes, we need the feature flag state query to block because the
  function interested in it can't return a valid answer during the
  migration. Here is an example:

        case rabbit_khepri:is_enabled(RemoteNode) of
            true  -> can_join_using_khepri(RemoteNode);
            false -> can_join_using_mnesia(RemoteNode)
        end

* Sometimes, we need the feature flag state query to NOT block (for
  instance because it would cause a deadlock). Here is an example:

        case rabbit_khepri:get_feature_state() of
            enabled -> members_using_khepri();
            _       -> members_using_mnesia()
        end

Direct accesses to Mnesia still exists. They are limited to code that is
specific to Mnesia such as classic queue mirroring or network partitions
handling strategies.

Now, to discover the Mnesia tables to migrate and how to migrate them,
we use an Erlang module attribute called
`rabbit_mnesia_tables_to_khepri_db` which indicates a list of Mnesia
tables and an associated converter module. Here is an example in the
`rabbitmq_recent_history_exchange` plugin:

    -rabbit_mnesia_tables_to_khepri_db(
       [{?RH_TABLE, rabbit_db_rh_exchange_m2k_converter}]).

The converter module  — `rabbit_db_rh_exchange_m2k_converter` in this
example  — is is fact a "sub" converter module called but
`rabbit_db_m2k_converter`. See the documentation of a `mnesia_to_khepri`
converter module to learn more about these modules.

[1] https://github.com/rabbitmq/ra
[2] https://github.com/rabbitmq/khepri
[3] https://github.com/rabbitmq/khepri_mnesia_migration

See #7206.

Co-authored-by: Jean-Sébastien Pédron <jean-sebastien@rabbitmq.com>
Co-authored-by: Diana Parra Corbacho <dparracorbac@vmware.com>
Co-authored-by: Michael Davis <mcarsondavis@gmail.com>
2023-09-29 16:00:11 +02:00
Ayanda-D 1d163886fd new rabbit_amqqueue_control for queue related control operations and fix bazel issues raised in MK's review 2023-09-11 14:13:03 +01:00
Ayanda-D 290c1c05a4 add test for ctl delete_queue for a stopped queue 2023-09-11 14:13:03 +01:00
Ayanda-D 7daaee8da0 formatting 2023-09-11 14:13:03 +01:00
Ayanda-D 3ac283d13e use queue resource names for new queue operations (changes the api specs) 2023-09-11 14:13:03 +01:00
Ayanda-D 3502929b9f ctl delete_queue command test case for a crashed queue 2023-09-11 14:13:03 +01:00
Ayanda-D dd900b4c98 fix unrelated typo 2023-09-11 14:13:03 +01:00
Jean-Sébastien Pédron 7b26f44405
list_policies_command_test.exs: Use a non-deprecated policy 2023-09-04 21:09:11 +02:00
Michael Klishin 1a8c5c0e49
mix format 2023-07-08 06:32:32 +04:00
Michael Davis 5f3baf381d
CLI: Rename disk space monitoring commands 2023-07-07 11:16:19 -05:00
Jean-Sébastien Pédron 15f208eed8
rabbit_db_cluster: Move generic clustering steps from `rabbit_mnesia`
[Why]
When a single node or a cluster is initialized, we go through a few
steps which are not Mnesia-specific or even related. For instance, we
verify that both ends are compatible w.r.t. feature flags.

When we will introduce Khepri, we will have to go through the same
generic steps. Therefore, it makes sense to drive those steps from
`rabbit_db_cluster` and only call into `rabbit_mnesia` when needed.

[How]
The generic code is moved from `rabbit_mnesia` to `rabbit_db_cluster`.
2023-07-07 09:42:32 +02:00
Rin Kuryloski 2c5e9a5775 Fix rabbitmq_cli test compilation under elixir 1.15 2023-07-04 17:45:32 +02:00
Rin Kuryloski 42d29a5ca3 Run 'mix format' with elixir 1.15.2 2023-07-04 17:45:32 +02:00
Michael Klishin 98c85f367f Bump (c) year 2023-07-04 00:21:40 +04:00
Michael Klishin e1a1984293
CLI tools: introduce commands that enable and disable free disk space monitoring (#8743)
* Closes #8741

References #8740

---------

Co-authored-by: Michael Davis <mcarsondavis@gmail.com>
2023-07-04 00:14:01 +04:00
Michael Klishin 8625301e8d CLI: drop two table formatter tests
The only way to make them reliable in their current form
would be converting to keyword lists, which are covered anyway.
2023-04-24 19:52:45 +04:00
Michael Klishin ddf7926941 CLI: make input ordering preditable when formatting a table
if the input is given as a map.

References #7931, #7921
2023-04-24 14:50:31 +04:00
Michael Klishin cec151df59
Introduce a way to update virtual host metadata using CLI tools (#7914)
Introduce 'ctl update_vhost_metadata'

that can be used to update the description, tags or default queue type of
any existing virtual hosts.

Closes #7912, #7857.

#7912 will need an HTTP API counterpart change.
2023-04-18 02:54:37 +04:00
Michal Kuratczyk e62bef164a
Don't rely on implicit order in CLI tests 2023-04-13 14:37:19 +02:00
Michal Kuratczyk 699af2c8c3
Don't rely on implicit order in a test 2023-04-13 14:37:18 +02:00
Jean-Sébastien Pédron 091d756d97
rabbitmq_cli: Fix invalid feature flag definitions
`desc` was set to binaries instead of strings.
2023-03-10 09:25:42 +01:00
Rin Kuryloski 93be82d09f Add 3.13.0 as an allowed version in the test plugin in cli tests 2023-02-14 18:11:23 +01:00
Michael Klishin 9beda74d86
Sort tags before comparing them in this test
now that the server sorts and deduplicates them
2023-02-13 22:16:21 -03:00
Jean-Sébastien Pédron 78e8c595b1
set_permissions_globally_command_test: Fix "invalid_regex patterns" test expectations
Testcases are executed in a random order. Unfortunately, this testcase
depended on side effects of other testcases. If this testcase was
executed first, then there were no permissions set and the testcase
would fail.

It now lists permissions before and after the actual test and compare
both.
2023-02-08 10:21:10 +01:00
Jean-Sébastien Pédron 271a7babdf
set_permissions_globally_command_test: Always set @user
Otherwise the command causes a `function_clause` exception in the broker
because a `nil` atom is passed as the username instead of a binary.
2023-02-08 10:18:24 +01:00
Michael Klishin 69add44ce3
CLI: mix format 2023-02-06 15:54:09 -05:00
Michael Klishin ab99ccfa45
Introduce a CLI command that grants permissions to all virtual hosts
Closes #1000
2023-02-06 13:07:09 -05:00
Simon Unge 03617f681c See #5957. Accept empty arg and prompt for password 2023-01-26 11:46:24 -08:00
Simon Unge 67bc94ed16 See #5957. CLI command to generate hashed password from cleartext password 2023-01-23 14:47:29 -08:00
Michael Klishin c3c4665970
Update tests 2023-01-16 09:24:37 -08:00
Jean-Sébastien Pédron 4840ca9f2f
Merge pull request #6866 from rabbitmq/init-db-from-rabbit_db
rabbit_db: Add `init/0`, `is_virgin_node/0`, `dir/0` and `ensure_dir_exists/0` functions
2023-01-13 16:26:06 +01:00
Jean-Sébastien Pédron 950c4ef7eb
Use `rabbit:data_dir/0` instead of `rabbit_mnesia:dir/0` where it makes sense
Some testcases are interested in RabbitMQ data directory, not Mnesia
directory per se. In this case, call `rabbit:data_dir/0` instead.
2023-01-13 11:56:21 +01:00
Rin Kuryloski d3794cf2c0 Conform vhost tags to a list when set with the cli in all cases
`rabbitmqctl add_vhost myvhost --tags "my_tag"` would not previously
conform "my_tag" to a list before setting vhost metadata, which could
cause crashes when the list was read.
2023-01-13 10:06:11 +01:00
Michael Klishin ec4f1dba7d
(c) year bump: 2022 => 2023 2023-01-01 23:17:36 -05:00
Jean-Sébastien Pédron 46b71e8994
rabbitmq-cli: Deleting a non-existing user or vhost is fine
This patch adapts several testcases to this behavior change. The goal is
to avoid transactions where there is no need for one.
2022-12-14 10:06:45 +01:00
Jean-Sébastien Pédron b25175563c
rabbitmq-cli: Fix several testcases which didn't test anything
RabbitMQ lacks argument verifications in many places unfortunately.
Several testcases were passing bad arguments to the command but were
also comparing the result to bad data. This happened to "work"...
2022-12-13 14:55:46 +01:00
Jean-Sébastien Pédron 99b14fd0fa
rabbit_env: Rename `mnesia_*dir` to `data_*dir`
The location and name of this directory remains the same for
compatibility reasons. Therefore, it sill contains "mnesia" in its name.
However, semantically, we want this directory to be unrelated to Mnesia.

In the end, many subsystems write files and directories there, including
Mnesia, all Ra systems and in the future, Khepri.
2022-11-30 13:00:49 +01:00
Alex Valiushko d70660dac7 Add inclusive aliases to ctl info keys
Includes generic ability to provide aliases in other commands.
2022-11-11 14:44:05 -08:00
Michael Klishin 6ebbfaa0a4
Adopt new helpers that assert on plugin presence and state
In many tests, we do not care what is the complete set of plugins
running on a node (or present, or in a specific state). We only
care that a few select ones are running, are of the expected version,
in a certain state, and so on.

So list comparison assertions are counterproductive and lead to
test interference that is difficult to track down. In many cases
we can do more fine grained assertions and ignore the rest of
the plugins present on the node.

References #6289, #6020.
2022-11-02 14:07:10 +04:00
Michael Klishin 8c6b3bfbc6
CLI: continue reworking assertions on plugin state 2022-11-02 13:08:44 +04:00
Michael Klishin e1d14bac2d
Reset enabled plugins after each test that enables any 2022-11-02 13:08:44 +04:00
Simon Unge 917bf55e19
See Issue 6020. Take plugin dependency into account 2022-11-02 13:08:44 +04:00
Michael Klishin 0cd8dff415
CLI: improve isolation of set_disk_free_limit tests 2022-11-01 16:39:35 +04:00
Jean-Sébastien Pédron 4b132daaba
Remove upgrade-specific log file
This category should be unused with the decommissioning of the old
upgrade subsystem (in favor of the feature flags subsystem). It means:
1. The upgrade log file will not be created by default anymore.
2. The `$RABBITMQ_UPGRADE_LOG` environment variable is now unsupported.

The configuration variables remain to avoid breaking an existing and
working configuration.
2022-10-06 21:28:50 +02:00
Ayanda Dube 4cbbaad2df mix format rabbitmq_cli 2022-10-02 18:54:11 +01:00
Jean-Sébastien Pédron 5b98d7d2a2
Remove test code which depended on the `maintenance_mode_status` feature flags
These checks are now irrelevant as the feature flag is required.
2022-07-29 11:51:52 +02:00
Rin Kuryloski e9c1e6b680 Bump the rabbitmq version on this branch to 3.12.0 (bazel)
To reduce cache misses, the bazel build uses a hard coded version. A
correct value is injected in releases.
2022-07-28 09:45:26 +02:00