Commit Graph

24 Commits

Author SHA1 Message Date
Loïc Hoguin c5d150a7ef
Use Erlang.mk's native Elixir support for CLI
This avoids using Mix while compiling which simplifies
a number of things and let us do further build improvements
later on.

Elixir is only enabled from within rabbitmq_cli currently.

Eunit is disabled since there are only Elixir tests.

Dialyzer will force-enable Elixir in order to process
Elixir-compiled beam files.

This commit also includes a few changes that are
related:

 * The Erlang distribution will now be started for parallel-ct

 * Many unnecessary PROJECT_MOD lines have been removed

 * `eunit_formatters` has been removed, it provides little value

 * The new `maybe_flock` Erlang.mk function is used where possible

 * Build test deps when testing rabbitmq_cli (Mix won't do it anymore)

 * rabbitmq_ct_helpers now use the early plugins to have Dialyzer
   properly set up
2025-03-18 10:02:49 +01:00
Rin Kuryloski 6a9d668def Set PLT_APPS in a number of plugins where it was missing 2024-04-29 14:54:28 +02:00
Diana Parra Corbacho 5f0981c5a3
Allow to use Khepri database to store metadata instead of Mnesia
[Why]

Mnesia is a very powerful and convenient tool for Erlang applications:
it is a persistent disc-based database, it handles replication accross
multiple Erlang nodes and it is available out-of-the-box from the
Erlang/OTP distribution. RabbitMQ relies on Mnesia to manage all its
metadata:

* virtual hosts' properties
* intenal users
* queue, exchange and binding declarations (not queues data)
* runtime parameters and policies
* ...

Unfortunately Mnesia makes it difficult to handle network partition and,
as a consequence, the merge conflicts between Erlang nodes once the
network partition is resolved. RabbitMQ provides several partition
handling strategies but they are not bullet-proof. Users still hit
situations where it is a pain to repair a cluster following a network
partition.

[How]

@kjnilsson created Ra [1], a Raft consensus library that RabbitMQ
already uses successfully to implement quorum queues and streams for
instance. Those queues do not suffer from network partitions.

We created Khepri [2], a new persistent and replicated database engine
based on Ra and we want to use it in place of Mnesia in RabbitMQ to
solve the problems with network partitions.

This patch integrates Khepri as an experimental feature. When enabled,
RabbitMQ will store all its metadata in Khepri instead of Mnesia.

This change comes with behavior changes. While Khepri remains disabled,
you should see no changes to the behavior of RabbitMQ. If there are
changes, it is a bug. After Khepri is enabled, there are significant
changes of behavior that you should be aware of.

Because it is based on the Raft consensus algorithm, when there is a
network partition, only the cluster members that are in the partition
with at least `(Number of nodes in the cluster ÷ 2) + 1` number of nodes
can "make progress". In other words, only those nodes may write to the
Khepri database and read from the database and expect a consistent
result.

For instance in a cluster of 5 RabbitMQ nodes:
* If there are two partitions, one with 3 nodes, one with 2 nodes, only
  the group of 3 nodes will be able to write to the database.
* If there are three partitions, two with 2 nodes, one with 1 node, none
  of the group can write to the database.

Because the Khepri database will be used for all kind of metadata, it
means that RabbitMQ nodes that can't write to the database will be
unable to perform some operations. A list of operations and what to
expect is documented in the associated pull request and the RabbitMQ
website.

This requirement from Raft also affects the startup of RabbitMQ nodes in
a cluster. Indeed, at least a quorum number of nodes must be started at
once to allow nodes to become ready.

To enable Khepri, you need to enable the `khepri_db` feature flag:

    rabbitmqctl enable_feature_flag khepri_db

When the `khepri_db` feature flag is enabled, the migration code
performs the following two tasks:
1. It synchronizes the Khepri cluster membership from the Mnesia
   cluster. It uses `mnesia_to_khepri:sync_cluster_membership/1` from
   the `khepri_mnesia_migration` application [3].
2. It copies data from relevant Mnesia tables to Khepri, doing some
   conversion if necessary on the way. Again, it uses
   `mnesia_to_khepri:copy_tables/4` from `khepri_mnesia_migration` to do
   it.

This can be performed on a running standalone RabbitMQ node or cluster.
Data will be migrated from Mnesia to Khepri without any service
interruption. Note that during the migration, the performance may
decrease and the memory footprint may go up.

Because this feature flag is considered experimental, it is not enabled
by default even on a brand new RabbitMQ deployment.

More about the implementation details below:

In the past months, all accesses to Mnesia were isolated in a collection
of `rabbit_db*` modules. This is where the integration of Khepri mostly
takes place: we use a function called `rabbit_khepri:handle_fallback/1`
which selects the database and perform the query or the transaction.
Here is an example from `rabbit_db_vhost`:

* Up until RabbitMQ 3.12.x:

        get(VHostName) when is_binary(VHostName) ->
            get_in_mnesia(VHostName).

* Starting with RabbitMQ 3.13.0:

        get(VHostName) when is_binary(VHostName) ->
            rabbit_khepri:handle_fallback(
              #{mnesia => fun() -> get_in_mnesia(VHostName) end,
                khepri => fun() -> get_in_khepri(VHostName) end}).

This `rabbit_khepri:handle_fallback/1` function relies on two things:
1. the fact that the `khepri_db` feature flag is enabled, in which case
   it always executes the Khepri-based variant.
4. the ability or not to read and write to Mnesia tables otherwise.

Before the feature flag is enabled, or during the migration, the
function will try to execute the Mnesia-based variant. If it succeeds,
then it returns the result. If it fails because one or more Mnesia
tables can't be used, it restarts from scratch: it means the feature
flag is being enabled and depending on the outcome, either the
Mnesia-based variant will succeed (the feature flag couldn't be enabled)
or the feature flag will be marked as enabled and it will call the
Khepri-based variant. The meat of this function really lives in the
`khepri_mnesia_migration` application [3] and
`rabbit_khepri:handle_fallback/1` is a wrapper on top of it that knows
about the feature flag.

However, some calls to the database do not depend on the existence of
Mnesia tables, such as functions where we need to learn about the
members of a cluster. For those, we can't rely on exceptions from
Mnesia. Therefore, we just look at the state of the feature flag to
determine which database to use. There are two situations though:

* Sometimes, we need the feature flag state query to block because the
  function interested in it can't return a valid answer during the
  migration. Here is an example:

        case rabbit_khepri:is_enabled(RemoteNode) of
            true  -> can_join_using_khepri(RemoteNode);
            false -> can_join_using_mnesia(RemoteNode)
        end

* Sometimes, we need the feature flag state query to NOT block (for
  instance because it would cause a deadlock). Here is an example:

        case rabbit_khepri:get_feature_state() of
            enabled -> members_using_khepri();
            _       -> members_using_mnesia()
        end

Direct accesses to Mnesia still exists. They are limited to code that is
specific to Mnesia such as classic queue mirroring or network partitions
handling strategies.

Now, to discover the Mnesia tables to migrate and how to migrate them,
we use an Erlang module attribute called
`rabbit_mnesia_tables_to_khepri_db` which indicates a list of Mnesia
tables and an associated converter module. Here is an example in the
`rabbitmq_recent_history_exchange` plugin:

    -rabbit_mnesia_tables_to_khepri_db(
       [{?RH_TABLE, rabbit_db_rh_exchange_m2k_converter}]).

The converter module  — `rabbit_db_rh_exchange_m2k_converter` in this
example  — is is fact a "sub" converter module called but
`rabbit_db_m2k_converter`. See the documentation of a `mnesia_to_khepri`
converter module to learn more about these modules.

[1] https://github.com/rabbitmq/ra
[2] https://github.com/rabbitmq/khepri
[3] https://github.com/rabbitmq/khepri_mnesia_migration

See #7206.

Co-authored-by: Jean-Sébastien Pédron <jean-sebastien@rabbitmq.com>
Co-authored-by: Diana Parra Corbacho <dparracorbac@vmware.com>
Co-authored-by: Michael Davis <mcarsondavis@gmail.com>
2023-09-29 16:00:11 +02:00
Rin Kuryloski ca1806dbcd
Check additional applications when comparing bazel and make results (#8209)
* Check additional applications when comparing bazel and make results

* Sync bazel/make for amqp_client

* Do not fail-fast in build system comparison

* promethus -> prometheus

* Regenerate BUILD.redbug

* When comparing build systems & .app files ignore empty 'registered'

It's listed as a required key in
https://www.erlang.org/doc/man/app.html, but the same docs state the
default is "[]". It seems to ignore it if it's empty.

* Copy bazel/BUILD.osiris from BUILD.bazel in the osiris repo

Normally it would be generated with `bazel run gazelle-update-repos --
-args osiris@1.5.1=github.com/rabbitmq/osiris@v1.5.1`, but in this
case we just want to match it's compilation with erlang.mk with some
manual tweaks.

* Use elixir 1.15, otherwise mix format fails

* Sync bazel/make for rabbitmq_web_dispatch, rabbitmq_management_agent
2023-07-12 17:26:16 +02:00
Loïc Hoguin dc70cbf281
Update Erlang.mk and switch to new xref code 2022-05-31 13:51:12 +02:00
Philip Kuryloski a63f169fcb Remove duplicate rabbitmq-components.mk and erlang.mk files
Also adjust the references in rabbitmq-components.mk to account for
post monorepo locations
2021-03-22 15:40:19 +01:00
Jean-Sébastien Pédron 2f3665e3d4 Merge branch 'stable' 2017-05-16 18:09:17 +02:00
Jean-Sébastien Pédron 6fccc53d26 Makefile: Load the new `rabbitmq-early-plugin.mk` early-stage plugin
See the corresponding commit in rabbitmq-common for an explanation.

[#144697185]
2017-05-16 17:34:37 +02:00
Jean-Sébastien Pédron b0483a3457 Merge branch 'stable' 2017-01-12 10:50:29 +01:00
Jean-Sébastien Pédron 8096ba744f Makefile: Add rabbitmq_ct_client_helpers to TEST_DEPS
metrics_SUITE depends on rabbitmq_ct_client_helpers and amqp_client now.
This fixes the testsuite.
2017-01-12 10:43:46 +01:00
Jean-Sébastien Pédron c77788b2b9 Merge branch 'stable' 2016-12-07 15:41:32 +01:00
Jean-Sébastien Pédron f59dab0c14 Move from .app.src to Makefile variables
This is the recommended way with Erlang.mk.

By default, the version is inherited from rabbitmq-server-release when
the source archive is created, or computed from git-describe(1) (see
`rabbitmq-components.mk`). One can override the version from the command
line by setting the `PROJECT_VERSION` variable.

[#130992027]
2016-12-06 16:15:16 +01:00
Michael Klishin 369a4a1a76 Merge branch 'stable'
Conflicts:
	src/rabbit_mgmt_external_stats.erl
	src/rabbitmq_management_agent.app.src
2016-12-01 14:14:58 +03:00
kjnilsson b1a4a2e6a1 Move rabbit_mgmt_format and mochiweb_util to rabbitmq_management_agent.
Dialyzer fixes
2016-11-29 16:32:19 +00:00
Jean-Sébastien Pédron 1f218d8534 Makefile: Add rabbitmq_ct_helpers to TEST_DEPS
[#130086871]
2016-09-27 15:55:51 +02:00
Jean-Sébastien Pédron 9c453bea49 Merge branch 'stable' 2016-09-23 16:13:42 +02:00
Jean-Sébastien Pédron 2b792e3692 Makefile: Explicitely list all DEPS
Sync rabbitmq-components.mk with rabbitmq-common to remove automatic
DEPS handling.

[#130086871]
2016-09-19 17:10:34 +02:00
kjnilsson 507a920b13 filter out rabbitmq-test from TEST_DEPS 2016-07-18 12:00:31 +01:00
Jean-Sébastien Pédron a866afef43 rabbit_common is added automatically 2015-10-14 12:34:19 +02:00
Jean-Sébastien Pédron a6e0adccc7 Initial move to erlang.mk 2015-10-13 16:09:27 +02:00
David Wragg d4e984c1b7 Overhaul for bug23568 2010-12-02 12:32:26 +00:00
Simon MacMullen 9deb523d8d Become rabbitmq-management-agent 2010-11-05 13:38:34 +00:00
Simon MacMullen 09038e0bf1 Don't need _util and _format any more. 2010-11-04 17:59:07 +00:00
Simon MacMullen d86f908c8f Initial checkin (bug 23374-related). 2010-11-04 17:26:47 +00:00