Diana Parra Corbacho 5f0981c5a3 Allow to use Khepri database to store metadata instead of Mnesia

[Why]
Mnesia is a very powerful and convenient tool for Erlang applications: it is a persistent disc-based database, it handles replication across multiple Erlang nodes and it is available out-of-the-box from the Erlang/OTP distribution. RabbitMQ relies on Mnesia to manage all its metadata:

* virtual hosts' properties
* internal users
* queue, exchange and binding declarations (not queues data)
* runtime parameters and policies
* ...

Unfortunately, Mnesia makes it difficult to handle network partitions and, as a consequence, the merge conflicts between Erlang nodes once the network partition is resolved. RabbitMQ provides several partition handling strategies but they are not bullet-proof. Users still hit situations where it is a pain to repair a cluster following a network partition.

[How]
@kjnilsson created Ra [1], a Raft consensus library that RabbitMQ already uses successfully to implement quorum queues and streams, for instance. Those queues do not suffer from network partitions.

We created Khepri [2], a new persistent and replicated database engine based on Ra, and we want to use it in place of Mnesia in RabbitMQ to solve the problems with network partitions.

This patch integrates Khepri as an experimental feature. When enabled, RabbitMQ will store all its metadata in Khepri instead of Mnesia.

This change comes with behavior changes. While Khepri remains disabled, you should see no changes to the behavior of RabbitMQ. If there are changes, it is a bug. After Khepri is enabled, there are significant changes of behavior that you should be aware of.

Because it is based on the Raft consensus algorithm, when there is a network partition, only the cluster members that are in a partition with at least `(Number of nodes in the cluster ÷ 2) + 1` nodes (the division being rounded down) can "make progress".
In other words, only those nodes may write to the Khepri database, or read from it and expect a consistent result.

For instance, in a cluster of 5 RabbitMQ nodes:

* If there are two partitions, one with 3 nodes, one with 2 nodes, only the group of 3 nodes will be able to write to the database.
* If there are three partitions, two with 2 nodes, one with 1 node, none of the groups can write to the database.

Because the Khepri database will be used for all kinds of metadata, RabbitMQ nodes that can't write to the database will be unable to perform some operations. A list of operations and what to expect is documented in the associated pull request and the RabbitMQ website.

This requirement from Raft also affects the startup of RabbitMQ nodes in a cluster. Indeed, at least a quorum number of nodes must be started at once to allow nodes to become ready.

To enable Khepri, you need to enable the `khepri_db` feature flag:

    rabbitmqctl enable_feature_flag khepri_db

When the `khepri_db` feature flag is enabled, the migration code performs the following two tasks:

1. It synchronizes the Khepri cluster membership from the Mnesia cluster. It uses `mnesia_to_khepri:sync_cluster_membership/1` from the `khepri_mnesia_migration` application [3].
2. It copies data from relevant Mnesia tables to Khepri, doing some conversion if necessary on the way. Again, it uses `mnesia_to_khepri:copy_tables/4` from `khepri_mnesia_migration` to do it.

This can be performed on a running standalone RabbitMQ node or cluster. Data will be migrated from Mnesia to Khepri without any service interruption. Note that during the migration, the performance may decrease and the memory footprint may go up.

Because this feature flag is considered experimental, it is not enabled by default even on a brand new RabbitMQ deployment.

More about the implementation details below:

In the past months, all accesses to Mnesia were isolated in a collection of `rabbit_db` modules.
This is where the integration of Khepri mostly takes place: we use a function called `rabbit_khepri:handle_fallback/1` which selects the database and performs the query or the transaction. Here is an example from `rabbit_db_vhost`:

* Up until RabbitMQ 3.12.x:

      get(VHostName) when is_binary(VHostName) ->
          get_in_mnesia(VHostName).

* Starting with RabbitMQ 3.13.0:

      get(VHostName) when is_binary(VHostName) ->
          rabbit_khepri:handle_fallback(
            #{mnesia => fun() -> get_in_mnesia(VHostName) end,
              khepri => fun() -> get_in_khepri(VHostName) end}).

This `rabbit_khepri:handle_fallback/1` function relies on two things:

1. the fact that the `khepri_db` feature flag is enabled, in which case it always executes the Khepri-based variant.
2. the ability or not to read and write to Mnesia tables otherwise.

Before the feature flag is enabled, or during the migration, the function will try to execute the Mnesia-based variant. If it succeeds, then it returns the result. If it fails because one or more Mnesia tables can't be used, it restarts from scratch: the failure means the feature flag is being enabled and, depending on the outcome, either the Mnesia-based variant will succeed (the feature flag couldn't be enabled) or the feature flag will be marked as enabled and the function will call the Khepri-based variant. The meat of this function really lives in the `khepri_mnesia_migration` application [3] and `rabbit_khepri:handle_fallback/1` is a wrapper on top of it that knows about the feature flag.

However, some calls to the database do not depend on the existence of Mnesia tables, such as functions where we need to learn about the members of a cluster. For those, we can't rely on exceptions from Mnesia. Therefore, we just look at the state of the feature flag to determine which database to use. There are two situations though:

* Sometimes, we need the feature flag state query to block because the function interested in it can't return a valid answer during the migration.
  Here is an example:

      case rabbit_khepri:is_enabled(RemoteNode) of
          true  -> can_join_using_khepri(RemoteNode);
          false -> can_join_using_mnesia(RemoteNode)
      end

* Sometimes, we need the feature flag state query to NOT block (for instance because it would cause a deadlock). Here is an example:

      case rabbit_khepri:get_feature_state() of
          enabled -> members_using_khepri();
          _       -> members_using_mnesia()
      end

Direct accesses to Mnesia still exist. They are limited to code that is specific to Mnesia, such as classic queue mirroring or network partition handling strategies.

Now, to discover the Mnesia tables to migrate and how to migrate them, we use an Erlang module attribute called `rabbit_mnesia_tables_to_khepri_db` which indicates a list of Mnesia tables and an associated converter module. Here is an example in the `rabbitmq_recent_history_exchange` plugin:

    -rabbit_mnesia_tables_to_khepri_db(
       [{?RH_TABLE, rabbit_db_rh_exchange_m2k_converter}]).

The converter module (`rabbit_db_rh_exchange_m2k_converter` in this example) is in fact a "sub" converter module called by `rabbit_db_m2k_converter`. See the documentation of a `mnesia_to_khepri` converter module to learn more about these modules.

[1] https://github.com/rabbitmq/ra
[2] https://github.com/rabbitmq/khepri
[3] https://github.com/rabbitmq/khepri_mnesia_migration

See #7206.

Co-authored-by: Jean-Sébastien Pédron <jean-sebastien@rabbitmq.com>
Co-authored-by: Diana Parra Corbacho <dparracorbac@vmware.com>
Co-authored-by: Michael Davis <mcarsondavis@gmail.com>		2023-09-29 16:00:11 +02:00
.github	Allow to use Khepri database to store metadata instead of Mnesia	2023-09-29 16:00:11 +02:00
bazel	Allow to use Khepri database to store metadata instead of Mnesia	2023-09-29 16:00:11 +02:00
deps	Allow to use Khepri database to store metadata instead of Mnesia	2023-09-29 16:00:11 +02:00
doc	Add files to specify license info	2020-08-18 12:42:43 -07:00
mk	Use rules_erlang v2	2022-01-18 13:43:46 +01:00
packaging	Use OTP 26.1 as OTP 26 in CI	2023-09-20 15:33:34 +02:00
release-notes	Initial 3.12.7 release notes	2023-09-27 15:00:31 -04:00
scripts	Allow ranges for more control	2023-06-13 19:46:55 +00:00
tools	Message Containers (#5077 )	2023-08-31 11:27:13 +01:00
.bazelignore	Add "bazel run //tools:symlink_deps_for_erlang_ls"	2023-01-05 13:02:02 +01:00
.bazelrc	Use OTP 26.1 as OTP 26 in CI	2023-09-20 15:33:34 +02:00
.bazelversion	Add a .bazelversion file	2021-07-28 09:33:11 +02:00
.dockerignore	dockerignore deps	2021-03-18 15:04:39 +00:00
.git-blame-ignore-revs	Ignore code formatter commits in git blame	2023-01-29 14:16:18 +00:00
.gitignore	Fix a .gitignore typo	2023-09-21 13:01:41 -04:00
BAZEL.md	Update bazel docs	2023-09-13 15:22:52 +02:00
BUILD.bats	Fixup the bazel build when used without bzlmod	2022-06-15 11:18:41 +02:00
BUILD.bazel	bazel run gazelle	2023-09-20 04:03:13 +00:00
BUILD.package_generic_unix	Synchronize mixed versions approach with v3.8.x	2021-09-09 13:50:22 +02:00
CODE_OF_CONDUCT.md	Replace @rabbitmq.com addresses with rabbitmq-core@groups.vmware.com	2023-06-20 15:40:13 +04:00
CONTRIBUTING.md	Update CONTRIBUTING.md to mention Bazel	2023-04-27 13:39:15 +04:00
LICENSE	Replace @rabbitmq.com addresses with rabbitmq-core@groups.vmware.com	2023-06-20 15:40:13 +04:00
LICENSE-APACHE2	Switch to MPL 2.0	2020-07-17 16:10:14 +03:00
LICENSE-MPL-RabbitMQ	Revert drop of Exhibit B on MPL 2.0	2020-07-20 17:03:37 +01:00
MODULE.bazel	Allow to use Khepri database to store metadata instead of Mnesia	2023-09-29 16:00:11 +02:00
Makefile	Use pkg_files rules to avoid extra tars	2023-03-14 23:18:00 +01:00
PKG_LINUX.md	URL Cleanup	2019-03-20 03:22:38 -05:00
PKG_WINDOWS.md	Windows doc tweaks	2018-11-08 13:48:46 -08:00
README.md	README: don't mention community Slack and the Google group	2023-09-26 22:59:49 -04:00
SERVER_RELEASES.md	Move (copy) the README file back into place	2020-11-13 15:01:21 +01:00
WORKSPACE	Adopt otp 26.1.1	2023-09-29 03:06:09 +00:00
dist.bzl	Make the extension of :package-generic-unix configurable in bazel	2023-06-20 10:14:31 +02:00
erlang.mk	Update Erlang.mk	2023-07-17 11:12:25 +02:00
erlang_ls.config	erlang_ls: Look for lib includes in extra_deps	2023-02-14 17:34:13 -06:00
moduleindex.yaml	Allow to use Khepri database to store metadata instead of Mnesia	2023-09-29 16:00:11 +02:00
plugins.mk	Fix typo in plugins.mk	2022-09-09 11:40:53 +09:00
rabbitmq-components.mk	Ra 2.7.0	2023-09-28 11:46:39 -04:00
rabbitmq.bzl	Use the latest 3.12.x for the secondary umbrella	2023-09-21 09:35:49 +02:00
rabbitmq_home.bzl	Add a workflow to compare the bazel/erlang.mk output	2023-05-15 13:54:14 +02:00
rabbitmq_package_generic_unix.bzl	Synchronize mixed versions approach with v3.8.x	2021-09-09 13:50:22 +02:00
rabbitmq_run.bzl	Add Bazel targets start-cluster and stop-cluster	2022-07-04 13:54:26 +00:00
rabbitmqctl.bzl	Fix shell quoting in bazel rabbitmqctl helper	2023-02-01 16:40:36 +01:00
rebar.config	Revert "Format MQTT code with `erlfmt`"	2023-01-27 18:25:57 +00:00
user-template.bazelrc	Use rules_erlang 3.8.5	2022-12-19 11:16:04 +01:00