Commit Graph

124 Commits

Author SHA1 Message Date
Michael Klishin 5b974d7c7c
Merge branch 'v3.13.x' into mergify/bp/v3.13.x/pr-12484 2024-10-08 14:19:31 -04:00
Michael Klishin fe85612668
Merge pull request #12493 from rabbitmq/mergify/bp/v3.13.x/pr-12483
3.13.x: Upgrade Recon to 2.5.6 (backport #12483)
2024-10-08 14:18:49 -04:00
Simon Unge 85fa174a26 Dependency Redbug updated from 2.0.7 to 2.1.0
(cherry picked from commit 770dfd6fea)

Conflicts:
	rabbitmq-components.mk
(cherry picked from commit c348501db2)

# Conflicts:
#	rabbitmq-components.mk
2024-10-08 14:08:13 +00:00
Simon Unge 63c3afbfce Dependency Recon updated from 2.5.3 to 2.5.6
(cherry picked from commit 7702a92865)
(cherry picked from commit 7c816a416c)
2024-10-08 12:24:34 +00:00
Simon Unge 173a7fb7e8 Dependency thoas updated from 1.0.0 to 1.2.1
(cherry picked from commit 13bf5c005e)

Conflicts:
	rabbitmq-components.mk
(cherry picked from commit 679eaa1913)
2024-10-08 02:00:12 +00:00
Jean-Sébastien Pédron bcba7af669
Bump Khepri from 0.13.0 to 0.14.0
Release notes:
https://github.com/rabbitmq/khepri/releases/tag/v0.14.0

While here, bump khepri_mnesia_migration from 0.4.0 to 0.5.0 as well.

(cherry picked from commit e9da930f59)

# Conflicts:
#	rabbitmq-components.mk
2024-07-10 16:31:26 -04:00
Rin Kuryloski 6ce5109838 Use the latest rules_erlang & rules_elixir
(cherry picked from commit 3465eef2cd)
2024-06-22 19:02:32 -04:00
Rin Kuryloski c2920658de
Merge pull request #11485 from rabbitmq/mergify/bp/v3.13.x/pr-11456
Use rules_elixir for rabbitmqctl (backport #11456)
2024-06-20 13:48:01 +02:00
Rin Kuryloski f28089177a fixup backport 2024-06-19 14:18:02 +02:00
Rin Kuryloski b38dc84db5 Remove remaining buildbuddy usage
(cherry picked from commit a2709dfd05)

# Conflicts:
#	.github/workflows/perform-bazel-execution-comparison.yaml
#	bazel/bzlmod/secondary_umbrella.bzl
2024-06-19 10:48:44 +00:00
Rin Kuryloski 306c593085 ensure that csv and json elixir deps are embedded in the cli escript
also set 'cfg =' appropriately

(cherry picked from commit 46250dce11)
2024-06-19 08:58:55 +00:00
Rin Kuryloski 7f585d4102 Use rules_elixir to build the cli without mix
Certain elixir-native deps are still built with mix, but this can be
corrected later

(cherry picked from commit 5debebfaf3)

# Conflicts:
#	deps/rabbit/BUILD.bazel
2024-06-19 08:58:53 +00:00
Rin Kuryloski 2c62185317 Turn off BuildBuddy integration (#11343)
Builds now execute on the github actions workers, using Google Cloud Storage (GCS) as a cache

(cherry picked from commit a6874e39cc)

# Conflicts:
#	.github/workflows/update-elixir-patches.yaml
#	.github/workflows/update-otp-patches.yaml
#	MODULE.bazel
#	WORKSPACE
#	deps/rabbitmq_peer_discovery_consul/test/system_SUITE_data/consul.hcl
2024-06-06 09:38:03 +00:00
Rin Kuryloski 57faf57d4e Remove +warnings_as_errors from the jose compilation options
Why?
1. Generally +warnings_as_errors is unnecessary for deps anyway
2. The gazelle plugin in rules_erlang, for the sake of simplicity,
splits compilation of a given application into at most 3 phases:
parse_transforms, behaviours & everything else. Therefore, if there
are behaviours that are behaviours, this produces warnings. It's
certainly possible to write an extra phase into BUILD.jose so that the
warning does not occur, but in this case it's much simpler to just
allow the warning. Furthermore, this should not be an issue with
rules_erlang 4, as it is not limited to the finite number of
compilation phases, nor does it even have a gazelle plugin.

(cherry picked from commit fbf896c05c)
2024-06-04 17:49:05 +00:00
Michal Kuratczyk 55fd3c1539 bazel run gazelle-update-repos -- hex.pm/jose@1.11.10
(cherry picked from commit ad06ad2552)
2024-06-04 17:49:03 +00:00
Karl Nilsson db39c68e45 Ra 2.10.0
This Ra release contains a number of fixes and improvements including:

* Much improved resiliency when Ra infrastructure such as the WAL or
segment writer encounters unexpected errors during disk operations.

It also includes the following features that RabbitMQ does not
yet make use of (but will in the near future):

* Checkpoints: allow non-truncating snapshots to be written
to allow faster recovery of quorum queues with long backlogs, for example.
* Server recovery strategy configuration: allow dynamically started
ra servers to be optionally restarted.
* New handle_aux/5 callback with a better and safer API

(cherry picked from commit 5b2da75b5e)
2024-04-29 15:17:02 +00:00
Rin Kuryloski 6bda429fbc Regenerate bazel/BUILD.cowboy with gazelle
`bazel run gazelle-update-repos -- hex.pm/cowboy@2.12.0`

(cherry picked from commit e2a913a44a)
2024-03-26 17:18:47 +01:00
Rin Kuryloski 937f4a99ee Regenerate bazel/BUILD.cowlib with gazelle
`bazel run gazelle-update-repos -- hex.pm/cowlib@2.13.0`

(cherry picked from commit 5b30b5b0c5)
2024-03-26 17:18:47 +01:00
Rin Kuryloski 1f65fc6c6f Maybe the symlinks were in a recursively copied directory
(cherry picked from commit fa22e4fe0f)
2024-03-20 16:30:37 +00:00
Rin Kuryloski 38bf5083bd Use rules_erlang 3.15.0
(cherry picked from commit 7a54b46379)
2024-03-20 13:43:52 +00:00
Rin Kuryloski 5dc6df2841 Add back osiris from BCR
gazelle update-repos does not correctly generate the bazel build for
it, because it does not pick up the application's env property
2024-02-20 14:19:54 +01:00
Rin Kuryloski de2305992f Do not use BCR for ra, osiris, or seshat
Because khepri is not bazel-native, ra and seshat needed to be
declared twice and manually synchronized. This allows them to be
declared just once.

looking_glass remains a bazel_dep, since it has native extensions
2024-02-20 11:07:49 +01:00
Michal Kuratczyk 8d2657ef2f Adopt OTP 26.2.1 2024-01-09 10:14:08 +01:00
Michael Davis dea4769fed
Update khepri to 0.10.1
Khepri 0.10.0 replaces `khepri:wait_for_async_ret/2,3` with
`khepri:handle_async_ret/1,2`. This will be used by the child commit:
the child commit will use Khepri's async interface and handle async
write events from Ra.

Changes to the bazel build files were done automatically with gazelle:

    bazel run gazelle -- update-repos --verbose \
        --build_files_dir=bazel github.com/rabbitmq/khepri@v0.10.1
2023-12-12 12:01:59 -05:00
Michal Kuratczyk 0f0076a025
Run gazelle for updated deps 2023-11-10 16:47:39 +01:00
Michal Kuratczyk b2c01e3e8e
Remove dialyxir from bazel 2023-11-10 15:37:11 +01:00
Rin Kuryloski 0bbb188aa9
Partially revert commit 3253fe433b
Khepri needs ra, and unless khepri is a native bazel dep, we still
need to declare ra in the classic fashion
2023-09-29 16:00:11 +02:00
Diana Parra Corbacho 5f0981c5a3
Allow to use Khepri database to store metadata instead of Mnesia
[Why]

Mnesia is a very powerful and convenient tool for Erlang applications:
it is a persistent disc-based database, it handles replication across
multiple Erlang nodes and it is available out-of-the-box from the
Erlang/OTP distribution. RabbitMQ relies on Mnesia to manage all its
metadata:

* virtual hosts' properties
* internal users
* queue, exchange and binding declarations (not queues data)
* runtime parameters and policies
* ...

Unfortunately Mnesia makes it difficult to handle network partitions and,
as a consequence, the merge conflicts between Erlang nodes once a
network partition is resolved. RabbitMQ provides several partition
handling strategies but they are not bullet-proof. Users still hit
situations where it is a pain to repair a cluster following a network
partition.

[How]

@kjnilsson created Ra [1], a Raft consensus library that RabbitMQ
already uses successfully to implement quorum queues and streams for
instance. Those queues do not suffer from network partitions.

We created Khepri [2], a new persistent and replicated database engine
based on Ra and we want to use it in place of Mnesia in RabbitMQ to
solve the problems with network partitions.

This patch integrates Khepri as an experimental feature. When enabled,
RabbitMQ will store all its metadata in Khepri instead of Mnesia.

This change comes with behavior changes. While Khepri remains disabled,
you should see no changes to the behavior of RabbitMQ. If there are
changes, it is a bug. After Khepri is enabled, there are significant
changes of behavior that you should be aware of.

Because it is based on the Raft consensus algorithm, when there is a
network partition, only the cluster members in a partition containing
at least `(number of nodes in the cluster ÷ 2, rounded down) + 1` nodes
can "make progress". In other words, only those nodes may write to the
Khepri database, and read from it while expecting a consistent result.

For instance, in a cluster of 5 RabbitMQ nodes:
* If there are two partitions, one with 3 nodes, one with 2 nodes, only
  the group of 3 nodes will be able to write to the database.
* If there are three partitions, two with 2 nodes, one with 1 node, none
  of the groups can write to the database.
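The majority rule above can be checked with a small Erlang sketch. The module `quorum_sketch` and its functions are hypothetical illustrations, not part of RabbitMQ, Ra or Khepri. They only encode the `(N ÷ 2) + 1` arithmetic from the text, using integer division.

```erlang
-module(quorum_sketch).
-export([quorum/1, can_make_progress/2]).

%% Hypothetical helper: smallest majority of a cluster of ClusterSize
%% members, i.e. (ClusterSize div 2) + 1 with integer division.
quorum(ClusterSize) when is_integer(ClusterSize), ClusterSize > 0 ->
    ClusterSize div 2 + 1.

%% Hypothetical helper: a partition can make progress only if it holds
%% at least a quorum of the whole cluster's members.
can_make_progress(PartitionSize, ClusterSize) ->
    PartitionSize >= quorum(ClusterSize).
```

For 5 nodes the quorum is 3. In the 3/2 split only the 3-node side passes `can_make_progress/2`. In the 2/2/1 split, no group reaches 3.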

Because the Khepri database will be used for all kinds of metadata, it
means that RabbitMQ nodes that can't write to the database will be
unable to perform some operations. A list of operations and what to
expect is documented in the associated pull request and the RabbitMQ
website.

This requirement from Raft also affects the startup of RabbitMQ nodes in
a cluster. Indeed, at least a quorum number of nodes must be started at
once to allow nodes to become ready.

To enable Khepri, you need to enable the `khepri_db` feature flag:

    rabbitmqctl enable_feature_flag khepri_db

When the `khepri_db` feature flag is enabled, the migration code
performs the following two tasks:
1. It synchronizes the Khepri cluster membership from the Mnesia
   cluster. It uses `mnesia_to_khepri:sync_cluster_membership/1` from
   the `khepri_mnesia_migration` application [3].
2. It copies data from relevant Mnesia tables to Khepri, doing some
   conversion if necessary on the way. Again, it uses
   `mnesia_to_khepri:copy_tables/4` from `khepri_mnesia_migration` to do
   it.

This can be performed on a running standalone RabbitMQ node or cluster.
Data will be migrated from Mnesia to Khepri without any service
interruption. Note that during the migration, the performance may
decrease and the memory footprint may go up.

Because this feature flag is considered experimental, it is not enabled
by default even on a brand new RabbitMQ deployment.

More about the implementation details below:

In the past months, all accesses to Mnesia were isolated in a collection
of `rabbit_db*` modules. This is where the integration of Khepri mostly
takes place: we use a function called `rabbit_khepri:handle_fallback/1`
which selects the database and performs the query or the transaction.
Here is an example from `rabbit_db_vhost`:

* Up until RabbitMQ 3.12.x:

        get(VHostName) when is_binary(VHostName) ->
            get_in_mnesia(VHostName).

* Starting with RabbitMQ 3.13.0:

        get(VHostName) when is_binary(VHostName) ->
            rabbit_khepri:handle_fallback(
              #{mnesia => fun() -> get_in_mnesia(VHostName) end,
                khepri => fun() -> get_in_khepri(VHostName) end}).

This `rabbit_khepri:handle_fallback/1` function relies on two things:
1. whether the `khepri_db` feature flag is enabled, in which case
   it always executes the Khepri-based variant;
2. otherwise, whether the Mnesia tables can be read and written.

Before the feature flag is enabled, or during the migration, the
function will try to execute the Mnesia-based variant. If it succeeds,
then it returns the result. If it fails because one or more Mnesia
tables can't be used, it restarts from scratch: it means the feature
flag is being enabled and depending on the outcome, either the
Mnesia-based variant will succeed (the feature flag couldn't be enabled)
or the feature flag will be marked as enabled and it will call the
Khepri-based variant. The meat of this function really lives in the
`khepri_mnesia_migration` application [3] and
`rabbit_khepri:handle_fallback/1` is a wrapper on top of it that knows
about the feature flag.
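The selection logic described above could be sketched roughly as follows. This is not the actual `rabbit_khepri:handle_fallback/1`, whose core lives in `khepri_mnesia_migration`. Several parts are illustrative assumptions:
* the `fallback_sketch` module name;
* the `IsEnabledFun` parameter, which stands in for the feature flag check;
* the 100 ms pause before retrying;
* the idea that an unusable table surfaces as an `{aborted, {no_exists, _}}` exit, as `mnesia:dirty_read/2` raises for a missing table.

```erlang
-module(fallback_sketch).
-export([handle_fallback/2]).

%% Hypothetical sketch, NOT RabbitMQ's implementation.
%% IsEnabledFun() stands in for querying the `khepri_db` feature flag.
handle_fallback(IsEnabledFun,
                #{mnesia := MnesiaFun, khepri := KhepriFun} = Funs) ->
    case IsEnabledFun() of
        true ->
            %% Feature flag enabled: always use the Khepri-based variant.
            KhepriFun();
        false ->
            try
                %% Flag not enabled (yet): try the Mnesia-based variant.
                MnesiaFun()
            catch
                %% Assumption: an unusable Mnesia table shows up as this
                %% exit reason. The flag is probably being enabled, so
                %% wait briefly and restart from scratch.
                exit:{aborted, {no_exists, _}} ->
                    timer:sleep(100),
                    handle_fallback(IsEnabledFun, Funs)
            end
    end.
```

Passing a fun for the flag state lets the sketch re-check it on every retry. This mirrors the "restart from scratch" behaviour described above. The real implementation may coordinate with the feature flag subsystem differently.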

However, some calls to the database do not depend on the existence of
Mnesia tables, such as functions where we need to learn about the
members of a cluster. For those, we can't rely on exceptions from
Mnesia. Therefore, we just look at the state of the feature flag to
determine which database to use. There are two situations though:

* Sometimes, we need the feature flag state query to block because the
  function interested in it can't return a valid answer during the
  migration. Here is an example:

        case rabbit_khepri:is_enabled(RemoteNode) of
            true  -> can_join_using_khepri(RemoteNode);
            false -> can_join_using_mnesia(RemoteNode)
        end

* Sometimes, we need the feature flag state query to NOT block (for
  instance because it would cause a deadlock). Here is an example:

        case rabbit_khepri:get_feature_state() of
            enabled -> members_using_khepri();
            _       -> members_using_mnesia()
        end

Direct accesses to Mnesia still exist. They are limited to code that is
specific to Mnesia such as classic queue mirroring or network partitions
handling strategies.

Now, to discover the Mnesia tables to migrate and how to migrate them,
we use an Erlang module attribute called
`rabbit_mnesia_tables_to_khepri_db` which indicates a list of Mnesia
tables and an associated converter module. Here is an example in the
`rabbitmq_recent_history_exchange` plugin:

    -rabbit_mnesia_tables_to_khepri_db(
       [{?RH_TABLE, rabbit_db_rh_exchange_m2k_converter}]).

The converter module (`rabbit_db_rh_exchange_m2k_converter` in this
example) is in fact a "sub" converter module called by
`rabbit_db_m2k_converter`. See the documentation of `mnesia_to_khepri`
converter modules to learn more about these modules.
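One way to collect such declarations is to read each module's attributes through `Module:module_info(attributes)`. The sketch below assumes this approach, and RabbitMQ's actual discovery code may scan applications differently. The names `m2k_discovery_sketch`, `example_table` and `example_m2k_converter` are hypothetical.

```erlang
-module(m2k_discovery_sketch).
-export([tables_to_migrate/1]).

%% Hypothetical declaration mirroring the plugin example above;
%% example_table and example_m2k_converter are made-up names.
-rabbit_mnesia_tables_to_khepri_db(
   [{example_table, example_m2k_converter}]).

%% Collect every {Table, ConverterModule} pair declared by the given
%% modules. An attribute name may appear several times, so each
%% occurrence's list is flattened into the result.
tables_to_migrate(Modules) ->
    [Entry
     || Mod <- Modules,
        {rabbit_mnesia_tables_to_khepri_db, Entries}
            <- Mod:module_info(attributes),
        Entry <- Entries].
```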

[1] https://github.com/rabbitmq/ra
[2] https://github.com/rabbitmq/khepri
[3] https://github.com/rabbitmq/khepri_mnesia_migration

See #7206.

Co-authored-by: Jean-Sébastien Pédron <jean-sebastien@rabbitmq.com>
Co-authored-by: Diana Parra Corbacho <dparracorbac@vmware.com>
Co-authored-by: Michael Davis <mcarsondavis@gmail.com>
2023-09-29 16:00:11 +02:00
Rin Kuryloski b540b9abca Use 3.12.6 for the secondary umbrella 2023-09-22 09:26:04 +02:00
Rin Kuryloski c6672b9dea Use the latest 3.12.x for the secondary umbrella 2023-09-21 09:35:49 +02:00
Rin Kuryloski 75eb0621fc Use OTP 26.1 as OTP 26 in CI 2023-09-20 15:33:34 +02:00
Rin Kuryloski ac3d0251c7 Use osiris from Bazel Central Repository
osiris 1.6.3 is identical to 1.6.2, but we needed a new version for
the sake of the BCR publish

this makes osiris a native bzlmod dependency
2023-08-15 14:49:24 +02:00
Rin Kuryloski ee85344ddc Remove pre-generated .app files in cowboy, cowlib & ranch
As they won't be replaced in the bazel build the way they are with
erlang.mk, keeping them causes build divergence
2023-08-15 09:06:32 +02:00
Michael Klishin dbd319f5f8 Bump Osiris to 1.6.2, references #8616
In #8616, Osiris was bumped to 1.6.0 but
`BUILD.bazel` in the repo still reported the version as 1.5.1.

That created a fair amount of confusion.
2023-08-13 12:28:22 +04:00
Rin Kuryloski ca1806dbcd
Check additional applications when comparing bazel and make results (#8209)
* Check additional applications when comparing bazel and make results

* Sync bazel/make for amqp_client

* Do not fail-fast in build system comparison

* promethus -> prometheus

* Regenerate BUILD.redbug

* When comparing build systems & .app files ignore empty 'registered'

It's listed as a required key in
https://www.erlang.org/doc/man/app.html, but the same docs state the
default is "[]". It seems to ignore it if it's empty.

* Copy bazel/BUILD.osiris from BUILD.bazel in the osiris repo

Normally it would be generated with `bazel run gazelle-update-repos --
-args osiris@1.5.1=github.com/rabbitmq/osiris@v1.5.1`, but in this
case we just want to match its compilation with erlang.mk with some
manual tweaks.

* Use elixir 1.15, otherwise mix format fails

* Sync bazel/make for rabbitmq_web_dispatch, rabbitmq_management_agent
2023-07-12 17:26:16 +02:00
Rin Kuryloski 503bccec31 Explicitly set HOME in `elixir_build` rule
It's not guaranteed to be set, so it's better to be explicit with a
temp dir
2023-07-06 09:51:47 +02:00
Rin Kuryloski 9c66e73266 Patch amqp dep for elixir 1.15 2023-07-04 17:45:32 +02:00
Rin Kuryloski 21978a3234 Add elixir 1.15 and use with otp 26 2023-07-04 17:43:47 +02:00
Rin Kuryloski 1e91bb7327 Allow bazel modules depending on rabbitmq-server to override @rbe 2023-07-04 16:40:06 +02:00
Rin Kuryloski 464182a797
Avoid secondary umbrella archive collisions in actions (#8653)
* Store secondary umbrella archives used in mixed version tests

in a path which implies the erlang version used

* Infer the secondary umbrella otp version from the url

This avoids having two copies that need to be kept in sync
2023-06-23 16:03:22 +02:00
Rin Kuryloski 0501af9ffc Use 3.11.18 for the secondary umbrella 2023-06-16 08:30:40 +02:00
Michal Kuratczyk fb3655610c
Update CSV to 3.0.5; remove unused dep 2023-05-26 18:04:42 +02:00
Rin Kuryloski ad03e31543 Add bazel build info for syslog dep
This allows building `@syslog//:erlang_app` on windows
2023-05-23 17:15:28 +02:00
Michal Kuratczyk a9a96a4f4a
OTP master is OTP27 2023-05-22 09:15:30 +02:00
Rin Kuryloski eb94a58bc9 Add a workflow to compare the bazel/erlang.mk output
To catch any drift between the builds
2023-05-15 13:54:14 +02:00
Michael Klishin 316251a8d6
Merge pull request #8156 from rabbitmq/mk-gazelle-update-repos
bazel run gazelle-update-repos for Ra 2.6
2023-05-12 20:32:50 +04:00
Michael Klishin 0bf4d5168a bazel run gazelle-update-repos for Ra 2.6 2023-05-12 20:25:40 +04:00
Rin Kuryloski ea895a0023 Account for Elixir containing several core applications
- eex
- elixir
- ex_unit
- iex
- logger
- mix

So that apps (like rabbitmq_cli) can dialyze against the extra
components
2023-05-12 08:26:42 +02:00
Rin Kuryloski 19f4abd55b Build cli deps as .ez archives
This provides an elixir/erlang agnostic way of providing them to other
erlang rules
2023-05-12 08:26:42 +02:00
Michael Klishin bbb98226e2
Merge pull request #8100 from rabbitmq/otp26-dialyzer 2023-05-04 19:05:23 +04:00