This new module sits on top of `rabbit_mnesia` and provides an API with
all cluster-related functions.
`rabbit_mnesia` should only be called directly from Mnesia-specific
code, for instance `rabbit_mnesia_rename` or classic mirrored queues.
Everywhere else, `rabbit_db_cluster` must be used.
Several modules, in particular in `rabbitmq_cli`, continue to call
`rabbit_mnesia` as a fallback if the `rabbit_db_cluster` module is
unavailable. This is the case when the CLI interacts with an older
RabbitMQ version.
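A minimal sketch of that fallback pattern, assuming an RPC-based call
site; the `cluster_members/1` wrapper and the
`rabbit_db_cluster:members/0` call are illustrative, not the actual CLI
code (which is written in Elixir):

    %% Hypothetical wrapper showing the fallback described above.
    cluster_members(Node) ->
        case rpc:call(Node, rabbit_db_cluster, members, []) of
            {badrpc, {'EXIT', {undef, _}}} ->
                %% The remote node is an older RabbitMQ without
                %% rabbit_db_cluster; fall back to the Mnesia-specific API.
                rpc:call(Node, rabbit_mnesia, cluster_nodes, [all]);
            Members ->
                Members
        end.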
This will help with the introduction of a new database backend.
So far, we had the following functions to list nodes in a RabbitMQ
cluster:
* `rabbit_mnesia:cluster_nodes/1` to get members of the Mnesia cluster;
the argument was used to select members (all members or only those
running Mnesia and participating in the cluster)
* `rabbit_nodes:all/0` to get all members of the Mnesia cluster
* `rabbit_nodes:all_running/0` to get all members who currently run
Mnesia
Basically:
* `rabbit_nodes:all/0` calls `rabbit_mnesia:cluster_nodes(all)`
* `rabbit_nodes:all_running/0` calls `rabbit_mnesia:cluster_nodes(running)`
We also have:
* `rabbit_node_monitor:alive_nodes/1` which filters the given list of
nodes to only select those currently running Mnesia
* `rabbit_node_monitor:alive_rabbit_nodes/1` which filters the given
list of nodes to only select those currently running RabbitMQ
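For reference, here is how the old API fits together (the call site is
illustrative, not code taken from the tree):

    %% Old API, for reference.
    old_view() ->
        All     = rabbit_nodes:all(),          %% = rabbit_mnesia:cluster_nodes(all)
        Running = rabbit_nodes:all_running(),  %% = rabbit_mnesia:cluster_nodes(running)
        Mnesia  = rabbit_node_monitor:alive_nodes(All),         %% running Mnesia
        Rabbits = rabbit_node_monitor:alive_rabbit_nodes(All),  %% running RabbitMQ
        {All, Running, Mnesia, Rabbits}.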
Most of the code uses `rabbit_mnesia:cluster_nodes/1` or the
`rabbit_nodes:all*/0` functions. `rabbit_mnesia:cluster_nodes(running)`
or `rabbit_nodes:all_running/0` is often used as a close approximation
of "all cluster members running RabbitMQ". This list might be incorrect
in times where a node is joining the clustered or is being worked on
(i.e. Mnesia is running but not RabbitMQ).
With Khepri, the same approximation won't be possible because we will
try to keep Khepri/Ra running even when RabbitMQ is stopped, in order to
expand or shrink the cluster.
So in order to clarify what we want when we query a list of nodes, this
patch introduces the following functions:
* `rabbit_nodes:list_members/0` to get all cluster members, regardless
of their state
* `rabbit_nodes:list_reachable/0` to get all cluster members we can
reach using Erlang distribution, regardless of the state of RabbitMQ
* `rabbit_nodes:list_running/0` to get all cluster members who run
RabbitMQ, regardless of the maintenance state
* `rabbit_nodes:list_serving/0` to get all cluster members who run
RabbitMQ and are accepting clients
In addition to the list functions, there are the corresponding
`rabbit_nodes:is_*(Node)` checks and `rabbit_nodes:filter_*(Nodes)`
filtering functions.
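For example (the `list_*/0` names come from this patch; the expanded
`is_*`/`filter_*` names below are one plausible spelling and may differ
slightly):

    %% New API; illustrative call site only.
    new_view() ->
        Members   = rabbit_nodes:list_members(),    %% all members, any state
        Reachable = rabbit_nodes:list_reachable(),  %% reachable via Erlang distribution
        Running   = rabbit_nodes:list_running(),    %% running RabbitMQ
        Serving   = rabbit_nodes:list_serving(),    %% running RabbitMQ, accepting clients
        true      = rabbit_nodes:is_serving(node()),
        {Members, Reachable, Running, Serving,
         rabbit_nodes:filter_running(Members)}.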
The code is modified to use these new functions. One possible
significant change is that the new list functions will perform RPC calls
to query the nodes' state, unlike `rabbit_mnesia:cluster_nodes(running)`.
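A hedged sketch of what such an RPC-based query can look like; the
actual implementation in `rabbit_nodes` may differ:

    %% Sketch only: query each member over RPC and keep the nodes where
    %% RabbitMQ reports itself as running.
    filter_running(Members) ->
        Results = erpc:multicall(Members, rabbit, is_running, [], 10000),
        [Node || {Node, {ok, true}} <- lists:zip(Members, Results)].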
It allows one to run a common_test testsuite on Erlang nodes running on
remote Amazon EC2 VMs. It configures Erlang distribution so that remote
nodes can communicate with each other and with the common_test master
node.
rabbit_ct_broker_helpers also offers new setup and teardown steps to
work with VMs: it allows starting RabbitMQ nodes on those VMs and
possibly clustering them. The configuration is unchanged compared to
local nodes. The number of RabbitMQ nodes doesn't have to match the
number of VMs: the nodes are spread over the available VMs using
round-robin.
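A minimal sketch of this round-robin spreading; the function and
variable names are hypothetical:

    %% Assign RabbitMQ node names to VMs in round-robin order; a VM hosts
    %% several nodes when there are more nodes than VMs.
    assign_nodes_to_vms(Nodes, VMs) ->
        NumVMs = length(VMs),
        Indices = lists:seq(0, length(Nodes) - 1),
        [{Node, lists:nth((I rem NumVMs) + 1, VMs)}
         || {Node, I} <- lists:zip(Nodes, Indices)].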
v2: Add support to start RabbitMQ nodes spread on remote VMs.
v3: Various improvements to allow parallel execution of testcases, i.e.
several sets of VMs can be spawned in parallel without interference.
v4: Support user-specified VM names. If the name is missing, use the
unique ID generated for per-VM-set resources.
v5: Use a unique local node name when trying to ping the remote ct-peer.
While here, use `rabbit_misc:random()` to create the Terraform unique
ID. The previous base64-encoded string didn't make a valid node name.
Accept `$ERLANG_VERSION` environment/make variable to force the
Erlang version to use on VMs.
Add setup scripts for Erlang 19.3 and 20.1.
v6: Use Amazon S3 to upload the directories archive. Configure a VPC to
access it from the VMs.
Use `user_data` to provide the setup script. The setup script itself
is now a template.
Those changes allow getting rid of all `exec` or `file` provisioners
in the `aws_instance`. This means instances can now be created using
a launch configuration, which is how instances are created via an
autoscaling group.
v7: Export hostnames, nodenames and IP addresses from Terraform state,
and generate `inetrc` in Erlang. This makes it possible to work with
a "two-step Terraform manifest". For instance, with an autoscaling
group, Terraform doesn't start instances. However, we can use a
second manifest to query the created instances.
Export `$HOME` in setup scripts. This fixes the use of `~/...` paths
and the start of the remote Erlang node.
v8: Add support to query Amazon EC2 VMs based on tags, instead of
relying on the outputs of the manifest. This will allow us to query,
for instance, VMs created with an autoscaling group. This change is
based on a new query-only manifest called `vms-query`.
This new query-only manifest is used in a loop until we have enough
VMs (compared to the requested number) or we reach a timeout of 5
minutes.
v9: Add an autoscaling-group-based module to deploy VMs. The testsuite
is extended to use it, in addition to the `direct-vms` module.
Fix the setup scripts to handle the case where there are no
directories to upload (i.e. the archive is an empty file).
v10: Download log files from remote VMs before destroying them. This
allows further debugging if something fails.
v11: Use a per-VPC CIDR block. This resolves a possible conflict when
VMs in different VPCs get the same private IP address, which breaks
name resolution on the local common_test node.
Use inet_db:add_host() to reconfigure name resolution, instead of
calling inet_config:init(). We still write the `inetrc` files: they
are used by sub-processes such as rabbitmqctl(8) and
rabbitmq-plugins(8).
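For reference, the call looks like this (the IP address and hostnames
are placeholders):

    %% Register a remote VM's IP address and hostnames with the local
    %% resolver at runtime, without re-reading the inet configuration.
    ok = inet_db:add_host({10, 0, 1, 25}, ["vm-1.internal", "vm-1"]).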
Download each VM's common_test priv_dir before destroying the VMs.
These directories are useful because they contain, for instance, the
RabbitMQ nodes' logs.
Fix several concurrency bugs around global resources accessed or
shared by several rabbit_vm_helpers setups when testing in parallel.
The upload dirs archive is now created by Erlang, not Terraform. This
allows us to create a single archive per set of directories, which
saves time, I/O and CPU (for compression).
Configure an EBS root block device for each VM because the default
internal storage of a `t2.micro` instance type is too small.
v12: Verify that terraform(1) is available and working before doing
anything else.
v13: Guess the Erlang application name being tested (using the value of
the `$DIALYZER_PLT` environment variable, lacking a better way). We
use it now as the instance name prefix.
Allow the caller to set the AWS EC2 region.
Allow the caller to set the files suffix. Also, we record it in the
instance and launch configuration tags. This allows the caller to
do things based on a known instance tag.
Install rsync, zip and vim-nox on VMs. They are useful when one
needs to connect to the VMs and try things.
Install Elixir on 19.3+ VMs. It's not used, but it silences a
warning from `rabbitmq-build.mk` which calls it to initialize
`$ELIXIR_LIB_DIR`.
[#153749132]