* Implement rabbitmq-queues leader_health_check command for quorum queues
(cherry picked from commit c26edbef33)
* Tests for rabbitmq-queues leader_health_check command
(cherry picked from commit 6cc03b0009)
* Ensure calling ParentPID in leader health check execution and
reuse and extend formatting API, with amqqueue:to_printable/2
(cherry picked from commit 76d66a1fd7)
* Extend core leader health check tests and update badrpc error handling in cli tests
(cherry picked from commit 857e2a73ca)
* Refactor leader_health_check command validators and ignore vhost arg
(cherry picked from commit 6cf9339e49)
* Update leader_health_check_command description and banner
(cherry picked from commit 96b8bced2d)
* Improve output formatting for healthy leaders and support
silent mode in rabbitmq-queues leader_health_check command
(cherry picked from commit 239a69b404)
* Support global flag to run leader health check for
all queues in all vhosts on local node
(cherry picked from commit 48ba3e161f)
* Return immediately for leader health checks on empty vhosts
(cherry picked from commit 7873737b35)
* Rename leader health check timeout refs
(cherry picked from commit b7dec89b87)
* Update banner message for global leader health check
(cherry picked from commit c7da4d5b24)
* QQ leader-health-check: check_process_limit_safety before spawning leader checks
(cherry picked from commit 17368454c5)
* Log leader health check result in broker logs (if any leaderless queues)
(cherry picked from commit 1084179a2c)
* Ensure check_passed result for leader health internal calls
(cherry picked from commit 68739a6bd2)
* Extend CLI format output to process check_passed payload
(cherry picked from commit 5f5e9922bd)
* Format leader healthcheck result log and function exports
(cherry picked from commit ebffd7d8a4)
* Change leader_health_check command scope from queues to diagnostics
(cherry picked from commit 663fc9846e)
* Update (c) line year
(cherry picked from commit df82f12a70)
* Rename command to check_for_quorum_queues_without_an_elected_leader
and use across_all_vhosts option for global checks
(cherry picked from commit b2acbae28e)
* Use rabbit_db_queue for qq leader health check lookups
and introduce rabbit_db_queue:get_all_by_type_and_vhost/2.
Update leader health check timeout to 5s and process limit
threshold to 20% of node's process_limit.
(cherry picked from commit 7a8e166ff6)
* Update tests: quorum_queue_SUITE and rabbit_db_queue_SUITE
(cherry picked from commit 9bdb81fd79)
* Fix typo (cli test module)
(cherry picked from commit 615856853a)
* Small refactor - simpler final leader health check result return on function head match
(cherry picked from commit ea07938f3d)
* Clear dialyzer warning & fix type spec
(cherry picked from commit a45aa81bd2)
* Ignore result without strict match to avoid dialyzer warning
(cherry picked from commit bb43c0b929)
* 'rabbitmq-diagnostics check_for_quorum_queues_without_an_elected_leader' documentation edits
(cherry picked from commit 845230b0b380a5f5bad4e571a759c10f5cc93b91)
* 'rabbitmq-diagnostics check_for_quorum_queues_without_an_elected_leader' output copywriting
(cherry picked from commit 235f43bad58d3a286faa0377b8778fcbe6f8705d)
* diagnostics check_for_quorum_queues_without_an_elected_leader: behave like a health check w.r.t. error reporting
(cherry picked from commit db7376797581e4716e659fad85ef484cc6f0ea15)
* check_for_quorum_queues_without_an_elected_leader: handle --quiet and --silent
plus simplify function heads.
References #13433.
(cherry picked from commit 7b392315d5e597e5171a0c8196230d92b8ea8e92)
---------
Co-authored-by: Ayanda Dube <adube14@bloomberg.net>
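The process limit safety check mentioned in the list above could look roughly like the sketch below. This is not the code from the cherry-picked commits: the module name, the function names and the exact comparison are assumptions made for illustration. Only the 20% threshold and the idea of checking before spawning one check per queue come from the messages above.

```erlang
-module(leader_check_limit_sketch).
-export([safe_to_spawn/1, safe_to_spawn/3]).

%% Threshold taken from the commit message: 20% of the node's process limit.
-define(THRESHOLD_PERCENT, 20).

%% Checks against the live values of the running VM.
safe_to_spawn(NumQueues) ->
    safe_to_spawn(NumQueues,
                  erlang:system_info(process_count),
                  erlang:system_info(process_limit)).

%% Pure variant, easier to reason about and to test.
%% Assumption: one process is spawned per queue to check, and the batch is
%% allowed only if it stays within the threshold and still fits under the
%% process limit given the processes that already exist.
safe_to_spawn(NumQueues, ProcessCount, ProcessLimit) ->
    Budget = (ProcessLimit * ?THRESHOLD_PERCENT) div 100,
    NumQueues =< Budget andalso
        ProcessCount + NumQueues < ProcessLimit.
```

With a process limit of 1000, the budget is 200 processes, so a batch of 10 checks passes and a batch of 300 is refused.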
This previously emitted a warning because Elixir will rebind `this_node`
by default, so the `this_node` binding in the line above was unused.
(As opposed to Erlang which would treat this as a match - rejecting
the binding if `this_node` was not equal to the value being matched.)
The node needed to be adjusted as well - `node()` returned the ExUnit
runner's node while the command returned the remote node, which is
stored in the context under `opts.node`.
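For readers less familiar with the Erlang side of that comparison, here is a minimal sketch of the match semantics. The module and function names are hypothetical. Once a variable is bound, `=` asserts equality instead of rebinding, which is the behavior the test appeared to rely on.

```erlang
-module(match_sketch).
-export([matches/2]).

%% `Bound` already has a value when the `=` below runs, so `=` acts as an
%% assertion: it succeeds if both sides are equal and raises `badmatch`
%% otherwise. Elixir would rebind the variable on the left instead.
matches(Bound, Value) ->
    try
        Bound = Value,
        true
    catch
        error:{badmatch, _} -> false
    end.
```

`match_sketch:matches('rabbit@a', 'rabbit@a')` returns `true`, while `match_sketch:matches('rabbit@a', 'rabbit@b')` returns `false`. In Elixir, the pin operator (`^this_node = value`) is needed to get the same assertion.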
[Why]
This work started as an effort to add peer discovery support to our
Khepri integration. Indeed, as part of the task to integrate Khepri, we
missed the fact that `rabbit_peer_discovery:maybe_create_cluster/1` was
called from the Mnesia-specific code only, even though we knew about it
because we hit many issues caused by the fact that `join_cluster` and
peer discovery use different code paths to create a cluster.
To add support for Khepri, the first version of this patch moved the
call to `rabbit_peer_discovery:maybe_create_cluster/1` out of
`rabbit_mnesia` and into `rabbit_db_cluster`. To achieve that, it made
sense to unify the code and simply call `rabbit_db_cluster:join/2`
instead of duplicating the work.
Unfortunately, doing so highlighted another issue: the way the node to
cluster with was selected. Indeed, it could cause situations where
multiple clusters are created instead of one, without resorting to
out-of-band counter-measures, like a 30-second delay added in the
Kubernetes operator (rabbitmq/cluster-operator#1156). This problem was
even more frequent when we tried to unify the code path and call
`join_cluster`.
After several iterations on the patch and even more discussions with the
team, we decided to rewrite the algorithm to make node selection more
robust and still use `rabbit_db_cluster:join/2` to create the cluster.
[How]
This commit is only about the rewrite of the algorithm. Calling peer
discovery from `rabbit_db_cluster` instead of `rabbit_mnesia` (and thus
making peer discovery work with Khepri) will be done in a follow-up
commit.
We wanted the new algorithm to fulfill the following properties:
1. `rabbit_peer_discovery` should provide the ability to re-trigger it
easily to re-evaluate the cluster. The new public API is
`rabbit_peer_discovery:sync_desired_cluster/0`.
2. The selection of the node to join should be designed in a way that
all nodes select the same node, regardless of the order in which they
become available. The adopted solution is to sort the list of
discovered nodes with the following criteria (in that order):
   1. the size of the cluster a discovered node is part of; sorted from
      bigger to smaller clusters
   2. the start time of a discovered node; sorted from older to younger
      nodes
   3. the name of a discovered node; sorted alphabetically
The first node in that list will not join anyone and simply proceed
with its boot process. Other nodes will try to join the first node.
3. To reduce the chance of incorrectly having multiple standalone nodes
because the discovery backend returned only a single node, we want to
apply the following constraints to the list of nodes after it is
filtered and sorted (see property 2 above):
* The list must contain `node()` (i.e. the node running peer
discovery itself).
* If RabbitMQ's cluster size hint is greater than 1, the list
must have at least two nodes. The cluster size hint is the maximum
between the configured target cluster size hint and the number of
elements in the nodes list returned by the backend.
If one of the constraints is not met, the entire peer discovery
process is restarted after a delay.
4. The lock is acquired only to protect the actual join, not the
discovery step where the backend is queried to get the list of peers.
With the node selection described above, this will let the first node
start without acquiring the lock.
5. The cluster membership views queried as part of the algorithm to sort
the list of nodes will be used to detect additional clusters or
standalone nodes that did not cluster correctly. These nodes will be
asked to re-evaluate peer discovery to increase the chance of forming
a single cluster.
6. After some delay, peer discovery will be re-evaluated to further
eliminate the chances of having multiple clusters instead of one.
This commit covers properties from point 1 to point 4. Remaining
properties will be the scope of additional pull requests after this one
works.
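The ordering from property 2 and the constraints from property 3 can be sketched as follows. This is an illustration only, not the code in `rabbit_peer_discovery`: the module name, the function names and the `{Name, ClusterSize, StartTime}` tuple shape are assumptions.

```erlang
-module(peer_sort_sketch).
-export([sort/1, select/1, constraints_met/4]).

%% Each discovered node is modelled as {Name, ClusterSize, StartTime}.
%% Sort by cluster size (bigger first), then start time (older first),
%% then name (alphabetically).
sort(Nodes) ->
    lists:sort(fun compare/2, Nodes).

compare({NameA, SizeA, StartA}, {NameB, SizeB, StartB}) ->
    %% Negating the size makes bigger clusters sort first.
    {-SizeA, StartA, NameA} =< {-SizeB, StartB, NameB}.

%% The first node of the sorted list is the one every other node joins.
select(Nodes) ->
    [{First, _Size, _Start} | _] = sort(Nodes),
    First.

%% Property 3: the list must contain this node and, if the cluster size
%% hint is greater than 1, it must have at least two nodes.
constraints_met(ThisNode, Nodes, ConfiguredHint, BackendNodeCount) ->
    Hint = max(ConfiguredHint, BackendNodeCount),
    Names = [Name || {Name, _Size, _Start} <- Nodes],
    lists:member(ThisNode, Names) andalso
        (Hint =< 1 orelse length(Names) >= 2).
```

Because the comparison depends only on the properties of each node, every node that sees the same list picks the same first element, whatever order the backend returned the nodes in.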
If there is a failure at any point during discovery, filtering/sorting,
locking or joining, the entire process is restarted after a delay. This
is configured using the following parameters:
* cluster_formation.discovery_retry_limit
* cluster_formation.discovery_retry_interval
The default parameters were bumped to 30 retries with a delay of 1
second between each.
The locking retries/interval parameters are not used by the new
algorithm anymore.
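The restart-after-a-delay behavior can be pictured with the loop below. It is a sketch under assumptions: the module name, the `{error, retries_exhausted}` return value and the shape of the `Attempt` fun are hypothetical, and the real code reads the limit and interval from the two `cluster_formation.*` settings listed above.

```erlang
-module(retry_sketch).
-export([run/3]).

%% Calls Attempt() until it returns `ok` or the retry budget is used up.
%% Attempt is assumed to return `ok` or `{error, Reason}`.
run(_Attempt, 0, _IntervalMs) ->
    {error, retries_exhausted};
run(Attempt, RetriesLeft, IntervalMs) when RetriesLeft > 0 ->
    case Attempt() of
        ok ->
            ok;
        {error, _Reason} ->
            %% Wait before restarting the whole attempt from scratch.
            timer:sleep(IntervalMs),
            run(Attempt, RetriesLeft - 1, IntervalMs)
    end.
```

With the new defaults, the equivalent call would be `retry_sketch:run(Attempt, 30, 1000)`.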
There are extra minor changes that come with the rewrite:
* The configured backend is cached in a persistent term. The goal is to
make sure we use the same backend throughout the entire process and
when we call `maybe_unregister/0` even if the configuration changed
for whatever reason in between.
* `maybe_register/0` is called from `rabbit_db_cluster` instead of at
the end of a successful peer discovery process. `rabbit_db_cluster`
had to call `maybe_register/0` if the node was not virgin anyway. So
make it simpler and always call it in `rabbit_db_cluster` regardless
of the state of the node.
* `log_configured_backend/0` is gone. `maybe_init/0` can log the backend
directly. There is no need to explicitly call another function for
that.
* Messages are logged using `?LOG_*()` macros instead of the old
`rabbit_log` module.
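As an illustration of the first point in the list above, caching a value in a persistent term could look like this. The module name, the key and the `ReadConfig` fun are hypothetical; only the use of `persistent_term` comes from the description.

```erlang
-module(backend_cache_sketch).
-export([backend/1, forget/0]).

%% Hypothetical key under which the backend is cached.
-define(KEY, {?MODULE, backend}).

%% Returns the cached backend. ReadConfig() is called only the first time,
%% so later configuration changes do not affect the answer.
backend(ReadConfig) ->
    case persistent_term:get(?KEY, undefined) of
        undefined ->
            Backend = ReadConfig(),
            persistent_term:put(?KEY, Backend),
            Backend;
        Backend ->
            Backend
    end.

%% Drops the cached value, e.g. between test runs.
forget() ->
    _ = persistent_term:erase(?KEY),
    ok.
```

Calling `backend/1` a second time with a fun that returns something else still yields the first value, which is the property wanted when `maybe_unregister/0` runs later.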
This category should be unused with the decommissioning of the old
upgrade subsystem (in favor of the feature flags subsystem). It means:
1. The upgrade log file will not be created by default anymore.
2. The `$RABBITMQ_UPGRADE_LOG` environment variable is now unsupported.
The configuration variables remain to avoid breaking an existing and
working configuration.
The configuration remains the same for the end-user. The only exception
is the log root directory: it is now set through the `log_root`
application env. variable in `rabbit`. People using the Cuttlefish-based
configuration file are not affected by this exception.
The main change is how the logging facility is configured. It now
happens in `rabbit_prelaunch_logging`. The `rabbit_lager` module is
removed.
The supported outputs remain the same: the console, text files, the
`amq.rabbitmq.log` exchange and syslog.
The message text format changed slightly: the timestamp is more precise
(now to the microsecond) and the level can be abbreviated to always be
4 characters long, which aligns all messages and improves readability.
Here is an example:
    2021-03-03 10:22:30.377392+01:00 [dbug] <0.229.0> == Prelaunch DONE ==
    2021-03-03 10:22:30.377860+01:00 [info] <0.229.0>
    2021-03-03 10:22:30.377860+01:00 [info] <0.229.0> Starting RabbitMQ 3.8.10+115.g071f3fb on Erlang 23.2.5
    2021-03-03 10:22:30.377860+01:00 [info] <0.229.0> Licensed under the MPL 2.0. Website: https://rabbitmq.com
The example above also shows that multiline messages are supported and
each line is prepended with the same prefix (the timestamp, the level
and the Erlang process PID).
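The per-line prefixing can be sketched as below. This is not the formatter shipped with RabbitMQ: the module and function names are hypothetical and the prefix is passed in as a ready-made string.

```erlang
-module(prefix_sketch).
-export([format/2]).

%% Splits Msg on newlines and prepends the same Prefix to every line.
%% Both arguments are assumed to be flat strings (lists of characters).
format(Prefix, Msg) ->
    Lines = string:split(Msg, "\n", all),
    Prefixed = [[Prefix, " ", Line] || Line <- Lines],
    lists:flatten(lists:join("\n", Prefixed)).
```

For instance, `prefix_sketch:format("[info] <0.229.0>", "a\nb")` returns a two-line string in which both lines start with `[info] <0.229.0>`.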
JSON is also supported as a message format, and now for any output.
Indeed, it is possible to use it with e.g. syslog or the exchange. Here
is an example of a JSON-formatted message sent to syslog:
    Mar 3 11:23:06 localhost rabbitmq-server[27908] <0.229.0> - {"time":"2021-03-03T11:23:06.998466+01:00","level":"notice","msg":"Logging: configured log handlers are now ACTIVE","meta":{"domain":"rabbitmq.prelaunch","file":"src/rabbit_prelaunch_logging.erl","gl":"<0.228.0>","line":311,"mfa":["rabbit_prelaunch_logging","configure_logger",1],"pid":"<0.229.0>"}}
For quick testing, the values accepted by the `$RABBITMQ_LOGS`
environment variable were extended:
* `-` still means stdout
* `-stderr` means stderr
* `syslog:` means syslog on localhost
* `exchange:` means logging to `amq.rabbitmq.log`
`$RABBITMQ_LOG` was also extended. It now accepts a `+json` modifier (in
addition to the existing `+color` one). With that modifier, messages are
formatted as JSON instead of plain text.
The `rabbitmqctl rotate_logs` command is deprecated. The reason is that
Logger does not expose a function to force log rotation. However, it
will detect when a file was rotated by an external tool.
From a developer point of view, the old `rabbit_log*` API remains
supported, though it is now deprecated. It is implemented as regular
modules: there is no `parse_transform` involved anymore.
In the code, it is recommended to use the new Logger macros. For
instance, `?LOG_INFO(Format, Args)`. If possible, messages should be
augmented with some metadata. For instance (note the map after the
message):
    ?LOG_NOTICE("Logging: switching to configured handler(s); following "
                "messages may not be visible in this log output",
                #{domain => ?RMQLOG_DOMAIN_PRELAUNCH}),
Domains in Erlang Logger parlance are the way to categorize messages.
Some predefined domains, matching previous categories, are currently
defined in `rabbit_common/include/logging.hrl` or headers in the
relevant plugins for plugin-specific categories.
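A self-contained module using the Logger macros with a domain could look like the sketch below. The module name and the `[rabbitmq, sketch]` domain are hypothetical. Real code would use one of the predefined domain macros such as `?RMQLOG_DOMAIN_PRELAUNCH` instead of defining its own.

```erlang
-module(logger_sketch).
-include_lib("kernel/include/logger.hrl").
-export([announce/1]).

%% Hypothetical domain, for illustration only. A Logger domain is a list
%% of atoms, from the most general to the most specific category.
-define(SKETCH_DOMAIN, [rabbitmq, sketch]).

%% Logs at the info level and attaches the domain as metadata (the map
%% passed as the third argument to the macro).
announce(Version) ->
    ?LOG_INFO("Starting sketch component, version ~ts", [Version],
              #{domain => ?SKETCH_DOMAIN}).
```

The macro also records the module, function and line of the call site in the metadata, which is what shows up under `meta` in the JSON example earlier in this message.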
At this point, very few messages have been converted from the old
`rabbit_log*` API to the new macros. It can be done gradually when
working on a particular module or logging.
The Erlang builtin console/file handler, `logger_std_h`, has been forked
because it lacks date-based file rotation. The configuration of
date-based rotation is identical to Lager. Once the dust has settled for
this feature, the goal is to submit it upstream for inclusion in Erlang.
The forked module is called `rabbit_logger_std_h` and is based on
`logger_std_h` in Erlang 23.0.
Helps with troubleshooting hostname resolution behavior
on nodes and locally for CLI tools. This is obviously not meant
to be a replacement for existing tools such as dig, only
a way to quickly spot obvious irregularities, e.g. those
in environments that use custom Erlang inetrc files.
Per discussion with @harshac.
It prints RabbitMQ-specific environment variables that
are set on the target node. Can be used to inspect env variable-based
configuration without access to the target host.
It makes a lot of assumptions about Lager's log flush
timing and can be tripped by the peak rate protection
mechanism. This test module has a high rate of false
positives on Concourse.
There is another test that asserts over a "folded" stream, so
code coverage is kept about the same.