rabbitmq-server

Commit Graph

Author	SHA1	Message	Date
Michael Klishin	09f1ab47b7	By @Ayanda-D: new CLI health check that detects QQs without an elected reachable leader #13433 (#13487 ) * Implement rabbitmq-queues leader_health_check command for quorum queues (cherry picked from commit `c26edbef33`) * Tests for rabbitmq-queues leader_health_check command (cherry picked from commit `6cc03b0009`) * Ensure calling ParentPID in leader health check execution and reuse and extend formatting API, with amqqueue:to_printable/2 (cherry picked from commit `76d66a1fd7`) * Extend core leader health check tests and update badrpc error handling in cli tests (cherry picked from commit `857e2a73ca`) * Refactor leader_health_check command validators and ignore vhost arg (cherry picked from commit `6cf9339e49`) * Update leader_health_check_command description and banner (cherry picked from commit `96b8bced2d`) * Improve output formatting for healthy leaders and support silent mode in rabbitmq-queues leader_health_check command (cherry picked from commit `239a69b404`) * Support global flag to run leader health check for all queues in all vhosts on local node (cherry picked from commit `48ba3e161f`) * Return immediately for leader health checks on empty vhosts (cherry picked from commit `7873737b35`) * Rename leader health check timeout refs (cherry picked from commit `b7dec89b87`) * Update banner message for global leader health check (cherry picked from commit `c7da4d5b24`) * QQ leader-health-check: check_process_limit_safety before spawning leader checks (cherry picked from commit `17368454c5`) * Log leader health check result in broker logs (if any leaderless queues) (cherry picked from commit `1084179a2c`) * Ensure check_passed result for leader health internal calls) (cherry picked from commit `68739a6bd2`) * Extend CLI format output to process check_passed payload (cherry picked from commit `5f5e9922bd`) * Format leader healthcheck result log and function exports (cherry picked from commit `ebffd7d8a4`) * Change leader_health_check 
command scope from queues to diagnostics (cherry picked from commit `663fc9846e`) * Update (c) line year (cherry picked from commit `df82f12a70`) * Rename command to check_for_quorum_queues_without_an_elected_leader and use across_all_vhosts option for global checks (cherry picked from commit `b2acbae28e`) * Use rabbit_db_queue for qq leader health check lookups and introduce rabbit_db_queue:get_all_by_type_and_vhost/2. Update leader health check timeout to 5s and process limit threshold to 20% of node's process_limit. (cherry picked from commit `7a8e166ff6`) * Update tests: quorum_queue_SUITE and rabbit_db_queue_SUITE (cherry picked from commit `9bdb81fd79`) * Fix typo (cli test module) (cherry picked from commit `615856853a`) * Small refactor - simpler final leader health check result return on function head match (cherry picked from commit `ea07938f3d`) * Clear dialyzer warning & fix type spec (cherry picked from commit `a45aa81bd2`) * Ignore result without strict match to avoid dialyzer warning (cherry picked from commit `bb43c0b929`) * 'rabbitmq-diagnostics check_for_quorum_queues_without_an_elected_leader' documentation edits (cherry picked from commit 845230b0b380a5f5bad4e571a759c10f5cc93b91) * 'rabbitmq-diagnostics check_for_quorum_queues_without_an_elected_leader' output copywriting (cherry picked from commit 235f43bad58d3a286faa0377b8778fcbe6f8705d) * diagnostics check_for_quorum_queues_without_an_elected_leader: behave like a health check w.r.t. error reporting (cherry picked from commit db7376797581e4716e659fad85ef484cc6f0ea15) * check_for_quorum_queues_without_an_elected_leader: handle --quiet and --silent plus simplify function heads. References #13433. (cherry picked from commit 7b392315d5e597e5171a0c8196230d92b8ea8e92) --------- Co-authored-by: Ayanda Dube <adube14@bloomberg.net>	2025-03-12 00:32:59 -04:00
Michael Klishin	07ec1b4b50	New health check CLI commands	2025-01-28 16:44:37 -05:00
Michael Klishin	3f5b13d47f	Merge branch 'main' into mk-virtual-host-protection-from-accidental-deletion	2025-01-02 17:01:54 -05:00
Michael Klishin	c95a95f822	CLI: mix format	2025-01-02 17:00:46 -05:00
Michael Klishin	968eefa1bb	Bump (c) line year There are no functional changes to this massive diff.	2025-01-01 17:54:10 -05:00
Michael Klishin	9b6ab77f87	CLI: test/diagnostics cosmetics #12894	2024-12-04 13:44:29 -05:00
Michael Davis	c328922a2a	CLI: Fix match of discover peers command test This previously emitted a warning because Elixir will rebind `this_node` by default, so the `this_node` binding in the line above was unused. (As opposed to Erlang which would treat this as a match - rejecting the binding if `this_node` was not equal to the value being matched.) The node needed to be adjusted as well - `node()` returned the ExUnit runner's node while the command returned the remote node, which is stored in the context under `opts.node`.	2024-12-04 11:07:46 -05:00
Michael Davis	d58d874a0b	CLI: Resolve elixirc warnings	2024-12-04 11:07:34 -05:00
Jean-Sébastien Pédron	112ff3f3f5	rabbitmq_cli: Prepare tests to run against a node with Khepri enabled by default	2024-12-02 13:55:41 +01:00
Michael Klishin	f414c2d512	More missed license header updates #9969	2024-02-05 11:53:50 -05:00
Jean-Sébastien Pédron	84cede17e1	rabbit_peer_discovery: Rewrite core logic [Why] This work started as an effort to add peer discovery support to our Khepri integration. Indeed, as part of the task to integrate Khepri, we missed the fact that `rabbit_peer_discovery:maybe_create_cluster/1` was called from the Mnesia-specific code only. Even though we knew about it because we hit many issues caused by the fact the `join_cluster` and peer discovery use different code path to create a cluster. To add support for Khepri, the first version of this patch was to move the call to `rabbit_peer_discovery:maybe_create_cluster/1` from `rabbit_db_cluster` instead of `rabbit_mnesia`. To achieve that, it made sense to unify the code and simply call `rabbit_db_cluster:join/2` instead of duplicating the work. Unfortunately, doing so highlighted another issue: the way the node to cluster with was selected. Indeed, it could cause situations where multiple clusters are created instead of one, without resorting to out-of-band counter-measures, like a 30-second delay added in the Kubernetes operator (rabbitmq/cluster-operator#1156). This problem was even more frequent when we tried to unify the code path and call `join_cluster`. After several iterations on the patch and even more discussions with the team, we decided to rewrite the algorithm to make node selection more robust and still use `rabbit_db_cluster:join/2` to create the cluster. [How] This commit is only about the rewrite of the algorithm. Calling peer discovery from `rabbit_db_cluster` instead of `rabbit_mnesia` (and thus making peer discovery work with Khepri) will be done in a follow-up commit. We wanted the new algorithm to fulfill the following properties: 1. `rabbit_peer_discovery` should provide the ability to re-trigger it easily to re-evaluate the cluster. The new public API is `rabbit_peer_discovery:sync_desired_cluster/0`. 2. 
The selection of the node to join should be designed in a way that all nodes select the same node, regardless of the order in which they become available. The adopted solution is to sort the list of discovered nodes with the following criteria (in that order): 1. the size of the cluster a discovered node is part of; sorted from bigger to smaller clusters 2. the start time of a discovered node; sorted from older to younger nodes 3. the name of a discovered node; sorted alphabetically The first node in that list will not join anyone and simply proceed with its boot process. Other nodes will try to join the first node. 3. To reduce the chance of incorrectly having multiple standalone nodes because the discovery backend returned only a single node, we want to apply the following constraints to the list of nodes after it is filtered and sorted (see property 2 above): * The list must contain `node()` (i.e. the node running peer discovery itself). * If RabbitMQ's cluster size hint is greater than 1, the list must have at least two nodes. The cluster size hint is the maximum between the configured target cluster size hint and the number of elements in the nodes list returned by the backend. If one of the constraints is not met, the entire peer discovery process is restarted after a delay. 4. The lock is acquired only to protect the actual join, not the discovery step where the backend is queried to get the list of peers. With the node selection described above, this will let the first node start without acquiring the lock. 5. The cluster membership views queried as part of the algorithm to sort the list of nodes will be used to detect additional clusters or standalone nodes that did not cluster correctly. These nodes will be asked to re-evaluate peer discovery to increase the chance of forming a single cluster. 6. After some delay, peer discovery will be re-evaluated to further eliminate the chances of having multiple clusters instead of one. 
This commit covers properties from point 1 to point 4. Remaining properties will be the scope of additional pull requests after this one works. If there is a failure at any point during discovery, filtering/sorting, locking or joining, the entire process is restarted after a delay. This is configured using the following parameters: * cluster_formation.discovery_retry_limit * cluster_formation.discovery_retry_interval The default parameters were bumped to 30 retries with a delay of 1 second between each. The locking retries/interval parameters are not used by the new algorithm anymore. There are extra minor changes that come with the rewrite: * The configured backend is cached in a persistent term. The goal is to make sure we use the same backend throughout the entire process and when we call `maybe_unregister/0` even if the configuration changed for whatever reason in between. * `maybe_register/0` is called from `rabbit_db_cluster` instead of at the end of a successful peer discovery process. `rabbit_db_cluster` had to call `maybe_register/0` if the node was not virgin anyway. So make it simpler and always call it in `rabbit_db_cluster` regardless of the state of the node. * `log_configured_backend/0` is gone. `maybe_init/0` can log the backend directly. There is no need to explicitly call another function for that. * Messages are logged using `?LOG_*()` macros instead of the old `rabbit_log` module.	2023-12-07 15:51:54 +01:00
Michael Klishin	1b642353ca	Update (c) according to [1] 1. https://investors.broadcom.com/news-releases/news-release-details/broadcom-and-vmware-intend-close-transaction-november-22-2023	2023-11-21 23:18:22 -05:00
Michael Klishin	8a76e903a3	One more test renaming to follow CLI conventions	2023-11-13 20:46:31 -05:00
Michael Klishin	2ebc23ef23	Use a standard CLI test suite file naming convention	2023-11-13 19:51:58 -05:00
Michael Klishin	c4db560e0e	CLI: mix format	2023-11-13 11:21:29 -05:00
Michal Kuratczyk	408c33ec49	Add list_policies_that_match command	2023-11-13 13:47:54 +01:00
Michael Klishin	114f9b90c9	CLI: refactor 'diagnostics check_if_any_deprecated_features_are_used'	2023-11-06 22:50:35 -05:00
Michael Klishin	cbe2756cbd	CLI: tests and refactoring for 'diagnostics check_if_cluster_has_classic_queue_mirroring_policy'	2023-11-06 07:20:08 -05:00
Rin Kuryloski	42d29a5ca3	Run 'mix format' with elixir 1.15.2	2023-07-04 17:45:32 +02:00
Michael Klishin	98c85f367f	Bump (c) year	2023-07-04 00:21:40 +04:00
Michal Kuratczyk	699af2c8c3	Don't rely on implicit order in a test	2023-04-13 14:37:18 +02:00
Michael Klishin	c3c4665970	Update tests	2023-01-16 09:24:37 -08:00
Jean-Sébastien Pédron	4b132daaba	Remove upgrade-specific log file This category should be unused with the decommissioning of the old upgrade subsystem (in favor of the feature flags subsystem). It means: 1. The upgrade log file will not be created by default anymore. 2. The `$RABBITMQ_UPGRADE_LOG` environment variable is now unsupported. The configuration variables remain to avoid breaking an existing and working configuration.	2022-10-06 21:28:50 +02:00
Ayanda Dube	4cbbaad2df	mix format rabbitmq_cli	2022-10-02 18:54:11 +01:00
Jean-Sébastien Pédron	cdcf602749	Switch from Lager to the new Erlang Logger API for logging The configuration remains the same for the end-user. The only exception is the log root directory: it is now set through the `log_root` application env. variable in `rabbit`. People using the Cuttlefish-based configuration file are not affected by this exception. The main change is how the logging facility is configured. It now happens in `rabbit_prelaunch_logging`. The `rabbit_lager` module is removed. The supported outputs remain the same: the console, text files, the `amq.rabbitmq.log` exchange and syslog. The message text format slightly changed: the timestamp is more precise (now to the microsecond) and the level can be abbreviated to always be 4-character long to align all messages and improve readability. Here is an example: 2021-03-03 10:22:30.377392+01:00 [dbug] <0.229.0> == Prelaunch DONE == 2021-03-03 10:22:30.377860+01:00 [info] <0.229.0> 2021-03-03 10:22:30.377860+01:00 [info] <0.229.0> Starting RabbitMQ 3.8.10+115.g071f3fb on Erlang 23.2.5 2021-03-03 10:22:30.377860+01:00 [info] <0.229.0> Licensed under the MPL 2.0. Website: https://rabbitmq.com The example above also shows that multiline messages are supported and each line is prepended with the same prefix (the timestamp, the level and the Erlang process PID). JSON is also supported as a message format and now for any outputs. Indeed, it is possible to use it with e.g. syslog or the exchange. 
Here is an example of a JSON-formatted message sent to syslog: Mar 3 11:23:06 localhost rabbitmq-server[27908] <0.229.0> - {"time":"2021-03-03T11:23:06.998466+01:00","level":"notice","msg":"Logging: configured log handlers are now ACTIVE","meta":{"domain":"rabbitmq.prelaunch","file":"src/rabbit_prelaunch_logging.erl","gl":"<0.228.0>","line":311,"mfa":["rabbit_prelaunch_logging","configure_logger",1],"pid":"<0.229.0>"}} For quick testing, the values accepted by the `$RABBITMQ_LOGS` environment variable were extended: * `-` still means stdout * `-stderr` means stderr * `syslog:` means syslog on localhost * `exchange:` means logging to `amq.rabbitmq.log` `$RABBITMQ_LOG` was also extended. It now accepts a `+json` modifier (in addition to the existing `+color` one). With that modifier, messages are formatted as JSON instead of plain text. The `rabbitmqctl rotate_logs` command is deprecated. The reason is Logger does not expose a function to force log rotation. However, it will detect when a file was rotated by an external tool. From a developer point of view, the old `rabbit_log` API remains supported, though it is now deprecated. It is implemented as regular modules: there is no `parse_transform` involved anymore. In the code, it is recommended to use the new Logger macros. For instance, `?LOG_INFO(Format, Args)`. If possible, messages should be augmented with some metadata. For instance (note the map after the message): ?LOG_NOTICE("Logging: switching to configured handler(s); following " "messages may not be visible in this log output", #{domain => ?RMQLOG_DOMAIN_PRELAUNCH}), Domains in Erlang Logger parlance are the way to categorize messages. Some predefined domains, matching previous categories, are currently defined in `rabbit_common/include/logging.hrl` or headers in the relevant plugins for plugin-specific categories. At this point, very few messages have been converted from the old `rabbit_log` API to the new macros. 
It can be done gradually when working on a particular module or logging. The Erlang builtin console/file handler, `logger_std_h`, has been forked because it lacks date-based file rotation. The configuration of date-based rotation is identical to Lager. Once the dust has settled for this feature, the goal is to submit it upstream for inclusion in Erlang. The forked module is called `rabbit_logger_std_h` and is based on `logger_std_h` in Erlang 23.0.	2021-03-11 15:17:36 +01:00
Loïc Hoguin	5c829ff599	Add rabbitmq-diagnostics remote_shell	2021-03-03 11:28:54 +01:00
Michael Klishin	c43db9d4d9	Auth attempt command naming, add JSON --formatter support	2020-10-14 23:32:16 +03:00
dcorbacho	679ca254f3	Switch to Mozilla Public License 2.0 (MPL 2.0)	2020-07-11 19:23:07 +01:00
Michael Klishin	db299967e0	Introduce 'rabbitmq-diagnostics erlang_cookie_sources' to help troubleshoot authentication issues. Inspired by an idea from @gerhard.	2020-07-05 03:19:50 +07:00
Michael Klishin	f11384fe86	Introduce 'rabbitmq-diagnostics resolver_info' To inspect effective inetrc [1] settings used by a node or CLI tools. 1. https://erlang.org/doc/apps/erts/inet_cfg.html	2020-06-21 15:09:21 +03:00
Michael Klishin	b17fda724b	Introduce 'rabbitmq-diagnostics resolve_hostname' Helps with troubleshooting hostname resolution behavior on nodes and locally for CLI tools. This is obviously not meant to be a replacement for existing tools such as dig, only a way to quickly spot obvious irregularities, e.g. those in environments that use custom Erlang inetrc files. Per discussion @harshac.	2020-06-20 16:55:21 +03:00
Michael Klishin	3003b9e615	Introduce 'rabbitmq-diagnostics list_network_interfaces' To make it easier to discover them without using eval and obscure functions. Part of rabbitmq/rabbitmq-cli#424	2020-06-05 17:16:10 +03:00
Michael Klishin	0ff2e3fb77	Explain	2020-05-16 19:12:31 +03:00
Michael Klishin	a2ec22023f	Handle variable case in this test	2020-05-16 00:39:07 +03:00
Michael Klishin	0537a9ca36	Don't depend on a single env variable in this test	2020-05-16 00:36:33 +03:00
Michael Klishin	947940ccd5	Introduce 'rabbitmq-diagnostics os_env' It prints RabbitMQ-specific environment variables that are set on the target node. Can be used to inspect env variable-based configuration without access to the target host.	2020-05-06 23:19:04 +03:00
Michael Klishin	a3d60d35c5	Be less generous in empty whitespace use in setup_all functions	2020-03-24 19:08:00 +03:00
Jean-Sébastien Pédron	0e15591bf5	Update copyright (year 2020)	2020-03-10 15:39:56 +01:00
Michael Klishin	5b7063d07d	More sensible JSON formatting for some commands Fall back to a JSON document if command returns a bitstring (does not do any preformatting for JSON). Per discussion with @lukebakken Closes #394.	2020-01-18 02:06:04 +03:00
Michael Klishin	73776fbf04	(c) bump	2019-12-29 05:50:26 +03:00
Michael Klishin	01e950fd18	Make tests that mess with node or quorum state sequential As most tests already are. It's highly unlikely that these were meant to execute in parallel by design.	2019-12-11 14:51:10 +01:00
Michael Klishin	f3a06eda0d	Squash a warning	2019-09-30 23:19:56 +03:00
Jean-Sébastien Pédron	0cb95e7b55	log_tail_stream_command_test: Bump stream duration to 15 seconds ... from 5 seconds. Hopefully this will increase the chance of seeing the messages logged by the testcase.	2019-09-24 11:48:47 +02:00
Michael Klishin	5a137480b3	Merge pull request #378 from rabbitmq/consume-events-command Consume event command	2019-09-24 01:55:57 +03:00
Michael Klishin	99f1790ac3	Update test expectations	2019-09-24 01:52:31 +03:00
Michael Klishin	4c33ce0961	Move command_line_arguments to rabbitmq-diagnostics	2019-09-24 00:54:11 +03:00
dcorbacho	15d7eb2858	Diagnostics: test consume_event_stream_command [#168224266]	2019-09-23 17:19:07 +01:00
Michael Klishin	5b1086156e	diagnostics log_tail_stream: remove a fragile test It makes a lot of assumptions about Lager's log flush timing and can be tripped by the peak rate protection mechanism. This test module has a high rate of false positives on Concourse. There is another test that asserts over a "folded" stream, so code coverage is kept about the same.	2019-08-11 13:20:19 +10:00
Michael Klishin	462b480f16	Same as d3c01b3a1f1a65d1d935c3e6e0441388da44ba57 in more places (cherry picked from commit 68c8d204c08eb9956925e0fb71608a0737f3e771)	2019-07-06 20:31:43 +03:00
Michael Klishin	535f00e08f	Let Lager's log message rate lapse before logging in these tests Otherwise some log messages we assert on might be dropped. (cherry picked from commit d3c01b3a1f1a65d1d935c3e6e0441388da44ba57)	2019-07-06 18:47:11 +03:00
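
The node selection rule described in commit 84cede17e1 (sort discovered nodes by cluster size, then start time, then name, and apply the two list constraints) can be sketched as follows. This is only an illustration in Python, not the Erlang code in `rabbit_peer_discovery`. The `DiscoveredNode` type, the function names, and the integer `start_time` representation are all assumptions made for the example.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass(frozen=True)
class DiscoveredNode:
    # Hypothetical record; the real implementation is Erlang and differs.
    name: str
    cluster_size: int  # members of the cluster this node is part of
    start_time: int    # assumed: smaller value means an older node


def sort_candidates(nodes: List[DiscoveredNode]) -> List[DiscoveredNode]:
    # Bigger clusters first, then older nodes, then alphabetical by name.
    return sorted(nodes, key=lambda n: (-n.cluster_size, n.start_time, n.name))


def select_node_to_join(
    this_node: str,
    nodes: List[DiscoveredNode],
    target_cluster_size_hint: int = 1,
) -> Optional[str]:
    """Return the name of the first node in the sorted list.

    If the result equals this_node, this node boots without joining anyone.
    None means a constraint failed and discovery should be retried later.
    """
    names = [n.name for n in sort_candidates(nodes)]
    # Constraint 1: the list must contain the node running discovery.
    if this_node not in names:
        return None
    # Constraint 2: with a size hint above 1, at least two nodes are needed.
    # Simplification: the backend's list and the filtered list are treated
    # as the same list here.
    size_hint = max(target_cluster_size_hint, len(names))
    if size_hint > 1 and len(names) < 2:
        return None
    return names[0]
```

Because every node sorts the same list with the same key, they all pick the same first entry regardless of the order in which they start, which is the property the commit message asks for.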


110 Commits