rabbitmq-server

Commit Graph

Author	SHA1	Message	Date
Ben Nguyen	c20c06c9bc	AWS peer discovery: ensure consistent hostname path ordering (#14557) * AWS peer discovery: ensure consistent hostname path ordering AWS EC2 API returns networkInterfaceSet and privateIpAddressesSet in arbitrary order, causing non-deterministic hostname resolution during peer discovery. This leads to inconsistent cluster formation. Changes: - Sort network interfaces by deviceIndex (0 first for primary ENI) - Sort private IP addresses by primary flag (primary=true first) - Add debug logging to show hostname path selection and sorting results - Add comprehensive unit tests for sorting behavior The sorting ensures deviceIndex=0 and primary=true IPs are consistently selected first, making peer discovery deterministic across deployments. * AWS peer discovery: ensure consistent hostname path ordering (address feedback on debug logs and sorting helper functions)	2025-09-24 15:05:17 -04:00
Michal Kuratczyk	175ba70e8c	[skip ci] Remove rabbit_log and switch to LOG_ macros	2025-07-18 08:42:59 +02:00
Aitor Perez	07adc3e571	Remove Bazel files	2025-03-13 13:42:34 +00:00
Michael Klishin	968eefa1bb	Bump (c) line year There are no functional changes to this massive diff.	2025-01-01 17:54:10 -05:00
Loïc Hoguin	bbfa066d79	Cleanup .gitignore files for the monorepo We don't need to duplicate so many patterns in so many files since we have a monorepo (and want to keep it). If I managed to miss something or remove something that should stay, please put it back. Note that monorepo-wide patterns should go in the top-level .gitignore file. Other .gitignore files are for application or folder-specific patterns.	2024-06-28 12:00:52 +02:00
Loïc Hoguin	9f15e978b1	make: Remove xrefr It is no longer used by Erlang.mk.	2024-06-25 13:08:08 +02:00
Michael Klishin	9c79ad8d55	More missed license header updates #9969	2024-02-05 12:26:25 -05:00
Michael Klishin	725ddaa43d	AWS peer discovery tests: log at debug level	2024-01-19 12:33:29 -05:00
Michael Klishin	01092ff31f	(c) year bumps	2024-01-01 22:02:20 -05:00
Alex Valiushko	4916a39d74	fix types	2023-12-12 14:50:38 -08:00
Alex Valiushko	fd26ac1713	add hostname path override	2023-12-12 14:50:38 -08:00
Jean-Sébastien Pédron	1e46d334bf	rabbit_peer_discovery: Acquire a lock for the joining and the joined nodes only [Why] A lock is acquired to protect against concurrent cluster joins. Some backends used to use the entire list of discovered nodes and used `global` as the lock implementation. This was a problem because a side effect was that all discovered Erlang nodes were connected to each other. This led to conflicts in the global process name registry and thus processes were killed randomly. This was the case with the feature flags controller for instance. Nodes are running some feature flags operation early in boot before they are ready to cluster or run the peer discovery code. But if another node was executing peer discovery, it could make all nodes connected. Feature flags controller unrelated instances were thus killed because of another node running peer discovery. [How] Acquiring a lock on the joining and the joined nodes only is enough to achieve the goal of protecting against concurrent joins. This is possible because of the new core logic which ensures the same node is used as the "seed node". I.e. all nodes will join the same node. Therefore the API of `rabbit_peer_discovery_backend:lock/1` is changed to take a list of nodes (the two nodes mentioned above) instead of one node (which was the current node, so not that helpful in the first place). These backends also used to check if the current node was part of the discovered nodes. But that's already handled in the generic peer discovery code. CAUTION: This brings a breaking change in the peer discovery backend API. The `Backend:lock/1` callback now takes a list of node names instead of a single node name. This list will contain the current node name.	2023-12-07 15:51:54 +01:00
Rin Kuryloski	51b99544c5	Capture container logs in rabbitmq_peer_discovery_aws suite Using cloudwatch logs	2023-12-04 12:12:56 +01:00
Michael Klishin	1b642353ca	Update (c) according to [1] 1. https://investors.broadcom.com/news-releases/news-release-details/broadcom-and-vmware-intend-close-transaction-november-22-2023	2023-11-21 23:18:22 -05:00
Michael Klishin	55442aa914	Replace @rabbitmq.com addresses with rabbitmq-core@groups.vmware.com Don't ask why we have to do it. Because reasons!	2023-06-20 15:40:13 +04:00
Rin Kuryloski	a944439fba	Replace globs in bazel with explicit lists of files As this is preferred in rules_erlang 3.9.14	2023-04-25 17:29:12 +02:00
Rin Kuryloski	854d01d9a5	Restore the original -include_lib statements from before #6466 since this broke erlang_ls requires rules_erlang 3.9.13	2023-04-20 12:40:45 +02:00
Rin Kuryloski	8de8f59d47	Use gazelle generated bazel files Bazel build files are now maintained primarily with `bazel run gazelle`. This will analyze and merge changes into the build files as necessitated by certain code changes (e.g. the introduction of new modules). In some cases there are hints to gazelle in the build files, such as `# gazelle:erlang...` or `# keep` comments. xref checks on plugins that depend on the cli are a good example.	2023-04-17 18:13:18 +02:00
Rin Kuryloski	8a7eee6a86	Ignore warnings when building plt files for dependencies As we don't generally care if a dependency has warnings, only the target	2023-04-17 10:09:24 +02:00
Alexey Lebedeff	50ed7ad6f7	Fix all dialyzer warnings in AWS-related plugins	2023-01-20 15:20:26 +01:00
Rin Kuryloski	5ef8923462	Avoid the need to pass package name to rabbitmq_integration_suite	2023-01-18 15:25:27 +01:00
Rin Kuryloski	a317b30807	Use improved assert_suites2 macro from rules_erlang 3.9.0	2023-01-18 15:07:06 +01:00
Michael Klishin	ec4f1dba7d	(c) year bump: 2022 => 2023	2023-01-01 23:17:36 -05:00
Luke Bakken	7fe159edef	Yolo-replace format strings Replaces `~s` and `~p` with their unicode-friendly counterparts. ``` git ls-files *.erl | xargs sed -i.ORIG -e s/~s>/~ts/g -e s/~p>/~tp/g ```	2022-10-10 10:32:03 +04:00
Rin Kuryloski	575c5f9975	Remove all of the .travis.yml files since we no longer use them	2022-08-16 09:46:31 +02:00
Philip Kuryloski	15a79466b1	Use the new xref2 macro from rules_erlang That adopts the modern erlang.mk xref behaviour	2022-06-09 23:18:28 +02:00
Philip Kuryloski	327f075d57	Make rabbitmq-server work with rules_erlang 3 Also rework elixir dependency handling, so we no longer rely on mix to fetch the rabbitmq_cli deps Also: - Specify ra version with a commit rather than a branch - Fixup compilation options for erlang 23 - Add missing ra reference in MODULE.bazel - Add missing flag in oci.yaml - Reduce bazel rbe jobs to try to save memory - Use bazel built erlang for erlang git master tests - Use the same cache for all the workflows but windows - Avoid using `mix local.hex --force` in elixir rules - Fetching seems blocked in CI, and this should reduce hex api usage in all builds, which is always nice - Remove xref and dialyze tags since rules_erlang 3 includes them in the defaults	2022-06-08 14:04:53 +02:00
Loïc Hoguin	dc70cbf281	Update Erlang.mk and switch to new xref code	2022-05-31 13:51:12 +02:00
Luke Bakken	dba25f6462	Replace files with symlinks This prevents duplicated and out-of-date instructions.	2022-04-15 06:04:29 -07:00
Michael Klishin	4ced055341	AWS peer discovery integration suite: squash a compiler warning	2022-04-10 09:38:06 +04:00
Philip Kuryloski	23226ad705	Attempt to reduce flakes in aws integration tests	2022-04-01 09:40:14 +02:00
Michael Klishin	c38a3d697d	Bump (c) year	2022-03-21 01:21:56 +04:00
Philip Kuryloski	226e00fcd2	Tighten up dialyzer usage now that rules_erlang no longer cascades up dialyzer warnings from deps	2022-02-24 11:18:41 +01:00
Philip Kuryloski	d8201726ae	Ignore dialyzer warnings for most apps	2022-02-21 09:19:56 +01:00
Philip Kuryloski	efcd881658	Use rules_erlang v2 bazel-erlang has been renamed rules_erlang. v2 is a substantial refactor that brings Windows support. While this alone isn't enough to run all rabbitmq-server suites on windows, one can at least now start the broker (bazel run broker) and run the tests that do not start a background broker process	2022-01-18 13:43:46 +01:00
Michael Klishin	798be7dcaf	Peer discovery AWS, K8S: more Dialyzer fixes	2021-10-07 03:42:44 +03:00
Philip Kuryloski	ab0a9cd700	Merge pull request #3516 from rabbitmq/move-ct-helpers-to-monorepo Move CT helpers to monorepo	2021-09-30 11:29:26 +02:00
Alexey Lebedeff	46df4f1689	Update makefiles/bazel to reflect CT helpers repo merge-in	2021-09-30 10:48:11 +02:00
Vy Hong	7090199330	Reuse list of nodes in peer discovery plugins that use Erlang global locks AWS, Kubernetes and Classic peer discovery plugins use list_nodes and Erlang global:set_lock to create a mutex lock. To unlock, these plugins get the latest list with list_nodes and call global:del_lock. However, if list_nodes within unlock fails, RabbitMQ will throw an uncaught exception and the lock will not be released until the node holding the lock is restarted. This prevents new nodes from joining the cluster. This failure can be avoided by passing the list of nodes from lock to unlock. If a node goes away (and comes back) between the lock and unlock calls, del_lock could still successfully remove the lock. Similarly, if a new node starts up between the lock and unlock calls, del_lock wouldn't need to inform the new node.	2021-09-29 16:22:37 -07:00
Philip Kuryloski	aefb8ad753	bump a test timeout	2021-08-04 09:42:03 +02:00
Philip Kuryloski	a0a5bf3c01	Update peer discovery aws tests for docker image changes	2021-07-28 10:19:11 +02:00
Philip Kuryloski	0593e8307e	Increase timeouts in aws peer discovery integration suite	2021-07-20 11:41:40 +02:00
Philip Kuryloski	d6399bbb5b	Mixed version testing in bazel (#3200) Unlike with gnu make, mixed version testing with bazel uses a package-generic-unix for the secondary umbrella rather than the source. This brings the benefit of being able to mixed version test releases built with older erlang versions (even though all nodes will run under the single version given to bazel) This introduces new test labels, adding a `-mixed` suffix for every existing test. They can be skipped if necessary with `--test_tag_filters` (see the github actions workflow for an example) As part of the change, it is now possible to run an old release of rabbit with rabbitmq_run rule, such as: `bazel run @rabbitmq-server-generic-unix-3.8.17//:rabbitmq-run run-broker`	2021-07-19 14:33:25 +02:00
Philip Kuryloski	8f9de08de7	Also assert no missing suites for all other deps	2021-07-12 18:05:55 +02:00
Philip Kuryloski	8c7e7e0656	Revert "Default all `rabbitmq_integration_suite` to flaky in bazel" This reverts commit `70cb8147b2`.	2021-06-23 20:53:14 +02:00
Philip Kuryloski	70cb8147b2	Default all `rabbitmq_integration_suite` to flaky in bazel Most tests that can start rabbitmq nodes have some chance of flaking. Rather than chase individual flakes for now, this commit changes the default (though it can still be overridden, as is the case for config_scheme_SUITE in many places, since I have yet to see that particular suite flake).	2021-06-21 16:10:38 +02:00
David Ansari	0876746d5f	Remove randomized startup delays On initial cluster formation, only one node in a multi node cluster should initialize the Mnesia database schema (i.e. form the cluster). To ensure that for nodes starting up in parallel, RabbitMQ peer discovery backends have used either locks or randomized startup delays. Locks work great: When a node holds the lock, it either starts a new blank node (if there is no other node in the cluster), or it joins an existing node. This makes it impossible to have two nodes forming the cluster at the same time. Consul and etcd peer discovery backends use locks. The lock is acquired in the consul and etcd infrastructure, respectively. For other peer discovery backends (classic, DNS, AWS), randomized startup delays were used. They work well enough in most cases. However, in https://github.com/rabbitmq/cluster-operator/issues/662 we observed that in 1% - 10% of the cases (the more nodes or the smaller the randomized startup delay range, the higher the chances), two nodes decide to form the cluster. That's bad since it will end up in a single Erlang cluster, but in two RabbitMQ clusters. Even worse, no obvious alert got triggered or error message logged. To solve this issue, one could increase the randomized startup delay range from e.g. 0m - 1m to 0m - 3m. However, this makes initial cluster formation very slow since it will take up to 3 minutes until every node is ready. In rare cases, we still end up with two nodes forming the cluster. Another way to solve the problem is to name a dedicated node to be the seed node (forming the cluster). This was explored in https://github.com/rabbitmq/cluster-operator/pull/689 and works well. Two minor downsides to this approach are: 1. If the seed node never becomes available, the whole cluster won't be formed (which is okay), and 2. it doesn't integrate with existing dynamic peer discovery backends (e.g. K8s, AWS) since nodes are not yet known at deploy time. 
In this commit, we take a better approach: We remove randomized startup delays altogether. We replace them with locks. However, instead of implementing our own lock implementation in an external system (e.g. in K8s), we re-use Erlang's locking mechanism global:set_lock/3. global:set_lock/3 has some convenient properties: 1. It accepts a list of nodes to set the lock on. 2. The nodes in that list connect to each other (i.e. create an Erlang cluster). 3. The method is synchronous with a timeout (number of retries). It blocks until the lock becomes available. 4. If a process that holds a lock dies, or the node goes down, the lock held by the process is deleted. The list of nodes passed to global:set_lock/3 corresponds to the nodes the peer discovery backend discovers (lists). Two special cases worth mentioning: 1. That list can be all desired nodes in the cluster (e.g. in classic peer discovery where nodes are known at deploy time) while only a subset of nodes is available. In that case, global:set_lock/3 still sets the lock, not blocking until all nodes can be connected to. This is good since nodes might start sequentially (non-parallel). 2. In dynamic peer discovery backends (e.g. K8s, AWS), this list can be just a subset of desired nodes since nodes might not start up in parallel. That's also not a problem as long as the following requirement is met: "The peer discovery backend does not list two disjoint sets of nodes (on different nodes) at the same time." For example, in a 2-node cluster, the peer discovery backend must not list only node 1 on node 1 and only node 2 on node 2. Existing peer discovery backends fulfill that requirement because the resource the nodes are discovered from is global. For example, in K8s, once node 1 is part of the Endpoints object, it will be returned on both node 1 and node 2. 
Likewise, in AWS, once node 1 started, the described list of instances with a specific tag will include node 1 when the AWS peer discovery backend runs on node 1 or node 2. Removing randomized startup delays also makes cluster formation considerably faster (up to 1 minute faster if that was the upper bound in the range).	2021-06-03 08:01:28 +02:00
Philip Kuryloski	30f9a95b9f	Add dialyze for remaining tier-1 plugins	2021-06-01 10:19:10 +02:00
Philip Kuryloski	98e71c45d8	Perform xref checks on many tier-1 plugins	2021-05-21 12:03:22 +02:00
Philip Kuryloski	e6df6615e1	Further bazel file refactoring and deduplication	2021-05-11 16:15:33 +02:00
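
Note on c20c06c9bc: the ordering rule described in that commit message (deviceIndex 0 first, then primary=true first) can be illustrated with a small sketch. This is not the plugin's Erlang code. It is a Python illustration over simplified dictionaries whose flat shape is an assumption; only the field names networkInterfaceSet, privateIpAddressesSet, deviceIndex and primary come from the commit message, and the function names and the privateIpAddress key are hypothetical.

```python
def sort_interfaces(interfaces):
    """Order network interfaces so the lowest deviceIndex (0, the primary ENI) is first."""
    return sorted(interfaces, key=lambda eni: eni["deviceIndex"])


def sort_private_ips(addresses):
    """Order private IP entries so those flagged primary=True come first.

    False sorts before True, so the key negates the flag.
    """
    return sorted(addresses, key=lambda addr: not addr["primary"])


def pick_private_ip(interfaces):
    """Return the primary private IP of the primary interface, whatever the input order."""
    first_eni = sort_interfaces(interfaces)[0]
    first_ip = sort_private_ips(first_eni["privateIpAddressesSet"])[0]
    return first_ip["privateIpAddress"]  # hypothetical key name


# Hypothetical, simplified stand-in for a networkInterfaceSet returned out of order.
network_interface_set = [
    {
        "deviceIndex": 1,
        "privateIpAddressesSet": [
            {"privateIpAddress": "10.0.1.5", "primary": True},
        ],
    },
    {
        "deviceIndex": 0,
        "privateIpAddressesSet": [
            {"privateIpAddress": "10.0.0.8", "primary": False},
            {"privateIpAddress": "10.0.0.7", "primary": True},
        ],
    },
]

print(pick_private_ip(network_interface_set))
```

Because both sorts key on a stable attribute rather than on the order in which entries arrive, reversing or shuffling the input selects the same address, which is the determinism the commit message aims for.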


228 Commits