A boolean status in the stream SAC coordinator is not enough to follow
the evolution of a consumer. For example, a formerly active consumer that
is stepping down can go down before another consumer in the group is
activated. The coordinator then expects an activation request that
will never arrive, leaving the group without any active consumer.
This commit introduces 3 statuses: active (formerly "true"), waiting
(formerly "false"), and deactivating. The coordinator now knows when
a deactivating consumer goes down and triggers a rebalancing to
avoid a stuck group.
This commit also introduces a status for the connectivity state
of a consumer. The possible values are: connected, disconnected, and
presumed_down. Consumers are connected by default; they become
disconnected if the coordinator receives a down event with a
noconnection reason, meaning the consumer's node has been
disconnected from the other nodes. Consumers become connected again when
their node rejoins the other nodes.
Disconnected consumers are still considered part of a group, as they are
expected to come back at some point. For example, there is no rebalancing
in a group if the active consumer gets disconnected.
The coordinator sets a timer when a disconnection occurs. When the timer
expires, the corresponding disconnected consumers move to the "presumed
down" state. At this point they are no longer considered part of their
respective group and are excluded from rebalancing decisions. They are
expected to be removed from the group by the appropriate down event of a
monitor.
So the consumer status is now a tuple, e.g. {connected, active}. Note
this is an implementation detail: only the stream SAC coordinator deals with
the status of stream SAC consumers.
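For illustration, a minimal Erlang sketch of the combined status under the
names used above (the actual type and module names in the coordinator may
differ):
```
%% Hedged sketch; real type names in the stream SAC coordinator may differ.
-type connectivity() :: connected | disconnected | presumed_down.
-type activity() :: active | waiting | deactivating.
-type consumer_status() :: {connectivity(), activity()}.
```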
2 new configuration entries are introduced (see the example after the list):
* rabbit.stream_sac_disconnected_timeout: this is the duration in ms of the
disconnected-to-forgotten timer.
* rabbit.stream_cmd_timeout: this is the timeout in ms to apply Ra commands
in the coordinator. It used to be a fixed value of 30 seconds, and the
default value is still the same. The setting has been introduced to
make integration tests faster.
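Assuming these map to the `rabbit` application environment, as their prefix
suggests, they could be set in `advanced.config` like this (the disconnected
timeout value below is purely illustrative):
```
[
 {rabbit, [
   %% duration in ms of the disconnected-to-forgotten timer
   {stream_sac_disconnected_timeout, 60000},
   %% timeout in ms for applying Ra commands; default is still 30 seconds
   {stream_cmd_timeout, 30000}
 ]}
].
```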
Fixes #14070
## What?
PR #13971 added a property test that applies the same quorum queue Raft
command on different quorum queue members on different Erlang nodes
ensuring that the state machine ends up in exactly the same state.
However, the different Erlang nodes all run the **same** Erlang/OTP version.
This commit adds another property test where the different Erlang nodes
run **different** Erlang/OTP versions.
## Why?
This test allows spotting any non-determinism that could occur when
running quorum queue members in a mixed-version cluster, where, in our
context, mixed version means different Erlang/OTP versions.
## How?
CI currently runs tests with Erlang 27.
This commit starts an Erlang 26 node in docker, specifically for the
`rabbit_fifo_prop_SUITE`.
The test case `two_nodes_different_otp_version`, running on Erlang 27,
transfers a few Erlang modules (e.g. module `rabbit_fifo`) to the Erlang 26 node.
The test case then runs the Ra commands on its own node in Erlang 27 and
on the Erlang 26 node in Docker.
By default, this test case is skipped locally.
However, to run this test case locally, simply start an Erlang node as
follows:
```
erl -sname rabbit_fifo_prop@localhost
```
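The transfer itself can be done with standard OTP calls; here is a hedged
sketch (the suite's actual helper may differ):
```
%% Send the locally loaded module to the remote node and load it there.
transfer_module(Node, Module) ->
    {Module, Binary, Filename} = code:get_object_code(Module),
    {module, Module} = erpc:call(Node, code, load_binary,
                                 [Module, Filename, Binary]).
```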
Helps clean up and color stdout for parallel targets.
TODO: there are obvious races between different nodes' outputs.
In the next iteration I hope to implement cursor tracking for each node.
This avoids using Mix while compiling, which simplifies
a number of things and lets us do further build improvements
later on.
Elixir is only enabled from within rabbitmq_cli currently.
Eunit is disabled since there are only Elixir tests.
Dialyzer will force-enable Elixir in order to process
Elixir-compiled beam files.
This commit also includes a few changes that are
related:
* The Erlang distribution will now be started for parallel-ct
* Many unnecessary PROJECT_MOD lines have been removed
* `eunit_formatters` has been removed, it provides little value
* The new `maybe_flock` Erlang.mk function is used where possible
* Build test deps when testing rabbitmq_cli (Mix won't do it anymore)
* rabbitmq_ct_helpers now uses the early plugins to have Dialyzer
properly set up
Skip clustering tests when multiple Khepri Ra machine versions
are being used at the same time.
[Why]
Depending on which node clusters with which, a node running an older
version of the Khepri Ra machine may not be able to apply Ra commands
and could be stuck.
There is no real solution and this is clearly an unsupported scenario. An
old node won't always be able to join a newer cluster.
[How]
In the testsuites, we skip clustering tests if we detect that multiple
Khepri Ra machine versions are being used.
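A minimal sketch of that skip logic, assuming a hypothetical
`khepri_machine_version/1` helper that returns the effective machine
version on a node:
```
maybe_skip_mixed_khepri_versions(Config, Nodes) ->
    %% khepri_machine_version/1 is hypothetical; it stands for whatever
    %% call the testsuites use to read the machine version on a node.
    Versions = [khepri_machine_version(Node) || Node <- Nodes],
    case lists:usort(Versions) of
        [_Single] -> Config;
        _Mixed    -> {skip, "Multiple Khepri Ra machine versions in use"}
    end.
```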
[Why]
The `force_reset` command simply removes local files on disk for the
local node.
In the case of Ra, this can't work because the rest of the cluster does
not know about the forced-reset node. Therefore the leader will continue
to send `append_entry` commands to the reset node.
If that forced-reset node restarts and receives these messages, it will
either join the cluster again (because it's on an older Raft term) or it
will hit an assertion and exit (because it's on the same Raft term).
[How]
Given we can't really support this scenario and it has little value, the
command will now return an error if someone attempts a `force_reset` with
a node running Khepri.
This also deprecates the command: once Mnesia support is removed, the
command will be removed at the same time. This is noted in the
rabbitmqctl.8 manpage.
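A hedged sketch of the new guard, assuming a predicate like
`rabbit_khepri:is_enabled/0` (the exact check the command performs may
differ):
```
force_reset() ->
    case rabbit_khepri:is_enabled() of
        true  -> {error, not_supported_with_khepri};
        false -> do_force_reset() %% the pre-existing Mnesia reset path
    end.
```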
[Why]
When running mixed-version tests, nodes 1/3/5/... are using the primary
umbrella, so usually the newest version. Nodes 2/4/6/... are using the
secondary umbrella, thus the old version.
When clustering, we used to use node 1 (running a new version) as the
seed node, meaning other nodes would join it.
This complicates things with feature flags because we have to make sure
that we start node 1 with new stable feature flags disabled to allow old
nodes to join.
This is also a problem with Khepri machine versions because the cluster
would start with the latest version, which old nodes might not have.
[How]
This patch changes the logic to use a node running the secondary
umbrella as the seed node instead. If there is no node running it, we
pick the first node as before.
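A minimal sketch of the new selection logic, assuming a hypothetical
`uses_secondary_umbrella/1` predicate:
```
pick_seed_node(Nodes) ->
    case [N || N <- Nodes, uses_secondary_umbrella(N)] of
        [Seed | _] -> Seed;     %% prefer a node running the old version
        []         -> hd(Nodes) %% fall back to the first node, as before
    end.
```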
V2: Revert part of "rabbitmq_ct_helpers: Fix how we set
`$RABBITMQ_FEATURE_FLAGS` in tests" (commit
57ed962ef6). These changes are no
longer needed with the new logic.
V3: The check that verifies that the correct metadata store is used has
a special case for nodes that use the secondary umbrella: if Khepri
is supposed to be used but it's not, the feature flag is enabled.
The reason is that the `v4.0.x` branch doesn't know about the `rel`
configuration of `forced_feature_flags_on_init`. The nodes will
have ignored this parameter and booted with the stable feature
flags only.
Many testsuites are adapted to the new clustering order. If they
manage which node joins which node, either the order is changed in
the testcases, or nodes are started with only required feature
flags. For testsuites that rely on peer discovery where the order is
unknown, nodes are started with only required feature flags.
Exits with reason "killed" only occur "naturally" in OTP
when a supervisor tries to shut a child down and the shutdown times out.
However, this exit reason is used quite frequently for failure simulation in tests.
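For reference, `exit/2` with reason `kill` is untrappable and is observed
by monitors as `killed`, which is how tests typically simulate it:
```
simulate_kill(Pid) ->
    Ref = erlang:monitor(process, Pid),
    exit(Pid, kill),
    receive
        {'DOWN', Ref, process, Pid, killed} -> ok
    end.
```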
[Why]
In order to make `khepri_db` the default in the future, the handling of
`$RABBITMQ_FEATURE_FLAGS` had to be adapted to be able to *disable*
Khepri instead.
Unfortunately I broke the behavior with stable feature flags that are
only available in the primary umbrella. In this case, they were
automatically enabled and thus, clustering with an old umbrella that did
not have these feature flags failed with `incompatible_feature_flags`.
[How]
The solution is to always use an absolute list of feature flags, not the
new relative list.
V2: Allow a testsuite to skip the configuration of the metadata store.
This is needed for the feature_flags_SUITE testsuite because it
tests the default behavior and the configuration of the metadata
store changes that behavior.
While here, fix a ct log message where variables were swapped
compared to the format string expectation.
V3: Enable `rabbitmq_4.0.0` feature flag in rabbit_mgmt_http_SUITE. This
testsuite apparently requires it and if it's not enabled, it fails.
Accidental "fat finger" virtual deletion accidents
would be easier to avoid if there was a protection mechanism
that would apply equally even to CLI tools and external
applications that do not use confirmations for deletion
operations.
This introduces the following changes:
* Virtual host metadata now supports a new key,
'protected_from_deletion', which, when set,
will be considered by key virtual host deletion function(s)
* DELETE /api/vhosts/{name} was adapted to respond to
such blocked deletion attempts with
a 412 Precondition Failed status
* 'rabbitmqctl list_vhosts' and 'rabbitmqctl delete_vhost'
were adapted accordingly
* DELETE /api/vhosts/{name}/deletion/protection
is a new endpoint that can be used to remove
the protective seal (the metadata key)
* POST /api/vhosts/{name}/deletion/protection
marks the virtual host as protected
In the case of the HTTP API, all operations on
virtual host metadata require administrative
privileges from the target user.
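A hedged sketch of the deletion guard, assuming the metadata key from this
change and a `vhost:get_metadata/1`-style accessor (actual names and key
types may differ):
```
is_protected_from_deletion(VHost) ->
    Metadata = vhost:get_metadata(VHost),
    maps:get(protected_from_deletion, Metadata, false) =:= true.
```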
Other considerations:
* When a virtual host does not exist, the behavior
remains the same: the original, protection-unaware
code path is used to preserve backwards compatibility
References #12772.
[Why]
Once `khepri_db` is enabled by default, we need another way to disable it
to select Mnesia instead.
[How]
We use the new relative forced feature flags mechanism to indicate if we
want to explicitly enable or disable `khepri_db`. This way, we don't
touch other stable feature flags and only mess with Khepri.
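For illustration, a hedged sketch contrasting the two shapes as application
environment terms; the `{rel, ToEnable, ToDisable}` form is assumed from
the `rel` term mentioned in these commits:
```
%% Absolute form: exactly this set of feature flags is forced on boot.
{forced_feature_flags_on_init, [quorum_queue, stream_queue]}.

%% Assumed relative form: leave other flags alone and only force
%% khepri_db on; listing it in the second list would force it off.
{forced_feature_flags_on_init, {rel, [khepri_db], []}}.
```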
However, this mechanism is not supported by RabbitMQ 4.0.x and older.
They will ignore the setting. Therefore, to make this work in
mixed-version testing, we set the `$RABBITMQ_FEATURE_FLAGS` variable for
the secondary umbrella. This part will go away once we test against
RabbitMQ 4.1.x as the secondary umbrella in the future.
Finally, we compare the effective metadata store to the expected one.
If they don't match, we skip the test.
While here, change `rjms_topic_selector_SUITE` to only choose Khepri
without specifying any feature flags.
Parallel/sharding groups often fail to create certificates in CI.
Most likely this is related to the fact that they use the same directory
for certificates. This commit uses the shard/node name and a unique id
for each SSL certificate.
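A minimal sketch of deriving such a unique path, assuming
`erlang:unique_integer/1` as the id source:
```
cert_dir(BaseDir) ->
    UniqueId = erlang:unique_integer([positive]),
    Name = io_lib:format("~s-~b", [node(), UniqueId]),
    filename:join(BaseDir, lists:flatten(Name)).
```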
The problem comes from `ct_master` which doesn't tell us
in the return value whether the tests succeeded. In order
to get that information a CT hook was created. But then
we run into another problem: despite its documentation
claiming otherwise, `ct_master` does not handle `ct_hooks`
instructions in the test spec.
So for the time being we fork `ct_master` into a new
`ct_master_fork` module and insert our hook directly
in the code. Later on we will submit patches to OTP.
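A minimal sketch of such a result-collecting hook, under a hypothetical
module name; the fork wires the equivalent directly into `ct_master_fork`:
```
-module(ct_results_hook).
-export([init/2, on_tc_fail/4, terminate/1]).

%% Count failed test cases in the hook state.
init(_Id, _Opts) ->
    {ok, #{failed => 0}}.

on_tc_fail(_Suite, _TestName, _Reason, State) ->
    maps:update_with(failed, fun(N) -> N + 1 end, State).

terminate(#{failed := Failed}) ->
    %% Hypothetical hand-off: expose the count to the calling process.
    persistent_term:put(ct_results_hook_failed, Failed),
    ok.
```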
Reverting back to the default of 1 minute. The problem with
3 minutes is that it is exceedingly long, and when there
are problems the overall test time increases dramatically.