The following scenario led to a channel crash:
1. Publish to a non-existing stream: `perf-test -y 0 -p -e amq.default -t direct -k stream`
2. Declare the stream: `rabbitmqadmin declare queue name=stream queue_type=stream`
There is no writer pid yet, so we get a `function_clause` error with `none`:
```
{function_clause,
[{osiris_writer,write,
[none,<0.877.0>,<<"<0.877.0>_-65ZKFz18ll5lau0phi7CsQ">>,1,
[[0,"Sp",[192,6,5,"B@@AC"]],
[0,"Sr",
[193,38,4,
[[[163,10,<<"x-exchange">>],[161,0,<<>>]],
[[163,13,<<"x-routing-key">>],[161,6,<<"stream">>]]]]],
[0,"Su",[160,12,[<<0,19,252,1,0,0,98,171,20,16,108,167>>]]]]],
[{file,"src/osiris_writer.erl"},{line,158}]},
{rabbit_stream_queue,deliver0,4,
[{file,"rabbit_stream_queue.erl"},{line,540}]},
{rabbit_stream_queue,'-deliver/3-fun-0-',4,
[{file,"rabbit_stream_queue.erl"},{line,526}]},
{lists,foldl,3,[{file,"lists.erl"},{line,2146}]},
{rabbit_queue_type,'-deliver0/4-fun-5-',5,
[{file,"rabbit_queue_type.erl"},{line,707}]},
{maps,fold_1,4,[{file,"maps.erl"},{line,860}]},
{rabbit_queue_type,deliver0,4,
[{file,"rabbit_queue_type.erl"},{line,704}]},
{rabbit_queue_type,deliver,4,
[{file,"rabbit_queue_type.erl"},{line,662}]}]}
```
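A minimal sketch of the kind of guard that avoids this crash (not the actual
fix; argument names are illustrative, only `osiris_writer:write/5` appears in
the trace above):

```
-module(stream_deliver_sketch).
-export([deliver/5]).

%% The stream has no leader/writer pid yet (e.g. the publish raced the
%% stream declaration): report an error instead of crashing the channel
%% with a function_clause.
deliver(none, _Sender, _WriterId, _Seq, _Data) ->
    {error, stream_not_available};
deliver(LeaderPid, Sender, WriterId, Seq, Data) when is_pid(LeaderPid) ->
    osiris_writer:write(LeaderPid, Sender, WriterId, Seq, Data).
```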
Co-authored-by: Karl Nilsson <kjnilsson@gmail.com>
[Why]
Up to RabbitMQ 3.13.x, there was a case where if:
1. you enabled a plugin
2. you enabled its feature flags
3. you disabled the plugin
4. you restarted a node (or upgraded it)
... the node could crash on startup because it had a feature flag marked
as enabled that it didn't know about:
```
error:{badmatch,#{feature_flags => ...
rabbit_ff_controller:-check_one_way_compatibility/2-fun-0-/3, line 514
lists:all_1/2, line 1520
rabbit_ff_controller:are_compatible/2, line 496
rabbit_ff_controller:check_node_compatibility_task1/4, line 437
rabbit_db_cluster:check_compatibility/1, line 376
```
This was "fixed" by the new way of keeping the registry in memory
(#10988) because it introduces a slight change of behavior. Indeed, the
old way walked through the `FeatureFlags` map and looked up the state in
the `FeatureStates` map to create the `is_enabled/1` function. The new
way just looks up the state in `FeatureStates`.
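A rough sketch of the behavioral difference (illustrative only; the actual
registry is generated code, not plain functions like these):

```
-module(ff_registry_sketch).
-export([is_enabled_old/3, is_enabled_new/2]).

%% 3.13.x-style: is_enabled/1 was built by walking the FeatureFlags
%% (definitions) map; a flag recorded as enabled but absent from that map
%% made the node crash on startup.
is_enabled_old(FeatureName, FeatureFlags, FeatureStates) ->
    case is_map_key(FeatureName, FeatureFlags) of
        true  -> maps:get(FeatureName, FeatureStates, false) =:= true;
        false -> erlang:error({unknown_feature_flag, FeatureName})
    end.

%% New behavior: only the states map is consulted.
is_enabled_new(FeatureName, FeatureStates) ->
    maps:get(FeatureName, FeatureStates, false) =:= true.
```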
[How]
The new testcase succeeds on 4.0.x and `main`, but would fail on 3.13.x
with the aforementioned crash.
[Why]
The feature flag controller that is responsible for enabling a feature
flag may be on a node that doesn't know this feature flag. This is
supported, but there is a bug when it queries the callback definition for
that feature flag: it uses its own registry, which does not know anything
about this feature flag.
This leads to a crash because the `run_callback/5` function tries to use
the `undefined` atom returned by the registry as a map:
```
crasher:
initial call: rabbit_ff_controller:init/1
pid: <0.374.0>
registered_name: rabbit_ff_controller
exception error: bad map: undefined
in function rabbit_ff_controller:run_callback/5
in call from rabbit_ff_controller:do_enable/3 (rabbit_ff_controller.erl, line 1244)
in call from rabbit_ff_controller:update_feature_state_and_enable/2 (rabbit_ff_controller.erl, line 1180)
in call from rabbit_ff_controller:enable_with_registry_locked/2 (rabbit_ff_controller.erl, line 1050)
in call from rabbit_ff_controller:enable_many_locked/2 (rabbit_ff_controller.erl, line 991)
in call from rabbit_ff_controller:enable_many/2 (rabbit_ff_controller.erl, line 979)
in call from rabbit_ff_controller:updating_feature_flag_states/3 (rabbit_ff_controller.erl, line 307)
in call from gen_statem:loop_state_callback/11 (gen_statem.erl, line 3735)
```
[How]
The callback definition is now queried from the first node in the list
given as argument. For the common use case where all nodes know about a
feature flag, the first node is the local one, so there should be no
latency caused by the RPC.
See #12963.
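A sketch of the approach; `local_definition/1` is a hypothetical stand-in
for the real registry lookup:

```
-module(ff_callback_sketch).
-export([get_callback_definition/2, local_definition/1]).

%% Hypothetical stand-in for the registry lookup.
local_definition(FeatureName) ->
    persistent_term:get({ff_definition, FeatureName}, undefined).

get_callback_definition([FirstNode | _], FeatureName) when FirstNode =:= node() ->
    %% Common case: the first node is the local one, so there is no RPC
    %% and no extra latency.
    local_definition(FeatureName);
get_callback_definition([FirstNode | _], FeatureName) ->
    %% The local registry may not know this feature flag at all, so ask
    %% the first node of the list instead.
    erpc:call(FirstNode, ?MODULE, local_definition, [FeatureName]).
```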
[Why]
Once `khepri_db` is enabled by default, we need another way to disable it
to select Mnesia instead.
[How]
We use the new relative forced feature flags mechanism to indicate if we
want to explicitly enable or disable `khepri_db`. This way, we don't
touch other stable feature flags and only mess with Khepri.
However, this mechanism is not supported by RabbitMQ 4.0.x and older.
They will ignore the setting. Therefore, to make this work in
mixed-version testing, we set the `$RABBITMQ_FEATURE_FLAGS` variable for
the secondary umbrella. This part will go away once we test against
RabbitMQ 4.1.x as the secondary umbrella in the future.
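For illustration, a tiny helper in the testsuite style; the `+`/`-` prefix
syntax for `$RABBITMQ_FEATURE_FLAGS` is my reading of the relative
mechanism and is only understood by nodes that support it:

```
-module(khepri_env_sketch).
-export([feature_flags_env/1]).

%% Relative forced feature flags: explicitly enable or disable khepri_db
%% without touching any other stable feature flag. Older nodes (4.0.x and
%% before) ignore this setting.
feature_flags_env(khepri) -> {"RABBITMQ_FEATURE_FLAGS", "+khepri_db"};
feature_flags_env(mnesia) -> {"RABBITMQ_FEATURE_FLAGS", "-khepri_db"}.
```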
At the end, we compare the effective metadata store to the expected one.
If they don't match, we skip the test.
While here, change `rjms_topic_selector_SUITE` to only choose Khepri
without specifying any feature flags.
If `handle_tick` is called before the machine has finished the upgrade
process, it could receive an old overview format (a stats tuple instead of
a map). Let's ignore it; the next `handle_tick` should be fine.
This is unlikely to happen in production; it was detected on CI with a
very low tick timeout.
Fixes #12933
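A minimal sketch of the guard (names and the exact shape check are
illustrative):

```
-module(tick_overview_sketch).
-export([maybe_handle_tick/1]).

%% A map means the machine upgrade has completed; anything else is the
%% old overview format and is ignored until the next tick.
maybe_handle_tick(Overview) when is_map(Overview) ->
    {handle, Overview};
maybe_handle_tick(_OldStatsTuple) ->
    ignore.
```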
The assumption that `x-last-death-*` annotations must have been set
whenever the `deaths` annotation is set was wrong.
Reproduction steps, Option 1:
1. In v3.13.7, dead letter a message from Q1 to Q2 (both can be classic queues).
2. Re-publish the message including its x-death header from Q2 back to Q1.
(RabbitMQ 3.13.7 will interpret this x-death header and set the deaths annotation.)
3. Upgrade to v4.0.4
4. Dead lettering the message from Q1 to Q2 will cause the following crash:
```
crasher:
initial call: rabbit_amqqueue_process:init/1
pid: <0.577.0>
registered_name: []
exception exit: {{badkey,<<"x-last-death-exchange">>},
[{mc,record_death,4,[{file,"mc.erl"},{line,410}]},
{rabbit_dead_letter,publish,5,
[{file,"rabbit_dead_letter.erl"},{line,38}]},
{rabbit_amqqueue_process,'-dead_letter_msgs/4-fun-0-',
7,
[{file,"rabbit_amqqueue_process.erl"},{line,1060}]},
{rabbit_variable_queue,'-ackfold/4-fun-0-',3,
[{file,"rabbit_variable_queue.erl"},{line,655}]},
{lists,foldl,3,[{file,"lists.erl"},{line,2146}]},
{rabbit_variable_queue,ackfold,4,
[{file,"rabbit_variable_queue.erl"},{line,652}]},
{rabbit_priority_queue,ackfold,4,
[{file,"rabbit_priority_queue.erl"},{line,309}]},
{rabbit_amqqueue_process,
'-dead_letter_rejected_msgs/3-fun-0-',5,
[{file,"rabbit_amqqueue_process.erl"},
{line,1038}]}]}
```
Reproduction steps, Option 2:
1. Run a 4.0.4 / 3.13.7 mixed version cluster where both queues Q1 and Q2
are hosted on the 4.0.4 node.
2. Send a message to Q1 which dead letters to Q2.
3. Re-publish a message with the x-death AMQP 0.9.1 header from Q2 to
Q1. However, this time make sure to publish to the 3.13.7 node which
forwards this message to Q1 on the 4.0.4 node.
4. Subsequently dead lettering this message from Q1 to Q2 (happening on
the 4.0.4 node) will also cause the crash.
The modified test case in this commit was able to repro this crash via
Option 2 in the mixed version cluster tests on the `v4.0.x` branch.
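One way to avoid the `badkey` crash, sketched as a standalone function (the
real logic lives in `mc:record_death/4`; whether the fix looks exactly like
this is not implied):

```
-module(last_death_sketch).
-export([last_death/3]).

%% Don't assume the x-last-death-* annotations are present just because
%% the deaths annotation is: fall back to the current exchange and queue.
last_death(Annotations, CurrentExchange, CurrentQueue) ->
    Exchange = maps:get(<<"x-last-death-exchange">>, Annotations, CurrentExchange),
    Queue = maps:get(<<"x-last-death-queue">>, Annotations, CurrentQueue),
    {Exchange, Queue}.
```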
As the de-duplication plugin is the only adopter of the `is_duplicate`
callback, we now use a simpler signature.
When a message is deemed a duplicate, we discard it and re-route it to the
dead-letter exchange.
Signed-off-by: Matteo Cafasso <noxdafox@gmail.com>
(cherry picked from commit f93baa35cb)
The `is_duplicate` callback signature was changed in order to support both
the mirrored queues and the de-duplication ones.
As mirrored queues are now deprecated and removed, we can fall back to a
simple boolean return value.
Signed-off-by: Matteo Cafasso <noxdafox@gmail.com>
(cherry picked from commit c927446e17)
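Roughly, the contract moves from a multi-valued verdict to a plain boolean;
the spec below is a simplified illustration, not the exact backing-queue
behaviour definition:

```
-module(dedup_callback_sketch).

%% Simplified illustration: the callback only answers "is this message a
%% duplicate?"; discarding and dead-lettering the duplicate is handled by
%% the caller.
-callback is_duplicate(Message :: term(), State :: term()) ->
    {IsDuplicate :: boolean(), State :: term()}.
```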
Prior to this commit, when the sending client overshot RabbitMQ's incoming-window
(which is allowed in the event of a cluster-wide memory or disk alarm),
and RabbitMQ sent a FLOW frame to the client, RabbitMQ sent a negative
incoming-window field in the FLOW frame causing the following crash in
the writer proc:
```
crasher:
initial call: rabbit_amqp_writer:init/1
pid: <0.19353.0>
registered_name: []
exception error: bad argument
in function iolist_size/1
called as iolist_size([<<112,0,0,23,120>>,
[82,-15],
<<"pÿÿÿü">>,<<"pÿÿÿÿ">>,67,
<<112,0,0,23,120>>,
"Rª",64,64,64,64])
*** argument 1: not an iodata term
in call from amqp10_binary_generator:generate1/1 (amqp10_binary_generator.erl, line 141)
in call from amqp10_binary_generator:generate1/1 (amqp10_binary_generator.erl, line 88)
in call from amqp10_binary_generator:generate/1 (amqp10_binary_generator.erl, line 79)
in call from rabbit_amqp_writer:assemble_frame/3 (rabbit_amqp_writer.erl, line 206)
in call from rabbit_amqp_writer:internal_send_command_async/3 (rabbit_amqp_writer.erl, line 189)
in call from rabbit_amqp_writer:handle_cast/2 (rabbit_amqp_writer.erl, line 110)
in call from gen_server:try_handle_cast/3 (gen_server.erl, line 1121)
```
This commit fixes this crash by maintaining a floor of zero for
incoming-window in the FLOW frame.
Fixes #12816
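The core of the fix, as a hedged sketch (variable names are illustrative):

```
-module(flow_sketch).
-export([incoming_window_for_flow/2]).

%% The client may overshoot the window during a memory/disk alarm; the
%% value announced in the FLOW frame must never go below zero, otherwise
%% it can't be encoded as an AMQP uint.
incoming_window_for_flow(MaxIncomingWindow, InFlight) ->
    max(0, MaxIncomingWindow - InFlight).
```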
The credit_flow between a publishing AMQP 0.9.1 channel (or MQTT
connection) and (non-mirrored) classic queue processes was
unintentionally removed in 4.0 together with everything else related to
CQ mirroring.
By default we restore the 3.x behaviour for non-mirrored classic
queues. It is possible to disable flow control (the earlier 4.0.x
behaviour) with the new env `classic_queue_flow_control`. In 3.x this
was possible with the config `mirroring_flow_control`.
(cherry picked from commit d65bd7d07a)
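Assuming the setting is a boolean in the `rabbit` application environment,
disabling flow control again would look roughly like this in
`advanced.config`:

```
[
 {rabbit, [
   %% false restores the earlier 4.0.x behaviour (no credit flow between
   %% publishers and classic queues); the default keeps the 3.x behaviour.
   {classic_queue_flow_control, false}
 ]}
].
```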
[Why]
The code assumed that the transaction would always succeed. It was kind
of the case with Mnesia because it would throw an exception if it
failed.
Khepri returns an error instead. The code has to handle it. In
particular, we see timeouts in CI and before this patch, they caused a
crash because the list comprehension was asked to work on a tuple.
[How]
We now retry a few times for 10 seconds.
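Illustrative shape of the retry (the actual code handles the Khepri return
values in place):

```
-module(khepri_retry_sketch).
-export([retry_tx/1]).

%% Retry the transaction on a timeout error instead of crashing, for up
%% to ~10 seconds overall.
retry_tx(TxFun) ->
    retry_tx(TxFun, 10).

retry_tx(TxFun, Retries) ->
    case TxFun() of
        {error, timeout} when Retries > 0 ->
            timer:sleep(1000),
            retry_tx(TxFun, Retries - 1);
        Result ->
            Result
    end.
```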
[Why]
We pin a version of Horus even if we don't use it directly (it is a
dependency of Khepri). But currently, we can't update Khepri while still
needing the fix in Horus 0.3.1.
Horus 0.3.1 works around a crash in `cover` that mostly affects CI for
now.
This pinning will have to go away with the next update of Khepri.
[Why]
The `ra:member_add/3` call returns before the change is committed. This
is ok for that addition but any follow-up changes to the cluster might
be rejected with the `cluster_change_not_permitted` error.
[How]
Instead of changing other places to wait or retry their cluster
membership change, this patch waits for the current add to be applied
before proceeding and returning.
This fixes some transient failures in CI where such follow-up changes
are rejected and not retried, leaving the cluster in an unexpected state
for the testcase.
An example is
`quorum_queue_SUITE:force_shrink_member_to_current_member/1`.
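A sketch of the waiting step using `ra:members/1` (the real implementation
may rely on other Ra APIs):

```
-module(ra_member_wait_sketch).
-export([wait_for_member/3]).

%% Poll the cluster members until the newly added server shows up (or we
%% give up), so follow-up membership changes aren't rejected with
%% cluster_change_not_permitted.
wait_for_member(_ServerRef, _NewMember, 0) ->
    {error, timeout};
wait_for_member(ServerRef, NewMember, Retries) ->
    case ra:members(ServerRef) of
        {ok, Members, _Leader} ->
            case lists:member(NewMember, Members) of
                true  -> ok;
                false -> retry(ServerRef, NewMember, Retries)
            end;
        _ ->
            retry(ServerRef, NewMember, Retries)
    end.

retry(ServerRef, NewMember, Retries) ->
    timer:sleep(500),
    wait_for_member(ServerRef, NewMember, Retries - 1).
```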
This check fails on a virgin node, because the metadata store
is not yet ready to handle the query. However, a virgin
node by definition can't have any queues, so let's just return
false without asking.
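Sketched as a short-circuit, assuming the `rabbit_db:is_virgin_node/0`
helper; `QueryFun` stands in for the real metadata store query:

```
-module(virgin_node_sketch).
-export([has_queues/1]).

%% A virgin node by definition has no queues, so don't even ask the
%% metadata store, which isn't ready to answer yet.
has_queues(QueryFun) ->
    case rabbit_db:is_virgin_node() of
        true  -> false;
        false -> QueryFun()
    end.
```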
[Why]
In CI, we observe some timeouts in the Erlang distribution connections
between the temporary hidden node and the nodes it queries. This affects
peer discovery obviously.
[How]
We introduce some query retries to reduce the risk of an incomplete
query.
While here, we move the sorting of queried nodes from the
`query_node_props2/3` last clause (executed in the temporary hidden
node) to the function setting up the temporary hidden node and asking for
these queries. This way the debug messages from that sorting are logged
by RabbitMQ out of the box.
[Why]
This impacts what is reported by the catch, because it caught exceptions
emitted by code supposedly called later. An example is the assert in the
last clause of `query_node_props2/3`.
[Why]
This was the first solution put in place to prevent the temporary hidden
node from connecting to the node that started it to write any printed
messages. Because of this, the nodes that the temporary hidden node
queried found out about the parent node and opened an Erlang
distribution connection to it. This polluted the known nodes list.
However later, the temporary hidden node was started with the
`standard_io` connection option. This prevented the temporary hidden
node from knowing about the node that started it, solving the problem in
a cleaner way.
[How]
This commit garbage-collects that piece of code that is now useless. It
makes the query code way simpler to understand.
[Why]
That timer was started during boot and kept running regardless of whether
`rabbit` was running or stopped.
This caused the reconciliation to crash if the `rabbit` app was stopped
before it ended, because it tried to access the database even though it
was stopped or even reset.
[How]
We just check if `rabbit` is running before running one reconciliation
and scheduling a new one.
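A minimal sketch of the guard (the reconciliation function and interval
are placeholders):

```
-module(reconcile_sketch).
-export([maybe_reconcile/1]).

%% Only touch the database and re-arm the timer while the rabbit
%% application is actually running.
maybe_reconcile(ReconcileFun) ->
    case rabbit:is_running() of
        true ->
            ReconcileFun(),
            _ = erlang:send_after(60000, self(), reconcile),
            ok;
        false ->
            %% rabbit was stopped (or the node reset): skip this round and
            %% don't schedule another one.
            ok
    end.
```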
This fixes erlang_ls's header resolution. Previously it would confuse
the include_lib of the `khepri.hrl` from Khepri with this header in
the rabbit app.
This header is also specific to how rabbit uses Khepri so I think the
new name fits better.
Introduce a single place in the AMQP 1.0 Erlang client that infers the AMQP 1.0 type.
Erlang integers are inferred to be AMQP type `long` to avoid overflow surprises.
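A sketch of such an inference function; only the integer-to-`long` rule is
stated above, the other clauses and the tagged-tuple representation are
assumptions about the client's AMQP value encoding:

```
-module(amqp_type_infer_sketch).
-export([infer/1]).

%% Integers always become AMQP long to avoid overflow surprises.
infer(V) when is_integer(V) -> {long, V};
infer(V) when is_float(V)   -> {double, V};
infer(V) when is_boolean(V) -> {boolean, V};
infer(V) when is_binary(V)  -> {binary, V}.
```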