rabbitmq-server

Commit Graph

Author	SHA1	Message	Date
Michal Kuratczyk	d2b5384e3f	Remove test for rabbit_log:log/4	2025-07-16 00:31:42 +02:00
Michal Kuratczyk	abe7de2460	rabbit_log -> logger in CLI tests	2025-07-16 00:31:42 +02:00
Michal Kuratczyk	e7c2dc5aff	Show current state of deprecated features	2025-07-14 15:56:28 +02:00
Michal Kuratczyk	2808154f39	CLI: ignore the exact error message There error is slightly different for different Elixir versions	2025-07-08 10:52:24 +02:00
Michal Kuratczyk	6def891d42	Adjust CLI tests for OTP28	2025-06-27 12:42:57 +02:00
Diana Parra Corbacho	597fb419f3	Update rabbitmq_cli federation test dependency	2025-05-27 07:55:29 +02:00
Aaron Seo	0d3dfd9695	Add force checkpoint functions for quorum queues and command line tool (cherry picked from commit `b54ab1d5e5`)	2025-05-22 14:58:12 +02:00
Jean-Sébastien Pédron	a34ce08f8f	rabbitmq_cli: Adapt `force_reset_command_test.exs` testsuites [Why] `force_reset` is unsupported with Khepri.	2025-04-23 11:34:37 +02:00
Loïc Hoguin	c5d150a7ef	Use Erlang.mk's native Elixir support for CLI This avoids using Mix while compiling which simplifies a number of things and let us do further build improvements later on. Elixir is only enabled from within rabbitmq_cli currently. Eunit is disabled since there are only Elixir tests. Dialyzer will force-enable Elixir in order to process Elixir-compiled beam files. This commit also includes a few changes that are related: * The Erlang distribution will now be started for parallel-ct * Many unnecessary PROJECT_MOD lines have been removed * `eunit_formatters` has been removed, it provides little value * The new `maybe_flock` Erlang.mk function is used where possible * Build test deps when testing rabbitmq_cli (Mix won't do it anymore) * rabbitmq_ct_helpers now use the early plugins to have Dialyzer properly set up	2025-03-18 10:02:49 +01:00
Diana Parra Corbacho	d2f66ced1b	RMQ-1263: Add a --force option to rabbitmqctl delete_queue command RMQ-1263: Add a --force option to rabbitmqctl delete_queue command. This work was originally done by Iliia Khaprov <iliia.khaprov@broadcom.net>. --------- Co-authored-by: Iliia Khaprov <iliia.khaprov@broadcom.net> Co-authored-by: Michael Klishin <klishinm@vmware.com> (cherry picked from commit d9522d3ee708250cc84443af5c3556b14f7c5ab9)	2025-03-14 00:06:09 -04:00
Michael Klishin	cf1bfa0b15	CLI: remove a non-essential flaky test	2025-03-12 19:01:55 -04:00
Michael Klishin	b023062749	CLI distribution_test.exs: skip it on CI it flakes specifically on CI. We can afford to skip this specific test there and only run it locally.	2025-03-12 18:33:13 -04:00
Michael Klishin	09f1ab47b7	By @Ayanda-D: new CLI health check that detects QQs without an elected reachable leader #13433 (#13487 ) * Implement rabbitmq-queues leader_health_check command for quorum queues (cherry picked from commit `c26edbef33`) * Tests for rabbitmq-queues leader_health_check command (cherry picked from commit `6cc03b0009`) * Ensure calling ParentPID in leader health check execution and reuse and extend formatting API, with amqqueue:to_printable/2 (cherry picked from commit `76d66a1fd7`) * Extend core leader health check tests and update badrpc error handling in cli tests (cherry picked from commit `857e2a73ca`) * Refactor leader_health_check command validators and ignore vhost arg (cherry picked from commit `6cf9339e49`) * Update leader_health_check_command description and banner (cherry picked from commit `96b8bced2d`) * Improve output formatting for healthy leaders and support silent mode in rabbitmq-queues leader_health_check command (cherry picked from commit `239a69b404`) * Support global flag to run leader health check for all queues in all vhosts on local node (cherry picked from commit `48ba3e161f`) * Return immediately for leader health checks on empty vhosts (cherry picked from commit `7873737b35`) * Rename leader health check timeout refs (cherry picked from commit `b7dec89b87`) * Update banner message for global leader health check (cherry picked from commit `c7da4d5b24`) * QQ leader-health-check: check_process_limit_safety before spawning leader checks (cherry picked from commit `17368454c5`) * Log leader health check result in broker logs (if any leaderless queues) (cherry picked from commit `1084179a2c`) * Ensure check_passed result for leader health internal calls) (cherry picked from commit `68739a6bd2`) * Extend CLI format output to process check_passed payload (cherry picked from commit `5f5e9922bd`) * Format leader healthcheck result log and function exports (cherry picked from commit `ebffd7d8a4`) * Change leader_health_check command scope from queues to diagnostics (cherry picked from commit `663fc9846e`) * Update (c) line year (cherry picked from commit `df82f12a70`) * Rename command to check_for_quorum_queues_without_an_elected_leader and use across_all_vhosts option for global checks (cherry picked from commit `b2acbae28e`) * Use rabbit_db_queue for qq leader health check lookups and introduce rabbit_db_queue:get_all_by_type_and_vhost/2. Update leader health check timeout to 5s and process limit threshold to 20% of node's process_limit. (cherry picked from commit `7a8e166ff6`) * Update tests: quorum_queue_SUITE and rabbit_db_queue_SUITE (cherry picked from commit `9bdb81fd79`) * Fix typo (cli test module) (cherry picked from commit `615856853a`) * Small refactor - simpler final leader health check result return on function head match (cherry picked from commit `ea07938f3d`) * Clear dialyzer warning & fix type spec (cherry picked from commit `a45aa81bd2`) * Ignore result without strict match to avoid diayzer warning (cherry picked from commit `bb43c0b929`) * 'rabbitmq-diagnostics check_for_quorum_queues_without_an_elected_leader' documentation edits (cherry picked from commit 845230b0b380a5f5bad4e571a759c10f5cc93b91) * 'rabbitmq-diagnostics check_for_quorum_queues_without_an_elected_leader' output copywriting (cherry picked from commit 235f43bad58d3a286faa0377b8778fcbe6f8705d) * diagnostics check_for_quorum_queues_without_an_elected_leader: behave like a health check w.r.t. error reporting (cherry picked from commit db7376797581e4716e659fad85ef484cc6f0ea15) * check_for_quorum_queues_without_an_elected_leader: handle --quiet and --silent plus simplify function heads. References #13433. (cherry picked from commit 7b392315d5e597e5171a0c8196230d92b8ea8e92) --------- Co-authored-by: Ayanda Dube <adube14@bloomberg.net>	2025-03-12 00:32:59 -04:00
Arnaud Cogoluègnes	7ea2ff2651	Remove set_stream_retention_policy command It is not working as expected. Policies are the way to change data retention for stream.	2025-02-18 11:06:04 +01:00
Michael Klishin	07ec1b4b50	New health check CLI commands	2025-01-28 16:44:37 -05:00
David Ansari	579c58603e	Support AMQP over WebSocket (OSS part)	2025-01-27 17:50:47 +01:00
Michael Klishin	6281dcb798	CLI commands for #12772	2025-01-02 18:35:31 -05:00
Michael Klishin	3f5b13d47f	Merge branch 'main' into mk-virtual-host-protection-from-accidental-deletion	2025-01-02 17:01:54 -05:00
Michael Klishin	c95a95f822	CLI: mix format	2025-01-02 17:00:46 -05:00
Michael Klishin	f62d46c286	Introduce a way to protect a virtual host from deletion Accidental "fat finger" virtual deletion accidents would be easier to avoid if there was a protection mechanism that would apply equally even to CLI tools and external applications that do not use confirmations for deletion operations. This introduce the following changes: * Virtual host metadata now supports a new queue, 'protected_from_deletion', which, when set, will be considered by key virtual host deletion function(s) * DELETE /api/vhosts/{name} was adapted to handle such blocked deletion attempts to respond with a 412 Precondition Failed status * 'rabbitmqctl list_vhosts' and 'rabbitmqctl delete_vhost' were adapted accordingly * DELETE /api/vhosts/{name}/deletion/protection is a new endpoint that can be used to remove the protective seal (the metadata key) * POST /api/vhosts/{name}/deletion/protection marks the virtual host as protected In the case of the HTTP API, all operations on virtual host metadata require administrative privileges from the target user. Other considerations: * When a virtual host does not exist, the behavior remains the same: the original, protection-unaware code path is used to preserve backwards compatibility References #12772.	2025-01-02 16:50:51 -05:00
Michael Klishin	968eefa1bb	Bump (c) line year There are no functional changes to this massive diff.	2025-01-01 17:54:10 -05:00
Michael Klishin	9b6ab77f87	CLI: test/diagnostics cosmetics #12894	2024-12-04 13:44:29 -05:00
Michael Davis	9f60325563	CLI: Silence `status` command test This check is expected to succeed and the status is expected to be printed to stdout rather than stderr. This change silences the status output. The status text was printed mistakenly previously because we captured stderr rather than stdout.	2024-12-04 11:10:46 -05:00
Michael Davis	c328922a2a	CLI: Fix match of discover peers command test This previously emitted a warning because Elixir will rebind `this_node` by default, so the `this_node` binding in the line above was unused. (As opposed to Erlang which would treat this as a match - rejecting the binding if `this_node` was not equal to the value being matched.) The node needed to be adjusted as well - `node()` returned the ExUnit runner's node while the command returned the remote node, which is stored in the context under `opts.node`.	2024-12-04 11:07:46 -05:00
Michael Davis	d58d874a0b	CLI: Resolve elixirc warnings	2024-12-04 11:07:34 -05:00
Jean-Sébastien Pédron	112ff3f3f5	rabbitmq_cli: Prepare tests to run against a node with Khepri enabled by default	2024-12-02 13:55:41 +01:00
Diana Parra Corbacho	8dc712543e	Tests: set disk monitor as active in set_disk_free_limit_command_test	2024-11-06 16:59:19 +01:00
Loïc Hoguin	6b3b0e5c3f	CLI: Make a test more reliable	2024-09-30 12:35:43 +02:00
Loïc Hoguin	1c797387f7	cli: Add 4.1.0 to mock plugin's versions	2024-09-30 12:35:43 +02:00
Michael Klishin	47210c8307	cli enable_feature_flag: alias --experimental as --opt-in --experimental is no longer particularly fair to Khepri, which is not enabled by default because of its enormous scope, and because once enabled, it cannot be disabled. --opt-in would be a better name but --experimental remains for backwards compatiblity. When both are specified, we consider that the user opts in if at least one of the flags is set to true.	2024-09-17 19:08:28 -04:00
Michael Davis	dce8135d30	Add test cases for the '--experimental' flag	2024-08-20 16:23:17 -04:00
Michal Kuratczyk	e3302f2f9a	Rename --force to --experimental Plus, a slightly more scary error message	2024-08-19 13:41:37 -04:00
Michal Kuratczyk	58e0c1600f	Require --force to enable experimental FF	2024-08-19 13:41:37 -04:00
Michael Klishin	9dc899441f	Validation tests for the new command	2024-08-13 14:29:14 -04:00
Michael Klishin	e1490c6d9c	More CLI commands for tagged values	2024-08-13 14:26:02 -04:00
Michael Davis	52a0d70e15	Handle database timeouts when declaring exchanges The spec of `rabbit_exchange:declare/7` needs to be updated to return `{ok, Exchange} \| {error, Reason}` instead of the old return value of `rabbit_types:exchange()`. This is safe to do since `declare/7` is not called by RPC - from the CLI or otherwise - outside of test suites, and in test suites only through the CLI's `TestHelper.declare_exchange/7`. Callers of this helper are updated in this commit. Otherwise this commit updates callers to unwrap the `{ok, Exchange}` and bubble up errors.	2024-07-22 16:02:03 -04:00
Rin Kuryloski	5debebfaf3	Use rules_elixir to build the cli without mix Certain elixir-native deps are still build with mix, but this can be corrected later	2024-06-18 14:50:34 +02:00
Michael Klishin	27269b7399	CLI: mix format	2024-06-06 21:21:26 -04:00
Michael Klishin	6e41d3e6e6	Introduce 'rabbitmqctl reconcile_vhosts'	2024-06-06 21:21:26 -04:00
Diana Parra Corbacho	3bbda5bdba	Remove classic mirror queues	2024-06-04 13:00:31 +02:00
Michael Klishin	4a3806bd18	Makes PluginsFormatterTest pass	2024-05-08 20:03:29 -04:00
Ayanda Dube	9391e48b30	test plugin list supressed warnings, refs: #10865 and #10870	2024-03-28 12:46:19 +00:00
Michael Klishin	c0187ec155	Reuse --silent and --quiet in #10865	2024-03-27 21:36:53 -04:00
Michal Kuratczyk	15688c3f2d	Add 4.0.0 to mock plugin versions	2024-03-04 17:12:02 +01:00
David Ansari	8cb313d5a1	Support AMQP 1.0 natively ## What Similar to Native MQTT in #5895, this commits implements Native AMQP 1.0. By "native", we mean do not proxy via AMQP 0.9.1 anymore. ## Why Native AMQP 1.0 comes with the following major benefits: 1. Similar to Native MQTT, this commit provides better throughput, latency, scalability, and resource usage for AMQP 1.0. See https://blog.rabbitmq.com/posts/2023/03/native-mqtt for native MQTT improvements. See further below for some benchmarks. 2. Since AMQP 1.0 is not limited anymore by the AMQP 0.9.1 protocol, this commit allows implementing more AMQP 1.0 features in the future. Some features are already implemented in this commit (see next section). 3. Simpler, better understandable, and more maintainable code. Native AMQP 1.0 as implemented in this commit has the following major benefits compared to AMQP 0.9.1: 4. Memory and disk alarms will only stop accepting incoming TRANSFER frames. New connections can still be created to consume from RabbitMQ to empty queues. 5. Due to 4. no need anymore for separate connections for publishers and consumers as we currently recommended for AMQP 0.9.1. which potentially halves the number of physical TCP connections. 6. When a single connection sends to multiple target queues, a single slow target queue won't block the entire connection. Publisher can still send data quickly to all other target queues. 7. A publisher can request whether it wants publisher confirmation on a per-message basis. In AMQP 0.9.1 publisher confirms are configured per channel only. 8. Consumers can change their "prefetch count" dynamically which isn't possible in our AMQP 0.9.1 implementation. See #10174 9. AMQP 1.0 is an extensible protocol This commit also fixes dozens of bugs present in the AMQP 1.0 plugin in RabbitMQ 3.x - most of which cannot be backported due to the complexity and limitations of the old 3.x implementation. This commit contains breaking changes and is therefore targeted for RabbitMQ 4.0. ## Implementation details 1. Breaking change: With Native AMQP, the behaviour of ``` Convert AMQP 0.9.1 message headers to application properties for an AMQP 1.0 consumer amqp1_0.convert_amqp091_headers_to_app_props = false \| true (default false) Convert AMQP 1.0 Application Properties to AMQP 0.9.1 headers amqp1_0.convert_app_props_to_amqp091_headers = false \| true (default false) ``` will break because we always convert according to the message container conversions. For example, AMQP 0.9.1 x-headers will go into message-annotations instead of application properties. Also, `false` won’t be respected since we always convert the headers with message containers. 2. Remove rabbit_queue_collector rabbit_queue_collector is responsible for synchronously deleting exclusive queues. Since the AMQP 1.0 plugin never creates exclusive queues, rabbit_queue_collector doesn't need to be started in the first place. This will save 1 Erlang process per AMQP 1.0 connection. 3. 7 processes per connection + 1 process per session in this commit instead of 7 processes per connection + 15 processes per session in 3.x Supervision hierarchy got re-designed. 4. Use 1 writer process per AMQP 1.0 connection AMQP 0.9.1 uses a separate rabbit_writer Erlang process per AMQP 0.9.1 channel. Prior to this commit, AMQP 1.0 used a separate rabbit_amqp1_0_writer process per AMQP 1.0 session. Advantage of single writer proc per session (prior to this commit): * High parallelism for serialising packets if multiple sessions within a connection write heavily at the same time. This commit uses a single writer process per AMQP 1.0 connection that is shared across all AMQP 1.0 sessions. Advantages of single writer proc per connection (this commit): * Lower memory usage with hundreds of thousands of AMQP 1.0 sessions * Less TCP and IP header overhead given that the single writer process can accumulate across all sessions bytes before flushing the socket. In other words, this commit decides that a reader / writer process pair per AMQP 1.0 connection is good enough for bi-directional TRANSFER flows. Having a writer per session is too heavy. We still ensure high throughput by having separate reader, writer, and session processes. 5. Transform rabbit_amqp1_0_writer into gen_server Why: Prior to this commit, when clicking on the AMQP 1.0 writer process in observer, the process crashed. Instead of handling all these debug messages of the sys module, it's better to implement a gen_server. There is no advantage of using a special OTP process over gen_server for the AMQP 1.0 writer. gen_server also provides cleaner format status output. How: Message callbacks return a timeout of 0. After all messages in the inbox are processed, the timeout message is handled by flushing any pending bytes. 6. Remove stats timer from writer AMQP 1.0 connections haven't emitted any stats previously. 7. When there are contiguous queue confirmations in the session process mailbox, batch them. When the confirmations are sent to the publisher, a single DISPOSITION frame is sent for contiguously confirmed delivery IDs. This approach should be good enough. However it's sub optimal in scenarios where contiguous delivery IDs that need confirmations are rare, for example: * There are multiple links in the session with different sender settlement modes and sender publishes across these links interleaved. * sender settlement mode is mixed and sender publishes interleaved settled and unsettled TRANSFERs. 8. Introduce credit API v2 Why: The AMQP 0.9.1 credit extension which is to be removed in 4.0 was poorly designed since basic.credit is a synchronous call into the queue process blocking the entire AMQP 1.0 session process. How: Change the interactions between queue clients and queue server implementations: * Clients only request a credit reply if the FLOW's `echo` field is set * Include all link flow control state held by the queue process into a new credit_reply queue event: * `available` after the queue sends any deliveries * `link-credit` after the queue sends any deliveries * `drain` which allows us to combine the old queue events send_credit_reply and send_drained into a single new queue event credit_reply. * Include the consumer tag into the credit_reply queue event such that the AMQP 1.0 session process can process any credit replies asynchronously. Link flow control state `delivery-count` also moves to the queue processes. The new interactions are hidden behind feature flag credit_api_v2 to allow for rolling upgrades from 3.13 to 4.0. 9. Use serial number arithmetic in quorum queues and session process. 10. Completely bypass the rabbit_limiter module for AMQP 1.0 flow control. The goal is to eventually remove the rabbit_limiter module in 4.0 since AMQP 0.9.1 global QoS will be unsupported in 4.0. This commit lifts the AMQP 1.0 link flow control logic out of rabbit_limiter into rabbit_queue_consumers. 11. Fix credit bug for streams: AMQP 1.0 settlements shouldn't top up link credit, only FLOW frames should top up link credit. 12. Allow sender settle mode unsettled for streams since AMQP 1.0 acknowledgements to streams are no-ops (currently). 13. Fix AMQP 1.0 client bugs Auto renewing credits should not be related to settling TRANSFERs. Remove field link_credit_unsettled as it was wrong and confusing. Prior to this commit auto renewal did not work when the sender uses sender settlement mode settled. 14. Fix AMQP 1.0 client bugs The wrong outdated Link was passed to function auto_flow/2 15. Use osiris chunk iterator Only hold messages of uncompressed sub batches in memory if consumer doesn't have sufficient credits. Compressed sub batches are skipped for non Stream protocol consumers. 16. Fix incoming link flow control Always use confirms between AMQP 1.0 queue clients and queue servers. As already done internally by rabbit_fifo_client and rabbit_stream_queue, use confirms for classic queues as well. 17. Include link handle into correlation when publishing messages to target queues such that session process can correlate confirms from target queues to incoming links. 18. Only grant more credits to publishers if publisher hasn't sufficient credits anymore and there are not too many unconfirmed messages on the link. 19. Completely ignore `block` and `unblock` queue actions and RabbitMQ credit flow between classic queue process and session process. 20. Link flow control is independent between links. A client can refer to a queue or to an exchange with multiple dynamically added target queues. Multiple incoming links can also fan in to the same queue. However the link topology looks like, this commit ensures that each link is only granted more credits if that link isn't overloaded. 21. A connection or a session can send to many different queues. In AMQP 0.9.1, a single slow queue will lead to the entire channel, and then entire connection being blocked. This commit makes sure that a single slow queue from one link won't slow down sending on other links. For example, having link A sending to a local classic queue and link B sending to 5 replica quorum queue, link B will naturally grant credits slower than link A. So, despite the quorum queue being slower in confirming messages, the same AMQP 1.0 connection and session can still pump data very fast into the classic queue. 22. If cluster wide memory or disk alarm occurs. Each session sends a FLOW with incoming-window to 0 to sending client. If sending clients don’t obey, force disconnect the client. If cluster wide memory alarm clears: Each session resumes with a FLOW defaulting to initial incoming-window. 23. All operations apart of publishing TRANSFERS to RabbitMQ can continue during cluster wide alarms, specifically, attaching consumers and consuming, i.e. emptying queues. There is no need for separate AMQP 1.0 connections for publishers and consumers as recommended in our AMQP 0.9.1 implementation. 24. Flow control summary: * If queue becomes bottleneck, that’s solved by slowing down individual sending links (AMQP 1.0 link flow control). * If session becomes bottleneck (more unlikely), that’s solved by AMQP 1.0 session flow control. * If connection becomes bottleneck, it naturally won’t read fast enough from the socket causing TCP backpressure being applied. Nowhere will RabbitMQ internal credit based flow control (i.e. module credit_flow) be used on the incoming AMQP 1.0 message path. 25. Register AMQP sessions Prefer local-only pg over our custom pg_local implementation as pg is a better process group implementation than pg_local. pg_local was identified as bottleneck in tests where many MQTT clients were disconnected at once. 26. Start a local-only pg when Rabbit boots: > A scope can be kept local-only by using a scope name that is unique cluster-wide, e.g. the node name: > pg:start_link(node()). Register AMQP 1.0 connections and sessions with pg. In future we should remove pg_local and instead use the new local-only pg for all registered processes such as AMQP 0.9.1 connections and channels. 27. Requeue messages if link detached Although the spec allows to settle delivery IDs on detached links, RabbitMQ does not respect the 'closed' field of the DETACH frame and therefore handles every DETACH frame as closed. Since the link is closed, we expect every outstanding delivery to be requeued. In addition to consumer cancellation, detaching a link therefore causes in flight deliveries to be requeued. Note that this behaviour is different from merely consumer cancellation in AMQP 0.9.1: "After a consumer is cancelled there will be no future deliveries dispatched to it. Note that there can still be "in flight" deliveries dispatched previously. Cancelling a consumer will neither discard nor requeue them." [https://www.rabbitmq.com/consumers.html#unsubscribing] An AMQP receiver can first drain, and then detach to prevent "in flight" deliveries 28. Init AMQP session with BEGIN frame Similar to how there can't be an MQTT processor without a CONNECT frame, there can't be an AMQP session without a BEGIN frame. This allows having strict dialyzer types for session flow control fields (i.e. not allowing 'undefined'). 29. Move serial_number to AMQP 1.0 common lib such that it can be used by both AMQP 1.0 server and client 30. Fix AMQP client to do serial number arithmetic. 31. AMQP client: Differentiate between delivery-id and transfer-id for better understandability. 32. Fix link flow control in classic queues This commit fixes ``` java -jar target/perf-test.jar -ad false -f persistent -u cq -c 3000 -C 1000000 -y 0 ``` followed by ``` ./omq -x 0 amqp -T /queue/cq -D 1000000 --amqp-consumer-credits 2 ``` Prior to this commit, (and on RabbitMQ 3.x) the consuming would halt after around 8 - 10,000 messages. The bug was that in flight messages from classic queue process to session process were not taken into account when topping up credit to the classic queue process. Fixes #2597 The solution to this bug (and a much cleaner design anyway independent of this bug) is that queues should hold all link flow control state including the delivery-count. Hence, when credit API v2 is used the delivery-count will be held by the classic queue process, quorum queue process, and stream queue client instead of managing the delivery-count in the session. 33. The double level crediting between (a) session process and rabbit_fifo_client, and (b) rabbit_fifo_client and rabbit_fifo was removed. Therefore, instead of managing 3 separate delivery-counts (i. session, ii. rabbit_fifo_client, iii. rabbit_fifo), only 1 delivery-count is used in rabbit_fifo. This is a big simplification. 34. This commit fixes quorum queues without bumping the machine version nor introducing new rabbit_fifo commands. Whether credit API v2 is used is solely determined at link attachment time depending on whether feature flag credit_api_v2 is enabled. Even when that feature flag will be enabled later on, this link will keep using credit API v1 until detached (or the node is shut down). Eventually, after feature flag credit_api_v2 has been enabled and a subsequent rolling upgrade, all links will use credit API v2. This approach is safe and simple. The 2 alternatives to move delivery-count from the session process to the queue processes would have been: i. Explicit feature flag credit_api_v2 migration function * Can use a gen_server:call and only finish migration once all delivery-counts were migrated. Cons: * Extra new message format just for migration is required. * Risky as migration will fail if a target queue doesn’t reply. ii. Session always includes DeliveryCountSnd when crediting to the queue: Cons: * 2 delivery counts will be hold simultaneously in session proc and queue proc; could be solved by deleting the session proc’s delivery-count for credit-reply * What happens if the receiver doesn’t provide credit for a very long time? Is that a problem? 35. Support stream filtering in AMQP 1.0 (by @acogoluegnes) Use the x-stream-filter-value message annotation to carry the filter value in a published message. Use the rabbitmq:stream-filter and rabbitmq:stream-match-unfiltered filters when creating a receiver that wants to filter out messages from a stream. 36. Remove credit extension from AMQP 0.9.1 client 37. Support maintenance mode closing AMQP 1.0 connections. 38. Remove AMQP 0.9.1 client dependency from AMQP 1.0 implementation. 39. Move AMQP 1.0 plugin to the core. AMQP 1.0 is enabled by default. The old rabbitmq_amqp1_0 plugin will be kept as a no-op plugin to prevent deployment tools from failing that execute: ``` rabbitmq-plugins enable rabbitmq_amqp1_0 rabbitmq-plugins disable rabbitmq_amqp1_0 ``` 40. Breaking change: Remove CLI command `rabbitmqctl list_amqp10_connections`. Instead, list both AMQP 0.9.1 and AMQP 1.0 connections in `list_connections`: ``` rabbitmqctl list_connections protocol Listing connections ... protocol {1, 0} {0,9,1} ``` ## Benchmarks ### Throughput & Latency Setup: * Single node Ubuntu 22.04 * Erlang 26.1.1 Start RabbitMQ: ``` make run-broker PLUGINS="rabbitmq_management rabbitmq_amqp1_0" FULL=1 RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+S 3" ``` Predeclare durable classic queue cq1, durable quorum queue qq1, durable stream queue sq1. Start client: https://github.com/ssorj/quiver https://hub.docker.com/r/ssorj/quiver/tags (digest 453a2aceda64) ``` docker run -it --rm --add-host host.docker.internal:host-gateway ssorj/quiver:latest bash-5.1# quiver --version quiver 0.4.0-SNAPSHOT ``` 1. Classic queue ``` quiver //host.docker.internal//amq/queue/cq1 --durable --count 1m --duration 10m --body-size 12 --credit 1000 ``` This commit: ``` Count ............................................. 1,000,000 messages Duration ............................................... 73.8 seconds Sender rate .......................................... 13,548 messages/s Receiver rate ........................................ 13,547 messages/s End-to-end rate ...................................... 13,547 messages/s Latencies by percentile: 0% ........ 0 ms 90.00% ........ 9 ms 25% ........ 2 ms 99.00% ....... 14 ms 50% ........ 4 ms 99.90% ....... 17 ms 100% ....... 26 ms 99.99% ....... 24 ms ``` RabbitMQ 3.x (main branch as of 30 January 2024): ``` ---------------------- Sender ----------------------- --------------------- Receiver ---------------------- -------- Time [s] Count [m] Rate [m/s] CPU [%] RSS [M] Time [s] Count [m] Rate [m/s] CPU [%] RSS [M] Lat [ms] ----------------------------------------------------- ----------------------------------------------------- -------- 2.1 130,814 65,342 6 73.6 2.1 3,217 1,607 0 8.0 511 4.1 163,580 16,367 2 74.1 4.1 3,217 0 0 8.0 0 6.1 229,114 32,767 3 74.1 6.1 3,217 0 0 8.0 0 8.1 261,880 16,367 2 74.1 8.1 67,874 32,296 8 8.2 7,662 10.1 294,646 16,367 2 74.1 10.1 67,874 0 0 8.2 0 12.1 360,180 32,734 3 74.1 12.1 67,874 0 0 8.2 0 14.1 392,946 16,367 3 74.1 14.1 68,604 365 0 8.2 12,147 16.1 458,480 32,734 3 74.1 16.1 68,604 0 0 8.2 0 18.1 491,246 16,367 2 74.1 18.1 68,604 0 0 8.2 0 20.1 556,780 32,767 4 74.1 20.1 68,604 0 0 8.2 0 22.1 589,546 16,375 2 74.1 22.1 68,604 0 0 8.2 0 receiver timed out 24.1 622,312 16,367 2 74.1 24.1 68,604 0 0 8.2 0 quiver: error: PlanoProcessError: Command 'quiver-arrow receive //host.docker.internal//amq/queue/cq1 --impl qpid-proton-c --duration 10m --count 1m --rate 0 --body-size 12 --credit 1000 --transaction-size 0 --timeout 10 --durable --output /tmp/quiver-otujr23y' returned non-zero exit status 1. Traceback (most recent call last): File "/usr/local/lib/quiver/python/quiver/pair.py", line 144, in run _plano.wait(receiver, check=True) File "/usr/local/lib/quiver/python/plano/main.py", line 1243, in wait raise PlanoProcessError(proc) plano.main.PlanoProcessError: Command 'quiver-arrow receive //host.docker.internal//amq/queue/cq1 --impl qpid-proton-c --duration 10m --count 1m --rate 0 --body-size 12 --credit 1000 --transaction-size 0 --timeout 10 --durable --output /tmp/quiver-otujr23y' returned non-zero exit status 1. ``` 2. Quorum queue: ``` quiver //host.docker.internal//amq/queue/qq1 --durable --count 1m --duration 10m --body-size 12 --credit 1000 ``` This commit: ``` Count ............................................. 1,000,000 messages Duration .............................................. 101.4 seconds Sender rate ........................................... 9,867 messages/s Receiver rate ......................................... 9,868 messages/s End-to-end rate ....................................... 9,865 messages/s Latencies by percentile: 0% ....... 11 ms 90.00% ....... 23 ms 25% ....... 15 ms 99.00% ....... 28 ms 50% ....... 18 ms 99.90% ....... 33 ms 100% ....... 49 ms 99.99% ....... 47 ms ``` RabbitMQ 3.x: ``` ---------------------- Sender ----------------------- --------------------- Receiver ---------------------- -------- Time [s] Count [m] Rate [m/s] CPU [%] RSS [M] Time [s] Count [m] Rate [m/s] CPU [%] RSS [M] Lat [ms] ----------------------------------------------------- ----------------------------------------------------- -------- 2.1 130,814 65,342 9 69.9 2.1 18,430 9,206 5 7.6 1,221 4.1 163,580 16,375 5 70.2 4.1 18,867 218 0 7.6 2,168 6.1 229,114 32,767 6 70.2 6.1 18,867 0 0 7.6 0 8.1 294,648 32,734 7 70.2 8.1 18,867 0 0 7.6 0 10.1 360,182 32,734 6 70.2 10.1 18,867 0 0 7.6 0 12.1 425,716 32,767 6 70.2 12.1 18,867 0 0 7.6 0 receiver timed out 14.1 458,482 16,367 5 70.2 14.1 18,867 0 0 7.6 0 quiver: error: PlanoProcessError: Command 'quiver-arrow receive //host.docker.internal//amq/queue/qq1 --impl qpid-proton-c --duration 10m --count 1m --rate 0 --body-size 12 --credit 1000 --transaction-size 0 --timeout 10 --durable --output /tmp/quiver-b1gcup43' returned non-zero exit status 1. Traceback (most recent call last): File "/usr/local/lib/quiver/python/quiver/pair.py", line 144, in run _plano.wait(receiver, check=True) File "/usr/local/lib/quiver/python/plano/main.py", line 1243, in wait raise PlanoProcessError(proc) plano.main.PlanoProcessError: Command 'quiver-arrow receive //host.docker.internal//amq/queue/qq1 --impl qpid-proton-c --duration 10m --count 1m --rate 0 --body-size 12 --credit 1000 --transaction-size 0 --timeout 10 --durable --output /tmp/quiver-b1gcup43' returned non-zero exit status 1. ``` 3. Stream: ``` quiver-arrow send //host.docker.internal//amq/queue/sq1 --durable --count 1m -d 10m --summary --verbose ``` This commit: ``` Count ............................................. 1,000,000 messages Duration ................................................ 8.7 seconds Message rate ........................................ 115,154 messages/s ``` RabbitMQ 3.x: ``` Count ............................................. 1,000,000 messages Duration ............................................... 21.2 seconds Message rate ......................................... 47,232 messages/s ``` ### Memory usage Start RabbitMQ: ``` ERL_MAX_PORTS=3000000 RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+P 3000000 +S 6" make run-broker PLUGINS="rabbitmq_amqp1_0" FULL=1 RABBITMQ_CONFIG_FILE="rabbitmq.conf" ``` ``` /bin/cat rabbitmq.conf tcp_listen_options.sndbuf = 2048 tcp_listen_options.recbuf = 2048 vm_memory_high_watermark.relative = 0.95 vm_memory_high_watermark_paging_ratio = 0.95 loopback_users = none ``` Create 50k connections with 2 sessions per connection, i.e. 100k session in total: ```go package main import ( "context" "log" "time" "github.com/Azure/go-amqp" ) func main() { for i := 0; i < 50000; i++ { conn, err := amqp.Dial(context.TODO(), "amqp://nuc", &amqp.ConnOptions{SASLType: amqp.SASLTypeAnonymous()}) if err != nil { log.Fatal("dialing AMQP server:", err) } _, err = conn.NewSession(context.TODO(), nil) if err != nil { log.Fatal("creating AMQP session:", err) } _, err = conn.NewSession(context.TODO(), nil) if err != nil { log.Fatal("creating AMQP session:", err) } } log.Println("opened all connections") time.Sleep(5 * time.Hour) } ``` This commit: ``` erlang:memory(). [{total,4586376480}, {processes,4025898504}, {processes_used,4025871040}, {system,560477976}, {atom,1048841}, {atom_used,1042841}, {binary,233228608}, {code,21449982}, {ets,108560464}] erlang:system_info(process_count). 450289 ``` 7 procs per connection + 1 proc per session. (7 + 21) 50,000 = 450,000 procs RabbitMQ 3.x: ``` erlang:memory(). [{total,15168232704}, {processes,14044779256}, {processes_used,14044755120}, {system,1123453448}, {atom,1057033}, {atom_used,1052587}, {binary,236381264}, {code,21790238}, {ets,391423744}] erlang:system_info(process_count). 1850309 ``` 7 procs per connection + 15 per session (7 + 215) 50,000 = 1,850,000 procs 50k connections + 100k session require with this commit: 4.5 GB in RabbitMQ 3.x: 15 GB ## Future work 1. More efficient parser and serializer 2. TODO in mc_amqp: Do not store the parsed message on disk. 3. Implement both AMQP HTTP extension and AMQP management extension to allow AMQP clients to create RabbitMQ objects (queues, exchanges, ...).	2024-02-28 14:15:20 +01:00
Michael Klishin	f414c2d512	More missed license header updates #9969	2024-02-05 11:53:50 -05:00
Michael Klishin	c4ae6f30b0	Basic tests for the new CLI command #10304	2024-01-26 20:29:03 -05:00
Diana Parra Corbacho	c9c4574dd0	Update rabbitmqctl tests for rename/update cluster nodes	2024-01-19 11:22:16 -05:00
Diana Parra Corbacho	a363737337	CLI: list_deprecated_features command Lists all or used deprecated features	2023-12-11 12:51:13 +01:00
Jean-Sébastien Pédron	84cede17e1	rabbit_peer_discovery: Rewrite core logic [Why] This work started as an effort to add peer discovery support to our Khepri integration. Indeed, as part of the task to integrate Khepri, we missed the fact that `rabbit_peer_discovery:maybe_create_cluster/1` was called from the Mnesia-specific code only. Even though we knew about it because we hit many issues caused by the fact the `join_cluster` and peer discovery use different code path to create a cluster. To add support for Khepri, the first version of this patch was to move the call to `rabbit_peer_discovery:maybe_create_cluster/1` from `rabbit_db_cluster` instead of `rabbit_mnesia`. To achieve that, it made sense to unify the code and simply call `rabbit_db_cluster:join/2` instead of duplicating the work. Unfortunately, doing so highlighted another issue: the way the node to cluster with was selected. Indeed, it could cause situations where multiple clusters are created instead of one, without resorting to out-of-band counter-measures, like a 30-second delay added in the Kubernetes operator (rabbitmq/cluster-operator#1156). This problem was even more frequent when we tried to unify the code path and call `join_cluster`. After several iterations on the patch and even more discussions with the team, we decided to rewrite the algorithm to make node selection more robust and still use `rabbit_db_cluster:join/2` to create the cluster. [How] This commit is only about the rewrite of the algorithm. Calling peer discovery from `rabbit_db_cluster` instead of `rabbit_mnesia` (and thus making peer discovery work with Khepri) will be done in a follow-up commit. We wanted the new algorithm to fulfill the following properties: 1. `rabbit_peer_discovery` should provide the ability to re-trigger it easily to re-evaluate the cluster. The new public API is `rabbit_peer_discovery:sync_desired_cluster/0`. 2. The selection of the node to join should be designed in a way that all nodes select the same, regardless of the order in which they become available. The adopted solution is to sort the list of discovered nodes with the following criterias (in that order): 1. the size of the cluster a discovered node is part of; sorted from bigger to smaller clusters 2. the start time of a discovered node; sorted from older to younger nodes 3. the name of a discovered node; sorted alphabetically The first node in that list will not join anyone and simply proceed with its boot process. Other nodes will try to join the first node. 3. To reduce the chance of incorrectly having multiple standalone nodes because the discovery backend returned only a single node, we want to apply the following constraints to the list of nodes after it is filtered and sorted (see property 2 above): * The list must contain `node()` (i.e. the node running peer discovery itself). * If the RabbitMQ's cluster size hint is greater than 1, the list must have at least two nodes. The cluster size hint is the maximum between the configured target cluster size hint and the number of elements in the nodes list returned by the backend. If one of the constraint is not met, the entire peer discovery process is restarted after a delay. 4. The lock is acquired only to protect the actual join, not the discovery step where the backend is queried to get the list of peers. With the node selection described above, this will let the first node to start without acquiring the lock. 5. The cluster membership views queried as part of the algorithm to sort the list of nodes will be used to detect additional clusters or standalone nodes that did not cluster correctly. These nodes will be asked to re-evaluate peer discovery to increase the chance of forming a single cluster. 6. After some delay, peer discovery will be re-evaluated to further eliminate the chances of having multiple clusters instead of one. This commit covers properties from point 1 to point 4. Remaining properties will be the scope of additional pull requests after this one works. If there is a failure at any point during discovery, filtering/sorting, locking or joining, the entire process is restarted after a delay. This is configured using the following parameters: * cluster_formation.discovery_retry_limit * cluster_formation.discovery_retry_interval The default parameters were bumped to 30 retries with a delay of 1 second between each. The locking retries/interval parameters are not used by the new algorithm anymore. There are extra minor changes that come with the rewrite: * The configured backend is cached in a persistent term. The goal is to make sure we use the same backend throughout the entire process and when we call `maybe_unregister/0` even if the configuration changed for whatever reason in between. * `maybe_register/0` is called from `rabbit_db_cluster` instead of at the end of a successful peer discovery process. `rabbit_db_cluster` had to call `maybe_register/0` if the node was not virgin anyway. So make it simpler and always call it in `rabbit_db_cluster` regardless of the state of the node. * `log_configured_backend/0` is gone. `maybe_init/0` can log the backend directly. There is no need to explicitly call another function for that. * Messages are logged using `?LOG_*()` macros instead of the old `rabbit_log` module.	2023-12-07 15:51:54 +01:00

1 2 3 4 5 ...

928 Commits