rabbitmq-server

Commit Graph

Author	SHA1	Message	Date
Michael Klishin	8de26fbc03	Fix two warnings in a deprecated CMQ-related CLI command	2024-07-15 12:38:01 -04:00
Michal Kuratczyk	e1b649c0c6	check_if_node_is_mirror_sync_critical is no-op Make `check_if_node_is_mirror_sync_critical` a no-op with a deprecation warning. Since this command is commonly used as part of the node shutdown process (e.g. by Cluster Operator), making it a no-op instead of removing it completely will make the transition to 4.0 easier for users.	2024-07-15 12:38:01 -04:00
Michael Davis	88c1ad2f6e	Adapt to new `{error, timeout}` return value in Khepri 0.14.0 See rabbitmq/khepri#256.	2024-07-10 16:07:43 -04:00
Diana Parra Corbacho	a8dcca084a	rabbit_definitions: ensure federation-upstream-set parameters are exported These parameters are not proplists but a list of maps	2024-07-04 12:41:34 +02:00
Loïc Hoguin	bbfa066d79	Cleanup .gitignore files for the monorepo We don't need to duplicate so many patterns in so many files since we have a monorepo (and want to keep it). If I managed to miss something or remove something that should stay, please put it back. Note that monorepo-wide patterns should go in the top-level .gitignore file. Other .gitignore files are for application or folder-specific patterns.	2024-06-28 12:00:52 +02:00
Loïc Hoguin	18f8ee1457	Merge pull request #11549 from rabbitmq/loic-make-cleanups Various make cleanup/consolidation	2024-06-27 11:42:24 +02:00
Loïc Hoguin	7e9cac3d00	make: Remove Travis-specific targets/config This should no longer be used.	2024-06-24 14:12:02 +02:00
Loïc Hoguin	2a64a0f6c8	Restore FD info in rabbitmqctl status The FD limits are still valuable. The FD used will still show some information during CQv1 upgrade to v2 so it is kept for now. But in the future it will have to be reworked to query the system, or be removed.	2024-06-24 12:07:51 +02:00
Loïc Hoguin	49bedfc17e	Remove most of the fd related FHC code Stats were not removed, including management UI stats relating to FDs. Web-MQTT and Web-STOMP configuration relating to FHC were not removed. The file_handle_cache itself must be kept until we remove CQv1.	2024-06-24 12:07:51 +02:00
Volker Schlecht	732a75d71d	Allow elixir 1.17.x RabbitMQ builds fine with elixir 1.17.1, tested with Release 3.13.3 on OTP26	2024-06-22 09:40:27 +02:00
Rin Kuryloski	3465eef2cd	Use the latest rules_erlang & rules_elixir	2024-06-20 11:13:02 +02:00
Rin Kuryloski	46250dce11	ensure that csv and json elixir deps are embedded in the cli escript also set 'cfg =' appropriately	2024-06-18 16:50:02 +02:00
Rin Kuryloski	f2add661f4	Check that the cli can compile with --warnings-as-errors in ci It's technically a build target, so we didn't need to create a whole new test rule, but it's marked as "manual" so that it doesn't cause bazel build //... to fail	2024-06-18 14:50:35 +02:00
Rin Kuryloski	662ad8edf4	amqp_client system_SUITE apparently needs the cli on the code path I'm not actually sure why the changes to rabbitmqctl compilation necessitated this change, but it seems to be the case	2024-06-18 14:50:35 +02:00
Rin Kuryloski	ba8cf2c2f7	Set the locale with elixir	2024-06-18 14:50:34 +02:00
Rin Kuryloski	35171ebdeb	Add erlang binaries to the path in rabbitmq-run.sh As they are no longer added automatically by the enclosing rule	2024-06-18 14:50:34 +02:00
Rin Kuryloski	5debebfaf3	Use rules_elixir to build the cli without mix Certain elixir-native deps are still built with mix, but this can be corrected later	2024-06-18 14:50:34 +02:00
Loïc Hoguin	c1559f6d12	make: Don't call `find` in rabbitmq_cli's Makefile This greatly speeds up execution because we go through this Makefile twice (currently). Once for building and once for installing (e)scripts. make -C deps/rabbit 0,37s user 0,15s system 101% cpu 0,508 total make -C deps/rabbit 0,35s user 0,09s system 100% cpu 0,438 total	2024-06-10 09:42:33 +02:00
Michael Klishin	27269b7399	CLI: mix format	2024-06-06 21:21:26 -04:00
Michael Klishin	6e41d3e6e6	Introduce 'rabbitmqctl reconcile_vhosts'	2024-06-06 21:21:26 -04:00
Michael Klishin	41e5d38b94	rabbitmq-diagnostics status: drop date-based support status field As of [1], this field has become irrelevant or even misleading. 1. https://www.rabbitmq.com/blog/2024/05/31/new-community-support-policy	2024-06-05 23:20:35 -04:00
Diana Parra Corbacho	3bbda5bdba	Remove classic mirror queues	2024-06-04 13:00:31 +02:00
Michael Klishin	a21a60958b	CLI: placate Dialyzer	2024-05-08 23:33:36 -04:00
Michael Klishin	4a3806bd18	Makes PluginsFormatterTest pass	2024-05-08 20:03:29 -04:00
Péter Gömöri	69eb2b9c29	Don't show disabled plugins as pending upgrade Since commit `c0187ec15` the value of `running_version` is converted to_string (`nil` would become the empty string). But the formatter expected `running_version` to be `nil` if the plugin is not running. This did not match and not running/not enabled plugins were marked incorrectly as "pending upgrade to...". For example: ``` $ rabbitmq-plugins list trust -q [ ] rabbitmq_trust_store (pending upgrade to 3.13.0+51.g9145b53) $ rabbitmq-plugins list trust --formatter erlang -q #{status => running,format => normal, plugins => [#{enabled => not_enabled,name => rabbitmq_trust_store,running => false, version => <<"3.13.0+51.g9145b53">>,running_version => <<>>}]} ```	2024-05-08 08:23:21 +02:00
Aaron Seo	652da0ebaa	Add type info_key for list_unresponsive_queues In the rabbitmqctl docs for list_unresponsive_queues, `type` is listed as a queueinfoitem parameter, but this is not reflected in the source code. This commit adds `type` as a queueinfoitem parameter for list_unresponsive_queues.	2024-04-24 20:11:18 +00:00
Ayanda Dube	9391e48b30	test plugin list suppressed warnings, refs: #10865 and #10870	2024-03-28 12:46:19 +00:00
Michael Klishin	c0187ec155	Reuse --silent and --quiet in #10865	2024-03-27 21:36:53 -04:00
Rin Kuryloski	5e1983abee	Resolve additional symlinks when copying in rabbitmqctl.bzl	2024-03-20 16:21:10 +01:00
Rin Kuryloski	ca7027530d	Resolve symlinks when copying in rabbitmqctl.bzl	2024-03-20 16:21:10 +01:00
Michael Klishin	eb261acd30	CLI: update guide URLs to use the new path structure the original paths, e.g. /streams.html, do have redirects in place but it turned out to be a surprisingly fragile Cloudflare feature when there are hundreds of them, so we better switch now.	2024-03-07 15:53:14 -05:00
Michal Kuratczyk	15688c3f2d	Add 4.0.0 to mock plugin versions	2024-03-04 17:12:02 +01:00
David Ansari	8cb313d5a1	Support AMQP 1.0 natively ## What Similar to Native MQTT in #5895, this commit implements Native AMQP 1.0. By "native", we mean that we no longer proxy via AMQP 0.9.1. ## Why Native AMQP 1.0 comes with the following major benefits: 1. Similar to Native MQTT, this commit provides better throughput, latency, scalability, and resource usage for AMQP 1.0. See https://blog.rabbitmq.com/posts/2023/03/native-mqtt for native MQTT improvements. See further below for some benchmarks. 2. Since AMQP 1.0 is not limited anymore by the AMQP 0.9.1 protocol, this commit allows implementing more AMQP 1.0 features in the future. Some features are already implemented in this commit (see next section). 3. Simpler, better understandable, and more maintainable code. Native AMQP 1.0 as implemented in this commit has the following major benefits compared to AMQP 0.9.1: 4. Memory and disk alarms will only stop RabbitMQ from accepting incoming TRANSFER frames. New connections can still be created to consume from RabbitMQ to empty queues. 5. Due to 4., there is no longer any need for separate connections for publishers and consumers, as we currently recommend for AMQP 0.9.1, which potentially halves the number of physical TCP connections. 6. When a single connection sends to multiple target queues, a single slow target queue won't block the entire connection. Publisher can still send data quickly to all other target queues. 7. A publisher can request whether it wants publisher confirmation on a per-message basis. In AMQP 0.9.1 publisher confirms are configured per channel only. 8. Consumers can change their "prefetch count" dynamically which isn't possible in our AMQP 0.9.1 implementation. See #10174. 9. AMQP 1.0 is an extensible protocol. This commit also fixes dozens of bugs present in the AMQP 1.0 plugin in RabbitMQ 3.x - most of which cannot be backported due to the complexity and limitations of the old 3.x implementation. 
This commit contains breaking changes and is therefore targeted for RabbitMQ 4.0. ## Implementation details 1. Breaking change: With Native AMQP, the behaviour of ``` Convert AMQP 0.9.1 message headers to application properties for an AMQP 1.0 consumer amqp1_0.convert_amqp091_headers_to_app_props = false \| true (default false) Convert AMQP 1.0 Application Properties to AMQP 0.9.1 headers amqp1_0.convert_app_props_to_amqp091_headers = false \| true (default false) ``` will break because we always convert according to the message container conversions. For example, AMQP 0.9.1 x-headers will go into message-annotations instead of application properties. Also, `false` won’t be respected since we always convert the headers with message containers. 2. Remove rabbit_queue_collector rabbit_queue_collector is responsible for synchronously deleting exclusive queues. Since the AMQP 1.0 plugin never creates exclusive queues, rabbit_queue_collector doesn't need to be started in the first place. This will save 1 Erlang process per AMQP 1.0 connection. 3. 7 processes per connection + 1 process per session in this commit instead of 7 processes per connection + 15 processes per session in 3.x Supervision hierarchy got re-designed. 4. Use 1 writer process per AMQP 1.0 connection AMQP 0.9.1 uses a separate rabbit_writer Erlang process per AMQP 0.9.1 channel. Prior to this commit, AMQP 1.0 used a separate rabbit_amqp1_0_writer process per AMQP 1.0 session. Advantage of single writer proc per session (prior to this commit): * High parallelism for serialising packets if multiple sessions within a connection write heavily at the same time. This commit uses a single writer process per AMQP 1.0 connection that is shared across all AMQP 1.0 sessions. 
Advantages of single writer proc per connection (this commit): * Lower memory usage with hundreds of thousands of AMQP 1.0 sessions * Less TCP and IP header overhead given that the single writer process can accumulate across all sessions bytes before flushing the socket. In other words, this commit decides that a reader / writer process pair per AMQP 1.0 connection is good enough for bi-directional TRANSFER flows. Having a writer per session is too heavy. We still ensure high throughput by having separate reader, writer, and session processes. 5. Transform rabbit_amqp1_0_writer into gen_server Why: Prior to this commit, when clicking on the AMQP 1.0 writer process in observer, the process crashed. Instead of handling all these debug messages of the sys module, it's better to implement a gen_server. There is no advantage of using a special OTP process over gen_server for the AMQP 1.0 writer. gen_server also provides cleaner format status output. How: Message callbacks return a timeout of 0. After all messages in the inbox are processed, the timeout message is handled by flushing any pending bytes. 6. Remove stats timer from writer AMQP 1.0 connections haven't emitted any stats previously. 7. When there are contiguous queue confirmations in the session process mailbox, batch them. When the confirmations are sent to the publisher, a single DISPOSITION frame is sent for contiguously confirmed delivery IDs. This approach should be good enough. However it's sub optimal in scenarios where contiguous delivery IDs that need confirmations are rare, for example: * There are multiple links in the session with different sender settlement modes and sender publishes across these links interleaved. * sender settlement mode is mixed and sender publishes interleaved settled and unsettled TRANSFERs. 8. 
Introduce credit API v2 Why: The AMQP 0.9.1 credit extension which is to be removed in 4.0 was poorly designed since basic.credit is a synchronous call into the queue process blocking the entire AMQP 1.0 session process. How: Change the interactions between queue clients and queue server implementations: * Clients only request a credit reply if the FLOW's `echo` field is set * Include all link flow control state held by the queue process into a new credit_reply queue event: * `available` after the queue sends any deliveries * `link-credit` after the queue sends any deliveries * `drain` which allows us to combine the old queue events send_credit_reply and send_drained into a single new queue event credit_reply. * Include the consumer tag into the credit_reply queue event such that the AMQP 1.0 session process can process any credit replies asynchronously. Link flow control state `delivery-count` also moves to the queue processes. The new interactions are hidden behind feature flag credit_api_v2 to allow for rolling upgrades from 3.13 to 4.0. 9. Use serial number arithmetic in quorum queues and session process. 10. Completely bypass the rabbit_limiter module for AMQP 1.0 flow control. The goal is to eventually remove the rabbit_limiter module in 4.0 since AMQP 0.9.1 global QoS will be unsupported in 4.0. This commit lifts the AMQP 1.0 link flow control logic out of rabbit_limiter into rabbit_queue_consumers. 11. Fix credit bug for streams: AMQP 1.0 settlements shouldn't top up link credit, only FLOW frames should top up link credit. 12. Allow sender settle mode unsettled for streams since AMQP 1.0 acknowledgements to streams are no-ops (currently). 13. Fix AMQP 1.0 client bugs Auto renewing credits should not be related to settling TRANSFERs. Remove field link_credit_unsettled as it was wrong and confusing. Prior to this commit auto renewal did not work when the sender uses sender settlement mode settled. 14. 
Fix AMQP 1.0 client bugs The wrong outdated Link was passed to function auto_flow/2 15. Use osiris chunk iterator Only hold messages of uncompressed sub batches in memory if consumer doesn't have sufficient credits. Compressed sub batches are skipped for non Stream protocol consumers. 16. Fix incoming link flow control Always use confirms between AMQP 1.0 queue clients and queue servers. As already done internally by rabbit_fifo_client and rabbit_stream_queue, use confirms for classic queues as well. 17. Include link handle into correlation when publishing messages to target queues such that session process can correlate confirms from target queues to incoming links. 18. Only grant more credits to publishers if the publisher no longer has sufficient credits and there are not too many unconfirmed messages on the link. 19. Completely ignore `block` and `unblock` queue actions and RabbitMQ credit flow between classic queue process and session process. 20. Link flow control is independent between links. A client can refer to a queue or to an exchange with multiple dynamically added target queues. Multiple incoming links can also fan in to the same queue. Whatever the link topology looks like, this commit ensures that each link is only granted more credits if that link isn't overloaded. 21. A connection or a session can send to many different queues. In AMQP 0.9.1, a single slow queue will lead to the entire channel, and then the entire connection, being blocked. This commit makes sure that a single slow queue from one link won't slow down sending on other links. For example, with link A sending to a local classic queue and link B sending to a 5-replica quorum queue, link B will naturally grant credits slower than link A. So, despite the quorum queue being slower in confirming messages, the same AMQP 1.0 connection and session can still pump data very fast into the classic queue. 22. If a cluster-wide memory or disk alarm occurs: 
Each session sends a FLOW with incoming-window set to 0 to the sending client. If sending clients don’t obey, force disconnect the client. If the cluster-wide memory alarm clears: Each session resumes with a FLOW defaulting to the initial incoming-window. 23. All operations apart from publishing TRANSFERs to RabbitMQ can continue during cluster-wide alarms, specifically, attaching consumers and consuming, i.e. emptying queues. There is no need for separate AMQP 1.0 connections for publishers and consumers as recommended in our AMQP 0.9.1 implementation. 24. Flow control summary: * If queue becomes bottleneck, that’s solved by slowing down individual sending links (AMQP 1.0 link flow control). * If session becomes bottleneck (more unlikely), that’s solved by AMQP 1.0 session flow control. * If connection becomes bottleneck, it naturally won’t read fast enough from the socket, causing TCP backpressure to be applied. Nowhere will RabbitMQ internal credit based flow control (i.e. module credit_flow) be used on the incoming AMQP 1.0 message path. 25. Register AMQP sessions Prefer local-only pg over our custom pg_local implementation as pg is a better process group implementation than pg_local. pg_local was identified as bottleneck in tests where many MQTT clients were disconnected at once. 26. Start a local-only pg when Rabbit boots: > A scope can be kept local-only by using a scope name that is unique cluster-wide, e.g. the node name: > pg:start_link(node()). Register AMQP 1.0 connections and sessions with pg. In future we should remove pg_local and instead use the new local-only pg for all registered processes such as AMQP 0.9.1 connections and channels. 27. Requeue messages if link detached Although the spec allows settling delivery IDs on detached links, RabbitMQ does not respect the 'closed' field of the DETACH frame and therefore handles every DETACH frame as closed. Since the link is closed, we expect every outstanding delivery to be requeued. 
In addition to consumer cancellation, detaching a link therefore causes in flight deliveries to be requeued. Note that this behaviour is different from merely consumer cancellation in AMQP 0.9.1: "After a consumer is cancelled there will be no future deliveries dispatched to it. Note that there can still be "in flight" deliveries dispatched previously. Cancelling a consumer will neither discard nor requeue them." [https://www.rabbitmq.com/consumers.html#unsubscribing] An AMQP receiver can first drain, and then detach to prevent "in flight" deliveries 28. Init AMQP session with BEGIN frame Similar to how there can't be an MQTT processor without a CONNECT frame, there can't be an AMQP session without a BEGIN frame. This allows having strict dialyzer types for session flow control fields (i.e. not allowing 'undefined'). 29. Move serial_number to AMQP 1.0 common lib such that it can be used by both AMQP 1.0 server and client 30. Fix AMQP client to do serial number arithmetic. 31. AMQP client: Differentiate between delivery-id and transfer-id for better understandability. 32. Fix link flow control in classic queues This commit fixes ``` java -jar target/perf-test.jar -ad false -f persistent -u cq -c 3000 -C 1000000 -y 0 ``` followed by ``` ./omq -x 0 amqp -T /queue/cq -D 1000000 --amqp-consumer-credits 2 ``` Prior to this commit, (and on RabbitMQ 3.x) the consuming would halt after around 8 - 10,000 messages. The bug was that in flight messages from classic queue process to session process were not taken into account when topping up credit to the classic queue process. Fixes #2597 The solution to this bug (and a much cleaner design anyway independent of this bug) is that queues should hold all link flow control state including the delivery-count. Hence, when credit API v2 is used the delivery-count will be held by the classic queue process, quorum queue process, and stream queue client instead of managing the delivery-count in the session. 33. 
The double level crediting between (a) session process and rabbit_fifo_client, and (b) rabbit_fifo_client and rabbit_fifo was removed. Therefore, instead of managing 3 separate delivery-counts (i. session, ii. rabbit_fifo_client, iii. rabbit_fifo), only 1 delivery-count is used in rabbit_fifo. This is a big simplification. 34. This commit fixes quorum queues without bumping the machine version or introducing new rabbit_fifo commands. Whether credit API v2 is used is solely determined at link attachment time depending on whether feature flag credit_api_v2 is enabled. Even if that feature flag is enabled later on, this link will keep using credit API v1 until detached (or the node is shut down). Eventually, after feature flag credit_api_v2 has been enabled and a subsequent rolling upgrade, all links will use credit API v2. This approach is safe and simple. The 2 alternatives to move delivery-count from the session process to the queue processes would have been: i. Explicit feature flag credit_api_v2 migration function * Can use a gen_server:call and only finish migration once all delivery-counts were migrated. Cons: * Extra new message format just for migration is required. * Risky as migration will fail if a target queue doesn’t reply. ii. Session always includes DeliveryCountSnd when crediting to the queue: Cons: * 2 delivery counts will be held simultaneously in session proc and queue proc; could be solved by deleting the session proc’s delivery-count for credit-reply * What happens if the receiver doesn’t provide credit for a very long time? Is that a problem? 35. Support stream filtering in AMQP 1.0 (by @acogoluegnes) Use the x-stream-filter-value message annotation to carry the filter value in a published message. Use the rabbitmq:stream-filter and rabbitmq:stream-match-unfiltered filters when creating a receiver that wants to filter out messages from a stream. 36. Remove credit extension from AMQP 0.9.1 client 37. 
Support maintenance mode closing AMQP 1.0 connections. 38. Remove AMQP 0.9.1 client dependency from AMQP 1.0 implementation. 39. Move AMQP 1.0 plugin to the core. AMQP 1.0 is enabled by default. The old rabbitmq_amqp1_0 plugin will be kept as a no-op plugin to prevent deployment tools from failing that execute: ``` rabbitmq-plugins enable rabbitmq_amqp1_0 rabbitmq-plugins disable rabbitmq_amqp1_0 ``` 40. Breaking change: Remove CLI command `rabbitmqctl list_amqp10_connections`. Instead, list both AMQP 0.9.1 and AMQP 1.0 connections in `list_connections`: ``` rabbitmqctl list_connections protocol Listing connections ... protocol {1, 0} {0,9,1} ``` ## Benchmarks ### Throughput & Latency Setup: * Single node Ubuntu 22.04 * Erlang 26.1.1 Start RabbitMQ: ``` make run-broker PLUGINS="rabbitmq_management rabbitmq_amqp1_0" FULL=1 RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+S 3" ``` Predeclare durable classic queue cq1, durable quorum queue qq1, durable stream queue sq1. Start client: https://github.com/ssorj/quiver https://hub.docker.com/r/ssorj/quiver/tags (digest 453a2aceda64) ``` docker run -it --rm --add-host host.docker.internal:host-gateway ssorj/quiver:latest bash-5.1# quiver --version quiver 0.4.0-SNAPSHOT ``` 1. Classic queue ``` quiver //host.docker.internal//amq/queue/cq1 --durable --count 1m --duration 10m --body-size 12 --credit 1000 ``` This commit: ``` Count ............................................. 1,000,000 messages Duration ............................................... 73.8 seconds Sender rate .......................................... 13,548 messages/s Receiver rate ........................................ 13,547 messages/s End-to-end rate ...................................... 13,547 messages/s Latencies by percentile: 0% ........ 0 ms 90.00% ........ 9 ms 25% ........ 2 ms 99.00% ....... 14 ms 50% ........ 4 ms 99.90% ....... 17 ms 100% ....... 26 ms 99.99% ....... 
24 ms ``` RabbitMQ 3.x (main branch as of 30 January 2024): ``` ---------------------- Sender ----------------------- --------------------- Receiver ---------------------- -------- Time [s] Count [m] Rate [m/s] CPU [%] RSS [M] Time [s] Count [m] Rate [m/s] CPU [%] RSS [M] Lat [ms] ----------------------------------------------------- ----------------------------------------------------- -------- 2.1 130,814 65,342 6 73.6 2.1 3,217 1,607 0 8.0 511 4.1 163,580 16,367 2 74.1 4.1 3,217 0 0 8.0 0 6.1 229,114 32,767 3 74.1 6.1 3,217 0 0 8.0 0 8.1 261,880 16,367 2 74.1 8.1 67,874 32,296 8 8.2 7,662 10.1 294,646 16,367 2 74.1 10.1 67,874 0 0 8.2 0 12.1 360,180 32,734 3 74.1 12.1 67,874 0 0 8.2 0 14.1 392,946 16,367 3 74.1 14.1 68,604 365 0 8.2 12,147 16.1 458,480 32,734 3 74.1 16.1 68,604 0 0 8.2 0 18.1 491,246 16,367 2 74.1 18.1 68,604 0 0 8.2 0 20.1 556,780 32,767 4 74.1 20.1 68,604 0 0 8.2 0 22.1 589,546 16,375 2 74.1 22.1 68,604 0 0 8.2 0 receiver timed out 24.1 622,312 16,367 2 74.1 24.1 68,604 0 0 8.2 0 quiver: error: PlanoProcessError: Command 'quiver-arrow receive //host.docker.internal//amq/queue/cq1 --impl qpid-proton-c --duration 10m --count 1m --rate 0 --body-size 12 --credit 1000 --transaction-size 0 --timeout 10 --durable --output /tmp/quiver-otujr23y' returned non-zero exit status 1. Traceback (most recent call last): File "/usr/local/lib/quiver/python/quiver/pair.py", line 144, in run _plano.wait(receiver, check=True) File "/usr/local/lib/quiver/python/plano/main.py", line 1243, in wait raise PlanoProcessError(proc) plano.main.PlanoProcessError: Command 'quiver-arrow receive //host.docker.internal//amq/queue/cq1 --impl qpid-proton-c --duration 10m --count 1m --rate 0 --body-size 12 --credit 1000 --transaction-size 0 --timeout 10 --durable --output /tmp/quiver-otujr23y' returned non-zero exit status 1. ``` 2. 
Quorum queue: ``` quiver //host.docker.internal//amq/queue/qq1 --durable --count 1m --duration 10m --body-size 12 --credit 1000 ``` This commit: ``` Count ............................................. 1,000,000 messages Duration .............................................. 101.4 seconds Sender rate ........................................... 9,867 messages/s Receiver rate ......................................... 9,868 messages/s End-to-end rate ....................................... 9,865 messages/s Latencies by percentile: 0% ....... 11 ms 90.00% ....... 23 ms 25% ....... 15 ms 99.00% ....... 28 ms 50% ....... 18 ms 99.90% ....... 33 ms 100% ....... 49 ms 99.99% ....... 47 ms ``` RabbitMQ 3.x: ``` ---------------------- Sender ----------------------- --------------------- Receiver ---------------------- -------- Time [s] Count [m] Rate [m/s] CPU [%] RSS [M] Time [s] Count [m] Rate [m/s] CPU [%] RSS [M] Lat [ms] ----------------------------------------------------- ----------------------------------------------------- -------- 2.1 130,814 65,342 9 69.9 2.1 18,430 9,206 5 7.6 1,221 4.1 163,580 16,375 5 70.2 4.1 18,867 218 0 7.6 2,168 6.1 229,114 32,767 6 70.2 6.1 18,867 0 0 7.6 0 8.1 294,648 32,734 7 70.2 8.1 18,867 0 0 7.6 0 10.1 360,182 32,734 6 70.2 10.1 18,867 0 0 7.6 0 12.1 425,716 32,767 6 70.2 12.1 18,867 0 0 7.6 0 receiver timed out 14.1 458,482 16,367 5 70.2 14.1 18,867 0 0 7.6 0 quiver: error: PlanoProcessError: Command 'quiver-arrow receive //host.docker.internal//amq/queue/qq1 --impl qpid-proton-c --duration 10m --count 1m --rate 0 --body-size 12 --credit 1000 --transaction-size 0 --timeout 10 --durable --output /tmp/quiver-b1gcup43' returned non-zero exit status 1. 
Traceback (most recent call last): File "/usr/local/lib/quiver/python/quiver/pair.py", line 144, in run _plano.wait(receiver, check=True) File "/usr/local/lib/quiver/python/plano/main.py", line 1243, in wait raise PlanoProcessError(proc) plano.main.PlanoProcessError: Command 'quiver-arrow receive //host.docker.internal//amq/queue/qq1 --impl qpid-proton-c --duration 10m --count 1m --rate 0 --body-size 12 --credit 1000 --transaction-size 0 --timeout 10 --durable --output /tmp/quiver-b1gcup43' returned non-zero exit status 1. ``` 3. Stream: ``` quiver-arrow send //host.docker.internal//amq/queue/sq1 --durable --count 1m -d 10m --summary --verbose ``` This commit: ``` Count ............................................. 1,000,000 messages Duration ................................................ 8.7 seconds Message rate ........................................ 115,154 messages/s ``` RabbitMQ 3.x: ``` Count ............................................. 1,000,000 messages Duration ............................................... 21.2 seconds Message rate ......................................... 47,232 messages/s ``` ### Memory usage Start RabbitMQ: ``` ERL_MAX_PORTS=3000000 RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+P 3000000 +S 6" make run-broker PLUGINS="rabbitmq_amqp1_0" FULL=1 RABBITMQ_CONFIG_FILE="rabbitmq.conf" ``` ``` /bin/cat rabbitmq.conf tcp_listen_options.sndbuf = 2048 tcp_listen_options.recbuf = 2048 vm_memory_high_watermark.relative = 0.95 vm_memory_high_watermark_paging_ratio = 0.95 loopback_users = none ``` Create 50k connections with 2 sessions per connection, i.e. 
100k sessions in total: ```go package main import ( "context" "log" "time" "github.com/Azure/go-amqp" ) func main() { for i := 0; i < 50000; i++ { conn, err := amqp.Dial(context.TODO(), "amqp://nuc", &amqp.ConnOptions{SASLType: amqp.SASLTypeAnonymous()}) if err != nil { log.Fatal("dialing AMQP server:", err) } _, err = conn.NewSession(context.TODO(), nil) if err != nil { log.Fatal("creating AMQP session:", err) } _, err = conn.NewSession(context.TODO(), nil) if err != nil { log.Fatal("creating AMQP session:", err) } } log.Println("opened all connections") time.Sleep(5 * time.Hour) } ``` This commit: ``` erlang:memory(). [{total,4586376480}, {processes,4025898504}, {processes_used,4025871040}, {system,560477976}, {atom,1048841}, {atom_used,1042841}, {binary,233228608}, {code,21449982}, {ets,108560464}] erlang:system_info(process_count). 450289 ``` 7 procs per connection + 1 proc per session. (7 + 2 * 1) * 50,000 = 450,000 procs RabbitMQ 3.x: ``` erlang:memory(). [{total,15168232704}, {processes,14044779256}, {processes_used,14044755120}, {system,1123453448}, {atom,1057033}, {atom_used,1052587}, {binary,236381264}, {code,21790238}, {ets,391423744}] erlang:system_info(process_count). 1850309 ``` 7 procs per connection + 15 per session. (7 + 2 * 15) * 50,000 = 1,850,000 procs 50k connections + 100k sessions require: with this commit: 4.5 GB; in RabbitMQ 3.x: 15 GB ## Future work 1. More efficient parser and serializer 2. TODO in mc_amqp: Do not store the parsed message on disk. 3. Implement both AMQP HTTP extension and AMQP management extension to allow AMQP clients to create RabbitMQ objects (queues, exchanges, ...).	2024-02-28 14:15:20 +01:00
Michael Klishin	91def1edb5	ctl delete_queue: mention [--vhost <name>] explicitly in the usage description. It is already mentioned in the General Options section below but apparently not everyone reads it.	2024-02-27 13:16:22 -05:00
Marcial Rosales	41237fbb3b	Fix gazelle issues around oauth2 dependencies	2024-02-14 18:55:39 +01:00
Michael Klishin	9c79ad8d55	More missed license header updates #9969	2024-02-05 12:26:25 -05:00
Michael Klishin	f414c2d512	More missed license header updates #9969	2024-02-05 11:53:50 -05:00
Michael Klishin	875fe9a74c	Merge pull request #10459 from rabbitmq/md-elixir-warnings Resolve CLI elixirc warnings	2024-02-01 18:56:04 -05:00
Michael Davis	c285651636	Resolve elixirc warnings * Remove unused aliases/imports * Remove or underscore unused bindings * Fix variables that should be atoms (`unavailable` -> `:unavailable`) Also, `Logger.warn/1` has been replaced by `Logger.warning/1`. It should be safe to just replace the call with `Logger.warning/1` since it's been in the standard library since Elixir 1.11.	2024-02-01 16:30:14 -05:00
Michael Davis	cc12d73b91	Add Bazel test for compiling rabbitmqctl with warnings-as-errors This test should fail when `mix compile --warnings-as-errors` gives any warnings.	2024-02-01 15:39:47 -05:00
Alex Valiushko	70ac0cf3c5	bump elixir to 1.17	2024-01-29 16:28:08 -08:00
Michael Klishin	4641e66c85	Merge pull request #10426 from rabbitmq/amazon-mq-global-quorum-critical New upgrade time QQ health check: add check_if_new_quorum_queue_replicas_have_finished_initial_sync by @illotum (plus a test)	2024-01-26 23:08:26 -05:00
Michael Klishin	c4ae6f30b0	Basic tests for the new CLI command #10304	2024-01-26 20:29:03 -05:00
Alex Valiushko	2177d4a5c8	add ability to list queues with local promotable replicas	2024-01-24 12:25:39 -08:00
Karl Nilsson	f038ae0d06	formatting	2024-01-24 11:28:41 +00:00
Karl Nilsson	71e7c33448	Stream coordinator: fixes to automatic membership changes. Various bug fixes to make stream coordinator membership changes more reliable. Previously various errors could happen as well as partially successful attempts where the membership change command may fail but it leaves the new server running. Also ensure that stream coordinator members are removed as part of the forget_cluster_node command. Add stream coordinator status command. To show the raft status of the stream coordinator just like is done for quorum queues.	2024-01-24 11:02:42 +00:00
Michael Klishin	d9a8c2d964	Merge pull request #10393 from ariel-anieli/pr-os-name Removed extra clause in platform.os_name/1	2024-01-22 18:34:29 -05:00
Ariel Otilibili	9868c919c8	Removed extra clause in platform.os_name/1 * platform.os_name/1 parses output of :rabbit.status/1 * :rabbit.status/1 gets its `os` key from :os.type/0 * :linux already matched by `platform.os_name({:unix, name})`.	2024-01-22 22:28:06 +01:00
Michael Klishin	c1d37e3e02	Merge pull request #10364 from rabbitmq/flaky-mc-flake-flake Reduce flakiness of certain Common Test suites	2024-01-22 16:22:24 -05:00
Karl Nilsson	60f9f3ce56	Wait command: loop when file read returns the empty binary. As writing to a file isn't atomic between opening and writing this can happen and would unnecessarily return the :garbage_in_pid_file error.	2024-01-22 15:51:24 +00:00
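The retry behaviour described in the commit above (an empty read means "not written yet", not "garbage") might look roughly like the following. This is a sketch in Go rather than the CLI's actual Elixir code; `parsePidFile`, `waitForPid`, the attempt count and the interval are all hypothetical names and values chosen for illustration. Treating whitespace-only content as empty is also an assumption.

```go
package main

import (
	"bytes"
	"errors"
	"fmt"
	"os"
	"strconv"
	"time"
)

// errGarbage mirrors the :garbage_in_pid_file error named in the commit.
var errGarbage = errors.New("garbage_in_pid_file")

// parsePidFile classifies the raw contents of a pid file. An empty file
// asks the caller to retry, because the writer may have opened the file
// but not written to it yet. Anything that is not a positive integer is
// reported as garbage.
func parsePidFile(data []byte) (pid int, retry bool, err error) {
	trimmed := bytes.TrimSpace(data)
	if len(trimmed) == 0 {
		return 0, true, nil
	}
	n, convErr := strconv.Atoi(string(trimmed))
	if convErr != nil || n <= 0 {
		return 0, false, errGarbage
	}
	return n, false, nil
}

// waitForPid re-reads the file while it is empty, up to attempts times.
func waitForPid(path string, attempts int, interval time.Duration) (int, error) {
	for i := 0; i < attempts; i++ {
		data, err := os.ReadFile(path)
		if err != nil {
			return 0, err
		}
		pid, retry, err := parsePidFile(data)
		if !retry {
			return pid, err
		}
		time.Sleep(interval)
	}
	return 0, errors.New("pid file still empty after retries")
}

func main() {
	f, err := os.CreateTemp("", "pid")
	if err != nil {
		panic(err)
	}
	defer os.Remove(f.Name())
	f.WriteString("4242\n")
	f.Close()

	fmt.Println(waitForPid(f.Name(), 5, 10*time.Millisecond))
}
```

Keeping the classification in a pure function separates the "empty versus garbage" decision from file I/O and timing, which makes it easy to test.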
Karl Nilsson	5266902caf	garbage	2024-01-22 15:27:35 +00:00
Diana Parra Corbacho	2171b5abe5	Mix format	2024-01-19 11:22:16 -05:00
Diana Parra Corbacho	c9c4574dd0	Update rabbitmqctl tests for rename/update cluster nodes	2024-01-19 11:22:16 -05:00
Michael Klishin	06ca99c3ab	Fix CLI tools dialyzer	2024-01-19 11:22:16 -05:00
Michael Klishin	b2180a558b	Make rabbitmqctl rename_cluster_node's friend, update_cluster_nodes, a no-op	2024-01-19 11:22:16 -05:00
Michael Klishin	1556fec127	Make 'rabbitmqctl rename_cluster_node' a no-op This makes a command that renames cluster members a no-op. This command is really complex under the hood and is fundamentally incompatible with a few key Raft-based features: * Khepri * Quorum queues * Streams Because Khepri first ships in RabbitMQ 3.13, now is the time to effectively eliminate this command. It will be permanently removed together with other deprecated CLI commands in 4.0. Per discussion with the team. Closes #10367.	2024-01-19 11:22:16 -05:00
Michael Klishin	79d52b9b5d	CLI: mix format	2024-01-16 00:13:41 -05:00
Michael Klishin	48ce1b5ec7	Improve supported information units (Mi, Gi, Ti)

This revisits the information unit conversion, that is, support for suffixes like GiB, GB. When configuration values like disk_free_limit.absolute, vm_memory_high_watermark.absolute are set, the value can contain an information unit (IU) suffix. We now support several new suffixes, and the meaning of a few existing ones changes.

First, the changes:

* k, K now mean kilobytes and not kibibytes
* m, M now mean megabytes and not mebibytes
* g, G now mean gigabytes and not gibibytes

This is to match the system used by Kubernetes. There is no consensus in the industry about how "k", "m", "g", and similar single letter suffixes should be treated. Previously it was a power of 2, now a power of 10 to align with a very popular OSS project that explicitly documents what suffixes it supports.

Now, the additions:

Finally, the node will now validate these suffixes at boot time, so an unsupported value will cause the node to stop with a rabbitmq.conf validation error. The message logged will look like this:

````
2024-01-15 22:11:17.829272-05:00 [error] <0.164.0> disk_free_limit.absolute invalid, supported formats: 500MB, 500MiB, 10GB, 10GiB, 2TB, 2TiB, 10000000000
2024-01-15 22:11:17.829376-05:00 [error] <0.164.0> Error preparing configuration in phase validation:
2024-01-15 22:11:17.829387-05:00 [error] <0.164.0> - disk_free_limit.absolute invalid, supported formats: 500MB, 500MiB, 10GB, 10GiB, 2TB, 2TiB, 10000000000
````

Closes #10310	2024-01-15 22:11:57 -05:00
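To make the suffix semantics from the commit above concrete, here is a small Go sketch of a value parser. It is not RabbitMQ's implementation: the function name `parseBytes` is hypothetical, and the suffix table is an assumption assembled only from the suffixes this commit message mentions (it may not match the full list the node accepts). Multiplication overflow is not checked.

```go
package main

import (
	"fmt"
	"strconv"
)

// units maps a suffix to its multiplier. Single-letter and "B" suffixes
// are powers of 10; "i" suffixes are powers of 2. The table is an
// assumption based on the commit message, not an authoritative list.
var units = map[string]uint64{
	"":  1,
	"k": 1e3, "K": 1e3,
	"m": 1e6, "M": 1e6, "MB": 1e6,
	"g": 1e9, "G": 1e9, "GB": 1e9,
	"TB": 1e12,
	"Mi": 1 << 20, "MiB": 1 << 20,
	"Gi": 1 << 30, "GiB": 1 << 30,
	"Ti": 1 << 40, "TiB": 1 << 40,
}

// parseBytes converts values such as "500MB" or "10GiB" to a byte count.
// Unknown suffixes are rejected, which is the validation idea described
// in the commit message.
func parseBytes(s string) (uint64, error) {
	i := 0
	for i < len(s) && s[i] >= '0' && s[i] <= '9' {
		i++
	}
	if i == 0 {
		return 0, fmt.Errorf("invalid value %q: no leading digits", s)
	}
	n, err := strconv.ParseUint(s[:i], 10, 64)
	if err != nil {
		return 0, err
	}
	mult, ok := units[s[i:]]
	if !ok {
		return 0, fmt.Errorf("invalid value %q: unsupported suffix %q", s, s[i:])
	}
	return n * mult, nil
}

func main() {
	for _, v := range []string{"500MB", "500MiB", "2G", "10000000000", "10Zz"} {
		n, err := parseBytes(v)
		fmt.Println(v, n, err)
	}
}
```

With this table, `500MB` is 500,000,000 bytes while `500MiB` is 524,288,000 bytes, which is the distinction the commit makes between power-of-10 and power-of-2 suffixes.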
Ariel Otilibili	09b4be92f6	Removed ASCII escape codes if not enabled; fixes #2634 https://hexdocs.pm/elixir/IO.ANSI.html#format/2	2024-01-10 01:04:12 +01:00
Ariel Otilibili	a9e488dbed	Fix for #8557 , removed ANSI codes in JSON output Add missing newline chars	2024-01-09 09:15:14 -08:00
Luke Bakken	3f33dfa227	Add -noinput via `rabbitmq-env` Follow-up to #10131	2023-12-30 16:27:02 -08:00
Luke Bakken	cb28ffc05b	Ensure that elixir escript does not read from stdin This change ensures that you do not have to redirect `stdin` from `/dev/null` to use `rabbitmqctl` and related utilities in a `while` / `read` shell loop. References: * https://github.com/lukebakken/vesc-1073/blob/main/delete.bash#L24-L32 * https://github.com/rabbitmq/support-tools/pull/38	2023-12-14 06:13:56 -08:00
Karl Nilsson	972e78b9d3	Avoid some cluster wide calls in cluster_status command.	2023-12-13 09:28:21 +00:00
Diana Parra Corbacho	a363737337	CLI: list_deprecated_features command Lists all or used deprecated features	2023-12-11 12:51:13 +01:00
Michael Klishin	9d94048852	Avoid importing resources without specified virtual host On boot first and foremost. Log a more helpful message. See #10068 for the background.	2023-12-08 02:35:17 -05:00
Jean-Sébastien Pédron	84cede17e1	rabbit_peer_discovery: Rewrite core logic

[Why]
This work started as an effort to add peer discovery support to our Khepri integration. Indeed, as part of the task to integrate Khepri, we missed the fact that `rabbit_peer_discovery:maybe_create_cluster/1` was called from the Mnesia-specific code only. This happened even though we knew about it, because we had hit many issues caused by `join_cluster` and peer discovery using different code paths to create a cluster.

To add support for Khepri, the first version of this patch was to make the call to `rabbit_peer_discovery:maybe_create_cluster/1` from `rabbit_db_cluster` instead of `rabbit_mnesia`. To achieve that, it made sense to unify the code and simply call `rabbit_db_cluster:join/2` instead of duplicating the work.

Unfortunately, doing so highlighted another issue: the way the node to cluster with was selected. Indeed, it could cause situations where multiple clusters are created instead of one, without resorting to out-of-band counter-measures, like a 30-second delay added in the Kubernetes operator (rabbitmq/cluster-operator#1156). This problem was even more frequent when we tried to unify the code path and call `join_cluster`.

After several iterations on the patch and even more discussions with the team, we decided to rewrite the algorithm to make node selection more robust and still use `rabbit_db_cluster:join/2` to create the cluster.

[How]
This commit is only about the rewrite of the algorithm. Calling peer discovery from `rabbit_db_cluster` instead of `rabbit_mnesia` (and thus making peer discovery work with Khepri) will be done in a follow-up commit.

We wanted the new algorithm to fulfill the following properties:

1. `rabbit_peer_discovery` should provide the ability to re-trigger it easily to re-evaluate the cluster. The new public API is `rabbit_peer_discovery:sync_desired_cluster/0`.

2. The selection of the node to join should be designed in a way that all nodes select the same node, regardless of the order in which they become available. The adopted solution is to sort the list of discovered nodes with the following criteria (in that order):

    1. the size of the cluster a discovered node is part of; sorted from bigger to smaller clusters
    2. the start time of a discovered node; sorted from older to younger nodes
    3. the name of a discovered node; sorted alphabetically

    The first node in that list will not join anyone and simply proceed with its boot process. Other nodes will try to join the first node.

3. To reduce the chance of incorrectly having multiple standalone nodes because the discovery backend returned only a single node, we want to apply the following constraints to the list of nodes after it is filtered and sorted (see property 2 above):

    * The list must contain `node()` (i.e. the node running peer discovery itself).
    * If RabbitMQ's cluster size hint is greater than 1, the list must have at least two nodes. The cluster size hint is the maximum between the configured target cluster size hint and the number of elements in the nodes list returned by the backend.

    If one of the constraints is not met, the entire peer discovery process is restarted after a delay.

4. The lock is acquired only to protect the actual join, not the discovery step where the backend is queried to get the list of peers. With the node selection described above, this lets the first node start without acquiring the lock.

5. The cluster membership views queried as part of the algorithm to sort the list of nodes will be used to detect additional clusters or standalone nodes that did not cluster correctly. These nodes will be asked to re-evaluate peer discovery to increase the chance of forming a single cluster.

6. After some delay, peer discovery will be re-evaluated to further eliminate the chances of having multiple clusters instead of one.

This commit covers properties from point 1 to point 4. Remaining properties will be the scope of additional pull requests after this one works.

If there is a failure at any point during discovery, filtering/sorting, locking or joining, the entire process is restarted after a delay. This is configured using the following parameters:

* cluster_formation.discovery_retry_limit
* cluster_formation.discovery_retry_interval

The default parameters were bumped to 30 retries with a delay of 1 second between each.

The locking retries/interval parameters are not used by the new algorithm anymore.

There are extra minor changes that come with the rewrite:

* The configured backend is cached in a persistent term. The goal is to make sure we use the same backend throughout the entire process and when we call `maybe_unregister/0` even if the configuration changed for whatever reason in between.
* `maybe_register/0` is called from `rabbit_db_cluster` instead of at the end of a successful peer discovery process. `rabbit_db_cluster` had to call `maybe_register/0` if the node was not virgin anyway. So make it simpler and always call it in `rabbit_db_cluster` regardless of the state of the node.
* `log_configured_backend/0` is gone. `maybe_init/0` can log the backend directly. There is no need to explicitly call another function for that.
* Messages are logged using `?LOG_*()` macros instead of the old `rabbit_log` module.	2023-12-07 15:51:54 +01:00
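The node selection rule from properties 2 and 3 of the commit above can be sketched as a deterministic sort followed by two sanity checks. The Go code below is an illustration only, not the Erlang implementation in `rabbit_peer_discovery`: the `peer` type, the `selectSeed` function and the sample node names are hypothetical, and the sketch assumes no filtering happens, so the list returned by the backend and the sorted list have the same length.

```go
package main

import (
	"fmt"
	"sort"
)

// peer is a hypothetical view of one discovered node.
type peer struct {
	Name        string
	ClusterSize int   // size of the cluster the node already belongs to
	StartTime   int64 // node start time, e.g. Unix milliseconds
}

// selectSeed sorts peers by cluster size (bigger first), then start time
// (older first), then name (alphabetically), and returns the first node.
// ok is false when the constraints from property 3 are not met, meaning
// discovery should be retried after a delay.
func selectSeed(self string, peers []peer, configuredHint int) (seed string, ok bool) {
	sorted := append([]peer(nil), peers...) // do not reorder the caller's slice
	sort.Slice(sorted, func(i, j int) bool {
		a, b := sorted[i], sorted[j]
		if a.ClusterSize != b.ClusterSize {
			return a.ClusterSize > b.ClusterSize
		}
		if a.StartTime != b.StartTime {
			return a.StartTime < b.StartTime
		}
		return a.Name < b.Name
	})

	containsSelf := false
	for _, p := range sorted {
		if p.Name == self {
			containsSelf = true
		}
	}

	// Cluster size hint: the larger of the configured hint and the number
	// of nodes returned by the backend.
	hint := configuredHint
	if len(peers) > hint {
		hint = len(peers)
	}
	if !containsSelf || (hint > 1 && len(sorted) < 2) {
		return "", false
	}
	return sorted[0].Name, true
}

func main() {
	peers := []peer{
		{Name: "rabbit@a", ClusterSize: 1, StartTime: 100},
		{Name: "rabbit@b", ClusterSize: 3, StartTime: 200},
		{Name: "rabbit@c", ClusterSize: 3, StartTime: 150},
	}
	seed, ok := selectSeed("rabbit@a", peers, 3)
	fmt.Println(seed, ok) // rabbit@c true

	// A single discovered node with a hint of 3 fails the constraints.
	_, ok = selectSeed("rabbit@a", peers[:1], 3)
	fmt.Println(ok) // false
}
```

Because every node sorts the same data with the same total order, each node arrives at the same first entry. A node that finds itself first simply continues booting, and the others join it.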
Michael Klishin	625edededc	Update tests	2023-11-24 16:09:41 -05:00
Michael Klishin	5caad16366	Make 'ctl node_health_check' a no-op It was deprecated over three years ago [1] (before RabbitMQ 3.9). 1. `e01753ed7a`	2023-11-23 18:18:25 -05:00
Michael Klishin	1b642353ca	Update (c) according to [1] 1. https://investors.broadcom.com/news-releases/news-release-details/broadcom-and-vmware-intend-close-transaction-november-22-2023	2023-11-21 23:18:22 -05:00
Michal Kuratczyk	32eba980f0	CLI: prune_code_paths: false	2023-11-17 13:09:31 +01:00
Michal Kuratczyk	93bb847798	Revert "Drop JSON from mix.exs #9926 #9932" This reverts commit `8de8e0c18a`.	2023-11-17 13:08:10 +01:00
Michal Kuratczyk	db56f662af	Revert "Bazel bits for #9926" This reverts commit `8d402e75b7`.	2023-11-17 13:07:28 +01:00
Michal Kuratczyk	5459975c5c	Revert "Use rabbit_json in the CLI" This reverts commit `6c65715875`.	2023-11-17 13:05:38 +01:00
Michael Klishin	8de8e0c18a	Drop JSON from mix.exs #9926 #9932	2023-11-16 10:05:30 -05:00
Michael Klishin	8d402e75b7	Bazel bits for #9926	2023-11-16 09:46:19 -05:00
Michal Kuratczyk	6c65715875	Use rabbit_json in the CLI dep_json doesn't seem to be maintained and it's unnecessary anyway, since we can use rabbit_json, which uses thoas	2023-11-16 00:01:14 +01:00
Rin Kuryloski	d81bfb46c0	Another attempt to fix the missing rabbit_framing.hrl error when running the cli tests copies rather than symlinks files into the ERL_LIBS dir when building the cli	2023-11-15 09:51:05 +01:00
Michael Klishin	229a9fb7bd	list_policies_that_match: correctly format 'not found' errors as JSON	2023-11-13 21:35:48 -05:00
Michael Klishin	1ff5f6077b	mix format	2023-11-13 21:18:27 -05:00
Michael Klishin	8a76e903a3	One more test renaming to follow CLI conventions	2023-11-13 20:46:31 -05:00
Michael Klishin	fd0488516b	diagnostics list_policies_that_match: support JSON formatting	2023-11-13 20:46:06 -05:00
Michael Klishin	2ebc23ef23	Use a standard CLI test suite file naming convention	2023-11-13 19:51:58 -05:00
Michael Klishin	c4db560e0e	CLI: mix format	2023-11-13 11:21:29 -05:00
Michal Kuratczyk	408c33ec49	Add list_policies_that_match command	2023-11-13 13:47:54 +01:00
Michal Kuratczyk	b2c01e3e8e	Remove dialyxir from bazel	2023-11-10 15:37:11 +01:00
Michal Kuratczyk	1ffae77442	Bump CLI deps, remove dialyxir	2023-11-10 15:31:12 +01:00
Michael Klishin	4e58ad9dad	CLI: two more test suites	2023-11-08 16:49:53 -05:00
Michael Klishin	a88c22144d	CLI: mix format	2023-11-06 23:03:41 -05:00
Michael Klishin	114f9b90c9	CLI: refactor 'diagnostics check_if_any_deprecated_features_are_used'	2023-11-06 22:50:35 -05:00
Diana Parra Corbacho	51783e9464	CLI: check if any deprecated features are used returns just the list of features	2023-11-06 17:45:11 +01:00
Michael Klishin	cbe2756cbd	CLI: tests and refactoring for 'diagnostics check_if_cluster_has_classic_queue_mirroring_policy'	2023-11-06 07:20:08 -05:00
Michael Klishin	827b495d1a	CLI: wording	2023-11-06 07:20:08 -05:00
Michael Klishin	b725bba735	CLI: add a test for 'ctl remove_classic_queue_mirroring_from_policies'	2023-11-06 07:20:08 -05:00
Diana Parra Corbacho	6288e9aa2c	CTL: check if any deprecated features are used command	2023-11-06 07:20:08 -05:00
Diana Parra Corbacho	13e88ced92	Use Diagnostics group	2023-11-06 07:20:08 -05:00
Michael Klishin	ae934d9ebc	CLI: mix format	2023-11-06 07:20:08 -05:00
Diana Parra Corbacho	70c97be06c	CLI: command to remove all classic queue mirroring policies	2023-11-06 07:20:08 -05:00
Diana Parra Corbacho	a06698d43d	CLI: command to check if cluster has a classic queue mirroring policy	2023-11-06 07:20:08 -05:00
Diana Parra Corbacho	8df94cc13c	CLI: commands to list policies / operator policies with CMQ rabbitmq-queues list_operator_policies_with_classic_queue_mirroring rabbitmq-queues list_policies_with_classic_queue_mirroring	2023-11-06 07:20:08 -05:00
Jean-Sébastien Pédron	b15eb0ff1b	rabbit_db: `join/2` now takes care of stopping/starting RabbitMQ [Why] Up until now, a user had to run the following three commands to expand a cluster: 1. stop_app 2. join_cluster 3. start_app Stopping and starting the `rabbit` application and taking care of the underlying Mnesia application could be handled by `join_cluster` directly. [How] After the call to `can_join/1` and before proceeding with the actual join, the code remembers the state of `rabbit`, the Feature flags controller and Mnesia. After the join, it restarts whatever needs to be restarted. It does so regardless of the success or failure of the join. One exception is when the node switched from Mnesia to Khepri as part of that join. In this case, Mnesia is left stopped.	2023-10-26 11:22:47 +02:00

2199 Commits