* Call `Mod:format_state/1` if exported to possibly truncate huge states
* Add more information about truncated ram_pending_ack and disk_pending_ack
* Add `log.error_logger_format_depth` cuttlefish schema value
* Add `format_state/1` to `rabbit_channel`
* Add `log.summarize_process_state`, default is `false`, to enable summarizing process state for crash logs.
* Add `format_state/1` to `rabbit_classic_queue_index_v2` and `rabbit_classic_queue_store_v2`
* Ensure `rabbit_channel:format_state/1` uses `summarize_process_state_when_logged`
* Do not set `summarize_process_state_when_logged` value by default.
* Type specs
# What?
* Support Direct Reply-To for AMQP 1.0
* Compared to AMQP 0.9.1, this PR allows for multiple volatile queues on a single
AMQP 1.0 session. Use case: JMS clients can create multiple temporary queues on
the same JMS/AMQP session:
* https://jakarta.ee/specifications/messaging/3.1/apidocs/jakarta.messaging/jakarta/jms/session#createTemporaryQueue()
* https://jakarta.ee/specifications/messaging/3.1/apidocs/jakarta.messaging/jakarta/jms/jmscontext#createTemporaryQueue()
* Fix missing metrics for Direct Reply-To in AMQP 0.9.1, e.g.
`messages_delivered_total`
* Fix missing metrics (even without using Direct Reply-To) in AMQP 0.9.1:
If stats level is not `fine`, global metrics `rabbitmq_global_messages_delivered_*` should still be incremented.
# Why?
* Allow for scalable at-most-once RPC reply delivery
Example use case: thousands of requesters connect, send a single
request, wait for a single reply, and disconnect.
This PR won't create any queue and won't write to the metadata store.
Therefore, there's less pressure on the metadata store, less pressure
on the Management API when listing all queues, less pressure on the
metrics subsystem, etc.
* Feature parity with AMQP 0.9.1
# How?
This PR extracts the previously channel-specific Direct Reply-To code
into a new queue type: `rabbit_volatile_queue`.
"Volatile" describes the semantics, not a use-case. It signals non-durable,
zero-buffer, at-most-once, may-drop, and "not stored in Khepri."
This new queue type is then used for AMQP 1.0 and AMQP 0.9.1.
Sending to the volatile queue is stateless, as was previously the case with Direct Reply-To in
AMQP 0.9.1 and as is done for the MQTT QoS 0 queue.
This allows for use cases where a single responder replies to e.g. 100k different requesters.
RabbitMQ will automatically grant new link-credit to the responder because the new queue type confirms immediately.
The key gets implicitly checked by the channel/session:
If the queue name (including the key) doesn’t exist, the `handle_event` callback for this queue isn’t invoked and therefore
no delivery will be sent to the responder.
This commit supports Direct Reply-To across AMQP 1.0 and 0.9.1. In other
words, the requester can be an AMQP 1.0 client while the responder is an
AMQP 0.9.1 client or vice versa.
RabbitMQ will internally convert between AMQP 0.9.1 `reply_to` and AMQP
1.0 `/queues/<queue>` address. The AMQP 0.9.1 `reply_to` property is
expected to contain a queue name. That's in line with the AMQP 0.9.1
spec:
> One of the standard message properties is Reply-To, which is designed
specifically for carrying the name of reply queues.
Compared to AMQP 0.9.1 where the requester sets the `reply_to` property
to `amq.rabbitmq.reply-to` and RabbitMQ modifies this field when
forwarding the message to the request queue, in AMQP 1.0 the requester
learns about the queue name from the broker at link attachment time.
The requester has to set the reply-to property to the server generated
queue name. That's because the server isn't allowed to modify the bare
message.
During link attachment time, the client has to set certain fields.
These fields are expected to be set by the RabbitMQ client libraries.
Here is an Erlang example:
```erl
Source = #{address => undefined,
           durable => none,
           expiry_policy => <<"link-detach">>,
           dynamic => true,
           capabilities => [<<"rabbitmq:volatile-queue">>]},
AttachArgs = #{name => <<"receiver">>,
               role => {receiver, Source, self()},
               snd_settle_mode => settled,
               rcv_settle_mode => first},
{ok, Receiver} = amqp10_client:attach_link(Session, AttachArgs),
AddressReplyQ = receive {amqp10_event, {link, Receiver, {attached, Attach}}} ->
                    #'v1_0.attach'{source = #'v1_0.source'{address = {utf8, Addr}}} = Attach,
                    Addr
                end,
```
The client then sends the message by setting the reply-to address as
follows:
```erl
amqp10_client:send_msg(
  SenderRequester,
  amqp10_msg:set_properties(
    #{message_id => <<"my ID">>,
      reply_to => AddressReplyQ},
    amqp10_msg:new(<<"tag">>, <<"request">>))),
```
If the responder attaches to the queue target in the reply-to field,
RabbitMQ will check if the requester link is still attached. If the
requester detached, the link will be refused.
The responder can also attach to the anonymous null target and set the
`to` field to the `reply-to` address.
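For illustration, assuming `SenderResponder` is a sender link attached to the anonymous null
target and `ReplyTo` is the reply-to address taken from the request, the responder could send
a reply like this (property values are illustrative):
```erl
amqp10_client:send_msg(
  SenderResponder,
  amqp10_msg:set_properties(
    #{to => ReplyTo,
      correlation_id => <<"my ID">>},
    amqp10_msg:new(<<"reply-tag">>, <<"reply">>))),
```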
If RabbitMQ cannot deliver a reply, instead of buffering the reply,
RabbitMQ will drop the reply and increment the following Prometheus metric:
```
rabbitmq_global_messages_dead_lettered_maxlen_total{queue_type="rabbit_volatile_queue",dead_letter_strategy="disabled"} 0.0
```
That's in line with the MQTT QoS 0 queue type.
A reply message could be dropped for a variety of reasons:
1. The requester ran out of link-credit. It's therefore the requester's
responsibility to grant sufficient link-credit on its receiving link.
2. RabbitMQ isn't allowed to deliver any message due to session flow
control. It's the requester's responsibility to keep the session window
large enough.
3. The requester doesn't consume messages fast enough, causing TCP
backpressure to be applied, or the RabbitMQ AMQP writer proc isn't
scheduled quickly enough. The latter can happen, for example, if
RabbitMQ runs with a single scheduler (i.e. is assigned a single CPU
core). In either case, RabbitMQ internal flow control causes the
volatile queue to drop messages.
Therefore, if high throughput is required while message loss is undesirable, a classic queue should be used
instead of a volatile queue since the former buffers messages while the
latter doesn't.
The main difference between the volatile queue and the MQTT QoS 0 queue
is that the former isn't written to the metadata store.
# Breaking Change
Prior to this PR the following [documented caveat](https://www.rabbitmq.com/docs/4.0/direct-reply-to#limitations) applied:
> If the RPC server publishes with the mandatory flag set then `amq.rabbitmq.reply-to.*`
is treated as **not** a queue; i.e. if the server only publishes to this name then the message
will be considered "not routed"; a `basic.return` will be sent if the mandatory flag was set.
This PR removes this caveat.
This PR introduces the following new behaviour:
> If the RPC server publishes with the mandatory flag set, then `amq.rabbitmq.reply-to.*`
is treated as a queue (assuming this queue name is encoded correctly). However,
whether the requester is still there to consume the reply is not checked at routing time.
In other words, if the RPC server only publishes to this name, then the message will be
considered "routed" and RabbitMQ will therefore not send a `basic.return`.
When validation fails for a policy parameter, the resulting popup can't
be read due to an extra binary encoding as well as code that escapes
HTML entities. Since the EJS template uses `<%= %>` for the popup, it will
display the text as-is, and not render any HTML.
https://github.com/erlang/otp/issues/9739
In OTP 28+, splitting an empty string returns an empty list, not an empty
string (the input).
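The affected call site isn't shown here; purely as an illustration, code that splits
possibly-empty input can be written to accept both return shapes:
```erl
%% Illustrative only: tolerate both the pre-OTP-28 shape ([""]) and the
%% OTP 28+ shape ([]) when the input string is empty.
split_nonempty(S, Sep) ->
    case string:split(S, Sep, all) of
        [] -> [];
        Parts -> [P || P <- Parts, P =/= ""]
    end.
```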
Additionally, the `street-address` macro was removed in OTP 28, so it is
replaced with the value it used to expand to.
Lastly, rabbitmq_auth_backend_oauth2 has an MQTT test, so rabbitmq_mqtt is
added to TEST_DEPS.
[Why]
They were moved from `rabbit` to `rabbit_common` several years ago to
solve a dependency issue because `amqp_client` depended on the file
handle cache. This is not the case anymore.
[How]
The modules are moved back to `rabbit`.
`rabbit_common` doesn't need to depend on `os_mon` anymore. `rabbit`
already depends on it, so no changes needed here.
`include/rabbit_memory.hrl` and some test cases are moved as well to
follow the `vm_memory_monitor` module.
* Redesigned k8s peer discovery
Rather than querying the Kubernetes API, just check the local node name
and try to connect to the pod with `-0` suffix (or configured
`ordinal_start` value). Only the pod with the lowest ordinal can form
a new cluster - all other pods will wait forever.
This should prevent any race conditions and incorrectly formed clusters.
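A minimal sketch of deriving the seed node from the local node name (the function name and
regex are illustrative, not the actual implementation):
```erl
%% Replace this pod's ordinal with the configured start ordinal (default 0)
%% to obtain the only node that is allowed to form a new cluster.
seed_node(OrdinalStart) ->
    NodeBin = atom_to_binary(node(), utf8),
    %% e.g. <<"rabbit@myrabbit-server-2.myrabbit-nodes.default">>
    {match, [Prefix, _Ordinal, Suffix]} =
        re:run(NodeBin, <<"^([^@]+@[^.]*-)(\\d+)(\\..*|)$">>,
               [{capture, all_but_first, binary}]),
    binary_to_atom(<<Prefix/binary,
                     (integer_to_binary(OrdinalStart))/binary,
                     Suffix/binary>>, utf8).
```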
`rabbit_binary_generator:map_exception/3` will crash when there are
unicode characters in the `explanation` field of the `Reason#amqp_error`
parameter. The explanation string (list) is assumed to be ASCII, with
each character/member in the range of a byte. Any unicode characters
in the string will trigger a `badarg` crash of `list_to_binary/1` in
`rabbit_binary_generator:amqp_exception_explanation/2`.
An AMQP 0.9.1 shovel crash caused by this was reported in
https://github.com/rabbitmq/rabbitmq-server/discussions/12874
When a queue used as a shovel source/destination does not exist and its
name contains non-ASCII characters, the explanation of the `amqp_error`
will be something like `no queue non_ascii_name_😍 in vhost /`. It will
subsequently crash and even affect the management console.
To fix this, `unicode:characters_to_binary/1` is used instead of
`list_to_binary/1`, and unicode-safe truncation of long explanations via
the `chars_limit` option of `io_lib:format/3` replaces direct truncation of bytes.
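A minimal sketch of the unicode-safe conversion (the character limit below is illustrative):
```erl
%% Format with a chars_limit so long explanations are truncated on character
%% boundaries, then convert the result to a UTF-8 binary.
explanation_to_binary(Explanation) ->
    Limited = io_lib:format("~ts", [Explanation], [{chars_limit, 256}]),
    unicode:characters_to_binary(Limited).
```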
Expose the same metrics for AMQP 1.0 connections as for AMQP 0.9.1 connections.
Display the following AMQP 1.0 metrics on the Management UI:
* Network bytes per second from/to client on connections page
* Number of sessions/channels on connections page
* Network bytes per second from/to client graph on connection page
* Reductions graph on connection page
* Garbage collection info on connection page
Expose the following AMQP 1.0 per-object Prometheus metrics:
* rabbitmq_connection_incoming_bytes_total
* rabbitmq_connection_outgoing_bytes_total
* rabbitmq_connection_process_reductions_total
* rabbitmq_connection_incoming_packets_total
* rabbitmq_connection_outgoing_packets_total
* rabbitmq_connection_pending_packets
* rabbitmq_connection_channels
The rabbit_amqp_writer proc:
* notifies the rabbit_amqp_reader proc if it sent frames
* hibernates eventually if it doesn't send any frames
The rabbit_amqp_reader proc:
* does not emit stats (update ETS tables) if no frames are received
or sent, to save resources when there are many idle connections.
Provides a specific function to fix client SSL options, i.e. apply all
fixes that were applied to TLS listeners and clients in previous
versions, but also set the `cacerts` option to the CA certificates obtained by
`public_key:cacerts_get/0`, only when no `cacertfile` or `cacerts` are
provided.
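A hedged sketch of that behaviour (the function name is illustrative):
```erl
%% Only add the OS trust store CAs when the caller supplied neither
%% cacertfile nor cacerts.
maybe_add_cacerts(SslOpts) ->
    case proplists:is_defined(cacertfile, SslOpts) orelse
         proplists:is_defined(cacerts, SslOpts) of
        true  -> SslOpts;
        false -> [{cacerts, public_key:cacerts_get()} | SslOpts]
    end.
```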
OTP 27 reset all assumptions on how the VM reacts to processes that
buffer and process a lot of large binaries.
Substantially increasing the vheap sizes for such processes restores
most of the same performance by allowing processes to hold more binary
data before major garbage collections are triggered.
This introduces a new module to capture process flag configurations.
The new vheap sizes are only applied when running on OTP 27 or
above.
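Purely as an illustration (the flag value below is a placeholder, not the size RabbitMQ
actually picks), such a configuration could be applied like this:
```erl
%% Raise the binary virtual heap of the calling process, but only on OTP 27+.
maybe_raise_bin_vheap() ->
    case list_to_integer(erlang:system_info(otp_release)) >= 27 of
        true  -> erlang:process_flag(min_bin_vheap_size, 46422);  %% words
        false -> ok
    end.
```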
[Why]
Before this patch, the $RABBITMQ_FEATURE_FLAGS environment variable took
an exhaustive list of feature flags to enable. This list overrode the
default of enabling all stable feature flags.
This made it inconvenient when a user wanted to enable an experimental
feature flag like `khepri_db` while otherwise keeping the default behavior.
[How]
$RABBITMQ_FEATURE_FLAGS now accepts the following syntax:
RABBITMQ_FEATURE_FLAGS=+feature1,-feature2
This will start RabbitMQ with all stable feature flags, plus `feature1`,
but without `feature2`.
For users setting `forced_feature_flags_on_init` in the config, the
corresponding syntax is:
{forced_feature_flags_on_init, {rel, [feature1], [feature2]}}
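For example, in `advanced.config` (the flag names other than `khepri_db` are illustrative):
```erl
[{rabbit,
  [{forced_feature_flags_on_init,
    {rel,
     [khepri_db],          %% enable on top of all stable feature flags
     [some_other_flag]}}   %% explicitly disable
  ]}].
```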
The prior code skirted transactions because the filter function might
cause Khepri to call itself. We want to use the same idea as the old
code - get all queues, filter them, then delete them - but we want to
perform the deletion in a transaction and fail the transaction if any
queues changed since we read them.
This fixes a bug - that the call to `delete_in_khepri/2` could return
an error tuple that would be improperly recognized as `Deletions` -
but should also make deleting transient queues atomic and fast.
Each call to `delete_in_khepri/2` needed to wait on Ra to replicate
because the deletion is an individual command sent from one process.
Performing all deletions at once means we only need to wait for one
command to be replicated across the cluster.
We also bubble up any errors from the delete operation now rather than storing them as
deletions. This fixes a crash that occurs on node down when Khepri is
in a minority.
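A hedged sketch of the read-filter-then-transact pattern described above, assuming the
`khepri_tx` API (`khepri_tx:get/1`, `khepri_tx:delete/1`, `khepri_tx:abort/1`) and a
`rabbit_khepri:transaction/2` wrapper; paths, values and the function name are illustrative:
```erl
delete_transient_in_tx(QueuesWithPaths) ->
    rabbit_khepri:transaction(
      fun() ->
              lists:foreach(
                fun({Path, QueueAsRead}) ->
                        case khepri_tx:get(Path) of
                            {ok, QueueAsRead} -> ok = khepri_tx:delete(Path);
                            %% the queue changed since we read it: fail the tx
                            _Changed          -> khepri_tx:abort(queue_changed)
                        end
                end, QueuesWithPaths)
      end, rw).
```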
Add aggregated copies of some per-object metrics that are currently labeled
per-channel, to reduce cardinality. These metrics are valuable and
easier to process if exposed on a per-exchange and per-queue basis.
* Deprecate queue-master-locator
This should not be a breaking change - all validation should still pass
* CQs can now use `queue-leader-locator`
* `queue-leader-locator` takes precedence over `queue-master-locator` if both are used
* regardless of which name is used, effectively there are only two values: `client-local` (default) or `balanced`
* other values (`min-masters`, `random`, `least-leaders`) are mapped to `balanced`
* Management UI no longer shows `master-locator` fields when declaring a queue/policy, but such arguments can still be used manually (unless not permitted)
* exclusive queues are always declared locally, as before
They are no longer used.
This removes a couple of file_handle_cache:info/1 calls.
We are not removing them from the HTTP API to avoid
breaking things unintentionally.
Stats were not removed, including management UI stats
relating to FDs.
Web-MQTT and Web-STOMP configuration relating to FHC
were not removed.
The file_handle_cache itself must be kept until we
remove CQv1.
It will always use the ETS index. This change lets us
do optimisations that would otherwise not be possible,
including 81b2c39834953d9e1bd28938b7a6e472498fdf13.
A small functional change is included in this commit:
we now always use ets:update_counter to update the
ref_count, instead of a mix of update_{counter,fields}.
When upgrading to 4.0, the index will be rebuilt for
all users that were using a custom index module.
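For reference, the `ets:update_counter` pattern referred to above looks like this (the table
layout is illustrative, not the actual index schema):
```erl
%% Column 3 plays the role of ref_count here.
Tab = ets:new(example_index, [set]),
true = ets:insert(Tab, {<<"msg-id">>, some_location, 1}),
2 = ets:update_counter(Tab, <<"msg-id">>, {3, 1}).
```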
## What?
Introduce RabbitMQ internal flow control for messages sent to AMQP
clients.
Prior to this PR, when an AMQP client granted a large amount of link
credit (e.g. 100k) to the sending queue, the sending queue sent
that amount of messages to the session process no matter what.
This becomes problematic for memory usage when the session process
cannot send out messages fast enough to the AMQP client, especially if
1. The writer proc cannot send fast enough. This can happen when
the AMQP client does not receive fast enough and causes TCP
back-pressure to the server. Or
2. The server session proc is limited by remote-incoming-window.
Both scenarios are now added as test cases.
Tests
* tcp_back_pressure_rabbitmq_internal_flow_quorum_queue
* tcp_back_pressure_rabbitmq_internal_flow_classic_queue
cover scenario 1.
Tests
* incoming_window_closed_rabbitmq_internal_flow_quorum_queue
* incoming_window_closed_rabbitmq_internal_flow_classic_queue
cover scenario 2.
This PR sends messages from queues to AMQP clients in a more controlled
manner.
To illustrate:
```
make run-broker PLUGINS="rabbitmq_management" RABBITMQ_SERVER_ADDITIONAL_ERL_ARGS="+S 4"
observer_cli:start()
mq
```
where `mq` sorts by message queue length.
Create a stream:
```
deps/rabbitmq_management/bin/rabbitmqadmin declare queue name=s1 queue_type=stream durable=true
```
Next, send and receive from the Stream via AMQP.
Grant a large number of link credit to the sending stream:
```
docker run -it --rm --add-host host.docker.internal:host-gateway ssorj/quiver:latest
bash-5.1# quiver --version
quiver 0.4.0-SNAPSHOT
bash-5.1# quiver //host.docker.internal//queue/s1 --durable -d 30s --credit 100000
```
**Before** this PR:
```
RESULTS
Count ............................................... 100,696 messages
Duration ............................................... 30.0 seconds
Sender rate ......................................... 120,422 messages/s
Receiver rate ......................................... 3,363 messages/s
End-to-end rate ....................................... 3,359 messages/s
```
We observe that all 100k link credit worth of messages are buffered in the
writer proc's mailbox:
```
|No | Pid | MsgQueue |Name or Initial Call | Memory | Reductions |Current Function |
|1 |<0.845.0> |100001 |rabbit_amqp_writer:init/1 | 126.0734 MB| 466633491 |prim_inet:send/5 |
```
**After** this PR:
```
RESULTS
Count ............................................. 2,973,440 messages
Duration ............................................... 30.0 seconds
Sender rate ......................................... 123,322 messages/s
Receiver rate ........................................ 99,250 messages/s
End-to-end rate ...................................... 99,148 messages/s
```
We observe that the message queue lengths of both writer and session
procs are low.
## How?
Our goal is to have queues send out messages in a controlled manner
without overloading RabbitMQ itself.
We want RabbitMQ internal flow control between:
```
AMQP writer proc <--- session proc <--- queue proc
```
A similar concept exists for classic queues sending via AMQP 0.9.1.
We want an approach that applies to AMQP and works generically for all queue
types.
For the interaction between the AMQP writer proc and the session proc, we use a
simple credit-based approach reusing the `credit_flow` module.
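Roughly, the `credit_flow` pattern between the two processes looks like this (call sites,
message shapes, and helper names are illustrative, not the actual module code):
```erl
%% Session proc (sender side): account for each frame handed to the writer
%% and check whether the writer has exerted back-pressure.
maybe_send_to_writer(WriterPid, Frame) ->
    case credit_flow:blocked() of
        true  -> {stashed, Frame};          %% keep it in outgoing-pending
        false -> credit_flow:send(WriterPid),
                 WriterPid ! {send, Frame},
                 sent
    end.

%% Writer proc (receiver side): grant credit back once the frame went out.
after_frame_written(SessionPid) ->
    credit_flow:ack(SessionPid).
```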
For the interaction between session proc and queue proc, the following options
exist:
### Option 1
The session process provides explicit feedback to the queue after it
has sent N messages.
This approach is implemented in
https://github.com/ansd/rabbitmq-server/tree/amqp-flow-control-poc-1
and works well.
A new `rabbit_queue_type:sent/4` API was added which lets the queue proc know
that it can send further messages to the session proc.
Pros:
* Will work equally well for AMQP 0.9.1, e.g. when quorum queues send messages
in auto ack mode to AMQP 0.9.1 clients.
* Simple for the session proc
Cons:
* Slightly added complexity in every queue type implementation
* Multiple Ra commands (settle, credit, sent) to decide when a quorum
queue sends more messages.
### Option 2
A dual-link approach where two AMQP links exist between
```
AMQP client <---link--> session proc <---link---> queue proc
```
When the client grants a large amount of credits, the session proc will
top up credits to the queue proc periodically in smaller batches.
Pros:
* No queue type modifications required.
* Re-uses AMQP link flow control
Cons:
* Significant added complexity in the session proc. A client can
dynamically decrease or increase credits and dynamically change the drain
mode while the session tops up credit to the queue.
### Option 3
Credit is a 32-bit unsigned integer.
The spec mandates that the receiver independently chooses a credit.
Nothing in the spec prevents the receiver from choosing a credit of 1 billion.
However the credit value is merely a **maximum**:
> The link-credit variable defines the current maximum legal amount that the delivery-count can be increased by.
Therefore, the server is not required to send all available messages to this
receiver.
For delivery-count:
> Only the sender MAY independently modify this field.
"independently" could be interpreted as the sender could add to the delivery-count
irrespective of what the client chose for drain and link-credit.
Option 3: The queue proc could, at credit time, already consume credit
and advance the delivery-count if the credit is too large, before checking out any messages.
For example, if credit is 100k but the queue only wants to send 1k, the queue could
consume 99k of the credits, advance the delivery-count, and subsequently send at most 1k messages.
If the queue advanced the delivery-count, RabbitMQ must send a FLOW to the receiver,
otherwise the receiver wouldn’t know that it ran out of link-credit.
Pros:
* Very simple
Cons:
* Possibly unexpected behaviour for receiving AMQP clients
* Possibly poor end-to-end throughput in auto-ack mode because the queue
would send a batch of messages followed by a FLOW containing the advanced
delivery-count. Only thereafter will the client learn that it ran out of
credits and top up again. This feels like synchronously pulling a batch
of messages. In contrast, option 2 sends out more messages as soon as
the previous messages have left RabbitMQ, without requiring another credit
top-up from the receiver.
* Drain mode with large credits requires the queue to send all available
messages and only thereafter advance the delivery-count. Therefore,
drain mode breaks option 3 somewhat.
### Option 4
Session proc drops message payload when its outgoing-pending queue gets
too large and re-reads payloads from the queue once the message can be
sent (see `get_checked_out` Ra command for quorum queues).
Cons:
* Would need to be implemented for every queue type, especially classic queues
* Doesn't limit the amount of message metadata in the session proc's
outgoing-pending queue
### Decision: Option 2
This commit implements option 2 to avoid any queue type modification.
At most one credit request is in-flight between session process and
queue process for a given queue consumer.
If the AMQP client sends another FLOW in between, the session proc
stashes the FLOW until it processes the previous credit reply.
A delivery is only sent from the outgoing-pending queue if the
session proc is not blocked by
1. writer proc, or
2. remote-incoming-window
The credit reply is placed into the outgoing-pending queue.
This ensures that the session proc will only top up the next batch of
credits if sufficient messages were sent out to the writer proc.
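A minimal sketch of the "at most one in-flight credit request" bookkeeping described above
(map keys and the stubbed helper are illustrative):
```erl
%% Illustrative stub: in the real code this issues a credit request towards
%% the queue process via the queue type interface.
request_credit_from_queue(_Flow) -> ok.

handle_client_flow(Flow, #{credit_req_in_flight := true} = Consumer) ->
    %% a credit request is already on its way to the queue: stash this FLOW
    Consumer#{stashed_flow => Flow};
handle_client_flow(Flow, Consumer) ->
    ok = request_credit_from_queue(Flow),
    Consumer#{credit_req_in_flight => true}.
```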
A future commit could additionally have each queue limit the number of
unacked messages for a given AMQP consumer, or alternatively make use
of session outgoing-window.
Put the `credit_flow_default_credit` configuration into a persistent term so
that the tuple doesn't have to be copied on the hot path.
Also, change persistent term keys from `{rabbit, AtomKey}` to `AtomKey`
so that hashing becomes cheaper.
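Illustratively (the fallback tuple below is a placeholder, not necessarily the configured default):
```erl
%% At boot: cache the configured tuple under a plain atom key.
Default = application:get_env(rabbit, credit_flow_default_credit, {400, 200}),
persistent_term:put(credit_flow_default_credit, Default),

%% On the hot path: a cheap lookup, no tuple copied from the application env.
{InitialCredit, _MoreCreditAfter} =
    persistent_term:get(credit_flow_default_credit),
```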
In certain shutdown scenarios this function on
Erlang 26 runs into exceptions that stem from
application_controller.
The objective of this function is to be
an exception-safe version of
application:which_applications/1, so let's
handle more cases.
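A hedged sketch of such a wrapper (the catch-all here may be broader than what the real helper does):
```erl
%% Return [] instead of letting application_controller exceptions propagate,
%% e.g. during system shutdown.
which_applications() ->
    try
        application:which_applications(infinity)
    catch
        _:_ -> []
    end.
```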
This helps certain test suites avoid exceptions
(process crash reports) logged during shutdown,
which make CT helpers fail the test run even
though there were no exceptions in RabbitMQ
itself; all the exception indicates is a
certain edge case (during system shutdown)
that application_controller does not care to handle.
[Why]
This process failed to properly implement the OTP principles. For
instance, the main loop always kept a reference to the module because it
was not tail-recursive.
This prevents the module from being reloaded at runtime: because the
process always keeps that reference to the module, it is killed by the
code server as part of code reloading.
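The usual fix is a fully-qualified, tail-recursive loop; an illustrative contrast (not the
actual module):
```erl
%% Problematic shape: the recursive call is not in tail position, so the
%% process always has a frame referring to the current module version. When
%% the module is loaded twice, the code server kills the process.
loop_old(State) ->
    Msg = receive M -> M end,
    loop_old(handle(Msg, State)),
    ok.

%% Reload-friendly shape: a fully-qualified tail call (loop_new/1 must be
%% exported); the process switches to newly loaded code on the next iteration.
loop_new(State) ->
    Msg = receive M -> M end,
    ?MODULE:loop_new(handle(Msg, State)).

handle(_Msg, State) -> State.   %% illustrative stub
```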
[Why]
Before, the backend would always return a list of nodes and the
subsystem would select one based on their uptimes, the nodes they are
already clustered with, and the readiness of their database.
This works well in general but has some limitations. For instance with
the Consul backend, the discoverability of nodes depends on when each
one registered and in which order. Therefore, the node with the highest
uptime might not be the first that registers. In this case, the one that
registers first will only discover itself and boot as a standalone node.
However, the one with the highest uptime that registered after will
discover both nodes. It will then select itself as the node to join
because it has the highest uptime. In the end both nodes form distinct
clusters.
Another example is the Kubernetes backend. The current solution works
fine but it could be optimized: the backend knows we always want to join
the first node ("$node-0") regardless of the order in which they are
started because picking the first node alphabetically is fine.
Therefore we want to let the backend select the node to join if it
wants.
[How]
The `list_nodes()` callback can now return the following term:
{ok, {SelectedNode :: node(), NodeType}}
If the subsystem sees this return value, it will consider that the
returned node is the one to join. It will still query properties because
we want to make sure the node's database is ready before joining it.
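For example, a backend that already knows which node should be joined could return (node name
and type are illustrative):
```erl
list_nodes() ->
    %% New-style return value: the backend itself selects the node to join.
    {ok, {'rabbit@myrabbit-server-0', disc}}.
```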
[Why]
The two backends that use registration are Consul and etcd. The
discovery process relies on the registered nodes: they return whatever
was previously registered.
With the new checks and failsafes added in peer discovery in RabbitMQ
3.13.0, the fact that registration happens after running discovery
breaks the Consul and etcd backends.
It used to work before because the first node would eventually time out
waiting for a non-empty list of nodes from the backend and proceed as a
standalone node, registering itself on the way. Following nodes would
then discover that first node.
Among the new checks, the node running discovery expects to find itself
in the list of discovered nodes. Because it hasn't registered yet, it will
never find itself.
[How]
The solution is to register first, then run discovery. The node should
at least get itself in the discovered nodes.