Commit Graph

59481 Commits

Author SHA1 Message Date
Iliia Khaprov c0e7df7e45
Comment out resource record types, dialyzer goes grazy on r1/2/3 helper types
Also fix tests
2025-06-25 21:29:26 +02:00
Iliia Khaprov dd2ccccbf7
Add stream queue commands fallbacks 2025-06-25 21:29:26 +02:00
Iliia Khaprov 8d36b89f17
Make queue commands generic - i.e. make them work with any queue type 2025-06-25 21:29:24 +02:00
Iliia Khaprov - VMware by Broadcom d685875166
Merge pull request #14134 from lukebakken/rabbitmq-server-14101-3
Test (make) / Build and Xref (1.17, 26) (push) Waiting to run Details
Test (make) / Build and Xref (1.17, 27) (push) Waiting to run Details
Test (make) / Test (1.17, 27, khepri) (push) Waiting to run Details
Test (make) / Test (1.17, 27, mnesia) (push) Waiting to run Details
Test (make) / Test mixed clusters (1.17, 27, khepri) (push) Waiting to run Details
Test (make) / Test mixed clusters (1.17, 27, mnesia) (push) Waiting to run Details
Test (make) / Type check (1.17, 27) (push) Waiting to run Details
Follow up to #14132
2025-06-25 21:29:02 +02:00
Luke Bakken 33cb21ee92
Follow up to #14132
#14132 introduced a small bug in the JSON output that was caught by CI.
2025-06-25 12:01:49 -07:00
Michael Klishin e1b92e41ea
Merge pull request #14132 from lukebakken/rabbitmq-server-14101-2
Trigger a 4.2.x alpha release build / trigger_alpha_build (push) Waiting to run Details
Test (make) / Build and Xref (1.17, 26) (push) Waiting to run Details
Test (make) / Build and Xref (1.17, 27) (push) Waiting to run Details
Test (make) / Test (1.17, 27, khepri) (push) Waiting to run Details
Test (make) / Test (1.17, 27, mnesia) (push) Waiting to run Details
Test (make) / Test mixed clusters (1.17, 27, khepri) (push) Waiting to run Details
Test (make) / Test mixed clusters (1.17, 27, mnesia) (push) Waiting to run Details
Test (make) / Type check (1.17, 27) (push) Waiting to run Details
Follow-up to 14101
2025-06-25 19:13:26 +04:00
Luke Bakken 00528cb1e8
Follow-up to 14101
Improvement in the code that @the-mikedavis noticed just before #14118 was merged.
2025-06-25 08:04:49 -07:00
Michael Klishin b75fc23770
Merge pull request #14125 from rabbitmq/rabbitmq-server-14087-take-2
Re-submit #14087 by @SimonUnge: introduce an opinionated, opt-in way to prevent a node from booting if it's been reset in the past
2025-06-25 17:43:56 +04:00
Michael Klishin 6c27536777
Wording 2025-06-25 17:42:14 +04:00
Michael Klishin 7876b2df58
Update ct.test.spec 2025-06-25 17:41:18 +04:00
Michael Klishin 74c4ec83df
Don't list a test suite twice in parallel CT suite groups #14087 #14125 2025-06-25 17:39:54 +04:00
Michael Klishin 7ec370347c
Merge pull request #14123 from rabbitmq/rabbitmq-server-14121
By @tomyouyou: Avoid a scary log exception when a closing connection runs into an exception during a command writer flush operation
2025-06-25 17:38:24 +04:00
Michael Klishin d370b529fa
Merge pull request #14118 from lukebakken/rabbitmq-server-14101
Fix JSON output for `rabbitmqctl environment`
2025-06-25 17:05:19 +04:00
Michael Klishin b4a11e61ab
Make dialyzer happy 2025-06-25 16:28:23 +04:00
Michael Klishin 7810b4e018
More renaming #14087, add new test suite to a parallel CT group
(cherry picked from commit 5f1ab1409ff33f51fde535c5ffc22b43b2347a1c)
2025-06-25 16:16:02 +04:00
Simon Unge 8ab2bda4eb
Rename
(cherry picked from commit 77cec4930e)
2025-06-25 16:15:42 +04:00
Simon Unge 1e04b72f6d
Add opt in initial check run
(cherry picked from commit 2d2c70cc7c)
2025-06-25 16:15:35 +04:00
Michael Klishin 9bd0731a5a
Simplify #13121 by @tomyouyou, log it at debug level 2025-06-25 14:28:21 +04:00
Arnaud Cogoluègnes 03d87b5391
Merge pull request #14115 from rabbitmq/stream-partition-test-flake
Trigger a 4.2.x alpha release build / trigger_alpha_build (push) Waiting to run Details
Test (make) / Build and Xref (1.17, 26) (push) Waiting to run Details
Test (make) / Build and Xref (1.17, 27) (push) Waiting to run Details
Test (make) / Test (1.17, 27, khepri) (push) Waiting to run Details
Test (make) / Test (1.17, 27, mnesia) (push) Waiting to run Details
Test (make) / Test mixed clusters (1.17, 27, khepri) (push) Waiting to run Details
Test (make) / Test mixed clusters (1.17, 27, mnesia) (push) Waiting to run Details
Test (make) / Type check (1.17, 27) (push) Waiting to run Details
Add log in test
2025-06-25 08:10:55 +00:00
Loïc Hoguin 71f5babdde
Merge pull request #14108 from rabbitmq/loic-cq-dont-delete-current-write-file
CQ: Retry opening file when flushing buffers to avoid "DELETE PENDING" issues on Windows
2025-06-25 09:47:18 +02:00
Arnaud Cogoluègnes 066145763f
Add log statements stream network partitions
The test creates network partitions and checks how the stream SAC
coordinator deals with them. It can be flaky on CI, the log statements
should help diagnose the flakiness.
2025-06-25 09:25:07 +02:00
tomyouyou 9e14040456
When the client disconnects, the 'channel' process may generate a large number of exception logs.
When the client disconnects, flushing writer in the termination may result in a large number of exceptions due to the writer being closed.
The exceptions are as follows:

2025-06-24 17:56:06.661 [error] <0.1381.0> ** Generic server <0.1381.0> terminating, ** Last message in was {'$gen_cast',terminate}, ** When Server state == {ch, {conf,running,rabbit_framing_amqp_0_9_1,1, <0.1371.0>,<0.1379.0>,<0.1371.0>, <<"10.225.80.5:50760 -> 10.225.80.6:5673">>, {user,<<"rabbit_inside_user">>,[], [{rabbit_auth_backend_internal, #Fun<rabbit_auth_backend_internal.3.16580688>}]}, <<"/">>, <<"lzz.localdomain_rc.py_reply_89a60f0ef2114da2b3f150ca359ecf46">>, <0.1373.0>, [{<<"authentication_failure_close">>,bool,true}, {<<"connection.blocked">>,bool,true}, {<<"consumer_cancel_notify">>,bool,true}, {<<"need_notify_server_info_with_heartbeat">>,bool, true}], none,5,1800000,#{},infinity,1000000000}, {lstate,<0.1380.0>,false}, none,3, {1, [{pending_ack,2,<<"1">>,-576460618632, {resource,<<"/">>,queue, <<"lzz.localdomain_rc.py_reply_89a60f0ef2114da2b3f150ca359ecf46">>}, 1}], []}, undefined, #{<<"1">> =>, {{amqqueue, {resource,<<"/">>,queue, <<"lzz.localdomain_rc.py_reply_89a60f0ef2114da2b3f150ca359ecf46">>}, false,false,none, [{<<"x-expires">>,signedint,1800000}, {<<"x-queue-type">>,longstr,<<"classic">>}], <0.1385.0>,[],[],[],undefined,undefined,[],[], live,0,[],<<"/">>, #{user => <<"rabbit_inside_user">>, system_creation => 1750758840399767062, recover_on_declare => false, creator =>, {1750758936,"10.225.80.5",50760,"rc.py"}}, rabbit_classic_queue,#{}}, {false,5,false, [{zclient,tuple, {1750758936,"10.225.80.5",50760,"rc.py"}}]}}}, #{{resource,<<"/">>,queue, <<"lzz.localdomain_rc.py_reply_89a60f0ef2114da2b3f150ca359ecf46">>} =>, {1,{<<"1">>,nil,nil}}}, {state,none,30000,undefined}, false,1, {rabbit_confirms,undefined,#{}}, [],[],none,flow,[], {rabbit_queue_type, #{{resource,<<"/">>,queue, <<"lzz.localdomain_rc.py_reply_89a60f0ef2114da2b3f150ca359ecf46">>} =>, {ctx,rabbit_classic_queue, {rabbit_classic_queue,<0.1385.0>,#{}, #{<0.1385.0> => ok}, false}}}}, #Ref<0.2472179985.4173070337.136448>,false, {erlang,#Ref<0.2472179985.4173070337.136063>}, "rc.py",true,0,false,undefined,undefined,undefined, false}, ** Reason for termination == , ** {{shutdown,{writer,send_failed,closed}}, {gen_server,call,[<0.1379.0>,flush,infinity]}},
2025-06-24 17:56:06.665 [error] <0.1381.0>   crasher:, initial call: rabbit_channel:init/1, pid: <0.1381.0>, registered_name: [], exception exit: {{shutdown,{writer,send_failed,closed}}, {gen_server,call,[<0.1379.0>,flush,infinity]}}, in function  gen_server2:terminate/3 (gen_server2.erl, line 1172), ancestors: [<0.1378.0>,<0.1376.0>,<0.1369.0>,<0.1368.0>,<0.1169.0>, <0.1168.0>,<0.1167.0>,<0.1165.0>,<0.1164.0>,rabbit_sup, <0.249.0>], message_queue_len: 1, messages: [{'EXIT',<0.1378.0>,shutdown}], links: [<0.1378.0>], dictionary: [{msg_io_dt_cfg,{1750758936,2}}, {zext_options_dt_cfg,{1750758966,[]}}, {zlog_consumer_dt_cfg,{1750758936,false}}, {channel_operation_timeout,15000}, {rbt_trace_enable,true}, {process_name, {rabbit_channel, {<<"10.225.80.5:50760 -> 10.225.80.6:5673">>,1}}}, {counter_publish_size_dt_cfg,{1750758936,undefined}}, {peer_info, {"10.225.80.5",50760, "10.225.80.5:50760 -> 10.225.80.6:5673 - rc.py:3382128:dfe6ba8d-a42f-4ece-93df-11bff0410814", "rc.py",0}}, {peer_host_port_compname,{"10.225.80.5",50760,"rc.py"}}, {permission_cache_can_expire,false}, {debug_openv_dt_cfg,{1750758936,[]}}, {z_qref_type_dic, [{{resource,<<"/">>,queue, <<"lzz.localdomain_rc.py_reply_89a60f0ef2114da2b3f150ca359ecf46">>}, rabbit_classic_queue}]}, {zconsumer_num,1}, {virtual_host,<<"/">>}, {msg_size_for_gc,458}, {rand_seed, {#{max => 288230376151711743,type => exsplus, next => #Fun<rand.5.65977474>, jump => #Fun<rand.3.65977474>}, [20053568771696737|52030598835932017]}}, {top_queue_msg_dt_cfg, {1750758936, {0,0,0,undefined,false,false,undefined,undefined}}}], trap_exit: true, status: running, heap_size: 4185, stack_size: 28, reductions: 50613, neighbours:,
2025-06-25 14:47:09 +08:00
Luke Bakken 75cd74a2f2
Fix JSON output for `rabbitmqctl environment`
Fixes #14101
2025-06-24 13:20:26 -07:00
Michael Klishin 754352375c
4.1.2 release notes update
Trigger a 4.2.x alpha release build / trigger_alpha_build (push) Waiting to run Details
Test (make) / Build and Xref (1.17, 26) (push) Waiting to run Details
Test (make) / Build and Xref (1.17, 27) (push) Waiting to run Details
Test (make) / Test (1.17, 27, khepri) (push) Waiting to run Details
Test (make) / Test (1.17, 27, mnesia) (push) Waiting to run Details
Test (make) / Test mixed clusters (1.17, 27, khepri) (push) Waiting to run Details
Test (make) / Test mixed clusters (1.17, 27, mnesia) (push) Waiting to run Details
Test (make) / Type check (1.17, 27) (push) Waiting to run Details
2025-06-24 17:37:36 +04:00
Michael Klishin 2a034d7464
Merge pull request #14116 from rabbitmq/ra-2.16.11
Ra 2.16.11
2025-06-24 17:35:23 +04:00
Michael Klishin 4691a16af6
Ra 2.16.11
to include rabbitmq/ra#546.
2025-06-24 16:58:43 +04:00
Loïc Hoguin ff8ecf1cf7
CQ: Retry opening write file when flushing buffers
On Windows the file may be in "DELETE PENDING" state following
its deletion (when the last message was acked). A subsequent
message leads us to writing to that file again but we can't
and get an {error,eacces}. In that case we wait 10ms and retry
up to 3 times.
2025-06-24 13:30:40 +02:00
Michael Klishin e019a4e41d
Correct a 4.1.2 release notes formatting issue 2025-06-24 01:16:16 +04:00
Michael Klishin e26fde9086
Initial 4.1.2 release notes 2025-06-24 01:15:10 +04:00
David Ansari 033a87523d Bump ActiveMQ to v6.1.7
Trigger a 4.2.x alpha release build / trigger_alpha_build (push) Waiting to run Details
Test (make) / Build and Xref (1.17, 26) (push) Waiting to run Details
Test (make) / Build and Xref (1.17, 27) (push) Waiting to run Details
Test (make) / Test (1.17, 27, khepri) (push) Waiting to run Details
Test (make) / Test (1.17, 27, mnesia) (push) Waiting to run Details
Test (make) / Test mixed clusters (1.17, 27, khepri) (push) Waiting to run Details
Test (make) / Test mixed clusters (1.17, 27, mnesia) (push) Waiting to run Details
Test (make) / Type check (1.17, 27) (push) Waiting to run Details
We've experienced lots of failures in CI:
```
GEN    test/system_SUITE_data/apache-activemq-5.18.3-bin.tar.gz
make: *** [Makefile:65: test/system_SUITE_data/apache-activemq-5.18.3-bin.tar.gz] Error 28
make: Leaving directory '/home/runner/work/rabbitmq-server/rabbitmq-server/deps/amqp10_client'
Error: Process completed with exit code 2.
```

Bumping to the latest ActiveMQ Classic version may or may not help with
these failures.

Either way, we want to test against the latest ActiveMQ version. Version
5.18.3 reached end-of-life and is no longer maintained.
2025-06-23 18:11:42 +02:00
Arnaud Cogoluègnes 5d0823bdc9
Merge pull request #14109 from rabbitmq/stream-coordinator-fix-machine-version
Use module machine version for stream coordinator status
2025-06-23 15:55:07 +00:00
Arnaud Cogoluègnes 4e7e0f0f1d
Support cross-version overview in stream SAC coordinator
When the state comes from V4 and the current module is V5.

References #14106
2025-06-23 17:28:36 +02:00
Arnaud Cogoluègnes 0ca128b80f
Add log message to help diagnose flaky test 2025-06-23 17:28:08 +02:00
Arnaud Cogoluègnes 5042d8eefe
Use module machine version for stream coordinator status
The wrong module was used.
2025-06-23 15:28:32 +02:00
Arnaud Cogoluègnes 6e7058e488
Merge pull request #14106 from rabbitmq/stream-sac-coord-optimization
Miscellaneous minor improvements in stream SAC coordinator
2025-06-23 11:49:26 +00:00
Arnaud Cogoluègnes b4f7d46842
Miscellaneous minor improvements in stream SAC coordinator
This commit handles edge cases in the stream SAC coordinator to make
sure it does not crash during execution. Most of these edge cases
consist in an inconsistent state, so there are very unlikely to happen.

This commit also makes sure there is no duplicate in the consumer list
of a group. Consumers are also now identified only by their connection
PID and their subscription ID, as now the timestamp they contain in
their state does not allow a field-by-field comparison.
2025-06-23 10:47:33 +02:00
Arnaud Cogoluègnes 4bca14a4bb
Merge pull request #14102 from rabbitmq/dependabot/maven/deps/rabbitmq_auth_backend_http/examples/rabbitmq_auth_backend_spring_boot/main/prod-deps-75c31b2fef
Test (make) / Build and Xref (1.17, 26) (push) Waiting to run Details
Test (make) / Build and Xref (1.17, 27) (push) Waiting to run Details
Test (make) / Test (1.17, 27, khepri) (push) Waiting to run Details
Test (make) / Test (1.17, 27, mnesia) (push) Waiting to run Details
Test (make) / Test mixed clusters (1.17, 27, khepri) (push) Waiting to run Details
Test (make) / Test mixed clusters (1.17, 27, mnesia) (push) Waiting to run Details
Test (make) / Type check (1.17, 27) (push) Waiting to run Details
Test Authentication/Authorization backends via mutiple messaging protocols / selenium (chrome, 1.17.3, 27.3) (push) Has been cancelled Details
Test Authentication/Authorization backends via mutiple messaging protocols / summary-selenium (push) Has been cancelled Details
[skip ci] Bump the prod-deps group across 2 directories with 1 update
2025-06-23 07:06:07 +00:00
dependabot[bot] 417714cf62
[skip ci] Bump the prod-deps group across 2 directories with 1 update
Bumps the prod-deps group with 1 update in the /deps/rabbitmq_auth_backend_http/examples/rabbitmq_auth_backend_spring_boot directory: [org.springframework.boot:spring-boot-starter-parent](https://github.com/spring-projects/spring-boot).
Bumps the prod-deps group with 1 update in the /deps/rabbitmq_auth_backend_http/examples/rabbitmq_auth_backend_spring_boot_kotlin directory: [org.springframework.boot:spring-boot-starter-parent](https://github.com/spring-projects/spring-boot).


Updates `org.springframework.boot:spring-boot-starter-parent` from 3.5.0 to 3.5.3
- [Release notes](https://github.com/spring-projects/spring-boot/releases)
- [Commits](https://github.com/spring-projects/spring-boot/compare/v3.5.0...v3.5.3)

Updates `org.springframework.boot:spring-boot-starter-parent` from 3.5.0 to 3.5.3
- [Release notes](https://github.com/spring-projects/spring-boot/releases)
- [Commits](https://github.com/spring-projects/spring-boot/compare/v3.5.0...v3.5.3)

---
updated-dependencies:
- dependency-name: org.springframework.boot:spring-boot-starter-parent
  dependency-version: 3.5.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
- dependency-name: org.springframework.boot:spring-boot-starter-parent
  dependency-version: 3.5.3
  dependency-type: direct:production
  update-type: version-update:semver-patch
  dependency-group: prod-deps
...

Signed-off-by: dependabot[bot] <support@github.com>
2025-06-21 18:26:18 +00:00
Michael Klishin ae3fbbcb0a
Merge pull request #14097 from rabbitmq/fix-federation-makefile
Test (make) / Build and Xref (1.17, 26) (push) Has been cancelled Details
Test (make) / Build and Xref (1.17, 27) (push) Has been cancelled Details
Test (make) / Test (1.17, 27, khepri) (push) Has been cancelled Details
Test (make) / Test (1.17, 27, mnesia) (push) Has been cancelled Details
Test (make) / Test mixed clusters (1.17, 27, khepri) (push) Has been cancelled Details
Test (make) / Test mixed clusters (1.17, 27, mnesia) (push) Has been cancelled Details
Test (make) / Type check (1.17, 27) (push) Has been cancelled Details
Federation: update makefile to avoid dialyzer compilation errors
2025-06-20 12:52:50 +04:00
Diana Parra Corbacho 0801e68c14 Federation: update makefile to avoid dialyzer compilation errors
They just happen with a combination of OTP 27.3 and Elixir 1.17
2025-06-19 21:55:06 +02:00
Arnaud Cogoluègnes 72df6270b2
Mention socket is from stream reader in log message
Test (make) / Build and Xref (1.17, 26) (push) Waiting to run Details
Test (make) / Build and Xref (1.17, 27) (push) Waiting to run Details
Test (make) / Test (1.17, 27, khepri) (push) Waiting to run Details
Test (make) / Test (1.17, 27, mnesia) (push) Waiting to run Details
Test (make) / Test mixed clusters (1.17, 27, khepri) (push) Waiting to run Details
Test (make) / Test mixed clusters (1.17, 27, mnesia) (push) Waiting to run Details
Test (make) / Type check (1.17, 27) (push) Waiting to run Details
Trigger a 4.2.x alpha release build / trigger_alpha_build (push) Has been cancelled Details
2025-06-19 15:50:47 +02:00
Michael Klishin 60be7d8046
Merge commit from fork
Trigger a 4.2.x alpha release build / trigger_alpha_build (push) Has been cancelled Details
Test Authentication/Authorization backends via mutiple messaging protocols / selenium (chrome, 1.17.3, 27.3) (push) Has been cancelled Details
Test (make) / Build and Xref (1.17, 26) (push) Has been cancelled Details
Test (make) / Build and Xref (1.17, 27) (push) Has been cancelled Details
Test (make) / Test (1.17, 27, khepri) (push) Has been cancelled Details
Test (make) / Test (1.17, 27, mnesia) (push) Has been cancelled Details
Test (make) / Test mixed clusters (1.17, 27, khepri) (push) Has been cancelled Details
Test (make) / Test mixed clusters (1.17, 27, mnesia) (push) Has been cancelled Details
Test (make) / Type check (1.17, 27) (push) Has been cancelled Details
Test Management UI with Selenium / selenium (chrome, 1.17.3, 27.3) (push) Has been cancelled Details
Test Authentication/Authorization backends via mutiple messaging protocols / summary-selenium (push) Has been cancelled Details
Management UI: escape virtual host names in virtual host restart forms
2025-06-17 21:49:30 +04:00
Michael Davis cfce31ef05
management: Sanitize vhost names in restart forms 2025-06-17 11:54:07 -04:00
Jean-Sébastien Pédron 0ee99d5169
Merge pull request #14081 from rabbitmq/remove-symlinks-to-rabbitmq-components.mk
Trigger a 4.2.x alpha release build / trigger_alpha_build (push) Waiting to run Details
Test (make) / Build and Xref (1.17, 26) (push) Waiting to run Details
Test (make) / Build and Xref (1.17, 27) (push) Waiting to run Details
Test (make) / Test (1.17, 27, khepri) (push) Waiting to run Details
Test (make) / Test (1.17, 27, mnesia) (push) Waiting to run Details
Test (make) / Test mixed clusters (1.17, 27, khepri) (push) Waiting to run Details
Test (make) / Test mixed clusters (1.17, 27, mnesia) (push) Waiting to run Details
Test (make) / Type check (1.17, 27) (push) Waiting to run Details
Delete symlinks to `erlang.mk` and `rabbitmq-components.mk`
2025-06-17 16:03:29 +02:00
Arnaud Cogoluègnes 3fe705f73e
Merge pull request #13672 from rabbitmq/stream-sac-coordinator-status-instead-of-active-flag
Prevent blocked groups in stream SAC with fine-grained status
2025-06-17 11:25:06 +00:00
Arnaud Cogoluègnes 41acc117bd
Add activate_stream_consumer command
New CLI command to trigger a rebalancing in a SAC group and activate a
consumer. This is a last resort solution if all consumers in a group
accidently end up in {connected, waiting} state.

The command re-uses an existing function, which only picks the consumer
that should be active. This means it does not try to "fix" the state
(e.g. removing a disconnected consumer because its node is definitely
gone from the cluster).

Fixes #14055
2025-06-17 11:56:37 +02:00
Arnaud Cogoluègnes 58f4e83c22
Close stream connection in case of unexpected error from SAC coordinator
Calls to the stream SAC coordinator can fail for various reason
(e.g. a timeout because of a network partition). The stream reader does not
take into account what the SAC coordinator returns and moves on even
in case of errors. This can lead to inconsistent state for SAC groups.

This commit changes this behavior by handling unexpected errors from the
SAC coordinator and closing the connection. The client is expected to
reconnect. This is safer than risking inconsistent state.

Fixes #14040
2025-06-17 11:56:37 +02:00
Arnaud Cogoluègnes a9cf049030
Remove only stream subscriptions affected by down stream member
The clean-up of a stream connection state when a stream member goes down can
remove subscriptions not affected by the member. The subscription state is
removed from the connection, but the subscription is not removed from
the SAC state (if the subscription is a SAC), because the subscription member
PID does not match the down member PID.

When the actual member of the subscription goes down, the subscription is no
longer part of the state, so the clean-up does not find the subscription
and does not remove it from the SAC state. This lets a ghost consumer in
the corresponding SAC group.

This commit makes sure only the affected subscriptions are removed from
the state when a stream member goes down.

Fixes #13961
2025-06-17 11:56:36 +02:00
Arnaud Cogoluègnes d1aab61566
Prevent blocked groups in stream SAC with fine-grained status
A boolean status in the stream SAC coordinator is not enough to follow
the evolution of a consumer. For example a former active consumer that
is stepping down can go down before another consumer in the group is
activated, letting the coordinator expect an activation request that
will never arrive, leaving the group without any active consumer.

This commit introduces 3 status: active (formerly "true"), waiting
(formerly "false"), and deactivating. The coordinator will now know when
a deactivating consumer goes down and will trigger a rebalancing to
avoid a stuck group.

This commit also introduces a status related to the connectivity state
of a consumer. The possible values are: connected, disconnected, and
presumed_down. Consumers are by default connected, they can become
disconnected if the coordinator receives a down event with a
noconnection reason, meaning the node of the consumer has been
disconnected from the other nodes. Consumers can become connected again when
their node joins the other nodes again.

Disconnected consumers are still considered part of a group, as they are
expected to come back at some point. For example there is no rebalancing
in a group if the active consumer got disconnected.

The coordinator sets a timer when a disconnection occurs. When the timer
expires, corresponding disconnected consumers pass into the "presumed
down" state. At this point they are no longer considered part of their
respective group and are excluded from rebalancing decision. They are expected
to get removed from the group by the appropriate down event of a
monitor.

So the consumer status is now a tuple, e.g. {connected, active}. Note
this is an implementation detail: only the stream SAC coordinator deals with
the status of stream SAC consumers.

2 new configuration entries are introduced:
 * rabbit.stream_sac_disconnected_timeout: this is the duration in ms of the
   disconnected-to-forgotten timer.
 * rabbit.stream_cmd_timeout: this is the timeout in ms to apply RA commands
   in the coordinator. It used to be a fixed value of 30 seconds. The
   default value is still the same. The setting has been introduced to
   make integration tests faster.

Fixes #14070
2025-06-17 11:56:20 +02:00
Michael Klishin b8a9cf12c5
Merge pull request #14088 from rabbitmq/rabbit_fifo_min
Avoid list allocation
2025-06-17 12:50:47 +04:00