... and cache it.
[Why]
At least in CI, the computed start time sometimes varies by a few
seconds. This likely comes from the Erlang time offset, which may be
adjusted over time.
This affects peer discovery's sorting of RabbitMQ nodes, which uses that
start time to determine the oldest node. When a node's start time
changes, some nodes may consider it the seed node to join while others
ignore it, causing trouble with cluster formation.
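For illustration, a minimal sketch of computing the start time once and caching it (the module and helper below are hypothetical, not the actual RabbitMQ code):
```
-module(node_start_time).
-export([get/0]).

%% Compute the wall-clock start time of the node once and cache it, so later
%% reads are not affected by time offset adjustments.
get() ->
    case persistent_term:get(?MODULE, undefined) of
        undefined ->
            StartTime = erlang:convert_time_unit(
                          erlang:system_info(start_time) + erlang:time_offset(),
                          native, microsecond),
            persistent_term:put(?MODULE, StartTime),
            StartTime;
        StartTime ->
            StartTime
    end.
```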
[Why]
This happens in CI from time to time and was crashing the channel
process. There is always a `channel.close` method pending in the
channel mailbox.
[How]
For now, log something and ignore the DOWN message. The channel will
exit after handling the pending `channel.close` method anyway.
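For illustration, the handling amounts to a clause along these lines (names and shape are illustrative, not the actual rabbit_channel code):
```
handle_info({'DOWN', _MRef, process, _Pid, _Reason} = Msg, State) ->
    %% log and carry on; the channel.close already queued in the mailbox
    %% will terminate the channel shortly afterwards
    logger:debug("Channel ignoring unexpected ~tp", [Msg]),
    {noreply, State}.
```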
* Implement rabbitmq-queues leader_health_check command for quorum queues
(cherry picked from commit c26edbef33)
* Tests for rabbitmq-queues leader_health_check command
(cherry picked from commit 6cc03b0009)
* Ensure calling ParentPID in leader health check execution and
reuse and extend formatting API, with amqqueue:to_printable/2
(cherry picked from commit 76d66a1fd7)
* Extend core leader health check tests and update badrpc error handling in cli tests
(cherry picked from commit 857e2a73ca)
* Refactor leader_health_check command validators and ignore vhost arg
(cherry picked from commit 6cf9339e49)
* Update leader_health_check_command description and banner
(cherry picked from commit 96b8bced2d)
* Improve output formatting for healthy leaders and support
silent mode in rabbitmq-queues leader_health_check command
(cherry picked from commit 239a69b404)
* Support global flag to run leader health check for
all queues in all vhosts on local node
(cherry picked from commit 48ba3e161f)
* Return immediately for leader health checks on empty vhosts
(cherry picked from commit 7873737b35)
* Rename leader health check timeout refs
(cherry picked from commit b7dec89b87)
* Update banner message for global leader health check
(cherry picked from commit c7da4d5b24)
* QQ leader-health-check: check_process_limit_safety before spawning leader checks
(cherry picked from commit 17368454c5)
* Log leader health check result in broker logs (if any leaderless queues)
(cherry picked from commit 1084179a2c)
* Ensure check_passed result for leader health internal calls
(cherry picked from commit 68739a6bd2)
* Extend CLI format output to process check_passed payload
(cherry picked from commit 5f5e9922bd)
* Format leader healthcheck result log and function exports
(cherry picked from commit ebffd7d8a4)
* Change leader_health_check command scope from queues to diagnostics
(cherry picked from commit 663fc9846e)
* Update (c) line year
(cherry picked from commit df82f12a70)
* Rename command to check_for_quorum_queues_without_an_elected_leader
and use across_all_vhosts option for global checks
(cherry picked from commit b2acbae28e)
* Use rabbit_db_queue for qq leader health check lookups
and introduce rabbit_db_queue:get_all_by_type_and_vhost/2.
Update leader health check timeout to 5s and process limit
threshold to 20% of node's process_limit.
(cherry picked from commit 7a8e166ff6)
* Update tests: quorum_queue_SUITE and rabbit_db_queue_SUITE
(cherry picked from commit 9bdb81fd79)
* Fix typo (cli test module)
(cherry picked from commit 615856853a)
* Small refactor - simpler final leader health check result return on function head match
(cherry picked from commit ea07938f3d)
* Clear dialyzer warning & fix type spec
(cherry picked from commit a45aa81bd2)
* Ignore result without strict match to avoid dialyzer warning
(cherry picked from commit bb43c0b929)
* 'rabbitmq-diagnostics check_for_quorum_queues_without_an_elected_leader' documentation edits
(cherry picked from commit 845230b0b380a5f5bad4e571a759c10f5cc93b91)
* 'rabbitmq-diagnostics check_for_quorum_queues_without_an_elected_leader' output copywriting
(cherry picked from commit 235f43bad58d3a286faa0377b8778fcbe6f8705d)
* diagnostics check_for_quorum_queues_without_an_elected_leader: behave like a health check w.r.t. error reporting
(cherry picked from commit db7376797581e4716e659fad85ef484cc6f0ea15)
* check_for_quorum_queues_without_an_elected_leader: handle --quiet and --silent
plus simplify function heads.
References #13433.
(cherry picked from commit 7b392315d5e597e5171a0c8196230d92b8ea8e92)
---------
Co-authored-by: Ayanda Dube <adube14@bloomberg.net>
observer_cli (and its dependency recon) was declared as a dependency
of rabbitmq_cli and, as a consequence, included in all escripts. However,
the major part of observer_cli runs in the broker. The CLI side only
used `observer_cli:rpc_start/2`, which is just an rpc call into the
target node.
By using a plain rpc call we can remove observer_cli and recon from the
escripts. This can be considered a minor improvement based on the
philosophy "simpler is better".
As an additional benefit, auto-completion of recon functions
now works in `rabbitmq-diagnostics remote_shell`
(e.g. `recon:proc_c<TAB>`).
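The gist of the change, sketched in Erlang (the real command lives in the Elixir CLI, so this is only an approximation):
```
%% Nothing from observer_cli or recon has to ship in the escript; a plain rpc
%% into the target node is enough, since the tool runs in the broker anyway.
start_observer_cli(TargetNode) ->
    rpc:call(TargetNode, observer_cli, start, []).
```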
CI sometimes failed with the following error:
```
v5_SUITE:session_upgrade_v3_v5_qos failed on line 1068
Reason: {test_case_failed,Received unexpected PUBLISH payload. Expected: <<"2">> Got: <<"3">>}
```
The emqtt client auto-acks by default.
Therefore, if the Subv3 client managed to auto-ack message 2 before it
disconnected, the Subv5 client did not receive message 2.
This commit fixes the flake by making sure that Subv3 does not ack
message 2.
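A sketch of the fix idea (the emqtt option and function names below are the commonly used ones; treat the exact shapes as assumptions):
```
%% Start Subv3 with auto_ack disabled so that message 2 is never acked and is
%% therefore redelivered to the v5 subscriber after the session upgrade.
{ok, SubV3} = emqtt:start_link([{clientid, <<"sub">>},
                                {proto_ver, v3},
                                {auto_ack, false}]),
{ok, _} = emqtt:connect(SubV3).
```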
Fix a crash in close_sent: the client might receive the open frame if
it previously sent the close frame while in state open_sent.
We explicitly ignore the open frame. The alternative would be to add another
gen_statem state, CLOSE_PIPE, which seems like overkill.
This commit also fixes a wrong comment: no sessions have begun if the
app requests the connection to be closed in state open_sent.
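For illustration, ignoring the frame amounts to a single gen_statem clause along these lines (the event shape is simplified and not the exact amqp10_client code):
```
close_sent(cast, {frame, #'v1_0.open'{}}, _State) ->
    %% the peer's open crossed our close on the wire; ignore it, the close
    %% response we are waiting for will still arrive.
    keep_state_and_data.
```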
[Why]
This testsuite is very unstable and it is difficult to debug while it is
part of a `parallel-ct` group. It also forced us to re-run the entire
`parallel-ct` group just to retry that one testsuite.
The `buffer` socket option will be changed dynamically
based on how much data is received.
This is restricted to AMQP protocols (old and 1.0).
The algorithm is a little different from the one in Cowboy 2.13.
The moving average is less reactive (div 8 instead of 2)
and floats are used so that using smaller lower buffer
values is possible (otherwise the rounding prevents
increasing buffer sizes). The lower buffer size was
set to 128 as a result.
Compared to the previous behaviour, which was effectively to set
`buffer` to `rcvbuf` (often 131072 on Linux, for example),
performance sees a slight improvement in various scenarios for all
message sizes using AMQP-0.9.1, as well as lower memory usage. But
the difference is small in the benchmarks we have run (5% to 10%),
whereas Cowboy saw a huge improvement because its default was very
small (1460).
For AMQP-1.0 this seems to be no worse but we didn't
detect a clear improvement. We saw scenarios where
small message sizes showed improvement, and large
message sizes showed a regression. But we are even
less confident with these results. David (AMQP-1.0
native developer) ran a few tests and didn't see a
regression.
The dynamic buffer code is currently identical for old and 1.0 AMQP,
but we might tweak them differently in the future, so they're left as
duplicates for now. Different protocols have different behaviors, so
the algorithm may need to be tuned differently for each protocol.
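A rough sketch of the idea (apart from the 1/8 smoothing factor and the 128 floor, the numbers and the update policy below are illustrative, not the reader's actual code):
```
maybe_resize_buffer(Sock, Data, #{avg := Avg0, buffer := Buf0} = State) ->
    %% exponential moving average kept as a float so small values still move
    Avg = Avg0 + (byte_size(Data) - Avg0) / 8,
    Buf = max(128, 2 * round(Avg)),
    case Buf =:= Buf0 of
        true ->
            State#{avg := Avg};
        false ->
            ok = inet:setopts(Sock, [{buffer, Buf}]),
            State#{avg := Avg, buffer := Buf}
    end.
```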
The `msg` record was used in 3.13. This commit makes 4.x understand
this record for backward compatibility, specifically for the rare case where:
1. a 3.13 node internally parsed a message from a stream via
```
Message = mc:init(mc_amqp, amqp10_framing:decode_bin(Bin), #{})
```
2. published this Message to a queue
3. RabbitMQ got upgraded to 4.x
(This commit can be reverted in some future RabbitMQ version once it's
safe to assume that these upgraded messages have been consumed.)
The changes were manually tested as described in Jira RMQ-1525.
We must consider whether the previous current file is empty
(it had data written to it, but that data was already removed) when
writing large messages and opening a file specifically for the large
message. If we don't, then the file will never get deleted,
as we only consider files for deletion when a message gets
removed (and there are none left).
This is only an issue for large messages. Small messages
write a message, then roll over to a new file, so there is
at least one valid message. Large messages close the current
file first, regardless of there being a valid message.
Prior to this commit, if the WebSocket client received multiple
WebSocket frames in a single Erlang message from gen_tcp, the WebSocket
client sent only the first received WebSocket frame to the application.
This commit fixes this bug by having the WebSocket client send all
WebSocket frames to the application.
The `rabbit_registry` boot step starts up the `rabbit_registry` gen
server from `rabbit_common`. This is a registry somewhat similar to
the feature flag registry - it's meant to protect an ETS table used for
looking up implementers of behaviors. The registry and its ETS table
should be available as early as possible: the step should enable
external_infrastructure rather than require it.
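For reference, a boot step declaration roughly looks like the following (the description and mfa shown are illustrative); the change is to declare `{enables, external_infrastructure}` instead of `{requires, external_infrastructure}`:
```
-rabbit_boot_step({rabbit_registry,
                   [{description, "plugin registry"},
                    {mfa,         {rabbit_sup, start_child, [rabbit_registry]}},
                    {enables,     external_infrastructure}]}).
```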
The previous behaviour was to pass solely the message ID, making
queue implementations such as the priority queue hard to fulfil.
Signed-off-by: Matteo Cafasso <noxdafox@gmail.com>
(cherry picked from commit 1f7a27c51d)
Since https://github.com/rabbitmq/rabbitmq-server/pull/13242 updated
Cowlib to v2.14.0, this commit deletes rabbit_uri as written in the
comments of rabbit_uri.erl:
```
This file is a partial copy of
https://github.com/ninenines/cowlib/blob/optimise-urldecode/src/cow_uri.erl
We use this copy because:
1. uri_string:unquote/1 is lax: It doesn't validate that characters that are
required to be percent encoded are indeed percent encoded. In RabbitMQ,
we want to enforce that proper percent encoding is done by AMQP clients.
2. uri_string:unquote/1 and cow_uri:urldecode/1 in cowlib v2.13.0 are both
slow because they allocate a new binary for the common case where no
character was percent encoded.
When a new cowlib version is released, we should make app rabbit depend on
app cowlib calling cow_uri:urldecode/1 and delete this file (rabbit_uri.erl).
```
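Assuming `cow_uri:urldecode/1` in cowlib 2.14 keeps the behaviour described in that comment, the call is a drop-in replacement:
```
1> cow_uri:urldecode(<<"a%2Fb%20c">>).
<<"a/b c">>
2> cow_uri:urldecode(<<"plain">>).
<<"plain">>
```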
Cowboy 2.13 contains the Websocket optimisations as well
as the ability to set the Websocket max_frame_size option
dynamically, plus plenty of other improvements.
Cowlib was added as a test dep to rabbitmq_mqtt to make
sure emqtt doesn't pull the wrong Cowlib version for Cowboy.
Only large messages delivered to multiple CQs are stored once for
multiple queues.
Non-durable queues are deprecated and will be removed, so don't even
mention them.
We don't "page out" messages anymore.
When trying to use OTP 28.0-rc1, Elixir fails to compile these modules
because a module attribute cannot be a regex. It is not yet clear
whether it's something to be fixed in Elixir for OTP 28 compatibility
or something that accidentally worked in the past, but either way,
using a string as the attribute is equally good and works on all OTP
versions, including OTP 28.0-rc1.
```
== Compilation error in file lib/rabbitmq/cli/core/command_modules.ex ==
** (ArgumentError) cannot inject attribute @commands_ns into function/macro because cannot escape #Reference<0.2201422310.1333657602.13657>. The supported values are: lists, tuples, maps, atoms, numbers, bitstrings, PIDs and remote functions in the format &Mod.fun/arity
(elixir 1.18.2) lib/kernel.ex:3729: Kernel.do_at/5
(elixir 1.18.2) expanding macro: Kernel.@/1
lib/rabbitmq/cli/core/command_modules.ex:133: RabbitMQ.CLI.Core.CommandModules.make_module_map/2
```
Since 4.0.0 (commit d45fbc3d) the shared message store writes large
messages into their own rdq files. This information can be used
when scanning rdq files during recovery to avoid reading the whole
message body into memory unnecessarily.
This commit addresses the same issue that was addressed in 3.13.x by
commit baeefbec (i.e. appending a large binary together from 4MB chunks
leaves a lot of garbage and memory fragmentation behind), but even more
efficiently.
Large messages which were written before 4.0.0, and which don't fully fill
the rdq file, are still handled as before.
```
make -C deps/rabbit ct-rabbit_stream_queue t=cluster_size_3_parallel_1 RABBITMQ_METADATA_STORE=mnesia
```
flaked prior to this commit locally on Ubuntu with the following error after 11 runs:
```
rabbit_stream_queue_SUITE > cluster_size_3_parallel_1 > consume_from_replica
{error,
{{shutdown,
{server_initiated_close,406,
<<"PRECONDITION_FAILED - stream queue 'consume_from_replica' in vhost '/' does not have a running replica on the local node">>}},
{gen_server,call,
[<0.8365.0>,
{subscribe,
{'basic.consume',0,<<"consume_from_replica">>,
<<"ctag">>,false,false,false,false,
[{<<"x-stream-offset">>,long,0}]},
<0.8151.0>},
infinity]}}}
```
Initialising a message container from data stored in a
stream is a special case where we need to recover exchange
and routing key information from the following message annotations:
* x-exchange
* x-routing-keys
* x-cc
We do not want to do this when initialising a message container
from AMQP data just received from a publisher.
This commit introduces a new function `mc_amqp:init_from_stream/2`
that is to be used when a message container needs to be initialised
from a stream message.
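A hypothetical usage sketch; only the function name and arity come from this commit, the argument shapes are assumptions:
```
%% Hypothetical: StreamData is the raw message data read back from the stream.
to_mc_from_stream(StreamData) ->
    %% exchange and routing key information is recovered from the x-exchange,
    %% x-routing-keys and x-cc annotations instead of coming from a publisher
    mc_amqp:init_from_stream(StreamData, #{}).
```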
[Why]
During mixed-version testing, the old node might not be able to join or
rejoin a cluster if the other nodes run a newer Khepri machine version.
[How]
The old node is used as the cluster seed node and is never touched
otherwise. Other nodes are restarted or join the cluster later.
## What?
Support the `dynamic` field of sources and targets.
## Why?
1. This allows AMQP clients to dynamically create exclusive queues, which
can be useful for RPC workloads.
2. Support creation of JMS temporary queues over AMQP using the Qpid JMS
client. Exclusive queues map very nicely to JMS temporary queues
because:
> Although sessions are used to create temporary destinations, this is only
for convenience. Their scope is actually the entire connection. Their
lifetime is that of their connection and any of the connection’s sessions
are allowed to create a consumer for them.
https://jakarta.ee/specifications/messaging/3.1/jakarta-messaging-spec-3.1#creating-temporary-destinations
## How?
If the terminus contains the capability `temporary-queue` as defined in
[amqp-bindmap-jms-v1.0-wd10](https://groups.oasis-open.org/higherlogic/ws/public/document?document_id=67638)
[5.2] and as sent by the Qpid JMS client,
RabbitMQ will create an exclusive queue.
(This allows a future commit to take other actions if capability
`temporary-topic` is used, such as the additional creation of bindings.)
No matter what the desired node properties are, RabbitMQ will set the
lifetime policy delete-on-close, deleting the exclusive queue when the
link which caused its creation ceases to exist. This means the exclusive
queue will be deleted if either:
* the link gets detached, or
* the session ends, or
* the connection closes
Although the AMQP JMS Mapping and Qpid JMS create only a **sending** link
with `dynamic=true`, this commit also supports **receiving** links with
`dynamic=true` for non-JMS AMQP clients.
RabbitMQ is free to choose the generated queue name. As suggested by the
AMQP spec, the generated queue name will contain the container-id and link
name unless they are very long.
Co-authored-by: Arnaud Cogoluègnes <acogoluegnes@gmail.com>
Make the AMQP 1.0 connection shut down its sessions before sending the
close frame to the client, similar to how the AMQP 0.9.1 connection
shuts down its channels before closing the connection.
This commit avoids concurrent deletion of exclusive queues by the session process
and the classic queue process.
This commit should also fix https://github.com/rabbitmq/rabbitmq-server/issues/2596
* Separate invalid client test from the valid one
* Apply same changes from PR #13197
* Deal with stale references caused by timing issues when looking up
  objects in the DOM
* Unlink before assertion
... are being used at the same time.
[Why]
Depending on which node clusters with which, a node running an older
version of the Khepri Ra machine may not be able to apply Ra commands
and could be stuck.
There is no real solution and this is clearly an unsupported scenario. An
old node won't always be able to join a newer cluster.
[How]
In the testsuites, we skip clustering tests if we detect that multiple
Khepri Ra machine versions are being used.
The following test flaked in CI under Khepri in mixed version mode:
```
make -C deps/rabbitmq_mqtt ct-v5 t=cluster_size_3:will_delay_node_restart RABBITMQ_METADATA_STORE=khepri SECONDARY_DIST=rabbitmq_server-4.0.5 FULL=1
```
The first node took exactly 30 seconds for draining:
```
2025-02-10 15:00:09.550824+00:00 [debug] <0.1449.0> MQTT accepting TCP connection <0.1449.0> (127.0.0.1:33376 -> 127.0.0.1:27005)
2025-02-10 15:00:09.550992+00:00 [debug] <0.1449.0> Received a CONNECT, client ID: sub0, username: undefined, clean start: true, protocol version: 5, keepalive: 60, property names: ['Session-Expiry-Interval']
2025-02-10 15:00:09.551134+00:00 [debug] <0.1449.0> MQTT connection 127.0.0.1:33376 -> 127.0.0.1:27005 picked vhost using plugin_configuration_or_default_vhost
2025-02-10 15:00:09.551219+00:00 [debug] <0.1449.0> User 'guest' authenticated successfully by backend rabbit_auth_backend_internal
2025-02-10 15:00:09.551530+00:00 [info] <0.1449.0> Accepted MQTT connection 127.0.0.1:33376 -> 127.0.0.1:27005 for client ID sub0
2025-02-10 15:00:09.551651+00:00 [debug] <0.1449.0> Received a SUBSCRIBE with subscription(s) [{mqtt_subscription,<<"my/topic">>,
2025-02-10 15:00:09.551651+00:00 [debug] <0.1449.0> {mqtt_subscription_opts,0,false,
2025-02-10 15:00:09.551651+00:00 [debug] <0.1449.0> false,0,undefined}}]
2025-02-10 15:00:09.556233+00:00 [debug] <0.896.0> RabbitMQ metadata store: follower leader cast - redirecting to {rabbitmq_metadata,'rmq-ct-mqtt-cluster_size_3-2-27054@localhost'}
2025-02-10 15:00:09.561518+00:00 [debug] <0.1456.0> MQTT accepting TCP connection <0.1456.0> (127.0.0.1:33390 -> 127.0.0.1:27005)
2025-02-10 15:00:09.561634+00:00 [debug] <0.1456.0> Received a CONNECT, client ID: will, username: undefined, clean start: true, protocol version: 5, keepalive: 60, property names: ['Session-Expiry-Interval']
2025-02-10 15:00:09.561715+00:00 [debug] <0.1456.0> MQTT connection 127.0.0.1:33390 -> 127.0.0.1:27005 picked vhost using plugin_configuration_or_default_vhost
2025-02-10 15:00:09.561828+00:00 [debug] <0.1456.0> User 'guest' authenticated successfully by backend rabbit_auth_backend_internal
2025-02-10 15:00:09.562596+00:00 [info] <0.1456.0> Accepted MQTT connection 127.0.0.1:33390 -> 127.0.0.1:27005 for client ID will
2025-02-10 15:00:09.565743+00:00 [warning] <0.1460.0> This node is being put into maintenance (drain) mode
2025-02-10 15:00:09.565833+00:00 [debug] <0.1460.0> Marking the node as undergoing maintenance
2025-02-10 15:00:09.570772+00:00 [info] <0.1460.0> Marked this node as undergoing maintenance
2025-02-10 15:00:09.570904+00:00 [info] <0.1460.0> Asked to suspend 9 client connection listeners. No new client connections will be accepted until these listeners are resumed!
2025-02-10 15:00:09.572268+00:00 [warning] <0.1460.0> Suspended all listeners and will no longer accept client connections
2025-02-10 15:00:09.572317+00:00 [warning] <0.1460.0> Closed 0 local client connections
2025-02-10 15:00:09.572418+00:00 [warning] <0.1449.0> MQTT disconnecting client <<"127.0.0.1:33376 -> 127.0.0.1:27005">> with client ID 'sub0', reason: maintenance
2025-02-10 15:00:09.572414+00:00 [warning] <0.1000.0> Closed 2 local (Web) MQTT client connections
2025-02-10 15:00:09.572499+00:00 [warning] <0.1456.0> MQTT disconnecting client <<"127.0.0.1:33390 -> 127.0.0.1:27005">> with client ID 'will', reason: maintenance
2025-02-10 15:00:09.572866+00:00 [alert] <0.1000.0> Closed 0 local STOMP client connections
2025-02-10 15:00:09.577432+00:00 [debug] <0.1456.0> scheduled delayed Will Message to topic my/topic for MQTT client ID will to be sent in 10000 ms
2025-02-10 15:00:12.991328+00:00 [debug] <0.1469.0> Will reconcile virtual host processes on all cluster members...
2025-02-10 15:00:12.991443+00:00 [debug] <0.1469.0> Will make sure that processes of 1 virtual hosts are running on all reachable cluster nodes
2025-02-10 15:00:12.992497+00:00 [debug] <0.1469.0> Done with virtual host processes reconciliation (run 3)
2025-02-10 15:00:16.511733+00:00 [debug] <0.1476.0> Will reconcile virtual host processes on all cluster members...
2025-02-10 15:00:16.511864+00:00 [debug] <0.1476.0> Will make sure that processes of 1 virtual hosts are running on all reachable cluster nodes
2025-02-10 15:00:16.514293+00:00 [debug] <0.1476.0> Done with virtual host processes reconciliation (run 4)
2025-02-10 15:00:24.897477+00:00 [debug] <0.1479.0> Will reconcile virtual host processes on all cluster members...
2025-02-10 15:00:24.897607+00:00 [debug] <0.1479.0> Will make sure that processes of 1 virtual hosts are running on all reachable cluster nodes
2025-02-10 15:00:24.898483+00:00 [debug] <0.1479.0> Done with virtual host processes reconciliation (run 5)
2025-02-10 15:00:24.898527+00:00 [debug] <0.1479.0> Will reschedule virtual host process reconciliation after 30 seconds
2025-02-10 15:00:32.994347+00:00 [debug] <0.1484.0> Will reconcile virtual host processes on all cluster members...
2025-02-10 15:00:32.994474+00:00 [debug] <0.1484.0> Will make sure that processes of 1 virtual hosts are running on all reachable cluster nodes
2025-02-10 15:00:32.996539+00:00 [debug] <0.1484.0> Done with virtual host processes reconciliation (run 6)
2025-02-10 15:00:32.996585+00:00 [debug] <0.1484.0> Will reschedule virtual host process reconciliation after 30 seconds
2025-02-10 15:00:39.576325+00:00 [info] <0.1460.0> Will transfer leadership of 0 quorum queues with current leader on this node
2025-02-10 15:00:39.576456+00:00 [info] <0.1460.0> Leadership transfer for quorum queues hosted on this node has been initiated
2025-02-10 15:00:39.576948+00:00 [info] <0.1460.0> Will stop local follower replicas of 0 quorum queues on this node
2025-02-10 15:00:39.576990+00:00 [info] <0.1460.0> Stopped all local replicas of quorum queues hosted on this node
2025-02-10 15:00:39.577120+00:00 [info] <0.1460.0> Will transfer leadership of metadata store with current leader on this node
2025-02-10 15:00:39.577282+00:00 [info] <0.1460.0> Khepri clustering: transferring leadership to node 'rmq-ct-mqtt-cluster_size_3-2-27054@localhost'
2025-02-10 15:00:39.577424+00:00 [info] <0.1460.0> Khepri clustering: skipping leadership transfer, leader is already in node 'rmq-ct-mqtt-cluster_size_3-2-27054@localhost'
2025-02-10 15:00:39.577547+00:00 [info] <0.1460.0> Leadership transfer for metadata store on this node has been done. The new leader is 'rmq-ct-mqtt-cluster_size_3-2-27054@localhost'
2025-02-10 15:00:39.577674+00:00 [info] <0.1460.0> Node is ready to be shut down for maintenance or upgrade
2025-02-10 15:00:39.595638+00:00 [notice] <0.64.0> SIGTERM received - shutting down
2025-02-10 15:00:39.595638+00:00 [notice] <0.64.0>
2025-02-10 15:00:39.595758+00:00 [debug] <0.44.0> Running rabbit_prelaunch:shutdown_func() as part of `kernel` shutdown
```
Running the same test locally revealed that [rabbit_maintenance:status_consistent_read/1](55ae918094/deps/rabbit/src/rabbit_maintenance.erl (L131))
takes exactly 30 seconds to complete.
The test case assumes a Will Delay higher than the time it takes to
drain and shut down the node. Hence, this commit increases the Will
Delay time from 10 seconds to 40 seconds.
That was not the Keycloak format; it was an
extension to the OAuth spec introduced
a few years ago. To get a token from
Keycloak using this format, a.k.a. a
requesting party token, one has to specify
a different grant type called
urn:ietf:params:oauth:grant-type:uma-ticket
[Why]
Some testcases used to use node 1 as the clustering seed node. With
mixed-version testing, it could cause issues because node 1 would start
with a new version of Ra compared to node 2 and node 2 could fail to
join.
[How]
By using node 2 as the seed node, node 1 running a newer version of Ra
should be able to join because it supports talking to an older version.
[Why]
The `force_reset` command simply removes local files on disk for the
local node.
In the case of Ra, this can't work because the rest of the cluster does
not know about the forced-reset node. Therefore the leader will continue
to send `append_entry` commands to the reset node.
If that forced-reset node restarts and receives these messages, it will
either join the cluster again (because it's on an older Raft term) or it
will hit an assertion and exit (because it's on the same Raft term).
[How]
Given we can't really support this scenario and it has little value, the
command will now return an error if someone attempts a `force_reset` with
a node running Khepri.
This also deprecates the command: once Mnesia support is removed, the
command will be removed at the same time. This is noted in the
rabbitmqctl.8 manpage.
* Redesigned k8s peer discovery
Rather than querying the Kubernetes API, just check the local node name
and try to connect to the pod with `-0` suffix (or configured
`ordinal_start` value). Only the pod with the lowest ordinal can form
a new cluster - all other pods will wait forever.
This should prevent any race conditions and incorrectly formed clusters.
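For illustration, deriving the seed node from the local node name could look roughly like this (a hypothetical helper, not the plugin's actual implementation):
```
%% Hypothetical sketch: a node named rabbit@myrmq-server-3.myrmq-nodes.default
%% derives its seed peer by swapping the pod ordinal for ordinal_start (0 by
%% default); only that seed pod may form a fresh cluster, the rest just wait.
seed_node(OrdinalStart) ->
    [Name, Host] = string:split(atom_to_list(node()), "@"),
    [Pod | Domain] = string:split(Host, ".", leading),
    [StatefulSet, _Ordinal] = string:split(Pod, "-", trailing),
    SeedPod = StatefulSet ++ "-" ++ integer_to_list(OrdinalStart),
    list_to_atom(Name ++ "@" ++ string:join([SeedPod | Domain], ".")).
```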
This commit contains the following changes:
1. Simplify .NET suite
2. Simplify Java package naming
3. Extract JMS tests into separate suite. This way, it's easier to run,
debug, and add new tests compared to the previous suite which mixed
.NET tests with JMS tests.
4. Add tests for different JMS message types
for the backends that support it in the first place.
When forming a cluster, registration of the node
joining the cluster might be left to (container)
orchestration tools like Nomad or Kubernetes.
This PR adds a new configuration option,
'cluster_formation.registration.enable',
which defaults to true.
When set to false, node registration will be skipped.
There is at least one important advantage to using a
tool such as Nomad (plus Consul) over the application
(RabbitMQ) doing the registration:
when the application is not stopped gracefully for
any reason, e.g. it is OOM-killed,
it cannot deregister the service/node.
This leaves behind an unlinked service entry in the registry.
This problem is fundamentally avoided by allowing
Nomad (or similar tools) to register the
node's service.
See #11233 and #11045 for prior discussions.
Co-authored-by: Frederik Bosch <f.bosch@genkgo.nl>
Consumer count is already returned by the /channels API endpoint. Now
the consumer count column can be shown in the channels table but it is
hidden by default.
As described in section 7.1 of filtex-v1.0-wd09:
> Impose a limit on the complexity of each filter expression.
Here, we hard-code the maximum number of properties within a filter expression to 16.
There should never be a use case that requires filtering on more than 16
different properties.
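For illustration, the limit amounts to a check along these lines (a hypothetical helper; the real validation operates on the decoded AMQP filter expression, not a plain map):
```
-define(MAX_FILTER_PROPERTIES, 16).

validate_filter_complexity(Properties)
  when map_size(Properties) =< ?MAX_FILTER_PROPERTIES ->
    ok;
validate_filter_complexity(_Properties) ->
    {error, filter_expression_too_complex}.
```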
... to match that used with Mnesia.
In the case of Mnesia, there are 10 retries
with a 30-second delay each.
For Khepri, a single timeout is used, so it
must be ten times as long.