If phases are not linked, they can continue running *after* the coordinator has
been stopped. This may or may not lead to unexpected behaviour,
but it certainly makes debugging harder.
The default gen_server timeout is not enough to list busy connections.
Setting it to infinity allows the caller to decide the timeout,
as classic queues do. The `emit_info` family of functions sets its
own timeout for all CLI commands.
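The call-site pattern, as a minimal sketch (the request name is illustrative, not the actual message):

```erlang
%% Sketch: with infinity here, the default 5000 ms gen_server timeout no
%% longer applies; the caller (e.g. a CLI command, which sets its own
%% timeout) decides how long to wait. The request name is illustrative.
list_busy_connections(Pid) ->
    gen_server:call(Pid, list_busy_connections, infinity).
```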
This line is printed on every new MQTT connection, which leads to very
chatty logs when there are many connections. Given that the way MQTT
uses vhosts is generally static (once set up, it stays the same for all
connections), this can be a debug message instead.
Recovering from an existing queue is fine, but if a node is restarted when
there are no longer any stream queues on the system, the recovery process won't
restart the pre-existing coordinator, as that is only performed on queue recovery.
The first attempt to declare a new stream queue on this cluster will then crash
with a `coordinator unavailable` error, as declaration only restarts the local
coordinator and not the whole Ra cluster, thus lacking quorum.
Recovering the coordinator during the boot process ensures that a pre-existing
coordinator cluster is restarted in any case, and does nothing if there was
never a coordinator on the node.
When N cluster nodes are configured to import the same definition
file that includes Shovel declarations, there is a natural race
condition that can lead to Shovels being started on multiple
nodes under the same name.
See #3154 for the background.
Other tests (that produce flakes) arguably test classic mirrored
queues, a deprecated feature that is reasonably well
covered in other suites.
Per discussion with @gerhard.
See #3155 for context. It is also possible that someone might
override the path with duplicate values.
Note that the list of directories will lose its original
ordering. That should be fine as we expect unique plugin
paths: it is not a "preference list" (ordering should
not matter).
Closes #3155
Calling ensure_stats_timer after init_stats_timer and reset_stats_timer
is enough.
The idea is to call stop_stats_timer before hibernation and
ensure_stats_timer on wakeup. However, since we never call
stop_stats_timer in rabbit_stream_reader, we don't need to call
ensure_stats_timer on every network activity.
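The lifecycle described above, as a rough sketch using the rabbit_event helpers (record field, message name, and call sites are illustrative; real usage differs per process):

```erlang
%% Illustrative stats-timer lifecycle. #state.stats_timer is the position
%% of the timer field in the state record, as rabbit_event expects.
init(State0) ->
    State1 = rabbit_event:init_stats_timer(State0, #state.stats_timer),
    {ok, rabbit_event:ensure_stats_timer(State1, #state.stats_timer, emit_stats)}.

handle_info(emit_stats, State0) ->
    emit_stats(State0),  %% hypothetical emitter
    State1 = rabbit_event:reset_stats_timer(State0, #state.stats_timer),
    %% Re-arm once here; since stop_stats_timer is never called in
    %% rabbit_stream_reader, re-arming on every network activity is redundant.
    {noreply, rabbit_event:ensure_stats_timer(State1, #state.stats_timer, emit_stats)}.
```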
Before this commit test AlarmsTest.diskAlarmShouldNotPreventConsumption
of the Java client was failing.
When executing that test, the server failed with:
2021-06-25 16:11:02.886935+02:00 [error] <0.1301.0> exception exit: {unexpected_message,resume}
2021-06-25 16:11:02.886935+02:00 [error] <0.1301.0> in function rabbit_heartbeat:heartbeater/3 (src/rabbit_heartbeat.erl, line 138
because an attempt was made to resume the heartbeater without it having
been paused first.
The above exception exit also happens on the master branch when executing this
test. However, the test falsely succeeds on master because the following FIXME was
never implemented:
8e569ad8bf/deps/rabbitmq_stream/src/rabbit_stream_reader.erl (L778)
This is pure refactoring - no functional change.
Benefits:
* code is more maintainable
* smaller functions (instead of the previous 350-line listen_loop_post_auth function)
* well defined state transitions (e.g. useful to enforce authentication protocol)
* we get some gen_statem helper functions for free (e.g. debug utilities)
Useful doc: https://ninenines.eu/docs/en/ranch/2.0/guide/protocols/
This is purely for information purposes. We have often seen users
copy-paste console output when reporting issues, and they miss the most
important information: Erlang & SSL library versions. After this change,
we are less likely to have to ask about the Erlang & SSL library versions
via follow-up questions.
emu_flavor was introduced in Erlang 24. Erlang 23 has no JIT, so we can
always return "emu".
FWIW, we discourage putting new code in rabbitmq_common as this is meant
to be shared between the broker and the client. Also, keeping the
function definitions close to where they are called is a good general
practice.
We discussed the benefits of having the OS line with @dumbbell &
@gerhard and concluded that it's not worth the effort.
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
All these metrics, except publishers & consumers, are handled by
rabbitmq_global_metrics, so we currently have duplicates. As I started
removing these, I realised that tests were written in Java - why not
Erlang? - and they seemed way too complicated for what was needed. After
the new rabbitmq_global_metrics, we are left with 2 metrics, and all the
extra code simply doesn't justify them. I am proposing that we add them to
rabbit_global_counters as gauges. Let's discuss @dcorbacho @acogoluegnes
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
This way we can show how many messages were received via a certain
protocol (stream is the second real protocol besides the default amqp091
one), as well as by queue type, which is something that many have asked
for over a long time.
The most important aspect is that we can also see them by protocol AND
queue_type, which becomes very important for Streams, which have
different rules from regular queues (for example, consuming
messages is non-destructive, and deep queue backlogs - think billions of
messages - are normal). Alerting and consumer scaling due to deep
backlogs will now work correctly, as we can distinguish between regular
queues & streams.
This has gone through a few cycles, with @mkuratczyk & @dcorbacho
covering most of the ground. @dcorbacho had most of this in
https://github.com/rabbitmq/rabbitmq-server/pull/3045, but the main
branch went through a few changes in the meantime. Rather than resolving
all the conflicts, and then making the necessary changes, we (@gerhard +
@kjnilsson) took all learnings and started re-applying a lot of the
existing code from #3045. We are confident in this approach and would
like to see it through. We continued working on this with @dumbbell, and
the most important changes are captured in
https://github.com/rabbitmq/seshat/pull/1.
We expose these global counters in rabbitmq_prometheus via a new
collector. We don't want to keep modifying the existing collector, which
grew really complex in parts, especially since we introduced
aggregation, but start with a new namespace, `rabbitmq_global_`, and
continue building on top of it. The idea is to build in parallel, and
slowly transition to the new metrics, because semantically the changes
are too big since streams, and we have been discussing protocol-specific
metrics with @kjnilsson, which makes me think that this approach is
least disruptive and... simple.
While at this, we removed redundant empty return value handling in the
channel. The function called no longer returns this.
Also removed all DONE / TODO & other comments - we'll handle them when
the time comes, no need to leave TODO reminders.
Pairs @kjnilsson @dcorbacho @dumbbell
(this is multiple commits squashed into one)
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
For a large number of bindings, this implementation reduces the recovery
time from minutes to seconds. rabbit_binding:recover/2 is the second
operation that may take minutes but it can be improved separately.
Some tests would make rpc calls, ignoring the result. Since erpc
throws errors, this broke some tests. This commit restores the
non-throw behavior to add_vhost, delete_vhost, add_user,
set_user_tags, delete_user & clear_permissions.
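The difference, sketched: rpc:call/4 returns `{badrpc, Reason}` on failure, while erpc:call/4 raises an exception, so callers that ignore the result need a non-throwing wrapper (the wrapper name is illustrative, not the actual helper):

```erlang
%% erpc:call/4 raises (e.g. error:{erpc, Reason}, or re-raises the remote
%% exception), whereas rpc:call/4 returned {badrpc, Reason}. This
%% illustrative wrapper restores the non-throwing behaviour for callers
%% that ignore failures.
safe_rpc(Node, Mod, Fun, Args) ->
    try
        erpc:call(Node, Mod, Fun, Args)
    catch
        Class:Reason -> {badrpc, {Class, Reason}}
    end.
```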
Most tests that can start RabbitMQ nodes have some chance of
flaking. Rather than chase individual flakes for now, this commit
changes the default (though it can still be overridden, as is the case
for config_scheme_SUITE in many places, since I have yet to see that
particular suite flake).
In case the removed node hosts a leader, it takes a moment for
the QQ to elect a new one and begin accepting cluster
membership change operations again.
(cherry picked from commit a9d8816c6a)
Mark per_user_connection_channel_tracking_SUITE:cluster_size_2_network
as not mixed version compatible.
In a mixed 3.8/3.9 cluster, changes to rabbit_core_ff.erl imply that
some feature flag related migrations cannot occur, and therefore
user_limits cannot be enabled as required by the test.
quorum_unaffected_after_vhost_failure isn't mixed-versions compatible, as
it tries to declare a queue in a mixed cluster from a node running Ra 1.x while all other
nodes are running Ra 2.0.
simple_confirm_availability_on_leader_change can't be made forwards compatible,
as when running in mixed mode the queue declaration happens on an old node in
a cluster of mostly new nodes. As new nodes run Ra 2.0, and Ra 1.x does not know
how to create members on Ra 2.0 nodes, this test fails. This is an acceptable limitation
for a transient mixed-versions cluster.
This reverts commit 0f94046b37, reversing
changes made to 37f5744833.
The value would be rejected by rabbitmq.conf schema validation.
It can be set to `undefined` via `rabbit.consumer_timeout` in
`advanced.config`.
This is a fix for an issue that occurs when shutting down
a node (via SIGTERM) while the queues and more specifically
the queue index is recovering. When that happens,
rabbit_recovery_terms has already started, and when
it starts it calls dets:open_file/2, which creates an
empty recovery.dets file. After the node is down and
restarted again, the node thinks the shutdown was clean
because the recovery file is there, except it is empty
and therefore the queues have lost all their state.
This results in RabbitMQ thinking there are 0 messages
in all classic queues.
To avoid this issue, we consider a shutdown to be dirty
in the case where we have a recovery file BUT we do not
find our state in the recovery terms.
To reliably reproduce the issue this fixes:
* Start a node
* Fill it with many messages (800k is more than enough)
* Wait a little and then kill the node via Ctrl+C twice
(to force dirty recovery next start)
* Start the node again
* While it says "Starting broker", after waiting
about 5 seconds, send a SIGTERM (killall beam.smp)
to shutdown the node "cleanly"
* Start the node again
* Management will show 0 messages in all classic queues
Before this commit, sending garbage data to the server's stream port
caused the RabbitMQ node to eat more and more memory.
In this commit, we fix it by expecting the client to go through the
proper authentication sequence. Otherwise, the server closes the socket.
Co-authored-by: Michal Kuratczyk <mkuratczyk@pivotal.io>
Now that a policy overwrites queue arguments, running policy tests in
parallel with other tests leads to non-deterministic test results with
some tests randomly failing.
We've decided that policy values such as max-length-bytes and max-age
should take precedence over arguments provided during stream
declaration. Previously the lower of the two values was used, which meant
that a stream retention period could not be extended beyond the initial
definition. Since streams are long-lived, retention requirements can
change as the system evolves, and since streams contain data, deleting them
and redeclaring them with different arguments is not feasible. With this
change, retention can always be adjusted through a policy.
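The precedence change can be sketched as follows (the function is illustrative, not the actual code):

```erlang
%% Illustrative: a set policy value now wins outright; previously the
%% effective value was the minimum of the policy value and the
%% declare-time argument, so retention could never be extended via policy.
effective_retention(undefined, ArgVal)  -> ArgVal;
effective_retention(PolicyVal, _ArgVal) -> PolicyVal.
%% old behaviour, for contrast: min(PolicyVal, ArgVal)
```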
When a federated message is processed, certain headers are added and
cluster_name is called for every message.
Every single call creates a UDP port and queries
inet_drv to get the hostname.
Now we first try to look the name up in the database instead of
calculating a default that will not be needed.
On initial cluster formation, only one node in a multi node cluster
should initialize the Mnesia database schema (i.e. form the cluster).
To ensure that for nodes starting up in parallel,
RabbitMQ peer discovery backends have used
either locks or randomized startup delays.
Locks work great: When a node holds the lock, it either starts a new
blank node (if there is no other node in the cluster), or it joins
an existing node. This makes it impossible to have two nodes forming
the cluster at the same time.
Consul and etcd peer discovery backends use locks. The lock is acquired
in the consul and etcd infrastructure, respectively.
For other peer discovery backends (classic, DNS, AWS), randomized
startup delays were used. They work well enough in most cases.
However, in https://github.com/rabbitmq/cluster-operator/issues/662 we
observed that in 1% - 10% of the cases (the more nodes or the
smaller the randomized startup delay range, the higher the chances), two
nodes decide to form the cluster. That's bad since it ends up with a
single Erlang cluster, but two RabbitMQ clusters. Even worse, no
obvious alert gets triggered and no error message is logged.
To solve this issue, one could increase the randomized startup delay
range from e.g. 0m - 1m to 0m - 3m. However, this makes initial cluster
formation very slow since it will take up to 3 minutes until
every node is ready. In rare cases, we still end up with two nodes
forming the cluster.
Another way to solve the problem is to name a dedicated node to be the
seed node (forming the cluster). This was explored in
https://github.com/rabbitmq/cluster-operator/pull/689 and works well.
Two minor downsides to this approach are: 1. If the seed node never
becomes available, the whole cluster won't be formed (which is okay),
and 2. it doesn't integrate with existing dynamic peer discovery backends
(e.g. K8s, AWS) since nodes are not yet known at deploy time.
In this commit, we take a better approach: we remove randomized startup
delays altogether and replace them with locks. However, instead of
implementing our own lock in an external system (e.g. in K8s),
we re-use Erlang's locking mechanism, global:set_lock/3.
global:set_lock/3 has some convenient properties:
1. It accepts a list of nodes to set the lock on.
2. The nodes in that list connect to each other (i.e. create an Erlang
cluster).
3. The method is synchronous with a timeout (number of retries). It
blocks until the lock becomes available.
4. If a process that holds a lock dies, or the node goes down, the lock
held by the process is deleted.
The list of nodes passed to global:set_lock/3 corresponds to the nodes
the peer discovery backend discovers (lists).
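A sketch of how the lock-based formation step might look (function names, the resource id, and the retry count are illustrative, not the actual implementation):

```erlang
%% Illustrative: serialize cluster formation across the discovered nodes.
%% The lock id pairs a resource name with the requesting process, so the
%% lock is released automatically if this process or its node dies.
join_with_lock(DiscoveredNodes) ->
    LockId = {rabbitmq_peer_discovery, self()},
    case global:set_lock(LockId, DiscoveredNodes, 10) of
        true ->
            try
                form_or_join_cluster(DiscoveredNodes)  %% hypothetical
            after
                global:del_lock(LockId, DiscoveredNodes)
            end;
        false ->
            {error, could_not_acquire_lock}
    end.
```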
Two special cases worth mentioning:
1. That list can be all desired nodes in the cluster
(e.g. in classic peer discovery where nodes are known at
deploy time) while only a subset of nodes is available.
In that case, global:set_lock/3 still sets the lock without
blocking until all nodes can be connected to. This is good since
nodes might start sequentially (non-parallel).
2. In dynamic peer discovery backends (e.g. K8s, AWS), this
list can be just a subset of desired nodes since nodes might not startup
in parallel. That's also not a problem as long as the following
requirement is met: "The peer discovery backend does not list two disjoint
sets of nodes (on different nodes) at the same time."
For example, in a 2-node cluster, the peer discovery backend must not
list only node 1 on node 1 and only node 2 on node 2.
Existing peer discovery backends fulfil that requirement because the
resource the nodes are discovered from is global.
For example, in K8s, once node 1 is part of the Endpoints object, it
will be returned on both node 1 and node 2.
Likewise, in AWS, once node 1 started, the described list of instances
with a specific tag will include node 1 when the AWS peer discovery backend
runs on node 1 or node 2.
Removing randomized startup delays also makes cluster formation
considerably faster (up to 1 minute faster if that was the
upper bound in the range).
when force refreshing events. We do that when consumers are registered
online. Inactive consumers of SAC queues are still present,
and their presence should be broadcast as an internal event.
This also simplifies the code updated for #3072.
Per discussion with @pjk25.
A queue that had a SAC but currently has none (e.g. because client
connection failed) should not fail with a badmatch.
Clearing the holder field will be investigated separately.
Closes #3072
So, do what RabbitMQ core does as of 3.6.7 or so.
This makes it possible for nodes with those plugins enabled to be
restarted in arbitrary order within a certain time window, just
like nodes without those plugins.
This is an important metric to keep track of and be aware of (maybe even
alert on) when consumers fall behind consuming stream messages. While
they should be able to catch up, if they fall behind too much and the
stream gets truncated, they may miss messages.
This is something that we want to expose via Prometheus metrics as well,
but we've started closer to the core, CLI & Management.
This should be merged as soon as it passes CI, we shouldn't wait on the
Prometheus changes - they can come later.
Pair: @kjnilsson
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Because it runs in the context of a supervisor that does very
little work, the supervisor would end up with MiBs of memory
allocated and never freed. An explicit GC frees memory completely
and the process is no longer prominent in tools like observer_cli.
x-stream-offset supports "friendly" relative timebase specifications
such as 100s. A recent change introduced a validation of the x-stream-offset
that disallowed such specs.
This introduces a backup mechanism that can be controlled
by plugins via policies.
Benchmarks suggest the cost of this change on
Erlang 24 is well under 1%. With a stream target, it is less
than routing to one extra queue of the same type (e.g. a quorum queue).
Previously the restart intensity was 100 and the period 50. This process was
prone to crashing in the past, so it does not make sense to give up on
restarting it if anything happens during the gathering of statistics.
Related: #2850