From the coordinator's point of view, each stream has a unique id consisting of the
vhost, queue name and a high-resolution timestamp, even if several stream ids
relate to the same queue record.
When performing the Mnesia update, the coordinator now checks that the current stream id
matches that of the update_mnesia action and leaves the queue record unchanged if
the stream ids differ.
This should avoid "old" incarnations of a stream queue updating newer ones
with incorrect information.
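The check is effectively a compare-and-set keyed on the stream id; a minimal sketch (function and field names here are illustrative, not the coordinator's actual code):

```python
def update_queue_record(record, action_stream_id, new_fields):
    """Apply an update_mnesia-style action only if the stream id in the
    action still matches the record's current stream id."""
    if record["stream_id"] != action_stream_id:
        # Stale action from an old incarnation of the stream - ignore it.
        return record
    updated = dict(record)
    updated.update(new_fields)
    return updated

record = {"stream_id": ("vh", "q1", 1000), "leader": "node_a"}
# An action carrying an older incarnation's stream id has no effect:
stale = update_queue_record(record, ("vh", "q1", 999), {"leader": "node_b"})
assert stale == record
```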
When the suite passes, it takes about 120 seconds total, so 5 minutes per
case seems to be too much. Additionally, if the suite times out at the
Bazel level, we get no logs, so the cause of the timeout is unclear.
Avoids multiple calls to `application:get_env`, which can be very expensive.
Also limits the filter to vhost_msg_stats, as queue_msg_stats are required for
individual queue metrics.
So that a reply is sent to the caller immediately after the command has
been processed, as intended. Previously, if reply_to was already set, it
was possible that no reply was ever sent and the caller timed out. This
should also reduce some flakiness in the rabbit_stream_queue suite.
Strictly speaking, this change introduces non-determinism into the coordinator
state machine, as during an upgrade different members may run different code
for this command. But as this state only affects side effects (replies), and
the state for the affected streams will shortly be removed, this is very
unlikely to cause any real issues.
Fixes #2941
This adds proper exception handlers in the right places, and tests
ensure that it indeed produces clean, readable logs without large
stack traces for every AMQP operation.
The unnecessary check for subscribe permissions on topics was dropped,
as `queue.bind` does exactly the same check. Topic permission tests
were also added, and they confirm that there was no change in
behaviour.
Ideally the same explicit topic permission check should be dropped for
publishing, but that is more complicated - so for now there is only a
detailed comment in the source code explaining it.
A few other things were also optimized away:
- Using the AMQP client to test for queue existence
- Creating queues/starting consumers too eagerly, even if not yet
requested by the client
Improves debugging of situations where messages may be stuck.
Also cancel the rabbit_fifo_client timer after a message resend to avoid
resending messages again when the timer triggers.
Some plugins might create internal queues that should not be counted
towards the total number of messages on the system. These can now be filtered
out using a regular expression on the queue name. Individual queue stats
are still available.
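As an illustration of the idea (the pattern below is hypothetical - the actual configuration key and matching semantics are defined by the plugin):

```python
import re

# Hypothetical pattern marking a plugin's internal queues.
INTERNAL_QUEUES = re.compile(r"^internal\.")

queues = {"orders": 10, "internal.audit": 250, "payments": 3}

# Exclude matching queues from the system-wide message total;
# per-queue stats would still report every queue individually.
total = sum(n for name, n in queues.items()
            if not INTERNAL_QUEUES.match(name))
print(total)  # 13
```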
As we only need to make sure the rabbit_queues table is populated,
use a dirty write function that does only this instead. This could potentially
halve recovery times for many QQ scenarios.
Technically, duplicate names are supported by Common Test, but we have
seen them contribute to flakiness in our suite in practice.
(cherry picked from commit 513446b6d1)
This should avoid channel crashes when the leader has recently changed
and the amqqueue record hasn't yet been updated.
Also catch errors in the QQ tick temporary process that updates metrics.
During shutdown, the metrics ETS table may disappear before this process has
finished, which would result in a noisy error in the logs.
In the test suite. Note that a snapshot for 1.0-SNAPSHOT was
pushed by mistake a while ago, so this one must be
excluded. It's unlikely it will be erased, as the snapshots
for the first stable version should be 1.0.0-SNAPSHOT.
With many queues, rebalancing can log hundreds of lines at warning/info
level, even though nothing exciting happened. I think we can tune that
down: if there is nothing to do, that's debug level;
if things go well, that's info, not a warning.
The node GUID makes it possible to differentiate between different incarnations of a node.
However, since RabbitMQ may take some time to start (many queues/bindings, etc.),
there can be a significant gap between the Erlang VM being up and
responding to RPC requests and the new GUID being announced. During that
time, the node monitor could incorrectly assume there was a network
partition, while in fact the node was simply restarted. With this change,
as soon as the Erlang VM is up, we can tell whether it was restarted and
avoid false positives.
Additionally, we now log if any queues were deleted on behalf of the
restarted node. This can take quite a long time if there are many transient
queues (e.g. auto-delete queues). The longer this takes, the higher the
odds of a restarted node being up again by the time
check_partial_partition is called. We may need to reconsider this logic
as well, but for now we just log this activity.
Co-authored-by: Loïc Hoguin <lhoguin@vmware.com>
This is meant to be used by deployment tools, core features and plugins
that expect a certain minimum number of cluster nodes to be present.
For example, certain setup steps in distributed plugins might require
at least three nodes to be available.
This is just a hint, not an enforced requirement. The default value is 1
so that for single node clusters, there would be no behavior changes.
The protocol documentation uses decimal values for error and request key
codes.
Let's use hex values instead. This helps when looking at a request and
its response - 0x0006 and 0x8006 vs. 6 and 32774.
Also, when looking at the output of protocol analysis tools like Wireshark,
a hexadecimal value is printed, for example:
"Nov 1, 2021 23:05:19.395825508 GMT","60216,5552","00000009000600010000000701"
"Nov 1, 2021 23:05:19.396069528 GMT","5552,60216","0000000a80060001000000070001"
Above, we can visually identify the delete publisher request and response
(0x0006 and 0x8006) and easily match them in the documentation of the
protocol.
Finally, the above argument applies to logging as well, as it is common to log
hex values, not decimal.
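The request/response pairing that hex notation makes visible can be checked quickly; a small sketch (the constant name is ours, not from the codebase):

```python
# The response key is the request key with the top bit set; in hex the
# pairing (0x0006 / 0x8006) is obvious, in decimal (6 / 32774) it isn't.
DELETE_PUBLISHER = 0x0006  # request key, per the protocol docs

def response_key(request_key: int) -> int:
    """Response key for a given request key (top bit set)."""
    return request_key | 0x8000

print(f"request:  0x{DELETE_PUBLISHER:04x} ({DELETE_PUBLISHER})")
print(f"response: 0x{response_key(DELETE_PUBLISHER):04x} ({response_key(DELETE_PUBLISHER)})")
# request:  0x0006 (6)
# response: 0x8006 (32774)
```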
1. The response for the publisher declaration request does not contain
a publisher id.
2. Add a mechanism entry to the details of the SASL handshake request.
3. The SASL handshake response contains a list of mechanisms, not just a
single mechanism.
The situation is this one: if for some reason (e.g. a bug) a queue has
the same message referenced twice (or more) in its index, and this
message is eligible for the message store (as opposed to being entirely
stored in the queue index), the queue will send it multiple times to the
message store.
Until now, the code of the message store had two assertions:
* one verified in the context of the queue process, to make sure there
were no two writes or two removes in a row;
* one verified in the context of the message store process, doing
exactly the same.
Consider the following order of events:
1. the queue sends the first copy to the message store
2. the message store handles the first copy
3. the queue sends the second copy to the message store
4. the message store handles the second copy
In this scenario, neither assertion is triggered and the message
goes through the message store as if the copies were coming from
different queues (i.e. a fan-out use case).
Now consider this order of events:
1. the queue sends the first copy to the message store
2. the queue sends the second copy to the message store
3. the message store handles the first copy
This time, the code hits both assertions, leading to the crash of
the queue, the crash of the message store and, as a consequence, the
crash of the entire vhost.
In the case of two consecutive writes, those assertions are useless
because the message store already knows how to handle multiple copies of
the same message. However, the consequences are catastrophic: a single
queue with a duplicate message could take down an entire vhost.
This patch relaxes the assertion in the case of two consecutive writes.
Now, both scenarios described above will be consistent: the copies from
the same queue will be handled like any other copies, and the message
store and the vhost will continue to work as usual.
Note that this patch doesn't cover the reason why there were multiple
copies in the queue in the first place! So the initial cause is still
lurking somewhere. The user who saw this problem had duplicate messages
in a dead-letter queue. Perhaps something related to the lack of
publisher confirms between the initial queue and the dead-letter one?
All SSL options were stored in the same proplist, and the code then
tried to determine whether an option actually belongs to the Ranch
SSL options or not.
Some keys landed in the wrong place, as happened in #2975 -
different ports were mentioned in the listener config (the default at
top-level, and a non-default one in `ssl_opts`). Then `ranch` and
`rabbitmq_web_dispatch` treated this differently.
This change simply moves all Ranch SSL options into the proper place using
the schema, removing any need for guessing in code.
The only downside is that advanced config compatibility is broken.
Fixes #3396
As can be seen in
9cf18e83f2/deps/rabbitmq_management/priv/www/js/tmpl/queue.ejs (L171)
there should be 3 columns for a classic queue.
While it's possible to share the same field with the quorum queue
case, I did it this way so that the conditions for the labels and for the
values have the same shape.
Deriving a max cluster size only from running nodes would create situations where,
in a three-node cluster with only two nodes running, it would select a non-running
node as follower.
This error probably shouldn't happen if the system is correct; however,
`stream_not_found` is a valid return value for the functions called
- it comes from `register_listener`.
Not handling it causes a case_clause (and a long stack trace) that is sent
to the client as an internal error, instead of a known protocol error.
Seen while debugging another issue.
Instead of trying to read the ranch_server ETS table directly,
use an internal Ranch function that does the same. While neither
are documented, the latter is less likely to change.
As AMQP 0.9.1 headers are translated into AMQP 1.0 application properties,
they cannot contain complex values such as arrays or tables.
RabbitMQ federation does use array and table values, so to avoid crashing when
delivering a federated message to a stream queue we drop them. These header
values should be considered internal, however, so dropping them before the
final queue delivery should not be a huge problem.
It was happening automatically for e.g. `make start-cluster`.
But some plugins were not covered by the default generated config, and
running rabbit from two different worktrees was a bit complicated.
Allow an offset spec to be used to attach at an appropriate point in the
stream. This is done by specifying a source filter with the key rabbitmq:stream-offset-spec.
The offset is also included as a message annotation with the key x-stream-offset.
When a link is detached, we also issue a basic.cancel to the 0.9.1 channel. If this
wasn't done and you detached then re-attached a link for the same queue, you'd get
a consumer-tag offset error from the 0.9.1 channel.
When a consumer reaches the end of a stream, it needs to register an
offset listener with the local stream member so that it can be notified
when new stream messages are committed. The stream queue implementation
for some reason registered offset listeners with the leader, not the local
member.
The suite-level timeout in the .erl file, I've learned, is actually applied
per case. By sharding by testcase, we can better match the Common Test-level
and Bazel-level timeouts, such that we can get logs from remote
test run failures.
AWS, Kubernetes and Classic peer discovery plugins use list_nodes and
Erlang global:set_lock to create a mutex lock. To unlock, these plugins
get the latest list with list_nodes and call global:del_lock.
However, if list_nodes within unlock fails, RabbitMQ will throw an
uncaught exception and the lock will not be released until the node
holding the lock is restarted. This prevents new nodes from joining the
cluster.
This failure can be avoided by passing the list of nodes from lock to
unlock. If a node goes away (and comes back) between the lock and unlock
calls, del_lock could still successfully remove the lock. Similarly, if
a new node starts up between the lock and unlock calls, del_lock
wouldn't need to inform the new node.
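The fix amounts to remembering the node list captured at lock time and reusing it on release; a simplified Python sketch (names are illustrative - the real code uses Erlang's global:set_lock/del_lock and the plugins' list_nodes):

```python
# Sketch: capture the node list when acquiring the lock and reuse it on
# release, instead of calling list_nodes() a second time (which may fail
# and leave the lock held forever, blocking new nodes from joining).
def acquire_lock(list_nodes, set_lock):
    nodes = list_nodes()   # may raise; fine - we hold no lock yet
    set_lock(nodes)
    return nodes           # token to be passed back to release_lock

def release_lock(del_lock, nodes):
    # No second list_nodes() call here.
    del_lock(nodes)
```

If a node goes away or joins between the two calls, releasing against the captured list still works, as the commit message argues.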
Use case: allow plain connections over one IP (an internal IP), and TLS
connections over another IP (e.g. an internet-routable IP). Without this
patch, a cluster can only support access over one or the other IP, not
both.
(cherry picked from commit b9e6aad035)
Set a timeout at the Common Test level that is less than 30
minutes. There is a 10-minute timeout at the Bazel level, but if that
is reached, logs are not captured.
Previously the Bazel timeout and the Common Test timeout were equal, which
meant that in practice the Bazel timeout was often reached first, in
which case we don't receive the test logs.