In the test suite. Note that a snapshot for 1.0-SNAPSHOT was pushed by mistake
a while ago, so this one must be excluded. It is unlikely to be erased, as the
snapshots for the first stable version should be 1.0.0-SNAPSHOT.
With many queues, rebalancing can log hundreds of lines at warning/info
level even though nothing noteworthy happened. I think we can tune that
down: if there is nothing to do, that's debug level; if things go well,
that's info, not a warning.
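A minimal sketch of that policy (illustrative only, using OTP's logger rather than the actual rebalancing code):

```erlang
-module(rebalance_logging_example).
-export([log_result/2]).

%% Illustrative policy: nothing to do -> debug, success -> info,
%% only an actual problem -> warning.
log_result(nothing_to_do, Queue) ->
    logger:debug("Queue ~ts: nothing to rebalance", [Queue]);
log_result(ok, Queue) ->
    logger:info("Queue ~ts: leader rebalanced", [Queue]);
log_result({error, Reason}, Queue) ->
    logger:warning("Queue ~ts: rebalancing failed: ~tp", [Queue, Reason]).
```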
rabbitmq-server master has normally just used the latest bazel-erlang,
but when things go wrong it's pretty disruptive. This workflow creates a
PR that updates bazel-erlang's pin in this repo to the latest commit,
without blocking the pipeline if things go wrong. If we need to
coordinate a rabbitmq-server/bazel-erlang change, the pinned version can
of course be updated in a manual rabbitmq-server PR.
The node GUID allows us to differentiate between incarnations of a node.
However, since rabbit may take some time to start (many queues/bindings, etc.),
there can be a significant gap between the Erlang VM being up and
responding to RPC requests and the new GUID being announced. During that
time, the node monitor could incorrectly assume there was a network
partition, while in fact the node was simply restarted. With this change,
as soon as the Erlang VM is up, we can tell whether the node was restarted
and avoid false positives.
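A simplified sketch of the idea (not the actual rabbit_node_monitor code): comparing the GUID a node reports now with the one we remembered is enough to tell a restart apart from a partition.

```erlang
-module(node_incarnation_example).
-export([classify/3]).

%% KnownGuids :: #{node() => binary()}
%% If the node answers but reports a GUID different from the one we
%% knew, it is a new incarnation (a restart), not a partitioned peer.
classify(Node, ReportedGuid, KnownGuids) ->
    case maps:find(Node, KnownGuids) of
        {ok, ReportedGuid} -> same_incarnation;   %% could be a partition
        {ok, _OldGuid}     -> restarted;          %% new incarnation, no partition
        error              -> unknown_node
    end.
```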
Additionally, we now log if any queues were deleted on behalf of the
restarted node. This can take quite a long time if there are many transient
queues (e.g. auto-delete queues). The longer this takes, the higher the
odds of the restarted node being up again by the time
check_partial_partition was called. We may need to reconsider this logic
as well, but for now we just log this activity.
Co-authored-by: Loïc Hoguin <lhoguin@vmware.com>
This is meant to be used by deployment tools, core features and plugins that
expect a certain minimum number of cluster nodes to be present. For example,
certain setup steps in distributed plugins might require at least three nodes
to be available. This is just a hint, not an enforced requirement. The default
value is 1 so that for single-node clusters there would be no behavior changes.
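As a rough sketch of how a plugin's setup step could consult such a hint (the application env key and accessor below are placeholders for this example, not the real setting name or API):

```erlang
-module(cluster_size_hint_example).
-export([maybe_run_setup/1]).

%% Placeholder accessor; the actual key/accessor for the hint may differ.
target_cluster_size_hint() ->
    application:get_env(rabbit, cluster_size_hint, 1).

%% Run a distributed setup step only once the number of running nodes
%% reaches the hinted cluster size. The hint is advisory, so callers
%% are expected to retry later rather than fail hard.
maybe_run_setup(SetupFun) when is_function(SetupFun, 0) ->
    Expected = target_cluster_size_hint(),
    Running  = length([node() | nodes()]),
    case Running >= Expected of
        true  -> SetupFun();
        false -> {waiting_for_nodes, Running, Expected}
    end.
```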
It's too disruptive to just track master. A piece of automation will
be created to ensure the dep doesn't fall behind, but it can do so
speculatively on a branch first via PR, potentially with auto-merging.
The protocol documentation uses decimal values for error and request key
codes.
Let's use hex values instead. This helps when looking at a request and
its response - 0x0006 and 0x8006 vs. 6 and 32774.
Also, when looking at the output of protocol analysis tools like Wireshark,
hexadecimal values are printed, for example:
"Nov 1, 2021 23:05:19.395825508 GMT","60216,5552","00000009000600010000000701"
"Nov 1, 2021 23:05:19.396069528 GMT","5552,60216","0000000a80060001000000070001"
Above, we can visually identify the delete publisher request and response
(0x0006 and 0x8006) and easily match them against the protocol
documentation.
Finally, the same argument applies to logging, as it is common to log
hex values, not decimal ones.
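For illustration (a standalone sketch, not the client's codec): as the key pair above suggests, a response key is the request key with the top bit (0x8000) set, which is obvious in hex and opaque in decimal.

```erlang
-module(key_format_example).
-export([demo/0]).

%% A response key is the request key with bit 15 set.
response_key(RequestKey) -> RequestKey bor 16#8000.

format_key(Key) -> io_lib:format("0x~4.16.0B", [Key]).

demo() ->
    DeletePublisher = 16#0006,
    Response = response_key(DeletePublisher),
    %% prints: request 0x0006 (6), response 0x8006 (32774)
    io:format("request ~s (~b), response ~s (~b)~n",
              [format_key(DeletePublisher), DeletePublisher,
               format_key(Response), Response]).
```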
1. The response for the publisher declaration request does not contain a
publisher id.
2. Add a mechanism entry to the details of the SASL handshake request.
3. The SASL handshake response contains a list of mechanisms, not just a
single mechanism.
The situation is this: if for some reason (e.g. a bug) a queue has
the same message referenced twice (or more) in its index, and this
message is eligible for the message store (as opposed to being entirely
stored in the queue index), the queue will send it multiple times to the
message store.
Until now, the code of the message store had two assertions:
* one verified in the context of the queue process to make sure there
were no two writes or two removes in a row;
* one verified in the context of the message store process, doing
exactly the same.
Consider the following order of events:
1. the queue sends the first copy to the message store
2. the message store handles the first copy
3. the queue sends the second copy to the message store
4. the message store handles the second copy
In this scenario, neither assertion is triggered and the message
goes through the message store as if the copies were coming from different
queues (i.e. a fan-out use case).
Now consider this order of events:
1. the queue sends the first copy to the message store
2. the queue sends the second copy to the message store
3. the message store handles the first copy
This time, the code will hit both assertions, leading to the crash of
the queue, the crash of the message store and, as a consequence, the
crash of the entire vhost.
In the case of two consecutive writes, those assertions are useless
because the message store already knows how to handle multiple copies of
the same message. However, the consequences are catastrophic: a single
queue with a duplicate message could take down an entire vhost.
This patch relaxes the assertion in the case of two consecutive writes.
Now both scenarios described above behave consistently: the copies from
the same queue are handled like any other copies, and the message store and
the vhost continue to work as usual.
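A much-simplified illustration of the relaxed behaviour (not the actual rabbit_msg_store code, which tracks considerably more state): a duplicate write from the same queue is treated as one more reference instead of hitting an assertion.

```erlang
-module(msg_store_dup_write_example).
-export([write/2]).

%% Pending :: #{MsgId => RefCount}
write(MsgId, Pending) ->
    case maps:find(MsgId, Pending) of
        error ->
            %% first copy of this message seen from this queue
            Pending#{MsgId => 1};
        {ok, Count} ->
            %% duplicate copy: previously this hit an assertion,
            %% now it is simply counted like a fan-out reference
            Pending#{MsgId => Count + 1}
    end.
```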
Note that this patch doesn't cover why there were multiple
copies in the queue in the first place! The initial cause is still
lurking somewhere. The user who saw this
problem had duplicate messages in a dead-letter queue. Perhaps something
related to the lack of publisher-confirms between the initial queue and
the dead-letter one?
All ssl options were stored in the same proplist, and the code then
tried to determine whether an option actually belongs to ranch
ssl options or not.
Some keys landed in the wrong place, as happened in #2975:
different ports were mentioned in the listener config (the default one at
top level, and a non-default one in `ssl_opts`). `ranch` and
`rabbitmq_web_dispatch` then treated this differently.
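To illustrate the problem class (the key names and values below are only an example, not the exact config produced by the schema): when SSL and non-SSL options share one proplist, the same logical setting can appear in two places and different consumers may read different values.

```erlang
%% Illustrative listener proplist, before the schema change: the port
%% appears both at the top level and inside ssl_opts, and ranch vs.
%% rabbitmq_web_dispatch may not agree on which one wins.
Listener = [
    {port, 15672},
    {ssl, true},
    {ssl_opts, [
        {port,       15671},
        {cacertfile, "/path/to/ca_certificate.pem"},
        {certfile,   "/path/to/server_certificate.pem"},
        {keyfile,    "/path/to/server_key.pem"}
    ]}
].
```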
This change moves all ranch ssl opts into the proper place using the
schema, removing any need for guessing in code.
The only downside is that advanced config compatibility is broken.