The suite-level timeout in the .erl, I've learned, is actually applied per
case. By sharding by testcase, we can better match the Common Test-level
and Bazel-level timeouts, such that we can get logs from remote
test run failures.
AWS, Kubernetes and Classic peer discovery plugins use list_nodes and
Erlang global:set_lock to create a mutex lock. To unlock, these plugins
get the latest list with list_nodes and call global:del_lock.
However, if list_nodes within unlock fails, RabbitMQ will throw an
uncaught exception and the lock will not be released until the node
holding the lock is restarted. This prevents new nodes from joining the
cluster.
This failure can be avoided by passing the list of nodes from lock to
unlock. If a node goes away (and comes back) between the lock and unlock
calls, del_lock could still successfully remove the lock. Similarly, if
a new node starts up between the lock and unlock calls, del_lock
wouldn't need to inform the new node.
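A minimal sketch of that pattern, with illustrative module, lock-id and
list_nodes names standing in for the plugins' real identifiers; only
global:set_lock/2 and global:del_lock/2 are the actual OTP calls:
``` erlang
%% Sketch of the fix; names are illustrative, not the plugins' API.
-module(discovery_lock_sketch).
-export([lock/0, unlock/1]).

-define(LOCK_ID, {rabbitmq_peer_discovery, node()}).

lock() ->
    %% Capture the node list at lock time and hand it to the caller...
    Nodes = list_nodes(),
    true = global:set_lock(?LOCK_ID, Nodes),
    {ok, Nodes}.

unlock(Nodes) ->
    %% ...so that unlock never has to call list_nodes again.
    global:del_lock(?LOCK_ID, Nodes),
    ok.

list_nodes() ->
    %% Placeholder for the AWS/Kubernetes/classic backend call.
    [node() | nodes()].
```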
Previously the Bazel timeout and Common Test timeout were equal, which
meant that in practice the Bazel timeout was often reached first, in
which case we did not receive the test logs.
It should be rare that repeated use of these commands would grow the
Raft log excessively, but just in case we evaluate the release cursors
here anyway, so that if the queue is empty we may trigger a snapshot.
If this is not done, apps that consume/cancel from empty queues in a loop
will grow the Raft log in an unbounded manner. This could also be the
case for the garbage_collect command.
If the queue is empty when a consumer is cancelled, it would leave the
consumer id inside the service queue. If an application subscribes/unsubscribes
in a loop from an empty queue, this would cause the service queue to never be
cleared up.
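Illustratively, assuming the service queue is modelled as a plain OTP
queue of consumer ids (the actual rabbit_fifo representation may differ),
the cleanup amounts to:
``` erlang
%% Illustrative only: drop a cancelled consumer id from a service
%% queue modelled as an OTP queue of consumer ids.
remove_from_service_queue(ConsumerId, ServiceQueue) ->
    queue:filter(fun(Id) -> Id =/= ConsumerId end, ServiceQueue).
```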
NB: whenever we make a change to how the quorum queue state machine is
calculated, we need to consider how this affects determinism, as during an
upgrade different members may calculate a different service queue state.
In this case it should be OK, as they will eventually converge on the same
state once all "dead" consumer ids have been removed from the queue.
In any case, it should not affect how messages are assigned to consumers.
Two testcases in the original suite fail if the test is run as the
root user. Currently, under remote execution with Bazel, this is the
only working option. There is a workaround in place, but the entire
suite when run that way takes around 12 minutes. This splits the suite
so that the minimal set of cases is executed using the slower workaround.
A value that is too low will prevent the index from shutting
down in time when there are many queues. This leads to the
process being killed, and on the next RabbitMQ restart a
(potentially very long) dirty recovery is needed.
The value of 10 minutes was chosen to mirror the shutdown
timeout of the message store. Since both the queues and the
message store need to have shut down gracefully in order to
have a clean restart, it makes sense to use the same value.
Related: c40c2628a9
* max_message_size had an off-by-one error and unfortunate naming
* classic mirrored queue batch size was not validating the size in messages.
  The limit of over 2B messages did not make much sense. 1M is still a very
  high but more reasonable upper bound (see the sketch below).
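A hedged sketch of the kind of bounds check this implies; the function
name and the exact handling of out-of-range values are illustrative:
``` erlang
%% Illustrative bounds check; the real validation lives in RabbitMQ's
%% configuration/policy validators.
validate_batch_size(N) when is_integer(N), N >= 1, N =< 1000000 ->
    ok;
validate_batch_size(_) ->
    {error, out_of_range}.
```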
Fixes #3390
With this change, if the RabbitMQ node is running on Unix and accepts
input, the titlebar of an Xterm-compatible terminal emulator will show a
few details about the running node. Specifically, it will indicate the
name of the node and the version of RabbitMQ.
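For illustration, and assuming the standard OSC 2 title escape sequence
rather than the node's exact formatting, setting an xterm titlebar from
Erlang looks like this (set_title/2 and the title text are illustrative):
``` erlang
%% Illustrative: OSC 2 (ESC ] 2 ; <title> BEL) sets the window title
%% on xterm-compatible terminal emulators.
set_title(Node, Version) ->
    io:format("\e]2;RabbitMQ ~s on ~s\a", [Version, Node]).
```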
A user could already enable single-line logging (the `single_line`
option of `logger_formatter` or RabbitMQ internal formatters) from the
configuration file. For example:
``` ini
log.console.formatter.single_line = on
```
With this patch, the option can be enabled from the `$RABBITMQ_LOG`
environment variable as well:
```
make run-broker RABBITMQ_LOG=+single_line
```
Rather than sleeping for 6 seconds, we want to check multiple times
within 30 seconds that the replica has recovered, and either eventually
succeed, or fail if it does not recover within 30 seconds, the default
await_condition time interval.
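A hedged sketch of that polling pattern, assuming the
rabbit_ct_helpers:await_condition/2 helper (condition fun plus timeout)
and an illustrative replica_recovered/1 predicate:
``` erlang
%% Poll until the condition holds, failing after 30s;
%% replica_recovered/1 stands in for the real check.
rabbit_ct_helpers:await_condition(
    fun() -> replica_recovered(Q) end,
    30000).
```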
Pair: @kjnilsson
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Also increased the tick timeout to avoid checking too often for new
rabbit nodes to auto-add.
Also increased sleep times for nodedowns to retry less often.
Some logs used ~p to format a full stack trace. Given these warnings are emitted during
any nodedown, this unnecessarily pollutes the logs. Trimmed using ~W instead.
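For example (the depth of 10 is illustrative): unlike ~p, ~W consumes an
extra argument giving the maximum print depth, eliding anything nested
deeper:
``` erlang
%% ~p prints the whole term; ~W takes an extra depth argument and
%% trims deeply nested terms such as stack traces.
logger:warning("node down: ~W", [Reason, 10]).
```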
This ensures that only nodes that are ready to host stream members
are included in the election. This avoids continuous restart attempts
when the rabbit application is stopped.
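Schematically, assuming rabbit:is_running/1 as the readiness check
(the surrounding names are illustrative):
``` erlang
%% Only nodes where the rabbit application is running are eligible
%% to host stream members in the election.
eligible(Nodes) ->
    [N || N <- Nodes, rabbit:is_running(N)].
```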
Prior to this change, exclusive queues were subject to the queue
location process, just like other queues. Therefore, if
queue_master_locator was not client-local and x-queue-master-locator was
not set to client-local, an exclusive queue was likely to be located on
a different node than the connection it is exclusive to. This is
suboptimal and may lead to inconsistencies when the queue's node goes
down while the connection's node is still up.
The table might not exist yet (or might already be gone) between the time
rabbit_nodes:all_running() runs and returns a specific node, and
mnesia:dirty_match_object() is called for that node's table.
This seems to happen frequently in CI.
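One way to tolerate this race, sketched with an illustrative wrapper
function (mnesia dirty operations exit with {aborted, {no_exists, ...}}
when the table is missing):
``` erlang
%% Illustrative: treat a vanished table as an empty result instead of
%% letting the no_exists exit propagate.
safe_dirty_match(Tab, Pattern) ->
    try
        mnesia:dirty_match_object(Tab, Pattern)
    catch
        exit:{aborted, {no_exists, _}} ->
            []
    end.
```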
The classic local filesystem source is still supported
using the same traditional configuration key, load_definitions.
Configuration schema follows peer discovery in spirit:
* definitions.import_backend configures the mechanism to use,
which can be a module provided by a plugin
* definitions.* keys can be defined by plugins and contain any
keys a specific mechanism needs
For example, the classic local filesystem source can now be
configured like this:
``` ini
definitions.import_backend = local_filesystem
definitions.local.path = /path/to/definitions.d/definition.json
```
Or, when using the HTTPS mechanism:
``` ini
definitions.import_backend = https
definitions.https.url = https://hostname/path/to/definitions.json
```
HTTPS may require additional configuration keys related to TLS/x.509
peer verification. Such extra keys will be added as the need for them
becomes evident.
References #3249
It is the equivalent of the content of the Erlang cookie file. Note this
variable IS the cookie value, NOT the path to a cookie file.
If it is set, it will take precedence over the content of the Erlang
cookie file.
Fixes docker-library/rabbitmq#508.