Commit Graph

510 Commits

Author SHA1 Message Date
Philip Kuryloski ab0a9cd700
Merge pull request #3516 from rabbitmq/move-ct-helpers-to-monorepo
Move CT helpers to monorepo
2021-09-30 11:29:26 +02:00
Alexey Lebedeff 46df4f1689 Update makefiles/bazel to reflect CT helpers repo merge-in 2021-09-30 10:48:11 +02:00
Philip Kuryloski 9c9fb7ffb0 Shard cluster_management_SUITE by testcase to better manage timeouts
The suite-level timeout in the .erl file, I've learned, is actually per
case. By sharding by testcase, we can better match the common test
level and bazel level timeouts, such that we can get logs from remote
test run failures.
2021-09-30 10:38:39 +02:00
Vy Hong 7090199330 Reuse list of nodes in peer discovery plugins that use Erlang global locks
AWS, Kubernetes and Classic peer discovery plugins use list_nodes and
Erlang global:set_lock to create a mutex lock. To unlock, these plugins
get the latest list with list_nodes and call global:del_lock.

However, if list_nodes within unlock fails, RabbitMQ will throw an
uncaught exception and the lock will not be released until the node
holding the lock is restarted. This prevents new nodes from joining the
cluster.

This failure can be avoided by passing the list of nodes from lock to
unlock. If a node goes away (and comes back) between the lock and unlock
calls, del_lock could still successfully remove the lock. Similarly, if
a new node starts up between the lock and unlock calls, del_lock
wouldn't need to inform the new node.
2021-09-29 16:22:37 -07:00
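The idea in commit 7090199330 above (capture the node list once in lock and hand it to unlock) can be sketched as follows. This is a minimal Python illustration, not the plugins' Erlang code: `FakeGlobalLocks`, `lock`, `unlock` and their signatures are hypothetical stand-ins for `global:set_lock`, `global:del_lock` and the plugin callbacks.

```python
# Hypothetical stand-ins for illustration only; not RabbitMQ or Erlang APIs.
class FakeGlobalLocks:
    """Toy in-memory lock table keyed by lock id."""

    def __init__(self):
        self.held = {}

    def set_lock(self, lock_id, nodes):
        # Refuse to take a lock that is already held.
        if lock_id in self.held:
            return False
        self.held[lock_id] = list(nodes)
        return True

    def del_lock(self, lock_id, nodes):
        # Releasing never needs a fresh discovery call.
        self.held.pop(lock_id, None)


def lock(locks, lock_id, list_nodes):
    """Acquire the lock and return the node list that was used."""
    nodes = list_nodes()
    if not locks.set_lock(lock_id, nodes):
        raise RuntimeError("lock is already held")
    return nodes


def unlock(locks, lock_id, nodes):
    """Release the lock with the node list captured by lock()."""
    locks.del_lock(lock_id, nodes)
```

Because `unlock` never calls `list_nodes` again, a discovery backend that fails between the two calls can no longer leave the lock held.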
Ayanda Dube 3bce36b667 Preserve stacktraces in propagated unexpected internal auth backend
exceptions.
2021-09-27 19:19:30 +01:00
Ayanda Dube 78145d880a Extract SingleActiveConsumer channel-pid and consumer refs only once 2021-09-24 18:15:36 +01:00
Ayanda Dube fb0b1ec75a Remove code repetition in rabbit_auth_backend_internal catch statements
by matching and reusing the raised exception class
2021-09-24 18:15:36 +01:00
Philip Kuryloski 860653c97a Adjust the clustering_management_SUITE timeout at the ct level
Previously the bazel timeout and common test timeout were equal, which
meant that in practice the bazel timeout was often reached first, in
which case we don't receive the test logs
2021-09-23 13:55:18 +02:00
Philip Kuryloski 7dc0c29227 Use only 3 nodes for feature_flags_with_unpriveleged_user_SUITE
The test does not appear reliable when it runs in GitHub Actions. This
is currently the only test that runs there. Other tests run on BuildBuddy workers.
2021-09-22 17:22:49 +02:00
Michael Klishin 5fb118e8ef
Merge pull request #3409 from rabbitmq/lh-increase-queue-shutdown-timeout
Increase classic queue shutdown timeout
2021-09-21 16:46:21 +03:00
Karl Nilsson be380930ec Stream coordinator: monitor task processes
So that they are cleaned up from the stream coordinator aux state
when they finish instead of growing indefinitely.
2021-09-21 13:09:46 +01:00
Philip Kuryloski 6e6279eb2b Reduce a test timeout
The original value of 15 minutes was inherited from a larger suite. 5
minutes should be sufficient, as a passing run is typically around 2 minutes.
2021-09-21 10:16:38 +02:00
Michael Klishin ddbd56b1d9
Merge pull request #3462 from rabbitmq/mk-rabbit-nodes-all
Introduce rabbit_nodes:all/0
2021-09-20 23:08:03 +03:00
Michael Klishin 0f6a9dac27
Introduce rabbit_nodes:all/0 2021-09-20 22:24:25 +03:00
Arnaud Cogoluègnes 9ea1a823cc
Merge pull request #3448 from rabbitmq/qq-consumer-cancellation-fixes
Quorum Queue consumer cancellation fixes
2021-09-20 17:29:26 +02:00
Karl Nilsson ee6ef35873 Emit release cursor for more commands
It should be rare that repeated use of these commands would grow the
Raft log excessively, but just in case, we evaluate the release cursors
here so that if the queue is empty we may trigger a snapshot.
2021-09-20 12:19:22 +01:00
Michael Klishin 80c00aed04
Merge pull request #3454 from rabbitmq/mk-node-cluster-membership-internal-events
Emit a node.added event when a new node joins the cluster
2021-09-19 20:23:02 +03:00
Michael Klishin c8781e5da7
Emit a node.added event when a new node joins the cluster 2021-09-19 18:59:26 +03:00
Karl Nilsson eaa216da82 QQ: emit release cursors after consumer cancel
If this is not done, apps that consume/cancel from empty queues in a loop
will grow the Raft log in an unbounded manner. This could also be the
case for the garbage_collect command.
2021-09-17 17:09:30 +01:00
Karl Nilsson 5779059bd5 QQ: fix memory leak when cancelling consumer
If the queue is empty when a consumer is cancelled it would leave the
consumer id inside the service queue. If an application subscribes/unsubscribes
in a loop from an empty queue this would cause the service queue to never be
cleared up.

NB: whenever we make a change to how the quorum queue state machine is
calculated we need to consider how this affects determinism, as during an
upgrade different members may calculate a different service queue state.
In this case it should be ok as they will eventually converge on the same
state once all "dead" consumer ids have been removed from the queue.

In any case it should not affect how messages are assigned to consumers.
2021-09-17 14:53:33 +01:00
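The leak described in commit 5779059bd5 above can be illustrated with a toy model. This is a hedged Python sketch, not the quorum queue state machine: `ToyQueueState` and its fields are hypothetical, and removing the id eagerly on cancel is only one possible way to keep the service queue bounded.

```python
from collections import deque


class ToyQueueState:
    """Hypothetical, simplified model: consumers plus a service queue
    of consumer ids waiting to be assigned messages."""

    def __init__(self):
        self.consumers = {}
        self.service_queue = deque()

    def subscribe(self, consumer_id):
        self.consumers[consumer_id] = {"credit": 1}
        if consumer_id not in self.service_queue:
            self.service_queue.append(consumer_id)

    def cancel(self, consumer_id):
        self.consumers.pop(consumer_id, None)
        # Without this step, ids of cancelled consumers would pile up
        # for as long as the queue stays empty.
        self.service_queue = deque(
            c for c in self.service_queue if c != consumer_id
        )
```

With the cleanup in `cancel`, a subscribe/cancel loop against an empty queue leaves both the consumer map and the service queue empty.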
Philip Kuryloski eea99e1cd5 Split the feature_flags_SUITE into two parts for CI/Bazel
Two testcases in the original suite fail if the test is run as the
root user. Currently under remote execution with bazel this is the
only working option. There is a workaround in place, but the entire
suite when run that way takes around 12 minutes. This splits the suite
so that the minimal set of cases is executed using the slower workaround.
2021-09-17 11:08:48 +02:00
Loïc Hoguin 09c8cd4f98
Increase classic queue shutdown timeout
A value that is too low will prevent the index from shutting
down in time when there are many queues. This leads to the
process being killed and on the next RabbitMQ restart a
(potentially very long) dirty recovery is needed.

The value of 10 minutes was chosen to mirror the shutdown
timeout of the message store. Since both queues and message
store need to have shut down gracefully in order to have
a clean restart it makes sense to use the same value.

Related: c40c2628a9
2021-09-13 10:59:30 +02:00
Philip Kuryloski 16a22f0424
Merge pull request #3401 from rabbitmq/ranch-21-bazel
Use Ranch 2.1.0 in bazel build
2021-09-10 15:16:10 +02:00
Michal Kuratczyk 624767281f Enable metrics collection in run_tests
Proposed `min-masters` implementation relies on metrics so they need to
be collected during queue_master_location tests.
2021-09-10 14:51:11 +02:00
Philip Kuryloski 5fd9d1f638 Use Ranch 2.1.0 in bazel build
Matches 063d32626d
2021-09-10 14:46:36 +02:00
Michael Klishin 3248895ec9
Revisit two rabbitmq.conf validators
 * max_message_size had an off-by-one error and unfortunate naming
 * classic mirrored queue batch size was not validating the size in messages.
   The limit of over 2B messages did not make much sense. 1M is still very
   high but a more reasonable upper bound

Fixes #3390
2021-09-10 13:16:21 +03:00
Michael Klishin f6c8380be5
rabbit_vhost: handle imported tags that are atom lists 2021-09-03 18:38:47 +03:00
Jean-Sébastien Pédron ef9eee8229
rabbit_boot_state: Fix style bug 2021-09-01 12:24:02 +02:00
Jean-Sébastien Pédron 409dc0e52a
rabbit_boot_state: Support Xterm titlebar update
With this change and if the RabbitMQ node is running on Unix and accepts
input, the titlebar of an Xterm-compatible terminal emulator will show a
few details about the running node. Specifically, it will indicate the
name of the node and the version of RabbitMQ.
2021-09-01 12:23:58 +02:00
Jean-Sébastien Pédron 689c56cb04
Logging: Add `single_line` flag support to $RABBITMQ_LOG
A user could already enable single-line logging (the `single_line`
option of `logger_formatter` or RabbitMQ internal formatters) from the
configuration file. For example:

    log.console.formatter.single_line = on

With this patch, the option can be enabled from the `$RABBITMQ_LOG`
environment variable as well:

    make run-broker RABBITMQ_LOG=+single_line
2021-09-01 09:31:54 +02:00
Gerhard Lazu 6a1faa6fd6
Keep checking that replica recovered in rabbit_stream_queue
Rather than sleeping for 6 seconds, we want to check repeatedly that the
replica has recovered, and either eventually succeed or fail if it does
not recover within 30 seconds, the default await_condition time interval.

Pair: @kjnilsson

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-08-31 17:02:21 +01:00
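The polling approach described in commit 6a1faa6fd6 above, as opposed to a fixed sleep, can be sketched like this. It is a minimal Python illustration; the name `await_condition` and the 30 second default come from the commit message, while the signature, the polling interval and the `TimeoutError` are assumptions.

```python
import time


def await_condition(condition, timeout_s=30.0, interval_s=0.1):
    """Poll `condition` until it returns True or `timeout_s` elapses.

    Hypothetical signature: only the 30 second default is taken from
    the commit message.
    """
    deadline = time.monotonic() + timeout_s
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout_s)
        time.sleep(interval_s)
```

A test that polls returns as soon as the condition holds, so it is usually faster than a fixed sleep and tolerates slower recoveries up to the timeout.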
Karl Nilsson c016567359
stream coordinator: further logging improvements
Also increased the tick timeout to avoid checking for new rabbit nodes
to auto add too often.

Also increased sleep times for nodedowns to retry less often.
2021-08-31 15:29:16 +01:00
Karl Nilsson b59d87d90e
Tidy up some stream coordinator warning logs
Some logs used ~p to format a full stack trace. Given these warnings are emitted during
any nodedown this unnecessarily pollutes the logs. Trimmed using ~W instead.
2021-08-31 15:29:16 +01:00
Karl Nilsson 9092de1e32
Stream coordinator: only return tail info if osiris app is started
This ensures that only nodes that are ready to host stream members
are included in the election. This avoids continuous restart attempts
when the rabbit application is stopped.
2021-08-31 15:29:16 +01:00
Philip Kuryloski 09fb5c5321 Skip additional tests in mixed versions
The tests in question won't pass consistently as they are at the mercy
of how the quorum queue is placed across the mixed version nodes
2021-08-30 17:17:25 +02:00
Michael Klishin 83f007be54
Merge pull request #3341 from rabbitmq/local-exclusive-queues
Always place exclusive queues on the local node
2021-08-28 09:54:49 +03:00
Michael Klishin 3b4b4dc222
Exclude roundtrip definition import cases from mixed version runs
References #3333
2021-08-26 19:10:11 +03:00
Michael Klishin 54f7b6d77c
Re-format two definition import input files 2021-08-26 19:03:14 +03:00
Michael Klishin 42a3dfa81b
Exclude the #3333 test case from mixed version runs 2021-08-26 17:25:07 +03:00
Michal Kuratczyk d3dcd48ea5 Always place exclusive queues on the local node
Prior to this change, exclusive queues have been subject to the queue
location process, just like other queues. Therefore, if
queue_master_locator was not client-local and x-queue-master-locator was
not set to client-local, an exclusive queue was likely to be located on
a different node than the connection it is exclusive to.  This is
suboptimal and may lead to inconsistencies when the queue's node goes
down while the connection's node is still up.
2021-08-26 13:05:55 +02:00
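The placement rule from commit d3dcd48ea5 above can be summarised in a few lines. This is a hedged Python sketch rather than the server's Erlang code: `pick_queue_node` and the `locator` callback are hypothetical names, and the fallback for non-exclusive queues without a locator is an assumption.

```python
def pick_queue_node(exclusive, connection_node, cluster_nodes, locator=None):
    """Choose the node that will host a newly declared queue.

    Hypothetical helper: exclusive queues bypass the location process
    and always land on the node of the declaring connection.
    """
    if exclusive:
        return connection_node
    if locator is None:
        # Assumption for this sketch: with no locator, stay local.
        return connection_node
    return locator(cluster_nodes)
```

For example, with a locator that always picks the last node, `pick_queue_node(True, "rabbit@a", nodes, locator)` still returns `"rabbit@a"`, while the non-exclusive call follows the locator.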
Michael Klishin 2e61f51773
Commit definition import case16 file 2021-08-24 04:41:51 +03:00
Michael Klishin 6f97707dac
Definition import: correctly import vhost metadata 2021-08-24 04:41:04 +03:00
Jean-Sébastien Pédron 0b1942bdc0
rabbit_{connection,channel}_tracking: Fix race condition in list()
The table might not exist yet (or is already gone) between the time
rabbit_nodes:all_running() runs and returns a specific node, and
mnesia:dirty_match_object() is called for that node's table.

This seems to happen frequently in CI.
2021-08-19 16:45:24 +02:00
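The race described in commit 0b1942bdc0 above arises because the node list and the per-node tables are read at different times. The commit message does not say how the fix handles a missing table; the Python sketch below assumes it is treated as empty. `TableMissing`, `list_tracked` and `read_table` are hypothetical names.

```python
class TableMissing(Exception):
    """Hypothetical error raised when a node's tracking table is absent."""


def list_tracked(nodes, read_table):
    """Collect tracked entries from every node, tolerating missing tables.

    A table may not exist yet, or may already be gone, by the time it
    is read; this sketch skips such nodes instead of failing.
    """
    entries = []
    for node in nodes:
        try:
            entries.extend(read_table(node))
        except TableMissing:
            continue
    return entries
```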
Michael Klishin f5fe419892
Make PUT /api/vhosts/{name} update tags and/or description 2021-08-18 19:07:25 +03:00
Michael Klishin 6a0058fe7c
Introduce TLS-related rabbitmq.conf settings for definition import
currently only used by the HTTPS mechanism but can be used by
any other.
2021-08-17 20:42:53 +03:00
Michael Klishin f3a5235408
Refactor definition import to allow for arbitrary sources
The classic local filesystem source is still supported
using the same traditional configuration key, load_definitions.

Configuration schema follows peer discovery in spirit:

 * definitions.import_backend configures the mechanism to use,
   which can be a module provided by a plugin
 * definitions.* keys can be defined by plugins and contain any
   keys a specific mechanism needs

For example, the classic local filesystem source can now be
configured like this:

``` ini
definitions.import_backend = local_filesystem
definitions.local.path = /path/to/definitions.d/definition.json
```

The HTTPS source can be configured like this:

``` ini
definitions.import_backend = https
definitions.https.url = https://hostname/path/to/definitions.json
```

HTTPS may require additional configuration keys related to TLS/x.509
peer verification. Such extra keys will be added as the need for them
becomes evident.

References #3249
2021-08-14 14:53:45 +03:00
Michael Klishin 1eacbaac15
Merge pull request #3299 from rabbitmq/add-env-vars-to-set-default-user-pass-vhost-and-erlang-cookie
Add support to override `default_{user,pass,vhost}` and the Erlang cookie from the environment
2021-08-11 23:06:46 +03:00
Michael Klishin 81780dc95e
Log a warning when Erlang cookie is overridden using an env variable
as it can be really difficult to troubleshoot such cookie changes
2021-08-11 20:34:38 +03:00
Jean-Sébastien Pédron d0b7a33a0f
Logging: Add comments explaining when burst limit is disabled
Follow-up to rabbitmq/rabbitmq-server#3298.
2021-08-11 16:56:21 +02:00
Jean-Sébastien Pédron bd39027d68
Add support for $RABBITMQ_ERLANG_COOKIE env var
It is the equivalent of the content of the Erlang cookie file. Note this
variable IS the cookie value, NOT the path to a cookie file.

If it is set, it will take precedence over the content of the Erlang
cookie file.

Fixes docker-library/rabbitmq#508.
2021-08-11 15:50:40 +02:00
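The precedence rule from commit bd39027d68 above (the environment variable holds the cookie value itself and wins over the cookie file) can be sketched as follows. This is an illustrative Python snippet, not RabbitMQ code: `resolve_erlang_cookie` is a hypothetical helper, and treating an empty variable as unset is an assumption.

```python
import os


def resolve_erlang_cookie(cookie_file_path, env=None):
    """Return the cookie value, preferring $RABBITMQ_ERLANG_COOKIE.

    The variable holds the cookie value itself, not a path to a file.
    """
    env = os.environ if env is None else env
    value = env.get("RABBITMQ_ERLANG_COOKIE")
    if value:
        # Assumption: an empty variable counts as "not set".
        return value
    with open(cookie_file_path, "r", encoding="utf-8") as f:
        return f.read().strip()
```

Passing `env` explicitly keeps the helper easy to exercise without touching the real process environment.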