Commit Graph

167 Commits

Author SHA1 Message Date
David Ansari 6d968718c8 Fix function_clause error in tracking_status/2
Before this commit:

> ./sbin/rabbitmq-streams stream_status --tracking s1
Status of stream s1 on node rabbit@localhost ...
Error:
{:function_clause,
[{:rabbit_stream_queue, :"-tracking_status/2-fun-0-",
[:offsets, %{"s1-1" => 5}, []],
[file: 'src/rabbit_stream_queue.erl', line: 608]},
{:maps, :fold_1, 3, [file: 'maps.erl', line: 410]},
{:rabbit_stream_queue, :tracking_status, 2, []}]}

After this commit:

> ./sbin/rabbitmq-streams stream_status --tracking s1
Status of stream s1 on node rabbit@localhost ...
┌────────┬───────────┬───────┐
│ type   │ reference │ value │
├────────┼───────────┼───────┤
│ offset │ s1-1      │ 51    │
└────────┴───────────┴───────┘
2021-07-23 19:34:31 +02:00
Michael Klishin c1e3710140
Squash a compiler warning 2021-07-20 00:55:40 +03:00
Philip Kuryloski d6399bbb5b
Mixed version testing in bazel (#3200)
Unlike with gnu make, mixed version testing with bazel uses a package-generic-unix for the secondary umbrella rather than the source. This brings the benefit of being able to mixed version test releases built with older erlang versions (even though all nodes will run under the single version given to bazel)

This introduces new test labels, adding a `-mixed` suffix for every existing test. They can be skipped if necessary with `--test_tag_filters` (see the github actions workflow for an example)

As part of the change, it is now possible to run an old release of rabbit with rabbitmq_run rule, such as:

`bazel run @rabbitmq-server-generic-unix-3.8.17//:rabbitmq-run run-broker`
2021-07-19 14:33:25 +02:00
Philip Kuryloski 0f4cf2755d Increase a timeout for flakiness sake 2021-07-19 14:24:46 +02:00
Philip Kuryloski 0a78484999 Make things a little more consistent between per_*_limit suites 2021-07-16 14:40:51 +02:00
Philip Kuryloski 4f514f435b Try to reduce flakes in per_user_connection_channel_limit_partitions_SUITE 2021-07-16 14:32:35 +02:00
Philip Kuryloski 97e8037b80 Replace some static sleeps in tests with dynamic waits
This should help with flakiness
2021-07-15 16:42:14 +02:00
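As a rough illustration of what commit 97e8037b80 above means by replacing a static sleep with a dynamic wait, here is a minimal Python sketch. The helper name `wait_until` and its parameters are hypothetical and are not taken from the RabbitMQ test helpers; the idea is only that the test polls a condition and returns as soon as it holds, instead of always sleeping for a fixed worst-case duration.

```python
import time


def wait_until(condition, timeout=5.0, interval=0.05):
    """Poll `condition` until it returns a truthy value or `timeout` seconds pass.

    Returns True if the condition became true in time, False otherwise.
    (Hypothetical helper, for illustration only.)
    """
    deadline = time.monotonic() + timeout
    while True:
        if condition():
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)
```

A test would then call something like `assert wait_until(lambda: queue_is_ready())` (with `queue_is_ready` being whatever check the test needs) rather than sleeping unconditionally, which tends to be both faster on fast machines and less flaky on slow ones.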
dcorbacho e65ba8347c Fix delete_replica bug
It caused a lot of flakiness on the rabbit_stream_queue_SUITE, both on `delete_replica`
and `delete_last_replica` test cases.
2021-07-14 17:18:20 +02:00
dcorbacho 6052ecdc9c Split cluster_size_3_parallel in two groups
This makes it faster to run the flaky tests locally and to isolate them
2021-07-14 17:18:20 +02:00
Philip Kuryloski 923d87f847 Avoid using a duplicate group name in rabbit_stream_queue_SUITE
Since bazel-erlang doesn't support this with sharding
2021-07-09 16:26:56 +02:00
Karl Nilsson 284809e750
Merge pull request #3170 from rabbitmq/stream-flaky
Fix restart of stream coordinator when there are no stream queues
2021-07-05 15:03:45 +01:00
Philip Kuryloski da6da8d6c7 Clear memory alarms on all nodes in the memory_alarm_rolls_wal test
If the alarm is triggered directly with `rabbit_alarm` it has to be
cleared on all nodes
2021-07-05 15:57:56 +02:00
dcorbacho deaa42ecac Fix restart of stream coordinator when there are no stream queues
Recovering from an existing queue is fine but if a node is restarted when
there are no longer stream queues on the system, the recovery process won't
restart the pre-existing coordinator as that's only performed on queue recovery.
The first attempt to declare a new stream queue on this cluster will crash with
`coordinator unavailable` error, as it only restarts the local coordinator
and not the whole ra cluster, thus lacking quorum.

Recovering the coordinator during the boot process ensures that a pre-existing
coordinator cluster is restarted in any case, and does nothing if there was
never a coordinator on the node.
2021-07-05 15:34:05 +02:00
Philip Kuryloski 390a00b828 Handle feature flag enablement failure more gracefully in test setup 2021-07-05 11:22:38 +02:00
Philip Kuryloski 1b92fadd80 Skip additional quorum_queue_SUITE cases under mixed versions 2021-06-30 12:41:59 +02:00
Philip Kuryloski d086af8070 Reduce test case flakiness in quorum_queue_SUITE
for the clustered/cluster_size_3/confirm_availability_on_leader_change case
2021-06-30 10:05:23 +02:00
Philip Kuryloski ef9647671f Introduce dynamic wait in parts of the quorum_queue_SUITE
to help with test flakes
2021-06-29 18:32:53 +02:00
Philip Kuryloski a8ae32e2f7 Skip an additional quorum_queue_SUITE case in mixed versions 2021-06-29 16:43:19 +02:00
Michael Klishin cf147ebfe5
Merge branch 'master' into mk-stricter-stop-start-assertions-in-quorum-queue-suite 2021-06-29 12:49:31 +03:00
Michael Klishin a1ab7452ef
Improve assertions in a QQ suite test 2021-06-29 12:16:23 +03:00
Philip Kuryloski 3cb8ff1ab9 Mixed version testing skip updates 2021-06-29 10:49:06 +02:00
dcorbacho c9305d948a
Use number of publishing channels as global publishers in amqp091 2021-06-29 08:10:42 +01:00
Michael Klishin bed64f2cc9
Reduce priority_queue_SUITE to single node tests
Other tests (that produce flakes) arguably test classic mirrored
queues, a deprecated feature reasonably well
covered in other suites.

Per discussion with @gerhard.
2021-06-28 21:59:16 +03:00
Michael Klishin a19a0f924a
quorum_queue_SUITE: don't unconditionally skip node_removal_is_not_quorum_critical
Unintentionally introduced in a3c97d491f
2021-06-28 13:02:55 +03:00
Philip Kuryloski a3c97d491f Update additional test skipping for 3.8/3.9 mixed versions 2021-06-25 11:17:46 +02:00
Philip Kuryloski dca208abce Additional skipping of unsupported tests in mixed version clusters
Also consolidate the mixed version check on
rabbit_ct_helpers:is_mixed_versions/1 as much as possible
2021-06-23 14:27:41 +02:00
Gerhard Lazu c7971252cd
Global counters per protocol + protocol AND queue_type
This way we can show how many messages were received via a certain
protocol (stream is the second real protocol besides the default amqp091
one), as well as by queue type, which is something that many asked for a
really long time.

The most important aspect is that we can also see them by protocol AND
queue_type, which becomes very important for Streams, which have
different rules from regular queues (for example, consuming
messages is non-destructive, and deep queue backlogs - think billions of
messages - are normal). Alerting and consumer scaling due to deep
backlogs will now work correctly, as we can distinguish between regular
queues & streams.

This has gone through a few cycles, with @mkuratczyk & @dcorbacho
covering most of the ground. @dcorbacho had most of this in
https://github.com/rabbitmq/rabbitmq-server/pull/3045, but the main
branch went through a few changes in the meantime. Rather than resolving
all the conflicts, and then making the necessary changes, we (@gerhard +
@kjnilsson) took all learnings and started re-applying a lot of the
existing code from #3045. We are confident in this approach and would
like to see it through. We continued working on this with @dumbbell, and
the most important changes are captured in
https://github.com/rabbitmq/seshat/pull/1.

We expose these global counters in rabbitmq_prometheus via a new
collector. We don't want to keep modifying the existing collector, which
grew really complex in parts, especially since we introduced
aggregation, but start with a new namespace, `rabbitmq_global_`, and
continue building on top of it. The idea is to build in parallel, and
slowly transition to the new metrics, because semantically the changes
are too big since streams, and we have been discussing protocol-specific
metrics with @kjnilsson, which makes me think that this approach is
least disruptive and... simple.

While at this, we removed redundant empty return value handling in the
channel. The function called no longer returns this.

Also removed all DONE / TODO & other comments - we'll handle them when
the time comes, no need to leave TODO reminders.

Pairs @kjnilsson @dcorbacho @dumbbell
(this is multiple commits squashed into one)

Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
2021-06-22 14:14:21 +01:00
Philip Kuryloski 2d06a67921 Fix test for rabbitmq-ct-helpers change
rabbit_ct_broker_helpers:rpc now raises an exception if the remote
call fails. This matches the assertion with the new behavior
2021-06-22 11:12:17 +02:00
Michael Klishin 0e6298d1ae
Explain
(cherry picked from commit 2387022e8c)
2021-06-21 14:43:54 +08:00
Michael Klishin 6550cd1752
Reduce flakiness in rabbitmq_queues_cli_integration_SUITE
In case removed node hosts a leader, it takes a moment for
the QQ to elect a new one and begin accepting cluster
membership change operations again.

(cherry picked from commit a9d8816c6a)
2021-06-21 14:41:10 +08:00
Philip Kuryloski 40a7a1c24c Bring rabbit:logger_SUITE online in bazel and bump mismatched deps 2021-06-18 14:41:14 +02:00
Philip Kuryloski cff7516317 Skip some tests that are not mixed version compatible
Mark per_user_connection_channel_tracking_SUITE:cluster_size_2_network
as not mixed version compatible.

In a mixed 3.8/3.9 cluster, changes to rabbit_core_ff.erl imply that
some feature flag related migrations cannot occur, and therefore
user_limits cannot be enabled as required by the test
2021-06-17 15:36:22 +02:00
Karl Nilsson 8a4f4c6d45 Ignore dynamic_qq test that isn't mixed version compatible
quorum_unaffected_after_vhost_failure isn't mixed versions compatible as
it tries to declare a queue in a mixed cluster from a node running RA 1.x where all other
nodes are running Ra 2.0.
2021-06-17 11:52:34 +01:00
Karl Nilsson d8ac46d745 Mark quorum queue test as non-mixed-version compatible
simple_confirm_availability_on_leader_change can't be made forwards compatible
as when running in mixed mode the queue declaration happens on an old node in
a cluster of mostly new nodes. As new nodes run Ra 2.0 and Ra 1.x does not know
how to create members on Ra 2.0 nodes this test fails. This is an acceptable limitation
for a transient mixed versions cluster.
2021-06-17 10:48:20 +01:00
Michal Kuratczyk 437d8aa8c5 Don't run policy tests in parallel
Now that a policy overwrites queue arguments, running policy tests in
parallel with other tests leads to non-deterministic test results with
some tests randomly failing.
2021-06-07 16:46:14 +02:00
David Ansari 0876746d5f Remove randomized startup delays
On initial cluster formation, only one node in a multi node cluster
should initialize the Mnesia database schema (i.e. form the cluster).
To ensure that for nodes starting up in parallel,
RabbitMQ peer discovery backends have used
either locks or randomized startup delays.

Locks work great: When a node holds the lock, it either starts a new
blank node (if there is no other node in the cluster), or it joins
an existing node. This makes it impossible to have two nodes forming
the cluster at the same time.
Consul and etcd peer discovery backends use locks. The lock is acquired
in the consul and etcd infrastructure, respectively.

For other peer discovery backends (classic, DNS, AWS), randomized
startup delays were used. They work well enough in most cases.
However, in https://github.com/rabbitmq/cluster-operator/issues/662 we
observed that in 1% - 10% of the cases (the more nodes or the
smaller the randomized startup delay range, the higher the chances), two
nodes decide to form the cluster. That's bad since it will end up in a
single Erlang cluster, but in two RabbitMQ clusters. Even worse, no
obvious alert got triggered or error message logged.

To solve this issue, one could increase the randomized startup delay
range from e.g. 0m - 1m to 0m - 3m. However, this makes initial cluster
formation very slow since it will take up to 3 minutes until
every node is ready. In rare cases, we still end up with two nodes
forming the cluster.

Another way to solve the problem is to name a dedicated node to be the
seed node (forming the cluster). This was explored in
https://github.com/rabbitmq/cluster-operator/pull/689 and works well.
Two minor downsides to this approach are: 1. If the seed node never
becomes available, the whole cluster won't be formed (which is okay),
and 2. it doesn't integrate with existing dynamic peer discovery backends
(e.g. K8s, AWS) since nodes are not yet known at deploy time.

In this commit, we take a better approach: We remove randomized startup
delays altogether. We replace them with locks. However, instead of
implementing our own lock implementation in an external system (e.g. in K8s),
we re-use Erlang's locking mechanism global:set_lock/3.

global:set_lock/3 has some convenient properties:
1. It accepts a list of nodes to set the lock on.
2. The nodes in that list connect to each other (i.e. create an Erlang
cluster).
3. The method is synchronous with a timeout (number of retries). It
blocks until the lock becomes available.
4. If a process that holds a lock dies, or the node goes down, the lock
held by the process is deleted.

The list of nodes passed to global:set_lock/3 corresponds to the nodes
the peer discovery backend discovers (lists).

Two special cases worth mentioning:

1. That list can be all desired nodes in the cluster
(e.g. in classic peer discovery where nodes are known at
deploy time) while only a subset of nodes is available.
In that case, global:set_lock/3 still sets the lock without
blocking until all nodes can be connected to. This is good since
nodes might start sequentially (non-parallel).

2. In dynamic peer discovery backends (e.g. K8s, AWS), this
list can be just a subset of desired nodes since nodes might not start up
in parallel. That's also not a problem as long as the following
requirement is met: "The peer discovery backend does not list two disjoint
sets of nodes (on different nodes) at the same time."
For example, in a 2-node cluster, the peer discovery backend must not
list only node 1 on node 1 and only node 2 on node 2.

Existing peer discovery backends fulfill that requirement because the
resource the nodes are discovered from is global.
For example, in K8s, once node 1 is part of the Endpoints object, it
will be returned on both node 1 and node 2.
Likewise, in AWS, once node 1 started, the described list of instances
with a specific tag will include node 1 when the AWS peer discovery backend
runs on node 1 or node 2.

Removing randomized startup delays also makes cluster formation
considerably faster (up to 1 minute faster if that was the
upper bound in the range).
2021-06-03 08:01:28 +02:00
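The locking idea described in commit 0876746d5f above can be sketched with a small Python simulation. This is not the Erlang implementation: a `threading.Lock` stands in for `global:set_lock/3`, threads stand in for nodes booting in parallel, and the names `SharedState`, `boot` and `start_in_parallel` are hypothetical. The sketch does not model the node list, retries, or lock release when a holder dies; it only shows why serializing the "form or join" decision means exactly one node forms the cluster.

```python
import threading


class SharedState:
    """What every simulated node can see: one lock and the cluster membership."""

    def __init__(self):
        self.lock = threading.Lock()  # stand-in for the cluster-wide lock
        self.members = []             # nodes in the cluster, in join order
        self.seeds = []               # nodes that initialized a new cluster


def boot(node, state):
    # Only the lock holder may decide between "form a new cluster" and "join".
    with state.lock:
        if not state.members:
            state.seeds.append(node)  # nobody there yet: this node forms the cluster
        state.members.append(node)    # otherwise it joins the existing cluster


def start_in_parallel(nodes):
    state = SharedState()
    threads = [threading.Thread(target=boot, args=(n, state)) for n in nodes]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return state
```

Whatever order the threads happen to run in, `start_in_parallel(["rabbit@a", "rabbit@b", "rabbit@c"]).seeds` ends up with a single entry, which is the property randomized startup delays could not guarantee.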
Michael Klishin 12253d2fb4
Merge pull request #2954 from rabbitmq/new-segment-entry-count-default
Set segment_entry_count per vhost and use a better default
2021-05-27 01:56:02 +03:00
Karl Nilsson 1ea7bf5519 quorum_queue_SUITE restructure tests
Run more tests with 3 node cluster and have only one group definition
for cluster_size_2
2021-05-21 15:13:18 +01:00
Karl Nilsson 355b1cbe21 quorum_queue_SUITE only configure dist proxy when needed
Only configure the dist proxy for groups that require it.
2021-05-21 12:39:07 +01:00
Arnaud Cogoluègnes c30e013d7a
Rename max-segment-size to stream-max-segment-size-bytes 2021-05-20 10:16:19 +02:00
Michael Klishin 040f8cc912
Replace a few more leftover MPLv1.1 license headers
Most files have been using the MPLv2 headers for months now.
These were detected by the OSL process.
2021-05-19 21:20:47 +03:00
Michael Klishin 09a4ad411e
Merge pull request #3046 from rabbitmq/mk-extra-bcc-routing-target-in-queue-metadata
Make it possible for queues to have extra BCC targets specified as options
2021-05-19 17:56:39 +03:00
Karl Nilsson a96670b6c6 Fix stream x-stream-offset regression
x-stream-offset supports "friendly" relative timebase specifications
such as 100s. A recent change introduced a validation of the x-stream-offset
that disallowed such specs.
2021-05-19 11:23:37 +01:00
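To make the "friendly" relative offset in commit a96670b6c6 above more concrete, here is a minimal Python sketch of parsing such a spec into seconds. The commit message only shows `100s`; the unit table below (`s`, `m`, `h`, `D`) and the function name `parse_relative_offset` are assumptions for illustration and may not match what the broker accepts.

```python
import re

# Assumed unit suffixes; only "s" appears in the commit message.
UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600, "D": 86400}


def parse_relative_offset(spec):
    """Turn a spec such as "100s" into a number of seconds."""
    match = re.fullmatch(r"(\d+)([smhD])", spec)
    if match is None:
        raise ValueError(f"invalid relative offset spec: {spec!r}")
    amount, unit = match.groups()
    return int(amount) * UNIT_SECONDS[unit]
```

A validation step that only accepted the non-relative offset forms would reject `100s`, which is the kind of regression the commit describes.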
Michael Klishin 38c15d691d
Make it possible for queues to have extra BCC targets specified as options
This introduces a backup mechanism that can be controlled
by plugins via policies.

Benchmarks suggest the cost of this change on
Erlang 24 is well under 1%. With a stream target, it is less
than routing to one extra queue of the same type (e.g. a quorum queue).
2021-05-18 22:22:16 +03:00
dcorbacho 733f5fb367 Report stream coordinator unavailable as an amqp error
Uses code 506: resource_error
2021-05-12 17:12:09 +01:00
Karl Nilsson 94e943692b
Merge pull request #3022 from rabbitmq/relative-time-offset
Support relative time based offset specs
2021-05-11 13:50:00 +01:00
Loïc Hoguin d9344b2b58
Set segment_entry_count per vhost and use a better default
The new default of 2048 was chosen based on various scenarios.
It provides much better memory usage when many queues are used
(allowing one host to go from 500 queues to 800+ queues) and
there seems to be no or only negligible performance cost (< 1%)
for single queues.
2021-05-11 10:45:28 +02:00
dcorbacho 464bf69cc4 Support relative time based offset specs 2021-05-03 17:55:43 +02:00
dcorbacho bcac37d442 Disallow removal of the last stream member 2021-04-30 17:25:06 +02:00
Michael Klishin 62df3b7ebc
Reduce log output 2021-04-28 00:24:06 +03:00
Karl Nilsson 63e33aef6d
Merge pull request #2996 from rabbitmq/stream-add-replica-check
Streams: safer replica addition
2021-04-27 10:37:57 +01:00
kjnilsson a827275a43 Streams: safer replica addition
Disallow replica additions if any of the existing replicas are more than
10 seconds out of date.
2021-04-27 09:38:39 +01:00
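The rule in commit a827275a43 above can be expressed as a small predicate. This Python sketch assumes that staleness is measured as the time since each replica was last known to be in sync; the commit message does not say how the broker measures it, and `can_add_replica` is a hypothetical name.

```python
MAX_LAG_SECONDS = 10  # threshold taken from the commit message


def can_add_replica(last_in_sync_at, now):
    """Allow a replica addition only if no existing replica is more than
    MAX_LAG_SECONDS out of date.

    `last_in_sync_at` maps replica name -> timestamp in seconds (assumed metric).
    """
    return all(now - ts <= MAX_LAG_SECONDS for ts in last_in_sync_at.values())
```

For example, with `now = 100`, replicas last in sync at 95 and 91 allow the addition, while a replica last in sync at 80 blocks it.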
Ayanda-D d78e14ad3b Allow #amqp_error{} responses in channel interceptors 2021-04-20 14:57:55 +01:00
Michael Klishin d147a08aee
Correctly parse tags provided as a list
Discovered while testing a PR for rabbit-hole
2021-04-16 18:35:47 +03:00
kjnilsson b35c29d7b2 QQ: ensure that messages are delivered in order
In the case where there are some messages kept in memory mixed with
some that are not, it is possible that messages are delivered to the
consuming channel with gaps/out of order which would in some cases cause
the channel to treat them as re-sends it has already seen and just
discard them. When this happens the messages get stuck in the consumer
state inside the queue and are never seen by the client consumer and
thus never acked. When this happens, the release cursors can't be emitted
as the smallest raft index will be one of the stuck messages.
2021-04-15 15:01:22 +01:00
Michael Klishin 6a4ee16b79
Merge pull request #2968 from rabbitmq/longer-qq-names
Allow quorum queue names to exceed atom max chars
2021-04-12 18:45:22 +03:00
kjnilsson 432edb11fc Allow quorum queue names to exceed atom max chars
If the concatenation of the vhost and the queue name exceeds 255 chars
we generate an arbitrary atom name instead of throwing an
exception.
2021-04-12 14:14:26 +01:00
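The length check in commit 432edb11fc above might look roughly like the following Python sketch. The `_` separator, the `queue_` prefix and the use of a SHA-256 digest are all assumptions made for illustration; the commit message only says that an arbitrary name is generated when the concatenation exceeds 255 characters.

```python
import hashlib

MAX_ATOM_CHARS = 255  # limit quoted in the commit message


def ra_name(vhost, queue):
    """Return a name derived from vhost and queue that fits in MAX_ATOM_CHARS."""
    candidate = f"{vhost}_{queue}"  # separator is an assumption
    if len(candidate) <= MAX_ATOM_CHARS:
        return candidate
    # Too long: fall back to a short generated name instead of raising.
    digest = hashlib.sha256(candidate.encode("utf-8")).hexdigest()
    return f"queue_{digest}"
```

Short names are kept readable, and only over-long ones are replaced by a generated name.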
Michael Klishin 7f98bc3d1c
Add more VM memory monitor tests, pass Dialyzer
(cherry picked from commit 57ec1f8768)
2021-04-11 11:36:30 +03:00
Michael Klishin 30cbbba167
High VM watermark: support {relative, N} values set via advanced.config
for usability. It is not any different from when a float value
is used and only exists as a counterpart to '{absolute, N}'.

Also nothing changes for rabbitmq.conf users as that format performs
validation and correct value translation.

See #2694, #2965 for background.
2021-04-11 10:28:35 +03:00
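The equivalence stated in commit 30cbbba167 above (a `{relative, N}` value behaves like a plain float, and `{absolute, N}` is its counterpart) can be sketched in Python as follows. Tuples stand in for the Erlang terms, `memory_limit_bytes` is a hypothetical name, and treating the absolute value as a plain byte count is a simplifying assumption.

```python
def memory_limit_bytes(watermark, total_bytes):
    """Compute a memory limit from a watermark setting.

    Accepts a float, ("relative", fraction) or ("absolute", bytes).
    """
    if isinstance(watermark, float):
        return int(total_bytes * watermark)
    kind, value = watermark
    if kind == "relative":
        return int(total_bytes * value)  # same as the bare float form
    if kind == "absolute":
        return int(value)
    raise ValueError(f"unsupported watermark: {watermark!r}")
```

With 1024 bytes of total memory, both `0.5` and `("relative", 0.5)` give a limit of 512 bytes.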
Philip Kuryloski 3644ed58ee Test sharding and flaky annotations
Also rename a nested common test group in quorum_queue_SUITE to avoid
a name collision that prevented running the duplicates individually
2021-04-08 15:33:19 +02:00
kjnilsson e2fd14b996 Bump timeouts for peer discovery suite 2021-04-07 10:00:07 +01:00
kjnilsson b576242952 Increase rabbit_stream_queue_SUITE timetrap
And set the default of make start-cluster to 3 nodes.
2021-04-06 15:50:22 +01:00
Jean-Sébastien Pédron 95f9e92caa
unit_log_management_SUITE: Use $RABBITMQ_LOGS to configure logging
Now that the Cuttlefish schema sets default values for the application
environment in `{rabbit, [{log, ...}]}`, the values set in the testsuite
using application:setenv() are overwritten.

By using the $RABBITMQ_LOGS environment variable, we can override those
default values.
2021-04-06 11:52:55 +02:00
Philip Kuryloski 0caeb65d04 Shard the eager_sync_SUITE by case
This suite contains only one group, but is long enough to warrant
sharding. This is probably a bit of a time penalty in absolute terms
because init_per_suite and init_per_group re-run in each shard.
2021-03-31 15:47:36 +02:00
Jean-Sébastien Pédron 571b97513f
Logging: Allow to set timezone in rfc3339- and format-string-based time formats
This is not exposed to the end user (yet) through the Cuttlefish
configuration. But this is required to make logging_SUITE timezone
agnostic (i.e. the timezone of the host running the testsuite should not
affect the formatted times).
2021-03-31 14:13:40 +02:00
Carl Hörberg 330b820a0f Update proxy protocol test cases 2021-03-30 16:55:36 +02:00
Jean-Sébastien Pédron 2f648da118
config_schema_SUITE: Stop testing log configuration
The design of the rabbit_ct_config_schema helper makes it impossible to
do pattern matching and thus handle default values in the schema. As a
consequence, the helper explicitly removes the `{rabbit, {log, _}}`
configuration key to work around this limitation until a proper solution
is implemented and all testsuites rewritten. See
rabbitmq/rabbitmq-ct-helpers@b1f1f1ce68.

Therefore, we can't test log configuration variables anymore using this
helper. That's ok because logging_SUITE already tests many things.
2021-03-30 10:21:26 +02:00
Jean-Sébastien Pédron aca638abbb
Logging: Add configuration variables to set various formats
In addition to the existing configuration variables to configure
logging, the following variables were added to extend the settings.

log.*.formatter = plaintext | json
  Selects between the plain text (default) and JSON formatters.

log.*.formatter.time_format = rfc3339_space | rfc3339_T | epoch_usecs | epoch_secs | lager_default
  Configures how the timestamp should be formatted. It has several
  values to get RFC3339 date & time, Epoch-based integers and Lager
  default format.

log.*.formatter.level_format = lc | uc | lc3 | uc3 | lc4 | uc4
  Configures how to format the level. Things like uppercase vs.
  lowercase, full vs. truncated.
  Examples:
    lc: debug
    uc: DEBUG
    lc3: dbg
    uc3: DBG
    lc4: dbug
    uc4: DBUG

log.*.formatter.single_line = on | off
  Indicates if multi-line messages should be reformatted as a
  single-line message. A multi-line message is converted to a
  single-line message by joining all lines and separating them
  with ", ".

log.*.formatter.plaintext.format
  Set to a pattern to indicate the format of the entire message. The
  format pattern is a string with $-based variables. Each variable
  corresponds to a field in the log event. Here is a non-exhaustive list
  of common fields:
    time
    level
    msg
    pid
    file
    line
  Example:
    $time [$level] $pid $msg

log.*.formatter.json.field_map
  Indicates if fields should be renamed or removed, and the ordering
  which they should appear in the final JSON object. The order is set by
  the order of fields in that configuration variable.
  Example:
    time:ts level msg *:-
  In this example, `time` is renamed to `ts`. `*:-` removes all
  fields not mentioned in the list. In the end the JSON object will
  contain the fields in the following order: ts, level, msg.

log.*.formatter.json.verbosity_map
  Indicates if a verbosity field should be added and how it should be
  derived from the level. If the verbosity map is not set, no verbosity
  field is added to the JSON object.
  Example:
    debug:2 info:1 notice:1 *:0
  In this example, debug verbosity is 2, info and notice verbosity is 1,
  other levels have a verbosity of 0.

All of them work with the console, exchange, file and syslog outputs.

The console output has specific variables too:

log.console.stdio = stdout | stderr
  Indicates if stdout or stderr should be used. The default is stdout.

log.console.use_colors = on | off
  Indicates if colors should be used in log messages. The default
  depends on the environment.

log.console.color_esc_seqs.*
  Indicates how each level is mapped to a color. The value can be any
  string but the idea is to use an ANSI escape sequence.
  Example:
    log.console.color_esc_seqs.error = \033[1;31m

V2: A custom time format pattern was introduced, first using variables,
    then a reference date & time (e.g. "Mon 2 Jan 2006"), thanks to
    @ansd. However, we decided to remove it for now until we have a
    better implementation of the reference date & time parser.

V3: The testsuite was extended to cover new settings as well as the
    syslog output. To test it, a fake syslogd server was added (Erlang
    process, part of the testsuite).

V4: The dependency to cuttlefish is moved to rabbitmq_prelaunch which
    actually uses the library. The version is updated to 3.0.1 because
    we need Kyorai/cuttlefish#25.
2021-03-29 17:39:50 +02:00
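The `field_map` and `verbosity_map` settings described in commit aca638abbb above are easier to follow with a worked example. The Python sketch below interprets the two example values from the commit message. It is not the broker's formatter: the function names are hypothetical, and two behaviors are assumptions because the message does not spell them out, namely that `name:-` drops a single field and that unlisted fields are kept after the listed ones when no `*:-` entry is present.

```python
def apply_field_map(event, field_map):
    """Rename, drop and reorder the fields of a log event (a dict)."""
    out = {}
    handled = set()
    drop_rest = False
    for entry in field_map.split():
        name, _, target = entry.partition(":")
        if name == "*":
            drop_rest = target == "-"
            continue
        handled.add(name)
        if target == "-" or name not in event:
            continue  # explicitly removed, or absent from this event
        out[target or name] = event[name]  # "time:ts" renames, "level" keeps
    if not drop_rest:
        # Assumption: unlisted fields follow the listed ones.
        for key, value in event.items():
            if key not in handled:
                out[key] = value
    return out


def verbosity(level, verbosity_map):
    """Derive a verbosity number from a level; None if nothing matches."""
    default = None
    for entry in verbosity_map.split():
        name, _, value = entry.partition(":")
        if name == level:
            return int(value)
        if name == "*":
            default = int(value)
    return default
```

With the field map `time:ts level msg *:-`, an event containing `time`, `level`, `msg` and `pid` comes out with the keys `ts`, `level`, `msg` in that order. With the verbosity map `debug:2 info:1 notice:1 *:0`, `debug` maps to 2 and `error` to 0.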
Philip Kuryloski 388654c542
Add a partial Bazel build (#2938)
Adds WORKSPACE.bazel, BUILD.bazel & *.bzl files for partial build & test with Bazel. Introduces a build-time dependency on https://github.com/rabbitmq/bazel-erlang
2021-03-29 11:01:43 +02:00
Philip Kuryloski 09e85d2e3d
Merge pull request #2935 from rabbitmq/rabbitmq-queue-int-tests
Fix integration tests to wait until ra cluster is ready
2021-03-26 17:28:06 +01:00
dcorbacho a1caff2a86 Fix integration tests to wait until ra cluster is ready
Publish/confirm before grow/shrink members is enough
2021-03-26 17:04:50 +01:00
Philip Kuryloski 1ead01081a Increase startup delay range in peer_discovery_classic_config_SUITE
I suspect the second ra system for coordination requires a bit more
time in boot, as this seems to flake more often since the merge
2021-03-26 14:11:36 +01:00
Philip Kuryloski 3c0c0901b1 Restore retry in peer_discovery_classic_config_SUITE
It was accidentally left commented out
2021-03-25 20:05:36 +01:00
Philip Kuryloski c313f36b57 Fix Makefile for feature_flags_SUITE_data/my_plugin
It was not updated for the rabbitmq-components.mk consolidation
2021-03-25 19:43:48 +01:00
Philip Kuryloski 008e47ef3c Fixup the behavior of rabbit_mnesia:is_virgin_node/0
Given the addition of the Coord ra system (and additional files on disk)
2021-03-25 10:49:17 +01:00
kjnilsson 8d8b67bb34 fix rabbit_fifo_int_SUITE 2021-03-24 14:17:34 +00:00
Michael Klishin 8eac876bc8
Use "quorum_queues" for QQ Ra system
"quorum" and "coordination" are not very distinctive
2021-03-22 21:44:19 +03:00
kjnilsson 75cea78415
fixes 2021-03-22 21:44:19 +03:00
kjnilsson f6f02a5d2d
ra systems wip 2021-03-22 21:44:15 +03:00
Philip Kuryloski a63f169fcb Remove duplicate rabbitmq-components.mk and erlang.mk files
Also adjust the references in rabbitmq-components.mk to account for
post monorepo locations
2021-03-22 15:40:19 +01:00
Michael Klishin 373285093e
Merge pull request #2899 from rabbitmq/parallel-stream-suite
Run most stream tests in parallel
2021-03-19 22:21:18 +03:00
Jean-Sébastien Pédron 9fd2d68e7a
rabbit_prelaunch_logging: $RABBITMQ_LOGS doesn't override log level
... if it is set in the configuration file.

Here is an example of that use case:
* The official Docker image sets RABBITMQ_LOGS=- in the environment
* A user of that image adds a configuration file with:
      log.console.level = debug

The initial implementation, introduced in rabbitmq/rabbitmq-server#2861,
considered that if the output is overridden in the environment (through
$RABBITMQ_LOGS), any output configuration in the configuration file is
ignored.

The problem is that the output-specific configuration could also set the
log level which is not changed by $RABBITMQ_LOGS. This patch fixes that
by keeping the log level from the configuration (if it is set obviously)
even if the output is overridden in the environment.
2021-03-19 15:43:28 +01:00
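The precedence fixed by commit 9fd2d68e7a above can be sketched as a small merge function. This Python example is only an illustration: `resolve_logging` and the dict layout are hypothetical, and treating any other `$RABBITMQ_LOGS` value as a file path is an assumption. The point is that the environment overrides where logs go, while a level set in the configuration file survives.

```python
def resolve_logging(env_logs, config):
    """Combine $RABBITMQ_LOGS (or None) with configuration-file settings.

    `config` is a dict that may contain "output" and "level".
    """
    if env_logs is None:
        return dict(config)  # nothing overridden in the environment
    output = {"-": "stdout", "-stderr": "stderr"}.get(env_logs, env_logs)
    resolved = {"output": output}
    if "level" in config:
        # The environment variable does not carry a level, so keep this one.
        resolved["level"] = config["level"]
    return resolved
```

For the Docker use case in the commit message, `resolve_logging("-", {"output": "file", "level": "debug"})` yields stdout output at the `debug` level.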
dcorbacho 9b3b5d48ec Run most stream tests in parallel
The test suite isn't faster, I guess some contention on the coordinator,
but is finding some bugs.
2021-03-17 21:32:42 +01:00
kjnilsson cbf0107605 Stream coordinator bug fix
Fix issue where a deleted replica could be restarted if the leader went
down whilst the replica was still running its start phase.
2021-03-17 13:54:28 +00:00
kjnilsson 9d83e0c5d9 Add logging to config decryption test
To possibly get a bit more information on failure reasons on GH Actions.
2021-03-16 16:28:41 +00:00
kjnilsson 3a26cf8654 Stream coordinator: handle commands for unknown streams
To avoid crashing.
2021-03-12 15:04:40 +00:00
kjnilsson 1709208105 Throw resource error when no local stream member
As well as some additional tests
2021-03-12 15:04:40 +00:00
dcorbacho e19aca8075 Use right map fields to compute streams info 2021-03-12 15:04:40 +00:00
kjnilsson 7fa3f6b6e1 Stream Coordinator: primitive backoff
Sleep for 5s after a failure due to a node being down before reporting
back to stream coordinator (which will immediately retry).

stream coordinator: correct command type spec

tidy up

fix rabbit_fifo_prop tests

stream coord: add function for member state query
2021-03-12 15:03:47 +00:00
kjnilsson bb3e0a7674 Move stream coordinator unit tests into ct suite 2021-03-12 15:03:10 +00:00
kjnilsson 9fb2e6d2dd Stream Coordinator refactor 2021-03-12 15:03:08 +00:00
Jean-Sébastien Pédron cdcf602749
Switch from Lager to the new Erlang Logger API for logging
The configuration remains the same for the end-user. The only exception
is the log root directory: it is now set through the `log_root`
application env. variable in `rabbit`. People using the Cuttlefish-based
configuration file are not affected by this exception.

The main change is how the logging facility is configured. It now
happens in `rabbit_prelaunch_logging`. The `rabbit_lager` module is
removed.

The supported outputs remain the same: the console, text files, the
`amq.rabbitmq.log` exchange and syslog.

The message text format slightly changed: the timestamp is more precise
(now to the microsecond) and the level can be abbreviated to always be
4 characters long to align all messages and improve readability. Here is
an example:

    2021-03-03 10:22:30.377392+01:00 [dbug] <0.229.0> == Prelaunch DONE ==
    2021-03-03 10:22:30.377860+01:00 [info] <0.229.0>
    2021-03-03 10:22:30.377860+01:00 [info] <0.229.0>  Starting RabbitMQ 3.8.10+115.g071f3fb on Erlang 23.2.5
    2021-03-03 10:22:30.377860+01:00 [info] <0.229.0>  Licensed under the MPL 2.0. Website: https://rabbitmq.com

The example above also shows that multiline messages are supported and
each line is prepended with the same prefix (the timestamp, the level
and the Erlang process PID).

JSON is also supported as a message format and now for any outputs.
Indeed, it is possible to use it with e.g. syslog or the exchange. Here
is an example of a JSON-formatted message sent to syslog:

    Mar  3 11:23:06 localhost rabbitmq-server[27908] <0.229.0> - {"time":"2021-03-03T11:23:06.998466+01:00","level":"notice","msg":"Logging: configured log handlers are now ACTIVE","meta":{"domain":"rabbitmq.prelaunch","file":"src/rabbit_prelaunch_logging.erl","gl":"<0.228.0>","line":311,"mfa":["rabbit_prelaunch_logging","configure_logger",1],"pid":"<0.229.0>"}}

For quick testing, the values accepted by the `$RABBITMQ_LOGS`
environment variables were extended:
  * `-` still means stdout
  * `-stderr` means stderr
  * `syslog:` means syslog on localhost
  * `exchange:` means logging to `amq.rabbitmq.log`

`$RABBITMQ_LOG` was also extended. It now accepts a `+json` modifier (in
addition to the existing `+color` one). With that modifier, messages are
formatted as JSON instead of plain text.

The `rabbitmqctl rotate_logs` command is deprecated. The reason is
Logger does not expose a function to force log rotation. However, it
will detect when a file was rotated by an external tool.

From a developer point of view, the old `rabbit_log*` API remains
supported, though it is now deprecated. It is implemented as regular
modules: there is no `parse_transform` involved anymore.

In the code, it is recommended to use the new Logger macros. For
instance, `?LOG_INFO(Format, Args)`. If possible, messages should be
augmented with some metadata. For instance (note the map after the
message):

    ?LOG_NOTICE("Logging: switching to configured handler(s); following "
                "messages may not be visible in this log output",
                #{domain => ?RMQLOG_DOMAIN_PRELAUNCH}),

Domains in Erlang Logger parlance are the way to categorize messages.
Some predefined domains, matching previous categories, are currently
defined in `rabbit_common/include/logging.hrl` or headers in the
relevant plugins for plugin-specific categories.

At this point, very few messages have been converted from the old
`rabbit_log*` API to the new macros. It can be done gradually when
working on a particular module or logging.

The Erlang builtin console/file handler, `logger_std_h`, has been forked
because it lacks date-based file rotation. The configuration of
date-based rotation is identical to Lager. Once the dust has settled for
this feature, the goal is to submit it upstream for inclusion in Erlang.
The forked module is called `rabbit_logger_std_h` and is based on
`logger_std_h` in Erlang 23.0.
2021-03-11 15:17:36 +01:00
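The multi-line behavior described in commit cdcf602749 above (each line of a message carries the same timestamp, level and PID prefix) can be mimicked with a few lines of Python. This is a sketch of the visible output format only, not of the Logger-based formatter, and `format_event` is a hypothetical name.

```python
def format_event(timestamp, level, pid, msg):
    """Prefix every line of a possibly multi-line message with the same header."""
    prefix = f"{timestamp} [{level}] {pid} "
    # rstrip() avoids a dangling space on lines that are empty in the message.
    return "\n".join((prefix + line).rstrip() for line in msg.split("\n"))
```

Calling it with a message that starts with an empty line followed by ` Starting RabbitMQ ...` produces output shaped like the `[info]` lines in the example from the commit message.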
Michael Klishin d77609bba4
Merge pull request #2846 from rabbitmq/cleanup-rabbit-fifo-usage
Clean up rabbit_fifo_usage table on queue.delete
2021-03-03 18:33:33 +03:00
Michael Klishin a2f98f25e9
Merge pull request #2804 from rabbitmq/rabbitmq-server-2756
Add federation support for quorum queues
2021-02-25 19:10:15 +03:00
dcorbacho a147cc4877 Clean up rabbit_fifo_usage table on queue.delete 2021-02-25 16:57:43 +01:00
Michael Klishin cd1a271499
As of Lager 3.8.2, Lager has a log_root default
so override it unconditionally.
2021-02-25 00:43:02 +03:00
dcorbacho 699cd1ab29 Add federation support for quorum queues 2021-02-18 17:15:47 +01:00
Carl Hörberg 413bfe7b37 Disable Erlang busy wait by default
With the Erlang busy wait threshold disabled, CPU usage with 5000 idle connections
drops from 110% to 14%. Throughput does not seem to be affected at all;
if anything, it actually goes up a bit when you have 5000 idle connections
(because less CPU cycles are wasted polling idle connections).

rabbitmq-perf-test-2.13.0/bin/runjava com.rabbitmq.perf.PerfTest -s 8000 -z 15

With default erlang busy wait threshold:
id: test-115706-497, sending rate avg: 39589 msg/s
id: test-115706-497, receiving rate avg: 39570 msg/s

With busy wait disabled:
id: test-115807-719, sending rate avg: 40340 msg/s
id: test-115807-719, receiving rate avg: 40301 msg/s

rabbitmq-diagnostics runtime_thread_stats output while running the
PerfTest:

with default busy wait threshold:

Stats per type:
         async    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
           aux    0.01%    0.00%    0.00%    0.00%    0.00%    0.00%   99.98%
dirty_cpu_sche    0.00%    0.00%    0.00%    0.03%    0.05%    0.00%   99.92%
dirty_io_sched    0.00%    0.00%    0.00%    0.00%    0.01%    0.00%   99.99%
          poll    0.00%    0.67%    0.00%    0.00%    0.00%    0.00%   99.33%
     scheduler    0.69%    0.18%   28.41%    5.49%    9.50%    7.43%   48.29%

without busy wait threshold:

Stats per type:
         async    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
           aux    0.01%    0.00%    0.00%    0.00%    0.01%    0.00%   99.98%
dirty_cpu_sche    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
dirty_io_sched    0.00%    0.00%    0.00%    0.00%    0.00%    0.00%  100.00%
          poll    0.00%    0.77%    0.00%    0.00%    0.00%    0.00%   99.23%
     scheduler    0.70%    0.14%   28.29%    5.41%    0.86%    7.22%   57.38%
2021-02-10 12:35:12 +01:00
Michael Klishin ad20bfbc40
Use new crypto API cipher name here
References rabbitmq/credentials-obfuscation#10
2021-02-09 11:22:48 +03:00
Michael Klishin 0939cec51a
Exclude aes_ige256 in one more test suite 2021-02-08 11:21:16 +03:00