[Why]
The testcase tries to replicate the steps described in issue #12934.
[How]
It uses intermediate Erlang nodes between the common_test control node
and the RabbitMQ nodes, communicating through `peer`'s standard_io
channel. The goal is to make sure the common_test control node doesn't
interfere with the nodes the RabbitMQ nodes can see, despite the
blocking of the Erlang distribution connection.
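Here is a minimal sketch of that intermediate-node setup (the module and option names come from the standard `peer` API; everything else is a placeholder):
```erlang
%% Start an intermediate node controlled over stdin/stdout instead of
%% the Erlang distribution.
{ok, Peer, Node} = peer:start_link(#{name => peer:random_name(),
                                     connection => standard_io}),
%% Ask the intermediate node which nodes it can see.
Visible = peer:call(Peer, erlang, nodes, []),
peer:stop(Peer).
```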
So far, I couldn't reproduce the problem reported in #12934. @mkuratczyk
couldn't either, so it might have been fixed as a side effect of another
change...
References #12934.
This is a follow-up to commit 93db480bc4
`erlang.mk` supports the `YRL_ERLC_OPTS` variable to set `erlc`-specific
compiler options when processing `.yrl` and `.xrl` files. By using this
variable, it allows `make RMQ_ERLC_OPTS=` to disable the
`+deterministic` option. This allows using `c()` in the erl shell to
recompile modules on the fly when a cluster is running.
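For illustration, the wiring might look like this (a sketch; the exact plumbing in the project's Makefiles may differ):
```make
# Propagate the overridable options to yecc/leex-generated modules,
# so `make RMQ_ERLC_OPTS=` drops `+deterministic` for them too.
YRL_ERLC_OPTS = $(RMQ_ERLC_OPTS)
```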
Before this change, you could see that even when `make RMQ_ERLC_OPTS=`
was run, these generated files were still produced with the
`+deterministic` option, because their `-file` directives use only
basenames:
* `deps/rabbit/src/rabbit_amqp_sql_lexer.erl`
* `deps/rabbit/src/rabbit_amqp_sql_parser.erl`
```
-file("rabbit_amqp_sql_parser.yrl", 0).
-module(rabbit_amqp_sql_parser).
-file("rabbit_amqp_sql_parser.erl", 3).
-export([parse/1, parse_and_scan/1, format_error/1]).
-file("rabbit_amqp_sql_parser.yrl", 122).
```
This commit also ignores those two files, as they will always be
auto-generated.
[Why]
If a user configures an auth backend module but doesn't enable the
plugin that provides it, they get a crash and a stacktrace when
authentication is performed. The error is not helpful for understanding
what the problem is.
[How]
We add a boot step that goes through the configured auth backends and
queries the core of RabbitMQ and the plugins. If an auth backend is
provided by a plugin, the plugin must be enabled for the auth backend
to be considered valid.
In the end, at least one auth backend must be valid, otherwise the boot
is aborted.
If only some of the configured auth backends were filtered out, but
there are still some valid ones, we store the filtered list in the
application environment so that authentication/authorization doesn't
try to use the invalid ones later.
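Here is a minimal sketch of the filtering logic; `plugin_providing/1` is a hypothetical helper that returns the plugin providing a module, or `undefined` if the module comes from RabbitMQ core:
```erlang
%% Keep a configured auth backend only if it comes from the core or
%% from an explicitly enabled plugin.
filter_auth_backends(Backends) ->
    Enabled = rabbit_plugins:enabled_plugins(),
    lists:filter(
      fun(Mod) ->
              case plugin_providing(Mod) of
                  undefined -> true;
                  Plugin    -> lists:member(Plugin, Enabled)
              end
      end, Backends).
```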
We also report invalid auth backends in the logs:
* Info message for a single invalid auth backend:
[info] <0.213.0> The `rabbit_auth_backend_ldap` auth backend module is configured. However, the `rabbitmq_auth_backend_ldap` plugin must be enabled in order to use this auth backend. Until then it will be skipped during authentication/authorization
* Warning message when some auth backends were filtered out:
[warning] <0.213.0> Some configured backends were dropped because their corresponding plugins are disabled. Please look at the info messages above to learn which plugin(s) should be enabled. Here is the list of auth backends kept after filtering:
[warning] <0.213.0> [rabbit_auth_backend_internal]
* Error message when no auth backends are valid:
[error] <0.213.0> None of the configured auth backends are usable because their corresponding plugins were not enabled. Please look at the info messages above to learn which plugin(s) should be enabled.
V2: In fact, `rabbit_plugins:is_enabled/1` indicates if a plugin is
    running, not if it is enabled... The new check runs as a boot step
    and thus is executed before plugins are started. Therefore we can't
    use this API. Instead, we use `rabbit_plugins:enabled_plugins/0`
    which lists explicitly enabled plugins. The drawback is that if an
    auth backend's plugin is enabled implicitly, because it is a
    dependency of another explicitly enabled plugin, the check will
    still consider it disabled and thus abort the boot.
Fixes #13783.
[Why]
This will be used in a later commit to find the auth backend plugin that
provides a configured auth backend module.
[How]
We go through the list of available plugins, regardless of whether they
are enabled, then look up the given module in the list of modules
associated with each plugin's application.
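A minimal sketch of that lookup, assuming plugins are identified by their OTP application names (`find_providing_plugin/2` is a hypothetical name):
```erlang
%% Look up which plugin application (if any) provides Module, using
%% the standard `modules` key from each application's .app file; the
%% application must be loaded for application:get_key/2 to see it.
find_providing_plugin(Module, PluginApps) ->
    lists:search(
      fun(App) ->
              case application:get_key(App, modules) of
                  {ok, Mods} -> lists:member(Module, Mods);
                  undefined  -> false
              end
      end, PluginApps).
```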
... without having to pass a plugins path.
[Why]
It's painful to have to get the plugins path, then pass it to `list/1`
every time. It's also more difficult to discover how to use
`rabbit_plugins` to get that list of plugins.
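A sketch of what such a wrapper could look like; that the plugins path is available from a `plugins_dir` application environment key is an assumption here:
```erlang
%% Hypothetical zero-argument wrapper: resolve the plugins path
%% internally instead of requiring callers to pass it to list/1.
list() ->
    {ok, PluginsDir} = application:get_env(rabbit, plugins_dir),
    list(PluginsDir).
```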
The code path in question is executed every time `rabbit_plugins:list/2`
is used (e.g. via `rabbit_plugins:is_enabled/1`), which with some
distributed plugins can happen once or several times a minute.
Given the maturity of the plugins subsystem, we can arguably drop those
messages.
This version forces prefixed binaries
(such as `encrypted:TkQbjiVWtUJw3Ed/hkJ5JIsFIyhruKII6uKPXogfvDyMXGH1qQK3hVqshFolLN0S`)
to have alphanumeric prefixes (`[a-zA-Z0-9_]+`).
This allows us to tell a generated password value that contains a colon
from a tagged binary.
If a value of, say, `default_pass` or `ssl_options.password` cannot be
parsed as a tagged value, it will be parsed as a regular binary, because
`rabbit.schema` specifies multiple supported types.
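A sketch of the disambiguation check this implies (the function name is hypothetical; the regex mirrors the prefix constraint above):
```erlang
%% A value is treated as tagged only when it starts with an
%% alphanumeric prefix followed by a colon, e.g. `encrypted:...`.
is_tagged(Value) ->
    match =:= re:run(Value, "^[a-zA-Z0-9_]+:", [{capture, none}]).
```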
References #14233.
The `rabbitmq_stream.advertised_tls_host` setting was not used in the
metadata frame of the stream protocol, even when set. This commit makes
sure the setting is honoured when present.
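For illustration, setting it through `advanced.config` might look like this (the key name is derived from the setting above; the hostname is a placeholder):
```erlang
[{rabbitmq_stream, [{advertised_tls_host, "streams.example.com"}]}].
```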
References rabbitmq/rabbitmq-stream-java-client#803
[Why]
I noticed the following error in a test case:
```
error sending frame
Traceback (most recent call last):
  File "/home/runner/work/rabbitmq-server/rabbitmq-server/deps/rabbitmq_stomp/test/python_SUITE_data/src/deps/stomp/transport.py", line 623, in send
    self.socket.sendall(encoded_frame)
OSError: [Errno 9] Bad file descriptor
```
When the test suite succeeds, this error is not present; when it
failed, it was. But I checked only one instance of each, which is not
enough to draw any conclusion about the relationship between this error
and the test case that fails later.
I have no idea which test case hits this error, so increase the
verbosity in the hope that we see the name of the test case running at
the time of this error.
[Why]
I still don't know what causes the transient failures in this testsuite.
The AMQP connection is closed asynchronously, therefore the next test
case is already running by the time the close completes. I have no idea
if this causes trouble, but it makes the broker logs more difficult to
read.
[Why]
The `test_topic_dest` test case fails from time to time in CI. I don't
know why, as there are no errors logged anywhere. Let's assume it's a
timeout that is a bit too short.
While here, apply the same change to `test_exchange_dest`.
[Why]
`gen_tcp:close/1` simply closes the connection and doesn't wait for the
broker to handle it. This sometimes causes the next test to fail
because, in addition to that test's new connection, the previous test's
connection process is still around, waiting for the broker to notice
the close.
[How]
We now wait for the connection to be closed at the end of a test case,
and wait for the connection list to have a single element when we want
to query the connection name.
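A sketch of such a wait, using `rabbit_ct_helpers:await_condition/2` (the timeout value is an assumption):
```erlang
%% Poll the broker until exactly one connection remains, so the next
%% step starts from a known state.
wait_for_single_connection(Config) ->
    rabbit_ct_helpers:await_condition(
      fun() ->
              Conns = rabbit_ct_broker_helpers:rpc(
                        Config, 0, rabbit_networking, connections, []),
              length(Conns) =:= 1
      end, 30000).
```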
[Why]
The connection is about to be killed at the end of the test case. It's
not necessary to close it explicitly.
Moreover, in a slow environment like CI, the connection process might
have already exited when the test case tries to close it. In this case,
it fails with a `noproc` exception.
... when testing user limits
[How]
This is the same fix as the one for the vhost limits test case made in
commit 5aab965db4.
While here, fix a compiler warning about an unused variable.
[Why]
Relying on the return value of the queue deletion is fragile because the
policy is cleared asynchronously.
[How]
We now wait for the queues to reach the expected queue length, then we
delete them and ensure the length didn't change.
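A sketch of the new sequence (`message_count/2`, the queue name and the expected length are assumptions):
```erlang
%% Wait until the policy has taken effect and the queue holds the
%% expected number of messages, then delete it and check the count.
rabbit_ct_helpers:await_condition(
  fun() -> message_count(Config, QName) =:= ExpectedLen end, 30000),
#'queue.delete_ok'{message_count = ExpectedLen} =
    amqp_channel:call(Ch, #'queue.delete'{queue = QName}).
```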
[Why]
Before this change, when the `idle_time_out_on_server/1` test case ran
first in the shuffled test group, the test module was not loaded on the
remote broker. When the anonymous function was passed to meck and
executed, we got the following crash on the broker:
```
crasher:
  initial call: rabbit_heartbeat:'-heartbeater/2-fun-0-'/0
  pid: <0.704.0>
  registered_name: []
  exception error: {undef,
                    [{#Fun<amqp_client_SUITE.14.116163631>,
                      [#Port<0.45>,[recv_oct]],
                      []},
                     {rabbit_heartbeat,get_sock_stats,3,
                      [{file,"rabbit_heartbeat.erl"},{line,175}]},
                     {rabbit_heartbeat,heartbeater,3,
                      [{file,"rabbit_heartbeat.erl"},{line,155}]},
                     {proc_lib,init_p,3,
                      [{file,"proc_lib.erl"},{line,317}]},
                     {rabbit_net,getstat,[#Port<0.45>,[recv_oct]],[]}]}
```
This led to a failure of the test case later, when it waited for a
message from the connection.
[How]
Loading the module first fixes the problem. We do the same in two other
test cases where this is likely to happen too.
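A sketch of that fix: ship the test module's object code to the broker node before meck runs the anonymous function there.
```erlang
%% Load this test module on the remote broker so the fun passed to
%% meck can be executed there without an `undef` crash.
{Mod, Bin, File} = code:get_object_code(?MODULE),
{module, Mod} = rabbit_ct_broker_helpers:rpc(
                  Config, 0, code, load_binary, [Mod, File, Bin]).
```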
[Why]
Maven took ages to fetch dependencies at least once in CI. The testsuite
failed because it reached the timetrap limit.
[How]
Increase the timetrap from 2 to 5 minutes.
[Why]
The `rabbit_consistent_hash_exchange_raft_based_metadata_store` does not
seem to be a feature flag that ever existed according to the git
history. This causes the test case to always be skipped.
[How]
Simply remove the statement that enables this ghost feature flag.
[Why]
In CI, we observe that the channel sometimes hangs. The implicit
connection from `rabbitmq_ct_client_helpers` is quite fragile, in the
sense that a test case can disturb the next one in some cases.
[How]
Let's use a dedicated connection and see if it fixes the problem.
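A sketch of the change, using the unmanaged-connection helper from `rabbit_ct_client_helpers`:
```erlang
%% Open a connection dedicated to this test case instead of the
%% shared, implicit one; close it at the end of the test case.
Conn = rabbit_ct_client_helpers:open_unmanaged_connection(Config, 0),
{ok, Ch} = amqp_connection:open_channel(Conn),
%% ... exercise the channel as before ...
rabbit_ct_client_helpers:close_connection(Conn).
```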
[Why]
The `stream_pub_sub_metrics` test failed at least once in CI because the
`rabbitmq_stream_consumer_max_offset_lag` metric was 4 instead of the
expected 3 on line 815.
So far, I couldn't reproduce the problem.
[How]
The test case now logs the initial value of that metric at the beginning
of the test function. Hopefully this will give us some clue for the day
it fails again.