The rabbitmq_stream.advertised_tls_host setting is not used in the
metadata frame of the stream protocol, even when it is set. This commit
makes sure the setting is honoured when set.
References rabbitmq/rabbitmq-stream-java-client#803
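For illustration, the setting lives under the `rabbitmq_stream`
application environment; a sketch in advanced.config style (the host
value is a placeholder):

    %% advanced.config (sketch; host value is a placeholder)
    [
     {rabbitmq_stream, [
       {advertised_tls_host, "streams.example.com"}
     ]}
    ].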
[Why]
I noticed the following error in a test case:
error sending frame
Traceback (most recent call last):
File "/home/runner/work/rabbitmq-server/rabbitmq-server/deps/rabbitmq_stomp/test/python_SUITE_data/src/deps/stomp/transport.py", line 623, in send
self.socket.sendall(encoded_frame)
OSError: [Errno 9] Bad file descriptor
When the test suite succeeds, this error is not present; when it fails,
it is. However, I only checked one instance of each, which is not enough
to draw any conclusion about the relationship between this error and the
test case that fails later.
I have no idea which test case hits this error, so let's increase the
verbosity in the hope that the logs show the name of the test case
running when the error occurs.
[Why]
I still don't know what causes the transient failures in this testsuite.
The AMQP connection is closed asynchronously, so the next test case is
already running by the time it finishes closing. I have no idea if this
causes trouble, but it makes the broker logs more difficult to read.
[Why]
The `test_topic_dest` test case fails from time to time in CI. I don't
know why, as no errors are logged anywhere. Let's assume a timeout is a
bit too short.
While here, apply the same change to `test_exchange_dest`.
[Why]
`gen_tcp:close/1` simply closes the connection and doesn't wait for the
broker to handle it. This sometimes causes the next test to fail
because, in addition to that test's new connection, the previous one's
process is still around, waiting for the broker to notice the close.
[How]
We now wait for the connection to be closed at the end of a test case,
and wait for the connection list to have a single element when we want
to query the connection name.
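A sketch of that wait (helper and polling details are illustrative, not
the exact test code):

    %% Sketch: poll the broker until the tracked connection list has the
    %% expected size. ListConnections is a fun wrapping the actual query.
    wait_for_connection_count(ListConnections, ExpectedCount, Retries) ->
        case length(ListConnections()) of
            ExpectedCount ->
                ok;
            _ when Retries > 0 ->
                timer:sleep(100),
                wait_for_connection_count(ListConnections, ExpectedCount,
                                          Retries - 1);
            Count ->
                {error, {unexpected_connection_count, Count}}
        end.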
[Why]
The connection is about to be killed at the end of the test case. It's
not necessary to close it explicitly.
Moreover, in a slow environment like CI, the connection process might
have already exited by the time the test case tries to close it. In
that case, the close fails with a `noproc` exception.
... when testing user limits
[How]
This is the same fix as the one for the vhost limits test case made in
commit 5aab965db4.
While here, fix a compiler warning about an unused variable.
[Why]
Relying on the return value of the queue deletion is fragile because the
policy is cleared asynchronously.
[How]
We now wait for the queues to reach the expected queue length, then we
delete them and ensure the length didn't change.
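A sketch of the idea (helper is illustrative; it polls a passive
`queue.declare` and requires the amqp_client header,
-include_lib("amqp_client/include/amqp_client.hrl")):

    %% Sketch: delete the queue only once it has settled at the expected
    %% length, then check the count returned by the deletion matches.
    delete_when_settled(Ch, QName, ExpectedLen) ->
        ok = wait_for_queue_length(Ch, QName, ExpectedLen),
        #'queue.delete_ok'{message_count = ExpectedLen} =
            amqp_channel:call(Ch, #'queue.delete'{queue = QName}),
        ok.

    wait_for_queue_length(Ch, QName, ExpectedLen) ->
        #'queue.declare_ok'{message_count = Len} =
            amqp_channel:call(Ch, #'queue.declare'{queue = QName,
                                                   passive = true}),
        case Len of
            ExpectedLen ->
                ok;
            _ ->
                timer:sleep(100),
                wait_for_queue_length(Ch, QName, ExpectedLen)
        end.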
[Why]
Before this change, when the `idle_time_out_on_server/1` test case ran
first in the shuffled test group, the test module was not loaded on the
remote broker.
When the anonymous function was passed to meck and was executed, we got
the following crash on the broker:
    crasher:
      initial call: rabbit_heartbeat:'-heartbeater/2-fun-0-'/0
      pid: <0.704.0>
      registered_name: []
      exception error: {undef,
                           [{#Fun<amqp_client_SUITE.14.116163631>,
                             [#Port<0.45>,[recv_oct]],
                             []},
                            {rabbit_heartbeat,get_sock_stats,3,
                                [{file,"rabbit_heartbeat.erl"},{line,175}]},
                            {rabbit_heartbeat,heartbeater,3,
                                [{file,"rabbit_heartbeat.erl"},{line,155}]},
                            {proc_lib,init_p,3,
                                [{file,"proc_lib.erl"},{line,317}]},
                            {rabbit_net,getstat,[#Port<0.45>,[recv_oct]],[]}]}
This led to a failure of the test case later, when it waited for a
message from the connection.
We do the same in two other test cases where this is likely to happen
too.
[How]
Loading the module first fixes the problem.
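A sketch of the fix (assuming the broker node shares the common_test
code path, as set up by rabbit_ct_broker_helpers):

    %% Sketch: force-load this test module on the broker node before meck
    %% receives an anonymous fun defined in it.
    {module, ?MODULE} =
        rabbit_ct_broker_helpers:rpc(Config, 0, code, ensure_loaded,
                                     [?MODULE]),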
[Why]
Maven took ages to fetch dependencies at least once in CI. The testsuite
failed because it reached the time trap limit.
[How]
Increase it from 2 to 5 minutes.
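In common_test, that limit is the suite's time trap; a sketch:

    %% Give the suite more headroom for slow dependency fetches in CI.
    suite() ->
        [{timetrap, {minutes, 5}}].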
[Why]
The `rabbit_consistent_hash_exchange_raft_based_metadata_store` does not
seem to be a feature flag that ever existed according to the git
history. This causes the test case to always be skipped.
[How]
Simply remove the statement that enables this ghost feature flag.
[Why]
In CI, we observe that the channel hangs sometimes. The
rabbitmq_ct_client_helpers implicit connection is quite fragile, in the
sense that one test case can disturb the next in some cases.
[How]
Let's use a dedicated connection and see if it fixes the problem.
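A sketch of the dedicated connection, using the unmanaged-connection
helpers from rabbitmq_ct_client_helpers:

    %% Sketch: a connection owned by this test case only, closed
    %% explicitly at the end instead of being shared between cases.
    Conn = rabbit_ct_client_helpers:open_unmanaged_connection(Config, 0),
    {ok, Ch} = amqp_connection:open_channel(Conn),
    %% ... test body uses Ch ...
    _ = rabbit_ct_client_helpers:close_connection(Conn)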
[Why]
The `stream_pub_sub_metrics` test case failed at least once in CI
because the `rabbitmq_stream_consumer_max_offset_lag` metric was 4
instead of the expected 3 on line 815.
I couldn't reproduce the problem so far.
[How]
The test case now logs the initial value of that metric at the beginning
of the test function. Hopefully this will give us some clue for the day
it fails again.
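A sketch of the added logging (`max_offset_lag/1` is a hypothetical
helper around the metric query):

    %% Sketch: record the metric's starting value so a future failure
    %% can be correlated with it. max_offset_lag/1 is hypothetical.
    InitialLag = max_offset_lag(Config),
    ct:pal("rabbitmq_stream_consumer_max_offset_lag at test start: ~p",
           [InitialLag])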
[Why]
In CI, we observed failures where the sender runs out of credits and
doesn't expect that.
[How]
The `amqp_utils:send_messages/3` function already takes care of that.
We move this logic to a `send_message/2` function and use it in
`send_messages/3` and in places that previously called
`amqp10_client:send_msg/2` directly.
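A sketch of the shared helper (assuming amqp10_client's behaviour of
returning `{error, insufficient_credit}` and emitting a `credited` link
event; the timeout value is illustrative):

    %% Sketch: send, and if the link is out of credit, wait for the
    %% `credited` event from the client before retrying.
    send_message(Sender, Msg) ->
        case amqp10_client:send_msg(Sender, Msg) of
            ok ->
                ok;
            {error, insufficient_credit} ->
                receive
                    {amqp10_event, {link, Sender, credited}} ->
                        send_message(Sender, Msg)
                after 30000 ->
                        exit(credited_timeout)
                end
        end.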
[Why]
In CI, we sometimes observe two tracked connections in the return value.
I don't know yet what they are. Could it be that a client reopened its
crashed connection and, because stats are updated asynchronously, we saw
two tracked connections for a short period of time?
[Why]
This doesn't replicate the common_test logs layout, but it will be good
enough to let our GitHub Actions workflow upload the logs without
specific instructions in the workflow.
[Why]
If we use the list of reachable nodes, it includes nodes which are
currently booting. Trying to start a vhost while they boot can disturb
their initialization and is likely to fail anyway.
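As an illustration (assuming the `rabbit_nodes` helpers available in
the broker):

    %% Sketch: a booting node is reachable but not yet running, so pick
    %% target nodes from the running list when starting vhosts.
    Running = rabbit_nodes:list_running(),
    Booting = rabbit_nodes:list_reachable() -- Running,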
Add more tests for the Direct Reply-to feature in AMQP 0.9.1.
This will help with the future Direct Reply-to refactoring by making
sure the existing behaviour won't break.
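For reference, a minimal sketch of the client-side flow these tests
exercise, using the Erlang AMQP 0.9.1 client (queue name and payload
are placeholders; requires
-include_lib("amqp_client/include/amqp_client.hrl")):

    %% Sketch: consume from the pseudo-queue in no-ack mode, then
    %% publish a request whose reply_to points at it.
    #'basic.consume_ok'{} =
        amqp_channel:subscribe(
          Ch,
          #'basic.consume'{queue = <<"amq.rabbitmq.reply-to">>,
                           no_ack = true},
          self()),
    ok = amqp_channel:cast(
           Ch,
           #'basic.publish'{routing_key = <<"request_queue">>},
           #amqp_msg{props = #'P_basic'{reply_to = <<"amq.rabbitmq.reply-to">>},
                     payload = <<"hello">>})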