This check is expected to succeed and the status is expected to be
printed to stdout rather than stderr. This change silences the status
output. The status text was printed mistakenly previously because we
captured stderr rather than stdout.
This previously emitted a warning because Elixir will rebind `this_node`
by default, so the `this_node` binding in the line above was unused.
(As opposed to Erlang which would treat this as a match - rejecting
the binding if `this_node` was not equal to the value being matched.)
The node needed to be adjusted as well - `node()` returned the ExUnit
runner's node while the command returned the remote node, which is
stored in the context under `opts.node`.
`rabbit_binary_generator:map_exception/3` will crash when there are
unicode characters in the `explaination` field of `Reason#amqp_error`
parameter. The explaination string (list) is assumed to be ascii, with
each character/member in the range of a byte. Any unicode characters
in the string will trigger `badarg` crash of `list_to_binary/1` in
`rabbit_binary_generator:amqp_exception_explanation/2`.
Amqp091 shovel crash due to this is reported,
https://github.com/rabbitmq/rabbitmq-server/discussions/12874
When a queue as shovel source/destination does not exist, and its
name contains non-ascii characters, the explaination of amqp_error
will be like `no queue non_ascii_name_😍 in vhost /`. It will
subsequently crash and even affect management console.
To fix this, `unicode:characters_to_binary/1` is used instead of
`list_to_binary/1`, and unicode-safe truncation of long explaination
with `io_lib:format/3` chars_limit replaces direct bytes truncation.
The `:io.format/2` call was originally passed a single-quote string
(i.e. a charlist in Elixir terminology) which emits a warning in more
recent Elixir versions:
warning: single-quoted strings represent charlists. Use ~c"" if you indeed want a charlist or use "" instead
└─ nofile:1:12
This warning would pop up a few times when using `make dialyze` within
a deps directory. To resolve it we can switch the quoting so that the
eval string is wrapped in single quotes (equivalent for shell since this
line doesn't use variables) and the format argument is wrapped in double
quotes. This uses a binary in Elixir instead, but that's ok because
`io:format/3`'s `io:format()` parameter may either be an atom, string,
or binary.
This trick was copied from Makefile:49 which uses the same quoting.
[Why]
The test configuration was querying a network interface IP address based
on its name. However, the name, "eth0", is very specific to Linux. This
broke the test on other systems.
[How]
We still have to set an explicit `bind_addr` because Consul refuses to
start if the host has multiple private IPv4 addresses, as it is the case
in CI.
Therefore, we hard-code 127.0.0.1 as the IPv4 address to use because it has a
great chance to exist about anywhere.
[Why]
Two reasons:
1. We need to set the correct feature flags on the test node we have to
start.
2. We can skip Mnesia- or Khepri-specific tests if they are marked.
[Why]
The `run-background-broker` does not wait for the node to be ready,
leading to some transient errors in the testsuite.
[How]
The `start-background-broker` does wait.
While here, export the value of `$(MAKE)`. Otherwise, nested uses of
make(1) may use the wrong make command.
[Why]
The code assumed that the transaction would always succeed. It was kind
of the case with Mnesia because it would throw an exception if it
failed.
Khepri returns an error instead. The code has to handle it. In
particular, we see timeouts in CI and before this patch, they caused a
crash because the list comprehension was asked to work on a tuple.
[How]
We now retry a few times for 10 seconds.
[Why]
We pin a version of Horus even if we don't use it directly (it is a
dependency of Khepri). But currently, we can't update Khepri while still
needing the fix in Horus 0.3.1.
Horus 0.3.1 works around a crash in `cover` that mostly affects CI for
now.
This pinning will have to go away with the next update of Khepri.
[Why]
The `ra:member_add/3` call returns before the change is committed. This
is ok for that addition but any follow-up changes to the cluster might
be rejected with the `cluster_change_not_permitted` error.
[How]
Instead of changing other places to wait or retry their cluster
membership change, this patch waits for the current add to be applied
before proceeding and returning.
This fixes some transient failures in CI where such follow-up changes
are rejected and not retried, leaving the cluster in an unexpected state
for the testcase.
An example is with
`quorum_queue_SUITE:force_shrink_member_to_current_member/1`
This check fails on a virin node, because the metadata store
is not yet ready to handle the query. However, a virin
node by definition can't have any queues, so let's just return
false without asking.
This changes the line `openssl x509 -in path/to/cert.pem -nameopt RFC2253 -subject -noout` to put the `-in` parameter at the end of the line, so that it's easier to ^W the path and replace it with my own.
Tested that this works with OpenSSL 3.1.6 4 Jun 2024 (Library: OpenSSL 3.1.6 4 Jun 2024) and OpenSSL 3.3.0 9 Apr 2024 (Library: OpenSSL 3.3.0 9 Apr 2024) on an Ubuntu 22.04.4 container and MacOS 14.7.1
See discussion #12807 for details.
rabbit_peer_discovery:normalize/1 can be
changed to only return lists of nodes but then
there is a number of core code paths that
treat a single node as a special "preselected"
value.
So let's keep that part and convert both
sets of nodes to lists before computing the
difference.
[Why]
In CI, we observe some timeouts in the Erlang distribution connections
between the temporary hidden node and the nodes it queries. This affects
peer discovery obviously.
[How]
We introduce some query retries to reduce the risk of an incomplete
query.
While here, we move the sorting of queried nodes from the
`query_node_props2/3` last clause (executed in the temporary hidden
node) to the function setting the temporary hidden node and asking for
these queries. This way the debug messages from that sorting are logged
by RabbitMQ out of the box.
[Why]
This impacts what is reported by the catch because it caught exceptions
emitted by code supposedly called later. An example is the assert
in `query_node_props2/3` last clause.
[Why]
This was the first solution put in place to prevent that the temporary
hidden node connects to the node that started it to write any printed
messages. Because of this, the nodes that the temporary hidden node
queried found out about the parent node and they opened an Erlang
distribution connection to it. This polluted the known nodes list.
However later, the temporary hidden node was started with the
`standard_io` connection option. This prevented the temporary hidden
node from knowing about the node that started it, solving the problem in
a cleaner way.
[How]
This commit garbage-collects that piece of code that is now useless. It
makes the query code way simpler to understand.
Parallel/sharding groups often fail to create certificates in CI.
Most likely it is related to the fact they use the same directory
for certificates. This commit uses shard/node name and unique id
for each SSL certificate
[Why]
That timer was started during boot and continued regardless if `rabbit`
was running or stopped.
This caused the reconsiliation to crash if the `rabbit` app was stopped
before the it ended because it tried to access the database even though
it was stopped or even reset.
[How]
We just check if `rabbit` is running before running one reconciliation
and scheduling a new one.
Empty proplists will be serialized to JSON as arrays,
which they arguably are, and HTTP API clients
expect a regardless of collection size.
References #12552#12699
This undocumented key used to use a simple date-based
formula and used to help support and the core
team.
Nodes no longer have the context to return
a correct response, so all we can do is drop this
key.
This fixes erlang_ls's header resolution. Previously it would confuse
the include_lib of the `khepri.hrl` from Khepri with this header in
the rabbit app.
This header is also specific to how rabbit uses Khepri so I think the
new name fits better.
rabbit:product_version/0 should not return
an 'undefined'.
However, a fallback to the base version is
a technique we already use in 'rabbitmq-diagnostics status',
so adopt the same trick.
The application is not always recompiled which causes tests to fail
because they cannot call `serial_number:usort/1`.
(cherry picked from commit 05a3733722)
Introduce a single place in the AMQP 1.0 Erlang client that infers the AMQP 1.0 type.
Erlang integers are inferred to be AMQP type `long` to avoid overflow surprises.