The following scenario led to a channel crash:
1. Publish to a non-existing stream: `perf-test -y 0 -p -e amq.default -t direct -k stream`
2. Declare the stream: `rabbitmqadmin declare queue name=stream queue_type=stream`
There is no pid yet, so we got a function_clause with `none`
```
{function_clause,
[{osiris_writer,write,
[none,<0.877.0>,<<"<0.877.0>_-65ZKFz18ll5lau0phi7CsQ">>,1,
[[0,"Sp",[192,6,5,"B@@AC"]],
[0,"Sr",
[193,38,4,
[[[163,10,<<"x-exchange">>],[161,0,<<>>]],
[[163,13,<<"x-routing-key">>],[161,6,<<"stream">>]]]]],
[0,"Su",[160,12,[<<0,19,252,1,0,0,98,171,20,16,108,167>>]]]]],
[{file,"src/osiris_writer.erl"},{line,158}]},
{rabbit_stream_queue,deliver0,4,
[{file,"rabbit_stream_queue.erl"},{line,540}]},
{rabbit_stream_queue,'-deliver/3-fun-0-',4,
[{file,"rabbit_stream_queue.erl"},{line,526}]},
{lists,foldl,3,[{file,"lists.erl"},{line,2146}]},
{rabbit_queue_type,'-deliver0/4-fun-5-',5,
[{file,"rabbit_queue_type.erl"},{line,707}]},
{maps,fold_1,4,[{file,"maps.erl"},{line,860}]},
{rabbit_queue_type,deliver0,4,
[{file,"rabbit_queue_type.erl"},{line,704}]},
{rabbit_queue_type,deliver,4,
[{file,"rabbit_queue_type.erl"},{line,662}]}]}
```
Co-authored-by: Karl Nilsson <kjnilsson@gmail.com>
build(deps): bump org.springframework.boot:spring-boot-starter-parent from 3.4.0 to 3.4.1 in /deps/rabbitmq_auth_backend_http/examples/rabbitmq_auth_backend_spring_boot_kotlin
build(deps): bump org.springframework.boot:spring-boot-starter-parent from 3.4.0 to 3.4.1 in /deps/rabbitmq_auth_backend_http/examples/rabbitmq_auth_backend_spring_boot
[Why]
It was possible that testcases were executed before the etcd daemon was
ready, leading to test failures.
[How]
There was already a sanity check to verify that the etcd daemon was
working correctly, but it was itself a testcase.
This patch moves this code to the etcd start code to wait for it to be
ready.
This replaces the previous workaround of waiting for 2 seconds.
While here, log anything printed to stdout/stderr by etcd after it
exited.
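As a rough illustration of the "wait until ready instead of sleeping" idea, the sketch below polls a TCP port until it accepts connections. It is not the code from this patch: the patch reuses the existing etcd sanity check in the Erlang test suite, whereas this sketch substitutes a plain TCP connect. The function name `wait_until_listening` and its parameters are hypothetical.

```python
import socket
import time


def wait_until_listening(host, port, timeout=10.0, interval=0.1):
    """Poll a TCP port until it accepts a connection or the timeout expires.

    Hypothetical helper: a simplified stand-in for a real readiness check.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means the listener is up.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

Compared with a fixed 2-second sleep, a loop like this returns as soon as the listener is up and fails explicitly when it never comes up.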
Fixes #12981.
Running
```
make -C deps/rabbitmq_peer_discovery_etcd ct-system
```
on some macOS systems causes test failures because the client cannot
connect to etcd:
```
test failed to connect [localhost:2379] by <Gun Down> {down,
{shutdown,
econnrefused}}
```
The etcd log file didn't show any error message.
However, the etcd log file showed that the etcd listener got started
after the test case tried to connect.
This commit fixes the test failure.
A better solution would be to use the HTTP API or the etcdctl CLI to
poll the listener status. However, simply waiting for 2 seconds is good
enough for this test suite.
[Why]
Up to RabbitMQ 3.13.x, there was a case where if:
1. you enabled a plugin
2. you enabled its feature flags
3. you disabled the plugin
4. you restarted a node (or upgraded it)
... the node could crash on startup because it had a feature flag marked
as enabled that it didn't know about:
    error:{badmatch,#{feature_flags => ...
    rabbit_ff_controller:-check_one_way_compatibility/2-fun-0-/3, line 514
    lists:all_1/2, line 1520
    rabbit_ff_controller:are_compatible/2, line 496
    rabbit_ff_controller:check_node_compatibility_task1/4, line 437
    rabbit_db_cluster:check_compatibility/1, line 376
This was "fixed" by the new way of keeping the registry in memory
(#10988) because it introduces a slight change of behavior. Indeed, the
old way walked through the `FeatureFlags` map and looked up the state in
the `FeatureStates` map to create the `is_enabled/1` function. The new
way just looks up the state in `FeatureStates`.
[How]
The new testcase succeeds on 4.0.x and `main`, but would fail on 3.13.x
with the aforementioned crash.
## Why?
To introduce AMQP over WebSocket, we will add gun to the Erlang AMQP
1.0 client. We want to add the latest version of gun for this new
feature. Since rabbitmq_peer_discovery_etcd depends on the outdated
eetcd 0.3.6 which in turn depends on the outdated gun 1.3.3, this commit
first upgrades eetcd and gun.
## How?
See https://github.com/zhongwencool/eetcd?tab=readme-ov-file#migration-from-eetcd-03x-to-04x
## Breaking Changes
This commit causes the following breaking change:
`rabbitmq.conf` settings
* `cluster_formation.etcd.ssl_options.fail_if_no_peer_cert`
* `cluster_formation.etcd.ssl_options.dh`
* `cluster_formation.etcd.ssl_options.dhfile`
are unsupported because they are not valid `ssl:tls_client_option()`.
See https://github.com/erlang/otp/issues/7497#issuecomment-1636012198
[Why]
The feature flag controller that is responsible for enabling a feature
flag may be on a node that doesn't know this feature flag. This is
supported, but there is a bug when it queries the callback definition for
that feature flag: it uses its own registry which does not have anything
about this feature flag.
This leads to a crash because the `run_callback/5` function tries to use
the `undefined` atom returned by the registry as a map:
    crasher:
      initial call: rabbit_ff_controller:init/1
      pid: <0.374.0>
      registered_name: rabbit_ff_controller
      exception error: bad map: undefined
        in function rabbit_ff_controller:run_callback/5
        in call from rabbit_ff_controller:do_enable/3 (rabbit_ff_controller.erl, line 1244)
        in call from rabbit_ff_controller:update_feature_state_and_enable/2 (rabbit_ff_controller.erl, line 1180)
        in call from rabbit_ff_controller:enable_with_registry_locked/2 (rabbit_ff_controller.erl, line 1050)
        in call from rabbit_ff_controller:enable_many_locked/2 (rabbit_ff_controller.erl, line 991)
        in call from rabbit_ff_controller:enable_many/2 (rabbit_ff_controller.erl, line 979)
        in call from rabbit_ff_controller:updating_feature_flag_states/3 (rabbit_ff_controller.erl, line 307)
        in call from gen_statem:loop_state_callback/11 (gen_statem.erl, line 3735)
[How]
The callback definition is now queried from the first node in the list
given as argument. For the common use case where all nodes know about a
feature flag, the first node is the local one, so there should be no
latency caused by the RPC.
See #12963.
[Why]
Once `khepri_db` is enabled by default, we need another way to disable it
to select Mnesia instead.
[How]
We use the new relative forced feature flags mechanism to indicate if we
want to explicitly enable or disable `khepri_db`. This way, we don't
touch other stable feature flags and only mess with Khepri.
However, this mechanism is not supported by RabbitMQ 4.0.x and older.
They will ignore the setting. Therefore, to make this work in
mixed-version testing, we set the `$RABBITMQ_FEATURE_FLAGS` variable for
the secondary umbrella. This part will go away once we test against
RabbitMQ 4.1.x as the secondary umbrella in the future.
At the end, we compare the effective metadata store to the expected one.
If they don't match, we skip the test.
While here, change `rjms_topic_selector_SUITE` to only choose Khepri
without specifying any feature flags.
Transient (i.e. `durable=false`) exchanges and queues are deprecated.
Khepri will store all entities durably.
(Even exclusive queues will be stored durably. Exclusive queues are
still deleted when the declaring connection is closed.)
Similar to how the RabbitMQ AMQP 1.0 Java client already disallows the
creation of transient exchanges and queues, this commit will prohibit
the declaration of transient exchanges and queues in the RabbitMQ
AMQP 1.0 Erlang client starting with RabbitMQ 4.1.
If handle_tick is called before the machine has finished the upgrade
process, it could receive an old overview format (stats tuple vs map).
Let's ignore it and the next handle tick should be fine.
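A minimal sketch of the "ignore the old format" idea, written in Python for illustration only (the actual change is in Erlang). It assumes the old overview is the stats tuple and the new one is the map, which is one reading of the description above. The function name `handle_tick` mirrors the callback, but the `messages` key and the return shape are hypothetical.

```python
def handle_tick(overview):
    """Return the metrics to report for one tick.

    Assumption: a new-format overview is a dict (map); anything else,
    such as a legacy stats tuple seen mid-upgrade, is skipped.
    """
    if not isinstance(overview, dict):
        # Old format received before the machine upgrade completed:
        # report nothing now and let the next tick pick up the new format.
        return {}
    # Hypothetical key, used here only to show the normal path.
    return {"messages": overview.get("messages", 0)}
```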
Unlikely to happen in production; detected on CI with a very low tick timeout.
Fixes #12933
The assumption that `x-last-death-*` annotations must have been set
whenever the `deaths` annotation is set was wrong.
Reproduction steps, Option 1:
1. In v3.13.7, dead letter a message from Q1 to Q2 (both can be classic queues).
2. Re-publish the message including its x-death header from Q2 back to Q1.
(RabbitMQ 3.13.7 will interpret this x-death header and set the deaths annotation.)
3. Upgrade to v4.0.4
4. Dead lettering the message from Q1 to Q2 will cause the following crash:
```
crasher:
initial call: rabbit_amqqueue_process:init/1
pid: <0.577.0>
registered_name: []
exception exit: {{badkey,<<"x-last-death-exchange">>},
[{mc,record_death,4,[{file,"mc.erl"},{line,410}]},
{rabbit_dead_letter,publish,5,
[{file,"rabbit_dead_letter.erl"},{line,38}]},
{rabbit_amqqueue_process,'-dead_letter_msgs/4-fun-0-',
7,
[{file,"rabbit_amqqueue_process.erl"},{line,1060}]},
{rabbit_variable_queue,'-ackfold/4-fun-0-',3,
[{file,"rabbit_variable_queue.erl"},{line,655}]},
{lists,foldl,3,[{file,"lists.erl"},{line,2146}]},
{rabbit_variable_queue,ackfold,4,
[{file,"rabbit_variable_queue.erl"},{line,652}]},
{rabbit_priority_queue,ackfold,4,
[{file,"rabbit_priority_queue.erl"},{line,309}]},
{rabbit_amqqueue_process,
'-dead_letter_rejected_msgs/3-fun-0-',5,
[{file,"rabbit_amqqueue_process.erl"},
{line,1038}]}]}
```
Reproduction steps, Option 2:
1. Run a 4.0.4 / 3.13.7 mixed version cluster where both queues Q1 and Q2
are hosted on the 4.0.4 node.
2. Send a message to Q1 which dead letters to Q2.
3. Re-publish a message with the x-death AMQP 0.9.1 header from Q2 to
Q1. However, this time make sure to publish to the 3.13.7 node which
forwards this message to Q1 on the 4.0.4 node.
4. Subsequently dead lettering this message from Q1 to Q2 (happening on
the 4.0.4 node) will also cause the crash.
The modified test case in this commit was able to repro this crash via
Option 2 in the mixed version cluster tests on the `v4.0.x` branch.
As the de-duplication plugin is the only adopter of the `is_duplicate`
callback, we now use a simpler signature.
When a message is deemed a duplicate, we discard it and re-route it to
the dead letter exchange.
Signed-off-by: Matteo Cafasso <noxdafox@gmail.com>
(cherry picked from commit f93baa35cb)
The `is_duplicate` callback signature was changed in order to support both
the mirroring queues as well as the de-duplication ones.
As the mirroring queues are now deprecated and removed, we can fall
back to a simpler boolean as return value.
Signed-off-by: Matteo Cafasso <noxdafox@gmail.com>
(cherry picked from commit c927446e17)
Prior to this commit, when the sending client overshot RabbitMQ's incoming-window
(which is allowed in the event of a cluster wide memory or disk alarm),
and RabbitMQ sent a FLOW frame to the client, RabbitMQ sent a negative
incoming-window field in the FLOW frame causing the following crash in
the writer proc:
```
crasher:
initial call: rabbit_amqp_writer:init/1
pid: <0.19353.0>
registered_name: []
exception error: bad argument
in function iolist_size/1
called as iolist_size([<<112,0,0,23,120>>,
[82,-15],
<<"pÿÿÿü">>,<<"pÿÿÿÿ">>,67,
<<112,0,0,23,120>>,
"Rª",64,64,64,64])
*** argument 1: not an iodata term
in call from amqp10_binary_generator:generate1/1 (amqp10_binary_generator.erl, line 141)
in call from amqp10_binary_generator:generate1/1 (amqp10_binary_generator.erl, line 88)
in call from amqp10_binary_generator:generate/1 (amqp10_binary_generator.erl, line 79)
in call from rabbit_amqp_writer:assemble_frame/3 (rabbit_amqp_writer.erl, line 206)
in call from rabbit_amqp_writer:internal_send_command_async/3 (rabbit_amqp_writer.erl, line 189)
in call from rabbit_amqp_writer:handle_cast/2 (rabbit_amqp_writer.erl, line 110)
in call from gen_server:try_handle_cast/3 (gen_server.erl, line 1121)
```
This commit fixes this crash by maintaining a floor of zero for
incoming-window in the FLOW frame.
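The failure mode and the fix can be sketched in Python as follows. This is an illustration, not the Erlang change itself: `encode_smalluint` is a hypothetical stand-in for the binary generator (the `[82,-15]` element in the crash above is the 0x52 smalluint constructor followed by a negative value), and `advertised_incoming_window` is a hypothetical name for the clamping step.

```python
def encode_smalluint(value):
    """Encode 0..255 as an AMQP 1.0 smalluint: constructor 0x52 plus one byte.

    Hypothetical helper; raises ValueError for a negative value, which is
    analogous to iolist_size/1 rejecting [82,-15].
    """
    return bytes([0x52, value])


def advertised_incoming_window(remaining):
    """Clamp the window advertised in a FLOW frame to a floor of zero."""
    return max(0, remaining)


# The client overshot the window by 15 transfers.
remaining = -15
frame_field = encode_smalluint(advertised_incoming_window(remaining))
```

The field is an unsigned integer on the wire, so a negative value cannot be encoded at all. Clamping at the point where the FLOW frame is built keeps the internal accounting free to go negative while an alarm is in effect.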
Fixes #12816
The credit_flow between publishing AMQP 0.9.1 channel (or MQTT
connection) and (non-mirrored) classic queue processes was
unintentionally removed in 4.0 together with anything else related to
CQ mirroring.
By default we restore the 3.x behaviour for non-mirrored classic
queues. It is possible to disable flow-control (the earlier 4.0.x
behaviour) with the new env `classic_queue_flow_control`. In 3.x this
was possible with the config `mirroring_flow_control`.
(cherry picked from commit d65bd7d07a)