msg_store_io_batch_size is no longer used
msg_store_credit_disc_bound appears to be used in the code, but I don't
see any impact of that value on performance. It should be properly
investigated and either removed completely or fixed, because there's
hardly any point in warning about the configured values
(plus, this setting is hopefully almost never used anyway)
According to the `rabbit_backing_queue` behaviour it must always
return `ok`, but it used to return a list of results, one for each
priority. That caused the crash below further up the call chain.
```
> rabbit_classic_queue:delete_crashed(Q)
** exception error: no case clause matching [ok,ok,ok,ok,ok,ok,ok,ok,ok,ok,ok]
in function rabbit_classic_queue:delete_crashed/2 (rabbit_classic_queue.erl, line 516)
```
Other backing_queue implementations (`rabbit_variable_queue`) just
exit with a badmatch upon error.
This (very minor) issue has been present since 3.13.0, when
`rabbit_classic_queue:delete_crashed_in_backing_queue/1` was
introduced with Khepri in commit 5f0981c5. Before that, the result of
`BQ:delete_crashed/1` was simply ignored.
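As a rough illustration (not the actual patch), the per-priority results can be collapsed into the single `ok` the behaviour expects; all names below are hypothetical:
```
-module(priority_bq_sketch).
-export([delete_crashed/2]).

%% PriorityQs is assumed to be a list of per-priority backing queue states;
%% DeleteFun deletes the crashed state of one of them and returns ok.
delete_crashed(DeleteFun, PriorityQs) ->
    Results = [DeleteFun(PQ) || PQ <- PriorityQs],
    %% Collapse [ok, ok, ...] into the single `ok` required by the
    %% rabbit_backing_queue behaviour, crashing if any deletion failed.
    true = lists:all(fun(R) -> R =:= ok end, Results),
    ok.
```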
Include monitored session pids in format_status/1 of rabbit_amqp_writer.
They could be useful when debugging.
The maximum number of sessions per connection is limited, hence the
output won't be too large.
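For illustration, a minimal sketch of what such a `format_status/1` callback can look like; the record and field names are made up and do not match `rabbit_amqp_writer` exactly:
```
-module(amqp_writer_status_sketch).
-export([format_status/1]).

-record(state, {socket, monitored_sessions = #{}}).

%% OTP 25+ gen_server callback: expose the monitored session pids in the
%% status map so they show up in crash reports and sys:get_status/1 output.
format_status(Status) ->
    maps:map(
      fun(state, #state{monitored_sessions = Sessions}) ->
              #{monitored_sessions => maps:keys(Sessions)};
         (_Key, Value) ->
              Value
      end, Status).
```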
[Why]
In order to make `khepri_db` the default in the future, the handling of
`$RABBITMQ_FEATURE_FLAGS` had to be adapted to be able to *disable*
Khepri instead.
Unfortunately I broke the behavior with stable feature flags that are
only available in the primary umbrella. In this case, they were
automatically enabled and thus, clustering with an old umbrella that did
not have these feature flags failed with `incompatible_feature_flags`.
[How]
The solution is to always use an absolute list of feature flags, not the
new relative list.
V2: Allow a testsuite to skip the configuration of the metadata store.
This is needed for the feature_flags_SUITE testsuite because it
tests the default behavior and the configuration of the metadata
store changes that behavior.
While here, fix a ct log message where the variables were swapped
compared to what the format string expects.
V3: Enable `rabbitmq_4.0.0` feature flag in rabbit_mgmt_http_SUITE. This
testsuite apparently requires it and if it's not enabled, it fails.
Without this change, consumers using protocols other than the stream
protocol would display as inactive in the Management UI/API and CLI
commands, even though they were receiving messages.
Accidental "fat finger" virtual deletion accidents
would be easier to avoid if there was a protection mechanism
that would apply equally even to CLI tools and external
applications that do not use confirmations for deletion
operations.
This introduces the following changes:
* Virtual host metadata now supports a new key,
'protected_from_deletion', which, when set,
will be considered by key virtual host deletion function(s)
* DELETE /api/vhosts/{name} was adapted to handle
such blocked deletion attempts to respond with
a 412 Precondition Failed status
* 'rabbitmqctl list_vhosts' and 'rabbitmqctl delete_vhost'
were adapted accordingly
* DELETE /api/vhosts/{name}/deletion/protection
is a new endpoint that can be used to remove
the protective seal (the metadata key)
* POST /api/vhosts/{name}/deletion/protection
marks the virtual host as protected
In the case of the HTTP API, all operations on
virtual host metadata require administrative
privileges from the target user.
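A minimal sketch of how the metadata key might gate deletion, using hypothetical helper names (the `rabbit_vhost:delete/2` call is assumed as well); the real checks live in the virtual host deletion code path and the HTTP API layer:
```
-module(vhost_protection_sketch).
-export([delete_vhost/2]).

%% Hypothetical metadata lookup; in reality the metadata is stored with the
%% virtual host record.
vhost_metadata(_VHostName) ->
    #{protected_from_deletion => true}.

%% Refuse to delete a protected virtual host; the HTTP API maps this error
%% to a 412 Precondition Failed response.
delete_vhost(VHostName, ActingUser) ->
    case vhost_metadata(VHostName) of
        #{protected_from_deletion := true} ->
            {error, protected_from_deletion};
        _ ->
            rabbit_vhost:delete(VHostName, ActingUser)
    end.
```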
Other considerations:
* When a virtual host does not exist, the behavior
remains the same: the original, protection-unaware
code path is used to preserve backwards compatibility
References #12772.
The following scenario led to a channel crash:
1. Publish to a non-existing stream: `perf-test -y 0 -p -e amq.default -t direct -k stream`
2. Declare the stream: `rabbitmqadmin declare queue name=stream queue_type=stream`
There is no writer pid yet, so we get a `function_clause` error with `none`:
```
{function_clause,
[{osiris_writer,write,
[none,<0.877.0>,<<"<0.877.0>_-65ZKFz18ll5lau0phi7CsQ">>,1,
[[0,"Sp",[192,6,5,"B@@AC"]],
[0,"Sr",
[193,38,4,
[[[163,10,<<"x-exchange">>],[161,0,<<>>]],
[[163,13,<<"x-routing-key">>],[161,6,<<"stream">>]]]]],
[0,"Su",[160,12,[<<0,19,252,1,0,0,98,171,20,16,108,167>>]]]]],
[{file,"src/osiris_writer.erl"},{line,158}]},
{rabbit_stream_queue,deliver0,4,
[{file,"rabbit_stream_queue.erl"},{line,540}]},
{rabbit_stream_queue,'-deliver/3-fun-0-',4,
[{file,"rabbit_stream_queue.erl"},{line,526}]},
{lists,foldl,3,[{file,"lists.erl"},{line,2146}]},
{rabbit_queue_type,'-deliver0/4-fun-5-',5,
[{file,"rabbit_queue_type.erl"},{line,707}]},
{maps,fold_1,4,[{file,"maps.erl"},{line,860}]},
{rabbit_queue_type,deliver0,4,
[{file,"rabbit_queue_type.erl"},{line,704}]},
{rabbit_queue_type,deliver,4,
[{file,"rabbit_queue_type.erl"},{line,662}]}]}
```
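A sketch of the defensive shape of such a fix, with hypothetical function names and return values; the real change is in `rabbit_stream_queue` and differs in detail:
```
-module(stream_deliver_sketch).
-export([deliver/5]).

%% Guard against a missing stream writer pid (`none`), e.g. when the stream
%% is declared after publishing started, instead of crashing the channel.
deliver(none, _Session, _WriterRef, _Seq, _Payload) ->
    {error, stream_not_available};
deliver(LeaderPid, Session, WriterRef, Seq, Payload) when is_pid(LeaderPid) ->
    osiris_writer:write(LeaderPid, Session, WriterRef, Seq, Payload).
```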
Co-authored-by: Karl Nilsson <kjnilsson@gmail.com>
[Why]
Up to RabbitMQ 3.13.x, there was a case where if:
1. you enabled a plugin
2. you enabled its feature flags
3. you disabled the plugin
4. you restarted a node (or upgraded it)
... the node could crash on startup because it had a feature flag marked
as enabled that it didn't know about:
```
error:{badmatch,#{feature_flags => ...
rabbit_ff_controller:-check_one_way_compatibility/2-fun-0-/3, line 514
lists:all_1/2, line 1520
rabbit_ff_controller:are_compatible/2, line 496
rabbit_ff_controller:check_node_compatibility_task1/4, line 437
rabbit_db_cluster:check_compatibility/1, line 376
```
This was "fixed" by the new way of keeping the registry in memory
(#10988) because it introduces a slight change of behavior. Indeed, the
old way walked through the `FeatureFlags` map and looked up the state in
the `FeatureStates` map to create the `is_enabled/1` function. The new
way just looks up the state in `FeatureStates`.
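A rough sketch of that behavior difference, with made-up function and variable names (the real registry is generated code and more involved):
```
-module(ff_registry_sketch).
-export([is_enabled_old/3, is_enabled_new/2]).

%% Old approach: walk the known FeatureFlags map and look each flag's state
%% up in FeatureStates to build is_enabled/1.
is_enabled_old(FlagName, FeatureFlags, FeatureStates) ->
    maps:is_key(FlagName, FeatureFlags) andalso
        maps:get(FlagName, FeatureStates, false) =:= true.

%% New approach (#10988): only look the state up in FeatureStates.
is_enabled_new(FlagName, FeatureStates) ->
    maps:get(FlagName, FeatureStates, false) =:= true.
```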
[How]
The new testcase succeeds on 4.0.x and `main`, but would fail on 3.13.x
with the aforementioned crash.
[Why]
The feature flag controller that is responsible for enabling a feature
flag may be on a node that doesn't know this feature flag. This is
supported, but there is a bug when it queries the callback definition for
that feature flag: it uses its own registry which does not have anything
about this feature flag.
This leads to a crash because the `run_callback/5` function tries to use
the `undefined` atom returned by the registry as a map:
```
crasher:
initial call: rabbit_ff_controller:init/1
pid: <0.374.0>
registered_name: rabbit_ff_controller
exception error: bad map: undefined
in function rabbit_ff_controller:run_callback/5
in call from rabbit_ff_controller:do_enable/3 (rabbit_ff_controller.erl, line 1244)
in call from rabbit_ff_controller:update_feature_state_and_enable/2 (rabbit_ff_controller.erl, line 1180)
in call from rabbit_ff_controller:enable_with_registry_locked/2 (rabbit_ff_controller.erl, line 1050)
in call from rabbit_ff_controller:enable_many_locked/2 (rabbit_ff_controller.erl, line 991)
in call from rabbit_ff_controller:enable_many/2 (rabbit_ff_controller.erl, line 979)
in call from rabbit_ff_controller:updating_feature_flag_states/3 (rabbit_ff_controller.erl, line 307)
in call from gen_statem:loop_state_callback/11 (gen_statem.erl, line 3735)
```
[How]
The callback definition is now queried from the first node in the list
given as argument. For the common use case where all nodes know about a
feature flag, the first node is the local one, so there should be no
latency caused by the RPC.
See #12963.
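A minimal sketch of the idea, with a placeholder registry module (`feature_flags_registry` is not a real module); the actual code is in `rabbit_ff_controller` and uses its own RPC helpers:
```
-module(ff_callback_sketch).
-export([callback_definition/2]).

%% Query the callback definition from the first node in the list rather than
%% from the local registry, which may not know the feature flag at all. The
%% first node is typically the local node, so the RPC is usually free.
callback_definition(FeatureName, [FirstNode | _] = _Nodes) ->
    erpc:call(FirstNode, feature_flags_registry, definition, [FeatureName]).
```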
[Why]
Once `khepri_db` is enabled by default, we need another way to disable it
to select Mnesia instead.
[How]
We use the new relative forced feature flags mechanism to indicate if we
want to explicitly enable or disable `khepri_db`. This way, we don't
touch other stable feature flags and only mess with Khepri.
However, this mechanism is not supported by RabbitMQ 4.0.x and older.
They will ignore the setting. Therefore, to make this work in
mixed-version testing, we set the `$RABBITMQ_FEATURE_FLAGS` variable for
the secondary umbrella. This part will go away once we test against
RabbitMQ 4.1.x as the secondary umbrella in the future.
At the end, we compare the effective metadata store to the expected one.
If they don't match, we skip the test.
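For illustration, a sketch of such a comparison in a common test suite, with a hypothetical helper for reading the effective store:
```
-module(metadata_store_check_sketch).
-export([maybe_skip/2]).

%% Placeholder for however the suite reads the store actually in use
%% (e.g. via an RPC to the broker under test).
effective_metadata_store(_Config) ->
    khepri.

%% Return Config unchanged when the stores match, otherwise return the
%% {skip, Reason} tuple understood by common_test init functions.
maybe_skip(Config, ExpectedStore) ->
    case effective_metadata_store(Config) of
        ExpectedStore ->
            Config;
        ActualStore ->
            {skip, {unexpected_metadata_store, ActualStore, ExpectedStore}}
    end.
```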
While here, change `rjms_topic_selector_SUITE` to only choose Khepri
without specifying any feature flags.
If handle_tick is called before the machine has finished the upgrade
process, it could receive an old overview format (stats tuple vs map).
Let's ignore it and the next handle tick should be fine.
Unlikely to happen in production; detected on CI with a very low tick timeout.
Fixes #12933
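A minimal sketch of that guard, with hypothetical names; the actual change lives in the quorum queue's tick handling:
```
-module(handle_tick_sketch).
-export([handle_tick/2]).

%% Only act on the post-upgrade overview (a map); silently ignore the old
%% pre-upgrade shape (a stats tuple). The next tick will see the new format.
handle_tick(QName, Overview) when is_map(Overview) ->
    update_metrics(QName, Overview);
handle_tick(_QName, _OldTupleOverview) ->
    ok.

%% Placeholder for whatever the tick actually does with the overview.
update_metrics(_QName, _Overview) ->
    ok.
```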
The assumption that `x-last-death-*` annotations must have been set
whenever the `deaths` annotation is set was wrong.
Reproduction steps, Option 1:
1. In v3.13.7, dead letter a message from Q1 to Q2 (both can be classic queues).
2. Re-publish the message including its x-death header from Q2 back to Q1.
(RabbitMQ 3.13.7 will interpret this x-death header and set the deaths annotation.)
3. Upgrade to v4.0.4
4. Dead lettering the message from Q1 to Q2 will cause the following crash:
```
crasher:
initial call: rabbit_amqqueue_process:init/1
pid: <0.577.0>
registered_name: []
exception exit: {{badkey,<<"x-last-death-exchange">>},
[{mc,record_death,4,[{file,"mc.erl"},{line,410}]},
{rabbit_dead_letter,publish,5,
[{file,"rabbit_dead_letter.erl"},{line,38}]},
{rabbit_amqqueue_process,'-dead_letter_msgs/4-fun-0-',
7,
[{file,"rabbit_amqqueue_process.erl"},{line,1060}]},
{rabbit_variable_queue,'-ackfold/4-fun-0-',3,
[{file,"rabbit_variable_queue.erl"},{line,655}]},
{lists,foldl,3,[{file,"lists.erl"},{line,2146}]},
{rabbit_variable_queue,ackfold,4,
[{file,"rabbit_variable_queue.erl"},{line,652}]},
{rabbit_priority_queue,ackfold,4,
[{file,"rabbit_priority_queue.erl"},{line,309}]},
{rabbit_amqqueue_process,
'-dead_letter_rejected_msgs/3-fun-0-',5,
[{file,"rabbit_amqqueue_process.erl"},
{line,1038}]}]}
```
Reproduction steps, Option 2:
1. Run a 4.0.4 / 3.13.7 mixed version cluster where both queues Q1 and Q2
are hosted on the 4.0.4 node.
2. Send a message to Q1 which dead letters to Q2.
3. Re-publish a message with the x-death AMQP 0.9.1 header from Q2 to
Q1. However, this time make sure to publish to the 3.13.7 node which
forwards this message to Q1 on the 4.0.4 node.
4. Subsequently dead lettering this message from Q1 to Q2 (happening on
the 4.0.4 node) will also cause the crash.
The modified test case in this commit was able to repro this crash via
Option 2 in the mixed version cluster tests on the `v4.0.x` branch.
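One defensive way to avoid the badkey crash (not necessarily the shape of the actual fix) is to look the annotation up with a default instead of assuming it is present; the names below are hypothetical:
```
-module(last_death_sketch).
-export([last_death_exchange/2]).

%% Tolerate annotations produced by 3.13.x-style x-death interpretation,
%% where `deaths` may be set without the x-last-death-* keys.
last_death_exchange(Annotations, DefaultExchange) ->
    maps:get(<<"x-last-death-exchange">>, Annotations, DefaultExchange).
```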
As the de-duplication plugin is the only adopter of the `is_duplicate`
callback, we now use a simpler signature.
When a message is deemed a duplicate, we discard it and re-route it to
the dead letter exchange.
Signed-off-by: Matteo Cafasso <noxdafox@gmail.com>
(cherry picked from commit f93baa35cb)
The `is_duplicate` callback signature was changed in order to support both
mirrored queues and de-duplication queues.
As mirrored queues are now deprecated and removed, we can fall
back to a simpler boolean return value.
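A sketch of the simplified shape of the contract, with placeholder types; this is not the real `-callback` spec:
```
%% Hypothetical behaviour module sketching the simplified contract: the
%% callback now answers a plain duplicate-or-not question instead of a
%% mirroring-aware result term.
-module(is_duplicate_sketch).

-callback is_duplicate(Msg :: term(), State :: term()) ->
    {IsDuplicate :: boolean(), NewState :: term()}.
```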
Signed-off-by: Matteo Cafasso <noxdafox@gmail.com>
(cherry picked from commit c927446e17)
Prior to this commit, when the sending client overshot RabbitMQ's incoming-window
(which is allowed in the event of a cluster-wide memory or disk alarm),
and RabbitMQ sent a FLOW frame to the client, RabbitMQ sent a negative
incoming-window field in the FLOW frame causing the following crash in
the writer proc:
```
crasher:
initial call: rabbit_amqp_writer:init/1
pid: <0.19353.0>
registered_name: []
exception error: bad argument
in function iolist_size/1
called as iolist_size([<<112,0,0,23,120>>,
[82,-15],
<<"pÿÿÿü">>,<<"pÿÿÿÿ">>,67,
<<112,0,0,23,120>>,
"Rª",64,64,64,64])
*** argument 1: not an iodata term
in call from amqp10_binary_generator:generate1/1 (amqp10_binary_generator.erl, line 141)
in call from amqp10_binary_generator:generate1/1 (amqp10_binary_generator.erl, line 88)
in call from amqp10_binary_generator:generate/1 (amqp10_binary_generator.erl, line 79)
in call from rabbit_amqp_writer:assemble_frame/3 (rabbit_amqp_writer.erl, line 206)
in call from rabbit_amqp_writer:internal_send_command_async/3 (rabbit_amqp_writer.erl, line 189)
in call from rabbit_amqp_writer:handle_cast/2 (rabbit_amqp_writer.erl, line 110)
in call from gen_server:try_handle_cast/3 (gen_server.erl, line 1121)
```
This commit fixes this crash by maintaining a floor of zero for
incoming-window in the FLOW frame.
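A minimal sketch of the clamping, with made-up variable names; the real computation happens in the AMQP 1.0 session when it emits the FLOW frame:
```
-module(flow_window_sketch).
-export([incoming_window/2]).

%% Never advertise a negative incoming-window, even if the client overshot
%% the previous window during a cluster-wide memory or disk alarm.
incoming_window(DesiredWindow, UnprocessedIncomingTransfers) ->
    max(0, DesiredWindow - UnprocessedIncomingTransfers).
```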
Fixes #12816
The credit_flow between a publishing AMQP 0.9.1 channel (or MQTT
connection) and (non-mirrored) classic queue processes was
unintentionally removed in 4.0 together with anything else related to
CQ mirroring.
By default we restore the 3.x behaviour for non-mirrored classic
queues. It is possible to disable flow control (the earlier 4.0.x
behaviour) with the new env `classic_queue_flow_control`. In 3.x this
was possible with the config `mirroring_flow_control`.
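For example, assuming the key lives under the `rabbit` application (as is usual for such settings), flow control for classic queues could be switched off via `advanced.config`:
```
%% advanced.config sketch: disable the restored publisher flow control for
%% classic queues, reverting to the earlier 4.0.x behaviour.
[
 {rabbit, [
   {classic_queue_flow_control, false}
 ]}
].
```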
(cherry picked from commit d65bd7d07a)