In order to keep the results of state machine command application
deterministic during upgrades, we need to make the stream coordinator
versioned such that the new logic is only used once the stream
coordinator switches to machine version 1.
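As an illustration, version-gated logic in a Ra machine could look like
the sketch below; the assumption that the apply/3 metadata carries the
machine version, and the new_logic/old_logic names, are illustrative
rather than taken from the actual coordinator:

``` erlang
%% Sketch only: assumes the apply/3 metadata map carries the effective
%% machine version; new_logic/3 and old_logic/3 are hypothetical.
version() -> 1.

apply(#{machine_version := Vsn} = Meta, Cmd, State) when Vsn >= 1 ->
    new_logic(Meta, Cmd, State);
apply(Meta, Cmd, State) ->
    old_logic(Meta, Cmd, State).
```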
For booleans, we can prefer the operator policy value
unconditionally, without any safety implications.
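A minimal sketch of that merge rule, with a hypothetical
merge_definition/2 helper (the real code path may differ):

``` erlang
%% Sketch: a boolean operator policy value wins unconditionally;
%% other types need type-specific conflict resolution.
merge_definition(OpValue, _PolicyValue) when is_boolean(OpValue) ->
    OpValue;
merge_definition(OpValue, PolicyValue) ->
    resolve_conflict(OpValue, PolicyValue).  %% hypothetical helper
```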
Per discussion with @binarin @pjk25
(cherry picked from commit 6edb7396fd)
A channel that first sends a mandatory publish before enabling
confirms mode may not receive confirms for messages published
after that. This is because publish_seqno was incremented for
mandatory publishes even when confirms were disabled, although
the mandatory feature has nothing to do with publish_seqno.
The issue has existed since at least
38e5b687de
The test case introduced focuses on multiple=false. The issue
also exists for multiple=true, but with a different impact:
sending multiple=true,delivery_tag=2 results in both messages
1 and 2 being acked, even if message 2 doesn't exist as far
as the client is concerned. If the message does exist,
it might get confirmed earlier than it should have been. The
more mandatory messages were sent before enabling confirms mode,
the bigger the problem.
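A minimal sketch of the corrected behavior, assuming a channel state
record with confirm and publish_seqno fields (field and function names
are illustrative):

``` erlang
%% Sketch: only channels in confirm mode track a publish sequence
%% number; a mandatory publish alone no longer bumps it.
maybe_incr_publish_seqno(#state{confirm = true,
                                publish_seqno = SeqNo} = State) ->
    State#state{publish_seqno = SeqNo + 1};
maybe_incr_publish_seqno(State) ->
    %% Confirms disabled: mandatory has nothing to do with publish_seqno.
    State.
```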
Before this commit, the tests did not include any settle, return, or
discard Ra commands.
Do not pattern match against 'ra_event' because nowadays the options
are: _Opts = [local, ra_event]
From the coordinator's POV each stream has a unique id consisting of the
vhost, queue name and a high-resolution timestamp, even if several stream
ids relate to the same queue record.
When performing the mnesia update, the coordinator now checks that the
current stream id matches that of the update_mnesia action, and does not
change the queue record if the stream id is not the same.
This should prevent "old" incarnations of a stream queue from updating
newer ones with incorrect information.
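A minimal sketch of that guard, with hypothetical function names:

``` erlang
%% Sketch: ignore update_mnesia actions carrying a stale stream id.
update_mnesia(StreamId, Q, Fun) ->
    case current_stream_id(Q) of
        StreamId ->
            Fun(Q);           %% same incarnation: apply the update
        _Stale ->
            Q                 %% old incarnation: leave the record alone
    end.
```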
This is meant to be used by deployment tools, core features and plugins
that expect a certain minimum number of cluster nodes to be present.
For example, certain setup steps in distributed plugins might require
at least three nodes to be available. This is just a hint, not an
enforced requirement. The default value is 1, so single-node clusters
see no behavior changes. An example setting is shown below.
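Assuming the setting is exposed under the cluster formation namespace
(the exact key name here is illustrative), the hint could be set like
this:

``` ini
# a hint, not an enforced requirement
cluster_formation.target_cluster_size_hint = 3
```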
Deriving a max-cluster-size only from running nodes would create
situations where, in a three-node cluster with only two nodes running,
a non-running node would be selected as a follower.
As AMQP 0.9.1 headers are translated into AMQP 1.0 application properties,
they cannot contain complex values such as arrays or tables.
RabbitMQ federation does use array and table values, so to avoid crashing
when delivering a federated message to a stream queue we drop them. These
header values should be considered internal, however, so dropping them
before the final queue delivery should not be a huge problem.
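A minimal sketch of the filtering, assuming headers as a list of
{Name, Type, Value} tuples as in the AMQP 0.9.1 codec:

``` erlang
%% Sketch: drop header values that have no AMQP 1.0 application
%% property equivalent, i.e. arrays and tables.
drop_complex_headers(Headers) ->
    [H || {_Name, Type, _Value} = H <- Headers,
          Type =/= array, Type =/= table].
```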
The suite-level timeout in the .erl is, I've learned, actually per
case. By sharding by testcase, we can better match the common test
level and bazel level timeouts, such that we can get logs from remote
test run failures.
Previously the bazel timeout and the common test timeout were equal,
which meant that in practice the bazel timeout was often reached first,
in which case we don't receive the test logs.
If this is not done, apps that consume/cancel from empty queues in a loop
will grow the raft log in an unbounded manner. This could also be the
case for the garbage_collect command.
If the queue is empty when a consumer is cancelled, the consumer id
would be left inside the service queue. If an application
subscribes/unsubscribes in a loop from an empty queue, the service
queue would never be cleaned up.
NB: whenever we make a change to how the quorum queue state machine is
calculated, we need to consider how this affects determinism, as during
an upgrade different members may calculate a different service queue
state. In this case it should be ok, as they will eventually converge
on the same state once all "dead" consumer ids have been removed from
the queue. In any case, it should not affect how messages are assigned
to consumers.
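A minimal sketch of the cleanup, assuming the service queue is an OTP
queue of consumer ids (record and function names are illustrative):

``` erlang
%% Sketch: on cancel, drop the consumer id from the service queue so
%% subscribe/unsubscribe loops on an empty queue leave nothing behind.
cancel_consumer(ConsumerId, #state{service_queue = SQ} = State) ->
    SQ1 = queue:filter(fun(Id) -> Id =/= ConsumerId end, SQ),
    State#state{service_queue = SQ1}.
```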
Two testcases in the original suite fail if the test is run as the
root user. Currently, under remote execution with bazel, this is the
only working option. There is a workaround in place, but the entire
suite, when run that way, takes around 12 minutes. This splits the
suite so that only the minimal set of cases is executed using the
slower workaround.
Rather than sleeping for 6 seconds, we want to repeatedly check that
the replica has recovered, and either eventually succeed, or fail if
it does not recover within 30 seconds, the default await_condition
time interval.
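A minimal sketch of the polling, using
rabbit_ct_helpers:await_condition/2; replica_recovered/1 stands in for
the actual check:

``` erlang
%% Sketch: poll the condition instead of sleeping; the assertion
%% fails if the replica has not recovered within 30 seconds.
rabbit_ct_helpers:await_condition(
    fun() -> replica_recovered(Q) end,
    30000).
```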
Pair: @kjnilsson
Signed-off-by: Gerhard Lazu <gerhard@lazu.co.uk>
Prior to this change, exclusive queues were subject to the queue
location process, just like other queues. Therefore, if
queue_master_locator was not client-local and x-queue-master-locator was
not set to client-local, an exclusive queue was likely to be located on
a different node than the connection it is exclusive to. This is
suboptimal and may lead to inconsistencies when the queue's node goes
down while the connection's node is still up.
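A minimal sketch of the intended behavior, with hypothetical names;
exclusive queues are pinned to the node of the owning connection:

``` erlang
%% Sketch: bypass the queue location process for exclusive queues.
select_queue_node(#{exclusive_owner := Owner}) when is_pid(Owner) ->
    node(Owner);
select_queue_node(Q) ->
    apply_queue_master_locator(Q).  %% hypothetical fallback
```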
The classic local filesystem source is still supported
using the same traditional configuration key, load_definitions.
Configuration schema follows peer discovery in spirit:
* definitions.import_backend configures the mechanism to use,
which can be a module provided by a plugin
* definitions.* keys can be defined by plugins and contain any
keys a specific mechanism needs
For example, the classic local filesystem source can now be
configured like this:
``` ini
definitions.import_backend = local_filesystem
definitions.local.path = /path/to/definitions.d/definition.json
```
Or, for a backend that fetches definitions over HTTPS:
``` ini
definitions.import_backend = https
definitions.https.url = https://hostname/path/to/definitions.json
```
HTTPS may require additional configuration keys related to TLS/x.509
peer verification. Such extra keys will be added as the need for them
becomes evident.
References #3249
The code was passing a number (the timestamp) to
unicode:characters_to_binary/1, which expects an iolist to convert to
UTF-8.
We now check whether the value is a number before calling that
function. If it is a number (integer or float), we keep it as-is
because JSON supports that type.
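A minimal sketch of the guard, with a hypothetical format_value/1
helper used when preparing values for JSON encoding:

``` erlang
%% Sketch: numbers are kept as-is since JSON supports them natively;
%% everything else is converted to a UTF-8 binary.
format_value(Value) when is_integer(Value); is_float(Value) ->
    Value;
format_value(Value) ->
    unicode:characters_to_binary(Value).
```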