This restores the previous pre-init behaviour where an invalid server
will not stop the Ra system from starting. Instead it will log the
errors and continue.
This ensures compatibility with upgraded older systems and systems
where there are historical discrepancies between what is in the
ra_directory and actually on disk.
This contains a fix in the ra_directory module to ensure
names can be deleted even when a Ra server has never been started
during the current node lifetime.
Also contains a small tweak to ensure ra_directory:unregister_name
is called before a Ra data directory is deleted, as this ordering is less
likely to leave a corrupt state that would stop a Ra system from starting.
This release contains improvements to the checkpointing feature
needed for quorum queues v4 and the following fixes:
* Add read to file:open/2 options in ra_lib:sync_file/1
* Emit the new local_query tuple only if query options are set
* Bug fixes for checkpoints
The beam cache allows switching between app and test
builds without having to rebuild everything. Since
the files keep their mtime and other attributes,
rebuilding continues from where it was left off
before, and only the relevant files get rebuilt
if anything changed.
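
The idea can be illustrated with a minimal Python sketch. It is not how the build system implements the cache: the function names (`stash`, `restore`, `needs_rebuild`) and the directory layout are hypothetical. It only shows why keeping each file's mtime lets a later build skip up-to-date outputs.

```python
import os
import shutil


def stash(build_dir: str, cache_dir: str) -> None:
    """Move the current build output into a cache slot (hypothetical helper)."""
    if os.path.isdir(cache_dir):
        shutil.rmtree(cache_dir)
    if os.path.isdir(build_dir):
        # Moving a directory keeps the mtime of every file inside it.
        shutil.move(build_dir, cache_dir)


def restore(cache_dir: str, build_dir: str) -> None:
    """Bring a cached build output back, or start from an empty directory."""
    if os.path.isdir(cache_dir):
        shutil.move(cache_dir, build_dir)
    else:
        os.makedirs(build_dir, exist_ok=True)


def needs_rebuild(source: str, target: str) -> bool:
    """Make-style check: rebuild if the target is missing or older than its source."""
    return (not os.path.exists(target)
            or os.path.getmtime(target) < os.path.getmtime(source))
```

Because a restored file still carries its original mtime, `needs_rebuild` only reports the files whose sources changed after the output was cached.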
It has largely been superseded by `perf`. It is no longer
generally useful. It can always be added to BUILD_DEPS for
the rare cases it is needed, or installed locally and
pointed to by adding its path to ERL_LIBS.
This release includes a new machine API `snapshot_installed/2`. This new
API will only be used indirectly through Khepri.
This release also includes a performance improvement that reduces the chances
of building a large WAL mailbox backlog when a node is low on scheduling
resources and commands are committed by followers completing writes to disk
before the leader.
There is also a fix for a potential election deadlock.
When there is nothing to do we don't need this variable
so we don't want to calculate it unnecessarily.
Because this variable is only used once, when
producing the .app file, we don't have to worry
about the calculation being done multiple times.
If it is ever used more than once, it will need to
be lazily evaluated[1] instead.
[1] Managing Projects with GNU Make, 3rd Edition Chapter 10
Execution speed differences:
    make -C deps/rabbit nope 0,02s user 0,03s system 101% cpu 0,051 total
    make -C deps/rabbit nope 0,02s user 0,01s system 97% cpu 0,031 total
Compressed ETS tables may introduce a small throughput penalty (low single
digit %) but can reduce peak Ra memory use by 30-50%.
Also set a default wal_max_entries value to avoid mem tables growing
too large when using very small message sizes (as more than 1M tiny
messages can easily fit into one WAL file).
Ra 2.10.1 contains a needed type spec fix.
This Ra release contains a number of fixes and improvements including:
* Much improved resiliency when Ra infrastructure such as the WAL or
segment writer encounters unexpected errors during disk operations.
It also includes the following features that RabbitMQ does not
yet make use of (but will in the near future).
* Checkpoints: allow non-truncating snapshots to be written
to allow faster recovery of quorum queues with long backlogs, for example.
* Server recovery strategy configuration: allow dynamically started
Ra servers to be optionally restarted.
* New handle_aux/5 callback with a better and safer API
Khepri v0.13.0 contains a fix for how projections are handled during
registration and recovery. The error returned from
`khepri:register_projection/1,2,3` has also been updated to use the
`?khepri_error(..)` helper macro.
Co-authored-by: Jean-Sébastien Pédron <jean-sebastien.pedron@dumbbell.fr>
This Ra release contains fixes for leaderboard updates as well
as a fix for a long-standing bug that meant the latest cluster may not
be recovered correctly after an unclean shutdown.
Khepri 0.10.0 replaces `khepri:wait_for_async_ret/2,3` with
`khepri:handle_async_ret/1,2`. This will be used by the child commit:
the child commit will use Khepri's async interface and handle async
write events from Ra.
Changes to the bazel build files were done automatically with gazelle:
    bazel run gazelle -- update-repos --verbose \
        --build_files_dir=bazel github.com/rabbitmq/khepri@v0.10.1
This includes a new ra:key_metrics/1 API that is more available
than parsing the output of sys:get_status/1.
The rabbit_quorum_queue:status/1 function has been ported to use
this API instead and now includes a few new fields.
Includes minor fixes and improvements such as:
* Don't overwrite Ra member config file in place to avoid potential
corruption scenario
* Make logging unicode compatible
* Optimisation to avoid spawning node connector process on ra member init
when nodes are already connected.
* Catch recovery failures in the Ra WAL rather than crashing hard.
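
The first item above describes a general technique: write the new contents to a temporary file, then rename it over the old one. The Python sketch below illustrates that pattern only. It is not Ra's Erlang implementation, and the function name `write_file_atomically` is hypothetical.

```python
import os
import tempfile


def write_file_atomically(path: str, data: bytes) -> None:
    """Replace `path` with `data` without truncating the existing file in place."""
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temporary file next to the target so the final rename
    # stays within one filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, prefix=".tmp-")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())  # push the bytes to disk before the swap
        os.replace(tmp_path, path)  # atomic rename on POSIX filesystems
    except BaseException:
        # The old file is still intact; remove the partial temporary copy.
        if os.path.exists(tmp_path):
            os.unlink(tmp_path)
        raise
```

If the process dies before `os.replace`, readers still see the complete old file rather than a half-written new one.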
We already were using Cowlib 2.12.1 and therefore were
compatible with OTP-26. This simply updates Cowboy to
the version that depends on Cowlib 2.12.1.
Returns reaching a Ra member that used to be leader but now has stepped
down would cause that follower to crash and restart.
This commit avoids this scenario and gives the return commands
a good chance of being resent to the new leader in a timely manner
(see the Ra release for this).
This Ra release includes improvements to Ra server GC behaviour when receiving a lot
of low priority commands with large binary payloads (e.g. quorum queue messages).
In practice, this allows quorum queues to accept large amounts of messages in a more predictable and performant manner.
This change also removes the ra_file_handle cache that was used as a bridge between Ra file operations and RabbitMQ I/O metrics. Many components in RabbitMQ, such as streams and CQv2s, do not record I/O metrics in the previous manner because of the overhead incurred for every file I/O operation. These metrics are better inspected at the OS level anyway.