This version of Ra contains a substantially refactored Ra log
implementation that provides higher throughput and lower
memory use in several scenarios.
New features:
* `log_ext`: a new effect type that, instead of immediately reading
entries from the log, provides a read plan for any
entries located only in segments.
* Machine version upgrades can now be delayed until all
members are confirmed to support the new version.
This will avoid potential consumption pauses during upgrades.
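The delayed-upgrade rule above can be pictured as "use the highest machine version that every member supports". The following is only a minimal sketch of that idea in Python, not Ra's implementation; the function name and the member-to-version mapping are hypothetical.

```python
def effective_machine_version(supported_versions):
    """Return the highest machine version that every member supports.

    `supported_versions` maps a member name to the highest machine version
    that member reports. Both the mapping and this helper are hypothetical
    and only illustrate the rule described in the release note.
    """
    if not supported_versions:
        raise ValueError("at least one member is required")
    # The new version is usable only once the slowest member supports it.
    return min(supported_versions.values())


# Mid-upgrade: member "c" still runs the old code, so version 3 stays active.
mixed = {"a": 4, "b": 4, "c": 3}
# After the last member is upgraded, version 4 can be activated.
upgraded = {"a": 4, "b": 4, "c": 4}
```

With this rule the cluster keeps running the old machine version during a rolling upgrade and switches only once, when the last member has been upgraded.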
This version contains bug fixes and a change to use async_dist
when a quorum queue sends a message to a remote node (e.g. a consumer
delivery). Using async_dist will reduce chances of messages not
reaching consumers in a timely manner when the system is loaded
and occasionally fills the distribution buffer.
[Why]
We pin a version of Horus even if we don't use it directly (it is a
dependency of Khepri). But currently, we can't update Khepri while still
needing the fix in Horus 0.3.1.
Horus 0.3.1 works around a crash in `cover` that mostly affects CI for
now.
This pinning will have to go away with the next update of Khepri.
This release contains a bug fix to an issue that very occasionally
could cause consumers on replica nodes not to be notified about
newly committed offsets in a timely manner.
This osiris release contains a fix for a bug that would cause an osiris
member to crash during recovery if certain unexpected files
were present in the log directory (e.g. the ".nfsXXXXXXXXXXXX"
files that NFS creates when in-use files are deleted).
This release contains fixes around certain recovery failures where
there are either orphaned segment files (that do not have a corresponding
index file) or index files that do not have a corresponding segment
file.
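To make the recovery cases above concrete, here is a small Python sketch that classifies a directory listing into orphaned segment files, orphaned index files, and unexpected files such as NFS temporary files. It is an illustration only, not the osiris recovery code; the helper name and the assumption that segments and indexes share a base name with `.segment` and `.index` suffixes are mine.

```python
from pathlib import PurePath


def find_unpaired(filenames):
    """Classify file names found in a (hypothetical) log directory.

    Returns a tuple of three lists: segment files without an index,
    index files without a segment, and files that are neither.
    """
    segments, indexes, unexpected = set(), set(), []
    for name in filenames:
        path = PurePath(name)
        if path.suffix == ".segment":
            segments.add(path.stem)
        elif path.suffix == ".index":
            indexes.add(path.stem)
        else:
            # For example ".nfsXXXXXXXXXXXX" files: skip them, do not crash.
            unexpected.append(name)
    orphaned_segments = sorted(s + ".segment" for s in segments - indexes)
    orphaned_indexes = sorted(i + ".index" for i in indexes - segments)
    return orphaned_segments, orphaned_indexes, unexpected


listing = [
    "00000000000000000000.segment",
    "00000000000000000000.index",
    "00000000000000000100.segment",  # no matching index file
    "00000000000000000200.index",    # no matching segment file
    ".nfs000000000001",              # unexpected file
]
orphaned_segments, orphaned_indexes, unexpected = find_unpaired(listing)
```

A recovery routine built on a classification like this can decide what to do with each group explicitly instead of failing on the first file it does not recognise.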
This release contains a few fixes and improvements:
* Add ra:key_metrics/2
* ra_server: Add a new last_applied state query
* Stop checkpoint validation when encountering a valid checkpoint
* Kill snapshot process before deleting everything
This restores the previous pre-init behaviour where an invalid server
will not stop the Ra system from starting. Instead it will log the
errors and continue.
This ensures compatibility with upgraded older systems and systems
where there are historical discrepancies between what is in the
ra_directory and actually on disk.
This contains a fix in the ra_directory module to ensure
names can be deleted even when a Ra server has never been started
during the current node lifetime.
Also contains a small tweak to ensure that ra_directory:unregister_name
is called before deleting a Ra data directory, which makes it less likely
to leave a corrupt state that stops a Ra system from starting.
This release contains improvements to the checkpointing feature
needed for quorum queues v4 and the following fixes:
* Add read to file:open/2 options in ra_lib:sync_file/1
* Emit the new local_query tuple only if query options are set
* bug fixes for checkpoints
It has largely been superseded by `perf`. It is no longer
generally useful. It can always be added to BUILD_DEPS for
the rare cases it is needed, or installed locally and
pointed to by setting its path to ERL_LIBS.
This release includes a new machine API `snapshot_installed/2`. This new
API will only be used indirectly through khepri.
This release also includes a performance improvement that reduces the chances
of building a large WAL mailbox backlog when a node is low on scheduling
resources and commands are committed by followers completing writes to disk
before the leader.
There is also a fix for a potential election deadlock.
Compressed ETS tables may introduce a small throughput penalty (low single
digit %) but can reduce peak Ra memory use by 30-50%.
Also set a default wal_max_entries value to avoid mem tables growing
too large when using very small message sizes (as more than 1M tiny
messages can easily fit into one WAL file).
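The effect of an entry-count limit alongside a size limit can be sketched as follows. This is a toy Python model under my own assumptions (class name, field names and the roll-over rule are hypothetical), not the Ra WAL; it only shows why tiny entries never trigger a size-based roll-over while an entry cap does.

```python
class WalRollover:
    """Toy model of a WAL file that rolls over on size or on entry count."""

    def __init__(self, max_size_bytes, max_entries):
        self.max_size_bytes = max_size_bytes
        self.max_entries = max_entries
        self.size_bytes = 0
        self.entries = 0
        self.files_rolled = 0

    def append(self, entry_size):
        self.size_bytes += entry_size
        self.entries += 1
        # Roll over when either limit is reached, whichever comes first.
        if (self.size_bytes >= self.max_size_bytes
                or self.entries >= self.max_entries):
            self.files_rolled += 1
            self.size_bytes = 0
            self.entries = 0


# 25 one-byte entries against a 1000-byte size limit.
capped = WalRollover(max_size_bytes=1000, max_entries=10)
size_only = WalRollover(max_size_bytes=1000, max_entries=10**9)
for _ in range(25):
    capped.append(1)
    size_only.append(1)
```

In this model the size-only file never rolls over and keeps accumulating entries, while the capped file rolls over after every ten entries, which bounds how many entries the corresponding in-memory table has to hold.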
Ra 2.10.1 contains a type spec fix that is needed.
This Ra release contains a number of fixes and improvements including:
* Much improved resiliency when Ra infrastructure such as the WAL or
segment writer encounters unexpected errors during disk operations.
It also includes the following features that RabbitMQ does not
yet make use of (but will in the near future):
* Checkpoints: allow non-truncating snapshots to be written
to allow faster recovery of quorum queues with long backlogs, for example.
* Server recovery strategy configuration: allow dynamically started
ra servers to be optionally restarted.
* New handle_aux/5 callback with a better and safer API