1. When the second argument is an exception, it does not need a placeholder.
2. The number of placeholders should equal the number of arguments.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
KIP-500 is not particularly descriptive. I also tweaked the readme text a bit.
Tested that the readme for self-managed still works after these changes.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ron Dagostino <rdagostino@confluent.io>, Jason Gustafson <jason@confluent.io>
Cluster ID has traditionally been treated as a string by the Kafka protocol (for example,
AdminClient returns it as a string). The new KIP-500 code should continue this practice. If
we don't do this, upgrading existing clusters may be harder to do.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>
The array backing BaseHashTable is intended to be sized as a power of
two. Due to a bug, the initial array size was calculated incorrectly
in some cases.
Also make the maximum array size the largest possible 31-bit power of
two. Previously it was smaller because of a typo.
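
As an illustration of the sizing rule described above, here is a minimal sketch of rounding a requested size up to a power of two. The class name, method name, and minimum capacity are hypothetical, and the cap of `1 << 30` is an assumption about what "largest possible 31-bit power of two" means; this is not the actual BaseHashTable code.

```java
public final class PowerOfTwoSizing {
    // Assumed cap: 2^30 is the largest power of two that fits in a positive int.
    static final int MAX_CAPACITY = 1 << 30;
    // Hypothetical lower bound chosen for this sketch.
    static final int MIN_CAPACITY = 2;

    // Returns the smallest power of two >= expectedSize, clamped to
    // [MIN_CAPACITY, MAX_CAPACITY].
    static int roundUpToPowerOfTwo(int expectedSize) {
        if (expectedSize <= MIN_CAPACITY) {
            return MIN_CAPACITY;
        }
        if (expectedSize >= MAX_CAPACITY) {
            return MAX_CAPACITY;
        }
        // For n > 1, highestOneBit(n - 1) << 1 is the smallest power of two >= n.
        return Integer.highestOneBit(expectedSize - 1) << 1;
    }
}
```

Keeping the array length a power of two lets a slot index be computed with a bit mask (`hash & (length - 1)`) instead of a modulo.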
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jose Sancio <jsancio@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
This patch implements additional handling logic for `RemoveTopic` records:
- Update `MetadataPartitions` to ensure addition of deleted partitions to `localRemoved` set
- Ensure topic configs are removed from `ConfigRepository`
- Propagate deleted partitions to `GroupCoordinator` so that corresponding offset commits can be removed
This patch also changes the controller topic id generation logic to use `Uuid.randomUuid` rather than `Random`.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>
This patch enables delete topic support for the new KIP-500 controller. Also fixes the following:
- Fix a bug where feature level records were not correctly replayed.
- Fix a bug in TimelineHashMap#remove where the wrong type was being returned.
Reviewers: Jason Gustafson <jason@confluent.io>, Justine Olshan <jolshan@confluent.io>, Ron Dagostino <rdagostino@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, Jun Rao <junrao@gmail.com>
Co-authored-by: Jason Gustafson <jason@confluent.io>
This patch fixes two problems with the AlterIsr handling of the quorum controller:
- Ensure that the partition epoch is updated correctly after partition change records and is
propagated to Partition
- Ensure that the AlterIsr response includes partitions that were successfully updated
As part of this patch, I've renamed BrokersToIsrs.TopicPartition to
BrokersToIsrs.TopicIdPartition to avoid confusion with the TopicPartition object which is
used virtually everywhere. I've attempted to address some of the testing gaps as well.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
The quorum controller stores metadata in the KIP-500 metadata log, not in Apache
ZooKeeper. Each controller node is a voter in the metadata quorum. The leader of the
quorum is the active controller, which processes write requests. The followers are standby
controllers, which replay the operations written to the log. If the active controller goes away,
a standby controller can take its place.
Like the ZooKeeper-based controller, the quorum controller is based on an event queue
backed by a single-threaded executor. However, unlike the ZK-based controller, the quorum
controller can have multiple operations in flight: it does not need to wait for one operation
to be finished before starting another. Therefore, calls into the QuorumController return
CompletableFuture objects which are completed with either a result or an error when the
operation is done. The QuorumController will also time out operations that have been
sitting on the queue too long without being processed. In this case, the future is completed
with a TimeoutException.
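
The pattern described above can be sketched roughly as follows. This is only an illustration: `DeadlineQueueSketch` and its `submit` method are hypothetical names, and the real QuorumController is considerably more involved (for example, write results are only returned after the corresponding records are committed).

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public final class DeadlineQueueSketch implements AutoCloseable {
    // All operations run on one thread, in submission order.
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    // Enqueues an operation and returns immediately. The future completes with
    // the result, with the operation's own error, or with a TimeoutException
    // if the deadline passed while the operation was still waiting in the queue.
    public <T> CompletableFuture<T> submit(Supplier<T> operation, long deadlineNs) {
        CompletableFuture<T> future = new CompletableFuture<>();
        executor.execute(() -> {
            // Subtraction keeps the comparison safe if nanoTime wraps around.
            if (System.nanoTime() - deadlineNs > 0) {
                future.completeExceptionally(
                    new TimeoutException("operation expired while queued"));
                return;
            }
            try {
                future.complete(operation.get());
            } catch (Throwable t) {
                future.completeExceptionally(t);
            }
        });
        return future;
    }

    @Override
    public void close() {
        executor.shutdown();
    }
}
```

Because `submit` never blocks, a caller can enqueue several operations back to back and attach callbacks to each returned future.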
The controller uses timeline data structures to store multiple "versions" of its in-memory
state simultaneously. "Read operations" read only committed state, which is slightly older
than the most up-to-date in-memory state. "Write operations" read and write the latest
in-memory state. However, we cannot return a successful result for a write operation until
its state has been committed to the log. Therefore, if a client receives an RPC response, it
knows that the requested operation has been performed and cannot be undone by a
controller failover.
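
To make the committed-versus-latest distinction concrete, here is a toy versioned map. It is a sketch under simplifying assumptions: the class and method names are hypothetical, every write gets its own epoch, and the full history is kept per key. It does not reflect how the controller's timeline data structures are actually implemented.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public final class VersionedMapSketch {
    // key -> (epoch at which the value was written -> value)
    private final Map<String, TreeMap<Long, String>> history = new HashMap<>();
    private long latestEpoch = 0;
    private long committedEpoch = 0;

    // "Write operation": applies to the latest in-memory state.
    public void put(String key, String value) {
        latestEpoch++;
        history.computeIfAbsent(key, k -> new TreeMap<>()).put(latestEpoch, value);
    }

    // Sees uncommitted writes.
    public String getLatest(String key) {
        return getAt(key, latestEpoch);
    }

    // "Read operation": sees only state at or before the committed epoch.
    public String getCommitted(String key) {
        return getAt(key, committedEpoch);
    }

    // Marks everything up to the given epoch as committed.
    public void commitUpTo(long epoch) {
        committedEpoch = Math.min(epoch, latestEpoch);
    }

    public long latestEpoch() {
        return latestEpoch;
    }

    private String getAt(String key, long epoch) {
        TreeMap<Long, String> versions = history.get(key);
        if (versions == null) {
            return null;
        }
        // Newest version written at or before the requested epoch.
        Map.Entry<Long, String> entry = versions.floorEntry(epoch);
        return entry == null ? null : entry.getValue();
    }
}
```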
Reviewers: Jun Rao <junrao@gmail.com>, Ron Dagostino <rdagostino@confluent.io>
The Kafka Metadata shell is a new command which allows users to
interactively examine the metadata stored in a KIP-500 cluster.
It can examine snapshot files that are specified via --snapshot.
The metadata tool works by replaying the log and storing the state into
in-memory nodes. These nodes are presented in a fashion similar to
filesystem directories.
Reviewers: Jason Gustafson <jason@confluent.io>, David Arthur <mumrah@gmail.com>, Igor Soarez <soarez@apple.com>
This adds BrokerMetadataListener which is responsible for processing metadata records received by the broker when running in Raft mode.
This also moves some classes that were added to the wrong folder in trunk.
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ron Dagostino <rdagostino@confluent.io>
Add MetaLogListener, LocalLogManager, and related classes. These
classes are used by the KIP-500 controller and broker to interface with the
Raft log.
Also add the Controller interface. The implementation will be added in a separate PR.
Reviewers: Ron Dagostino <rdagostino@confluent.io>, David Arthur <mumrah@gmail.com>
Add KafkaEventQueue, which is used by the KIP-500 controller to manage its event queue.
Compared to using an Executor, KafkaEventQueue has the following advantages:
* Events can be given "deadlines." If an event lingers in the queue beyond the deadline, it
will be completed with a timeout exception. This is useful for implementing timeouts for
controller RPCs.
* Events can be prepended to the queue as well as appended.
* Events can be given tags to make them easier to manage. This is especially useful for
rescheduling or cancelling events which were previously scheduled to execute in the future.
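
The three properties listed above can be illustrated with a small single-threaded sketch. `TaggedDequeSketch` and all of its members are hypothetical and are not the KafkaEventQueue API. To keep the example deterministic, the clock is passed in explicitly and events are run by calling `runNext` rather than by a background thread.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.TimeoutException;

public final class TaggedDequeSketch {
    public interface Event {
        void run();
        // Called instead of run() when the event misses its deadline.
        default void handleException(Throwable t) { }
    }

    private static final class Entry {
        final String tag;
        final long deadlineNs;
        final Event event;

        Entry(String tag, long deadlineNs, Event event) {
            this.tag = tag;
            this.deadlineNs = deadlineNs;
            this.event = event;
        }
    }

    private final Deque<Entry> queue = new ArrayDeque<>();
    private final Map<String, Entry> byTag = new HashMap<>();

    public void append(String tag, long deadlineNs, Event event) {
        add(tag, deadlineNs, event, false);
    }

    public void prepend(String tag, long deadlineNs, Event event) {
        add(tag, deadlineNs, event, true);
    }

    private void add(String tag, long deadlineNs, Event event, boolean front) {
        Entry entry = new Entry(tag, deadlineNs, event);
        if (tag != null) {
            // Re-using a tag replaces the previously queued event.
            Entry old = byTag.put(tag, entry);
            if (old != null) {
                queue.remove(old);
            }
        }
        if (front) {
            queue.addFirst(entry);
        } else {
            queue.addLast(entry);
        }
    }

    // Cancels the queued event with this tag; returns true if one was removed.
    public boolean cancel(String tag) {
        Entry old = byTag.remove(tag);
        return old != null && queue.remove(old);
    }

    // Runs the event at the head of the queue; returns false if the queue is empty.
    public boolean runNext(long nowNs) {
        Entry entry = queue.pollFirst();
        if (entry == null) {
            return false;
        }
        if (entry.tag != null) {
            byTag.remove(entry.tag, entry);
        }
        if (nowNs > entry.deadlineNs) {
            entry.event.handleException(new TimeoutException("deadline exceeded"));
        } else {
            entry.event.run();
        }
        return true;
    }
}
```

A plain Executor offers none of these hooks directly, which is the motivation given above for a dedicated queue type.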
Reviewers: Jun Rao <junrao@gmail.com>, José Armando García Sancio <jsancio@gmail.com>
This patch adds a `RecordSerde` implementation for the metadata record format expected by KIP-631.
Reviewers: Colin McCabe <cmccabe@apache.org>, Ismael Juma <mlists@juma.me.uk>
Add the metadata gradle module, which will contain the metadata record
definitions, and other metadata-related broker-side code.
Add MetadataParser, MetadataParseException, etc.
Reviewers: José Armando García Sancio <jsancio@gmail.com>, Ismael Juma <ismael@juma.me.uk>, David Arthur <mumrah@gmail.com>