Commit Graph

301 Commits

Author SHA1 Message Date
Satish Duggana e8ce93bd53
KAFKA-9555 Added default RLMM implementation based on internal topic storage. (#10579)
KAFKA-9555 Added default RLMM implementation based on internal topic storage.

This is the initial version of the default RLMM implementation.
This includes changes containing default RLMM configs, RLMM implementation, producer/consumer managers.
Introduced TopicBasedRemoteLogMetadataManagerHarness, which takes care of bringing up a Kafka cluster, creating the remote log metadata topic, and initializing TopicBasedRemoteLogMetadataManager.
Refactored the existing RemoteLogMetadataCacheTest into RemoteLogSegmentLifecycleTest, with parameterized tests that run against both RemoteLogMetadataCache and TopicBasedRemoteLogMetadataManager.
Refactored the existing InmemoryRemoteLogMetadataManagerTest and RemoteLogMetadataManagerTest into parameterized tests that run against both InmemoryRemoteLogMetadataManager and TopicBasedRemoteLogMetadataManager.

This is part of tiered storage KIP-405 efforts.

Reviewers: Kowshik Prakasam <kprakasam@confluent.io>, Cong Ding <cong@ccding.com>, Jun Rao <junrao@gmail.com>
2021-07-19 09:05:46 -07:00
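A minimal sketch of the publish side of a topic-based RLMM as described in the commit above, assuming a hypothetical internal topic name and a pre-serialized metadata event; the real TopicBasedRemoteLogMetadataManager manages its own producer/consumer managers and partitioning.

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.serialization.ByteArraySerializer;

public class MetadataTopicPublisher {
    // Hypothetical internal topic name; the real manager derives its own topic and partitioning.
    private static final String METADATA_TOPIC = "__remote_log_metadata";

    public static void main(String[] args) {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        props.put("key.serializer", ByteArraySerializer.class.getName());
        props.put("value.serializer", ByteArraySerializer.class.getName());

        try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
            // Placeholder for a serialized RemoteLogSegmentMetadata event.
            byte[] serializedEvent = new byte[0];
            producer.send(new ProducerRecord<>(METADATA_TOPIC, serializedEvent));
            producer.flush();
        }
    }
}
```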
Colin Patrick McCabe e07de97a4c
KAFKA-12803: Support reassigning partitions when in KRaft mode (#10753)
Support the KIP-455 reassignment API when in KRaft mode. Reassignments
that merely rearrange the replica set complete immediately. Those that only
remove replicas complete immediately if the ISR would be non-empty
after the specified removals. Reassignments that add one or more
replicas follow the KIP-455 pattern of adding all of the adding replicas
to the replica set, and then waiting for the ISR to include all of the new
replicas before completing. Changes to the partition sets are
accomplished via PartitionChangeRecord.

Reviewers: Jun Rao <junrao@gmail.com>
2021-07-15 11:41:51 -07:00
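For reference, a reassignment like the ones described in the commit above can be driven through the KIP-455 Admin API; the topic name, partition, and target brokers below are placeholders.

```java
import java.util.List;
import java.util.Map;
import java.util.Optional;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewPartitionReassignment;
import org.apache.kafka.common.TopicPartition;

public class ReassignExample {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // Move partition 0 of "demo" onto brokers 1, 2, 3; adding replicas follows the
            // KIP-455 flow of enlarging the replica set and waiting for the ISR to catch up.
            Map<TopicPartition, Optional<NewPartitionReassignment>> reassignment = Map.of(
                new TopicPartition("demo", 0),
                Optional.of(new NewPartitionReassignment(List.of(1, 2, 3))));
            admin.alterPartitionReassignments(reassignment).all().get();
        }
    }
}
```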
A. Sophie Blee-Goldman 37d086fa2a
KAFKA-12984: make AbstractStickyAssignor resilient to invalid input, utilize generation in cooperative, and fix assignment bug (#10985)
1) Bring the generation field back to the CooperativeStickyAssignor so we don't need to rely so heavily on the ConsumerCoordinator properly updating its SubscriptionState after, e.g., falling out of the group. The plain StickyAssignor always used the generation since it had to, so we just make sure the CooperativeStickyAssignor has this tool as well.
2) In case of unforeseen problems or further bugs that slip past the generation field safety net, the assignor will now explicitly look out for partitions that are being claimed by multiple consumers as owned in the same generation. Such a case should never occur, but if it does, we have to invalidate this partition from the ownedPartitions of both consumers, since we can't tell who, if anyone, has the valid claim to this partition.
3) Fix a subtle bug that I discovered while writing tests for the above two fixes: in the constrained algorithm, we compute the exact number of partitions each consumer should end up with, and keep track of the "unfilled" members who must, or might, require more partitions to hit their quota. The problem was that members at the minQuota were being considered as "unfilled" even after we had already hit the maximum number of consumers allowed to go up to the maxQuota, meaning those minQuota members could not, and should not, accept any more partitions beyond that. I believe this was introduced in #10509, so it shouldn't be in any released versions and does not need to be backported.

Reviewers: Guozhang Wang <guozhang@apache.org>, Luke Chen <showuon@gmail.com>
2021-07-13 18:29:31 -07:00
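A toy illustration of the double-claim invalidation described in point 2 of the commit above, in plain Java rather than the actual AbstractStickyAssignor code: any partition claimed by two consumers in the same generation is dropped from both.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class OwnedPartitionValidator {
    /**
     * Drops any partition that two consumers claim to own in the same generation,
     * since neither claim can be trusted. Input maps consumer id -> owned partitions.
     */
    static Map<String, Set<String>> invalidateDoubleClaims(Map<String, Set<String>> owned) {
        Map<String, String> firstOwner = new HashMap<>();
        Set<String> doubleClaimed = new HashSet<>();
        owned.forEach((consumer, partitions) -> {
            for (String partition : partitions) {
                if (firstOwner.putIfAbsent(partition, consumer) != null)
                    doubleClaimed.add(partition);
            }
        });
        Map<String, Set<String>> result = new HashMap<>();
        owned.forEach((consumer, partitions) -> {
            Set<String> valid = new HashSet<>(partitions);
            valid.removeAll(doubleClaimed);
            result.put(consumer, valid);
        });
        return result;
    }
}
```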
Justine Olshan 2b8aff58b5
KAFKA-10580: Add topic ID support to Fetch request (#9944)
Updated FetchRequest and FetchResponse to use topic IDs rather than topic names.
Some of the complicated code is found in FetchSession and FetchSessionHandler.
We need to be able to store topic IDs and maintain a cache on the broker for IDs that may not have been resolved. On incremental fetch requests, we will try to resolve them, or remove them if they are in toForget.

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Jun Rao <junrao@gmail.com>
2021-07-07 16:02:37 -07:00
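A simplified sketch of the broker-side idea described in the commit above of caching topic IDs that cannot yet be resolved to names; the class and method names here are hypothetical, not the actual FetchSession code.

```java
import java.util.Map;
import java.util.Optional;
import java.util.Set;
import java.util.UUID;
import java.util.concurrent.ConcurrentHashMap;

public class TopicIdCache {
    private final Map<UUID, String> idToName = new ConcurrentHashMap<>();
    private final Set<UUID> unresolved = ConcurrentHashMap.newKeySet();

    /** Called when a fetch references an ID we cannot map to a name yet. */
    public void recordUnresolved(UUID topicId) {
        unresolved.add(topicId);
    }

    /** Called when metadata arrives; moves the ID out of the unresolved set. */
    public void resolve(UUID topicId, String topicName) {
        idToName.put(topicId, topicName);
        unresolved.remove(topicId);
    }

    /** On an incremental fetch, try to resolve the ID to a topic name. */
    public Optional<String> nameFor(UUID topicId) {
        return Optional.ofNullable(idToName.get(topicId));
    }

    /** Drop an ID entirely, e.g. when the session marks it as toForget. */
    public void forget(UUID topicId) {
        unresolved.remove(topicId);
    }
}
```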
Sanjana Kaundinya e00c0f3719
KAFKA-12234: Implement request/response for offsetFetch batching (KIP-709) (#10962)
This implements the request and response portion of KIP-709. It updates the OffsetFetch request and response to support fetching offsets for multiple consumer groups at a time. If the broker does not support the new OffsetFetch version, clients can revert to the previous behaviour and use a request for each coordinator.

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Konstantine Karantasis <konstantine@confluent.io>
2021-07-07 11:55:00 +01:00
Colin Patrick McCabe b4e45cd0d2
KAFKA-13019: Add MetadataImage and MetadataDelta classes for KRaft Snapshots (#10949)
Create the image/ module for storing, reading, and writing broker metadata images.
Metadata images are immutable. New images are produced from existing images
using delta classes. Delta classes are mutable, and represent changes to a base
image.

MetadataImage objects can be converted to lists of KRaft metadata records. This
is essentially writing a KRaft snapshot. The resulting snapshot can be read
back into a MetadataDelta object. In practice, we will typically read the
snapshot, and then read a few more records to get fully up to date. After that,
the MetadataDelta can be converted to a MetadataImage as usual.

Sometimes, we have to load a snapshot even though we already have an existing
non-empty MetadataImage. We would do this if the broker fell too far behind and
needed to receive a snapshot to catch up. This is handled just like the normal
snapshot loading process. Anything that is not in the snapshot will be marked
as deleted in the MetadataDelta once finishSnapshot() is called.

In addition to being used for reading and writing snapshots, MetadataImage also
serves as a cache for broker information in memory. A follow-up PR will replace
MetadataCache, CachedConfigRepository, and the client quotas cache with the
corresponding Image classes. TopicsDelta also replaces the "deferred
partition" state that the RaftReplicaManager currently implements. (That change
is also in a follow-up PR.)

Reviewers: Jason Gustafson <jason@confluent.io>, David Arthur <mumrah@gmail.com>
2021-07-01 00:08:25 -07:00
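A stripped-down illustration of the immutable-image/mutable-delta pattern described in the commit above; the field and class names are invented for the example.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

/** Immutable snapshot of some metadata; new versions are produced via a delta. */
final class Image {
    final Map<String, Integer> topicsToPartitionCounts;

    Image(Map<String, Integer> topics) {
        this.topicsToPartitionCounts = Collections.unmodifiableMap(topics);
    }

    static final Image EMPTY = new Image(Map.of());
}

/** Mutable set of changes relative to a base image. */
final class Delta {
    private final Image base;
    private final Map<String, Integer> changed = new HashMap<>();

    Delta(Image base) {
        this.base = base;
    }

    /** Records a change, e.g. replayed from a metadata record or snapshot. */
    void replay(String topic, int partitionCount) {
        changed.put(topic, partitionCount);
    }

    /** Applies the accumulated changes to the base and returns a new immutable image. */
    Image apply() {
        Map<String, Integer> merged = new HashMap<>(base.topicsToPartitionCounts);
        merged.putAll(changed);
        return new Image(merged);
    }
}
```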
kpatelatwork 527ba111c7
KAFKA-4793: Connect API to restart connector and tasks (KIP-745) (#10822)
Implements KIP-745 (https://cwiki.apache.org/confluence/display/KAFKA/KIP-745%3A+Connect+API+to+restart+connector+and+tasks), which changes the Connect REST API so that a connector and its tasks can be restarted as a whole.

Testing strategy 
- [x]  Unit tests added for all possible combinations of onlyFailed and includeTasks
- [x]  Integration tests added for all possible combinations of onlyFailed and includeTasks
- [x]  System tests for happy path 

Reviewers: Randall Hauch <rhauch@gmail.com>, Diego Erdody <erdody@gmail.com>, Konstantine Karantasis <k.karantasis@gmail.com>
2021-06-30 21:13:07 -07:00
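A hedged sketch of calling the restart endpoint added by KIP-745, assuming the onlyFailed/includeTasks query-parameter form described in the KIP; the Connect URL and connector name are placeholders.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

public class RestartConnector {
    public static void main(String[] args) throws Exception {
        // POST /connectors/{name}/restart with the KIP-745 query parameters.
        URI uri = URI.create(
            "http://localhost:8083/connectors/my-connector/restart?includeTasks=true&onlyFailed=true");
        HttpRequest request = HttpRequest.newBuilder(uri)
            .POST(HttpRequest.BodyPublishers.noBody())
            .build();
        HttpResponse<String> response = HttpClient.newHttpClient()
            .send(request, HttpResponse.BodyHandlers.ofString());
        // An accepted response indicates the restart was triggered asynchronously.
        System.out.println(response.statusCode() + " " + response.body());
    }
}
```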
Niket d3ec9f940c
KAFKA-12952 Add header and footer records for raft snapshots (#10899)
Add header and footer records for raft snapshots. This helps identify when the snapshot
starts and ends. The header also contains a time field, which is currently set to 0;
KAFKA-12997 will add the necessary wiring to use the correct timestamp.

Reviewers: Jose Sancio <jsancio@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
2021-06-29 09:37:20 -07:00
Jason Gustafson 299eea88a5
KAFKA-12870; Flush in progress not cleared after transaction completion (#10880)
We had been using `RecordAccumulator.beginFlush` in order to force the `RecordAccumulator` to flush pending batches when a transaction was being completed. Internally, `RecordAccumulator` has a simple counter for the number of flushes in progress. The count gets incremented in `beginFlush` and is expected to be decremented by `awaitFlushCompletion`. The matching decrement never happened in the transactional path, so the counter could get stuck at a positive value, which means that the linger time would effectively be ignored.

This patch fixes the problem by removing the use of `beginFlush` in `Sender`. Instead, we now add an additional condition in `RecordAccumulator` to explicitly check when a transaction is being completed. 

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-06-18 15:50:49 -07:00
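A minimal illustration of the flushes-in-progress counter described in the commit above, and why an unmatched beginFlush leaves it stuck; this is not the actual RecordAccumulator code.

```java
import java.util.concurrent.atomic.AtomicInteger;

public class FlushCounter {
    private final AtomicInteger flushesInProgress = new AtomicInteger(0);

    public void beginFlush() {
        flushesInProgress.incrementAndGet();
    }

    /** Must be called once per beginFlush(); otherwise flushInProgress() stays true forever. */
    public void awaitFlushCompletion() {
        flushesInProgress.decrementAndGet();
    }

    /** While this is true, linger.ms is effectively ignored and batches drain immediately. */
    public boolean flushInProgress() {
        return flushesInProgress.get() > 0;
    }
}
```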
Satish Duggana 56250f446a
KAFKA-12816 Added tiered storage related configs including remote log manager configs. (#10733)
Added tiered storage related configs including remote log manager configs.
Added local log retention configs to LogConfig.
Added tests for the added configs.

Reviewers: Kowshik Prakasam <kprakasam@confluent.io>, Jun Rao <junrao@gmail.com>
2021-06-18 09:38:42 -07:00
Ismael Juma d27a84f70c
KAFKA-12945: Remove port, host.name and related configs in 3.0 (#10872)
They have been deprecated since 0.10.0. Full list of removed configs:
* port
* host.name
* advertised.port
* advertised.host.name

Also adjust tests to take the removals into account. Some tests were
no longer relevant and have been removed.

Finally, took the chance to:
* Clean up unnecessary usage of `KafkaConfig$.MODULE$` in
related files.
* Add missing `Test` annotations to `AdvertiseBrokerTest` and
make necessary changes for the tests to pass.

Reviewers: David Jacot <djacot@confluent.io>, Luke Chen <showuon@gmail.com>
2021-06-17 05:32:34 -07:00
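For context, the modern replacements for the removed configs are listeners and advertised.listeners; a small broker-config sketch (the hostnames and ports are placeholders):

```java
import java.util.Properties;

public class BrokerListenerConfig {
    public static Properties listenerProps() {
        Properties props = new Properties();
        // Instead of port/host.name, bind addresses are declared per listener...
        props.put("listeners", "PLAINTEXT://0.0.0.0:9092");
        // ...and instead of advertised.port/advertised.host.name, clients are told
        // where to connect via advertised.listeners.
        props.put("advertised.listeners", "PLAINTEXT://broker1.example.com:9092");
        return props;
    }
}
```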
José Armando García Sancio b67a77d5b9
KAFKA-12787; Integrate controller snapshoting with raft client (#10786)
Directly use `RaftClient.Listener`, `SnapshotWriter` and `SnapshotReader` in the quorum controller.

1. Allow `RaftClient` users to create snapshots by specifying the last committed offset and last committed epoch. These values are validated against the log and leader epoch cache.
2. Remove duplicate classes in the metadata module for writing and reading snapshots.
3. Changed the logic for comparing snapshots. The old logic assumed a certain batch grouping, which didn't match the implementation of the snapshot writer; the snapshot writer is free to merge batches before writing them.
4. Improve `LocalLogManager` to keep track of multiple snapshots.
5. Improve the documentation and API for the snapshot classes to highlight the distinction between the offset of batches in the snapshot vs the offset of batches in the log. These two offsets are independent of one another. `SnapshotWriter` and `SnapshotReader` expose a method called `lastOffsetFromLog` which represents the last inclusive offset from the log that is represented in the snapshot.

Reviewers: dengziming <swzmdeng@163.com>, Jason Gustafson <jason@confluent.io>
2021-06-15 10:32:01 -07:00
A. Sophie Blee-Goldman 48379bd6e5
KAFKA-12648: Pt. 1 - Add NamedTopology to protocol and state directory structure (#10609)
This PR includes adding the NamedTopology to the Subscription/AssignmentInfo, and to the StateDirectory so it can place NamedTopology tasks within the hierarchical structure with task directories under the NamedTopology parent dir.

Reviewers: Walker Carlson <wcarlson@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2021-06-07 15:38:12 -07:00
José Armando García Sancio f50f13d781
KAFKA-12342: Remove MetaLogShim and use RaftClient directly (#10705)
This patch removes the temporary shim layer we added to bridge the interface
differences between MetaLogManager and RaftClient. Instead, we now use the
RaftClient directly from the metadata module.  This also means that the
metadata gradle module now depends on raft, rather than the other way around.
Finally, this PR also consolidates the handleResign and handleNewLeader APIs
into a single handleLeaderChange API.

Co-authored-by: Jason Gustafson <jason@confluent.io>
2021-05-20 15:39:46 -07:00
Colin Patrick McCabe 9e5b77fb96
KAFKA-12788: improve KRaft replica placement (#10494)
Implement a striped replica placement algorithm for KRaft. This also
means implementing rack awareness.  Previously, KRaft just chose
replicas randomly in a non-rack-aware fashion.  Also, allow replicas to
be placed on fenced brokers if there are no other choices.  This was
specified in KIP-631 but previously not implemented.

Reviewers: Jun Rao <junrao@gmail.com>
2021-05-17 16:49:47 -07:00
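A toy rack-alternating placement sketch; the real KRaft StripedReplicaPlacer also handles fencing and randomized starting offsets, so this is only an illustration of the idea described in the commit above.

```java
import java.util.ArrayList;
import java.util.List;

public class RackAwarePlacement {
    /**
     * Assigns `replicas` brokers to a partition, walking the broker list in an order
     * that alternates racks so no rack receives two replicas before every rack has one.
     * `brokersByRackOrder` is assumed to already interleave racks (e.g. r1, r2, r3, r1, ...).
     */
    static List<Integer> place(List<Integer> brokersByRackOrder, int startIndex, int replicas) {
        List<Integer> assignment = new ArrayList<>(replicas);
        for (int i = 0; i < replicas; i++) {
            assignment.add(brokersByRackOrder.get((startIndex + i) % brokersByRackOrder.size()));
        }
        return assignment;
    }
}
```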
Colin Patrick McCabe f20fdbd839
KAFKA-12778: Fix QuorumController request timeouts and electLeaders (#10688)
The QuorumController should honor the timeout for RPC requests that specify
one. For electLeaders, attempt to trigger a leader
election for all partitions when the request specifies null for the topics
argument.

Reviewers: David Arthur <mumrah@gmail.com>
2021-05-14 12:44:16 -07:00
Daniyar Yeralin 6d1ae8bc00
KAFKA-8326: Introduce List Serde (#6592)
Introduce a List serde whose inner type can use the built-in primitive serdes or a custom serializer and deserializer, according to KIP-466

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Matthias J. Sax <mjsax@confluent.io>, John Roesler <roesler@confluent.io>, Michael Noll <michael@confluent.io>
2021-05-13 15:54:00 -07:00
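A usage sketch of the KIP-466 list serde described in the commit above; the Serdes.ListSerde factory signature is assumed from the KIP.

```java
import java.util.ArrayList;
import java.util.List;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;

public class ListSerdeExample {
    public static void main(String[] args) {
        // Serde for ArrayList<Integer>, with the inner serde handling each element.
        Serde<List<Integer>> listSerde = Serdes.ListSerde(ArrayList.class, Serdes.Integer());

        byte[] bytes = listSerde.serializer().serialize("topic", List.of(1, 2, 3));
        List<Integer> roundTripped = listSerde.deserializer().deserialize("topic", bytes);
        System.out.println(roundTripped); // [1, 2, 3]
    }
}
```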
Satish Duggana 7ef3879429
KAFKA-12758 Added `server-common` module to have server side common classes. (#10638)
Added server-common module to have server side common classes. Moved ApiMessageAndVersion, RecordSerde, AbstractApiMessageSerde, and BytesApiMessageSerde to server-common module.

Reviewers:  Kowshik Prakasam <kprakasam@confluent.io>, Jun Rao <junrao@gmail.com>
2021-05-11 09:58:28 -07:00
Chris Egerton 9ba583f6d6
KAFKA-12252 and KAFKA-12262: Fix session key rotation when leadership changes (#10014)
Author: Chris Egerton <chrise@confluent.io>
Reviewers: Greg Harris <gregh@confluent.io>, Randall Hauch <rhauch@gmail.com>
2021-05-05 16:11:15 -05:00
Satish Duggana a1367f57f5
KAFKA-12429: Added serdes for the default implementation of RLMM based on an internal topic as storage. (#10271)
KAFKA-12429: Added serdes for the default implementation of RLMM based on an internal topic as storage. This topic will receive events of RemoteLogSegmentMetadata, RemoteLogSegmentUpdate, and RemotePartitionDeleteMetadata. These events are serialized into Kafka protocol message format.
Added tests for all the event types for that topic.

This is part of the tiered storage implementation KIP-405.

Reviewers:  Kowshik Prakasam <kprakasam@confluent.io>, Jun Rao <junrao@gmail.com>
2021-05-05 07:48:52 -07:00
Vito Jeng 816f5c3b86
KAFKA-5876: KIP-216 Part 3, Apply StreamsNotStartedException for Interactive Queries (#10597)
KIP-216 Part 3: Throw StreamsNotStartedException if KafkaStreams state is CREATED

Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
2021-05-03 13:53:35 -07:00
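A sketch of the state check that motivates the change above: interactive queries only make sense once the instance has left CREATED and is RUNNING. The store name and the guard shown here are hypothetical, not the KafkaStreams-internal check.

```java
import org.apache.kafka.streams.KafkaStreams;
import org.apache.kafka.streams.StoreQueryParameters;
import org.apache.kafka.streams.state.QueryableStoreTypes;
import org.apache.kafka.streams.state.ReadOnlyKeyValueStore;

public class QueryWhenRunning {
    static Long lookup(KafkaStreams streams, String key) {
        // In CREATED state an interactive query now fails fast with StreamsNotStartedException,
        // so callers should start the instance and wait for RUNNING before querying.
        if (streams.state() != KafkaStreams.State.RUNNING) {
            throw new IllegalStateException("KafkaStreams is not running yet: " + streams.state());
        }
        ReadOnlyKeyValueStore<String, Long> store = streams.store(
            StoreQueryParameters.fromNameAndType("counts-store", QueryableStoreTypes.keyValueStore()));
        return store.get(key);
    }
}
```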
José Armando García Sancio 6203bf8b94
KAFKA-12154; Raft Snapshot Loading API (#10085)
Implement Raft Snapshot loading API.

1. Adds a new method `handleSnapshot` to `raft.Listener` which is called whenever the `RaftClient` determines that the `Listener` needs to load a new snapshot before reading the log. This happens when the `Listener`'s next offset is less than the log start offset, also known as the earliest snapshot.

2.  Adds a new type `SnapshotReader<T>` which provides an `Iterator<Batch<T>>` interface and de-serializes records in the `RawSnapshotReader` into `T`s

3.  Adds a new type `RecordsIterator<T>` that implements an `Iterator<Batch<T>>` by scanning a `Records` object and deserializes the batches and records into `Batch<T>`. This type is used by both `SnapshotReader<T>` and `RecordsBatchReader<T>` internally to implement the `Iterator` interface that they expose. 

4. Changes the `MockLog` implementation to read one or two batches at a time. The previous implementation always read from the given offset to the high-watermark. This made it impossible to test interesting snapshot loading scenarios.

5. Removed `throws IOException` from some methods. Some of the types were inconsistently throwing `IOException` in some cases and throwing `RuntimeException(..., new IOException(...))` in others. This PR improves the consistency by wrapping `IOException` in `RuntimeException` in a few more places and replacing `Closeable` with `AutoCloseable`.

6. Updated the Kafka Raft simulation test to take snapshots into account. `ReplicatedCounter` was updated to generate a snapshot after 10 records get committed. This means that the `ConsistentCommittedData` validation was extended to take snapshots into account. Also added a new invariant to ensure that the log start offset is consistently set with the earliest snapshot.

Reviewers: dengziming <swzmdeng@163.com>, David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>
2021-05-01 10:05:45 -07:00
Sergio Peña bf359f8e29
KAFKA-10847: Fix spurious results on left/outer stream-stream joins (#10462)
Fixes the issue with https://issues.apache.org/jira/browse/KAFKA-10847.

To fix the above problem, the left/outer stream-stream join processor uses a buffer to hold non-joined records until the window closes, so they are not emitted if a join is found within the join window. If the window of a record closes and no join was found, then the record is emitted and processed by the downstream topology processor.

A new time-ordered window store is used to temporarily hold records that do not have a join and keep the record keys ordered by time. The KStreamStreamJoin has a reference to this new store. For every non-joined record seen, the processor writes it to this new state store without processing it. When a joined record is seen, the processor deletes the joined record from the new state store to prevent further processing.

Records that were never joined by the end of the window plus the grace period are emitted to the next topology processor. Stream time is used to check for expiry, so the results are deterministic. The KStreamStreamJoin checks for expired records and emits them every time a new record is processed in the join processor.

The new state store is shared by the left and right join nodes. The new store needs to serialize the record keys using a combined key of <joinSide-recordKey>. This key combination helps to delete the records from the other join side if a joined record is found. Two new serdes are created for this: KeyAndJoinSideSerde, which serializes a boolean value specifying the side on which the key is found, and ValueOrOtherValueSerde, which serializes either V1 or V2 based on where the key was found.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2021-04-28 17:57:28 -07:00
A. Sophie Blee-Goldman 3805f3706f
KAFKA-12574: KIP-732, Deprecate eos-alpha and replace eos-beta with eos-v2 (#10573)
Deprecates the following 

1. StreamsConfig.EXACTLY_ONCE
2. StreamsConfig.EXACTLY_ONCE_BETA
3. Producer#sendOffsetsToTransaction(Map offsets, String consumerGroupId)

And introduces a new StreamsConfig.EXACTLY_ONCE_V2 config. Additionally, this PR replaces usages of the term "eos-beta" throughout the code with the term "eos-v2"

Reviewers: Matthias J. Sax <mjsax@confluent.io>
2021-04-28 13:22:15 -07:00
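A config sketch switching a Streams application to the new guarantee introduced above; the application id and bootstrap servers are placeholders.

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

public class EosV2Config {
    public static Properties eosV2Props() {
        Properties props = new Properties();
        props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-eos-app");
        props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
        // Replaces the deprecated EXACTLY_ONCE ("exactly_once") and
        // EXACTLY_ONCE_BETA ("exactly_once_beta") settings.
        props.put(StreamsConfig.PROCESSING_GUARANTEE_CONFIG, StreamsConfig.EXACTLY_ONCE_V2);
        return props;
    }
}
```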
Colin Patrick McCabe a8a6952e4a
KAFKA-12471: Implement createPartitions in KIP-500 mode (#10343)
Implement the createPartitions RPC which adds more partitions to a topic
in the KIP-500 controller.  Factor out some of the logic for validating
manual partition assignments, so that it can be shared between
createTopics and createPartitions.  Add a startPartition argument to the
replica placer.

Reviewers: Jason Gustafson <jason@confluent.io>
2021-04-13 11:00:22 -07:00
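From a client's point of view, the RPC implemented above is driven through Admin#createPartitions; the topic name and partition count below are placeholders.

```java
import java.util.Map;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.NewPartitions;

public class IncreasePartitions {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // Grow "demo" to 6 partitions; the controller validates any manual assignments
            // with the same logic it uses for createTopics.
            admin.createPartitions(Map.of("demo", NewPartitions.increaseTo(6))).all().get();
        }
    }
}
```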
Satish Duggana 327809024f
KAFKA-12368: Added inmemory implementations for RemoteStorageManager and RemoteLogMetadataManager. (#10218)
KAFKA-12368: Added inmemory implementations for RemoteStorageManager and RemoteLogMetadataManager.

Added inmemory implementation for RemoteStorageManager and RemoteLogMetadataManager. A major part of inmemory RLMM will be used in the default RLMM implementation which will be based on topic storage. These will be used in unit tests for tiered storage.
Added tests for both the implementations and their supported classes.
This is part of tiered storage implementation, KIP-405.

Reviewers:  Kowshik Prakasam <kprakasam@confluent.io>, Jun Rao <junrao@gmail.com>
2021-04-13 10:14:03 -07:00
dengziming 73df36d241
MINOR: Remove some unnecessary cyclomatic complexity suppressions (#10488)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2021-04-12 17:49:24 +08:00
Luke Chen f76b8e4938
KAFKA-9831: increase max.poll.interval.ms to avoid unexpected rebalance (#10301)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2021-04-09 12:19:14 -07:00
Colin Patrick McCabe 7bc84d6ced
KAFKA-12467: Implement QuorumController snapshot generation (#10366)
Implement controller-side (QuorumController) snapshot generation. Note that this PR does not
handle KRaft integration, just the internal snapshot record generation and consumption logic.

Reading a snapshot is relatively straightforward.  When the  QuorumController
starts up, it loads the most recent snapshot.  This is just a series of records
that we replay, plus a log offset ("snapshot epoch") that we advance to.

Writing a snapshot is more complex.  There are several components:
the SnapshotWriter which persists the snapshot, the SnapshotGenerator
which manages writing each batch of records, and the SnapshotGeneratorManager
which interfaces the preceding two classes with the event queue.

Controller snapshots are done incrementally.  In order to avoid blocking the
controller thread for a long time, we pull a few record batches at a time from
our record batch iterators.  These iterators are implemented by controller
manager classes such as ReplicationControlManager, ClusterControlManager, etc.

Finally, this PR adds ControllerTestUtils#deepSortRecords and
ControllerTestUtils#assertBatchIteratorContains, which make it easier to write
unit tests.  Since records are often constructed from unsorted data structures,
it is often useful to sort them before comparing them.

Reviewers: David Arthur <mumrah@gmail.com>
2021-04-06 10:18:06 -07:00
dengziming 4f47a565e2
KAFKA-12539; Refactor KafkaRaftClient handleVoteRequest to reduce cyclomatic complexity (#10393)
1. Add `canGrantVote` to `EpochState`
2. Move the if-else in `KafkaRaftClient.handleVoteRequest` to `EpochState`
3. Add unit tests for `canGrantVote`

Reviewers: Jason Gustafson <jason@confluent.io>
2021-04-05 09:27:50 -07:00
David Arthur e820eb42b2
KAFKA-12383: Get RaftClusterTest.java and other KIP-500 junit tests working (#10220)
Introduce "testkit" package which includes KafkaClusterTestKit class for enabling integration tests of self-managed clusters. Also make use of this new integration test harness in the ClusterTestExtentions JUnit extension. 

Adds RaftClusterTest for basic self-managed integration test. 

Reviewers: Jason Gustafson <jason@confluent.io>, Colin P. McCabe <cmccabe@apache.org>

Co-authored-by: Colin P. McCabe <cmccabe@apache.org>
2021-03-22 11:45:56 -04:00
dengziming 69eebbf968
KAFKA-12440; ClusterId validation for Vote, BeginQuorum, EndQuorum and FetchSnapshot (#10289)
Previously we implemented ClusterId validation for the Fetch API in the Raft implementation. This patch adds ClusterId validation to the remaining Raft RPCs. 

Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
2021-03-19 10:27:47 -07:00
Lev Zemlyanov 1fb8bd9c44
KAFKA-10070: parameterize Connect unit tests to remove code duplication (#10299)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Konstantine Karantasis <k.karantasis@gmail.com>
2021-03-19 14:03:36 +00:00
Jason Gustafson 8ef1619f3e
KAFKA-12459; Use property testing library for raft event simulation tests (#10323)
This patch changes the raft simulation tests to use jqwik, which is a property testing library. This provides two main benefits:

- It simplifies the randomization of test parameters. Currently the tests use a fixed set of `Random` seeds, which means that most builds are doing redundant work. We get a bigger benefit from allowing each build to test different parameterizations.
- It makes it easier to reproduce failures. Whenever a test fails, jqwik will report the random seed that failed. A developer can then modify the `@Property` annotation to use that specific seed in order to reproduce the failure.

This patch also includes an optimization for `MockLog.earliestSnapshotId` which reduces the time to run the simulation tests dramatically.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Chia-Ping Tsai <chia7712@gmail.com>, José Armando García Sancio <jsancio@gmail.com>, David Jacot <djacot@confluent.io>
2021-03-17 19:20:07 -07:00
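A minimal jqwik property, unrelated to raft itself, showing the shape of such a test and how a failing seed can be pinned via the @Property annotation for reproduction.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import net.jqwik.api.ForAll;
import net.jqwik.api.Property;

class ReverseProperties {
    // jqwik generates the inputs and, on failure, reports the seed that produced them;
    // the seed can then be pinned for reproduction, e.g. @Property(seed = "1234567890").
    @Property
    boolean reversingTwiceIsIdentity(@ForAll List<Integer> values) {
        List<Integer> copy = new ArrayList<>(values);
        Collections.reverse(copy);
        Collections.reverse(copy);
        return copy.equals(values);
    }
}
```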
Lee Dongjin 28ee656081
MINOR: Remove redundant allows in import-control.xml (#10339)
1. Remove org.apache.log4j from the allowed import list of the shell and trogdor subpackages; they use slf4j, not log4j.
2. Remove org.slf4j from the allowed import list of the clients and server subpackages: org.slf4j is allowed globally.
3. Remove org.apache.log4j from the streams subpackage's allowed import list.

Reviewers: David Jacot <david.jacot@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2021-03-17 19:03:29 +08:00
Colin Patrick McCabe eebc6f279e
MINOR: Enable topic deletion in the KIP-500 controller (#10184)
This patch enables delete topic support for the new KIP-500 controller. Also fixes the following:
- Fix a bug where feature level records were not correctly replayed.
- Fix a bug in TimelineHashMap#remove where the wrong type was being returned.

Reviewers: Jason Gustafson <jason@confluent.io>, Justine Olshan <jolshan@confluent.io>, Ron Dagostino <rdagostino@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, Jun Rao <junrao@gmail.com>

Co-authored-by: Jason Gustafson <jason@confluent.io>
2021-03-04 11:28:20 -08:00
A. Sophie Blee-Goldman 23b61ba383
KAFKA-12375: don't reuse thread.id until a thread has fully shut down (#10215)
Always grab a new thread.id and verify that a thread has fully shut down to DEAD before removing it from the `threads` list and making that id available again

Reviewers: Walker Carlson <wcarlson@confluent.io>, Bruno Cadonna <cadonna@confluent.io>
2021-03-02 16:28:15 -08:00
John Roesler a92b986c85
KAFKA-12268: Implement task idling semantics via currentLag API (#10137)
Implements KIP-695

Reverts a previous behavior change to Consumer.poll and replaces
it with a new Consumer.currentLag API, which returns the client's
currently cached lag.

Uses this new API to implement the desired task idling semantics
improvement from KIP-695.

Reverts fdcf8fbf72 / KAFKA-10866: Add metadata to ConsumerRecords (#9836)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Guozhang Wang <guozhang@apache.org>
2021-03-02 08:20:47 -06:00
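A hedged usage sketch of the Consumer#currentLag API introduced by KIP-695, which returns the client's cached lag as an OptionalLong; the helper below is hypothetical.

```java
import java.util.OptionalLong;
import org.apache.kafka.clients.consumer.Consumer;
import org.apache.kafka.common.TopicPartition;

public class LagCheck {
    /** Returns true only when the cached lag is known and zero, i.e. the partition is caught up. */
    static boolean caughtUp(Consumer<?, ?> consumer, TopicPartition partition) {
        OptionalLong lag = consumer.currentLag(partition);
        return lag.isPresent() && lag.getAsLong() == 0L;
    }
}
```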
Colin Patrick McCabe 5eac5a822f
KAFKA-12276: Add the quorum controller code (#10070)
The quorum controller stores metadata in the KIP-500 metadata log, not in Apache
ZooKeeper. Each controller node is a voter in the metadata quorum. The leader of the
quorum is the active controller, which processes write requests. The followers are standby
controllers, which replay the operations written to the log. If the active controller goes away,
a standby controller can take its place.

Like the ZooKeeper-based controller, the quorum controller is based on an event queue
backed by a single-threaded executor. However, unlike the ZK-based controller, the quorum
controller can have multiple operations in flight; it does not need to wait for one operation
to be finished before starting another. Therefore, calls into the QuorumController return
CompletableFuture objects which are completed with either a result or an error when the
operation is done. The QuorumController will also time out operations that have been
sitting on the queue too long without being processed. In this case, the future is completed
with a TimeoutException.

The controller uses timeline data structures to store multiple "versions" of its in-memory 
state simultaneously. "Read operations" read only committed state, which is slightly older
than the most up-to-date in-memory state. "Write operations" read and write the latest
in-memory state. However, we cannot return a successful result for a write operation until
its state has been committed to the log. Therefore, if a client receives an RPC response, it
knows that the requested operation has been performed and cannot be undone by a
controller failover.

Reviewers: Jun Rao <junrao@gmail.com>, Ron Dagostino <rdagostino@confluent.io>
2021-02-19 18:03:23 -08:00
Colin P. McCabe 690f72dd69
KAFKA-12334: Add the KIP-500 metadata shell
The Kafka Metadata shell is a new command which allows users to
interactively examine the metadata stored in a KIP-500 cluster.
It can examine snapshot files that are specified via --snapshot.

The metadata tool works by replaying the log and storing the state into
in-memory nodes.  These nodes are presented in a fashion similar to
filesystem directories.

Reviewers: Jason Gustafson <jason@confluent.io>, David Arthur <mumrah@gmail.com>, Igor Soarez <soarez@apple.com>
2021-02-19 15:46:34 -08:00
Jason Gustafson 698319b8e2
KAFKA-12278; Ensure exposed api versions are consistent within listener (#10666)
Previously all APIs were accessible on every listener exposed by the broker, but
with KIP-500, that is no longer true.  We now have more complex requirements for
API accessibility.

For example, the KIP-500 controller exposes some APIs which are not exposed by
brokers, such as BrokerHeartbeatRequest, and does not expose most client APIs,
such as JoinGroupRequest, etc.  Similarly, the KIP-500 broker does not implement
some APIs that the ZK-based broker does, such as LeaderAndIsrRequest and
UpdateFeaturesRequest.

All of this means that we need more sophistication in how we expose APIs and
keep them consistent with the ApiVersions API. Up until now, we have been
working around this using the controllerOnly flag inside ApiKeys, but this is
not rich enough to support all of the cases listed above.  This PR introduces a
new "listeners" field to the request schema definitions.  This field is an array
of strings which indicate the listener types in which the API should be exposed.
We currently support "zkBroker", "broker", and "controller".  ("broker"
indicates the KIP-500 broker, whereas zkBroker indicates the old broker).

This PR also creates ApiVersionManager to encapsulate the creation of the
ApiVersionsResponse based on the listener type.  Additionally, it modifies
SocketServer to check the listener type of received requests before forwarding
them to the request handler.

Finally, this PR also fixes a bug in the handling of the ApiVersionsResponse
prior to authentication. Previously a static response was sent, which meant that
changes to features would not be reflected. This also meant that the logic to
ensure that only the intersection of version ranges supported by the controller
would get exposed did not work. I think this is important because some clients
rely on the initial pre-authenticated ApiVersions response rather than doing a
second round after authentication as the Java client does.

One final cleanup note: I have removed the expectation that envelope requests
are only allowed on "privileged" listeners.  This made sense initially because
we expected to use forwarding before the KIP-500 controller was available. That
is not the case anymore and we expect the Envelope API to only be exposed on the
controller listener. I have nevertheless preserved the existing workarounds to
allow verification of the forwarding behavior in integration testing.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>
2021-02-18 16:25:51 -08:00
Ron Dagostino a30f92bf59
MINOR: Add KIP-500 BrokerServer and ControllerServer (#10113)
This PR adds the KIP-500 BrokerServer and ControllerServer classes and 
makes some related changes to get them working.  Note that the ControllerServer 
does not instantiate a QuorumController object yet, since that will be added in
PR #10070.

* Add BrokerServer and ControllerServer

* Change ApiVersions#computeMaxUsableProduceMagic so that it can handle
endpoints which do not support PRODUCE (such as KIP-500 controller nodes)

* KafkaAdminClientTest: fix some lingering references to decommissionBroker
that should be references to unregisterBroker.

* Make some changes to allow SocketServer to be used by ControllerServer as
well as by the broker.

* We now return a random active Broker ID as the Controller ID in
MetadataResponse for the Raft-based case as per KIP-590.

* Add the RaftControllerNodeProvider

* Add EnvelopeUtils

* Add MetaLogRaftShim

* In ducktape, in config_property.py: use a KIP-500 compatible cluster ID.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, David Arthur <mumrah@gmail.com>
2021-02-17 21:35:13 -08:00
Ismael Juma 744d05b128
KAFKA-12327: Remove MethodHandle usage in CompressionType (#10123)
We don't really need it and it causes problems in older Android versions
and GraalVM native image usage (there are workarounds for the latter).

Move the logic to separate classes that are only invoked when the
relevant compression library is actually used. Place such classes
in their own package and enforce via checkstyle that only these
classes refer to compression library packages.

To avoid cyclic dependencies, moved `BufferSupplier` to the `utils`
package.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2021-02-14 08:12:25 -08:00
Colin Patrick McCabe bf5e1f1cc0
MINOR: add the MetaLogListener, LocalLogManager, and Controller interface. (#10106)
Add MetaLogListener, LocalLogManager, and related classes. These
classes are used by the KIP-500 controller and broker to interface with the
Raft log.

Also add the Controller interface. The implementation will be added in a separate PR.

Reviewers: Ron Dagostino <rdagostino@confluent.io>, David Arthur <mumrah@gmail.com>
2021-02-11 08:42:59 -08:00
David Arthur e7e4252b0f
JUnit extensions for integration tests (#9986)
Adds JUnit 5 extension for running the same test with different types of clusters. 
See core/src/test/java/kafka/test/junit/README.md for details
2021-02-09 11:49:33 -05:00
Colin Patrick McCabe d98df7fc4d
MINOR: Add KafkaEventQueue (#10030)
Add KafkaEventQueue, which is used by the KIP-500 controller to manage its event queue.
Compared to using an Executor, KafkaEventQueue has the following advantages:

* Events can be given "deadlines." If an event lingers in the queue beyond the deadline, it
will be completed with a timeout exception. This is useful for implementing timeouts for
controller RPCs.

* Events can be prepended to the queue as well as appended.

* Events can be given tags to make them easier to manage. This is especially useful for
rescheduling or cancelling events which were previously scheduled to execute in the future.

Reviewers: Jun Rao <junrao@gmail.com>, José Armando García Sancio <jsancio@gmail.com>
2021-02-04 14:46:57 -08:00
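A simplified, hypothetical illustration of the deadline idea described above, using a plain single-threaded executor; the real KafkaEventQueue also supports prepending and tagging events.

```java
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.TimeoutException;
import java.util.function.Supplier;

public class DeadlineEventQueue implements AutoCloseable {
    private final ExecutorService queueThread = Executors.newSingleThreadExecutor();

    /** Appends an event; if it sits in the queue past its deadline, it is failed rather than run. */
    public <T> CompletableFuture<T> append(Supplier<T> event, long deadlineMs) {
        CompletableFuture<T> future = new CompletableFuture<>();
        long deadlineNanos = System.nanoTime() + TimeUnit.MILLISECONDS.toNanos(deadlineMs);
        queueThread.execute(() -> {
            if (System.nanoTime() >= deadlineNanos) {
                // The event lingered in the queue beyond its deadline: complete it exceptionally.
                future.completeExceptionally(new TimeoutException("event missed its deadline"));
                return;
            }
            try {
                future.complete(event.get());
            } catch (Throwable t) {
                future.completeExceptionally(t);
            }
        });
        return future;
    }

    @Override
    public void close() {
        queueThread.shutdownNow();
    }
}
```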
Jason Gustafson f58c2acf26
KAFKA-12250; Add metadata record serde for KIP-631 (#9998)
This patch adds a `RecordSerde` implementation for the metadata record format expected by KIP-631. 

Reviewers: Colin McCabe <cmccabe@apache.org>, Ismael Juma <mlists@juma.me.uk>
2021-02-03 16:16:35 -08:00
Colin Patrick McCabe 772f2cfc82
MINOR: Replace BrokerStates.scala with BrokerState.java (#10028)
Replace BrokerStates.scala with BrokerState.java, to make it easier to use from Java code if needed.  This also makes it easier to go from a numeric type to an enum.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2021-02-03 13:41:38 -08:00
Colin Patrick McCabe 1711cfa4eb
KAFKA-12209: Add the timeline data structures for the KIP-631 controller (#9901)
Reviewers: Jun Rao <junrao@gmail.com>
2021-02-02 11:33:55 -08:00
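A toy copy-on-write "timeline" map illustrating the idea behind these data structures: reads can be served from an older committed version while writes produce a new latest version. The names and structure are invented for the example; the actual timeline classes are far more memory-efficient.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.TreeMap;

public class TimelineMap<K, V> {
    // Each offset maps to a full copy of the map as of that offset; offset 0 is the empty base.
    private final TreeMap<Long, Map<K, V>> versions = new TreeMap<>();

    public TimelineMap() {
        versions.put(0L, new HashMap<>());
    }

    /** Writes go to a copy of the latest version, registered under the given (higher) offset. */
    public void put(long offset, K key, V value) {
        Map<K, V> next = new HashMap<>(versions.lastEntry().getValue());
        next.put(key, value);
        versions.put(offset, next);
    }

    /** Reads are served from the newest version at or below the given committed offset. */
    public V get(long committedOffset, K key) {
        return versions.floorEntry(committedOffset).getValue().get(key);
    }

    /** Drops versions above the given offset, e.g. when uncommitted state must be reverted. */
    public void revertTo(long offset) {
        versions.tailMap(offset, false).clear();
    }
}
```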
John Roesler 4d28391480
KAFKA-10867: Improved task idling (#9840)
Use the new ConsumerRecords.metadata() API to implement
improved task idling as described in KIP-695

Reviewers: Guozhang Wang <guozhang@apache.org>
2021-01-27 21:57:20 -06:00