Commit Graph

196 Commits

Author SHA1 Message Date
José Armando García Sancio adee6f0cc1
KAFKA-16527; Implement request handling for updated KRaft RPCs (#16235)
Implement request handling for the updated versions of the KRaft RPCs (Fetch, FetchSnapshot, Vote,
BeginQuorumEpoch and EndQuorumEpoch). This doesn't add support for KRaft replicas to send the new
version of the KRaft RPCs. That will be implemented in KAFKA-16529.

All of the RPCs responses were extended to include the leader's endpoint for the listener of the
channel used in the request. EpochState was extended to include the leader's endpoint information
but only the FollowerState and LeaderState know the leader id and its endpoint(s).

For the Fetch request, the replica directory id was added. The leader now tracks the follower's log
end offset using both the replica id and replica directory id.

For the FetchSnapshot request, the replica directory id was added. This is not used by the KRaft
leader and it is there for consistency with Fetch and for help debugging.

For the Vote request, the replica key for both the voter (destination) and the candidate (source)
were added. The voter key is checked for consistency. The candidate key is persisted when the vote
is granted.

For the BeginQuorumEpoch request, all of the leader's endpoints are included. This is needed so
that the voters can return the leader's endpoint for all of the supported listeners.

For the EndQuorumEpoch request, all of the leader's endpoints are included. This is needed so that
the voters can return the leader's endpoint for all of the supported listeners. The successor list
has been extended to include the directory id. Receiving voters can use the entire replica key when
searching their position in the successor list.

Updated the existing test in KafkaRaftClientTest and KafkaRaftClientSnapshotTest to execute using
both the old version and new version of the RPCs.

Reviewers: Luke Chen <showuon@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
2024-06-25 13:45:15 -07:00
dujian0068 78372b6883
KAFKA-17013 RequestManager#ConnectionState#toString() should use %s (#16413)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-24 23:44:39 +08:00
Luke Chen d646a09dd0
KAFKA-16531: calculate check-quorum when leader is not in voter set (#16211)
In the check-quorum calculation, the leader should not assume that it is part of the voter set. This may happen when the leader is removing itself from the voter set. This PR improves it by checking if leader is in the voter set or not, and decide how many follower fetches required. Also add tests.

Co-authored-by: Colin P. McCabe <cmccabe@apache.org>

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, José Armando García Sancio <jsancio@apache.org>
2024-06-21 11:22:24 +08:00
PoAn Yang 9ac102596b
KAFKA-16971 Fix the incorrect format string in QuorumConfigs#parseBootstrapServer (#16358)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-17 07:55:47 +08:00
gongxuanzhang d239dde8f6
KAFKA-10787 Apply spotless to raft module (#16278)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-15 11:28:36 +08:00
Omnia Ibrahim e99da2446c
KAFKA-15853: Move KafkaConfig.configDef out of core (#16116)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2024-06-14 17:26:00 +02:00
gongxuanzhang 596b945072
KAFKA-16643 Add ModifierOrder checkstyle rule (#15890)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-13 15:39:32 +08:00
Nikolay aecaf44475
KAFKA-16520: Support KIP-853 in DescribeQuorum (#16106)
Add support for KIP-953 KRaft Quorum reconfiguration in the DescribeQuorum request and response.
Also add support to AdminClient.describeQuorum, so that users will be able to find the current set of
quorum nodes, as well as their directories, via these RPCs.

Reviewers: Luke Chen <showuon@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Andrew Schofield <aschofield@confluent.io>
2024-06-11 10:01:35 -07:00
Alyssa Huang f880ad6ccf
KAFKA-16530: Fix high-watermark calculation to not assume the leader is in the voter set (#16079)
1. Changing log message from error to info - We may expect the HW calculation to give us a smaller result than the current HW in the case of quorum reconfiguration. We will continue to not allow the HW to actually decrease.
2. Logic for finding the updated LeaderEndOffset for updateReplicaState is changed as well. We do not assume the leader is in the voter set and check the observer states as well.
3. updateLocalState now accepts an additional "lastVoterSet" param which allows us to update the leader state with the last known voters. any nodes in this set but not in voterStates will be added to voterStates and removed from observerStates, any nodes not in this set but in voterStates will be removed from voterStates and added to observerStates

Reviewers: Luke Chen <showuon@gmail.com>, José Armando García Sancio <jsancio@apache.org>
2024-06-06 14:30:49 +08:00
Sanskar Jhajharia 896af1b2f2
MINOR: Raft module Cleanup (#16205)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-06-06 04:16:59 +08:00
José Armando García Sancio 459da4795a
KAFKA-16525; Dynamic KRaft network manager and channel (#15986)
Allow KRaft replicas to send requests to any node (Node) not just the nodes configured in the
controller.quorum.voters property. This flexibility is needed so KRaft can implement the
controller.quorum.voters configuration, send request to the dynamically changing set of voters and
send request to the leader endpoint (Node) discovered through the KRaft RPCs (specially
BeginQuorumEpoch request and Fetch response).

This was achieved by changing the RequestManager API to accept Node instead of just the replica ID.
Internally, the request manager tracks connection state using the Node.idString method to match the
connection management used by NetworkClient.

The API for RequestManager is also changed so that the ConnectState class is not exposed in the
API. This allows the request manager to reclaim heap memory for any connection that is ready.

The NetworkChannel was updated to receive the endpoint information (Node) through the outbound raft
request (RaftRequent.Outbound). This makes the network channel more flexible as it doesn't need to
be configured with the list of all possible endpoints. RaftRequest.Outbound and
RaftResponse.Inbound were updated to include the remote node instead of just the remote id.

The follower state tracked by KRaft replicas was updated to include both the leader id and the
leader's endpoint (Node). In this comment the node value is computed from the set of voters. In
future commit this will be updated so that it is sent through KRaft RPCs. For example
BeginQuorumEpoch request and Fetch response.

Support for configuring controller.quorum.bootstrap.servers was added. This includes changes to
KafkaConfig, QuorumConfig, etc. All of the tests using QuorumTestHarness were changed to use the
controller.quorum.bootstrap.servers instead of the controller.quorum.voters for the broker
configuration. Finally, the node id for the bootstrap server will be decreasing negative numbers
starting with -2.

Reviewers: Jason Gustafson <jason@confluent.io>, Luke Chen <showuon@gmail.com>, Colin P. McCabe <cmccabe@apache.org>
2024-06-03 14:24:48 -07:00
dengziming 131ce0ba59
Minor: Fix VoterSetHistoryTest.testAddAt (#16104)
Reviewers: Luke Chen <showuon@gmail.com>
2024-05-30 10:28:07 +08:00
Colin P. McCabe 90892ae99f KAFKA-16516: Fix the controller node provider for broker to control channel
Fix the code in the RaftControllerNodeProvider to query RaftManager to find Node information,
rather than consulting a static map. Add a RaftManager.voterNode function to supply this
information. In KRaftClusterTest, add testControllerFailover to get more coverage of controller
failovers.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2024-05-24 09:52:47 -07:00
Mickael Maison affe8da54c
KAFKA-7632: Support Compression Levels (KIP-390) (#15516)
Reviewers: Jun Rao <jun@confluent.io>,  Luke Chen <showuon@gmail.com>
Co-authored-by: Lee Dongjin <dongjin@apache.org>
2024-05-21 17:58:49 +02:00
José Armando García Sancio 056d232f4e
KAFKA-16526; Quorum state data version 1 (#15859)
Allow KRaft replicas to read and write version 0 and 1 of the quorum-state file. Which version is written is controlled by the kraft.version. With kraft.version 0, version 0 of the quorum-state file is written. With kraft.version 1, version 1 of the quorum-state file is written. Version 1 of the quorum-state file adds the VotedDirectoryId field and removes the CurrentVoters. The other fields removed in version 1 are not important as they were not overwritten or used by KRaft.

In kraft.version 1 the set of voters will be stored in the kraft partition log segments and snapshots.

To implement this feature the following changes were made to KRaft.

FileBasedStateStore was renamed to FileQuorumStateStore to better match the name of the implemented interface QuorumStateStore.

The QuorumStateStore::writeElectionState was extended to include the kraft.version. This version is used to determine which version of QuorumStateData to store. When writing version 0 the VotedDirectoryId is not persisted but the latest value is kept in-memory. This allows the replica to vote consistently while they stay online. If a replica restarts in the middle of an election it will forget the VotedDirectoryId if the kraft.version is 0. This should be rare in practice and should only happen if there is an election and failure while the system is upgrading to kraft.version 1.

The type ElectionState, the interface EpochState and all of the implementations of EpochState (VotedState, UnattachedState, FollowerState, ResignedState, CandidateState and LeaderState) are extended to support the new voted directory id.

The type QuorumState is changed so that local directory id is used. The type is also changed so that the latest value for the set of voters and the kraft version is query from the KRaftControlRecordStateMachine.

The replica directory id is read from the meta.properties and passed to the KafkaRaftClient. The replica directory id is guaranteed to be set in the local replica.

Adds a new metric for current-vote-directory-id which exposes the latest in-memory value of the voted directory id.

Renames VoterSet.VoterKey to ReplicaKey.

It is important to note that after this change, version 1 of the quorum-state file will not be written by kraft controllers and brokers. This change adds support reading and writing version 1 of the file in preparation for future changes.

Reviewers: Jun Rao <junrao@apache.org>
2024-05-16 09:53:36 -04:00
José Armando García Sancio 440f5f6c09
MINOR; Validate at least one control record (#15912)
Validate that a control batch in the batch accumulator has at least one control record.

Reviewers: Jun Rao <junrao@apache.org>, Chia-Ping Tsai <chia7712@apache.org>
2024-05-14 10:02:29 -04:00
José Armando García Sancio bfe81d6229
KAFKA-16207; KRaft's internal log listener to update voter set (#15671)
Adds support for the KafkaRaftClient to read the control records KRaftVersionRecord and VotersRecord in the snapshot and log. As the control records in the KRaft partition are read, the replica's known set of voters are updated. This change also contains the necessary changes to include the control records when a snapshot is generated by the KRaft state machine.

It is important to note that this commit changes the code and the in-memory state to track the sets of voters but it doesn't change any data that is externally exposed. It doesn't change the RPCs, data stored on disk or configuration.

When the KRaft replica starts the PartitionListener reads the latest snapshot and then log segments up to the LEO, updating the in-memory state as it reads KRaftVersionRecord and VotersRecord. When the replica (leader and follower) appends to the log, the PartitionListener catches up to the new LEO. When the replica truncates the log because of a diverging epoch, the PartitionListener also truncates the in-memory state to the new LEO. When the state machine generate a new snapshot the PartitionListener trims any prefix entries that are not needed. This is all done to minimize the amount of data tracked in-memory and to make sure that it matches the state on disk.

To implement the functionality described above this commit also makes the following changes:

Adds control records for KRaftVersionRecord and VotersRecord. KRaftVersionRecord describes the finalized kraft.version supported by all of the replicas. VotersRecords describes the set of voters at a specific offset.

Changes Kafka's feature version to support 0 as the smallest valid value. This is needed because the default value for kraft.version is 0.

Refactors FileRawSnapshotWriter so that it doesn't directly call the onSnapshotFrozen callback. It adds NotifyingRawSnapshotWriter for calling such callbacks. This reorganization is needed because in this change both the KafkaMetadataLog and the KafkaRaftClient need to react to snapshots getting frozen.

Cleans up KafkaRaftClient's initialization. Removes initialize from RaftClient - this is an implementation detail that doesn't need to be exposed in the interface. Removes RaftConfig.AddressSpec and simplifies the bootstrapping of the static voter's address. The bootstrapping of the address is delayed because of tests. We should be able to simplify this further in future commits.

Update the DumpLogSegment CLI to support the new control records KRaftVersionRecord and VotersRecord.

Fix the RecordsSnapshotReader implementations so that the iterator includes control records. RecordsIterator is extended to support reading the new control records.
Improve the BatchAccumulator implementation to allow multiple control records in one control batch. This is needed so that KRaft can make sure that VotersRecord is included in the same batch as the control record (KRaftVersionRecord) that upgrades the kraft.version to 1.

Add a History interface and default implementation TreeMapHistory. This is used to track all of the sets of voters between the latest snapshot and the LEO. This is needed so that KafkaRaftClient can query for the latest set of voters and so that KafkaRaftClient can include the correct set of voters when the state machine generates a new snapshot at a given offset.

Add a builder pattern for RecordsSnapshotWriter. The new builder pattern also implements including the KRaftVersionRecord and VotersRecord control records in the snapshot as necessary. A KRaftVersionRecord should be appended if the kraft.version is greater than 0 at the snapshot's offset. Similarly, a VotersRecord should be appended to the snapshot with the latest value up to the snapshot's offset.

Reviewers: Jason Gustafson <jason@confluent.io>
2024-05-04 12:43:16 -07:00
Mickael Maison e7792258df
MINOR: Various cleanups in raft (#15805)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-04-26 15:20:09 +02:00
Omnia Ibrahim 6feae817d2
MINOR: Rename RaftConfig to QuorumConfig (#15797)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-04-26 03:08:31 +08:00
Kuan-Po (Cooper) Tseng 12a1d85362
KAFKA-12187 replace assertTrue(obj instanceof X) with assertInstanceOf (#15512)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2024-03-20 10:36:25 +08:00
José Armando García Sancio 474f8c1ad6
KAFKA-16286; Notify listener of latest leader and epoch (#15397)
KRaft was only notifying listeners of the latest leader and epoch when the replica transition to a new state. This can result in the listener never getting notified if the registration happened after it had become a follower.

This problem doesn't exists for the active leader because the KRaft implementation attempts to notified the listener of the latest leader and epoch when the replica is the active leader.

This issue is fixed by notifying the listeners of the latest leader and epoch after processing the listener registration request.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2024-02-23 12:56:25 -08:00
yicheny af0aceb4d7
MINOR: fix word spelling mistakes (#15331)
fix word spelling mistakes

Reviewers: Luke Chen <showuon@gmail.com>
2024-02-07 15:30:01 +08:00
Luke Chen 70bd4ce8a7
KAFKA-16144: skip checkQuorum for only 1 voter case (#15235)
When there's only 1 voter, there will be no fetch request from other voters. In this case, we should still not expire the checkQuorum timer because there's just 1 voter.

Reviewers: Mickael Maison <mickael.maison@gmail.com>, Federico Valeri <fedevaleri@gmail.com>, José Armando García Sancio <jsancio@apache.org>
2024-01-23 10:17:53 +08:00
Jason Gustafson 599e22b842
MINOR: Move Raft io thread implementation to Java (#15119)
This patch moves the `RaftIOThread` implementation into Java. I changed the name to `KafkaRaftClientDriver` since the main thing it does is drive the calls to `poll()`. There shouldn't be any changes to the logic.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2024-01-05 09:27:36 -08:00
Josep Prat dd143cdcb7
MINOR Minor Cleanup KRaft code (#15002)
* MINOR Minor Cleanup KRaft code

Clean up minor issues with KRaft code.
Add batch info excluding the records to ease troubleshotting in RecordsSnapshotReader#lastContainedLogTimestamp

Reviewers: Luke Chen <showuon@gmail.com>

Signed-off-by: Josep Prat <josep.prat@aiven.io>
2023-12-15 13:55:18 +01:00
Luke Chen 37416e1aeb
KAFKA-15489: resign leadership when no fetch or fetch snapshot from majority voters (#14428)
In KIP-595, we expect to piggy-back on the `quorum.fetch.timeout.ms` config, and if the leader did not receive Fetch requests from a majority of the quorum for that amount of time, it would begin a new election, to resolve the network partition in the quorum. But we missed this implementation in current KRaft. Fixed it in this PR.

The commit include:
1. Added a timer with timeout configuration in `LeaderState`, and check if expired each time when leader is polled. If expired, resigning the leadership and start a new election.

2. Added `fetchedVoters` in `LeaderState`, and update the value each time received a FETCH or FETCH_SNAPSHOT request, and clear it and resets the timer if the majority - 1 of the remote voters sent such requests.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2023-11-30 11:34:44 -08:00
Gyeongwon, Do abde0e0878
MINOR: fix typo and comment (#14650)
Reviewers: hudeqi <1217150961@qq.com>, Ziming Deng <dengziming1993@gmail.com>.
2023-10-28 12:10:53 +08:00
Ismael Juma 69e591db3a
MINOR: Rewrite/Move KafkaNetworkChannel to the `raft` module (#14559)
This is now possible since `InterBrokerSend` was moved from `core` to `server-common`.
Also rewrite/move `KafkaNetworkChannelTest`.

The scala version of `KafkaNetworkChannelTest` passed with the changes here (before I
deleted it).

Reviewers: Justine Olshan <jolshan@confluent.io>, José Armando García Sancio <jsancio@users.noreply.github.com>
2023-10-16 20:10:31 -07:00
Ismael Juma 4cf86c5d2f
KAFKA-15492: Upgrade and enable spotbugs when building with Java 21 (#14533)
Spotbugs was temporarily disabled as part of KAFKA-15485 to support Kafka build with JDK 21. This PR upgrades the spotbugs version to 4.8.0 which adds support for JDK 21 and enables it's usage on build again.

Reviewers: Divij Vaidya <diviv@amazon.com>
2023-10-12 14:09:10 +02:00
Ismael Juma 98febb989a
KAFKA-15485: Fix "this-escape" compiler warnings introduced by JDK 21 (1/N) (#14427)
This is one of the steps required for kafka to compile with Java 21.

For each case, one of the following fixes were applied:
1. Suppress warning if fixing would potentially result in an incompatible change (for public classes)
2. Add final to one or more methods so that the escape is not possible
3. Replace method calls with direct field access.

In addition, we also fix a couple of compiler warnings related to deprecated references in the `core` module.

See the following for more details regarding the new lint warning:
https://www.oracle.com/java/technologies/javase/21-relnote-issues.html#JDK-8015831

Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>, Chris Egerton <chrise@aiven.io>
2023-09-24 05:59:29 -07:00
José Armando García Sancio 7b669e8806
KAFKA-14273; Close file before atomic move (#14354)
In the Windows OS atomic move are not allowed if the file has another open handle. E.g

__cluster_metadata-0\quorum-state: The process cannot access the file because it is being used by another process
        at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92)
        at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103)
        at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:403)
        at java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:293)
        at java.base/java.nio.file.Files.move(Files.java:1430)
        at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:949)
        at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:932)
        at org.apache.kafka.raft.FileBasedStateStore.writeElectionStateToFile(FileBasedStateStore.java:152)

This is fixed by first closing the temporary quorum-state file before attempting to move it.

Reviewers: Colin Patrick McCabe <cmccabe@apache.org>
Co-Authored-By: Renaldo Baur Filho <renaldobf@gmail.com>
2023-09-07 16:17:03 -07:00
mannoopj 2e3ff21c2e
KAFKA-15412: Reading an unknown version of quorum-state-file should trigger an error (#14302)
Reading an unknown version of quorum-state-file should trigger an error. Currently the only known version is 0. Reading any other version should cause an error.

Reviewers: Justine Olshan <jolshan@confluent.io>, Luke Chen <showuon@gmail.com>
2023-08-30 15:03:41 +08:00
Phuc-Hong-Tran 8d12c1175c
KAFKA-15152: Fix incorrect format specifiers when formatting string (#14026)
Reviewers: Divij Vaidya <diviv@amazon.com>

Co-authored-by: phuchong.tran <phuchong.tran@servicenow.com>
2023-08-24 19:38:45 +02:00
José Armando García Sancio 3f4816dd3e
KAFKA-15345; KRaft leader notifies leadership when listener reaches epoch start (#14213)
In a non-empty log the KRaft leader only notifies the listener of leadership when it has read to the leader's epoch start offset. This guarantees that the leader epoch has been committed and that the listener has read all committed offsets/records.

Unfortunately, the KRaft leader doesn't do this when the log is empty. When the log is empty the listener is notified immediately when it has become leader. This makes the API inconsistent and harder to program against.

This change fixes that by having the KRaft leader wait for the listener's nextOffset to be greater than the leader's epochStartOffset before calling handleLeaderChange.

The RecordsBatchReader implementation is also changed to include control records. This makes it possible for the state machine learn about committed control records. This additional information can be used to compute the committed offset or for counting those bytes when determining when to snapshot the partition.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>
2023-08-17 18:40:17 -07:00
José Armando García Sancio dafe51b658
KAFKA-15100; KRaft data race with the expiration service (#14141)
The KRaft client uses an expiration service to complete FETCH requests that have timed out. This expiration service uses a different thread from the KRaft polling thread. This means that it is unsafe for the expiration service thread to call tryCompleteFetchRequest. tryCompleteFetchRequest reads and updates a lot of states that is assumed to be only be read and updated from the polling thread.

The KRaft client now does not call tryCompleteFetchRequest when the FETCH request has expired. It instead will send the FETCH response that was computed when the FETCH request was first handled.

This change also fixes a bug where the KRaft client was not sending the FETCH response immediately, if the response contained a diverging epoch or snapshot id.

Reviewers: Jason Gustafson <jason@confluent.io>
2023-08-09 07:12:08 -07:00
José Armando García Sancio e0727063f7
KAFKA-15312; Force channel before atomic file move (#14162)
On ext4 file systems we have seen snapshots with zero-length files. This is possible if
the file is closed and moved before forcing the channel to write to disk.

Reviewers: Ron Dagostino <rndgstn@gmail.com>, Alok Thatikunta <athatikunta@confluent.io>
2023-08-08 14:31:42 -07:00
Colin Patrick McCabe 10bcd4fc7f
KAFKA-15213: provide the exact offset to QuorumController.replay (#13643)
Provide the exact record offset to QuorumController.replay() in all cases. There are several situations
where this is useful, such as logging, implementing metadata transactions, or handling broker
registration records.

In the case where the QC is inactive, and simply replaying records, it is easy to compute the exact
record offset from the batch base offset and the record index.

The active QC case is more difficult. Technically, when we submit records to the Raft layer, it can
choose a batch base offset later than the one we expect, if someone else is also adding records.
While the QC is the only entity submitting data records, control records may be added at any time.
In the current implementation, these are really only used for leadership elections. However, this
could change with the addition of quorum reconfiguration or similar features.

Therefore, this PR allows the QC to tell the Raft layer that a record append should fail if it
would have resulted in a batch base offset other than what was expected. This in turn will trigger a
controller failover. In the future, if automatically added control records become more common, we
may wish to have a more sophisticated system than this simple optimistic concurrency mechanism. But
for now, this will allow us to rely on the offset as correct.

In order that the active QC can learn what offset to start writing at, the PR also adds a new
RaftClient#endOffset function.

At the Raft level, this PR adds a new exception, UnexpectedBaseOffsetException. This gets thrown
when we request a base offset that doesn't match the one the Raft layer would have given us.
Although this exception should cause a failover, it should not be considered a fault. This
complicated the exception handling a bit and motivated splitting more of it out into the new
EventHandlerExceptionInfo class. This will also let us unit test things like slf4j log messages a
bit better.

Reviewers: David Arthur <mumrah@gmail.com>, José Armando García Sancio <jsancio@apache.org>
2023-07-27 17:01:55 -07:00
Colin Patrick McCabe c7de30f38b
KAFKA-15183: Add more controller, loader, snapshot emitter metrics (#14010)
Implement some of the metrics from KIP-938: Add more metrics for
measuring KRaft performance.

Add these metrics to QuorumControllerMetrics:
    kafka.controller:type=KafkaController,name=TimedOutBrokerHeartbeatCount
    kafka.controller:type=KafkaController,name=EventQueueOperationsStartedCount
    kafka.controller:type=KafkaController,name=EventQueueOperationsTimedOutCount
    kafka.controller:type=KafkaController,name=NewActiveControllersCount

Create LoaderMetrics with these new metrics:
    kafka.server:type=MetadataLoader,name=CurrentMetadataVersion
    kafka.server:type=MetadataLoader,name=HandleLoadSnapshotCount

Create SnapshotEmitterMetrics with these new metrics:
    kafka.server:type=SnapshotEmitter,name=LatestSnapshotGeneratedBytes
    kafka.server:type=SnapshotEmitter,name=LatestSnapshotGeneratedAgeMs

Reviewers: Ron Dagostino <rndgstn@gmail.com>
2023-07-24 21:13:58 -07:00
Cheryl Simmons e98508747a
Doc fixes: Fix format and other small errors in config documentation (#13661)
Various formatting fixes in the config docs.

Reviewers: Bill Bejeck <bbejeck@apache.org>
2023-07-10 12:48:35 -04:00
José Armando García Sancio 3a246b1aba
KAFKA-15078; KRaft leader replys with snapshot for offset 0 (#13845)
If the follower has an empty log, fetches with offset 0, it is more
efficient for the leader to reply with a snapshot id (redirect to
FETCH_SNAPSHOT) than for the follower to continue fetching from the log
segments.

Reviewers: David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>
2023-06-28 14:21:11 -07:00
José Armando García Sancio b7a6a8fd5f
KAFKA-15076; KRaft should prefer latest snapshot (#13834)
If the KRaft listener is at offset 0, the start of the log, and KRaft has generated a snapshot, it should prefer the latest snapshot instead of having the listener read from the start of the log.

This is implemented by having KafkaRaftClient send a Listener.handleLoadSnapshot event, if the Listener is at offset 0 and the KRaft partition has generated a snapshot.

Reviewers: Jason Gustafson <jason@confluent.io>, David Arthur <mumrah@gmail.com>
2023-06-12 07:25:42 -07:00
Alok Thatikunta 3d349ae0d6
MINOR; Add helper util Snapshots.lastContainedLogTimestamp (#13772)
This change refactors the lastContainedLogTimestamp to the Snapshots class, for re-usability. Introduces IdentitySerde based on ByteBuffer, required for using RecordsSnapshotReader. This change also removes the "recordSerde: RecordSerde[_]" argument from the KafkaMetadataLog constructor.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2023-06-06 08:29:15 -07:00
mojh7 04f2f6a26a
MINOR: Typo and unused method removal (#13739)
clean up unused private method and removed typos

Reviewers:  Divij Vaidya <diviv@amazon.com>,  Manyanda Chitimbo <manyanda.chitimbo@gmail.com>,  Daniel Scanteianu, Josep Prat <josep.prat@aiven.io>
2023-06-06 10:50:56 +02:00
Divij Vaidya fe6a827e20
KAFKA-14633: Reduce data copy & buffer allocation during decompression (#13135)
After this change,

    For broker side decompression: JMH benchmark RecordBatchIterationBenchmark demonstrates 20-70% improvement in throughput (see results for RecordBatchIterationBenchmark.measureSkipIteratorForVariableBatchSize).
    For consumer side decompression: JMH benchmark RecordBatchIterationBenchmark a mix bag of single digit regression for some compression type to 10-50% improvement for Zstd (see results for RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize).

Reviewers: Luke Chen <showuon@gmail.com>, Manyanda Chitimbo <manyanda.chitimbo@gmail.com>, Ismael Juma <mail@ismaeljuma.com>
2023-06-05 15:04:49 +08:00
David Mao d944ef1efb MINOR: Rename handleSnapshot to handleLoadSnapshot (#13727)
Rename handleSnapshot to handleLoadSnapshot to make it explicit that it is handling snapshot load,
not generation.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>
2023-05-17 09:57:24 -07:00
Chia-Ping Tsai 3c8665025a
MINOR: move ControlRecordTest to correct directory (#13718)
Reviewers: Kamal Chandraprakash<kamal.chandraprakash@gmail.com>, hudeqi<1217150961@qq.com>, Satish Duggana<satishd@apache.org>
2023-05-17 18:56:02 +05:30
Luke Chen 625ef176ee
MINOR: remove kraft readme link (#13691)
The config/kraft/README.md is already removed. We should also remove the link.

Reviewers: dengziming <dengziming1993@gmail.com>
2023-05-10 16:40:20 +08:00
Manyanda Chitimbo dd63d88ac3
MINOR: fix noticed typo in raft and metadata projects (#13612)
Reviewers: Josep Prat <jlprat@apache.org>
2023-04-21 15:02:06 +02:00
José Armando García Sancio 1f1900b380
MINOR: Improve raft log4j messages a bit (#13553)
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-04-14 10:05:22 -07:00
Paolo Patierno 571841fed3
KAFKA-14883: Expose `observer` state in KRaft metrics (#13525)
Currently, the current-state KRaft related metric reports follower state for a broker while technically it should be reported as an observer as the kafka-metadata-quorum tool does.

Reviewers: Luke Chen <showuon@gmail.com>, dengziming <dengziming1993@gmail.com>
2023-04-13 12:55:57 +08:00