kafka/raft
Alyssa Huang 4583b033f0
KAFKA-17642: PreVote response handling and ProspectiveState (#18240)
This PR implements the second part of KIP-996 and KAFKA-16164 (tasks KAFKA-16607, KAFKA-17642, KAFKA-17643, KAFKA-17675) which encompass the response handling of PreVotes, addition of new ProspectiveState, update to metrics, and addition of Raft simulation tests.

Voters now transition to ProspectiveState first before CandidateState to prevent unnecessary epoch bumps. Voters in ProspectiveState send PreVotes requests which are Vote requests with PreVote set to true.

Follower grants PreVotes if it has not yet fetched successfully from leader. Leader denies all PreVotes. Unattached, Prospective, Candidate, and Resigned will grant PreVotes if the requesting replica's log is at least as long as theirs. Granted PreVotes are not persisted like standard votes. It is possible for a voter to grant several PreVotes in the same epoch.

The only state which is allowed to transition directly to CandidateState is ProspectiveState. This happens on reception of majority of granted PreVotes or if at least one voter doesn't support PreVote requests.

Prospective will transition to Follower after election loss/timeout if it was already aware of last known leader and the leader's endpoint, or at any point if it discovers the leader.

Prospective will transition to Unattached after election loss/timeout if it does not know the leader endpoints.

After electionTimeout, Resigned now always transitions to Unattached and increases the epoch.

Prospective grants standard votes if it has not already granted a standard vote (no votedKey), has no leaderId, and the recipient's log is current enough

Candidate no longer backs off after election timeout. Candidate still backs off after election loss.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2025-01-17 09:38:03 -05:00
..
bin KAFKA-18388 test-kraft-server-start.sh should use log4j2.yaml (#18370) 2025-01-07 04:20:06 +08:00
config KAFKA-18388 test-kraft-server-start.sh should use log4j2.yaml (#18370) 2025-01-07 04:20:06 +08:00
src KAFKA-17642: PreVote response handling and ProspectiveState (#18240) 2025-01-17 09:38:03 -05:00
.gitignore KAFKA-13429: ignore bin on new modules (#11415) 2021-11-10 14:36:24 -06:00
README.md KAFKA-17728 Add missing config `replica-directory-id` to raft README (#17518) 2024-10-17 11:32:27 +08:00

README.md

KRaft (Kafka Raft)

KRaft (Kafka Raft) is a protocol based on the Raft Consensus Protocol tailored for Apache Kafka.

This is used by Apache Kafka in the KRaft (Kafka Raft Metadata) mode. We also have a standalone test server which can be used for performance testing. We describe the details to set this up below.

Run Single Quorum

bin/test-kraft-server-start.sh --config config/kraft.properties --replica-directory-id b8tRS7h4TJ2Vt43Dp85v2A

Run Multi Node Quorum

Create 3 separate KRaft quorum properties as the following:

cat << EOF >> config/kraft-quorum-1.properties

node.id=1
listeners=PLAINTEXT://localhost:9092
controller.listener.names=PLAINTEXT
controller.quorum.voters=1@localhost:9092,2@localhost:9093,3@localhost:9094
log.dirs=/tmp/kraft-logs-1
EOF

cat << EOF >> config/kraft-quorum-2.properties

node.id=2
listeners=PLAINTEXT://localhost:9093
controller.listener.names=PLAINTEXT
controller.quorum.voters=1@localhost:9092,2@localhost:9093,3@localhost:9094
log.dirs=/tmp/kraft-logs-2
EOF

cat << EOF >> config/kraft-quorum-3.properties

node.id=3
listeners=PLAINTEXT://localhost:9094
controller.listener.names=PLAINTEXT
controller.quorum.voters=1@localhost:9092,2@localhost:9093,3@localhost:9094
log.dirs=/tmp/kraft-logs-3
EOF

Open up 3 separate terminals, and run individual commands:

bin/test-kraft-server-start.sh --config config/kraft-quorum-1.properties --replica-directory-id b8tRS7h4TJ2Vt43Dp85v2A
bin/test-kraft-server-start.sh --config config/kraft-quorum-2.properties --replica-directory-id Nkij_D9XRiYKNb41SiJo7Q
bin/test-kraft-server-start.sh --config config/kraft-quorum-3.properties --replica-directory-id 4-e97nI7eHPYKfEDtW8rtQ

Once a leader is elected, it will begin writing to an internal __raft_performance_test topic with a steady workload of random data. You can control the workload using the --throughput and --record-size arguments passed to test-kraft-server-start.sh.