The share-partition leader keeps track of the state and delivery attempts for in-flight records. However, delivery-attempt tracking follows at-least-once semantics.
The consumer processes the records and acknowledges them upon successful consumption. This successful attempt triggers a transition to the "Acknowledged" state.
The code implements acknowledgement of the offsets/batches in the request against the in-memory cached data.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
* KAFKA-16952: Do not bump broker epoch when re-registering the same incarnation
As part of KIP-858 (Handle JBOD broker disk failure in KRaft), we added some code that caused the
broker to re-register itself when transitioning from a MetadataVersion that did not support broker
directory IDs, to one that did. This code was necessary because otherwise the controller would not
be aware of what directories the broker held.
However, prior to this PR, the re-registration process acted exactly like a full registration. That
is, it bumped the broker epoch (which is meant to only be bumped on broker restart). This PR fixes
the code to keep the broker epoch the same if the incarnation ID is the same.
There are some other minor improvements here:
- The previous logic relied on a complicated combination of request version and previous broker
epoch to understand if the request came from the same broker or not. This is not needed: either
the incarnation ID is the same and it's the same process, or it is not and it isn't.
- We now log whether we're amending a registration, registering a previously unknown broker, or
replacing a previous registration.
- Move changes to the HeartbeatManager to the end of the function, so that we will not do them if
any validation step fails. Log4j messages are also generated at the end, for the same reason.
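The epoch rule described above can be sketched as follows. This is a minimal, simplified model and not the controller's actual code: the class `RegistrationSketch`, its fields, and the starting epoch value are all hypothetical and exist only to illustrate the three cases that are now logged.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.UUID;

// Hypothetical, simplified model of registration bookkeeping (not Kafka's controller code).
final class RegistrationSketch {
    // What the sketch remembers per broker: the incarnation ID and the assigned epoch.
    private static final class Registration {
        final UUID incarnationId;
        final long epoch;

        Registration(UUID incarnationId, long epoch) {
            this.incarnationId = incarnationId;
            this.epoch = epoch;
        }
    }

    private final Map<Integer, Registration> registrations = new HashMap<>();
    private long nextEpoch = 100L; // arbitrary starting value, assumed for the sketch

    // Registers a broker and returns a description of which case applied.
    String register(int brokerId, UUID incarnationId) {
        Registration existing = registrations.get(brokerId);
        if (existing == null) {
            registrations.put(brokerId, new Registration(incarnationId, nextEpoch++));
            return "registering a previously unknown broker";
        }
        if (existing.incarnationId.equals(incarnationId)) {
            // Same incarnation ID means the same process: keep the broker epoch unchanged.
            registrations.put(brokerId, new Registration(incarnationId, existing.epoch));
            return "amending a registration";
        }
        // A different incarnation ID means the broker restarted: assign a new epoch.
        registrations.put(brokerId, new Registration(incarnationId, nextEpoch++));
        return "replacing a previous registration";
    }

    long epochOf(int brokerId) {
        return registrations.get(brokerId).epoch;
    }
}
```

In this sketch the only input to the decision is the incarnation ID, which matches the simplification described in the first bullet.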
Reviewers: Ron Dagostino <rndgstn@gmail.com>
ZooKeeper migration system tests currently override the config to
use only one log directory.
This PR removes the override so that the system tests run with 2 log
directories following the work done as part of KIP-858.
Reviewers: Igor Soarez <soarez@apple.com>, Proven Provenzano <pprovenzano@confluent.io>
The following remote log configs can be configured dynamically:
1. remote.log.manager.copy.max.bytes.per.second
2. remote.log.manager.fetch.max.bytes.per.second
3. remote.log.index.file.cache.total.size.bytes
If those values are configured dynamically, this change ensures that on broker restart the dynamic values are loaded instead of the static values from the config.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>
When the broker configuration is incompatible with the current MetadataVersion, the broker should log an error-level message but avoid shutting down.
Reviewers: Luke Chen <showuon@gmail.com>
This PR fixes consumer close to avoid updating the subscription state object in the app thread. Now the close simply triggers an UnsubscribeEvent that is handled in the background to trigger callbacks, clear assignment, and send leave heartbeat. Note that after triggering the event, the unsubscribe will continuously process background events until the event completes, to ensure that it allows for callbacks to run in the app thread.
The logic around what happens if the unsubscribe fails remains unchanged: close will log, keep the first exception, and carry on.
It also removes the redundant LeaveOnClose event (it used to do the exact same thing as the UnsubscribeEvent, both calling membershipMgr.leaveGroup).
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
In `GroupMetadataManager#toTopicPartitions`, we generate a list of `ConsumerGroupHeartbeatRequestData.TopicPartitions` from the input deserialized subscription. Currently the input subscription is `ConsumerPartitionAssignor.Subscription`, where the topic partitions are stored as (topic-partition) pairs, whereas in `ConsumerGroupHeartbeatRequestData.TopicPartitions`, we need the topic partitions to be stored as (topic-partition list) pairs.
`ConsumerProtocolSubscription` is an intermediate data structure in the deserialization where the topic partitions are stored as (topic-partition list) pairs. This PR uses `ConsumerProtocolSubscription` instead as the input subscription to make `toTopicPartitions` more efficient.
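To make the efficiency argument concrete, the following sketch shows the regrouping pass that is needed when the input is a flat list of (topic, partition) pairs. It uses stand-in types only: `Pair` and `groupByTopic` are hypothetical and are not Kafka classes or methods. When the input is already grouped by topic, as described above, this pass is unnecessary.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

// Illustrative only: stand-in types, not Kafka's TopicPartition or request classes.
final class TopicPartitionGroupingSketch {
    // A flat (topic, partition) pair.
    static final class Pair {
        final String topic;
        final int partition;

        Pair(String topic, int partition) {
            this.topic = topic;
            this.partition = partition;
        }
    }

    // Regroups flat pairs into topic -> partition list, preserving first-seen topic order.
    static Map<String, List<Integer>> groupByTopic(List<Pair> pairs) {
        Map<String, List<Integer>> grouped = new LinkedHashMap<>();
        for (Pair pair : pairs) {
            grouped.computeIfAbsent(pair.topic, topic -> new ArrayList<>()).add(pair.partition);
        }
        return grouped;
    }
}
```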
Reviewers: David Jacot <djacot@confluent.io>
We observed some runs of the test suite caused CI pipelines to stall.
A thread dump revealed that the test runner was blocked trying to read from a
socket while attempting to close the socket [[0]]. It turns out this is
due to a bug in the JDK that is very similar to JDK-8274524, but it affects
the else branch of `SSLSocketImpl::bruteForceCloseInput` [[1]], which wasn't
fixed in JDK-8274524.
Since the blocking happens in a native call, the test runner's timeouts have
no effect as the blocked test runner thread doesn't seem to respond to
interrupts.
As a mitigation in Kafka's test suite, this change adds `SO_TIMEOUT` of
30 seconds to all the TLS sockets handled by `EchoServer`. The timeout is
reasonably high for tests and a finite upper bound avoids infinite
blocking of the test suite.
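As a rough illustration of the mitigation, the sketch below sets a 30-second read timeout through the standard `java.net.Socket#setSoTimeout` call. The class and method names are hypothetical and this is not the `EchoServer` change itself; a plain `Socket` is used for brevity, and since `SSLSocket` extends `Socket` the same call applies to TLS sockets.

```java
import java.io.IOException;
import java.io.UncheckedIOException;
import java.net.Socket;

// Illustrative sketch; names are hypothetical and not taken from EchoServer.
final class SoTimeoutSketch {
    static final int READ_TIMEOUT_MS = 30_000; // 30 seconds, as in the description above

    // Sets a finite read timeout so a blocked read throws SocketTimeoutException
    // instead of waiting indefinitely.
    static void applyReadTimeout(Socket socket) {
        try {
            socket.setSoTimeout(READ_TIMEOUT_MS);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    // Applies the timeout to a fresh, unconnected socket and reads the value back.
    static int timeoutOfFreshSocket() {
        try (Socket socket = new Socket()) {
            applyReadTimeout(socket);
            return socket.getSoTimeout();
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }
}
```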
[0]: https://issues.apache.org/jira/secure/attachment/13066427/timeout.log
[1]: 890adb6410/src/java.base/share/classes/sun/security/ssl/SSLSocketImpl.java (L808)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This PR contains the following documentation changes for the native docker image:
- In docker/README.md: how to build, release and promote the native docker image.
- In tests/README.md: how to run system tests by bringing up Kafka in native mode.
- Added docker/native/README.md.
- Added HTML changes for the kafka-site.
- Added native docker image support in the docker compose file examples.
Testing:
Tested all the docker compose files with both the docker images - jvm and native
Tested the html changes locally with the kafka-site
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Vedarth Sharma <vesharma@confluent.io>
Define the interfaces and RPCs for share-group persistence (KIP-932). This PR contains only the RPCs and interfaces, to allow building the broker components which depend upon them. The implementation will follow in subsequent PRs.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Apoorv Mittal <apoorvmittal10@gmail.com>
Added additional APIs for SharePartition which shall be used by SharePartitionManager.
The lock API on SharePartition helps avoid issuing concurrent fetch requests to the replica manager for the same SharePartition. The updateCacheAndOffsets API updates the cache and the corresponding offsets when an exception is encountered in SharePartitionManager because the Log Start Offset has moved.
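One common way to build such a non-blocking guard is a compare-and-set flag. The sketch below assumes that approach for illustration; the class `FetchLockSketch` and its method names are hypothetical and are not the SharePartition API.

```java
import java.util.concurrent.atomic.AtomicBoolean;

// Illustrative guard; names are hypothetical and not the SharePartition API.
final class FetchLockSketch {
    private final AtomicBoolean fetchInProgress = new AtomicBoolean(false);

    // Returns true only for the single caller that wins the flag; others should skip the fetch.
    boolean tryAcquire() {
        return fetchInProgress.compareAndSet(false, true);
    }

    // Clears the flag so that a later fetch can proceed.
    void release() {
        fetchInProgress.set(false);
    }
}
```

A caller that fails to acquire the guard does not wait; it simply does not issue a fetch for that partition at that time.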
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Allow the committed offsets fetch to run for as long as needed. This handles the case where a user invokes Consumer.poll() with a very small timeout (including zero).
Reviewers: Andrew Schofield <aschofield@confluent.io>, Lianet Magrans <lianetmr@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
In a previous PR (#16048), I mistakenly excluded the underscore (_) from the set of valid characters for the protocol,
which made it impossible to correctly parse the connection string for SASL_PLAINTEXT. This bug fix addresses the
issue and includes corresponding tests.
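The effect of omitting the underscore from a character class can be reproduced with two small patterns. These regular expressions and the `protocol` helper are hypothetical and written only for this illustration; they are not the expressions used in Kafka.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative patterns only; not the expressions used in Kafka.
final class ProtocolParseSketch {
    // Protocol character class without the underscore.
    static final Pattern WITHOUT_UNDERSCORE = Pattern.compile("^([A-Za-z0-9]+)://(.+)$");
    // Protocol character class that also accepts the underscore.
    static final Pattern WITH_UNDERSCORE = Pattern.compile("^([A-Za-z0-9_]+)://(.+)$");

    // Returns the protocol part of the connection string, or null if it does not match.
    static String protocol(Pattern pattern, String connection) {
        Matcher matcher = pattern.matcher(connection);
        return matcher.matches() ? matcher.group(1) : null;
    }
}
```

With these patterns, `SASL_PLAINTEXT://localhost:9092` is rejected by the first and parsed by the second, while `PLAINTEXT://localhost:9092` is parsed by both.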
Reviewers: Andrew Schofield <aschofield@confluent.io>, Luke Chen <showuon@gmail.com>
Each KafkaStreams instance maintains a map from threadId to state,
which it aggregates into the KafkaStreams app state. The map is updated
on every state change, and when a new thread is created. State change
updates are done in a synchronized block; however, the update that
happens on thread creation is not, which can raise
ConcurrentModificationException. This patch moves this update
into the listener object and protects it using the object's lock.
It also moves ownership of the state map into the listener so that
it's less likely that future changes access it without locking.
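The locking pattern described above can be sketched as a listener that owns the map and guards every access with its own monitor. The class, enum, and method names here are hypothetical and much simpler than the Kafka Streams listener; the point is only that the thread-creation update and the state-change update take the same lock as the code that iterates the map.

```java
import java.util.HashMap;
import java.util.Map;

// Illustrative listener; names are hypothetical and not Kafka Streams classes.
final class ThreadStateListenerSketch {
    enum State { CREATED, RUNNING, DEAD }

    // The map is private to the listener, so all access goes through synchronized methods.
    private final Map<Long, State> threadStates = new HashMap<>();

    // Called when a new thread is created; guarded by the listener's lock.
    synchronized void onThreadCreated(long threadId) {
        threadStates.put(threadId, State.CREATED);
    }

    // Called on every state change; guarded by the same lock.
    synchronized void onStateChange(long threadId, State newState) {
        threadStates.put(threadId, newState);
    }

    // Aggregates per-thread states; iterating under the lock avoids
    // ConcurrentModificationException from concurrent puts.
    synchronized boolean allRunning() {
        if (threadStates.isEmpty()) {
            return false;
        }
        for (State state : threadStates.values()) {
            if (state != State.RUNNING) {
                return false;
            }
        }
        return true;
    }
}
```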
Reviewers: Matthias J. Sax <matthias@confluent.io>
About
KIP-932 introduces share sessions for share groups. This PR implements share sessions and contexts for incoming share fetch requests on broker. The changes include:
Defined the CachedSharePartition class, instances of which are stored in share sessions.
Defined the ShareSessionKey and ShareSession classes.
Defined the ShareSessionCache class, which caches all the share sessions and applies the eviction policy defined in KIP-932.
Defined the 2 types of contexts -
a. ShareSessionContext - for share session fetch request.
b. FinalContext - for final share fetch request (epoch = -1).
Defined newContext function which returns the created/updated context on receiving share fetch request on broker.
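The choice between the two context types can be sketched as a simple branch on the epoch. This is a deliberately reduced illustration: `ShareContextSketch`, `ContextKind`, and `contextFor` are hypothetical names, and the sketch ignores session creation, lookup, and validation that a real newContext function would also have to handle.

```java
// Illustrative only; names are hypothetical and session handling is omitted.
final class ShareContextSketch {
    static final int FINAL_EPOCH = -1; // epoch value of the final share fetch request, per the list above

    enum ContextKind { SHARE_SESSION, FINAL }

    // Picks the context type for an incoming share fetch request based on its epoch.
    static ContextKind contextFor(int epoch) {
        return epoch == FINAL_EPOCH ? ContextKind.FINAL : ContextKind.SHARE_SESSION;
    }
}
```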
Testing
The added code has been tested with the help of unit tests present in the PR.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>, Apoorv Mittal <apoorvmittal10@gmail.com>
Fixes for the leave group flow (unsubscribe/close):
- Fix to send a Heartbeat to leave the group on close even if the callbacks fail.
- Fix to ensure that if a member gets fenced while blocked on callbacks (e.g. on unsubscribe), it clears its epoch so that it is not included in commit requests.
- Fix to avoid a race on the subscription state object on unsubscribe, updating it only on the background thread when the callbacks to leave complete (success or failure).
Also improves logging in this area.
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Philip Nee <pnee@confluent.io>