This reverts commit e8837465a5.
The commit that is reverted prevents Kafka Streams from re-initializing
its transactional producer. If an exception that fences the
transactional producer occurs, the producer is not re-initialized during
the handling of the exception. That causes an infinite loop of
ProducerFencedExceptions with corresponding rebalances.
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, David Jacot
<djacot@confluent.io>
To address the issue of not creating a checkpoint file during the
restoring and closing process, called the
GlobalStateUpdateTask.flushState() method in
GlobalStateUpdateTask.initialize() and GlobalStateUpdateTask.close()
methods. This will flush the state and create a checkpoint file thereby,
avoiding the need to completely restore the entire state.
Reviewers: Alieh Saeedi <asaeedi@confluent.io>, Matthias J. Sax <matthias@confluent.io>
In 3.1 we deprecated the eager rebalancing protocol and marked it for
removal in a later release. We aim to officially drop support and remove
the protocol from Streams in 4.0.
The effect of this PR is that it will no longer be possible to perform a
live upgrade Kafka Streams directly to 4.0 from version 2.3 or below.
Users will have to go through a bridge release between 2.4 - 3.9
instead.
Reviewers: Matthias J. Sax <matthias@confluent.io>
Pull request to implement KIP-1111, aims to add a configuration that
prevents a Kafka Streams application from starting if any of its
internal topics have auto-generated names, thereby enforcing explicit
naming for all internal topics and enhancing the stability of the
application’s topology.
- Repartition Topics:
All repartition topics are created in the
KStreamImpl.createRepartitionedSource(...) static method. This method
either receives a name explicitly provided by the user or null and then
builds the final repartition topic name.
- Changelog Topics and State Store Names:
There are several scenarios where these are created:
In the MaterializedInternal constructor.
During KStream/KStream joins.
During KStream/KTable joins with grace periods.
With key-value buffers are used in suppressions.
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Sophie Blee-Goldman <sophie@responsive.dev>
* In this PR, we add various infra classes needed to support the
`deleteShareGroups` functionality via the `kafka-share-groups.sh`
script, as well as the implementation of `kafka-share-groups.sh --delete`.
Reviewers: Andrew Schofield <aschofield@confluent.io>
Cleanup code to avoid rawtype, and add suppressions where necessary.
Change the build to fail on rawtype warning.
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>
KAFKA-16720 aims to add the support for the AlterShareGroupOffsets AdminClient. Key Changes in the PR:
1. Added handing of alterShareGroupOffsets() in KafkaAdminClient and introduce AlterShareGroupOffsetRequest/AlterShareGroupOffsetResponse/AlterShareGroupOffsetsOptions classes.
2. Corresponding test in KafkaAdminClientTest.
3. Added ALTER_SHARE_GROUP_OFFSETS API (will finish it in next PR and the share coordinator pieces)
Reviewers: poorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
KIP-744 introduced the StreamsMetadata class as part of the implementation. In the KIP, the javadoc for the standbyTopicPartitions states that the method returns the set of source TopicPartition that it represents as a standby. The current javadoc states that it represents the changelog TopicPartition(s). While the partitions of the source and changelog topics will match, the javadoc needs to be updated to reflect the correct behavior.
Note that the deprecated o.a.k.streams.state.StreamsMetadata#standbyTopicPartitions method also describes the set of TopicPartition being source TopicPartition.
Reviewers: Matthias Sax<mjsax@apache.org>
In the streams smoke test, flush records that are appended to the input topics, to advance the stream time so that all suppressed windows are flushed at the end of the test. The records are created with record time equal to current time + 2 days. caf0b67 changed the broker defaults so that records more than one hour in the future are rejected by the broker. This breaks the flush messages. By moving all record time stamps 2 days into the past, the existing logic should work correctly with the new default broker configuration.
A similar thing happens in the relational smoke test, where data is emitted 4 days into the future. To avoid running into retention / compaction, the window retention time is increased for both tests.
Reviewers: Bruno Cadonna <bruno@confluent.io>, Bill Bejeck <bill@confluent.io>
While reviewing PR #18287, I wrote some javadocs to help me understand the AbstractMergedSortedCacheStoreIterator. Maybe we could add them to help the next developers getting into it.
Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
The exception stack trace shown in the the ticket can happen when we are
concurrently closing the producer because of an error and doing a
regular close. This is not a bug in the test, but a real race condition
that can happen.
The sequence is this:
Thread1: Enter PENDING_ERROR
Thread2: Check if state is already ERROR
Thread1: Transition to ERROR
Thread2: Check if state is already PENDING_ERROR
Thread2: Transition to PENDING_SHUTDOWN
One idea to fix this would be to synchronize the sequence performed by
Thread1 using the state lock. However, this would require more changes,
since we cannot use the normal state transition method `setState` while
owning the lock, as it calls user-defined callbacks, which may create
deadlocks. Do avoid adding more synchronization, we can also fix it by
first attempting to transition to PENDING_SHUTDOWN, and _then_ checking
whether another thread is already attempting to shut down (states
PENDING_SHUTDOWN, PENDING_ERROR, ERROR, NOT_RUNNING). Since we never
transition from a shutdown state back to a non-shutdown state.
Reviewers: Matthias J. Sax <matthias@confluent.io>
Once a StreamThread receives its assignment, it will close the startup tasks. But during the closing process, the StandbyTask.closeClean() method will eventually call theStatemanagerUtil.closeStateManager method which needs to lock the state directory, but locking requires the calling thread be the current owner. Since the main thread grabs the lock on startup but moves on without releasing it, we need to update ownership explicitly here in order for the stream thread to close the startup task and begin processing.
Reviewers: Matthias Sax <mjsax@apache.org>, Nick Telford