Migrate KStream stateless operators to new Processor API.
Following PRs will complete migration of KStream stateful operators and KTable.
No expected functionality changes.
Reviewers: John Roesler <vvcephei@apache.org>
In #5344 it came to our attention that the StreamsConfig overloads of the KafkaStreams constructors are actually quite useful for dependency injection, providing a cleaner way to configure dependencies and better type safety.
Reviewers: Matthias J. Sax <mjsax@confluent.io>
This new store is more efficient when calling range queries with only time parameters, like `fetch(from, to)`. For range queries using key ranges, then the current RocksDBWindowStore should be used.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
* Remove `ExtendedSerializer` and `ExtendedDeserializer`, deprecated since 2.1.
The extra functionality was also made available in `Serializer` and `Deserializer`.
* Remove `close(long, TimeUnit)` from the producer, consumer and admin client,
deprecated since 2.0 for the consumer and 2.2 for the rest. The replacement is `close(Duration)`.
* Remove `ConsumerConfig.addDeserializerToConfig` and `ProducerConfig.addSerializerToConfig`,
deprecated since 2.7 with no replacement. These methods were not intended to be public API
and are likely not used much (if at all).
* Remove `NoOffsetForPartitionException.partition()`, deprecated since 0.11. `partitions()`
should be used instead.
* Remove `MessageFormatter.init(Properties)`, deprecated since 2.7. The `configure(Map)`
method should be used instead.
* Remove `kafka.common.MessageFormatter`, deprecated since 2.7.
`org.apache.kafka.common.MessageFormatter` should be used instead.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <djacot@confluent.io>
* Standardize license headers in scala, python, and gradle files.
* Relocate copyright attribution to the NOTICE.
* Add a license header check to `spotless` for scala files.
Reviewers: Ewen Cheslack-Postava <ewencp@apache.org>, Matthias J. Sax <mjsax@apache.org>, A. Sophie Blee-Goldman <ableegoldman@apache.org
Minor followup to #10407 -- we need to extract the rebalanceInProgress check down into the commitAndFillInConsumedOffsetsAndMetadataPerTaskMap method which is invoked during handleCorrupted, otherwise we may attempt to commit during a a rebalance which will fail
Reviewers: Matthias J. Sax <mjsax@confluent.io>
The filesystem locks don't protect access between StreamThreads, only across different instances of the same Streams application. Running multiple processes in the same physical state directory is not supported, and as of PR #9978 it's explicitly guarded against), so there's no reason to continue locking the task directories with anything heavier than an in-memory map.
Reviewers: Rohan Desai <rodesai@confluent.io>, Walker Carlson <wcarlson@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Remove the default close implementation for RocksDBConfigSetter to avoid accidental memory leaks via C++ backed objects which are constructed but not closed by the user
Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
Need to handle TaskCorruptedException and TimeoutException that can be thrown from offset commit during handleRevocation or handleCorruption
Reviewers: Matthias J. Sax <mjsax@confluent.org>, Guozhang Wang <guozhang@confluent.io>
When in EOS the run loop terminates on that thread before the shutdown can be called. This is a problem for EOS single thread applications using the application shutdown feature.
I changed it so in all cases with a single thread, the dying thread will spin up a new thread to communicate the shutdown and terminate the dying thread. Also @ableegoldman refactored the catch blocks in runloop.
Co-authored-by: A. Sophie Blee-Goldman <ableegoldman@gmail.com>
Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
ProcessorContext#forward was changed via KIP-251 in 2.0.0 release. This PR removes the old and deprecated overloads.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>
A major issue has been raised that this implementation of
emit-on-change is vulnerable to a number of data-loss bugs
in the presence of recovery with dirty state under at-least-once
semantics. This should be fixed in the future when we implement
a way to avoid or clean up the dirty state under at-least-once,
at which point it will be safe to re-introduce KIP-557 and
complete it.
Reviewers: A. Sophie Blee-Goldman <ableegoldman@apache.org>
There were errors while generating javadoc for the streams:test-utils module
because the included TopologyTestDriver imported some excluded classes.
This fixes the errors by inlining the previously excluded packages.
Reviewers: Chia-Ping Tsai <chia7712@apache.org>, Ismael Juma <ijuma@apache.org>
1. replace org.junit.Assert by org.junit.jupiter.api.Assertions
2. replace org.junit by org.junit.jupiter.api
3. replace Before by BeforeEach
4. replace After by AfterEach
5. remove ExternalResource from all scala modules
6. add explicit AfterClass/BeforeClass to stop/start EmbeddedKafkaCluster
Noted that this PR does not migrate stream module to junit 5 so it does not introduce callback of junit 5 to deal with beforeAll/afterAll. The next PR of migrating stream module can replace explicit beforeAll/afterAll by junit 5 extension. Or we can keep the beforeAll/afterAll if it make code more readable.
Reviewers: John Roesler <vvcephei@apache.org>
This PR was removed by accident in trunk and 2.8, bringing it back.
Co-authored-by: Matthias J. Sax <matthias@confluent.io>
Reviewers: Matthias J. Sax <matthias@confluent.io>
This PR implements adding read-only access to the key for KStream.join as described in KIP-149
This PR as it stands does not affect the Streams Scala API. Updating the Streams Scala API will be done in a follow-up PR.
Additionally, the original KIP did not include the KTable API, but I don't see any reason why we wouldn't want the same functionality there as well, this will be done in an additional follow-up PR after updating the existing KIP.
Reviewers: Matthias J. Sax <mjsax@apache.org>
Need to exclude threads in PENDING_SHUTDOWN from the num live threads computation used to compute the new cache size per thread. Also adds some logging to help follow what's happening when a thread is added/removed/replaced.
Reviewers: Bruno Cadonna <cadonna@confluent.io>, Walker Carlson <wcarlson@confluent.io>, John Roesler <john@confluent.io>
Emit on change introduced in Streams with KIP-557 might lead to
data loss if a record is put into a source KTable and emitted
downstream and then a failure happens before the offset could be
committed. After Streams rereads the record, it would find a record
with the same key, value and timestamp in the KTable (i.e. the same
record that was put into the KTable before the failure) and not
forward it downstreams. Hence, the record would never be processed
downstream of the KTable which breaks at-least-once and exactly-once
processing guarantees.
Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
For KIP-698, we need a way to setup internal topics without validating them. This PR adds a setup method to the InternalTopicManager for that purpose.
Reviewers: Rohan Desai <rohan@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
For KIP-698, we need a way to validate internal topics before we create them. This PR adds a validation method to the InternalTopicManager for that purpose.
Reviewers: Rohan Desai <rohan@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
The method StreamsBuilder#addGlobalStore was simplified via KIP-233 in 1.1.0 release. This PR removes the old and deprecated overload.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Always grab a new thread.id and verify that a thread has fully shut down to DEAD before removing from the `threads` list and making that id available again
Reviewers: Walker Carlson <wcarlson@confluent.io>, Bruno Cadonna <cadonna@confluent.io>
To implement the explicit user initialization of Kafka Streams as
described in KIP-698, we first need to extract the code for the
setup of the changelog topics from the Streams partition assignor
so that it can also be called outside of a rebalance.
Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Guozhang Wang <guozhang@confluent.io>
This PR aims to add unit test cases for RocksDBRangeIterator which were missing.
Reviewers: Bruno Cadonna <cadonna@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Implements KIP-695
Reverts a previous behavior change to Consumer.poll and replaces
it with a new Consumer.currentLag API, which returns the client's
currently cached lag.
Uses this new API to implement the desired task idling semantics
improvement from KIP-695.
Reverts fdcf8fbf72 / KAFKA-10866: Add metadata to ConsumerRecords (#9836)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Guozhang Wang <guozhang@apache.org>