Part of KIP-572: follow up work to PR #9800. It's not save to retry a TX commit after a timeout, because it's unclear if the commit was successful or not, and thus on retry we might get an IllegalStateException. Instead, we will throw a TaskCorruptedException to retry the TX if the commit failed.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>
We should only recommend to increase the number of KafkaStreams instances, not the number of threads, since a standby task can never be placed on the same instance as an active task regardless of the thread count
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
Consolidate auto topic creation logic to either forward a CreateTopicRequest or handling the creation directly as AutoTopicCreationManager, when handling FindCoordinator/Metadata request.
Co-authored-by: Jason Gustafson <jason@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
Part of KIP-572: We move the offset reset for the internal "main consumer" when we revive a corrupted task, from the "task cleanup" code path, to the "task init" code path. For this case, we have already logic in place to handle TimeoutException that might be thrown by consumer#committed() method call.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>
Part of KIP-572: When a custom `StreamPartitioner` is used, we need to get the number of partitions of output topics from the producer. This `partitionFor(topic)` call may through a `TimeoutException` that we now handle gracefully.
Reviewers: John Roesler <john@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>
Implements KIP-418, that deprecated the `branch()` operator in favor of the newly added and type-safe `split()` operator.
Reviewers: Matthias J. Sax <matthias@confluent.io>, John Roesler <john@confluent.io>
To stabilize the task assignment across restarts of the JVM we need some way to persist the process-specific UUID. We can just write it to a file in the state directory, and initialize it from there or create a new one if no prior UUID exists.
Reviewers: Walker Carlson <wcarlson@confluent.io>, Leah Thomas <lthomas@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Add prefix scan support to State stores. Currently, only RocksDB and InMemory key value stores are being supported.
Reviewers: Bruno Cadonna <bruno@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Add timeout to remove thread, and trigger thread to explicitly leave the group even in case of static membership
Reviewers: Bruno Cadonna <bruno@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
Make the default state store directory location to follow
OS-specific temporary directory settings or java.io.tmpdir
JVM parameter, with Utils#getTempDir.
Reviewers: Matthias J. Sax <mjsax@apache.org>, John Roesler <vvcephei@apache.org>
As it's only API extension to match the java API with Named object with lots of duplication, I only tested the logic once.
Reviewers: Bill Bejeck <bbejeck@apache.org>
Implements KIP-696: Add new state PENDING_ERROR to KafkaStreams client.
Reviewers: Matthias J. Sax <matthias@confluent.io>, Bruno Cadonna <bruno@confluent.io>
Previously, StateDirectory used PosixFilePermissions to configure its directories' permissions which fails on Windows as its file system is not POSIX-compliant. This PR updates StateDirectory to fall back to the File API on non-POSIX-compliant file systems.
Reviewers: Luke Chen <43372967+showuon@users.noreply.github.com>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
We do not always own the thread that executes the close() method, i.e., we do not know the interruption policy of the thread. Thus, we should not swallow the interruption. The least we can do is restoring the interruption status before the current thread exits this method.
Reviewers: Walker Carlson <wcarlson@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
KIP-698: extract the code for the setup of the repartition topics from the Streams partition assignor so that it can also be called outside of a rebalance.
Reviewers: Leah Thomas <lthomas@confluent.io> , Guozhang Wang <guozhang@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
Remove all INFO-level logging from the main StreamThread loop in favor of a summary with a 2min interval
Reviewers: Walker Carlson <carlson@confluent.io>, Guozhang Wang <guozhang@confluent.io>
If a user doesn't have Persistent Stores, we won't create base dir and state dir and should not try to set permissions on them.
Reviewers: Bruno Cadonna <cadonna@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
Kafka Streams' TaskManager is a central class that grew quite big. This
PR breaks out a new 'task container' class to descope what TaskManager
does. In follow up PRs, we plan to move more methods from TaskManager
to the new 'Tasks.java' class and also improve task-type type safety.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>
Document the new properties-based metrics for RocksDB
Reviewers: Leah Thomas <lthomas@confluent.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
* replace `org.junit.Assert` by `org.junit.jupiter.api.Assertions`
* replace `org.junit` by `org.junit.jupiter.api`
* replace `org.junit.runners.Parameterized` by `org.junit.jupiter.params.ParameterizedTest`
* replace `org.junit.runners.Parameterized.Parameters` by `org.junit.jupiter.params.provider.{Arguments, MethodSource}`
* replace `Before` by `BeforeEach`
* replace `After` by `AfterEach`
Reviewers: Ismael Juma <ismael@juma.me.uk>
Avoid spamming the logs at the INFO level in a tight loop when there are no new records being polled
Reviewers: Walker Carlson <wcarlson@confluent.io>, Leah Thomas <lthomas@confluent.io>
This PR includes following changes.
1. replace org.junit.Assert by org.junit.jupiter.api.Assertions
2. replace org.junit by org.junit.jupiter.api
3. replace Before by BeforeEach
4. replace After by AfterEach
Reviewers: Ismael Juma <ismael@confluent.io>
This PR includes following changes.
1. @Test(expected = Exception.class) is replaced by assertThrows
2. remove reference to org.scalatest.Assertions
3. change the magic code from 1 to 2 for testAppendAtInvalidOffset to test ZSTD
4. rename testMaybeAddPartitionToTransactionXXXX to testNotReadyForSendXXX
5. increase maxBlockMs from 1s to 3s to avoid unexpected timeout from TransactionsTest#testTimeout
Reviewers: Ismael Juma <ismael@confluent.io>
If EOS is enabled and the TX commit fails with a timeout,
we should not process more messages (what is ok for non-EOS)
because we don't really know the status of the TX.
If the commit was indeed successful, we won't have an open TX
can calling send() would fail with an fatal error.
Instead, we should retry the (idempotent) commit of the TX,
and start a new TX afterwards.
Reviewers: Boyang Chen <boyang@confluent.io>, John Roesler <john@confluent.io>