When flush() is called, a copy of the incomplete batches is made. This
means that the full ProducerBatch(s) are held in memory until the flush
has completed, even though the `Sender` removes producer batches
from the original incomplete collection as soon as they're no longer
needed.
For batches backed by the existing memory pool this is not as wasteful,
since the memory is returned to the pool, but non-pool memory can only
be GC'd after the flush has completed. Rather than use copyAll, we can
build a new collection holding only the produceFuture(s) and await on
those, as sketched below.
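A minimal sketch of that idea, assuming producer-internal names (`incomplete`, `ProducerBatch`, `ProduceRequestResult`) as described above; this is an illustration, not the exact patch:

```java
import java.util.ArrayList;
import java.util.List;

// Sketch only: awaits the flush without pinning the batches in memory.
public void awaitFlushCompletion() throws InterruptedException {
    // Copy out only the lightweight futures. Once this loop finishes, no
    // reference to the ProducerBatch objects is retained here, so each
    // batch can be GC'd (or its pooled memory reused) as soon as the
    // Sender drops it from the incomplete collection.
    List<ProduceRequestResult> results = new ArrayList<>();
    for (ProducerBatch batch : this.incomplete.copyAll())
        results.add(batch.produceFuture);
    for (ProduceRequestResult result : results)
        result.await();
}
```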
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Ismael Juma <ismael@juma.me.uk>
The QuorumController should honor the timeout for RPC requests
that carry one. For electLeaders, attempt to trigger a leader
election for all partitions when the request specifies null for the topics
argument.
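For illustration, a hedged sketch of exercising this from the Admin client (`props` is a placeholder; timeout values are arbitrary):

```java
import java.util.concurrent.TimeUnit;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.ElectLeadersOptions;
import org.apache.kafka.common.ElectionType;

// Passing null for the partitions set asks the controller to attempt a
// leader election for every partition; the request-level timeout below
// is what the QuorumController should honor.
try (Admin admin = Admin.create(props)) {
    admin.electLeaders(
            ElectionType.PREFERRED,
            null, // null => all partitions
            new ElectLeadersOptions().timeoutMs(30_000))
        .partitions()
        .get(60, TimeUnit.SECONDS);
}
```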
Reviewers: David Arthur <mumrah@gmail.com>
(reverted #10405). #10405 has several issues, for example:
It fails to create a topic with 9000 partitions.
It flushes in several unnecessary places.
If multiple segments of the same partition are flushed at roughly the same time, we may end up doing several unnecessary flushes: the flush-handling logic in LogSegments.scala is convoluted.
Kafka does not call fsync() on the directory when a new log segment is created and flushed to disk.
The problem is that the following sequence of calls doesn't guarantee file durability:
fd = open("log", O_RDWR | O_CREAT); // suppose open creates "log"
write(fd);
fsync(fd);
If the system crashes after fsync() but before the parent directory has been flushed to disk, the log file can disappear.
This PR flushes the directory when flush() is called for the first time.
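A minimal sketch of flushing a directory from Java, along the lines of what the fix needs to do (the helper name is illustrative):

```java
import java.io.IOException;
import java.nio.channels.FileChannel;
import java.nio.file.Path;
import java.nio.file.StandardOpenOption;

// Open the directory itself read-only and force its metadata to disk so
// that the newly created segment's directory entry survives a crash.
static void flushDir(Path dir) throws IOException {
    try (FileChannel channel = FileChannel.open(dir, StandardOpenOption.READ)) {
        channel.force(true);
    }
}
```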
A performance test showed that this PR has minimal performance impact on Kafka clusters.
Reviewers: Jun Rao <junrao@gmail.com>
Currently KafkaStreams#cleanUp only throws an IllegalStateException if the state is RUNNING or REBALANCING. However, the application could be in the process of shutting down, in which case StreamThreads may still be running. We should also throw if the state is PENDING_ERROR or PENDING_SHUTDOWN.
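A minimal sketch of the broadened guard (`streams` is a placeholder; the State values are the real enum constants, but the check itself is illustrative):

```java
switch (streams.state()) {
    case RUNNING:
    case REBALANCING:
    case PENDING_SHUTDOWN: // StreamThreads may still be running
    case PENDING_ERROR:    // same: shutdown is still in progress
        throw new IllegalStateException("Cannot clean up while StreamThreads may be running");
    default:
        break; // CREATED, NOT_RUNNING, ERROR: safe to wipe local state
}
```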
Reviewers: Walker Carlson <wcarlson@confluent.io>, Guozhang Wang <guozhang@confluent.io>
Introduce a List serde, per KIP-466, for primitive types or for custom serdes supplied as a serializer and a deserializer.
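A hedged usage sketch of the new serde (topic name and element type are illustrative):

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;
import org.apache.kafka.common.serialization.Serde;
import org.apache.kafka.common.serialization.Serdes;

// Build a serde for List<Integer> from the concrete list class and the
// inner serde, then round-trip a value through it.
Serde<List<Integer>> listSerde = Serdes.ListSerde(ArrayList.class, Serdes.Integer());
byte[] bytes = listSerde.serializer().serialize("topic", Arrays.asList(1, 2, 3));
List<Integer> roundTrip = listSerde.deserializer().deserialize("topic", bytes);
```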
Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Matthias J. Sax <mjsax@confluent.io>, John Roesler <roesler@confluent.io>, Michael Noll <michael@confluent.io>
Added a server-common module for server-side common classes. Moved ApiMessageAndVersion, RecordSerde, AbstractApiMessageSerde, and BytesApiMessageSerde to the server-common module.
Reviewers: Kowshik Prakasam <kprakasam@confluent.io>, Jun Rao <junrao@gmail.com>
Consecutive UUID generation could result in the same prefix.
Reviewers: Josep Prat <josep.prat@aiven.io>, Anna Sophie Blee-Goldman <ableegoldman@apache.org>
* replace the deprecated Class.newInstance() with clazz.getDeclaredConstructor().newInstance() (see the sketch below)
* throw ReflectiveOperationException to cover all of the other reflective exceptions
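A minimal before/after sketch of the replacement (the `Plugin` class is hypothetical):

```java
// Before (deprecated since Java 9): propagates any checked exception the
// constructor throws and requires an accessible no-arg constructor.
Plugin oldWay = Plugin.class.newInstance();

// After: constructor failures arrive wrapped in InvocationTargetException,
// and all reflective failures share the ReflectiveOperationException
// supertype, so callers can declare just that one exception.
Plugin newWay = Plugin.class.getDeclaredConstructor().newInstance();
```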
Reviewers: Tom Bentley <tbentley@redhat.com>
* Changes the new Throughput Generators to track messages per window
instead of making per-second calculations, which can have rounding errors.
One of these generators also had a calculation error, which prompted this
change in the first place.
* Fixes a couple of typos.
* Fixes an error where certain JSON fields were not exposed, causing the
workloads to not behave as intended.
* Fixes a bug where wait was not called in a loop, so it could exit too quickly (see the sketch after this list).
* Adds additional constant payload generators.
* Fixes problems with an example spec.
* Fixes several off-by-one comparisons.
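A minimal sketch of the wait-in-a-loop fix mentioned above (`lock` and `done` are placeholders):

```java
// A bare wait() can return early on a spurious wakeup or a notify meant
// for another condition, so the predicate must be rechecked in a loop.
synchronized (lock) {
    while (!done) {
        lock.wait();
    }
}
```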
Reviewers: Colin P. McCabe <cmccabe@apache.org>
This PR upgrades RocksDB to 6.19.3. After the upgrade, the Gradle build exited with code 134 due to SIGABRT signals ("Pure virtual function called!") coming from the C++ part of RocksDB. This error was caused by RocksDB state stores not being properly closed in Streams' code. This PR adds the missing closings and updates the RocksDB option adapter.
Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Guozhang Wang <wangguoz@gmail.com>
The version of the Eclipse Jersey library brought in as a dependency,
2.31, has a known vulnerability, CVE-2021-28168 (https://github.com/advisories/GHSA-c43q-5hpj-4crv).
This replaces it with 2.34, which is fully compatible with
2.31 apart from the fixed bugs and vulnerabilities.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
We currently use Hamcrest imports to check the outputs of the RelationalSmokeTest, but with the new Gradle updates, the proper Hamcrest imports are no longer included in the test jar.
This is a bit of a workaround to remove the Hamcrest usage so we can get system tests up and running again. A potential follow-up could be to update the way we create the test jar so it pulls in the proper dependencies.
Reviewers: Bruno Cadonna <cadonna@apache.org>
Try to address the extreme flakiness of shouldInnerJoinMultiPartitionQueryable since the recent test cleanup. Since we need to wait for 3 streams to reach the RUNNING state, it makes sense to increase the wait time to make the test more reliable.
Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>
1. Make the code simpler and cleaner
2. After the PR, testLargeAssignmentAndGroupWithUniformSubscription (1 million partitions) runs in ~1400 ms instead of ~2600 ms, a ~46% improvement, almost 2x faster!
Reviewers: Anna Sophie Blee-Goldman <ableegoldman@apache.org>, Guozhang Wang <guozhang@confluent.io>
When users supply in-memory stores for left/outer joins, the internal shared outer store must be switched to an in-memory store too. This allows users who want to keep all stores in memory to continue doing so.
Added unit tests to validate that the topology and left/outer joins work fine with an in-memory shared store.
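For illustration, a hedged sketch of a left join configured with user-supplied in-memory stores (store names, window sizes, and the input streams are placeholders); with this change the internal shared outer store stays in memory as well:

```java
import java.time.Duration;
import org.apache.kafka.streams.kstream.JoinWindows;
import org.apache.kafka.streams.kstream.KStream;
import org.apache.kafka.streams.kstream.StreamJoined;
import org.apache.kafka.streams.state.Stores;
import org.apache.kafka.streams.state.WindowBytesStoreSupplier;

Duration window = Duration.ofMinutes(5);
// retainDuplicates must be true for join stores
WindowBytesStoreSupplier left = Stores.inMemoryWindowStore(
    "left-store", window.multipliedBy(2), window, true);
WindowBytesStoreSupplier right = Stores.inMemoryWindowStore(
    "right-store", window.multipliedBy(2), window, true);

KStream<String, String> joined = leftStream.leftJoin(
    rightStream,
    (l, r) -> l + "/" + r,
    JoinWindows.of(window),
    StreamJoined.with(left, right));
```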
Reviewers: Guozhang Wang <wangguoz@gmail.com>
KAFKA-12429: Added serdes for the default implementation of RLMM, which uses an internal topic as storage. This topic will receive events of RemoteLogSegmentMetadata, RemoteLogSegmentUpdate, and RemotePartitionDeleteMetadata. These events are serialized into the Kafka protocol message format.
Added tests for all the event types for that topic.
This is part of the tiered storage implementation, KIP-405.
Reviewers: Kowshik Prakasam <kprakasam@confluent.io>, Jun Rao <junrao@gmail.com>
The test cases for ThreadCache were missing unit tests for the all, reverseAll, and reverseRange methods. This PR adds them.
Reviewers: Bruno Cadonna <cadonna@apache.org>
Adds an internal flag that can be used to disable the fixes from KAFKA-10847. It defaults to true if the flag is not set or has an invalid boolean value.
The flag is named __enable.kstreams.outer.join.spurious.results.fix__ and is considered internal only. It is a temporary flag to help users disable the join fixes while they transition from the previous semantics of left/outer joins, and it may be removed in future releases.
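A hedged sketch of opting out during a migration; the flag is set as a raw property here, on the assumption that, being internal, it has no StreamsConfig constant:

```java
import java.util.Properties;
import org.apache.kafka.streams.StreamsConfig;

Properties props = new Properties();
props.put(StreamsConfig.APPLICATION_ID_CONFIG, "my-app");
props.put(StreamsConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092");
// Temporarily keep the previous left/outer join semantics; anything that
// is not a valid boolean leaves the fix enabled (the default).
props.put("__enable.kstreams.outer.join.spurious.results.fix__", "false");
```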
Reviewers: Guozhang Wang <wangguoz@gmail.com>
1. Remove duplicate serialization of auto-generated data in RequestConvertToJsonTest; this is inspired by #9964
2. Remove RequestTestUtils.serializeRequestWithHeader, since we added an AbstractRequest.serializeWithHeader in #10142
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Implement Raft Snapshot loading API.
1. Adds a new method `handleSnapshot` to `raft.Listener`, which is called whenever the `RaftClient` determines that the `Listener` needs to load a new snapshot before reading the log. This happens when the `Listener`'s next offset is less than the log start offset, also known as the earliest snapshot.
2. Adds a new type `SnapshotReader<T>`, which provides an `Iterator<Batch<T>>` interface and de-serializes records in the `RawSnapshotReader` into `T`s.
3. Adds a new type `RecordsIterator<T>` that implements an `Iterator<Batch<T>>` by scanning a `Records` object and deserializes the batches and records into `Batch<T>`. This type is used by both `SnapshotReader<T>` and `RecordsBatchReader<T>` internally to implement the `Iterator` interface that they expose.
4. Changes the `MockLog` implementation to read one or two batches at a time. The previous implementation always read from the given offset to the high-watermark. This made it impossible to test interesting snapshot loading scenarios.
5. Removed `throws IOException` from some methods. Some of the types were inconsistently throwing `IOException` in some cases and `RuntimeException(..., new IOException(...))` in others. This PR improves the consistency by wrapping `IOException` in `RuntimeException` in a few more places and replacing `Closeable` with `AutoCloseable`.
6. Updated the Kafka Raft simulation test to take snapshots into account. `ReplicatedCounter` was updated to generate a snapshot after 10 records get committed. This means that the `ConsistentCommittedData` validation was extended to take snapshots into account. Also added a new invariant to ensure that the log start offset is consistent with the earliest snapshot.
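For illustration, a hedged sketch of a `Listener` that rebuilds its state from a snapshot; the counter logic only loosely mirrors `ReplicatedCounter`, and the imports and generics are assumptions:

```java
import org.apache.kafka.raft.BatchReader;
import org.apache.kafka.raft.RaftClient;
import org.apache.kafka.snapshot.SnapshotReader;

class CounterListener implements RaftClient.Listener<Integer> {
    private int total = 0;

    @Override
    public void handleCommit(BatchReader<Integer> reader) {
        // Regular path: apply newly committed batches to in-memory state.
        try (reader) {
            while (reader.hasNext()) {
                for (Integer record : reader.next().records())
                    total += record;
            }
        }
    }

    @Override
    public void handleSnapshot(SnapshotReader<Integer> snapshot) {
        // Called when this listener's next offset precedes the log start
        // offset: discard in-memory state and reload from the snapshot.
        try (snapshot) {
            total = 0;
            while (snapshot.hasNext()) {
                for (Integer record : snapshot.next().records())
                    total += record;
            }
        }
    }
}
```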
Reviewers: dengziming <swzmdeng@163.com>, David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>