Includes:
- New API to authorize by resource type
- Default implementation for the method that supports super users and ACLs
- Optimized implementation in AclAuthorizer that supports ACLs, super users and allow.everyone.if.no.acl.found
- Benchmarks and tests
- `InitProducerIdRequest` is authorized for Cluster:IdempotentWrite or WRITE to any topic; `ProduceRequest` is authorized only against the topic, even when idempotent (see the sketch below)
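For illustration, a hedged sketch of how a broker-side check might use the new entry point; the helper class and method here are hypothetical, not the actual KafkaApis code:
```java
import org.apache.kafka.common.acl.AclOperation;
import org.apache.kafka.common.resource.ResourceType;
import org.apache.kafka.server.authorizer.AuthorizableRequestContext;
import org.apache.kafka.server.authorizer.AuthorizationResult;
import org.apache.kafka.server.authorizer.Authorizer;

final class IdempotentWriteAuthSketch {
    // InitProducerId may proceed if the principal has IDEMPOTENT_WRITE on the
    // cluster, or WRITE on any topic; one call per resource type instead of
    // enumerating every topic ACL.
    static boolean canInitProducerId(Authorizer authorizer, AuthorizableRequestContext ctx) {
        return authorizer.authorizeByResourceType(ctx, AclOperation.IDEMPOTENT_WRITE, ResourceType.CLUSTER)
                    == AuthorizationResult.ALLOWED
            || authorizer.authorizeByResourceType(ctx, AclOperation.WRITE, ResourceType.TOPIC)
                    == AuthorizationResult.ALLOWED;
    }
}
```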
Reviewers: Lucas Bradstreet <lucas@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
This patch follows up https://github.com/apache/kafka/pull/9547. It refactors AbstractFetcherThread and its descendants to use `OffsetForLeaderEpochRequestData.OffsetForLeaderPartition` instead of `OffsetsForLeaderEpochRequest.PartitionData`. The patch relies on existing tests.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jason Gustafson <jason@confluent.io>
* The naming for `ListOffsets` was inconsistent: some places used `ListOffset` and others used
`ListOffsets`. Standardized on the latter since it was used in metrics and the protocol documentation.
* Removed unused methods in ApiKeys.
* Deleted `CommonFields`.
* Added `lowestSupportedVersion` and `highestSupportedVersion` to `ApiMessageType`.
* Removed tests in `MessageTest` that are no longer relevant.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Print out the feature flags received at DEBUG level, as well as the other version information.
Example log line:
[2020-11-03 17:47:17,076] DEBUG Node 0 has finalizedFeaturesEpoch: 42, finalizedFeatures: [FinalizedFeatureKey(name='feature_1', maxVersionLevel=2, minVersionLevel=1), FinalizedFeatureKey(name='feature_2', maxVersionLevel=4, minVersionLevel=3)], supportedFeatures: [SupportedFeatureKey(name='feature_1', minVersion=1, maxVersion=2), SupportedFeatureKey(name='feature_2', minVersion=3, maxVersion=4)] (org.apache.kafka.clients.NetworkClient:926)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This PR introduces a new interface `TransferableChannel` to replace GatheringByteChannel and avoid casting in the write path. `TransferableChannel` extends GatheringByteChannel with the minimal set of methods required by the Send interface. Supporting TLS and efficient zero-copy transfers are the main reasons for the additional methods.
Co-authored-by: Ismael Juma <ismael@juma.me.uk>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Missed this in #9729. The substitution in `markCoordinatorUnknown` does not work because the argument is not provided as a parameter.
Reviewers: Ismael Juma <ismael@juma.me.uk>
When a consumer encounters an issue that causes it to mark the coordinator as unknown, the error message it prints gives little context about what triggered the transition. This change includes the response error that triggered the transition, or any other cause if it was not triggered by an error code in a response.
Reviewers: Jason Gustafson <jason@confluent.io>
This patch updates the request logger to output request and response payloads in JSON. Payloads are converted to JSON based on their auto-generated schema.
Reviewers: Lucas Bradstreet <lucas@confluent.io>, David Mao <dmao@confluent.io>, David Jacot <djacot@confluent.io>
Set it as a cluster action and update the handler in KafkaApis. We keep the `throttleTimeMs` field
since we intend to enable throttling in the future (especially relevant when we switch to the
built-in quorum mode).
Reviewers: David Arthur <mumrah@gmail.com>
This PR adds support for IP entities to the `DescribeClientQuotas` and `AlterClientQuotas` APIs. This PR also adds support for describing/altering IP quotas via `kafka-configs` tooling.
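For example, a hedged sketch of altering an IP connection-creation-rate quota through the Admin API; the quota key name is an assumption based on KIP-612:
```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.common.quota.ClientQuotaAlteration;
import org.apache.kafka.common.quota.ClientQuotaEntity;
import org.apache.kafka.common.quota.ClientQuotaFilter;

public class IpQuotaSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // Quota entity keyed by IP address rather than user/client-id.
            ClientQuotaEntity entity = new ClientQuotaEntity(
                Collections.singletonMap(ClientQuotaEntity.IP, "10.0.0.1"));
            // "connection_creation_rate" is assumed here as the IP quota key.
            ClientQuotaAlteration alteration = new ClientQuotaAlteration(
                entity,
                Collections.singleton(new ClientQuotaAlteration.Op("connection_creation_rate", 10.0)));
            admin.alterClientQuotas(Collections.singleton(alteration)).all().get();
            System.out.println(admin.describeClientQuotas(ClientQuotaFilter.all()).entities().get());
        }
    }
}
```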
Reviewers: Brian Byrne <bbyrne@confluent.io>, Anna Povzner <anna@confluent.io>, David Jacot <djacot@confluent.io>
As suggested, ensure InvalidProducerEpoch gets caught properly on the Streams side.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>
Connection id is now only present in `NetworkSend`, which is now
the class used by `Selector`/`NetworkClient`/`KafkaChannel` (which
works well since `NetworkReceive` is the class used for
received data).
The previous `NetworkSend` was also responsible for adding a size
prefix. This logic is already present in `SendBuilder`, but for the
minority of cases where `SendBuilder` is not used (including
a number of tests), we now have `ByteBufferSend.sizePrefixed()`.
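For context, the framing that `sizePrefixed()` provides amounts to the following (a self-contained sketch of the wire format, not the actual implementation):
```java
import java.nio.ByteBuffer;

final class SizePrefixSketch {
    // Returns the buffers a gathering write would transmit: a 4-byte
    // big-endian length header followed by the payload itself.
    static ByteBuffer[] sizePrefixed(ByteBuffer payload) {
        ByteBuffer size = ByteBuffer.allocate(4);
        size.putInt(payload.remaining());
        size.flip();
        return new ByteBuffer[] { size, payload };
    }
}
```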
With regard to the request/message utilities:
* Renamed `toByteBuffer`/`toBytes` in `MessageUtil` to
`toVersionPrefixedByteBuffer`/`toVersionPrefixedBytes` for clarity.
* Introduced new `MessageUtil.toByteBuffer` that does not include
the version as the prefix.
* Renamed `serializeBody` in `AbstractRequest/Response` to
`serialize` for symmetry with `parse`.
* Introduced `RequestTestUtils` and moved relevant methods from
`TestUtils`.
* Moved `serializeWithHeader` methods that were only used in
tests to `RequestTestUtils`.
* Deleted `MessageTestUtil`.
Finally, a couple of changes to simplify coding patterns:
* Added `flip()` and `buffer()` to `ByteBufferAccessor`.
* Added `MessageSizeAccumulator.sizeExcludingZeroCopy`.
* Used lambdas instead of `TestCondition`.
* Used `Arrays.copyOf` instead of `System.arraycopy` in `MessageUtil`.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jason Gustafson <jason@confluent.io>
This reverts commit 8a59a22881 since it breaks
client configurations like `bootstrap.servers=SASL_PLAINTEXT://localhost:49767`.
A KIP will be submitted to discuss the details and an adjusted change will
be submitted depending on the outcome of that.
This patch ensures that the leader is included among the voters in the `LeaderChangeMessage`. It also adds an additional field for the set of granting voters, which was originally specified in KIP-595.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>
Add a total timeout for forwarding, including the underlying broker-to-controller channel timeout setting.
Reviewers: David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>
Generated request/response classes have code to serialize/deserialize directly to
`ByteBuffer` so the intermediate conversion to `Struct` can be skipped for them.
We have recently completed the transition to generated request/response classes,
so we can also remove the `Struct` based fallbacks.
Additional noteworthy changes:
* `AbstractRequest.parseRequest` has a more efficient computation of request size that
relies on the received buffer instead of the parsed `Struct`.
* Use `SendBuilder` for `AbstractRequest/Response` `toSend`, made the superclass
implementation final and removed the overrides that are no longer necessary.
* Removed request/response constructors that assume latest version as they are unsafe
outside of tests.
* Removed redundant version fields in requests/responses.
* Removed unnecessary work in `OffsetFetchResponse`'s constructor when version >= 2.
* Made `AbstractResponse.throttleTimeMs()` abstract.
* Using `toSend` in `SaslClientAuthenticator` instead of `serialize`.
* Various changes in Request/Response classes to make them more consistent and to
rely on the Data classes as much as possible when it comes to their state.
* Remove the version argument from `AbstractResponse.toString`.
* Fix `getErrorResponse` for `ProduceRequest` and `DescribeClientQuotasRequest` to
use `ApiError` which processes the error message sent back to the clients. This was
uncovered by an accidental fix to a `RequestResponseTest` test (it was calling
`AbstractResponse.toString` instead of `AbstractResponse.toString(short)`).
Rely on existing protocol tests to ensure this refactoring does not change
observed behavior (aside from improved performance).
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This PR adds support for generating snapshot for KIP-630.
1. Adds the interfaces `RawSnapshotWriter` and `RawSnapshotReader` and the implementations `FileRawSnapshotWriter` and `FileRawSnapshotReader`, respectively. These interfaces and implementations are a low-level API for writing and reading snapshots. They are internal to the Raft implementation and are not exposed to the users of `RaftClient`. They operate at the `Record` level. These types are exposed to the `RaftClient` through the `ReplicatedLog` interface.
2. Adds a buffered snapshot writer: `SnapshotWriter<T>`. This is a higher-level type exposed through the `RaftClient` interface. A future PR will add the related `SnapshotReader<T>`, which will be used by the state machine to load a snapshot.
Reviewers: Jason Gustafson <jason@confluent.io>
For MemberIdRequiredException, we no longer print the exception at INFO with the full exception message, since it may introduce more confusion than clarity.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Boyang Chen <boyang@confluent.io>
This patch follows up https://github.com/apache/kafka/pull/9547. It refactors KafkaApis, ReplicaManager and Partition to use `OffsetForLeaderEpochResponseData.EpochEndOffset` instead of `EpochEndOffset`. It also removes `OffsetsForLeaderEpochRequest#epochsByTopicPartition` and `OffsetsForLeaderEpochResponse#responses` and replaces their usages with direct use of the automated protocol. Finally, it removes old constructors in `OffsetsForLeaderEpochResponse`. The patch relies on existing tests.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jason Gustafson <jason@confluent.io>
As a follow-up from [KIP-482](https://cwiki.apache.org/confluence/display/KAFKA/KIP-482%3A+The+Kafka+Protocol+should+Support+Optional+Tagged+Fields), this PR bumps the version for several
RPCs to enable tagged fields via the flexible versioning mechanism.
Additionally, a new IBP version `KAFKA_2_8_IV0` is introduced to
allow replication to take advantage of these new RPC versions for
OffsetForLeaderEpoch and ListOffset.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
We use a background thread for Kerberos to perform re-login before tickets expire. The thread performs logout() followed by login(), relying on the Java library to clear and then populate credentials in Subject. This leaves a timing window where clients fail to authenticate because credentials are not available. We cannot introduce any form of locking since authentication is performed on the network thread. So this commit treats NO_CRED as a transient failure rather than a fatal authentication exception in clients.
Reviewers: Ron Dagostino <rdagostino@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
This patch factors out some common parsing logic from `NetworkClient.parseResponse` and `AbstractResponse.parseResponse`. As a result of this refactor, we are now verifying the correlationId in forwarded requests. This patch also adds a test case to verify the handling of this case.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Boyang Chen <boyang@confluent.io>
This patch changes the grouping of `Send` objects created by `SendBuilder` in order to reduce the number of generated `Send` objects and thereby the number of system writes.
Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
This PR migrates the OffsetsForLeaderEpoch request/response to the automated protocol. It also refactors the OffsetsForLeaderEpochClient to use directly the internal structs generated by the automated protocol. It relies on the existing tests.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jason Gustafson <jason@confluent.io>
As decided in KIP-516, the UUID class should be named Uuid. Change all instances of
org.apache.kafka.common.UUID to org.apache.kafka.common.Uuid.
Also modify Uuid so that it stores two `long` fields instead of wrapping java.util.UUID
to reduce memory usage.
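A hedged sketch of the new layout; the real class also has factory methods, a base64 string form, and so on:
```java
public class Uuid {
    // Two primitive longs instead of a reference to java.util.UUID: this saves
    // an object header and a pointer per id, which matters at partition scale.
    private final long mostSignificantBits;
    private final long leastSignificantBits;

    public Uuid(long mostSigBits, long leastSigBits) {
        this.mostSignificantBits = mostSigBits;
        this.leastSignificantBits = leastSigBits;
    }

    @Override
    public boolean equals(Object obj) {
        if (this == obj) return true;
        if (obj == null || getClass() != obj.getClass()) return false;
        Uuid other = (Uuid) obj;
        return mostSignificantBits == other.mostSignificantBits
            && leastSignificantBits == other.leastSignificantBits;
    }

    @Override
    public int hashCode() {
        long xor = mostSignificantBits ^ leastSignificantBits;
        return (int) (xor >> 32) ^ (int) xor;
    }
}
```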
Reviewers: Ismael Juma <ismael@juma.me.uk>
Ensures INVALID_PRODUCER_EPOCH is recognizable on the client side, and ensures that ProduceResponse always uses the old error code, INVALID_PRODUCER_EPOCH.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
This patch creates a new `SendBuilder` class which allows us to avoid copying "zero copy" types when transmitting an api message over the network. This generalizes the pattern that was previously used only for `FetchResponse`. Initially we only apply this optimization to the `Envelope` types and `FetchResponse`, but in the future, it can be the default implementation for `toSend`.
The patch also contains a few minor cleanups such as moving envelope parsing logic into `RequestContext`.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Zstd-jni 1.4.5-6 allocates large internal buffers inside of ZstdInputStream and ZstdOutputStream. This caused a lot of allocation and GC activity when creating and closing the streams. It also does not buffer the reads or writes, which is inefficient when DefaultRecord.writeTo() does a series of small single-byte reads using various ByteUtils methods. The JNI layer is more efficient if writes of uncompressed data are flushed in large pieces rather than byte by byte, due to the expense of context switching between the Java code and the native code; the same applies to reads. Per luben/zstd-jni#141, the maintainer of zstd-jni and I agreed not to buffer reads and writes in favor of having the caller do that, so here we are updating the caller.
In this patch, I upgraded to the most recent zstd-jni version with the buffer reuse built in. This was done in luben/zstd-jni#143 and luben/zstd-jni#146. Since we decided not to add additional buffering of input/output within zstd-jni, I added BufferedInputStream and BufferedOutputStream to CompressionType.ZSTD, just like we currently do for CompressionType.GZIP, which is also inefficient for single-byte reads and writes. I used the same buffer sizes as that existing implementation.
NOTE: if so desired, we could pass a wrapped BufferSupplier into the Zstd*Stream classes to have Kafka decide how the buffer recycling occurs. This functionality was added in the latter PR linked above. I am holding off on this since, based on JMH benchmarking, the performance gains were not clear, and I don't know if it is worth the complexity of trying to hack around the reflection at this point in time. zstd-jni uses a default recycler very similar to snappy's, which seems to provide decent efficiency. While this PR fixes the defect, I feel that using BufferSupplier in both zstd-jni and snappy is outside the scope of this bugfix and should be considered a separate improvement. I would prefer this change be merged on its own since the performance gains here are very significant relative to the more incremental and minor optimizations which could be achieved by doing that separate work.
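A minimal sketch of the buffering approach, assuming the zstd-jni stream classes and illustrative buffer sizes (the actual wiring lives in CompressionType.ZSTD):
```java
import java.io.BufferedInputStream;
import java.io.BufferedOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;
import com.github.luben.zstd.ZstdInputStream;
import com.github.luben.zstd.ZstdOutputStream;

final class ZstdBufferingSketch {
    private static final int BUFFER_SIZE = 16 * 1024; // illustrative; mirrors the GZIP path

    // Batches the single-byte reads issued by DefaultRecord before crossing the JNI boundary.
    static InputStream wrapForDecompression(InputStream raw) throws IOException {
        return new BufferedInputStream(new ZstdInputStream(raw), BUFFER_SIZE);
    }

    // Flushes uncompressed data to the native encoder in large pieces.
    static OutputStream wrapForCompression(OutputStream raw) throws IOException {
        return new BufferedOutputStream(new ZstdOutputStream(raw), BUFFER_SIZE);
    }
}
```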
There are some noticeable improvements in the JMH benchmarks (excerpt):
BEFORE:
Benchmark (bufferSupplierStr) (bytes) (compressionType) (maxBatchSize) (messageSize) (messageVersion) Mode Cnt Score Error Units
CompressedRecordBatchValidationBenchmark.measureValidateMessagesAndAssignOffsetsCompressed CREATE RANDOM ZSTD 200 1000 2 thrpt 15 27743.260 ± 673.869 ops/s
CompressedRecordBatchValidationBenchmark.measureValidateMessagesAndAssignOffsetsCompressed:·gc.alloc.rate CREATE RANDOM ZSTD 200 1000 2 thrpt 15 3399.966 ± 82.608 MB/sec
CompressedRecordBatchValidationBenchmark.measureValidateMessagesAndAssignOffsetsCompressed:·gc.alloc.rate.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 134968.010 ± 0.012 B/op
CompressedRecordBatchValidationBenchmark.measureValidateMessagesAndAssignOffsetsCompressed:·gc.churn.G1_Eden_Space CREATE RANDOM ZSTD 200 1000 2 thrpt 15 3850.985 ± 84.476 MB/sec
CompressedRecordBatchValidationBenchmark.measureValidateMessagesAndAssignOffsetsCompressed:·gc.churn.G1_Eden_Space.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 152881.128 ± 942.189 B/op
CompressedRecordBatchValidationBenchmark.measureValidateMessagesAndAssignOffsetsCompressed:·gc.churn.G1_Survivor_Space CREATE RANDOM ZSTD 200 1000 2 thrpt 15 174.241 ± 3.486 MB/sec
CompressedRecordBatchValidationBenchmark.measureValidateMessagesAndAssignOffsetsCompressed:·gc.churn.G1_Survivor_Space.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 6917.758 ± 82.522 B/op
CompressedRecordBatchValidationBenchmark.measureValidateMessagesAndAssignOffsetsCompressed:·gc.count CREATE RANDOM ZSTD 200 1000 2 thrpt 15 1689.000 counts
CompressedRecordBatchValidationBenchmark.measureValidateMessagesAndAssignOffsetsCompressed:·gc.time CREATE RANDOM ZSTD 200 1000 2 thrpt 15 82621.000 ms
JMH benchmarks done
Benchmark (bufferSupplierStr) (bytes) (compressionType) (maxBatchSize) (messageSize) (messageVersion) Mode Cnt Score Error Units
RecordBatchIterationBenchmark.measureIteratorForBatchWithSingleMessage CREATE RANDOM ZSTD 200 1000 2 thrpt 15 24095.711 ± 895.866 ops/s
RecordBatchIterationBenchmark.measureIteratorForBatchWithSingleMessage:·gc.alloc.rate CREATE RANDOM ZSTD 200 1000 2 thrpt 15 2932.289 ± 109.465 MB/sec
RecordBatchIterationBenchmark.measureIteratorForBatchWithSingleMessage:·gc.alloc.rate.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 134032.012 ± 0.013 B/op
RecordBatchIterationBenchmark.measureIteratorForBatchWithSingleMessage:·gc.churn.G1_Eden_Space CREATE RANDOM ZSTD 200 1000 2 thrpt 15 3282.912 ± 115.042 MB/sec
RecordBatchIterationBenchmark.measureIteratorForBatchWithSingleMessage:·gc.churn.G1_Eden_Space.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 150073.914 ± 1342.235 B/op
RecordBatchIterationBenchmark.measureIteratorForBatchWithSingleMessage:·gc.churn.G1_Survivor_Space CREATE RANDOM ZSTD 200 1000 2 thrpt 15 149.697 ± 5.786 MB/sec
RecordBatchIterationBenchmark.measureIteratorForBatchWithSingleMessage:·gc.churn.G1_Survivor_Space.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 6842.462 ± 64.515 B/op
RecordBatchIterationBenchmark.measureIteratorForBatchWithSingleMessage:·gc.count CREATE RANDOM ZSTD 200 1000 2 thrpt 15 1449.000 counts
RecordBatchIterationBenchmark.measureIteratorForBatchWithSingleMessage:·gc.time CREATE RANDOM ZSTD 200 1000 2 thrpt 15 82518.000 ms
RecordBatchIterationBenchmark.measureSkipIteratorForVariableBatchSize CREATE RANDOM ZSTD 200 1000 2 thrpt 15 1449.060 ± 230.498 ops/s
RecordBatchIterationBenchmark.measureSkipIteratorForVariableBatchSize:·gc.alloc.rate CREATE RANDOM ZSTD 200 1000 2 thrpt 15 198.051 ± 31.532 MB/sec
RecordBatchIterationBenchmark.measureSkipIteratorForVariableBatchSize:·gc.alloc.rate.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 150502.519 ± 0.186 B/op
RecordBatchIterationBenchmark.measureSkipIteratorForVariableBatchSize:·gc.churn.G1_Eden_Space CREATE RANDOM ZSTD 200 1000 2 thrpt 15 200.064 ± 31.879 MB/sec
RecordBatchIterationBenchmark.measureSkipIteratorForVariableBatchSize:·gc.churn.G1_Eden_Space.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 152569.341 ± 13826.686 B/op
RecordBatchIterationBenchmark.measureSkipIteratorForVariableBatchSize:·gc.count CREATE RANDOM ZSTD 200 1000 2 thrpt 15 91.000 counts
RecordBatchIterationBenchmark.measureSkipIteratorForVariableBatchSize:·gc.time CREATE RANDOM ZSTD 200 1000 2 thrpt 15 75869.000 ms
RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize CREATE RANDOM ZSTD 200 1000 2 thrpt 15 2609.660 ± 1145.160 ops/s
RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize:·gc.alloc.rate CREATE RANDOM ZSTD 200 1000 2 thrpt 15 815.441 ± 357.818 MB/sec
RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize:·gc.alloc.rate.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 344309.097 ± 0.238 B/op
RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize:·gc.churn.G1_Eden_Space CREATE RANDOM ZSTD 200 1000 2 thrpt 15 808.952 ± 354.975 MB/sec
RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize:·gc.churn.G1_Eden_Space.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 345712.061 ± 51434.034 B/op
RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize:·gc.churn.G1_Old_Gen CREATE RANDOM ZSTD 200 1000 2 thrpt 15 0.019 ± 0.042 MB/sec
RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize:·gc.churn.G1_Old_Gen.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 18.615 ± 42.045 B/op
RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize:·gc.churn.G1_Survivor_Space CREATE RANDOM ZSTD 200 1000 2 thrpt 15 24.132 ± 12.254 MB/sec
RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize:·gc.churn.G1_Survivor_Space.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 13540.960 ± 14649.192 B/op
RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize:·gc.count CREATE RANDOM ZSTD 200 1000 2 thrpt 15 148.000 counts
RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize:·gc.time CREATE RANDOM ZSTD 200 1000 2 thrpt 15 23848.000 ms
JMH benchmarks done
AFTER:
Benchmark (bufferSupplierStr) (bytes) (compressionType) (maxBatchSize) (messageSize) (messageVersion) Mode Cnt Score Error Units
CompressedRecordBatchValidationBenchmark.measureValidateMessagesAndAssignOffsetsCompressed CREATE RANDOM ZSTD 200 1000 2 thrpt 15 147792.454 ± 2721.318 ops/s
CompressedRecordBatchValidationBenchmark.measureValidateMessagesAndAssignOffsetsCompressed:·gc.alloc.rate CREATE RANDOM ZSTD 200 1000 2 thrpt 15 2708.481 ± 50.012 MB/sec
CompressedRecordBatchValidationBenchmark.measureValidateMessagesAndAssignOffsetsCompressed:·gc.alloc.rate.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 20184.002 ± 0.002 B/op
CompressedRecordBatchValidationBenchmark.measureValidateMessagesAndAssignOffsetsCompressed:·gc.churn.G1_Eden_Space CREATE RANDOM ZSTD 200 1000 2 thrpt 15 2732.667 ± 59.258 MB/sec
CompressedRecordBatchValidationBenchmark.measureValidateMessagesAndAssignOffsetsCompressed:·gc.churn.G1_Eden_Space.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 20363.460 ± 120.585 B/op
CompressedRecordBatchValidationBenchmark.measureValidateMessagesAndAssignOffsetsCompressed:·gc.churn.G1_Old_Gen CREATE RANDOM ZSTD 200 1000 2 thrpt 15 0.042 ± 0.033 MB/sec
CompressedRecordBatchValidationBenchmark.measureValidateMessagesAndAssignOffsetsCompressed:·gc.churn.G1_Old_Gen.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 0.316 ± 0.249 B/op
CompressedRecordBatchValidationBenchmark.measureValidateMessagesAndAssignOffsetsCompressed:·gc.count CREATE RANDOM ZSTD 200 1000 2 thrpt 15 833.000 counts
CompressedRecordBatchValidationBenchmark.measureValidateMessagesAndAssignOffsetsCompressed:·gc.time CREATE RANDOM ZSTD 200 1000 2 thrpt 15 8390.000 ms
JMH benchmarks done
Benchmark (bufferSupplierStr) (bytes) (compressionType) (maxBatchSize) (messageSize) (messageVersion) Mode Cnt Score Error Units
RecordBatchIterationBenchmark.measureIteratorForBatchWithSingleMessage CREATE RANDOM ZSTD 200 1000 2 thrpt 15 166786.092 ± 3285.702 ops/s
RecordBatchIterationBenchmark.measureIteratorForBatchWithSingleMessage:·gc.alloc.rate CREATE RANDOM ZSTD 200 1000 2 thrpt 15 2926.914 ± 57.464 MB/sec
RecordBatchIterationBenchmark.measureIteratorForBatchWithSingleMessage:·gc.alloc.rate.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 19328.002 ± 0.002 B/op
RecordBatchIterationBenchmark.measureIteratorForBatchWithSingleMessage:·gc.churn.G1_Eden_Space CREATE RANDOM ZSTD 200 1000 2 thrpt 15 2938.541 ± 66.850 MB/sec
RecordBatchIterationBenchmark.measureIteratorForBatchWithSingleMessage:·gc.churn.G1_Eden_Space.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 19404.357 ± 177.485 B/op
RecordBatchIterationBenchmark.measureIteratorForBatchWithSingleMessage:·gc.churn.G1_Old_Gen CREATE RANDOM ZSTD 200 1000 2 thrpt 15 0.516 ± 0.100 MB/sec
RecordBatchIterationBenchmark.measureIteratorForBatchWithSingleMessage:·gc.churn.G1_Old_Gen.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 3.409 ± 0.657 B/op
RecordBatchIterationBenchmark.measureIteratorForBatchWithSingleMessage:·gc.churn.G1_Survivor_Space CREATE RANDOM ZSTD 200 1000 2 thrpt 15 0.032 ± 0.131 MB/sec
RecordBatchIterationBenchmark.measureIteratorForBatchWithSingleMessage:·gc.churn.G1_Survivor_Space.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 0.207 ± 0.858 B/op
RecordBatchIterationBenchmark.measureIteratorForBatchWithSingleMessage:·gc.count CREATE RANDOM ZSTD 200 1000 2 thrpt 15 834.000 counts
RecordBatchIterationBenchmark.measureIteratorForBatchWithSingleMessage:·gc.time CREATE RANDOM ZSTD 200 1000 2 thrpt 15 9370.000 ms
RecordBatchIterationBenchmark.measureSkipIteratorForVariableBatchSize CREATE RANDOM ZSTD 200 1000 2 thrpt 15 15988.116 ± 137.427 ops/s
RecordBatchIterationBenchmark.measureSkipIteratorForVariableBatchSize:·gc.alloc.rate CREATE RANDOM ZSTD 200 1000 2 thrpt 15 448.636 ± 3.851 MB/sec
RecordBatchIterationBenchmark.measureSkipIteratorForVariableBatchSize:·gc.alloc.rate.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 30907.698 ± 0.020 B/op
RecordBatchIterationBenchmark.measureSkipIteratorForVariableBatchSize:·gc.churn.G1_Eden_Space CREATE RANDOM ZSTD 200 1000 2 thrpt 15 450.905 ± 5.587 MB/sec
RecordBatchIterationBenchmark.measureSkipIteratorForVariableBatchSize:·gc.churn.G1_Eden_Space.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 31064.113 ± 291.190 B/op
RecordBatchIterationBenchmark.measureSkipIteratorForVariableBatchSize:·gc.churn.G1_Old_Gen CREATE RANDOM ZSTD 200 1000 2 thrpt 15 0.043 ± 0.007 MB/sec
RecordBatchIterationBenchmark.measureSkipIteratorForVariableBatchSize:·gc.churn.G1_Old_Gen.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 2.931 ± 0.493 B/op
RecordBatchIterationBenchmark.measureSkipIteratorForVariableBatchSize:·gc.count CREATE RANDOM ZSTD 200 1000 2 thrpt 15 790.000 counts
RecordBatchIterationBenchmark.measureSkipIteratorForVariableBatchSize:·gc.time CREATE RANDOM ZSTD 200 1000 2 thrpt 15 999.000 ms
RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize CREATE RANDOM ZSTD 200 1000 2 thrpt 15 11345.169 ± 206.528 ops/s
RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize:·gc.alloc.rate CREATE RANDOM ZSTD 200 1000 2 thrpt 15 2314.800 ± 42.094 MB/sec
RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize:·gc.alloc.rate.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 224714.266 ± 0.028 B/op
RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize:·gc.churn.G1_Eden_Space CREATE RANDOM ZSTD 200 1000 2 thrpt 15 2320.213 ± 45.521 MB/sec
RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize:·gc.churn.G1_Eden_Space.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 225235.965 ± 803.309 B/op
RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize:·gc.churn.G1_Old_Gen CREATE RANDOM ZSTD 200 1000 2 thrpt 15 0.026 ± 0.005 MB/sec
RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize:·gc.churn.G1_Old_Gen.norm CREATE RANDOM ZSTD 200 1000 2 thrpt 15 2.551 ± 0.455 B/op
RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize:·gc.count CREATE RANDOM ZSTD 200 1000 2 thrpt 15 994.000 counts
RecordBatchIterationBenchmark.measureStreamingIteratorForVariableBatchSize:·gc.time CREATE RANDOM ZSTD 200 1000 2 thrpt 15 1189.000 ms
JMH benchmarks done
Reviewers: Ismael Juma <ismael@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
This PR follows up 0814e4f to migrate the remaining RPCs which need forwarding, including:
CreateAcls/DeleteAcls/CreateDelegationToken/RenewDelegationToken/ExpireDelegationToken/AlterPartitionReassignment/CreatePartition/DeleteTopics/UpdateFeatures/Scram
Reviewers: David Arthur <mumrah@gmail.com>
When initializing the raft state machine after shutting down as a leader, we were previously entering the "unattached" state, which means we have no leader and no voted candidate. This was a bug because it allowed a reinitialized leader to cast a vote for a candidate in the same epoch that it was already the leader of. This patch fixes the problem by introducing a new "resigned" state which allows us to retain the leader state so that we cannot change our vote and we will not accept additional appends.
This patch also revamps the shutdown logic to make use of the new "resigned" state. Previously we had a separate path in `KafkaRaftClient.poll` for the shutdown logic which resulted in some duplication. Instead now we incorporate shutdown behavior into each state's respective logic.
Finally, this patch changes the shutdown logic so that `EndQuorumEpoch` is only sent by resigning leaders. Previously we allowed this request to be sent by candidates as well.
Reviewers: dengziming <dengziming1993@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
This PR adds support for forwarding of the following RPCs:
AlterConfigs
IncrementalAlterConfigs
AlterClientQuotas
CreateTopics
Co-authored-by: Jason Gustafson <jason@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
This PR eliminates the facility in the Admin#describeFeatures API, and its implementation, to optionally send a describeFeatures request to the controller. This feature was not seen as particularly useful, and it also poses a hindrance in the post-KIP-500 world, where no client will be able to access the controller directly.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jun Rao <junrao@gmail.com>
In #9418, we add a listener to the `RaftClient` interface. In that patch, we used it only to send commit notifications for writes from the leader. In this PR, we extend the `handleCommit` API to accept all committed data and we remove the pull-based `read` API. Additionally, we add two new callbacks to the listener interface in order to notify the state machine when the raft client has claimed or resigned leadership.
Finally, this patch allows the `RaftClient` to support multiple listeners. This is necessary for KIP-500 because we will have one listener for the controller role and one for the broker role.
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Boyang Chen <boyang@confluent.io>
A couple of failures were observed after KAFKA-9627: Replace ListOffset request/response with automated protocol (https://github.com/apache/kafka/pull/8295).
1. The latest consumer fails to consume from 0.10.0.1 brokers. The following system tests are failing:
kafkatest.tests.client.client_compatibility_features_test.ClientCompatibilityFeaturesTest
kafkatest.tests.client.client_compatibility_produce_consume_test.ClientCompatibilityProduceConsumeTest
Solution: The current default value for MaxNumOffsets is 0; due to this, brokers do not return offsets for v0 requests. Set the default value for the MaxNumOffsets field to 1. This is similar to the previous [approach](https://github.com/apache/kafka/blob/2.6/clients/src/main/java/org/apache/kafka/common/requests/ListOffsetRequest.java#L204).
2. In some scenarios, the latest consumer fails with the below error when connecting to a Kafka cluster which consists of newer and older (<=2.0) Kafka brokers:
`org.apache.kafka.common.errors.UnsupportedVersionException: Attempted to write a non-default currentLeaderEpoch at version 3`
Solution: After #8295, the consumer can set a non-default CurrentLeaderEpoch value on v3 and below requests. One solution is to make CurrentLeaderEpoch ignorable.
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: David Jacot <djacot@confluent.io>
Closes #9540 from omkreddy/fix-listoffsets
This reverts commit 21dc5231ce, as we decided to use Envelope for redirection instead of the initial principal.
Reviewers: Jason Gustafson <jason@confluent.io>
The patch adds `quorum.append.linger.ms` behavior to the raft implementation. This gives users a powerful knob to tune the impact of fsync. When an append is accepted from the state machine, it is held in an accumulator (similar to the producer) until the configured linger time is exceeded. This allows the implementation to amortize fsync overhead at the expense of some write latency.
The patch also improves our methodology for testing performance. Up to now, we have relied on the producer performance test, but it is difficult to simulate expected controller loads because producer performance is limited by other factors such as the number of producer clients and head-of-line blocking. Instead, this patch adds a workload generator which runs on the leader after election.
Finally, this patch brings us nearer to the write semantics expected by the KIP-500 controller. It makes the following changes:
- Introduce `RecordSerde<T>` interface which abstracts the underlying log implementation from `RaftClient`. The generic type is carried over to `RaftClient<T>` and is exposed through the read/write APIs.
- `RaftClient.append` is changed to `RaftClient.scheduleAppend` and returns the last offset of the expected log append.
- `RaftClient.scheduleAppend` accepts a list of records and ensures that the full set are included in a single batch.
- Introduce `RaftClient.Listener` with a single `handleCommit` API which will eventually replace `RaftClient.read` in order to surface committed data to the controller state machine. Currently `handleCommit` is only used for records appended by the leader; a usage sketch follows below.
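A hedged usage sketch of the semantics described above, with toy interfaces that mirror the description rather than the raft module's exact signatures:
```java
import java.util.Arrays;
import java.util.List;

final class RaftWriteSketch {
    interface Listener<T> {
        // Surfaces committed records to the state machine (leader-appended only, for now).
        void handleCommit(int epoch, long lastOffset, List<T> records);
    }

    interface RaftClient<T> {
        // Accepts records for append; the full list is included in a single batch.
        // Returns the last offset of the expected log append.
        Long scheduleAppend(int epoch, List<T> records);
    }

    static void write(RaftClient<String> client, int epoch) {
        // The append sits in the accumulator until quorum.append.linger.ms elapses,
        // amortizing fsync across concurrent writes.
        Long lastOffset = client.scheduleAppend(epoch, Arrays.asList("rec-1", "rec-2"));
        System.out.println("expected last offset: " + lastOffset);
    }
}
```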
Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>
A regression was introduced by 466f8fd21c. The owned partition field must be ignorable for version < 1, otherwise serialization fails with an unsupported version exception.
Reviewers: Jason Gustafson <jason@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
`KafkaAdminClient.describeUserScramCredentials` should not fail with an NPE when `users` is `null`, as `null` means that all the users must be returned.
Reviewers: Ron Dagostino <rdagostino@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, David Jacot <djacot@confluent.io>
The transaction manager currently does not handle producer-fenced errors returned from an offset commit request.
This patch adds handling for those errors.
Reviewers: Boyang Chen <boyang@apache.org>, John Roesler <vvcephei@apache.org>
We currently stop polling in `Sender` in a transactional producer if there is only one broker in the bootstrap server list and `max.in.flight.requests.per.connection=1` and Metadata response is pending when InitProducerId request is ready to be sent. In this scenario, we attempt to send FindCoordinator to `leastLoadedNode`, but since that is blocked due to `max.in.flight=1` as a result of the pending metadata response, we never unblock unless we poll. This PR ensures we poll in this case.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jason Gustafson <jason@confluent.io>, David Jacot <djacot@confluent.io>
In order to support topic IDs, we need to create a public UUID class. This class will be used in protocols. This PR creates the class, modifies code to use the class in the message protocol and changes the code surrounding the existing messages/json that used the old UUID class.
SimpleExampleMessage was used only for testing, so all usages of UUID have been switched to the new class.
SubscriptionInfoData uses UUID extensively for processId. It also relies on java.util.UUID's implementation of Comparable so that UUIDs can be ordered. Since this functionality is not necessary for the UUIDs used for topic IDs, the new class is converted to java.util.UUID on the boundary of SubscriptionInfoData. Sorting was used only for testing, though, so this still may be changed.
Also added tests for the methods of the new UUID class. The existing SimpleExampleMessage tests should be sufficient for testing the new UUID class in message protocols.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
UpdateMetadataRequestTest.testVersionLogic's assertions must verify the deserialized request instead of the original one.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Using a Set is not necessary as the caller only cares about having the list of timed out connections/nodes.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Other than a Stack Overflow comment (see https://stackoverflow.com/a/61738065) by Colin Patrick McCabe and a proposed design note on KIP-117 wiki, there is no source that verifies the thread-safety of KafkaAdminClient.
This patch updates JavaDoc of KafkaAdminClient to clarify its thread-safety.
Reviewers: Tom Bentley <tbentley@redhat.com>, Chia-Ping Tsai <chia7712@gmail.com>
Fix a bug introduced by change 86013dc that resulted in incorrect behavior when
deleting through an iterator.
The bug is that the hash table relies on a denseness invariant: if you remove an element,
you might have to move some other elements. Calling removeElementAtSlot will do this;
calling removeFromList alone is not enough. A generic sketch of the invariant follows.
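To make the invariant concrete, here is a generic linear-probing deletion sketch (Knuth's Algorithm R), not the actual ImplicitLinkedHashCollection code:
```java
final class DensenessSketch {
    // After emptying slot j, later elements of the same probe chain must be
    // shifted back, or lookups hit the premature hole and miss them.
    static void removeElementAtSlot(Object[] slots, int j) {
        int n = slots.length;
        slots[j] = null;
        int i = j;
        while (true) {
            i = (i + 1) % n;
            if (slots[i] == null)
                return; // end of the probe chain; invariant restored
            int k = Math.floorMod(slots[i].hashCode(), n); // home slot of element at i
            // If the home slot k lies cyclically in (j, i], the element is still
            // reachable from k and can stay; otherwise move it into the hole.
            boolean reachable = (j < i) ? (k > j && k <= i) : (k > j || k <= i);
            if (!reachable) {
                slots[j] = slots[i];
                slots[i] = null;
                j = i;
            }
        }
    }
}
```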
Reviewers: Jason Gustafson <jason@confluent.io>
This replaces code and comment occurrences as described in the KIP.
Author: Xavier Léauté <xvrl@apache.org>
Reviewers: Gwen Shapira, Mickael Maison
Closes #9366 from xvrl/kafka-10571
In this PR, I have addressed the review comments from @chia7712 in #9001 which were provided after #9001 was merged. The changes are made mainly to KafkaAdminClient:
- Improve the error message in the updateFeatures api when a feature name is empty.
- Propagate the top-level error message in the updateFeatures api.
- Add an empty-parameter variety for the describeFeatures api.
- Minor documentation updates to @param and @return to make these resemble other apis.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jun Rao <junrao@gmail.com>
Summary:
In this PR, I have implemented the write path of the feature versioning system (KIP-584). Here is a summary of what's in this PR:
New APIs in org.apache.kafka.clients.admin.Admin interface, and their client and server implementations. These APIs can be used to describe features and update finalized features. These APIs are: Admin#describeFeatures and Admin#updateFeatures.
The write path is provided by the Admin#updateFeatures API. The corresponding server-side implementation is provided in KafkaApis and KafkaController classes. This can be a good place to start the code review.
The write path is supplemented by the Admin#describeFeatures client API. This does not translate 1:1 to a server-side API; instead, under the hood the API makes an explicit ApiVersionsRequest to the broker to fetch the supported and finalized features.
Implemented a suite of integration tests in UpdateFeaturesTest.scala that thoroughly exercises the various cases in the write path.
Other changes:
The data type of the FinalizedFeaturesEpoch field in ApiVersionsResponse has been modified from int32 to int64. This change is to conform with the latest changes to the KIP explained in the voting thread.
Along the way, the class SupportedFeatures has been renamed to BrokerFeatures, and it now holds both supported features and default minimum version levels.
For the purpose of testing, both the BrokerFeatures and FinalizedFeatureCache classes are no longer singletons. Instead, they are instantiated once and maintained in KafkaServer, and these instances are passed around to various classes as needed.
Reviewers: Boyang Chen <boyang@confluent.io>, Jun Rao <junrao@gmail.com>
This field is leftover from the early days of the KIP when it covered reassignment. The API is not exposed yet, so there is no harm updating the first version.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Adds support for SSL key and trust stores to be specified in PEM format either as files or directly as configuration values.
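A hedged configuration sketch (config key names follow KIP-651; the inline PEM bodies are elided placeholders):
```java
import java.util.Properties;

final class PemConfigSketch {
    static Properties clientSslProps() {
        Properties props = new Properties();
        props.put("security.protocol", "SSL");
        // PEM content may be embedded directly as the config value...
        props.put("ssl.keystore.type", "PEM");
        props.put("ssl.keystore.certificate.chain",
            "-----BEGIN CERTIFICATE-----\n...\n-----END CERTIFICATE-----");
        props.put("ssl.keystore.key",
            "-----BEGIN PRIVATE KEY-----\n...\n-----END PRIVATE KEY-----");
        // ...or the existing location configs may point at a .pem file instead.
        props.put("ssl.truststore.type", "PEM");
        props.put("ssl.truststore.location", "/path/to/ca-certs.pem");
        return props;
    }
}
```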
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Add the following checks to `KafkaConsumer.groupMetadata`:
1. A null check of the coordinator (replacing the NPE with `InvalidGroupIdException`, consistent with other methods).
2. A concurrent-access check (`groupMetadata` is not thread-safe, so the check is necessary).
Reviewers: Jason Gustafson <jason@confluent.io>
This PR adds the logic to preserve the ThrottlingQuotaExceededException when topics are retried. The throttleTimeMs is also adjusted accordingly as the request could remain pending or in-flight for quite a long time.
I have run various tests on clusters with quotas enabled and, indeed, found it better to preserve the exception; otherwise, the caller does not really understand what is going on. Preserving it allows the caller to take appropriate measures and to take the throttleTimeMs into consideration.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Some minor logging adjustments to standardize the grammar of rebalance-related messages and make it easy to query the logs for quick debugging results.
Reviewers: Guozhang Wang <wangguoz@gmail.com>
This PR moves the consumer protocol to the automated protocol instead of plain old structs.
Reviewers: Jason Gustafson <jason@confluent.io>
This is the core Raft implementation specified by KIP-595: https://cwiki.apache.org/confluence/display/KAFKA/KIP-595%3A+A+Raft+Protocol+for+the+Metadata+Quorum. We have created a separate "raft" module where most of the logic resides. The new APIs introduced in this patch in order to support Raft election and such are disabled in the server until the integration with the controller is complete. Until then, there is a standalone server which can be used for testing the performance of the Raft implementation. See `raft/README.md` for details.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Boyang Chen <boyang@confluent.io>
Co-authored-by: Boyang Chen <boyang@confluent.io>
Co-authored-by: Guozhang Wang <wangguoz@gmail.com>
There are no checks on the header key, so eagerly instantiating the key (converting bytes to a string) is unnecessary.
One implication is that conversion failures will be detected a bit later, but this is consistent
with how we handle the header value.
**JMH RESULTS**
1. ops: +12%
2. The memory usage optimization is very small, as the cost of creating the extra `ByteBuffer` is
almost the same as the byte array copy (used to construct the `String`). Using a larger key yields a
bigger improvement, but I don't think large keys are the common case.
**BEFORE**
```
Benchmark (bufferSupplierStr) (bytes) (compressionType) (headerKeySize) (maxBatchSize) (maxHeaderSize) (messageSize) (messageVersion) Mode Cnt Score Error Units
RecordBatchIterationBenchmark.measureValidation NO_CACHING RANDOM NONE 10 200 5 1000 2 thrpt 15 2035938.174 ± 1653.566 ops/s
RecordBatchIterationBenchmark.measureValidation:·gc.alloc.rate.norm NO_CACHING RANDOM NONE 10 200 5 1000 2 thrpt 15 2040.000 ± 0.001 B/op
```
```
Benchmark (bufferSupplierStr) (bytes) (compressionType) (headerKeySize) (maxBatchSize) (maxHeaderSize) (messageSize) (messageVersion) Mode Cnt Score Error Units
RecordBatchIterationBenchmark.measureValidation NO_CACHING RANDOM NONE 30 200 5 1000 2 thrpt 15 1979193.376 ± 1239.286 ops/s
RecordBatchIterationBenchmark.measureValidation:·gc.alloc.rate.norm NO_CACHING RANDOM NONE 30 200 5 1000 2 thrpt 15 2120.000 ± 0.001 B/op
```
**AFTER**
```
Benchmark (bufferSupplierStr) (bytes) (compressionType) (headerKeySize) (maxBatchSize) (maxHeaderSize) (messageSize) (messageVersion) Mode Cnt Score Error Units
RecordBatchIterationBenchmark.measureValidation NO_CACHING RANDOM NONE 10 200 5 1000 2 thrpt 15 2289115.973 ± 2661.856 ops/s
RecordBatchIterationBenchmark.measureValidation:·gc.alloc.rate.norm NO_CACHING RANDOM NONE 10 200 5 1000 2 thrpt 15 2032.000 ± 0.001 B/op
```
```
Benchmark (bufferSupplierStr) (bytes) (compressionType) (headerKeySize) (maxBatchSize) (maxHeaderSize) (messageSize) (messageVersion) Mode Cnt Score Error Units
RecordBatchIterationBenchmark.measureValidation NO_CACHING RANDOM NONE 30 200 5 1000 2 thrpt 15 2222625.706 ± 908.358 ops/s
RecordBatchIterationBenchmark.measureValidation:·gc.alloc.rate.norm NO_CACHING RANDOM NONE 30 200 5 1000 2 thrpt 15 2040.000 ± 0.001 B/op
```
Reviewers: Ismael Juma <ismael@juma.me.uk>
Currently the docs have HTML ids for each config key. That doesn't work
correctly for config keys like bootstrap.servers, which occur across
producer, consumer, and admin configs: we generate duplicate ids. So arrange
for each config to prefix the ids it generates with the HTML id of its
section heading.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
This patch fixes a couple of problems with the use of the `StructRegistry`. First, it fixes registration so that it is consistently based on the type name of the struct. Previously, structs were registered under the field name, which meant that fields referring to common structs resulted in multiple entries. Second, the patch fixes `SchemaGenerator` so that common structs are considered first.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
This patch changes the Fetch response schema to include both the diverging epoch and its end offset rather than just the offset. This allows for more accurate truncation on the follower. This is the schema that was originally specified in KIP-595, but we altered it during the discussion.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
This patch bumps the `Fetch` protocol as specified by KIP-595: https://cwiki.apache.org/confluence/display/KAFKA/KIP-595%3A+A+Raft+Protocol+for+the+Metadata+Quorum. The main differences are the following:
- Truncation detection
- Leader discovery through the response
- Flexible version support
The most notable change is truncation detection. This patch adds logic in the request handling path to detect truncation, but it does not change the replica fetchers to make use of this capability. This will be done separately.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
This PR fixes two issues introduced by #9114.
- When the metric was switched from Rate to TokenBucket in the ControllerMutationQuotaManager, the metrics were mixed up. That broke the quota update path.
- When a quota is updated, the ClientQuotaManager updates the MetricConfig of the KafkaMetric. That update was not reflected in the Sensor, so the Sensor was still using the MetricConfig it had been created with.
Reviewers: Anna Povzner <anna@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
This change sets the groundwork for migrating other modules incrementally.
Main changes:
- Replace `junit` 4.13 with `junit-jupiter` and `junit-vintage` 5.7.0-RC1.
- All modules except for `tools` depend on `junit-vintage`.
- `tools` depends on `junit-jupiter`.
- Convert `tools` tests to JUnit 5.
- Update `PushHttpMetricsReporterTest` to use `mockito` instead of `powermock` and `easymock`
(powermock doesn't seem to work well with JUnit 5 and we don't need it since mockito can mock
static methods).
- Update `mockito` to 3.5.7.
- Update `TestUtils` to use JUnit 5 assertions since `tools` depends on it.
Unrelated clean-ups:
- Remove `unit` from package names in a few `core` tests.
- Replace `try/catch/fail` with `assertThrows` in a number of places.
- Tag `CoordinatorTest` as integration test.
- Remove unnecessary type parameters when invoking methods and constructors.
Tested with IntelliJ and gradle. Verified that the following commands work as expected:
* ./gradlew tools:unitTest
* ./gradlew tools:integrationTest
* ./gradlew tools:test
* ./gradlew core:unitTest
* ./gradlew core:integrationTest
* ./gradlew clients:test
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
1. Split the consumer coordinator's REBALANCING state into PREPARING_REBALANCE and COMPLETING_REBALANCE. The first is when the join-group request is sent, and the second is after the join-group response is received. During the first state we should still not send heartbeats, since the heartbeat shares the same socket with the join-group request and the group coordinator has disabled the timeout; however, when we transition to the second state we should start sending heartbeats in case the leader's assignment takes a long time. This also fixes KAFKA-10122.
2. When deciding coordinator#timeToNextPoll, do not count timeToNextHeartbeat if the state is UNJOINED or PREPARING_REBALANCE, since we would disable heartbeats and hence the timer would not be updated.
3. On the broker side, allow heartbeats received during PREPARING_REBALANCE and return the NONE error code instead of REBALANCE_IN_PROGRESS. However, on the client side, we still need to ignore REBALANCE_IN_PROGRESS if the state is COMPLETING_REBALANCE, in case we are talking to an old-versioned broker.
4. Piggy-back a log4j improvement on the broker coordinator for the rebalance-triggering reason, as I found it a bit unclear during the investigation. Also subsumes #9038 with log4j improvements.
Allowing heartbeats during COMPLETING_REBALANCE is tricky in two ways: 1) before the sync-group response is received, a heartbeat response may have reset the generation; and even after the sync-group response but before the callback is triggered, a heartbeat response can still reset the generation, so we need to handle both cases by checking the generation/state. 2) With the heartbeat thread enabled, the sync-group request may be sent by the heartbeat thread even if the caller thread has not called poll yet.
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Boyang Chen <boyang@confluent.io>, John Roesler <john@confluent.io>
Fixes flakiness in `KafkaAdminClientTest` as a result of #8864. Addresses the following flaky tests:
- testAlterReplicaLogDirsPartialFailure
- testDescribeLogDirsPartialFailure
- testMetadataRetries
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jason Gustafson <jason@confluent.io>
The schema specification allows a struct type name to differ from the field name. This works with the generated `Message` classes, but not with the generated JSON converter. The patch fixes the problem, which is that the type name is getting replaced with the field name when the struct is registered in the `StructRegistry`.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
Add ImplicitLinkedHashCollection#moveToEnd.
Refactor ImplicitLinkedHashCollectionIterator to be a little bit more
robust against concurrent modifications to the map (which admittedly
should not happen).
Reviewers: Jason Gustafson <jason@confluent.io>
Implement the KIP-554 API to create, describe, and alter SCRAM user configurations via the AdminClient. Add ducktape tests, and modify JUnit tests to test and use the new API where appropriate.
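A hedged sketch of the new Admin calls (class and method names follow KIP-554):
```java
import java.util.Collections;
import java.util.Properties;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.ScramCredentialInfo;
import org.apache.kafka.clients.admin.ScramMechanism;
import org.apache.kafka.clients.admin.UserScramCredentialUpsertion;

public class ScramAdminSketch {
    public static void main(String[] args) throws Exception {
        Properties props = new Properties();
        props.put("bootstrap.servers", "localhost:9092");
        try (Admin admin = Admin.create(props)) {
            // Create (or update) a SCRAM-SHA-256 credential for user "alice".
            ScramCredentialInfo info = new ScramCredentialInfo(ScramMechanism.SCRAM_SHA_256, 8192);
            admin.alterUserScramCredentials(Collections.singletonList(
                new UserScramCredentialUpsertion("alice", info, "alice-secret")))
                .all().get();
            // Describe returns mechanisms and iteration counts, never the secrets.
            System.out.println(admin.describeUserScramCredentials().all().get());
        }
    }
}
```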
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Rajini Sivaram <rajinisivaram@googlemail.com>
For the generated message code, put the JSON conversion functionality
in a separate JsonConverter class.
Make MessageDataGenerator simply another generator class, alongside the
new JsonConverterGenerator class. Move some of the utility functions
from MessageDataGenerator into FieldSpec and other places, so that they
can be used by other generator classes.
Use argparse4j to support a better command-line for the generator.
Reviewers: David Arthur <mumrah@gmail.com>
Adds avg, min, and max e2e latency metrics at the new TRACE level. Also adds the missing avg task-level metric at the INFO level.
I think where we left off with the KIP, the TRACE-level metrics were still defined to be "stateful-processor-level". I realized this doesn't really make sense, and would be pretty much impossible to define given the DFS processing approach of Streams, and felt that store-level metrics made more sense to begin with. I haven't updated the KIP yet so I can get some initial feedback on this.
Reviewers: Bruno Cadonna <bruno@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Only check if positions need validation if there is new metadata.
Also fix some inefficient java.util.stream code in the hot path of SubscriptionState.
This patch fixes the generated serde logic for the 'records' type so that it uses the compact byte array representation consistently when flexible versions are enabled.
Reviewers: David Arthur <mumrah@gmail.com>
Add a separate error code PRODUCER_FENCED to differentiate from INVALID_PRODUCER_EPOCH. On the broker side, replace INVALID_PRODUCER_EPOCH with PRODUCER_FENCED when the request version is the latest, while still returning INVALID_PRODUCER_EPOCH to older clients (see the sketch below). On the client side, simply handle INVALID_PRODUCER_EPOCH the same as PRODUCER_FENCED when it comes from txn coordinator APIs.
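A minimal sketch of that version gate; the threshold handling is illustrative, not the actual broker code:
```java
import org.apache.kafka.common.protocol.Errors;

final class FencedErrorMappingSketch {
    // New clients get the clearer PRODUCER_FENCED; older request versions keep
    // receiving the error code they already understand.
    static Errors producerFencedError(short requestVersion, short latestVersion) {
        return requestVersion >= latestVersion
            ? Errors.PRODUCER_FENCED
            : Errors.INVALID_PRODUCER_EPOCH;
    }
}
```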
Reviewers: Guozhang Wang <wangguoz@gmail.com>
The message generator was missing conversion logic for tagged structures. This led to casting errors when either `fromStruct` or `toStruct` were invoked. This patch also adds missing null checks in the serialization of tagged byte arrays, which was found from improved test coverage.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
This patch removes the PartitionHeader grouping from the Fetch response. With old versions of the protocol, there was no cost for this grouping, but once we add flexible version support, then it adds an extra byte to the schema for tagged fields with little apparent benefit.
Reviewers: Ismael Juma <ismael@juma.me.uk>, David Arthur <mumrah@gmail.com>
Based on the discussion in #9072, I have put together an alternative approach. It does the following:
Instead of changing the implementation of the Rate to behave like a token bucket, it actually uses two different metrics: the regular Rate and a new TokenBucket. The latter is used to enforce the quota.
The token bucket algorithm uses the rate of the quota as the refill rate for the credits and computes the burst based on the number of samples and their length (# samples * sample length * quota).
The token bucket can go below zero in order to handle unlimited bursts (e.g., creating a topic with a number of partitions higher than the burst). Throttling kicks in when the number of credits is below zero.
The throttle time is computed as credits under zero / refill rate (or quota).
Only the controller mutation quota uses it for now.
The remaining number of credits in the bucket is exposed via the tokens metric per user/clientId.
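A self-contained sketch of the algorithm as described above (field names and the time handling are illustrative, not the actual TokenBucket metric):
```java
final class TokenBucketSketch {
    private final double quota;   // refill rate, in credits per second
    private final double burst;   // max credits: # samples * sample length * quota
    private double credits;
    private long lastUpdateMs;

    TokenBucketSketch(double quota, int samples, int sampleLengthSeconds, long nowMs) {
        this.quota = quota;
        this.burst = samples * sampleLengthSeconds * quota;
        this.credits = burst;
        this.lastUpdateMs = nowMs;
    }

    private void refill(long nowMs) {
        credits = Math.min(burst, credits + quota * (nowMs - lastUpdateMs) / 1000.0);
        lastUpdateMs = nowMs;
    }

    // Records a mutation; the balance may go below zero to admit unlimited bursts.
    void record(double value, long nowMs) {
        refill(nowMs);
        credits -= value;
    }

    // Throttle time: zero while credits remain, else credits under zero / refill rate.
    long throttleTimeMs(long nowMs) {
        refill(nowMs);
        return credits >= 0 ? 0 : (long) (-credits / quota * 1000.0);
    }
}
```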
Reviewers: Anna Povzner <anna@confluent.io>, Jun Rao <junrao@gmail.com>
Enhance the understandability of the constrainedAssign and generalAssign methods by adding more detailed meta comments.
Co-authored-by: A. Sophie Blee-Goldman <ableegoldman@gmail.com>
Reviewers: Boyang Chen <boyang@confluent.io>, A. Sophie Blee-Goldman <ableegoldman@gmail.com>
Refactored FetchRequest and FetchResponse to use the generated message classes for serialization and deserialization. This allows us to bypass unnecessary Struct conversion in a few places. A new "records" type was added to the message protocol which uses BaseRecords as the field type. When sending, we can set a FileRecords instance on the message, and when receiving the message class will use MemoryRecords.
Also included a few JMH benchmarks which indicate a small performance improvement for requests with high partition counts or small record sizes.
Reviewers: Jason Gustafson <jason@confluent.io>, Boyang Chen <boyang@confluent.io>, David Jacot <djacot@confluent.io>, Lucas Bradstreet <lucas@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Colin P. McCabe <cmccabe@apache.org>
Reviewers: Mickael Maison <mickael.maison@gmail.com>, David Jacot <djacot@confluent.io>, Lee Dongjin <dongjin@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
Modified KafkaProducer.sendOffsetsToTransaction() to be bounded by max.block.ms, and added timeout tests for blocking methods.
Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Xi Hu <huxi_2b@hotmail.com>
Add a broker to controller channel manager for use cases such as redirection and AlterIsr.
Reviewers: David Arthur <mumrah@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>
Co-authored-by: Viktor Somogyi <viktorsomogyi@gmail.com>
Co-authored-by: Boyang Chen <boyang@confluent.io>
While debugging a rebalance scenario, I found that when we trigger a rebalance inside rejoinNeededOrPending due to metadata or subscription changes, it is not logged; hence it is actually a bit tricky to find out the reason for a triggered rebalance. I'm adding two INFO log4j entries to fill in the gap.
Other requestRejoin() calls are already covered.
Reviewers: Boyang Chen <boyang@confluent.io>
This PR implements the broker side changes of KIP-599, except the changes of the Rate implementation which will be addressed separately. The PR changes/introduces the following:
- It introduces the protocol changes.
- It introduces a new quota manager ControllerMutationQuotaManager which is another specialization of the ClientQuotaManager.
- It enforces the quota in the KafkaApis and in the AdminManager. This part handles new and old clients as described in the KIP.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
- part of KIP-572
- deprecates producer config `retries` (still in use)
- deprecates admin config `retries` (still in use)
- deprecates Kafka Streams config `retries` (will be ignored)
- adds new Kafka Streams config `task.timeout.ms` (follow up PRs will leverage this new config)
Reviewers: John Roesler <john@confluent.io>, Jason Gustafson <jason@confluent.io>, Randall Hauch <randall@confluent.io>
- After #8312, older brokers return empty configs to the latest `adminClient.describeConfigs`. Old brokers receive empty configNames in the `AdminManager.describeConfigs()` method and, since they do not handle empty configKeys, they filter out all the configs.
- Update ClientCompatibilityTest to verify describe configs
- Add test case to test describe configs with empty configuration Keys
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#9046 from omkreddy/KAFKA-9432
Brokers currently return NOT_LEADER_FOR_PARTITION to producers and REPLICA_NOT_AVAILABLE to consumers if a replica is not available on the broker during reassignments. Non-Java clients treat REPLICA_NOT_AVAILABLE as a non-retriable exception, while Java consumers handle this error by explicitly matching the error code even though it is not an InvalidMetadataException. This PR renames NOT_LEADER_FOR_PARTITION to NOT_LEADER_OR_FOLLOWER and uses the same error for producers and consumers. This is compatible with both Java and non-Java clients since all clients handle this error code (6) as a retriable exception (see the sketch after the list below). The PR also makes ReplicaNotAvailableException a subclass of InvalidMetadataException.
- ALTER_REPLICA_LOG_DIRS continues to return REPLICA_NOT_AVAILABLE. Retained this for compatibility since this request never returned NOT_LEADER_FOR_PARTITION earlier.
- MetadataRequest version 0 also returns REPLICA_NOT_AVAILABLE as topic-level error code for compatibility. Newer versions filter these out and return Errors.NONE, so didn't change this.
- Partition responses in MetadataRequest return REPLICA_NOT_AVAILABLE to indicate that one of the replicas is not available. Did not change this since NOT_LEADER_FOR_PARTITION is not suitable in this case.
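A hedged sketch of the client-side implication (the producer already retries retriable errors internally, so a callback only needs to distinguish retriable from fatal errors):

```java
import org.apache.kafka.clients.producer.Producer;
import org.apache.kafka.clients.producer.ProducerRecord;
import org.apache.kafka.common.errors.RetriableException;

class SendErrorHandlingSketch {
    static void send(Producer<byte[], byte[]> producer, ProducerRecord<byte[], byte[]> record) {
        producer.send(record, (metadata, exception) -> {
            if (exception == null) {
                // success
            } else if (exception instanceof RetriableException) {
                // e.g. NotLeaderOrFollowerException during a reassignment:
                // safe to re-send after a metadata refresh
            } else {
                // fatal: surface to the application
            }
        });
    }
}
```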
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>, Bob Barrett <bob.barrett@confluent.io>
The intention of using poll(0) is to not block on rebalance but still return some data; however, `updateAssignmentMetadataIfNeeded` contains three different pieces of logic: 1) discover coordinator if necessary, 2) join group if necessary, 3) refresh metadata and fetch position if necessary. We only want 2) to be non-blocking, not the others, since e.g. when the coordinator is down the heartbeat would expire and cause the consumer to fetch with timeout 0 as well, causing unnecessarily high CPU usage.
Since splitting this function is a rather big change to make as a last-minute blocker fix for 2.6, I made a smaller change that gives updateAssignmentMetadataIfNeeded an optional boolean flag indicating whether 2) should wait until either expired or complete; otherwise it does not wait on the join-group future and just polls with a zero timer.
Reviewers: Jason Gustafson <jason@confluent.io>
This PR fixes a bug introduced in #8683.
While processing connection set-up timeouts, we iterate through the connecting nodes and disconnect within the loop, removing the entry from the very set the loop is iterating over. That raises a ConcurrentModificationException. The current unit test did not catch this because it was using only one node.
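A minimal sketch of the bug class and the usual fix (node names are placeholders):

```java
import java.util.HashSet;
import java.util.Set;

Set<String> connectingNodes = new HashSet<>(Set.of("node-1", "node-2"));

// Buggy pattern: disconnect(node) removes the node from connectingNodes,
// the very set this loop is iterating over -> ConcurrentModificationException.
// for (String node : connectingNodes)
//     disconnect(node);

// Safe pattern: iterate over a snapshot instead.
for (String node : new HashSet<>(connectingNodes)) {
    connectingNodes.remove(node); // stand-in for disconnect(node)
}
```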
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
The documentation for max.block.ms said it affected only send()
and partitionsFor(), but it actually also affects initTransactions(),
abortTransaction() and commitTransaction(). So rework the
documentation to cover these methods too.
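A hedged sketch of the calls now documented as bounded by max.block.ms (broker address and transactional id are placeholders):

```java
import java.util.Properties;
import org.apache.kafka.clients.producer.KafkaProducer;
import org.apache.kafka.clients.producer.ProducerConfig;
import org.apache.kafka.common.serialization.ByteArraySerializer;

Properties props = new Properties();
props.put(ProducerConfig.BOOTSTRAP_SERVERS_CONFIG, "localhost:9092"); // placeholder
props.put(ProducerConfig.TRANSACTIONAL_ID_CONFIG, "my-tx-id");        // placeholder
props.put(ProducerConfig.KEY_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
props.put(ProducerConfig.VALUE_SERIALIZER_CLASS_CONFIG, ByteArraySerializer.class);
props.put(ProducerConfig.MAX_BLOCK_MS_CONFIG, 5_000); // fail fast after 5s

try (KafkaProducer<byte[], byte[]> producer = new KafkaProducer<>(props)) {
    producer.initTransactions();      // bounded by max.block.ms
    producer.partitionsFor("topic");  // bounded by max.block.ms
    // send(), commitTransaction() and abortTransaction() are likewise bounded
}
```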
Reviewers: Boyang Chen <boyang@confluent.io>
This PR includes 3 MessageFormatters for MirrorMaker2 internal topics:
- HeartbeatFormatter
- CheckpointFormatter
- OffsetSyncFormatter
This also introduces a new public interface org.apache.kafka.common.MessageFormatter that users can implement to build custom formatters.
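A hedged sketch of a custom formatter against the new interface (the class is hypothetical; default methods may differ slightly by version):

```java
import java.io.PrintStream;
import java.util.Map;
import org.apache.kafka.clients.consumer.ConsumerRecord;
import org.apache.kafka.common.MessageFormatter;

public class PlainTextFormatter implements MessageFormatter {
    @Override
    public void configure(Map<String, ?> configs) {
        // no configuration needed for this sketch
    }

    @Override
    public void writeTo(ConsumerRecord<byte[], byte[]> record, PrintStream output) {
        output.printf("%s-%d@%d: %d value bytes%n",
                record.topic(), record.partition(), record.offset(),
                record.value() == null ? 0 : record.value().length);
    }
}
```

Such a class can then be passed to the console consumer via its `--formatter` option.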
Reviewers: Konstantine Karantasis <k.karantasis@gmail.com>, Ryanne Dolan <ryannedolan@gmail.com>, David Jacot <djacot@confluent.io>
Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
Co-authored-by: Edoardo Comar <ecomar@uk.ibm.com>
Since https://issues.apache.org/jira/browse/KAFKA-8834, describing topics with the TopicCommand requires privileges to use ListPartitionReassignments or fails to describe the topics with the following error:
> Error while executing topic command : Cluster authorization failed.
This is quite a hard restriction, as most secure clusters do not authorize non-admin members to access ListPartitionReassignments.
This patch catches the `ClusterAuthorizationException` exception and gracefully falls back (a sketch follows below). We already do this when the API is not available, so it remains consistent.
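A Java-flavored sketch of the fallback (the command itself is Scala; the Admin API calls below are from the Java client):

```java
import java.util.Collections;
import java.util.Map;
import java.util.concurrent.ExecutionException;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.PartitionReassignment;
import org.apache.kafka.common.TopicPartition;
import org.apache.kafka.common.errors.ClusterAuthorizationException;
import org.apache.kafka.common.errors.UnsupportedVersionException;

class ReassignmentFallbackSketch {
    static Map<TopicPartition, PartitionReassignment> reassignmentsOrEmpty(Admin admin)
            throws InterruptedException, ExecutionException {
        try {
            return admin.listPartitionReassignments().reassignments().get();
        } catch (ExecutionException e) {
            if (e.getCause() instanceof ClusterAuthorizationException
                    || e.getCause() instanceof UnsupportedVersionException) {
                return Collections.emptyMap(); // degrade gracefully, describe without reassignment info
            }
            throw e;
        }
    }
}
```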
Author: David Jacot <djacot@confluent.io>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#8947 from dajac/KAFKA-10212
We inadvertently changed the binary schema of the suppress buffer changelog
in 2.4.0 without bumping the schema version number. As a result, it is impossible
to upgrade from 2.3.x to 2.4+ if you are using suppression.
* Refactor the schema compatibility test to use serialized data from older versions
as a more foolproof compatibility test.
* Refactor the upgrade system test to use the smoke test application so that we
actually exercise a significant portion of the Streams API during upgrade testing
* Add more recent versions to the upgrade system test matrix
* Fix the compatibility bug by bumping the schema version to 3
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
The enum `State` is private, so it is fine to fix the typo without breaking compatibility.
Author: Chia-Ping Tsai <chia7712@gmail.com>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#8932 from chia7712/MINOR-8932
Add unit tests for KafkaProducer.close(), KafkaProducer.abortTransaction(), and KafkaProducer.flush() in the KafkaProducerTest.
Increases KafkaProducer test code coverage from 82% of methods and 82% of lines to 86% of methods and 87% of lines as of merge.
Reviewers: Boyang Chen <boyang@confluent.io>
This patch fixes a bug in the constructor of `LogTruncationException`. We were passing the divergent offsets to the super constructor as the fetch offsets. There is no way to fix this without breaking compatibility, but the harm is probably minimal since this exception was not getting raised properly until KAFKA-9840 anyway.
Note that I have also moved the check for unknown offset and epoch into `SubscriptionState`, which ensures that the partition is still awaiting validation and that the fetch offset hasn't changed. Finally, I made some minor improvements to the logging and exception messages to ensure that we always have the fetch offset and epoch as well as the divergent offset and epoch included.
Reviewers: Boyang Chen <boyang@confluent.io>, David Arthur <mumrah@gmail.com>
Since the admin client lets us use a flexible offset-spec, we can always set it to use read-uncommitted regardless of the EOS config (see the sketch below).
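A hedged sketch of the corresponding admin call (topic and partition are placeholders):

```java
import java.util.Map;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.ListOffsetsOptions;
import org.apache.kafka.clients.admin.OffsetSpec;
import org.apache.kafka.common.IsolationLevel;
import org.apache.kafka.common.TopicPartition;

class EndOffsetsSketch {
    static long endOffset(Admin admin) throws Exception {
        TopicPartition tp = new TopicPartition("topic", 0); // placeholder
        return admin.listOffsets(
                        Map.of(tp, OffsetSpec.latest()),
                        new ListOffsetsOptions(IsolationLevel.READ_UNCOMMITTED))
                .partitionResult(tp).get().offset();
    }
}
```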
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
## Background
When a partition subscription is initialized it has a `null` position and is in the INITIALIZING state. Depending on the consumer, it will then transition to one of the other states. Typically a consumer will either reset the offset to earliest/latest, or it will provide an offset (with or without offset metadata). For the reset case, we still have no position to act on so fetches should not occur.
Recently we made changes for KAFKA-9724 (#8376) to prevent clients from entering the AWAIT_VALIDATION state when targeting older brokers. New logic to bypass offset validation as part of this change exposed this new issue.
## Bug and Fix
In the partition subscriptions, the AWAIT_RESET state was incorrectly reporting that it had a position. In some cases a position might actually exist (e.g., if we were resetting offsets during a fetch after a truncation), but in the initialization case no position had been set. We saw this issue in system tests where there is a race between the offset reset completing and the first fetch request being issued.
Since AWAIT_RESET#hasPosition was incorrectly returning `true`, the new logic to bypass offset validation was transitioning the subscription to FETCHING (even though no position existed).
The fix was simply to have AWAIT_RESET#hasPosition return `false`, which should have been the case from the start. Additionally, this fix includes some guards against NPE when reading the position from the subscription.
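A simplified, hypothetical sketch of the state machine fix (not the actual SubscriptionState internals):

```java
enum FetchStateSketch {
    INITIALIZING     { boolean hasPosition() { return false; } },
    AWAIT_RESET      { boolean hasPosition() { return false; } }, // returned true before the fix
    AWAIT_VALIDATION { boolean hasPosition() { return true;  } },
    FETCHING         { boolean hasPosition() { return true;  } };

    // Fetches may only be issued from states that report a usable position.
    abstract boolean hasPosition();
}
```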
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Jason Gustafson <jason@confluent.io>
* Add documentation for using transformation predicates (an example configuration appears after this list).
* Add `PredicateDoc` for generating predicate config docs, following the style of `TransformationDoc`.
* Fix the header depth mismatch.
* Avoid generating HTML ids based purely on the config name, since these are very likely to conflict (e.g. #name). Instead allow passing a function which can be used to generate an id from a config key.
The docs have been generated and tested locally.
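For reference, a hedged example of a connector configuration using a predicate; the predicate and transform names and the topic pattern are placeholders:

```java
import java.util.Map;

Map<String, String> connectorConfig = Map.of(
    "transforms", "dropFooter",
    "transforms.dropFooter.type", "org.apache.kafka.connect.transforms.Filter",
    "transforms.dropFooter.predicate", "isFooter",   // apply the SMT only when the predicate matches
    "predicates", "isFooter",
    "predicates.isFooter.type",
        "org.apache.kafka.connect.transforms.predicates.TopicNameMatches",
    "predicates.isFooter.pattern", ".*-footer"       // placeholder pattern
);
```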
Reviewer: Konstantine Karantasis <konstantine@confluent.io>
In this PR, I have implemented various classes and integration for the read path of the feature versioning system (KIP-584). The ultimate plan is that the cluster-wide finalized features information is going to be stored in ZK under the node /feature. The read path implemented in this PR is centered around reading this finalized features information from ZK, and, processing it inside the Broker.
Here is a summary of what's in this PR (a lot of it is new classes):
A facility is provided in the broker to declare its supported features, and advertise its supported features via its own BrokerIdZNode under a features key.
A facility is provided in the broker to listen to and propagate cluster-wide finalized feature changes from ZK.
When new finalized features are read from ZK, feature incompatibilities are detected by comparing against the broker's own supported features.
ApiVersionsResponse is now served containing supported and finalized feature information (using the newly added tagged fields).
Reviewers: Boyang Chen <boyang@confluent.io>, Jun Rao <junrao@gmail.com>
This change adds a check to the KafkaConfigBackingStore, KafkaOffsetBackingStore, and KafkaStatusBackingStore to use the admin client to verify that the internal topics are compacted and do not use the `delete` cleanup policy.
Connect will already create the internal topics with `cleanup.policy=compact` if the topics do not yet exist when the Connect workers are started; new topics are always created as compacted, overwriting any user-specified `cleanup.policy`. However, if the topics already exist, the worker did not previously verify that the internal topics were compacted, such as when a user manually creates the internal topics before starting Connect or manually changes the topic settings after the fact.
The current change helps guard against users running Connect with topics that have delete cleanup policy enabled, which will remove all connector configurations, source offsets, and connector & task statuses that are older than the retention time. This means that, for example, the configuration for a long-running connector could be deleted by the broker, and this will cause restart issues upon a subsequent rebalance or restarting of Connect worker(s).
Connect behavior requires that its internal topics are compacted and not deleted after some retention time. Therefore, this additional check is simply enforcing the existing expectations, and therefore does not need a KIP.
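A hedged sketch of the kind of check being added (simplified; the real check lives in the backing stores):

```java
import java.util.Collections;
import org.apache.kafka.clients.admin.Admin;
import org.apache.kafka.clients.admin.Config;
import org.apache.kafka.common.config.ConfigResource;
import org.apache.kafka.common.config.TopicConfig;

class CompactionCheckSketch {
    static void verifyCompact(Admin admin, String topic) throws Exception {
        ConfigResource resource = new ConfigResource(ConfigResource.Type.TOPIC, topic);
        Config config = admin.describeConfigs(Collections.singleton(resource))
                .all().get().get(resource);
        String policy = config.get(TopicConfig.CLEANUP_POLICY_CONFIG).value();
        if (!TopicConfig.CLEANUP_POLICY_COMPACT.equals(policy)) {
            throw new IllegalStateException("Internal topic '" + topic
                    + "' must use cleanup.policy=compact, but has: " + policy);
        }
    }
}
```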
Author: Randall Hauch <rhauch@gmail.com>
Reviewer: Konstantine Karantasis <konstantine@confluent.io>, Chris Egerton <chrise@confluent.io>
Ensure all channels get closed in `Selector.close`, even if some of them raise errors.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
There is some confusion over the compression rate metrics, as the meaning of the value isn't clearly stated in the metric description: the value is the average ratio of compressed to uncompressed size, so lower values indicate better compression, yet it was assumed that a higher compression rate value meant better compression. This PR clarifies the meaning of the value to prevent misunderstandings.
Reviewers: Jason Gustafson <jason@confluent.io>
In the first version of the incremental cooperative protocol, in the presence of a failed sync request by the leader, the assignor was designed to treat the unapplied assignments as lost and trigger a rebalance delay.
This commit applies optimizations in these cases to avoid the unnecessary activation of the rebalancing delay. First, if the worker that loses the sync group request or response is the leader, it detects this failure by checking whether the generation it observes when performing task assignment is the expected one. If it is not, the leader resets its view of the previous assignment, because that assignment wasn't successfully applied and doesn't represent a correct state. Furthermore, if the worker that missed the assignment sync is an ordinary worker, the leader is able to detect that there are lost assignments, and instead of triggering a rebalance delay among the same members of the group, it treats the lost tasks as new tasks and reassigns them immediately. If the lost assignment included revocations that were not applied, the leader reapplies these revocations again.
Existing unit tests and integration tests are adapted to test the proposed optimizations.
Reviewers: Randall Hauch <rhauch@gmail.com>
The current "about" string incorrectly describes the session epoch as the partition epoch. Rename to `SessionEpoch` to make usage clearer. Also rename `MaxWait` to `MaxWaitMs` to make the time unit clear and `FetchableTopic` to `FetchTopic` for consistency with `FetchPartition`.
Reviewers: Ismael Juma <ismael@juma.me.uk>
_unknownTaggedFields contains tagged fields which we don't understand
with the current schema. However, we still want to keep the data around
for various purposes. For example, if we are printing out a JSON form of
the message we received, we want to include a section containing the
tagged fields that couldn't be parsed. To leave these out would give an
incorrect impression of what was sent over the wire. Since the unknown
tagged fields represent real data, they should be included in the fields
checked by equals().
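A hand-written, hypothetical sketch of what the generated equals() now covers (`ExampleData` stands in for a generated message class):

```java
import java.util.List;
import java.util.Objects;
import org.apache.kafka.common.protocol.types.RawTaggedField;

public class ExampleData {
    int knownField;
    List<RawTaggedField> _unknownTaggedFields;

    @Override
    public boolean equals(Object obj) {
        if (!(obj instanceof ExampleData)) return false;
        ExampleData other = (ExampleData) obj;
        return knownField == other.knownField
            // unknown tagged fields are real wire data, so compare them too
            && Objects.equals(_unknownTaggedFields, other._unknownTaggedFields);
    }

    @Override
    public int hashCode() {
        return Objects.hash(knownField, _unknownTaggedFields);
    }
}
```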
Reviewers: Ismael Juma <ismael@juma.me.uk>, Boyang Chen <boyang@confluent.io>
`SelectorMetrics` creates per-connection metrics, which means the number of `MetricName` objects and the strings associated with them (such as group name and description) grows with the number of connections in the client. This overhead of duplicate string objects is amplified when there are multiple instances of Kafka clients within the same JVM. This patch addresses some of the memory overhead by making `metricGrpName` a constant field and introducing a new field `perConnectionMetricGrpName`.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
This PR provides two fixes:
1. Skip offset validation if the current leader epoch cannot be reliably determined.
2. Raise an out of range error if the leader returns an undefined offset in response to the OffsetsForLeaderEpoch request.
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
Also added a new unit test to verify the functionality and expectations.
Author: Randall Hauch <rhauch@gmail.com>
Reviewer: Konstantine Karantasis <konstantine@confluent.io>
cc omkreddy this should also get backported to 2.6.x
Author: Xavier Léauté <xvrl@apache.org>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#8813 from xvrl/fix-jmx-reset
Fixes KAFKA-10033.
Replace AdminOperationException with UnknownTopicOrPartitionException if topic does not exist when validating topic configs in AdminZkClient.
Author: gnkoshelev <gnkoshelev@gmail.com>
Author: Gregory <gnkoshelev@gmail.com>
Reviewers: Brian Byrne <bbyrne@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#8715 from gnkoshelev/KAFKA-10033
This applies to the producer, consumer, admin client, connect worker
and inter broker communication.
`ClientDnsLookup.DEFAULT` has been deprecated and a warning
will be logged if it's explicitly set in a client config.
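A hedged sketch of setting the lookup mode explicitly (applies to any of the affected clients):

```java
import java.util.Properties;
import org.apache.kafka.clients.CommonClientConfigs;

Properties props = new Properties();
// "use_all_dns_ips" is the non-deprecated mode; ClientDnsLookup.DEFAULT
// now logs a deprecation warning if set explicitly.
props.put(CommonClientConfigs.CLIENT_DNS_LOOKUP_CONFIG, "use_all_dns_ips");
```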
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Minimum fix needed to stop this test failing and unblock others
Co-authored-by: Luke Chen <showuon@gmail.com>
Reviewers: Guozhang Wang <wangguoz@gmail.com>
1. Enables `TLSv1.3` by default with Java 11 or newer (a config sketch follows below).
2. Add unit tests that cover the various TLSv1.2 and TLSv1.3 combinations.
3. Extend `benchmark_test.py` and `replication_test.py` to run with 'TLSv1.2'
or 'TLSv1.3'.
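A hedged sketch of pinning the TLS versions explicitly (config keys from `SslConfigs`; TLSv1.3 requires Java 11 or newer):

```java
import java.util.Properties;
import org.apache.kafka.common.config.SslConfigs;

Properties props = new Properties();
props.put(SslConfigs.SSL_PROTOCOL_CONFIG, "TLSv1.3");
props.put(SslConfigs.SSL_ENABLED_PROTOCOLS_CONFIG, "TLSv1.3,TLSv1.2");
```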
Reviewers: Ismael Juma <ismael@juma.me.uk>
Fix the failing testMultiConsumerStickyAssignment by correcting a logic error in the allSubscriptionsEqual method.
We create consumerToOwnedPartitions to keep the set of previously owned partitions encoded in each Subscription; it is the basis for the reassignment. In allSubscriptionsEqual, we get the member generation of each subscription and, whenever a higher generation is seen, remove all previously owned partitions recorded so far as invalid. However, the previous logic also removed the member with the current highest generation from consumerToOwnedPartitions, even though that member should be kept since it belongs to the current highest generation. This fixes that logic error (see the sketch below).
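A simplified, hypothetical sketch of the corrected bookkeeping (stand-in types and names, not the assignor's real internals):

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

class GenerationFilterSketch {
    // Only members on the highest generation seen so far keep their claimed
    // partitions, including the member that raised the maximum itself.
    static Map<String, List<String>> ownedByCurrentGeneration(
            Map<String, Integer> memberGenerations,
            Map<String, List<String>> claimedPartitions) {
        int maxGeneration = Integer.MIN_VALUE;
        Map<String, List<String>> result = new HashMap<>();
        for (Map.Entry<String, Integer> e : memberGenerations.entrySet()) {
            int generation = e.getValue();
            if (generation > maxGeneration) {
                result.clear();          // claims from older generations are invalid
                maxGeneration = generation;
            }
            if (generation == maxGeneration) // the highest-generation member is kept
                result.put(e.getKey(), claimedPartitions.get(e.getKey()));
        }
        return result;
    }
}
```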
Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Fixed spotBugs error introduced by c6633a1:
>Dead store to isFreshAssignment in org.apache.kafka.clients.consumer.internals.AbstractStickyAssignor.generalAssign(Map, Map)
Reviewers: Ismael Juma <ismael@juma.me.uk>
The motivation and pseudo-code algorithm are in the ticket.
Added a scale test with a large number of topic partitions and consumers and a 30s timeout.
With these changes, assignment with 2,000 consumers and 200 topics with 2,000 partitions each completes within a few seconds.
Porting the same test to trunk, it took 2 minutes even with a 100x reduction in the number of topics (i.e., 2 minutes for 2,000 consumers and 2 topics with 2,000 partitions each).
Should be cherry-picked to 2.6, 2.5, and 2.4
Reviewers: Guozhang Wang <wangguoz@gmail.com>
1. Move KafkaProducer#propsToMap to Utils#propsToMap
2. Apply Utils#propsToMap to the constructor of KafkaConsumer
Reviewers: Noa Resare <noa@resare.com>, Ismael Juma <ismael@juma.me.uk>