Commit Graph

11852 Commits

Author SHA1 Message Date
A. Sophie Blee-Goldman 7702f44c6e
Change the ProducerConfig constructor with `doLog` parameter to protected
In applications that construct the various client configs multiple times, the config logging can be pretty extreme, so it's nice to be able to disable this for subsequent config objects that are created. We don't expose the constructors that accept a `doLog` parameter in the public API, but all the other clients at least make this constructor `protected` so it's possible to extend the class and suppress the excessive logging. However the ProducerConfig is the one case where this constructor is package-private. It would be nice to align it with the other client config classes and make this `protected` to allow turning off logging
2023-10-31 15:18:44 -07:00
Apoorv Mittal ed3fa83d38
KAFKA-15669: Implement telemetry metric naming strategy (KIP-714) (#14619)
The PR defines the naming convention for telemetry metric names for KIP-714 - jira. Telemetry metric name should be dot separated and tags should be snake case.

PR adds the interface which will be used in MetricsReporter implementation to construct metric names.

Reviewers: Xavier Léauté <xvrl@apache.org>, Walker Carlson <wcarlson@apache.org>, Matthias J. Sax <mjsax@apache.org>, Andrew Schofield <andrew_schofield@uk.ibm.com>
2023-10-31 15:35:02 -05:00
Florin Akermann b5c24974ae
Kafka 12317: Relax non-null key requirement in Kafka Streams (#14174)
[KIP-962](https://cwiki.apache.org/confluence/display/KAFKA/KIP-962%3A+Relax+non-null+key+requirement+in+Kafka+Streams)

The key requirments got relaxed for the followinger streams dsl operator:

left join Kstream-Kstream: no longer drop left records with null-key and call ValueJoiner with 'null' for right value.
outer join Kstream-Kstream: no longer drop left/right records with null-key and call ValueJoiner with 'null' for right/left value.
left-foreign-key join Ktable-Ktable: no longer drop left records with null-foreign-key returned by the ForeignKeyExtractor and call ValueJoiner with 'null' for right value.
left join KStream-Ktable: no longer drop left records with null-key and call ValueJoiner with 'null' for right value.
left join KStream-GlobalTable: no longer drop records when KeyValueMapper returns 'null' and call ValueJoiner with 'null' for right value.

Reviewers: Walker Carlson <wcarlson@apache.org>
2023-10-31 11:09:42 -05:00
Philip Nee 47b468bb8c
MINOR: Remove ambiguous constructor (#14598)
One of the comments in https://issues.apache.org/jira/browse/KAFKA-15534 : #14532

that the constructor taking a BiConsumer is rather confusing. Removing this constructor and allow the request to take a callback using whenComplete method.

Reviewers: Kirk True <ktrue@confluent.io>, Bruno Cadonna <cadonna@apache.org>
2023-10-31 16:05:43 +01:00
Dongnuo Lyu 7bdd1a015e
KAFKA-15647: Fix the different behavior in error handling between the old and new group coordinator (#14589)
In `KafkaApis.scala`, we build the API response differently if exceptions are thrown during the API execution. Since the new group coordinator only populates the response with error code instead of throwing an exception when an error occurs, there may be different behavior between the existing group coordinator and the new one.

This patch:
- Fixes the response building in `KafkaApis.scala` for the two APIs affected by such difference -- OffsetFetch and OffsetDelete.
- In `GroupCoordinatorService.java`, returns a response with error code instead of a failed future when the coordinator is not active.

Reviewers: David Jacot <djacot@confluent.io>
2023-10-31 03:11:52 -07:00
Ritika Reddy a48ca490e4
KAFKA-15643: Fix error logged when unload is called on a broker that was never a coordinator. (#14657)
When a new leader is elected for a __consumer_offset partition, the followers are notified to unload the state. However, only the former leader is aware of it. The remaining follower prints out the following error:
`ERROR [GroupCoordinator id=1] Execution of UnloadCoordinator(tp=__consumer_offsets-1, epoch=0) failed due to This is not the correct coordinator.. (org.apache.kafka.coordinator.group.runtime.CoordinatorRuntime)`
The error is actually correct and expected when in the remaining follower case, however this could be misleading. This patch handles the case gracefully.

Reviewers: David Jacot <djacot@confluent.io>
2023-10-31 03:09:32 -07:00
Kamal Chandraprakash 57fd8f4c36
KAFKA-15632: Drop the invalid remote log metadata events (#14576)
Reviewers: Christo Lolov <lolovc@amazon.com>, Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>
2023-10-31 15:21:33 +05:30
Calvin Liu 8f8ad6db38
KAFKA-15582: Move the clean shutdown file to the storage package (#14603)
A follow-up change to move the clean shutdown file to the storage package.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
2023-10-30 16:27:40 -07:00
Matthias J. Sax 85737fe88a HOTFIX: remove unused import to fix checkstyle error 2023-10-30 13:22:03 -07:00
Matthias J. Sax 4d04711451
KAFKA-15602: revert KAFKA-4852 (#14617)
This PR reverts
 - 51dbd175b0
 - 496ae054c2

Reviewers:  Philip Nee <pnee@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2023-10-30 13:14:15 -07:00
Kirk True 2e2f32c050
KAFKA-15628: Refactor ConsumerRebalanceListener invocation for reuse (#14638)
Straightforward refactoring to extract an inner class and methods related to `ConsumerRebalanceListener` for reuse in the KIP-848 implementation of the consumer group protocol. Also using `Optional` to explicitly mark when a `ConsumerRebalanceListener` is in use or not, allowing us to make some (forthcoming) optimizations when there is no listener to invoke.

Reviewers: David Jacot <djacot@confluent.io>
2023-10-30 11:51:30 -07:00
Igor Soarez 9dbee599f1
MINOR: Rename log dir UUIDs (#14517)
After a late discussion in the voting thread for KIP-858 we
decided to improve the names for the designated reserved
log directory UUID values.

Reviewers: Christo Lolov <lolovc@amazon.com>, Ismael Juma <ismael@juma.me.uk>,  Ziming Deng <dengziming1993@gmail.com>.
2023-10-30 19:10:57 +08:00
Paolo Patierno 2736a2e50a
KAFKA-15689: Logging skipped event when expected migration state is wrong (#14646)
As described in ticket KAFKA-15689, this PR fixes the logging of a migration event when the expected migration state is wrong.

Signed-off-by: Paolo Patierno <ppatierno@live.com>

Reviewers: Luke Chen <showuon@gmail.com>
2023-10-30 17:59:11 +08:00
Paolo Patierno 0c7d1fca92
Using INFO level for migration transition state logging (#14651)
Trivial PR to use the INFO level (instead of DEBUG) for logging the state transition during the ZooKeeper to KRaft migration.
I think it's a useful information to be logged without the need for the user to increase the logging level itself.

Signed-off-by: Paolo Patierno <ppatierno@live.com>

Reviewers: Luke Chen <showuon@gmail.com>, hudeqi <1217150961@qq.com>
2023-10-30 17:57:26 +08:00
James Cheng b9f2874c44
MINOR: Fix typo in a comment at KTableFilter (#14665)
Reviewers: Divij Vaidya <diviv@amazon.com>
2023-10-30 10:16:12 +01:00
Ismael Juma fa36a7f2d6
MINOR: Push down logic from TransactionManager to TxnPartitionEntry (#14591)
And encapsulate TxnPartitionEntry state.

This makes it easier to understand the behavior and the paths through
which the state is updated.

Reviewers: Justine Olshan <jolshan@confluent.io>
2023-10-28 07:27:20 -07:00
Gyeongwon, Do abde0e0878
MINOR: fix typo and comment (#14650)
Reviewers: hudeqi <1217150961@qq.com>, Ziming Deng <dengziming1993@gmail.com>.
2023-10-28 12:10:53 +08:00
David Arthur 37715862d7 KAFKA-15704: Set missing ZkMigrationReady field on ControllerRegistrationRequest
This field was missed by the initial KIP-919 PR(s). The result is that migrations can't begin since
the controllers will never become ready. This patch fixes that as well as pulls over some fixes
from the 3.6 branch.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-10-27 14:16:24 -07:00
Ritika Reddy 68a5072f54
KAFKA-15578: System Tests for running old protocol with new coordinator (#14524)
This patch adds configs to facilitate the testing with the new group coordinator (KIP-848) in kraft mode. Only one test files is converted at the moment. The others will follow.

Reviewers: Ian McDonald <imcdonald@confluent.io>, David Jacot <djacot@confluent.io>
2023-10-27 10:33:40 -07:00
bachmanity1 f0e97397c0
KAFKA-14133: Replace Easymock with Mockito in StreamsProducerTest, TopologyMetadataTest & GlobalStateStoreProviderTest (#14410)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Divij Vaidya <diviv@amazon.com>, Christo Lolov <lolovc@amazon.com>
2023-10-27 10:45:25 +02:00
Hanyu Zheng 834f72b03d
KAFKA-15527: Update docs and JavaDocs (#14600)
Part of KIP-985.

Updates JavaDocs for `RangeQuery` and `ReadOnlyKeyValueStore` with regard to ordering guarantees.
Updates Kafka Streams upgrade guide with KIP information.

Reviewer: Matthias J. Sax <matthias@confluent.io>
2023-10-26 17:48:28 -07:00
David Arthur 339d2556c6
KAFKA-15605: Fix topic deletion handling during ZK migration (#14545)
This patch adds reconciliation logic to migrating ZK brokers to deal with pending topic deletions as well as missed StopReplicas.

During the hybrid mode of the ZK migration, the KRaft controller is asynchronously sending UMR and LISR to the ZK brokers to propagate metadata. Since this process is essentially "best effort" it is possible for a broker to miss a StopReplicas. The new logic lets the ZK broker examine its local logs compared with the full set of replicas in a "Full" LISR. Any local logs which are not present in the set of replicas in the request are removed from ReplicaManager and marked as "stray".

To avoid inadvertent data loss with this new behavior, the brokers do not delete the "stray" partitions. They will rename the directories and log warning messages during log recovery. It will be up to the operator to manually delete the stray partitions. We can possibly enhance this in the future to clean up old stray logs.

This patch makes use of the previously unused Type field on LeaderAndIsrRequest. This was added as part of KIP-516 but never implemented. Since its introduction, an implicit 0 was sent in all LISR. The KRaft controller will now send a value of 2 to indicate a full LISR (as specified by the KIP). The presence of this value acts as a trigger for the ZK broker to perform the log reconciliation.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-10-26 18:13:52 -04:00
Levani Kokhreidze 986c1b1f31
KAFKA-15659: Fix shouldInvokeUserDefinedGlobalStateRestoreListener flaky test (#14608)
Trying to fix flakiness for the shouldInvokeUserDefinedGlobalStateRestoreListener test introduced in #14519.

Fixes are:

-Do not use static membership.
-Always close the 2nd KafkaStreams instance.
-Await for the Kafka Streams instance to transition to a RUNNING state before proceeding.
-Added logging for the StateRestoreListener implementation.
-Reduce restore consumer MAX_POLL_RECORDS.

Reviewers: Anna Sophie Blee-Goldman <sophie@responsive.dev>
2023-10-26 14:56:33 -07:00
Apoorv Mittal ad2677bb7b
KAFKA-15614: Define interfaces and classes for client telemetry (#14575)
This PR for KIP-714 - KAFKA-1564 lays out interfaces and classes for capturing client telemetry metrics.

Below image defines interaction of different classes among them interfaces have been included in the PR.

Reviewers: Walker Carlson <wcarlson@apache.org>, Matthias J. Sax <matthias@confluent.io>, Andrew Schofield <andrew_schofield@uk.ibm.com>, Kirk True <ktrue@confluent.io>, Philip Nee <pnee@confluent.io>, Jun Rao <junrao@gmail.com>,
2023-10-26 15:06:38 -05:00
Matthias J. Sax a6c14003a9
HOTFIX: close iterator to avoid resource leak (#14624)
Reviewers: Hao Li <hli@confluent.io>, Bill Bejeck <bill@confluent.io>
2023-10-26 10:30:39 -07:00
atu-sharm a7aaa9c44f
KAFKA-15644: Fix CVE-2023-4586 in netty:handler (#14584)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Luke Chen <showuon@gmail.com>
2023-10-26 18:36:59 +02:00
Lucas Brutschy b061ab7701
MINOR: Fix misleading log-line (#14643)
After finishing restoration, we should only log the active tasks. Standby tasks are not part of restoration and it can be confusing to see them show up on this log message.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-10-26 08:31:46 -07:00
maniekes 987609404c
KAFKA-15685: Add support for MinGW and MSYS2 (windows OS) (#13321)
Kafka class runner does not work with MINGW/Git Bash on Windows. This commit adds support for MinGW and MSYS2 development environments.

Reviewers: Divij Vaidya <diviv@amazon.com>
2023-10-26 11:29:39 +02:00
Owen Leung 9989b68d0d
KAFKA-15200: Add pre-requisite check in release.py (#14636)
Reviewers: Divij Vaidya <diviv@amazon.com>
2023-10-26 10:40:01 +02:00
Gaurav Narula abd104a606
MINOR: avoid blocking for randomness in DefaultRecordBatchTest (#14625)
Using `SecureRandom.getInstanceStrong()` results in using `/dev/random` which is known to block in Linux when the OS runs low on entropy. This was noticable when running tests in containerised CI environments.

This commit avoids using a CSPRNG altogether since the tests do not need cryptographically secure random numbers.

Reviewers: Divij Vaidya <diviv@amazon.com>, Igor Soarez <soarez@apple.com>

---------

Co-authored-by: Igor Soarez <soarez@apple.com>
2023-10-26 09:42:04 +02:00
hudeqi b559942c17
KAFKA-15671: Fix flaky test RemoteIndexCacheTest.testClearCacheAndIndexFilesWhenResizeCache (#14622)
Reviewers: Divij Vaidya <diviv@amazon.com>

---------

Co-authored-by: Deqi Hu <deqi.hu@shopee.com>
2023-10-25 11:18:55 +02:00
dengziming 03ea24aa1d
MINOR: Fix flaky testFollowerCompleteDelayedFetchesOnReplication (#14616)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2023-10-25 09:39:29 +08:00
Kirk True 2b233bfa5f
KAFKA-14274 [6, 7]: Introduction of fetch request manager (#14406)
Changes:

1. Introduces FetchRequestManager that implements the RequestManager
   API for fetching messages from brokers. Unlike Fetcher, record
   decompression and deserialization is performed on the application
   thread inside CompletedFetch.
2. Restructured the code so that objects owned by the background thread
   are not instantiated until the background thread runs (via Supplier)
   to ensure that there are no references available to the
   application thread.
3. Ensuring resources are properly using Closeable and using
   IdempotentCloser to ensure they're only closed once.
4. Introduces ConsumerTestBuilder to reduce a lot of inconsistency in
   the way the objects were built up for tests.

Reviewers: Philip Nee <pnee@confluent.io>, Lianet Magrans <lianetmr@gmail.com>, Jun Rao<junrao@gmail.com>
2023-10-24 13:03:05 -07:00
Lucas Brutschy d144b7ee38
KAFKA-15326: [10/N] Integrate processing thread (#14193)
- Introduce a new internal config flag to enable processing threads
- If enabled, create a scheduling task manager inside the normal task manager (renamings will be added on top of this), and use it from the stream thread
- All operations inside the task manager that change task state, lock the corresponding tasks if processing threads are enabled.
- Adds a new abstract class AbstractPartitionGroup. We can modify the underlying implementation depending on the synchronization requirements. PartitionGroup is the unsynchronized subclass that is going to be used by the original code path. The processing thread code path uses a trivially synchronized SynchronizedPartitionGroup that uses object monitors. Further down the road, there is the opportunity to implement a weakly synchronized alternative. The details are complex, but since the implementation is essentially a queue + some other things, it should be feasible to implement this lock-free.
- Refactorings in StreamThreadTest: Make all tests use the thread member variable and add tearDown in order avoid thread leaks and simplify debugging. Make the test parameterized on two internal flags: state updater enabled and processing threads enabled. Use JUnit's assume to disable all tests that do not apply.
Enable some integration tests with processing threads enabled.

Reviewer: Bruno Cadonna <bruno@confluent.io>
2023-10-24 10:17:55 +02:00
Nikolay e0121a38b1
MINOR: Deduplicating ConsumerGroupCommand print formating (#14610)
ConsumerGroupCommand contains code duplications for table row format.
This PR reduces code duplication and make it more clear and easy to understand.

Reviewers: Luke Chen <showuon@gmail.com>, hudeqi <1217150961@qq.com>
2023-10-24 15:16:32 +08:00
Jotaniya Jeel 4612fe42af
KAFKA-15481: Fix concurrency bug in RemoteIndexCache (#14483)
RemoteIndexCache has a concurrency bug which leads to IOException while fetching data from remote tier.

The bug could be reproduced as per the following order of events:-

Thread 1 (cache thread): invalidates the entry, removalListener is invoked async, so the files have not been renamed to "deleted" suffix yet.
Thread 2: (fetch thread): tries to find entry in cache, doesn't find it because it has been removed by 1, fetches the entry from S3, writes it to existing file (using replace existing)
Thread 1: async removalListener is invoked, acquires a lock on old entry (which has been removed from cache), it renames the file to "deleted" and starts deleting it
Thread 2: Tries to create in-memory/mmapped index, but doesn't find the file and hence, creates a new file of size 2GB in AbstractIndex constructor. JVM returns an error as it won't allow creation of 2GB random access file.

This commit fixes the bug by using EvictionListener instead of RemovalListener to perform the eviction atomically with the file rename. It handles the manual removal (not handled by EvictionListener) by using computeIfAbsent() and enforcing atomic cache removal & file rename.

Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>, Arpit Goyal 
<goyal.arpit.91@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>
2023-10-23 14:50:46 +02:00
Mickael Maison 8b9f6d17f2
KAFKA-15093: Add 3.5 Streams upgrade system tests (#14602)
Reviewers: Matthias J. Sax <mjsax@apache.org>
2023-10-23 13:26:50 +02:00
shuoer86 27a155c80a
MINOR: Fix typos in build.gradle, tests and trogdor (#14574)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, hudeqi <1217150961@qq.com>
2023-10-23 12:30:57 +02:00
vamossagar12 1a3aca305e
KAFKA-15457: Add support for OffsetFetch version 9 in admin client (#14611)
This patch adds support for OffsetFetch version 9 in the admin client. It mainly allows handling two new error codes `STALE_MEMBER_EPOCH` and `UNKNOWN_MEMBER_ID`  introduced as part of KIP-848.

Reviewers: David Jacot <djacot@confluent.io>
2023-10-23 02:42:21 -07:00
Mickael Maison 9c77c17c4e
KAFKA-15664: Add 3.4 Streams upgrade system tests (#14601)
Reviewers: Luke Chen <showuon@gmail.com>,  Matthias J. Sax <mjsax@apache.org>
2023-10-23 10:33:59 +02:00
Gantigmaa Selenge 84a58d75bb
KAFKA-15566: Fix test FetchRequestTest.testLastFetchedEpochValidation for KRaft mode (#14563)
Fix test FetchRequestTest.testLastFetchedEpochValidation for KRaft mode
The test fails due to unexpected error (OFFSET_OUT_OF_RANGE) when enabled with KRaft mode.
The reason it takes longer to set the leader epoch in KRaft mode is because of the way the topic partitions are created differently than Zookeeper. In Zookeeper mode, we create the topic partitions directly with Zookeeper therefore seem to take less time to create the logs and set leader epoch on broker. In KRaft mode, we use Admin client to create topic partitions. Even though the test waits for topic partitions to get created and appear in metadata cache, it doesn’t seem to be sufficient time for leader epoch to get set on the brokers.

Reviewers: Luke Chen <showuon@gmail.com>, dengziming <dengziming1993@gmail.com>
2023-10-23 11:05:57 +08:00
Greg Harris ffcb6d4a1a
KAFKA-14767: Fix missing commitId build error after git gc (#13315)
git gc moves commit hashes from individual .git/refs/heads/ to .git/packed-refs which is not read
by the determineCommitId function.

Replace the existing lookup within the .git directory with a GrGit lookup that handles packed and
unpacked refs transparently.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2023-10-22 11:08:01 -07:00
Matthias J. Sax 4371214fbe
KAFKA-15378: fix streams upgrade system test (#14539)
Fixing bad test setup. We tried to fix an upgrade bug for FK-joins in 3.1 release, but it later turned out that the PR was not sufficient to fix it. We finally fixed in 3.4 release.

This PR updates the system test matrix to only test working versions with FK-joins, limited to available test versions.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Hao Li <hli@confluent.io>, Mickael Maison <mickael.maison@gmail.com>
2023-10-20 16:20:00 -07:00
Justine Olshan e8c8969330
KAFKA-15626: Replace verification guard object with an specific type (#14568)
I've added a new class with an incrementing atomic long to represent the verification guard. Upon creation of verification guard, we will increment this value and assign it to the guard.

The expected behavior is the same as the object guard, but with better debuggability with the string value and type safety (I found a type safety issue in the current code when implementing this)

Reviewers: Ismael Juma <ismael@juma.me.uk>, Artem Livshits <alivshits@confluent.io>
2023-10-20 14:26:20 -07:00
Josep Prat eed5e68880
MINOR: Server-Commons cleanup (#14572)
MINOR: Server-Commons cleanup

Fixes Javadoc and minor issues in the Java files of Server-Commons modules.

Javadoc is now formatted as intended by the author of the doc itself.

Signed-off-by: Josep Prat <josep.prat@aiven.io>

Reviewers: Mickael Maison <mickael.maison@gmail.com>
2023-10-20 21:04:04 +02:00
hudeqi 4083cd627e
KAFKA-15607: Fix NPE in MirrorCheckpointTask::syncGroupOffset (#14587)
Reviewers: Chris Egerton <chrise@aiven.io>
2023-10-20 12:17:51 -04:00
Christo Lolov b5ec6e8a0d
KAFKA-14133: Move RocksDBGenericOptionsToDbOptionsColumnFamilyOptionsAdapterTest to Mockito (#14586)
Reviewers: Divij Vaidya <diviv@amazon.com>
2023-10-20 16:09:36 +02:00
Chris Egerton 091eb9b349
KAFKA-15428: Cluster-wide dynamic log adjustments for Connect (#14538)
Reviewers: Greg Harris <greg.harris@aiven.io>, Yang Yang <yayang@uber.com>, Yash Mayya <yash.mayya@gmail.com>
2023-10-20 09:52:37 -04:00
Philip Nee c81a725219
KAFKA-15534: Inject request completion time when the request failed (#14532)
Currently, we aren't able to access the request completion time if the request is completed exceptionally, which results in many system calls. This is not ideal because these system calls can add up. Instead, time is already retrieved on the top of the background thread event loop, which is then propagated into the NetworkClientDelegate.poll.

In this PR - I store the completion time in the handler, so that it becomes accessible in the callbacks.

Reviewer: Bruno Cadonna <cadonna@apache.org>
2023-10-20 09:47:25 +02:00
hudeqi 21ebbe6b28
MINOR:Remove unused method parameter in ConsumerGroupCommand (#14585)
In ConsumerGroupCommand, there are two methods: getLogEndOffsets and getLogStartOffsets, the first parameter groupId is not used, so remove it.

Reviewers: Luke Chen <showuon@gmail.com>
2023-10-20 10:05:47 +08:00