kafka

Commit Graph

Author	SHA1	Message	Date
David Jacot	3be7f7d611	KAFKA-14391; Add ConsumerGroupHeartbeat API (#12972 ) This patch does a few things: 1) It introduces a new flag to the request spec: `latestVersionUnstable`. It signifies that the last version of the API is considered unstable (or still in development). As such, the last API version is not exposed by the server unless specified otherwise with the new internal `unstable.api.versions.enable`. This allows us to commit new APIs which are still in development. 3) It adds the ConsumerGroupHeartbeat API, part of KIP-848, and marks it as unreleased for now. 4) It adds the new error codes required by the new ConsumerGroupHeartbeat API. Reviewers: Justine Olshan <jolshan@confluent.io>, Jeff Kim <jeff.kim@confluent.io>, Jason Gustafson <jason@confluent.io>	2023-02-09 09:13:31 +01:00
Christo Lolov	a0a9b6ffea	MINOR: Remove unnecessary code (#13210 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Divij Vaidya <diviv@amazon.com>	2023-02-07 17:37:45 +01:00
Yash Mayya	a3cf8b54e0	KAFKA-14455: Kafka Connect create and update REST APIs should surface failures while writing to the config topic (#12984 ) Reviewers: Greg Harris <greg.harris@aiven.io>, Chris Egerton <chrise@aiven.io>	2023-02-02 11:03:38 -05:00
Colin Patrick McCabe	eb7d5cbf15	MINOR: add startup timeouts to KRaft integration tests (#13153 ) When running junit tests, it is not good to block forever on CompletableFuture objects. When there are bugs, this can lead to junit tests hanging forever. Jenkins does not deal with this well -- it often brings down the whole multi-hour test run. Therefore, when running integration tests in JUnit, set some reasonable time limits on broker and controller startup time. Reviewers: Jason Gustafson <jason@confluent.io>	2023-01-30 11:29:30 -08:00
Kirk True	bc1ce9f0f1	KAFKA-14623: OAuth's HttpAccessTokenRetriever potentially leaks secrets in logging (#13119 ) Removes logging of the HTTP response directly in all known cases to prevent potentially logging access tokens. Reviewers: Sushant Mahajan <sushant.mahajan88@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>	2023-01-24 19:06:24 +00:00
Jason Gustafson	b2cb546fba	MINOR: Small cleanups in refactored consumer implementation (#13138 ) This patch contains a few cleanups and fixes in the new refactored consumer logic: - Use `CompletableFuture` instead of `RequestFuture` in `NetworkClientDelegate`. This is a much more extensible API and it avoids tying the new implementation to `ConsumerNetworkClient`. - Fix call to `isReady` in `NetworkClientDelegate`. We need the call to `ready` to initiate the connection. - Ensure backoff is enforced even after successful `FindCoordinator` requests. This avoids a tight loop while metadata is converging after a coordinator change. - `RequestState` was incorrectly using the reconnect backoff as the retry backoff. In fact, we don't currently have a retry backoff max implemented in the consumer, so the use of `ExponentialBackoff` is unnecessary, but I've left it since we may add this with https://cwiki.apache.org/confluence/display/KAFKA/KIP-580%3A+Exponential+Backoff+for+Kafka+Clients. - Minor cleanups in test cases to avoid unused classes/fields. Reviewers: Philip Nee <pnee@confluent.i>, Guozhang Wang <guozhang@apache.org>	2023-01-23 12:45:24 -08:00
Mickael Maison	00e5803cd3	MINOR: Various cleanups in client tests (#13094 ) Reviewers: Luke Chen <showuon@gmail.com>, Ismael Juma <mlists@juma.me.uk>, Divij Vaidya <diviv@amazon.com>, Christo Lolov <christololov@gmail.com>	2023-01-23 13:07:44 +01:00
Philip Nee	ca80502ebe	Update outdated documentation. (#13139 ) The current documentation indicates two positions are tracked, but these positions were removed a few years ago. Now we use a single position to track the last consumed record. Updated the documentation to reflect to the current state. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2023-01-20 14:53:20 -08:00
A. Sophie Blee-Goldman	79b12cbe65	MINOR: fix flaky StickyAssignorTest.testLargeAssignmentAndGroupWithNonEqualSubscription (#13134 ) This test is supposed to be a sanity check that rebalancing with a large number of partitions/consumers won't start to take obscenely long or approach the max.poll.interval.ms -- bumping up the timeout by another 30s still feels very reasonable considering the test is for 1 million partitions Reviewers: Matthias J. Sax <mjsax@apache.org>	2023-01-20 13:56:38 -08:00
Alex Sorokoumov	4b75265761	KAFKA-14638: Elaborate when transaction.timeout.ms resets (#13129 ) Reviewers: Matthias J. Sax <matthias@confluent.io>, Justine Olshan <jolshan@confluent.io>, Jason Gustafson <jason@confluent.io>	2023-01-19 18:55:32 -08:00
David Jacot	700947aa5a	KAFKA-14367; Add `OffsetDelete` to the new `GroupCoordinator` interface (#12902 ) This patch adds `OffsetDelete` to the new `GroupCoordinator` interface and updates `KafkaApis` to use it. Reviewers: Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>	2023-01-17 20:39:01 +01:00
Divij Vaidya	b2bc72dc79	MINOR: Include the inner exception stack trace when re-throwing an exception (#12229 ) While wrapping the caught exception into a custom one, information about the caught exception is being lost, including information about the stack trace of the exception. When re-throwing an exception, we either include the original exception or the relevant information is added to the exception message. Reviewers: Ismael Juma <ismael@juma.me.uk>, Luke Chen <showuon@gmail.com>, dengziming <dengziming1993@gmail.com>, Matthew de Detrich <mdedetrich@gmail.com>	2023-01-15 15:03:23 -08:00
Cheryl Simmons	d2556e02a2	Update ProducerConfig.java (#13115 ) This should be greater than 1 to be consistent with behavior described inmax.in.flight.requests.per.connection. Reviewers: Bill Bejeck <bbejeck@apache.org>	2023-01-13 16:35:49 -05:00
David Jacot	a2926edc2f	KAFKA-14367; Add `TxnOffsetCommit` to the new `GroupCoordinator` interface (#12901 ) This patch adds `TxnOffsetCommit` to the new `GroupCoordinator` interface and updates `KafkaApis` to use it. Reviewers: Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>	2023-01-13 09:54:54 +01:00
Federico Valeri	111f02cc74	KAFKA-14568: Move FetchDataInfo and related to storage module (#13085 ) Part of KAFKA-14470: Move log layer to storage module. Reviewers: Ismael Juma <ismael@juma.me.uk> Co-authored-by: Ismael Juma <ismael@juma.me.uk>	2023-01-12 21:32:23 -08:00
David Jacot	e6669672ef	KAFKA-14367; Add `OffsetCommit` to the new `GroupCoordinator` interface (#12886 ) This patch adds `OffsetCommit` to the new `GroupCoordinator` interface and updates `KafkaApis` to use it. Reviewers: Omnia G H Ibrahim <o.g.h.ibrahim@gmail.com>, Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>	2023-01-12 18:05:49 +01:00
David Arthur	0bb05d8679	KAFKA-14304 Use boolean for ZK migrating brokers in RPC/record (#13103 ) With the new broker epoch validation logic introduced in #12998, we no longer need the ZK broker epoch to be sent to the KRaft controller. This patch removes that epoch and replaces it with a boolean. Another small fix is included in this patch for controlled shutdown in migration mode. Previously, if a ZK broker was in migration mode, it would always try to do controlled shutdown via BrokerLifecycleManager. Since there is no ordering dependency between bringing up ZK brokers and the KRaft quorum during migration, a ZK broker could be running in migration mode, but talking to a ZK controller. A small check was added to see if the current controller is ZK or KRaft before decided which controlled shutdown to attempt. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2023-01-11 14:36:56 -05:00
David Jacot	24a86423e9	KAFKA-14367; Add `OffsetFetch` to the new `GroupCoordinator` interface (#12870 ) This patch adds OffsetFetch to the new GroupCoordinator interface and updates KafkaApis to use it. Reviewers: Philip Nee <pnee@confluent.i>, Jeff Kim <jeff.kim@confluent.io>, Justine Olshan <jolshan@confluent.io>	2023-01-10 11:38:31 -08:00
Michael Marshall	def8d724c8	KAFKA-14540: Fix DataOutputStreamWritable#writeByteBuffer (#13032 ) When writing a ByteBuffer backed by a HeapBuffer to a DataOutputStream, it is necessary to pass in the offset and the position, not just the position. It is also necessary to pass the remain length, not the limit. The current code results in writing the wrong data to DataOutputStream. While the DataOutputStreamWritable is used in the project, I do not see any references that would utilize this code path, so this bug fix is relatively minor. I added a new test to cover the exact bug. The test fails without this change.	2023-01-10 11:50:07 +01:00
Akhilesh C	db49070760	KAFKA-14493: Introduce Zk to KRaft migration state machine STUBs in KRaft controller. (#12998 ) This patch introduces a preliminary state machine that can be used by KRaft controller to drive online migration from Zk to KRaft. MigrationState -- Defines the states we can have while migration from Zk to KRaft. KRaftMigrationDriver -- Defines the state transitions, and events to handle actions like controller change, metadata change, broker change and have interfaces through which it claims Zk controllership, performs zk writes and sends RPCs to ZkBrokers. MigrationClient -- Interface that defines the functions used to claim and relinquish Zk controllership, read to and write from Zk. Co-authored-by: David Arthur <mumrah@gmail.com> Reviewers: Colin P. McCabe <cmccabe@apache.org>	2023-01-09 10:44:11 -08:00
iamazy	e38526e375	KAFKA-14570: Fix parenthesis in verifyFullFetchResponsePartitions output (#13072 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>	2023-01-09 11:15:01 +01:00
Satish Duggana	026105d05f	KAFKA-14550: Move SnapshotFile and CorruptSnapshotException to storage module (#13039 ) For broader context on this change, see: * KAFKA-14470: Move log layer to storage module Reviewers: Ismael Juma <ismael@juma.me.uk>	2023-01-02 07:31:40 -08:00
Ismael Juma	871289c5c4	KAFKA-14476: Move OffsetMap and related to storage module (#13042 ) For broader context on this change, please check: * KAFKA-14470: Move log layer to storage module Reviewers: dengziming <dengziming1993@gmail.com>, Satish Duggana <satishd@apache.org>, Federico Valeri <fedevaleri@gmail.com>	2022-12-23 08:19:00 -08:00
Luke Chen	b47d1950c7	MINOR: increase connectionMaxIdleMs to make test reliable (#13031 ) Saw this flaky test in recent builds: #1452, #1454, org.opentest4j.AssertionFailedError: Unexpected channel state EXPIRED ==> expected: <true> but was: <false> at app//org.junit.jupiter.api.AssertionFailureBuilder.build(AssertionFailureBuilder.java:151) at app//org.junit.jupiter.api.AssertionFailureBuilder.buildAndThrow(AssertionFailureBuilder.java:132) at app//org.junit.jupiter.api.AssertTrue.failNotTrue(AssertTrue.java:63) at app//org.junit.jupiter.api.AssertTrue.assertTrue(AssertTrue.java:36) at app//org.junit.jupiter.api.Assertions.assertTrue(Assertions.java:210) at app//org.apache.kafka.common.network.SslTransportLayerTest.testIOExceptionsDuringHandshake(SslTransportLayerTest.java:883) at app//org.apache.kafka.common.network.SslTransportLayerTest.testUngracefulRemoteCloseDuringHandshakeWrite(SslTransportLayerTest.java:833) We expected the channel state to be AUTHENTICATE or READY, but got EXPIRED. Checking the test, we're not expecting expired state at all, but set a 5 secs idle timeout in selector, which is not enough when machine is busy. Increasing the connectionMaxIdleMs to 10 secs to make the test reliable. Reviewers: Ismael Juma <ismael@juma.me.uk>	2022-12-22 10:29:13 +08:00
Ismael Juma	e8232edd24	KAFKA-14477: Move LogValidator and related to storage module (#13012 ) Also improved `LogValidatorTest` to cover a bug that was originally only caught by `LogAppendTimeTest`. For broader context on this change, please check: * KAFKA-14470: Move log layer to storage module Reviewers: Jun Rao <junrao@gmail.com>	2022-12-21 16:57:02 -08:00
Akhilesh C	5b521031ed	MINOR: Add zk migration field to the ApiVersionsResponse (#13029 ) Add the zk mgiration field which will be used by the KRaft controller to see if the quorum is ready to handle a zk to kraft migration. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2022-12-21 15:47:30 -08:00
Lucas Brutschy	6a6c730241	KAFKA-14532: Correctly handle failed fetch when partitions unassigned (#13023 ) The failure handling code for fetches could run into an IllegalStateException if a fetch response came back with a failure after the corresponding topic partition has already been removed from the assignment. Reviewers: David Jacot <djacot@confluent.io>	2022-12-21 09:17:11 +01:00
Philip Nee	4548c272ae	KAFKA-14264; New logic to discover group coordinator (#12862 ) [KAFKA-14264](https://issues.apache.org/jira/browse/KAFKA-14264) In this patch, we refactored the existing FindCoordinator mechanism. In particular, we first centralize all of the network operation (send, poll) in `NetworkClientDelegate`, then we introduced a RequestManager interface that is responsible to handle the timing of different kind of requests, based on the implementation. In this path, we implemented a `CoordinatorRequestManager` which determines when to create an `UnsentRequest` upon polling the request manager. Reviewers: Jason Gustafson <jason@confluent.io>	2022-12-19 09:48:52 -08:00
Satish Duggana	7146ac57ba	[KAFKA-13369] Follower fetch protocol changes for tiered storage. (#11390 ) This PR implements the follower fetch protocol as mentioned in KIP-405. Added a new version for ListOffsets protocol to receive local log start offset on the leader replica. This is used by follower replicas to find the local log star offset on the leader. Added a new version for FetchRequest protocol to receive OffsetMovedToTieredStorageException error. This is part of the enhanced fetch protocol as described in KIP-405. We introduced a new field locaLogStartOffset to maintain the log start offset in the local logs. Existing logStartOffset will continue to be the log start offset of the effective log that includes the segments in remote storage. When a follower receives OffsetMovedToTieredStorage, then it tries to build the required state from the leader and remote storage so that it can be ready to move to fetch state. Introduced RemoteLogManager which is responsible for initializing RemoteStorageManager and RemoteLogMetadataManager instances. receives any leader and follower replica events and partition stop events and act on them also provides APIs to fetch indexes, metadata about remote log segments. Followup PRs will add more functionality like copying segments to tiered storage, retention checks to clean local and remote log segments. This will change the local log start offset and make sure the follower fetch protocol works fine for several cases. You can look at the detailed protocol changes in KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-405%3A+Kafka+Tiered+Storage#KIP405:KafkaTieredStorage-FollowerReplication Co-authors: satishd@apache.org, kamal.chandraprakash@gmail.com, yingz@uber.com Reviewers: Kowshik Prakasam <kprakasam@confluent.io>, Cong Ding <cong@ccding.com>, Tirtha Chatterjee <tirtha.p.chatterjee@gmail.com>, Yaodong Yang <yangyaodong88@gmail.com>, Divij Vaidya <diviv@amazon.com>, Luke Chen <showuon@gmail.com>, Jun Rao <junrao@gmail.com>	2022-12-17 09:36:44 -08:00
Kirk True	f247aac96a	KAFKA-14496: Wrong Base64 encoder used by OIDC OAuthBearerLoginCallbackHandler (#13000 ) The OAuth code to generate the Authentication header was incorrectly using the URL-safe base64 encoder. For client IDs and/or secrets with dashes and/or plus signs would not be encoded correctly, leading to the OAuth server to reject the credentials. This change uses the correct base64 encoder, per RFC-7617. Co-authored-by: Endre Vig <vendre@gmail.com>	2022-12-16 19:44:41 +05:30
Daniel Scanteianu	e3585a4cd5	MINOR: Document Offset and Partition 0-indexing, fix typo (#12753 ) Add comments to clarify that both offsets and partitions are 0-indexed, and fix a minor typo. Clarify which offset will be retrieved by poll() after seek() is used in various circumstances. Also added integration tests. Reviewers: Luke Chen <showuon@gmail.com>	2022-12-16 17:12:40 +08:00
Akhilesh C	8b045dcbf6	KAFKA-14446: API forwarding support from zkBrokers to the Controller (#12961 ) This PR enables brokers which are upgrading from ZK mode to KRaft mode to forward certain metadata change requests to the controller instead of applying them directly through ZK. To faciliate this, we now support EnvelopeRequest on zkBrokers (instead of only on KRaft nodes.) In BrokerToControllerChannelManager, we can now reinitialize our NetworkClient. This is needed to handle the case when we transition from forwarding requests to a ZK-based broker over the inter-broker listener, to forwarding requests to a quorum node over the controller listener. In MetadataCache.scala, distinguish between KRaft and ZK controller nodes with a new type, CachedControllerId. In LeaderAndIsrRequest, StopReplicaRequest, and UpdateMetadataRequest, switch from sending both a zk and a KRaft controller ID to sending a single controller ID plus a boolean to express whether it is KRaft. The previous scheme was ambiguous as to whether the system was in KRaft or ZK mode when both IDs were -1 (although this case is unlikely to come up in practice). The new scheme avoids this ambiguity and is simpler to understand. Reviewers: dengziming <dengziming1993@gmail.com>, David Arthur <mumrah@gmail.com>, Colin P. McCabe <cmccabe@apache.org>	2022-12-15 14:16:41 -08:00
Greg Harris	dcc02346c5	KAFKA-13881: Add Clients package description javadocs (#12895 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>  , Tom Bentley <tbentley@redhat.com>	2022-12-15 10:03:44 +01:00
David Arthur	67c72596af	KAFKA-14448 Let ZK brokers register with KRaft controller (#12965 ) Prior to starting a KIP-866 migration, the ZK brokers must register themselves with the active KRaft controller. The controller waits for all brokers to register in order to verify that all the brokers can A) Communicate with the quorum B) Have the migration config enabled C) Have the proper IBP set This patch uses the new isMigratingZkBroker field in BrokerRegistrationRequest and RegisterBrokerRecord. The type was changed from int8 to bool for BrokerRegistrationRequest (a mistake from #12860). The ZK brokers use the existing BrokerLifecycleManager class to register and heartbeat with the controllers. Reviewers: Mickael Maison <mickael.maison@gmail.com>, Colin P. McCabe <cmccabe@apache.org>	2022-12-13 13:15:21 -08:00
Artem Livshits	43f39c2e60	KAFKA-14379: Consumer should refresh preferred read replica on update metadata (#12956 ) The consumer (fetcher) used to refresh the preferred read replica on three conditions: 1. the consumer receives an OFFSET_OUT_OF_RANGE error 2. the follower does not exist in the client's metadata (i.e., offline) 3. after metadata.max.age.ms (5 min default) For other errors, it will continue to reach to the possibly unavailable follower and only after 5 minutes will it refresh the preferred read replica and go back to the leader. Another problem is that the client might have stale metadata and not send fetches to preferred replica, even after the leader redirects to the preferred replica. A specific example is when a partition is reassigned. the consumer will get NOT_LEADER_OR_FOLLOWER which triggers a metadata update but the preferred read replica will not be refreshed as the follower is still online. it will continue to reach out to the old follower until the preferred read replica expires. The consumer can instead refresh its preferred read replica whenever it makes a metadata update request, so when the consumer receives i.e. NOT_LEADER_OR_FOLLOWER it can find the new preferred read replica without waiting for the expiration. Generally, we will rely on the leader to choose the correct preferred read replica and have the consumer fail fast (clear preferred read replica cache) on errors and reach out to the leader. Co-authored-by: Jeff Kim <jeff.kim@confluent.io> Reviewers: David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>	2022-12-12 09:55:22 +01:00
Ismael Juma	88725669e7	MINOR: Move MetadataQuorumCommand from `core` to `tools` (#12951 ) `core` should only be used for legacy cli tools and tools that require access to `core` classes instead of communicating via the kafka protocol (typically by using the client classes). Summary of changes: 1. Convert the command implementation and tests to Java and move it to the `tools` module. 2. Introduce mechanism to capture stdout and stderr from tests. 3. Change `kafka-metadata-quorum.sh` to point to the new command class. 4. Adjusted the test classpath of the `tools` module so that it supports tests that rely on the `@ClusterTests` annotation. 5. Improved error handling when an exception different from `TerseFailure` is thrown. 6. Changed `ToolsUtils` to avoid usage of arrays in favor of `List`. Reviewers: dengziming <dengziming1993@gmail.com>	2022-12-09 09:22:58 -08:00
David Jacot	f9a09fdd29	MINOR: Small refactor in DescribeGroupsResponse (#12970 ) This patch does a few cleanups: * It removes `DescribeGroupsResponse.fromError` and pushes its logic to `DescribeGroupsRequest.getErrorResponse` to be consistent with how we implemented the other requests/responses. * It renames `DescribedGroup.forError` to `DescribedGroup.groupError`. The patch relies on existing tests. Reviewers: Mickael Maison <mickael.maison@gmail.com>	2022-12-09 14:15:53 +01:00
David Jacot	a0c19c05ef	KAFKA-14425; The Kafka protocol should support nullable structs (#12932 ) This patch adds support for nullable structs in the Kafka protocol as described in KIP-893 - https://cwiki.apache.org/confluence/display/KAFKA/KIP-893%3A+The+Kafka+protocol+should+support+nullable+structs. Reviewers: Colin Patrick McCabe <cmccabe@apache.org>	2022-12-08 20:54:29 +01:00
Rajini Sivaram	d23ce20bdf	KAFKA-14352: Rack-aware consumer partition assignment protocol changes (KIP-881) (#12954 ) Reviewers: David Jacot <djacot@confluent.io>	2022-12-07 11:41:21 +00:00
Justine Olshan	5aad085a8e	KAFKA-14417: Producer doesn't handle REQUEST_TIMED_OUT for InitProducerIdRequest, treats as fatal error (#12915 ) The broker may return the `REQUEST_TIMED_OUT` error in `InitProducerId` responses when allocating the ID using the `AllocateProducerIds` request. The client currently does not handle this. Instead of retrying as we would expect, the client raises a fatal exception to the application. In this patch, we address this problem by modifying the producer to handle `REQUEST_TIMED_OUT` and any other retriable errors by re-enqueuing the request. Reviewers: Jason Gustafson <jason@confluent.io>	2022-12-06 11:41:31 -08:00
Divij Vaidya	8bb897655f	MINOR: Optimize metric recording when quota check not required (#12933 ) In the selector code path, we record some values for some sensors which do not have any metric associated with them that requires quota. Yet, a redundant check for quotas is made which consumes ~0.7% of CPU time in the hot path as demonstrated by the flame graph below. ![Screenshot 2022-12-01 at 15 46 52](https://user-images.githubusercontent.com/71267/205082727-29c1300d-48fa-475f-b736-45743a2291ba.png) This PR is a minor optimization which removes the redundant check for quotas in cases where it is not required in the hot path. Flamegraph after this patch (note that checkQuotas CPU utilization has been removed) <img width="1789" alt="Screenshot 2022-12-02 at 12 00 31" src="https://user-images.githubusercontent.com/71267/205278204-798fbe96-0acf-4fa4-995b-19b66b6c2cbb.png"> Reviewers: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>	2022-12-05 13:58:07 +01:00
David Jacot	df29b17fc4	KAFKA-14367; Add `LeaveGroup` to the new `GroupCoordinator` interface (#12850 ) This patch adds `leaveGroup` to the new `GroupCoordinator` interface and updates `KafkaApis` to use it. Reviewers: Justine Olshan <jolshan@confluent.io>, Jeff Kim <jeff.kim@confluent.io>, Jason Gustafson <jason@confluent.io>	2022-12-05 09:28:35 +01:00
David Arthur	7b7e40a536	KAFKA-14304 Add RPC changes, records, and config from KIP-866 (#12928 ) Reviewers: Colin Patrick McCabe <cmccabe@apache.org>	2022-12-02 19:59:52 -05:00
Colin Patrick McCabe	5514f372b3	MINOR: extract jointly owned parts of BrokerServer and ControllerServer (#12837 ) Extract jointly owned parts of BrokerServer and ControllerServer into SharedServer. Shut down SharedServer when the last component using it shuts down. But make sure to stop the raft manager before closing the ControllerServer's sockets. This PR also fixes a memory leak where ReplicaManager was not removing some topic metric callbacks during shutdown. Finally, we now release memory from the BatchMemoryPool in KafkaRaftClient#close. These changes should reduce memory consumption while running junit tests. Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>	2022-12-02 00:27:22 -08:00
José Armando García Sancio	9b409de1e2	KAFKA-14358; Disallow creation of cluster metadata topic (#12885 ) With KRaft the cluster metadata topic (__cluster_metadata) has a different implementation compared to regular topic. The user should not be allowed to create this topic. This can cause issues if the metadata log dir is the same as one of the log dirs. This change returns an authorization error if the user tries to create the cluster metadata topic. Reviewers: David Arthur <mumrah@gmail.com>	2022-12-01 18:34:29 -08:00
Jonathan Albrecht	b56e71faee	MINOR: Update unit/integration tests to work with the IBM Semeru JDK (#12343 ) The IBM Semeru JDK use the OpenJDK security providers instead of the IBM security providers so test for the OpenJDK classes first where possible and test for Semeru in the java.runtime.name system property otherwise. Reviewers: Mickael Maison <mickael.maison@gmail.com>, Bruno Cadonna <cadonna@apache.org>	2022-12-01 16:22:00 +01:00
Luke Chen	81d98993ad	KAFKA-13715: add generationId field in subscription (#12748 ) Implementation for KIP-792, to add generationId field in ConsumerProtocolSubscription message. So when doing assignment, we'll take from subscription generationId fields it is provided in cooperative rebalance protocol. Otherwise, we'll fall back to original solution to use userData. Reviewers: David Jacot <djacot@confluent.io>	2022-12-01 14:25:51 +08:00
Joel Hamill	d9946a7ffc	MINOR: Fix config documentation formatting (#12921 ) Reviewers: José Armando García Sancio <jsancio@apache.org>	2022-11-30 08:54:39 -08:00
Divij Vaidya	b2d8354e10	KAFKA-14414: Fix request/response header size calculation (#12917 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Luke Chen <showuon@gmail.com>	2022-11-30 12:17:21 +01:00
David Jacot	be032735b3	KAFKA-14422; Consumer rebalance stuck after new static member joins a group with members not supporting static members (#12909 ) When a consumer group on a version prior to 2.3 is upgraded to a newer version and static membership is enabled in the meantime, the consumer group remains stuck, iff the leader is still on the old version. The issue is that setting `GroupInstanceId` in the response to the leader is only supported from JoinGroup version >= 5 and that `GroupInstanceId` is not ignorable nor handled anywhere else. Hence is there is at least one static member in the group, sending the JoinGroup response to the leader fails with a serialization error. ``` org.apache.kafka.common.errors.UnsupportedVersionException: Attempted to write a non-default groupInstanceId at version 2 ``` When this happens, the member stays around until the group coordinator is bounced because a member with a non-null `awaitingJoinCallback` is never expired. This patch fixes the issue by making `GroupInstanceId` ignorable. A unit test has been modified to cover this. Reviewers: Jason Gustafson <jason@confluent.io>	2022-11-28 20:15:54 +01:00
Divij Vaidya	edfa894eb5	KAFKA-14414: Remove unnecessary usage of ObjectSerializationCache (#12890 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>	2022-11-28 15:15:34 +01:00
Divij Vaidya	19286449ee	MINOR: Remove possibility of overriding test files clean up (#12889 ) Reviewers: Chris Egerton <chrise@aiven.io>	2022-11-22 11:06:33 -05:00
Artem Livshits	36f933fc5f	Minor; fix sticky partitioner doc to say at least batch.size (#12880 ) Reviewers: Jun Rao <junrao@gmail.com>	2022-11-21 09:10:04 -08:00
Joel Hamill	95499c409f	MINOR: fix syntax typo (#12868 ) Reviewers: Luke Chen <showuon@gmail.com>	2022-11-18 11:13:11 +08:00
Igor Soarez	5bd556a49b	KAFKA-14303 Producer.send without record key and batch.size=0 goes into infinite loop (#12752 ) This fixes an bug which causes a call to producer.send(record) with a record without a key and configured with batch.size=0 never to return. Without specifying a key or a custom partitioner the new BuiltInPartitioner, as described in KIP-749 kicks in. BuiltInPartitioner seems to have been designed with the reasonable assumption that the batch size will never be lower than one. However, documentation for producer configuration states batch.size=0 as a valid value, and even recommends its use directly. [1] [1] clients/src/main/java/org/apache/kafka/clients/producer/ProducerConfig.java:87 Reviewers: Artem Livshits <alivshits@confluent.io>, Luke Chen <showuon@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>	2022-11-17 14:27:02 -08:00
David Jacot	c2fc36f331	MINOR: Handle JoinGroupResponseData.protocolName backward compatibility in JoinGroupResponse (#12864 ) This is a small refactor extracted from https://github.com/apache/kafka/pull/12845. It basically moves the logic to handle the backward compatibility of `JoinGroupResponseData.protocolName` from `KafkaApis` to `JoinGroupResponse`. The patch adds a new unit test for `JoinGroupResponse` and relies on existing tests as well. Reviewers: Justine Olshan <jolshan@confluent.io>, Jason Gustafson <jason@confluent.io>	2022-11-16 12:43:00 -08:00
Omnia G H Ibrahim	46bee5bcf3	KAFKA-13401: KIP-787 - MM2 manage Kafka resources with custom Admin implementation. (#12577 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Tom Bentley <tombentley@users.noreply.github.com> Co-authored-by: Tom Bentley <tombentley@users.noreply.github.com> Co-authored-by: oibrahim3 <omnia@apple.com>	2022-11-15 11:21:24 +01:00
Ashmeet Lamba	a971448f3f	KAFKA-14254: Format timestamps as dates in logs (#12684 ) Improves logs withing Streams by replacing timestamps to date instances to improve readability. Approach - Adds a function within common.utils.Utils to convert a given long timestamp to a date-time string with the same format used by Kafka's logger. Reviewers: Divij Vaidya <diviv@amazon.com>, Bruno Cadonna <cadonna@apache.org>	2022-11-07 13:42:39 +01:00
Shawn	fcab5fb888	Revert "KAFKA-13891: reset generation when syncgroup failed with REBALANCE_IN_PROGRESS (#12140 )" (#12794 ) This reverts commit `c23d60d56c`. Reviewers: Luke Chen <showuon@gmail.com>	2022-11-05 14:05:48 +08:00
Philip Nee	dc18dd921c	MINOR: Remove outdated comment about metadata request (#12810 ) This is an outdated comment about the metadata request from the old implementation. The name of the RPC was changed from GroupMetadata to FindCoordinator, but the comment was not updated. Reviewers: Jason Gustafson <jason@confluent.io>	2022-11-01 17:01:19 -07:00
Philip Nee	fa10e213bf	KAFKA-14247: add handler impl to the prototype (#12775 ) Minor revision for KAFKA-14247. Added how the handler is called and constructed to the prototype code path. Reviewers: John Roesler <vvcephei@apache.org>, Kirk True <kirk@mustardgrain.com>	2022-10-31 14:01:22 -05:00
Philip Nee	0a045d4ef7	KAFKA-14247: Consumer background thread base implementation (#12672 ) Adds skeleton of the background thread. 1-pager: https://cwiki.apache.org/confluence/display/KAFKA/Proposal%3A+Consumer+Threading+Model+Refactor Continuation of #12663 Reviewers: Guozhang Wang <guozhang@apache.org>, Kirk True <kirk@mustardgrain.com>, John Roesler <vvcephei@apache.org>	2022-10-19 21:02:55 -05:00
Ismael Juma	de8c9ea04c	MINOR: Include TLS version in transport layer debug log (#12751 ) This was helpful when debugging an issue recently. Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Divij Vaidya <divijvaidya13@gmail.com>	2022-10-17 08:26:23 -07:00
Philip Nee	7d67ddce22	MINOR: typo in KafkaChannel (#12757 ) Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>	2022-10-16 20:08:53 +08:00
Chris Egerton	18e60cb000	KAFKA-12497: Skip periodic offset commits for failed source tasks (#10528 ) Also moves the Streams LogCaptureAppender class into the clients module so that it can be used by both Streams and Connect. Reviewers: Nigel Liang <nigel@nigelliang.com>, Kalpesh Patel <kpatel@confluent.io>, John Roesler <vvcephei@apache.org>, Tom Bentley <tbentley@redhat.com>	2022-10-13 10:15:42 -04:00
Mickael Maison	a7a026cabb	MINOR: Fix closing code tag in producer config docs (#12718 ) Reviewers: David Jacot <djacot@confluent.io>, Luke Chen <showuon@gmail.com>	2022-10-06 14:33:28 +02:00
Jason Gustafson	c5745d2845	MINOR: Add initial property tests for StandardAuthorizer (#12703 ) In https://github.com/apache/kafka/pull/12695, we discovered a gap in our testing of `StandardAuthorizer`. We addressed the specific case that was failing, but I think we need to establish a better methodology for testing which incorporates randomized inputs. This patch is a start in that direction. We implement a few basic property tests using jqwik which focus on prefix searching. It catches the case from https://github.com/apache/kafka/pull/12695 prior to the fix. In the future, we can extend this to cover additional operation types, principal matching, etc. Reviewers: David Arthur <mumrah@gmail.com>	2022-10-04 16:31:43 -07:00
Philip Nee	997dfa950e	MINOR: Fix typo in selector documentation (#12710 ) Reviewers: David Jacot <djacot@confluent.io>	2022-10-04 13:05:26 +02:00
Philip Nee	c3690a3b4a	KAFKA-14247; Define event handler interface and events (#12663 ) Add some initial interfaces to kick off https://cwiki.apache.org/confluence/display/KAFKA/Proposal%3A+Consumer+Threading+Model+Refactor. We introduce an `EventHandler` interface and a new consumer implementation which demonstrates how it will be used. Subsequent PRs will continue to flesh out the implementation. Reviewers: Guozhang Wang wangguoz@gmail.com, Jason Gustafson <jason@confluent.io>	2022-10-03 09:36:40 -07:00
LinShunKang	496ae054c2	Fix ByteBufferSerializer#serialize(String, ByteBuffer) not roundtrip input with ByteBufferDeserializer#deserialize(String, byte[]) (#12704 ) Reviewers: Guozhang Wang <wangguoz@gmail.com>	2022-09-30 06:45:18 -07:00
LinShunKang	51dbd175b0	KAFKA-4852: Fix ByteBufferSerializer#serialize(String, ByteBuffer) not compatible with offsets (#12683 ) Reviewers: Guozhang Wang <wangguoz@gmail.com>	2022-09-29 10:59:47 -07:00
Kirk True	8e43548175	KAFKA-13725: KIP-768 OAuth code mixes public and internal classes in same package (#12039 ) * KAFKA-13725: KIP-768 OAuth code mixes public and internal classes in same package Move classes into a sub-package of "internal" named "secured" that matches the layout more closely of the "unsecured" package. Replaces the concrete implementations in the former packages with sub-classes of the new package layout and marks them as deprecated. If anyone is already using the newer OAuth code, this should still work. * Fix checkstyle and spotbugs violations Co-authored-by: Kirk True <kirk@mustardgrain.com> Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	2022-09-23 13:15:15 +05:30
Manikumar Reddy	5587c65fd3	MINOR: Add configurable max receive size for SASL authentication requests This adds a new configuration `sasl.server.max.receive.size` that sets the maximum receive size for requests before and during authentication. Reviewers: Tom Bentley <tbentley@redhat.com>, Mickael Maison <mickael.maison@gmail.com> Co-authored-by: Manikumar Reddy <manikumar.reddy@gmail.com> Co-authored-by: Mickael Maison <mickael.maison@gmail.com>	2022-09-21 20:58:33 +05:30
Colin Patrick McCabe	b401fdaefb	MINOR: Add more validation during KRPC deserialization When deserializing KRPC (which is used for RPCs sent to Kafka, Kafka Metadata records, and some other things), check that we have at least N bytes remaining before allocating an array of size N. Remove DataInputStreamReadable since it was hard to make this class aware of how many bytes were remaining. Instead, when reading an individual record in the Raft layer, simply create a ByteBufferAccessor with a ByteBuffer containing just the bytes we're interested in. Add SimpleArraysMessageTest and ByteBufferAccessorTest. Also add some additional tests in RequestResponseTest. Reviewers: Tom Bentley <tbentley@redhat.com>, Mickael Maison <mickael.maison@gmail.com>, Colin McCabe <colin@cmccabe.xyz> Co-authored-by: Colin McCabe <colin@cmccabe.xyz> Co-authored-by: Manikumar Reddy <manikumar.reddy@gmail.com> Co-authored-by: Mickael Maison <mickael.maison@gmail.com>	2022-09-21 20:58:23 +05:30
Sushant Mahajan	f8e0a6d924	KAFKA-14212: Enhance HttpAccessTokenRetriever to retrieve error message (#12651 ) Currently HttpAccessTokenRetriever client side class does not retrieve error response from the token e/p. As a result, seemingly trivial config issues could take a lot of time to diagnose and fix. For example, client could be sending invalid client secret, id or scope. This PR aims to remedy the situation by retrieving the error response, if present and logging as well as appending to any exceptions thrown. New unit tests have also been added. Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	2022-09-20 12:33:19 +05:30
Artem Livshits	2b2039f0ba	KAFKA-14156: Built-in partitioner may create suboptimal batches (#12570 ) Now the built-in partitioner defers partition switch (while still accounting produced bytes) if there is no ready batch to send, thus avoiding switching partitions and creating fractional batches. Reviewers: Jun Rao <jun@confluent.io>	2022-09-14 17:39:14 -07:00
Philip Nee	5f01fed206	MINOR: Small cleanups in FetcherTest following KAFKA-14196 (#12629 ) Minor cleanups in `FetcherTest` following https://github.com/apache/kafka/pull/12603. Reviewers: Luke Chen <showuon@gmail.com>, Jason Gustafson <jason@confluent.io>	2022-09-13 11:10:41 -07:00
Jason Gustafson	3d2ac7cdbe	KAFKA-14208; Do not raise wakeup in consumer during asynchronous offset commits (#12626 ) Asynchronous offset commits may throw an unexpected WakeupException following #11631 and #12244. This patch fixes the problem by passing through a flag to ensureCoordinatorReady to indicate whether wakeups should be disabled. This is used to disable wakeups in the context of asynchronous offset commits. All other uses leave wakeups enabled. Note: this patch builds on top of #12611. Co-Authored-By: Guozhang Wang wangguoz@gmail.com Reviewers: Luke Chen <showuon@gmail.com>	2022-09-13 15:43:09 +08:00
Philip Nee	536cdf692f	KAFKA-14196; Do not continue fetching partitions awaiting auto-commit prior to revocation (#12603 ) When auto-commit is enabled with the "eager" rebalance strategy, the consumer will commit all offsets prior to revocation. Following recent changes, this offset commit is done asynchronously, which means there is an opportunity for fetches to continue returning data to the application. When this happens, the progress is lost following revocation, which results in duplicate consumption. This patch fixes the problem by adding a flag in `SubscriptionState` to ensure that partitions which are awaiting revocation will not continue being fetched. Reviewers: Luke Chen <showuon@gmail.com>, Jason Gustafson <jason@confluent.io>	2022-09-12 21:02:13 -07:00
Jason Gustafson	b9774c0b02	KAFKA-14215; Ensure forwarded requests are applied to broker request quota (#12624 ) Currently forwarded requests are not applied to any quotas on either the controller or the broker. The controller-side throttling requires the controller to apply the quota changes from the log to the quota managers, which will be done separately. In this patch, we change the response logic on the broker side to also apply the broker's request quota. The enforced throttle time is the maximum of the throttle returned from the controller (which is 0 until we fix the aforementioned issue) and the broker's request throttle time. Reviewers: David Arthur <mumrah@gmail.com>	2022-09-12 20:50:33 -07:00
Divij Vaidya	d4fc3186b4	MINOR: Replace usage of File.createTempFile() with TestUtils.tempFile() (#12591 ) Why Current set of integration tests leak files in the /tmp directory which makes it cumbersome if you don't restart the machine often. Fix Replace the usage of File.createTempFile with existing TestUtils.tempFile method across the test files. TestUtils.tempFile automatically performs a clean up of the temp files generated in /tmp/ folder. Reviewers: Luke Chen <showuon@gmail.com>, Ismael Juma <mlists@juma.me.uk>	2022-09-13 08:44:21 +08:00
Colin Patrick McCabe	47252f431e	KAFKA-14216: Remove ZK reference from org.apache.kafka.server.quota.ClientQuotaCallback javadoc (#12617 ) Reviewers: Luke Chen <showuon@gmail.com>	2022-09-12 08:34:46 -07:00
David Jacot	b7f20be809	KAFKA-14201; Consumer should not send group instance ID if committing with empty member ID (#12599 ) The consumer group instance ID is used to support a notion of "static" consumer groups. The idea is to be able to identify the same group instance across restarts so that a rebalance is not needed. However, if the user sets `group.instance.id` in the consumer configuration, but uses "simple" assignment with `assign()`, then the instance ID nevertheless is sent in the OffsetCommit request to the coordinator. This may result in a surprising UNKNOWN_MEMBER_ID error. This PR fixes the issue on the client side by not setting the group instance id if the member id is empty (no generation). Reviewers: Jason Gustafson <jason@confluent.io>	2022-09-08 15:05:40 -07:00
Ismael Juma	faefbb29b6	MINOR: Fix comment in Exit.java to refer to the right method (#12592 ) Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>	2022-09-06 20:12:55 -07:00
Andrew Dean	0cbf74a940	KAFKA-14194: Fix NPE in Cluster.nodeIfOnline (#12584 ) When utilizing the rack-aware consumer configuration and rolling updates are being applied to the Kafka brokers the metadata updates can be in a transient state and a given topic-partition can be missing from the metadata. This seems to resolve itself after a bit of time but before it can resolve the `Cluster.nodeIfOnline` method throws an NPE. This patch checks to make sure that a given topic-partition has partition info available before using that partition info. Reviewers: David Jacot <djacot@confluent.io>	2022-09-05 09:56:23 +02:00
dengziming	9e71d818d6	KAFKA-13990: KRaft controller should return right features in ApiVersionResponse (#12294 ) Previously, the KRaft controller was incorrectly reporting an empty feature set in ApiVersionResponse. This was preventing any multi-node clusters from being upgraded via kafka-features.sh, since they would incorrectly believe that metadata.version was not a supported feature. This PR adds a regression test for this bug, KRaftClusterTest.testUpdateMetadataVersion. Reviewers: José Armando García Sancio <jsancio@gmail.com>, Colin P. McCabe <cmccabe@apache.org>	2022-08-31 11:35:58 -07:00
Colin Patrick McCabe	28d5a05943	KAFKA-14187: kafka-features.sh: add support for --metadata (#12571 ) This PR adds support to kafka-features.sh for the --metadata flag, as specified in KIP-778. This flag makes it possible to upgrade to a new metadata version without consulting a table mapping version names to short integers. Change --feature to use a key=value format. FeatureCommandTest.scala: make most tests here true unit tests (that don't start brokers) in order to improve test run time, and allow us to test more cases. For the integration test part, test both KRaft and ZK-based clusters. Add support for mocking feature operations in MockAdminClient.java. upgrade.html: add a section describing how the metadata.version should be upgraded in KRaft clusters. Add kraft_upgrade_test.py to test upgrades between KRaft versions. Reviewers: David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>, José Armando García Sancio <jsancio@gmail.com>	2022-08-30 16:56:03 -07:00
dengziming	fe99262fe2	MINOR: Add KRaft broker api to protocol docs (#11786 ) Add KRaft broker api to protocol docs Reviewers: Luke Chen <showuon@gmail.com>	2022-08-30 14:40:11 +08:00
RivenSun	2d5871d57e	MINOR: MINOR: Remove redundant error log in ChannelBuilder (#12539 ) Remove redundant error log in ChannelBuilder Reviewers: Luke Chen <showuon@gmail.com>	2022-08-30 11:24:58 +08:00
José Armando García Sancio	f83c6f2da4	KAFKA-14183; Cluster metadata bootstrap file should use header/footer (#12565 ) The boostrap.checkpoint files should include a control record batch for the SnapshotHeaderRecord at the start of the file. It should also include a control record batch for the SnapshotFooterRecord at the end of the file. The snapshot header record is important because it versions the rest of the bootstrap file. Reviewers: David Arthur <mumrah@gmail.com>	2022-08-27 19:11:06 -07:00
Jason Gustafson	5c52c61a46	MINOR: A few cleanups for DescribeQuorum APIs (#12548 ) A few small cleanups in the `DescribeQuorum` API and handling logic: - Change field types in `QuorumInfo`: - `leaderId`: `Integer` -> `int` - `leaderEpoch`: `Integer` -> `long` (to allow for type expansion in the future) - `highWatermark`: `Long` -> `long` - Use field names `lastFetchTimestamp` and `lastCaughtUpTimestamp` consistently - Move construction of `DescribeQuorumResponseData.PartitionData` into `LeaderState` - Consolidate fetch time/offset update logic into `LeaderState.ReplicaState.updateFollowerState` Reviewers: Luke Chen <showuon@gmail.com>, José Armando García Sancio <jsancio@users.noreply.github.com>	2022-08-24 13:12:14 -07:00
Mickael Maison	0507597597	KAFKA-10360: Allow disabling JMX Reporter (KIP-830) (#12046 ) This implements KIP-830: https://cwiki.apache.org/confluence/display/KAFKA/KIP-830%3A+Allow+disabling+JMX+Reporter It adds a new configuration `auto.include.jmx.reporter` that can be set to false to disable the JMX Reporter. This configuration is deprecated and will be removed in the next major version. Reviewers: Tom Bentley <tbentley@redhat.com>, Christo Lolov <christo_lolov@yahoo.com>	2022-08-24 18:30:31 +02:00
Divij Vaidya	9aef992118	MINOR: Catch InvocationTargetException explicitly and propagate underlying cause (#12230 ) Catch InvocationTargetException explicitly and propagate underlying cause Reviewers: Ismael Juma <mlists@juma.me.uk>, Matthew de Detrich <mdedetrich@gmail.com>, Kvicii, Luke Chen <showuon@gmail.com>	2022-08-23 17:34:39 +08:00
dengziming	150fd5b0b1	KAFKA-13914: Add command line tool kafka-metadata-quorum.sh (#12469 ) Add `MetadataQuorumCommand` to describe quorum status, I'm trying to use arg4j style command format, currently, we only support one sub-command which is "describe" and we can specify 2 arguments which are --status and --replication. ``` # describe quorum status kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --replication ReplicaId LogEndOffset Lag LastFetchTimeMs LastCaughtUpTimeMs Status 0 10 0 -1 -1 Leader 1 10 0 -1 -1 Follower 2 10 0 -1 -1 Follower kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status ClusterId: fMCL8kv1SWm87L_Md-I2hg LeaderId: 3002 LeaderEpoch: 2 HighWatermark: 10 MaxFollowerLag: 0 MaxFollowerLagTimeMs: -1 CurrentVoters: [3000,3001,3002] CurrentObservers: [0,1,2] # specify AdminClient properties kafka-metadata-quorum.sh --bootstrap-server localhost:9092 --command-config config.properties describe --status ``` Reviewers: Jason Gustafson <jason@confluent.io>	2022-08-20 08:37:26 -07:00
Guozhang Wang	5d32f24cc3	MINOR: Improve KafkaProducer Javadocs (#12537 ) While reviewing KIP-588 and KIP-691 I went through the exception throwing behavior and wanted to improve the related javadocs a little bit. Reviewers: John Roesler <vvcephei@apache.org>	2022-08-19 10:09:48 -07:00
Luke Chen	bc50f70219	remove sleep in test (#12525 ) Remove spurious sleep in ConsumerCoordinatorTest Reviewers: Ismael Juma <mlists@juma.me.uk>	2022-08-19 14:17:33 +08:00
Jason Gustafson	bc90c29faf	KAFKA-14167; Completion exceptions should not be translated directly to error codes (#12518 ) There are a few cases in `ControllerApis` where we may see an `ApiException` wrapped as a `CompletionException`. This can happen in `QuorumController.allocateProducerIds` where the returned future is the result of calling `thenApply` on the future passed to the controller. The danger when this happens is that the `CompletionException` gets passed to `Errors.forException`, which translates it to an `UNKNOWN_SERVER_ERROR`. At a minimum, I found that the `AllocateProducerIds` and `UpdateFeatures` APIs were affected by this bug, but it is difficult to root out all cases. Interestingly, `DeleteTopics` is not affected by this bug as I originally suspected. This is because we have logic in `ApiError.fromThrowable` to check for both `CompletionException` and `ExecutionException` and to pull out the underlying cause. This patch duplicates this logic from `ApiError.fromThrowable` into `Errors.forException` to be sure that we handle all cases where exceptions are converted to error codes. Reviewers: David Arthur <mumrah@gmail.com>	2022-08-17 18:11:42 -07:00
Jason Gustafson	e5b865d6bf	KAFKA-13940; Return NOT_LEADER_OR_FOLLOWER if DescribeQuorum sent to non-leader (#12517 ) Currently the server will return `INVALID_REQUEST` if a `DescribeQuorum` request is sent to a node that is not the current leader. In addition to being inconsistent with all of the other leader APIs in the raft layer, this error is treated as fatal by both the forwarding manager and the admin client. Instead, we should return `NOT_LEADER_OR_FOLLOWER` as we do with the other APIs. This error is retriable and we can rely on the admin client to retry it after seeing this error. Reviewers: David Jacot <djacot@confluent.io>	2022-08-17 15:48:32 -07:00
Badai Aqrandista	d529d86aa4	KAFKA-13559: Fix issue where responses intermittently takes 300+ ms to respond, even when the server is idle. (#12416 ) Ensures that SSL buffered data is processed by server immediately on the next poll when channel is unmuted after processing previous request. Poll timeout is reset to zero for this case to avoid 300ms delay in poll() if no new data arrives on the sockets. Reviewers: David Mao <dmao@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>	2022-08-15 12:34:03 +01:00
Ismael Juma	91c23a3b93	MINOR: Cleanup NetworkReceive constructors (#12511 ) There was unnecessary duplication and one of the overloads did not set the size field for no good reason. Reviewers: Luke Chen <showuon@gmail.com>	2022-08-14 12:54:20 -07:00
Francesco Nigro	aecd47b3bc	KAFKA-13900 Support Java 9 direct ByteBuffer Checksum methods (#12163 ) Some numbers with JDK 11. Before: ``` Benchmark (bytes) (direct) (readonly) (seed) Mode Cnt Score Error Units Crc32CBenchmark.checksum 128 false false 42 thrpt 20 26.730 ± 0.410 ops/us Crc32CBenchmark.checksum 128 true false 42 thrpt 20 1.781 ± 0.007 ops/us Crc32CBenchmark.checksum 1024 false false 42 thrpt 20 6.553 ± 0.053 ops/us Crc32CBenchmark.checksum 1024 true false 42 thrpt 20 0.223 ± 0.001 ops/us Crc32CBenchmark.checksum 4096 false false 42 thrpt 20 4.054 ± 0.015 ops/us Crc32CBenchmark.checksum 4096 true false 42 thrpt 20 0.056 ± 0.001 ops/us ``` And this PR: ``` Benchmark (bytes) (direct) (readonly) (seed) Mode Cnt Score Error Units Crc32CBenchmark.checksum 128 false false 42 thrpt 20 26.922 ± 0.065 ops/us Crc32CBenchmark.checksum 128 true false 42 thrpt 20 24.656 ± 0.620 ops/us Crc32CBenchmark.checksum 1024 false false 42 thrpt 20 6.548 ± 0.025 ops/us Crc32CBenchmark.checksum 1024 true false 42 thrpt 20 6.432 ± 0.136 ops/us Crc32CBenchmark.checksum 4096 false false 42 thrpt 20 4.031 ± 0.022 ops/us Crc32CBenchmark.checksum 4096 true false 42 thrpt 20 4.004 ± 0.016 ops/us ``` The purpose of the PR is to makes heap and direct ByteBuffer able to perform the same (especially not read-only), without affecting the existing heap ByteBuffer performance. Reviewers: Ismael Juma <ismael@juma.me.uk>, Divij Vaidya <diviv@amazon.com>	2022-08-12 20:09:15 -07:00
Divij Vaidya	f17928e4be	Fix the rate window size calculation for edge cases (#12184 ) ## Problem Implementation of connection creation rate quotas in Kafka is dependent on two configurations: [quota.window.num](https://kafka.apache.org/documentation.html#brokerconfigs_quota.window.num) AND [quota.window.size.seconds](https://kafka.apache.org/documentation.html#brokerconfigs_quota.window.size.seconds) The minimum possible values of these configuration is 1 as per the documentation. However, when we set 1 as the configuration value, we can hit a situation where rate is calculated as NaN (and hence, leads to exceptions). This specific scenario occurs when an event is recorded at the start of a sample window. ## Solution This patch fixes this edge case by ensuring that the windowSize over which Rate is calculated is at least 1ms (even if it is calculated at the start of the sample window). ## Test Added a unit test which fails before the patch and passes after the patch Reviewers: Ismael Juma <ismael@juma.me.uk>, David Mao <dmao@confluent.io>	2022-08-10 06:25:05 -07:00
Niket	48caba9340	KAFKA-14104; Add CRC validation when iterating over Metadata Log Records (#12457 ) This commit adds a check to ensure the RecordBatch CRC is valid when iterating over a Batch of Records using the RecordsIterator. The RecordsIterator is used by both Snapshot reads and Log Records reads in Kraft. The check can be turned off by a class parameter and is on by default. Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>	2022-08-08 15:03:04 -07:00
Mickael Maison	1cc1e776f7	KAFKA-14095: Improve handling of sync offset failures in MirrorMaker (#12432 ) We should not treat UNKNOWN_MEMBER_ID as an unexpected error in the Admin client. In MirrorMaker, check the result of committing offsets and log an useful error message in case that failed with UNKNOWN_MEMBER_ID. Reviewers: Chris Egerton <fearthecellos@gmail.com>	2022-08-01 12:59:41 +02:00
vamossagar12	3ddb62316f	KAFKA-14012: Add warning to closeQuietly documentation about method references of null objects (#12321 ) Reviewers: Kvicii <42023367+Kvicii@users.noreply.github.com>, Chris Egerton <fearthecellos@gmail.com>	2022-07-28 16:44:19 -04:00
Daniel Fonai	160a6ab4cb	KAFKA-13730: OAuth access token validation fails if it does not contain the "sub" claim (#11886 ) Removes the requirement of presence of sub claim in JWT access tokens, when clients authenticate via OAuth. This does not interfere with OAuth specifications and is to ensure wider compatibility with OAuth providers. Unit test added. Reviewers: Kirk True <ktrue@confluent.io>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>	2022-07-27 16:44:54 +05:30
Artem Livshits	badfbacdd0	KAFKA-14020: Performance regression in Producer (#12365 ) As part of KAFKA-10888 work, there were a couple regressions introduced: A call to time.milliseconds() got moved under the queue lock, moving it back outside the lock. The call may be expensive and cause lock contention. Now the call is moved back outside of the lock. The reference to ProducerRecord was held in the batch completion callback, so it was kept alive as long as the batch was alive, which may increase the amount of memory in certain scenario and cause excessive GC work. Now the reference is reset early, so the ProducerRecord lifetime isn't bound to the batch lifetime. Tested via manually crafted benchmark, lock profile shows ~15% lock contention on the ArrayQueue lock without the fix and ~5% lock contention with the fix (which is also consistent with pre-KAFKA-10888 profile). Alloc profile shows ~10% spent in ProducerBatch.completeFutureAndFireCallbacks without the fix vs. ~0.25% with the fix (which is also consistent with pre-KAFKA-10888 profile). Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>	2022-07-20 08:19:31 -07:00
Shawn	eee40200df	KAFKA-14024: Consumer keeps Commit offset in onJoinPrepare in Cooperative rebalance (#12349 ) In KAFKA-13310, we tried to fix a issue that consumer#poll(duration) will be returned after the provided duration. It's because if rebalance needed, we'll try to commit current offset first before rebalance synchronously. And if the offset committing takes too long, the consumer#poll will spend more time than provided duration. To fix that, we change commit sync with commit async before rebalance (i.e. onPrepareJoin). However, in this ticket, we found the async commit will keep sending a new commit request during each Consumer#poll, because the offset commit never completes in time. The impact is that the existing consumer will be kicked out of the group after rebalance timeout without joining the group. That is, suppose we have consumer A in group G, and now consumer B joined the group, after the rebalance, only consumer B in the group. Besides, there's also another bug found during fixing this bug. Before KAFKA-13310, we commitOffset sync with rebalanceTimeout, which will retry when retriable error until timeout. After KAFKA-13310, we thought we have retry, but we'll retry after partitions revoking. That is, even though the retried offset commit successfully, it still causes some partitions offsets un-committed, and after rebalance, other consumers will consume overlapping records. Reviewers: RivenSun <riven.sun@zoom.us>, Luke Chen <showuon@gmail.com>	2022-07-20 10:03:43 +08:00
Rajini Sivaram	ddbc030036	MINOR: Fix options for old-style Admin.listConsumerGroupOffsets (#12406 ) Reviewers: David Jacot <djacot@confluent.io>	2022-07-15 09:21:35 +01:00
Sanjana Kaundinya	beac86f049	KAFKA-13043: Implement Admin APIs for offsetFetch batching (#10964 ) This implements the AdminAPI portion of KIP-709: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=173084258. The request/response protocol changes were implemented in 3.0.0. A new batched API has been introduced to list consumer offsets for different groups. For brokers older than 3.0.0, separate requests are sent for each group. Co-authored-by: Rajini Sivaram <rajinisivaram@googlemail.com> Co-authored-by: David Jacot <djacot@confluent.io> Reviewers: David Jacot <djacot@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>	2022-07-14 13:47:34 +01:00
Kirk True	d3130f2e91	KAFKA-14062: OAuth client token refresh fails with SASL extensions (#12398 ) - Different objects should be considered unique even with same content to support logout - Added comments for SaslExtension re: removal of equals and hashCode - Also swapped out the use of mocks in exchange for real SaslExtensions so that we exercise the use of default equals() and hashCode() methods. - Updates to implement equals and hashCode and add tests in SaslExtensionsTest to confirm Co-authored-by: Purshotam Chauhan <pchauhan@confluent.io> Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	2022-07-12 14:28:19 +05:30
Eugene Tolbakov	a3f06d8814	KAFKA-14013: Limit the length of the `reason` field sent on the wire (#12388 ) KIP-800 added the `reason` field to the JoinGroupRequest and the LeaveGroupRequest as I mean to provide more information to the group coordinator. In https://issues.apache.org/jira/browse/KAFKA-13998, we discovered that the size of the field is limited to 32767 chars by our serialisation mechanism. At the moment, the field either provided directly by the user or constructed internally is directly set regardless of its length. This patch sends only the first 255 chars of the used provided or internally generated reason on the wire. Given the purpose of this field, that seems acceptable and that should still provide enough information to operators to understand the cause of a rebalance. Reviewers: David Jacot <djacot@confluent.io>	2022-07-12 09:31:16 +02:00
SC	23c92ce793	MINOR: Use String#format for niceMemoryUnits result (#12389 ) Reviewers: Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>	2022-07-11 10:36:56 +08:00
vamossagar12	5a1bac2608	KAFKA-13846: Follow up PR to address review comments (#12297 ) Reviewers: Guozhang Wang <wangguoz@gmail.com>	2022-07-07 11:43:38 -07:00
Guozhang Wang	ca8135b242	HOTFIX: KIP-851, rename requireStable in ListConsumerGroupOffsetsOptions	2022-07-06 22:00:31 -07:00
Guozhang Wang	915c781243	KAFKA-10199: Remove main consumer from store changelog reader (#12337 ) When store changelog reader is called by a different thread than the stream thread, it can no longer use the main consumer to get committed offsets since consumer is not thread-safe. Instead, we would remove main consumer and leverage on the existing admin client to get committed offsets. Reviewers: Bruno Cadonna <cadonna@apache.org>	2022-07-06 17:23:18 -07:00
Prashanth Joseph Babu	1daa149730	MINOR: record lag max metric documentation enhancement (#12367 ) Reviewers: Matthias J. Sax <matthias@confluent.io>	2022-06-30 09:30:14 -07:00
David Jacot	cf9c3a2eca	MINOR: Fix group coordinator is unavailable log (#12335 ) Reviewers: Kvicii <42023367+Kvicii@users.noreply.github.com>, Luke Chen <showuon@gmail.com>	2022-06-24 11:48:09 +02:00
Viktor Somogyi-Vass	d65d886798	KAFKA-6945: KIP-373, allow users to create delegation token for others (#10738 ) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	2022-06-21 12:51:08 +05:30
James Hughes	30216ea1c5	KAFKA-13998: JoinGroupRequestData 'reason' can be too large (#12298 ) The `reason` field cannot contain more than 32767 chars. We did not expect to ever reach this but it turns out that it is possible if the the message provided in the `Throwable` somehow contains the entire stack trace. This patch ensure that the reason crafted based on exceptions remain small. Co-authored-by: David Jacot <djacot@confluent.io> Reviewers: Bruno Cadonna <cadonna@apache.org>, A. Sophie Blee-Goldman <ableegoldman@apache.org>, David Jacot <djacot@confluent.io>	2022-06-20 16:47:02 +02:00
Guozhang Wang	cfdd567955	KAFKA-13880: Remove DefaultPartitioner from StreamPartitioner (#12304 ) There are some considerata embedded in this seemingly straight-forward PR that I'd like to explain here. The StreamPartitioner is used to send records to three types of topics: 1) repartition topics, where key should never be null. 2) changelog topics, where key should never be null. 3) sink topics, where only non-windowed key could be null and windowed key should still never be null. Also, the StreamPartitioner is used as part of the IQ to determine which host contains a certain key, as determined by the case 2) above. This PR's main goal is to remove the deprecated producer's default partitioner, while with those things in mind such that: We want to make sure for not-null keys, the default murmur2 hash behavior of the streams' partitioner stays consistent with producer's new built-in partitioner. For null-keys (which is only possible for non-window default stream partition, and is never used for IQ), we would fix the issue that we may never rotate to a new partitioner by setting the partition as null hence relying on the newly introduced built-in partitioner. Reviewers: Artem Livshits <84364232+artemlivshits@users.noreply.github.com>, Matthias J. Sax <matthias@confluent.io>	2022-06-17 20:17:02 -07:00
RivenSun	1e21201ea2	KAFKA-13890: Improve documentation of `ssl.keystore.type` and `ssl.truststore.type` (#12226 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>  , David Jacot <djacot@confluent.io>, Kvicii <kvicii.yu@gmail.com>	2022-06-17 16:31:13 +02:00
Divij Vaidya	17637c4ad5	MINOR: Clean up tmp files created by tests (#12233 ) There are a bunch of tests which do not clean up after themselves. This leads to accumulation of files in the tmp directory of the system on which the tests are running. This code change fixes some of the main culprit tests which leak the files in the temporary directory. Reviewers: Ismael Juma <ismael@juma.me.uk>, Kvicii <kvicii.yu@gmail.com>	2022-06-16 16:46:07 -07:00
Niket	a126e3a622	KAFKA-13888; Addition of Information in DescribeQuorumResponse about Voter Lag (#12206 ) This commit adds an Admin API handler for DescribeQuorum Request and also adds in two new fields LastFetchTimestamp and LastCaughtUpTimestamp to the DescribeQuorumResponse as described by KIP-836. This commit does not implement the newly added fields. Those will be added in a subsequent commit. Reviewers: dengziming <dengziming1993@gmail.com>, David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>	2022-06-15 09:20:15 -07:00
Guozhang Wang	39a555ba94	KAFKA-13846: Use the new addMetricsIfAbsent API (#12287 ) Use the newly added function to replace the old addMetric function that may throw illegal argument exceptions. Although in some cases concurrency should not be possible they do not necessarily remain always true in the future, so it's better to use the new API just to be less error-prone. Reviewers: Bruno Cadonna <cadonna@apache.org>	2022-06-14 16:04:26 -07:00
Mickael Maison	4fcfd9ddc4	KAFKA-13958: Expose logdirs total/usable space via Kafka API (KIP-827) (#12248 ) This implements KIP-827: https://cwiki.apache.org/confluence/display/KAFKA/KIP-827%3A+Expose+logdirs+total+and+usable+space+via+Kafka+API Add TotalBytes and UsableBytes to DescribeLogDirsResponse Add matching getters on LogDirDescription Reviewers: Tom Bentley <tbentley@redhat.com>, Divij Vaidya<diviv@amazon.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Igor Soarez <soarez@apple.com>	2022-06-14 14:20:29 +02:00
David Jacot	f83d95d9a2	KAFKA-13916; Fenced replicas should not be allowed to join the ISR in KRaft (KIP-841, Part 2) (#12181 ) This path implements [KIP-841](https://cwiki.apache.org/confluence/display/KAFKA/KIP-841%3A+Fenced+replicas+should+not+be+allowed+to+join+the+ISR+in+KRaft). Specifically, it implements the following: * It introduces INELIGIBLE_REPLICA and NEW_LEADER_ELECTED error codes. * The KRaft controller validates the new ISR provided in the AlterPartition request and rejects the call if any replica in the new ISR is not eligible to join the the ISR - e.g. when fenced or shutting down. The leader reverts to the last committed ISR when its request is rejected due to this. * The partition leader also verifies that a replica is eligible before trying to add it back to the ISR. If it is not eligible, the ISR expansion is not triggered at all. * Updates the AlterPartition API to use topic ids. Updates the AlterPartition manger to handle topic names/ids. Updates the ZK controller and the KRaft controller to handle topic names/ids depending on the version of the request used. Reviewers: Artem Livshits <84364232+artemlivshits@users.noreply.github.com>, José Armando García Sancio <jsancio@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>	2022-06-14 13:12:45 +02:00
vamossagar12	5cab11cf52	KAFKA-13846: Adding overloaded metricOrElseCreate method (#12121 ) Reviewers: David Jacot <djacot@confluent.io>, Justine Olshan <jolshan@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	2022-06-13 10:36:39 -07:00
Shawn	c23d60d56c	KAFKA-13891: reset generation when syncgroup failed with REBALANCE_IN_PROGRESS (#12140 ) Reviewers: Luke Chen <showuon@gmail.com>	2022-06-13 10:39:54 +08:00
Chris Egerton	7692643f59	KAFKA-13967: Document guarantees for producer callbacks on transaction commit (#12264 ) Clarify in producer docs that `send` callbacks are invoked prior to transaction completion. Reviewers: Tom Bentley <tbentley@redhat.com>, Jason Gustafson <jason@confluent.io>	2022-06-10 09:51:36 -07:00
András Csáki	3d5b41e05f	[KAFKA-13848] Clients remain connected after SASL re-authentication f… (#12179 ) Clients remain connected and able to produce or consume despite an expired OAUTHBEARER token. Root cause seems to be SaslServerAuthenticator#calcCompletionTimesAndReturnSessionLifetimeMs failing to set ReauthInfo#sessionExpirationTimeNanos when tokens have already expired (when session life time goes negative), in turn causing KafkaChannel#serverAuthenticationSessionExpired returning false and finally SocketServer not closing the channel. The issue is observed with OAUTHBEARER but seems to have a wider impact on SASL re-authentication. Reviewers: Luke Chen <showuon@gmail.com>, Tom Bentley <tbentley@redhat.com>, Sam Barker <sbarker@redhat.com>	2022-06-10 21:33:33 +08:00
Christo Lolov	6c90f3335e	KAFKA-13947: Use %d formatting for integers rather than %s (#12267 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Divij Vaidya <diviv@amazon.com>, Kvicii <kvicii.yu@gmail.com>	2022-06-10 13:55:52 +02:00
Viktor Somogyi-Vass	f8f57960c6	KAFKA-13917: Avoid calling lookupCoordinator() in tight loop (#12180 ) Reviewers: Luke Chen <showuon@gmail.com>	2022-06-10 12:05:05 +08:00
nicolasguyomar	ae279a9d26	MINOR: Improve description of `cleanup.policy` (#12086 ) Clarifies description of `cleanup.policy`, in particular for the case when both "delete" and "compact" are specified. Reviewers: Divij Vaidya <divijvaidya13@gmail.com>, Jason Gustafson <jason@confluent.io>	2022-06-07 17:54:20 -07:00
bozhao12	75223b668d	MINOR: A fewer method javadoc and typo fix (#12253 ) Fixes an unneeded parameter doc in `MemoryRecordsBuilder` and a typo in `LazyDownConversionRecordsSend`. Reviewers: Kvicii <42023367+Kvicii@users.noreply.github.com>, Jason Gustafson <jason@confluent.io> Co-authored-by: zhaobo <zhaobo@kuaishou.com>	2022-06-06 12:25:05 -07:00
Guozhang Wang	5d593287c7	HOTFIX: add space to avoid checkstyle failure	2022-06-06 11:34:59 -07:00
Guozhang Wang	2047fc3715	HOTFIX: only try to clear discover-coordinator future upon commit (#12244 ) This is another way of fixing KAFKA-13563 other than #11631. Instead of letting the consumer to always try to discover coordinator in pool with either mode (subscribe / assign), we defer the clearance of discover future upon committing async only. More specifically, under manual assign mode, there are only three places where we need the coordinator: * commitAsync (both by the consumer itself or triggered by caller), this is where we want to fix. * commitSync, which we already try to re-discovery coordinator. * committed (both by the consumer itself based on reset policy, or triggered by caller), which we already try to re-discovery coordinator. The benefits are that for manual assign mode that does not try to trigger any of the above three, then we never would be discovering coordinator. The original fix in #11631 would let the consumer to discover coordinator even if none of the above operations are required. Reviewers: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>	2022-06-06 11:05:41 -07:00
Federico Valeri	af71375d6d	KAFKA-13933: Fix stuck SSL unit tests in case of authentication failure (#12159 ) When there is an authentication error after the initial TCP connection, the selector never becomes READY, and these tests wait forever waiting for this state. This will happen while using an JDK like OpenJDK build that does not support the required cipher suites. Reviewers: Luke Chen <showuon@gmail.com>, Tom Bentley <tbentley@redhat.com>, Divij Vaidya <diviv@amazon.com>	2022-06-05 15:47:09 +08:00
dengziming	1d6e3d6cb3	KAFKA-13845: Add support for reading KRaft snapshots in kafka-dump-log (#12084 ) The kafka-dump-log command should accept files with a suffix of ".checkpoint". It should also decode and print using JSON the snapshot header and footer control records. Reviewers: José Armando García Sancio <jsancio@users.noreply.github.com>	2022-06-01 14:49:00 -07:00
José Armando García Sancio	7d1b0926fa	KAFKA-13883: Implement NoOpRecord and metadata metrics (#12183 ) Implement NoOpRecord as described in KIP-835. This is controlled by the new metadata.max.idle.interval.ms configuration. The KRaft controller schedules an event to write NoOpRecord to the metadata log if the metadata version supports this feature. This event is scheduled at the interval defined in metadata.max.idle.interval.ms. Brokers and controllers were improved to ignore the NoOpRecord when replaying the metadata log. This PR also addsffour new metrics to the KafkaController metric group, as described KIP-835. Finally, there are some small fixes to leader recovery. This PR fixes a bug where metadata version 3.3-IV1 was not marked as changing the metadata. It also changes the ReplicaControlManager to accept a metadata version supplier to determine if the leader recovery state is supported. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2022-06-01 10:48:24 -07:00
David Jacot	283a9955cf	MINOR: inline metrics in RecordAccumulator (#12227 ) Reviewers: Kvicii <Karonazaba@gmail.com>, Jason Gustafson <jason@confluent.io>	2022-06-01 09:48:31 +02:00
David Jacot	6b93652a54	MINOR: Improve code style in FenceProducersHandler (#12208 ) Reviewers: Jason Gustafson <jason@confluent.io>	2022-05-28 09:15:22 -07:00
Colin Patrick McCabe	7143267f71	MINOR: Fix some bugs with UNREGISTER_BROKER Fix some bugs in the KRaft unregisterBroker API and add a junit test. 1. kafka-cluster-tool.sh unregister should fail if no broker ID is passed. 2. UnregisterBrokerRequest must be marked as a KRaft broker API so that KRaft brokers can receive it. 3. KafkaApis.scala must forward UNREGISTER_BROKER to the controller. Reviewers: Jason Gustafson <jason@confluent.io>, dengziming <dengziming1993@gmail.com>, David Jacot <djacot@confluent.io>	2022-05-26 14:07:29 -07:00
dengziming	c22d320a5c	KAFKA-12902: Add unit32 type in generator (#10830 ) Add uint32 support in the KRPC generator. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2022-05-25 16:25:16 -07:00
dengziming	54d60ced86	KAFKA-13833: Remove the min_version_level from the finalized version range written to ZooKeeper (#12062 ) Reviewers: David Arthur <mumrah@gmail.com>	2022-05-25 14:02:34 -04:00
David Arthur	1135f22eaf	KAFKA-13830 MetadataVersion integration for KRaft controller (#12050 ) This patch builds on #12072 and adds controller support for metadata.version. The kafka-storage tool now allows a user to specify a specific metadata.version to bootstrap into the cluster, otherwise the latest version is used. Upon the first leader election of the KRaft quroum, this initial metadata.version is written into the metadata log. When writing snapshots, a FeatureLevelRecord for metadata.version will be written out ahead of other records so we can decode things at the correct version level. This also includes additional validation in the controller when setting feature levels. It will now check that a given metadata.version is supportable by the quroum, not just the brokers. Reviewers: José Armando García Sancio <jsancio@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, dengziming <dengziming1993@gmail.com>, Alyssa Huang <ahuang@confluent.io>	2022-05-18 12:08:36 -07:00
Dejan Maric	1c02a764ec	KAFKA-12703; Allow unencrypted private keys when using PEM files (#11916 ) Reviewers: David Jacot <djacot@confluent.io>	2022-05-16 09:25:05 +02:00
Colin Patrick McCabe	fa59be4e77	KAFKA-13649: Implement early.start.listeners and fix StandardAuthorizer loading (#11969 ) Since the StandardAuthorizer relies on the metadata log to store its ACLs, we need to be sure that we have the latest metadata before allowing the authorizer to be used. However, if the authorizer is not usable for controllers in the cluster, the latest metadata cannot be fetched, because inter-node communication cannot occur. In the initial commit which introduced StandardAuthorizer, we punted on the loading issue by allowing the authorizer to be used immediately. This commit fixes that by implementing early.start.listeners as specified in KIP-801. This will allow in superusers immediately, but throw the new AuthorizerNotReadyException if non-superusers try to use the authorizer before StandardAuthorizer#completeInitialLoad is called. For the broker, we call StandardAuthorizer#completeInitialLoad immediately after metadata catch-up is complete, right before unfencing. For the controller, we call StandardAuthorizer#completeInitialLoad when the node has caught up to the high water mark of the cluster metadata partition. This PR refactors the SocketServer so that it creates the configured acceptors and processors in its constructor, rather than requiring a call to SocketServer#startup A new function, SocketServer#enableRequestProcessing, then starts the threads and begins listening on the configured ports. enableRequestProcessing uses an async model: we will start the acceptor and processors associated with an endpoint as soon as that endpoint's authorizer future is completed. Also fix a bug where the controller and listener were sharing an Authorizer when in co-located mode, which was not intended. Reviewers: Jason Gustafson <jason@confluent.io>	2022-05-12 14:48:33 -07:00
José Armando García Sancio	e94934b6b7	MINOR; DeleteTopics version tests (#12141 ) Add a DeleteTopics test for all supported versions. Convert the DeleteTopicsRequestTest to run against both ZK and KRaft mode. Reviewers: Colin Patrick McCabe <cmccabe@apache.org>, dengziming <dengziming1993@gmail.com>	2022-05-12 13:04:48 -07:00
Ismael Juma	3ea7b418fb	MINOR: Make TopicPartitionBookkeeper and TopicPartitionEntry top level (#12097 ) This is the first step towards refactoring the `TransactionManager` so that it's easier to understand and test. The high level idea is to push down behavior to `TopicPartitionEntry` and `TopicPartitionBookkeeper` and to encapsulate the state so that the mutations can only be done via the appropriate methods. Inner classes have no mechanism to limit access from the outer class, which presents a challenge when mutability is widespread (like we do here). As a first step, we make `TopicPartitionBookkeeper` and `TopicPartitionEntry` top level and rename them and a couple of methods to make the intended usage clear and avoid redundancy. To make the review easier, we don't change anything else except access changes required for the code to compile. The next PR will contain the rest of the refactoring. Reviewers: Jason Gustafson <jason@confluent.io>	2022-05-11 09:30:46 -07:00
chern	eeb1e702eb	KAFKA-13879: Reconnect exponential backoff is ineffective in some cases (#12131 ) When a client connects to a SSL listener using PLAINTEXT security protocol, after the TCP connection is setup, the client considers the channel setup is complete. In reality the channel setup is not complete yet. The client then resets reconnect exponential backoff and issues API version request. Since the broker expects SSL handshake, the API version request will cause the connection to disconnect. Client reconnects without exponential backoff since it has been reset. This commit removes the reset of reconnect exponential backoff when sending API version request. In the good case where the channel setup is complete, reconnect exponential backoff will be reset when the node becomes ready, which is after getting the API version response. Inter-broker clients which do not send API version request and go directly to ready state continue to reset backoff before any successful requests. Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>	2022-05-10 11:36:42 +01:00
RivenSun	df507e56e2	KAFKA-13793: Add validators for configs that lack validators (#12010 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Luke Chen <showuon@gmail.com>, Chris Egerton <fearthecellos@gmail.com>, Christo Lolov <lolovc@amazon.com>, Divij Vaidya <divijvaidya13@gmail.com>	2022-05-09 20:29:17 +02:00
Artem Livshits	f7db6031b8	KAFKA-10888: Sticky partition leads to uneven produce msg (#12049 ) The design is described in detail in KIP-794 https://cwiki.apache.org/confluence/display/KAFKA/KIP-794%3A+Strictly+Uniform+Sticky+Partitioner. Implementation notes: The default partitioning logic is moved to the BuiltInPartitioner class (there is one object per topic). The object keeps track of how many bytes are produced per-partition and once the amount exceeds batch.size, switches to the next partition (note that partition switch decision is decoupled from batching). The object also keeps track of probability weights that are based on the queue sizes (the larger the queue size is the less chance for the next partition to be chosen). The queue sizes are calculated in the RecordAccumulator in the `ready` method, the method already enumerates all partitions so we just add some extra logic into the existing O(N) method. The partition switch decision may take O(logN), where N is the number partitions per topic, but it happens only once per batch.size (and the logic is avoided when all queues are of equal size). Produce bytes accounting logic is lock-free. When partitioner.availability.timeout.ms is non-0, RecordAccumulator keeps stats on "node latency" which is defined as the difference between the last time the node had a batch waiting to be send and the last time the node was ready to take a new batch. If this difference exceeds partitioner.availability.timeout.ms we don't switch to that partition until the node is ready. Reviewers: Jun Rao <junrao@gmail.com>	2022-05-06 11:31:12 -07:00
Joel Hamill	18b84d0404	MINOR: Fix typos in configuration docs (#11874 ) Reviewers: Chris Egerton, Weikang Sun, Andrew Eugene Choi, Luke Chen, Guozhang Wang	2022-05-04 10:27:14 -07:00
dengziming	bf7cd675f8	MINOR: Remove duplicated test cases in MetadataVersionTest (#12116 ) These tests belongs to ApiVersionsResponseTest, and accidentally copied them to MetadataVersionTest when working on #12072. Reviewer: Luke Chen <showuon@gmail.com>	2022-05-04 11:10:39 +08:00
Alyssa Huang	8245c9a3d5	KAFKA-13854 Refactor ApiVersion to MetadataVersion (#12072 ) Refactoring ApiVersion to MetadataVersion to support both old IBP versioning and new KRaft versioning (feature flags) for KIP-778. IBP versions are now encoded as enum constants and explicitly prefixed w/ IBP_ instead of KAFKA_, and having a LegacyApiVersion vs DefaultApiVersion was not necessary and replaced with appropriate parsing rules for extracting the correct shortVersions/versions. Co-authored-by: David Arthur <mumrah@gmail.com> Reviewers: Ismael Juma <ismael@juma.me.uk>, David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>, Colin P. McCabe <cmccabe@apache.org>	2022-05-02 16:27:52 -07:00
Ismael Juma	c462a657ec	KAFKA-13794: Fix comparator of inflightBatchesBySequence in TransactionsManager (round 3) (#12096 ) Conceptually, the ordering is defined by the producer id, producer epoch and the sequence number. This set should generally only have entries for the same producer id and epoch, but there is one case where we can have conflicting `remove` calls and hence we add this as a temporary safe fix. We'll follow-up with a fix that ensures the original intended invariant. Reviewers: Jason Gustafson <jason@confluent.io>, David Jacot <djacot@confluent.io>, Luke Chen <showuon@gmail.com>	2022-04-28 06:13:23 -07:00
Guozhang Wang	e026384ffb	HOTFIX: Only measure in nano when producer metadata refresh is required (#12102 ) We added the metadata wait time in total blocked time (#11805). But we added it in the critical path of send which is called per-record, whereas metadata refresh only happens rarely. This way the cost of time.nanos becomes unnecessarily significant as we call it twice per record. This PR moves the call to inside the waitOnMetadata callee and only when we do need to wait for a metadata refresh round-trip (i.e. we are indeed blocking). Reviewers: Matthias J. Sax <matthias@confluent.io>	2022-04-27 11:27:54 -07:00
Mike Tobola	6d7723f073	MINOR: fix html generation syntax errors (#12094 ) The html document generation has some errors in it, specifically related to protocols. The two issues identified and resolved are: * Missing </tbody> closing tags added * Invalid usage of a <p> tag as a wrapper element for <table> elements. Changed the <p> tag to be a <div>. Tested by running ./gradlew siteDocsTar and observing that the output was properly formed. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2022-04-26 16:51:12 -07:00
Jason Gustafson	f2a782a4d7	MINOR: Rename `AlterIsrManager` to `AlterPartitionManager` (#12089 ) Since we have changed the `AlterIsr` API to `AlterPartition`, it makes sense to rename `AlterIsrManager` as well and some of the associated classes. Reviewers: dengziming <dengziming1993@gmail.com>, David Jacot <djacot@confluent.io>	2022-04-26 09:34:18 -07:00
Philip Nee	b020819ac4	KAFKA-12841: Remove an additional call of onAcknowledgement (#12064 ) The bug was introduced in #11689 that an additional onAcknowledgement was made using the InterceptorCallback class. This is undesirable since onSendError will attempt to call onAcknowledgement once more. Reviewers: Jun Rao <junrao@gmail.com>	2022-04-25 15:59:45 -07:00
RivenSun	2b64f1a571	MINOR: Using enums for auto.offset.reset configuration (#12077 ) Using enums instead of Strings for auto.offset.reset configuration Reviewers: Divij Vaidya <divijvaidya13@gmail.com>, Luke Chen <showuon@gmail.com	2022-04-24 20:54:44 +08:00
ruanliang	e8c675ed56	KAFKA-13834: add test coverage for RecordAccumulatorTest (#12092 ) Reviewers: Luke Chen <showuon@gmail.com>	2022-04-24 17:06:19 +08:00
David Jacot	7c8c65fc54	MINOR: Rename `ZkVersion` to `PartitionEpoch` (#12071 ) This patch does some initial cleanups in the context of KAFKA-13790. Mainly, it renames `ZkVersion` field to `PartitionEpoch` in the `LeaderAndIsrRequest`, the `LeaderAndIsr` and the `Partition`. Reviewers: Jason Gustafson <jason@confluent.io>, dengziming <dengziming1993@gmail.com>	2022-04-22 20:38:17 +02:00
José Armando García Sancio	4380eae7ce	MINOR; Fix partition change record noop check (#12073 ) When LeaderRecoveryState was added to the PartitionChangeRecord, the check for being a noop was not updated. This commit fixes that and improves the associated test to avoid this oversight in the future. Reviewers: Colin Patrick McCabe <cmccabe@apache.org>	2022-04-21 09:05:46 -07:00
ruanliang	252e09501d	KAFKA-13834: fix drain batch starving issue (#12066 ) In drainBatchesForOneNode method, there's possibility causing some partitions in a node will never get picked. Fix this issue by maintaining a drainIndex for each node. Reviewers: Luke Chen <showuon@gmail.com>, RivenSun <91005273+RivenSun2@users.noreply.github.com>	2022-04-21 19:26:55 +08:00
RivenSun	6b07f42ecd	MINOR: cleanup for postProcessAndValidateIdempotenceConfigs method (#12069 ) Reviewers: Luke Chen <showuon@gmail.com>	2022-04-21 14:34:35 +08:00
RivenSun	cf5e714a8b	MINOR: ignore unused configuration when ConsumerCoordinator is not constructed (#12041 ) Following PR #11940, ignore unused config when ConsumerCoordinator is not constructed. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2022-04-14 17:30:43 -07:00
RivenSun	f49cff412d	MINOR: Remove redundant conditional judgments in Selector.clear() (#12048 ) Condition 'sendFailed' is always 'false' when reached. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2022-04-14 17:25:37 -07:00
David Arthur	55ff5d3603	KAFKA-13823 Feature flag changes from KIP-778 (#12036 ) This PR includes the changes to feature flags that were outlined in KIP-778. Specifically, it changes UpdateFeatures and FeatureLevelRecord to remove the maximum version level. It also adds dry-run to the RPC so the controller can actually attempt the upgrade (rather than the client). It introduces an upgrade type enum, which supersedes the allowDowngrade boolean. Because FeatureLevelRecord was unused previously, we do not need to introduce a new version. The kafka-features.sh tool was overhauled in KIP-778 and now includes the describe, upgrade, downgrade, and disable sub-commands. Refer to [KIP-778](https://cwiki.apache.org/confluence/display/KAFKA/KIP-778%3A+KRaft+Upgrades) for more details on the new command structure. Reviewers: Colin P. McCabe <cmccabe@apache.org>, dengziming <dengziming1993@gmail.com>	2022-04-14 10:04:32 -07:00
dengziming	87aa8259dd	KAFKA-13743: Prevent topics with conflicting metrics names from being created in KRaft mode #11910 In ZK mode, the topic "foo_bar" will conflict with "foo.bar" because of limitations in metric names. We should implement this in KRaft mode. This PR also changes TopicCommandIntegrationTest to support KRaft mode. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2022-04-13 11:59:29 -07:00
Jason Gustafson	f97646488c	KAFKA-13651; Add audit logging to `StandardAuthorizer` (#12031 ) This patch adds audit support through the kafka.authorizer.logger logger to StandardAuthorizer. It follows the same conventions as AclAuthorizer with a similarly formatted log message. When logIfAllowed is set in the Action, then the log message is at DEBUG level; otherwise, we log at trace. When logIfDenied is set, then the log message is at INFO level; otherwise, we again log at TRACE. Reviewers: Colin P. McCabe <cmccabe@apache.org>	2022-04-13 10:33:15 -07:00
David Jacot	4eeb707107	KAFKA-13828; Ensure reasons sent by the consumer are small (#12043 ) This PR reworks the reasons used in the ConsumerCoordinator to ensure that they remain reasonably short. Reviewers: Bruno Cadonna <bruno@confluent.io>	2022-04-13 13:42:27 +02:00
RivenSun	1df232c839	MINOR: Supplement the description of `Valid Values` in the documentation of `compression.type` (#11985 ) Because a validator is added to ProducerConfig.COMPRESSION_TYPE_CONFIG and KafkaConfig.CompressionTypeProp, the corresponding testCase is improved to verify whether the wrong value of compression.type will throw a ConfigException. Reviewers: Mickael Maison <mickael.maison@gmail.com>, Guozhang Wang <wangguoz@gmail.com>	2022-04-12 21:24:57 -07:00
RivenSun	4ad439c56d	MINOR: Change the log output information in the KafkaConsumer assign method (#12026 ) Reviewers: Luke Chen <showuon@gmail.com>	2022-04-11 10:06:56 +08:00
Xiaoyue Xue	9596c7b9cf	KAFKA-13794: Follow up to fix producer batch comparator (#12006 ) In comparator, objects that are not equal need to have a stable order otherwise, binary search may not find the objects. Improve the producer batch comparator	2022-04-09 10:58:16 +08:00
Alok Nikhil	7a5f0cfaef	MINOR: Fix DescribeLogDirs API error handling for older API versions (#12017 ) With KAFKA-13527 / KIP-784 we introduced a new top-level error code for the DescribeLogDirs API for versions 3 and above. However, the change regressed the error handling for versions less than 3 since the response converter fails to write the non-zero error code out (rightly) for versions lower than 3 and drops the response to the client which eventually times out instead of receiving an empty log dirs response and processing that as a Cluster Auth failure. With this change, the API conditionally propagates the error code out to the client if the request API version is 3 and above. This keeps the semantics of the error handling the same for all versions and restores the behavior for older versions. See current behavior in the broker log: ```bash ERROR] 2022-04-08 01:22:56,406 [data-plane-kafka-request-handler-10] kafka.server.KafkaApis - [KafkaApi-0] Unexpected error handling request RequestHeader(apiKey=DESCRIBE_LOG_DIRS, apiVersion=0, clientId=sarama, correlationId=1) -- DescribeLogDirsRequestData(topics=null) org.apache.kafka.common.errors.UnsupportedVersionException: Attempted to write a non-default errorCode at version 0 [ERROR] 2022-04-08 01:22:56,407 [data-plane-kafka-request-handler-10] kafka.server.KafkaRequestHandler - [Kafka Request Handler 10 on Broker 0], Exception when handling request org.apache.kafka.common.errors.UnsupportedVersionException: Attempted to write a non-default errorCode at version 0 ``` Reviewers: Ismael Juma <ismael@juma.me.uk>	2022-04-08 12:54:09 -07:00
bozhao12	a2149c4178	MINOR: Fix method javadoc and typo in comments (#12007 ) Reviewers: Luke Chen <showuon@gmail.com>	2022-04-07 18:16:12 +08:00
Xiaoyue Xue	e7cfbad04f	MINOR: Clean up for TransactionManager and RecordAccumulator (#11979 ) Reviewers: Luke Chen <showuon@gmail.com>	2022-04-07 14:24:23 +08:00
Xavier Léauté	29a6979c54	KAFKA-6204 KAFKA-7402 ProducerInterceptor should implement AutoCloseable (#11997 ) As part of KIP-376 we had ConsumerInterceptor implement AutoCloseable but forgot to do the same for ProducerInterceptor. This fixes the inconsistency and also addresses KAFKA-6204 at the same time. Reviewers: John Roesler <vvcephei@apache.org>	2022-04-05 21:23:31 -05:00
Xiaoyue Xue	f0a2b62b0e	KAFKA-13794; Fix comparator of `inflightBatchesBySequence` in `TransactionManager` (#11991 ) Fixes a bug in the comparator used to sort producer inflight batches for a topic partition. This can cause batches in the map `inflightBatchesBySequence` to be removed incorrectly: i.e. one batch may be removed by another batch with the same sequence number. This leads to an `IllegalStateException` when the inflight request finally returns. This patch fixes the comparator to check equality of the `ProducerBatch` instances if the base sequences match. Reviewers: Jason Gustafson <jason@confluent.io>	2022-04-05 10:03:33 -07:00
Jason Gustafson	3ceedac79e	KAFKA-13782; Ensure correct partition added to txn after abort on full batch (#11995 ) Fixes a regression introduced in https://github.com/apache/kafka/pull/11452. Following [KIP-480](https://cwiki.apache.org/confluence/display/KAFKA/KIP-480%3A+Sticky+Partitioner), the `Partitioner` will receive a callback when a batch has been completed so that it can choose another partition. Because of this, we have to wait until the batch has been successfully appended to the accumulator before adding the partition in `TransactionManager.maybeAddPartition`. This is still safe because the `Sender` cannot dequeue a batch from the accumulator until it has been added to the transaction successfully. Reviewers: Artem Livshits <84364232+artemlivshits@users.noreply.github.com>, David Jacot <djacot@confluent.io>, Tom Bentley <tbentley@redhat.com>	2022-04-05 09:48:21 -07:00
yun-yun	481cc13a13	KAFKA-13791: Fix potential race condition in FetchResponse#`fetchData` and `forgottenTopics` (#11981 ) Fix FetchResponse#`fetchData` and `forgottenTopics`: Assignment of lazy-initialized members should be the last step with double-checked locking Reviewers: Luke Chen <showuon@gmail.com>	2022-04-05 15:27:32 +08:00
bozhao12	02a465b090	MINOR: fix typo in FetchRequest.json (#11988 ) Reviewers: David Jacot <djacot@confluent.io>	2022-04-04 09:07:58 +02:00
Colin Patrick McCabe	62ea4c46a9	KAFKA-13749: CreateTopics in KRaft must return configs (#11941 ) Previously, when in KRaft mode, CreateTopics did not return the active configurations for the topic(s) it had just created. This PR addresses that gap. We will now return these topic configuration(s) when the user has DESCRIBE_CONFIGS permission. (In the case where the user does not have this permission, we will omit the configurations and set TopicErrorCode. We will also omit the number of partitions and replication factor data as well.) For historical reasons, we use different names to refer to each topic configuration when it is set in the broker context, as opposed to the topic context. For example, the topic configuration "segment.ms" corresponds to the broker configuration "log.roll.ms". Additionally, some broker configurations have synonyms. For example, the broker configuration "log.roll.hours" can be used to set the log roll time instead of "log.roll.ms". In order to track all of this, this PR adds a table in LogConfig.scala which maps each topic configuration to an ordered list of ConfigSynonym classes. (This table is then passed to KafkaConfigSchema as a constructor argument.) Some synonyms require transformations. For example, in order to convert from "log.roll.hours" to "segment.ms", we must convert hours to milliseconds. (Note that our assumption right now is that topic configurations do not have synonyms, only broker configurations. If this changes, we will need to add some logic to handle it.) This PR makes the 8-argument constructor for ConfigEntry public. We need this in order to make full use of ConfigEntry outside of the admin namespace. This change is probably inevitable in general since otherwise we cannot easily test the output from various admin APIs in junit tests outside the admin package. Testing: This PR adds PlaintextAdminIntegrationTest#testCreateTopicsReturnsConfigs. This test validates some of the configurations that it gets back from the call to CreateTopics, rather than just checking if it got back a non-empty map like some of the existing tests. In order to test the configuration override logic, testCreateDeleteTopics now sets up some custom static and dynamic configurations. In QuorumTestHarness, we now allow tests to configure what the ID of the controller should be. This allows us to set dynamic configurations for the controller in testCreateDeleteTopics. We will have a more complete fix for setting dynamic configuations on the controller later. This PR changes ConfigurationControlManager so that it is created via a Builder. This will make it easier to add more parameters to its constructor without having to update every piece of test code that uses it. It will also make the test code easier to read. Reviewers: David Arthur <mumrah@gmail.com>	2022-04-01 10:50:25 -07:00
David Jacot	ce7788aada	KAFKA-13783; Remove reason prefixing in JoinGroupRequest and LeaveGroupRequest (#11971 ) KIP-800 introduced a mechanism to pass a reason in the join group request and in the leave group request. A default reason is used unless one is provided by the user. In this case, the custom reason is prefixed by the default one. When we tried to used this in Kafka Streams, we noted a significant degradation of the performances, see https://github.com/apache/kafka/pull/11873. It is not clear wether the prefixing is the root cause of the issue or not. To be on the safe side, I think that we should remove the prefixing. It does not bring much anyway as we are still able to distinguish a custom reason from the default one on the broker side. This patch removes prefixing the user provided reasons. So if a the user provides a reason, the reason is used directly. If the reason is empty or null, the default reason is used. Reviewers: Luke Chen <showuon@gmail.com>, <jeff.kim@confluent.io>, Hao Li <hli@confluent.io>	2022-03-31 14:31:31 +02:00
Xiaobing Fang	8965240da3	MINOR: Fix doc variable typos in `TopicConfig` (#11972 ) Reviewers: Luke Chen <showuon@gmail.com>	2022-03-31 09:54:42 +08:00
yun-yun	366b998a22	KAFKA-13777: Fix potential FetchResponse#responseData race condition issue (#11963 ) In Fix FetchResponse#responseData, we did a double-checked lock for the responseData, but the assignment of lazy-initialized object(responseData) didn't assign in the last step, which would let other threads get the partial object. Reviewers: David Jacot <djacot@confluent.io>, Luke Chen <showuon@gmail.com>	2022-03-31 09:45:33 +08:00
Ismael Juma	5aed178048	KAFKA-13418: Support key updates with TLS 1.3 (#11966 ) Key updates with TLS 1.3 trigger code paths similar to renegotiation with TLS 1.2. Update the read/write paths not to throw an exception in this case (kept the exception in the `handshake` method). With the default configuration, key updates happen after 2^37 bytes are encrypted. There is a security property to adjust this configuration, but the change has to be done before it is used for the first time and it cannot be changed after that. As such, it is best done via a system test (filed KAFKA-13779). To validate the change, I wrote a unit test that forces key updates and manually ran a producer workload that produced more than 2^37 bytes. Both cases failed without these changes and pass with them. Note that Shylaja Kokoori attached a patch with the SslTransportLayer fix and hence included them as a co-author of this change. Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com> Co-authored-by: Shylaja Kokoori	2022-03-29 14:59:38 -07:00
Levani Kokhreidze	35ae4f248b	KAFKA-6718: Add documentation for KIP-708 (#11923 ) Adds documentation for KIP-708: Rack awareness for Kafka Streams Co-authored-by: Bruno Cadonna <cadonna@apache.org> Reviewers: Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>	2022-03-29 14:08:51 +02:00
Rohan	01533e3dd7	KAFKA-13692: include metadata wait time in total blocked time (#11805 ) This patch includes metadata wait time in total blocked time. First, this patch adds a new metric for total producer time spent waiting on metadata, called metadata-wait-time-ms-total. Then, this time is included in the total blocked time computed from StreamsProducer. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2022-03-24 09:55:26 -07:00
RivenSun	c5bc2688da	KAFKA-13689: optimize the log output of logUnused method (#11940 ) Optimize the log output of logUnused method. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2022-03-24 09:29:39 -07:00
xuexiaoyue	a44e1ed449	MINOR: Fix typos in `TransactionManager` (#11924 ) Reviewers: Kvicii <Karonazaba@gmail.com>, David Jacot <djacot@confluent.io>	2022-03-22 13:55:54 +01:00
Idan Kamara	eddb98df67	MINOR: Fix class comparison in `AlterConfigPolicy.RequestMetadata.equals()` (#11900 ) This patch fixes a bug in the `AlterConfigPolicy.RequestMetadata.equals` method where we were not comparing the class correctly. Co-authored-by: David Jacot <djacot@confluent.io> Reviewers: David Jacot <djacot@confluent.io>	2022-03-22 09:45:04 +01:00
Xiaobing Fang	be4ef3df42	KAFKA-13752: Uuid compare using equals in java (#11912 ) This patch fixes a few cases where we use `==` instead of `equals` to compare UUID. The impact of this bug is low because `Uuid.ZERO_UUID` is used by default everywhere. Reviewers: Justine Olshan <jolshan@confluent.io>, dengziming <dengziming1993@gmail.com>, David Jacot <djacot@confluent.io>	2022-03-22 09:31:46 +01:00
Luke Chen	3a8f6b17a6	KAFKA-7540: commit offset sync before close (#11898 ) Reviewers: Guozhang Wang <wangguoz@gmail.com>	2022-03-21 16:51:21 +08:00
José Armando García Sancio	52621613fd	KAFKA-13587; Implement leader recovery for KIP-704 (#11733 ) Implementation of the protocol for starting and stopping leader recovery after an unclean leader election. This includes the management of state in the controllers (legacy and KRaft) and propagating this information to the brokers. This change doesn't implement log recovery after an unclean leader election. Protocol Changes ================ For the topic partition state znode, the new field "leader_recovery_state" was added. If the field is missing the value is assumed to be RECOVERED. ALTER_PARTITION was renamed from ALTER_ISR. The CurrentIsrVersion field was renamed to PartitionEpoch. The new field LeaderRecoveryState was added. The new field LeaderRecoverState was added to the LEADER_AND_ISR request. The inter broker protocol version is used to determine which version to send to the brokers. A new tagged field for LeaderRecoveryState was added to both the PartitionRecord and PartitionChangeRecord. Controller ========== For both the KRaft and legacy controller the LeaderRecoveryState is set to RECOVERING, if the leader was elected out of the ISR, also known as unclean leader election. The controller sets the state back to RECOVERED after receiving an ALTER_PARTITION request with version 0, or with version 1 and with the LeaderRecoveryState set to RECOVERED. Both controllers preserve the leader recovery state even if the unclean leader goes offline and comes back online before an RECOVERED ALTER_PARTITION is sent. The controllers reply with INVALID_REQUEST if the ALTER_PARTITION either: 1. Attempts to increase the ISR while the partition is still RECOVERING 2. Attempts to change the leader recovery state to RECOVERING from a RECOVERED state. Topic Partition Leader ====================== The topic partition leader doesn't implement any log recovery in this change. The topic partition leader immediately marks the partition as RECOVERED and sends that state in the next ALTER_PARTITION request. Reviewers: Jason Gustafson <jason@confluent.io>	2022-03-18 09:24:11 -07:00
Jules Ivanic	03641e6a28	MINOR: Fix `ConsumerConfig.ISOLATION_LEVEL_DOC` (#11915 ) Reviewers: David Jacot <djacot@confluent.io>	2022-03-18 09:20:08 +01:00
Levani Kokhreidze	b68463c250	KAFKA-6718 / Add rack awareness configurations to StreamsConfig (#11837 ) This PR is part of KIP-708 and adds rack aware standby task assignment logic. Rack aware standby task assignment won't be functional until all parts of this KIP gets merged. Splitting PRs into three smaller PRs to make the review process easier to follow. Overall plan is the following: ⏭️ Rack aware standby task assignment logic #10851 ⏭️ Protocol change, add clientTags to SubscriptionInfoData #10802 👉 Add required configurations to StreamsConfig (public API change, at this point we should have full functionality) This PR implements last point of the above mentioned plan. Reviewers: Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>	2022-03-16 18:02:24 +01:00
David Arthur	5c1dd493d6	Don't generate Uuid with a leading "-" (#11901 )	2022-03-16 11:54:02 -04:00
Jason Gustafson	76d287c967	KAFKA-13727; Preserve txn markers after partial segment cleaning (#11891 ) It is possible to clean a segment partially if the offset map is filled before reaching the end of the segment. The highest offset that is reached becomes the new dirty offset after the cleaning completes. The data above this offset is nevertheless copied over to the new partially cleaned segment. Hence we need to ensure that the transaction index reflects aborted transactions from both the cleaned and uncleaned portion of the segment. Prior to this patch, this was not the case. We only collected the aborted transactions from the cleaned portion, which means that the reconstructed index could be incomplete. This can cause the aborted data to become effectively committed. It can also cause the deletion of the abort marker before the corresponding data has been removed (i.e. the aborted transaction becomes hanging). Reviewers: Jun Rao <junrao@gmail.com>	2022-03-15 12:26:23 -07:00
Paolo Patierno	418b122150	MINOR: Improve producer Javadoc about send with acks = 0 (#11882 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>	2022-03-15 12:08:10 +01:00
RivenSun	84b41b9d3a	KAFKA-13689: Revert AbstractConfig code changes (#11863 ) Reviewer: Luke Chen <showuon@gmail.com>	2022-03-10 10:54:10 +08:00
Vincent Jiang	798275f254	KAFKA-13717: skip coordinator lookup in commitOffsetsAsync if offsets is empty (#11864 ) Reviewer: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>	2022-03-10 10:52:05 +08:00
Philip Nee	28393be6d7	KAFKA-12879: Revert changes from KAFKA-12339 and instead add retry capability to KafkaBasedLog (#11797 ) Fixes the compatibility issue regarding KAFKA-12879 by reverting the changes to the admin client from KAFKA-12339 (#10152) that retry admin client operations, and instead perform the retries within Connect's `KafkaBasedLog` during startup via a new `TopicAdmin.retryEndOffsets(..)` method. This method delegates to the existing `TopicAdmin.endOffsets(...)` method, but will retry on `RetriableException` until the retry timeout elapses. This change should be backward compatible to the KAFKA-12339 so that when Connect's `KafkaBasedLog` starts up it will retry attempts to read the end offsets for the log's topic. The `KafkaBasedLog` existing thread already has its own retry logic, and this is not changed. Added more unit tests, and thoroughly tested the new `RetryUtil` used to encapsulate the parameterized retry logic around any supplied function.	2022-03-09 12:39:28 -06:00
Jason Koch	2367c8994b	KAFKA-13630: Reduce amount of time that producer network thread holds batch queue lock (#11722 ) Hold the `deque` lock for only as long as is required to collect and make a decision in `ready()` and `drain()` loops. Once this is done, remaining work can be done without lock, so release it. This allows producers to continue appending. For an application with with a single producer thread and a high send() rate, this change reduces spinlock CPU cycles from 14.6% to 2.5% of the send() path, or more clearly a 12.1% improvement in efficiency for the send() path by reducing the duration of contention events with the network thread. Note that this application was executed with Java 8, which has a slower crc32c implementation. Reviewers: Luke Chen <showuon@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Artem Livshits <84364232+artemlivshits@users.noreply.github.com>	2022-03-09 05:41:06 -08:00
Adam Kotwasinski	add11eed75	MINOR: Correct logging and Javadoc in FetchSessionHandler (#11843 ) Reviewers: Justine Olshan <jolshan@confluent.io>, David Jacot <david.jacot@gmail.com>, Luke Chen <showuon@gmail.com>	2022-03-09 16:51:26 +08:00
RivenSun	0dac4b4267	KAFKA-13689: printing unused and unknown logs separately (#11800 ) Differentiate between unused and unknown configs during log output. Reviewer: Luke Chen <showuon@gmail.com>	2022-03-05 16:08:14 +08:00
RivenSun	3be978464c	KAFKA-13694: Log more specific information when the verification record fails on brokers. (#11830 ) Reviewers: Guozhang Wang <wangguoz@gmail.com>	2022-03-04 10:45:44 -08:00
Vincent Jiang	95dbba9fe5	KAFKA-13706: Remove closed connections from MockSelector.ready (#11839 ) Reviewers: David Jacot <djacot@confluent.io>	2022-03-04 09:51:53 +01:00
Luke Chen	7c280c1d5f	KAFKA-13673: disable idempotence when config conflicts (#11788 ) Disable idempotence when conflicting config values for acks, retries and max.in.flight.requests.per.connection are set by the user. For the former two configs, we log at info level when we disable idempotence due to conflicting configs. For the latter, we log at warn level since it's due to an implementation detail that is likely to be surprising. This mitigates compatibility impact of enabling idempotence by default. Added unit tests to verify the change in behavior. Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>, Mickael Maison <mickael.maison@gmail.com>	2022-03-03 05:40:41 -08:00
Chris Egerton	066cdc8c62	KAFKA-10000: Add producer fencing API to admin client (KIP-618) (#11777 ) * KAFKA-10000: Add producer fencing API to admin client Reviewers: Luke Chen <showuon@gmail.com>, Tom Bentley <tbentley@redhat.com>	2022-03-03 10:27:17 +00:00
Zhang Hongyi	15ebad54b4	MINOR: Skip fsync on parent directory to start Kafka on ZOS (#11793 ) Reviewers: Cong Ding <cong@ccding.com>, Jun Rao <junrao@gmail.com>	2022-02-24 13:26:23 -08:00
Chris Egerton	6f09c3f88b	KAFKA-10000: Utils methods for overriding user-supplied properties and dealing with Enum types (#11774 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>	2022-02-23 14:49:30 +01:00
RivenSun	219a446feb	MINOR: Optimize the matches method of AccessControlEntryFilter (#11768 ) * MINOR: Make the getters in match method in AccessControlEntryFilter consistency Reviewers: Luke Chen <showuon@gmail.com>	2022-02-17 20:56:57 +08:00
keashem	1c7cd79792	MINOR: Fix sensor removal assertion in `MetricTest.testRemoveInactiveMetrics` (#11755 ) The test case `MetricTest.testRemoveInactiveMetrics` attempts to test removal of inactive sensors, but one of the assertions is checking the wrong sensor name ("test.s1" instead of "test.s2"). The patch fixes the assertion to use the right sensor name. Reviewers: Jason Gustafson <jason@confluent.io> Co-authored-by: zhonghou3 <zhonghou3@jd.com>	2022-02-15 18:36:13 -08:00
dengziming	dd36331a81	MINOR: Enable kraft in ApiVersionTest (#11667 ) This patch enables `ApiVersionsTest` to test both kraft brokers and controllers. It fixes a minor bug in which the `Envelope` request to be exposed from `ApiVersions` requests to the kraft broker. Reviewers: Jason Gustafson <jason@confluent.io>	2022-02-15 10:16:03 -08:00
David Jacot	c8fbe26f3b	KAFKA-13435; Static membership protocol should let the leader skip assignment (KIP-814) (#11688 ) This patch implements KIP-814 as described here: https://cwiki.apache.org/confluence/display/KAFKA/KIP-814%3A+Static+membership+protocol+should+let+the+leader+skip+assignment. Reviewers: Luke Chen <showuon@gmail.com>, Artem Livshits <alivshits@confluent.io>, Jason Gustafson <jason@confluent.io>	2022-02-14 11:55:38 +01:00
Matthias J. Sax	40b261f082	MINOR: improve logging (#11584 ) Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Luke Chen <showuon@gmail.com>	2022-02-11 10:50:44 -08:00
Vijay Krishna	368f4cee7a	KAFKA-13323; Fixed variable name in KafkaConsumer (#11558 ) Fixes a misspelled variable name in `KafkaConsumer`: `cachedSubscriptionHashAllFetchPositions` -> `cachedSubscriptionHasAllFetchPositions`. Reviewers: Kvicii <Karonazaba@gmail.com>, Jason Gustafson <jason@confluent.io>	2022-02-10 14:13:43 -08:00
Colin Patrick McCabe	d35283f011	KAFKA-13646; Implement KIP-801: KRaft authorizer (#11649 ) Currently, when using KRaft mode, users still have to have an Apache ZooKeeper instance if they want to use AclAuthorizer. We should have a built-in Authorizer for KRaft mode that does not depend on ZooKeeper. This PR introduces such an authorizer, called StandardAuthorizer. See KIP-801 for a full description of the new Authorizer design. Authorizer.java: add aclCount API as described in KIP-801. StandardAuthorizer is currently the only authorizer that implements it, but eventually we may implement it for AclAuthorizer and others as well. ControllerApis.scala: fix a bug where createPartitions was authorized using CREATE on the topic resource rather than ALTER on the topic resource as it should have been. QuorumTestHarness: rename the controller endpoint to CONTROLLER for consistency (the brokers already called it that). This is relevant in AuthorizerIntegrationTest where we are examining endpoint names. Also add the controllerServers call. TestUtils.scala: adapt the ACL functions to be usable from KRaft, by ensuring that they use the Authorizer from the current active controller. BrokerMetadataPublisher.scala: add broker-side ACL application logic. Controller.java: add ACL APIs. Also add a findAllTopicIds API in order to make junit tests that use KafkaServerTestHarness#getTopicNames and KafkaServerTestHarness#getTopicIds work smoothly. AuthorizerIntegrationTest.scala: convert over testAuthorizationWithTopicExisting (more to come soon) QuorumController.java: add logic for replaying ACL-based records. This means storing them in the new AclControlManager object, and integrating them into controller snapshots. It also means applying the changes in the Authorizer, if one is configured. In renounce, when reverting to a snapshot, also set newBytesSinceLastSnapshot to 0. Reviewers: YeonCheol Jang <YeonCheolGit@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>	2022-02-09 10:38:52 -08:00
RivenSun	4b468a9d81	KAFKA-13310 : KafkaConsumer cannot jump out of the poll method, and the… (#11340 ) Title: KafkaConsumer cannot jump out of the poll method, and cpu and traffic on the broker side increase sharply description: The local test has been passed, the problem described by jira can be solved JIRA link : https://issues.apache.org/jira/browse/KAFKA-13310 Reviewers: Luke Chen <showuon@gmail.com>, Guozhang Wang <wangguoz@gmail.com>	2022-02-08 23:05:42 -08:00
Ismael Juma	7c2d672413	MINOR: Update library dependencies (Q1 2022) (#11306 ) - scala 2.13: 2.13.6 -> 2.13.8 * Support Java 18 and improve Android compatibility * https://www.scala-lang.org/news/2.13.7 * https://www.scala-lang.org/news/2.13.8 - scala 2.12: 2.12.14 -> 2.12.15. * The `-release` flag now works with Scala 2.12, backend parallelism can be enabled via `-Ybackend-parallelism N` and string interpolation is more efficient. * https://www.scala-lang.org/news/2.12.5 - gradle versions plugin: 0.38.0 -> 0.42.0 * Minor fixes * https://github.com/ben-manes/gradle-versions-plugin/releases/tag/v0.40.0 * https://github.com/ben-manes/gradle-versions-plugin/releases/tag/v0.41.0 * https://github.com/ben-manes/gradle-versions-plugin/releases/tag/v0.42.0 - gradle dependency check plugin: 6.1.6 -> 6.5.3 * Minor fixes - gradle spotbugs plugin: 4.7.1 -> 5.0.5 * Fixes and minor improvements * There were too many releases to include all the links, include the major version bump * https://github.com/spotbugs/spotbugs-gradle-plugin/releases/tag/5.0.0 - gradle scoverage plugin: 5.0.0 -> 7.0.0 * Support newer Gradle versions and other improvements * https://github.com/scoverage/gradle-scoverage/releases/tag/6.0.0 * https://github.com/scoverage/gradle-scoverage/releases/tag/6.1.0 * https://github.com/scoverage/gradle-scoverage/releases/tag/7.0.0 - gradle shadow plugin: 7.0.0 -> 7.1.2 * Support gradle toolchains and security fixes * https://github.com/johnrengelman/shadow/releases/tag/7.1.0 * https://github.com/johnrengelman/shadow/releases/tag/7.1.1 * https://github.com/johnrengelman/shadow/releases/tag/7.1.2 - bcpkix: 1.66 -> 1.70 * Several improvements and fixes * https://www.bouncycastle.org/releasenotes.html - jline: 3.12.1 -> 3.21.0 * Various fixes and improvements - jmh: 1.32 -> 1.34 * Compiler blackhole enabled by default when using Java 17 and improved gradle incremental compilation * https://mail.openjdk.java.net/pipermail/jmh-dev/2021-August/003355.html * https://mail.openjdk.java.net/pipermail/jmh-dev/2021-December/003406.html - scalaLogging: 3.9.3 -> 3.9.4 * Support for Scala 3.0 - jose4j: 0.7.8 -> 0.7.9 * Minor fixes - junit: 5.7.1 -> 5.8.2 * Minor improvements and fixes * https://junit.org/junit5/docs/current/release-notes/index.html#release-notes-5.8.0 * https://junit.org/junit5/docs/current/release-notes/index.html#release-notes-5.8.1 * https://junit.org/junit5/docs/current/release-notes/index.html#release-notes-5.8.2 - jqwik: 1.5.0 -> 1.6.3 * Numerous improvements * https://github.com/jlink/jqwik/releases/tag/1.6.0 - mavenArtifact: 3.8.1 -> 3.8.4 - mockito: 3.12.4 -> 4.3.1 * Removed deprecated methods, `DoNotMock` annotation and minor fixes/improvements * https://github.com/mockito/mockito/releases/tag/v4.0.0 * https://github.com/mockito/mockito/releases/tag/v4.1.0 * https://github.com/mockito/mockito/releases/tag/v4.2.0 * https://github.com/mockito/mockito/releases/tag/v4.3.0 - scalaCollectionCompat: 2.4.4 -> 2.6.0 * Minor fixes * https://github.com/scala/scala-collection-compat/releases/tag/v2.5.0 * https://github.com/scala/scala-collection-compat/releases/tag/v2.6.0 - scalaJava8Compat: 1.0.0 -> 1.0.2 * Minor changes - scoverage: 1.4.1 -> 1.4.11 * Support for newer Scala versions - slf4j: 1.7.30 -> 1.7.32 * Minor fixes, 1.7.35 automatically uses reload4j and 1.7.33/1.7.34 cause build failures, so we stick with 1.7.32 for now. - zstd: 1.5.0-4 -> 1.5.2-1 * zstd 1.5.2 * Small refinements and performance improvements * https://github.com/facebook/zstd/releases/tag/v1.5.1 * https://github.com/facebook/zstd/releases/tag/v1.5.2 Checkstyle, spotBugs and spotless will be upgraded separately as they either require non trivial code changes or they have regressions that affect us. Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	2022-02-07 15:24:50 -08:00
Wenhao Ji	f1fdd31e8f	KAFKA-7572: Producer should not send requests with negative partition id (#10525 ) This PR is for KAFKA-7572, which fixes the issue that producers will throw confusing exceptions when a custom Partitioner returns a negative partition. Since the PR #5858 is not followed by anyone currently, I reopen this one to continue the work. Reviewers: Luke Chen <showuon@gmail.com>, Guozhang Wang <wangguoz@gmail.com>	2022-02-07 09:53:10 -08:00
Ismael Juma	b125a06c46	MINOR: Use CRC32 from standard library and remove custom implementation (#11736 ) We only use it in the legacy record formats (V0 and V1) and the CRC32 implementation in the standard library has received various performance improvements over the years (https://bugs.openjdk.java.net/browse/JDK-8245512 is a recent example). Also worth noting that record formats V0 and V1 have been deprecated since Apache Kafka 3.0. Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Kvicii <Karonazaba@gmail.com>	2022-02-07 06:42:30 -08:00
Luke Chen	ca5d6f9229	KAFKA-13563: clear FindCoordinatorFuture for non consumer group mode (#11631 ) After KAFKA-10793, we clear the findCoordinatorFuture in 2 places: 1. heartbeat thread 2. AbstractCoordinator#ensureCoordinatorReady But in non consumer group mode with group id provided (for offset commitment. So that there will be consumerCoordinator created), there will be no (1)heartbeat thread , and it only call (2)AbstractCoordinator#ensureCoordinatorReady when 1st time consumer wants to fetch committed offset position. That is, after 2nd lookupCoordinator call, we have no chance to clear the findCoordinatorFuture , and causes the offset commit never succeeded. To avoid the race condition as KAFKA-10793 mentioned, it's not safe to clear the findCoordinatorFuture in the future listener. So, I think we can fix this issue by calling AbstractCoordinator#ensureCoordinatorReady when coordinator unknown in non consumer group case, under each ConsumerCoordinator#poll. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2022-02-06 15:07:59 -08:00
Jason Koch	c05403f47f	KAFKA-13629: Use faster algorithm for ByteUtils sizeOfXxx algorithm (#11721 ) Replace loop with a branch-free implementation. Include: - Unit tests that includes old code and new code and runs through several ints/longs. - JMH benchmark that compares old vs new performance of algorithm. JMH results with JDK 17.0.2 and `compiler` blackhole mode are 2.8-3.4 faster with the new implementation. In a real application, a 6% reduction in CPU cycles was observed in the `send()` path via flamegraphs. ``` ByteUtilsBenchmark.testSizeOfUnsignedVarint thrpt 4 1472440.102 ± 67331.797 ops/ms ByteUtilsBenchmark.testSizeOfUnsignedVarint:·async thrpt NaN --- ByteUtilsBenchmark.testSizeOfUnsignedVarint:·gc.alloc.rate thrpt 4 ≈ 10⁻⁴ MB/sec ByteUtilsBenchmark.testSizeOfUnsignedVarint:·gc.alloc.rate.norm thrpt 4 ≈ 10⁻⁷ B/op ByteUtilsBenchmark.testSizeOfUnsignedVarint:·gc.count thrpt 4 ≈ 0 counts ByteUtilsBenchmark.testSizeOfUnsignedVarintSimple thrpt 4 521333.117 ± 595169.618 ops/ms ByteUtilsBenchmark.testSizeOfUnsignedVarintSimple:·async thrpt NaN --- ByteUtilsBenchmark.testSizeOfUnsignedVarintSimple:·gc.alloc.rate thrpt 4 ≈ 10⁻⁴ MB/sec ByteUtilsBenchmark.testSizeOfUnsignedVarintSimple:·gc.alloc.rate.norm thrpt 4 ≈ 10⁻⁶ B/op ByteUtilsBenchmark.testSizeOfUnsignedVarintSimple:·gc.count thrpt 4 ≈ 0 counts ByteUtilsBenchmark.testSizeOfVarlong thrpt 4 1106519.633 ± 16556.502 ops/ms ByteUtilsBenchmark.testSizeOfVarlong:·async thrpt NaN --- ByteUtilsBenchmark.testSizeOfVarlong:·gc.alloc.rate thrpt 4 ≈ 10⁻⁴ MB/sec ByteUtilsBenchmark.testSizeOfVarlong:·gc.alloc.rate.norm thrpt 4 ≈ 10⁻⁶ B/op ByteUtilsBenchmark.testSizeOfVarlong:·gc.count thrpt 4 ≈ 0 counts ByteUtilsBenchmark.testSizeOfVarlongSimple thrpt 4 324435.607 ± 147754.813 ops/ms ByteUtilsBenchmark.testSizeOfVarlongSimple:·async thrpt NaN --- ByteUtilsBenchmark.testSizeOfVarlongSimple:·gc.alloc.rate thrpt 4 ≈ 10⁻⁴ MB/sec ByteUtilsBenchmark.testSizeOfVarlongSimple:·gc.alloc.rate.norm thrpt 4 ≈ 10⁻⁶ B/op ByteUtilsBenchmark.testSizeOfVarlongSimple:·gc.count thrpt 4 ≈ 0 counts ``` Reviewers: Ismael Juma <ismael@juma.me.uk>, Artem Livshits	2022-02-06 13:36:44 -08:00
Luke Chen	e6db0ca48c	KAFKA-13598: enable idempotence producer by default and validate the configs (#11691 ) In v3.0, we changed the default value for `enable.idempotence` to true, but we didn't adjust the validator and the `idempotence` enabled check method. So if a user didn't explicitly enable idempotence, this feature won't be turned on. This patch addresses the problem, cleans up associated logic, and fixes tests that broke as a result of properly applying the new default. Specifically it does the following: 1. fix the `ProducerConfig#idempotenceEnabled` method, to make it correctly detect if `idempotence` is enabled or not 2. remove some unnecessary config overridden and checks due to we already default `acks`, `retries` and `enable.idempotence` configs. 3. move the config validator for the idempotent producer from `KafkaProducer` into `ProducerConfig`. The config validation should be the responsibility of `ProducerConfig` class. 4. add an `AbstractConfig#hasKeyInOriginals` method, to avoid `originals` configs get copied and only want to check the existence of the key. 5. fix many broken tests. As mentioned, we didn't actually enable idempotence in v3.0. After this PR, there are some tests broken due to some different behavior between idempotent and non-idempotent producer. 6. add additional tests to validate configuration behavior Reviewers: Kirk True <kirk@mustardgrain.com>, Ismael Juma <ismael@juma.me.uk>, Mickael Maison <mimaison@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>	2022-02-05 10:53:27 -08:00
Matthew Wong	17dcb8097c	MINOR: Update documentation and DumpLogSegments tool for addition of `deleteHorizonMs` in batch format (#11694 ) This PR updates the documentation and tooling to match https://github.com/apache/kafka/pull/10914, which added support for encoding `deleteHorizonMs` in the record batch schema. The changes include adding the new attribute and updating field names. We have also updated stale references to the old `FirstTimestamp` field in the code and comments. Finally, In the `DumpLogSegments` tool, when record batch information is printed, it will also include the value of `deleteHorizonMs` is (e.g. `OptionalLong.empty` or `OptionalLong[123456]`). Reviewers: Vincent Jiang <84371940+vincent81jiang@users.noreply.github.com>, Kvicii <42023367+Kvicii@users.noreply.github.com>, Jason Gustafson <jason@confluent.io>	2022-02-04 16:20:37 -08:00
Kvicii	23771cd27c	MINOR: Fix AbstractStickyAssignor doc (#11727 ) Reviewers: David Jacot <djacot@confluent.io>	2022-02-03 10:54:54 +01:00
dengziming	af6a9a17bf	KAFKA-13637: Use default.api.timeout.ms as default timeout value for KafkaConsumer.endOffsets (#11726 ) We introduced `default.api.timeout.ms` in `53ca52f855` but we missed updating `KafkaConsumer.endOffsets` which still use `request.timeout.ms`. This patch fixes this. Reviewers: David Jacot <djacot@confluent.io>	2022-02-03 10:32:25 +01:00
Philip Nee	319732dbeb	KAFKA-12841: Fix producer callback handling when partition is missing (#11689 ) Sometimes, the Kafka producer encounters an error prior to selecting a topic partition. In this case, we would like to acknowledge the failure in the producer interceptors, if any are configured. We should also pass a non-null Metadata object to the producer callback, if there is one. This PR implements that behavior. It also updates the JavaDoc to clarify that if a partition cannot be selected, we will pass back a partition id of -1 in the metadata. This is in keeping with KAFKA-3303. Co-authors: Kirk True <kirk@mustardgrain.com> Reviewers: Colin P. McCabe <cmccabe@apache.org>	2022-02-02 16:03:32 -08:00
Kirk True	9f42739dd3	KAFKA-13558: NioEchoServer fails to close resources (#11618 ) Due to resource leaks in the NioEchoServer, at times it won't start properly to accept clients and will throw an exception in the ServerSocketChannel.accept() call. Previous to this change, the error was not being logged. The logged error was that there were too many open files. Using the UnixOperatingSystemMXBean, I was able to detect that use of the NioEchoServer creates several FDs but does not close them. This then caused the client to never be able to connect to the server, so the waitForCondition failed intermittently. This change closes the internal Selector and the AcceptorThread's selector so that the file descriptors are reclaimed. Reviewers: Ismael Juma <ismael@juma.me.uk>	2022-02-02 06:11:56 -08:00
Mickael Maison	31fca1611a	KAFKA-13527: Add top-level error code field to DescribeLogDirsResponse (#11599 ) Implements KIP-784: https://cwiki.apache.org/confluence/display/KAFKA/KIP-784%3A+Add+top-level+error+code+field+to+DescribeLogDirsResponse Reviewers: David Jacot <djacot@confluent.io>, Tom Bentley <tbentley@redhat.com>	2022-02-01 18:53:30 +01:00
工业废水	31081e6ec4	MINOR: Replace statement lambda with expression lambda (#11723 ) Reviewers: Kvicii <Karonazaba@gmail.com>, David Jacot <djacot@confluent.io>	2022-02-01 11:03:43 +01:00
Chia-Ping Tsai	98e8608646	MINOR: remove redundant argument in logging (#11719 ) Reviewers: David Jacot <djacot@confluent.io>, Luke Chen <showuon@gmail.com>	2022-01-30 23:15:58 +08:00
Chris Egerton	000ba031c3	KAFKA-9279: Fail producer transactions for asynchronously-reported, synchronously-encountered ApiExceptions (#11508 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>	2022-01-26 11:51:10 +01:00
Colin Patrick McCabe	68a19539cf	KAFKA-13552: Fix BROKER and BROKER_LOGGER in KRaft (#11657 ) Currently, KRaft does not support setting BROKER_LOGGER configs (it always fails.) Additionally, there are several bugs in the handling of BROKER configs. They are not properly validated on the forwarding broker, and the way we apply them is buggy as well. This PR fixes those issues. KafkaApis: add support for doing validation and log4j processing on the forwarding broker. This involves breaking the config request apart and forwarding only part of it. Adjust KafkaApisTest to test the new behavior, rather than expecting forwarding of the full request. MetadataSupport: remove MetadataSupport#controllerId since it duplicates the functionality of MetadataCache#controllerId. Add support for getResourceConfig and maybeForward. ControllerApis: log an error message if the handler throws an exception, just like we do in KafkaApis. ControllerConfigurationValidator: add JavaDoc. Move some functions that don't involve ZK from ZkAdminManager to DynamicConfigManager. Move some validation out of ZkAdminManager and into a new class, ConfigAdminManager, which is not tied to ZK. ForwardingManager: add support for sending new requests, rather than just forwarding existing requests. BrokerMetadataPublisher: do not try to apply dynamic configurations for brokers other than the current one. Log an INFO message when applying a new dynamic config, like we do in ZK mode. Also, invoke reloadUpdatedFilesWithoutConfigChange when applying a new non-default BROKER config. QuorumController: fix a bug in ConfigResourceExistenceChecker which prevented cluster configs from being set. Add a test for this class. Reviews: José Armando García Sancio <jsancio@users.noreply.github.com>	2022-01-21 17:00:21 -07:00
David Jacot	33a004ab62	KAFKA-13388; Kafka Producer nodes stuck in CHECKING_API_VERSIONS (#11671 ) At the moment, the `NetworkClient` will remain stuck in the `CHECKING_API_VERSIONS` state forever if the `Channel` does not become ready. To prevent this from happening, this patch changes the logic to transition to the `CHECKING_API_VERSIONS` only when the `ApiVersionsRequest` is queued to be sent out. With this, the connection will timeout if the `Channel` does not become ready within the connection setup timeout. Once the `ApiVersionsRequest` is queued up, the request timeout takes over. Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>	2022-01-21 17:44:56 +01:00
Jason Gustafson	df2236d73b	KAFKA-13412; Ensure initTransactions() safe for retry after timeout (#11452 ) If the user's `initTransactions` call times out, the user is expected to retry. However, the producer will continue retrying the `InitProducerId` request in the background. If it happens to return before the user retry of `initTransactions`, then the producer will raise an exception about an invalid state transition. The patch fixes the issue by tracking the pending state transition until the user has acknowledged the operation's result. In the case of `initTransactions`, even if the `InitProducerId` returns in the background and the state changes, we can still retry the `initTransactions` call to obtain the result. Reviewers: David Jacot <djacot@confluent.io>	2022-01-19 13:20:41 -08:00
Jeff Kim	c45cf35d30	MINOR: Remove version check when setting reason for `Join/LeaveGroupRequest` in `RequestResponseTest` (#11680 ) This patch ensures that the `Reason` field can be set for all versions (ignorable field). Reviewers: David Jacot <djacot@confluent.io>	2022-01-17 10:20:53 +01:00
Jeff Kim	30c9087d74	MINOR: Make JoinGroupRequest and LeaveGroupRequest 'Reason' field ignorable (KIP-800) (#11679 ) The `Reason` field must be ignorable otherwise new client does not work with older brokers. Reviewers: David Jacot <djacot@confluent.io>	2022-01-14 10:32:07 +01:00
Jeff Kim	bf609694f8	KAFKA-13496: Add reason to LeaveGroupRequest (KIP-800) (#11571 ) This patch adds a `reason` field to the `LeaveGroupRequest` as specified in KIP-800: https://cwiki.apache.org/confluence/display/KAFKA/KIP-800%3A+Add+reason+to+JoinGroupRequest+and+LeaveGroupRequest. Reviewers: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>	2022-01-13 16:06:11 +01:00
Jeff Kim	69645f1fe5	KAFKA-13495: Add reason to JoinGroupRequest (KIP-800) (#11566 ) This patch adds a `reason` field to the `JoinGroupRequest` as specified in KIP-800: https://cwiki.apache.org/confluence/display/KAFKA/KIP-800%3A+Add+reason+to+JoinGroupRequest+and+LeaveGroupRequest. Reviewers: loboya~ <317307889@qq.com>, Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>	2022-01-13 16:01:37 +01:00
Chang	99d9e8f7fc	MINOR: update the comment for Utils.atomicMoveWithFallback (#11641 ) Reviewers: Luke Chen <showuon@gmail.com>, Cong Ding <cong@ccding.com>, Jun Rao <junrao@gmail.com>	2022-01-05 17:03:50 -08:00
Matthias J. Sax	daaa9dfb54	MINOR: add default-replication-factor to MockAdminClient (#11648 ) Reviewers: Bill Bejeck <bill@confluent.io>	2022-01-05 10:37:23 -08:00
RivenSun	2518c37f9a	KAFKA-13425: Optimization of KafkaConsumer#pause semantics (#11460 ) * 1. Enhance the annotation of KafkaConsumer#pause(...) method 2. Add log output when clearing the paused mark of topicPartitions. Reviewers: Luke Chen <showuon@gmail.com>, Guozhang Wang <wangguoz@gmail.com>	2022-01-04 15:47:03 -08:00
David Jacot	be60fe56cc	MINOR: Fix malformed html in javadoc of `ScramMechanism` (#11625 ) Reviewers: Luke Chen <showuon@gmail.com>, Guozhang Wang <wangguoz@gmail.com>	2021-12-23 21:15:24 +01:00
Luke Chen	c219fba421	KAFKA-13419: Only reset generation ID when ILLEGAL_GENERATION error (#11451 ) Updated: This PR will reset generation ID when ILLEGAL_GENERATION error since the member ID is still valid. ===== resetStateAndRejoin when REBALANCE_IN_PROGRESS error in sync group, to avoid out-of-date ownedPartition == JIRA description == In KAFKA-13406, we found there's user got stuck when in rebalancing with cooperative sticky assignor. The reason is the "ownedPartition" is out-of-date, and it failed the cooperative assignment validation. Investigate deeper, I found the root cause is we didn't reset generation and state after sync group fail. In KAFKA-12983, we fixed the issue that the onJoinPrepare is not called in resetStateAndRejoin method. And it causes the ownedPartition not get cleared. But there's another case that the ownedPartition will be out-of-date. Here's the example: consumer A joined and synced group successfully with generation 1 New rebalance started with generation 2, consumer A joined successfully, but somehow, consumer A doesn't send out sync group immediately other consumer completed sync group successfully in generation 2, except consumer A. After consumer A send out sync group, the new rebalance start, with generation 3. So consumer A got REBALANCE_IN_PROGRESS error with sync group response When receiving REBALANCE_IN_PROGRESS, we re-join the group, with generation 3, with the assignment (ownedPartition) in generation 1. So, now, we have out-of-date ownedPartition sent, with unexpected results happened We might want to do resetStateAndRejoin when RebalanceInProgressException errors happend in sync group. Because when we got sync group error, it means, join group passed, and other consumers (and the leader) might already completed this round of rebalance. The assignment distribution this consumer have is already out-of-date. Reviewers: David Jacot <djacot@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	2021-12-17 15:36:12 -08:00
Chris Egerton	33853c154f	MINOR: Correct usage of ConfigException in file and directory config providers The two-arg variant is intended to take a property name and value, not an exception message and a cause. As-is, this leads to confusing log messages like: ``` org.apache.kafka.common.config.ConfigException: Invalid value java.nio.file.NoSuchFileException: /my/missing/secrets.properties for configuration Could not read properties from file /my/missing/secrets.properties ``` Author: Chris Egerton <chrise@confluent.io> Reviewers: Gwen Shapira Closes #11555 from C0urante/patch-1	2021-12-17 11:51:02 -08:00

... 3 4 5 6 7 ...

2879 Commits