Commit Graph

15214 Commits

Author SHA1 Message Date
Logan Zhu 04ea25b3c3
MINOR: Replace lambda expressions with method references in ProducerStateManager (#18753)
Reviewers: TengYao Chi <kitingiao@gmail.com>, Divij Vaidya <diviv@amazon.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-02-03 10:16:22 +08:00
Ismael Juma 78aff4fede
KAFKA-18659: librdkafka compressed produce fails unless api versions returns produce v0 (#18727)
Return produce v0-v2 as supported versions in `ApiVersionsResponse`, but disable support
for it everywhere else.

Since clients pick the highest supported version by both client and broker during version
negotiation, this solves the problem with minimal tech debt (even though it's not ideal that
`ApiVersionsResponse` becomes inconsistent with the actual protocol support).
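
The negotiation rule described above can be sketched as follows. This is a minimal illustration, not Kafka's client code: the class name, method name, and the version ranges in the usage note are hypothetical. It only shows why advertising extra low versions does not change what a newer client selects, since the highest version common to both ranges wins.

```java
import java.util.OptionalInt;

final class VersionNegotiation {
    /**
     * Returns the highest version that lies inside both inclusive ranges,
     * or an empty value when the ranges do not overlap.
     * All names here are hypothetical and exist only for illustration.
     */
    static OptionalInt highestCommon(int clientMin, int clientMax,
                                     int brokerMin, int brokerMax) {
        int low = Math.max(clientMin, brokerMin);   // lowest version both sides accept
        int high = Math.min(clientMax, brokerMax);  // highest version both sides accept
        return low <= high ? OptionalInt.of(high) : OptionalInt.empty();
    }
}
```

With illustrative ranges, `highestCommon(0, 7, 0, 12)` selects 7, while `highestCommon(0, 2, 3, 12)` finds no common version at all.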

Add one test for the socket server handling (in `ProcessorTest`) and one test for the
client behavior (in `ProduceRequestTest`). Adjust a couple of api versions tests to verify
the new behavior.

Finally, include a few clean-ups in `ApiKeys`, `Protocol`, `ProduceRequest`,
`ProduceRequestTest` and `BrokerApiVersionsCommandTest`.

Reference to related librdkafka issue:
https://github.com/confluentinc/librdkafka/issues/4956

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>
2025-02-01 16:08:54 -08:00
Apoorv Mittal 484ba83f59
KAFKA-18683: Handle slicing of file records for updated start position (#18759)
The PR corrects the check introduced in #5332, which verifies that the position is within the boundaries of the file. The check
    position > currentSizeInBytes - start 
is incorrect, since the position is relative to start.
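
A small sketch of why the quoted check misfires. It assumes that `currentSizeInBytes` is the size of the sliced view itself (already measured from `start`), which the commit message implies but does not state. The class and method names are hypothetical, and the "relative" check is an assumed correction, not necessarily the exact code in the PR.

```java
final class SliceBounds {
    /** The check as quoted in the commit message: start is subtracted from a size that is already relative. */
    static boolean rejectedByQuotedCheck(int position, int currentSizeInBytes, int start) {
        return position > currentSizeInBytes - start;
    }

    /** Assumed correction: compare the relative position with the relative size directly. */
    static boolean rejectedByRelativeCheck(int position, int currentSizeInBytes) {
        return position < 0 || position > currentSizeInBytes;
    }
}
```

For a view that starts at byte 100 and holds 50 bytes, position 30 is inside the view, yet the quoted check compares 30 with 50 - 100 and rejects it. The relative check accepts it.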

Reviewers: Jun Rao <junrao@gmail.com>
2025-01-31 15:43:51 -08:00
Lianet Magrans 7920fadbb5
Revert "KAFKA-17182: Consumer fetch sessions are evicted too quickly with AsyncKafkaConsumer (#17700)"
This reverts commit 6cf54c4dab.
2025-01-31 17:18:35 -05:00
David Jacot d19b605210
KAFKA-18320; Ensure that assignors are at the right place (#18750)
The full class name of the assignors is part of our public API. Hence, we should ensure that they are not changed by mistake. This patch adds a unit test verifying them.

Reviewers: Sean Quah <squah@confluent.io>, Jeff Kim <jeff.kim@confluent.io>
2025-01-31 07:51:28 -08:00
David Jacot 0ff4dafb7d
KAFKA-18146; tests/kafkatest/tests/core/upgrade_test.py needs to be re-added as KRaft (#18766)
This patch renames kraft_upgrade_test.py to upgrade_test.py. This is enough to cover the old upgrade/downgrade tests.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-31 06:07:11 -08:00
TengYao Chi d7a5b877f2
KAFKA-18677; Update ConsoleConsumerTest system test (#18763)
This patch converts the ConsoleConsumerTest system test to only use KRaft.

Reviewers: David Jacot <djacot@confluent.io>
2025-01-31 12:19:49 +01:00
Mickael Maison 71314739f9
KAFKA-15995: Initial API + make Producer/Consumer plugins Monitorable (#17511)
Reviewers: Greg Harris <gharris1727@gmail.com>, Luke Chen <showuon@gmail.com>
2025-01-31 10:40:10 +01:00
Luke Chen 15c5c075c1
MINOR: Clean up for sasl endpoints (#18519)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-01-31 09:27:04 +01:00
Matthias J. Sax 281a3c6a3a
MINOR: cleanup KStream JavaDocs (3/N) - groupBy[Key] (#18705)
Reviewers: Alieh Saeedi <asaeedi@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>
2025-01-30 19:52:14 -08:00
Matthias J. Sax 0d1e7e04b2
KAFKA-18644: improve generic type names for KStreamImpl and KTableImpl (#18722)
Reviewers: Bill Bejeck <bill@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>
2025-01-30 19:50:37 -08:00
kevin-wu24 184b891871
KAFKA-16524; Metrics for KIP-853 (#18304)
This change implements some of the metrics enumerated in KIP-853.

The KafkaRaftMetrics object now exposes number-of-voters, number-of-observers and uncommitted-voter-change. The number-of-observers and uncommitted-voter-change metrics are only present on the active controller or leader, since it does not make sense for other replicas to report these metrics.

In order to make these two metrics thread-safe, KafkaRaftMetrics needs to be passed into LeaderState, and therefore QuorumState. This introduces a circularity since the KafkaRaftMetrics constructor takes in QuorumState. To break the circularity for now, the logic using QuorumState will be moved to the KafkaRaftMetrics#initialize method.

The BrokerServerMetrics object now exposes ignored-static-voters. The ControllerServerMetrics object now exposes IgnoredStaticVoters. To implement both metrics for "ignored static voters", this PR introduces the ExternalKRaftMetrics interface, which allows for higher layer metrics objects to be accessible within the raft module.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2025-01-30 18:35:01 -05:00
Justine Olshan ccab9eb8b4
KAFKA-18660: Transactions Version 2 doesn't handle epoch overflow correctly (#18730)
Fixed the typo that used the wrong producer ID and epoch when returning so that we handle epoch overflow correctly.

We also had to rearrange the concurrent transaction handling so that we don't self-fence when we start the new transaction with the new producer ID.

I also tested this with a modified version of the code where epoch overflow happens on the first epoch bump (every request has a new producer ID).

Reviewers: Artem Livshits <alivshits@confluent.io>, Jeff Kim <jeff.kim@confluent.io>
2025-01-30 13:42:10 -08:00
Kirk True 6cf54c4dab
KAFKA-17182: Consumer fetch sessions are evicted too quickly with AsyncKafkaConsumer (#17700)
This change reduces fetch session cache evictions on the broker for AsyncKafkaConsumer by altering its logic to determine which partitions it includes in fetch requests.

Background
Consumer implementations fetch data from the cluster and temporarily buffer it in memory until the user next calls Consumer.poll(). When a fetch request is being generated, partitions that already have buffered data are not included in the fetch request.

The ClassicKafkaConsumer performs much of its fetch logic and network I/O in the application thread. On poll(), if there is any locally-buffered data, the ClassicKafkaConsumer does not fetch any new data and simply returns the buffered data to the user from poll().

On the other hand, the AsyncKafkaConsumer splits its logic and network I/O between two threads, which results in a potential race condition during fetch. The AsyncKafkaConsumer also checks for buffered data on its application thread. If it finds there is none, it signals the background thread to create a fetch request. However, it's possible for the background thread to receive data from a previous fetch and buffer it before the fetch request logic starts. When that occurs, as the background thread creates a new fetch request, it skips any buffered data, which has the unintended result that those partitions get added to the fetch request's "to remove" set. This signals to the broker to remove those partitions from its internal cache.

This issue is technically possible in the ClassicKafkaConsumer too, since the heartbeat thread performs network I/O in addition to the application thread. However, because of the frequency at which the AsyncKafkaConsumer's background thread runs, it is ~100x more likely to happen.

Options
The core decision is: what should the background thread do if it is asked to create a fetch request and it discovers there's buffered data. There were multiple proposals to address this issue in the AsyncKafkaConsumer. Among them are:

1. The background thread should omit buffered partitions from the fetch request as before (this is the existing behavior)
2. The background thread should skip the fetch request generation entirely if there are any buffered partitions
3. The background thread should include buffered partitions in the fetch request, but use a small “max bytes” value
4. The background thread should skip fetching from the nodes that have buffered partitions

Option 4 won out. The change is localized to AbstractFetch where the basic idea is to skip fetch requests to a given node if that node is the leader for buffered data. By preventing a fetch request from being sent to that node, it won't have any "holes" where the buffered partitions should be.
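
The idea of skipping whole nodes can be sketched as below. This is an illustration under stated assumptions, not the code in AbstractFetch: the class name, the method name, and the use of plain strings for partitions and nodes are all hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

final class FetchPlanner {
    /**
     * Groups partitions by leader node, leaving out every node that leads
     * at least one partition with locally buffered data.
     * Hypothetical helper; names and types are assumptions for illustration.
     */
    static Map<String, List<String>> plan(Map<String, String> leaderByPartition,
                                          Set<String> bufferedPartitions) {
        // Nodes that lead any buffered partition receive no fetch request this round.
        Set<String> nodesToSkip = new HashSet<>();
        for (String partition : bufferedPartitions) {
            String leader = leaderByPartition.get(partition);
            if (leader != null) {
                nodesToSkip.add(leader);
            }
        }

        // Build one request (a list of partitions) per remaining node.
        Map<String, List<String>> requests = new TreeMap<>();
        for (Map.Entry<String, String> entry : leaderByPartition.entrySet()) {
            String node = entry.getValue();
            if (!nodesToSkip.contains(node)) {
                requests.computeIfAbsent(node, n -> new ArrayList<>()).add(entry.getKey());
            }
        }
        return requests;
    }
}
```

Because a skipped node receives no request at all, no request is ever sent that omits only the buffered partitions, which is the property the commit message relies on.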

Reviewers: Lianet Magrans <lmagrans@confluent.io>, Jeff Kim <jeff.kim@confluent.io>, Jun Rao <junrao@gmail.com>
2025-01-30 13:12:11 -08:00
Joao Pedro Fonseca Dantas 9980e12ce1
MINOR: remove close from contextual processors javadoc
Reviewers: Matthias J. Sax <matthias@confluent.io>
2025-01-30 11:02:07 -08:00
Matthias J. Sax ea07ff7694
MINOR: cleanup KStream JavaDocs (2/N) - print/foreach/peek/split/merge (#18704)
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2025-01-30 09:32:57 -08:00
Matthias J. Sax a916a1db82
MINOR: cleanup KStream JavaDocs (1/N) - filter[Not]/selectKey (#18703)
Reviewers: Alieh Saeedi <asaeedi@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>
2025-01-30 09:31:47 -08:00
Ken Huang 4b29fd6383
KAFKA-18034: CommitRequestManager should fail pending requests on fatal coordinator errors (#18548)
Reviewers: Lianet Magrans <lmagrans@confluent.io>, Kirk True <ktrue@confluent.io>
2025-01-30 11:22:54 -05:00
Sushant Mahajan be96807ac8
MINOR: Refactor share coord cache helper to share package. (#18743)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-01-30 13:33:42 +00:00
Pramithas Dhakal aa27df9396
MINOR: KafkaProducerTest - Fix resource leakage and replace explicit invocation of close() method with try with resources (#18678)
Reviewers: Divij Vaidya <diviv@amazon.com>, Greg Harris <greg.harris@aiven.io>, Christo Lolov <lolovc@amazon.com>
2025-01-30 12:34:57 +01:00
yx9o c0b5d3334a
MINOR: Improve error message for invalid topic in TopicCommand (#18714)
Reviewers: Divij Vaidya <diviv@amazon.com>
2025-01-30 12:07:45 +01:00
Almog Gavra 95abd174c7
MINOR: fix typo in HTML docs (#18742)
Reviewers: Divij Vaidya <diviv@amazon.com>
2025-01-30 11:58:13 +01:00
Mehari Beyene cc259d76e9
KAFKA-18570: Update documentation to add remainingLogsToRecover, remainingSegmentsToRecover and LogDirectoryOffline metrics (#18731)
Reviewers: Divij Vaidya <diviv@amazon.com>
2025-01-30 11:52:35 +01:00
Lucas Brutschy 56e50120be
KAFKA-18621: Add StreamsCoordinatorRecordHelpers (#18669)
A class with helper methods to create records stored in the __consumer_offsets topic.

Compared to the feature branch, I added unit tests (most functions were not tested) and adopted the new interface for constructing coordinator records introduced by David.

Reviewers: Bruno Cadonna <cadonna@apache.org>
2025-01-30 09:28:45 +01:00
PoAn Yang 0dfc4017b8
KAFKA-18441: Fix flaky KafkaAdminClientTest#testAdminClientApisAuthenticationFailure (#18735)
Reviewers: Lianet Magrans <lmagrans@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-01-30 08:01:20 +00:00
David Arthur 617196c68e
KAFKA-18636 Fix how we handle Gradle exits in CI (#18681)
This patch removes the explicit failure of test tasks in Gradle when there is a flaky test. This also fixes a fall-through case in junit.py where we did not recognize an error prior to running the tests (such as the javadoc task).

Additionally, this patch removes usages of ignoreFailures in our CI and changes the XML copy task to a finalizer task instead of doLast closure.

Reviewers: Jun Rao <junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-29 18:42:39 -05:00
Calvin Liu fdbed6c458
KAFKA-18649: complete ClearElrRecord handling (#18708)
Implement ClearElrRecord handling in the TopicDelta. Also, the ReplicationControlManager should not merge updates if ELR/LastKnownElr are empty, because that will cause an unnecessary partition epoch bump.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2025-01-29 15:07:44 -08:00
TengYao Chi 9dd73d43b0
KAFKA-18569: New consumer close may wait on unneeded FindCoordinator (#18590)
Reviewers: Lianet Magrans <lmagrans@confluent.io>, Kirk True <ktrue@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-29 14:15:56 -05:00
Matthias J. Sax 1123a76110
KAFKA-13722: remove internal usage of old ProcessorContext (#18698)
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2025-01-29 11:13:57 -08:00
Bill Bejeck 20b073bbee
KAFKA-18498: Update lock ownership from main thread (#18732)
Once a StreamThread receives its assignment, it will close the startup tasks. But during the closing process, the StandbyTask.closeClean() method will eventually call the StateManagerUtil.closeStateManager method which needs to lock the state directory, but locking requires the calling thread be the current owner. Since the main thread grabs the lock on startup but moves on without releasing it, we need to update ownership explicitly here in order for the stream thread to close the startup task and begin processing.
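
The ownership problem described above can be illustrated with a toy owner-checked lock table. This is a sketch only: `OwnedLocks`, its methods, and the task id format are hypothetical and do not reflect the actual state directory implementation in Kafka Streams.

```java
import java.util.HashMap;
import java.util.Map;

final class OwnedLocks {
    // Maps a task id to the thread that currently owns its lock.
    private final Map<String, Thread> owners = new HashMap<>();

    /** Succeeds only if the lock is free or already owned by the caller. */
    synchronized boolean lock(String taskId, Thread caller) {
        Thread owner = owners.putIfAbsent(taskId, caller);
        return owner == null || owner == caller;
    }

    /** Hands an existing lock to a new owner; returns false if the lock is not held. */
    synchronized boolean transferOwnership(String taskId, Thread newOwner) {
        return owners.replace(taskId, newOwner) != null;
    }
}
```

If one thread takes the lock and never releases it, a second thread's `lock` call keeps failing until `transferOwnership` is called, which mirrors the explicit ownership update the commit message describes.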

Reviewers: Matthias Sax <mjsax@apache.org>, Nick Telford
2025-01-29 14:09:44 -05:00
Joao Pedro Fonseca Dantas 85109a5111
KAFKA-16339: Add Kafka Streams migrating guide from transform to process (#18314)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2025-01-29 11:07:16 -08:00
PoAn Yang 4dd0bcbde8
KAFKA-18383 Remove reserved.broker.max.id and broker.id.generation.enable (#18478)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2025-01-30 02:55:09 +08:00
Calvin Liu a3b34c1315
KAFKA-18662: Return CONCURRENT_TRANSACTIONS on produce request in TV2 (#18733)
While testing, it was found that the not_enough_replicas error was super common and could be easily confused. Since we are already bumping the request, we can signify that the produce request may return this error and new clients can handle it.

(Note: the Java client should be able to handle this already as a retriable error, but other client libraries may need to implement this change.)

Reviewers: Justine Olshan <jolshan@confluent.io>
2025-01-29 10:15:48 -08:00
Sushant Mahajan 632aedcf4f
KAFKA-18632: Multibroker test improvements. (#18718)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-01-29 17:03:43 +00:00
Jeff Kim 048dfeffd0
MINOR: prevent exception from HdrHistogram (#18674)
HdrHistogram can throw an exception if the recorded value is greater than a configured limit. Expand the ceiling from per-metric to all invocations.
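
One common way to avoid an out-of-range failure is to clamp every value before it is recorded. The sketch below uses a stand-in recorder rather than HdrHistogram's API, and the class and method names are hypothetical. Whether the patch clamps in exactly this way is an assumption.

```java
final class BoundedRecorder {
    private final long highestTrackableValue;
    private long count;
    private long max;

    BoundedRecorder(long highestTrackableValue) {
        if (highestTrackableValue < 0) {
            throw new IllegalArgumentException("highestTrackableValue must be non-negative");
        }
        this.highestTrackableValue = highestTrackableValue;
    }

    /** Clamps the value into [0, highestTrackableValue] so that no call can exceed the limit. */
    void record(long value) {
        long clamped = Math.max(0L, Math.min(value, highestTrackableValue));
        count++;
        max = Math.max(max, clamped);
    }

    long count() {
        return count;
    }

    long max() {
        return max;
    }
}
```

Placing the clamp inside `record` applies the ceiling to every invocation, rather than relying on each caller to bound its own values.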

Reviewers: David Jacot <djacot@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
2025-01-29 11:34:46 -05:00
Abhinav Dixit dd1f2b8aab
KAFKA-18653: Fix mocks and potential thread leak issues causing silent RejectedExecutionException in share group broker tests (#18725)
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-01-29 16:24:30 +00:00
Ismael Juma ca5d2cf76d
KAFKA-18646: Null records in fetch response breaks librdkafka (#18726)
Ensure we always return empty records (including cases where an error is returned).
We also remove `nullable` from `records` since it is effectively expected to be
non-null by a large percentage of clients in the wild.

This behavior regressed in fe56fc9 (KAFKA-18269). Empty records were
previously set via `FetchResponse.recordsOrFail(partitionData)` in the
now-removed `maybeConvertFetchedData` method.

Added an integration test that fails without this fix and also updated many
tests to set `records` to `empty` instead of leaving them as `null`.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, David Arthur <mumrah@gmail.com>
2025-01-29 07:04:12 -08:00
TengYao Chi 97a228070e
KAFKA-18619: New consumer topic metadata events should set requireMetadata flag (#18668)
Reviewers: Lianet Magrans <lmagrans@confluent.io>
2025-01-29 08:36:05 -05:00
Andrew Schofield f960e20647
KAFKA-18488: Improve KafkaShareConsumerTest (#18728)
Reviewers: Lianet Magrans <lmagrans@confluent.io>
2025-01-29 09:47:21 +00:00
Ismael Juma e6d72c9e60
KAFKA-18648: Add back support for metadata version 0-3 (#18716)
During testing, we identified that kafka-python (and aiokafka) relies on metadata request v0 and
hence we need to add these back to comply with the premise of KIP-896 - i.e. it should not
break the clients listed within it.

I reverted the changes from #18218 related to the removal of metadata versions 0-3.

I will submit a separate PR to undeprecate these API versions on the relevant 3.x branches.

kafka-python (and aiokafka) work correctly (produce & consume) with this change on
top of the 4.0 branch.

Reviewers: David Arthur <mumrah@gmail.com>
2025-01-28 18:35:33 -08:00
David Arthur f18457f2b8
MINOR Mark a StickyAssignorTest as flaky (#18719)
Mark StickyAssignorTest#testLargeAssignmentAndGroupWithNonEqualSubscription as flaky. Used data from this
report https://github.com/apache/kafka/actions/runs/12982945953

Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-01-28 10:34:05 -05:00
Apoorv Mittal c7619ef8d1
KAFKA-17951: Share partition rotate strategy (#18651)
Reviewers: Andrew Schofield <aschofield@confluent.io>, Abhinav Dixit <adixit@confluent.io>
2025-01-28 11:44:48 +00:00
Sushant Mahajan f32932cc25
KAFKA-18629: Delete share group state impl [1/N] (#18712)
Reviewers: Christo Lolov <lolovc@amazon.com>, Andrew Schofield <aschofield@confluent.io>
2025-01-28 11:43:01 +00:00
Ken Huang 5631be20a6
MINOR: Remove ZooKeeper mentions in comments (#18646)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2025-01-28 12:35:46 +01:00
Apoorv Mittal 04567cdb22
KAFKA-18657: Fixing SharePartitionManager flaky test (#18710)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Andrew Schofield <aschofield@confluent.io>
2025-01-28 08:06:58 +00:00
Matthias J. Sax dc396f47e8
KAFKA-17162: join() started thread in DefaultTaskManagerTest (#18570)
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
2025-01-27 16:48:07 -08:00
TaiJuWu e89b30d14e
KAFKA-18528: MultipleListenersWithSameSecurityProtocolBaseTest and GssapiAuthenticationTest should run for async consumer (#18555)
Reviewers: Kirk True <ktrue@confluent.io>, Lianet Magrans <lmagrans@confluent.io>
2025-01-27 15:49:44 -05:00
Matthias J. Sax ee4f8f8c42
KAFKA-18541: fix flaky KafkaStreamsTelemetryIntegrationTest (#18569)
Reviewers: Bill Bejeck <bill@confluent.io>
2025-01-27 09:43:25 -08:00
Martin Sillence d001b47093
KAFKA-17792: Efficiently parse decimals with large exponents in Connect Values (#17510)
Reviewers: Greg Harris <greg.harris@aiven.io>, Mickael Maison <mickael.maison@gmail.com>
2025-01-27 09:04:29 -08:00
Sushant Mahajan b92cd9d236
KAFKA-18632: Added few share consumer multibroker tests. (#18679)
Reviewers: Andrew Schofield <aschofield@confluent.io>
2025-01-27 12:56:56 +00:00