Commit Graph

11022 Commits

Author SHA1 Message Date
Gantigmaa Selenge 751a8af1f0
KAFKA-14420: Use incrementalAlterConfigs API for syncing topic configurations in MirrorMaker 2 (KIP-894) (#13373)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Chris Egerton <chrise@aiven.io>
2023-04-10 11:55:49 -04:00
Victoria Xia 17435484e4
KAFKA-14491: [22/N] Add test for manual upgrade to versioned store (#13449)
Adds an integration test for the manual upgrade scenario to upgrade a non-versioned store to a versioned store. The procedure is outlined in KIP-889 and also in the docs.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-04-07 11:15:17 -07:00
José Armando García Sancio 672dd3ab6a
KAFKA-13020; Implement reading Snapshot log append timestamp (#13345)
The SnapshotReader exposes the "last contained log time". This is mainly used during snapshot cleanup. The previous implementation used the append time of the snapshot record. This is not accurate as this is the time when the snapshot was created and not the log append time of the last record included in the snapshot.

The log append time of the last record included in the snapshot is stored in the header control record of the snapshot. The header control record is the first record of the snapshot.

To be able to read this record, this change extends the RecordsIterator to decode and expose the control records in the Records type.

Reviewers: Colin Patrick McCabe <cmccabe@apache.org>
2023-04-07 09:25:54 -07:00
Calvin Liu d5e216d618
KAFKA-14617: Fill broker epochs to the AlterPartitionRequest (#13489)
As the third part of KIP-903, this change fills the broker epochs from the Fetch request into the AlterPartitionRequest. Also, before generating the AlterPartitionRequest, the partition checks whether the broker epoch from the FetchRequest matches the broker epoch recorded in the metadata cache. If not, the ISR change is delayed.

Reviewers: Jun Rao <junrao@gmail.com>
2023-04-07 09:09:29 -07:00
Philip Nee ef453dd1ad
KAFKA-12634 enforce checkpoint after restoration (#13269)
Under at-least-once, we want to checkpoint the progress once restoration completes, so that the progress is not lost and restoration does not have to start from scratch.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Bruno Cadonna <cadonna@apache.org>
2023-04-07 11:18:40 +02:00
Chia-Ping Tsai 637bc92ba1
MINOR: move RecordReader from org.apache.kafka.tools (client module) to org.apache.kafka.tools.api (tools-api module) (#13454)
Reviewers: Jun Rao <junrao@gmail.com>
2023-04-07 00:20:56 +08:00
Luke Chen f02f5f8c8a
MINOR: fix stream failing tests (#13512)
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2023-04-06 09:00:10 -07:00
Lucas Brutschy 2117c4bce8
Minor: fix ReadOnlyTaskTest (#13519)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2023-04-06 08:56:07 -07:00
Roman Schmitz 4f34ce1b49
KAFKA-14376: Add ConfigProvider to make use of environment variables KIP-887 (#12992)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Jordan Moore <crikket.007@gmail.com>, Chris Egerton <fearthecellos@gmail.com>
2023-04-06 17:22:12 +02:00
Chia-Ping Tsai 3bbff167fa
MINOR: fix invalid usage in java docs (#13506)
Reviewers: Luke Chen <showuon@gmail.com>
2023-04-06 16:01:14 +08:00
stejani-cflt eb0819c202
MINOR: add under-min-isr string to log messages (#13508)
Whenever there are changes to the ISR, add an extra string to the existing log message in case the partition is under min ISR. This makes it easier to search the log when partitions go under min-ISR.

Reviewers: Luke Chen <showuon@gmail.com>, Colin Patrick McCabe <colin@cmccabe.xyz>
2023-04-06 15:04:48 +08:00
David Jacot 290eeed7ba
KAFKA-14880; TransactionMetadata with producer epoch -1 should be expirable (#13499)
We have seen the following error in logs:

```
"Mar 22, 2019 @ 21:57:56.655",Error,"kafka-0-0","transaction-log-manager-0","Uncaught exception in scheduled task 'transactionalId-expiration'","java.lang.IllegalArgumentException: Illegal new producer epoch -1
```

Investigations showed that it is actually possible for a transaction metadata object to still have -1 as producer epoch when it transitions to Dead.

When a transaction metadata object is created for the first time (in handleInitProducerId), it has -1 as its producer epoch. Then a producer epoch is assigned and the transaction coordinator tries to persist the change. If the write fails, for instance because the partition is under min ISR, the transaction metadata keeps -1 as its epoch forever or until the init producer id is retried.

This means that it is possible for transaction metadata to remain with -1 as producer epoch until it gets expired. At the moment, this is not allowed because we enforce a producer epoch greater than or equal to 0 in prepareTransitionTo.

Reviewers: Luke Chen <showuon@gmail.com>, Justine Olshan <jolshan@confluent.io>
2023-04-06 08:45:16 +02:00
Guozhang Wang b2ee6df1c4
KAFKA-14172: Should clear cache when active recycled from standby (#13369)
This fix is inspired by #12540.

1. Added a clearCache function for CachedStateStore, which would be triggered upon recycling a state manager.
2. Added the integration test inherited from #12540.
3. Improved some log4j entries.
4. Found and fixed a minor issue with log4j prefix.

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2023-04-05 16:05:11 -07:00
Guozhang Wang 653baa6694
KAFKA-10199: Add task updater metrics, part 2 (#13300)
Part of KIP-869

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2023-04-05 11:49:08 -07:00
Victoria Xia df59cc1a01
KAFKA-14491: [20/N] Add public-facing methods for versioned stores (#13442)
Until this PR, all the code added for KIP-889 for introducing versioned stores to Kafka Streams has been accessible from internal packages only. This PR exposes the stores via public Stores.java methods, and also updates the TopologyTestDriver.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-04-05 09:27:53 -07:00
Luke Chen 31f9a54cba
KAFKA-14850: introduce InMemoryLeaderEpochCheckpoint (#13456)
The motivation for introducing InMemoryLeaderEpochCheckpoint is to allow the remote log manager to create the RemoteLogSegmentMetadata (RLSM) with the correct leader epoch info for a specific segment. To do that, we need to rely on the LeaderEpochCheckpointCache to truncate from start and end, to get the epoch info. However, we don't really want to truncate the epochs in the cache (and write to the checkpoint file in the end). So, we introduce this InMemoryLeaderEpochCheckpoint to feed into LeaderEpochCheckpointCache, and when we truncate the epochs for RLSM, we can do it in memory without affecting the checkpoint file, and without interacting with the file system.

Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>
2023-04-05 20:11:32 +08:00
Guozhang Wang beb0be5fe4
KAFKA-14533: Do not interrupt state-updater thread during shutdown (#13318)
1. Fix the StateUpdater shutdown procedure: a) in shutdown, we first set the running flag, then notify the condition; b) in the thread's waitIfAllChangelogsCompletelyRead block, we collapse the if condition together with the while condition so that we always check all four conditions once the thread is notified inside the while loop. As a result, the shutdown procedure no longer involves any thread interruptions.
2. Print a fine-grained streams exception when list-offset fails. This is a byproduct of the debugging procedure, but I think it's worth keeping since it gives better operational visibility.
3. Some nit logging improvements (including moving logger from the inner thread into the outer class to also add some more logging).
4. Re-enable state-updater in SmokeTestDriverIntegrationTest.
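
The shutdown ordering in item 1 can be illustrated with a minimal, self-contained sketch. This is not the StateUpdater code: the class name `UpdaterShutdownSketch`, the single `hasWork` flag (a stand-in for the real wait conditions), and the 5-second join timeout are all assumptions made for illustration. The point is only that the flag is changed before the condition is signalled, and that the waiting thread re-checks every condition in its `while` guard, so no interrupt is needed to stop it.

```java
import java.util.concurrent.locks.Condition;
import java.util.concurrent.locks.ReentrantLock;

// Hypothetical sketch of an interrupt-free shutdown; names are illustrative only.
final class UpdaterShutdownSketch {
    private final ReentrantLock lock = new ReentrantLock();
    private final Condition wakeUp = lock.newCondition();
    private boolean running = true;   // guarded by lock
    private boolean hasWork = false;  // guarded by lock; stand-in for the real wait conditions
    private final Thread worker = new Thread(this::runLoop, "updater-sketch");

    void start() {
        worker.start();
    }

    private void runLoop() {
        while (true) {
            lock.lock();
            try {
                // All conditions live in the while guard, so each wake-up re-checks them together.
                while (running && !hasWork) {
                    wakeUp.awaitUninterruptibly();
                }
                if (!running) {
                    return; // the finally block still releases the lock
                }
                hasWork = false; // pretend the pending work was processed
            } finally {
                lock.unlock();
            }
        }
    }

    void signalWork() {
        lock.lock();
        try {
            hasWork = true;
            wakeUp.signalAll();
        } finally {
            lock.unlock();
        }
    }

    // Returns true if the worker thread exited within the (assumed) timeout.
    boolean shutdown() {
        lock.lock();
        try {
            running = false;     // a) change the running flag first
            wakeUp.signalAll();  //    then notify the condition
        } finally {
            lock.unlock();
        }
        try {
            worker.join(5_000);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            return false;
        }
        return !worker.isAlive();
    }

    public static void main(String[] args) {
        UpdaterShutdownSketch sketch = new UpdaterShutdownSketch();
        sketch.start();
        sketch.signalWork();
        System.out.println("stopped cleanly: " + sketch.shutdown());
    }
}
```

Because the flag and the signal are both handled under the same lock, the worker cannot miss the wake-up between checking `running` and going to sleep.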

Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Bruno Cadonna <cadonna@apache.org>
2023-04-04 15:29:00 -07:00
Proven Provenzano 6d36db1c78
KAFKA-14765 and KAFKA-14776: Support for SCRAM at bootstrap with integration tests (#13374)
Implement KIP-900

Update kafka-storage to be able to add SCRAM records to the bootstrap metadata file at format time so that SCRAM is enabled at initial start (bootstrap) of KRaft cluster. Includes unit tests.

Update ./core/src/test/scala/integration/kafka/api/SaslScramSslEndToEndAuthorizationTest.scala to use bootstrap and
enable the test to run with both ZK and KRaft quorum.

Moved the one test from ScramServerStartupTest.scala into SaslScramSslEndToEndAuthorizationTest.scala. This test is really small, so there was no point in recreating all the bootstrap startup just for a 5 line test when it could easily be run elsewhere.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Manikumar Reddy <manikumar.reddy@gmail.com>
2023-04-04 08:34:09 -07:00
Chris Egerton 5e820571de
MINOR: Fix base ConfigDef in AbstractHerder::connectorPluginConfig (#13466)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Greg Harris <gharris1727@gmail.com>
2023-04-04 11:57:42 +02:00
Victoria Xia babfb1778b
KAFKA-14864: Close iterator in KStream windowed aggregation emit on window close (#13470)
Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-04-03 21:29:40 -07:00
Victoria Xia 63fee01366
KAFKA-14491: [19/N] Combine versioned store RocksDB instances into one (#13431)
The RocksDB-based versioned store implementation introduced in KIP-889 currently uses two physical RocksDB instances per store instance: one for the "latest value store" and another for the "segments store." This PR combines those two RocksDB instances into one by representing the latest value store as a special "reserved" segment within the segments store. This reserved segment has segment ID -1, is never expired, and is not included in the regular Segments methods for getting or creating segments, but is represented in the physical RocksDB instance the same way as any other segment.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-04-03 21:27:19 -07:00
Victoria Xia f503aa3ab4
KAFKA-14491: [16/N] Add recovery logic for store inconsistency due to failed write (#13364)
The RocksDB-based implementation of versioned stores introduced via KIP-889 consists of a "latest value store" and separate (logical) "segments stores." A single put operation may need to modify multiple (two) segments, or both a segment and the latest value store, which opens the possibility of store inconsistencies if the first write succeeds while the later one fails. When this happens, Streams will error out, but the store still needs to be able to recover upon restart. This PR adds the necessary repair logic into RocksDBVersionedStore to effectively undo the earlier failed write when a store inconsistency is encountered.

Reviewers: Matthias J. Sax <matthias@confluent.io>
2023-04-03 21:23:48 -07:00
Andreas Maechler 15e896a5b3
Fix typos in security.html (#13480)
Reviewers: Divij Vaidya <diviv@amazon.com>, Jun Rao <junrao@gmail.com>
2023-04-03 14:28:25 -07:00
Pierangelo Di Pilato 4e1fcf1847
KAFKA-14771: Include threads info in ConcurrentModificationException message (#13325)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Divij Vaidya <diviv@amazon.com>
2023-04-03 12:35:13 +02:00
Rajini Sivaram 1f0ae71fb3
KAFKA-14452: Make sticky assignors rack-aware if client rack is configured (KIP-881) (#13350)
Best-effort rack alignment for sticky assignors when both consumer racks and partition racks are available with the protocol changes introduced in KIP-881. Rack-aware assignment is enabled by configuring client.rack for consumers. The assignment builders attempt to align on racks on a best-effort basis, but prioritize balanced assignment over rack alignment.
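
As a minimal sketch of the consumer-side setup described above, the snippet below builds a plain `java.util.Properties` object with string keys so that it runs without the Kafka client library on the classpath. The class name, broker address, group id, and rack name are placeholder assumptions; `client.rack` and `partition.assignment.strategy` are the consumer configuration keys, and `org.apache.kafka.clients.consumer.StickyAssignor` is assumed here as the sticky assignor in use.

```java
import java.util.Properties;

// Hypothetical helper; only the configuration keys come from the Kafka consumer.
final class RackAwareConsumerConfigSketch {
    static Properties consumerProps(String rack) {
        Properties props = new Properties();
        props.setProperty("bootstrap.servers", "localhost:9092"); // placeholder address
        props.setProperty("group.id", "example-group");           // placeholder group id
        // Setting client.rack is what enables rack-aware assignment for this consumer.
        props.setProperty("client.rack", rack);
        props.setProperty("partition.assignment.strategy",
                "org.apache.kafka.clients.consumer.StickyAssignor");
        return props;
    }

    public static void main(String[] args) {
        Properties props = consumerProps("rack-a"); // "rack-a" is a made-up rack name
        props.forEach((key, value) -> System.out.println(key + "=" + value));
    }
}
```

These properties would normally be passed to a `KafkaConsumer` constructor along with key and value deserializers, which are omitted here.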

Reviewers: David Jacot <djacot@confluent.io>
2023-04-03 09:22:47 +01:00
Yash Mayya 970dea60e8
KAFKA-14785 (KIP-875): Connect offset read REST API (#13434)
Reviewers: Chris Egerton <chrise@aiven.io>
2023-04-02 13:09:33 -04:00
Colin P. McCabe 145ef2d1e0
MINOR: fix BrokerMetadataPublisherTest.testExceptionInUpdateCoordinator
Fix a case where we were getting an exception because we removed a publisher, but left it in
BrokerServer.metadataPublishers (resulting in us trying to remove it during broker shutdown.)
2023-03-31 10:15:52 -07:00
Luke Chen 372b0f1c58
Suppress exception in testExceptionInUpdateCoordinator (#13486)
Reviewers: David Arthur <mumrah@gmail.com>
2023-03-31 11:20:35 -04:00
Dániel Urbán 0aa365add8
KAFKA-14838: Add flow/connector/task/role information to MM2 Kafka client.id configs (#13458)
Reviewers: Chris Egerton <chrise@aiven.io>
2023-03-31 10:50:11 -04:00
Rajini Sivaram 3c4472d701
KAFKA-14867: Trigger rebalance when replica racks change if client.rack is configured (KIP-881) (#13474)
When `client.rack` is configured for consumers, we perform rack-aware consumer partition assignment to improve locality. After/during reassignments, replica racks may change, so to ensure optimal consumer assignment, trigger a rebalance from the leader when the set of racks of any partition changes.

Reviewers: David Jacot <djacot@confluent.io>
2023-03-31 15:01:07 +01:00
Luke Chen d849d66717
Use readlock for reading epochs in LeaderEpochFileCache (#13483)
Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>
2023-03-31 18:03:29 +05:30
Calvin Liu 8c88cdb718
KAFKA-14617: Update AlterPartitionRequest and enable KRaft controller to reject stale requests. (#13408)
Second part of the [KIP-903](https://cwiki.apache.org/confluence/display/KAFKA/KIP-903%3A+Replicas+with+stale+broker+epoch+should+not+be+allowed+to+join+the+ISR), it updates the AlterPartitionRequest:
- Deprecate the NewIsr field
- Create a new field BrokerState with BrokerId and BrokerEpoch
- Bump the AlterPartition version to 3

With this change, the Quorum Controller is enabled to reject stale AlterPartition requests.

Reviewers: Jun Rao <junrao@gmail.com>, David Jacot <djacot@confluent.io>
2023-03-31 11:27:42 +02:00
Robert Young 2b26db0d38
Switch to SplittableRandom in ProducerPerformance utility (#13482)
Why:
Using java.util.Random to generate every byte sent from the ProducerPerformance
appears to be a limiting factor. Throughput of the ProducerPerformance script is
higher with a file of records as compared to randomly generated records.

On my machine a single thread can generate ~100MB/second of uppercase letters using
java.util.Random and ~300MB/sec using java.util.SplittableRandom. This is a limit on
throughput.

Note: you can optimise further by expanding it from 26 letters to 32 letters generated,
as it is more efficient to generate a nicely distributed int when the bound is a
power of two.
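
The kind of payload generation discussed above can be sketched as follows. This is not the ProducerPerformance code: the class name `PayloadSketch`, the helper `randomUppercase`, and the fixed seed are assumptions for illustration. It only shows filling a record buffer with uppercase letters drawn from `java.util.SplittableRandom`.

```java
import java.nio.charset.StandardCharsets;
import java.util.SplittableRandom;

// Hypothetical helper illustrating per-byte payload generation with SplittableRandom.
final class PayloadSketch {
    static byte[] randomUppercase(SplittableRandom random, int size) {
        byte[] payload = new byte[size];
        for (int i = 0; i < size; i++) {
            // nextInt(26) yields a value in [0, 26), mapped onto 'A'..'Z'
            payload[i] = (byte) ('A' + random.nextInt(26));
        }
        return payload;
    }

    public static void main(String[] args) {
        SplittableRandom random = new SplittableRandom(0L); // fixed seed, chosen arbitrarily
        byte[] payload = randomUppercase(random, 16);
        System.out.println(new String(payload, StandardCharsets.US_ASCII));
    }
}
```

Note that `SplittableRandom` is not thread-safe, so a sketch like this assumes one instance per generating thread.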

Reviewers: Luke Chen <showuon@gmail.com>
2023-03-31 14:52:10 +08:00
andymg3 b77b7a6f6f
MINOR: Deflake some tests in TopicCommandIntegrationTest (#13479)
A couple tests in TopicCommandIntegrationTest look flaky, such as testTopicDeletion and testTopicWithCollidingCharDeletionAndCreateAgain.

I also removed part of a comment that implied the code only runs in ZK mode, which is not the case.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Justine Olshan <jolshan@confluent.io>
2023-03-30 15:25:15 -07:00
Justine Olshan 6d9d65e666
MINOR: Change ordering of checks to prevent log spam on metadata updates (#13447)
On startup, we always update the metadata. The topic ID also goes from null to defined. Move the epoch is null check to before the topic ID check to prevent log spam.

Reviewers: David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>
2023-03-30 09:23:55 -07:00
andymg3 887d05559f
MINOR: Create only one FeatureControlManager instance in ReplicationControlManagerTest (#13468)
This is a small patch to make it so we only create one FeatureControlManager instance in ReplicationControlManagerTest. Currently we create two, which isn't needed. It's also a bit confusing because the ReplicationControlTestContext object ends up having a different FeatureControlManager reference than the one its own ReplicationControlManager instance has a reference to.

Reviewers: José Armando García Sancio <jsancio@apache.org>, dengziming <dengziming1993@gmail.com>
2023-03-29 19:10:03 -07:00
Philip Nee 5c0e4aa676
KAFKA-14468: Committed API (#13380)
In this PR, I implemented the committed API. Here are the specifics:

* the CommitRequestManager handles committed() request.
* I implemented an UnsentOffsetFetchRequestState to dedupe requests, because we don't want to send identical requests repeatedly.
* I implemented the retry mechanism: some retriable errors will be retried automatically.
* ClientResponse errors are handled in the handlers.
* Some of the top-level APIs were refactored lightly.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2023-03-29 16:09:52 -07:00
Justine Olshan f8d0fc835b
MINOR: Remove addOne to fix build (#13469)
Removed addOne method that broke scala 2.12 build

---------

Co-authored-by: David Arthur <mumrah@gmail.com>

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>
2023-03-29 14:02:47 -07:00
Colin Patrick McCabe 09e59bc776
KAFKA-14857: Fix some MetadataLoader bugs (#13462)
The MetadataLoader is not supposed to publish metadata updates until we have loaded up to the high
water mark. Previously, this logic was broken, and we published updates immediately. This PR fixes
that and adds a junit test.

Another issue is that the MetadataLoader previously assumed that we would periodically get
callbacks from the Raft layer even if nothing had happened. We relied on this to install new
publishers in a timely fashion, for example. However, in older MetadataVersions that don't include
NoOpRecord, this is not a safe assumption.

Aside from the above changes, also fix a deadlock in SnapshotGeneratorTest, fix the log prefix for
BrokerLifecycleManager, and remove metadata publishers on brokerserver shutdown (like we do for
controllers).

Reviewers: David Arthur <mumrah@gmail.com>, dengziming <dengziming1993@gmail.com>
2023-03-29 12:30:12 -07:00
andymg3 379b6978a0
KAFKA-14829: Consolidate reassignment logic into PartitionReassignmentReplicas (#13440)
Currently, we have various bits of reassignment logic spread across different classes. For example, ReplicationControlManager contains logic for when a reassignment is in progress, which is duplicated in PartitionChangeBuilder. Another example is PartitionReassignmentRevert, which contains logic for how to undo/revert a reassignment. The idea here is to move the logic to PartitionReassignmentReplicas so it's more testable and easier to reason about.

Reviewers: José Armando García Sancio <jsancio@apache.org>
2023-03-29 10:12:40 -07:00
Chia-Ping Tsai 6e8d0d9850
KAFKA-14853 the serializer/deserializer which extends ClusterResourceListener is not added to Metadata (#13460)
Reviewers: dengziming <dengziming1993@gmail.com>
2023-03-29 16:02:04 +08:00
Jorge Esteban Quilcate Otoya 5afedd9ac3
KAFKA-14843: Include Connect framework properties when retrieving connector config definitions (#13445)
Reviewers: Yash Mayya <yash.mayya@gmail.com>, Greg Harris <greg.harris@aiven.io>, Chris Egerton <chrise@aiven.io>
2023-03-28 11:26:23 -04:00
hudeqi f7ea9cfb50
KAFKA-14837/14842: Avoid the rebalance caused by the addition and deletion of irrelevant groups for MirrorCheckpointConnector (#13446)
Reviewers: Chris Egerton <chrise@aiven.io>
2023-03-28 09:19:52 -04:00
vamossagar12 c14f56b484
KAFKA-14586: Moving StreamResetter to tools (#13127)
Moves StreamResetter to tools project.

Reviewers: Federico Valeri <fedevaleri@gmail.com>, Christo Lolov <lolovc@amazon.com>, Bruno Cadonna <cadonna@apache.org>
2023-03-28 14:43:22 +02:00
Purshotam Chauhan f3e4dd9229
KAFKA-14827: Support for StandardAuthorizer benchmark (#13423)
* KAFKA-14827: Support for StandardAuthorizer benchmark

Co-authored-by: Purshotam Chauhan <purshotam.r.chauhan@gmail.com>

* reverting unintentional change

---------

Co-authored-by: David Arthur <mumrah@gmail.com>

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2023-03-28 14:14:50 +05:30
David Arthur f1b3732fa6
KAFKA-14796 Migrate ACLs from AclAuthorizer to KRaft (#13368)
This patch refactors the loadCache method in AclAuthorizer to make it reusable by ZkMigrationClient.
The loaded ACLs are converted to AccessControlEntryRecord. I noticed we still have the defunct
AccessControlRecord, so I've deleted it.

Also included here are the methods to write ACL changes back to ZK while in dual-write mode.

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Colin P. McCabe <cmccabe@apache.org>
2023-03-27 16:12:02 -07:00
Kirk True 31440b00f3
KAFKA-14848: KafkaConsumer incorrectly passes locally-scoped serializers to FetchConfig (#13452)
Fix for an NPE bug that was caused by referring to a local variable and not the instance variable of the deserializers.

Co-authored-by: Robert Yokota <1761488+rayokota@users.noreply.github.com>

Reviewers: Robert Yokota <1761488+rayokota@users.noreply.github.com>, Guozhang Wang <wangguoz@gmail.com>
2023-03-27 09:53:12 -07:00
Chia-Ping Tsai 7438f100cf
KAFKA-14774 the removed listeners should not be reconfigurable (#13326)
Reviewers: Mickael Maison <mimaison@users.noreply.github.com>
2023-03-27 18:48:31 +08:00
egyedt 139f7709bd
Fix log DateTime format unit test (#13441)
Reviewers: Viktor Somogyi-Vass <viktorsomogyi@gmail.com>
2023-03-27 10:48:47 +02:00
Iblis Lin e4af074b4c
MINOR: doc: fix typo in config-streams (#13450)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2023-03-26 00:00:53 +08:00