kafka

Commit Graph

Author	SHA1	Message	Date
Manikumar Reddy	728666f3ad	KAFKA-15502: Update SslEngineValidator to handle large stores (#14445 ) We have observed an issue where the inter-broker SSL listener does not come up when running with TLSv1.3/JDK 17. SSL debug logs show that TLSv1.3 post-handshake messages >16K are not being read, causing the SslEngineValidator process to get stuck while validating the provided trust/key store. - Right now, WRAP returns if there is already data in the buffer. But if we need more data to be wrapped for UNWRAP to succeed, we end up looping forever. To fix this, now we always attempt WRAP and only return early on BUFFER_OVERFLOW. - Update SslEngineValidator to unwrap post-handshake messages from peer when local handshake status is FINISHED. Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>	2023-10-08 12:28:40 +05:30
Matthias J. Sax	c9ae44e811	MINOR: update Kafka versions for system tests (#14501 ) Reviewers: Bill Bejeck <bill@confluent.io>	2023-10-05 11:00:44 -07:00
Justine Olshan	9d7a821273	KAFKA-15330: Add missing documentation of metrics introduced as part of KAFKA-15028 (#14480 ) I've added details for VerificationFailureRate and VerificationTimeMs. I considered adding the documentation for the AddPartitionsToTxnVerification metrics, but I noticed that all the request metrics simply listed Produce|FetchConsumer|FetchFollower. If we don't already report the AddPartitionsToTxn request metrics in this file, it doesn't make sense to add the verification variant. (As well as all the other APIs we report) Filed a followup jira if we want to redo that whole section. Reviewers: Divij Vaidya <diviv@amazon.com>	2023-10-04 13:30:50 -07:00
Satish Duggana	2edd22bcab	MINOR Update 3.6 branch version to 3.6.1-SNAPSHOT	2023-10-03 14:04:42 -07:00
Satish Duggana	2097c8fa4c	Merge tag '3.6.0-rc2' into 3.6 3.6.0-rc2	2023-10-03 13:41:20 -07:00
David Arthur	0022949281	KAFKA-15483: Add KIP-938 and KIP-866 metrics to bundled docs (#14421 ) Reviewers: Divij Vaidya <diviv@amazon.com>, Ron Dagostino <rdagostino@confluent.io>	2023-10-03 13:41:41 +02:00
Lucas Brutschy	72e275f6ea	MINOR: Logging fix in StreamsPartitionAssignor (#14435 ) Fix broken log message Reviewer: A. Sophie Blee-Goldman <ableegoldman@apache.org>	2023-10-02 12:33:09 +02:00
Hao Li	3a793b094c	MINOR: only log error when rack aware assignment is enabled (#14415 ) Reviewers: Lucas Brutschy <lbrutschy@confluent.io>, Matthias J. Sax <matthias@confluent.io>	2023-09-29 10:17:37 -07:00
iit2009060	1897af3ef9	KAFKA-15511: Handle CorruptIndexException in RemoteIndexCache (#14459 ) A bug in the RemoteIndexCache leads to a situation where the cache does not replace the corrupted index with a new index instance fetched from remote storage. This commit fixes the bug by adding correct handling for `CorruptIndexException`. Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Alexandre Dupriez <duprie@amazon.com>	2023-09-29 10:28:37 +00:00
Satish Duggana	60e845626d	Bump version to 3.6.0	2023-09-28 21:56:28 -07:00
Kamal Chandraprakash	0d553cc9c6	KAFKA-15499: Fix the flaky DeleteSegmentsDueToLogStartOffsetBreach test (#14439 ) DeleteSegmentsDueToLogStartOffsetBreach configures the segment such that it can hold at most 2 record-batches. It also asserts the local-log-start-offset based on the assumption that each segment will contain exactly two messages. During leader switch, the segment can get rotated and may not always contain two records. Previously, we were checking whether the expected local-log-start-offset is equal to the base-offset-of-the-first-local-log-segment. With this patch, we will scan the first local-log-segment for the expected offset. Reviewers: Divij Vaidya <diviv@amazon.com>	2023-09-28 13:06:40 +00:00
Luke Chen	4fdac6136b	KAFKA-15498: bump snappy-java version to 1.1.10.4 (#14434 ) bump snappy-java version to 1.1.10.4, and add more tests to verify the compressed data can be correctly decompressed and read. For LogCleanerParameterizedIntegrationTest, we increased the message size for snappy decompression since in the new version of snappy, the decompressed size is increasing compared with the previous version. But since the compression algorithm is not kafka's scope, all we need to do is to make sure the compressed data can be successfully decompressed and parsed/read. Reviewers: Divij Vaidya <diviv@amazon.com>, Ismael Juma <ismael@juma.me.uk>, Josep Prat <josep.prat@aiven.io>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>	2023-09-27 19:02:04 +08:00
Divij Vaidya	a6dd6c58e2	Upgrade Jetty to 9.4.52.v20230823 (#14438 ) Reviewers: Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>	2023-09-25 10:26:08 -07:00
Luke Chen	be527ea36c	MINOR: fix kraft upgrade system test (#14424 ) We should use DEV_BRANCH instead of DEV_VERSION in this case, otherwise, error will be thrown: RunnerClient: kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.6.0-SNAPSHOT.metadata_quorum=ISOLATED_KRAFT: FAIL: RemoteCommandError({'ssh_config': {'host': 'ducker10', 'hostname': 'ducker10', 'user': 'ducker', 'port': 22, 'password': '', 'identityfile': '/home/ducker/.ssh/id_rsa', 'connecttimeout': None}, 'hostname': 'ducker10', 'ssh_hostname': 'ducker10', 'user': 'ducker', 'externally_routable_ip': 'ducker10', '_logger': <Logger kafkatest.tests.core.kraft_upgrade_test.TestKRaftUpgrade.test_isolated_mode_upgrade.from_kafka_version=3.6.0-SNAPSHOT.metadata_quorum=ISOLATED_KRAFT-2 (DEBUG)>, 'os': 'linux', '_ssh_client': <paramiko.client.SSHClient object at 0xffffb35d5820>, '_sftp_client': <paramiko.sftp_client.SFTPClient object at 0xffffb35f8ca0>, '_custom_ssh_exception_checks': None}, '/opt/kafka-3.6.0-SNAPSHOT/bin/kafka-storage.sh format --ignore-formatted --config /mnt/kafka/kafka.properties --cluster-id I2eXt9rvSnyhct8BYmW6-w', 127, b'bash: line 1: /opt/kafka-3.6.0-SNAPSHOT/bin/kafka-storage.sh: No such file or directory\n') Reviewers: Satish Duggana <satishd@apache.org>	2023-09-25 16:15:51 +08:00
Divij Vaidya	e8dffea9ab	MINOR: Fix kafka-site formatting (#14419 ) Reviewers: Satish Duggana <satishd@apache.org>, Josep Prat <jlprat@apache.org>	2023-09-21 09:31:04 +00:00
David Arthur	01fa95c216	MINOR: Fix the ZK migration system tests (#14409 ) As part of validating 3.6.0 RC0, I ran the ZK migration system tests at the RC tag. Pretty much all of them failed due to recent changes (particularly, disallowing migrations with JBOD). All of the changes here are test fixes, so not a release blocker. ================================================================================ SESSION REPORT (ALL TESTS) ducktape version: 0.11.3 session_id: 2023-09-19--007 run time: 8 minutes 51.147 seconds tests run: 5 passed: 5 flaky: 0 failed: 0 ignored: 0 Reviewers: Luke Chen <showuon@gmail.com>	2023-09-20 14:36:50 +08:00
Greg Harris	ae352b6397	KAFKA-15473: Hide duplicate plugins in /connector-plugins (#14398 ) Reviewers: Yash Mayya <yash.mayya@gmail.com>, Sagar Rao <sagarmeansocean@gmail.com>, Hector Geraldino <hgeraldino@gmail.com>, Chris Egerton <chrise@aiven.io>	2023-09-19 22:30:18 +05:30
Satish Duggana	193d8c5be8	Added missing licenses for libraries (#14393 ) Reviewers: Luke Chen <showuon@gmail.com>	2023-09-15 23:23:28 +05:30
Luke Chen	8319163062	KAFKA-15442: add a section in doc for tiered storage (#14382 ) Added 6.11: Tiered Storage section and notable changes in v3.6.0 Reviewers: Satish Duggana <satishd@apache.org>, Gantigmaa Selenge <gselenge@redhat.com>	2023-09-14 21:13:26 +05:30
Kamal Chandraprakash	2508e30670	KAFKA-15439: Transactions test with tiered storage (#14347 ) This test extends the existing TransactionsTest. It configures the broker and topic with tiered storage and expects at-least one log segment to be uploaded to the remote storage. Reviewers: Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>, Divij Vaidya <diviv@amazon.com>	2023-09-14 09:52:46 +08:00
Justine Olshan	f13367de4e	KAFKA-15459: Convert coordinator retriable errors to a known producer response error (#14378 ) KIP-890 Part 1 tries to address hanging transactions on old clients. Thus, the produce version can not be bumped and no new errors can be added. Before we used the java client's notion of retriable and abortable errors -- retriable errors are defined as such by extending the retriable error class, fatal errors are defined explicitly, and abortable errors are the remaining. However, many other clients treat non specified errors as fatal and that means many retriable errors kill the application. Stuck between having specific errors for Java clients that are handled correctly (ie we retry) or specific fatal errors for cases that should not be fatal, we opted for a middle ground of non-specific error, but a message in the response to specify. Converting some of the coordinator error codes to NOT_ENOUGH_REPLICAS which is a known produce response. Also correctly add the old errors to the produce response. (We were not doing this correctly before) Added tests for the new errors and messages. Reviewers: Jason Gustafson <jason@confluent.io>, David Jacot <djacot@confluent.io>	2023-09-13 14:23:41 -07:00
Federico Valeri	4902884edd	MINOR: Fix metadata.version reference in "ZooKeeper to KRaft Migration" documentation (#14366 ) In "ZooKeeper to KRaft Migration" documentation, we are still reporting 3.4 as metadata version. Reworking that phrase to make it more clear and avoid the need to update it in the future. Signed-off-by: Federico Valeri <fedevaleri@gmail.com> Reviewers: Luke Chen <showuon@gmail.com>	2023-09-13 17:20:25 +08:00
Luke Chen	89e4976770	MINOR: Fix errors in javadoc and docs in tiered storage (#14379 ) Reviewers: Satish Duggana <satishd@apache.org>	2023-09-13 12:46:52 +05:30
Luke Chen	6b91043bfb	MINOR: reduce default RLMM retry interval (#14374 ) Reduce default remote.log.metadata.initialization.retry.interval.ms value to 100ms. Reviewers: Satish Duggana <satishd@apache.org>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>	2023-09-12 23:03:09 +05:30
David Arthur	50fea09724	KAFKA-15450 Don't allow ZK migration with JBOD (#14367 ) Reviewers: Ron Dagostino <rndgstn@gmail.com>	2023-09-12 10:29:03 -04:00
Abhijeet Kumar	9c44f705b3	KAFKA-14993: Improve TransactionIndex instance handling while copying to and fetching from RSM (#14363 ) - Updated the contract for RSM's fetchIndex to throw a ResourceNotFoundException instead of returning an empty InputStream when it does not have a TransactionIndex. - Updated the LocalTieredStorage implementation to adhere to the new contract. - Added Unit Tests for the change. Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>, Divij Vaidya <diviv@amazon.com>, Christo Lolov <lolovc@amazon.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>	2023-09-12 17:54:57 +05:30
Christo Lolov	4e831b967c	KAFKA-15352: Update log-start-offset before initiating deletion of remote segments (#14349 ) With this change, the current leader updates the log-start-offset before the segments are deleted from remote storage. This provides a best-effort mechanism for followers to receive the log-start-offset from the leader, so they can update their own log-start-offset before becoming leader. Reviewers: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Divij Vaidya <diviv@amazon.com>, Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>	2023-09-12 10:13:44 +05:30
Kamal Chandraprakash	2a56edc0ea	MINOR: Removed the RSM and RLMM classpath config validator (#14358 ) - RSM and RLMM classpath can be empty since it's optional so removed the non-empty string validator - Fix getting the `localTieredStorage` by brokerId after stopping a broker. Reviewers: Christo Lolov <lolovc@amazon.com>, Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>	2023-09-09 19:03:18 +05:30
David Arthur	5318390e71	KAFKA-15441 Allow broker heartbeats to complete in metadata transaction (#14351 ) This patch allows broker heartbeat events to be completed while a metadata transaction is in-flight. More generally, this patch allows any RUNS_IN_PREMIGRATION event to complete while the controller is in pre-migration mode even if the migration transaction is in-flight. We had a problem with broker heartbeats timing out because they could not be completed while a large ZK migration transaction was in-flight. This resulted in the controller fencing all the ZK brokers which has many undesirable downstream effects. Reviewers: Akhilesh Chaganti <akhileshchg@users.noreply.github.com>, Colin Patrick McCabe <cmccabe@apache.org>	2023-09-08 16:36:36 -04:00
David Arthur	365308b52d	KAFKA-15435 Fix counts in MigrationManifest (#14342 ) Reviewers: Liu Zeyu <zeyu.luke@gmail.com>, Colin P. McCabe <cmccabe@apache.org>	2023-09-08 09:14:00 -04:00
Lucas Brutschy	99bc91b73f	MINOR: fix currentLag javadoc (#14224 ) Reviewers: Matthias J. Sax <matthias@confluent.io>	2023-09-07 19:26:13 -07:00
atu-sharm	bb98b61009	KAFKA-15338: The metric group documentation for metrics added in KAFKA-13945 is incorrect (#14221 ) Reviewers: Matthias J. Sax <matthias@confluent.io>	2023-09-07 19:06:13 -07:00
Kamal Chandraprakash	946ab8f410	KAFKA-15410: Delete records with tiered storage integration test (4/4) (#14330 ) * Added the integration test for DELETE_RECORDS API for tiered storage enabled topic * Added validation checks before removing remote log segments for log-start-offset breach Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>, Christo Lolov <lolovc@amazon.com>	2023-09-08 05:16:28 +05:30
José Armando García Sancio	522263d195	KAFKA-14273; Close file before atomic move (#14354 ) On Windows, atomic moves are not allowed if the file has another open handle. E.g., __cluster_metadata-0\quorum-state: The process cannot access the file because it is being used by another process at java.base/sun.nio.fs.WindowsException.translateToIOException(WindowsException.java:92) at java.base/sun.nio.fs.WindowsException.rethrowAsIOException(WindowsException.java:103) at java.base/sun.nio.fs.WindowsFileCopy.move(WindowsFileCopy.java:403) at java.base/sun.nio.fs.WindowsFileSystemProvider.move(WindowsFileSystemProvider.java:293) at java.base/java.nio.file.Files.move(Files.java:1430) at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:949) at org.apache.kafka.common.utils.Utils.atomicMoveWithFallback(Utils.java:932) at org.apache.kafka.raft.FileBasedStateStore.writeElectionStateToFile(FileBasedStateStore.java:152) This is fixed by first closing the temporary quorum-state file before attempting to move it. Reviewers: Colin Patrick McCabe <cmccabe@apache.org> Co-Authored-By: Renaldo Baur Filho <renaldobf@gmail.com>	2023-09-07 16:35:03 -07:00
Chris Egerton	0db8e8c5f2	KAFKA-15416: Fix flaky TopicAdminTest::retryEndOffsetsShouldRetryWhenTopicNotFound test case (#14313 ) Reviewers: Philip Nee <pnee@confluent.io>, Greg Harris <greg.harris@aiven.io>	2023-09-07 19:25:03 -04:00
Chris Egerton	5d185a88e4	KAFKA-15425: Fail fast in Admin::listOffsets when topic (but not partition) metadata is not found (#14314 ) This restores previous behavior for Admin::listOffsets, which was to fail immediately if topic metadata could not be found, and only retry if metadata for one or more specific partitions could not be found. There is a subtle difference here: prior to https://github.com/apache/kafka/pull/13432, the operation would be retried if any metadata error was reported for any individual topic partition, even if an error was also reported for the entire topic. With this change, the operation always fails if an error is reported for the entire topic, even if an error is also reported for one or more individual topic partitions. I am not aware of any cases where brokers might return both topic- and topic partition-level errors for a metadata request, and if there are none, then this change should be safe. However, if there are such cases, we may need to refine this PR to remove the discrepancy in behavior. Reviewers: Justine Olshan <jolshan@confluent.io>	2023-09-07 14:04:27 -07:00
Lucia Cerchie	d571408672	KAFKA-15307: Removes non-existent configs (#14341 ) `partition.grouper` was removed in 3.0 release. Reviewers: Matthias J. Sax <matthias@confluent.io>	2023-09-07 13:00:58 -07:00
Luke Chen	a5e3f0ded4	MINOR: Update the javadoc in RSM (#14352 ) Reviewers: Satish Duggana <satishd@apache.org>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>	2023-09-07 20:57:11 +05:30
Kamal Chandraprakash	5d7840e1b2	KAFKA-15351: Update log-start-offset after leader election for topics enabled with remote storage (#14340 ) On leadership failover, the new leader's start offset may be older than the start offset of old leader. This works fine for local storage scenario because the new leader still contains data associated with stale start offset. But in case of remote storage, although new leader has a stale offset, the data associated with it has been deleted from remote by the old leader. Hence, we end up in a situation where leader has a start offset but no data associated with it. This commit fixes the situation by ensuring that on every leadership failover, for topics with remote storage, the leader will update its start offset from the base of first segment in current leader chain present in the remote storage (if any). Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>, Christo Lolov <lolovc@amazon.com>, Divij Vaidya <diviv@amazon.com>	2023-09-07 14:37:22 +00:00
Proven Provenzano	940f329007	KAFKA-15422: Update documentation for delegation tokens when working with Kafka with KRaft (#14339 ) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	2023-09-06 10:42:30 +05:30
Kamal Chandraprakash	2be8b15323	KAFKA-15410: Delete topic integration test with LocalTieredStorage and TBRLMM (3/4) (#14329 ) Added delete topic integration tests for tiered storage enabled topics with LocalTieredStorage and TBRLMM Reviewers: Satish Duggana <satishd@apache.org>, Divij Vaidya <diviv@amazon.com>, Luke Chen <showuon@gmail.com>	2023-09-06 06:00:05 +05:30
Yash Mayya	4f855576e6	KAFKA-14876: Add stopped state to Kafka Connect Administration docs section (#14336 ) Reviewers: Chris Egerton <chrise@aiven.io>	2023-09-05 14:44:24 -04:00
Yash Mayya	3c50c382af	MINOR: Update the documentation's table of contents to add missing headings for Kafka Connect (#14337 ) Reviewers: Chris Egerton <chrise@aiven.io>	2023-09-05 13:59:35 -04:00
Abhijeet Kumar	7f50497925	KAFKA-15293 Added documentation for tiered storage metrics (#14331 ) Reviewers: Divij Vaidya <diviv@amazon.com>, Satish Duggana <satishd@apache.org>	2023-09-05 22:19:53 +05:30
Luke Chen	b7df99abec	MINOR: Update comment in consumeAction (#14335 ) Reviewers: Satish Duggana <satishd@apache.org>, Divij Vaidya <diviv@amazon.com>	2023-09-05 21:36:57 +05:30
Kamal Chandraprakash	33b385e3fa	KAFKA-15410: Reassign replica expand, move and shrink integration tests (2/4) (#14328 ) - Updated the log-start-offset to the correct value while building the replica state in ReplicaFetcherTierStateMachine#buildRemoteLogAuxState Integration tests added: 1. ReassignReplicaExpandTest 2. ReassignReplicaMoveTest and 3. ReassignReplicaShrinkTest Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>	2023-09-05 19:29:35 +05:30
Kamal Chandraprakash	991c5c0610	KAFKA-15410: Expand partitions, segment deletion by retention and enable remote log on topic integration tests (1/4) (#14307 ) Added the below integration tests with tiered storage - PartitionsExpandTest - DeleteSegmentsByRetentionSizeTest - DeleteSegmentsByRetentionTimeTest and - EnableRemoteLogOnTopicTest - Enabled the tests for both ZK and Kraft modes. Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>, Christo Lolov <lolovc@amazon.com>, Divij Vaidya <diviv@amazon.com>	2023-09-05 10:28:35 +05:30
Justine Olshan	d8d7d3127a	KAFKA-15424: Make the transaction verification a dynamic configuration (#14324 ) This will allow enabling and disabling transaction verification (KIP-890 part 1) without having to roll the cluster. Tested that restarting the cluster persists the configuration. If a verification is disabled/enabled while we have an inflight request, depending on the step of the process, the change may or may not be seen in the inflight request (enabling will typically fail unverified requests, but we may still verify and reject when we first disable) Subsequent requests/retries will behave as expected for verification. Sequence checks will continue to take place after disabling until the first message is written to the partition (thus clearing the verification entry with the tentative sequence) or the broker restarts/partition is reassigned which will clear the memory. On enabling, we will only track sequences for requests received after the verification is enabled. Reviewers: Jason Gustafson <jason@confluent.io>, Satish Duggana <satishd@apache.org>	2023-09-04 20:42:34 -07:00
Dimitar Dimitrov	c6af3dac00	KAFKA-15052 Fix the flaky testBalancePartitionLeaders - part II (#13908 ) A follow-up to https://github.com/apache/kafka/pull/13804. This follow-up adds the alternative fix approach mentioned in the PR above - bumping the session timeout used in the test by 1 second. Reproducing the flake-out locally has been much harder than on the CI runs, as neither Gradle with Java 11 or Java 14 nor IntelliJ with Java 14 could show it, but IntelliJ with Java 11 could occasionally reproduce the failure the first time immediately after a rebuild. While I was unable to see the failure with the bumped session timeout, the testing procedure definitely didn't provide sufficient reassurance for the fix as even without it often I'd see hundreds of consecutive successful test runs when the first run didn't fail. Reviewers: Luke Chen <showuon@gmail.com>, Christo Lolov <lolovc@amazon.com>	2023-09-04 17:03:39 +08:00
Abhijeet Kumar	6d3aa70b26	KAFKA-15260: RLM Task should handle uninitialized RLMM for the associated topic-partition (#14113 ) This change is about RLM task handling retriable exception when it tries to copy segments to remote but the RLMM is not yet initialized. On encountering the exception, we log the error and throw the exception back to the caller. We also make sure that the failure metrics are not updated since this is a temporary error because RLMM is not yet initialized. Added unit tests to verify RLM task does not attempt to copy segments to remote on encountering the retriable exception and that failure metrics remain unchanged. Reviewers: Satish Duggana <satishd@apache.org>, Luke Chen <showuon@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>	2023-09-04 09:14:29 +05:30
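
The KAFKA-14273 entry in the table above describes a general pattern: close the temporary file's handle before moving it atomically into place, because Windows refuses the move while another handle is open. The following is a minimal Python sketch of that pattern, not Kafka's Java implementation; the function name `write_state_atomically` and the `.tmp` suffix are hypothetical, chosen only for illustration.

```python
import os
import tempfile


def write_state_atomically(path: str, data: bytes) -> None:
    """Write data to path via a temp file that is closed before the move.

    Hypothetical helper for illustration; not part of Kafka.
    """
    directory = os.path.dirname(os.path.abspath(path))
    # Create the temp file in the target directory so the move stays
    # on the same filesystem.
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as tmp:
            tmp.write(data)
            tmp.flush()
            os.fsync(tmp.fileno())
        # The handle is closed at this point; only now is the file moved.
        os.replace(tmp_path, path)
    except BaseException:
        # Clean up the temp file if the write or the move failed.
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise
```

Calling `write_state_atomically("quorum-state", b"...")` twice in a row leaves a single file containing the second payload, with no leftover temporary file.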

11741 Commits

All Branches