Commit Graph

236 Commits

Author SHA1 Message Date
Ron Dagostino 64087fd9f3 MINOR: Move delegation token support to Metadata Version 3.6-IV2 (#14270)
#14083 added support for delegation tokens in KRaft and attached that support to the existing
MetadataVersion 3.6-IV1. This patch moves that support into a separate MetadataVersion 3.6-IV2.

Reviewers: Colin P. McCabe <cmccabe@apache.org>
2023-08-22 16:20:27 -07:00
Greg Harris 6bd17419b7
KAFKA-15228: Add sync-manifests command to connect-plugin-path (KIP-898) (#14195)
Reviewers: Chris Egerton <chrise@aiven.io>
2023-08-16 11:37:33 -07:00
Colin Patrick McCabe adc16d0f31
KAFKA-14538: Implement KRaft metadata transactions in QuorumController
Implement the QuorumController side of KRaft metadata transactions.

As specified in KIP-868, this PR creates a new metadata version, IBP_3_6_IV1, which contains the
three new records: AbortTransactionRecord, BeginTransactionRecord, EndTransactionRecord.

In order to make offset management unit-testable, this PR moves it out of QuorumController.java and
into OffsetControlManager.java. The general approach here is to track the "last stable offset," which is
calculated by looking at the latest committed offset and the in-progress transaction (if any). When
a transaction is aborted, we revert back to this last stable offset. We also revert back to it when
the controller is transitioning from active to inactive.

In a follow-up PR, we will add support for the transaction records in MetadataLoader. We will also
add support for automatically aborting pending transactions after a controller failover.

Reviewers: David Arthur <mumrah@gmail.com>
2023-08-14 16:58:56 -07:00
Greg Harris f5655d31d3
KAFKA-15030: Add connect-plugin-path command-line tool (#14064)
Reviewers: Chris Egerton <chrise@aiven.io>
2023-08-11 12:05:51 -07:00
Nikolay 1fd58e30cf
KAFKA-14595: Move classes from ReassignPartitionsCommand to tools (#14172)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2023-08-11 14:52:14 +02:00
Federico Valeri 8de3e0436a
KAFKA-15239: Fix system tests using producer performance service (#14092)
Reviewers: Greg Harris <greg.harris@aiven.io>
2023-08-10 14:23:43 -07:00
Federico Valeri bb677c4959
KAFKA-14583: Move ReplicaVerificationTool to tools (#14059)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2023-07-26 12:04:34 +02:00
Federico Valeri 1bf73d89d0
KAFKA-15232: Move ToolsUtils to tools (#14066)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2023-07-21 20:27:44 +02:00
Nikolay 4bba2c8a32
KAFKA-14591: Move DeleteRecordsCommand to tools (#13278)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Federico Valeri <fedevaleri@gmail.com>
2023-07-21 17:30:28 +02:00
Greg Harris 125dbb9286
KAFKA-14760: Move ThroughputThrottler from tools to clients, remove tools dependency from connect-runtime (#13313)
Reviewers: Ismael Juma <ismael@juma.me.uk>
2023-07-20 12:58:48 -07:00
Federico Valeri 334c41d604
KAFKA-14734: Use CommandDefaultOptions in StreamsResetter (#13983)
This PR adds CommandDefaultOptions usage like in the other joptsimple based tools. It also moves the associated unit test class from streams to tools module as discussed in #13127 (comment)

Reviewers:  Luke Chen <showuon@gmail.com>, Bruno Cadonna <cadonna@apache.org>, Sagar Rao <sagarmeansocean@gmail.com>
2023-07-20 18:45:05 +08:00
Manikumar Reddy 4e85bc9f80
MINOR: Fix Jmxtool to honour wait option when MBean is not yet avaibale in MBean server (#13995)
In JmxTool.scala, we will wait till all the object names are available from MBean server. But in the newer version, we only wait for subset of object names. Due to this, we may not enforce wait option and prematurely return the result if the objects are not yet registered in MBean sever.

Reviewers: Luke Chen <showuon@gmail.com>, Federico Valeri <fvaleri@redhat.com>
2023-07-12 17:01:10 +05:30
prasanthV 58fc264410
MINOR: Fix ToolsTestUtils by removing incorrect closure of Std Stream (#13922)
Reviewers: Lucas Bradstreet <lucas@confluent.io>, Divij Vaidya <diviv@amazon.com>
2023-06-28 17:46:22 +02:00
José Armando García Sancio 8ad0ed3e61
KAFKA-15021; Skip leader epoch bump on ISR shrink (#13765)
When the KRaft controller removes a replica from the ISR because of the controlled shutdown there is no need for the leader epoch to be increased by the KRaft controller. This is accurate as long as the topic partition leader doesn't add the removed replica back to the ISR.

This change also fixes a bug when computing the HWM. When computing the HWM, replicas that are not eligible to join the ISR but are caught up should not be included in the computation. Otherwise, the HWM will never increase for replica.lag.time.max.ms because the shutting down replica is not sending FETCH request. Without this additional fix PRODUCE requests would timeout if the request timeout is greater than replica.lag.time.max.ms.

Because of the bug above the KRaft controller needs to check the MV to guarantee that all brokers support this bug fix before skipping the leader epoch bump.

Reviewers: David Mao <47232755+splett2@users.noreply.github.com>, Divij Vaidya <diviv@amazon.com>, David Jacot <djacot@confluent.io>
2023-06-07 07:20:40 -07:00
Federico Valeri 7e9a82c732
MINOR: Fix for MetadataQuorumCommandErrorTest.testRelativeTimeMs (#13784)
Reviewers: Divij Vaidya <diviv@amazon.com>, David Jacot <djacot@confluent.io>
2023-05-31 18:48:26 +02:00
Federico Valeri 45520c1342
KAFKA-14982: Improve the kafka-metadata-quorum output (#13738)
When running kafka-metadata-quorum script to get the quorum replication status, the LastFetchTimestamp and LastCaughtUpTimestamp output is not human-readable.

I will be convenient to add an optional flag (-hr, --human-readable) to enable a human-readable format showing the delay in ms (i.e. 366 ms ago).

This dealy is computed as (now - timestamp), where they are both represented as Unix time (UTC based).

$ bin/kafka-metadata-quorum.sh --bootstrap-server :9092 describe --replication --human-readable
NodeId	LogEndOffset	Lag	LastFetchTimestamp	LastCaughtUpTimestamp	Status  	
2     	61          	0  	5 ms ago          	5 ms ago             	Leader  	
3     	61          	0  	56 ms ago         	56 ms ago            	Follower	
4     	61          	0  	56 ms ago         	56 ms ago            	Follower

Reviewers: Luke Chen <showuon@gmail.com>
2023-05-29 10:04:46 +08:00
Federico Valeri ac9d11b426
KAFKA-14997: Fix JmxToolTest failing on CI server (#13720)
This test was reported as flaky on CI server.

When connecting to a multi-homed machine using RMI, the wrong address may be returned by the RMI registry to the client, causing the connection to the RMI server to timeout.

This change explicitly set the hostname returned to the the clients in the remote stub object.

Reviewers: Luke Chen <showuon@gmail.com>, vamossagar12 <sagarmeansocean@gmail.com>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, hudeqi <16120374@bjtu.edu.cn>, Christo Lolov <christololov@gmail.com>
2023-05-16 10:31:32 +08:00
Kamal Chandraprakash 54a4067f81
KAFKA-14559: Fix JMX tool to handle the object names with wildcard and optional attributes (#13060)
Reviewers: Federico Valeri <fedevaleri@gmail.com>, Satish Duggana <satishd@apache.org>
2023-05-11 21:49:21 +05:30
Christo Lolov dc7819d7f1
KAFKA-14594: Move LogDirsCommand to tools module (#13122)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2023-05-04 12:00:33 +02:00
Gantigmaa Selenge ea540fa400
KAFKA-14592: Move FeatureCommand to tools (#13459)
KAFKA-14592: Move FeatureCommand to tools

Reviewers: Luke Chen <showuon@gmail.com>
2023-04-25 20:28:37 +08:00
Chia-Ping Tsai 637bc92ba1
MINOR: move RecordReader from org.apache.kafka.tools (client module) to org.apache.kafka.tools.api (tools-api module) (#13454)
Reviewers: Jun Rao <junrao@gmail.com>
2023-04-07 00:20:56 +08:00
Robert Young 2b26db0d38
Switch to SplittableRandom in ProducerPerformance utility (#13482)
Why:
Using java.util.Random to generate every byte sent from the ProducerPerformance
appears to be a limiting factor. Throughput of the ProducerPerformance script is
higher with a file of records as compared to randomly generated records.

On my machine a single thread can generate ~100MB/second of uppercase letters using
java.util.Random and ~300MB/sec using java.util.SplittableRandom. This is a limit on
throughput.

Note: you can optimise further by expanding it from 26 letters to 32 letter generated
as it is more efficient to generate a nicely distributed int when the bound is a
power of two.

Reviewers: Luke Chen <showuon@gmail.com>
2023-03-31 14:52:10 +08:00
vamossagar12 c14f56b484
KAFKA-14586: Moving StreamResetter to tools (#13127)
Moves StreamResetter to tools project.

Reviewers: Federico Valeri <fedevaleri@gmail.com>, Christo Lolov <lolovc@amazon.com>, Bruno Cadonna <cadonna@apache.org>
2023-03-28 14:43:22 +02:00
hudeqi aef004edee
KAFKA-14812:ProducerPerformance still counting successful sending in console when sending failed (#13404)
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2023-03-21 16:59:18 +08:00
Chia-Ping Tsai 279c237632
Revert "MINOR: Fixed ProducerPerformance still counting successful sending when sending failed (#13348)" (#13401)
This reverts commit 8e4c0d0b04.

Reviewers: Luke Chen <showuon@gmail.com>
2023-03-16 21:26:01 +08:00
hudeqi 8e4c0d0b04
MINOR: Fixed ProducerPerformance still counting successful sending when sending failed (#13348)
Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
2023-03-15 21:30:51 +08:00
Federico Valeri 07e2f6cd4d
KAFKA-14578: Move ConsumerPerformance to tools (#13215)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Alexandre Dupriez <alexandre.dupriez@gmail.com>
2023-03-06 18:16:55 +01:00
vamossagar12 bb3111f472
KAFKA-14580: Moving EndToEndLatency from core to tools module (#13095)
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Federico Valeri <fedevaleri@gmail.com>, Ismael Juma <mlists@juma.me.uk>
2023-03-02 12:05:22 +01:00
Gantigmaa Selenge ea30ec4b56
KAFKA-14590: Move DelegationTokenCommand to tools (#13172)
KAFKA-14590: Move DelegationTokenCommand to tools

Reviewers: Luke Chen <showuon@gmail.com>, Christo Lolov <christo_lolov@yahoo.com>, Federico Valeri <fvaleri@redhat.com>
2023-03-02 14:30:07 +08:00
Ron Dagostino 631e6be3a0
KAFKA-14711: kafaka-metadata-quorum.sh does not honor --command-confi… (#13241)
…g option

https://github.com/apache/kafka/pull/12951 accidentally changed the behavior of the `kafaka-metadata-quorum.sh` CLI by making it silently ignore a `--command-config <filename>` properties file that exists. This was an undetected regression in the 3.4.0 release.  This patch fixes the issue such that any such specified file will be honored.

Reviewers: José Armando García Sancio <jsancio@apache.org>, Ismael Juma <ismael@juma.me.uk>
2023-02-13 18:33:20 -05:00
Federico Valeri 50e0e3c257
KAFKA-14582: Move JmxTool to tools (#13136)
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2023-02-02 11:23:26 +01:00
Mickael Maison 8b44237655
KAFKA-14575: Move ClusterTool to tools module (#13080)
Reviewers: dengziming <dengziming1993@gmail.com>, Federico Valeri  <fedevaleri@gmail.com>
2023-01-22 12:50:43 +01:00
Luke Chen 2575362639
KAFKA-14498: reduce the startup nodes to avoid timeout error (#13016)
In MetadataQuorumCommandTest, we sometimes got the error:

java.util.concurrent.ExecutionException: java.lang.RuntimeException: Received a fatal error while waiting for the broker to catch up with the current cluster metadata.

Since we tried to bring up 3 broker + 3 controllers at the same time, and the config initial.broker.registration.timeout.ms (default 1 min) is sometimes not enough for them to start up. Checking the tests, it doesn't require so many nodes. Reducing the nodes number to make these tests reliable.

Reviewers: dengziming <dengziming1993@gmail.com>, Ismael Juma <ismael@juma.me.uk>
2022-12-21 11:19:22 +08:00
Ismael Juma c0b28fde66
MINOR: Use INFO logging for tools and trogdor tests (#13006)
`TRACE` is too noisy and makes the build slower.

Reviewers: David Jacot <djacot@confluent.io>
2022-12-17 10:22:40 -08:00
Ismael Juma 88725669e7
MINOR: Move MetadataQuorumCommand from `core` to `tools` (#12951)
`core` should only be  used for legacy cli tools and tools that require
access to `core` classes instead of communicating via the kafka protocol
(typically by using the client classes).

Summary of changes:
1. Convert the command implementation and tests to Java and move it to
    the `tools` module.
2. Introduce mechanism to capture stdout and stderr from tests.
3. Change `kafka-metadata-quorum.sh` to point to the new command class.
4. Adjusted the test classpath of the `tools` module so that it supports tests
    that rely on the `@ClusterTests` annotation.
5. Improved error handling when an exception different from `TerseFailure` is
    thrown.
6. Changed `ToolsUtils` to avoid usage of arrays in favor of `List`.

Reviewers: dengziming <dengziming1993@gmail.com>
2022-12-09 09:22:58 -08:00
runom b8754c074a
KAFKA-14355: Fix integer overflow in ProducerPerformance (#12822)
Change types from int to long to avoid overflow

Reviewers: Luke Chen <showuon@gmail.com>,  Igor Soarez <soarez@apple.com>
2022-11-05 20:19:08 +08:00
Kirk True 8e43548175
KAFKA-13725: KIP-768 OAuth code mixes public and internal classes in same package (#12039)
* KAFKA-13725: KIP-768 OAuth code mixes public and internal classes in same package

Move classes into a sub-package of "internal" named "secured" that
matches the layout more closely of the "unsecured" package.

Replaces the concrete implementations in the former packages with
sub-classes of the new package layout and marks them as deprecated. If
anyone is already using the newer OAuth code, this should still work.

* Fix checkstyle and spotbugs violations

Co-authored-by: Kirk True <kirk@mustardgrain.com>

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2022-09-23 13:15:15 +05:30
dengziming 150fd5b0b1
KAFKA-13914: Add command line tool kafka-metadata-quorum.sh (#12469)
Add `MetadataQuorumCommand` to describe quorum status, I'm trying to use arg4j style command format, currently, we only support one sub-command which is "describe" and we can specify 2 arguments which are --status and --replication.

```
# describe quorum status
kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --replication

ReplicaId	LogEndOffset	Lag	LastFetchTimeMs	LastCaughtUpTimeMs	Status  	
0        	10          	        0  	-1             	        -1                	                 Leader  	
1        	10          	        0  	-1             	        -1                	                 Follower	
2        	10          	        0  	-1             	        -1                	                 Follower	

kafka-metadata-quorum.sh --bootstrap-server localhost:9092 describe --status
ClusterId:                             fMCL8kv1SWm87L_Md-I2hg
LeaderId:                             3002
LeaderEpoch:                      2
HighWatermark:                  10
MaxFollowerLag:                 0
MaxFollowerLagTimeMs:   -1
CurrentVoters:                    [3000,3001,3002]
CurrentObservers:              [0,1,2]

# specify AdminClient properties
kafka-metadata-quorum.sh --bootstrap-server localhost:9092 --command-config config.properties describe --status
```

Reviewers: Jason Gustafson <jason@confluent.io>
2022-08-20 08:37:26 -07:00
Jason Gustafson f0a09ea003
MINOR: Fix event output inconsistencies in TransactionalMessageCopier (#12098)
This patch fixes some strangeness and inconsistency in the messages written by `TransactionalMessageCopier` to stdout. Here is a sample of two messages.

Progress message:
```
{"consumed":33000,"stage":"ProcessLoop","totalProcessed":33000,"progress":"copier-0","time":"2022/04/24 05:40:31:649","remaining":333}
```
The `transactionalId` is set to the value of the `progress` key.

And a shutdown message:
```
{"consumed":33333,"shutdown_complete":"copier-0","totalProcessed":33333,"time":"2022/04/24 05:40:31:937","remaining":0}
```
The `transactionalId` this time is set to the `shutdown_complete` key and there is no `stage` key.

In this patch, we change the following:

1. Use a separate key for the `transactionalId`.
2. Drop the `progress` and `shutdown_complete` keys.
3. Use `stage=ShutdownComplete` in the shutdown message.
4. Modify `transactional_message_copier.py` system test service accordingly.

Reviewers: David Arthur <mumrah@gmail.com>
2022-04-29 10:02:25 -07:00
彭小漪 6145974fef
KAFKA-13728: fix PushHttpMetricsReporter no longer pushes metrics when network failure is recovered. (#11879)
The class PushHttpMetricsReporter no longer pushes metrics when network failure is recovered.

I debugged the code and found the problem here: when we submit a task to the ScheduledThreadPoolExecutor that needs to be executed periodically, if the task throws an exception and is not swallowed, the task will no longer be scheduled to execute.

So when an IO exception occasionally occurs on the network, we should swallow it rather than throw it in task HttpReporter.

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2022-03-19 21:09:28 -07:00
Justine Olshan 7afdb069bf
KAFKA-13750; Client Compatability KafkaTest uses invalid idempotency configs (#11909)
Reviewers: Luke Chen <showuon@gmail.com>, David Jacot <djacot@confluent.io>
2022-03-17 18:00:27 +01:00
Kirk True ec29b62e92
KAFKA-13444: Fix OAuthCompatibilityTool help and add SSL options (#11486)
Reviewers: Jun Rao <junrao@gmail.com>
2021-11-15 15:45:18 -08:00
Jorge Esteban Quilcate Otoya 214b59b3ec
KAFKA-13429: ignore bin on new modules (#11415)
Reviewers: John Roesler <vvcephei@apache.org>
2021-11-10 14:36:24 -06:00
Kirk True 7b379539a5
KAFKA-13202: KIP-768: Extend SASL/OAUTHBEARER with Support for OIDC (#11284)
This task is to provide a concrete implementation of the interfaces defined in KIP-255 to allow Kafka to connect to an OAuth/OIDC identity provider for authentication and token retrieval. While KIP-255 provides an unsecured JWT example for development, this will fill in the gap and provide a production-grade implementation.

The OAuth/OIDC work will allow out-of-the-box configuration by any Apache Kafka users to connect to an external identity provider service (e.g. Okta, Auth0, Azure, etc.). The code will implement the standard OAuth client credentials grant type.

The proposed change is largely composed of a pair of AuthenticateCallbackHandler implementations: one to login on the client and one to validate on the broker.

See the following for more detail:

KIP-768
KAFKA-13202

Reviewers: Yi Ding <dingyi.zj@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>
2021-10-28 11:36:53 -07:00
Jason Gustafson c1c639db77
KAFKA-13288; Include internal topics when searching hanging transactions (#11319)
This patch ensures that internal topics are included when searching for hanging transactions with the `--broker-id` argument in `kafka-transactions.sh`.

Reviewers: David Jacot <djacot@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2021-09-10 14:33:37 -07:00
Yanwen(Jason) Lin 66a27af2f1
KAFKA-10038: Supports default client.id for ConsoleConsumer, ProducerPerformance, ConsumerPerformance (#11297)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2021-09-07 13:49:50 -07:00
dengziming 1d22b0d706
KAFKA-10774; Admin API for Describe topic using topic IDs (#9769)
Reviewers: Justine Olshan <jolshan@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>, Satish Duggana <satishd@apache.org>, Rajini Sivaram <rajinisivaram@googlemail.com>
2021-08-28 09:00:36 +01:00
Jason Gustafson ba47beec01
MINOR: Ensure transactional message copier failures are logged (#11268)
This patch has a couple small improvements to `TransactionalMessageCopier` logging:

- Log all fatal exceptions which cause the copier to shutdown unexpectedly
- Log all non-fatal exceptions which cause the copier to abort a transaction

Reviewers: David Jacot <djacot@confluent.io>
2021-08-27 11:02:47 -07:00
David Jacot 4e2f2b0674
MINOR: Update `TransactionalMessageCopier` to use the latest transaction pattern (#11265)
Reviewers: Jason Gustafson <jason@confluent.io>
2021-08-27 11:11:57 +02:00
Jason Gustafson a5daae20b5
KAFKA-13155; Fix concurrent modification in consumer shutdown (#11164)
The `TransactionalMessageCopier` tool, which is used in system tests attempts to close the consumer as part of a shutdown hook. Although the access is synchronized, there is no guarantee that the consumer has finished polling when shutdown is invoked. The patch fixes the problem by call `wakeup()` from the shutdown hook and pushing the call to `close()` to the main thread.

Reviewers: David Jacot <djacot@confluent.io>
2021-08-10 09:39:59 -07:00