Introduce new quarantinedTest that excludes tests tagged with "flaky". Also introduce two new build parameters "maxQuarantineTestRetries" and "maxQuarantineTestRetryFailures".
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This patch completely removes the compile-time dependency on core for both test and main sources by introducing two new modules.
1) `test-common` include all the common test implementation code (including dependency on :core for BrokerServer, ControllerServer, etc)
2) `test-common:api` new sub-module that just includes interfaces including our junit extension
Reviewers: David Arthur <mumrah@gmail.com>
Introduces the share coordinator. This coordinator is built on the new coordinator runtime framework. It
is responsible for persistence of share-group state in a new internal topic named "__share_group_state".
The responsibility for being a share coordinator is distributed across the brokers in a cluster.
Reviewers: David Arthur <mumrah@gmail.com>, Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>
The ignoreFailures property was removed in #17066 to prevent test failures from being cached. However, this breaks the JUnit report and makes the github workflow less user friendly.
The problem is that we are copying the junit test report files into a new directory (added in #17098) in a Gradle doLast closure. If we don't run with ignoreFailures=true, then this closure will not run and the test failures won't be processed by junit.py.
This patch adds logic to ensure the doLast closure of :test is always run. The user provided -PignoreFailures is still honored for the test tasks so local developer workflows should not be disturbed.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Recently, we fixed caching for ":jar" and ":test" tasks. A side effect of this is that the test results will be restored as part of the Gradle cache resolution. This means test tasks which are skipped (as a result of FROM-CACHE) will still have test results in their build directory. To avoid incorrectly reporting these results in the job summary, this patch uses a doLast task handler to relocate JUnit XML files into a new directory.
This patch also removes the "continue-on-error" from the JUnit test step which caused timed-out builds to appear successful.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
For several modules, we include a kafka-version.properties in the Jar file. This file includes the Git SHA of the project at the time of the build. This means that even if no source files change, the :jar task will never be UP-TO-DATE between two git commits. Ultimately, this breaks Gradle caching.
This patch marks all of the createVersionFile tasks as cacheable and also changes our Gradle invocation to override the commit ID to a dummy static value. This will allow the :jar task to be cacheable and reusable between builds.
This patch also configures the trunk build to only write to the build cache and not read from it. This will prevent any cache pollution/corruption from propagating from build to build.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This patch adds a "deflake" github action which can be used to run a single JUnit test or suites. It works by parameterizing the --tests Gradle option. If the test extends ClusterTest, the "deflake" workflow can repeat number of times by setting the kafka.cluster.test.repeat system property.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Introduce ShareCoordinator interface and related classes.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>, David Arthur <mumrah@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
This patch introduces a wrapper around [HdrHistogram](https://github.com/HdrHistogram/HdrHistogram) to use for group coordinator histograms, event queue time, event processing time, flush time, and purgatory time.
Reviewers: David Jacot <djacot@confluent.io>
Establishes the new `:share` Gradle module. This module is intended to be used for server-side KIP-932 classes that are not part of the new share group coordinator.
This patch relocates and renames some existing classes. A small amount of compatibility changes were also made, but do not affect any logic.
Reviewers: Andrew Schofield <aschofield@confluent.io>, David Arthur <mumrah@gmail.com>
What
Introduced a new module share-coordinator to house relevant implementation code and resources.
Modified settings.gradle and build.gradle to accommodate the module.
Added ShareSnapshot[Key, Value], ShareUpdate[Key, Value] message record schemas.
Introduced a trivial impl of ShareCoordinatorShard class to establish dependencies with other modules (:coordinator-common, :metadata). The actual impl for this class will be done in future PRs.
Why
The share coordinator component has been introduced as part of KIP-932 (QFK). This component is will be responsible for managing persistence for various data related to share partitions into a dedicated internal topic.
To keep all this functionality contained, we want to create a separate module in line with group and transaction coordinators.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Apoorv Mittal <apoorvmittal10@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
There is a lot of code in group-coordinator which is not share/consumer/classic group specific.
Since we are introducing a share-coordinator as part of KIP-932 (in a new module), it would make sense to get the common coordinator functionality into a separate common coordinator module so that share-coordinator need not depend on group-coordinator.
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, David Jacot <djacot@confluent.io>, Andrew Schofield <aschofield@confluent.io>, Jeff Kim <jeff.kim@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
This change fixes the Kafka configuration validation to take into account the reconfiguration changes to configuration and allows KRaft observers to start with an unknown set of voters.
For the Kafka configuration validation the high-level change is that now the user only needs to specify either the controller.quorum.bootstrap.servers property or the controller.quorum.voters property. The other notable change in the configuration is that controller listeners can now be (and should be) specified in advertise.listeners property.
Because Kafka can now be configured without any voters and just the bootstrap servers. The KRaft client needs to allow for an unknown set of voters during the initial startup. This is done by adding the VoterSet#empty set of voters to the KRaftControlRecordStateMachine.
Lastly the RaftClientTestContext type is updated to support this new configuration for KRaft and a test is added to verify that observers can start and send Fetch requests when the voters are unknown.
Reviewers: David Arthur <mumrah@gmail.com>
In MetadataVersion 3.7-IV2 and above, the broker's AssignmentsManager sends an RPC to the
controller informing it about which directory we have chosen to place each new replica on.
Unfortunately, the code does not check to see if the topic still exists in the MetadataImage before
sending the RPC. It will also retry infinitely. Therefore, after a topic is created and deleted in
rapid succession, we can get stuck including the now-defunct replica in our subsequent
AssignReplicasToDirsRequests forever.
In order to prevent this problem, the AssignmentsManager should check if a topic still exists (and
is still present on the broker in question) before sending the RPC. In order to prevent log spam,
we should not log any error messages until several minutes have gone past without success.
Finally, rather than creating a new EventQueue event for each assignment request, we should simply
modify a shared data structure and schedule a deferred event to send the accumulated RPCs. This
will improve efficiency.
Reviewers: Igor Soarez <i@soarez.me>, Ron Dagostino <rndgstn@gmail.com>
This PR adds support for add-controller and remove-controller in the kafka-metadata-quorum.sh
command-line tool. It also fixes some minor server-side bugs that blocked the tool from working.
In kafka-metadata-quorum.sh, the implementation of remove-controller is fairly straightforward. It
just takes some command-line flags and uses them to invoke AdminClient. The add-controller
implementation is a bit more complex because we have to look at the new controller's configuration
file. The parsing logic for the advertised.listeners and listeners server configurations that we
need was previously implemented in the :core module. However, the gradle module where
kafka-metadata-quorum.sh lives, :tools, cannot depend on :core. Therefore, I moved listener parsing
into SocketServerConfigs.listenerListToEndPoints. This will be a small step forward in our efforts
to move Kafka configuration out of :core.
I also made some minor changes in kafka-metadata-quorum.sh and Kafka-storage-tool.sh to handle
--help without displaying a backtrace on the screen, and give slightly better error messages on
stderr. Also, in DynamicVoter.toString, we now enclose the host in brackets if it contains a colon
(as IPV6 addresses can).
This PR fixes our handling of clusterId in addRaftVoter and removeRaftVoter, in two ways. Firstly,
it marks clusterId as nullable in the AddRaftVoterRequest.json and RemoveRaftVoterRequest.json
schemas, as it was always intended to be. Secondly, it allows AdminClient to optionally send
clusterId, by using AddRaftVoterOptions and RemoveRaftVoterOptions. We now also remember to
properly set timeoutMs in AddRaftVoterRequest. This PR adds unit tests for
KafkaAdminClient#addRaftVoter and KafkaAdminClient#removeRaftVoter, to make sure they are sending
the right things.
Finally, I fixed some minor server-side bugs that were blocking the handling of these RPCs.
Firstly, ApiKeys.ADD_RAFT_VOTER and ApiKeys.REMOVE_RAFT_VOTER are now marked as forwardable so that
forwarding from the broker to the active controller works correctly. Secondly,
org.apache.kafka.raft.KafkaNetworkChannel has now been updated to enable API_VERSIONS_REQUEST and
API_VERSIONS_RESPONSE.
Co-authored-by: Murali Basani muralidhar.basani@aiven.io
Reviewers: José Armando García Sancio <jsancio@apache.org>, Alyssa Huang <ahuang@confluent.io>
This pr support EarliestLocalSpec LatestTierSpec in GetOffsetShell, and add integration tests.
Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>, PoAn Yang <payang@apache.org>
The generated code path added to test scope of sourceSets will make `Gradle` assume there are tests in module, and then it results in failed test task since the module does not define test engine.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This version of the ShadowJarPlugin uses the incorrect classifier for the published archive. This is a temporary measure to fix publishing prior to the upcoming release.
Signed-off-by: Greg Harris <greg.harris@aiven.io>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
In #12148 , we removed log4jAppender dependency, and add testImplementation dependency for slf4jlog4j lib. However, we need this runtime dependency in tools module to output logs. (ref) Adding this dependency back.
Note: The slf4jlog4j lib was added in log4j-appender dependency. Since it's removed, we need to explicitly declare it.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Edoardo Comar <ecomar@uk.ibm.com>
This patch moves the `PartitionAssignor` interface and all the related classes to a newly created `group-coordinator/api` module, following the pattern used by the storage and tools modules.
Reviewers: Ritika Reddy <rreddy@confluent.io>, Jeff Kim <jeff.kim@confluent.io>, Chia-Ping Tsai <chia7712@gmail.com>
This PR aims to add JVM based Docker Official Image for Apache Kafka as per the following KIP - https://cwiki.apache.org/confluence/display/KAFKA/KIP-1028%3A+Docker+Official+Image+for+Apache+Kafka
This PR adds the following functionalities:
Introduces support for Apache Kafka Docker Official Images via:
GitHub Workflows:
- Workflow to prepare static source files for Docker images
- Workflow to build and test Docker official images
- Scripts to prepare source files and perform Docker image builds and tests
A new directory for Docker official images, named docker/docker_official_images. This is the new directory to house all Docker Official Image assets.
Co-authored-by: Vedarth Sharma <vesharma@confluent.io>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Vedarth Sharma <vesharma@confluent.io>
When the consumer group protocol is used in a cluster, it is, at the moment, impossible to see all records stored in the __consumer_offsets topic with kafka-dump-log --offsets-decoder. It does not know how to handle all the new records.
This patch refactors the OffsetsMessageParser used internally by kafka-dump-log to use the RecordSerde used by the new group coordinator. It ensures that the tool is always in sync with the coordinator implementation. The patch also changes the format to using the toString'ed representations of the records instead of having custom logic to dump them. It ensures that all the information is always dumped. The downside of the latest is that inner byte arrays (e.g. assignment in the classic protocol) are no longer deserialized. Personally, I feel like that it is acceptable and it is actually better to stay as close as possible to the actual records in this tool. It also avoids issues like https://issues.apache.org/jira/browse/KAFKA-15603.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Small cleanup: removed version when excluding shaded dependencies from clients library as it's not needed.
Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
This avoids `gradle dependencyCheckAggregate` from reporting on
advisories in build-time dependencies (e.g. CVE-2023-46122) which
typically should not affect us.
I checked that this does not prevent advisories in 'regular'
dependencies from being reported (but there currently are none).
Reviewers: Josep Prat <josep.prat@aiven.io>
Following test cases don't really run kraft case. The reason is that the test info doesn't contain parameter name, so it always returns false in TestInfoUtils#isKRaft.
1) TopicCommandIntegrationTest
2) DeleteConsumerGroupsTest
3) AuthorizerIntegrationTest
4) DeleteOffsetsConsumerGroupCommandIntegrationTest
We can fix it by adding options.compilerArgs << '-parameters' after
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
The issue KAFKA-16359 reported inclusion of kafka-clients runtime dependencies in MANIFEST.MF Class-Path.
The root cause is the issue defined here with the usage of shadow plugin.
Looking into the specifics of plugin and documentation, specifies that any dependency marked as shadow will be treated as following by the shadow plugin:
1. Adds the dependency as runtime dependency in resultant pom.xml - code here
2. Adds the dependency as Class-Path in MANIFEST.MF as well - code here
Resolution
We do need the runtime dependencies available in the pom (1 above) but not on manifest (2 above). Also there is no clear way to separate the behaviour as both above tasks relies on shadow configuration.
To fix, I have defined another custom configuration named shadowed which is later used to populate the correct pom (the code is similar to what shadow plugin does to populate pom for runtime dependencies).
Though this might seem like a workaround, but I think that's the only way to fix the issue. I have checked other SDKs which suffered with same issue and went with similar route to populate pom.
Reviewers: Luke Chen <showuon@gmail.com>, Reviewers: Mickael Maison <mickael.maison@gmail.com>, Gaurav Narula <gaurav_narula2@apple.com>
1) This PR moves kafka.security classes from core to server module.
2) AclAuthorizer not moved, because it has heavy dependencies on core classes that not rewrited from scala at the moment.
3) AclAuthorizer will be deleted as part of ZK removal
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
reportScoverage task (which was used previously as dependency of the registered coverage task)
creates a task for each Test task and executes them. There's unitTest, integrationTest and
the test tasks (which is just for executing both unit and integration), so reportScoverage
executes all three with their corresponding scoverage task, hence running all tests twice.
Solution is just to use the reportTestScoverage task as dependency.
Reviewers: Viktor Somogyi-Vass <viktorsomogyi@gmail.com>
1) Rename WorkerSinkTaskMockitoTest back to WorkerSinkTaskTest
2) Tidy up the code a bit
3) rewrite "fail" by "assertThrow"
Reviewers: Omnia Ibrahim <o.g.h.ibrahim@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Jacoco and scoverage reporting hasn't been working for a while. This commit fixes report generation. After this PR only subproject level reports are generated as Jenkins and Sonar only cares about that.
This PR doesn't change Kafka's Jenkinsfile.
Reviewers: Viktor Somogyi-Vass <viktorsomogyi@gmail.com>
Currently, there are few document files generated automatically like the task genConnectMetricsDocs
However, the unwanted log information also added into it.
And the format is not aligned with other which has Mbean located of the third column.
I modified the code logic so the format could follow other section in ops.html
Also close the log since we take everything from the std as a documentation
Reviewers: Luke Chen <showuon@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
In order to move ConfigCommand to tools we must move all it's dependencies which includes KafkaConfig and other core classes to java. This PR moves log cleaner configuration to CleanerConfig class of storage module.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
This PR is part of #14471
Is contains some of ConsoleGroupCommand tests rewritten in java.
Intention of separate PR is to reduce changes and simplify review.
Reviewers: Luke Chen <showuon@gmail.com>
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Ismael Juma <ismael@juma.me.uk> , David Jacot <djacot@confluent.io>, Nikolay <NIzhikov@gmail.com>
This patch wires the transaction verification in the new group coordinator. It basically calls the verification path before scheduling the write operation. If the verification fails, the error is returned to the caller.
Note that the patch uses `appendForGroup`. I suppose that we will move away from using it when https://github.com/apache/kafka/pull/15087 is merged.
Reviewers: Justine Olshan <jolshan@confluent.io>
Migrates functionality provided by utility to Kafka core. This wrapper will be used to generate property files and format storage when invoked from docker container.
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Viktor Somogyi-Vass <viktorsomogyi@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
The PR fixes the publishing of kafka-clients artifact to remote maven. The kafka-clients jar was recently shadowed which would publish the artifacts to the local maven repo successfully but would throw an error when publishing to remote maven. (as part of the release process)
The issue triggers only with publishMavenJavaPublicationToMavenRepository due to signing. Generating signed asc files error out for shadowed release artifacts as the module name (clients) differs from the artifact name (kafka-clients).
The fix is basically to explicitly define artifact of shadowJar to signing and publish plugin. project.shadow.component(mavenJava) previously outputs the name as client-<version>-all.jar though the classifier and archivesBaseName are already defined correctly in :clients and shadowJar construction.
This patch moves the `RaftIOThread` implementation into Java. I changed the name to `KafkaRaftClientDriver` since the main thing it does is drive the calls to `poll()`. There shouldn't be any changes to the logic.
Reviewers: José Armando García Sancio <jsancio@apache.org>
Improve JsonConverter performance by using afterBurnModule of Jackson library.
Reviewers: Divij Vaidya <diviv@amazon.com>, Mickael Maison <mickael.maison@gmail.com>
MetadataShell should take an advisory lock on the .lock file of the directory it is reading from.
Add an integration test of this functionality in MetadataShellIntegrationTest.java.
Note: in build.gradle, I had to add some dependencies on server-common's test files in order to use
MockFaultHandler, etc.
MetadataBatchLoader.java: fix a case where a log message was incorrect. The intention was to print
the number equivalent to (offset + index). Instead it was printing the offset, followed by the
index. So if the offset was 100 and the index was 1, 1001 would be printed rather than 101.
Co-authored-by: Igor Soarez <i@soarez.me>
Reviewers: David Arthur <mumrah@gmail.com>, José Armando García Sancio <jsancio@apache.org>
We need to be careful when aborting a long poll with wakeup() since the
consumer might never return records if the poll is interrupted after the
consumer position has been updated but the records have not been returned
to the caller of poll().
This PR avoid wake-ups during this critical period.
Reviewers: Philip Nee <pnee@confluent.io>, Kirk True <ktrue@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>
We only move Java classes that have minimal or no dependencies on Scala classes in this PR.
Details:
* Configured `server` module in build files.
* Changed `ControllerRequestCompletionHandler` to be an interface since it has no implementations.
* Cleaned up various import control files.
* Minor build clean-ups for `server-common`.
* Disabled `testAssignmentAggregation` when executed with Java 8, this is an existing issue (see #14794).
For broader context on this change, please check:
* KAFKA-15852: Move server code from `core` to `server` module
Reviewers: Divij Vaidya <diviv@amazon.com>
Reviewers: Christo Lolov <lolovc@amazon.com>, Colin P. McCabe <cmccabe@apache.org>, Proven Provenzano <pprovenzano@confluent.io>, Ron Dagostino <rdagostino@confluent.io>
Part of KIP-714.
The PR comprises of changes to include opentlemetry library as defined in KIP-714. The libraries are shadowed to prevent conflicts.
Co-authored-by: Xavier Léauté <xl+github@xvrl.net>
Reviewers: Andrew Schofield <aschofield@confluent.io>, Xavier Léauté <xavier@confluent.io>, Walker Carlson <wcarlson@confluent.io>, Matthias J. Sax <matthias@confluent.io>
This PR for KIP-714 - KAFKA-1564 lays out interfaces and classes for capturing client telemetry metrics.
Below image defines interaction of different classes among them interfaces have been included in the PR.
Reviewers: Walker Carlson <wcarlson@apache.org>, Matthias J. Sax <matthias@confluent.io>, Andrew Schofield <andrew_schofield@uk.ibm.com>, Kirk True <ktrue@confluent.io>, Philip Nee <pnee@confluent.io>, Jun Rao <junrao@gmail.com>,
git gc moves commit hashes from individual .git/refs/heads/ to .git/packed-refs which is not read
by the determineCommitId function.
Replace the existing lookup within the .git directory with a GrGit lookup that handles packed and
unpacked refs transparently.
Reviewers: Ismael Juma <ismael@juma.me.uk>
Implements the following metrics:
kafka.server:type=group-coordinator-metrics,name=num-partitions,state=loading
kafka.server:type=group-coordinator-metrics,name=num-partitions,state=active
kafka.server:type=group-coordinator-metrics,name=num-partitions,state=failed
kafka.server:type=group-coordinator-metrics,name=event-queue-size
kafka.server:type=group-coordinator-metrics,name=partition-load-time-max
kafka.server:type=group-coordinator-metrics,name=partition-load-time-avg
kafka.server:type=group-coordinator-metrics,name=thread-idle-ratio-min
kafka.server:type=group-coordinator-metrics,name=thread-idle-ratio-avg
The PR makes these metrics generic so that in the future the transaction coordinator runtime can implement the same metrics in a similar fashion.
Also, CoordinatorLoaderImpl#load will now return LoadSummary which encapsulates the start time, end time, number of records/bytes.
Co-authored-by: David Jacot <djacot@confluent.io>
Reviewers: Ritika Reddy <rreddy@confluent.io>, Calvin Liu <caliu@confluent.io>, David Jacot <djacot@confluent.io>, Justine Olshan <jolshan@confluent.io>
Spotbugs was temporarily disabled as part of KAFKA-15485 to support Kafka build with JDK 21. This PR upgrades the spotbugs version to 4.8.0 which adds support for JDK 21 and enables it's usage on build again.
Reviewers: Divij Vaidya <diviv@amazon.com>
This PR is part of #13247
It contains ReassignPartitionsIntegrationTest rewritten in java.
Goal of PR is reduce changes size in main PR.
Reviewers: Taras Ledkov <tledkov@apache.org>, Justine Olshan <jolshan@confluent.io>
* Update CI to build with Java 21 instead of Java 20
* Disable spotbugs when building with Java 21 as it doesn't support it yet (filed KAFKA-15492 for
addressing this)
* Disable SslTransportLayerTest.testValidEndpointIdentificationCN with Java 21 (same as Java 20)
Reviewers: Divij Vaidya <diviv@amazon.com>
Resolves cache misses in checkstyle tasks due to absolute paths in configProperties.
Sets configDirectory extension property, which is made available by the checkstyle plugin as ${config_loc} in the checkstyle xml files, as shown in the Checkstyle Gradle docs. The absolute paths set in configProperties are then replaced by relative paths from configDirectory. Because the header and suppression config file names are static and only referenced once, these were removed from configProperties and the file names are given directly in checkstyle.xml
Reviewers: Divij Vaidya <diviv@amazon.com>
This change does the following:
1. Make RemoteLogManagerConfigs that are implemented public
2. Add tasks to generate html docs for the configs
3. Include config docs in the main site
Reviewers: Divij Vaidya <diviv@amazon.com>, Luke Chen <showuon@gmail.com>, Christo Lolov <lolovc@amazon.com>, Satish Duggana <satishd@apache.org>
`TieredStorageTestHarness` is a base class for integration tests exercising the tiered storage functionality. This uses `LocalTieredStorage` instance as the second-tier storage system and `TopicBasedRemoteLogMetadataManager` as the remote log metadata manager.
Co-authored-by: Alexandre Dupriez <alexandre.dupriez@gmail.com>
Co-authored-by: Kamal Chandraprakash <kamal.chandraprakash@gmail.com>