Fixes a backwards-compatibility issue in the KIP-979 change to the kafka-server-stop.sh script where it would not stop processes if run from a directory that differed from the directory where the processes were started form and the config file specified at start time was a relative path.
Reviewers: Ron Dagostino <rdagostino@confluent.io>
Moves ELR from MetadataVersion IBP_3_7_IV3 into the new IBP_3_8_IV0 because the ELR feature was not completed before 3.7 reached feature freeze. Leaves IBP_3_7_IV3 empty -- it is a no-op and is not reused for anything. Adds the new MetadataVersion IBP_3_7_IV4 for the FETCH request changes from KIP-951, which were mistakenly never associated with a MetadataVersion. Updates the LATEST_PRODUCTION MetadataVersion to IBP_3_7_IV4 to declare both KRaft JBOD and the KIP-951 changes ready for production use.
Reviewers: Omnia G H Ibrahim <o.g.h.ibrahim@gmail.com>, Ron Dagostino <rdagostino@confluent.io>, Ismael Juma <ismael@juma.me.uk>, José Armando García Sancio <jsancio@apache.org>, Justine Olshan <jolshan@confluent.io>
This patch bumps the next release version to 3.8.0-SNAPSHOT.
Following the Release Process, I created the 3.7 branch and am following the steps to bump these versions:
Modify the version in trunk to bump to the next one (eg. "0.10.1.0-SNAPSHOT") in the following files:
docs/js/templateData.js
gradle.properties
kafka-merge-pr.py
streams/quickstart/java/pom.xml
streams/quickstart/java/src/main/resources/archetype-resources/pom.xml
streams/quickstart/pom.xml
tests/kafkatest/__init__.py
This includes a fix to better handle the case where a member gets fenced while preparing to leave the group. The logic was already making sure that the member ignores the fencing (no callbacks triggered or rejoin), but this PR adds the right transition to stop sending heartbeats while the leaving process completes (because the member is not part of the group anymore from the broker point of view).
Reviewer: Bruno Cadonna <cadonna@apache.org>
This includes a fix for properly handling cases where the member might receive a new assignment from the broker when the member is already getting ready to leave the group. The expectation is that the member should just ignore the new assignment and continue with the leave group process.
Reviewer: Bruno Cadonna <cadonna@apache.org>
Rewrote the verification flow to pass a callback to execute after verification completes.
For the TxnOffsetCommit, we will call doTxnCommitOffsets. This allows us to do offset validations post verification.
I've reorganized the verification code and group coordinator code to make these code paths clearer. The followup refactor (https://issues.apache.org/jira/browse/KAFKA-15987) will further clean up the produce verification code.
Reviewers: Artem Livshits <alivshits@confluent.io>, Jason Gustafson <jason@confluent.io>, David Jacot <djacot@confluent.io>, Jun Rao <junrao@gmail.com>
This pull request implements the first in the list of metrics in KIP-963: Additional metrics in Tiered Storage.
Since each partition of a topic will be serviced by its own RLMTask we need an aggregator object for a topic. The aggregator object in this pull request is BrokerTopicAggregatedMetric. Since the RemoteCopyLagBytes is a gauge I have introduced a new GaugeWrapper. The GaugeWrapper is used by the metrics collection system to interact with the BrokerTopicAggregatedMetric. The RemoteLogManager interacts with the BrokerTopicAggregatedMetric directly.
Reviewers: Luke Chen <showuon@gmail.com>, Satish Duggana <satishd@apache.org>, Kamal Chandraprakash<kamal.chandraprakash@gmail.com>
This patch ensure that `offset.commit.timeout.ms` is enforced. It does so by adding a timeout to the CoordinatorWriteEvent.
Reviewers: David Jacot <djacot@confluent.io>
In many parameterized tests, the display name is broken. Example - testMetadataFetch appears as [1] true, [2] false link
This is because the constant in @ParameterizedTest
String DEFAULT_DISPLAY_NAME = "[{index}] {argumentsWithNames}";
This PR adds a new junit-platform.properties which overrides to add a {displayName} which shows the the display name of the method
For existing tests which override the name, should work as is. The precedence rules are explained
name attribute in @ParameterizedTest, if present
value of the junit.jupiter.params.displayname.default configuration parameter, if present
DEFAULT_DISPLAY_NAME constant defined in @ParameterizedTest
Source: https://junit.org/junit5/docs/current/user-guide/#writing-tests-parameterized-tests-display-names
Sample test run output
Before: [1] true link
After: testMetadataExpiry(boolean).false link
This commit is an extension of bdf6d46b41 which needed to reverted due to introduces test failures.
Reviewers: David Jacot <djacot@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>
Currently, we increment generic group metrics whenever we create a new Group object when we load a partition. This is incorrect as the partition may contain several records for the same group if in the active segment or if the segment has not yet been compacted.
The same applies to removing groups; we can possibly have multiple group tombstone records. Instead, only increment the metric if we created a new group and only decrement the metric if the group exists.
Reviewers: David Jacot <djacot@confluent.io>
Implement Consumer.listTopics and Consumer.partitionsFor in the new consumer. The topic metadata request manager already existed so this PR adds expiration to requests, removes some redundant state checking and adds tests.
Reviewers: Lucas Brutschy <lucasbru@apache.org>
Update the port mapping in test compose files to align with the recommendations in example files
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Currently, poll interval is not being respected during consumer#poll. When the user stops polling the consumer, we should assume either the consumer is too slow to respond or is already dead. In either case, we should let the group coordinator kick the member out of the group and reassign its partition after the rebalance timeout expires.
If the consumer comes back alive, we should send a heartbeat and the member will be fenced and rejoin. (and the partitions will be revoked).
This is the same behavior as the current implementation.
Reviewers: Lucas Brutschy <lucasbru@apache.org>, Bruno Cadonna <cadonna@apache.org>, Lianet Magrans <lianetmr@gmail.com>
This patch updates the client state machine, including:
* handling transitions to fatal and fenced while member is leaving the group.
* minor improvements addressing comments.
* support for static member leave group.
* minor fixes for handling assignments received.
Reviewers: David Jacot <djacot@confluent.io>
Session expiration in ZkClient can lead to a thread leak, and does fail CI on master.
This is happening in testNoDescribeProduceOrConsumeWithoutTopicDescribeAcl, and possibly other tests.
Use try-with-resources to close ZkClient if this happens.
This does not fix the underlying session expiration in ZK.
Reviewers: David Jacot <djacot@confluent.io>
In the new consumer, the commit request manager and the membership manager are separate components. The commit request manager is initialised with group information that it uses to construct `OffsetCommit` requests. However, the initial value of the member ID is `""` in some cases. When the consumer joins the group, it receives a `ConsumerGroupHeartbeat` response which tells it the member ID. The member ID was not being passed to the commit request manager, so it sent invalid `OffsetCommit` requests that failed with `UNKNOWN_MEMBER_ID`.
Reviewers: Bruno Cadonna <cadonna@apache.org>, David Jacot <djacot@confluent.io>
The support for regular expressions has not been implemented yet in the new consumer group protocol. This patch removes the `SubscribedTopicRegex` from the `ConsumerGroupHeartbeatRequest` in preparation for 3.7. It seems better to bump the version and add it back when we implement the feature, as part of https://issues.apache.org/jira/browse/KAFKA-14517, instead of having an unused field in the request.
Reviewers: Sagar Rao <sagarmeansocean@gmail.com>, Justine Olshan <jolshan@confluent.io>
MetadataShell should take an advisory lock on the .lock file of the directory it is reading from.
Add an integration test of this functionality in MetadataShellIntegrationTest.java.
Note: in build.gradle, I had to add some dependencies on server-common's test files in order to use
MockFaultHandler, etc.
MetadataBatchLoader.java: fix a case where a log message was incorrect. The intention was to print
the number equivalent to (offset + index). Instead it was printing the offset, followed by the
index. So if the offset was 100 and the index was 1, 1001 would be printed rather than 101.
Co-authored-by: Igor Soarez <i@soarez.me>
Reviewers: David Arthur <mumrah@gmail.com>, José Armando García Sancio <jsancio@apache.org>
- Add proper start & stop for AssignmentsManager's event loop
- Dedupe queued duplicate assignments
- Fix bug where directory ID is resolved too late
Co-authored-by: Gaurav Narula <gaurav_narula2@apple.com>
Reviewers: Colin P. McCabe <cmccabe@apache.org>
When log directories fail, the broker will send a heartbeat listing the failed directories. This
PR implements processing offline directories in the controller's broker heartbeat handling. We
update broker registrations and generate leadership/ISR changes as necessary.
Co-authored-by: Colin P. McCabe <cmccabe@apache.org>
Reviewers: Ron Dagostino <rndgstn@gmail.com>
Allow using JBOD during ZK migration if MetadataVersion is at or above 3.7-IV2.
Reviewers: Ron Dagostino <rndgstn@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Proven Provenzano <pprovenzano@confluent.io>
KAFKA-15361 (#14838) introduced a check for non empty directory list on brokerregistration requests
from MetadataVersion.IBP_3_7_IV2 or later, which enables directory assignment. However, ZK brokers
weren't yet registering yet with a directory list. This patch addresses that. We also make the
directory list non-optional in BrokerLifecycleManager.
Reviewers: Ron Dagostino <rndgstn@gmail.com>, Colin P. McCabe <cmccabe@apache.org>, Proven Provenzano <pprovenzano@confluent.io>
Part of KIP-714.
Add support to collect client instance id of the global consumer.
Reviewers: Walker Carlson <wcarlson@confluent.io>, Lucas Brutschy <lbrutschy@confluent.io>
This new integration test verifies that a static member who temporary left the group is removed after the session timeout expires. It also verifies that a new static member with the same instance id can't join the group until the previous static member is expired.
Reviewers: David Jacot <djacot@confluent.io>
A recent change in AsyncKafkaConsumer.updateFetchPositions has made fetching fail without returning records in some situations. Reverting the troublesome code.
Reviewers: Lucas Brutschy <lbrutschy@confluent.io>
This reverts commit bdf6d46b41. We found out that this commit introduced flakiness in Streams' tests. We will revise it.
Reviewers: Bruno Cadonna <cadonna@apache.org>
Docker Image Promote github Actions Workflow was using crane github action, which is not allowed to run in apache github actions. So now using regctl which is in Github Marketplace.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>