https://issues.apache.org/jira/browse/KAFKA-19485
**Bug :**
There is a bug in `ShareConsumeRequestManager` where we are adding
acknowledgements on initial `ShareSession` epoch even after checking for
it.
Added fix to only include acknowledgements in the request if we have to,
PR also adds the check at another point in the code where we could
potentially be sending such acknowledgements. One of the cases could be
when metadata is refreshed with empty topic IDs after a broker restart.
This means leader information would not be available on the node.
- Consumer subscribed to a partition whose leader was node-0.
- Broker restart happens and node-0 is elected leader again. Broker
starts a new `ShareSession`.
- Background thread sends a fetch request with **non-zero** epoch.
- Broker responds with `SHARE_SESSION_NOT_FOUND`.
- Client updates session epoch to 0 once it receives this error.
- Client updates metadata but receives empty metadata response. (Leader
unavailable)
- Application thread processing the previous fetch, completes and sends
acks to piggyback on next fetch.
- Next fetch will send the piggyback acknowledgements on the fetch for
previously subscribed partitions resulting in error from broker
("`Acknowledge data present on initial epoch`"). (Currently we attempt
to send even if leader is unavailable).
**Fix** : Add a check before sending out acknowledgments if we are on
an initial epoch.
Added unit test covering the above scenario.
Reviewers: Andrew Schofield <aschofield@confluent.io>
Minor tidying up in AlterShareGroupOffsetsHandler based on review
comment
https://github.com/apache/kafka/pull/20049#discussion_r2192904850.
Reviewers: Jimmy Wang <wangzhiwang611@gmail.com>, Lan Ding
<isDing_L@163.com>, TaiJuWu <tjwu1217@gmail.com>, Ken Huang
<s7133700@gmail.com>, Jhen-Yung Hsu <jhenyunghsu@gmail.com>, Chia-Ping
Tsai <chia7712@gmail.com>
## Changes
- The partitions == 0 branch has been moved from **waitForTopic** to
**waitTopicDeletion**.
## Reasons
- Clarify the responsibility of each helper method makes the test code
easier to reason by moving the partitions == 0 logic from
**waitForTopic** into a dedicated method **waitTopicDeletion**.
Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TaiJuWu
<tjwu1217@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Ken Huang
<s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
Creates GetReplicaLogInfoRequest and GetReplicaLogInfoResponse RPCs
Information returned by these brokers will be used to aid
unclean-recovery by selecting longest logs.
Reviewers: Alyssa Huang <ahuang@confluent.io>, Calvin Liu <caliu@confluent.io>, Colin P. McCabe <cmccabe@apache.org>, TaiJuWu <tjwu1217@gmail.com>
This PR includes the following fixes:
- Streams CLI used to list and return the description of the first group
which is a bug. With this fix, it returns the descriptions of the groups
specified by the `--group` or `all-groups`. Integration test are added
to verify the fix.
- `timeoutOption` is missing in describe groups. This fix adds and tests
it with short timeout.
- `DescribeStreamsGroupsHandler` used to return an empty group in `DEAD`
state when the group id was not found, but with this fix, it throws
`GroupIdNotFoundException`
Use Java to rewrite PlaintextConsumerTest by new test infra and move it
to client-integration-tests module.
Reviewers: Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
This fixes librdkafka older than the recently released 2.11.0 with
Kerberos authentication and Apache Kafka 4.x.
Even though this is a bug in librdkafka, a key goal of KIP-896 is not to
break the popular client libraries listed in it. Adding back JoinGroup
v0 & v1 is a very small change and worth it from that perspective.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Migrate ControllerMutationQuotaManager to Java implementation and move
to server module, including ClientQuotaManager and associated files.
Reviewers: TengYao Chi <kitingiao@gmail.com>, Ken Huang
<s7133700@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
When writing HTML, it's recommended to use the <code> element instead of
backticks for inline code formatting.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, TengYao Chi
<frankvicky@apache.org>
Flaky Test Report / Flaky Test Report (push) Has been cancelledDetails
- Metadata doesn't have the full view of topicNames to ids during
rebootstrap of client or when topic has been deleted/recreated. The
solution is to pass down topic id and stop trying to figure it out later
in the logic.
---------
Co-authored-by: Kirk True <kirk@kirktrue.pro>
Update the outdated Javadocs in Metrics.java. The `MetricName(String
name, String group)` constructor in MetricName.java was removed in
59b918ec2b
Minor typo fixes included.
Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
While testing the code in https://github.com/apache/kafka/pull/19820, it
became clear that the error handling problems were due to the underlying
Admin API. This PR fixes the error handling for top-level errors in the
AlterShareGroupOffsets RPC.
Reviewers: Apoorv Mittal <apoorvmittal10@gmail.com>, Lan Ding
<isDing_L@163.com>, TaiJuWu <tjwu1217@gmail.com>
Remove FlattenedIterator. It’s no longer used anywhere after
https://github.com/apache/kafka/pull/20037.
Reviewers: TengYao Chi <kitingiao@gmail.com>, Lan Ding
<isDing_L@163.com>, Chia-Ping Tsai <chia7712@gmail.com>
The PR do following:
1. Remove ReplicaManager#becomeLeaderOrFollower.
2. Remove `LeaderAndIsrRequest` and `LeaderAndIsrResponse`
3. Migrate `LeaderAndIsrRequest.PartitionState` to server-common module
and change to `PartitionState`
4. Remove `ControllerEpoch` from PartitionState
5. Remove `isShuttingDown` from BrokerServer and ReplicaManager
Reviewers: Kuan-Po Tseng <brandboat@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
from https://github.com/apache/kafka/pull/19319#discussion_r2169555430
This pull request addresses a minor typo in the
ProducerFailureHandlingTest within the Apache Kafka project.
Specifically, it corrects an erroneous method call where waitForDeletion
was used instead of waitForTopic (or createTopic).
Reviewers: PoAn Yang <payang@apache.org>, TaiJuWu <tjwu1217@gmail.com>,
Jhen-Yung Hsu <jhenyunghsu@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Ken Huang <s7133700@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
This PR does the following:
1) Rewrites consumerBounceTest in Java.
2) Moves the test to clients-integration-test.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Ensure the config.providers configuration is documented for all
components supporting it
Reviewers: Mickael Maison <mickael.maison@gmail.com>, Greg Harris
<gharris1727@gmail.com>, Matthias J. Sax <mjsax@apache.org>
The OffsetFetch API does not support top level errors in version 1.
Hence, the top level error must be returned at the partition level.
Side note: It is a tad annoying that we create error response in
multiple places (e.g. KafkaApis, Group CoordinatorService). There were a
reason for this but I cannot remember.
Reviewers: Dongnuo Lyu <dlyu@confluent.io>, Sean Quah <squah@confluent.io>, Ken Huang <s7133700@gmail.com>, TengYao Chi <frankvicky@apache.org>
The prepareResponse function is no longer used anywhere in the codebase.
Removing unused code improves maintainability and readability.
Reviewers: PoAn Yang <payang@apache.org>, Yung
<yungyung7654321@gmail.com>, TengYao Chi <frankvicky@apache.org>, Ken
Huang <s7133700@gmail.com>
Description
* Replace `org.apache.kafka.common.test.TestUtils` with
`org.apache.kafka.test.TestUtils` in outer package modules to
standardize test utility usage
* Move `waitUntilLeaderIsElectedOrChangedWithAdmin` method from
`org.apache.kafka.test.TestUtils` to `ClusterInstance` and refactor for
better code organization
* Add `org.apache.kafka.test.TestUtils` dependency to
`transaction-coordinator` import control
Reviewers: PoAn Yang [payang@apache.org](mailto:payang@apache.org), Ken
Huang [s7133700@gmail.com](mailto:s7133700@gmail.com), Ken Huang
[s7133700@gmail.com](mailto:s7133700@gmail.com), Chia-Ping Tsai
[chia7712@gmail.com](mailto:chia7712@gmail.com)
this is a follow-up for https://github.com/apache/kafka/pull/19685
The timeout issue in `AsyncConsumer#unsubscribe` was fixed by
https://github.com/apache/kafka/pull/19779. As a result, the test
`GroupAuthorizerIntegrationTest#testConsumeUnsubscribeWithoutGroupPermission`
should now retain its original behavior as expected prior to the issue.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
In KRaft, custom KafkaPrincipalBuilder instances must implement
KafkaPrincipalSerde to support the forward mechanism. Currently, this
requirement is not enforced and relies on the developer's attention.
With this patch, we can prevent incorrect implementations at compile
time.
Reviewers: Ken Huang <s7133700@gmail.com>, TengYao Chi
<kitingiao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
The old approach `OFFSETS_TOPIC_REPLICATION_FACTOR_CONFIG` will override
`OFFSETS_TOPIC_PARTITIONS_CONFIG` config, this behaviour is not
expected, we should fix it.
Reviewers: TengYao Chi <frankvicky@apache.org>
Replace `singleton` with `Set.of` in the SubscriptionStateTest.
Reviewers: Ken Huang <s7133700@gmail.com>, PoAn Yang
<payang@apache.org>, Yung <yungyung7654321@gmail.com>, TengYao Chi
<frankvicky@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
MetadataRequest and MetadataResponse version 0 is still supported.
Remove from README.md mentioning that they're not supported. It's
unclear how such a mention was ever included in the file. It has always
been there
Reviewers: Andrew Schofield <aschofield@confluent.io>
Configure tests in ShareConsumer to run exclusively in Raft-Isolated
mode to reduce execution time.
Reviewers: Andrew Schofield <aschofield@confluent.io>, Jhen-Yung Hsu
<jhenyunghsu@gmail.com>, Ken Huang <s7133700@gmail.com>
*What*
- There was a possibility of a flake occurring in one of the tests in
`ShareConsumerTest` where we are not waiting for the assignment to be
received and synced before we start consuming on `poll()`.
- There should ideally be a `waitedPoll()` or a dedicated poll which
would trigger the reconcilation and updation of the assignment.
Scenario :
- We are doing 3 `polls` to receive 1500 records.
- On debugging, I noticed 1st poll always returned 0 records as this was
triggering reconcilation, the test passed locally as 2nd and 3rd polls
add up to give us 1500 records almost always (as 500 is a soft limit).
The test can fail if these 2 `polls` do not add up 1500. We need a
maximum
of 3 polls to receive 1500 records with `max.poll.records` set to 500.
- Now we have introduced a loop which would keep polling until we
receive the expected records and then close.
Reviewers: Andrew Schofield <aschofield@confluent.io>
Fix to ensure assigned partitions whose topics are not in the consumer
explicit subscription are considered not fetchable (so that no records
are returned on poll for them)
This scenario could happen in the new async consumer (using the Consumer
rebalance protocol) when the subscription changes, because the consumer
will keep its assignment until the coordinator sends a new one (broker
drives assignments).
This does not happen in the classic consumer because the assignment
logic lives on the client-side, so the consumer pro-actively updates
assignment as needed.
This PR validates assignment vs subscription on fetch for explicit
subscription only. Regular expressions, shared subscription remain
unchanged (regex case still under discussion, will be handled separately
if needed)
Reviewers: Andrew Schofield <aschofield@confluent.io>, TengYao Chi
<frankvicky@apache.org>, Kirk True <ktrue@confluent.io>, Jhen-Yung Hsu
<jhenyunghsu@gmail.com>
Profiling has shown that using the Collections Streams API approach adds
unnecessary overhead compared to a traditional for loop. Minor revisions
to the code have been made to use simpler constructs to improve
performance.
Reviewers: Lianet Magrans <lmagrans@confluent.io>, Andrew Schofield
<aschofield@confluent.io>
These can be used to implement transformations on top of the RPC
definitions. Group IDs were already marked. This PR additionally adds
the entityType for all topic names.
Reviewers: Matthias J. Sax <matthias@confluent.io>
This change compares the remote replica's HWM with the leader's HWM and
completes the FETCH request if the remote HWM is less than the leader's
HWM. When the leader's HWM is updated any pending FETCH RPC is
completed.
Reviewers: Alyssa Huang <ahuang@confluent.io>, David Arthur
<mumrah@gmail.com>, Andrew Schofield <aschofield@confluent.io>
Use Java to rewrite ProducerSendWhileDeletionTest by new test infra and
move it to client-integration-tests module.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
see https://github.com/apache/kafka/pull/19948#discussion_r2150617216,
replace test "catch exception" by assertThrows.
Reviewers: PoAn Yang <payang@apache.org>, Ken Huang
<s7133700@gmail.com>, TengYao Chi <kitingiao@gmail.com>, Chia-Ping Tsai
<chia7712@gmail.com>
Clarify the interaction of `min.insync.replicas` and `ack=all`
configuration. Prior to this change, the doc for `min.insync.replicas`
could have been interpreted as being used to short-circuit in the
`acks=all` case as if it would be enough if `min.inscyn.replicas`
number of brokers replicated a message before it can be acknowledged
back to the producer.
Reviewers: Matthias J. Sax <matthias@confluent.io>
---------
Co-authored-by: Matthias J. Sax <mjsax@apache.org>
I noticed that in commit
[a662bc56](a662bc5634),
renamed ignored exceptions `e` to `swallow`.
Here's a small patch to make that change consistent in other files:
AsyncKafkaConsumer.java, KafkaConsumerTest.java,
AsyncKafkaConsumerTest.java
Reviewers: Kirk True <kirk@kirktrue.pro>, Lan Ding <isDing_L@163.com>,
Chia-Ping Tsai <chia7712@gmail.com>
Adds a test dependency on
[mock-oauth2-server](https://github.com/navikt/mock-oauth2-server/) for
integration tests for OAuth layer. Also includes fixes for some
regressions that were caught by the integration tests.
Reviewers: Manikumar Reddy <manikumar@confluent.io>, Lianet Magrans
<lmagrans@confluent.io>
This PR uses topic IDs received in assignment (under new protocol) to
ensure that only these assigned topics are included in the consumer
metadata requests performed when the user subscribes to broker-side
regex (RE2J).
For handling the edge case of consumer needing metadata for topics IDs
(from RE2J) and topic names (from transient topics), the approach is to
send a request for the transient topics needed temporarily, and once
those resolved, the request for the topic IDs needed for RE2J will
follow. (this is because the broker doesn't accept requests for names
and IDs at the same time)
With the changes we also end up fixing another issue (KAFKA-18729) aimed
at avoiding iterating the full set of assigned partitions when checking
if a topic should be retained from the metadata response when using
RE2J.
Reviewers: David Jacot <djacot@confluent.io>
Fixed a long-standing issue where the client JWT validation was decoding
the JWT sections using base 64 instead of URL-safe base 64.
Note: server-side validation leverages the jose4j library for parsing
JWTs, hence no fix is needed there.
Reviewers: Lianet Magrans <lmagrans@confluent.io>, Manikumar Reddy
<manikumar@confluent.io>
---------
Co-authored-by: Lianet Magrans <98415067+lianetm@users.noreply.github.com>
Simplify Set initialization and reduce the overhead of creating extra
collections.
The changes mostly include:
- new HashSet<>(List.of(...))
- new HashSet<>(Arrays.asList(...)) / new HashSet<>(asList(...))
- new HashSet<>(Collections.singletonList()) / new
HashSet<>(singletonList())
- new HashSet<>(Collections.emptyList())
- new HashSet<>(Set.of())
This change takes the following into account, and we will not change to
Set.of in these scenarios:
- Require `mutability` (UnsupportedOperationException).
- Allow `duplicate` elements (IllegalArgumentException).
- Allow `null` elements (NullPointerException).
- Depend on `Ordering`. `Set.of` does not guarantee order, so it could
make tests flaky or break public interfaces.
Reviewers: Ken Huang <s7133700@gmail.com>, PoAn Yang
<payang@apache.org>, Chia-Ping Tsai <chia7712@gmail.com>
Log segment closure results in right sizing the segment on disk along
with the associated index files.
This is specially important for TimeIndexes where a failure to right
size may eventually cause log roll failures leading to under replication
and log cleaner failures.
This change uses `Utils.closeAll` which propagates exceptions, resulting
in an "unclean" shutdown. That would then cause the broker to attempt to
recover the log segment and the index on next startup, thereby avoiding
the failures described above.
Reviewers: Omnia Ibrahim <o.g.h.ibrahim@gmail.com>, Jun Rao
<junrao@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
We can use `pollUntilTrue` instead of `waitForCondition`, thus do a
little refactor to reduce the duplicate code
Reviewers: TengYao Chi <frankvicky@apache.org>, Lan Ding
<isDing_L@163.com>, TaiJuWu <tjwu1217@gmail.com>
Use Java to rewrite PlaintextConsumerSubscriptionTest by new test infra
and move it to client-integration-tests module.
Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
Remove the event IDs from the ApplicationEvent and BackgroundEvent as it
serves no functional purpose other than uniquely identifying events in
the logs.
Reviewers: Andrew Schofield <aschofield@confluent.io>
While reading through the code, I found the method name to be somewhat
ambiguous and not fully descriptive of its purpose.
So I renamed the method to make its purpose clearer and more
self-explanatory. If there was another reason for the original naming,
I’d be happy to hear about it.
Reviewers: Lianet Magrans <lmagrans@confluent.io>
* Add `group.share.assignors` config to `GroupCoordinatorConfig`.
* Send `rackId` in share group heartbeat request if it's not null.
* Add integration test `testShareConsumerWithRackAwareAssignor`.
Reviewers: Lan Ding <53332773+DL1231@users.noreply.github.com>, Andrew
Schofield <aschofield@confluent.io>
---------
Signed-off-by: PoAn Yang <payang@apache.org>
The mapKey optimisation can be used in some KIP-932 RPC schemas to
improve efficiency of some key-based accesses.
* AlterShareGroupOffsetsResponse
* ShareFetchRequest
* ShareFetchResponse
* ShareAcknowledgeRequest
* ShareAcknowledgeResponse
Reviewers: Andrew Schofield <aschofield@confluent.io>
---------
Signed-off-by: PoAn Yang <payang@apache.org>
The PR do following:
1. rewrite to new test infra
2. rewrite to java
3. move to clients-integration-tests
Reviewers: Ken Huang <s7133700@gmail.com>, Kuan-Po Tseng
<brandboat@gmail.com>, Chia-Ping Tsai <chia7712@gmail.com>
- Moving off deprecated methods
- Fixing argument order for assertEquals(...)
- Few other minor cleanups
Reviewers: PoAn Yang <payang@apache.org>, Lianet Magrans
<lmagrans@confluent.io>, Ken Huang <s7133700@gmail.com>