kafka

Commit Graph

Author	SHA1	Message	Date
Colin P. Mccabe	67fd88050f	KAFKA-8984: Improve tagged fields documentation Author: Colin P. Mccabe <cmccabe@confluent.io> Reviewers: Vikas Singh <vikas@confluent.io>, Jason Gustafson <jason@confluent.io> Closes #7477 from cmccabe/KAFKA-8984	2019-11-09 10:37:48 +05:30
Guozhang Wang	6df058ec15	KAFKA-8677: Simplify the best-effort network client poll to never throw exception (#7613 ) Within KafkaConsumer.poll, we have an optimization to try to send the next fetch request before returning the data in order to pipeline the fetch requests; however, this pollNoWakeup should NOT throw any exceptions, since at this point the fetch position has been updated. If an exception is thrown and the callers decide to capture and continue, those records would never be returned again, causing data loss. Also fix the flaky test itself. Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jason Gustafson <jason@confluent.io>	2019-11-08 09:02:34 -08:00
Guozhang Wang	4283fd640c	MINOR: Return null in key mapping of committed (#7659 ) To be consistent with other grouping APIs, and also modified callers accordingly. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2019-11-08 08:53:38 -08:00
Jason Gustafson	929c25732f	MINOR: Fix version range check in MessageTest (#7663 ) This patch fixes the test utility `testAllMessageRoundTripsFromVersion` in `MessageTest` which was unintentionally excluding the highest version. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2019-11-07 22:15:39 -08:00
Guozhang Wang	2132709675	KAFKA-9140: Also reset join future when generation was reset in order to re-join (#7647 ) Otherwise the join-group would not be resent and we'd just fall into an endless loop. Reviewers: Jason Gustafson <jason@confluent.io>, Boyang Chen <boyang@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>	2019-11-06 09:47:08 -08:00
A. Sophie Blee-Goldman	a41bc7274b	HOTFIX: remove reference to unused Assignment error code (#7645 ) Reviewers: Guozhang Wang <wangguoz@gmail.com>	2019-11-05 17:53:11 -08:00
Stanislav Kozlovski	be58580e14	MINOR: Rework NewPartitionReassignment public API (#7638 ) This patch removes the NewPartitionReassignment#of() method in favor of a simple constructor. Said method was confusing due to breaking two conventions - always returning a non-empty Optional and thus not being used as a static factory method. Reviewers: Ismael Juma <ismael@juma.me.uk>, Colin P. McCabe <cmccabe@apache.org>	2019-11-05 10:34:11 -08:00
Ismael Juma	c552c06aed	KAFKA-9110: Improve efficiency of disk reads when TLS is enabled (#7604 ) 1. Avoid a buffer allocation and a buffer copy per file read. 2. Ensure we flush `netWriteBuffer` successfully before reading from disk to avoid wasted disk reads. 3. 32k reads instead of 8k reads to reduce the number of disk reads (improves efficiency for magnetic drives and reduces the number of system calls). 4. Update SslTransportLayer.write(ByteBuffer) to loop until the socket buffer is full or the src buffer has no remaining bytes. 5. Renamed `MappedByteBuffers` to `ByteBufferUnmapper` since it's also applicable for direct byte buffers. 6. Skip empty `RecordsSend` 7. Some minor clean-ups for readability. I ran a simple consumer perf benchmark on a 6 partition topic (large enough not to fit into page cache) starting from the beginning of the log with TLS enabled on my 6 core MacBook Pro as a sanity check. This laptop has fast SSDs so it benefits less from the larger reads than the case where magnetic disks are used. Consumer throughput was ~260 MB/s before the changes and ~300 MB/s after (~15% improvement). Credit to @junrao for pointing out that this code could be more efficient. Reviewers: Jun Rao <junrao@confluent.io>, Colin P. McCabe <cmccabe@apache.org>	2019-11-05 04:51:25 -08:00
Ismael Juma	7bdbdf1900	HOTFIX: Try to complete Send even if no bytes were written (#7622 ) If there are pending bytes in the transport layer, we may complete a send even if no bytes were recorded as written. We assume bytes are written when they are in the netWriteBuffer, but we only consider the send as completed when it's in the socket channel buffer. This fixes a regression introduced via `0971f66ff5`. The impact is that we would sometimes throw the following exception in `MultiRecordsSend.writeTo`: ```java if (completed()) throw new KafkaException("This operation cannot be invoked on a complete request."); ``` Added unit test verifying the bug fix. While in the area, I simplified one of the `SslSelectorTest` methods. Reviewers: Jun Rao <junrao@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>	2019-11-02 08:42:30 -07:00
A. Sophie Blee-Goldman	d61b0c131c	KAFKA-8972 (2.4 blocker): TaskManager state should always be updated after rebalance (#7620 ) Currently when we identify version probing we return early from onAssignment and never get to updating the TaskManager and general state with the new assignment. Since we do actually give out "real" assignments even during version probing, a StreamThread should take real ownership of its tasks/partitions including cleaning them up in onPartitionsRevoked which gets invoked when we call onLeavePrepare as part of triggering the follow-up rebalance. Every member will always get an assignment encoded with the lowest common version, so there should be no problem decoding a VP assignment. We should just allow onAssignment to proceed as usual so that the TaskManager is in a consistent state, and knows what all its tasks/partitions are when the first rebalance completes and the next one is triggered. Reviewers: Boyang Chen <boyang@confluent.io>, Matthias J. Sax <mjsax@apache.org>, Guozhang Wang <wangguoz@gmail.com>	2019-11-01 16:10:42 -07:00
Viktor Somogyi	5fa2de43ec	MINOR: Replace some Java 7 style code with Java 8 style (#7623 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>	2019-11-02 00:23:21 +05:30
John Roesler	4a5155c934	KAFKA-8868: Generate SubscriptionInfo protocol message (#7248 ) Rather than maintain hand coded protocol serialization code, Streams could use the same code-generation framework as Clients/Core. There isn't a perfect match, since the code generation framework includes an assumption that you're generating "protocol messages", rather than just arbitrary blobs, but I think it's close enough to justify using it, and improving it over time. Using the code generation allows us to drop a lot of detail-oriented, brittle, and hard-to-maintain serialization logic in favor of a schema spec. Reviewers: Colin P. McCabe <cmccabe@apache.org>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	2019-11-01 10:03:55 -07:00
huxi	9e81ec9a6e	KAFKA-9093: NullPointerException in KafkaConsumer with group.instance.id (#7590 ) `log` in KafkaConsumer does not get initialized if an invalid value for group.instance.id is given during consumer construction. In this case we should skip the catch block's close procedure since no internal objects have been initialized yet. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2019-10-30 14:19:57 -07:00
Boyang Chen	465f810730	KAFKA-8972 (2.4 blocker): correctly release lost partitions during consumer.unsubscribe() (#7441 ) Inside onLeavePrepare we would look into the assignment and try to revoke the owned tasks and notify users via RebalanceListener#onPartitionsRevoked, and then clear the assignment. However, the subscription's assignment is already cleared in this.subscriptions.unsubscribe(); which means the user's rebalance listener would never be triggered. In other words, from the consumer client's pov nothing is owned after unsubscribe, but from the user caller's pov the partitions are not revoked yet. For callers like Kafka Streams which rely on the rebalance listener to maintain their internal state, this leads to inconsistent state management and failure cases. Before KIP-429 this issue was hidden away since every time the consumer re-joined the group later, it would still revoke everything anyways regardless of the passed-in parameters of the rebalance listener; with KIP-429 this is easier to reproduce now. Our fixes are as follows: • Inside unsubscribe, first do onLeavePrepare / maybeLeaveGroup and then subscription.unsubscribe. This way we are guaranteed that the streams' tasks are all closed as revoked by then. • [Optimization] If the generation is reset due to fatal error from join / hb response etc, then we know that all partitions are lost, and we should not trigger onPartitionsRevoked, but instead just onPartitionsLost inside onLeavePrepare. This is because we don't want to commit for lost tracks during rebalance which is doomed to fail as we don't have any generation info. Reviewers: Matthias J. Sax <matthias@confluent.io>, A. Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>	2019-10-29 10:41:25 -07:00
Guozhang Wang	59a75f4422	KAFKA-9048 Pt1: Remove Unnecessary lookup in Fetch Building (#7576 ) Get rid of partitionStates that creates a new PartitionState for each state since all the callers do not require it to be a Seq. Modify ReplicaFetcherThread constructor to fix the broken benchmark path. This PR: Benchmark (partitionCount) Mode Cnt Score Error Units ReplicaFetcherThreadBenchmark.testFetcher 100 avgt 15 9280.953 ± 55.967 ns/op ReplicaFetcherThreadBenchmark.testFetcher 500 avgt 15 61533.546 ± 1213.559 ns/op ReplicaFetcherThreadBenchmark.testFetcher 1000 avgt 15 151306.146 ± 1820.222 ns/op ReplicaFetcherThreadBenchmark.testFetcher 5000 avgt 15 1138547.929 ± 45301.938 ns/op Trunk: Benchmark (partitionCount) Mode Cnt Score Error Units ReplicaFetcherThreadBenchmark.testFetcher 100 avgt 15 9305.588 ± 51.886 ns/op ReplicaFetcherThreadBenchmark.testFetcher 500 avgt 15 65216.933 ± 939.827 ns/op ReplicaFetcherThreadBenchmark.testFetcher 1000 avgt 15 151715.514 ± 1361.009 ns/op ReplicaFetcherThreadBenchmark.testFetcher 5000 avgt 15 1231958.103 ± 94 Reviewers: Jason Gustafson <jason@confluent.io>, Lucas Bradstreet <lucasbradstreet@gmail.com>	2019-10-28 07:57:50 -07:00
Boyang Chen	77fc498889	KAFKA-8992; Redefine RemoveMembersFromGroup interface on AdminClient (#7478 ) This PR fixes the inconsistency involved in the `removeMembersFromGroup` admin API calls: 1. Fail the `all()` request when there is sub level error (either partition or member) 2. Change getMembers() to members() 3. Hide the actual Errors from user 4. Do not expose generated MemberIdentity type 5. Use more consistent naming for Options and Result types Reviewers: Guozhang Wang <wangguoz@gmail.com>, David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>	2019-10-25 00:01:51 -07:00
Stanislav Kozlovski	28ef7f1d6d	MINOR: Re-implement NewPartitionReassignment#of() (#7592 ) Re-implement NewPartitionReassignment#of. It now takes a list rather than a variable-length list of arguments. Reviewers: Colin P. McCabe <cmccabe@apache.org>, Vikas Singh <vikas@confluent.io>	2019-10-24 15:23:54 -07:00
Chris Pettitt	6975f1dfa9	KAFKA-8700: Flaky Test QueryableStateIntegrationTest#queryOnRebalance (#7548 ) This is not guaranteed to actually fix queryOnRebalance, since the failure could never be reproduced locally. I did not bump timeouts because it looks like that has been done in the past for this test without success. Instead this change makes the following improvements: It waits for the application to be in a RUNNING state before proceeding with the test. It waits for the remaining instance to return to RUNNING state within a timeout after rebalance. I observed once that we were able to do the KV queries but the instance was still in REBALANCING, so this should reduce some opportunity for flakiness. The meat of this change: we now iterate over all keys in one shot (vs. one at a time with a timeout) and collect various failures, all of which are reported at the end. This should help us to narrow down the cause of flakiness if it shows up again. Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	2019-10-23 12:39:31 -07:00
Nikolay	adb2bdb122	KAFKA-8584: The RPC code generator should support ByteBuffer. (#7342 ) The RPC code generator should support using the ByteBuffer class in addition to byte arrays. By using the ByteBuffer class, we can avoid performing a copy in many situations. Also modify TestByteBufferDataTest to test the new feature. Reviewers: Colin P. McCabe <cmccabe@apache.org>, Guozhang Wang <wangguoz@gmail.com>	2019-10-23 12:39:12 -07:00
Jason Gustafson	0971f66ff5	KAFKA-9056; Inbound/outbound byte metrics should reflect incomplete sends/receives (#7551 ) Currently we only record completed sends and receives in the selector metrics. If there is a disconnect in the middle of the respective operation, then it is not counted. The metrics will be more accurate if we take into account partial sends and receives. Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>	2019-10-23 08:45:08 -07:00
Jason Gustafson	4fc649d85c	MINOR: Add toString to PartitionReassignment (#7579 ) This patch adds a `toString()` implementation to `PartitionReassignment`. It also makes the `ListPartitionReassignmentsResult` constructor use default access, which is the standard for the admin client *Result classes. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2019-10-22 17:11:58 -07:00
José Armando García Sancio	c00bd38ab2	MINOR: Rename brokers to replicas in the reassignment API (#7570 ) Reviewers: Jason Gustafson <jason@confluent.io>, Manikumar Reddy <manikumar.reddy@gmail.com>, Vikas Singh <vikas@confluent.io>, Colin P. McCabe <cmccabe@apache.org>	2019-10-22 15:36:53 -07:00
Manikumar Reddy	e20dcffa84	KAFKA-8943: Move SecurityProviderCreator to org.apache.kafka.common.security.auth package (#7564 ) Reviewers: Mickael Maison <mickael.maison@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>	2019-10-21 08:48:33 +01:00
Lee Dongjin	a9b0fc866a	KAFKA-8482: Improve documentation on AdminClient#alterReplicaLogDirs, AlterReplicaLogDirsResult (#7083 ) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	2019-10-20 17:13:17 +05:30
Mickael Maison	99a4068c5c	KAFKA-7689; Add AlterConsumerGroup/List Offsets to AdminClient [KIP-396] (#7296 ) This patch implements new AdminClient APIs to list offsets and alter consumer group offsets as documented in KIP-396: https://cwiki.apache.org/confluence/pages/viewpage.action?pageId=97551484. Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>	2019-10-19 21:30:50 -07:00
Nikolay	4e094217f7	KAFKA-8455: Add VoidSerde to Serdes (#7485 ) Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>	2019-10-18 16:54:23 -07:00
Dhruvil Shah	317089663c	KAFKA-8962; Use least loaded node for AdminClient#describeTopics (#7421 ) Allow routing of `AdminClient#describeTopics` to any broker in the cluster rather than just the controller, so that we don't create a hotspot for this API call. `AdminClient#describeTopics` uses the broker's metadata cache which is asynchronously maintained, so routing to brokers other than the controller is not expected to have a significant difference in terms of metadata consistency; all metadata requests are eventually consistent. This patch also fixes a few flaky test failures. Reviewers: Ismael Juma <ismael@juma.me.uk>, José Armando García Sancio <jsancio@gmail.com>, Jason Gustafson <jason@confluent.io>	2019-10-17 23:08:34 -07:00
Colin Patrick McCabe	3cb8ccf63a	MINOR: AbstractRequestResponse should be an interface (#7513 ) AbstractRequestResponse should be an interface, since it has no concrete elements or implementation. Move AbstractRequestResponse#serialize to RequestUtils#serialize and make it package-private, since it doesn't need to be public. Reviewers: Ismael Juma <ismael@juma.me.uk>	2019-10-17 09:21:34 -07:00
Will James	2cf7b35d90	KAFKA-8950: Fix KafkaConsumer Fetcher breaking on concurrent disconnect (#7511 ) The KafkaConsumer Fetcher can sometimes get into an invalid state where it believes that there are ongoing fetch requests, but in fact there are none. This may be caused by the heartbeat thread concurrently handling a disconnection event just after the fetcher thread submits a request which would cause the Fetcher to enter an invalid state where it believes it has ongoing requests to the disconnected node but in fact it does not. This is due to a thread safety issue in the Fetcher where it was possible for the ordering of the modifications to the nodesWithPendingFetchRequests to be incorrect - the Fetcher was adding it after the listener had already been invoked, which would mean that pending node never gets removed again. This PR addresses that thread safety issue by ensuring that the pending node is added to the nodesWithPendingFetchRequests before the listener is added to the future, ensuring the finally block is called after the node is added. Reviewers: Tom Lee, Jason Gustafson <jason@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>	2019-10-17 11:56:41 +01:00
Kevin Lu	ed078bd702	KAFKA-8874: Add consumer metrics to observe user poll behavior (KIP-517) https://cwiki.apache.org/confluence/display/KAFKA/KIP-517%3A+Add+consumer+metrics+to+observe+user+poll+behavior Author: Kevin Lu <kelu@paypal.com> Reviewers: Sriharsha Chintalapani <sriharsha@apache.org>, Jason Gustafson <jason@confluent.io> Closes #7395 from KevinLiLu/KIP517-KAFKA8874	2019-10-17 14:46:42 +05:30
Nikolay	00374c3ddf	KAFKA-8104: Consumer cannot rejoin to the group after rebalancing (#7460 ) This PR contains the fix of a race condition bug between the "consumer thread" and the "consumer coordinator heartbeat thread". It reproduces in many production environments. Condition for reproducing: 1. Consumer thread initiates rejoin to the group because of commit timeout. Call of AbstractCoordinator#joinGroupIfNeeded which leads to sendJoinGroupRequest. 2. JoinGroupResponseHandler writes to the AbstractCoordinator.this.generation new generation data and leaves the synchronized section. 3. Heartbeat thread executes maybeLeaveGroup and clears generation data via resetGenerationOnLeaveGroup. 4. Consumer thread executes onJoinComplete(generation.generationId, generation.memberId, generation.protocol, memberAssignment); with the cleared generation data. This leads to the corresponding exception. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2019-10-16 22:53:14 -07:00
Lucas Bradstreet	8966d066bd	KAFKA-9039: Optimize ReplicaFetcher fetch path (#7443 ) Improves the performance of the replica fetcher for high partition count fetch requests, where a majority of the partitions did not update between fetch requests. All benchmarks were run on an r5x.large. Vanilla Benchmark (partitionCount) Mode Cnt Score Error Units ReplicaFetcherThreadBenchmark.testFetcher 100 avgt 15 26491.825 ± 438.463 ns/op ReplicaFetcherThreadBenchmark.testFetcher 500 avgt 15 153941.952 ± 4337.073 ns/op ReplicaFetcherThreadBenchmark.testFetcher 1000 avgt 15 339868.602 ± 4201.462 ns/op ReplicaFetcherThreadBenchmark.testFetcher 5000 avgt 15 2588878.448 ± 22172.482 ns/op From 100 to 5000 partitions the latency increase is 2588878.448 / 26491.825 = 97. Avoid gettimeofday calls in steady state fetch states `8545888` Benchmark (partitionCount) Mode Cnt Score Error Units ReplicaFetcherThreadBenchmark.testFetcher 100 avgt 15 22685.381 ± 267.727 ns/op ReplicaFetcherThreadBenchmark.testFetcher 500 avgt 15 113622.521 ± 1854.254 ns/op ReplicaFetcherThreadBenchmark.testFetcher 1000 avgt 15 273698.740 ± 9269.554 ns/op ReplicaFetcherThreadBenchmark.testFetcher 5000 avgt 15 2189223.207 ± 1706.945 ns/op From 100 to 5000 partitions the latency increase is 2189223.207 / 22685.381 = 97X Avoid copying partition states to maintain fetch offsets `29fdd60` Benchmark (partitionCount) Mode Cnt Score Error Units ReplicaFetcherThreadBenchmark.testFetcher 100 avgt 15 17039.989 ± 609.355 ns/op ReplicaFetcherThreadBenchmark.testFetcher 500 avgt 15 99371.086 ± 1833.256 ns/op ReplicaFetcherThreadBenchmark.testFetcher 1000 avgt 15 216071.333 ± 3714.147 ns/op ReplicaFetcherThreadBenchmark.testFetcher 5000 avgt 15 2035678.223 ± 5195.232 ns/op From 100 to 5000 partitions the latency increase is 2035678.223 / 17039.989 = 119X Keep lag alongside PartitionFetchState to avoid expensive isReplicaInSync check `0e57e3e` Benchmark (partitionCount) Mode Cnt Score Error Units ReplicaFetcherThreadBenchmark.testFetcher 100 avgt 15 15131.684 ± 382.088 ns/op ReplicaFetcherThreadBenchmark.testFetcher 500 avgt 15 86813.843 ± 3346.385 ns/op ReplicaFetcherThreadBenchmark.testFetcher 1000 avgt 15 193050.381 ± 3281.833 ns/op ReplicaFetcherThreadBenchmark.testFetcher 5000 avgt 15 1801488.513 ± 2756.355 ns/op From 100 to 5000 partitions the latency increase is 1801488.513 / 15131.684 = 119X Fetch session optimizations (mostly presizing the next hashmap, and avoiding making a copy of sessionPartitions, as a deep copy is not required for the ReplicaFetcher) `2614b24` Benchmark (partitionCount) Mode Cnt Score Error Units ReplicaFetcherThreadBenchmark.testFetcher 100 avgt 15 11386.203 ± 416.701 ns/op ReplicaFetcherThreadBenchmark.testFetcher 500 avgt 15 60820.292 ± 3163.001 ns/op ReplicaFetcherThreadBenchmark.testFetcher 1000 avgt 15 146242.158 ± 1937.254 ns/op ReplicaFetcherThreadBenchmark.testFetcher 5000 avgt 15 1366768.926 ± 3305.712 ns/op From 100 to 5000 partitions the latency increase is 1366768.926 / 11386.203 = 120 Reviewers: Jun Rao <junrao@gmail.com>, Guozhang Wang <wangguoz@gmail.com>	2019-10-16 09:49:53 -07:00
Chris Pettitt	9c8ab5ce10	MINOR: Provide better messages when waiting for a condition in test (#7488 ) Reviewers: Boyang Chen <boyang@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Bill Bejeck <bill@confluent.io>	2019-10-15 17:18:58 -07:00
Jason Gustafson	b24e9f3ccb	KAFKA-9033; Use consumer/producer identity in generated clientId (#7514 ) By default, if the user does not configure a `client.id`, then we use a very generic identifier, such as `consumer-15`. It is more useful to include identifying information when available such as `group.id` for the consumer and `transactional.id` for the producer. Reviewers: Guozhang Wang <wangguoz@gmail.com>	2019-10-15 14:52:26 -07:00
A. Sophie Blee-Goldman	c9c3adca43	MINOR: move "Added/Removed sensor" log messages to TRACE (#7502 ) Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Bill Bejeck <bill@confluent.io>	2019-10-14 12:38:33 -07:00
A. Sophie Blee-Goldman	6fb8bd0987	KAFKA-9029: Flaky Test CooperativeStickyAssignorTest.testReassignmentWithRand: bump to 4 (#7503 ) One of the sticky assignor tests involves a random change in subscriptions that the current assignor algorithm struggles to react to and in cooperative mode ends up requiring more than one followup rebalance. Apparently, in rare cases it can also require more than 2. Bumping the "allowed subsequent rebalances" to 4 (increase of 2) to allow some breathing room and reduce flakiness (technically any number is "correct", but if it turns out to ever require more than 4 we should revisit and improve the algorithm because that would be excessive; see KAFKA-8767). Reviewers: Guozhang Wang <wangguoz@gmail.com>	2019-10-12 11:07:12 -07:00
Tu V. Tran	f41a5c2c86	KAFKA-8729, pt 3: Add broker-side logic to handle the case when there are record_errors and error_message (#7167 ) All the changes are in ReplicaManager.appendToLocalLog and ReplicaManager.appendRecords. Also, replaced LogAppendInfo.unknownLogAppendInfoWithLogStartOffset with LogAppendInfo.unknownLogAppendInfoWithAdditionalInfo to include those 2 new fields. Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>	2019-10-10 14:44:37 -07:00
Colin Patrick McCabe	5c4bbf9344	MINOR: fix compatibility-breaking bug in RequestHeader (#7479 ) Reviewers: David Arthur <mumrah@gmail.com>, Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>	2019-10-10 13:52:07 -07:00
Guozhang Wang	fb9b0dfde5	MINOR: Augment log4j to add generation number in performAssign (#7451 ) Since generation is private in AbstractCoordinator, I need to modify the generation() to let it return the object directly. Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Bill Bejeck <bill@confluent.io>	2019-10-09 10:34:19 -07:00
Rajini Sivaram	1f1179ea64	KAFKA-8932; Add tag for CreateTopicsResponse.TopicConfigErrorCode (KIP-525) (#7464 ) Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>	2019-10-09 09:19:18 +01:00
Jason Gustafson	a9493346aa	KAFKA-8983; AdminClient deleteRecords should not fail all partitions unnecessarily (#7449 ) The deleteRecords API in the AdminClient groups records to be sent by the partition leaders. If one of these requests fails, we currently fail all futures, including those tied to requests sent to other leaders. It would be better to fail only those partitions included in the failed request. Reviewers: Ismael Juma <ismael@juma.me.uk>	2019-10-08 08:19:07 -07:00
A. Sophie Blee-Goldman	d88f1048da	KAFKA-8179: Part 7, cooperative rebalancing in Streams (#7386 ) Key improvements with this PR: * tasks will remain available for IQ during a rebalance (but not during restore) * continue restoring and processing standby tasks during a rebalance * continue processing active tasks during rebalance until the RecordQueue is empty* * only revoked tasks must be suspended/closed * StreamsPartitionAssignor tries to return tasks to their previous consumers within a client * but do not try to commit, for now (pending KAFKA-7312) Reviewers: John Roesler <john@confluent.io>, Boyang Chen <boyang@confluent.io>, Guozhang Wang <wangguoz@gmail.com>	2019-10-07 09:27:09 -07:00
Jason Gustafson	c06e45a215	KAFKA-8985; Add flexible version support to inter-broker APIs (#7453 ) This patch adds flexible version support for the following inter-broker APIs: ControlledShutdown, LeaderAndIsr, UpdateMetadata, and StopReplica. Version checks have been removed from `getErrorResponse` methods since they were redundant given the checks in `AbstractRequest` and the respective`*Data` types. Reviewers: Ismael Juma <ismael@juma.me.uk>	2019-10-07 09:21:14 -07:00
Colin Patrick McCabe	0de61a4683	KAFKA-8885; The Kafka Protocol should Support Optional Tagged Fields (#7325 ) This patch implements support for optional (tagged) fields in the Kafka protocol as documented in KIP-482: https://cwiki.apache.org/confluence/display/KAFKA/KIP-482%3A+The+Kafka+Protocol+should+Support+Optional+Tagged+Fields#KIP-482:TheKafkaProtocolshouldSupportOptionalTaggedFields-TypeClasses. Reviewers: David Jacot <djacot@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>	2019-10-06 21:13:23 -07:00
John Roesler	7349608009	MINOR: Gracefully handle non-assigned case in fetcher metric (#7383 ) Minor tweak to gracefully handle a possible IllegalStateException while checking a metric value. Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>	2019-10-06 10:55:04 -07:00
David Jacot	f98013cc50	Part 1 of KIP-511: Collect and Expose Client's Name and Version in the Brokers #7381 Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, David Arthur <mumrah@gmail.com>, Colin P. McCabe <cmccabe@apache.org>	2019-10-04 09:19:18 -07:00
Adam Bellemare	c87fe9402c	KAFKA-3705 Added a foreignKeyJoin implementation for KTable. (#5527 ) https://issues.apache.org/jira/browse/KAFKA-3705 Allows for a KTable to map its value to a given foreign key and join on another KTable keyed on that foreign key. Applies the joiner, then returns the tuples keyed on the original key. This supports updates from both sides of the join. Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>, John Roesler <john@confluent.io>, Boyang Chen <boyang@confluent.io>, Christopher Pettitt <cpettitt@confluent.io>, Bill Bejeck <bbejeck@gmail.com>, Jan Filipiak <Jan.Filipiak@trivago.com>, pgwhalen, Alexei Daniline	2019-10-03 18:59:31 -04:00
Andy Coates	c62dd1e92e	MINOR: Make TopicDescription's other constructor public (#7405 ) Reviewers: Colin P. McCabe <cmccabe@apache.org>	2019-10-03 10:11:28 -07:00
Ismael Juma	f99bb0466e	MINOR: Start correlation id at 0 in SaslClientAuthenticator (#7432 ) I tried to add a test for this, but it's actually pretty hard to verify what we want to verify. I could add a test that checks the correlation field after the connection has been established, but it would not catch this kind of bug where the issue is not the value we store, but the value we create the request header with. I have another PR that avoids intermediate structs during serialization/deserialization, which has a test that fails without this change. So we'll get coverage that way. Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>	2019-10-02 21:02:33 -07:00
Jeff Huang	157aadbbf3	MINOR: Add provider name of SSLContext to debug log. (#7407 )	2019-10-02 15:57:52 +01:00


1703 Commits