Commit Graph

413 Commits

Author SHA1 Message Date
Colin Patrick McCabe 5eac5a822f
KAFKA-12276: Add the quorum controller code (#10070)
The quorum controller stores metadata in the KIP-500 metadata log, not in Apache
ZooKeeper. Each controller node is a voter in the metadata quorum. The leader of the
quorum is the active controller, which processes write requests. The followers are standby
controllers, which replay the operations written to the log. If the active controller goes away,
a standby controller can take its place.

Like the ZooKeeper-based controller, the quorum controller is based on an event queue
backed by a single-threaded executor. However, unlike the ZK-based controller, the quorum
controller can have multiple operations in flight-- it does not need to wait for one operation
to be finished before starting another. Therefore, calls into the QuorumController return
CompleteableFuture objects which are completed with either a result or an error when the
operation is done. The QuorumController will also time out operations that have been
sitting on the queue too long without being processed. In this case, the future is completed
with a TimeoutException.

The controller uses timeline data structures to store multiple "versions" of its in-memory 
state simultaneously. "Read operations" read only committed state, which is slightly older
than the most up-to-date in-memory state. "Write operations" read and write the latest
in-memory state. However, we can not return a successful result for a write operation until
its state has been committed to the log. Therefore, if a client receives an RPC response, it
knows that the requested operation has been performed, and can not be undone by a
controller failover.

Reviewers: Jun Rao <junrao@gmail.com>, Ron Dagostino <rdagostino@confluent.io>
2021-02-19 18:03:23 -08:00
Colin P. Mccabe 690f72dd69 KAFKA-12334: Add the KIP-500 metadata shell
The Kafka Metadata shell is a new command which allows users to
interactively examine the metadata stored in a KIP-500 cluster.
It can examine snapshot files that are specified via --snapshot.

The metadata tool works by replaying the log and storing the state into
in-memory nodes.  These nodes are presented in a fashion similar to
filesystem directories.

Reviewers: Jason Gustafson <jason@confluent.io>, David Arthur <mumrah@gmail.com>, Igor Soarez <soarez@apple.com>
2021-02-19 15:46:34 -08:00
Jason Gustafson 698319b8e2 KAFKA-12278; Ensure exposed api versions are consistent within listener (#10666)
Previously all APIs were accessible on every listener exposed by the broker, but
with KIP-500, that is no longer true.  We now have more complex requirements for
API accessibility.

For example, the KIP-500 controller exposes some APIs which are not exposed by
brokers, such as BrokerHeartbeatRequest, and does not expose most client APIs,
such as JoinGroupRequest, etc.  Similarly, the KIP-500 broker does not implement
some APIs that the ZK-based broker does, such as LeaderAndIsrRequest and
UpdateFeaturesRequest.

All of this means that we need more sophistication in how we expose APIs and
keep them consistent with the ApiVersions API. Up until now, we have been
working around this using the controllerOnly flag inside ApiKeys, but this is
not rich enough to support all of the cases listed above.  This PR introduces a
new "listeners" field to the request schema definitions.  This field is an array
of strings which indicate the listener types in which the API should be exposed.
We currently support "zkBroker", "broker", and "controller".  ("broker"
indicates the KIP-500 broker, whereas zkBroker indicates the old broker).

This PR also creates ApiVersionManager to encapsulate the creation of the
ApiVersionsResponse based on the listener type.  Additionally, it modifies
SocketServer to check the listener type of received requests before forwarding
them to the request handler.

Finally, this PR also fixes a bug in the handling of the ApiVersionsResponse
prior to authentication. Previously a static response was sent, which means that
changes to features would not get reflected. This also meant that the logic to
ensure that only the intersection of version ranges supported by the controller
would get exposed did not work. I think this is important because some clients
rely on the initial pre-authenticated ApiVersions response rather than doing a
second round after authentication as the Java client does.

One final cleanup note: I have removed the expectation that envelope requests
are only allowed on "privileged" listeners.  This made sense initially because
we expected to use forwarding before the KIP-500 controller was available. That
is not the case anymore and we expect the Envelope API to only be exposed on the
controller listener. I have nevertheless preserved the existing workarounds to
allow verification of the forwarding behavior in integration testing.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>
2021-02-18 16:25:51 -08:00
Ron Dagostino a30f92bf59
MINOR: Add KIP-500 BrokerServer and ControllerServer (#10113)
This PR adds the KIP-500 BrokerServer and ControllerServer classes and 
makes some related changes to get them working.  Note that the ControllerServer 
does not instantiate a QuorumController object yet, since that will be added in
PR #10070.

* Add BrokerServer and ControllerServer

* Change ApiVersions#computeMaxUsableProduceMagic so that it can handle
endpoints which do not support PRODUCE (such as KIP-500 controller nodes)

* KafkaAdminClientTest: fix some lingering references to decommissionBroker
that should be references to unregisterBroker.

* Make some changes to allow SocketServer to be used by ControllerServer as
we as by the broker.

* We now return a random active Broker ID as the Controller ID in
MetadataResponse for the Raft-based case as per KIP-590.

* Add the RaftControllerNodeProvider

* Add EnvelopeUtils

* Add MetaLogRaftShim

* In ducktape, in config_property.py: use a KIP-500 compatible cluster ID.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, David Arthur <mumrah@gmail.com>
2021-02-17 21:35:13 -08:00
Ismael Juma 744d05b128
KAFKA-12327: Remove MethodHandle usage in CompressionType (#10123)
We don't really need it and it causes problems in older Android versions
and GraalVM native image usage (there are workarounds for the latter).

Move the logic to separate classes that are only invoked when the
relevant compression library is actually used. Place such classes
in their own package and enforce via checkstyle that only these
classes refer to compression library packages.

To avoid cyclic dependencies, moved `BufferSupplier` to the `utils`
package.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2021-02-14 08:12:25 -08:00
Colin Patrick McCabe bf5e1f1cc0
MINOR: add the MetaLogListener, LocalLogManager, and Controller interface. (#10106)
Add MetaLogListener, LocalLogManager, and related classes. These
classes are used by the KIP-500 controller and broker to interface with the
Raft log.

Also add the Controller interface. The implementation will be added in a separate PR.

Reviewers: Ron Dagostino <rdagostino@confluent.io>, David Arthur <mumrah@gmail.com>
2021-02-11 08:42:59 -08:00
David Arthur e7e4252b0f
JUnit extensions for integration tests (#9986)
Adds JUnit 5 extension for running the same test with different types of clusters. 
See core/src/test/java/kafka/test/junit/README.md for details
2021-02-09 11:49:33 -05:00
Colin Patrick McCabe d98df7fc4d
MINOR: Add KafkaEventQueue (#10030)
Add KafkaEventQueue, which is used by the KIP-500 controller to manage its event queue.
Compared to using an Executor, KafkaEventQueue has the following advantages:

* Events can be given "deadlines." If an event lingers in the queue beyond the deadline, it
will be completed with a timeout exception. This is useful for implementing timeouts for
controller RPCs.

* Events can be prepended to the queue as well as appended.

* Events can be given tags to make them easier to manage. This is especially useful for
rescheduling or cancelling events which were previously scheduled to execute in the future.

Reviewers: Jun Rao <junrao@gmail.com>, José Armando García Sancio <jsancio@gmail.com>
2021-02-04 14:46:57 -08:00
Jason Gustafson f58c2acf26
KAFKA-12250; Add metadata record serde for KIP-631 (#9998)
This patch adds a `RecordSerde` implementation for the metadata record format expected by KIP-631. 

Reviewers: Colin McCabe <cmccabe@apache.org>, Ismael Juma <mlists@juma.me.uk>
2021-02-03 16:16:35 -08:00
Colin Patrick McCabe 772f2cfc82
MINOR: Replace BrokerStates.scala with BrokerState.java (#10028)
Replace BrokerStates.scala with BrokerState.java, to make it easier to use from Java code if needed.  This also makes it easier to go from a numeric type to an enum.

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2021-02-03 13:41:38 -08:00
Colin Patrick McCabe 1711cfa4eb
KAFKA-12209: Add the timeline data structures for the KIP-631 controller (#9901)
Reviewers: Jun Rao <junrao@gmail.com>
2021-02-02 11:33:55 -08:00
John Roesler 4d28391480
KAFKA-10867: Improved task idling (#9840)
Use the new ConsumerRecords.metadata() API to implement
improved task idling as described in KIP-695

Reviewers: Guozhang Wang <guozhang@apache.org>
2021-01-27 21:57:20 -06:00
John Roesler fdcf8fbf72
KAFKA-10866: Add metadata to ConsumerRecords (#9836)
Expose fetched metadata via the ConsumerRecords
object as described in KIP-695.

Reviewers: Guozhang Wang <guozhang@apache.org>
2021-01-27 18:18:38 -06:00
Ismael Juma 24a2ed26a6
MINOR: Update zstd-jni to 1.4.8-2 (#9957)
* The latest version zstd-jni doesn't use `RecyclingBufferPool` by default, so we
pass it via the relevant constructors to maintain the behavior before this
change.
* zstd-jni fixes an issue when using Alpine, see https://github.com/luben/zstd-jni/issues/157.
* zstd 1.4.7 includes several months of improvements across many axis,
from performance to various fixes. Details: https://github.com/facebook/zstd/releases/tag/v1.4.7
* zstd 1.4.8 is a hotfix release, details: https://github.com/facebook/zstd/releases/tag/v1.4.8

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2021-01-24 20:20:52 -08:00
Colin Patrick McCabe 217334b0f4
KAFKA-12183: Add the KIP-631 metadata record definitions (#9876)
Add the metadata gradle module, which will contain the metadata record
definitions, and other metadata-related broker-side code.

Add MetadataParser, MetadataParseException, etc.

Reviewers: José Armando García Sancio <jsancio@gmail.com>, Ismael Juma <ismael@juma.me.uk>, David Arthur <mumrah@gmail.com>
2021-01-14 09:58:52 -08:00
Ning Zhang 2cde6f61b8
KAFKA-10304: Refactor MM2 integration tests (#9224)
Co-authored-by: Ning Zhang <nzhang1220@fb.com>
Reviewers: Mickael Maison <mickael.maison@gmail.com>
2021-01-14 14:48:17 +00:00
Ismael Juma 52b8aa0fdc
KAFKA-7340: Migrate clients module to JUnit 5 (#9874)
* Use the packages/classes from JUnit 5
* Move description in `assert` methods to last parameter
* Convert parameterized tests so that they work with JUnit 5
* Remove `hamcrest`, it didn't seem to add much value
* Fix `Utils.mkEntry` to have correct `equals` implementation
* Add a missing `@Test` annotation in `SslSelectorTest` override
* Adjust regex in `SaslAuthenticatorTest` due to small change in the
assert failure string in JUnit 5

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>
2021-01-13 16:17:45 -08:00
Colin P. Mccabe 1e4d335245 KAFKA-12180: Implement the KIP-631 message generator changes
* Implement the uint16 type
* Implement MetadataRecordType and MetadataJsonConverters

Reviewers: Jason Gustafson <jason@confluent.io>
2021-01-12 12:43:59 -08:00
Jason Gustafson eb9fe411bb
KAFKA-10842; Use `InterBrokerSendThread` for raft's outbound network channel (#9732)
This patch contains the following improvements:

- Separate inbound/outbound request flows so that we can open the door for concurrent inbound request handling
- Rewrite `KafkaNetworkChannel` to use `InterBrokerSendThread` which fixes a number of bugs/shortcomings
- Get rid of a lot of boilerplate conversions in `KafkaNetworkChannel` 
- Improve validation of inbound responses in `KafkaRaftClient` by checking correlationId. This fixes a bug which could cause an out of order Fetch to be applied incorrectly.

Reviewers: David Arthur <mumrah@gmail.com>
2020-12-21 18:15:15 -08:00
Justine Olshan 1dd1e7f945
KAFKA-10545: Create topic IDs and propagate to brokers (#9626)
This change propagates topic ids to brokers in LeaderAndIsr Request. It also removes the topic name from the LeaderAndIsr Response, reorganizes the response to be sorted by topic, and includes the topic ID.

In addition, the topic ID is persisted to each replica in Log as well as in a file on disk. This file is read on startup and if the topic ID exists, it will be reloaded.

Reviewers: David Jacot <djacot@confluent.io>, dengziming <dengziming1993@gmail.com>, Nikhil Bhatia <rite2nikhil@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
2020-12-18 22:19:50 +00:00
Scott Hendricks baef516789
Add ConfigurableProducerSpec to Trogdor for improved E2E latency tracking. (#9736)
Reviewer: Colin P. McCabe <cmccabe@apache.org>
2020-12-18 13:03:59 -08:00
Cheng Tan ae3a6ed990
KAKFA-10619: Idempotent producer will get authorized once it has a WRITE access to at least one topic (KIP-679) (#9485)
Includes:
- New API to authorize by resource type
- Default implementation for the method that supports super users and ACLs
- Optimized implementation in AclAuthorizer that supports ACLs, super users and allow.everyone.if.no.acl.found
- Benchmarks and tests
- InitProducerIdRequest authorized for Cluster:IdempotentWrite or WRITE to any topic, ProduceRequest authorized only for topic even if idempotent

Reviewers: Lucas Bradstreet <lucas@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
2020-12-18 18:08:46 +00:00
José Armando García Sancio ab0807dd85
KAFKA-10394: Add classes to read and write snapshot for KIP-630 (#9512)
This PR adds support for generating snapshot for KIP-630.

1. Adds the interfaces `RawSnapshotWriter` and `RawSnapshotReader` and the implementations `FileRawSnapshotWriter` and `FileRawSnapshotReader` respectively. These interfaces and implementations are low level API for writing and reading snapshots. They are internal to the Raft implementation and are not exposed to the users of `RaftClient`. They operation at the `Record` level. These types are exposed to the `RaftClient` through the `ReplicatedLog` interface.

2. Adds a buffered snapshot writer: `SnapshotWriter<T>`. This type is a higher-level type and it is exposed through the `RaftClient` interface. A future PR will add the related `SnapshotReader<T>`, which will be used by the state machine to load a snapshot.

Reviewers: Jason Gustafson <jason@confluent.io>
2020-12-07 14:06:25 -08:00
Chia-Ping Tsai 6bbf69fb00
KAFKA-10497 Convert group coordinator metadata schemas to use generat… (#9318)
Reviewers: David Jacot <djacot@confluent.io>
2020-11-18 14:49:04 +08:00
A. Sophie Blee-Goldman e71cb7ab11
KAFKA-10689: fix windowed FKJ topology and put checks in assignor to avoid infinite loops (#9568)
Fix infinite loop in assignor when trying to resolve the number of partitions in a topology with a windowed FKJ. Also adds a check to this loop to break out and fail the application if we detect that we are/will be stuck in an infinite loop

Reviewers: Matthias Sax <matthias@confluent.io>
2020-11-17 16:57:53 -08:00
Boyang Chen 0814e4f645
KAFKA-10181: Use Envelope RPC to do redirection for (Incremental)AlterConfig, AlterClientQuota and CreateTopics (#9103)
This PR adds support for forwarding of the following RPCs:

AlterConfigs
IncrementalAlterConfigs
AlterClientQuotas
CreateTopics

Co-authored-by: Jason Gustafson <jason@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
2020-11-04 14:21:44 -08:00
Matthias J. Sax e8ad80ebe1
MINOR: remove explicit passing of AdminClient into StreamsPartitionAssignor (#9384)
Currently, we pass multiple object reference (AdminClient,TaskManager, and a few more) into StreamsPartitionAssignor. Furthermore, we (miss)use TaskManager#mainConsumer() to get access to the main consumer (we need to do this, to avoid a cyclic dependency).

This PR unifies how object references are passed into a single ReferenceContainer class to
 - not "miss use" the TaskManager as reference container
 - unify how object references are passes

Note: we need to use a reference container to avoid cyclic dependencies, instead of using a config for each passed reference individually.

Reviewers: John Roesler <john@confluent.io>
2020-10-15 16:10:27 -07:00
John Roesler 27b0e35e7a
KAFKA-10437: Implement new PAPI support for test-utils (#9396)
Implements KIP-478 for the test-utils module:
* adds mocks of the new ProcessorContext and StateStoreContext
* adds tests that all stores and store builders are usable with the new mock
* adds tests that the new Processor api is usable with the new mock
* updates the demonstration Processor to the new api

Reviewers: Guozhang Wang <guozhang@apache.org>
2020-10-13 11:15:22 -05:00
Lee Dongjin 8d4bbf22ad
MINOR: trivial cleanups, javadoc errors, omitted StateStore tests, etc. (#8130)
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2020-10-07 19:08:31 -07:00
Rajini Sivaram 7be8bd8cbf
KAFKA-10338; Support PEM format for SSL key and trust stores (KIP-651) (#9345)
Adds support for SSL key and trust stores to be specified in PEM format either as files or directly as configuration values.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2020-10-06 19:13:43 +01:00
Guozhang Wang 53a35c1de3
MINOR: Refactor unit tests around RocksDBConfigSetter (#9358)
* Extract the mock RocksDBConfigSetter into a separate class.
* De-dup unit tests covering RocksDBConfigSetter.

Reviewers: Boyang Chen <boyang@confluent.io>
2020-10-06 09:09:54 -07:00
John Roesler 69790a1463
KAFKA-10535: Split ProcessorContext into Processor/StateStore/Record Contexts (#9361)
Migrate different components of the old ProcessorContext interface
into separate interfaces that are more appropriate for their usages.
See KIP-478 for the details.

Reviewers: Guozhang Wang <guozhang@apache.org>, Paul Whalen <pgwhalen@gmail.com>
2020-10-02 18:49:12 -05:00
leah 95986a8f48
MINOR: Fix KStreamKTableJoinTest and StreamTaskTest (#9357)
Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2020-09-30 20:07:23 -07:00
Jason Gustafson b7c8490cf4
KAFKA-10492; Core Kafka Raft Implementation (KIP-595) (#9130)
This is the core Raft implementation specified by KIP-595: https://cwiki.apache.org/confluence/display/KAFKA/KIP-595%3A+A+Raft+Protocol+for+the+Metadata+Quorum. We have created a separate "raft" module where most of the logic resides. The new APIs introduced in this patch in order to support Raft election and such are disabled in the server until the integration with the controller is complete. Until then, there is a standalone server which can be used for testing the performance of the Raft implementation. See `raft/README.md` for details.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Boyang Chen <boyang@confluent.io>

Co-authored-by: Boyang Chen <boyang@confluent.io>
Co-authored-by: Guozhang Wang <wangguoz@gmail.com>
2020-09-22 11:32:44 -07:00
leah 2194ccba5b
Adding reverse iterator usage for sliding windows processing (extending KIP-450) (#9239)
Add a backwardFetch call to the window store for sliding window
processing. While the implementation works with the forward call
to the window store, using backwardFetch allows for the iterator
to be closed earlier, making implementation more efficient.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, John Roesler <vvcephei@apache.org>
2020-09-11 16:38:17 -05:00
leah dd2b9eca5d
KAFKA-5636: Improve handling of "early" records in sliding windows (#9157)
Update for KIP-450 to handle "early" records.

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2020-09-09 12:02:19 -07:00
Colin Patrick McCabe b6ba67482f
KAFKA-10384: Separate converters from generated messages (#9194)
For the generated message code, put the JSON conversion functionality
in a separate JsonConverter class.

Make MessageDataGenerator simply another generator class, alongside the
new JsonConverterGenerator class.  Move some of the utility functions
from MessageDataGenerator into FieldSpec and other places, so that they
can be used by other generator classes.

Use argparse4j to support a better command-line for the generator.

Reviewers: David Arthur <mumrah@gmail.com>
2020-08-26 15:10:09 -07:00
David Arthur 1a9697430a
KAFKA-8806 Reduce calls to validateOffsetsIfNeeded (#7222)
Only check if positions need validation if there is new metadata. 

Also fix some inefficient java.util.stream code in the hot path of SubscriptionState.
2020-08-21 10:25:52 -04:00
Jason Gustafson 3a189ad868
KAFKA-10386; Fix flexible version support for `records` type (#9163)
This patch fixes the generated serde logic for the 'records' type so that it uses the compact byte array representation consistently when flexible versions are enabled.

Reviewers: David Arthur <mumrah@gmail.com>
2020-08-13 09:52:23 -07:00
Matthias J. Sax b351493543
KAFKA-9274: Remove `retries` for global task (#9047)
- part of KIP-572
 - removed the usage of `retries` in `GlobalStateManger`
 - instead of retries the new `task.timeout.ms` config is used

Reviewers: John Roesler <john@confluent.io>, Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2020-08-05 14:14:18 -07:00
John Roesler 26a217c8e7
MINOR: Streams integration tests should not call exit (#9067)
- replace System.exit with Exit.exit in all relevant classes
- forbid use of System.exit in all relevant classes and add exceptions for others

Co-authored-by: John Roesler <vvcephei@apache.org>
Co-authored-by: Matthias J. Sax <matthias@confluent.io>

Reviewers: Lucas Bradstreet <lucas@confluent.io>, Ismael Juma <ismael@confluent.io>
2020-08-05 13:52:50 -07:00
David Arthur 4cd2396db3
KAFKA-9629 Use generated protocol for Fetch API (#9008)
Refactored FetchRequest and FetchResponse to use the generated message classes for serialization and deserialization. This allows us to bypass unnecessary Struct conversion in a few places. A new "records" type was added to the message protocol which uses BaseRecords as the field type. When sending, we can set a FileRecords instance on the message, and when receiving the message class will use MemoryRecords. 

Also included a few JMH benchmarks which indicate a small performance improvement for requests with high partition counts or small record sizes.

Reviewers: Jason Gustafson <jason@confluent.io>, Boyang Chen <boyang@confluent.io>, David Jacot <djacot@confluent.io>, Lucas Bradstreet <lucas@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Colin P. McCabe <cmccabe@apache.org>
2020-07-30 13:29:39 -04:00
Mickael Maison caa806cd82
KAFKA-10232: MirrorMaker2 internal topics Formatters KIP-597 (#8604)
This PR includes 3 MessageFormatters for MirrorMaker2 internal topics:
- HeartbeatFormatter
- CheckpointFormatter
- OffsetSyncFormatter

This also introduces a new public interface org.apache.kafka.common.MessageFormatter that users can implement to build custom formatters.

Reviewers: Konstantine Karantasis <k.karantasis@gmail.com>, Ryanne Dolan <ryannedolan@gmail.com>, David Jacot <djacot@confluent.io>

Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
Co-authored-by: Edoardo Comar <ecomar@uk.ibm.com>
2020-07-03 10:41:45 +01:00
A. Sophie Blee-Goldman 42aa0f38b9
MINOR: clean up unused checkstyle suppressions for Streams (#8861)
Reviewers: John Roesler <john@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2020-06-17 17:04:43 -07:00
A. Sophie Blee-Goldman 03ed08d0d1
KAFKA-10144: clean up corrupted standby tasks before attempting a commit (#8849)
We need to make sure that corrupted standby tasks are actually cleaned up upon a TaskCorruptedException. However due to the commit prior to invoking handleCorruption, it's possible to throw a TaskMigratedException before actually cleaning up any of the corrupted tasks.

This is fine for active tasks since handleLostAll will finish up the job, but it does nothing with standby tasks. We should make sure that standby tasks are handled before attempting to commit (which we can do, since we don't need to commit anything for the corrupted standbys)

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-06-12 16:21:57 -07:00
Adam Bellemare bcf45b09d3
KAFKA-10049: Fixed FKJ bug where wrapped serdes are set incorrectly when using default StreamsConfig serdes (#8764)
Bug Details:
Mistakenly setting the value serde to the key serde for an internal wrapped serde in the FKJ workflow.

Testing:
Modified the existing test to reproduce the issue, then verified that the test passes.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, John Roesler <vvcephei@apache.org>
2020-06-12 10:00:38 -05:00
Kowshik Prakasam 4f96c5b424
KAFKA-10027: Implement read path for feature versioning system (KIP-584) (#8680)
In this PR, I have implemented various classes and integration for the read path of the feature versioning system (KIP-584). The ultimate plan is that the cluster-wide finalized features information is going to be stored in ZK under the node /feature. The read path implemented in this PR is centered around reading this finalized features information from ZK, and, processing it inside the Broker.

Here is a summary of what's in this PR (a lot of it is new classes):

A facility is provided in the broker to declare its supported features, and advertise its supported features via its own BrokerIdZNode under a features key.
A facility is provided in the broker to listen to and propagate cluster-wide finalized feature changes from ZK.
When new finalized features are read from ZK, feature incompatibilities are detected by comparing against the broker's own supported features.
ApiVersionsResponse is now served containing supported and finalized feature information (using the newly added tagged fields).

Reviewers: Boyang Chen <boyang@confluent.io>, Jun Rao <junrao@gmail.com>
2020-06-11 11:28:57 -07:00
Konstantine Karantasis 09b22e7e67
KAFKA-9848: Avoid triggering scheduled rebalance delay when task assignment fails but Connect workers remain in the group (#8805)
In the first version of the incremental cooperative protocol, in the presence of a failed sync request by the leader, the assignor was designed to treat the unapplied assignments as lost and trigger a rebalance delay. 

This commit applies optimizations in these cases to avoid the unnecessary activation of the rebalancing delay. First, if the worker that loses the sync group request or response is the leader, then it detects this failure by checking the what is the expected generation when it performs task assignments. If it's not the expected one, it resets its view of the previous assignment because it wasn't successfully applied and it doesn't represent a correct state. Furthermore, if the worker that has missed the assignment sync is an ordinary worker, then the leader is able to detect that there are lost assignments and instead of triggering a rebalance delay among the same members of the group, it treats the lost tasks as new tasks and reassigns them immediately. If the lost assignment included revocations that were not applied, the leader reapplies these revocations again. 

Existing unit tests and integration tests are adapted to test the proposed optimizations. 

Reviewers: Randall Hauch <rhauch@gmail.com>
2020-06-09 09:41:11 -07:00
Tom Bentley 78e8a49cda
KAFKA-9434: automated protocol for alterReplicaLogDirs (#8311)
Reviewers: David Jacot <djacot@confluent.io>, Mickael Maison <mickael.maison@gmail.com>
2020-06-04 15:36:37 +01:00
A. Sophie Blee-Goldman c6633a157e
KAFKA-9987: optimize sticky assignment algorithm for same-subscription case (#8668)
Motivation and pseudo code algorithm in the ticket.

Added a scale test with large number of topic partitions and consumers and 30s timeout.
With these changes, assignment with 2,000 consumers and 200 topics with 2,000 each completes within a few seconds.

Porting the same test to trunk, it took 2 minutes even with a 100x reduction in the number of topics (ie, 2 minutes for 2,000 consumers and 2 topics with 2,000 partitions)

Should be cherry-picked to 2.6, 2.5, and 2.4

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2020-06-01 15:57:15 -07:00
Mickael Maison fe948d39e5
KAFKA-9130; KIP-518 Allow listing consumer groups per state (#8238)
Implementation of KIP-518: https://cwiki.apache.org/confluence/display/KAFKA/KIP-518%3A+Allow+listing+consumer+groups+per+state. 

Reviewers: David Jacot <djacot@confluent.io>, Jason Gustafson <jason@confluent.io>

Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
Co-authored-by: Edoardo Comar <ecomar@uk.ibm.com>
2020-05-29 11:25:20 -07:00
A. Sophie Blee-Goldman 9d52deca24
KAFKA-9501: convert between active and standby without closing stores (#8248)
This PR has gone through several significant transitions of its own, but here's the latest:

* TaskManager just collects the tasks to transition and refers to the active/standby task creator to handle closing & recycling the old task and creating the new one. If we ever hit an exception during the close, we bail and close all the remaining tasks as dirty.

* The task creators tell the task to "close but recycle state". If this is successful, it tells the recycled processor context and state manager that they should transition to the new type.

* During "close and recycle" the task just does a normal clean close, but instead of closing the state manager it informs it to recycle itself: maintain all of its store information (most importantly the current store offsets) but unregister the changelogs from the changelog reader

* The new task will (re-)register its changelogs during initialization, but skip re-registering any stores. It will still read the checkpoint file, but only use the written offsets if the store offsets are not already initialized from pre-transition

* To ensure we don't end up with manual compaction disabled for standbys, we have to call the state restore listener's onRestoreEnd for any active restoring stores that are switching to standbys

Reviewers: John Roesler <vvcephei@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2020-05-29 10:48:03 -07:00
Aakash Shah 38c1e96d2c
KAFKA-9971: Error Reporting in Sink Connectors (KIP-610) (#8720)
Implementation for KIP-610: https://cwiki.apache.org/confluence/display/KAFKA/KIP-610%3A+Error+Reporting+in+Sink+Connectors based on which sink connectors can now report errors at the final stages of the stream that exports records to the sink system.
 
This PR adds the `ErrantRecordReporter` interface as well as its implementation - `WorkerErrantRecordReporter`. The `WorkerErrantRecordReporter` is created in `Worker` and brought up through `WorkerSinkTask` to `WorkerSinkTaskContext`. 

An integration test and unit tests have been added.

Reviewers: Lev Zemlyanov <lev@confluent.io>, Greg Harris <gregh@confluent.io>, Chris Egerton <chrise@confluent.io>, Randall Hauch <rhauch@gmail.com>, Konstantine Karantasis <konstantine@confluent.io>
2020-05-27 23:49:57 -07:00
xiaodongdu 9c833f665f
KAFKA-9960: implement KIP-606 to add metadata context to MetricsReporter (#8691)
Implemented KIP-606 to add metadata context to MetricsReporter.

Author: Xiaodong Du <xdu@confluent.io>
Reviewers: David Arthur <mumrah@gmail.com>, Randall Hauch <rhauch@gmail.com>, Xavier Léauté <xavier@confluent.io>, Ryan Pridgeon <ryan.n.pridgeon@gmail.com>
2020-05-27 20:18:36 -05:00
A. Sophie Blee-Goldman 83c616f706
KAFKA-9983: KIP-613: add INFO level e2e latency metrics (#8697)
Add e2e latency metrics at the beginning and end of task topologies
as INFO-level processor-node-level metrics.

Implements: KIP-613
Reviewers: John Roesler <vvcephei@apache.org>, Andrew Choi <a24choi@edu.uwaterloo.ca>, Bruno Cadonna <cadonna@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2020-05-27 15:55:29 -05:00
Konstantine Karantasis 371f14c3c1
KAFKA-5295: Allow source connectors to specify topic-specific settings for new topics (KIP-158) (#8722)
Kafka Connect workers have been able to create Connect's internal topics using the new admin client for some time now (see KAFKA-4667 for details). However, tasks of source connectors are still relying upon the broker to auto-create topics with default config settings if they don't exist, or expect these topics to exist before the connector is deployed, if their configuration needs to be specialized. 

With the implementation of KIP-158 here, if `topic.creation.enable=true`, Kafka Connect will supply the source tasks of connectors that are configured to create topics with an admin client that will allow them to create new topics on-the-fly before writing the first source records to a new topic. Additionally, each source connector has the opportunity to customize the topic-specific settings of these new topics by defining groups of topic configurations. 

This feature is tested here via unit tests (old tests that have been adjusted and new ones) as well as integration tests.

Reviewers: Randall Hauch <rhauch@gmail.com>
2020-05-26 22:07:34 -07:00
Jeff Huang 2988eac082
KAFKA-9944: Added supporting customized HTTP response headers for Kafka Connect. (#8620)
Added support for customizing the HTTP response headers for Kafka Connect as described in KIP-577.

Author: Jeff Huang <jeff.huang@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
2020-05-24 08:56:27 -05:00
Randall Hauch 981ef5166d
KAFKA-9931: Implement KIP-605 to expand support for Connect worker internal topic configurations (#8654)
Added support for -1 replication factor and partitions for distributed worker internal topics by expanding the allowed values for the internal topics’ replication factor and partitions from positive values to also include -1 to signify that the broker defaults should be used.

The Kafka storage classes were already constructing a `NewTopic` object (always with a replication factor and partitions) and sending it to Kafka when required. This change will avoid setting the replication factor and/or number of partitions on this `NewTopic` if the worker configuration uses -1 for the corresponding configuration value.

Also added support for extra settings for internal topics on distributed config, status, and offset internal topics.

Quite a few new tests were added to verify that the `TopicAdmin` utility class is correctly using the AdminClient, and that the `DistributedConfig` validators for these configurations are correct. Also added integration tests for internal topic creation, covering preexisting functionality plus the new functionality.

Author: Randall Hauch <rhauch@gmail.com>
Reviewer: Konstantine Karantasis <konstantine@confluent.io>
2020-05-23 09:00:32 -05:00
Matthias J. Sax 1daa8f638b
KAFKA-9748: Add Streams eos-beta integration test (#8496)
Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2020-05-04 22:45:54 -07:00
A. Sophie Blee-Goldman b5de449377
KAFKA-9127: don't create StreamThreads for global-only topology (#8540)
Reviewers: Matthias J. Sax <matthias@confluent.io>, John Roesler <vvcephei@apache.org>
2020-04-28 22:34:17 -05:00
Colin Patrick McCabe bf6dffe93b
KAFKA-9309: Add the ability to translate Message classes to and from JSON (#7844)
Reviewers: David Arthur <mumrah@gmail.com>, Ron Dagostino <rdagostino@confluent.io>
2020-04-09 13:11:36 -07:00
Chia-Ping Tsai 833dc7725c
HOTFIX: exclude ConsumerCoordinator from NPathComplexity check (#8447)
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
2020-04-08 13:00:03 +01:00
Lucas Bradstreet 46540eb5e0
KAFKA-9820: validateMessagesAndAssignOffsetsCompressed allocates unused iterator (#8422)
3e9d1c1411 introduced skipKeyValueIterator(s) which were intended to be used, but in this case were created but were not used in offset validation.

A subset of the benchmark results follow. Looks like a 20% improvement in validation performance and a 40% reduction in garbage allocation for 1-2 batch sizes.

**# Parameters: (bufferSupplierStr = NO_CACHING, bytes = RANDOM, compressionType = LZ4, maxBatchSize = 1, messageSize = 1000, messageVersion = 2)**

Before:
Result "org.apache.kafka.jmh.record.RecordBatchIterationBenchmark.measureValidation":
  64851.837 ±(99.9%) 944.248 ops/s [Average]              
  (min, avg, max) = (64505.317, 64851.837, 65114.359), stdev = 245.218
  CI (99.9%): [63907.589, 65796.084] (assumes normal distribution)                                       
                                                             
"org.apache.kafka.jmh.record.RecordBatchIterationBenchmark.measureValidation:·gc.alloc.rate.norm":
  164088.003 ±(99.9%) 0.004 B/op [Average]                                                                                 
  (min, avg, max) = (164088.001, 164088.003, 164088.004), stdev = 0.001
  CI (99.9%): [164087.998, 164088.007] (assumes normal distribution)

After:

Result "org.apache.kafka.jmh.record.RecordBatchIterationBenchmark.measureValidation":                                      
  78910.273 ±(99.9%) 707.024 ops/s [Average]                                                                               
  (min, avg, max) = (78785.486, 78910.273, 79234.007), stdev = 183.612                                                     
  CI (99.9%): [78203.249, 79617.297] (assumes normal distribution)                                       

"org.apache.kafka.jmh.record.RecordBatchIterationBenchmark.measureValidation:·gc.alloc.rate.norm":                                                                                                                                   
  96440.002 ±(99.9%) 0.001 B/op [Average]                                                                                  
  (min, avg, max) = (96440.002, 96440.002, 96440.002), stdev = 0.001                                                       
  CI (99.9%): [96440.002, 96440.003] (assumes normal distribution)   

 **# Parameters: (bufferSupplierStr = NO_CACHING, bytes = RANDOM, compressionType = LZ4, maxBatchSize = 2, messageSize = 1000, messageVersion = 2)**

Before:
Result "org.apache.kafka.jmh.record.RecordBatchIterationBenchmark.measureValidation":                                      
  64815.364 ±(99.9%) 639.309 ops/s [Average]                                                                               
  (min, avg, max) = (64594.545, 64815.364, 64983.305), stdev = 166.026                                                                                                                                                                                
  CI (99.9%): [64176.056, 65454.673] (assumes normal distribution)                                                         
                                                                                                                                                                                        "org.apache.kafka.jmh.record.RecordBatchIterationBenchmark.measureValidation:·gc.alloc.rate.norm":        
  163944.003 ±(99.9%) 0.001 B/op [Average]                                                                                 
  (min, avg, max) = (163944.002, 163944.003, 163944.003), stdev = 0.001                                                    
  CI (99.9%): [163944.002, 163944.004] (assumes normal distribution)                                     

After:
Result "org.apache.kafka.jmh.record.RecordBatchIterationBenchmark.measureValidation":
  77075.096 ±(99.9%) 201.092 ops/s [Average]              
  (min, avg, max) = (77021.537, 77075.096, 77129.693), stdev = 52.223
  CI (99.9%): [76874.003, 77276.188] (assumes normal distribution)                                       
                                                             
"org.apache.kafka.jmh.record.RecordBatchIterationBenchmark.measureValidation:·gc.alloc.rate.norm":
  96504.002 ±(99.9%) 0.003 B/op [Average]                                                                                  
  (min, avg, max) = (96504.001, 96504.002, 96504.003), stdev = 0.001
  CI (99.9%): [96503.999, 96504.005] (assumes normal distribution)

Reviewers: Chia-Ping Tsai <chia7712@gmail.com>, Ismael Juma <ismael@juma.me.uk>
2020-04-04 10:05:51 -07:00
A. Sophie Blee-Goldman 6e0d553350
MINOR: clean up Streams assignment classes and tests (#8406)
First set of cleanup pushed to followup PR after KIP-441 Pt. 5. Main changes are:

1. Moved `RankedClient` and the static `buildClientRankingsByTask` to a new file
2. Moved `Movement` and the static `getMovements` to a new file (also renamed to `TaskMovement`)
3. Consolidated the many common variables throughout the assignment tests to the new `AssignmentTestUtils` 
4. New utility to generate comparable/predictable UUIDs for tests, and removed the generic from `TaskAssignor` and all related classes

Reviewers: John Roesler <vvcephei@apache.org>, Andrew Choi <a24choi@edu.uwaterloo.ca>
2020-04-03 13:53:51 -05:00
A. Sophie Blee-Goldman 2322bc0a6f
KAFKA-6145: Pt. 5 Implement high availability assignment (#8337)
Adds a new TaskAssignor implementation, currently hidden behind an internal feature flag, that implements the high availability algorithm of KIP-441.

Reviewers: Bruno Cadonna <bruno@confluent.io>, John Roesler <vvcephei@apache.org>
2020-04-02 13:36:03 -05:00
Gardner Vickers 8cf781ef01
MINOR: Improve performance of checkpointHighWatermarks, patch 1/2 (#6741)
This PR works to improve high watermark checkpointing performance.

`ReplicaManager.checkpointHighWatermarks()` was found to be a major contributor to GC pressure, especially on Kafka clusters with high partition counts and low throughput.

Added a JMH benchmark for `checkpointHighWatermarks` which establishes a
performance baseline. The parameterized benchmark was run with 100, 1000 and
2000 topics. 

Modified `ReplicaManager.checkpointHighWatermarks()` to avoid extra copies and cached
the Log parent directory Sting to avoid frequent allocations when calculating
`File.getParent()`.

A few clean-ups:
* Changed all usages of Log.dir.getParent to Log.parentDir and Log.dir.getParentFile to
Log.parentDirFile.
* Only expose public accessor for `Log.dir` (consistent with `Log.parentDir`)
* Removed unused parameters in `Partition.makeLeader`, `Partition.makeFollower` and `Partition.createLogIfNotExists`.

Benchmark results:

| Topic Count | Ops/ms | MB/sec allocated |
|-------------|---------|------------------|
| 100               | + 51%    |  - 91% |
| 1000             | + 143% |  - 49% |
| 2000            | + 149% |   - 50% |

Reviewers: Lucas Bradstreet <lucas@confluent.io>. Ismael Juma <ismael@juma.me.uk>

Co-authored-by: Gardner Vickers <gardner@vickers.me>
Co-authored-by: Ismael Juma <ismael@juma.me.uk>
2020-03-25 20:53:42 -07:00
A. Sophie Blee-Goldman 6cf27c9c77
KAFKA-6145: Pt 2.5 Compute overall task lag per client (#8252)
Once we have encoded the offset sums per task for each client, we can compute the overall lag during assign by fetching the end offsets for all changelog and subtracting.

If the listOffsets request fails, we simply return a "completely sticky" assignment, ie all active tasks are given to previous owners regardless of balance.

Builds (but does not yet use) the statefulTasksToRankedCandidates map with the ranking:
Rank -1: active running task
Rank 0: standby or restoring task whose overall lag is within acceptableRecoveryLag
Rank 1: tasks whose lag is unknown (eg during version probing)
Rank 1+: all other tasks are ranked according to their actual total lag

Implements: KIP-441
Reviewers: Bruno Cadonna <bruno@confluent.io>, John Roesler <vvcephei@apache.org>
2020-03-21 13:40:34 -05:00
Colin Patrick McCabe 56051e7639
KAFKA-8820: kafka-reassign-partitions.sh should support the KIP-455 API (#8244)
Rewrite ReassignPartitionsCommand to use the KIP-455 API when possible, rather
than direct communication with ZooKeeper.  Direct ZK access is still supported,
but deprecated, as described in KIP-455.

As specified in KIP-455, the tool has several new flags.  --cancel stops
an assignment which is in progress.  --preserve-throttle causes the
--verify and --cancel commands to leave the throttles alone.
--additional allows users to execute another partition assignment even
if there is already one in progress.  Finally, --show displays all of
the current partition reassignments.

Reorganize the reassignment code and tests somewhat to rely more on unit
testing using the MockAdminClient and less on integration testing.  Each
integration test where we bring up a cluster seems to take about 5 seconds, so
it's good when we can get similar coverage from unit tests.  To enable this,
MockAdminClient now supports incrementalAlterConfigs, alterReplicaLogDirs,
describeReplicaLogDirs, and some other APIs.  MockAdminClient is also now
thread-safe, to match the real AdminClient implementation.

In DeleteTopicTest, use the KIP-455 API rather than invoking the reassignment
command.
2020-03-19 20:44:34 -07:00
Matthias J. Sax 89cd2f2a0b
KAFKA-9441: Unify committing within TaskManager (#8218)
- part of KIP-447
 - commit all tasks at once using non-eos (and eos-beta in follow up work)
 - unified commit logic into TaskManager
 - split existing methods of Task interface in pre/post parts

Reviewers: Boyang Chen <boyang@confluent.io>, Guozhang Wang <guozhang@confluent.io>
2020-03-19 11:31:51 -07:00
Manikumar Reddy a0e1407820
KAFKA-9670; Reduce allocations in Metadata Response preparation (#8236)
This PR removes  intermediate  conversions between `MetadataResponse.TopicMetadata` => `MetadataResponseTopic` and `MetadataResponse.PartitionMetadata` => `MetadataResponsePartition` objects.

There is 15-20% reduction in object allocations and 5-10% improvement in metadata request performance.

Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson<jason@confluent.io>
2020-03-16 09:30:48 -07:00
Brian Byrne 227a7322b7
KIP-546: Implement describeClientQuotas and alterClientQuotas. (#8083)
Reviewers: Colin P. McCabe <cmccabe@apache.org>
2020-03-14 23:03:13 -07:00
jiao e3ccf20794 KAFKA-9685: Solve Set concatenation perf issue in AclAuthorizer
To dismiss the usage of operation ++ against Set which is slow when Set has many entries. This pr introduces a new class 'AclSets' which takes multiple Sets as parameters and do 'find' against them one by one. For more details about perf and benchmark, refer to [KAFKA-9685](https://issues.apache.org/jira/browse/KAFKA-9685)

Author: jiao <jiao.zhang@linecorp.com>

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>

Closes #8261 from jiao-zhangS/jira-9685
2020-03-13 20:53:29 +05:30
John Roesler 78374a1549
KAFKA-9615: Clean up task/producer create and close (#8213)
* Consolidate task/producer management. Now, exactly one component manages
  the creation and destruction of Producers, whether they are per-thread or per-task.
* Add missing test coverage on TaskManagerTest

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Boyang Chen <boyang@confluent.io>
2020-03-05 14:20:46 -06:00
Manikumar Reddy 8dff0b168a Kafka 9626: Improve ACLAuthorizer.acls() performance
This PR avoids creation of unnecessary sets in AclAuthorizer.acls() method implementation.

Perf results:
**Old**
```
Benchmark                                (aclCount)  (resourceCount)  Mode  Cnt    Score   Error  Units
AclAuthorizerBenchmark.testAclsIterator           5             5000  avgt   15    5.821 ? 0.309  ms/op
AclAuthorizerBenchmark.testAclsIterator           5            10000  avgt   15   15.303 ? 0.107  ms/op
AclAuthorizerBenchmark.testAclsIterator           5            50000  avgt   15   74.976 ? 0.543  ms/op
AclAuthorizerBenchmark.testAclsIterator          10             5000  avgt   15   15.366 ? 0.184  ms/op
AclAuthorizerBenchmark.testAclsIterator          10            10000  avgt   15   29.899 ? 0.129  ms/op
AclAuthorizerBenchmark.testAclsIterator          10            50000  avgt   15  167.301 ? 1.723  ms/op
AclAuthorizerBenchmark.testAclsIterator          15             5000  avgt   15   21.980 ? 0.114  ms/op
AclAuthorizerBenchmark.testAclsIterator          15            10000  avgt   15   44.385 ? 0.255  ms/op
AclAuthorizerBenchmark.testAclsIterator          15            50000  avgt   15  241.919 ? 3.955  ms/op
```
**New**

```
Benchmark                                (aclCount)  (resourceCount)  Mode  Cnt   Score   Error  Units
AclAuthorizerBenchmark.testAclsIterator           5             5000  avgt   15   0.666 ? 0.004  ms/op
AclAuthorizerBenchmark.testAclsIterator           5            10000  avgt   15   1.427 ? 0.015  ms/op
AclAuthorizerBenchmark.testAclsIterator           5            50000  avgt   15  21.410 ? 0.225  ms/op
AclAuthorizerBenchmark.testAclsIterator          10             5000  avgt   15   1.230 ? 0.018  ms/op
AclAuthorizerBenchmark.testAclsIterator          10            10000  avgt   15   4.303 ? 0.744  ms/op
AclAuthorizerBenchmark.testAclsIterator          10            50000  avgt   15  36.724 ? 0.409  ms/op
AclAuthorizerBenchmark.testAclsIterator          15             5000  avgt   15   2.433 ? 0.379  ms/op
AclAuthorizerBenchmark.testAclsIterator          15            10000  avgt   15   9.818 ? 0.214  ms/op
AclAuthorizerBenchmark.testAclsIterator          15            50000  avgt   15  52.886 ? 0.525  ms/op
```

Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Author: Lucas Bradstreet <lucas@confluent.io>

Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>, Lucas Bradstreet <lucas@confluent.io>

Closes #8199 from omkreddy/KAFKA-9626
2020-03-03 01:51:09 +05:30
Bob Barrett 937f1f741c
KAFKA-8805; Bump producer epoch on recoverable errors (#7389)
This change is the client-side part of KIP-360. It identifies cases where it is safe to abort a transaction, bump the producer epoch, and allow the application to continue without closing the producer. In these cases, when KafkaProducer.abortTransaction() is called, the producer sends an InitProducerId following the transaction abort, which causes the producer epoch to be bumped. The application can then start a new transaction and continue processing.

For recoverable errors in the idempotent producer, the epoch is bumped locally. In-flight requests for partitions with an error are rewritten to reflect the new epoch, and in-flights of all other partitions are allowed to complete using the old epoch. 

Reviewers: Boyang Chen <boyang@confluent.io>, Jason Gustafson <jason@confluent.io>
2020-02-15 22:47:10 -08:00
Konstantine Karantasis 16ee326755
KAFKA-9556; Fix two issues with KIP-558 and expand testing coverage (#8085)
Correct the Connect worker logic to properly disable the new topic status (KIP-558) feature when `topic.tracking.enable=false`, and fix automatic topic status reset after a connector is deleted.

Also adds new `ConnectorTopicsIntegrationTest` and expanded unit tests.

Reviewers: Randall Hauch <rhauch@gmail.com>
2020-02-14 14:34:34 -08:00
Xavier Léauté 7e1c39f75a
KAFKA-9106 make metrics exposed via jmx configurable (#7674)
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Rajini Sivaram <rajinisivaram@googlemail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
2020-02-13 10:21:14 -08:00
Boyang Chen 07db26c20f
KAFKA-9417: New Integration Test for KIP-447 (#8000)
This change mainly have 2 components:

1. extend the existing transactions_test.py to also try out new sendTxnOffsets(groupMetadata) API to make sure we are not introducing any regression or compatibility issue
  a. We shrink the time window to 10 seconds for the txn timeout scheduler on broker so that we could trigger expiration earlier than later

2. create a completely new system test class called group_mode_transactions_test which is more complicated than the existing system test, as we are taking rebalance into consideration and using multiple partitions instead of one. For further breakdown:
  a. The message count was done on partition level, instead of global as we need to visualize 
the per partition order throughout the test. For this sake, we extend ConsoleConsumer to print out the data partition as well to help message copier interpret the per partition data.
  b. The progress count includes the time for completing the pending txn offset expiration
  c. More visibility and feature improvements on TransactionMessageCopier to better work under either standalone or group mode.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2020-02-12 12:34:12 -08:00
John Roesler 1681c78f60
KAFKA-9500: Fix FK Join Topology (#8015)
Corrects a flaw leading to an exception while building topologies that include both:

* A foreign-key join with the result not explicitly materialized
* An operation after the join that requires source materialization

Also corrects a flaw in TopologyTestDriver leading to output records being enqueued in the wrong order under some (presumably rare) circumstances.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2020-02-11 22:38:05 -06:00
Guozhang Wang 4090f9a2b0
KAFKA-9113: Clean up task management and state management (#7997)
This PR is collaborated by Guozhang Wang and John Roesler. It is a significant tech debt cleanup on task management and state management, and is broken down by several sub-tasks listed below:

Extract embedded clients (producer and consumer) into RecordCollector from StreamTask.
guozhangwang#2
guozhangwang#5

Consolidate the standby updating and active restoring logic into ChangelogReader and extract out of StreamThread.
guozhangwang#3
guozhangwang#4

Introduce Task state life cycle (created, restoring, running, suspended, closing), and refactor the task operations based on the current state.
guozhangwang#6
guozhangwang#7

Consolidate AssignedTasks into TaskManager and simplify the logic of changelog management and task management (since they are already moved in step 2) and 3)).
guozhangwang#8
guozhangwang#9

Also simplified the StreamThread logic a bit as the embedded clients / changelog restoration logic has been moved into step 1) and 2).
guozhangwang#10

Reviewers: A. Sophie Blee-Goldman <sophie@confluent.io>, Bruno Cadonna <bruno@confluent.io>, Boyang Chen <boyang@confluent.io>
2020-02-04 21:06:39 -08:00
Rajini Sivaram a565d1a182
KAFKA-9181; Maintain clean separation between local and group subscriptions in consumer's SubscriptionState (#7941)
Reviewers: Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2020-01-24 10:38:21 +00:00
Boyang Chen de90175fc2 KAFKA-9418; Add new sendOffsetsToTransaction API to KafkaProducer (#7952)
This patch adds a new API to the producer to implement transactional offset commit fencing through the group coordinator as proposed in KIP-447. This PR mainly changes on the Producer end for compatible paths to old `sendOffsetsToTxn(offsets, groupId)` vs new `sendOffsetsToTxn(offsets, groupMetadata)`.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
2020-01-22 13:48:36 -08:00
Mickael Maison 3953204d35 MINOR: Fix connect:mirror checkstyle (#7951)
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
2020-01-13 15:25:24 -08:00
John Roesler 717ce42a6d
KAFKA-9138: Add system test for relational joins (#7664)
Add a system test to verify the new foreign-key join introduced in KIP-213

Reviewers: Guozhang Wang <wangguoz@gmail.com>
2019-12-11 09:48:23 -08:00
Colin Patrick McCabe 02df8e1496
KAFKA-8986: Allow null as a valid default for tagged fields. (#7585)
Allow null as a valid default for tagged fields.  Fix a bunch of cases where this would previously result in null pointer dereferences.

Also allow inferring FieldSpec#versions based on FieldSpec#taggedVersions.  Prefix 'key' with an underscore when it is used in the generated code, to avoid potential name collisions if someone names an RPC field "key".

Allow setting setting hexadecimal constants and 64-bit contstants.

Add a lot more test cases to SimpleExampleMessage.json.

Reviewers: Jason Gustafson <jason@confluent.io>
2019-11-20 16:40:18 -08:00
John Roesler 4a5155c934 KAFKA-8868: Generate SubscriptionInfo protocol message (#7248)
Rather than maintain hand coded protocol serialization code, Streams could use the same code-generation framework as Clients/Core.

There isn't a perfect match, since the code generation framework includes an assumption that you're generating "protocol messages", rather than just arbitrary blobs, but I think it's close enough to justify using it, and improving it over time.

Using the code generation allows us to drop a lot of detail-oriented, brittle, and hard-to-maintain serialization logic in favor of a schema spec.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Boyang Chen <boyang@confluent.io>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2019-11-01 10:03:55 -07:00
Nikolay adb2bdb122 KAFKA-8584: The RPC code generator should support ByteBuffer. (#7342)
The RPC code generator should support using the ByteBuffer class in addition to byte arrays. By using the ByteBuffer class, we can avoid performing a copy in many situations. Also modify TestByteBufferDataTest to test the new feature.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Guozhang Wang <wangguoz@gmail.com>
2019-10-23 12:39:12 -07:00
John Roesler f93c473be1 KAFKA-9000: fix flaky FK join test by using TTD (#7517)
Migrate this integration test to use TopologyTestDriver instead of running 3 Streams instances.

Dropped one test that was attempting to produce specific interleavings. If anything, these should be verified deterministically by unit testing.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2019-10-16 22:40:57 -07:00
Greg Harris ff68b60429 KAFKA-8340, KAFKA-8819: Use PluginClassLoader while statically initializing plugins (#7315)
Added plugin isolation unit tests for various scenarios, with a `TestPlugins` class that compiles and builds multiple test plugins without them being on the classpath and verifies that the Plugins and DelegatingClassLoader behave properly. These initially failed for several cases, but now pass since the issues have been fixed.

KAFKA-8340 and KAFKA-8819 are closely related, and this fix corrects the problems reported in both issues.

Author: Greg Harris <gregh@confluent.io>
Reviewers: Chris Egerton <chrise@confluent.io>, Magesh Nandakumar <mageshn@confluent.io>, Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>
2019-10-16 20:43:00 -05:00
Lucas Bradstreet 8966d066bd KAFKA-9039: Optimize ReplicaFetcher fetch path (#7443)
Improves the performance of the replica fetcher for high partition count fetch requests, where a majority of the partitions did not update between fetch requests. All benchmarks were run on an r5x.large.

Vanilla
Benchmark (partitionCount) Mode Cnt Score Error Units
ReplicaFetcherThreadBenchmark.testFetcher 100 avgt 15 26491.825 ± 438.463 ns/op
ReplicaFetcherThreadBenchmark.testFetcher 500 avgt 15 153941.952 ± 4337.073 ns/op
ReplicaFetcherThreadBenchmark.testFetcher 1000 avgt 15 339868.602 ± 4201.462 ns/op
ReplicaFetcherThreadBenchmark.testFetcher 5000 avgt 15 2588878.448 ± 22172.482 ns/op

From 100 to 5000 partitions the latency increase is 2588878.448 / 26491.825 = 97.

Avoid gettimeofdaycalls in steady state fetch states
8545888

Benchmark (partitionCount) Mode Cnt Score Error Units
ReplicaFetcherThreadBenchmark.testFetcher 100 avgt 15 22685.381 ± 267.727 ns/op
ReplicaFetcherThreadBenchmark.testFetcher 500 avgt 15 113622.521 ± 1854.254 ns/op
ReplicaFetcherThreadBenchmark.testFetcher 1000 avgt 15 273698.740 ± 9269.554 ns/op
ReplicaFetcherThreadBenchmark.testFetcher 5000 avgt 15 2189223.207 ± 1706.945 ns/op

From 100 to 5000 partitions the latency increase is 2189223.207 / 22685.381 = 97X

Avoid copying partition states to maintain fetch offsets
29fdd60

Benchmark (partitionCount) Mode Cnt Score Error Units
ReplicaFetcherThreadBenchmark.testFetcher 100 avgt 15 17039.989 ± 609.355 ns/op
ReplicaFetcherThreadBenchmark.testFetcher 500 avgt 15 99371.086 ± 1833.256 ns/op
ReplicaFetcherThreadBenchmark.testFetcher 1000 avgt 15 216071.333 ± 3714.147 ns/op
ReplicaFetcherThreadBenchmark.testFetcher 5000 avgt 15 2035678.223 ± 5195.232 ns/op

From 100 to 5000 partitions the latency increase is 2035678.223 / 17039.989 = 119X

Keep lag alongside PartitionFetchState to avoid expensive isReplicaInSync check
0e57e3e

Benchmark (partitionCount) Mode Cnt Score Error Units
ReplicaFetcherThreadBenchmark.testFetcher 100 avgt 15 15131.684 ± 382.088 ns/op
ReplicaFetcherThreadBenchmark.testFetcher 500 avgt 15 86813.843 ± 3346.385 ns/op
ReplicaFetcherThreadBenchmark.testFetcher 1000 avgt 15 193050.381 ± 3281.833 ns/op
ReplicaFetcherThreadBenchmark.testFetcher 5000 avgt 15 1801488.513 ± 2756.355 ns/op

From 100 to 5000 partitions the latency increase is 1801488.513 / 15131.684 = 119X

Fetch session optimizations (mostly presizing the next hashmap, and avoiding making a copy of sessionPartitions, as a deep copy is not required for the ReplicaFetcher)
2614b24

Benchmark (partitionCount) Mode Cnt Score Error Units
ReplicaFetcherThreadBenchmark.testFetcher 100 avgt 15 11386.203 ± 416.701 ns/op
ReplicaFetcherThreadBenchmark.testFetcher 500 avgt 15 60820.292 ± 3163.001 ns/op
ReplicaFetcherThreadBenchmark.testFetcher 1000 avgt 15 146242.158 ± 1937.254 ns/op
ReplicaFetcherThreadBenchmark.testFetcher 5000 avgt 15 1366768.926 ± 3305.712 ns/op

From 100 to 5000 partitions the latency increase is 1366768.926 / 11386.203 = 120

Reviewers: Jun Rao <junrao@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
2019-10-16 09:49:53 -07:00
A. Sophie Blee-Goldman d88f1048da KAFKA-8179: Part 7, cooperative rebalancing in Streams (#7386)
Key improvements with this PR:

* tasks will remain available for IQ during a rebalance (but not during restore)
* continue restoring and processing standby tasks during a rebalance
* continue processing active tasks during rebalance until the RecordQueue is empty*
* only revoked tasks must suspended/closed
* StreamsPartitionAssignor tries to return tasks to their previous consumers within a client
* but do not try to commit, for now (pending KAFKA-7312)


Reviewers: John Roesler <john@confluent.io>, Boyang Chen <boyang@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2019-10-07 09:27:09 -07:00
Ryanne Dolan 4ac892ca78 KAFKA-7500: MirrorMaker 2.0 (KIP-382)
Implementation of [KIP-382 "MirrorMaker 2.0"](https://cwiki.apache.org/confluence/display/KAFKA/KIP-382%3A+MirrorMaker+2.0)

Author: Ryanne Dolan <ryannedolan@gmail.com>
Author: Arun Mathew <arunmathew88@gmail.com>
Author: In Park <inpark@cloudera.com>
Author: Andre Price <obsoleted@users.noreply.github.com>
Author: christian.hagel@rio.cloud <christian.hagel@rio.cloud>

Reviewers: Eno Thereska <eno.thereska@gmail.com>, William Hammond <william.t.hammond@gmail.com>, Viktor Somogyi <viktorsomogyi@gmail.com>, Jakub Korzeniowski, Tim Carey-Smith, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Arun Mathew, Jeremy-l-ford, vpernin, Oleg Kasian <oleg.kasian@gmail.com>, Mickael Maison <mickael.maison@gmail.com>, Qihong Chen, Sriharsha Chintalapani <sriharsha@apache.org>, Jun Rao <junrao@gmail.com>, Randall Hauch <rhauch@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>

Closes #6295 from ryannedolan/KIP-382
2019-10-07 13:57:54 +05:30
Colin Patrick McCabe 0de61a4683 KAFKA-8885; The Kafka Protocol should Support Optional Tagged Fields (#7325)
This patch implements support for optional (tagged) fields in the Kafka protocol as documented in KIP-482: https://cwiki.apache.org/confluence/display/KAFKA/KIP-482%3A+The+Kafka+Protocol+should+Support+Optional+Tagged+Fields#KIP-482:TheKafkaProtocolshouldSupportOptionalTaggedFields-TypeClasses.

Reviewers: David Jacot <djacot@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
2019-10-06 21:13:23 -07:00
Adam Bellemare c87fe9402c KAFKA-3705 Added a foreignKeyJoin implementation for KTable. (#5527)
https://issues.apache.org/jira/browse/KAFKA-3705

Allows for a KTable to map its value to a given foreign key and join on another KTable keyed on that foreign key. Applies the joiner, then returns the tuples keyed on the original key. This supports updates from both sides of the join.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <mjsax@apache.org>,  John Roesler <john@confluent.io>, Boyang Chen <boyang@confluent.io>, Christopher Pettitt <cpettitt@confluent.io>, Bill Bejeck <bbejeck@gmail.com>, Jan Filipiak <Jan.Filipiak@trivago.com>, pgwhalen, Alexei Daniline
2019-10-03 18:59:31 -04:00
Chris Egerton 791d0d61bf KAFKA-8804: Secure internal Connect REST endpoints (#7310)
Implemented KIP-507 to secure the internal Connect REST endpoints that are only for intra-cluster communication. A new V2 of the Connect subprotocol enables this feature, where the leader generates a new session key, shares it with the other workers via the configuration topic, and workers send and validate requests to these internal endpoints using the shared key.

Currently the internal `POST /connectors/<connector>/tasks` endpoint is the only one that is secured.

This change adds unit tests and makes some small alterations to system tests to target the new `sessioned` Connect subprotocol. A new integration test ensures that the endpoint is actually secured (i.e., requests with missing/invalid signatures are rejected with a 400 BAD RESPONSE status).

Author: Chris Egerton <chrise@confluent.io>
Reviewed: Konstantine Karantasis <konstantine@confluent.io>, Randall Hauch <rhauch@gmail.com>
2019-10-02 17:06:57 -05:00
Arjun Satish 1c831c22e1 KAFKA-7772: Dynamically Adjust Log Levels in Connect (#7403)
Implemented KIP-495 to expose a new `admin/loggers` endpoint for the Connect REST API that lists the current log levels and allows the caller to change log levels. 

Author: Arjun Satish <arjun@confluent.io>
Reviewer: Randall Hauch <rhauch@gmail.com>
2019-10-02 17:00:37 -05:00
Yaroslav Tkachenko 70d1bb40d9 KAFKA-7273: Extend Connect Converter to support headers (#6362)
Implemented KIP-440 to allow Connect converters to use record headers when serializing or deserializing keys and values. This change is backward compatible in that the new methods default to calling the older existing methods, so existing Converter implementations need not be changed. This changes the WorkerSinkTask and WorkerSourceTask to use the new converter methods, but Connect's existing Converter implementations and the use of converters for internal topics are intentionally not modified. Added unit tests.

Author: Yaroslav Tkachenko <sapiensy@gmail.com>
Reviewers: Ryanne Dolan <ryannedolan@gmail.com>, Ewen Cheslack-Postava <me@ewencp.org>, Randall Hauch <rhauch@gmail.com>
2019-09-25 11:23:03 -05:00
Colin Patrick McCabe 92688ef82c MINOR: improve the Kafka RPC code generator (#7340)
Move the generator checkstyle suppressions to a special section, rather
than mixing them in with the other sections.  For generated code, do not
complain about variable names or cyclic complexity.

FieldType.java: remove isInteger since it isn't used anywhere.  This way, we
don't have to decide whether a UUID is an integer or not (there are arguments
for both choices).  Add FieldType#serializationIsDifferentInFlexibleVersions
and FieldType#isVariableLength.

HeaderGenerator: add the ability to generate static imports.  Add
IsNullConditional, VersionConditional, and ClauseGenerator as easier ways of
generating "if" statements.
2019-09-25 11:58:54 -04:00
Guozhang Wang a0470726c4 MINOR: Move Murmur3 to Streams 2019-09-19 16:38:18 -07:00
Adam Bellemare 2d0cd2ef54 MINOR: Murmur3 Hash with Guava dependency
Part of supporting KIP-213 ( https://cwiki.apache.org/confluence/display/KAFKA/KIP-213+Support+non-key+joining+in+KTable ). Murmur3 hash is used as a hashing mechanism in KIP-213 for the large range of uniqueness. The Murmur3 class and tests are ported directly from Apache Hive, with no alterations to the code or dependencies.

Author: Adam Bellemare <adam.bellemare@wishabi.com>

Reviewers: John Roesler <vvcephei@users.noreply.github.com>, Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>

Closes #7271 from bellemare/murmur3hash
2019-09-19 15:36:32 -07:00
Lucas Bradstreet f3ded39a05 KAFKA-8841; Reduce overhead of ReplicaManager.updateFollowerFetchState (#7324)
This PR makes two changes to code in the ReplicaManager.updateFollowerFetchState path, which is in the hot path for follower fetches. Although calling ReplicaManager.updateFollowerFetch state is inexpensive on its own, it is called once for each partition every time a follower fetch occurs.

1. updateFollowerFetchState no longer calls maybeExpandIsr when the follower is already in the ISR. This avoid repeated expansion checks. 
2. Partition.maybeIncrementLeaderHW is also in the hot path for ReplicaManager.updateFollowerFetchState. Partition.maybeIncrementLeaderHW calls Partition.remoteReplicas four times each iteration, and it performs a toSet conversion. maybeIncrementLeaderHW now avoids generating any intermediate collections when updating the HWM.

**Benchmark results for Partition.updateFollowerFetchState on a r5.xlarge:**
Old:
```
  1288.633 ±(99.9%) 1.170 ns/op [Average]
  (min, avg, max) = (1287.343, 1288.633, 1290.398), stdev = 1.037
  CI (99.9%): [1287.463, 1289.802] (assumes normal distribution)
```

New (when follower fetch offset is updated):
```
  261.727 ±(99.9%) 0.122 ns/op [Average]
  (min, avg, max) = (261.565, 261.727, 261.937), stdev = 0.114
  CI (99.9%): [261.605, 261.848] (assumes normal distribution)
```

New (when follower fetch offset is the same):
```
  68.484 ±(99.9%) 0.025 ns/op [Average]
  (min, avg, max) = (68.446, 68.484, 68.520), stdev = 0.023
  CI (99.9%): [68.460, 68.509] (assumes normal distribution)
```

Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
2019-09-18 09:11:39 -07:00
Scott Hendricks 39afe9fe0e KAFKA-8853; Create sustained connections test for Trogdor
This creates a test that generates sustained connections against Kafka.  There
are three different components we can stress with this, KafkaConsumer,
KafkaProducer, and AdminClient.  This test tries use minimal bandwidth per
connection to reduce overhead impacts.

This test works by creating a threadpool that creates connections and then
maintains a central pool of connections at a specified keepalive rate.  The
keepalive action varies by which component is being stressed:

  * KafkaProducer:  Sends one single produce record.  The configuration for
    the produce request uses the same key/value generator as the ProduceBench
    test.

  * KafkaConsumer: Subscribes to a single partition, seeks to the end, and
    then polls a minimal number of records.  Each consumer connection is its
    own consumer group, and defaults to 1024 bytes as FETCH_MAX_BYTES to keep
    traffic to a minimum.

  * AdminClient: Makes an API call to get the nodes in the cluster.

NOTE: This test is designed to be run alongside a ProduceBench test for a
specific topic, due to the way the Consumer test polls a single partition.
There may be no data returned by the consumer test if this is run on its own.
The connection should still be kept alive, but with no data returned.

Author: Scott Hendricks <scott.hendricks@confluent.io>

Reviewers: Stanislav Kozlovski, Gwen Shapira

Closes #7289 from scott-hendricks/trunk
2019-09-08 19:49:13 -07:00
Boyang Chen c0019e6538 KAFKA-8590; Use automated TxnOffsetCommit type and add tests for OffsetCommit (#6994)
This PR changes the TxnOffsetCommit protocol to auto-generated types, and add more unit test coverage to the plain OffsetCommit protocol.

Reviewers: Jason Gustafson <jason@confluent.io>
2019-09-05 23:07:42 -07:00
Rajini Sivaram 364794866f
KAFKA-8760; New Java Authorizer API (KIP-504) (#7268)
New Java Authorizer API and a new out-of-the-box authorizer (AclAuthorizer) that implements the new interface.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
2019-09-02 14:43:17 +01:00
cpettitt-confluent 7334222a71 KAFKA-8412: Fix nullpointer exception thrown on flushing before closing producers (#7207)
Prior to this change an NPE is raised when calling AssignedTasks.close
under the following conditions:

1. EOS is enabled
2. The task was in a suspended state

The cause for the NPE is that when a clean close is requested for a
StreamTask the StreamTask tries to commit. However, in the suspended
state there is no producer so ultimately an NPE is thrown for the
contained RecordCollector in flush.

The fix put forth in this commit is to have AssignedTasks call
closeSuspended when it knows the underlying StreamTask is suspended.

Note also that this test is quite involved. I could have just tested
that AssignedTasks calls closeSuspended when appropriate, but that is
testing, IMO, a detail of the implementation and doesn't actually verify
we reproduced the original problem as it was described. I feel much more
confident that we are reproducing the behavior - and we can test exactly
the conditions that lead to it - when testing across AssignedTasks and
StreamTask. I believe this is an additional support for the argument of
eventually consolidating the state split across classes.

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2019-08-26 09:53:36 -07:00
Stanislav Kozlovski 5129ab53ee KAFKA-8345: KIP-455: Admin API changes (Part 2) (#7120)
Add the AlterPartitionReassignments and ListPartitionReassignments APIs.  Also remove an unused methodlength suppression for KafkaAdminClient.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Viktor Somogyi <viktorsomogyi@gmail.com>
2019-08-14 09:25:17 -07:00
Justine Olshan 717c55be97 KAFKA-8601: Implement KIP-480: Sticky Partitioning for keyless records (#6997)
Implement KIP-480, which specifies that the default partitioner should use a "sticky" partitioning strategy for records that have a null key.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Lucas Bradstreet <lucasbradstreet@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Jun Rao <junrao@gmail.com>, Kamal Chandraprakash  <kamal.chandraprakash@gmail.com>
2019-08-01 14:36:12 -07:00
Boyang Chen 74c90f46c3 KAFKA-8221; Add batch leave group request (#6714)
This patch is part of KIP-345. We are aiming to support batch leave group request issued from admin client. This diff is the first effort to bump leave group request version.

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
2019-07-26 23:13:37 -07:00
Andy Coates 2a133ba656 KAFKA-8454; Add Java AdminClient Interface (KIP-476) (#7087)
Adds an `Admin` interface as specified in [KIP-476](https://cwiki.apache.org/confluence/display/KAFKA/KIP-476%3A+Add+Java+AdminClient+Interface).

Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
2019-07-22 15:47:34 -07:00
Ismael Juma 57903be496
MINOR: Remove zkclient dependency (#7036)
ZkUtils was removed so we don't need this anymore.

Also:
* Fix ZkSecurityMigrator and ReplicaManagerTest not to
reference ZkClient classes.
* Remove references to zkclient in various `log4j.properties`
and `import-control.xml`.

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>
2019-07-05 07:50:32 -07:00
Boyang Chen b4e20495fa remove cs (#6950)
cleanup of some redundant checkstyle
Reviewers: Bill Bejeck <bbejeck@gmail.com>
2019-06-28 18:47:16 -04:00
A. Sophie Blee-Goldman 16769d263e KAFKA-8215: Upgrade Rocks to v5.18.3 (#6743)
This upgrade exposes a number of new options, including the WriteBufferManager which -- along with existing TableConfig options -- allows users to limit the total memory used by RocksDB across instances. This can alleviate some cascading OOM potential when, for example, a large number of stateful tasks are suddenly migrated to the same host.

The RocksDB docs guarantee backwards format compatibility across versions

Reviewers: Matthias J. Sax <mjsax@apache.org>, Bill Bejeck <bbejeck@gmail.com>,
2019-05-17 16:32:17 -04:00
Magesh Nandakumar 2e91a310d7 KAFKA-8265: Initial implementation for ConnectorClientConfigPolicy to enable overrides (KIP-458) (#6624)
Implementation to enable policy for Connector Client config overrides. This is
implemented per the KIP-458.

Reviewers: Randall Hauch <rhauch@gmail.com>
2019-05-17 01:37:32 -07:00
Chris Egerton cc097e909c KAFKA-8304: Fix registration of Connect REST extensions (#6651)
Fix registration of Connect REST extensions to prevent deadlocks when extensions get the list of connectors before the herder is available. Added integration test to check the behavior.

Author: Chris Egerton <cegerton@oberlin.edu>
Reviewers: Arjun Satish <arjun@confluent.io>, Randall Hauch <rhauch@gmail.com>
2019-05-07 17:20:51 -05:00
Boyang Chen 0f995ba6be KAFKA-7862 & KIP-345 part-one: Add static membership logic to JoinGroup protocol (#6177)
This is the first diff for the implementation of JoinGroup logic for static membership. The goal of this diff contains:

* Add group.instance.id to be unique identifier for consumer instances, provided by end user;
Modify group coordinator to accept JoinGroupRequest with/without static membership, refactor the logic for readability and code reusability.
* Add client side support for incorporating static membership changes, including new config for group.instance.id, apply stream thread client id by default, and new join group exception handling.
* Increase max session timeout to 30 min for more user flexibility if they are inclined to tolerate partial unavailability than burdening rebalance.
* Unit tests for each module changes, especially on the group coordinator logic. Crossing the possibilities like:
6.1 Dynamic/Static member
6.2 Known/Unknown member id
6.3 Group stable/unstable
6.4 Leader/Follower

The rest of the 345 change will be broken down to 4 separate diffs:

* Avoid kicking out members through rebalance.timeout, only do the kick out through session timeout.
* Changes around LeaveGroup logic, including version bumping, broker logic, client logic, etc.
* Admin client changes to add ability to batch remove static members
* Deprecate group.initial.rebalance.delay

Reviewers: Liquan Pei <liquanpei@gmail.com>, Stanislav Kozlovski <familyguyuser192@windowslive.com>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2019-04-26 11:44:38 -07:00
Manikumar Reddy 3b1524c5df KAFKA-7466: Add IncrementalAlterConfigs API (KIP-339) (#6247)
Reviewers: Colin P. McCabe <cmccabe@apache.org>, Viktor Somogyi <viktorsomogyi@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>
2019-04-16 16:26:33 -07:00
Viktor Somogyi 7bd81628d9 KAFKA-6635; Producer close awaits pending transactions (#5971)
Currently close() only awaits completion of pending produce requests. If there is a transaction ongoing, it may be dropped. For example, if one thread is calling commitTransaction() and another calls close(), then the commit may never happen even if the caller is willing to wait for it (by using a long timeout). What's more, the thread blocking in commitTransaction() will be stuck since the result will not be completed once the producer has shutdown. 

This patch ensures that 1) completing transactions are awaited, 2) ongoing transactions are aborted, and 3) pending callbacks are completed before close() returns.

Reviewers: Jason Gustafson <jason@confluent.io>
2019-04-15 15:56:36 -07:00
Konstantine Karantasis e4cad35312 KAFKA-8014: Extend Connect integration tests to add and remove workers dynamically (#6342)
Extend Connect's integration test framework to add or remove workers to EmbeddedConnectCluster, and choosing whether to fail the test on ungraceful service shutdown. Also added more JavaDoc and other minor improvements. 

Author: Konstantine Karantasis <konstantine@confluent.io>

Reviewers: Arjun Satish <arjun@confluent.io>, Randall Hauch <rhauch@gmail.com>

Closes #6342 from kkonstantine/KAFKA-8014
2019-03-25 09:29:33 -05:00
Mickael Maison 4824dc994d KAFKA-7972: Use automatic RPC generation in SaslHandshake
Author: Mickael Maison <mickael.maison@gmail.com>

Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>

Closes #6301 from mimaison/sasl-handshake
2019-02-25 11:20:07 +05:30
Alex Diachenko ec42e0378e KAFKA-7799; Use httpcomponents-client in RestServerTest.
The test `org.apache.kafka.connect.runtime.rest.RestServerTest#testCORSEnabled` assumes Jersey client can send restricted HTTP headers(`Origin`).

Jersey client uses `sun.net.www.protocol.http.HttpURLConnection`.
`sun.net.www.protocol.http.HttpURLConnection` drops restricted headers(`Host`, `Keep-Alive`, `Origin`, etc) based on static property `allowRestrictedHeaders`.
This property is initialized in a static block by reading Java system property `sun.net.http.allowRestrictedHeaders`.

So, if classloader loads `HttpURLConnection` before we set `sun.net.http.allowRestrictedHeaders=true`, then all subsequent changes of this system property won't take any effect(which happens if `org.apache.kafka.connect.integration.ExampleConnectIntegrationTest` is executed before `RestServerTest`).
To prevent this, we have to either make sure we set `sun.net.http.allowRestrictedHeaders=true` as early as possible or do not rely on this system property at all.

This PR adds test dependency on `httpcomponents-client` which doesn't depend on `sun.net.http.allowRestrictedHeaders` system property. Thus none of existing tests should interfere with `RestServerTest`.

Author: Alex Diachenko <sansanichfb@gmail.com>

Reviewers: Randall Hauch, Konstantine Karantasis, Gwen Shapira

Closes #6236 from avocader/KAFKA-7799
2019-02-12 12:03:08 -08:00
Chia-Ping Tsai 3ebf058123 MINOR: fix checkstyle suppressions for generated RPC code to work on Windows
Reviewed-by: Colin P. McCabe <cmccabe@apache.org>
2019-01-31 10:29:55 -08:00
Matthias J. Sax 1fa02d5aef
MINOR: Update usage of deprecated API (#6146)
Reviewers: Guozhang Wang <guozhang@confluent.io>, Ismael Juma <ismael@confluent.io>, Jorge Quilcate Otoya <quilcate.jorge@gmail.com>, John Roesler <john@confluent.io>, Bill Bejeck <bill@confluent.io>
2019-01-29 11:15:43 -08:00
Tom Bentley 269b65279c KAFKA-5692: Change PreferredReplicaLeaderElectionCommand to use Admin… (#3848)
See also KIP-183.

This implements the following algorithm:

AdminClient sends ElectPreferredLeadersRequest.
KafakApis receives ElectPreferredLeadersRequest and delegates to
ReplicaManager.electPreferredLeaders()
ReplicaManager delegates to KafkaController.electPreferredLeaders()
KafkaController adds a PreferredReplicaLeaderElection to the EventManager,
ReplicaManager.electPreferredLeaders()'s callback uses the
delayedElectPreferredReplicasPurgatory to wait for the results of the
election to appear in the metadata cache. If there are no results
because of errors, or because the preferred leaders are already leading
the partitions then a response is returned immediately.
In the EventManager work thread the preferred leader is elected as follows:

The EventManager runs PreferredReplicaLeaderElection.process()
process() calls KafkaController.onPreferredReplicaElectionWithResults()
KafkaController.onPreferredReplicaElectionWithResults()
calls the PartitionStateMachine.handleStateChangesWithResults() to
perform the election (asynchronously the PSM will send LeaderAndIsrRequest
to the new and old leaders and UpdateMetadataRequest to all brokers)
then invokes the callback.

Reviewers: Colin P. McCabe <cmccabe@apache.org>, Jun Rao <junrao@gmail.com>
2019-01-25 14:06:18 -08:00
Colin Patrick McCabe a79d6dcdb6
KAFKA-7793: Improve the Trogdor command line. (#6133)
* Allow the Trogdor agent to be started in "exec mode", where it simply
runs a single task and exits after it is complete.

* For AgentClient and CoordinatorClient, allow the user to pass the path
to a file containing JSON, instead of specifying the JSON object in the
command-line text itself.  This means that we can get rid of the bash
scripts whose only function was to load task specs into a bash string
and run a Trogdor command.

* Print dates and times in a human-readable way, rather than as numbers
of milliseconds.

* When listing tasks or workers, output human-readable tables of
information.

* Allow the user to filter on task ID name, task ID pattern, or task
state.

* Support a --json flag to provide raw JSON output if desired.

Reviewed-by: David Arthur <mumrah@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>
2019-01-24 09:26:51 -08:00
Arjun Satish dc935c4beb MINOR: Handle case where connector status endpoints returns 404 (#6176)
Reviewers: Randall Hauch <randall@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2019-01-20 19:31:20 -08:00
Arjun Satish 69d8d2ea11 KAFKA-7503: Connect integration test harness
Expose a programmatic way to bring up a Kafka and Zk cluster through Java API to facilitate integration tests for framework level changes in Kafka Connect. The Kafka classes would be similar to KafkaEmbedded in streams. The new classes would reuse the kafka.server.KafkaServer classes from :core, and provide a simple interface to bring up brokers in integration tests.

Signed-off-by: Arjun Satish <arjunconfluent.io>

Author: Arjun Satish <arjun@confluent.io>
Author: Arjun Satish <wicknicks@users.noreply.github.com>

Reviewers: Randall Hauch <rhauch@gmail.com>, Konstantine Karantasis <konstantine@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>

Closes #5516 from wicknicks/connect-integration-test
2019-01-14 13:50:23 -08:00
Matthias J. Sax 82d1db6358
MINOR: code cleanup (#6054)
Reviewers: Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Ryanne Dolan <ryannedolan@gmail.com>, Ismael Juma <ismael@confuent.io>
2019-01-14 13:36:36 -08:00
Colin Patrick McCabe 71e85f5e84 KAFKA-7609; Add Protocol Generator for Kafka (#5893)
This patch adds a framework to automatically generate the request/response classes for Kafka's protocol. The code will be updated to use the generated classes in follow-up patches. Below is a brief summary of the included components:

**buildSrc/src**
The message generator code is here.  This code is automatically re-run by gradle when one of the schema files changes.  The entire directory is processed at once to minimize the number of times we have to start a new JVM.  We use Jackson to translate the JSON files into Java objects.

**clients/src/main/java/org/apache/kafka/common/protocol/Message.java**
This is the interface implemented by all automatically generated messages.

**clients/src/main/java/org/apache/kafka/common/protocol/MessageUtil.java**
Some utility functions used by the generated message code.

**clients/src/main/java/org/apache/kafka/common/protocol/Readable.java, Writable.java, ByteBufferAccessor.java**
The generated message code uses these classes for writing to a buffer.

**clients/src/main/message/README.md**
This README file explains how the JSON schemas work.

**clients/src/main/message/\*.json**
The JSON files in this directory implement every supported version of every Kafka API.  The unit tests automatically validate that the generated schemas match the hand-written schemas in our code.  Additionally, there are some things like request and response headers that have schemas here.

**clients/src/main/java/org/apache/kafka/common/utils/ImplicitLinkedHashSet.java**
I added an optimization here for empty sets.  This is useful here because I want all messages to start with empty sets by default prior to being loaded with data.  This is similar to the "empty list" optimizations in the `java.util.ArrayList` class.

Reviewers: Stanislav Kozlovski <stanislav_kozlovski@outlook.com>, Ismael Juma <ismael@juma.me.uk>, Bob Barrett <bob.barrett@outlook.com>, Jason Gustafson <jason@confluent.io>
2019-01-11 16:40:21 -08:00
Stanislav Kozlovski 7fadf0a11d Trogdor: Add Task State filter to /coordinator/tasks endpoint (#5907)
Reviewers: Colin McCabe <cmccabe@apache.org>
2018-11-26 16:46:58 -08:00
Ron Dagostino e8a3bc7425 KAFKA-7352; Allow SASL Connections to Periodically Re-Authenticate (KIP-368) (#5582)
KIP-368 implementation to enable periodic re-authentication of SASL clients. Also adds a broker configuration option to terminate client connections that do not re-authenticate within the configured interval.
2018-10-26 23:18:15 +01:00
Rajini Sivaram 4c602e6130
KAFKA-7498: Remove references from `common.requests` to `clients` (#5784)
Add CreatePartitionsRequest.PartitionDetails similar to CreateTopicsRequest.TopicDetails to avoid references from `common.requests` package to `clients`.

Reviewers: Ismael Juma <ismael@juma.me.uk>
2018-10-15 13:21:15 +01:00
Ismael Juma 578205cadd KAFKA-7439; Replace EasyMock and PowerMock with Mockito in clients module
Development of EasyMock and PowerMock has stagnated while Mockito
continues to be actively developed. With the new Java release cadence,
it's a problem to depend on libraries that do bytecode manipulation
and are not actively maintained. In addition, Mockito is also
easier to use.

While updating the tests, I attempted to go from failing test to
passing test. In cases where the updated test passed on the first
attempt, I artificially broke it to ensure the test was still doing its
job.

I included a few improvements that were helpful while making these
changes:

1. Better exception if there are no nodes in `leastLoadedNodes`
2. Always close the producer in `KafkaProducerTest`
3. requestsInFlight producer metric should not hold a reference to
`Sender`

Finally, `Metadata` is no longer final so that we don't need
`PowerMock` to mock it. It's an internal class, so it's OK.

Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Dong Lin <lindong28@gmail.com>

Closes #5691 from ijuma/kafka-7438-mockito
2018-10-09 15:55:09 -07:00
Randall Hauch f0282498e7 KAFKA-6926: Simplified some logic to eliminate some suppressions of NPath complexity checks (#5051)
Modified several classes' `equals` methods and simplified a complex method to
reduce the NPath complexity so they could be removed from the checkstyle
suppressions that were required with the recent move to Java 8 and upgrade
of Checkstyle: https://github.com/apache/kafka/pull/5046.

Reviewers: Robert Yokota <rayokota@gmail.com>, Arjun Satish <arjun@confluent.io>, Ismael Juma <ismael@juma.me.uk>
2018-09-12 21:21:46 -07:00
Anna Povzner e2ec2d79c8 KAFKA-7044; Fix Fetcher.fetchOffsetsByTimes and NPE in describe consumer group (#5627)
A call to `kafka-consumer-groups --describe --group ...` can result in NullPointerException for two reasons:
1)  `Fetcher.fetchOffsetsByTimes()` may return too early, without sending list offsets request for topic partitions that are not in cached metadata.
2) `ConsumerGroupCommand.getLogEndOffsets()` and `getLogStartOffsets()` assumed that endOffsets()/beginningOffsets() which eventually call Fetcher.fetchOffsetsByTimes(), would return a map with all the topic partitions passed to endOffsets()/beginningOffsets() and that values are not null. Because of (1), null values were possible if some of the topic partitions were already known (in metadata cache) and some not (metadata cache did not have entries for some of the topic partitions). However, even with fixing (1), endOffsets()/beginningOffsets() may return a map with some topic partitions missing, when list offset request returns a non-retriable error. This happens in corner cases such as message format on broker is before 0.10, or maybe in cases of some other errors. 

Testing:
-- added unit test to verify fix in Fetcher.fetchOffsetsByTimes() 
-- did some manual testing with `kafka-consumer-groups --describe`, causing NPE. Was not able to reproduce any NPE cases with DescribeConsumerGroupTest.scala,

Reviewers: Jason Gustafson <jason@confluent.io>
2018-09-11 10:10:42 -07:00
John Roesler d57fe1b053 MINOR: single Jackson serde for PageViewTypedDemo (#5590)
Previously, we depicted creating a Jackson serde for every pojo class, which becomes a burden in practice. There are many ways to avoid this and just have a single serde, so we've decided to model this design choice instead.

Reviewers: Viktor Somogyi <viktorsomogyi@gmail.com>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
2018-08-31 13:13:42 -07:00
Ismael Juma b282b2ab10
KAFKA-7308: Fix rat and checkstyle config for Java 11 support (#5529)
Relative paths in Gradle break when the Gradle daemon is used
unless user.dir can be changed while the process is running.
Java 11 disallows this, so we use project paths instead.

Verified that rat and checkstyle work with Java 11 after these
changes.

Reviewers: Dong Lin <lindong28@gmail.com>
2018-08-18 08:18:28 -07:00
Colin Patrick McCabe 609c81ec8b KAFKA-7183: Add a trogdor test that creates many connections to brokers (#5393)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>
2018-08-06 08:47:25 +01:00
John Roesler 3637b2c374 MINOR: Require final variables in Streams (#5452)
Reviewers: Bill Bejeck <bill@confluent.io>, Guozhang Wang <guozhang@confluent.io>, Matthias J. Sax <matthias@confluent.io>
2018-08-03 13:19:46 -07:00
Manikumar Reddy O 96c53e96b8 MINOR: Remove deprecated ZkUtils usage from EmbeddedKafkaCluster (#5324)
Reviewers: Matthias J. Sax <mjsax@apache.org>, Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
2018-07-19 14:28:12 -07:00
Joan Goyeau 05c5854d1f MINOR: Add Scalafmt to Streams Scala API (#4965)
Reviewers: Guozhang Wang <wangguoz@gmail.com>
2018-07-09 16:48:34 -07:00
Rajini Sivaram 1f8527b331 KAFKA-7136: Avoid deadlocks in synchronized metrics reporters (#5341)
We need to use the same lock for metric update and read to avoid NPE and concurrent modification exceptions. Sensor add/remove/update are synchronized on Sensor since they access lists and maps that are not thread-safe. Reporters are notified of metrics add/remove while holding (Sensor, Metrics) locks and reporters may synchronize on the reporter lock. Metric read may be invoked by metrics reporters while holding a reporter lock. So read/update cannot be synchronized using Sensor since that could lead to deadlock. This PR introduces a new lock in Sensor for update/read.
Locking order:

- Sensor#add: Sensor -> Metrics -> MetricsReporter
- Metrics#removeSensor: Sensor -> Metrics -> MetricsReporter
- KafkaMetric#metricValue: MetricsReporter -> Sensor#metricLock
- Sensor#record: Sensor -> Sensor#metricLock


Reviewers: Jun Rao <junrao@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
2018-07-06 10:54:28 -07:00
Ismael Juma cc4dce94af
KAFKA-2983: Remove Scala consumers and related code (#5230)
- Removed Scala consumers (`SimpleConsumer` and `ZooKeeperConsumerConnector`)
and their tests.
- Removed Scala request/response/message classes.
- Removed any mention of new consumer or new producer in the code
with the exception of MirrorMaker where the new.consumer option was
never deprecated so we have to keep it for now. The non-code
documentation has not been updated either, that will be done
separately.
- Removed a number of tools that only made sense in the context
of the Scala consumers (see upgrade notes).
- Updated some tools that worked with both Scala and Java consumers
so that they only support the latter (see upgrade notes).
- Removed `BaseConsumer` and related classes apart from `BaseRecord`
which is used in `MirrorMakerMessageHandler`. The latter is a pluggable
interface so effectively public API.
- Removed `ZkUtils` methods that were only used by the old consumers.
- Removed `ZkUtils.registerBroker` and `ZKCheckedEphemeral` since
the broker now uses the methods in `KafkaZkClient` and no-one else
should be using that method.
- Updated system tests so that they don't use the Scala consumers except
for multi-version tests.
- Updated LogDirFailureTest so that the consumer offsets topic would
continue to be available after all the failures. This was necessary for it
to work with the Java consumer.
- Some multi-version system tests had not been updated to include
recently released Kafka versions, fixed it.
- Updated findBugs and checkstyle configs not to refer to deleted
classes and packages.

Reviewers: Dong Lin <lindong28@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>
2018-06-19 07:32:54 -07:00
Andy Coates b3aa655a70 KAFKA-6841: Support Prefixed ACLs (KIP-290) (#5117)
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Jun Rao <junrao@gmail.com>

Co-authored-by: Piyush Vijay <pvijay@apple.com>
Co-authored-by: Andy Coates <big-andy-coates@users.noreply.github.com>
2018-06-06 07:22:57 -07:00
Robert Yokota 08e8facdc9 KAFKA-6886: Externalize secrets from Connect configs (KIP-297)
This commit allows secrets in Connect configs to be externalized and replaced with variable references of the form `${provider:[path:]key}`, where the "path" is optional.

There are 2 main additions to `org.apache.kafka.common.config`: a `ConfigProvider` and a `ConfigTransformer`.  The `ConfigProvider` is an interface that allows key-value pairs to be provided by an external source for a given "path".  An a TTL can be associated with the key-value pairs returned from the path.  The `ConfigTransformer` will use instances of `ConfigProvider` to replace variable references in a set of configuration values.

In the Connect framework, `ConfigProvider` classes can be specified in the worker config, and then variable references can be used in the connector config.  In addition, the herder can be configured to restart connectors (or not) based on the TTL returned from a `ConfigProvider`.  The main class that performs restarts and transformations is `WorkerConfigTransformer`.

Finally, a `configs()` method has been added to both `SourceTaskContext` and `SinkTaskContext`.  This allows connectors to get configs with variables replaced by the latest values from instances of `ConfigProvider`.

Most of the other changes in the Connect framework are threading various objects through classes to enable the above functionality.

Author: Robert Yokota <rayokota@gmail.com>
Author: Ewen Cheslack-Postava <me@ewencp.org>

Reviewers: Randall Hauch <rhauch@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>

Closes #5068 from rayokota/KAFKA-6886-connect-secrets
2018-05-30 14:43:11 -07:00
Magesh Nandakumar 98094954a2 KAFKA-6776: ConnectRestExtension Interfaces & Rest integration (KIP-285)
This PR provides the implementation for KIP-285 and also a reference implementation for authenticating BasicAuth credentials using JAAS LoginModule

Author: Magesh Nandakumar <magesh.n.kumar@gmail.com>

Reviewers: Randall Hauch <rhauch@gmail.com>, Arjun Satish <wicknicks@users.noreply.github.com>, Konstantine Karantasis <konstantine@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>

Closes #4931 from mageshn/KIP-285
2018-05-29 21:35:22 -07:00
Ron Dagostino 8c5d7e0408 KAFKA-6562: OAuth Authentication via SASL/OAUTHBEARER (KIP-255) (#4994)
This KIP adds the following functionality related to SASL/OAUTHBEARER:

1) Allow clients (both brokers when SASL/OAUTHBEARER is the inter-broker protocol as well as non-broker clients) to flexibly retrieve an access token from an OAuth 2 authorization server based on the declaration of a custom login CallbackHandler implementation and have that access token transparently and automatically transmitted to a broker for authentication.

2) Allow brokers to flexibly validate provided access tokens when a client establishes a connection based on the declaration of a custom SASL Server CallbackHandler implementation.

3) Provide implementations of the above retrieval and validation features based on an unsecured JSON Web Token that function out-of-the-box with minimal configuration required (i.e. implementations of the two types of callback handlers mentioned above will be used by default with no need to explicitly declare them).

4) Allow clients (both brokers when SASL/OAUTHBEARER is the inter-broker protocol as well as non-broker clients) to transparently retrieve a new access token in the background before the existing access token expires in case the client has to open new connections.
2018-05-26 08:18:41 +01:00
Ismael Juma e70a191d30
KAFKA-4423: Drop support for Java 7 (KIP-118) and update deps (#5046)
* Set --source, --target and --release to 1.8.
* Build Scala 2.12 by default.
* Remove some conditionals in the build file now that Java 8
is the minimum version.
* Bump the version of Jetty, Jersey and Checkstyle (the newer
versions require Java 8).
* Fixed issues uncovered by the new version if Checkstyle.
* A couple of minor updates to handle an incompatible source
change in the new version of Jetty.
* Add dependency to jersey-hk2 to fix failing tests caused
by Jersey upgrade.
* Update release script to use Java 8 and to take into account
that Scala 2.12 is now built by default.
* While we're at it, bump the version of Gradle, Gradle plugins,
ScalaLogging, JMH and apache directory api.
* Minor documentation updates including the readme and upgrade
notes. A number of Streams Java 7 examples can be removed
subsequently.
2018-05-21 23:17:42 -07:00
John Roesler ed51b2cdf5 KAFKA-6376; refactor skip metrics in Kafka Streams
* unify skipped records metering
* log warnings when things get skipped
* tighten up metrics usage a bit

### Testing strategy:
Unit testing of the metrics and the logs should be sufficient.

Author: John Roesler <john@confluent.io>

Reviewers: Bill Bejeck <bill@confluent.io>, Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>

Closes #4812 from vvcephei/kip-274-streams-skip-metrics
2018-04-23 11:41:03 -07:00
Colin Patrick McCabe 832b096f4f KAFKA-6696 Trogdor should support destroying tasks (#4759)
Implement destroying tasks and workers.  This means erasing all record of them on the Coordinator and the Agent.

Workers should be identified by unique 64-bit worker IDs, rather than by the names of the tasks they are implementing.  This ensures that when a task is destroyed and re-created with the same task ID, the old workers will be not be treated as part of the new task instance.

Fix some return results from RPCs.  In some cases RPCs were returning values that were never used.  Attempting to re-create the same task ID with different arguments should fail.  Add RequestConflictException to represent HTTP error code 409 (CONFLICT) for this scenario.

If only one worker in a task stops, don't stop all the other workers for that task, unless the worker that stopped had an error.

Reviewers: Anna Povzner <anna@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
2018-04-16 08:51:33 +01:00
Jorge Quilcate Otoya 6a99da87ab KAFKA-6058: KIP-222; Add Consumer Group operations to Admin API
KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-222+-+Add+Consumer+Group+operations+to+Admin+API

Author: Jorge Quilcate Otoya <quilcate.jorge@gmail.com>
Author: Jorge Esteban Quilcate Otoya <quilcate.jorge@gmail.com>
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Guozhang Wang <wangguoz@gmail.com>

Closes #4454 from jeqo/feature/admin-client-describe-consumer-group
2018-04-11 14:17:46 -07:00
Manikumar Reddy O 47918f2d79 KAFKA-6447: Add Delegation Token Operations to KafkaAdminClient (KIP-249) (#4427)
Reviewers: Jun Rao <junrao@gmail.com>
2018-04-11 10:48:04 -07:00
Matthias J. Sax 0c0d8363e5
KAFKA-6054: Fix upgrade path from Kafka Streams v0.10.0 (#4779)
Reviewers: Guozhang Wang <guozhang@confluent.io>, Bill Bejeck <bill@confluent.io>, John Roesler <john@confluent.io>, Damian Guy <damian@confluent.io>
2018-04-06 17:00:52 -07:00
Anna Povzner 5c24295d44 Trogdor's ProducerBench does not fail if topics exists (#4673)
Added configs to ProducerBenchSpec:
topicPrefix: name of topics will be of format topicPrefix + topic index. If not provided, default is "produceBenchTopic".
partitionsPerTopic: number of partitions per topic. If not provided, default is 1.
replicationFactor: replication factor per topic. If not provided, default is 3.

The behavior of producer bench is changed such that if some or all topics already exist (with topic names = topicPrefix + topic index), and they have the same number of partitions as requested, the worker uses those topics and does not fail. The producer bench fails if one or more existing topics has number of partitions that is different from expected number of partitions.

Added unit test for WorkerUtils -- for existing methods and new methods.

Fixed bug in MockAdminClient, where createTopics() would over-write existing topic's replication factor and number of partitions while correctly completing the appropriate futures exceptionally with TopicExistsException.

Reviewers: Colin P. Mccabe <cmccabe@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
2018-03-20 13:51:45 +00:00
Colin Patrick McCabe 9e0e6e43a7 MINOR: Trogdor should not assume an agent co-located with the controller (#4712) 2018-03-16 17:57:38 +00:00
Guozhang Wang f26fbb9adc
MINOR: Rename stream partition assignor to streams partition assignor (#4621)
This is a straight-forward change that make the name of the partition assignor to be aligned with Streams.

Reviewers: Matthias J. Sax <mjsax@apache.org>
2018-02-26 14:39:47 -08:00
Colin Patrick McCabe 66039b1312 MINOR: Fix ConcurrentModificationException in TransactionManager (#4608) 2018-02-22 22:23:14 -08:00
Konstantine Karantasis 17aaff3606 KAFKA-6288: Broken symlink interrupts scanning of the plugin path
Submitting a fail safe fix for rare IOExceptions on symbolic links.

The fix is submitted without a test case since it does seem easy to reproduce such type of failures (just having a broken symbolic link does not reproduce the issue) and it's considered pretty low risk.

If accepted, needs to be ported at least to 1.0, if not 0.11

Author: Konstantine Karantasis <konstantine@confluent.io>

Reviewers: Randall Hauch <rhauch@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>

Closes #4481 from kkonstantine/KAFKA-6288-Broken-symlink-interrupts-scanning-the-plugin-path
2018-02-04 14:57:25 -08:00
Rajini Sivaram 4019b21d60 KAFKA-6246; Dynamic update of listeners and security configs (#4488)
Dynamic update of listeners as described in KIP-226. This includes:
  - Addition of new listeners with listener-prefixed security configs
  - Removal of existing listeners
  - Password encryption
  - sasl.jaas.config property for broker's JAAS config prefixed with listener and mechanism name
2018-02-04 09:19:16 -08:00
Randall Hauch 4c48942f9d KAFKA-5142: Add Connect support for message headers (KIP-145)
**[KIP-145](https://cwiki.apache.org/confluence/display/KAFKA/KIP-145+-+Expose+Record+Headers+in+Kafka+Connect) has been accepted, and this PR implements KIP-145 except without the SMTs.**

Changed the Connect API and runtime to support message headers as described in [KIP-145](https://cwiki.apache.org/confluence/display/KAFKA/KIP-145+-+Expose+Record+Headers+in+Kafka+Connect).

The new `Header` interface defines an immutable representation of a Kafka header (key-value pair) with support for the Connect value types and schemas. This interface provides methods for easily converting between many of the built-in primitive, structured, and logical data types.

The new `Headers` interface defines an ordered collection of headers and is used to track all headers associated with a `ConnectRecord` (and thus `SourceRecord` and `SinkRecord`). This does allow multiple headers with the same key. The `Headers` contains methods for adding, removing, finding, and modifying headers. Convenience methods allow connectors and transforms to easily use and modify the headers for a record.

A new `HeaderConverter` interface is also defined to enable the Connect runtime framework to be able to serialize and deserialize headers between the in-memory representation and Kafka’s byte[] representation. A new `SimpleHeaderConverter` implementation has been added, and this serializes to strings and deserializes by inferring the schemas (`Struct` header values are serialized without the schemas, so they can only be deserialized as `Map` instances without a schema.) The `StringConverter`, `JsonConverter`, and `ByteArrayConverter` have all been extended to also be `HeaderConverter` implementations. Each connector can be configured with a different header converter, although by default the `SimpleHeaderConverter` is used to serialize header values as strings without schemas.

Unit and integration tests are added for `ConnectHeader` and `ConnectHeaders`, the two implementation classes for headers. Additional test methods are added for the methods added to the `Converter` implementations. Finally, the `ConnectRecord` object is already used heavily, so only limited tests need to be added while quite a few of the existing tests already cover the changes.

Author: Randall Hauch <rhauch@gmail.com>

Reviewers: Arjun Satish <arjun@confluent.io>, Ted Yu <yuzhihong@gmail.com>, Magesh Nandakumar <magesh.n.kumar@gmail.com>, Konstantine Karantasis <konstantine@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>

Closes #4319 from rhauch/kafka-5142-b
2018-01-31 10:40:24 -08:00
Matthias J. Sax a78f66a5ae KAFKA-3625: Add public test utils for Kafka Streams (#4402)
* KAFKA-3625: Add public test utils for Kafka Streams
 - add new artifact test-utils
 - add TopologyTestDriver
 - add MockTime, TestRecord, add TestRecordFactory

Reviewers: Guozhang Wang <wangguoz@gmail.com>, Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>
2018-01-29 17:21:48 -08:00
Rajini Sivaram b814a16b96 KAFKA-6241; Enable dynamic updates of broker SSL keystore (#4263)
Enable dynamic broker configuration (see KIP-226 for details). Includes
 - Base implementation to allow specific broker configs and custom configs to be dynamically updated
 - Extend DescribeConfigsRequest/Response to return all synonym configs and their sources in the order of precedence
 - Extend AdminClient to alter dynamic broker configs
 - Dynamic update of SSL keystores

Reviewers: Ted Yu <yuzhihong@gmail.com>, Jason Gustafson <jason@confluent.io>
2018-01-23 08:44:31 -08:00
Manikumar Reddy 27a8d0f9e7 KAFKA-4541; Support for delegation token mechanism
- Add capability to create delegation token
- Add authentication based on delegation token.
- Add capability to renew/expire delegation tokens.
- Add units tests and integration tests

Author: Manikumar Reddy <manikumar.reddy@gmail.com>

Reviewers: Jun Rao <junrao@gmail.com>

Closes #3616 from omkreddy/KAFKA-4541
2018-01-16 09:50:31 -08:00
Manikumar Reddy 488ea4b9fd KAFKA-5647; Use KafkaZkClient in ReassignPartitionsCommand and PreferredReplicaLeaderElectionCommand
*  Use KafkaZkClient in ReassignPartitionsCommand
*  Use KafkaZkClient in PreferredReplicaLeaderElectionCommand
*  Updated test classes to use new methods
*  All existing tests should pass

Author: Manikumar Reddy <manikumar.reddy@gmail.com>

Reviewers: Jun Rao <junrao@gmail.com>

Closes #4260 from omkreddy/KAFKA-5647-ADMINCOMMANDS
2017-12-20 12:19:36 -08:00
Matthias J. Sax 234ec8a8af KAFKA-4857: Replace StreamsKafkaClient with AdminClient in Kafka Streams
Author: Matthias J. Sax <matthias@confluent.io>

Reviewers: Ismael Juma <ismael@juma.me.uk>, Bill Bejeck <bbejeck@gmail.com>, Guozhang Wang <wangguoz@gmail.com>

Closes #4242 from mjsax/kafka-4857-admit-client
2017-12-07 16:16:54 -08:00
Jorge Quilcate Otoya 30f08d158a KAFKA-5520: KIP-171; Extend Consumer Group Reset Offset for Stream Application
KIP: https://cwiki.apache.org/confluence/display/KAFKA/KIP-171+-+Extend+Consumer+Group+Reset+Offset+for+Stream+Application

Merge changes from KIP-198

Ref: https://github.com/apache/kafka/pull/3831

Author: Jorge Quilcate Otoya <quilcate.jorge@gmail.com>
Author: Ismael Juma <ismael@juma.me.uk>
Author: Matthias J. Sax <matthias@confluent.io>
Author: Manikumar Reddy <manikumar.reddy@gmail.com>
Author: Guozhang Wang <wangguoz@gmail.com>
Author: Apurva Mehta <apurva@confluent.io>
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Author: Jason Gustafson <jason@confluent.io>
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Author: Bill Bejeck <bill@confluent.io>
Author: Dong Lin <lindong28@gmail.com>
Author: Soenke Liebau <soenke.liebau@opencore.com>
Author: Colin P. Mccabe <cmccabe@confluent.io>
Author: Damian Guy <damian.guy@gmail.com>
Author: Xavier Léauté <xl+github@xvrl.net>
Author: Maytee Chinavanichkit <maytee.chinavanichkit@linecorp.com>
Author: Joel Hamill <git config --global user.email>
Author: Paolo Patierno <ppatierno@live.com>
Author: siva santhalingam <siva.santhalingam@gmail.com>
Author: Tommy Becker <tobecker@tivo.com>
Author: Mickael Maison <mickael.maison@gmail.com>
Author: Onur Karaman <okaraman@linkedin.com>
Author: tedyu <yuzhihong@gmail.com>
Author: Xin Li <Xin.Li@trivago.com>
Author: Magnus Edenhill <magnus@edenhill.se>
Author: Manjula K <manjula@kafka-summit.org>
Author: Hugo Louro <hmclouro@gmail.com>
Author: Jeff Widman <jeff@jeffwidman.com>
Author: bartdevylder <bartdevylder@gmail.com>
Author: Ewen Cheslack-Postava <me@ewencp.org>
Author: Jacek Laskowski <jacek@japila.pl>
Author: Tom Bentley <tbentley@redhat.com>
Author: Konstantine Karantasis <konstantine@confluent.io>

Reviewers: Matthias J. Sax <matthias@confluent.io>, Guozhang Wang <wangguoz@gmail.com>

Closes #4159 from jeqo/feature/kip-171
2017-12-06 11:38:38 -08:00
Colin P. Mccabe d9cbc6b1a2 KAFKA-5811; Add Kibosh integration for Trogdor and Ducktape
For ducktape: add Kibosh to the testing Dockerfile.
Create files_unreadable_fault_spec.py.

For trogdor: create FilesUnreadableFaultSpec.java.
Add a unit test of using the Kibosh service.

Author: Colin P. Mccabe <cmccabe@confluent.io>

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>

Closes #4195 from cmccabe/KAFKA-5811
2017-11-16 17:59:24 +00:00
Colin P. Mccabe 4fac83ba1f KAFKA-6060; Add workload generation capabilities to Trogdor
Previously, Trogdor only handled "Faults."  Now, Trogdor can handle
"Tasks" which may be either faults, or workloads to execute in the
background.

The Agent and Coordinator have been refactored from a
mutexes-and-condition-variables paradigm into a message passing
paradigm.  No locks are necessary, because only one thread can access
the task state or worker state.  This makes them a lot easier to reason
about.

The MockTime class can now handle mocking deferred message passing
(adding a message to an ExecutorService with a delay).  I added a
MockTimeTest.

MiniTrogdorCluster now starts up Agent and Coordinator classes in
paralle in order to minimize junit test time.

RPC messages now inherit from a common Message.java class.  This class
handles implementing serialization, equals, hashCode, etc.

Remove FaultSet, since it is no longer necessary.

Previously, if CoordinatorClient or AgentClient hit a networking
problem, they would throw an exception.  They now retry several times
before giving up.  Additionally, the REST RPCs to the Coordinator and
Agent have been changed to be idempotent.  If a response is lost, and
the request is resent, no harm will be done.

Author: Colin P. Mccabe <cmccabe@confluent.io>

Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>

Closes #4073 from cmccabe/KAFKA-6060
2017-11-03 09:37:29 +00:00
Randall Hauch 11afff0990 KAFKA-5990: Enable generation of metrics docs for Connect (KIP-196)
A new mechanism was added recently to the Metrics framework to make it easier to generate the documentation. It uses a registry with a MetricsNameTemplate for each metric, and then those templates are used when creating the actual metrics. The metrics framework provides utilities that can generate the HTML documentation from the registry of templates.

This change moves the recently-added Connect metrics over to use these templates and to then generate the metric documentation for Connect.

This PR is based upon #3975 and can be rebased once that has been merged.

Author: Randall Hauch <rhauch@gmail.com>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes #3987 from rhauch/kafka-5990
2017-10-04 11:05:50 -07:00
Rajini Sivaram 021d8a8e96 KAFKA-5746; Add new metrics to support health checks (KIP-188)
Adds new metrics to support health checks:
1. Error rates for each request type, per-error code
2. Request size and temporary memory size
3. Message conversion rate and time
4. Successful and failed authentication rates
5. ZooKeeper latency and status
6. Client version

Author: Rajini Sivaram <rajinisivaram@googlemail.com>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes #3705 from rajinisivaram/KAFKA-5746-new-metrics
2017-09-28 21:58:59 +01:00
Bill Bejeck 271f6b5aec KAFKA-5862: Remove ZK dependency from Streams reset tool, Part I
Author: Bill Bejeck <bill@confluent.io>
Author: bbejeck <bbejeck@gmail.com>

Reviewers: Matthias J. Sax <matthias@confluent.io>, Ted Yu <yuzhihong@gmail.com>, Guozhang Wang <wangguoz@gmail.com>

Closes #3927 from bbejeck/KAFKA-5862_remove_zk_dependency_from_streams_reset_tool
2017-09-23 12:05:16 +08:00
Rajini Sivaram 96ba21e0df KAFKA-5947; Handle authentication failure in admin client, txn producer
1. Raise AuthenticationException for authentication failures in admin client
2. Handle AuthenticationException as a fatal error for transactional producer
3. Add comments to authentication exceptions

Author: Rajini Sivaram <rajinisivaram@googlemail.com>

Reviewers: Vahid Hashemian <vahidhashemian@us.ibm.com>, Ismael Juma <ismael@juma.me.uk>

Closes #3928 from rajinisivaram/KAFKA-5947-auth-failure
2017-09-21 13:58:43 +01:00
Jason Gustafson 0cf7708007 MINOR: Move request/response schemas to the corresponding object representation
This refactor achieves the following:

1. Breaks up the increasingly unmanageable `Protocol` class and moves schemas closer to their actual usage.
2. Removes the need for redundant field identifiers maintained separately in `Protocol` and the respective request/response objects.
3. Provides a better mechanism for sharing common fields between different schemas (e.g. topics, partitions, error codes, etc.).
4. Adds convenience helpers to `Struct` for common patterns (such as setting a field only if it exists).

Author: Jason Gustafson <jason@confluent.io>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes #3813 from hachikuji/protocol-schema-refactor
2017-09-19 05:12:55 +01:00
Rajini Sivaram 8fca432223 KAFKA-4764; Wrap SASL tokens in Kafka headers to improve diagnostics (KIP-152)
SASL handshake protocol changes from KIP-152.

Author: Rajini Sivaram <rajinisivaram@googlemail.com>

Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Manikumar Reddy <manikumar.reddy@gmail.com>

Closes #3708 from rajinisivaram/KAFKA-4764-SASL-diagnostics
2017-09-15 17:16:29 +01:00
Jason Gustafson 3b5d88febb KAFKA-5783; Add KafkaPrincipalBuilder with support for SASL (KIP-189)
Author: Jason Gustafson <jason@confluent.io>

Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>

Closes #3795 from hachikuji/KAFKA-5783
2017-09-14 10:16:00 +01:00
James Cheng 2fb5664bf4 KAFKA-5597: Autogenerate producer sender metrics
Subtask of https://issues.apache.org/jira/browse/KAFKA-3480

The changes are very similar to what was done for the consumer in https://issues.apache.org/jira/browse/KAFKA-5191 (pull request https://github.com/apache/kafka/pull/2993)

Author: James Cheng <jylcheng@yahoo.com>

Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>, Guozhang Wang <wangguoz@gmail.com>

Closes #3535 from wushujames/producer_sender_metrics_docs

Fix one minor naming bug
2017-09-05 17:38:58 -07:00
Colin P. Mccabe 0772fde562 KAFKA-5776; Add the Trogdor fault injection daemon
Author: Colin P. Mccabe <cmccabe@confluent.io>

Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>

Closes #3699 from cmccabe/trogdor-review
2017-08-25 12:29:40 -07:00
huxihx 607c3c21f6 KAFKA-5755; KafkaProducer should be refactored to use LogContext
With LogContext, each producer log item is automatically prefixed with client id and transactional id.

Author: huxihx <huxi_2b@hotmail.com>

Reviewers: Jason Gustafson <jason@confluent.io>

Closes #3703 from huxihx/KAFKA-5755
2017-08-25 10:42:40 -07:00
Ismael Juma ed96523a2c KAFKA-4501; Java 9 compilation and runtime fixes
Compilation error fixes:
- Avoid ambiguity error when appending to Properties in Scala
code (https://github.com/scala/bug/issues/10418)
- Use position() and limit() to fix ambiguity issue (
https://github.com/scala/bug/issues/10418#issuecomment-316364778)
- Disable findBugs if Java 9 is used (
https://github.com/findbugsproject/findbugs/issues/105)

Compilation warning fixes:
- Avoid deprecated Class.newInstance in Utils.newInstance
- Silence a few Java 9 deprecation warnings
- var -> val and unused fixes

Runtime error fixes:
- Introduce Base64 class that works in Java 7 and Java 9

Also:
- Set --release option if building with Java 9

Note that tests involving EasyMock (https://github.com/easymock/easymock/issues/193)
or PowerMock (https://github.com/powermock/powermock/issues/783)
will fail as neither supports Java 9 currently.

Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Jason Gustafson <jason@confluent.io>

Closes #3647 from ijuma/kafka-4501-support-java-9
2017-08-19 08:55:29 +01:00
Randall Hauch 3b1cea60e9 KAFKA-5731; Corrected how the sink task worker updates the last committed offsets
Prior to this change, it was possible for the synchronous consumer commit request to be handled before previously-submitted asynchronous commit requests. If that happened, the out-of-order handlers improperly set the last committed offsets, which then became inconsistent with the offsets the connector task is working with.

This change ensures that the last committed offsets are updated only for the most recent commit request, even if the consumer reorders the calls to the callbacks.

Author: Randall Hauch <rhauch@gmail.com>

Reviewers: Jason Gustafson <jason@confluent.io>

Closes #3662 from rhauch/kafka-5731
2017-08-15 14:16:00 -07:00
Colin P. Mccabe c9ffab1622 KAFKA-5658; Fix AdminClient request timeout handling bug resulting in continual BrokerNotAvailableExceptions
The AdminClient does not properly clear calls from the callsInFlight structure.
Later, in an effort to clear the lingering call objects, it closes the connection
they are associated with. This disrupts new incoming calls, which then get
BrokerNotAvailableException.

This patch fixes this bug by properly removing completed calls from the
callsInFlight structure. It also adds the Call#aborted flag, which
ensures that we throw the right exception (TimeoutException instead of
DisconnectException) and only abort a connection once -- even if there
is a similar bug in the future which causes old Call objects to linger.

Author: Colin P. Mccabe <cmccabe@confluent.io>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes #3584 from cmccabe/KAFKA-5658
2017-08-08 09:38:22 +01:00
radai-rosenblatt 47ee8e954d KAFKA-4602; KIP-72 - Allow putting a bound on memory consumed by Incoming requests
this is the initial implementation.

Author: radai-rosenblatt <radai.rosenblatt@gmail.com>

Reviewers: Ewen Cheslack-Postava <me@ewencp.org>, Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>, Jun Rao <junrao@gmail.com>

Closes #2330 from radai-rosenblatt/broker-memory-pool-with-muting
2017-07-26 08:19:56 +02:00
Matthias J. Sax b04bed022a KAFKA-3856; Cleanup Kafka Stream builder API (KIP-120)
Author: Matthias J. Sax <matthias@confluent.io>

Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bbejeck@gmail.com>

Closes #2301 from mjsax/kafka-3856-topology-builder-API
2017-07-20 18:47:47 +01:00
Ismael Juma a293e1dc0c MINOR: Adjust checkstyle suppression paths to work on Windows
Use the file name whenever possible and replace / with [/\\]
when it's not.

Also remove unnecessary suppresions.

Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Vahid Hashemian <vahidhashemian@us.ibm.com>, Jason Gustafson <jason@confluent.io>

Closes #3431 from ijuma/fix-checkstyle-suppressions-on-windows
2017-06-28 01:47:00 +01:00
Eno Thereska 55a90938a1 MINOR: add Yahoo benchmark to nightly runs
Author: Eno Thereska <eno.thereska@gmail.com>

Reviewers: Damian Guy <damian.guy@gmail.com>

Closes #3289 from enothereska/yahoo-benchmark
2017-06-21 11:46:59 +01:00
Ismael Juma b20d9333be KAFKA-5274: AdminClient Javadoc improvements
Publish Javadoc for common.annotation package, which contains
InterfaceStability.

Finally, mark AdminClient classes with `Evolving` instead of `Unstable`.

Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Colin Mccabe, Gwen Shapira

Closes #3316 from ijuma/kafka-5274-admin-client-javadoc
2017-06-14 08:57:49 -07:00
Matthias J. Sax ba07d828c5 KAFKA-5362: Add EOS system tests for Streams API
Author: Matthias J. Sax <matthias@confluent.io>

Reviewers: Damian Guy <damian.guy@gmail.com>, Bill Bejeck <bill@confluent.io>, Guozhang Wang <wangguoz@gmail.com>

Closes #3201 from mjsax/kafka-5362-add-eos-system-tests-for-streams-api
2017-06-08 14:08:54 -07:00
Ismael Juma 39eb31feae MINOR: Optimise performance of `Topic.validate()`
I included a JMH benchmark and the results follow. The
implementation in this PR takes no more than 1/10th
of the time when compared to trunk. I also included
results for an alternative implementation that is a little
slower than the one in the PR.

Trunk:
```text
TopicBenchmark.testValidate                                topic  avgt   15  134.107 ±  3.956  ns/op
TopicBenchmark.testValidate                    longer-topic-name  avgt   15  316.241 ± 13.379  ns/op
TopicBenchmark.testValidate  very-long-topic-name_with_more_text  avgt   15  636.026 ± 30.272  ns/op
```

Implementation in the PR:
```text
TopicBenchmark.testValidate                                topic  avgt   15  13.153 ± 0.383  ns/op
TopicBenchmark.testValidate                    longer-topic-name  avgt   15  26.139 ± 0.896  ns/op
TopicBenchmark.testValidate  very-long-topic-name.with_more_text  avgt   15  44.829 ± 1.390  ns/op
```

Alternative implementation where boolean validChar = Character.isLetterOrDigit(c) || c == '.' || c == '_' || c == '-';
```text
TopicBenchmark.testValidate                                topic  avgt   15  18.883 ± 1.044  ns/op
TopicBenchmark.testValidate                    longer-topic-name  avgt   15  36.696 ± 1.220  ns/op
TopicBenchmark.testValidate  very-long-topic-name_with_more_text  avgt   15  65.956 ± 0.669  ns/op
```

Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #3234 from ijuma/optimise-topic-is-valid
2017-06-06 03:08:40 +01:00
Ismael Juma c7bc8f7d8c MINOR: Remove redundant volatile write in RecordHeaders
The JMH benchmark included shows that the redundant
volatile write causes the constructor of `ProducerRecord`
to take more than 50% longer:

ProducerRecordBenchmark.constructorBenchmark  avgt   15  24.136 ± 1.458  ns/op (before)
ProducerRecordBenchmark.constructorBenchmark  avgt   15  14.904 ± 0.231  ns/op (after)

Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Jason Gustafson <jason@confluent.io>

Closes #3233 from ijuma/remove-volatile-write-in-records-header-constructor
2017-06-04 10:48:34 -07:00
Colin P. Mccabe f389b71570 KAFKA-5374; Set allow auto topic creation to false when requesting node information only
It avoids the need to handle protocol downgrades and it's safe (i.e. it will never cause
the auto creation of topics).

Author: Colin P. Mccabe <cmccabe@confluent.io>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes #3220 from ijuma/kafka-5374-admin-client-metadata
2017-06-03 06:26:16 +01:00
Mario Molina dc5bf4bd45 KAFKA-5218; New Short serializer, deserializer, serde
Author: Mario Molina <mmolimar@gmail.com>

Reviewers: Matthias J. Sax <matthias@confluent.io>, Damian Guy <damian.guy@gmail.com>, Michael G. Noll <michael@confluent.io>, Guozhang Wang <wangguoz@gmail.com>

Closes #3017 from mmolimar/KAFKA-5218
2017-05-31 15:09:58 -07:00
Colin P. Mccabe da9a171c99 KAFKA-5265; Move ACLs, Config, Topic classes into org.apache.kafka.common
Also introduce TopicConfig.

Author: Colin P. Mccabe <cmccabe@confluent.io>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes #3120 from cmccabe/KAFKA-5265
2017-05-31 17:35:31 +01:00
Xavier Léauté c060c48285 KAFKA-5150; Reduce LZ4 decompression overhead
- reuse decompression buffers in consumer Fetcher
- switch lz4 input stream to operate directly on ByteBuffers
- avoids performance impact of catching exceptions when reaching the end of legacy record batches
- more tests with both compressible / incompressible data, multiple
  blocks, and various other combinations to increase code coverage
- fixes bug that would cause exception instead of invalid block size
  for invalid incompressible blocks
- fixes bug if incompressible flag is set on end frame block size

Overall this improves LZ4 decompression performance by up to 40x for small batches.
Most improvements are seen for batches of size 1 with messages on the order of ~100B.
We see at least 2x improvements for for batch sizes of < 10 messages, containing messages < 10kB

This patch also yields 2-4x improvements on v1 small single message batches for other compression types.

Full benchmark results can be found here
https://gist.github.com/xvrl/05132e0643513df4adf842288be86efd

Author: Xavier Léauté <xavier@confluent.io>
Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>

Closes #2967 from xvrl/kafka-5150
2017-05-31 02:22:07 +01:00
Jason Gustafson dfa3c8a92d KAFKA-5316; LogCleaner should account for larger record sets after cleaning
Author: Jason Gustafson <jason@confluent.io>

Reviewers: Ismael Juma <ismael@juma.me.uk>, Jun Rao <junrao@gmail.com>

Closes #3142 from hachikuji/KAFKA-5316
2017-05-28 09:57:59 -07:00
Rajini Sivaram 73ca0d215e KAFKA-5320: Include all request throttling in client throttle metrics
Author: Rajini Sivaram <rajinisivaram@googlemail.com>

Reviewers: Jun Rao <junrao@gmail.com>, Ismael Juma <ismael@juma.me.uk>

Closes #3137 from rajinisivaram/KAFKA-5320
2017-05-25 20:28:18 +01:00
Matthias J. Sax 495836a4a2 KAFKA-4923: Modify compatibility test for Exaclty-Once Semantics in Streams
- add broker compatibility system tests

Author: Matthias J. Sax <matthias@confluent.io>

Reviewers: Damian Guy, Eno Thereska, Guozhang Wang

Closes #2974 from mjsax/kafka-4923-add-eos-to-streams-add-broker-check-and-system-test
2017-05-21 22:16:18 -07:00
Jiangjie Qin 7fad45557e KAFKA-3995; KIP-126 Allow KafkaProducer to split and resend oversized batches
Author: Jiangjie Qin <becket.qin@gmail.com>

Reviewers: Joel Koshy <jjkoshy.w@gmail.com>, Ismael Juma <ismael@juma.me.uk>

Closes #2638 from becketqin/KAFKA-3995
2017-05-21 17:31:31 -07:00
Xavier Léauté e287523577 KAFKA-5192: add WindowStore range scan (KIP-155)
Implements range scan for keys in windowed and session stores

Modifies caching session and windowed stores to use segmented cache keys.
Cache keys are internally prefixed with their segment id to ensure key ordering in the cache matches the ordering in the underlying store for keys spread across multiple segments.
This should also result in fewer cache keys getting scanned for queries spanning only some segments.

Author: Xavier Léauté <xavier@confluent.io>

Reviewers: Damian Guy, Guozhang Wang

Closes #3027 from xvrl/windowstore-range-scan
2017-05-18 17:02:51 -07:00
Konstantine Karantasis 45f2261763 KAFKA-3487: Support classloading isolation in Connect (KIP-146)
Author: Konstantine Karantasis <konstantine@confluent.io>

Reviewers: Randall Hauch <rhauch@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>

Closes #3028 from kkonstantine/KAFKA-3487-Support-classloading-isolation-in-Connect
2017-05-18 10:39:15 -07:00
Ismael Juma 972b754536 KAFKA-3267; Describe and Alter Configs Admin APIs (KIP-133)
Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Jun Rao <junrao@gmail.com>

Closes #3076 from ijuma/kafka-3267-describe-alter-configs-protocol
2017-05-18 06:51:02 +01:00
Colin P. Mccabe 9815e18fef KAFKA-3266; Describe, Create and Delete ACLs Admin APIs (KIP-140)
Includes server-side code, protocol and AdminClient.

Author: Colin P. Mccabe <cmccabe@confluent.io>

Reviewers: Ismael Juma <ismael@juma.me.uk>

Closes #2941 from cmccabe/KAFKA-3266
2017-05-18 03:20:30 +01:00