Ewen Cheslack-Postava f6acfb0891 KAFKA-2366; Initial patch for Copycat
This is an initial patch implementing the basics of Copycat for KIP-26.

The intent here is to start a review of the key pieces of the core API and get a reasonably functional, baseline, non-distributed implementation of Copycat in place to get things rolling. The current patch has a number of known issues that need to be addressed before a final version:

* Some build-related issues. Specifically, requires some locally-installed dependencies (see below), ignores checkstyle for the runtime data library because it's lifted from Avro currently and likely won't last in its current form, and some Gradle task dependencies aren't quite right because I haven't gotten rid of the dependency on `core` (which should now be an easy patch since new consumer groups are in a much better state).
* This patch currently depends on some Confluent trunk code because I prototyped with our Avro serializers w/ schema-registry support. We need to figure out what we want to provide as an example built-in set of serializers. Unlike core Kafka where we could ignore the issue, providing only ByteArray or String serializers, this is pretty central to how Copycat works.
* This patch uses a hacked up version of Avro as its runtime data format. Not sure if we want to go through the entire API discussion just to get some basic code committed, so I filed KAFKA-2367 to handle that separately. The core connector APIs and the runtime data APIs are entirely orthogonal.
* This patch needs some updates to get aligned with recent new consumer changes (specifically, I'm aware of the ConcurrentModificationException issue on exit). More generally, the new consumer is in flux but Copycat depends on it, so there are likely to be some negative interactions.
* The layout feels a bit awkward to me right now because I ported it from a Maven layout. We don't have nearly the same level of granularity in Kafka currently (core and clients, plus the mostly ignored examples, log4j-appender, and a couple of contribs). We might want to reorganize, although keeping data+api separate from runtime and connector plugins is useful for minimizing dependencies.
* There are a variety of other things (e.g., I'm not happy with the exception hierarchy/how they are currently handled, TopicPartition doesn't really need to be duplicated unless we want Copycat entirely isolated from the Kafka APIs, etc.), but I expect we'll cover those in the review.

Before commenting on the patch, it's probably worth reviewing https://issues.apache.org/jira/browse/KAFKA-2365 and https://issues.apache.org/jira/browse/KAFKA-2366 to get an idea of what I had in mind for a) what we ultimately want with all the Copycat patches and b) what we aim to cover in this initial patch. My hope is that we can use a WIP patch (after the current obvious deficiencies are addressed) while recognizing that we want to make iterative progress with a bunch of subsequent PRs.

Author: Ewen Cheslack-Postava <me@ewencp.org>

Reviewers: Ismael Juma, Gwen Shapira

Closes #99 from ewencp/copycat and squashes the following commits:

a3a47a6 [Ewen Cheslack-Postava] Simplify Copycat exceptions, make them a subclass of KafkaException.
8c108b0 [Ewen Cheslack-Postava] Rename Coordinator to Herder to avoid confusion with the consumer coordinator.
7bf8075 [Ewen Cheslack-Postava] Make Copycat CLI specific to standalone mode, clean up some config and get rid of config storage in standalone mode.
656a003 [Ewen Cheslack-Postava] Clarify and expand the explanation of the Copycat Coordinator interface.
c0e5fdc [Ewen Cheslack-Postava] Merge remote-tracking branch 'origin/trunk' into copycat
0fa7a36 [Ewen Cheslack-Postava] Mark Copycat classes as unstable and reduce visibility of some classes where possible.
d55d31e [Ewen Cheslack-Postava] Reorganize Copycat code to put it all under one top-level directory.
b29cb2c [Ewen Cheslack-Postava] Merge remote-tracking branch 'origin/trunk' into copycat
d713a21 [Ewen Cheslack-Postava] Address Gwen's review comments.
6787a85 [Ewen Cheslack-Postava] Make Converter generic to match serializers since some serialization formats do not require a base class of Object; update many other classes to have generic key and value class type parameters to match this change.
b194c73 [Ewen Cheslack-Postava] Split Copycat converter option into two options for key and value.
0b5a1a0 [Ewen Cheslack-Postava] Normalize naming to use partition for both source and Kafka, adjusting naming in CopycatRecord classes to clearly differentiate.
e345142 [Ewen Cheslack-Postava] Remove Copycat reflection utils, use existing Utils and ConfigDef functionality from clients package.
be5c387 [Ewen Cheslack-Postava] Minor cleanup
122423e [Ewen Cheslack-Postava] Style cleanup
6ba87de [Ewen Cheslack-Postava] Remove most of the Avro-based mock runtime data API, only preserving enough schema functionality to support basic primitive types for an initial patch.
4674d13 [Ewen Cheslack-Postava] Address review comments, clean up some code styling.
25b5739 [Ewen Cheslack-Postava] Fix sink task offset commit concurrency issue by moving it to the worker thread and waking up the consumer to ensure it exits promptly.
0aefe21 [Ewen Cheslack-Postava] Add log4j settings for Copycat.
220e42d [Ewen Cheslack-Postava] Replace Avro serializer with JSON serializer.
1243a7c [Ewen Cheslack-Postava] Merge remote-tracking branch 'origin/trunk' into copycat
5a618c6 [Ewen Cheslack-Postava] Remove offset serializers, instead reusing the existing serializers and removing schema projection support.
e849e10 [Ewen Cheslack-Postava] Remove duplicated TopicPartition implementation.
dec1379 [Ewen Cheslack-Postava] Switch to using new consumer coordinator instead of manually assigning partitions. Remove dependency of copycat-runtime on core.
4a9b4f3 [Ewen Cheslack-Postava] Add some helpful Copycat-specific build and test targets that cover all Copycat packages.
31cd1ca [Ewen Cheslack-Postava] Add CLI tools for Copycat.
e14942c [Ewen Cheslack-Postava] Add Copycat file connector.
0233456 [Ewen Cheslack-Postava] Add copycat-avro and copycat-runtime
11981d2 [Ewen Cheslack-Postava] Add copycat-data and copycat-api
2015-08-14 16:00:51 -07:00
bin KAFKA-2366; Initial patch for Copycat 2015-08-14 16:00:51 -07:00
checkstyle KAFKA-2366; Initial patch for Copycat 2015-08-14 16:00:51 -07:00
clients/src KAFKA-2366; Initial patch for Copycat 2015-08-14 16:00:51 -07:00
config KAFKA-2366; Initial patch for Copycat 2015-08-14 16:00:51 -07:00
contrib KAFKA-2140 Improve code readability; reviewed by Neha Narkhede 2015-04-26 08:40:58 -07:00
copycat KAFKA-2366; Initial patch for Copycat 2015-08-14 16:00:51 -07:00
core/src KAFKA-2406: Throttle ISR propagation 2015-08-13 17:54:36 -04:00
dev-utils KAFKA-2153 kafka-patch-review tool uploads a patch even if it is empty; reviewed by Neha Narkhede, Gwen Shapira 2015-05-04 11:58:53 -07:00
examples KAFKA-2140 Improve code readability; reviewed by Neha Narkhede 2015-04-26 08:40:58 -07:00
gradle kafka-2248; Use Apache Rat to enforce copyright headers; patched by Ewen Cheslack-Postava; reviewed by Gwen Shapira, Joel Joshy and Jun Rao 2015-07-06 15:47:40 -07:00
log4j-appender/src kafka-2132; Move Log4J appender to a separate module; patched by Ashish Singh; reviewed by Gwen Shapira, Aditya Auradkar and Jun Rao 2015-07-06 16:36:20 -07:00
system_test kafka-2005; Generate html report for system tests; patched by Ashish Singh; reviewed by Jun Rao 2015-06-11 15:27:51 -07:00
tests KAFKA-2408: ConsoleConsumerService direct log output to file 2015-08-11 15:24:52 -07:00
tools/src/main/java/org/apache/kafka/clients/tools KAFKA-2276; KIP-25 initial patch 2015-07-28 17:22:14 -07:00
vagrant KAFKA-2276; KIP-25 initial patch 2015-07-28 17:22:14 -07:00
.gitignore MINOR: Added to .gitignore Kafka server logs directory 2015-08-03 14:12:00 -07:00
.reviewboardrc KAFKA-1053 Kafka patch review tool that integrates JIRA and reviewboard; reviewed by Joel Koshy, Swapnil Ghike and Guozhang Wang 2013-09-17 20:48:15 -07:00
CONTRIBUTING.md KAFKA-2321; Introduce CONTRIBUTING.md 2015-07-27 10:54:23 -07:00
HEADER trivial fix to add missing license header using .gradlew licenseFormatMain and ./gradlew licenseFormatTest; patched by Jun Rao 2014-02-07 14:19:06 -08:00
LICENSE KAFKA-1254 remove vestigial sbt patch by Joe Stein; reviewed by Jun Rao 2014-02-20 00:11:31 -05:00
NOTICE KAFKA-533 changes to NOTICE and LICENSE related to KAFKA-534 removing client libraries from repo 2012-09-27 00:24:56 +00:00
README.md KAFKA-2348; Drop support for Scala 2.9 2015-07-24 09:19:59 -07:00
Vagrantfile MINOR: expose vagrant base box as variable 2015-08-13 15:22:54 -07:00
build.gradle KAFKA-2366; Initial patch for Copycat 2015-08-14 16:00:51 -07:00
doap_Kafka.rdf trivial change to add kafka doap project file 2014-04-11 21:29:09 -07:00
gradle.properties KAFKA-2199 Make signing artifacts optional and disabled by 2015-05-29 14:50:45 -07:00
gradlew trivial change to README to make the gradle wrapper download clearer 2014-09-23 14:39:10 -07:00
gradlew.bat trivial change to README to make the gradle wrapper download clearer 2014-09-23 14:39:10 -07:00
kafka-merge-pr.py KAFKA-2430; Listing of PR commits in commit message should be optional 2015-08-13 10:35:09 -07:00
kafka-patch-review.py kafka-2248; Use Apache Rat to enforce copyright headers; patched by Ewen Cheslack-Postava; reviewed by Gwen Shapira, Joel Joshy and Jun Rao 2015-07-06 15:47:40 -07:00
scala.gradle kafka-2248; Use Apache Rat to enforce copyright headers; patched by Ewen Cheslack-Postava; reviewed by Gwen Shapira, Joel Joshy and Jun Rao 2015-07-06 15:47:40 -07:00
settings.gradle KAFKA-2366; Initial patch for Copycat 2015-08-14 16:00:51 -07:00
wrapper.gradle KAFKA-1490 remove gradlew initial setup output from source distribution patch by Ivan Lyutov reviewed by Joe Stein 2014-09-23 12:46:02 -04:00

README.md

Apache Kafka

See our web site for details on the project.

You need to have gradle installed.

First bootstrap and download the wrapper

cd kafka_source_dir
gradle

Now everything else will work

Building a jar and running it

./gradlew jar  

Follow instructions in http://kafka.apache.org/documentation.html#quickstart

Building source jar

./gradlew srcJar

Building javadocs and scaladocs

./gradlew javadoc
./gradlew javadocJar # builds a jar from the javadocs
./gradlew scaladoc
./gradlew scaladocJar # builds a jar from the scaladocs
./gradlew docsJar # builds both javadoc and scaladoc jar

Running unit tests

./gradlew test

Forcing re-running unit tests w/o code change

./gradlew cleanTest test

Running a particular unit test

./gradlew -Dtest.single=RequestResponseSerializationTest core:test

Running a particular test method within a unit test

./gradlew core:test --tests kafka.api.test.ProducerFailureHandlingTest.testCannotSendToInternalTopic
./gradlew clients:test --tests org.apache.kafka.clients.producer.MetadataTest.testMetadataUpdateWaitTime

Running a particular unit test with log4j output

Change the log4j setting in either clients/src/test/resources/log4j.properties or core/src/test/resources/log4j.properties, then run:
./gradlew -i -Dtest.single=RequestResponseSerializationTest core:test

Building a binary release gzipped tarball

./gradlew clean
./gradlew releaseTarGz  
The above command will fail if you haven't set up the signing key. To bypass signing the artifact, you can run
./gradlew releaseTarGz -x signArchives

The release file can be found inside ./core/build/distributions/.

Cleaning the build

./gradlew clean

Running a task on a particular version of Scala (either 2.10.5 or 2.11.7)

(If you build a jar with a Scala version other than 2.10, set the SCALA_BINARY_VERSION variable, or change it in bin/kafka-run-class.sh, before running the quick start.)

./gradlew -PscalaVersion=2.11.7 jar
./gradlew -PscalaVersion=2.11.7 test
./gradlew -PscalaVersion=2.11.7 releaseTarGz
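
The note above about SCALA_BINARY_VERSION can be handled with a small shell helper. This is only a sketch: the function name `scala_binary_version` is hypothetical (it is not part of the Kafka scripts), and it assumes the binary version is the full version with its last component dropped, so 2.11.7 becomes 2.11.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Kafka scripts): derive the Scala
# binary version by stripping the final ".patch" component of the version.
scala_binary_version() {
    printf '%s\n' "${1%.*}"
}

# Assumed full version, matching the -PscalaVersion value used for the build.
SCALA_VERSION=2.11.7
SCALA_BINARY_VERSION=$(scala_binary_version "$SCALA_VERSION")
export SCALA_BINARY_VERSION
echo "SCALA_BINARY_VERSION=$SCALA_BINARY_VERSION"
```

With the variable exported in the same shell, the quick start scripts under bin/ should pick up jars built for that Scala version.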

Running a task for a specific project

This is for 'core', 'contrib:hadoop-consumer', 'contrib:hadoop-producer', 'examples' and 'clients'

./gradlew core:jar
./gradlew core:test

Listing all gradle tasks

./gradlew tasks

Building IDE project

./gradlew eclipse
./gradlew idea

Building the jar for all Scala versions and for all projects

./gradlew jarAll

Running unit tests for all Scala versions and for all projects

./gradlew testAll

Building a binary release gzipped tarball for all Scala versions

./gradlew releaseTarGzAll

Publishing the jar for all versions of Scala and for all projects to Maven

./gradlew uploadArchivesAll

Please note that for this to work you should create or update ~/.gradle/gradle.properties and assign the following variables:

mavenUrl=
mavenUsername=
mavenPassword=
signing.keyId=
signing.password=
signing.secretKeyRingFile=
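
One way to start that file is to generate an empty template and fill in the values by hand. The sketch below is an assumption-laden example, not part of the Kafka build: the function name `write_gradle_properties_template` is hypothetical, it refuses to overwrite an existing file, and it is demonstrated against a scratch directory rather than the real home directory.

```shell
#!/bin/sh
# Hypothetical helper: write an empty template of the properties listed above.
# The target path is an argument, so the sketch never touches a real
# ~/.gradle/gradle.properties unless you point it there on purpose.
write_gradle_properties_template() {
    target=$1
    if [ -e "$target" ]; then
        echo "refusing to overwrite $target" >&2
        return 1
    fi
    mkdir -p "$(dirname "$target")"
    # Tighten the umask in a subshell so the new file is owner-only,
    # since it will eventually hold credentials.
    (
        umask 077
        cat > "$target" <<'EOF'
mavenUrl=
mavenUsername=
mavenPassword=
signing.keyId=
signing.password=
signing.secretKeyRingFile=
EOF
    )
}

# Demonstrate against a scratch directory instead of the real home directory.
demo_dir=$(mktemp -d)
write_gradle_properties_template "$demo_dir/.gradle/gradle.properties"
cat "$demo_dir/.gradle/gradle.properties"
```

To use it for real, pass "$HOME/.gradle/gradle.properties" as the argument and then edit the values in place.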

Publishing the jars without signing to a local repository

./gradlew -Dorg.gradle.project.skipSigning=true -Dorg.gradle.project.mavenUrl=file://path/to/repo uploadArchivesAll
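
The path/to/repo part of the URL above is a placeholder. As a sketch, assuming the local repository lives in a scratch directory created with mktemp (the variable names `repo_dir` and `maven_url` are made up for this example), an absolute path yields a URL with three slashes after `file:`.

```shell
#!/bin/sh
# Assumption for this sketch: publish into a throwaway directory.
repo_dir=$(mktemp -d)

# An absolute path already starts with "/", so prefixing "file://" gives
# a URL of the form file:///...
maven_url="file://$repo_dir"
echo "$maven_url"
```

The resulting value can then be passed in place of the placeholder, for example as -Dorg.gradle.project.mavenUrl="$maven_url" on the uploadArchivesAll command line.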

Building the test jar

./gradlew testJar

Determining how transitive dependencies are added

./gradlew core:dependencies --configuration runtime

Running checkstyle on the Java code

./gradlew checkstyleMain checkstyleTest

Running in Vagrant

See vagrant/README.md.

Contribution

Apache Kafka is interested in building the community; we would welcome any thoughts or patches. You can reach us on the Apache mailing lists.

To contribute follow the instructions here:

We also welcome patches for the website and documentation which can be found here: