kafka/vagrant
Geoff Anderson e43c9aff92 KAFKA-2276; KIP-25 initial patch
Initial patch for KIP-25

Note: to install ducktape, do *not* use pip. Instead:

```
$ git clone git@github.com:confluentinc/ducktape.git
$ cd ducktape
$ python setup.py install
```

Author: Geoff Anderson <geoff@confluent.io>
Author: Geoff <granders@gmail.com>
Author: Liquan Pei <liquanpei@gmail.com>

Reviewers: Ewen, Gwen, Jun, Guozhang

Closes #70 from granders/KAFKA-2276 and squashes the following commits:

a62fb6c [Geoff Anderson] fixed checkstyle errors
a70f0f8 [Geoff Anderson] Merged in upstream trunk.
8b62019 [Geoff Anderson] Merged in upstream trunk.
47b7b64 [Geoff Anderson] Created separate tools jar so that the clients package does not pull in dependencies on the Jackson JSON tools or argparse4j.
a9e6a14 [Geoff Anderson] Merged in upstream changes
d18db7b [Geoff Anderson] fixed :rat errors (needed to add licenses)
321fdf8 [Geoff Anderson] Ignore tests/ and vagrant/ directories when running rat build task
795fc75 [Geoff Anderson] Merged in changes from upstream trunk.
1d93f06 [Geoff Anderson] Updated provisioning to use java 7 in light of KAFKA-2316
2ea4e29 [Geoff Anderson] Tweaked README, changed default log collection behavior on VerifiableProducer
0eb6fdc [Geoff Anderson] Merged in system-tests
69dd7be [Geoff Anderson] Merged in trunk
4034dd6 [Geoff Anderson] Merged in upstream trunk
ede6450 [Geoff] Merge pull request #4 from confluentinc/move_muckrake
7751545 [Geoff Anderson] Corrected license headers
e6d532f [Geoff Anderson] java 7 -> java 6
8c61e2d [Geoff Anderson] Reverted jdk back to 6
f14c507 [Geoff Anderson] Removed mode = "test" from Vagrantfile and Vagrantfile.local examples. Updated testing README to clarify aws setup.
98b7253 [Geoff Anderson] Updated consumer tests to pre-populate kafka logs
e6a41f1 [Geoff Anderson] removed stray println
b15b24f [Geoff Anderson] leftover KafkaBenchmark in super call
0f75187 [Geoff Anderson] Removed stray allow_fail. kafka_benchmark_test -> benchmark_test
f469f84 [Geoff Anderson] Tweaked readme, added example Vagrantfile.local
3d73857 [Geoff Anderson] Merged downstream changes
42dcdb1 [Geoff Anderson] Tweaked behavior of stop_node, clean_node to generally fail fast
7f7c3e0 [Geoff Anderson] Updated setup.py for kafkatest
c60125c [Geoff Anderson] TestEndToEndLatency -> EndToEndLatency
4f476fe [Geoff Anderson] Moved aws scripts to vagrant directory
5af88fc [Geoff Anderson] Updated README to include aws quickstart
e5edf03 [Geoff Anderson] Updated example aws Vagrantfile.local
96533c3 [Geoff] Update aws-access-keys-commands
25a413d [Geoff] Update aws-example-Vagrantfile.local
884b20e [Geoff Anderson] Moved a bunch of files to kafkatest directory
fc7c81c [Geoff Anderson] added setup.py
632be12 [Geoff] Merge pull request #3 from confluentinc/verbose-client
51a94fd [Geoff Anderson] Use argparse4j instead of joptsimple. ThroughputThrottler now has more intuitive behavior when targetThroughput is 0.
a80a428 [Geoff Anderson] Added shell program for VerifiableProducer.
d586fb0 [Geoff Anderson] Updated comments to reflect that throttler is not message-specific
6842ed1 [Geoff Anderson] left out a file from last commit
1228eef [Geoff Anderson] Renamed throttler
9100417 [Geoff Anderson] Updated command-line options for VerifiableProducer. Extracted throughput logic to make it reusable.
0a5de8e [Geoff Anderson] Fixed checkstyle errors. Changed name to VerifiableProducer. Added synchronization for thread safety on println statements.
475423b [Geoff Anderson] Convert class to string before adding to json object.
bc009f2 [Geoff Anderson] Got rid of VerboseProducer in core (moved to clients)
c0526fe [Geoff Anderson] Updates per review comments.
8b4b1f2 [Geoff Anderson] Minor updates to VerboseProducer
2777712 [Geoff Anderson] Added some metadata to producer output.
da94b8c [Geoff Anderson] Added number of messages option.
07cd1c6 [Geoff Anderson] Added simple producer which prints status of produced messages to stdout.
a278988 [Geoff Anderson] fixed typos
f1914c3 [Liquan Pei] Merge pull request #2 from confluentinc/system_tests
81e4156 [Liquan Pei] Bootstrap Kafka system tests
2015-07-28 17:22:14 -07:00
aws KAFKA-2276; KIP-25 initial patch 2015-07-28 17:22:14 -07:00
README.md KAFKA-1173 Using Vagrant to get up and running with Apache Kafka patch by Ewen Cheslack-Postava reviewed by Joe Stein 2014-12-05 08:37:11 -05:00
base.sh KAFKA-2276; KIP-25 initial patch 2015-07-28 17:22:14 -07:00
broker.sh KAFKA-2304 Supported enabling JMX in Kafka Vagrantfile patch by Stevo Slavic reviewed by Ewen Cheslack-Postava 2015-07-07 10:09:11 -07:00
system-test-Vagrantfile.local KAFKA-2276; KIP-25 initial patch 2015-07-28 17:22:14 -07:00
zk.sh KAFKA-2304 Supported enabling JMX in Kafka Vagrantfile patch by Stevo Slavic reviewed by Ewen Cheslack-Postava 2015-07-07 10:09:11 -07:00

README.md

Apache Kafka

Using Vagrant to get up and running.

  1. Install Virtual Box https://www.virtualbox.org/

  2. Install Vagrant >= 1.6.4 http://www.vagrantup.com/

  3. Install Vagrant Plugins:

    Required

    $ vagrant plugin install vagrant-hostmanager

    Optional

    $ vagrant plugin install vagrant-cachier # Caches & shares package downloads across VMs

In the main Kafka folder, do a normal Kafka build:

$ gradle
$ ./gradlew jar

You can override default settings in Vagrantfile.local, which is a Ruby file that is ignored by git and imported into the Vagrantfile. One setting you likely want to enable in Vagrantfile.local is enable_dns = true to put hostnames in the host's /etc/hosts file. You probably want this to avoid having to use IP addresses when addressing the cluster from outside the VMs, e.g. if you run a client on the host. It's disabled by default since it requires sudo access, mucks with your system state, and breaks with naming conflicts if you try to run multiple clusters concurrently.
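For example, a minimal Vagrantfile.local that only turns on host DNS entries would look like the sketch below. This is illustrative only; the variable name comes straight from the paragraph above.

```
# Vagrantfile.local -- plain Ruby, ignored by git and imported into the Vagrantfile.
# Minimal sketch: add zk*/broker* hostnames to the host's /etc/hosts (requires sudo).
enable_dns = true
```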

Now bring up the cluster:

$ vagrant up --no-provision && vagrant provision

We separate out the two steps (bringing up the base VMs and configuring them) due to current limitations in ZooKeeper (ZOOKEEPER-1506) that require us to collect IPs for all nodes before starting ZooKeeper nodes.

Once this completes:

  • Zookeeper will be running on 192.168.50.11 (and zk1 if you used enable_dns)
  • Broker 1 on 192.168.50.51 (and broker1 if you used enable_dns)
  • Broker 2 on 192.168.50.52 (and broker2 if you used enable_dns)
  • Broker 3 on 192.168.50.53 (and broker3 if you used enable_dns)

To log into one of the machines:

vagrant ssh <machineName>

You can access the brokers and zookeeper by their IP or hostname, e.g.

# Specify ZooKeeper node 1 by its IP: 192.168.50.11
bin/kafka-topics.sh --create --zookeeper 192.168.50.11:2181 --replication-factor 3 --partitions 1 --topic sandbox

# Specify brokers by their hostnames: broker1, broker2, broker3
bin/kafka-console-producer.sh --broker-list broker1:9092,broker2:9092,broker3:9092 --topic sandbox

# Specify ZooKeeper node by its hostname: zk1
bin/kafka-console-consumer.sh --zookeeper zk1:2181 --topic sandbox --from-beginning

If you need to update the running cluster, you can re-run the provisioner (the step that installs software and configures services):

vagrant provision

Note that this doesn't currently ensure a fresh start -- old cluster state remains intact after everything restarts. This can be useful for updating the cluster to your most recent development version.

Finally, you can clean up the cluster by destroying all the VMs:

vagrant destroy

Configuration

You can override some default settings by specifying the values in Vagrantfile.local. It is interpreted as a Ruby file, although you'll probably only ever need to change a few simple configuration variables. Some values you might want to override (an example sketch follows this list):

  • enable_dns - Register each VM with a hostname in /etc/hosts on the host. Hostnames are always set in /etc/hosts in the VMs, so this is only necessary if you want to address them conveniently from the host for tasks that aren't provided by Vagrant.
  • num_zookeepers - Size of the ZooKeeper cluster
  • num_brokers - Number of broker instances to run
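
A sketch of a Vagrantfile.local that resizes the cluster using the variables above (the values are illustrative; combine with enable_dns from earlier as needed):

```
# Vagrantfile.local -- illustrative values only
num_zookeepers = 1   # size of the ZooKeeper cluster
num_brokers = 3      # number of broker VMs
```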

Using Other Providers

EC2

Install the vagrant-aws plugin to provide EC2 support:

$ vagrant plugin install vagrant-aws

Next, configure parameters in Vagrantfile.local. A few are required: enable_dns, ec2_access_key, ec2_secret_key, ec2_keypair_name, ec2_keypair_file, and ec2_security_groups. An example sketch combining these appears after the notes below. A couple of important notes:

  1. You definitely want to use enable_dns if you plan to run clients outside of the cluster (e.g. from your local host). If you don't, you'll need to dig the connection details out of vagrant ssh-config.

  2. You'll have to set up a reasonable security group yourself. You'll need to open ports for ZooKeeper (2888 & 3888 between ZK nodes, 2181 for clients) and Kafka (9092). Beware that opening these ports to all sources (e.g. so you can run producers/consumers locally) will allow anyone to access your Kafka cluster. All other settings have reasonable defaults for setting up an Ubuntu-based cluster, but you may want to customize instance type, region, AMI, etc.

  3. ec2_access_key and ec2_secret_key will use the environment variables AWS_ACCESS_KEY and AWS_SECRET_KEY respectively if they are set and not overridden in Vagrantfile.local.

  4. If you're launching into a VPC, you must specify ec2_subnet_id (the subnet in which to launch the nodes) and ec2_security_groups must be a list of security group IDs instead of names, e.g. sg-34fd3551 instead of kafka-test-cluster.
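
Putting the required settings together, a Vagrantfile.local for EC2 might look roughly like the sketch below. This is illustrative only: the keypair name, key file path, and subnet ID are placeholders, the security-group values reuse the example name/ID from note 4, and the access/secret keys are assumed to come from the AWS_ACCESS_KEY / AWS_SECRET_KEY environment variables (note 3).

```
# Vagrantfile.local -- EC2 sketch; all values below are placeholders, not real settings from this repo.
# ec2_access_key / ec2_secret_key are left unset so the AWS_ACCESS_KEY /
# AWS_SECRET_KEY environment variables are used instead (see note 3 above).
enable_dns = true
ec2_keypair_name = "your-keypair"                 # placeholder
ec2_keypair_file = "/path/to/your-keypair.pem"    # placeholder
ec2_security_groups = ["kafka-test-cluster"]      # placeholder; inside a VPC use IDs, e.g. ["sg-34fd3551"]
# ec2_subnet_id = "subnet-0123abcd"               # placeholder; required only when launching into a VPC (note 4)
num_zookeepers = 1
num_brokers = 3
```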

Now start things up, but specify the aws provider:

$ vagrant up --provider=aws --no-parallel --no-provision && vagrant provision

Your instances will be tagged with a name that includes your hostname, making them easy to identify and track in the AWS management console.