mirror of https://github.com/apache/kafka.git
MINOR: Add vagrant up wrapper for simple parallel bringup on aws
The main impediment to bringing up aws machines in parallel using vagrant was the interaction between `vagrant-hostmanager` and `vagrant-aws`. If you disable hostmanager during the `up` phase, and run it after the cluster is up, parallel bringup is possible. The only caveat is that machines must be brought up in small-ish batches to prevent rate limit errors from AWS since `vagrant-aws` doesn't seem to have mechanisms to

This PR:
- disables `vagrant-hostmanager` during bringup
- adds a wrapper script to make it convenient to bring machines up in batches on aws

Author: Geoff Anderson <geoff@confluent.io>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes #982 from granders/vagrant-disable-hostmanager
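
The ordering described in this commit message (boot with hostmanager disabled, map hostnames afterwards, then provision) might be sketched roughly as below. This is an illustration only, not the wrapper script from this commit: `run`, `bring_up_sketch`, and `DRY_RUN` are hypothetical names introduced here, and the sketch assumes `enable_hostmanager = false` is already set in `Vagrantfile.local`. By default it only prints the commands.

```shell
#!/usr/bin/env bash
# Sketch under stated assumptions; prints commands unless DRY_RUN=false.
set -o nounset
set -o errexit

# DRY_RUN is a hypothetical switch for this example only.
DRY_RUN="${DRY_RUN:-true}"

# run: hypothetical helper that echoes a command, or executes it
# when DRY_RUN is set to "false".
run() {
    if [[ "$DRY_RUN" == "true" ]]; then
        echo "+ $*"
    else
        "$@"
    fi
}

bring_up_sketch() {
    # 1. Boot the machines without provisioning. hostmanager is assumed
    #    to be disabled in Vagrantfile.local, so it does not interfere.
    run vagrant up --provider=aws --no-provision
    # 2. Once every machine exists, map hostnames in one pass.
    run vagrant hostmanager
    # 3. Provision now that all hosts can resolve each other.
    run vagrant provision
}

bring_up_sketch
```

Running the sketch with `DRY_RUN=false` would execute the three commands for real, so it is best tried in dry-run mode first.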
parent bfac36ad0e
commit c188a68e2b
|
@ -22,6 +22,8 @@ VAGRANTFILE_API_VERSION = "2"
|
|||
|
||||
# General config
|
||||
enable_dns = false
|
||||
# Override to false when bringing up a cluster on AWS
|
||||
enable_hostmanager = true
|
||||
enable_jmx = false
|
||||
num_zookeepers = 1
|
||||
num_brokers = 3
|
||||
|
@ -55,7 +57,7 @@ end
|
|||
|
||||
# TODO(ksweeney): RAM requirements are not empirical and can probably be significantly lowered.
|
||||
Vagrant.configure(VAGRANTFILE_API_VERSION) do |config|
|
||||
config.hostmanager.enabled = true
|
||||
config.hostmanager.enabled = enable_hostmanager
|
||||
config.hostmanager.manage_host = enable_dns
|
||||
config.hostmanager.include_offline = false
|
||||
|
||||
|
|
|
@ -26,7 +26,8 @@ https://cwiki.apache.org/confluence/display/KAFKA/tutorial+-+set+up+and+run+Kafk
|
|||
|
||||
* Bring up the test cluster
|
||||
|
||||
$ vagrant up
|
||||
$ vagrant/vagrant-up.sh
|
||||
$ # When using VirtualBox, it also works to run: vagrant up
|
||||
|
||||
* Build the desired branch of Kafka
|
||||
|
||||
|
@ -111,6 +112,7 @@ the test driver machine.
|
|||
ec2_instance_type = "..." # Pick something appropriate for your
|
||||
# test. Note that the default m3.medium has
|
||||
# a small disk.
|
||||
enable_hostmanager = false
|
||||
num_zookeepers = 0
|
||||
num_kafka = 0
|
||||
num_workers = 9
|
||||
|
@ -120,9 +122,10 @@ the test driver machine.
|
|||
ec2_region = 'us-west-2'
|
||||
ec2_ami = "ami-29ebb519"
|
||||
|
||||
* Start up the instances (note we have found bringing up machines in parallel can cause errors on aws):
|
||||
* Start up the instances:
|
||||
|
||||
$ vagrant up --provider=aws --no-provision --no-parallel && vagrant provision
|
||||
# This will bring up worker machines in small parallel batches
|
||||
$ vagrant/vagrant-up.sh --aws
|
||||
|
||||
* Now you should be able to run tests:
|
||||
|
||||
|
|
|
@ -27,11 +27,15 @@ clusters concurrently.
|
|||
|
||||
Now bring up the cluster:
|
||||
|
||||
$ vagrant up --no-provision && vagrant provision
|
||||
$ vagrant/vagrant-up.sh
|
||||
$ # If on aws, run: vagrant/vagrant-up.sh --aws
|
||||
|
||||
We separate out the two steps (bringing up the base VMs and configuring them)
|
||||
(This essentially runs vagrant up --no-provision && vagrant hostmanager && vagrant provision)
|
||||
|
||||
We separate out the steps (bringing up the base VMs, mapping hostnames, and configuring the VMs)
|
||||
due to current limitations in ZooKeeper (ZOOKEEPER-1506) that require us to
|
||||
collect IPs for all nodes before starting ZooKeeper nodes.
|
||||
collect IPs for all nodes before starting ZooKeeper nodes. Breaking into multiple steps
|
||||
also allows us to bring machines up in parallel on AWS.
|
||||
|
||||
Once this completes:
|
||||
|
||||
|
@ -66,7 +70,7 @@ the cluster to your most recent development version.
|
|||
|
||||
Finally, you can clean up the cluster by destroying all the VMs:
|
||||
|
||||
vagrant destroy
|
||||
vagrant destroy -f
|
||||
|
||||
## Configuration ##
|
||||
|
||||
|
@ -75,6 +79,7 @@ You can override some default settings by specifying the values in
|
|||
only ever need to change a few simple configuration variables. Some values you
|
||||
might want to override:
|
||||
|
||||
* `enable_hostmanager` - true by default; override to false if on AWS to allow parallel cluster bringup.
|
||||
* `enable_dns` - Register each VM with a hostname in /etc/hosts on the
|
||||
hosts. Hostnames are always set in the /etc/hosts in the VMs, so this is only
|
||||
necessary if you want to address them conveniently from the host for tasks
|
||||
|
@ -96,7 +101,7 @@ Install the `vagrant-aws` plugin to provide EC2 support:
|
|||
$ vagrant plugin install vagrant-aws
|
||||
|
||||
Next, configure parameters in `Vagrantfile.local`. A few are *required*:
|
||||
`enable_dns`, `ec2_access_key`, `ec2_secret_key`, `ec2_keypair_name`, `ec2_keypair_file`, and
|
||||
`enable_hostmanager`, `enable_dns`, `ec2_access_key`, `ec2_secret_key`, `ec2_keypair_name`, `ec2_keypair_file`, and
|
||||
`ec2_security_groups`. A couple of important notes:
|
||||
|
||||
1. You definitely want to use `enable_dns` if you plan to run clients outside of
|
||||
|
@ -122,7 +127,7 @@ Next, configure parameters in `Vagrantfile.local`. A few are *required*:
|
|||
|
||||
Now start things up, but specify the aws provider:
|
||||
|
||||
$ vagrant up --provider=aws --no-parallel --no-provision && vagrant provision
|
||||
$ vagrant/vagrant-up.sh --aws
|
||||
|
||||
Your instances should get tagged with a name including your hostname to make
|
||||
them identifiable and make it easier to track instances in the AWS management
|
||||
|
|
|
@ -18,6 +18,7 @@
|
|||
# To use it, move it to the base kafka directory and rename
|
||||
# it to Vagrantfile.local, and adjust variables as needed.
|
||||
ec2_instance_type = "m3.xlarge"
|
||||
enable_hostmanager = false
|
||||
num_zookeepers = 0
|
||||
num_brokers = 0
|
||||
num_workers = 9
|
||||
|
|
|
@ -0,0 +1,237 @@
|
|||
# Licensed to the Apache Software Foundation (ASF) under one or more
|
||||
# contributor license agreements. See the NOTICE file distributed with
|
||||
# this work for additional information regarding copyright ownership.
|
||||
# The ASF licenses this file to You under the Apache License, Version 2.0
|
||||
# (the "License"); you may not use this file except in compliance with
|
||||
# the License. You may obtain a copy of the License at
|
||||
#
|
||||
# http://www.apache.org/licenses/LICENSE-2.0
|
||||
#
|
||||
# Unless required by applicable law or agreed to in writing, software
|
||||
# distributed under the License is distributed on an "AS IS" BASIS,
|
||||
# WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
|
||||
# See the License for the specific language governing permissions and
|
||||
# limitations under the License.
|
||||
|
||||
#! /usr/bin/env bash
|
||||
|
||||
set -o nounset
|
||||
set -o errexit # exit script if any command exits with nonzero value
|
||||
|
||||
readonly PROG_NAME=$(basename "$0")
|
||||
readonly PROG_DIR=$(dirname "$(realpath "$0")")
|
||||
readonly INVOKE_DIR=$(pwd)
|
||||
readonly ARGS="$*"
|
||||
|
||||
# overrideable defaults
|
||||
AWS=false
|
||||
PARALLEL=true
|
||||
MAX_PARALLEL=5
|
||||
|
||||
readonly USAGE="Usage: $PROG_NAME [-h | --help] [--aws [--no-parallel] [--max-parallel MAX]]"
|
||||
readonly HELP="$(cat <<EOF
|
||||
Tool to bring up a vagrant cluster on local machine or aws.
|
||||
|
||||
-h | --help Show this help message
|
||||
--aws Use if you are running in aws
|
||||
--no-parallel Bring up machines serially instead of in parallel. Only applicable on aws
|
||||
--max-parallel MAX Maximum number of machines to bring up in parallel. Note: only applicable on test worker machines on aws. default: $MAX_PARALLEL
|
||||
|
||||
Roughly speaking, this wrapper script wraps 2 commands:
|
||||
vagrant up
|
||||
vagrant hostmanager
|
||||
|
||||
The situation on aws is complicated by the fact that aws imposes a maximum request rate,
|
||||
which effectively caps the number of machines we are able to bring up in parallel. Therefore, on aws,
|
||||
this wrapper script attempts to bring up machines in small batches.
|
||||
|
||||
If you are seeing rate limit exceeded errors, you may need to use a reduced --max-parallel setting.
|
||||
|
||||
EOF
|
||||
)"
|
||||
|
||||
function help {
|
||||
echo "$USAGE"
|
||||
echo "$HELP"
|
||||
exit 0
|
||||
}
|
||||
|
||||
while [[ $# -gt 0 ]]; do
|
||||
key="$1"
|
||||
case $key in
|
||||
-h | --help)
|
||||
help
|
||||
;;
|
||||
--aws)
|
||||
AWS=true
|
||||
;;
|
||||
--no-parallel)
|
||||
PARALLEL=false
|
||||
;;
|
||||
--max-parallel)
|
||||
MAX_PARALLEL="$2"
|
||||
shift
|
||||
;;
|
||||
*)
|
||||
# unknown option
|
||||
echo "Unknown option $1"
|
||||
exit 1
|
||||
;;
|
||||
esac
|
||||
shift # past argument or value
|
||||
done
|
||||
|
||||
# Get a list of vagrant machines (in any state)
|
||||
function read_vagrant_machines {
|
||||
local ignore_state="ignore"
|
||||
local reading_state="reading"
|
||||
local tmp_file="tmp-$RANDOM"
|
||||
|
||||
local state="$ignore_state"
|
||||
local machines=""
|
||||
|
||||
while read -r line; do
|
||||
# Lines before the first empty line are ignored
|
||||
# The first empty line triggers change from ignore state to reading state
|
||||
# When in reading state, we parse in machine names until we hit the next empty line,
|
||||
# which signals that we're done parsing
|
||||
if [[ -z "$line" ]]; then
|
||||
if [[ "$state" == "$ignore_state" ]]; then
|
||||
state="$reading_state"
|
||||
else
|
||||
# all done
|
||||
echo "$machines"
|
||||
return
|
||||
fi
|
||||
continue
|
||||
fi
|
||||
|
||||
# Parse machine name while in reading state
|
||||
if [[ "$state" == "$reading_state" ]]; then
|
||||
line=$(echo "$line" | cut -d ' ' -f 1)
|
||||
if [[ -z "$machines" ]]; then
|
||||
machines="$line"
|
||||
else
|
||||
machines="${machines} ${line}"
|
||||
fi
|
||||
fi
|
||||
done < <(vagrant status)
|
||||
}
|
||||
|
||||
# Filter "list", returning a list of strings containing pattern as a substring
|
||||
function filter {
|
||||
local list="$1"
|
||||
local pattern="$2"
|
||||
|
||||
local result=""
|
||||
for item in $list; do
|
||||
if [[ -n "$(echo "$item" | grep "$pattern")" ]]; then
|
||||
result="$result $item"
|
||||
fi
|
||||
done
|
||||
echo "$result"
|
||||
}
|
||||
|
||||
# Given a list of machine names, return only test worker machines
|
||||
function worker {
|
||||
local machines="$1"
|
||||
local workers=$(filter "$machines" "worker")
|
||||
workers=$(echo "$workers" | xargs) # trim leading/trailing whitespace
|
||||
echo "$workers"
|
||||
}
|
||||
|
||||
# Given a list of machine names, return only zookeeper and broker machines
|
||||
function zk_broker {
|
||||
local machines="$1"
|
||||
local zk_broker_list=$(filter "$machines" "zk")
|
||||
zk_broker_list="$zk_broker_list $(filter "$machines" "broker")"
|
||||
zk_broker_list=$(echo "$zk_broker_list" | xargs) # trim leading/trailing whitespace
|
||||
echo "$zk_broker_list"
|
||||
}
|
||||
|
||||
# Run a vagrant command on batches of machines of size $group_size
|
||||
# This is annoying but necessary on aws to avoid errors due to AWS request rate
|
||||
# throttling
|
||||
#
|
||||
# Example
|
||||
# $ vagrant_batch_command "vagrant up" "m1 m2 m3 m4 m5" "2"
|
||||
#
|
||||
# This is equivalent to running "vagrant up" on groups of machines of size 2 or less, i.e.:
|
||||
# $ vagrant up m1 m2
|
||||
# $ vagrant up m3 m4
|
||||
# $ vagrant up m5
|
||||
function vagrant_batch_command {
|
||||
local vagrant_cmd="$1"
|
||||
local machines="$2"
|
||||
local group_size="$3"
|
||||
|
||||
local count=1
|
||||
local m_group=""
|
||||
# Using --provision flag makes this command useable both when bringing up a cluster from scratch,
|
||||
# and when bringing up a halted cluster. Permissions on certain directories set during provisioning
|
||||
# seem to revert when machines are halted, so --provision ensures permissions are set correctly in all cases
|
||||
for machine in $machines; do
|
||||
m_group="$m_group $machine"
|
||||
|
||||
if (( count % group_size == 0 )); then
|
||||
# We've reached a full group
|
||||
# Bring up this part of the cluster
|
||||
$vagrant_cmd $m_group
|
||||
m_group=""
|
||||
fi
|
||||
((count++))
|
||||
done
|
||||
|
||||
# Take care of any leftover partially complete group
|
||||
if [[ ! -z "$m_group" ]]; then
|
||||
$vagrant_cmd $m_group
|
||||
fi
|
||||
}
|
||||
|
||||
# We assume vagrant-hostmanager is installed, but may or may not be disabled during vagrant up
|
||||
# In this fashion, we ensure we run hostmanager after machines are up, and before provisioning.
|
||||
# This sequence of commands is necessary for example for bringing up a multi-node zookeeper cluster
|
||||
function bring_up_local {
|
||||
vagrant up --no-provision
|
||||
vagrant hostmanager
|
||||
vagrant provision
|
||||
}
|
||||
|
||||
function bring_up_aws {
|
||||
local parallel="$1"
|
||||
local max_parallel="$2"
|
||||
local machines="$(read_vagrant_machines)"
|
||||
|
||||
zk_broker_machines=$(zk_broker "$machines")
|
||||
worker_machines=$(worker "$machines")
|
||||
|
||||
if [[ "$parallel" == "true" ]]; then
|
||||
if [[ ! -z "$zk_broker_machines" ]]; then
|
||||
# We still have to bring up zookeeper/broker nodes serially
|
||||
echo "Bringing up zookeeper/broker machines serially"
|
||||
vagrant up --provider=aws --no-parallel --no-provision $zk_broker_machines
|
||||
vagrant hostmanager
|
||||
vagrant provision
|
||||
fi
|
||||
|
||||
if [[ ! -z "$worker_machines" ]]; then
|
||||
echo "Bringing up test worker machines in parallel"
|
||||
vagrant_batch_command "vagrant up --provider=aws" "$worker_machines" "$max_parallel"
|
||||
vagrant hostmanager
|
||||
fi
|
||||
else
|
||||
vagrant up --provider=aws --no-parallel --no-provision
|
||||
vagrant hostmanager
|
||||
vagrant provision
|
||||
fi
|
||||
}
|
||||
|
||||
function main {
|
||||
if [[ "$AWS" == "true" ]]; then
|
||||
bring_up_aws "$PARALLEL" "$MAX_PARALLEL"
|
||||
else
|
||||
bring_up_local
|
||||
fi
|
||||
}
|
||||
|
||||
main
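
The batching idea behind `vagrant_batch_command` can be tried in isolation, without touching vagrant at all. The following is a minimal sketch, not the code from this commit: `batch_lines` is a hypothetical helper that only prints one line per batch, so the grouping logic can be checked before any real command is attached to it.

```shell
#!/usr/bin/env bash
# Sketch only: prints the batches instead of running a vagrant command.
set -o nounset
set -o errexit

# batch_lines GROUP_SIZE NAME...
# Hypothetical helper: emits the given names in groups of at most
# GROUP_SIZE, one space-separated group per output line.
batch_lines() {
    local group_size="$1"
    shift
    local group=""
    local count=0
    local machine
    for machine in "$@"; do
        # Append with a separating space only when the group is non-empty
        group="${group:+$group }$machine"
        count=$((count + 1))
        if (( count % group_size == 0 )); then
            # A full group: emit it and start a new one
            echo "$group"
            group=""
        fi
    done
    # Emit any leftover, partially filled group
    if [[ -n "$group" ]]; then
        echo "$group"
    fi
}

batch_lines 2 m1 m2 m3 m4 m5
```

With a group size of 2 and five names, this prints `m1 m2`, `m3 m4`, and `m5` on separate lines, mirroring the example in the comment above `vagrant_batch_command`. Each printed line corresponds to one invocation of the wrapped command.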
|