The consumer's `committed` API does not return an entry in the response map for a requested partition if there is no committed offset. The transactional message copier, which is used in the transaction system test, did not account for this. If the first transaction attempted by the copier was randomly aborted, then we would not seek to the beginning as expected, which means we would fail to copy some of the records.
This patch fixes the problem by iterating over the assignment rather than the result of `committed` when resetting offsets. It also adds enables additional logging in the transaction message copier service to make finding problems easier in the future.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#7653 from hachikuji/fix-transaction-system-test
1. Add the overloaded functions.
2. Update the code in Streams to use the batch API for better latency (this applies to both active StreamsTask for initialize the offsets, as well as the StandbyTasks for updating offset limits).
3. Also update all unit test to replace the deprecated APIs.
Reviewers: Christopher Pettitt <cpettitt@confluent.io>, Kamal Chandraprakash <kamal.chandraprakash@gmail.com>, Bill Bejeck <bill@confluent.io>
This creates a test that generates sustained connections against Kafka. There
are three different components we can stress with this, KafkaConsumer,
KafkaProducer, and AdminClient. This test tries use minimal bandwidth per
connection to reduce overhead impacts.
This test works by creating a threadpool that creates connections and then
maintains a central pool of connections at a specified keepalive rate. The
keepalive action varies by which component is being stressed:
* KafkaProducer: Sends one single produce record. The configuration for
the produce request uses the same key/value generator as the ProduceBench
test.
* KafkaConsumer: Subscribes to a single partition, seeks to the end, and
then polls a minimal number of records. Each consumer connection is its
own consumer group, and defaults to 1024 bytes as FETCH_MAX_BYTES to keep
traffic to a minimum.
* AdminClient: Makes an API call to get the nodes in the cluster.
NOTE: This test is designed to be run alongside a ProduceBench test for a
specific topic, due to the way the Consumer test polls a single partition.
There may be no data returned by the consumer test if this is run on its own.
The connection should still be kept alive, but with no data returned.
Author: Scott Hendricks <scott.hendricks@confluent.io>
Reviewers: Stanislav Kozlovski, Gwen Shapira
Closes#7289 from scott-hendricks/trunk
When sending bad records, the Trogdor task will fail if the final record produced is bad. Instead we should catch the exception to allow the task to finish since sending bad records is a valid use case.
Reviewers: Tu V. Tran <tuvtran97@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
Add a new RandomComponentPayloadGenerator that gives a payload based on random selection of another PayloadGenerator. Additionally, add an example that uses a non-default PayloadGenerator configuration to TROGDOR.md.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
Add a "useConfiguredPartitioner" boolean to specify testing with the configured partitioner, rather than overriding the partitioner in the test.
Add a "skipFlush" boolean to specify skipping the flush operation when producing. This is helpful when testing some scenarios where linger.ms is greater than 0.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
This adds a new Trogdor fault spec for inducing network latency on a network device for system testing. It operates very similarly to the existing network partition spec by executing the `tc` linux utility.
* Remove deprecated class Slf4jRequestLog: use Slf4jRequestLogWriter, CustomRequestLog instread.
1. Remove '@SuppressWarnings("deprecation")' from RestServer#initializeResources, JsonRestServer#start.
2. Remove unused JsonRestServer#httpRequest.
* Fix deprecated class usage: SslContextFactory -> SslContextFactory.[Server, Client]
1. Split SSLUtils#createSslContextFactory into SSLUtils#create[Server, Client]SideSslContextFactory: each method instantiates SslContextFactory.[Server, Client], respectively.
2. SSLUtils#configureSslContextFactoryAuthentication is called from SSLUtils#createServerSideSslContextFactory only.
3. Update SSLUtilsTest following splittion; for client-side SSL Context Factory, SslContextFactory#get[Need, Want]ClientAuth is always false. (SSLUtilsTest#testCreateClientSideSslContextFactory)
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
For static members join/rejoin, we encode the current timestamp in the new member.id. The format looks like group.instance.id-timestamp.
During consumer/broker interaction logic (Join, Sync, Heartbeat, Commit), we shall check the whether group.instance.id is known on group. If yes, we shall match the member.id stored on static membership map with the request member.id. If mismatching, this indicates a conflict consumer has used same group.instance.id, and it will receive a fatal exception to shut down.
Right now the only missing part is the system test. Will work on it offline while getting the major logic changes reviewed.
Reviewers: Ryanne Dolan <ryannedolan@gmail.com>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
This is the first diff for the implementation of JoinGroup logic for static membership. The goal of this diff contains:
* Add group.instance.id to be unique identifier for consumer instances, provided by end user;
Modify group coordinator to accept JoinGroupRequest with/without static membership, refactor the logic for readability and code reusability.
* Add client side support for incorporating static membership changes, including new config for group.instance.id, apply stream thread client id by default, and new join group exception handling.
* Increase max session timeout to 30 min for more user flexibility if they are inclined to tolerate partial unavailability than burdening rebalance.
* Unit tests for each module changes, especially on the group coordinator logic. Crossing the possibilities like:
6.1 Dynamic/Static member
6.2 Known/Unknown member id
6.3 Group stable/unstable
6.4 Leader/Follower
The rest of the 345 change will be broken down to 4 separate diffs:
* Avoid kicking out members through rebalance.timeout, only do the kick out through session timeout.
* Changes around LeaveGroup logic, including version bumping, broker logic, client logic, etc.
* Admin client changes to add ability to batch remove static members
* Deprecate group.initial.rebalance.delay
Reviewers: Liquan Pei <liquanpei@gmail.com>, Stanislav Kozlovski <familyguyuser192@windowslive.com>, Jason Gustafson <jason@confluent.io>, Guozhang Wang <wangguoz@gmail.com>
Each separate thread should have its own throttle, so that it can sleep
for an appropriate amount of time when needed.
ConnectionStressWorker should avoid recalculating the status after
shutting down the runnables. Otherwise, if one runnable is slow to
stop, it will skew the average down in a way that doesn't reflect
reality. This change moves the status calculation into a separate
periodic runnable that gets shut down cleanly before the other ones.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Gwen Shapira, Stanislav Kozlovski
Closes#6533 from cmccabe/fix_connection_stress_worker
Optimize ConnectionStressWorker by avoiding creating a new
ChannelBuilder each time we want to open a new connection.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Gwen Shapira
Closes#6518 from cmccabe/optimize-connection-stress-worker
doneFuture is supposed to be completed with an empty string (meaning success) or a non-empty string which is the error message. Currently, due to exception.getMessage sometimes returning null or an empty string, this is not working correctly. This patch fixes that.
Reviewers: David Arthur <mumrah@gmail.com>
This patch adds a TimeIntervalTransactionsGenerator class which enables the Trogdor ProduceBench worker to commit transactions based on a configurable millisecond time interval.
Also, we now handle 409 create task responses in the coordinator command-line client by printing a more informative message
Reviewers: Colin P. McCabe <cmccabe@apache.org>
RoundTripWorker to should use a long field for maxMessages rather than an int. The consumer group used should unique as well.
Reviewers: Colin P. McCabe <cmccabe@apache.org>
TopicDescription and ConsumerGroupDescription in org.apache.kafka.clients.admin. are part of the public API, so we should retain the existing public constructor. Changed the new constructor with authorized operations to be package-private to avoid maintaining more public constructors since we only expect admin client to use this.
Reviewers: Manikumar Reddy <manikumar.reddy@gmail.com>
It is best to use a growing thread pool for worker cleanups. This lets us ensure that we close workers as fast as possible and not get slowed down on blocking cleanups.
Reviewers: Colin McCabe <cmccabe@apache.org>, Jason Gustafson <jason@confluent.io>
SecurityTest.test_client_ssl_endpoint_validation_failure is failing because it greps for 'SSLHandshakeException in the consumer and producer log files. With the fix for KAKFA-7773, the test uses the VerifiableConsumer instead of the ConsoleConsumer, which does not log the exception stack trace to the service log. This patch catches exceptions in the VerifiableConsumer and logs them in order to fix the test. Tested by running the test locally.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Allow the Trogdor agent to execute external commands. The agent communicates with the external commands via stdin, stdout, and stderr.
Based on a patch by Xi Yang <xi@confluent.io>
Reviewers: David Arthur <mumrah@gmail.com>
* Allow the Trogdor agent to be started in "exec mode", where it simply
runs a single task and exits after it is complete.
* For AgentClient and CoordinatorClient, allow the user to pass the path
to a file containing JSON, instead of specifying the JSON object in the
command-line text itself. This means that we can get rid of the bash
scripts whose only function was to load task specs into a bash string
and run a Trogdor command.
* Print dates and times in a human-readable way, rather than as numbers
of milliseconds.
* When listing tasks or workers, output human-readable tables of
information.
* Allow the user to filter on task ID name, task ID pattern, or task
state.
* Support a --json flag to provide raw JSON output if desired.
Reviewed-by: David Arthur <mumrah@gmail.com>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>
Update the Trogdor StringExpander regex to handle an epilogue. Previously the regex would use a lazy quantifier at the end, which meant it would not catch anything after the range expression. Add a unit test.
Reviewed-by: Colin P. McCabe <cmccabe@apache.org>
* Update KafkaAdminClient#describeTopics to throw UnknownTopicOrPartitionException.
* Remove unused method: WorkerUtils#getMatchingTopicPartitions.
* Add some JavaDoc.
Reviewed-by: Colin P. McCabe <cmccabe@apache.org>, Ryanne Dolan <ryannedolan@gmail.com>
The Trogdor Coordinator now overwrites a task's startMs to the time it received it if startMs is in the past.
The Trogdor Agent now correctly expires a task after the expiry time (startMs + durationMs) passes. Previously, it would ignore startMs and expire after durationMs milliseconds of local start of the task.
Reviewed-by: Colin P. McCabe <cmccabe@apache.org>
Switch to lambda when ever possible instead of old anonymous way
in tools module
Author: Srinivas Reddy <srinivas96alluri@gmail.com>
Author: Srinivas Reddy <mrsrinivas@users.noreply.github.com>
Reviewers: Ryanne Dolan <ryannedolan@gmail.com>, Ismael Juma <ismael@juma.me.uk>, Manikumar Reddy <manikumar.reddy@gmail.com>
Closes#6013 from mrsrinivas/tools-switch-to-java8
Current code will fall into non-stop loop and send more message to broker, and Stat in PerfCallback method record will throw ArrayIndexOutOfBoundsException
KAFKA-7597: Add configurable transaction support to ProduceBenchWorker. In order to get support for serializing Optional<> types to JSON, add a new library: jackson-datatype-jdk8. Once Jackson 3 comes out, this library will not be needed.
Reviewers: Colin McCabe <cmccabe@apache.org>, Ismael Juma <ismael@juma.me.uk>
Add threads with separate consumers to ConsumeBenchWorker. Update the Trogdor test scripts and documentation with the new functionality.
Reviewers: Colin McCabe <cmccabe@apache.org>
- Use Xlint:all with 3 exclusions (filed KAFKA-7613 to remove the exclusions)
- Use the same javac options when compiling tests (seems accidental that
we didn't do this before)
- Replaced several deprecated method calls with non-deprecated ones:
- `KafkaConsumer.poll(long)` and `KafkaConsumer.close(long)`
- `Class.newInstance` and `new Integer/Long` (deprecated since Java 9)
- `scala.Console` (deprecated in Scala 2.11)
- `PartitionData` taking a timestamp (one of them seemingly a bug)
- `JsonMappingException` single parameter constructor
- Fix unnecessary usage of raw types in several places.
- Add @SuppressWarnings for deprecations, unchecked and switch fallthrough in
several places.
- Scala clean-ups (var -> val, ETA expansion warnings, avoid reflective calls)
- Use lambdas to simplify code in a few places
- Add @SafeVarargs, fix varargs usage and remove unnecessary `Utils.mkList` method
Reviewers: Matthias J. Sax <mjsax@apache.org>, Manikumar Reddy <manikumar.reddy@gmail.com>, Randall Hauch <rhauch@gmail.com>, Bill Bejeck <bill@confluent.io>, Stanislav Kozlovski <stanislav_kozlovski@outlook.com>
Currently PushHttpMetricsReporter will convert value from KafkaMetric.metricValue() to double. This will not work for non-numerical metrics such as version in AppInfoParser whose value can be string. This has caused issue for PushHttpMetricsReporter which in turn caused system test kafkatest.tests.client.quota_test.QuotaTest.test_quota to fail.
Since we allow metric value to be object, PushHttpMetricsReporter should also read metric value as object and pass it to the http server.
Author: Dong Lin <lindong28@gmail.com>
Reviewers: Manikumar Reddy O <manikumar.reddy@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#5886 from lindong28/KAFKA-7560
KIP-368 implementation to enable periodic re-authentication of SASL clients. Also adds a broker configuration option to terminate client connections that do not re-authenticate within the configured interval.
Adds `client.dns.lookup=resolve_canonical_bootstrap_servers_only` option to perform full dns resolution of bootstrap addresses
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Sriharsha Chintalapani <sriharsha@apache.org>, Edoardo Comar <ecomar@uk.ibm.com>, Mickael Maison <mickael.maison@gmail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Rajini Sivaram <rajinisivaram@googlemail.com>
Implementation of KIP-302: Based on the new client configuration `client.dns.lookup`, a NetworkClient can use InetAddress.getAllByName to find all IPs and iterate over them when they fail to connect. Only uses either IPv4 or IPv6 addresses similar to the default mode.
Co-authored-by: Edoardo Comar <ecomar@uk.ibm.com>
Co-authored-by: Mickael Maison <mickael.maison@gmail.com>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Use delivery timeout instead of retries when possible and remove various TODOs associated with completion of KIP-91.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>
Currently create time is interpreted as integer.
This PR makes the tool accept long values.
Reviewers: Manikumar Reddy O <manikumar.reddy@gmail.com>, Guozhang Wang <wangguoz@gmail.com>
They rely on finalizers (before Java 11), which create
unnecessary GC load. The alternatives are as easy to
use and don't have this issue.
Also use FileChannel directly instead of retrieving
it from RandomAccessFile whenever possible
since the indirection is unnecessary.
Finally, add a few try/finally blocks.
Reviewers: Colin Patrick McCabe <colin@cmccabe.xyz>, Rajini Sivaram <rajinisivaram@googlemail.com>
This PR does the following:
* Remove the StreamsRepeatingIntegerKeyProducerService and the associated Java class
* Add a parameter to VerifiableProducer.java to enable sending keys when specified
* Update the corresponding Python file verifiable_producer.py to support the new parameter.
Reviewers: Matthias J Sax <matthias@confluentio>, Guozhang Wang <wangguoz@gmail.com>
Implement destroying tasks and workers. This means erasing all record of them on the Coordinator and the Agent.
Workers should be identified by unique 64-bit worker IDs, rather than by the names of the tasks they are implementing. This ensures that when a task is destroyed and re-created with the same task ID, the old workers will be not be treated as part of the new task instance.
Fix some return results from RPCs. In some cases RPCs were returning values that were never used. Attempting to re-create the same task ID with different arguments should fail. Add RequestConflictException to represent HTTP error code 409 (CONFLICT) for this scenario.
If only one worker in a task stops, don't stop all the other workers for that task, unless the worker that stopped had an error.
Reviewers: Anna Povzner <anna@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
Added consumer only workload to Trogdor. The topics must already be pre-populated. The spec lets the user request topic pattern and range of partitions to assign to [startPartition, endPartition].
Reviewers: Colin P. Mccabe <cmccabe@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
Currently, WorkerUtils will be able to create topics when there is no security. To be able to work with secure kafka, WorkerUtils.createTopic() needs to be able to take security configs. This PR adds commonClientConf field to both producer bench and roundtrip workload specs so that users can specify security and other common configs once for producer/consumer and adminClient. Also added adminClientConf field to workload specs so that users can specify adminClient specific configs if they want to. For completeness, added consumerConf and producerConf to roundtrip workload spec.
Reviewers: Ismael Juma <ismael@juma.me.uk>, Colin P. Mccabe <cmccabe@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
Added configs to ProducerBenchSpec:
topicPrefix: name of topics will be of format topicPrefix + topic index. If not provided, default is "produceBenchTopic".
partitionsPerTopic: number of partitions per topic. If not provided, default is 1.
replicationFactor: replication factor per topic. If not provided, default is 3.
The behavior of producer bench is changed such that if some or all topics already exist (with topic names = topicPrefix + topic index), and they have the same number of partitions as requested, the worker uses those topics and does not fail. The producer bench fails if one or more existing topics has number of partitions that is different from expected number of partitions.
Added unit test for WorkerUtils -- for existing methods and new methods.
Fixed bug in MockAdminClient, where createTopics() would over-write existing topic's replication factor and number of partitions while correctly completing the appropriate futures exceptionally with TopicExistsException.
Reviewers: Colin P. Mccabe <cmccabe@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
Make PayloadGenerator an interface which can have multiple implementations: constant, uniform random, sequential.
Allow different payload generators to be used for keys and values.
This change fixes RoundTripWorkload. Previously RoundTripWorkload was unable to get the sequence number of the keys that it produced.
AgentClient and CoordinatorClient should have the option of logging failures to custom log4j objects. There should also be builders for these objects, to make them easier to extend in the future.
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
It generates the producer payload (key and value) and makes sure that the values are
populated to target a realistic compression rate (0.3 - 0.4) if compression is used.
The generated payload is deterministic and can be replayed from a given position.
For now, all generated values are constant size, and key types can be configured
to be either null or 8 bytes.
Added messageSize parameter to producer spec, that specifies produced
key + message size.
Changing KafkaFuture.Future and KafkaFuture.BiConsumer into an interface makes
them a functional interface. This makes them Java 8 lambda compatible.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Author: Steven Aerts <steven.aerts@gmail.com>
Reviewers: Colin P. Mccabe <cmccabe@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Xavier Léauté <xl+github@xvrl.net>, Tom Bentley <tbentley@redhat.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#4033 from steven-aerts/KAFKA-6018
Handle InvalidTypeIdException as NOT_IMPLEMENTED and add unit tests for all exceptions.
Reviewers: Colin P. Mccabe <cmccabe@confluent.io>, Rajini Sivaram <rajinisivaram@googlemail.com>
When a log entry is appended to a Kafka topic using `KafkaLog4jAppender`, the producer.send operation may block waiting for metadata. This can result in deadlocks in a couple of scenarios if a log entry from the producer network thread is also at a log level that results in the entry being appended to a Kafka topic.
1. Producer's network thread will attempt to send data to a Kafka topic and this is unsafe since producer.send may block waiting for metadata, causing a deadlock since the thread will not process the metadata request/response.
2. `KafkaLog4jAppender#append` is invoked while holding the lock of the logger. So the thread waiting for metadata in the initial send will be holding the logger lock. If the producer network thread has.a log entry that needs to be appended, it will attempt to acquire the logger lock and deadlock.
This is a temporary workaround to avoid deadlocks in system tests by setting log level to WARN for `Metadata` in `VerifiableLog4jAppender`. The fix has been verified using the system tests log4j_appender_test.py which started failing when the info-level log entry was introduced.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Satish Duggana <satish.duggana@gmail.com>, tedyu <yuzhihong@gmail.com>
Closes#4375 from rajinisivaram/KAFKA-6415-log4jappender
* Implement process stop faults via SIGSTOP / SIGCONT
* Implement RoundTripWorkload, which both sends messages, and confirms that they are received at least once.
* Allow Trogdor tasks to block until other Trogdor tasks are complete.
* Add CreateTopicsWorker, which can be a building block for a lot of tests.
* Simplify how TaskSpec subclasses in ducktape serialize themselves to JSON.
* Implement some fault injection tests in round_trip_workload_test.py
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#4323 from cmccabe/KAFKA-5849
For ducktape: add Kibosh to the testing Dockerfile.
Create files_unreadable_fault_spec.py.
For trogdor: create FilesUnreadableFaultSpec.java.
Add a unit test of using the Kibosh service.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#4195 from cmccabe/KAFKA-5811
This is follow up to #4072 which added the PushHttpMetricsReporter and converted some services to use it. We somehow missed some compatibility issues that made the ProducerPerformance tool fail when using a newer tools jar with older common/clients jar, which we do with some system tests so we have all the features we need in the tool but can build compatibility tests for older releases.
This just adjusts some API usage to make the tool compatible with all previous releases.
I have a full run of the tests starting [here](https://jenkins.confluent.io/job/system-test-kafka-branch-builder/1122/)
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#4214 from ewencp/fix-compatibility-sanity-check-tests
Previously, Trogdor only handled "Faults." Now, Trogdor can handle
"Tasks" which may be either faults, or workloads to execute in the
background.
The Agent and Coordinator have been refactored from a
mutexes-and-condition-variables paradigm into a message passing
paradigm. No locks are necessary, because only one thread can access
the task state or worker state. This makes them a lot easier to reason
about.
The MockTime class can now handle mocking deferred message passing
(adding a message to an ExecutorService with a delay). I added a
MockTimeTest.
MiniTrogdorCluster now starts up Agent and Coordinator classes in
paralle in order to minimize junit test time.
RPC messages now inherit from a common Message.java class. This class
handles implementing serialization, equals, hashCode, etc.
Remove FaultSet, since it is no longer necessary.
Previously, if CoordinatorClient or AgentClient hit a networking
problem, they would throw an exception. They now retry several times
before giving up. Additionally, the REST RPCs to the Coordinator and
Agent have been changed to be idempotent. If a response is lost, and
the request is resent, no harm will be done.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#4073 from cmccabe/KAFKA-6060
Author: Tom Bentley <tbentley@redhat.com>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#4157 from tombentley/KAFKA-6130-verifiable-consumer-max-messages
Multiple inflights means that when there are rolling bounces or other cluster instability, there is an increased likelihood of having previously tried batch expire in the accumulator. This is a fatal error
for a transactional producer, causing the `TransactionalMessageCopier` to exit. To work around this, we bump the request timeout. We can get rid of this when KIP-91 is merged.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#4039 from apurvam/MINOR-bump-request-timeout-in-transactional-message-copier
With these changes, we are ensuring that the partitions being reassigned are from non-zero offsets. We also ensure that every message in the log has producerId and sequence number.
This means that it successfully reproduces https://issues.apache.org/jira/browse/KAFKA-6003.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#4029 from apurvam/KAFKA-6016-add-idempotent-producer-to-reassign-partitions
Since we removed the unused `TRACE` option from `SecurityProtocol`, it now seems safer to expose it from `AuthenticationContext`. Additionally this patch exposes javadocs under security.auth and relocates the `Login` and `AuthCallbackHandler` to a non-public package.
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Rajini Sivaram <rajinisivaram@googlemail.com>, Manikumar Reddy <manikumar.reddy@gmail.com>, Ismael Juma <ismael@juma.me.uk>
Closes#3863 from hachikuji/use-security-protocol-in-auth-context
Adds new metrics to support health checks:
1. Error rates for each request type, per-error code
2. Request size and temporary memory size
3. Message conversion rate and time
4. Successful and failed authentication rates
5. ZooKeeper latency and status
6. Client version
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3705 from rajinisivaram/KAFKA-5746-new-metrics
Here we introduce client and broker changes to support multiple inflight requests while still guaranteeing idempotence. Two major problems to be solved:
1. Sequence number management on the client when there are request failures. When a batch fails, future inflight batches will also fail with `OutOfOrderSequenceException`. This must be handled on the client with intelligent sequence reassignment. We must also deal with the fatal failure of some batch: the future batches must get different sequence numbers when the come back.
2. On the broker, when we have multiple inflights, we can get duplicates of multiple old batches. With this patch, we retain the record metadata for 5 older batches.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#3743 from apurvam/KAFKA-5494-increase-max-in-flight-for-idempotent-producer
In the coordinator, we should check that 'shutdown' is not true before going to sleep waiting for the condition.
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Jason Gustafson <jason@confluent.io>
Closes#3755 from cmccabe/KAFKA-5806
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Rajini Sivaram <rajinisivaram@googlemail.com>
Closes#3699 from cmccabe/trogdor-review
Clean up includes:
- Switching try-catch-finally blocks to try-with-resources when possible
- Removing some seemingly unnecessary `SuppressWarnings` annotations
- Resolving some Java warnings
- Closing unclosed Closable objects
- Removing unused code
Author: Vahid Hashemian <vahidhashemian@us.ibm.com>
Reviewers: Balint Molnar <balintmolnar91@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, Matthias J. Sax <matthias@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>
Closes#3222 from vahidhashemian/minor/code_cleanup_1706
With this patch, the `ProducePerfomance` tool can create transactions of differing durations.
This patch was used to to collect the initial set of benchmarks for transaction performance, documented here: https://docs.google.com/spreadsheets/d/1dHY6M7qCiX-NFvsgvaE0YoVdNq26uA8608XIh_DUpI4/edit#gid=282787170
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#3400 from apurvam/MINOR-add-transaction-size-to-producre-perf
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Colin P. Mccabe <cmccabe@confluent.io>, Jason Gustafson <jason@confluent.io>
Closes#3339 from ijuma/kafka-5275-admin-client-api-consistency
Author: Jason Gustafson <jason@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#3340 from hachikuji/add-random-aborts-to-system-test
Before this patch, we would call `producerBatch.done` directly from the accumulator when expiring batches. This meant that we would not transition to the `ABORTABLE_ERROR` state in the transaction manager, allowing other transactional requests (including Commits!) to go through, even though the produce failed.
This patch modifies the logic so that we call `Sender.failBatch` on every expired batch, thus ensuring that the transaction state is accurate.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Jason Gustafson <jason@confluent.io>
Closes#3252 from apurvam/KAFKA-5385-fail-transaction-if-batches-expire
This currently fails in multiple ways. One of which is most likely KAFKA-5355, where the concurrent consumer reads duplicates.
During broker bounces, the concurrent consumer misses messages completely. This is another bug.
Author: Apurva Mehta <apurva@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>
Closes#3217 from apurvam/KAFKA-5366-add-concurrent-reads-to-transactions-system-test
It avoids the need to handle protocol downgrades and it's safe (i.e. it will never cause
the auto creation of topics).
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#3220 from ijuma/kafka-5374-admin-client-metadata
junrao added a config `--print.metrics` to control whether ProducerPerformance prints out metrics at the end of the test. If its okay, will add the code counterpart for consumer.
Author: huxi <huxi@zhenrongbao.com>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#2860 from amethystic/kafka-5068_print_metrics_in_perf_tests
This is a minor change but it helps to improve the log readability.
Author: Kamal C <kamal.chandraprakash@gmail.com>
Reviewers: Ismael Juma <ismael@juma.me.uk>
Closes#2709 from Kamal15/util
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Apurva Mehta <apurva@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2684 from cmccabe/KAFKA-4895
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#2303 from mjsax/licenseHeader
Author: Maysam Yabandeh <myabandeh@dropbox.com>
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Jun Rao <junrao@gmail.com>
Closes#2474 from ijuma/kafka-4039-deadlock-during-shutdown
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2414 from cmccabe/KAFKA-4635
Author: Colin P. Mccabe <cmccabe@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Ismael Juma <ismael@juma.me.uk>
Closes#2390 from cmccabe/KAFKA-4630
Current implementation of ProducerPerformance creates static payload. This is not very useful in testing compression or when you want to test with production/custom payloads. So, we decided to add support for providing payload file as an input to producer perf test script.
We made the following changes:
1. Added support to provide a payload file which can have the list of payloads that you actually want to send.
2. Moved payload generation inside the send loop for cases when payload file is provided.
Following are the changes to how the producer-performance is evoked:
1. You must provide "--record-size" or "--payload-file" but not both. This is because, record size cannot be guaranteed when you are using custom events.
e.g. ./kafka-producer-perf-test.sh --topic test_topic --num-records 100000 --producer-props bootstrap.servers=127.0.0.1:9092 acks=0 buffer.memory=33554432 compression.type=gzip batch.size=10240 linger.ms=10 --throughput -1 --payload-file ./test_payloads --payload-delimiter ,
2. Earlier "--record-size" was a required config, now you must provide exactly one of "--record-size" or "--payload-file". Providing both will result in an error.
3. Support for an additional parameter "--payload-delimiter" has been added which defaults to "\n"
Author: Sandesh K <sandesh.karkera@flipkart.com>
Reviewers: dan norwood <norwood@confluent.io>, Jun Rao <junrao@gmail.com>
Closes#2158 from SandeshKarkera/PerfProducerChanges
moved streams application reset tool from tools to core
Author: Matthias J. Sax <matthias@confluent.io>
Reviewers: Ismael Juma <ismael@juma.me.uk>, Damian Guy <damian.guy@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#1685 from mjsax/moveResetTool
(cherry picked from commit f2405a73ea)
Signed-off-by: Ewen Cheslack-Postava <me@ewencp.org>
Hi all,
This is my first commit to Kafka.
"msec / 1000" turns into sec, isn't it?
I just have fixed a variable name.
granders
Author: kota-uchida <kota-uchida@cybozu.co.jp>
Reviewers: Guozhang Wang <wangguoz@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#1482 from uchan-nos/elapsed-ms
This actually removes joins altogether, as well as references to self.worker_threads, which is best left as an implementation detail in BackgroundThreadService.
This makes use of hachikuji 's recent ducktape patch, and updates ducktape dependency to 0.5.0.
Author: Geoff Anderson <geoff@confluent.io>
Reviewers: Jason Gustafson <jason@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>
Closes#1297 from granders/KAFKA-3581-systest-add-join-timeout
Even if a test calls stop() on console_consumer or verifiable_producer, it is still possible that producer/consumer will not shutdown cleanly, and will be killed forcefully after a timeout. It will be useful for some tests to know whether a clean shutdown happened or not. This PR adds methods to console_consumer and verifiable_producer to query whether clean shutdown happened or not.
hachikuji and/or granders Please review.
Author: Anna Povzner <anna@confluent.io>
Reviewers: Jason Gustafson, Geoff Anderson, Gwen Shapira
Closes#1278 from apovzner/kafka-3597
* Use a fixed `Random` seed in `EndToEndLatency.scala` for determinism
* Add `compression_type` to and remove `consumer_fetch_max_wait` from `end_to_end_latency.py`. The latter was never used.
* Tweak logging of `end_to_end_latency.py` to be similar to `consumer_performance.py`.
* Add `compression_type` to `benchmark_test.py` methods and add `snappy` to `matrix` annotation
* Use randomly generated bytes from a restricted range for `ProducerPerformance` payload. This is a simple fix for now. It can be improved in the PR for KAFKA-3554.
Author: Ismael Juma <ismael@juma.me.uk>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#1225 from ijuma/kafka-3558-add-compression_type-benchmark_test.py
Added CompressionTest that tests 4 producers, each using a different compression type and one not using compression.
Enabled VerifiableProducer to run producers with different compression types (passed in the constructor). This includes enabling each producer to output unique values, so that the verification process in ProduceConsumeValidateTest is correct (counts acks from all producers).
Also a fix for console consumer to raise an exception if it sees the incorrect consumer output (before we swallowed an exception, so was hard to debug the issue).
Author: Anna Povzner <anna@confluent.io>
Reviewers: Geoff Anderson, Jason Gustafson
Closes#958 from apovzner/kafka-3214
Note that KAFKA-3077 will be required to run these tests.
Author: Ashish Singh <asingh@cloudera.com>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#747 from SinghAsDev/KAFKA-3078
Summary of code changes
------------------------------------
1) Added a new Checkstyle rule to flag any code using star imports
2) Fixed ALL existing code violations using star imports
Testing
-----------
Local build was successful
ALL JUnits ran successfully on local.
ewencp - Request you to please review changes. Thank you !
I state that the contribution is my original work and I license the work to the project under the project's open source license.
Author: manasvigupta <manasvigupta@yahoo.co.in>
Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>
Closes#700 from manasvigupta/KAFKA-3009
Gradle does not handle subprojects with the same name (top-level tools vs
connect/tools) properly, making the dependency impossible to express correctly
since we need to move the ThroughputThrottler class into the top level tools
project. Moving the current set of tools into the runtime jar works fine since
they are only used for system tests at the moment.
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Gwen Shapira
Closes#512 from ewencp/kafka-2807-redux
Author: Ewen Cheslack-Postava <me@ewencp.org>
Reviewers: Ben Stopford, Geoff Anderson, Guozhang Wang
Closes#432 from ewencp/kafka-2752-copycat-clean-bounce-test
Updated kafka-producer-perf-test.sh to use org.apache.kafka.clients.tools.ProducerPerformance.
Updated build.gradle to add kafka-tools-0.9.0.0-SNAPSHOT.jar to kafka/libs folder.
Author: Manikumar reddy O <manikumar.reddy@gmail.com>
Reviewers: Gwen Shapira, Ismael Juma
Closes#242 from omkreddy/KAFKA-2562
ewencp gwenshap
This needs some refactoring to avoid the duplicated code between replication test and upgrade test, but in shape for initial feedback.
I'm interested in feedback on the added `KafkaConfig` class and `kafka_props` file. This addition makes it:
- easier to attach different configs to different nodes (e.g. during broker upgrade process)
- easier to reason about the configuration of a particular node
Notes:
- in the default values in the KafkaConfig class, I removed many properties which were in kafka.properties before. This is because most of those properties were set to what is already the default value.
- when running non-trunk VerifiableProducer, I append the trunk tools jar to the classpath, and run it with the non-trunk kafka-run-class.sh script
Author: Geoff Anderson <geoff@confluent.io>
Reviewers: Dong Lin, Ewen Cheslack-Postava
Closes#229 from granders/KAFKA-1888-upgrade-test
Parametrize console consumer sanity test, replication tests and benchmarks tests to run with both PLAINTEXT and SSL.
Author: Rajini Sivaram <rajinisivaram@googlemail.com>
Reviewers: Geoff Anderson, Ewen Cheslack-Postava, Guozhang Wang
Closes#271 from rajinisivaram/KAFKA-2581
ewencp
The changes here are smaller than they look - mostly refactoring/cleanup.
- ConsumerPerformanceService: added new_consumer flag, and exposed more command-line settings
- benchmark.py: refactored to use `parametrize` and `matrix` - this reduced some amount of repeated code
- benchmark.py: added consumer performance tests with new consumer (using `parametrize`)
- benchmark.py: added more detailed test descriptions
- performance.py: broke into separate files
Author: Geoff Anderson <geoff@confluent.io>
Reviewers: Ewen Cheslack-Postava, Jason Gustafson, Gwen Shapira
Closes#179 from granders/KAFKA-2489-benchmark-new-consumer
Initial patch for KIP-25
Note that to install ducktape, do *not* use pip to install ducktape. Instead:
```
$ git clone gitgithub.com:confluentinc/ducktape.git
$ cd ducktape
$ python setup.py install
```
Author: Geoff Anderson <geoff@confluent.io>
Author: Geoff <granders@gmail.com>
Author: Liquan Pei <liquanpei@gmail.com>
Reviewers: Ewen, Gwen, Jun, Guozhang
Closes#70 from granders/KAFKA-2276 and squashes the following commits:
a62fb6c [Geoff Anderson] fixed checkstyle errors
a70f0f8 [Geoff Anderson] Merged in upstream trunk.
8b62019 [Geoff Anderson] Merged in upstream trunk.
47b7b64 [Geoff Anderson] Created separate tools jar so that the clients package does not pull in dependencies on the Jackson JSON tools or argparse4j.
a9e6a14 [Geoff Anderson] Merged in upstream changes
d18db7b [Geoff Anderson] fixed :rat errors (needed to add licenses)
321fdf8 [Geoff Anderson] Ignore tests/ and vagrant/ directories when running rat build task
795fc75 [Geoff Anderson] Merged in changes from upstream trunk.
1d93f06 [Geoff Anderson] Updated provisioning to use java 7 in light of KAFKA-2316
2ea4e29 [Geoff Anderson] Tweaked README, changed default log collection behavior on VerifiableProducer
0eb6fdc [Geoff Anderson] Merged in system-tests
69dd7be [Geoff Anderson] Merged in trunk
4034dd6 [Geoff Anderson] Merged in upstream trunk
ede6450 [Geoff] Merge pull request #4 from confluentinc/move_muckrake
7751545 [Geoff Anderson] Corrected license headers
e6d532f [Geoff Anderson] java 7 -> java 6
8c61e2d [Geoff Anderson] Reverted jdk back to 6
f14c507 [Geoff Anderson] Removed mode = "test" from Vagrantfile and Vagrantfile.local examples. Updated testing README to clarify aws setup.
98b7253 [Geoff Anderson] Updated consumer tests to pre-populate kafka logs
e6a41f1 [Geoff Anderson] removed stray println
b15b24f [Geoff Anderson] leftover KafkaBenchmark in super call
0f75187 [Geoff Anderson] Rmoved stray allow_fail. kafka_benchmark_test -> benchmark_test
f469f84 [Geoff Anderson] Tweaked readme, added example Vagrantfile.local
3d73857 [Geoff Anderson] Merged downstream changes
42dcdb1 [Geoff Anderson] Tweaked behavior of stop_node, clean_node to generally fail fast
7f7c3e0 [Geoff Anderson] Updated setup.py for kafkatest
c60125c [Geoff Anderson] TestEndToEndLatency -> EndToEndLatency
4f476fe [Geoff Anderson] Moved aws scripts to vagrant directory
5af88fc [Geoff Anderson] Updated README to include aws quickstart
e5edf03 [Geoff Anderson] Updated example aws Vagrantfile.local
96533c3 [Geoff] Update aws-access-keys-commands
25a413d [Geoff] Update aws-example-Vagrantfile.local
884b20e [Geoff Anderson] Moved a bunch of files to kafkatest directory
fc7c81c [Geoff Anderson] added setup.py
632be12 [Geoff] Merge pull request #3 from confluentinc/verbose-client
51a94fd [Geoff Anderson] Use argparse4j instead of joptsimple. ThroughputThrottler now has more intuitive behavior when targetThroughput is 0.
a80a428 [Geoff Anderson] Added shell program for VerifiableProducer.
d586fb0 [Geoff Anderson] Updated comments to reflect that throttler is not message-specific
6842ed1 [Geoff Anderson] left out a file from last commit
1228eef [Geoff Anderson] Renamed throttler
9100417 [Geoff Anderson] Updated command-line options for VerifiableProducer. Extracted throughput logic to make it reusable.
0a5de8e [Geoff Anderson] Fixed checkstyle errors. Changed name to VerifiableProducer. Added synchronization for thread safety on println statements.
475423b [Geoff Anderson] Convert class to string before adding to json object.
bc009f2 [Geoff Anderson] Got rid of VerboseProducer in core (moved to clients)
c0526fe [Geoff Anderson] Updates per review comments.
8b4b1f2 [Geoff Anderson] Minor updates to VerboseProducer
2777712 [Geoff Anderson] Added some metadata to producer output.
da94b8c [Geoff Anderson] Added number of messages option.
07cd1c6 [Geoff Anderson] Added simple producer which prints status of produced messages to stdout.
a278988 [Geoff Anderson] fixed typos
f1914c3 [Liquan Pei] Merge pull request #2 from confluentinc/system_tests
81e4156 [Liquan Pei] Bootstrap Kafka system tests