Commit Graph

2334 Commits

Author SHA1 Message Date
Ismael Juma 319c6e7195 MINOR: Add missing `@Override` to `KStreamImpl.through`
Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #1216 from ijuma/add-missing-override-to-through
2016-04-12 11:34:46 -07:00
Guozhang Wang 40fd456649 KAFKA-3519: Refactor Transformer's transform / punctuate to return nullable values
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Dan Norwood, Anna Povzner

Closes #1204 from guozhangwang/KTransformR
2016-04-11 12:33:48 -07:00
Guozhang Wang c76b6e6d9b HOTFIX: special handling first ever triggered punctuate
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Anna Povzner <anna@confluent.io>

Closes #1208 from guozhangwang/KPunctuate
2016-04-10 23:09:43 -07:00
bbejeck 7c27989860 KAFKA-3338: Add print and writeAsText to KStream/KTable in Kafka Streams
Addresses comments from previous PR [#1187]
    Changed print and writeAsText method return signature to void
    Flush System.out on close
    Changed IllegalStateException to TopologyBuilderException
    Updated MockProcessorContext.topic method to return a String
    Renamed KStreamPrinter to KeyValuePrinter
    Updated the printing of null keys to 'null' to match ConsoleConsumer
    Updated JavaDoc stating need to override toString

Author: bbejeck <bbejeck@gmail.com>

Reviewers: Dan Norwood, Guozhang Wang

Closes #1209 from bbejeck/KAFKA-3338_Adding_print/writeAsText_to_Streams_DSL
2016-04-10 17:43:47 -07:00
Guozhang Wang 8c59565761 KAFKA-3521: validate null keys in Streams DSL implementations
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes #1197 from guozhangwang/K3521
2016-04-08 13:30:46 -07:00
Eno Thereska 9beafae23a KAFKA-3512: Added foreach operator
miguno guozhangwang please have a look if you can.

Author: Eno Thereska <eno.thereska@gmail.com>

Reviewers: Michael G. Noll <michael@confluent.io>, Guozhang Wang <wangguoz@gmail.com>

Closes #1193 from enothereska/kafka-3512-ForEach
2016-04-08 09:17:05 -07:00
Guozhang Wang 3a58407e2e KAFKA-3505: Fix punctuate generated record metadata
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Anna Povzner <anna@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>

Closes #1190 from guozhangwang/K3505
2016-04-08 08:59:50 -07:00
Eno Thereska 8dbd688b16 KAFKA-3497: Streams ProcessorContext should support forward() based on child name
Author: Eno Thereska <eno.thereska@gmail.com>

Reviewers: Yuto Kawamura, Michael G. Noll, Guozhang Wang

Closes #1194 from enothereska/kafka-3497-forward
2016-04-07 10:20:17 -07:00
Matthias J. Sax 99d2329227 KAFKA-3477: extended KStream/KTable API to specify custom partitioner for sinks
Author: mjsax <matthias@confluent.io>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #1180 from mjsax/kafka-3477-streamPartitioner-DSL
2016-04-05 15:56:09 -07:00
Yasuhiro Matsuda 31e263e829 HOTFIX: set timestamp in SinkNode
guozhangwang
Setting the timestamp in produced records in SinkNode. This forces the producer record's timestamp same as the context's timestamp.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #1137 from ymatsuda/set_timestamp_in_sinknode
2016-04-04 14:57:15 -07:00
Yasuhiro Matsuda bd5325dd8b MINOR: small code optimizations in streams
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #1176 from ymatsuda/optimize
2016-04-01 17:14:29 -07:00
Guozhang Wang ae939467e8 MINOR: add null check for aggregate and reduce operators
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Yasuhiro Matsuda, Gwen Shapira

Closes #1175 from guozhangwang/KSNullPointerException
2016-04-01 13:14:47 -07:00
Yasuhiro Matsuda 2788f2dc73 MINOR: a simple benchmark for Streams
guozhangwang miguno

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #1164 from ymatsuda/perf
2016-03-30 14:26:01 -07:00
Yasuhiro Matsuda 5089f547d5 HOTFIX: RocksDBStore must clear dirty flags after flush
guozhangwang
Without clearing the dirty flags, RocksDBStore will perform flush for every new record. This bug made the store performance painfully slower.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #1163 from ymatsuda/clear_dirty_flag
2016-03-29 13:30:56 -07:00
Guozhang Wang 23b50093f4 KAFKA-3454: add Kafka Streams web docs
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Gwen Shapira

Closes #1127 from guozhangwang/KStreamsDocs
2016-03-25 16:04:58 -07:00
Andrea Cosentino c1d8c38345 KAFKA-3449: Rename filterOut() to filterNot() to achieve better terminology
…nology

Hi all,

This is my first contribution and I hope it will be good.

The PR is related to this issue:
https://issues.apache.org/jira/browse/KAFKA-3449

Thanks a lot,

Andrea

Author: Andrea Cosentino <ancosen@gmail.com>

Reviewers: Yasuhiro Matsuda, Guozhang Wang

Closes #1134 from oscerd/KAFKA-3449
2016-03-25 15:00:45 -07:00
Yasuhiro Matsuda 80d78f8147 HOTFIX: fix NPE in changelogger
Fix NPE in StoreChangeLogger caused by a record out of window retention period.
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #1124 from ymatsuda/logger_npe
2016-03-23 14:25:08 -07:00
Ismael Juma d4d5920ed4 KAFKA-3432; Cluster.update() thread-safety
Replace `update` with `withPartitions`, which returns a copy instead of mutating the instance.

Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #1118 from ijuma/kafka-3432-cluster-update-thread-safety
2016-03-23 13:53:37 -07:00
Guozhang Wang b6c29e3810 MINOR: Add InterfaceStability.Unstable annotations to some Kafka Streams public APIs
Also improves Java docs for the Streams high-level DSL.

Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Ismael Juma, Michael G. Noll

Closes #1097 from guozhangwang/KNewJavaDoc
2016-03-21 12:06:07 -07:00
Pierre-Yves Ritschard 4332175c11 KAFKA-3006: standardize KafkaConsumer API to use Collection
Author: Pierre-Yves Ritschard <pyr@spootnik.org>

Reviewers: Jason Gustafson, Gwen Shapira

Closes #1098 from hachikuji/KAFKA-3006
2016-03-18 16:07:20 -07:00
Guozhang Wang 5d0cd7667f KAFKA-3422: Add overloading functions without serdes in Streams DSL
Also include:

1) remove streams specific configs before passing to producer and consumer to avoid warning message;
2) add `ConsumerRecord` timestamp extractor and set as the default extractor.

Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Michael G. Noll, Ewen Cheslack-Postava

Closes #1093 from guozhangwang/KConfigWarn
2016-03-18 12:39:41 -07:00
Guozhang Wang dea0719e99 KAFKA-3336: Unify Serializer and Deserializer into Serialization
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Michael G. Noll, Ismael Juma

Closes #1066 from guozhangwang/K3336
2016-03-17 15:41:59 -07:00
Michael G. Noll 958e10c87c KAFKA-3411: Streams: stop using "job" terminology, rename job.id to application.id
guozhangwang ymatsuda : please review.

Author: Michael G. Noll <michael@confluent.io>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #1081 from miguno/KAFKA-3411
2016-03-17 10:41:48 -07:00
Yasuhiro Matsuda 355076cd26 MINOR: kstream/ktable counting method with default long serdes
guozhangwang miguno

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Michael G. Noll, Guozhang Wang

Closes #1065 from ymatsuda/count_serdes
2016-03-15 12:08:26 -07:00
Liquan Pei cf40acc2b1 MINOR: Remove unused method, redundant in interface definition and add final for object used in sychronization
guozhangwang Very minor cleanup.

Author: Liquan Pei <liquanpei@gmail.com>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #1063 from Ishiihara/minor-cleanup
2016-03-14 15:09:47 -07:00
Yasuhiro Matsuda c1a56c6839 KAFKA-3395: prefix job id to internal topic names
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #1062 from ymatsuda/k3395
2016-03-14 14:50:24 -07:00
Guozhang Wang 9c4c5ae1cd MINOR: Add unit test for internal topics
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Yasuhiro Matsuda <yasuhiro@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>

Closes #1047 from guozhangwang/KInternal
2016-03-10 14:54:47 -08:00
Manikumar reddy O 5761a9ec72 MINOR: streams javadoc corrections
Author: Manikumar reddy O <manikumar.reddy@gmail.com>

Reviewers: Gwen Shapira

Closes #1019 from omkreddy/JAVADOC
2016-03-07 11:32:34 -08:00
Guozhang Wang 1d19ac9fea HOTFIX: Avoid NPE in StreamsPartitionAssignor
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Michael G. Noll <michael@confluent.io>

Closes #1004 from guozhangwang/KStreamPANPE
2016-03-03 08:57:21 -08:00
Michael G. Noll 10394aa801 KAFKA-3324; NullPointerException in StreamPartitionAssignor
Author: Michael G. Noll <michael@confluent.io>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes #1001 from miguno/KAFKA-3324
2016-03-03 02:53:12 -08:00
Ewen Cheslack-Postava b9eda22d71 HOTFIX: Fix checkstyle failure in KStreams by providing fully qualified class names.
Author: Ewen Cheslack-Postava <me@ewencp.org>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #1000 from ewencp/hotfix-kstreams-checkstyle-javadocs
2016-03-02 23:42:17 -08:00
Guozhang Wang f676cfeb83 MINOR: Improve JavaDoc for some public classes.
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Yasuhiro Mastuda <yasuhiro.mastuda@gmail.com>

Closes #999 from guozhangwang/KJavaDoc
2016-03-02 16:11:19 -08:00
Guozhang Wang 2a58ba9fd8 KAFKA-3311; Prepare internal source topics before calling partition grouper
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Yasuhiro Matsuda <yasuhiro.matsuda@gmail.com>, Jun Rao <junrao@gmail.com>

Closes #990 from guozhangwang/K3311
2016-03-02 13:43:48 -08:00
Anna Povzner 002b377dad KAFKA-3196; Added checksum and size to RecordMetadata and ConsumerRecord
This is the second (remaining) part of KIP-42. See https://cwiki.apache.org/confluence/display/KAFKA/KIP-42%3A+Add+Producer+and+Consumer+Interceptors

Author: Anna Povzner <anna@confluent.io>

Reviewers: Ismael Juma <ismael@juma.me.uk>, Jason Gustafson <jason@confluent.io>, Jun Rao <junrao@gmail.com>

Closes #951 from apovzner/kafka-3196
2016-03-02 09:40:34 -08:00
Guozhang Wang 79662cc7cb HOTFIX: Use the correct serde classes
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Yasuhiro Matsuda

Closes #991 from guozhangwang/KSerde
2016-03-01 16:27:45 -08:00
Yasuhiro Matsuda 5da935ef78 MINOR: Add AUTO_OFFSET_RESET_CONFIG to StreamsConfig; remove TOTAL_RECORDS_TO_PROCESS from StreamsConfig
and remove TOTAL_RECORDS_TO_PROCESS
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #985 from ymatsuda/config_params
2016-02-29 16:05:46 -08:00
Guozhang Wang 845c6eae1f KAFKA-3192: Add unwindowed aggregations for KStream; and make all example code executable
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Yasuhiro Matsuda, Michael G. Noll, Jun Rao

Closes #870 from guozhangwang/K3192
2016-02-29 14:03:32 -08:00
Kim Christensen d501cc62dd KAFKA-3133: Add putIfAbsent function to KeyValueStore
guozhangwang

Author: Kim Christensen <kich@mvno.dk>

Reviewers: Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>

Closes #912 from kichristensen/KAFKA-3133
2016-02-29 12:46:03 -08:00
Tom Dearman d4e60b9f59 KAFKA-3278: concatenate thread name to clientId when producer and consumers config is created
guozhangwang made the changes as requested, I reverted my original commit and that seems to have closed the other pull request - sorry if that mucks up the process a bit

Author: tomdearman <tom.dearman@gmail.com>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #978 from tomdearman/KAFKA-3278
2016-02-26 15:00:38 -08:00
Yasuhiro Matsuda 13b8fb295c MINOR: enhance streams system test
guozhangwang

* add table aggregate to the system test
* actually create change log partition replica

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #966 from ymatsuda/enh_systest
2016-02-24 18:11:36 -08:00
Yasuhiro Matsuda fa05752ccd MINOR: add retry to state dir locking
There is a possibility that the state directory locking fails when another stream thread is taking long to close all tasks. Simple retries should alleviate the problem.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #899 from ymatsuda/minor2
2016-02-24 13:41:46 -08:00
Guozhang Wang c00a036e06 HOTFIX: Add missing file for KeyValue unit test
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Gwen Shapira

Closes #960 from guozhangwang/KCountP1
2016-02-23 15:33:29 -08:00
Guozhang Wang 0ce9163989 MINOR: KTable.count() to only take a selector for key
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Gwen Shapira, Yashiru Matsuda, Michael Noll

Closes #872 from guozhangwang/KCount
2016-02-23 15:10:56 -08:00
Yasuhiro Matsuda f7fe9ccb46 HOTFIX: fix consumer config for streams
guozhangwang
My bad. I removed ZOOKEEPER_CONNECT_CONFIG from consumer's config by mistake. It is needed by our own partition assigner running in consumers.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #959 from ymatsuda/hotfix3
2016-02-23 14:55:07 -08:00
Yasuhiro Matsuda 878b78acb6 KAFKA-3245: config for changelog replication factor
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #948 from ymatsuda/changelog_topic_replication
2016-02-23 14:54:06 -08:00
Yasuhiro Matsuda 3358e1682f KAFKA-2802: kafka streams system tests
Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Geoff Anderson <geoff@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>

Closes #930 from ymatsuda/streams_systest
2016-02-23 12:14:26 -08:00
Yasuhiro Matsuda 68af16ac15 MINOR: catch a commit failure due to rebalance in StreamThread
StreamThread should keep going after a commit was failed due to a group rebalance.
Currently the thread just dies.
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #933 from ymatsuda/catch_commit_failure
2016-02-22 21:39:26 -08:00
Yasuhiro Matsuda 982ab09a79 HOTFIX: check offset limits in streamtask when recovering KTable store
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #947 from ymatsuda/hotfix2
2016-02-22 21:37:27 -08:00
Yasuhiro Matsuda ff7b0f5b46 HOTFIX: make sure to go through all shutdown steps
Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #928 from ymatsuda/shutdown
2016-02-22 13:16:06 -08:00
Jiangjie Qin 45c8195fa1 KAFKA-3025; Added timetamp to Message and use relative offset.
See KIP-31 and KIP-32 for details.

A few notes on the patch:
1. This patch implements KIP-31 and KIP-32. The patch includes features in both KAFKA-3025,  KAFKA-3026 and KAFKA-3036
2. All unit tests passed.
3. The unit tests were run with new and old message format.
4. When message format conversion occurs during consumption, the consumer will not be able to detect the message size too large situation. I did not try to fix this because the situation seems rare and only happen during migration phase.

Author: Jiangjie Qin <becket.qin@gmail.com>
Author: Ismael Juma <ismael@juma.me.uk>
Author: Jiangjie (Becket) Qin <becket.qin@gmail.com>

Reviewers: Jason Gustafson <jason@confluent.io>, Anna Povzner <anna@confluent.io>, Ismael Juma <ismael@juma.me.uk>, Guozhang Wang <wangguoz@gmail.com>, Jun Rao <junrao@gmail.com>

Closes #764 from becketqin/KAFKA-3025
2016-02-19 07:56:40 -08:00
Yasuhiro Matsuda eee95228fa MINOR: remove streams config params from producer/consumer configs
Removing streams' specific config params from producer/consumer configs to reduce warning messages.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang <wangguoz@gmail.com>

Closes #906 from ymatsuda/clean_config
2016-02-18 09:39:30 +08:00
Yasuhiro Matsuda f141e647a4 MINOR: catch an exception in rebalance and stop the stream thread
Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #901 from ymatsuda/minor3
2016-02-12 17:11:12 +08:00
tomdearman 330274ed1c KAFKA-3229 ensure that root statestore is registered with ProcessorStateManager
Pass through the root StateStore in the init method so the inner StateStore can register that object.

Author: tomdearman <tom.dearman@gmail.com>

Reviewers: Yasuhiro Matsuda

Closes #904 from tomdearman/KAFKA-3229
2016-02-11 11:35:55 -07:00
Yasuhiro Matsuda 67a7ea9d67 MINOR: add setUncaughtExceptionHandler to KafkaStreams
Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes #894 from ymatsuda/minor
2016-02-10 22:01:56 -08:00
Yasuhiro Matsuda c1f8f689af HOTFIX: poll even when all partitions are paused. handle concurrent cleanup
* We need to poll periodically even when all partitions are paused in order to respond to a possible rebalance promptly.
* There is a race condition when two (or more) threads try to clean up the same state directory. One of the thread fails with FileNotFoundException. Thus the new code simply catches it and ignore.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Gwen Shapira

Closes #893 from ymatsuda/hotfix
2016-02-10 15:02:27 -07:00
Yasuhiro Matsuda b5e6b8671a HOTFIX: open window segments in order, add segment id check in getSegment
* During window store initialization, we have to open segments in the segment id order and update ```currentSegmentId```, otherwise cleanup won't work.
* ```getSegment()``` should not create a segment and clean up old segments if the segment id is greater than ```currentSegmentId```. Segment maintenance should be driven not by query but only by data insertion.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes #891 from ymatsuda/hotfix2
2016-02-09 13:22:46 -08:00
Yasuhiro Matsuda 6352a30f46 HOTFIX: Fix NPE after standby task reassignment
Buffered records of change logs must be cleared upon reassignment of standby tasks.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes #889 from ymatsuda/hotfix
2016-02-09 10:02:20 -08:00
Yasuhiro Matsuda feda3f68e9 HOTFIX: open window segments on init
guozhangwang

A window store should open all existing segments. This is important for segment cleanup, and it also ensures that the first fetch() call returns the hits, the values in the search range. (previously, it missed the hits in fetch() immediately after initialization).

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes #886 from ymatsuda/hotfix3
2016-02-08 16:48:06 -08:00
Yasuhiro Matsuda f7ad3d1b1f HOTFIX: RecordCollector should send a record to the specified partition
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes #887 from ymatsuda/hotfix4
2016-02-08 16:35:05 -08:00
Yasuhiro Matsuda d2fc6f36cc MINOR: fix RocksDBStore range search
The range is inclusive according to KeyValueStore's java doc.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #883 from ymatsuda/minor
2016-02-09 05:00:38 +08:00
Yasuhiro Matsuda 4ee68b43c1 HOTFIX: fix streams issues
* RocksDBStore.putInternal should bypass logging.
* StoreChangeLogger should not call context.recordCollector() when nothing to log
  * This is for standby tasks. In standby task, recordCollector() throws an exception. There should be nothing to log anyway.
* fixed ConcurrentModificationException in StreamThread

guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #877 from ymatsuda/hotfix2
2016-02-09 04:46:11 +08:00
Yasuhiro Matsuda fa05ee7279 MINOR: Add more info to RecordCollector error message
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Grant Henke <granthenke@gmail.com>, Ewen Cheslack-Postava <ewen@confluent.io>

Closes #873 from ymatsuda/hotfix
2016-02-05 12:58:14 -08:00
Guozhang Wang 7802a90ed9 KAFKA-3207: Fix StateChangeLogger to use the right topic name
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Yasuhiro Matsuda

Closes #865 from guozhangwang/K3207
2016-02-04 14:51:10 -08:00
Yasuhiro Matsuda 0a7b20e286 HOTFIX: fix partition ordering in assignment
workround partition ordering not preserved by the consumer group management.
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #868 from ymatsuda/partitionOrder
2016-02-04 14:14:08 -08:00
Yasuhiro Matsuda 77683c3cb0 HOTFIX: temp fix for ktable look up
guozhangwang
Temporarily disabled state store access checking.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #864 from ymatsuda/fix_table_lookup
2016-02-04 10:16:01 -08:00
Guozhang Wang d3ff902d60 MINOR: Fix restoring for source KTable
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Yasuhiro Matsuda

Closes #860 from guozhangwang/KRestoreChangelog
2016-02-03 20:42:43 -08:00
Guozhang Wang 79eacf6c95 MINOR: Some more Kafka Streams Javadocs
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Yasuhiro Matsuda <yasuhiro@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>

Closes #853 from guozhangwang/KJavaDoc
2016-02-03 11:31:32 -08:00
Ismael Juma e8343e67e1 KAFKA-3195; Transient test failure in OffsetCheckpointTest.testReadWrite
Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes #855 from ijuma/kafka-3195-offset-checkpoint-test-transient-failure
2016-02-02 18:29:29 -08:00
Guozhang Wang 95174337c2 KAFKA-3121: Refactor KStream Aggregate to be Lambda-able.
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Yasuhiro Matsuda <yasuhiro@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>

Closes #839 from guozhangwang/K3121s2
2016-02-02 12:01:47 -08:00
Yasuhiro Matsuda 8189f9d580 MINOR: some javadocs for kstream public api
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #844 from ymatsuda/javadoc
2016-02-02 10:52:43 -08:00
Guozhang Wang 86a9036a7b MINOR: fix the logic of RocksDBWindowStore using RocksDBStore Segments
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Yasuhiro Matsuda

Closes #849 from guozhangwang/KRemoveInitializedCheck
2016-02-02 10:14:13 -08:00
Guozhang Wang 4adfd7960c MINOR: Reorder StreamThread shutdown sequence
We need to close producer first before closing tasks to make sure all messages are acked and hence checkpoint offsets are updated before closing tasks and their state. It was re-ordered mistakenly before.

Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Yasuhiro Matsuda <yasuhiro@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>

Closes #845 from guozhangwang/KStreamState
2016-02-01 17:32:36 -08:00
Guozhang Wang 57da044a99 KAFKA-3060: Refactor MeteredStore and RockDBStore Impl
Changes include:

1) Move logging logic from MeteredXXXStore to internal stores, and leave WindowedStore API clean by removed all internalPut/Get functions.

2) Wrap common logging behavior of InMemory and LRUCache stores into one class.

3) Fix a bug for StoreChangeLogger where byte arrays are not comparable in HashSet by using a specified RawStoreChangeLogger.

4) Add a caching layer on top of RocksDBStore with object caching, it relies on the object's equals and hashCode function to be consistent with serdes.

Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Yasuhiro Matsuda <yasuhiro@confluent.io>, Ewen Cheslack-Postava <ewen@confluent.io>

Closes #826 from guozhangwang/K3060
2016-02-01 16:11:13 -08:00
Yasuhiro Matsuda 181df80dc3 MINOR: removed obsolete class
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #843 from ymatsuda/remove_unused
2016-02-01 13:51:53 -08:00
Yasuhiro Matsuda 598851f19c MINOR: remove the init method from Serdes
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #834 from ymatsuda/remove_init_from_Serdes
2016-01-29 09:31:02 -08:00
Yasuhiro Matsuda 22de0a8ab5 MINOR: join test for windowed keys
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #814 from ymatsuda/windowed_key_join_test
2016-01-26 23:03:42 -08:00
Yasuhiro Matsuda 9ffa907d70 MINOR: remove FilteredIterator
guozhangwang
removing an unused class, FilteredIterator, and its test.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Gwen Shapira

Closes #816 from ymatsuda/remove_obsolete_class
2016-01-26 20:03:50 -08:00
Guozhang Wang 5ae97196ae KAFKA-3125: Add Kafka Streams Exceptions
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes #809 from guozhangwang/K3125
2016-01-26 09:19:28 -08:00
Yasuhiro Matsuda 942074b77b MINOR: add equals and hashCode to Windowed
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #808 from ymatsuda/windowed_key
2016-01-25 16:22:32 -08:00
Guozhang Wang c197113a9c KAFKA-3066: Demo Examples for Kafka Streams
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes #797 from guozhangwang/K3066
2016-01-22 15:25:24 -08:00
Guozhang Wang 21c6cfe50d KAFKA-3136: Rename KafkaStreaming to KafkaStreams
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Gwen Shapira

Closes #800 from guozhangwang/KRename
2016-01-22 13:00:00 -08:00
Guozhang Wang 959cf09e86 KAFKA-3121: Remove aggregatorSupplier and add Reduce functions
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Yasuhiro Matsuda

Closes #795 from guozhangwang/K3121s1
2016-01-20 16:10:43 -08:00
Guozhang Wang f75e335025 MINOR: complete built-in stream aggregate functions
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Yasuhiro Matsuda

Closes #787 from guozhangwang/KBuiltInAgg
2016-01-18 13:43:52 -08:00
Guozhang Wang a62eb5993f KAFKA-3104: add windowed aggregation to KStream
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Yasuhiro Mastuda

Closes #781 from guozhangwang/K3104
2016-01-18 12:14:43 -08:00
Ismael Juma 0c32bc9926 KAFKA-3105: Use `Utils.atomicMoveWithFallback` instead of `File.rename`
It behaves better on Windows and provides more useful error messages.

Also:
* Minor inconsistency fix in `kafka.server.OffsetCheckpoint`.
* Remove delete from `streams.state.OffsetCheckpoint` constructor (similar to the change in `kafka.server.OffsetCheckpoint` in 836cb19633 (diff-2503b32f29cbbd61ed8316f127829455L29)).

Author: Ismael Juma <ismael@juma.me.uk>

Reviewers: Ewen Cheslack-Postava <ewen@confluent.io>

Closes #771 from ijuma/kafka-3105-use-atomic-move-with-fallback-instead-of-rename
2016-01-18 09:47:32 -08:00
Yasuhiro Matsuda 37be6d98da KAFKA-3108: custom StreamParitioner for Windowed key
guozhangwang

When ```WindowedSerializer``` is specified in ```to(...)``` or ```through(...)``` for a key, we use ```WindowedStreamPartitioner```.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #779 from ymatsuda/partitioner
2016-01-14 17:20:08 -08:00
Guozhang Wang a3d3d5379d MINOR: add internal source topic for tracking
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Yasuhiro Mastuda

Closes #775 from guozhangwang/KRepartTopic
2016-01-14 17:09:33 -08:00
Guozhang Wang 4f22705c7d KAFKA-3081: KTable Aggregation
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Yasuhiro Matsuda

Closes #761 from guozhangwang/K3081
2016-01-13 17:15:57 -08:00
Guozhang Wang 40d731b871 KAFKA-2653: Add KStream/KTable Aggregation and KTable Join APIs
ping ymatsuda for reviews.

Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Yasuhiro Matsuda

Closes #730 from guozhangwang/K2653r
2016-01-07 17:18:33 -08:00
Randall Hauch 4836e525c8 KAFKA-2649: Add support for custom partitioning in topology sinks
Added option to use custom partitioning logic within each topology sink.

Author: Randall Hauch <rhauch@gmail.com>

Reviewers: Guozhang Wang

Closes #309 from rhauch/kafka-2649
2016-01-07 14:46:02 -08:00
Yasuhiro Matsuda 5aad4999d1 KAFKA-3016: phase-2. stream join implementations
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #737 from ymatsuda/windowed_join2
2016-01-06 14:34:40 -08:00
Yasuhiro Matsuda b0b3e5aebf KAFKA-3016: phase-1. A local store for join window
guozhangwang
An implementation of local store for join window. This implementation uses "rolling" of RocksDB instances for timestamp based truncation.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #726 from ymatsuda/windowed_join
2016-01-04 16:47:17 -08:00
Yasuhiro Matsuda 587a2f4efd KAFKA-2984: KTable should send old values when required
guozhangwang

At DAG level, `KTable<K,V>` sends (key, (new value, old value)) to down stream.  This is done by wrapping the new value and the old value in an instance of `Change<V>` class and sending it as a "value" part of the stream. The old value is omitted (set to null) by default for optimization. When any downstream processor needs to use the old value, the framework should enable it (see `KTableImpl.enableSendingOldValues()` and implementations of `KTableProcessorSupplier.enableSensingOldValues()`).

NOTE: This is meant to be used by aggregation. But, if there is a use case like a SQL database trigger, we can add a new KTable method to expose this.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #672 from ymatsuda/trigger
2015-12-16 15:37:53 -08:00
Yasuhiro Matsuda 841d2d1a26 MINOR: StreamThread performance optimization
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #680 from ymatsuda/perf
2015-12-15 23:53:23 -08:00
Yasuhiro Matsuda 1dcafadefc MINOR: test ktable state store creation
guozhangwang
* a test for ktable state store creation

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #661 from ymatsuda/more_ktable_test
2015-12-10 21:48:27 -08:00
Yasuhiro Matsuda 3b350cdff7 HOTFIX: fix table-table outer join and left join. more tests
guozhangwang

* fixed bugs in table-table outer/left joins
* added more tests

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #653 from ymatsuda/join_tests
2015-12-09 23:02:44 -08:00
Guozhang Wang ec466d358d KAFKA-2733: Standardize metric name for Kafka Streams
Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Yasuhiro Matsuda, Jun Rao

Closes #643 from guozhangwang/K2733
2015-12-09 22:15:34 -08:00
Yasuhiro Matsuda 991aad23ba KAFKA-2962: stream-table table-table joins
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #644 from ymatsuda/join_methods
2015-12-08 23:33:46 -08:00
Dong Lin ef92a8ae74 KAFKA-2668; Add a metric that records the total number of metrics
onurkaraman becketqin Do you have time to review this patch? It addresses the ticket that jjkoshy filed in KAFKA-2668.

Author: Dong Lin <lindong28@gmail.com>

Reviewers: Onur Karaman <okaraman@linkedin.com>, Joel Koshy <jjkoshy@gmail.com>, Guozhang Wang <wangguoz@gmail.com>, Jun Rao <junrao@gmail.com>

Closes #328 from lindong28/KAFKA-2668
2015-12-08 19:43:05 -08:00
Yasuhiro Matsuda 268392f5e9 HOTFIX: fix ProcessorStateManager to use correct ktable partitions
guozhangwang

* fix ProcessorStateManager to use correct ktable partitions
* more ktable tests

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #635 from ymatsuda/more_ktable_test
2015-12-07 23:16:19 -08:00
Guozhang Wang d05fa0a03b KAFKA-2804: manage changelog topics through ZK in PartitionAssignor
Author: Guozhang Wang <wangguoz@gmail.com>
Author: wangguoz@gmail.com <guozhang@Guozhang-Macbook.local>
Author: Guozhang Wang <guozhang@Guozhang-Macbook.local>

Reviewers: Yasuhiro Matsuda

Closes #579 from guozhangwang/K2804
2015-12-07 15:12:09 -08:00
Yasuhiro Matsuda 39c3512ece KAFKA-2856: Add KTable non-stateful APIs along with standby task support
guozhangwang
* added KTable API and impl
* added standby support for KTable

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #604 from ymatsuda/add_ktable
2015-12-04 14:59:24 -08:00
bbejeck a5382a333e KAFKA-2902: streaming config use get base consumer configs.
Changes made for using getBaseConsumerConfigs from StreamingConfig.getConsumerConfigs.

Author: bbejeck <bbejeck@gmail.com>
Author: Bill Bejeck <bbejeck@gmail.com>

Reviewers: Guozhang Wang

Closes #596 from bbejeck/KAFKA-2902-StreamingConfig-use-getBaseConsumerConfigs
2015-12-02 18:40:07 -08:00
Yasuhiro Matsuda d07bb18140 MINOR: comments on KStream methods, and fix generics
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #591 from ymatsuda/comments
2015-11-25 16:44:43 -08:00
Yasuhiro Matsuda 5e8958a856 MINOR: initialize Serdes with ProcessorContext
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #589 from ymatsuda/init_serdes_with_procctx
2015-11-25 15:21:17 -08:00
Yasuhiro Matsuda e14ae66e72 MINOR: change KStream processor names
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #587 from ymatsuda/kstream_processor_names
2015-11-25 14:53:24 -08:00
Yasuhiro Matsuda 617a91a236 HOTFIX: fix StreamTask.close()
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #586 from ymatsuda/fix_streamtask_close
2015-11-25 12:04:24 -08:00
Bill Bejeck 84c8d2bb86 KAFKA-2872: unite sink nodes with parent nodes in addSink
Starting a KafkaStream was getting an error due to the fact that the TopologyBuilder.addSink method was not connecting the sink with it parent(s) processor/sources.  Just needed to wire up the sink with it parent(s) in TopologyBuilder.addSink .

Author: bbejeck <bbejeck@gmail.com>

Reviewers: Guozhang Wang

Closes #572 from bbejeck/KAFKA-2872_kafka_stream_sink_not_connected_to_parent
2015-11-21 18:46:15 -08:00
Yasuhiro Matsuda 0158623480 MINOR: remove the group id from a restore consumer
guozhangwang
A restore consumer does not belong to a consumer group.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #543 from ymatsuda/no_group_for_restore_consumer
2015-11-17 17:39:21 -08:00
Yasuhiro Matsuda 1a36af80b7 MINOR: add KStream merge operator
guozhangwang

Added KStreamBuilder.merge(KStream...).

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #536 from ymatsuda/kstream_merge_operator
2015-11-17 17:34:54 -08:00
Yasuhiro Matsuda 4a3d244a2c MINOR: do not create a StandbyTask if there is no state store in the task
guozhangwang
An optimization which may reduce unnecessary poll for standby tasks.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #535 from ymatsuda/remove_empty_standby_task
2015-11-16 14:09:27 -08:00
Yasuhiro Matsuda 45e7f71309 KAFKA-2811: add standby tasks
guozhangwang
* added a new config param "num.standby.replicas" (the default value is 0).
* added a new abstract class AbstractTask
* added StandbyTask as a subclass of AbstractTask
* modified StreamTask to a subclass of AbstractTask
* StreamThread
  * standby tasks are created by calling StreamThread.addStandbyTask() from onPartitionsAssigned()
  * standby tasks are destroyed by calling StreamThread.removeStandbyTasks() from onPartitionRevoked()
  * In addStandbyTasks(), change log partitions are assigned to restoreConsumer.
  * In removeStandByTasks(), change log partitions are removed from restoreConsumer.
  * StreamThread polls change log records using restoreConsumer in the runLoop with timeout=0.
  * If records are returned, StreamThread calls StandbyTask.update and pass records to each standby tasks.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #526 from ymatsuda/standby_task
2015-11-16 13:34:42 -08:00
Grant Henke 370ce2b4b7 KAFKA-2815: Fix KafkaStreamingPartitionAssignorTest.testSubscription
Fails when order of elements is incorrect

Author: Grant Henke <granthenke@gmail.com>

Reviewers: Yasuhiro Matsuda

Closes #510 from granthenke/streams-test
2015-11-12 09:59:47 -08:00
Yasuhiro Matsuda 124f73b174 KAFKA-2763: better stream task assignment
guozhangwang

When the rebalance happens each consumer reports the following information to the coordinator.
* Client UUID (a unique id assigned to an instance of KafkaStreaming)
* Task ids of previously running tasks
* Task ids of valid local states on the client's state directory

TaskAssignor does the following
* Assign a task to a client which was running it previously. If there is no such client, assign a task to a client which has its valid local state.
* Try to balance the load among stream threads.
  * A client may have more than one stream threads. The assignor tries to assign tasks to a client proportionally to the number of threads.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #497 from ymatsuda/task_assignment
2015-11-11 16:14:27 -08:00
Jason Gustafson c39e79bb5a KAFKA-2691: Improve handling of authorization failure during metadata refresh
Author: Jason Gustafson <jason@confluent.io>

Reviewers: Jun Rao

Closes #394 from hachikuji/KAFKA-2691
2015-11-04 11:02:30 -08:00
Yasuhiro Matsuda 421de0a3f9 KAFKA-2727: Topology partial construction
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #411 from ymatsuda/topology_partial_construction
2015-11-04 09:57:50 -08:00
Yasuhiro Matsuda e466ccd711 KAFKA-2707: make KStream processor names deterministic
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #408 from ymatsuda/kstream_processor_name
2015-11-02 14:40:52 -08:00
Yasuhiro Matsuda 758272267c KAFKA-2706: make state stores first class citizens in the processor topology
* Added StateStoreSupplier
* StateStore
  * Added init(ProcessorContext context) method
* TopologyBuilder
  * Added addStateStore(StateStoreSupplier supplier, String... processNames)
  * Added connectProessorAndStateStores(String processorName, String... stateStoreNames)
    * This is for the case processors are not created when a store is added to the topology. (used by KStream)
* KStream
  * add stateStoreNames to process(), transform(), transformValues().
* Refactored existing state stores to implement StateStoreSupplier

guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #387 from ymatsuda/state_store_supplier
2015-11-02 13:24:48 -08:00
Yasuhiro Matsuda 13c3e049fb HOTFIX: correct sourceNodes for kstream.through()
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #374 from ymatsuda/fix_through_operator
2015-10-27 16:26:47 -07:00
Yasuhiro Matsuda af42c37899 HOTFIX: call consumer.poll() even when no task is assigned
StreamThread should keep calling consumer.poll() even when no task is assigned. This is necessary to get a task.

guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #373 from ymatsuda/no_task
2015-10-27 13:57:19 -07:00
Yasuhiro Matsuda 38a1b60553 HOTFIX: fix off-by-one stream offset commit
guozhangwang

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #372 from ymatsuda/commit_offset
2015-10-27 13:49:19 -07:00
Yasuhiro Matsuda e6f9b9e473 KAFKA-2694: Reformat task id as group id and partition id
guozhangwang

* A task id is now a class, ```TaskId```, that has a topic group id and a partition id fields.
* ```TopologyBuilder``` assigns a topic group id to a topic group. Related methods are changed accordingly.
* A state store uses the partition id part of the task id as the change log partition id.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #365 from ymatsuda/task_id
2015-10-26 20:14:19 -07:00
Yasuhiro Matsuda 71399ffe4c KAFKA-2652: integrate new group protocol into partition grouping
guozhangwang

* added ```PartitionGrouper``` (abstract class)
 * This class is responsible for grouping partitions. Each group forms a task.
 * Users may implement this class for custom grouping.
* added ```DefaultPartitionGrouper```
 * our default implementation of ```PartitionGrouper```
* added ```KafkaStreamingPartitionAssignor```
 * We always use this as ```PartitionAssignor``` of stream consumers.
 * Actual grouping is delegated to ```PartitionGrouper```.
* ```TopologyBuilder```
 * added ```topicGroups()```
   * This returns groups of related topics according to the topology
 * added ```copartitionSources(sourceNodes...)```
   * This is used by DSL layer. It asserts the specified source nodes must be copartitioned.
 * added ```copartitionGroups()```
   * This returns groups of copartitioned topics
* KStream layer
 * keep track of source nodes to determine copartition sources when steams are joined
 * source nodes are set to null when partitioning property is not preserved (ex. ```map()```, ```transform()```), and this indicates the stream is no longer joinable

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #353 from ymatsuda/grouping
2015-10-26 13:33:51 -07:00
Randall Hauch c553249b4e KAFKA-2594: Add InMemoryLRUCacheStore as a preliminary method for bounding in-memory stores
Added a new `KeyValueStore` implementation called `InMemoryLRUCacheStore` that keeps a maximum number of entries in-memory, and as the size exceeds the capacity the least-recently used entry is removed from the store and the backing topic. Also added unit tests for this new store and the existing `InMemoryKeyValueStore` and `RocksDBKeyValueStore` implementations. A new `KeyValueStoreTestDriver` class simplifies all of the other tests, and can be used by other libraries to help test their own custom implementations.

This PR depends upon [KAFKA-2593](https://issues.apache.org/jira/browse/KAFKA-2593) and its PR at https://github.com/apache/kafka/pull/255. Once that PR is merged, I can rebase this PR if desired.

Two issues were uncovered when creating these new unit tests, and both are also addressed as separate (small) commits in this PR:
* The `RocksDBKeyValueStore` initialization was not creating the file system directory if missing.
* `MeteredKeyValueStore` was casting to `ProcessorContextImpl` to access the `RecordCollector`, which prevent using `MeteredKeyValueStore` implementations in tests where something other than `ProcessorContextImpl` was used. The fix was to introduce a `RecordCollector.Supplier` interface to define this `recordCollector()` method, and change `ProcessorContextImpl` and `MockProcessorContext` to both implement this interface. Now, `MeteredKeyValueStore` can cast to the new interface to access the record collector rather than to a single concrete implementation, making it possible to use any and all current stores inside unit tests.

Author: Randall Hauch <rhauch@gmail.com>

Reviewers: Edward Ribeiro, Guozhang Wang

Closes #256 from rhauch/kafka-2594
2015-10-16 14:46:26 -07:00
Yasuhiro Matsuda 50a076d1e9 MINOR: set up temp directories properly in StreamTaskTest
guozhangwang
StreamTaskTest did not set up a temp directory for each test. This occasionally caused interference between tests through state directory locking.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #317 from ymatsuda/fix_StreamTaskTest
2015-10-15 11:49:44 -07:00
Yasuhiro Matsuda c50d39ea82 KAFKA-2654: optimize unnecessary poll(0) away in StreamTask
guozhangwang
This change aims to remove unnecessary ```consumer.poll(0)``` calls.
* ```once``` after some partition is resumed
* whenever the size of the top queue in any task is below ```BUFFERED_RECORDS_PER_PARTITION_CONFIG```

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #315 from ymatsuda/less_poll_zero
2015-10-15 11:06:51 -07:00
Randall Hauch 6e571225d5 KAFKA-2593: Key value stores can use specified serializers and deserializers
Add support for the key value stores to use specified serializers and deserializers (aka, "serdes"). Prior to this change, the stores were limited to only the default serdes specified in the topology's configuration and exposed to the processors via the ProcessorContext.

Now, using InMemoryKeyValueStore and RocksDBKeyValueStore are similar: both are parameterized on the key and value types, and both have similar multiple static factory methods. The static factory methods either take explicit key and value serdes, take key and value class types so the serdes can be inferred (only for the built-in serdes for string, integer, long, and byte array types), or use the default serdes on the ProcessorContext.

Author: Randall Hauch <rhauch@gmail.com>

Reviewers: Guozhang Wang

Closes #255 from rhauch/kafka-2593
2015-10-14 13:59:10 -07:00
Yasuhiro Matsuda b1ce9494e3 MINOR: flush record collector after local state flush
guozhangwang
Fix the order of flushing. Undoing the change I did sometime ago.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #304 from ymatsuda/flush_order
2015-10-13 15:31:27 -07:00
Yasuhiro Matsuda c67ca65889 MINOR: putting back kstream stateful transform methods
guozhangwang

* added back type safe stateful transform methods (kstream.transform() and kstream.transformValues())
* changed kstream.process() to void

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Guozhang Wang

Closes #292 from ymatsuda/transform_method
2015-10-09 16:28:40 -07:00
Randall Hauch 7233858bea KAFKA-2600: Align Kafka Streams' interfaces with Java 8 functional interfaces
A few of Kafka Stream's interfaces and classes are not as well-aligned with Java 8's functional interfaces. By making these changes, when Kafka moves to Java 8 these classes can extend standard Java 8 functional interfaces while remaining backward compatible. This will make it easier for developers to use Kafka Streams, and may allow us to eventually remove these custom interfaces and just use the standard Java 8 interfaces.

The changes include:

1. The 'apply' method of KStream's `Predicate` functional interface was renamed to `test` to match the method name on `java.util.function.BiPredicate`. This will allow KStream's `Predicate` to extend `BiPredicate` when Kafka moves to Java 8, and for the `KStream.filter` and `filterOut` methods to accept `BiPredicate`.
2. Renamed the `ProcessorDef` and `WindowDef` interfaces to `ProcessorSupplier` and `WindowSupplier`, respectively. Also the `SlidingWindowDef` class was renamed to `SlidingWindowSupplier`, and the `MockProcessorDef` test class was renamed to `MockProcessorSupplier`. The `instance()` method in all were renamed to `get()`, so that all of these can extend/implement Java 8's `java.util.function.Supplier<T>` interface in the future with no other changes and while remaining backward compatible. Variable names that used some form of "def" were changed to use "supplier".

These two sets of changes were made in separate commits.

Author: Randall Hauch <rhauch@gmail.com>

Reviewers: Ismael Juma, Guozhang Wang

Closes #270 from rhauch/kafka-2600
2015-10-09 12:19:07 -07:00
Yasuhiro Matsuda 5a921a3239 MINOR: typing ProcessorDef
guozhangwang
This code change properly types ProcessorDef. This also makes KStream.process() typesafe.

Author: Yasuhiro Matsuda <yasuhiro@confluent.io>

Reviewers: Ismael Juma, Guozhang Wang

Closes #289 from ymatsuda/typing_ProcessorDef
2015-10-08 23:19:12 -07:00
Guozhang Wang f236bc20ff HOTFIX: Persistent store in ProcessorStateManagerTest
ymatsuda junrao Could you take a quick look? The current unit test is failing on this.

Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Ismael Juma, Jun Rao

Closes #276 from guozhangwang/HF-ProcessorStateManager
2015-10-05 13:59:04 -07:00
Guozhang Wang 37f7d75e3d KAFKA-2591: Fix StreamingMetrics
Remove state storage upon unclean shutdown and fix streaming metrics used for local state.

Author: Guozhang Wang <wangguoz@gmail.com>

Reviewers: Edward Ribeiro, Yasuhiro Matsuda, Jun Rao

Closes #265 from guozhangwang/K2591
2015-10-02 20:12:34 -07:00
Guozhang Wang 263c10ab7c KIP-28: Add a processor client for Kafka Streaming
This work has been contributed by Jesse Anderson, Randall Hauch, Yasuhiro Matsuda and Guozhang Wang. The detailed design can be found in https://cwiki.apache.org/confluence/display/KAFKA/KIP-28+-+Add+a+processor+client.

Author: Guozhang Wang <wangguoz@gmail.com>
Author: Yasuhiro Matsuda <yasuhiro.matsuda@gmail.com>
Author: Yasuhiro Matsuda <yasuhiro@confluent.io>
Author: ymatsuda <yasuhiro.matsuda@gmail.com>
Author: Randall Hauch <rhauch@gmail.com>
Author: Jesse Anderson <jesse@smokinghand.com>
Author: Ismael Juma <ismael@juma.me.uk>
Author: Jesse Anderson <eljefe6a@gmail.com>

Reviewers: Ismael Juma, Randall Hauch, Edward Ribeiro, Gwen Shapira, Jun Rao, Jay Kreps, Yasuhiro Matsuda, Guozhang Wang

Closes #130 from guozhangwang/streaming
2015-09-25 17:27:58 -07:00