From 9758e2d7fdbdcd493d0892deeecabe302deade8a Mon Sep 17 00:00:00 2001 From: "A. Sophie Blee-Goldman" Date: Mon, 9 Dec 2019 13:35:17 -0800 Subject: [PATCH] port paragrpah from CP docs (#7808) The AK Streams architecture docs should explain how the maximum parallelism is determined Reviewers: Bill Bejeck --- docs/streams/architecture.html | 8 ++++++++ 1 file changed, 8 insertions(+) diff --git a/docs/streams/architecture.html b/docs/streams/architecture.html index 0dbb1dc97ff..5b5e59e3507 100644 --- a/docs/streams/architecture.html +++ b/docs/streams/architecture.html @@ -52,6 +52,14 @@ these record buffers. As a result stream tasks can be processed independently and in parallel without manual intervention.

+

+ Slightly simplified, the maximum parallelism at which your application may run is bounded by the maximum number of stream tasks, which itself is determined by + maximum number of partitions of the input topic(s) the application is reading from. For example, if your input topic has 5 partitions, then you can run up to 5 + applications instances. These instances will collaboratively process the topic’s data. If you run a larger number of app instances than partitions of the input + topic, the “excess” app instances will launch but remain idle; however, if one of the busy instances goes down, one of the idle instances will resume the former’s + work. +

+

It is important to understand that Kafka Streams is not a resource manager, but a library that "runs" anywhere its stream processing application runs. Multiple instances of the application are executed either on the same machine, or spread across multiple machines and tasks can be distributed automatically