port paragraph from CP docs (#7808)

The AK Streams architecture docs should explain how the maximum parallelism is determined
Reviewers: Bill Bejeck <bbejeck@gmail.com>
A. Sophie Blee-Goldman 2019-12-09 13:35:17 -08:00 committed by Bill Bejeck
parent 54b26c0aa4
commit 9758e2d7fd
1 changed files with 8 additions and 0 deletions

@@ -52,6 +52,14 @@
these record buffers. As a result stream tasks can be processed independently and in parallel without manual intervention.
</p>
<p>
Slightly simplified, the maximum parallelism at which your application may run is bounded by the maximum number of stream tasks, which itself is determined by
the maximum number of partitions of the input topic(s) the application is reading from. For example, if your input topic has 5 partitions, then you can run up to 5
application instances. These instances will collaboratively process the topic's data. If you run a larger number of app instances than partitions of the input
topic, the “excess” app instances will launch but remain idle; however, if one of the busy instances goes down, one of the idle instances will resume the former's
work.
</p>
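<p>
The bound described above can be illustrated with a small, self-contained sketch. It is only a simplified model of the rule stated in the preceding paragraph, not Kafka Streams' actual task assignment logic: the class <code>ParallelismSketch</code> and its methods are hypothetical names introduced for illustration, and the round-robin placement is an assumption chosen to keep the example short.
</p>

```java
import java.util.List;

/** Simplified model of the parallelism bound; hypothetical, not part of the Kafka Streams API. */
final class ParallelismSketch {

    /** Upper bound on stream tasks: the largest partition count among the input topics. */
    static int maxTasks(List<Integer> partitionsPerInputTopic) {
        int max = 0;
        for (int partitions : partitionsPerInputTopic) {
            max = Math.max(max, partitions);
        }
        return max;
    }

    /** Spreads tasks round-robin over instances (an assumption) and returns the task count per instance. */
    static int[] tasksPerInstance(int taskCount, int instanceCount) {
        if (instanceCount <= 0) {
            throw new IllegalArgumentException("instanceCount must be positive");
        }
        int[] counts = new int[instanceCount];
        for (int task = 0; task < taskCount; task++) {
            counts[task % instanceCount]++;
        }
        return counts;
    }

    /** Counts instances that received no task and would therefore sit idle. */
    static long idleInstances(int[] counts) {
        long idle = 0;
        for (int count : counts) {
            if (count == 0) {
                idle++;
            }
        }
        return idle;
    }

    public static void main(String[] args) {
        int tasks = maxTasks(List.of(5));          // one input topic with 5 partitions
        int[] counts = tasksPerInstance(tasks, 7); // 7 running app instances
        // prints "tasks=5 idle=2"
        System.out.println("tasks=" + tasks + " idle=" + idleInstances(counts));
    }
}
```

<p>
With 5 partitions and 7 instances, the sketch reports 5 tasks and 2 idle instances, matching the example in the text: the two “excess” instances hold no task until a busy instance goes down.
</p>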
<p>
It is important to understand that Kafka Streams is not a resource manager, but a library that "runs" anywhere its stream processing application runs.
Multiple instances of the application are executed either on the same machine, or spread across multiple machines and tasks can be distributed automatically