When starting a trained model deployment the user can tweak performance by setting the `model_threads` and `inference_threads` parameters. These parameters are hard to understand and cause confusion. This commit renames them, as well as the fields where their values are reported in the stats API:

- `model_threads` => `number_of_allocations`
- `inference_threads` => `threads_per_allocation`

The terminology is now as follows. A model deployment starts with a requested `number_of_allocations`. Each allocation gives the model another thread for executing parallel inference requests, so more allocations should increase throughput. In turn, each allocation may use a number of threads to parallelize each individual inference request. This is the `threads_per_allocation` setting; it increases inference speed and may also improve throughput.
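To make the relationship between the two settings concrete, here is a minimal sketch (the helper function is hypothetical and not part of Elasticsearch; only the parameter names come from this commit). It shows that the total CPU threads a deployment consumes is the product of the two settings:

```python
# Hypothetical illustration of the renamed deployment settings.
# The parameter names mirror the API; the helper itself is illustrative only.
def total_threads(number_of_allocations: int, threads_per_allocation: int) -> int:
    """Each allocation handles inference requests in parallel (throughput),
    and each individual request may itself be parallelized across
    threads_per_allocation threads (latency)."""
    return number_of_allocations * threads_per_allocation

# Two allocations, each using four threads per request -> 8 threads in total.
print(total_threads(2, 4))
```

Under this model, raising `number_of_allocations` scales how many requests run concurrently, while raising `threads_per_allocation` speeds up each request, at the same total thread cost.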