[role="xpack"]
[[start-trained-model-deployment]]
= Start trained model deployment API
[subs="attributes"]
++++
<titleabbrev>Start trained model deployment</titleabbrev>
++++

Starts a new trained model deployment.

preview::[]

[[start-trained-model-deployment-request]]
== {api-request-title}

`POST _ml/trained_models/<model_id>/deployment/_start`

[[start-trained-model-deployment-prereq]]
== {api-prereq-title}
Requires the `manage_ml` cluster privilege. This privilege is included in the
`machine_learning_admin` built-in role.

[[start-trained-model-deployment-desc]]
== {api-description-title}

Currently only `pytorch` models are supported for deployment. When deployed,
the model attempts allocation to every machine learning node. Once deployed,
the model can be used by the <<inference-processor,{infer-cap} processor>>
in an ingest pipeline or directly in the <<infer-trained-model>> API.

[[start-trained-model-deployment-path-params]]
== {api-path-parms-title}

`<model_id>`::
(Required, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-id]

[[start-trained-model-deployment-query-params]]
== {api-query-parms-title}

`number_of_allocations`::
(Optional, integer)
The number of model allocations on each node where the model is deployed.
All allocations on a node share the same copy of the model in memory but use
a separate set of threads to evaluate the model.
Increasing this value generally increases the throughput.
If this setting is greater than the number of hardware threads,
it is automatically changed to a value less than the number of hardware threads.
Defaults to 1.

[NOTE]
=============================================
If the sum of `threads_per_allocation` and `number_of_allocations` is greater than the number of
hardware threads, the number of `threads_per_allocation` is reduced.
=============================================

`queue_capacity`::
(Optional, integer)
Controls how many inference requests are allowed in the queue at a time.
Every machine learning node in the cluster where the model can be allocated
has a queue of this size; when the number of requests exceeds the total value,
new requests are rejected with a 429 error. Defaults to 1024.

`threads_per_allocation`::
(Optional, integer)
Sets the number of threads used by each model allocation during inference. Increasing this value
generally increases the inference speed. Inference is compute-bound; any value
greater than the number of available hardware threads on the machine does not increase the
inference speed. If this setting is greater than the number of hardware threads,
it is automatically changed to a value less than the number of hardware threads.
Defaults to 1.

`timeout`::
(Optional, time)
Controls the amount of time to wait for the model to deploy. Defaults
to 20 seconds.

`wait_for`::
(Optional, string)
Specifies the allocation status to wait for before returning. Defaults to
`started`. The value `starting` indicates deployment is starting but not yet on
any node. The value `started` indicates the model has started on at least one
node. The value `fully_allocated` indicates the deployment has started on all
valid nodes.

[[start-trained-model-deployment-example]]
== {api-examples-title}

The following example starts a new deployment for the
`elastic__distilbert-base-uncased-finetuned-conll03-english` trained model:

[source,console]
--------------------------------------------------
POST _ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/deployment/_start?wait_for=started&timeout=1m
--------------------------------------------------
// TEST[skip:TBD]

The API returns the following results:

[source,console-result]
----
{
    "assignment": {
        "task_parameters": {
            "model_id": "elastic__distilbert-base-uncased-finetuned-conll03-english",
            "model_bytes": 265632637
        },
        "routing_table": {
            "uckeG3R8TLe2MMNBQ6AGrw": {
                "routing_state": "started",
                "reason": ""
            }
        },
        "assignment_state": "started",
        "start_time": "2021-11-02T11:50:34.766591Z"
    }
}
----
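
As a further sketch, the following request sets the threading and queue
parameters described above and waits until the deployment has started on all
valid nodes. The parameter values are illustrative assumptions, not tuning
recommendations:

[source,console]
--------------------------------------------------
POST _ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/deployment/_start?number_of_allocations=2&threads_per_allocation=4&queue_capacity=2048&wait_for=fully_allocated&timeout=1m
--------------------------------------------------
// TEST[skip:TBD]

With these values, each node where the model is deployed is asked to run two
allocations of four threads each, subject to the automatic adjustment
described for `number_of_allocations` and `threads_per_allocation`.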