[role="xpack"]
[[start-trained-model-deployment]]
= Start trained model deployment API
[subs="attributes"]
++++
<titleabbrev>Start trained model deployment</titleabbrev>
++++

Starts a new trained model deployment.

beta::[]

[[start-trained-model-deployment-request]]
== {api-request-title}

`POST _ml/trained_models/<model_id>/deployment/_start`

[[start-trained-model-deployment-prereq]]
== {api-prereq-title}
Requires the `manage_ml` cluster privilege. This privilege is included in the
`machine_learning_admin` built-in role.

[[start-trained-model-deployment-desc]]
== {api-description-title}

Currently only `pytorch` models are supported for deployment. Once deployed,
the model can be used by the <<inference-processor,{infer-cap} processor>>
in an ingest pipeline or directly in the <<infer-trained-model>> API.

To scale inference performance, set the `number_of_allocations` and
`threads_per_allocation` parameters.

Increasing `threads_per_allocation` means more threads are used when
an inference request is processed on a node. This can improve inference speed
for certain models. It may also improve throughput.

Increasing `number_of_allocations` means more threads are used to
process multiple inference requests in parallel, which improves throughput.
Each model allocation uses a number of threads defined by
`threads_per_allocation`.

Model allocations are distributed across {ml} nodes. All allocations assigned
to a node share the same copy of the model in memory. To avoid
thread oversubscription, which is detrimental to performance, model allocations
are distributed in such a way that the total number of used threads does not
surpass the node's allocated processors.
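
As a rough illustration of the thread accounting described in this section, the
following Python sketch computes how many allocations fit on each node without
exceeding its allocated processors. The function names, node IDs, and processor
counts are hypothetical, and the greedy placement is a simplification for
illustration only, not the actual assignment logic.

[source,python]
----
from typing import Dict


def max_allocations_on_node(allocated_processors: int,
                            threads_per_allocation: int) -> int:
    """Largest number of allocations a node can host without using more
    threads than it has allocated processors."""
    return allocated_processors // threads_per_allocation


def place_allocations(node_processors: Dict[str, int],
                      number_of_allocations: int,
                      threads_per_allocation: int) -> Dict[str, int]:
    """Greedily assign allocations to nodes (hypothetical simplification).

    Returns a mapping of node ID to allocation count. Fewer allocations
    than requested are placed if the nodes run out of processors.
    """
    placement: Dict[str, int] = {}
    remaining = number_of_allocations
    for node_id, processors in node_processors.items():
        if remaining == 0:
            break
        capacity = max_allocations_on_node(processors, threads_per_allocation)
        count = min(remaining, capacity)
        if count > 0:
            placement[node_id] = count
            remaining -= count
    return placement


# Hypothetical cluster: two ML nodes with 8 and 4 allocated processors.
example = place_allocations({"node-a": 8, "node-b": 4},
                            number_of_allocations=5,
                            threads_per_allocation=2)
print(example)  # {'node-a': 4, 'node-b': 1}
----

In this sketch each allocation consumes `threads_per_allocation` threads, so a
node with 8 allocated processors can host at most 4 allocations of 2 threads
each.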

[[start-trained-model-deployment-path-params]]
== {api-path-parms-title}

`<model_id>`::
(Required, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-id]

[[start-trained-model-deployment-query-params]]
== {api-query-parms-title}

`cache_size`::
(Optional, <<byte-units,byte value>>)
The inference cache size (in memory outside the JVM heap) per node for the model.
The default value is the same as `model_size_bytes`. To disable the cache, specify `0b`.

`number_of_allocations`::
(Optional, integer)
The total number of allocations this model is assigned across {ml} nodes.
Increasing this value generally increases the throughput.
Defaults to 1.

`queue_capacity`::
(Optional, integer)
Controls how many inference requests are allowed in the queue at a time.
Every machine learning node in the cluster where the model can be allocated
has a queue of this size; when the number of requests exceeds the total value,
new requests are rejected with a 429 error. Defaults to 1024. The maximum allowed value is 1000000.

`threads_per_allocation`::
(Optional, integer)
Sets the number of threads used by each model allocation during inference. Increasing this value
generally increases the speed per inference request. The inference process is compute-bound;
`threads_per_allocation` must not exceed the number of available allocated processors per node.
Defaults to 1. Must be a power of 2. The maximum allowed value is 32.

`timeout`::
(Optional, time)
Controls the amount of time to wait for the model to deploy. Defaults
to 20 seconds.

`wait_for`::
(Optional, string)
Specifies the allocation status to wait for before returning. Defaults to
`started`. The value `starting` indicates deployment is starting but not yet on
any node. The value `started` indicates the model has started on at least one
node. The value `fully_allocated` indicates the deployment has started on all
valid nodes.
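
The documented limits on these parameters can be checked on the client side
before a request is sent. The following Python sketch validates the values
against the constraints listed in this section and builds the request path.
`build_start_request` is a hypothetical helper written for illustration; it is
not part of any {es} client.

[source,python]
----
from urllib.parse import quote, urlencode

# Values documented for the wait_for query parameter.
VALID_WAIT_FOR = {"starting", "started", "fully_allocated"}


def build_start_request(model_id, number_of_allocations=1,
                        threads_per_allocation=1, queue_capacity=1024,
                        timeout="20s", wait_for="started"):
    """Return the request path for starting a deployment (hypothetical helper).

    Raises ValueError if a value violates a documented constraint.
    """
    # threads_per_allocation must be a power of 2 and at most 32.
    is_power_of_two = (threads_per_allocation >= 1
                       and threads_per_allocation & (threads_per_allocation - 1) == 0)
    if not is_power_of_two or threads_per_allocation > 32:
        raise ValueError("threads_per_allocation must be a power of 2, at most 32")
    # queue_capacity has a documented maximum of 1000000.
    if queue_capacity > 1000000:
        raise ValueError("queue_capacity must not exceed 1000000")
    if wait_for not in VALID_WAIT_FOR:
        raise ValueError("wait_for must be one of: " + ", ".join(sorted(VALID_WAIT_FOR)))

    query = urlencode({
        "number_of_allocations": number_of_allocations,
        "threads_per_allocation": threads_per_allocation,
        "queue_capacity": queue_capacity,
        "timeout": timeout,
        "wait_for": wait_for,
    })
    return "_ml/trained_models/{}/deployment/_start?{}".format(
        quote(model_id, safe=""), query)


print("POST " + build_start_request(
    "elastic__distilbert-base-uncased-finetuned-conll03-english",
    number_of_allocations=2, threads_per_allocation=4, timeout="1m"))
----

The sketch sends every parameter explicitly for clarity. Parameters left at
their defaults can equally be omitted from the query string.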

[[start-trained-model-deployment-example]]
== {api-examples-title}

The following example starts a new deployment for the
`elastic__distilbert-base-uncased-finetuned-conll03-english` trained model:

[source,console]
--------------------------------------------------
POST _ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/deployment/_start?wait_for=started&timeout=1m
--------------------------------------------------
// TEST[skip:TBD]

The API returns the following results:

[source,console-result]
----
{
    "assignment": {
        "task_parameters": {
            "model_id": "elastic__distilbert-base-uncased-finetuned-conll03-english",
            "model_bytes": 265632637,
            "threads_per_allocation" : 1,
            "number_of_allocations" : 1,
            "queue_capacity" : 1024
        },
        "routing_table": {
            "uckeG3R8TLe2MMNBQ6AGrw": {
                "routing_state": "started",
                "reason": ""
            }
        },
        "assignment_state": "started",
        "start_time": "2022-11-02T11:50:34.766591Z"
    }
}
----
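
A client might inspect the routing table in such a response to see which nodes
have started the deployment. The following Python sketch assumes a response
body with the shape shown in the example result; `started_nodes` is a
hypothetical helper, and the sample body is trimmed to the fields it reads.

[source,python]
----
import json


def started_nodes(response: dict) -> list:
    """Return the IDs of nodes whose routing_state is "started"."""
    routing_table = response["assignment"]["routing_table"]
    return [node_id for node_id, route in routing_table.items()
            if route.get("routing_state") == "started"]


# Trimmed copy of the example response.
body = json.loads("""
{
    "assignment": {
        "routing_table": {
            "uckeG3R8TLe2MMNBQ6AGrw": {
                "routing_state": "started",
                "reason": ""
            }
        },
        "assignment_state": "started"
    }
}
""")

print(started_nodes(body))  # ['uckeG3R8TLe2MMNBQ6AGrw']
----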