[role="xpack"]
[[post-inference-api]]
=== Perform inference API
experimental[]
Performs an inference task on an input text by using an {infer} endpoint.

IMPORTANT: The {infer} APIs enable you to use certain services, such as built-in
{ml} models (ELSER, E5), models uploaded through Eland, Cohere, OpenAI, Azure,
Google AI Studio, Google Vertex AI or Hugging Face. For built-in models and
models uploaded through Eland, the {infer} APIs offer an alternative way to use
and manage trained models. However, if you do not plan to use the {infer} APIs
to use these models or if you want to use non-NLP models, use the
<<ml-df-trained-models-apis>>.

[discrete]
[[post-inference-api-request]]
==== {api-request-title}
`POST /_inference/<inference_id>`

`POST /_inference/<task_type>/<inference_id>`

[discrete]
[[post-inference-api-prereqs]]
==== {api-prereq-title}
* Requires the `monitor_inference` <<privileges-list-cluster,cluster privilege>>
(the built-in `inference_admin` and `inference_user` roles grant this privilege)

[discrete]
[[post-inference-api-desc]]
==== {api-description-title}
The perform {infer} API enables you to use {ml} models to perform specific tasks
on data that you provide as an input. The API returns a response with the
results of the tasks. The {infer} endpoint you use can perform only the specific
task that was defined when the endpoint was created with the
<<put-inference-api>>.

[discrete]
[[post-inference-api-path-params]]
==== {api-path-parms-title}
`<inference_id>`::
(Required, string)
The unique identifier of the {infer} endpoint.

`<task_type>`::
(Optional, string)
The type of {infer} task that the model performs.

[discrete]
[[post-inference-api-query-params]]
==== {api-query-parms-title}

`timeout`::
(Optional, timeout)
Controls the amount of time to wait for the inference to complete. Defaults to 30
seconds.
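
The path and query parameters above compose into a single request URL. The
following Python sketch shows how the pieces fit together; the
`inference_path` helper is hypothetical, not part of any Elastic client:

```python
def inference_path(inference_id, task_type=None, timeout=None):
    """Build a perform-inference request path.

    task_type is optional; when omitted, the endpoint's own task type
    is used. timeout (for example "45s") is sent as a query parameter.
    """
    parts = ["_inference"]
    if task_type:
        parts.append(task_type)
    parts.append(inference_id)
    path = "/" + "/".join(parts)
    if timeout:
        path += f"?timeout={timeout}"
    return path

print(inference_path("my-elser-model", task_type="sparse_embedding", timeout="45s"))
# /_inference/sparse_embedding/my-elser-model?timeout=45s
```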

[discrete]
[[post-inference-api-request-body]]
==== {api-request-body-title}
`input`::
(Required, string or array of strings)
The text on which you want to perform the {infer} task.
`input` can be a single string or an array.
+
--
[NOTE]
====
Inference endpoints for the `completion` task type currently only support a
single string as input.
====
--
`query`::
(Required, string)
Only for `rerank` {infer} endpoints. The search query text.
2023-09-29 16:12:07 +08:00
`task_settings`::
(Optional, object)
Task settings for the individual {infer} request.
These settings are specific to the `<task_type>` you specified and override the task settings specified when initializing the service.
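
The body fields described above can be assembled as in the following
illustrative sketch; the `inference_body` helper and the `top_n` task setting
are hypothetical examples, not fixed names:

```python
def inference_body(input_text, query=None, task_settings=None):
    """Assemble a perform-inference request body as a plain dict.

    input_text may be a single string or a list of strings; query only
    applies to rerank endpoints; task_settings overrides, for this one
    request, the settings stored on the endpoint.
    """
    if not isinstance(input_text, (str, list)):
        raise TypeError("input must be a string or a list of strings")
    body = {"input": input_text}
    if query is not None:
        body["query"] = query
    if task_settings is not None:
        body["task_settings"] = task_settings
    return body
```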

[discrete]
[[post-inference-api-example]]
==== {api-examples-title}
[discrete]
[[inference-example-completion]]
===== Completion example
The following example performs a completion on the example question.

[source,console]
------------------------------------------------------------
POST _inference/completion/openai_chat_completions
{
"input": "What is Elastic?"
}
------------------------------------------------------------
// TEST[skip:TBD]

The API returns the following response:

[source,console-result]
------------------------------------------------------------
{
"completion": [
{
"result": "Elastic is a company that provides a range of software solutions for search, logging, security, and analytics. Their flagship product is Elasticsearch, an open-source, distributed search engine that allows users to search, analyze, and visualize large volumes of data in real-time. Elastic also offers products such as Kibana, a data visualization tool, and Logstash, a log management and pipeline tool, as well as various other tools and solutions for data analysis and management."
}
]
}
------------------------------------------------------------
// NOTCONSOLE
[discrete]
[[inference-example-rerank]]
===== Rerank example
The following example performs reranking on the example input.

[source,console]
------------------------------------------------------------
POST _inference/rerank/cohere_rerank
{
"input": ["luke", "like", "leia", "chewy","r2d2", "star", "wars"],
"query": "star wars main character"
}
------------------------------------------------------------
// TEST[skip:TBD]

The API returns the following response:

[source,console-result]
------------------------------------------------------------
{
"rerank": [
{
"index": "2",
"relevance_score": "0.011597361",
"text": "leia"
},
{
"index": "0",
"relevance_score": "0.006338922",
"text": "luke"
},
{
"index": "5",
"relevance_score": "0.0016166499",
"text": "star"
},
{
"index": "4",
"relevance_score": "0.0011695103",
"text": "r2d2"
},
{
"index": "1",
"relevance_score": "5.614787E-4",
"text": "like"
},
{
"index": "6",
"relevance_score": "3.7850367E-4",
"text": "wars"
},
{
"index": "3",
"relevance_score": "1.2508839E-5",
"text": "chewy"
}
]
}
------------------------------------------------------------
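
Each result's `index` field refers back to the position of the corresponding
string in the request's `input` array, and results arrive sorted by descending
`relevance_score`. A minimal Python sketch of consuming the sample response
above (values are rendered as strings in the sample, so they are coerced
before use):

```python
inputs = ["luke", "like", "leia", "chewy", "r2d2", "star", "wars"]

# Abbreviated form of the sample rerank response shown above.
response = {
    "rerank": [
        {"index": "2", "relevance_score": "0.011597361", "text": "leia"},
        {"index": "0", "relevance_score": "0.006338922", "text": "luke"},
    ]
}

# Map each result back to the original input and coerce the values.
ranked = [(inputs[int(r["index"])], float(r["relevance_score"]))
          for r in response["rerank"]]
best_text, best_score = ranked[0]
```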
[discrete]
[[inference-example-sparse]]
===== Sparse embedding example
The following example performs sparse embedding on the example sentence.

[source,console]
------------------------------------------------------------
POST _inference/sparse_embedding/my-elser-model
{
"input": "The sky above the port was the color of television tuned to a dead channel."
}
------------------------------------------------------------
// TEST[skip:TBD]

The API returns the following response:

[source,console-result]
------------------------------------------------------------
{
"sparse_embedding": [
{
"port": 2.1259406,
"sky": 1.7073475,
"color": 1.6922266,
"dead": 1.6247464,
"television": 1.3525393,
"above": 1.2425821,
"tuned": 1.1440028,
"colors": 1.1218185,
"tv": 1.0111054,
"ports": 1.0067928,
"poem": 1.0042328,
"channel": 0.99471164,
"tune": 0.96235967,
"scene": 0.9020516,
(...)
},
(...)
]
}
------------------------------------------------------------
// NOTCONSOLE
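
Each object in `sparse_embedding` is a token-to-weight map. As an illustration
only (not how {es} scores ELSER queries internally), two such maps can be
compared with a plain dot product over their shared tokens:

```python
def sparse_dot(a, b):
    """Dot product of two token->weight maps; only tokens that appear
    in both maps contribute to the score."""
    return sum(weight * b[token] for token, weight in a.items() if token in b)
```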
[discrete]
[[inference-example-text-embedding]]
===== Text embedding example
The following example performs text embedding on the example sentence using the
Cohere integration.

[source,console]
------------------------------------------------------------
POST _inference/text_embedding/my-cohere-endpoint
{
"input": "The sky above the port was the color of television tuned to a dead channel.",
"task_settings": {
"input_type": "ingest"
}
}
------------------------------------------------------------
// TEST[skip:TBD]

The API returns the following response:

[source,console-result]
------------------------------------------------------------
{
"text_embedding": [
{
"embedding": [
        0.018569946,
        -0.036895752,
        0.01486969,
        -0.0045204163,
        -0.04385376,
        0.0075950623,
        0.04260254,
        -0.004005432,
        0.007865906,
        0.030792236,
        -0.050476074,
        0.011795044,
        -0.011642456,
        -0.010070801,
        (...)
      ]
    },
    (...)
  ]
}
------------------------------------------------------------
// NOTCONSOLE
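
A dense `embedding` array like the one above is typically compared to another
embedding with cosine similarity. A self-contained sketch (illustrative only;
in practice similarity search is usually delegated to a `dense_vector` field
query):

```python
import math

def cosine_similarity(a, b):
    """Cosine similarity of two equal-length dense vectors, such as the
    `embedding` arrays returned by a text_embedding endpoint."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)
```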