elasticsearch/docs/reference/ml/trained-models/apis/put-trained-model-vocabular...

83 lines
2.1 KiB
Plaintext

[role="xpack"]
[[put-trained-model-vocabulary]]
= Create trained model vocabulary API
[subs="attributes"]
++++
<titleabbrev>Create trained model vocabulary</titleabbrev>
++++
Creates a trained model vocabulary.
This is supported only for natural language processing (NLP) models.
[[ml-put-trained-model-vocabulary-request]]
== {api-request-title}
`PUT _ml/trained_models/<model_id>/vocabulary/`
[[ml-put-trained-model-vocabulary-prereq]]
== {api-prereq-title}
Requires the `manage_ml` cluster privilege. This privilege is included in the
`machine_learning_admin` built-in role.
[[ml-put-trained-model-vocabulary-desc]]
== {api-description-title}
The vocabulary is stored in the index as described in
`inference_config.*.vocabulary` of the trained model definition.
[[ml-put-trained-model-vocabulary-path-params]]
== {api-path-parms-title}
`<model_id>`::
(Required, string)
include::{es-repo-dir}/ml/ml-shared.asciidoc[tag=model-id]
[[ml-put-trained-model-vocabulary-request-body]]
== {api-request-body-title}
`vocabulary`::
(array)
The model vocabulary. Must not be empty.
`merges`::
(Optional, array)
The model merges used in byte-pair encoding. The merges must be sub-token pairs, space delimited, and in order of
preference. Example: ["f o", "fo o"]. Must be provided for RoBERTa and BART style models.
`scores`::
(Optional, array)
Vocabulary value scores used by sentence-piece tokenization. Must have the same length as `vocabulary`.
Required for unigram sentence-piece tokenized models like XLMRoberta and T5.
[[ml-put-trained-model-vocabulary-example]]
== {api-examples-title}
The following example shows how to create a model vocabulary for a
previously stored trained model configuration.
[source,console]
--------------------------------------------------
PUT _ml/trained_models/elastic__distilbert-base-uncased-finetuned-conll03-english/vocabulary
{
"vocabulary": [
"[PAD]",
"[unused0]",
...
]
}
--------------------------------------------------
// TEST[s/\.\.\./"[PAD]"/ skip:TBD]
The API returns the following results:
[source,console-result]
----
{
"acknowledged": true
}
----