// tag::elser[]

ELSER produces token-weight pairs as output from the input text and the query.
The {es} <<sparse-vector,`sparse_vector`>> field type can store these
token-weight pairs as numeric feature vectors. The index must have a field with
the `sparse_vector` field type to index the tokens that ELSER generates.

To create a mapping for your ELSER index, refer to the
<<elser-mappings,Create the index mapping>> section of the tutorial. The
following example shows how to create an index mapping for `my-index` that
defines the `my_tokens` field, which will contain the ELSER output, as a
`sparse_vector` field.

[source,console]
----
PUT my-index
{
  "mappings": {
    "properties": {
      "my_tokens": { <1>
        "type": "sparse_vector" <2>
      },
      "my_text_field": { <3>
        "type": "text" <4>
      }
    }
  }
}
----
<1> The name of the field that will contain the tokens generated by ELSER.
<2> The field that contains the tokens must be a `sparse_vector` field.
<3> The name of the field from which to create the sparse vector representation.
In this example, the name of the field is `my_text_field`.
<4> The field type is `text` in this example.
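
To illustrate what the `sparse_vector` field stores, the following sketch
indexes a document whose `my_tokens` field holds token-weight pairs directly.
The text, tokens, and weights are made-up placeholder values, not real ELSER
output. In practice, this field is typically populated by an inference ingest
pipeline rather than written by hand.

[source,console]
----
POST my-index/_doc
{
  "my_text_field": "How is the weather in Jamaica?",
  "my_tokens": {
    "weather": 0.95,
    "jamaica": 1.4,
    "climate": 0.3
  }
}
----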

// end::elser[]


// tag::dense-vector[]

The models compatible with {es} NLP generate dense vectors as output. The
<<dense-vector,`dense_vector`>> field type is suitable for storing dense vectors
of numeric values. The index must have a field with the `dense_vector` field
type to index the embeddings that your selected third-party model generates.
Keep in mind that the model produces embeddings with a certain number of
dimensions. The `dense_vector` field must be configured with the same number of
dimensions using the `dims` option. Refer to the respective model documentation
to get information about the number of dimensions of the embeddings.

To review a mapping of an index for an NLP model, refer to the mapping code
snippet in the
{ml-docs}/ml-nlp-text-emb-vector-search-example.html#ex-text-emb-ingest[Add the text embedding model to an ingest inference pipeline]
section of the tutorial. The following example shows how to create an index
mapping that defines the `my_embeddings.predicted_value` field, which will
contain the model output, as a `dense_vector` field.

[source,console]
----
PUT my-index
{
  "mappings": {
    "properties": {
      "my_embeddings.predicted_value": { <1>
        "type": "dense_vector", <2>
        "dims": 384, <3>
        "index": true,
        "similarity": "cosine"
      },
      "my_text_field": { <4>
        "type": "text" <5>
      }
    }
  }
}
----
<1> The name of the field that will contain the embeddings generated by the
model.
<2> The field that contains the embeddings must be a `dense_vector` field.
<3> The model produces embeddings with a certain number of dimensions. The
`dense_vector` field must be configured with the same number of dimensions using
the `dims` option. Refer to the respective model documentation to get
information about the number of dimensions of the embeddings.
<4> The name of the field from which to create the dense vector representation.
In this example, the name of the field is `my_text_field`.
<5> The field type is `text` in this example.
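
One way to double-check that the field was created with the expected number of
dimensions is to retrieve its mapping with the get field mapping API. This is a
minimal sketch that assumes the `my-index` mapping from this example has already
been created:

[source,console]
----
GET my-index/_mapping/field/my_embeddings.predicted_value
----

The `dims` value in the response should match the embedding size stated in the
model documentation (384 in this example).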

// end::dense-vector[]