// tag::elser[]
ELSER produces token-weight pairs as output from the input text and the query.
The {es} <<sparse-vector,`sparse_vector`>> field type can store these
token-weight pairs as numeric feature vectors. The index must have a field with
the `sparse_vector` field type to index the tokens that ELSER generates.

To create a mapping for your ELSER index, refer to the
<<elser-mappings,Create the index mapping section>> of the tutorial. The example
shows how to create an index mapping for `my-index` that defines the
`my_tokens` field, which will contain the ELSER output, as a
`sparse_vector` field.

[source,console]
----
PUT my-index
{
  "mappings": {
    "properties": {
      "my_tokens": { <1>
        "type": "sparse_vector" <2>
      },
      "my_text_field": { <3>
        "type": "text" <4>
      }
    }
  }
}
----
<1> The name of the field that will contain the tokens generated by ELSER.
<2> The field that contains the tokens must be a `sparse_vector` field.
<3> The name of the field from which to create the sparse vector representation.
In this example, the name of the field is `my_text_field`.
<4> The field type is `text` in this example.
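
To illustrate what ends up in the `sparse_vector` field, the following sketch
indexes a document by hand. The tokens and weights are made-up placeholder
values, not real ELSER output. In practice an inference step populates
`my_tokens` rather than the client.

[source,console]
----
# Hypothetical tokens and weights, for illustration only.
PUT my-index/_doc/1
{
  "my_text_field": "How do I set up semantic search?",
  "my_tokens": {
    "semantic": 1.42,
    "search": 1.17,
    "setup": 0.63
  }
}
----

Each key in `my_tokens` is a token and each value is the weight assigned to it,
which is the token-weight pair structure that the `sparse_vector` field stores.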
// end::elser[]
// tag::dense-vector[]
The models compatible with {es} NLP generate dense vectors as output. The
<<dense-vector,`dense_vector`>> field type is suitable for storing dense vectors
of numeric values. The index must have a field with the `dense_vector` field
type to index the embeddings generated by the supported third-party model that
you selected. Each model produces embeddings with a fixed number of dimensions,
and the `dense_vector` field must be configured with the same number of
dimensions using the `dims` option. Refer to the respective model documentation
for the number of dimensions of the embeddings.

To review a mapping of an index for an NLP model, refer to the mapping code
snippet in the
{ml-docs}/ml-nlp-text-emb-vector-search-example.html#ex-text-emb-ingest[Add the text embedding model to an ingest inference pipeline]
section of the tutorial. The example shows how to create an index mapping that
defines the `my_embeddings.predicted_value` field, which will contain the model
output, as a `dense_vector` field.

[source,console]
----
PUT my-index
{
  "mappings": {
    "properties": {
      "my_embeddings.predicted_value": { <1>
        "type": "dense_vector", <2>
        "dims": 384, <3>
        "index": true,
        "similarity": "cosine"
      },
      "my_text_field": { <4>
        "type": "text" <5>
      }
    }
  }
}
----
<1> The name of the field that will contain the embeddings generated by the
model.
<2> The field that contains the embeddings must be a `dense_vector` field.
<3> The model produces embeddings with a fixed number of dimensions (384 in
this example). The `dense_vector` field must be configured with the same number
of dimensions using the `dims` option. Refer to the respective model
documentation for the number of dimensions of the embeddings.
<4> The name of the field from which to create the dense vector representation.
In this example, the name of the field is `my_text_field`.
<5> The field type is `text` in this example.
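
To see the `dims` requirement in action, the sketch below uses a separate,
hypothetical index named `my-small-index` with only three dimensions, so that a
complete vector fits on one line. The index name, the document, and the vector
values are illustrative assumptions. Real models produce much larger vectors.

[source,console]
----
# Hypothetical three-dimensional mapping, for illustration only.
PUT my-small-index
{
  "mappings": {
    "properties": {
      "my_embeddings.predicted_value": {
        "type": "dense_vector",
        "dims": 3,
        "index": true,
        "similarity": "cosine"
      }
    }
  }
}

# The vector has exactly three values, matching `dims`.
PUT my-small-index/_doc/1
{
  "my_embeddings.predicted_value": [0.12, -0.03, 0.4]
}
----

A vector whose length differs from `dims` is expected to be rejected when the
document is indexed, which is why the mapping must match the model's output
size.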
// end::dense-vector[]