* Update term-suggest.asciidoc
It is really easy to miss the fact that this is the default setting, since it is not highlighted or called out in any way.
* Apply review suggestion
---------
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
At a high level, this change adds global ranking on the coordinating node at the end of query reduction,
prior to the fetch phase. Individual rank methods are defined in plugins.
The first rank plugin added as part of this change is reciprocal rank fusion (RRF). RRF uses a relatively
simple formula for merging 1...n result sets together with sum(1/(k+d)) where k is a ranking constant
and d is a document's scored position within a result set from a query.
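
The formula above can be illustrated with a minimal sketch. This is not the plugin's code; the function name `rrf_merge`, the 1-based position convention, and the `k=60` default are assumptions made for the example.

```python
from collections import defaultdict


def rrf_merge(result_sets, k=60):
    """Merge ranked lists of doc ids using sum(1 / (k + d)).

    result_sets: iterable of lists, each ordered best-first.
    k: ranking constant (60 is an assumed value for illustration).
    d: 1-based position of the document within one result set (assumed).
    """
    scores = defaultdict(float)
    for results in result_sets:
        for position, doc_id in enumerate(results, start=1):
            scores[doc_id] += 1.0 / (k + position)
    # Highest fused score first; ties broken by doc id for determinism.
    return sorted(scores.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    merged = rrf_merge([["a", "b", "c"], ["b", "c", "d"]])
    print([doc_id for doc_id, _ in merged])
```

A document that appears in several result sets accumulates one term per set, so it tends to outrank a document that appears high in only one set.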
This adds a new parameter to `knn` that allows filtering nearest neighbor results that are outside a given similarity.
`num_candidates` and `k` are still required as these control the nearest-neighbor vector search accuracy and exploration. For each shard the query will search `num_candidates` and only keep those that are within the provided `similarity` boundary, and then finally reduce to only the global top `k` as normal.
For example, when using the `l2_norm` indexed similarity value, this could be considered a `radius` post-filter on `knn`.
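
The flow described above can be sketched roughly as follows. The sketch uses a brute-force scan in place of the approximate vector search, and all function names and the data layout are hypothetical; it only illustrates the order of the steps (per-shard candidates, similarity filter, global top `k`) for the `l2_norm` case.

```python
import heapq
import math


def l2_distance(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def shard_search(query, docs, num_candidates, radius):
    """Collect up to num_candidates nearest docs on one shard, then drop
    those outside the radius (the similarity boundary for l2_norm)."""
    nearest = heapq.nsmallest(
        num_candidates,
        ((l2_distance(query, vector), doc_id) for doc_id, vector in docs.items()),
    )
    return [(dist, doc_id) for dist, doc_id in nearest if dist <= radius]


def knn_with_similarity(query, shards, k, num_candidates, radius):
    """Reduce the filtered per-shard hits to the global top k."""
    hits = [
        hit
        for shard in shards
        for hit in shard_search(query, shard, num_candidates, radius)
    ]
    return heapq.nsmallest(k, hits)


if __name__ == "__main__":
    shards = [{"a": [0, 0], "b": [3, 4]}, {"c": [1, 0], "d": [10, 10]}]
    print(knn_with_similarity([0, 0], shards, k=2, num_candidates=2, radius=5.0))
```

Because the filter runs after candidates are collected, fewer than `k` hits can come back when few candidates fall inside the boundary.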
relates to: https://github.com/elastic/elasticsearch/issues/84929 && https://github.com/elastic/elasticsearch/pull/93574
The _terms_enum API currently does not support ip fields. However,
type-ahead-like completion is useful for UI purposes.
This change adds the ability to query ip fields via the _terms_enum API by
leveraging the terms enumeration available when doc_values are enabled on the
field, which is the default. In order to make prefix filtering fast, we
internally create a fast prefix automaton from the user-supplied prefix that
gets intersected with the shard's terms enumeration, similar to what we do for
keyword fields already.
Closes #89933
The text_embedding query vector builder that can be used with
KNN search to deliver a semantic search solution will be experimental
for its first release.
The _terms_enum API currently only supports the keyword, constant_keyword
and flattened field types. This change adds support for the `version` field type
that sorts according to the semantic versioning definition.
Closes #83403
This was only needed because the percolator uses a MemoryIndex which did
not support stored fields, and so when it ran a highlighting phase it needed to
force it to read from source. MemoryIndex added stored fields support in
Lucene 9.5, so we can remove this internal parameter.
The parameter remains available, but deprecated, via the REST layer, and no
longer has any effect.
This adds a new option to the knn search clause called query_vector_builder. This is a pluggable configuration that allows the query_vector to be created or retrieved.
It makes sense to allow more than one KNN search clause per individual search request. It may be that different documents have separate vector spaces or that a single doc is indexed with more than one vector space. In both of these scenarios, users may want to retrieve a resulting set that takes into account all their indexed vector spaces.
A prime example here would be searching a semantic text embedding along with searching an image embedding.
closes https://github.com/elastic/elasticsearch/issues/91187
Start instrumenting the Weight#count function in ProfileWeight, because we now use it to compute total hit counts and aggregation counts.
Resolves #85203
Loading of stored fields is currently handled directly in FetchPhase, with
some fairly complex logic examining various bits of the FetchContext to work
out what fields need to be loaded. This is further complicated by synthetic
source, which may have its own stored field requirements.
This commit tries to separate out these concerns a little by adding a new
StoredFieldsSpec record that holds information about which stored fields
need to be loaded. Each FetchSubPhaseProcessor can now report a
StoredFieldsSpec detailing what its requirements are, and these specs can
be merged together, along with requirements from a SourceLoader, to
determine up-front what fields should be loaded by the StoredFieldLoader.
The stored fields themselves are added into the SearchHit by a new
StoredFieldsPhase, which handles alias resolution and value post-
processing. The logic to determine when source should be loaded and
when not, based on the presence of script fields or stored fields, is
moved into FetchContext, which highlights some inconsistencies that
can be fixed in follow-up commits.
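
The merging idea can be shown with a small sketch. The actual record is Java; the Python below is only an analogy, and the field names `requires_source` and `required_stored_fields`, the `NO_REQUIREMENTS` constant, and the `combine` helper are hypothetical names chosen for the example.

```python
from dataclasses import dataclass
from functools import reduce


@dataclass(frozen=True)
class StoredFieldsSpec:
    """Hypothetical stand-in: what one processor needs loaded."""

    requires_source: bool = False
    required_stored_fields: frozenset = frozenset()

    def merge(self, other):
        # Source is needed if either side needs it; field sets are unioned.
        return StoredFieldsSpec(
            requires_source=self.requires_source or other.requires_source,
            required_stored_fields=(
                self.required_stored_fields | other.required_stored_fields
            ),
        )


NO_REQUIREMENTS = StoredFieldsSpec()


def combine(specs):
    """Fold the specs reported by each processor into a single spec."""
    return reduce(StoredFieldsSpec.merge, specs, NO_REQUIREMENTS)


if __name__ == "__main__":
    specs = [
        StoredFieldsSpec(requires_source=True),
        StoredFieldsSpec(required_stored_fields=frozenset({"title"})),
        StoredFieldsSpec(required_stored_fields=frozenset({"title", "tags"})),
    ]
    print(combine(specs))
```

Computing one merged spec up front means the loader can be configured once per fetch, rather than each sub-phase deciding independently what to read.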
Adds the query option to the _semantic_search endpoint for hybrid retrieval.
Scoring is controlled by the boost fields of the knn search and the query.
This commit removes the experimental tag from kNN search docs and makes some
docs improvements:
* Add a prominent warning about memory usage in the kNN search guide
* Link to the performance tuning guide from the main guide
* Clarify the memory requirements section in the tuning guide
This change adds an element_type as an optional mapping parameter for dense vector fields as
described in #89784. This also adds a byte element_type for dense vector fields that supports storing
dense vectors using only 8 bits per dimension. This is only supported when the mapping parameter
index is set to true.
The code follows a similar pattern to our NumberFieldMapper where we have an enum for
ElementType, and it has methods that DenseVectorFieldType and DenseVectorMapper can delegate to
to support each available type (just float and byte for now).
Adds a {index}_semantic_search endpoint which first converts the query text into a dense vector
using an NLP text embedding model, then performs a knn search against an index containing
dense vectors created with the same embedding model.