[[query-dsl-knn-query]]
=== Knn query
++++
<titleabbrev>Knn</titleabbrev>
++++

Finds the _k_ nearest vectors to a query vector, as measured by a similarity
metric. The `knn` query finds nearest vectors through approximate search on indexed
`dense_vector` fields. The preferred way to do approximate kNN search is through the
<<knn-search,top level knn section>> of a search request. The `knn` query is reserved for
expert cases, where there is a need to combine this query with other queries.

[[knn-query-ex-request]]
==== Example request

[source,console]
----
PUT my-image-index
{
  "mappings": {
    "properties": {
      "image-vector": {
        "type": "dense_vector",
        "dims": 3,
        "index": true,
        "similarity": "l2_norm"
      },
      "file-type": {
        "type": "keyword"
      },
      "title": {
        "type": "text"
      }
    }
  }
}
----

. Index your data.
+
[source,console]
----
POST my-image-index/_bulk?refresh=true
{ "index": { "_id": "1" } }
{ "image-vector": [1, 5, -20], "file-type": "jpg", "title": "mountain lake" }
{ "index": { "_id": "2" } }
{ "image-vector": [42, 8, -15], "file-type": "png", "title": "frozen lake" }
{ "index": { "_id": "3" } }
{ "image-vector": [15, 11, 23], "file-type": "jpg", "title": "mountain lake lodge" }
----
//TEST[continued]

. Run the search using the `knn` query, asking for the top 10 nearest vectors
from each shard, and then combine shard results to get the top 3 global results.
+
[source,console]
----
POST my-image-index/_search
{
  "size" : 3,
  "query" : {
    "knn": {
      "field": "image-vector",
      "query_vector": [-5, 9, -12],
      "k": 10
    }
  }
}
----
//TEST[continued]

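For comparison, the same search expressed through the
<<knn-search,top level knn section>> might look like the sketch below. It
reuses the index and query vector from the example above; the `k` and
`num_candidates` values are illustrative assumptions, chosen so that `k`
matches the 3 global results requested above.

[source,console]
----
POST my-image-index/_search
{
  "knn": {
    "field": "image-vector",
    "query_vector": [-5, 9, -12],
    "k": 3,
    "num_candidates": 10
  }
}
----
//TEST[continued]
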
[[knn-query-top-level-parameters]]
==== Top-level parameters for `knn`

`field`::
+
--
(Required, string) The name of the vector field to search against. Must be a
<<index-vectors-knn-search, `dense_vector` field with indexing enabled>>.
--

`query_vector`::
+
--
(Optional, array of floats or string) Query vector. Must have the same number of dimensions
as the vector field you are searching against. Must be either an array of floats or a hex-encoded byte vector.
Either this or `query_vector_builder` must be provided.
--

`query_vector_builder`::
+
--
(Optional, object) Query vector builder.
include::{es-ref-dir}/rest-api/common-parms.asciidoc[tag=knn-query-vector-builder]
--

`k`::
+
--
(Optional, integer) The number of nearest neighbors to return from each shard.
{es} collects `k` results from each shard, then merges them to find the global top results.
This value must be less than or equal to `num_candidates`. Defaults to `num_candidates`.
--

`num_candidates`::
+
--
(Optional, integer) The number of nearest neighbor candidates to consider per shard
while doing knn search. Cannot exceed 10,000. Increasing `num_candidates` tends to
improve the accuracy of the final results.
Defaults to `1.5 * k` if `k` is set, or `1.5 * size` if `k` is not set.
--

`filter`::
+
--
(Optional, query object) Query to filter the documents that can match.
The kNN search will return the top documents that also match this filter.
The value can be a single query or a list of queries. If `filter` is not provided,
all documents are allowed to match.

The filter is a pre-filter, meaning that it is applied **during** the approximate
kNN search to ensure that `num_candidates` matching documents are returned.
--

`similarity`::
+
--
(Optional, float) The minimum similarity required for a document to be considered
a match. The similarity value calculated relates to the raw
<<dense-vector-similarity, `similarity`>> used, not the document score. The matched
documents are then scored according to <<dense-vector-similarity, `similarity`>>
and the provided `boost` is applied.
--

`boost`::
+
--
(Optional, float) Floating point number used to multiply the
scores of matched documents. This value cannot be negative. Defaults to `1.0`.
--

`_name`::
+
--
(Optional, string) Name used to identify the query.
--

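As an illustration of how several of these parameters fit together, the sketch
below sets `k`, `num_candidates`, `similarity`, `boost` and `_name` on a single
`knn` query against the example index. All values, and the query name
`nearest-images`, are hypothetical and chosen for illustration only. It also
assumes that, for the `l2_norm` metric, the `similarity` threshold is
interpreted as a maximum distance. Under that assumption a threshold of `15`
would keep only vectors within a distance of 15 of the query vector; in the
sample data the distances to `[-5, 9, -12]` are roughly 10.8 (document 1),
40.4 (document 3) and 47.1 (document 2).

[source,console]
----
POST my-image-index/_search
{
  "size": 3,
  "query": {
    "knn": {
      "field": "image-vector",
      "query_vector": [-5, 9, -12],
      "k": 3,
      "num_candidates": 10,
      "similarity": 15,
      "boost": 2.0,
      "_name": "nearest-images"
    }
  }
}
----
//TEST[continued]
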
[[knn-query-filtering]]
==== Pre-filters and post-filters in knn query

There are two ways to filter documents that match a kNN query:

. **pre-filtering** – the filter is applied during the approximate kNN search
to ensure that `k` matching documents are returned.
. **post-filtering** – the filter is applied after the approximate kNN search
completes, which can result in fewer than `k` results, even when there are enough
matching documents.

Pre-filtering is supported through the `filter` parameter of the `knn` query.
Filters from <<filter-alias,aliases>> are also applied as pre-filters.

All other filters found in the Query DSL tree are applied as post-filters.
For example, the following `knn` query finds the top 3 documents with the nearest
vectors (`k=3`), which are then combined with a `term` filter that is applied as a
post-filter. The final set of documents contains only the single document
that passes the post-filter.

[source,console]
----
POST my-image-index/_search
{
  "size" : 10,
  "query" : {
    "bool" : {
      "must" : {
        "knn": {
          "field": "image-vector",
          "query_vector": [-5, 9, -12],
          "k": 3
        }
      },
      "filter" : {
        "term" : { "file-type" : "png" }
      }
    }
  }
}
----
//TEST[continued]

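By contrast, a pre-filtered version of the same search places the `term` query
in the `filter` parameter of the `knn` query itself. The request below is a
minimal sketch that reuses the example index and the same illustrative values;
here the filter restricts which documents the approximate search may consider,
rather than trimming its results afterwards.

[source,console]
----
POST my-image-index/_search
{
  "size" : 10,
  "query" : {
    "knn": {
      "field": "image-vector",
      "query_vector": [-5, 9, -12],
      "k": 3,
      "filter" : {
        "term" : { "file-type" : "png" }
      }
    }
  }
}
----
//TEST[continued]
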
[[knn-query-in-hybrid-search]]
==== Hybrid search with knn query
The `knn` query can be used as part of a hybrid search, where it is combined
with other lexical queries. For example, the query below finds documents with
`title` matching `mountain lake`, and combines them with the top 10 documents
that have the closest image vectors to the `query_vector`. The combined documents
are then scored and the 3 top-scoring documents are returned.

[source,console]
----
POST my-image-index/_search
{
  "size" : 3,
  "query": {
    "bool": {
      "should": [
        {
          "match": {
            "title": {
              "query": "mountain lake",
              "boost": 1
            }
          }
        },
        {
          "knn": {
            "field": "image-vector",
            "query_vector": [-5, 9, -12],
            "k": 10,
            "boost": 2
          }
        }
      ]
    }
  }
}
----
//TEST[continued]

[[knn-query-with-nested-query]]
==== Knn query inside a nested query

The `knn` query can be used inside a nested query. The behaviour here is similar
to <<nested-knn-search, top level nested kNN search>>:

* kNN search over nested `dense_vector` fields diversifies the top results over
the top-level document
* `filter` over the top-level document metadata is supported and acts as a
post-filter
* `filter` over `nested` field metadata is not supported

A sample query can look like this:

[source,js]
----
{
  "query" : {
    "nested" : {
      "path" : "paragraph",
      "query" : {
        "knn": {
          "query_vector": [
            0.45,
            45
          ],
          "field": "paragraph.vector",
          "num_candidates": 2
        }
      }
    }
  }
}
----
// NOTCONSOLE

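The sample query above presumes an index in which `paragraph` is a `nested`
field containing an indexed two-dimensional `dense_vector` subfield named
`vector`. A mapping along the following lines would fit that shape. This is
only a sketch: the index name `my-passage-index`, the `text` subfield and the
choice of `l2_norm` are assumptions made for illustration and do not come from
the query itself.

[source,js]
----
PUT my-passage-index
{
  "mappings": {
    "properties": {
      "paragraph": {
        "type": "nested",
        "properties": {
          "vector": {
            "type": "dense_vector",
            "dims": 2,
            "index": true,
            "similarity": "l2_norm"
          },
          "text": {
            "type": "text"
          }
        }
      }
    }
  }
}
----
// NOTCONSOLE
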
[[knn-query-aggregations]]
==== Knn query with aggregations
The `knn` query calculates aggregations on the top `k` documents from each shard.
Thus, the final results from aggregations contain
`k * number_of_shards` documents. This is different from
the <<knn-search,top level knn section>>, where aggregations are
calculated on the global top `k` nearest documents.

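As a sketch of this behaviour, the request below runs a `terms` aggregation on
the `file-type` field of the example index alongside a `knn` query with `k`
set to 2. The aggregation name `file-types` and the value of `k` are
illustrative assumptions. With this query, the aggregation is computed over up
to 2 documents from each shard, so the number of documents it covers depends
on how many shards the index has.

[source,console]
----
POST my-image-index/_search
{
  "size": 3,
  "query": {
    "knn": {
      "field": "image-vector",
      "query_vector": [-5, 9, -12],
      "k": 2
    }
  },
  "aggs": {
    "file-types": {
      "terms": { "field": "file-type" }
    }
  }
}
----
//TEST[continued]
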