elasticsearch/docs/reference/vectors/vector-functions.asciidoc

[role="xpack"]
[[vector-functions]]
===== Functions for vector fields

NOTE: Vector functions are calculated by linearly scanning all matched
documents, so expect the query time to grow linearly
with the number of matched documents. For this reason, we recommend
limiting the number of matched documents with a `query` parameter.

This is the list of available vector functions and vector access methods:

1. <<vector-functions-cosine,`cosineSimilarity`>> – calculates cosine similarity
2. <<vector-functions-dot-product,`dotProduct`>> – calculates dot product
3. <<vector-functions-l1,`l1norm`>> – calculates L^1^ distance
4. <<vector-functions-l2,`l2norm`>> – calculates L^2^ distance
5. <<vector-functions-accessing-vectors,`doc[<field>].vectorValue`>> – returns a vector's value as an array of floats
6. <<vector-functions-accessing-vectors,`doc[<field>].magnitude`>> – returns a vector's magnitude

NOTE: The recommended way to access dense vectors is through the
`cosineSimilarity`, `dotProduct`, `l1norm` or `l2norm` functions. Note,
however, that you should call these functions only once per script. For example,
don't use these functions in a loop to calculate the similarity between a
document vector and multiple other vectors. If you need that functionality,
reimplement these functions yourself by
<<vector-functions-accessing-vectors,accessing vector values directly>>.

Let's create an index with a `dense_vector` mapping and index a couple
of documents into it.

[source,console]
--------------------------------------------------
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "my_dense_vector": {
        "type": "dense_vector",
        "dims": 3
      },
      "status" : {
        "type" : "keyword"
      }
    }
  }
}

PUT my-index-000001/_doc/1
{
  "my_dense_vector": [0.5, 10, 6],
  "status" : "published"
}

PUT my-index-000001/_doc/2
{
  "my_dense_vector": [-0.5, 10, 10],
  "status" : "published"
}

POST my-index-000001/_refresh

--------------------------------------------------
// TESTSETUP

[[vector-functions-cosine]]
====== Cosine similarity

The `cosineSimilarity` function calculates the cosine similarity
between a given query vector and document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published" <1>
            }
          }
        }
      },
      "script": {
        "source": "cosineSimilarity(params.query_vector, 'my_dense_vector') + 1.0", <2>
        "params": {
          "query_vector": [4, 3.4, -0.2]  <3>
        }
      }
    }
  }
}
--------------------------------------------------

<1> To restrict the number of documents on which script score calculation is applied, provide a filter.
<2> The script adds 1.0 to the cosine similarity to prevent the score from being negative.
<3> To take advantage of the script optimizations, provide a query vector as a script parameter.

NOTE: If the number of dimensions in a document's dense vector field
differs from that of the query vector, an error is thrown.
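
To make the scoring concrete, the following Python sketch reproduces the
arithmetic of the script above for the two sample documents. It is only an
illustration of the formula, not how {es} implements `cosineSimilarity`; the
helper name `cosine_similarity` is an assumption made for this example, and
scores returned by {es} may differ slightly because of floating-point precision.

[source,python]
--------------------------------------------------
import math

def cosine_similarity(query_vector, doc_vector):
    """Dot product divided by the product of the two magnitudes."""
    if len(query_vector) != len(doc_vector):
        # Mirrors the documented behavior: mismatched dimensions are an error.
        raise ValueError("query and document vectors must have the same number of dimensions")
    dot = sum(q * d for q, d in zip(query_vector, doc_vector))
    query_mag = math.sqrt(sum(q * q for q in query_vector))
    doc_mag = math.sqrt(sum(d * d for d in doc_vector))
    return dot / (query_mag * doc_mag)

query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

# Cosine similarity lies in [-1, 1]; adding 1.0, as the script does,
# keeps every score in [0, 2].
scores = {doc_id: cosine_similarity(query_vector, vec) + 1.0
          for doc_id, vec in docs.items()}

# Print the documents from highest to lowest score.
for doc_id, score in sorted(scores.items(), key=lambda item: item[1], reverse=True):
    print(doc_id, round(score, 4))
--------------------------------------------------

With these sample vectors, document `1` points in a direction closer to the
query vector than document `2`, so it receives the higher score.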

[[vector-functions-dot-product]]
====== Dot product

The `dotProduct` function calculates the dot product
between a given query vector and document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          double value = dotProduct(params.query_vector, 'my_dense_vector');
          return sigmoid(1, Math.E, -value); <1>
        """,
        "params": {
          "query_vector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------

<1> Using the standard sigmoid function prevents scores from being negative.
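
The following Python sketch walks through the same calculation outside of {es}.
It assumes that the script score helper `sigmoid(value, k, a)` computes
`value` raised to `a`, divided by the sum of `k` raised to `a` and `value`
raised to `a`; under that assumption, `sigmoid(1, Math.E, -value)` reduces to
the logistic function `1 / (1 + exp(-value))`. The Python function names are
hypothetical stand-ins for the Painless functions.

[source,python]
--------------------------------------------------
import math

def dot_product(query_vector, doc_vector):
    """Sum of the element-wise products of the two vectors."""
    return sum(q * d for q, d in zip(query_vector, doc_vector))

def sigmoid(value, k, a):
    # Assumed definition of the script score helper:
    # value^a / (k^a + value^a)
    return value ** a / (k ** a + value ** a)

query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

scores = {}
for doc_id, vec in docs.items():
    value = dot_product(query_vector, vec)
    # Same call as in the script: the result always lies between 0 and 1.
    scores[doc_id] = sigmoid(1, math.e, -value)
    print(doc_id, value, scores[doc_id])

# A negative dot product maps to a score below 0.5 instead of a negative score.
negative_example = sigmoid(1, math.e, -(-2.0))
print(negative_example)
--------------------------------------------------

Under the assumption above, large positive dot products, such as those of the
two sample documents, produce scores very close to 1, so the sigmoid separates
them only marginally. The transformation matters most when dot products can be
negative.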

[[vector-functions-l1]]
====== L^1^ distance (Manhattan distance)

The `l1norm` function calculates the L^1^ distance
(Manhattan distance) between a given query vector and
document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l1norm(params.queryVector, 'my_dense_vector'))", <1>
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------

<1> Unlike `cosineSimilarity`, which represents similarity, `l1norm` and
`l2norm` (shown below) represent distances or differences. This means that
the more similar the vectors are, the lower the values produced by the
`l1norm` and `l2norm` functions.
Because more similar vectors should score higher,
the script takes the reciprocal of the distance. Adding `1` to the
denominator avoids division by 0 when a document vector matches
the query exactly.
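
As a rough illustration, this Python sketch applies the same formula to the two
sample documents. The function name `l1_distance` is hypothetical and stands in
for the Painless `l1norm` function; the sketch shows the arithmetic only, not
the {es} implementation.

[source,python]
--------------------------------------------------
def l1_distance(query_vector, doc_vector):
    """Manhattan distance: sum of the absolute element-wise differences."""
    return sum(abs(q - d) for q, d in zip(query_vector, doc_vector))

query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

# Reciprocal of (1 + distance): smaller distances give higher scores.
scores = {doc_id: 1 / (1 + l1_distance(query_vector, vec))
          for doc_id, vec in docs.items()}

for doc_id, score in scores.items():
    print(doc_id, l1_distance(query_vector, docs[doc_id]), score)

# A document vector identical to the query has distance 0 and the maximum score.
exact_match_score = 1 / (1 + l1_distance(query_vector, query_vector))
print(exact_match_score)
--------------------------------------------------

Document `1` is closer to the query vector than document `2`, so it scores
higher. An exact match yields a score of 1, which is why the `1` in the
denominator is needed.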

[[vector-functions-l2]]
====== L^2^ distance (Euclidean distance)

The `l2norm` function calculates the L^2^ distance
(Euclidean distance) between a given query vector and
document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l2norm(params.queryVector, 'my_dense_vector'))",
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------
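
The same arithmetic can be sketched in Python for the two sample documents.
As before, `l2_distance` is a hypothetical name standing in for the Painless
`l2norm` function, and the sketch is not the {es} implementation.

[source,python]
--------------------------------------------------
import math

def l2_distance(query_vector, doc_vector):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((q - d) ** 2 for q, d in zip(query_vector, doc_vector)))

query_vector = [4, 3.4, -0.2]
docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

# Reciprocal of (1 + distance), exactly as in the script source above.
scores = {doc_id: 1 / (1 + l2_distance(query_vector, vec))
          for doc_id, vec in docs.items()}

for doc_id, score in scores.items():
    print(doc_id, round(l2_distance(query_vector, docs[doc_id]), 4), round(score, 4))
--------------------------------------------------

The ranking matches the `l1norm` example: document `1` lies closer to the
query vector and therefore scores higher than document `2`.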

[[vector-functions-missing-values]]
====== Checking for missing values

If a document doesn't have a value for a vector field on which a vector function
is executed, an error will be thrown.

You can check if a document has a value for the field `my_vector` with
`doc['my_vector'].size() == 0`. Your overall script can look like this:

[source,js]
--------------------------------------------------
"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, 'my_vector')"
--------------------------------------------------
// NOTCONSOLE

[[vector-functions-accessing-vectors]]
====== Accessing vectors directly

You can access vector values directly through the following functions:

- `doc[<field>].vectorValue` – returns a vector's value as an array of floats

- `doc[<field>].magnitude` – returns a vector's magnitude as a float
(for vectors created prior to version 7.5, the magnitude is not stored,
so this function calculates it anew every time it is called).

For example, the script below implements a cosine similarity using these
two functions:

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          float[] v = doc['my_dense_vector'].vectorValue;
          float vm = doc['my_dense_vector'].magnitude;
          float dotProduct = 0;
          for (int i = 0; i < v.length; i++) {
            dotProduct += v[i] * params.queryVector[i];
          }
          return dotProduct / (vm * (float) params.queryVectorMag);
        """,
        "params": {
          "queryVector": [4, 3.4, -0.2],
          "queryVectorMag": 5.25357
        }
      }
    }
  }
}
--------------------------------------------------
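
The `queryVectorMag` parameter in the request above is the magnitude
(Euclidean norm) of `queryVector`, computed once on the client rather than for
every document. The following Python sketch shows one way to derive it and
mirrors the loop in the script. The names `script_cosine` and
`vector_magnitude` are hypothetical, and the document magnitude is computed
here only to stand in for `doc['my_dense_vector'].magnitude`.

[source,python]
--------------------------------------------------
import math

def vector_magnitude(vector):
    """Euclidean norm of a vector."""
    return math.sqrt(sum(x * x for x in vector))

def script_cosine(doc_vector, doc_magnitude, query_vector, query_vector_mag):
    # Same steps as the Painless script: accumulate the dot product in a loop,
    # then divide by the product of the two magnitudes.
    dot = 0.0
    for i in range(len(doc_vector)):
        dot += doc_vector[i] * query_vector[i]
    return dot / (doc_magnitude * query_vector_mag)

query_vector = [4, 3.4, -0.2]
# Computed once per request and passed to the script as a parameter.
query_vector_mag = vector_magnitude(query_vector)
print(round(query_vector_mag, 5))

docs = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}
scores = {doc_id: script_cosine(vec, vector_magnitude(vec), query_vector, query_vector_mag)
          for doc_id, vec in docs.items()}

for doc_id, score in scores.items():
    print(doc_id, round(score, 4))
--------------------------------------------------

Unlike the `cosineSimilarity` example, this script does not add `1.0` to the
result, so the score can be negative for vectors that point in opposite
directions.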
relates to #37947

* Address Christoph's feedback

- organize vector functions as a separate doc
- increase precision in tests calculations
- add a separate test when sparse doc dims
are bigger and less than query vector dims

* Made examples more realistic

											
										
										
											2019-07-12 02:14:23 +08:00
+								        "params": {
-												Deprecate the sparse_vector field type. (#48315)

We have not seen much adoption of this experimental field type, and don't see a
clear use case as it's currently designed. This PR deprecates the field type in
7.x. It will be removed from 8.0 in a follow-up PR.
											
										
										
											2019-10-23 09:06:50 +08:00
+								          "query_vector": [4, 3.4, -0.2]
-												Add l1norm and l2norm distances for vectors (#44116)

* Add l1norm and l2norm distances for vectors

Add L1norm - Manhattan distance
Add L2norm - Euclidean distance
relates to #37947

* Address Christoph's feedback

- organize vector functions as a separate doc
- increase precision in tests calculations
- add a separate test when sparse doc dims
are bigger and less than query vector dims

* Made examples more realistic

											
										
										
											2019-07-12 02:14:23 +08:00
+								        }
 								      }
 								    }
 								  }
 								}
 								--------------------------------------------------

<1> Using the standard sigmoid function prevents scores from being negative.
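
To see why the sigmoid keeps scores non-negative, the following Python sketch reproduces the arithmetic outside Elasticsearch. It assumes that `sigmoid(value, k, a)` evaluates to `value^a / (k^a + value^a)`, so that `sigmoid(1, Math.E, -value)` reduces to the logistic function `1 / (1 + e^-value)`. The helper names are illustrative and are not part of any Elasticsearch API; only the first two sample documents are used.

```python
import math


def dot_product(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def sigmoid(value, k, a):
    """Assumed form of the script score sigmoid: value^a / (k^a + value^a)."""
    return value ** a / (k ** a + value ** a)


query_vector = [4, 3.4, -0.2]
# The first two sample documents indexed at the start of this page.
doc_vectors = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

for doc_id, vector in doc_vectors.items():
    value = dot_product(query_vector, vector)
    print(doc_id, value, sigmoid(1, math.e, -value))

# A negative dot product (here value = -3.0) still maps to a positive score.
negative_score = sigmoid(1, math.e, 3.0)
print(negative_score)
```

Whatever the sign of the dot product, the result stays between 0 and 1, which is what makes it usable as a score.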

[[vector-functions-l1]]
====== L^1^ distance (Manhattan distance)

The `l1norm` function calculates the L^1^ distance
(Manhattan distance) between a given query vector and
document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l1norm(params.queryVector, 'my_dense_vector'))", <1>
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------

<1> Unlike `cosineSimilarity`, which measures similarity, `l1norm` and
`l2norm` (shown below) measure distance. The more similar two vectors are,
the lower the value these functions return.
Because more similar vectors should score higher,
the script inverts the output of `l1norm` and `l2norm`. Adding `1` to the
denominator avoids division by zero when a document vector matches the
query vector exactly.
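
As a rough check of the `1 / (1 + l1norm(...))` expression above, the Python sketch below computes the Manhattan distance and the inverted score for the first two sample documents. The function names `l1norm` and `to_score` are local helpers written for this illustration; they mimic the script but are not the Elasticsearch implementation.

```python
def l1norm(a, b):
    """Manhattan distance: sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def to_score(distance):
    """Invert a distance so that closer vectors score higher."""
    return 1 / (1 + distance)


query_vector = [4, 3.4, -0.2]
# The first two sample documents indexed at the start of this page.
doc_vectors = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

for doc_id, vector in doc_vectors.items():
    distance = l1norm(query_vector, vector)
    print(doc_id, distance, to_score(distance))

# An exact match has distance 0, so the score is 1 instead of a division by zero.
exact_match_score = to_score(l1norm(query_vector, query_vector))
print(exact_match_score)
```

Document 1 is closer to the query vector than document 2, so it receives the higher score after inversion.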

[[vector-functions-l2]]
====== L^2^ distance (Euclidean distance)

The `l2norm` function calculates the L^2^ distance
(Euclidean distance) between a given query vector and
document vectors.

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": "1 / (1 + l2norm(params.queryVector, 'my_dense_vector'))",
        "params": {
          "queryVector": [4, 3.4, -0.2]
        }
      }
    }
  }
}
--------------------------------------------------
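
The same kind of check works for the Euclidean variant. The Python sketch below is an illustration only: `l2norm` here is a local helper, not the Elasticsearch function. It computes the distance for the first two sample documents and ranks them by the inverted score used in the script.

```python
import math


def l2norm(a, b):
    """Euclidean distance: square root of the summed squared differences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


query_vector = [4, 3.4, -0.2]
# The first two sample documents indexed at the start of this page.
doc_vectors = {"1": [0.5, 10, 6], "2": [-0.5, 10, 10]}

# Score each document the way the script does: 1 / (1 + distance).
scores = {
    doc_id: 1 / (1 + l2norm(query_vector, vector))
    for doc_id, vector in doc_vectors.items()
}

# Highest score first, mirroring how search hits would be ordered.
ranked = sorted(scores, key=scores.get, reverse=True)
print(ranked, scores)
```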

[[vector-functions-missing-values]]
====== Checking for missing values

If a document has no value for the vector field that a vector function runs
on, the function throws an error.

To check whether a document is missing a value for the field `my_vector`, use
`doc['my_vector'].size() == 0`. Your overall script can look like this:

[source,js]
--------------------------------------------------
"source": "doc['my_vector'].size() == 0 ? 0 : cosineSimilarity(params.queryVector, 'my_vector')"
--------------------------------------------------
// NOTCONSOLE
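
To show where the guarded `source` string sits in a complete request, the Python sketch below assembles a `script_score` body as a plain dictionary and prints it as JSON. The `status` filter and the query vector are borrowed from the earlier examples, and `my_vector` is the hypothetical field name used in the snippet above. Sending the request is deliberately left out.

```python
import json

# Hypothetical field name, taken from the snippet above.
VECTOR_FIELD = "my_vector"

# Return 0 for documents without a vector; otherwise compute the similarity.
guarded_source = (
    f"doc['{VECTOR_FIELD}'].size() == 0 ? 0 : "
    f"cosineSimilarity(params.queryVector, '{VECTOR_FIELD}')"
)

request_body = {
    "query": {
        "script_score": {
            "query": {"bool": {"filter": {"term": {"status": "published"}}}},
            "script": {
                "source": guarded_source,
                "params": {"queryVector": [4, 3.4, -0.2]},
            },
        }
    }
}

print(json.dumps(request_body, indent=2))
```

Because `script_score` rejects negative scores and cosine similarity can be negative, a real script would likely also offset the similarity, for example by adding `1.0`. The sketch keeps the snippet unchanged.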

[[vector-functions-accessing-vectors]]
====== Accessing vectors directly

You can access vector values directly through the following functions:

- `doc[<field>].vectorValue` – returns a vector's value as an array of floats

- `doc[<field>].magnitude` – returns a vector's magnitude as a float.
For vectors created before version 7.5, the magnitude is not stored,
so this function recalculates it every time it is called.

For example, the script below implements a cosine similarity using these
two functions:

[source,console]
--------------------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query" : {
        "bool" : {
          "filter" : {
            "term" : {
              "status" : "published"
            }
          }
        }
      },
      "script": {
        "source": """
          float[] v = doc['my_dense_vector'].vectorValue;
          float vm = doc['my_dense_vector'].magnitude;
          float dotProduct = 0;
          for (int i = 0; i < v.length; i++) {
            dotProduct += v[i] * params.queryVector[i];
          }
          return dotProduct / (vm * (float) params.queryVectorMag);
        """,
        "params": {
          "queryVector": [4, 3.4, -0.2],
          "queryVectorMag": 5.25357
        }
      }
    }
  }
}
--------------------------------------------------
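
The script expects the caller to supply `queryVectorMag`, the magnitude of `queryVector`. The Python sketch below is an illustration, not part of the Elasticsearch API. It shows one way to derive that parameter and repeats the loop for document 1 of the sample index, so the value passed in `params` can be checked by hand.

```python
import math


def magnitude(v):
    """Euclidean length of a vector."""
    return math.sqrt(sum(x * x for x in v))


query_vector = [4, 3.4, -0.2]
doc_vector = [0.5, 10, 6]  # document 1 from the sample index

# This is the value the request passes as params.queryVectorMag.
query_vector_mag = magnitude(query_vector)

# Same loop as the Painless script: accumulate the dot product element by element.
dot = 0.0
for i in range(len(doc_vector)):
    dot += doc_vector[i] * query_vector[i]

cosine = dot / (magnitude(doc_vector) * query_vector_mag)
print(round(query_vector_mag, 5), cosine)
```

Precomputing the query magnitude on the client means the script does not have to recompute it for every matched document.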