1279 Commits

Author SHA1 Message Date
István Zoltán Szabó 32dbc28e82
[DOCS] Adds disclaimer to semantic search tutorials (#106590) 2024-03-21 11:32:57 +01:00
Ioana Tagirta d01adfff60
Add links to text_expansion in ELSER tutorial (#106490)
* Add links to text_expansion in ELSER tutorial

* Apply suggestions from code review

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
2024-03-20 10:03:04 +01:00
Aurélien FOUCRET e944619e01
Fix typo in the LTR guide. (#106276) 2024-03-13 09:05:47 +01:00
Panagiotis Bailis d471ccb5bb
Adding support for hex-encoded byte vectors on knn-search (#105393) 2024-03-13 09:24:51 +02:00
Jack Conradson 68b0acac8f
Add retrievers using the parser-only approach (#105470)
This enhancement adds a new abstraction to the _search API called "retriever." A 
retriever is something that returns top hits. This adds three initial retrievers called
"standard", "knn", and "rrf". The retrievers use a parser-only approach where they
are parsed and then translated into a SearchSourceBuilder to execute the actual
search.
---------

Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
2024-03-12 10:11:55 -07:00
Aurélien FOUCRET 5f81c1bbe6
First version of the LTR guide. (#105956) 2024-03-11 17:26:01 +01:00
Nhat Nguyen 863cbf6bb4
Add docs for cross cluster search in ES|QL (#105934)
This change adds documentation for cross-cluster search in ES|QL.

Relates #102954
Closes #105529
2024-03-07 13:15:01 -08:00
István Zoltán Szabó 3dcfbe0732
[DOCS] Changes the cohere example to use a different model (#106037) 2024-03-06 19:40:04 +01:00
István Zoltán Szabó 6ae9dbfda7
[DOCS] Adds cohere service example to the inference API tutorial (#105904)
Co-authored-by: Jonathan Buttner <56361221+jonathan-buttner@users.noreply.github.com>
2024-03-04 16:43:41 +01:00
Liam Thompson 9e5fe197ca
[DOCS] Fix sublist syntax (#105625) 2024-02-19 16:25:31 +01:00
Matteo Piergiovanni 54cfce4379
Flag in _field_caps to return only fields with values in index (#103651)
We are adding a query parameter to the field_caps API to filter out
fields with no values. The parameter is called `include_empty_fields` and
defaults to true; if set to false, it filters out of the field_caps
response all fields that have no value in the index.
We keep track of FieldInfos during refresh to know which fields have
a value in an index. We also added a system property,
`es.field_caps_empty_fields_filter`, to disable this feature if needed.

---------

Co-authored-by: Matthias Wilhelm <ankertal@gmail.com>
2024-02-08 17:52:21 +01:00
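As a rough illustration of the `include_empty_fields` semantics described in #103651 above, the following Python sketch filters a field-capabilities-style mapping. The function name `filter_field_caps` and the `fields_with_values` set are hypothetical stand-ins; this is a toy model of the described behavior, not Elasticsearch's implementation.

```python
def filter_field_caps(field_caps, fields_with_values, include_empty_fields=True):
    """Toy model: drop fields that hold no value when include_empty_fields is False."""
    if include_empty_fields:
        # Default behavior: every mapped field is returned.
        return dict(field_caps)
    # Keep only the fields known to have at least one value in the index.
    return {
        name: caps
        for name, caps in field_caps.items()
        if name in fields_with_values
    }


# Hypothetical mapping: "unused" is mapped but no document has a value for it.
caps = {"title": {"type": "text"}, "unused": {"type": "keyword"}}
print(filter_field_caps(caps, {"title"}, include_empty_fields=False))
```

With the default `include_empty_fields=True`, the sketch returns the mapping unchanged, which mirrors the backward-compatible default stated in the commit message.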
Panagiotis Bailis 7ce8d76559
Making k and num_candidates optional for knn search (#101209) 2024-02-01 15:43:09 +02:00
Michael Peterson 06a25b60c9
Add keep_alive param to the async-search status endpoint (#104629) 2024-01-31 17:25:37 -05:00
David Kyle 2cbe23a189
[DOCS] Dense vector element type should be float for OpenAI (#104966) 2024-01-31 11:13:03 +00:00
Liam Thompson dac0f4a371
[DOCS] Update CCS compatibility matrix for 8.12 (#104663) 2024-01-24 10:18:11 +01:00
Michael Peterson e8370f8c43
Update search-across-clusters API docs to include incremental partial results (#104489) 2024-01-22 08:34:20 -05:00
Benjamin Trent e4feaff900
Add support for more than one inner_hit when searching nested vectors (#104006)
This commit adds the ability to gather more than one inner_hit when
searching nested kNN.

# Global kNN example

```
POST test/_search
{
    "_source": false,
    "fields": [
        "name"
    ],
    "knn": {
        "field": "nested.vector",
        "query_vector": [
            -0.5,
            90,
            -10,
            14.8,
            -156
        ],
        "k": 3,
        "num_candidates": 3,
        "inner_hits": {
            "size": 2,
            "fields": [
                "nested.paragraph_id"
            ],
            "_source": false
        }
    }
}
```

Results in

<details>

```
{
    "took": 66,
    "timed_out": false,
    "_shards": {
        "total": 2,
        "successful": 2,
        "skipped": 0,
        "failed": 0
    },
    "hits": {
        "total": {
            "value": 2,
            "relation": "eq"
        },
        "max_score": 0.009090909,
        "hits": [
            {
                "_index": "test",
                "_id": "2",
                "_score": 0.009090909,
                "fields": {
                    "name": [
                        "moose.jpg"
                    ]
                },
                "inner_hits": {
                    "nested": {
                        "hits": {
                            "total": {
                                "value": 2,
                                "relation": "eq"
                            },
                            "max_score": 0.009090909,
                            "hits": [
                                {
                                    "_index": "test",
                                    "_id": "2",
                                    "_nested": {
                                        "field": "nested",
                                        "offset": 0
                                    },
                                    "_score": 0.009090909,
                                    "fields": {
                                        "nested": [
                                            {
                                                "paragraph_id": [
                                                    "0"
                                                ]
                                            }
                                        ]
                                    }
                                },
                                {
                                    "_index": "test",
                                    "_id": "2",
                                    "_nested": {
                                        "field": "nested",
                                        "offset": 1
                                    },
                                    "_score": 0.004968944,
                                    "fields": {
                                        "nested": [
                                            {
                                                "paragraph_id": [
                                                    "2"
                                                ]
                                            }
                                        ]
                                    }
                                }
                            ]
                        }
                    }
                }
            },
            {
                "_index": "test",
                "_id": "3",
                "_score": 0.0021519717,
                "fields": {
                    "name": [
                        "rabbit.jpg"
                    ]
                },
                "inner_hits": {
                    "nested": {
                        "hits": {
                            "total": {
                                "value": 1,
                                "relation": "eq"
                            },
                            "max_score": 0.0021519717,
                            "hits": [
                                {
                                    "_index": "test",
                                    "_id": "3",
                                    "_nested": {
                                        "field": "nested",
                                        "offset": 0
                                    },
                                    "_score": 0.0021519717,
                                    "fields": {
                                        "nested": [
                                            {
                                                "paragraph_id": [
                                                    "0"
                                                ]
                                            }
                                        ]
                                    }
                                }
                            ]
                        }
                    }
                }
            }
        ]
    }
}
```

</details>

# kNN Query example

With a kNN query, this opens an interesting door, which allows for
multiple inner_hit scoring schemes.

## Nearest by max passage only

```
POST test/_search
{
    "size": 3,
    "query": {
        "nested": {
            "path": "nested",
            "score_mode": "max",
            "query": {
                "knn": {
                    "field": "nested.vector",
                    "query_vector": [
                        -0.5,
                        90,
                        -10,
                        14.8,
                        -156
                    ],
                    "num_candidates": 5
                }
            },
            "inner_hits": {
                "size": 2,
                "_source": false,
                "fields": [
                    "nested.paragraph_id"
                ]
            }
        }
    }
}
```

closes: https://github.com/elastic/elasticsearch/issues/102950
2024-01-17 11:32:46 -05:00
Benjamin Trent 73f537170b
Update nested knn search documentation about inner-hits (#104154)
Adding a link tag for inner hits behavior and kNN search. Additionally
adding a note that if you are using multiple knn clauses, that the inner
hit name should be provided.
2024-01-10 07:46:42 -05:00
Kathleen DeRusso bdde29720a
Update synonyms doc with warning about index creation (#103476)
* Update synonyms doc with warning about index creation

* PR feedback

* Moved warning in docs
2023-12-18 13:18:51 -05:00
István Zoltán Szabó c55495d502
[DOCS] Adds inference API end-to-end example (#103042)
Co-authored-by: David Kyle <david.kyle@elastic.co>
2023-12-12 12:02:47 +01:00
Benjamin Trent 7fde357f3a
Improve docs around knn similarity search (#103158)
Adding equations to the docs around how to best calculate similarity & score. The similarity parameter for search was added in 8.8.

The max-inner-product mentions will be removed for all versions before 8.11 when backporting.

closes: https://github.com/elastic/elasticsearch/issues/102924
2023-12-11 14:56:16 -05:00
Abdon Pijpelink 6b60a53732
Update rrf.asciidoc (#103078) (#103109)
typo

(cherry picked from commit 851cab63eb)

Co-authored-by: Ugo Sangiorgi <ugo.sangiorgi@elastic.co>
2023-12-11 13:02:49 +01:00
Benjamin Trent 47b57537ae
Add docs for the include_named_queries_score param (#103155)
The only docs for this _search param were mentioned in the bool query docs. While it makes contextual sense to have it there, we should also add it as a _search parameter in the search API docs.

It was introduced in 8.8.
2023-12-08 14:39:18 -05:00
Kathleen DeRusso 4dd9e2a772
[Query Rules] Add some usability clarifications to docs (#102990)
* [Query Rules] Add some usability clarifications to docs

* Fix typo
2023-12-06 17:16:56 -05:00
Benjamin Trent f00364aefd
Add byte quantization for float vectors in HNSW (#102093)
Adds a new `int8_hnsw` type to the `index_options` of `dense_vector`. This
allows float vectors to be automatically quantized to `byte` when indexed.

Example:

```
PUT vectors
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "index": true,
        "index_options": {
          "type": "int8_hnsw"
        }
      }
    }
  }
}
```

At query time, the query vector is automatically quantized before it is
used to search the HNSW graph. This reduces the memory required to only
`25%` of what was previously required for `float` vectors, at a slight
loss of accuracy.

This is currently only available when `index: true` and when using
`hnsw`.
2023-11-29 12:29:55 -05:00
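The `25%` figure in #102093 above follows from element sizes alone. The Python sketch below works through that arithmetic, assuming 4 bytes per dimension for `float` elements and 1 byte per dimension after quantization. The vector count and dimension are made-up illustration values, and the helper `raw_vector_bytes` is hypothetical; it ignores HNSW graph links and any per-vector metadata.

```python
FLOAT32_BYTES = 4  # assumed bytes per dimension for a `float` element
INT8_BYTES = 1     # assumed bytes per dimension after quantization to `byte`


def raw_vector_bytes(num_vectors, dims, bytes_per_dim):
    """Raw vector storage only; graph links and metadata are not counted."""
    return num_vectors * dims * bytes_per_dim


# Illustration values only: one million 768-dimensional vectors.
float_bytes = raw_vector_bytes(1_000_000, 768, FLOAT32_BYTES)
int8_bytes = raw_vector_bytes(1_000_000, 768, INT8_BYTES)
ratio = int8_bytes / float_bytes
print(float_bytes, int8_bytes, ratio)
```

The ratio is independent of the vector count and dimension, since both cancel out and only the per-element sizes remain.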
Luca Cavanna 7c9e8356e6 Merge branch 'main' into lucene_snapshot 2023-11-24 09:57:22 +01:00
Saikat Sarkar d4f01fc7b3
Gather vector_operation count for knn search (#102032) 2023-11-21 12:16:21 -07:00
Luca Cavanna 9cd96df179
Add support for index_filter to open pit (#102388)
The open point in time API accepts a list of indices and opens a point in time view against those indices.
As is already done for field caps, this commit allows users to provide an index_filter parameter as part of
the request body, which is used to execute the can-match phase and exclude the indices that cannot possibly
match the filter.

Closes #99740
2023-11-21 15:35:49 +01:00
Kathleen DeRusso 4567d397fa
Clarify text expansion query docs to not suggest enabling track_total_hits for performance (#102102) 2023-11-20 08:56:26 -05:00
István Zoltán Szabó c303ab885a
[DOCS] Simplifies dense vector mapping in semantic search example (#102080) 2023-11-14 10:52:56 +01:00
Abdon Pijpelink 70128f5b74
[DOCS] Mark 'ignore_throttled' deprecated in all docs (#101838) 2023-11-07 13:03:49 +01:00
Abdon Pijpelink 49c5b03d57
[DOCS] Update CCS compatibility matrix for 8.11 (#101786) 2023-11-06 08:41:15 +01:00
Mayya Sharipova 61c7483fc9
Make knn search a query (#98916)
This introduces a new knn query:
- The knn query is executed during the query phase, like all other queries.
- There is no k parameter; k defaults to size.
- num_candidates is the size of the queue of candidates to consider while
  searching the graph on each shard.
- For aggregations: "size" results are collected per shard, so
  aggregations see size * shards results in total.
- All filters from the DSL are applied as post-filters, except: 1) an alias filter,
  which is applied as a pre-filter, or 2) a filter provided as a parameter
  inside the knn query.
2023-11-01 14:21:40 -04:00
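To make the points in #98916 above concrete, here is a minimal Python sketch that builds a `_search` body using the knn query and computes how many results aggregations would see. The helper names, the field name `my_vector`, and the sample values are hypothetical; the body shape follows the kNN query example shown earlier in this log.

```python
import json


def knn_query_request(field, query_vector, num_candidates, size):
    """Build a _search body with a knn query; note there is no `k` parameter."""
    return {
        "size": size,  # per the commit message, k defaults to size
        "query": {
            "knn": {
                "field": field,
                "query_vector": query_vector,
                "num_candidates": num_candidates,
            }
        },
    }


def results_seen_by_aggregations(size, num_shards):
    """Per the commit message, aggregations see size * shards results."""
    return size * num_shards


body = knn_query_request("my_vector", [0.1, 0.2, 0.3], num_candidates=50, size=10)
print(json.dumps(body, indent=2))
print(results_seen_by_aggregations(size=10, num_shards=3))
```

The dictionary can be serialized and sent as the request body by whatever HTTP or Elasticsearch client is in use; no client call is shown here to avoid assuming a particular client API.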
James Rodewig 4c69746c24
[DOCS] Update tech preview copy (#101606)
Updates the copy for tech preview and experimental features in the Elasticsearch docs.

Relates to https://github.com/elastic/docs/pull/2807
2023-10-31 10:31:07 -04:00
Alan Woodward f7a9783d45
Check that scripts produce correct json in render template action (#101518)
If a mustache script that outputs badly-formed json is referred to in a render
template request, then the error returned will be a 500 server error, rather than
a 400 json parsing error. This is because rendering templates skips json parsing,
and so the error ends up being caught in the REST layer instead.

This commit changes the template rendering logic to always parse the output of
the script, catching json errors higher in the stack and allowing us to return
the correct status code. This also means that errors are correctly detected and
returned as part of multi search template requests.

Fixes #101477
2023-10-30 13:25:39 +00:00
István Zoltán Szabó 9b404099b4
[DOCS] Adds links to token section in ELSER conceptual. (#101033) 2023-10-18 11:30:38 +02:00
Liam Thompson eab813f8cb
[DOCS] Migrate Behavioral Analytics docs to ES ref (#100704)
* [DOCS] Migrate Behavioral Analytics docs to ES ref

* Fix typo

* Fix attributes

* Rename top level heading, fix requirements

* Address review suggestions
2023-10-13 09:05:23 +02:00
István Zoltán Szabó 446ac9f378
[DOCS] Updates ELSER tutorial with inference processor changes (#100420)
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2023-10-11 17:33:20 +02:00
Abdon Pijpelink 62b85b1d0f
[DOCS] Refresh "Search your data" (#99482)
* Restructure existing docs

* Add draft content

* Changes for MVP

* Reword

* Move Search Applications docs to ES reference

- Renamed files and changed ids per https://github.com/elastic/elasticsearch/pull/100032
- Updated URL syntax for absolute URLs using attribute
- Deleted redirects in redirects.asciidoc

* Fix json source formatting

* Use `source, js`, not `javascript`

* Idem

* Fix console-response

* Skip tests for js blocks

* This will definitely fix things

* Use attributes

* Remove commented out redirects

* Fix header level in search-with-synonyms.asciidoc

* Update docs/reference/search/search-your-data/knn-search.asciidoc

Co-authored-by: Chris Cressman <chris@chriscressman.com>

* Fix trailing comma bug

Flagged in #enterprise-search Slack

* Move semantic search under vector search

---------

Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Chris Cressman <chris@chriscressman.com>
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
2023-10-10 10:47:35 +02:00
Carlos Delgado f2dfbfe8c4
[DOCS] Add sparse-vector field type to docs, changed references (#100348) 2023-10-06 14:25:27 +02:00
Luca Cavanna 689a1e490a Merge branch 'main' into lucene_snapshot_9_8 2023-10-02 13:56:12 +02:00
István Zoltán Szabó 9d01def3dc
[DOCS] Changes semantic search tutorials to use ELSER v2 and sparse_vector field type (#100021)
* [DOCS] Changes semantic search tutorials to use ELSER v2 and sparse_vector field type.

* [DOCS] More edits.
2023-09-29 09:24:36 +02:00
Benjamin Trent 92cea2797e
Add nested support for dense_vector fields and knn search (#99763)
* Nested dense_vector support

* Adjust nested support based on new lucene version

* fixing after rebase

* fixing some code

* fixing tests adding transport version

* spotless

* [Automated] Update Lucene snapshot to 9.9.0-snapshot-b3e67403aaf

* Adds new max_inner_product vector similarity function (#99527)

Adds new max_inner_product vector similarity function. This differs from dot_product in the following ways:

- Doesn't require vectors to be normalized
- Scales the similarity between vectors differently to prevent negative scores

* requiring top level filter to be parent filter

* adding docs & fixing tests

* adding and fixing docs

* adding changelog

* removing unnecessary file changes

* removing unused imports

* fixing test

* maybe fix doc tests

* continue tests in docs

* fixing more tests

* fixing tests

---------

Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2023-09-28 11:38:04 -04:00
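The commit above says `max_inner_product` scales similarity to prevent negative scores, without giving the formula. The Python sketch below assumes the scaling given in the Elasticsearch `dense_vector` similarity documentation: a negative dot product `d` maps to `1 / (1 - d)` and any other value maps to `d + 1`. That formula is an assumption here, not something stated in this log, and the function names are hypothetical.

```python
def dot(a, b):
    """Plain dot product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def max_inner_product_score(query, doc):
    """Assumed scaling: keeps scores positive without normalizing the vectors."""
    d = dot(query, doc)
    if d < 0:
        # Negative dot products are squeezed into the interval (0, 1).
        return 1.0 / (1.0 - d)
    # Non-negative dot products are shifted to start at 1.
    return d + 1.0


print(max_inner_product_score([1.0, 2.0], [3.0, 4.0]))
print(max_inner_product_score([1.0, 0.0], [-1.0, 0.0]))
```

Under this assumed scaling the score stays monotonic in the dot product, so ranking order is preserved while negative values are avoided.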
Matteo Piergiovanni d9c15c526e
Add counters to _clusters response for all states (#99566)
To help users know what the possible cluster states are and to
provide accurate accounting, we added counters summarising
`running`, `partial` and `failed` clusters to the `_clusters` section.
Changes:
- The response now includes the number of `running` clusters.
- `partial` and `successful` are now counted separately (previously both
were summed in the `successful` counter).
- There is now a counter for `failed` clusters.
- `total` is now always equal to `running` + `skipped` + `failed` +
`partial` + `successful`.
2023-09-28 09:28:45 +02:00
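The accounting rule stated in #99566 above can be checked mechanically. This Python sketch sums the per-state counters and compares them with `total`; the sample `_clusters` values and the helper names are invented for illustration.

```python
CLUSTER_STATES = ("running", "skipped", "failed", "partial", "successful")


def sum_cluster_states(clusters):
    """Add up the per-state counters; a missing counter counts as zero."""
    return sum(clusters.get(state, 0) for state in CLUSTER_STATES)


def counters_are_consistent(clusters):
    """True when `total` equals the sum of all per-state counters."""
    return clusters["total"] == sum_cluster_states(clusters)


# Invented sample of a `_clusters` section.
clusters = {
    "total": 5,
    "running": 1,
    "skipped": 1,
    "failed": 0,
    "partial": 1,
    "successful": 2,
}
print(counters_are_consistent(clusters))
```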
Ignacio Vera 4bc1afddda
Move Aggregator#buildTopLevel() to search worker thread. (#98715)
This commit introduces an AggregatorCollector that contains a finish method which performs aggregation 
post-collection and builds the internal aggregation for this collector. This method is called on the worker 
thread at the end of the collection phase.
2023-09-19 09:46:51 +02:00
David Pilato 7064bc9e5c
Generated field is `ml.tokens` (#99049)
The generated field name is `ml.tokens` and not `ml-tokens`.
2023-09-13 15:21:27 +02:00
István Zoltán Szabó f5dc68abc6
[DOCS] Fine-tunes the reindexing step of the ELSER tutorial. (#99155) 2023-09-04 11:04:58 +02:00
Michael Peterson 649821e992
Support cluster/details for CCS minimize_roundtrips=false (#98457)
This commit tracks progress for each shard search by cluster alias
using a new SearchProgressListener (CCSSingleCoordinatorSearchProgressListener).
Both sync and async CCS searches use this new progress listener when
minimize_roundtrips=false.

Two of the SearchProgressListener methods had to be extended to allow tracking
per-cluster took values (TransportSearchAction.SearchTimeProvider) and
whether searches timed out (by passing in QuerySearchResult to the onQueryResult
listener method).

This commit brings parity between minimize_roundtrips=true and false to have
the same _cluster/details sections in CCS search responses.

Note that there are still a few differences between minimize_roundtrips=true and false.
1. The per-cluster took value for minimize_roundtrips=true is accurate, but
   for 'false' it is only measured at the granularity of each partial reduce,
   so the per-cluster took time is overestimated in basically all cases.
2. For minimize_roundtrips=true, a skip_unavailable=false cluster that disconnects
   during the search or has all searches on all shards fail, will cause the entire
   search to fail. This is (still) not true for minimize_roundtrips=false. The search
   is only failed if the skip_unavailable=false cluster cannot be connected to at the
   start of the search. (This will likely be changed in a follow up ticket that implements
   fail-fast logic for in-progress searches that should fail due to a skip_unavailable=true
   cluster failing.)
3. The shard accounting for minimize_roundtrips=false is always accurate (total shard counts
   are known at the start of the search). For minimize_roundtrips=true, the shard accounting
   is only accurate per cluster unless all clusters have successful (or partially successful)
   searches. For clusters that have failures we do not have shard count info.
2023-08-31 12:56:20 -04:00
Liam Thompson dfbec46c3d
[Docs] Add link to labs from semantic search overview (#98985) 2023-08-30 10:54:24 +02:00
Liam Thompson a3c96caa51
[DOCS] Add link to Elasticsearch labs ELSER Python notebook (#98983)
* Add link to Elasticsearch labs ELSER Python notebook

* Fix typos

* Use {es} variable

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

---------

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
2023-08-29 15:26:00 +02:00