Commit Graph

1244 Commits

Author SHA1 Message Date
István Zoltán Szabó 9b404099b4
[DOCS] Adds links to token section in ELSER conceptual. (#101033) 2023-10-18 11:30:38 +02:00
Liam Thompson eab813f8cb
[DOCS] Migrate Behavioral Analytics docs to ES ref (#100704)
* [DOCS] Migrate Behavioral Analytics docs to ES ref

* Fix typo

* Fix attributes

* Rename top level heading, fix requirements

* Address review suggestions
2023-10-13 09:05:23 +02:00
István Zoltán Szabó 446ac9f378
[DOCS] Updates ELSER tutorial with inference processor changes (#100420)
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2023-10-11 17:33:20 +02:00
Abdon Pijpelink 62b85b1d0f
[DOCS] Refresh "Search your data" (#99482)
* Restructure existing docs

* Add draft content

* Changes for MVP

* Reword

* Move Search Applications docs to ES reference

- Renamed files and changed ids per https://github.com/elastic/elasticsearch/pull/100032
- Updated URL syntax for absolute URLs using attribute
- Deleted redirects in redirects.asciidoc

* Fix json source formatting

* Use `source, js`, not `javascript`

* Idem

* Fix console-response

* Skip tests for js blocks

* This will definitely fix things

* Use attributes

* Remove commented out redirects

* Fix header level in search-with-synonyms.asciidoc

* Update docs/reference/search/search-your-data/knn-search.asciidoc

Co-authored-by: Chris Cressman <chris@chriscressman.com>

* Fix trailing comma bug

Flagged in #enterprise-search Slack

* Move semantic search under vector search

---------

Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Chris Cressman <chris@chriscressman.com>
Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
2023-10-10 10:47:35 +02:00
Carlos Delgado f2dfbfe8c4
[DOCS] Add sparse-vector field type to docs, changed references (#100348) 2023-10-06 14:25:27 +02:00
Luca Cavanna 689a1e490a Merge branch 'main' into lucene_snapshot_9_8 2023-10-02 13:56:12 +02:00
István Zoltán Szabó 9d01def3dc
[DOCS] Changes semantic search tutorials to use ELSER v2 and sparse_vector field type (#100021)
* [DOCS] Changes semantic search tutorials to use ELSER v2 and sparse_vector field type.

* [DOCS] More edits.
2023-09-29 09:24:36 +02:00
Benjamin Trent 92cea2797e
Add nested support for dense_vector fields and knn search (#99763)
* Nested dense_vector support

* Adjust nested support based on new lucene version

* fixing after rebase

* fixing some code

* fixing tests adding transport version

* spotless

* [Automated] Update Lucene snapshot to 9.9.0-snapshot-b3e67403aaf

* Adds new max_inner_product vector similarity function (#99527)

Adds new max_inner_product vector similarity function. This differs from dot_product in the following ways:

Doesn't require vectors to be normalized
Scales the similarity between vectors differently to prevent negative scores

* requiring top level filter to be parent filter

* adding docs & fixing tests

* adding and fixing docs

* adding changelog

* removing unnecessary file changes

* removing unused imports

* fixing test

* maybe fix doc tests

* continue tests in docs

* fixing more tests

* fixing tests

---------

Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2023-09-28 11:38:04 -04:00
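The score scaling mentioned for max_inner_product in the commit above can be sketched as follows. This is a minimal illustration, assuming the piecewise scaling described in the Elasticsearch dense_vector documentation (1 / (1 - ip) for negative inner products, ip + 1 otherwise); the function names are hypothetical and are not taken from the commit.

```python
def inner_product(a, b):
    """Plain dot product; the vectors do not need to be normalized."""
    return sum(x * y for x, y in zip(a, b))


def max_inner_product_score(query, vector):
    """Map a raw inner product onto a non-negative score.

    Assumed scaling: negative values are squashed into (0, 1),
    non-negative values are shifted up by 1, so ordering is preserved.
    """
    ip = inner_product(query, vector)
    if ip < 0:
        return 1.0 / (1.0 - ip)
    return ip + 1.0
```

Because both branches are increasing in the raw inner product and meet at 1, a larger inner product always yields a larger score.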
Matteo Piergiovanni d9c15c526e
Add counters to _clusters response for all states (#99566)
To help the user know what the possible cluster states are and to
provide an accurate accounting, we added counters summarising
`running`, `partial` and `failed` clusters to the `_clusters` section.
Changes:
- The response now includes the number of `running` clusters.
- We split up `partial` and `successful` (previously both were counted in the
`successful` counter).
- We now have a counter for `failed` clusters.
- Now `total` is always equal to `running` + `skipped` + `failed` +
`partial` + `successful`.
2023-09-28 09:28:45 +02:00
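The counter invariant stated in the commit above (`total` equals the sum of the five per-state counters) can be illustrated with a small sketch. The helper name and the list-of-strings input are assumptions made for illustration; this is not the actual response-building code.

```python
from collections import Counter

# The five per-cluster states named in the commit message.
STATES = ("running", "skipped", "failed", "partial", "successful")


def summarize_clusters(statuses):
    """Build a _clusters-style summary from a list of per-cluster states."""
    counts = Counter(statuses)
    unknown = set(counts) - set(STATES)
    if unknown:
        raise ValueError(f"unknown cluster states: {sorted(unknown)}")
    summary = {state: counts.get(state, 0) for state in STATES}
    # total is derived from the counters, so the invariant holds by construction.
    summary["total"] = sum(summary.values())
    return summary
```

For example, `summarize_clusters(["running", "partial", "successful"])` reports one cluster in each of those three states and a `total` of 3.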
Ignacio Vera 4bc1afddda
Move Aggregator#buildTopLevel() to search worker thread. (#98715)
This commit introduces an AggregatorCollector that contains a finish method which performs aggregation 
post-collection and builds the internal aggregation for this collector. This method is called on the worker 
thread at the end of the collection phase.
2023-09-19 09:46:51 +02:00
David Pilato 7064bc9e5c
Generated field is `ml.tokens` (#99049)
The generated field name is `ml.tokens` and not `ml-tokens`.
2023-09-13 15:21:27 +02:00
István Zoltán Szabó f5dc68abc6
[DOCS] Fine-tunes the reindexing step of the ELSER tutorial. (#99155) 2023-09-04 11:04:58 +02:00
Michael Peterson 649821e992
Support cluster/details for CCS minimize_roundtrips=false (#98457)
This commit tracks progress for each shard search by cluster alias
using a new SearchProgressListener (CCSSingleCoordinatorSearchProgressListener).
Both sync and async CCS searches use this new progress listener when
minimize_roundtrips=false.

Two of the SearchProgressListener methods had to be extended to allow tracking
per-cluster took values (TransportSearchAction.SearchTimeProvider) and
whether searches timed out (by passing in QuerySearchResult to the onQueryResult
listener method).

This commit brings parity between minimize_roundtrips=true and false to have
the same _cluster/details sections in CCS search responses.

Note that there are still a few differences between minimize_roundtrips=true and false.
1. The per-cluster took value for minimize_roundtrips=true is accurate, but
   for 'false' it is only measured at the granularity of each partial reduce,
   so the per cluster took time is overestimated in basically all cases.
2. For minimize_roundtrips=true, a skip_unavailable=false cluster that disconnects
   during the search or has all searches on all shards fail, will cause the entire
   search to fail. This is (still) not true for minimize_roundtrips=false. The search
   is only failed if the skip_unavailable=false cluster cannot be connected to at the
   start of the search. (This will likely be changed in a follow up ticket that implements
   fail-fast logic for in-progress searches that should fail due to a skip_unavailable=true
   cluster failing.)
3. The shard accounting for minimize_roundtrips=false is always accurate (total shard counts
   are known at the start of the search). For minimize_roundtrips=true, the shard accounting
   is only accurate per cluster unless all clusters have successful (or partially successful)
   searches. For clusters that have failures we do not have shard count info.
2023-08-31 12:56:20 -04:00
Liam Thompson dfbec46c3d
[Docs] Add link to labs from semantic search overview (#98985) 2023-08-30 10:54:24 +02:00
Liam Thompson a3c96caa51
[DOCS] Add link to Elasticsearch labs ELSER Python notebook (#98983)
* Add link to Elasticsearch labs ELSER Python notebook

* Fix typos

* Use {es} variable

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

---------

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
2023-08-29 15:26:00 +02:00
Abdon Pijpelink 1955bd8ad4
[DOCS] New docs for remote clusters using API key authentication (#98330)
* New docs structure for remote clusters

* Fix broken cross-book link errors

* More broken cross-book link errors

* Remove redirects for new pages

* Link to generic remote cluster docs instead

* Drop 'API' from the abbreviated title

* Add 'Establish trust with a remote cluster' section

* Restructure 'Establish trust' section into Prerequisite/local/remote instructions

* Add 'Configure roles and users' section

* Add 'Connect to a remote cluster' section

* Move version compatibility to prerequisites

* Fix test errors

* Incorporate review feedback

* Mention version 8.10 or later in the intro for API keys

* Add license prerequisite
2023-08-24 12:30:03 +02:00
Kathleen DeRusso 8c12a7b7cd
Query rules docs clarification (#98605)
* Query rules docs clarification

* Update docs/reference/search/search-your-data/search-using-query-rules.asciidoc

* Update docs/reference/search/search-your-data/search-using-query-rules.asciidoc
2023-08-17 11:11:49 -04:00
Craig Taverner dfe9bdc45f
Simple grammar fix for MVT docs (#98591) 2023-08-17 16:10:26 +02:00
Nick Chow 5de0a9013f
Documentation update that fixes a query rules code example (#98540)
* Change example field in rule query guide

* Change fuzzy to contains to get tests to work

---------

Co-authored-by: Kathleen DeRusso <kathleen.derusso@elastic.co>
2023-08-16 15:14:32 -07:00
Carlos Delgado c596f121b4
Synonyms Overview Documentation (#98202) 2023-08-10 18:07:12 +02:00
Abdon Pijpelink 21ef4f3629
[DOCS] Update CCS compatibility matrix for 8.10 (#98341) 2023-08-10 15:57:47 +02:00
Kathleen DeRusso 0437416c33
Tech debt: Add tests to documentation for query rules, search applications (#98266)
* Add tests for query rules

* More tests

* Fix search app tests

* Fix tests

* Add teardown to tests

* Add tests for list search apps call

* Update test in get search application

* Tweak stack trace

* Make response match in test

---------

Co-authored-by: carlosdelest <carlos.delgado@elastic.co>
2023-08-09 08:01:52 -04:00
Michael Peterson 169f7d1774
Add specific cluster error info, shard info and additional metadata for CCS when minimizing roundtrips (#97731)
For CCS searches with ccs_minimize_roundtrips=true, when an error is returned, it is unclear which cluster
caused the problem. This commit adds additional accounting and error information to the search response
for each cluster involved in a cross-cluster search.

The _clusters section of the SearchResponse has a new details section added with an entry for each cluster
(remote and local). It includes status info, shard accounting counters and error information that are added
incrementally as the search happens.

The search on each cluster can be in one of 5 states:
RUNNING
SUCCESSFUL - all shards were successfully searched (successful or skipped)
PARTIAL - some shard searches failed, but at least one succeeded and partial data has been returned
SKIPPED - no shards were successfully searched (all failed or cluster unavailable) when skip_unavailable=true
FAILED - no shards were successfully searched (all failed or cluster unavailable) when skip_unavailable=false

A new SearchResponse.Cluster object has been added. Each TransportSearchAction.CCSActionListener
(one for each cluster) has a reference to a separate Cluster instance and updates it once it gets back
information from its cluster.

The SearchResponse.Clusters object only uses the new Cluster object for CCS minimize_roundtrips=true.
For local-only searches and CCS minimize_roundtrips=false, it uses the current  Clusters object as before.

Follow on work will change CCS minimize_roundtrips=false to also use the new Cluster model and update
state in the _cluster/details section.

The Cluster objects are immutable, so a CAS operation is required to swap in new state to the 
map of Cluster objects held by the `SearchResponse.Clusters` class. This concurrency model is 
a little bit of overkill for the minimize_roundtrips=true use case, but it will be necessary for 
supporting minimize_roundtrips=false, since updates there will be done per shard, not per cluster.
2023-08-07 12:32:06 -04:00
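The five per-cluster states listed in the commit above can be expressed as a small decision function. This is a sketch under stated assumptions: the function name, its parameters, and the treatment of a reachable cluster with no failed shards as SUCCESSFUL are illustrative choices, not the logic of SearchResponse.Cluster itself.

```python
def cluster_status(successful_shards, failed_shards, skip_unavailable,
                   finished=True, reachable=True):
    """Classify one cluster's search using the states from the commit message.

    successful_shards counts shards that were searched or skipped;
    failed_shards counts shards whose search failed.
    """
    if not finished:
        return "RUNNING"
    nothing_succeeded = not reachable or (failed_shards > 0 and successful_shards == 0)
    if nothing_succeeded:
        # Same outcome on the cluster, reported differently depending on the setting.
        return "SKIPPED" if skip_unavailable else "FAILED"
    if failed_shards > 0:
        return "PARTIAL"
    return "SUCCESSFUL"
```

The only difference between SKIPPED and FAILED in this sketch is the `skip_unavailable` setting, which mirrors the two definitions in the list.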
Kathleen DeRusso 23e35d5687
[Query Rules] Add documentation for rule_query (#97667)
* Add docs for rule query

* Add test

* Fix formatting in rule query dsl

* Remove query string as required from rule query docs

* PR feedback

* Update with API changes

* Expand and clarify 'search using query rules' doc

* Clean up wording

* Update put syntax

* Fix examples after refactor

* Update docs/reference/query-dsl/rule-query.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* PR feedback + update privilege

* PR feedback

* More PR feedback

* Small correction

---------

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2023-08-02 15:56:06 -04:00
Abdon Pijpelink 48b3a85741
[DOCS] Update RRF tech preview statement (#97851)
* [DOCS] Update RRF tech preview statement

* Add 'rank' and 'sub_searches'
2023-07-24 13:55:06 +02:00
Abdon Pijpelink 40409bf8ca
[DOCS] Semantic search page (#97715)
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-authored-by: David Roberts <dave.roberts@elastic.co>
2023-07-20 10:45:13 +02:00
István Zoltán Szabó 57fd6b84fb
[DOCS] Expands ELSER tutorial with optimization info (#97392)
Co-authored-by: David Kyle <david.kyle@elastic.co>
2023-07-19 10:38:11 +02:00
Michael Peterson eaa86796a7
Add completion_time time field to async_search get and status response (#97700)
The completion_time is set as the start_time (already present) plus the 'took'
time that is set in the SearchResponse object and only if the isRunning status == false
since took is set even for in-progress searches.

We use the 'took' field because it is based on relative time, not absolute wall clock time
which can go backwards due to NTP issues. See the comments in TransportSearchAction about
the SearchTimeProvider for details.

Closes #88640
2023-07-17 09:13:15 -04:00
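The rule described in the commit above (completion time is start time plus `took`, reported only once the search is no longer running) can be sketched like this. The function and parameter names are hypothetical; the sketch only mirrors the arithmetic in the message.

```python
def completion_time_ms(start_time_ms, took_ms, is_running):
    """Return the completion time in epoch milliseconds, or None while running.

    took_ms is a relative duration, so the result does not depend on the
    wall clock being read a second time when the search finishes.
    """
    if is_running:
        # took is populated even for in-progress searches, so it must be ignored here.
        return None
    return start_time_ms + took_ms
```

For a search that started at 1_689_000_000_000 and took 1_500 ms, the sketch returns 1_689_000_001_500 once `is_running` is false.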
Mayya Sharipova f8c626f792
Track max_score in collapse when requested (#97703)
Previously, we tracked max_score in collapse when requested (track_scores=true)
or when there was no sort in collapse (see PR#27122). But this feature
was lost through refactoring and changes.

This PR restores this feature.

Closes #97653
2023-07-17 06:48:00 -04:00
Abdon Pijpelink 0f810b19e9
[DOCS] Clarify that dense vectors can be created with ES (#97636)
* [DOCS] Clarify that dense vectors can be created with ES

* Fix rendering issue

* Break up long sentence
2023-07-13 14:04:32 +02:00
István Zoltán Szabó 9cd609f22c
[DOCS] Adds deployment_id as an option to query_vector_builder (#97576) 2023-07-12 09:35:36 +02:00
Jack Conradson f2b0434ee2
Mark rank and sub_searches as tech preview (#97573)
rank and sub_searches are in tech preview. This adds the tech preview text that is required in the docs for these features.
2023-07-11 09:28:46 -07:00
Marc-Antoine Leclercq b1d150babf
Fix typo on semantic-search-elser.asciidoc (#97551)
MACRO => MARCO
2023-07-11 11:52:26 +02:00
Luca Cavanna 7df388df64
Make terminate_after early termination friendly (#97540)
There are situations in which the terminate_after functionality causes
the collection to keep on going although there is nothing to collect,
with the only goal of incrementing the counter of collected docs and
eventually early terminating which sets the `terminated_early` flag
in the search response to true.

When docs collection early terminates, we should rather honor the
corresponding `CollectionTerminatedException` that is thrown, and
adjust expectations around the fact that `terminate_after` affects
actual collection of documents, meaning that it can't be honored if
the threshold has not been reached by the time the collection early
terminates for other reasons.

This commit adjusts the QueryPhaseCollector behavior to do that, which
allows for some additional simplifications.

Closes #97269
2023-07-11 10:14:12 +02:00
Michael Peterson 6dd1841dbc
Allow users to run the painless execute API on a remote cluster shard (#97335)
Added a clusterAlias to the Painless execute Request object, so that index
expressions in the request of the form "myremote:myindex" will be parsed to
set clusterAlias to "myremote" and the index to "myindex".

If clusterAlias is null, then it is executed against a shard on the local cluster, as before.
If clusterAlias is non-null, then the SingleShardTransportAction is sent to the remote cluster,
where it will run the full request (doing remote coordination). Note that the new clusterAlias 
field is not Writeable so that when it is sent to the remote cluster it will only see the index
name, not the clusterAlias (which it wouldn't know how to handle correctly).

Added PainlessExecuteIT test that tests cross-cluster calls

Updated painless-execute-script end user docs to indicate support for cross-cluster executions
2023-07-10 12:27:00 -04:00
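The index expression handling described in the commit above ("myremote:myindex" becomes a cluster alias plus an index name) can be sketched as a simple split. The helper name is hypothetical, and the sketch assumes a single `:` separator; the real parsing in Elasticsearch may handle additional cases.

```python
def split_index_expression(expression):
    """Split 'alias:index' into (cluster_alias, index).

    Returns (None, expression) when there is no cluster alias, which
    corresponds to running against the local cluster.
    """
    alias, separator, index = expression.partition(":")
    if not separator:
        return None, expression
    return alias, index
```

A `None` alias corresponds to the "executed against a shard on the local cluster, as before" case in the message.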
Christoph Büscher 192597d795
Limit _terms_enum prefix size (#97488)
Currently the prefix size of the _terms_enum endpoint is not limited.
Since prefixes run against a keyword field and build automata, this can lead to high memory
consumption and the danger of running OOM. This change checks the size of the prefix
early in the rest request and throws a validation error in case it exceeds
IndexWriter.MAX_TERM_LENGTH, which is the same limit we apply to the length of
keyword field values anyway, so this comes at no loss in functionality.

Closes #96572
2023-07-10 12:21:07 +02:00
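The early validation described in the commit above can be sketched as follows. The constant mirrors Lucene's IndexWriter.MAX_TERM_LENGTH (32766); the function name is hypothetical, and measuring the prefix in UTF-8 bytes is an assumption of this sketch rather than something the commit message states.

```python
# Mirrors Lucene's IndexWriter.MAX_TERM_LENGTH.
MAX_TERM_LENGTH = 32766


def validate_terms_enum_prefix(prefix):
    """Reject oversized prefixes before any automaton is built."""
    # Assumption: the limit is applied to the UTF-8 encoded length.
    size = len(prefix.encode("utf-8"))
    if size > MAX_TERM_LENGTH:
        raise ValueError(
            f"prefix is {size} bytes, which exceeds the limit of {MAX_TERM_LENGTH}"
        )
    return prefix
```

Failing at request validation time keeps the memory cost of a rejected request proportional to the prefix itself, not to the automaton it would have produced.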
Luca Cavanna f5a2af6c71
Query phase: fold collector wrappers into a single top level collector (#97030)
The query phase uses a number of different collectors and combines them together, pretty much one per feature that the search API exposes: there is a collector for post_filter, one for min_score, one for terminate_after, one for aggs. While this is very flexible, we always combine such collectors together in the same way (e.g. terminate_after must be the first one, post_filter is only applied to top docs collection, min score is applied to both aggs and top docs). This means that although we could flexibly compose collectors, we need to apply each feature predictably, which makes the composability unnecessary. Furthermore, composability causes complexity.

The terminate_after functionality is a clear example of complexity introduced as a consequence of having a complex collector tree: it relies on a multi collector, and throws an exception to force terminating the collection for all other collectors in the tree. If there was a single collector aware of post_filter, min_score and terminate_after at the same time, we could simply reuse Lucene mechanisms to early terminate the collection (CollectionTerminatedException) instead of forcing the termination throwing an exception that Lucene does not handle.

Furthermore, MultiCollector is a complex and generic collector to combine multiple collectors together, while we only ever combine at most two collectors with it, which are more or less fixed (e.g. top docs and aggs).

This PR introduces a new top-level collector that is inspired by MultiCollector in that it holds the top docs and the optional aggs collector and applies post_filter, min_score as well as terminate_after as part of its execution. This allows us to have a specialized collector for our needs, less flexibility and more control. This surfaced some strange behaviour that we may want to change as a follow-up in how terminate_after makes us collect docs even when all possible collections have been early terminated. The goal of this PR though is to have feature parity with query phase before the refactoring, without any change of behaviour.

A nice benefit of this work is that it allows us to rely on CollectionTerminatedException for the terminate_after functionality. This simplifies the introduction of multi-threaded collector managers when it comes to handling exceptions.
2023-06-30 12:48:13 +02:00
James Rodewig ff84ad1469
[DOCS] Note license requirements for CCS (#97252)
Notes that CCS requires both clusters to use the same license level for full capabilities.
2023-06-29 16:55:10 -04:00
Jack Conradson bca4995fc8
Add basic documentation for sub searches (#97025)
This adds basic documentation for the sub_searches top-level element in the search API. (#96224).
2023-06-28 07:02:38 -07:00
István Zoltán Szabó a62402ce96
[DOCS] Adjusts the note about minimum recommended node size on the ELSER tutorial page (#97083)
Co-authored-by: David Roberts <dave.roberts@elastic.co>
2023-06-26 11:09:18 +02:00
Michael Peterson afbf1f5ca1
Profile API should show node details as well as shard details (#96396)
Added additional fields to SearchProfileResults for XContent output: node_id, cluster, index, shard_id.
It parses the existing composite ID using the new parseProfileShardId method, which reverses
the SearchShardTarget.toString method.

No new information is added here, merely the splitting out of the four pieces of information
in the profile shards "composite" id that is created by the SearchShardTarget.toString method.

Profile/shards output now has the form:
```
  "profile": {
    "shards": [
      {
        "id": "[2m7SW9oIRrirdrwirM1mwQ][blogs][0]",
        "node_id": "2m7SW9oIRrirdrwirM1mwQ",
        "shard_id": "0",
        "index": "blogs",
        "cluster": "(local)",
        "searches": [ ... ]
        ...
      },
      {
        "id": "[UngEVXTBQL-7w5j_tftGAQ][remote1:blogs][2]",
        "node_id": "UngEVXTBQL-7w5j_tftGAQ",
        "shard_id": "2",
        "index": "blogs",
        "cluster": "remote1",
        "searches": [ ... ]
        ...
```

where the latter is on a remote cluster and you can see that as the prefix on the index name.

Partially addresses #25896

Added yamlRestTest for the new fields in the profile response.
2023-06-24 14:12:25 -04:00
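The splitting of the composite shard id shown in the commit above can be sketched with a regular expression. This is an illustration only: the function name reuses the parseProfileShardId idea from the message, but the pattern and the dictionary output are assumptions based on the two sample ids, not the Java implementation.

```python
import re

# Matches "[nodeId][index][shard]" and "[nodeId][cluster:index][shard]".
_COMPOSITE_ID = re.compile(r"^\[([^\]]+)\]\[(?:([^:\]]+):)?([^\]]+)\]\[(\d+)\]$")


def parse_profile_shard_id(composite_id):
    """Split a profile shard id into node_id, cluster, index and shard_id."""
    match = _COMPOSITE_ID.match(composite_id)
    if match is None:
        raise ValueError(f"unexpected profile shard id: {composite_id!r}")
    node_id, cluster, index, shard_id = match.groups()
    return {
        "node_id": node_id,
        "shard_id": shard_id,  # kept as a string, as in the sample output
        "index": index,
        "cluster": cluster or "(local)",
    }
```

Applied to the two ids in the sample output, the sketch yields `"cluster": "(local)"` for the first and `"cluster": "remote1"` for the second, with `"index": "blogs"` in both cases.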
István Zoltán Szabó 27dec1a605
[DOCS] Adds note to the tutorial about the recommended ML node size for ELSER. (#96880) 2023-06-15 18:03:41 +02:00
István Zoltán Szabó 0469fe5f3e
[DOCS] Makes ELSER mapping requirements clearer (#96854)
Makes ELSER mapping requirements clearer.
2023-06-15 11:27:45 +02:00
István Zoltán Szabó 80bc048aaf
[DOCS] Adds size parameter to reindex call in ELSER tutorial. (#96820) 2023-06-14 13:55:46 +02:00
István Zoltán Szabó 656d367e8d
[DOCS] Removes the technical preview admonition from query_vector_builder docs. (#96735) 2023-06-12 09:55:39 +02:00
Michael Peterson 110b1a686e
Add end-user documentation for CCS using async-search (#96507)
Added documentation to search-across-clusters.asciidoc showing that async-search
can now support the ccs_minimize_roundtrips=true flag and how it behaves relative to
async CCS when ccs_minimize_roundtrips=false.

I also updated the "Don't minimize network roundtrips" section to reflect the fact that the 
REST based Search Shards API is no longer called but rather an internal transport-layer only 
version of search_shards.
2023-06-09 08:55:38 -04:00
István Zoltán Szabó 53c082b5aa
[DOCS] Fixes field name in text_expansion query. (#96724) 2023-06-09 11:43:46 +02:00
Luca Cavanna 2b67a45fc2
[DOCS] Remove leftover experimental tag for knn search (#96722)
Knn search was made GA in Elasticsearch 8.5, see #91065.
This commit removes a leftover experimental marking from the search
docs.
2023-06-09 11:10:03 +02:00
István Zoltán Szabó 890dd08df0
[DOCS] Adds a compound query example to the ELSER semantic search tutorial (#96460)
Co-authored-by: David Kyle <david.kyle@elastic.co>
2023-06-07 10:19:24 +02:00
Ignacio Vera 15a6aca060
update docs for vector tile and geohex (#96595)
Geohex aggregation has been supported for geo_shape fields since Elasticsearch 8.7, so update docs accordingly.
2023-06-06 11:39:56 +02:00