Commit Graph

11731 Commits

Author SHA1 Message Date
elasticsearchmachine c5eb558371 Bump to version 8.16.0 2024-07-04 09:10:43 +00:00
Martijn van Groningen 6eaf171411
Add some information about the impact of index.codec setting. (#110413) 2024-07-04 09:20:19 +02:00
George Wallace b6e9860919
Update role-mapping-resources.asciidoc (#110441)
made it clear that some characters need to be escaped properly

Co-authored-by: Jan Doberstein <jan.doberstein@elastic.co>
2024-07-03 13:00:52 -06:00
Lisa Cawley 748dbd51e4
[DOCS] Add serverless details in Elasticsearch security privileges (#109718) 2024-07-03 09:52:21 -07:00
Tim Grein 406b969c62
[Inference API] Add Google Vertex AI reranking docs (#110390) 2024-07-03 14:03:12 +02:00
Johannes Fredén 89cd966b24
Add bulk delete roles API (#110383)
* Add bulk delete roles API
2024-07-03 11:04:53 +02:00
Sylvain Wallez e78bdc953a
ESQL: add Arrow dataframes output format (#109873)
Initial support for Apache Arrow's streaming format as a response for ES|QL. It triggers based on the Accept header or the format request parameter.

Arrow has implementations in every mainstream language and is a backend of the Python Pandas library, which is extremely popular among data scientists and data analysts. Arrow's streaming format has also become the de facto standard for dataframe interchange. It is an efficient binary format that allows zero-cost deserialization by adding data access wrappers on top of memory buffers received from the network.

This PR builds on the experiment made by @nik9000 in PR #104877

Features/limitations:
- all ES|QL data types are supported
- multi-valued fields are not supported
- fields of type _source are output as JSON text in a varchar array. In a future iteration we may want to offer the choice of the more efficient CBOR and SMILE formats.

Technical details:

Arrow comes with its own memory management to handle vectors with direct memory, reference counting, etc. We don't want to use this as it conflicts with Elasticsearch's own memory management.

We therefore use the Arrow library only for the metadata objects describing the dataframe schema and the structure of the streaming format. The Arrow vector data is produced directly from ES|QL blocks.
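
The two triggers mentioned above (the Accept header or the format request parameter) can be sketched as follows. This is a minimal illustration, not the PR's code: the `/_query` path, the `arrow` format value, and the `application/vnd.apache.arrow.stream` media type are assumptions not stated in this commit message, and `build_esql_arrow_request` is a hypothetical helper. The request is only constructed, never sent.

```python
import json
import urllib.request

# Assumed media type for Arrow's streaming format (not stated in the commit message).
ARROW_STREAM_MEDIA_TYPE = "application/vnd.apache.arrow.stream"


def build_esql_arrow_request(base_url, esql, use_accept_header=True):
    """Build (but do not send) an ES|QL request asking for Arrow output.

    Hypothetical helper: the endpoint path and format value are assumptions.
    """
    body = json.dumps({"query": esql}).encode("utf-8")
    headers = {"Content-Type": "application/json"}
    url = f"{base_url}/_query"
    if use_accept_header:
        # Variant 1: negotiate the response format through the Accept header.
        headers["Accept"] = ARROW_STREAM_MEDIA_TYPE
    else:
        # Variant 2: select the response format through the request parameter.
        url += "?format=arrow"
    return urllib.request.Request(url, data=body, headers=headers, method="POST")


by_header = build_esql_arrow_request("http://localhost:9200", "FROM my-index | LIMIT 10")
by_param = build_esql_arrow_request(
    "http://localhost:9200", "FROM my-index | LIMIT 10", use_accept_header=False
)
print(by_header.get_header("Accept"))
print(by_param.full_url)
```

A client would then hand the response bytes to an Arrow stream reader in its language of choice; that step is omitted here because it depends on a third-party library.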

---------

Co-authored-by: Nik Everett <nik9000@gmail.com>
2024-07-03 10:29:57 +02:00
Carlos Delgado 30b32b6a46
semantic_text: Updated copy-to docs (#110350) 2024-07-03 10:18:40 +02:00
Fang Xing 8abc8857f2
[ES|QL] weighted_avg (#109993)
* weighted_avg
2024-07-02 18:29:02 -04:00
Matt Culbreth 81b8495388
Mark the Redact processor as Generally Available 2024-07-02 16:58:57 -04:00
Nik Everett 6fbc52d170
ESQL docs: Push down needs index and doc_values (#110353)
This adds a `NOTE` to each comparison saying that pushing the comparison
to the search index requires that the field have an `index` and
`doc_values`. This is unique compared to the rest of Elasticsearch which
only requires an `index` and it's caused by our insistence that
comparisons only return true for single-valued fields. We can in future
accelerate comparisons without `doc_values`, but we just haven't written
that code yet.
2024-07-02 14:22:50 -04:00
Kathleen DeRusso 7a1d532ffb
Pass over Sparse Vector docs for correctness (#110282)
* Remove legacy mentions of text expansion queries

* Add missing query_vector param to sparse_vector query docs

* Fix formatting errors in sparse vector query dsl doc

* Remove unnecessary test setup block
2024-07-02 13:37:25 -04:00
Felix Barnsteiner cdbe092d90
Update docs now that keyword dimensions support ignore_above (#110385)
This is a follow-up from https://github.com/elastic/elasticsearch/pull/110337
2024-07-02 17:04:57 +02:00
Johannes Fredén 55476041d9
Add BulkPutRoles API (#109339)
* Add BulkPutRoles API
2024-07-02 15:45:39 +02:00
Tim Grein 390439ad9f
[Inference API] Add Google Vertex AI text embeddings docs (#110317) 2024-07-02 14:47:14 +02:00
Mike Pellegrini d288dbf94e
Fix Semantic Query Parameter Formatting (#110355) 2024-07-02 08:07:35 -04:00
Iván Cea Fontenla c89ee3b648
ESQL: Renamed TopList to Top (#110347)
Rename TopList aggregation to Top, after internal discussions
2024-07-02 03:52:24 +10:00
Jedr Blaszyk 3b827f6a8c
Create `manage_connector` privilege (#110128)
* Create manage_search_connector privilege

* `manage_search_connector` -> `manage_connector` and exclude connector secrets patterns from this privilege

* Add `monitor_connector` privilege

* Update Kibana system privilege to monitor_connector for telemetry

* Rename privilege to 'manage_connector_state'

Since privilege names are often namespaced and used with globs, we want to ensure that if there's a future privilege like `manage_connector_secrets`, it is not implicitly included in this new privilege's <name>*. By extending the privilege name to include "_state", we better namespace this distinct from any "_secrets" namespace.

* Revert "Rename privilege to 'manage_connector_state'"

This reverts commit 70b89eee76.
After further discussion with the security team, this name change is not needed after all
since the secret management privileges aren't currently prefixed with "manage_"
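
The glob concern raised for the (later reverted) `manage_connector_state` rename can be illustrated with a small sketch. It uses Python's `fnmatch` as a stand-in for glob matching; it is not Elasticsearch's privilege resolution, and `matched_by` is a hypothetical helper.

```python
from fnmatch import fnmatchcase

# Privilege names taken from the discussion above; the "_secrets" one is the
# hypothetical future privilege the commit message worries about.
privileges = [
    "manage_connector",
    "manage_connector_state",
    "manage_connector_secrets",
]


def matched_by(pattern, names):
    """Return the names that a glob pattern matches (case-sensitive)."""
    return [name for name in names if fnmatchcase(name, pattern)]


# A glob on the short name also picks up the "_secrets" privilege.
broad = matched_by("manage_connector*", privileges)
# A glob on the longer, "_state"-suffixed name does not.
narrow = matched_by("manage_connector_state*", privileges)

print(broad)
print(narrow)
```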

---------

Co-authored-by: Sean Story <sean.j.story@gmail.com>
2024-07-01 12:41:28 -05:00
Tim Grein 99749aa277
[Inference API] Fix wording in Azure AI Studio docs (#110322) 2024-07-01 14:37:56 +02:00
Tim Grein 6accd6e247
[Inference API] Fix wording in delete-inference docs (#110321) 2024-07-01 13:37:30 +02:00
Tim Grein 35eae4029a
Fix typo in get-inference docs (retrives -> retrieves) (#110320) 2024-07-01 10:13:48 +02:00
István Zoltán Szabó 43f5696406
[DOCS] Refactors PUT inference API docs (#109812) 2024-07-01 10:12:16 +02:00
Nikolaj Volgushev 78c812f845
Fix security index settings docs (#110126)
Docs tweak with a typo fix and a clarification on how the two available
settings interact (essentially
https://github.com/elastic/elasticsearch/issues/27871). I'm also open to
including this info in the more generic settings API but it feels like a
simple enough callout to add to the security API.
2024-07-01 18:07:15 +10:00
Kostas Krikellas 6ae652f90e
Support index sorting with nested fields (#110251)
This PR piggy-backs on recent changes in Lucene 9.11.1
(https://github.com/apache/lucene/pull/12829,
https://github.com/apache/lucene/pull/13341/), setting the parent doc
when nested fields are present. This allows moving nested documents
along with parent ones during sorting.

With this change, sorting is now allowed on fields outside nested
objects. Sorting on fields within nested objects is still not supported
(throws an exception).

Fixes #107349
2024-07-01 17:24:17 +10:00
Costin Leau b906ce3d66
ESQL: change from quoting from backtick to quote (#108395)
* ESQL: change from quoting from backtick to quote

For historical reasons, the source declaration inside the FROM command is
treated as an identifier, using backticks (`) for escaping the value.
This is inconsistent, since the source is not an identifier (field name)
but an index name, which has different semantics:
`index` means a field named index, while "index" means a literal with
that value.

In the case of FROM, the index name/location is more like a literal (also in
unquoted form) than an identifier (that is, a reference to a value).

This PR tweaks the grammar and plugs in the quoted string logic so that
both a single double quote (") and a triple quote (""") are allowed.

* Update grammar

* Add more tests

* Add a few more tests

* Add extra test

* Update docs/changelog/108395.yaml

* Address review comments

* Add doc note

* Revert test rename

* Fix quoting with remote cluster

* Update docs/reference/esql/source-commands/from.asciidoc

Co-authored-by: marciw <333176+marciw@users.noreply.github.com>

---------

Co-authored-by: Bogdan Pintea <bogdan.pintea@elastic.co>
Co-authored-by: Bogdan Pintea <pintea@mailbox.org>
Co-authored-by: marciw <333176+marciw@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-06-30 20:01:31 +03:00
George Wallace dea593db3f
Update behavioral-analytics-start.asciidoc (#110271) 2024-06-28 09:01:48 -06:00
Mayya Sharipova 405e39660b
Support k parameter for knn query (#110233)
Introduce an optional k param for knn query

If k is not set, the knn query keeps its previous behaviour:
- `num_candidates` docs are collected from each shard. These `num_candidates` docs
are combined with the results of other queries and aggregations on each shard.
- docs from all shards are merged to produce the top global `size` results

If k is set, the behaviour is instead as follows:
- `k` docs are collected from each shard. These `k` docs are
combined with the results of other queries and aggregations on each shard.
- similarly, docs from all shards are merged to produce the top global `size`
results.

Having a `k` param makes it more intuitive for users to address their needs.
They can also skip the `num_candidates` param for this query,
as it is more of an internal detail for tuning how knn search operates.
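
A request body using the optional `k` param might look like the sketch below. The field name `image_vector`, the vector values, and the `knn_search_body` helper are hypothetical; only `k`, `num_candidates`, and `size` come from the description above, and the exact request shape should be checked against the knn query documentation.

```python
import json


def knn_search_body(field, query_vector, size, k=None, num_candidates=None):
    """Build a search body with a knn query (hypothetical helper).

    When `k` is omitted, the per-shard count falls back to `num_candidates`,
    as described above; when both are omitted, server defaults apply.
    """
    knn = {"field": field, "query_vector": query_vector}
    if k is not None:
        knn["k"] = k  # docs collected per shard
    if num_candidates is not None:
        knn["num_candidates"] = num_candidates  # internal tuning knob
    # `size` is the number of top global results after merging all shards.
    return {"size": size, "query": {"knn": knn}}


with_k = knn_search_body("image_vector", [0.1, 0.2, 0.3], size=5, k=10)
without_k = knn_search_body("image_vector", [0.1, 0.2, 0.3], size=5, num_candidates=50)
print(json.dumps(with_k, indent=2))
```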

Closes #108473
2024-06-28 09:59:28 -04:00
Nick Tindall 8edb3b07e7
Make repository analysis API available to non-operators (#110179)
Closes #100381
2024-06-28 09:07:20 +10:00
Kathleen DeRusso 19fc0d9cad
Deprecate text_expansion and weighted_tokens queries (#109880) 2024-06-27 13:24:57 -04:00
Iván Cea Fontenla fc0313f429
ESQL: Add aggregations testing base and docs (#110042)
- Added a new `AbstractAggregationTestCase` base class for tests, that shares most of the code of function tests, adapted for aggregations. Including both testing and docs generation.
  - Reused the `AbstractFunctionTestCase` class to also let us test evaluators if the aggregation is foldable
- Added a `TopListTests` example
  - This includes the docs for Top_list _(Also added a missing include of Ip_prefix docs)_
- Adapted Kibana docs to use `type: "agg"` (@drewdaemon)

The current tests are very basic: Consume a page, generate an output,
all in Single aggregation mode (No intermediates, no grouping). More
complex testing will be added in future PRs

Initial PR of https://github.com/elastic/elasticsearch/issues/109917
2024-06-27 21:21:55 +10:00
Jedr Blaszyk 5179b0db29
[Connector API] Update status when setting/resetting connector error (#110192) 2024-06-27 12:17:33 +02:00
Benjamin Trent 5add44d7d1
Adds new `bit` element_type for dense_vectors (#110059)
This commit adds `bit` vector support by adding `element_type: bit` for
vectors. This new element type works for indexed and non-indexed
vectors. Additionally, it works with `hnsw` and `flat` index types. No
quantization-based codec works with this element type, which is
consistent with `byte` vectors.

`bit` vectors accept up to `32768` dimensions in size and expect vectors
that are being indexed to be encoded either as a hexadecimal string or a
`byte[]` array where each element of the `byte` array represents `8`
bits of the vector.

`bit` vectors support script usage and regular query usage. When
indexed, all comparisons done are `xor` and `popcount` summations (aka,
hamming distance), and the scores are transformed and normalized given
the vector dimensions. Note, indexed bit vectors require `l2_norm` to be
the similarity.

For scripts, `l1norm` is the same as `hamming` distance and `l2norm` is
`sqrt(l1norm)`. `dotProduct` and `cosineSimilarity` are not supported.

Note, the dimensions expected by this element_type must always be
divisible by `8`, and the `byte[]` vectors provided for indexing must
have size `dim/8`, where each byte element represents `8` bits of
the vector.
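
The encoding and distance rules above can be sketched in a few lines of Python. This is an illustration only, not the Elasticsearch implementation: `pack_bits` and `hamming` are hypothetical helpers, the most-significant-bit-first ordering within each byte is an assumption, and the score transformation and normalization are not reproduced.

```python
import math


def pack_bits(bits):
    """Pack a list of 0/1 values into bytes, 8 bits per byte.

    Assumes most-significant-bit-first order within each byte.
    """
    if len(bits) % 8 != 0:
        raise ValueError("dimension count must be divisible by 8")
    packed = bytearray()
    for start in range(0, len(bits), 8):
        byte = 0
        for bit in bits[start:start + 8]:
            byte = (byte << 1) | (bit & 1)
        packed.append(byte)
    return bytes(packed)


def hamming(a, b):
    """Hamming distance: xor each byte pair, then sum the popcounts."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# Two 16-dimensional bit vectors, each stored as dim/8 = 2 bytes.
vec_a = pack_bits([1, 0, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
vec_b = pack_bits([1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0])

l1norm = hamming(vec_a, vec_b)  # same as hamming distance for bit vectors
l2norm = math.sqrt(l1norm)      # sqrt(l1norm), as stated above

print(vec_a.hex(), vec_b.hex())  # hexadecimal string form of each vector
print(l1norm, l2norm)
```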

closes: https://github.com/elastic/elasticsearch/issues/48322
2024-06-27 04:48:41 +10:00
István Zoltán Szabó 31f0253b43
[DOCS] Adds link to ES-Cohere notebook and clarifies requirements. (#110195) 2024-06-26 17:22:40 +02:00
Oleksandr Kolomiiets b68e7d76c9
Remove obsolete sentence from TSDS docs (#110162) 2024-06-26 08:21:52 -07:00
Kostas Krikellas 3afd53e26a
Remove `average` from downsampling statistics in documentation (#110189) 2024-06-26 17:23:06 +03:00
Pius 79623c7609
Update search-application-api.asciidoc (#110113)
Add a subsection about cross cluster search support (or the lack thereof).
2024-06-26 12:20:28 +02:00
David Kyle 3c1c8d0f32
[ML] Increase response size limit for batched requests (#110112)
Increase the default to 50MB and do not retry when the limit is exceeded
2024-06-26 10:31:06 +01:00
Kathleen DeRusso 1f46a94dec
Add documentation for individual query rules (#110006)
* Add individual query rule API docs

* Update docs/reference/query-rules/apis/get-query-rule.asciidoc

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

* Update docs/reference/query-rules/apis/delete-query-rule.asciidoc

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

* Update docs/reference/query-rules/apis/get-query-rule.asciidoc

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

* PR feedback

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
2024-06-25 14:35:08 -04:00
Benjamin Trent 1c1733d823
Add some docs explaining filter performance and behavior for HNSW (#110108) 2024-06-25 08:42:24 -04:00
Martijn van Groningen 851e955181
Remove obsolete information about tsdb dimensions limit. (#110047) 2024-06-25 11:41:25 +02:00
Martijn van Groningen 1b0e800f5b
Add a note about enabling time series index mode via a component template (#110050)
Closes #109149
2024-06-25 17:22:31 +10:00
Jedr Blaszyk a257fed44b
[Connector API] Add metadata to sync job stats endpoint (#109927) 2024-06-25 08:04:56 +02:00
Mayya Sharipova 5c87eef89d
[DOCS] Vectors with cosine automatically normalized (#110071)
PR #99445 introduced automatic normalization of dense vectors with
cosine similarity. This adds a note about this in the documentation.

Relates to #99445
2024-06-22 22:32:25 +10:00
Benjamin Trent d97cb686a5
Correct positioning for unique token filter (#109395)
This is an extension of:
https://github.com/elastic/elasticsearch/pull/35420

closes: https://github.com/elastic/elasticsearch/issues/35411
2024-06-22 09:44:24 +10:00
Kathleen DeRusso 41a61b069b
Mark Query Rules as GA (#110004)
* Mark query rules APIs as stable

* Remove preview label from docs

* Update docs/changelog/110004.yaml
2024-06-21 15:26:51 -04:00
Carlos Delgado d332ed7d16
Enforce synonyms limit on APIs (#109981) 2024-06-21 18:16:16 +02:00
Jan Kuipers 13478b2bca
Fix put inference API docs (#110025)
* Fix put inference API docs

* Update docs/changelog/110025.yaml

* Delete docs/changelog/110025.yaml
2024-06-21 16:01:08 +02:00
Craig Taverner 536d614694
ES|QL ST_DISTANCE Function (#108764)
* WIP Started refactoring in preparation for ST_DISTANCE

* Initial evaluators for ST_DISTANCE

* Update docs/changelog/108764.yaml

* Fix invalid changelog generated by CI

* Register function and get unit tests working

* Fixed failing meta function description tests, and refined descriptions

* Added initial CsvTests and calculate Geo differently to Cartesian

* Added more csv-spec tests and changed to arcDistance for accuracy

* Added generated docs files

* Link to generated docs

* Fix examples tag for linking from generated docs

* Skip wrapper function

And note that we might want to include instead some of the related intelligence from Circle2D::HaversineDistance class

* Added ST_DWITHIN and more tests for ST_DISTANCE and ST_DWITHIN

* Code style

* Added more tests, this time for sorting on distance

* Fixes after rebase on main

* The ST_DWITHIN cannot use BinarySpatialFunction because it is ternary

So we moved the common code to a separate SpatialTypeResolver, and made a simpler TernarySpatialFunction based on a simple TernaryScalarFunction. This had additional consequences, simplifying the points-only cases.

The main reason for this change was to support StDWithinTests which need to test a lot of things that involve varying all three input types, generating expected error strings, etc. The original hack of just adding to BinarySpatialFunction worked for the actual integration tests, but clearly did not satisfy all the use cases tested by the unit tests.

We also restricted ST_DWITHIN to take only a double as the third argument, because otherwise the number of evaluators would explode, since we need a separate evaluator for each Block type, and Integer and Double use different block types.

* Fixed function count after rebasing on main

* Update docs/changelog/108764.yaml

* Added generated docs for ST_DWITHIN

* Connect docs for ST_DWITHIN

* Add back issue link

* Remove support for ST_DWITHIN

* Update docs/changelog/108764.yaml

* Bring back link to issue in changelog

* Update x-pack/plugin/esql/src/main/java/org/elasticsearch/xpack/esql/expression/function/scalar/spatial/StDistance.java

Co-authored-by: Ignacio Vera <iverase@gmail.com>

* Revert reformatting of function descriptions

We should put this into a separate PR

* Github merged commit with incorrectly formatted whitespace

---------

Co-authored-by: Ignacio Vera <iverase@gmail.com>
2024-06-21 11:59:44 +02:00
David Turner 5662f988b2
Remove trappy timeouts in snapshot APIs (#109828)
Wholesale fix of every `TRAPPY_IMPLICIT_DEFAULT_MASTER_NODE_TIMEOUT` in
`o.e.snapshots` and `o.e.repositories`, just pulling them up to the REST
layer (where they become API params), the test suite (where they become
`TEST_REQUEST_TIMEOUT`), or some other place where an explicit value is
available.

Relates #107984
2024-06-21 07:11:12 +10:00
Oleksandr Kolomiiets 8bc5ecdc31
Support synthetic source together with ignore_malformed in histogram fields (#109882) 2024-06-20 09:09:45 -07:00