16683 Commits

Author SHA1 Message Date
Ignacio Vera ab520d9a65
Fix needsScore computation in GlobalOrdCardinalityAggregator (#113129)
Only use TOP_DOCS if we are going to use dynamic pruning.
2024-10-01 09:01:15 +02:00
Luca Cavanna a1860f0273
Avoid using concurrent collector manager in LuceneChangesSnapshot (#113816)
The searcher never gets an executor set, so we can save the overhead of the
concurrent collector manager / collectors.
2024-10-01 08:56:35 +02:00
Michael Peterson ddba47407d
Collect and display execution metadata for ES|QL cross cluster searches (#112595)
Enhance ES|QL responses to include information about `took` time (search latency), shards, and
clusters against which the query was executed.

The goal of this PR is to begin to provide parity between the metadata displayed for 
cross-cluster searches in _search and ES|QL.

This PR adds the following features:
- add overall `took` time to all ES|QL query responses. And to emphasize: "all" here 
means: async search, sync search, local-only and cross-cluster searches, so it goes
beyond just CCS.
- add `_clusters` metadata to the final response for cross-cluster searches, for both
async and sync search (see example below)
- tracking/reporting counts of skipped shards from the can_match (SearchShards API)
phase of ES|QL processing
- marking clusters as skipped if they cannot be connected to (during the field-caps
phase of processing)

Out of scope for this PR:
- honoring the `skip_unavailable` cluster setting
- showing `_clusters` metadata in the async response **while** the search is still running
- showing any shard failure messages (since any shard search failures in ES|QL are
automatically fatal and `_clusters/details` is not shown in 4xx/5xx error responses). Note that
this also means that the `failed` shard count is always 0 in the ES|QL `_clusters` section.

Things changed with respect to behavior in `_search`:
- the `timed_out` field in `_clusters/details/mycluster` was removed in the ESQL
response, since ESQL does not support timeouts. It could be added back later
if/when ESQL supports timeouts.
- the `failures` array in `_clusters/details/mycluster/_shards` was removed in the ESQL
response, since any shard failure causes the whole query to fail.

Example output from ES|QL CCS:

```es
POST /_query
{
  "query": "from blogs,remote2:bl*,remote1:blogs|\nkeep authors.first_name,publish_date|\n limit 5"
}
```

```json
{
  "took": 49,
  "columns": [
    {
      "name": "authors.first_name",
      "type": "text"
    },
    {
      "name": "publish_date",
      "type": "date"
    }
  ],
  "values": [
    [
      "Tammy",
      "2009-11-04T04:08:07.000Z"
    ],
    [
      "Theresa",
      "2019-05-10T21:22:32.000Z"
    ],
    [
      "Jason",
      "2021-11-23T00:57:30.000Z"
    ],
    [
      "Craig",
      "2019-12-14T21:24:29.000Z"
    ],
    [
      "Alexandra",
      "2013-02-15T18:13:24.000Z"
    ]
  ],
  "_clusters": {
    "total": 3,
    "successful": 2,
    "running": 0,
    "skipped": 1,
    "partial": 0,
    "failed": 0,
    "details": {
      "(local)": {
        "status": "successful",
        "indices": "blogs",
        "took": 43,
        "_shards": {
          "total": 13,
          "successful": 13,
          "skipped": 0,
          "failed": 0
        }
      },
      "remote2": {
        "status": "skipped",  // remote2 was offline when this query was run
        "indices": "remote2:bl*",
        "took": 0,
        "_shards": {
          "total": 0,
          "successful": 0,
          "skipped": 0,
          "failed": 0
        }
      },
      "remote1": {
        "status": "successful",
        "indices": "remote1:blogs",
        "took": 47,
        "_shards": {
          "total": 13,
          "successful": 13,
          "skipped": 0,
          "failed": 0
        }
      }
    }
  }
}
```

Fixes https://github.com/elastic/elasticsearch/issues/112402 and https://github.com/elastic/elasticsearch/issues/110935
2024-09-30 16:03:39 -04:00
Benjamin Trent 5c840f72b7
Deprecate dutch_kp and lovins stemmer as they are removed in Lucene 10 (#113143)
Lucene 10 has upgraded its Snowball stemming support, as part of those
upgrades, two no longer supported stemmers were removed, `KpStemmer` and
`LovinsStemmer`. These are `dutch_kp` and `lovins`, respectively.

We will deprecate in 8.16 and will remove support for these in a future
version.
2024-10-01 04:03:44 +10:00
Stanislav Malyshev b26d81c713
Implement remote cluster CCS telemetry (#112478)
* Add remote cluster stats to _cluster/stats
* Implement remote cluster stats polling
* Add docs for the include_remotes part
2024-09-30 11:50:22 -06:00
Patrick Doyle a6b104d843
Fix max file size check to use getMaxFileSize (#113723)
* Fix max file size check to use getMaxFileSize

* Update docs/changelog/113723.yaml

* CURSE YOU SPOTLESS
2024-09-30 11:23:32 -04:00
Liam Thompson 55078d4c5e
[DOCS] Fix heading level (#113800)
2024-09-30 16:11:46 +02:00
Luke Whiting b1b249d26b
#101193 Preserve Step Info Across ILM Auto Retries (#113187)
* Add new Previous Step Info field to LifecycleExecutionState

* Add new field to IndexLifecycleExplainResponse

* Add new field to TransportExplainLifecycleAction

* Add logic to IndexLifecycleTransition to keep previous step info

* Switch tests to use Java standard Clock class

for any time based testing, this is the recommended method

* Fix tests for new field

Also refactor tests to newer style

* Add test to ensure step info is preserved

Across auto retries

* Add docs for new field

* Changelog Entry

* Update docs/changelog/113187.yaml

* Revert "Switch tests to use Java standard Clock class"

This reverts commit 241074c735.

* PR Changes

* PR Changes - Improve docs wording

Co-authored-by: Mary Gouseti <mgouseti@gmail.com>

* Integration test for new ILM explain field

* Use ROOT locale instead of default toLowerCase

* PR Changes - Switch to block strings

* Remove forbidden API usage

---------

Co-authored-by: Mary Gouseti <mgouseti@gmail.com>
2024-09-30 11:44:46 +01:00
Simon Cooper acd4f07475
Create a doc for versioning info (#113601)
2024-09-30 10:42:59 +01:00
David Kyle 3a04f07c50
[ML] Fix check on E5 model platform compatibility (#113437)
Creating an endpoint for the built-in multilingual E5 model failed for
the Linux-optimised version due to an error in the logic that checks model
compatibility.
2024-09-30 09:26:23 +01:00
Liam Thompson 6e400c12a7
[DOCS] Port connector docs from Enterprise Search guide (#112953)
2024-09-30 10:22:37 +02:00
István Zoltán Szabó 436c6c85ff
[DOCS] Adds an admonition to the transform painless examples. (#113706)
2024-09-30 09:28:28 +02:00
Kostas Krikellas c9f378da29
Revert "Apply auto-flattening to `subobjects: auto` (#112092)" (#113692)
* Revert "Apply auto-flattening to `subobjects: auto` (#112092)"

This reverts commit fffe8844

* fix DataGenerationHelper
2024-09-30 10:11:15 +03:00
Ignacio Vera b4334f1afa
[ESQL] Fix init value in max float aggregation (#113699)
We are currently using Float.MIN_VALUE, which is the smallest positive non-zero value, not the smallest possible value.
This commit changes it to -Float.MAX_VALUE to be symmetric with the double max aggregation.
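
The pitfall described above can be reproduced in plain Java. The sketch below is only an illustration: `MaxFloatInit` and its `max` helper are hypothetical names, not the aggregator code touched by this commit.

```java
public class MaxFloatInit {
    // Hypothetical helper: folds the values into a running maximum that starts at `init`.
    static float max(float init, float[] values) {
        float result = init;
        for (float v : values) {
            result = Math.max(result, v);
        }
        return result;
    }

    public static void main(String[] args) {
        float[] allNegative = { -3.5f, -1.25f, -7.0f };
        // Float.MIN_VALUE is a tiny positive number, so it beats every negative input.
        System.out.println(max(Float.MIN_VALUE, allNegative));
        // -Float.MAX_VALUE is the most negative finite float, so the real maximum wins.
        System.out.println(max(-Float.MAX_VALUE, allNegative));
    }
}
```

With an all-negative input, the first call returns the seed itself rather than any of the inputs, which is the wrong answer for a max aggregation.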
2024-09-27 18:25:20 +02:00
Sam Xiao 6917f1679a
Tag redacted document in ingest pipeline (#113552)
Adds a new option trace_redact in redact processor to indicate a document has been redacted in the ingest pipeline. If a document is processed by a redact processor AND any field is redacted, ingest metadata _ingest._redact._is_redacted = true will be set.

Closes #94633
2024-09-27 12:24:24 -04:00
Nikolay Benlioglu 929b388ba5
fix: handle null path parameter in RestNodesCapabilitiesAction (#113413)
2024-09-27 16:14:51 +01:00
Mike Pellegrini 8344d3a6ac
Add inner hits support to semantic query (#111834)
Adds inner hits support to the semantic query through a restricted inner_hits parameter, which exposes from and size from the inner_hits options
2024-09-27 10:51:11 -04:00
Luke Whiting db632ee3cd
Note in docs about interpreting IO stats when running in docker (#113676)
* Note in docs about incorrect IO stats when running in docker

* Update docs/reference/cluster/nodes-stats.asciidoc

Co-authored-by: David Turner <david.turner@elastic.co>

* Requested PR changes to wording

* Update docs/reference/cluster/nodes-stats.asciidoc

Co-authored-by: David Turner <david.turner@elastic.co>

---------

Co-authored-by: David Turner <david.turner@elastic.co>
2024-09-27 13:32:23 +01:00
Carson Ip 4932697142
[docs] Fix typo in repository-s3.asciidoc (#113678)
Same as #113572 but targeting `8.x` and `main`.
2024-09-27 20:36:22 +10:00
Oleksandr Kolomiiets 55cab5fa98
Fix ignore_above handling in synthetic source when index level setting is used (#113570)
2024-09-26 13:01:21 -07:00
elasticsearchmachine 8d6ecdc75d
Prune changelogs after 8.15.2 release
2024-09-26 17:40:46 +00:00
Fang Xing 3516b9a675
[ES|QL] Check expression resolved before checking its data type in ImplicitCasting (#113314)
* check resolved before check data type
2024-09-26 12:06:35 -04:00
Mark Tozzi 122e728820
[ESQL] Add TO_DATE_NANOS conversion function (#112150)
Resolves #111842

This adds a conversion function that yields DATE_NANOS. Mostly this is straightforward.

It is worth noting that when converting a millisecond date into a nanosecond date, the conversion function truncates it to 0 nanoseconds (i.e. first nanosecond of that millisecond). This is, of course, a bit of an assumption, but I don't have a better assumption we can make. I'd thought about adding a second, optional, parameter to control this behavior, but it's important that TO_DATE_NANOS extend AbstractConvertFunction, which itself extends UnaryScalarFunction, so that it will work correctly with union types. Also, it's unlikely the user will have any better guess than we do for filling in the nanoseconds.

Making that assumption does, however, create some weirdness. Consider two comparisons:

TO_DATETIME("2023-03-23T12:15:03.360103847") == TO_DATETIME("2023-03-23T12:15:03.360") will return true while TO_DATE_NANOS("2023-03-23T12:15:03.360103847") == TO_DATE_NANOS("2023-03-23T12:15:03.360") will return false. This is akin to casting between longs and doubles, where things may compare equal in one type that are not equal in the other. This seems fine, and I can't think of a better way to do it, but it's worth being aware of.
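
The two comparisons can be mimicked outside ES|QL with `java.time`. This is a rough analogy rather than the ES|QL implementation: the class and method names are hypothetical, and a trailing `Z` is added to the timestamps (an assumption) so that `Instant.parse` accepts them.

```java
import java.time.Instant;
import java.time.temporal.ChronoUnit;

public class DateNanosComparison {
    // Millisecond-precision comparison: both sides lose their sub-millisecond digits first.
    static boolean equalAtMillis(Instant a, Instant b) {
        return a.truncatedTo(ChronoUnit.MILLIS).equals(b.truncatedTo(ChronoUnit.MILLIS));
    }

    // Nanosecond-precision comparison: every digit counts.
    static boolean equalAtNanos(Instant a, Instant b) {
        return a.equals(b);
    }

    public static void main(String[] args) {
        Instant withNanos = Instant.parse("2023-03-23T12:15:03.360103847Z");
        Instant millisOnly = Instant.parse("2023-03-23T12:15:03.360Z");
        System.out.println(equalAtMillis(withNanos, millisOnly));
        System.out.println(equalAtNanos(withNanos, millisOnly));
    }
}
```

The millisecond comparison treats the two instants as equal, while the nanosecond comparison does not, mirroring the behavior described for the two conversion functions.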

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-09-26 12:03:01 -04:00
Stef Nestor 0ea8a78ca7
(Doc+) Avoid search pile up by setting default timeout (#112846)
👋! Mini doc PR to say you can avoid search task pile-ups by setting [`search.default_search_timeout`](https://www.elastic.co/guide/en/elasticsearch/reference/current/search-your-data.html#search-timeout) under [High JVM > avoid expensive searches](https://www.elastic.co/guide/en/elasticsearch/reference/master/high-jvm-memory-pressure.html#reduce-jvm-memory-pressure).
2024-09-26 09:05:21 -06:00
kosabogi 6e73c1423b
Adds text_similarity task type to inference processor documentation (#113517)
2024-09-26 16:12:28 +02:00
István Zoltán Szabó 5e019998ef
[DOCS] Improves semantic text documentation. (#113606)
2024-09-26 16:09:28 +02:00
Benjamin Trent 1b67dabadb
Fix collapse interaction with stored fields (#112761)
Collapse will dynamically add values to the DocumentField values array.
There are a few scenarios where this is immutable and most of these are
OK. However, we get in trouble when we create an immutable set for
StoredValues which collapse later tries to update.

The other option for this fix was to make an array copy for `values` in
every `DocumentField` ctor, this seemed very expensive and could get out
of hand. So, I decided to fix this one bug instead.
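
The general failure mode, appending to a collection that was created immutable, can be shown with standard Java collections. This is only a sketch of the mechanism: `ImmutableValuesDemo` and `tryAppend` are hypothetical and do not use Elasticsearch's `DocumentField`.

```java
import java.util.ArrayList;
import java.util.List;

public class ImmutableValuesDemo {
    // Hypothetical stand-in for code that later appends to a field's values.
    // Returns false when the backing collection rejects the update.
    static boolean tryAppend(List<Object> values, Object extra) {
        try {
            values.add(extra);
            return true;
        } catch (UnsupportedOperationException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        List<Object> immutable = List.of("a");                   // cannot be modified
        List<Object> mutable = new ArrayList<>(List.of("a"));    // defensive copy, can grow
        System.out.println(tryAppend(immutable, "b"));
        System.out.println(tryAppend(mutable, "b"));
    }
}
```

Copying into an `ArrayList` everywhere is the "array copy in every ctor" option the message rejects as too expensive; the sketch only shows why the immutable case fails.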

closes https://github.com/elastic/elasticsearch/issues/112646
2024-09-26 22:51:02 +10:00
Kostas Krikellas fffe8844e9
Apply auto-flattening to `subobjects: auto` (#112092)
* Introduce mode `subobjects=auto` for objects

* Update docs/changelog/110524.yaml

* compilation error

* tests and fixes

* refactor

* spotless

* more tests

* fix nested objects

* fix test

* update fetch test

* add QA coverage

* update tests

* update tests

* update tests

* Apply auto-flattening to `subobjects: auto`

* Update docs/changelog/112092.yaml

* sync

* dont flatten subobjects auto

* refine test

* fix path for nested flattened objects and dynamic

* document `subobjects: auto`

* Apply suggestions from code review

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

* comment updates

* restore indentation in comment

* update comment

* update comment

* update comment

* update comment

* rename isFlattenable

* add test for dynamic template

* fix copy_to and noop dynamic updates

* tests

* update comment

* fix tests

* update cluster feature in yaml test

* address comments

---------

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>
2024-09-26 11:42:40 +03:00
Stanislav Malyshev 5e06092d5e
Improve DateTime error handling and add some bad date tests (#112723)
* Improve DateTime error handling and add some bad date tests
2024-09-25 15:55:30 -06:00
Oleksandr Kolomiiets 35fbbec46a
Fix synthetic source for flattened field when used with ignore_above (#113499)
2024-09-25 14:38:37 -07:00
Keith Massey cd950bb2fa
Adding component template substitutions to the simulate ingest API (#113276)
2024-09-25 15:30:22 -05:00
Mayya Sharipova c18c531d72
Deprecate legacy params from range query (#113286)
Deprecate the `to`, `from`, `include_lower`, and `include_upper` range query params.
These params were removed from our documentation in v0.90.4 (d6ecdecc19),
but did not go through a deprecation cycle.

These params will be removed in v9.0.
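
For readers migrating queries, the legacy params map onto the documented `gt`/`gte`/`lt`/`lte` bounds. The helper below is a hypothetical sketch of that mapping, assuming `include_lower`/`include_upper` select between the inclusive and exclusive forms; it is not code from this commit.

```java
import java.util.LinkedHashMap;
import java.util.Map;

public class LegacyRangeParams {
    // Hypothetical translation of the deprecated params into gt/gte/lt/lte bounds.
    // A null `from` or `to` means that side of the range is unbounded.
    static Map<String, Object> toModern(Object from, Object to, boolean includeLower, boolean includeUpper) {
        Map<String, Object> bounds = new LinkedHashMap<>();
        if (from != null) {
            bounds.put(includeLower ? "gte" : "gt", from);
        }
        if (to != null) {
            bounds.put(includeUpper ? "lte" : "lt", to);
        }
        return bounds;
    }

    public static void main(String[] args) {
        // Lower bound inclusive, upper bound exclusive.
        System.out.println(toModern(10, 20, true, false));
    }
}
```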

Related to #81276

Closes #48538
2024-09-25 14:48:45 -04:00
Liam Thompson 4f666310c7
[DOCS] Create Elasticsearch basics section, refactor quickstarts section (#112436)
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
2024-09-25 17:30:01 +02:00
Smriti 0638d3977a
Update index-templates.asciidoc (#113461)
Adding `security_solution-*-*` to the list of index names to avoid pattern collisions.
2024-09-25 13:55:17 +02:00
David Kyle 7a0f4ee56e
[ML] Limit in flight requests when indexing model download parts (#112992)
Restores the changes from #111684 which uses multiple streams to improve the
time to download and install the built in ml models. The first iteration has a problem
where the number of in-flight requests was not properly limited which is fixed here.
Additionally there are now circuit breaker checks on allocating the buffer used to 
store the model definition.
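
One common way to cap in-flight work is a counting semaphore. The sketch below shows that general technique under stated assumptions: `InFlightLimiter`, the pool size, and the simulated work are all hypothetical, and the commit may limit requests differently.

```java
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Semaphore;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

public class InFlightLimiter {
    // Runs `tasks` simulated requests with at most `maxInFlight` running at once
    // and returns the highest concurrency that was actually observed.
    static int runWithLimit(int tasks, int maxInFlight) {
        Semaphore permits = new Semaphore(maxInFlight);
        AtomicInteger inFlight = new AtomicInteger();
        AtomicInteger maxObserved = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(8);
        try {
            for (int i = 0; i < tasks; i++) {
                permits.acquire(); // the submitter blocks until a slot frees up
                pool.execute(() -> {
                    try {
                        int now = inFlight.incrementAndGet();
                        maxObserved.accumulateAndGet(now, Math::max);
                        Thread.sleep(5); // stand-in for indexing one model part
                    } catch (InterruptedException e) {
                        Thread.currentThread().interrupt();
                    } finally {
                        inFlight.decrementAndGet();
                        permits.release(); // always hand the slot back
                    }
                });
            }
            pool.shutdown();
            pool.awaitTermination(10, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while limiting requests", e);
        } finally {
            pool.shutdownNow(); // no-op if the pool already terminated
        }
        return maxObserved.get();
    }

    public static void main(String[] args) {
        System.out.println(runWithLimit(20, 3));
    }
}
```

Releasing the permit in `finally` matters: a request that fails without returning its slot would gradually shrink the effective limit.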
2024-09-25 10:10:06 +01:00
Mike Pellegrini 44732a5648
Add Search Inference ID To Semantic Text Mapping (#113051)
Adds a search_inference_id parameter to the semantic_text mapping. This parameter defines the inference endpoint that is used to generate embeddings at query time.
2024-09-24 16:53:23 -04:00
Stanislav Malyshev fda755939f
Improve date expression/remote handling in index names (#112405)
* Improve date expression/remote handling
The original code did not account for the possibility of the date expression being prefixed
with -.
2024-09-24 13:22:32 -06:00
Luigi Dell'Aquila 7ba26892f3
ES|QL: make CSV date tests more friendly for Java 23 (#113472)
Following [this
suggestion](https://github.com/elastic/elasticsearch/pull/113376#issuecomment-2370817089),
this switches date patterns from week years to calendar years, which have the
same behavior in Java <=22 and Java 23.
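
The difference between a week year and a calendar year shows up around year boundaries. The sketch below uses the ISO week definition via `IsoFields` (a choice made here for determinism; the tests in this commit use date patterns, and the class name is hypothetical).

```java
import java.time.LocalDate;
import java.time.temporal.IsoFields;

public class WeekYearVsCalendarYear {
    // Calendar year: the year printed on the date itself.
    static int calendarYear(LocalDate date) {
        return date.getYear();
    }

    // ISO week-based year: the year that owns the ISO week containing the date.
    static int isoWeekBasedYear(LocalDate date) {
        return date.get(IsoFields.WEEK_BASED_YEAR);
    }

    public static void main(String[] args) {
        // Monday 2024-12-30 falls in the ISO week that contains the first Thursday of 2025.
        LocalDate date = LocalDate.of(2024, 12, 30);
        System.out.println(calendarYear(date));
        System.out.println(isoWeekBasedYear(date));
    }
}
```

Because the two notions disagree only for a few days each year, a test that mixes them up can pass almost all the time, which is one reason calendar years are the safer default in fixtures.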
2024-09-25 02:57:22 +10:00
Nik Everett 5c91edda9f
ESQL: Speed up CASE for some parameters (#112295)
This speeds up the `CASE` function when it has two or three arguments
and both of the arguments are constants or fields. This works because
`CASE` is lazy so it can avoid warnings in cases like
```
CASE(foo != 0, 2 / foo, 1)
```

And, in the case where the function is *very* slow, it can avoid the
computations.

But if the lhs and rhs of the `CASE` are constant then there isn't any
work to avoid.

The performance improvement is pretty substantial:
```
 (operation)  Before   Error   After    Error  Units
 case_1_lazy  97.422 ± 1.048  101.571 ± 0.737  ns/op
case_1_eager  79.312 ± 1.190    4.601 ± 0.049  ns/op
```

The top line is a `CASE` that has to be lazy - it shouldn't change. The
4 nanos change here is noise. The eager version improves by about 94%.
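
The lazy/eager distinction can be sketched in plain Java. This is an analogy only: `LazyVsEagerCase` and its methods are hypothetical, and the real ES|QL evaluators operate on blocks of values rather than single ints.

```java
import java.util.function.IntSupplier;

public class LazyVsEagerCase {
    // Lazy: only the selected branch is evaluated, so a failing branch can be skipped.
    static int lazyCase(boolean condition, IntSupplier whenTrue, IntSupplier whenFalse) {
        return condition ? whenTrue.getAsInt() : whenFalse.getAsInt();
    }

    // Eager: both branch values already exist (constants or fields), so just pick one.
    static int eagerCase(boolean condition, int whenTrue, int whenFalse) {
        return condition ? whenTrue : whenFalse;
    }

    // Java analogue of CASE(foo != 0, 2 / foo, 1): the division never runs when foo is 0.
    static int guardedDivision(int foo) {
        return lazyCase(foo != 0, () -> 2 / foo, () -> 1);
    }

    public static void main(String[] args) {
        System.out.println(guardedDivision(0));
        System.out.println(eagerCase(true, 5, 7));
    }
}
```

When both branches are already-computed values, the eager form has nothing to protect against, which is the situation the commit optimizes.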
2024-09-24 12:54:40 -04:00
David Turner 60713622a5
Suppress merge-on-recovery for older indices (#113462)
There may be many older indices in need of merging, but today we do not
throttle this work across shards so an upgrade could lead to an
overwhelming spike in merges. With this commit we make it so that the
automatic merge-on-recovery behaviour only applies to newly-created
indices.
2024-09-25 00:47:29 +10:00
Parker Timmins c17cf18d5d
Add ensureGreen method for use with adminClient (#113425)
The current ensureGreen test helper method uses client() directly.
Sometimes it is useful to call ensureGreen with adminClient() or
another REST client. This PR allows passing a RestClient into
ensureGreen.
2024-09-24 08:39:58 -05:00
Ignacio Vera a3806cd8e1
Account for DelayedBucket before reduction (#113013)
This commit moves the accounting for the DelayedBucket to before reduction, so in some adversarial cases we
should exit much sooner.
2024-09-24 14:46:25 +02:00
Andrei Dan 4e5e870370
Implement `parseBytesRef` for TimeSeriesRoutingHashFieldType (#113373)
This implements the `parseBytesRef` method for the `_ts_routing_hash` field so we
can parse the values generated by the companion `format` method.
We parse the values when fetching them from the source when the field is used
as a `sort` paired with `search_after`.

Before this change, sorting by `_ts_routing_hash` together with `search_after` would yield
an `UnsupportedOperationException`.
2024-09-24 09:13:26 +01:00
Valeriy Khakhutskyy 7b7dd91f62
[ML] Add documentation for post calendar events API (#113188)
This PR updates the documentation for the extension of the POST calendar events API implemented in #112837.
2024-09-24 09:46:42 +02:00
Ignacio Vera d9e0cbeb59
Small performance improvement in h3 library (#113385)
Changing some FDIVs into FMULs leads to performance improvements.
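
At the source level, this kind of change usually means multiplying by a precomputed reciprocal instead of dividing. The sketch below illustrates the idea with hypothetical names; it is not taken from the h3 library, and no performance numbers are claimed for it.

```java
public class ReciprocalMultiply {
    // Division form: typically compiles to a floating-point divide.
    static double byDivision(double x, double divisor) {
        return x / divisor;
    }

    // Multiplication form: the caller computes 1.0 / divisor once and reuses it.
    static double byMultiplication(double x, double reciprocal) {
        return x * reciprocal;
    }

    public static void main(String[] args) {
        double reciprocal = 1.0 / 8.0; // exact, because 8 is a power of two
        System.out.println(byDivision(3.0, 8.0));
        System.out.println(byMultiplication(3.0, reciprocal));
    }
}
```

For divisors that are not powers of two the reciprocal is rounded, so the two forms can differ in the last bits of the result; that trade-off has to be acceptable wherever the substitution is made.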
2024-09-24 07:08:47 +02:00
Pat Whelan 1565c31471
[ML] Stream Inference API (#113158)
Create `POST _inference/<task>/<id>/_stream` and
`POST _inference/<id>/_stream` API.

REST Streaming API will reuse InferenceAction.
For now, all services and task types will return an
HTTP 405 status code and error message.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2024-09-23 18:19:40 -04:00
Sam Xiao 80dd56398f
ILM: Add total_shards_per_node setting to searchable snapshot (#112972)
Allows setting the index total_shards_per_node in the SearchableSnapshot action of ILM to remediate hot spots in shard allocation for searchable snapshot indices.

Closes #112261
2024-09-23 13:37:58 -04:00
Nik Everett 58021c3405
ESQL: TOP support for strings (#113183)
Adds support to the `TOP` aggregation for `keyword` and `text` field
types.

Closes #109849
2024-09-24 03:00:18 +10:00
Salvatore Campagna 208a1fe571
Introduce an `ignore_above` index-level setting (#113121)
Here we introduce a new index-level setting, `ignore_above`, similar to what we have
for `ignore_malformed`. The setting will apply to all `keyword`, `wildcard` and `flattened`
fields. Each field mapping will still be allowed to override the index-level setting using a
mapping-level `ignore_above` value.
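
The override rule described above can be sketched as a small resolution function. The names below are hypothetical, and the length check assumes the usual `ignore_above` meaning (values longer than the limit are skipped); the actual mapper code is not shown in this commit message.

```java
public class IgnoreAboveResolution {
    // A mapping-level value, when present, overrides the index-level setting.
    static int effectiveIgnoreAbove(Integer fieldLevel, int indexLevel) {
        return fieldLevel != null ? fieldLevel : indexLevel;
    }

    // Assumed semantics: a value longer than the limit is ignored.
    static boolean isIgnored(String value, int ignoreAbove) {
        return value.length() > ignoreAbove;
    }

    public static void main(String[] args) {
        int indexLevel = 5;
        int inherited = effectiveIgnoreAbove(null, indexLevel); // field has no override
        int overridden = effectiveIgnoreAbove(20, indexLevel);  // field sets its own limit
        System.out.println(isIgnored("elasticsearch", inherited));
        System.out.println(isIgnored("elasticsearch", overridden));
    }
}
```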
2024-09-23 18:05:02 +02:00
YeonghyeonKo 2a5afca1ba
fix typos of docs/plugins (#113348)
2024-09-23 17:53:38 +02:00