Commit Graph

78179 Commits

Author SHA1 Message Date
Armin Braun 7e81229b7f
Wait for dynamic mapping update more precisely (#110187)
We ran into a situation where dynamic mapping updates were retried in a
fairly hot loop. The cause was that this logic waited for any cluster state
update. That is mostly fine, but it adds a lot of retry overhead when other
actions are running at a higher priority than the mapping update. Let's make
the wait specific, so that we at least wait for a mapping to exist and for its
version to differ from the version that made us request a mapping update in
the first place.
Also added a breakout in case the index got concurrently deleted so we
don't run out the clock in that case.
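The more precise waiting condition can be pictured as a decision made on each new cluster state. This is a minimal Python sketch under simplifying assumptions, not the actual (Java) implementation: `IndexState`, `next_action` and the three action names are hypothetical.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass(frozen=True)
class IndexState:
    """Hypothetical, simplified view of one index in a cluster state."""
    mapping_version: Optional[int]  # None: the index has no mapping yet


def next_action(indices: Dict[str, IndexState], index_name: str,
                requested_at_version: Optional[int]) -> str:
    """Decide what to do when a new cluster state arrives.

    requested_at_version is the mapping version that made us request
    the dynamic mapping update.
    """
    index = indices.get(index_name)
    if index is None:
        # Breakout: the index was deleted concurrently, so waiting longer is pointless.
        return "abort"
    if index.mapping_version is None or index.mapping_version == requested_at_version:
        # An unrelated cluster state update (e.g. a higher-priority task): keep waiting.
        return "keep_waiting"
    # The mapping changed since we asked for the update: retrying is now worthwhile.
    return "retry"


print(next_action({"logs": IndexState(4)}, "logs", 3))  # retry
```

Waking only on a changed mapping version is what removes the hot loop: unrelated cluster state updates no longer trigger a retry.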
2024-06-27 11:18:27 +02:00
Tim Rühsen 186edb2c35
[Profiling] Add field env_https_proxy to profiling-hosts (#110219) 2024-06-27 10:22:04 +02:00
Luigi Dell'Aquila b7c18bcfe1
ES|QL: Fix DISSECT that overwrites input (#110201)
Fixes https://github.com/elastic/elasticsearch/issues/110184

When a DISSECT command overwrites its input, e.g.

```
FROM idx | DISSECT foo "%{foo} %{bar}" | KEEP foo, bar
```

The input field (`foo` in this case) could be excluded from the index
resolution (incorrectly masked by the `foo` that is the result of the
DISSECT). This PR makes sure that the input field does not get lost and
is correctly passed to the indexResolver.
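One way to see how the input can get lost is a toy backward field analysis over the pipeline: starting from the KEEP list, each command removes the fields it produces and adds the fields it reads. This is a hypothetical Python model, not ES|QL's analyzer; `Command` and `fields_to_resolve` are made-up names, and the point is only the order of the two set operations.

```python
from typing import FrozenSet, Iterable, List, NamedTuple, Set


class Command(NamedTuple):
    name: str
    inputs: FrozenSet[str]   # fields the command reads
    outputs: FrozenSet[str]  # fields the command produces (may shadow an input)


def fields_to_resolve(pipeline: List[Command], kept: Iterable[str], buggy: bool = False) -> Set[str]:
    """Walk the commands after FROM backwards and collect the index fields to resolve."""
    required = set(kept)
    for cmd in reversed(pipeline):
        if buggy:
            required |= cmd.inputs   # add the input `foo` first...
            required -= cmd.outputs  # ...then the output `foo` masks it away
        else:
            required -= cmd.outputs  # produced by this command, not read from the index
            required |= cmd.inputs   # but the command's own inputs must come from the index
    return required


# FROM idx | DISSECT foo "%{foo} %{bar}" | KEEP foo, bar
dissect = Command("DISSECT", frozenset({"foo"}), frozenset({"foo", "bar"}))
print(fields_to_resolve([dissect], ["foo", "bar"], buggy=True))  # set()
print(fields_to_resolve([dissect], ["foo", "bar"]))              # {'foo'}
```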
2024-06-27 18:15:59 +10:00
Huaixinww 2d45e47bb5
[ML] Add InferenceAction request query validation (#110147) 2024-06-27 09:12:28 +02:00
Nick Tindall 944f2da5dd
Fix Netty4ChunkedContinuationsIT#testClientCancellation (#110118)
Closes #109866
2024-06-27 14:34:55 +10:00
Nik Everett f0afd91650
ESQL: Reenable test (#110215)
Our test muting tools muted this test, but the fix was in flight. It's
merged now.

Closes #110203
2024-06-27 11:07:51 +10:00
elasticsearchmachine 8b63f65278 Mute org.elasticsearch.synonyms.SynonymsManagementAPIServiceIT testUpdateRuleWithMaxSynonyms #110212 2024-06-27 07:23:34 +10:00
elasticsearchmachine 1510e506ed Mute org.elasticsearch.index.store.FsDirectoryFactoryTests testPreload #110211 2024-06-27 07:23:31 +10:00
elasticsearchmachine 0983902682 Mute org.elasticsearch.index.store.FsDirectoryFactoryTests testStoreDirectory #110210 2024-06-27 07:23:28 +10:00
elasticsearchmachine 303a7cf007 Mute org.elasticsearch.http.netty4.Netty4ChunkedContinuationsIT testClientCancellation #109866 2024-06-27 07:23:26 +10:00
Benjamin Trent 5add44d7d1
Adds new `bit` element_type for dense_vectors (#110059)
This commit adds `bit` vector support by adding `element_type: bit` for
vectors. This new element type works for indexed and non-indexed
vectors. Additionally, it works with the `hnsw` and `flat` index types. No
quantization-based codec works with this element type; this is
consistent with `byte` vectors.

`bit` vectors accept up to `32768` dimensions and expect vectors
that are being indexed to be encoded either as a hexadecimal string or as a
`byte[]` array where each element of the `byte` array represents `8`
bits of the vector.

`bit` vectors support script usage and regular query usage. When
indexed, all comparisons are `xor` and `popcount` summations (i.e.,
Hamming distance), and the scores are transformed and normalized given
the vector dimensions. Note that indexed bit vectors require `l2_norm` as
the similarity.

For scripts, `l1norm` is the same as `hamming` distance and `l2norm` is
`sqrt(l1norm)`. `dotProduct` and `cosineSimilarity` are not supported.

Note that the dimensions of this element_type must always be
divisible by `8`, and the `byte[]` vectors provided for indexing must
have size `dim/8`, where each byte element represents `8` bits of
the vector.
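The bit packing and distance arithmetic described above can be sketched in a few lines of Python. This illustration mirrors only what the text states (8 dimensions per byte, hex encoding, XOR plus popcount, `l2norm = sqrt(l1norm)`); the function names are hypothetical, it is not the Painless script API, and it does not reproduce how Elasticsearch transforms and normalizes scores.

```python
import math


def decode_hex_bits(hex_vector: str, dims: int) -> bytes:
    """Decode a hex-encoded bit vector; each byte packs 8 dimensions."""
    if dims % 8 != 0:
        raise ValueError("bit vector dims must be divisible by 8")
    packed = bytes.fromhex(hex_vector)
    if len(packed) != dims // 8:
        raise ValueError(f"expected {dims // 8} bytes, got {len(packed)}")
    return packed


def hamming(a: bytes, b: bytes) -> int:
    """XOR each byte pair and sum the popcounts: the number of differing bits."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of bytes")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


def l1_norm_bits(a: bytes, b: bytes) -> float:
    return float(hamming(a, b))  # for bit vectors, l1 norm == Hamming distance


def l2_norm_bits(a: bytes, b: bytes) -> float:
    return math.sqrt(hamming(a, b))  # and l2 norm == sqrt(l1 norm)


a = decode_hex_bits("ff00", dims=16)
b = decode_hex_bits("0f01", dims=16)
print(hamming(a, b))  # 5: 0xff^0x0f = 0xf0 (4 bits) plus 0x00^0x01 (1 bit)
```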

closes: https://github.com/elastic/elasticsearch/issues/48322
2024-06-27 04:48:41 +10:00
Nhat Nguyen 97651dfb9f
Support rate aggregation in ES|QL (#109979)
Rate aggregation is special because it must be computed per time series,
regardless of the grouping keys. The keys must be `_tsid` or a pair of
`_tsid` and `time_bucket`. To support user-defined grouping keys, we
first execute the rate aggregation using the time-series keys, then
perform another aggregation on the resulting rates using the
user-specified keys.

This PR translates the aggregates in the METRICS command into standard
aggregates. This approach avoids introducing new plans and
operators just for metrics aggregations.
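The two-phase translation can be mimicked in plain Python: first one rate per `_tsid`, then a second aggregation over those rates by the user's key, with the average computed as sum divided by count. This is a sketch under stated assumptions: the simplified rate `(last - first) / elapsed` ignores counter resets and is not ES|QL's actual `rate` implementation, and the function names are hypothetical.

```python
from collections import defaultdict
from typing import Dict, FrozenSet, Iterable, Tuple

Sample = Tuple[str, str, float, float]  # (_tsid, host, timestamp_seconds, counter_value)


def rate_per_tsid(samples: Iterable[Sample]) -> Dict[str, Tuple[float, FrozenSet[str]]]:
    """Phase 1, like `STATS rate(request), VALUES(host) BY _tsid`.

    Simplified rate: (last - first) / elapsed seconds; counter resets are ignored.
    """
    points = defaultdict(list)
    hosts = defaultdict(set)
    for tsid, host, ts, value in samples:
        points[tsid].append((ts, value))
        hosts[tsid].add(host)
    rates = {}
    for tsid, series in points.items():
        series.sort()
        (t0, v0), (t1, v1) = series[0], series[-1]
        if t1 > t0:  # a rate needs at least two distinct timestamps
            rates[tsid] = ((v1 - v0) / (t1 - t0), frozenset(hosts[tsid]))
    return rates


def avg_rate_by_host(samples: Iterable[Sample]) -> Dict[str, float]:
    """Phase 2, like `STATS sum(...), count(...) BY host` followed by `EVAL avg = sum / count`."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for rate, hosts in rate_per_tsid(samples).values():
        for host in hosts:  # in this toy model, a series counts toward every host it was seen with
            sums[host] += rate
            counts[host] += 1
    return {host: sums[host] / counts[host] for host in sums}


samples = [
    ("a", "h1", 0, 0), ("a", "h1", 10, 100),   # 10/s
    ("b", "h1", 0, 50), ("b", "h1", 10, 250),  # 20/s
    ("c", "h2", 0, 0), ("c", "h2", 20, 20),    # 1/s
]
print(avg_rate_by_host(samples))  # {'h1': 15.0, 'h2': 1.0}
```

Averaging per-series rates (rather than raw counter samples) is why the `_tsid` phase has to run first.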

Examples:

**METRICS k8s max(rate(request))** becomes:
```
METRICS k8s
| STATS rate(request) BY _tsid
| STATS max(`rate(request)`)
```

**METRICS k8s max(rate(request)) BY host** becomes:
```
METRICS k8s
| STATS rate(request), VALUES(host) BY _tsid
| STATS max(`rate(request)`) BY host=`VALUES(host)`
```

**METRICS k8s avg(rate(request)) BY host** becomes:
```
METRICS k8s
| STATS rate(request), VALUES(host) BY _tsid
| STATS sum=sum(`rate(request)`), count(`rate(request)`) BY host=`VALUES(host)`
| EVAL `avg(rate(request))` = `sum(rate(request))` / `count(rate(request))`
| KEEP `avg(rate(request))`, host
```

**METRICS k8s avg(rate(request)) BY host, time_bucket=bucket(@timestamp, 1minute)** becomes:
```
METRICS k8s
| EVAL `bucket(@timestamp, 1minute)`=datetrunc(@timestamp, 1minute)
| STATS rate(request), VALUES(host) BY _tsid, `bucket(@timestamp, 1minute)`
| STATS sum=sum(`rate(request)`), count(`rate(request)`) BY host=`VALUES(host)`, `bucket(@timestamp, 1minute)`
| EVAL `avg(rate(request))` = `sum(rate(request))` / `count(rate(request))`
| KEEP `avg(rate(request))`, host, `bucket(@timestamp, 1minute)`
```
2024-06-26 11:16:11 -07:00
Michael Peterson 8863497d87
Add min/max range of the event.ingested field to cluster state for searchable snapshots (#106252)
Add event.ingested min/max range to cluster state for searchable snapshots in order
to support shard skipping on the search coordinator node for event.ingested
like we do for the @timestamp field.

The key motivation is that the Elastic Security solution uses two timestamp fields:
@timestamp, which is populated by the user, and event.ingested, which is populated by
an ingest processor and records the time at which the event was ingested.

Some queries are filtered by @timestamp, while others are filtered
by event.ingested. When data is stored in the cold or frozen tier, a shard skipping
mechanism on the coordinating node allows it to skip shards based on the min and max
values of the @timestamp field stored in the cluster state. This prevents irrelevant
errors when frozen/cold shards are unavailable (they have no replicas) and a query is
executed that does not target them (because its datetime filter won't match them).

This works well for the @timestamp field and the Security solution could benefit from having
the same skipping mechanism applied to the event.ingested field.

Note that it is important that the value of the event.ingested range in IndexMetadata
in cluster state be set to UNKNOWN when the min cluster TransportVersion is less
than EVENT_INGESTED_RANGE_IN_CLUSTER_STATE.
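The skipping rule and the UNKNOWN fallback can be sketched as two small functions. This is a hypothetical Python model, not the coordinator's code: `FieldRange`, `can_skip` and `published_event_ingested_range` are made-up names, `None` stands in for UNKNOWN, and `required_version` is a placeholder for EVENT_INGESTED_RANGE_IN_CLUSTER_STATE.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class FieldRange:
    """Min/max of a date field across a shard's documents (e.g. epoch millis)."""
    min: int
    max: int


# In this sketch, None plays the role of the UNKNOWN range.
UNKNOWN: Optional[FieldRange] = None


def can_skip(shard_range: Optional[FieldRange], query_from: int, query_to: int) -> bool:
    """Skip only when the range is known and cannot overlap [query_from, query_to]."""
    if shard_range is UNKNOWN:
        return False  # we cannot prove the shard has no matches, so it must be searched
    return shard_range.max < query_from or shard_range.min > query_to


def published_event_ingested_range(actual: FieldRange, min_transport_version: int,
                                   required_version: int) -> Optional[FieldRange]:
    """Publish the range only when every node understands it; otherwise UNKNOWN."""
    return actual if min_transport_version >= required_version else UNKNOWN


r = FieldRange(100, 200)
print(can_skip(r, 300, 400))  # True: the shard's data ends before the query window
```

Treating UNKNOWN as "never skip" keeps mixed-version clusters correct, at the cost of searching shards that could otherwise have been skipped.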
2024-06-26 13:05:19 -04:00
Nik Everett f37b8dd9c3
ESQL: Merge more of EsqlDataTypes into DataType (#110105)
This moves a few more methods from `EsqlDataTypes` into `DataType` and
makes some of the followup changes recommended in #109921.
2024-06-26 12:54:08 -04:00
Oleksandr Kolomiiets 63aae0d737
Opt in number fields into fallback synthetic source when doc values are disabled (#110160)
Contributes to #109546.
2024-06-27 02:38:59 +10:00
elasticsearchmachine 8515c9a043 Mute org.elasticsearch.xpack.esql.tree.EsqlNodeSubclassTests org.elasticsearch.xpack.esql.tree.EsqlNodeSubclassTests #110203 2024-06-27 02:24:39 +10:00
Simon Cooper c752fbab35
Don't run the JVM crash test on Windows (#110194) 2024-06-26 17:05:16 +01:00
Armin Braun 5eae642259
Save list wrapping in InternalAggregations (#110196)
There is no point in wrapping in an unmodifiable list on every call; it just
adds allocations, and we escape the mutable iterator anyway. Also, we don't
need to call `asList` to iterate in the first place.
2024-06-27 02:04:11 +10:00
Ignacio Vera 9d19559dcc
Don't sample calls to ReduceContext#consumeBucketsAndMaybeBreak in InternalDateHistogram and InternalHistogram during reduction (#110186)
We were only calling the method every 10K buckets, so deeply nested aggregations might never call it,
even though the final number of buckets can be enormous.
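A toy model shows why sampling per reduce call can miss the limit entirely: every histogram in a nested tree may hold far fewer than 10K buckets, even though the tree as a whole holds many more. The sketch assumes each histogram reduction keeps its own local counter. `BucketBreaker` is a hypothetical stand-in for `ReduceContext#consumeBucketsAndMaybeBreak`, and the numbers are illustrative only.

```python
SAMPLE_EVERY = 10_000


class BucketBreaker:
    """Hypothetical stand-in for ReduceContext#consumeBucketsAndMaybeBreak."""

    def __init__(self, limit: int) -> None:
        self.limit = limit
        self.total = 0

    def consume(self, count: int) -> None:
        self.total += count
        if self.total > self.limit:
            raise RuntimeError(f"too many buckets: {self.total} > {self.limit}")


def reduce_histogram(num_buckets: int, breaker: BucketBreaker, sampled: bool) -> None:
    """Reduce one histogram instance and report its buckets to the breaker."""
    if sampled:
        pending = 0
        for _ in range(num_buckets):
            pending += 1
            if pending == SAMPLE_EVERY:  # old behavior: report only every 10K buckets
                breaker.consume(pending)
                pending = 0
        # any remainder below SAMPLE_EVERY is never reported
    else:
        breaker.consume(num_buckets)  # new behavior: every bucket is accounted for


def reduce_nested(depth: int, fanout: int, breaker: BucketBreaker, sampled: bool) -> None:
    """A histogram with `fanout` buckets, each holding a sub-histogram, `depth` levels deep."""
    reduce_histogram(fanout, breaker, sampled)
    if depth > 1:
        for _ in range(fanout):
            reduce_nested(depth - 1, fanout, breaker, sampled)


old_breaker = BucketBreaker(limit=100_000)
reduce_nested(3, 50, old_breaker, sampled=True)
print(old_breaker.total)  # 0, although 2,551 histograms x 50 = 127,550 buckets were reduced
```

With `sampled=False` the same tree trips the breaker as soon as the running total exceeds the limit.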
2024-06-26 17:33:58 +02:00
István Zoltán Szabó 31f0253b43
[DOCS] Adds link to ES-Cohere notebook and clarifies requirements. (#110195) 2024-06-26 17:22:40 +02:00
Oleksandr Kolomiiets b68e7d76c9
Remove obsolete sentence from TSDS docs (#110162) 2024-06-26 08:21:52 -07:00
Tim Grein dd3e73ef97
[Inference API] Add Google Vertex AI as provider for text_embedding task type (#110090) 2024-06-26 17:08:57 +02:00
Nik Everett 573e44ab2a
ESQL: Drop unused parts of esql-core package (#110174)
This drops a bunch of unused code in the esql-core package that was used
to power SQL and EQL. We copied it when we forked the shared ql code
into esql-core. But we don't need it.
2024-06-26 11:03:25 -04:00
Armin Braun c856314ea2
Make empty aggregation instances a little cheaper (#110190)
For sub-aggregations we were not properly sizing the array. Also we can
use singleton lists in a couple of spots to save a little more memory.
2024-06-27 00:51:29 +10:00
Kostas Krikellas 3afd53e26a
Remove `average` from downsampling statistics in documentation (#110189) 2024-06-26 17:23:06 +03:00
Nik Everett 2a6bd5cb91
ESQL: Lock testing to LEFT joins only (#110165)
We only support LEFT joins at the moment and the tests would randomly
fail.

Closes #110158 Closes #110152
2024-06-26 22:37:25 +10:00
Rene Groeschke e3bccbe2a1
[CI] Add gradle cache validation buildkite script (#110185)
* Add CI script for gradle cache validation
2024-06-26 14:30:33 +02:00
Benjamin Trent 9a5c598059
Fixing test failure #110032 due to partial shard results (#110062) 2024-06-26 07:55:54 -04:00
Nik Everett 84f5d04570
ESQL: Fix a TODO in ESQL's ENRICH (#110173)
There was a TODO around serialization in the enrich code. This solves
it.
2024-06-26 07:32:57 -04:00
David Kyle 0075646210
Revert "AwaitsFix: https://github.com/elastic/elasticsearch/issues/109904" (#109912)
This reverts commit 330e1defd7.
2024-06-26 11:47:41 +01:00
Pius 79623c7609
Update search-application-api.asciidoc (#110113)
Add a subsection about cross-cluster search support (or the lack thereof).
2024-06-26 12:20:28 +02:00
Simon Cooper 4ca6f30216
Various tweaks to reserved cluster state tests (#110148) 2024-06-26 10:52:48 +01:00
David Kyle 3c1c8d0f32
[ML] Increase response size limit for batched requests (#110112)
Increase the default to 50MB and do not retry when the limit is exceeded
2024-06-26 10:31:06 +01:00
Quentin Pradet 6d98e0d6b9
Fix trailing slash in two rollup specifications (#110176) 2024-06-26 12:29:19 +04:00
Quentin Pradet 5a2c7841b9
Fix trailing slash in security.put_privileges specification (#110177) 2024-06-26 12:28:47 +04:00
Nick Tindall caf4bc1fbe
Add leak tracking and unmute Netty4ChunkedContinuationsIT (#110175)
Related to: #109866
2024-06-26 15:17:14 +10:00
Nik Everett fa89cbec41
ESQL: Allow another cancel message in test (#110169)
This makes the ESQL tests accept a different cancellation
message in the cancellation tests. This one is just as descriptive as
the previously allowed ones, though I'm not entirely sure where it comes
from. That's fine: it tells us the request is cancelled.

Closes #109890
2024-06-26 12:16:59 +10:00
Oleksandr Kolomiiets 29faff135f
Fix text and keyword synthetic source tests (#110170)
Closes #110163. Closes #110161.
2024-06-26 12:02:51 +10:00
Jack Conradson 9685a5d32c
Add metric to count search responses (#110070)
This adds a metric to count the total number of search responses generated by the search action and 
the scroll search action. This uses the key es.search_response.response_count.total. Each response 
also has an attribute to specify whether the response was a success, a partial failure, or a failure.
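The counting scheme can be sketched as one counter keyed by metric name and outcome. Only the metric name comes from the text above; the attribute values, the `classify` rule (an error or all shards failing counts as a failure, some failed shards as a partial failure) and the class names are assumptions for illustration, not Elasticsearch's telemetry API.

```python
from collections import Counter
from enum import Enum
from typing import Optional

METRIC_NAME = "es.search_response.response_count.total"


class ResponseOutcome(Enum):
    # Hypothetical attribute values; the real attribute name and values may differ.
    SUCCESS = "success"
    PARTIAL_FAILURE = "partial_failure"
    FAILURE = "failure"


def classify(total_shards: int, failed_shards: int,
             error: Optional[Exception] = None) -> ResponseOutcome:
    """Assumed rule: an error or all shards failing is a failure; some failures is partial."""
    if error is not None or (total_shards > 0 and failed_shards == total_shards):
        return ResponseOutcome.FAILURE
    if failed_shards > 0:
        return ResponseOutcome.PARTIAL_FAILURE
    return ResponseOutcome.SUCCESS


class SearchResponseCounter:
    """One counter, incremented once per search or scroll response, split by outcome."""

    def __init__(self) -> None:
        self.counts: Counter = Counter()

    def record(self, outcome: ResponseOutcome) -> None:
        self.counts[(METRIC_NAME, outcome.value)] += 1


counter = SearchResponseCounter()
counter.record(classify(5, 0))
counter.record(classify(5, 2))
counter.record(classify(5, 0, error=TimeoutError()))
```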
2024-06-25 15:20:07 -07:00
Nik Everett e1482e2f3f
ESQL: Escape `&` in RLIKE tests (#110167)
`&` is the sigil for intersection in Lucene's regexes.
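Escaping the character with a backslash makes Lucene's regex treat it as a literal. A minimal sketch, assuming only `&` (and the backslash itself) need escaping; Lucene's regex syntax reserves other characters too, `escape_for_lucene_regex` is a hypothetical helper, and embedding the result in an ES|QL string literal may require escaping the backslash again.

```python
def escape_for_lucene_regex(literal: str, reserved: str = "&") -> str:
    """Backslash-escape characters that Lucene's regex would treat as operators.

    Only `&` (intersection) is escaped by default; the backslash is always
    escaped because it is the escape character itself.
    """
    out = []
    for ch in literal:
        if ch == "\\" or ch in reserved:
            out.append("\\")
        out.append(ch)
    return "".join(out)


print(escape_for_lucene_regex("R&D"))  # R\&D
```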

Closes #110095
2024-06-26 08:08:14 +10:00
Mark Tozzi c0bfadf1f5
[ESQL] Replace explicit type list with filtered stream (#110155)
Follow up to #109227. As discussed there, this makes it clear what is being excluded from the type list. It also makes adding new types easier, as implementers do not need to remember to add the type to the list.

I do not know why these two data types (DOC_DATA_TYPE and TSID_DATA_TYPE) are excluded from this list. It was like that when I found it.
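The "filter instead of enumerate" idea looks like this in a Python analogue. The enum below is a hypothetical subset (the real `DataType` is Java code in ES|QL); only the two excluded names come from the text above.

```python
from enum import Enum, auto
from typing import List


class DataType(Enum):
    # Hypothetical subset; the real DataType enum lives in ES|QL's Java code.
    KEYWORD = auto()
    LONG = auto()
    DOUBLE = auto()
    DOC_DATA_TYPE = auto()
    TSID_DATA_TYPE = auto()


# State what is excluded instead of listing everything that is included.
EXCLUDED = frozenset({DataType.DOC_DATA_TYPE, DataType.TSID_DATA_TYPE})


def listed_types() -> List[DataType]:
    """Every DataType except the excluded ones, in declaration order."""
    return [t for t in DataType if t not in EXCLUDED]


print([t.name for t in listed_types()])  # ['KEYWORD', 'LONG', 'DOUBLE']
```

A newly declared member is picked up automatically, which is the maintenance benefit the commit describes.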
2024-06-25 16:44:58 -04:00
elasticsearchmachine 79a78e1f42 Mute org.elasticsearch.index.mapper.KeywordFieldMapperTests testBlockLoaderFromRowStrideReader #110163 2024-06-26 06:41:02 +10:00
elasticsearchmachine ca32e29c6f Mute org.elasticsearch.index.mapper.TextFieldMapperTests testSyntheticSourceMany #110161 2024-06-26 06:38:32 +10:00
Kathleen DeRusso 1f46a94dec
Add documentation for individual query rules (#110006)
* Add individual query rule API docs
* Update docs/reference/query-rules/apis/get-query-rule.asciidoc
* Update docs/reference/query-rules/apis/delete-query-rule.asciidoc
* Update docs/reference/query-rules/apis/get-query-rule.asciidoc
* PR feedback

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
2024-06-25 14:35:08 -04:00
Benjamin Trent 759f61d90f
Fix vector similarity test failures for half-byte (#110065)
The test that is set up assumes a single shard. Since the test uses so
few vectors and few dimensions, the statistics are pretty sensitive. 

CCS tests seem to allow more than one write shard (via more than one
cluster). Consequently, the similarity detected can vary pretty wildly.
However, through empirical testing, I found that the desired vector
seems to always have a score > 0.0034 and all the other vectors have a
score < 0.001. This commit adjusts this similarity threshold
accordingly. This should make the test flakiness go away in CCS testing.

closes: https://github.com/elastic/elasticsearch/issues/109881
2024-06-26 04:16:55 +10:00
Jim Ferenczi cdba862c76
Fix detection of invalid mixed sort fields (#109914)
Sort fields are usually rewritten into their merge form during transport serialization. For searches where all shards are local to the coordinating node, this serialization is not applied, and sort fields remain in their original form. This is typically not an issue except when checking sort compatibility, as we rely on the primary type of the sort field.

This change ensures we extract the correct type even if the sort field is in its primary form (with a CUSTOM type) by using the original comparator source. As far as I know, the only field type that benefits from this change is the constant_keyword field type. When a sort field is a mix of numeric and constant_keyword, this change ensures the discrepancy is correctly reported to the user.
2024-06-25 19:01:46 +01:00
Quentin Pradet af8c35986f
Fix trailing slash in ml.get_categories specification (#110146) 2024-06-25 22:00:31 +04:00
Benjamin Trent 9e57ac421a
Fix automatic tracking of collapse with docvalue_fields (#110103)
Some optimizations broke the automatic addition of collapse fields to
`docvalue_fields` during the fetch phase.

Consequently, users got confusing errors such as
`unsupported_operation_exception`. This commit restores the intended
behavior of automatically including the collapse field in the
docvalue_fields context during fetch if it isn't already included.
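The restored behavior amounts to "add the collapse field unless it is already requested". A minimal sketch: plain strings stand in for `docvalue_fields` entries (real entries can also carry a format), and `docvalue_fields_for_fetch` is a hypothetical name.

```python
from typing import List, Optional


def docvalue_fields_for_fetch(requested: List[str], collapse_field: Optional[str]) -> List[str]:
    """Return the docvalue_fields to load, adding the collapse field if it is missing."""
    fields = list(requested)  # don't mutate the caller's request
    if collapse_field is not None and collapse_field not in fields:
        fields.append(collapse_field)
    return fields


print(docvalue_fields_for_fetch(["timestamp"], "user.id"))  # ['timestamp', 'user.id']
```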

closes: https://github.com/elastic/elasticsearch/issues/96510
2024-06-26 03:02:50 +10:00
Oleksandr Kolomiiets 653b99a76b
Opt in keyword field into fallback synthetic source when doc values are disabled (#110016) 2024-06-25 09:34:58 -07:00
Mark Tozzi ae39525e9f
[ESQL] Refactor Element Type selection to use a switch (#110111)
Similar to https://github.com/elastic/elasticsearch/pull/110036, this
creates the opportunity for the compiler to flag this as a place that
needs to be changed when we add a data type. It's also fewer lines of
code, and it makes explicit which types are not supported.

Ideally, this would also become a property on DataType, but that can't
happen until we pull it out of core.
2024-06-26 01:48:07 +10:00