We ran into a situation where dynamic mapping updates were retried in a
fairly hot loop. The problem that triggered this was that this logic waited
for any cluster state update. That is mostly fine but adds a lot of overhead
to retries when other actions are running at a higher priority than the
mapping update. Let's make the wait specific, so that we at least wait for
there to be any mapping and for its version to be different from the version
that made us request a mapping update in the first place.
Also added a breakout in case the index got concurrently deleted, so we
don't run out the clock in that case.
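A minimal sketch of the new wait condition, assuming `indexName` and `versionAtRequestTime` are in scope (the helper class and method names are illustrative, not the actual change):
```
import java.util.function.Predicate;

import org.elasticsearch.cluster.ClusterState;
import org.elasticsearch.cluster.metadata.IndexMetadata;
import org.elasticsearch.cluster.metadata.MappingMetadata;

final class MappingUpdateWait {
    // Satisfied only when the index's mapping version has advanced past the version
    // that triggered the update, or when the index is gone (concurrent delete).
    static Predicate<ClusterState> mappingAdvancedOrIndexGone(String indexName, long versionAtRequestTime) {
        return state -> {
            IndexMetadata indexMetadata = state.metadata().index(indexName);
            if (indexMetadata == null) {
                return true; // index was concurrently deleted: break out instead of running out the clock
            }
            MappingMetadata mapping = indexMetadata.mapping();
            return mapping != null && indexMetadata.getMappingVersion() > versionAtRequestTime;
        };
    }
}
```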
Fixes https://github.com/elastic/elasticsearch/issues/110184
When a DISSECT command overwrites the input, e.g.
```
FROM idx | DISSECT foo "%{foo} %{bar}" | KEEP foo, bar
```
The input field (`foo` in this case) could be excluded from the index
resolution (incorrectly masked by the `foo` that is the result of the
DISSECT). This PR makes sure that the input field does not get lost and
is correctly passed to the indexResolver.
This commit adds `bit` vector support by adding `element_type: bit` for
vectors. This new element type works for indexed and non-indexed
vectors. Additionally, it works with `hnsw` and `flat` index types. No
quantization-based codec works with this element type; this is
consistent with `byte` vectors.
`bit` vectors accept up to `32768` dimensions in size and expect vectors
that are being indexed to be encoded either as a hexadecimal string or a
`byte[]` array where each element of the `byte` array represents `8`
bits of the vector.
`bit` vectors support script usage and regular query usage. When
indexed, all comparisons are `xor` and `popcount` summations (i.e.,
hamming distance), and the scores are transformed and normalized given
the vector dimensions. Note, indexed bit vectors require `l2_norm` as
the similarity.
For scripts, `l1norm` is the same as `hamming` distance and `l2norm` is
`sqrt(l1norm)`. `dotProduct` and `cosineSimilarity` are not supported.
Note, the dimensions expected by this element_type must always be
divisible by `8`, and the `byte[]` vectors provided for indexing must
have size `dim/8`, where each byte element represents `8` bits of the
vector.
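As a plain-Java illustration of the packing and distance rules above (not the mapper code): a `16`-dimensional bit vector is stored as `16 / 8 = 2` bytes, and the indexed comparison is an `xor` plus `popcount` over those packed bytes.
```
// Illustrative only, not Elasticsearch code: hamming distance over packed bit vectors.
final class BitVectorExample {
    static int hamming(byte[] a, byte[] b) {
        int distance = 0;
        for (int i = 0; i < a.length; i++) {
            // xor the packed bytes, then count the differing bits
            distance += Integer.bitCount((a[i] ^ b[i]) & 0xFF);
        }
        return distance;
    }

    public static void main(String[] args) {
        // two 16-dimensional vectors packed into 16 / 8 = 2 bytes each
        byte[] v1 = new byte[] { (byte) 0b10110001, (byte) 0b01000000 };
        byte[] v2 = new byte[] { (byte) 0b10010001, (byte) 0b01000001 };
        System.out.println(hamming(v1, v2)); // 2, i.e. l1norm; l2norm would be sqrt(2)
    }
}
```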
closes: https://github.com/elastic/elasticsearch/issues/48322
Rate aggregation is special because it must be computed per time series,
regardless of the grouping keys. The keys must be `_tsid` or a pair of
`_tsid` and `time_bucket`. To support user-defined grouping keys, we
first execute the rate aggregation using the time-series keys, then
perform another aggregation on the resulting rates using the
user-specified keys.
This PR translates the aggregates in the METRICS commands to standard
aggregates. This approach helps avoid introducing new plans and
operators for metrics aggregations only.
Examples:
**METRICS k8s max(rate(request))** becomes:
```
METRICS k8s
| STATS rate(request) BY _tsid
| STATS max(`rate(request)`)
```
**METRICS k8s max(rate(request)) BY host** becomes:
```
METRICS k8s
| STATS rate(request), VALUES(host) BY _tsid
| STATS max(`rate(request)`) BY host=`VALUES(host)`
```
**METRICS k8s avg(rate(request)) BY host** becomes:
```
METRICS k8s
| STATS rate(request), VALUES(host) BY _tsid
| STATS sum=sum(`rate(request)`), count(`rate(request)`) BY host=`VALUES(host)`
| EVAL `avg(rate(request))` = `sum(rate(request))` / `count(rate(request))`
| KEEP `avg(rate(request))`, host
```
**METRICS k8s avg(rate(request)) BY host, time_bucket=bucket(@timestamp, 1minute)** becomes:
```
METRICS k8s
| EVAL `bucket(@timestamp, 1minute)`=datetrunc(@timestamp, 1minute)
| STATS rate(request), VALUES(host) BY _tsid,`bucket(@timestamp, 1minute)`
| STATS sum=sum(`rate(request)`), count(`rate(request)`) BY host=`VALUES(host)`, `bucket(@timestamp, 1minute)`
| EVAL `avg(rate(request))` = `sum(rate(request))` / `count(rate(request))`
| KEEP `avg(rate(request))`, host, `bucket(@timestamp, 1minute)`
```
Add event.ingested min/max range to cluster state for searchable snapshots in order
to support shard skipping on the search coordinator node for event.ingested,
like we do for the @timestamp field.
The key motivation is that the Elastic Security solution uses two timestamp fields:
@timestamp, which is user populated, and event.ingested, which is populated by
an ingest processor and records the time at which the event was ingested.
In some cases, queries are filtered by @timestamp, while in others they are filtered
by event.ingested. When data is stored in the cold or frozen tier, we have a shard-skipping
mechanism on the coordinating node, which allows the coordinating node to skip shards
based on the min and max values stored in the cluster state for the @timestamp field. This is
done to prevent returning irrelevant errors when shards in frozen/cold are unavailable
(as they have no replicas) while the queries being executed don't target them (as the
datetime filter won't match them).
This works well for the @timestamp field and the Security solution could benefit from having
the same skipping mechanism applied to the event.ingested field.
Note that it is important that the value of the event.ingested range in IndexMetadata
in cluster state be set to UNKNOWN when the min cluster TransportVersion is less
than EVENT_INGESTED_RANGE_IN_CLUSTER_STATE.
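A minimal sketch of the coordinator-side decision this range enables (hypothetical helper, not the actual can-match code), assuming the min/max for event.ingested are read from IndexMetadata and an UNKNOWN range means the shard cannot be skipped:
```
// Illustrative only: skip a cold/frozen shard when the query's event.ingested window
// cannot overlap the shard's recorded [min, max] range; an UNKNOWN range (e.g. while the
// cluster still contains older nodes) means we must search the shard.
final class EventIngestedSkipExample {
    static final long UNKNOWN = -1;

    static boolean canSkip(long shardMinMillis, long shardMaxMillis, long queryFromMillis, long queryToMillis) {
        if (shardMinMillis == UNKNOWN || shardMaxMillis == UNKNOWN) {
            return false; // no range info in cluster state: must search the shard
        }
        // skip only when the two windows do not overlap
        return queryToMillis < shardMinMillis || queryFromMillis > shardMaxMillis;
    }
}
```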
No point in wrapping in unmodifiable here on every call; it just adds
allocations and we escape the mutable iterator anyway. Also we don't
need to call `asList` to iterate to begin with.
We only call the method every 10K buckets, so for deeply nested aggregations we might never call it
even though the final number of buckets can be enormous.
This drops a bunch of unused code in the esql-core package that was used
to power SQL and EQL. We copied it when we forked the shared ql code
into esql-core. But we don't need it.
This causes the ESQL tests to be ok with a different cancellation
message in the cancellation tests. This one is just as descriptive as
the previously allowed ones, though I'm not entirely sure where it comes
from. But that's ok - it tells us the request is cancelled.
Closes #109890
This adds a metric to count the total number of search responses generated by the search action and
the scroll search action. This uses the key es.search_response.response_count.total. Each response
also has an attribute to specify whether the response was a success, a partial failure, or a failure.
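As a rough illustration of the bookkeeping described above (the class below is a hypothetical stand-in, not the PR's telemetry code; only the metric key and the three outcome attributes come from the description):
```
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

// Hypothetical stand-in, not the PR's telemetry plumbing: count search responses per
// outcome under the key es.search_response.response_count.total.
final class SearchResponseMetricsSketch {
    static final String METRIC_NAME = "es.search_response.response_count.total";

    enum ResponseStatus { SUCCESS, PARTIAL_FAILURE, FAILURE }

    private final Map<ResponseStatus, LongAdder> counts = new ConcurrentHashMap<>();

    void recordResponse(int totalShards, int failedShards) {
        ResponseStatus status = failedShards == 0 ? ResponseStatus.SUCCESS
            : failedShards < totalShards ? ResponseStatus.PARTIAL_FAILURE
            : ResponseStatus.FAILURE;
        counts.computeIfAbsent(status, s -> new LongAdder()).increment();
    }
}
```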
Follow up to #109227. As discussed there, this makes it clear what is being excluded from the type list. It also makes adding new types easier, as implementers do not need to remember to add the type to the list.
I do not know why these two data types (DOC_DATA_TYPE and TSID_DATA_TYPE) are excluded from this list. It was like that when I found it.
The test that is set up assumes a single shard. Since the test uses so
few vectors and few dimensions, the statistics are pretty sensitive.
CCS tests seem to allow more than one write shard (via more than one
cluster). Consequently, the similarity detected can vary pretty wildly.
However, through empirical testing, I found that the desired vector
seems to always have a score > 0.0034 and all the other vectors have a
score < 0.001. This commit adjusts this similarity threshold
accordingly. This should make the test flakiness go away in CCS testing.
closes: https://github.com/elastic/elasticsearch/issues/109881
Sort fields are usually rewritten into their merge form during transport serialization. For searches where all shards are local to the coordinating node, this serialization is not applied, and sort fields remain in their original form. This is typically not an issue except when checking sort compatibility, as we rely on the primary type of the sort field.
This change ensures we extract the correct type even if the sort field is in its primary form (with a CUSTOM type) using the original comparator source. The only field type that benefits from this change afaik is the constant_keyword field type. When a sort field is a mix of numeric and constant_keyword, this change ensures the discrepancy is correctly reported to the user.
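A rough sketch of the idea (the `HasReducedType` interface below is a made-up stand-in for how the comparator source exposes its effective type; names are illustrative, not the PR's code):
```
import org.apache.lucene.search.SortField;

final class SortTypeSketch {
    // When the sort field was never rewritten for transport (all shards local), its type can
    // still be CUSTOM; fall back to the original comparator source to recover the type used
    // for the sort-compatibility check.
    static SortField.Type effectiveType(SortField sortField) {
        if (sortField.getType() == SortField.Type.CUSTOM
            && sortField.getComparatorSource() instanceof HasReducedType source) {
            return source.reducedType();
        }
        return sortField.getType();
    }

    // Made-up interface standing in for whatever the real comparator sources expose.
    interface HasReducedType {
        SortField.Type reducedType();
    }
}
```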
There were some optimizations that broke collapse fields automatically
being added to `docvalue_fields` during the fetch phase.
Consequently, users would get really weird errors like
`unsupported_operation_exception`. This commit restores the intended
behavior of automatically including the collapse field in the
docvalue_fields context during fetch if it isn't already included.
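A minimal sketch of that behavior (hypothetical helper, not the fetch-phase code):
```
import java.util.ArrayList;
import java.util.List;

// Illustrative only: ensure the collapse field is fetched as a doc value even when the
// user did not list it under docvalue_fields explicitly.
final class CollapseDocValuesSketch {
    static List<String> withCollapseField(List<String> docValueFields, String collapseField) {
        if (docValueFields.contains(collapseField)) {
            return docValueFields;
        }
        List<String> result = new ArrayList<>(docValueFields);
        result.add(collapseField);
        return result;
    }
}
```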
closes: https://github.com/elastic/elasticsearch/issues/96510
Similar to https://github.com/elastic/elasticsearch/pull/110036, this
creates the opportunity for the compiler to flag this as a place which
needs to be changed when we add a data type. It's also fewer lines of
code, and makes it explicit which types are not supported.
Ideally, this would also become a property on DataType, but that can't
happen until we pull it out of core.
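For context, the pattern relied on here is an exhaustive switch with no default branch, so the compiler flags every switch that needs updating when a new constant is added; a sketch with a made-up enum:
```
// Illustrative only: a switch expression over an enum with no default branch must be
// exhaustive, so adding a new constant (say, a new data type) turns every un-updated
// switch into a compile error instead of silently falling through a default.
final class ExhaustiveSwitchSketch {
    enum DataType { KEYWORD, LONG, DOUBLE } // made-up stand-in for the real DataType

    static boolean isNumeric(DataType type) {
        return switch (type) {
            case LONG, DOUBLE -> true;
            case KEYWORD -> false;
            // no default: adding a new DataType constant is a compile-time error here
        };
    }
}
```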