elasticsearch

Commit Graph

Author	SHA1	Message	Date
Pooya Salehi	8df8d50392	Adjust log assertion for default project (#130549 ) We changed repo names to be prefixed with project name. Using just `default` here since this is not a multi-project test. Closes https://github.com/elastic/elasticsearch/issues/130536	2025-07-04 19:30:37 +10:00
Ignacio Vera	0c4bf226e6	Don't accept clustersPerNeighborhood lower than 2 (#130526 )	2025-07-04 08:56:30 +01:00
Ignacio Vera	bc0296456c	Trim to size lists created in source fetchers (#130521 ) This commit trims those lists to size to prevent wasteful heap usage.	2025-07-04 07:21:24 +01:00
Benjamin Trent	cdfd3ddfce	Remove most usages of XFeatureField as Lucene upgrade provides needed APIs (#130572 ) XFeatureField was added in conjunction with a required update for SparseVectorFields. However, with the upgrade to Lucene 10.1+, it has outstayed its usefulness.	2025-07-04 08:00:36 +10:00
Ryan Ernst	be63963594	Add exception type response header for 5xx errors (#130226 ) There are only a handful of relevant 5xx http error codes. Yet distinguishing errors can be important for debugging large numbers of errors. This commit adds a new `X-Elasticsearch-Exception` response header to 5xx error responses which contains the class name of the Java exception.	2025-07-04 07:24:20 +10:00
Tim Brooks	547d4a4287	Do not dispatch in IngestService (#130253 ) The ingest service dispatches to the provided executor. This work became redundant several years ago when we moved the transport bulk action to be dispatched from the transport threads for the action itself. With this behavior, there is no reason to re-dispatch an ingest action.	2025-07-03 11:19:02 -06:00
Benjamin Trent	e5da80f472	Adjust IVF fixup phase to sometimes bypass some of the neighborhood calculations (#130490 ) * Improve ivf index time during fixup phase * iter * addressing PR comments	2025-07-03 11:11:10 -04:00
Albert Zaharovits	83084118ae	Fix ThreadPoolMergeExecutorServiceTests testIORateIsAdjustedForAllRunningMergeTasks (#130545 ) The test submits merge tasks that support IO throttling, and asserts that all the currently running merge tasks are indeed IO throttled after the new one was submitted. The test erroneously tried to assert a property on the set of currently running merge tasks, which is very difficult to do since all merge tasks are possibly backlogged and re-enqueued asynchronously multiple times before they are run or aborted (so looking at the threadpool merge task queue there's no telling which merge task will execute first). Fixes https://github.com/elastic/elasticsearch/issues/129531	2025-07-04 00:44:00 +10:00
Jim Ferenczi	f91124a59e	Fix default index options when dimensions are unset for legacy indices (#130540 ) In #129825, we modified the dense_vector field type to delay setting index options until the field's dimensions are known. However, this introduced a discrepancy for indices created before that change, which would previously default to int8_hnsw even when dimensions were not set. This discrepancy leads to an assertion failure in mixed-version clusters, where the serialized mappings differ between nodes: ``` [2025-07-02T20:37:29,852][ERROR][o.e.b.ElasticsearchUncaughtExceptionHandler] [v9.0.4-2] fatal error in thread [elasticsearch[v9.0.4-2][clusterApplierService#updateTask][T#1]], exiting java.lang.AssertionError: provided source [{"_doc":{"properties":{"vector":{"type":"dense_vector","index":true,"similarity":"cosine"}}}}] differs from mapping [{"_doc":{"properties":{"vector":{"type":"dense_vector","index":true,"similarity":"cosine","index_options":{"type":"int8_hnsw","m":16,"ef_construction":100}}}}}] ``` This commit resolves the issue by ensuring that indices created before the change continue to default to int8_hnsw index options, even if dimensions remain unset.	2025-07-03 14:13:34 +01:00
Jeremy Dahlgren	e07f9fe075	Replace '_remote_access' with REMOTE_CLUSTER_PROFILE in comment (#130489 ) There isn't a '_remote_access' profile, it looks like the author meant to write '_remote_cluster'. This commit changes it to REMOTE_CLUSTER_PROFILE to match the code below and avoid confusing the reader with a profile name that doesn't exist.	2025-07-03 08:03:39 -04:00
Tim Vernum	3bda6af023	Handle unavailable MD5 in ES|QL (#130158 ) In Java 14 the `MessageDigest` specification was changed so that the "MD5" hash function is no longer required. It is permissible for a JRE to ship without support for MD5 hashes. This commit modifies the ES|QL MD5 hash function implementation so that if the specified `MessageDigest` object is not available on startup, then the error is non-fatal, and the node will still boot successfully. If an ES|QL query attempts to make use of md5, and it is unavailable, then the query will fail with an ES|QL verification exception. Resolves: #129689	2025-07-03 18:51:57 +10:00
Yang Wang	f15ef7c2ed	Snapshots support multi-project (#130000 ) This PR makes snapshot service code and APIs multi-project compatible. Resolves: ES-10225 Resolves: ES-10226	2025-07-03 18:48:44 +10:00
Yang Wang	74fd66c1f1	Drain responses on completion for TransportNodesAction (#130303 ) This PR ensures the node responses are copied and drained exclusively in onCompletion so that they do not get concurrently modified by cancellation. Resolves: #128852	2025-07-03 10:25:35 +10:00
Yang Wang	c17bfcbfc2	Migrate the reserved repository action to be per-project (#130155 ) Resolves: ES-10479	2025-07-03 10:25:27 +10:00
Oleksandr Kolomiiets	f3c5eb7815	Delete unowned documents during split (#130240 )	2025-07-02 16:20:10 -07:00
Ryan Ernst	9e6464eeae	Test for duplicate transport versions (#130494 ) We used to have an assertion during transport version loading that duplicate ids were not found, but it appears to have been lost in refactorings. This commit adds a test to ensure duplicate ids do not occur. relates #130486	2025-07-03 08:44:56 +10:00
Evgenii-Kazannik	5d0c5e02bd	Add Ibm Granite Completion and Chat Completion support (#129146 ) * Add Ibm Granite Completion and Chat Completion support * Apply suggestions * remove ibm watsonx transport version constant * update transport version	2025-07-02 16:57:16 -04:00
Mark Tozzi	82b6e45a81	fix transport version conflict (#130486 ) #129649 introduced a duplicate transport version. This fixes it.	2025-07-02 15:05:28 -04:00
Pooya Salehi	54c9db9a41	Record project deletions in ProjectStateRegistry (#130225 ) Project deletions are used to update the Stateless lease blob, as a project deletion is a notable event that our stateless cluster consistency check should consider before acknowledging writes. Relates ES-11207	2025-07-03 04:08:56 +10:00
Patrick Doyle	89f701f4c4	Bootstrap entitlements for testing (#129268 ) * Fix ExceptionSerializationTests to use getCodeSource instead of getResource. Using getResource makes this sensitive to unrelated classpath entries, such as the entitlement bridge library, that get prepended to the classpath. * Fix logging tests to use org.elasticsearch.index instead of root logger. Using the root logger makes this sensitive to unrelated logging, such as from the entitlement library. * Fix entitlement error message by stashing the module name in ModuleEntitlements. Taking the actual module name from the class doesn't work in tests, where those classes are loaded from the classpath and so their module info is misleading. * Ignore server locations whose representative class isn't loaded * Partial initial implementation * System properties: testOnlyClasspath and enableForTests * Trivially allow some packages * DEBUG: use TreeMap in TestScopeResolver for readability * Special case bouncycastle for security plugin * Add CONFIG to TestPathLookup * Add the classpath to the source path list for every plugin * Add @WithoutEntitlements to tests that run ES nodes * Set es.entitlement.enableForTests for all libs * Use @WithoutEntitlements on ingest plugin tests * Substitute ALL-UNNAMED for module name in non-modular plugins * Add missing entitlements found by unit tests * Comment in TestScopeResolver * Properly compute bridge jar location for patch-module * Call out nonServerLibs * Don't build two TestPathLookups * More comments for meta-tests * Remove redundant dependencies for bridgeJarConfig. These are already set in ElasticsearchJavaBasePlugin. * Add bridge+agent dependencies only if those exist. For serverless, those project dependencies don't exist, and we'll need to add the dependencies differently, using Maven coordinates. * [CI] Auto commit changes from spotless * Pass testOnlyPath in environment instead of command line. It's typically a very very long string, which made Windows angry. * [CI] Auto commit changes from spotless * Split testOnlyPathString at File.pathSeparator * Use doFirst to delay setting testOnlyPath env var * Trivially allow jimfs (??) * Don't enforce entitlements on internalClusterTest for now * Replace forbidden APIs * Match testOnlyClasspath using URI instead of String. We already get the "needle" in the form of a URI, so this skips a step, and has the benefit of also working on Windows. * [CI] Auto commit changes from spotless * More forbidden APIs * Disable configuration cache for LegacyYamlRestTestPluginFuncTest * Strip carriage-return characters in expected output for ReleaseNotesGeneratorTest. The template generator also strips these, so we need to do so to make this pass on Windows. Note that we use replace("\r", "") where the template generator uses replace("\\r", ""). The latter didn't work for me when I tried it on Windows, for reasons I'm not aware of. * Move configureEntitlements to ElasticsearchTestBasePlugin as-is * Use matching instead of if * Remove requireNonNull * Remove default configuration * Set inputs instead of dependencies * Use test.systemProperty * Respond to PR comments * Disable entitlement enforcement for ScopedSettingsTests. This test works by altering the logging on the root logger. With entitlements enabled, that will cause additional log statements to appear, which interferes with the test. * Address PR comments * Moritz's configureJavaBaseModuleOptions * Allow for entitlements not yet enforced in serverless * fix entitlementBridge config after rename * drop empty file collections * Remove workaround in LegacyYamlRestTestPluginFuncTest --------- Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co> Co-authored-by: Lorenzo Dematté <lorenzo.dematte@elastic.co> Co-authored-by: Moritz Mack <mmack@apache.org>	2025-07-02 13:44:38 -04:00
Mark Tozzi	a7a79f7612	ESQL - transport version change to support TSDB metadata (#129649 ) Relates to #128621 This is a first step in making the ES|QL query planner aware of TSDB Dimensions and Metric field metadata. This is purposefully small to only touch the serialization change we'll need for this. The plan is to get the TSDB metadata field type out of Field Caps and to store this information on EsField. This PR adds a place to store such a field, and adds it to the serialization for EsField and its sub-classes. As of this PR, we don't do anything with this data. That's intentional, to minimize the footprint of the transport version change. Further PRs in this project will load and act on this data. I've added some constructors here to minimize the number of files I'm touching in this PR. I hope that as we begin loading this data (as opposed to just defaulting it right now) we can get rid of some of these default value constructors.	2025-07-02 12:58:51 -04:00
Albert Zaharovits	72b5c0175a	assertTrue for submitMergeTask (#130430 ) Previously, out of zealousness for testing efficiency, the mocked filesystems were reused across the test suite class. But this makes tests liable to interference with respect to filesystem stats calls. Moreover, if one test fails, it can trigger failures in other test methods. This PR recreates the mocked filesystems for every test method. Fixes #129296 #130205	2025-07-03 02:17:01 +10:00
Parker Timmins	b38d5454ad	Add pattern_text feature flag to logsdb yaml tests (#130399 ) Patterned text yaml tests in logsdb plugin are failing as they assume the presence of the pattern_text type. Enable the feature flag for the tests.	2025-07-02 10:55:22 -05:00
Benjamin Trent	a9625cec7a	Reduce final clustering pass sample size (#130451 ) Figuring out the right balance on index throughput and speed is tricky. Initially I was digging into reducing the "neighborhood" size for the fix up. This actually harmed recall a bit too much in my tests, while it did speed things up. While I do think there is ground to be covered there, I pivoted to reducing the sample size since now we actually have true random sampling (instead of the first N docs). In the extreme case, this improves force-merge time by 25% with zero change in recall. On the lower end, it only improves about 8%. I really do think there is ground to be recovered in the "fix up phase", but this is a nice improvement :). ``` index_name index_type num_docs index_time(ms) force_merge_time(ms) num_segments ----------------------------------- ---------- -------- -------------- -------------------- ------------ corpus-quora-E5-small.fvec.flat ivf 500000 17443 18422 0 cohere-wikipedia-docs-768d.vec ivf 2000000 156320 193383 0 corpus-dbpedia-entity-arctic-0.fvec ivf 1000000 92902 82131 0 index_name index_type n_probe latency(ms) net_cpu_time(ms) avg_cpu_count QPS recall visited ----------------------------------- ---------- ------- ----------- ---------------- ------------- ------- ------ -------- corpus-quora-E5-small.fvec.flat ivf 10 0.95 0.00 0.00 1052.63 0.83 5713.06 corpus-quora-E5-small.fvec.flat ivf 20 0.69 0.00 0.00 1449.28 0.89 10620.80 corpus-quora-E5-small.fvec.flat ivf 30 0.81 0.00 0.00 1234.57 0.92 15498.94 corpus-quora-E5-small.fvec.flat ivf 40 0.94 0.00 0.00 1063.83 0.93 20088.68 corpus-quora-E5-small.fvec.flat ivf 50 1.11 0.00 0.00 900.90 0.94 24801.41 cohere-wikipedia-docs-768d.vec ivf 10 1.20 0.00 0.00 833.33 0.66 2824.19 cohere-wikipedia-docs-768d.vec ivf 20 1.33 0.00 0.00 751.88 0.74 4875.23 cohere-wikipedia-docs-768d.vec ivf 30 1.44 0.00 0.00 694.44 0.79 6974.69 cohere-wikipedia-docs-768d.vec ivf 40 1.56 0.00 0.00 641.03 0.81 9147.20 
cohere-wikipedia-docs-768d.vec ivf 50 1.66 0.00 0.00 602.41 0.83 11478.62 cohere-wikipedia-docs-768d.vec ivf 60 1.80 0.00 0.00 555.56 0.85 13863.93 cohere-wikipedia-docs-768d.vec ivf 70 1.96 0.00 0.00 510.20 0.87 16301.12 cohere-wikipedia-docs-768d.vec ivf 80 2.05 0.00 0.00 487.80 0.88 18761.24 cohere-wikipedia-docs-768d.vec ivf 90 2.18 0.00 0.00 458.72 0.89 21185.38 cohere-wikipedia-docs-768d.vec ivf 100 2.27 0.00 0.00 440.53 0.90 23648.77 corpus-dbpedia-entity-arctic-0.fvec ivf 10 0.79 0.00 0.00 1265.82 0.52 3654.77 corpus-dbpedia-entity-arctic-0.fvec ivf 20 0.97 0.00 0.00 1030.93 0.61 7170.57 corpus-dbpedia-entity-arctic-0.fvec ivf 30 1.13 0.00 0.00 884.96 0.67 10761.73 corpus-dbpedia-entity-arctic-0.fvec ivf 40 1.27 0.00 0.00 787.40 0.70 14550.00 corpus-dbpedia-entity-arctic-0.fvec ivf 50 1.42 0.00 0.00 704.23 0.72 18149.22 corpus-dbpedia-entity-arctic-0.fvec ivf 60 1.61 0.00 0.00 621.12 0.74 21971.72 corpus-dbpedia-entity-arctic-0.fvec ivf 70 1.74 0.00 0.00 574.71 0.76 25612.96 corpus-dbpedia-entity-arctic-0.fvec ivf 80 1.94 0.00 0.00 515.46 0.77 29311.67 corpus-dbpedia-entity-arctic-0.fvec ivf 90 2.05 0.00 0.00 487.80 0.78 33034.66 corpus-dbpedia-entity-arctic-0.fvec ivf 100 2.23 0.00 0.00 448.43 0.80 36743.77 ```	2025-07-03 00:04:11 +10:00
Ignacio Vera	f81d35536d	optimize OptimizedScalarQuantizer#scalarQuantize (#129874 ) Optimize OptimizedScalarQuantizer#scalarQuantize when the destination can be an integer array.	2025-07-02 14:57:59 +01:00
Dimitris Rempapis	8ffbf4a976	Throw a 400 when sorting for all types of range fields	2025-07-02 12:22:29 +03:00
Jordan Powers	a69c48477f	Add index version for match_only_text stored field in binary format (#130363 ) Follow-up to #130049 to gate using the binary format for the stored field in match_only_text fields behind an index version.	2025-07-01 18:19:18 -07:00
Ryan Ernst	4b7de2fa30	Add http-only headers to ElasticsearchException (#130348 ) When returning an error from Elasticsearch, exceptions may contain values written as http response headers. ElasticsearchException contains a map of headers that are added to the response. But these values are also written to a special "header" section of the response body. This commit renames the existing "headers" in ElasticsearchException to "body headers", which are both http headers and written to the response body. A new "http headers" is added for headers that should only be written as response headers.	2025-07-01 15:03:08 -07:00
Benjamin Trent	a6dfe64652	Fix sampling for kmeans and address assignment edge case (#130405 ) This is three fixes: - We should be doing actual sampling when doing kmeans clustering, taking the first N vectors creates some weird edge cases - Having assignments initialized as `0` means that if a vector gets assigned to cluster ord `0`, that cluster centroid actually isn't updated later in the lloyd steps. So, this initializes assignments to -1 - If we actually don't sample the vectors for lloyd, don't bother with final pass to potentially update the centroids	2025-07-02 06:54:46 +10:00
Benjamin Trent	044f34bf3e	Refactor bulk quantization writing into a unified class (#130354 ) this is a small refactor, laying ground work for more generalized bulk writing. I did some benchmarking and there was no significant performance difference (as expected).	2025-07-02 06:53:45 +10:00
James Baiera	2144baeb8c	[Streams] Add new ingest pipeline field access flag (#129096 ) This PR introduces a new flag to ingest pipeline configurations which will be used to control how fields are accessed from within that pipeline.	2025-07-01 15:50:53 -04:00
Parker Timmins	3a69d45892	Add test for matching middle key bug (#130396 ) There was a bug in a previous version where flattened fields would produce incorrect synthetic source with too few opening braces. This bug was fixed as a side effect of #129600. Adding this test to confirm. See #129600 for a full explanation.	2025-07-01 14:24:48 -05:00
Keith Massey	ae450dad20	Correctly handling data stream settings when component templates are used (#130394 )	2025-07-01 12:40:37 -05:00
Nik Everett	c0744a1808	Don't build can_match queries we can't push to data nodes (#130210 ) Passes the minimum transport version down to expressions when we convert them into queries that we'll use for can_match. Right now all this is used for is skipping the can_match from the wildcard like queries. The queries we make there aren't serializable. We'll fix that - but this should give us the levers that we need to do it in a backwards incompatible way.	2025-07-01 08:05:20 -04:00
Tommaso Teofili	9edfa6642a	Wrap ES KNN queries with PatienceKNN query (#127223 )	2025-07-01 13:56:55 +02:00
Ryan Ernst	bc393b9b91	Handle createLink post security manager in StoreRecoveryTests (#129964 ) When running under security manager an assumption was made that failing to create a hard link due to security exception implied hard links were supported. Now that security manager is gone, the code to create a hard link in StoreRecoveryTests executes. But in the case of windows, BasicFileAttributes.fileKey does not return a unique object that can be used to verify a link exists. Yet the fact createLink returned is enough to trust the jdk was able to create a link. closes #124104	2025-06-30 23:31:50 +02:00
Jim Ferenczi	2142915fcb	Speed up (filtered) KNN queries for flat vector fields (#130251 ) For dense vector fields using the `flat` index, we already know a brute-force search will be used, so there’s no need to go through the codec’s approximate KNN logic. This change skips that step and builds the brute-force query directly, making things faster and simpler. I tested this on a setup with 10 million random vectors, each with 1596 dimensions and 17,500 partitions, using the `random_vector` track. The results (performance comparison, before vs. after): Throughput: 221 ops/s vs. 2762 ops/s (🟢 +1149%); Latency (p50): 29.2 ms vs. 1.6 ms (🔻 -94.4%); Latency (p99): 81.6 ms vs. 3.5 ms (🔻 -95.7%). Filtered KNN queries on flat vectors are now over 10x faster on my laptop!	2025-06-30 19:19:35 +01:00
Keith Massey	ff52007996	Adding rest actions for getting and updating data stream mappings (#130241 )	2025-06-30 13:05:13 -05:00
Martijn van Groningen	b6e518f01a	Remove tmp_fdt_no_mmap feature flag. (#130308 ) After running this change for a week, no regressions were detected in nightly benchmarks that use index sorting.	2025-06-30 16:07:17 +02:00
Keith Massey	fe2d6dfcf7	Avoid using data stream mappings if feature flag is disabled (#130261 )	2025-06-30 08:07:11 -05:00
Yang Wang	6b0d53402c	Fix a racing condition in shard snapshot status update (#130302 ) The enqueued task can run before the code reaches the status update line. When it happens, the status update can set the status backwards. This PR fixes it by moving the status update before enqueuing the task. Resolves: #129752	2025-06-30 19:50:48 +10:00
Ievgen Degtiarenko	225f23cc01	Claim backported profile versions (#130187 )	2025-06-30 09:42:59 +02:00
Ignacio Vera	23cd462e07	Handle soar assignments when vector and centroid are very close (#130206 )	2025-06-30 08:50:36 +02:00
Albert Zaharovits	e9c12e7f01	Add TEST MergeWithLowDiskSpaceIT testForceMergeIsBlockedThenUnblocked (#130189 ) This adds a new IT for when forced merges are blocked (and then unblocked) because of insufficient disk space. This test was suggested in https://github.com/elastic/elasticsearch/pull/127613#pullrequestreview-2900135287.	2025-06-29 16:38:43 +10:00
Mike Pellegrini	52495aa5fc	Fix incorrect accounting of semantic text indexing memory pressure (#130221 )	2025-06-27 14:29:54 -04:00
Tim Brooks	ea2e7b4382	Reapply "Dispatch ingest work to coordination thread pool" (#130152 ) This reverts commit `73b0a60`. Additionally, it adds thread pool documentation.	2025-06-27 11:34:28 -06:00
Keith Massey	21bb836ce5	Adding actions to get and update data stream mappings (#130042 )	2025-06-27 10:29:06 -05:00
Ignacio Vera	ce74df5c0c	Fix iterating for best centroid when algorithm is neighbour aware and decrease SAMPLES_PER_CLUSTER_DEFAULT (#130069 ) * KMeansIntermediate shares assignments	2025-06-27 13:28:12 +02:00
Sam Xiao	3200abc4ce	Make EnterpriseGeoIpDownloaderLicenseListener project aware (#129992 )	2025-06-27 16:12:09 +08:00
Jim Ferenczi	93e4e01277	Fix ES818BinaryQuantizedVectorsReader to not use directIO during merge (#130114 ) This commit fixes the BBQ reader to not use directIO when merging the original float vectors.	2025-06-27 09:03:16 +01:00


16967 Commits