elasticsearch

Commit Graph

Author	SHA1	Message	Date
David Turner	97db0c2182	Remove `{Indices,}ClusterStateUpdateRequest` (#113483 ) These abstract classes are now unused so this commit removes them.	2024-09-25 08:17:51 +01:00
Simon Cooper	f9aa6f40cd	Always use CLDR locale on ES v9 (#113184 ) Regardless of JDK version, ES should always use CLDR locale database from 9.0.0. This also removes IsoCalendarDataProvider used to override week-date calculations for the root locale only.	2024-09-23 11:05:08 +01:00
Ryan Ernst	2ecfb397ad	Remove plugin classloader indirection (#113154 ) Extensible plugins use a custom classloader for other plugin jars. When extensible plugins were first added, the transport client still existed, and elasticsearch plugins did not exist in the transport client (at least not the ones that create classloaders). Yet the transport client still created a PluginsService. An indirection was used to avoid creating separate classloaders when the transport client had created the PluginsService. The transport client was removed in 8.0, but the indirection still exists. This commit removes that indirection layer.	2024-09-20 07:45:40 -07:00
Mark Vieira	a59c182f9f	Add AGPLv3 as a supported license	2024-09-13 15:29:46 -07:00
Patrick Doyle	50871a3d28	New injector (#111722 ) * Initial new injector * Allow createComponents to return classes * Downsample injection * Remove more vestiges of subtype handling * Lowercase logger * Respond to code review comments * Only one object per class * Some additional cleanup incl spotless * PR feedback * Missed one * Rename workQueue * Remove Injector.addRecordContents * TelemetryProvider requires us to inject an object using a supertype * Address Simon's comments * Clarify the reason for SuppressForbidden * Make log indentation code less intrusive	2024-08-28 11:13:47 -04:00
David Turner	f150e2c11d	Add telemetry for repository usage (#112133 ) Adds to the `GET _cluster/stats` endpoint information about the snapshot repositories in use, including their types, whether they are read-only or read-write, and for Azure repositories the kind of credentials in use.	2024-08-27 23:34:02 +10:00
Panagiotis Bailis	b685a436ce	Adding RankDocsRetrieverBuilder and RankDocsQuery (#111709 )	2024-08-26 15:18:47 +03:00
Patrick Doyle	35a375329a	Move Guice to org.elasticsearch.injection.guice (#111723 ) * Move files and fix imports & module exports * Other consequences of moving Guice	2024-08-12 10:47:46 -04:00
Keith Massey	a2814e816b	Adding mapping validation to the simulate ingest API (#110606 )	2024-07-19 08:08:21 -05:00
Joe Gallo	27e7601698	Directly download commercial ip geolocation databases from providers (#110844 ) Co-authored-by: Keith Massey <keith.massey@elastic.co>	2024-07-17 20:55:14 -04:00
Ryan Ernst	e6713a5c0a	Remove JNA from server dependencies (#110809 ) All native methods are now bound through NativeAccess. This commit removes the jna dependency from server. relates #104876	2024-07-12 19:49:13 -07:00
Ryan Ernst	8417d3f141	Move preallocate functionality to native access (#110678 ) This commit moves the file preallocation functionality into NativeAccess. The code is basically the same. One small tweak is that instead of breaking Java access boundaries in order to get an open file handle, the new code uses posix open directly. relates #104876	2024-07-11 09:42:44 -07:00
Mayya Sharipova	405e39660b	Support k parameter for knn query (#110233 ) Introduce an optional k param for knn query. If k is not set, knn query has the previous behaviour: - `num_candidates` docs are collected from each shard. These `num_candidates` docs are used for combining results with other queries and aggregations on each shard. - docs from all shards are merged to produce the top global `size` results. If k is set, the behaviour is instead the following: - `k` docs are collected from each shard. These `k` docs are used for combining results with other queries and aggregations on each shard. - similarly, docs from all shards are merged to produce the top global `size` results. Having the `k` param makes it more intuitive for users to address their needs. They also don't need to care about the `num_candidates` param and can skip it for this query, as it is more of an internal detail for tuning how knn search operates. Closes #108473	2024-06-28 09:59:28 -04:00
Benjamin Trent	5add44d7d1	Adds new `bit` element_type for dense_vectors (#110059 ) This commit adds `bit` vector support by adding `element_type: bit` for vectors. This new element type works for indexed and non-indexed vectors. Additionally, it works with `hnsw` and `flat` index types. No quantization based codec works with this element type, this is consistent with `byte` vectors. `bit` vectors accept up to `32768` dimensions in size and expect vectors that are being indexed to be encoded either as a hexadecimal string or a `byte[]` array where each element of the `byte` array represents `8` bits of the vector. `bit` vectors support script usage and regular query usage. When indexed, all comparisons done are `xor` and `popcount` summations (aka, hamming distance), and the scores are transformed and normalized given the vector dimensions. Note, indexed bit vectors require `l2_norm` to be the similarity. For scripts, `l1norm` is the same as `hamming` distance and `l2norm` is `sqrt(l1norm)`. `dotProduct` and `cosineSimilarity` are not supported. Note, the dimensions expected by this element_type must always be divisible by `8`, and the `byte[]` vectors provided for indexing must have size `dim/8`, where each byte element represents `8` bits of the vector. closes: https://github.com/elastic/elasticsearch/issues/48322	2024-06-27 04:48:41 +10:00
Patrick Doyle	43b2e877e0	Revert "Move PluginsService to its own internal package (#109872 )" (#109946 ) This reverts commit `b9e7965184`.	2024-06-19 18:10:50 -04:00
Patrick Doyle	b9e7965184	Move PluginsService to its own internal package (#109872 ) * Mechanical package change in IntelliJ * A couple of manual fixups * Export plugins.loading to deprecation * Put plugin-cli in a module so can export PluginsUtils to it.	2024-06-19 15:23:47 -04:00
Benjamin Trent	acc99302c6	Adding hamming distance function to painless for dense_vector fields (#109359 ) This adds `hamming` distances, the pop-count of `xor` byte vectors as a first class citizen in painless. For byte vectors, this means that we can compute hamming distances via script_score (aka, brute-force). The implementation of `hamming` is the same that is available in Lucene, and when lucene 9.11 is merged, we should update our logic where applicable to utilize it. NOTE: this does not yet add hamming distance as a metric for indexed vectors. This will be a future PR after the Lucene 9.11 upgrade.	2024-06-18 03:41:20 +10:00
Chris Hegarty	fa364bfcaf	Rename the vec module to better reflect that it provides SIMD optimized vector scorers (#109661 ) This commit renames the vector module to better reflect its intent - to provide SIMD optimized vector scorer implementations.	2024-06-17 11:10:02 +01:00
Panagiotis Bailis	4dd4356c15	Adding RerankingContext classes to support global reranking (#109435 )	2024-06-12 18:09:06 +03:00
Ryan Ernst	0be3c741df	Guard file settings readiness on file settings support (#109500 ) Consistency of file settings is an important invariant. However, when upgrading from Elasticsearch versions before file settings existed, cluster state will not yet have the file settings metadata. If the first node upgraded is not the master node, new nodes will never become ready while they wait for file settings metadata to exist. This commit adds a node feature for file settings to guard waiting on file settings for readiness. Although file settings has existed since 8.4, the feature is not a historical feature because historical features are not applied to cluster state that readiness checks. In this case it is not needed since upgrading from 8.4+ will already contain file settings metadata.	2024-06-11 06:55:53 +10:00
Panagiotis Bailis	4a1d7426d7	Adding RankFeature implementation (#108538 )	2024-06-06 11:20:53 +03:00
Przemyslaw Gomulka	437e7db499	Refactor reporting of RA metrics to not be done in TransportShardBulkAction (#108449 ) Previously, DocumentSizeReporter was reporting upon indexing being completed in TransportShardBulkAction#onComplete. This commit renames the method to onIndexingCompleted and moves that reporting to IndexEngine in serverless plugin. This will be followed up in a separate PR that will be reporting in an Engine#index subclass (serverless)	2024-05-16 13:57:06 +02:00
Simon Cooper	e7350dce29	Add a capabilities API to check node and cluster capabilities (#106820 ) This adds a /_capabilities rest endpoint for checking the capabilities of a cluster - what endpoints, parameters, and endpoint capabilities the cluster supports	2024-05-08 14:44:26 +01:00
Kostas Krikellas	3183e6d6c9	Add ignored field values to synthetic source (#107567 ) * Add ignored field values to synthetic source * Update docs/changelog/107567.yaml * initialize map * yaml fix * add node feature * add comments * small fixes * missing cluster feature in yaml * constants for chars, stored fields * remove duplicate method * throw exception on parse failure * remove Base64 encoding * add assert on IgnoredValuesFieldMapper::write * changes from review * simplify logic * add comment * rename classes * rename _ignored_values to _ignored_source * rename _ignored_values to _ignored_source	2024-04-26 15:35:31 +03:00
Panagiotis Bailis	d029d40cea	Adding new RankContext classes per different search phase/node type (#107093 )	2024-04-24 14:58:16 +03:00
Mary Gouseti	119f6e71ce	[Data stream lifecycle] Introduce factory retention settings (#107741 ) We introduce the plumbing so that a plugin can provide factory retention. This retention will take effect if there is no global retention provided by the user. Without a plugin defining the factory retention, elasticsearch will have no factory retention.	2024-04-24 11:52:24 +03:00
Howard	fdbb21bba4	Support effective watermark thresholds in node stats API (#107244 ) Adds to the `fs` component of the node stats API some additional values indicating the disk watermarks that are currently in effect. Relates #106676	2024-04-18 09:57:28 -04:00
Chris Hegarty	6b52d7837b	Add an optimised int8 vector distance function for aarch64. (#106133 ) This commit adds an optimised int8 vector distance implementation for aarch64. Additional platforms like, say, x64, will be added as a follow-up. The vector distance implementation outperforms Lucene's Panama Vector implementation for binary comparisons by approx 5x (depending on the number of dimensions). It does so by means of compiler intrinsics built into a separate native library and linked by Panama's FFI. Comparisons are performed on off-heap mmap'ed vector data. The implementation is currently only used during merging of scalar quantized segments, through a custom format ES814HnswScalarQuantizedVectorsFormat, but its usage will likely be expanded over time. Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com> Co-authored-by: Lorenzo Dematté <lorenzo.dematte@elastic.co> Co-authored-by: Mark Vieira <portugee@gmail.com> Co-authored-by: Ryan Ernst <ryan@iernst.net>	2024-04-12 08:44:21 +01:00
Tim Vernum	36d5282907	Allow additional JSON log fields via SPI (#106980 ) This adds a new SPI based `LoggingDataProvider` service that can be implemented in order to add new fields to the main JSON log	2024-04-10 22:14:00 -04:00
Adrien Grand	49ffa045a6	Cut over stored fields to ZSTD for compression. (#103374 ) This cuts over stored fields with `index.codec: best_speed` (default) to ZSTD with level 0 and blocks of at most 128 documents or 14kB, and `index.codec: best_compression` to ZSTD with level 3 and blocks of at most 2,048 documents or 240kB. Compared with the current codecs, this would yield similar indexing speed, much better space efficiency and similar retrieval speed. Benchmarks on the `elastic/logs` track suggest 10% better storage efficiency and slightly faster ingestion. The Lucene codec infrastructure records the codec on a per-segment basis and ensures that this change is backward-compatible. Segments will get progressively migrated to ZSTD as they get merged in the background. Bindings for ZSTD are provided by the Panama FFI API on JDK21+ and JNA on older JDKs. ZSTD support is currently behind a feature flag, so it won't be enabled immediately when this feature gets merged, this will need a follow-up change. Co-authored-by: Mark Vieira <portugee@gmail.com> Co-authored-by: Ryan Ernst <ryan@iernst.net>	2024-04-09 09:18:58 +02:00
Jack Conradson	68b0acac8f	Add retrievers using the parser-only approach (#105470 ) This enhancement adds a new abstraction to the _search API called "retriever." A retriever is something that returns top hits. This adds three initial retrievers called "standard", "knn", and "rrf". The retrievers use a parser-only approach where they are parsed and then translated into a SearchSourceBuilder to execute the actual search. --------- Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>	2024-03-12 10:11:55 -07:00
Andrei Dan	882b92ab60	Add service for computing the optimal number of shards for data streams (#105498 ) This adds the `DataStreamAutoShardingService` that will compute the optimal number of shards for a data stream and return a recommendation as to when to apply it (a time interval we call cool down which is 0 when the auto sharding recommendation can be applied immediately). This also introduces a `DataStreamAutoShardingEvent` object that will be stored in the data stream metadata to indicate the last auto sharding event that was applied to a data stream and its cluster state representation looks like so: ``` "auto_sharding": { "trigger_index_name": ".ds-logs-nginx-2024.02.12-000002", "target_number_of_shards": 3, "event_timestamp": 1707739707954 } ``` The auto sharding service is not used in this PR, so the auto sharding event will not be stored in the data stream metadata, but the required infrastructure to configure it is in place.	2024-03-06 05:12:08 -05:00
Ryan Ernst	6375e9f443	Add native access library (#105100 ) Elasticsearch requires access to some native functions. Historically this has been achieved with the JNA library. However, JNA is a complicated, magical library, and has caused various problems booting Elasticsearch over the years. The new Java Foreign Function and Memory API allows access to call native functions directly from Java. It also has the advantage of tight integration with hotspot which can improve performance of these functions (though performance of Elasticsearch's native calls has never been much of an issue since they are mostly at boot time). This commit adds a new native lib that is internal to Elasticsearch. It is built to use the foreign function api starting with Java 21, and continue using JNA with Java versions below that. Only one function, checking whether Elasticsearch is running as root, is migrated. Future changes will migrate other native functions.	2024-02-07 18:27:09 -05:00
Niels Bauman	64891011d3	Extend `repository_integrity` health indicator for unknown and invalid repos (#104614 ) This PR extends the repository integrity health indicator to cover also unknown and invalid repositories. Because these errors are local to a node, we extend the `LocalHealthMonitor` to monitor the repositories and report the changes in their health regarding the unknown or invalid status. To simplify this extension in the future, we introduce the `HealthTracker` abstract class that can be used to create new local health checks. Furthermore, we change the severity of the health status when the repository integrity indicator reports unhealthy from `RED` to `YELLOW` because even though this is a serious issue, there is no user impact yet.	2024-02-07 15:18:55 +01:00
Craig Taverner	a58b2c2b05	Move doc-values classes needed by ST_INTERSECTS to server (#104980 ) * Move doc-values classes needed by ST_INTERSECTS to server These classes are needed by ESQL spatial queries, and are not licensed in a way that prevents this move. Since they depend on lucene it is not possible to move them to a library. Instead they are moved to be co-located with the GeoPoint doc-values classes that already exist in server. * Moved to lucene package org.elasticsearch.lucene.spatial * Moved Geo/ShapeDocValuesQuery to server because it is Lucene specific And this gives us access to these classes from ESQL for lucene-pushdown of spatial queries.	2024-02-07 15:00:38 +01:00
Benjamin Trent	43362d5de5	Add new int8_flat and flat vector index types (#104872 ) This adds two new vector index types: - flat - int8_flat Both store the vectors in a flat space and search is brute-force over the vectors in the index. For the regular `flat` index, this can be considered syntactic sugar that allows `knn` queries without having to put indices within HNSW. For `int8_flat`, this allows float vectors to be stored in a flat manner, but also automatically quantized.	2024-02-05 12:56:13 -05:00
Daniel Mitterdorfer	6e15229f6e	Make counted terms agg visible to profiling (#105049 ) The counted-terms aggregation is defined in its own plugin. When other plugins (such as the profiling plugin) want to use this aggregation, this leads to class loader issues, such as that the aggregation class is not recognized. By moving just the aggregation code itself to the server module but keeping everything else (including registration) in the `mapper-counted-keyword` module, we can use the counted-terms aggregation also from other plugins.	2024-02-02 15:56:07 +01:00
David Roberts	4e91d690e5	Export random sampler agg from server (#104747 ) The server module exports the classes needed to use most aggregations, but the random sampler aggregation was missed. (I think it's because the PRs to add random sampler and to add modularization in general were both long-running and were in flight around the same time.) This PR adds an export for the random sampler agg, so that it can be used from plugins.	2024-01-25 13:01:35 +00:00
Ignacio Vera	4e7a0dae19	Introduce Elasticsearch PostingFormat based on Lucene 90 posting format using PFOR (#103601 ) Lucene 9.9 has introduced a new posting format that uses FOR instead of PFOR. Elasticsearch prefers the former format, therefore we introduce it as our own posting format here.	2023-12-20 15:09:24 +01:00
Mary Gouseti	9e3d0dbaf8	[Health API] Abstract data tier diagnoses as node roles (#102466 ) We generalise the code that is diagnosing the shard availability when it comes to data tier issues. We make it more extensible, so in serverless we can introduce new roles. For this reason, we consider a tier as a more specific kind of a role. Then we expose some methods and some diagnosis definitions in the ShardsAvailabilityHealthIndicatorService so they can be extended.	2023-11-22 17:20:10 +02:00
Simon Cooper	4c98fd9c5c	Add a historical feature for transport version fixups (#102211 ) Make sure logging is configured in the historical versions task Co-authored-by: Mark Vieira <portugee@gmail.com>	2023-11-16 10:02:12 +00:00
Simon Cooper	0c18798d59	Add feature for index mapping auto-put (#101668 )	2023-11-13 13:43:56 +00:00
Lee Hinman	4952f986ce	Modularize shard availability service (#101796 ) * Modularize shard availability service This commit moves the `ShardsAvailabilityHealthIndicatorService` to a package and modularizes it with exports so that Serverless can make use of it as a superclass. Relates to #101394	2023-11-03 15:59:09 -06:00
Simon Cooper	e851b303d0	Migrate desirednode processors version checks to features (#101706 )	2023-11-03 13:57:47 +00:00
Simon Cooper	580283025e	Unify naming of feature spec implementations (#101704 ) Use a naming scheme of <area>Features	2023-11-03 09:24:50 +00:00
Simon Cooper	1bb1c7be04	Create a historical feature for the get settings rest action (#101684 )	2023-11-02 08:51:39 +00:00
Simon Cooper	f6a211225a	Add historical feature for cluster health checks (#101538 )	2023-10-31 10:06:14 +00:00
Simon Cooper	bfad5e5b13	Create new feature API for querying features present on a cluster (#100974 ) This adds an internal API and service to manage & get information on features that are present on nodes in a cluster. New features can be declared as supported, and historical features can be added to previous node versions to eventually replace node version comparisons	2023-10-30 14:38:30 +00:00
Stuart Tettemer	f8d09e9c6c	APM Metering API (#99832 ) Adds Metering instrument interfaces and adapter implementations for opentelemetry instrument types: * Gauge - a single number that can go up or down * Histogram - bucketed samples * Counter - monotonically increasing summed value * UpDownCounter - summed value that may decrease Supports both Long* and Double* versions of the instruments. Instruments can be registered and retrieved by name through APMMeter which is available via the APMTelemetryProvider. The metering provider starts as the open telemetry noop provider. `telemetry.metrics.enabled` turns on metering.	2023-09-28 19:35:46 -05:00
David Kyle	096cf81670	[ML] Make Inference Services pluggable (#99886 ) Creates an InferenceServicePlugins interface for inference services to implement and adds a test implementation to mock an inference service.	2023-09-27 13:35:45 +01:00

110 Commits