This adds two new vector index types:
- `flat`
- `int8_flat`
Both store the vectors in a flat space, and search is brute-force over
the vectors in the index. For the regular `flat` index, this can be
considered syntactic sugar that allows `knn` queries without having to
index the vectors into HNSW.
For `int8_flat`, this allows float vectors to be stored flat while also
being automatically quantized.
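To illustrate the quantization side of `int8_flat`, here is a minimal sketch of min/max scalar quantization of a float vector to signed bytes. This is only an illustration of the idea; the class and method names are hypothetical, not the actual Elasticsearch implementation.
```java
// Illustrative only: map each float in [min, max] onto the int8 range.
public final class Int8Quantizer {
    public static byte[] quantize(float[] vector, float min, float max) {
        // Scale the value range onto the 256 representable byte values.
        float scale = 255f / (max - min);
        byte[] quantized = new byte[vector.length];
        for (int i = 0; i < vector.length; i++) {
            int q = Math.round((vector[i] - min) * scale) - 128;
            quantized[i] = (byte) Math.max(-128, Math.min(127, q));
        }
        return quantized;
    }
}
```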
The counted-terms aggregation is defined in its own plugin. When other
plugins (such as the profiling plugin) want to use this aggregation,
this leads to class-loader issues, such as the aggregation class not
being recognized. By moving just the aggregation code itself to the server
module while keeping everything else (including registration) in the
`mapper-counted-keyword` module, the counted-terms aggregation can also
be used from other plugins.
The server module exports the classes needed to use most
aggregations, but the random sampler aggregation was missed.
(I think it's because the PRs to add random sampler and to
add modularization in general were both long-running and were
in flight around the same time.) This PR adds an export for
the random sampler agg, so that it can be used from plugins.
Lucene 9.9 has introduced a new postings format that uses FOR instead of PFOR. Elasticsearch prefers the former
format, therefore we introduce it as our own postings format here.
We generalise the code that diagnoses shard availability when it comes to data tier issues, making it more extensible so that new roles can be introduced in serverless.
To that end, we treat a tier as a more specific kind of role, and we expose some methods and diagnosis definitions in the `ShardsAvailabilityHealthIndicatorService` so they can be extended.
* Modularize shard availability service
This commit moves the `ShardsAvailabilityHealthIndicatorService` to a package and modularizes it
with exports so that Serverless can make use of it as a superclass.
Relates to #101394
This adds an internal API and service to manage and get information on features that are present on nodes in a cluster. New features can be declared as supported, and historical features can be added to previous node versions, to eventually replace node version comparisons.
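A minimal sketch of how a module might declare features, assuming the `FeatureSpecification`/`NodeFeature` shape this API suggests (names and identifiers below are assumptions for illustration):
```java
import org.elasticsearch.Version;
import org.elasticsearch.features.FeatureSpecification;
import org.elasticsearch.features.NodeFeature;

import java.util.Map;
import java.util.Set;

// Hypothetical feature declarations for some module.
public class MyFeatures implements FeatureSpecification {
    @Override
    public Set<NodeFeature> getFeatures() {
        // Declared as supported on every node running this code.
        return Set.of(new NodeFeature("my_module.new_behavior"));
    }

    @Override
    public Map<NodeFeature, Version> getHistoricalFeatures() {
        // Retrofitted onto older node versions, replacing version comparisons.
        return Map.of(new NodeFeature("my_module.old_behavior"), Version.V_8_7_0);
    }
}
```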
Adds metering instrument interfaces and adapter implementations for the OpenTelemetry instrument types:
* Gauge - a single number that can go up or down
* Histogram - bucketed samples
* Counter - monotonically increasing summed value
* UpDownCounter - summed value that may decrease
Supports both Long* and Double* versions of the instruments.
Instruments can be registered and retrieved by name through APMMeter which is available via the APMTelemetryProvider.
The metering provider starts as the OpenTelemetry no-op provider.
`telemetry.metrics.enabled` turns on metering.
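As a usage sketch, assuming a registry-style API along the lines of the APMMeter described above (exact names and signatures may differ):
```java
import org.elasticsearch.telemetry.metric.LongCounter;
import org.elasticsearch.telemetry.metric.MeterRegistry;

// Rough sketch of intended usage; not necessarily the final API surface.
public class IndexingMetrics {
    private final LongCounter docsIndexed;

    public IndexingMetrics(MeterRegistry registry) {
        // Instruments are registered (and can later be retrieved) by name.
        this.docsIndexed = registry.registerLongCounter(
            "es.indexing.docs.total", "number of documents indexed", "count");
    }

    public void onDocumentIndexed() {
        docsIndexed.increment(); // counters are monotonically increasing
    }
}
```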
With the support of metrics, the TracerPlugin name is no longer adequate, so it is renamed to TelemetryPlugin.
This also introduces the TelemetryProvider interface. While it is only used in Node.java at the moment to fetch the Tracer instance, it is intended to be used in Plugin::createComponents (to be done in a separate commit due to
the broad scope of that method).
This will allow plugins to get access to both the Tracer and Metric interfaces
without the need to add yet another argument to createComponents.
This also adds an internal subpackage in module/apm so that it is more obvious
which packages are not exported.
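A minimal sketch of the shape this interface takes (assuming the pre-rename `org.elasticsearch.tracing.Tracer` package; the metrics accessor is not yet defined at this point):
```java
import org.elasticsearch.tracing.Tracer;

// Minimal sketch of TelemetryProvider; the metrics accessor is
// hypothetical until the metering API lands in a later commit.
public interface TelemetryProvider {
    Tracer getTracer();
    // e.g. Metric getMetric() -- to be added alongside metrics support
}
```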
This adds a `ComponentVersionNumber` service interface for modules to provide version numbers for individual components to be reported inside node info. Initial implementations for `MlConfigVersion` and `TransformConfigVersion` are provided.
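A sketch of what an implementation might look like, assuming the service interface exposes a component identifier and its current version (the method names here are assumptions based on the description):
```java
// Hypothetical implementation for the ML config version.
public class MlConfigVersionComponent implements ComponentVersionNumber {
    @Override
    public String componentId() {
        return "ml_config_version"; // reported inside node info
    }

    @Override
    public VersionId<?> versionNumber() {
        return MlConfigVersion.CURRENT;
    }
}
```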
This commit renames tracing to telemetry.tracing in both x-pack/APM and Elasticsearch's org.elasticsearch.tracing.Tracer (the API).
The x-pack/APM packages are renamed as follows:
- org.elasticsearch.telemetry.apm - the only exported package
- org.elasticsearch.telemetry.apm.settings - APMSettings
- org.elasticsearch.telemetry.apm.tracing - APMTracer
org.elasticsearch.tracing.Tracer is moved to org.elasticsearch.telemetry.tracing.Tracer (responsible for the majority of the changes in this PR).
When overriding transport and index versions, it is difficult to decide
whether a version constant in serverless should be returned, or the
latest version constant from serverless should be used, since the latest
from serverless is not available. This commit adjusts the version
extension methods to pass in the latest serverless version constants. It
also tweaks module and method visibility for a helper method that is needed.
Cluster state currently holds a cluster minimum transport version and a map of nodes to transport versions. However, to determine node compatibility, we will need to account for more types of versions in cluster state than just the transport version (see #99076). Here we introduce a wrapper class to cluster state and update accessors and builders to use the new method. (I would have liked to re-use org.elasticsearch.cluster.node.VersionInformation, but that one holds IndexVersion rather than TransportVersion.)
* Introduce CompatibilityVersions to cluster state class
Currently plugins register settings through Plugin.getSettings. For
easier breakdown of the codebase, it would be nice to allow arbitrary
Java modules to register settings. This commit adds an internal
SettingsExtension SPI which acts just like Plugin.getSettings but from a
purely static context.
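A rough sketch of how a module might use this SPI, registered via `META-INF/services` like any other ServiceLoader extension (the `SettingsExtension` import path and setting name are assumptions):
```java
import org.elasticsearch.common.settings.Setting;

import java.util.List;

// Sketch of a SettingsExtension implementation; discovered statically via
// ServiceLoader rather than through a Plugin instance.
public class MyModuleSettings implements SettingsExtension {
    public static final Setting<Boolean> FEATURE_ENABLED =
        Setting.boolSetting("my_module.feature.enabled", false, Setting.Property.NodeScope);

    @Override
    public List<Setting<?>> getSettings() {
        return List.of(FEATURE_ENABLED);
    }
}
```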
This commit changes the ActionModules to allow the RestController to be
provided by an internal plugin.
It renames `RestInterceptorActionPlugin` to `RestServerActionPlugin`
and adds a new `getRestController` method to it.
There may be multiple RestServerActionPlugins installed on a node, but
only one may provide a Rest wrapper (getRestHandlerInterceptor) and only one
may provide a RestController (getRestController).
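Approximately, the resulting plugin surface looks like the following sketch (parameter lists are simplified relative to the real interface):
```java
import java.util.function.UnaryOperator;

import org.elasticsearch.common.util.concurrent.ThreadContext;
import org.elasticsearch.plugins.ActionPlugin;
import org.elasticsearch.rest.RestController;
import org.elasticsearch.rest.RestHandler;

// Simplified sketch of the renamed interface; at most one installed
// plugin may return a non-null value from each of these methods.
public interface RestServerActionPlugin extends ActionPlugin {
    UnaryOperator<RestHandler> getRestHandlerInterceptor(ThreadContext threadContext);

    RestController getRestController(UnaryOperator<RestHandler> interceptor /* , ... */);
}
```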
In serverless we would like to report on (meter and bill for) document ingestion. The metering should be agnostic to the document format (the document structure should be normalised), hence we should allow creating XContentParsers that keep track of parsed fields and values.
There are two places where parsing of an ingested document happens:
1. during a 'raw bulk' request, sent without pipelines
2. in the 'ingest service', when a request is sent with pipelines
(Parsing can occur twice when dynamic mappings are calculated; this PR takes that into account and prevents double billing.)
We also want to make sure that the metering logic is not unnecessarily executed when a document has already been reported; that is, if a document was reported in IngestService, there is no point wrapping the XContentParser again.
This commit introduces `DocumentReporterPlugin`, an internal plugin that will be implemented in serverless. This plugin should return a `DocumentParsingObserver` supplier, which creates `DocumentParsingObserver` instances. A `DocumentParsingObserver` is used to wrap an `XContentParser` with an implementation that keeps track of parsed fields and values (performs the metering) and allows that information, along with the index name, to be sent to a MeteringReporter.
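A sketch of the observer contract this implies (method names are assumptions based on the description):
```java
import org.elasticsearch.xcontent.XContentParser;

// Sketch of the DocumentParsingObserver contract described above.
public interface DocumentParsingObserver {
    // Wrap a parser so parsed fields and values are counted as they stream by.
    XContentParser wrapParser(XContentParser parser);

    // Record which index the parsed document targets.
    void setIndexName(String indexName);

    // Flush the accumulated measurement to the reporter, exactly once.
    void close();
}
```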
The nodes info returns some information from the build. However, the
flavor is still hardcoded to default, even though flavor was added back
to Build. This commit exposes the build flavor again in the nodes
info response. It also fixes the build extension to be accessible in
serverless.
This PR introduces downsampling configuration to the data stream lifecycle. Keep in mind that the downsampling implementation will come in a follow-up PR. The configuration looks like this:
```
{
  "lifecycle": {
    "data_retention": "90d",
    "downsampling": [
      {
        "after": "1d",
        "fixed_interval": "2h"
      },
      { "after": "15d", "fixed_interval": "1d" },
      { "after": "30d", "fixed_interval": "1w" }
    ]
  }
}
```
We will also support using `null` to unset downsampling configuration during template composition:
```
{
  "lifecycle": {
    "data_retention": "90d",
    "downsampling": null
  }
}
```
This commit adds the Log4j JUL bridge so that messages using JUL are
more nicely converted to log4j messages. Currently these messages are
captured via the stdout logging stream. This commit also adds a log4j
filter to replace the logging stream filtering mechanism used to quiet
some Lucene log messages that may be confusing to users.
Closes #94613
* Initial import for TDigest forking.
* Fix MedianTest.
More work needed for TDigestPercentile*Tests and the TDigestTest (and
the rest of the tests) in the tdigest lib to pass.
* Fix Dist.
* Fix AVLTreeDigest.quantile to match Dist for uniform centroids.
* Update docs/changelog/96086.yaml
* Fix `MergingDigest.quantile` to match `Dist` on uniform distribution.
* Add merging to TDigestState.hashCode and .equals.
Remove wrong asserts from tests and MergingDigest.
* Fix style violations for tdigest library.
* Fix typo.
* Fix more style violations.
* Fix more style violations.
* Fix remaining style violations in tdigest library.
* Update results in docs based on the forked tdigest.
* Fix YAML tests in aggs module.
* Fix YAML tests in x-pack/plugin.
* Skip failing V7 compat tests in modules/aggregations.
* Fix TDigest library unittests.
Remove redundant serializing interfaces from the library.
* Remove YAML test versions for older releases.
These tests don't address compatibility issues in mixed cluster tests as
the latter contain a mix of older and newer nodes, so the output depends
on which node is picked as a data node since the forked TDigest library
is not backwards compatible (produces slightly different results).
* Fix test failures in docs and mixed cluster.
* Reduce buffer sizes in MergingDigest to avoid OOM.
* Exclude more failing V7 compatibility tests.
* Update results for JdbcCsvSpecIT tests.
* Update results for JdbcDocCsvSpecIT tests.
* Revert unrelated change.
* More test fixes.
* Use version skips instead of blacklisting in mixed cluster tests.
* Switch TDigestState back to AVLTreeDigest.
* Update docs and tests with AVLTreeDigest output.
* Update flaky test.
* Remove dead code, esp around tracking of incoming data.
* Update docs/changelog/96086.yaml
* Delete docs/changelog/96086.yaml
* Remove explicit compression calls.
This was added to prevent concurrency tests from failing, but it leads
to reduced precision. Submitting this to see if the concurrency tests are
still failing.
* Revert "Remove explicit compression calls."
This reverts commit 5352c96f65.
* Remove explicit compression calls to MedianAbsoluteDeviation input.
* Add unittests for AVL and merging digest accuracy.
* Fix spotless violations.
* Delete redundant tests and benchmarks.
* Fix spotless violation.
* Use the old implementation of AVLTreeDigest.
The latest library version is 50% slower and less accurate, as verified
by ComparisonTests.
* Update docs with latest percentile results.
* Update docs with latest percentile results.
* Remove repeated compression calls.
* Update more percentile results.
* Use approximate percentile values in integration tests.
This helps with mixed cluster tests, where some of the tests were
blocked.
* Fix expected percentile value in test.
* Revert in-place node updates in AVL tree.
Update quantile calculations between centroids and min/max values to
match v.3.2.
* Add SortingDigest and HybridDigest.
The SortingDigest tracks all samples in an ArrayList that
gets sorted for quantile calculations. This approach
provides perfectly accurate results and is the most
efficient implementation for up to millions of samples,
at the cost of bloated memory footprint.
The HybridDigest uses a SortingDigest for small sample
populations, then switches to a MergingDigest. This
approach combines the best performance and results for
small sample counts with very good performance and
acceptable accuracy for effectively unbounded sample
counts.
* Remove deps to the 3.2 library.
* Remove unused licenses for tdigest.
* Revert changes for SortingDigest and HybridDigest.
These will be submitted in a follow-up PR for enabling MergingDigest.
* Remove unused Histogram classes and unit tests.
Delete dead and commented out code, make the remaining tests run
reasonably fast. Remove unused annotations, esp. SuppressWarnings.
* Remove Comparison class, not used.
* Small fixes.
* Add javadoc and tests.
* Remove special logic for singletons in the boundaries.
While this helps with the case where the digest contains only
singletons (perfect accuracy), it has a major problem
(non-monotonic quantile function) when the first singleton is followed
by a non-singleton centroid. It's preferable to revert to the old
version from 3.2; inaccuracies in a singleton-only digest should be
mitigated by using a sorted array for small sample counts.
* Revert changes to expected values in tests.
This is due to restoring quantile functions to match head.
* Revert changes to expected values in tests.
This is due to restoring quantile functions to match head.
* Tentatively restore percentile rank expected results.
* Use cdf version from 3.2
Update Dist.cdf to use interpolation, use the same cdf
version in AVLTreeDigest and MergingDigest.
* Revert "Tentatively restore percentile rank expected results."
This reverts commit 7718dbba59.
* Revert remaining changes compared to main.
* Revert excluded V7 compat tests.
* Exclude V7 compat tests still failing.
* Exclude V7 compat tests still failing.
* Restore bySize function in TDigest and subclasses.
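For reference, the SortingDigest described in the commits above boils down to this kind of exact, sort-based quantile computation; the sketch below is illustrative only, not the forked library's code:
```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Illustration of the SortingDigest idea: keep every sample and sort on
// demand. Exact results and low CPU cost for small-to-medium sample
// counts, at the cost of memory; HybridDigest switches to a MergingDigest
// once the sample count grows past a threshold.
class SortingSketch {
    private final List<Double> samples = new ArrayList<>();

    void add(double value) {
        samples.add(value);
    }

    double quantile(double q) {
        Collections.sort(samples);
        int index = (int) Math.round(q * (samples.size() - 1));
        return samples.get(index);
    }
}
```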
Version.CURRENT is statically loaded as a constant early during startup.
Yet serverless needs to be able to override the current Version so it
can add additional versions. This commit makes Version.CURRENT pluggable
via SPI. Note that the only way for this to be plugged in is via an
additional jar on the boot layer.
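Conceptually, the lookup works like the following sketch (the extension interface name is an assumption):
```java
import java.util.ServiceLoader;

// Hypothetical extension interface supplied by a jar on the boot layer.
public interface VersionExtension {
    Version getCurrentVersion();
}

// During static initialization, Version.CURRENT would be resolved roughly as:
//   Version current = ServiceLoader.load(VersionExtension.class)
//       .findFirst()
//       .map(VersionExtension::getCurrentVersion)
//       .orElse(latestDeclaredVersion);
```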
This commit adds a mechanism to be exposed internally to allow
plugins/modules to react to termination signals via SPI.
I'm not entirely sure how to test this - normally I'd use an
`ESIntegTestCase` and supply a stubbed version of the plugin but that's
more difficult with SPI.
Supersedes https://github.com/elastic/elasticsearch/pull/95518
We want to allow overriding the info (GET /) API in serverless, therefore this commit moves the RestMainAction and its transport classes into a module that has a REST plugin.
The main endpoint is often used in testing to verify that a cluster is ready, hence this commit also has to add a testing dependency on main to a lot of modules.
Relates #95422
This change at a high level adds global ranking on the coordinating node at the end of query reduction
prior to the fetch phase. Individual rank methods are defined in plugins.
The first rank plugin added as part of this change is reciprocal rank fusion (RRF). RRF uses a relatively
simple formula for merging 1..n result sets together: sum(1/(k+d)), where k is a ranking constant
and d is a document's scored position within a result set from a query.
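As a worked illustration of the formula (not the plugin's actual code), merging two ranked result sets with k=60:
```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Each result set contributes 1 / (k + d) for every document it contains,
// where d is the document's 1-based rank within that result set.
public class RrfExample {
    public static Map<String, Double> rrf(List<List<String>> rankedResultSets, int k) {
        Map<String, Double> scores = new HashMap<>();
        for (List<String> results : rankedResultSets) {
            for (int rank = 0; rank < results.size(); rank++) {
                scores.merge(results.get(rank), 1.0 / (k + rank + 1), Double::sum);
            }
        }
        return scores;
    }

    public static void main(String[] args) {
        // Doc "b" ranks highly in both sets, so it wins the combined ranking.
        var bm25 = List.of("a", "b", "c");
        var knn = List.of("b", "c", "d");
        System.out.println(rrf(List.of(bm25, knn), 60));
    }
}
```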
The preallocate module needs access to java.io internals. However, in
order to open java.io to a specific module, rather than to the unnamed
module as was previously done, said module must be in the boot
layer.
This commit moves the preallocate module to libs. It adds it to the main
lib dir, though it does not add it as a compile dependency of server.
We built quite a bit of infrastructure to have one polling job
running via the `SchedulerEngine` and `ActiveSchedule`. This moves that
infrastructure out of x-pack into server so that Elasticsearch modules can use
it and avoid re-implementing it using `threadPool.schedule`.
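Usage then looks roughly like this sketch (constructor and listener signatures are close to, but may not exactly match, the relocated API):
```java
import java.time.Clock;

import org.elasticsearch.common.scheduler.SchedulerEngine;
import org.elasticsearch.common.settings.Settings;

public class PollerSetup {
    public static void main(String[] args) {
        SchedulerEngine engine = new SchedulerEngine(Settings.EMPTY, Clock.systemUTC());
        // Listeners receive an event naming the job that fired.
        engine.register(event -> System.out.println("triggered: " + event.getJobName()));
        // A Schedule answers: given the last start time and now, when is the next run?
        engine.add(new SchedulerEngine.Job("my-poller", (startTime, now) -> now + 60_000));
    }
}
```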
Introduces:
* New action that can be used to broadcast to unpromotable shards of a given IndexShardRoutingTable.
* New hook in ReplicationOperation for custom logic when the primary operation completes. If there is a failure, this increases the shard failures of the replication operation.
* Refresh action now uses the new hook to broadcast the unpromotable refresh action to all unpromotable shards.
Fixes ES-5454
Fixes ES-5212
Refactoring that drops the `api` suffix from the package name.
This will have to be followed up by an imports fix in plugins/examples.
Also sets the artifact group name to `org.elasticsearch.plugin` in the plugin-api and plugin-analysis-api.
This doc values format is meant to be used for TSDB. It provides
compression using delta encoding, GCD encoding and bit-packing
encoding. The expectation is that such encoding works well for
fields whose values are monotonically increasing, like the timestamp
field and counter metric fields. Encoding and decoding fields using
the new format is available behind a feature flag.
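To see why these encodings compose well for monotonically increasing values, consider this self-contained illustration (not the codec's actual code):
```java
import java.math.BigInteger;
import java.util.Arrays;

public class DeltaGcdExample {
    public static void main(String[] args) {
        // Millisecond timestamps arriving in increasing order; the base
        // value would be stored separately by a real encoder.
        long[] timestamps = {1_000_000, 1_001_000, 1_002_000, 1_004_000};

        // Delta encoding: store differences from the previous value.
        long[] deltas = new long[timestamps.length];
        for (int i = 1; i < timestamps.length; i++) {
            deltas[i] = timestamps[i] - timestamps[i - 1];
        }

        // GCD encoding: divide deltas by their greatest common divisor.
        BigInteger gcd = BigInteger.ZERO;
        for (long d : deltas) gcd = gcd.gcd(BigInteger.valueOf(d));
        for (int i = 0; i < deltas.length; i++) {
            deltas[i] /= gcd.longValueExact();
        }

        // Deltas {0, 1000, 1000, 2000} have GCD 1000, leaving {0, 1, 1, 2}:
        // tiny values that bit-packing can store in a couple of bits each.
        System.out.println(Arrays.toString(deltas));
    }
}
```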