Adds `WriteScript` as the common base class for the write scripts: `IngestScript`, `UpdateScript`, `UpdateByQueryScript` and `ReindexScript`.
This pulls the common `getCtx()` and `metadata()` methods into the base class and prepares for the implementation of the ingest fields api (https://github.com/elastic/elasticsearch/issues/79155).
As part of the refactor, `IngestScript` now takes a `CtxMap` directly rather than taking "sourceAndMetadata" (`CtxMap`) and `Metadata` (from `CtxMap`). There is a new `getCtxMap()` getter to get the typed `CtxMap`. `getSourceAndMetadata` could have been refactored to do this, but most of the callers of that don't need to know about `CtxMap` and are happy with a `Map<String, Object>`.
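A minimal sketch of the resulting hierarchy. The class names and the `getCtx()`, `metadata()` and `getCtxMap()` accessors come from the description above. The bodies of `Metadata` and `CtxMap` are hypothetical, heavily simplified stand-ins; the real classes validate much more.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical, simplified stand-in for the real Metadata class.
class Metadata {
    private final Map<String, Object> values = new HashMap<>();

    Object get(String key) { return values.get(key); }

    void put(String key, Object value) { values.put(key, value); }
}

// Hypothetical, simplified stand-in for CtxMap: the source map plus typed metadata.
class CtxMap extends HashMap<String, Object> {
    private final Metadata metadata;

    CtxMap(Map<String, Object> source, Metadata metadata) {
        super(source);
        this.metadata = metadata;
    }

    Metadata getMetadata() { return metadata; }
}

// Common base class for the write scripts: owns the ctx map and the shared accessors.
abstract class WriteScript {
    private final CtxMap ctxMap;

    protected WriteScript(CtxMap ctxMap) { this.ctxMap = ctxMap; }

    /** Untyped view, which is all most callers need. */
    public Map<String, Object> getCtx() { return ctxMap; }

    /** Typed metadata accessor shared by all write scripts. */
    public Metadata metadata() { return ctxMap.getMetadata(); }

    protected CtxMap ctxMap() { return ctxMap; }

    public abstract void execute();
}

// IngestScript now receives the CtxMap directly instead of a map plus separate Metadata.
abstract class IngestScript extends WriteScript {
    protected IngestScript(CtxMap ctxMap) { super(ctxMap); }

    /** Typed getter for the few callers that need the CtxMap itself. */
    public CtxMap getCtxMap() { return ctxMap(); }
}
```

Keeping `getCtx()` typed as `Map<String, Object>` while adding a separate typed getter mirrors the reasoning above: most callers never need to know about `CtxMap`.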
Removes the custom dependency checksum functionality in favor of Gradle's built-in dependency verification support.
- Uses SHA-256 instead of SHA-1, as SHA-1 is no longer considered safe.
Closes https://github.com/elastic/elasticsearch/issues/69736
Part of #84369. Implement the `Tracer` interface by providing a
module that uses OpenTelemetry, along with Elastic's APM
agent for Java.
See the file `TRACING.md` for background on the changes and the
reasoning for some of the implementation decisions.
The configuration mechanism is the most fiddly part of this PR. The
Security Manager permissions required by the APM Java agent make
it prohibitive to start the agent programmatically from within
Elasticsearch, so it must be configured when the ES JVM starts.
That means that the startup CLI needs to assemble the required JVM
options.
To complicate matters further, the APM agent needs a secret token
in order to ship traces to the APM server. We can't use Java system
properties to configure this, since otherwise the secret will be
readable by all code in Elasticsearch. It therefore has to be
configured in a dedicated config file. This in itself is awkward,
since we don't want to leave secrets in config files. Therefore,
we pull the APM secret token from the keystore, write it to a config
file, then delete the config file after ES starts.
There's a further issue with the config file. Any options we set
in the APM agent config file cannot later be reconfigured via system
properties, so we need to make sure that only "static" configuration
goes into the config file.
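A minimal sketch of how the startup CLI could split settings between the agent config file and JVM options. It assumes the APM Java agent's `elastic.apm.` system-property prefix and its `config_file` option. The `ApmJvmOptions` class and the set of file-only keys are hypothetical, not the ones this PR uses.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

class ApmJvmOptions {
    // Hypothetical: keys treated as "static", i.e. never reconfigured later via system properties.
    static final Set<String> FILE_ONLY_KEYS = Set.of("secret_token");

    /**
     * Writes file-only settings (including the secret token) to configFile and returns
     * JVM options for everything else, so the secret never becomes a system property.
     */
    static List<String> build(Path configFile, Map<String, String> settings) throws IOException {
        StringBuilder fileContents = new StringBuilder();
        List<String> options = new ArrayList<>();
        // TreeMap gives a deterministic order for the generated file and options.
        for (Map.Entry<String, String> e : new TreeMap<>(settings).entrySet()) {
            if (FILE_ONLY_KEYS.contains(e.getKey())) {
                fileContents.append(e.getKey()).append('=').append(e.getValue()).append('\n');
            } else {
                options.add("-Delastic.apm." + e.getKey() + "=" + e.getValue());
            }
        }
        Files.writeString(configFile, fileContents.toString());
        options.add("-Delastic.apm.config_file=" + configFile);
        return options;
    }

    /** Called once Elasticsearch has started and the agent has read the file. */
    static void cleanUp(Path configFile) throws IOException {
        Files.deleteIfExists(configFile);
    }
}
```

Only keys listed as file-only end up in the file, which keeps every other setting open to later reconfiguration through system properties.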
I generated most of the files under `qa/apm` using an APM test
utility (I can't remember which one now, unfortunately). The goal
is to set up a complete system so that traces can be captured in
APM server, and the results in Elasticsearch inspected.
This change converts the range query from an array to an object.
```
"range": {
  "number": [
    {
      "gte": 4
    }
  ]
}
```
to
```
"range": {
  "number": {
    "gte": 4
  }
}
```
This change adds a `SourceValueFetcherSortedDoubleIndexFieldData` to support source fallback for double doc values types, and uses it to add source fallback support to the `double`, `float` and `half_float` field types.
This change adds source fallback support for `byte`, `short` and `long` fields. These reuse the
existing `SourceValueFetcherSortedNumericIndexFieldData` class.
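Source fallback has to present values parsed from `_source` the way doc values would: converted to the field's numeric type and sorted ascending per document, with duplicates kept. A rough sketch, assuming one document's source values arrive as a list of numbers or numeric strings. `SourceNumericValues` is hypothetical and is not the `SourceValueFetcherSorted*IndexFieldData` implementation. Precision narrowing for `float` and `half_float` is omitted.

```java
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: turns one document's _source values into doc-values-like arrays.
final class SourceNumericValues {
    private SourceNumericValues() {}

    /** Parses values as longs (byte, short, long fields), sorted like sorted numeric doc values. */
    static long[] sortedLongs(List<?> sourceValues) {
        long[] out = new long[sourceValues.size()];
        for (int i = 0; i < out.length; i++) {
            Object v = sourceValues.get(i);
            out[i] = v instanceof Number n ? n.longValue() : Long.parseLong(v.toString());
        }
        Arrays.sort(out); // ascending order, duplicates retained
        return out;
    }

    /** Parses values as doubles (double, float, half_float fields), sorted ascending. */
    static double[] sortedDoubles(List<?> sourceValues) {
        double[] out = new double[sourceValues.size()];
        for (int i = 0; i < out.length; i++) {
            Object v = sourceValues.get(i);
            out[i] = v instanceof Number n ? n.doubleValue() : Double.parseDouble(v.toString());
        }
        Arrays.sort(out);
        return out;
    }
}
```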
This removes many calls to the last remaining `createParser` method that
I deprecated in #79814, migrating callers to one of the new methods that
it created.
There were some cases where synthetic source wasn't properly rounding in
round trips. `0.15527719259262085` with a scaling factor of
`2.4206374697469164E16` was round tripping to `0.15527719259262088`
which then round trips up to `0.1552771925926209`, rounding the wrong
direction! This fixes the round tripping in this case through ever more
paranoid double checking and nudging.
Closes #88854
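A sketch of the nudging idea, assuming the `scaled_float` scheme of storing `Math.round(value * scalingFactor)` as a long and decoding with a division. After decoding, re-encode the result and step one ulp at a time with `Math.nextUp`/`Math.nextDown` until re-encoding reproduces the stored long. This makes synthetic source stable under repeated round trips. `ScaledFloatRoundTrip` is hypothetical, and the PR's actual checks are more involved.

```java
// Hypothetical illustration of stabilizing scaled_float round trips.
final class ScaledFloatRoundTrip {
    private static final int MAX_NUDGES = 64;

    private ScaledFloatRoundTrip() {}

    /** Encoding assumed for scaled_float: scale, then round to the nearest long. */
    static long encode(double value, double scalingFactor) {
        return Math.round(value * scalingFactor);
    }

    /**
     * Decodes a stored long, then nudges the result one ulp at a time until
     * re-encoding it yields the same long again (best effort, bounded).
     */
    static double decodeStable(long scaled, double scalingFactor) {
        double decoded = scaled / scalingFactor;
        for (int i = 0; i < MAX_NUDGES; i++) {
            long reencoded = encode(decoded, scalingFactor);
            if (reencoded == scaled) {
                return decoded;
            }
            // Step toward the side that re-encodes to the stored value.
            decoded = reencoded < scaled ? Math.nextUp(decoded) : Math.nextDown(decoded);
        }
        return decoded; // no exact round trip found; fall back to the last guess
    }
}
```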
Adds metadata classes for Reindex and UpdateByQuery contexts.
For Reindex metadata:
* _index can't be null
* _id, _routing and _version are writable and nullable
* _now is read-only
* op is read-write and must be 'noop', 'index' or 'delete'
Reindex metadata keeps the original values for _index, _id, _routing and _version
so that `Reindexer` can see if they've changed.
If _version is null in the ctx map, or, equivalently, the augmentation
`setVersionToInternal()` was called by the script, `Reindexer` sets document
versioning to internal. If `_version` is `null` in the ctx map, `getVersion`
returns `Long.MIN_VALUE`.
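The Reindex rules above can be sketched as a small class that remembers the original values and enforces the `_index` and `op` constraints. Only `getVersion()` and `setVersionToInternal()` come from the description. `ReindexMetadataSketch`, its other methods and the assumed default op of `index` are hypothetical. `_routing` and `_now` are left out for brevity.

```java
import java.util.Objects;
import java.util.Set;

// Hypothetical sketch of the Reindex metadata rules.
class ReindexMetadataSketch {
    private static final Set<String> VALID_OPS = Set.of("noop", "index", "delete");

    // Originals are kept so the caller can detect changes made by the script.
    private final String originalIndex;
    private final String originalId;
    private final Long originalVersion;

    private String index;
    private String id;
    private Long version; // null means "use internal versioning"
    private String op = "index"; // assumed default

    ReindexMetadataSketch(String index, String id, Long version) {
        this.originalIndex = Objects.requireNonNull(index, "_index cannot be null");
        this.originalId = id;
        this.originalVersion = version;
        this.index = index;
        this.id = id;
        this.version = version;
    }

    void setIndex(String index) { this.index = Objects.requireNonNull(index, "_index cannot be null"); }

    void setId(String id) { this.id = id; } // nullable

    void setOp(String op) {
        // Set.of(...).contains(null) throws, so check null explicitly.
        if (op == null || VALID_OPS.contains(op) == false) {
            throw new IllegalArgumentException("op must be one of " + VALID_OPS + " but was [" + op + "]");
        }
        this.op = op;
    }

    /** Equivalent to setting _version to null in the ctx map. */
    void setVersionToInternal() { this.version = null; }

    boolean isVersionInternal() { return version == null; }

    /** Long.MIN_VALUE stands in for a null _version. */
    long getVersion() { return version == null ? Long.MIN_VALUE : version; }

    String getOp() { return op; }

    boolean indexChanged() { return Objects.equals(index, originalIndex) == false; }

    boolean idChanged() { return Objects.equals(id, originalId) == false; }

    boolean versionChanged() { return Objects.equals(version, originalVersion) == false; }
}
```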
For UpdateByQuery metadata:
* _index, _id, _version, _routing are all read-only
* _routing is also nullable
* _now is read-only
* op is read-write and one of 'index', 'noop', 'delete'
Closes: #86472
This change adds an operation parameter to `FieldDataContext` that allows us to specialize the field data returned from `fielddataBuilder` in `MappedFieldType`. The keyword, integer and geo point field types now support source fallback for the `SCRIPT` operation: if a field has no doc values, we build a doc values wrapper that reads from source instead. This gives the scripting fields API source fallback.
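A sketch of how a context carrier with an operation parameter lets a field type choose a source-backed implementation only for scripts. The record components follow the descriptions of `FieldDataContext` in this changelog (index name, lookup supplier, operation). `Supplier<Object>` stands in for the real `SearchLookup` supplier. The enum constant `SEARCH`, `KeywordFieldTypeSketch` and the string-returning builder are hypothetical simplifications; real builders return field data builder objects.

```java
import java.util.function.Supplier;

// Which consumer is asking for field data (SEARCH is an assumed name).
enum FielddataOperation { SEARCH, SCRIPT }

// Carrier object: new parameters go here instead of growing every fielddataBuilder signature.
record FieldDataContext(String fullyQualifiedIndexName, Supplier<Object> lookupSupplier, FielddataOperation operation) {}

// Hypothetical field type that returns a label for the builder it would use.
class KeywordFieldTypeSketch {
    private final boolean hasDocValues;

    KeywordFieldTypeSketch(boolean hasDocValues) { this.hasDocValues = hasDocValues; }

    String fielddataBuilder(FieldDataContext context) {
        if (hasDocValues) {
            return "doc-values";
        }
        if (context.operation() == FielddataOperation.SCRIPT) {
            // No doc values: scripts fall back to values loaded from _source.
            return "source-fallback";
        }
        throw new IllegalArgumentException("field has no doc values and cannot be used for " + context.operation());
    }
}
```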
The value is `_now` and there was a previous metadata
value `_timestamp` (see test removal in #88733) so the
name is confusing.
Also renames the method `getTimestamp()` to `getNow()`
to reflect the change.
This formats the result of the `fields` section of the `_search` API for
runtime `geo_point` fields using the `format` parameter like we do for
non-runtime `geo_point` fields. This changes the default format for
those fields from `lat, lon` to `geojson` with the option to get `wkt`
or any other format we support.
The fix does so by preserving the `double, double` nature of the
`geo_point` rather than encoding it immediately in the script. Callers can
then use the two doubles however they need: the field fetchers use them
natively, preserving as much precision as possible, while the queries
quantize the points exactly as Lucene indexing does, just as the script did before this PR.
Closes #85245
Allow UpdateByQuery to read the doc version if set in the request via
`version=true`.
If `version=true` is unset or false, `ctx._version` is `-1`,
indicating internal versioning via sequence numbers.
Fixes: #55745
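For UpdateByQuery, the metadata fields are read-only apart from `op`, and `_version` depends on whether the request set `version=true`. A minimal sketch. `UpdateByQueryMetadataSketch`, its constructor and the assumed default op of `index` are hypothetical.

```java
import java.util.Set;

// Hypothetical sketch of UpdateByQuery metadata: read-only fields, writable op.
class UpdateByQueryMetadataSketch {
    private static final Set<String> VALID_OPS = Set.of("index", "noop", "delete");

    private final String index;
    private final String id;
    private final String routing; // may be null
    private final long version;
    private String op = "index"; // assumed default

    UpdateByQueryMetadataSketch(String index, String id, String routing, long docVersion, boolean versionRequested) {
        this.index = index;
        this.id = id;
        this.routing = routing;
        // Without version=true, -1 signals internal versioning.
        this.version = versionRequested ? docVersion : -1L;
    }

    // Read-only fields expose getters only.
    String getIndex() { return index; }

    String getId() { return id; }

    String getRouting() { return routing; }

    long getVersion() { return version; }

    String getOp() { return op; }

    void setOp(String op) {
        if (op == null || VALID_OPS.contains(op) == false) {
            throw new IllegalArgumentException("op must be one of " + VALID_OPS + " but was [" + op + "]");
        }
        this.op = op;
    }
}
```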
MappedFieldType#fielddataBuilder() currently takes two parameters, a fully qualified
index name and a supplier for a SearchLookup. We expect to add more parameters here
as we add support for loading fielddata from source. Rather than telescoping the
parameter list, this commit instead introduces a new FieldDataContext carrier object
which will allow us to add to these context parameters more easily.
In #88015 we made it so that downloads from S3 would sometimes retry
more than the configured limit, if each attempt seemed to be making
meaningful progress. This causes the failure of some assertions that the
number of retries was exactly as expected. This commit weakens those
assertions for S3 repositories.
Closes #88784
Closes #88666
Part of #84369. Split out from #87696. Introduce tracing interfaces in
advance of adding APM support to Elasticsearch. The only implementation
at this point is a no-op class.
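The pattern is an interface plus a shared no-op implementation, so call sites can be instrumented before any real tracer exists. The method names `startTrace`/`stopTrace`, their parameters and `TracedTask` are assumptions for illustration, not the interface this change introduces.

```java
import java.util.Map;

// Hypothetical tracing interface with a built-in no-op implementation.
interface Tracer {
    /** Starts a span identified by id, with optional attributes. */
    void startTrace(String id, String name, Map<String, Object> attributes);

    /** Ends the span started with the same id. */
    void stopTrace(String id);

    /** Shared no-op instance: tracing calls do nothing until a real implementation is installed. */
    Tracer NOOP = new Tracer() {
        @Override
        public void startTrace(String id, String name, Map<String, Object> attributes) {}

        @Override
        public void stopTrace(String id) {}
    };
}

// Hypothetical call site that is instrumented regardless of which Tracer it gets.
class TracedTask {
    private final Tracer tracer;

    TracedTask(Tracer tracer) { this.tracer = tracer; }

    int run(String taskId) {
        tracer.startTrace(taskId, "task", Map.of());
        try {
            return 42; // stands in for the traced work
        } finally {
            tracer.stopTrace(taskId);
        }
    }
}
```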
This PR adds a new `knn` option to the `_search` API to support ANN search.
It's powered by the same Lucene ANN capabilities as the old `_knn_search`
endpoint. The `knn` option can be combined with other search features like
queries and aggregations.
Addresses #87625
If we run into a seed that causes many fake exceptions and thus retries,
a 100ms retry interval will add up to minutes of test time for tests like
`testLargeBlobCountDeletion` that trigger thousands of requests.
As far as I can see, there's no reason not to speed this up by 10x with
more aggressive retry timings, so I reduced the timings to keep tests
from randomly blocking.
Adds the `metadata()` API call and a Metadata class for the Update context.
There are different metadata available in the update context depending
on whether it is an update or an insert (via upsert).
For update, scripts can read `index`, `id`, `routing`, `version` and `timestamp`.
For insert, scripts can read `index`, `id` and `timestamp`.
Scripts can always read and write the `op` but the available ops are different.
Updates allow 'noop', 'index' and 'delete'.
Inserts allow 'noop' and 'create'.
Refs: #86472
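The update/insert split above can be sketched as two allowed-op sets and two readable-field sets chosen by whether the upsert path is taken. `UpdateOps`, `validateOp` and `readableFields` are hypothetical names. The op and field lists come from the description.

```java
import java.util.Set;

// Hypothetical helper encoding the update-context metadata rules.
final class UpdateOps {
    static final Set<String> UPDATE_OPS = Set.of("noop", "index", "delete");
    static final Set<String> INSERT_OPS = Set.of("noop", "create");

    private UpdateOps() {}

    /** Validates an op written by a script; insert is true when the upsert path is taken. */
    static String validateOp(String op, boolean insert) {
        Set<String> allowed = insert ? INSERT_OPS : UPDATE_OPS;
        if (op == null || allowed.contains(op) == false) {
            throw new IllegalArgumentException(
                "op must be one of " + allowed + " for " + (insert ? "insert" : "update") + " but was [" + op + "]"
            );
        }
        return op;
    }

    /** Metadata readable by scripts in each case. */
    static Set<String> readableFields(boolean insert) {
        return insert ? Set.of("index", "id", "timestamp") : Set.of("index", "id", "routing", "version", "timestamp");
    }
}
```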
Currently we have two parameters that control how the source of a document
is stored, `enabled` and `synthetic`, both booleans. However, there are only
three possible combinations of these, with `enabled:false` and `synthetic:true`
being disallowed. To make this easier to reason about, this commit replaces
the `enabled` parameter with a new `mode` parameter, which can take the values
`stored`, `synthetic` and `disabled`. The `mode` parameter cannot be set
in combination with `enabled`, and we will subsequently move towards
deprecating `enabled` entirely.
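A sketch of how the old booleans map onto the new `mode` values and how setting `mode` together with `enabled` can be rejected. The `SourceMode` enum, the `resolve` method and the error messages are hypothetical. A `null` argument means the parameter was not set in the mapping.

```java
// Hypothetical resolution of the _source mode from old and new parameters.
enum SourceMode {
    STORED, SYNTHETIC, DISABLED;

    static SourceMode resolve(SourceMode mode, Boolean enabled, Boolean synthetic) {
        if (mode != null) {
            if (enabled != null) {
                throw new IllegalArgumentException("Cannot set both [mode] and [enabled] parameters");
            }
            return mode;
        }
        boolean isEnabled = enabled == null || enabled;      // enabled defaults to true
        boolean isSynthetic = synthetic != null && synthetic; // synthetic defaults to false
        if (isEnabled == false && isSynthetic) {
            // The one disallowed combination of the old booleans.
            throw new IllegalArgumentException("synthetic source requires [enabled: true]");
        }
        if (isEnabled == false) {
            return DISABLED;
        }
        return isSynthetic ? SYNTHETIC : STORED;
    }
}
```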
Create a `Metadata` superclass for ingest and update contexts.
Create a `CtxMap` superclass for `ctx` backwards compatibility in ingest and update contexts. `script.CtxMap` was moved from `ingest.IngestSourceAndMetadata`.
`CtxMap` takes a `Metadata` subclass and validates updates via the `FieldProperty`s passed in.
`Metadata` provides typed getters and setters and implements a `Map`-like interface, making it easy for a class containing `CtxMap` to implement the full `Map` interface.
The `FieldProperty` record configures how fields are validated. Fields have a `type`, are `writeable` or read-only, are `nullable` or not, and may have an additional validator, which is useful for set or enum membership checks.
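A minimal sketch of a `FieldProperty`-style record and how a map could consult it on every write. The record components (type, writeable, nullable, extra validator) follow the description above. The `check` method, `MetadataProperties` and its example table are hypothetical.

```java
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical FieldProperty-style record: how a single metadata field may be written.
record FieldProperty<T>(Class<T> type, boolean writeable, boolean nullable, Predicate<T> extraValidation) {

    /** Throws if writing value to key would violate this property. */
    void check(String key, Object value) {
        if (writeable == false) {
            throw new IllegalArgumentException(key + " cannot be updated");
        }
        if (value == null) {
            if (nullable == false) {
                throw new IllegalArgumentException(key + " cannot be null");
            }
            return;
        }
        if (type.isInstance(value) == false) {
            throw new IllegalArgumentException(key + " must be a " + type.getSimpleName());
        }
        if (extraValidation != null && extraValidation.test(type.cast(value)) == false) {
            throw new IllegalArgumentException(key + " has an invalid value [" + value + "]");
        }
    }
}

// Hypothetical example table; each real context defines its own properties.
final class MetadataProperties {
    static final Map<String, FieldProperty<?>> PROPERTIES = Map.of(
        "_index", new FieldProperty<>(String.class, true, false, null),
        "_now", new FieldProperty<>(Long.class, false, false, null),
        "op", new FieldProperty<>(String.class, true, false, Set.of("noop", "index", "delete")::contains)
    );

    private MetadataProperties() {}
}
```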
Pull out the implementation of `Metadata` from `IngestSourceAndMetadata`.
`Metadata` will become a base class extended by the update contexts: ingest, update, update by query and reindex.
`Metadata` implements a map-like interface, making it easy for a class containing `Metadata` to implement the full `Map` interface.
The method `BucketCollector#getLeafCollector(LeafReaderContext)` should be removed in favour of
`BucketCollector#getLeafCollector(AggregationExecutionContext)`.
DocValueFieldExistsQuery, NormsFieldExistsQuery as well as KnnVectorFieldExistsQuery are deprecated in Lucene in favour of FieldExistsQuery which combines the three into a single query.
This commit updates Elasticsearch to no longer rely on such deprecated queries.
See https://issues.apache.org/jira/browse/LUCENE-10436
The invokedynamic instruction does not perfectly follow the Painless casting model, so Painless
added bridge methods where necessary to ensure symmetric behavior between compile-time and run-time
casting with boxed types. This change replaces the specialized class loader and bridge methods with
filtered method handles, which reduces the overall complexity of runtime casting.
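As a rough illustration of the technique (not the Painless implementation), `MethodHandles.filterArguments` can adapt a target that takes a primitive `long` to a call site passing a boxed `Integer`. It does this by running a conversion handle on the argument first, instead of generating a bridge method. The `Casts` class and its methods are hypothetical.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical example of replacing a bridge method with a filtered method handle.
final class Casts {
    private Casts() {}

    /** Target method with a primitive parameter. */
    static long twice(long value) {
        return value * 2;
    }

    /** Conversion applied to the argument before it reaches the target. */
    static long integerToLong(Integer value) {
        return value.longValue(); // unbox, then widen int to long
    }

    /** Builds a handle of type (Integer)long without any bridge method. */
    static MethodHandle adaptedTwice() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle target = lookup.findStatic(Casts.class, "twice", MethodType.methodType(long.class, long.class));
        MethodHandle filter = lookup.findStatic(Casts.class, "integerToLong", MethodType.methodType(long.class, Integer.class));
        // Argument 0 passes through the filter first, so the result has type (Integer)long.
        return MethodHandles.filterArguments(target, 0, filter);
    }
}
```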
S3 sometimes enters a state where blob downloads repeatedly fail but
with nontrivial progress between failures. Often each attempt yields 10s
or 100s of MBs of data. Today we abort a download after three (by
default) such failures, but this may not be enough to completely
retrieve a large blob during one of these flaky patches.
With this commit we start to avoid counting download attempts that
retrieved at least 1% of the configured `buffer_size` (typically 1MB)
towards the maximum number of retries.
Closes #87243
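The retry rule can be sketched as a counter that only charges failed attempts with little progress against the retry limit. The threshold is 1% of the configured `buffer_size`, as described above. `RetryBudget` is a hypothetical helper, not the S3 repository's implementation.

```java
// Hypothetical retry budget that ignores failures which made meaningful progress.
final class RetryBudget {
    private final int maxRetries;
    private final long meaningfulProgressBytes;
    private int countedFailures;

    RetryBudget(int maxRetries, long bufferSizeBytes) {
        this.maxRetries = maxRetries;
        // 1% of buffer_size, e.g. roughly 10KB for a 1MB buffer_size.
        this.meaningfulProgressBytes = Math.max(1, bufferSizeBytes / 100);
    }

    /**
     * Records a failed download attempt and returns whether another attempt is allowed.
     * Attempts that read at least meaningfulProgressBytes are not counted against maxRetries.
     */
    boolean onFailure(long bytesReadThisAttempt) {
        if (bytesReadThisAttempt < meaningfulProgressBytes) {
            countedFailures++;
        }
        return countedFailures <= maxRetries;
    }
}
```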
Adds the `Metadata` class and `metadata()` method to the ingest context.
Metadata has getters and setters for index, id, routing, version and versionType.
It also has a getter for timestamp.
Refs: #86472