6664 Commits

Author SHA1 Message Date
Stuart Tettemer 264f09f3d5
Script: Common base class for write scripts (#89141)
Adds `WriteScript` as the common base class for the write scripts: `IngestScript`, `UpdateScript`, `UpdateByQueryScript` and `ReindexScript`.

This pulls the common `getCtx()` and `metadata()` methods into the base class and prepares for the implementation of the ingest fields api (https://github.com/elastic/elasticsearch/issues/79155).

As part of the refactor, `IngestScript` now takes a `CtxMap` directly rather than taking "sourceAndMetadata" (`CtxMap`) and `Metadata` (from `CtxMap`).  There is a new `getCtxMap()` getter to get the typed `CtxMap`.  `getSourceAndMetadata` could have been refactored to do this, but most of the callers of that don't need to know about `CtxMap` and are happy with a `Map<String, Object>`.
2022-08-09 12:31:18 -05:00
Jack Conradson 81265d2c2a
Add support for source fallback with scaled float field type (#89053)
This change adds source fallback support for scaled float. This uses the already existing class 
SourceValueFetcherSortedDoubleIndexFieldData.
2022-08-08 08:39:13 -07:00
Jack Conradson 24e367fe0f
Add support for source fallback with the boolean field type (#89052)
This change adds a SourceValueFetcherSortedBooleanIndexFieldData to support boolean doc values 
for source fallback.
2022-08-08 08:38:48 -07:00
David Turner c08111b5b7
Avoid expensive call to Span.fromContextOrNull(null) (#89135)
Workaround for #89107
2022-08-05 02:07:15 +09:30
Mary Gouseti 418883aeb9
maybeScheduleNow with delay 0 instead of 1 (#89110)
Replace the 1 millisecond delay with 0 when we want to schedule a
monitoring task now.
2022-08-04 21:39:57 +09:30
Rene Groeschke 3909b5eaf9
Add verification metadata for dependencies (#88814)
Removing the custom dependency checksum functionality in favor of Gradle's built-in dependency verification support.

- Use sha256 in favor of sha1 as sha1 is not considered safe these days.

Closes https://github.com/elastic/elasticsearch/issues/69736
2022-08-04 09:51:16 +02:00
Rory Hunter 512bfebc10
Provide tracing implementation using OpenTelemetry + APM agent (#88443)
Part of #84369. Implement the `Tracer` interface by providing a
module that uses OpenTelemetry, along with Elastic's APM
agent for Java.

See the file `TRACING.md` for background on the changes and the
reasoning for some of the implementation decisions.

The configuration mechanism is the most fiddly part of this PR. The
Security Manager permissions required by the APM Java agent make
it prohibitive to start an agent from within Elasticsearch
programmatically, so it must be configured when the ES JVM starts.
That means that the startup CLI needs to assemble the required JVM
options.

To complicate matters further, the APM agent needs a secret token
in order to ship traces to the APM server. We can't use Java system
properties to configure this, since otherwise the secret will be
readable to all code in Elasticsearch. It therefore has to be
configured in a dedicated config file. This in itself is awkward,
since we don't want to leave secrets in config files. Therefore,
we pull the APM secret token from the keystore, write it to a config
file, then delete the config file after ES starts.

There's a further issue with the config file. Any options we set
in the APM agent config file cannot later be reconfigured via system
properties, so we need to make sure that only "static" configuration
goes into the config file.

I generated most of the files under `qa/apm` using an APM test
utility (I can't remember which one now, unfortunately). The goal
is to set up a complete system so that traces can be captured in
APM server, and the results in Elasticsearch inspected.
2022-08-03 14:13:31 +01:00
Philip Krauss d2f99f5baf
Change YAML test structure from list to object (#77700)
This change converts the range query from an array to an object.

```
"range": {
  "number": [
    {
      "gte": 4
    }
  ]
}
```

to

```
"range": {
  "number": {
    "gte": 4
  }
}
```
2022-08-03 03:47:41 +09:30
Jack Conradson 3bb4a84bdd
Support source fallback for double, float, and half_float field types (#89010)
This change adds a SourceValueFetcherSortedDoubleIndexFieldData to support double doc values types for source fallback. This also adds support for double, float and half_float field types.
2022-08-02 10:13:58 -07:00
Jack Conradson 5194d29b1c
Support source fallback for byte, short, and long fields (#88954)
This change adds source fallback support for byte, short, and long fields. These use the already 
existing class SourceValueFetcherSortedNumericIndexFieldData.
2022-08-01 08:23:36 -07:00
Nik Everett 87ab933c8b
Remove calls to deprecated xcontent method (#84733)
This removes many calls to the last remaining `createParser` method that
I deprecated in #79814, migrating callers to one of the new methods that
it created.
2022-08-01 22:18:03 +09:30
Nik Everett 4607182ce8
synthetic source: fix scaled_float rounding (#88916)
There were some cases where synthetic source wasn't properly rounding in
round trips. `0.15527719259262085` with a scaling factor of
`2.4206374697469164E16` was round tripping to `0.15527719259262088`
which then round trips up to `0.1552771925926209`, rounding the wrong
direction! This fixes the round tripping in this case through ever more
paranoid double checking and nudging.
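
A minimal sketch of the kind of round trip described above, assuming a scaled float is stored as `Math.round(value * scalingFactor)` and read back by dividing by the scaling factor. The class and method names are hypothetical and this is not the Elasticsearch implementation; it only shows how one could check whether a value survives a decode and re-encode cycle unchanged.

```java
// Hypothetical sketch of a scaled_float-style round trip; not the Elasticsearch code.
final class ScaledFloatRoundTrip {
    /** Encode a double as a long by scaling and rounding (assumed encoding). */
    static long encode(double value, double scalingFactor) {
        return Math.round(value * scalingFactor);
    }

    /** Decode the stored long back to a double. */
    static double decode(long encoded, double scalingFactor) {
        return encoded / scalingFactor;
    }

    /** True when decoding and re-encoding yields the same stored long. */
    static boolean isStable(double value, double scalingFactor) {
        long once = encode(value, scalingFactor);
        long twice = encode(decode(once, scalingFactor), scalingFactor);
        return once == twice;
    }

    public static void main(String[] args) {
        // A benign case and the value/scaling factor pair quoted in the commit message.
        System.out.println(isStable(1.23, 100.0));
        System.out.println(isStable(0.15527719259262085, 2.4206374697469164E16));
    }
}
```

A check like `isStable` is one way to detect values that drift across repeated round trips; the actual fix may use a different strategy.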

Closes #88854
2022-08-01 22:17:23 +09:30
Ignacio Vera ed564f6e1d
Update to lucene-9.3.0 (#88927) 2022-08-01 07:21:13 +02:00
Chris Hegarty 4e3b71b6af
Ensure that the extended socket options TCP_KEEPXXX are available (#88935) 2022-07-29 17:54:33 +01:00
Stuart Tettemer 476da8c4ed
Script: Reindex & UpdateByQuery Metadata (#88665)
Adds metadata classes for Reindex and UpdateByQuery contexts.

For Reindex metadata:
 * _index can't be null
 * _id, _routing and _version are writable and nullable
 * _now is read-only
 * op is read-write and must be 'noop', 'index' or 'delete'

Reindex metadata keeps the original values of _index, _id, _routing and _version
so that `Reindexer` can see if they've changed.

If _version is null in the ctx map, or, equivalently, the augmentation
`setVersionToInternal()` was called by the script, `Reindexer` sets document
versioning to internal.  If `_version` is `null` in the ctx map, `getVersion`
returns `Long.MIN_VALUE`.

For UpdateByQuery metadata:
 * _index, _id, _version, _routing are all read-only
 * _routing is also nullable
 * _now is read-only
 * op is read-write and one of 'index', 'noop', 'delete'

Closes: #86472
2022-07-28 17:21:07 -05:00
Jack Conradson 5e0701f026
Add source fallback for keyword fields using operation (#88735)
This change adds an operation parameter to FieldDataContext that allows us to specialize the field data that are returned from fielddataBuilder in MappedFieldType. Keyword, integer, and geo point field types now support source fallback, where we build a doc values wrapper using source if doc values don't exist for this field under the operation SCRIPT. This allows us to have source fallback in scripting for the scripting fields API.
2022-07-28 10:34:05 -07:00
Stuart Tettemer fb01f4e633
Script: Rename TIMESTAMP constant NOW in Metadata (#88870)
The value is `_now` and there was a previous metadata
value `_timestamp` (see test removal in #88733) so the
name is confusing.

Also renames the method `getTimestamp()` to `getNow()`
to reflect the change.
2022-07-27 17:28:30 -05:00
Nik Everett 3bcee8eaa0
Format runtime geo_points (#85449)
This formats the result of the `fields` section of the `_search` API for
runtime `geo_point` fields using the `format` parameter like we do for
non-runtime `geo_point` fields. This changes the default format for
those fields from `lat, lon` to `geojson` with the option to get `wkt`
or any other format we support.

The fix does so by preserving the `double, double` nature of the
`geo_point` rather than encoding it immediately in the script. Callers can
use the results. The field fetchers use the `double, double` natively,
preserving as much precision as possible. The queries quantize the points
exactly like lucene indexing does, and like the script did before this PR.

Closes #85245
2022-07-27 13:11:07 -04:00
Keith Massey 1424702bac
Fixing a routing bug in 190_script_processor/Test metadata (#88831) 2022-07-27 08:16:00 -05:00
Stuart Tettemer e3f1de2c12
Script: UpdateByQuery can read doc version if requested (#88740)
Allow UpdateByQuery to read the doc version if set in the request via
`version=true`.

If `version=true` is unset or false, the `ctx._version` is `-1`
indicating internal versioning via seq.

Fixes: #55745
2022-07-26 13:00:26 -05:00
Alan Woodward bc8ebbf540
Add FieldDataContext (#88779)
MappedFieldType#fieldDataBuilder() currently takes two parameters, a fully qualified
index name and a supplier for a SearchLookup. We expect to add more parameters here
as we add support for loading fielddata from source. Rather than telescoping the
parameter list, this commit instead introduces a new FieldDataContext carrier object
which will allow us to add to these context parameters more easily.
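
A minimal sketch of the parameter-object pattern described above. The record components mirror the two parameters named in the message (a fully qualified index name and a `SearchLookup` supplier); the `SearchLookup` stand-in, the component names, and the `describe` methods are hypothetical, and the real `FieldDataContext` may look different.

```java
import java.util.function.Supplier;

// Hypothetical sketch of a context carrier object; not the actual Elasticsearch class.
final class FieldDataContextSketch {
    /** Stand-in for the real SearchLookup type (assumption). */
    interface SearchLookup {}

    /** Carrier for the two parameters named in the commit message. */
    record FieldDataContext(String fullyQualifiedIndexName, Supplier<SearchLookup> lookupSupplier) {}

    /** Telescoping style: every new parameter widens this signature and all its callers. */
    static String describeTelescoped(String fullyQualifiedIndexName, Supplier<SearchLookup> lookupSupplier) {
        return "fielddata for " + fullyQualifiedIndexName;
    }

    /** Carrier style: new context values are added to the record, not to this signature. */
    static String describe(FieldDataContext context) {
        return "fielddata for " + context.fullyQualifiedIndexName();
    }

    public static void main(String[] args) {
        FieldDataContext context = new FieldDataContext("cluster:index", () -> new SearchLookup() {});
        System.out.println(describe(context));
    }
}
```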
2022-07-26 14:47:50 +01:00
Nik Everett d244fde850
TSDB: Rename some methods (#88790)
This renames a couple of methods on `IndexMode` to be more clear and
adds some javadoc.
2022-07-26 09:08:45 -04:00
David Turner 942e7a76ca
Relax assertion about retry count for S3 repos (#88801)
In #88015 we made it so that downloads from S3 would sometimes retry
more than the configured limit, if each attempt seemed to be making
meaningful progress. This causes the failure of some assertions that the
number of retries was exactly as expected. This commit weakens those
assertions for S3 repositories.

Closes #88784 Closes #88666
2022-07-26 20:48:14 +09:30
Stuart Tettemer 7af25f2dcf
Rename AbstractKeywordDocValuesField to BaseKeywordDocValuesField and make it abstract (#88778)
* Make AbstractKeywordDocValuesField abstract
* AbstractKeywordDocValuesField -> BaseKeywordDocValuesField
2022-07-25 17:05:27 -05:00
Rory Hunter 5c5981d27d
Introduce tracing interfaces (#87921)
Part of #84369. Split out from #87696. Introduce tracing interfaces in
advance of adding APM support to Elasticsearch. The only implementation
at this point is a no-op class.
2022-07-26 05:31:41 +09:30
Keith Massey 4ad9349b18
Expanding switch statement to cover all cases (#88772) 2022-07-25 11:12:04 -05:00
Julie Tibshirani e3ede67262
Integrate ANN into _search endpoint (#88694)
This PR adds a new `knn` option to the `_search` API to support ANN search.
It's powered by the same Lucene ANN capabilities as the old `_knn_search`
endpoint. The `knn` option can be combined with other search features like
queries and aggregations.

Addresses #87625
2022-07-22 08:02:07 -07:00
Ignacio Vera 3b7f393a82
Upgrade to lucene snapshot lucene-9.3.0-snapshot-b8d1fcfd0ec (#88706) 2022-07-22 11:22:39 +02:00
Armin Braun 6f70066ffc
Reintroduce the ability to configure S3 repository credentials in cluster state (#88652)
Revert of #46147, we want to keep this functionality around for a little longer.
2022-07-22 10:34:04 +02:00
Ievgen Degtiarenko 584f7f24fc
Simplify node stopping call (#88547)
There is no need to use stopRandomNode/namePredicate combination to stop a node by its name.
2022-07-22 09:35:50 +02:00
Armin Braun 377ad77096
Fix Azure repository tests randomly taking many minutes (#88559)
If we run into a seed that causes many fake exceptions and thus retries,
a 100ms retry interval will add up to minutes of test time for tests like
`testLargeBlobCountDeletion` that trigger thousands of requests.
There's no reason not to speed this up by 10x via more aggressive retry
timings as far as I can see so I reduced the timings to avoid randomly
blocked tests.
2022-07-20 10:31:23 +02:00
Stuart Tettemer fa25c31aca
Script: Metadata for update context (#88333)
Adds the `metadata()` API call and a Metadata class for the Update context.

There are different metadata available in the update context depending
on whether it is an update or an insert (via upsert).

For update, scripts can read `index`, `id`, `routing`, `version` and `timestamp`.

For insert, scripts can read `index`, `id` and `timestamp`.

Scripts can always read and write the `op` but the available ops are different.

Updates allow 'noop', 'index' and 'delete'.
Inserts allow 'noop' and 'create'.

Refs: #86472
2022-07-19 11:40:01 -05:00
Alan Woodward 5c11a81913
Add 'mode' option to `_source` field mapper (#88211)
Currently we have two parameters that control how the source of a document
is stored, `enabled` and `synthetic`, both booleans. However, there are only
three possible combinations of these, with `enabled:false` and `synthetic:true`
being disallowed. To make this easier to reason about, this commit replaces
the `enabled` parameter with a new `mode` parameter, which can take the values
`stored`, `synthetic` and `disabled`. The `mode` parameter cannot be set
in combination with `enabled`, and we will subsequently move towards
deprecating `enabled` entirely.
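
A minimal sketch of how the two legacy booleans collapse into the three `mode` values, assuming `enabled:true` with `synthetic:false` corresponds to `stored`. The enum and the `fromLegacy` helper are hypothetical names used for illustration only, not the mapper's actual code.

```java
// Hypothetical sketch: maps the legacy `enabled`/`synthetic` booleans onto the three modes.
final class SourceModeSketch {
    enum Mode { STORED, SYNTHETIC, DISABLED }

    /** Returns the mode for a legacy combination, rejecting the disallowed one. */
    static Mode fromLegacy(boolean enabled, boolean synthetic) {
        if (enabled == false && synthetic) {
            // The only combination the commit message describes as disallowed.
            throw new IllegalArgumentException("enabled:false cannot be combined with synthetic:true");
        }
        if (enabled == false) {
            return Mode.DISABLED;
        }
        return synthetic ? Mode.SYNTHETIC : Mode.STORED;
    }

    public static void main(String[] args) {
        System.out.println(fromLegacy(true, false));
        System.out.println(fromLegacy(true, true));
        System.out.println(fromLegacy(false, false));
    }
}
```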
2022-07-18 12:50:10 +01:00
Ignacio Vera 051d0e9946
Move geo_shape percolator test to the spatial module (#88554) 2022-07-18 10:04:48 +02:00
Craig Taverner 6797d44f1a
Support cartesian shape with doc values (#88487)
The shape field type now generates doc values by default.
2022-07-14 07:17:51 +02:00
Stuart Tettemer 85b8d3df30
Script: Ingest Metadata and CtxMap (#88458)
Create a `Metadata` superclass for ingest and update contexts.

Create a `CtxMap` superclass for `ctx` backwards compatibility in ingest and update contexts. `script.CtxMap` was moved from `ingest.IngestSourceAndMetadata`.

`CtxMap` takes a `Metadata` subclass and validates updates via the `FieldProperty`s passed in.

`Metadata` provides typed getters and setters and implements a `Map`-like interface, making it easy for a class containing `CtxMap` to implement the full `Map` interface.

The `FieldProperty` record configures how fields are validated. Fields have a `type`, are `writeable` or read-only, are `nullable` or not, and may have an additional validation useful for Set/Enum validation.
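
A minimal sketch of a record along the lines described above, assuming validation happens when a script writes a value. The `check` method, the `extendedValidation` component name, and the example constants are hypothetical; the example rules for `op`, `_now` and `_routing` are borrowed from the Reindex metadata description elsewhere in this log.

```java
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical sketch of per-field validation; not the actual Elasticsearch record.
final class FieldPropertySketch {
    record FieldProperty(Class<?> type, boolean writeable, boolean nullable, Predicate<Object> extendedValidation) {
        /** Throws IllegalArgumentException if writing newValue would violate this property. */
        void check(String key, Object newValue) {
            if (writeable == false) {
                throw new IllegalArgumentException(key + " is read-only");
            }
            if (newValue == null) {
                if (nullable == false) {
                    throw new IllegalArgumentException(key + " cannot be null");
                }
                return;
            }
            if (type.isInstance(newValue) == false) {
                throw new IllegalArgumentException(key + " must be a " + type.getSimpleName());
            }
            if (extendedValidation != null && extendedValidation.test(newValue) == false) {
                throw new IllegalArgumentException(key + " has an unsupported value: " + newValue);
            }
        }
    }

    static final Set<String> REINDEX_OPS = Set.of("noop", "index", "delete");
    // op: read-write, not nullable, restricted to a fixed set of values.
    static final FieldProperty OP = new FieldProperty(String.class, true, false, REINDEX_OPS::contains);
    // _now: read-only.
    static final FieldProperty NOW = new FieldProperty(Long.class, false, false, null);
    // _routing: writable and nullable.
    static final FieldProperty ROUTING = new FieldProperty(String.class, true, true, null);

    public static void main(String[] args) {
        OP.check("op", "index");        // accepted
        ROUTING.check("_routing", null); // accepted, nullable
        try {
            OP.check("op", "bogus");
        } catch (IllegalArgumentException e) {
            System.out.println(e.getMessage());
        }
    }
}
```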
2022-07-13 14:58:56 -05:00
Albert Zaharovits a3ae8ce3f2
Test mute 88387 (#88512)
Mute IngestCommonClientYamlTestSuiteIT test
{yaml=ingest/190_script_processor/Test metadata}

Related: https://github.com/elastic/elasticsearch/issues/88387
2022-07-14 01:42:37 +09:30
Stuart Tettemer 39de085cb0
Ingest: Start separating Metadata from IngestSourceAndMetadata (#88401)
Pull out the implementation of `Metadata` from `IngestSourceAndMetadata`.

`Metadata` will become a base class extended by the update contexts: ingest, update, update by query and reindex.

`Metadata` implements a map-like interface, making it easy for a class containing `Metadata` to implement the full `Map` interface.
2022-07-12 16:01:16 -05:00
Ignacio Vera 4af02b8c80
Stop registering TestGeoShapeFieldMapperPlugin in ESIntegTestCase (#88460)
Instead of registering the plugin by default, implementations that need it are responsible for registering the plugin.
2022-07-12 14:42:45 +02:00
Ignacio Vera dd1bd83234
Don't index geo_shape field in AbstractBuilderTestCase (#88437)
This commit stops adding the geo_shape field mapper by default and adds the mapper only when it is needed.
2022-07-12 06:52:14 +02:00
Ignacio Vera 450f223eb5
Remove usages of BucketCollector#getLeafCollector(LeafReaderContext) (#88414)
The method BucketCollector#getLeafCollector(LeafReaderContext) should be removed in favour of 
BucketCollector#getLeafCollector(AggregationExecutionContext)
2022-07-11 17:13:41 +02:00
Luca Cavanna d3b1a61f36
Replace usages of deprecated specialized field exists queries (#88312)
DocValueFieldExistsQuery, NormsFieldExistsQuery as well as KnnVectorFieldExistsQuery are deprecated in Lucene in favour of FieldExistsQuery which combines the three into a single query.

This commit updates Elasticsearch to no longer rely on such deprecated queries.

see https://issues.apache.org/jira/browse/LUCENE-10436
2022-07-08 13:48:31 +02:00
Ignacio Vera d1ba276587
Remove tech debt on Aggregations#getLeafCollector (#88230)
Remove #getLeafCollector(LeafReaderContext, LeafBucketCollector) in favour of 
#getLeafCollector(AggregationExecutionContext, LeafBucketCollector)
2022-07-07 07:47:10 +02:00
Nhat Nguyen bd69f90fff
Upgrade to Lucene-9.3.0-snapshot-2d05f5c623e (#88284)
To include LUCENE-10620 - which passes Weight to Collector
2022-07-06 16:16:03 -04:00
Jack Conradson 7234730532
Replace bridge methods with filtered methods in Painless (#88100)
The invokedynamic instruction does not perfectly follow the Painless casting model, so bridge methods
were added where necessary to ensure symmetric behavior between compile-time and run-time casting
using boxed types. This change replaces the specialized class loader and bridge methods with filtered
method handles, which reduces the overall complexity of runtime casting.
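
A minimal sketch of what a filtered method handle looks like in plain Java, using the JDK's `MethodHandles.filterArguments`. Whether Painless uses this exact call is an assumption; the `twice` and `toInt` methods and the class name are hypothetical and only illustrate adapting a boxed argument before it reaches a target that expects a primitive.

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

// Hypothetical sketch of argument filtering with method handles; not Painless code.
final class FilteredHandleSketch {
    /** Target method that works on a primitive int. */
    static int twice(int value) {
        return value * 2;
    }

    /** Filter that converts a boxed Number into the int the target expects. */
    static int toInt(Number boxed) {
        return boxed.intValue();
    }

    /** Builds a handle of type (Number)int that runs toInt before twice. */
    static MethodHandle filteredTwice() throws ReflectiveOperationException {
        MethodHandles.Lookup lookup = MethodHandles.lookup();
        MethodHandle target = lookup.findStatic(
            FilteredHandleSketch.class, "twice", MethodType.methodType(int.class, int.class));
        MethodHandle filter = lookup.findStatic(
            FilteredHandleSketch.class, "toInt", MethodType.methodType(int.class, Number.class));
        // Argument 0 passes through the filter before the target is invoked.
        return MethodHandles.filterArguments(target, 0, filter);
    }

    /** Convenience wrapper that hides the checked exceptions of handle invocation. */
    static int applyTwice(Number boxed) {
        try {
            return (int) filteredTwice().invokeExact(boxed);
        } catch (Throwable e) {
            throw new IllegalStateException(e);
        }
    }

    public static void main(String[] args) {
        System.out.println(applyTwice(Long.valueOf(21L)));
    }
}
```

The conversion lives in the handle chain rather than in a generated bridge method, which is the general idea the commit message points at.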
2022-07-05 11:59:04 -07:00
Salvatore Campagna 66b5189e08
fix: extract matrix stats using bucket_selector buckets_path (#88271) 2022-07-05 18:02:31 +02:00
Chris Hegarty 453f12c72d
Upgrade to Log4J 2.18.0 (#88237) 2022-07-04 11:30:38 +01:00
Martijn van Groningen 577d3e2e9c
Skip backing indices with a disjoint range on @timestamp field. (#85162)
Implicitly skip backing indices with a time series range that doesn't
match with a required filter on @timestamp field.

Relates to #74660
2022-07-04 05:38:46 -04:00
David Turner 71aeebe0c3
Retry after all S3 get failures that made progress (#88015)
S3 sometimes enters a state where blob downloads repeatedly fail but
with nontrivial progress between failures. Often each attempt yields 10s
or 100s of MBs of data. Today we abort a download after three (by
default) such failures, but this may not be enough to completely
retrieve a large blob during one of these flaky patches.

With this commit we start to avoid counting download attempts that
retrieved at least 1% of the configured `buffer_size` (typically 1MB)
towards the maximum number of retries.
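
A minimal sketch of the retry accounting described above, simulated with a list of bytes retrieved per attempt instead of real S3 calls. The class, method and parameter names are hypothetical, and the exact budget semantics (here, giving up once counted failures exceed `maxRetries`) are an assumption.

```java
// Hypothetical simulation of progress-aware retry counting; not the repository-s3 code.
final class ProgressAwareRetrySketch {
    /**
     * Returns true if the blob is fully retrieved before the retry budget runs out.
     * bytesPerAttempt[i] is the number of bytes attempt i retrieved before it failed or finished.
     */
    static boolean download(long blobSize, long bufferSize, int maxRetries, long[] bytesPerAttempt) {
        // Attempts that retrieve at least 1% of buffer_size count as meaningful progress.
        long meaningfulProgress = Math.max(1L, bufferSize / 100);
        long position = 0;
        int countedFailures = 0;
        for (long bytes : bytesPerAttempt) {
            position += bytes;
            if (position >= blobSize) {
                return true; // whole blob retrieved
            }
            // The attempt failed part-way; only count it if it made too little progress.
            if (bytes < meaningfulProgress) {
                countedFailures++;
                if (countedFailures > maxRetries) {
                    return false; // retry budget exhausted
                }
            }
        }
        return false; // simulated attempts ran out before the blob completed
    }

    public static void main(String[] args) {
        // Five flaky attempts that each make progress still complete with a budget of 3.
        System.out.println(download(100, 1000, 3, new long[] { 20, 20, 20, 20, 20 }));
        // Four attempts with no progress exhaust the same budget.
        System.out.println(download(100, 1000, 3, new long[] { 0, 0, 0, 0, 100 }));
    }
}
```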

Closes #87243
2022-06-30 06:41:15 -04:00
Stuart Tettemer 011c76ab55
Script: Add Metadata to ingest context. (#87309)
Adds the `Metadata` class and `metadata()` method to the ingest context.

Metadata has getters and setters for index, id, routing, version and versionType.
It also has a getter for timestamp.

Refs: #86472
2022-06-29 15:32:53 -05:00