54561 Commits

Author SHA1 Message Date
James Rodewig 1b51acbbab
[DOCS] Add PIT to search after docs (#61593)
Co-authored-by: Jim Ferenczi <jim.ferenczi@elastic.co>
2020-09-08 09:53:21 -04:00
Dan Hermann 9dcab76427
Preserve grok pattern ordering and add sort option (#61671) 2020-09-08 07:10:27 -05:00
Dimitris Athanasiou 1b5e1183e6
[ML] Update mappings of ml stats index (#61980)
- Adds missing mappings for `alpha`, `gamma`, and `lambda`.
- Corrects name of `soft_tree_depth_limit` and `soft_tree_depth_tolerance`.
- Removes unused `regularization_depth_penalty_multiplier`,
  `regularization_leaf_weight_penalty_multiplier` and
  `regularization_tree_size_penalty_multiplier`.
2020-09-08 14:52:04 +03:00
David Roberts 4b4dab1095
[ML] Add support for date_nanos fields in find_file_structure (#62048)
Now that #61324 is merged it is possible for the find_file_structure
endpoint to suggest using date_nanos fields for timestamps where
the timestamp format provides greater than millisecond accuracy.
2020-09-08 12:48:39 +01:00
Luca Cavanna 28b89b4265
Fix point in time toXContent impl (#62080)
PointInTimeBuilder is a ToXContentObject yet it does not print out a whole object (it is rather a fragment). Also, when it is printed out as part of SearchSourceBuilder, an error is thrown because pit should be wrapped into its own object.

This commit fixes this and adds tests for it.
2020-09-08 13:40:29 +02:00
Przemko Robakowski 69b8fe431f
More resilient ILM history rollover test (#61973)
* More resilient ILM history rollover test

* add comment

* use assertHitCount

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-09-08 11:13:09 +02:00
Francisco Fernández Castaño f55b20482a
Add repositories metering API (#60371)
This pull request adds a new set of APIs that allows tracking the number of requests performed
by the different registered repositories.

In order to avoid losing data, the repository statistics are archived after the repository is closed for
a configurable retention period `repositories.stats.archive.retention_period`. The API exposes the
statistics for the active repositories as well as the modified/closed repositories.
2020-09-08 10:44:54 +02:00
Armin Braun d15d796766
Fix testMasterFailOverWithQueuedDeletes (#62062)
Fixing a very rare corner case where the delete retry is slow.

Closes #62031
2020-09-08 09:47:02 +02:00
Andrei Stefan 8ec4a768f4
EQL: create the search request with a list of indices (#62005)
* The query client uses an array of indices instead of the comma-separated
version of the index names
2020-09-08 09:31:07 +03:00
Nhat Nguyen 04c95fa9ae
Allow enabling soft-deletes on restore from snapshot (#62018)
Closes #61969
2020-09-07 14:06:41 -04:00
David Kyle 8aaea029d0
Mute testTwoJobsWithSameRandomizeSeedUseSameTrainingSet (#62066)
For #62064
2020-09-07 16:11:25 +01:00
Jim Ferenczi 0043242b2f
Reenable bwc tests after backport of #61779 (#62051) 2020-09-07 16:46:13 +02:00
David Kyle 5a750478ab
[ML] Add missing exceptions to mappings upgrade tests (#61921)
Following the upgrade mappings test added in #61834, the
test has failed with some missing mappings. This adds those
mappings to the exceptions list.
2020-09-07 14:55:13 +01:00
Alan Woodward 6a1c3e6059
Remove SearchPhase interface (#62050)
The interface is never used as an abstraction - implementations are called directly,
and most of them don't need to implement the preProcess method.
2020-09-07 13:43:53 +01:00
David Turner 5f8c1f0484
Introduce integ tests for high disk watermark (#60460)
An important goal of the disk threshold decider is to ensure that nodes
use less disk space than the high watermark, and to take action if a
node ever exceeds this watermark. Today we do not have any
integration-style tests of this high-level behaviour. This commit
introduces a small test harness that can adjust the apparent size of the
disk and verify that the disk threshold decider moves shards around in
response.

Co-authored-by: Yannick Welsch <yannick@welsch.lu>
2020-09-07 14:03:03 +02:00
Luca Cavanna 00d7d8d869
Rename runtime_script field type to runtime (#62034)
We've had some discussions around the user experience when using runtime fields. Although we do plan on having multiple runtime field implementations (e.g. grok, lookup etc.) which could be exposed as different field types, we decided to expose all runtime fields under the same `runtime` type. At the moment, the only implementation will be through scripts, hence a `script` must be specified. In the future, there will be other ways to generate values for runtime fields besides scripts.

This also translates to renaming the RuntimeScriptFieldMapper class to RuntimeFieldMapper.

Relates to #59332
2020-09-07 13:58:30 +02:00
Jim Ferenczi e3295acc8d
Adapt version after backport of #61779 (#62029)
This commit adapts the bwc version checks added in #61779 and disables
bwc tests until the backport #62028 is merged.
2020-09-07 13:12:53 +02:00
Armin Braun 5579025a73
Improve Snapshot State Machine Performance (#62000)
Just a few random things to optimize motivated by somewhat sub-standard performance
for large snapshot cluster states with many concurrent snapshots observed in production.
2020-09-07 12:36:29 +02:00
David Kyle 610a4f12ba
Mute Docs rollover index test snippet (#62045)
For #62043
2020-09-07 11:21:52 +01:00
Alan Woodward b0510a36cd
Fix null_value parsing for date_nanos field mapper (#61994)
The null_value parameter for date fields is always parsed using DateFormatter.parseMillis,
which is incorrect for nanosecond resolution fields. This commit changes the parsing logic
to always use DateFieldType.parse() to parse the null value.
2020-09-07 10:58:18 +01:00
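
Illustration for #61994: a minimal sketch of why a millisecond parser is the wrong tool for a nanosecond-resolution value. This is not the Elasticsearch implementation; `parse_epoch_nanos` and `parse_epoch_millis` are hypothetical helpers, and the input is assumed to be a UTC timestamp of the form `YYYY-MM-DDTHH:MM:SS[.fraction]Z`.

```python
from datetime import datetime, timezone


def parse_epoch_nanos(text: str) -> int:
    """Parse a UTC timestamp into nanoseconds since the epoch (hypothetical helper)."""
    body = text.rstrip("Z")
    seconds_part, _, fraction = body.partition(".")
    dt = datetime.strptime(seconds_part, "%Y-%m-%dT%H:%M:%S").replace(tzinfo=timezone.utc)
    # Pad or truncate the fractional part to exactly nine digits (nanoseconds).
    nanos = int(fraction.ljust(9, "0")[:9]) if fraction else 0
    return int(dt.timestamp()) * 1_000_000_000 + nanos


def parse_epoch_millis(text: str) -> int:
    """Same input, but truncated to millisecond resolution."""
    return parse_epoch_nanos(text) // 1_000_000


value = "2020-09-07T10:58:18.123456789Z"
nanos = parse_epoch_nanos(value)
millis = parse_epoch_millis(value)
# Scaling the millisecond value back up does not recover the original nanoseconds.
lost = nanos - millis * 1_000_000
```

Here `lost` holds the sub-millisecond part that a millisecond-only parse would silently drop.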
István Zoltán Szabó a75094e666
[DOCS] Removes inference from the names of trained model APIs. (#62036) 2020-09-07 11:23:29 +02:00
David Kyle b08f121e65
Mute AsyncSearchActionIT tests (#62037)
For #61790
2020-09-07 10:11:38 +01:00
Alan Woodward 98b5204bea
Convert completion, binary, boolean tests to MapperTestCase (#62004)
Also fixes a metadata serialization bug in CompletionFieldMapper.
2020-09-07 10:10:44 +01:00
Jim Ferenczi 38dc926e10
Ensure validation of the reader context is executed first (#61831)
This change makes sure that reader context is validated (`SearchOperationListener#validateReaderContext`)
before any other operation and that it is correctly recycled or removed at the end of the operation.
This commit also fixes a race condition bug that would allocate the security reader for scrolls more than once.

Relates #61446 

Co-authored-by: Nhat Nguyen <nhat.nguyen@elastic.co>
2020-09-07 10:47:24 +02:00
Martijn van Groningen 3df08102cb
Move data stream yaml tests to xpack plugin module. (#61998)
Moving the data stream yaml tests to xpack plugin module has the following benefits:
* The tests are run both with security enabled (as part of xpack/plugin integTest)
  and disabled (as part of xpack/plugin/data-stream/qa/rest integTest).
* The tests are also run in the mixed cluster qa environment.
2020-09-07 10:26:28 +02:00
Tanguy Leroux 49a927fa4d
Adapt SearchableSnapshotsBlobStoreCacheIntegTests to Lucene 8.7.0 (#61989)
Elasticsearch now uses #61957 which includes https://issues.apache.org/jira/browse/LUCENE-9456. 
We can remove the corresponding //TODO in SearchableSnapshotsBlobStoreCacheIntegTests.
2020-09-07 09:27:17 +02:00
Howard de56d71558
Remove unused deciders in BalancedShardsAllocator (#62026) 2020-09-07 00:02:31 -04:00
Luca Cavanna 1f2f49b223
[TEST] Core tests with runtime fields minor fix (#62009)
Two dynamic templates get added per runtime type, while one is enough. Also, one of the two for the keyword field was incorrect.
2020-09-05 10:22:18 +02:00
Armin Braun d52668d3ed
Simplify BytesReference StreamInput (#61681)
Flattening both streams into a single stream here saves a few objects and some indirection.
Also, removed the redundant `offset` field which added nothing but complexity by forcing the
incrementation of two counters on every read.
2020-09-05 09:19:56 +02:00
Nhat Nguyen 36368b7f99
CCR should retry on CircuitBreakingException (#62013)
CCR shard follow task can hit CircuitBreakingException on the leader 
cluster (read changes requests) or the follower cluster (bulk requests).
CCR should retry on CircuitBreakingException as it's a transient error.
2020-09-04 22:46:51 -04:00
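
Illustration for #62013: a minimal sketch of retrying an operation when it fails with a transient error. The commit only states that CCR should retry; the exponential backoff, the attempt limit, and the names `CircuitBreakingError` and `call_with_retry` are assumptions made for this example.

```python
import time


class CircuitBreakingError(Exception):
    """Hypothetical stand-in for a transient circuit-breaker failure."""


def call_with_retry(operation, max_attempts=5, base_delay=0.01, sleep=time.sleep):
    """Run `operation`, retrying on CircuitBreakingError with exponential backoff."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except CircuitBreakingError:
            if attempt == max_attempts:
                raise  # give up after the final attempt
            sleep(base_delay * 2 ** (attempt - 1))
```

Any other exception type propagates immediately, so only the error classified as transient is retried.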
Ryan Ernst a56952a299
Add snapshot only test modules (#61954)
This commit adds external test modules. These are modules meant for
external systems to test edge cases in elasticsearch, but only within
snapshots. They are not meant to be used in production, so protections
are also added from their accidental inclusion in release builds.

Note that this commit does not actually add any new modules, it only
adds the infrastructure for the new modules, under
`test/external-modules`.
2020-09-04 16:34:55 -07:00
Lisa Cawley 8290b6216e
[DOCS] Fix capitalization in HLRC ML APIs (#62010) 2020-09-04 13:40:02 -07:00
James Rodewig dcf0c3062f
[DOCS] Document dynamic discovery settings (#61420) 2020-09-04 10:56:17 -04:00
bellengao c0dfb45191
Add test for item-level error when no write index defined for an alias in bulk API (#55503)
Co-authored-by: Jake Landis <jake.landis@elastic.co>
2020-09-04 09:29:23 -05:00
James Rodewig bbcd8078ce
[DOCS] Document dynamic index mgmt and buffer settings (#61753) 2020-09-04 10:19:42 -04:00
James Rodewig 9169f26ad2
[DOCS] Use correct get document API (#61804) (#61991)
The documentation refers to a deprecated get document API call (it uses document `type`).

Co-authored-by: Thiago Souza <thiago@elastic.co>
2020-09-04 10:05:21 -04:00
Dimitris Athanasiou b4fcb77e20
[ML] Allow training_percent to be any positive double up to hundred (#61977)
This changes the valid range of `training_percent` for regression and
classification from [1, 100] to (0, 100].

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-09-04 16:47:03 +03:00
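
Illustration for #61977: a minimal sketch of the new range check, where any positive value up to and including 100 is accepted. `validate_training_percent` is a hypothetical helper, not the actual validation code.

```python
def validate_training_percent(value: float) -> float:
    """Accept values in the half-open interval (0, 100]; reject everything else."""
    if not (0.0 < value <= 100.0):
        raise ValueError(f"training_percent must be in (0, 100], got {value}")
    return value
```

Under the old [1, 100] range a value such as 0.5 would have been rejected; with the (0, 100] range it is valid, while 0 itself still is not.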
Yannick Welsch 4978a79ce3
Simplify searchable snapshot shard allocation (#61911)
Simplifies allocation for snapshot-backed shards by always making the recovery source "from snapshot" for those
snapshot-backed shards (instead of "recover from local or from empty store"). Also lets the balancer pick the node to
allocate the snapshot-backed shard to (which takes the number of shards on each node into account unlike the current
implementation which just picks whatever node we are allowed to allocate to, with no notion of "balancing" at all).
2020-09-04 15:43:55 +02:00
James Rodewig e2d6fec643
[DOCS] Fix typo in URL-based access control docs (#61896) (#61985)
Co-authored-by: George Tseres <george.tseres@gmail.com>
2020-09-04 09:24:36 -04:00
Jim Ferenczi 49ae2bb56a
Improve reduction of terms aggregations (#61779)
* Improve reduction of terms aggregations

Today, the terms aggregation reduces multiple aggregations at once using a map
to group same buckets together. This operation can be costly since it requires
looking up every bucket in a global map with no particular order.
This commit changes how term buckets are sorted by shards and partial reduces in
order to be able to reduce results using a merge-sort strategy.
For bwc, results are merged with the legacy code if any of the aggregations use
a different sort (if it was returned by a node in prior versions).

Relates #51857
2020-09-04 15:08:32 +02:00
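
Illustration for #61779: a minimal sketch of reducing per-shard results with a merge-sort strategy instead of a global map. It assumes each shard returns its buckets already sorted by term; the `(term, doc_count)` tuple shape and the name `reduce_sorted_buckets` are hypothetical and do not mirror the Elasticsearch classes.

```python
import heapq
from itertools import groupby
from operator import itemgetter


def reduce_sorted_buckets(shard_results):
    """Merge lists of (term, doc_count) tuples, each list sorted by term."""
    # heapq.merge streams the inputs in sorted order without building a global map.
    merged = heapq.merge(*shard_results, key=itemgetter(0))
    # Equal terms are now adjacent, so one pass is enough to sum their counts.
    return [
        (term, sum(count for _, count in group))
        for term, group in groupby(merged, key=itemgetter(0))
    ]


shard_a = [("apple", 2), ("cherry", 1)]
shard_b = [("apple", 1), ("banana", 4)]
reduced = reduce_sorted_buckets([shard_a, shard_b])
```

The merge only works when every input uses the same sort order, which is consistent with the commit falling back to the legacy code when any aggregation uses a different sort.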
Alan Woodward 50a74f972a
Improve error messages on bad [format] and [null_value] params for date mapper (#61932)
Currently, if an incorrectly formatted date is passed as a null_value for a date field mapper
configuration, you get a vague error:

Failed to parse mapping [_doc]: cannot parse empty date

Similarly, if you pass an incorrect format, you get the error:

Failed to parse mapping [_doc]: Invalid format [...]

This commit improves both these errors by including the mapper name and parameter that
are misconfigured.

Fixes #61712
2020-09-04 14:02:18 +01:00
Mikołaj Przybysz 9e8d8ee38a
[DOCS] Add line break to get ILM lifecycle API docs (#61892) 2020-09-04 09:00:11 -04:00
Ignacio Vera 2f6001e557
Enable BWC after backporting lucene upgrade (#61978) 2020-09-04 14:55:13 +02:00
Martijn van Groningen 9ad2970e40
Fix skip versions for xpack data stream yaml tests. (#61926)
Relates to #61904

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-09-04 14:07:54 +02:00
Rene Groeschke eda35d0d9f
Fix resolveAllDependencies broken by ArtifactTransforms (#61972)
- ignore es extracted configuration for resolveAllDeps
- fixes #61945
2020-09-04 14:04:40 +02:00
Ignacio Vera 1549ad0e08
disable BWC for backport #61974 (#61975) 2020-09-04 13:45:52 +02:00
Ignacio Vera f37727a8e7
Fix invalid flag setting for RegExp after lucene upgrade (#61976) 2020-09-04 13:45:02 +02:00
Ignacio Vera e236054e09
upgrade to Lucene-8.7.0-snapshot-61ea26a (#61957) 2020-09-04 12:08:35 +02:00
Dimitris Athanasiou 5d1be250e9
[ML] Add incremental id during data frame analytics reindexing (#61943)
Previously, we added a copy of the `_id` during reindexing and sorted
the destination index on that. This allowed us to traverse the docs in the
destination index in a stable order multiple times and with efficiency.
However, the destination index being sorted means we cannot have `nested`
typed fields. This is a problem as it does not allow us to provide
a good experience with our evaluate API when it comes to computing
metrics for specific classes, features, etc.

This commit changes the approach in order to result in a destination
index that allows nested fields.

Instead of adding a copy of the `_id` field, we now add an incremental
id that we can use to traverse the docs in a stable order. We also
ensure we always assign the same incremental id to the same doc from
the source indices by sorting on `_seq_no` during reindexing. That
in combination with the reindexing API using scroll gives us a stable
order as scroll uses the (`_index`, `_doc`, shard_id) tuple to resolve ties.

The extractor now does not need to scroll. Instead we sort on the incremental
id and we do ranged searches to avoid the sort-all-docs overhead.

Finally, the `TestDocsIterator` is simply changed to search_after the incremental id.

With these changes, data frame analytics jobs no longer use scroll anywhere.

Having all these in place, the commit adds the `nested` types to the necessary
fields of `classification` and `regression` analyses results.
2020-09-04 11:45:05 +03:00
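
Illustration for #61943: a minimal sketch of the idea of assigning an incremental id in `_seq_no` order and then reading documents in id ranges rather than scrolling. An in-memory list stands in for the destination index, and the field name `incremental_id` and both function names are hypothetical.

```python
def assign_incremental_ids(docs):
    """Return copies of docs with 'incremental_id' assigned in `_seq_no` order."""
    ordered = sorted(docs, key=lambda d: d["_seq_no"])
    return [{**doc, "incremental_id": i} for i, doc in enumerate(ordered)]


def iter_batches(docs, batch_size):
    """Yield batches selected by a range filter on 'incremental_id'."""
    for start in range(0, len(docs), batch_size):
        end = start + batch_size
        # Stand-in for a ranged search: start <= incremental_id < end.
        batch = [d for d in docs if start <= d["incremental_id"] < end]
        yield sorted(batch, key=lambda d: d["incremental_id"])
```

Because each batch is defined by an id range, the same batch can be requested again later and returns the same documents in the same order.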
Tanguy Leroux bcd67066fa
Reduce locking in prewarming (#61837)
During prewarming of a Lucene file a CacheFile is acquired and
then locked for the duration of the prewarming, i.e. locked until all
parts of the file have been downloaded and written to cache on
disk. The locking (executed with CacheFile#fileLock()) is here to
prevent the cache file from being evicted while it is prewarming.

But holding the lock may take a while for large files, especially since
restoring snapshot files now respects the
indices.recovery.max_bytes_per_sec setting of 40mb (#58658),
and this can have bad consequences like preventing the CacheFile
from being evicted, opened or closed. In manual tests this bug slows
down various requests like mounting a new searchable snapshot
index or deleting an existing one that is still prewarming.

This commit reduces the time the lock is held during prewarming so
that the read lock is only required when actively writing to the CacheFile.
2020-09-04 10:28:11 +02:00
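
Illustration for #61837: a minimal sketch of holding a lock only while writing each chunk, rather than for the whole download. This is not the Elasticsearch code: the `CacheFile` class below is a hypothetical stand-in, and a plain `threading.Lock` replaces the read lock mentioned in the commit.

```python
import threading


class CacheFile:
    """Hypothetical stand-in for a cache file guarded by a lock."""

    def __init__(self):
        self.lock = threading.Lock()
        self.data = bytearray()
        self.lock_acquisitions = 0

    def write(self, chunk: bytes) -> None:
        # The lock is held only for the duration of this write.
        with self.lock:
            self.lock_acquisitions += 1
            self.data.extend(chunk)


def prewarm(cache_file: CacheFile, fetch_chunks) -> None:
    """Download chunks without the lock; take it briefly for each write."""
    for chunk in fetch_chunks():  # the slow download happens outside the lock
        cache_file.write(chunk)
```

Between two writes the lock is free, so other operations on the file are not blocked for the entire prewarming.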