When waiting for no initializing shards we also have to wait for events
when we have more than one node in the cluster. When the primary is
started, there is a short period of time, where neither the primary nor
any of the replicas are initializing.
Closes#46535
We leave replicas unassigned until we reroute after the primary shard
starts. If a cluster health request with wait_for_no_initializing_shards
is executed before the reroute, it will return immediately although
there will be some initializing replicas. Peer recoveries of those
shards can prevent translog on the primary from trimming.
We add wait_for_events to the cluster health request so that it will
execute after the reroute.
Closes#46425
Adds support for `geotile_grid` as a source in composite aggs.
Part of this change includes adding a new docFormat of `GEOTILE` that formats a hashed `long` value into a geotile formatting string `zoom/x/y`.
* Snapshot cleanup functionality via transport/REST endpoint.
* Added all the infrastructure for this with the HLRC and node client
* Made use of it in tests and resolved relevant TODO
* Added new `Custom` CS element that tracks the cleanup logic.
Kept it similar to the delete and in progress classes and gave it
some (for now) redundant way of handling multiple cleanups but only allow one
* Use the exact same mechanism used by deletes to have the combination
of CS entry and increment in repository state ID provide some
concurrency safety (the initial approach of just an entry in the CS
was not enough, we must increment the repository state ID to be safe
against concurrent modifications, otherwise we run the risk of "cleaning up"
blobs that just got created without noticing)
* Isolated the logic to the transport action class as much as I could.
It's not ideal, but we don't need to keep any state and do the same
for other repository operations
(like getting the detailed snapshot shard status)
Since #45136, we use soft-deletes instead of translog in peer recovery.
There's no need to retain extra translog to increase a chance of
operation-based recoveries. This commit ignores the translog retention
policy if soft-deletes is enabled so we can discard translog more
quickly.
Co-authored-by: David Turner <david.turner@elastic.co>
Relates #45136
* Update the REST API specification
This patch updates the REST API spefication in JSON files to better encode deprecated entities,
to improve specification of URL paths, and to open up the schema for future extensions.
Notably, it changes the `paths` from a list of strings to a list of objects, where each
particular object encodes all the information for this particular path: the `parts` and the `methods`.
Among the benefits of this approach is eg. encoding the difference between using the `PUT` and `POST`
methods in the Index API, to either use a specific document ID, or let Elasticsearch generate one.
Also `documentation` becomes an object that supports an `url` and also a `description` which is a
new field.
* Adapt YAML runner to new REST API specification format
The logic for choosing the path to use when running tests has been
simplified, as a consequence of the path parts being listed under each
path in the spec. The special case for create and index has been removed.
Also the parsing code has been hardened so that errors are thrown earlier
when the structure of the spec differs from what expected, and their
error messages should be more helpful.
Introducing a IsoLocal.ROOT constant which should be used instead of java.util.Locale.ROOT in ES when dealing with dates. IsoLocal.ROOT customises start of the week to be Monday instead of Sunday.
closes#42588 an issue with investigation details
relates #41670 bug raised (this won't fix it on its own. joda.parseInto has to be reimplemented
closes#43275 an issue raised by community member
This commit changes the ForceMergeRequest.validate() method so that it does
not accept the parameters only_expunge_deletes and max_num_segments
to be set at the same time.
The motivation is that InternalEngine.forceMerge() just ignores the max. number
of segments parameter when the only expunge parameter is set to true, leaving
the wrong impression to the user that max. number of segments has been applied.
It also changes InternalEngine.forceMerge() so that it now throws an exception
when both parameters are set, and modifies tests where needed.
Because it changes the behavior of the REST API I marked this as >breaking.
Closes#43102
Adds an API to clone an index. This is similar to the index split and shrink APIs, just with the
difference that the number of primary shards is kept the same. In case where the filesystem
provides hard-linking capabilities, this is a very cheap operation.
Indexing cloning can be done by running `POST my_source_index/_clone/my_target_index` and it
supports the same options as the split and shrink APIs.
Closes#44128
The cat APIs support the ?bytes parameter to format bytes value. The cat
recovery API does not follow this because the fields were not using
ByteSizeValue. This commit addresses this.
With this change, we will return primary_term and seq_no of the current
document if an update is detected as a noop. We already return the
version; hence we should also return seq_no and primary_term.
Relates #42497
This is a refactor to current JSON logging to make it more open for extensions
and support for custom ES log messages used inDeprecationLogger IndexingSlowLog , SearchSLowLog
We want to include x-opaque-id in deprecation logs. The easiest way to have this as an additional JSON field instead of part of the message is to create a custom DeprecatedMessage (extends ESLogMEssage)
These messages are regular log4j messages with a text, but also carry a map of fields which can then populate the log pattern. The logic for this lives in ESJsonLayout and ESMessageFieldConverter.
Similar approach can be used to refactor IndexingSlowLog and SearchSlowLog JSON logs to contain fields previously only present as escaped JSON string in a message field.
closes#41350
This adds a `rare_terms` aggregation. It is an aggregation designed
to identify the long-tail of keywords, e.g. terms that are "rare" or
have low doc counts.
This aggregation is designed to be more memory efficient than the
alternative, which is setting a terms aggregation to size: LONG_MAX
(or worse, ordering a terms agg by count ascending, which has
unbounded error).
This aggregation works by maintaining a map of terms that have
been seen. A counter associated with each value is incremented
when we see the term again. If the counter surpasses a predefined
threshold, the term is removed from the map and inserted into a cuckoo
filter. If a future term is found in the cuckoo filter we assume it
was previously removed from the map and is "common".
The map keys are the "rare" terms after collection is done.
This commit adds a wildcard intervals source, similar to the prefix. It
also changes the term parameter in prefix to read prefix, to bring it
in to line with the pattern parameter in wildcard.
Closes#43198
This commit removes the nested_path and nested_filter options deprecated in 6x.
This change also checks that the sort field has a [nested] option if it is under a nested
object and throws an exception if it's not the case.
Closes#27098
This commit adds a prefix intervals source, allowing you to search
for intervals that contain terms starting with a given prefix. The source
can make use of the index_prefixes mapping option.
Relates to #43198
This commit finalizes the work done to rename size to max_docs in
reindex and update/delete by query. size is no longer supported in URL
or outer level body for the 3 APIs (though size in update/delete-by-query
will and has always been interpreted as scroll_size, it is not to be relied
upon).
Continuation of #41894Closes#24344
This commit adds multiple repositories support to get snapshots
request.
If some repository throws an exception this method does not fail fast
instead, it returns results for all repositories.
This PR is opened in favour of #41799, because we decided to change
the response format in a non-BwC manner. It makes sense to read a
discussion of the aforementioned PR.
This is the continuation of work done here #15151.
* introduce state to the REST API specification
* change state over to stability
* CCR is no GA updated to stable
* SQL is now GA so marked as stable
* Introduce `internal` as state for API's, marks stable in terms of lifetime but unstable in terms of guarantees on its output format since it exposes internal representations
* make setting a wrong stability value, or not setting it at all an error that causes the YAML test suite to fail
* update spec files to be explicit about their stability state
* Document the fact that stability needs to be defined
Otherwise the YAML test runner will fail (with a nice exception message)
* address check style violations
* update rest spec unit tests to include stability
* found one more test spec file not declaring stability, made sure stability appears after documentation everywhere
* cluster.state is stable, mark response in some way to denote its a key value format that can be changed during minors
* mark data frame API's as beta
* remove internal and private as states for an API
* removed the wrong enum values in the Stability Enum in the previous commit
The painless context api is internal and currently meant only for use in
generating docs. This commit moves the spec file for the api so that it
is only used by the test for this api, and not externally by any clients
building from the public rest spec.
* Documents the new deprecations options on the rest-api-spec
Relates #41439#38613#35262
* remove reference to path now that #41452 is merged, also fixed missing a comma rendering the example json invalid
* removed one more instance of path
* make sure json examples are self contained and not excerpts
Both TransportAnalyzeAction and CategorizationAnalyzer have logic to build
custom analyzers for index-independent analysis. A lot of this code is duplicated,
and it requires the AnalysisRegistry to expose a number of internal provider
classes, as well as making some assumptions about when analysis components are
constructed.
This commit moves the build logic directly into AnalysisRegistry, reducing the
registry's API surface considerably.
Previously, a reindex request had two different size specifications in the body:
* Outer level, determining the maximum documents to process
* Inside the source element, determining the scroll/batch size.
The outer level size has now been renamed to max_docs to
avoid confusion and clarify its semantics, with backwards compatibility and
deprecation warnings for using size.
Similarly, the size parameter has been renamed to max_docs for
update/delete-by-query to keep the 3 interfaces consistent.
Finally, all 3 endpoints now support max_docs in both body and URL.
Relates #24344
Adds a metadata field to snapshots which can be used to store arbitrary
key-value information. This may be useful for attaching a description of
why a snapshot was taken, tagging snapshots to make categorization
easier, or identifying the source of automatically-created snapshots.
Remove `common` query and `cutoff_frequency` parameter of
`match` and `multi_match` queries. Both have already been
deprecated for the next 7.x version.
Closes: #37096