SimpleFS is deprecated and will be removed in Lucene 9. This commit
deprecates SimpleFS in 7.x and uses NIOFS for SimpleFS in Elasticsearch
7.15 or later as it offers superior or equivalent performance to
SimpleFS.
Closes#74036. Since some orchestration platforms forbid periods in
environment variable names, allow Docker users to pass settings to ES
using an alternative name scheme. For example:
bootstrap.memory_lock
...becomes:
ES_BOOTSTRAP_MEMORY__LOCK
The setting name is uppercased, prefixed, all underscores are converted
to double underscores, and all periods are converted to underscores.
`field_masking_span` is the only span query that does not begin with
`span_`. This commit deprecates the existing name and adds a new
name `span_field_masking` to better fit with the other queries.
This PR adds support for using the `slice` option in point-in-time searches. By
default, the slice query splits documents based on their Lucene ID. This
strategy is more efficient than the one used for scrolls, which is based on the
`_id` field and must iterate through the whole terms dictionary. When slicing a
search, the same point-in-time ID must be used across slices to guarantee the
partitions don't overlap or miss documents.
Closes#65740.
We incorrectly list `wait_for` as a valid `refresh` argument for the
following APIs:
* Delete by query
* Multi get
* Reindex
This fixes that error. It also updates the get API docs for consistency.
Closes#65031
Documents async SQL search functionality.
I plan to add formal API documentation for the async APIs with a later PR.
Relates to #73991 and #74845.
# Conflicts:
# docs/reference/release-notes/highlights.asciidoc
Add documentation for the newly introduced CircuitBreaker, which is
used to restrict the memory usage for an EQL sequence query to avoid
OutOfMemory exceptions.
Follows: #74381
Today the docs on setting `tcp_retries2` only talk about intra-cluster
connections, but in fact this setting is equally important to the
resilience of remote cluster connections too. This commit rewords these
docs to cover both cases.
Relates #34405
In preparation for #74845, we need to create formal API reference documentation for our SQL APIs.
Due to the number of SQL APIs, we'll likely need to create a separate nested page for them. For parity, this PR moves
our EQL APIs to a separate page as well. Previously, they were listed under our search APIs.
Clarifies that you cannot specify an alias in the delete index API. You _can_ delete indices with an alias.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Changes:
* Adds a tutorial for search templates.
* Adds reference docs for the render search template API.
* Improves parameter documentation for the multi search template API.
* Removes duplicate examples from the search template API, multi search API, and create stored script API docs.
* Splits the source files for the search template API and the multi search template API docs.
Add a dynamic transient cluster setting search.max_async_search_response_size
that controls the maximum allowed size for a stored async search
response. The default max size is 10Mb. An attempt to store
an async search response larger than this size will result in error.
Relates to #67594
This commit is related to #73497. It adds two new settings. The first setting
is transport.compression_scheme. This setting allows the user to
configure LZ4 or DEFLATE as the transport compression. Additionally, it
modifies transport.compress to support the value indexing_data. When
this setting is set to indexing_data only messages which are primarily
composed of raw source data will be compressed. This is bulk, operations
recovery, and shard changes messages.
* [DOCS] Add performance info for runtime fields
* Add script-based sorting and clarify performance
* Changing title to Incentives and reworking the intro
* Removes docs and references for the following `geo_shape` mapping parameters:
* `tree`
* `tree_levels`
* `strategy`
* `distance_error_pct`
* Updates a related breaking change.
Relates to #70850
This adds support for the range aggregation over `histogram` mapped fields.
Decisions made for implementation:
- Sub-aggregations are not allowed. This is to simplify implementation and follows the prior art set by the `histogram` aggregation
- Nothing fancy is done with the ranges. No filter translations as we cannot easily do a `range` filter query against histogram fields. This may be an optimization in the future.
- Ranges check the histogram value ONLY. No interpolation of values is done. If we have better statistics around the histogram this MAY be possible.
This adds support for a `dry_run` parameter for the
`_ilm/migrate_to_data_tiers` API. This defaults to `false`, but when
configured to `true` it will simulate the migration of elasticsearch
entities to data tiers based routing, returning the entites that need to
be updated (indices, ILM policies and the legacy index template that'd
be deleted, if any was configured in the request).
To switch an index's lifecycle policy, you must first remove the existing
policy. Otherwise, phase execution for the index may silently fail.
Closes#70151
You can now use a wildcard pattern to remove data stream and index
aliases in the same action/request.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This adds the _ilm/migrate_to_data_tiers API to expose the service for
migrating the elasticsearch abstractions (indices, ILM policies and an
optional legacy template to delete) to data tiers routing allocation
(away from custom node attributes)
Added the dimension parameter to the following field types:
keyword
ip
Numeric field types (integer, long, byte, short)
The dimension parameter is of type boolean (default: false) and is used
to mark that a field is a time series dimension field.
Relates to #74014
* [DOCS] Remove beta label for most service accounts docs
* Remove beta label from additional service account files
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This PR returns the get snapshots API to the 7.x format (and transport client behavior) and enhances it for requests that ask for multiple repositories.
The changes for requests that target multiple repositories are:
* Add `repository` field to `SnapshotInfo` and REST response
* Add `failures` map alongside `snapshots` list instead of returning just an exception response as done for single repo requests
* Pagination now works across repositories instead of being per repository for multi-repository requests
closes#69108closes#43462
This commit adds the "in_use_by" object to the response for ILM policies. This map shows the
indices, data streams, and composable templates that use the ILM policy.
An example output may look like:
```json
{
"logs" : {
"version" : 1,
"modified_date" : "2021-06-23T18:42:08.381Z",
"policy" : {
...
},
"in_use_by" : {
"indices" : [".ds-logs-foo-barbaz-2021.06.23-000001", ".ds-logs-foo-other-2021.06.23-000001"],
"data_streams" : ["logs-foo-barbaz", "logs-foo-other"],
"composable_templates" : ["logs"]
}
}
}
```
Resolves#73869
This is a quality of life improvement for typical users. Almost all anomaly jobs will receive their data through a datafeed.
The datafeed config can now be supplied and is available in the datafeed field in the job config for creation and getting jobs.
A reindex from a remote cluster doesn't support automatic or manual slicing.
This reuses a related note from the reindex docs in the upgrade docs.
Closes#54243.
Previously it was a requirement of the close job API that if the
job had an associated datafeed that that datafeed was stopped
before the job could be closed. Experience has shown that this
is just a pedantic nuisance. If a user closes the job without
first stopping the datafeed then it's just a mistake, and they
then have to make two further calls, to stop the datafeed and
then attempt to close the job again.
This PR changes the behaviour so that if you ask to close a job
whose datafeed is running then the datafeed gets stopped first
as part of the same call. Datafeeds are stopped with the same
level of force as the job close request specified.
This commit adds two related changes:
* ILM WaitForDataTierStep
* Autoscaling frozen_existence decider
The first part ensures that we wait mounting an index until a node that
can hold the index is available, avoiding a failed restore and red
cluster state. This is in particular important for the frozen phase, but
is done generically in the searchable snapshot action.
The second part triggers on indices in the ILM frozen phase to scale the
tier into existence by requiring a minimal amount of memory and storage.
Closes#72771
I was helping some folks debug an issue with the terms agg and noticed
that we didn't always have the `total_buckets` debug information. I also
noticed that we can't tell how many buckets we build, so I added that
too as `built_buckets`.
Finally, I noticed that when we're using segment ords we count segments
without any values as "multi-valued". We can do better there and count
them as no-valued. That will, mostly, just improve the profiling. When
we collect from global ords we have no way to tell how many values are
on the segment so segments without any values will, sadly, in this case
still be miscounted as multi-valued.
When we introduced dynamic:runtime (#65489) we decided to have it create objects dynamically under properties, as the runtime section did not (and still does not) support object fields. That proved to be a poor choice, because the runtime section is flat, supports dots in field names, and does not really need objects. Also, these end up causing unnecessary mapping conflicts.
With this commit we adapt dynamic:runtime to not dynamically create objects.
Closes#70268
Today we don't really describe why using `index.shard.check_on_startup`
is such a bad idea, or what to do instead. This commit expands the docs
to clarify what it does, why it's not really necessary and what to do
instead. It also now logs a warning every time the startup checks run to
encourage users to stop using this setting.
https://github.com/elastic/elasticsearch/pull/74201 documents `null` handling to the arg descriptions of several string functions.
This PR moves pre-existing docs for `null` handling and similar edge case handling for string functions to arg descriptions for consistency.
Relates to #74193
Pagination and snapshots for get snapshots API, build on top of the current implementation to enable work that needs this API for testing. A follow-up will leverage the changes to make things more efficient via pagination.
Relates https://github.com/elastic/elasticsearch/pull/73570 which does part of the under-the-hood changes required to efficiently implement this API on the repository layer.
Both of these APIs don't parse request bodies, the parameters are all taken
from the query string. Also, included the master timeout param include
as it was missing here also.
In #74138 we noted that index settings aren't copied in a clone. In fact
that's not true, we copy everything except explicitly-excluded ones,
`number_of_replicas` and `auto_expand_replicas`. This fixes the mistake.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Today if sending file chunks is CPU-bound (e.g. when using compression)
then we tend to concentrate all that work onto relatively few threads,
even if `indices.recovery.max_concurrent_file_chunks` is increased. With
this commit we fork the transmission of each chunk onto its own thread
so that the CPU-bound work can happen in parallel.