SimpleFS is deprecated and will be removed in Lucene 9. This commit
deprecates SimpleFS in 7.x and uses NIOFS for SimpleFS in Elasticsearch
7.15 or later as it offers superior or equivalent performance to
SimpleFS.
Closes #74036.
Since some orchestration platforms forbid periods in
environment variable names, allow Docker users to pass settings to ES
using an alternative name scheme. For example:
bootstrap.memory_lock
...becomes:
ES_BOOTSTRAP_MEMORY__LOCK
The setting name is uppercased, prefixed, all underscores are converted
to double underscores, and all periods are converted to underscores.
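The conversion rules above can be sketched as a small function. This is only an illustration: the function name is hypothetical, and the `ES_` prefix is taken from the example in this message.

```python
def setting_to_env_var(setting: str, prefix: str = "ES_") -> str:
    """Convert a dotted setting name to the alternative env var form."""
    # Double the existing underscores first, so they stay distinguishable
    # from the single underscores that replace periods.
    escaped = setting.replace("_", "__").replace(".", "_")
    return prefix + escaped.upper()


print(setting_to_env_var("bootstrap.memory_lock"))  # ES_BOOTSTRAP_MEMORY__LOCK
```

Doubling underscores before replacing periods keeps the mapping reversible.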
`field_masking_span` is the only span query that does not begin with
`span_`. This commit deprecates the existing name and adds a new
name `span_field_masking` to better fit with the other queries.
This PR adds support for using the `slice` option in point-in-time searches. By
default, the slice query splits documents based on their Lucene ID. This
strategy is more efficient than the one used for scrolls, which is based on the
`_id` field and must iterate through the whole terms dictionary. When slicing a
search, the same point-in-time ID must be used across slices to guarantee the
partitions don't overlap or miss documents.
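As a rough sketch of the request shape, the helper below builds one search body per slice, all sharing the same point-in-time ID. The helper name, the `match_all` query, and the `1m` keep-alive are illustrative assumptions.

```python
def sliced_pit_requests(pit_id: str, max_slices: int, keep_alive: str = "1m") -> list:
    """Build one search body per slice, all sharing the same PIT ID."""
    if max_slices < 2:
        raise ValueError("slicing needs at least two slices")
    return [
        {
            "slice": {"id": slice_id, "max": max_slices},
            # Every slice must reference the same point-in-time ID.
            "pit": {"id": pit_id, "keep_alive": keep_alive},
            "query": {"match_all": {}},
        }
        for slice_id in range(max_slices)
    ]


bodies = sliced_pit_requests("my-pit-id", 3)  # "my-pit-id" is a placeholder
```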
Closes #65740.
We incorrectly list `wait_for` as a valid `refresh` argument for the
following APIs:
* Delete by query
* Multi get
* Reindex
This fixes that error. It also updates the get API docs for consistency.
Closes #65031
Documents async SQL search functionality.
I plan to add formal API documentation for the async APIs with a later PR.
Relates to #73991 and #74845.
Add documentation for the newly introduced CircuitBreaker, which is
used to restrict the memory usage for an EQL sequence query to avoid
OutOfMemory exceptions.
Follows: #74381
Today the docs on setting `tcp_retries2` only talk about intra-cluster
connections, but in fact this setting is equally important to the
resilience of remote cluster connections too. This commit rewords these
docs to cover both cases.
Relates #34405
In preparation for #74845, we need to create formal API reference documentation for our SQL APIs.
Due to the number of SQL APIs, we'll likely need to create a separate nested page for them. For parity, this PR moves
our EQL APIs to a separate page as well. Previously, they were listed under our search APIs.
Clarifies that you cannot specify an alias in the delete index API. You _can_ delete indices with an alias.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Changes:
* Adds a tutorial for search templates.
* Adds reference docs for the render search template API.
* Improves parameter documentation for the multi search template API.
* Removes duplicate examples from the search template API, multi search API, and create stored script API docs.
* Splits the source files for the search template API and the multi search template API docs.
Add a dynamic transient cluster setting, search.max_async_search_response_size,
that controls the maximum allowed size for a stored async search
response. The default maximum size is 10MB. An attempt to store
an async search response larger than this size results in an error.
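For illustration, a cluster settings update body for this setting might be built as below. The helper name is hypothetical, and the `10mb` value string assumes the usual byte-size notation.

```python
def max_async_response_size_update(size: str = "10mb") -> dict:
    """Body for a cluster settings update that changes the limit transiently."""
    return {"transient": {"search.max_async_search_response_size": size}}


body = max_async_response_size_update("20mb")
```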
Relates to #67594
This commit is related to #73497. It adds two new settings. The first setting
is transport.compression_scheme. This setting allows the user to
configure LZ4 or DEFLATE as the transport compression. Additionally, it
modifies transport.compress to support the value indexing_data. When
this setting is set to indexing_data, only messages that are primarily
composed of raw source data are compressed: bulk, operations
recovery, and shard changes messages.
* [DOCS] Add performance info for runtime fields
* Add script-based sorting and clarify performance
* Changing title to Incentives and reworking the intro
* Removes docs and references for the following `geo_shape` mapping parameters:
  * `tree`
  * `tree_levels`
  * `strategy`
  * `distance_error_pct`
* Updates a related breaking change.
Relates to #70850
This adds support for the range aggregation over `histogram` mapped fields.
Decisions made for implementation:
- Sub-aggregations are not allowed. This is to simplify implementation and follows the prior art set by the `histogram` aggregation
- Nothing fancy is done with the ranges. No filter translations as we cannot easily do a `range` filter query against histogram fields. This may be an optimization in the future.
- Ranges check the histogram value ONLY. No interpolation of values is done. If we have better statistics around the histogram this MAY be possible.
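The "histogram value only" rule can be sketched as follows. This is a simplified model, not the aggregator itself: it assumes each document stores parallel `values` and `counts` arrays, that a range includes `from` and excludes `to`, and that each matching value contributes its count to the range.

```python
def range_doc_counts(histograms: list, ranges: list) -> list:
    """Sum the counts of histogram values that fall into each range."""
    totals = [0] * len(ranges)
    for histogram in histograms:
        for value, count in zip(histogram["values"], histogram["counts"]):
            for i, rng in enumerate(ranges):
                lower = rng.get("from", float("-inf"))
                upper = rng.get("to", float("inf"))
                # Only the stored value is checked; nothing is interpolated.
                if lower <= value < upper:
                    totals[i] += count
    return totals


docs = [{"values": [0.1, 0.5, 1.2], "counts": [3, 7, 2]}]
print(range_doc_counts(docs, [{"to": 0.5}, {"from": 0.5, "to": 1.0}, {"from": 1.0}]))
```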
This adds support for a `dry_run` parameter for the
`_ilm/migrate_to_data_tiers` API. It defaults to `false`, but when
set to `true` it simulates the migration of Elasticsearch
entities to data tier based routing, returning the entities that need to
be updated (indices, ILM policies, and the legacy index template that would
be deleted, if one was configured in the request).
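A minimal sketch of how a client might build the request path, with the helper name being hypothetical:

```python
from urllib.parse import urlencode


def migrate_to_data_tiers_path(dry_run: bool = False) -> str:
    """Return the request path, adding dry_run only when simulating."""
    path = "/_ilm/migrate_to_data_tiers"
    if dry_run:
        return path + "?" + urlencode({"dry_run": "true"})
    return path


print(migrate_to_data_tiers_path(dry_run=True))
```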
To switch an index's lifecycle policy, you must first remove the existing
policy. Otherwise, phase execution for the index may silently fail.
Closes #70151
You can now use a wildcard pattern to remove data stream and index
aliases in the same action/request.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This adds the _ilm/migrate_to_data_tiers API to expose the service for
migrating the Elasticsearch abstractions (indices, ILM policies, and an
optional legacy template to delete) to data tiers routing allocation
(away from custom node attributes).
Added the dimension parameter to the following field types:
* keyword
* ip
* Numeric field types (integer, long, byte, short)
The dimension parameter is of type boolean (default: false) and is used
to mark that a field is a time series dimension field.
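As an illustration, a field mapping that uses the parameter could be built like this. The helper and the `host` field name are hypothetical; the parameter name `dimension` is the one described in this message.

```python
DIMENSION_FIELD_TYPES = {"keyword", "ip", "integer", "long", "byte", "short"}


def dimension_field(field_type: str, dimension: bool = False) -> dict:
    """Build a field mapping, optionally marked as a time series dimension."""
    if field_type not in DIMENSION_FIELD_TYPES:
        raise ValueError(f"[dimension] is not supported on [{field_type}] fields")
    mapping = {"type": field_type}
    if dimension:
        # Omitted when false, since false is the default.
        mapping["dimension"] = True
    return mapping


mappings = {"properties": {"host": dimension_field("keyword", dimension=True)}}
```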
Relates to #74014
* [DOCS] Remove beta label for most service accounts docs
* Remove beta label from additional service account files
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This PR returns the get snapshots API to the 7.x format (and transport client behavior) and enhances it for requests that ask for multiple repositories.
The changes for requests that target multiple repositories are:
* Add `repository` field to `SnapshotInfo` and REST response
* Add `failures` map alongside `snapshots` list instead of returning just an exception response as done for single repo requests
* Pagination now works across repositories instead of being per repository for multi-repository requests
Closes #69108
Closes #43462
This commit adds the "in_use_by" object to the response for ILM policies. This map shows the
indices, data streams, and composable templates that use the ILM policy.
An example output may look like:
```json
{
  "logs" : {
    "version" : 1,
    "modified_date" : "2021-06-23T18:42:08.381Z",
    "policy" : {
      ...
    },
    "in_use_by" : {
      "indices" : [".ds-logs-foo-barbaz-2021.06.23-000001", ".ds-logs-foo-other-2021.06.23-000001"],
      "data_streams" : ["logs-foo-barbaz", "logs-foo-other"],
      "composable_templates" : ["logs"]
    }
  }
}
```
Resolves #73869
This is a quality of life improvement for typical users. Almost all anomaly jobs will receive their data through a datafeed.
The datafeed config can now be supplied and is available in the datafeed field in the job config for creation and getting jobs.
A reindex from a remote cluster doesn't support automatic or manual slicing.
This reuses a related note from the reindex docs in the upgrade docs.
Closes #54243.
Previously the close job API required that, if the job had an
associated datafeed, that datafeed was stopped
before the job could be closed. Experience has shown that this
is just a pedantic nuisance. If a user closes the job without
first stopping the datafeed then it's just a mistake, and they
then have to make two further calls, to stop the datafeed and
then attempt to close the job again.
This PR changes the behaviour so that if you ask to close a job
whose datafeed is running then the datafeed gets stopped first
as part of the same call. Datafeeds are stopped with the same
level of force as the job close request specified.
This commit adds two related changes:
* ILM WaitForDataTierStep
* Autoscaling frozen_existence decider
The first part ensures that we wait to mount an index until a node that
can hold the index is available, avoiding a failed restore and red
cluster state. This is particularly important for the frozen phase, but
is done generically in the searchable snapshot action.
The second part triggers on indices in the ILM frozen phase to scale the
tier into existence by requiring a minimal amount of memory and storage.
Closes #72771
I was helping some folks debug an issue with the terms agg and noticed
that we didn't always have the `total_buckets` debug information. I also
noticed that we can't tell how many buckets we build, so I added that
too as `built_buckets`.
Finally, I noticed that when we're using segment ords we count segments
without any values as "multi-valued". We can do better there and count
them as no-valued. That will, mostly, just improve the profiling. When
we collect from global ords we have no way to tell how many values are
on the segment so segments without any values will, sadly, in this case
still be miscounted as multi-valued.
When we introduced dynamic:runtime (#65489) we decided to have it create objects dynamically under properties, as the runtime section did not (and still does not) support object fields. That proved to be a poor choice, because the runtime section is flat, supports dots in field names, and does not really need objects. Also, these end up causing unnecessary mapping conflicts.
With this commit we adapt dynamic:runtime to not dynamically create objects.
Closes #70268
Today we don't really describe why using `index.shard.check_on_startup`
is such a bad idea, or what to do instead. This commit expands the docs
to clarify what it does, why it's not really necessary and what to do
instead. It also now logs a warning every time the startup checks run to
encourage users to stop using this setting.
https://github.com/elastic/elasticsearch/pull/74201 adds `null` handling to the arg descriptions of several string functions.
This PR moves pre-existing docs for `null` handling and similar edge case handling for string functions to arg descriptions for consistency.
Relates to #74193
Adds pagination and snapshots for the get snapshots API, built on top of the current implementation to enable work that needs this API for testing. A follow-up will leverage the changes to make things more efficient via pagination.
Relates https://github.com/elastic/elasticsearch/pull/73570 which does part of the under-the-hood changes required to efficiently implement this API on the repository layer.
Neither of these APIs parses a request body; all parameters are taken
from the query string. This also adds the master timeout param include,
which was missing here as well.
In #74138 we noted that index settings aren't copied in a clone. In fact
that's not true: we copy every setting except the explicitly excluded ones,
`number_of_replicas` and `auto_expand_replicas`. This fixes the mistake.
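A simplified sketch of that copy rule follows. It assumes the two settings named above are the complete exclusion list and that they appear under their `index.`-prefixed names; the helper is hypothetical.

```python
EXCLUDED_FROM_CLONE = ("index.number_of_replicas", "index.auto_expand_replicas")


def cloned_settings(source_settings: dict) -> dict:
    """Copy every source index setting except the explicitly excluded ones."""
    return {
        name: value
        for name, value in source_settings.items()
        if name not in EXCLUDED_FROM_CLONE
    }


source = {"index.number_of_replicas": "2", "index.refresh_interval": "30s"}
print(cloned_settings(source))
```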
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Today if sending file chunks is CPU-bound (e.g. when using compression)
then we tend to concentrate all that work onto relatively few threads,
even if `indices.recovery.max_concurrent_file_chunks` is increased. With
this commit we fork the transmission of each chunk onto its own thread
so that the CPU-bound work can happen in parallel.
Adds a new API that allows a user to reset
an anomaly detection job.
To use the API do:
```
POST _ml/anomaly_detectors/<job_id>/_reset
```
The API removes all data associated with the job.
In particular, it deletes model state, results and stats.
However, job notifications and user annotations are not removed.
Also, the API can be called asynchronously by setting the parameter
`wait_for_completion` to `false` (defaults to `true`). When run
that way the API returns the task id for further monitoring.
In order to prevent the job from opening while it is resetting,
a new job field has been added called `blocked`. It is an object
that contains a `reason` and the `task_id`. `reason` can take
a value from ["delete", "reset", "revert"] as all these
operations should block the job from opening. The `task_id` is also
included in order to allow tracking the task if necessary.
Finally, this commit also sets the `blocked` field when
the revert snapshot API is called as a job should not be opened
while it is reverted to a different model snapshot.
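The effect of the `blocked` field on opening a job can be sketched as below. This is an illustrative model of the check, not the server-side code; the function name and the placeholder task ID are hypothetical.

```python
BLOCKING_REASONS = {"delete", "reset", "revert"}


def can_open(job: dict) -> bool:
    """Return False while a delete, reset, or revert is blocking the job."""
    blocked = job.get("blocked")
    if blocked is None:
        return True
    return blocked.get("reason") not in BLOCKING_REASONS


# "node:123" is a placeholder task ID.
job = {"job_id": "my-job", "blocked": {"reason": "reset", "task_id": "node:123"}}
print(can_open(job))
```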
It is useful to know the following information when reading datafeed stats:
- Is the datafeed a "real-time" datafeed, i.e. a datafeed without a configured `end` time?
- Has the datafeed processed all past data available at the time of starting?
The `running_state` object is only available if the datafeed task has been created.
It has the form:
```
"running_state": {
  "is_real_time": <boolean>,
  "look_back_finished": <boolean>
}
```
Changes:
* Removes a reference to the
`-XX:+UnlockDiagnosticVMOptions -XX:+PrintCompressedOopsMode` JVM option. This
option is no longer supported.
* Combines `Xms/Xmx` recommendations for compressed oops.
Closes #71644.
Co-authored-by: Rick Boyd <boyd.richardj@gmail.com>
Adds a new keep_values gap policy that works like skip, except if the metric
calculated on an empty bucket provides a non-null non-NaN value, this value is
used for the bucket.
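The difference between the two policies can be sketched like this. It is a simplified model, assuming an empty bucket is one with a zero document count and using `None` to signal that the bucket is skipped; the function name is hypothetical.

```python
import math


def resolve_bucket_value(metric_value, doc_count: int, gap_policy: str):
    """Return the value a pipeline aggregation would use, or None to skip."""
    if doc_count > 0:
        return metric_value
    usable = metric_value is not None and not math.isnan(metric_value)
    if gap_policy == "keep_values" and usable:
        # The metric still produced a real value on the empty bucket.
        return metric_value
    return None  # behaves like skip


print(resolve_bucket_value(0.0, 0, "keep_values"))
print(resolve_bucket_value(0.0, 0, "skip"))
```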
Fixes #27377
Co-authored-by: Mark Tozzi <mark.tozzi@gmail.com>
Changes:
* Combines the `Document counts are approximate` and `Calculating document count
error` sections.
* Rewrites the section to include `sum_other_doc_count` and
`doc_count_error_upper_bound` for easier on-page (ctrl+f) searching.
Closes #73200
Improve the error message when inconsistent mappings cause doc value formatting errors. For example, trying to format a binary encoded IP address as a UTF8 string often fails with something unexpected, like `ArrayIndexOutOfBounds`. This change catches that and wraps it with a message suggesting the user check their mappings. Also gets rid of anonymous instances for doc value formatters, which made it hard to see what format was failing to be applied.
In #55805, we added a setting to allow single data node clusters to
respect the high watermark. In #73733 we added the related deprecations.
This commit ensures the only valid value for the setting is true and
adds deprecations if the setting is set. The setting will be removed
in a future release.
Co-authored-by: David Turner <david.turner@elastic.co>
This commit adds a short note to the docs on repository backups
indicating that the repository must not be modified while registered, so
that a restore from a repository backup must complete before
registration.
Relates #73730
This adds a new pipeline aggregation for calculating a Kolmogorov–Smirnov test for a given sample and buckets path.
For now, the buckets path resolution needs to be `_count`, but this may be relaxed in the future.
It accepts a parameter `fractions` that indicates the distribution of documents from some other pre-calculated sample.
This particular version of the K-S test is two-sample, meaning that it calculates whether the `fractions` and the distribution of `_count` values in the buckets_path are taken from the same distribution.
This, in combination with the hypothesis alternatives (`less`, `greater`, `two_sided`) and sampling logic (`upper_tail`, `lower_tail`, `uniform`), allows for flexibility and usefulness when comparing two samples and determining the likelihood of them being from the same overall distribution.
Usage:
```
POST correlate_latency/_search?size=0&filter_path=aggregations
{
  "aggs": {
    "buckets": {
      "terms": { <1>
        "field": "version",
        "size": 2
      },
      "aggs": {
        "latency_ranges": {
          "range": { <2>
            "field": "latency",
            "ranges": [
              { "to": 0.0 },
              { "from": 0, "to": 105 },
              { "from": 105, "to": 225 },
              { "from": 225, "to": 445 },
              { "from": 445, "to": 665 },
              { "from": 665, "to": 885 },
              { "from": 885, "to": 1115 },
              { "from": 1115, "to": 1335 },
              { "from": 1335, "to": 1555 },
              { "from": 1555, "to": 1775 },
              { "from": 1775 }
            ]
          }
        },
        "ks_test": { <3>
          "bucket_count_ks_test": {
            "buckets_path": "latency_ranges>_count",
            "alternative": ["less", "greater", "two_sided"]
          }
        }
      }
    }
  }
}
```
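For intuition only, the textbook two-sample K-S statistic over bucketed data is the largest gap between the two cumulative distributions. The sketch below computes just that statistic; it does not reproduce the aggregation's sampling logic, its alternatives, or its p-values, and the function name is hypothetical.

```python
from itertools import accumulate


def ks_statistic(fractions: list, counts: list) -> float:
    """Largest absolute gap between the two cumulative distributions."""
    if len(fractions) != len(counts):
        raise ValueError("fractions and counts must have the same length")
    fraction_total = sum(fractions)
    count_total = sum(counts)
    if fraction_total == 0 or count_total == 0:
        raise ValueError("both samples must be non-empty")
    # Normalize each sample, then accumulate into a CDF per bucket.
    cdf_a = accumulate(f / fraction_total for f in fractions)
    cdf_b = accumulate(c / count_total for c in counts)
    return max(abs(a - b) for a, b in zip(cdf_a, cdf_b))


print(ks_statistic([0.5, 0.5], [10, 10]))
```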
This commit adds the ability to specify exclusion patterns in Auto-Follow patterns. This allows excluding indices that match any of the inclusion patterns and also match some of the exclusion patterns, giving more fine-grained control in scenarios where this is important.
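The matching rule can be sketched as below. This is an approximation: `fnmatchcase` also understands `?` and `[...]`, which index patterns may not, and the function name is hypothetical.

```python
from fnmatch import fnmatchcase


def should_auto_follow(index: str, inclusion_patterns, exclusion_patterns=()) -> bool:
    """Follow an index that matches an inclusion and no exclusion pattern."""
    included = any(fnmatchcase(index, p) for p in inclusion_patterns)
    excluded = any(fnmatchcase(index, p) for p in exclusion_patterns)
    return included and not excluded


print(should_auto_follow("logs-internal-1", ["logs-*"], ["logs-internal-*"]))
```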
Relates to #67686
The value of `*.ssl.client_authentication` is `required` for
everything except `xpack.security.http.ssl.client_authentication`, for
which it is `none`.
The doc template for this setting was configured to have a default
value, and allow an override. However, the default was set to `none`
when it should have been `required`.
The override for `http` was correctly set to `none` (but that didn't
really do anything, since that was the same as the default).
This commit changes the default to `required`, which matches the code
(see `XPackSettings.CLIENT_AUTH_DEFAULT`), and leaves the override for
http as `none` (see `XPackSettings.HTTP_CLIENT_AUTH_DEFAULT`).
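The documented defaults amount to the following rule, sketched here with a hypothetical helper that takes the SSL settings prefix:

```python
def default_client_authentication(ssl_prefix: str) -> str:
    """Documented default for <ssl_prefix>.client_authentication."""
    # Only the HTTP layer overrides the general default.
    if ssl_prefix == "xpack.security.http.ssl":
        return "none"
    return "required"


print(default_client_authentication("xpack.security.transport.ssl"))
```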
* Add new thread pool for critical operations
* Split critical thread pool into read and write
* Add POJO to hold thread pool names
* Add tests for critical thread pools
* Add thread pools to data streams
* Update settings for security plugin
* Retrieve ExecutorSelector from SystemIndices where possible
* Use a singleton ExecutorSelector
* [DOCS] Add retrieving from flattened fields
* Clarify sub-field syntax
* Moving sub-field retrieval to flattened field docs
* Remove full example and de-emphasize runtime fields
* Remove extraneous sample tag
With 230b860d95, the `elastic/tap/elasticsearch-oss` tap was removed
from Homebrew. This
removes outdated references to the tap from our docs.
It also notes that Homebrew installs the latest version of Elasticsearch.
Changes:
* Revises the size your shards guide to use a 50GB shard guideline. This better aligns with our default in the ILM policy UI.
* Updates the language to indicate that the 50GB shard guideline is not a hard limit. Larger shards may work depending on the network and use case.
Reverts some changes added in #71367.
Changes:
* Reuses the same `aliases` object properties in the following API docs:
* Clone index API
* Create index API
* Put component template API
* Put legacy index template API
* Put index template API
* Rollover index API
* Shrink index API
* Simulate template API
* Split index API
* Updates the `aliases` object properties for the simulate index API docs.
Closes #73044
Recent JDK releases have disabled TLS v1.0 and TLS v1.1 by default.
See:
- https://java.com/en/jre-jdk-cryptoroadmap.html
- https://bugs.openjdk.java.net/browse/JDK-8202343
This change adds documentation clarifying which TLS versions are
supported on which JDKs (in general terms, rather than specific builds)
and how to change the configuration if necessary.
Co-authored-by: Adam Locke <adam.locke@elastic.co>
Relates to #70755.
The main changes of this PR are:
* Add an optional _meta field to ILM policy.
* Add some test code for the change.
* Update the docs for the create or update lifecycle policy API.
Changes:
* Updates the write index snippet to use data streams
* Notes that data stream aliases don't set an implicit write data stream, even if the alias points to one data stream.
Categorization jobs created once the entire cluster is upgraded to
version 7.14 or higher will default to using the new ml_standard
tokenizer rather than the previous default of the ml_classic
tokenizer, and will incorporate the new first_non_blank_line char
filter so that categorization is based purely on the first non-blank
line of each message.
The difference between the ml_classic and ml_standard tokenizers
is that ml_classic splits on slashes and colons, so creates multiple
tokens from URLs and filesystem paths, whereas ml_standard attempts
to keep URLs, email addresses and filesystem paths as single tokens.
It is still possible to configure the ml_classic tokenizer if you
prefer: just provide a categorization_analyzer within your
analysis_config and whichever tokenizer you choose (which could be
ml_classic or any other Elasticsearch tokenizer) will be used.
To opt out of using first_non_blank_line as a default char filter,
you must explicitly specify a categorization_analyzer that does not
include it.
If no categorization_analyzer is specified but categorization_filters
are specified, then the categorization filters are converted to char
filters that are applied after first_non_blank_line.
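As a sketch, the `categorization_analyzer` part of an `analysis_config` that opts back into ml_classic might be built like this. The helper is hypothetical, and the other fields an `analysis_config` needs are omitted.

```python
def categorization_analyzer_config(tokenizer: str = "ml_classic",
                                   first_non_blank_line: bool = True) -> dict:
    """Build only the categorization_analyzer part of an analysis_config."""
    analyzer = {"tokenizer": tokenizer}
    if first_non_blank_line:
        # Leave this char filter out to opt out of first-line-only behavior.
        analyzer["char_filter"] = ["first_non_blank_line"]
    return {"categorization_analyzer": analyzer}


print(categorization_analyzer_config())
```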
Closes elastic/ml-cpp#1724
Today when upgrading to the next major version we have a so-called
_major version barrier_: once the cluster comprises nodes of the new
major version then nodes of the previous major version are prevented
from joining the cluster. This means we can be certain that
`clusterState.nodes().getMinNodeVersion().major` will never decrease, so
we can implement upgrade logic that relies on the cluster remaining in
its wholly-upgraded state.
This commit generalises this behaviour to apply to all upgrades, so that
we can be certain that `clusterState.nodes().getMinNodeVersion()` will
never decrease in a running cluster.
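The generalised barrier can be pictured as a simple comparison. This is a toy model that represents versions as tuples and ignores every other join validation check; the function name is hypothetical.

```python
def may_join(joining_version: tuple, cluster_node_versions: list) -> bool:
    """Reject a node older than the oldest node already in the cluster."""
    # cluster_node_versions must be non-empty.
    min_node_version = min(cluster_node_versions)
    return joining_version >= min_node_version


cluster = [(7, 14, 0), (7, 14, 1)]
print(may_join((7, 13, 4), cluster))
```

With this rule, the minimum node version in the cluster can only stay the same or increase as nodes join.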
Closes #72911
This allows indexing documents into a data stream alias.
The ingestion is then forwarded to the write index of the data stream
that is marked as the write data stream.
The `is_write_index` parameter can be used to indicate what the write data stream is
when updating or adding a data stream alias.
Relates to #66163
* [DOCS] Create a new page for dissect in scripting docs
* Expanding a bit more
* Adding a section for using dissect patterns
* Adding tests
* Fix test cases and other edits
* [DOCS] Moving grok to its own scripting page
* Adding examples
* Updating cross link for grok page
* Adds same runtime field in a search request for #73262
* Clarify titles and shift navigation
* Incorporating review feedback
* Updating cross-link to Painless
* [DOCS] Expand information on when to use a runtime field without a script
* Reworking information based on review feedback
* Clarify case where doc_values are disabled
* A few minor changes from review feedback
These settings were deprecated in 7.13+ in #72835 and are now removed by this commit.
This commit also ensures that the settings are removed from index metadata when the metadata is
loaded. The reason for this is that if we allow the settings to remain (because they are not
technically "invalid"), then the index will not be able to be allocated, because the
FilterAllocationDecider will be looking for nodes with the _tier attribute.
Changes:
* Expands the `aliases` parameter for the create index API to better document
supported properties.
* Reuses `aliases` parameter in the following API docs:
* Clone index API
* Shrink index API
* Split index API
Updates an outdated reference to empty `data_stream` objects. The create index
template API's `data_stream` parameter now supports the `hidden` property.
Changes:
* Reuses and reorders the index template API's body parameters in the simulate template API docs.
* Replaces several includes with a shorter xref.
* Reformats a sidebar on naming collisions with built-in index templates.
Updates the exists API docs to better reflect its support of data streams and
aliases.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This documents how to configure the proxy support for ODBC.
It also removes the documentation of the connection string values, these
are now all covered by the GUI settings.
Co-authored-by: Andrei Stefan <astefan@users.noreply.github.com>
Fixed autoscaling docs to no longer refer to partially mounted indices or
shards as frozen indices/shards. The docs now use partially mounted indices or
shards.
Closes #73132
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
This change adds support for using `search_after` with field collapsing. When
using these in conjunction, the same field must be used for both sorting and
field collapsing. This helps keep the behavior simple and predictable.
Otherwise it would be possible for a group to appear on multiple pages of
results.
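A request body that respects this constraint might look like the sketch below. The helper name and the `user.id` field are illustrative assumptions.

```python
def collapsed_page(field: str, search_after=None, size: int = 10, order: str = "asc") -> dict:
    """Build a search body that sorts and collapses on the same field."""
    body = {
        "size": size,
        "collapse": {"field": field},
        # Sorting on the collapse field keeps each group on a single page.
        "sort": [{field: order}],
    }
    if search_after is not None:
        body["search_after"] = list(search_after)
    return body


first_page = collapsed_page("user.id")
next_page = collapsed_page("user.id", search_after=["user-42"])
```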
Currently search after is handled directly in `CollapsingTopDocsCollector`. As
a follow-up, we could generalize the logic and move support to the Lucene
grouping framework.
Closes #53115.
Today the docs indicate that restoring a snapshot with
`include_global_state` set will merge the ingest pipelines, ILM
policies, settings etc in the snapshot with those already in the
cluster. This isn't the case: we simply replace all the things. This
commit corrects the docs.
The get alias API should take into account the aliases parameter when
returning aliases that refer to data streams and not return entries
for data streams that don't have any aliases pointing to them.
Relates to #66163
Adds a new snapshot meta pool that is used to speed up the get snapshots API
by making `SnapshotInfo` load in parallel. Also use this pool to load
`RepositoryData`.
A follow-up to this would expand the use of this pool to the snapshot status
API and make it run in parallel as well.
The current search API documentation doesn't include any examples of query
parameter usage.
This updates the docs to include a simple syntax example using the `from` and
`size` query parameters.
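For illustration, the query string for such a request can be built as follows. The helper and the `my-index` name are hypothetical.

```python
from urllib.parse import urlencode


def search_path(index: str, from_: int = 0, size: int = 10) -> str:
    """Build a search request path with from and size query parameters."""
    return f"/{index}/_search?" + urlencode({"from": from_, "size": size})


print(search_path("my-index", from_=40, size=20))  # /my-index/_search?from=40&size=20
```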
Changes:
* Removes an error in the create SLM policy API's `schedule` parameter
def. `schedule` is not used to delete expired snapshots.
* Updates the `expire_after` parameter def to mention the
`slm.retention_schedule` cluster setting.
This commit adds a `cancelled` flag to each cancellable task in the
response to the list tasks API, allowing users to see that a task has
been properly cancelled and will complete as soon as possible.
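A client could use the flag roughly as sketched below. The sample assumes the list tasks response nests tasks under `nodes.<node_id>.tasks`; the helper name and the IDs are placeholders.

```python
def cancelled_task_ids(list_tasks_response: dict) -> list:
    """Collect IDs of cancellable tasks that report cancelled: true."""
    task_ids = []
    for node in list_tasks_response.get("nodes", {}).values():
        for task_id, task in node.get("tasks", {}).items():
            if task.get("cancellable") and task.get("cancelled"):
                task_ids.append(task_id)
    return sorted(task_ids)


response = {
    "nodes": {
        "node-1": {
            "tasks": {
                "node-1:10": {"cancellable": True, "cancelled": True},
                "node-1:11": {"cancellable": True, "cancelled": False},
            }
        }
    }
}
print(cancelled_task_ids(response))
```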
Closes #72907
If a node is partitioned away from the rest of the cluster then the
`ClusterFormationFailureHelper` periodically reports that it cannot
discover the expected collection of nodes, but does not indicate why. To
prove it's a connectivity problem, users must today restart the node
with `DEBUG` logging on `org.elasticsearch.discovery.PeerFinder` to see
further details.
With this commit we log messages at `WARN` level if the node remains
disconnected for longer than a configurable timeout, which defaults to 5
minutes.
Relates #72968