* Adding shard count to _nodes/stats api
Added a shards section to each node returned by the _nodes/stats api. Currently this new section only contains a total count of all shards on the node.
The current `ids` option doesn't allow pinning a specific document in a
single index when searching over multiple indices. This introduces a
`documents` option, which is an array of `_id` and `_index`
fields to allow index-specific pins.
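As a rough illustration, a request body using the new option could be built like this. The `documents` key and its `_id`/`_index` entries come from the description above; the index names, IDs, and the `organic` match query are placeholders chosen for this sketch.

```python
import json


def pinned_query(documents, organic):
    """Build a pinned query body that pins index-specific documents.

    `documents` is a list of (index, doc_id) pairs; `organic` is the
    query whose hits are ranked below the pinned documents.
    """
    return {
        "query": {
            "pinned": {
                "documents": [
                    {"_index": index, "_id": doc_id}
                    for index, doc_id in documents
                ],
                "organic": organic,
            }
        }
    }


# Placeholder index names and IDs, for illustration only.
body = pinned_query(
    [("my-index-000001", "1"), ("my-index-000002", "1")],
    {"match": {"description": "iphone"}},
)
print(json.dumps(body, indent=2))
```

The same `_id` can be pinned in one index without pinning it in the other, which the flat `ids` list cannot express.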
Closes https://github.com/elastic/elasticsearch/issues/67855.
A tag is required to reuse Elasticsearch breaking changes in the Stack
Guide. To display properly, the breaking changes must use external
links rather than xrefs.
This PR correctly places those tags for reuse. It also replaces
several xrefs with external links for reuse.
Changes:
* Updates the Kibana dev console screenshots to use the new EUI theme
* Corrects an example SQL query in one screenshot. The previous screenshot used `_/sql`, which is not a valid URI.
Adds formal API docs for the following APIs:
* Clear SQL cursor
* SQL search
* SQL translate
Other changes:
* Removes and redirects the "Supported REST parameters" section. This is now covered in the SQL search API docs.
* Updates a few related xrefs.
Closes #75085
In the upcoming Lucene 9 release, `indices.query.bool.max_clause_count` is
going to apply to the entire query tree rather than per `bool` query. In order
to avoid breaks, the limit has been bumped from 1024 to 4096.
The semantics will effectively change when we upgrade to Lucene 9. This PR
is only about agreeing on a migration strategy and documenting this change.
To avoid further breaks, I am leaning towards keeping the current setting name
even though it contains `bool`. I believe that it still makes sense given that
`bool` queries are typically the main contributors to high numbers of clauses.
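To make the difference concrete, here is a simplified sketch that compares a per-`bool` count with a whole-tree count. It only counts leaf clauses in a query given as a dict; it is an assumption-laden illustration and does not reproduce how Lucene actually counts clauses.

```python
def count_clauses(query):
    """Count leaf clauses in a (possibly nested) bool query dict."""
    if "bool" not in query:
        return 1  # any non-bool query counts as one clause in this sketch
    total = 0
    for occur in ("must", "should", "filter", "must_not"):
        clauses = query["bool"].get(occur, [])
        if isinstance(clauses, dict):
            clauses = [clauses]
        for clause in clauses:
            total += count_clauses(clause)
    return total


def term(i):
    # Hypothetical field name and values.
    return {"term": {"tag": f"t{i}"}}


# Two inner bool queries of 1000 clauses each, wrapped in an outer bool.
inner_a = {"bool": {"should": [term(i) for i in range(1000)]}}
inner_b = {"bool": {"should": [term(i) for i in range(1000, 2000)]}}
outer = {"bool": {"must": [inner_a, inner_b]}}

per_bool_max = max(
    len(inner_a["bool"]["should"]),
    len(inner_b["bool"]["should"]),
    len(outer["bool"]["must"]),
)
whole_tree = count_clauses(outer)

print("largest single bool:", per_bool_max)
print("whole query tree:", whole_tree)
```

In this example every individual `bool` query stays under 1024, but the tree as a whole does not, which is the kind of query the bump to 4096 is meant to keep working.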
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Today the docs for remote cluster connections use `ping_schedule` fairly
liberally, and don't mention that you should prefer TCP keepalives
wherever possible. This commit reduces the use of this setting in the
examples and adjusts the description of the setting to include a note
about TCP keepalives instead.
Date histogram interval parameter was deprecated in 7.2, in favor of the more specific fixed_interval and calendar_interval parameters. The old logic used some poorly understood guessing to decide if it should operate in fixed or calendar mode. The new logic requires a specific choice by the user, which is more explicit. In 7.x REST compatibility mode, we will parse the interval as calendar if possible, and otherwise interpret it as fixed.
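The compatibility fallback described above can be sketched as follows. This is an illustration of the stated rule (calendar if possible, otherwise fixed), not the actual parser; the function name is hypothetical.

```python
# Values accepted by calendar_interval: single calendar units only.
CALENDAR_INTERVALS = {
    "minute", "1m", "hour", "1h", "day", "1d", "week", "1w",
    "month", "1M", "quarter", "1q", "year", "1y",
}


def resolve_legacy_interval(interval):
    """Map a legacy `interval` value to one of the explicit parameters."""
    if interval in CALENDAR_INTERVALS:
        return {"calendar_interval": interval}
    # Anything else (multiples such as "90m" or "2d") is treated as fixed.
    return {"fixed_interval": interval}


for legacy in ("1d", "1M", "90m", "2d"):
    print(legacy, "->", resolve_legacy_interval(legacy))
```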
This change introduces a CLI tool that can be used to create
enrollment tokens. It doesn't require credentials, but simply
write access to the local filesystem of a node. It uses an
auto-generated user in the file-realm with superuser role.
For this purpose, this change also introduces a base class for a
CLI tool that can be used by any CLI tool that needs to perform actions
against an ES node as a superuser without requiring credentials
from the user. It is worth noting that this doesn't change our
existing threat model, because an actor with write access
to the filesystem of an ES node can already become superuser (again, by
adding a superuser to the file realm, albeit manually).
Changes:
* Adds an example script for removing a subfield from an object
* Makes several other example snippets more modular.
Relates to 60532646df and
074f84dde5
* Allow ILM move-to-step without `action` or `name`
This commit enhances ILM's move-to-step API to allow dropping the `name`, or dropping both the
`action` and `name`. For example:
```json
POST /_ilm/move/foo-1
{
"current_step": {
"phase": "hot",
"action": "rollover",
"name": "check-rollover-ready"
},
"next_step": {
"phase": "warm",
"action": "forcemerge"
}
}
```
Will move to the first step in the `forcemerge` action in the `warm` phase (without having to know
the specific step name).
Another example:
```json
POST /_ilm/move/foo-1
{
"current_step": {
"phase": "hot",
"action": "rollover",
"name": "check-rollover-ready"
},
"next_step": {
"phase": "warm"
}
}
```
Will move to the first step in the `warm` phase (without having to know the specific action name).
Bear in mind that the execution order is still entirely an implementation detail, so "first" in the
above sentences means the first step that ILM would execute.
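The accepted shapes of `next_step` can be sketched with a small request builder. Rejecting a `name` without an `action` is an assumption inferred from the wording above (drop `name`, or drop both), and the helper names are hypothetical.

```python
def next_step(phase, action=None, name=None):
    """Build a `next_step` object with an optional action and name."""
    if name is not None and action is None:
        # Assumption: a step name is only meaningful within an action.
        raise ValueError("a step name requires an action")
    step = {"phase": phase}
    if action is not None:
        step["action"] = action
    if name is not None:
        step["name"] = name
    return step


def move_request(current_step, phase, action=None, name=None):
    """Build the body for POST /_ilm/move/<index>."""
    return {
        "current_step": current_step,
        "next_step": next_step(phase, action, name),
    }


current = {"phase": "hot", "action": "rollover", "name": "check-rollover-ready"}
print(move_request(current, "warm", "forcemerge"))  # first step of the action
print(move_request(current, "warm"))                # first step of the phase
```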
Resolves #58128
* Apply Andrei's wording change (thanks!)
Co-authored-by: Andrei Dan <andrei.dan@elastic.co>
* Log index and policy name when the concrete step key can't be resolved
Co-authored-by: Andrei Dan <andrei.dan@elastic.co>
Changes:
* Documents the `dimension` mapping parameter for `ip`, `keyword`, and `numeric`
fields.
* Documents the `index.mapping.dimension_fields.limit` index setting.
By default, `logger.deprecation.level` logs messages at the `DEPRECATION` level. This updates
and reorganizes the related docs.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
In theory, Elasticsearch supported configuring a PKCS#11 keystore
anywhere a keystore/truststore could be used. For example:
xpack.security.http.ssl.keystore.type: pkcs11
However, this support was poorly tested and broken.
This commit removes PKCS#11 support from any configurable SSL context.
It does not affect the ability to use a PKCS#11 keystore as the JRE's
system default keystore/truststore.
Fixes a broken link in the `documentation.url` for the field usage stats API and
enroll Kibana API.
These broken links caused the build for the JS client docs to fail.
This change introduces a CLI tool that can be used to create
enrollment tokens. It doesn't require credentials, but simply
write access to the local filesystem of a node. It uses an
auto-generated user in the file-realm with superuser role.
For this purpose, this change also introduces a base class for a
CLI tool that can be used by any CLI tool that needs to perform actions
against an ES node as a superuser without requiring credentials
from the user. It is worth noting that this doesn't change our
existing threat model, because an actor with write access
to the filesystem of an ES node can already become superuser (again, by
adding a superuser to the file realm, albeit manually).
Co-authored-by: Adam Locke <adam.locke@elastic.co>
Adds a field usage API that reports shard-level statistics about which Lucene fields have been accessed, and which
parts of the Lucene data structures have been accessed.
Field usage statistics are automatically captured when queries are running on a cluster. A shard-level search request
that accesses a given field, even if multiple times during that request, is counted as a single use.
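The counting rule can be sketched like this. The data shape (one list of accessed field names per shard-level request) and the field names are assumptions made for the example, not the internal representation.

```python
from collections import Counter


def count_field_usage(shard_requests):
    """Count, per field, how many shard-level requests accessed it.

    Each element of `shard_requests` lists the fields one request
    touched; repeated accesses within a request are counted once.
    """
    usage = Counter()
    for accessed_fields in shard_requests:
        usage.update(set(accessed_fields))
    return usage


requests = [
    ["title", "title", "body"],      # "title" accessed twice, counted once
    ["title"],
    ["timestamp", "body", "body"],
]
usage = count_field_usage(requests)
print(dict(usage))
```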
We have already decided not to have xpack usage for field mappers
(see #53076), as mapping stats for all fields are already tracked
in cluster stats.
Moreover, xpack usage for vector fields is quite an expensive operation
(see #74974).
This removes the xpack actions for vector fields.
Adds formal API docs and JSON specs for the following APIs:
* Get async SQL search
* Get async SQL search status
* Delete async SQL search
Closes #74845
SimpleFS is deprecated and will be removed in Lucene 9. This commit
deprecates SimpleFS in 7.x and uses NIOFS for SimpleFS in Elasticsearch
7.15 or later as it offers superior or equivalent performance to
SimpleFS.
Closes #74036. Since some orchestration platforms forbid periods in
environment variable names, allow Docker users to pass settings to ES
using an alternative name scheme. For example:
bootstrap.memory_lock
...becomes:
ES_BOOTSTRAP_MEMORY__LOCK
The setting name is uppercased, prefixed, all underscores are converted
to double underscores, and all periods are converted to underscores.
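The conversion rule can be sketched as a small function. The `ES_` prefix is taken from the example above; the function name is hypothetical. Underscores are doubled before periods are replaced, otherwise the underscores produced from periods would be doubled too.

```python
def setting_to_env_name(setting, prefix="ES_"):
    """Convert a setting name to the alternative environment variable name."""
    escaped = setting.upper().replace("_", "__")  # existing underscores first
    return prefix + escaped.replace(".", "_")     # then periods


print(setting_to_env_name("bootstrap.memory_lock"))
```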
`field_masking_span` is the only span query that does not begin with
`span_`. This commit deprecates the existing name and adds a new
name `span_field_masking` to better fit with the other queries.
This PR adds support for using the `slice` option in point-in-time searches. By
default, the slice query splits documents based on their Lucene ID. This
strategy is more efficient than the one used for scrolls, which is based on the
`_id` field and must iterate through the whole terms dictionary. When slicing a
search, the same point-in-time ID must be used across slices to guarantee the
partitions don't overlap or miss documents.
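A minimal sketch of the per-slice request bodies, assuming a point-in-time ID has already been opened. The PIT ID value and the helper name are placeholders; the point is that every slice reuses the same ID.

```python
def sliced_pit_requests(pit_id, max_slices, query=None):
    """Build one search body per slice, all sharing the same PIT ID."""
    return [
        {
            "slice": {"id": slice_id, "max": max_slices},
            "query": query or {"match_all": {}},
            "pit": {"id": pit_id},
        }
        for slice_id in range(max_slices)
    ]


# Placeholder PIT ID; a real one is returned when the PIT is opened.
bodies = sliced_pit_requests("example-pit-id", 3)
for body in bodies:
    print(body["slice"], body["pit"])
```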
Closes #65740.
We incorrectly list `wait_for` as a valid `refresh` argument for the
following APIs:
* Delete by query
* Multi get
* Reindex
This fixes that error. It also updates the get API docs for consistency.
Closes #65031
Documents async SQL search functionality.
I plan to add formal API documentation for the async APIs with a later PR.
Relates to #73991 and #74845.
Add documentation for the newly introduced CircuitBreaker, which is
used to restrict the memory usage for an EQL sequence query to avoid
OutOfMemory exceptions.
Follows: #74381
Today the docs on setting `tcp_retries2` only talk about intra-cluster
connections, but in fact this setting is equally important to the
resilience of remote cluster connections too. This commit rewords these
docs to cover both cases.
Relates #34405
In preparation for #74845, we need to create formal API reference documentation for our SQL APIs.
Due to the number of SQL APIs, we'll likely need to create a separate nested page for them. For parity, this PR moves
our EQL APIs to a separate page as well. Previously, they were listed under our search APIs.
Clarifies that you cannot specify an alias in the delete index API. You _can_ delete indices with an alias.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Changes:
* Adds a tutorial for search templates.
* Adds reference docs for the render search template API.
* Improves parameter documentation for the multi search template API.
* Removes duplicate examples from the search template API, multi search API, and create stored script API docs.
* Splits the source files for the search template API and the multi search template API docs.
Add a dynamic transient cluster setting `search.max_async_search_response_size`
that controls the maximum allowed size for a stored async search
response. The default max size is 10MB. An attempt to store
an async search response larger than this size will result in an error.
Relates to #67594
This commit is related to #73497. It adds two new settings. The first setting
is `transport.compression_scheme`. This setting allows the user to
configure LZ4 or DEFLATE as the transport compression. Additionally, it
modifies `transport.compress` to support the value `indexing_data`. When
this setting is set to `indexing_data`, only messages which are primarily
composed of raw source data will be compressed. These are bulk, operations
recovery, and shard changes messages.
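The selection rule for `transport.compress` can be sketched as below. The message category names are hypothetical labels invented for this sketch; only the three setting values and the three kinds of messages come from the description above.

```python
# Hypothetical labels for the messages that mostly carry raw source data.
RAW_SOURCE_MESSAGES = {"bulk", "operations_recovery", "shard_changes"}


def should_compress(transport_compress, message_kind):
    """Decide whether a message is compressed for a given setting value."""
    if transport_compress == "true":
        return True
    if transport_compress == "false":
        return False
    if transport_compress == "indexing_data":
        return message_kind in RAW_SOURCE_MESSAGES
    raise ValueError(f"unsupported value: {transport_compress}")


print(should_compress("indexing_data", "bulk"))
print(should_compress("indexing_data", "cluster_state"))
```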
* [DOCS] Add performance info for runtime fields
* Add script-based sorting and clarify performance
* Changing title to Incentives and reworking the intro
* Removes docs and references for the following `geo_shape` mapping parameters:
* `tree`
* `tree_levels`
* `strategy`
* `distance_error_pct`
* Updates a related breaking change.
Relates to #70850
This adds support for the range aggregation over `histogram` mapped fields.
Decisions made for implementation:
- Sub-aggregations are not allowed. This is to simplify implementation and follows the prior art set by the `histogram` aggregation
- Nothing fancy is done with the ranges. No filter translations as we cannot easily do a `range` filter query against histogram fields. This may be an optimization in the future.
- Ranges check the histogram value ONLY. No interpolation of values is done. If we have better statistics around the histogram this MAY be possible.
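The "value only, no interpolation" decision can be sketched as follows. It assumes the histogram is given as parallel `values` and `counts` arrays and that ranges include `from` and exclude `to`, as in the regular `range` aggregation; the sample data is made up.

```python
def range_over_histogram(values, counts, ranges):
    """Sum the counts of histogram values that fall inside each range.

    `ranges` is a list of (from, to) pairs; None means unbounded.
    A bucket's count goes wholly to the range containing its value.
    """
    results = []
    for lower, upper in ranges:
        doc_count = 0
        for value, count in zip(values, counts):
            above = lower is None or value >= lower
            below = upper is None or value < upper
            if above and below:
                doc_count += count
        results.append({"from": lower, "to": upper, "doc_count": doc_count})
    return results


values = [0.1, 0.2, 0.3, 0.4, 0.5]
counts = [3, 7, 23, 12, 6]
buckets = range_over_histogram(
    values, counts, [(None, 0.25), (0.25, 0.45), (0.45, None)]
)
for bucket in buckets:
    print(bucket)
```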
This adds support for a `dry_run` parameter for the
`_ilm/migrate_to_data_tiers` API. This defaults to `false`, but when
set to `true` it will simulate the migration of Elasticsearch
entities to data tier-based routing, returning the entities that need to
be updated (indices, ILM policies and the legacy index template that would
be deleted, if one was configured in the request).
To switch an index's lifecycle policy, you must first remove the existing
policy. Otherwise, phase execution for the index may silently fail.
Closes #70151
You can now use a wildcard pattern to remove data stream and index
aliases in the same action/request.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This adds the `_ilm/migrate_to_data_tiers` API to expose the service for
migrating the Elasticsearch abstractions (indices, ILM policies and an
optional legacy template to delete) to data tiers routing allocation
(away from custom node attributes).
Added the `dimension` parameter to the following field types:
* `keyword`
* `ip`
* Numeric field types (`integer`, `long`, `byte`, `short`)
The dimension parameter is of type boolean (default: false) and is used
to mark that a field is a time series dimension field.
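A mapping that uses the parameter might look like the sketch below. The field names are placeholders, and the helper that rejects other field types is a hypothetical convenience based only on the list of supported types above.

```python
import json

# Field types that accept `dimension`, per the list above.
DIMENSION_TYPES = {"keyword", "ip", "integer", "long", "byte", "short"}


def dimension_field(field_type):
    """Return a field mapping marked as a time series dimension."""
    if field_type not in DIMENSION_TYPES:
        raise ValueError(f"`dimension` is not supported on {field_type}")
    return {"type": field_type, "dimension": True}


# Placeholder field names, for illustration only.
mapping = {
    "mappings": {
        "properties": {
            "host": dimension_field("keyword"),
            "client_ip": dimension_field("ip"),
            "cpu_usage": {"type": "double"},  # not a dimension (default false)
        }
    }
}
print(json.dumps(mapping, indent=2))
```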
Relates to #74014
* [DOCS] Remove beta label for most service accounts docs
* Remove beta label from additional service account files
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This PR returns the get snapshots API to the 7.x format (and transport client behavior) and enhances it for requests that ask for multiple repositories.
The changes for requests that target multiple repositories are:
* Add `repository` field to `SnapshotInfo` and REST response
* Add `failures` map alongside `snapshots` list instead of returning just an exception response as done for single repo requests
* Pagination now works across repositories instead of being per repository for multi-repository requests
Closes #69108
Closes #43462
This commit adds the "in_use_by" object to the response for ILM policies. This map shows the
indices, data streams, and composable templates that use the ILM policy.
An example output may look like:
```json
{
"logs" : {
"version" : 1,
"modified_date" : "2021-06-23T18:42:08.381Z",
"policy" : {
...
},
"in_use_by" : {
"indices" : [".ds-logs-foo-barbaz-2021.06.23-000001", ".ds-logs-foo-other-2021.06.23-000001"],
"data_streams" : ["logs-foo-barbaz", "logs-foo-other"],
"composable_templates" : ["logs"]
}
}
}
```
Resolves #73869
This is a quality of life improvement for typical users. Almost all anomaly jobs will receive their data through a datafeed.
The datafeed config can now be supplied in the `datafeed` field of the job config when creating a job, and it is returned in that field when getting jobs.
A reindex from a remote cluster doesn't support automatic or manual slicing.
This reuses a related note from the reindex docs in the upgrade docs.
Closes #54243.
Previously it was a requirement of the close job API that if the
job had an associated datafeed, that datafeed had to be stopped
before the job could be closed. Experience has shown that this
is just a pedantic nuisance. If a user closes the job without
first stopping the datafeed then it's just a mistake, and they
then have to make two further calls, to stop the datafeed and
then attempt to close the job again.
This PR changes the behaviour so that if you ask to close a job
whose datafeed is running then the datafeed gets stopped first
as part of the same call. Datafeeds are stopped with the same
level of force as the job close request specified.
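The new ordering can be sketched as a function that returns the operations a close request would perform. The function, operation names, and IDs are hypothetical; this only models the behaviour described above, not the actual implementation.

```python
def close_job_calls(job_id, datafeed_id=None, datafeed_running=False, force=False):
    """Return the ordered operations performed by one close job request."""
    calls = []
    if datafeed_id is not None and datafeed_running:
        # The datafeed is stopped first, with the same level of force.
        calls.append(("stop_datafeed", datafeed_id, {"force": force}))
    calls.append(("close_job", job_id, {"force": force}))
    return calls


print(close_job_calls("my-job", "my-datafeed", datafeed_running=True, force=True))
print(close_job_calls("my-job", "my-datafeed", datafeed_running=False))
```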
This commit adds two related changes:
* ILM WaitForDataTierStep
* Autoscaling frozen_existence decider
The first part ensures that we wait to mount an index until a node that
can hold the index is available, avoiding a failed restore and red
cluster state. This is particularly important for the frozen phase, but
is done generically in the searchable snapshot action.
The second part triggers on indices in the ILM frozen phase to scale the
tier into existence by requiring a minimal amount of memory and storage.
Closes #72771
I was helping some folks debug an issue with the terms agg and noticed
that we didn't always have the `total_buckets` debug information. I also
noticed that we can't tell how many buckets we build, so I added that
too as `built_buckets`.
Finally, I noticed that when we're using segment ords we count segments
without any values as "multi-valued". We can do better there and count
them as no-valued. That will, mostly, just improve the profiling. When
we collect from global ords we have no way to tell how many values are
on the segment so segments without any values will, sadly, in this case
still be miscounted as multi-valued.
When we introduced dynamic:runtime (#65489) we decided to have it create objects dynamically under properties, as the runtime section did not (and still does not) support object fields. That proved to be a poor choice, because the runtime section is flat, supports dots in field names, and does not really need objects. Also, these end up causing unnecessary mapping conflicts.
With this commit we adapt dynamic:runtime to not dynamically create objects.
Closes #70268
Today we don't really describe why using `index.shard.check_on_startup`
is such a bad idea, or what to do instead. This commit expands the docs
to clarify what it does, why it's not really necessary and what to do
instead. It also now logs a warning every time the startup checks run to
encourage users to stop using this setting.
https://github.com/elastic/elasticsearch/pull/74201 adds `null` handling to the arg descriptions of several string functions.
This PR moves pre-existing docs for `null` handling and similar edge case handling for string functions to arg descriptions for consistency.
Relates to #74193
Pagination and snapshots for the get snapshots API, built on top of the current implementation to enable work that needs this API for testing. A follow-up will leverage the changes to make things more efficient via pagination.
Relates https://github.com/elastic/elasticsearch/pull/73570 which does part of the under-the-hood changes required to efficiently implement this API on the repository layer.
Neither of these APIs parses a request body; the parameters are all taken
from the query string. Also included the master timeout parameter, as it
was missing here too.
In #74138 we noted that index settings aren't copied in a clone. In fact
that's not true: we copy all settings except the explicitly excluded ones,
`number_of_replicas` and `auto_expand_replicas`. This fixes the mistake.
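The corrected behaviour can be sketched as a filter over the source index settings. The sample settings are made up, and letting settings from the clone request override the copied ones is an assumption of this sketch.

```python
# Settings that are not copied from the source index, per the note above.
NOT_COPIED = {"index.number_of_replicas", "index.auto_expand_replicas"}


def cloned_settings(source_settings, request_settings=None):
    """Return the settings a cloned index would start with."""
    copied = {
        key: value
        for key, value in source_settings.items()
        if key not in NOT_COPIED
    }
    # Assumption: settings given in the clone request take precedence.
    copied.update(request_settings or {})
    return copied


source = {
    "index.number_of_shards": "5",
    "index.number_of_replicas": "2",
    "index.auto_expand_replicas": "0-1",
    "index.refresh_interval": "30s",
}
print(cloned_settings(source))
```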
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Today if sending file chunks is CPU-bound (e.g. when using compression)
then we tend to concentrate all that work onto relatively few threads,
even if `indices.recovery.max_concurrent_file_chunks` is increased. With
this commit we fork the transmission of each chunk onto its own thread
so that the CPU-bound work can happen in parallel.
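The idea can be illustrated outside Elasticsearch with a thread pool that compresses each chunk as its own task. This is a generic sketch using `zlib` and placeholder data, not the recovery code itself; the chunk size and worker count are arbitrary.

```python
import zlib
from concurrent.futures import ThreadPoolExecutor


def split_chunks(data, chunk_size):
    """Split a byte string into fixed-size chunks."""
    return [data[i:i + chunk_size] for i in range(0, len(data), chunk_size)]


def compress_chunks_parallel(chunks, max_workers=4):
    """Compress each chunk as a separate task; results keep chunk order."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(zlib.compress, chunks))


data = b"elasticsearch recovery " * 10000  # placeholder file contents
chunks = split_chunks(data, 64 * 1024)
compressed = compress_chunks_parallel(chunks)
restored = b"".join(zlib.decompress(chunk) for chunk in compressed)
print(len(chunks), "chunks,", sum(len(c) for c in compressed), "bytes compressed")
```

Because each chunk is an independent task, the CPU-bound compression is no longer serialized behind a small number of sending threads.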