Keepalive options are not well-documented (only in transport section, although also available at http and network level).
Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
This feature adds a new `fields` parameter to the search request, which
consults both the document `_source` and the mappings to fetch fields in a
consistent way. The PR merges the `field-retrieval` feature branch.
Addresses #49028 and #55363.
keepalives tell any intermediate devices that the connection remains alive, which helps with overzealous firewalls that are
killing idle connections. keepalives are enabled by default in Elasticsearch, but use system defaults for their
configuration, which often times do not have reasonable defaults (e.g. 7200s for TCP_KEEP_IDLE) in the context of
distributed systems such as Elasticsearch.
This PR sets the socket-level keep_alive options for network.tcp.{keep_idle,keep_interval} to 5 minutes on configurations
that support it (>= Java 11 & (MacOS || Linux)) and where the system defaults are set to something higher than 5
minutes. This helps keep the connections alive while not interfering with system defaults or user-specified settings
unless they are deemed to be set too high by providing better out-of-the-box defaults.
Today, we send operations in phase2 of peer recoveries batch by batch
sequentially. Normally that's okay as we should have a fairly small of
operations in phase 2 due to the file-based threshold. However, if
phase1 takes a lot of time and we are actively indexing, then phase2 can
have a lot of operations to replay.
With this change, we will send multiple batches concurrently (defaults
to 1) to reduce the recovery time.
Part of #48366. Add documentation for the dangling indices
API added in #58176.
Co-authored-by: David Turner <david.turner@elastic.co>
Co-authored-by: Adam Locke <adam.locke@elastic.co>
* Adding ESS icons to supported ES settings.
* Adding new file for supported ESS settings.
* Adding supported ESS settings for HTTP and disk-based shard allocation.
* Adding more supported settings for ESS.
* Adding descriptions for each Cloud section, plus additional settings.
* Adding new warehouse file for Cloud, plus additional settings.
* Adding node settings for Cloud.
* Adding audit settings for Cloud.
* Resolving merge conflict.
* Adding SAML settings (part 1).
* Adding SAML realm encryption and signing settings.
* Adding SAML SSL settings.
* Adding Kerberos realm settings.
* Adding OpenID Connect Realm settings.
* Adding OpenID Connect SSL settings.
* Resolving leftover Git merge markers.
* Removing Cloud settings page and link to it.
* Add link to mapping source
* Update docs/reference/docs/reindex.asciidoc
* Incorporate edit of HTTP settings
* Remove "cloud" from tag and ID
* Remove "cloud" from tag and update description
* Remove "cloud" from tag and ID
* Change "whitelists" to "specifies"
* Remove "cloud" from end tag
* Removing cloud from IDs and tags.
* Changing link reference to fix build issue.
* Adding index management page for missing settings.
* Removing warehouse file for Cloud and moving settings elsewhere.
* Clarifying true/false usage of http.detailed_errors.enabled.
* Changing underscore to dash in link to fix ci build.
* Forbid read-only-allow-delete block in blocks API
The read-only-allow-delete block is not really under the user's control
since Elasticsearch adds/removes it automatically. This commit removes
support for it from the new API for adding blocks to indices that was
introduced in #58094.
* Missing xref
* Reword paragraph on read-only-allow-delete block
Today the disk-based shard allocator accounts for incoming shards by
subtracting the estimated size of the incoming shard from the free space on the
node. This is an overly conservative estimate if the incoming shard has almost
finished its recovery since in that case it is already consuming most of the
disk space it needs.
This change adds to the shard stats a measure of how much larger each store is
expected to grow, computed from the ongoing recovery, and uses this to account
for the disk usage of incoming shards more accurately.
Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account
(i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository
setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a
per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to
configure throttling in a single place.
The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to
`40mb`, which is the current default of `indices.recovery.max_bytes_per_sec`). This means that no behavioral change
will be observed by clusters where the recovery and restore settings were not adapted.
Relates https://github.com/elastic/elasticsearch/issues/57023
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
Today we have individual settings for configuring node roles such as
node.data and node.master. Additionally, roles are pluggable and we have
used this to introduce roles such as node.ml and node.voting_only. As
the number of roles is growing, managing these becomes harder for the
user. For example, to create a master-only node, today a user has to
configure:
- node.data: false
- node.ingest: false
- node.remote_cluster_client: false
- node.ml: false
at a minimum if they are relying on defaults, but also add:
- node.master: true
- node.transform: false
- node.voting_only: false
If they want to be explicit. This is also challenging in cases where a
user wants to have configure a coordinating-only node which requires
disabling all roles, a list which we are adding to, requiring the user
to keep checking whether a node has acquired any of these roles.
This commit addresses this by adding a list setting node.roles for which
a user has explicit control over the list of roles that a node has. If
the setting is configured, the node has exactly the roles in the list,
and not any additional roles. This means to configure a master-only
node, the setting is merely 'node.roles: [master]', and to configure a
coordinating-only node, the setting is merely: 'node.roles: []'.
With this change we deprecate the existing 'node.*' settings such as
'node.data'.
* Changes for issue #36114.
* Adding stronger wording to the new note.
* Removing statement about typically not needting to set the read-only allow delete block.
* Replacing Elasticsearch with {es} variable.
Elasticsearch enables HTTP compression by default. However, to mitigate
potential security risks like the BREACH attack, compression is disabled by
default if HTTPS is enabled.
This updates the `http.compression` setting definition accordingly and adds
additional context.
Co-authored-by: Leaf-Lin <39002973+Leaf-Lin@users.noreply.github.com>
* Moves `Discovery and cluster formation` content from `Modules` to
`Set up Elasticsearch`.
* Combines `Adding and removing nodes` with `Adding nodes to your
cluster`. Adds related redirect.
* Removes and redirects the `Modules` page.
* Rewrites parts of `Discovery and cluster formation` to remove `module`
references and meta references to the section.
* [DOCS] Create top-level "Search your data" page
**Goal**
Create a top-level search section. This will let us clean up our search
API reference docs, particularly content from [`Request body search`][0].
**Changes**
* Creates a top-level `Search your data` page. This page is designed to
house concept and tutorial docs related to search.
* Creates a `Run a search` page under `Search your data`. For now, This
contains a basic search tutorial. The goal is to add content from
[`Request body search`][0] to this in the future.
* Relocates `Long-running searches` and `Search across clusters` under
`Search your data`. Increments several headings in that content.
* Reorders the top-level TOC to move `Search your data` higher. Also
moves the `Query DSL`, `EQL`, and `SQL access` chapters immediately
after.
Relates to #48194
[0]: https://www.elastic.co/guide/en/elasticsearch/reference/master/search-request-body.html
The following settings are now no-ops:
* xpack.flattened.enabled
* xpack.logstash.enabled
* xpack.rollup.enabled
* xpack.slm.enabled
* xpack.sql.enabled
* xpack.transform.enabled
* xpack.vectors.enabled
Since these settings no longer need to be checked, we can remove settings
parameters from a number of constructors and methods, and do so in this
commit.
We also update documentation to remove references to these settings.
The disk decider had special handling for the single data node case,
allowing any allocation (skipping watermark checks) for such clusters.
This special handling can now be avoided via a setting.
* Scripting: Deprecate general cache settings
* Add script.disable_max_compilations_rate setting
* Move construction to ScriptCache
* Use ScriptService to do updates of CacheHolder
* Remove fallbacks
* Add SCRIPT_DISABLE_MAX_COMPILATIONS_RATE_SETTING to ClusterSettings
* Node scope
* Use back compat
* 8.0 for bwc
* script.max_compilations_rate=2048/1m -> script.disable_max_compilations_rate=true in docker compose
* do not guard in esnode
* Doc update
* isSnapshotBuild() -> systemProperty 'es.script.disable_max_compilations_rate', 'true'
* Do not use snapshot in gradle to set max_compilations_rate
* Expose cacheHolder as package private
* monospace 75/5m in cbreaker docs, single space in using
* More detail in general compilation rate error
* Test: don't modify defaultConfig on upgrade
We believe there's no longer a need to be able to disable basic-license
features completely using the "xpack.*.enabled" settings. If users don't
want to use those features, they simply don't need to use them. Having
such features always available lets us build more complex features that
assume basic-license features are present.
This commit deprecates settings of the form "xpack.*.enabled" for
basic-license features, excluding "security", which is a special case.
It also removes deprecated settings from integration tests and unit
tests where they're not directly relevant; e.g. monitoring and ILM are
no longer disabled in many integration tests.
Relocates the "Remote Clusters" documentation from the "Modules" section to the "Set up Elasticsearch" section.
Supporting changes:
* Reorders the "Bootstrap checks for X-Pack" section to immediately follow the "Bootstrap checks"chapter.
* Removes an outdated X-Pack `idef` from the "Remote Clusters" intro.
Some of these characters are special to Asciidoctor and they ruin the
rendering on this page. Instead, we use a macro to passthrough these
characters without Asciidoctor applying any subtitutions to them. This
commit then addresses some rendering issues in the thread pool docs.
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
The use of available processors, the terminology, and the settings
around it have evolved over time. This commit cleans up some places in
the codes and in the docs to adjust to the current terminology.
The `indices.recovery.max_bytes_per_sec` recovery bandwidth limit can differ
between nodes if it is not set dynamically, but today this is not obvious. This
commit adds a paragraph to its documentation clarifying how to set different
bandwidth limits on each node.
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
It is useful to be able to delay state recovery until enough data nodes have
joined the cluster, since this gives the shard allocator a decent opportunity
to re-use as much existing data as possible. However we also have the option to
delay state recovery until a certain number of master-eligible nodes have
joined, and this is unnecessary: we require a majority of master-eligible nodes
for state recovery, and there is no advantage in waiting for more.
This commit deprecates the unnecessary settings in preparation for their
removal.
Relates #51806
The introductory sections of the reference manual contains some simplified
instructions for adding a node to the cluster. Unfortunately they are a little
too simplified and only really work for clusters running on `localhost`. If you
try and follow these instructions for a distributed cluster then the new node
will, confusingly, auto-bootstrap itself into a distinct one-node cluster.
Multiple nodes running on localhost is a valid config, of course, but we should
spell out that these instructions are really only for experimentation and that
it takes a bit more work to add nodes to a distributed cluster. This commit
does so.
Also, the "important config" instructions for discovery say that you MUST set
`discovery.seed_hosts` whereas in fact it is fine to ignore this setting and
use a dynamic discovery mechanism instead. This commit weakens this statement
and links to the docs for dynamic discovery mechanisms.
Finally, this section is also overloaded with some technical details that are
not important for this context and are adequately covered elsewhere, and
completely fails to note that the default discovery port is 9300. This commit
addresses this.
Explicitly notes the Elasticsearch API endpoints that support CCS.
This should deter users from attempting to use CCS with other API
endpoints, such as `GET <index>/_doc/<_id>`.
Updates the cross-cluster search (CCS) documentation to note how
cluster-level settings are applied.
When `ccs_minimize_roundtrips` is `true`, each cluster applies its own
cluster-level settings to the request.
When `ccs_minimize_roundtrips` is `false`, cluster-level settings for
the local cluster is used. This includes shard limit settings, such as
`action.search.shard_count.limit`, `pre_filter_shard_size`, and
`max_concurrent_shard_requests`. If these limits are set too low, the
request could be rejected.
Today we use `cluster.join.timeout` to prevent nodes from waiting indefinitely
if joining a faulty master that is too slow to respond, and
`cluster.publish.timeout` to allow a faulty master to detect that it is unable
to publish its cluster state updates in a timely fashion. If these timeouts
occur then the node restarts the discovery process in an attempt to find a
healthier master.
In the special case of `discovery.type: single-node` there is no point in
looking for another healthier master since the single node in the cluster is
all we've got. This commit suppresses these timeouts and instead lets the node
wait for joins and publications to succeed no matter how long this might take.
Synced flush was a brilliant idea. It supports instant recoveries with a
quite small implementation. However, with the presence of sequence
numbers and retention leases, it is no longer needed. This change
removes it from 8.0.
Relates #5077
* Creates a prerequisites section in the cross-cluster replication (CCR)
overview.
* Adds concise definitions for local and remote cluster in a CCR context.
* Documents that the ES version of the local cluster must be the same
or a newer compatible version as the remote cluster.
The "Restore any snapshots as required" step is a trap: it's somewhere between
tricky and impossible to restore multiple clusters into a single one.
Also add a note about configuring discovery during a rolling upgrade to
proscribe any rare cases where you might accidentally autobootstrap during the
upgrade.
The docs specify that cluster.remote.connect disables cross-cluster
search. This is correct, but not fully accurate as it disables any
functionality that relies on remote cluster connections: cross-cluster
search, remote data feeds, and cross-cluster replication. This commit
updates the docs to reflect this.
Today the docs say that the low watermark has no effect on any shards that have
never been allocated, but this is confusing. Here "shard" means "replication
group" not "shard copy" but this conflicts with the "never been allocated"
qualifier since one allocates shard copies and not replication groups.
This commit removes the misleading words. A newly-created replication group
remains newly-created until one of its copies is assigned, which might be quite
some time later, but it seems better to leave this implicit.
Clarifies not to set `cluster.initial_master_nodes` on nodes that are joining
an existing cluster.
Co-Authored-By: James Rodewig <james.rodewig@elastic.co>
Setting `cluster.routing.allocation.disk.include_relocations` to `false` is a
bad idea since it will lead to the kinds of overshoot that were otherwise fixed
in #46079. This setting was deprecated in #47443. This commit removes it.
Setting `cluster.routing.allocation.disk.include_relocations` to `false` is a
bad idea since it will lead to the kinds of overshoot that were otherwise fixed
in #46079. This commit deprecates this setting so it can be removed in the next
major release.
We added docs for proxy mode in #40281 but on reflection we should not be
documenting this setting since it does not play well with all proxies and we
can't recommend its use. This commit removes those docs and expands its Javadoc
instead.
The changes in #32006 mean that the discovery process can no longer use
master-ineligible nodes as a stepping-stone between master-eligible nodes.
This was normally an indication of a strange and possibly-fragile configuration
and was not recommended. This commit clarifies that only master-eligible nodes
are now involved with discovery.
With this commit, Elasticsearch will no longer prefer using shards in the same location
(with the same awareness attribute values) to process `_search` and `_get` requests.
Instead, adaptive replica selection (the default since 7.0) should route requests more efficiently
using the service time of prior inter-node communications. Clusters with big latencies between
nodes should switch to cross cluster replication to isolate nodes within the same zone.
Note that this change only targets 8.0 since it is considered as breaking. However a follow up
pr should add an option to activate this behavior in 7.x in order to allow users to opt-in early.
Closes#43453
* Snapshot cleanup functionality via transport/REST endpoint.
* Added all the infrastructure for this with the HLRC and node client
* Made use of it in tests and resolved relevant TODO
* Added new `Custom` CS element that tracks the cleanup logic.
Kept it similar to the delete and in progress classes and gave it
some (for now) redundant way of handling multiple cleanups but only allow one
* Use the exact same mechanism used by deletes to have the combination
of CS entry and increment in repository state ID provide some
concurrency safety (the initial approach of just an entry in the CS
was not enough, we must increment the repository state ID to be safe
against concurrent modifications, otherwise we run the risk of "cleaning up"
blobs that just got created without noticing)
* Isolated the logic to the transport action class as much as I could.
It's not ideal, but we don't need to keep any state and do the same
for other repository operations
(like getting the detailed snapshot shard status)
Removes support for using a system property to disable the automatic release of
the write block applied when a node exceeds the flood-stage watermark.
Relates #42559