Changes:
* Updates the Cloud setup instructions to note that you open Kibana by default.
* Reorders the API call section to highlight Kibana.
* Fixes the Docker `ifeval` to hide some text on unreleased branches.
This enhances the migrate to data tiers routing API to also iterate over
the existing legacy, composable, and component templates and look if
they define a custom node attribute routing in their settings for either
`index.routing.allocation.require.{nodeAttrName}` or
`index.routing.allocation.include.{nodeAttrName}`. If any does, we
update them to remove all the routings settings for the provided
`nodeAttrName`.
eg. any template with the following setting configuration:
```
"settings": {
index.routing.allocation.require.data: "warm",
index.routing.allocation.include.data: "rack1",
index.routing.allocation.exclude.data: "rack2,rack3"
}
```
will have its settings updated to:
```
"settings": {}
```
Adds two known issues to the 8.0.0-rc1 release notes:
* A general advisory not to upgrade production clusters to 8.0.0-rc1. The advisory also notes that upgrades from 8.0.0-rc1 are not supported.
* A note indicating that the SQL JDBC driver requires Java 17 or newer. This requirement will be changed to Java 1.8
or newer in 8.0.0-rc2.
(cherry picked from commit 31cba4fa16)
Cosine similarity is not defined when one of the vectors has zero magnitude.
Before, the kNN search endpoint threw a confusing exception related to top docs
collection. Now we reject vectors early with a clear error message, failing
indexing if the vector has zero magnitude.
As of 8.0, the compatibility window for cross-cluster search (CCS) to an earlier release will be one minor release. This updates the CCS docs and adds a related 8.0 breaking change.
Closes https://github.com/elastic/elasticsearch/issues/80782
Emit deprecation warning when creating new jobs with bucket spans that
aren't an integral divisor or multiple of a day.
Relates #81645
Co-authored-by: lcawl <lcawley@elastic.co>
* Adds a prerequisites section covering remote cluster config, node roles, and security.
* Moves existing content about remote cluster config to the prereqs.
* Updates the remote cluster docs to include information about eligible gateway nodes and tagging for gateway nodes.
Closes https://github.com/elastic/elasticsearch/issues/72001
Closes#81652.
Convert the `repository-azure`, `repository-gcs` and `repository-s3`
plugins into modules, so that they are always included in the
Elasticsearch distribution. Also change plugin installation, removal
and syncing so that attempting to add or remove these plugins still
succeeds but is now a no-op.
Adding text to clarify that the default pipeline only applies to indexing requests, not updates.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
(cherry picked from commit 4e6e4eab22)
Co-authored-by: Mike Barretta <mike.barretta@elastic.co>
This commit adds support for MPNet based models.
MPNet models differ from BERT style models in that:
- Special tokens are different
- Input to the model doesn't require token positions.
To configure an MPNet tokenizer for your pytorch MPNet based model:
```
"tokenization": {
"mpnet": {...}
}
```
The options provided to `mpnet` are the same as the previously supported `bert` configuration.
The migrate to data tiers routing API required ILM to be stopped. This
is fine for "live" runs, but for dry runs this isn't a requirement.
This changes the dry_run to allow the API to run irrespective of the ILM
status.
This fixes the migrate to data tiers routing API to take into account
the scenario where the node attribute configuration for an index is more
accurate than the existing `_tier_preference` configuration.
Previously we would simply remove the node attributes routing if there
was a `_tier_preference` configured for the index.
With this commit, we'll look if either the `require.data` or
`include.data` custom routings are colder than the existing `_tier_preference`
configuration (ie. `cold` vs `data_warm,data_hot`) and update the tier
routing accordingly.
eg.
{
index.routing.allocation.require.data: "warm",
index.routing.allocation.include.data: "cold",
index.routing.allocation.include._tier_preference: "data_hot"
}
will be migrated to:
{
index.routing.allocation.include._tier_preference: "data_cold,data_warm,data_hot"
}
This also removes the existing invariant that had the `require.data`
configuration take precedence over a possible `include.data`
configuration, and will now migrate the coldest configuration to the
corresponding `_tier_preference`.
eg.
{
index.routing.allocation.require.data: "warm",
index.routing.allocation.include.data: "cold"
}
will be migrated to:
{
index.routing.allocation.include._tier_preference: "data_cold,data_warm,data_hot"
}
As outlined in elastic/elasticsearch#81604, including the `searchable_snapshot` action in both the hot and cold phases can result in indices not automatically migrating to the cold tier during the cold phase.
This adds a related warning.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Changes:
* Notes that the query string query's `default_field` and `fields` parameters support wildcards.
* Adds an xref to the `index.query.default_field` docs to the `default_field` parameter.
As part of the effort of making JDBC driver self sufficient, remove the
ES lib geo dependencies without any replacement.
Currently the JDBC driver takes the WKT text and instantiates a geo
object based on the ES lib geo.
Moving forward the driver will return the WKT string representation
without any conversion letting the user pick the geo library desired.
That can be ES lib geo, jts, spatial4j or others.
Note this is a breaking change.
Relates #80277
We (mostly I) were initially advocating for the auto-generated files to
use unique names (the name containing a timestamp particle), in order to
avoid that subsequent invocations of the config step conflict with
itself. Moreover, I was wishing that these files will not have to be
handled directly by admins (that the enrollment process was to be used).
However, experience proved us otherwise, admins have to manipulate these
files, and unique configuration names are hard to deal with in scripts
and docs, so this PR is all about using a fixed name for all the
generated files. _Labeling as a bug fix because the feedback is that it
very negatively impacts usabilty._ Closes
https://github.com/elastic/elasticsearch/issues/81057
This improves reporting of trained model size in the response of the stats API.
In particular, it removes the `model_size_bytes` from the `deployment_stats` section and
replaces it with a top-level `model_size_stats` object that contains:
- `model_size_bytes`: the actual model size
- `required_native_memory_bytes`: the amount of memory required to load a model
In addition, these are now reported for PyTorch models regardless of their deployment state.
Add JwtRealmSettings
Include unit tests and realm security settings documentation. Covers all settings except client authentication mTLS option, and HTTP proxy option.
Refactor Open ID Connect realm to reuse ClaimSetting.java and ClaimParser.java for JWT realm.
This change allows to not open scroll while reindex/delete_by_query/update_by_query
if configured max_docs if less then or equal to the number of documents returned by the scroll batch.
After 7.16.2, we'll no longer produce Windows MSI installer packages for Elasticsearch. These packages were previously released in beta and didn't receive widespread adoption.
### Changes:
* Adds a related 7.17 breaking change.
* Adds a related 7.16 deprecation.
* Removes the MSI installation instructions.
* Removes references to the MSI installer.
I plan to port the applicable changes to 8.1 (main), 8.0, 7.17, and 7.16. In the 7.16 ports, I'll leave in the MSI install docs and add related deprecation notes to them instead.
Removes a section covering configuration management tools from the
installation instructions.
After 7.16.2, Elastic will no longer maintain these tools. Previously,
the tools were only supported on a "best effort" basis.
For new jobs, when the analysis config field model_prune_window is not set, use a default value of 30 days or 20 times the bucket span, whichever is greater.
Co-authored-by: David Roberts <dave.roberts@elastic.co>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
Combines several 8.0 breaking changes for the removal of API endpoints that contain mapping types. These items were separate because we previously organized breaking changes by area.
This is a follow-on to #79162.
This commit deprecates the indices.query.bool.max_clause_count node setting,
and instead configures the maximum clause count for lucene based on the available
heap and the size of the thread pool.
Closes#46433
This PR adds four new templates that are automatically installed from the Monitoring plugin.
In 8.x, Metricbeat will be writing its data in ECS compliant format, even when used with xpack
mode enabled (stack monitoring). In order to continue to support the legacy data format, new
mappings have been created with the new ECS fields for indexing data, and alias fields for the
legacy format which point to the corresponding ECS fields.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Mat Schaffer <mat@schaffer.me>
* [DOCS] Enroll additional nodes on Docker
* Remove -p option for second node
Co-authored-by: Fabio Busatto <52658645+bytebilly@users.noreply.github.com>
* Rename nodes to align with other Docker docs
* Add elastic network to first node docker run command
* Remove hyphen from node names
Co-authored-by: Fabio Busatto <52658645+bytebilly@users.noreply.github.com>
Updates the remote clusters version compatibility table to include 7.17 and 8.x versions.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
If a node reaches the flood stage watermark then we automatically apply
the `read_only_allow_delete` block to all its indices to prevent any
further growth in data. Users are expected to fix the disk space issue
by adding more space or deleting indices. However some users may prefer
to fix the disk space issues by modifying some of the index settings,
perhaps removing replicas or adjusting an allocation filter to move
shards onto nodes with more space. Today this isn't possible since the
`read_only_allow_delete` block also applies to metadata writes. Blocking
metadata writes isn't necessary to protect against further increases in
disk usage, and makes it harder for users to resolve the disk space
issue, so this commit removes the `METADATA_WRITE` level from the block
definition.
per issue 60780, decision from team to remove experimental language from HDR Histogram percentiles and ranks. Feature has been in production for quite some time.
closes#60780
* [DOCS] Add docs for verifying CA fingerprint
* Update openssl command and explanatory text
* Explain copying CA cert if fingerprint validation isn't possible
* Incorporate new section into the main security config page
* Clarify how cert is used
Co-authored-by: Ioannis Kakavas <ikakavas@protonmail.com>
* Split into two, separate sections
* Rename file and update text based on feedback
* Update ref to use new filename
* Remove extra word
Co-authored-by: Ioannis Kakavas <ikakavas@protonmail.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* [DOCS] Remove sentence about security being disabled by default
* Updating introduction
* Remove minimal security page
* Clarify configuring security before starting ES
* Clarifications
* Remove old file
* Add set passwords page
* Update change passwords page, clarify TLS adjustments, and other edits
* Update test
* Minor clarification to intro text
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Previously the ML model snapshot upgrade endpoint did not
provide a way to reliably monitor progress. This could lead
to the upgrade assistant UI thinking that a model snapshot
upgrade had finished when it actually hadn't.
This change adds a new "stats" API that allows external
interested parties to find out the status of each model
snapshot upgrade and which node (if any) each is running on.
Fixes#81519
We say to mark repos as readonly to prevent corruption, but there's
other ways to prevent corruption that people sometimes use instead (e.g.
denying writes at the filesystem/bucket level). It's reasonable to think
that the readonly flag is redundant in that situation but it's not: they
should still mark the repo as readonly tho to bypass the cache and
re-read its contents on each access. This commit adds docs to that
effect.
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
Reverts an anchor change from #46711.
Previous versions of the docs use the `_shrinking_an_index` anchor for this
section. Preserving that anchor will prevent doc build breaks in future releases.
* Expose the index age in ILM explain output.
ILM already exposes the `age` that ILM will use to transition to the next phase, based on that phase's `min_age`. The `index_age` is based only on the index creation date and it's used to trigger a rollover.
Resolves#64429
Force merge action is a very costly action. It may take several hours to run for big indices. But current force merge rest api do not support wait_for_completion parameter.
This adds support for the wait_for_completion parameter.
`GET _nodes/stats` returns statistics about indexing pressure for each node.
With this commit `GET _cluster/stats` now returns stats about indexing pressure
computed by aggregating the indexing pressure stats of each node in the
cluster.
Closes#79788
Today the _Size your shards_ docs focus on shard size and count, but in
fact index count and field count are also important. This commit expands
these docs a bit to cover this observation too.
Today the same-shard allocation decider falls back to checking the
hostname if the node has no host address. In practice nodes will always
have an address so the fallback is dead code. This commit removes that
dead code.
Relates #80702 which will add the ability to distinguish nodes by
hostname regardless of whether they have an address or not, and #80767
which optimizes this area of code - this refactoring should make the
optimization simpler.
In order to perform a kNN search on a `dense_vector` field, it must have
`index: true` in its mapping. This commit clarifies the error message. Before
the message was confusing, because the user likely didn't touch the `index`
parameter and might not even be aware of it.
It adds a note to the docs clarifying that when coming from 7.x, you must
explicitly update `index: true` and reindex the vectors.
Relates to #78473.
Today, a search request with PIT would fail immediately if any
associated indices or nodes are gone, which is inconsistent when
allow_partial_search_results is true.
Relates #81256
The hidden index docs did not mention that dot-prefixed patterns default
to matching hidden indices. This PR adds a note explaining the behavior
and why it's like that.
The current `multi_match` docs contain an erroneous reference to the `combined_fields` query. This updates the reference to reference the correct query.
Relates to https://github.com/elastic/elasticsearch/pull/76893
The searchable snapshot action mounts snapshots as indices
with a different prefix depending of the phase. This commit
tries to mention them in the docs.
Manipulating the contents of a snapshot repository is a very bad idea,
but it turns out we don't call this out in the docs anywhere. This
commit adds a warning about this.
The documentations states that if the `weight` field is missing, and no
explicit missing configuration is provided, a default value of 1 is used.
This is incorrect and does not match the implementation of the weighted
average aggregator. In this specific case the document is skipped, instead.
The change removes a node-level setting. It was accidentally placed with REST API changes as part of https://github.com/elastic/elasticsearch/issues/79162. This moves the breaking change to the cluster and node setting changes section.
Removes source-related query parameters from the update by query
and delete by query API documentation. These parameters don't return
source fields as part of the response.
Today we increase the verbosity of discovery failures after 5 minutes
without a master. Unfortunately 5 minutes is a common orchestration
timeout, so if discovery is broken then we see nodes being shut down
just before they start to emit useful logs. This commit reduces the
default timeout to 3 minutes to address that.
Internally we already kept track of whether a data stream is replicated by CCR.
It is part of the `DataStream` class. This just adds it to the xcontent serialization
of the get data stream api response class.
Relates to elastic/kibana#118899
Add a --url option for elasticsearch-reset-password and
elasticsearch-create-enrollment-token CLI Tools ( and any tools
that would extend BaseRunAsSuperuserCommand ).
The tools use CommandLineHttpClient internally, which tries its
best to deduce the URL of the local node based on the configuration
but there are certain cases where it either fails or returns an
unwanted result. Concretely:
- CommandLineHttpClient#getDefaultURL will always return a URL with
the port set to 9200, unless otherwise explicitly set in the
configuration. When running multiple nodes on the same host,
subsequent nodes get sequential port numbers after 9200 by default
and this means that the CLI tool will always connect the first of
n nodes in a given host. Since these tools depend on a file realm
local user, requests to other nodes would fail
- When an ES node binds and listens to many addresses, there can
be the case that not all of the IP addresses are added as SANs in
the certificate that is used for TLS on the HTTP layer.
CommandLineHttpClient#getDefaultURL will pick an address based on
a preference order but that address might not be in the SANs and
thus all requests to the node would fail due to failed hostname
verification.
Manually setting `--url` to an appropriate value allows users to
overcome these edge cases.
Today we indicate that the `unassigned.reason` field in various APIs
indicates the reason why a shard is unassigned. This isn't really true,
it tells you some information about the event that caused the shard to
_become_ unassigned (or which most recently changed its routing table
entry while remaining unassigned) but tells you almost nothing about why
the shard _is now_ unassigned and how to fix it. That's what the
allocation explain API is for. This commit clarifies this point in the
docs.
Closes#80892
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
Adds to the transport node stats a record of the distribution of the
times for which a transport thread was handling a message, represented
as a histogram.
Closes#80428
You can no longer configure `xpack.searchable.snapshot.shared_cache.size` as a user setting in ESS on 7.13+ deployments. This PR removes the ESS icon from the related 8.0 breaking change for the setting.
It also clarifies the breaking change text to indicate that configuring the setting on non-frozen nodes will result in an error on startup.
Relates to https://github.com/elastic/elasticsearch/pull/80795
* Revert "Return 200 OK response code for a cluster health timeout (#78968)"
This reverts commit a2c3daea
* Revert "Allow deprecation warning for the return_200_for_cluster_health_timeout parameter (#80178)"
This reverts commit 1c711e35fc.
* Revert "Drop pre-7.2.0 wire format in ClusterHealthRequest (#79551)"
This reverts commit b9fbe66ab0.
* Revert "Adjust the BWC version for the return200ForClusterHealthTimeout field (#79436)"
This reverts commit f60bda5685.
* Revert "Use query param instead of a system property for opting in for new cluster health response code (#79351)"
This reverts commit 8901a999
* Revert "Deprecate returning 408 for a server timeout on `_cluster/health` (#78180)"
This reverts commit f266eb32
* Drop pre-7.2.0 wire format in ClusterHealthRequest (#79551)
This reverts commit fa4d562c
* Revert "Disable BWC for #80821 (#80839)"
This reverts commit cb0e73e2fc.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Add known issues for remaining system indices work
Co-authored-by: Adam Locke <adam.locke@elastic.co>
Co-authored-by: Fabio Busatto <52658645+bytebilly@users.noreply.github.com>
* [DOCS] Update ES quick start for security ON by default
* Remove code.asciidoc, which is part of the overall doc build now
* Update node names for cleanup
* Add note with links to tools
* Add --net elastic network
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* [DOCS] Update Windows .zip install instructions for security ON by default
* Rework instructions for running as a service on Windows
* Update wording and add variable for back/forward slashes
* Relocating enroll nodes steps and introducing variables
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
#80556 reverted the deprecation of transient cluster settings. This replaces deprecation language in the docs with a warning/recommendation to avoid transient settings.
Closes#80557
# Conflicts:
# docs/reference/migration/migrate_7_16.asciidoc
Makes several changes to consolidate snapshot and backup-related docs.
Highlights:
* Adds info about supported ESS snapshot repository types
* Adds docs for Kibana's Snapshot and Restore feature
* Combines tutorial pages related to taking and managing snapshots
* Consolidates explanations of the snapshot process
* Incorporates SLM into the snapshot tutorial
* Removes duplicate "back up a cluster" pages
* Reorganizes the 8.0.0 and 8.1.0 breaking changes and deprecations into component-based categories.
* Adds an ESS icon to cluster settings on the ESS user settings allowlist.
* Adds tips for sections that aren't relevant to Cloud users.
* Updates the labels for some items to provide better context.
Co-authored-by: debadair <debadair@elastic.co>
The put repository API doesn't accept these parameters in the request body.
Co-authored-by: Ivonne Botello <87008515+ibotello@users.noreply.github.com>
Today if `libffi` cannot allocate pages of memory which are both
writeable and executable then it will attempt to write code to a
temporary file. Elasticsearch configures itself a suitable temporary
directory for use by JNA but by default `libffi` won't find this
directory and will try various other places. In certain configurations,
none of the other places that `libffi` tries are suitable. With older
versions of JNA this would result in a `SIGSEGV`; since #80617 the JVM
will exit with an exception.
With this commit we use the `LIBFFI_TMPDIR` environment variable to
configure `libffi` to use the same directory as JNA for its temporary
files if they are needed.
Closes#18272Closes#73309Closes#74545Closes#77014Closes#77053
Relates #77285
Co-authored-by: Rory Hunter <roryhunter2@gmail.com>
Adds a `force` parameter to the delete trained models API
which when set to `true` allows deletion of a model that
is referenced by ingest pipelines or has a started deployment.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This deprecates estimated_heap_memory_usage_bytes on model put and replaces it with model_size_bytes.
On GET, only model_size_bytes is returned unless v7 rest-api compatibility is requested.
For the ml/info API, only model_size_bytes is returned
A forward-port of: #80545
Closes#76681. Our approach to using `scratch` for building Docker
images has caused problems at Docker Hub. Fix this situation by
removing the whole process of using scratch and instead bases the
default distribution on `almalinux:8.4-minimal`. Alma Linux is
binary-compatible with RHEL, and therefore very similar to UBI.
If the xpack.ml.use_auto_machine_memory_percent setting is true,
and xpack.ml.max_model_memory_limit is not set then
xpack.ml.max_model_memory_limit is now considered to be set to
the largest size that could be assigned in the cluster.
This functionality will be crucial for Cloud once the Elasticsearch
startup code is setting the Elasticsearch JVM heap size. Then the
Cloud code will no longer be able to accurately set
xpack.ml.max_model_memory_limit, so will not set it at all.
Instead the Cloud code will just set
xpack.ml.use_auto_machine_memory_percent and the ML code will
calculate the appropriate maximum model_memory_limit that should
be permitted.
This commit adds a new field deployment_stats that is optionally set for models that are deployed.
If a model does not have a deployment, it will be null.
Also, removes the get deployment stats API and makes the deployment stats action internal only.
This commit adds docs for the new `_knn_search` endpoint.
It focuses on being an API reference and is light on details in terms of how
exactly the kNN search works, and how the endpoint contrasts with
`script_score` queries. We plan to add a high-level guide on kNN search that
will explain this in depth.
Relates to #78473.
Currently, we don't support kNN search against fields in a `nested` mapping.
Before, we were checking this at search-time. This commit moves it earlier, so
you aren't even allowed to set `index: true` if the vector is in a nested
mapping. That way, users are aware of the limitation before they start to index
documents.
Relates to #78473.
When a deployment is started, we do not validate that the definition
documents are all present and not truncated. This commit adds a
validation on _start that prevents a bad state from occurring where the
deployment starts, but the model is incorrectly defined, or some unknown
error occurs to late in the deployment process.
We have a few leftover mentions of `zen` discovery, mostly for
historical/BwC reasons, which this commit removes.
Prior to this commit the default value for `discovery.type` was `zen`
but this was not written down anywhere or officially supported: the two
options were to set it to `single-node` or to omit it entirely. This
commit changes the default to `multi-node` and documents this.
Co-authored-by: Adam Locke <adam.locke@elastic.co>
This reverts the change to use segment ordinals in composite terms aggregations due to a performance degradation when the field is high cardinality.
Co-authored-by: Mark Tozzi <mark.tozzi@elastic.co>