You can no longer configure `xpack.searchable.snapshot.shared_cache.size` as a user setting in ESS on 7.13+ deployments. This PR removes the ESS icon from the related 8.0 breaking change for the setting.
It also clarifies the breaking change text to indicate that configuring the setting on non-frozen nodes will result in an error on startup.
Relates to https://github.com/elastic/elasticsearch/pull/80795
* Revert "Return 200 OK response code for a cluster health timeout (#78968)"
This reverts commit a2c3daea
* Revert "Allow deprecation warning for the return_200_for_cluster_health_timeout parameter (#80178)"
This reverts commit 1c711e35fc.
* Revert "Drop pre-7.2.0 wire format in ClusterHealthRequest (#79551)"
This reverts commit b9fbe66ab0.
* Revert "Adjust the BWC version for the return200ForClusterHealthTimeout field (#79436)"
This reverts commit f60bda5685.
* Revert "Use query param instead of a system property for opting in for new cluster health response code (#79351)"
This reverts commit 8901a999
* Revert "Deprecate returning 408 for a server timeout on `_cluster/health` (#78180)"
This reverts commit f266eb32
* Drop pre-7.2.0 wire format in ClusterHealthRequest (#79551)
This reverts commit fa4d562c
* Revert "Disable BWC for #80821 (#80839)"
This reverts commit cb0e73e2fc.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* Add known issues for remaining system indices work
Co-authored-by: Adam Locke <adam.locke@elastic.co>
Co-authored-by: Fabio Busatto <52658645+bytebilly@users.noreply.github.com>
* [DOCS] Update ES quick start for security ON by default
* Remove code.asciidoc, which is part of the overall doc build now
* Update node names for cleanup
* Add note with links to tools
* Add --net elastic network
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* [DOCS] Update Windows .zip install instructions for security ON by default
* Rework instructions for running as a service on Windows
* Update wording and add variable for back/forward slashes
* Relocating enroll nodes steps and introducing variables
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
#80556 reverted the deprecation of transient cluster settings. This replaces deprecation language in the docs with a warning/recommendation to avoid transient settings.
Closes #80557
# Conflicts:
# docs/reference/migration/migrate_7_16.asciidoc
Makes several changes to consolidate snapshot and backup-related docs.
Highlights:
* Adds info about supported ESS snapshot repository types
* Adds docs for Kibana's Snapshot and Restore feature
* Combines tutorial pages related to taking and managing snapshots
* Consolidates explanations of the snapshot process
* Incorporates SLM into the snapshot tutorial
* Removes duplicate "back up a cluster" pages
* Reorganizes the 8.0.0 and 8.1.0 breaking changes and deprecations into component-based categories.
* Adds an ESS icon to cluster settings on the ESS user settings allowlist.
* Adds tips for sections that aren't relevant to Cloud users.
* Updates the labels for some items to provide better context.
Co-authored-by: debadair <debadair@elastic.co>
The put repository API doesn't accept these parameters in the request body.
Co-authored-by: Ivonne Botello <87008515+ibotello@users.noreply.github.com>
Today if `libffi` cannot allocate pages of memory which are both
writeable and executable then it will attempt to write code to a
temporary file. Elasticsearch configures itself a suitable temporary
directory for use by JNA but by default `libffi` won't find this
directory and will try various other places. In certain configurations,
none of the other places that `libffi` tries are suitable. With older
versions of JNA this would result in a `SIGSEGV`; since #80617 the JVM
will exit with an exception.
With this commit we use the `LIBFFI_TMPDIR` environment variable to
configure `libffi` to use the same directory as JNA for its temporary
files if they are needed.
Closes #18272 Closes #73309 Closes #74545 Closes #77014 Closes #77053
Relates #77285
Co-authored-by: Rory Hunter <roryhunter2@gmail.com>
Adds a `force` parameter to the delete trained models API
which when set to `true` allows deletion of a model that
is referenced by ingest pipelines or has a started deployment.
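For reference, a minimal request using the new parameter (the model ID `my-model` is just a placeholder); without `force=true` the same request fails if the model is referenced by an ingest pipeline or has a started deployment:
```js
DELETE _ml/trained_models/my-model?force=true
```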
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This deprecates estimated_heap_memory_usage_bytes on model put and replaces it with model_size_bytes.
On GET, only model_size_bytes is returned unless v7 rest-api compatibility is requested.
For the ml/info API, only model_size_bytes is returned
A forward-port of: #80545
Closes #76681. Our approach to using `scratch` for building Docker
images has caused problems at Docker Hub. Fix this situation by
removing the whole process of using scratch and instead basing the
default distribution on `almalinux:8.4-minimal`. Alma Linux is
binary-compatible with RHEL, and therefore very similar to UBI.
If the xpack.ml.use_auto_machine_memory_percent setting is true,
and xpack.ml.max_model_memory_limit is not set then
xpack.ml.max_model_memory_limit is now considered to be set to
the largest size that could be assigned in the cluster.
This functionality will be crucial for Cloud once the Elasticsearch
startup code is setting the Elasticsearch JVM heap size. Then the
Cloud code will no longer be able to accurately set
xpack.ml.max_model_memory_limit, so will not set it at all.
Instead the Cloud code will just set
xpack.ml.use_auto_machine_memory_percent and the ML code will
calculate the appropriate maximum model_memory_limit that should
be permitted.
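A rough sketch of the intended Cloud-style configuration, leaving `xpack.ml.max_model_memory_limit` unset so the limit is derived from the cluster:
```js
PUT _cluster/settings
{
  "persistent": {
    "xpack.ml.use_auto_machine_memory_percent": true
    // xpack.ml.max_model_memory_limit is deliberately not set
  }
}
```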
This commit adds a new field deployment_stats that is optionally set for models that are deployed.
If a model does not have a deployment, it will be null.
Also, removes the get deployment stats API and makes the deployment stats action internal only.
This commit adds docs for the new `_knn_search` endpoint.
It focuses on being an API reference and is light on details in terms of how
exactly the kNN search works, and how the endpoint contrasts with
`script_score` queries. We plan to add a high-level guide on kNN search that
will explain this in depth.
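As a rough sketch of the request shape being documented (index and field names are made up):
```js
GET my-image-index/_knn_search
{
  "knn": {
    "field": "image_vector",
    "query_vector": [0.3, 0.1, 1.2],
    "k": 10,
    "num_candidates": 100
  },
  "_source": ["title"]
}
```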
Relates to #78473.
Currently, we don't support kNN search against fields in a `nested` mapping.
Before, we were checking this at search-time. This commit moves it earlier, so
you aren't even allowed to set `index: true` if the vector is in a nested
mapping. That way, users are aware of the limitation before they start to index
documents.
Relates to #78473.
When a deployment is started, we do not validate that the definition
documents are all present and not truncated. This commit adds a
validation on _start that prevents a bad state from occurring where the
deployment starts, but the model is incorrectly defined, or some unknown
error occurs too late in the deployment process.
We have a few leftover mentions of `zen` discovery, mostly for
historical/BwC reasons, which this commit removes.
Prior to this commit the default value for `discovery.type` was `zen`
but this was not written down anywhere or officially supported: the two
options were to set it to `single-node` or to omit it entirely. This
commit changes the default to `multi-node` and documents this.
Co-authored-by: Adam Locke <adam.locke@elastic.co>
This reverts the change to use segment ordinals in composite terms aggregations due to a performance degradation when the field is high cardinality.
Co-authored-by: Mark Tozzi <mark.tozzi@elastic.co>
Implements a `force` parameter to the stop deployment API.
This allows a user to forcefully stop a deployment. Currently,
this specifically allows stopping a deployment that is in use
by ingest processors.
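For example, forcefully stopping a deployment that ingest processors still reference (the model ID is a placeholder):
```js
POST _ml/trained_models/my-model/deployment/_stop?force=true
```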
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Returning 408 for a cluster health timeout was deprecated in #78180 and backported to 7.x in #78940
Now we can make this breaking change in 8.0 while respecting the user's choice to run ES in 7.x compatible mode via the REST compatibility layer.
Fixes #70849
Removes `testenv` annotations and related code. These annotations originally let you skip x-pack snippet tests in the docs. However, that's no longer possible.
Relates to #79309, #31619
The `elasticsearch.yml` file that ships with our Docker image includes the
`network.host: 0.0.0.0` setting by default. If a user bind-mounts a custom
config file, it should include this setting to ensure Elasticsearch is reachable.
Closes #77937.
* [DOCS] Update archive install docs for security ON by default
* Remove extra attribute references that aren't needed
* Incorporate security info into start page
* Update heading
This commit updates the `dense_vector` docs to include information on the new
`index`, `similarity`, and `index_options` parameters. It also tries to clarify
the difference between `similarity` and `index_options` with the existing
parameters that have the same name.
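A hedged sketch of a mapping using the new parameters (index and field names are invented, and the `index_options` values are illustrative):
```js
PUT my-index
{
  "mappings": {
    "properties": {
      "my_vector": {
        "type": "dense_vector",
        "dims": 3,
        "index": true,
        "similarity": "cosine",
        "index_options": {
          "type": "hnsw",
          "m": 16,
          "ef_construction": 100
        }
      }
    }
  }
}
```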
Relates to #78473.
* Adjust packaged installation docs for security on by default
This commit introduces necessary changes to guide users through
the installation of our DEB/RPM packages, now that security is
enabled and configured by default.
* Update security docs and configure includes
* Update wording in check-running.asciidoc
* Adding hidden GET request
* Update heading
* Updated reconfigure heading
Co-authored-by: Adam Locke <adam.locke@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Changes:
* Adds a transient settings migration guide to the 7.16 docs.
* Updates the related deprecation docs to link to the guide.
Closes #80055
Relates to #79167.
Adds new sections for optional fields and optional `by` fields. Also revises some existing content to define **join keys**.
Closes #79910
Relates to #79677
Adds `start_time` to the get deployment stats API for the deployment
and each allocation.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* 8.0.0-beta1 release notes (#79969)
* initial release notes
* [DOCS] Adds known issues section with an item about rolling upgrade.
* Edits ML PRs
* Update docs/reference/release-notes/8.0.0-beta1.asciidoc
Co-authored-by: David Roberts <dave.roberts@elastic.co>
* Update docs/reference/release-notes/8.0.0-beta1.asciidoc
Co-authored-by: David Roberts <dave.roberts@elastic.co>
* Apply suggestions from code review
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-authored-by: lcawl <lcawley@elastic.co>
Co-authored-by: David Roberts <dave.roberts@elastic.co>
Co-authored-by: Adam Locke <adam.locke@elastic.co>
* Fix typo and tidy a bit
Co-authored-by: Jake Landis <jake.landis@elastic.co>
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-authored-by: lcawl <lcawley@elastic.co>
Co-authored-by: David Roberts <dave.roberts@elastic.co>
The `terms` agg picks the top `size` terms in a single scatter/gather
pass across all the shards. For the default `order` and if you `order`
by `_key` this works quite well. Some errors creep in, but it's fairly
easy to point to them and understand them. But ordering by doc count
ascending is like inviting the error vampire into your agg. It's super
easy to get inaccurate results. This updates the docs to be more stark
about it. Closes #72684
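The kind of request the warning is aimed at, sketched with placeholder names; because each shard only returns its own top N, rare terms can be missed entirely and the count errors are much larger than with the default ordering:
```js
GET my-index/_search?size=0
{
  "aggs": {
    "rare_terms_naive": {
      "terms": {
        "field": "user.id",
        "size": 10,
        "order": { "_count": "asc" }
      }
    }
  }
}
```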
Changes several H3s in the EQL syntax page to H4s. We previously bumped up several H4s to H3s to display them in the "On this page" TOC. With elastic/docs#2237, the TOC now displays H4s.
Relates to #65497.
The infer endpoint has changed its format.
Also, the results formats for the various tasks have changed. This updates the docs to match what is currently in 8.0.0.
Adds documentation informing the reader about how the Elasticsearch Service / Elastic Cloud Enterprise upgrade process and autoscaling system automate the migration of the Elasticsearch nodes and ILM policies to using node.roles rather than filtering on node attributes.
Co-authored-by: debadair <debadair@elastic.co>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
This commit makes the two following changes (along with some refactoring):
- NLP results will now indicate if the input was truncated or not
- The default truncation is now `none` instead of `first`
Changes:
* Removes several `[testenv="gold+"]` attributes from the docs. `gold+` is not a valid [subscription level](https://www.elastic.co/subscriptions) or testenv value.
* Moves two `[testenv="basic"]` attributes to the file header. This makes the `testenv` placement consistent and fixes the yml file generated from `docs/reference/snapshot-restore/register-repository.asciidoc`.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Resolves #72151. The _sql endpoint offers a `page_timeout` parameter for
customizing how long scroll contexts should be kept open (if needed) and
a `request_timeout` parameter which the docs describe as "Timeout before
the request fails.". Currently, the value of the `page_timeout`
parameter is used as the `timeout` in subsequent _search requests and
not as the timeout in the `scroll` configuration. For the `scroll`
configuration, SQL uses the `request_timeout` parameter. This PR
addresses the issue by swapping the uses of `page_timeout` and
`request_timeout` in the querier. Additionally, the PR removes some unused
artifacts that might have caused some confusion:
- The `timeout` and `keepAlive` fields in `Querier`. Instead, `Querier` directly uses the corresponding fields in `SqlConfiguration`.
- The `SqlConfiguration` parameter of `ScrollCursor.clear`; it's not used but required an instance of `SqlConfiguration` with all default values.
- One overloaded constructor of `SqlConfiguration` that was only used for calling `ScrollCursor.clear` (and some tests) and used default values for an (arbitrary?) subset of the fields.
- The fields related to async requests in `SqlConfiguration`. I'm a bit unsure about this one, but the fields are never read and it does not seem like an SQL-specific concern. The whole creation of the async tasks is handled in `TransportSqlQueryAction` and the downstream components do not require the information.
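For context, both parameters are set on the same request; after this change `request_timeout` bounds the overall request while `page_timeout` governs how long the scroll context is kept open (index name and values are illustrative):
```js
POST _sql?format=txt
{
  "query": "SELECT * FROM library ORDER BY page_count DESC",
  "fetch_size": 1000,
  "page_timeout": "45s",
  "request_timeout": "90s"
}
```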
Adds an `experimental` annotation to the following:
* `time_series_metric` mapping parameter
* `time_series_dimension` mapping parameter
* `index.mapping.dimension_fields.limit` index setting
* `time_series_dimension` and `time_series_metric` properties in the field caps API response
When restoring a snapshot to a new cluster, users may expect the cluster
to not contain any conflicting indices or data streams. However, some
features, such as the GeoIP processor, automatically create indices at
startup.
This adds and updates related procedures in the restore a snapshot tutorial.
I plan to improve other documentation related to feature states in snapshots
in a separate PR(s).
This PR also updates the restore snapshot API's example to include
the `indices` and `feature_states` parameters.
Relates to #79675
The create SLM policy API's `max_count` parameter limits the number of
snapshots for a policy. Only successful snapshot attempts count toward
this limit. Failed snapshot attempts do not.
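A sketch of a policy using `max_count` (schedule, repository, and values are placeholders); with this configuration only the 50 most recent successful snapshots are retained:
```js
PUT _slm/policy/nightly-snapshots
{
  "schedule": "0 30 1 * * ?",
  "name": "<nightly-snap-{now/d}>",
  "repository": "my_repository",
  "config": {
    "indices": ["*"]
  },
  "retention": {
    "expire_after": "30d",
    "max_count": 50
  }
}
```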
Changes:
* Documents the `wildcard` parameter for the `wildcard` query. This parameter is an alias for the `value` parameter.
* Reorders the parameters alphabetically.
Closes #79711
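A sketch of the alias in use (field and pattern are made up); `wildcard` here behaves exactly like `value`:
```js
GET my-index/_search
{
  "query": {
    "wildcard": {
      "user.id": {
        "wildcard": "ki*y"
      }
    }
  }
}
```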
Currently the fleet search URL of /_fleet/_msearch will collide with the
normal msearch API when the fleet plugin is not enabled. This is because
_fleet will be identified as an index to search. This commit resolves
the issue by changing the APIs to /_fleet/_fleet_search and
/_fleet/_fleet_msearch.
Resolves #79480
My initial thought was to change the properties to be interpreted as seconds but this might not be worth it. All relevant places in the code seem to assume the timeouts to be in ms and there does not seem to be a consistent use of ms or s across JDBC drivers (Postgres uses seconds, MySQL uses ms, MS SQL mixes the two depending on the connection property).
Hence, just fixing the docs might be easier.
The original example of "snapped" does not apply to this section since it is talking about edge ngrams.
The change replaces the term with "approximate" as a valid example.
This change introduces a new CLI tool that can be used to set and
reset the password of all the built-in users and users in the native
realm in Elasticsearch. It depends on the file realm being enabled
(which it is, by default) and can (re)set one built-in user password at a time.
It removes the previously introduced elasticsearch-reset-elastic-password
and elasticsearch-reset-kibana-system-password as their functionality is
covered by this new tool.
PR #55884 removed documentation for several query parameters from the search API
docs. During testing, I failed to notice that these are valid parameters that only work in combination with other parameters.
Changes:
* Notes the following search API parameters require the `q` query string parameter:
* `analyzer`
* `analyze_wildcard`
* `default_operator`
* `df`
* `lenient`
* Notes the following search API parameters require the `suggest_field` and `suggest_text` query parameters:
* `suggest_mode`
* `suggest_size`
* Re-adds the above parameters to the search API docs.
These changes also affect API documentation that reuses the search API parameters:
* Delete by query API
* Update by query API
* Count API
* Explain API
* Validate API
Closes #79674
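For instance, the re-added parameters only apply together with `q` (query and index name are illustrative):
```js
GET my-index/_search?q=message:error&default_operator=AND&analyze_wildcard=true&lenient=true
```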
Today we have a short note in one place in the docs saying not to touch
the contents of the data path. This commit expands the warning to
describe more precisely what is forbidden, and to give some more detail
of the consequences, and also duplicates the warning to the other
location that documents the `path.data` setting.
With the Security ON by default project, where the `elastic` user
password is auto-generated, we have decided to deprecate the
setup-passwords tool and consider removing it in a future version.
Users will get a password for the `elastic` built-in user when the
node starts for the first time, and they can also use the newly
introduced elasticsearch-reset-elastic-password tool to set or
reset that password. With credentials for the elastic user
available, the password for the rest of the built-in users can be
set using the Change Password API, or via Kibana.
This commit fixes a handful of bugs with categorize_text agg
- The agg now fails on fields that are not text fields
- Limits the number of tokens categorized
- Validates the configuration inputs to disallow settings above static maximums
Changes:
- Notes snapshot names support date math
- Sorts request body parameters alphabetically
- Adds the `expand_wildcards` request body parameter
- Reuses cluster state contents list from the restore snapshot API
- Notes the `indices` and `feature_states` parameters support a special `none` value
Relates to #79081
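A rough sketch combining the documented options, assuming date math in the snapshot name and the special `none` value for `feature_states` behave as described above (repository and index pattern are placeholders; the date-math name may need URL encoding in practice):
```js
PUT _snapshot/my_repository/<nightly-snap-{now/d}>
{
  "indices": "my-index-*",
  "expand_wildcards": "open",
  "include_global_state": false,
  "feature_states": ["none"]
}
```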
Deprecate the script context cache in favor of the general cache.
Users should use the following settings:
`script.max_compilations_rate` to set the max compilation rate
for user scripts such as filter scripts. Certain script contexts
that submit scripts outside of the control of the user are
exempted from this rate limit. Examples include runtime fields,
ingest and watcher.
`script.cache.max_size` to set the max size of the cache.
`script.cache.expire` to set the expiration time for entries in
the cache.
What's deprecated?
`script.max_compilations_rate: use-context`. This special
setting value was used to turn on the script context-specific caches.
`script.context.$CONTEXT.cache_max_size`, use `script.cache.max_size`
instead.
`script.context.$CONTEXT.cache_expire`, use `script.cache.expire`
instead.
`script.context.$CONTEXT.max_compilations_rate`, use
`script.max_compilations_rate` instead.
The default cache size was increased from `100` to `3000`, which
was approximately the max cache size when using context-specific caches.
The default compilation rate limit was increased from `75/5m` to
`150/5m` to account for increasing uses of scripts.
System script contexts can now opt-out of compilation rate limiting
using a flag rather than a sentinel rate limit value.
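After the deprecation, the general settings replace the per-context ones; for the dynamic rate limit that might look like the following (the value shown is the new default, and `script.cache.max_size`/`script.cache.expire` are assumed here to be node-level settings configured in `elasticsearch.yml`):
```js
PUT _cluster/settings
{
  "persistent": {
    "script.max_compilations_rate": "150/5m"
  }
}
```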
7.16: Script: Deprecate script context cache #79508
Refs: #62899
7.16: Script: Opt-out system contexts from script compilation rate limit #79459
Refs: #62899
* Update rolling_upgrade.asciidoc
As discussed in https://github.com/elastic/elasticsearch/issues/77007#issuecomment-909087934, it was decided that documentation on rolling upgrade should explicitly mention upgrading by tiers.
* Update docs/reference/upgrade/rolling_upgrade.asciidoc
Co-authored-by: Henning Andersen <33268011+henningandersen@users.noreply.github.com>
Co-authored-by: Henning Andersen <33268011+henningandersen@users.noreply.github.com>
This PR deprecates all monitoring settings as well as adds deprecation info entries for each setting.
Collecting and shipping monitoring data using the Monitoring plugin will be deprecated in 7.16 and will be removed at some point in the 8.x line after sufficient wait time. The recommended approach for collecting and shipping monitoring data going forward is to use Metricbeat. The recommended approach for alerting is Kibana alerting.
This PR adds a framework for migrating system indices as necessary prior
to Elasticsearch upgrades. This framework uses REST APIs added in
another commit:
- GET _migration/system_features
This API, which gets the status of "features" (plugins which own system
indices) with regards to whether they need to be upgraded or not. As of
this PR, this API also reports errors encountered while migrating system
indices alongside the index that was being processed when this occurred.
As an example of this error reporting:
```json
{
  "feature_name": "logstash_management",
  "minimum_index_version": "8.0.0",
  "upgrade_status": "ERROR",
  "indices": [
    {
      "index": ".logstash",
      "version": "8.0.0",
      "failure_cause": {
        "error": {
          "root_cause": [
            {
              "type": "runtime_exception",
              "reason": "whoopsie",
              "stack_trace": "<omitted for brevity>"
            }
          ],
          "type": "runtime_exception",
          "reason": "whoopsie",
          "stack_trace": "<omitted for brevity>"
        }
      }
    }
  ]
}
```
- POST _migration/system_features
This API starts the migration process. The API for this has no changes,
but when called, any system indices which need to be migrated will be
migrated, with status information stored in the cluster state for later
use by the GET _migration/system_features API.
Recently we have deprecated a number of settings in monitoring. These settings should be represented in the deprecation info API. This PR will be backported with some minor changes to the 7.x branch so that we can start the deprecation process in that release cycle.
* Add note in breaking changes for nameid_format
We changed the default for `nameid_format` in 8.0 in #44090 but
did not add anything to the breaking changes in the release notes.
This change amends that.
* remove reference to settings
* Fix docs build
* Accepting most of James' suggested changes
Thanks James!
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* Incorporating changes from Ioannis
* Apply suggestions from code review
Co-authored-by: Tim Vernum <tim@adjective.org>
* Apply suggestions from code review
Co-authored-by: Ioannis Kakavas <ikakavas@protonmail.com>
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Co-authored-by: Adam Locke <adam.locke@elastic.co>
Co-authored-by: Tim Vernum <tim@adjective.org>
When running a rate aggregation without setting the field parameter, the result is computed based on the bucket doc_count.
This PR adds support for a custom _doc_count field.
Closes #77734
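A sketch of a fieldless rate aggregation whose per-bucket result is now based on `_doc_count` when present (index and field names are placeholders):
```js
GET my-index/_search?size=0
{
  "aggs": {
    "by_month": {
      "date_histogram": {
        "field": "@timestamp",
        "calendar_interval": "month"
      },
      "aggs": {
        "events_per_day": {
          "rate": { "unit": "day" }
        }
      }
    }
  }
}
```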
The original change was implemented in #78940, but we have decided to move from a system property to a request parameter, so Cloud users/clients have an easier way to opt in to the new status code.
Relates #70849
Today we limit the max number of concurrent snapshot file restores
per recovery. This works well when the default
node_concurrent_recoveries is used (which is 2). When this limit is
increased, it is possible to exhaust the underlying repository
connection pool, affecting other workloads.
This commit adds a new setting
`indices.recovery.max_concurrent_snapshot_file_downloads_per_node` that
allows limiting the max number of snapshot file downloads per node
during recoveries. When a recovery starts in the target node it tries
to acquire a permit that allows it to download snapshot files when it is
granted. This is communicated to the source node in the
StartRecoveryRequest. This is a rather conservative approach, since it is
possible that a recovery that gets a permit to use snapshot files
doesn't recover any snapshot file, while a concurrent recovery
that doesn't get a permit could have taken advantage of recovering from
a snapshot.
Closes #79044
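Assuming the new setting is dynamic like the other `indices.recovery.*` settings, it could be adjusted with something like:
```js
PUT _cluster/settings
{
  "persistent": {
    "indices.recovery.max_concurrent_snapshot_file_downloads_per_node": 25
  }
}
```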
This change adds all the JodaCompatibleZonedDateTime methods that no longer exist to the
migration docs with their ZonedDateTime equivalents.
Fixes: #78739
Changes can-match from a shard-level to a node-level action, which helps avoid an explosion of shard-level can-match
subrequests in clusters with many shards, that can cause stability issues. Also introduces a new search_coordination
thread pool to handle the sending and handling of node-level can-match requests.
Since #65905 Elasticsearch has determined the Java heap settings
from node roles and total system memory.
This change allows the total system memory used in that calculation
to be overridden with a user-specified value. This is intended to
be used when Elasticsearch is running on a machine where some other
software that consumes a non-negligible amount of memory is running.
For example, a user could tell Elasticsearch to assume it was
running on a machine with 3GB of RAM when actually it was running
on a machine with 4GB of RAM.
The system property is `es.total_memory_bytes`, so, for example,
could be specified using `-Des.total_memory_bytes=3221225472`.
(It is specified in bytes rather than using a unit, because it
needs to be parsed by startup code that does not have access to
the utility classes that interpret byte size units.)
This PR changes uses of transient cluster settings to
persistent cluster settings.
The PR also deprecates the transient settings usage.
Relates to #49540
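The migration pattern looks roughly like the following, copying a value to persistent storage and clearing the transient copy in one request (the setting shown is just an example):
```js
PUT _cluster/settings
{
  "persistent": {
    "indices.recovery.max_bytes_per_sec": "50mb"
  },
  "transient": {
    "indices.recovery.max_bytes_per_sec": null
  }
}
```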
The docs for `GET _nodes/<node>/<metric>` omitted a couple of metrics
and indicated that this API returned dynamic stats rather than static
info. They also didn't mention that `_all` is a legal value, nor
did they give a way to suppress all metrics even though this is possible.
This commit adjusts the docs and adds tests to ensure that selecting
metrics works as expected and to ensure that there is a future-proof
legal way to suppress all metrics.
Closes #79187
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* document accept_enterprise parameter
This was added in 7.6, will be deprecated in 8.x, and removed in 9.x+ (noted in text).
https://github.com/elastic/elasticsearch/pull/50067
* Update wording and deprecation notice
* Incorporate review feedback
Co-authored-by: Adam Locke <adam.locke@elastic.co>
Co-authored-by: Ken MacInnis <ken.macinnis@elastic.co>
* Fix a typo: a space between 'E' and 'cluster...'
* Update example, fix headings, change notes
Co-authored-by: Adam Locke <adam.locke@elastic.co>
Co-authored-by: Marwane Chahoud <marwane.chahoud@gmail.com>
This commit adds the ability to configure a list of settings that will be ignored by the deprecation info
API. Any deprecation messages for any of the settings given will be suppressed. This can be used to hide
settings that users do not have the ability to change.
Relates #78725
Changed "This lets you to independently scale resources for each task." to "This allows you to independently scale resources for each task."
Co-authored-by: wakejordan <90637320+wakejordan@users.noreply.github.com>
This is related to #71449. This commit adds a specialized search API
which allows users to pass checkpoints to wait on. When users pass
these checkpoints to the API, the search will only be executed after the
checkpoints are visible after a refresh.
Add a section in the docs that describe a number of node level settings
for the enrich processor.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* Adjusted integration tests to use geoip test fixture or to use test databases provided via config dirs (for qa module / docs).
* Kept the geolite2-databases dependency for most of the unit tests only.
* Made fallback_to_default_databases parameter on geoip processor a noop and emit deprecation warning upon using it.
* If no geoip databases are available yet to a node then the geoip processor factory returns a processor implementation that tags documents to indicate that databases are unavailable. This allows these documents to be reindexed later with a pipeline. These documents will have a tag string array field, which contains a string _geoip_database_unavailable_{database_name} for each missing database in a pipeline.
* Added reload pipeline capabilities to IngestService, so that when databases are available again on a node, pipelines with a geoip processor definition can be reloaded.
Relates to #68920
* [DOCS] Fix default value for closed indices
#57953 introduced changes that added ESS icons to many Elasticsearch settings. As part of those changes, the default value for `cluster.indices.close.enable` was indicated as `false`, when it should be `true`. This PR updates the default value to `true`.
Closes #78877
* Update description
* Update note to remove outdated claims
We document that `GET /_index_template/...` accepts a comma-separated
list of template names but in fact today this API accepts only a single
name or pattern. Likewise `GET /_cat/templates/...` (at least it didn't
until #78829 but that's not released yet). This commit fixes the docs to
indicate these APIs accept only a single template name and also adds
some extra validation to reject requests containing a `,` since such a
request cannot match any actual templates.
It also adjusts `GET /_cat/templates` to use the filtering built into
`TransportGetComposableIndexTemplateAction` rather than retrieving all
templates and then filtering them on the coordinating node.
Since Kibana's Discover switched to retrieving values via the fields API rather than source there have been gaps in the display caused by "ignored" fields (those that fall foul of ignore_above and ignore_malformed size and formatting rules).
This PR returns ignored values from source when a user-requested field fails to be parsed for a document. In these cases the corresponding hit adds a new ignored_field_values section in the response.
Closes #74121
Add an _upgrade endpoint to bulk upgrade transforms. _upgrade rewrites all transforms and their
artifacts into the latest format and moves them to the latest storage (index). If all transforms are upgraded, old
indices and outdated documents get deleted. Using the dry_run option it is possible to check if
upgrades are necessary without applying changes.
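For example, a dry run to see whether anything needs upgrading before actually applying it:
```js
POST _transform/_upgrade?dry_run=true
```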
* [DOCS] EQL: Consistently use 'statement'
We describe `with runs` as a 'statement.' This updates `with maxspan`
to use the same terminology.
* whitespace
This commit adds the new normalize_above parameter to the p_value significant
terms heuristic.
This parameter allows for consistent significance results at various scales. When a total count (in or out of the background set) is above the normalize_above parameter, both the total set and the set including the term are scaled by normalize_above/count, where count is the term count in the set or the total set size.
In https://github.com/elastic/kibana/pull/113783, we renamed Kibana's **Ingest Node Pipelines** feature to **Ingest Pipelines**. This updates screenshots and references for the feature. It also replaces a few remaining `ingest node pipeline` references.
Introduces a setting cluster.deprecation_indexing.x_opaque_id_used.enabled to disable use of
x-opaque-id in RateLimitingFilter. This will be used for deprecation
logs indexing and will not affect logging to files (it uses a different
instance of RateLimitingFilter with this flag enabled by default).
Changes the indices backing a deprecation log data stream to be hidden.
Refactors DeprecationHttpIT to be more reliable
Relates #76292, closes #77936
* Index prefixes for searchable snapshots
added a note about how ILM managed indices are prefixed with "restored-" or "partial-" when they are either fully or partially mounted for searchable snapshots
* Apply suggestions from code review
Co-authored-by: debadair <debadair@elastic.co>
* Implement and test get feature upgrade status API
* Add integration test for feature upgrade endpoint
* Use constant enum for statuses
* Add unit tests for transport class methods
* WIP, basic implementation
* Pull `if` branch into a variable
* Remove outdated javadoc
* Remove map iteration, use target name instead of id (whoops)
* Remove streaming from isReplacementSource
* Simplify getReplacementName
* Only calculate node shutdowns if canRemain==false and forceMove==false
* Move canRebalance comment in BalancedShardsAllocator
* Rename canForceDuringVacate -> canForceAllocateDuringReplace
* Add comment to AwarenessAllocationDecider.canForceAllocateDuringReplace
* Revert changes to ClusterRebalanceAllocationDecider
* Change "no replacement" decision message in NodeReplacementAllocationDecider
* Only construct shutdown map once in isReplacementSource
* Make node shutdowns and target shutdowns available within RoutingAllocation
* Add randomization for adding the filter that is overridden in test
* Add integration test with replicas: 1
* Go nuts with the verbosity of allocation decisions
* Also check NODE_C in unit test
* Test with randomly assigned shard
* Fix test for extra verbose decision messages
* Remove canAllocate(IndexMetadata, RoutingNode, RoutingAllocation) overriding
* Spotless :|
* Implement 100% disk usage check during force-replace-allocate
* Add rudimentary documentation for "replace" shutdown type
* Use RoutingAllocation shutdown map in BalancedShardsAllocator
* Add canForceAllocateDuringReplace to AllocationDeciders & add test
* Switch from percentage to bytes in DiskThresholdDecider force check
* Enhance docs with note about rollover, creation, & shrink
* Clarify decision messages, add test for target-only allocation
* Simplify NodeReplacementAllocationDecider.replacementOngoing
* Start nodeC before nodeB in integration test
* Spotleeeessssssss! You get me every time!
* Remove outdated comment
If the _nodes/stats API received a level=shards request parameter, then the response would have two "shards" fields,
which would cause problems with json parsers. This commit renames the "shards" field that currently only contains
"total_count" to "shard_stats".
Relates #78311, #75433
Fixes a couple of erroneous references related to system indices in the snapshot restore tutorial:
* Calling the delete index API on `*` will only delete
some system indices, such as `.security`. It won't delete others, such as
`.geoip_databases`.
* Not all dot indices are system indices. Some are just hidden indices.
Relates to #76929
The composite aggregation is considered expensive. Users should perform load testing before deploying it in production.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
The documentation indicates that `stack.templates.enabled` can be used in Elasticsearch Service, but it is not part of the settings allowlist in ESS. This PR makes the documentation match the state of the allowlist.
This PR adds a MonitoringIndexTemplateRegistry to the monitoring plugin which automatically
installs all monitoring templates locally when the plugin is initialized. Exporters have been
updated to no longer attempt installation of the monitoring templates, and instead will wait for
the templates to become available before setting themselves as started. Some older
functionality related to templates has been removed as well, such as the expectation that
version 6 monitoring templates are installed, as well as the setting that controls their installation
(xpack.monitoring.exporters.<EXPORTER>.index.template.create_legacy_templates).
This change removes several pieces of deprecated code from stored scripts.
Stored scripts/templates are no longer allowed to be empty and will throw an exception when used
with PutStoredScript.
ScriptMetadata will now drop any existing stored scripts that are empty with a deprecation warning in
the case they have not been previously removed.
The code field is now only allowed as source as part of a PutStoredScript JSON blob.
As the script only has access to the nested document, this should be
documented.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
* [DOCS] Add Beats config example for ingest pipelines
The Elasticsearch ingest pipeline docs cover ingest pipelines for Fleet and
Elastic Agent. However, the docs don't cover Beats. This adds those docs.
Relates to https://github.com/elastic/beats/pull/28239.
* Update docs/reference/ingest.asciidoc
Co-authored-by: DeDe Morton <dede.morton@elastic.co>
Co-authored-by: DeDe Morton <dede.morton@elastic.co>
The 'verbose' option to /_segments returns memory information
for each segment. However, Lucene 9 has stopped tracking this memory
information as it is largely held off-heap and so is no longer significant.
This commit deprecates the 'verbose' parameter and makes it a no-op.
Fixes #75955
This commit adds a new multi-bucket aggregation: `categorize_text`
The aggregation follows a similar design to significant text in that it reads from `_source`
and re-analyzes the text as it is read.
The key difference is that it does not use the indexed field's analyzer, but instead relies on
the `ml_standard` tokenizer with specialized ML token filters. The tokenizer + filters are the
same that machine learning categorization anomaly jobs utilize.
The high level logical flow is as follows:
- at each shard, read in the text field with a custom analyzer using `ml_standard` tokenizer
- Read in the particular tokens from the analyzer
- Feed these tokens to a token tree algorithm (an adaptation of the drain categorization algorithm)
- Gather the individual log categories (the leaf nodes), sort them by doc_count, ship those buckets to be merged
- Merge all buckets that have the EXACT same key
- Once all buckets are merged, pass those keys + counts to a new token tree for additional merging
- That tree builds the final buckets and that is returned to the user
Algorithm explanation:
- Each log is parsed with the ml-standard tokenizer
- each token is passed into a token tree
- For `max_match_token` each token is stored in the tree and at `max_match_token+1` (or `len(tokens)`) a log group is created
- If another log group exists at that leaf, merge it if they have `similarity_threshold` percentage of tokens in common
- merging simply replaces tokens that are different in the group with `*`
- If a layer in the tree has `max_unique_tokens` we add a `*` child and any new tokens are passed through there. The catch here is that on the final merge, we first attempt to merge together subtrees with the smallest number of documents, especially if the new sub-tree has more documents counted.
## Aggregation configuration.
Here is an example on some openstack logs
```js
POST openstack/_search?size=0
{
"aggs": {
"categories": {
"categorize_text": {
"field": "message", // The field to categorize
"similarity_threshold": 20, // merge log groups if they are this similar
"max_unique_tokens": 20, // Max Number of children per token position
"max_match_token": 4, // Maximum tokens to build prefix trees
"size": 1
}
}
}
}
```
This will return buckets like
```json
"aggregations" : {
"categories" : {
"buckets" : [
{
"doc_count" : 806,
"key" : "nova-api.log.1.2017-05-16_13 INFO nova.osapi_compute.wsgi.server * HTTP/1.1 status len time"
}
]
}
}
```
The get SLM status API will only return one of three statuses: `RUNNING`, `STOPPING`, or `STOPPED`.
This corrects the docs to remove the `STARTED` status and document the `RUNNING` status.
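For reference, the status is reported in the `operation_mode` field of the response:
```js
GET _slm/status

// returns one of RUNNING, STOPPING, or STOPPED, e.g.
// { "operation_mode": "RUNNING" }
```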
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
In #77686 we added a service to clean up blob store
cache docs after a searchable snapshot is no longer
in use. We noticed some situations where some cache
docs could still remain in the system index: when the
system index is not available when the searchable
snapshot index is deleted; when the system index is
restored from a backup or when the searchable
snapshot index was deleted on a version before #77686.
This commit introduces a maintenance task that
periodically scans and cleans up unused blob cache
docs. This task is scheduled to run every hour on the
data node that contains the blob store cache primary
shard. The periodic task works by using a point in
time context with search_after.
For this grid type, the features on the aggregation layer are represented by a point that is computed from the
centroid of the data inside the cell
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Documents the `runs` keyword for running the same event criteria successively in a sequence query.
Relates to #75082.
# Conflicts:
# docs/reference/release-notes/highlights.asciidoc
Documents `archived.*` persistent cluster settings and index settings.
These settings are commonly produced during a major version upgrade.
Closes #28027
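A sketch of the usual cleanup the docs can point at, clearing archived persistent cluster settings with a wildcard (archived index settings would be removed analogously via the update index settings API):
```js
PUT _cluster/settings
{
  "persistent": {
    "archived.*": null
  }
}
```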
* Add stubs for get API
* Add stub for post API
* Register new actions in ActionModule
* HLRC stubs
* Unit tests
* Add rest api spec and tests
* Add new action to non-operator actions list
This change removes JodaCompatibleZonedDateTime and replaces it with ZonedDateTime for use in
scripting.
Breaking changes:
* JodaCompatibleZonedDateTime no longer exists and cannot be cast to in Painless. Use ZonedDateTime
instead.
* The dayOfWeek method on ZonedDateTime returns the DayOfWeek enum instead of an int from
JodaCompatibleZonedDateTime. dayOfWeekEnum still exists on ZonedDateTime as an augmentation to
support the transition to ZonedDateTime, but is now deprecated in favor of dayOfWeek on
ZonedDateTime.
* [DOCS] Always enable file and native realms by default
Adds an 8.0 breaking change for PR #69096.
The copy is based on the 7.13 deprecation notice added with PR #69320.
* reword
* Update docs/reference/migration/migrate_8_0/security.asciidoc
Co-authored-by: Yang Wang <ywangd@gmail.com>
* Update docs/reference/migration/migrate_8_0/security.asciidoc
Co-authored-by: Yang Wang <ywangd@gmail.com>
Co-authored-by: Yang Wang <ywangd@gmail.com>
* [ML] add documentation for get deployment stats API
* Apply suggestions from code review
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Improve docs for pre-release version compatibility
Follow-up to #78317 clarifying a couple of points:
- a pre-release build can restore snapshots from released builds
- compatibility applies if at least one of the local or remote cluster
is a released build
* Remote cluster build date nit
Monitoring installs a number of ingest pipelines which have been historically used
to upgrade documents when mappings and document structures change between
versions. Since there aren't any changes to the document format, nor will there be
by the time the format is completely retired, we can comfortably remove these
pipelines.
Zero-Shot classification allows for text classification tasks without a pre-trained collection of target labels.
This is achieved through models trained on the Multi-Genre Natural Language Inference (MNLI) dataset. This dataset pairs text sequences with "entailment" clauses. An example could be:
"Throughout all of history, man kind has shown itself resourceful, yet astoundingly short-sighted" could have been paired with the entailment clauses: ["This example is history", "This example is sociology"...].
This training set combined with the attention and semantic knowledge in modern day NLP models (BERT, BART, etc.) affords a powerful tool for ad-hoc text classification.
See https://arxiv.org/abs/1909.00161 for a deeper explanation of the MNLI training and how zero-shot works.
The zero-shot classification task is configured as follows:
```js
{
// <snip> model configuration </snip>
"inference_config" : {
"zero_shot_classification": {
"classification_labels": ["entailment", "neutral", "contradiction"], // <1>
"labels": ["sad", "glad", "mad", "rad"], // <2>
"multi_label": false, // <3>
"hypothesis_template": "This example is {}.", // <4>
"tokenization": { /*<snip> tokenization configuration </snip>*/}
}
}
}
```
* <1> For all zero_shot models, there returns 3 particular labels when classification the target sequence. "entailment" is the positive case, "neutral" the case where the sequence isn't positive or negative, and "contradiction" is the negative case
* <2> This is an optional parameter for the default zero_shot labels to attempt to classify
* <3> When returning the probabilities, should the results assume there is only one true label or multiple true labels
* <4> The hypothesis template when tokenizing the labels. When combining with `sad` the sequence looks like `This example is sad.`
For inference in a pipeline one may provide label updates:
```js
{
//<snip> pipeline definition </snip>
"processors": [
//<snip> other processors </snip>
{
"inference": {
// <snip> general configuration </snip>
"inference_config": {
"zero_shot_classification": {
"labels": ["humanities", "science", "mathematics", "technology"], // <1>
"multi_label": true // <2>
}
}
}
}
//<snip> other processors </snip>
]
}
```
* <1> The `labels` we care about, these replace the default ones if they exist.
* <2> Should the results allow multiple true labels
Similarly one may provide label changes against the `_infer` endpoint
```js
{
"docs":[{ "text_field": "This is a very happy person"}],
"inference_config":{"zero_shot_classification":{"labels": ["glad", "sad", "bad", "rad"], "multi_label": false}}
}
```
We deprecated support for multiple data paths (MDP) in 7.13. However,
we won't remove support until after 8.0.
Changes:
* Reverts PR #72267, which removed MDP docs
* Removes a related item from the 8.0 breaking changes.
The reference manual includes docs on version compatibility in various
places, but it's not clear that these docs only apply to released
versions and that the rules for pre-release versions are stricter than
folks expect. This commit adds some words to the docs for unreleased
versions which explains this subtlety.
Changes:
* Documents the `time_series_metric` mapping parameter for PR #76766.
* Renames the `dimension` parameter to `time_series_dimension` for PR #78012.
* Adds support for `unsigned_long` to `time_series_dimension` for PR #78204.
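A hedged sketch of a mapping using both parameters (index and field names are invented):
```js
PUT my-tsdb-index
{
  "mappings": {
    "properties": {
      "host_name": {
        "type": "keyword",
        "time_series_dimension": true
      },
      "cpu_usage": {
        "type": "double",
        "time_series_metric": "gauge"
      }
    }
  }
}
```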
This adds a new "elasticsearch-keystore show" command that displays
the value of a single secure setting from the keystore.
An optional `-o` (or `--output`) parameter can be used to direct
output to a file.
The `-o` option is required for binary keystore values
because the CLI `Terminal` class does not support writing binary data.
Hence this command:
elasticsearch-keystore show xpack.watcher.encryption_key > watcher.key
would not produce a file with the correct contents.
Co-authored-by: Ioannis Kakavas <ikakavas@protonmail.com>
* [DOCS] Update remote cluster docs
* Add files, rename files, write new stuff
* Plethora of changes
* Add test and update snippets
* Redirects, moved files, and test updates
* Moved file to x-pack for tests
* Remove older CCS page and add redirects
* Cleanup, link updates, and some rewrites
* Update image
* Incorporating user feedback and rewriting much of the remote clusters page
* More changes from review feedback
* Numerous updates, including request examples for CCS and Kibana
* More changes from review feedback
* Minor clarifications on security for remote clusters
* Incorporate review feedback
Co-authored-by: Yang Wang <ywangd@gmail.com>
* Some review feedback and some editorial changes
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Yang Wang <ywangd@gmail.com>
This allows consumers of the API to know exactly whether all the features in a tile have been considered
when building the hits layer of a vector tile.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>