Since 7.15, transform automatically optimizes the grouping part of the query. We can therefore delete the group_by ordering advice from the transform at scale documentation, which makes it simpler.
Allows searching on ip fields when those fields are not indexed (index: false) but just doc values are enabled.
This enables searches on archive data, which has access to doc values but not index structures. When combined with
searchable snapshots, it allows downloading only data for a given (doc value) field to quickly filter down to a select set
of documents.
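As a rough illustration of the kind of field this applies to, the sketch below builds an index mapping whose `ip` field has `index: false` but keeps doc values, plus a `term` query against it. The field name `client_ip` and both helper names are hypothetical, and the code only builds request bodies; nothing is sent to a cluster.

```python
import json


def doc_values_only_mapping(field_name: str, field_type: str = "ip") -> dict:
    """Build a mapping where the field is not indexed but keeps doc values."""
    return {
        "mappings": {
            "properties": {
                field_name: {
                    "type": field_type,
                    "index": False,       # no inverted index / points
                    "doc_values": True,   # queries fall back to doc values
                }
            }
        }
    }


def term_query(field_name: str, value: str) -> dict:
    """Build a term query body for the given field."""
    return {"query": {"term": {field_name: value}}}


if __name__ == "__main__":
    # "client_ip" is an illustrative field name, not one from the change.
    print(json.dumps(doc_values_only_mapping("client_ip"), indent=2))
    print(json.dumps(term_query("client_ip", "192.168.0.1"), indent=2))
```

Queries of this kind have to scan doc values, so they are presumably slower than queries on indexed fields; the trade-off is reduced index size.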
Relates #81210 and #52728
Closes #69533.
The Docker docs mention bind-mounting the `config`, `data` and
`logs` directories when using an arbitrary UID / GID, but they fail
to mention that the `plugins` dir must also be mounted in order to
install plugins.
* Rolling restart instructions
In a rolling restart scenario, indexing does not need to be stopped. This step is optional. I have updated step 2 of the rolling restart instructions.
Note: I am not sure how to check if the doc, once generated, will keep the numbering as section 1 and 3 are references to the full restart, can you check this detail?
* Clarify that shard recovery can be faster
Co-authored-by: Adam Locke <adam.locke@elastic.co>
Users can no longer set the location of Hunspell dictionaries. The `<config-dir>/hunspell` directory is silently used every time, regardless of configuration.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
(cherry picked from commit 1a4fd34129)
Co-authored-by: Jan Jíša <jenda.jisa@gmail.com>
The `GET _cluster/state` API is really only suitable for debugging or
diagnostics. Its response format is not documented since it changes
fairly freely between versions.
Today we mention in its docs that this API is unstable, and deliberately
omit a description of its response format, but we don't explicitly say
that it's only for diagnostics and is unsuitable for consumption by
external tools that might try and use it for monitoring.
This commit adjusts the docs to give some more explicit guidance about
how it should and shouldn't be used.
Allows searching on boolean fields when those fields are not indexed (index: false) but just doc values are enabled.
This enables searches on archive data, which has access to doc values but not index structures. When combined with
searchable snapshots, it allows downloading only data for a given (doc value) field to quickly filter down to a select set
of documents.
Relates #81210 and #52728
Allows searching on keyword fields when those fields are not indexed (index: false) but just doc values are enabled.
This enables searches on archive data, which has access to doc values but not index structures. When combined with
searchable snapshots, it allows downloading only data for a given (doc value) field to quickly filter down to a select set
of documents.
Relates #81210 and #52728
* Upgrade documentation for xpack.monitoring.history.duration
* A few updates now that https://github.com/elastic/elasticsearch/pull/82498 has been opened with a concrete policy.
* Grammar fix
* Update docs/reference/migration/migrate_8_0/cluster-node-setting-changes.asciidoc
Co-authored-by: James Baiera <james.baiera@gmail.com>
* Add support for HTTP Proxies for the GCS repository
The change adds 3 new client properties for the GCS repository:
* gcs.client.default.proxy.type
* gcs.client.default.proxy.host
* gcs.client.default.proxy.port
They allow configuring a [java.net.Proxy](https://docs.oracle.com/en/java/javase/17/docs/api/java.base/java/net/Proxy.html)
for the GCS SDK to use when communicating with the GCS API.
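A minimal sketch of how the three properties fit together, assuming a proxy at `proxy.example.internal:3128` and the type value `http` (both are illustrative assumptions; the accepted type values presumably mirror `java.net.Proxy.Type`). The helper names are hypothetical and only render settings lines.

```python
def gcs_proxy_settings(host: str, port: int, proxy_type: str = "http",
                       client: str = "default") -> dict:
    """Return the three proxy settings for one GCS client."""
    if not 0 < port < 65536:
        raise ValueError(f"invalid proxy port: {port}")
    prefix = f"gcs.client.{client}.proxy"
    return {
        f"{prefix}.type": proxy_type,
        f"{prefix}.host": host,
        f"{prefix}.port": port,
    }


def to_yaml_lines(settings: dict) -> list:
    """Render settings as flat `key: value` lines."""
    return [f"{key}: {value}" for key, value in settings.items()]


if __name__ == "__main__":
    # Host and port below are placeholders for illustration only.
    for line in to_yaml_lines(gcs_proxy_settings("proxy.example.internal", 3128)):
        print(line)
```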
Resolves #82444
* [DOCS] Remove extraneous Elasticsearch Docker image information
In the step for starting Elasticsearch with the generated enrollment token, `docker.elastic.co/elasticsearch/elasticsearch:` was included in conjunction with the `{docker-image}` variable. This led to duplicated Docker image information that displayed as `docker.elastic.co/elasticsearch/elasticsearch:docker.elastic.co/elasticsearch/elasticsearch:8.0.0-rc1`. This PR removes the duplicate image information.
* Update ifeval statements and add sub-heading for setting JVM heap size
There have been many requests to support repository-s3 authentication via IAM roles in Kubernetes service accounts.
The AWS SDK is supposed to support them out of the box with the aws-java-sdk-sts library. Unfortunately, we can't use WebIdentityTokenCredentialsProvider from the SDK. It reads the token from AWS_WEB_IDENTITY_TOKEN_FILE environment variable which is usually mounted to /var/run/secrets/eks.amazonaws.com/serviceaccount/token and the S3 repository doesn't have the read permission to read it. We don't want to hard-code a file permission for the repository, because the location of AWS_WEB_IDENTITY_TOKEN_FILE can change at any time in the future and we would also generally prefer to restrict the ability of plugins to access things outside of their config directory.
To overcome this limitation, this change adds a custom WebIdentityCredentials provider that reads the service account token from a symlink to AWS_WEB_IDENTITY_TOKEN_FILE created in the repository's config directory. We expect the end user to create the symlink to indicate that they want to use service accounts for authentication.
Service accounts are checked and exchanged for session tokens by the AWS STS. To test the authentication flow, this change adds a test fixture which mocks the assume-role-with-web-identity call to the service and returns a response with test credentials.
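The symlink step expected from the end user could look roughly like the sketch below. The link name `aws-web-identity-token-file` and the `repository-s3` directory under the config path are assumptions made for illustration; check the repository-s3 documentation for the exact location. The demo runs entirely inside a temporary directory.

```python
import tempfile
from pathlib import Path

# Assumed link name, used here for illustration only.
LINK_NAME = "aws-web-identity-token-file"


def link_web_identity_token(token_file: Path, repo_config_dir: Path) -> Path:
    """Create (or refresh) a symlink to the token file in the config directory."""
    repo_config_dir.mkdir(parents=True, exist_ok=True)
    link = repo_config_dir / LINK_NAME
    if link.is_symlink() or link.exists():
        link.unlink()  # replace a stale link so the call is repeatable
    link.symlink_to(token_file)
    return link


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        token = Path(tmp) / "token"
        token.write_text("dummy-token")
        link = link_web_identity_token(token, Path(tmp) / "config" / "repository-s3")
        print(link, "->", link.resolve())
```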
Fixes #52625
Documents the `EMPTY` and `NONE` `flag` values for the `regexp` query.
Also documents the `""` (empty string) value, which is an alias for `ALL`.
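For illustration, a small sketch that builds a `regexp` query body and normalizes the empty-string alias described above. The helper names and the field name `user.id` are hypothetical; the request parameter is assumed to be `flags`.

```python
def normalize_flags(flags: str) -> str:
    """Treat the empty string as an alias for ALL, as described above."""
    return "ALL" if flags == "" else flags


def regexp_query(field: str, pattern: str, flags: str = "ALL") -> dict:
    """Build a regexp query body with the given flags value."""
    return {
        "query": {
            "regexp": {
                field: {"value": pattern, "flags": normalize_flags(flags)}
            }
        }
    }


if __name__ == "__main__":
    # NONE disables the optional operators; "" behaves like ALL.
    print(regexp_query("user.id", "k.*y", flags="NONE"))
    print(regexp_query("user.id", "k.*y", flags=""))
```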
Closes #81978.
Removes the mock nio transport and replaces its usage with the netty transport so that tests run
against a more realistic transport implementation. This improves the real-world coverage of
the Netty transport, makes our tests more realistic, and saves lots of code.
In particular, coverage of the rather complicated throttling/chunking in the netty message handler
is really nice to have.
The downside of this change is that we lose the slow transport thread warnings that the mock transport
outputs. This isn't a big deal these days in my opinion as we have slow logging in other places
now that makes up for this (we didn't when initially adding the slow logging) and that contains
far more detailed information on what exactly was slow.
Other than that, the mock transport does not come with any features we don't also have in the Netty
transport at this point.
Similar to #82409, but for date fields.
Allows searching on date field types (date, date_nanos) when those fields are not indexed (index: false) but just doc
values are enabled.
This enables searches on archive data, which has access to doc values but not index structures. When combined with
searchable snapshots, it allows downloading only data for a given (doc value) field to quickly filter down to a select set
of documents.
Relates #81210 and #52728
The simulate template API should include the settings that registered IndexSettingProvider instances generate.
Currently, these settings are not included in the simulate template API response;
they are only used to create a dummy IndexService instance to validate aliases.
This commit adds `search_interval` to the datafeed stats API
`running_state` object. When the datafeed is running, it reports
the last search interval that was searched. It is useful to
understand the point in time where the datafeed is currently
searching.
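A hedged sketch of how a client might read the new value. It assumes the interval is reported as epoch milliseconds under `start_ms` and `end_ms`; those two key names and the sample payload are assumptions, not taken from this commit.

```python
from datetime import datetime, timezone


def search_interval(datafeed_stats: dict):
    """Return (start, end) as UTC datetimes, or None if not reported."""
    interval = datafeed_stats.get("running_state", {}).get("search_interval")
    if interval is None:
        return None  # e.g. the datafeed is not running

    def to_utc(ms: int) -> datetime:
        return datetime.fromtimestamp(ms / 1000, tz=timezone.utc)

    # "start_ms" / "end_ms" are assumed key names.
    return to_utc(interval["start_ms"]), to_utc(interval["end_ms"])


if __name__ == "__main__":
    sample = {"running_state": {"search_interval": {"start_ms": 0, "end_ms": 60000}}}
    print(search_interval(sample))
```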
Closes #82405
### Changes
* Updates the snapshot version compatibility table to use minor versions rather than major versions.
* Adds an index creation version and cluster compatibility table. Updates the index compatibility section to use minor versions.
* Moves the tables to separate files. This'll help prevent merge conflicts.
* Fixes the heading level for the "Warnings" section.
Allows searching on number field types (long, short, int, float, double, byte, half_float) when those fields are not
indexed (index: false) but just doc values are enabled.
This enables searches on archive data, which has access to doc values but not index structures. When combined with
searchable snapshots, it allows downloading only data for a given (doc value) field to quickly filter down to a select set
of documents.
Note to reviewers:
I have split isSearchable into two separate methods isIndexed and isSearchable on MappedFieldType. The former one is
about whether actual indexing data structures have been used (postings or points), and the latter one on whether you
can run queries on the given field (e.g. used by field caps). For number field types, queries are now allowed whenever
points are available or when doc values are available (i.e. searchability is expanded).
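The split described in this note can be modeled as below. This is a simplified Python sketch of the idea, not the actual `MappedFieldType` code; the class and attribute names are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NumberFieldSketch:
    """Toy model of a number field type's storage options."""
    has_points: bool       # indexing data structure present
    has_doc_values: bool   # columnar doc values present

    def is_indexed(self) -> bool:
        # True only when an actual index structure was built.
        return self.has_points

    def is_searchable(self) -> bool:
        # Queries are allowed with points or with doc values.
        return self.has_points or self.has_doc_values


if __name__ == "__main__":
    archive_field = NumberFieldSketch(has_points=False, has_doc_values=True)
    print(archive_field.is_indexed(), archive_field.is_searchable())
```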
Relates #81210 and #52728
Changes:
* Moves the get SLM policy API example _after_ the get SLM stats API. This seems to fit the normal workflow better where a user will drill-down into a particular policy to get more information.
* Notes some more information about what the get SLM policy API returns. In particular, it notes that you can get the error message for the last policy failure.
Replaces a 7.x reference with the `{prev-major-last}` variable, which will use the last minor of the previous major (currently 7.17 for 8.x).
Relates to #70451.
Changes:
* Updates the Cloud setup instructions to note that you open Kibana by default.
* Reorders the API call section to highlight Kibana.
* Fixes the Docker `ifeval` to hide some text on unreleased branches.
This enhances the migrate to data tiers routing API to also iterate over
the existing legacy, composable, and component templates and check whether
they define a custom node attribute routing in their settings for either
`index.routing.allocation.require.{nodeAttrName}` or
`index.routing.allocation.include.{nodeAttrName}`. If any do, we
update them to remove all the routing settings for the provided
`nodeAttrName`.
e.g. any template with the following settings configuration:
```
"settings": {
  "index.routing.allocation.require.data": "warm",
  "index.routing.allocation.include.data": "rack1",
  "index.routing.allocation.exclude.data": "rack2,rack3"
}
```
will have its settings updated to:
```
"settings": {}
```
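The template update described above can be sketched as a pure function over a flat settings map. This is an illustrative model under the stated rule (a `require` or `include` routing for the attribute triggers removal of all routing settings for it), not the actual implementation; the function name is hypothetical.

```python
ROUTING_PREFIX = "index.routing.allocation."


def strip_attribute_routing(settings: dict, node_attr_name: str) -> dict:
    """Remove all routing settings for node_attr_name if require/include is set."""
    keys = {
        kind: f"{ROUTING_PREFIX}{kind}.{node_attr_name}"
        for kind in ("require", "include", "exclude")
    }
    # Only templates defining require.* or include.* are updated.
    if keys["require"] not in settings and keys["include"] not in settings:
        return dict(settings)
    doomed = set(keys.values())
    return {k: v for k, v in settings.items() if k not in doomed}


if __name__ == "__main__":
    template_settings = {
        "index.routing.allocation.require.data": "warm",
        "index.routing.allocation.include.data": "rack1",
        "index.routing.allocation.exclude.data": "rack2,rack3",
    }
    print(strip_attribute_routing(template_settings, "data"))
```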
Adds two known issues to the 8.0.0-rc1 release notes:
* A general advisory not to upgrade production clusters to 8.0.0-rc1. The advisory also notes that upgrades from 8.0.0-rc1 are not supported.
* A note indicating that the SQL JDBC driver requires Java 17 or newer. This requirement will be changed to Java 1.8
or newer in 8.0.0-rc2.
(cherry picked from commit 31cba4fa16)
Cosine similarity is not defined when one of the vectors has zero magnitude.
Before, the kNN search endpoint threw a confusing exception related to top docs
collection. Now we reject vectors early with a clear error message, failing
indexing if the vector has zero magnitude.
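The early rejection can be illustrated with a small check that runs before any similarity is computed. This is a sketch of the idea only; the function names and error message are made up for the example.

```python
import math


def magnitude(vector) -> float:
    """Euclidean norm of the vector."""
    return math.sqrt(sum(v * v for v in vector))


def check_cosine_vector(vector) -> None:
    """Reject zero-magnitude vectors, for which cosine is undefined."""
    if magnitude(vector) == 0.0:
        raise ValueError("cosine similarity does not support vectors with zero magnitude")


def cosine_similarity(a, b) -> float:
    """Cosine similarity of two equal-length vectors, validated up front."""
    check_cosine_vector(a)
    check_cosine_vector(b)
    dot = sum(x * y for x, y in zip(a, b))
    return dot / (magnitude(a) * magnitude(b))


if __name__ == "__main__":
    print(cosine_similarity([3.0, 4.0], [3.0, 4.0]))
    try:
        cosine_similarity([0.0, 0.0], [1.0, 0.0])
    except ValueError as err:
        print("rejected:", err)
```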
As of 8.0, the compatibility window for cross-cluster search (CCS) to an earlier release will be one minor release. This updates the CCS docs and adds a related 8.0 breaking change.
Closes https://github.com/elastic/elasticsearch/issues/80782
Emit deprecation warning when creating new jobs with bucket spans that
aren't an integral divisor or multiple of a day.
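The condition can be expressed as a short predicate. The sketch below works on bucket spans in whole seconds; the function name is hypothetical.

```python
SECONDS_PER_DAY = 24 * 60 * 60


def is_day_aligned(bucket_span_seconds: int) -> bool:
    """True if the span divides a day evenly or is a whole number of days."""
    if bucket_span_seconds <= 0:
        raise ValueError("bucket span must be positive")
    return (
        SECONDS_PER_DAY % bucket_span_seconds == 0
        or bucket_span_seconds % SECONDS_PER_DAY == 0
    )


if __name__ == "__main__":
    for span in (900, 3600, 7 * 3600, 2 * SECONDS_PER_DAY):
        status = "ok" if is_day_aligned(span) else "would trigger the deprecation warning"
        print(span, status)
```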
Relates #81645
Co-authored-by: lcawl <lcawley@elastic.co>
* Adds a prerequisites section covering remote cluster config, node roles, and security.
* Moves existing content about remote cluster config to the prereqs.
* Updates the remote cluster docs to include information about eligible gateway nodes and tagging for gateway nodes.
Closes https://github.com/elastic/elasticsearch/issues/72001
Closes #81652.
Convert the `repository-azure`, `repository-gcs` and `repository-s3`
plugins into modules, so that they are always included in the
Elasticsearch distribution. Also change plugin installation, removal
and syncing so that attempting to add or remove these plugins still
succeeds but is now a no-op.
Adding text to clarify that the default pipeline only applies to indexing requests, not updates.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
(cherry picked from commit 4e6e4eab22)
Co-authored-by: Mike Barretta <mike.barretta@elastic.co>
This commit adds support for MPNet based models.
MPNet models differ from BERT style models in that:
- Special tokens are different
- Input to the model doesn't require token positions.
To configure an MPNet tokenizer for your pytorch MPNet based model:
```
"tokenization": {
"mpnet": {...}
}
```
The options provided to `mpnet` are the same as the previously supported `bert` configuration.
The migrate to data tiers routing API required ILM to be stopped. This
is fine for "live" runs, but for dry runs this isn't a requirement.
This changes the dry_run to allow the API to run irrespective of the ILM
status.
This fixes the migrate to data tiers routing API to take into account
the scenario where the node attribute configuration for an index is more
accurate than the existing `_tier_preference` configuration.
Previously we would simply remove the node attributes routing if there
was a `_tier_preference` configured for the index.
With this commit, we'll check whether either the `require.data` or
`include.data` custom routings are colder than the existing `_tier_preference`
configuration (i.e. `cold` vs `data_warm,data_hot`) and update the tier
routing accordingly.
e.g.
{
index.routing.allocation.require.data: "warm",
index.routing.allocation.include.data: "cold",
index.routing.allocation.include._tier_preference: "data_hot"
}
will be migrated to:
{
index.routing.allocation.include._tier_preference: "data_cold,data_warm,data_hot"
}
This also removes the existing invariant that had the `require.data`
configuration take precedence over a possible `include.data`
configuration, and will now migrate the coldest configuration to the
corresponding `_tier_preference`.
e.g.
{
index.routing.allocation.require.data: "warm",
index.routing.allocation.include.data: "cold"
}
will be migrated to:
{
index.routing.allocation.include._tier_preference: "data_cold,data_warm,data_hot"
}
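The "coldest configuration wins" rule from the two examples can be sketched as follows. This simplified model only knows the hot, warm, and cold tiers and leaves an existing `_tier_preference` untouched unless a node attribute routing is colder; it is an illustration under those assumptions, not the actual migration code.

```python
TIERS = ["hot", "warm", "cold"]  # ordered warmest to coldest
PREFIX = "index.routing.allocation."
PREF_KEY = PREFIX + "include._tier_preference"


def _coldest(tiers):
    """Return the coldest known tier in the list, or None if empty."""
    return max(tiers, key=TIERS.index, default=None)


def migrate_routing(settings: dict, node_attr: str = "data") -> dict:
    """Drop require/include attribute routing and keep the coldest tier."""
    attr_keys = [f"{PREFIX}require.{node_attr}", f"{PREFIX}include.{node_attr}"]
    attr_tiers = [settings[k] for k in attr_keys if settings.get(k) in TIERS]
    pref_tiers = [
        t[len("data_"):]
        for t in settings.get(PREF_KEY, "").split(",")
        if t.startswith("data_") and t[len("data_"):] in TIERS
    ]
    attr_cold, pref_cold = _coldest(attr_tiers), _coldest(pref_tiers)

    migrated = {k: v for k, v in settings.items() if k not in attr_keys}
    if attr_cold is not None and (
        pref_cold is None or TIERS.index(attr_cold) > TIERS.index(pref_cold)
    ):
        # Prefer the coldest tier first, then fall back to warmer ones.
        chain = reversed(TIERS[: TIERS.index(attr_cold) + 1])
        migrated[PREF_KEY] = ",".join("data_" + t for t in chain)
    return migrated


if __name__ == "__main__":
    print(migrate_routing({
        PREFIX + "require.data": "warm",
        PREFIX + "include.data": "cold",
        PREF_KEY: "data_hot",
    }))
```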
As outlined in elastic/elasticsearch#81604, including the `searchable_snapshot` action in both the hot and cold phases can result in indices not automatically migrating to the cold tier during the cold phase.
This adds a related warning.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
Changes:
* Notes that the query string query's `default_field` and `fields` parameters support wildcards.
* Adds an xref to the `index.query.default_field` docs to the `default_field` parameter.
As part of the effort to make the JDBC driver self-sufficient, remove the
ES lib geo dependencies without any replacement.
Currently the JDBC driver takes the WKT text and instantiates a geo
object based on the ES lib geo.
Moving forward the driver will return the WKT string representation
without any conversion letting the user pick the geo library desired.
That can be ES lib geo, jts, spatial4j or others.
Note this is a breaking change.
Relates #80277
We (mostly I) initially advocated for the auto-generated files to
use unique names (with a timestamp in the name), so that subsequent
invocations of the config step would not conflict with each other.
I also hoped that admins would not have to handle these files
directly and would use the enrollment process instead.
Experience has proved otherwise: admins do have to manipulate these
files, and unique configuration names are hard to deal with in scripts
and docs. This PR therefore uses a fixed name for all the
generated files. _Labeling as a bug fix because the feedback is that it
very negatively impacts usability._ Closes
https://github.com/elastic/elasticsearch/issues/81057
This improves reporting of trained model size in the response of the stats API.
In particular, it removes the `model_size_bytes` from the `deployment_stats` section and
replaces it with a top-level `model_size_stats` object that contains:
- `model_size_bytes`: the actual model size
- `required_native_memory_bytes`: the amount of memory required to load a model
In addition, these are now reported for PyTorch models regardless of their deployment state.
Add JwtRealmSettings
Include unit tests and realm security settings documentation. Covers all settings except the client authentication mTLS option and the HTTP proxy option.
Refactor the OpenID Connect realm to reuse ClaimSetting.java and ClaimParser.java for the JWT realm.
This change avoids opening a scroll for reindex/delete_by_query/update_by_query
if the configured max_docs is less than or equal to the number of documents returned by the scroll batch.
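The decision can be pictured as a simple predicate. In this sketch, `max_docs=None` stands for "no limit" and `batch_size` for the number of documents one batch returns; both conventions and the function name are assumptions made for the example.

```python
from typing import Optional


def needs_scroll(max_docs: Optional[int], batch_size: int) -> bool:
    """A scroll is only needed when one batch cannot cover max_docs."""
    if batch_size <= 0:
        raise ValueError("batch_size must be positive")
    return max_docs is None or max_docs > batch_size


if __name__ == "__main__":
    print(needs_scroll(max_docs=500, batch_size=1000))   # single batch is enough
    print(needs_scroll(max_docs=5000, batch_size=1000))  # several batches needed
    print(needs_scroll(max_docs=None, batch_size=1000))  # unbounded request
```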
After 7.16.2, we'll no longer produce Windows MSI installer packages for Elasticsearch. These packages were previously released in beta and didn't receive widespread adoption.
### Changes:
* Adds a related 7.17 breaking change.
* Adds a related 7.16 deprecation.
* Removes the MSI installation instructions.
* Removes references to the MSI installer.
I plan to port the applicable changes to 8.1 (main), 8.0, 7.17, and 7.16. In the 7.16 ports, I'll leave in the MSI install docs and add related deprecation notes to them instead.
Removes a section covering configuration management tools from the
installation instructions.
After 7.16.2, Elastic will no longer maintain these tools. Previously,
the tools were only supported on a "best effort" basis.
For new jobs, when the analysis config field model_prune_window is not set, use a default value of 30 days or 20 times the bucket span, whichever is greater.
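The default described above amounts to taking a maximum. A minimal sketch, with a hypothetical function name:

```python
from datetime import timedelta


def default_model_prune_window(bucket_span: timedelta) -> timedelta:
    """30 days or 20 bucket spans, whichever is greater."""
    return max(timedelta(days=30), 20 * bucket_span)


if __name__ == "__main__":
    print(default_model_prune_window(timedelta(minutes=15)))  # short spans get 30 days
    print(default_model_prune_window(timedelta(days=2)))      # long spans get 20x the span
```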
Co-authored-by: David Roberts <dave.roberts@elastic.co>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
Combines several 8.0 breaking changes for the removal of API endpoints that contain mapping types. These items were separate because we previously organized breaking changes by area.
This is a follow-on to #79162.
This commit deprecates the indices.query.bool.max_clause_count node setting,
and instead configures the maximum clause count for lucene based on the available
heap and the size of the thread pool.
Closes #46433
This PR adds four new templates that are automatically installed from the Monitoring plugin.
In 8.x, Metricbeat will be writing its data in ECS compliant format, even when used with xpack
mode enabled (stack monitoring). In order to continue to support the legacy data format, new
mappings have been created with the new ECS fields for indexing data, and alias fields for the
legacy format which point to the corresponding ECS fields.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Mat Schaffer <mat@schaffer.me>
* [DOCS] Enroll additional nodes on Docker
* Remove -p option for second node
Co-authored-by: Fabio Busatto <52658645+bytebilly@users.noreply.github.com>
* Rename nodes to align with other Docker docs
* Add elastic network to first node docker run command
* Remove hyphen from node names
Co-authored-by: Fabio Busatto <52658645+bytebilly@users.noreply.github.com>
Updates the remote clusters version compatibility table to include 7.17 and 8.x versions.
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
If a node reaches the flood stage watermark then we automatically apply
the `read_only_allow_delete` block to all its indices to prevent any
further growth in data. Users are expected to fix the disk space issue
by adding more space or deleting indices. However some users may prefer
to fix the disk space issues by modifying some of the index settings,
perhaps removing replicas or adjusting an allocation filter to move
shards onto nodes with more space. Today this isn't possible since the
`read_only_allow_delete` block also applies to metadata writes. Blocking
metadata writes isn't necessary to protect against further increases in
disk usage, and makes it harder for users to resolve the disk space
issue, so this commit removes the `METADATA_WRITE` level from the block
definition.
Per issue 60780, the team decided to remove the experimental language from HDR Histogram percentiles and ranks. The feature has been in production for quite some time.
Closes #60780
* [DOCS] Add docs for verifying CA fingerprint
* Update openssl command and explanatory text
* Explain copying CA cert if fingerprint validation isn't possible
* Incorporate new section into the main security config page
* Clarify how cert is used
Co-authored-by: Ioannis Kakavas <ikakavas@protonmail.com>
* Split into two, separate sections
* Rename file and update text based on feedback
* Update ref to use new filename
* Remove extra word
Co-authored-by: Ioannis Kakavas <ikakavas@protonmail.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
* [DOCS] Remove sentence about security being disabled by default
* Updating introduction
* Remove minimal security page
* Clarify configuring security before starting ES
* Clarifications
* Remove old file
* Add set passwords page
* Update change passwords page, clarify TLS adjustments, and other edits
* Update test
* Minor clarification to intro text
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>