We sometimes see a `ShardLockObtainFailedException` when a shard failed
to shut down as fast as we expected, often because a node left and
rejoined the cluster. Sometimes this is because it was held open by
ongoing scrolls or PITs, but other times it may be because the shutdown
process itself is too slow. With this commit we add the ability to
capture and log a thread dump at the time of the failure to give us more
information about where the shutdown process might be running slowly.
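
As a rough sketch of how the extra logging might be switched on: the request below raises a logger to `DEBUG` through the cluster settings API. The logger name `org.elasticsearch.env.NodeEnvironment` is an assumption not stated in this message; check the troubleshooting docs for the exact logger before relying on it.

```
PUT _cluster/settings
{
  "persistent": {
    "logger.org.elasticsearch.env.NodeEnvironment": "DEBUG"
  }
}
```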
Relates #93226
This adds term query capabilities for rank_features fields. Term queries against rank_features fields are not scored the way they are for regular fields, because the stored feature values take advantage of the term frequency storage mechanism, so regular BM25 does not work.
Instead, a term query against a rank_features field is very similar to a linear rank_feature query. If more complicated combinations of features and values are required, the rank_feature query should be used.
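
A minimal sketch of such a query, assuming a hypothetical index `my-index` with a `rank_features` field named `topics` that contains the feature `economics`:

```
GET my-index/_search
{
  "query": {
    "term": {
      "topics": "economics"
    }
  }
}
```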
This adds a new option to the knn search clause called query_vector_builder. This is a pluggable configuration that allows the query_vector to be created or retrieved.
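
A sketch of what a request might look like. The builder name `text_embedding`, its parameters, the index, the field, and the model ID are all assumptions for illustration; which builders are available depends on the installed plugins.

```
POST my-index/_search
{
  "knn": {
    "field": "my-dense-vector-field",
    "k": 10,
    "num_candidates": 100,
    "query_vector_builder": {
      "text_embedding": {
        "model_id": "my-text-embedding-model",
        "model_text": "The opposite of blue"
      }
    }
  }
}
```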
This change introduces the configuration option `ignore_missing_component_templates`, as discussed in https://github.com/elastic/elasticsearch/issues/92426. The implementation [option 6](https://github.com/elastic/elasticsearch/issues/92426#issuecomment-1372675683) was picked with a slight adjustment, meaning no patterns are allowed.
## Implementation
During the creation of an index template, the list of component templates is checked to ensure that all component templates exist. This check is extended to skip any component templates listed under `ignore_missing_component_templates`. An index template that skips the check for the component template `logs-foo@custom` looks as follows:
```
PUT _index_template/logs-foo
{
  "index_patterns": ["logs-foo-*"],
  "data_stream": { },
  "composed_of": ["logs-foo@package", "logs-foo@custom"],
  "ignore_missing_component_templates": ["logs-foo@custom"],
  "priority": 500
}
```
The component template `logs-foo@package` has to exist before the index template is created. The optional component template `logs-foo@custom` can be created at any time with:
```
PUT _component_template/logs-foo@custom
{
  "template": {
    "mappings": {
      "properties": {
        "host.ip": {
          "type": "ip"
        }
      }
    }
  }
}
```
## Testing
For manual testing, different scenarios can be tested. To simplify testing, the commands from an `.http` file are included. A clean cluster is expected before each test run.
### New behaviour, missing component template
With the new config option, it must be possible to create an index template with a missing component template without getting an error:
```
### Add logs-foo@package component template
PUT http://localhost:9200/_component_template/logs-foo@package
Authorization: Basic elastic password
Content-Type: application/json

{
  "template": {
    "mappings": {
      "properties": {
        "host.name": {
          "type": "keyword"
        }
      }
    }
  }
}

### Add logs-foo index template
PUT http://localhost:9200/_index_template/logs-foo
Authorization: Basic elastic password
Content-Type: application/json

{
  "index_patterns": ["logs-foo-*"],
  "data_stream": { },
  "composed_of": ["logs-foo@package", "logs-foo@custom"],
  "ignore_missing_component_templates": ["logs-foo@custom"],
  "priority": 500
}

### Create data stream
PUT http://localhost:9200/_data_stream/logs-foo-bar
Authorization: Basic elastic password
Content-Type: application/json

### Check if mappings exist
GET http://localhost:9200/logs-foo-bar
Authorization: Basic elastic password
Content-Type: application/json
```
Check that all templates could be created and that the data stream mappings are correct.
### Old behaviour, with all component templates
In the following, a component template is made optional but already exists. Check that it shows up in the mappings:
```
### Add logs-foo@package component template
PUT http://localhost:9200/_component_template/logs-foo@package
Authorization: Basic elastic password
Content-Type: application/json

{
  "template": {
    "mappings": {
      "properties": {
        "host.name": {
          "type": "keyword"
        }
      }
    }
  }
}

### Add logs-foo@custom component template
PUT http://localhost:9200/_component_template/logs-foo@custom
Authorization: Basic elastic password
Content-Type: application/json

{
  "template": {
    "mappings": {
      "properties": {
        "host.ip": {
          "type": "ip"
        }
      }
    }
  }
}

### Add logs-foo index template
PUT http://localhost:9200/_index_template/logs-foo
Authorization: Basic elastic password
Content-Type: application/json

{
  "index_patterns": ["logs-foo-*"],
  "data_stream": { },
  "composed_of": ["logs-foo@package", "logs-foo@custom"],
  "ignore_missing_component_templates": ["logs-foo@custom"],
  "priority": 500
}

### Create data stream
PUT http://localhost:9200/_data_stream/logs-foo-bar
Authorization: Basic elastic password
Content-Type: application/json

### Check if mappings exist
GET http://localhost:9200/logs-foo-bar
Authorization: Basic elastic password
Content-Type: application/json
```
### Check old behaviour
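Ensure that the old behaviour still exists when a component template is used that is not part of `ignore_missing_component_templates`. On a clean cluster, `logs-foo@package` is missing and is not listed as ignorable, so the request must fail: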
```
### Add logs-foo index template
PUT http://localhost:9200/_index_template/logs-foo
Authorization: Basic elastic password
Content-Type: application/json

{
  "index_patterns": ["logs-foo-*"],
  "data_stream": { },
  "composed_of": ["logs-foo@package", "logs-foo@custom"],
  "ignore_missing_component_templates": ["logs-foo@custom"],
  "priority": 500
}
```
Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
This commit changes the geoip downloader so that we only download the geoip databases if you
have at least one geoip processor in your cluster, or when you add a new geoip processor (or if
`ingest.geoip.downloader.eager.download` is explicitly set to true).
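
For illustration, the old eager behaviour could be restored with a request along these lines. This is a sketch that assumes the setting can be updated dynamically through the cluster settings API; if it cannot, it would need to go into `elasticsearch.yml` instead.

```
PUT _cluster/settings
{
  "persistent": {
    "ingest.geoip.downloader.eager.download": true
  }
}
```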
* enhancement: boolean field to support ignore_malformed
* fix: changes in current builder for BooleanFieldMappers within test files.
* Updating documentation
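
A minimal sketch of a mapping that uses the option, with a hypothetical index `my-index` and field `is_active`. With `ignore_malformed` enabled, a document whose `is_active` value cannot be parsed as a boolean should still be indexed, with that field ignored:

```
PUT my-index
{
  "mappings": {
    "properties": {
      "is_active": {
        "type": "boolean",
        "ignore_malformed": true
      }
    }
  }
}
```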
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Amy Jonsson <amy.jonsson@elastic.co>
This PR makes JsonProcessor's JSON parsing a little bit stricter so that
we are not silently dropping data when given bad inputs. Previously if
the input string began with something that could be parsed as a valid
json field, then the processor would grab that and ignore the rest. For
example, `123 "foo"` would be parsed as `123`, dropping the `"foo"`. Now
by default it will throw an IllegalArgumentException on a string like
this. A user can now set the `strict_json_parsing` parameter to false to
get the old behavior. For example:
```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "",
    "processors": [
      {
        "json": {
          "field": "message",
          "strict_json_parsing": false
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "123 \"foo\""
      }
    }
  ]
}
```
Closes #92898
The systemd unit file is part of the Elasticsearch package and should
not be edited. Instead, we recommend creating a service override file.
This commit tweaks the docs for setting tmp dir with systemd to use the
override file instead of editing the unit file.
relates #93121
* Documentation for geohex_grid over geo_shape
The feature to add support for geohex_grid aggregations over geo_shape
fields was added in https://github.com/elastic/elasticsearch/pull/91956.
This is the associated documentation for that.
* Update docs/reference/aggregations/bucket/geohexgrid-aggregation.asciidoc
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
* Fix explanation for geo_point vs geo_shape proj
When aggregating geohex over geo_shape we use an equirectangular projection
because the underlying Lucene index indexes and searches the polygons that way.
* Correct spelling
According to grammarly, "therefor" is not an alternative spelling
of "therefore". We should use the conjunctive form here.
See https://www.grammarly.com/blog/therefore-vs-therefor/
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
Update tsdb docs to include a warning that the format of the `_tsid` field shouldn't be relied upon, and add additional limitations about dimension fields.
* [+DOC] Restore policies in restoring ILM indices
👋 howdy! This may need Asciidoc reformatting. Will you kindly add in express commentary on [Restore a managed Datastream or Index](https://www.elastic.co/guide/en/elasticsearch/reference/master/index-lifecycle-and-snapshots.html?edit) to also restore ILM policies as needed (via `include_global_state`). Otherwise, you induce ILM errors once ILM starts (and have to do a form of repeating the entire outlined procedure to get indices going through correctly.)
* Apply suggestions from code review
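
As a sketch of the restore described above, using hypothetical repository, snapshot, and index names. Note that `include_global_state` restores more than ILM policies (for example index templates and persistent cluster settings), so this assumes overwriting that state is acceptable:

```
POST _snapshot/my_repository/my_snapshot/_restore
{
  "indices": "my-index",
  "include_global_state": true
}
```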
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Today we suggest that users set `ES_TMPDIR` using `export`, which only
works if you're running things directly from the shell. Yet most users
encountering `ES_TMPDIR` problems seem to be on RHEL and trying to run
things via `systemd`, for whom the `export` suggestion doesn't work.
This commit adds to the docs a suggestion of how to adjust the `systemd`
service file to set the appropriate environment variable.
Relates #80651
This PR is another round of documentation updates for the JWT realm, with the goal of achieving better clarity, differentiating more between the two token types, and encouraging readers to choose between them carefully.
Relates: #92409
Makes it clear that repository snapshots should be available and reliable for any mounted searchable snapshots.
Co-authored-by: David Turner <david.turner@elastic.co>
The companion PR to elastic/ml-cpp#2440 adds processing of the multimodal_distribution field in the anomaly score explanation. I added a changelog entry in the ml-cpp PR, hence I mark this PR as a non-issue.
Documentation incorrectly states that all aggregations are supported by
the `aggregate_metric_double` field.
This PR rectifies this error.
Closes #92236
If debug logging is enabled then the lag detector will capture and
report the hot threads of a lagging node. In some cases the resulting
log message can be very large, exceeding 10kiB, which means it is
truncated in most logging setups. The relevant thread(s) may be waiting
on I/O, which is not considered "hot" and therefore may not appear in
the first 10kiB.
This commit adjusts this logging mechanism to split the message into
chunks of size at most 2kiB (after compression and base64-encoding) to
ensure that the entire hot threads output can be faithfully
reconstructed from these logs.
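
For reference, debug logging for the lag detector can be enabled with a request like the following sketch. The logger name is inferred from the class's package (`org.elasticsearch.cluster.coordination.LagDetector`) and is an assumption here, not something stated in this message:

```
PUT _cluster/settings
{
  "persistent": {
    "logger.org.elasticsearch.cluster.coordination.LagDetector": "DEBUG"
  }
}
```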
Closes #88126
* [DOC] Troubleshooting Expensive Searches
👋 re: https://github.com/elastic/elasticsearch/issues/73222 adds content that we can link users to, explaining how to find the source of expensive searches.
* Several edits
* Apply suggestions from code review
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
It makes sense to allow more than one KNN search clause per individual search request. It may be that different documents have separate vector spaces, or that a single doc is indexed with more than one vector space. In both of these scenarios, users may want to retrieve a result set that takes into account all their indexed vector spaces.
A prime example here would be searching a semantic text embedding along with searching an image embedding.
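
A sketch of what such a request might look like, passing `knn` as an array of clauses. The index name, field names, vector values, and `boost` weights are hypothetical and only illustrate the shape:

```
POST my-image-index/_search
{
  "knn": [
    {
      "field": "image-vector",
      "query_vector": [54, 10, -2],
      "k": 5,
      "num_candidates": 50,
      "boost": 0.1
    },
    {
      "field": "title-vector",
      "query_vector": [1, 20, -52, 23, 10],
      "k": 10,
      "num_candidates": 10,
      "boost": 0.5
    }
  ]
}
```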
closes https://github.com/elastic/elasticsearch/issues/91187
Dangling indices have not been imported automatically since 8.0, but the
`elasticsearch-node detach-cluster` documentation still suggests they are.
I tried to make it more explicit by listing the Dangling API to use and
by using the word "manually".
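
As a brief sketch of the manual steps, the dangling indices API can list dangling indices and then import one by UUID. `<index-uuid>` is a placeholder to be taken from the listing, and importing requires explicitly accepting possible data loss:

```
GET _dangling

POST _dangling/<index-uuid>?accept_data_loss=true
```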