* [DOCS] Warn only one date format is added to the field date formats
When using multiple options in `dynamic_date_formats`, only one of the formats of the first document having a date matching one of the date formats provided will be used.
E.g.
```
PUT my-index-000001
{
"mappings": {
"dynamic_date_formats": [ "yyyy/MM", "MM/dd/yyyy"]
}
}
PUT my-index-000001/_doc/1
{
"create_date": "09/25/2015"
}
```
The generated mappings will be:
```
"mappings": {
"dynamic_date_formats": [
"yyyy/MM",
"MM/dd/yyyy"
],
"properties": {
"create_date": {
"type": "date",
"format": "MM/dd/yyyy"
}
}
},
```
Indexing a document with `2015/12` would lead to the `format` `"yyyy/MM"` being used for the `create_date`.
This can be misleading especially if the user is using multiple date formats on the same field.
The first document will determine the format of the `date` field being detected.
Maybe we should provide an additional example, such as:
```
PUT my-index-000001
{
"mappings": {
"dynamic_date_formats": [ "yyyy/MM||MM/dd/yyyy"]
}
}
```
My wording is not great, so feel free to amend/edit.
* Update docs/reference/mapping/dynamic/field-mapping.asciidoc
Reword and add code example
* Turned discussion of the two syntaxes into an admonition
* Fix failing tests
Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
The docs for `transport.ping_schedule` note that the transport client
defaults to a 5s ping schedule, but this is no longer relevant. This
commit drops this from the docs, and also moves the docs for this
setting further down the page to reflect its relative unimportance.
Today if a node shutdown is stalled due to unmoveable shards then we say
to use the allocation explain API to find details. In fact, since #78727
we include the allocation explanation in the response already so we
should tell users just to look at that instead. This commit adjusts the
message to address this.
This PR updates the URLs for several references that are being
used in the plugin document.
Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
Today if there are multiple nodes with the same name then
`POST /_cluster/voting_config_exclusions?node_names=ambiguous-name` will
return a `500 Internal Server Error` and a mysterious message. This
commit changes the behaviour to throw an `IllegalArgumentException`
(i.e. `400 Bad Request`) along with a more useful message describing the
problem.
We encountered a case where a substantial fraction of the heap usage was
due to per-segment-per-field `FieldInfo` objects, particularly
`FieldInfo#name`. This commit adds a note to the sizing docs about this
overhead.
This PR expands the existing GetProfile API to support getting multiple
profiles by IDs. As a result, the response format is also changed to
align with the latest version of API design guideline. Concretely, this
means moving the profiles as an array inside a top level "profiles"
field so that (1) does not mix dynamic fields (uid) with static fields
and (2) enforcing an order in the response which is desirable for
clients.
The change also reports any error encounter in the retrieving process in
a top level "errors" field.
Relates: #81910
Fixing the names of the internal actions used by CoordinationDiagnosticsService to begin with "internal:" so
that they can be used in the system context with security enabled.
Today we say that voting-only nodes require a "low-latency" network.
This term has a specific meaning in some operating environments which is
different from our intended meaning. To avoid this confusion this commit
removes the absolute term "low-latency" in favour of describing the
requirements relative to the user's own performance goals.
This PR adds a new `role_descriptors` field in the API key entity
returned by both GetApiKey and QueryApiKey APIs. The field value is the
map of the role descriptors that are assigned to an API key when
creating or updating the key. If the key has no assigned role
descriptors, i.e. it inherits the owner user's privileges, an empty
object is returned in place.
Relates: #89058
This PR adds BloomFilter to Elasticsearch and enables it for the _id
field of non-data stream indices. BloomFilter should speed up the
performance of mget and update requests at a small expense of refresh,
merge, and storage.
This commit causes non-master-eligible nodes to poll a random master-eligible node every 10 seconds
whenever the elected master goes null for diagnostic information in support of the health API's master
stability check.
This adds a new `_ml/trained_models/<model_id>/deployment/cache/_clear` API. This will clear the inference cache on every node where the model is allocated.
If a docvalues field matches multiple field patterns, then ES will
return the value of that doc-values field multiple times. Like fetching
fields from source, we should deduplicate the matching doc-values
fields.
We previously removed support for `fields` in the request body, to ensure there
was only one way to specify the parameter. We've now decided to undo the
change, since it was disruptive and the request body is actually the best place to
pass variable-length data like `fields`.
This PR restores support for `fields` in the request body. It throws an error
if the parameter is specified both in the URL and the body.
Closes#86875
To assist the user in configuring the visualizations correctly while leveraging TSDB
functionality, information about TSDB configuration should be exposed via the field
caps API per field.
Especially for metrics fields, it must be clear which fields are metrics and if they belong
to only time-series indexes or mixed time-series and non-time-series indexes.
To further distinguish metric fields when they belong to any of the following indices:
- Standard (non-time-series) indexes
- Time series indexes
- Downsampled time series indexes
This PR modifies the field caps API so that the mapping parameters time_series_dimension
and time_series_dimension are presented only when they are set on fields of time-series indexes.
Those parameters are completely ignored when they are set on standard (non-time-series) indexes.
This PR revisits some of the conventions adopted by #78790
Also add support for new CATALINA/TOMCAT timestamp formats used by ECS Grok patterns
Relates #77065
Co-authored-by: David Roberts <dave.roberts@elastic.co>
represent transactions as bitsets for faster lookups when iterating over candidate sets. This PR implements
a lookup table and a subset check based on bits. It uses this lookup table to map transactions to items, this
so-called horizontal representation is used to speedup the lookup that checks if a transaction contains the
candidate item set
This change deprecates the kNN search API in favor of the new 'knn' option
inside the search API. The 'knn' option is now the preferred way of performing
kNN search.
Relates to #87625
Part of #84369. Implement the `Tracer` interface by providing a
module that uses OpenTelemetry, along with Elastic's APM
agent for Java.
See the file `TRACING.md` for background on the changes and the
reasoning for some of the implementation decisions.
The configuration mechanism is the most fiddly part of this PR. The
Security Manager permissions required by the APM Java agent make
it prohibitive to start an agent from within Elasticsearch
programmatically, so it must be configured when the ES JVM starts.
That means that the startup CLI needs to assemble the required JVM
options.
To complicate matters further, the APM agent needs a secret token
in order to ship traces to the APM server. We can't use Java system
properties to configure this, since otherwise the secret will be
readable to all code in Elasticsearch. It therefore has to be
configured in a dedicated config file. This in itself is awkward,
since we don't want to leave secrets in config files. Therefore,
we pull the APM secret token from the keystore, write it to a config
file, then delete the config file after ES starts.
There's a further issue with the config file. Any options we set
in the APM agent config file cannot later be reconfigured via system
properties, so we need to make sure that only "static" configuration
goes into the config file.
I generated most of the files under `qa/apm` using an APM test
utility (I can't remember which one now, unfortunately). The goal
is to setup up a complete system so that traces can be captured in
APM server, and the results in Elasticsearch inspected.
As discussed in #73569 the current implementation is too slow in certain scenarios.
The inefficient part of the code can be stated as the following problem:
Given a text (getText()) and a position in this text (offset), find the sentence
boundary before and after the offset, in such a way that the after boundary is
maximal but respects end boundary - start boundary < fragment size.
In case it's impossible to produce an after boundary that respects the said
condition, use the nearest boundary following offset.
The current approach begins by finding the nearest preceding and following boundaries,
and expands the following boundary greedily while it respects the problem restriction. This
is fine asymptotically, but BreakIterator which is used to find each boundary is sometimes
expensive.
This new approach maximizes the after boundary by scanning for the last boundary
preceding the position that would cause the condition to be violated (i.e. knowing start
boundary and offset, how many characters are left before resulting length is fragment size).
If this scan finds the start boundary, it means it's impossible to satisfy the problem
restriction, and we get the first boundary following offset instead (or better, since we
already scanned [offset, targetEndOffset], start from targetEndOffset + 1).
transform persists the internal state of a transform (e.g. the data cursor) in state document.
This change improves the error handling and fixes the problem described in #88905. A transform
can now recover from this problem.
fixes#88905
This change adds a SourceValueFetcherSortedDoubleIndexFieldData to support double doc values types for source fallback. This also adds support for double, float and half_float field types.
Introduced in: #88439
* [ML] add text_similarity nlp task documentation
* Apply suggestions from code review
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Update docs/reference/ml/trained-models/apis/infer-trained-model.asciidoc
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Apply suggestions from code review
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
* Update docs/reference/ml/ml-shared.asciidoc
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Computing routing nodes and the indices lookup takes considerable time
for large states. Both are needed during cluster state application and
Prior to this change would be computed on the applier thread in all cases.
By running the creation of both objects concurrently to publication, the
many shards benchmark sees a 10%+ reduction in the bootstrap time to
50k indices.