12054 Commits

Author SHA1 Message Date
Luca Belluccini 2d3bcc483d
[DOCS] Warn only one date format is added to the field date formats when using dynamic_date_formats (#88915)
* [DOCS] Warn only one date format is added to the field date formats

When `dynamic_date_formats` lists multiple options, only one format is used: the one matched by the first document that contains a date matching any of the provided formats.

E.g.
```
PUT my-index-000001
{
  "mappings": {
    "dynamic_date_formats": [ "yyyy/MM", "MM/dd/yyyy"]
  }
}

PUT my-index-000001/_doc/1
{
  "create_date": "09/25/2015"
}
```

The generated mappings will be:
```
    "mappings": {
      "dynamic_date_formats": [
        "yyyy/MM",
        "MM/dd/yyyy"
      ],
      "properties": {
        "create_date": {
          "type": "date",
          "format": "MM/dd/yyyy"
        }
      }
    },
```

Had the first document instead contained `2015/12`, the `format` `"yyyy/MM"` would be used for `create_date`.

This can be misleading, especially if the user is using multiple date formats on the same field.
The first document determines the format detected for the `date` field.

Maybe we should provide an additional example, such as:
```
PUT my-index-000001
{
  "mappings": {
    "dynamic_date_formats": [ "yyyy/MM||MM/dd/yyyy"]
  }
}
```
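
The difference between the two syntaxes can be illustrated with a small Python sketch. It is only a model of the behaviour described in this commit message, not Elasticsearch code: the `STRPTIME` table is a hand-written translation of the two patterns above, and `matches` and `detect_format` are hypothetical helper names. It also assumes that formats are tried in the order they are listed.

```python
from datetime import datetime

# Hand-written translation of the two Java-style patterns used above into
# strptime directives; only these two patterns are supported in this sketch.
STRPTIME = {"yyyy/MM": "%Y/%m", "MM/dd/yyyy": "%m/%d/%Y"}


def matches(pattern, value):
    """Return True if value parses under any `||`-separated alternative."""
    for alternative in pattern.split("||"):
        try:
            datetime.strptime(value, STRPTIME[alternative])
            return True
        except ValueError:
            continue
    return False


def detect_format(dynamic_date_formats, first_value):
    """Return the list entry that becomes the field's `format`, or None."""
    for entry in dynamic_date_formats:
        if matches(entry, first_value):
            return entry
    return None


# Two separate entries: the first document fixes a single format.
separate = detect_format(["yyyy/MM", "MM/dd/yyyy"], "09/25/2015")
print(separate, matches(separate, "2015/12"))

# One combined entry: the whole `||` expression becomes the format.
combined = detect_format(["yyyy/MM||MM/dd/yyyy"], "09/25/2015")
print(combined, matches(combined, "2015/12"))
```

In this model the separate entries leave `create_date` with `MM/dd/yyyy` only, so a later `2015/12` value no longer matches, while the combined entry keeps both alternatives.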

My wording is not great, so feel free to amend/edit.

* Update docs/reference/mapping/dynamic/field-mapping.asciidoc

Reword and add code example

* Turned discussion of the two syntaxes into an admonition

* Fix failing tests

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2022-08-11 10:43:53 +02:00
David Turner 616fd07278
Drop transport client from ping_schedule docs (#89264)
The docs for `transport.ping_schedule` note that the transport client
defaults to a 5s ping schedule, but this is no longer relevant. This
commit drops this from the docs, and also moves the docs for this
setting further down the page to reflect its relative unimportance.
2022-08-11 09:25:14 +01:00
David Turner 0bf31b77fb
Fix message for stalled shutdown (#89254)
Today if a node shutdown is stalled due to unmoveable shards then we say
to use the allocation explain API to find details. In fact, since #78727
we include the allocation explanation in the response already so we
should tell users just to look at that instead. This commit adjusts the
message to address this.
2022-08-11 07:48:03 +01:00
Yoann Rodière 841ac8e43a
Upgrade Apache Commons Logging to 1.2 (#85745)
* Upgrade to Apache Commons Logging 1.2 (#40305)
* Clarify that Apache HTTP/commons-* dependencies are not just for tests
2022-08-10 13:19:15 -04:00
GabyCT 341f3b717a
[DOCS] Update URLs in plugin document (#89221)
This PR updates the URLs for several references that are being
used in the plugin document.

Signed-off-by: Gabriela Cervantes <gabriela.cervantes.tellez@intel.com>
2022-08-10 16:40:01 +02:00
David Turner ceffaf9aad
Improve rejection of ambiguous voting config name (#89239)
Today if there are multiple nodes with the same name then
`POST /_cluster/voting_config_exclusions?node_names=ambiguous-name` will
return a `500 Internal Server Error` and a mysterious message. This
commit changes the behaviour to throw an `IllegalArgumentException`
(i.e. `400 Bad Request`) along with a more useful message describing the
problem.
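
A minimal Python sketch of the validation idea follows. It is not the Elasticsearch implementation: `Node`, `resolve_node_name` and the error text are hypothetical, and `ValueError` merely stands in for the `IllegalArgumentException` mentioned above.

```python
from collections import namedtuple

# Hypothetical stand-in for a cluster node: an ID plus a (non-unique) name.
Node = namedtuple("Node", ["node_id", "name"])


def resolve_node_name(nodes, wanted_name):
    """Map a node name to exactly one node ID, or None if nothing matches."""
    matching_ids = [n.node_id for n in nodes if n.name == wanted_name]
    if len(matching_ids) > 1:
        # Reject the request up front (the 400-style outcome) rather than
        # failing later with an unexpected internal error.
        raise ValueError(
            f"node name [{wanted_name}] is ambiguous, matching nodes "
            f"{sorted(matching_ids)}"
        )
    return matching_ids[0] if matching_ids else None


nodes = [Node("id-1", "unique-name"), Node("id-2", "ambiguous-name"),
         Node("id-3", "ambiguous-name")]
print(resolve_node_name(nodes, "unique-name"))
try:
    resolve_node_name(nodes, "ambiguous-name")
except ValueError as error:
    print(error)
```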
2022-08-10 12:39:24 +01:00
David Turner 546a2e2898
Add note on per-segment field name overhead (#89152)
We encountered a case where a substantial fraction of the heap usage was
due to per-segment-per-field `FieldInfo` objects, particularly
`FieldInfo#name`. This commit adds a note to the sizing docs about this
overhead.
2022-08-10 08:17:55 +01:00
Yang Wang d663231a83
User Profile - GetProfile API now supports multiple UIDs (#89023)
This PR expands the existing GetProfile API to support getting multiple
profiles by IDs. As a result, the response format is also changed to
align with the latest version of API design guideline. Concretely, this
means moving the profiles as an array inside a top level "profiles"
field so that the response (1) does not mix dynamic fields (uid) with static fields
and (2) enforces an order, which is desirable for
clients.

The change also reports any errors encountered while retrieving profiles in
a top level "errors" field.

Relates: #81910
2022-08-10 10:51:38 +09:30
Nikola Grcevski 895baf011c
Delete invalid settings for system indices (#88903) 2022-08-09 17:11:55 -04:00
Keith Massey e63bcb550e
Fixing internal action names (#89182)
Fixing the names of the internal actions used by CoordinationDiagnosticsService to begin with "internal:" so
that they can be used in the system context with security enabled.
2022-08-09 08:47:29 -05:00
Ignacio Vera cd359b3d39
geo_line aggregation returns a geojson point when the resulting line has only one point (#89199)
This commit changes the geojson output to return a point when a geo_line aggregation
contains only one point.
2022-08-09 15:07:22 +02:00
David Turner c9d4892929
Weaken language about "low-latency" networks (#89198)
Today we say that voting-only nodes require a "low-latency" network.
This term has a specific meaning in some operating environments which is
different from our intended meaning. To avoid this confusion this commit
removes the absolute term "low-latency" in favour of describing the
requirements relative to the user's own performance goals.
2022-08-09 13:15:37 +01:00
Yang Wang e6cfd9c263
Show assigned role descriptors in Get/QueryApiKey response (#89166)
This PR adds a new `role_descriptors` field in the API key entity
returned by both GetApiKey and QueryApiKey APIs. The field value is the
map of the role descriptors that are assigned to an API key when
creating or updating the key. If the key has no assigned role
descriptors, i.e. it inherits the owner user's privileges, an empty
object is returned in place.

Relates: #89058
2022-08-09 11:42:05 +09:30
Chris Hegarty ac25477e40
Quote paths with whitespace in Windows service CLIs (#89072) 2022-08-08 17:06:07 +01:00
Jack Conradson 81265d2c2a
Add support for source fallback with scaled float field type (#89053)
This change adds source fallback support for scaled float. This uses the already existing class 
SourceValueFetcherSortedDoubleIndexFieldData.
2022-08-08 08:39:13 -07:00
Jack Conradson 24e367fe0f
Add support for source fallback with the boolean field type (#89052)
This change adds a SourceValueFetcherSortedBooleanIndexFieldData to support boolean doc values 
for source fallback.
2022-08-08 08:38:48 -07:00
Nhat Nguyen cfad420cde
Enable BloomFilter for _id of non-datastream indices (#88409)
This PR adds BloomFilter to Elasticsearch and enables it for the _id 
field of non-data stream indices. BloomFilter should speed up the
performance of mget and update requests at a small expense of refresh,
merge, and storage.
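
Why a Bloom filter helps ID lookups can be sketched in a few lines of Python. This is a generic illustration, not the Lucene/Elasticsearch implementation: `BloomFilter`, `Segment`, the bit count and the hashing scheme are all assumptions made for the example. The filter can report false positives but never false negatives, so a negative answer lets a lookup skip a segment entirely.

```python
import hashlib


class BloomFilter:
    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int used as a bit array

    def _positions(self, key):
        # Derive num_hashes bit positions from seeded SHA-256 digests.
        for seed in range(self.num_hashes):
            digest = hashlib.sha256(f"{seed}:{key}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, key):
        for pos in self._positions(key):
            self.bits |= 1 << pos

    def might_contain(self, key):
        return all((self.bits >> pos) & 1 for pos in self._positions(key))


class Segment:
    """Hypothetical segment: a dict of documents guarded by a Bloom filter."""

    def __init__(self, docs):
        self.docs = dict(docs)
        self.bloom = BloomFilter()
        for doc_id in self.docs:
            self.bloom.add(doc_id)
        self.lookups = 0  # counts the "expensive" lookups actually performed

    def get(self, doc_id):
        if not self.bloom.might_contain(doc_id):
            return None  # definitely absent: skip the expensive lookup
        self.lookups += 1
        return self.docs.get(doc_id)


segment = Segment({"doc-1": {"v": 1}, "doc-2": {"v": 2}})
print(segment.get("doc-1"), segment.get("doc-404"), segment.lookups)
```

The extra cost mentioned in the commit message corresponds here to building and storing `bits` whenever a segment is created.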
2022-08-08 11:14:26 -04:00
István Zoltán Szabó 7602015384
[DOCS] Improves frequent items aggregation docs (#89122) 2022-08-08 15:46:29 +02:00
István Zoltán Szabó 226b8a260e
[DOCS] Modifies the description of frequency. (#89128) 2022-08-08 15:44:00 +02:00
Keith Massey ee33383156
Polling for cluster diagnostics information (#89014)
This commit causes non-master-eligible nodes to poll a random master-eligible node for diagnostic
information every 10 seconds whenever the elected master goes null, in support of the health API's master
stability check.
2022-08-08 08:29:09 -05:00
David Turner c81f907ad8
Refine size-your-shards wording (#89081)
Clarify that the limits in the docs are absolute maxima that will avoid
things just breaking but won't necessarily give great performance.
2022-08-08 18:36:32 +09:30
Gonçalo Montalvão Marques c4bd4d3cbf
Fix typo in geo-distance-query doc (#89148) 2022-08-08 09:59:47 +02:00
Andrei Stefan 84e63080f4
fix the changelog yaml file for pr 87887 (#89146) 2022-08-05 14:03:03 +03:00
wjwei e53767835a
Fix object equals for SqlQueryRequest's binaryCommunication (#87887)
Co-authored-by: owenniceliu <owenniceliu@tencent.com>
2022-08-05 12:56:19 +03:00
Benjamin Trent d588d456f0
[ML] add new trained model deployment cache clear API (#89074)
This adds a new `_ml/trained_models/<model_id>/deployment/cache/_clear` API. This will clear the inference cache on every node where the model is allocated.
2022-08-04 19:45:15 +01:00
Nhat Nguyen e3c33e2acd
Deduplicate fetching doc-values fields (#89094)
If a doc-values field matches multiple field patterns, then ES will
return the value of that doc-values field multiple times. Like fetching
fields from source, we should deduplicate the matching doc-values
fields.
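
The deduplication can be sketched in Python as follows. This only illustrates the idea: `resolve_fields` is a hypothetical helper, and `fnmatchcase` stands in for whatever wildcard matching Elasticsearch applies to field patterns.

```python
from fnmatch import fnmatchcase


def resolve_fields(patterns, available_fields):
    """Return each matching field name once, in first-match order."""
    resolved = {}  # a dict keeps insertion order and drops duplicates
    for pattern in patterns:
        for field in available_fields:
            if fnmatchcase(field, pattern):
                resolved.setdefault(field, None)
    return list(resolved)


fields = ["user.id", "user.name", "order.id"]
# "user.id" matches all three patterns but is returned only once.
print(resolve_fields(["user.*", "*.id", "user.id"], fields))
```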
2022-08-04 14:05:09 -04:00
likzn f28f4545b2
In the field capabilities API, re-add support for `fields` in the request body (#88972)
We previously removed support for `fields` in the request body, to ensure there
was only one way to specify the parameter. We've now decided to undo the
change, since it was disruptive and the request body is actually the best place to
pass variable-length data like `fields`.

This PR restores support for `fields` in the request body. It throws an error
if the parameter is specified both in the URL and the body.

Closes #86875
2022-08-04 13:44:50 -04:00
Christos Soulios b81f4187ab
[TSDB] Metric fields in the field caps API (#88695)
To assist the user in configuring the visualizations correctly while leveraging TSDB
functionality, information about TSDB configuration should be exposed via the field 
caps API per field.

Especially for metrics fields, it must be clear which fields are metrics and if they belong 
to only time-series indexes or mixed time-series and non-time-series indexes.

To further distinguish metric fields when they belong to any of the following indices:

  -  Standard (non-time-series) indexes
  -  Time series indexes
  -  Downsampled time series indexes

This PR modifies the field caps API so that the mapping parameters time_series_dimension
and time_series_metric are presented only when they are set on fields of time-series indexes.
Those parameters are completely ignored when they are set on standard (non-time-series) indexes.

This PR revisits some of the conventions adopted by #78790
2022-08-04 20:42:34 +03:00
Ed Savage 188f8872c6
[ML] ECS Grok patterns in the _text_structure/find_structure endpoint (#88982)
Also add support for new CATALINA/TOMCAT timestamp formats used by ECS Grok patterns

Relates #77065

Co-authored-by: David Roberts <dave.roberts@elastic.co>
2022-08-04 18:39:04 +01:00
zhouhui 8f08c7b55b
Override bulk visit methods of exitable point visitor (#82120) 2022-08-04 11:48:36 -04:00
Adam Locke 7b8c056494
[DOCS] Replace ES_JAVA_OPTS with CLI_JAVA_OPTS (#89121) 2022-08-04 09:27:40 -04:00
Thomas Decaux 2f0d9c8342
[DOCS] Fix plugins CLI doc CLI_JAVA_OPTS env var (#89003)
The commit 1d4534f848 changes the env variable ``ES_JAVA_OPTS`` to ``CLI_JAVA_OPTS``. Doc must be updated as well.
2022-08-04 09:04:28 -04:00
Abdon Pijpelink b96c39e7ad
[DOCS] Move completion type asciidoc (#89086)
* [DOCS] Move completion type asciidoc

* Fix failing code snippet test
2022-08-04 10:02:28 +02:00
Hendrik Muhs 68050e9502
[ML] Optimize frequent items transaction lookup (#89062)
Represent transactions as bitsets for faster lookups when iterating over candidate sets. This PR implements
a lookup table and a subset check based on bits. It uses this lookup table to map transactions to items. This
so-called horizontal representation is used to speed up the lookup that checks whether a transaction contains the
candidate item set.
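
The bit-based subset check can be sketched in Python, using plain integers as bitsets. This illustrates the general technique rather than the code in this PR; `build_item_index`, `to_bitset` and `support` are hypothetical names.

```python
def build_item_index(transactions):
    """Lookup table: assign each distinct item its own bit position."""
    index = {}
    for transaction in transactions:
        for item in transaction:
            index.setdefault(item, len(index))
    return index


def to_bitset(items, index):
    """Encode a collection of items as an int with one bit set per item."""
    bits = 0
    for item in items:
        bits |= 1 << index[item]
    return bits


def support(candidate, transactions):
    """Count the transactions that contain every item of the candidate set."""
    index = build_item_index(transactions)
    if any(item not in index for item in candidate):
        return 0  # an unseen item cannot be contained in any transaction
    encoded = [to_bitset(t, index) for t in transactions]
    wanted = to_bitset(candidate, index)
    # Subset check: all wanted bits must also be set in the transaction.
    return sum(1 for t in encoded if (t & wanted) == wanted)


transactions = [["a", "b", "c"], ["a", "c"], ["b"]]
print(support(["a", "c"], transactions))
```

Each containment test is a single AND plus a comparison, instead of a per-item membership check.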
2022-08-04 08:51:31 +02:00
Stef Nestor 5da482b9de
ILM Frozen allows Unfollow Action (#88973)
Updates [Phase Action](https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-index-lifecycle.html#ilm-phase-actions) list to agree with [Unfollow](https://www.elastic.co/guide/en/elasticsearch/reference/current/ilm-unfollow.html) page that Frozen tier accepts Unfollow action.

Confirmed v8.3
```diff
PUT _ilm/policy/my_policy
{"policy": {"phases": { "frozen": { "actions": {
+  "unfollow" : {},
  "searchable_snapshot": {
    "snapshot_repository" : "found-snapshots"} } } } } }

{"acknowledged": true }
```
2022-08-03 14:32:15 -06:00
Stef Nestor 4af7069958
Update ES.ILM.Action.ReadOnly (#89054)
Related to [Discuss#311070](https://discuss.elastic.co/t/action-readonly-appears-to-set-index-blocks-write-not-index-blocks-read-only/311070), @joegallo explains

> The [ReadOnlyAction](https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/ReadOnlyAction.java#L58-L65) is composed of a series of steps, the most important to this conversation being the [ReadOnlyStep](https://github.com/elastic/elasticsearch/blob/main/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ilm/ReadOnlyStep.java#L42). That step does indeed add a write block (as opposed to a ‘read_only’) block, almost certainly the reasoning is that a ‘read_only’ block makes the index metadata read only, also, and we can’t have that — it would prevent the index from moving through the rest of the ILM process.  E.g. can’t reassign tiers, can’t change replicas, can’t even change the currently assigned ilm phase/action/step, etc, if you can’t change the index’s metadata.

So, the intention of ILM Action "Read Only" is to make an index's data read only, but not the index's metadata. This also decouples the action's name from `index.blocks.read_only`; the overlap in naming appears to be accidental.
2022-08-03 14:31:20 -06:00
Julie Tibshirani 21eb984e64
Deprecate the _knn_search endpoint (#88828)
This change deprecates the kNN search API in favor of the new 'knn' option
inside the search API. The 'knn' option is now the preferred way of performing
kNN search.

Relates to #87625
2022-08-03 15:19:01 -04:00
Dimitris Athanasiou 77aa8c03e1
[ML] Include start params in _stats for non-started model deployments (#89091)
Adds the missing start parameters to the _stats API response
for non-started deployments.
2022-08-03 18:55:05 +03:00
Leaf-Lin 942e5fd9fc
Adding specific items into troubleshooting guide (#88105)
* Update troubleshooting.asciidoc

Adding items into the troubleshooting guide

* Resolve conflicts

* Reorganizes troubleshooting links

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2022-08-03 17:00:34 +02:00
Rory Hunter 512bfebc10
Provide tracing implementation using OpenTelemetry + APM agent (#88443)
Part of #84369. Implement the `Tracer` interface by providing a
module that uses OpenTelemetry, along with Elastic's APM
agent for Java.

See the file `TRACING.md` for background on the changes and the
reasoning for some of the implementation decisions.

The configuration mechanism is the most fiddly part of this PR. The
Security Manager permissions required by the APM Java agent make
it prohibitive to start an agent from within Elasticsearch
programmatically, so it must be configured when the ES JVM starts.
That means that the startup CLI needs to assemble the required JVM
options.

To complicate matters further, the APM agent needs a secret token
in order to ship traces to the APM server. We can't use Java system
properties to configure this, since otherwise the secret will be
readable to all code in Elasticsearch. It therefore has to be
configured in a dedicated config file. This in itself is awkward,
since we don't want to leave secrets in config files. Therefore,
we pull the APM secret token from the keystore, write it to a config
file, then delete the config file after ES starts.

There's a further issue with the config file. Any options we set
in the APM agent config file cannot later be reconfigured via system
properties, so we need to make sure that only "static" configuration
goes into the config file.

I generated most of the files under `qa/apm` using an APM test
utility (I can't remember which one now, unfortunately). The goal
is to set up a complete system so that traces can be captured in
APM server, and the results in Elasticsearch inspected.
2022-08-03 14:13:31 +01:00
Denilson das Mercês Amorim 6bf5078fa9
Improve efficiency of BoundedBreakIteratorScanner fragmentation algorithm (#89041)
As discussed in #73569 the current implementation is too slow in certain scenarios.

The inefficient part of the code can be stated as the following problem:

Given a text (getText()) and a position in this text (offset), find the sentence 
boundary before and after the offset, in such a way that the after boundary is 
maximal but respects end boundary - start boundary < fragment size.

In case it's impossible to produce an after boundary that respects the said 
condition, use the nearest boundary following offset.

The current approach begins by finding the nearest preceding and following boundaries, 
and expands the following boundary greedily while it respects the problem restriction. This 
is fine asymptotically, but BreakIterator which is used to find each boundary is sometimes 
expensive.

This new approach maximizes the after boundary by scanning for the last boundary 
preceding the position that would cause the condition to be violated (i.e. knowing start
boundary and offset, how many characters are left before resulting length is fragment size). 
If this scan finds the start boundary, it means it's impossible to satisfy the problem 
restriction, and we get the first boundary following offset instead (or better, since we 
already scanned [offset, targetEndOffset], start from targetEndOffset + 1).
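
The two strategies can be compared in a small Python sketch. It is a simplified model, not the Java implementation: a pre-computed, sorted list of sentence boundaries stands in for `BreakIterator`, the function names are hypothetical, and the sketch assumes the list starts at 0 and that `offset` lies before the last boundary.

```python
from bisect import bisect_left, bisect_right


def greedy_bounds(boundaries, offset, fragment_size):
    """Old approach: step to the next boundary while the fragment still fits."""
    i = bisect_right(boundaries, offset)
    start, end = boundaries[i - 1], boundaries[i]
    while i + 1 < len(boundaries) and boundaries[i + 1] - start < fragment_size:
        i += 1
        end = boundaries[i]
    return start, end


def fragment_bounds(boundaries, offset, fragment_size):
    """New approach: jump to the size limit and look backwards once."""
    start = boundaries[bisect_right(boundaries, offset) - 1]
    # First position at which `end - start < fragment_size` would be violated.
    limit = start + fragment_size
    # Last boundary strictly before that position.
    end = boundaries[bisect_left(boundaries, limit) - 1]
    if end <= offset:
        # The size restriction cannot be met: use the nearest boundary
        # following the offset instead.
        end = boundaries[bisect_right(boundaries, offset)]
    return start, end


boundaries = [0, 10, 25, 40, 60]
print(greedy_bounds(boundaries, 12, 30), fragment_bounds(boundaries, 12, 30))
```

Both functions return the same pair; the second needs a fixed number of boundary queries rather than one per sentence inside the fragment, which matters when each query is expensive.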
2022-08-03 12:07:17 +01:00
David Turner 74ce7a4603
Fix typo (#89063) 2022-08-03 10:23:57 +01:00
Abdon Pijpelink aae0ed8eb1
[DOCS] Added note about using _size in Kibana. Closes #88322 (#89030)
* [DOCS] Added note about using _size in Kibana. Closes #88322

* Use correct attributes
2022-08-03 10:36:03 +02:00
Hendrik Muhs d3e057c33a
[Transform] improve error handling in state persistence (#88910)
A transform persists its internal state (e.g. the data cursor) in a state document.
This change improves the error handling and fixes the problem described in #88905. A transform
can now recover from this problem.

fixes #88905
2022-08-03 07:57:40 +02:00
Jack Conradson 3bb4a84bdd
Support source fallback for double, float, and half_float field types (#89010)
This change adds a SourceValueFetcherSortedDoubleIndexFieldData to support double doc values types for source fallback. This also adds support for double, float and half_float field types.
2022-08-02 10:13:58 -07:00
Alexander Reelsen 9b02303138
Docs: Remove paragraph that applies only before Elasticsearch 7.0 (#86209) 2022-08-03 02:35:11 +09:30
Benjamin Trent 9ce59bb7a9
[ML] add text_similarity nlp task documentation (#88994)
Introduced in: #88439

* [ML] add text_similarity nlp task documentation

* Apply suggestions from code review

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

* Update docs/reference/ml/trained-models/apis/infer-trained-model.asciidoc

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

* Apply suggestions from code review

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

* Update docs/reference/ml/ml-shared.asciidoc

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
2022-08-02 12:17:14 -04:00
Armin Braun d9dc3a9629
Preemptively initialize routing nodes and indices lookup on all node types (#89032)
Follow up to #89005 running the initialization as soon as possible on non-master
nodes as well.
2022-08-02 17:30:18 +02:00
Leaf-Lin 44c8d19b6d
Update snapshots.asciidoc (#87584)
Adding a typo ``` in the doc
2022-08-02 11:24:31 +02:00
Armin Braun 9bed4b89fd
Preemptively compute RoutingNodes and the indices lookup during publication (#89005)
Computing routing nodes and the indices lookup takes considerable time
for large states. Both are needed during cluster state application and
prior to this change would be computed on the applier thread in all cases.
By running the creation of both objects concurrently to publication, the
many shards benchmark sees a 10%+ reduction in the bootstrap time to
50k indices.
2022-08-02 11:02:00 +02:00