Commit Graph

9115 Commits

Author SHA1 Message Date
István Zoltán Szabó 1971bd4591
[DOCS] Adds Transform alerts docs (#78185) 2021-10-05 14:06:48 +02:00
James Rodewig 5c7fac77b3
[DOCS] Add Beats config example for ingest pipelines (#78633)
* [DOCS] Add Beats config example for ingest pipelines

The Elasticsearch ingest pipeline docs cover ingest pipelines for Fleet and
Elastic Agent. However, the docs don't cover Beats. This adds those docs.

Relates to https://github.com/elastic/beats/pull/28239.

* Update docs/reference/ingest.asciidoc

Co-authored-by: DeDe Morton <dede.morton@elastic.co>

Co-authored-by: DeDe Morton <dede.morton@elastic.co>
2021-10-05 05:47:50 -04:00
Alan Woodward 2de2bef4de
Remove indices_segments 'verbose' parameter (#78451)
The 'verbose' option to /_segments returns memory information
for each segment. However, lucene 9 has stopped tracking this memory
information as it is largely held off-heap and so is no longer significant.

This commit deprecates the 'verbose' parameter and makes it a no-op.

Fixes #75955
2021-10-05 09:17:16 +01:00
Ignacio Vera 920b3b52c2
Add support for metrics aggregations to mvt end point (#78614)
It adds support for several aggregations.
2021-10-05 09:17:25 +02:00
James Rodewig fd30c6daf8
Add reference to PHP client on Bulk API page (#78558) (#78651)
Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>

Co-authored-by: Christian Fratta <christian.fratta@gmail.com>
2021-10-04 17:42:42 -04:00
Joe Gallo 4a14f2f6f9
Validate that snapshot repository exists for ILM policies at creation/update time (#78468) 2021-10-04 15:19:10 -04:00
Benjamin Trent 7a7fffcb5a
[ML] Text/Log categorization multi-bucket aggregation (#71752)
This commit adds a new multi-bucket aggregation: `categorize_text`

The aggregation follows a similar design to significant text in that it reads from `_source`
and re-analyzes the text as it is read.

The key difference is that it does not use the indexed field's analyzer, but instead relies on
the `ml_standard` tokenizer with specialized ML token filters. The tokenizer + filters are the
same ones that machine learning categorization anomaly jobs use.

The high level logical flow is as follows:
 - At each shard, read in the text field with a custom analyzer using the `ml_standard` tokenizer
 - Read in the particular tokens from the analyzer
 - Feed these tokens to a token tree algorithm (an adaptation of the Drain categorization algorithm)
 - Gather the individual log categories (the leaf nodes), sort them by doc_count, and ship those buckets to be merged
 - Merge all buckets that have the EXACT same key
 - Once all buckets are merged, pass those keys + counts to a new token tree for additional merging
 - That tree builds the final buckets, which are returned to the user

Algorithm explanation:

 - Each log is parsed with the `ml_standard` tokenizer
 - Each token is passed into a token tree
 - For the first `max_match_token` positions, each token is stored in the tree; at position `max_match_token+1` (or `len(tokens)`, if the log is shorter) a log group is created
 - If another log group exists at that leaf, merge the two if they have at least `similarity_threshold` percent of their tokens in common
     - Merging simply replaces tokens that are different in the group with `*`
 - If a layer in the tree already has `max_unique_tokens` children, we add a `*` child and route any new tokens through it. The catch is that on the final merge, we first attempt to merge the subtrees with the smallest number of documents, especially if the new subtree has more documents counted.
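
The leaf-level merge step described above can be sketched roughly as follows. This is only an illustration, not the code from the commit: the function names, the position-by-position comparison, and the restriction to equal-length token lists are all assumptions, and the prefix tree itself (`max_match_token`, `max_unique_tokens`) is left out.

```python
from typing import List

WILDCARD = "*"


def similarity(a: List[str], b: List[str]) -> float:
    """Percentage of positions at which two equal-length token lists agree.

    Assumption: groups are compared position by position; lists of different
    lengths are treated as not similar at all.
    """
    if len(a) != len(b) or not a:
        return 0.0
    same = sum(1 for x, y in zip(a, b) if x == y)
    return 100.0 * same / len(a)


def merge(a: List[str], b: List[str]) -> List[str]:
    """Replace every differing token with the wildcard."""
    return [x if x == y else WILDCARD for x, y in zip(a, b)]


def add_log(groups: List[list], tokens: List[str], similarity_threshold: float) -> None:
    """Add one tokenized log to `groups`, a list of [tokens, doc_count] pairs.

    The log is merged into the first group that is similar enough;
    otherwise it starts a new group with a doc_count of 1.
    """
    for group in groups:
        if similarity(group[0], tokens) >= similarity_threshold:
            group[0] = merge(group[0], tokens)
            group[1] += 1
            return
    groups.append([list(tokens), 1])


groups: List[list] = []
add_log(groups, "GET /a HTTP/1.1 200".split(), 50)
add_log(groups, "GET /b HTTP/1.1 200".split(), 50)
add_log(groups, "connection reset by peer".split(), 50)
for tokens, doc_count in groups:
    print(doc_count, " ".join(tokens))
```

In this sketch the two `GET` lines share three of four tokens, so they collapse into one group whose differing token becomes `*`, while the unrelated line forms its own group.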

## Aggregation configuration.

Here is an example on some OpenStack logs:
```js
POST openstack/_search?size=0
{
  "aggs": {
    "categories": {
      "categorize_text": {
        "field": "message", // The field to categorize
        "similarity_threshold": 20, // merge log groups if they are this similar
        "max_unique_tokens": 20, // Max Number of children per token position
        "max_match_token": 4, // Maximum tokens to build prefix trees
        "size": 1
      }
    }
  }
}
```

This will return buckets like
```json
"aggregations" : {
    "categories" : {
      "buckets" : [
        {
          "doc_count" : 806,
          "key" : "nova-api.log.1.2017-05-16_13 INFO nova.osapi_compute.wsgi.server * HTTP/1.1 status len time"
        }
      ]
    }
  }
```
2021-10-04 11:49:16 -04:00
Stef Nestor e0cb0beb73
[DOCS] Fix SLM status response (#78584)
The get SLM status API will only return one of three statuses: `RUNNING`, `STOPPING`, or `STOPPED`.

This corrects the docs to remove the `STARTED` status and document the `RUNNING` status.

Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2021-10-04 09:41:17 -04:00
Tanguy Leroux 63d663e220
Add periodic maintenance task to clean up unused blob store cache docs (#78438)
In #77686 we added a service to clean up blob store 
cache docs after a searchable snapshot is no longer 
used. We noticed some situations where some cache 
docs could still remain in the system index: when the 
system index is not available when the searchable 
snapshot index is deleted; when the system index is 
restored from a backup or when the searchable 
snapshot index was deleted on a version before #77686.

This commit introduces a maintenance task that 
periodically scans and cleans up unused blob cache 
docs. This task is scheduled to run every hour on the 
data node that contains the blob store cache primary 
shard. The periodic task works by using a point in 
time context with search_after.
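
The paging pattern mentioned above (a fixed view of the data plus a `search_after` cursor) can be sketched in isolation like this. It is an in-memory illustration only: `search_page` and `scan_all` are hypothetical stand-ins rather than Elasticsearch client calls, the copied list plays the role of the point in time, and sort keys are assumed to be unique.

```python
from typing import Iterator, List, Optional, Tuple

Doc = Tuple[int, str]  # (sort key, payload); sort keys are assumed unique


def search_page(docs: List[Doc], size: int, search_after: Optional[int]) -> List[Doc]:
    """Stand-in for a sorted search: up to `size` docs with sort key > search_after."""
    ordered = sorted(docs)
    if search_after is not None:
        ordered = [d for d in ordered if d[0] > search_after]
    return ordered[:size]


def scan_all(docs: List[Doc], size: int) -> Iterator[Doc]:
    """Visit every doc exactly once, one page at a time."""
    snapshot = list(docs)  # stand-in for the point in time: a frozen view of the data
    cursor: Optional[int] = None
    while True:
        page = search_page(snapshot, size, cursor)
        if not page:
            return
        yield from page
        cursor = page[-1][0]  # resume after the last sort key seen


docs = [(3, "c"), (1, "a"), (2, "b"), (5, "e"), (4, "d")]
for doc in scan_all(docs, size=2):
    print(doc)
```

Because each page resumes from the last sort key rather than from an offset, the scan does not need to keep earlier pages around, and the frozen view keeps concurrent changes from causing skipped or repeated docs.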
2021-10-04 13:15:56 +02:00
James Rodewig 9e0299f551
[DOCS] Troubleshoot the flood-stage watermark error (#78519)
Adds troubleshooting steps for the flood-stage watermark error.

Closes #77906.
2021-10-01 08:32:53 -04:00
Ignacio Vera e4cde37111
Add centroid grid type in mvt request (#78305)
For this grid type, the features on the aggregation layer are represented by a point that is computed from the 
centroid of the data inside the cell.

Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2021-10-01 06:56:13 +02:00
James Rodewig c33e340a47
[DOCS] EQL: Document `runs` keyword (#78478) (#78518)
Documents the `runs` keyword for running the same event criteria successively in a sequence query.

Relates to #75082.

# Conflicts:
#	docs/reference/release-notes/highlights.asciidoc
2021-09-30 10:23:14 -04:00
Yannick Welsch 3dac76c190
Disk usage API does not support timeout parameters (#78503)
Fixes the documentation to reflect that the disk usage API does not support timeout parameters.

Closes #78356
2021-09-30 16:08:00 +02:00
James Rodewig 12019a89fd
[DOCS] Document archived settings (#78351)
Documents `archived.*` persistent cluster settings and index settings.
These settings are commonly produced during a major version upgrade.

Closes #28027
2021-09-30 09:27:53 -04:00
debadair 7431a9656e
[DOCS] Fix erroneous page break. (#78487) 2021-09-29 15:12:13 -07:00
William Brafford 8c2fe902f3
Feature upgrade rest stubs (#77827)
* Add stubs for get API
* Add stub for post API
* Register new actions in ActionModule
* HLRC stubs
* Unit tests
* Add rest api spec and tests
* Add new action to non-operator actions list
2021-09-29 16:25:15 -04:00
Jack Conradson 086ba1aefb
Remove JodaCompatibleZonedDateTime (#78417)
This change removes JodaCompatibleZonedDateTime and replaces it with ZonedDateTime for use in 
scripting.

Breaking changes:
* JodaCompatibleZonedDateTime no longer exists and cannot be cast to in Painless. Use ZonedDateTime 
instead.
* The dayOfWeek method on ZonedDateTime returns the DayOfWeek enum instead of an int from 
JodaCompatibleZonedDateTime. dayOfWeekEnum still exists on ZonedDateTime as an augmentation to 
support the transition to ZonedDateTime, but is now deprecated in favor of dayOfWeek on 
ZonedDateTime.
2021-09-29 13:01:40 -07:00
Benjamin Trent 498e6e3d0f
[ML] adding docs for estimated heap and operations (#78376)
Add docs for optionally supplying memory and operation estimates in put model
2021-09-29 09:11:42 -04:00
James Rodewig 4544ab2dbb
[DOCS] Always enable file and native realms unless explicitly disabled (#78405)
* [DOCS] Always enable file and native realms by default

Adds an 8.0 breaking change for PR #69096.

The copy is based on the 7.13 deprecation notice added with PR #69320.

* reword

* Update docs/reference/migration/migrate_8_0/security.asciidoc

Co-authored-by: Yang Wang <ywangd@gmail.com>

* Update docs/reference/migration/migrate_8_0/security.asciidoc

Co-authored-by: Yang Wang <ywangd@gmail.com>

Co-authored-by: Yang Wang <ywangd@gmail.com>
2021-09-29 09:10:30 -04:00
James Rodewig f4b5ef7416
[DOCS] Remove `include_type_name` query parameter (#78394)
Adds an 8.0 breaking change for PR #48632.
2021-09-29 09:00:15 -04:00
Benjamin Trent b96d929af3
[ML] add documentation for get deployment stats API (#78412)
* [ML] add documentation for get deployment stats API

* Apply suggestions from code review

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
2021-09-29 07:20:25 -04:00
David Turner 07a2acac93
Improve docs for pre-release version compatibility (#78428)
* Improve docs for pre-release version compatibility

Follow-up to #78317 clarifying a couple of points:

- a pre-release build can restore snapshots from released builds
- compatibility applies if at least one of the local or remote cluster
  is a released build

* Remote cluster build date nit
2021-09-29 04:49:07 -04:00
James Baiera eafbd336c2
Remove Monitoring ingest pipelines (#77459)
Monitoring installs a number of ingest pipelines which have been historically used
to upgrade documents when mappings and document structures change between 
versions. Since there aren't any changes to the document format, nor will there be 
by the time the format is completely retired, we can comfortably remove these 
pipelines.
2021-09-28 16:10:02 -04:00
James Rodewig 58595e7af5
[DOCS] Searches on the `_type` field are no longer supported (#78400)
Adds an 8.0 breaking change for PR #68564
2021-09-28 14:51:45 -04:00
Benjamin Trent 408489310c
[ML] add zero_shot_classification task for BERT nlp models (#77799)
Zero-Shot classification allows for text classification tasks without a pre-trained collection of target labels.

This is achieved through models trained on the Multi-Genre Natural Language Inference (MNLI) dataset. This dataset pairs text sequences with "entailment" clauses. An example could be:

"Throughout all of history, mankind has shown itself resourceful, yet astoundingly short-sighted" could have been paired with the entailment clauses: ["This example is history", "This example is sociology"...]. 

This training set combined with the attention and semantic knowledge in modern day NLP models (BERT, BART, etc.) affords a powerful tool for ad-hoc text classification.

See https://arxiv.org/abs/1909.00161 for a deeper explanation of the MNLI training and how zero-shot works. 

The zero-shot classification task is configured as follows:
```js
{
   // <snip> model configuration </snip>
  "inference_config" : {
    "zero_shot_classification": {
      "classification_labels": ["entailment", "neutral", "contradiction"], // <1>
      "labels": ["sad", "glad", "mad", "rad"], // <2>
      "multi_label": false, // <3>
      "hypothesis_template": "This example is {}.", // <4>
      "tokenization": { /*<snip> tokenization configuration </snip>*/}
    }
  }
}
```
* <1> All zero_shot models return three particular labels when classifying the target sequence: "entailment" is the positive case, "neutral" is the case where the sequence is neither positive nor negative, and "contradiction" is the negative case.
* <2> An optional parameter giving the default zero_shot labels to attempt to classify.
* <3> When returning the probabilities, whether the results should assume there is only one true label or multiple true labels.
* <4> The hypothesis template used when tokenizing the labels. When combined with `sad`, the sequence looks like `This example is sad.`
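
As a rough illustration of how `labels`, `hypothesis_template`, and `multi_label` could interact, the sketch below builds one hypothesis per label and turns per-label scores into probabilities. The scoring scheme is an assumption modeled on common zero-shot practice, not something the commit message confirms; the function names and logit inputs are hypothetical, and no model is actually run.

```python
import math
from typing import List


def build_hypotheses(labels: List[str], template: str) -> List[str]:
    """Substitute each label into the `{}` placeholder of the template."""
    return [template.replace("{}", label) for label in labels]


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def single_label_probs(entailment_logits: List[float]) -> List[float]:
    """Assumed multi_label=false behavior: labels compete, probabilities sum to 1."""
    return softmax(entailment_logits)


def multi_label_probs(entailment_logits: List[float],
                      contradiction_logits: List[float]) -> List[float]:
    """Assumed multi_label=true behavior: each label is scored independently
    as entailment versus contradiction."""
    return [softmax([e, c])[0] for e, c in zip(entailment_logits, contradiction_logits)]


labels = ["sad", "glad", "mad", "rad"]
print(build_hypotheses(labels, "This example is {}."))
print(single_label_probs([0.1, 2.3, -1.0, 0.4]))  # made-up logits
print(multi_label_probs([0.1, 2.3, -1.0, 0.4], [1.0, -0.5, 2.0, 0.2]))
```

Under these assumptions, the single-label probabilities always sum to 1, while the multi-label probabilities are independent and can each be close to 1 at the same time.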

For inference in a pipeline one may provide label updates:
```js
{
  //<snip> pipeline definition </snip>
  "processors": [
    //<snip> other processors </snip>
    {
      "inference": {
        // <snip> general configuration </snip>
        "inference_config": {
          "zero_shot_classification": {
             "labels": ["humanities", "science", "mathematics", "technology"], // <1>
             "multi_label": true // <2>
          }
        }
      }
    }
    //<snip> other processors </snip>
  ]
}
```
* <1> The `labels` we care about. These replace the default ones if they exist.
* <2> Whether the results should allow multiple true labels.

Similarly, one may provide label changes against the `_infer` endpoint:
```js
{
   "docs":[{ "text_field": "This is a very happy person"}],
   "inference_config":{"zero_shot_classification":{"labels": ["glad", "sad", "bad", "rad"], "multi_label": false}}
}
```
2021-09-28 09:38:23 -04:00
James Rodewig 485e7deaa0
[DOCS] Re-add docs for multiple data paths (MDP) (#78342)
We deprecated support for multiple data paths (MDP) in 7.13. However,
we won't remove support until after 8.0.

Changes:

* Reverts PR #72267, which removed MDP docs
* Removes a related item from the 8.0 breaking changes.
2021-09-28 09:20:45 -04:00
James Rodewig 0c01bcdd9f
[DOCS] Remove index API's `types` option (#78335)
Adds an 8.0 breaking change for PR #47203.
2021-09-28 08:44:25 -04:00
James Rodewig 1764fa0e8f
[DOCS] Remove `type` query (#78334)
Adds an 8.0 breaking change for PR #47207.
2021-09-28 08:44:06 -04:00
Benjamin Trent 00defa38a9
[ML] adding some initial document for our pytorch NLP model support (#78270)
Adding docs for:

* put vocab
* put model definition part
* start deployment
* all the new NLP configuration objects for trained model configurations
2021-09-27 12:46:13 -04:00
David Turner 4782cf4d91
Add docs for pre-release version compatibility (#78317)
The reference manual includes docs on version compatibility in various
places, but it's not clear that these docs only apply to released
versions and that the rules for pre-release versions are stricter than
folks expect. This commit adds some words to the docs for unreleased
versions which explains this subtlety.
2021-09-27 16:56:35 +01:00
Przemyslaw Gomulka 8c0d7fa2fa
[doc] Improve documentation for deprecation logging (#78326)
adding a section on WARN messages

relates #77030
2021-09-27 16:56:26 +02:00
James Rodewig b20939f071
[DOCS] Document empty first line support for msearch API (#78284)
Adds an 8.0 breaking change for PR #41011
2021-09-27 08:58:22 -04:00
Lukas Wegmann 421b3e80de
Document missing_order param for composite aggregations (#77839)
Documents the missing_order parameter for composite aggregations introduced in #76740
2021-09-27 09:57:45 +02:00
James Rodewig 38125c147d
[DOCS] Remove `gateway.auto_import_dangling_indices` setting (#78280)
Adds an 8.0 breaking change for PR #59698.
2021-09-26 19:24:01 -04:00
James Rodewig 181aebd1dc
[DOCS] Watcher history now writes to a data stream (#78277)
Adds an 8.0 breaking change for PR #64252.
2021-09-23 16:07:01 -04:00
James Rodewig 96c4bd96a9
[DOCS] Remove support for `unmapped_type:string` sort (#78272)
* [DOCS] Remove support for `unmapped_type:string` sort

Adds an 8.0 breaking change for PR #45675.

* Clarify error

* Reset mapping changes
2021-09-23 13:37:46 -04:00
James Rodewig b3cdf60ab3
Adding priority list and executing description to the pending tasks doc (#74456) (#78259)
* Adding priority to the pending tasks doc

https://github.com/elastic/elasticsearch/pull/19448#discussion_r70969307
917fea7c5d/core/src/main/java/org/elasticsearch/common/Priority.java (L29)

* Adding executing into the cluster pending tasks

* Update docs/reference/cluster/pending.asciidoc

Co-authored-by: Henning Andersen <33268011+henningandersen@users.noreply.github.com>

Co-authored-by: Henning Andersen <33268011+henningandersen@users.noreply.github.com>

Co-authored-by: Leaf-Lin <39002973+Leaf-Lin@users.noreply.github.com>
Co-authored-by: Henning Andersen <33268011+henningandersen@users.noreply.github.com>
2021-09-23 11:17:18 -04:00
István Zoltán Szabó 1d367abffc
[DOCS] Modifies aggregations title abbreviation to follow convention. (#78252) 2021-09-23 16:22:27 +02:00
James Rodewig ce4b95e5b0
[DOCS] Document `time_series_metric` mapping parameter (#78013)
Changes:
* Documents the `time_series_metric` mapping parameter for PR #76766.
* Renames the `dimension` parameter to `time_series_dimension` for PR #78012.
* Adds support for `unsigned_long` to `time_series_dimension` for PR #78204.
2021-09-23 08:54:19 -04:00
Ignacio Vera 9033faffff
Add cross cluster search test for mvt end point (#78054)
This commit adds a test to check that it is supported and documents it.

Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2021-09-23 07:59:44 +02:00
Tim Vernum 6125067145
Add 'show' command to the keystore CLI (#76693)
This adds a new "elasticsearch-keystore show" command that displays
the value of a single secure setting from the keystore.

An optional `-o` (or `--output`) parameter can be used to direct
output to a file.

The `-o` option is required for binary keystore values
because the CLI `Terminal` class does not support writing binary data.
Hence this command:

    elasticsearch-keystore show xpack.watcher.encryption_key > watcher.key

would not produce a file with the correct contents.

Co-authored-by: Ioannis Kakavas <ikakavas@protonmail.com>
2021-09-23 12:37:20 +10:00
James Rodewig 80ba92f1b1
[DOCS] Add breaking change for unsupported `script` fields (#78217)
Adds an 8.0 breaking change for PR #59507.
2021-09-22 17:41:06 -04:00
Adam Locke 6940673e8a
[DOCS] Update remote cluster docs (#77043)
* [DOCS] Update remote cluster docs

* Add files, rename files, write new stuff

* Plethora of changes

* Add test and update snippets

* Redirects, moved files, and test updates

* Moved file to x-pack for tests

* Remove older CCS page and add redirects

* Cleanup, link updates, and some rewrites

* Update image

* Incorporating user feedback and rewriting much of the remote clusters page

* More changes from review feedback

* Numerous updates, including request examples for CCS and Kibana

* More changes from review feedback

* Minor clarifications on security for remote clusters

* Incorporate review feedback

Co-authored-by: Yang Wang <ywangd@gmail.com>

* Some review feedback and some editorial changes

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Yang Wang <ywangd@gmail.com>
2021-09-22 16:02:33 -04:00
James Rodewig 15baf4017a
[DOCS] Remove `_term` and `_time` agg order keys (#78209)
Adds an 8.0 breaking change for the removal of the `_term` and `_time`
agg `order` keys.

Relates to #39450
2021-09-22 15:54:14 -04:00
James Rodewig ce56c19346
[DOCS] Remove support for EOL OSs and `SysV init` (#78199)
Adds an 8.0 breaking change for the removal of support for several EOL operating
systems and `SysV init`.

Relates to #51480 and #51716
2021-09-22 13:41:52 -04:00
James Rodewig 2b2f0e1d7f
[DOCS] Remove the `listener` thread pool (#78194)
Changes:
* Removes docs for the `listener` thread pool
* Adds an 8.0 breaking change for the thread pool removal

Relates to #53314 and #53049
2021-09-22 13:41:05 -04:00
Ryan Ernst a06aff9b01
Revert "Fail index creation using custom data path (#76792)" (#78031)
This reverts commit 79d91ed9d3.
2021-09-22 09:02:56 -07:00
Adam Locke 7d61b0261c
[DOCS] Add composite runtime fields (#78050)
* [DOCS] Add composite runtime fields

* Update snippets and tests

* Add note that composite runtime fields cannot be indexed yet
2021-09-22 07:56:50 -04:00
Ignacio Vera 75b7b0db03
Add track_total_hits support in mvt API (#78074)
This allows consumers of the API to know exactly whether all the features in a tile have been considered 
when building the hits layer of a vector tile.

Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2021-09-22 08:37:50 +02:00
James Rodewig db1aac1d8b [DOCS] Edit dedicated hosts section heading 2021-09-21 17:53:07 -04:00