Commit Graph

10030 Commits

Author SHA1 Message Date
Hendrik Muhs 14b2d2d37e
[ML] frequent items filter (#91137)
Add a filter to the frequent items aggregation that filters documents out of the analysis while still calculating support on the full document set.

A filter is specified at the top level of frequent_items:

"frequent_items": {
  "filter": {
    "term": {
      "host.name.keyword": "i-12345"
    }
  },
...

The above filters out documents that don't match, but still counts them when calculating support. This is in contrast to
specifying a query at the top level, in which case you find the same item sets but don't know their importance relative to the full
document set.
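A minimal sketch of the difference, assuming support is the fraction of documents that contain every item of an item set. The documents, the `host` field and the `support` helper are hypothetical; this is not the aggregation's implementation.

```python
def support(docs, itemset, doc_filter=None, filter_as_query=False):
    """Fraction of documents containing every item in `itemset`.

    doc_filter selects the documents whose item sets are analyzed.
    With filter_as_query=False (like the frequent_items filter) the
    denominator is the full document set; with True (like a top-level
    query) it is only the filtered documents.
    """
    selected = [d for d in docs if doc_filter is None or doc_filter(d)]
    hits = sum(1 for d in selected if itemset <= d["items"])
    total = len(selected) if filter_as_query else len(docs)
    return hits / total if total else 0.0


docs = [
    {"host": "i-12345", "items": {"error", "timeout"}},
    {"host": "i-12345", "items": {"error", "timeout", "retry"}},
    {"host": "i-67890", "items": {"error"}},
    {"host": "i-67890", "items": {"ok"}},
]


def on_host(doc):
    return doc["host"] == "i-12345"


print(support(docs, {"error", "timeout"}, on_host))                        # 0.5
print(support(docs, {"error", "timeout"}, on_host, filter_as_query=True))  # 1.0
```

Both variants find the same item set. Only the filter variant shows that it occurs in half of all documents rather than in all of the filtered ones.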
2022-11-03 13:58:40 +01:00
charliek17 4192c5b327
Update move-to-step.asciidoc (#91114) 2022-11-03 08:55:24 +00:00
Valeriy Khakhutskyy 7c4186ddbc
[ML] Update API documentation for anomaly score explanation (#91177)
This PR updates the API documentation to match the UI.

Co-authored-by: lcawl <lcawley@elastic.co>
2022-11-01 21:43:33 +01:00
Lisa Cawley 2d30bbab21
[DOCS] Semantic search endpoint (#91210) 2022-11-01 09:01:55 -07:00
Abdon Pijpelink 8abd39ab98
Fix typo in stop-tokenfilter.asciidoc (#91128) (#91207)
Since ignore_case is set to true in our custom stop words filter, the matching will be case-insensitive.

(cherry picked from commit a03fba9d77)

Co-authored-by: Siniša Subašić <68671543+sinisuba@users.noreply.github.com>
2022-11-01 15:32:16 +01:00
David Kilfoyle 56397f5d4c
[Docs] Remove feature flag from downsampling page (#91228) 2022-11-01 09:51:22 -04:00
Anthony McGlone 0249d1650f
[DOCS] Update the feature state example in the snapshot and restore docs (#90328) 2022-11-01 10:17:29 +09:00
Lisa Cawley f0c12cdeea
[DOCS] Fix typo in knn-search.asciidoc (#91206) 2022-10-31 10:07:53 -07:00
Mary Gouseti d55059afab
Mute reference/cluster/nodes-stats/line_2751 (#91174) 2022-10-28 11:55:53 +02:00
Julie Tibshirani 1b249639f1
Remove experimental marking from kNN search (#91065)
This commit removes the experimental tag from kNN search docs and makes some
docs improvements:
* Add a prominent warning about memory usage in the kNN search guide
* Link to the performance tuning guide from the main guide
* Clarify the memory requirements section in the tuning guide
2022-10-27 18:00:56 +02:00
Yang Wang 882fbe62b5
[Doc] Improve doc for certutil parameter applicability (#91124)
The http command does not take most of the parameters. This PR ensures
this is documented consistently for all parameters.
2022-10-27 09:38:56 +11:00
Frederic Dartayre fe0036fdbf
Update threadpool.asciidoc (#90098)
* Update threadpool.asciidoc

Starting from 8.0, the value of the `node.processors` setting is bounded by the number of available
processors https://github.com/elastic/elasticsearch/pull/44894

* Update docs/reference/modules/threadpool.asciidoc

Co-authored-by: Adam Locke <adam.locke@elastic.co>
2022-10-26 14:04:39 -04:00
Craig Taverner c19f642d94
Refine geo-point and geo-shape docs (#90913)
* Refine geo-point and geo-shape docs

While reviewing the docs for another issue, some deprecated
references to prefix-trees were discovered, leading to interest
in bringing the docs a little more up-to-date.

* Update docs/reference/mapping/types/geo-point.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* Update docs/reference/mapping/types/geo-shape.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2022-10-26 12:21:34 +02:00
Hendrik Muhs 82a71f6ef6
[Transform] add a health section to transform stats (#90760)
Adds a health section to the transform stats endpoint and implements reporting of assignment, indexing/search, and persistence problems, together with an overall health state.
2022-10-25 09:01:21 +02:00
Flavio 83694c37a3
Update docker image (#90730) 2022-10-24 15:52:36 -04:00
Stéphane Campinas 8c44ed1442
Fix itemized list (#90855) 2022-10-24 15:14:17 -04:00
Przemysław Witek 95f484c4fd
[Transform] Expand the docs section regarding mappings deduction in transform's dest index (#91077) 2022-10-24 13:43:22 +02:00
Christos Soulios 1f265eb725
[DOCS] Add release notes for 8.5.0 (#91063)
Forward port PR (#91029) with release notes for version 8.5.0
  - Add release notes for v8.5.0 after BC6 has been cut
2022-10-21 13:17:33 +03:00
Jack Conradson f28ae4b288
Add support for indexing byte-sized knn vectors (#90774)
This change adds an element_type as an optional mapping parameter for dense vector fields as
described in #89784. This also adds a byte element_type for dense vector fields that supports storing
dense vectors using only 8 bits per dimension. This is only supported when the `index` mapping parameter
is set to true.

The code follows a similar pattern to our NumberFieldMapper, where we have an enum for
ElementType with methods that DenseVectorFieldType and DenseVectorMapper delegate
to in order to support each available type (just float and byte for now).
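To illustrate the storage saving, here is a sketch that encodes the same vector as 4-byte floats and as 8-bit values. The `encode_vector` helper is hypothetical, and the requirement that byte values be integers in [-128, 127] is an assumption of this sketch about what fits in a signed byte.

```python
import array


def encode_vector(values, element_type="float"):
    """Encode a dense vector as raw bytes for the given element type.

    'float' uses 4 bytes per dimension; 'byte' uses 1 byte (8 bits)
    per dimension and, in this sketch, requires integers in [-128, 127].
    """
    if element_type == "float":
        return array.array("f", values).tobytes()
    if element_type == "byte":
        for v in values:
            if not isinstance(v, int) or not -128 <= v <= 127:
                raise ValueError(f"{v!r} does not fit in a signed byte")
        return array.array("b", values).tobytes()
    raise ValueError(f"unknown element_type {element_type!r}")


print(len(encode_vector([1, -2, 3], "byte")))         # 3
print(len(encode_vector([1.0, -2.0, 3.0], "float")))  # 12
```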
2022-10-20 14:45:58 -07:00
Iraklis Psaroudakis 0f4374f4fb
Explain disk headroom settings more in docs (#90763)
Relates to #81406
2022-10-20 18:45:23 +03:00
Roberto Seldner 8e35a6a846
Update documentation with supported IANA numbers (#90531)
Based on this:
https://github.com/elastic/elasticsearch/blob/main/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/CommunityIdProcessor.java#L440-L451
2022-10-19 08:23:11 -05:00
Leaf-Lin 14ef513f2c
[DOCS] Add CCR limitation (#87348)
* Add CCR limitation

closes https://github.com/elastic/elasticsearch/issues/86121

* Add restored index auto follow pattern restriction

https://github.com/elastic/elasticsearch/issues/87055

* Moving content to existing CCR page + several changes

* Remove sections to consolidate limitation information

* Delete separate file

* Remove restored indices from list of things that aren't replicated

Co-authored-by: Adam Locke <adam.locke@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2022-10-17 16:05:29 -04:00
Lisa Cawley 2dd7732553
[DOCS] Add ML CPP PRs to release notes (#90961) 2022-10-17 09:58:40 -07:00
Mary Gouseti cfd23d512f
Disk indicator troubleshooting guides (#90504) 2022-10-14 15:24:21 +02:00
Paramdeep Singh 34ff7a9d98
Consolidated Circuit Breaker documentation to include EQL and ML infer (#90809)
Fixes #85851 

Co-authored-by: Iraklis Psaroudakis <kingherc@gmail.com>
2022-10-14 14:33:52 +03:00
Przemyslaw Gomulka aa922754af
Add known issues entry about date rounding bug (#90721)
Add an entry to all affected versions.

Relates to #90187
2022-10-14 11:51:02 +02:00
Francisco Fernández Castaño 1a3032beb6
Keep track of average shard write load (#90768)
This commit adds a new field, write_load, into the shard stats. This new stat exposes the average number of write threads used while indexing documents.

Closes #90102
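One way to read "average number of write threads" is total thread time spent indexing divided by the elapsed wall-clock time. The sketch below assumes that definition; `average_write_load` is a hypothetical helper, not the shard stats code.

```python
def average_write_load(indexing_thread_seconds, elapsed_seconds):
    """Average number of busy write threads over an interval (assumed formula).

    indexing_thread_seconds: time spent indexing, summed over all write threads.
    elapsed_seconds: wall-clock length of the interval.
    """
    if elapsed_seconds <= 0:
        raise ValueError("elapsed_seconds must be positive")
    return indexing_thread_seconds / elapsed_seconds


# Three write threads, each busy for 10 s, during a 20 s window.
print(average_write_load(3 * 10.0, 20.0))  # 1.5
```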
2022-10-13 16:34:45 +02:00
David Kyle 9e6a784aa5
[ML] Semantic search endpoint (#90450)
Adds a {index}/_semantic_search endpoint, which first converts the query text into a dense vector
using an NLP text embedding model and then performs a kNN search against an index containing
dense vectors created with the same embedding model.
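The two steps can be sketched as "embed the query, then rank stored vectors by similarity". This sketch uses exact brute-force cosine similarity rather than approximate kNN, and the `embed` function is a hypothetical toy stand-in for an NLP text embedding model.

```python
import math


def cosine(a, b):
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def semantic_search(query_text, embed, index, k=3):
    """Embed the query with the same model used at index time, then return the k nearest doc ids.

    index: list of (doc_id, vector) pairs.
    """
    query_vector = embed(query_text)
    ranked = sorted(index, key=lambda item: cosine(query_vector, item[1]), reverse=True)
    return [doc_id for doc_id, _ in ranked[:k]]


# Toy "model": counts of a fixed vocabulary (hypothetical, for illustration only).
VOCAB = ["cat", "dog", "car"]


def embed(text):
    words = text.split()
    return [words.count(w) for w in VOCAB]


index = [("d1", [1, 0, 0]), ("d2", [0, 1, 0]), ("d3", [0, 0, 1])]
print(semantic_search("dog dog", embed, index, k=1))  # ['d2']
```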
2022-10-13 13:17:30 +01:00
David Roberts be006e2eee
[ML] Improve categorize_text docs (#90765)
Adds more detail about the meaning of the results
fields of the `categorize_text` aggregation, and
advice about how to use these fields when searching
for messages that match the categories.

Followup to #90723
2022-10-13 10:46:53 +01:00
Julie Tibshirani f4038b3f15
Add guide for tuning kNN search (#89782)
This 'how to' guide explains performance considerations specific to kNN search.
It takes inspiration from the 'tune for search speed' guide.
2022-10-12 14:53:53 -07:00
Nik Everett 82aeb478db
Synthetic `_source`: support `wildcard` field (#90196)
This adds synthetic `_source` support for the `wildcard` field type.
2022-10-12 15:55:13 -04:00
David Kilfoyle cad87c4d5a
[DOCS] Add Downsampling docs (#88571)
This adds documentation for downsampling of time series indices.
2022-10-12 12:10:16 -04:00
Valeriy Khakhutskyy 95758e88a2
[ML] Explain anomaly score factors (#90675)
This PR surfaces new information about the impact of the factors on the initial anomaly score in the anomaly record:

- single bucket impact is determined by the deviation between actual and typical in the current bucket
- multi-bucket impact is determined by the deviation between actual and typical in the past 12 buckets
- anomaly characteristics are statistical properties of the current anomaly compared to the historical observations
- high variance penalty is the reduction of anomaly score in the buckets with large confidence intervals.
- incomplete bucket penalty is the reduction of anomaly score in the buckets with fewer samples than historically expected.

Additionally, we compute lower and upper confidence bounds and the typical value for the anomaly records. This improves explainability in cases where the model plot is not activated, with only a slight performance overhead (1-2%).
2022-10-12 16:57:06 +02:00
Luca Cavanna 18942d5b11
Enhance nested depth tracking when parsing queries (#90425)
When parsing queries on the coordinating node, there is currently no way to share state between the different parsing methods (`fromXContent`). The only query that supports a parse context is bool query, which uses the context to track the nested depth of queries, added with #66204. This nested depth tracking mechanism is not 100% accurate, as it tracks bool queries only, while there are many more query types that can hold other queries and hence potentially cause a stack overflow when deeply nested.

This change removes the parsing context that's specific to bool query, introduced with #66204, in favour of generalizing the nested depth tracking to all query types.

The generic tracking is introduced by wrapping the parser and overriding the method that parses named objects through the xcontent registry. Another way would have been to require a context argument when parsing queries, which would mean adding a context argument to all the QueryBuilder#fromXContent static methods. That would be a breaking change for plugins that provide custom queries, so I tried a different approach instead.

One aspect that this change requires and introduces is the distinction between parsing a top-level query (which wraps the parser, or would create the context if we had one) and parsing an inner query, which goes ahead with the given parser and context. We already have this distinction, as we have two different static methods in `AbstractQueryBuilder`, but in practice only bool query makes the distinction, being the only context-aware query.

In addition to generalizing nested depth tracking when parsing queries, we should be able to adopt the same strategy to track query usage as part of #90176.

Given that the depth check is now more restrictive, as it counts all compound queries and not only bool, we have decided to raise the default limit to `30` to ensure that users are not going to hit the limit due to this change.
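The effect of counting every compound query, not only bool, can be sketched as a recursive walk over the query DSL that fails past the limit. The walk, the `QueryTooDeepError` class and the non-exhaustive `COMPOUND_KEYS` table are hypothetical; the real change wraps the xcontent parser instead of inspecting parsed dicts.

```python
MAX_NESTED_DEPTH = 30  # new default limit from the commit message

# Hypothetical, non-exhaustive map of compound queries to the keys that hold inner queries.
COMPOUND_KEYS = {
    "bool": ("must", "should", "filter", "must_not"),
    "dis_max": ("queries",),
    "constant_score": ("filter",),
    "boosting": ("positive", "negative"),
    "nested": ("query",),
    "function_score": ("query",),
}


class QueryTooDeepError(ValueError):
    """Raised when a query is nested deeper than the allowed limit."""


def inner_queries(name, body):
    """Yield the inner query objects held by a compound query."""
    for key in COMPOUND_KEYS.get(name, ()):
        value = body.get(key)
        if isinstance(value, dict):
            yield value
        elif isinstance(value, list):
            yield from value


def check_depth(query, depth=1, max_depth=MAX_NESTED_DEPTH):
    """Return the nesting depth of `query`, counting every query level, not just bool."""
    if depth > max_depth:
        raise QueryTooDeepError(f"query is nested deeper than {max_depth} levels")
    deepest = depth
    for name, body in query.items():
        for child in inner_queries(name, body):
            deepest = max(deepest, check_depth(child, depth + 1, max_depth))
    return deepest


query = {"term": {"status": "active"}}
for _ in range(29):
    query = {"constant_score": {"filter": query}}
print(check_depth(query))  # 30
```

Under bool-only tracking this chain of `constant_score` queries would not have counted toward the limit at all.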
2022-10-12 15:15:06 +02:00
Albert Zaharovits 73cdc7b80a
DOC CCR Disaster recovery does not handle Security configuration (#85522)
We do not support, and don't plan to support, disaster recovery arrangements
where Security configuration is replicated between the production and the
disaster recovery cluster, because the cluster-local Security APIs assume
exclusive write access to the .security system index.
2022-10-12 13:53:53 +03:00
Ed Savage f355787165
[ML] Allow overriding timestamp field to null in file structure finder (#90764)
Use a magic value of "null" for the timestamp format override to indicate to the analysis that a timestamp is not expected in the input text. This should improve performance when analysing delimited, ndjson, or xml formatted text files that don't contain timestamps. For semi-structured text files without timestamps, the magic value indicates that the text should be treated as single-line log messages.

see #55219
2022-10-12 09:08:25 +01:00
Dimitris Athanasiou 16bfc550ea
[ML] Add api to update trained model deployment number_of_allocations (#90728)
This commit adds a new API that users can call as follows:

```
POST _ml/trained_models/{model_id}/deployment/_update
{
  "number_of_allocations": 4
}
```

This allows a user to update the number of allocations for a deployment
that is `started`.

If the allocations are increased, we rebalance and let the assignment
planner find how to allocate the additional allocations.

If the allocations are decreased, we cannot use the assignment planner.
Instead, we implement the reduction in a new class `AllocationReducer`
that tries to reduce the allocations so that:

  1. availability zone balance is maintained
  2. assignments that can be completely stopped are preferred to release memory
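The two goals above can be illustrated with a greedy sketch that removes one allocation at a time: it picks the zone holding the most allocations, then the node in that zone with the fewest allocations. This is a hypothetical stand-in, not the actual `AllocationReducer` algorithm; the input shapes and tie-breaking are assumptions.

```python
from collections import defaultdict


def reduce_allocations(assignments, zones, reduce_by):
    """Hypothetical greedy reducer, not the actual AllocationReducer.

    assignments: node -> allocations currently on that node
    zones: node -> availability zone of that node
    reduce_by: number of allocations to remove
    Returns a new node -> allocations dict; nodes reduced to zero are dropped.
    """
    result = {node: count for node, count in assignments.items() if count > 0}
    if reduce_by > sum(result.values()):
        raise ValueError("cannot remove more allocations than exist")
    for _ in range(reduce_by):
        # Keep zones balanced: remove from the zone that holds the most allocations
        # (ties go to the alphabetically first zone).
        totals = defaultdict(int)
        for node, count in result.items():
            totals[zones[node]] += count
        zone = max(sorted(totals), key=totals.get)
        # Prefer the node with the fewest allocations in that zone, so whole
        # assignments get stopped and their memory released.
        node = min(sorted(n for n in result if zones[n] == zone), key=result.get)
        result[node] -= 1
        if result[node] == 0:
            del result[node]
    return result


print(reduce_allocations({"n1": 2, "n2": 1, "n3": 3},
                         {"n1": "a", "n2": "a", "n3": "b"}, 2))
# {'n1': 2, 'n3': 2}
```

Here the first removal stops `n2` entirely, and the second rebalances zone `b` so both zones end with two allocations.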
2022-10-12 10:04:23 +03:00
David Roberts bfccd20155
[ML] Add a regex to the output of the categorize_text aggregation (#90723)
The new `regex` field in `categorize_text` output is created in
the same way as the `regex` field that appears in the category
definitions created by anomaly detection jobs that do categorization.

It consists of the terms that occur in the same order for every
message that matches the category, separated with a `.+?` wildcard.
It therefore matches the category messages and enforces the order
of the terms that occurred in the same order for all messages used
to create the category.

It is not recommended to use the regex as the primary mechanism for
searching for the original documents that were categorized. Search
using a regular expression is very slow. Instead the terms of the
category should be used to search for matching documents, as a
terms search can use the inverted index and hence be much faster.
However, there may be situations where it is useful to use the
`regex` field to test whether a small set of messages that have not
been indexed match the category.
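A sketch of how such a regex can be built from a category's terms and used to test a small set of unindexed messages. The `category_regex` and `matches_category` helpers are hypothetical, and the leading `.*?` and trailing `.*` are assumptions; only the `.+?` separator comes from the description above.

```python
import re


def category_regex(terms):
    """Join category terms with the lazy `.+?` wildcard, escaping regex metacharacters.

    The leading `.*?` and trailing `.*` are assumptions of this sketch.
    """
    return ".*?" + ".+?".join(re.escape(term) for term in terms) + ".*"


def matches_category(message, terms):
    """True if `message` contains `terms` in order, each separated by at least one character."""
    return re.fullmatch(category_regex(terms), message, re.DOTALL) is not None


terms = ["Node", "shutting", "down"]
print(category_regex(terms))                                # .*?Node.+?shutting.+?down.*
print(matches_category("Node 1 shutting down now", terms))  # True
print(matches_category("down shutting Node", terms))        # False
```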
2022-10-10 11:41:16 +01:00
Andrei Dan b55f5fd77b
Rename the fields reported under details by the disk indicator (#90717)
Currently, we report the count of affected nodes and indices as part of
the disk indicator using a leaky abstraction. Namely, we use the statuses
we assign to nodes internally based on their disk usage (red,
yellow, green, unknown).

However, these statuses don't have an explicit meaning outside the
implementation details; e.g. a red node would probably convey that the node is
experiencing disk issues, but not what kind.

This proposes being explicit in what we return to our health API users
e.g.
```
"details": {
  "indices_with_readonly_block": 2,
  "nodes_with_enough_disk_space": 0,
  "nodes_with_unknown_disk_status": 0,
  "nodes_over_high_watermark": 0,
  "nodes_over_flood_watermark": 2
}
```
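A sketch of how per-node disk usage could map onto the node counters above. The thresholds default to 90% and 95%, the default high and flood-stage disk watermarks; the `disk_details` helper and its strict "over" comparisons are assumptions, and `indices_with_readonly_block` is omitted because it counts indices, not nodes.

```python
def disk_details(node_usage, high=90.0, flood=95.0):
    """Count nodes per disk bucket (hypothetical classification).

    node_usage: node name -> disk usage percentage, or None if unknown.
    high, flood: watermark percentages.
    """
    details = {
        "nodes_with_enough_disk_space": 0,
        "nodes_with_unknown_disk_status": 0,
        "nodes_over_high_watermark": 0,
        "nodes_over_flood_watermark": 0,
    }
    for usage in node_usage.values():
        if usage is None:
            details["nodes_with_unknown_disk_status"] += 1
        elif usage > flood:
            details["nodes_over_flood_watermark"] += 1
        elif usage > high:
            details["nodes_over_high_watermark"] += 1
        else:
            details["nodes_with_enough_disk_space"] += 1
    return details


print(disk_details({"n1": 50.0, "n2": 92.0, "n3": 97.0, "n4": None}))
```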
2022-10-10 11:30:03 +01:00
Lisa Cawley db2882cbb5
[DOCS] Add links to clear trained model deployment cache API (#90727) 2022-10-06 10:10:55 -07:00
Brandon Morelli ced1447db0
docs: update fleet/agent pipeline docs (#90659)
* docs: update fleet/agent pipeline docs

* Apply suggestions from code review

Co-authored-by: Adam Locke <adam.locke@elastic.co>

Co-authored-by: Adam Locke <adam.locke@elastic.co>
2022-10-05 13:06:58 -07:00
Jack Conradson 8b0d0716d1
Add profiling and documentation for dfs phase (#90536)
Adds profiling statistics for the dfs phase, and adds documentation for both the dfs phase profiling 
and kNN profiling.

Closes #89713
2022-10-05 09:54:36 -07:00
Lisa Cawley c5c1f46fba
[DOCS] Remove coming tag from 8.4.3 release notes (#90683) 2022-10-05 08:05:41 -07:00
Ievgen Degtiarenko 4d6d979e0e
Deprecate state field in `/_cluster/reroute` response (#90399) 2022-10-05 08:18:27 +02:00
Lee Hinman 4fe9fc488c
Deprecate 'remove_binary' default of false for ingest attachment processor (#90460)
This commit adds a deprecation warning for when the `remove_binary`
setting is unset. In the future we want to change the default to `true`
(it is currently `false`), so this will let a user know they should be
explicit about setting this to ensure the behavior does not change in a
future (breaking) release.

Relates to #86014
2022-10-04 01:04:40 +10:30
Adam Locke 52feb5540b
[Doc] Release notes for v8.4.3 (#90443) (#90538)
* Update docs for v8.4.3 release

* Update release highlights for 8.4.3 version.

* Update docs/reference/release-notes/8.4.3.asciidoc

Co-authored-by: Adam Locke <adam.locke@elastic.co>

* Update docs/reference/release-notes/8.4.3.asciidoc

Co-authored-by: Adam Locke <adam.locke@elastic.co>

* Update docs/reference/release-notes/highlights.asciidoc

Co-authored-by: Adam Locke <adam.locke@elastic.co>

* Make link external type

* Update release notes to include #90319 PR after creating BC2.

* Remove release note for #90302

* Minor grammar fix

Co-authored-by: Adam Locke <adam.locke@elastic.co>
(cherry picked from commit 25a196f214)

# Conflicts:
#	docs/reference/release-notes.asciidoc
#	docs/reference/release-notes/highlights.asciidoc

Co-authored-by: Slobodan Adamović <slobodanadamovic@users.noreply.github.com>
2022-09-30 16:10:26 -04:00
Iraklis Psaroudakis ad8d064de5
Redefine section on sizing data nodes (#90274)
Now that we have the estimated field mappings heap overhead
in nodes stats, we can refer to them in the guide for sizing
data nodes appropriately.

Relates to #86639
2022-09-30 12:37:21 +03:00
debadair ef7aaec815
[DOCS] Fixed footnote. Closes #89403 (#90541) 2022-09-29 16:48:02 -07:00
David Turner c95fb2f3e8
More opinionated docs about http.max_content_length (#90500)
Adds to the docs a note that the `100mb` default for
`http.max_content_length` is the recommended maximum, along with
suggestions for what to do when hitting this limit.
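One common way to stay under the limit, assumed here rather than taken from the new docs, is to split a large bulk request into several smaller NDJSON bodies. The `split_bulk` helper is hypothetical; `100mb` is treated as 100 × 1024 × 1024 bytes.

```python
import json

MAX_CONTENT_LENGTH = 100 * 1024 * 1024  # 100mb default for http.max_content_length


def split_bulk(pairs, limit=MAX_CONTENT_LENGTH):
    """Group (action, source) pairs into NDJSON bulk bodies of at most `limit` bytes each."""
    bodies, current, size = [], [], 0
    for action, source in pairs:
        chunk = (json.dumps(action) + "\n" + json.dumps(source) + "\n").encode("utf-8")
        if len(chunk) > limit:
            raise ValueError("a single document exceeds the limit")
        if size + len(chunk) > limit:
            # Current body is full: start a new one.
            bodies.append(b"".join(current))
            current, size = [], 0
        current.append(chunk)
        size += len(chunk)
    if current:
        bodies.append(b"".join(current))
    return bodies
```

Each returned body can then be sent as its own bulk request.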
2022-09-29 16:07:38 +01:00
David Kyle 17579ae1af
[ML] Add stat for non cache hit inference time (#90464) 2022-09-29 12:18:27 +01:00