10038 Commits

Author SHA1 Message Date
Iraklis Psaroudakis aa083ce419
[CI] Mute reference/cluster/nodes-stats (#91399)
relates #91081
2022-11-08 14:57:37 +02:00
Iraklis Psaroudakis dcdf58721d
[CI] Mute reference/cluster/nodes-stats/line_2735 (#91380)
relates #91081
2022-11-08 05:04:49 -05:00
Liam Thompson cd6be58860
[DOCS] Add reference for ingest pipelines in Enterprise Search (#91357) 2022-11-08 09:22:01 +01:00
Hendrik Muhs 1b556d75fa
mute another node stats test (#91346)
muting another test part as it causes a lot of CI failures

relates #91081
2022-11-07 06:07:09 -05:00
Lisa Cawley 99877382a0
[DOCS] Remove coming tag from release notes (#91330) 2022-11-04 18:36:37 -07:00
Lisa Cawley 9e83084020
[DOCS] Clarify description of geo_results (#91237) 2022-11-04 08:15:46 -07:00
David Kilfoyle 3295662697
[DOCS] Add time range info to TSDS docs (#91291)
* [DOCS] Add time range info to TSDS docs

* Fixup
2022-11-04 09:18:35 -04:00
Dimitris Athanasiou 4e67df8b05
[ML] Low priority trained model deployments (#91234)
This adds a new parameter to the start trained model deployment API,
namely `priority`. The available settings are `normal` and `low`.

For normal priority deployments the allocations get distributed so that
node processors are never oversubscribed.

Low priority deployments allow users to test model functionality even if there
are no node processors available. They are limited to 1 allocation with a single thread.
In addition, the process is executed in low priority which limits the amount of
CPU that can be used when the CPU is under pressure. The intention of this is to
limit the impact of low priority deployments on normal priority deployments.

When we rebalance model assignments we now:

  1. compute a plan just for normal priority deployments
  2. fix the resources used by normal deployments
  3. compute a plan just for low priority deployments
  4. merge the two plans
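
The four steps above can be sketched roughly as follows. This is a toy first-fit planner, not the assignment planner from the PR; `plan_normal`, `plan_low`, `rebalance` and the tuple layout are hypothetical names chosen for illustration.

```python
from typing import Dict, List, Tuple

Deployment = Tuple[str, int, int]  # (model id, allocations, threads per allocation)
Plan = Dict[str, Dict[str, int]]   # model id -> {node -> allocations placed there}


def plan_normal(deployments: List[Deployment], free: Dict[str, int]) -> Plan:
    """Step 1: first-fit placement that never oversubscribes a node's processors."""
    plan: Plan = {}
    for model, allocations, threads in deployments:
        placed: Dict[str, int] = {}
        for _ in range(allocations):
            node = next((n for n in sorted(free) if free[n] >= threads), None)
            if node is None:
                break  # no capacity left; remaining allocations stay unassigned
            free[node] -= threads  # step 2: these processors are now fixed
            placed[node] = placed.get(node, 0) + 1
        plan[model] = placed
    return plan


def plan_low(models: List[str], free: Dict[str, int]) -> Plan:
    """Step 3: one single-threaded allocation each, even if no processor is free."""
    plan: Plan = {}
    for model in models:
        node = max(sorted(free), key=lambda n: free[n])  # least loaded node
        free[node] -= 1  # may go negative: low priority is allowed to oversubscribe
        plan[model] = {node: 1}
    return plan


def rebalance(normal: List[Deployment], low: List[str], cores: Dict[str, int]) -> Plan:
    free = dict(cores)  # work on a copy so the caller's view is untouched
    normal_plan = plan_normal(normal, free)
    low_plan = plan_low(low, free)
    return {**normal_plan, **low_plan}  # step 4: merge the two plans
```

Because the low priority pass only sees what is left after the normal pass, it cannot take processors away from normal priority deployments in this sketch.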

Closes #91024
2022-11-04 14:22:30 +02:00
Hendrik Muhs 14b2d2d37e
[ML] frequent items filter (#91137)
add a filter to the frequent items agg that filters documents from the analysis while still calculating support on the full set

A filter is specified top-level in frequent_items:

"frequent_items": {
  "filter": {
    "term": {
      "host.name.keyword": "i-12345"
    }
   },
...

The above excludes documents that don't match the filter from the analysis, but still counts them when calculating support. That's in contrast to
specifying a query at the top level, in which case you find the same item sets, but don't know their importance relative to the full
document set.
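
A rough way to picture the difference, assuming a toy in-memory data set: candidate item sets come only from documents that pass the filter, while support is the fraction of all documents containing the set. The function name, the `items` key and the fixed item set size are hypothetical and not part of the aggregation.

```python
from collections import Counter
from itertools import combinations
from typing import Callable, Dict, List, Optional, Tuple


def frequent_item_sets(
    docs: List[dict],
    min_support: float,
    doc_filter: Optional[Callable[[dict], bool]] = None,
    size: int = 2,
) -> Dict[Tuple[str, ...], float]:
    """Mine item sets from filtered docs; compute support over the full set."""
    selected = [d for d in docs if doc_filter is None or doc_filter(d)]

    # Candidates are generated only from documents that pass the filter.
    candidates = set()
    for doc in selected:
        candidates.update(combinations(sorted(doc["items"]), size))

    # Support is counted over every document, filtered or not.
    counts: Counter = Counter()
    for doc in docs:
        items = set(doc["items"])
        for candidate in candidates:
            if items.issuperset(candidate):
                counts[candidate] += 1

    return {
        c: counts[c] / len(docs)
        for c in candidates
        if counts[c] / len(docs) >= min_support
    }
```

With a top-level query instead of the filter, `docs` itself would shrink, so the denominator of the support would no longer be the full document set.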
2022-11-03 13:58:40 +01:00
charliek17 4192c5b327
Update move-to-step.asciidoc (#91114) 2022-11-03 08:55:24 +00:00
Valeriy Khakhutskyy 7c4186ddbc
[ML] Update API documentation for anomaly score explanation (#91177)
This PR updates the API documentation to match the UI.

Co-authored-by: lcawl <lcawley@elastic.co>
2022-11-01 21:43:33 +01:00
Lisa Cawley 2d30bbab21
[DOCS] Semantic search endpoint (#91210) 2022-11-01 09:01:55 -07:00
Abdon Pijpelink 8abd39ab98
Fix typo in stop-tokenfilter.asciidoc (#91128) (#91207)
Since ignore_case is set to true in our custom stop words filter, the matching will be case-insensitive.

(cherry picked from commit a03fba9d77)

Co-authored-by: Siniša Subašić <68671543+sinisuba@users.noreply.github.com>
2022-11-01 15:32:16 +01:00
David Kilfoyle 56397f5d4c
[Docs] Remove feature flag from downsampling page (#91228) 2022-11-01 09:51:22 -04:00
Anthony McGlone 0249d1650f
[DOCS] Update the feature state example in the snapshot and restore docs (#90328) 2022-11-01 10:17:29 +09:00
Lisa Cawley f0c12cdeea
[DOCS] Fix typo in knn-search.asciidoc (#91206) 2022-10-31 10:07:53 -07:00
Mary Gouseti d55059afab
Mute reference/cluster/nodes-stats/line_2751 (#91174) 2022-10-28 11:55:53 +02:00
Julie Tibshirani 1b249639f1
Remove experimental marking from kNN search (#91065)
This commit removes the experimental tag from kNN search docs and makes some
docs improvements:
* Add a prominent warning about memory usage in the kNN search guide
* Link to the performance tuning guide from the main guide
* Clarify the memory requirements section in the tuning guide
2022-10-27 18:00:56 +02:00
Yang Wang 882fbe62b5
[Doc] Improve doc for certutil parameter applicability (#91124)
The http command does not take most of the parameters. This PR ensures
it is consistently documented for all parameters.
2022-10-27 09:38:56 +11:00
Frederic Dartayre fe0036fdbf
Update threadpool.asciidoc (#90098)
* Update threadpool.asciidoc

Starting from 8.0 the value of the `node.processors` setting is bounded by the number of available
processors https://github.com/elastic/elasticsearch/pull/44894

* Update docs/reference/modules/threadpool.asciidoc

Co-authored-by: Adam Locke <adam.locke@elastic.co>
2022-10-26 14:04:39 -04:00
Craig Taverner c19f642d94
Refine geo-point and geo-shape docs (#90913)
* Refine geo-point and geo-shape docs

While reviewing the docs for another issue, some deprecated
references to prefix-trees were discovered, leading to interest
in bringing the docs a little more up-to-date.

* Update docs/reference/mapping/types/geo-point.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* Update docs/reference/mapping/types/geo-shape.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2022-10-26 12:21:34 +02:00
Hendrik Muhs 82a71f6ef6
[Transform] add a health section to transform stats (#90760)
adds a health section to the transform stats endpoint and implements reporting assignment, indexing/search and persistence problems, together with an overall health state.
2022-10-25 09:01:21 +02:00
Flavio 83694c37a3
Update docker image (#90730) 2022-10-24 15:52:36 -04:00
Stéphane Campinas 8c44ed1442
Fix itemized list (#90855) 2022-10-24 15:14:17 -04:00
Przemysław Witek 95f484c4fd
[Transform] Expand the docs section regarding mappings deduction in transform's dest index (#91077) 2022-10-24 13:43:22 +02:00
Christos Soulios 1f265eb725
[DOCS] Add release notes for 8.5.0 (#91063)
Forward port PR (#91029) with release notes for version 8.5.0
  - Add release notes for v8.5.0 after BC6 has been cut
2022-10-21 13:17:33 +03:00
Jack Conradson f28ae4b288
Add support for indexing byte-sized knn vectors (#90774)
This change adds an element_type as an optional mapping parameter for dense vector fields as 
described in #89784. This also adds a byte element_type for dense vector fields that supports storing 
dense vectors using only 8-bits per dimension. This is only supported when the mapping parameter 
index is set to true.

The code follows a similar pattern to our NumberFieldMapper where we have an enum for 
ElementType, and it has methods that DenseVectorFieldType and DenseVectorMapper can delegate to 
to support each available type (just float and byte for now).
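
For illustration, a mapping that uses the new parameter might be built like this. The helper names, the field name and the `cosine` similarity are assumptions chosen for the example; `element_type`, `byte` and `index` come from the description above, and the range check reflects the signed 8-bit range.

```python
import json
from typing import List


def dense_vector_mapping(field: str, dims: int, element_type: str = "float") -> dict:
    """Build an index mapping body for a dense_vector field (illustrative only)."""
    if element_type not in ("float", "byte"):
        raise ValueError(f"unsupported element_type: {element_type}")
    return {
        "mappings": {
            "properties": {
                field: {
                    "type": "dense_vector",
                    "element_type": element_type,
                    "dims": dims,
                    "index": True,  # byte vectors are only supported when indexed
                    "similarity": "cosine",  # assumption: any supported similarity
                }
            }
        }
    }


def validate_byte_vector(values: List[int]) -> List[int]:
    """Check that every dimension fits in a signed 8-bit integer."""
    for v in values:
        if not isinstance(v, int) or not -128 <= v <= 127:
            raise ValueError(f"value out of byte range: {v!r}")
    return values


if __name__ == "__main__":
    print(json.dumps(dense_vector_mapping("my_vector", 3, "byte"), indent=2))
```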
2022-10-20 14:45:58 -07:00
Iraklis Psaroudakis 0f4374f4fb
Explain disk headroom settings more in docs (#90763)
Relates to #81406
2022-10-20 18:45:23 +03:00
Roberto Seldner 8e35a6a846
Update documentation with supported IANA numbers (#90531)
Based on this:
https://github.com/elastic/elasticsearch/blob/main/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/CommunityIdProcessor.java#L440-L451
2022-10-19 08:23:11 -05:00
Leaf-Lin 14ef513f2c
[DOCS] Add CCR limitation (#87348)
* Add CCR limitation

closes https://github.com/elastic/elasticsearch/issues/86121

* Add restored index auto follow pattern restriction

https://github.com/elastic/elasticsearch/issues/87055

* Moving content to existing CCR page + several changes

* Remove sections to consolidate limitation information

* Delete separate file

* Remove restored indices from list of things that aren't replicated

Co-authored-by: Adam Locke <adam.locke@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2022-10-17 16:05:29 -04:00
Lisa Cawley 2dd7732553
[DOCS] Add ML CPP PRs to release notes (#90961) 2022-10-17 09:58:40 -07:00
Mary Gouseti cfd23d512f
Disk indicator troubleshooting guides (#90504) 2022-10-14 15:24:21 +02:00
Paramdeep Singh 34ff7a9d98
Consolidated Circuit Breaker documentation to include EQL and ML infer (#90809)
Fixes #85851 

Co-authored-by: Iraklis Psaroudakis <kingherc@gmail.com>
2022-10-14 14:33:52 +03:00
Przemyslaw Gomulka aa922754af
Add known issues entry about date rounding bug (#90721)
add entry to all affected versions

relates #90187
2022-10-14 11:51:02 +02:00
Francisco Fernández Castaño 1a3032beb6
Keep track of average shard write load (#90768)
This commit adds a new field, write_load, into the shard stats. This new stat exposes the average number of write threads used while indexing documents.

Closes #90102
2022-10-13 16:34:45 +02:00
David Kyle 9e6a784aa5
[ML] Semantic search endpoint (#90450)
Adds a {index}_semantic_search endpoint which first converts the query text into a dense vector
using an NLP text embedding model then performs a knn search against an index containing 
using a NLP text embedding model then performs a knn search against an index containing 
dense vectors created with the same embedding model.
2022-10-13 13:17:30 +01:00
David Roberts be006e2eee
[ML] Improve categorize_text docs (#90765)
Adds more detail about the meaning of the results
fields of the `categorize_text` aggregation, and
advice about how to use these fields when searching
for messages that match the categories.

Followup to #90723
2022-10-13 10:46:53 +01:00
Julie Tibshirani f4038b3f15
Add guide for tuning kNN search (#89782)
This 'how to' guide explains performance considerations specific to kNN search.
It takes inspiration from the 'tune for search speed' guide.
2022-10-12 14:53:53 -07:00
Nik Everett 82aeb478db
Synthetic `_source`: support `wildcard` field (#90196)
This adds synthetic `_source` support for the `wildcard` field type.
2022-10-12 15:55:13 -04:00
David Kilfoyle cad87c4d5a
[DOCS] Add Downsampling docs (#88571)
This adds documentation for downsampling of time series indices.
2022-10-12 12:10:16 -04:00
Valeriy Khakhutskyy 95758e88a2
[ML] Explain anomaly score factors (#90675)
This PR surfaces new information about the impact of the factors on the initial anomaly score in the anomaly record:

- single bucket impact is determined by the deviation between actual and typical in the current bucket
- multi-bucket impact is determined by the deviation between actual and typical in the past 12 buckets
- anomaly characteristics are statistical properties of the current anomaly compared to the historical observations
- high variance penalty is the reduction of anomaly score in the buckets with large confidence intervals.
- incomplete bucket penalty is the reduction of anomaly score in the buckets with fewer samples than historically expected.

Additionally, we compute lower- and upper-confidence bounds and the typical value for the anomaly records. This improves the explainability of the cases where the model plot is not activated with only a slight overhead in performance (1-2%).
2022-10-12 16:57:06 +02:00
Luca Cavanna 18942d5b11
Enhance nested depth tracking when parsing queries (#90425)
When parsing queries on the coordinating node, there is currently no way to share state between the different parsing methods (`fromXContent`). The only query that supports a parse context is the bool query, which uses the context to track the nested depth of queries, added with #66204. This nested depth tracking mechanism is not 100% accurate because it tracks bool queries only, while there are many more query types that can hold other queries and hence potentially cause a stack overflow when deeply nested.

This change removes the parsing context that's specific to bool query, introduced with #66204, in favour of generalizing the nested depth tracking to all query types.

The generic tracking is introduced by wrapping the parser and overriding the method that parses named objects through the xcontent registry. Another way would have been to require a context argument when parsing queries, which would mean adding a context argument to all the QueryBuilder#fromXContent static methods. That would be a breaking change for plugins that provide custom queries, hence I went for trying out a different approach.

One aspect that this change requires and introduces is the distinction between parsing a top level query (which will wrap the parser, or it would create the context if we had one), as opposed to parsing an inner query, which goes ahead with the given parser and context. We already have this distinction as we have two different static methods in `AbstractQueryBuilder` but in practice only bool query makes the distinction being the only context-aware query.

In addition to generalizing tracking nested depth when parsing queries, we should be able to adopt this same strategy to track queries usage as part of #90176.

Given that the depth check is now more restrictive, as it counts all compound queries and not only bool, we have decided to raise the default limit to `30` to ensure that users are not going to hit the limit due to this change.
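
The idea of counting every query type, not only bool, can be sketched on plain dictionaries. This is not the parser-wrapping implementation from the PR: `QUERY_NAMES` stands in for the named-object registry, `max_query_depth` is a hypothetical helper, and whether the real check rejects at or above the limit is an assumption.

```python
from typing import Any

# Stand-in for the registry of named query objects (illustrative subset).
QUERY_NAMES = {"bool", "constant_score", "dis_max", "nested", "term", "match"}


def max_query_depth(node: Any, depth: int = 0, limit: int = 30) -> int:
    """Return the deepest query nesting under node; raise once limit is exceeded."""
    if isinstance(node, list):
        return max((max_query_depth(i, depth, limit) for i in node), default=depth)
    if not isinstance(node, dict):
        return depth
    if len(node) == 1:
        (name, body), = node.items()
        # Heuristic: a single-key object named after a query counts as one level.
        # A field that happens to share a query name would be miscounted.
        if name in QUERY_NAMES and isinstance(body, dict):
            depth += 1
            if depth > limit:
                raise ValueError(f"query nested deeper than {limit} levels")
            return max_query_depth(body, depth, limit)
    return max((max_query_depth(v, depth, limit) for v in node.values()), default=depth)
```

Here a `constant_score` wrapped around a `term` query counts as two levels, which a bool-only counter would have reported as zero.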
2022-10-12 15:15:06 +02:00
Albert Zaharovits 73cdc7b80a
DOC CCR Disaster recovery does not handle Security configuration (#85522)
We do not support and don't plan to support disaster recovery arrangements
where Security configuration is replicated between the production and the
disaster recovery cluster because the cluster-local Security APIs assume
exclusive write on the .security system index.
2022-10-12 13:53:53 +03:00
Ed Savage f355787165
[ML] Allow overriding timestamp field to null in file structure finder (#90764)
Use a magic value of "null" for the timestamp format override to indicate to the analysis that a timestamp is not expected in the input text. This should improve performance when analysing delimited, ndjson or xml formatted text files that don't contain timestamps. For semi-structured text files without timestamps the magic value indicates to treat the text as single line log messages.

see #55219
2022-10-12 09:08:25 +01:00
Dimitris Athanasiou 16bfc550ea
[ML] Add api to update trained model deployment number_of_allocations (#90728)
This commit adds a new API that users can call as follows:

```
POST _ml/trained_models/{model_id}/deployment/_update
{
  "number_of_allocations": 4
}
```

This allows a user to update the number of allocations for a deployment
that is `started`.

If the allocations are increased we rebalance and let the assignment
planner find how to allocate the additional allocations.

If the allocations are decreased we cannot use the assignment planner.
Instead, we implement the reduction in a new class `AllocationReducer`
that tries to reduce the allocations so that:

  1. availability zone balance is maintained
  2. assignments that can be completely stopped are preferred to release memory
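
A greedy sketch of these two goals, assuming each node belongs to exactly one availability zone. It is not the `AllocationReducer` class from the PR; `reduce_allocations` and its arguments are hypothetical.

```python
from collections import defaultdict
from typing import Dict


def reduce_allocations(
    assignments: Dict[str, int], zone_of: Dict[str, str], remove: int
) -> Dict[str, int]:
    """Remove `remove` allocations one at a time, keeping zones balanced."""
    result = dict(assignments)
    for _ in range(remove):
        live = {n: a for n, a in result.items() if a > 0}
        if not live:
            break  # nothing left to remove

        per_zone: Dict[str, int] = defaultdict(int)
        for node, allocations in live.items():
            per_zone[zone_of[node]] += allocations

        # Goal 1: take from the zone that currently holds the most allocations.
        zone = max(sorted(per_zone), key=lambda z: per_zone[z])

        # Goal 2: within that zone, drain the smallest assignment first so that
        # it can be stopped completely and its memory released.
        in_zone = sorted(n for n in live if zone_of[n] == zone)
        node = min(in_zone, key=lambda n: live[n])
        result[node] -= 1

    # Nodes that reached zero allocations are dropped from the assignment.
    return {n: a for n, a in result.items() if a > 0}
```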
2022-10-12 10:04:23 +03:00
David Roberts bfccd20155
[ML] Add a regex to the output of the categorize_text aggregation (#90723)
The new `regex` field in `categorize_text` output is created in
the same way as the `regex` field that appears in the category
definitions created by anomaly detection jobs that do categorization.

It consists of the terms that occur in the same order for every
message that matches the category, separated with a `.+?` wildcard.
It therefore matches the category messages and enforces the order
of the terms that occurred in the same order for all messages used
to create the category.

It is not recommended to use the regex as the primary mechanism for
searching for the original documents that were categorized. Search
using a regular expression is very slow. Instead the terms of the
category should be used to search for matching documents, as a
terms search can use the inverted index and hence be much faster.
However, there may be situations where it is useful to use the
`regex` field to test whether a small set of messages that have not
been indexed match the category.
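
As a small illustration of that last use case, the following sketch builds a pattern from ordered terms joined with `.+?` and tests a few unindexed messages against it. The helper names and sample terms are made up, and the real `regex` field may include details this sketch omits.

```python
import re
from typing import List


def category_regex(terms: List[str]) -> str:
    """Join the ordered category terms with a lazy `.+?` wildcard."""
    return ".+?".join(re.escape(term) for term in terms)


def matches_category(terms: List[str], message: str) -> bool:
    """Test one unindexed message against the category pattern."""
    return re.search(category_regex(terms), message) is not None


if __name__ == "__main__":
    terms = ["Node", "started"]  # hypothetical category terms
    for message in ["Node node-1 started", "started Node"]:
        print(message, "->", matches_category(terms, message))
```

The second message contains both terms but in the wrong order, so it does not match.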
2022-10-10 11:41:16 +01:00
Andrei Dan b55f5fd77b
Rename the fields reported under details by the disk indicator (#90717)
Currently, we report the count of affected nodes and indices as part of
the disk indicator using a leaky abstraction. Namely, we use the status
we assign to nodes internally based on their disk usage (red,
yellow, green, unknown).

However, these statuses don't have an explicit meaning outside the
implementation details, e.g. a red node would probably convey that it's a node
experiencing disk issues, but not what kind.

This proposes being explicit in what we return to our health API users
e.g.
```
"details": {
  "indices_with_readonly_block": 2,
  "nodes_with_enough_disk_space": 0,
  "nodes_with_unknown_disk_status": 0,
  "nodes_over_high_watermark": 0,
  "nodes_over_flood_watermark": 2
}
```
2022-10-10 11:30:03 +01:00
Lisa Cawley db2882cbb5
[DOCS] Add links to clear trained model deployment cache API (#90727) 2022-10-06 10:10:55 -07:00
Brandon Morelli ced1447db0
docs: update fleet/agent pipeline docs (#90659)
* docs: update fleet/agent pipeline docs

* Apply suggestions from code review

Co-authored-by: Adam Locke <adam.locke@elastic.co>

Co-authored-by: Adam Locke <adam.locke@elastic.co>
2022-10-05 13:06:58 -07:00
Jack Conradson 8b0d0716d1
Add profiling and documentation for dfs phase (#90536)
Adds profiling statistics for the dfs phase, and adds documentation for both the dfs phase profiling 
and kNN profiling.

Closes #89713
2022-10-05 09:54:36 -07:00