Commit Graph

379 Commits

Author SHA1 Message Date
István Zoltán Szabó 1ce2308e2a
[DOCS] Adds max_trees hyperparameter to GET TM API docs (#72298) 2021-05-06 08:18:19 +02:00
István Zoltán Szabó d07c174aaf
[DOCS] Revises required privileges info in Anomaly Detection API docs (#72483) 2021-05-03 10:20:14 +02:00
Benjamin Trent 2ce4d175f0
[ML] increase the default value of xpack.ml.max_open_jobs from 20 to 512 for autoscaling improvements (#72487)
This commit increases the xpack.ml.max_open_jobs from 20 to 512. Additionally, it ignores nodes that cannot provide an accurate view into their native memory.

If a node does not have a view into its native memory, we ignore it for assignment.

This effectively fixes a bug with autoscaling. Autoscaling relies on jobs with adequate memory to assign jobs to nodes. If that is hampered by the xpack.ml.max_open_jobs scaling decisions are hampered.
2021-04-30 07:55:57 -04:00
István Zoltán Szabó ce9dd74cf5
[DOCS] Expands DFA and TM API docs with required privileges info (#71335) 2021-04-28 08:33:42 +02:00
Pierre Grimaud 3c44dfec60
[DOCS] Fix typos (#72227) 2021-04-26 12:40:38 -04:00
István Zoltán Szabó 2f122f03b2
[DOCS] Adds anomaly detection rule advanced settings to docs (#72072)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2021-04-26 09:55:02 +02:00
István Zoltán Szabó aca0a7ffa4
[DOCS] Alters examples in anomaly detection page to use runtime mappings (#71745) 2021-04-19 13:06:50 +02:00
Benjamin Trent 01fc8ed246
[ML] adding ability to update runtime_mappings via datafeed config update API (#71707)
Adds runtime_mappings as an updatable field via datafeed config update.

closes: #71702
2021-04-15 09:44:34 -04:00
István Zoltán Szabó ce389dff5d
[DOCS] Clarifies that custom rules are job rules in Kibana (#71678)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2021-04-15 09:33:03 +02:00
James Rodewig 693807a6d3
[DOCS] Fix double spaces (#71082) 2021-03-31 09:57:47 -04:00
Benjamin Trent c8415a7924
[ML] adding support for composite aggs in anomaly detection (#69970)
This commit allows for composite aggregations in datafeeds. 

Composite aggs provide a much better solution for having influencers, partitions, etc. on high volume data. Instead of worrying about long scrolls in the datafeed, the calculation is distributed across cluster via the aggregations. 

The restrictions for this support are as follows:

- The composite aggregation must have EXACTLY one `date_histogram` source
- The sub-aggs of the composite aggregation must have a `max` aggregation on the SAME timefield as the aforementioned `date_histogram` source
- The composite agg must be the ONLY top level agg and it cannot have a `composite` or `date_histogram` sub-agg
- If using a `date_histogram` to bucket time, it cannot have a `composite` sub-agg.
- The top-level `composite` agg cannot have a sibling pipeline agg. Pipeline aggregations are supported as a sub-agg (thus a pipeline agg INSIDE the bucket).

Some key user interaction differences:
- Speed + resources used by the cluster should be controlled by the `size` parameter in the `composite` aggregation. Previously, we said if you are using aggs, use a specific `chunking_config`. But, with composite, that is not necessary. 
- Users really shouldn't use nested `terms` aggs anylonger. While this is still a "valid" configuration and MAY be desirable for some users (only wanting the top 10 of certain terms), typically when users want influencers, partition fields, etc. they want the ENTIRE population. Previously, this really wasn't possible with aggs, with `composite` it is.
- I cannot really think of a typical usecase that SHOULD ever use a multi-bucket aggregation that is NOT supported by composite.
2021-03-30 08:25:40 -04:00
István Zoltán Szabó 1db2b85e45
[DOCS] Adds source index privileges required for Explain DFA API docs. (#70978) 2021-03-30 10:42:48 +02:00
Benjamin Trent b796632582
[ML] Allow datafeed and job configs for datafeed preview API (#70836)
Previously, a datafeed and job must already exist for the `_preview` API to work.

With this change, users can get an accurate preview of the data that will be sent to the anomaly detection job
without creating either of them. 

closes https://github.com/elastic/elasticsearch/issues/70264
2021-03-26 12:52:23 -04:00
István Zoltán Szabó 9a8c6fb66f
[DOCS] Removes beta labels from DFA related docs. (#70808) 2021-03-26 09:46:41 +01:00
István Zoltán Szabó 165c0ddaeb
[DOCS] Updates anomaly detection alert docs with the new alerting terminology (#70486)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2021-03-18 18:23:19 +01:00
Benjamin Trent 10e637d97c
[ML] allow documents to be out of order within the same time bucket (#70468)
This commit allows documents seen within the same time bucket to be out of order.

This is already supported within the native process.

Additionally, when recording the "latest" record timestamp, we were assuming that the latest seen document was truly the "latest". This is not really the case if latency is utilized or if documents come out of order within the same bucket.
2021-03-17 09:34:49 -04:00
James Rodewig 5c75d004fa
[DOCS] Replace `put` with `create or update` in API names (#70330)
Co-authored-by: debadair <debadair@elastic.co>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2021-03-15 14:49:44 -04:00
István Zoltán Szabó 59f6280a7b
[DOCS] Changes deprecated syntax to node.role style in datafeed docs. (#70201) 2021-03-10 15:46:01 +01:00
Lisa Cawley 2caba7b11f
[DOCS] Edits machine learning settings (#69947)
Co-authored-by: David Roberts <dave.roberts@elastic.co>
2021-03-09 10:59:12 -08:00
István Zoltán Szabó c226958947
[DOCS] Expands anomaly detection alert type docs (#70026)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
Co-authored-by: Dima Arnautov <arnautov.dima@gmail.com>
2021-03-09 12:02:16 +01:00
Lisa Cawley c537e5f38c
[DOCS] Edits delete trained model alias API (#70119) 2021-03-08 17:08:58 -08:00
István Zoltán Szabó 8a7aced8e8
[DOCS] Adds beta tag to anomaly detection alert docs. (#70013) 2021-03-08 10:46:24 +01:00
István Zoltán Szabó 2ccc81081f
[DOCS] Adds hyperparameters option to the include setting of GET trained models API. (#69959) 2021-03-04 16:43:06 +01:00
Joe Gallo 1e8b5fa7c2
Remove the _ml/find-file-structure docs (#69823) 2021-03-03 09:49:28 -05:00
Benjamin Trent 2279cafb4e
[ML] adding new _preview endpoint for data frame analytics (#69453)
This commit adds a new `_preview` endpoint for data frame analytics. 

This allows users to see the data on which their model will be trained. This is especially useful 
in the arrival of custom feature processors.

The API design is a similar to datafeed `_preview` and data frame analytics `_explain`.
2021-03-01 12:25:50 -05:00
Lisa Cawley 138224b398
[DOCS] Edits trained model alias API (#69491) 2021-02-24 08:17:49 -08:00
István Zoltán Szabó 77d0f56581
[DOCS] Adds anomaly detection alert documentation (#68923)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2021-02-23 10:29:54 +01:00
Dimitris Athanasiou 7fb98c0d3c
[ML] Add runtime mappings to data frame analytics source config (#69183)
Users can now specify runtime mappings as part of the source config
of a data frame analytics job. Those runtime mappings become part of
the mapping of the destination index. This ensures the fields are
accessible in the destination index even if the relevant data frame
analytics job gets deleted.

Closes #65056
2021-02-19 16:29:19 +02:00
Benjamin Trent 0af38bba9e
[ML] add new delete trained model aliases API (#69195)
In addition to creating and re-assigning model aliases, users should be able to delete existing and unused model aliases.
2021-02-18 13:12:07 -05:00
Lisa Cawley 55f0e32fe4
[DOCS] Clarify put data frame analytics API feature processors option (#69158) 2021-02-18 08:53:46 -08:00
Benjamin Trent 26eef892df
[ML] adds new trained model alias API to simplify trained model updates and deployments (#68922)
A `model_alias` allows trained models to be referred by a user defined moniker. 

This not only improves the readability and simplicity of numerous API calls, but it allows for simpler deployment and upgrade procedures for trained models. 

Previously, if you referenced a model ID directly within an ingest pipeline, when you have a new model that performs better than an earlier referenced model, you have to update the pipeline itself. If this model was used in numerous pipelines, ALL those pipelines would have to be updated. 

When using a `model_alias` in an ingest pipeline, only that `model_alias` needs to be updated. Then, the underlying referenced model will change in place for all ingest pipelines automatically. 

An additional benefit is that the model referenced is not changed until it is fully loaded into cache, this way throughput is not hampered by changing models.
2021-02-18 09:41:50 -05:00
James Rodewig 9b88ae92e6
[DOCS] Fix typos for duplicate words (#69125) 2021-02-17 10:34:20 -05:00
Lisa Cawley a1fb2c3606
[DOCS] Fixes n_gram_encoding in data frame analytics APIs (#69084) 2021-02-16 14:02:00 -08:00
Lisa Cawley 8b6ec07613
[DOCS] Edits ML hyperparameter descriptions (#68880) 2021-02-11 11:55:28 -08:00
Lisa Cawley 683368cc4d
[DOCS] Clarify soft_tree_depth_limit (#68787)
Co-authored-by: Tom Veasey <tveasey@users.noreply.github.com>
2021-02-10 12:51:01 -08:00
István Zoltán Szabó e45d7a942d
[DOCS] Expands feature processors property description and adds a link of conceptual docs (#68213) 2021-02-02 14:48:43 +01:00
Valeriy Khakhutskyy 78368428b3
[ML] Add early stopping DFA configuration parameter (#68099)
The PR adds early_stopping_enabled optional data frame analysis configuration parameter. The enhancement was already described in elastic/ml-cpp#1676 and so I mark it here as non-issue.
2021-02-01 11:41:28 +01:00
Dimitris Athanasiou 5c961c1c81
[ML] Expand regression/classification hyperparameters (#67950)
Expands data frame analytics regression and classification
analyses with the followin hyperparameters:

- alpha
- downsample_factor
- eta_growth_rate_per_tree
- max_optimization_rounds_per_hyperparameter
- soft_tree_depth_limit
- soft_tree_depth_tolerance
2021-01-26 12:56:41 +02:00
István Zoltán Szabó addb5cbd3a
[DOCS] Adds custom feature processors description to PUT DFA API (#67424)
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2021-01-19 09:47:32 +01:00
Dimitris Athanasiou 7574013604
[ML] Remove DFA job states reindexing and analyzing from docs (#67658)
These states do no longer exist as of #67423
2021-01-18 17:39:22 +02:00
Benjamin Trent 35f478b618
[ML] [DOCS] adding missing fields to the get trained models API docs (#67590)
Adds missing fields description, inference_config, and input to the GET trained models API documentation
2021-01-15 13:20:53 -05:00
Benjamin Trent 24ebcc8c24
[ML] [DOCS] update find-structure reference docs (#67586)
The text structure finder API documentation had many references to the "files". While this is one use of the API, the API now has a more generic name. This commit replaces many references to the word "file" to the more generic word "text".
2021-01-15 12:19:38 -05:00
István Zoltán Szabó 085a288af5
[DOCS] Adds hyperparameter metadata property to GET trained models API docs. (#67412) 2021-01-13 13:49:51 +01:00
Lisa Cawley 401d302c69
[DOCS] Move find file structure to a new API endpoint (#67314) 2021-01-12 11:59:45 -08:00
Benjamin Trent af179ab2f5
[ML] move find file structure to a new API endpoint (#67123)
This introduces a new `text-structure` plugin. This is the new home of the find file structure API. 

The old REST URL is still available but is deprecated.

The new URL is: `_text_structure/find_structure`. All parameters and behavior are unchanged.

Changes to the high-level REST client and docs will be in separate commit.

related to: https://github.com/elastic/elasticsearch/issues/67001
2021-01-11 08:56:02 -05:00
Lisa Cawley eff9dfc3a4
[DOCS] Clarify impact of delayed data in anomaly detection (#66816)
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2021-01-05 12:14:51 -08:00
István Zoltán Szabó d3ad9fe632
[DOCS] Improves inference processor linking and docs (#66119) 2021-01-05 09:42:06 +01:00
David Roberts c5bef7f9a7
[ML] Deprecate anomaly detection post data endpoint (#66347)
There is little evidence of this endpoint being used
and there is quite a lot of code complexity associated
with the various formats that can be used to upload
data and the different errors that can occur when direct
data upload is open to end users.

In a future release we can make this endpoint internal
so that only datafeeds can use it, and remove all the
options and formats that are not used by datafeeds.

End users will have to store their input data for
anomaly detection in Elasticsearch indices (which we
believe all do today) and use a datafeed to feed it
to anomaly detection jobs.
2020-12-15 18:37:20 +00:00
Dimitris Athanasiou 3bed6661de
[ML] Add log_time to AD data_counts and decide current based on it (#66343)
This commit is fixing a potential bug if we support anomaly detection
results index rollover in the future.

In particular, we determine the current `data_counts` by sorting on the
latest record time. However, this is not correct if the job reverts
to an older model snapshot. To fix this we add `log_time` to `data_counts`
(similarly to `model_size_stats`) and sort on `log_time` to figure
out the current counts for the job.
2020-12-15 19:09:13 +02:00
István Zoltán Szabó bc989e4a86
[DOCS] Adds note about data_counts values to Revert snapshot API docs. (#66085) 2020-12-09 10:47:51 +01:00