Commit Graph

486 Commits

Author SHA1 Message Date
ymao1 c727b40d0b
[Docs] Update cross-document links to Kibana Alerting docs (#74034)
* Updating cross-document links

* PR fixes
2021-06-14 12:23:47 -04:00
Dimitris Athanasiou dc61a72c9e
[ML] Reset anomaly detection job API (#73908)
Adds a new API that allows a user to reset
an anomaly detection job.

To use the API do:

```
POST _ml/anomaly_detectors/<job_id>_reset
```

The API removes all data associated to the job.
In particular, it deletes model state, results and stats.

However, job notifications and user annotations are not removed.

Also, the API can be called asynchronously by setting the parameter
`wait_for_completion` to `false` (defaults to `true`). When run
that way the API returns the task id for further monitoring.

In order to prevent the job from opening while it is resetting,
a new job field has been added called `blocked`. It is an object
that contains a `reason` and the `task_id`. `reason` can take
a value from ["delete", "reset", "revert"] as all these
operations should block the job from opening. The `task_id` is also
included in order to allow tracking the task if necessary.

Finally, this commit also sets the `blocked` field when
the revert snapshot API is called as a job should not be opened
while it is reverted to a different model snapshot.
2021-06-14 18:56:28 +03:00
Benjamin Trent 8d882863d7
[ML] adding running_state to datafeed stats object (#73926)
It is useful to know the following information when reading datafeed stats:

 - Is the datafeed a "real-time" datafeed, i.e. a datafeed without a configured `end` time
 - Has the datafeed processed all past data available at the time of starting.

This object is only available if the datafeed task has been created.

It has the form:

```
"running_state": {
  "is_real_time": <boolean>,
  "look_back_finished": <boolean>
}
```
2021-06-10 08:08:49 -04:00
István Zoltán Szabó 20d0dc300f
[DOCS] Updates datafeed related runtime field examples (#73725) 2021-06-08 11:27:55 +02:00
Lisa Cawley a6339918ac
[DOCS] Adds defaults to get ML results APIs (#73540)
Co-authored-by: David Roberts <dave.roberts@elastic.co>
2021-06-03 10:05:47 -07:00
István Zoltán Szabó 44c26c8bdc
[DOCS] Removes Kibana charts-related advise about agg interval and bucket span. (#73673) 2021-06-02 16:47:01 +02:00
David Roberts 0059c59e25
[ML] Make ml_standard tokenizer the default for new categorization jobs (#72805)
Categorization jobs created once the entire cluster is upgraded to
version 7.14 or higher will default to using the new ml_standard
tokenizer rather than the previous default of the ml_classic
tokenizer, and will incorporate the new first_non_blank_line char
filter so that categorization is based purely on the first non-blank
line of each message.

The difference between the ml_classic and ml_standard tokenizers
is that ml_classic splits on slashes and colons, so creates multiple
tokens from URLs and filesystem paths, whereas ml_standard attempts
to keep URLs, email addresses and filesystem paths as single tokens.

It is still possible to config the ml_classic tokenizer if you
prefer: just provide a categorization_analyzer within your
analysis_config and whichever tokenizer you choose (which could be
ml_classic or any other Elasticsearch tokenizer) will be used.

To opt out of using first_non_blank_line as a default char filter,
you must explicitly specify a categorization_analyzer that does not
include it.

If no categorization_analyzer is specified but categorization_filters
are specified then the categorization filters are converted to char
filters applied that are applied after first_non_blank_line.

Closes elastic/ml-cpp#1724
2021-06-01 15:11:32 +01:00
István Zoltán Szabó 1ce2308e2a
[DOCS] Adds max_trees hyperparameter to GET TM API docs (#72298) 2021-05-06 08:18:19 +02:00
István Zoltán Szabó d07c174aaf
[DOCS] Revises required privileges info in Anomaly Detection API docs (#72483) 2021-05-03 10:20:14 +02:00
Benjamin Trent 2ce4d175f0
[ML] increase the default value of xpack.ml.max_open_jobs from 20 to 512 for autoscaling improvements (#72487)
This commit increases the xpack.ml.max_open_jobs from 20 to 512. Additionally, it ignores nodes that cannot provide an accurate view into their native memory.

If a node does not have a view into its native memory, we ignore it for assignment.

This effectively fixes a bug with autoscaling. Autoscaling relies on jobs with adequate memory to assign jobs to nodes. If that is hampered by the xpack.ml.max_open_jobs scaling decisions are hampered.
2021-04-30 07:55:57 -04:00
István Zoltán Szabó ce9dd74cf5
[DOCS] Expands DFA and TM API docs with required privileges info (#71335) 2021-04-28 08:33:42 +02:00
Pierre Grimaud 3c44dfec60
[DOCS] Fix typos (#72227) 2021-04-26 12:40:38 -04:00
István Zoltán Szabó 2f122f03b2
[DOCS] Adds anomaly detection rule advanced settings to docs (#72072)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2021-04-26 09:55:02 +02:00
István Zoltán Szabó aca0a7ffa4
[DOCS] Alters examples in anomaly detection page to use runtime mappings (#71745) 2021-04-19 13:06:50 +02:00
Benjamin Trent 01fc8ed246
[ML] adding ability to update runtime_mappings via datafeed config update API (#71707)
Adds runtime_mappings as an updatable field via datafeed config update.

closes: #71702
2021-04-15 09:44:34 -04:00
István Zoltán Szabó ce389dff5d
[DOCS] Clarifies that custom rules are job rules in Kibana (#71678)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2021-04-15 09:33:03 +02:00
James Rodewig 693807a6d3
[DOCS] Fix double spaces (#71082) 2021-03-31 09:57:47 -04:00
Benjamin Trent c8415a7924
[ML] adding support for composite aggs in anomaly detection (#69970)
This commit allows for composite aggregations in datafeeds. 

Composite aggs provide a much better solution for having influencers, partitions, etc. on high volume data. Instead of worrying about long scrolls in the datafeed, the calculation is distributed across cluster via the aggregations. 

The restrictions for this support are as follows:

- The composite aggregation must have EXACTLY one `date_histogram` source
- The sub-aggs of the composite aggregation must have a `max` aggregation on the SAME timefield as the aforementioned `date_histogram` source
- The composite agg must be the ONLY top level agg and it cannot have a `composite` or `date_histogram` sub-agg
- If using a `date_histogram` to bucket time, it cannot have a `composite` sub-agg.
- The top-level `composite` agg cannot have a sibling pipeline agg. Pipeline aggregations are supported as a sub-agg (thus a pipeline agg INSIDE the bucket).

Some key user interaction differences:
- Speed + resources used by the cluster should be controlled by the `size` parameter in the `composite` aggregation. Previously, we said if you are using aggs, use a specific `chunking_config`. But, with composite, that is not necessary. 
- Users really shouldn't use nested `terms` aggs anylonger. While this is still a "valid" configuration and MAY be desirable for some users (only wanting the top 10 of certain terms), typically when users want influencers, partition fields, etc. they want the ENTIRE population. Previously, this really wasn't possible with aggs, with `composite` it is.
- I cannot really think of a typical usecase that SHOULD ever use a multi-bucket aggregation that is NOT supported by composite.
2021-03-30 08:25:40 -04:00
István Zoltán Szabó 1db2b85e45
[DOCS] Adds source index privileges required for Explain DFA API docs. (#70978) 2021-03-30 10:42:48 +02:00
Benjamin Trent b796632582
[ML] Allow datafeed and job configs for datafeed preview API (#70836)
Previously, a datafeed and job must already exist for the `_preview` API to work.

With this change, users can get an accurate preview of the data that will be sent to the anomaly detection job
without creating either of them. 

closes https://github.com/elastic/elasticsearch/issues/70264
2021-03-26 12:52:23 -04:00
István Zoltán Szabó 9a8c6fb66f
[DOCS] Removes beta labels from DFA related docs. (#70808) 2021-03-26 09:46:41 +01:00
István Zoltán Szabó 165c0ddaeb
[DOCS] Updates anomaly detection alert docs with the new alerting terminology (#70486)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2021-03-18 18:23:19 +01:00
Benjamin Trent 10e637d97c
[ML] allow documents to be out of order within the same time bucket (#70468)
This commit allows documents seen within the same time bucket to be out of order.

This is already supported within the native process.

Additionally, when recording the "latest" record timestamp, we were assuming that the latest seen document was truly the "latest". This is not really the case if latency is utilized or if documents come out of order within the same bucket.
2021-03-17 09:34:49 -04:00
James Rodewig 5c75d004fa
[DOCS] Replace `put` with `create or update` in API names (#70330)
Co-authored-by: debadair <debadair@elastic.co>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2021-03-15 14:49:44 -04:00
István Zoltán Szabó 59f6280a7b
[DOCS] Changes deprecated syntax to node.role style in datafeed docs. (#70201) 2021-03-10 15:46:01 +01:00
Lisa Cawley 2caba7b11f
[DOCS] Edits machine learning settings (#69947)
Co-authored-by: David Roberts <dave.roberts@elastic.co>
2021-03-09 10:59:12 -08:00
István Zoltán Szabó c226958947
[DOCS] Expands anomaly detection alert type docs (#70026)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
Co-authored-by: Dima Arnautov <arnautov.dima@gmail.com>
2021-03-09 12:02:16 +01:00
Lisa Cawley c537e5f38c
[DOCS] Edits delete trained model alias API (#70119) 2021-03-08 17:08:58 -08:00
István Zoltán Szabó 8a7aced8e8
[DOCS] Adds beta tag to anomaly detection alert docs. (#70013) 2021-03-08 10:46:24 +01:00
István Zoltán Szabó 2ccc81081f
[DOCS] Adds hyperparameters option to the include setting of GET trained models API. (#69959) 2021-03-04 16:43:06 +01:00
Joe Gallo 1e8b5fa7c2
Remove the _ml/find-file-structure docs (#69823) 2021-03-03 09:49:28 -05:00
Benjamin Trent 2279cafb4e
[ML] adding new _preview endpoint for data frame analytics (#69453)
This commit adds a new `_preview` endpoint for data frame analytics. 

This allows users to see the data on which their model will be trained. This is especially useful 
in the arrival of custom feature processors.

The API design is a similar to datafeed `_preview` and data frame analytics `_explain`.
2021-03-01 12:25:50 -05:00
Lisa Cawley 138224b398
[DOCS] Edits trained model alias API (#69491) 2021-02-24 08:17:49 -08:00
István Zoltán Szabó 77d0f56581
[DOCS] Adds anomaly detection alert documentation (#68923)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2021-02-23 10:29:54 +01:00
Dimitris Athanasiou 7fb98c0d3c
[ML] Add runtime mappings to data frame analytics source config (#69183)
Users can now specify runtime mappings as part of the source config
of a data frame analytics job. Those runtime mappings become part of
the mapping of the destination index. This ensures the fields are
accessible in the destination index even if the relevant data frame
analytics job gets deleted.

Closes #65056
2021-02-19 16:29:19 +02:00
Benjamin Trent 0af38bba9e
[ML] add new delete trained model aliases API (#69195)
In addition to creating and re-assigning model aliases, users should be able to delete existing and unused model aliases.
2021-02-18 13:12:07 -05:00
Lisa Cawley 55f0e32fe4
[DOCS] Clarify put data frame analytics API feature processors option (#69158) 2021-02-18 08:53:46 -08:00
Benjamin Trent 26eef892df
[ML] adds new trained model alias API to simplify trained model updates and deployments (#68922)
A `model_alias` allows trained models to be referred by a user defined moniker. 

This not only improves the readability and simplicity of numerous API calls, but it allows for simpler deployment and upgrade procedures for trained models. 

Previously, if you referenced a model ID directly within an ingest pipeline, when you have a new model that performs better than an earlier referenced model, you have to update the pipeline itself. If this model was used in numerous pipelines, ALL those pipelines would have to be updated. 

When using a `model_alias` in an ingest pipeline, only that `model_alias` needs to be updated. Then, the underlying referenced model will change in place for all ingest pipelines automatically. 

An additional benefit is that the model referenced is not changed until it is fully loaded into cache, this way throughput is not hampered by changing models.
2021-02-18 09:41:50 -05:00
James Rodewig 9b88ae92e6
[DOCS] Fix typos for duplicate words (#69125) 2021-02-17 10:34:20 -05:00
Lisa Cawley a1fb2c3606
[DOCS] Fixes n_gram_encoding in data frame analytics APIs (#69084) 2021-02-16 14:02:00 -08:00
Lisa Cawley 8b6ec07613
[DOCS] Edits ML hyperparameter descriptions (#68880) 2021-02-11 11:55:28 -08:00
Lisa Cawley 683368cc4d
[DOCS] Clarify soft_tree_depth_limit (#68787)
Co-authored-by: Tom Veasey <tveasey@users.noreply.github.com>
2021-02-10 12:51:01 -08:00
István Zoltán Szabó e45d7a942d
[DOCS] Expands feature processors property description and adds a link of conceptual docs (#68213) 2021-02-02 14:48:43 +01:00
Valeriy Khakhutskyy 78368428b3
[ML] Add early stopping DFA configuration parameter (#68099)
The PR adds early_stopping_enabled optional data frame analysis configuration parameter. The enhancement was already described in elastic/ml-cpp#1676 and so I mark it here as non-issue.
2021-02-01 11:41:28 +01:00
Dimitris Athanasiou 5c961c1c81
[ML] Expand regression/classification hyperparameters (#67950)
Expands data frame analytics regression and classification
analyses with the followin hyperparameters:

- alpha
- downsample_factor
- eta_growth_rate_per_tree
- max_optimization_rounds_per_hyperparameter
- soft_tree_depth_limit
- soft_tree_depth_tolerance
2021-01-26 12:56:41 +02:00
István Zoltán Szabó addb5cbd3a
[DOCS] Adds custom feature processors description to PUT DFA API (#67424)
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2021-01-19 09:47:32 +01:00
Dimitris Athanasiou 7574013604
[ML] Remove DFA job states reindexing and analyzing from docs (#67658)
These states do no longer exist as of #67423
2021-01-18 17:39:22 +02:00
Benjamin Trent 35f478b618
[ML] [DOCS] adding missing fields to the get trained models API docs (#67590)
Adds missing fields description, inference_config, and input to the GET trained models API documentation
2021-01-15 13:20:53 -05:00
Benjamin Trent 24ebcc8c24
[ML] [DOCS] update find-structure reference docs (#67586)
The text structure finder API documentation had many references to the "files". While this is one use of the API, the API now has a more generic name. This commit replaces many references to the word "file" to the more generic word "text".
2021-01-15 12:19:38 -05:00
István Zoltán Szabó 085a288af5
[DOCS] Adds hyperparameter metadata property to GET trained models API docs. (#67412) 2021-01-13 13:49:51 +01:00
Lisa Cawley 401d302c69
[DOCS] Move find file structure to a new API endpoint (#67314) 2021-01-12 11:59:45 -08:00
Benjamin Trent af179ab2f5
[ML] move find file structure to a new API endpoint (#67123)
This introduces a new `text-structure` plugin. This is the new home of the find file structure API. 

The old REST URL is still available but is deprecated.

The new URL is: `_text_structure/find_structure`. All parameters and behavior are unchanged.

Changes to the high-level REST client and docs will be in separate commit.

related to: https://github.com/elastic/elasticsearch/issues/67001
2021-01-11 08:56:02 -05:00
Lisa Cawley eff9dfc3a4
[DOCS] Clarify impact of delayed data in anomaly detection (#66816)
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2021-01-05 12:14:51 -08:00
István Zoltán Szabó d3ad9fe632
[DOCS] Improves inference processor linking and docs (#66119) 2021-01-05 09:42:06 +01:00
David Roberts c5bef7f9a7
[ML] Deprecate anomaly detection post data endpoint (#66347)
There is little evidence of this endpoint being used
and there is quite a lot of code complexity associated
with the various formats that can be used to upload
data and the different errors that can occur when direct
data upload is open to end users.

In a future release we can make this endpoint internal
so that only datafeeds can use it, and remove all the
options and formats that are not used by datafeeds.

End users will have to store their input data for
anomaly detection in Elasticsearch indices (which we
believe all do today) and use a datafeed to feed it
to anomaly detection jobs.
2020-12-15 18:37:20 +00:00
Dimitris Athanasiou 3bed6661de
[ML] Add log_time to AD data_counts and decide current based on it (#66343)
This commit is fixing a potential bug if we support anomaly detection
results index rollover in the future.

In particular, we determine the current `data_counts` by sorting on the
latest record time. However, this is not correct if the job reverts
to an older model snapshot. To fix this we add `log_time` to `data_counts`
(similarly to `model_size_stats`) and sort on `log_time` to figure
out the current counts for the job.
2020-12-15 19:09:13 +02:00
István Zoltán Szabó bc989e4a86
[DOCS] Adds note about data_counts values to Revert snapshot API docs. (#66085) 2020-12-09 10:47:51 +01:00
István Zoltán Szabó 3081cf4944
[DOCS] Adds empty snapshot_id description to revert snapshot API docs (#66036) 2020-12-09 10:01:26 +01:00
David Kyle 22dadfd407
[ML] Docs and HRLC for datafeed runtime mappings (#65810)
For the changes in #65606
2020-12-08 10:06:58 +00:00
David Roberts 49e492f313
[ML] Adding assignment_memory_basis to model_size_stats (#65561)
At present the Java code makes a decision on whether to
use current model memory or model memory limit to calculate
how much memory a job requires to be assigned.

The plan is to move this decision to the C++ code, which will
report it via a new field in the model size stats.  An
additional change will be that once we have made the switch
from using model memory limit to using current model memory
we will never switch back, as this causes large fluctuations
up and down in memory requirement which will be much more
noticeable when autoscaling is in use.

Although the only two options at present are model memory
limit and current model memory, the new enum includes a
third possibility, peak model memory.  To switch to this
now would be tricky, as there have been two bugs in the
implementation of peak model memory which render its value
unreliable in 7.x.  However, in 8.x it might make sense to
switch to using peak model memory instead of current model
memory and it's much easier from a BWC perspective if the
enum contains all the values from the start.

Relates #63163
2020-12-03 17:18:08 +00:00
David Roberts fc72b39a17
[ML] Adjusting soft_limit description (#65383)
This PR adds detail to the explanation of the soft_limit
memory_status in ML job stats. A consequence that was not
mentioned before is that examples are not added to category
definitions.

Relates elastic/ml-cpp#1590
2020-11-24 09:35:07 +00:00
István Zoltán Szabó a85fb5534a
[DOCS] Fixes typo in Aggregating data for faster performance. (#65354) 2020-11-23 12:44:59 +01:00
István Zoltán Szabó f1e54a63a1
[DOCS] Adds UI related limitation to configuring aggs docs (#65184)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-11-20 19:03:18 +01:00
István Zoltán Szabó 1e045da339
[DOCS] Makes the screenshot larger on the custom URLs page. (#65269) 2020-11-20 09:29:39 +01:00
David Roberts e4ce39845b
[ML] Add total ML memory to ML info (#65195)
This change adds an extra piece of information,
limits.total_ml_memory, to the ML info response.
This returns the total amount of memory that ML
is permitted to use for native processes across
all ML nodes in the cluster.  Some of this may
already be in use; the value returned is total,
not available ML memory.
2020-11-18 15:06:21 +00:00
Lisa Cawley 9fef6e7b7e
[DOCS] Adds new snapshot upgrade API (#65095) 2020-11-16 09:48:07 -08:00
István Zoltán Szabó 95a0ed4304
[DOCS] Adds recommendation about when to use chunking_config in manual mode. (#65060) 2020-11-16 16:12:07 +01:00
Benjamin Trent 33de89d94c
[ML] add new snapshot upgrader API for upgrading older snapshots (#64665)
This new API provides a way for users to upgrade their own anomaly job
model snapshots.

To upgrade a snapshot the following is done:
- Open a native process given the job id and the desired snapshot id
- load the snapshot to the process
- write the snapshot again from the native task (now updated via the
  native process)

relates #64154
2020-11-12 10:45:56 -05:00
István Zoltán Szabó db15c4d6b9
[DOCS] Adds scroll_size maximum value to datafeeds API docs (#64986) 2020-11-12 15:53:53 +01:00
István Zoltán Szabó 9ed907bc75
[DOCS] Fixes example aggregation syntax in datafeed aggregations. (#64936) 2020-11-11 16:33:36 +01:00
Lisa Cawley 919c79b745
[DOCS] Add custom feature processor example (#64681) 2020-11-06 09:24:01 -08:00
James Rodewig 1ea83359bb
[DOCS] Fix case for 'Boolean' (#64299) 2020-10-29 09:04:43 -04:00
István Zoltán Szabó 6093518f4a
[DOCS] Changes experimental flag to beta in DFA related docs (#63992) 2020-10-26 17:02:46 +01:00
Lisa Cawley a00c7a2b6c
[DOCS] Add tips for num_top_classes classification parameter (#63781) 2020-10-21 09:27:13 -07:00
István Zoltán Szabó 9defe10616
[DOCS] Expands DFA evaluation API docs with the default set of metrics (#63971) 2020-10-21 14:30:33 +02:00
Benjamin Trent c1de07fa83
[ML] adding new flag exclude_generated that removes generated fields in GET config APIs (#63899)
When exporting and cloning ml configurations in a cluster it can be
frustrating to remove all the fields that were generated by
the plugin. Especially as the number of these fields change
from version to version.

This flag, exclude_generated, allows the GET config APIs to return
configurations with these generated fields removed.

APIs supporting this flag: 
- GET _ml/anomaly_detection/<job_id>
- GET _ml/datafeeds/<datafeed_id>
- GET _ml/data_frame/analytics/<analytics_id>

The following fields are not returned in the objects:

- any field that is not user settable (e.g. version, create_time)
- any field that is a calculated default value (e.g. datafeed chunking_config)
- any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by)

relates to #63055
2020-10-20 11:28:29 -04:00
Dimitris Athanasiou 03ed7de6c1
[ML] Rename evaluation metric result fields to value (#63809)
Renames data frame analytics _evaluate API results as follows:

  - per class accuracy renamed from `accuracy` to `value`
  - per class precision renamed from `precision` to `value`
  - per class recall renamed from `recall` to `value`
  - auc_roc `score` renamed to `value` for both outlier detection and classification
2020-10-20 10:30:50 +03:00
David Roberts 977a4ad3f9
[ML] Change docs test mute comment (#63866)
The original comment mentioned issue #48583, but issue #48941
is specifically open for this mute.  However, this is
inappropriate, as the underlying reason the test cannot be
unmuted is the same as for all the other tests skipped with the
comment "Kibana sample data": issues #51572, #51576 and #51678.

Closes #48941
2020-10-19 10:17:27 +01:00
Przemysław Witek d9e7d88f08
[ML] Allow setting num_top_classes to a special value -1 (#63587) 2020-10-13 13:14:17 +02:00
István Zoltán Szabó e8930a44a4
[DOCS] Adds AUC ROC classification metric to the API examples (#63563) 2020-10-13 11:03:20 +02:00
István Zoltán Szabó b517d4d9b5
[DOCS] Adds huber and msle metrics to Evaluate API example calls (#63414) 2020-10-08 17:05:04 +02:00
Przemysław Witek b0019bd0a6
[ML] Validate that AucRoc has the data necessary to be calculated (#63302) 2020-10-08 08:19:43 +02:00
lcawl 2177b46289 [DOCS] Fixes typo 2020-10-06 09:19:43 -07:00
Lisa Cawley 49ab8f8688
[DOCS] Add feature_importance_baseline to get trained model API (#63279)
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2020-10-06 07:56:55 -07:00
István Zoltán Szabó de3ce8bc39
[DOCS] Adds delta and offset parameters to Evaluate DFA API docs (#63317) 2020-10-06 16:06:35 +02:00
Lisa Cawley 51f9bf657d
[DOCS] Fix titles for ML APIs (#63152) 2020-10-02 11:53:49 -07:00
István Zoltán Szabó baffdd1ec0
[DOCS] Updates trained models API docs titles. (#63165) 2020-10-02 10:15:14 -07:00
Benjamin Trent 7bd6e78dae
[ML] adding for_export flag for ml plugin GET resource APIs (#63092)
This adds the new `for_export` flag to the following APIs:

- GET _ml/anomaly_detection/<job_id>
- GET _ml/datafeeds/<datafeed_id>
- GET _ml/data_frame/analytics/<analytics_id>

The flag is designed for cloning or exporting configuration objects to later be put into the same cluster or a separate cluster. 

The following fields are not returned in the objects:

- any field that is not user settable (e.g. version, create_time)
- any field that is a calculated default value (e.g. datafeed chunking_config)
- any field that would effectively require changing to be of use (e.g. datafeed job_id)
- any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by)


closes https://github.com/elastic/elasticsearch/issues/63055
2020-10-02 08:29:19 -04:00
Benjamin Trent 1084aaf18a
[ML] renames */inference* apis to */trained_models* (#63097)
This commit renames all `inference` CRUD APIs to `trained_models`.

This aligns with internal terminology, documentation, and use-cases.
2020-10-01 12:13:49 -04:00
Przemysław Witek cd1a27f273
[ML] Implement AucRoc metric for classification (#60502) 2020-09-30 08:56:23 +02:00
Lisa Cawley e48eab95e9
[DOCS] Formatting fix in get trained model API (#62643) 2020-09-21 08:19:37 -07:00
Benjamin Trent a653a1cbb8
[ML] all multiple wildcard values for GET Calendars, Events, and DELETE forecasts (#62563)
This commit adjusts the following APIs so now they not only support an `_all` case, but wildcard patterned Ids as well.

- `GET _ml/calendars/<calendar_id>/events`
- `GET _ml/calendars/<calendar_id>`
- `GET _ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>`
- `DELETE _ml/anomaly_detectors/<job_id>/_forecast/<forecast_id>`
2020-09-18 09:39:40 -04:00
Benjamin Trent fdb7b6d3b5
[ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922)
Adds new flag include to the get trained models API
The flag initially has two valid values: definition, total_feature_importance.
Consequently, the old include_model_definition flag is now deprecated.
When total_feature_importance is included, the total_feature_importance field is included in the model metadata object.
Including definition is the same as previously setting include_model_definition=true.
2020-09-18 07:11:38 -04:00
Lisa Cawley e743ed6102
[DOCS] Minor typo in ML API (#62414) 2020-09-15 13:19:17 -07:00
Lisa Cawley 9c2b214873
[DOCS] Removes inference from trained model API text (#62125) 2020-09-09 10:11:50 -07:00
David Roberts 6008a74da5
[ML] Include the "properties" layer in find_file_structure mappings (#62158)
Previously the "mappings" field of the response from the
find_file_structure endpoint was not a drop-in for the
mappings format of the create index endpoint - the
"properties" layer was missing.  The reason for omitting
it initially was that the assumption was that the
find_file_structure endpoint would only ever return very
simple mappings without any nested objects.  However,
this will not be true in the future, as we will improve
mappings detection for complex JSON objects.  As a first
step it makes sense to move the returned mappings closer
to the standard format.

This is a small building block towards fixing #55616
2020-09-09 16:29:23 +01:00
Lisa Cawley 1e6cdcac20
[DOCS] Fix from and size descriptions for model APIs (#62128) 2020-09-08 12:54:51 -07:00
Lisa Cawley 4a7492f3fd
[DOCS] Fix allow_no_match description for model APIs (#62008) 2020-09-08 08:11:33 -07:00
István Zoltán Szabó a75094e666
[DOCS] Removes inference from the names of trained model APIs. (#62036) 2020-09-07 11:23:29 +02:00
Lisa Cawley 511babde59
[DOCS] Refresh machine learning custom URL example (#61826) 2020-09-03 16:53:26 -07:00
Lisa Cawley f05d8c2b98
[DOCS] Per-partition categorization (#61506) 2020-08-26 17:07:46 -07:00
lcawl f56ab039ae [DOCS] Fix typo in update anomaly detection job API 2020-08-25 17:12:43 -07:00
Benjamin Trent 1b34c88d56
[ML] adding docs + hlrc for data frame analysis feature_processors (#61149)
Adds HLRC and some docs for the new feature_processors field in Data frame analytics.

Co-authored-by: Przemysław Witek <przemyslaw.witek@elastic.co>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-08-24 12:00:44 -04:00
James Rodewig a94e5cb7c4
[DOCS] Replace Wikipedia links with attribute (#61171) 2020-08-17 09:44:24 -04:00
James Rodewig 6b9b8c5e31
[DOCS] Move script and stored fields content to search fields page (#60826)
Changes:

* Moves `Retrieve selected fields` to its own page and adds a title abbreviation.
* Adds existing script and stored fields content to `Retrieve selected fields`
* Adds a xref for `Retrieve selected fields` to `Search your data`
* Adds related redirects and updates existing xrefs
2020-08-06 12:45:03 -04:00
István Zoltán Szabó c3536935b2
[DOCS] Adds inference phase to get DFA job stats. (#60737) 2020-08-05 16:22:21 +02:00
Przemysław Witek 29ee3a05b6
Deprecate allow_no_jobs and allow_no_datafeeds in favor of allow_no_match (#60601) 2020-08-05 12:29:07 +02:00
James Rodewig 441c3a21b1
[DOCS] Update my-index examples (#60132)
Changes the following example index names to `my-index-000001` for consistency:

* `my-index`
* `my_index`
* `myindex`
2020-07-27 14:46:39 -04:00
Lisa Cawley 1781d4a7b9
[DOCS] Fix security links in machine learning APIs (#60098) 2020-07-23 12:14:56 -07:00
James Rodewig 2774cd6938
[DOCS] Swap `[float]` for `[discrete]` (#60124)
Changes instances of `[float]` in our docs for `[discrete]`.

Asciidoctor prefers the `[discrete]` tag for floating headings:
https://asciidoctor.org/docs/asciidoc-asciidoctor-diffs/#blocks
2020-07-23 11:48:22 -04:00
James Rodewig 80b674fb25
[DOCS] Reformat snippets to use two-space indents (#59973) 2020-07-21 12:24:26 -04:00
Przemysław Witek 2a12dcf2e0
Rename binary_soft_classification evaluation to outlier_detection (#59951) 2020-07-21 14:27:57 +02:00
Lisa Cawley fb0157460f
[DOCS] Changes level offset of anomaly detection pages (#59911) 2020-07-20 16:33:54 -07:00
Lisa Cawley 823c337e76
[DOCS] Changes level offset for anomaly detection APIs (#59920) 2020-07-20 12:38:09 -07:00
Lisa Cawley 42be287b57
[DOCS] Changes level offset in data frame analytics APIs (#59919) 2020-07-20 12:11:47 -07:00
Benjamin Trent b551f75ec3
[ML] add new `custom` field to trained model processors (#59542)
This commit adds the new configurable field `custom`.

`custom` indicates if the preprocessor was submitted by a user or automatically created by the analytics job.

Eventually, this field will be used in calculating feature importance. When `custom` is true, the feature importance for 
the processed fields is calculated. When `false` the current behavior is the same (we calculate the importance for the originating field/feature).

This also adds new required methods to the preprocessor interface. If users are to supply their own preprocessors 
in the analytics job configuration, we need to know the input and output field names.
2020-07-16 09:35:56 -04:00
Przemysław Witek dfbb47dcaa
Add a "verbose" option to the data frame analytics stats endpoint (#59589) 2020-07-15 15:59:56 +02:00
Dimitris Athanasiou da0249f6c2
[ML] Data frame analytics max_num_threads setting (#59254)
This adds a setting to data frame analytics jobs called
`max_number_threads`. The setting expects a positive integer.
When used the user specifies the max number of threads that may
be used by the analysis. Note that the actual number of threads
used is limited by the number of processors on the node where
the job is assigned. Also, the process may use a couple more threads
for operational functionality that is not the analysis itself.

This setting may also be updated for a stopped job.

More threads may reduce the time it takes to complete the job at the cost
of using more CPU.
2020-07-09 16:31:26 +03:00
James Rodewig 2be9db01c8
[DOCS] Replace `datatype` with `data type` (#58972) 2020-07-07 13:52:10 -04:00
Przemysław Witek 4a43b03855
Report peak model memory in ModelSizeStats (#59017) 2020-07-06 10:33:54 +02:00
Benjamin Trent 6238d4fc49
[ML] add exponent output aggregator to inference (#58933)
* [ML] add exponent output aggregator to inference

* fixing docs
2020-07-03 08:22:01 -04:00
Przemysław Witek 843c512e78
Rename regression evaluation metrics to make the names consistent with loss functions (#58887) 2020-07-02 16:19:27 +02:00
Przemysław Witek 38aa474dec
Implement pseudo Huber loss (PseudoHuber) evaluation metric for regression analysis (#58734) 2020-07-01 13:29:56 +02:00
Przemysław Witek dfa06240fc
Implement MSLE (MeanSquaredLogarithmicError) evaluation metric for regression analysis (#58684) 2020-06-30 13:06:15 +02:00
István Zoltán Szabó d0042fb791
[DOCS] Updates results_field description in the inference processor docs (#58554) 2020-06-29 11:28:17 +02:00
Przemysław Witek 3953de4c98
Introduce DataFrameAnalyticsConfig update API (#58302) 2020-06-29 09:26:31 +02:00
Dimitris Athanasiou 96853df6af
[ML] Rename increased_memory_estimate_bytes (#58614)
... to memory_reestimate_bytes in DF Analytics
memory usage.

Relates #58588
2020-06-27 12:04:39 +03:00
Dimitris Athanasiou 0994005c2e
[ML] Add status and increased estimate to memory usage (#58588)
Adds parsing of `status` and `increased_memory_estimate_bytes`
to data frame analytics `memory_usage`. When the training surpasses
the model memory limit, the status will be set to `hard_limit` and
`increased_memory_estimate_bytes` can be used to update the job's
limit in order to restart the job.
2020-06-26 16:10:14 +03:00
István Zoltán Szabó 3b61ec1fe2
[DOCS] Updates screenshots in ML population analysis (#58318) 2020-06-23 09:03:31 +02:00
Benjamin Trent a43ff95f2d
[ML] calculate cache misses for inference and return in stats (#58252)
When a local model is constructed, the cache hit miss count is incremented.

When a user calls _stats, we will include the sum cache hit miss count across ALL nodes. This statistic is important to in comparing against the inference_count. If the cache hit miss count is near the inference_count it indicates that the cache is overburdened, or inappropriately configured.
2020-06-18 17:18:43 -04:00
Przemysław Witek 76c7e3259f
Make ModelPlotConfig.annotations_enabled default to ModelPlotConfig.enabled if unset (#57808) 2020-06-08 15:31:37 +02:00
David Kyle bbeda643a6
Delete expired data by job (#57337)
Deleting expired data can take a long time leading to timeouts if there
are many jobs. Often the problem is due to a few large jobs which 
prevent the regular maintenance of the remaining jobs. This change adds
a job_id parameter to the delete expired data endpoint to help clean up
those problematic jobs.
2020-06-05 13:32:35 +01:00
David Roberts 605b4d0ea9
[ML] Add per-partition categorization option (#57683)
This PR adds the initial Java side changes to enable
use of the per-partition categorization functionality
added in elastic/ml-cpp#1293.

There will be a followup change to complete the work,
as there cannot be any end-to-end integration tests
until elastic/ml-cpp#1293 is merged, and also
elastic/ml-cpp#1293 does not implement some of the
more peripheral functionality, like stop_on_warn and
per-partition stats documents.

The changes so far cover REST APIs, results object
formats, HLRC and docs.
2020-06-05 11:56:15 +01:00
Dimitris Athanasiou e116ac850f
[ML] Fix race condition when force stopping DF analytics job (#57680)
When we force delete a DF analytics job, we currently first force
stop it and then we proceed with deleting the job config.
This may result in logging errors if the job config is deleted
before it is retrieved while the job is starting.

Instead of force stopping the job, it would make more sense to
try to stop the job gracefully first. So we now try that out first.
If normal stop fails, then we resort to force stopping the job to
ensure we can go through with the delete.

In addition, this commit introduces `timeout` for the delete action
and makes use of it in the child requests.
2020-06-05 12:13:02 +03:00
István Zoltán Szabó 3a15d84af9
[DOCS] Changes parameter order in model_plot_config. (#57642) 2020-06-04 10:57:36 +02:00
Przemysław Witek c4c094c006
Introduce ModelPlotConfig. annotations_enabled setting (#57539) 2020-06-04 09:27:40 +02:00
Lisa Cawley 0f52cab495
[DOCS] Replaces docdir attributes in ML APIs (#57390) 2020-06-01 11:46:10 -07:00
Benjamin Trent 251b17009a
[ML] adds new for_export flag to GET _ml/inference API (#57351)
Adds a new boolean flag, `for_export` to the `GET _ml/inference/<model_id>` API.

This flag is useful for moving models between clusters.
2020-05-29 12:29:28 -04:00
Benjamin Trent ec67787a2e
[ML] add max_model_memory parameter to forecast request (#57254)
This adds a max_model_memory setting to forecast requests. 
This setting can take a string value that is formatted according to byte sizes (i.e. "50mb", "150mb").

The default value is `20mb`.

There is a HARD limit at `500mb` which will throw an error if used.

If the limit is larger than 40% the anomaly job's configured model limit, the forecast limit is reduced to be strictly lower than that value. This reduction is logged and audited.

related native change: https://github.com/elastic/ml-cpp/pull/1238

closes: https://github.com/elastic/elasticsearch/issues/56420
2020-05-29 08:59:50 -04:00
István Zoltán Szabó eaf0d5ffee
[DOCS] Puts a link into the loss_function variable description (#56678) 2020-05-28 09:42:27 +02:00
István Zoltán Szabó b9b3546985
[DOCS] Fixes formatting of admonition paragraph in PUT inference API docs. (#57196) 2020-05-27 13:42:50 +02:00
István Zoltán Szabó 90056edaf4
[DOCS] Improves navigation between forecast APIs and adds short description. (#57035) 2020-05-25 09:09:47 +02:00
István Zoltán Szabó 69b6041d57
[DOCS] Removes the Jobs section from the ML anomaly detection APIs page. (#57031) 2020-05-21 17:30:59 +02:00
Benjamin Trent 8fed077b0a
[ML] relax throttling on expired data cleanup (#56711)
Throttling nightly cleanup as much as we do has been over cautious.

Night cleanup should be more lenient in its throttling. We still
keep the same batch size, but now the requests per second scale
with the number of data nodes. If we have more than 5 data nodes,
we don't throttle at all.

Additionally, the API now has `requests_per_second` and `timeout` set.
So users calling the API directly can set the throttling.

This commit also adds a new setting `xpack.ml.nightly_maintenance_requests_per_second`.
This will allow users to adjust throttling of the nightly maintenance.
2020-05-18 07:21:06 -04:00
David Roberts cbb8b17d74
[DOCS] Docs changes for overridden delimiter in find_file_structure (#56288)
Docs for #55735

Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-05-14 09:24:07 +01:00
Lisa Cawley 84e28e42c8
[DOCS] Clarify model snapshot retention properties (#56477) 2020-05-11 07:41:47 -07:00
István Zoltán Szabó c994369893
[DOCS] Expands GET DFA stats API docs with new phases (#56407)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-05-11 09:22:30 +02:00
David Roberts c99021cdcb
[ML] More advanced model snapshot retention options (#56125)
This PR implements the following changes to make ML model snapshot
retention more flexible in advance of adding a UI for the feature in
an upcoming release.

- The default for `model_snapshot_retention_days` for new jobs is now
  10 instead of 1
- There is a new job setting, `daily_model_snapshot_retention_after_days`,
  that defaults to 1 for new jobs and `model_snapshot_retention_days`
  for pre-7.8 jobs
- For days that are older than `model_snapshot_retention_days`, all
  model snapshots are deleted as before
- For days that are in between `daily_model_snapshot_retention_after_days`
  and `model_snapshot_retention_days` all but the first model snapshot
  for that day are deleted
- The `retain` setting of model snapshots is still respected to allow
  selected model snapshots to be retained indefinitely

Closes #52150
2020-05-05 12:55:50 +01:00
Dimitris Athanasiou 6bf3834059
[ML] Add loss_function to regression (#56118)
Adds parameters `loss_function` and `loss_function_parameter`
to regression.
2020-05-05 12:36:05 +03:00
István Zoltán Szabó 86032ac56a
[DOCS] Simplifies footnote text in DFA APIs (#56105)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-05-05 09:03:16 +02:00