82 Commits

Author SHA1 Message Date
István Zoltán Szabó e45d7a942d
[DOCS] Expands feature processors property description and adds a link of conceptual docs (#68213) 2021-02-02 14:48:43 +01:00
Valeriy Khakhutskyy 78368428b3
[ML] Add early stopping DFA configuration parameter (#68099)
The PR adds the optional early_stopping_enabled data frame analysis configuration parameter. The enhancement was already described in elastic/ml-cpp#1676, so I mark it here as a non-issue.
2021-02-01 11:41:28 +01:00
Dimitris Athanasiou 5c961c1c81
[ML] Expand regression/classification hyperparameters (#67950)
Expands data frame analytics regression and classification
analyses with the following hyperparameters:

- alpha
- downsample_factor
- eta_growth_rate_per_tree
- max_optimization_rounds_per_hyperparameter
- soft_tree_depth_limit
- soft_tree_depth_tolerance
2021-01-26 12:56:41 +02:00
István Zoltán Szabó addb5cbd3a
[DOCS] Adds custom feature processors description to PUT DFA API (#67424)
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2021-01-19 09:47:32 +01:00
David Kyle 22dadfd407
[ML] Docs and HLRC for datafeed runtime mappings (#65810)
For the changes in #65606
2020-12-08 10:06:58 +00:00
David Roberts 49e492f313
[ML] Adding assignment_memory_basis to model_size_stats (#65561)
At present the Java code makes a decision on whether to
use current model memory or model memory limit to calculate
how much memory a job requires to be assigned.

The plan is to move this decision to the C++ code, which will
report it via a new field in the model size stats.  An
additional change will be that once we have made the switch
from using model memory limit to using current model memory
we will never switch back, as this causes large fluctuations
up and down in memory requirement which will be much more
noticeable when autoscaling is in use.

Although the only two options at present are model memory
limit and current model memory, the new enum includes a
third possibility, peak model memory.  To switch to this
now would be tricky, as there have been two bugs in the
implementation of peak model memory which render its value
unreliable in 7.x.  However, in 8.x it might make sense to
switch to using peak model memory instead of current model
memory and it's much easier from a BWC perspective if the
enum contains all the values from the start.

Relates #63163
2020-12-03 17:18:08 +00:00
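The choice described in #65561 can be pictured as a three-valued enum that selects which memory figure is used when assigning a job. This is a minimal sketch only: the enum values follow the three options named in the commit message, while `memory_for_assignment` and its byte-count parameters are hypothetical names invented for illustration, not the actual Java or C++ code.

```python
from enum import Enum


class AssignmentMemoryBasis(Enum):
    # The three possibilities named in the commit message.
    MODEL_MEMORY_LIMIT = "model_memory_limit"
    CURRENT_MODEL_MEMORY = "current_model_memory"
    PEAK_MODEL_MEMORY = "peak_model_memory"


def memory_for_assignment(basis, limit_bytes, current_bytes, peak_bytes):
    """Pick the memory figure to use for job assignment (hypothetical helper)."""
    if basis is AssignmentMemoryBasis.MODEL_MEMORY_LIMIT:
        return limit_bytes
    if basis is AssignmentMemoryBasis.CURRENT_MODEL_MEMORY:
        return current_bytes
    if basis is AssignmentMemoryBasis.PEAK_MODEL_MEMORY:
        return peak_bytes
    raise ValueError(f"unknown basis: {basis!r}")
```

Carrying all three values from the start mirrors the BWC argument in the message: a later switch to peak model memory would not need a new enum value.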
David Roberts fc72b39a17
[ML] Adjusting soft_limit description (#65383)
This PR adds detail to the explanation of the soft_limit
memory_status in ML job stats. A consequence that was not
mentioned before is that examples are not added to category
definitions.

Relates elastic/ml-cpp#1590
2020-11-24 09:35:07 +00:00
István Zoltán Szabó 95a0ed4304
[DOCS] Adds recommendation about when to use chunking_config in manual mode. (#65060) 2020-11-16 16:12:07 +01:00
István Zoltán Szabó db15c4d6b9
[DOCS] Adds scroll_size maximum value to datafeeds API docs (#64986) 2020-11-12 15:53:53 +01:00
James Rodewig 1ea83359bb
[DOCS] Fix case for 'Boolean' (#64299) 2020-10-29 09:04:43 -04:00
Benjamin Trent c1de07fa83
[ML] adding new flag exclude_generated that removes generated fields in GET config APIs (#63899)
When exporting and cloning ML configurations in a cluster, it can be
frustrating to remove all the fields that were generated by
the plugin, especially as the number of these fields changes
from version to version.

This flag, exclude_generated, allows the GET config APIs to return
configurations with these generated fields removed.

APIs supporting this flag: 
- GET _ml/anomaly_detection/<job_id>
- GET _ml/datafeeds/<datafeed_id>
- GET _ml/data_frame/analytics/<analytics_id>

The following fields are not returned in the objects:

- any field that is not user settable (e.g. version, create_time)
- any field that is a calculated default value (e.g. datafeed chunking_config)
- any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by)

relates to #63055
2020-10-20 11:28:29 -04:00
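For comparison, this is roughly the manual cleanup that `exclude_generated` makes unnecessary. It is a sketch under stated assumptions: `strip_generated` and `GENERATED_FIELDS` are hypothetical names, and the field list contains only the examples named in the commit message, not the full set the plugin removes. Calculated defaults such as `chunking_config` are left out because a client cannot tell a default from a user-supplied value.

```python
import copy

# Hypothetical list: only example fields named in the commit message.
GENERATED_FIELDS = ("version", "create_time", "custom_settings.created_by")


def strip_generated(config, generated_fields=GENERATED_FIELDS):
    """Return a copy of `config` without the listed (dotted-path) fields."""
    cleaned = copy.deepcopy(config)
    for path in generated_fields:
        *parents, leaf = path.split(".")
        node = cleaned
        for key in parents:
            node = node.get(key)
            if not isinstance(node, dict):
                break  # parent object missing, nothing to remove
        else:
            node.pop(leaf, None)
    return cleaned
```

The input is deep-copied, so the configuration returned by the GET API is left untouched.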
Benjamin Trent 7bd6e78dae
[ML] adding for_export flag for ml plugin GET resource APIs (#63092)
This adds the new `for_export` flag to the following APIs:

- GET _ml/anomaly_detection/<job_id>
- GET _ml/datafeeds/<datafeed_id>
- GET _ml/data_frame/analytics/<analytics_id>

The flag is designed for cloning or exporting configuration objects to later be put into the same cluster or a separate cluster. 

The following fields are not returned in the objects:

- any field that is not user settable (e.g. version, create_time)
- any field that is a calculated default value (e.g. datafeed chunking_config)
- any field that would effectively require changing to be of use (e.g. datafeed job_id)
- any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by)


closes https://github.com/elastic/elasticsearch/issues/63055
2020-10-02 08:29:19 -04:00
Lisa Cawley e48eab95e9
[DOCS] Formatting fix in get trained model API (#62643) 2020-09-21 08:19:37 -07:00
Benjamin Trent fdb7b6d3b5
[ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922)
Adds a new flag, include, to the get trained models API.
The flag initially has two valid values: definition, total_feature_importance.
Consequently, the old include_model_definition flag is now deprecated.
When total_feature_importance is included, the total_feature_importance field is included in the model metadata object.
Including definition is the same as previously setting include_model_definition=true.
2020-09-18 07:11:38 -04:00
Lisa Cawley 9c2b214873
[DOCS] Removes inference from trained model API text (#62125) 2020-09-09 10:11:50 -07:00
Lisa Cawley 1e6cdcac20
[DOCS] Fix from and size descriptions for model APIs (#62128) 2020-09-08 12:54:51 -07:00
Lisa Cawley 4a7492f3fd
[DOCS] Fix allow_no_match description for model APIs (#62008) 2020-09-08 08:11:33 -07:00
Benjamin Trent 1b34c88d56
[ML] adding docs + hlrc for data frame analysis feature_processors (#61149)
Adds HLRC and some docs for the new feature_processors field in Data frame analytics.

Co-authored-by: Przemysław Witek <przemyslaw.witek@elastic.co>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-08-24 12:00:44 -04:00
James Rodewig a94e5cb7c4
[DOCS] Replace Wikipedia links with attribute (#61171) 2020-08-17 09:44:24 -04:00
James Rodewig 6b9b8c5e31
[DOCS] Move script and stored fields content to search fields page (#60826)
Changes:

* Moves `Retrieve selected fields` to its own page and adds a title abbreviation.
* Adds existing script and stored fields content to `Retrieve selected fields`
* Adds a xref for `Retrieve selected fields` to `Search your data`
* Adds related redirects and updates existing xrefs
2020-08-06 12:45:03 -04:00
Lisa Cawley fb0157460f
[DOCS] Changes level offset of anomaly detection pages (#59911) 2020-07-20 16:33:54 -07:00
Benjamin Trent b551f75ec3
[ML] add new `custom` field to trained model processors (#59542)
This commit adds the new configurable field `custom`.

`custom` indicates if the preprocessor was submitted by a user or automatically created by the analytics job.

Eventually, this field will be used in calculating feature importance. When `custom` is `true`, the feature importance for
the processed fields is calculated. When it is `false`, the current behavior is unchanged (we calculate the importance for the originating field/feature).

This also adds new required methods to the preprocessor interface. If users are to supply their own preprocessors 
in the analytics job configuration, we need to know the input and output field names.
2020-07-16 09:35:56 -04:00
Przemysław Witek dfbb47dcaa
Add a "verbose" option to the data frame analytics stats endpoint (#59589) 2020-07-15 15:59:56 +02:00
Przemysław Witek 4a43b03855
Report peak model memory in ModelSizeStats (#59017) 2020-07-06 10:33:54 +02:00
István Zoltán Szabó d0042fb791
[DOCS] Updates results_field description in the inference processor docs (#58554) 2020-06-29 11:28:17 +02:00
Przemysław Witek 76c7e3259f
Make ModelPlotConfig.annotations_enabled default to ModelPlotConfig.enabled if unset (#57808) 2020-06-08 15:31:37 +02:00
David Roberts 605b4d0ea9
[ML] Add per-partition categorization option (#57683)
This PR adds the initial Java side changes to enable
use of the per-partition categorization functionality
added in elastic/ml-cpp#1293.

There will be a followup change to complete the work,
as there cannot be any end-to-end integration tests
until elastic/ml-cpp#1293 is merged, and also
elastic/ml-cpp#1293 does not implement some of the
more peripheral functionality, like stop_on_warn and
per-partition stats documents.

The changes so far cover REST APIs, results object
formats, HLRC and docs.
2020-06-05 11:56:15 +01:00
Przemysław Witek c4c094c006
Introduce ModelPlotConfig.annotations_enabled setting (#57539) 2020-06-04 09:27:40 +02:00
Lisa Cawley 0f52cab495
[DOCS] Replaces docdir attributes in ML APIs (#57390) 2020-06-01 11:46:10 -07:00
Lisa Cawley 84e28e42c8
[DOCS] Clarify model snapshot retention properties (#56477) 2020-05-11 07:41:47 -07:00
David Roberts c99021cdcb
[ML] More advanced model snapshot retention options (#56125)
This PR implements the following changes to make ML model snapshot
retention more flexible in advance of adding a UI for the feature in
an upcoming release.

- The default for `model_snapshot_retention_days` for new jobs is now
  10 instead of 1
- There is a new job setting, `daily_model_snapshot_retention_after_days`,
  that defaults to 1 for new jobs and `model_snapshot_retention_days`
  for pre-7.8 jobs
- For days that are older than `model_snapshot_retention_days`, all
  model snapshots are deleted as before
- For days that are in between `daily_model_snapshot_retention_after_days`
  and `model_snapshot_retention_days` all but the first model snapshot
  for that day are deleted
- The `retain` setting of model snapshots is still respected to allow
  selected model snapshots to be retained indefinitely

Closes #52150
2020-05-05 12:55:50 +01:00
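The retention rules listed in #56125 can be sketched as a small function. This is an illustration under stated assumptions, not the actual implementation. The `Snapshot` class and `snapshots_to_delete` are hypothetical names. Snapshot age is measured against a caller-supplied `now`. A snapshot marked `retain` is skipped entirely and does not count as the "first" snapshot of its day.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass(frozen=True)
class Snapshot:  # hypothetical stand-in for a model snapshot
    snapshot_id: str
    timestamp: datetime
    retain: bool = False


def snapshots_to_delete(snapshots, now, retention_days=10, daily_after_days=1):
    """Return ids of snapshots that the described policy would delete."""
    keep_all_cutoff = now - timedelta(days=daily_after_days)
    delete_all_cutoff = now - timedelta(days=retention_days)
    days_with_kept_snapshot = set()
    doomed = []
    for snap in sorted(snapshots, key=lambda s: s.timestamp):
        if snap.retain:
            continue  # `retain` is always respected
        if snap.timestamp < delete_all_cutoff:
            doomed.append(snap.snapshot_id)  # older than model_snapshot_retention_days
        elif snap.timestamp < keep_all_cutoff:
            day = snap.timestamp.date()
            if day in days_with_kept_snapshot:
                doomed.append(snap.snapshot_id)  # not the first snapshot of that day
            else:
                days_with_kept_snapshot.add(day)
    return doomed
```

The defaults of 10 and 1 are the new-job defaults stated in the commit message.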
Lisa Cawley 52a2f7689f
[DOCS] Synchs and links hyperparameter descriptions (#55827) 2020-05-04 07:37:14 -07:00
István Zoltán Szabó ca2f98382f
[DOCS] Changes feature importance links to point to the new page (#55531)
* [DOCS] Changes feature importance links to point to the new page.

* [DOCS] Fixes line breaks.
2020-04-28 09:02:14 +02:00
David Roberts dcb6ed03cd
[ML] Adding failed_category_count to model_size_stats (#55716)
The failed_category_count statistic records the number of times
categorization wanted to create a new category but couldn't
because the job had reached its model_memory_limit.

Relates elastic/ml-cpp#1130
2020-04-25 08:01:21 +01:00
Lisa Cawley 7fafec0f8f
[DOCS] Update example and nesting in get data frame analytics job stats API (#55191)
Co-Authored-By: Valeriy Khakhutskyy <1292899+valeriy42@users.noreply.github.com>
2020-04-22 08:07:31 -07:00
Benjamin Trent c1afda4a23
[ML] adding prediction_field_type to inference config (#55128)
Data frame analytics dynamically determines the classification field type. This field type then dictates the encoded JSON that is written to Elasticsearch. 

Inference needs to know about this field type so that it may provide the EXACT SAME predicted values as analytics. 

This adds a new field, `prediction_field_type`, which indicates the desired type. Options are: `string` (the default), `number`, and `boolean` (where close_to(1.0) == true, false otherwise).

Analytics provides the default `prediction_field_type` when the model is created from the process.
2020-04-15 08:32:48 -04:00
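The three `prediction_field_type` options from #55128 can be read as a small coercion step applied to the predicted value. The sketch below is an assumption-laden illustration: `coerce_prediction` is a hypothetical name, `math.isclose` with its default tolerance stands in for the `close_to(1.0)` check, and the exact string formatting of the real implementation may differ.

```python
import math


def coerce_prediction(value, prediction_field_type="string"):
    """Convert a predicted value to the requested type (hypothetical helper)."""
    if prediction_field_type == "string":
        return str(value)
    if prediction_field_type == "number":
        return float(value)
    if prediction_field_type == "boolean":
        # Stand-in for close_to(1.0): true only when the value is ~1.0.
        return math.isclose(float(value), 1.0)
    raise ValueError(f"unsupported prediction_field_type: {prediction_field_type!r}")
```

Under these assumptions, a model whose classes were encoded as 0 and 1 yields booleans with `"boolean"` and floats with `"number"`, while the default leaves class labels as strings.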
Lisa Cawley 1f0341db39
[DOCS] Removes unshared sections from ml-shared.asciidoc (#55129) 2020-04-14 15:19:31 -07:00
Lisa Cawley 998a085c14
[DOCS] Edits create data frame analytics job API (#54751) 2020-04-13 09:58:03 -07:00
István Zoltán Szabó b1b067c5ba
[DOCS] Adds link points to the data frame analytics supported fields (#55004)
Co-authored-by: lcawl <lcawley@elastic.co>
2020-04-09 11:16:13 -07:00
István Zoltán Szabó a0662399c7
[DOCS] Makes PUT inference API docs collapsible (#54653)
Co-authored-by: lcawl <lcawley@elastic.co>
2020-04-03 09:45:42 +02:00
Benjamin Trent 4e1ff31c3c
[ML] add new inference_config field to trained model config (#54421)
A new field called `inference_config` is now added to the trained model config object. This new field allows for default inference settings from analytics or some external model builder. 

The inference processor can still override whatever is set as the default in the trained model config.
2020-04-02 10:34:17 -04:00
István Zoltán Szabó b96743cfc5
[DOCS] Adds data_counts object to the GET DFA stats API (#54498) 2020-04-01 10:05:00 +02:00
Lisa Cawley b90e491f68
[DOCS] Collapses nested objects in data frame analytics APIs (#54472) 2020-03-31 10:56:48 -07:00
István Zoltán Szabó 85d9b34dc5
[DOCS] Adds description of analysis_stats object and its properties to GET DFA stats API docs (#53881)
Co-authored-by: Valeriy Khakhutskyy <1292899+valeriy42@users.noreply.github.com>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-03-31 13:27:54 +02:00
Lisa Cawley fdcd19483d
[DOCS] Collapses content in machine learning APIs (#54234) 2020-03-30 10:08:38 -07:00
Jason Tedor 1fc0432b24
Introduce formal role for remote cluster client (#53924)
This commit introduces a formal role for identifying nodes that are
capable of making connections to remote clusters.
2020-03-24 19:21:56 -04:00
David Roberts cbe063a074
[ML] Introduce a "starting" datafeed state for lazy jobs (#53918)
It is possible for ML jobs to open lazily if the "allow_lazy_open"
option in the job config is set to true.  Such jobs wait in the
"opening" state until a node has sufficient capacity to run them.

This commit fixes the bug that prevented datafeeds for jobs lazily
waiting for assignment from being started.  The state of such datafeeds
is "starting", and they can be stopped by the stop datafeed API
while in this state with or without force.

Fixes #53763
2020-03-24 10:01:13 +00:00
Tom Veasey 58340c2dbe
[ML] Adds the class_assignment_objective parameter to classification (#52763)
Adds a new parameter for classification that enables choosing whether to assign labels to
maximise accuracy or to maximise the minimum class recall.

Fixes #52427.
2020-03-12 18:39:29 +00:00
István Zoltán Szabó 77ec60baa0
[DOCS] Adds a warning about reindexing docs with the same ID to the PUT DFA docs. (#53490) 2020-03-12 18:00:36 +01:00
Benjamin Trent 4e1f029b04
[ML][Inference] adds new default_field_map field to trained models (#53294)
Adds a new `default_field_map` field to trained model config objects. 

This allows the model creator to supply a field map if it knows that some mapping is needed for inference to work directly against the training data.

The use case internally is having analytics jobs supply a field mapping for multi-field fields. This allows us to use the model "out of the box" on data where we trained on `foo.keyword` but the `_source` only references `foo`.
2020-03-11 12:23:56 -04:00
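The `foo.keyword` case in #53294 can be illustrated with a small aliasing step. This is a sketch, not the inference code itself: `apply_field_map` is a hypothetical name, and the map direction (document field name to the field name the model was trained on) is an assumption made for the example.

```python
def apply_field_map(doc, field_map):
    """Expose document fields under the names the model expects (hypothetical helper)."""
    features = dict(doc)  # keep the original fields as they are
    for source_name, model_name in field_map.items():
        if source_name in doc:
            features[model_name] = doc[source_name]
    return features


# The model was trained on `foo.keyword`, but `_source` only contains `foo`.
default_field_map = {"foo": "foo.keyword"}
model_input = apply_field_map({"foo": "bar", "n": 1}, default_field_map)
```

With a map like this stored on the model, the document can be fed to inference without the caller configuring any mapping.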