Commit Graph

424 Commits

Author SHA1 Message Date
Lisa Cawley fb0157460f
[DOCS] Changes level offset of anomaly detection pages (#59911) 2020-07-20 16:33:54 -07:00
Lisa Cawley 823c337e76
[DOCS] Changes level offset for anomaly detection APIs (#59920) 2020-07-20 12:38:09 -07:00
Lisa Cawley 42be287b57
[DOCS] Changes level offset in data frame analytics APIs (#59919) 2020-07-20 12:11:47 -07:00
Benjamin Trent b551f75ec3
[ML] add new `custom` field to trained model processors (#59542)
This commit adds the new configurable field `custom`.

`custom` indicates if the preprocessor was submitted by a user or automatically created by the analytics job.

Eventually, this field will be used in calculating feature importance. When `custom` is true, the feature importance for 
the processed fields is calculated. When `false` the current behavior is the same (we calculate the importance for the originating field/feature).

This also adds new required methods to the preprocessor interface. If users are to supply their own preprocessors 
in the analytics job configuration, we need to know the input and output field names.
2020-07-16 09:35:56 -04:00
Przemysław Witek dfbb47dcaa
Add a "verbose" option to the data frame analytics stats endpoint (#59589) 2020-07-15 15:59:56 +02:00
Dimitris Athanasiou da0249f6c2
[ML] Data frame analytics max_num_threads setting (#59254)
This adds a setting to data frame analytics jobs called
`max_number_threads`. The setting expects a positive integer.
When used the user specifies the max number of threads that may
be used by the analysis. Note that the actual number of threads
used is limited by the number of processors on the node where
the job is assigned. Also, the process may use a couple more threads
for operational functionality that is not the analysis itself.

This setting may also be updated for a stopped job.

More threads may reduce the time it takes to complete the job at the cost
of using more CPU.
2020-07-09 16:31:26 +03:00
James Rodewig 2be9db01c8
[DOCS] Replace `datatype` with `data type` (#58972) 2020-07-07 13:52:10 -04:00
Przemysław Witek 4a43b03855
Report peak model memory in ModelSizeStats (#59017) 2020-07-06 10:33:54 +02:00
Benjamin Trent 6238d4fc49
[ML] add exponent output aggregator to inference (#58933)
* [ML] add exponent output aggregator to inference

* fixing docs
2020-07-03 08:22:01 -04:00
Przemysław Witek 843c512e78
Rename regression evaluation metrics to make the names consistent with loss functions (#58887) 2020-07-02 16:19:27 +02:00
Przemysław Witek 38aa474dec
Implement pseudo Huber loss (PseudoHuber) evaluation metric for regression analysis (#58734) 2020-07-01 13:29:56 +02:00
Przemysław Witek dfa06240fc
Implement MSLE (MeanSquaredLogarithmicError) evaluation metric for regression analysis (#58684) 2020-06-30 13:06:15 +02:00
István Zoltán Szabó d0042fb791
[DOCS] Updates results_field description in the inference processor docs (#58554) 2020-06-29 11:28:17 +02:00
Przemysław Witek 3953de4c98
Introduce DataFrameAnalyticsConfig update API (#58302) 2020-06-29 09:26:31 +02:00
Dimitris Athanasiou 96853df6af
[ML] Rename increased_memory_estimate_bytes (#58614)
... to memory_reestimate_bytes in DF Analytics
memory usage.

Relates #58588
2020-06-27 12:04:39 +03:00
Dimitris Athanasiou 0994005c2e
[ML] Add status and increased estimate to memory usage (#58588)
Adds parsing of `status` and `increased_memory_estimate_bytes`
to data frame analytics `memory_usage`. When the training surpasses
the model memory limit, the status will be set to `hard_limit` and
`increased_memory_estimate_bytes` can be used to update the job's
limit in order to restart the job.
2020-06-26 16:10:14 +03:00
István Zoltán Szabó 3b61ec1fe2
[DOCS] Updates screenshots in ML population analysis (#58318) 2020-06-23 09:03:31 +02:00
Benjamin Trent a43ff95f2d
[ML] calculate cache misses for inference and return in stats (#58252)
When a local model is constructed, the cache hit miss count is incremented.

When a user calls _stats, we will include the sum cache hit miss count across ALL nodes. This statistic is important to in comparing against the inference_count. If the cache hit miss count is near the inference_count it indicates that the cache is overburdened, or inappropriately configured.
2020-06-18 17:18:43 -04:00
Przemysław Witek 76c7e3259f
Make ModelPlotConfig.annotations_enabled default to ModelPlotConfig.enabled if unset (#57808) 2020-06-08 15:31:37 +02:00
David Kyle bbeda643a6
Delete expired data by job (#57337)
Deleting expired data can take a long time leading to timeouts if there
are many jobs. Often the problem is due to a few large jobs which 
prevent the regular maintenance of the remaining jobs. This change adds
a job_id parameter to the delete expired data endpoint to help clean up
those problematic jobs.
2020-06-05 13:32:35 +01:00
David Roberts 605b4d0ea9
[ML] Add per-partition categorization option (#57683)
This PR adds the initial Java side changes to enable
use of the per-partition categorization functionality
added in elastic/ml-cpp#1293.

There will be a followup change to complete the work,
as there cannot be any end-to-end integration tests
until elastic/ml-cpp#1293 is merged, and also
elastic/ml-cpp#1293 does not implement some of the
more peripheral functionality, like stop_on_warn and
per-partition stats documents.

The changes so far cover REST APIs, results object
formats, HLRC and docs.
2020-06-05 11:56:15 +01:00
Dimitris Athanasiou e116ac850f
[ML] Fix race condition when force stopping DF analytics job (#57680)
When we force delete a DF analytics job, we currently first force
stop it and then we proceed with deleting the job config.
This may result in logging errors if the job config is deleted
before it is retrieved while the job is starting.

Instead of force stopping the job, it would make more sense to
try to stop the job gracefully first. So we now try that out first.
If normal stop fails, then we resort to force stopping the job to
ensure we can go through with the delete.

In addition, this commit introduces `timeout` for the delete action
and makes use of it in the child requests.
2020-06-05 12:13:02 +03:00
István Zoltán Szabó 3a15d84af9
[DOCS] Changes parameter order in model_plot_config. (#57642) 2020-06-04 10:57:36 +02:00
Przemysław Witek c4c094c006
Introduce ModelPlotConfig. annotations_enabled setting (#57539) 2020-06-04 09:27:40 +02:00
Lisa Cawley 0f52cab495
[DOCS] Replaces docdir attributes in ML APIs (#57390) 2020-06-01 11:46:10 -07:00
Benjamin Trent 251b17009a
[ML] adds new for_export flag to GET _ml/inference API (#57351)
Adds a new boolean flag, `for_export` to the `GET _ml/inference/<model_id>` API.

This flag is useful for moving models between clusters.
2020-05-29 12:29:28 -04:00
Benjamin Trent ec67787a2e
[ML] add max_model_memory parameter to forecast request (#57254)
This adds a max_model_memory setting to forecast requests. 
This setting can take a string value that is formatted according to byte sizes (i.e. "50mb", "150mb").

The default value is `20mb`.

There is a HARD limit at `500mb` which will throw an error if used.

If the limit is larger than 40% the anomaly job's configured model limit, the forecast limit is reduced to be strictly lower than that value. This reduction is logged and audited.

related native change: https://github.com/elastic/ml-cpp/pull/1238

closes: https://github.com/elastic/elasticsearch/issues/56420
2020-05-29 08:59:50 -04:00
István Zoltán Szabó eaf0d5ffee
[DOCS] Puts a link into the loss_function variable description (#56678) 2020-05-28 09:42:27 +02:00
István Zoltán Szabó b9b3546985
[DOCS] Fixes formatting of admonition paragraph in PUT inference API docs. (#57196) 2020-05-27 13:42:50 +02:00
István Zoltán Szabó 90056edaf4
[DOCS] Improves navigation between forecast APIs and adds short description. (#57035) 2020-05-25 09:09:47 +02:00
István Zoltán Szabó 69b6041d57
[DOCS] Removes the Jobs section from the ML anomaly detection APIs page. (#57031) 2020-05-21 17:30:59 +02:00
Benjamin Trent 8fed077b0a
[ML] relax throttling on expired data cleanup (#56711)
Throttling nightly cleanup as much as we do has been over cautious.

Night cleanup should be more lenient in its throttling. We still
keep the same batch size, but now the requests per second scale
with the number of data nodes. If we have more than 5 data nodes,
we don't throttle at all.

Additionally, the API now has `requests_per_second` and `timeout` set.
So users calling the API directly can set the throttling.

This commit also adds a new setting `xpack.ml.nightly_maintenance_requests_per_second`.
This will allow users to adjust throttling of the nightly maintenance.
2020-05-18 07:21:06 -04:00
David Roberts cbb8b17d74
[DOCS] Docs changes for overridden delimiter in find_file_structure (#56288)
Docs for #55735

Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-05-14 09:24:07 +01:00
Lisa Cawley 84e28e42c8
[DOCS] Clarify model snapshot retention properties (#56477) 2020-05-11 07:41:47 -07:00
István Zoltán Szabó c994369893
[DOCS] Expands GET DFA stats API docs with new phases (#56407)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-05-11 09:22:30 +02:00
David Roberts c99021cdcb
[ML] More advanced model snapshot retention options (#56125)
This PR implements the following changes to make ML model snapshot
retention more flexible in advance of adding a UI for the feature in
an upcoming release.

- The default for `model_snapshot_retention_days` for new jobs is now
  10 instead of 1
- There is a new job setting, `daily_model_snapshot_retention_after_days`,
  that defaults to 1 for new jobs and `model_snapshot_retention_days`
  for pre-7.8 jobs
- For days that are older than `model_snapshot_retention_days`, all
  model snapshots are deleted as before
- For days that are in between `daily_model_snapshot_retention_after_days`
  and `model_snapshot_retention_days` all but the first model snapshot
  for that day are deleted
- The `retain` setting of model snapshots is still respected to allow
  selected model snapshots to be retained indefinitely

Closes #52150
2020-05-05 12:55:50 +01:00
Dimitris Athanasiou 6bf3834059
[ML] Add loss_function to regression (#56118)
Adds parameters `loss_function` and `loss_function_parameter`
to regression.
2020-05-05 12:36:05 +03:00
István Zoltán Szabó 86032ac56a
[DOCS] Simplifies footnote text in DFA APIs (#56105)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-05-05 09:03:16 +02:00
Lisa Cawley 52a2f7689f
[DOCS] Synchs and links hyperparameter descriptions (#55827) 2020-05-04 07:37:14 -07:00
Lisa Cawley 5ef7aacbf7
[DOCS] Adds documentation for secondary authorization headers (#55365)
Co-authored-by: Tim Vernum <tim@adjective.org>
2020-04-29 08:28:42 -07:00
István Zoltán Szabó d70cef3474
[DOCS] Makes the footnotes less verbose in configuring aggs page. (#55857) 2020-04-29 09:50:41 +02:00
István Zoltán Szabó ca2f98382f
[DOCS] Changes feature importance links to point to the new page (#55531)
* [DOCS] Changes feature importance links to point to the new page.

* [DOCS] Fixes line breaks.
2020-04-28 09:02:14 +02:00
David Roberts dcb6ed03cd
[ML] Adding failed_category_count to model_size_stats (#55716)
The failed_category_count statistic records the number of times
categorization wanted to create a new category but couldn't
because the job had reached its model_memory_limit.

Relates elastic/ml-cpp#1130
2020-04-25 08:01:21 +01:00
Lisa Cawley 7fafec0f8f
[DOCS] Update example and nesting in get data frame analytics job stats API (#55191)
Co-Authored-By: Valeriy Khakhutskyy <1292899+valeriy42@users.noreply.github.com>
2020-04-22 08:07:31 -07:00
David Roberts d1a9b3a545
[ML] Add effective max model memory limit to ML info (#55529)
The ML info endpoint returns the max_model_memory_limit setting
if one is configured.  However, it is still possible to create
a job that cannot run anywhere in the current cluster because
no node in the cluster has enough memory to accommodate it.

This change adds an extra piece of information,
limits.effective_max_model_memory_limit, to the ML info
response that returns the biggest model memory limit that could
be run in the current cluster assuming no other jobs were
running.

The idea is that the ML UI will be able to warn users who try to
create jobs with higher model memory limits that their jobs will
not be able to start unless they add a bigger ML node to their
cluster.

Relates elastic/kibana#63942
2020-04-22 11:36:58 +01:00
David Roberts 8906e76079
[ML] Return assigned node in start/open job/datafeed response (#55473)
Adds a "node" field to the response from the following endpoints:

1. Open anomaly detection job
2. Start datafeed
3. Start data frame analytics job

If the job or datafeed is assigned to a node immediately then
this field will return the ID of that node.

In the case where a job or datafeed is opened or started lazily
the node field will contain an empty string.  Clients that want
to test whether a job or datafeed was opened or started lazily
can therefore check for this.

Fixes #54067
2020-04-22 08:44:57 +01:00
István Zoltán Szabó f8bfab2dab
[DOCS] Provides further details on aggregations in datafeeds (#55462)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-04-22 08:53:34 +02:00
Benjamin Trent f72b2db29a
[ML] partitions model definitions into chunks (#55260)
This paves the data layer way so that exceptionally large models are partitioned across multiple documents.

This change means that nodes before 7.8.0 will not be able to use trained inference models created on nodes on or after 7.8.0.

I chose the definition document limit to be 100. This *SHOULD* be plenty for any large model. One of the largest models that I have created so far had the following stats:
~314MB of inflated JSON, ~66MB when compressed, ~177MB of heap. 
With the chunking sizes of `16 * 1024 * 1024` its compressed string could be partitioned to 5 documents. 
Supporting models 20 times this size (compressed) seems adequate for now.
2020-04-20 15:13:23 -04:00
Benjamin Trent c1afda4a23
[ML] adding prediction_field_type to inference config (#55128)
Data frame analytics dynamically determines the classification field type. This field type then dictates the encoded JSON that is written to Elasticsearch. 

Inference needs to know about this field type so that it may provide the EXACT SAME predicted values as analytics. 

Here is added a new field `prediction_field_type` which indicates the desired type. Options are: `string` (DEFAULT), `number`, `boolean` (where close_to(1.0) == true, false otherwise). 

Analytics provides the default `prediction_field_type` when the model is created from the process.
2020-04-15 08:32:48 -04:00
Lisa Cawley 1f0341db39
[DOCS] Removes unshared sections from ml-shared.asciidoc (#55129) 2020-04-14 15:19:31 -07:00
Lisa Cawley 998a085c14
[DOCS] Edits create data frame analytics job API (#54751) 2020-04-13 09:58:03 -07:00
István Zoltán Szabó b1b067c5ba
[DOCS] Adds link points to the data frame analytics supported fields (#55004)
Co-authored-by: lcawl <lcawley@elastic.co>
2020-04-09 11:16:13 -07:00
István Zoltán Szabó bb44726ad6
[DOCS] Reworks some parts of EMM API docs (#54872)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-04-08 09:50:12 +02:00
Lisa Cawley c355fea8f4
[DOCS] Remove text fields from classification dependent variables (#54849) 2020-04-07 10:43:15 -07:00
István Zoltán Szabó 1ae8bde756
[DOCS] Changes kibana_user to kibana_admin in DFA API prerequisites. (#54806) 2020-04-06 15:45:08 +02:00
István Zoltán Szabó a0662399c7
[DOCS] Makes PUT inference API docs collapsible (#54653)
Co-authored-by: lcawl <lcawley@elastic.co>
2020-04-03 09:45:42 +02:00
Benjamin Trent 4e1ff31c3c
[ML] add new inference_config field to trained model config (#54421)
A new field called `inference_config` is now added to the trained model config object. This new field allows for default inference settings from analytics or some external model builder. 

The inference processor can still override whatever is set as the default in the trained model config.
2020-04-02 10:34:17 -04:00
Benjamin Trent 1d24960ff8
[ML] prefer secondary authorization header for data[feed|frame] authz (#54121)
Secondary authorization headers are to be used to facilitate Kibana spaces support + ML jobs/datafeeds. 

Now on PUT/Update/Preview datafeed, and PUT data frame analytics the secondary authorization is preferred over the primary (if provided).

closes https://github.com/elastic/elasticsearch/issues/53801
2020-04-02 10:10:46 -04:00
Benjamin Trent bbd6e943de
[ML] add num_matches and preferred_to_categories to category defintion objects (#54214)
This adds two new fields to category definitions.

- `num_matches` indicating how many documents have been seen by this category
- `preferred_to_categories` indicating which other categories this particular category supersedes when messages are categorized.

These fields are only guaranteed to be up to date after a `_flush` or `_close`

native change: https://github.com/elastic/ml-cpp/pull/1062
2020-04-02 07:49:09 -04:00
István Zoltán Szabó b0f6d4ee0e
[DOCS] Updates estimate model memory docs (#54574) 2020-04-01 15:53:53 +02:00
István Zoltán Szabó b96743cfc5
[DOCS] Adds data_counts object to the GET DFA stats API (#54498) 2020-04-01 10:05:00 +02:00
Jason Tedor 95a7eed9aa
Rename MetaData to Metadata in all of the places (#54519)
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
2020-03-31 15:52:01 -04:00
Lisa Cawley b90e491f68
[DOCS] Collapses nested objects in data frame analytics APIs (#54472) 2020-03-31 10:56:48 -07:00
Dimitris Athanasiou 5a98fc20e1
[ML] Fix DF analytics explain API request in docs (#54510)
The explain API expects a data frame analytics config
as its request.
2020-03-31 18:37:19 +03:00
István Zoltán Szabó 85d9b34dc5
[DOCS] Adds description of analysis_stats object and its properties to GET DFA stats API docs (#53881)
Co-authored-by: Valeriy Khakhutskyy <1292899+valeriy42@users.noreply.github.com>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-03-31 13:27:54 +02:00
Lisa Cawley fdcd19483d
[DOCS] Collapses content in machine learning APIs (#54234) 2020-03-30 10:08:38 -07:00
Jason Tedor 1fc0432b24
Introduce formal role for remote cluster client (#53924)
This commit introduce a formal role for identifying nodes that are
capable of making connections to remote clusters.
2020-03-24 19:21:56 -04:00
David Roberts 8ee770560a
[ML] Add a model memory estimation endpoint for anomaly detection (#53507)
A new endpoint for estimating anomaly detection job
model memory requirements:

POST _ml/anomaly_detectors/estimate_model_memory

Closes #53219
2020-03-24 21:38:19 +00:00
David Roberts cbe063a074
[ML] Introduce a "starting" datafeed state for lazy jobs (#53918)
It is possible for ML jobs to open lazily if the "allow_lazy_open"
option in the job config is set to true.  Such jobs wait in the
"opening" state until a node has sufficient capacity to run them.

This commit fixes the bug that prevented datafeeds for jobs lazily
waiting assignment from being started.  The state of such datafeeds
is "starting", and they can be stopped by the stop datafeed API
while in this state with or without force.

Fixes #53763
2020-03-24 10:01:13 +00:00
István Zoltán Szabó 8279f82dea
[DOCS] Fixes typo in start datafeed API docs. (#53811) 2020-03-19 17:55:26 +01:00
István Zoltán Szabó 57321124ea
[DOCS] Changes seconds to milliseconds since the Epoch in AD docs. (#53797) 2020-03-19 15:40:53 +01:00
Tom Veasey 58340c2dbe
[ML] Adds the class_assignment_objective parameter to classification (#52763)
Adds a new parameter for classification that enables choosing whether to assign labels to
maximise accuracy or to maximise the minimum class recall.

Fixes #52427.
2020-03-12 18:39:29 +00:00
István Zoltán Szabó 77ec60baa0
[DOCS] Adds a warning about reindexing docs with the same ID to the PUT DFA docs. (#53490) 2020-03-12 18:00:36 +01:00
Benjamin Trent 4e1f029b04
[ML][Inference] adds new default_field_map field to trained models (#53294)
Adds a new `default_field_map` field to trained model config objects. 

This allows the model creator to supply field map if it knows that there should be some map for inference to work directly against the training data.

The use case internally is having analytics jobs supply a field mapping for multi-field fields. This allows us to use the model "out of the box" on data where we trained on `foo.keyword` but the `_source` only references `foo`.
2020-03-11 12:23:56 -04:00
Dimitris Athanasiou 5a32f50d18
[ML] Rename data frame analytics maximum_number_trees to max_trees (#53300)
Deprecates `maximum_number_trees` parameter of classification and
regression and replaces it with `max_trees`.
2020-03-11 10:33:53 +02:00
István Zoltán Szabó 54b66d3385
[DOCS] Makes the description clearer on how to use aggregations in an anomaly detection job (#53103)
Co-authored-by: lcawl <lcawley@elastic.co>
2020-03-09 09:48:23 +01:00
István Zoltán Szabó 08fcc0b02f
[DOCS] Adds deleting flag to the GET job stats API docs (#53223) 2020-03-06 16:03:09 +01:00
István Zoltán Szabó 870e1891d9
[DOCS] Makes the naming convention of the DFA response objects coherent (#53172) 2020-03-05 16:25:43 +01:00
István Zoltán Szabó d7fb6416dd
[DOCS] Expands GET DFA stat API docs with response objects. (#53107) 2020-03-05 15:30:30 +01:00
Lisa Cawley 7004216455
[DOCS] Adds link in datafeed indices_options (#53067) 2020-03-03 10:28:54 -08:00
István Zoltán Szabó 24fe7e5899
[DOCS] Adds response body documentation to GET inference API (#53050) 2020-03-03 16:25:24 +01:00
Lisa Cawley b6534834f9
[DOCS] Adds cat anomaly detectors API (#52866) 2020-02-28 12:15:21 -08:00
Dimitris Athanasiou dd331935b3
[ML] Parse and report memory usage for DF Analytics (#52778)
Adds reporting of memory usage for data frame analytics jobs.
This commit introduces a new index pattern `.ml-stats-*` whose
first concrete index will be `.ml-stats-000001`. This index serves
to store instrumentation information for those jobs.
2020-02-28 17:35:07 +02:00
Benjamin Trent d7a63333b5
[ML] Add indices_options to datafeed config and update (#52793)
This adds a new configurable field called `indices_options`. This allows users to create or update the indices_options used when a datafeed reads from an index. 

This is necessary for the following use cases:
 - Reading from frozen indices
 - Allowing certain indices in multiple index patterns to not exist yet

These index options are available on datafeed creation and update. Users may specify them as URL parameters or within the configuration object.
 
closes https://github.com/elastic/elasticsearch/issues/48056
2020-02-27 12:22:35 -05:00
Lisa Cawley 42fbca7dc6
[DOCS] Adds cat datafeeds API (#52738) 2020-02-26 09:20:36 -08:00
István Zoltán Szabó 490e8b47e6
[DOCS] Adds cat data frame analytics API (#52764)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-02-26 11:09:37 +01:00
Lisa Cawley cd069a861c
[DOCS] Updates custom rules example (#52731) 2020-02-25 09:30:14 -08:00
David Roberts ca80ad69f2
[ML] Use event.timezone in file_structure_finder ingest pipeline (#52720)
This is because beat.timezone was renamed to event.timezone in
elastic/beats#9458
2020-02-25 12:18:53 +00:00
lcawl b590b49205 [DOCS] Adds anchor for custom rules 2020-02-24 10:04:34 -08:00
Lisa Cawley f41ebe47e3
[DOCS] Clarifies description of num_top_feature_importance_values (#52246)
Co-Authored-By: Valeriy Khakhutskyy <1292899+valeriy42@users.noreply.github.com>
2020-02-18 08:48:24 -08:00
Lisa Cawley ab139244d7
[DOCS] Fixes, sorts ML tagged regions (#52283) 2020-02-12 13:43:21 -08:00
David Kyle f64c6359ed
[ML] Make Ensemble feature names optional (#51996)
The featureNames field is requisite in individual models but is not required by the Ensemble.
2020-02-07 10:07:18 +00:00
David Roberts 72346b91f9
[ML] Add new categorization stats to model_size_stats (#51879)
This change adds support for the following new model_size_stats
fields:

- categorized_doc_count
- total_category_count
- frequent_category_count
- rare_category_count
- dead_category_count
- categorization_status

Relates #50749
2020-02-06 17:08:43 +00:00
Darren LaCasse ea67e24b7b
[DOCS] Remove extra word (#51757) 2020-01-31 10:27:37 -08:00
István Zoltán Szabó 67f14c3978
[DOCS] Adds PUT inference API docs (#51231)
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-01-31 13:12:24 +01:00
Lisa Cawley 32adcd2c9d
[DOCS] Adds missing testenv attribute (#51719) 2020-01-30 16:13:26 -08:00
David Roberts a5a2e4eaee
[ML] Use CSV ingest processor in find_file_structure ingest pipeline (#51492)
Changes the find_file_structure response to include a CSV
ingest processor in the ingest pipeline it suggests.

Previously the Kibana file upload functionality parsed CSV
in the browser, but by parsing CSV in the ingest pipeline
it makes the Kibana file upload functionality more easily
interchangable with Filebeat such that the configurations
it creates can more easily be used to import data with the
same structure repeatedly in production.
2020-01-28 12:46:00 +00:00
István Zoltán Szabó 85e581282d
[DOCS] Refines description. (#51400) 2020-01-24 13:31:44 +01:00
Benjamin Trent c9e285c1e6
[ML][Inference] add tags url param to GET (#51330)
Adds a new URL parameter, `tags` to the GET _ml/inference/<model_id> endpoint.

This parameter allows the list of models to be further reduced to those who contain all the provided tags.
2020-01-24 07:30:56 -05:00
Lisa Cawley 789aeaedab
[DOCS] Updates categorization examples with wizard screenshots (#51133) 2020-01-22 11:26:10 -08:00
Lisa Cawley 551a83a2ff
[DOCS] Clarify interval, frequency, and bucket span in ML APIs and example (#51280) 2020-01-22 08:08:31 -08:00
David Kyle 7978f0b8ef
[ML] Calculate results and snapshot retention using latest bucket timestamps (#51061)
The retention period is calculated relative to the last bucket result or snapshot
time rather than wall clock
2020-01-22 10:08:41 +00:00
István Zoltán Szabó 087a048ee6 [DOCS] Adds text about data types to the categorization docs (#51145) 2020-01-17 09:52:57 -08:00
Dimitris Athanasiou 24ce598239
[ML] DF Analytics _explain API should skip object fields (#51115)
Object fields cannot be used as features. At the moment _explain
API includes them and even worse it allows it does not error when
an object field is excluded. This creates the expectation to the
user that all children fields will also be excluded while it's not
the case.

This commit omits object fields from the _explain API and also
adds an error if an object field is included or excluded.
2020-01-17 12:24:17 +02:00
David Kyle 5ad1d0d2cc
Fix hardcoded version replacement in put-dfanalytics.asciidoc (#51056) 2020-01-16 10:06:45 +00:00
Przemysław Witek 999884d8fb
Add missing docs for new evaluation metrics (#50967) 2020-01-15 14:23:37 +01:00
István Zoltán Szabó 406810c172
[DOCS] Describes the relationship of the time-related settings in anomaly detection docs (#50959)
Co-Authored-By: David Roberts <dave.roberts@elastic.co>
2020-01-15 08:45:03 +01:00
Dimitris Athanasiou 4d2be9bd32
[ML] Add num_top_feature_importance_values param to regression and classi… (#50914)
Adds a new parameter to regression and classification that enables computation
of importance for the top most important features. The computation of the importance
is based on SHAP (SHapley Additive exPlanations) method.
2020-01-14 15:01:47 +02:00
Lisa Cawley 979a28d2b5
[DOCS] Clarify detector_index property in ML APIs (#50723) 2020-01-09 08:12:53 -08:00
István Zoltán Szabó b3457154a3
[DOCS] Fine-tunes data frame analytics API docs formatting. (#50799) 2020-01-09 16:21:01 +01:00
István Zoltán Szabó b683f96e23
[DOCS] Moves analysis resources to PUT DFA API docs (#50704)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-01-09 13:57:11 +01:00
István Zoltán Szabó 659b4ceb97
[DOCS] Improves find_file_structure documentation (#50743)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-01-09 11:19:19 +01:00
István Zoltán Szabó bc21500201
[DOCS] Forms role and privilege requirements as bulleted lists in DFA API docs (#50732)
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
2020-01-09 10:44:07 +01:00
István Zoltán Szabó 2f55c3566f
[DOCS] Clarifies model_size_stats.total_xxx_field_count objects and removes notes in GET job stats API docs. (#50728) 2020-01-09 09:43:55 +01:00
István Zoltán Szabó d5fcb73b1f
[DOCS] Improves description for forecast_stats (#50729)
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
2020-01-09 09:31:30 +01:00
Lisa Cawley b13a755842
[DOCS] Adds missing timing_stats descriptions (#50574) 2020-01-03 09:07:08 -08:00
István Zoltán Szabó 675b98f90c
[DOCS] Fine-tunes training_percent definition. (#50601) 2020-01-03 14:49:43 +01:00
Dimitris Athanasiou af0ce426cc
[ML] Implement force deleting a data frame analytics job (#50553)
Adds a `force` parameter to the delete data frame analytics
request. When `force` is `true`, the action force-stops the
jobs and then proceeds to the deletion. This can be used in
order to delete a non-stopped job with a single request.

Closes #48124
2020-01-03 12:01:41 +02:00
István Zoltán Szabó fd50169c74
[DOCS] Specifies the possible data types of classification dependent_variable (#50582) 2020-01-03 10:41:38 +01:00
Lisa Cawley dd4ede5c56
[DOCS] Adds filter and calendar attributes (#50566) 2020-01-02 10:59:54 -08:00
lcawl c7408a25f1 [DOCS] Minor fixes in ML APIs 2019-12-30 15:21:18 -08:00
James Rodewig e8a6d4a3fb
[DOCS] Remove unneeded redirects (#50476)
The docs/reference/redirects.asciidoc file stores a list of relocated or
deleted pages for the Elasticsearch Reference documentation.

This prunes several older redirects that are no longer needed and
don't require work to fix broken links in other repositories.
2019-12-26 07:49:41 -05:00
Lisa Cawley 6501338a9e
[DOCS] Remove redundant results from ML APIs (#50477) 2019-12-24 08:34:03 -08:00
Orhan Toy 48342740c5 [DOCS] Fixes "enables you to" typos (#50225) 2019-12-23 14:38:37 -05:00
Lisa Cawley 362ce41eaf
[DOCS] Updates ML links (#50387) 2019-12-19 14:47:28 -08:00
lcawl d8a94f0397 [DOCS] Fixes security links 2019-12-18 11:51:03 -08:00
Lisa Cawley 68e02a19d8
[DOCS] Move machine learning results definitions into APIs (#50257) 2019-12-18 09:50:31 -08:00
István Zoltán Szabó 50e26d40a2
[DOCS] Adds GET, GET stats and DELETE inference APIs (#50224)
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
2019-12-18 09:10:12 +01:00
Lisa Cawley 207094cd67
[DOCS] Moves model snapshot resource definitions into APIs (#50157)
Co-Authored-By: Ed Savage <32410745+edsavage@users.noreply.github.com>
2019-12-16 10:42:30 -08:00
István Zoltán Szabó 3857e3d94f
[DOCS] Moves data frame analytics job resource definitions into APIs (#50021) 2019-12-12 10:59:37 +01:00
Lisa Cawley ca482127fa
[DOCS] Move job count resource definitions into API (#50057)
Co-Authored-By: Przemysław Witek <przemyslaw.witek@elastic.co>
Co-Authored-By: David Roberts <dave.roberts@elastic.co>
Co-Authored-By: Ed Savage <32410745+edsavage@users.noreply.github.com>
2019-12-11 11:17:15 -08:00
Lisa Cawley 3d96e6b68e
[DOCS] Move datafeed resource definitions into APIs (#50005)
Co-Authored-By: István Zoltán Szabó <istvan.szabo@elastic.co>
2019-12-11 09:50:41 -08:00
Dimitris Athanasiou 269425b54d
[ML] Introduce randomize_seed setting for regression and classification (#49990)
This adds a new `randomize_seed` for regression and classification.
When not explicitly set, the seed is randomly generated. One can
reuse the seed in a similar job in order to ensure the same docs
are picked for training.
2019-12-10 10:22:53 +02:00
Lisa Cawley 0f51bc2f72
[DOCS] Move anomaly detection job resource definitions into APIs (#49700)
Co-Authored-By: István Zoltán Szabó <istvan.szabo@elastic.co>
2019-12-06 15:32:07 -08:00
István Zoltán Szabó e5d512a8ed
[DOCS] Fixes classification evaluation example response. (#49905) 2019-12-06 13:24:22 +01:00
István Zoltán Szabó f7a5b73972
[DOCS] Adds an example of preprocessing actions to the PUT DFA API docs (#49831) 2019-12-05 14:15:19 +01:00
István Zoltán Szabó c793e80d3b
[DOCS] Fixes typo in the ML anomaly detection time functions docs. (#49834) 2019-12-05 09:57:01 +01:00
Dimitris Athanasiou bad07b76f7
[ML] Add optional source filtering during data frame reindexing (#49690)
This adds a `_source` setting under the `source` setting of a data
frame analytics config. The new `_source` is reusing the structure
of a `FetchSourceContext` like `analyzed_fields` does. Specifying
includes and excludes for source allows selecting which fields
will get reindexed and will be available in the destination index.

Closes #49531
2019-11-29 14:20:31 +02:00
lcawl 3b3f3ca925 [DOCS] Fixes typo in ML resources 2019-11-26 10:28:18 -08:00
lcawl 63b944c00f [DOCS] Fixes data type formatting 2019-11-26 08:21:39 -08:00
David Roberts 40c951d781
[ML] Add default categorization analyzer definition to ML info (#49545)
The categorization job wizard in the ML UI will use this
information when showing the effect of the chosen categorization
analyzer on a sample of input.
2019-11-25 13:20:12 +00:00
Dimitris Athanasiou 5a6967af57
[ML][DOCS] Anomaly detection job retention days settings do not require restart (#49546) 2019-11-25 15:12:41 +02:00
Dimitris Athanasiou 0390ec3627
[ML] Explain data frame analytics API (#49455)
This commit replaces the _estimate_memory_usage API with
a new API, the _explain API.

The API consolidates information that is useful before
creating a data frame analytics job.

It includes:

- memory estimation
- field selection explanation

Memory estimation is moved here from what was previously
calculated in the _estimate_memory_usage API.

Field selection is a new feature that explains to the user
whether each available field was selected to be included or
not in the analysis. In the case it was not included, it also
explains the reason why.
2019-11-22 20:08:14 +02:00
Lisa Cawley 8d214e851c
[DOCS] Clarify ML job closure prerequisites (#49265) 2019-11-19 08:31:24 -08:00
David Roberts b6c6387af5
[TEST] Mute docs snippet test in close-job.asciidoc (#49000)
Due to https://github.com/elastic/elasticsearch/pull/48583#issuecomment-552991325
2019-11-12 17:31:07 +00:00
Benjamin Trent ee8853fbc1
[ML] Add new geo_results.(actual_point|typical_point) fields for `lat_long` results (#47050)
[ML] Add new geo_results.(actual_point|typical_point) fields for `lat_long` results (#47050)

Related PR: https://github.com/elastic/ml-cpp/pull/809
2019-11-11 13:21:18 -05:00
István Zoltán Szabó 7180b90646
[DOCS] Removes best practice about fields that are highly correlated to the dependent variable. (#48935) 2019-11-11 10:00:11 -05:00
István Zoltán Szabó e9cec6e1f7
[DOCS] Extends analyzed_fields description in PUT DFA API docs. (#48307) 2019-11-11 09:53:59 -05:00
István Zoltán Szabó 6c3fed8d4d
[DOCS] Adds classification type DFA API docs and ml-shared.asciidoc (#48241) 2019-11-06 07:40:27 -05:00
István Zoltán Szabó fe92cd0a26
[DOCS] Adds classification type evaluation docs to the DFA evaluation API (#47657) 2019-11-06 07:37:14 -05:00
Lisa Cawley 29ac34a45c
[DOCS] Re-enable code snippet testing in close anomaly detection job API (#48259) 2019-10-28 08:08:38 -07:00
David Roberts d308095b28
[ML] Add option to stop datafeed that finds no data (#47922)
Adds a new datafeed config option, max_empty_searches,
that tells a datafeed that has never found any data to stop
itself and close its associated job after a certain number
of real-time searches have returned no data.
2019-10-14 13:26:06 +01:00
David Roberts fd83c18cc1
[ML] Add lazy assignment job config option (#47726)
This change adds:

- A new option, allow_lazy_open, to anomaly detection jobs
- A new option, allow_lazy_start, to data frame analytics jobs

Both work in the same way: they allow a job to be
opened/started even if no ML node exists that can
accommodate the job immediately. In this situation
the job waits in the opening/starting state until ML
node capacity is available. (The starting state for data
frame analytics jobs is new in this change.)

Additionally, the ML nightly maintenance tasks now
creates audit warnings for ML jobs that are unassigned.
This means that jobs that cannot be assigned to an ML
node for a very long time will show a yellow warning
triangle in the UI.

A final change is that it is now possible to close a job
that is not assigned to a node without using force.
This is because previously jobs that were open but
not assigned to a node were an aberration, whereas
after this change they'll be relatively common.
2019-10-14 12:13:01 +01:00
István Zoltán Szabó 448d19f0ca
[DOCS] Adds supported fields section to the PUT DFA API description (#47842) 2019-10-10 12:34:39 +02:00
István Zoltán Szabó ab08c0cd76
[DOCS] Extends the analyzed_fields description in the PUT DFA API docs (#47791) 2019-10-09 18:13:33 +02:00
Dimitris Athanasiou e99435a7f6
[ML] Additional outlier detection parameters (#47600)
Adds the following parameters to `outlier_detection`:

- `compute_feature_influence` (boolean): whether to compute or not
   feature influence scores
- `outlier_fraction` (double): the proportion of the data set assumed
   to be outlying prior to running outlier detection
- `standardization_enabled` (boolean): whether to apply standardization
   to the feature values
2019-10-07 15:28:21 +03:00
Lisa Cawley 4e4990c6a0
[DOCS] Cleans up links to security content (#47610) 2019-10-04 16:10:26 -07:00
István Zoltán Szabó b03be6e816
[DOCS] Fixes an attribute in the update datafeed API docs. (#47551) 2019-10-04 08:42:30 +02:00
István Zoltán Szabó c0da956b6e
[DOCS] Amends update datafeed API docs (#47448) 2019-10-03 13:12:19 +02:00
István Zoltán Szabó 4977baf63a
[DOCS] Adds examples to the PUT dfa and the evaluate dfa APIs (#46966)
* [DOCS] Adds examples to the PUT dfa and the evaluate dfa APIs.

* [DOCS] Removes extra lines from examples.

* Update docs/reference/ml/df-analytics/apis/evaluate-dfanalytics.asciidoc

Co-Authored-By: Lisa Cawley <lcawley@elastic.co>

* Update docs/reference/ml/df-analytics/apis/put-dfanalytics.asciidoc

Co-Authored-By: Lisa Cawley <lcawley@elastic.co>

* [DOCS] Explains examples.
2019-10-02 10:26:20 +02:00
István Zoltán Szabó aa7c4030cd
[DOCS] Fine tunes update anomaly detection job API documentation (#47280)
* [DOCS] Fine tunes update anomaly detection job API documentation.
* [DOCS] Removes delimiter to fix the table.
2019-10-02 10:04:35 +02:00
István Zoltán Szabó 4073499f43
[DOCS] Fixes typos in the PUT dfa and the evaluate dfa documentation. (#47348) 2019-10-02 09:49:59 +02:00
István Zoltán Szabó a6c517a96e
[DOCS] Changes wording to move away from data frame terminology in the ES repo (#47093)
* [DOCS] Changes wording to move away from data frame terminology in the ES repo.
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
2019-10-01 08:04:06 +02:00
István Zoltán Szabó 14227106b0
[DOCS] Adds regression analytics resources and examples to the data frame analytics APIs and the evaluation API (#46176)
* [DOCS] Adds regression analytics resources and examples to the data frame analytics APIs.
Co-Authored-By: Benjamin Trent <ben.w.trent@gmail.com>
Co-Authored-By: Tom Veasey <tveasey@users.noreply.github.com>
2019-09-19 09:10:11 +02:00
István Zoltán Szabó bd4d46c416
[DOCS] Adds outlier detection params to the data frame analytics resources (#46323)
* [DOCS] Adds outlier detection params to the data frame analytics resources.
Co-Authored-By: Tom Veasey <tveasey@users.noreply.github.com>
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
2019-09-16 14:21:50 +02:00
James Rodewig 5c78f606c2
[DOCS] Change // CONSOLE comments to [source,console] (#46440) 2019-09-09 10:45:37 -04:00
James Rodewig e43be90e6c
[DOCS] [5 of 5] Change // TESTRESPONSE comments to [source,console-results] (#46449) 2019-09-06 14:05:36 -04:00
James Rodewig 97802d8aff
[DOCS] Change // CONSOLE comments to [source,console] (#46441) 2019-09-06 10:55:16 -04:00
István Zoltán Szabó e39cdd63c3
[DOCS] Adds progress parameter description to the GET stats data frame analytics API doc. (#46434) 2019-09-06 15:17:18 +02:00
James Rodewig 466c59a4a7
[DOCS] Replace "// TESTRESPONSE" magic comments with "[source,console-result] (#46295) 2019-09-05 16:47:18 -04:00
István Zoltán Szabó 626bbccd6e
[DOCS] [PUT DFA] Documents inline the child params of source and dest (#45649)
* [DOCS] [PUT DFA] Documents inline the child params of source and dest.

* [DOCS] Fixes indentation issues and amends dfa definitions.
2019-08-29 14:38:14 +02:00
Benjamin Trent 9f716fbd4c
[ML] Throw an error when a datafeed needs CCS but it is not enabled for the node (#46044)
Though we allow CCS within datafeeds, users could prevent nodes from accessing remote clusters. This can cause mysterious errors and difficult to troubleshoot.

This commit adds a check to verify that `cluster.remote.connect` is enabled on the current node when a datafeed is configured with a remote index pattern.
2019-08-28 15:06:26 -05:00
Dimitris Athanasiou f6a97decac
[ML] Improve progress reportings for DF analytics (#45856)
Previously, the stats API reports a progress percentage
for DF analytics tasks that are running and are in the
`reindexing` or `analyzing` state.

This means that when the task is `stopped` there is no progress
reported. Thus, one cannot distinguish between a task that never
run to one that completed.

In addition, there are blind spots in the progress reporting.
In particular, we do not account for when data is loaded into the
process. We also do not account for when results are written.

This commit addresses the above issues. It changes progress
to being a list of objects, each one describing the phase
and its progress as a percentage. We currently have 4 phases:
reindexing, loading_data, analyzing, writing_results.

When the task stops, progress is persisted as a document in the
state index. The stats API now reports progress from in-memory
if the task is running, or returns the persisted document
(if there is one).
2019-08-23 17:31:36 +03:00
Przemysław Witek 31f6e78acd
Allow the user to specify 'query' in Evaluate Data Frame request (#45775) 2019-08-22 08:27:38 +02:00
Dimitris Athanasiou 8af319481e
[ML] Add description to DF analytics (#45774) 2019-08-21 19:58:09 +03:00
Przemysław Witek c6a25a818d
Add docs for HLRC for Estimate memory usage API (#45538) 2019-08-21 12:52:17 +02:00
Przemysław Witek 7107c221a7
Implement ml/data_frame/analytics/_estimate_memory_usage API endpoint (#45188) 2019-08-13 20:59:35 +02:00
István Zoltán Szabó 78e35c2c3d
[DOCS] Adds supported time units ref to the ML and DF API params. (#45322) 2019-08-08 13:43:55 +02:00
Lisa Cawley 46912c8f3d
[DOCS] Reformats ML update APIs (#45253) 2019-08-06 11:05:01 -07:00
István Zoltán Szabó fbd9c9e2e3
[DOCS] Makes clearer the note under freq_rare. (#45193) 2019-08-05 13:28:22 +02:00
James Rodewig 8b152d6d79
Rename "indices APIs" to "index APIs" (#44863) 2019-08-02 14:09:46 -04:00
Lisa Cawley 53980c6267
[DOCS] Clarifies bucket span in overall buckets API (#45110) 2019-08-02 08:36:39 -07:00
Lisa Cawley 285f2e0625
[DOCS] Updates terms in machine learning get APIs (#44986) 2019-07-30 10:52:23 -07:00
István Zoltán Szabó c22296d0c2
[DOCS] Adds allow no jobs param to the GET, GET stats and Close APIs (#44503) 2019-07-30 14:22:14 +02:00
Lisa Cawley 75999ff83c
[DOCS] Updates anomaly detection terminology (#44888) 2019-07-26 11:07:01 -07:00
Lisa Cawley 3f31859669
[DOCS] Updates terms in machine learning datafeed APIs (#44883) 2019-07-26 10:47:03 -07:00
István Zoltán Szabó 84793476ba
[DOCS] Amends data frame analytics resources, GET, and PUT API docs (#44806)
This PR addresses the feedback in  https://github.com/elastic/ml-team/issues/175#issuecomment-512215731.

* Adds an example to `analyzed_fields`
* Includes `source` and `dest` objects inline in the resource page
* Lists `model_memory_limit` in the PUT API page
* Amends the `analysis` section in the resource page
* Removes Properties headings in subsections
2019-07-26 11:39:59 +02:00
Lisa Cawley aefb72040c
[DOCS] Updates terms in machine learning calendar APIs (#44866) 2019-07-25 11:20:42 -07:00
Lisa Cawley 990e037728
[DOCS] Updates terms in anomaly detection job APIs (#44839) 2019-07-25 08:58:16 -07:00
István Zoltán Szabó 5275392b47
[DOCS] Adds allow no datafeeds query param to the GET, GET stats and STOP datafeed APIs (#44499) 2019-07-25 16:45:06 +02:00
James Rodewig ea1adb61c2
[DOCS] Update anchors and links for Elasticsearch API relocation (#44500) 2019-07-19 09:16:35 -04:00
Lisa Cawley dbe7a48e82
[DOCS] Fixes query default value (#44572) 2019-07-18 08:15:28 -07:00
Lisa Cawley 4fd8e34662
[DOCS] Moves content to ML anomaly-detection folder (#44520) 2019-07-17 13:48:12 -07:00
Lisa Cawley 146be77ec3
[DOCS] Separates data frame analytics APIs (#44451)
* [DOCS] Separates data frame analytics APIs

* [DOCS] Adds links between new pages
2019-07-16 13:22:27 -07:00
James Rodewig bd52e148c5
[DOCS] Remove :edit_url: overrides. (#44445)
These overrides do not work in Asciidoctor and are no longer needed.
2019-07-16 15:02:38 -04:00
Lisa Cawley 2316703b93
[DOCS] Removes unnecessary resource definition pages (#44289)
* [DOCS] Removes calendar resource definition page

* [DOCS] Removes scheduled event and filter resource definitions
2019-07-15 09:44:57 -07:00
David Kyle 4402cf38bf
Wait for pending tasks in docs tests cleanup (#44123)
ML and Data Frame tests should wait for pending tasks
2019-07-15 11:58:09 +01:00
James Rodewig e5a3ae97e2 Revert "[DOCS] Fix broken links for ES API docs move (#44279)"
This reverts commit 3bdd2f4432.
2019-07-12 17:06:51 -04:00
James Rodewig 860984536c Revert "[DOCS] Fix broken link reused in Stack Overview"
This reverts commit c08c253432.
2019-07-12 17:06:44 -04:00
James Rodewig f9c09fa7f6 Revert "[DOCS] Fix broken links"
This reverts commit 313030263f.
2019-07-12 17:06:28 -04:00