Commit Graph

288 Commits

Author SHA1 Message Date
István Zoltán Szabó a75094e666
[DOCS] Removes inference from the names of trained model APIs. (#62036) 2020-09-07 11:23:29 +02:00
Lisa Cawley 511babde59
[DOCS] Refresh machine learning custom URL example (#61826) 2020-09-03 16:53:26 -07:00
Lisa Cawley f05d8c2b98
[DOCS] Per-partition categorization (#61506) 2020-08-26 17:07:46 -07:00
lcawl f56ab039ae [DOCS] Fix typo in update anomaly detection job API 2020-08-25 17:12:43 -07:00
Benjamin Trent 1b34c88d56
[ML] adding docs + hlrc for data frame analysis feature_processors (#61149)
Adds HLRC and some docs for the new feature_processors field in Data frame analytics.

Co-authored-by: Przemysław Witek <przemyslaw.witek@elastic.co>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-08-24 12:00:44 -04:00
James Rodewig a94e5cb7c4
[DOCS] Replace Wikipedia links with attribute (#61171) 2020-08-17 09:44:24 -04:00
James Rodewig 6b9b8c5e31
[DOCS] Move script and stored fields content to search fields page (#60826)
Changes:

* Moves `Retrieve selected fields` to its own page and adds a title abbreviation.
* Adds existing script and stored fields content to `Retrieve selected fields`
* Adds a xref for `Retrieve selected fields` to `Search your data`
* Adds related redirects and updates existing xrefs
2020-08-06 12:45:03 -04:00
István Zoltán Szabó c3536935b2
[DOCS] Adds inference phase to get DFA job stats. (#60737) 2020-08-05 16:22:21 +02:00
Przemysław Witek 29ee3a05b6
Deprecate allow_no_jobs and allow_no_datafeeds in favor of allow_no_match (#60601) 2020-08-05 12:29:07 +02:00
James Rodewig 441c3a21b1
[DOCS] Update my-index examples (#60132)
Changes the following example index names to `my-index-000001` for consistency:

* `my-index`
* `my_index`
* `myindex`
2020-07-27 14:46:39 -04:00
Lisa Cawley 1781d4a7b9
[DOCS] Fix security links in machine learning APIs (#60098) 2020-07-23 12:14:56 -07:00
James Rodewig 2774cd6938
[DOCS] Swap `[float]` for `[discrete]` (#60124)
Changes instances of `[float]` in our docs for `[discrete]`.

Asciidoctor prefers the `[discrete]` tag for floating headings:
https://asciidoctor.org/docs/asciidoc-asciidoctor-diffs/#blocks
2020-07-23 11:48:22 -04:00
James Rodewig 80b674fb25
[DOCS] Reformat snippets to use two-space indents (#59973) 2020-07-21 12:24:26 -04:00
Przemysław Witek 2a12dcf2e0
Rename binary_soft_classification evaluation to outlier_detection (#59951) 2020-07-21 14:27:57 +02:00
Lisa Cawley fb0157460f
[DOCS] Changes level offset of anomaly detection pages (#59911) 2020-07-20 16:33:54 -07:00
Lisa Cawley 823c337e76
[DOCS] Changes level offset for anomaly detection APIs (#59920) 2020-07-20 12:38:09 -07:00
Lisa Cawley 42be287b57
[DOCS] Changes level offset in data frame analytics APIs (#59919) 2020-07-20 12:11:47 -07:00
Benjamin Trent b551f75ec3
[ML] add new `custom` field to trained model processors (#59542)
This commit adds the new configurable field `custom`.

`custom` indicates if the preprocessor was submitted by a user or automatically created by the analytics job.

Eventually, this field will be used in calculating feature importance. When `custom` is true, the feature importance for 
the processed fields is calculated. When `false` the current behavior is the same (we calculate the importance for the originating field/feature).

This also adds new required methods to the preprocessor interface. If users are to supply their own preprocessors 
in the analytics job configuration, we need to know the input and output field names.
2020-07-16 09:35:56 -04:00
Przemysław Witek dfbb47dcaa
Add a "verbose" option to the data frame analytics stats endpoint (#59589) 2020-07-15 15:59:56 +02:00
Dimitris Athanasiou da0249f6c2
[ML] Data frame analytics max_num_threads setting (#59254)
This adds a setting to data frame analytics jobs called
`max_number_threads`. The setting expects a positive integer.
When used the user specifies the max number of threads that may
be used by the analysis. Note that the actual number of threads
used is limited by the number of processors on the node where
the job is assigned. Also, the process may use a couple more threads
for operational functionality that is not the analysis itself.

This setting may also be updated for a stopped job.

More threads may reduce the time it takes to complete the job at the cost
of using more CPU.
2020-07-09 16:31:26 +03:00
James Rodewig 2be9db01c8
[DOCS] Replace `datatype` with `data type` (#58972) 2020-07-07 13:52:10 -04:00
Przemysław Witek 4a43b03855
Report peak model memory in ModelSizeStats (#59017) 2020-07-06 10:33:54 +02:00
Benjamin Trent 6238d4fc49
[ML] add exponent output aggregator to inference (#58933)
* [ML] add exponent output aggregator to inference

* fixing docs
2020-07-03 08:22:01 -04:00
Przemysław Witek 843c512e78
Rename regression evaluation metrics to make the names consistent with loss functions (#58887) 2020-07-02 16:19:27 +02:00
Przemysław Witek 38aa474dec
Implement pseudo Huber loss (PseudoHuber) evaluation metric for regression analysis (#58734) 2020-07-01 13:29:56 +02:00
Przemysław Witek dfa06240fc
Implement MSLE (MeanSquaredLogarithmicError) evaluation metric for regression analysis (#58684) 2020-06-30 13:06:15 +02:00
István Zoltán Szabó d0042fb791
[DOCS] Updates results_field description in the inference processor docs (#58554) 2020-06-29 11:28:17 +02:00
Przemysław Witek 3953de4c98
Introduce DataFrameAnalyticsConfig update API (#58302) 2020-06-29 09:26:31 +02:00
Dimitris Athanasiou 96853df6af
[ML] Rename increased_memory_estimate_bytes (#58614)
... to memory_reestimate_bytes in DF Analytics
memory usage.

Relates #58588
2020-06-27 12:04:39 +03:00
Dimitris Athanasiou 0994005c2e
[ML] Add status and increased estimate to memory usage (#58588)
Adds parsing of `status` and `increased_memory_estimate_bytes`
to data frame analytics `memory_usage`. When the training surpasses
the model memory limit, the status will be set to `hard_limit` and
`increased_memory_estimate_bytes` can be used to update the job's
limit in order to restart the job.
2020-06-26 16:10:14 +03:00
István Zoltán Szabó 3b61ec1fe2
[DOCS] Updates screenshots in ML population analysis (#58318) 2020-06-23 09:03:31 +02:00
Benjamin Trent a43ff95f2d
[ML] calculate cache misses for inference and return in stats (#58252)
When a local model is constructed, the cache hit miss count is incremented.

When a user calls _stats, we will include the sum cache hit miss count across ALL nodes. This statistic is important to in comparing against the inference_count. If the cache hit miss count is near the inference_count it indicates that the cache is overburdened, or inappropriately configured.
2020-06-18 17:18:43 -04:00
Przemysław Witek 76c7e3259f
Make ModelPlotConfig.annotations_enabled default to ModelPlotConfig.enabled if unset (#57808) 2020-06-08 15:31:37 +02:00
David Kyle bbeda643a6
Delete expired data by job (#57337)
Deleting expired data can take a long time leading to timeouts if there
are many jobs. Often the problem is due to a few large jobs which 
prevent the regular maintenance of the remaining jobs. This change adds
a job_id parameter to the delete expired data endpoint to help clean up
those problematic jobs.
2020-06-05 13:32:35 +01:00
David Roberts 605b4d0ea9
[ML] Add per-partition categorization option (#57683)
This PR adds the initial Java side changes to enable
use of the per-partition categorization functionality
added in elastic/ml-cpp#1293.

There will be a followup change to complete the work,
as there cannot be any end-to-end integration tests
until elastic/ml-cpp#1293 is merged, and also
elastic/ml-cpp#1293 does not implement some of the
more peripheral functionality, like stop_on_warn and
per-partition stats documents.

The changes so far cover REST APIs, results object
formats, HLRC and docs.
2020-06-05 11:56:15 +01:00
Dimitris Athanasiou e116ac850f
[ML] Fix race condition when force stopping DF analytics job (#57680)
When we force delete a DF analytics job, we currently first force
stop it and then we proceed with deleting the job config.
This may result in logging errors if the job config is deleted
before it is retrieved while the job is starting.

Instead of force stopping the job, it would make more sense to
try to stop the job gracefully first. So we now try that out first.
If normal stop fails, then we resort to force stopping the job to
ensure we can go through with the delete.

In addition, this commit introduces `timeout` for the delete action
and makes use of it in the child requests.
2020-06-05 12:13:02 +03:00
István Zoltán Szabó 3a15d84af9
[DOCS] Changes parameter order in model_plot_config. (#57642) 2020-06-04 10:57:36 +02:00
Przemysław Witek c4c094c006
Introduce ModelPlotConfig. annotations_enabled setting (#57539) 2020-06-04 09:27:40 +02:00
Lisa Cawley 0f52cab495
[DOCS] Replaces docdir attributes in ML APIs (#57390) 2020-06-01 11:46:10 -07:00
Benjamin Trent 251b17009a
[ML] adds new for_export flag to GET _ml/inference API (#57351)
Adds a new boolean flag, `for_export` to the `GET _ml/inference/<model_id>` API.

This flag is useful for moving models between clusters.
2020-05-29 12:29:28 -04:00
Benjamin Trent ec67787a2e
[ML] add max_model_memory parameter to forecast request (#57254)
This adds a max_model_memory setting to forecast requests. 
This setting can take a string value that is formatted according to byte sizes (i.e. "50mb", "150mb").

The default value is `20mb`.

There is a HARD limit at `500mb` which will throw an error if used.

If the limit is larger than 40% the anomaly job's configured model limit, the forecast limit is reduced to be strictly lower than that value. This reduction is logged and audited.

related native change: https://github.com/elastic/ml-cpp/pull/1238

closes: https://github.com/elastic/elasticsearch/issues/56420
2020-05-29 08:59:50 -04:00
István Zoltán Szabó eaf0d5ffee
[DOCS] Puts a link into the loss_function variable description (#56678) 2020-05-28 09:42:27 +02:00
István Zoltán Szabó b9b3546985
[DOCS] Fixes formatting of admonition paragraph in PUT inference API docs. (#57196) 2020-05-27 13:42:50 +02:00
István Zoltán Szabó 90056edaf4
[DOCS] Improves navigation between forecast APIs and adds short description. (#57035) 2020-05-25 09:09:47 +02:00
István Zoltán Szabó 69b6041d57
[DOCS] Removes the Jobs section from the ML anomaly detection APIs page. (#57031) 2020-05-21 17:30:59 +02:00
Benjamin Trent 8fed077b0a
[ML] relax throttling on expired data cleanup (#56711)
Throttling nightly cleanup as much as we do has been over cautious.

Night cleanup should be more lenient in its throttling. We still
keep the same batch size, but now the requests per second scale
with the number of data nodes. If we have more than 5 data nodes,
we don't throttle at all.

Additionally, the API now has `requests_per_second` and `timeout` set.
So users calling the API directly can set the throttling.

This commit also adds a new setting `xpack.ml.nightly_maintenance_requests_per_second`.
This will allow users to adjust throttling of the nightly maintenance.
2020-05-18 07:21:06 -04:00
David Roberts cbb8b17d74
[DOCS] Docs changes for overridden delimiter in find_file_structure (#56288)
Docs for #55735

Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-05-14 09:24:07 +01:00
Lisa Cawley 84e28e42c8
[DOCS] Clarify model snapshot retention properties (#56477) 2020-05-11 07:41:47 -07:00
István Zoltán Szabó c994369893
[DOCS] Expands GET DFA stats API docs with new phases (#56407)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2020-05-11 09:22:30 +02:00
David Roberts c99021cdcb
[ML] More advanced model snapshot retention options (#56125)
This PR implements the following changes to make ML model snapshot
retention more flexible in advance of adding a UI for the feature in
an upcoming release.

- The default for `model_snapshot_retention_days` for new jobs is now
  10 instead of 1
- There is a new job setting, `daily_model_snapshot_retention_after_days`,
  that defaults to 1 for new jobs and `model_snapshot_retention_days`
  for pre-7.8 jobs
- For days that are older than `model_snapshot_retention_days`, all
  model snapshots are deleted as before
- For days that are in between `daily_model_snapshot_retention_after_days`
  and `model_snapshot_retention_days` all but the first model snapshot
  for that day are deleted
- The `retain` setting of model snapshots is still respected to allow
  selected model snapshots to be retained indefinitely

Closes #52150
2020-05-05 12:55:50 +01:00