Commit Graph

544 Commits

Author SHA1 Message Date
Lisa Cawley 3f2f9de928
[DOCS] Refresh machine learning rule docs (#92013) 2022-12-05 07:47:42 -08:00
István Zoltán Szabó 99415818e2
[DOCS] Adds semantic search API to the trained model API list (#91815) 2022-11-22 18:08:06 +01:00
Ed Savage e0e32caf28
[ML] Option to delete user-added annotations for the reset/delete job APIs (#91698)
Currently there is no way to remove user-added annotations when a job is deleted or reset.
This change adds an option - delete_user_annotations - to both the delete and reset job APIs.
The default value is false, to keep the behaviour of these calls as it is currently.
2022-11-18 17:17:33 +00:00
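A minimal sketch of the new option, assuming query-parameter placement on the existing delete and reset endpoints (`my-job` is a hypothetical job ID):

```
# Delete the job and remove its user-added annotations (default: false)
DELETE _ml/anomaly_detectors/my-job?delete_user_annotations=true

# Reset the job, likewise removing user-added annotations
POST _ml/anomaly_detectors/my-job/_reset?delete_user_annotations=true
```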
David Kyle 7b9a6fe3db
[ML] Correct index for text_similarity config (#91644) 2022-11-17 10:58:36 +00:00
István Zoltán Szabó 612a7b673a
[DOCS] Highlights inference caching behavior (#91608) 2022-11-16 13:17:49 +01:00
Benjamin Trent 2e8bf33b0a
[ML] allow model_aliases to be used with Pytorch trained models (#91296)
This adds model_alias support for native PyTorch models.

Model aliases can be used in `_infer` or within the inference processor. This way the alias can be changed atomically, without downtime, to point at another deployed model.

Restrictions:
 - Model alias changes need to be done between two models of the same kind (e.g. pytorch -> pytorch)
 - A model alias cannot be changed from a model that is deployed to one that is not
 - A model alias cannot be changed from a model that is deployed AND allocated to one that is deployed but NOT allocated (not assigned to any nodes)
 - A deployment cannot be stopped (without supplying the `force` parameter) when the model has a model alias that is used by a pipeline.


closes: https://github.com/elastic/elasticsearch/issues/90960
2022-11-08 08:35:33 -05:00
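A sketch of the workflow this enables, using the existing model alias API (the model IDs and alias name are hypothetical):

```
# Atomically repoint the alias from one deployed PyTorch model to another;
# reassign is required because the alias already exists
PUT _ml/trained_models/pytorch-sentiment-v2/model_aliases/sentiment?reassign=true

# Callers keep inferring against the alias, unaware of the swap
POST _ml/trained_models/sentiment/_infer
{
  "docs": [{ "text_field": "The movie was great" }]
}
```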
Lisa Cawley 9e83084020
[DOCS] Clarify description of geo_results (#91237) 2022-11-04 08:15:46 -07:00
Dimitris Athanasiou 4e67df8b05
[ML] Low priority trained model deployments (#91234)
This adds a new parameter to the start trained model deployment API,
namely `priority`. The available settings are `normal` and `low`.

For normal priority deployments the allocations get distributed so that
node processors are never oversubscribed.

Low priority deployments allow users to test model functionality even if there
are no node processors available. They are limited to 1 allocation with a single thread.
In addition, the process is executed at low priority, which limits the amount of
CPU it can use when the CPU is under pressure. The intention is to
limit the impact of low priority deployments on normal priority deployments.

When we rebalance model assignments we now:

  1. compute a plan just for normal priority deployments
  2. fix the resources used by normal deployments
  3. compute a plan just for low priority deployments
  4. merge the two plans

Closes #91024
2022-11-04 14:22:30 +02:00
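A sketch of starting such a deployment (model ID hypothetical; `priority` as a query parameter is an assumption based on the description above):

```
# Limited to 1 allocation with a single thread, run at low process priority
POST _ml/trained_models/my-model/deployment/_start?priority=low
```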
Valeriy Khakhutskyy 7c4186ddbc
[ML] Update API documentation for anomaly score explanation (#91177)
This PR updates the API documentation to match the UI.

Co-authored-by: lcawl <lcawley@elastic.co>
2022-11-01 21:43:33 +01:00
Valeriy Khakhutskyy 95758e88a2
[ML] Explain anomaly score factors (#90675)
This PR surfaces new information about the impact of the factors on the initial anomaly score in the anomaly record:

- single bucket impact is determined by the deviation between actual and typical in the current bucket
- multi-bucket impact is determined by the deviation between actual and typical in the past 12 buckets
- anomaly characteristics are statistical properties of the current anomaly compared to the historical observations
- high variance penalty is the reduction of anomaly score in the buckets with large confidence intervals
- incomplete bucket penalty is the reduction of anomaly score in the buckets with fewer samples than historically expected

Additionally, we compute lower- and upper-confidence bounds and the typical value for the anomaly records. This improves the explainability of the cases where the model plot is not activated with only a slight overhead in performance (1-2%).
2022-10-12 16:57:06 +02:00
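A sketch of where the new information surfaces, using the get records API (job ID hypothetical; the exact field names in the explanation object are assumptions):

```
GET _ml/anomaly_detectors/my-job/results/records
{
  "sort": "record_score",
  "desc": true
}

# Each returned record is expected to carry an explanation object with the
# factors above, e.g. single_bucket_impact, multi_bucket_impact,
# anomaly_characteristics_impact, high_variability_penalty and
# incomplete_bucket_penalty, plus lower/upper confidence bounds and the
# typical value.
```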
Dimitris Athanasiou 16bfc550ea
[ML] Add api to update trained model deployment number_of_allocations (#90728)
This commit adds a new API that users can call:

```
POST _ml/trained_models/{model_id}/deployment/_update
{
  "number_of_allocations": 4
}
```

This allows a user to update the number of allocations for a deployment
that is `started`.

If the allocations are increased we rebalance and let the assignment
planner find how to allocate the additional allocations.

If the allocations are decreased we cannot use the assignment planner.
Instead, we implement the reduction in a new class `AllocationReducer`
that tries to reduce the allocations so that:

  1. availability zone balance is maintained
  2. assignments that can be completely stopped are preferred to release memory
2022-10-12 10:04:23 +03:00
Lisa Cawley db2882cbb5
[DOCS] Add links to clear trained model deployment cache API (#90727) 2022-10-06 10:10:55 -07:00
David Kyle 17579ae1af
[ML] Add stat for non cache hit inference time (#90464) 2022-09-29 12:18:27 +01:00
David Roberts d9ea080d10
[ML] Release native inference functionality as beta (#90418)
Previously this functionality was tech preview (aka experimental).
This PR changes it to beta.
2022-09-28 11:09:02 +01:00
Ed Savage fd20027751
[ML] Performance improvements for categorization jobs (#89824)
Categorization of strings which break down to a huge number of tokens can cause the C++ backend process to choke - see elastic/ml-cpp#2403.

This PR adds a limit filter to the default categorization analyzer which caps the number of tokens passed to the backend at 100.

Unfortunately this isn't a complete panacea for all the issues surrounding categorization of many-token / large messages, as verification checks on the frontend can also fail due to calls to the datafeed _preview API returning an excessive amount of data.
2022-09-08 18:41:01 +01:00
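The cap corresponds to a `limit` token filter in the categorization analyzer; a trimmed sketch of the equivalent explicit configuration (job ID and field names hypothetical, other default filters omitted):

```
PUT _ml/anomaly_detectors/it-ops-logs
{
  "analysis_config": {
    "bucket_span": "30m",
    "categorization_field_name": "message",
    "categorization_analyzer": {
      "tokenizer": "ml_standard",
      "filter": [
        { "type": "limit", "max_token_count": 100 }
      ]
    },
    "detectors": [
      { "function": "count", "by_field_name": "mlcategory" }
    ]
  },
  "data_description": { "time_field": "@timestamp" }
}
```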
István Zoltán Szabó 7de1a6efc5
[DOCS] Simplifies composite aggregation recommendation (#89878) 2022-09-07 17:54:05 +02:00
István Zoltán Szabó e244473962
[DOCS] Reworks aggregating data for faster performance page (#89575) 2022-09-01 13:59:05 +02:00
István Zoltán Szabó cbda0a51c6
[DOCS] Adds text similarity task example to API docs (#89756) 2022-09-01 11:53:26 +02:00
Dimitris Athanasiou b5504ea701
[ML] Lift limit of max number of classes for classification to 100 (#89755)
Limit was previously set to `30`. After the improvements in elastic/ml-cpp#2395
we now raise the limit to `100`.
2022-09-01 10:47:58 +03:00
Dimitris Athanasiou 32d512286d
[ML] Validate trained model deployment queue_capacity limit (#89573)
When starting a trained model deployment, a queue is created.
If the queue_capacity is too large, it can lead to OOM and a node
crash.

This commit adds validation that the queue_capacity cannot be more
than 1M.

Closes #89555
2022-08-24 16:52:19 +03:00
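A sketch of the validated parameter (model ID hypothetical):

```
# Accepted: within the new limit of 1,000,000
POST _ml/trained_models/my-model/deployment/_start?queue_capacity=100000

# Rejected after this change: exceeds 1,000,000
POST _ml/trained_models/my-model/deployment/_start?queue_capacity=2000000
```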
István Zoltán Szabó 74d694e0fd
[DOCS] Resizes anomaly detection screenshot properly. (#89544) 2022-08-23 16:38:15 +02:00
István Zoltán Szabó ac71b52ab3
[DOCS] Updates anomaly detection alert rule type screenshot. (#89532) 2022-08-23 15:37:40 +02:00
Benjamin Trent d588d456f0
[ML] add new trained model deployment cache clear API (#89074)
This adds a new `_ml/trained_models/<model_id>/deployment/cache/_clear` API. This will clear the inference cache on every node where the model is allocated.
2022-08-04 19:45:15 +01:00
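A minimal invocation of the new endpoint (model ID hypothetical):

```
# Clears the inference cache on every node the model is allocated to
POST _ml/trained_models/my-model/deployment/cache/_clear
```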
Benjamin Trent 9ce59bb7a9
[ML] add text_similarity nlp task documentation (#88994)
Introduced in: #88439

* [ML] add text_similarity nlp task documentation
* Apply suggestions from code review
* Update docs/reference/ml/trained-models/apis/infer-trained-model.asciidoc
* Update docs/reference/ml/ml-shared.asciidoc

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
2022-08-02 12:17:14 -04:00
Dimitris Athanasiou 3f9334012f
[ML] Fix version substitution in put DFA docs (#88862)
This fixes the version substitution in a couple of response examples in
the put DFA docs.
2022-07-28 01:37:30 +09:30
David Roberts 15e7b06b79
[ML] Add inference cache hit count to inference node stats (#88807)
The inference node stats for deployed PyTorch inference
models now contain two new fields: `inference_cache_hit_count`
and `inference_cache_hit_count_last_minute`.

These indicate how many inferences on that node were served
from the C++-side response cache that was added in
https://github.com/elastic/ml-cpp/pull/2305. Cache hits
occur when exactly the same inference request is sent to the
same node more than once.

The `average_inference_time_ms` and
`average_inference_time_ms_last_minute` fields now refer to
the time taken to do the cache lookup, plus, if necessary,
the time to do the inference. We would expect average inference
time to be vastly reduced in situations where the cache hit
rate is high.
2022-07-26 17:53:43 +01:00
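A sketch of where the new fields appear, via the trained model stats API (model ID hypothetical; the exact nesting in the response is an assumption):

```
GET _ml/trained_models/my-model/_stats

# Per-node deployment stats are expected to include, alongside the averages:
#   "inference_cache_hit_count": 420,
#   "inference_cache_hit_count_last_minute": 7
```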
Benjamin Trent a044b5c01e
[ML] make composite aggs in datafeeds Generally Available (#88589)
This commit makes composite aggs in datafeeds generally available.
2022-07-19 12:41:25 -04:00
Benjamin Trent afa28d49b4
[ML] add new cache_size parameter to trained_model deployments API (#88450)
With: https://github.com/elastic/ml-cpp/pull/2305 we now support caching pytorch inference responses per node per model.

By default, the cache will be the same size as the model's on-disk size. This is because our current best estimate for memory used (for deploying) is 2*model_size + constant_overhead.

This is due to the model having to be loaded in memory twice when serializing to the native process. 

But, once the model is in memory and accepting requests, its actual memory usage is reduced vs. what we have "reserved" for it within the node.

Consequently, having a cache layer that takes advantage of that unused (but reserved) memory is effectively free. When used in production, especially in search scenarios, caching inference results is critical for decreasing latency.
2022-07-18 09:19:01 -04:00
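A sketch of overriding the default at start time (model ID and sizes hypothetical; treating `0b` as "disable caching" is an assumption):

```
# Cap the per-node inference response cache below the on-disk model size
POST _ml/trained_models/my-model/deployment/_start?cache_size=256mb

# Opt out of caching entirely
POST _ml/trained_models/my-model/deployment/_start?cache_size=0b
```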
István Zoltán Szabó cf68d0f13c
[DOCS] Updates infer trained model API docs with inference_config (#88500)
Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>
2022-07-13 17:47:05 +02:00
Lisa Cawley 7e214fc51b
[DOCS] Add authorization info to create, get, and update DFA jobs APIs (#88098) 2022-06-30 08:41:04 -07:00
Lisa Cawley c9b4499d2e
[DOCS] Add authorization details to update datafeed API (#88099) 2022-06-28 13:43:58 -07:00
Lisa Cawley aa19690990
[DOCS] Add authorization to anomaly detection job and datafeed API examples (#87937) 2022-06-27 13:05:35 -07:00
Dimitris Athanasiou f3199e968b
[ML] Adjust docs for distributed model allocation (#87955)
[ML] Adjust docs for distributed model allocation

Follow up to #87366
2022-06-23 15:35:58 +03:00
Lisa Cawley 76cd7b63a4
[DOCS] Add authorization info to get anomaly detection jobs API (#87904) 2022-06-22 15:15:33 -07:00
István Zoltán Szabó 78c0ad91fc
[DOCS] Adds note to time_of_week function about how values are calculated (#87871)
Co-authored-by: Tom Veasey <tveasey@users.noreply.github.com>
2022-06-22 10:22:49 +02:00
Dimitris Athanasiou 679351e224
[ML] Require that threads_per_allocation is a power of 2 (#87697)
As the number of cores in CPUs is typically a power of 2,
this commit adds a validation that trained model deployments
start with `threads_per_allocation` set to be a power of 2.
When we work out how to distribute the allocations across the
cluster, this prevents situations where many CPU cores are wasted.

In addition, we add a max value limit of `32`.
2022-06-17 15:12:37 +03:00
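A sketch of the validation (model ID hypothetical):

```
# Accepted: a power of 2, no greater than 32
POST _ml/trained_models/my-model/deployment/_start?threads_per_allocation=4

# Rejected after this change: 3 is not a power of 2
POST _ml/trained_models/my-model/deployment/_start?threads_per_allocation=3
```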
Lisa Cawley 5b6838e6ec
[DOCS] Fix typo in anomaly detection example (#87668) 2022-06-14 14:34:33 -07:00
Lisa Cawley 32f6082b7e
[DOCS] Typo in time functions (#87373) 2022-06-03 08:40:12 -07:00
István Zoltán Szabó a71ad6e407
[DOCS] Expands AD and Transform alert docs with info on context for recovered alerts (#87118) 2022-06-02 09:52:47 +02:00
Benjamin Trent 115f19ff6d
[ML] adds start and end params to _preview and excludes cold/frozen tiers from unbounded previews (#86989)
In larger clusters with complicated datafeed requirements, being able to preview only a specific window of time is important. Previously, datafeed previews would always start at 0 (or from the beginning of the data). This causes issues if the index pattern contains indices on slower hardware, but when the datafeed is actually started, the "start" time is set to more recent data (and thus on faster hardware).

Additionally, when _preview is unbounded (as before), it attempts to preview only indices that are NOT frozen or cold. This is done through a query against the _tier field, meaning it only affects newer indices that actually have that field set.
2022-05-20 13:56:53 -04:00
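A sketch of a bounded preview (datafeed ID and epoch-millisecond timestamps hypothetical; query-parameter placement is an assumption):

```
# Preview a specific recent window instead of starting from the beginning of the data
GET _ml/datafeeds/datafeed-my-job/_preview?start=1652954400000&end=1653040800000
```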
István Zoltán Szabó f3e8904b2c
[DOCS] Adds settings of question_answering to inference_config of PUT and infer trained model APIs (#86895)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2022-05-19 11:04:14 +02:00
Lisa Cawley 6b7320790f
[DOCS] Updates example output for start trained model deployment API (#86824) 2022-05-17 07:27:44 -07:00
Lisa Cawley a9c8c12814
[DOCS] Removes infer trained model deployment API (#86497) 2022-05-10 09:56:36 -07:00
Dimitris Athanasiou 68c51f3ada
[ML] Rename threading params in _start trained model deployment API (#86597)
When starting a trained model deployment the user can tweak performance
by setting the `model_threads` and `inference_threads` parameters.
These parameters are hard to understand and cause confusion.

This commit renames these as well as the fields where their values are
reported in the stats API.

- `model_threads` => `number_of_allocations`
- `inference_threads` => `threads_per_allocation`

Now the terminology is as follows.

A model deployment starts with a requested `number_of_allocations`.
Each allocation means the model gets another thread for executing
parallel inference requests. Thus, more allocations should increase
throughput. In turn, each allocation may use a number
of threads to parallelize each individual inference request.
This is the `threads_per_allocation` setting and increases inference
speed (which might also result in improved throughput).
2022-05-10 17:41:00 +03:00
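Under the new names, a deployment serving four parallel requests, each parallelized across two threads, might be started as follows (model ID hypothetical):

```
POST _ml/trained_models/my-model/deployment/_start?number_of_allocations=4&threads_per_allocation=2
```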
Lisa Cawley 89a3e18e10
[DOCS] Add preview admonition to infer API (#86486) 2022-05-05 13:49:02 -07:00
Benjamin Trent a907f0bb6f
[ML] add new trained_models/{model_id}/_infer endpoint for all supervised models and deprecate deployment infer api (#86361)
This commit adds a new `_ml/trained_models/{model_id}/_infer` API. This API works for both native NLP models and supervised models trained via Data Frame analytics.

The format of the API is the same as the old `_ml/trained_models/{model_id}/deployment/_infer`, taking a `docs` and an `inference_config` parameter.

This PR also deprecates the old experimental `_ml/trained_models/{model_id}/deployment/_infer` API.

The biggest difference is that the response now nests all results under an "inference_results" object.

closes: https://github.com/elastic/elasticsearch/issues/86032
2022-05-05 14:58:59 -04:00
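A sketch of the unified endpoint (model ID and document hypothetical):

```
POST _ml/trained_models/my-ner-model/_infer
{
  "docs": [{ "text_field": "Sarah works at Elastic in Amsterdam" }],
  "inference_config": { "ner": {} }
}

# The response now nests all results under "inference_results".
```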
Benjamin Trent 25d1afbe6f
[ML] rename trained model allocations to assignments (#85503)
This renames the internal concept of a trained model allocation into an assignment.

Models are now assigned to a node, with routes created for inference; they are no longer "allocated".

This is an internal rename only. The user facing concepts of trained models and deployments are untouched.
2022-04-18 11:35:10 -04:00
István Zoltán Szabó 7f556ece75
[DOCS] Adds size param to evaluate DFA API docs (#85735) 2022-04-07 10:03:09 +02:00
Dimitris Athanasiou 5d670e45ac
Revert "[ML] Only one of `inference_threads` and `model_threads` may be great… (#84794)" (#85089)
This reverts commit 4eaedb265d.

On further investigation of how to improve allocation of trained models,
we concluded that being able to set `inference_threads` in combination with
`model_threads` is fundamental for scalability.
2022-03-18 09:41:27 +02:00
Benjamin Trent 258d2b71e2
[ML] add roberta/bart docs (#85001)
Adds a RoBERTa section to the NLP tokenization documentation.
2022-03-17 12:14:57 -04:00