438 Commits

Author SHA1 Message Date
István Zoltán Szabó c879db98b1
[DOCS] Updates get trained models API docs (#79372)
* [DOCS] Updates get trained models API docs.

* [DOCS] Reviews get trained models related definitions in ml-shared.
2021-10-25 11:47:45 +02:00
István Zoltán Szabó 94ab204a1e
[DOCS] Fixes indentation issue in GET trained models API docs. (#79347) 2021-10-18 12:27:24 +02:00
Lisa Cawley 3d6074b76e
[DOCS] Fixes typo in calendar API example (#78867) 2021-10-07 17:51:14 -07:00
Lisa Cawley df5dde5b3c
[DOCS] Fixes ML get calendars API (#78808) 2021-10-07 12:22:11 -07:00
Lisa Cawley bcd75c3203
[DOCS] Fixes ML get scheduled events API (#78809) 2021-10-07 08:34:58 -07:00
Benjamin Trent 498e6e3d0f
[ML] adding docs for estimated heap and operations (#78376)
Add docs for optionally supplying memory and operation estimates in put model
2021-09-29 09:11:42 -04:00
Benjamin Trent b96d929af3
[ML] add documentation for get deployment stats API (#78412)
* [ML] add documentation for get deployment stats API

* Apply suggestions from code review

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

2021-09-29 07:20:25 -04:00
Benjamin Trent 408489310c
[ML] add zero_shot_classification task for BERT nlp models (#77799)
Zero-shot classification allows for text classification tasks without a pre-trained collection of target labels.

This is achieved through models trained on the Multi-Genre Natural Language Inference (MNLI) dataset. This dataset pairs text sequences with "entailment" clauses. For example:

"Throughout all of history, mankind has shown itself resourceful, yet astoundingly short-sighted" could have been paired with the entailment clauses: ["This example is history", "This example is sociology"...].

This training set, combined with the attention and semantic knowledge in modern NLP models (BERT, BART, etc.), provides a powerful tool for ad hoc text classification.

See https://arxiv.org/abs/1909.00161 for a deeper explanation of the MNLI training and how zero-shot works. 

The zero-shot classification task is configured as follows:
```js
{
  // <snip> model configuration </snip>
  "inference_config" : {
    "zero_shot_classification": {
      "classification_labels": ["entailment", "neutral", "contradiction"], // <1>
      "labels": ["sad", "glad", "mad", "rad"], // <2>
      "multi_label": false, // <3>
      "hypothesis_template": "This example is {}.", // <4>
      "tokenization": { /*<snip> tokenization configuration </snip>*/}
    }
  }
}
```
* <1> For all zero-shot models, classifying the target sequence returns three specific labels: "entailment" is the positive case, "neutral" is the case where the sequence is neither positive nor negative, and "contradiction" is the negative case.
* <2> An optional parameter that sets the default zero-shot labels to attempt to classify.
* <3> Whether the returned probabilities should assume that there is only one true label or that there may be multiple true labels.
* <4> The hypothesis template used when tokenizing the labels. Combined with `sad`, the sequence becomes `This example is sad.`
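
As a rough illustration of callout <4>, the sketch below expands a hypothesis template against each label. It is plain Python and not the Elasticsearch implementation; the helper name `build_hypotheses` is hypothetical, and the substitution rule (replace the `{}` placeholder with the label) is assumed from the `This example is sad.` example above.

```python
def build_hypotheses(template, labels):
    """Substitute each label for the `{}` placeholder in the template.

    Hypothetical helper: it only mirrors the example in the commit message.
    """
    return [template.replace("{}", label) for label in labels]


hypotheses = build_hypotheses("This example is {}.", ["sad", "glad", "mad", "rad"])
for hypothesis in hypotheses:
    print(hypothesis)
```

Each resulting sentence is what would be paired with the target sequence, one hypothesis per label.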

For inference in a pipeline one may provide label updates:
```js
{
  //<snip> pipeline definition </snip>
  "processors": [
    //<snip> other processors </snip>
    {
      "inference": {
        // <snip> general configuration </snip>
        "inference_config": {
          "zero_shot_classification": {
             "labels": ["humanities", "science", "mathematics", "technology"], // <1>
             "multi_label": true // <2>
          }
        }
      }
    }
    //<snip> other processors </snip>
  ]
}
```
* <1> The `labels` to classify against. These replace the default labels, if any exist.
* <2> Whether the results should allow multiple true labels.

Similarly, one may provide label changes against the `_infer` endpoint:
```js
{
   "docs":[{ "text_field": "This is a very happy person"}],
   "inference_config":{"zero_shot_classification":{"labels": ["glad", "sad", "bad", "rad"], "multi_label": false}}
}
```
2021-09-28 09:38:23 -04:00
Benjamin Trent 00defa38a9
[ML] adding some initial document for our pytorch NLP model support (#78270)
Adding docs for:

put vocab
put model definition part
start deployment
all the new NLP configuration objects for trained model configurations
2021-09-27 12:46:13 -04:00
Benjamin Trent 281ec58b8d
[ML] add new default char filter `first_line_with_letters` for machine learning categorization (#77457)
The char filter replaces the previous default of `first_non_blank_line`.

`first_non_blank_line` worked well for finding the first line that had any characters at all, but log lines
like the following were handled poorly:
```
--------------------------------------------------------------------------------

Alias 'foo' already exists and this prevents setting up ILM for logs

--------------------------------------------------------------------------------
```
When combined with the `ml_standard` tokenizer, the first line was used:
```
--------------------------------------------------------------------------------
```
This line contains no valid tokens for our standard tokenizer, so the `ml_standard` tokenizer found no tokens at all.


The new filter, `first_line_with_letters`, returns the first line that contains any letter character (that is, a character for which `Character#isLetter` returns true).

Given the previously poorly handled log, combining the new filter with our `ml_standard` tokenizer yields the following, more appropriate tokens:

```
"tokens" : ["Alias", "foo", "already", "exists", "and", "this", "prevents", "setting", "up", "ILM", "for", "logs"]
```
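
The line selection described above can be sketched in a few lines of Python. This is an illustration only, not the filter's actual code: `str.isalpha()` stands in for Java's `Character#isLetter` (the two are close but not guaranteed to agree on every code point), and returning an empty string when no line has a letter is an assumption.

```python
def first_line_with_letters(text):
    """Return the first line containing at least one letter character."""
    for line in text.splitlines():
        # str.isalpha() on a single character approximates Character#isLetter.
        if any(ch.isalpha() for ch in line):
            return line
    # Assumption: the commit message does not say what happens when no line
    # contains a letter; this sketch returns an empty string.
    return ""


separator = "-" * 80
log = "\n".join([
    separator,
    "",
    "Alias 'foo' already exists and this prevents setting up ILM for logs",
    "",
    separator,
])
print(first_line_with_letters(log))
```

With the sample log, the separator and blank lines are skipped and the `Alias 'foo' ...` line is the one handed to the tokenizer.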
2021-09-09 10:09:57 -04:00
Lisa Cawley b5a32678e7
[DOCS] Fixes admonition formatting (#77393) 2021-09-08 11:20:43 -07:00
Benjamin Trent a68c6acdb3
[ML] adding new PUT trained model vocabulary endpoint (#77387)
This commit removes the ability to set the vocabulary location in the model config.
Instead, sane defaults are always set and used, and storing the vocabulary is
wrapped up in an API.

The index is now always the internally managed `.ml-inference-native` index,
and the document ID is always `<model_id>_vocabulary`.

This API only works for pytorch/nlp type models.
2021-09-08 10:21:45 -04:00
Benjamin Trent 708491d0d3
[ML] add allocation state reason and support for partial model allocations (#76925)
Previously, if a model failed to be allocated on any node, the deployment failed.

This commit allows for an allocation to be `partially_started` and indicates its
current state via a new state value in the deployment stats API.

Additionally, when starting a deployment, the user may set `wait_for` to
`starting`, `partially_started`, or `started`, and the API will block until that state is reached or the timeout expires.
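
The blocking behaviour can be pictured with a small client-side simulation. Everything here is a sketch under stated assumptions: the real API blocks on the server, the function names are hypothetical, and the ordering of states (a later state also satisfies a wait for an earlier one) is assumed rather than taken from the commit.

```python
import time

# Assumed ordering of allocation states, earliest first.
STATE_ORDER = ["starting", "partially_started", "started"]


def has_reached(current, wait_for):
    """True if `current` is at or past the requested state (assumed ordering)."""
    return STATE_ORDER.index(current) >= STATE_ORDER.index(wait_for)


def wait_for_state(get_state, wait_for, timeout_s, poll_interval_s=0.01):
    """Poll get_state() until `wait_for` is reached; return False on timeout."""
    deadline = time.monotonic() + timeout_s
    while True:
        if has_reached(get_state(), wait_for):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(poll_interval_s)


# Simulated deployment that advances one state per poll (hypothetical).
_states = iter(STATE_ORDER)
_current = {"state": "starting"}


def fake_get_state():
    _current["state"] = next(_states, _current["state"])
    return _current["state"]


print(wait_for_state(fake_get_state, "partially_started", timeout_s=1.0))
```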
2021-09-07 15:23:13 -04:00
Benjamin Trent de49ff22a4
[ML] creating new PUT model definition part API (#76987)
This commit simplifies the interactions for uploading chunked model definitions and model vocabulary.
2021-09-07 08:22:52 -04:00
Benjamin Trent 02e17c3442
[ML] adding new defer_definition_decompression parameter to put trained model API (#77189)
This new boolean parameter allows users to put
a compressed model without it having to be
inflated on the master node during the put
request.

This is useful for system/module setup, where the
model is validated and fully parsed later, when it
is loaded on a node for use.
2021-09-03 09:07:54 -04:00
István Zoltán Szabó cdec5228e8
[DOCS] Fixes line breaks. (#77248) 2021-09-03 14:40:43 +02:00
István Zoltán Szabó 70a012b0c7
[DOCS] Fixes section IDs in start/stop trained model deployment APIs. (#77247) 2021-09-03 14:24:37 +02:00
Lisa Cawley 007469af63
[DOCS] Replaces index pattern in ML docs (#77041) 2021-09-01 10:26:06 -07:00
Benjamin Trent 0e1efa6533
[ML] generalize pytorch sentiment analysis to text classification (#77084)
* [ML] generalize pytorch sentiment analysis to text classification

* Update x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/TextClassificationConfig.java
2021-09-01 08:45:13 -04:00
István Zoltán Szabó ea007902ef
[DOCS] Adds anomaly job health alert type docs (#76659)
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2021-08-30 16:11:34 +02:00
Lisa Cawley d36f24fbc3
[DOCS] Update datafeed details in ML docs (#76854) 2021-08-25 11:35:21 -07:00
István Zoltán Szabó 789368b38f
[DOCS] Fixes a syntax error in datafeed runtime field example. (#76917) 2021-08-25 12:04:32 +02:00
István Zoltán Szabó 8aed99fc02
[DOCS] Adds links that point to loss function to ML API docs. (#76438) 2021-08-23 13:09:37 +02:00
István Zoltán Szabó 7faec52a1e
[DOCS] Fixes model_prune_window property description. (#76711) 2021-08-19 16:16:37 +02:00
István Zoltán Szabó b9d875bf68
[DOCS] Updates description of model_prune_window property in ML shared (#76487) 2021-08-13 12:18:38 +02:00
István Zoltán Szabó 9b0417f2df
[DOCS] Comments out links that points to regression loss functions (#76435)
* [DOCS] Comments out links that points to regression loss functions.

* Update docs/reference/ml/df-analytics/apis/get-trained-models.asciidoc
2021-08-12 18:33:42 +02:00
David Roberts 7ac5ea39df
[ML] Use results retention time for deleting system annotations (#76096)
In #75617 a new setting, system_annotations_retention_days, was
added to control how long system annotations are retained for.
We now feel that this setting is redundant and that system
annotations should be retained for the same period as results.
This is intuitive and defensible, as system annotations can be
considered a type of result.

Followup to #75617
2021-08-04 17:42:31 +01:00
David Roberts 10a1d27c7b
[ML] Deleting a job now deletes the datafeed if necessary (#76010)
Previously attempting to delete a job that had a datafeed
would return an exception. However, this was unnecessarily
pedantic - the user would always want to delete both job
and datafeed together, and would react by deleting the
datafeed and then subsequently deleting the job again.

This change makes the delete job API automatically delete
a datafeed associated with the job. The same level of
force is used for this delete datafeed request as was used
on the delete job request. This means that it's possible
to force-delete an open job with a started datafeed (since
force-delete datafeed will automatically stop a started
datafeed). It's still not possible to delete an opened job
without using force.
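
The rules above can be modelled with in-memory dictionaries. This is a sketch of the described behaviour only, not the Elasticsearch code: the dictionary shapes, the function names, and the `DeleteError` exception are all hypothetical.

```python
class DeleteError(Exception):
    """Hypothetical error type standing in for the API's rejection."""


def delete_datafeed(datafeed, force):
    if datafeed["started"]:
        if not force:
            raise DeleteError("cannot delete a started datafeed without force")
        # Force-delete stops a started datafeed before removing it.
        datafeed["started"] = False
    datafeed["deleted"] = True


def delete_job(job, force=False):
    if job["opened"] and not force:
        raise DeleteError("cannot delete an opened job without force")
    datafeed = job.get("datafeed")
    if datafeed is not None:
        # The same level of force is passed on to the datafeed delete.
        delete_datafeed(datafeed, force)
    job["deleted"] = True


job = {"opened": True, "datafeed": {"started": True}}
try:
    delete_job(job)  # rejected: the job is open and force was not used
except DeleteError as err:
    print(err)

delete_job(job, force=True)  # stops and deletes the datafeed, then the job
print(job["deleted"], job["datafeed"]["deleted"])
```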
2021-08-03 17:22:06 +01:00
James Rodewig fc0ac1923d
[DOCS] Correct spelling for geo terms (#76028)
Changes:
* Use "geopoint" when not referring to the literal field type
* Use "geoshape" when not referring to the literal field type or query type
* Use "GeoJSON" consistently
2021-08-03 09:55:48 -04:00
Ed Savage 5651215be1
[ML] Add 'model_prune_window' field to AD job config (#75741)
Add configuration for pruning dead split fields in anomaly detection
jobs via the `model_prune_window` field for both the job creation and
update APIs.

Relates to ml-cpp/#1962
2021-08-03 09:16:43 +01:00
István Zoltán Szabó ce537a33b6
[DOCS] Adds link that points to outlier detection example to GET DFA stats API docs. (#75689) 2021-08-02 18:10:03 +02:00
István Zoltán Szabó 8d4fb3aa84
[DOCS] Changes link to outlier detection docs in PUTDFA API docs. (#75933) 2021-08-02 13:45:37 +02:00
Przemysław Witek 30d9f13436
[ML] Delete expired annotations (#75617) 2021-07-29 15:27:03 +02:00
Lisa Cawley c1ba949aee
[DOCS] Fixes bulleted list in ML aggregations (#75806) 2021-07-28 11:29:48 -07:00
Lisa Cawley 02d851e50e
[DOCS] Drafts trained model deployment APIs (#75497) 2021-07-26 09:49:37 -07:00
István Zoltán Szabó 7e7a386078
[DOCS] Comments out link that points to outlier detection example (#75687) 2021-07-26 16:36:57 +02:00
Lisa Cawley 70b870ee7f
[DOCS] Fixes nesting of datafeed config in APIs (#75502) 2021-07-20 11:27:15 -07:00
István Zoltán Szabó 9ef156df9f
[DOCS] Adds peak_model_bytes and assignment_memory_basis to GET model snapshot API docs (#75413) 2021-07-16 17:12:47 +02:00
Lisa Cawley c8c7f0ef52
[DOCS] Anomaly detection: Visualize delayed data (#75098) 2021-07-13 18:06:07 -07:00
Lisa Cawley 3c76bcb3a5
[DOCS] Fixes links to machine learning concepts (#75194) 2021-07-09 13:09:03 -07:00
István Zoltán Szabó 6a4de77e11
[DOCS] Adds classification and regression links back to DFA docs. (#74930) 2021-07-08 16:37:16 +02:00
István Zoltán Szabó 841cfb9214
[DOCS] Adds outlier detection links to DFA API docs (#74748) 2021-07-06 15:10:41 +02:00
Lisa Cawley b71b7d0866
[DOCS] Fix links to anomaly detection overview (#74943) 2021-07-05 13:19:54 -07:00
Lisa Cawley 4c85852cc7
[DOCS] Update forecasting links in ML APIs (#74942) 2021-07-05 12:34:03 -07:00
Lisa Cawley 5bcd318e29
[DOCS] Move ML functions to appendix (#74802) 2021-07-05 11:53:17 -07:00
István Zoltán Szabó 483d145f78
[DOCS] Fixes an attribute in PUT DFA API docs. (#74931) 2021-07-05 17:08:11 +02:00
István Zoltán Szabó 6c6e6874ff
[DOCS] Removes link to classification and regression. (#74926) 2021-07-05 16:28:14 +02:00
István Zoltán Szabó a4f9f4fae1
[DOCS] Comments out links to outlier detection. (#74745) 2021-06-30 14:24:34 +02:00
Lisa Cawley 64af39b759
[DOCS] Add memory limit details in update job API (#74517)
Co-authored-by: David Roberts <dave.roberts@elastic.co>
2021-06-24 08:50:19 -07:00
Benjamin Trent 0303e6d733
[ML] add datafeed field to the job config (#74265)
This is a quality-of-life improvement for typical users. Almost all anomaly jobs will receive their data through a datafeed.

The datafeed config can now be supplied and is available in the datafeed field in the job config for creation and getting jobs.
2021-06-23 08:06:58 -04:00