elasticsearch

Commit Graph

Author	SHA1	Message	Date
István Zoltán Szabó	c879db98b1	[DOCS] Updates get trained models API docs (#79372) * [DOCS] Updates get trained models API docs. * [DOCS] Reviews get trained models related definitions in ml-shared.	2021-10-25 11:47:45 +02:00
István Zoltán Szabó	94ab204a1e	[DOCS] Fixes indentation issue in GET trained models API docs. (#79347)	2021-10-18 12:27:24 +02:00
Lisa Cawley	3d6074b76e	[DOCS] Fixes typo in calendar API example (#78867)	2021-10-07 17:51:14 -07:00
Lisa Cawley	df5dde5b3c	[DOCS] Fixes ML get calendars API (#78808)	2021-10-07 12:22:11 -07:00
Lisa Cawley	bcd75c3203	[DOCS] Fixes ML get scheduled events API (#78809)	2021-10-07 08:34:58 -07:00
Benjamin Trent	498e6e3d0f	[ML] adding docs for estimated heap and operations (#78376) Add docs for optionally supplying memory and operation estimates in put model	2021-09-29 09:11:42 -04:00
Benjamin Trent	b96d929af3	[ML] add documentation for get deployment stats API (#78412) * [ML] add documentation for get deployment stats API * Apply suggestions from code review Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>	2021-09-29 07:20:25 -04:00
Benjamin Trent	408489310c	[ML] add zero_shot_classification task for BERT nlp models (#77799) Zero-shot classification allows for text classification tasks without a pre-trained collection of target labels. This is achieved through models trained on the Multi-Genre Natural Language Inference (MNLI) dataset. This dataset pairs text sequences with "entailment" clauses. For example, "Throughout all of history, mankind has shown itself resourceful, yet astoundingly short-sighted" could be paired with the entailment clauses: ["This example is history", "This example is sociology"...]. This training set, combined with the attention and semantic knowledge in modern-day NLP models (BERT, BART, etc.), affords a powerful tool for ad-hoc text classification. See https://arxiv.org/abs/1909.00161 for a deeper explanation of the MNLI training and how zero-shot works. The zero-shot classification task is configured as follows: ```js { // <snip> model configuration </snip> "inference_config" : { "zero_shot_classification": { "classification_labels": ["entailment", "neutral", "contradiction"], // <1> "labels": ["sad", "glad", "mad", "rad"], // <2> "multi_label": false, // <3> "hypothesis_template": "This example is {}.", // <4> "tokenization": { /* <snip> tokenization configuration </snip> */} } } } ``` * <1> For all zero_shot models, three particular labels are returned when classifying the target sequence: "entailment" is the positive case, "neutral" the case where the sequence is neither positive nor negative, and "contradiction" the negative case. * <2> An optional parameter setting the default zero_shot labels to attempt to classify. * <3> When returning the probabilities, whether the results should assume there is only one true label or multiple true labels. * <4> The hypothesis template used when tokenizing the labels.
When combined with `sad`, the sequence looks like `This example is sad.` For inference in a pipeline, one may provide label updates: ```js { //<snip> pipeline definition </snip> "processors": [ //<snip> other processors </snip> { "inference": { // <snip> general configuration </snip> "inference_config": { "zero_shot_classification": { "labels": ["humanities", "science", "mathematics", "technology"], // <1> "multi_label": true // <2> } } } } //<snip> other processors </snip> ] } ``` * <1> The `labels` we care about; these replace the default ones if they exist. * <2> Whether the results should allow multiple true labels. Similarly, one may provide label changes against the `_infer` endpoint: ```js { "docs":[{ "text_field": "This is a very happy person"}], "inference_config":{"zero_shot_classification":{"labels": ["glad", "sad", "bad", "rad"], "multi_label": false}} } ```	2021-09-28 09:38:23 -04:00
Benjamin Trent	00defa38a9	[ML] adding some initial documentation for our pytorch NLP model support (#78270) Adding docs for: put vocab, put model definition part, start deployment, and all the new NLP configuration objects for trained model configurations	2021-09-27 12:46:13 -04:00
Benjamin Trent	281ec58b8d	[ML] add new default char filter `first_line_with_letters` for machine learning categorization (#77457) The char filter replaces the previous default of `first_non_blank_line`. `first_non_blank_line` worked well to figure out what line had characters at all, but log lines like the following were handled poorly: ``` -------------------------------------------------------------------------------- Alias 'foo' already exists and this prevents setting up ILM for logs -------------------------------------------------------------------------------- ``` When combined with the `ml_standard` tokenizer, the first line was used: ``` -------------------------------------------------------------------------------- ``` This has no valid tokens for our standard tokenizer. Consequently, no tokens were found by the `ml_standard` tokenizer. The new filter, `first_line_with_letters`, returns the first line with any letter character (that is, any character for which `Character#isLetter` returns true). Given the previously poorly handled log, when combined with our `ml_standard` tokenizer, we get the following, more appropriate, tokens: ``` "tokens" : ["Alias", "foo", "already", "exists", "and", "this", "prevents", "setting", "up", "ILM", "for", "logs"] ```	2021-09-09 10:09:57 -04:00
Lisa Cawley	b5a32678e7	[DOCS] Fixes admonition formatting (#77393)	2021-09-08 11:20:43 -07:00
Benjamin Trent	a68c6acdb3	[ML] adding new PUT trained model vocabulary endpoint (#77387) This commit removes the ability to set the vocabulary location in the model config. Instead, sensible defaults are set and used, and vocabulary upload is wrapped in an API. The index is now always the internally managed .ml-inference-native index, and the document ID is always <model_id>_vocabulary. This API only works for pytorch/nlp type models.	2021-09-08 10:21:45 -04:00
Benjamin Trent	708491d0d3	[ML] add allocation state reason and support for partial model allocations (#76925) Previously, if a model failed to be allocated on any node, the deployment failed. This commit allows an allocation to be partially_started and indicates its current state via a new state value in the deployment stats API. Additionally, when starting a deployment, the user may set wait_for to starting, partially_started, or started, and the API will block until that state is reached (as long as the timeout doesn't expire).	2021-09-07 15:23:13 -04:00
Benjamin Trent	de49ff22a4	[ML] creating new PUT model definition part API (#76987) This commit simplifies the interactions for uploading chunked model definitions and model vocabulary.	2021-09-07 08:22:52 -04:00
Benjamin Trent	02e17c3442	[ML] adding new defer_definition_decompression parameter to put trained model API (#77189) This new boolean parameter allows users to put a compressed model without having it inflated on the master node during the put request. This is useful for system/module setup, with the model later validated and fully parsed when it is loaded on a node for use.	2021-09-03 09:07:54 -04:00
István Zoltán Szabó	cdec5228e8	[DOCS] Fixes line breaks. (#77248)	2021-09-03 14:40:43 +02:00
István Zoltán Szabó	70a012b0c7	[DOCS] Fixes section IDs in start/stop trained model deployment APIs. (#77247)	2021-09-03 14:24:37 +02:00
Lisa Cawley	007469af63	[DOCS] Replaces index pattern in ML docs (#77041)	2021-09-01 10:26:06 -07:00
Benjamin Trent	0e1efa6533	[ML] generalize pytorch sentiment analysis to text classification (#77084) * [ML] generalize pytorch sentiment analysis to text classification * Update x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/ml/inference/trainedmodel/TextClassificationConfig.java	2021-09-01 08:45:13 -04:00
István Zoltán Szabó	ea007902ef	[DOCS] Adds anomaly job health alert type docs (#76659) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2021-08-30 16:11:34 +02:00
Lisa Cawley	d36f24fbc3	[DOCS] Update datafeed details in ML docs (#76854)	2021-08-25 11:35:21 -07:00
István Zoltán Szabó	789368b38f	[DOCS] Fixes a syntax error in datafeed runtime field example. (#76917)	2021-08-25 12:04:32 +02:00
István Zoltán Szabó	8aed99fc02	[DOCS] Adds links that point to loss function to ML API docs. (#76438)	2021-08-23 13:09:37 +02:00
István Zoltán Szabó	7faec52a1e	[DOCS] Fixes model_prune_window property description. (#76711)	2021-08-19 16:16:37 +02:00
István Zoltán Szabó	b9d875bf68	[DOCS] Updates description of model_prune_window property in ML shared (#76487)	2021-08-13 12:18:38 +02:00
István Zoltán Szabó	9b0417f2df	[DOCS] Comments out links that point to regression loss functions (#76435) * [DOCS] Comments out links that point to regression loss functions. * Update docs/reference/ml/df-analytics/apis/get-trained-models.asciidoc	2021-08-12 18:33:42 +02:00
David Roberts	7ac5ea39df	[ML] Use results retention time for deleting system annotations (#76096) In #75617, a new setting, system_annotations_retention_days, was added to control how long system annotations are retained. We now feel that this setting is redundant and that system annotations should be retained for the same period as results. This is intuitive and defensible, as system annotations can be considered a type of result. Follow-up to #75617	2021-08-04 17:42:31 +01:00
David Roberts	10a1d27c7b	[ML] Deleting a job now deletes the datafeed if necessary (#76010) Previously, attempting to delete a job that had a datafeed would return an exception. However, this was unnecessarily pedantic: the user would always want to delete both job and datafeed together, and would react by deleting the datafeed and then deleting the job again. This change makes the delete job API automatically delete a datafeed associated with the job. The same level of force is used for this delete datafeed request as was used on the delete job request. This means that it's possible to force-delete an open job with a started datafeed (since force-delete datafeed will automatically stop a started datafeed). It's still not possible to delete an open job without using force.	2021-08-03 17:22:06 +01:00
James Rodewig	fc0ac1923d	[DOCS] Correct spelling for geo terms (#76028) Changes: * Use "geopoint" when not referring to the literal field type * Use "geoshape" when not referring to the literal field type or query type * Use "GeoJSON" consistently	2021-08-03 09:55:48 -04:00
Ed Savage	5651215be1	[ML] Add 'model_prune_window' field to AD job config (#75741) Add configuration for pruning dead split fields in anomaly detection jobs via the `model_prune_window` field for both the job creation and update APIs. Relates to ml-cpp/#1962	2021-08-03 09:16:43 +01:00
István Zoltán Szabó	ce537a33b6	[DOCS] Adds link that points to outlier detection example to GET DFA stats API docs. (#75689)	2021-08-02 18:10:03 +02:00
István Zoltán Szabó	8d4fb3aa84	[DOCS] Changes link to outlier detection docs in PUT DFA API docs. (#75933)	2021-08-02 13:45:37 +02:00
Przemysław Witek	30d9f13436	[ML] Delete expired annotations (#75617)	2021-07-29 15:27:03 +02:00
Lisa Cawley	c1ba949aee	[DOCS] Fixes bulleted list in ML aggregations (#75806)	2021-07-28 11:29:48 -07:00
Lisa Cawley	02d851e50e	[DOCS] Drafts trained model deployment APIs (#75497)	2021-07-26 09:49:37 -07:00
István Zoltán Szabó	7e7a386078	[DOCS] Comments out link that points to outlier detection example (#75687)	2021-07-26 16:36:57 +02:00
Lisa Cawley	70b870ee7f	[DOCS] Fixes nesting of datafeed config in APIs (#75502)	2021-07-20 11:27:15 -07:00
István Zoltán Szabó	9ef156df9f	[DOCS] Adds peak_model_bytes and assignment_memory_basis to GET model snapshot API docs (#75413)	2021-07-16 17:12:47 +02:00
Lisa Cawley	c8c7f0ef52	[DOCS] Anomaly detection: Visualize delayed data (#75098)	2021-07-13 18:06:07 -07:00
Lisa Cawley	3c76bcb3a5	[DOCS] Fixes links to machine learning concepts (#75194)	2021-07-09 13:09:03 -07:00
István Zoltán Szabó	6a4de77e11	[DOCS] Adds classification and regression links back to DFA docs. (#74930)	2021-07-08 16:37:16 +02:00
István Zoltán Szabó	841cfb9214	[DOCS] Adds outlier detection links to DFA API docs (#74748)	2021-07-06 15:10:41 +02:00
Lisa Cawley	b71b7d0866	[DOCS] Fix links to anomaly detection overview (#74943)	2021-07-05 13:19:54 -07:00
Lisa Cawley	4c85852cc7	[DOCS] Update forecasting links in ML APIs (#74942)	2021-07-05 12:34:03 -07:00
Lisa Cawley	5bcd318e29	[DOCS] Move ML functions to appendix (#74802)	2021-07-05 11:53:17 -07:00
István Zoltán Szabó	483d145f78	[DOCS] Fixes an attribute in PUT DFA API docs. (#74931)	2021-07-05 17:08:11 +02:00
István Zoltán Szabó	6c6e6874ff	[DOCS] Removes link to classification and regression. (#74926)	2021-07-05 16:28:14 +02:00
István Zoltán Szabó	a4f9f4fae1	[DOCS] Comments out links to outlier detection. (#74745)	2021-06-30 14:24:34 +02:00
Lisa Cawley	64af39b759	[DOCS] Add memory limit details in update job API (#74517) Co-authored-by: David Roberts <dave.roberts@elastic.co>	2021-06-24 08:50:19 -07:00
Benjamin Trent	0303e6d733	[ML] add datafeed field to the job config (#74265) This is a quality-of-life improvement for typical users. Almost all anomaly jobs will receive their data through a datafeed. The datafeed config can now be supplied in, and is returned from, the datafeed field of the job config when creating and getting jobs.	2021-06-23 08:06:58 -04:00
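The `first_line_with_letters` change (#77457) comes down to a different line-selection rule. The following is a minimal Python sketch contrasting the old and new char filters on the log from that commit. The function names are hypothetical re-implementations, not Elasticsearch code; `str.isalpha()` stands in for Java's `Character#isLetter`, the empty-string fallback is an assumption, and `crude_tokens` is only a rough stand-in for the `ml_standard` tokenizer that happens to reproduce the token list from the commit for this input.

```python
import re
from typing import List


def first_non_blank_line(text: str) -> str:
    """Old default (sketch): first line containing any non-whitespace character."""
    for line in text.splitlines():
        if line.strip():
            return line
    return ""  # assumption: no qualifying line yields an empty string


def first_line_with_letters(text: str) -> str:
    """New default (sketch): first line containing at least one letter.

    str.isalpha() stands in for Java's Character#isLetter; both test for
    Unicode letters, though their exact category coverage may differ.
    """
    for line in text.splitlines():
        if any(ch.isalpha() for ch in line):
            return line
    return ""  # assumption: no qualifying line yields an empty string


def crude_tokens(line: str) -> List[str]:
    """Hypothetical stand-in for the ml_standard tokenizer: runs of ASCII letters."""
    return re.findall(r"[A-Za-z]+", line)


if __name__ == "__main__":
    rule = "-" * 80
    log = f"{rule}\nAlias 'foo' already exists and this prevents setting up ILM for logs\n{rule}\n"
    print(crude_tokens(first_non_blank_line(log)))     # []
    print(crude_tokens(first_line_with_letters(log)))  # ['Alias', 'foo', 'already', 'exists', 'and', 'this', 'prevents', 'setting', 'up', 'ILM', 'for', 'logs']
```

The old rule stops at the dashed separator, which yields no tokens; the new rule skips it and lands on the message line.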
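The zero-shot commit (#77799) describes `hypothesis_template` and `multi_label` but not how scores are computed. The following is a minimal Python sketch of how the two options could fit together. It assumes the common NLI-based zero-shot recipe: with `multi_label` false, entailment logits are normalized across all labels so exactly one label wins; with `multi_label` true, each label is scored on its own as entailment versus contradiction. The commit does not confirm that Elasticsearch scores this way, and the function names and logits below are hypothetical.

```python
import math
from typing import Dict, List, Sequence

# Output order of the NLI model, taken from the "classification_labels"
# example in the commit message.
CLASSIFICATION_LABELS = ["entailment", "neutral", "contradiction"]


def build_hypotheses(labels: Sequence[str], template: str = "This example is {}.") -> List[str]:
    """Fill the hypothesis_template with each candidate label."""
    return [template.format(label) for label in labels]


def softmax(xs: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    top = max(xs)
    exps = [math.exp(x - top) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def score_labels(
    labels: Sequence[str],
    nli_logits: Sequence[Sequence[float]],
    multi_label: bool,
) -> Dict[str, float]:
    """Turn per-hypothesis NLI logits into label probabilities (assumed recipe).

    nli_logits[i] holds [entailment, neutral, contradiction] logits for the
    pair (input text, hypothesis built from labels[i]).
    """
    ent = CLASSIFICATION_LABELS.index("entailment")
    con = CLASSIFICATION_LABELS.index("contradiction")
    if multi_label:
        # Several labels may be true: score each label independently,
        # entailment vs. contradiction (neutral is ignored).
        probs = [softmax([row[ent], row[con]])[0] for row in nli_logits]
    else:
        # Exactly one label is true: normalize entailment logits across labels.
        probs = softmax([row[ent] for row in nli_logits])
    return dict(zip(labels, probs))


if __name__ == "__main__":
    labels = ["sad", "glad", "mad", "rad"]
    print(build_hypotheses(labels)[0])  # This example is sad.
    # Made-up logits for illustration only; a real model would produce these.
    logits = [[-2.0, 0.0, 2.0], [3.0, 0.0, -1.0], [-1.0, 0.0, 1.0], [0.0, 0.0, 0.0]]
    print(score_labels(labels, logits, multi_label=False))
    print(score_labels(labels, logits, multi_label=True))
```

Under this assumed recipe, single-label scores always sum to 1, while multi-label scores are independent and can all be high or all be low.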

438 Commits