Commit Graph

478 Commits

Author SHA1 Message Date
Ed Savage e8a46649c5
[ML] Warn when creating job with an unusual bucket span (#82145)
Emit a deprecation warning when creating new jobs with bucket spans that
aren't an integral divisor or multiple of a day.
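
For illustration, a minimal job creation sketch that would now draw the warning (the job ID, detector, and 7-minute span are hypothetical; 7 minutes neither divides evenly into a day nor is a multiple of one):

```js
PUT _ml/anomaly_detectors/example-job
{
  "analysis_config": {
    "bucket_span": "7m",   // not an integral divisor or multiple of a day -> deprecation warning
    "detectors": [
      { "function": "count" }
    ]
  },
  "data_description": {
    "time_field": "timestamp"
  }
}
```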

Relates #81645

Co-authored-by: lcawl <lcawley@elastic.co>
2022-01-10 17:04:18 +00:00
Benjamin Trent 9dc8aea1cb
[ML] adds new mpnet tokenization for nlp models (#82234)
This commit adds support for MPNet based models.

MPNet models differ from BERT style models in that:

 - Special tokens are different
 - Input to the model doesn't require token positions.

To configure an MPNet tokenizer for your pytorch MPNet based model:

```
"tokenization": {
  "mpnet": {...}
}
```
The options provided to `mpnet` are the same as the previously supported `bert` configuration.
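
As a rough sketch of where that fits when creating a model (the model ID, task type, and option value are hypothetical; per the above, the options mirror `bert`):

```js
PUT _ml/trained_models/my-mpnet-model
{
  "model_type": "pytorch",
  "inference_config": {
    "text_embedding": {
      "tokenization": {
        "mpnet": {
          "truncate": "first"   // same options as the bert tokenization config
        }
      }
    }
  },
  "input": { "field_names": ["text_field"] }
}
```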
2022-01-05 12:56:47 -05:00
Dimitris Athanasiou 14a63ac115
[ML] Improve reporting of trained model size stats (#82000)
This improves reporting of trained model size in the response of the stats API.

In particular, it removes the `model_size_bytes` from the `deployment_stats` section and
replaces it with a top-level `model_size_stats` object that contains:

- `model_size_bytes`: the actual model size
- `required_native_memory_bytes`: the amount of memory required to load a model

In addition, these are now reported for PyTorch models regardless of their deployment state.
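
A sketch of the reshaped stats response (model ID and byte values are illustrative):

```js
{
  "trained_model_stats": [
    {
      "model_id": "my-model",
      "model_size_stats": {
        "model_size_bytes": 265632637,
        "required_native_memory_bytes": 773012460
      },
      "deployment_stats": { /*<snip> no longer carries model_size_bytes </snip>*/ }
    }
  ]
}
```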
2021-12-22 18:20:47 +02:00
Ed Savage a646f55c57
[ML] Set default value of 30 days for model prune window (#81377)
For new jobs, when the analysis config field model_prune_window is not set, use a default value of 30 days or 20 times the bucket span, whichever is greater.
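
For example, with a 1 hour bucket span, 20 x 1h = 20h is less than 30 days, so the default is 30d; with a 2 day bucket span, 20 x 2d = 40d exceeds 30 days, so the default becomes 40d.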

Co-authored-by: David Roberts <dave.roberts@elastic.co>
Co-authored-by: Lisa Cawley <lcawley@elastic.co>
2021-12-20 11:27:30 +00:00
David Kyle d1ee756da8
[ML][DOCS] Add note about max values of thread settings (#81367) 2021-12-14 13:07:34 +00:00
David Roberts 0559dd087b
[ML] Model snapshot upgrade needs a stats endpoint (#81641)
Previously the ML model snapshot upgrade endpoint did not
provide a way to reliably monitor progress. This could lead
to the upgrade assistant UI thinking that a model snapshot
upgrade had finished when it actually hadn't.

This change adds a new "stats" API that allows external
interested parties to find out the status of each model
snapshot upgrade and which node (if any) each is running on.
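
As a sketch, the new stats API can be queried per snapshot upgrade (the job and snapshot IDs here are hypothetical):

```js
GET _ml/anomaly_detectors/my-job/model_snapshots/1575402236/_upgrade/_stats
```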

Fixes #81519
2021-12-14 08:31:49 +00:00
Lisa Cawley 1751ced80a
[DOCS] Fix formatting in get anomaly job API (#81682) 2021-12-13 12:56:27 -08:00
David Kyle 3c974a1e5d
[ML][DOCS] Remove orphaned GET deployment stats doc (#81505) 2021-12-09 08:32:33 +00:00
Lisa Cawley 429bdd9afc
[DOCS] Move trained model APIs out of dataframe analytics (#81315) 2021-12-03 09:21:09 -08:00
David Kyle aba14aacfa
[ML][DOCS] Add zero shot example and setting truncation at inference (#81003)
More examples for the _infer endpoint
2021-12-01 11:44:04 +00:00
Lisa Cawley e5de9d8ad7
[DOCS] Add actual and typical values in ML alerting docs (#80571) 2021-11-25 10:06:52 -08:00
Lisa Cawley 8da1236bca
[DOCS] Clarify impact of force stop trained model deployment (#81026) 2021-11-25 09:08:46 -08:00
Lisa Cawley d1af86cfdd
[DOCS] Fixes start and stop trained model deployment APIs (#80978) 2021-11-24 10:09:45 -08:00
Lisa Cawley 38cbd116c9
[DOCS] Fixes query parameters for get buckets API (#80643) 2021-11-22 11:34:43 -08:00
Lisa Cawley f3a69ae4b1
[DOCS] Adds missing query parameters to ML APIs (#80863) 2021-11-22 09:25:01 -08:00
Lisa Cawley fffac5bd08
[DOCS] Adds missing query parameters in get influencer and get snapshot APIs (#80801) 2021-11-18 08:24:24 -08:00
Lisa Cawley d6f48dc5bd
[DOCS] Add query parameters to update datafeed API (#80777) 2021-11-17 07:40:31 -08:00
Dimitris Athanasiou c7f745b40a
[ML] Force delete trained models (#80595)
Adds a `force` parameter to the delete trained models API
which when set to `true` allows deletion of a model that
is referenced by ingest pipelines or has a started deployment.
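
For example (model ID hypothetical):

```js
DELETE _ml/trained_models/my-model?force=true
```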

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2021-11-11 10:54:01 +02:00
Benjamin Trent 5627dc66e1
[ML] deprecate estimated_heap_memory_usage_bytes and replace with model_size_bytes (#80554)
This deprecates estimated_heap_memory_usage_bytes on model put and replaces it with model_size_bytes.

On GET, only model_size_bytes is returned unless v7 rest-api compatibility is requested.

For the ml/info API, only model_size_bytes is returned
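
A minimal sketch of the renamed field on model put (model ID and value are illustrative):

```js
PUT _ml/trained_models/my-model
{
  /*<snip> definition, input, etc. </snip>*/
  "model_size_bytes": 265632637   // previously estimated_heap_memory_usage_bytes
}
```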

A forward-port of: #80545
2021-11-10 10:23:25 -05:00
David Roberts a61088063e
[ML] use_auto_machine_memory_percent now defaults max_model_memory_limit (#80532)
If the xpack.ml.use_auto_machine_memory_percent setting is true,
and xpack.ml.max_model_memory_limit is not set, then
xpack.ml.max_model_memory_limit is now considered to be set to
the largest size that could be assigned in the cluster.

This functionality will be crucial for Cloud once the Elasticsearch
startup code is setting the Elasticsearch JVM heap size. Then the
Cloud code will no longer be able to accurately set
xpack.ml.max_model_memory_limit, so will not set it at all.
Instead the Cloud code will just set
xpack.ml.use_auto_machine_memory_percent and the ML code will
calculate the appropriate maximum model_memory_limit that should
be permitted.
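
A sketch of the resulting configuration, assuming the setting is applied dynamically via the cluster settings API:

```js
PUT _cluster/settings
{
  "persistent": {
    "xpack.ml.use_auto_machine_memory_percent": true
    // xpack.ml.max_model_memory_limit deliberately left unset; ML now treats it as
    // the largest size that could be assigned in the cluster
  }
}
```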
2021-11-10 08:38:02 +00:00
Lisa Cawley 6ecc495d15
[DOCS] Clarify parameters in delete expired data, forecast, and flush job APIs (#80517) 2021-11-09 14:57:35 -08:00
Lisa Cawley 1c98a23ca8
[DOCS] Edits stop and start datafeed APIs (#80461) 2021-11-09 14:39:13 -08:00
Benjamin Trent cf5f521fac
[ML] add deployment_stats to trained model stats (#80531)
This commit adds a new field deployment_stats that is optionally set for models that are deployed.

If a model does not have a deployment, it will be null.

It also removes the get deployment stats API and makes the deployment stats action internal only.
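
A sketch of the resulting stats shape (IDs illustrative):

```js
{
  "trained_model_stats": [
    {
      "model_id": "my-deployed-model",
      "deployment_stats": { "state": "started" /*<snip> other deployment stats </snip>*/ }
    },
    {
      "model_id": "my-undeployed-model"   // deployment_stats is null when there is no deployment
    }
  ]
}
```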
2021-11-09 16:09:47 -05:00
Benjamin Trent c3c3f88000
[ML] validate model definition on start deployment (#80439)
When a deployment is started, we do not validate that the definition
documents are all present and not truncated. This commit adds a
validation on _start that prevents a bad state from occurring where the
deployment starts, but the model is incorrectly defined, or some unknown
error occurs too late in the deployment process.
2021-11-09 10:33:55 -05:00
Dimitris Athanasiou afe58ba6d8
[ML] Force stop deployment in use (#80431)
Implements a `force` parameter to the stop deployment API.
This allows a user to forcefully stop a deployment. Currently,
this specifically allows stopping a deployment that is in use
by ingest processors.
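
For example, forcefully stopping a deployment that ingest processors still reference (model ID hypothetical):

```js
POST _ml/trained_models/my-model/deployment/_stop?force=true
```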

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2021-11-08 14:35:52 +02:00
Lisa Cawley 733381bed2
[DOCS] Adds missing query parameters to datafeed APIs (#80314) 2021-11-05 16:31:04 -07:00
James Rodewig f56a0f4b66
[DOCS] Remove `testenv` annotations from doc snippet tests (#80023)
Removes `testenv` annotations and related code. These annotations originally let you skip x-pack snippet tests in the docs. However, that's no longer possible.

Relates to #79309, #31619
2021-11-05 18:38:50 -04:00
István Zoltán Szabó f72e2da221
[DOCS] Adds missing query params to GET category and GET influencer APIs (#79448) 2021-11-05 10:59:57 +01:00
David Kyle 0635f2758f
[ML] Consistently apply the default truncation option for the BERT tokenizer (#80339)
The default is Truncate.First
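
In other words, leaving the option out is now the same as setting it explicitly, roughly:

```js
"tokenization": {
  "bert": {
    "truncate": "first"   // applied by default when not specified
  }
}
```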
2021-11-05 09:10:59 +00:00
Lisa Cawley 638fe2c26a
[DOCS] Fixes typo in start trained models API (#80368) 2021-11-04 14:23:03 -07:00
Dimitris Athanasiou d13baade69
[ML] Report start_time for trained model deployments and allocations (#80188)
Adds `start_time` to the get deployment stats API for the deployment
and each allocation.

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2021-11-02 17:12:46 +02:00
David Kyle 58a517309a
[ML] [DOCS] Update the model part upload URL in example (#80181) 2021-11-02 11:33:04 +00:00
Benjamin Trent 8887cfa080
[ML] updating the infer trained model deployment docs (#80083)
The infer endpoint has changed its format.

Also, the results formats for the various tasks have changed. This updates the docs to match what is currently in 8.0.0.
2021-10-29 13:07:23 -04:00
Benjamin Trent f9bf4e57b9
[ML] adds new params to the start trained model deployment docs (#80016) 2021-10-28 11:23:25 -04:00
Benjamin Trent 375fc779b4
[ML] update truncation default & adding field output when input is truncated (#79942)
This commit makes the following two changes (along with some refactoring):

 - NLP results will now indicate whether or not the input was truncated (see the sketch below)
 - The default truncation is now `none` instead of `first`
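
A rough sketch of a truncated classification result; the indicator field name here is an assumption for illustration, not taken from this commit:

```js
{
  "predicted_value": "POSITIVE",
  "is_truncated": true   // assumed field name: flags that the input was cut before inference
}
```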
2021-10-28 10:40:49 -04:00
Benjamin Trent d2b638356b
[ML] Update trained model docs for truncate parameter for bert tokenization (#79652) 2021-10-28 07:19:10 -04:00
David Roberts 6b20e8e1b0
[ML] Fixing doc test substitution bug (#79943)
The substitutions should not have a space after the field
name.

Fixes #79931
2021-10-27 19:45:15 +01:00
Mark Vieira 8f79cfacab Mute documentation test 2021-10-27 09:48:20 -07:00
Lisa Cawley 610043f100
[DOCS] Edits formatting in create trained models API (#79758)
Related to #78376

This PR fixes minor formatting issues in the create trained models API documentation
2021-10-27 07:41:11 -04:00
Lisa Cawley cadc0c3800
[DOCS] Fixes typo in preview datafeed API (#79863) 2021-10-26 16:48:06 -07:00
István Zoltán Szabó c879db98b1
[DOCS] Updates get trained models API docs (#79372)
* [DOCS] Updates get trained models API docs.

* [DOCS] Reviews get trained models related definitions in ml-shared.
2021-10-25 11:47:45 +02:00
István Zoltán Szabó 94ab204a1e
[DOCS] Fixes indentation issue in GET trained models API docs. (#79347) 2021-10-18 12:27:24 +02:00
Lisa Cawley 3d6074b76e
[DOCS] Fixes typo in calendar API example (#78867) 2021-10-07 17:51:14 -07:00
Lisa Cawley df5dde5b3c
[DOCS] Fixes ML get calendars API (#78808) 2021-10-07 12:22:11 -07:00
Lisa Cawley bcd75c3203
[DOCS] Fixes ML get scheduled events API (#78809) 2021-10-07 08:34:58 -07:00
Benjamin Trent 498e6e3d0f
[ML] adding docs for estimated heap and operations (#78376)
Add docs for optionally supplying memory and operation estimates in put model
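
A sketch of the optional estimates on model put (values illustrative; at this point the memory field was still named estimated_heap_memory_usage_bytes):

```js
PUT _ml/trained_models/my-model
{
  /*<snip> definition and input </snip>*/
  "estimated_heap_memory_usage_bytes": 265632637,
  "estimated_operations": 7644416
}
```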
2021-09-29 09:11:42 -04:00
Benjamin Trent b96d929af3
[ML] add documentation for get deployment stats API (#78412)
* [ML] add documentation for get deployment stats API

* Apply suggestions from code review

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
2021-09-29 07:20:25 -04:00
Benjamin Trent 408489310c
[ML] add zero_shot_classification task for BERT nlp models (#77799)
Zero-Shot classification allows for text classification tasks without a pre-trained collection of target labels.

This is achieved through models trained on the Multi-Genre Natural Language Inference (MNLI) dataset. This dataset pairs  text sequences with "entailment" clauses. An example could be:

"Throughout all of history, man kind has shown itself resourceful, yet astoundingly short-sighted" could have been paired with the entailment clauses: ["This example is history", "This example is sociology"...]. 

This training set combined with the attention and semantic knowledge in modern day NLP models (BERT, BART, etc.) affords a powerful tool for ad-hoc text classification.

See https://arxiv.org/abs/1909.00161 for a deeper explanation of the MNLI training and how zero-shot works. 

The zero-shot classification task is configured as follows:
```js
{
   // <snip> model configuration </snip>
  "inference_config" : {
    "zero_shot_classification": {
      "classification_labels": ["entailment", "neutral", "contradiction"], // <1>
      "labels": ["sad", "glad", "mad", "rad"], // <2>
      "multi_label": false, // <3>
      "hypothesis_template": "This example is {}.", // <4>
      "tokenization": { /*<snip> tokenization configuration </snip>*/}
    }
  }
}
```
* <1> For all zero_shot models, 3 particular labels are returned when classifying the target sequence: "entailment" is the positive case, "neutral" is the case where the sequence isn't positive or negative, and "contradiction" is the negative case.
* <2> An optional parameter setting the default zero_shot labels to attempt to classify.
* <3> Whether the results should assume there is only one true label or multiple true labels when returning the probabilities.
* <4> The hypothesis template used when tokenizing the labels. When combined with `sad`, the sequence looks like `This example is sad.`

For inference in a pipeline one may provide label updates:
```js
{
  //<snip> pipeline definition </snip>
  "processors": [
    //<snip> other processors </snip>
    {
      "inference": {
        // <snip> general configuration </snip>
        "inference_config": {
          "zero_shot_classification": {
             "labels": ["humanities", "science", "mathematics", "technology"], // <1>
             "multi_label": true // <2>
          }
        }
      }
    }
    //<snip> other processors </snip>
  ]
}
```
* <1> The `labels` we care about; these replace the default ones if they exist.
* <2> Whether the results should allow multiple true labels.

Similarly, one may provide label changes against the `_infer` endpoint:
```js
{
   "docs":[{ "text_field": "This is a very happy person"}],
   "inference_config":{"zero_shot_classification":{"labels": ["glad", "sad", "bad", "rad"], "multi_label": false}}
}
```
2021-09-28 09:38:23 -04:00
Benjamin Trent 00defa38a9
[ML] adding some initial document for our pytorch NLP model support (#78270)
Adding docs for:

 - put vocab
 - put model definition part
 - start deployment (see the workflow sketch below)
 - all the new NLP configuration objects for trained model configurations
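
A sketch of the documented workflow (model ID, vocabulary, part numbers, and sizes are hypothetical):

```js
PUT _ml/trained_models/my-nlp-model/vocabulary
{
  "vocabulary": ["[PAD]", "[UNK]", "the", "quick", "brown" /*<snip> rest of vocab </snip>*/]
}

PUT _ml/trained_models/my-nlp-model/definition/0
{
  "definition": "<base64 encoded chunk>",
  "total_definition_length": 1599292,
  "total_parts": 2
}

POST _ml/trained_models/my-nlp-model/deployment/_start
```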
2021-09-27 12:46:13 -04:00
Benjamin Trent 281ec58b8d
[ML] add new default char filter `first_line_with_letters` for machine learning categorization (#77457)
The char filter replaces the previous default of `first_non_blank_line`.

`first_non_blank_line` worked well to figure out what line had characters at all, but log lines 
like the following were handled poorly:
```
--------------------------------------------------------------------------------

Alias 'foo' already exists and this prevents setting up ILM for logs

--------------------------------------------------------------------------------
```
When combined with the `ml_standard` tokenizer, the first line was used:
```
--------------------------------------------------------------------------------
```
This line contains no valid tokens for our standard tokenizer, so the `ml_standard` tokenizer found no tokens at all.


The new filter, `first_line_with_letters`, returns the first line containing any letter character (i.e. a character for which `Character#isLetter` returns true).

Given the previously poorly handled log, when combining with our `ml_standard` tokenizer, we get the following, more appropriate, tokens:

```
"tokens" : ["Alias", "foo", "already", "exists", "and", "this", "prevents", "setting", "up", "ILM", "for", "logs"]
```
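
A sketch of referencing the new char filter explicitly in a custom categorization analyzer (field names and wiring are illustrative; by default the filter is applied without any configuration):

```js
"analysis_config": {
  "categorization_field_name": "message",
  "categorization_analyzer": {
    "char_filter": ["first_line_with_letters"],
    "tokenizer": "ml_standard"
  }
  /*<snip> detectors, bucket_span, etc. </snip>*/
}
```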
2021-09-09 10:09:57 -04:00