Adds a `force` parameter to the delete trained models API
which, when set to `true`, allows deletion of a model that
is referenced by ingest pipelines or has a started deployment.
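A minimal sketch of the call, assuming a hypothetical model ID of `my-model`:
```
DELETE _ml/trained_models/my-model?force=true
```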
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
This deprecates estimated_heap_memory_usage_bytes on model put and replaces it with model_size_bytes.
On GET, only model_size_bytes is returned unless v7 rest-api compatibility is requested.
For the ml/info API, only model_size_bytes is returned.
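For reference, a rough sketch of a GET response fragment with the new field (the model ID and size are illustrative):
```js
{
  "trained_model_configs" : [
    {
      "model_id" : "my-model",
      "model_size_bytes" : 1053992
      // <snip> other fields </snip>
    }
  ]
}
```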
A forward-port of: #80545
If the xpack.ml.use_auto_machine_memory_percent setting is true,
and xpack.ml.max_model_memory_limit is not set, then
xpack.ml.max_model_memory_limit is now considered to be set to
the largest size that could be assigned in the cluster.
This functionality will be crucial for Cloud once the Elasticsearch
startup code is setting the Elasticsearch JVM heap size. Then the
Cloud code will no longer be able to accurately set
xpack.ml.max_model_memory_limit, so will not set it at all.
Instead the Cloud code will just set
xpack.ml.use_auto_machine_memory_percent and the ML code will
calculate the appropriate maximum model_memory_limit that should
be permitted.
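For illustration, a sketch of that minimal configuration via dynamic cluster settings, with xpack.ml.max_model_memory_limit deliberately left unset:
```js
PUT _cluster/settings
{
  "persistent": {
    "xpack.ml.use_auto_machine_memory_percent": true
  }
}
```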
This commit adds a new field deployment_stats that is optionally set for models that are deployed.
If a model does not have a deployment, it will be null.
Also, removes the get deployment stats API and makes the deployment stats action internal only.
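A rough sketch of a stats response fragment for a deployed model (field names beyond deployment_stats are illustrative):
```js
{
  "trained_model_stats" : [
    {
      "model_id" : "my-model",
      "deployment_stats" : {
        "state" : "started"
        // <snip> allocation and node stats </snip>
      }
      // <snip> other stats </snip>
    }
  ]
}
```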
Previously, when a deployment was started, we did not validate that the definition
documents were all present and not truncated. This commit adds a
validation on _start that prevents a bad state from occurring where the
deployment starts, but the model is incorrectly defined, or some unknown
error occurs too late in the deployment process.
Implements a `force` parameter to the stop deployment API.
This allows a user to forcefully stop a deployment. Currently,
this specifically allows stopping a deployment that is in use
by ingest processors.
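A minimal sketch of a forceful stop, with a hypothetical model ID:
```
POST _ml/trained_models/my-model/deployment/_stop?force=true
```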
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Removes `testenv` annotations and related code. These annotations originally let you skip x-pack snippet tests in the docs. However, that's no longer possible.
Relates to #79309, #31619
Adds `start_time` to the get deployment stats API for the deployment
and each allocation.
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
The infer endpoint has changed its format.
Also, the results format for the various tasks has changed. This updates the docs to match what is currently in 8.0.0.
This commit makes the following two changes (along with some refactoring):
- NLP results will now indicate whether the input was truncated or not.
- The default truncation is now `none` instead of `first`.
* [ML] add documentation for get deployment stats API
* Apply suggestions from code review
Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
Zero-Shot classification allows for text classification tasks without a pre-trained collection of target labels.
This is achieved through models trained on the Multi-Genre Natural Language Inference (MNLI) dataset. This dataset pairs text sequences with "entailment" clauses. An example could be:
"Throughout all of history, man kind has shown itself resourceful, yet astoundingly short-sighted" could have been paired with the entailment clauses: ["This example is history", "This example is sociology"...].
This training set combined with the attention and semantic knowledge in modern day NLP models (BERT, BART, etc.) affords a powerful tool for ad-hoc text classification.
See https://arxiv.org/abs/1909.00161 for a deeper explanation of the MNLI training and how zero-shot works.
The zero-shot classification task is configured as follows:
```js
{
// <snip> model configuration </snip>
"inference_config" : {
"zero_shot_classification": {
"classification_labels": ["entailment", "neutral", "contradiction"], // <1>
"labels": ["sad", "glad", "mad", "rad"], // <2>
"multi_label": false, // <3>
"hypothesis_template": "This example is {}.", // <4>
"tokenization": { /*<snip> tokenization configuration </snip>*/}
}
}
}
```
* <1> All zero_shot models return these 3 particular labels when classifying the target sequence. "entailment" is the positive case, "neutral" the case where the sequence isn't positive or negative, and "contradiction" is the negative case.
* <2> This is an optional parameter providing the default labels to attempt to classify.
* <3> When returning the probabilities, should the results assume there is only one true label or multiple true labels?
* <4> The hypothesis template used when tokenizing the labels. When combined with `sad`, the sequence looks like `This example is sad.`
For inference in a pipeline one may provide label updates:
```js
{
//<snip> pipeline definition </snip>
"processors": [
//<snip> other processors </snip>
{
"inference": {
// <snip> general configuration </snip>
"inference_config": {
"zero_shot_classification": {
"labels": ["humanities", "science", "mathematics", "technology"], // <1>
"multi_label": true // <2>
}
}
}
}
//<snip> other processors </snip>
]
}
```
* <1> The `labels` we care about; these replace the default ones if they exist.
* <2> Whether the results should allow multiple true labels.
Similarly, one may provide label changes against the `_infer` endpoint:
```js
{
"docs":[{ "text_field": "This is a very happy person"}],
"inference_config":{"zero_shot_classification":{"labels": ["glad", "sad", "bad", "rad"], "multi_label": false}}
}
```
The new char filter replaces the previous default of `first_non_blank_line`.
`first_non_blank_line` worked well to figure out which line had any characters at all, but log lines
like the following were handled poorly:
```
--------------------------------------------------------------------------------
Alias 'foo' already exists and this prevents setting up ILM for logs
--------------------------------------------------------------------------------
```
When combined with the `ml_standard` tokenizer, the first line was used:
```
--------------------------------------------------------------------------------
```
This line has no valid tokens for our standard tokenizer, so no tokens were found by the `ml_standard` tokenizer.
The new filter, `first_line_with_letters`, returns the first line containing any letter character (i.e. a character for which `Character#isLetter` returns true).
Given the previously poorly handled log, when combined with our `ml_standard` tokenizer, we get the following, more appropriate, tokens:
```
"tokens" : ["Alias", "foo", "already", "exists", "and", "this", "prevents", "setting", "up", "ILM", "for", "logs"]
```
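If needed, the filter can also be named explicitly in a categorization_analyzer; a sketch of that fragment (layout assumed from the standard categorization analyzer format):
```js
"categorization_analyzer": {
  "char_filter": [ "first_line_with_letters" ],
  "tokenizer": "ml_standard"
}
```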
This commit removes the ability to set the vocabulary location in the model config.
Instead, sane defaults are set and used, and the vocabulary is managed through a dedicated API.
The index is now always the internally managed .ml-inference-native index
and the document ID is always <model_id>_vocabulary.
This API only works for pytorch/nlp type models.
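A sketch of the resulting API shape (the exact path and body are assumptions based on the description above):
```js
PUT _ml/trained_models/my-model/vocabulary
{
  "vocabulary": [ "[PAD]", "[UNK]", "elastic", "##search" ]
  // <snip> remaining tokens </snip>
}
```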
Previously, if a model failed to be allocated on any node, the deployment failed.
This commit allows for an allocation to be partially_started and indicates its
current state via a new state value in the deployment stats API.
Additionally, when starting a deployment, the user may specify wait_for as one of
starting, partially_started, or started, and the API will block (as long as the timeout doesn't expire) until that state is reached.
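For example, to block until at least a partial start (the model ID and timeout are illustrative):
```
POST _ml/trained_models/my-model/deployment/_start?wait_for=partially_started&timeout=1m
```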
This new boolean parameter allows
users to put in a compressed model without it having
to be inflated on the master node during the put
request.
This is useful for system/module set up, with the model later being
validated and fully parsed when it
is loaded on a node for usage.
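A sketch of how this might look; the query parameter name `defer_definition_decompression` is my assumption here, not confirmed by the text above:
```js
PUT _ml/trained_models/my-model?defer_definition_decompression=true
{
  "compressed_definition": "<base64 encoded, compressed definition>",
  "inference_config": { /* <snip> task config </snip> */ }
  // <snip> other model configuration </snip>
}
```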
In #75617 a new setting, system_annotations_retention_days, was
added to control how long system annotations are retained for.
We now feel that this setting is redundant and that system
annotations should be retained for the same period as results.
This is intuitive and defensible, as system annotations can be
considered a type of result.
Followup to #75617
Previously attempting to delete a job that had a datafeed
would return an exception. However, this was unnecessarily
pedantic - the user would always want to delete both job
and datafeed together, and would react by deleting the
datafeed and then subsequently deleting the job again.
This change makes the delete job API automatically delete
a datafeed associated with the job. The same level of
force is used for this delete datafeed request as was used
on the delete job request. This means that it's possible
to force-delete an open job with a started datafeed (since
force-delete datafeed will automatically stop a started
datafeed). It's still not possible to delete an opened job
without using force.
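For example, force-deleting an open job (and, now, its started datafeed) in one call:
```
DELETE _ml/anomaly_detectors/my-job?force=true
```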
Changes:
* Use "geopoint" when not referring to the literal field type
* Use "geoshape" when not referring to the literal field type or query type
* Use "GeoJSON" consistently
Add configuration for pruning dead split fields in anomaly detection
jobs via the `model_prune_window` field for both the job creation and
update APIs.
Relates to ml-cpp/#1962
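A sketch of where the new field sits in the job config (values are illustrative):
```js
"analysis_config": {
  "bucket_span": "15m",
  "model_prune_window": "30d",
  "detectors": [ { "function": "mean", "field_name": "responsetime", "by_field_name": "airline" } ]
}
```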
This is a quality-of-life improvement for typical users. Almost all anomaly detection jobs will receive their data through a datafeed.
The datafeed config can now be supplied on job creation and is available in the datafeed field of the job config when creating and getting jobs.
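A sketch of creating a job with its datafeed inline (the `datafeed_config` field name on the create request is my assumption; the rest is illustrative):
```js
PUT _ml/anomaly_detectors/my-job
{
  "analysis_config": {
    "bucket_span": "15m",
    "detectors": [ { "function": "count" } ]
  },
  "data_description": { "time_field": "@timestamp" },
  "datafeed_config": {
    "indices": [ "my-index" ],
    "query": { "match_all": {} }
  }
}
```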
Previously it was a requirement of the close job API that if the
job had an associated datafeed that that datafeed was stopped
before the job could be closed. Experience has shown that this
is just a pedantic nuisance. If a user closes the job without
first stopping the datafeed then it's just a mistake, and they
then have to make two further calls, to stop the datafeed and
then attempt to close the job again.
This PR changes the behaviour so that if you ask to close a job
whose datafeed is running then the datafeed gets stopped first
as part of the same call. Datafeeds are stopped with the same
level of force as the job close request specified.
Adds a new API that allows a user to reset
an anomaly detection job.
To use the API do:
```
POST _ml/anomaly_detectors/<job_id>/_reset
```
The API removes all data associated with the job.
In particular, it deletes model state, results and stats.
However, job notifications and user annotations are not removed.
Also, the API can be called asynchronously by setting the parameter
`wait_for_completion` to `false` (defaults to `true`). When run
that way the API returns the task id for further monitoring.
In order to prevent the job from opening while it is resetting,
a new job field has been added called `blocked`. It is an object
that contains a `reason` and the `task_id`. `reason` can take
a value from ["delete", "reset", "revert"] as all these
operations should block the job from opening. The `task_id` is also
included in order to allow tracking the task if necessary.
Finally, this commit also sets the `blocked` field when
the revert snapshot API is called as a job should not be opened
while it is being reverted to a different model snapshot.
It is useful to know the following information when reading datafeed stats:
- Is the datafeed a "real-time" datafeed, i.e. a datafeed without a configured `end` time
- Has the datafeed processed all past data available at the time of starting.
To convey this, datafeed stats now include a running_state object. It is only available if the datafeed task has been created.
It has the form:
```
"running_state": {
"is_real_time": <boolean>,
"look_back_finished": <boolean>
}
```
Categorization jobs created once the entire cluster is upgraded to
version 7.14 or higher will default to using the new ml_standard
tokenizer rather than the previous default of the ml_classic
tokenizer, and will incorporate the new first_non_blank_line char
filter so that categorization is based purely on the first non-blank
line of each message.
The difference between the ml_classic and ml_standard tokenizers
is that ml_classic splits on slashes and colons, so creates multiple
tokens from URLs and filesystem paths, whereas ml_standard attempts
to keep URLs, email addresses and filesystem paths as single tokens.
It is still possible to configure the ml_classic tokenizer if you
prefer: just provide a categorization_analyzer within your
analysis_config and whichever tokenizer you choose (which could be
ml_classic or any other Elasticsearch tokenizer) will be used.
To opt out of using first_non_blank_line as a default char filter,
you must explicitly specify a categorization_analyzer that does not
include it.
If no categorization_analyzer is specified but categorization_filters
are specified, then the categorization filters are converted to char
filters that are applied after first_non_blank_line.
Closes elastic/ml-cpp#1724
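For instance, a sketch of opting back into ml_classic (and thereby dropping the default char filter) on a hypothetical job:
```js
"analysis_config": {
  // <snip> detectors, bucket_span, categorization_field_name </snip>
  "categorization_analyzer": {
    "tokenizer": "ml_classic"
  }
}
```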
This commit increases xpack.ml.max_open_jobs from 20 to 512. Additionally, it ignores nodes that cannot provide an accurate view into their native memory.
If a node does not have a view into its native memory, we ignore it for assignment.
This effectively fixes a bug with autoscaling. Autoscaling relies on being able to assign jobs to nodes with adequate memory. If that is hampered by xpack.ml.max_open_jobs, scaling decisions are hampered too.
This commit allows for composite aggregations in datafeeds.
Composite aggs provide a much better solution for having influencers, partitions, etc. on high volume data. Instead of worrying about long scrolls in the datafeed, the calculation is distributed across the cluster via the aggregations. A sketch of a conforming datafeed follows the lists below.
The restrictions for this support are as follows:
- The composite aggregation must have EXACTLY one `date_histogram` source
- The sub-aggs of the composite aggregation must have a `max` aggregation on the SAME timefield as the aforementioned `date_histogram` source
- The composite agg must be the ONLY top level agg and it cannot have a `composite` or `date_histogram` sub-agg
- If using a `date_histogram` to bucket time, it cannot have a `composite` sub-agg.
- The top-level `composite` agg cannot have a sibling pipeline agg. Pipeline aggregations are supported as a sub-agg (thus a pipeline agg INSIDE the bucket).
Some key user interaction differences:
- Speed + resources used by the cluster should be controlled by the `size` parameter in the `composite` aggregation. Previously, we said that if you are using aggs, you should use a specific `chunking_config`. But with composite, that is not necessary.
- Users really shouldn't use nested `terms` aggs any longer. While this is still a "valid" configuration and MAY be desirable for some users (only wanting the top 10 of certain terms), typically when users want influencers, partition fields, etc. they want the ENTIRE population. Previously, this really wasn't possible with aggs; with `composite` it is.
- I cannot really think of a typical use case that SHOULD ever use a multi-bucket aggregation that is NOT supported by composite.
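A sketch of a conforming datafeed config (index, field names, and sizes are illustrative), with exactly one date_histogram source and a max aggregation on the same time field:
```js
PUT _ml/datafeeds/datafeed-my-job
{
  "job_id": "my-job",
  "indices": [ "my-index" ],
  "aggregations": {
    "buckets": {
      "composite": {
        "size": 1000,
        "sources": [
          { "time_bucket": { "date_histogram": { "field": "@timestamp", "fixed_interval": "30m" } } },
          { "airline": { "terms": { "field": "airline" } } }
        ]
      },
      "aggregations": {
        "@timestamp": { "max": { "field": "@timestamp" } },
        "responsetime": { "avg": { "field": "responsetime" } }
      }
    }
  }
}
```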
Previously, a datafeed and job must already exist for the `_preview` API to work.
With this change, users can get an accurate preview of the data that will be sent to the anomaly detection job
without creating either of them.
closes https://github.com/elastic/elasticsearch/issues/70264
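A sketch of previewing with in-body configs instead of existing resources (shape assumed from the datafeed and job APIs):
```js
POST _ml/datafeeds/_preview
{
  "datafeed_config": {
    "indices": [ "my-index" ]
  },
  "job_config": {
    "analysis_config": {
      "bucket_span": "15m",
      "detectors": [ { "function": "count" } ]
    },
    "data_description": { "time_field": "@timestamp" }
  }
}
```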
This commit allows documents seen within the same time bucket to be out of order.
This is already supported within the native process.
Additionally, when recording the "latest" record timestamp, we were assuming that the latest seen document was truly the "latest". This is not really the case if latency is utilized or if documents come out of order within the same bucket.
This commit adds a new `_preview` endpoint for data frame analytics.
This allows users to see the data on which their model will be trained. This is especially useful
with the arrival of custom feature processors.
The API design is similar to datafeed `_preview` and data frame analytics `_explain`.
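A sketch of the new call (the `config` wrapper and field values are illustrative):
```js
POST _ml/data_frame/analytics/_preview
{
  "config": {
    "source": { "index": "my-index" },
    "analysis": {
      "regression": { "dependent_variable": "price" }
    }
  }
}
```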
Users can now specify runtime mappings as part of the source config
of a data frame analytics job. Those runtime mappings become part of
the mapping of the destination index. This ensures the fields are
accessible in the destination index even if the relevant data frame
analytics job gets deleted.
Closes #65056
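A sketch of a source config carrying a runtime field (names and script are illustrative):
```js
PUT _ml/data_frame/analytics/my-analytics
{
  "source": {
    "index": "my-index",
    "runtime_mappings": {
      "price_per_sqft": {
        "type": "double",
        "script": { "source": "emit(doc['price'].value / doc['sqft'].value)" }
      }
    }
  },
  "dest": { "index": "my-dest-index" },
  "analysis": {
    "regression": { "dependent_variable": "price" }
  }
}
```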
A `model_alias` allows trained models to be referred to by a user-defined moniker.
This not only improves the readability and simplicity of numerous API calls, but it allows for simpler deployment and upgrade procedures for trained models.
Previously, if you referenced a model ID directly within an ingest pipeline and a new model performed better than the earlier referenced model, you had to update the pipeline itself. If this model was used in numerous pipelines, ALL those pipelines would have to be updated.
When using a `model_alias` in an ingest pipeline, only that `model_alias` needs to be updated. Then, the underlying referenced model will change in place for all ingest pipelines automatically.
An additional benefit is that the model referenced is not changed until it is fully loaded into cache; this way throughput is not hampered by changing models.
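A sketch of pointing an alias at a model and later moving it to a better one (model IDs are illustrative; the `reassign` flag for moving an existing alias is my assumption):
```
PUT _ml/trained_models/flight-delay-model-v1/model_aliases/flight_delay_model

PUT _ml/trained_models/flight-delay-model-v2/model_aliases/flight_delay_model?reassign=true
```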
This PR adds the optional early_stopping_enabled data frame analysis configuration parameter. The enhancement was already described in elastic/ml-cpp#1676, so it is marked here as a non-issue.
The text structure finder API documentation had many references to "files". While this is one use of the API, the API now has a more generic name. This commit replaces many references to the word "file" with the more generic word "text".
This introduces a new `text-structure` plugin. This is the new home of the find file structure API.
The old REST URL is still available but is deprecated.
The new URL is: `_text_structure/find_structure`. All parameters and behavior are unchanged.
Changes to the high-level REST client and docs will be in a separate commit.
related to: https://github.com/elastic/elasticsearch/issues/67001
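A sketch of the new endpoint in use, posting a couple of NDJSON lines (content is illustrative):
```js
POST _text_structure/find_structure
{"message": "Connection from 10.0.0.1 accepted"}
{"message": "Connection from 10.0.0.2 refused"}
```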
There is little evidence of this endpoint being used
and there is quite a lot of code complexity associated
with the various formats that can be used to upload
data and the different errors that can occur when direct
data upload is open to end users.
In a future release we can make this endpoint internal
so that only datafeeds can use it, and remove all the
options and formats that are not used by datafeeds.
End users will have to store their input data for
anomaly detection in Elasticsearch indices (which we
believe all do today) and use a datafeed to feed it
to anomaly detection jobs.
This commit fixes a potential bug if we support anomaly detection
results index rollover in the future.
In particular, we determine the current `data_counts` by sorting on the
latest record time. However, this is not correct if the job reverts
to an older model snapshot. To fix this we add `log_time` to `data_counts`
(similarly to `model_size_stats`) and sort on `log_time` to figure
out the current counts for the job.
At present the Java code makes a decision on whether to
use current model memory or model memory limit to calculate
how much memory a job requires to be assigned.
The plan is to move this decision to the C++ code, which will
report it via a new field in the model size stats. An
additional change will be that once we have made the switch
from using model memory limit to using current model memory
we will never switch back, as this causes large fluctuations
up and down in memory requirement which will be much more
noticeable when autoscaling is in use.
Although the only two options at present are model memory
limit and current model memory, the new enum includes a
third possibility, peak model memory. To switch to this
now would be tricky, as there have been two bugs in the
implementation of peak model memory which render its value
unreliable in 7.x. However, in 8.x it might make sense to
switch to using peak model memory instead of current model
memory and it's much easier from a BWC perspective if the
enum contains all the values from the start.
Relates #63163
This PR adds detail to the explanation of the soft_limit
memory_status in ML job stats. A consequence that was not
mentioned before is that examples are not added to category
definitions.
Relates elastic/ml-cpp#1590