elasticsearch

Commit Graph

Author	SHA1	Message	Date
Benjamin Trent	01fc8ed246	[ML] adding ability to update runtime_mappings via datafeed config update API (#71707 ) Adds runtime_mappings as an updatable field via datafeed config update. closes: #71702	2021-04-15 09:44:34 -04:00
István Zoltán Szabó	ce389dff5d	[DOCS] Clarifies that custom rules are job rules in Kibana (#71678 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2021-04-15 09:33:03 +02:00
James Rodewig	693807a6d3	[DOCS] Fix double spaces (#71082 )	2021-03-31 09:57:47 -04:00
Benjamin Trent	c8415a7924	[ML] adding support for composite aggs in anomaly detection (#69970 ) This commit allows for composite aggregations in datafeeds. Composite aggs provide a much better solution for having influencers, partitions, etc. on high volume data. Instead of worrying about long scrolls in the datafeed, the calculation is distributed across the cluster via the aggregations. The restrictions for this support are as follows: - The composite aggregation must have EXACTLY one `date_histogram` source - The sub-aggs of the composite aggregation must have a `max` aggregation on the SAME timefield as the aforementioned `date_histogram` source - The composite agg must be the ONLY top level agg and it cannot have a `composite` or `date_histogram` sub-agg - If using a `date_histogram` to bucket time, it cannot have a `composite` sub-agg. - The top-level `composite` agg cannot have a sibling pipeline agg. Pipeline aggregations are supported as a sub-agg (thus a pipeline agg INSIDE the bucket). Some key user interaction differences: - Speed + resources used by the cluster should be controlled by the `size` parameter in the `composite` aggregation. Previously, we said if you are using aggs, use a specific `chunking_config`. But, with composite, that is not necessary. - Users really shouldn't use nested `terms` aggs any longer. While this is still a "valid" configuration and MAY be desirable for some users (only wanting the top 10 of certain terms), typically when users want influencers, partition fields, etc. they want the ENTIRE population. Previously, this really wasn't possible with aggs, with `composite` it is. - I cannot really think of a typical use case that SHOULD ever use a multi-bucket aggregation that is NOT supported by composite.	2021-03-30 08:25:40 -04:00
István Zoltán Szabó	1db2b85e45	[DOCS] Adds source index privileges required for Explain DFA API docs. (#70978 )	2021-03-30 10:42:48 +02:00
Benjamin Trent	b796632582	[ML] Allow datafeed and job configs for datafeed preview API (#70836 ) Previously, a datafeed and job must already exist for the `_preview` API to work. With this change, users can get an accurate preview of the data that will be sent to the anomaly detection job without creating either of them. closes https://github.com/elastic/elasticsearch/issues/70264	2021-03-26 12:52:23 -04:00
István Zoltán Szabó	9a8c6fb66f	[DOCS] Removes beta labels from DFA related docs. (#70808 )	2021-03-26 09:46:41 +01:00
István Zoltán Szabó	165c0ddaeb	[DOCS] Updates anomaly detection alert docs with the new alerting terminology (#70486 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2021-03-18 18:23:19 +01:00
Benjamin Trent	10e637d97c	[ML] allow documents to be out of order within the same time bucket (#70468 ) This commit allows documents seen within the same time bucket to be out of order. This is already supported within the native process. Additionally, when recording the "latest" record timestamp, we were assuming that the latest seen document was truly the "latest". This is not really the case if latency is utilized or if documents come out of order within the same bucket.	2021-03-17 09:34:49 -04:00
James Rodewig	5c75d004fa	[DOCS] Replace `put` with `create or update` in API names (#70330 ) Co-authored-by: debadair <debadair@elastic.co> Co-authored-by: Lisa Cawley <lcawley@elastic.co> Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>	2021-03-15 14:49:44 -04:00
István Zoltán Szabó	59f6280a7b	[DOCS] Changes deprecated syntax to node.role style in datafeed docs. (#70201 )	2021-03-10 15:46:01 +01:00
Lisa Cawley	2caba7b11f	[DOCS] Edits machine learning settings (#69947 ) Co-authored-by: David Roberts <dave.roberts@elastic.co>	2021-03-09 10:59:12 -08:00
István Zoltán Szabó	c226958947	[DOCS] Expands anomaly detection alert type docs (#70026 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co> Co-authored-by: Dima Arnautov <arnautov.dima@gmail.com>	2021-03-09 12:02:16 +01:00
Lisa Cawley	c537e5f38c	[DOCS] Edits delete trained model alias API (#70119 )	2021-03-08 17:08:58 -08:00
István Zoltán Szabó	8a7aced8e8	[DOCS] Adds beta tag to anomaly detection alert docs. (#70013 )	2021-03-08 10:46:24 +01:00
István Zoltán Szabó	2ccc81081f	[DOCS] Adds hyperparameters option to the include setting of GET trained models API. (#69959 )	2021-03-04 16:43:06 +01:00
Joe Gallo	1e8b5fa7c2	Remove the _ml/find-file-structure docs (#69823 )	2021-03-03 09:49:28 -05:00
Benjamin Trent	2279cafb4e	[ML] adding new _preview endpoint for data frame analytics (#69453 ) This commit adds a new `_preview` endpoint for data frame analytics. This allows users to see the data on which their model will be trained. This is especially useful with the arrival of custom feature processors. The API design is similar to datafeed `_preview` and data frame analytics `_explain`.	2021-03-01 12:25:50 -05:00
Lisa Cawley	138224b398	[DOCS] Edits trained model alias API (#69491 )	2021-02-24 08:17:49 -08:00
István Zoltán Szabó	77d0f56581	[DOCS] Adds anomaly detection alert documentation (#68923 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2021-02-23 10:29:54 +01:00
Dimitris Athanasiou	7fb98c0d3c	[ML] Add runtime mappings to data frame analytics source config (#69183 ) Users can now specify runtime mappings as part of the source config of a data frame analytics job. Those runtime mappings become part of the mapping of the destination index. This ensures the fields are accessible in the destination index even if the relevant data frame analytics job gets deleted. Closes #65056	2021-02-19 16:29:19 +02:00
Benjamin Trent	0af38bba9e	[ML] add new delete trained model aliases API (#69195 ) In addition to creating and re-assigning model aliases, users should be able to delete existing and unused model aliases.	2021-02-18 13:12:07 -05:00
Lisa Cawley	55f0e32fe4	[DOCS] Clarify put data frame analytics API feature processors option (#69158 )	2021-02-18 08:53:46 -08:00
Benjamin Trent	26eef892df	[ML] adds new trained model alias API to simplify trained model updates and deployments (#68922 ) A `model_alias` allows trained models to be referred to by a user-defined moniker. This not only improves the readability and simplicity of numerous API calls, but it allows for simpler deployment and upgrade procedures for trained models. Previously, if you referenced a model ID directly within an ingest pipeline, when you have a new model that performs better than an earlier referenced model, you have to update the pipeline itself. If this model was used in numerous pipelines, ALL those pipelines would have to be updated. When using a `model_alias` in an ingest pipeline, only that `model_alias` needs to be updated. Then, the underlying referenced model will change in place for all ingest pipelines automatically. An additional benefit is that the model referenced is not changed until it is fully loaded into cache, this way throughput is not hampered by changing models.	2021-02-18 09:41:50 -05:00
James Rodewig	9b88ae92e6	[DOCS] Fix typos for duplicate words (#69125 )	2021-02-17 10:34:20 -05:00
Lisa Cawley	a1fb2c3606	[DOCS] Fixes n_gram_encoding in data frame analytics APIs (#69084 )	2021-02-16 14:02:00 -08:00
Lisa Cawley	8b6ec07613	[DOCS] Edits ML hyperparameter descriptions (#68880 )	2021-02-11 11:55:28 -08:00
Lisa Cawley	683368cc4d	[DOCS] Clarify soft_tree_depth_limit (#68787 ) Co-authored-by: Tom Veasey <tveasey@users.noreply.github.com>	2021-02-10 12:51:01 -08:00
István Zoltán Szabó	e45d7a942d	[DOCS] Expands feature processors property description and adds a link of conceptual docs (#68213 )	2021-02-02 14:48:43 +01:00
Valeriy Khakhutskyy	78368428b3	[ML] Add early stopping DFA configuration parameter (#68099 ) The PR adds early_stopping_enabled optional data frame analysis configuration parameter. The enhancement was already described in elastic/ml-cpp#1676 and so I mark it here as non-issue.	2021-02-01 11:41:28 +01:00
Dimitris Athanasiou	5c961c1c81	[ML] Expand regression/classification hyperparameters (#67950 ) Expands data frame analytics regression and classification analyses with the following hyperparameters: - alpha - downsample_factor - eta_growth_rate_per_tree - max_optimization_rounds_per_hyperparameter - soft_tree_depth_limit - soft_tree_depth_tolerance	2021-01-26 12:56:41 +02:00
István Zoltán Szabó	addb5cbd3a	[DOCS] Adds custom feature processors description to PUT DFA API (#67424 ) Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>	2021-01-19 09:47:32 +01:00
Dimitris Athanasiou	7574013604	[ML] Remove DFA job states reindexing and analyzing from docs (#67658 ) These states no longer exist as of #67423	2021-01-18 17:39:22 +02:00
Benjamin Trent	35f478b618	[ML] [DOCS] adding missing fields to the get trained models API docs (#67590 ) Adds missing fields description, inference_config, and input to the GET trained models API documentation	2021-01-15 13:20:53 -05:00
Benjamin Trent	24ebcc8c24	[ML] [DOCS] update find-structure reference docs (#67586 ) The text structure finder API documentation had many references to the "files". While this is one use of the API, the API now has a more generic name. This commit replaces many references to the word "file" to the more generic word "text".	2021-01-15 12:19:38 -05:00
István Zoltán Szabó	085a288af5	[DOCS] Adds hyperparameter metadata property to GET trained models API docs. (#67412 )	2021-01-13 13:49:51 +01:00
Lisa Cawley	401d302c69	[DOCS] Move find file structure to a new API endpoint (#67314 )	2021-01-12 11:59:45 -08:00
Benjamin Trent	af179ab2f5	[ML] move find file structure to a new API endpoint (#67123 ) This introduces a new `text-structure` plugin. This is the new home of the find file structure API. The old REST URL is still available but is deprecated. The new URL is: `_text_structure/find_structure`. All parameters and behavior are unchanged. Changes to the high-level REST client and docs will be in separate commit. related to: https://github.com/elastic/elasticsearch/issues/67001	2021-01-11 08:56:02 -05:00
Lisa Cawley	eff9dfc3a4	[DOCS] Clarify impact of delayed data in anomaly detection (#66816 ) Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>	2021-01-05 12:14:51 -08:00
István Zoltán Szabó	d3ad9fe632	[DOCS] Improves inference processor linking and docs (#66119 )	2021-01-05 09:42:06 +01:00
David Roberts	c5bef7f9a7	[ML] Deprecate anomaly detection post data endpoint (#66347 ) There is little evidence of this endpoint being used and there is quite a lot of code complexity associated with the various formats that can be used to upload data and the different errors that can occur when direct data upload is open to end users. In a future release we can make this endpoint internal so that only datafeeds can use it, and remove all the options and formats that are not used by datafeeds. End users will have to store their input data for anomaly detection in Elasticsearch indices (which we believe all do today) and use a datafeed to feed it to anomaly detection jobs.	2020-12-15 18:37:20 +00:00
Dimitris Athanasiou	3bed6661de	[ML] Add log_time to AD data_counts and decide current based on it (#66343 ) This commit is fixing a potential bug if we support anomaly detection results index rollover in the future. In particular, we determine the current `data_counts` by sorting on the latest record time. However, this is not correct if the job reverts to an older model snapshot. To fix this we add `log_time` to `data_counts` (similarly to `model_size_stats`) and sort on `log_time` to figure out the current counts for the job.	2020-12-15 19:09:13 +02:00
István Zoltán Szabó	bc989e4a86	[DOCS] Adds note about data_counts values to Revert snapshot API docs. (#66085 )	2020-12-09 10:47:51 +01:00
István Zoltán Szabó	3081cf4944	[DOCS] Adds empty snapshot_id description to revert snapshot API docs (#66036 )	2020-12-09 10:01:26 +01:00
David Kyle	22dadfd407	[ML] Docs and HLRC for datafeed runtime mappings (#65810 ) For the changes in #65606	2020-12-08 10:06:58 +00:00
David Roberts	49e492f313	[ML] Adding assignment_memory_basis to model_size_stats (#65561 ) At present the Java code makes a decision on whether to use current model memory or model memory limit to calculate how much memory a job requires to be assigned. The plan is to move this decision to the C++ code, which will report it via a new field in the model size stats. An additional change will be that once we have made the switch from using model memory limit to using current model memory we will never switch back, as this causes large fluctuations up and down in memory requirement which will be much more noticeable when autoscaling is in use. Although the only two options at present are model memory limit and current model memory, the new enum includes a third possibility, peak model memory. To switch to this now would be tricky, as there have been two bugs in the implementation of peak model memory which render its value unreliable in 7.x. However, in 8.x it might make sense to switch to using peak model memory instead of current model memory and it's much easier from a BWC perspective if the enum contains all the values from the start. Relates #63163	2020-12-03 17:18:08 +00:00
David Roberts	fc72b39a17	[ML] Adjusting soft_limit description (#65383 ) This PR adds detail to the explanation of the soft_limit memory_status in ML job stats. A consequence that was not mentioned before is that examples are not added to category definitions. Relates elastic/ml-cpp#1590	2020-11-24 09:35:07 +00:00
István Zoltán Szabó	a85fb5534a	[DOCS] Fixes typo in Aggregating data for faster performance. (#65354 )	2020-11-23 12:44:59 +01:00
István Zoltán Szabó	f1e54a63a1	[DOCS] Adds UI related limitation to configuring aggs docs (#65184 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-11-20 19:03:18 +01:00
István Zoltán Szabó	1e045da339	[DOCS] Makes the screenshot larger on the custom URLs page. (#65269 )	2020-11-20 09:29:39 +01:00
David Roberts	e4ce39845b	[ML] Add total ML memory to ML info (#65195 ) This change adds an extra piece of information, limits.total_ml_memory, to the ML info response. This returns the total amount of memory that ML is permitted to use for native processes across all ML nodes in the cluster. Some of this may already be in use; the value returned is total, not available ML memory.	2020-11-18 15:06:21 +00:00
Lisa Cawley	9fef6e7b7e	[DOCS] Adds new snapshot upgrade API (#65095 )	2020-11-16 09:48:07 -08:00
István Zoltán Szabó	95a0ed4304	[DOCS] Adds recommendation about when to use chunking_config in manual mode. (#65060 )	2020-11-16 16:12:07 +01:00
Benjamin Trent	33de89d94c	[ML] add new snapshot upgrader API for upgrading older snapshots (#64665 ) This new API provides a way for users to upgrade their own anomaly job model snapshots. To upgrade a snapshot the following is done: - Open a native process given the job id and the desired snapshot id - load the snapshot to the process - write the snapshot again from the native task (now updated via the native process) relates #64154	2020-11-12 10:45:56 -05:00
István Zoltán Szabó	db15c4d6b9	[DOCS] Adds scroll_size maximum value to datafeeds API docs (#64986 )	2020-11-12 15:53:53 +01:00
István Zoltán Szabó	9ed907bc75	[DOCS] Fixes example aggregation syntax in datafeed aggregations. (#64936 )	2020-11-11 16:33:36 +01:00
Lisa Cawley	919c79b745	[DOCS] Add custom feature processor example (#64681 )	2020-11-06 09:24:01 -08:00
James Rodewig	1ea83359bb	[DOCS] Fix case for 'Boolean' (#64299 )	2020-10-29 09:04:43 -04:00
István Zoltán Szabó	6093518f4a	[DOCS] Changes experimental flag to beta in DFA related docs (#63992 )	2020-10-26 17:02:46 +01:00
Lisa Cawley	a00c7a2b6c	[DOCS] Add tips for num_top_classes classification parameter (#63781 )	2020-10-21 09:27:13 -07:00
István Zoltán Szabó	9defe10616	[DOCS] Expands DFA evaluation API docs with the default set of metrics (#63971 )	2020-10-21 14:30:33 +02:00
Benjamin Trent	c1de07fa83	[ML] adding new flag exclude_generated that removes generated fields in GET config APIs (#63899 ) When exporting and cloning ml configurations in a cluster it can be frustrating to remove all the fields that were generated by the plugin. Especially as the number of these fields changes from version to version. This flag, exclude_generated, allows the GET config APIs to return configurations with these generated fields removed. APIs supporting this flag: - GET _ml/anomaly_detectors/<job_id> - GET _ml/datafeeds/<datafeed_id> - GET _ml/data_frame/analytics/<analytics_id> The following fields are not returned in the objects: - any field that is not user settable (e.g. version, create_time) - any field that is a calculated default value (e.g. datafeed chunking_config) - any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by) relates to #63055	2020-10-20 11:28:29 -04:00
Dimitris Athanasiou	03ed7de6c1	[ML] Rename evaluation metric result fields to value (#63809 ) Renames data frame analytics _evaluate API results as follows: - per class accuracy renamed from `accuracy` to `value` - per class precision renamed from `precision` to `value` - per class recall renamed from `recall` to `value` - auc_roc `score` renamed to `value` for both outlier detection and classification	2020-10-20 10:30:50 +03:00
David Roberts	977a4ad3f9	[ML] Change docs test mute comment (#63866 ) The original comment mentioned issue #48583, but issue #48941 is specifically open for this mute. However, this is inappropriate, as the underlying reason the test cannot be unmuted is the same as for all the other tests skipped with the comment "Kibana sample data": issues #51572, #51576 and #51678. Closes #48941	2020-10-19 10:17:27 +01:00
Przemysław Witek	d9e7d88f08	[ML] Allow setting num_top_classes to a special value -1 (#63587 )	2020-10-13 13:14:17 +02:00
István Zoltán Szabó	e8930a44a4	[DOCS] Adds AUC ROC classification metric to the API examples (#63563 )	2020-10-13 11:03:20 +02:00
István Zoltán Szabó	b517d4d9b5	[DOCS] Adds huber and msle metrics to Evaluate API example calls (#63414 )	2020-10-08 17:05:04 +02:00
Przemysław Witek	b0019bd0a6	[ML] Validate that AucRoc has the data necessary to be calculated (#63302 )	2020-10-08 08:19:43 +02:00
lcawl	2177b46289	[DOCS] Fixes typo	2020-10-06 09:19:43 -07:00
Lisa Cawley	49ab8f8688	[DOCS] Add feature_importance_baseline to get trained model API (#63279 ) Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com>	2020-10-06 07:56:55 -07:00
István Zoltán Szabó	de3ce8bc39	[DOCS] Adds delta and offset parameters to Evaluate DFA API docs (#63317 )	2020-10-06 16:06:35 +02:00
Lisa Cawley	51f9bf657d	[DOCS] Fix titles for ML APIs (#63152 )	2020-10-02 11:53:49 -07:00
István Zoltán Szabó	baffdd1ec0	[DOCS] Updates trained models API docs titles. (#63165 )	2020-10-02 10:15:14 -07:00
Benjamin Trent	7bd6e78dae	[ML] adding for_export flag for ml plugin GET resource APIs (#63092 ) This adds the new `for_export` flag to the following APIs: - GET _ml/anomaly_detectors/<job_id> - GET _ml/datafeeds/<datafeed_id> - GET _ml/data_frame/analytics/<analytics_id> The flag is designed for cloning or exporting configuration objects to later be put into the same cluster or a separate cluster. The following fields are not returned in the objects: - any field that is not user settable (e.g. version, create_time) - any field that is a calculated default value (e.g. datafeed chunking_config) - any field that would effectively require changing to be of use (e.g. datafeed job_id) - any field that is automatically set via another Elastic stack process (e.g. anomaly job custom_settings.created_by) closes https://github.com/elastic/elasticsearch/issues/63055	2020-10-02 08:29:19 -04:00
Benjamin Trent	1084aaf18a	[ML] renames /inference apis to /trained_models (#63097 ) This commit renames all `inference` CRUD APIs to `trained_models`. This aligns with internal terminology, documentation, and use-cases.	2020-10-01 12:13:49 -04:00
Przemysław Witek	cd1a27f273	[ML] Implement AucRoc metric for classification (#60502 )	2020-09-30 08:56:23 +02:00
Lisa Cawley	e48eab95e9	[DOCS] Formatting fix in get trained model API (#62643 )	2020-09-21 08:19:37 -07:00
Benjamin Trent	a653a1cbb8	[ML] allow multiple wildcard values for GET Calendars, Events, and DELETE forecasts (#62563 ) This commit adjusts the following APIs so now they not only support an `_all` case, but wildcard patterned Ids as well. - `GET _ml/calendars/<calendar_id>/events` - `GET _ml/calendars/<calendar_id>` - `GET _ml/anomaly_detectors/<job_id>/model_snapshots/<snapshot_id>` - `DELETE _ml/anomaly_detectors/<job_id>/_forecast/<forecast_id>`	2020-09-18 09:39:40 -04:00
Benjamin Trent	fdb7b6d3b5	[ML] Add new include flag to GET inference/<model_id> API for model training metadata (#61922 ) Adds new flag include to the get trained models API The flag initially has two valid values: definition, total_feature_importance. Consequently, the old include_model_definition flag is now deprecated. When total_feature_importance is included, the total_feature_importance field is included in the model metadata object. Including definition is the same as previously setting include_model_definition=true.	2020-09-18 07:11:38 -04:00
Lisa Cawley	e743ed6102	[DOCS] Minor typo in ML API (#62414 )	2020-09-15 13:19:17 -07:00
Lisa Cawley	9c2b214873	[DOCS] Removes inference from trained model API text (#62125 )	2020-09-09 10:11:50 -07:00
David Roberts	6008a74da5	[ML] Include the "properties" layer in find_file_structure mappings (#62158 ) Previously the "mappings" field of the response from the find_file_structure endpoint was not a drop-in for the mappings format of the create index endpoint - the "properties" layer was missing. The reason for omitting it initially was that the assumption was that the find_file_structure endpoint would only ever return very simple mappings without any nested objects. However, this will not be true in the future, as we will improve mappings detection for complex JSON objects. As a first step it makes sense to move the returned mappings closer to the standard format. This is a small building block towards fixing #55616	2020-09-09 16:29:23 +01:00
Lisa Cawley	1e6cdcac20	[DOCS] Fix from and size descriptions for model APIs (#62128 )	2020-09-08 12:54:51 -07:00
Lisa Cawley	4a7492f3fd	[DOCS] Fix allow_no_match description for model APIs (#62008 )	2020-09-08 08:11:33 -07:00
István Zoltán Szabó	a75094e666	[DOCS] Removes inference from the names of trained model APIs. (#62036 )	2020-09-07 11:23:29 +02:00
Lisa Cawley	511babde59	[DOCS] Refresh machine learning custom URL example (#61826 )	2020-09-03 16:53:26 -07:00
Lisa Cawley	f05d8c2b98	[DOCS] Per-partition categorization (#61506 )	2020-08-26 17:07:46 -07:00
lcawl	f56ab039ae	[DOCS] Fix typo in update anomaly detection job API	2020-08-25 17:12:43 -07:00
Benjamin Trent	1b34c88d56	[ML] adding docs + hlrc for data frame analysis feature_processors (#61149 ) Adds HLRC and some docs for the new feature_processors field in Data frame analytics. Co-authored-by: Przemysław Witek <przemyslaw.witek@elastic.co> Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-08-24 12:00:44 -04:00
James Rodewig	a94e5cb7c4	[DOCS] Replace Wikipedia links with attribute (#61171 )	2020-08-17 09:44:24 -04:00
James Rodewig	6b9b8c5e31	[DOCS] Move script and stored fields content to search fields page (#60826 ) Changes: * Moves `Retrieve selected fields` to its own page and adds a title abbreviation. * Adds existing script and stored fields content to `Retrieve selected fields` * Adds a xref for `Retrieve selected fields` to `Search your data` * Adds related redirects and updates existing xrefs	2020-08-06 12:45:03 -04:00
István Zoltán Szabó	c3536935b2	[DOCS] Adds inference phase to get DFA job stats. (#60737 )	2020-08-05 16:22:21 +02:00
Przemysław Witek	29ee3a05b6	Deprecate allow_no_jobs and allow_no_datafeeds in favor of allow_no_match (#60601 )	2020-08-05 12:29:07 +02:00
James Rodewig	441c3a21b1	[DOCS] Update my-index examples (#60132 ) Changes the following example index names to `my-index-000001` for consistency: * `my-index` * `my_index` * `myindex`	2020-07-27 14:46:39 -04:00
Lisa Cawley	1781d4a7b9	[DOCS] Fix security links in machine learning APIs (#60098 )	2020-07-23 12:14:56 -07:00
James Rodewig	2774cd6938	[DOCS] Swap `[float]` for `[discrete]` (#60124 ) Changes instances of `[float]` in our docs for `[discrete]`. Asciidoctor prefers the `[discrete]` tag for floating headings: https://asciidoctor.org/docs/asciidoc-asciidoctor-diffs/#blocks	2020-07-23 11:48:22 -04:00
James Rodewig	80b674fb25	[DOCS] Reformat snippets to use two-space indents (#59973 )	2020-07-21 12:24:26 -04:00
Przemysław Witek	2a12dcf2e0	Rename binary_soft_classification evaluation to outlier_detection (#59951 )	2020-07-21 14:27:57 +02:00
Lisa Cawley	fb0157460f	[DOCS] Changes level offset of anomaly detection pages (#59911 )	2020-07-20 16:33:54 -07:00
Lisa Cawley	823c337e76	[DOCS] Changes level offset for anomaly detection APIs (#59920 )	2020-07-20 12:38:09 -07:00
Lisa Cawley	42be287b57	[DOCS] Changes level offset in data frame analytics APIs (#59919 )	2020-07-20 12:11:47 -07:00
Benjamin Trent	b551f75ec3	[ML] add new `custom` field to trained model processors (#59542 ) This commit adds the new configurable field `custom`. `custom` indicates if the preprocessor was submitted by a user or automatically created by the analytics job. Eventually, this field will be used in calculating feature importance. When `custom` is true, the feature importance for the processed fields is calculated. When `false` the current behavior is the same (we calculate the importance for the originating field/feature). This also adds new required methods to the preprocessor interface. If users are to supply their own preprocessors in the analytics job configuration, we need to know the input and output field names.	2020-07-16 09:35:56 -04:00
Przemysław Witek	dfbb47dcaa	Add a "verbose" option to the data frame analytics stats endpoint (#59589 )	2020-07-15 15:59:56 +02:00
Dimitris Athanasiou	da0249f6c2	[ML] Data frame analytics max_num_threads setting (#59254 ) This adds a setting to data frame analytics jobs called `max_num_threads`. The setting expects a positive integer. When used, the user specifies the max number of threads that may be used by the analysis. Note that the actual number of threads used is limited by the number of processors on the node where the job is assigned. Also, the process may use a couple more threads for operational functionality that is not the analysis itself. This setting may also be updated for a stopped job. More threads may reduce the time it takes to complete the job at the cost of using more CPU.	2020-07-09 16:31:26 +03:00
James Rodewig	2be9db01c8	[DOCS] Replace `datatype` with `data type` (#58972 )	2020-07-07 13:52:10 -04:00
Przemysław Witek	4a43b03855	Report peak model memory in ModelSizeStats (#59017 )	2020-07-06 10:33:54 +02:00
Benjamin Trent	6238d4fc49	[ML] add exponent output aggregator to inference (#58933 ) * [ML] add exponent output aggregator to inference * fixing docs	2020-07-03 08:22:01 -04:00
Przemysław Witek	843c512e78	Rename regression evaluation metrics to make the names consistent with loss functions (#58887 )	2020-07-02 16:19:27 +02:00
Przemysław Witek	38aa474dec	Implement pseudo Huber loss (PseudoHuber) evaluation metric for regression analysis (#58734 )	2020-07-01 13:29:56 +02:00
Przemysław Witek	dfa06240fc	Implement MSLE (MeanSquaredLogarithmicError) evaluation metric for regression analysis (#58684 )	2020-06-30 13:06:15 +02:00
István Zoltán Szabó	d0042fb791	[DOCS] Updates results_field description in the inference processor docs (#58554 )	2020-06-29 11:28:17 +02:00
Przemysław Witek	3953de4c98	Introduce DataFrameAnalyticsConfig update API (#58302 )	2020-06-29 09:26:31 +02:00
Dimitris Athanasiou	96853df6af	[ML] Rename increased_memory_estimate_bytes (#58614 ) ... to memory_reestimate_bytes in DF Analytics memory usage. Relates #58588	2020-06-27 12:04:39 +03:00
Dimitris Athanasiou	0994005c2e	[ML] Add status and increased estimate to memory usage (#58588 ) Adds parsing of `status` and `increased_memory_estimate_bytes` to data frame analytics `memory_usage`. When the training surpasses the model memory limit, the status will be set to `hard_limit` and `increased_memory_estimate_bytes` can be used to update the job's limit in order to restart the job.	2020-06-26 16:10:14 +03:00
István Zoltán Szabó	3b61ec1fe2	[DOCS] Updates screenshots in ML population analysis (#58318 )	2020-06-23 09:03:31 +02:00
Benjamin Trent	a43ff95f2d	[ML] calculate cache misses for inference and return in stats (#58252 ) When a local model is constructed, the cache hit miss count is incremented. When a user calls _stats, we will include the sum cache hit miss count across ALL nodes. This statistic is important in comparing against the inference_count. If the cache hit miss count is near the inference_count it indicates that the cache is overburdened, or inappropriately configured.	2020-06-18 17:18:43 -04:00
Przemysław Witek	76c7e3259f	Make ModelPlotConfig.annotations_enabled default to ModelPlotConfig.enabled if unset (#57808 )	2020-06-08 15:31:37 +02:00
David Kyle	bbeda643a6	Delete expired data by job (#57337 ) Deleting expired data can take a long time leading to timeouts if there are many jobs. Often the problem is due to a few large jobs which prevent the regular maintenance of the remaining jobs. This change adds a job_id parameter to the delete expired data endpoint to help clean up those problematic jobs.	2020-06-05 13:32:35 +01:00
David Roberts	605b4d0ea9	[ML] Add per-partition categorization option (#57683 ) This PR adds the initial Java side changes to enable use of the per-partition categorization functionality added in elastic/ml-cpp#1293. There will be a followup change to complete the work, as there cannot be any end-to-end integration tests until elastic/ml-cpp#1293 is merged, and also elastic/ml-cpp#1293 does not implement some of the more peripheral functionality, like stop_on_warn and per-partition stats documents. The changes so far cover REST APIs, results object formats, HLRC and docs.	2020-06-05 11:56:15 +01:00
Dimitris Athanasiou	e116ac850f	[ML] Fix race condition when force stopping DF analytics job (#57680 ) When we force delete a DF analytics job, we currently first force stop it and then we proceed with deleting the job config. This may result in logging errors if the job config is deleted before it is retrieved while the job is starting. Instead of force stopping the job, it would make more sense to try to stop the job gracefully first. So we now try that out first. If normal stop fails, then we resort to force stopping the job to ensure we can go through with the delete. In addition, this commit introduces `timeout` for the delete action and makes use of it in the child requests.	2020-06-05 12:13:02 +03:00
István Zoltán Szabó	3a15d84af9	[DOCS] Changes parameter order in model_plot_config. (#57642 )	2020-06-04 10:57:36 +02:00
Przemysław Witek	c4c094c006	Introduce ModelPlotConfig.annotations_enabled setting (#57539 )	2020-06-04 09:27:40 +02:00
Lisa Cawley	0f52cab495	[DOCS] Replaces docdir attributes in ML APIs (#57390 )	2020-06-01 11:46:10 -07:00
Benjamin Trent	251b17009a	[ML] adds new for_export flag to GET _ml/inference API (#57351 ) Adds a new boolean flag, `for_export` to the `GET _ml/inference/<model_id>` API. This flag is useful for moving models between clusters.	2020-05-29 12:29:28 -04:00
Benjamin Trent	ec67787a2e	[ML] add max_model_memory parameter to forecast request (#57254 ) This adds a max_model_memory setting to forecast requests. This setting can take a string value that is formatted according to byte sizes (e.g. "50mb", "150mb"). The default value is `20mb`. There is a HARD limit at `500mb` which will throw an error if used. If the limit is larger than 40% of the anomaly job's configured model limit, the forecast limit is reduced to be strictly lower than that value. This reduction is logged and audited. related native change: https://github.com/elastic/ml-cpp/pull/1238 closes: https://github.com/elastic/elasticsearch/issues/56420	2020-05-29 08:59:50 -04:00
István Zoltán Szabó	eaf0d5ffee	[DOCS] Puts a link into the loss_function variable description (#56678 )	2020-05-28 09:42:27 +02:00
István Zoltán Szabó	b9b3546985	[DOCS] Fixes formatting of admonition paragraph in PUT inference API docs. (#57196 )	2020-05-27 13:42:50 +02:00
István Zoltán Szabó	90056edaf4	[DOCS] Improves navigation between forecast APIs and adds short description. (#57035 )	2020-05-25 09:09:47 +02:00
István Zoltán Szabó	69b6041d57	[DOCS] Removes the Jobs section from the ML anomaly detection APIs page. (#57031 )	2020-05-21 17:30:59 +02:00
Benjamin Trent	8fed077b0a	[ML] relax throttling on expired data cleanup (#56711 ) Throttling nightly cleanup as much as we do has been overcautious. Nightly cleanup should be more lenient in its throttling. We still keep the same batch size, but now the requests per second scale with the number of data nodes. If we have more than 5 data nodes, we don't throttle at all. Additionally, the API now has `requests_per_second` and `timeout` set. So users calling the API directly can set the throttling. This commit also adds a new setting `xpack.ml.nightly_maintenance_requests_per_second`. This will allow users to adjust throttling of the nightly maintenance.	2020-05-18 07:21:06 -04:00
David Roberts	cbb8b17d74	[DOCS] Docs changes for overridden delimiter in find_file_structure (#56288 ) Docs for #55735 Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-05-14 09:24:07 +01:00
Lisa Cawley	84e28e42c8	[DOCS] Clarify model snapshot retention properties (#56477 )	2020-05-11 07:41:47 -07:00
István Zoltán Szabó	c994369893	[DOCS] Expands GET DFA stats API docs with new phases (#56407 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-05-11 09:22:30 +02:00
David Roberts	c99021cdcb	[ML] More advanced model snapshot retention options (#56125 ) This PR implements the following changes to make ML model snapshot retention more flexible in advance of adding a UI for the feature in an upcoming release. - The default for `model_snapshot_retention_days` for new jobs is now 10 instead of 1 - There is a new job setting, `daily_model_snapshot_retention_after_days`, that defaults to 1 for new jobs and `model_snapshot_retention_days` for pre-7.8 jobs - For days that are older than `model_snapshot_retention_days`, all model snapshots are deleted as before - For days that are in between `daily_model_snapshot_retention_after_days` and `model_snapshot_retention_days` all but the first model snapshot for that day are deleted - The `retain` setting of model snapshots is still respected to allow selected model snapshots to be retained indefinitely Closes #52150	2020-05-05 12:55:50 +01:00
Dimitris Athanasiou	6bf3834059	[ML] Add loss_function to regression (#56118 ) Adds parameters `loss_function` and `loss_function_parameter` to regression.	2020-05-05 12:36:05 +03:00
István Zoltán Szabó	86032ac56a	[DOCS] Simplifies footnote text in DFA APIs (#56105 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-05-05 09:03:16 +02:00
Lisa Cawley	52a2f7689f	[DOCS] Synchs and links hyperparameter descriptions (#55827 )	2020-05-04 07:37:14 -07:00
Lisa Cawley	5ef7aacbf7	[DOCS] Adds documentation for secondary authorization headers (#55365 ) Co-authored-by: Tim Vernum <tim@adjective.org>	2020-04-29 08:28:42 -07:00
István Zoltán Szabó	d70cef3474	[DOCS] Makes the footnotes less verbose in configuring aggs page. (#55857 )	2020-04-29 09:50:41 +02:00
István Zoltán Szabó	ca2f98382f	[DOCS] Changes feature importance links to point to the new page (#55531 ) * [DOCS] Changes feature importance links to point to the new page. * [DOCS] Fixes line breaks.	2020-04-28 09:02:14 +02:00
David Roberts	dcb6ed03cd	[ML] Adding failed_category_count to model_size_stats (#55716 ) The failed_category_count statistic records the number of times categorization wanted to create a new category but couldn't because the job had reached its model_memory_limit. Relates elastic/ml-cpp#1130	2020-04-25 08:01:21 +01:00
Lisa Cawley	7fafec0f8f	[DOCS] Update example and nesting in get data frame analytics job stats API (#55191 ) Co-Authored-By: Valeriy Khakhutskyy <1292899+valeriy42@users.noreply.github.com>	2020-04-22 08:07:31 -07:00
David Roberts	d1a9b3a545	[ML] Add effective max model memory limit to ML info (#55529 ) The ML info endpoint returns the max_model_memory_limit setting if one is configured. However, it is still possible to create a job that cannot run anywhere in the current cluster because no node in the cluster has enough memory to accommodate it. This change adds an extra piece of information, limits.effective_max_model_memory_limit, to the ML info response that returns the biggest model memory limit that could be run in the current cluster assuming no other jobs were running. The idea is that the ML UI will be able to warn users who try to create jobs with higher model memory limits that their jobs will not be able to start unless they add a bigger ML node to their cluster. Relates elastic/kibana#63942	2020-04-22 11:36:58 +01:00
David Roberts	8906e76079	[ML] Return assigned node in start/open job/datafeed response (#55473 ) Adds a "node" field to the response from the following endpoints: 1. Open anomaly detection job 2. Start datafeed 3. Start data frame analytics job If the job or datafeed is assigned to a node immediately then this field will return the ID of that node. In the case where a job or datafeed is opened or started lazily the node field will contain an empty string. Clients that want to test whether a job or datafeed was opened or started lazily can therefore check for this. Fixes #54067	2020-04-22 08:44:57 +01:00
István Zoltán Szabó	f8bfab2dab	[DOCS] Provides further details on aggregations in datafeeds (#55462 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-04-22 08:53:34 +02:00
Benjamin Trent	f72b2db29a	[ML] partitions model definitions into chunks (#55260 ) This paves the data layer way so that exceptionally large models are partitioned across multiple documents. This change means that nodes before 7.8.0 will not be able to use trained inference models created on nodes on or after 7.8.0. I chose the definition document limit to be 100. This SHOULD be plenty for any large model. One of the largest models that I have created so far had the following stats: ~314MB of inflated JSON, ~66MB when compressed, ~177MB of heap. With the chunking sizes of `16 * 1024 * 1024` its compressed string could be partitioned to 5 documents. Supporting models 20 times this size (compressed) seems adequate for now.	2020-04-20 15:13:23 -04:00
Benjamin Trent	c1afda4a23	[ML] adding prediction_field_type to inference config (#55128 ) Data frame analytics dynamically determines the classification field type. This field type then dictates the encoded JSON that is written to Elasticsearch. Inference needs to know about this field type so that it may provide the EXACT SAME predicted values as analytics. This adds a new field, `prediction_field_type`, which indicates the desired type. Options are: `string` (DEFAULT), `number`, `boolean` (where close_to(1.0) == true, false otherwise). Analytics provides the default `prediction_field_type` when the model is created from the process.	2020-04-15 08:32:48 -04:00
Lisa Cawley	1f0341db39	[DOCS] Removes unshared sections from ml-shared.asciidoc (#55129 )	2020-04-14 15:19:31 -07:00
Lisa Cawley	998a085c14	[DOCS] Edits create data frame analytics job API (#54751 )	2020-04-13 09:58:03 -07:00
István Zoltán Szabó	b1b067c5ba	[DOCS] Adds link points to the data frame analytics supported fields (#55004 ) Co-authored-by: lcawl <lcawley@elastic.co>	2020-04-09 11:16:13 -07:00
István Zoltán Szabó	bb44726ad6	[DOCS] Reworks some parts of EMM API docs (#54872 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-04-08 09:50:12 +02:00
Lisa Cawley	c355fea8f4	[DOCS] Remove text fields from classification dependent variables (#54849 )	2020-04-07 10:43:15 -07:00
István Zoltán Szabó	1ae8bde756	[DOCS] Changes kibana_user to kibana_admin in DFA API prerequisites. (#54806 )	2020-04-06 15:45:08 +02:00
István Zoltán Szabó	a0662399c7	[DOCS] Makes PUT inference API docs collapsible (#54653 ) Co-authored-by: lcawl <lcawley@elastic.co>	2020-04-03 09:45:42 +02:00
Benjamin Trent	4e1ff31c3c	[ML] add new inference_config field to trained model config (#54421 ) A new field called `inference_config` is now added to the trained model config object. This new field allows for default inference settings from analytics or some external model builder. The inference processor can still override whatever is set as the default in the trained model config.	2020-04-02 10:34:17 -04:00
Benjamin Trent	1d24960ff8	[ML] prefer secondary authorization header for data[feed|frame] authz (#54121 ) Secondary authorization headers are to be used to facilitate Kibana spaces support + ML jobs/datafeeds. Now on PUT/Update/Preview datafeed, and PUT data frame analytics the secondary authorization is preferred over the primary (if provided). closes https://github.com/elastic/elasticsearch/issues/53801	2020-04-02 10:10:46 -04:00
Benjamin Trent	bbd6e943de	[ML] add num_matches and preferred_to_categories to category definition objects (#54214 ) This adds two new fields to category definitions. - `num_matches` indicating how many documents have been seen by this category - `preferred_to_categories` indicating which other categories this particular category supersedes when messages are categorized. These fields are only guaranteed to be up to date after a `_flush` or `_close`. native change: https://github.com/elastic/ml-cpp/pull/1062	2020-04-02 07:49:09 -04:00
István Zoltán Szabó	b0f6d4ee0e	[DOCS] Updates estimate model memory docs (#54574 )	2020-04-01 15:53:53 +02:00
István Zoltán Szabó	b96743cfc5	[DOCS] Adds data_counts object to the GET DFA stats API (#54498 )	2020-04-01 10:05:00 +02:00
Jason Tedor	95a7eed9aa	Rename MetaData to Metadata in all of the places (#54519 ) This is a simple naming change PR, to fix the fact that "metadata" is a single English word, and for too long we have not followed general naming conventions for it. We are also not consistent about it, for example, METADATA instead of META_DATA if we were trying to be consistent with MetaData (although METADATA is correct when considered in the context of "metadata"). This was a simple find and replace across the code base, only taking a few minutes to fix this naming issue forever.	2020-03-31 15:52:01 -04:00
Lisa Cawley	b90e491f68	[DOCS] Collapses nested objects in data frame analytics APIs (#54472 )	2020-03-31 10:56:48 -07:00
Dimitris Athanasiou	5a98fc20e1	[ML] Fix DF analytics explain API request in docs (#54510 ) The explain API expects a data frame analytics config as its request.	2020-03-31 18:37:19 +03:00
István Zoltán Szabó	85d9b34dc5	[DOCS] Adds description of analysis_stats object and its properties to GET DFA stats API docs (#53881 ) Co-authored-by: Valeriy Khakhutskyy <1292899+valeriy42@users.noreply.github.com> Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-03-31 13:27:54 +02:00
Lisa Cawley	fdcd19483d	[DOCS] Collapses content in machine learning APIs (#54234 )	2020-03-30 10:08:38 -07:00
Jason Tedor	1fc0432b24	Introduce formal role for remote cluster client (#53924 ) This commit introduces a formal role for identifying nodes that are capable of making connections to remote clusters.	2020-03-24 19:21:56 -04:00
David Roberts	8ee770560a	[ML] Add a model memory estimation endpoint for anomaly detection (#53507 ) A new endpoint for estimating anomaly detection job model memory requirements: POST _ml/anomaly_detectors/estimate_model_memory Closes #53219	2020-03-24 21:38:19 +00:00
David Roberts	cbe063a074	[ML] Introduce a "starting" datafeed state for lazy jobs (#53918 ) It is possible for ML jobs to open lazily if the "allow_lazy_open" option in the job config is set to true. Such jobs wait in the "opening" state until a node has sufficient capacity to run them. This commit fixes the bug that prevented datafeeds for jobs lazily waiting assignment from being started. The state of such datafeeds is "starting", and they can be stopped by the stop datafeed API while in this state with or without force. Fixes #53763	2020-03-24 10:01:13 +00:00
István Zoltán Szabó	8279f82dea	[DOCS] Fixes typo in start datafeed API docs. (#53811 )	2020-03-19 17:55:26 +01:00
István Zoltán Szabó	57321124ea	[DOCS] Changes seconds to milliseconds since the Epoch in AD docs. (#53797 )	2020-03-19 15:40:53 +01:00
Tom Veasey	58340c2dbe	[ML] Adds the class_assignment_objective parameter to classification (#52763 ) Adds a new parameter for classification that enables choosing whether to assign labels to maximise accuracy or to maximise the minimum class recall. Fixes #52427.	2020-03-12 18:39:29 +00:00
István Zoltán Szabó	77ec60baa0	[DOCS] Adds a warning about reindexing docs with the same ID to the PUT DFA docs. (#53490 )	2020-03-12 18:00:36 +01:00
Benjamin Trent	4e1f029b04	[ML][Inference] adds new default_field_map field to trained models (#53294 ) Adds a new `default_field_map` field to trained model config objects. This allows the model creator to supply a field map if it knows that there should be some map for inference to work directly against the training data. The use case internally is having analytics jobs supply a field mapping for multi-field fields. This allows us to use the model "out of the box" on data where we trained on `foo.keyword` but the `_source` only references `foo`.	2020-03-11 12:23:56 -04:00
Dimitris Athanasiou	5a32f50d18	[ML] Rename data frame analytics maximum_number_trees to max_trees (#53300 ) Deprecates `maximum_number_trees` parameter of classification and regression and replaces it with `max_trees`.	2020-03-11 10:33:53 +02:00
István Zoltán Szabó	54b66d3385	[DOCS] Makes the description clearer on how to use aggregations in an anomaly detection job (#53103 ) Co-authored-by: lcawl <lcawley@elastic.co>	2020-03-09 09:48:23 +01:00
István Zoltán Szabó	08fcc0b02f	[DOCS] Adds deleting flag to the GET job stats API docs (#53223 )	2020-03-06 16:03:09 +01:00
István Zoltán Szabó	870e1891d9	[DOCS] Makes the naming convention of the DFA response objects coherent (#53172 )	2020-03-05 16:25:43 +01:00
István Zoltán Szabó	d7fb6416dd	[DOCS] Expands GET DFA stat API docs with response objects. (#53107 )	2020-03-05 15:30:30 +01:00
Lisa Cawley	7004216455	[DOCS] Adds link in datafeed indices_options (#53067 )	2020-03-03 10:28:54 -08:00
István Zoltán Szabó	24fe7e5899	[DOCS] Adds response body documentation to GET inference API (#53050 )	2020-03-03 16:25:24 +01:00
Lisa Cawley	b6534834f9	[DOCS] Adds cat anomaly detectors API (#52866 )	2020-02-28 12:15:21 -08:00
Dimitris Athanasiou	dd331935b3	[ML] Parse and report memory usage for DF Analytics (#52778 ) Adds reporting of memory usage for data frame analytics jobs. This commit introduces a new index pattern `.ml-stats-*` whose first concrete index will be `.ml-stats-000001`. This index serves to store instrumentation information for those jobs.	2020-02-28 17:35:07 +02:00
Benjamin Trent	d7a63333b5	[ML] Add indices_options to datafeed config and update (#52793 ) This adds a new configurable field called `indices_options`. This allows users to create or update the indices_options used when a datafeed reads from an index. This is necessary for the following use cases: - Reading from frozen indices - Allowing certain indices in multiple index patterns to not exist yet These index options are available on datafeed creation and update. Users may specify them as URL parameters or within the configuration object. closes https://github.com/elastic/elasticsearch/issues/48056	2020-02-27 12:22:35 -05:00
Lisa Cawley	42fbca7dc6	[DOCS] Adds cat datafeeds API (#52738 )	2020-02-26 09:20:36 -08:00
István Zoltán Szabó	490e8b47e6	[DOCS] Adds cat data frame analytics API (#52764 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-02-26 11:09:37 +01:00
Lisa Cawley	cd069a861c	[DOCS] Updates custom rules example (#52731 )	2020-02-25 09:30:14 -08:00
David Roberts	ca80ad69f2	[ML] Use event.timezone in file_structure_finder ingest pipeline (#52720 ) This is because beat.timezone was renamed to event.timezone in elastic/beats#9458	2020-02-25 12:18:53 +00:00
lcawl	b590b49205	[DOCS] Adds anchor for custom rules	2020-02-24 10:04:34 -08:00
Lisa Cawley	f41ebe47e3	[DOCS] Clarifies description of num_top_feature_importance_values (#52246 ) Co-Authored-By: Valeriy Khakhutskyy <1292899+valeriy42@users.noreply.github.com>	2020-02-18 08:48:24 -08:00
Lisa Cawley	ab139244d7	[DOCS] Fixes, sorts ML tagged regions (#52283 )	2020-02-12 13:43:21 -08:00
David Kyle	f64c6359ed	[ML] Make Ensemble feature names optional (#51996 ) The featureNames field is requisite in individual models but is not required by the Ensemble.	2020-02-07 10:07:18 +00:00
David Roberts	72346b91f9	[ML] Add new categorization stats to model_size_stats (#51879 ) This change adds support for the following new model_size_stats fields: - categorized_doc_count - total_category_count - frequent_category_count - rare_category_count - dead_category_count - categorization_status Relates #50749	2020-02-06 17:08:43 +00:00
Darren LaCasse	ea67e24b7b	[DOCS] Remove extra word (#51757 )	2020-01-31 10:27:37 -08:00
István Zoltán Szabó	67f14c3978	[DOCS] Adds PUT inference API docs (#51231 ) Co-authored-by: Benjamin Trent <ben.w.trent@gmail.com> Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-01-31 13:12:24 +01:00
Lisa Cawley	32adcd2c9d	[DOCS] Adds missing testenv attribute (#51719 )	2020-01-30 16:13:26 -08:00
David Roberts	a5a2e4eaee	[ML] Use CSV ingest processor in find_file_structure ingest pipeline (#51492 ) Changes the find_file_structure response to include a CSV ingest processor in the ingest pipeline it suggests. Previously the Kibana file upload functionality parsed CSV in the browser, but parsing CSV in the ingest pipeline makes the Kibana file upload functionality more easily interchangeable with Filebeat, such that the configurations it creates can more easily be used to import data with the same structure repeatedly in production.	2020-01-28 12:46:00 +00:00
István Zoltán Szabó	85e581282d	[DOCS] Refines description. (#51400 )	2020-01-24 13:31:44 +01:00
Benjamin Trent	c9e285c1e6	[ML][Inference] add tags url param to GET (#51330 ) Adds a new URL parameter, `tags`, to the GET _ml/inference/<model_id> endpoint. This parameter allows the list of models to be further reduced to those that contain all the provided tags.	2020-01-24 07:30:56 -05:00
Lisa Cawley	789aeaedab	[DOCS] Updates categorization examples with wizard screenshots (#51133 )	2020-01-22 11:26:10 -08:00
Lisa Cawley	551a83a2ff	[DOCS] Clarify interval, frequency, and bucket span in ML APIs and example (#51280 )	2020-01-22 08:08:31 -08:00
David Kyle	7978f0b8ef	[ML] Calculate results and snapshot retention using latest bucket timestamps (#51061 ) The retention period is calculated relative to the last bucket result or snapshot time rather than wall clock	2020-01-22 10:08:41 +00:00
István Zoltán Szabó	087a048ee6	[DOCS] Adds text about data types to the categorization docs (#51145 )	2020-01-17 09:52:57 -08:00
Dimitris Athanasiou	24ce598239	[ML] DF Analytics _explain API should skip object fields (#51115 ) Object fields cannot be used as features. At the moment the _explain API includes them and, even worse, it does not error when an object field is excluded. This creates the expectation for the user that all child fields will also be excluded, which is not the case. This commit omits object fields from the _explain API and also adds an error if an object field is included or excluded.	2020-01-17 12:24:17 +02:00
David Kyle	5ad1d0d2cc	Fix hardcoded version replacement in put-dfanalytics.asciidoc (#51056 )	2020-01-16 10:06:45 +00:00
Przemysław Witek	999884d8fb	Add missing docs for new evaluation metrics (#50967 )	2020-01-15 14:23:37 +01:00
István Zoltán Szabó	406810c172	[DOCS] Describes the relationship of the time-related settings in anomaly detection docs (#50959 ) Co-Authored-By: David Roberts <dave.roberts@elastic.co>	2020-01-15 08:45:03 +01:00
Dimitris Athanasiou	4d2be9bd32	[ML] Add num_top_feature_importance_values param to regression and classi… (#50914 ) Adds a new parameter to regression and classification that enables computation of importance for the top most important features. The computation of the importance is based on SHAP (SHapley Additive exPlanations) method.	2020-01-14 15:01:47 +02:00
Lisa Cawley	979a28d2b5	[DOCS] Clarify detector_index property in ML APIs (#50723 )	2020-01-09 08:12:53 -08:00
István Zoltán Szabó	b3457154a3	[DOCS] Fine-tunes data frame analytics API docs formatting. (#50799 )	2020-01-09 16:21:01 +01:00
István Zoltán Szabó	b683f96e23	[DOCS] Moves analysis resources to PUT DFA API docs (#50704 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-01-09 13:57:11 +01:00
István Zoltán Szabó	659b4ceb97	[DOCS] Improves find_file_structure documentation (#50743 ) Co-authored-by: Lisa Cawley <lcawley@elastic.co>	2020-01-09 11:19:19 +01:00
István Zoltán Szabó	bc21500201	[DOCS] Forms role and privilege requirements as bulleted lists in DFA API docs (#50732 ) Co-Authored-By: Lisa Cawley <lcawley@elastic.co>	2020-01-09 10:44:07 +01:00
István Zoltán Szabó	2f55c3566f	[DOCS] Clarifies model_size_stats.total_xxx_field_count objects and removes notes in GET job stats API docs. (#50728 )	2020-01-09 09:43:55 +01:00
István Zoltán Szabó	d5fcb73b1f	[DOCS] Improves description for forecast_stats (#50729 ) Co-Authored-By: Lisa Cawley <lcawley@elastic.co>	2020-01-09 09:31:30 +01:00
Lisa Cawley	b13a755842	[DOCS] Adds missing timing_stats descriptions (#50574 )	2020-01-03 09:07:08 -08:00
István Zoltán Szabó	675b98f90c	[DOCS] Fine-tunes training_percent definition. (#50601 )	2020-01-03 14:49:43 +01:00
Dimitris Athanasiou	af0ce426cc	[ML] Implement force deleting a data frame analytics job (#50553 ) Adds a `force` parameter to the delete data frame analytics request. When `force` is `true`, the action force-stops the jobs and then proceeds to the deletion. This can be used in order to delete a non-stopped job with a single request. Closes #48124	2020-01-03 12:01:41 +02:00
István Zoltán Szabó	fd50169c74	[DOCS] Specifies the possible data types of classification dependent_variable (#50582 )	2020-01-03 10:41:38 +01:00
Lisa Cawley	dd4ede5c56	[DOCS] Adds filter and calendar attributes (#50566 )	2020-01-02 10:59:54 -08:00
lcawl	c7408a25f1	[DOCS] Minor fixes in ML APIs	2019-12-30 15:21:18 -08:00
James Rodewig	e8a6d4a3fb	[DOCS] Remove unneeded redirects (#50476 ) The docs/reference/redirects.asciidoc file stores a list of relocated or deleted pages for the Elasticsearch Reference documentation. This prunes several older redirects that are no longer needed and don't require work to fix broken links in other repositories.	2019-12-26 07:49:41 -05:00
Lisa Cawley	6501338a9e	[DOCS] Remove redundant results from ML APIs (#50477 )	2019-12-24 08:34:03 -08:00
Orhan Toy	48342740c5	[DOCS] Fixes "enables you to" typos (#50225 )	2019-12-23 14:38:37 -05:00
Lisa Cawley	362ce41eaf	[DOCS] Updates ML links (#50387 )	2019-12-19 14:47:28 -08:00
lcawl	d8a94f0397	[DOCS] Fixes security links	2019-12-18 11:51:03 -08:00
Lisa Cawley	68e02a19d8	[DOCS] Move machine learning results definitions into APIs (#50257 )	2019-12-18 09:50:31 -08:00
István Zoltán Szabó	50e26d40a2	[DOCS] Adds GET, GET stats and DELETE inference APIs (#50224 ) Co-Authored-By: Lisa Cawley <lcawley@elastic.co>	2019-12-18 09:10:12 +01:00
Lisa Cawley	207094cd67	[DOCS] Moves model snapshot resource definitions into APIs (#50157 ) Co-Authored-By: Ed Savage <32410745+edsavage@users.noreply.github.com>	2019-12-16 10:42:30 -08:00
István Zoltán Szabó	3857e3d94f	[DOCS] Moves data frame analytics job resource definitions into APIs (#50021 )	2019-12-12 10:59:37 +01:00
Lisa Cawley	ca482127fa	[DOCS] Move job count resource definitions into API (#50057 ) Co-Authored-By: Przemysław Witek <przemyslaw.witek@elastic.co> Co-Authored-By: David Roberts <dave.roberts@elastic.co> Co-Authored-By: Ed Savage <32410745+edsavage@users.noreply.github.com>	2019-12-11 11:17:15 -08:00
Lisa Cawley	3d96e6b68e	[DOCS] Move datafeed resource definitions into APIs (#50005 ) Co-Authored-By: István Zoltán Szabó <istvan.szabo@elastic.co>	2019-12-11 09:50:41 -08:00
Dimitris Athanasiou	269425b54d	[ML] Introduce randomize_seed setting for regression and classification (#49990 ) This adds a new `randomize_seed` for regression and classification. When not explicitly set, the seed is randomly generated. One can reuse the seed in a similar job in order to ensure the same docs are picked for training.	2019-12-10 10:22:53 +02:00
Lisa Cawley	0f51bc2f72	[DOCS] Move anomaly detection job resource definitions into APIs (#49700 ) Co-Authored-By: István Zoltán Szabó <istvan.szabo@elastic.co>	2019-12-06 15:32:07 -08:00
István Zoltán Szabó	e5d512a8ed	[DOCS] Fixes classification evaluation example response. (#49905 )	2019-12-06 13:24:22 +01:00
István Zoltán Szabó	f7a5b73972	[DOCS] Adds an example of preprocessing actions to the PUT DFA API docs (#49831 )	2019-12-05 14:15:19 +01:00
István Zoltán Szabó	c793e80d3b	[DOCS] Fixes typo in the ML anomaly detection time functions docs. (#49834 )	2019-12-05 09:57:01 +01:00
Dimitris Athanasiou	bad07b76f7	[ML] Add optional source filtering during data frame reindexing (#49690 ) This adds a `_source` setting under the `source` setting of a data frame analytics config. The new `_source` is reusing the structure of a `FetchSourceContext` like `analyzed_fields` does. Specifying includes and excludes for source allows selecting which fields will get reindexed and will be available in the destination index. Closes #49531	2019-11-29 14:20:31 +02:00
lcawl	3b3f3ca925	[DOCS] Fixes typo in ML resources	2019-11-26 10:28:18 -08:00
lcawl	63b944c00f	[DOCS] Fixes data type formatting	2019-11-26 08:21:39 -08:00
David Roberts	40c951d781	[ML] Add default categorization analyzer definition to ML info (#49545 ) The categorization job wizard in the ML UI will use this information when showing the effect of the chosen categorization analyzer on a sample of input.	2019-11-25 13:20:12 +00:00
Dimitris Athanasiou	5a6967af57	[ML][DOCS] Anomaly detection job retention days settings do not require restart (#49546 )	2019-11-25 15:12:41 +02:00
Dimitris Athanasiou	0390ec3627	[ML] Explain data frame analytics API (#49455 ) This commit replaces the _estimate_memory_usage API with a new API, the _explain API. The API consolidates information that is useful before creating a data frame analytics job. It includes: - memory estimation - field selection explanation Memory estimation is moved here from what was previously calculated in the _estimate_memory_usage API. Field selection is a new feature that explains to the user whether each available field was selected to be included or not in the analysis. In the case it was not included, it also explains the reason why.	2019-11-22 20:08:14 +02:00
Lisa Cawley	8d214e851c	[DOCS] Clarify ML job closure prerequisites (#49265 )	2019-11-19 08:31:24 -08:00
David Roberts	b6c6387af5	[TEST] Mute docs snippet test in close-job.asciidoc (#49000 ) Due to https://github.com/elastic/elasticsearch/pull/48583#issuecomment-552991325	2019-11-12 17:31:07 +00:00
Benjamin Trent	ee8853fbc1	[ML] Add new geo_results.(actual_point|typical_point) fields for `lat_long` results (#47050 ) Related PR: https://github.com/elastic/ml-cpp/pull/809	2019-11-11 13:21:18 -05:00
István Zoltán Szabó	7180b90646	[DOCS] Removes best practice about fields that are highly correlated to the dependent variable. (#48935 )	2019-11-11 10:00:11 -05:00
István Zoltán Szabó	e9cec6e1f7	[DOCS] Extends analyzed_fields description in PUT DFA API docs. (#48307 )	2019-11-11 09:53:59 -05:00
István Zoltán Szabó	6c3fed8d4d	[DOCS] Adds classification type DFA API docs and ml-shared.asciidoc (#48241 )	2019-11-06 07:40:27 -05:00
István Zoltán Szabó	fe92cd0a26	[DOCS] Adds classification type evaluation docs to the DFA evaluation API (#47657 )	2019-11-06 07:37:14 -05:00
Lisa Cawley	29ac34a45c	[DOCS] Re-enable code snippet testing in close anomaly detection job API (#48259 )	2019-10-28 08:08:38 -07:00
David Roberts	d308095b28	[ML] Add option to stop datafeed that finds no data (#47922 ) Adds a new datafeed config option, max_empty_searches, that tells a datafeed that has never found any data to stop itself and close its associated job after a certain number of real-time searches have returned no data.	2019-10-14 13:26:06 +01:00

572 Commits