elasticsearch/docs/reference
Benjamin Trent 20f54272f0
[ML] Adds feature importance to option to inference processor (#52218)
This adds machine learning model feature importance calculations to the inference processor. 

The new flag in the configuration matches the analytics parameter name: `num_top_feature_importance_values`
Example:
```
"inference": {
   "field_mappings": {},
   "model_id": "my_model",
   "inference_config": {
      "regression": {
         "num_top_feature_importance_values": 3
      }
   }
}
```
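For context, the fragment above would normally sit inside the `processors` array of an ingest pipeline body. The following is a minimal Python sketch of how such a body could be assembled. The helper name `build_inference_pipeline` and the `description` text are hypothetical and are not part of this change.

```python
import json


def build_inference_pipeline(model_id, num_top_values):
    """Return an ingest pipeline body with a single inference processor.

    Hypothetical helper for illustration; only the keys under "inference"
    come from the example in the commit message.
    """
    return {
        "description": "hypothetical example pipeline",
        "processors": [
            {
                "inference": {
                    "field_mappings": {},
                    "model_id": model_id,
                    "inference_config": {
                        "regression": {
                            "num_top_feature_importance_values": num_top_values
                        }
                    },
                }
            }
        ],
    }


pipeline_body = build_inference_pipeline("my_model", 3)
# The serialized body could be sent with a PUT to _ingest/pipeline/<pipeline id>.
print(json.dumps(pipeline_body, indent=2))
```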

This will write to the document as follows:
```
"inference" : {
   "feature_importance" : { 
      "FlightTimeMin" : -76.90955548511226,
      "FlightDelayType" : 114.13514762158526,
      "DistanceMiles" : 13.731580450792187
   },
   "predicted_value" : 108.33165831875137,
   "model_id" : "my_model"
}
```
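As a rough illustration of how a consumer might read this output, the Python sketch below ranks the features by the magnitude of their contribution. It assumes the usual SHAP reading, in which a positive value pushes the prediction up and a negative value pushes it down. The function name `rank_by_magnitude` is hypothetical.

```python
# The "inference" object from the example output above.
inference_result = {
    "feature_importance": {
        "FlightTimeMin": -76.90955548511226,
        "FlightDelayType": 114.13514762158526,
        "DistanceMiles": 13.731580450792187,
    },
    "predicted_value": 108.33165831875137,
    "model_id": "my_model",
}


def rank_by_magnitude(feature_importance):
    """Sort (feature, value) pairs by absolute importance, largest first."""
    return sorted(feature_importance.items(), key=lambda kv: abs(kv[1]), reverse=True)


ranked = rank_by_magnitude(inference_result["feature_importance"])
for name, value in ranked:
    direction = "raises" if value > 0 else "lowers"
    print(f"{name}: {direction} the prediction by {abs(value):.2f}")
```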

Feature importance is computed by calculating [SHAP values](https://arxiv.org/abs/1802.03888).

The calculation requires that `number_samples` is populated for every tree node in the model. Models created before 7.7 do not have this information.
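To show why per-node sample counts matter, here is a small Python sketch that computes Shapley values for a toy regression tree by brute force. When a feature is treated as unknown, the two child subtrees are averaged with weights taken from `number_samples`. This is not the algorithm from the paper or the ml-cpp port, which avoid enumerating subsets. The tree layout, the other field names, and the rule that values below the threshold go left are all assumptions made for illustration.

```python
from itertools import combinations
from math import factorial

# Toy regression tree (hypothetical layout). Every node carries
# `number_samples`, the count of training rows that reached it.
TREE = {
    "feature": "a", "threshold": 0.5, "number_samples": 10,
    "left": {"value": 1.0, "number_samples": 4},
    "right": {
        "feature": "b", "threshold": 2.0, "number_samples": 6,
        "left": {"value": 3.0, "number_samples": 2},
        "right": {"value": 6.0, "number_samples": 4},
    },
}


def expected_value(node, x, known):
    """Expected tree output when only the features in `known` are observed."""
    if "value" in node:
        return node["value"]
    left, right = node["left"], node["right"]
    if node["feature"] in known:
        # Feature is observed: follow the branch chosen by the input row.
        child = left if x[node["feature"]] < node["threshold"] else right
        return expected_value(child, x, known)
    # Feature is unobserved: average both branches, weighted by sample counts.
    total = left["number_samples"] + right["number_samples"]
    return (
        left["number_samples"] * expected_value(left, x, known)
        + right["number_samples"] * expected_value(right, x, known)
    ) / total


def shapley_values(tree, x):
    """Brute-force Shapley values; exponential in the number of features."""
    features = sorted(x)
    n = len(features)
    phi = {}
    for f in features:
        others = [g for g in features if g != f]
        total = 0.0
        for k in range(len(others) + 1):
            weight = factorial(k) * factorial(n - k - 1) / factorial(n)
            for subset in combinations(others, k):
                known = set(subset)
                gain = expected_value(tree, x, known | {f}) - expected_value(tree, x, known)
                total += weight * gain
        phi[f] = total
    return phi


x = {"a": 1.0, "b": 3.0}
phi = shapley_values(TREE, x)
baseline = expected_value(TREE, x, set())
prediction = expected_value(TREE, x, set(x))
print(phi, baseline, prediction)
```

In this toy case the per-feature values sum to the difference between the prediction and the baseline, which is the additivity property that makes SHAP values readable as contributions. Without `number_samples`, the weighted average in `expected_value` could not be formed.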

Additionally, if the inference config requests feature importance and not all nodes have been upgraded yet, the pipeline cannot be created. This safeguards mixed-version environments in which only some ingest nodes have been upgraded.
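The shape of such a guard can be sketched in Python as follows. This is not the actual Elasticsearch implementation. The function names, the plain `major.minor.patch` version strings, and the use of 7.7.0 as the cutoff are assumptions made for illustration.

```python
# Assumed cutoff: the first version that understands feature importance.
MIN_VERSION = (7, 7, 0)


def parse_version(text):
    """Turn a plain 'major.minor.patch' string into a comparable tuple."""
    return tuple(int(part) for part in text.split("."))


def requests_feature_importance(inference_config):
    """True if any config section asks for at least one importance value."""
    return any(
        settings.get("num_top_feature_importance_values", 0) > 0
        for settings in inference_config.values()
    )


def validate_pipeline(inference_config, node_versions):
    """Reject the pipeline if an older node would have to run it."""
    if not requests_feature_importance(inference_config):
        return
    oldest = min(parse_version(version) for version in node_versions)
    if oldest < MIN_VERSION:
        raise ValueError(
            "feature importance requires all nodes to be upgraded; "
            f"oldest node is {'.'.join(map(str, oldest))}"
        )


config = {"regression": {"num_top_feature_importance_values": 3}}
validate_pipeline(config, ["7.7.0", "7.7.0"])  # passes silently
try:
    validate_pipeline(config, ["7.6.2", "7.7.0"])
except ValueError as error:
    print(f"rejected: {error}")
```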

NOTE: the algorithm is a Java port of the one laid out in ml-cpp: https://github.com/elastic/ml-cpp/blob/master/lib/maths/CTreeShapFeatureImportance.cc

Usability is blocked by: https://github.com/elastic/ml-cpp/pull/991
2020-02-21 16:36:21 -05:00
aggregations [DOCS] Links transforms in aggregation docs (#52563) 2020-02-21 08:22:04 +01:00
analysis [DOCS] Fixed typo in jump link. (#52302) 2020-02-12 17:52:11 -08:00
autoscaling Add autoscaling API skelton (#51564) 2020-02-06 19:15:17 -05:00
cat [DOCS] Fix `disk.used_percent` typo in `_cat/nodes` docs (#51854) 2020-02-04 09:15:06 -05:00
ccr
cluster [DOCS] Add missing `indices` parms returned by `_nodes/stats` (#52055) 2020-02-21 08:07:28 -05:00
commands [Docs] Fix typo in node-tool.asciidoc (#51667) 2020-01-31 10:38:27 +01:00
docs
eql [DOCS] Add EQL limitations page (#52001) 2020-02-12 08:45:15 -05:00
graph
high-availability
how-to [DOCS] Fix index_prefixes link in 'faster prefix queries' docs (#51833) 2020-02-04 08:36:16 -05:00
ilm Allow forcemerge in the hot phase for ILM policies (#52073) 2020-02-07 15:26:00 -07:00
images SQL: update ODBC docs, cover Cloud ID, latest params (#52291) 2020-02-19 17:33:48 +01:00
index-modules Remove translog retention settings (#51697) 2020-01-31 08:18:07 -05:00
indices [DOC] Remove definition typo in update alias API docs (#52184) 2020-02-14 08:30:26 -05:00
ingest [ML] Adds feature importance to option to inference processor (#52218) 2020-02-21 16:36:21 -05:00
licensing [DOCS] Augments update license API (#51903) 2020-02-05 11:07:02 -08:00
mapping Add support for multipoint geoshape queries (#52133) 2020-02-20 08:53:01 +01:00
migration Remove fixed_auto_queue_size threadpool type (#52280) 2020-02-14 16:20:40 +01:00
ml [DOCS] Clarifies description of num_top_feature_importance_values (#52246) 2020-02-18 08:48:24 -08:00
modules [DOCS] Document how CCS handles cluster-level settings (#49941) 2020-02-19 09:14:22 -05:00
monitoring Stricter checks of setup and teardown in docs tests (#51430) 2020-01-28 17:53:57 +01:00
query-dsl Add a cluster setting to disallow expensive queries (#51385) 2020-02-12 18:06:04 +01:00
release-notes
rest-api [DOCS] Adds X-Pack usage API (#52496) 2020-02-20 09:25:57 -08:00
rollup
scripting Scripting: Add char position of script errors (#51069) 2020-01-21 10:57:09 -07:00
search [DOCS] Fixed typo. (#52071) 2020-02-07 11:03:56 -08:00
settings [DOCS] Correct important note for xpack.transform.enabled (#52194) 2020-02-11 12:54:09 +00:00
setup [DOCS] Switch to standard ESS trial links (#52552) 2020-02-21 12:04:39 -05:00
slm Correct SLM retention timezone documentation (#52533) 2020-02-19 13:45:32 -07:00
snapshot-restore [DOCS] Align with ILM API docs (#48705) 2020-01-22 20:44:19 -08:00
sql SQL: specify command to run the CLI on a remote machine without Elasticsearch (#52626) 2020-02-21 13:26:31 +02:00
testing
transform [DOCS] Correct important note for xpack.transform.enabled (#52194) 2020-02-11 12:54:09 +00:00
upgrade Goodbye and thank you synced flush! (#50882) 2020-01-16 09:43:07 -05:00
vectors
aggregations.asciidoc
analysis.asciidoc [DOCS] Add attribute for Lucene analysis links (#51687) 2020-01-30 11:22:30 -05:00
api-conventions.asciidoc
cat.asciidoc
cluster.asciidoc Password-protected Keystore Feature Branch PR (#51123) 2020-01-27 19:51:39 -05:00
data-rollup-transform.asciidoc
docs.asciidoc
frozen-indices.asciidoc
getting-started.asciidoc [DOCS] Switch to standard ESS trial links (#52552) 2020-02-21 12:04:39 -05:00
glossary.asciidoc [DOCS] Split off ILM overview to a separate topic. (#51287) 2020-01-27 19:39:24 -08:00
gs-index.asciidoc
high-availability.asciidoc
how-to.asciidoc
index-extra-title-page.html
index-modules.asciidoc Deprecate creation of dot-prefixed index names except for hidden and system indices (#49959) 2020-01-27 17:18:26 -07:00
index.asciidoc [DOCS] Include docs on permanently unreleased branches only (#51743) 2020-02-11 11:22:49 -05:00
index.x.asciidoc
indices.asciidoc Goodbye and thank you synced flush! (#50882) 2020-01-16 09:43:07 -05:00
ingest.asciidoc
intro.asciidoc
mapping.asciidoc
modules.asciidoc [DOCS] Align with ILM API docs (#48705) 2020-01-22 20:44:19 -08:00
query-dsl.asciidoc Add a cluster setting to disallow expensive queries (#51385) 2020-02-12 18:06:04 +01:00
redirects.asciidoc [DOCS] Add basic EQL search tutorial docs (#51574) 2020-02-12 08:40:10 -05:00
release-notes.asciidoc
scripting.asciidoc
search.asciidoc
setup.asciidoc
testing.asciidoc
upgrade.asciidoc