elasticsearch/docs/reference/ingest/processors
Benjamin Trent 20f54272f0
[ML] Adds feature importance to option to inference processor (#52218)
This adds machine learning model feature importance calculations to the inference processor. 

The new configuration option uses the same name as the corresponding analytics parameter: `num_top_feature_importance_values`
Example:
```
"inference": {
   "field_mappings": {},
   "model_id": "my_model",
   "inference_config": {
      "regression": {
         "num_top_feature_importance_values": 3
      }
   }
}
```

The processor then writes the results to the ingested document as follows:
```
"inference" : {
   "feature_importance" : { 
      "FlightTimeMin" : -76.90955548511226,
      "FlightDelayType" : 114.13514762158526,
      "DistanceMiles" : 13.731580450792187
   },
   "predicted_value" : 108.33165831875137,
   "model_id" : "my_model"
}
```
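
The `num_top_feature_importance_values` setting limits how many feature importance entries are written. The Python sketch below shows one plausible way to choose them: rank features by the absolute value of their importance. Under that ranking, a strong negative contribution such as `FlightTimeMin` counts as much as a positive one. Two things here are assumptions. The absolute-value ranking is an assumption about how "top" is defined. `top_feature_importance` is a hypothetical helper, not an Elasticsearch API.

```python
from typing import Dict


def top_feature_importance(importances: Dict[str, float], k: int) -> Dict[str, float]:
    """Keep the k features whose importance has the largest absolute value (assumed ranking)."""
    if k <= 0:
        return {}  # treated here as "feature importance not requested"
    ranked = sorted(importances.items(), key=lambda item: abs(item[1]), reverse=True)
    return dict(ranked[:k])


# Values copied from the example document above.
importances = {
    "FlightTimeMin": -76.90955548511226,
    "FlightDelayType": 114.13514762158526,
    "DistanceMiles": 13.731580450792187,
}
print(top_feature_importance(importances, 2))
```

With `k = 2`, the sketch keeps `FlightDelayType` and `FlightTimeMin` and drops `DistanceMiles`. It keeps `FlightTimeMin` because that feature's negative contribution is larger in magnitude than the contribution of `DistanceMiles`.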

Feature importance is computed as [SHAP values](https://arxiv.org/abs/1802.03888).

The calculation requires every tree node in the model to have a populated `number_samples` value. Models created before 7.7 do not store this value, so feature importance cannot be calculated for them.
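
To make the role of `number_samples` concrete, here is a minimal Python sketch of exact SHAP values for a single regression tree. It enumerates every feature subset. For features outside a subset, it estimates the expected prediction by weighting each branch by its share of training samples, and that share is what the per-node sample counts provide. This brute-force version is exponential in the number of features. It is not the polynomial-time TreeSHAP algorithm from the linked paper, nor the ml-cpp implementation that was ported to Java. The `Node` layout, the `<=` split convention, and the function names are all hypothetical.

```python
from dataclasses import dataclass
from itertools import combinations
from math import factorial
from typing import Dict, FrozenSet, List, Optional


@dataclass
class Node:
    number_samples: int                 # training rows that reached this node
    value: float = 0.0                  # prediction, used only when this node is a leaf
    feature: Optional[str] = None       # split feature; None marks a leaf (assumed layout)
    threshold: float = 0.0
    left: Optional["Node"] = None       # branch taken when x[feature] <= threshold (assumed)
    right: Optional["Node"] = None


def expected_value(node: Node, x: Dict[str, float], known: FrozenSet[str]) -> float:
    """Expected prediction when only the features in `known` are fixed to their values in x."""
    if node.feature is None:
        return node.value
    if node.feature in known:
        child = node.left if x[node.feature] <= node.threshold else node.right
        return expected_value(child, x, known)
    # Unknown feature: average both branches, weighted by their share of training samples.
    left_share = node.left.number_samples / node.number_samples
    right_share = node.right.number_samples / node.number_samples
    return (left_share * expected_value(node.left, x, known)
            + right_share * expected_value(node.right, x, known))


def shap_values(root: Node, x: Dict[str, float], features: List[str]) -> Dict[str, float]:
    """Exact Shapley values by enumerating every subset of the other features."""
    m = len(features)
    phi: Dict[str, float] = {}
    for i in features:
        others = [f for f in features if f != i]
        contribution = 0.0
        for size in range(m):
            weight = factorial(size) * factorial(m - size - 1) / factorial(m)
            for subset in combinations(others, size):
                s = frozenset(subset)
                contribution += weight * (expected_value(root, x, s | {i})
                                          - expected_value(root, x, s))
        phi[i] = contribution
    return phi


# Toy tree: split on "a", then on "b" in the right branch.
tree = Node(100, feature="a", threshold=0.5,
            left=Node(60, value=10.0),
            right=Node(40, feature="b", threshold=0.5,
                       left=Node(10, value=20.0),
                       right=Node(30, value=30.0)))
x = {"a": 1.0, "b": 1.0}
phi = shap_values(tree, x, ["a", "b"])
baseline = expected_value(tree, x, frozenset())
prediction = expected_value(tree, x, frozenset(x))
print(phi, baseline, prediction)
```

For this toy tree, the contributions are approximately 11.25 for `a` and 1.75 for `b`. The sample-weighted baseline is 17.0, and the prediction for `x` is 30.0. The baseline plus the contributions equals the prediction, which is the additivity property SHAP values are expected to satisfy. In this sketch, the sample-weighted averages cannot be formed without per-node sample counts.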

Additionally, if the inference config requests feature importance and not all nodes have been upgraded yet, the pipeline cannot be created. This safeguards mixed-version clusters in which only some ingest nodes have been upgraded.
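
The safeguard amounts to a version check at pipeline-creation time. The Python sketch below illustrates that idea under several assumptions. The minimum version is taken to be 7.7.0. A `num_top_feature_importance_values` of 0 is treated as "not requested". Node versions are plain `major.minor.patch` strings. `validate_inference_config` and `parse_version` are hypothetical helpers and do not correspond to actual Elasticsearch code.

```python
from typing import Iterable, Tuple

Version = Tuple[int, int, int]

# Assumed minimum node version for feature importance.
MIN_FEATURE_IMPORTANCE_VERSION: Version = (7, 7, 0)


def parse_version(text: str) -> Version:
    """Parse a plain 'major.minor.patch' string (no qualifiers such as -SNAPSHOT)."""
    major, minor, patch = (int(part) for part in text.split(".")[:3])
    return (major, minor, patch)


def format_version(version: Version) -> str:
    return ".".join(str(part) for part in version)


def validate_inference_config(num_top_feature_importance_values: int,
                              node_versions: Iterable[str]) -> None:
    """Reject the pipeline if feature importance is requested before every node is upgraded."""
    if num_top_feature_importance_values <= 0:
        return  # feature importance not requested; nothing to check
    oldest = min(parse_version(v) for v in node_versions)
    if oldest < MIN_FEATURE_IMPORTANCE_VERSION:
        raise ValueError(
            "feature importance requires all nodes to be on "
            f"{format_version(MIN_FEATURE_IMPORTANCE_VERSION)} or later; "
            f"oldest node is {format_version(oldest)}"
        )


validate_inference_config(3, ["7.7.0", "7.8.1"])  # all nodes upgraded: accepted
```

A real cluster would read node versions from cluster state. This sketch takes them as a plain list so that the check itself stays visible.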

NOTE: The algorithm is a Java port of the implementation in ml-cpp: https://github.com/elastic/ml-cpp/blob/master/lib/maths/CTreeShapFeatureImportance.cc

Usability is blocked by: https://github.com/elastic/ml-cpp/pull/991
2020-02-21 16:36:21 -05:00
append.asciidoc
bytes.asciidoc
circle.asciidoc Geo: Switch generated GeoJson type names to camel case (#50285) (#50400) 2019-12-20 04:47:42 -10:00
common-options.asciidoc
convert.asciidoc
csv.asciidoc Add empty_value parameter to CSV processor (#51567) 2020-02-05 22:36:00 +01:00
date-index-name.asciidoc Remove type field from DocWriteRequest and associated Response objects (#47671) 2019-10-11 10:23:55 +01:00
date.asciidoc
dissect.asciidoc [DOCS] Add anchors for Asciidoctor migration (#41648) 2019-04-30 10:19:09 -04:00
dot-expand.asciidoc
drop.asciidoc
enrich.asciidoc [DOCS] Explicitly document enrich `target_field` includes `match_field` (#49407) 2019-12-02 09:12:21 -05:00
fail.asciidoc
foreach.asciidoc
geoip.asciidoc Allow list of IPs in geoip ingest processor (#49573) 2019-12-06 21:57:06 +01:00
grok.asciidoc Docs: Fix & test more grok processor documentation (#49447) 2019-12-03 11:47:27 +01:00
gsub.asciidoc
html_strip.asciidoc Add HTML strip processor (#41888) 2019-05-09 12:59:45 +02:00
inference.asciidoc [ML] Adds feature importance to option to inference processor (#52218) 2020-02-21 16:36:21 -05:00
join.asciidoc
json.asciidoc
kv.asciidoc
lowercase.asciidoc
pipeline.asciidoc Add pipeline name to ingest metadata (#50467) 2020-01-15 16:17:05 +01:00
remove.asciidoc
rename.asciidoc
script.asciidoc Remove type field from DocWriteRequest and associated Response objects (#47671) 2019-10-11 10:23:55 +01:00
set-security-user.asciidoc Expose more authentication info to ingest pipeline (#51305) 2020-02-10 13:56:07 +11:00
set.asciidoc Remove type field from DocWriteRequest and associated Response objects (#47671) 2019-10-11 10:23:55 +01:00
sort.asciidoc
split.asciidoc Add option to split processor for preserving trailing empty fields (#48664) 2019-10-30 07:23:47 -05:00
trim.asciidoc
uppercase.asciidoc
url-decode.asciidoc
user-agent.asciidoc [DOCS] Correct required file ext for user agent ingest processor (#48688) 2019-10-30 11:10:35 -04:00