Commit Graph

6 Commits

Author SHA1 Message Date
Benjamin Trent 970f726c1f
[ML] renaming inference processor field field_mappings to new name field_map (#53433)
This renames the `inference` processor configuration field `field_mappings` to `field_map`. 

`field_mappings` is now deprecated.
2020-03-12 12:49:25 -04:00
Benjamin Trent 4e1f029b04
[ML][Inference] adds new default_field_map field to trained models (#53294)
Adds a new `default_field_map` field to trained model config objects. 

This allows the model creator to supply field map if it knows that there should be some map for inference to work directly against the training data.

The use case internally is having analytics jobs supply a field mapping for multi-field fields. This allows us to use the model "out of the box" on data where we trained on `foo.keyword` but the `_source` only references `foo`.
2020-03-11 12:23:56 -04:00
Benjamin Trent 20f54272f0
[ML] Adds feature importance to option to inference processor (#52218)
This adds machine learning model feature importance calculations to the inference processor. 

The new flag in the configuration matches the analytics parameter name: `num_top_feature_importance_values`
Example:
```
"inference": {
   "field_mappings": {},
   "model_id": "my_model",
   "inference_config": {
      "regression": {
         "num_top_feature_importance_values": 3
      }
   }
}
```

This will write to the document as follows:
```
"inference" : {
   "feature_importance" : { 
      "FlightTimeMin" : -76.90955548511226,
      "FlightDelayType" : 114.13514762158526,
      "DistanceMiles" : 13.731580450792187
   },
   "predicted_value" : 108.33165831875137,
   "model_id" : "my_model"
}
```

This is done through calculating the [SHAP values](https://arxiv.org/abs/1802.03888). 

It requires that models have populated `number_samples` for each tree node. This is not available to models that were created before 7.7. 

Additionally, if the inference config is requesting feature_importance, and not all nodes have been upgraded yet, it will not allow the pipeline to be created. This is to safe-guard in a mixed-version environment where only some ingest nodes have been upgraded.

NOTE: the algorithm is a Java port of the one laid out in ml-cpp: https://github.com/elastic/ml-cpp/blob/master/lib/maths/CTreeShapFeatureImportance.cc

usability blocked by: https://github.com/elastic/ml-cpp/pull/991
2020-02-21 16:36:21 -05:00
David Kyle 34743bcd6f
[ML] Remove stray field from inference docs (#51870)
model_info_field is not a valid option
2020-02-05 10:49:36 +00:00
István Zoltán Szabó 4e0e6e83e0
[DOCS] Fixes indentation in inference processor code snippet (#51252) 2020-01-21 16:21:17 +01:00
István Zoltán Szabó b8cae37374
[DOCS] Adds inference processor documentation (#50204)
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
2019-12-19 12:19:44 +01:00