
---
navigation_title: "{{infer-cap}} bucket"
mapped_pages:
  - https://www.elastic.co/guide/en/elasticsearch/reference/current/search-aggregations-pipeline-inference-bucket-aggregation.html
---

# {{infer-cap}} bucket aggregation [search-aggregations-pipeline-inference-bucket-aggregation]


A parent pipeline aggregation that loads a pre-trained model and performs {{infer}} on the collated result fields from the parent bucket aggregation.

To use the {{infer}} bucket aggregation, you need the same security privileges that are required to use the [get trained models API](https://www.elastic.co/docs/api/doc/elasticsearch/operation/operation-ml-get-trained-models).
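The following request sketches a role for running this aggregation. It combines the cluster privilege needed by the get trained models API with read access to the searched index. It assumes that the get trained models API requires the `monitor_ml` cluster privilege, which its API reference states at the time of writing, so verify this against your version. The role name `inference_agg_user` is hypothetical, and `kibana_sample_data_logs` is the index from the example on this page.

```console
# Hypothetical role. monitor_ml is assumed to be the cluster privilege
# required by the get trained models API; read covers the searched index.
POST _security/role/inference_agg_user
{
  "cluster": [ "monitor_ml" ],
  "indices": [
    {
      "names": [ "kibana_sample_data_logs" ],
      "privileges": [ "read" ]
    }
  ]
}
```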

## Syntax [inference-bucket-agg-syntax]

An `inference` aggregation looks like this in isolation:

```js
{
  "inference": {
    "model_id": "a_model_for_inference", <1>
    "inference_config": { <2>
      "regression_config": {
        "num_top_feature_importance_values": 2
      }
    },
    "buckets_path": {
      "avg_cost": "avg_agg", <3>
      "max_cost": "max_agg"
    }
  }
}
```

1. The unique identifier or alias for the trained model.
2. The optional inference config that overrides the model’s default settings.
3. Map the value of `avg_agg` to the model’s input field `avg_cost`.


$$$inference-bucket-params$$$

| Parameter Name | Description | Required | Default Value |
| --- | --- | --- | --- |
| `model_id` | The ID or alias for the trained model. | Required | - |
| `inference_config` | Contains the inference type and its options. There are two types: [`regression`](#inference-agg-regression-opt) and [`classification`](#inference-agg-classification-opt). | Optional | - |
| `buckets_path` | Defines the paths to the input aggregations and maps the aggregation names to the field names expected by the model. See [`buckets_path` syntax](/reference/aggregations/pipeline.md#buckets-path-syntax) for more details. | Required | - |
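Because this is a parent pipeline aggregation, the `inference` aggregation sits inside a multi-bucket aggregation, next to the metric aggregations that `buckets_path` points to. The following sketch places the syntax example above in that context. It omits `inference_config`, so the model's defaults apply. The index `my-index`, the fields `category` and `cost`, and the aggregation names `by_category` and `predicted_cost` are hypothetical. `a_model_for_inference` is assumed to be a trained model whose input fields are `avg_cost` and `max_cost`.

```console
# Hypothetical index and fields. The model is assumed to expect
# the input fields avg_cost and max_cost.
GET my-index/_search
{
  "size": 0,
  "aggs": {
    "by_category": {
      "terms": {
        "field": "category"
      },
      "aggs": {
        "avg_agg": {
          "avg": { "field": "cost" }
        },
        "max_agg": {
          "max": { "field": "cost" }
        },
        "predicted_cost": {
          "inference": {
            "model_id": "a_model_for_inference",
            "buckets_path": {
              "avg_cost": "avg_agg",
              "max_cost": "max_agg"
            }
          }
        }
      }
    }
  }
}
```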


## Configuration options for {{infer}} models [_configuration_options_for_infer_models]

The `inference_config` setting is optional and usually isn’t required, because pre-trained models come with sensible defaults. In the context of aggregations, some options can be overridden for each of the two model types.


### Configuration options for {{regression}} models [inference-agg-regression-opt]

`num_top_feature_importance_values`
:   (Optional, integer) Specifies the maximum number of [{{feat-imp}}](docs-content://explore-analyze/machine-learning/data-frame-analytics/ml-feature-importance.md) values per document. Defaults to 0, which means no {{feat-imp}} calculation occurs.


### Configuration options for {{classification}} models [inference-agg-classification-opt]

`num_top_classes`
:   (Optional, integer) Specifies the number of top class predictions to return. Defaults to 0.

`num_top_feature_importance_values`
:   (Optional, integer) Specifies the maximum number of [{{feat-imp}}](docs-content://explore-analyze/machine-learning/data-frame-analytics/ml-feature-importance.md) values per document. Defaults to 0, which means no {{feat-imp}} calculation occurs.

`prediction_field_type`
:   (Optional, string) Specifies the type of the predicted field to write. Valid values are `string`, `number`, and `boolean`. When `boolean` is specified, `1.0` is transformed to `true` and `0.0` to `false`.


## Example [inference-bucket-agg-example]

The following snippet aggregates a web log by client IP address and extracts a number of features through metric and bucket sub-aggregations. These features are the input to the {{infer}} aggregation, which is configured with a model trained to identify suspicious client IPs:

```console
GET kibana_sample_data_logs/_search
{
  "size": 0,
  "aggs": {
    "client_ip": { <1>
      "composite": {
        "sources": [
          {
            "client_ip": {
              "terms": {
                "field": "clientip"
              }
            }
          }
        ]
      },
      "aggs": { <2>
        "url_dc": {
          "cardinality": {
            "field": "url.keyword"
          }
        },
        "bytes_sum": {
          "sum": {
            "field": "bytes"
          }
        },
        "geo_src_dc": {
          "cardinality": {
            "field": "geo.src"
          }
        },
        "geo_dest_dc": {
          "cardinality": {
            "field": "geo.dest"
          }
        },
        "responses_total": {
          "value_count": {
            "field": "timestamp"
          }
        },
        "success": {
          "filter": {
            "term": {
              "response": "200"
            }
          }
        },
        "error404": {
          "filter": {
            "term": {
              "response": "404"
            }
          }
        },
        "error503": {
          "filter": {
            "term": {
              "response": "503"
            }
          }
        },
        "malicious_client_ip": { <3>
          "inference": {
            "model_id": "malicious_clients_model",
            "buckets_path": {
              "response_count": "responses_total",
              "url_dc": "url_dc",
              "bytes_sum": "bytes_sum",
              "geo_src_dc": "geo_src_dc",
              "geo_dest_dc": "geo_dest_dc",
              "success": "success._count",
              "error404": "error404._count",
              "error503": "error503._count"
            }
          }
        }
      }
    }
  }
}
```

1. A composite bucket aggregation that groups the data by the `clientip` field.
2. A series of metric and bucket sub-aggregations.
3. {{infer-cap}} bucket aggregation that specifies the trained model and maps the aggregation names to the model’s input fields.
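The keys in the `buckets_path` of `malicious_client_ip` (`response_count`, `url_dc`, `bytes_sum`, and so on) should match the input field names that the model expects. You can check this by retrieving the model configuration with the get trained models API and comparing its `input.field_names` list with those keys. This assumes that a model with the ID `malicious_clients_model` already exists in your cluster. That ID comes from the example above and is not a model that ships with Elasticsearch.

```console
# Assumes a trained model with this ID already exists in the cluster.
# Compare input.field_names in the response with the buckets_path keys.
GET _ml/trained_models/malicious_clients_model
```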