Commit Graph

219 Commits

Author SHA1 Message Date
bellengao 8ffe5d1f94
Support array for all string ingest processors 2020-03-17 15:22:30 -05:00
Benjamin Trent 970f726c1f
[ML] renaming inference processor field field_mappings to new name field_map (#53433)
This renames the `inference` processor configuration field `field_mappings` to `field_map`. 

`field_mappings` is now deprecated.
2020-03-12 12:49:25 -04:00
James Rodewig bc7643c65b
[DOCS] Reduce content reuse in enrich docs (#53460)
Restructures the 'Update an enrich policy' section to:

* Migrate the content to the section. It was previously stored in the
  Put Enrich Policy API docs.
* Remove the warning tag admonition from the section content.
* Replace a reused section earlier in the "Set up an enrich processor"
  page with a link.

No substantive changes were made to the content.
2020-03-12 05:40:57 -04:00
Benjamin Trent 4e1f029b04
[ML][Inference] adds new default_field_map field to trained models (#53294)
Adds a new `default_field_map` field to trained model config objects. 

This allows the model creator to supply a field map if it knows that some mapping is needed for inference to work directly against the training data.

The use case internally is having analytics jobs supply a field mapping for multi-field fields. This allows us to use the model "out of the box" on data where we trained on `foo.keyword` but the `_source` only references `foo`.
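
A rough Python sketch of the `foo` / `foo.keyword` case above, not the actual implementation. The helper name `apply_field_map` and the direction of the map (document field name to model field name) are assumptions made for illustration.

```python
def apply_field_map(source, field_map):
    """Return a copy of `source` with mapped values added under the model's field names.

    Assumption: keys of `field_map` are document field names and values are
    the field names the model was trained on.
    """
    fields = dict(source)  # do not mutate the caller's document
    for doc_field, model_field in field_map.items():
        if doc_field in fields:
            fields[model_field] = fields[doc_field]
    return fields


# Hypothetical default map: the model was trained on `foo.keyword`,
# but `_source` only contains `foo`.
default_field_map = {"foo": "foo.keyword"}
doc = {"foo": "bar", "count": 3}
model_input = apply_field_map(doc, default_field_map)
```

With a map like this, the model can find `foo.keyword` even though the incoming document only carries `foo`.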
2020-03-11 12:23:56 -04:00
Orhan Toy bce4a3bd4b
[DOCS] Fix formatting of simulate ingest pipeline API docs (#52754)
Wraps request routes for the simulate ingest pipelines in the API docs. This ensures the routes display in monospace.
2020-03-02 11:41:22 -05:00
David Pilato e51b8a51aa
[DOCS] Fix typo in CSV processor docs (#52649)
Corrects an example array in a snippet of the CSV processor docs.
2020-02-25 08:47:58 -05:00
bellengao 21061f7479
[DOCS] Fix typo in ingest node docs (#52671) 2020-02-25 07:51:02 -05:00
Benjamin Trent 20f54272f0
[ML] Adds feature importance option to inference processor (#52218)
This adds machine learning model feature importance calculations to the inference processor. 

The new flag in the configuration matches the analytics parameter name: `num_top_feature_importance_values`
Example:
```
"inference": {
   "field_mappings": {},
   "model_id": "my_model",
   "inference_config": {
      "regression": {
         "num_top_feature_importance_values": 3
      }
   }
}
```

This will write to the document as follows:
```
"inference" : {
   "feature_importance" : { 
      "FlightTimeMin" : -76.90955548511226,
      "FlightDelayType" : 114.13514762158526,
      "DistanceMiles" : 13.731580450792187
   },
   "predicted_value" : 108.33165831875137,
   "model_id" : "my_model"
}
```
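
As a minimal sketch of how a consumer might read this output, the Python below ranks the features in the example document by the magnitude of their importance. The helper `rank_features` is hypothetical and is not part of the processor.

```python
# The example result written to the document, copied from above.
inference = {
    "feature_importance": {
        "FlightTimeMin": -76.90955548511226,
        "FlightDelayType": 114.13514762158526,
        "DistanceMiles": 13.731580450792187,
    },
    "predicted_value": 108.33165831875137,
    "model_id": "my_model",
}


def rank_features(result):
    """Return feature names ordered by absolute importance, largest first."""
    importance = result.get("feature_importance", {})
    return sorted(importance, key=lambda name: abs(importance[name]), reverse=True)


ranked = rank_features(inference)
```

Sorting by absolute value keeps strongly negative contributions, such as `FlightTimeMin`, near the top.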

This is done through calculating the [SHAP values](https://arxiv.org/abs/1802.03888). 

It requires that models have populated `number_samples` for each tree node. This is not available to models that were created before 7.7. 

Additionally, if the inference config requests feature_importance and not all nodes have been upgraded yet, the pipeline cannot be created. This is a safeguard for mixed-version environments where only some ingest nodes have been upgraded.

NOTE: the algorithm is a Java port of the one laid out in ml-cpp: https://github.com/elastic/ml-cpp/blob/master/lib/maths/CTreeShapFeatureImportance.cc

usability blocked by: https://github.com/elastic/ml-cpp/pull/991
2020-02-21 16:36:21 -05:00
Russ Cam 94f6f946ef
Specify name on enrich.get_policy as list type (#50217)
This commit updates the enrich.get_policy API to specify name
as a list, in line with other URL parts that accept a comma-separated
list of values. 

In addition, update the get enrich policy API docs
to align the URL part name in the documentation with
the name used in the REST API specs.
2020-02-20 12:33:06 +11:00
Yang Wang 5c9f79534f
Expose more authentication info to ingest pipeline (#51305)
The changes add more granularity for identifying the data ingestion user.
The ingest pipeline can now be configured to record authentication realm and
type. It can also record API key name and ID when one is in use. 
This improves traceability when data are being ingested from multiple agents
and will become more relevant with the incoming support of required
pipelines (#46847)

Resolves: #49106
2020-02-10 13:56:07 +11:00
Przemko Robakowski 5560135542
Add empty_value parameter to CSV processor (#51567)
* Add empty_value parameter to CSV processor

This change adds the `empty_value` parameter to the CSV processor.
This value is used to fill empty fields. Fields will be skipped
if this parameter is omitted. This behavior is the same for both
quoted and unquoted fields.

* docs updated

* Fix compilation problem
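
The `empty_value` behavior described above can be modeled with a short Python sketch. It uses Python's standard `csv` module rather than the processor's own parser, and the function `parse_csv_line` and its sentinel are hypothetical names for illustration only.

```python
import csv

_UNSET = object()  # sentinel meaning "empty_value was not configured"


def parse_csv_line(line, target_fields, empty_value=_UNSET):
    """Map one CSV line onto `target_fields`, modeling the empty_value option.

    Empty fields (quoted or unquoted) are skipped when `empty_value` is not
    given, and filled with `empty_value` otherwise.
    """
    values = next(csv.reader([line]), [])
    doc = {}
    for name, value in zip(target_fields, values):
        if value == "":
            if empty_value is _UNSET:
                continue  # parameter omitted: skip the field
            value = empty_value
        doc[name] = value
    return doc


fields = ["f1", "f2", "f3", "f4"]
skipped = parse_csv_line('a,,"",d', fields)
filled = parse_csv_line('a,,"",d', fields, empty_value="N/A")
```

Here `skipped` contains only `f1` and `f4`, while `filled` contains all four fields.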

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2020-02-05 22:36:00 +01:00
David Kyle 34743bcd6f
[ML] Remove stray field from inference docs (#51870)
model_info_field is not a valid option
2020-02-05 10:49:36 +00:00
Florian Kelbert bd52041f92
[DOCS] Remove unneeded comma from CSV processor example (#51859) 2020-02-04 09:23:43 -05:00
István Zoltán Szabó 4e0e6e83e0
[DOCS] Fixes indentation in inference processor code snippet (#51252) 2020-01-21 16:21:17 +01:00
Martijn van Groningen 2b2935fd52
Add pipeline name to ingest metadata (#50467)
This commit adds the name of the current pipeline to ingest metadata.
This pipeline name is accessible under the following key: '_ingest.pipeline'.

Example usage in pipeline:
PUT /_ingest/pipeline/2
{
    "processors": [
        {
            "set": {
                "field": "pipeline_name",
                "value": "{{_ingest.pipeline}}"
            }
        }
    ]
}

Closes #42106
2020-01-15 16:17:05 +01:00
Igor Motov 7f81467378
Geo: Switch generated GeoJson type names to camel case (#50285) (#50400)
Switches generated GeoJson type names to camel case
to conform to the standard.

Closes #49568
2019-12-20 04:47:42 -10:00
István Zoltán Szabó b8cae37374
[DOCS] Adds inference processor documentation (#50204)
Co-Authored-By: Lisa Cawley <lcawley@elastic.co>
2019-12-19 12:19:44 +01:00
Igor Motov a26e4d1e5e
Geo: Switch generated WKT to upper case (#50285)
Switches generated WKT to upper case to
conform to the standard recommendation.

Relates #49568
2019-12-18 07:28:56 -10:00
Przemko Robakowski 64e1a774fc
CSV ingest processor (#49509)
* CSV Processor for Ingest

This change adds a new ingest processor that breaks a line from a CSV file into separate fields.
By default it conforms to RFC 4180 but can be tweaked.

Closes #49113
2019-12-11 14:52:04 +01:00
Przemko Robakowski c57032f622
Allow list of IPs in geoip ingest processor (#49573)
* Allow list of IPs in geoip ingest processor

This change lets you use an array of IPs, in addition to a string, in the geoip processor's source field.
It sets an array containing geoip data for each element in the source, unless the `first_only`
option is enabled, in which case only the first match found is returned.
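
A rough Python model of that behavior, under stated assumptions: the lookup table `database` stands in for the real geoip database, its entries are made up, and representing a miss as `None` inside the array is an assumption rather than something this commit message states.

```python
def geoip_lookup(source, database, first_only=False):
    """Model geoip handling of a single IP string or a list of IPs.

    `database` is a hypothetical dict from IP string to geo data.
    """
    if isinstance(source, str):
        return database.get(source)
    # One result per element in the source list (None for a miss: assumption).
    results = [database.get(ip) for ip in source]
    if first_only:
        # Only the first match found is returned.
        return next((r for r in results if r is not None), None)
    return results


# Made-up data keyed by documentation-range addresses.
database = {
    "192.0.2.1": {"country_iso_code": "AA"},
    "198.51.100.7": {"country_iso_code": "BB"},
}
ips = ["203.0.113.9", "192.0.2.1", "198.51.100.7"]

all_matches = geoip_lookup(ips, database)
first_match = geoip_lookup(ips, database, first_only=True)
```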

Closes #46193
2019-12-06 21:57:06 +01:00
Alexander Reelsen 062f9f03bf
Docs: Fix & test more grok processor documentation (#49447)
The documentation contained a small error, as bytes and duration were not
properly converted to numbers and thus remained strings.

The documentation is now also properly tested by providing a full-blown
simulate pipeline example.
2019-12-03 11:47:27 +01:00
James Rodewig 37baa50815
[DOCS] Explicitly document enrich `target_field` includes `match_field` (#49407)
When the enrich processor appends enrich data to an incoming document,
it adds a `target_field` to contain the enrich data.

This `target_field` contains both the `match_field` AND `enrich_fields`
specified in the enrich policy.

Previously, this was reflected in the documented example but not
explicitly stated. This adds several explicit statements to the docs.
2019-12-02 09:12:21 -05:00
Martijn van Groningen 88aea2107d
Add templating support to pipeline processor. (#49030)
This commit adds templating support to the pipeline processor's `name` option.

Closes #39955
2019-11-27 13:45:11 +01:00
Martijn van Groningen 4013e814e8
Add templating support to enrich processor (#49093)
Adds support for templating to `field` and `target_field` options.
2019-11-27 07:52:42 +01:00
Martijn van Groningen 2ba00c8149
Introduce on_failure_pipeline ingest metadata inside on_failure block (#49076)
If an exception occurs inside a pipeline processor,
the pipeline stack is kept around as a header in the exception.
Then, in the on_failure processor, the id of the pipeline in which the
exception occurred is made accessible via the `on_failure_pipeline`
ingest metadata.

Closes #44920
2019-11-26 14:49:51 +01:00
Lisa Cawley 9cc247d929
[DOCS] Fixes security links (#49563) 2019-11-25 12:59:59 -08:00
James Rodewig 71ca343874
[DOCS] Clean up example pipeline and enrich policy in docs snippets (#49341) 2019-11-19 17:02:58 -05:00
James Rodewig c9e9685bfd
[DOCS] Add high-level docs for enrich processor and policies (#49194)
* [DOCS] Add high-level docs for enrich policies

* fix typos

* fix typo

* add warning for enrich policy changes

* add addtl cross-links to execute API docs

* Reword match and geo_match policy example headings
2019-11-19 13:56:51 -05:00
James Rodewig 4ccd3a2b3f
[DOCS] Correct required file ext for user agent ingest processor (#48688)
For the user agent ingest processor, custom regex files must end
with the `.yml` file extension.

This corrects the docs which said the `.yaml` extension was required.
2019-10-30 11:10:35 -04:00
Dan Hermann fcc18dc19b
Add option to split processor for preserving trailing empty fields (#48664) 2019-10-30 07:23:47 -05:00
Shaunak Kashyap 93ecb9b7ab
[DOCS] Remove extraneous comma in Enrich Stats API's JSON response (#48539) 2019-10-25 11:35:13 -05:00
James Rodewig 25d3add88a
[DOCS] Remove duplicate links for ingest processor overview (#48394) 2019-10-23 10:54:53 -05:00
Martijn van Groningen 1ef8dc4030
Also validate source index at put enrich policy time. (#48254)
This changes tests to create a valid
source index prior to creating the enrich policy.
2019-10-21 19:34:57 +02:00
Alexander Reelsen fd65eec64c
update ingest-user-agent regexes.yml (#47807)
These new regexes are from:
154eba17f5/regexes.yaml
2019-10-18 16:14:44 +02:00
James Rodewig 17610e740a
[DOCS] Add `wait_for_completion` parm to execute enrich policy API docs (#48077) 2019-10-15 13:46:55 -04:00
Martijn van Groningen ddf3bc25d8
Change how `max_matches` affects `target_field` option. (#47982)
Prior to this change the `target_field` would always be a json array
field in the document being ingested. This was to take into account that
multiple enrich documents could be inserted into the `target_field`.

However, the default `max_matches` is `1`, meaning that by default
only a single enrich document would be added to the `target_field` json
array field.

This commit changes this; if `max_matches` is set to `1` then the single
document would be added as a json object to the `target_field` and
if it is configured to a higher value then the enrich documents will be
added as a json array (even if a single enrich document happens to be
enriched).
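
The new shape rule can be sketched in Python as follows. The function `build_target_field` is a hypothetical model of the behavior described above, not the processor's code, and returning `None` when nothing matches is an assumption.

```python
def build_target_field(matches, max_matches=1):
    """Model the value written to `target_field` for a list of enrich matches.

    max_matches == 1 -> a single json object
    max_matches > 1  -> a json array, even when only one document matched
    """
    limited = matches[:max_matches]
    if max_matches == 1:
        return limited[0] if limited else None  # None for no match: assumption
    return limited


one_match = [{"city": "A"}]
three_matches = [{"city": "A"}, {"city": "B"}, {"city": "C"}]

as_object = build_target_field(one_match)                # default max_matches
as_array = build_target_field(one_match, max_matches=2)  # still an array
capped = build_target_field(three_matches, max_matches=2)
```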
2019-10-14 21:04:47 +02:00
Martijn van Groningen e06598ba56
Merge remote-tracking branch 'es/master' into enrich 2019-10-14 10:17:18 +02:00
Alan Woodward 566e1b7d33
Remove type field from DocWriteRequest and associated Response objects (#47671)
This commit removes the type field from index, update and delete requests, and their
associated responses.

Relates to #41059
2019-10-11 10:23:55 +01:00
James Rodewig 17eef81f83
[DOCS] Add docs for `geo_match` enrich policy type (#47745) 2019-10-09 08:39:11 -04:00
Tal Levy 4d3f6816a7
Merge remote-tracking branch 'elastic/master' into enrich 2019-10-04 13:30:57 -07:00
James Rodewig 6ef5300e13
[DOCS] Reformat simulate pipeline API (#47301) 2019-10-01 14:29:05 -04:00
James Rodewig e2b9c1b764
[DOCS] Reformat put pipeline API (#47171) 2019-10-01 14:19:26 -04:00
James Rodewig 4ebb44ffaf
[DOCS] Reformat delete pipeline API (#47172) 2019-09-30 09:44:41 -04:00
Martijn van Groningen a23c7af811
Add config namespace in get policy api response (#47162)
Currently the policy config is placed directly in each json object
of the top-level `policies` array field. For example:

```
{
    "policies": [
        {
            "match": {
                "name" : "my-policy",
                "indices" : ["users"],
                "match_field" : "email",
                "enrich_fields" : [
                    "first_name",
                    "last_name",
                    "city",
                    "zip",
                    "state"
                ]
            }
        }
    ]
}
```

This change adds a `config` field in each policy json object:

```
{
    "policies": [
        {
            "config": {
                "match": {
                    "name" : "my-policy",
                    "indices" : ["users"],
                    "match_field" : "email",
                    "enrich_fields" : [
                        "first_name",
                        "last_name",
                        "city",
                        "zip",
                        "state"
                    ]
                }
            }
        }
    ]
}
```

This allows us in the future to add other information about policies
in the get policy api response.
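
As a minimal sketch of what this means for a client, the Python below reads policy types and names from the new response shape shown above. The helper `policy_names` is hypothetical. Because it only looks under `config`, sibling keys added to a policy object later would not affect it.

```python
# The new response shape, copied from the example above.
response = {
    "policies": [
        {
            "config": {
                "match": {
                    "name": "my-policy",
                    "indices": ["users"],
                    "match_field": "email",
                    "enrich_fields": ["first_name", "last_name", "city", "zip", "state"],
                }
            }
        }
    ]
}


def policy_names(get_policy_response):
    """Return (policy_type, name) pairs from a get policy response."""
    pairs = []
    for policy in get_policy_response["policies"]:
        # The policy definition now lives under the `config` key.
        for policy_type, settings in policy["config"].items():
            pairs.append((policy_type, settings["name"]))
    return pairs


names = policy_names(response)
```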

The UI will consume this API to build an overview of all policies.
The UI may in the future include additional information about a policy
and the plan is to include that in the get policy api, so that this
information can be gathered in a single api call.

An example of the information that is likely to be added is:
* Last policy execution time
* The status of a policy (executing, executed, unexecuted)
* Information about the last failure, if one exists
2019-09-30 14:36:53 +02:00
Martijn van Groningen f676d9730d
Merge remote-tracking branch 'es/master' into enrich 2019-09-27 13:51:17 +02:00
James Rodewig 223110491b
[DOCS] Reformat get pipeline API (#47131) 2019-09-26 08:26:01 -04:00
Alan Woodward c1f99e2d75
Remove `_type` from SearchHit (#46942)
This commit removes the `_type` field from all search hit responses.

Relates to #41059
2019-09-23 19:14:54 +01:00
James Rodewig 2d77751716
[DOCS] Minor editorial changes to enrich docs 2019-09-23 13:23:57 -04:00
Martijn van Groningen 1118da0199
fixed tests 2019-09-23 11:08:58 +02:00
Martijn van Groningen afc16ba518
Merge remote-tracking branch 'es/master' into enrich 2019-09-23 09:34:53 +02:00