447 Commits

Author SHA1 Message Date
Joe Gallo dd32cb6439
Document new ip_location processor (#116623) 2024-11-11 19:55:57 -05:00
Joe Gallo 2302cdbe45
Document new ip_location APIs (#116611) 2024-11-11 13:52:47 -05:00
Joe Gallo b517abcb07
Document new ip geolocation fields (#116603) 2024-11-11 11:13:56 -05:00
Giorgos Bamparopoulos 9ad09b6ee0
Fix a typo in the example for using pre-existing pipeline definitions (#116084) 2024-11-04 16:06:16 +01:00
István Zoltán Szabó 9394e88c0f
[DOCS] Updates inference processor docs. (#115566) 2024-10-25 10:18:01 +02:00
Keith Massey 2ff6bb0543
Adding support for additional mapping to simulate ingest API (#114742) 2024-10-21 17:08:50 -05:00
Quentin Pradet fc23f2f1c6
[DOCS] Fix User agent processor properties (#112518) 2024-10-15 17:35:26 +04:00
Pete Gillin c8c6f5af53
Actually add `terminate` docs page (#114440)
A docs page for the `terminate` processor was added in
https://github.com/elastic/elasticsearch/pull/114157, but the change
to include it in the outer processor reference page was omitted. This
change corrects that oversight.
2024-10-10 08:34:43 +01:00
Keith Massey fb482f863d
Adding index_template_substitutions to the simulate ingest API (#114128)
This adds support for a new `index_template_substitutions` field to the
body of an ingest simulate API request. These substitutions can be used
to change the pipeline(s) used for ingest, or to change the mappings
used for validation. It is similar to the
`component_template_substitutions` added in #113276. Here is an example
that shows both of those usages working together:

```
## First, add a couple of pipelines that set a field to a boolean:
PUT /_ingest/pipeline/foo-pipeline?pretty
{
  "processors": [
    {
      "set": {
        "field": "foo",
        "value": true
      }
    }
  ]
}

PUT /_ingest/pipeline/bar-pipeline?pretty
{
  "processors": [
    {
      "set": {
        "field": "bar",
        "value": true
      }
    }
  ]
}

## Now, create three component templates. One provides a mapping that enforces that the only field is "foo"
## and that field is a keyword. The next is similar, but adds a `bar` field. The final one provides a setting
## that makes "foo-pipeline" the default pipeline.
## Remember that the "foo-pipeline" sets the "foo" field to a boolean, so using both of these templates
## together would cause a validation exception. These could be in the same template, but are provided
## separately just so that later we can show how multiple templates can be overridden.
PUT _component_template/mappings_template
{
  "template": {
    "mappings": {
      "dynamic": "strict",
      "properties": {
        "foo": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT _component_template/mappings_template_with_bar
{
  "template": {
    "mappings": {
      "dynamic": "strict",
      "properties": {
        "foo": {
          "type": "keyword"
        },
        "bar": {
          "type": "boolean"
        }
      }
    }
  }
}

PUT _component_template/settings_template
{
  "template": {
    "settings": {
      "index": {
        "default_pipeline": "foo-pipeline"
      }
    }
  }
}

## Here we create an index template pulling in two of the component templates above
PUT _index_template/template_1
{
  "index_patterns": ["foo*"],
  "composed_of": ["mappings_template", "settings_template"]
}

## We can index a document here to create the index, or not. Either way the simulate call ought to work the same
POST foo-1/_doc
{
  "foo": "FOO"
}

## This will not blow up with validation exceptions because the `template_1` given in
## "index_template_substitutions" uses `mappings_template_with_bar`, which adds the bar field.
## And the bar-pipeline is executed rather than the foo-pipeline because that substituted template
## picks up the `settings_template` replaced in "component_template_substitutions", so the value
## of "foo" does not get set to an invalid type.
POST _ingest/_simulate?pretty&index=foo-1
{
  "docs": [
    {
      "_id": "asdf",
      "_source": {
        "foo": "foo",
        "bar": "bar"
      }
    }
  ],
  "component_template_substitutions": {
    "settings_template": {
      "template": {
        "settings": {
          "index": {
            "default_pipeline": "bar-pipeline"
          }
        }
      }
    }
  },
  "index_template_substitutions": {
    "template_1": {
      "index_patterns": ["foo*"],
      "composed_of": ["mappings_template_with_bar", "settings_template"]
    }
  }
}
```
2024-10-09 10:15:37 +11:00
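How the two kinds of substitution in the commit above (#114128) combine can be modelled in a few lines of Python. This is only a rough sketch under simplifying assumptions, not Elasticsearch code: templates are flattened to plain dicts, `resolve` is a hypothetical helper, and later entries in `composed_of` are assumed to override earlier ones.

```python
from typing import Any, Dict, List

# Flattened stand-ins for the stored templates from the example above.
COMPONENT_TEMPLATES: Dict[str, Dict[str, Any]] = {
    "mappings_template": {"properties": {"foo": "keyword"}},
    "mappings_template_with_bar": {"properties": {"foo": "keyword", "bar": "boolean"}},
    "settings_template": {"default_pipeline": "foo-pipeline"},
}
INDEX_TEMPLATES: Dict[str, List[str]] = {
    "template_1": ["mappings_template", "settings_template"],
}


def resolve(index_template: str,
            component_subs: Dict[str, Dict[str, Any]],
            index_subs: Dict[str, List[str]]) -> Dict[str, Any]:
    """Hypothetical helper: overlay substitutions, then compose the template."""
    components = {**COMPONENT_TEMPLATES, **component_subs}
    indexes = {**INDEX_TEMPLATES, **index_subs}
    merged: Dict[str, Any] = {"properties": {}, "default_pipeline": None}
    for name in indexes[index_template]:
        part = components[name]
        merged["properties"].update(part.get("properties", {}))
        if "default_pipeline" in part:
            merged["default_pipeline"] = part["default_pipeline"]
    return merged


stored = resolve("template_1", {}, {})
simulated = resolve(
    "template_1",
    {"settings_template": {"default_pipeline": "bar-pipeline"}},
    {"template_1": ["mappings_template_with_bar", "settings_template"]},
)
print(stored)
print(simulated)
```

In this model the stored templates are never modified; the substitutions only apply to the one simulated resolution.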
Pete Gillin 43e5258b3c
Add a `terminate` ingest processor (#114157)
This processor simply causes any remaining processors in the pipeline
to be skipped. It will normally be executed conditionally using the
`if` option. (If this pipeline is being called from another pipeline,
the calling pipeline is *not* terminated.)

For example, this:

```
POST /_ingest/pipeline/_simulate
{
  "pipeline":
  {
    "description": "Appends just 'before' to the steps field if the error field is present, or both 'before' and 'after' if not",
    "processors": [
      {
        "append": {
          "field": "steps",
          "value": "before"
        }
      },
      {
        "terminate": {
          "if": "ctx.error != null"
        }
      },
      {
        "append": {
          "field": "steps",
          "value": "after"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "doc1",
      "_source": {
        "name": "okay",
        "steps": []
      }
    },
    {
      "_index": "index",
      "_id": "doc2",
      "_source": {
        "name": "bad",
        "error": "oh no",
        "steps": []
      }
    }
  ]
}
```

returns something like this:

```
{
  "docs": [
    {
      "doc": {
        "_index": "index",
        "_version": "-3",
        "_id": "doc1",
        "_source": {
          "name": "okay",
          "steps": [
            "before",
            "after"
          ]
        },
        "_ingest": {
          "timestamp": "2024-10-04T16:25:20.448881Z"
        }
      }
    },
    {
      "doc": {
        "_index": "index",
        "_version": "-3",
        "_id": "doc2",
        "_source": {
          "name": "bad",
          "error": "oh no",
          "steps": [
            "before"
          ]
        },
        "_ingest": {
          "timestamp": "2024-10-04T16:25:20.448932Z"
        }
      }
    }
  ]
}
```
2024-10-08 17:39:53 +01:00
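The skip-the-rest behaviour described in the commit above (#114157) can be modelled in plain Python. This is a minimal sketch, not the processor's implementation: `append`, `terminate`, and `run_pipeline` are hypothetical helpers, and the condition is a Python callable standing in for the `if` script `ctx.error != null`.

```python
from typing import Any, Callable, Dict, List, Optional

Doc = Dict[str, Any]
# A processor mutates the document and returns True to stop the pipeline.
Processor = Callable[[Doc], bool]


def append(field: str, value: Any) -> Processor:
    """Hypothetical stand-in for the `append` processor."""
    def run(doc: Doc) -> bool:
        doc.setdefault(field, []).append(value)
        return False  # append never terminates the pipeline
    return run


def terminate(condition: Optional[Callable[[Doc], bool]] = None) -> Processor:
    """Hypothetical stand-in for `terminate`; `condition` plays the role of `if`."""
    def run(doc: Doc) -> bool:
        return condition is None or condition(doc)
    return run


def run_pipeline(processors: List[Processor], doc: Doc) -> Doc:
    for processor in processors:
        if processor(doc):
            break  # remaining processors in this pipeline are skipped
    return doc


pipeline = [
    append("steps", "before"),
    terminate(lambda doc: doc.get("error") is not None),
    append("steps", "after"),
]

ok = run_pipeline(pipeline, {"name": "okay", "steps": []})
bad = run_pipeline(pipeline, {"name": "bad", "error": "oh no", "steps": []})
print(ok["steps"])   # ['before', 'after']
print(bad["steps"])  # ['before']
```

Because `run_pipeline` simply returns, a caller that invoked it as a nested pipeline would carry on with its own processors, which mirrors the note that the calling pipeline is not terminated.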
István Zoltán Szabó 57955cb8d4
[DOCS] Adds DeBERTA v2 to the tokenizers list in API docs (#112752)
Co-authored-by: Max Hniebergall <137079448+maxhniebergall@users.noreply.github.com>
2024-10-07 10:23:46 +02:00
Liam Thompson 6e400c12a7
[DOCS] Port connector docs from Enterprise Search guide (#112953) 2024-09-30 10:22:37 +02:00
Sam Xiao 6917f1679a
Tag redacted document in ingest pipeline (#113552)
Adds a new option, `trace_redact`, to the redact processor to indicate that a document has been redacted in the ingest pipeline. If a document is processed by a redact processor and any field is redacted, the ingest metadata field `_ingest._redact._is_redacted` is set to `true`.

Closes #94633
2024-09-27 12:24:24 -04:00
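The tagging rule from the commit above (#113552) can be sketched as follows. This is an illustrative model only: the real redact processor works with Grok patterns, whereas this sketch assumes plain regular expressions, and the `redact` function and `EMAIL` pattern are hypothetical names.

```python
import re
from typing import Any, Dict


def redact(doc: Dict[str, Any], field: str, patterns: Dict[str, str],
           trace_redact: bool = False) -> Dict[str, Any]:
    """Replace every pattern match in `field` with <NAME>; optionally tag the doc."""
    source = doc["_source"]
    original = source[field]
    redacted = original
    for name, regex in patterns.items():
        redacted = re.sub(regex, f"<{name}>", redacted)
    source[field] = redacted
    # The flag is only set when tracing is on AND something was actually redacted.
    if trace_redact and redacted != original:
        doc.setdefault("_ingest", {}).setdefault("_redact", {})["_is_redacted"] = True
    return doc


EMAIL = {"EMAIL": r"[\w.+-]+@[\w-]+\.[\w.-]+"}  # simplified, assumed pattern

hit = redact({"_source": {"message": "contact jane@example.com"}},
             "message", EMAIL, trace_redact=True)
miss = redact({"_source": {"message": "nothing sensitive"}},
              "message", EMAIL, trace_redact=True)
print(hit["_source"]["message"])  # contact <EMAIL>
print(hit["_ingest"])             # {'_redact': {'_is_redacted': True}}
print("_ingest" in miss)          # False
```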
kosabogi 6e73c1423b
Adds text_similarity task type to inference processor documentation (#113517) 2024-09-26 16:12:28 +02:00
Keith Massey cd950bb2fa
Adding component template substitutions to the simulate ingest API (#113276) 2024-09-25 15:30:22 -05:00
Stef Nestor e6b15f4bf7
(Doc+) Inference Pipeline ignores Mapping Analyzers (#112522)
* (Doc+) Inference Pipeline ignores Mapping Analyzers

Based on internal Dev feedback (will cross-link after), this documents that inference processors within ingest pipelines run before mapping analyzers, effectively ignoring them. If users want an analyzer to take effect, they need to select the ingest pipeline processor equivalent of that analyzer and run it earlier in the pipeline than the inference processor.

---------

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
2024-09-11 16:05:15 -06:00
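The ordering advice in the commit above (#112522) could look like the following pipeline body, built here as a Python dict. It is a sketch under assumptions: the `lowercase` processor is used as a stand-in for whichever analyzer step is needed, and the model ID and field names are hypothetical.

```python
import json

pipeline = {
    "description": "Normalise text before it reaches the inference processor",
    "processors": [
        # Runs first, so the model sees the lowercased text.
        {"lowercase": {"field": "content"}},
        {
            "inference": {
                "model_id": "my-model",  # hypothetical model ID
                "input_output": {
                    "input_field": "content",
                    "output_field": "content_embedding",
                },
            }
        },
    ],
}

# Each processor entry has exactly one key: the processor type.
order = [next(iter(processor)) for processor in pipeline["processors"]]
print(order)
print(json.dumps(pipeline, indent=2))
```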
Keith Massey 4aa3c3d7ee
Add support for templates when validating mappings in the simulate ingest API (#111161) 2024-09-05 09:25:53 -05:00
Panos Koutsovasilis 29453cb2ce
fix: support all allowed protocol numbers (#111528)
* fix(CommunityIdProcessor): support all allowed protocol numbers

* fix(CommunityIdProcessor): update documentation
2024-08-26 08:37:40 +03:00
Niels Bauman e0c1ccbc1e
Make enrich cache based on memory usage (#111412)
The max enrich cache size setting now also supports an absolute max size in bytes (of used heap space) and a percentage of the max heap space, next to the existing flat document count. The default is 1% of the max heap space.

This should prevent issues where the enrich cache takes up a lot of memory when there are large documents in the cache.
2024-08-23 09:26:55 +02:00
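The three accepted forms of the cache size setting described in the commit above (#111412) can be illustrated with a small parser. This is a hedged sketch, not Elasticsearch's own parsing code: `resolve_cache_limit` is a hypothetical helper, and the unit suffixes it accepts are an assumption.

```python
import re
from typing import Tuple

UNITS = {"b": 1, "kb": 1024, "mb": 1024 ** 2, "gb": 1024 ** 3}  # assumed suffixes


def resolve_cache_limit(setting: str, max_heap_bytes: int) -> Tuple[str, int]:
    """Return ("bytes", n) or ("documents", n) for a cache size setting value."""
    value = setting.strip().lower()
    if value.endswith("%"):
        # Percentage of the maximum heap space.
        return "bytes", int(max_heap_bytes * float(value[:-1]) / 100)
    match = re.fullmatch(r"(\d+)(b|kb|mb|gb)", value)
    if match:
        # Absolute size in bytes.
        return "bytes", int(match.group(1)) * UNITS[match.group(2)]
    # Anything else is treated as the existing flat document count.
    return "documents", int(value)


heap = 1000 * 1024 ** 2  # pretend the max heap is 1000 MiB
print(resolve_cache_limit("1%", heap))     # ('bytes', 10485760)
print(resolve_cache_limit("100mb", heap))  # ('bytes', 104857600)
print(resolve_cache_limit("1000", heap))   # ('documents', 1000)
```

The `"1%"` case corresponds to the default mentioned in the commit message.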
István Zoltán Szabó 1ba72e4602
[DOCS] Documents output_field behavior after multiple inference runs (#111875)
Co-authored-by: David Kyle <david.kyle@elastic.co>
2024-08-15 12:36:59 +02:00
Keith Massey c6a7537df7
Ingest download databases docs (#111688)
Co-authored-by: Joe Gallo <joegallo@gmail.com>
2024-08-08 09:23:56 -05:00
Joe Gallo 1aa5b2face
Fix geoip processor isp_organization_name property and docs (#111372) 2024-07-26 18:28:44 -04:00
Niels Bauman 86727a8741
Add size_in_bytes to enrich cache stats (#110578)
As preparation for #106081, this PR adds the `size_in_bytes`
field to the enrich cache. This field is calculated by summing
the `BytesReference` sizes of all the search hits in the cache.
It's not a perfect representation of the size of the enrich cache
on the heap, but some experimentation showed that it's quite close.
2024-07-12 08:53:53 +02:00
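The calculation described in the commit above (#110578) amounts to a sum over the serialized search hits held in the cache. The sketch below assumes a cache modelled as a dict from a lookup key to a list of raw hit payloads; the key format and the `size_in_bytes` helper are hypothetical.

```python
from typing import Dict, List


def size_in_bytes(cache: Dict[str, List[bytes]]) -> int:
    """Sum the byte length of every cached search hit."""
    return sum(len(hit) for hits in cache.values() for hit in hits)


cache = {
    "lookup-1": [b'{"a":1}', b'{"b":22}'],  # 7 + 8 bytes
    "lookup-2": [b'{"c":333}'],             # 9 bytes
}
print(size_in_bytes(cache))  # 24
```

As the commit message notes, a figure like this only approximates heap usage, since it ignores the overhead of the cache structure itself.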
Matt Culbreth 81b8495388
Mark the Redact processor as Generally Available 2024-07-02 16:58:57 -04:00
Kathleen DeRusso 7a1d532ffb
Pass over Sparse Vector docs for correctness (#110282)
* Remove legacy mentions of text expansion queries

* Add missing query_vector param to sparse_vector query docs

* Fix formatting errors in sparse vector query dsl doc

* Remove unnecessary test setup block
2024-07-02 13:37:25 -04:00
Joe Gallo d9941f6285
Ingest geoip new databases release highlight (#109355) 2024-06-04 12:48:19 -04:00
Joe Gallo e1b2b599de
Add continent_code support to the geoip processor (#108780) 2024-05-17 11:48:23 -04:00
Joe Gallo babab0a8c0
Add support for the 'Connection Type' database to the geoip processor (#108683) 2024-05-15 17:58:08 -04:00
Keith Massey 639eee577e
Adding user_type support for the enterprise database for the geoip processor (#108687) 2024-05-15 12:23:52 -05:00
Keith Massey 69ec54d541
Add support for the 'ISP' database to the geoip processor (#108651) 2024-05-15 09:27:06 -05:00
Joe Gallo cc6597df23
Add support for the 'Domain' database to the geoip processor (#108639) 2024-05-14 17:49:05 -04:00
Keith Massey bcd62e8d03
Adding hits_time_in_millis and misses_time_in_millis to enrich cache stats (#107579) 2024-04-18 15:19:24 -05:00
Keith Massey 8adc2926a2
Fixed the spelling of the word successful in docs (#107595) 2024-04-18 08:08:30 -05:00
Liam Thompson 33a71e3289
[DOCS] Refactor book-scoped variables in `docs/reference/index.asciidoc` (#107413)
* Remove `es-test-dir` book-scoped variable

* Remove `plugins-examples-dir` book-scoped variable

* Remove `:dependencies-dir:` and `:xes-repo-dir:` book-scoped variables

- In `index.asciidoc`, two variables (`:dependencies-dir:` and `:xes-repo-dir:`) were removed.
- In `sql/index.asciidoc`, the `:sql-tests:` path was updated to a fuller path
- In `esql/index.asciidoc`, the `:esql-tests:` path was updated likewise

* Replace `es-repo-dir` with `es-ref-dir`

* Move `:include-xpack: true` to few files that use it, remove from index.asciidoc
2024-04-17 14:37:07 +02:00
Keith Massey f5c7938ab8
Adding cache_stats to geoip stats API (#107334) 2024-04-16 16:57:14 -05:00
Joe Gallo 6ff3a2628a
Add support for the 'Enterprise' database to the geoip processor (#107377) 2024-04-11 16:45:10 -04:00
Joe Gallo 5266f79b16
Add support for the 'Anonymous IP' database to the geoip processor (#107287) 2024-04-11 14:05:52 -04:00
Keith Massey 48a88c575c
Renaming GeoIpDownloaderStatsAction (#107290)
Renaming GeoIpDownloaderStatsAction to GeoIpStatsAction
2024-04-10 09:21:24 -05:00
Jennie Soria 30828a5680
Update geoip.asciidoc (#105908)
The GeoIP endpoint does not use the xpack http client. The GeoIP downloader uses the JDK's built-in cacerts.

If a customer is using a custom HTTPS endpoint, they need to add the CA certificate to the JDK's cacerts, whether that is our bundled JDK or their own JDK. Otherwise they will see something like
```
...PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target...
```
2024-03-05 11:26:49 +01:00
Liam Thompson 52aefa59eb
[DOCS] Ingest processors docs improvements (#104384)
* [DOCS] Categorize ingest processors on overview page, summarize use cases

* Add overview info, subheading, links

* Apply suggestions from review

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

* Insert space

---------

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
2024-01-17 11:50:29 +01:00
ShourieG 147484b059
[elasticsearch][processors] - Added support for override flag in rename processor (#103565)
* added override flag for rename processor along with factory tests

* added yaml tests for rename processor using the override flag

* updated renameProcessor tests to include override flag as a parameter

* updated rename processor tests to incorporate override flag = true scenario

* updated rename processor asciidoc with override option

* removed unnecessary SuppressWarnings tag

* corrected formatting errors

* updated processor tests

* fixed yaml tests

* Prefer early throw style here

* Whitespace

* Move and rewrite this test

It's just a simple test of the primary behavior of the rename
processor, so put it first and simplify it.

* Rename this test

It doesn't actually exercise template snippets

* Tidy up this test

---------

Co-authored-by: Joe Gallo <joegallo@gmail.com>
2024-01-11 16:00:02 +05:30
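The effect of the override flag from the commit above (#103565) can be modelled as below. This is a simplified sketch rather than the processor's code: documents are flat dicts, the `rename` function is a hypothetical helper, and the error messages are illustrative.

```python
from typing import Any, Dict


def rename(doc: Dict[str, Any], field: str, target_field: str,
           override: bool = False) -> Dict[str, Any]:
    """Move `field` to `target_field`, failing early on conflicts unless overridden."""
    if field not in doc:
        raise KeyError(f"field [{field}] doesn't exist")
    if target_field in doc and not override:
        raise ValueError(f"field [{target_field}] already exists")
    doc[target_field] = doc.pop(field)
    return doc


print(rename({"a": 1}, "a", "b"))                         # {'b': 1}
print(rename({"a": 1, "b": 2}, "a", "b", override=True))  # {'b': 1}

try:
    rename({"a": 1, "b": 2}, "a", "b")
except ValueError as error:
    print(f"rejected: {error}")
```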
Adam Demjen a26ff243f6
[Docs] [Enterprise Search] ML inference pipeline documentation updates (#103022)
* Remove mapping step, wording and screenshot updates

* Notes about pipeline name and model deployment

* Address CR comments
2024-01-02 09:56:50 -05:00
Abdon Pijpelink ac973f0064
[DOCS] Improve enrich policy execute 'wait_for_completion' docs (#102291)
* [DOCS] Improve enrich policy execute 'wait_for_completion' docs

* Update docs/reference/ingest/apis/enrich/execute-enrich-policy.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

---------

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2023-11-27 17:17:06 +01:00
Abdon Pijpelink bc59315baa
[DOCS] Examples for ES|QL DISSECT and WHERE (#102591)
* DISSECT examples

* WHERE examples

* Remove references to empty keys

* Fix non-deterministic test
2023-11-27 10:56:48 +01:00
Keith Massey 643d825c45
Adding a simulate ingest api (#101409)
This commit introduces a new `_ingest/_simulate` API that runs any pipelines
on the given data that would be executed for a given index, but instead of
indexing the data into the index, returns the transformed documents and
the list of pipelines that were executed.
2023-11-15 17:25:09 -06:00
Liam Thompson ddd94446f8
[DOCS] Fix incorrect image paths (#102082) 2023-11-13 16:00:00 +01:00
Felix Barnsteiner 978a5469ce
Add support for marking component templates as deprecated (#101148) 2023-11-02 19:28:20 +01:00
István Zoltán Szabó c34e0c0746
[DOCS] Clarifies that inference input must be single string (#101301) 2023-10-25 17:18:05 +02:00
Liam Thompson a6ed18c144
[DOCS] [Enterprise Search] Migrate ingest pipelines/ML docs (#101156)
* WIP, port docs

- Update link syntax
- Update ids
- Fix n^n build failures :/

* Fix id for doclink

* Let's try this on for size

* Idem

* Update attributes, Test image rendering

* Update image name

* Fix typo

* Update filename

* Add images, cleanup, standardize naming

* Tweak heading

* Cleanup, rewordings

- Modified introduction in `search-inference-processing.asciidoc`.
- Changed "Search connector" to "Elastic connector".
- Adjusted heading levels in `search-inference-processing.asciidoc`.
- Simplified ingest pipelines intro in `search-ingest-pipelines.asciidoc`.
- Edited ingest pipelines section for the *Content* UI.
- Reordered file inclusions in `search-ingest-pipelines.asciidoc`.
- Formatted inference pipeline creation into steps in `search-nlp-tutorial.asciidoc`.

* Lingering erroneousness

* Delete FAQ
2023-10-25 17:17:24 +02:00
Abdon Pijpelink 284f81873f
[DOCS] Expand ES|QL DISSECT and GROK documentation (#101225)
* Add 'Process data with DISSECT and GROK' page

* Expand DISSECT docs

* More DISSECT and GROK enhancements

* Improve examples

* Fix CSV tests

* Review feedback

* Reword
2023-10-25 13:19:17 +02:00