Commit Graph

454 Commits

Author SHA1 Message Date
Luiz Santos c0f3024c3f
Make it clear that previous enrich indices are deleted every 15 minutes (#109085)
Before this change, one could interpret that enrich policies are executed every 15 minutes, which is not true.
2025-01-29 19:28:43 +01:00
Lisa Cawley ba8beecdb0
[DOCS] More links to new API site (#119377) 2024-12-31 11:32:29 -08:00
Lisa Cawley 5e0fbef58b
[DOCS] Link to new API site (#119038)
Co-authored-by: shainaraskas <58563081+shainaraskas@users.noreply.github.com>
2024-12-30 16:52:16 +00:00
Stef Nestor db1c41b41d
(Doc+) Enrich run on ingest+data nodes not coordinating-only (#119136)
* (Doc+) Enrich run on ingest+data nodes not coordinating-only

👋 howdy, team! I'm not otherwise finding it so documenting https://github.com/elastic/elasticsearch/issues/95969 in ES docs

> Currently we tell users of enrich that they should co-locate the nodes that perform the enrichment (ingest nodes) with the actual enrich data so that enrich operations don't require a remote search operation.

* feedback

Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>

---------

Co-authored-by: Lee Hinman <dakrone@users.noreply.github.com>
2024-12-25 08:17:18 -07:00
Sean Story 5255bfb6fb
Replace 'ent-search-generic' with 'search-default' pipeline (#118899)
* Replace 'ent-search-generic' with 'search-default' pipeline

* missed one

* [CI] Auto commit changes from spotless

---------

Co-authored-by: elasticsearchmachine <infra-root+elasticsearchmachine@elastic.co>
2024-12-18 08:03:08 -06:00
Pete Gillin bc25a73543
Update `UpdateForV9` in `AttachmentProcessor` (#118186)
We are not going to make this change in V9. We may do it in V10. This
change just bumps the annotation to remind us to revisit.

Since we are living with this for a while, it seems worth improving
the documentation. This now encourages explicitly setting the option
one way or the other, since you get a warning if you omit it. It also
changes the existing examples to use true rather than false, as that's
our recommendation. And it adds a new section with an example where
it's true, and moves the content previously in a note into that
section.
2024-12-09 14:28:24 +00:00
István Zoltán Szabó f27cb5efd3
[DOCS] Adds examples to inference processor docs (#116018) 2024-12-06 09:15:15 +01:00
Joe Gallo dd32cb6439
Document new ip_location processor (#116623) 2024-11-11 19:55:57 -05:00
Joe Gallo 2302cdbe45
Document new ip_location APIs (#116611) 2024-11-11 13:52:47 -05:00
Joe Gallo b517abcb07
Document new ip geolocation fields (#116603) 2024-11-11 11:13:56 -05:00
Giorgos Bamparopoulos 9ad09b6ee0
Fix a typo in the example for using pre-existing pipeline definitions (#116084) 2024-11-04 16:06:16 +01:00
István Zoltán Szabó 9394e88c0f
[DOCS] Updates inference processor docs. (#115566) 2024-10-25 10:18:01 +02:00
Keith Massey 2ff6bb0543
Adding support for additional mapping to simulate ingest API (#114742) 2024-10-21 17:08:50 -05:00
Quentin Pradet fc23f2f1c6
[DOCS] Fix User agent processor properties (#112518) 2024-10-15 17:35:26 +04:00
Pete Gillin c8c6f5af53
Actually add `terminate` docs page (#114440)
A docs page for the `terminate` processor was added in
https://github.com/elastic/elasticsearch/pull/114157, but the change
to include it in the outer processor reference page was omitted. This
change corrects that oversight.
2024-10-10 08:34:43 +01:00
Keith Massey fb482f863d
Adding index_template_substitutions to the simulate ingest API (#114128)
This adds support for a new `index_template_substitutions` field to the
body of an ingest simulate API request. These substitutions can be used
to change the pipeline(s) used for ingest, or to change the mappings
used for validation. It is similar to the
`component_template_substitutions` added in #113276. Here is an example
that shows both of those usages working together:

```
## First, add a couple of pipelines that set a field to a boolean:
PUT /_ingest/pipeline/foo-pipeline?pretty
{
  "processors": [
    {
      "set": {
        "field": "foo",
        "value": true
      }
    }
  ]
}

PUT /_ingest/pipeline/bar-pipeline?pretty
{
  "processors": [
    {
      "set": {
        "field": "bar",
        "value": true
      }
    }
  ]
}

## Now, create three component templates. One provides a mapping that enforces that the only field is "foo"
## and that field is a keyword. The next is similar, but adds a `bar` field. The final one provides a setting
## that makes "foo-pipeline" the default pipeline.
## Remember that the "foo-pipeline" sets the "foo" field to a boolean, so using both of these templates
## together would cause a validation exception. These could be in the same template, but are provided
## separately just so that later we can show how multiple templates can be overridden.
PUT _component_template/mappings_template
{
  "template": {
    "mappings": {
      "dynamic": "strict",
      "properties": {
        "foo": {
          "type": "keyword"
        }
      }
    }
  }
}

PUT _component_template/mappings_template_with_bar
{
    "template": {
      "mappings": {
        "dynamic": "strict",
        "properties": {
          "foo": {
            "type": "keyword"
          },
          "bar": {
            "type": "boolean"
          }
        }
      }
    }
}

PUT _component_template/settings_template
{
  "template": {
    "settings": {
      "index": {
        "default_pipeline": "foo-pipeline"
      }
    }
  }
}

## Here we create an index template pulling in two of the component templates above
PUT _index_template/template_1
{
  "index_patterns": ["foo*"],
  "composed_of": ["mappings_template", "settings_template"]
}

## We can index a document here to create the index, or not. Either way the simulate call ought to work the same
POST foo-1/_doc
{
  "foo": "FOO"
}

## This will not blow up with validation exceptions because the substitute `template_1` in
## "index_template_substitutions" uses `mappings_template_with_bar`, which adds the bar field.
## And the bar-pipeline is executed rather than the foo-pipeline because
## "component_template_substitutions" provides a substitute `settings_template`, so the value of "foo"
## does not get set to an invalid type.
POST _ingest/_simulate?pretty&index=foo-1
{
  "docs": [
    {
      "_id": "asdf",
      "_source": {
        "foo": "foo",
        "bar": "bar"
      }
    }
  ],
  "component_template_substitutions": {
    "settings_template": {
      "template": {
        "settings": {
          "index": {
            "default_pipeline": "bar-pipeline"
          }
        }
      }
    }
  },
  "index_template_substitutions": {
    "template_1": {
      "index_patterns": ["foo*"],
      "composed_of": ["mappings_template_with_bar", "settings_template"]
    }
  }
}
```
2024-10-09 10:15:37 +11:00
Pete Gillin 43e5258b3c
Add a `terminate` ingest processor (#114157)
This processor simply causes any remaining processors in the pipeline
to be skipped. It will normally be executed conditionally using the
`if` option. (If this pipeline is being called from another pipeline,
the calling pipeline is *not* terminated.)

For example, this:

```
POST /_ingest/pipeline/_simulate
{
  "pipeline":
  {
    "description": "Appends just 'before' to the steps field if the error field is present, or both 'before' and 'after' if not",
    "processors": [
      {
        "append": {
          "field": "steps",
          "value": "before"
        }
      },
      {
        "terminate": {
          "if": "ctx.error != null"
        }
      },
      {
        "append": {
          "field": "steps",
          "value": "after"
        }
      }
    ]
  },
  "docs": [
    {
      "_index": "index",
      "_id": "doc1",
      "_source": {
        "name": "okay",
        "steps": []
      }
    },
    {
      "_index": "index",
      "_id": "doc2",
      "_source": {
        "name": "bad",
        "error": "oh no",
        "steps": []
      }
    }
  ]
}
```

returns something like this:

```
{
  "docs": [
    {
      "doc": {
        "_index": "index",
        "_version": "-3",
        "_id": "doc1",
        "_source": {
          "name": "okay",
          "steps": [
            "before",
            "after"
          ]
        },
        "_ingest": {
          "timestamp": "2024-10-04T16:25:20.448881Z"
        }
      }
    },
    {
      "doc": {
        "_index": "index",
        "_version": "-3",
        "_id": "doc2",
        "_source": {
          "name": "bad",
          "error": "oh no",
          "steps": [
            "before"
          ]
        },
        "_ingest": {
          "timestamp": "2024-10-04T16:25:20.448932Z"
        }
      }
    }
  ]
}
```
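
The note about calling pipelines can be illustrated with a toy model. This is a minimal Python sketch of the described semantics, not Elasticsearch's implementation: the `run_pipeline` helper, the tuple-based processor shapes, and the pipeline names are all hypothetical.

```python
# Toy model of the semantics described above; NOT Elasticsearch's implementation.
# Processor shapes and the run_pipeline helper are hypothetical.

class Terminate(Exception):
    """Raised to stop the remaining processors of the current pipeline only."""


def run_pipeline(pipelines, name, doc):
    """Run the named pipeline against doc, mutating it in place."""
    try:
        for kind, arg in pipelines[name]:
            if kind == "append":
                doc.setdefault("steps", []).append(arg)
            elif kind == "terminate":
                # arg is a predicate standing in for the processor's `if` option
                if arg(doc):
                    raise Terminate()
            elif kind == "pipeline":
                # The nested call catches its own Terminate, so the caller continues
                run_pipeline(pipelines, arg, doc)
    except Terminate:
        pass  # skip only the rest of *this* pipeline
    return doc


pipelines = {
    "inner": [
        ("append", "inner-before"),
        ("terminate", lambda d: d.get("error") is not None),
        ("append", "inner-after"),
    ],
    "outer": [
        ("pipeline", "inner"),
        ("append", "outer-after"),
    ],
}

ok = run_pipeline(pipelines, "outer", {"name": "okay", "steps": []})
bad = run_pipeline(pipelines, "outer", {"name": "bad", "error": "oh no", "steps": []})
print(ok["steps"])   # ['inner-before', 'inner-after', 'outer-after']
print(bad["steps"])  # ['inner-before', 'outer-after']
```

In this sketch the document with an `error` field skips `inner-after` but still receives `outer-after`, because termination is scoped to the pipeline that contains the `terminate` step.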
2024-10-08 17:39:53 +01:00
István Zoltán Szabó 57955cb8d4
[DOCS] Adds DeBERTA v2 to the tokenizers list in API docs (#112752)
Co-authored-by: Max Hniebergall <137079448+maxhniebergall@users.noreply.github.com>
2024-10-07 10:23:46 +02:00
Liam Thompson 6e400c12a7
[DOCS] Port connector docs from Enterprise Search guide (#112953) 2024-09-30 10:22:37 +02:00
Sam Xiao 6917f1679a
Tag redacted document in ingest pipeline (#113552)
Adds a new `trace_redact` option to the redact processor to indicate that a document has been redacted in the ingest pipeline. If a document is processed by a redact processor and any field is redacted, the ingest metadata `_ingest._redact._is_redacted` is set to `true`.

Closes #94633
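
The flagging rule can be sketched with a toy model in Python. This is not the redact processor's code: the IP regex, the `<client>` replacement token, and the `redact` helper are assumptions made for illustration. Only the metadata path comes from the description above.

```python
import re

# Toy model of the behaviour described above; NOT the redact processor's code.
# The IP regex and the "<client>" replacement token are illustrative assumptions.
IP_PATTERN = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def redact(doc, field, trace_redact=False):
    """Mask IP-like text in doc[field]; optionally record that redaction happened."""
    original = doc.get(field)
    if not isinstance(original, str):
        return doc
    masked, count = IP_PATTERN.subn("<client>", original)
    doc[field] = masked
    if trace_redact and count > 0:
        # Mirrors the metadata path named in the commit message
        ingest = doc.setdefault("_ingest", {})
        ingest.setdefault("_redact", {})["_is_redacted"] = True
    return doc


hit = redact({"message": "55.3.244.1 GET /index.html"}, "message", trace_redact=True)
miss = redact({"message": "GET /index.html"}, "message", trace_redact=True)
print(hit)
print(miss)
```

The flag is written only when tracing is enabled and at least one match was replaced, so an untouched document carries no `_ingest._redact` metadata.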
2024-09-27 12:24:24 -04:00
kosabogi 6e73c1423b
Adds text_similarity task type to inference processor documentation (#113517) 2024-09-26 16:12:28 +02:00
Keith Massey cd950bb2fa
Adding component template substitutions to the simulate ingest API (#113276) 2024-09-25 15:30:22 -05:00
Stef Nestor e6b15f4bf7
(Doc+) Inference Pipeline ignores Mapping Analyzers (#112522)
* (Doc+) Inference Pipeline ignores Mapping Analyzers

From internal Dev feedback (will cross-link after), this documents that inference processors within ingest pipelines run before mapping analyzers, effectively ignoring them. So if users want analyzers to take effect, they need to select the analyzer's ingest pipeline processor equivalent and run it earlier in the pipeline than the inference processor.
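
As a rough illustration of that ordering, the Python sketch below builds a pipeline body in which a normalizing processor precedes the inference processor. The field names and model ID are placeholders, `lowercase` merely stands in for whichever processor mirrors the analyzer in question, and the `input_output` shape is an assumption to check against the inference processor reference.

```python
import json

# Hypothetical pipeline body: the field names and model ID are placeholders.
# `lowercase` stands in for whatever processor mirrors the analyzer you rely on.
pipeline = {
    "description": "Normalize text before inference (illustrative)",
    "processors": [
        {"lowercase": {"field": "content"}},
        {
            "inference": {
                "model_id": "my-model-id",
                "input_output": [
                    {"input_field": "content", "output_field": "content_embedding"}
                ],
            }
        },
    ],
}

# Processors run in list order, so the normalizing step must come first
order = [next(iter(p)) for p in pipeline["processors"]]
print(order)
print(json.dumps(pipeline, indent=2))
```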

---------

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
2024-09-11 16:05:15 -06:00
Keith Massey 4aa3c3d7ee
Add support for templates when validating mappings in the simulate ingest API (#111161) 2024-09-05 09:25:53 -05:00
Panos Koutsovasilis 29453cb2ce
fix: support all allowed protocol numbers (#111528)
* fix(CommunityIdProcessor): support all allowed protocol numbers

* fix(CommunityIdProcessor): update documentation
2024-08-26 08:37:40 +03:00
Niels Bauman e0c1ccbc1e
Make enrich cache based on memory usage (#111412)
The max enrich cache size setting now also supports an absolute max size in bytes (of used heap space) and a percentage of the max heap space, in addition to the existing flat document count. The default is 1% of the max heap space.

This should prevent issues where the enrich cache takes up a lot of memory when there are large documents in the cache.
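
The three accepted forms can be pictured with a toy parser in Python. This is not Elasticsearch's setting parser: the unit suffixes, the `parse_cache_size` helper, and the returned tuple shape are assumptions made for illustration.

```python
# Toy parser; NOT Elasticsearch's setting parser. The unit suffixes and the
# returned tuple shape are assumptions made for illustration.
UNITS = {"kb": 1024, "mb": 1024**2, "gb": 1024**3, "b": 1}


def parse_cache_size(value, max_heap_bytes):
    """Return ("docs", n) for a flat count or ("bytes", n) for a memory limit."""
    text = str(value).strip().lower()
    if text.endswith("%"):
        # Percentage of the maximum heap
        return ("bytes", int(max_heap_bytes * float(text[:-1]) / 100))
    # Two-letter suffixes are checked before the bare "b" suffix
    for suffix, factor in UNITS.items():
        if text.endswith(suffix):
            return ("bytes", int(float(text[: -len(suffix)]) * factor))
    # No suffix: the pre-existing flat document count
    return ("docs", int(text))


heap = 4 * 1024**3  # pretend the node has a 4 GiB heap
print(parse_cache_size("1%", heap))
print(parse_cache_size("10mb", heap))
print(parse_cache_size(1000, heap))
```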
2024-08-23 09:26:55 +02:00
István Zoltán Szabó 1ba72e4602
[DOCS] Documents output_field behavior after multiple inference runs (#111875)
Co-authored-by: David Kyle <david.kyle@elastic.co>
2024-08-15 12:36:59 +02:00
Keith Massey c6a7537df7
Ingest download databases docs (#111688)
Co-authored-by: Joe Gallo <joegallo@gmail.com>
2024-08-08 09:23:56 -05:00
Joe Gallo 1aa5b2face
Fix geoip processor isp_organization_name property and docs (#111372) 2024-07-26 18:28:44 -04:00
Niels Bauman 86727a8741
Add size_in_bytes to enrich cache stats (#110578)
As preparation for #106081, this PR adds the `size_in_bytes`
field to the enrich cache. This field is calculated by summing
the ByteReference sizes of all the search hits in the cache.
It's not a perfect representation of the size of the enrich cache
on the heap, but some experimentation showed that it's quite close.
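
The summation can be pictured with a toy model in Python. This is not the enrich cache itself: hits are plain `bytes` values standing in for the byte references mentioned above, and the `size_in_bytes` helper and cache layout are hypothetical.

```python
import json

# Toy model; NOT the enrich cache. Hits are plain bytes here, purely as a
# stand-in for the byte references mentioned in the commit message.
def size_in_bytes(cache):
    """Sum the serialized byte length of every cached search hit."""
    return sum(len(hit) for hits in cache.values() for hit in hits)


cache = {
    "query-1": [json.dumps({"zip": "70116", "city": "New Orleans"}).encode("utf-8")],
    "query-2": [b'{"zip":"10001"}', b'{"zip":"10002"}'],
}
print(size_in_bytes(cache))
```

As the commit message notes, a sum like this approximates rather than measures heap usage, since it ignores per-entry object overhead.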
2024-07-12 08:53:53 +02:00
Matt Culbreth 81b8495388
Mark the Redact processor as Generally Available 2024-07-02 16:58:57 -04:00
Kathleen DeRusso 7a1d532ffb
Pass over Sparse Vector docs for correctness (#110282)
* Remove legacy mentions of text expansion queries

* Add missing query_vector param to sparse_vector query docs

* Fix formatting errors in sparse vector query dsl doc

* Remove unnecessary test setup block
2024-07-02 13:37:25 -04:00
Joe Gallo d9941f6285
Ingest geoip new databases release highlight (#109355) 2024-06-04 12:48:19 -04:00
Joe Gallo e1b2b599de
Add continent_code support to the geoip processor (#108780) 2024-05-17 11:48:23 -04:00
Joe Gallo babab0a8c0
Add support for the 'Connection Type' database to the geoip processor (#108683) 2024-05-15 17:58:08 -04:00
Keith Massey 639eee577e
Adding user_type support for the enterprise database for the geoip processor (#108687) 2024-05-15 12:23:52 -05:00
Keith Massey 69ec54d541
Add support for the 'ISP' database to the geoip processor (#108651) 2024-05-15 09:27:06 -05:00
Joe Gallo cc6597df23
Add support for the 'Domain' database to the geoip processor (#108639) 2024-05-14 17:49:05 -04:00
Keith Massey bcd62e8d03
Adding hits_time_in_millis and misses_time_in_millis to enrich cache stats (#107579) 2024-04-18 15:19:24 -05:00
Keith Massey 8adc2926a2
Fixed the spelling of the word successful in docs (#107595) 2024-04-18 08:08:30 -05:00
Liam Thompson 33a71e3289
[DOCS] Refactor book-scoped variables in `docs/reference/index.asciidoc` (#107413)
* Remove `es-test-dir` book-scoped variable

* Remove `plugins-examples-dir` book-scoped variable

* Remove `:dependencies-dir:` and `:xes-repo-dir:` book-scoped variables

- In `index.asciidoc`, two variables (`:dependencies-dir:` and `:xes-repo-dir:`) were removed.
- In `sql/index.asciidoc`, the `:sql-tests:` path was updated to a fuller path
- In `esql/index.asciidoc`, the `:esql-tests:` path was updated likewise

* Replace `es-repo-dir` with `es-ref-dir`

* Move `:include-xpack: true` to few files that use it, remove from index.asciidoc
2024-04-17 14:37:07 +02:00
Keith Massey f5c7938ab8
Adding cache_stats to geoip stats API (#107334) 2024-04-16 16:57:14 -05:00
Joe Gallo 6ff3a2628a
Add support for the 'Enterprise' database to the geoip processor (#107377) 2024-04-11 16:45:10 -04:00
Joe Gallo 5266f79b16
Add support for the 'Anonymous IP' database to the geoip processor (#107287) 2024-04-11 14:05:52 -04:00
Keith Massey 48a88c575c
Renaming GeoIpDownloaderStatsAction (#107290)
Renaming GeoIpDownloaderStatsAction to GeoIpStatsAction
2024-04-10 09:21:24 -05:00
Jennie Soria 30828a5680
Update geoip.asciidoc (#105908)
The GeoIP endpoint does not use the xpack http client. The GeoIP downloader uses the JDK's built-in cacerts.

If a customer is using a custom HTTPS endpoint, they need to add the CA certificate to the JDK's cacerts, whether that is our bundled JDK or their own. Otherwise they will see something like
```
...PKIX path building failed: sun.security.provider.certpath.SunCertPathBuilderException: unable to find valid certification path to requested target...
```
2024-03-05 11:26:49 +01:00
Liam Thompson 52aefa59eb
[DOCS] Ingest processors docs improvements (#104384)
* [DOCS] Categorize ingest processors on overview page, summarize use cases

* Add overview info, subheading, links

* Apply suggestions from review

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>

* Insert space

---------

Co-authored-by: István Zoltán Szabó <istvan.szabo@elastic.co>
2024-01-17 11:50:29 +01:00
ShourieG 147484b059
[elasticsearch][processors] - Added support for override flag in rename processor (#103565)
* added override flag for rename processor along with factory tests

* added yaml tests for rename processor using the override flag

* updated renameProcessor tests to include override flag as a parameter

* updated rename processor tests to incorporate override flag = true scenario

* updated rename processor asciidoc with override option

* updated rename processor asciidoc with override option

* removed unnecessary suppresswarnings tag

* corrected formatting errors

* updated processor tests

* fixed yaml tests

* Prefer early throw style here

* Whitespace

* Move and rewrite this test

It's just a simple test of the primary behavior of the rename
processor, so put it first and simplify it.

* Rename this test

It doesn't actually exercise template snippets

* Tidy up this test

---------

Co-authored-by: Joe Gallo <joegallo@gmail.com>
2024-01-11 16:00:02 +05:30
Adam Demjen a26ff243f6
[Docs] [Enterprise Search] ML inference pipeline documentation updates (#103022)
* Remove mapping step, wording and screenshot updates

* Notes about pipeline name and model deployment

* Address CR comments
2024-01-02 09:56:50 -05:00
Abdon Pijpelink ac973f0064
[DOCS] Improve enrich policy execute 'wait_for_completion' docs (#102291)
* [DOCS] Improve enrich policy execute 'wait_for_completion' docs

* Update docs/reference/ingest/apis/enrich/execute-enrich-policy.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

---------

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2023-11-27 17:17:06 +01:00