389 Commits

Author SHA1 Message Date
Joe Gallo 3284903205
Document the redact processor's skip_if_unlicensed option (#99063) 2023-08-31 14:00:12 -04:00
James Baiera 7d990d5a09
Allow custom geo ip database files to be downloaded (#97850)
This PR extends the assumptions we make about database file availability to all database file 
names instead of the default ones we host at Elastic. When creating a geo ip processor with 
a database name that is not recognized we unilaterally convert the processor to one that 
tags documents with a missing database message until the database file requested is 
downloaded or provided via the manual configuration route. This allows a pipeline to be 
created and for the download service to be started, potentially sourcing the needed files.

---------

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2023-08-16 00:31:51 -04:00
James Rodewig fe6a42b35f
[DOCS] Update Elastic GeoIP service link (#97455)
Adds TOS-related query parameters to the Elastic GeoIP link in the [GeoIP ingest processor docs](https://www.elastic.co/guide/en/elasticsearch/reference/current/geoip-processor.html). The current link returns a 400 HTTP status.
2023-07-07 10:53:02 -04:00
Aurélien FOUCRET dd1d157b47
Enable analytics geoip in behavioral analytics. (#96624)
* When using a managed pipeline GeoIpDownloader is triggered only when an index exists for the pipeline.

* Adding the geoip processor back

* Adding tags to the events mapping.

* Fix a forbidden API call into tests.

* lint

* Adding integration tests for managed pipelines.

* lint

* Add a geoip_database_lazy_download param to pipelines and use it instead of managed.

* Fix an edge case: the pipeline can be set after the index is created.

* lint.

* Update docs/changelog/96624.yaml

* Update 96624.yaml

* Uses a processor setting (download_database_on_pipeline_creation) to decide database download strategy.

* Removing debug instruction.

* Improved documentation.

* Improved the way to check for referenced pipelines.

* Fixing an error in test.

* Improved integration tests.

* Lint.

* Fix failing tests.

* Fix failing tests (2).

* Adding javadoc.

* lint javadoc.

* Using a set instead of a list to store checked pipelines.
2023-06-15 23:42:10 +02:00
debadair 777598d602
[DOCS] Remove redirect pages (#88738)
* [DOCS] Remove manual redirects

* [DOCS] Removed refs to modules-discovery-hosts-providers

* [DOCS] Fixed broken internal refs

* Fixing bad cross links in ES book, and adding redirects.asciidoc[] back into docs/reference/index.asciidoc.

* Update docs/reference/search/point-in-time-api.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/setup/restart-cluster.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/sql/endpoints/translate.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update docs/reference/snapshot-restore/restore-snapshot.asciidoc

Co-authored-by: James Rodewig <james.rodewig@elastic.co>

* Update repository-azure.asciidoc

* Update node-tool.asciidoc

* Update repository-azure.asciidoc

---------

Co-authored-by: amyjtechwriter <61687663+amyjtechwriter@users.noreply.github.com>
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
Co-authored-by: Amy Jonsson <amy.jonsson@elastic.co>
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2023-05-24 12:32:46 +01:00
István Zoltán Szabó b164555072
[DOCS] Adds deployment ID param documentation to trained model APIs (#96174) 2023-05-17 15:56:58 +02:00
amyjtechwriter c3e186ea01
Example of dot notation to access an array field for set processor. (#95893) 2023-05-09 10:21:27 +01:00
amyjtechwriter 3d6143b829
Nodes need access to storage.googleapis.com for geoip. (#95554) 2023-04-28 10:40:18 +01:00
Felix Barnsteiner 11b598a519
Add reroute processor (#76511) 2023-04-18 19:09:25 +02:00
Joe Gallo 9bc09d576a
Fix ignore_missing docs for a couple of Ingest processors (#95244) 2023-04-13 16:34:40 -04:00
Aurélien FOUCRET 9071d114f5
[Ingest Processor] Add `ignore_missing` param to the `uri_parts` ingest processor. (#95068) 2023-04-13 15:11:19 +02:00
Jean-Fabrice Bobo a7e901263b
Update geoip.asciidoc (#95101)
Fix `ingest.geoip.downloader.eager.download` setting not appearing in the rendered documentation
2023-04-12 09:59:27 +02:00
Alessandro Stoltenberg c787e3808f
docs: set-processor minor update (#94899) 2023-03-30 14:27:05 +02:00
Dimitris Kotsakos 38a09bea60
[ML] Make redact processor experimental for first release (#94683) 2023-03-23 18:28:03 +02:00
Joe Gallo 36aeb00835
Add an example of dot_expander's path option (#94291) 2023-03-06 09:26:40 -05:00
David Kyle f8e306e688
Rewrite Redact Processor docs intro (#93856)
Focus on what redact does rather than describing Grok
2023-02-16 14:17:54 +00:00
David Kyle b588d2ddd7
Redact Ingest Processor (#92951)
The Redact processor uses the Grok rules engine to
redact text in the input document that matches the
Grok pattern. For example, email or IP addresses can
be redacted using the definitions from the standard
Grok pattern bank. New patterns can be defined in
the processor configuration.
2023-02-07 17:10:07 +00:00
Craig Taverner c18078e11e
Geo_grid ingest processor docs (#93507)
* Add docs for geo_grid ingest processor

Adds docs for https://github.com/elastic/elasticsearch/pull/93370

* Update docs/reference/ingest/processors/geo-grid.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* Update docs/reference/ingest/processors/geo-grid.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* Update docs/reference/ingest/processors/geo-grid.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* Update docs/reference/ingest/processors/geo-grid.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* Update docs/reference/ingest/processors/geo-grid.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* Update docs/reference/ingest/processors/geo-grid.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* Update docs/reference/ingest/processors/geo-grid.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* Update docs/reference/ingest/processors/geo-grid.asciidoc

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>

* Consistent GeoJSON case

---------

Co-authored-by: Abdon Pijpelink <abdon.pijpelink@elastic.co>
2023-02-06 16:17:00 +01:00
Keith Massey 13b71900a6
Download the geoip databases only when needed (#92335)
This commit changes the geoip downloader so that we only download the geoip databases if you
have at least one geoip processor in your cluster, or when you add a new geoip processor (or if
`ingest.geoip.downloader.eager.download` is explicitly set to true).
2023-01-30 13:07:48 -06:00
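
The download rule described in #92335 can be sketched as a small decision function. This is a simplified model, not the Elasticsearch implementation: the function names and the dict-based pipeline layout are assumptions made for illustration, and only top-level processors are inspected.

```python
def has_geoip_processor(pipeline):
    """Return True if any top-level processor in the pipeline is a geoip processor."""
    return any("geoip" in processor for processor in pipeline.get("processors", []))


def should_download_databases(pipelines, eager_download=False):
    """Decide whether the geoip databases should be fetched.

    pipelines maps a pipeline id to its definition (hypothetical layout).
    eager_download stands in for ingest.geoip.downloader.eager.download.
    """
    if eager_download:
        return True
    return any(has_geoip_processor(p) for p in pipelines.values())


# A cluster with no geoip processor does not trigger a download.
pipelines = {"plain": {"processors": [{"set": {"field": "env", "value": "prod"}}]}}
print(should_download_databases(pipelines))

# Adding a pipeline with a geoip processor changes the decision.
pipelines["with-geoip"] = {"processors": [{"geoip": {"field": "client_ip"}}]}
print(should_download_databases(pipelines))
```

Under this model the eager setting short-circuits the check, which matches the "or if explicitly set to true" wording of the commit message.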
Keith Massey f327352601
Making JsonProcessor stricter so that it does not silently drop data (#93179)
This PR makes JsonProcessor's JSON parsing a little bit stricter so that
we are not silently dropping data when given bad inputs. Previously if
the input string began with something that could be parsed as a valid
json field, then the processor would grab that and ignore the rest. For
example, `123 "foo"` would be parsed as `123`, dropping the `"foo"`. Now
by default it will throw an IllegalArgumentException on a string like
this. A user can now set the `strict_json_parsing` parameter to false to
get the old behavior. For example:

```
POST _ingest/pipeline/_simulate
{
  "pipeline": {
    "description": "",
    "processors" : [
      {
        "json" : {
          "field" : "message",
          "strict_json_parsing": false
        }
      }
    ]
  },
  "docs": [
    {
      "_source": {
        "message": "123 \"foo\""
      }
    }
  ]
}
```

Closes #92898
2023-01-24 18:43:35 -05:00
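
The difference between the old and new parsing behavior in #93179 can be illustrated with Python's standard `json` module. This is only an analogy under the assumption that "lenient" means "take the first valid JSON value and ignore the rest"; it is not how JsonProcessor is implemented, and the helper names are hypothetical.

```python
import json


def parse_lenient(text):
    """Parse the first JSON value in text and ignore anything after it."""
    value, _end = json.JSONDecoder().raw_decode(text.lstrip())
    return value


def parse_strict(text):
    """Parse text as exactly one JSON value; trailing content is an error."""
    return json.loads(text)


message = '123 "foo"'

# Lenient parsing keeps the leading number and silently drops the rest.
print(parse_lenient(message))

# Strict parsing refuses the same input instead of dropping data.
try:
    parse_strict(message)
except json.JSONDecodeError as err:
    print("strict parsing rejected the input:", err.msg)
```

The strict variant mirrors the new default; the lenient variant corresponds to setting `strict_json_parsing` to false.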
David Kilfoyle 37f7b7b325
[Docs] Update remove processor with 'keep' option (#92836) 2023-01-11 12:52:35 -05:00
Keith Massey d5b4584612
Adding more detail about ingest.geoip.downloader.endpoint (#91182) 2022-11-09 09:17:33 -06:00
Roberto Seldner 8e35a6a846
Update documentation with supported IANA numbers (#90531)
Based on this:
https://github.com/elastic/elasticsearch/blob/main/modules/ingest-common/src/main/java/org/elasticsearch/ingest/common/CommunityIdProcessor.java#L440-L451
2022-10-19 08:23:11 -05:00
Lee Hinman 4fe9fc488c
Deprecate 'remove_binary' default of false for ingest attachment processor (#90460)
This commit adds deprecation warning for when the `remove_binary`
setting is unset. In the future we want to change the default to `true`
(it is currently `false`), so this will let a user know they should be
explicit about setting this to ensure the behavior does not change in a
future (breaking) release.

Relates to #86014
2022-10-04 01:04:40 +10:30
Abdon Pijpelink 00d4953df5
[DOCS] Fixes broken example in pipeline tutorial (#89315) 2022-08-16 08:40:10 +02:00
István Zoltán Szabó 5372c51dfd
[DOCS] Fixes a link that breaks the docs build. (#88111) 2022-06-28 10:22:23 +02:00
Ryan Ernst eed8da3919
Move the ingest attachment processor to the default distribution (#87989)
The ingest attachment processor is currently available as a plugin. This
commit moves the processor to the default distribution so it is always
available.
2022-06-28 02:10:36 -04:00
Stuart Tettemer d42211c431
Ingest: IngestDocument requires non-null version (#87665)
Changes the type of the version parameter in `IngestDocument` from
`Long` to `long` and moves it to the third argument, so all required
values occur before nullable arguments.

The `IngestService` expects a non-null version for a document and will
throw a `NullPointerException` if one is not provided.

Related: #87309
2022-06-15 07:50:45 -05:00
Martijn van Groningen 7154608abf
Allow pipeline processor to ignore missing pipelines (#87354)
Add `ignore_missing_pipeline` option to `pipeline` processor. This
controls whether the `pipeline` processor should fail with an error if
no pipeline with a name specified in the `name` option exists.

This enhancement is useful for setting up a pipeline infrastructure that
lazily adds extension points for overwrites, so that custom
pre-processing can be added later for specific cluster setups.

Relates to #87323
2022-06-07 07:02:18 -04:00
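
A toy model of the `ignore_missing_pipeline` behavior from #87354 follows. It assumes pipelines are plain Python callables stored in a dict; the registry, the exception class, and the pipeline name `logs@custom` are hypothetical and exist only for this sketch.

```python
class PipelineNotFound(Exception):
    """Raised when a referenced pipeline does not exist (hypothetical type)."""


def run_pipeline_processor(doc, registry, name, ignore_missing_pipeline=False):
    """Run the named pipeline on doc, optionally tolerating a missing pipeline."""
    pipeline = registry.get(name)
    if pipeline is None:
        if ignore_missing_pipeline:
            # The extension point is not defined yet: pass the document through.
            return doc
        raise PipelineNotFound(f"pipeline [{name}] does not exist")
    return pipeline(doc)


registry = {}
doc = {"message": "hello"}

# With the option enabled, a missing pipeline is a no-op.
print(run_pipeline_processor(doc, registry, "logs@custom", ignore_missing_pipeline=True))

# Once the extension point is registered, it is applied as usual.
registry["logs@custom"] = lambda d: {**d, "custom": True}
print(run_pipeline_processor(doc, registry, "logs@custom", ignore_missing_pipeline=True))
```

This is the "lazy extension point" pattern the commit message describes: the caller can reference a pipeline before anyone has created it.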
wallrik 10f53f8766
Clarify environments with strict firewalls and GEOIP (RE: #85637) (#86648) 2022-05-23 06:43:26 -06:00
Luca Belluccini 1c52081b1f
[DOC] Air gapped environments and GEOIP (#85637)
* [DOC] Air gapped environments and GEOIP

Closing https://github.com/elastic/elasticsearch/issues/85542

* Use variable name for Elasticsearch

Co-authored-by: Adam Locke <adam.locke@elastic.co>
2022-05-10 16:34:28 -04:00
Benjamin Trent 258d2b71e2
[ML] add roberta/bart docs (#85001)
adds roberta section to NLP tokenization documentation.
2022-03-17 12:14:57 -04:00
Benjamin Trent 45deac4c96
[ML] add windowing support for text_classification (#83989)
This commit adds initial windowing support for text_classification tasks.

Specifically, a user can now set a (non-negative) span that controls the tokenization
windowing used when creating sub-sequences.

The default value, `span: -1`, indicates that no windowing should take place.
2022-03-01 08:29:12 -05:00
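
One way to picture the windowing in #83989 is a sliding window over the token sequence. The sketch below assumes that `span` is the number of tokens shared by consecutive sub-sequences and that a maximum window length exists; both the `max_len` parameter and this reading of `span` are assumptions, not details confirmed by the commit message.

```python
def window_tokens(tokens, max_len, span):
    """Split tokens into sub-sequences of at most max_len tokens.

    span is the assumed overlap between consecutive windows;
    span == -1 disables windowing and returns the sequence unsplit.
    """
    if span == -1:
        return [tokens]
    if not 0 <= span < max_len:
        raise ValueError("span must be -1 or satisfy 0 <= span < max_len")
    step = max_len - span  # how far each new window advances
    windows = []
    start = 0
    while True:
        windows.append(tokens[start:start + max_len])
        if start + max_len >= len(tokens):
            break  # the last window reached the end of the sequence
        start += step
    return windows


tokens = list(range(10))
for window in window_tokens(tokens, max_len=4, span=2):
    print(window)
```

A larger span gives each sub-sequence more shared context at the cost of producing more windows.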
James Rodewig d3d468e5f1
[DOCS] Update screenshots for ingest pipeline docs (#83845)
https://github.com/elastic/kibana/pull/101216 adds a new ECS mapper feature to the Ingest Pipelines UI. This updates the ES docs to cover the new feature.
2022-02-23 10:50:02 -05:00
Chris 3e72ffcac9
[DOCS] Change license abbreviation (#82266)
As far as I can see the correct abbreviation for the CC `Attribution-ShareAlike 4.0 International` License is `CC BY-SA 4.0` https://creativecommons.org/licenses/by-sa/4.0/
2022-01-13 09:38:42 -05:00
David Kyle 1473b09415
[ML] Add NLP inference configs to the inference processor docs (#82320) 2022-01-11 08:50:45 +00:00
James Rodewig f1004ee698
[DOCS] Fix xref for conditionally running ingest processor (#82001)
Closes #81966
2021-12-21 11:37:20 -05:00
Lisa Cawley 076343933f
[DOCS] Update link in inference processor (#81897) 2021-12-17 15:49:59 -08:00
Dan Hermann b1f5373e02
Correct docs on output_format option for date processor (#81557) 2021-12-17 06:07:03 -06:00
Lisa Cawley b18f5fd2c6
[DOCS] Fixes link to language identification example (#81347) 2021-12-03 17:21:04 -08:00
Jan Doberstein 73b3d8f639
Update execute-enrich-policy.asciidoc (#80750)
Changed the wording, as the execution of the policy does not trigger the delete. That delete is done periodically and can be configured with the `enrich.cleanup_period` setting.

https://www.elastic.co/guide/en/elasticsearch/reference/7.16/enrich-setup.html#ingest-enrich-settings
2021-11-16 11:57:04 +01:00
James Rodewig f56a0f4b66
[DOCS] Remove `testenv` annotations from doc snippet tests (#80023)
Removes `testenv` annotations and related code. These annotations originally let you skip x-pack snippet tests in the docs. However, that's no longer possible.

Relates to #79309, #31619
2021-11-05 18:38:50 -04:00
edh-oss 3c23a9e9cd
[DOCS] Remove `[testenv="gold+"]` attributes (#79309)
Changes:

* Removes several `[testenv="gold+"]` attributes from the docs. `gold+` is not a valid [subscription level](https://www.elastic.co/subscriptions) or testenv value.
* Moves two `[testenv="basic"]` attributes to the file header. This makes the `testenv` placement consistent and fixes the yml file generated from `docs/reference/snapshot-restore/register-repository.asciidoc`.

Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2021-10-27 16:32:30 -04:00
Michael Bischoff c30ab868ee
[DOCS] Document range enrich policy (#79607)
Adding docs for the range enrich policy

Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2021-10-26 15:15:53 +02:00
Dan Hermann a23f58f809
[DOCS] `if_version` parameter for OCC on pipeline updates (#79640) 2021-10-25 08:25:26 -05:00
James Rodewig 58abbe941f
[DOCS] Fix cluster update settings refs (#79580)
The API is named 'cluster update settings,' not 'update cluster settings.'
2021-10-20 13:16:35 -04:00
Nikola Grcevski 055c770083
Deprecation of transient cluster settings (#78794)
This PR changes uses of transient cluster settings to
persistent cluster settings. 

The PR also deprecates the transient settings usage.

Relates to #49540
2021-10-15 13:00:52 -04:00
Martijn van Groningen 230e866842
Document a number of enrich node settings. (#78930)
Add a section in the docs that describes a number of node level settings
for the enrich processor.

Co-authored-by: James Rodewig <40268737+jrodewig@users.noreply.github.com>
2021-10-14 15:00:45 +02:00
Martijn van Groningen 04e5823a69
Remove default maxmind geoip databases from distribution (#78362)
* Adjusted integration tests to use geoip test fixture or to use test databases provided via config dirs (for qa module / docs).
* Kept the geolite2-databases dependency for most of the unit tests only.
* Made fallback_to_default_databases parameter on geoip processor a noop and emit deprecation warning upon using it.
* If no geoip databases are available yet to a node then the geoip processor factory returns a processor implementation that flags documents that databases are unavailable. This allows these documents to be reindexed later with a pipeline. These documents will have a tag string array field, which contains a string `_geoip_database_unavailable_{database_name}` for each missing database in a pipeline.
* Added reload pipeline capabilities in IngestService, so that when databases are available again on a node then pipelines with geoip processor definition can be reloaded.

Relates to #68920
2021-10-13 14:52:18 +02:00
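
The tagging scheme from #78362 makes it possible to find documents that need to be reindexed once the databases arrive. The sketch below extracts the missing database names from a document; the field name `tags`, the helper name, and the sample database file name are assumptions, since the commit message only says "a tag string array field".

```python
UNAVAILABLE_PREFIX = "_geoip_database_unavailable_"


def missing_databases(doc, tag_field="tags"):
    """Return the database names flagged as unavailable on this document."""
    return [
        tag[len(UNAVAILABLE_PREFIX):]
        for tag in doc.get(tag_field, [])
        if tag.startswith(UNAVAILABLE_PREFIX)
    ]


doc = {
    "ip": "203.0.113.7",
    "tags": ["_geoip_database_unavailable_GeoLite2-City.mmdb", "ingested"],
}

# Documents with a non-empty result are candidates for reindexing.
print(missing_databases(doc))
```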
James Rodewig a763a86a0d
[DOCS] Update ingest node pipeline refs (#78770)
In https://github.com/elastic/kibana/pull/113783, we renamed Kibana's **Ingest Node Pipelines** feature to **Ingest Pipelines**. This updates screenshots and references for the feature. It also replaces a few remaining `ingest node pipeline` references.
2021-10-12 08:18:24 -04:00