---
navigation_title: "Attachment"
mapped_pages:
  - https://www.elastic.co/guide/en/elasticsearch/reference/current/attachment.html
---

# Attachment processor [attachment]


The attachment processor lets Elasticsearch extract file attachments in common formats (such as PPT, XLS, and PDF) by using the Apache text extraction library [Tika](https://tika.apache.org/).

The source field must be a base64-encoded binary. To avoid the overhead of converting to and from base64, you can send the document as CBOR instead of JSON and specify the field as a byte array rather than a string. In that case, the processor skips the base64 decoding.
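
Base64 represents every 3 bytes of input as 4 ASCII characters, so a base64-encoded file is roughly a third larger than the original. The following Python sketch uses only the standard library. `base64_overhead` is a hypothetical helper for illustration, not part of any Elasticsearch client. It measures that growth for an arbitrary payload:

```python
import base64


def base64_overhead(raw: bytes) -> tuple:
    """Return (raw size, base64-encoded size) in bytes for a payload."""
    encoded = base64.b64encode(raw)  # standard alphabet, padded, no line breaks
    return len(raw), len(encoded)


# A hypothetical 3 MB file grows to 4 MB once base64-encoded.
print(base64_overhead(bytes(3_000_000)))  # (3000000, 4000000)
```

With CBOR, the bytes are sent as-is, so this growth and the decoding step on the ingest node are both avoided.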

## Using the attachment processor in a pipeline [using-attachment]

$$$attachment-options$$$

| Name | Required | Default | Description |
| --- | --- | --- | --- |
| `field` | yes | - | The field that contains the base64-encoded binary (or, with CBOR, the raw bytes). |
| `target_field` | no | `attachment` | The field that will hold the extracted attachment information. |
| `indexed_chars` | no | `100000` | The maximum number of characters to extract, to prevent huge fields. Use `-1` for no limit. |
| `indexed_chars_field` | no | `null` | Name of a document field whose value, if present, overrides `indexed_chars` for that document. See `indexed_chars`. |
| `properties` | no | all properties | Array of properties to store. Can be `content`, `title`, `name`, `author`, `keywords`, `date`, `content_type`, `content_length`, `language` |
| `ignore_missing` | no | `false` | If `true` and `field` does not exist, the processor quietly exits without modifying the document. |
| `remove_binary` | encouraged | `false` | If `true`, the binary `field` is removed from the document. This option is not required, but setting it explicitly is encouraged; omitting it results in a warning. |
| `resource_name` | no | - | Field containing the name of the resource to decode. If specified, the processor passes this resource name to the underlying Tika library to enable [Resource Name Based Detection](https://tika.apache.org/1.24.1/detection.html#Resource_Name_Based_Detection). |
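
If you generate pipeline definitions from code, the options in the table map directly onto a JSON object. The following Python sketch builds that object as a plain `dict`. `attachment_processor` is a hypothetical helper, not an Elasticsearch client API. It makes `remove_binary` a required keyword argument so that it is always set explicitly, which avoids the warning described above, and it omits every option left as `None` so that the processor's defaults apply.

```python
import json


def attachment_processor(field, *, remove_binary, target_field=None,
                         indexed_chars=None, indexed_chars_field=None,
                         properties=None, ignore_missing=None, resource_name=None):
    """Build an `attachment` processor definition (hypothetical helper).

    `remove_binary` has no default, so callers must always choose a value.
    Options passed as None are left out so the processor defaults apply.
    """
    config = {"field": field, "remove_binary": remove_binary}
    optional = {
        "target_field": target_field,
        "indexed_chars": indexed_chars,
        "indexed_chars_field": indexed_chars_field,
        "properties": properties,
        "ignore_missing": ignore_missing,
        "resource_name": resource_name,
    }
    config.update({k: v for k, v in optional.items() if v is not None})
    return {"attachment": config}


# Same shape as the request body for PUT _ingest/pipeline/attachment
pipeline = {
    "description": "Extract attachment information",
    "processors": [
        attachment_processor("data", remove_binary=True, properties=["content", "title"])
    ],
}
print(json.dumps(pipeline, indent=2))
```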


### Example [attachment-json-ex]

If attaching files to JSON documents, you must first encode the file as a base64 string. On Unix-like systems, you can do this using a `base64` command:

```shell
base64 -in myfile.rtf
```

The command returns the base64-encoded string for the file. The following base64 string is for an `.rtf` file containing the text `Lorem ipsum dolor sit amet`: `e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=`.
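
You can also produce the string from a script. The following Python sketch uses only the standard library, and `encode_file` and `attachment_body` are hypothetical helper names. Python's `base64.b64encode` emits a single line, whereas GNU `base64` wraps its output at 76 characters by default. The sketch recreates the sample `.rtf` file from the string above and should print a JSON body containing that same string.

```python
import base64
import json
import os
import tempfile


def encode_file(path: str) -> str:
    """Return the file's bytes as a single-line base64 string."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")


def attachment_body(path: str) -> str:
    """Build the JSON body for an index request, with the file in the `data` field."""
    return json.dumps({"data": encode_file(path)})


# Recreate the sample .rtf file (obtained by decoding the string above) and encode it.
SAMPLE_RTF = b"{\\rtf1\\ansi\r\nLorem ipsum dolor sit amet\r\n\\par }"
with tempfile.NamedTemporaryFile(suffix=".rtf", delete=False) as tmp:
    tmp.write(SAMPLE_RTF)
try:
    print(attachment_body(tmp.name))
finally:
    os.remove(tmp.name)
```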

Use an attachment processor to decode the string and extract the file’s properties:

```console
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "remove_binary": true
      }
    }
  ]
}
PUT my-index-000001/_doc/my_id?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}
GET my-index-000001/_doc/my_id
```

The document’s `attachment` object contains extracted properties for the file:

```console-result
{
  "found": true,
  "_index": "my-index-000001",
  "_id": "my_id",
  "_version": 1,
  "_seq_no": 22,
  "_primary_term": 1,
  "_source": {
    "attachment": {
      "content_type": "application/rtf",
      "language": "ro",
      "content": "Lorem ipsum dolor sit amet",
      "content_length": 28
    }
  }
}
```


## Exported fields [attachment-fields]

The fields that might be extracted from a document are:

* `content`
* `title`
* `author`
* `keywords`
* `date`
* `content_type`
* `content_length`
* `language`
* `modified`
* `format`
* `identifier`
* `contributor`
* `coverage`
* `modifier`
* `creator_tool`
* `publisher`
* `relation`
* `rights`
* `source`
* `type`
* `description`
* `print_date`
* `metadata_date`
* `latitude`
* `longitude`
* `altitude`
* `rating`
* `comments`

To extract only certain `attachment` fields, specify the `properties` array:

```console
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "properties": [ "content", "title" ],
        "remove_binary": true
      }
    }
  ]
}
```

::::{note}
Extracting content from binary data is resource-intensive. We highly recommend running pipelines that use this processor on a dedicated ingest node.
::::


## Keeping the attachment binary [attachment-keep-binary]

Keeping the binary as a field in the document can consume a lot of resources. We highly recommend removing it by setting `remove_binary` to `true`, as in the other examples on this page. If you *do* want to keep the binary field, explicitly set `remove_binary` to `false` to avoid the warning you get when the option is omitted:

```console
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information including original binary",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "remove_binary": false
      }
    }
  ]
}
PUT my-index-000001/_doc/my_id?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}
GET my-index-000001/_doc/my_id
```

The document’s `_source` object includes the original binary field:

```console-result
{
  "found": true,
  "_index": "my-index-000001",
  "_id": "my_id",
  "_version": 1,
  "_seq_no": 22,
  "_primary_term": 1,
  "_source": {
    "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
    "attachment": {
      "content_type": "application/rtf",
      "language": "ro",
      "content": "Lorem ipsum dolor sit amet",
      "content_length": 28
    }
  }
}
```


## Use the attachment processor with CBOR [attachment-cbor]

To avoid encoding binary data as base64 for JSON (and decoding it again during ingest), you can instead pass CBOR data to the attachment processor. For example, the following request creates the `cbor-attachment` pipeline, which uses the attachment processor.

```console
PUT _ingest/pipeline/cbor-attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "remove_binary": true
      }
    }
  ]
}
```

The following Python script passes CBOR data to an HTTP indexing request that includes the `cbor-attachment` pipeline. The HTTP request headers use a `content-type` of `application/cbor`.

::::{note}
Not all {{es}} clients support custom HTTP request headers.
::::


```python
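import cbor2
import requests

file = 'my-file'
# Tell Elasticsearch that the request body is CBOR rather than JSON
headers = {'content-type': 'application/cbor'}

with open(file, 'rb') as f:
  # Store the raw file bytes; CBOR encodes them as a byte string, so no base64 step is needed
  doc = {
    'data': f.read()
  }

response = requests.put(
  'http://localhost:9200/my-index-000001/_doc/my_id?pipeline=cbor-attachment',
  data=cbor2.dumps(doc),
  headers=headers
)
# Raise an exception if Elasticsearch rejected the request
response.raise_for_status()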
```
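
CBOR avoids the base64 step because a CBOR byte string is the raw bytes preceded by a short length header (RFC 8949, major type 2). The following standard-library sketch hand-encodes a `{'data': <bytes>}` document for illustration only. `encode_attachment_doc` is a hypothetical name, and the sketch is not a replacement for `cbor2`, which the script above uses and which should produce the same bytes for this simple one-key map.

```python
import struct


def _cbor_head(major_type: int, length: int) -> bytes:
    """Encode a CBOR initial byte plus its length argument (shortest form)."""
    if length < 24:
        return bytes([(major_type << 5) | length])
    if length < 2**8:
        return bytes([(major_type << 5) | 24]) + struct.pack(">B", length)
    if length < 2**16:
        return bytes([(major_type << 5) | 25]) + struct.pack(">H", length)
    if length < 2**32:
        return bytes([(major_type << 5) | 26]) + struct.pack(">I", length)
    return bytes([(major_type << 5) | 27]) + struct.pack(">Q", length)


def encode_attachment_doc(field: str, payload: bytes) -> bytes:
    """Encode {field: payload} as a CBOR map with one text key and one byte-string value."""
    key = field.encode("utf-8")
    return (
        _cbor_head(5, 1)                           # major type 5: map with 1 pair
        + _cbor_head(3, len(key)) + key            # major type 3: UTF-8 text string
        + _cbor_head(2, len(payload)) + payload    # major type 2: raw bytes, no base64
    )


print(encode_attachment_doc("data", b"\x01\x02\x03").hex())  # a1646461746143010203
```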


## Limit the number of extracted chars [attachment-extracted-chars]

To prevent extracting too many characters and overloading the node's memory, the number of characters used for extraction is limited to `100000` by default. You can change this value by setting `indexed_chars`. Use `-1` for no limit, but make sure your node has enough heap to extract the content of very large documents.

You can also set this limit per document by reading it from a field in the document. Use the `indexed_chars_field` setting to name that field. If a document contains the field, its value overrides the `indexed_chars` setting for that document.

For example:

```console
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "indexed_chars" : 11,
        "indexed_chars_field" : "max_size",
        "remove_binary": true
      }
    }
  ]
}
PUT my-index-000001/_doc/my_id?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}
GET my-index-000001/_doc/my_id
```

Returns this:

```console-result
{
  "found": true,
  "_index": "my-index-000001",
  "_id": "my_id",
  "_version": 1,
  "_seq_no": 35,
  "_primary_term": 1,
  "_source": {
    "attachment": {
      "content_type": "application/rtf",
      "language": "is",
      "content": "Lorem ipsum",
      "content_length": 11
    }
  }
}
```
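
In the following request, the document includes a `max_size` field, so its value (`5`) overrides the `indexed_chars` setting of `11`:
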
```console
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "indexed_chars" : 11,
        "indexed_chars_field" : "max_size",
        "remove_binary": true
      }
    }
  ]
}
PUT my-index-000001/_doc/my_id_2?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
  "max_size": 5
}
GET my-index-000001/_doc/my_id_2
```

Returns this:

```console-result
{
  "found": true,
  "_index": "my-index-000001",
  "_id": "my_id_2",
  "_version": 1,
  "_seq_no": 40,
  "_primary_term": 1,
  "_source": {
    "max_size": 5,
    "attachment": {
      "content_type": "application/rtf",
      "language": "sl",
      "content": "Lorem",
      "content_length": 5
    }
  }
}
```
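
The override rule in the two examples above can be modeled in a few lines of Python. This is a sketch: `effective_limit` and `truncate` are hypothetical helpers. They reproduce only which limit wins and its effect on the example text. They do not reproduce how the processor enforces the limit internally during Tika extraction.

```python
from typing import Optional


def effective_limit(doc: dict, indexed_chars: int = 100_000,
                    indexed_chars_field: Optional[str] = None) -> int:
    """Use the document's indexed_chars_field value if present, else indexed_chars."""
    if indexed_chars_field is not None and indexed_chars_field in doc:
        return int(doc[indexed_chars_field])
    return indexed_chars


def truncate(text: str, limit: int) -> str:
    """Keep at most `limit` characters; -1 means no limit."""
    return text if limit == -1 else text[:limit]


text = "Lorem ipsum dolor sit amet"
print(truncate(text, effective_limit({}, 11, "max_size")))               # Lorem ipsum
print(truncate(text, effective_limit({"max_size": 5}, 11, "max_size")))  # Lorem
```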


## Using the attachment processor with arrays [attachment-with-arrays]

To use the attachment processor on an array of attachments, you need the [foreach processor](/reference/enrich-processor/foreach-processor.md). It runs the attachment processor on each element of the array.

For example, given the following source:

```js
{
  "attachments" : [
    {
      "filename" : "ipsum.txt",
      "data" : "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo="
    },
    {
      "filename" : "test.txt",
      "data" : "VGhpcyBpcyBhIHRlc3QK"
    }
  ]
}
```

In this case, we want to process the `data` field of each element in the `attachments` array and insert the extracted properties into that same element, so we use the following `foreach` processor:

```console
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information from arrays",
  "processors" : [
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "target_field": "_ingest._value.attachment",
            "field": "_ingest._value.data",
            "remove_binary": true
          }
        }
      }
    }
  ]
}
PUT my-index-000001/_doc/my_id?pipeline=attachment
{
  "attachments" : [
    {
      "filename" : "ipsum.txt",
      "data" : "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo="
    },
    {
      "filename" : "test.txt",
      "data" : "VGhpcyBpcyBhIHRlc3QK"
    }
  ]
}
GET my-index-000001/_doc/my_id
```

Returns this:

```console-result
{
  "_index" : "my-index-000001",
  "_id" : "my_id",
  "_version" : 1,
  "_seq_no" : 50,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "attachments" : [
      {
        "filename" : "ipsum.txt",
        "attachment" : {
          "content_type" : "text/plain; charset=ISO-8859-1",
          "language" : "en",
          "content" : "this is\njust some text",
          "content_length" : 24
        }
      },
      {
        "filename" : "test.txt",
        "attachment" : {
          "content_type" : "text/plain; charset=ISO-8859-1",
          "language" : "en",
          "content" : "This is a test",
          "content_length" : 16
        }
      }
    ]
  }
}
```

Note that `target_field` must be set. Otherwise, the default value is used, which is the top-level field `attachment`, and the properties in this top-level field will contain the values of the first attachment only. Setting `target_field` to a field under `_ingest._value` associates the extracted properties with the correct attachment.
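
To make the per-element placement concrete, the following Python sketch mimics what the pipeline above does to the document's structure. `_ingest._value` corresponds to the current loop element, and results are written under that element. `extract_stub` and `foreach_attachment` are hypothetical names. The stub only base64-decodes plain text; it is not Tika, so it does not report `content_type`, `language`, or `content_length`.

```python
import base64
import copy
import json


def extract_stub(b64: str) -> dict:
    """Hypothetical stand-in for the attachment processor: plain-text decode only."""
    return {"content": base64.b64decode(b64).decode("utf-8").strip()}


def foreach_attachment(doc: dict, array_field: str = "attachments",
                       data_field: str = "data", target: str = "attachment",
                       remove_binary: bool = True) -> dict:
    """Mimic foreach + attachment with target_field set under _ingest._value."""
    result = copy.deepcopy(doc)  # leave the caller's document untouched
    for element in result[array_field]:  # each element plays the role of _ingest._value
        element[target] = extract_stub(element[data_field])
        if remove_binary:
            del element[data_field]
    return result


source = {
    "attachments": [
        {"filename": "ipsum.txt", "data": "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo="},
        {"filename": "test.txt", "data": "VGhpcyBpcyBhIHRlc3QK"},
    ]
}
print(json.dumps(foreach_attachment(source), indent=2))
```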
-												[docs] Migrate docs from AsciiDoc to Markdown (#123507)

* delete asciidoc files

* add migrated files

* fix errors

* Disable docs tests

* Clarify release notes page titles

* Revert "Clarify release notes page titles"

This reverts commit 8be688648dcc9249943fbaabe37b0d58f87c09e8.

* Comment out edternal URI images

* Clean up query languages landing pages, link to conceptual docs

* Add .md to url

* Fixes inference processor nesting.

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Martijn Laarman <Mpdreamz@gmail.com>
Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
											
										
										
											2025-02-28 00:56:14 +08:00
+								---
 								navigation_title: "Attachment"
 								mapped_pages:
 								  - https://www.elastic.co/guide/en/elasticsearch/reference/current/attachment.html
 								---
 								# Attachment processor [attachment]
 								The attachment processor lets Elasticsearch extract file attachments in common formats (such as PPT, XLS, and PDF) by using the Apache text extraction library [Tika](https://tika.apache.org/).
 								The source field must be a base64 encoded binary. If you do not want to incur the overhead of converting back and forth between base64, you can use the CBOR format instead of JSON and specify the field as a bytes array instead of a string representation. The processor will skip the base64 decoding then.
 								## Using the attachment processor in a pipeline [using-attachment]
 								$$$attachment-options$$$
 								| Name | Required | Default | Description |
 								| --- | --- | --- | --- |
 								| `field` | yes | - | The field to get the base64 encoded field from |
 								| `target_field` | no | attachment | The field that will hold the attachment information |
 								| `indexed_chars` | no | 100000 | The number of chars being used for extraction to prevent huge fields. Use `-1` for no limit. |
 								| `indexed_chars_field` | no | `null` | Field name from which you can overwrite the number of chars being used for extraction. See `indexed_chars`. |
-												[DOCS] Replace irregular whitespaces in docs (#128199)

* Replace irregular whitespaces

* More chars
											
										
										
											2025-05-20 22:20:22 +08:00
+								| `properties` | no | all properties |  Array of properties to select to be stored. Can be `content`, `title`, `name`, `author`, `keywords`, `date`, `content_type`, `content_length`, `language` |
-												[docs] Migrate docs from AsciiDoc to Markdown (#123507)

* delete asciidoc files

* add migrated files

* fix errors

* Disable docs tests

* Clarify release notes page titles

* Revert "Clarify release notes page titles"

This reverts commit 8be688648dcc9249943fbaabe37b0d58f87c09e8.

* Comment out edternal URI images

* Clean up query languages landing pages, link to conceptual docs

* Add .md to url

* Fixes inference processor nesting.

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Martijn Laarman <Mpdreamz@gmail.com>
Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
											
										
										
											2025-02-28 00:56:14 +08:00
+								| `ignore_missing` | no | `false` | If `true` and `field` does not exist, the processor quietly exits without modifying the document |
 								| `remove_binary` | encouraged | `false` | If `true`, the binary `field` will be removed from the document. This option is not required, but setting it explicitly is encouraged, and omitting it will result in a warning. |
-												[DOCS] fix external links (#124248)


											
										
										
											2025-03-07 00:27:03 +08:00
+								| `resource_name` | no |  | Field containing the name of the resource to decode. If specified, the processor passes this resource name to the underlying Tika library to enable [Resource Name Based Detection](https://tika.apache.org/1.24.1/detection.html#Resource_Name_Based_Detection). |
-												[docs] Migrate docs from AsciiDoc to Markdown (#123507)

* delete asciidoc files

* add migrated files

* fix errors

* Disable docs tests

* Clarify release notes page titles

* Revert "Clarify release notes page titles"

This reverts commit 8be688648dcc9249943fbaabe37b0d58f87c09e8.

* Comment out edternal URI images

* Clean up query languages landing pages, link to conceptual docs

* Add .md to url

* Fixes inference processor nesting.

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Martijn Laarman <Mpdreamz@gmail.com>
Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
											
										
										
											2025-02-28 00:56:14 +08:00
 								### Example [attachment-json-ex]
 								If attaching files to JSON documents, you must first encode the file as a base64 string. On Unix-like systems, you can do this using a `base64` command:
 								```shell
-												[DOCS] Clarify ingest attachment example (#65143)


											
										
										
											2020-11-18 03:17:45 +08:00
+								base64 -in myfile.rtf
-												[docs] Migrate docs from AsciiDoc to Markdown (#123507)

* delete asciidoc files

* add migrated files

* fix errors

* Disable docs tests

* Clarify release notes page titles

* Revert "Clarify release notes page titles"

This reverts commit 8be688648dcc9249943fbaabe37b0d58f87c09e8.

* Comment out edternal URI images

* Clean up query languages landing pages, link to conceptual docs

* Add .md to url

* Fixes inference processor nesting.

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Martijn Laarman <Mpdreamz@gmail.com>
Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
											
										
										
											2025-02-28 00:56:14 +08:00
+								```
-												[DOCS] Clarify ingest attachment example (#65143)


											
										
										
											2020-11-18 03:17:45 +08:00
-												[docs] Migrate docs from AsciiDoc to Markdown (#123507)

* delete asciidoc files

* add migrated files

* fix errors

* Disable docs tests

* Clarify release notes page titles

* Revert "Clarify release notes page titles"

This reverts commit 8be688648dcc9249943fbaabe37b0d58f87c09e8.

* Comment out edternal URI images

* Clean up query languages landing pages, link to conceptual docs

* Add .md to url

* Fixes inference processor nesting.

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Martijn Laarman <Mpdreamz@gmail.com>
Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
											
										
										
											2025-02-28 00:56:14 +08:00
+								The command returns the base64-encoded string for the file. The following base64 string is for an `.rtf` file containing the text `Lorem ipsum dolor sit amet`: `e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=`.
-												[DOCS] Clarify ingest attachment example (#65143)


											
										
										
											2020-11-18 03:17:45 +08:00
-												[docs] Migrate docs from AsciiDoc to Markdown (#123507)

* delete asciidoc files

* add migrated files

* fix errors

* Disable docs tests

* Clarify release notes page titles

* Revert "Clarify release notes page titles"

This reverts commit 8be688648dcc9249943fbaabe37b0d58f87c09e8.

* Comment out edternal URI images

* Clean up query languages landing pages, link to conceptual docs

* Add .md to url

* Fixes inference processor nesting.

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Martijn Laarman <Mpdreamz@gmail.com>
Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
											
										
										
											2025-02-28 00:56:14 +08:00
+								Use an attachment processor to decode the string and extract the file’s properties:
-												Add complete examples to some ingest docs

These examples should make it more clear what the plugins do and they
test that the snippets actually work.

Relates to #18160

											
										
										
											2016-08-10 23:12:41 +08:00
-												[docs] Migrate docs from AsciiDoc to Markdown (#123507)

* delete asciidoc files

* add migrated files

* fix errors

* Disable docs tests

* Clarify release notes page titles

* Revert "Clarify release notes page titles"

This reverts commit 8be688648dcc9249943fbaabe37b0d58f87c09e8.

* Comment out edternal URI images

* Clean up query languages landing pages, link to conceptual docs

* Add .md to url

* Fixes inference processor nesting.

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Martijn Laarman <Mpdreamz@gmail.com>
Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
											
										
										
											2025-02-28 00:56:14 +08:00
+								```console
-												Add complete examples to some ingest docs

These examples should make it more clear what the plugins do and they
test that the snippets actually work.

Relates to #18160

											
										
										
											2016-08-10 23:12:41 +08:00
+								PUT _ingest/pipeline/attachment
-												Ingest: Add attachment processor

This is a simple port of the mapper attachment plugin to the ingest
functionality, no new features. The only option is to limit
the number of chars to prevent indexing of huge documents.

Fields can be selected in the processor as well.

Close #16303

											
										
										
											2016-02-09 21:57:05 +08:00
+								{
-												Add complete examples to some ingest docs

These examples should make it more clear what the plugins do and they
test that the snippets actually work.

Relates to #18160

											
										
										
											2016-08-10 23:12:41 +08:00
+								  "description" : "Extract attachment information",
-												Ingest: Add attachment processor

This is a simple port of the mapper attachment plugin to the ingest
functionality, no new features. The only option is to limit
the number of chars to prevent indexing of huge documents.

Fields can be selected in the processor as well.

Close #16303

											
										
										
											2016-02-09 21:57:05 +08:00
+								  "processors" : [
 								    {
 								      "attachment" : {
-												Deprecate 'remove_binary' default of false for ingest attachment processor (#90460)

This commit adds deprecation warning for when the `remove_binary`
setting is unset. In the future we want to change the default to `true`
(it is currently `false`), so this will let a user know they should be
explicit about setting this to ensure the behavior does not change in a
future (breaking) release.

Relates to #86014
											
										
										
											2022-10-03 22:34:40 +08:00
+								        "field" : "data",
-												Update `UpdateForV9` in `AttachmentProcessor` (#118186)

We are not going to make this change in V9. We may do it in V10. This
change just bumps the annotation to remind us to revisit.

Since we are living with this for a while, it seems worth improving
the documentation. This now encourages explicitly setting the option
one way or the other, since you get a warning if you omit it. It also
changes the existing examples to use true rather than false, as that's
our recommendation. And it adds a new section with an example where
it's true, and moves the content previously in a note into that
section.
											
										
										
											2024-12-09 22:28:24 +08:00
+								        "remove_binary": true
-												Ingest: Add attachment processor

This is a simple port of the mapper attachment plugin to the ingest
functionality, no new features. The only option is to limit
the number of chars to prevent indexing of huge documents.

Fields can be selected in the processor as well.

Close #16303

											
										
										
											2016-02-09 21:57:05 +08:00
+								      }
 								    }
 								  ]
 								}
-												[DOCS] Update my-index examples (#60132)

Changes the following example index names to `my-index-000001` for consistency:

* `my-index`
* `my_index`
* `myindex`
											
										
										
											2020-07-28 02:46:39 +08:00
+								PUT my-index-000001/_doc/my_id?pipeline=attachment
-												Add complete examples to some ingest docs

These examples should make it more clear what the plugins do and they
test that the snippets actually work.

Relates to #18160

											
										
										
											2016-08-10 23:12:41 +08:00
+								{
 								  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
 								}
-												[DOCS] Update my-index examples (#60132)

Changes the following example index names to `my-index-000001` for consistency:

* `my-index`
* `my_index`
* `myindex`
											
										
										
											2020-07-28 02:46:39 +08:00
+								GET my-index-000001/_doc/my_id
-												[docs] Migrate docs from AsciiDoc to Markdown (#123507)

* delete asciidoc files

* add migrated files

* fix errors

* Disable docs tests

* Clarify release notes page titles

* Revert "Clarify release notes page titles"

This reverts commit 8be688648dcc9249943fbaabe37b0d58f87c09e8.

* Comment out edternal URI images

* Clean up query languages landing pages, link to conceptual docs

* Add .md to url

* Fixes inference processor nesting.

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Martijn Laarman <Mpdreamz@gmail.com>
Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
											
										
										
											2025-02-28 00:56:14 +08:00
+								```
-												Add complete examples to some ingest docs

These examples should make it more clear what the plugins do and they
test that the snippets actually work.

Relates to #18160

											
										
										
											2016-08-10 23:12:41 +08:00
-												[docs] Migrate docs from AsciiDoc to Markdown (#123507)

* delete asciidoc files

* add migrated files

* fix errors

* Disable docs tests

* Clarify release notes page titles

* Revert "Clarify release notes page titles"

This reverts commit 8be688648dcc9249943fbaabe37b0d58f87c09e8.

* Comment out edternal URI images

* Clean up query languages landing pages, link to conceptual docs

* Add .md to url

* Fixes inference processor nesting.

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Martijn Laarman <Mpdreamz@gmail.com>
Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
											
										
										
											2025-02-28 00:56:14 +08:00
+								The document’s `attachment` object contains extracted properties for the file:
-												Add complete examples to some ingest docs

These examples should make it more clear what the plugins do and they
test that the snippets actually work.

Relates to #18160

											
										
										
											2016-08-10 23:12:41 +08:00
-												[docs] Migrate docs from AsciiDoc to Markdown (#123507)

* delete asciidoc files

* add migrated files

* fix errors

* Disable docs tests

* Clarify release notes page titles

* Revert "Clarify release notes page titles"

This reverts commit 8be688648dcc9249943fbaabe37b0d58f87c09e8.

* Comment out edternal URI images

* Clean up query languages landing pages, link to conceptual docs

* Add .md to url

* Fixes inference processor nesting.

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Martijn Laarman <Mpdreamz@gmail.com>
Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
											
										
										
											2025-02-28 00:56:14 +08:00
+								```console-result
-												Add complete examples to some ingest docs

These examples should make it more clear what the plugins do and they
test that the snippets actually work.

Relates to #18160

											
										
										
											2016-08-10 23:12:41 +08:00
+								{
 								  "found": true,
-												[DOCS] Update my-index examples (#60132)

Changes the following example index names to `my-index-000001` for consistency:

* `my-index`
* `my_index`
* `myindex`
											
										
										
											2020-07-28 02:46:39 +08:00
+								  "_index": "my-index-000001",
-												Add complete examples to some ingest docs

These examples should make it more clear what the plugins do and they
test that the snippets actually work.

Relates to #18160

											
										
										
											2016-08-10 23:12:41 +08:00
+								  "_id": "my_id",
 								  "_version": 1,
-												Add doc's sequence number + primary term to GetResult and use it for updates (#36680)

This commit adds the last sequence number and primary term of the last operation that have
modified a document to `GetResult` and uses it to power the Update API.

Relates #36148 
Relates #10708
											
										
										
											2018-12-17 22:22:13 +08:00
+								  "_seq_no": 22,
 								  "_primary_term": 1,
-												Add complete examples to some ingest docs

These examples should make it more clear what the plugins do and they
test that the snippets actually work.

Relates to #18160

											
										
										
											2016-08-10 23:12:41 +08:00
+								  "_source": {
 								    "attachment": {
 								      "content_type": "application/rtf",
 								      "language": "ro",
 								      "content": "Lorem ipsum dolor sit amet",
-												Update documentation after merge with master

											
										
										
											2016-08-11 01:07:22 +08:00
+								      "content_length": 28
-												Add complete examples to some ingest docs

These examples should make it more clear what the plugins do and they
test that the snippets actually work.

Relates to #18160

											
										
										
											2016-08-10 23:12:41 +08:00
+								    }
 								  }
 								}
-												[docs] Migrate docs from AsciiDoc to Markdown (#123507)

* delete asciidoc files

* add migrated files

* fix errors

* Disable docs tests

* Clarify release notes page titles

* Revert "Clarify release notes page titles"

This reverts commit 8be688648dcc9249943fbaabe37b0d58f87c09e8.

* Comment out edternal URI images

* Clean up query languages landing pages, link to conceptual docs

* Add .md to url

* Fixes inference processor nesting.

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
Co-authored-by: Liam Thompson <leemthompo@gmail.com>
Co-authored-by: Martijn Laarman <Mpdreamz@gmail.com>
Co-authored-by: István Zoltán Szabó <szabosteve@gmail.com>
											
										
										
											2025-02-28 00:56:14 +08:00
+								```
## Exported fields [attachment-fields]

Depending on the document, the processor can extract the following fields:

* `content`
* `title`
* `author`
* `keywords`
* `date`
* `content_type`
* `content_length`
* `language`
* `modified`
* `format`
* `identifier`
* `contributor`
* `coverage`
* `modifier`
* `creator_tool`
* `publisher`
* `relation`
* `rights`
* `source`
* `type`
* `description`
* `print_date`
* `metadata_date`
* `latitude`
* `longitude`
* `altitude`
* `rating`
* `comments`

To extract only certain `attachment` fields, specify the `properties` array:

```console
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "properties": [ "content", "title" ],
        "remove_binary": true
      }
    }
  ]
}
```

::::{note}
Extracting content from binary data is resource-intensive. It is highly recommended to run pipelines that use this processor on a dedicated ingest node.
::::

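If you generate pipeline definitions from application code, a small builder can make sure `remove_binary` is always set explicitly. The following is a minimal Python sketch, assuming you send the resulting body yourself; the helper name `attachment_pipeline` is hypothetical and not part of any Elasticsearch client.

```python
import json


def attachment_pipeline(field, properties=None, remove_binary=True,
                        description="Extract attachment information"):
    """Build an ingest pipeline body with one attachment processor.

    Hypothetical helper: `remove_binary` is always written out explicitly,
    because omitting it produces a warning.
    """
    processor = {"field": field}
    if properties is not None:
        # Store only the selected attachment fields.
        processor["properties"] = list(properties)
    processor["remove_binary"] = remove_binary
    return {
        "description": description,
        "processors": [{"attachment": processor}],
    }


# Same body as the `properties` example above.
body = attachment_pipeline("data", properties=["content", "title"])
print(json.dumps(body, indent=2))
```

To register it, you could send the body as JSON, for example with `requests.put("http://localhost:9200/_ingest/pipeline/attachment", json=body)`, assuming a local cluster reachable without authentication, as in the CBOR example on this page.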
## Keeping the attachment binary [attachment-keep-binary]

Keeping the binary as a field within the document can consume a lot of resources. It is highly recommended to remove that field from the document by setting `remove_binary` to `true`, as in the other examples on this page. If you *do* want to keep the binary field, explicitly set `remove_binary` to `false` to avoid the warning you get when the option is omitted:

```console
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information including original binary",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "remove_binary": false
      }
    }
  ]
}

PUT my-index-000001/_doc/my_id?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}

GET my-index-000001/_doc/my_id
```

The document’s `_source` object includes the original binary field:

```console-result
{
  "found": true,
  "_index": "my-index-000001",
  "_id": "my_id",
  "_version": 1,
  "_seq_no": 22,
  "_primary_term": 1,
  "_source": {
    "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
    "attachment": {
      "content_type": "application/rtf",
      "language": "ro",
      "content": "Lorem ipsum dolor sit amet",
      "content_length": 28
    }
  }
}
```

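The `data` value in this example is a small RTF document encoded as base64. As a quick check, and as a starting point for encoding your own files, here is a minimal sketch using Python's standard `base64` module. The helper `encode_file` and any path you pass to it are hypothetical.

```python
import base64

# The base64 string sent in the `data` field of the example above.
SAMPLE = "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="

# Decoding it reveals a tiny RTF document containing "Lorem ipsum dolor sit amet".
raw = base64.b64decode(SAMPLE)
print(raw)


def encode_file(path):
    """Hypothetical helper: read a file and return base64 text for `data`."""
    with open(path, "rb") as f:
        return base64.b64encode(f.read()).decode("ascii")
```

A document body would then be `{"data": encode_file("my-file")}`. Base64 makes the payload about a third larger than the raw bytes, which is the overhead that the CBOR approach described in the next section avoids.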
## Use the attachment processor with CBOR [attachment-cbor]

To avoid the overhead of encoding binary data as base64 inside JSON, you can pass CBOR data to the attachment processor instead. For example, the following request creates the `cbor-attachment` pipeline, which uses the attachment processor.

```console
PUT _ingest/pipeline/cbor-attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "remove_binary": true
      }
    }
  ]
}
```

The following Python script passes CBOR data to an HTTP indexing request that includes the `cbor-attachment` pipeline. The HTTP request headers use a `content-type` of `application/cbor`.

::::{note}
Not all {{es}} clients support custom HTTP request headers.
::::

```python
import cbor2
import requests

file = 'my-file'
headers = {'content-type': 'application/cbor'}

# Read the file as raw bytes; cbor2 encodes them as a CBOR byte string,
# so no base64 conversion is needed.
with open(file, 'rb') as f:
  doc = {
    'data': f.read()
  }
  # Index the document through the cbor-attachment pipeline.
  requests.put(
    'http://localhost:9200/my-index-000001/_doc/my_id?pipeline=cbor-attachment',
    data=cbor2.dumps(doc),
    headers=headers
  )
```

## Limit the number of extracted chars [attachment-extracted-chars]

To prevent extracting too many characters and overloading the node's memory, the number of characters used for extraction is limited to `100000` by default. You can change this value by setting `indexed_chars`. Use `-1` for no limit, but when doing so, make sure your node has enough heap to extract the content of very large documents.

You can also set this limit per document. Use the `indexed_chars_field` setting to name a field in the document that holds the limit. If a document contains that field, its value overrides `indexed_chars` for that document. If the field is missing, the `indexed_chars` value applies.

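
The per-document rule can be modeled as a simple lookup with a fallback. The following Python sketch is illustrative only: `effective_indexed_chars` is a hypothetical helper, and it looks up top-level fields only (nested field paths are not handled here).

```python
from typing import Any, Mapping, Optional


def effective_indexed_chars(
    source: Mapping[str, Any],
    indexed_chars: int = 100_000,
    indexed_chars_field: Optional[str] = None,
) -> int:
    """Hypothetical helper: return the character limit for one document.

    Models the documented rule: when `indexed_chars_field` is configured and
    the document contains that field, its value overrides `indexed_chars`;
    otherwise `indexed_chars` applies. Nested field paths are not handled.
    """
    if indexed_chars_field is not None:
        per_document = source.get(indexed_chars_field)
        if per_document is not None:
            return int(per_document)  # per-document value wins
    return indexed_chars  # fall back to the pipeline-level setting


# Settings matching the example pipeline: indexed_chars=11, field "max_size".
print(effective_indexed_chars({"data": "..."}, 11, "max_size"))                 # 11
print(effective_indexed_chars({"data": "...", "max_size": 5}, 11, "max_size"))  # 5
```
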
For example:

```console
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "indexed_chars" : 11,
        "indexed_chars_field" : "max_size",
        "remove_binary": true
      }
    }
  ]
}

PUT my-index-000001/_doc/my_id?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="
}

GET my-index-000001/_doc/my_id
```

Returns this:

```console-result
{
  "found": true,
  "_index": "my-index-000001",
  "_id": "my_id",
  "_version": 1,
  "_seq_no": 35,
  "_primary_term": 1,
  "_source": {
    "attachment": {
      "content_type": "application/rtf",
      "language": "is",
      "content": "Lorem ipsum",
      "content_length": 11
    }
  }
}
```

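
The `data` value in this example is a base64-encoded RTF document whose text is `Lorem ipsum dolor sit amet` (26 characters). With `indexed_chars` set to `11` and no `max_size` field in the document, only the first 11 characters are kept, which is consistent with `"content": "Lorem ipsum"` and `"content_length": 11` in the response. The following Python snippet decodes the sample value and shows one way to produce such a value from raw file bytes. `to_attachment_data` is a hypothetical helper name.

```python
import base64

# The `data` value sent in the example request.
DATA = "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0="

rtf = base64.b64decode(DATA).decode("ascii")
print(repr(rtf))  # '{\\rtf1\\ansi\r\nLorem ipsum dolor sit amet\r\n\\par }'


def to_attachment_data(raw: bytes) -> str:
    """Hypothetical helper: base64-encode raw file bytes for the processor's `field`."""
    return base64.b64encode(raw).decode("ascii")


# Re-encoding the decoded bytes reproduces the original value.
assert to_attachment_data(rtf.encode("ascii")) == DATA
```

In practice you would pass the bytes of a real file, for example `to_attachment_data(open("report.pdf", "rb").read())`, where the file name is illustrative.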
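In the following example, the document sets `max_size` to `5`, which overrides the pipeline's `indexed_chars` value of `11`:

```console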
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information",
  "processors" : [
    {
      "attachment" : {
        "field" : "data",
        "indexed_chars" : 11,
        "indexed_chars_field" : "max_size",
        "remove_binary": true
      }
    }
  ]
}

PUT my-index-000001/_doc/my_id_2?pipeline=attachment
{
  "data": "e1xydGYxXGFuc2kNCkxvcmVtIGlwc3VtIGRvbG9yIHNpdCBhbWV0DQpccGFyIH0=",
  "max_size": 5
}

GET my-index-000001/_doc/my_id_2
```

Returns this:

```console-result
{
  "found": true,
  "_index": "my-index-000001",
  "_id": "my_id_2",
  "_version": 1,
  "_seq_no": 40,
  "_primary_term": 1,
  "_source": {
    "max_size": 5,
    "attachment": {
      "content_type": "application/rtf",
      "language": "sl",
      "content": "Lorem",
      "content_length": 5
    }
  }
}
```

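The response above comes from a document that stores `max_size: 5`. The returned `content` is cut to `"Lorem"`, with a `content_length` of 5.

The Python sketch below models the resolution rule described in the options table:

- A value read from the document under `indexed_chars_field` overrides `indexed_chars`.
- When that field is absent, `indexed_chars` (default 100000) applies.
- A value of `-1` means no limit.

This is a simplified illustration, not Elasticsearch's implementation. The helper names are hypothetical, only top-level field names are handled, and the input text `"Lorem ipsum dolor sit amet"` is an assumed stand-in for the text Tika would extract.

```python
# Hypothetical model of the per-document character limit; NOT Elasticsearch code.
# Only top-level field names are looked up (no dotted paths).
NO_LIMIT = -1
DEFAULT_INDEXED_CHARS = 100000


def resolve_indexed_chars(doc, indexed_chars=DEFAULT_INDEXED_CHARS, indexed_chars_field=None):
    """Return the character limit to use for one document.

    A value stored in the document under `indexed_chars_field` overrides
    the processor-level `indexed_chars` setting; otherwise the setting is used.
    """
    if indexed_chars_field is not None and indexed_chars_field in doc:
        return int(doc[indexed_chars_field])
    return indexed_chars


def apply_limit(text, limit):
    """Keep at most `limit` characters of the extracted text; -1 disables the limit."""
    if limit == NO_LIMIT:
        return text
    return text[:limit]


# Assumed extracted text; the document asks for at most 5 characters.
doc = {"max_size": 5}
limit = resolve_indexed_chars(doc, indexed_chars_field="max_size")
content = apply_limit("Lorem ipsum dolor sit amet", limit)
print(content, len(content))  # Lorem 5
```

A document without the `max_size` field would fall back to the processor's `indexed_chars` value.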

## Using the attachment processor with arrays [attachment-with-arrays]

To use the attachment processor on an array of attachments, the [foreach processor](/reference/enrich-processor/foreach-processor.md) is required. It runs the attachment processor on each individual element of the array.

For example, given the following source:

```js
{
  "attachments" : [
    {
      "filename" : "ipsum.txt",
      "data" : "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo="
    },
    {
      "filename" : "test.txt",
      "data" : "VGhpcyBpcyBhIHRlc3QK"
    }
  ]
}
```

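The `data` values in this source are plain base64 strings. They decode to the text files `this is\njust some text\n` and `This is a test\n`.

As a minimal sketch, assuming a Python client, one way to build such a document is shown below. The helper name `encode_attachment` is hypothetical, and the file contents are hard-coded rather than read from disk.

```python
import base64
import json


def encode_attachment(filename: str, raw: bytes) -> dict:
    """Hypothetical helper: build one array element with the bytes base64 encoded as text."""
    return {"filename": filename, "data": base64.b64encode(raw).decode("ascii")}


# Assumed file contents: two small plain-text files that end in a newline.
source = {
    "attachments": [
        encode_attachment("ipsum.txt", b"this is\njust some text\n"),
        encode_attachment("test.txt", b"This is a test\n"),
    ]
}

# Serialize as JSON for the request body.
print(json.dumps(source, indent=2))
```

Base64 inflates the payload by about a third. This is why the processor also accepts raw bytes when the request is sent as CBOR instead of JSON.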

In this case, we want to process the `data` field of each element in the `attachments` array and insert the extracted properties into the document, so we use the following `foreach` processor:

```console
PUT _ingest/pipeline/attachment
{
  "description" : "Extract attachment information from arrays",
  "processors" : [
    {
      "foreach": {
        "field": "attachments",
        "processor": {
          "attachment": {
            "target_field": "_ingest._value.attachment",
            "field": "_ingest._value.data",
            "remove_binary": true
          }
        }
      }
    }
  ]
}

PUT my-index-000001/_doc/my_id?pipeline=attachment
{
  "attachments" : [
    {
      "filename" : "ipsum.txt",
      "data" : "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo="
    },
    {
      "filename" : "test.txt",
      "data" : "VGhpcyBpcyBhIHRlc3QK"
    }
  ]
}

GET my-index-000001/_doc/my_id
```

Returns this:

```console-result
{
  "_index" : "my-index-000001",
  "_id" : "my_id",
  "_version" : 1,
  "_seq_no" : 50,
  "_primary_term" : 1,
  "found" : true,
  "_source" : {
    "attachments" : [
      {
        "filename" : "ipsum.txt",
        "attachment" : {
          "content_type" : "text/plain; charset=ISO-8859-1",
          "language" : "en",
          "content" : "this is\njust some text",
          "content_length" : 24
        }
      },
      {
        "filename" : "test.txt",
        "attachment" : {
          "content_type" : "text/plain; charset=ISO-8859-1",
          "language" : "en",
          "content" : "This is a test",
          "content_length" : 16
        }
      }
    ]
  }
}
```

Note that `target_field` must be set. Otherwise, the default value is used, which is a top-level field named `attachment`, and that field will contain the properties of the first attachment only. Setting `target_field` to a field under `_ingest._value` associates each set of extracted properties with the correct attachment.

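To make the per-element behavior concrete, here is a plain-Python model of what the `foreach` pipeline does to each array element. It is an illustration only, not how Elasticsearch executes pipelines.

- Tika is replaced by a hypothetical `fake_extract` helper. It decodes the bytes as Latin-1 text and trims surrounding whitespace.
- `content_type`, `language` and `content_length` are omitted from the output.
- The function names are assumptions made for this sketch.

```python
import base64
from copy import deepcopy


def fake_extract(raw: bytes) -> dict:
    """Hypothetical stand-in for Tika: treat the bytes as Latin-1 text and trim whitespace."""
    return {"content": raw.decode("iso-8859-1").strip()}


def run_foreach_attachment(doc: dict, array_field: str = "attachments") -> dict:
    """Mimic the foreach + attachment pipeline on a copy of `doc` (illustration only)."""
    out = deepcopy(doc)
    for element in out[array_field]:             # foreach visits each element in turn
        raw = base64.b64decode(element["data"])  # "field": "_ingest._value.data"
        # "target_field": "_ingest._value.attachment" writes into the element itself
        element["attachment"] = fake_extract(raw)
        del element["data"]                      # "remove_binary": true
    return out


doc = {
    "attachments": [
        {"filename": "ipsum.txt", "data": "dGhpcyBpcwpqdXN0IHNvbWUgdGV4dAo="},
        {"filename": "test.txt", "data": "VGhpcyBpcyBhIHRlc3QK"},
    ]
}
result = run_foreach_attachment(doc)
for element in result["attachments"]:
    print(element["filename"], repr(element["attachment"]["content"]))
```

Each set of properties lands next to its own `filename` because it is written into the element being visited. A single top-level `attachment` field has room for only one set of properties.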