[[synthetic-source]]
==== Synthetic `_source`
IMPORTANT: Synthetic `_source` is Generally Available only for TSDB indices
(indices that have `index.mode` set to `time_series`). For other indices,
synthetic `_source` is in technical preview. Features in technical preview may
be changed or removed in a future release. Elastic will work to fix
any issues, but features in technical preview are not subject to the support SLA
of official GA features.
Though very handy to have around, the source field takes up a significant amount
of space on disk. Instead of storing source documents on disk exactly as you
send them, Elasticsearch can reconstruct source content on the fly upon retrieval.
Enable this by setting `mode: synthetic` in `_source`:
[source,console,id=enable-synthetic-source-example]
----
PUT idx
{
  "mappings": {
    "_source": {
      "mode": "synthetic"
    }
  }
}
----
// TESTSETUP
While this on-the-fly reconstruction is *generally* slower than saving the source
documents verbatim and loading them at query time, it saves a lot of storage
space.
[[synthetic-source-restrictions]]
===== Synthetic `_source` restrictions
There are a couple of restrictions to be aware of:
* When you retrieve synthetic `_source` content it undergoes minor
<<synthetic-source-modifications,modifications>> compared to the original JSON.
* Synthetic `_source` can be used with indices that contain only these field
types:
** <<aggregate-metric-double-synthetic-source,`aggregate_metric_double`>>
** {plugins}/mapper-annotated-text-usage.html#annotated-text-synthetic-source[`annotated_text`]
** <<binary-synthetic-source,`binary`>>
** <<boolean-synthetic-source,`boolean`>>
** <<numeric-synthetic-source,`byte`>>
** <<date-synthetic-source,`date`>>
** <<date-nanos-synthetic-source,`date_nanos`>>
** <<dense-vector-synthetic-source,`dense_vector`>>
** <<numeric-synthetic-source,`double`>>
** <<flattened-synthetic-source,`flattened`>>
** <<numeric-synthetic-source,`float`>>
** <<geo-point-synthetic-source,`geo_point`>>
** <<geo-shape-synthetic-source,`geo_shape`>>
** <<numeric-synthetic-source,`half_float`>>
** <<histogram-synthetic-source,`histogram`>>
** <<numeric-synthetic-source,`integer`>>
** <<ip-synthetic-source,`ip`>>
** <<keyword-synthetic-source,`keyword`>>
** <<numeric-synthetic-source,`long`>>
** <<range-synthetic-source,`range` types>>
** <<numeric-synthetic-source,`scaled_float`>>
** <<search-as-you-type-synthetic-source,`search_as_you_type`>>
** <<numeric-synthetic-source,`short`>>
** <<text-synthetic-source,`text`>>
** <<token-count-synthetic-source,`token_count`>>
** <<version-synthetic-source,`version`>>
** <<wildcard-synthetic-source,`wildcard`>>
[[synthetic-source-modifications]]
===== Synthetic `_source` modifications
When synthetic `_source` is enabled, retrieved documents undergo some
modifications compared to the original JSON.
[[synthetic-source-modifications-leaf-arrays]]
====== Arrays moved to leaf fields
With synthetic `_source`, arrays of objects are flattened so that arrays appear
only on leaf fields. For example:
[source,console,id=synthetic-source-leaf-arrays-example]
----
PUT idx/_doc/1
{
  "foo": [
    {
      "bar": 1
    },
    {
      "bar": 2
    }
  ]
}
----
// TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]
Will become:
[source,console-result]
----
{
  "foo": {
    "bar": [1, 2]
  }
}
----
// TEST[s/^/{"_source":/ s/\n$/}/]
This can cause some arrays to vanish:
[source,console,id=synthetic-source-leaf-arrays-example-sneaky]
----
PUT idx/_doc/1
{
  "foo": [
    {
      "bar": 1
    },
    {
      "baz": 2
    }
  ]
}
----
// TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]
Will become:
[source,console-result]
----
{
  "foo": {
    "bar": 1,
    "baz": 2
  }
}
----
// TEST[s/^/{"_source":/ s/\n$/}/]
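The two transformations above can be approximated client-side, which may help when comparing documents you sent against what synthetic `_source` returns. The following Python sketch is only an illustration, not how Elasticsearch implements it: the helper names `collect_leaves` and `move_arrays_to_leaves` are hypothetical, and the sketch assumes that values are gathered per leaf path and that a path with a single value is returned as a scalar. It ignores any field-type-specific handling of values.

```python
def collect_leaves(value, path, leaves):
    """Gather every scalar value under its full field path (hypothetical helper)."""
    if isinstance(value, dict):
        for key, child in value.items():
            collect_leaves(child, path + (key,), leaves)
    elif isinstance(value, list):
        # Arrays do not add a path segment; their items share the parent's path.
        for item in value:
            collect_leaves(item, path, leaves)
    else:
        leaves.setdefault(path, []).append(value)


def move_arrays_to_leaves(doc):
    """Rebuild a document so that arrays only appear on leaf fields."""
    leaves = {}
    collect_leaves(doc, (), leaves)
    result = {}
    for path, values in leaves.items():
        node = result
        for key in path[:-1]:
            node = node.setdefault(key, {})
        # Assumption of this sketch: a single value is emitted as a scalar.
        node[path[-1]] = values[0] if len(values) == 1 else values
    return result


print(move_arrays_to_leaves({"foo": [{"bar": 1}, {"bar": 2}]}))
print(move_arrays_to_leaves({"foo": [{"bar": 1}, {"baz": 2}]}))
```

In the second call the array disappears entirely, because each leaf path ends up with exactly one value.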
[[synthetic-source-modifications-field-names]]
====== Fields named as they are mapped
Synthetic `_source` names fields as they are named in the mapping. When used
with <<dynamic,dynamic mapping>>, fields with dots (`.`) in their names are, by
default, interpreted as multiple objects, while dots in field names are
preserved within objects that have <<subobjects>> disabled. For example:
[source,console,id=synthetic-source-objecty-example]
----
PUT idx/_doc/1
{
  "foo.bar.baz": 1
}
----
// TEST[s/$/\nGET idx\/_doc\/1?filter_path=_source\n/]
Will become:
[source,console-result]
----
{
  "foo": {
    "bar": {
      "baz": 1
    }
  }
}
----
// TEST[s/^/{"_source":/ s/\n$/}/]
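To predict the shape of the returned document under the default behavior, dotted names can be expanded into nested objects on the client. The Python sketch below is a rough model only: `merge_into` and `expand_dotted_names` are hypothetical helper names, and the sketch does not model objects with <<subobjects>> disabled or mapping conflicts between a scalar and an object at the same path.

```python
def merge_into(target, name, value):
    """Place `value` under the dotted `name` inside `target` (hypothetical helper)."""
    *parents, leaf = name.split(".")
    node = target
    for key in parents:
        node = node.setdefault(key, {})
    if isinstance(value, dict):
        # Nested objects may themselves contain dotted names, so recurse.
        child = node.setdefault(leaf, {})
        for sub_name, sub_value in value.items():
            merge_into(child, sub_name, sub_value)
    else:
        node[leaf] = value


def expand_dotted_names(doc):
    """Interpret dots in field names as object boundaries."""
    result = {}
    for name, value in doc.items():
        merge_into(result, name, value)
    return result


print(expand_dotted_names({"foo.bar.baz": 1}))
```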
[[synthetic-source-modifications-alphabetical]]
====== Alphabetical sorting
Synthetic `_source` fields are sorted alphabetically. The
https://www.rfc-editor.org/rfc/rfc7159.html[JSON RFC] defines objects as
"an unordered collection of zero or more name/value pairs", so applications
shouldn't care. However, without synthetic `_source` the original ordering is
preserved, and some applications may, counter to the spec, do something with
that ordering.
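One practical consequence is that client code should compare parsed documents rather than raw JSON strings. The Python sketch below illustrates this with made-up field names; `sort_keys=True` merely stands in for the reordering and is not guaranteed to match Elasticsearch's exact ordering rules.

```python
import json

# Hypothetical document as the client sent it.
original = '{"zebra": 1, "apple": {"pear": 2, "fig": 3}}'

# Stand-in for what a reordered _source might look like.
reordered = json.dumps(json.loads(original), sort_keys=True)

# The raw strings differ once keys are reordered...
strings_match = original == reordered
# ...but the parsed objects are still equal.
objects_match = json.loads(original) == json.loads(reordered)

print(reordered)
print(strings_match, objects_match)
```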
[[synthetic-source-modifications-ranges]]
====== Representation of ranges
Range field values (e.g. `long_range`) are always represented as inclusive on both sides with bounds adjusted accordingly. See <<range-synthetic-source-inclusive, examples>>.
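For integer-valued ranges, the adjustment can be pictured as shifting an exclusive bound by one. The Python sketch below is an assumption-laden illustration: `to_inclusive_long_range` is a hypothetical helper, it only covers integer bounds expressed with `gt`, `gte`, `lt`, and `lte`, and it simply omits missing bounds rather than modeling how Elasticsearch renders them.

```python
def to_inclusive_long_range(bounds):
    """Rewrite integer range bounds so both sides are inclusive (hypothetical helper)."""
    result = {}
    if "gte" in bounds:
        result["gte"] = bounds["gte"]
    elif "gt" in bounds:
        # An exclusive lower bound of n is the same as an inclusive bound of n + 1.
        result["gte"] = bounds["gt"] + 1
    if "lte" in bounds:
        result["lte"] = bounds["lte"]
    elif "lt" in bounds:
        # An exclusive upper bound of n is the same as an inclusive bound of n - 1.
        result["lte"] = bounds["lt"] - 1
    return result


print(to_inclusive_long_range({"gt": 10, "lt": 20}))
```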