[[modules-scripting-fields]]
== Accessing document fields and special variables
Depending on where a script is used, it will have access to certain special
variables and document fields.
[discrete]
== Update scripts
A script used in the <<docs-update,update>>,
<<docs-update-by-query,update-by-query>>, or <<docs-reindex,reindex>>
API will have access to the `ctx` variable which exposes:
[horizontal]
`ctx._source`:: Access to the document <<mapping-source-field,`_source` field>>.
`ctx.op`:: The operation that should be applied to the document: `index` or `delete`.
`ctx._index` etc:: Access to <<mapping-fields,document metadata fields>>, some of which may be read-only.
These scripts do not have access to the `doc` variable and have to use `ctx` to access the documents they operate on.
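
As a minimal sketch of `ctx` in use, the following requests first increment a field through `ctx._source` and then delete the document through `ctx.op`. The `popularity` field and the `increment` and `limit` parameter names are illustrative assumptions, not part of the API:

[source,console]
-------------------------------------
PUT my-index-000001/_doc/1?refresh
{
  "popularity": 1
}

POST my-index-000001/_update/1
{
  "script": {
    "lang": "painless",
    "source": "ctx._source.popularity += params.increment",
    "params": {
      "increment": 4
    }
  }
}

POST my-index-000001/_update/1
{
  "script": {
    "lang": "painless",
    "source": "if (ctx._source.popularity > params.limit) { ctx.op = 'delete' }",
    "params": {
      "limit": 3
    }
  }
}
-------------------------------------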
[discrete]
== Search and aggregation scripts
With the exception of <<script-fields,script fields>>, which are
executed once per search hit, scripts used in search and aggregations are
executed once for every document that might match a query or an aggregation.
Depending on how many documents you have, this could mean millions or billions
of executions: these scripts need to be fast!
Field values can be accessed from a script using
<<modules-scripting-doc-vals,doc-values>>,
<<modules-scripting-source, the `_source` field>>, or
<<modules-scripting-stored, stored fields>>,
each of which is explained below.
[[scripting-score]]
[discrete]
=== Accessing the score of a document within a script
Scripts used in the <<query-dsl-function-score-query,`function_score` query>>,
in <<sort-search-results,script-based sorting>>, or in
<<search-aggregations,aggregations>> have access to the `_score` variable, which
represents the current relevance score of a document.
Here's an example of using a script in a
<<query-dsl-function-score-query,`function_score` query>> to alter the
relevance `_score` of each document:
[source,console]
-------------------------------------
PUT my-index-000001/_doc/1?refresh
{
  "text": "quick brown fox",
  "popularity": 1
}

PUT my-index-000001/_doc/2?refresh
{
  "text": "quick fox",
  "popularity": 5
}

GET my-index-000001/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "text": "quick brown fox"
        }
      },
      "script_score": {
        "script": {
          "lang": "expression",
          "source": "_score * doc['popularity']"
        }
      }
    }
  }
}
-------------------------------------
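
For comparison, here is a sketch of the same scoring logic written in Painless rather than `expression`. It assumes the two documents indexed above and reads the doc value explicitly through `.value`:

[source,console]
-------------------------------------
GET my-index-000001/_search
{
  "query": {
    "function_score": {
      "query": {
        "match": {
          "text": "quick brown fox"
        }
      },
      "script_score": {
        "script": {
          "lang": "painless",
          "source": "_score * doc['popularity'].value"
        }
      }
    }
  }
}
-------------------------------------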
[discrete]
[[scripting-term-statistics]]
=== Accessing term statistics of a document within a script
Scripts used in a <<query-dsl-script-score-query,`script_score`>> query have access to the `_termStats` variable which provides statistical information about the terms in the child query.
In the following example, `_termStats` is used within a <<query-dsl-script-score-query,`script_score`>> query to retrieve the average term frequency for the terms `quick`, `brown`, and `fox` in the `text` field:
[source,console]
-------------------------------------
PUT my-index-000001/_doc/1?refresh
{
  "text": "quick brown fox"
}

PUT my-index-000001/_doc/2?refresh
{
  "text": "quick fox"
}

GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query": { <1>
        "match": {
          "text": "quick brown fox"
        }
      },
      "script": {
        "source": "_termStats.termFreq().getAverage()" <2>
      }
    }
  }
}
-------------------------------------
<1> Child query used to infer the field and the terms considered in term statistics.
<2> The script calculates the average term frequency for the terms in the query using `_termStats`.
`_termStats` provides access to the following functions for working with term statistics:
- `uniqueTermsCount`: Returns the total number of unique terms in the query. This value is the same across all documents.
- `matchedTermsCount`: Returns the count of query terms that matched within the current document.
- `docFreq`: Provides document frequency statistics for the terms in the query, indicating how many documents contain each term. This value is consistent across all documents.
- `totalTermFreq`: Provides the total frequency of terms across all documents, representing how often each term appears in the entire corpus. This value is consistent across all documents.
- `termFreq`: Returns the frequency of query terms within the current document, showing how often each term appears in that document.
[NOTE]
.Functions returning aggregated statistics
===================================================
The `docFreq`, `termFreq`, and `totalTermFreq` functions return objects that represent statistics across all terms of the child query.

These statistics objects support the following methods:

- `getAverage()`: Returns the average value of the metric.
- `getMin()`: Returns the minimum value of the metric.
- `getMax()`: Returns the maximum value of the metric.
- `getSum()`: Returns the sum of the metric values.
- `getCount()`: Returns the count of terms included in the metric calculation.
===================================================
[NOTE]
.Painless language required
===================================================
The `_termStats` variable is only available when using the <<modules-scripting-painless, Painless>> scripting language.
===================================================
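
As a further sketch, the counting functions can be combined to score each hit by the fraction of query terms it contains. This assumes the documents indexed in the example above and that `matchedTermsCount` and `uniqueTermsCount` return integer counts, hence the explicit cast to avoid integer division:

[source,console]
-------------------------------------
GET my-index-000001/_search
{
  "query": {
    "script_score": {
      "query": {
        "match": {
          "text": "quick brown fox"
        }
      },
      "script": {
        "source": "(double) _termStats.matchedTermsCount() / _termStats.uniqueTermsCount()"
      }
    }
  }
}
-------------------------------------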
[discrete]
[[modules-scripting-doc-vals]]
=== Doc values
By far the fastest and most efficient way to access a field value from a
script is to use the `doc['field_name']` syntax, which retrieves the field
value from <<doc-values,doc values>>. Doc values are a columnar field value
store, enabled by default on all fields except for <<text,analyzed `text` fields>>.
[source,console]
-------------------------------
PUT my-index-000001/_doc/1?refresh
{
  "cost_price": 100
}

GET my-index-000001/_search
{
  "script_fields": {
    "sales_price": {
      "script": {
        "lang": "expression",
        "source": "doc['cost_price'] * markup",
        "params": {
          "markup": 0.2
        }
      }
    }
  }
}
-------------------------------
Doc values can only return "simple" field values like numbers, dates,
geo-points, terms, etc., or arrays of these values if the field is multi-valued.
They cannot return JSON objects.
[NOTE]
.Missing fields
===================================================
`doc['field']` throws an error if `field` is missing from the mappings.
In `painless`, you can first check `doc.containsKey('field')` to guard
access to the `doc` map. Unfortunately, there is no way to check whether
a field exists in the mappings from an `expression` script.
===================================================
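
A minimal Painless sketch of such a guard, reusing the `cost_price` field and `markup` parameter from the example above. The extra `size()` check covers documents that have no value for a mapped field, and the `0.0` fallback is an arbitrary choice for illustration:

[source,console]
-------------------------------
GET my-index-000001/_search
{
  "script_fields": {
    "sales_price": {
      "script": {
        "lang": "painless",
        "source": "(doc.containsKey('cost_price') && doc['cost_price'].size() > 0) ? doc['cost_price'].value * params.markup : 0.0",
        "params": {
          "markup": 0.2
        }
      }
    }
  }
}
-------------------------------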
[NOTE]
.Doc values and `text` fields
===================================================
The `doc['field']` syntax can also be used for <<text,analyzed `text` fields>>
if <<fielddata-mapping-param,`fielddata`>> is enabled, but *BEWARE*: enabling fielddata on a
`text` field requires loading all of the terms into the JVM heap, which can be
very expensive in terms of both memory and CPU. It seldom makes sense to
access `text` fields from scripts.
===================================================
[discrete]
[[modules-scripting-source]]
=== The document `_source`
The document <<mapping-source-field,`_source`>> can be accessed using the
`_source.field_name` syntax (`params._source.field_name` in Painless). The
`_source` is loaded as a map-of-maps, so properties within object fields can
be accessed as, for example, `_source.name.first`.
[IMPORTANT]
.Prefer doc values to `_source`
=========================================================
Accessing the `_source` field is much slower than using doc values. The
`_source` field is optimised for returning several fields per result, while doc
values are optimised for accessing the value of a specific field in many
documents.

It makes sense to use `_source` when generating a
<<script-fields,script field>> for the top ten hits from a
search result but, for other search and aggregation use cases, always prefer
using doc values.
=========================================================
For instance:
[source,console]
-------------------------------
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "first_name": {
        "type": "text"
      },
      "last_name": {
        "type": "text"
      }
    }
  }
}

PUT my-index-000001/_doc/1?refresh
{
  "first_name": "Barry",
  "last_name": "White"
}

GET my-index-000001/_search
{
  "script_fields": {
    "full_name": {
      "script": {
        "lang": "painless",
        "source": "params._source.first_name + ' ' + params._source.last_name"
      }
    }
  }
}
-------------------------------
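
Properties within object fields can be read the same way. The sketch below assumes a separate, hypothetical index `my-index-000002` whose documents store the name as a `name` object with `first` and `last` properties:

[source,console]
-------------------------------
PUT my-index-000002/_doc/1?refresh
{
  "name": {
    "first": "Barry",
    "last": "White"
  }
}

GET my-index-000002/_search
{
  "script_fields": {
    "full_name": {
      "script": {
        "lang": "painless",
        "source": "params._source.name.first + ' ' + params._source.name.last"
      }
    }
  }
}
-------------------------------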
[discrete]
[[modules-scripting-stored]]
=== Stored fields
_Stored fields_ -- fields explicitly marked as
<<mapping-store,`"store": true`>> in the mapping -- can be accessed using the
`_fields['field_name'].value` or `_fields['field_name']` syntax:
[source,console]
-------------------------------
PUT my-index-000001
{
  "mappings": {
    "properties": {
      "full_name": {
        "type": "text",
        "store": true
      },
      "title": {
        "type": "text",
        "store": true
      }
    }
  }
}

PUT my-index-000001/_doc/1?refresh
{
  "full_name": "Alice Ball",
  "title": "Professor"
}

GET my-index-000001/_search
{
  "script_fields": {
    "name_with_title": {
      "script": {
        "lang": "painless",
        "source": "params._fields['title'].value + ' ' + params._fields['full_name'].value"
      }
    }
  }
}
-------------------------------
[TIP]
.Stored vs `_source`
=======================================================
The `_source` field is just a special stored field, so the performance is
similar to that of other stored fields. The `_source` provides access to the
original document body that was indexed (including the ability to distinguish
`null` values from empty fields, single-value arrays from plain scalars, etc.).
The only time it really makes sense to use stored fields instead of the
`_source` field is when the `_source` is very large and it is less costly to
access a few small stored fields instead of the entire `_source`.
=======================================================