elasticsearch/docs/reference/transform/checkpoints.asciidoc


2019-09-11 09:13:40 +08:00
[role="xpack"]
[[ml-transform-checkpoints]]
== How {transform} checkpoints work
++++
<titleabbrev>How checkpoints work</titleabbrev>
++++

beta[]

Each time a {transform} examines the source indices and creates or
updates the destination index, it generates a _checkpoint_.

If your {transform} runs only once, there is logically only one
checkpoint. If your {transform} runs continuously, however, it creates
checkpoints as it ingests and transforms new source data.

To create a checkpoint, the {ctransform}:

. Checks for changes to source indices.
+
Using a simple periodic timer, the {transform} checks for changes to
the source indices. This check is done based on the interval defined in the
{transform}'s `frequency` property.
+
If the source indices remain unchanged or if a checkpoint is already in
progress, the {transform} waits for the next timer.
. Identifies which entities have changed.
+
The {transform} searches to see which entities have changed since the
last time it checked. The `sync` configuration object in the {transform}
identifies a time field in the source indices. The {transform} uses the values
in that field to synchronize the source and destination indices.
. Updates the destination index (the {dataframe}) with the changed entities.
+
--
The {transform} applies changes related to either new or changed
entities to the destination index. The set of changed entities is paginated. For
each page, the {transform} performs a composite aggregation using a
`terms` query. After all the pages of changes have been applied, the checkpoint
is complete.
--
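
The three steps above can be sketched in a few lines of Python. This is a
minimal in-memory illustration, not the {transform} implementation: the `Doc`
type, the `run_checkpoint` function, and the `page_size` parameter are all
hypothetical names, and a plain `sum` stands in for the composite aggregation.

```python
from dataclasses import dataclass
from typing import Dict, Iterator, List


@dataclass
class Doc:
    entity: str      # hypothetical group-by key, for example a customer ID
    timestamp: int   # value of the time field named in the `sync` object
    value: float


def changed_entities(source: List[Doc], last_checkpoint_time: int) -> List[str]:
    """Step 2: entities with at least one document newer than the last checkpoint."""
    return sorted({d.entity for d in source if d.timestamp > last_checkpoint_time})


def pages(items: List[str], page_size: int) -> Iterator[List[str]]:
    """Split the list of changed entities into fixed-size pages."""
    for start in range(0, len(items), page_size):
        yield items[start:start + page_size]


def run_checkpoint(source: List[Doc], dest: Dict[str, float],
                   last_checkpoint_time: int, page_size: int = 2) -> int:
    """Steps 1 to 3. Returns the new checkpoint time (unchanged if nothing to do)."""
    changed = changed_entities(source, last_checkpoint_time)
    if not changed:
        # Step 1: no changes detected, so wait for the next timer.
        return last_checkpoint_time
    for page in pages(changed, page_size):
        for entity in page:
            # Step 3: re-aggregate only the entities on this page.
            # (Assumption: a sum stands in for the real aggregation.)
            dest[entity] = sum(d.value for d in source if d.entity == entity)
    return max(d.timestamp for d in source)
```

Calling `run_checkpoint` again after appending a new `Doc` for one entity
recomputes only that entity, which mirrors how a checkpoint touches only the
changed entities rather than the whole destination index.
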

This checkpoint process involves both search and indexing activity on the
cluster. We have attempted to favor control over performance while developing
{transforms}. We decided it was preferable for the
{transform} to take longer to complete, rather than to finish quickly
and take precedence in resource consumption. That being said, the cluster still
requires enough resources to support both the composite aggregation search and
the indexing of its results.

TIP: If the cluster experiences unacceptable performance degradation due to the
{transform}, stop the {transform}. Consider whether you can apply a
source query to the {transform} to reduce the scope of data it
processes. Also consider whether the cluster has sufficient resources in place
to support both the composite aggregation search and the indexing of its
results.
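
As an illustration of the source query mentioned in the tip, the following
sketch builds a {transform} configuration body as a Python dictionary. It only
constructs and prints JSON; it does not call the cluster. The index names
(`orders`, `orders_by_customer`), the field names, and the
`build_transform_body` helper are hypothetical and chosen for this example only.

```python
import json
from typing import Any, Dict, Optional


def build_transform_body(source_index: str, dest_index: str, time_field: str,
                         source_query: Optional[Dict[str, Any]] = None,
                         frequency: str = "1m") -> Dict[str, Any]:
    """Build a transform configuration body (hypothetical helper)."""
    source: Dict[str, Any] = {"index": source_index}
    if source_query is not None:
        # Restricting the source with a query reduces the data each checkpoint scans.
        source["query"] = source_query
    return {
        "source": source,
        "dest": {"index": dest_index},
        "frequency": frequency,                   # interval between change checks
        "sync": {"time": {"field": time_field}},  # time field used for synchronization
        "pivot": {
            # Hypothetical grouping and aggregation, for illustration only.
            "group_by": {"customer_id": {"terms": {"field": "customer_id"}}},
            "aggregations": {"total_spent": {"sum": {"field": "order_total"}}},
        },
    }


body = build_transform_body(
    "orders", "orders_by_customer", "order_date",
    source_query={"term": {"region": "emea"}},
)
print(json.dumps(body, indent=2))
```

Leaving out `source_query` produces a body without a `query` key, in which case
the {transform} considers every document in the source index.
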

[discrete]
[[ml-transform-checkpoint-errors]]
==== Error handling

Failures in {transforms} tend to be related to searching or indexing.
To increase the resiliency of {transforms}, the cursor positions of
the aggregated search and the changed entities search are tracked in memory and
persisted periodically.

Checkpoint failures can be categorized as follows:

* Temporary failures: The checkpoint is retried. If 10 consecutive failures
occur, the {transform} has a failed status. For example, this
situation might occur when there are shard failures and queries return only
partial results.
* Irrecoverable failures: The {transform} immediately fails. For
example, this situation occurs when the source index is not found.
* Adjustment failures: The {transform} retries with adjusted settings.
For example, if parent circuit breaker memory errors occur during the
composite aggregation, the {transform} receives partial results. The aggregated
search is retried with a smaller number of buckets. This retry is performed at
the interval defined in the `frequency` property for the {transform}. If the
search is retried to the point where it reaches a minimal number of buckets, an
irrecoverable failure occurs.

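
The three failure categories above can be sketched as a retry loop. This is an
illustrative model only. The exception classes, the `MIN_PAGE_SIZE` value, and
the choice to halve the page size on each adjustment are assumptions made for
this example; the text above states only that the number of buckets gets
smaller and that 10 consecutive failures lead to a failed status. Each loop
iteration stands for one retry at the `frequency` interval.

```python
class TemporaryFailure(Exception):
    """For example, shard failures that return only partial results."""


class IrrecoverableFailure(Exception):
    """For example, the source index is not found."""


class AdjustmentFailure(Exception):
    """For example, a circuit breaker error during the aggregation."""


MAX_CONSECUTIVE_FAILURES = 10  # limit stated in the text above
MIN_PAGE_SIZE = 10             # assumption: the real minimum is not stated here


def run_checkpoint_with_retries(search, page_size=500):
    """Return ("completed", result) or ("failed", reason)."""
    consecutive_failures = 0
    while True:
        try:
            return ("completed", search(page_size))
        except TemporaryFailure:
            consecutive_failures += 1
            if consecutive_failures >= MAX_CONSECUTIVE_FAILURES:
                return ("failed", f"{consecutive_failures} consecutive failures")
        except AdjustmentFailure:
            if page_size <= MIN_PAGE_SIZE:
                # Already at the minimum: treated as irrecoverable.
                return ("failed", "page size already at minimum")
            # Assumption: halve the page size; the actual reduction may differ.
            page_size = max(MIN_PAGE_SIZE, page_size // 2)
        except IrrecoverableFailure as error:
            return ("failed", str(error))
```

In this sketch an adjustment failure does not count toward the consecutive
failure limit. That is a simplification, not a statement about how
{transforms} count failures.
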
If the node running the {transforms} fails, the {transform} restarts
from the most recent persisted cursor position. This recovery process might
repeat some of the work the {transform} had already done, but it ensures data
consistency.
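
The recovery behavior described above can be modeled with a cursor that is
tracked in memory and persisted only periodically. The following sketch is a
simplified illustration under stated assumptions: `process_pages`, the `store`
dictionary that stands in for persisted state, and the `persist_every` and
`crash_after` parameters are all hypothetical.

```python
from typing import Dict, List, Optional, Tuple

Page = List[Tuple[str, float]]  # (entity, aggregated value) pairs


def process_pages(pages: List[Page], dest: Dict[str, float],
                  store: Dict[str, int], persist_every: int = 2,
                  crash_after: Optional[int] = None) -> int:
    """Apply pages from the persisted cursor onward; return pages applied in this run."""
    cursor = store.get("cursor", 0)  # resume from the last persisted position
    applied = 0
    while cursor < len(pages):
        if crash_after is not None and applied == crash_after:
            raise RuntimeError("simulated node failure")
        for entity, value in pages[cursor]:
            # Writes are keyed by entity, so re-applying a page is harmless.
            dest[entity] = value
        cursor += 1
        applied += 1
        if cursor % persist_every == 0:
            store["cursor"] = cursor  # periodic persistence, not after every page
    store["cursor"] = cursor
    return applied
```

If a run fails after applying three of five pages with `persist_every=2`, the
persisted cursor is 2, so the next run re-applies the third page before
continuing. The repeated work leaves the destination unchanged because each
write overwrites the same entity key.
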