[[fix-watermark-errors]]
=== Fix watermark errors
++++
<titleabbrev>Watermark errors</titleabbrev>
++++
:keywords: {es}, high watermark, low watermark, full disk, flood stage watermark

When a data node is critically low on disk space and has reached the
<<cluster-routing-flood-stage,flood-stage disk usage watermark>>, the following
error is logged: `Error: disk usage exceeded flood-stage watermark, index has read-only-allow-delete block`.

To prevent a full disk, {es} <<index-block-settings,blocks writes>> to any
index with a shard on a node that reaches this watermark. If the block affects
related system indices, {kib} and other {stack} features may become unavailable.
For example, {kib} may return the `Kibana Server is not Ready yet`
{kibana-ref}/access.html#not-ready[error message].

{es} automatically removes the write block when the affected node's disk
usage falls below the <<cluster-routing-watermark-high,high disk watermark>>.
To achieve this, {es} attempts to rebalance some of the affected node's shards
to other nodes in the same data tier.
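The release rule is asymmetric: the write block is applied at the flood-stage watermark but lifted only once usage drops below the lower, high watermark. The following Python sketch models that rule for a single node. It is an illustration only, not {es} code. The function name is hypothetical, it assumes the default percentages of 90% (high) and 95% (flood stage), and it ignores `max_headroom`. With the sample values, the block stays in place at 92% usage even though that is below the flood stage.

[source,python]
----
def next_block_state(blocked: bool, used_pct: float,
                     high_pct: float = 90.0, flood_pct: float = 95.0) -> bool:
    """Return True if the node's indices should carry the write block.

    Illustrative model only: thresholds are plain percentages of total disk,
    max_headroom is ignored, and reaching a threshold counts as exceeding it.
    """
    if used_pct >= flood_pct:
        # Flood stage reached: the read_only_allow_delete block is applied.
        return True
    if blocked and used_pct >= high_pct:
        # Already blocked and still at or above the high watermark: keep it.
        return True
    # Below the high watermark, or never blocked: writes are allowed.
    return False


# Simulate a node filling up and then draining as shards move away.
blocked = False
for used in (80.0, 96.0, 92.0, 89.0):
    blocked = next_block_state(blocked, used)
    print(f"{used:.1f}% used -> write block: {blocked}")
----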
****
If you're using Elastic Cloud Hosted, then you can use AutoOps to monitor your cluster. AutoOps significantly simplifies cluster management with performance recommendations, resource utilization visibility, real-time issue detection and resolution paths. For more information, refer to https://www.elastic.co/guide/en/cloud/current/ec-autoops.html[Monitor with AutoOps].
****
[[fix-watermark-errors-rebalance]]
==== Monitor rebalancing
To verify that shards are moving off the affected node until its disk usage
falls below the high watermark, use the <<cat-shards,cat shards API>> and
<<cat-recovery,cat recovery API>>:

[source,console]
----
GET _cat/shards?v=true

GET _cat/recovery?v=true&active_only=true
----
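To judge how much data still has to move, you can compare the node's usage with the point at which the high watermark applies, and then compare the result with the `store` column returned by the cat shards API. The following Python sketch does that arithmetic. It assumes that `max_headroom` caps the free space a percentage watermark requires, and it uses 90% and 150GB as example high-watermark values; check your cluster's actual settings. The helper names are hypothetical.

[source,python]
----
from typing import Optional

GB = 1024 ** 3


def watermark_usage_bytes(total: int, pct: float,
                          max_headroom: Optional[int]) -> int:
    """Disk usage (bytes) at which a percentage watermark is reached.

    Assumes max_headroom caps the free space that the percentage would require.
    """
    required_free = total * (100.0 - pct) / 100.0
    if max_headroom is not None:
        required_free = min(required_free, max_headroom)
    return int(total - required_free)


def bytes_to_move(total: int, used: int, pct: float = 90.0,
                  max_headroom: Optional[int] = 150 * GB) -> int:
    """Minimum bytes to relocate before usage no longer exceeds the watermark."""
    return max(0, used - watermark_usage_bytes(total, pct, max_headroom))


# Hypothetical 1000 GB node holding 960 GB: 100 GB must stay free, so 60 GB must move.
print(bytes_to_move(1000 * GB, 960 * GB) // GB)
----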
If shards remain on the node and keep its disk usage above the high watermark,
use the <<cluster-allocation-explain,cluster allocation explanation API>> to get
an explanation for their allocation status.

[source,console]
----
GET _cluster/allocation/explain
{
  "index": "my-index",
  "shard": 0,
  "primary": false
}
----
// TEST[s/^/PUT my-index\n/]
// TEST[s/"primary": false,/"primary": false/]
[[fix-watermark-errors-temporary]]
==== Temporary relief
To immediately restore write operations, you can temporarily increase the
<<disk-based-shard-allocation,disk watermarks>> and remove the
<<index-block-settings,write block>>.

[source,console]
----
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "90%",
    "cluster.routing.allocation.disk.watermark.low.max_headroom": "100GB",
    "cluster.routing.allocation.disk.watermark.high": "95%",
    "cluster.routing.allocation.disk.watermark.high.max_headroom": "20GB",
    "cluster.routing.allocation.disk.watermark.flood_stage": "97%",
    "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": "5GB",
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen": "97%",
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom": "5GB"
  }
}

PUT */_settings?expand_wildcards=all
{
  "index.blocks.read_only_allow_delete": null
}
----
// TEST[s/^/PUT my-index\n/]
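Each percentage watermark interacts with its `max_headroom`: under the rule assumed here, the free space a watermark requires is the smaller of the percentage-based amount and the headroom cap. The following Python sketch applies that rule to the temporary values above for two hypothetical disk sizes (the frozen flood-stage pair is omitted). On a 2000 GB disk, for example, the flood stage is reached with only 5 GB free, which is why these settings are meant as short-term relief. The names `RELIEF_WATERMARKS` and `required_free_bytes` are illustrative only.

[source,python]
----
from typing import Optional

GB = 1024 ** 3

# Values from the temporary relief request above: (percent, max_headroom).
RELIEF_WATERMARKS = {
    "low": (90.0, 100 * GB),
    "high": (95.0, 20 * GB),
    "flood_stage": (97.0, 5 * GB),
}


def required_free_bytes(total: int, pct: float, max_headroom: Optional[int]) -> int:
    """Free space a node must keep before the watermark is reached.

    Assumes max_headroom caps the free space implied by the percentage.
    """
    required = total * (100.0 - pct) / 100.0
    if max_headroom is not None:
        required = min(required, max_headroom)
    return int(required)


for total_gb in (200, 2000):
    total = total_gb * GB
    free_gb = {
        name: required_free_bytes(total, pct, headroom) / GB
        for name, (pct, headroom) in RELIEF_WATERMARKS.items()
    }
    print(f"{total_gb} GB disk -> free space required: {free_gb}")
----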
Once a long-term solution is in place, reset or reconfigure the disk watermarks.
For example, the following request removes the persistent overrides set above:

[source,console]
----
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": null,
    "cluster.routing.allocation.disk.watermark.low.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.high": null,
    "cluster.routing.allocation.disk.watermark.high.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.flood_stage": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.max_headroom": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen": null,
    "cluster.routing.allocation.disk.watermark.flood_stage.frozen.max_headroom": null
  }
}
----
[[fix-watermark-errors-resolve]]
==== Resolve
To resolve watermark errors permanently, perform one of the following actions:

* Scale the affected <<data-tiers,data tiers>> horizontally by adding nodes.
* Scale existing nodes vertically to increase their disk space.
* Delete indices using the <<indices-delete-index,delete index API>>, either
permanently, if an index isn't needed, or temporarily, to
<<snapshots-restore-snapshot,restore>> it from a snapshot later.
* Update the related <<index-lifecycle-management,ILM policy>> to move indices
through to later <<data-tiers,data tiers>>.

TIP: On {ess} and {ece}, a <<cluster-health,cluster health>> `status:red` blocks
{cloud}/ec-activity-page.html[attempted changes]. To resolve it, you may need to
temporarily delete indices using the
{cloud}/ec-api-console.html[Elasticsearch API Console] and later
<<snapshots-restore-snapshot,restore them from a snapshot>>. If you experience
issues with this resolution flow on {ess}, reach out to
https://support.elastic.co[Elastic Support] for assistance.
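When you scale a data tier, a rough lower bound on the disk capacity you need is the smallest size on which the current data stays at or below the watermark you want to respect, for example the low watermark, so that nodes can still receive shards. The following Python sketch computes that bound under the same assumed rule that `max_headroom` caps the free space a percentage watermark requires. The function name and the 85% and 200GB inputs are example values, and the bound ignores future growth and per-shard granularity, so treat it as a floor rather than a sizing target.

[source,python]
----
import math

GB = 1024 ** 3


def min_total_disk(used: int, pct: float, max_headroom: int) -> int:
    """Smallest total disk (bytes) on which `used` bytes do not exceed a watermark.

    Assumes the watermark's required free space is the smaller of
    (100 - pct)% of the disk and max_headroom.
    """
    # Disk size at which the percentage rule and the headroom cap require equal free space.
    crossover = max_headroom * 100 / (100 - pct)
    if used <= crossover * pct / 100:
        # Smaller disks: the percentage rule decides, so used <= pct% of total.
        return math.ceil(used * 100 / pct)
    # Larger disks: the headroom cap decides, so used + max_headroom <= total.
    return used + max_headroom


# Hypothetical data volumes checked against an 85% / 200 GB watermark.
print(min_total_disk(850 * GB, 85.0, 200 * GB) // GB)   # 1000
print(min_total_disk(3000 * GB, 85.0, 200 * GB) // GB)  # 3200
----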
[discrete]
[[fix-watermark-errors-prevent]]
==== Prevent watermark errors
To avoid watermark errors in the future, perform one of the following actions:

* If you're using {ess}, {ece}, or {eck}, enable <<xpack-autoscaling,autoscaling>>.
* Set up {kibana-ref}/kibana-alerts.html[stack monitoring alerts] on top of
<<monitor-elasticsearch-cluster,{es} monitoring>> to be notified before
the flood-stage watermark is reached.
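When choosing when to be notified, it helps to know roughly how much time remains before a node reaches the flood stage. The following Python sketch makes a simple linear projection from current usage and a daily growth rate. It is not part of {kib} or {es}; the function names, the 7-day lead time, and the sample figures are hypothetical, and real disk growth is rarely linear.

[source,python]
----
import math


def days_until_threshold(used_gb: float, threshold_gb: float,
                         growth_gb_per_day: float) -> float:
    """Days until disk usage reaches a threshold, assuming steady linear growth."""
    if used_gb >= threshold_gb:
        return 0.0  # Threshold already reached.
    if growth_gb_per_day <= 0:
        return math.inf  # Usage is flat or shrinking: never reached at this rate.
    return (threshold_gb - used_gb) / growth_gb_per_day


def should_alert(used_gb: float, threshold_gb: float,
                 growth_gb_per_day: float, lead_time_days: float = 7.0) -> bool:
    """Alert when the projected time to the threshold is shorter than the lead time."""
    return days_until_threshold(used_gb, threshold_gb, growth_gb_per_day) < lead_time_days


# Hypothetical node: 800 GB used, flood stage reached at 950 GB, growing 10 GB per day.
print(days_until_threshold(800, 950, 10))  # 15.0
print(should_alert(800, 950, 10))          # False
----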