[[red-yellow-cluster-status]]
=== Red or yellow cluster status

A red or yellow cluster status indicates one or more shards are missing or
unallocated. These unassigned shards increase your risk of data loss and can
degrade cluster performance.

[discrete]
[[diagnose-cluster-status]]
==== Diagnose your cluster status

**Check your cluster status**

Use the <<cluster-health,cluster health API>>.

[source,console]
----
GET _cluster/health?filter_path=status,*_shards
----

A healthy cluster has a green `status` and zero `unassigned_shards`. A yellow
status means only replicas are unassigned. A red status means one or
more primary shards are unassigned.
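
For example, a response shaped like the following (field values are
illustrative) would indicate a yellow cluster with two unassigned replica
shards:

[source,js]
----
{
  "status": "yellow",
  "active_primary_shards": 10,
  "active_shards": 18,
  "relocating_shards": 0,
  "initializing_shards": 0,
  "unassigned_shards": 2,
  "delayed_unassigned_shards": 0
}
----
// NOTCONSOLE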

**View unassigned shards**

To view unassigned shards, use the <<cat-shards,cat shards API>>.

[source,console]
----
GET _cat/shards?v=true&h=index,shard,prirep,state,node,unassigned.reason&s=state
----

Unassigned shards have a `state` of `UNASSIGNED`. The `prirep` value is `p` for
primary shards and `r` for replicas.
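
The output might include rows like the following (illustrative), where the
empty `node` column and an `unassigned.reason` of `NODE_LEFT` show a replica
whose node has left the cluster:

[source,txt]
----
index    shard prirep state      node   unassigned.reason
my-index 0     p      STARTED    node-1
my-index 0     r      UNASSIGNED        NODE_LEFT
----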

To understand why an unassigned shard is not being assigned and what action
you must take to allow {es} to assign it, use the
<<cluster-allocation-explain,cluster allocation explanation API>>.

[source,console]
----
GET _cluster/allocation/explain?filter_path=index,node_allocation_decisions.node_name,node_allocation_decisions.deciders.*
{
  "index": "my-index",
  "shard": 0,
  "primary": false
}
----
// TEST[s/^/PUT my-index\n/]
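
Each entry in the response's `node_allocation_decisions` names a node and the
deciders that blocked assignment there. A trimmed, illustrative response might
look like this, where the `same_shard` decider explains why the replica can't
be placed on its primary's node:

[source,js]
----
{
  "index": "my-index",
  "node_allocation_decisions": [
    {
      "node_name": "node-1",
      "deciders": [
        {
          "decider": "same_shard",
          "decision": "NO",
          "explanation": "a copy of this shard is already allocated to this node"
        }
      ]
    }
  ]
}
----
// NOTCONSOLE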

[discrete]
[[fix-red-yellow-cluster-status]]
==== Fix a red or yellow cluster status

A shard can become unassigned for several reasons. The following tips outline the
most common causes and their solutions.

[discrete]
[[fix-cluster-status-reenable-allocation]]
===== Re-enable shard allocation

You typically disable allocation during a <<restart-cluster,restart>> or other
cluster maintenance. If you forgot to re-enable allocation afterward, {es} will
be unable to assign shards. To re-enable allocation, reset the
`cluster.routing.allocation.enable` cluster setting.

[source,console]
----
PUT _cluster/settings
{
  "persistent" : {
    "cluster.routing.allocation.enable" : null
  }
}
----
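
To verify the override is gone, you can re-fetch the cluster settings and
confirm that `cluster.routing.allocation.enable` no longer appears under
`persistent` or `transient`:

[source,console]
----
GET _cluster/settings?flat_settings=true
----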

[discrete]
[[fix-cluster-status-recover-nodes]]
===== Recover lost nodes

Shards often become unassigned when a data node leaves the cluster. This can
occur for several reasons, ranging from connectivity issues to hardware failure.
After you resolve the issue and recover the node, it will rejoin the cluster.
{es} will then automatically allocate any unassigned shards.

To avoid wasting resources on temporary issues, {es} <<delayed-allocation,delays
allocation>> by one minute by default. If you've recovered a node and don't want
to wait for the delay period, you can call the <<cluster-reroute,cluster reroute
API>> with no arguments to start the allocation process. The process runs
asynchronously in the background.

[source,console]
----
POST _cluster/reroute?metric=none
----
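
Conversely, if your nodes routinely take longer than a minute to rejoin, you
can tune the delay itself with the `index.unassigned.node_left.delayed_timeout`
index setting. For example, a sketch that raises the delay to five minutes for
all indices:

[source,console]
----
PUT _all/_settings
{
  "settings": {
    "index.unassigned.node_left.delayed_timeout": "5m"
  }
}
----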

[discrete]
[[fix-cluster-status-allocation-settings]]
===== Fix allocation settings

Misconfigured allocation settings can result in an unassigned primary shard.
These settings include:

* <<shard-allocation-filtering,Shard allocation>> index settings
* <<cluster-shard-allocation-filtering,Allocation filtering>> cluster settings
* <<shard-allocation-awareness,Allocation awareness>> cluster settings

To review your allocation settings, use the <<indices-get-settings,get index
settings>> and <<cluster-get-settings,cluster get settings>> APIs.

[source,console]
----
GET my-index/_settings?flat_settings=true&include_defaults=true

GET _cluster/settings?flat_settings=true&include_defaults=true
----
// TEST[s/^/PUT my-index\n/]

You can change the settings using the <<indices-update-settings,update index
settings>> and <<cluster-update-settings,cluster update settings>> APIs.
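
For example, if a leftover <<shard-allocation-filtering,allocation filter>>
such as `index.routing.allocation.require._name` pins shards to a node that no
longer exists, setting it to `null` removes the restriction. A minimal sketch,
assuming the filter was set on `my-index`:

[source,console]
----
PUT my-index/_settings
{
  "index.routing.allocation.require._name": null
}
----
// TEST[s/^/PUT my-index\n/]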

[discrete]
[[fix-cluster-status-allocation-replicas]]
===== Allocate or reduce replicas

To protect against hardware failure, {es} will not assign a replica to the same
node as its primary shard. If no other data nodes are available to host the
replica, it remains unassigned. To fix this, you can:

* Add a data node to the same tier to host the replica.

* Change the `index.number_of_replicas` index setting to reduce the number of
replicas for each primary shard. We recommend keeping at least one replica per
primary.

[source,console]
----
PUT _settings
{
  "index.number_of_replicas": 1
}
----
// TEST[s/^/PUT my-index\n/]

[discrete]
[[fix-cluster-status-disk-space]]
===== Free up or increase disk space

{es} uses a <<disk-based-shard-allocation,low disk watermark>> to ensure data
nodes have enough disk space for incoming shards. By default, {es} does not
allocate shards to nodes using more than 85% of disk space.

To check the current disk space of your nodes, use the <<cat-allocation,cat
allocation API>>.

[source,console]
----
GET _cat/allocation?v=true&h=node,shards,disk.*
----

If your nodes are running low on disk space, you have a few options:

* Upgrade your nodes to increase disk space.

* Delete unneeded indices to free up space. If you use {ilm-init}, you can
update your lifecycle policy to use <<ilm-searchable-snapshot,searchable
snapshots>> or add a delete phase. If you no longer need to search the data, you
can use a <<snapshot-restore,snapshot>> to store it off-cluster.

* If you no longer write to an index, use the <<indices-forcemerge,force merge
API>> or {ilm-init}'s <<ilm-forcemerge,force merge action>> to merge its
segments into larger ones.
+
[source,console]
----
POST my-index/_forcemerge
----
// TEST[s/^/PUT my-index\n/]

* If an index is read-only, use the <<indices-shrink-index,shrink index API>> or
{ilm-init}'s <<ilm-shrink,shrink action>> to reduce its primary shard count.
+
[source,console]
----
POST my-index/_shrink/my-shrunken-index
----
// TEST[s/^/PUT my-index\n{"settings":{"index.number_of_shards":2,"blocks.write":true}}\n/]

* If your node has a large disk capacity, you can increase the low disk
watermark or set it to an explicit byte value.
+
[source,console]
----
PUT _cluster/settings
{
  "persistent": {
    "cluster.routing.allocation.disk.watermark.low": "30gb"
  }
}
----
// TEST[s/"30gb"/null/]

[discrete]
[[fix-cluster-status-jvm]]
===== Reduce JVM memory pressure

Shard allocation requires JVM heap memory. High JVM memory pressure can trigger
<<circuit-breaker,circuit breakers>> that stop allocation and leave shards
unassigned. See <<high-jvm-memory-pressure>>.
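
As a quick check, you can read each node's current heap usage from the node
stats API; sustained values above roughly 85% are a sign of memory pressure
worth investigating:

[source,console]
----
GET _nodes/stats?filter_path=nodes.*.jvm.mem.heap_used_percent
----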

[discrete]
[[fix-cluster-status-restore]]
===== Recover data for a lost primary shard

If a node containing a primary shard is lost, {es} can typically replace it
using a replica on another node. If you can't recover the node and replicas
don't exist or are irrecoverable, you'll need to re-add the missing data from a
<<snapshot-restore,snapshot>> or the original data source.

WARNING: Only use this option if node recovery is no longer possible. This
process allocates an empty primary shard. If the node later rejoins the cluster,
{es} will overwrite its primary shard with data from this newer empty shard,
resulting in data loss.

Use the <<cluster-reroute,cluster reroute API>> to manually allocate the
unassigned primary shard to another data node in the same tier. Set
`accept_data_loss` to `true`.

[source,console]
----
POST _cluster/reroute?metric=none
{
  "commands": [
    {
      "allocate_empty_primary": {
        "index": "my-index",
        "shard": 0,
        "node": "my-node",
        "accept_data_loss": "true"
      }
    }
  ]
}
----
// TEST[s/^/PUT my-index\n/]
// TEST[catch:bad_request]

If you backed up the missing index data to a snapshot, use the
<<restore-snapshot-api,restore snapshot API>> to restore the individual index.
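
For example, a minimal restore sketch, assuming the snapshot is stored in a
registered repository named `my_repository` and that you've first deleted or
closed the existing `my-index`:

[source,console]
----
POST _snapshot/my_repository/my_snapshot/_restore
{
  "indices": "my-index"
}
----
// TEST[skip:requires a registered snapshot repository]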

Alternatively, you can index the missing data from the original data source.