366 Commits

Author SHA1 Message Date
Martijn van Groningen 6566bb4075
Add global ordinal info to stats APIs. (#94500)
This change adds:
* Total global ordinal build time for all fields and per field.
* Max shard value count per field. The value count is reported for the single shard with the highest count. Reporting the value at the index level or across indices would be too expensive to compute or keep track of.

This is added to common stats, which
is exposed in several stats APIs.

The following API call:

```
GET /_nodes/stats?filter_path=nodes.*.indices.fielddata&fields=key,key2
```

Returns:

```
{
    "nodes": {
        "pcMNy4GsQ8ef6Rw-bI2EFg": {
            "indices": {
                "fielddata": {
                    "memory_size_in_bytes": 2552,
                    "evictions": 0,
                    "fields": {
                        "key2": {
                            "memory_size_in_bytes": 1320
                        },
                        "key": {
                            "memory_size_in_bytes": 1232
                        }
                    },
                    "global_ordinals": {
                        "build_time_in_millis": 8,
                        "fields": {
                            "key2": {
                                "build_time_in_millis": 4,
                                "shard_max_value_count": 4
                            },
                            "key": {
                                "build_time_in_millis": 4,
                                "shard_max_value_count": 4
                            }
                        }
                    }
                }
            }
        }
    }
}
```
2023-04-24 10:45:27 +02:00
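As a minimal sketch of consuming the new `global_ordinals` section from #94500, the Python below walks a trimmed copy of the example response and reports, per node, which field's global ordinals took longest to build. It assumes the response has already been decoded into a dict and reads only keys shown in the example. The helper name `slowest_global_ordinals` is hypothetical.

```python
from typing import Dict, Tuple

# Trimmed copy of the example response above (only the keys the sketch reads).
RESPONSE = {
    "nodes": {
        "pcMNy4GsQ8ef6Rw-bI2EFg": {
            "indices": {
                "fielddata": {
                    "global_ordinals": {
                        "build_time_in_millis": 8,
                        "fields": {
                            "key2": {"build_time_in_millis": 4, "shard_max_value_count": 4},
                            "key": {"build_time_in_millis": 4, "shard_max_value_count": 4},
                        },
                    }
                }
            }
        }
    }
}


def slowest_global_ordinals(response: Dict) -> Dict[str, Tuple[str, int]]:
    """Map each node ID to (field, build_time_in_millis) of its slowest field.

    Hypothetical helper: ties on build time are broken by field name so the
    result is deterministic.
    """
    result = {}
    for node_id, node in response["nodes"].items():
        ordinals = node["indices"]["fielddata"].get("global_ordinals", {})
        fields = ordinals.get("fields", {})
        if not fields:
            continue  # no global ordinals were built on this node
        name, stats = max(
            fields.items(),
            key=lambda item: (item[1]["build_time_in_millis"], item[0]),
        )
        result[node_id] = (name, stats["build_time_in_millis"])
    return result


print(slowest_global_ordinals(RESPONSE))
```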
Ievgen Degtiarenko c2c0ced9b1
Reset desired balance (#94525)
This introduces an endpoint to reset the desired balance.
It can be used to start a new computation from the current state if the
computed balance has diverged significantly from the actual one.
2023-04-20 08:03:48 +02:00
David Turner 4ef9965d47
Report transport message size per action (#94543)
Adds to the transport stats a histogram of transport message sizes for
each transport action.

Closes https://github.com/elastic/elasticsearch/issues/88151
2023-03-28 12:05:18 -04:00
Simon Cooper dc6ccbbe08
Add transport_version to node info JSON (#94669) 2023-03-24 09:00:01 +00:00
Ievgen Degtiarenko 3221d1d199
Make relocatingNodeIsDesired nullable when not relocating (#94279) 2023-03-20 12:49:16 +01:00
Ievgen Degtiarenko 21adad38a3
Add data tier to /_internal/desired-balance (#94496) 2023-03-20 10:44:55 +01:00
David Turner e43e7c2f4a
Improve transport stats histogram (#93598)
- omits empty buckets at the start and end of the histogram
- includes a human-readable representation of the bucket boundaries if `?human` is specified
2023-03-17 18:01:58 -04:00
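The first bullet of #93598 trims empty buckets from both ends of the histogram. A minimal Python sketch of that idea follows. It assumes buckets are hypothetical `(lower_bound, upper_bound, count)` tuples sorted in ascending order. It is an illustration, not the Java implementation in Elasticsearch.

```python
from typing import List, Tuple

# Hypothetical bucket shape: (lower_bound, upper_bound, count), sorted ascending.
Bucket = Tuple[int, int, int]


def trim_empty_buckets(buckets: List[Bucket]) -> List[Bucket]:
    """Drop empty buckets at the start and end; keep empty buckets in between."""
    start = 0
    while start < len(buckets) and buckets[start][2] == 0:
        start += 1
    end = len(buckets)
    while end > start and buckets[end - 1][2] == 0:
        end -= 1
    return buckets[start:end]


histogram = [(0, 8, 0), (8, 16, 3), (16, 32, 0), (32, 64, 5), (64, 128, 0)]
print(trim_empty_buckets(histogram))
```

The sketch keeps interior empty buckets so that the remaining bucket boundaries stay contiguous.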
Ievgen Degtiarenko 90d017ae09
Add cluster_info to the GET /_internal/desired-balance endpoint output (#94272) 2023-03-06 13:57:55 +01:00
Abdon Pijpelink c5b1d997d1
[DOCS] Add 'total' object for io_stats in nodes stats response (#93854) 2023-02-23 13:08:42 +01:00
David Turner a7e2430b79
Include node ID in balance API (#93823)
Today we report node stats by name, but the desired nodes work in terms
of node IDs. This commit adds a mapping between node name and ID to make
the output easier to interpret.
2023-02-15 08:39:59 -05:00
Pooya Salehi ee56ea8c82
Clarify wait_for_completion doc in the Task API (#93754) 2023-02-13 18:29:14 +01:00
David Turner 9c8c9528ad
Add cluster stats re. snapshot activity (#93680)
Shows how many ongoing snapshots/clones/deletions/etc. there are, and
summarises the shard-level status too for progress tracking.
2023-02-13 07:48:14 +00:00
Ievgen Degtiarenko ab5ae88919
Expose forecasted and actual disk usage per tier and node (#93497) 2023-02-06 13:34:18 +01:00
Ievgen Degtiarenko 513dc2f24f
Expose per node counts (#93439) 2023-02-02 16:13:01 +01:00
Ievgen Degtiarenko 22a1ba7b43
Expose tier balancing stats via internal endpoint (#92199) 2023-01-09 10:57:10 +01:00
Ievgen Degtiarenko 42464200fe
Add forecasted_write_load and forecasted_shard_size_in_bytes to the endpoint (#92303) 2023-01-05 12:08:52 +01:00
Frederic Dartayre 27878a78b5
Update get-settings.asciidoc (#91538)
If `include_defaults` is `true` the API returns the default settings from the node handling the API request.
2022-12-22 15:54:26 +01:00
Nik Everett 6481342466
Fix sneaky docs test failure (#91829)
This prevents docs files from *starting* with a "response" because when
that happens the response is converted to an assertion and appended
to the last snippet that was processed. If that last snippet was in a
different file, then it's very hard to reason about the tests. That goes
double because the order in which we iterate over files isn't defined.

Anyway! This adds a guard in the build, removes the offending
"response", and re-enables the tests that we'd thought were failing here.

Closes #91081
2022-12-07 11:02:44 -05:00
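The guard described in #91829 can be sketched in Python as follows. The sketch assumes each docs file has already been parsed into an ordered list of hypothetical snippet kinds such as `"console"` and `"response"`. The real check lives in the Gradle build, and the file names below are made up for illustration.

```python
from typing import Dict, List


def files_starting_with_response(snippets_by_file: Dict[str, List[str]]) -> List[str]:
    """Return the files whose first snippet is a response.

    Such a response would otherwise be turned into an assertion on whatever
    snippet was processed last, possibly one from a different file.
    """
    offenders = []
    for path, kinds in snippets_by_file.items():
        if kinds and kinds[0] == "response":
            offenders.append(path)
    return sorted(offenders)


docs = {
    "a.asciidoc": ["response", "console"],  # would be flagged
    "b.asciidoc": ["console", "response"],  # fine: response follows a request
    "c.asciidoc": [],                       # fine: no snippets at all
}
bad = files_starting_with_response(docs)
if bad:
    print("docs files must not start with a response:", bad)
```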
Pooya Salehi 3a223d933a
Prevalidate node removal API (pt. 2) (#91256)
This PR extends the basic Prevalidation API so that in case there are 
red non-searchable-snapshot indices in the cluster, we reach out to 
the nodes (whose removal is being prevalidated) to find out if they 
have a local copy of any red indices.

Closes #87776
2022-11-28 11:51:51 +01:00
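The combined check from pt. 1 (#88952) and pt. 2 (#91256) can be sketched as a small decision function. Red searchable snapshot indices never block removal. Otherwise, removal is presumably unsafe if any node being removed holds a local copy of a red non-searchable-snapshot index. The `RedIndex` type and the `local_copies` mapping are hypothetical stand-ins for the cluster state and the per-node requests. This is not the actual Elasticsearch code.

```python
from dataclasses import dataclass
from typing import Dict, List, Set


@dataclass(frozen=True)
class RedIndex:
    name: str
    is_searchable_snapshot: bool


def removal_is_safe(
    red_indices: List[RedIndex],
    nodes_to_remove: List[str],
    local_copies: Dict[str, Set[str]],
) -> bool:
    """Return True if removing nodes_to_remove should not lose red index data.

    local_copies maps node ID -> names of red indices that node holds locally
    (standing in for the node-level requests added in pt. 2).
    """
    # pt. 1: red searchable snapshot indices are backed by a snapshot.
    at_risk = {i.name for i in red_indices if not i.is_searchable_snapshot}
    if not at_risk:
        return True
    # pt. 2: ask the nodes being removed whether they hold an at-risk copy.
    for node_id in nodes_to_remove:
        if at_risk & local_copies.get(node_id, set()):
            return False
    return True
```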
Pooya Salehi 327f50ba46
Prevalidate node removal API (pt. 1) (#88952)
This PR adds the first part of the Prevalidate Node Removal API. This
API allows checking whether attempting to remove some node(s) from the
cluster is likely to succeed or not. This check is useful when a node
needs to be removed from a RED cluster without risking losing the last
copy of some RED shards.

In this PR, we only check whether the RED indices are searchable snapshot
indices. If they are, removing any node is safe because each RED index is
backed by a snapshot.

Relates #87776
2022-11-16 13:44:00 +01:00
Artem Prigoda 79ca59bc96
DesiredBalance: expose it via _internal/desired_balance (#91038)
Adds an internal endpoint, `GET _internal/desired_balance`, that exposes the desired balance and computation stats on the master node. It returns:
```
{
  "stats": {
    "computation_active": false,
    "computation_submitted": 5,
    "computation_executed": 5,
    "computation_converged": 5,
    "computation_iterations": 4,
    "computation_converged_index": 4,
    "computation_time_in_millis": 0,
    "reconciliation_time_in_millis": 0
  },
  "routing_table": {
    "test": {
      "0": {
        "current": [
          {
            "state": "STARTED",
            "primary": true,
            "node": "UPYt8VwWTt-IADAEbqpLxA",
            "node_is_desired": true,
            "relocating_node": null,
            "relocating_node_is_desired": false,
            "shard_id": 0,
            "index": "test"
          }
        ],
        "desired": {
          "node_ids": [
            "UPYt8VwWTt-IADAEbqpLxA"
          ],
          "total": 1,
          "unassigned": 0,
          "ignored": 0
        }
      },
      "1": {
        "current": [
          {
            "state": "STARTED",
            "primary": true,
            "node": "2x1VTuSOQdeguXPdN73yRw",
            "node_is_desired": true,
            "relocating_node": null,
            "relocating_node_is_desired": false,
            "shard_id": 1,
            "index": "test"
          }
        ],
        "desired": {
          "node_ids": [
            "2x1VTuSOQdeguXPdN73yRw"
          ],
          "total": 1,
          "unassigned": 0,
          "ignored": 0
        }
      }
    }
  }
}
```

Fixes #90583
2022-11-16 02:54:10 +01:00
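The `routing_table` in the example response from #91038 can be scanned for shard copies that are not yet on a desired node. A minimal Python sketch follows. It reads only keys that appear in the example (`routing_table`, `current`, `node`, `node_is_desired`). The helper name `undesired_copies` is hypothetical, and the `diverged` variant is invented purely for illustration.

```python
import copy
from typing import Dict, List, Tuple


def undesired_copies(response: Dict) -> List[Tuple[str, int, str]]:
    """Return (index, shard_id, node) for each current copy not on a desired node."""
    found = []
    for index, shards in response["routing_table"].items():
        for shard_id, shard in shards.items():
            for shard_copy in shard["current"]:
                if not shard_copy["node_is_desired"]:
                    found.append((index, int(shard_id), shard_copy["node"]))
    return sorted(found)


# Trimmed copy of the example response above.
example = {
    "routing_table": {
        "test": {
            "0": {"current": [{"node": "UPYt8VwWTt-IADAEbqpLxA", "node_is_desired": True}]},
            "1": {"current": [{"node": "2x1VTuSOQdeguXPdN73yRw", "node_is_desired": True}]},
        }
    }
}
print(undesired_copies(example))  # every copy in the example is already desired

# Hypothetical variant: pretend shard 1 has not yet moved to its desired node.
diverged = copy.deepcopy(example)
diverged["routing_table"]["test"]["1"]["current"][0]["node_is_desired"] = False
print(undesired_copies(diverged))
```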
Luca Cavanna 238163cd60
Expose telemetry about search usage (#91528)
This is the continuation of #90176 which leverages #90425 to count query types. This PR adds search usage stats to the existing telemetry by counting sections being used as part of a search request, as well as query types. Each distinct query type is counted once per search request.

The counting is performed while parsing, for the following REST search endpoints:

- _search
- _msearch
- _async_search
- _search/template
- _msearch/template
- _fleet/_fleet_search
- _fleet/_fleet_msearch

All other APIs that use search internally, like reindex, ML transform, rank eval, SQL, etc., are not counted as part of these search usage stats. Such additional functionality should have its own dedicated telemetry if needed.

The counting of search sections is not exhaustive: only the sections that are interesting to collect counts for are tracked.

The following is the new section added to the cluster stats API response, including some sample stats:

```
"search" : {
  "total" : 63,
  "sections" : {
    "knn" : 42,
    "query" : 21, 
    "aggs" : 46
  }, 
  "query" : {
    "match" : 58
  }
}
```

A big part of the change is actually the plumbing to make a common service class that holds the counters available to all the different callers of the parsing methods, especially plugins. Ideally, there would be a separate component that exposes the search parsing functionality rather than static methods, but changing that would require making the additional component available to the REST layer which is not trivial. I reused the existing UsageService which the RestController already holds, and is already used to count access to the different REST endpoints.

Co-authored-by: Mayya Sharipova <mayya.sharipova@elastic.co>
2022-11-15 21:34:49 +01:00
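#91528 states that each distinct query type is counted once per search request. A minimal Python sketch of that rule follows. Each request increments `total`, each section it uses, and each distinct query type once, however many times that type appears. Requests are modeled as hypothetical dicts with a `sections` list and a `query_types` list of every query type seen while parsing. The real counters are Java code held by `UsageService`.

```python
from collections import Counter
from typing import Dict, Iterable, List


def search_usage(requests: Iterable[Dict[str, List[str]]]) -> Dict:
    """Aggregate hypothetical parsed requests into a `search` stats section."""
    total = 0
    sections: Counter = Counter()
    queries: Counter = Counter()
    for request in requests:
        total += 1
        # A section is counted at most once per request (assumed here).
        sections.update(set(request.get("sections", [])))
        # A query type is counted once per request, even if it appears
        # several times (e.g. two `match` clauses inside one `bool` query).
        queries.update(set(request.get("query_types", [])))
    return {"total": total, "sections": dict(sections), "query": dict(queries)}


usage = search_usage([
    {"sections": ["query", "aggs"], "query_types": ["bool", "match", "match"]},
    {"sections": ["knn"], "query_types": []},
])
print(usage)
```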
Iraklis Psaroudakis 756fcc212d
Log YAML test file on failure (#91349)
Relates #91081
2022-11-09 18:35:36 +02:00
Pooya Salehi c7bfdf89b6
Revert "[CI] mute DocsClientYamlTestSuiteIT.test {yaml=reference/cluster/nodes-info/line_283} (#91445)" (#91452)
Relates https://github.com/elastic/elasticsearch/issues/91444
2022-11-09 06:02:22 -05:00
Pooya Salehi 87608da4cf
[CI] mute DocsClientYamlTestSuiteIT.test {yaml=reference/cluster/nodes-info/line_283} (#91445)
Relates https://github.com/elastic/elasticsearch/issues/91444
2022-11-09 05:09:16 -05:00
Iraklis Psaroudakis aa083ce419
[CI] Mute reference/cluster/nodes-stats (#91399)
relates #91081
2022-11-08 14:57:37 +02:00
Iraklis Psaroudakis dcdf58721d
[CI] Mute reference/cluster/nodes-stats/line_2735 (#91380)
relates #91081
2022-11-08 05:04:49 -05:00
Hendrik Muhs 1b556d75fa
mute another node stats test (#91346)
muting another test part as it causes a lot of CI failures

relates #91081
2022-11-07 06:07:09 -05:00
Mary Gouseti d55059afab
Mute reference/cluster/nodes-stats/line_2751 (#91174) 2022-10-28 11:55:53 +02:00
Francisco Fernández Castaño 1a3032beb6
Keep track of average shard write load (#90768)
This commit adds a new field, write_load, into the shard stats. This new stat exposes the average number of write threads used while indexing documents.

Closes #90102
2022-10-13 16:34:45 +02:00
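#90768 describes `write_load` as the average number of write threads used while indexing. One plausible way to derive such a figure is to divide the total time that threads spent indexing by the elapsed wall-clock time. This is an assumption made for illustration, not the formula quoted from the PR, and the function name is hypothetical.

```python
def average_write_load(total_indexing_time_nanos: int, elapsed_nanos: int) -> float:
    """Average number of write threads busy over the elapsed period (assumed formula).

    total_indexing_time_nanos sums the indexing time of every thread, so two
    threads busy for the whole period contribute twice the elapsed time.
    """
    if elapsed_nanos <= 0:
        return 0.0
    return total_indexing_time_nanos / elapsed_nanos


# Two threads busy for the full 10 s plus one busy for 5 s: 25 s of work in 10 s.
print(average_write_load(25_000_000_000, 10_000_000_000))
```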
Ievgen Degtiarenko 4d6d979e0e
Deprecate state field in `/_cluster/reroute` response (#90399) 2022-10-05 08:18:27 +02:00
Iraklis Psaroudakis 3ed7a04d22
Introduce node mappings stats (#89807)
The new stats are visible in NodeIndicesStats only at the node and index (but not shard) levels, and also in the _cat/nodes table. This also adds an exact-count YAML REST test.
2022-09-19 15:47:47 +03:00
Artem Prigoda 72a6fdc2b8
Support "dry run" mode for updating Desired Nodes (#88305)
Adds the dry_run query parameter to support simulating an update of desired nodes. The update request is validated, but no cluster state updates are performed. To indicate that the response is the result of a dry run, we add a dry_run field to the JSON representation of the response.

See #82975
2022-07-26 09:03:12 +02:00
Elasticsearch addict e3dc098a0a
Tasks doc: fix a mistake about the reindex task description (#88669) 2022-07-22 12:17:00 +02:00
Elasticsearch addict 11473964ab
Improve description for task api detailed param (#88493)
Co-authored-by: David Turner <david.turner@elastic.co>
2022-07-14 09:22:28 +02:00
David Turner ff269f8104
Small fixes to clear voting config excls API (#87828)
Fixes the name of the REST param in the error message, and expands the
API docs to emphasise that the exclusions should be empty in normal
operation.
2022-06-20 10:40:39 +01:00
David Turner fcf293f87c
Report overall mapping size in cluster stats (#87556)
Adds measures of the total size of all mappings and the total number of
fields in the cluster (both before and after deduplication).

Relates #86639
Relates #77466
2022-06-14 13:55:14 +01:00
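#87556 reports mapping size and field count both before and after deduplication. A minimal Python sketch of those measures follows. It assumes each index's mapping is available as a JSON source string and that identical sources are deduplicated by exact content. The real implementation may compare mappings differently, and the output key names here are illustrative.

```python
import json
from typing import Dict, Iterable


def count_fields(properties: Dict) -> int:
    """Count every entry under `properties`, recursing into object fields."""
    n = 0
    for field in properties.values():
        n += 1
        n += count_fields(field.get("properties", {}))
    return n


def mapping_stats(mappings_by_index: Dict[str, str]) -> Dict[str, int]:
    """Total size in bytes and field count, before and after deduplication."""

    def measure(sources: Iterable[str]) -> Dict[str, int]:
        size = fields = 0
        for source in sources:
            size += len(source.encode("utf-8"))
            fields += count_fields(json.loads(source).get("properties", {}))
        return {"size": size, "fields": fields}

    total = measure(mappings_by_index.values())
    deduplicated = measure(set(mappings_by_index.values()))  # identical sources once
    return {
        "total_size_in_bytes": total["size"],
        "total_field_count": total["fields"],
        "deduplicated_size_in_bytes": deduplicated["size"],
        "deduplicated_field_count": deduplicated["fields"],
    }


m = '{"properties": {"a": {"type": "keyword"}, "b": {"properties": {"c": {"type": "long"}}}}}'
other = '{"properties": {"x": {"type": "text"}}}'
print(mapping_stats({"logs-1": m, "logs-2": m, "other": other}))
```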
Mayya Sharipova 4dabd5eb8e
Add mapping stats for indexed dense_vectors (#86859)
Add cluster mapping stats for indexed dense_vectors

Currently the _cluster/stats mapping section displays all mapping types
along with their count. In 8.0 we introduced indexed dense_vector
types, and we would like to collect more detailed stats on them:
- number of indexed dense_vector fields
- sum of dims across all indexed dense_vector fields

This allows us to differentiate how indexed dense_vector types are
used as opposed to unindexed dense_vector types.
2022-06-07 08:40:28 -04:00
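The two measures listed in #86859 can be sketched by walking a mapping's `properties`. The measures are the number of indexed dense_vector fields and the sum of their `dims`. This sketch assumes a field is indexed only when its mapping sets `"index": true`. The function and output key names are hypothetical.

```python
from typing import Dict


def indexed_dense_vector_stats(properties: Dict) -> Dict[str, int]:
    """Count indexed dense_vector fields and sum their dims, including nested objects."""
    count = 0
    dims_sum = 0
    stack = [properties]
    while stack:
        props = stack.pop()
        for field in props.values():
            if field.get("type") == "dense_vector" and field.get("index") is True:
                count += 1
                dims_sum += field.get("dims", 0)
            # Object fields carry their own `properties`; leaves yield {}.
            stack.append(field.get("properties", {}))
    return {"indexed_count": count, "dims_sum": dims_sum}


mapping = {
    "title_vector": {"type": "dense_vector", "dims": 3, "index": True, "similarity": "cosine"},
    "raw_vector": {"type": "dense_vector", "dims": 128},  # not indexed: ignored
    "obj": {"properties": {"v": {"type": "dense_vector", "dims": 5, "index": True}}},
}
print(indexed_dense_vector_stats(mapping))
```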
Joe Gallo 79990fa49b
Remove "Push back excessive requests for stats (#83832)" (#87054) 2022-05-23 12:58:02 -04:00
Francisco Fernández Castaño e91e7e653b
Add support for CPU ranges in desired nodes (#86434)
This commit adds support for CPU ranges in the desired nodes API.

This aligns better with environments where administrators/orchestrators
can define lower and upper bounds for the number of CPUs that the
desired node would get once deployed.

This makes it possible to describe the expected number of CPUs and the
overcommit that may be allowed on the node the desired node will run on.

This was the previous expected body for the desired nodes API (we still support it):
```
PUT /_internal/desired_nodes/history/1
{
    "nodes" : [
        {
            "settings" : {
                 "node.name" : "instance-000187",
                 "node.external_id": "instance-000187",
                 "node.roles" : ["data_hot", "master"],
                 "node.attr.data" : "hot",
                 "node.attr.logical_availability_zone" : "zone-0"
            },
            "processors" : 8, 
            "memory" : "58gb",
            "storage" : "1700gb",
            "node_version" : "8.3.0"
        }
    ]
}
```

Now it's possible to define `processors` or `processors_range` as in:
```
PUT /_internal/desired_nodes/history/1
{
    "nodes" : [
        {
            "settings" : {
                 "node.name" : "instance-000187",
                 "node.external_id": "instance-000187",
                 "node.roles" : ["data_hot", "master"],
                 "node.attr.data" : "hot",
                 "node.attr.logical_availability_zone" : "zone-0"
            },
            "processors_range" : {"min": 8.0, "max": 16.0},
            "memory" : "58gb",
            "storage" : "1700gb",
            "node_version" : "8.3.0"
        }
    ]
}
```
Note that `max` in `processors_range` is optional.

This commit also changes the representation of CPUs from integers to
floating point numbers.

Note: I disabled the bwc yamlRestTests for versions < 8.3 since we introduced
a few "breaking changes"; since this is an internal API, that should be fine.
2022-05-20 11:47:32 +02:00
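The request shapes in #86434 can be sketched as a small parser. A node gives `processors` or `processors_range`, the range's `max` is optional, and values may be floating point. The following rules are assumptions made for this sketch and are not quoted from the PR: the two fields are mutually exclusive, `min` is required, and `max >= min`. The function name is hypothetical.

```python
from typing import Dict, Optional, Tuple


def parse_processors(node: Dict) -> Tuple[float, Optional[float]]:
    """Return (min, max) processors for a desired node; max is None if unbounded."""
    has_single = "processors" in node
    has_range = "processors_range" in node
    if has_single == has_range:  # neither or both given (assumed to be invalid)
        raise ValueError("specify exactly one of [processors] or [processors_range]")
    if has_single:
        value = float(node["processors"])
        return value, value
    rng = node["processors_range"]
    if "min" not in rng:
        raise ValueError("[processors_range] requires [min]")
    minimum = float(rng["min"])
    maximum = rng.get("max")  # optional upper bound
    if maximum is not None:
        maximum = float(maximum)
        if maximum < minimum:
            raise ValueError("[max] must be greater than or equal to [min]")
    return minimum, maximum


print(parse_processors({"processors": 8}))
print(parse_processors({"processors_range": {"min": 8.0, "max": 16.0}}))
```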
David Turner 6f0cee0fae
Add master_timeout support to voting config exclusions APIs (#86670)
Today the add/clear voting config exclusions APIs route a request to the
master node but do not expose the usual `?master_timeout` parameter,
which allows changing the timeout for this phase of execution. This commit
adds the missing parameter.
2022-05-11 13:56:50 +01:00
Rene Groeschke 62d5aa986c
Port gradle docs test plugin to use internal yaml rest test plugin (#86598)
Remove usage of deprecated elasticsearch.rest-test in DocsTestPlugin

We keep some files in src/test in docs projects, as moving them would require more changes
in the build-docs project outside this repository.
2022-05-11 12:01:23 +02:00
Gabi Davar 43ab984639
Add documentation for "io_time_in_millis" (#84911)
Add documentation for "io_time_in_millis"

Co-authored-by: Adam Locke <adam.locke@elastic.co>
2022-04-25 16:43:19 +01:00
Ryan Ernst d60cde6681
Remove flavor from build (#85796)
The default distribution is the only remaining build flavor, and has been for
quite a while now. This commit removes flavor from the internal Build
class. It keeps rest api compat for nodes info for now by hardcoding
`default`.
2022-04-11 16:46:55 -07:00
Ryan Ernst cf3dc57132
Remove no-jdk deprecations (#85765)
The no-jdk distributions exist in 7.x and before. They were removed with
8.0. This commit removes the remaining deprecation messages for using
the no-jdk distribution. Note that when talking with an older node, we
drop the bundledJdk attribute. This is OK because that attribute can only
be false when talking with a 7.17 node during an upgrade, and
usingBundledJdk is retained, which is what matters when debugging a
problem.

relates #76896
relates #85758
2022-04-11 14:52:31 -07:00
Mary Gouseti ed0bb2a8af
Push back excessive requests for stats (#83832)
Resolves #51992
2022-02-28 08:46:18 +01:00
Nhat Nguyen 86964c9752
Document partial search results with skip_unavailable (#84057)
This commit adds an explanation for the relation between `allow_partial_search_results` and `skip_unavailable` in CCS requests.

Relates to #33915

Closes #82407

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2022-02-23 10:04:52 -05:00
David Turner 02f38e3da9
Make allocation explanations more actionable (#83983)
The cluster allocation explain API includes a top-level status
indicating to the user whether the shard can be assigned/rebalanced/etc.
or not. Today this status is fairly terse and experience shows that
users sometimes struggle to understand how to interpret it and to decide
on follow-up actions.

This commit makes the top-level explanation more detailed and
actionable. For instance, in cases like `THROTTLED` where the status
is transient we instruct the user to wait; if a shard is lost we say to
restore it from a snapshot; if a shard cannot be assigned we say to
choose a specific node where its assignment is expected and to address
the obstacles.

Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2022-02-22 09:23:01 +00:00
Tobias Stadler e3deacf547
[DOCS] Fix typos (#83895) 2022-02-15 12:42:17 -05:00
Mark Vieira fcf1380492 Fix documentation snippet tests 2022-02-02 13:29:02 -08:00