1000 Commits

Author SHA1 Message Date
David Turner 0131e80624
Revert "(+Doc) link split-brain wiki from quorom decision making (#108915)"
This reverts commit 4d3ca2d029.
2024-06-16 08:54:44 +01:00
shainaraskas 900eb82c99
[DOCS] Address local vs. remote storage + shard limits feedback (#109360)
2024-06-12 13:50:23 -04:00
David Turner 366c0b16bf
Add docs on HTTP client config (#109543)
Some notes and recommendations on timeouts and TCP keepalives.

Relates INC-1049
2024-06-12 14:54:54 +01:00
David Turner 683245e41e
Detect long-running tasks on network threads (#109204)
This commit introduces a watchdog timer to monitor for long-running
tasks on network threads. If a network thread is active and has not made
progress for two consecutive ticks of the timer then the watchdog logs a
warning and a thread dump.
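
The detection rule described above can be sketched as follows. This is a minimal, single-threaded illustration, not the code from the commit: the class names `ActivityTracker` and `Watchdog`, the odd/even counter convention, and the `tick` method are all assumptions made for this example.

```python
class ActivityTracker:
    """Per-thread progress counter (hypothetical): odd means active, even means idle."""

    def __init__(self):
        self.counter = 0

    def start_activity(self):
        self.counter += 1  # becomes odd: the thread is now running a task

    def stop_activity(self):
        self.counter += 1  # becomes even: the task finished, so progress was made


class Watchdog:
    """Reports threads that look active and unchanged across two consecutive ticks."""

    def __init__(self):
        self._trackers = {}
        self._last_seen = {}

    def tracker_for(self, thread_name):
        return self._trackers.setdefault(thread_name, ActivityTracker())

    def tick(self):
        stuck = []
        for name, tracker in self._trackers.items():
            value = tracker.counter
            is_active = value % 2 == 1
            # Same odd value as on the previous tick: no progress since then.
            if is_active and self._last_seen.get(name) == value:
                stuck.append(name)  # a real watchdog would log a warning and a thread dump here
            self._last_seen[name] = value
        return stuck
```

A thread that starts a task and never finishes it is reported on the second tick after the task began; any completed task in between changes the counter and resets the check.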
2024-06-10 17:47:40 +10:00
Liam Thompson 2268e383e8
[DOCS][ESQL][8.14] Add API key based security model info for ESQL CCS (#109155)
Co-authored-by: Jake Landis <jake.landis@elastic.co>
2024-06-03 18:44:33 +02:00
Stef Nestor 4d3ca2d029
(+Doc) link split-brain wiki from quorom decision making (#108915)
Mini change to link the [wiki page about "split-brain"](https://en.wikipedia.org/wiki/Split-brain_(computing)) as an industry-not-Elastic term under [Quorum-based decision making](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-quorums.html)
2024-05-22 13:22:03 -06:00
Stef Nestor 12aab08330
(+Doc) Link split-brain wiki (#108914)
Mini change to link the wiki page about "split-brain" as an industry-not-Elastic term under [Voting configurations](https://www.elastic.co/guide/en/elasticsearch/reference/current/modules-discovery-voting.html).
2024-05-22 13:21:54 -06:00
David Turner 6ecb295ff1
Document `transport.compress` trade-offs more clearly (#108458)
Spells out explicitly that setting `transport.compress: true` may cost
extra CPU.
2024-05-09 14:04:45 -04:00
shainaraskas 9d9f23ca96
[DOCS] Add API example + diagrams to shard allocation awareness docs (#108390)
2024-05-08 12:52:50 -04:00
Liam Thompson 9a62dba53c
[DOCS] Remove remaining beta flags for RCS (#108201)
2024-05-03 09:12:37 +02:00
florent-leborgne 0c500e5264
Remove Beta label for RCS2.0 from 8.14 (#108030)
2024-05-02 15:43:21 +02:00
Michael Peterson a451511e3a
Change skip_unavailable default value to true (#105792)
In order to improve the experience of cross-cluster search, we are changing
the default value of the remote cluster `skip_unavailable` setting from `false` to `true`.

With `skip_unavailable=false`, any cross-cluster _search (or _async_search) fails entirely when
such a remote cluster is either unavailable (the connection to it fails)
or the search on it fails on all shards.

Setting `skip_unavailable=true` allows partial results from other clusters to be
returned. In that case, the search response cluster metadata will show a `skipped`
status, so the user can see that no data came in from that cluster. Kibana also
now leverages this metadata in the cross-cluster search responses to allow users
to see how many clusters returned data and drill down into which clusters did not
(including failure messages).
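
The behaviour described above can be modelled roughly as follows. This is a simplified sketch, not Elasticsearch code: `RemoteResult`, `SearchFailed`, and `merge_remote_results` are hypothetical names, and only the `skipped` status comes from the text (the `successful` label is an assumption for the example).

```python
from dataclasses import dataclass


class SearchFailed(Exception):
    """Raised when a cluster marked as essential cannot contribute results."""


@dataclass
class RemoteResult:
    alias: str
    skip_unavailable: bool
    available: bool   # False if the connection to the remote cluster failed
    shards_ok: int    # number of shards that returned results
    hits: list


def merge_remote_results(results):
    """Combine per-cluster results, honouring each cluster's skip_unavailable flag."""
    hits, statuses = [], {}
    for r in results:
        unusable = (not r.available) or r.shards_ok == 0
        if unusable and not r.skip_unavailable:
            # skip_unavailable=false: the whole search fails.
            raise SearchFailed(f"remote cluster [{r.alias}] returned no results")
        if unusable:
            statuses[r.alias] = "skipped"  # visible in the response cluster metadata
        else:
            statuses[r.alias] = "successful"
            hits.extend(r.hits)
    return hits, statuses
```

With every cluster set to `skip_unavailable=true`, an unreachable cluster only shows up as `skipped` while the other clusters' hits are still returned.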

Currently, the user/admin has to specifically set the value to `true` in the configs, like so:

```
cluster:
    remote:
        remote1:
            seeds: 10.10.10.10:9300
            skip_unavailable: true
```

even though that is probably what search admins want in the vast majority of cases.

Setting `skip_unavailable=false` should be a conscious (and probably rare) choice
by an Elasticsearch admin that a particular cluster's results are so essential to a
search (or visualization in dashboard or Discover panel) that no results at all should
be shown if it cannot return any results.
2024-04-29 15:53:47 -04:00
Liam Thompson 33a71e3289
[DOCS] Refactor book-scoped variables in `docs/reference/index.asciidoc` (#107413)
* Remove `es-test-dir` book-scoped variable

* Remove `plugins-examples-dir` book-scoped variable

* Remove `:dependencies-dir:` and `:xes-repo-dir:` book-scoped variables

- In `index.asciidoc`, two variables (`:dependencies-dir:` and `:xes-repo-dir:`) were removed.
- In `sql/index.asciidoc`, the `:sql-tests:` path was updated to a fuller path
- In `esql/index.asciidoc`, the `:esql-tests:` path was updated in the same way

* Replace `es-repo-dir` with `es-ref-dir`

* Move `:include-xpack: true` to the few files that use it, remove from index.asciidoc
2024-04-17 14:37:07 +02:00
David Turner 9a2f8a80eb
Add remote cluster network troubleshooting docs (#107072)
Spells out in a little more detail our expectations for remote cluster
connections, including an example log message when the network is
unreliable and some suggestions for how to troubleshoot further.
2024-04-04 02:37:13 -04:00
shainaraskas 206a0b7a4c
[DOCS] Remove obsolete accounting circuit breakers (#107015)
2024-04-03 09:54:53 -04:00
Jake Landis bb9566a57e
Update discovery.asciidoc (#106541) (#106695)
Fix typo

(cherry picked from commit 96a46b9c5b)

Co-authored-by: Boen <13752080613@163.com>
2024-03-22 15:43:48 -04:00
shainaraskas 82d7e4ec93
[DOCS] Clarify behavior of the generic `data` node role (#106375)
2024-03-22 14:06:19 -04:00
florent-leborgne d37d93ac36
[Docs] [Remote Clusters] Note about certificates in ESS for Remote Cluster Security (#105771)
* note about ess certificates

* Update docs/reference/modules/cluster/remote-clusters-api-key.asciidoc

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>

---------

Co-authored-by: Liam Thompson <32779855+leemthompo@users.noreply.github.com>
2024-02-23 15:13:51 +01:00
David Turner 369096365c
Expand docs about max-shards-per-node (#105607)
Adds a little more detail on what sorts of problems may occur if you
exceed the default limits.
2024-02-20 08:43:18 +00:00
Nikolaj Volgushev e241a91a4e
Docs for hot-reloadable remote cluster credentials (#105483)
Docs PR to accompany
https://github.com/elastic/elasticsearch/pull/103215.

Resolves: ES-7625
2024-02-15 06:02:13 -05:00
florent-leborgne 4ee086e406
[DOCS] [Remote clusters] Reference specific instructions for cloud trust
2024-02-15 09:39:02 +01:00
David Turner cc2e56da38
Security auto-config overrides default `http.host` (#105377)
If you start up a freshly-unpacked Elasticsearch tarball, security
auto-configuration will set `http.host: 0.0.0.0` in `elasticsearch.yml`,
overriding the documented default behaviour which is to fall back to
`network.host` which itself defaults to `localhost`. This commit adds a
note to the docs about this.
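
The fallback order described above can be illustrated with a small sketch. The function name `effective_http_host` and the flat dictionary of settings are assumptions for this example; the `localhost` default is taken from the commit message.

```python
def effective_http_host(settings):
    """Resolve the HTTP bind host from a flat mapping of setting names to values."""
    if "http.host" in settings:
        # An explicit http.host (such as the one written by security
        # auto-configuration) takes precedence over everything else.
        return settings["http.host"]
    # Documented default: fall back to network.host, which itself
    # defaults to localhost.
    return settings.get("network.host", "localhost")
```

For an empty mapping this returns `localhost`, whereas a mapping containing `http.host: 0.0.0.0` returns `0.0.0.0` regardless of `network.host`.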
2024-02-12 09:54:38 +00:00
Fabio Busatto b1adb78f6c
[DOCS] Update remote cluster setup instructions (#105256)
2024-02-07 21:11:57 +01:00
Yang Wang 6cf92584ba
[Docs] Minor tweak for balance settings docs (#105170)
Relates: #105119
2024-02-06 22:31:35 +11:00
David Turner 6a40c04cc1
More guidance in balance settings docs (#105119)
Today the docs on balancing settings describe what the settings all do
but offer little guidance about how to configure them. This commit adds
some extra detail to avoid some common misunderstandings and reorders
the docs a little so that more commonly-adjusted settings are mentioned
earlier.
2024-02-05 05:04:24 -05:00
David Turner 88e497069a
Allocation awareness allocates some replicas (#104800)
The docs for forced awareness indicate that no replicas will be assigned
until all zones are available, which is definitely undesirable and also
not the actual behaviour. This commit fixes the wording to match what
really happens.

Closes #104777
2024-01-29 08:13:06 +00:00
David Turner 1c11249c05
Fix docs about uneven disk usage (#104541)
There's a note in the docs saying we only consider shard count and not
disk usage which is no longer true. This commit fixes the note to
reflect today's implementation.
2024-01-18 16:02:37 +00:00
Iraklis Psaroudakis 37b7dd987b
Add warning on desired balancer heuristics (#102633)
To avoid changing them.
2023-11-27 14:45:57 +02:00
David Turner 61191b880c
Link to troubleshooting docs from other disco pages (#102509)
I have several times struggled to find the docs about restoring from a
snapshot if a quorum cannot be found. That info is on the discovery
troubleshooting page, but it seems I expect it to be somewhere like
the quorums or voting docs pages instead. This commit adds links from
those pages to the troubleshooting page.
2023-11-23 09:45:21 +00:00
David Turner 9b51d9972d
More specific `cluster.initial_master_nodes` instructions (#101493)
In the note on forming a single cluster we describe what to do if
inadvertently forming extra clusters, but we can be more explicit about
what to do with `cluster.initial_master_nodes` in these instructions.
This commit adds the missing details.
2023-10-30 08:25:40 +00:00
David Turner 5dff56a00e
Mention network handler logging in docs (#100118)
Mentions the `InboundHandler` (and `OutboundHandler`) as potential
sources of useful log messages when tracking down a network threading
bug.
2023-10-02 08:52:16 +01:00
James Rodewig 4da2d31390
[main] [DOCS] Fix typo in query_cache.asciidoc (#99713) (#99810)
Co-authored-by: Joseph AFARI <71259267+joeafari@users.noreply.github.com>
2023-09-22 08:58:05 -04:00
James Rodewig 255c9a7f95
[DOCS] Move x-pack docs to `docs/reference` dir (#99209)
**Problem:**
For historical reasons, source files for the Elasticsearch Guide's security, watcher, and Logstash API docs are housed in the `x-pack/docs` directory. This can confuse new contributors who expect Elasticsearch Guide docs to be located in `docs/reference`. 

**Solution:**
- Move the security, watcher, and Logstash API doc source files to the `docs/reference` directory
- Update doc snippet tests to use security

Rel: https://github.com/elastic/platform-docs-team/issues/208
2023-09-12 14:53:41 -04:00
Abdon Pijpelink 54f6e4f51b
[DOCS] Remove 'coming in 8.10' from remote cluster API key auth docs (#99462)
2023-09-12 13:25:56 +02:00
Abdon Pijpelink af76a3a436
[DOCS] Add 'Troubleshooting an unstable cluster' to nav (#99287)
* [DOCS] Add 'Troubleshooting an unstable cluster' to nav

* Adjust docs links in code

* Revert "Adjust docs links in code"

This reverts commit f3846b1d78.

---------

Co-authored-by: David Turner <david.turner@elastic.co>
2023-09-08 13:42:50 +02:00
Abdon Pijpelink 0421c4fe9b
[DOCS] Remote cluster troubleshooting guide (#99128)
* [DOCS] Remote cluster troubleshooting guide

* Fix test failures

* Apply suggestions from code review

Co-authored-by: Yang Wang <ywangd@gmail.com>

* Review feedback

* Group issues under 'common' and 'API key'

* Apply suggestions from code review

Co-authored-by: Yang Wang <ywangd@gmail.com>

---------

Co-authored-by: Yang Wang <ywangd@gmail.com>
2023-09-05 15:10:45 +02:00
Yang Wang ebe4fe9f15
[Doc] Add links to the new API key based remote cluster page (#99115)
This PR adds links to the new API key based remote cluster page in
multiple places.

Relates: #98330
2023-09-01 06:08:49 -04:00
Abdon Pijpelink 4f1bf97776
[DOCS] Expand the step that enables the remote cluster server (#99084)
* [DOCS] Expand the step that enables the remote cluster server

* Update docs/reference/modules/cluster/remote-clusters-api-key.asciidoc

* Reword

* Reword
2023-09-01 10:35:46 +02:00
Abdon Pijpelink 792f9c1647
[DOCS] Remote cluster migration guide (#98999)
* [DOCS] Remote cluster migration guide

* Review feedback

* Clarify that any extra local privileges will be suppressed by the cross-cluster API key’s privileges
2023-08-31 10:24:20 +02:00
Stef Nestor de380ea2af
[DOC+] Write threadpool also covers ingest pipelines (#99010)
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2023-08-29 13:51:18 -04:00
Abdon Pijpelink 1955bd8ad4
[DOCS] New docs for remote clusters using API key authentication (#98330)
* New docs structure for remote clusters

* Fix broken cross-book link errors

* More broken cross-book link errors

* Remove redirects for new pages

* Link to generic remote cluster docs instead

* Drop 'API' from the abbreviated title

* Add 'Establish trust with a remote cluster' section

* Restructure 'Establish trust' section into Prerequisite/local/remote instructions

* Add 'Configure roles and users' section

* Add 'Connect to a remote cluster' section

* Move version compatibility to prerequisites

* Fix test errors

* Incorporate review feedback

* Mention version 8.10 or later in the intro for API keys

* Add license prerequisite
2023-08-24 12:30:03 +02:00
Roberto Seldner 79d2879564
Add deprecated note for `balanced` allocator (#98610)
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
2023-08-17 12:50:52 -04:00
Yang Wang b337f9b6f3
[Docs] Misc doc update for RCS 2.0 (#98472)
This PR adds docs for the following items:
* Remote indices privileges
* Remote cluster network settings
* Remote cluster security settings
* New privileges
* New response field for RemoteInfo API

List of preview pages:
* [Remote indices in defining roles](https://elasticsearch_98472.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/defining-roles.html#roles-remote-indices-priv)
* [Remote indices in PutRole API](https://elasticsearch_98472.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/security-api-put-role.html#security-api-put-role-request-body)
* [Remote cluster server SSL settings](https://elasticsearch_98472.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/security-settings.html#_remote_cluster_server_api_key_based_model_tlsssl_settings)
* [Remote cluster client SSL settings](https://elasticsearch_98472.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/security-settings.html#_remote_cluster_client_api_key_based_model_tlsssl_settings)
* [Remote cluster network settings](https://elasticsearch_98472.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/modules-network.html#remote-cluster-network-settings) and [here](https://elasticsearch_98472.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/modules-network.html#common-network-settings)
* [Remote cluster credentials setting](https://elasticsearch_98472.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/remote-clusters-settings.html)
* [New privileges](https://elasticsearch_98472.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/security-privileges.html)
* [New response field for RemoteInfo API](https://elasticsearch_98472.docs-preview.app.elstc.co/guide/en/elasticsearch/reference/master/cluster-remote-info.html#cluster-remote-info-api-response-body)
2023-08-15 20:11:21 -04:00
Luca Cavanna 4023454483
Introduce executor for concurrent search (#98204)
This commit enables concurrent search execution in the DFS phase, which is going to improve resource usage as well as performance of knn queries which benefit from both concurrent rewrite and collection.

We will enable concurrent execution for the query phase in a subsequent commit. While this commit does not introduce parallelism for the query phase, it does offload sequential computation to the newly introduced executor. This applies both when a single slice needs to be searched and when a specific request does not support concurrency (currently only the DFS phase supports it, regardless of the request). Sequential collection is offloaded unless the request includes aggregations that don't support offloading: composite, nested and cardinality. Their post-collection method must be executed in the same thread as the collection, or we'll trip a Lucene assertion that verifies that doc_values are pulled and consumed from the same thread.

## Technical details

This commit introduces a secondary executor, used exclusively to execute the concurrent bits of search. The search threads are still the ones that coordinate the search (where the caller search will originate from), but the actual work will be offloaded to the newly introduced executor.

We are offloading not only parallel execution but also sequential execution, to make the workload more predictable, as it would be surprising to have bits of search executed in either of the two thread pools. Otherwise we could suddenly run a larger number of heavy operations overall (some in the caller thread and some in the separate threads), which could overload the system and make thread pool sizing more difficult.

Note that fetch, together with other actions, is still executed in the search thread pool. This commit does not turn the search thread pool into a coordinating-only thread pool; it does so only for the IndexSearcher#search operation itself, which nonetheless accounts for a large portion of the phases of search API execution.

Given that the searcher blocks waiting for all tasks to be completed, we take a simple approach of introducing a thread pool executor that has the same size as the existing search thread pool but relies on an unbounded queue. This simplifies handling of thread pool queue and rejections. In fact, we'd like to guarantee that the secondary thread pool won't reject, and delegate queuing entirely to the search thread pool which is the entry point for every search operation anyway. The principle behind this is that if you got a slot in the search thread pool, you should be able to complete your search, and rather quickly.

As part of this commit we are also introducing the ability to cancel tasks that have not started yet, so that if any task throws an exception, other tasks are prevented from starting needless computation.
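
The combination of a blocking caller, an unbounded queue, and skipping tasks that have not started can be sketched as follows. This is an illustration only, not the implementation from the commit: `search_slices` and the shared `failed` flag are hypothetical, and Python's `ThreadPoolExecutor` merely stands in for the secondary executor because its work queue is unbounded.

```python
import threading
from concurrent.futures import ThreadPoolExecutor


def search_slices(executor, slices):
    """Run every slice on `executor`; once one fails, slices not yet started are skipped."""
    failed = threading.Event()

    def guarded(slice_fn):
        if failed.is_set():
            return None  # a sibling already failed: avoid needless computation
        try:
            return slice_fn()
        except Exception:
            failed.set()
            raise

    # Submission never rejects: the executor's queue is unbounded.
    futures = [executor.submit(guarded, fn) for fn in slices]

    # The caller blocks until all slices are done, then rethrows the first error.
    results, first_error = [], None
    for future in futures:
        try:
            results.append(future.result())
        except Exception as exc:
            if first_error is None:
                first_error = exc
    if first_error is not None:
        raise first_error
    return results
```

A caller would create the pool once, for example `ThreadPoolExecutor(max_workers=4)`, and call `search_slices(pool, [slice_a, slice_b])` from the coordinating thread, where each slice is a zero-argument callable.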

Relates to #80693
Relates to #90700
2023-08-10 12:40:36 +02:00
David Turner 0f6a217ed8
Fix admonition about initial_master_nodes (#98242)
Admonition paragraphs cannot be combined with a `+` continuation mark.
This commit fixes the formatting by using an admonition block instead.
2023-08-08 11:50:36 +01:00
David Turner 847ec45baa
Remove bound on SEARCH_COORDINATION default size (#98264)
Today by default the `SEARCH_COORDINATION` pool is sized at half the
allocated processors, or five if there are more than ten CPUs. Yet, if
we scale up a node to have more than ten CPUs, we probably want to scale
up the number of search coordination threads to match. This commit
removes the limit of five threads.
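
The sizing change can be illustrated with a small before-and-after sketch. The function names are hypothetical, and rounding half the processor count up with a minimum of one thread is an assumption; the commit message only states "half the allocated processors" and the former cap of five.

```python
def search_coordination_size_before(allocated_processors):
    """Old default: half the allocated processors, capped at five threads."""
    half = (allocated_processors + 1) // 2  # assumed rounding: round up
    return max(1, min(half, 5))


def search_coordination_size_after(allocated_processors):
    """New default: half the allocated processors, with no upper bound."""
    return max(1, (allocated_processors + 1) // 2)
```

Under these assumptions a 4-CPU node gets two threads either way, while a 16-CPU node moves from five threads to eight.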
2023-08-08 07:09:25 +01:00
Pooya Salehi 966eb022d9
[DOCS] Mention mmap and FD limits when increasing default max shard per node (#97975)
2023-07-26 16:45:27 +02:00
David Turner 09e53f9ad9
Enhance docs around network troubleshooting (#97305)
Discovery, like cluster membership, can also be affected by network-like
issues (e.g. GC/VM pauses, dropped packets and blocked threads) so this
commit duplicates the troubleshooting info across both places.
2023-07-10 10:57:44 +01:00
James Rodewig ff84ad1469
[DOCS] Note license requirements for CCS (#97252)
Notes that CCS requires both clusters to use the same license level for full capabilities.
2023-06-29 16:55:10 -04:00
David Turner 2a49ad929c
Slightly better hot threads for transport workers (#96315)
A completely idle `transport_worker` thread is reported as `0.0%` idle,
which is confusing. Moreover the docs on the network threading model do
not reflect the changes made in #90482. This commit fixes both of those
things.
2023-05-25 12:08:08 +01:00