This makes the data stream lifecycle generally available. This will allow
data streams to take advantage of a native, simplified, and resilient
lifecycle implementation.
This PR adds a mappings version number to the system index descriptor, which is intended to replace the use of Version to signal changes in the descriptor's mappings. This value is required for managed system indices.
Previously, most of the system index descriptors automatically incremented their mapping version with each release, which meant that the mappings would stay up-to-date with any additive changes. Now, developers will need to increment the mapping version whenever there's a change.
* Add MappingsVersion inner class to SystemIndexDescriptor
* Add mappings version to metadata in all system index mappings
* Rename version meta key ('system' -> 'managed')
* Update mappings for ML indices if required
* Trigger ML index mappings updates based on new index mappings version.
---------
Co-authored-by: Ed Savage <ed.savage@elastic.co>
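A minimal sketch of the idea, assuming a simplified descriptor: the descriptor carries an explicit, hand-bumped mappings version, and an index counts as outdated when the version stored in its mapping metadata is missing or lower. `MappingsVersion`, `DescriptorSketch` and the meta key name are hypothetical, not the actual Elasticsearch classes or key (requires Java 16+).

```java
import java.util.Map;

// Hypothetical: the explicit, hand-bumped mappings version of a descriptor.
record MappingsVersion(int version) {}

final class DescriptorSketch {
    // Assumed meta key name, for illustration only.
    static final String MAPPINGS_VERSION_META_KEY = "managed_index_mappings_version";

    private final MappingsVersion mappingsVersion;

    DescriptorSketch(MappingsVersion mappingsVersion) {
        this.mappingsVersion = mappingsVersion;
    }

    /**
     * Returns true if the index's mapping metadata has no recorded mappings version,
     * or records one lower than this descriptor's version.
     */
    boolean needsMappingsUpdate(Map<String, Object> indexMappingMeta) {
        Object stored = indexMappingMeta.get(MAPPINGS_VERSION_META_KEY);
        if (!(stored instanceof Map<?, ?> storedVersion)) {
            return true; // nothing recorded yet: treat the mappings as outdated
        }
        Object version = storedVersion.get("version");
        return !(version instanceof Integer v) || v < mappingsVersion.version();
    }
}
```

Because the version is bumped by hand, forgetting to bump it means an additive mapping change is silently not applied to existing indices.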
Some Rest Specs contain multiple paths, where some paths are
preferred and the others exist for historical reasons.
This commit makes 2 changes:
1. Tracks when a path is deprecated in the Rest Spec, so that metadata
can be accessed in the Java model of the API
2. Adds filter (predicate) to the execution context to determine which
paths should be considered as candidates within the test suite
Tests that extend `ESClientYamlSuiteTestCase` can override
`createRestTestExecutionContext` and pass in a predicate that skips
deprecated paths (or any other criteria), provided that it does not
skip _all_ paths in an API.
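A sketch of the path-filtering rule, using hypothetical `ApiPath` and `PathSelection` types rather than the real Rest Spec model: the predicate removes deprecated paths, and the selection fails loudly if it would leave an API with no candidate paths at all.

```java
import java.util.List;
import java.util.function.Predicate;

// Hypothetical model of one path of an API, with its deprecation flag.
record ApiPath(String path, boolean deprecated) {}

final class PathSelection {
    /** Keeps only the paths accepted by the filter; refuses to drop every path of an API. */
    static List<ApiPath> candidatePaths(String api, List<ApiPath> paths, Predicate<ApiPath> filter) {
        List<ApiPath> kept = paths.stream().filter(filter).toList();
        if (kept.isEmpty()) {
            throw new IllegalStateException("predicate skipped all paths for API [" + api + "]");
        }
        return kept;
    }
}
```

A suite skipping deprecated paths would pass `p -> !p.deprecated()` as the filter.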
Several minor structural and test improvements for cross-cluster search
These changes set the stage for a follow-on ticket to add _cluster/details to
cross-cluster searches with minimize_roundtrips = false. To help keep that
PR from being too large some of the simpler required changes and tests
are added in this PR.
This adds a new parameter to LocalClusterSpecBuilder.user(..) to
indicate whether the user should be an operator or not.
Users added with the simplified user(username, password) method are
operators (and have the "_es_test_root" role).
This commit changes the ActionModules to allow the RestController to be
provided by an internal plugin.
It renames `RestInterceptorActionPlugin` to `RestServerActionPlugin`
and adds a new `getRestController` method to it.
There may be multiple RestServerActionPlugins installed on a node, but
only 1 may provide a Rest Wrapper (getRestHandlerInterceptor) and only 1
may provide a RestController (getRestController).
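The "at most one provider" rule can be sketched as a generic helper; `SingleProvider.atMostOne` is a hypothetical name, and the real plugin loading code may report the conflict differently.

```java
import java.util.List;
import java.util.Optional;
import java.util.function.Function;

final class SingleProvider {
    /**
     * Returns the single non-null component contributed by the plugins, or empty if
     * none contributes one; fails if more than one plugin contributes one.
     */
    static <P, T> Optional<T> atMostOne(List<P> plugins, Function<P, T> getter, String what) {
        T found = null;
        for (P plugin : plugins) {
            T candidate = getter.apply(plugin);
            if (candidate == null) {
                continue; // this plugin does not provide the component
            }
            if (found != null) {
                throw new IllegalStateException("only one plugin may provide a " + what);
            }
            found = candidate;
        }
        return Optional.ofNullable(found);
    }
}
```

The same helper would apply to both the Rest Wrapper and the RestController, with an empty result meaning the node falls back to its default.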
This test only delays allocation for 120s which isn't necessarily enough
time for the node to come back, and yet it requires the shard to be
assigned back to the first node. This commit extends the timeout to make
sure the shard is reassigned as required, and tidies up a few other
aspects.
Closes #97896
In 2.17.2 (a patch release), log4j made a refactoring that requires a Configuration to be passed
manually into the created PatternLayout.
If the Configuration is not passed, the system variable lookup does not work.
As a result, the cluster.name field is not populated in logs when the ESv7 layout is used on an ESv8 cluster (cloud).
This commit creates a PatternLayout with a DefaultConfiguration (the same one used before the refactoring).
It also adds back tests that verify an ESv8 cluster with an ESv7 logging config. We still have that situation in the cloud, so it is better to have tests covering it.
Relates apache/logging-log4j2#1551
The index and transport versions are low level details of how a node
behaves in a cluster. They were recently added to the main endpoint
response, but they are too low level and should be moved to another
endpoint TBD.
This commit removes those versions from the main endpoint response. Because
the Lucene version is now derived from the index version, this
commit also adds an explicit Lucene version member to the main response.
The cancel shard allocation command is broken for the initial desired balance versions
and might allocate a shard on a node where it is not supposed to be. This
is fixed by https://github.com/elastic/elasticsearch/pull/93635. This commit disables the test
when upgrading from the affected versions.
For Stateless autoscaling, we'd need a different Alpha to track the task
execution time EWMA. This change makes the EWMA Alpha configurable and
uses a different value for the Write executors.
Closes ES-6325
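For reference, an exponentially weighted moving average with a configurable alpha looks roughly like this. `Ewma` is a hypothetical, non-thread-safe stand-in, and the alpha values chosen per executor are not given in this change description.

```java
/** Exponentially weighted moving average with a configurable smoothing factor alpha. */
final class Ewma {
    private final double alpha;
    private double average;

    Ewma(double alpha, double initialAverage) {
        if (alpha < 0 || alpha > 1) {
            throw new IllegalArgumentException("alpha must be in [0, 1] but was " + alpha);
        }
        this.alpha = alpha;
        this.average = initialAverage;
    }

    /** avg = alpha * sample + (1 - alpha) * avg; a higher alpha weights recent samples more. */
    void addValue(double sample) {
        average = alpha * sample + (1 - alpha) * average;
    }

    double getAverage() {
        return average;
    }
}
```

With alpha = 0.5 and an initial average of 0, adding 10 twice gives 5.0 and then 7.5; a smaller alpha would make the average react more slowly to changes in task execution time.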
This infers the index version from the specified node version. Note that this will break when 8.10 is released, and BwC tests try to use 8.10 nodes. #97200 should be implemented before 8.10 is released to properly specify IndexVersion, without requiring inference.
Drying this up further and adding the same short-cut for single node
tests. Dealing with most of the spots that I could grab via automatic
refactorings.
We want to be able to tweak client configuration in tests extending
ESRestTestCase. This makes that simpler by allowing overriding the
general client configuration.
This API can be quite heavy in large clusters, and might spam the
`MANAGEMENT` threadpool queue with work for clients that have long-since
given up. This commit adds some basic cancellability checks to reduce
the problem.
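A minimal sketch of a basic cancellability check, assuming the task exposes an `isCancelled` supplier (hypothetical here): the work loop checks it before each unit of work and stops early instead of finishing work that nobody is waiting for.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.CancellationException;
import java.util.function.BooleanSupplier;
import java.util.function.Function;

final class CancellableWork {
    /**
     * Processes items one by one, checking for cancellation before each item so that
     * work for a client that has gone away stops early instead of running to completion.
     */
    static <T, R> List<R> processAll(List<T> items, Function<T, R> work, BooleanSupplier isCancelled) {
        List<R> results = new ArrayList<>(items.size());
        for (T item : items) {
            if (isCancelled.getAsBoolean()) {
                throw new CancellationException("task cancelled");
            }
            results.add(work.apply(item));
        }
        return results;
    }
}
```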
The log4j JUL bridge turned out to have issues because it relied on java
beans. This commit implements a custom bridge between JUL and Log4j.
Closes #94613
This commit adds the Log4j JUL bridge so that messages using JUL are
more nicely converted to log4j messages. Currently these messages are
captured via the stdout logging stream. This commit also adds a log4j
filter to replace the logging stream filtering mechanism used to quiet
some Lucene log messages that may be confusing to users.
Closes #94613
This commit changes access to the latest TransportVersion constant to
use a static method instead of a public static field. By encapsulating
the field we will be able to (in a followup) lazily determine what the
latest is, outside of clinit.
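One way a followup could compute the latest version lazily behind a static method is the initialization-on-demand holder idiom, sketched below with hypothetical `VersionId` and `Versions` types; the placeholder id is arbitrary and is not a real TransportVersion.

```java
// Hypothetical stand-in for TransportVersion, reduced to a single id.
record VersionId(int id) {}

final class Versions {
    private Versions() {}

    // Holder idiom: LATEST is computed the first time current() is called,
    // not when the Versions class itself is initialized.
    private static final class Holder {
        static final VersionId LATEST = computeLatest();
    }

    private static VersionId computeLatest() {
        // Placeholder value; a real implementation might scan the registered constants.
        return new VersionId(8_500_000);
    }

    /** Callers go through a method, so how LATEST is determined can change without touching them. */
    static VersionId current() {
        return Holder.LATEST;
    }
}
```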
We continue to have CI failures for open files when trying to cleanup on
Windows. This commit tries to account for one of those cases, where the
out/err redirects are cleaned up, opting to retry once after a delay.
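The retry-once-after-a-delay approach might look like this, using only `java.nio.file`; `CleanupRetry` is a hypothetical helper and the delay is left to the caller.

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;

final class CleanupRetry {
    /**
     * Deletes a file, retrying once after a short delay if the first attempt fails.
     * On Windows a file that is still open elsewhere typically cannot be deleted,
     * so a brief pause gives that handle a chance to be released.
     */
    static void deleteWithOneRetry(Path file, long delayMillis) throws IOException, InterruptedException {
        try {
            Files.deleteIfExists(file);
        } catch (IOException first) {
            Thread.sleep(delayMillis);
            try {
                Files.deleteIfExists(file);
            } catch (IOException second) {
                second.addSuppressed(first); // keep the original failure for diagnostics
                throw second;
            }
        }
    }
}
```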
* Support CCS minimize round trips in async search
This commit makes the smallest set of changes to allow async-search based cross-cluster search
to work with the CCS minimize_round_trips feature without changing the internals/architecture of
the search action.
When ccsMinimizeRoundtrips is set to true on SubmitAsyncSearchRequest, the AsyncSearchTask on the
primary CCS coordinator sends a synchronous SearchRequest to all clusters for a remote coordinator
to orchestrate and return the entire result set to the CCS coordinator as a single response.
This is the same functionality provided by synchronous CCS search using minimize_roundtrips.
Since this is an async search, it means that the async search coordinator has no visibility
into search progress on the remote clusters while they are running the search, thus losing one of
the key features of async search. However, this is a good first approach for improving overall search
latency for cross cluster searches that query a large number of shards on remote clusters, since
Kibana does not currently expose incremental progress of an async search to users.
Relates #73971
In a busy cluster the list-tasks API may retain information about a very
large number of tasks while waiting for all nodes to respond. This
commit makes the API cancellable so that unnecessary partial results can
be released earlier.
Relates #96279, which implements the early-release functionality.
This PR integrates CCS with the new search_shards API. With this change,
we will be able to skip shards on the coordinator on remote clusters
using the timestamps stored in the cluster state.
Relates #94534
Closes #93730
We want to allow overriding the info (GET /) API in serverless, therefore this commit moves RestMainAction and its transport classes into a module that has a REST plugin.
The main endpoint is often used in testing to verify that a cluster is ready, hence this commit also has to add a testing dependency on main to a lot of modules.
Relates #95422
The PR adds enforcement for API key type at authentication time.
Concretely, new cross-cluster API keys (#95714) can only be used on the
dedicated remote cluster interface and the existing (rest) API key must
not be used for new remote cluster communication. To make cross-cluster
API keys actually usable after authentication, the PR also adds support
for resolving their roles.
This PR adds new named cluster and index privileges for CCR actions
required with the new RCS model. The new privileges are tightly scoped
so that it is no longer necessary to grant wider named privileges, e.g.
manage. Concretely, the following privileges are added:
* cluster privilege `cross_cluster_replication`
* index privilege `cross_cluster_replication`, which covers index actions
required to be performed with end users
* index privilege `cross_cluster_replication_internal`, which covers index
actions performed by the internal user
The intention of having two index privileges for CCR is that
`cross_cluster_replication` could be granted as part of `remote_indices`
for end-users on the QC (follower cluster). QC admins or users do not
have to care about `cross_cluster_replication_internal` which will be
automatically handled by FC (leader cluster) admins once the specialized
API key is in place.
The PR also renames `cross_cluster_access` to `cross_cluster_search`
which corresponds better to the new `cross_cluster_replication`.
This introduces an endpoint to reset the desired balance.
It can be used when the computed balance has diverged significantly from
the actual one, to start a new computation from the current state.
This changes the serialization format for queries - when the index version is >=8.8.0, it serializes the actual transport version used into the stream. For BwC with old query formats, it uses the mapped TransportVersion for the index version.
This can be modified later if needed to re-interpret the vint used to store TransportVersion to something else, allowing the format to be further modified if necessary.
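A sketch of the format switch, assuming hypothetical integer version ids and a caller-supplied mapping for old indices. The vint encoding shown (7 bits per byte, high bit set while more bytes follow) is the usual variable-length int scheme; the rest is illustrative, not the actual query serialization code.

```java
import java.io.ByteArrayOutputStream;
import java.util.function.IntUnaryOperator;

final class QueryHeaderSketch {
    // Hypothetical id standing in for index version 8.8.0.
    static final int INDEX_VERSION_8_8_0 = 8_08_00_99;

    /** Writes value as a vint: 7 bits per byte, high bit set while more bytes follow. */
    private static void writeVInt(ByteArrayOutputStream out, int value) {
        while ((value & ~0x7F) != 0) {
            out.write((value & 0x7F) | 0x80);
            value >>>= 7;
        }
        out.write(value);
    }

    /** Indices at or after 8.8.0 record the transport version explicitly; older ones record nothing. */
    static byte[] writeHeader(int indexVersion, int transportVersion) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        if (indexVersion >= INDEX_VERSION_8_8_0) {
            writeVInt(out, transportVersion);
        }
        return out.toByteArray();
    }

    /** Reads the transport version back, or maps it from the index version for the old format. */
    static int readTransportVersion(int indexVersion, byte[] header, IntUnaryOperator mappedFromIndexVersion) {
        if (indexVersion < INDEX_VERSION_8_8_0) {
            return mappedFromIndexVersion.applyAsInt(indexVersion);
        }
        int value = 0;
        for (int i = 0, shift = 0; ; i++, shift += 7) {
            byte b = header[i];
            value |= (b & 0x7F) << shift;
            if ((b & 0x80) == 0) {
                return value;
            }
        }
    }
}
```

Storing the version as a vint is what leaves room to reinterpret that value later if the format needs to change again.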
Fixes #82794. Upgrades the spotless plugin, which addresses the issue
around formatting `instanceof` expressions. Formatting of statements
including lambdas seems to have improved too.
Renames the `cluster.remote.*.authorization` setting to
`cluster.remote.*.credentials` since the name is more precise and aligns
with the rest of the code base.
This PR adds a redacted cluster_credentials field to the
RemoteConnectionInfo API response to differentiate between basic and RCS
remote clusters. The new field is available only for RCS remote
clusters.
Rename refactor PR that uses `cross_cluster_access` in place of
`remote_access` wherever appropriate, since `cross_cluster_access` is a
more precise, clearer term. No functional changes, however I did make a
few tweaks around version handling.
The test suite has recently been expanded to include more YAML tests:
PIT and msearch (#93720), EQL (#94265) and SQL (#94416). The original 15
min timeout needs to be bumped a bit to accommodate the new tests. This
PR increases it by an additional 5 minutes.
PS: It is possible that the corresponding CcsCommonYamlTestSuite also
needs a similar increase in timeout. This PR does not touch it because
we haven't observed a similar timeout for it so far, and tests with the
existing CCS model and security disabled run a bit faster. So it may not
be an issue yet.
Resolves: #94491
This PR adds support for CCS common YAML test suite to run SQL yaml
tests with both the existing and new remote cluster models.
Relates: #94265, #93720
PS: SQL does not send its own actions across clusters and relies only on
the generic search, field_caps actions. Hence it is automatically
supported without touching production code, i.e. changes are test only.
In https://github.com/elastic/elasticsearch/pull/91238 we rewrote
BulkProcessor to avoid deadlock that had been seen in the
IlmHistoryStore. At some point we will remove BulkProcessor altogether.
This PR ports a couple of integration tests that were using BulkProcessor
over to BulkProcessor2.
This PR adds sniff mode support for RCS. It's achieved by having
alternative handshake action and nodes action to return DiscoveryNode
with remote cluster server address instead of the main transport
address. The changes only affect the new model; how the
existing remote cluster model works is fully maintained. REST tests are
updated to randomly choose between proxy and sniff mode.
An additional benefit of using the alternative handshake action is that
we can remove the special handling of handshake action on the FC side to
ensure all actions must be allowed by the remote access API key. This
will be addressed in a separate PR.
Today the master's pending task queue is just the
`PriorityBlockingQueue<Runnable>` belonging to the underlying
`ThreadPoolExecutor`. The reasons for this date back a long way but it
doesn't really reflect the structure of the queue as it exists today. In
particular, we must keep track of batches independently of the queue
itself, and must do various bits of unchecked casting to process
multiple items of the same type at once.
This commit introduces a new queueing mechanism, independent of the
executor's queue, which better represents the conceptual structure of
the master's pending tasks:
* Today we use a priority queue to allow important tasks to preempt less-important ones. However there are only a small number of priority levels, so it is simpler to maintain a queue for each priority, effectively replacing the sorting within the priority queue with a radix sort.
* Today when a task is submitted we perform a map lookup to see if it can be added to an existing batch or not. With this change we allow client code to create its own dedicated queue of tasks. The entries in the per-priority-level queues are themselves queues, one for each executor, representing the batches to be run.
* Today each task in the queue holds a reference to its executor, but the executor used to run a task may belong to a different task in the same batch. In practice we know they're the same executor (that's how batches are defined) but we cannot express this knowledge in the type system so we have to do a bunch of unchecked casting to work around it. With this change we associate each per-executor queue directly with its executor, avoiding the need to do all this unchecked casting.
* Today the master service must block its thread while waiting for each task to complete, because otherwise the executor would start to process the next task in the queue. This makes testing using a `DeterministicTaskQueue` harder (see `FakeThreadPoolMasterService`). This change avoids enqueueing tasks on the `ThreadPoolExecutor` unless there is genuinely work to do, although it leaves the removal of the actual blocking to a followup.
Closes #81626
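The queue structure described in this commit can be sketched as one FIFO per priority level whose entries are per-executor batch queues. `Priority`, `BatchQueue` and `PendingTasks` are hypothetical simplifications (single-threaded, no blocking, and a batch keeps the priority level it joined with), not the MasterService implementation.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.EnumMap;
import java.util.List;
import java.util.Queue;
import java.util.function.Consumer;

// Declaration order is the processing order: IMMEDIATE is handled first.
enum Priority { IMMEDIATE, URGENT, HIGH, NORMAL, LOW, LANGUID }

/** A per-executor queue: every task in it runs with the same executor, so no casting is needed. */
final class BatchQueue<T> {
    private final Consumer<List<T>> executor; // runs a whole batch of tasks at once
    final Queue<T> tasks = new ArrayDeque<>();
    boolean enqueued; // whether this queue is currently waiting in a priority level

    BatchQueue(Consumer<List<T>> executor) {
        this.executor = executor;
    }

    /** Drains all pending tasks and runs them as one batch with this queue's own executor. */
    void runBatch() {
        List<T> batch = new ArrayList<>(tasks);
        tasks.clear();
        enqueued = false;
        executor.accept(batch);
    }
}

final class PendingTasks {
    // One FIFO per priority level: scanning levels in order replaces the priority-queue sort.
    private final EnumMap<Priority, Queue<BatchQueue<?>>> levels = new EnumMap<>(Priority.class);

    PendingTasks() {
        for (Priority p : Priority.values()) {
            levels.put(p, new ArrayDeque<>());
        }
    }

    /** Adds a task to its batch; the batch joins a priority level only if it is not already waiting. */
    <T> void submit(BatchQueue<T> queue, Priority priority, T task) {
        queue.tasks.add(task);
        if (!queue.enqueued) {
            queue.enqueued = true;
            levels.get(priority).add(queue);
        }
    }

    /** Runs the next batch from the highest non-empty priority level; returns false when idle. */
    boolean runNext() {
        for (Priority p : Priority.values()) {
            BatchQueue<?> next = levels.get(p).poll();
            if (next != null) {
                next.runBatch();
                return true;
            }
        }
        return false;
    }
}
```

Submitting a LOW task, then an URGENT task, then another LOW task to the same LOW batch runs the urgent batch first and then both LOW tasks together as a single batch.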
We built quite a bit of infrastructure to have one polling job
running via the `SchedulerEngine` and `ActiveSchedule`. This moves this
infrastructure outside x-pack to server so elasticsearch/modules can use
it and avoid re-implementing it using `threadPool.schedule`.
This PR ensures most search features (scroll, async search, pit, field
caps, msearch, vector tile etc) work with the new RCS model. The main
code change is tested by adapting the common yaml CCS tests to use the
new RCS model to provide a broad test coverage. The tests ensure the new
RCS model works from search's perspective. We could still use more tests
from security's perspective, e.g. DLS/FLS, in separate PRs.
Note:
* Eql yaml test files are not located under `x-pack/plugin`, which
makes them hard to reuse. It should be possible to relocate them, but
I'll address it separately.
* Sql yaml tests require a special transformation to work. I'll also
address that separately.
* Don't report MIGRATION_NEEDED for 7.x indices
Eventually, we will need to migrate 7.x indices to 8.x before doing a
significant upgrade of Lucene. However, the migrations to 8.x are not
adequately tested: while they will eventually be needed, they are not
currently needed, and may in fact produce bugs.
This change will ensure that the GET _migration/system_feature API
returns NO_MIGRATION_NEEDED in 8.x. We will begin to require
migrations once we start testing for the next major version upgrade.
Today when setting up for the desired balance computation we move all
shards to their desired locations without checking any allocation rules.
However, certain allocation rules (e.g. those related to node versions
and shutdowns) may prevent these movements in reality, resulting in a
shard which cannot move to its desired location but which may not remain
on its current node either.
This commit adds some checks to verify that these preliminary moves are
still legal when setting up the computation.
Closes #93271
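A simplified sketch of the added check, with shards and nodes as plain strings and a hypothetical `canAllocate` predicate standing in for the allocation rules: a preliminary move is applied only if it is allowed. What the real code does with a shard whose move is rejected is simplified here to leaving it on its current node.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.BiPredicate;

final class DesiredBalanceSetup {
    /**
     * Starts from the current assignment and moves each shard to its desired node
     * only if canAllocate(shard, node) allows it.
     */
    static Map<String, String> initialAssignment(
        Map<String, String> currentNodeByShard,
        Map<String, String> desiredNodeByShard,
        BiPredicate<String, String> canAllocate // (shard, node) -> is the move legal?
    ) {
        Map<String, String> result = new HashMap<>(currentNodeByShard);
        desiredNodeByShard.forEach((shard, desiredNode) -> {
            if (canAllocate.test(shard, desiredNode)) {
                result.put(shard, desiredNode);
            }
        });
        return result;
    }
}
```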
We still don't properly understand why this test is failing, and it
doesn't reproduce locally, so this commit adds a little extra logging to
capture extra detail from a failure in CI.