This is the final step in the removal of the X-Pack specific SSL
configuration code (replaced by libs/ssl-config)
For some time we have had two implementations of SSL Configuration
code. One was in org.elasticsearch.xpack.core.ssl, the other in
org.elasticsearch.common.ssl
These two implementations had essentially the same functionality:
- Handle settings such as '*.ssl.certificate`, `*.ssl.key` etc
- Load certificates and keys from PEM files and Keystores
- Build and configure Java class such as SSLContext, KeyManager, etc
based on the configuration and certificates.
As of this common the X-Pack version is no more, and all SSL
configuration in Elasticsearch is handled by the libs/ssl-config
version instead.
Resolves: #68719
This PR makes the delayed allocation infrastructure aware of registered node shutdowns, so that reallocation of shards will be further delayed for nodes which are known to be restarting.
To make this more configurable, the Node Shutdown APIs now support a `allocation_delay` parameter, which defaults to 5 minutes. For example:
```
PUT /_nodes/USpTGYaBSIKbgSUJR2Z9lg/shutdown
{
"type": "restart",
"reason": "Demonstrating how the node shutdown API works",
"allocation_delay": "20m"
}
```
Will cause reallocation of shards assigned to that node to another node to be delayed by 20 minutes. Note that this delay will only be used if it's *longer* than the index-level allocation delay, set via `index.unassigned.node_left.delayed_timeout`.
The `allocation_delay` parameter is only valid for `restart`-type shutdown registrations, and the request will be rejected if it's used with another shutdown type.
Adds new field to recovery API to keep track of amount of data
recovered from snapshots.
The normal recovered_bytes field remains and is also increased for
recovery from snapshot but can go backwards in the unlikely case
that recovery from snapshot fails to download a file.
Relates #73496
This commit removes the X-Pack specific enums SSLClientAuth
and VerificationMode and updates places where they were used
to instead use the SslClientAuthenticationMode and
SslVerificationMode enums from the ssl-config library.
Relates: #68719
* Refactor SSL setup in X-Pack
This commit makes some internal changes to the way SSL configuration
works in X-Pack. This is in preparation for replacing the X-PackSSL
configuration with "libs/ssl-config" instead.
* Adds a new class to x-pack core that can loads
SslConfiguration objects (as defined in the ssl-config library),
from standard Elasticsearch Settings objects.
This class supports the semantics that are used for "xpack.*.ssl.*"
settings.
* Refactors the internals of SSLConfigurationSettings to reduce the
number of constants and the duplication of code between them
* Address feedback
Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
When libs/core was created, several classes were moved from server's
o.e.common package, but they were not moved to a new package. Split
packages need to go away long term, so that Elasticsearch can even think
about modularization. This commit moves all the classes under o.e.common
in core to o.e.core.
relates #73784
The org.elasticsearch.bootstrap package exists in server with classes
for starting up Elasticsearch. The elasticsearch-core jar has a handful
of classes that were split out from there, namely java version parsing
and jarhell. This commit moves those classes to a new
org.elasticsearch.jdk package so as to not split the server owned
bootstrap package.
relates #73784
We changed the default joda behaviour in strict_date_optional_time to
max 4 digits in a year. Java.time implementation should behave the same way.
At the same time date_optional_time should have 9digits for year part.
closes#52396closes#72191
Extract usage of internal API from TestClustersPlugin and PluginBuildPlugin and related plugins and build logic
This includes a refactoring of ElasticsearchDistribution to handle types
better in a way we can differentiate between supported Elasticsearch
Distribution types supported in TestCkustersPlugin and types only supported
in internal plugins.
It also introduces a set of internal versions of public plugins.
As part of this we also generate the plugin descriptors now.
As a follow up on this we can actually move these public used classes into
an extra project (declared as included build)
We keep LoggedExec and VersionProperties effectively public And workaround for RestTestBase
This commit removes the ability for plugins to add roles. Roles are
fairly tightly coupled with the behavior of the system (as evidenced by
the fact that some roles from the default distribution leaked behavior
into the OSS distribution). We previously had this plugin extension
point so that we could support a difference in the set of roles between
the OSS and default distributions. We no longer need to maintain that
differentiation, and can therefore remove this plugin extension
point. This was technical debt that we were willing to accept to allow
the default distribution to have additional roles, but now we no longer
need to be encumbered with that technical debt.
This commit moves the machine learning roles to server. We no longer
need to maintain these roles outside of server since we only produce a
single distribution, the default distribution, which includes all
roles. Therefore we can simplify the plugin architecture by removing the
plugin extension point for roles. This is one step in that, by moving
the machine learning roles to server.
Today the response to `GET _cluster/state` does not include the roles of
the nodes in the cluster. In the past this made sense, roles were
relatively unchanging things that could be determined from elsewhere.
These days we have an increasingly rich collection of roles, with
nontrivial BWC implications, so it is important for debugging to be able
to see the specific roles as viewed by the master. This commit adds the
role names to the cluster state API output.
Relates #71385
When collecting stats for ML jobs, the collector checks the ML license.
However, it needs to do this in a way that is not tracked for feature
useage. This commit changes the collector to use the isAllowed method of
license state.
Today when creating an internal test cluster, we allow the test to
supply the node settings that are applied. The extension point to
provide these settings has a single integer parameter, indicating the
index (zero-based) of the node being constructed. This allows the test
to make some decisions about the settings to return, but it is too
simplistic. For example, imagine a test that wants to provide a setting,
but some values for that setting are not valid on non-data nodes. Since
the only information the test has about the node being constructed is
its index, it does not have sufficient information to determine if the
node being constructed is a non-data node or not, since this is done by
the test framework externally by overriding the final settings with
specific settings that dicate the roles of the node. This commit changes
the test framework so that the test has information about what settings
are going to be overriden by the test framework after the test provide
its test-specific settings. This allows the test to make informed
decisions about what values it can return to the test framework.
This commit moves the data tier roles to server. It is no longer
necessary to separate these roles from server as we no longer build
distributions that would not contain these roles. Moving these roles
will simplify many things. This is deliberately the smallest possible
commit that moves these roles. Other aspects related to the data tiers
can move in separate, also small, commits.
With shared cache searchable snapshots we have shards that have a size
in S3 that differs from the locally occupied disk space. This commit
introduces `store.total_data_set_size` to node and indices stats, allowing to
differ between the two.
Relates #69820
Runtime fields usage is currently reported as part of the xpack feature usage API. Now that runtime fields are part of server, their corresponding stats can be moved to be part of the ordinary mapping stats exposed by the cluster stats API.
The test failure issue for this test has been open for over two years, but this test has been muted
so long that we don't have any actual failure information.
This unmutes it so either it ceases to fail (yay), or else it fails and we have a gradle buildscan
link that provides us with a bit more information.
Relates to #29880
This commit introduces system index types that will be used to
differentiate behavior. Previously system indices were all treated the
same regardless of whether they belonged to Elasticsearch, a stack
component, or one of our solutions. Upon further discussion and
analysis this decision was not in the best interest of the various
teams and instead a new type of system index was needed. These system
indices will be referred to as external system indices. Within external
system indices, an option exists for these indices to be managed by
Elasticsearch or to be managed by the external product.
In order to represent this within Elasticsearch, each system index will
have a type and this type will be used to control behavior.
Closes#67383
Change tests to use monitor bulk api on elected master node before verifying watcher index exists.
Sometimes the monitor service on the elected master doesn't yet export monitor documents resulting in tests using the `ensureInitialLocalResources(...)` method to fail.
Cluster alerts watcher are only installed when local exporter tries to resolve local bulk.
Relates to #66586
This reduces the ceremony declaring test artifacts for a project.
It also solves an issue with usage of deprecated testRuntime that
testArtifacts extendsFrom which seems not required at all and would have
broke with Gradle 7.0 anyhow
Test artifact resolution is now variant aware which allows us a more adequate
compile and runtime classpath for the consuming projects.
We also Introduce a convention method in the elasticsearch build to declare
test artifact dependencies in an easy way close to how its done by the gradle build in
test fixture plugin.
Furthermore we cleaned up some inconsistent test dependencies declarations when
relying on a project and on its test artifacts
Today `GET _cluster/stats` can be quite expensive, and is typically
retrieved periodically by monitoring systems (e.g. Metricbeat) that
implement a client-side timeout. When the client times out it closes the
HTTP connection in use. With this commit we react to the close of the
HTTP connection by cancelling the ongoing stats request, avoiding
unnecessary duplicated work.
Relates #55550
unmute TransportMonitoringMigrateAlertsActionTests#testLocalAlertsRemoval and
TransportMonitoringMigrateAlertsActionTests#testRepeatedLocalAlertsRemoval tests
Somehow during these tests the monitor watches are not installed. Both
tests use the local exporter and this exporter only installs the watches
under specific conditions via the elected master node. I suspect the
conditions are never met. The http exporter is more relaxed when attempting
to install monitor watches and the tests using the http exporter seem
not to be prone by the fact that tests fail because monitor watches have
not been installed.
Relates to #66586
Part 10 (and hopefully the last one).
We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.
We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
Part 9.
We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.
We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
This has been deprecated in gradle before but we havnt been warned.
Gradle 7.0 will likely introduce a change in behaviour here that we
should fix the usage of this configuration upfront.
See https://github.com/gradle/gradle/issues/16027 for further information
about the change in Gradle 7.0
Part 6.
We have an in-house rule to compare explicitly against false instead
of using the logical not operator (!). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.
We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:
- Updating LICENSE and NOTICE files throughout the code base, as well
as those packaged in our published artifacts
- Update IDE integration to now use the new license header on newly
created source files
- Remove references to the "OSS" distribution from our documentation
- Update build time verification checks to no longer allow Apache 2.0
license header in Elasticsearch source code
- Replace all existing Apache 2.0 license headers for non-xpack code
with updated header (vendored code with Apache 2.0 headers obviously
remains the same).
- Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
This PR removes support for index metadata upgrades:
* Stop using the `index.version.upgrade` setting and deprecate it.
* Remove `MetadataIndexUpgradeService` and other references to upgrades.
In addition to supporting upgrades, `MetadataIndexUpgradeService` verified
certain aspects of the metadata, like index version compatibility. This logic
is important to keep, so `MetadataIndexUpgradeService` was reworked to
`IndexMetadataVerifier` instead of being removed completely.
Closes#66143.
Part 5.
We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.
We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
This change adds a new "architectures" section to the
cluster stats, containing a summary of how many nodes
in the cluster are on each processor architecture.
The intention is to make it easier to see whether
clusters are running on aarch64, or mixed x86_64/aarch64,
which may aid support as aarch64 becomes more commonly
used.
This commit adds statistics about the index creation versions to the `/_cluster/stats` endpoint. The
stats look like:
```
{
"_nodes" : {
"total" : 1,
"successful" : 1,
"failed" : 0
},
"indices" : {
"count" : 3,
...
"versions" : [
{
"version" : "8.0.0",
"index_count" : 1,
"primary_shard_count" : 2,
"total_primary_size" : "8.6kb",
"total_primary_bytes" : 8831
},
{
"version" : "7.11.0",
"index_count" : 1,
"primary_shard_count" : 1,
"total_primary_size" : "4.6kb",
"total_primary_bytes" : 4230
}
]
},
...
}
```
(`total_primary_size` is only shown with the `?human` flag)
This is useful for telemetry as it allows us to see if/when a cluster has indices created on a
previous version that would need to be either upgraded or supported during an upgrade.
When tests in AbstractIndicesCleanerTestCase run at 1AM they can clash with scheduled clean up in CleanerService which leads to rare, non-reproducible failures.
This change fixes it by setting retention period in CleanerService to max possible time, so nothing is ever deleted by it.
Closes#64386
With #66993 there is now support for coordinator-side timeouts on a
`BroadcastRequest`, which includes requests for node stats and
recoveries. This commit adjusts Monitoring to use these coordinator-side
timeouts where applicable, which will prevent partial stats responses
from accumulating on the master while one or more nodes are not
responding quickly enough. It also enhances the message logged on a
timeout to include the IDs of the nodes which did not respond in time.
Closes#60188.
Adds data-streams plugin to LocalStateMonitoring and dummy stats actions for ccr and enrich.
The data stream plugin and dummy transport actions that are added to LocalStateMonitoring
will allow for monitoring java integration tests to function properly without printing error
messages that make debugging harder. For example the data stream plugin was added so that
index templates with data streams can be added without failing constantly in the background and
enrich stats dummy transport action so that the EnrichStatsCollector doesn't fail.
Also unmutes tests that were muted via #66586, to have another opportunity to look at logs without all the noise,
perhaps all these errors contributed to the test failures.
This commit allows returning a correct requested response content-type - it did not work for versioned media types.
It is done by adding new vendor specific instances to XContent and TextFormat enums. These instances can then "format" the response content type string when provided with parameters. This is similar to what SQL plugin does with its media types.
#51816