Commit Graph

327 Commits

Author SHA1 Message Date
Martijn van Groningen 1ae4f3c937
Add enrich node cache (#76800)
Introduce a LRU cache to avoid searches that occur frequently
from the enrich processor.
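
The idea of an LRU cache in front of repeated searches can be sketched as follows. This is a minimal illustration, not the enrich processor's actual implementation: the `LruCache` class, its `get_or_compute` method and the `search` callback are hypothetical names.

```python
from collections import OrderedDict


class LruCache:
    """Tiny least-recently-used cache (hypothetical sketch, not Elasticsearch code)."""

    def __init__(self, max_size):
        if max_size <= 0:
            raise ValueError("max_size must be positive")
        self.max_size = max_size
        self._entries = OrderedDict()  # oldest entry first, newest last

    def get_or_compute(self, key, search):
        # Cache hit: mark the entry as most recently used and skip the search.
        if key in self._entries:
            self._entries.move_to_end(key)
            return self._entries[key]
        # Cache miss: run the (expensive) search and remember its result.
        value = search(key)
        self._entries[key] = value
        # Evict the least recently used entry once the cache is over capacity.
        if len(self._entries) > self.max_size:
            self._entries.popitem(last=False)
        return value
```

A lookup that repeats the same key only reaches `search` once, for as long as the entry has not been evicted.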

Relates to #48988
2021-09-03 09:33:44 +02:00
Tim Vernum 76a684ad32
Replace X-Pack SSL config with libs/ssl-config (#76636)
This is the final step in the removal of the X-Pack specific SSL
configuration code (replaced by libs/ssl-config)

For some time we have had two implementations of SSL Configuration
code. One was in org.elasticsearch.xpack.core.ssl, the other in
org.elasticsearch.common.ssl

These two implementations had essentially the same functionality:
- Handle settings such as `*.ssl.certificate`, `*.ssl.key` etc
- Load certificates and keys from PEM files and Keystores
- Build and configure Java classes such as SSLContext, KeyManager, etc
  based on the configuration and certificates.

As of this commit the X-Pack version is no more, and all SSL
configuration in Elasticsearch is handled by the libs/ssl-config
version instead.

Resolves: #68719
2021-08-20 10:31:02 +10:00
Gordon Brown 58f66cf04a
Delay shard reassignment from nodes which are known to be restarting (#75606)
This PR makes the delayed allocation infrastructure aware of registered node shutdowns, so that reallocation of shards will be further delayed for nodes which are known to be restarting.

To make this more configurable, the Node Shutdown APIs now support an `allocation_delay` parameter, which defaults to 5 minutes. For example:
```
PUT /_nodes/USpTGYaBSIKbgSUJR2Z9lg/shutdown
{
  "type": "restart",
  "reason": "Demonstrating how the node shutdown API works",
  "allocation_delay": "20m"
}
```

This will cause reallocation of shards assigned to that node to another node to be delayed by 20 minutes. Note that this delay will only be used if it's *longer* than the index-level allocation delay, set via `index.unassigned.node_left.delayed_timeout`.

The `allocation_delay` parameter is only valid for `restart`-type shutdown registrations, and the request will be rejected if it's used with another shutdown type.
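
The two rules described above (the longer delay wins, and `allocation_delay` is only accepted for `restart`) can be sketched as below. This is a simplified model under stated assumptions: the function names are hypothetical and do not correspond to Elasticsearch classes, and only the 5-minute default comes from the text.

```python
from datetime import timedelta

# Default for restart-type shutdowns, as stated in the commit message.
DEFAULT_SHUTDOWN_DELAY = timedelta(minutes=5)


def validate_shutdown_request(shutdown_type, allocation_delay):
    """Reject allocation_delay on anything other than a restart (hypothetical helper)."""
    if allocation_delay is not None and shutdown_type != "restart":
        raise ValueError("allocation_delay is only valid for restart-type shutdowns")


def effective_delay(index_delay, shutdown_delay=None):
    """Delay applied to shards on a node registered as restarting.

    The shutdown-level delay is only used when it is longer than the
    index-level delay, so the effective value is the larger of the two.
    """
    if shutdown_delay is None:
        shutdown_delay = DEFAULT_SHUTDOWN_DELAY
    return max(index_delay, shutdown_delay)
```

With an index-level delay of 1 minute and the request above, `effective_delay(timedelta(minutes=1), timedelta(minutes=20))` gives 20 minutes; with an index-level delay of 30 minutes, the index-level value is kept.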
2021-08-16 15:59:50 -06:00
Francisco Fernández Castaño 2c132fe5f7
Keep track of data recovered from snapshots in RecoveryState (#76499)
Adds new field to recovery API to keep track of amount of data
recovered from snapshots.

The normal recovered_bytes field remains and is also increased for
recovery from snapshot but can go backwards in the unlikely case
that recovery from snapshot fails to download a file.

Relates #73496
2021-08-16 18:18:37 +02:00
Joe Gallo dcd363e67e
Re-enable deprecated _xpack/monitoring routes via REST compatibility (#76180)
2021-08-05 12:04:15 -04:00
Joe Gallo 26b41f2bb4
Revert "Re-enable deprecated _xpack/monitoring routes via REST compatibility (#75948)"
This reverts commit 5a97262a95.
2021-08-05 09:48:48 -04:00
Joe Gallo 5a97262a95
Re-enable deprecated _xpack/monitoring routes via REST compatibility (#75948)
2021-08-05 09:25:10 -04:00
Tim Vernum b0f68efb08
Remove X-Pack specific SSL enums (#75870)
This commit removes the X-Pack specific enums SSLClientAuth
and VerificationMode and updates places where they were used
to instead use the SslClientAuthenticationMode and
SslVerificationMode enums from the ssl-config library.

Relates: #68719
2021-08-02 18:17:13 +10:00
Adrien Grand d15445e0f3
Remove usage of RAM accounting of segments (#75674)
This is a pre-requisite for the upgrade to Lucene 9, which removes the ability to estimate RAM usage of segments.
2021-07-29 08:36:09 +02:00
Tim Vernum a81f0b4f29
Refactor SSL setup in X-Pack (#75410)
* Refactor SSL setup in X-Pack

This commit makes some internal changes to the way SSL configuration
works in X-Pack. This is in preparation for replacing the X-Pack SSL
configuration with "libs/ssl-config" instead.

* Adds a new class to x-pack core that can load
  SslConfiguration objects (as defined in the ssl-config library),
  from standard Elasticsearch Settings objects.
  This class supports the semantics that are used for "xpack.*.ssl.*"
  settings.

* Refactors the internals of SSLConfigurationSettings to reduce the
  number of constants and the duplication of code between them

* Address feedback

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2021-07-28 05:11:17 -04:00
Ryan Ernst 68817d7ca2
Rename o.e.common in libs/core to o.e.core (#73909)
When libs/core was created, several classes were moved from server's
o.e.common package, but they were not moved to a new package. Split
packages need to go away long term, so that Elasticsearch can even think
about modularization. This commit moves all the classes under o.e.common
in core to o.e.core.

relates #73784
2021-06-08 09:53:28 -07:00
Ryan Ernst 64054de1ac
Rename bootstrap package in core jar (#73788)
The org.elasticsearch.bootstrap package exists in server with classes
for starting up Elasticsearch. The elasticsearch-core jar has a handful
of classes that were split out from there, namely java version parsing
and jarhell. This commit moves those classes to a new
org.elasticsearch.jdk package so as to not split the server owned
bootstrap package.

relates #73784
2021-06-07 08:14:44 -07:00
Przemyslaw Gomulka aba2282511
Change year max digits for strict_date_optional_time and date_optional_time (#73034)
We changed the default Joda behaviour in strict_date_optional_time to
allow at most 4 digits in a year. The java.time implementation should behave the same way.
At the same time, date_optional_time should allow up to 9 digits for the year part.
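
The difference between the two formats can be illustrated with a small check on the year part alone. This is only a sketch of the digit limits named above; `year_fits` and `MAX_YEAR_DIGITS` are hypothetical names, and real date parsing (signs, optional time parts, time zones) is not modelled.

```python
# Maximum number of year digits per format, taken from the commit message.
MAX_YEAR_DIGITS = {
    "strict_date_optional_time": 4,
    "date_optional_time": 9,
}


def year_fits(format_name, date_text):
    """Return True if the leading year of date_text is within the format's digit limit."""
    # Everything before the first "-" is treated as the year (simplifying assumption).
    year_part = date_text.split("-", 1)[0]
    return year_part.isdigit() and len(year_part) <= MAX_YEAR_DIGITS[format_name]
```

For example, a five-digit year such as `12345-01-01` passes this check for `date_optional_time` but not for `strict_date_optional_time`.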

closes #52396
closes #72191
2021-06-04 09:35:07 +02:00
Rene Groeschke e609e07cfe
Remove internal build logic from public build tool plugins (#72470)
Extract usage of internal API from TestClustersPlugin and PluginBuildPlugin and related plugins and build logic

This includes a refactoring of ElasticsearchDistribution to handle types
better, so that we can differentiate between the Elasticsearch
distribution types supported in TestClustersPlugin and types only supported
in internal plugins.

It also introduces a set of internal versions of public plugins.

As part of this we also generate the plugin descriptors now.

As a follow-up, we can move these publicly used classes into
an extra project (declared as an included build).

We keep LoggedExec and VersionProperties effectively public, and add a workaround for RestTestBase.
2021-05-06 14:02:35 +02:00
zhangchao 71a9cfec79
Add fs iotime in Nodes Stats API (#67861)
This adds io_time_in_millis to Nodes stats API
2021-05-05 15:33:13 +01:00
Jason Tedor 6823b8eb5e
Remove the ability for plugins to add roles (#71527)
This commit removes the ability for plugins to add roles. Roles are
fairly tightly coupled with the behavior of the system (as evidenced by
the fact that some roles from the default distribution leaked behavior
into the OSS distribution). We previously had this plugin extension
point so that we could support a difference in the set of roles between
the OSS and default distributions. We no longer need to maintain that
differentiation, and can therefore remove this plugin extension
point. This was technical debt that we were willing to accept to allow
the default distribution to have additional roles, but now we no longer
need to be encumbered with that technical debt.
2021-04-13 22:53:05 -04:00
Jason Tedor 60808e92c1
Move voting only role to server (#71473)
This commit moves the voting only role to server, as part of the effort
to remove the ability for plugins to add roles.
2021-04-09 10:13:53 -04:00
Jason Tedor 241b653ae4
Move machine learning roles to server (#71412)
This commit moves the machine learning roles to server. We no longer
need to maintain these roles outside of server since we only produce a
single distribution, the default distribution, which includes all
roles. Therefore we can simplify the plugin architecture by removing the
plugin extension point for roles. This is one step in that, by moving
the machine learning roles to server.
2021-04-07 19:25:36 -04:00
David Turner 6ed2d25458
Include node roles in cluster state JSON response (#71386)
Today the response to `GET _cluster/state` does not include the roles of
the nodes in the cluster. In the past this made sense, roles were
relatively unchanging things that could be determined from elsewhere.
These days we have an increasingly rich collection of roles, with
nontrivial BWC implications, so it is important for debugging to be able
to see the specific roles as viewed by the master. This commit adds the
role names to the cluster state API output.

Relates #71385
2021-04-07 10:44:35 +01:00
Ryan Ernst 6a081e1562
Do not track ML usage when collecting monitoring (#71314)
When collecting stats for ML jobs, the collector checks the ML license.
However, it needs to do this in a way that is not tracked for feature
usage. This commit changes the collector to use the isAllowed method of
license state.
2021-04-06 14:58:45 -07:00
Jason Tedor 32314493a2
Pass override settings when creating test cluster (#71203)
Today when creating an internal test cluster, we allow the test to
supply the node settings that are applied. The extension point to
provide these settings has a single integer parameter, indicating the
index (zero-based) of the node being constructed. This allows the test
to make some decisions about the settings to return, but it is too
simplistic. For example, imagine a test that wants to provide a setting,
but some values for that setting are not valid on non-data nodes. Since
the only information the test has about the node being constructed is
its index, it does not have sufficient information to determine if the
node being constructed is a non-data node or not, since this is done by
the test framework externally by overriding the final settings with
specific settings that dictate the roles of the node. This commit changes
the test framework so that the test has information about what settings
are going to be overridden by the test framework after the test provides
its test-specific settings. This allows the test to make informed
decisions about what values it can return to the test framework.
2021-04-02 10:20:36 -04:00
Jason Tedor e119ac60d4
Move data tier roles to server (#71084)
This commit moves the data tier roles to server. It is no longer
necessary to separate these roles from server as we no longer build
distributions that would not contain these roles. Moving these roles
will simplify many things. This is deliberately the smallest possible
commit that moves these roles. Other aspects related to the data tiers
can move in separate, also small, commits.
2021-03-31 15:13:02 -04:00
Henning Andersen 0f28e97857
Total data set size in stats (#70625)
With shared cache searchable snapshots we have shards that have a size
in S3 that differs from the locally occupied disk space. This commit
introduces `store.total_data_set_size` to node and indices stats, making it possible to
distinguish between the two.

Relates #69820
2021-03-30 15:23:29 +02:00
David Kyle c4234ca3a0
[CI] Mute LocalExporterResourceIntegTests (#70365)
2021-03-15 09:50:45 +00:00
Luca Cavanna ffe61fb097
Move runtime fields stats to server (#69487)
Runtime fields usage is currently reported as part of the xpack feature usage API. Now that runtime fields are part of server, their corresponding stats can be moved to be part of the ordinary mapping stats exposed by the cluster stats API.
2021-03-08 12:38:20 +01:00
Lee Hinman 783712407d
Re-enable MonitoringIT.testMonitoringService (#69939)
The test failure issue for this test has been open for over two years, but this test has been muted
so long that we don't have any actual failure information.

This unmutes it so either it ceases to fail (yay), or else it fails and we have a gradle buildscan
link that provides us with a bit more information.

Relates to #29880
2021-03-03 16:56:56 -07:00
Jay Modi 1487a5a991
Introduce system index types including external (#68919)
This commit introduces system index types that will be used to
differentiate behavior. Previously system indices were all treated the
same regardless of whether they belonged to Elasticsearch, a stack
component, or one of our solutions. Upon further discussion and
analysis this decision was not in the best interest of the various
teams and instead a new type of system index was needed. These system
indices will be referred to as external system indices. Within external
system indices, an option exists for these indices to be managed by
Elasticsearch or to be managed by the external product.

In order to represent this within Elasticsearch, each system index will
have a type and this type will be used to control behavior.

Closes #67383
2021-03-01 10:38:53 -07:00
Martijn van Groningen de2598ca47
Manually trigger local exporter to open a bulk in some monitor tests. (#69139)
Change tests to use monitor bulk api on elected master node before verifying watcher index exists.
Sometimes the monitor service on the elected master has not yet exported monitor documents, causing tests that use the `ensureInitialLocalResources(...)` method to fail.
Cluster alert watches are only installed when the local exporter tries to resolve the local bulk.

Relates to #66586
2021-02-18 08:55:40 +01:00
Rene Groeschke bdf229a148
Introduce Internal Test Artifact Plugin (#68766)
This reduces the ceremony declaring test artifacts for a project.
It also solves an issue with the usage of the deprecated testRuntime configuration
that testArtifacts extends from, which seems not to be required at all and would have
broken with Gradle 7.0 anyhow.

Test artifact resolution is now variant aware, which allows us a more adequate
compile and runtime classpath for the consuming projects.

We also introduce a convention method in the elasticsearch build to declare
test artifact dependencies in an easy way, close to how it's done by the Gradle built-in
test fixture plugin.

Furthermore, we cleaned up some inconsistent test dependency declarations when
relying on a project and on its test artifacts.
2021-02-16 14:36:17 +01:00
David Turner 114c39625b
Make GET _cluster/stats cancellable (#68676)
Today `GET _cluster/stats` can be quite expensive, and is typically
retrieved periodically by monitoring systems (e.g. Metricbeat) that
implement a client-side timeout. When the client times out it closes the
HTTP connection in use. With this commit we react to the close of the
HTTP connection by cancelling the ongoing stats request, avoiding
unnecessary duplicated work.

Relates #55550
2021-02-10 12:23:51 +00:00
Martijn van Groningen a7abc0a556
Add more trace logging when installing monitor watches and (#68752)
unmute TransportMonitoringMigrateAlertsActionTests#testLocalAlertsRemoval and
TransportMonitoringMigrateAlertsActionTests#testRepeatedLocalAlertsRemoval tests

Somehow during these tests the monitor watches are not installed. Both
tests use the local exporter and this exporter only installs the watches
under specific conditions via the elected master node. I suspect the
conditions are never met. The http exporter is more relaxed when attempting
to install monitor watches, and the tests using the http exporter do not seem
prone to failing because monitor watches have not been installed.

Relates to #66586
2021-02-10 09:04:26 +01:00
Rory Hunter 9c7fe876a2
Replace NOT operator with explicit `false` check - part 10 (#68652)
Part 10 (and hopefully the last one).

We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.

We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
2021-02-08 15:38:22 +00:00
Rory Hunter 2d44cce31e
Replace NOT operator with explicit `false` check - part 9 (#68645)
Part 9.

We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.

We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
2021-02-08 15:28:57 +00:00
Rene Groeschke 5dfa6f46ac
Remove deprecated usage of default configuration (#68575)
This has been deprecated in Gradle before, but we haven't been warned.

Gradle 7.0 will likely introduce a change in behaviour here, so we
should fix the usage of this configuration upfront.

See https://github.com/gradle/gradle/issues/16027 for further information
about the change in Gradle 7.0
2021-02-07 12:08:02 +01:00
Martijn van Groningen 317448d4cf
Mute testLocalAlertsRemoval and testRepeatedLocalAlertsRemoval again, (#68578)
see #66586
2021-02-05 12:05:13 +01:00
Martijn van Groningen 40a073714b
Unmute and enable test logging (#67967)
Relates to #66586
2021-02-05 10:22:56 +01:00
Rory Hunter 65501d0f25
Replace NOT operator with explicit `false` check - part 6 (#68416)
Part 6.

We have an in-house rule to compare explicitly against false instead
of using the logical not operator (!). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.

We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
2021-02-03 21:40:49 +00:00
Mark Vieira a92a647b9f
Update sources with new SSPL+Elastic-2.0 license headers
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:

 - Updating LICENSE and NOTICE files throughout the code base, as well
   as those packaged in our published artifacts
 - Update IDE integration to now use the new license header on newly
   created source files
 - Remove references to the "OSS" distribution from our documentation
 - Update build time verification checks to no longer allow Apache 2.0
   license header in Elasticsearch source code
 - Replace all existing Apache 2.0 license headers for non-xpack code
   with updated header (vendored code with Apache 2.0 headers obviously
   remains the same).
 - Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
2021-02-02 16:10:53 -08:00
Julie Tibshirani 6be3a507f6
Refactor IndexMetadataUpgradeService to IndexMetadataVerifier (#67547)
This PR removes support for index metadata upgrades:
* Stop using the `index.version.upgrade` setting and deprecate it.
* Remove `MetadataIndexUpgradeService` and other references to upgrades.

In addition to supporting upgrades, `MetadataIndexUpgradeService` verified
certain aspects of the metadata, like index version compatibility. This logic
is important to keep, so `MetadataIndexUpgradeService` was reworked to
`IndexMetadataVerifier` instead of being removed completely.

Closes #66143.
2021-02-02 09:14:13 -08:00
Rory Hunter b4514228f0
Replace NOT operator with explicit `false` check - part 5 (#68360)
Part 5.

We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.

We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
2021-02-02 14:27:33 +00:00
David Roberts 6e392a317d
Add processor architectures to cluster stats (#68264)
This change adds a new "architectures" section to the
cluster stats, containing a summary of how many nodes
in the cluster are on each processor architecture.

The intention is to make it easier to see whether
clusters are running on aarch64, or mixed x86_64/aarch64,
which may aid support as aarch64 becomes more commonly
used.
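
A per-architecture summary of this kind amounts to counting nodes by architecture name. The sketch below shows that counting step only; the input list and the `arch`/`count` output keys are hypothetical and are not taken from the actual cluster stats response.

```python
from collections import Counter


def summarize_architectures(node_architectures):
    """Count how many nodes report each processor architecture (hypothetical shape)."""
    counts = Counter(node_architectures)
    # Sort by architecture name so the summary is stable across calls.
    return [{"arch": arch, "count": count} for arch, count in sorted(counts.items())]


def is_mixed(node_architectures):
    """True when the cluster contains more than one architecture."""
    return len(set(node_architectures)) > 1
```

A cluster with two `x86_64` nodes and one `aarch64` node would be reported as mixed, with one summary entry per architecture.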
2021-02-02 09:48:20 +00:00
Lee Hinman ac1433d300
Add index creation version stats to cluster stats (#68141)
This commit adds statistics about the index creation versions to the `/_cluster/stats` endpoint. The
stats look like:

```
{
  "_nodes" : {
    "total" : 1,
    "successful" : 1,
    "failed" : 0
  },
  "indices" : {
    "count" : 3,
    ...
    "versions" : [
      {
        "version" : "8.0.0",
        "index_count" : 1,
        "primary_shard_count" : 2,
        "total_primary_size" : "8.6kb",
        "total_primary_bytes" : 8831
      },
      {
        "version" : "7.11.0",
        "index_count" : 1,
        "primary_shard_count" : 1,
        "total_primary_size" : "4.6kb",
        "total_primary_bytes" : 4230
      }
    ]
  },
  ...
}
```

(`total_primary_size` is only shown with the `?human` flag)

This is useful for telemetry as it allows us to see if/when a cluster has indices created on a
previous version that would need to be either upgraded or supported during an upgrade.
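
One way a telemetry consumer might use the new section is to list the indices created on an older major version. The sketch below works on a trimmed copy of the example response above; `indices_from_older_majors` is a hypothetical helper, not part of any Elasticsearch client.

```python
import json

# Trimmed copy of the example response shown above (elided fields left out).
SAMPLE = json.loads("""
{
  "indices": {
    "count": 3,
    "versions": [
      {"version": "8.0.0", "index_count": 1, "primary_shard_count": 2,
       "total_primary_bytes": 8831},
      {"version": "7.11.0", "index_count": 1, "primary_shard_count": 1,
       "total_primary_bytes": 4230}
    ]
  }
}
""")


def indices_from_older_majors(cluster_stats, current_major):
    """Return (version, index_count) pairs for indices created before current_major."""
    older = []
    for entry in cluster_stats["indices"]["versions"]:
        # The major version is the first dot-separated component, e.g. "7" in "7.11.0".
        major = int(entry["version"].split(".")[0])
        if major < current_major:
            older.append((entry["version"], entry["index_count"]))
    return older
```

Called as `indices_from_older_majors(SAMPLE, 8)`, this picks out the single index created on 7.11.0.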
2021-01-28 13:58:21 -07:00
Mark Vieira fc6fbf4ab6
Mute failing tests in TransportMonitoringMigrateAlertsActionTests
2021-01-21 14:19:21 -08:00
Przemko Robakowski 04ecda5f9a
Avoid accidental cleanups in indices cleaner tests (#67338)
When tests in AbstractIndicesCleanerTestCase run at 1AM they can clash with scheduled clean up in CleanerService which leads to rare, non-reproducible failures.
This change fixes it by setting retention period in CleanerService to max possible time, so nothing is ever deleted by it.

Closes #64386
2021-01-12 17:25:05 +01:00
Przemko Robakowski 6dfdacdc8f
Remove watcher history clean up from monitoring (#67154)
Monitoring should not clean up watcher history - indices are managed by ILM policy now.
It was deprecated in 7.x, removing it now in 8
2021-01-11 21:35:29 +01:00
David Turner 1d2462e691
Move monitoring collection timeouts to coordinator (#67084)
With #66993 there is now support for coordinator-side timeouts on a
`BroadcastRequest`, which includes requests for node stats and
recoveries. This commit adjusts Monitoring to use these coordinator-side
timeouts where applicable, which will prevent partial stats responses
from accumulating on the master while one or more nodes are not
responding quickly enough. It also enhances the message logged on a
timeout to include the IDs of the nodes which did not respond in time.
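
The general shape of a coordinator-side timeout can be sketched as follows: the coordinator stops waiting after a deadline, keeps the responses it has, and reports which nodes were too slow. This is an illustration with hypothetical names (`collect_with_timeout`, `fetchers`), using threads in place of transport requests; it is not how `BroadcastRequest` is implemented.

```python
from concurrent.futures import ThreadPoolExecutor, wait


def collect_with_timeout(fetchers, timeout_seconds):
    """Gather per-node stats, giving up on slow nodes after timeout_seconds.

    fetchers maps a node ID to a zero-argument callable returning that node's stats.
    Returns (responses, timed_out_node_ids).
    """
    executor = ThreadPoolExecutor(max_workers=max(1, len(fetchers)))
    futures = {executor.submit(fetch): node_id for node_id, fetch in fetchers.items()}
    # The coordinator waits for at most timeout_seconds for all nodes together.
    done, not_done = wait(futures, timeout=timeout_seconds)
    # Do not block on the stragglers; their results are simply never collected.
    executor.shutdown(wait=False)
    responses = {futures[future]: future.result() for future in done}
    timed_out = sorted(futures[future] for future in not_done)
    return responses, timed_out
```

The returned list of node IDs is what a log message about the timeout could include.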

Closes #60188.
2021-01-11 07:29:22 +00:00
Martijn van Groningen e259c1efeb
Mute two tests in TransportMonitoringMigrateAlertsActionTests class. (#67113)
Relates to #66586
2021-01-06 17:52:36 +01:00
Martijn van Groningen fe996515a2
Improve LocalStateMonitoring for tests. (#66997)
Adds data-streams plugin to LocalStateMonitoring and dummy stats actions for ccr and enrich.

The data stream plugin and dummy transport actions that are added to LocalStateMonitoring
will allow for monitoring java integration tests to function properly without printing error
messages that make debugging harder. For example the data stream plugin was added so that
index templates with data streams can be added without failing constantly in the background and
enrich stats dummy transport action so that the EnrichStatsCollector doesn't fail.

Also unmutes tests that were muted via #66586, to have another opportunity to look at logs without all the noise,
perhaps all these errors contributed to the test failures.
2021-01-06 14:29:32 +01:00
Przemyslaw Gomulka 5e74f79e22
Support response content-type with versioned media type (#65500)
This commit allows the requested response content type to be returned correctly; previously this did not work for versioned media types.
It is done by adding new vendor specific instances to XContent and TextFormat enums. These instances can then "format" the response content type string when provided with parameters. This is similar to what SQL plugin does with its media types.
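
The "format the content type with its parameters" step can be sketched as below. The example header value `application/vnd.elasticsearch+json;compatible-with=7` is an assumption about what a versioned media type looks like, and both function names are hypothetical; the real change lives in the XContent and TextFormat enums.

```python
def parse_media_type(header):
    """Split a media type header into (type, parameters)."""
    parts = [part.strip() for part in header.split(";") if part.strip()]
    media_type, raw_params = parts[0], parts[1:]
    params = {}
    for raw in raw_params:
        name, _, value = raw.partition("=")
        params[name.strip().lower()] = value.strip()
    return media_type.lower(), params


def response_content_type(requested):
    """Echo the requested media type, keeping parameters such as the version."""
    media_type, params = parse_media_type(requested)
    return ";".join([media_type] + [f"{name}={value}" for name, value in params.items()])
```

The point of the sketch is that parameters attached to the requested type are carried into the response header instead of being dropped.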

#51816
2021-01-05 09:23:22 +01:00
Gordon Brown 3726f1563c
Mute failing monitoring migration tests (#66624)
See #66586 for details.
2020-12-18 11:51:55 -07:00