Commit Graph

71 Commits

Author SHA1 Message Date
David Turner bfcc93a042
Anonymize AbstractRefCounted (#77208)
Today `AbstractRefCounted` has a `name` field which is only used to
construct the exception message when calling `incRef()` after it's been
closed. This isn't really necessary, the stack trace will identify the
reference in question and give loads more useful detail besides. It's
also slightly irksome to have to name every single implementation.

This commit drops the name and the constructor parameter, and also
introduces a handy factory method for use when there's no extra state
needed and you just want to run a method or lambda when all references
are released.
2021-09-03 07:59:44 +01:00
Rory Hunter d01efa4fd6
Changes to keep Checkstyle happy after reformatting (#76464)
* Reformatting to keep Checkstyle after formatting

* Configure spotless everywhere, and disable the tasks if necessary

* Add XContentBuilder helpers, fix test

* Tweaks

* Add a TODO

Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>
2021-08-18 07:15:55 -04:00
Rory Hunter 4bfead03a4
Fix compiler warnings in :server - part 2 (#75792)
Part of #40366. Fix a number of javac issues when linting is enforced in `server/`.
2021-08-02 15:45:53 +01:00
Ryan Ernst 68817d7ca2
Rename o.e.common in libs/core to o.e.core (#73909)
When libs/core was created, several classes were moved from server's
o.e.common package, but they were not moved to a new package. Split
packages need to go away long term, so that Elasticsearch can even think
about modularization. This commit moves all the classes under o.e.common
in core to o.e.core.

relates #73784
2021-06-08 09:53:28 -07:00
Ryan Ernst 64054de1ac
Rename bootstrap package in core jar (#73788)
The org.elasticsearch.bootstrap package exists in server with classes
for starting up Elasticsearch. The elasticsearch-core jar has a handful
of classes that were split out from there, namely java version parsing
and jarhell. This commit moves those classes to a new
org.elasticsearch.jdk package so as to not split the server owned
bootstrap package.

relates #73784
2021-06-07 08:14:44 -07:00
Armin Braun 74c20f1bc8
Assert That AbstractRefCounted#decRef Does not Throw on Closing (#70373)
We should not allow exceptions in the internal close I think.
If the exceptions thrown here are fatal then they should just be errors
instead as well. If they are not and need handling then the handling should happen
in the `closeInternal` call. Otherwise, all callers of `decRef` must be aware of
necessary exception handling which would make reasoning about the behavior
of exceptions in `#closeInternal` extremely complicated.
2021-04-17 07:39:48 +02:00
Nik Everett 7693aeafcd
Banish build-eclipse (#70696)
Move it to `out` so we can be like the cool kids.
2021-03-24 16:00:52 -04:00
Przemyslaw Gomulka 8d09fbf82b
Allow for field declaration for future rest versions (#69774)
When renaming/removing a field, a new field might be declared which
should be parseable starting with the current version.
This commit changes the way ParseField is declared for compatible
Version. Instead of concrete version a boolean function has to be used
to indicate for what version a field is parseable. The onOrAfter and
equalTo functions are declared on RestApiVersion to allow for
this.
2021-03-05 08:27:00 +01:00
Joe Gallo 638735bbb9
Rename RestApiCompatibleVersion to RestApiVersion (#69897) 2021-03-03 12:17:48 -05:00
Armin Braun bb77ab46e0
Stop Ignoring Exceptions on Close in Network Code (#69665)
We should not be ignoring and suppressing exceptions on releasing
network resources quietly in these spots.

Co-authored-by: David Turner <david.turner@elastic.co>
2021-03-01 14:38:18 +01:00
Armin Braun c2370ffde5
Add Leak Tracking Infrastructure (#67688)
This commit adds leak tracking infrastructure that enables assertions
about the state of objects at GC time (simplified version of what Netty
uses to track `ByteBuf` instances).
This commit uses the infrastructure to improve the quality of leak
checks for page recycling in the mock nio transport (the logic in
`org.elasticsearch.common.util.MockPageCacheRecycler#ensureAllPagesAreReleased`
does not run for all tests and tracks too little information to allow for debugging
what caused a specific leak in most cases due to the lack of an equivalent of the added
`#touch` logic).

Co-authored-by: David Turner <david.turner@elastic.co>
2021-02-25 11:41:48 +01:00
Przemyslaw Gomulka f22adc47d8
Refactor ObjectParser and CompatibleObjectParser to support REST Compatible API (#68808)
In order to support compatible fields when parsing XContent additional information has to be set during ParsedField declaration.
This commit adds a set of RestApiCompatibleVersion on a ParsedField in order to specify on which versions a field is supported. By default ParsedField is allowed to be parsed on both current and previous major versions.

ObjectParser - which is used for constructing objects using 'setters' - has a modified fieldParsersMap to be Map of Maps. with key being RestApiCompatibility. This allows to choose set of field-parsers as specified on a request.
Under RestApiCompatibility.minimumSupported key, there is a map that contains field-parsers for both previous and current versions.
Under RestApiCompatibility.current there will be only current versions field (compatible fields not a present)

ConstructingObjectParser - which is used for constructing objects using 'constructors' - is modified to contain a map of Version To constructorArgInfo , declarations of fields to be set on a constructor depending on a version

relates #51816
2021-02-16 11:33:11 +01:00
Przemyslaw Gomulka 71d43b598d
Refactor usage of compatible version (#68648)
Compatible API version is at the moment represented by both Version and
byte - representing a major version. This can lead to a confusion which
representation to use, as well as to incorrect assumptions that minor
versions are supported (with the use of Version.V_7_0_0)

Current usage of XContentParser.useCompatible is also not allowing to
define two compatible implementations. This is not about
support N-2 compatibility, but to allow to continue development when a
major release is performed.

This commit is introducing the CompatibleVersion object responsible for
wrapping around a major version of compatible API.

relates #68100
2021-02-10 10:22:34 +01:00
Rory Hunter 780f273067
Replace NOT operator with explicit `false` check - part 8 (#68625)
Part 8.

We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.

We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
2021-02-08 15:20:34 +00:00
Mark Vieira a92a647b9f Update sources with new SSPL+Elastic-2.0 license headers
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:

 - Updating LICENSE and NOTICE files throughout the code base, as well
   as those packaged in our published artifacts
 - Update IDE integration to now use the new license header on newly
   created source files
 - Remove references to the "OSS" distribution from our documentation
 - Update build time verification checks to no longer allow Apache 2.0
   license header in Elasticsearch source code
 - Replace all existing Apache 2.0 license headers for non-xpack code
   with updated header (vendored code with Apache 2.0 headers obviously
   remains the same).
 - Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
2021-02-02 16:10:53 -08:00
Rory Hunter ad1f876daa
Replace NOT operator with explicit `false` check (#67817)
We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.

We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
2021-01-26 14:47:09 +00:00
Rene Groeschke 810e7ff6b0
Move tasks in build scripts to task avoidance api (#64046)
- Some trivial cleanup on build scripts
- Change task referencing in build scripts to use task avoidance api
where replacement is trivial.
2020-11-12 12:04:15 +01:00
Tanguy Leroux 34fc3033c8
Add CacheFile#fsync() method to ensure cached data are written on disk (#64201)
This commit adds a new CacheFile.fsync() method to fsync the cache 
file on disk. This method uses the utility method IOUtils.fsync(Path, 
boolean) that executes a FileChannel.force() call under the hood.
2020-11-03 15:08:54 +01:00
Armin Braun eaac6496fc
Fix #invariant Assertion in CacheFile (#64180)
Fix #invariant Assertion in CacheFile

closes #64141
2020-10-27 17:59:17 +01:00
Armin Braun e28dbde289
Unify Stream Copy Buffer Usage (#56078)
We have various ways of copying between two streams and handling thread-local
buffers throughout the codebase. This commit unifies a number of them and
removes buffer allocations in many spots.
2020-08-03 18:21:24 +02:00
Nhat Nguyen f238d0a367
Set timeout of master requests on follower to unbounded (#60070)
Today, a follow task will fail if the master node of the follower 
cluster is temporarily overloaded and unable to process master node
requests (such as update mapping, setting, or alias) from a follow-task
within the default timeout. This error is transient, and follow-tasks
should not abort. We can avoid this problem by setting the timeout of
master node requests on the follower cluster to unbounded.

Closes #56891
2020-07-27 09:23:30 -04:00
Yannick Welsch 2078509af5
Set specific keepalive options by default on supported platforms (#59278)
keepalives tell any intermediate devices that the connection remains alive, which helps with overzealous firewalls that are
killing idle connections. keepalives are enabled by default in Elasticsearch, but use system defaults for their
configuration, which often times do not have reasonable defaults (e.g. 7200s for TCP_KEEP_IDLE) in the context of
distributed systems such as Elasticsearch.

This PR sets the socket-level keep_alive options for network.tcp.{keep_idle,keep_interval} to 5 minutes on configurations
that support it (>= Java 11 & (MacOS || Linux)) and where the system defaults are set to something higher than 5
minutes. This helps keep the connections alive while not interfering with system defaults or user-specified settings
unless they are deemed to be set too high by providing better out-of-the-box defaults.
2020-07-27 13:38:24 +02:00
Armin Braun 785a91c1e3
Optimize some Spots around Closing Resources (#60049)
The single element `close` calls go through a very inefficient path that includes creating
a one element list.
`releaseOnce` is only with a single non-null input in production in two spots so no need for
varargs and any complexity here.
`ReleasableBytesStreamOutput` does not require any `releaseOnce` wrapping because we already have
that kind of logic implemented in `org.elasticsearch.common.util.AbstractArray` (which we were
wrapping here) already.
2020-07-22 23:41:48 +02:00
Ryan Ernst 5d6a2dd35b
Add license feature usage api (#59342)
This commit adds a new api to track when gold+ features are used within
x-pack. The tracking is done internally whenever a feature is checked
against the current license. The output of the api is a list of each
used feature, which includes the name, license level, and last time it
was used. In addition to a unit test for the tracking, a rest test is
added which ensures starting up a default configured node does not
result in any features registering as used.

There are a couple features which currently do not work well with the
tracking, as they are checked in a manner that makes them look always
used. Those features will be fixed in followups, and in this PR they are
omitted from the feature usage output.
2020-07-14 12:25:36 -07:00
Rene Groeschke a1e1b33a47
Detangle JdkJarHellCheck from build tool building (#58601)
* Detangle JdkJarHellCheck from build tool building

- allows building the tool with same runtime as es
- allows building build tools with newer runtime version and keep ThirdPartyAuditTask
running with minimum runtime to ensure we check against correct jre
- add jdkjarhell test jar setup into fixture
2020-06-30 16:20:02 +02:00
Ryan Ernst 2671ff7572
Migrate plugin packaging tests to java (#58518)
This commit converts the bats tests for the plugin cli into the java
packaging test framework. The new tests only use the example plugin to
test the plugin cli. The tests for each individual plugin's contents
after being installed are handled by a new unit test for the plugin
installer added in #58287.
2020-06-25 13:38:40 -07:00
Ryan Ernst 7c839eefe8
Create utility for custom config setup in packaging tests (#58352)
This commit creates a shared withCustomConfig method that may be used by
any packaging test. The method will copy the config directory and
override the conf path appropriately depending on the distribution type.
2020-06-23 14:49:20 -07:00
Rene Groeschke 680ea07f7f
Remove deprecated usage of testCompile configuration (#57921)
* Remove usage of deprecated testCompile configuration
* Replace testCompile usage by testImplementation
* Make testImplementation non transitive by default (as we did for testCompile)
* Update CONTRIBUTING about using testImplementation for test dependencies
* Fail on testCompile configuration usage
2020-06-12 13:34:53 +02:00
Mark Vieira 627ef279fd
Include vendored code notices in distribution notice files (#57017) 2020-06-01 15:23:41 -07:00
Jason Tedor 689244e08e
Fix some licenses in our own code (#56978)
All of these files were written by us, and not sourced from
anywhere. Therefore, the license head should be granting licenses to
Elasticsearch, rathern than to the ASF. This commit address them by
changing the license to our standard Apache 2.0 license header.
2020-05-20 09:23:48 -04:00
Ryan Ernst c0ee68b0a0
Move publishing configuration to a separate plugin (#56727)
This is another part of the breakup of the massive BuildPlugin. This PR
moves the code for configuring publications to a separate plugin. Most
of the time these publications are jar files, but this also supports the
zip publication we have for integ tests.
2020-05-14 18:56:59 -07:00
Ryan Ernst 842ce32870
Use task avoidance with forbidden apis (#55034)
Currently forbidden apis accounts for 800+ tasks in the build. These
tasks are aggressively created by the plugin. In forbidden apis 3.0, we
will get task avoidance
(https://github.com/policeman-tools/forbidden-apis/pull/162), but we
need to ourselves use the same task avoidance mechanisms to not trigger
these task creations. This commit does that for our foribdden apis
usages, in preparation for upgrading to 3.0 when it is released.
2020-04-15 13:23:55 -07:00
Mark Vieira 0e55fdeae9
Re-add origin url information to publish POM files (#55171) 2020-04-14 11:48:36 -07:00
Gordon Brown e9bc3e8234
Disallow negative TimeValues (#53913)
This commit causes negative TimeValues, other than -1 which is sometimes used as
a sentinel value, to be rejected during parsing.

Also introduces a hack to allow ILM to load policies which were written to the
cluster state with a negative min_age, treating those values as 0, which should
match the behavior of prior versions.
2020-03-26 09:22:07 -06:00
Tal Levy 782b4f4436
correct licensing and incorporation of FastMath (#49122)
this resolves incorrectly licensed code in #49009.

ESSloppyMath is made as a wrapper around FastMath.java which is 
not meant to be modified with code beyond the original source
2019-11-21 07:10:01 -08:00
Tal Levy 3ab2de1c0e
Introduce faster approximate sinh/atan math functions (#49009)
This commit introduces a new class called ESSloppyMath
that is meant to reflect the purpose of Lucene's SloppyMath,
but add additional unimplemented faster alternatives to math functions.

The two that are used by geotile-grid a lot are sinh/atan.

In a quick elasticsearch rally benchmark for geotile-grid on Switzerland
data points, this shows a (1.22x) 22% speed-up over using Math's functions.

closes #41166.
2019-11-14 12:30:04 -08:00
Rory Hunter 3a3e5f6176
Apply 2-space indent to all gradle scripts (#48849)
Closes #48724. Update `.editorconfig` to make the Java settings the default
for all files, and then apply a 2-space indent to all `*.gradle` files.
Then reformat all the files.
2019-11-13 10:14:04 +00:00
Przemyslaw Gomulka 54d6da5432
[Java.time] Calculate week of a year with ISO rules (#48209)
Reverting the change introducing IsoLocal.ROOT and introducing IsoCalendarDataProvider that defaults start of the week to Monday and requires minimum 4 days in first week of a year. This extension is using java SPI mechanism and defaults for Locale.ROOT only. 
It require jvm property java.locale.providers to be set with SPI,COMPAT

closes #41670
2019-10-22 14:58:21 +02:00
Alpar Torok ca54b442bf
Remove eclipse conditionals (#44075)
* Remove eclipse conditionals

We used to have some meta projects with a `-test` prefix because
historically eclipse could not distinguish between test and main
source-sets and could only use a single classpath.
This is no longer the case for the past few Eclipse versions.

This PR adds the necessary configuration to correctly categorize source
folders and libraries.
With this change eclipse can import projects, and the visibility rules
are correct e.x. auto compete doesn't offer classes from test code or
`testCompile` dependencies when editing classes in `main`.

Unfortunately the cyclic dependency detection in Eclipse doesn't seem to
take the difference between test and non test source sets into account,
but since we are checking this in Gradle anyhow, it's safe to set to
`warning` in the settings. Unfortunately there is no setting to ignore
it.

This might cause problems when building since Eclipse will probably not
know the right order to build things in so more wirk might be necesarry.
2019-10-03 10:50:46 +03:00
Tanguy Leroux 519fd9b41b
Fix CharArraysTests.testConstantTimeEquals() (#47346)
The change #47238 fixed a first issue (#47076) but introduced 
another one that can be reproduced using:

org.elasticsearch.common.CharArraysTests > testConstantTimeEquals FAILED

java.lang.StringIndexOutOfBoundsException: String index out of range: 1
at __randomizedtesting.SeedInfo.seed([DFCA64FE2C786BE3:ED987E883715C63B]:0)
at java.lang.String.substring(String.java:1963)
at org.elasticsearch.common.CharArraysTests.testConstantTimeEquals(CharArraysTests.java:74)

REPRODUCE WITH: ./gradlew ':libs:elasticsearch-core:test' --tests 
"org.elasticsearch.common.CharArraysTests.testConstantTimeEquals" 
-Dtests.seed=DFCA64FE2C786BE3 -Dtests.security.manager=true -Dtests.locale=fr-CA 
-Dtests.timezone=Pacific/Johnston -Dcompiler.java=12 -Druntime.java=8

that happens when the first randomized string has a length of 0.
2019-10-01 12:48:01 +02:00
Ryan Ernst 59fbe1b13a
Ensure char array test uses different values (#47238)
The test of constantTimeEquals could get unlucky and randomly produce
the same two strings. This commit tweaks the test to ensure the two
string are unique, and the loop inside constantTimeEquals is actually
executed (which requires the strings be of the same length).

fixes #47076
2019-09-30 10:52:10 -07:00
Lee Hinman 56aabcdd69
Add retention to Snapshot Lifecycle Management (#46407)
This commit adds retention to the existing Snapshot Lifecycle Management feature (#38461) as described in #43663. This allows a user to configure SLM to automatically delete older snapshots based on a number of criteria.

An example policy would look like:

```
PUT /_slm/policy/snapshot-every-day
{
  "schedule": "0 30 2 * * ?",
  "name": "<production-snap-{now/d}>",
  "repository": "my-s3-repository",
  "config": {
    "indices": ["foo-*", "important"]
  },
  // Newly configured retention options
  "retention": {
    // Snapshots should be deleted after 14 days
    "expire_after": "14d",
    // Keep a maximum of thirty snapshots
    "max_count": 30,
    // Keep a minimum of the four most recent snapshots
    "min_count": 4
  }
}
```

SLM Retention is run on a scheduled configurable with the `slm.retention_schedule` setting, which supports cron expressions. Deletions are run for a configurable time bounded by the `slm.retention_duration` setting, which defaults to 1 hour.

Included in this work is a new SLM stats API endpoint available through

``` json
GET /_slm/stats
```

That returns statistics about snapshot taken and deleted, as well as successful retention runs, failures, and the time spent deleting snapshots. #45362 has more information as well as an example of the output. These stats are also included when retrieving SLM policies via the API.

* Add base framework for snapshot retention (#43605)

* Add base framework for snapshot retention

This adds a basic `SnapshotRetentionService` and `SnapshotRetentionTask`
to start as the basis for SLM's retention implementation.

Relates to #38461

* Remove extraneous 'public'

* Use a local var instead of reading class var repeatedly

* Add SnapshotRetentionConfiguration for retention configuration (#43777)

* Add SnapshotRetentionConfiguration for retention configuration

This commit adds the `SnapshotRetentionConfiguration` class and its HLRC
counterpart to encapsulate the configuration for SLM retention.
Currently only a single parameter is supported as an example (we still
need to discuss the different options we want to support and their
names) to keep the size of the PR down. It also does not yet include version serialization checks
since the original SLM branch has not yet been merged.

Relates to #43663

* Fix REST tests

* Fix more documentation

* Use Objects.equals to avoid NPE

* Put `randomSnapshotLifecyclePolicy` in only one place

* Occasionally return retention with no configuration

* Implement SnapshotRetentionTask's snapshot filtering and delet… (#44764)

* Implement SnapshotRetentionTask's snapshot filtering and deletion

This commit implements the snapshot filtering and deletion for
`SnapshotRetentionTask`. Currently only the expire-after age is used for
determining whether a snapshot is eligible for deletion.

Relates to #43663

* Fix deletes running on the wrong thread

* Handle missing or null policy in snap metadata differently

* Convert Tuple<String, List<SnapshotInfo>> to Map<String, List<SnapshotInfo>>

* Use the `OriginSettingClient` to work with security, enhance logging

* Prevent NPE in test by mocking Client

* Allow empty/missing SLM retention configuration (#45018)

Semi-related to #44465, this allows the `"retention"` configuration map
to be missing.

Relates to #43663

* Add min_count and max_count as SLM retention predicates (#44926)

This adds the configuration options for `min_count` and `max_count` as
well as the logic for determining whether a snapshot meets this criteria
to SLM's retention feature.

These options are optional and one, two, or all three can be specified
in an SLM policy.

Relates to #43663

* Time-bound deletion of snapshots in retention delete function (#45065)

* Time-bound deletion of snapshots in retention delete function

With a cluster that has a large number of snapshots, it's possible that
snapshot deletion can take a very long time (especially since deletes
currently have to happen in a serial fashion). To prevent snapshot
deletion from taking forever in a cluster and blocking other operations,
this commit adds a setting to allow configuring a maximum time to spend
deletion snapshots during retention. This dynamic setting defaults to 1
hour and is best-effort, meaning that it doesn't hard stop a deletion
at an hour mark, but ensures that once the time has passed, all
subsequent deletions are deferred until the next retention cycle.

Relates to #43663

* Wow snapshots suuuure can take a long time.

* Use a LongSupplier instead of actually sleeping

* Remove TestLogging annotation

* Remove rate limiting

* Add SLM metrics gathering and endpoint (#45362)

* Add SLM metrics gathering and endpoint

This commit adds the infrastructure to gather metrics about the different SLM actions that a cluster
takes. These actions are stored in `SnapshotLifecycleStats` and perpetuated in cluster state. The
stats stored include the number of snapshots taken, failed, deleted, the number of retention runs,
as well as per-policy counts for snapshots taken, failed, and deleted. It also includes the amount
of time spent deleting snapshots from SLM retention.

This commit also adds an endpoint for retrieving all stats (further commits will expose this in the
SLM get-policy API) that looks like:

```
GET /_slm/stats
{
  "retention_runs" : 13,
  "retention_failed" : 0,
  "retention_timed_out" : 0,
  "retention_deletion_time" : "1.4s",
  "retention_deletion_time_millis" : 1404,
  "policy_metrics" : {
    "daily-snapshots2" : {
      "snapshots_taken" : 7,
      "snapshots_failed" : 0,
      "snapshots_deleted" : 6,
      "snapshot_deletion_failures" : 0
    },
    "daily-snapshots" : {
      "snapshots_taken" : 12,
      "snapshots_failed" : 0,
      "snapshots_deleted" : 12,
      "snapshot_deletion_failures" : 6
    }
  },
  "total_snapshots_taken" : 19,
  "total_snapshots_failed" : 0,
  "total_snapshots_deleted" : 18,
  "total_snapshot_deletion_failures" : 6
}
```

This does not yet include HLRC for this, as this commit is quite large on its own. That will be
added in a subsequent commit.

Relates to #43663

* Version qualify serialization

* Initialize counters outside constructor

* Use computeIfAbsent instead of being too verbose

* Move part of XContent generation into subclass

* Fix REST action for master merge

* Unused import

*  Record history of SLM retention actions (#45513)

This commit records the deletion of snapshots by the retention component
of SLM into the SLM history index for the purposes of reviewing operations
taken by SLM and alerting.

* Retry SLM retention after currently running snapshot completes (#45802)

* Retry SLM retention after currently running snapshot completes

This commit adds a ClusterStateObserver to wait until the currently
running snapshot is complete before proceeding with snapshot deletion.
SLM retention waits for the maximum allowed deletion time for the
snapshot to complete, however, the waiting time is not factored into
the limit on actual deletions.

Relates to #43663

* Increase timeout waiting for snapshot completion

* Apply patch

From 2374316f0d.patch

* Rename test variables

* [TEST] Be less strict for stats checking

* Skip SLM retention if ILM is STOPPING or STOPPED (#45869)

This adds a check to ensure we take no action during SLM retention if
ILM is currently stopped or in the process of stopping.

Relates to #43663

* Check all actions preventing snapshot delete during retention (#45992)

* Check all actions preventing snapshot delete during retention run

Previously we only checked to see if a snapshot was currently running,
but it turns out that more things can block snapshot deletion. This
changes the check to be a check for:

- a snapshot currently running
- a deletion already in progress
- a repo cleanup in progress
- a restore currently running

This was found by CI where a third party delete in a test caused SLM
retention deletion to throw an exception.

Relates to #43663

* Add unit test for okayToDeleteSnapshots

* Fix bug where SLM retention task would be scheduled on every node

* Enhance test logging

* Ignore if snapshot is already deleted

* Missing import

* Fix SnapshotRetentionServiceTests

* Expose SLM policy stats in get SLM policy API (#45989)

This also adds support for the SLM stats endpoint to the high level rest client.

Retrieving a policy now looks like:

```json
{
  "daily-snapshots" : {
    "version": 1,
    "modified_date": "2019-04-23T01:30:00.000Z",
    "modified_date_millis": 1556048137314,
    "policy" : {
      "schedule": "0 30 1 * * ?",
      "name": "<daily-snap-{now/d}>",
      "repository": "my_repository",
      "config": {
        "indices": ["data-*", "important"],
        "ignore_unavailable": false,
        "include_global_state": false
      },
      "retention": {}
    },
    "stats": {
      "snapshots_taken": 0,
      "snapshots_failed": 0,
      "snapshots_deleted": 0,
      "snapshot_deletion_failures": 0
    },
    "next_execution": "2019-04-24T01:30:00.000Z",
    "next_execution_millis": 1556048160000
  }
}
```

Relates to #43663

* Rewrite SnapshotLifecycleIT as as ESIntegTestCase (#46356)

* Rewrite SnapshotLifecycleIT as as ESIntegTestCase

This commit splits `SnapshotLifecycleIT` into two different tests.
`SnapshotLifecycleRestIT` which includes the tests that do not require
slow repositories, and `SLMSnapshotBlockingIntegTests` which is now an
integration test using `MockRepository` to simulate a snapshot being in
progress.

Relates to #43663
Resolves #46205

* Add error logging when exceptions are thrown
2019-09-09 09:55:34 -06:00
Przemyslaw Gomulka 8d1ea86519
Set start of the week to Monday for root locale (#43652)
Introducing a IsoLocal.ROOT constant which should be used instead of java.util.Locale.ROOT in ES when dealing with dates. IsoLocal.ROOT  customises start of the week to be Monday instead of Sunday.

closes #42588 an issue with investigation details
relates #41670 bug raised (this won't fix it on its own. joda.parseInto has to be reimplemented
closes #43275 an issue raised by community member
2019-08-09 15:24:05 +02:00
Yannick Welsch 245cb348d3
Add per-socket keepalive options (#44055)
Uses JDK 11's per-socket configuration of TCP keepalive (supported on Linux and Mac), see
https://bugs.openjdk.java.net/browse/JDK-8194298, and exposes these as transport settings.
By default, these options are disabled for now (i.e. fall-back to OS behavior), but we would like
to explore whether we can enable them by default, in particular to force keepalive configurations
that are better tuned for running ES.
2019-08-05 16:09:11 +02:00
Ioannis Kakavas 3b7b025690
Allow parsing the value of java.version sysprop (#44017)
We often start testing with early access versions of new Java
versions and this have caused minor issues in our tests
(i.e. #43141) because the version string that the JVM reports
cannot be parsed as it ends with the string -ea.

This commit changes how we parse and compare Java versions to
allow correct parsing and comparison of the output of java.version
system property that might include an additional alphanumeric
part after the version numbers
 (see [JEP 223[(https://openjdk.java.net/jeps/223)). In short it 
handles a version number part, like before, but additionally a 
PRE part that matches ([a-zA-Z0-9]+).

It also changes a number of tests that would attempt to parse
java.specification.version in order to get the full version
of Java. java.specification.version only contains the major
version and is thus inappropriate when trying to compare against
a version that might contain a minor, patch or an early access
part. We know parse java.version that can be consistently
parsed.

Resolves #43141
2019-07-22 20:13:32 +03:00
Yannick Welsch 855d27e374
Ignore failures to set socket options on Mac (#44355)
Brings some temporary relief for test failures until #41071 is addressed.
2019-07-17 17:33:40 +02:00
vinoov 0d0485ad9c Expose Elasticsearch API nullability information to Kotlin compiler. (#43912)
This change allows the Kotlin compiler to type check methods annotated with the
org.elasticsearch.common.Nullable annotation in Elasticsearch Java
APIs as described in: https://kotlinlang.org/docs/reference/java-interop.html#jsr-305-support.
2019-07-09 09:16:30 -07:00
Lee Hinman b7d63b8cd4
Add TimeValue.toHumanReadableString() to allow specifying frac… (#43346)
* Enhance TimeValue.toString() to allow specifying fractional values.

This enhances the `TimeValue` class to allow specifying the number of
truncated fractional decimals when calling `toString()`. The default
remains 1, however, more (or less, such as 0) can be specified to change
the output.

This commit also re-organizes some things in `TimeValue` such as putting
all the class variables near the top of the class, and moving the
constructors to the first methods in the class, in order to follow the
structure of our other code.

* Rename `toString(...)` to `toHumanReadableString(...)`
2019-06-24 10:13:42 -06:00
Jason Tedor 8c32e577a7
Fix IOUtils#fsync on Windows fsyncing directories (#43008)
Fsyncing directories on Windows is not possible. We always suppressed
this by allowing that an AccessDeniedException is thrown when attemping
to open the directory for reading. Yet, this suppression also allowed
other IOExceptions to be suppressed, and that was a bug (e.g., the
directory not existing, or a filesystem error and reasons that we might
get an access denied there, like genuine permissions issues). This
leniency was previously removed yet it exposed that we were suppressing
this case on Windows. Rather than relying on exceptions for flow control
and continuing to suppress there, we simply return early if attempting
to fsync a directory on Windows (we will not put this burden on the
caller).
2019-06-07 22:59:53 -04:00
Jason Tedor 526a6ca677
Only ignore IOException when fsyncing on dirs (#42972)
Today in the method IOUtils#fsync we ignore IOExceptions when fsyncing a
directory. However, the catch block here is too broad, for example it
would be ignoring IOExceptions when we try to open a non-existant
file. This commit addresses that by scoping the ignored exceptions only
to the invocation of FileChannel#force.
2019-06-07 08:34:19 -04:00