48437 Commits

Author SHA1 Message Date
Lee Hinman 08e887ae30
Set default SLM retention invocation time (#47604)
This adds a default for the `slm.retention_schedule` setting, setting it
to `0 30 1 * * ?` which is 1:30am every day.

Having retention unset meant that it would never be invoked and clean up
snapshots. We determined it would be better to have a default than never
to be run. When coming to a decision, we weighed the option of an
absolute time (such as 1:30am) versus a periodic invocation (like every
12 hours). In the end we decided on the absolute time because it has
better predictability and consistency than a periodic invocation, which
would depend on when the master node was elected or restarted.

Relates to #43663
2019-10-04 14:57:23 -06:00
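The trade-off described in #47604 can be illustrated with a small sketch. This is not the SLM scheduler; the class and method names are hypothetical, and the only value taken from the commit message is the 01:30 daily run encoded by `0 30 1 * * ?`. It shows that an absolute schedule depends only on the current time, while a periodic one also depends on when the scheduler was started.

```java
import java.time.Duration;
import java.time.LocalDateTime;
import java.time.LocalTime;

// Hypothetical illustration, not Elasticsearch code.
final class RetentionScheduleSketch {
    // 01:30, the time of day encoded by the cron expression `0 30 1 * * ?`.
    static final LocalTime DAILY_RUN = LocalTime.of(1, 30);

    // Absolute schedule: the next run is the next 01:30 strictly after `now`.
    static LocalDateTime nextAbsolute(LocalDateTime now) {
        LocalDateTime today = now.toLocalDate().atTime(DAILY_RUN);
        return now.isBefore(today) ? today : today.plusDays(1);
    }

    // Periodic schedule: the next run is the first multiple of the period,
    // counted from the scheduler start, that lies strictly after `now`.
    // Assumes `now` is not before `schedulerStart`.
    static LocalDateTime nextPeriodic(LocalDateTime schedulerStart, LocalDateTime now, long periodHours) {
        long elapsedHours = Duration.between(schedulerStart, now).toHours();
        long completedPeriods = elapsedHours / periodHours;
        return schedulerStart.plusHours((completedPeriods + 1) * periodHours);
    }
}
```

With the periodic variant, restarting the scheduler at a different time shifts every later run, which is the unpredictability the commit message refers to.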
Tal Levy f6f249be15
Expose ValueException in Grok (#47368)
Previously, Grok's groupMatch would allow the code to
fall into an IndexOutOfBoundsException, which can be avoided.
The other exception that can come up is a ValueException. The circumstances
in which this exception occurs are less well understood, but it may make sense
to expose it, since it typically means something did not go well.
2019-10-04 13:55:41 -07:00
Mark Vieira e86d40ff45
Eliminate Gradle task input warnings (#47538) 2019-10-04 13:12:42 -07:00
Mark Tozzi c26ce1d7f5
DocValueFormat implementation for date range fields (#47472) 2019-10-04 16:01:28 -04:00
lcawl beb50968d2 [DOCS] Minor fixes to security documentation 2019-10-04 10:58:59 -07:00
Lisa Cawley c120dd2017
[DOCS] Adds missing security anchors (#47585) 2019-10-04 10:16:16 -07:00
Armin Braun 9141e05466
Cleaner Handling of Store Refcount in BlobStoreRepository (#47560)
If a shard gets closed we properly abort its snapshot
before closing it. We should in this case make sure to
not throw a confusing exception about trying to increment
the reference on an already closed shard in the async tasks
if the snapshot is already aborted.
Also, added an assertion to make sure that aborts are in
fact the only situation in which we run into a concurrently
closed store.
2019-10-04 19:03:44 +02:00
Andrei Dan 37d6106fee
ILM: Skip rolling indexes that are already rolled (#47324)
An index with an ILM policy that has a rollover action in one of the
phases was rolled over when the ILM conditions dictated, regardless of whether
it was already rolled over (e.g. manually after modifying an index
template in order to force the creation of a new index that uses the new
mappings).
This commit changes that behaviour and has ILM check whether the index it's
about to roll has already been rolled over in the meantime.
2019-10-04 17:22:50 +01:00
James Rodewig fbf698ec02
[DOCS] Reformat refresh API docs (#46667) 2019-10-04 12:16:10 -04:00
James Rodewig 7ad45f406e
[DOCS] Reformat shrink index API docs (#46711) 2019-10-04 12:07:46 -04:00
Jack Conradson 2a527dbba4
Add a ScriptRoot to consolidate global data necessary for multiple passes (#47532)
This PR is to get plumbing in for a ScriptRoot class that will consolidate 
several pieces of state required by potentially multiple passes including 
PainlessLookup, CompilerSettings, FunctionTable, the root class node, and a 
synthetic counter. It's possible more may be added to this as we move 
forward and slowly make the nodes have less mutable state.
2019-10-04 08:36:44 -07:00
James Rodewig 854cd91436 [DOCS] Correct headings for split index API docs 2019-10-04 11:02:50 -04:00
James Rodewig 506329f97d
[DOCS] Reformat split index API docs (#46713) 2019-10-04 10:26:38 -04:00
Luca Cavanna 4e6756fb4d
Fold InitialSearchPhase into AbstractSearchAsyncAction (#47182)
Historically, we have two base classes for search actions that generally need to fan out to multiple shards and then move on to the following phase: InitialSearchPhase and AbstractSearchAsyncAction that extends it. Practically, every search action extends the latter, and there are no direct subclasses of InitialSearchPhase in our codebase.

This commit folds InitialSearchPhase into AbstractSearchAsyncAction in an attempt to simplify things and make the search code that runs on the coordinating node easier to reason about.
2019-10-04 16:07:49 +02:00
Andrei Stefan bb14ba8312
SQL: fix multi full-text functions usage with aggregate functions (#47444)
* Skip functions involving full-text predicates when replacing multiple
aggregate functions with "stats" or "matrix_stats" aggregations.
2019-10-04 16:05:55 +03:00
Martijn van Groningen 785cf6bd44
Upgrade joni from 2.1.6 to 2.1.29 (#47374)
Changed the Grok class to use searchInterruptible(...) instead of search(...),
otherwise we can't interrupt long-running matching via the thread
watchdog.

Joni now also provides another way to interrupt long-running matches:
invoking the interrupt() method on the Matcher. We would then need to refactor
the thread watchdog to keep track of Matchers instead of Threads, but
it is a better way of doing this, since interrupting would be more direct
(not every 30k iterations) and efficient (checking a volatile field).
This work needs to be done in a follow-up.
2019-10-04 06:30:41 -05:00
Alpar Torok 2b0b0929fd
Fix windows packaging tests (#47554)
On Windows, the process we called can terminate while some other process
it created still holds the same output streams, and thus the files, open,
so we can't clean them up.

This PR makes the cleanup a best effort.
2019-10-04 14:02:24 +03:00
Przemysław Witek 72130352ad
Fix serialization of evaluation response. (#47557) 2019-10-04 12:57:37 +02:00
Jason Tedor fbd4ec5294
Validating monitoring hosts setting while parsing (#47246)
This commit lifts the validation of the monitoring hosts setting into
the setting itself, rather than when the setting is used. This prevents
a scenario where an invalid value for the setting is accepted, but then
later fails while applying a cluster state with the invalid setting.
2019-10-04 05:47:07 -05:00
Ioannis Kakavas e33a02b7b0
NameID mapping and Single Logout (#47288)
Clarify in the documentation that for SAML Single Logout to be
functional, the Identity Provider needs to release a NameID.
2019-10-04 13:39:54 +03:00
Przemysław Witek 1fc8dd2532
Implement new analysis type: classification (#46537) 2019-10-04 11:46:13 +02:00
Armin Braun e244d65869
Fix Snapshot Corruption in Edge Case (#47552)
This fixes shard snapshots not being marked as failed when
multiple data nodes are lost during the snapshot process or
shard snapshot failures have occurred before a node left the cluster.

The problem was that we were simply not adding any shard entries for completed
shards on node-left events. This has no effect for a successful shard, but
for a failed shard would lead to that shard not being marked as failed during
snapshot finalization. Fixed by correctly keeping track of all previous completed
shard states as well in this case.
Also, added an assertion that without this fix would trip on almost every run of the
resiliency tests and adjusted the serialization of SnapshotsInProgress.Entry so
we have a proper assertion message.

Closes #47550
2019-10-04 10:42:54 +02:00
David Roberts d683b2060b
[ML] More accurate job memory overhead (#47516)
When an ML job runs the memory required can be
broken down into:

1. Memory required to load the executable code
2. Instrumented model memory
3. Other memory used by the job's main process or
   ancillary processes that is not instrumented

Previously we added a simple fixed overhead to
account for 1 and 3. This was 100MB for anomaly
detection jobs (large because of the completely
uninstrumented categorization function and
normalize process), and 20MB for data frame
analytics jobs.

However, this was an oversimplification because
the executable code only needs to be loaded once
per machine.  Also the 100MB overhead for anomaly
detection jobs was probably too high in most cases
because categorization and normalization don't use
_that_ much memory.

This PR therefore changes the calculation of memory
requirements as follows:

1. A per-node overhead of 30MB for _only_ the first
   job of any type to be run on a given node - this
   is to account for loading the executable code
2. The established model memory (if applicable) or
   model memory limit of the job
3. A per-job overhead of 10MB for anomaly detection
   jobs and 5MB for data frame analytics jobs, to
   account for the uninstrumented memory usage

This change will enable more jobs to be run on the
same node.  It will be particularly beneficial when
there are a large number of small jobs.  It will
have less of an effect when there are a small number
of large jobs.
2019-10-04 09:16:56 +01:00
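The revised calculation in #47516 can be sketched as follows. The constants (30MB per node, 10MB and 5MB per job, and the previous 100MB and 20MB fixed overheads) come from the commit message; the class, enum, and method names are hypothetical and this is not the actual ML memory tracker.

```java
// Hypothetical illustration of the memory accounting described in #47516.
final class MlMemorySketch {
    enum JobType { ANOMALY_DETECTION, DATA_FRAME_ANALYTICS }

    // Charged once per node, for the first job of any type: loading the executable code.
    static final long PER_NODE_OVERHEAD_MB = 30;

    // Uninstrumented memory charged to every job.
    static long perJobOverheadMb(JobType type) {
        return type == JobType.ANOMALY_DETECTION ? 10 : 5;
    }

    // New calculation: node overhead (first job only) + model memory + per-job overhead.
    static long requirementMb(JobType type, long modelMemoryMb, boolean firstJobOnNode) {
        long nodeOverheadMb = firstJobOnNode ? PER_NODE_OVERHEAD_MB : 0;
        return nodeOverheadMb + modelMemoryMb + perJobOverheadMb(type);
    }

    // Previous calculation: model memory plus a fixed overhead for every job.
    static long previousRequirementMb(JobType type, long modelMemoryMb) {
        long fixedOverheadMb = type == JobType.ANOMALY_DETECTION ? 100 : 20;
        return modelMemoryMb + fixedOverheadMb;
    }
}
```

For ten anomaly detection jobs with 20MB of model memory each on one node, the previous formula gives 10 × 120MB = 1200MB, while the new one gives 60MB for the first job plus 9 × 30MB, or 330MB in total. This is why many small jobs benefit the most.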
David Roberts e036ac4ffd
Remove fallback for controller location (#47104)
This change removes the temporary controller
location fallback introduced in #47013.

Relates elastic/ml-cpp#593
2019-10-04 09:16:10 +01:00
Alpar Torok 3a7ad13969
Do common node config on bwc tests (#47361)
* Do common node config on bwc tests

Before this PR we only ever ran `ElasticsearchCluster.start` once, and
the common node config was never done.
This becomes apparent when upgrading from `6.x` to `7.x`, as the new config
is missing, preventing the cluster from starting.
2019-10-04 11:03:26 +03:00
István Zoltán Szabó b03be6e816
[DOCS] Fixes an attribute in the update datafeed API docs. (#47551) 2019-10-04 08:42:30 +02:00
Alpar Torok f962d1ca03
Make All OS tests run on GCP instances (#46924)
This PR makes the necessary adaptations to the tests and adds a PowerShell script to
invoke the OS tests on GCP instances connected as CI workers.

Also noticed that logs were not being produced by the tests and that these were not using log4j, so fixed that too.

One of the difficulties in working on these tests was that the tests just stalled with no indication of where the problem was.
To ease debugging, after Process Explorer suggested that the tests were running some commands, we now have multiple timeouts: one for the tests (which will generate a thread dump) and one for individual commands (which bails out with the command being run and the output and error so far) to make it easier to see what went wrong.

The tests were blocking because apparently the pipes to the sub-process were not closing, thus the threads were blocking on them and we were blocking indefinitely on the join. I'm not sure why this doesn't happen in vagrant, but we now properly deal with it.
2019-10-04 08:41:06 +03:00
Ryan Ernst d8b4556e2d
Add explanations to script score queries (#46693)
While function scores using scripts do allow explanations, they are only
creatable with an expert plugin. This commit improves the situation for
the newer script score query by adding the ability to set the
explanation from the script itself.

To set the explanation, a user would check for `explanation != null`, which
indicates that an explanation is needed, and then call
`explanation.set("some description")`.
2019-10-03 19:35:59 -07:00
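The null-check pattern from #46693 can be sketched in plain Java. A real script would be written in Painless and would receive `explanation` from the engine; the `ExplanationHolder` class below is a hypothetical stand-in, and the `popularity` parameter and the scoring formula are made up for illustration.

```java
// Hypothetical stand-in for the pattern described in #46693, not the Elasticsearch API.
final class ScriptScoreSketch {
    // Minimal holder that the caller passes in only when an explanation is wanted.
    static final class ExplanationHolder {
        private String description;

        void set(String description) {
            this.description = description;
        }

        String get() {
            return description;
        }
    }

    // Computes a score; builds the description only if a holder was supplied.
    static double score(double popularity, ExplanationHolder explanation) {
        double score = Math.log1p(popularity);
        if (explanation != null) {
            explanation.set("log1p(popularity=" + popularity + ")");
        }
        return score;
    }
}
```

Passing `null` for the holder on ordinary searches means the description string is never built unless an explanation was requested.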
Karen Metts b9b99943e7 Update link to ls monitoring settings (#47529) 2019-10-03 15:39:21 -07:00
Lisa Cawley f7edcb0857
[DOCS] Fixes build errors (#47536) 2019-10-03 15:22:41 -07:00
Gordon Brown f95753a1d2
Fix Rollover error when alias has closed indices (#47148)
Rollover previously requested index stats for all indices in the
provided alias, which causes an exception when there is a closed index
with that alias.

This commit adjusts the IndicesOptions used on the index stats
request so that closed indices are ignored, rather than throwing
an exception.
2019-10-03 15:26:50 -06:00
Lisa Cawley 5c49ac13dc
[DOCS] Copies security source files from stack-docs (#47521) 2019-10-03 13:39:21 -07:00
James Rodewig 730c35c4cd
[DOCS] Reformat index recovery API docs (#46493) 2019-10-03 16:13:21 -04:00
James Rodewig 0647050d1b
[DOCS] Reformat index shard stores API docs (#46504) 2019-10-03 15:57:11 -04:00
Francois-Clement Brossard 2288052d08 [DOCS] Update painless statements with if/else example (#47485) 2019-10-03 15:23:33 -04:00
Armin Braun be397b7e0d
Simplify Snapshot Delete Process (#47439)
We don't need to read the SnapshotInfo for a
snapshot to determine the indices that need to
be updated when it is deleted as the `RepositoryData`
contains that information already.
This PR makes it so the `RepositoryData` is used to
determine which indices to update and also removes
the special handling for deleting snapshot metadata
and the CS snapshot blob and has those simply be
deleted as part of the deleting of other unreferenced
blobs in the last step of the delete.

This makes the snapshot delete a little faster and
more resilient by removing two RPC calls
(the separate delete and the get).

Also, this shortens the diff with #46250 as a
side-effect.
2019-10-03 21:08:17 +02:00
Armin Braun 3c011aa4ab
Fix Ex. Handling in SnapshotsService#snapshots (#47507)
* Fix Ex. Handling in SnapshotsService#snapshots

We're needlessly wrapping a `SnapshotMissingException` which
itself is a `SnapshotException` when trying to load a missing
snapshot. This leads to failure #47442 which expects a
`SnapshotMissingException` in this case.

Closes #47442
2019-10-03 19:39:22 +02:00
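The wrapping problem fixed in #47507 can be sketched with stand-in classes. The two exception names mirror the ones in the commit message, but these are simplified local definitions, and the `read` and `load` methods and the map-backed repository are hypothetical. The idea is to rethrow an exception that already has the expected type and wrap only unrelated failures.

```java
import java.util.Map;

// Hypothetical illustration, not the SnapshotsService code.
final class SnapshotLookupSketch {
    static class SnapshotException extends RuntimeException {
        SnapshotException(String message, Throwable cause) {
            super(message, cause);
        }
    }

    static class SnapshotMissingException extends SnapshotException {
        SnapshotMissingException(String name) {
            super("snapshot [" + name + "] is missing", null);
        }
    }

    // Low-level read that signals a missing snapshot with the specific subtype.
    static String read(Map<String, String> repository, String name) {
        String info = repository.get(name);
        if (info == null) {
            throw new SnapshotMissingException(name);
        }
        return info;
    }

    // Rethrows SnapshotException subtypes untouched; wraps anything else.
    static String load(Map<String, String> repository, String name) {
        try {
            return read(repository, name);
        } catch (SnapshotException e) {
            throw e; // already the right type, so callers can still catch SnapshotMissingException
        } catch (RuntimeException e) {
            throw new SnapshotException("snapshot [" + name + "] could not be read", e);
        }
    }
}
```

A caller that waits for a `SnapshotMissingException` keeps working, because the specific type is no longer hidden inside a generic wrapper.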
Lisa Cawley e1a30179ba
[DOCS] Fixes missing link title (#47481) 2019-10-03 07:41:40 -07:00
Nhat Nguyen 8c46477566
Limit number of retaining translog files for peer recovery (#47414)
Today we control the extra translog (when soft-deletes is disabled) for
peer recoveries by size and age. If users manually (force) flush many
times within a short period, we can keep many small (or empty) translog
files as neither the size nor the age condition is reached. We can protect
the cluster from running out of the file descriptors in such a situation
by limiting the number of retaining translog files.
2019-10-03 10:34:15 -04:00
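The effect of the file-count limit from #47414 can be sketched as below. This is a simplified model under stated assumptions, not the translog deletion policy itself: the class and field names are hypothetical, files are assumed to be listed newest first, and retention simply stops at the first file that would break any of the three limits.

```java
import java.util.List;

// Hypothetical, simplified model of retention by size, age, and file count.
final class TranslogRetentionSketch {
    static final class TranslogFile {
        final long sizeInBytes;
        final long ageInMillis;

        TranslogFile(long sizeInBytes, long ageInMillis) {
            this.sizeInBytes = sizeInBytes;
            this.ageInMillis = ageInMillis;
        }
    }

    // Returns how many of the newest files are retained under the three limits.
    static int filesToRetain(List<TranslogFile> newestFirst, long maxBytes, long maxAgeMillis, int maxFiles) {
        long totalBytes = 0;
        int retained = 0;
        for (TranslogFile file : newestFirst) {
            if (retained >= maxFiles) {
                break; // count limit: the new protection against many tiny files
            }
            if (file.ageInMillis > maxAgeMillis) {
                break; // age limit
            }
            if (totalBytes + file.sizeInBytes > maxBytes) {
                break; // size limit
            }
            totalBytes += file.sizeInBytes;
            retained++;
        }
        return retained;
    }
}
```

With a thousand empty, recent files, the size and age limits never trigger, so only the count limit bounds how many files (and file descriptors) stay open.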
Paweł Krześniak 3f4be610de [DOCS] Change index name in rollover ILM example (#47492)
The warning section above the example says that the index name has to end with digits, but the example itself uses an index name without digits, which is confusing.
2019-10-03 09:25:35 -04:00
AndyHunt66 ac543d5386 [DOCS] Remove duplicated half-sentence from secure settings docs (#47498) 2019-10-03 08:40:31 -04:00
Ioannis Kakavas 1eb99a2def
Fix ADRealmTests in FIPS 140 JVMs (#47437)
The changes introduced in #47179 made it so that we could try to
build an SSLContext with verification mode set to None, which is
not allowed in FIPS 140 JVMs. This commit addresses that.
2019-10-03 14:55:37 +03:00
István Zoltán Szabó c0da956b6e
[DOCS] Amends update datafeed API docs (#47448) 2019-10-03 13:12:19 +02:00
Alan Woodward 29463551ae
Remove typename checks in mapping updates (#47347)
This commit removes types validation during mapping updates. This will make
further work on types removal easier, as it will prevent test failures due to type-name
clashes when we remove type information from PutMapping and CreateIndex requests

Part of #41059
2019-10-03 11:25:09 +01:00
Armin Braun c74ca28402
Fix getSnapshotIndexMetaData Exception Behavior (#47488)
If we fail to read the global metadata in a snapshot
we would throw `SnapshotMissingException` but wouldn't
do so for the index metadata.
This is breaking SLM tests at a low rate because they
use `SnapshotMissingException` thrown from snapshot status APIs
to wait for a snapshot to be gone.
Also, we should be consistent here in general and not leak the
`NoSuchFileException` to the transport layer for index meta.

Closes #46508
2019-10-03 11:00:20 +02:00
Armin Braun ea1c22cd51
Remove Incorrect Assertion from SnapshotsInProgress (#47458)
This relates to the effort towards #46250. We added
tracking of the shard generation for successful
snapshots to `8.0`.
This assertion isn't correct though. While an `8.0`
master won't create an entry with success state and
a null shard generation it may still (on e.g. master
failover) send a success entry created by a 7.x master
with a `null` generation over the wire.

Closes #47406
2019-10-03 10:59:43 +02:00
Alpar Torok ca54b442bf
Remove eclipse conditionals (#44075)
* Remove eclipse conditionals

We used to have some meta projects with a `-test` prefix because
historically eclipse could not distinguish between test and main
source-sets and could only use a single classpath.
This is no longer the case for the past few Eclipse versions.

This PR adds the necessary configuration to correctly categorize source
folders and libraries.
With this change eclipse can import projects, and the visibility rules
are correct, e.g. auto-complete doesn't offer classes from test code or
`testCompile` dependencies when editing classes in `main`.

Unfortunately the cyclic dependency detection in Eclipse doesn't seem to
take the difference between test and non test source sets into account,
but since we are checking this in Gradle anyhow, it's safe to set to
`warning` in the settings. Unfortunately there is no setting to ignore
it.

This might cause problems when building since Eclipse will probably not
know the right order to build things in, so more work might be necessary.
2019-10-03 10:50:46 +03:00
Nhat Nguyen 8ce4a76097
Always flush in FullClusterRestartIT#testRecovery (#47465)
The pattern in the latest failure is similar to the source fixed in #46956
but relates to synced-flush. If peer recovery happens after indexing,
and indexing flushes some shard at the end, then a synced flush in the
test will not roll or commit translog.

Closes #46712
2019-10-02 17:30:55 -04:00
Jason Tedor f63ee2f71a
Allow setting validation against arbitrary types (#47264)
Today when settings validate, they can only validate against settings
that are of the same type. While this strong typing is convenient from a
development perspective, it is too limiting in that some settings need
to validate against settings of a different type. For example, the list
setting xpack.monitoring.exporters.<namespace>.host wants to validate
that it is non-empty if and only if the string setting
xpack.monitoring.exporters.<namespace>.type is "http". Today this is
impossible since the settings validation framework only allows that
setting to validate against other list settings. This commit increases
the flexibility here to validate against settings of arbitrary type, at
the expense of losing strong-typing during development.
2019-10-02 15:11:51 -05:00
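The cross-type rule given as the example in #47264 can be sketched as a plain method. This does not use the settings framework; the class and method names are hypothetical, and the only rule taken from the commit message is that the host list must be non-empty if and only if the type is "http".

```java
import java.util.List;

// Hypothetical illustration of validating a list setting against a string setting.
final class ExporterSettingsSketch {
    // Throws if the host list and the exporter type disagree.
    static void validateHosts(List<String> hosts, String exporterType) {
        boolean isHttp = "http".equals(exporterType);
        if (isHttp && hosts.isEmpty()) {
            throw new IllegalArgumentException("host list must not be empty when type is [http]");
        }
        if (!isHttp && !hosts.isEmpty()) {
            throw new IllegalArgumentException(
                "host list must be empty when type is [" + exporterType + "]");
        }
    }
}
```

The check needs a `List<String>` value and a `String` value at the same time, which is the kind of validation that was impossible when a setting could only be validated against settings of its own type.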
Mark Vieira 72a59d4111
Remove groovy test code from buildSrc (#47416) 2019-10-02 11:04:48 -07:00