45063 Commits

Author SHA1 Message Date
Jason Tedor ad90055044
Stop auto-followers on shutdown (#40124)
When shutting down a node, auto-followers will keep trying to run. This
is happening even as transport services and other components are being
closed. In some cases, this can lead to a stack overflow as we rapidly
try to check the license state of the remote cluster, cannot because
the transport service is shut down, and then immediately retry
again. This can happen faster than the shutdown, and we die with stack
overflow. This commit adds a stop command to auto-followers so that this
retry loop occurs at most once on shutdown.
2019-03-18 07:24:51 -04:00
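The stop mechanism described in the commit above can be pictured with a minimal sketch. This is not the actual auto-follower code: the class `StoppableRetrier` and its methods are hypothetical names, and the retry is written as a loop rather than as the recursive callback chain that caused the stack overflow. It only shows the idea that a stop flag, checked before each attempt, bounds the retries once shutdown begins.

```java
import java.util.concurrent.atomic.AtomicBoolean;
import java.util.function.BooleanSupplier;

// Hypothetical illustration only; not the Elasticsearch implementation.
final class StoppableRetrier {
    private final AtomicBoolean stopped = new AtomicBoolean(false);
    private int attempts = 0;

    /** Signals that no further attempts should be started. */
    void stop() {
        stopped.set(true);
    }

    /** Number of attempts made so far. */
    int attempts() {
        return attempts;
    }

    /**
     * Repeats the check until it succeeds, stop() has been called,
     * or maxAttempts is reached. Returns true only if the check succeeded.
     */
    boolean run(BooleanSupplier check, int maxAttempts) {
        while (attempts < maxAttempts) {
            if (stopped.get()) {
                return false; // shutdown in progress: do not retry again
            }
            attempts++;
            if (check.getAsBoolean()) {
                return true;
            }
        }
        return false;
    }
}
```

If `stop()` is called while an attempt is in flight, that attempt finishes and no new one is started.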
Luca Cavanna 8fb5c326bb
Skip sibling pipeline aggregators reduction during non-final reduce (#40101)
Today a coordinating node forces a final reduction of sibling pipeline aggregators whenever reducing aggs, unless it is reducing aggs incrementally. This works well for incremental reduction of aggs, but breaks CCS when minimizing roundtrips, as each cluster ends up reducing its own pipeline aggregators locally, while that should only be done by the CCS coordinating node later. This causes issues because, once reduced, pipeline aggs cannot be reduced further, which is what happens with CCS, causing errors like "java.lang.UnsupportedOperationException: Not supported" to be returned.

Each coordinating node should rather honour the reduce context flag that
indicates whether we are executing a final reduce or not. If not, it should leave the sibling pipeline aggregations alone.

Note that this bug affects only pipeline aggs that don't have a parent in
the aggs tree, while all the others work well.

Relates to #40059 but does not fix it yet, as the CCS coordinating node also needs to be adapted to recreate sibling pipeline aggregators from the request.
2019-03-18 12:24:32 +01:00
Luca Cavanna 792a8ed653
CCS: skip empty search hits when minimizing round-trips (#40098)
When minimizing round-trips, each cluster returns its own independent
search response. In case sort by field and/or field collapsing were
requested, when one cluster has no results to return, the information
about the field that sorting was based on (SortField array) as well as
the field (and the values) that collapsing was performed on are missing
in the search response. That causes problems as we can't build the
proper `TopDocs` instance which would need to be either `TopFieldDocs`
or `CollapseTopFieldDocs`. The merge routine expects that all the top
docs are of the same exact type which can't be guaranteed. Given that
the problematic results are empty, hence have no impact on the final
results, we can simply skip them.

Relates to #32125
Closes #40067
2019-03-18 12:23:44 +01:00
Luca Cavanna 27aee54d35 Fix bad cross-link
Relates to #39329
2019-03-18 11:54:46 +01:00
Luca Cavanna 87f4d3f851
[DOCS] add details on version compatibility and remote gateway selection (#40056)
This commit clarifies how the gateway selection works when configuring
remote clusters for CCR or CCS. Specifically, it clarifies compatibility
between different versions which is a very common question.
2019-03-18 11:37:56 +01:00
Jim Ferenczi 76d7cf8fb0
reenable bwc tests after backport of #40095 (#40146) 2019-03-18 11:33:44 +01:00
Alex Doerr 0f40658d09 Clarify version compatibility in snapshot/restore docs (#39329) 2019-03-18 11:29:36 +01:00
David Turner 541fd2848c
Note that GET /_cluster/state is unstable (#40104)
The `GET /_cluster/state` API returns an internal representation of the cluster
state that does change from version to version. It's useful for debugging, but
it is not intended for regular use by clients.

This change adjusts the documentation of `GET /_cluster/state` to clarify that
this API yields an internal representation that should not be expected to
remain stable between versions.

Relates #40061, #40016
2019-03-18 09:12:03 +00:00
Jim Ferenczi 4872983666
Adapt bwc serialization in FieldSortBuilder (#40095)
This change adapts the bwc version check for the new numeric_type option of the FieldSortBuilder.
Since this change needs to be backported to 7.x, all bwc tests are temporarily disabled
until https://github.com/elastic/elasticsearch/pull/40084 is merged.

Relates #38095
2019-03-18 09:32:31 +01:00
jimczi 1bc2b77ed9 Revert "Revert "Fix IndexSearcherWrapper visibility (#39071)""
This reverts commit 653afc81e5.
2019-03-18 08:51:45 +01:00
Daniel Mitterdorfer 19ebe5cfb9
Document monitoring node stats collection timeout (#39846)
With this commit we document the setting
`xpack.monitoring.collection.node.stats.timeout` that has been missing
so far in the docs.

Supersedes #31043
2019-03-18 08:24:52 +01:00
Ioannis Kakavas 9843994b20
Enable QA tests to run with FIPS nodes (#40105)
This commit enables full-cluster-restart and rolling-upgrade tests
to run with nodes using a JVM in FIPS approved only mode by using
PEM key material instead of a JKS for the transport layer in that
case.
2019-03-18 08:52:14 +02:00
Ioannis Kakavas eaeacde4d8
Throw an exception when unable to read Certificate (#40092)
With SUN security provider, a CertificateException is thrown when
attempting to parse a Certificate from a PEM file on disk with
`sun.security.provider.X509Provider#parseX509orPKCS7Cert`

When using the BouncyCastle Security provider (as we do in fips
tests) the parsing happens in
CertificateFactory#engineGenerateCertificates which doesn't throw
an exception but returns an empty list.

In order to have consistent behavior, this change makes it so
that we throw a CertificateException when attempting to read
a PEM file from disk and failing to do so with either security
provider.

Resolves: #39580
2019-03-18 08:45:50 +02:00
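The behavior difference described in the commit above can be normalized with a small wrapper. The following is a sketch under assumptions, not the code from the commit: the class `StrictCertReader` and its method are hypothetical names. It relies only on the standard `java.security.cert.CertificateFactory` API and treats an empty result the same way as a parse failure, so callers see a `CertificateException` whichever security provider is installed.

```java
import java.io.InputStream;
import java.security.cert.Certificate;
import java.security.cert.CertificateException;
import java.security.cert.CertificateFactory;
import java.util.ArrayList;
import java.util.Collection;
import java.util.List;

// Hypothetical helper for illustration; names are not from Elasticsearch.
final class StrictCertReader {

    /**
     * Parses all certificates from the stream. Throws CertificateException
     * both when the provider rejects the input and when it silently
     * returns an empty collection.
     */
    static List<Certificate> readCertificates(InputStream in) throws CertificateException {
        CertificateFactory factory = CertificateFactory.getInstance("X.509");
        Collection<? extends Certificate> parsed = factory.generateCertificates(in);
        if (parsed.isEmpty()) {
            // Some providers return an empty collection instead of throwing.
            throw new CertificateException("no certificates could be parsed from the input");
        }
        return new ArrayList<>(parsed);
    }
}
```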
Nhat Nguyen 7893aaec0c
Dump cluster state if ensureGreen timed out in QA tests (#40133)
When the method ensureGreen in QA tests times out, it does not
provide enough info for us to investigate why the testing index is
not green yet. With this change, we dump the cluster state if
ensureGreen times out.

Relates #32027
2019-03-17 11:23:35 -04:00
Albert Zaharovits 3844125a22
Un-hardcode SecurityIndexManager to handle generic indices (#40064)
`SecurityIndexManager` is hardcoded to handle only the `.security`-`.security-7` alias-index pair.
This commit removes the hardcoded bits, so that the `SecurityIndexManager` can be reused
for other indices, such as the planned security tokens index (`.security-tokens-7`).
2019-03-17 14:45:08 +02:00
Albert Zaharovits 97ef2951b7
AuditTrail correctly handle ReplicatedWriteRequest (#39925)
This fix deduplicates index names in `BulkShardRequests` and only audits
the specific resolved index for every comprising `BulkItemRequest`.
2019-03-17 13:03:50 +02:00
Ioannis Kakavas 7fa59c7175
Adjust ldap timeout for idp fixture (#40102)
This change adjusts the LDAP connection timeout for retrieving
attributes while performing the SAML IT to 5 seconds, from 5 ms
that it previously was.
Resolves: #40025
2019-03-17 11:45:58 +02:00
David Roberts 7a2ff624ea Mute JobResultsProviderIT.testMultipleSimultaneousJobCreations
Due to https://github.com/elastic/elasticsearch/issues/40134
2019-03-17 07:48:42 +00:00
Jason Tedor cfacd1a167
Reenable BWC tests after removing cluster state size (#40127)
This commit reenables the BWC tests after removing cluster state size
and backporting that work.
2019-03-16 14:39:55 -04:00
Jason Tedor 87627fcbae
Remove es.cluster_state.size hard failure (#40111)
In 6.7.0 we introduced a system property to return the old behavior of
computing and sending the compressed size of the cluster state on
cluster state endpoints. In 7.0.0, we removed this behavior but hard
failed usage of this system property (to clearly inform users this
behavior is gone). In this commit, intending to target 8.0.0 only, we
remove this detection as it should no longer be needed; that is, we now
treat this as dead code.
2019-03-16 11:04:27 -04:00
Jason Tedor 9f971ab10d
Add log message for auto-follower timeout
When an auto-follower coordinator times out waiting for the remote
cluster state, we do not log any indication of this. While this is
expected behavior in quiet deployments, it is still useful to see this
information for tracing the behavior of the auto-follow
coordinator. This commit adds a trace log message indicating the
timeout.
2019-03-16 10:45:43 -04:00
Benjamin Trent 88644aa9c5
[ML] fixing sort order (#40119) 2019-03-16 09:02:01 -05:00
Andy Bristol 2722d3f7de
use shell with JAVA_HOME for starting archive (#40118)
Closes #40099
2019-03-15 19:50:26 -07:00
Gordon Brown df11ba8a32
Remove Migration Upgrade and Assistance APIs (#40075)
The Migration Assistance API has been functionally replaced by the
Deprecation Info API. The Migration Upgrade API is not used for the
transition from ES 6.x to 7.x, and does not need to be kept around to
repair indices that were not properly upgraded before upgrading the
cluster, as was the case in 6.
2019-03-15 15:34:50 -06:00
jimczi 653afc81e5 Revert "Fix IndexSearcherWrapper visibility (#39071)"
This reverts commit e4d46ba74e.
2019-03-15 19:04:24 +01:00
Tim Brooks 9026c91a84
Remove transport name from tcp channel (#40074)
Currently, we maintain a transport name ("mock-nio", "nio", "netty")
that is passed to a `TcpTransportChannel` when a request is received.
The value of this name is to associate with the task when we register a
task with the task manager. However, it is only possible to run ES with
one transport, so having an implementation specific name is unnecessary.
This commit removes the name and replaces it with the generic
"transport".
2019-03-15 11:47:29 -06:00
Julie Tibshirani 91181e6779
Document the limitation around field aliases and percolator. (#40073)
Currently if a field alias is updated, any percolator queries that contain the
alias will still refer to its old target. This PR documents the issue while we
look into addressing it.

Relates to #37212.
2019-03-15 10:34:58 -07:00
Igor Motov 8579235aff
SQL: Refactor Literals serialization method (#40058)
Since other classes besides intervals can be serialized as part of
the Cursor, the getNamedWritables method should be moved from Intervals
to a more generic class Literals.

Relates to #39973
2019-03-15 13:15:11 -04:00
Jason Tedor 0195626b6d
Remove cluster state size (#40061)
This commit removes the cluster state size field from the cluster state
response, and drops the backwards compatibility layer added in 6.7.0 to
continue to support this field. As calculation of this field was
expensive and had dubious value, we have elected to remove this field.
2019-03-15 13:06:22 -04:00
Jack Conradson c0797294fb
Add Painless cast tests for long and Long (#40007) 2019-03-15 09:33:08 -07:00
Yannick Welsch c3fe51cfd7
Reduce logging noise when stepping down as master before state recovery (#39950)
Reduces the logging noise from the state recovery component when there are duelling elections.

Relates to #32006
2019-03-15 17:14:00 +01:00
Zachary Tong deab46ab5e
Do not allow Sampler to allocate more than maxDoc size, better CB accounting (#39381)
The `sampler` agg creates a BestDocsDeferringCollector, which internally
initializes a priority queue of size `shardSize`.  This queue is
populated with empty `Object` sentinels, which is roughly 16b per
object.

Similarly, the Diversified samplers create DiversifiedTopDocsCollectors,
which internally track PQ slots with ScoreDocKeys, weighing in around
28kb.

If the user sets a very abusive `shard_size`, this could easily OOM
a node or cluster since these PQs are allocated up-front without
any checks.

This commit makes sure that when we create the collector, it
cannot be larger than the maxDoc so that we don't accidentally blow
up the node.  We ensure the size is not greater than the overall
index maxDoc. A similar treatment is applied to the `maxDocsPerValue`
parameter of the diversified samplers.

For good measure, this also adds in some CB accounting to try and track
memory usage.

Finally, a redundant array creation is removed to reduce a bit of
temporary memory.
2019-03-15 11:46:00 -04:00
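The size bound described in the commit above amounts to clamping the requested queue size to the number of documents in the index. The sketch below is an illustration only: `SamplerSizing` and its methods are hypothetical names, and the 16-byte figure assumes the commit's "16b per object" means bytes per sentinel.

```java
// Hypothetical illustration; not the Elasticsearch collector code.
final class SamplerSizing {

    // Assumed per-sentinel cost, taken from the commit message's rough estimate.
    static final long BYTES_PER_SENTINEL = 16L;

    /** Never allocate more priority-queue slots than there are documents. */
    static int boundedQueueSize(int shardSize, int maxDoc) {
        return Math.min(shardSize, maxDoc);
    }

    /** Rough up-front memory estimate that could be reported to a circuit breaker. */
    static long estimatedSentinelBytes(int queueSize) {
        return queueSize * BYTES_PER_SENTINEL;
    }
}
```

With this bound, a `shard_size` of `Integer.MAX_VALUE` on an index with 1000 documents allocates 1000 slots instead of roughly two billion.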
Lisa Cawley 7a6021ca98
[DOCS] Replaces CCS terms with attributes (#40076) 2019-03-15 07:54:45 -07:00
Lisa Cawley 9c6c40d28f
[DOCS] Fixes version info for rolling upgrades (#40070) 2019-03-15 07:53:17 -07:00
Jim Ferenczi e4d46ba74e
Fix IndexSearcherWrapper visibility (#39071)
* Fix IndexSearcherWrapper visibility

This change adds a wrapper for IndexSearcher that makes IndexSearcher#search(List, Weight, Collector) visible by
sub-classes. The wrapper is used by the ContextIndexSearcher to call this protected method on a searcher created by a plugin.
This ensures that an override of the protected method in an IndexSearcherWrapper plugin is called when a search is executed.

Closes #30758
2019-03-15 15:33:46 +01:00
David Kyle 65e97325d9
Mute CcrRetentionLeaseIT tests (#40090) 2019-03-15 13:17:08 +00:00
David Kyle 4c803e5abb
[ML] Avoid assertions on empty Optional in DF usage test (#40043)
Refactor the usage class to make testing simpler
2019-03-15 12:16:38 +00:00
David Kyle 758fb1531c
Mute DateTimeUnitTests.testConversion (#40086) 2019-03-15 11:39:21 +00:00
Henning Andersen 62bb853161
Blob Store compress default to true (#40033)
Changed default of compress setting from false to true for blob store
repositories. This aligns the code with existing documentation and also
seems like the better default.
2019-03-15 12:23:22 +01:00
Costin Leau 4f09613942
SQL: Introduce MAD (MedianAbsoluteDeviation) aggregation (#40048)
Add Median Absolute Deviation aggregation

Fix #39597
2019-03-15 11:45:10 +02:00
David Kyle cbd87f70de
Fix compilation failure in :qa:vagrant (#40083) 2019-03-15 09:38:17 +00:00
David Roberts be7ee7d2ed
[ML] Fix race condition when creating multiple jobs (#40049)
If multiple jobs are created together and the anomaly
results index does not exist then some of the jobs could
fail to update the mappings of the results index. This
would lead them to fail to write their results correctly later.

Although this scenario sounds rare, it is exactly what
happens if the user creates their first jobs using the
Nginx module in the ML UI.

This change fixes the problem by updating the mappings
of the results index if it is found to exist during a
creation attempt.

Fixes #38785
2019-03-15 09:25:55 +00:00
Jim Ferenczi ddd34b192a
Add an option to force the numeric type of a field sort (#38095)
This change adds an option to the `FieldSortBuilder` that allows transforming the type
of a numeric field into another. Possible values for this option are `long` that transforms
the source field into an integer and `double` that transforms the source field into a floating point.
This new option is useful for cross-index search when the sort field is mapped differently on some
indices. For instance if a field is mapped as a floating point in one index and as an integer in another
it is possible to align the type for both indices using the `numeric_type` option:

```
{
   "sort": {
    "field": "my_field",
    "numeric_type": "double" <1>
   }
}
```

<1> Ensure that values for this field are transformed to a floating point if needed.
2019-03-15 10:25:26 +01:00
David Kyle 27346a076b Mute test NetworkDisruptionIT.testJobRelocation
Relates to #39858
2019-03-15 09:06:49 +00:00
David Turner fd70883e26 Missing import 2019-03-15 08:54:34 +00:00
David Turner cac5b4b453
Await all pending activity in testConnectAndDisconnect (#40037)
We call `ensureConnections()` to undo the effects of a disruption. However, it
is possible that one or more targets are currently CONNECTING and have been
since the disruption was active, and that the connection attempt was thwarted
by a concurrent disruption to the connection.  If so, we cannot simply add our
listener to the queue because it will be notified when this CONNECTING activity
completes even though it was disrupted. We must therefore wait for all the
current activity to finish and then go through and reconnect to any missing
nodes.

Closes #40030.
2019-03-15 08:04:08 +00:00
Ryan Ernst 7d5ff03b82
Add no-jdk distributions (#39882)
This commit adds a variant for every official distribution that omits
the bundled jdk. The "no-jdk" naming is conveyed through the package
classifier, alongside the platform. Package tests are also added for
each new distribution.
2019-03-15 00:55:15 -07:00
Ioannis Kakavas 5035c15238
Handle empty input in AddStringKeyStoreCommand (#39490)
* Handle zero byte from stdin in keystore command

This change ensures that we do not make assumptions about the length
of the input that we can read from the stdin. It still consumes only
one line, as the previous implementation did.

* address feedback

* Allow eventual IOException to be thrown
2019-03-15 09:36:51 +02:00
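The input handling described in the commit above can be sketched as a reader loop that makes no assumption about how many characters are available. This is a minimal illustration, not the `AddStringKeyStoreCommand` code: `LineInput.readFirstLine` is a hypothetical name. It returns an empty array for zero-length input, consumes at most one line, and lets any `IOException` propagate.

```java
import java.io.CharArrayWriter;
import java.io.IOException;
import java.io.Reader;

// Hypothetical helper for illustration; names are not from Elasticsearch.
final class LineInput {

    /**
     * Reads characters up to the first line terminator or end of stream.
     * Empty input yields an empty array rather than an error.
     */
    static char[] readFirstLine(Reader reader) throws IOException {
        CharArrayWriter out = new CharArrayWriter();
        int c;
        while ((c = reader.read()) != -1) {
            if (c == '\r' || c == '\n') {
                break; // stop at the end of the first line
            }
            out.write(c);
        }
        return out.toCharArray();
    }
}
```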
David Turner 9bc332a080
Create retention leases file during recovery (#39359)
Today we load the shard history retention leases from disk whenever opening the
engine, and treat a missing file as an empty set of leases. However in some
cases this is inappropriate: we might be restoring from a snapshot (if the
target index already exists then there may be leases on disk) or
force-allocating a stale primary, and in neither case does it make sense to
restore the retention leases from disk.

With this change we write an empty retention leases file during recovery,
except for the following cases:

- During peer recovery the on-disk leases may be accurate and could be needed
  if the recovery target is made into a primary.

- During recovery from an existing store, as long as we are not
  force-allocating a stale primary.

Relates #37165
2019-03-15 07:36:05 +00:00
Ioannis Kakavas e0651c2e1e [CI] Log response entity in correct level
This is an adjustment to the logging line that was added in
8e5ba9a1e4. This is necessary to
troubleshoot intermittent errors in SsmlAuthenticationIT in
CI that are not reproducible locally.
2019-03-15 07:57:01 +02:00