Commit Graph

3050 Commits

Author SHA1 Message Date
Nik Everett 6e0e6255a5
Remove some extra reproduce info (#71706)
This drops a few properties from the reproduction info printed when a
test fails, because they are implied by the build:
* `tests.security.manager`
* `tests.rest.suite`
* `tests.rest.blacklist`

The two `tests.rest` properties are set by the build *and* duplicate the
`--test` output!

Closes #71290
2021-04-20 08:34:47 -04:00
Henning Andersen 9d6ce2c8d6
Frozen autoscaling decider based on storage pct (#71756)
The frozen tier partially downloads shards only. This commit
introduces an autoscaling decider that scales the total storage
on the tier according to a configurable percentage relative to
the total data set size.
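The sizing rule above can be sketched as follows. This is a minimal illustration that assumes the decider simply multiplies the tier's total data set size by the configured percentage and rounds up. The class and method names are hypothetical and are not the decider's real code.

```java
// Hypothetical sketch of percentage-based frozen tier sizing; the names are
// illustrative and are not the actual autoscaling decider API.
final class FrozenStorageSketch {
    private FrozenStorageSketch() {}

    /**
     * @param totalDataSetBytes total size of all shards assigned to the frozen tier
     * @param percentage        configured percentage, e.g. 5.0 for 5%
     * @return storage in bytes the tier should provide
     */
    static long requiredStorageBytes(long totalDataSetBytes, double percentage) {
        if (totalDataSetBytes < 0) {
            throw new IllegalArgumentException("data set size must be non-negative");
        }
        if (percentage < 0.0 || percentage > 100.0) {
            throw new IllegalArgumentException("percentage must be within [0, 100]");
        }
        // Multiply before dividing to limit rounding error, then round up so the
        // tier never ends up slightly below the requested share.
        return (long) Math.ceil(totalDataSetBytes * percentage / 100.0);
    }
}
```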
2021-04-20 14:09:07 +02:00
Luca Cavanna d8057bfe71
Rename on_script_error options to fail or continue (#71841)
As we started thinking about applying on_script_error to runtime fields, to handle script errors at search time, we would like to use the same parameter that was recently introduced for indexed fields. We decided that `continue` or `fail` gives a better indication of the behaviour than the current `ignore` or `reject`, which are too specific to indexing documents.

This commit applies that rename.
2021-04-20 09:59:42 +02:00
Henning Andersen 4312bf31c9
Add force single data path option for integ tests (#71868)
Some functionality will no longer work with multiple data paths and in
order to run integration tests for that, we need the capability to
force a single data path for those tests.

Relates #71844
2021-04-20 08:18:28 +02:00
David Turner c8fb9aad40
Track index details in SnapshotInfo (#71754)
This commit adds some per-index statistics to the `SnapshotInfo` blob:

- number of shards
- total size in bytes
- maximum number of segments per shard

It also exposes these statistics in the get snapshot API.
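Here is one way the three per-index statistics could be derived from per-shard data. This is a sketch under the assumption that each shard reports its size and segment count. All class and field names are hypothetical.

```java
import java.util.List;

// Hypothetical sketch: aggregates per-shard figures into the per-index
// statistics listed above. Names do not match the real SnapshotInfo classes.
final class IndexSnapshotDetails {
    final int shardCount;
    final long totalSizeBytes;
    final int maxSegmentsPerShard;

    IndexSnapshotDetails(int shardCount, long totalSizeBytes, int maxSegmentsPerShard) {
        this.shardCount = shardCount;
        this.totalSizeBytes = totalSizeBytes;
        this.maxSegmentsPerShard = maxSegmentsPerShard;
    }

    /** Size and segment count reported for a single snapshotted shard. */
    static final class ShardStats {
        final long sizeBytes;
        final int segmentCount;

        ShardStats(long sizeBytes, int segmentCount) {
            this.sizeBytes = sizeBytes;
            this.segmentCount = segmentCount;
        }
    }

    static IndexSnapshotDetails fromShards(List<ShardStats> shards) {
        long total = 0;
        int maxSegments = 0;
        for (ShardStats shard : shards) {
            total += shard.sizeBytes;                               // total size in bytes
            maxSegments = Math.max(maxSegments, shard.segmentCount); // max segments per shard
        }
        return new IndexSnapshotDetails(shards.size(), total, maxSegments);
    }
}
```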
2021-04-19 14:57:32 +01:00
Przemyslaw Gomulka 3ef5e4c6e7
[Rest Compatible Api] include_type_name parameter (#70966)
This commit allows the include_type_name parameter to be used with the compatible REST API.
The support for include_type_name was previously removed in #48632

relates #51816
types removal meta issue #54160
2021-04-19 15:21:24 +02:00
Dan Hermann eb345b2a8f
Deprecate legacy index template API endpoints (#71309) 2021-04-16 08:07:28 -05:00
Igor Motov 02eef40a45
Tests: add support for close_to assertion (#71590)
Adds support for the close_to assertion to YAML tests. The assertion can be
used as follows:
```
  - close_to:   { get.fields._routing: { value: 5.1, error: 0.00001 } }
```
Closes #71303
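The check behind `close_to` can be sketched as an absolute-difference comparison. This assumes the bound is inclusive, as with Hamcrest's `closeTo` matcher. The class and method names are hypothetical and are not the YAML runner's code.

```java
// Hypothetical sketch of the comparison a close_to assertion performs:
// the actual value must lie within `error` of the expected value.
final class CloseToSketch {
    private CloseToSketch() {}

    static boolean closeTo(double actual, double expected, double error) {
        if (error < 0) {
            throw new IllegalArgumentException("error must be non-negative");
        }
        // Inclusive bound (assumption): |actual - expected| <= error passes.
        return Math.abs(actual - expected) <= error;
    }
}
```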
2021-04-15 17:11:37 -10:00
Julie Tibshirani a18a65565a
Fix SearchReplicaSelectionIT failures (#71507)
This PR makes sure MockSearchService collects ARS statistics. Before, if we
randomly chose to use MockSearchService then ARS information would be missing
and the test would fail.

Also makes the following fixes:
* Removes a test workaround for bug #71022, which is now fixed.
* Handles the case where nodes have the same rank, to prevent random failures.
2021-04-15 08:31:37 -07:00
Nik Everett 1d69985dc9
Speed up terms agg when not force merged (#71241)
This speeds up the `terms` aggregation when it can't take the fancy
`filters` path, there is more than one segment, and any of those
segments have only a single value for the field. These three things are
super common.

Here are the performance change numbers:
```
|        50th percentile latency | date-histo-string-terms-via-global-ords | 3414.02 | 2632.01 | -782.015 | ms |
|        90th percentile latency | date-histo-string-terms-via-global-ords | 3470.91 | 2756.88 | -714.031 | ms |
|       100th percentile latency | date-histo-string-terms-via-global-ords | 3620.89 | 2875.79 | -745.102 | ms |
|   50th percentile service time | date-histo-string-terms-via-global-ords | 3410.15 | 2628.87 | -781.275 | ms |
|   90th percentile service time | date-histo-string-terms-via-global-ords | 3467.36 | 2752.43 | -714.933 | ms |   20%!!!!
|  100th percentile service time | date-histo-string-terms-via-global-ords | 3617.71 | 2871.63 | -746.083 | ms |
```

This works by hooking global ordinals into `DocValues.unwrapSingleton`.
Without this you could unwrap singletons *if* the segment's ordinals
aligned exactly with the global ordinals. If they didn't, we'd return a
doc values iterator that you can't unwrap, even if the segment ordinals
were singletons.

That speeds up the terms aggregator because we have a fast path we can
take if we have singletons. Previously it only worked if we had a single
segment, or if the segment's ordinals lined up exactly with the global
ordinals, which is fairly common for low cardinality fields. So low
cardinality fields might not benefit from this quite as much as high
cardinality fields.

Closes #71086
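The idea of keeping the singleton view when mapping segment ordinals to global ordinals can be sketched as follows. The sketch uses simplified, hypothetical interfaces (`SingleOrds`, `MultiOrds`) in place of Lucene's doc values classes and `DocValues.unwrapSingleton`. It shows the technique, not the actual implementation.

```java
// Hypothetical stand-ins for single- and multi-valued ordinal doc values.
interface SingleOrds {
    long ord(int doc); // at most one ordinal per document, -1 if missing
}

interface MultiOrds {
    long[] ords(int doc);

    /** The single-valued view if every document has at most one value, else null. */
    default SingleOrds unwrapSingleton() {
        return null;
    }
}

final class GlobalOrdsSketch {
    private GlobalOrdsSketch() {}

    /** Wraps a single-valued source as multi-valued while remembering the singleton. */
    static MultiOrds asMulti(SingleOrds single) {
        return new MultiOrds() {
            @Override
            public long[] ords(int doc) {
                long ord = single.ord(doc);
                return ord < 0 ? new long[0] : new long[] {ord};
            }

            @Override
            public SingleOrds unwrapSingleton() {
                return single;
            }
        };
    }

    /** Maps segment ordinals to global ordinals without losing the singleton view. */
    static MultiOrds toGlobal(MultiOrds segment, long[] segmentToGlobal) {
        SingleOrds single = segment.unwrapSingleton();
        if (single != null) {
            // Map the singleton itself, so callers can still take the fast path.
            SingleOrds globalSingle = doc -> {
                long ord = single.ord(doc);
                return ord < 0 ? -1 : segmentToGlobal[(int) ord];
            };
            return asMulti(globalSingle);
        }
        // Genuinely multi-valued: map every ordinal; no singleton to expose.
        return new MultiOrds() {
            @Override
            public long[] ords(int doc) {
                long[] local = segment.ords(doc);
                long[] global = new long[local.length];
                for (int i = 0; i < local.length; i++) {
                    global[i] = segmentToGlobal[(int) local[i]];
                }
                return global;
            }
        };
    }
}
```

The design point is that the mapping wrapper exposes the single-valued view whenever its input had one. It no longer depends on the segment's ordinals lining up with the global ones.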
2021-04-15 08:27:28 -04:00
Nik Everett 2d6f8d1e0c
Add integration tests for filters (#69439)
Revamps the integration tests for the `filter` agg to be more clear and
builds integration tests for the `filters` agg. Both of these
integration tests are fairly basic but they do assert that the aggs
work.
2021-04-14 16:54:23 -04:00
Julie Tibshirani 318bf14126
Introduce `combined_fields` query (#71213)
This PR introduces a new query called `combined_fields` for searching multiple
text fields. It takes a term-centric view, first analyzing the query string
into individual terms, then searching for each term in any of the fields as though
they were one combined field. It is based on Lucene's `CombinedFieldQuery`,
which takes a principled approach to scoring based on the BM25F formula.

This query provides an alternative to the `cross_fields` `multi_match` mode. It
has simpler behavior and a more robust approach to scoring.

Addresses #41106.
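The BM25F idea of treating several fields as one can be sketched for a single term. Per-field term frequencies and lengths are summed with per-field weights, then plugged into the BM25 formula. This is a simplified illustration that assumes k1 = 1.2 and b = 0.75 and takes idf as an input. It is not Lucene's `CombinedFieldQuery` code.

```java
// Simplified BM25F-style score for one query term over several fields.
// Assumed parameters: k1 = 1.2, b = 0.75; idf is supplied by the caller.
final class Bm25fSketch {
    static final double K1 = 1.2;
    static final double B = 0.75;

    private Bm25fSketch() {}

    static double score(double[] termFreqs, double[] fieldLengths, double[] weights,
                        double avgCombinedLength, double idf) {
        if (termFreqs.length != fieldLengths.length || termFreqs.length != weights.length) {
            throw new IllegalArgumentException("per-field arrays must have the same length");
        }
        double tf = 0;
        double length = 0;
        for (int i = 0; i < termFreqs.length; i++) {
            tf += weights[i] * termFreqs[i];        // combined term frequency
            length += weights[i] * fieldLengths[i]; // combined document length
        }
        // Standard BM25 saturation and length normalisation on the combined field.
        double norm = K1 * (1 - B + B * length / avgCombinedLength);
        return idf * tf / (tf + norm);
    }
}
```

Two fields that each contain the term once score the same as one field containing it twice with the same total length. This is the "one combined field" behaviour the commit describes.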
2021-04-14 13:33:19 -07:00
Jake Landis aaf1bb6400
Fix 4 path segment Windows REST test blacklist/repo line (#71660)
Normally there are only 3 parts to a YAML REST test name,
`api/name/test section name`, where `api` is sourced
from the filesystem as a relative path from the root of
the tests, `name` is the filename of the test minus the `.yml`,
and `test section name` comes from inside the `.yml` file.

Some tests use multiple directories to represent the `api`,
for example `foo/bar/10_basic/My test Name` where `foo/bar` is the
relative path from the root of the tests. All works fine in both
*nix and Windows, except when you need to reference that `api`
(aka path from root) under Windows. Under Windows that relative path
uses backslashes to represent the `api`. This means that under Windows
you need to use `foo\bar/10_basic/My test Name` to reproduce or execute a test.
Additionally, due to how the regex matching is done for blacklisting tests,
the backslash will never match, so it is not possible to
blacklist a YAML REST test with 4 or more path segments on Windows.

This commit simply ensures that the API part is always represented with
forward slashes. This commit also removes a prior naive attempt to blacklist
on Windows.

closes #71475
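The normalisation can be sketched with `java.nio.file.Path`. Iterating over a relative path yields its name elements, and joining them with `/` gives the same `api` string on every platform. The class and method names are hypothetical, and the sketch is not the test framework's actual code.

```java
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.ArrayList;
import java.util.List;

// Hypothetical sketch: builds a YAML REST test name whose `api` part always
// uses forward slashes, regardless of the platform's path separator.
final class RestTestNameSketch {
    private RestTestNameSketch() {}

    static String apiName(Path testsRoot, Path testFile) {
        Path relativeDir = testsRoot.relativize(testFile).getParent();
        if (relativeDir == null) {
            return ""; // test file sits directly under the tests root
        }
        List<String> parts = new ArrayList<>();
        for (Path part : relativeDir) { // name elements, no separators
            parts.add(part.toString());
        }
        return String.join("/", parts);
    }

    static String testName(Path testsRoot, Path testFile, String sectionName) {
        String file = testFile.getFileName().toString();
        String name = file.endsWith(".yml") ? file.substring(0, file.length() - 4) : file;
        String api = apiName(testsRoot, testFile);
        return (api.isEmpty() ? "" : api + "/") + name + "/" + sectionName;
    }
}
```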
2021-04-14 14:21:59 -05:00
Jason Tedor 6823b8eb5e
Remove the ability for plugins to add roles (#71527)
This commit removes the ability for plugins to add roles. Roles are
fairly tightly coupled with the behavior of the system (as evidenced by
the fact that some roles from the default distribution leaked behavior
into the OSS distribution). We previously had this plugin extension
point so that we could support a difference in the set of roles between
the OSS and default distributions. We no longer need to maintain that
differentiation, and can therefore remove this plugin extension
point. This was technical debt that we were willing to accept to allow
the default distribution to have additional roles, but now we no longer
need to be encumbered with that technical debt.
2021-04-13 22:53:05 -04:00
Alan Woodward 78c79134b9
Forbid setting copy_to on scripted field mappers (#71621)
copy_to is currently implemented at document parse time, and does not
work with values generated from index-time scripts. We may want to add
this functionality in future, but for now this commit ensures that we throw
an exception if copy_to and script are both set on a field mapper.
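The validation can be sketched as a check over a field's mapping parameters. This is a minimal illustration that assumes the mapping is available as a map of parameter names. The class name and the error message are hypothetical.

```java
import java.util.Map;

// Hypothetical sketch: reject a field mapping that sets both `script` and
// `copy_to`. The exception message is illustrative, not Elasticsearch's.
final class ScriptMappingValidator {
    private ScriptMappingValidator() {}

    static void validate(String fieldName, Map<String, Object> mapping) {
        if (mapping.containsKey("script") && mapping.containsKey("copy_to")) {
            throw new IllegalArgumentException(
                "field [" + fieldName + "] cannot set [copy_to] because it defines a [script]");
        }
    }
}
```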
2021-04-13 20:37:17 +01:00
Lyudmila Fokina 3b0b7941ae
Warn users if security is implicitly disabled (#70114)
* Warn users if security is implicitly disabled

Elasticsearch has security features implicitly disabled by default for
Basic and Trial licenses, unless explicitly set in the configuration
file.
This may be good for onboarding, but it has also led to unintentionally
insecure clusters.
This change introduces clear warnings when security features are
implicitly disabled:
- a warning header in each REST response if security is implicitly
  disabled;
- a log message during cluster boot.
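The condition that triggers the warnings can be sketched as follows. It assumes that "implicitly disabled" means a Basic or Trial license with `xpack.security.enabled` absent from the node settings. The enum and method names are hypothetical, and the real check may consider more inputs.

```java
import java.util.Map;

// Hypothetical sketch of the "implicitly disabled" condition: Basic or Trial
// license and no explicit xpack.security.enabled setting. Names are illustrative.
final class ImplicitSecuritySketch {
    enum License { BASIC, TRIAL, GOLD, PLATINUM, ENTERPRISE }

    private ImplicitSecuritySketch() {}

    static boolean implicitlyDisabled(License license, Map<String, String> settings) {
        boolean explicitlySet = settings.containsKey("xpack.security.enabled");
        boolean defaultsToDisabled = license == License.BASIC || license == License.TRIAL;
        // An explicit setting, even "false", is a deliberate choice: no warning.
        return defaultsToDisabled && !explicitlySet;
    }
}
```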
2021-04-13 18:33:41 +02:00
Nik Everett 57e6c78a52
Fix profiled global agg (#71575)
This fixes the `global` aggregator when `profile` is enabled. It does so
by removing all of the special case handling for `global` aggs in
`AggregationPhase` and having the global aggregator itself perform the
scoped collection using the same trick that we use in filter-by-filter
mode of the `filters` aggregation.

Closes #71098
2021-04-13 08:36:51 -04:00
Nik Everett 3583ba0eb5
Tests for runtime field queries with fbf aggs (#71503)
This adds a few tests for runtime field queries applied to
"filter-by-filter" style aggregations. We expect to still be able to
use filter-by-filter aggregations to speed up collection when the top
level query is a runtime field. You'd think that filter-by-filter would
be slow when the top level query is slow, like it is with runtime
fields, but we only run filter-by-filter when we can translate each
aggregation bucket into a quick query. So long as the results of those
queries don't "overlap" we shouldn't end up running the slower top level
query more times than we would during regular collection.

This also adds some javadoc to that effect to the two places where we
choose between filter-by-filter and a "native" aggregation
implementation.
2021-04-12 15:25:10 -04:00
Alan Woodward 5e11709693
Add scripts to keyword field mapper (#71555)
This commit adds script and on_script_error parameters to
keyword field mappers, allowing you to define index-time scripts
for keyword fields.
2021-04-12 16:46:02 +01:00
Tanguy Leroux 8a0beceeec
Centralize Lucene files extensions in one place (#71416)
Elasticsearch enumerates Lucene file extensions for various
purposes: grouping files in segment stats under a description,
mapping files in memory through HybridDirectory, or adjusting
the caching strategy for Lucene files in searchable snapshots.

But when a new extension is handled somewhere (let's say,
added to the list of files to mmap) it is easy to forget to add it
in other places. This commit is an attempt to centralize in a
single place all known Lucene files extensions in Elasticsearch.
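A single registry like this can be sketched as an enum that carries each extension's metadata, plus a lookup by file name. The extensions listed are real Lucene file extensions. The enum name is hypothetical, the descriptions are approximate, and the `mmap` flags are illustrative assumptions rather than Elasticsearch's actual choices.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical central registry of Lucene file extensions. The mmap flags are
// illustrative only; they do not reflect HybridDirectory's real configuration.
enum LuceneFileExtensionSketch {
    SI("si", "Segment Info", false),
    CFS("cfs", "Compound Files", true),
    CFE("cfe", "Compound Files Entries", false),
    DVD("dvd", "DocValues", true),
    DVM("dvm", "DocValues Metadata", false),
    TIM("tim", "Term Dictionary", true),
    TIP("tip", "Term Index", true),
    NVD("nvd", "Norms", true),
    FDT("fdt", "Stored Fields Data", false);

    private static final Map<String, LuceneFileExtensionSketch> BY_EXTENSION = new HashMap<>();

    static {
        for (LuceneFileExtensionSketch e : values()) {
            BY_EXTENSION.put(e.extension, e);
        }
    }

    final String extension;
    final String description;
    final boolean mmap; // illustrative flag

    LuceneFileExtensionSketch(String extension, String description, boolean mmap) {
        this.extension = extension;
        this.description = description;
        this.mmap = mmap;
    }

    /** Looks up the extension of a file such as "_0.cfs"; null if unknown or absent. */
    static LuceneFileExtensionSketch fromFileName(String fileName) {
        int dot = fileName.lastIndexOf('.');
        return dot < 0 ? null : BY_EXTENSION.get(fileName.substring(dot + 1));
    }
}
```

Every consumer (stats grouping, mmap choices, caching) would then read from the same enum instead of keeping its own list.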
2021-04-12 15:58:32 +02:00
Alan Woodward 08aa65d061
Disallow multifields on mappers with index-time scripts (#71558)
Multifields are built at the same time as their parent fields, using
a positioned xcontent parser to read information. Fields with index-time
scripts are built entirely differently, and it does not make sense
to combine the two.

This commit adds a base test to MapperScriptTestCase that ensures
a field mapper defined with both multifields and a script parameter
throws a parse error.
2021-04-12 14:27:10 +01:00
Luca Cavanna 1469e18c98
Add support for script parameter to boolean field mapper (#71454)
Relates to #68984
2021-04-12 10:04:12 +02:00
Jason Tedor 60808e92c1
Move voting only role to server (#71473)
This commit moves the voting only role to server, as part of the effort
to remove the ability for plugins to add roles.
2021-04-09 10:13:53 -04:00
Nhat Nguyen 5c9969250d
Allow specifying dynamic templates in bulk request (#69948)
This change allows users to specify dynamic templates in a bulk request.

```
PUT myindex
{
  "mappings": {
    "dynamic_templates": [{
      "time_histograms": {
        "mapping": {
          "type": "histogram",
          "meta": {
            "unit": "s"
          }
        }
      }
    }]
  }
}
```

```
POST myindex/_bulk
{ "index": { "dynamic_templates": { "response_times": "time_histograms" } } }
{ "@timestamp": "2020-08-12", "response_times": { "values": [1, 10], "counts": [5, 1] }}
```

Closes #61939
2021-04-08 12:44:36 -04:00
Przemko Robakowski 44a2ae4893
Add GeoIP CLI integration test (#71381)
This change adds an additional test to GeoIpDownloaderIT which tests that artifacts produced by the GeoIP CLI tool can be consumed by the cluster the same way as those from our original service.
It does so by running the tool from a fixture which then simply serves the generated files (this is exactly the way users are supposed to use the tool as well).

Relates to #68920
2021-04-08 12:49:29 +02:00
Alan Woodward af3f0e5069
Add MapperScriptTestCase (#71322)
When we added scripts to long and double mapped fields, we added tests
for the general scripting infrastructure, and also specific tests for those two
field types. This commit extracts those type-specific tests out into a new base
test class that we can use when adding scripts to more field mappers.
2021-04-08 11:55:08 +02:00
Nhat Nguyen bd124399c4
Ensure search contexts are released after tests (#71427)
These assertions were introduced in #71354.
2021-04-07 14:08:24 -04:00
Yannick Welsch 801c50985c
Use default application credentials for GCS repositories (#71239)
Adds support for "Default Application Credentials" for GCS repositories, making it easier to set up a repository on GCP,
as all relevant information to connect to the repository is retrieved from the environment, not necessitating complicated
keystore setups.
2021-04-06 15:16:00 +02:00
Francisco Fernández Castaño e6894960f4
Include URLHttpClientIOException on URLBlobContainerRetriesTests testReadBlobWithReadTimeouts (#71318)
In some scenarios where the read timeout is too tight, it's possible
that the HTTP request times out before the response headers have
been received; in that case a URLHttpClientIOException is thrown.
This commit adds that exception type to the expected set of read timeout
exceptions.

Closes #70931
2021-04-06 14:58:57 +02:00
Christoph Büscher a07d876a93
Avoid duplicate values in MapperTestCase#testFetchMany (#71068)
The test currently generates a list of random values and checks whether
retrieval of these values via doc values is equivalent to fetching them with a
value fetcher from source. If the random value array contains a duplicate value,
we will only get one back via doc values, but fetching from source will return
both, which is a case we should probably avoid in this test.

Closes #71053
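One way to avoid duplicates is to keep drawing random values until the requested number of distinct ones exists, with a cap on attempts. This is a sketch of that approach, not necessarily the fix applied in the commit. The class and method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Random;
import java.util.Set;
import java.util.function.Supplier;

// Hypothetical helper: generate `count` distinct random values, so doc values
// (which keep one copy of each value) and source fetching agree.
final class DistinctValuesSketch {
    private DistinctValuesSketch() {}

    static <T> List<T> distinctValues(int count, Supplier<T> generator, int maxAttempts) {
        Set<T> seen = new LinkedHashSet<>(); // keeps generation order, drops repeats
        int attempts = 0;
        while (seen.size() < count) {
            if (attempts++ >= maxAttempts) {
                throw new IllegalStateException("could not generate " + count + " distinct values");
            }
            seen.add(generator.get());
        }
        return new ArrayList<>(seen);
    }
}
```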
2021-04-06 10:54:35 +02:00
Ryan Ernst 6cf4eb7273
Deprecate multiple path.data entries (#71207)
This commit adds a node level deprecation log message when multiple
data paths are specified.

relates #71205
2021-04-02 14:55:36 -07:00
Jason Tedor 32314493a2
Pass override settings when creating test cluster (#71203)
Today when creating an internal test cluster, we allow the test to
supply the node settings that are applied. The extension point to
provide these settings has a single integer parameter, indicating the
index (zero-based) of the node being constructed. This allows the test
to make some decisions about the settings to return, but it is too
simplistic. For example, imagine a test that wants to provide a setting,
but some values for that setting are not valid on non-data nodes. Since
the only information the test has about the node being constructed is
its index, it does not have sufficient information to determine if the
node being constructed is a non-data node or not, since this is done by
the test framework externally by overriding the final settings with
specific settings that dictate the roles of the node. This commit changes
the test framework so that the test has information about what settings
are going to be overridden by the test framework after the test provides
its test-specific settings. This allows the test to make informed
decisions about what values it can return to the test framework.
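The shape of the changed extension point can be sketched with plain maps. The callback receives the node index and the settings the framework will apply afterwards, such as the node's roles. The interface, setting names, and helper are hypothetical stand-ins and do not reproduce the framework's real signatures.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical extension point: the test sees the framework's override
// settings (e.g. roles) before choosing its own node settings.
interface NodeSettingsSource {
    Map<String, String> nodeSettings(int nodeOrdinal, Map<String, String> otherSettings);
}

final class TestClusterSketch {
    private TestClusterSketch() {}

    static Map<String, String> buildNodeSettings(NodeSettingsSource source, int ordinal,
                                                 Map<String, String> frameworkOverrides) {
        Map<String, String> settings = new HashMap<>(source.nodeSettings(ordinal, frameworkOverrides));
        settings.putAll(frameworkOverrides); // framework settings are applied last and win
        return settings;
    }
}
```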
2021-04-02 10:20:36 -04:00
Yash Jipkate 60f4d22722
Change default value of `action.destructive_requires_name` to True. (#66908)
This PR sets the default value of `action.destructive_requires_name`
to `true`. Fixes #61074. Additionally, we set this value explicitly in
test classes that rely on wildcard deletions to clear test state.
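The effect of the new default can be sketched as a guard on destructive requests. With the setting enabled, wildcard, `_all`, or empty index expressions are refused. This is a simplified illustration. The class name and message are hypothetical, and the real check may treat other expression forms differently.

```java
// Hypothetical guard modelling action.destructive_requires_name: when true,
// destructive requests must name concrete indices.
final class DestructiveRequiresNameSketch {
    private DestructiveRequiresNameSketch() {}

    static void check(boolean destructiveRequiresName, String... indexExpressions) {
        if (!destructiveRequiresName) {
            return;
        }
        if (indexExpressions.length == 0) {
            // No expressions usually means "all indices".
            throw new IllegalArgumentException("wildcard expressions or all indices are not allowed");
        }
        for (String expression : indexExpressions) {
            if (expression.equals("_all") || expression.contains("*")) {
                throw new IllegalArgumentException("wildcard expressions or all indices are not allowed");
            }
        }
    }
}
```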
2021-03-31 15:59:57 -04:00
Jason Tedor e119ac60d4
Move data tier roles to server (#71084)
This commit moves the data tier roles to server. It is no longer
necessary to separate these roles from server as we no longer build
distributions that would not contain these roles. Moving these roles
will simplify many things. This is deliberately the smallest possible
commit that moves these roles. Other aspects related to the data tiers
can move in separate, also small, commits.
2021-03-31 15:13:02 -04:00
Przemko Robakowski 61fe14565a
Add tool for preparing local GeoIp database service (#71018)
Air-gapped environments can't simply use the GeoIp database service provided by Infra, so they have to either use a proxy or recreate a similar service themselves.
This PR adds a tool to make this process easier. The basic workflow is:

* download databases from the MaxMind site to a single directory (either .mmdb files or gzipped tarballs with the .tgz suffix)
* run the tool with `$ES_PATH/bin/elasticsearch-geoip -s directory/to/use [-t target/directory]`
* serve static files from that directory (for example with `docker run -v directory/to/use:/usr/share/nginx/html:ro nginx`)
* use the server above as the endpoint for GeoIpDownloader (`geoip.downloader.endpoint` setting)
* to update the databases, simply put new files in the directory and run the tool again

This change also adds support for relative paths in the overview JSON, because the CLI tool doesn't know the address it will be served under.

Relates to #68920
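Supporting relative paths amounts to resolving each database URL against the address the overview JSON was fetched from. A sketch with `java.net.URI.resolve` follows. The class name, endpoint URL, and file names are hypothetical examples.

```java
import java.net.URI;

// Hypothetical sketch: resolve a (possibly relative) database URL from the
// overview JSON against the URL the overview itself was downloaded from.
final class OverviewUrlSketch {
    private OverviewUrlSketch() {}

    static URI resolveDatabaseUrl(String overviewUrl, String urlFromOverview) {
        // Relative references resolve against the overview's location;
        // absolute URLs are returned unchanged.
        return URI.create(overviewUrl).resolve(urlFromOverview);
    }
}
```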
2021-03-31 12:30:21 +02:00
Alan Woodward 1653f2fe91
Add script parameter to long and double field mappers (#69531)
This commit adds a script parameter to long and double fields that makes
it possible to calculate a value for these fields at index time. It uses the same
script context as the equivalent runtime fields, and allows for multiple index-time
scripted fields to cross-refer while still checking for indirection loops.
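Checking for indirection loops among cross-referring scripted fields can be sketched as a depth-first search over a field-reference graph. The sketch assumes the references are known up front as a map from each field to the fields its script reads. The names and message are hypothetical, and the real code may detect loops lazily while resolving values.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

// Hypothetical loop check: each scripted field maps to the fields its script reads.
final class ScriptLoopSketch {
    private ScriptLoopSketch() {}

    static void checkNoLoops(Map<String, List<String>> refs) {
        Set<String> done = new HashSet<>();
        for (String field : refs.keySet()) {
            visit(field, refs, new ArrayDeque<>(), new HashSet<>(), done);
        }
    }

    private static void visit(String field, Map<String, List<String>> refs,
                              Deque<String> path, Set<String> onPath, Set<String> done) {
        if (done.contains(field)) {
            return; // already proven loop-free
        }
        if (!onPath.add(field)) {
            // Field is already on the current path: we came back to it.
            throw new IllegalArgumentException("loop detected: " + String.join(" -> ", path) + " -> " + field);
        }
        path.addLast(field);
        for (String next : refs.getOrDefault(field, List.of())) {
            visit(next, refs, path, onPath, done);
        }
        path.removeLast();
        onPath.remove(field);
        done.add(field);
    }
}
```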
2021-03-31 11:14:11 +01:00
Nhat Nguyen 1edc7c6849
Mute testFetchMany
Tracked at #71053
2021-03-30 17:47:22 -04:00
Dan Hermann 2c6ba92d46
Improve data stream rollover and simplify cluster metadata validation for data streams (#70934) 2021-03-29 07:36:44 -05:00
Alan Woodward c475fd9e8a
Move runtime fields classes into common packages (#70965)
Runtime fields currently live in their own java package. This is really
a leftover from when they were in their own module; now that they are
in core they should instead live in the common packages for classes of
their kind.

This commit makes the following moves:
org.elasticsearch.runtimefields.mapper => org.elasticsearch.index.mapper
org.elasticsearch.runtimefields.fielddata => org.elasticsearch.index.fielddata
org.elasticsearch.runtimefields.query => org.elasticsearch.search.runtime

The XFieldScript classes are moved out of the `mapper` package into
org.elasticsearch.scripts, and the `PARSE_FROM_SOURCE` default scripts
are moved from these Script classes directly into the field type classes that
use them.
2021-03-29 12:02:01 +01:00
Przemko Robakowski b025f51ece
Add support for .tgz files in GeoIpDownloader (#70725)
We have to ship COPYRIGHT.txt and LICENSE.txt files alongside .mmdb files for legal compliance. Infra will pack these in a single .tgz (gzipped tar) archive provided by the GeoIP databases service.
This change adds support for that format to GeoIpDownloader and DatabaseRegistry.
2021-03-29 12:46:27 +02:00
Ignacio Vera a35563aaaf
Fix infinite loop when polygonizing a circle with centre on the pole (#70875)
This PR prevents the algorithm from running on circles that contain a pole.
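Whether a circle contains a pole can be tested on a spherical Earth. The great-circle distance from the centre to the nearer pole is (90 - |lat|) degrees of arc times the Earth's radius, and the circle contains the pole if its radius reaches at least that far. This sketch assumes a spherical model with a mean radius of about 6,371 km. The names are hypothetical, and the actual check in the PR may be formulated differently.

```java
// Hypothetical pole-containment test for a circle given by its centre
// latitude (degrees) and radius (metres), assuming a spherical Earth.
final class CirclePoleSketch {
    static final double EARTH_MEAN_RADIUS_METERS = 6_371_008.7714; // assumed mean radius

    private CirclePoleSketch() {}

    static boolean containsPole(double centerLatDegrees, double radiusMeters) {
        // Angular distance from the centre to the nearer pole, in radians.
        double angularDistance = Math.toRadians(90.0 - Math.abs(centerLatDegrees));
        return angularDistance * EARTH_MEAN_RADIUS_METERS <= radiusMeters;
    }
}
```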
2021-03-29 07:36:29 +02:00
Mark Vieira 6339691fe3
Consolidate REST API specifications and publish under Apache 2.0 license (#70036) 2021-03-26 16:20:14 -07:00
Francisco Fernández Castaño 3f8a9256ea
Add searchable snapshots integration tests for URL repositories (#70709)
Relates #69521
2021-03-26 15:23:44 +01:00
William Brafford 35af0bb47b
Don't use filesystem concat for resource paths in schema validation tests (#70596)
We use the `getDataPath` method to convert from a resource
location to a filesystem path in anticipation of eventually moving
the json files to a top-level directory. However, we were constructing
the resource locations using a filesystem concatenation, which,
on Windows, put backslashes in the path instead of slashes.
We will use a simple string concatenation to fix the Windows tests.
2021-03-26 10:13:18 -04:00
Dan Hermann 5077017034
Fix failing FullClusterRestartIT.testDataStreams test (#70845) 2021-03-25 06:53:48 -05:00
Nik Everett 91c700bd99
Super randomized tests for fetch fields API (#70278)
We've had a few bugs in the fields API where it doesn't behave like we'd
expect. Typically this happens because it isn't obvious what we expect. So
we'll try and use randomized testing to ferret out what we want. This adds
a test for most field types that asserts that `fields` works similarly
to `docvalues_fields`. We expect this to be true for most fields.

It does so by forcing all subclasses of `MapperTestCase` to define a
method that makes random values. It declares a few other hooks that
subclasses can override to further randomize the test.

We skip the test for a few field types that don't have doc values:
* `annotated_text`
* `completion`
* `search_as_you_type`
* `text`
We should come up with some way to test these without doc values, even
if it isn't as nice. But that is a problem for another time, I think.

We skip the test for a few more types just because I wanted to cut this
PR in half so we could get to reviewing it earlier. We'll get to those
in a follow up change.

I've filed a few bugs for things that are inconsistent with
`docvalues_fields`. Typically that means that we have to limit the
random values that we generate to those that *do* round trip properly.
2021-03-24 14:16:27 -04:00
Ignacio Vera afde502c14
Make sure forbidPrivateIndexSettings is kept during an internal cluster full restart (#70823) 2021-03-24 18:31:51 +01:00
Dan Hermann 7c3ebe220f
Remove obsolete BWC checks for data streams (#70777) 2021-03-24 07:22:40 -05:00
Tanguy Leroux efa6aea168
Prevent snapshot backed indices to be followed using CCR (#70580)
Today nothing prevents CCR's auto-follow patterns from picking
up snapshot backed indices on a remote cluster. This can
lead to various errors on the follower cluster that are not
obvious to troubleshoot for a user (e.g. multiple engine
factories provided).

This commit adds verifications to CCR to make it fail faster
when a user tries to follow an index that is backed by a
snapshot, providing a more obvious error message.
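The verification can be sketched as a check on the leader index's settings before following starts. This assumes a snapshot-backed index is recognisable by `index.store.type` being `snapshot`. That assumption, the class name, and the message are all illustrative.

```java
import java.util.Map;

// Hypothetical sketch: refuse to follow a leader index that is backed by a
// snapshot. Recognising it via index.store.type == "snapshot" is an assumption.
final class FollowValidationSketch {
    private FollowValidationSketch() {}

    static void validateLeaderIndex(String indexName, Map<String, String> leaderSettings) {
        if ("snapshot".equals(leaderSettings.get("index.store.type"))) {
            throw new IllegalArgumentException(
                "index [" + indexName + "] is backed by a snapshot and cannot be followed");
        }
    }
}
```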
2021-03-24 10:58:31 +01:00
Przemyslaw Gomulka e942873bd5
[REST Compatible API] Typed endpoints for Index and Get APIs (#69131)
The types removal effort removed the type from the Index API in #47671 and from the Get API in #46587.
This commit allows 'typed' endpoints to be used for both the Index and Get APIs.

relates compatible types-removal meta issue #54160
2021-03-23 10:59:21 +01:00