Commit Graph

162 Commits

Author SHA1 Message Date
Jake Landis 26dfe02a0b
Update task names for rest compatiblity (#75267)
This commit updates two task names:
```
yamlRestCompatTest -> yamlRestTestV7CompatTest
transformV7RestTests -> yamlRestTestV7CompatTransform 
```

`7` is the N-1 version and calculated, such that when `8` is 
N-1 version the task names will be  `yamlRestTestV8CompatTest` and 
`yamlRestTestV8CompatTransform`

The motivation for `yamlRestCompatTest -> yamlRestTestV7CompatTest`  is that 
many projects have configured `yamlRestCompatTest` 
but that configuration is specific to the N-1 version. For example, 
if we blacklist tests when running compatibility with v7, we don't also
want to blacklist those tests when running compatibility with v8.

By introducing a version-specific identifier in the name, the task will not
even exist when bumping the version creating the need to (correctly) remove
the version-specific condition.

The motivation for `transformV7RestTests -> yamlRestTestV7CompatTransform` 
is to provide more consistent naming. 

The idea behind the naming is the main task people
are likely familiar with is :

`yamlRestTest` so we will use that as a base.
`yamlRestTestV7CompatTest` to run the version-specific compat tests
`yamlRestTestV7CompatTransform` to run the version-specific transformations for the compat tests

CI should be un-effected since since we introduced a lifecycle task 
name `checkRestCompat` which is what CI should be configured to use.
2021-09-03 11:26:11 -05:00
Rene Groeschke 35ec6f348c
Introduce simple public yaml-rest-test plugin (#76554)
This introduces a basic public yaml rest test plugin that is supposed to be used by external 
elasticsearch plugin authors. This is driven by #76215

- Rename yaml-rest-test to intern-yaml-rest-test
- Use public yaml plugin in example plugins

Co-authored-by: Mark Vieira <portugee@gmail.com>
2021-08-31 08:45:52 +02:00
Przemyslaw Gomulka 127015e54c
[Rest Api Compatibility] Clean up blocklist (#76179)
v7compatibilityNotSupportedTests was introduced to make it easier to
track tests that have been identified as not needing compatible changes
and those that still need to be checked.
We have checked all tests now and the separate list is no longer needed.

relates #51816
relates #73912
2021-08-19 10:09:46 +02:00
Przemyslaw Gomulka 782e2c67f8
[Rest Api Compatibility] CommonTermsQuery and cutoff_frequency param (#75896)
Previously removed in #42654. The query and the parameter won't work under rest api compatibility and an exception with a message is returned advising that just use of match/multi_match is enough

relates #51816
2021-08-05 15:14:28 +02:00
Przemyslaw Gomulka 71e05838a6
[Rest Api Compatibility] Enable tests after types and cat api fixed (#75179)
Some tests are fixed after typed api is available with compatible api.
Also cat api returning text fixed some tests
relates #51816
2021-07-14 08:37:38 +02:00
Alan Woodward 7d665616da
Deprecate setting version on analyzers (#74073)
The version field on all lucene Analyzers is unused, and is being removed
in lucene 9. This commit deprecates setting a version on an analyzer in
index settings and removes the related calls to Analyzer.setVersion()

Relates to #74057
2021-06-16 09:40:41 +01:00
Ryan Ernst 63012c8a40
Move ParseField to o.e.c.xcontent (#73923)
ParseField is part of the x-content lib, yet it doesn't exist under the
same root package as the rest of the lib. This commit moves the class to
the appropriate package.

relates #73784
2021-06-08 13:32:14 -07:00
Jake Landis 279fde375e
Apply REST API compatibility testing for the :modules (#71137) 2021-04-02 11:20:54 -05:00
Mark Vieira 6339691fe3
Consolidate REST API specifications and publish under Apache 2.0 license (#70036) 2021-03-26 16:20:14 -07:00
Christoph Büscher c67b2384fe
Make keyword_marker filter updateable (#65457)
Currently we don't allow `keyword_marker` filter file resources to be reloaded via the
`_reload_search_analyzers` API. It would make sense to allow reloading this when the
file content has changed to allow e.g. for updating stemmer exeption rules at search time
without having to close and re-open the index in question. This change adds the updateable
flag to this token filter in the same way it is used for synonym filters. Analyzers containing
updateable keyword_marker filters would not be allowed to be used at index time but at
search time only, similar to what we allow for synonym filters.

Closes #65355
2021-02-25 16:44:11 +01:00
Julie Tibshirani 936abca50a
Rename MatchQuery -> MatchQueryParser. (#68716)
This commit renames `MatchQuery` to make it clear it's not a query. Its purpose
is actually to produce Lucene queries through its `parse` method.

It also renames `MultiMatchQuery` -> `MultiMatchQueryParser`.
2021-02-09 08:56:00 -08:00
Rory Hunter 2d44cce31e
Replace NOT operator with explicit `false` check - part 9 (#68645)
Part 9.

We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.

We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
2021-02-08 15:28:57 +00:00
Mark Vieira a92a647b9f Update sources with new SSPL+Elastic-2.0 license headers
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:

 - Updating LICENSE and NOTICE files throughout the code base, as well
   as those packaged in our published artifacts
 - Update IDE integration to now use the new license header on newly
   created source files
 - Remove references to the "OSS" distribution from our documentation
 - Update build time verification checks to no longer allow Apache 2.0
   license header in Elasticsearch source code
 - Replace all existing Apache 2.0 license headers for non-xpack code
   with updated header (vendored code with Apache 2.0 headers obviously
   remains the same).
 - Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
2021-02-02 16:10:53 -08:00
Rory Hunter ad1f876daa
Replace NOT operator with explicit `false` check (#67817)
We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.

We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
2021-01-26 14:47:09 +00:00
Rory Hunter 1a05a5ac24
Introduce deprecation categories (#67443)
Closes #64824. Introduce the concept of categories to deprecation
logging. Every location where we log a deprecation message must now
include a deprecation category.
2021-01-18 16:16:54 +00:00
Julie Tibshirani 5852fbedf5
Rename QueryShardContext -> SearchExecutionContext. (#67490)
We decided to rename `QueryShardContext` to clarify that it supports all parts
of search request execution. Before there was confusion over whether it should
only be used for building queries, or maybe only used in the query phase. This
PR also updates the javadocs.

Closes #64740.
2021-01-14 09:11:59 -08:00
Jim Ferenczi c756ce1acf
Sort field tiebreaker for PIT (point in time) readers (#66093)
This commit introduces a new sort field called `_shard_doc` that
can be used in conjunction with a PIT to consistently tiebreak
identical sort values. The sort value is a numeric long that is
composed of the ordinal of the shard (assigned by the coordinating node)
and the internal Lucene document ID. These two values are consistent within
a PIT so this sort criteria can be used as the tiebreaker of any search
requests.
Since this sort criteria is stable we'd like to add it automatically to any
sorted search requests that use a PIT but we also need to expose it explicitly
in order to be able to:
* Reverse the order of the tiebreaking, useful to search "before" `search_after`.
* Force the primary sort to use it in order to benefit from the `search_after` optimization when sorting by index order (to be released in Lucene 8.8.

I plan to add the documentation and the automatic configuration for PIT in a follow up since this change is already big.

Relates #56828
2020-12-18 12:13:12 +01:00
Alan Woodward 1a8ce8716d
Restore use of default search and search_quote analyzers (#65491)
In the refactoring of TextFieldMapper, we lost the ability to define
a default search or search_quote analyzer in index settings. This
commit restores that ability, and adds some more comprehensive
testing.

Fixes #65434
2020-11-26 16:57:45 +00:00
Nik Everett a08b52f3bd
Add `runtime_mappings` to search request (#64374)
This adds a way to specify the `runtime_mappings` on a search request
which are always "runtime" fields. It looks like:
```
curl -XDELETE -uelastic:password -HContent-Type:application/json localhost:9200/test
curl -XPOST -uelastic:password -HContent-Type:application/json 'localhost:9200/test/_bulk?pretty&refresh' -d'
{"index": {}}
{"animal": "cat", "sound": "meow"}
{"index": {}}
{"animal": "dog", "sound": "woof"}
{"index": {}}
{"animal": "snake", "sound": "hisssssssssssssssss"}
'

curl -XPOST -uelastic:password -HContent-Type:application/json localhost:9200/test/_search?pretty -d'
{
  "runtime_mappings": {
    "animal.upper": {
      "type": "keyword",
      "script": "for (String s : doc[\"animal.keyword\"]) {emit(s.toUpperCase())}"
    }
  },
  "query": {
    "match": {
      "animal.upper": "DOG"
    }
  }
}'
```

NOTE:
If we have to send a search request with runtime mappings to a node that
doesn't support runtime mappings at all then we'll fail the search
request entirely. The alternative would be to not send those runtime
mappings and let the node fail the search request with an "unknown field"
error. I believe this is would be hard to surprising because you defined
the field in the search request.

NOTE:
It isn't obvious but you can also use `runtime_mappings` to override fields
inside objects by naming the runtime fields with `.` in them. Like this:
```
curl -XDELETE -uelastic:password -HContent-Type:application/json localhost:9200/test
curl -uelastic:password -XPOST -HContent-Type:application/json localhost:9200/test/_bulk?refresh -d'
{"index":{}}
{"name": {"first": "Andrew", "last": "Wiggin"}}
{"index":{}}
{"name": {"first": "Julian", "last": "Delphiki", "suffix": "II"}}
'

curl -uelastic:password -XPOST -HContent-Type:application/json localhost:9200/test/_search?pretty -d'{
  "runtime_mappings": {
    "name.first": {
      "type": "keyword",
      "script": "if (\"Wiggin\".equals(doc[\"name.last.keyword\"].value)) {emit(\"Ender\");} else if (\"Delphiki\".equals(doc[\"name.last.keyword\"].value)) {emit(\"Bean\");}"
    }
  },
  "query": {
    "match": {
      "name.first": "Bean"
    }
  }
}'
```

Relates to #59332
2020-11-10 12:38:59 -05:00
Jake Landis 7dd57c9415
Introduce javaRestTest source set/task and convert modules (#59939)
Introduce a javaRestTest source set and task to compliment the yamlRestTest.
javaRestTest differs such that the code is sourced from Java and may have
different dependencies and setup requirements for the test clusters. This also
allows the tests to run in parallel in different cluster instances to prevent any
cross test contamination between the two types of tests.

Included in this PR is all :modules no longer use the integTest task. The tests
are now driven by test, yamlRestTest, javaRestTest, and internalClusterTest.
Since only :modules (and :rest-api-spec) have been converted to yamlRestTest
we can now disable the integTest task if either yamlRestTest or javaRestTest have
been applied. Once all projects are converted, we can delete the integTest task.

related: #56841
related: #59444
2020-07-21 17:17:17 -05:00
malpani 08de504b44
Support ignore_keywords flag for word delimiter graph token filter (#59563)
This commit allows customizing the word delimiter token filters to skip processing 
tokens tagged as keyword through the `ignore_keywords` flag Lucene's 
WordDelimiterGraphFilter already exposes.

Fix for #59491
2020-07-21 16:11:11 +01:00
Jake Landis ddd882b835
Convert modules to use yamlRestTest (#59089)
This commit moves the modules REST tests to the
newly introduced yamlRestTest source set. A few
tests have also been re-named to include the correct
IT suffix. Without changing the names, the testing
conventions task would fail since now that the YAML
tests are no longer present pacify the convention.
These tests have moved to the internalClusterTest
source set.

related: #56841
2020-07-13 11:32:42 -05:00
Jake Landis 333a5d8cdf
Create plugin for yamlTest task (#56841)
This commit creates a new Gradle plugin to provide a separate task name
and source set for running YAML based REST tests. The only project
converted to use the new plugin in this PR is distribution/archives/integ-test-zip.
For which the testing has been moved to :rest-api-spec since it makes the most
sense and it avoids a small but awkward change to the distribution plugin.

The remaining cases in modules, plugins, and x-pack will be handled in followups.

This plugin is distinctly different from the plugin introduced in #55896 since
the YAML REST tests are intended to be black box tests over HTTP. As such they
should not (by default) have access to the classpath for that which they are testing.

The YAML based REST tests will be moved to separate source sets (yamlRestTest).
The which source is the target for the test resources is dependent on if this
new plugin is applied. If it is not applied, it will default to the test source
set.

Further, this introduces a breaking change for plugin developers that
use the YAML testing framework. They will now need to either use the new source set
and matching task, or configure the rest resources to use the old "test" source set that
matches the old integTest task. (The former should be preferred).

As part of this change (which is also breaking for plugin developers) the
rest resources plugin has been removed from the build plugin and now requires
either explicit application or application via the new YAML REST test plugin.

Plugin developers should be able to fix the breaking changes to the YAML tests
by adding apply plugin: 'elasticsearch.yaml-rest-test' and moving the YAML tests
under a yamlRestTest folder (instead of test)
2020-07-06 12:13:01 -05:00
Przemyslaw Gomulka 9bef31ccd3
Do not create two loggers for DeprecationLogger (#58435)
DeprecationLogger's constructor should not create two loggers. It was
taking parent logger instance, changing its name with a .deprecation
prefix and creating a new logger.
Most of the time parent logger was not needed. It was causing Log4j to
unnecessarily cache the unused parent logger instance.
2020-06-29 13:38:21 +02:00
Przemyslaw Gomulka 4d6dc51c72
Header warning logging refactoring (#55941)
Splitting DeprecationLogger into two. HeaderWarningLogger - responsible for adding a response warning headers and ThrottlingLogger - responsible for limiting the duplicated log entries for the same key (previously deprecateAndMaybeLog).
Introducing A ThrottlingAndHeaderWarningLogger which is a base for other common logging usages where both response warning header and logging throttling was needed.

relates #55699
relates #52369
2020-06-01 15:44:01 +02:00
Tomasz Elendt 66ded59929
Support multiple tokens on LHS in stemmer_override rules (#56113) (#56484)
This commit adds support for rules with multiple tokens on LHS, also
known as "contraction rules", into stemmer override token
filter. Contraction rules are handy into translating multiple
inflected words into the same root form. One side effect of this change is
that it brings stemmer override rules format closer to synonym rules
format so that it makes it easier to translate one into another.

This change also makes stemmer override rules parser more strict so
that it should catch more errors which were previously accepted.

Closes #56113
2020-05-29 22:28:41 +02:00
Andrei Balici da31b4b83d
Add `max_token_length` setting to the CharGroupTokenizer (#56860)
Adds `max_token_length` option to the CharGroupTokenizer.
Updates documentation as well to reflect the changes.

Closes #56676
2020-05-20 14:15:57 +02:00
Amit Khandelwal 00fef6dfd3
Analysis enhancement - add preserve_original setting in ngram-token-filter (#55432) 2020-05-04 10:06:37 +01:00
Amit Khandelwal 9e41feda86
Expose `preserve_original` in `edge_ngram` token filter (#55766)
The Lucene `preserve_original` setting is currently not supported in the `edge_ngram`
token filter. This change adds it with a default value of `false`.

Closes #55767
2020-04-28 10:22:59 +02:00
Rory Hunter 8638d08ebf
Always use deprecateAndMaybeLog for deprecation warnings (#55115)
Closes #53137. Replace calls to deprecate(String,Object...) with deprecateAndMaybeLog(...), with an appropriate key, so that all messages
can potentially be deduplicated.
2020-04-16 16:19:45 +01:00
David Turner 6e98af385a
Add RepositoriesService to createComponents() args (#54814)
Today we pass the `RepositoriesService` to the searchable snapshots plugin
during the initialization of the `RepositoryModule`, forcing the plugin to be a
`RepositoryPlugin` even though it does not implement any repositories.

After discussion we decided it best for now to pass this in via
`Plugin#createComponents` instead, pending some future work in which plugins
can depend on services more dynamically.
2020-04-16 15:40:28 +01:00
Jason Tedor 95a7eed9aa
Rename MetaData to Metadata in all of the places (#54519)
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
2020-03-31 15:52:01 -04:00
Jake Landis afc2383b72
Optimize which Rest resources are used by the Rest tests. (#53299)
This should help with Gradle's incremental compile such that projects
only depend upon the resources they use.

related #52114
2020-03-18 09:09:29 -05:00
Jay Modi 0d1e67dbbb
Single instance of the IndexNameExpressionResolver (#52596)
This commit modifies the codebase so that our production code uses a
single instance of the IndexNameExpressionResolver class. This change
is being made in preparation for allowing name expression resolution
to be augmented by a plugin.

In order to remove some instances of IndexNameExpressionResolver, the
single instance is added as a parameter of Plugin#createComponents and
PersistentTaskPlugin#getPersistentTasksExecutor.
2020-02-20 15:04:45 -07:00
Adrien Grand 28e2f16734
Prepare backport of #51260. (#51876)
Backport: #51875
2020-02-05 11:02:46 +01:00
Adrien Grand d5bc6d6de0
Move analysis/mappings stats to cluster-stats. (#51260)
Closes #51138
2020-02-04 16:56:49 +01:00
Marios Trivyzas 24e1858a70
Fix caching for PreConfiguredTokenFilter (#50912)
The PreConfiguredTokenFilter#singletonWithVersion uses the version
internaly for the token filter factories but it registers only one
instance in the cahce and not one instance per version. This can lead
to exceptions like the one described in #50734 since the singleton is
created and cached using the version created of the first index
that is processed.

Remove the singletonWithVersion() methods and use the
elasticsearchVersion() methods instead.

Fixes: #50734
2020-01-16 12:04:14 +01:00
Christoph Büscher 9a4357ae04
Deprecate and remove camel-case nGram and edgeNGram tokenizers (#50862)
We already deprecated and removed the camel-case versions of the nGram and edgeNGram 
filters a while ago and we should do the same with the nGram and edgeNGram tokenizers.
This PR deprecates the use of these names in favour of ngram and edge_ngram in 7
and disallows usage in new indices starting with 8.

Closes #50561
2020-01-14 17:18:47 +01:00
Alan Woodward 736ed474e2
Check for deprecations when analyzers are built (#50908)
Generally speaking, deprecated analysis components in elasticsearch will issue deprecation
warnings when they are first used. However, this means that no warnings are emitted when
indexes are created with deprecated components, and users have to actually index a document
to see warnings. This makes it much harder to see these warnings and act on them at
appropriate times.

This is worse in the case where components throw exceptions on upgrade. In this case, users
will not be aware of a problem until a document is indexed, instead of at index creation time.

This commit adds a new check that pushes an empty string through all user-defined analyzers
and normalizers when an IndexAnalyzers object is built for each index; deprecation warnings
and exceptions are now emitted when indexes are created or opened.

Fixes #42349
2020-01-14 13:12:25 +00:00
Alan Woodward 2ba5da2eca
Remove type parameter from CIR.mapping(type, object...) (#50739)
This commit removes the type parameter from `CreateIndexRequest.mapping(type, object...)`,
and the associated delegating method on `CreateIndexRequestBuilder`. To make migration
simpler, the method on `CreateIndexRequest` is renamed to `simpleMapping`, and
on `CreateIndexRequestBuilder` to `setMapping`; this should help the compiler catch all
necessary changes on upgrades.

Relates to #41059
2020-01-09 16:02:28 +00:00
Christoph Büscher 4b366a4cbb
Make Multiplexer inherit filter chains analysis mode (#50662)
Currently, if an updateable synonym filter is included in a multiplexer filter, it is not reloaded via the 
_reload_search_analyzers because the multiplexer itself doesn't pass on the analysis mode of the 
filters it contains, so its not recognized as "updateable" in itself. Instead we can check and merge
the AnalysisMode settings of all filters in the multiplexer and use the resulting mode (e.g. search-time
only) for the multiplexer itself, thus making any synonym filters contained in it reloadable. 
This, of course, will also make the analyzers using the multiplexer be usable at search-time only.

Closes #50554
2020-01-08 18:25:00 +01:00
Alan Woodward a59b065091
Remove type parameter from `CreateIndexRequest.mapping(type, XContentBuilder)` (#50586)
This continues the removal of type parameters from CreateIndexRequest.mapping
methods started in #50419. Here the removed methods are almost entirely in test
code, with the exception of a change to TransformIndex in the transform plugin.

Relates to #41059
2020-01-08 09:18:31 +00:00
Christoph Büscher 68f22faef9
Delete removed token filter names from SynonymsAnalysisTests (#50438)
The `testPreconfiguredTokenFilters` test refers to the `nGram` and `edgeNGram`
token filter which are no longer part of the preconfigured token filters, so
they can be removed here as well.
2020-01-02 16:53:56 +01:00
Christoph Büscher c6f7166145
Throw Error on deprecated nGram and edgeNGram custom filters (#50376)
The camel-case `nGram` and `edgeNGram` filter names were deprecated in 6. We
currently throw errors on new indices when they are used. However these errors
are currently only thrown for pre-configured filters, adding them as custom
filters doesn't trigger the warning and error. This change adds the appropriate
exceptions for `nGram` and `edgeNGram` respectively.

Closes #50360
2019-12-20 18:01:05 +01:00
Stuart Tettemer cd721b6386
Scripting: ScriptFactory not required by compile (#50344)
Avoid backwards incompatible changes for 8.x and 7.6 by removing type
restriction on compile and Factory.  Factories may optionally implement
ScriptFactory.  If so, then they can indicate determinism and thus
cacheability.

Relates: #49466
2019-12-19 10:14:28 -07:00
Stuart Tettemer 356d1a274e
Scripting: Groundwork for caching script results (#49895)
In order to cache script results in the query shard cache, we need to
check if scripts are deterministic.  This change adds a default method
to the script factories, `isResultDeterministic() -> false` which is
used by the `QueryShardContext`.

Script results were never cached and that does not change here.  Future
changes will implement this method based on whether the results of the
scripts are deterministic or not and therefore cacheable.

Refs: #49466
2019-12-06 13:09:44 -07:00
Christoph Büscher 249f5a28a0
Remove outdated Todo in CommonAnalysisPlugin (#49450) 2019-11-22 11:01:47 +01:00
Christoph Büscher ed86750fa4
Allow custom characters in token_chars of ngram tokenizers (#49250)
Currently the `token_chars` setting in both `edgeNGram` and `ngram` tokenizers
only allows for a list of predefined character classes, which might not fit
every use case. For example, including underscore "_" in a token would currently
require the `punctuation` class which comes with a lot of other characters.
This change adds an additional "custom" option to the `token_chars` setting,
which requires an additional `custom_token_chars` setting to be present and
which will be interpreted as a set of characters to inlcude into a token.

Closes #25894
2019-11-20 10:36:39 +01:00
gpaimla d1ea9910c3 Implement Lucene EstonianAnalyzer, Stemmer (#49149)
This PR adds a new analyzer and stemmer for the Estonian language.

Closes #48895
2019-11-18 17:19:54 +01:00
Rory Hunter 3a3e5f6176
Apply 2-space indent to all gradle scripts (#48849)
Closes #48724. Update `.editorconfig` to make the Java settings the default
for all files, and then apply a 2-space indent to all `*.gradle` files.
Then reformat all the files.
2019-11-13 10:14:04 +00:00