elasticsearch

Commit Graph

Author	SHA1	Message	Date
Jake Landis	26dfe02a0b	Update task names for rest compatibility (#75267 ) This commit updates two task names: ``` yamlRestCompatTest -> yamlRestTestV7CompatTest transformV7RestTests -> yamlRestTestV7CompatTransform ``` `7` is the N-1 version and is calculated, such that when `8` is the N-1 version the task names will be `yamlRestTestV8CompatTest` and `yamlRestTestV8CompatTransform`. The motivation for `yamlRestCompatTest -> yamlRestTestV7CompatTest` is that many projects have configured `yamlRestCompatTest` but that configuration is specific to the N-1 version. For example, if we blacklist tests when running compatibility with v7, we don't also want to blacklist those tests when running compatibility with v8. By introducing a version-specific identifier in the name, the task will not even exist when bumping the version, creating the need to (correctly) remove the version-specific condition. The motivation for `transformV7RestTests -> yamlRestTestV7CompatTransform` is to provide more consistent naming. The idea behind the naming is that the main task people are likely familiar with is `yamlRestTest`, so we will use that as a base: `yamlRestTestV7CompatTest` to run the version-specific compat tests; `yamlRestTestV7CompatTransform` to run the version-specific transformations for the compat tests. CI should be unaffected since we introduced a lifecycle task name `checkRestCompat`, which is what CI should be configured to use.	2021-09-03 11:26:11 -05:00
Rene Groeschke	35ec6f348c	Introduce simple public yaml-rest-test plugin (#76554 ) This introduces a basic public yaml rest test plugin that is supposed to be used by external elasticsearch plugin authors. This is driven by #76215 - Rename yaml-rest-test to intern-yaml-rest-test - Use public yaml plugin in example plugins Co-authored-by: Mark Vieira <portugee@gmail.com>	2021-08-31 08:45:52 +02:00
Przemyslaw Gomulka	127015e54c	[Rest Api Compatibility] Clean up blocklist (#76179 ) v7compatibilityNotSupportedTests was introduced to make it easier to track tests that have been identified as not needing compatible changes and those that still need to be checked. We have checked all tests now and the separate list is no longer needed. relates #51816 relates #73912	2021-08-19 10:09:46 +02:00
Przemyslaw Gomulka	782e2c67f8	[Rest Api Compatibility] CommonTermsQuery and cutoff_frequency param (#75896 ) Previously removed in #42654. The query and the parameter won't work under rest api compatibility; an exception is returned with a message advising that using match/multi_match is enough. relates #51816	2021-08-05 15:14:28 +02:00
Przemyslaw Gomulka	71e05838a6	[Rest Api Compatibility] Enable tests after types and cat api fixed (#75179 ) Some tests are fixed after typed api is available with compatible api. Also cat api returning text fixed some tests relates #51816	2021-07-14 08:37:38 +02:00
Alan Woodward	7d665616da	Deprecate setting version on analyzers (#74073 ) The version field on all lucene Analyzers is unused, and is being removed in lucene 9. This commit deprecates setting a version on an analyzer in index settings and removes the related calls to Analyzer.setVersion() Relates to #74057	2021-06-16 09:40:41 +01:00
Ryan Ernst	63012c8a40	Move ParseField to o.e.c.xcontent (#73923 ) ParseField is part of the x-content lib, yet it doesn't exist under the same root package as the rest of the lib. This commit moves the class to the appropriate package. relates #73784	2021-06-08 13:32:14 -07:00
Jake Landis	279fde375e	Apply REST API compatibility testing for the :modules (#71137 )	2021-04-02 11:20:54 -05:00
Mark Vieira	6339691fe3	Consolidate REST API specifications and publish under Apache 2.0 license (#70036 )	2021-03-26 16:20:14 -07:00
Christoph Büscher	c67b2384fe	Make keyword_marker filter updateable (#65457 ) Currently we don't allow `keyword_marker` filter file resources to be reloaded via the `_reload_search_analyzers` API. It would make sense to allow reloading this when the file content has changed to allow e.g. for updating stemmer exception rules at search time without having to close and re-open the index in question. This change adds the updateable flag to this token filter in the same way it is used for synonym filters. Analyzers containing updateable keyword_marker filters would not be allowed to be used at index time but at search time only, similar to what we allow for synonym filters. Closes #65355	2021-02-25 16:44:11 +01:00
Julie Tibshirani	936abca50a	Rename MatchQuery -> MatchQueryParser. (#68716 ) This commit renames `MatchQuery` to make it clear it's not a query. Its purpose is actually to produce Lucene queries through its `parse` method. It also renames `MultiMatchQuery` -> `MultiMatchQueryParser`.	2021-02-09 08:56:00 -08:00
Rory Hunter	2d44cce31e	Replace NOT operator with explicit `false` check - part 9 (#68645 ) Part 9. We have an in-house rule to compare explicitly against `false` instead of using the logical not operator (`!`). However, this hasn't historically been enforced, meaning that there are many violations in the source at present. We now have a Checkstyle rule that can detect these cases, but before we can turn it on, we need to fix the existing violations. This is being done over a series of PRs, since there are a lot to fix.	2021-02-08 15:28:57 +00:00
Mark Vieira	a92a647b9f	Update sources with new SSPL+Elastic-2.0 license headers As per the new licensing change for Elasticsearch and Kibana this commit moves existing Apache 2.0 licensed source code to the new dual license SSPL+Elastic license 2.0. In addition, existing x-pack code now uses the new version 2.0 of the Elastic license. Full changes include: - Updating LICENSE and NOTICE files throughout the code base, as well as those packaged in our published artifacts - Update IDE integration to now use the new license header on newly created source files - Remove references to the "OSS" distribution from our documentation - Update build time verification checks to no longer allow Apache 2.0 license header in Elasticsearch source code - Replace all existing Apache 2.0 license headers for non-xpack code with updated header (vendored code with Apache 2.0 headers obviously remains the same). - Replace all Elastic license 1.0 headers with new 2.0 header in xpack.	2021-02-02 16:10:53 -08:00
Rory Hunter	ad1f876daa	Replace NOT operator with explicit `false` check (#67817 ) We have an in-house rule to compare explicitly against `false` instead of using the logical not operator (`!`). However, this hasn't historically been enforced, meaning that there are many violations in the source at present. We now have a Checkstyle rule that can detect these cases, but before we can turn it on, we need to fix the existing violations. This is being done over a series of PRs, since there are a lot to fix.	2021-01-26 14:47:09 +00:00
Rory Hunter	1a05a5ac24	Introduce deprecation categories (#67443 ) Closes #64824. Introduce the concept of categories to deprecation logging. Every location where we log a deprecation message must now include a deprecation category.	2021-01-18 16:16:54 +00:00
Julie Tibshirani	5852fbedf5	Rename QueryShardContext -> SearchExecutionContext. (#67490 ) We decided to rename `QueryShardContext` to clarify that it supports all parts of search request execution. Before there was confusion over whether it should only be used for building queries, or maybe only used in the query phase. This PR also updates the javadocs. Closes #64740.	2021-01-14 09:11:59 -08:00
Jim Ferenczi	c756ce1acf	Sort field tiebreaker for PIT (point in time) readers (#66093 ) This commit introduces a new sort field called `_shard_doc` that can be used in conjunction with a PIT to consistently tiebreak identical sort values. The sort value is a numeric long that is composed of the ordinal of the shard (assigned by the coordinating node) and the internal Lucene document ID. These two values are consistent within a PIT so this sort criterion can be used as the tiebreaker of any search requests. Since this sort criterion is stable we'd like to add it automatically to any sorted search requests that use a PIT but we also need to expose it explicitly in order to be able to: * Reverse the order of the tiebreaking, useful to search "before" `search_after`. * Force the primary sort to use it in order to benefit from the `search_after` optimization when sorting by index order (to be released in Lucene 8.8). I plan to add the documentation and the automatic configuration for PIT in a follow up since this change is already big. Relates #56828	2020-12-18 12:13:12 +01:00
Alan Woodward	1a8ce8716d	Restore use of default search and search_quote analyzers (#65491 ) In the refactoring of TextFieldMapper, we lost the ability to define a default search or search_quote analyzer in index settings. This commit restores that ability, and adds some more comprehensive testing. Fixes #65434	2020-11-26 16:57:45 +00:00
Nik Everett	a08b52f3bd	Add `runtime_mappings` to search request (#64374 ) This adds a way to specify the `runtime_mappings` on a search request which are always "runtime" fields. It looks like: ``` curl -XDELETE -uelastic:password -HContent-Type:application/json localhost:9200/test curl -XPOST -uelastic:password -HContent-Type:application/json 'localhost:9200/test/_bulk?pretty&refresh' -d' {"index": {}} {"animal": "cat", "sound": "meow"} {"index": {}} {"animal": "dog", "sound": "woof"} {"index": {}} {"animal": "snake", "sound": "hisssssssssssssssss"} ' curl -XPOST -uelastic:password -HContent-Type:application/json localhost:9200/test/_search?pretty -d' { "runtime_mappings": { "animal.upper": { "type": "keyword", "script": "for (String s : doc[\"animal.keyword\"]) {emit(s.toUpperCase())}" } }, "query": { "match": { "animal.upper": "DOG" } } }' ``` NOTE: If we have to send a search request with runtime mappings to a node that doesn't support runtime mappings at all then we'll fail the search request entirely. The alternative would be to not send those runtime mappings and let the node fail the search request with an "unknown field" error. I believe this would be surprising because you defined the field in the search request. NOTE: It isn't obvious but you can also use `runtime_mappings` to override fields inside objects by naming the runtime fields with `.` in them. Like this: ``` curl -XDELETE -uelastic:password -HContent-Type:application/json localhost:9200/test curl -uelastic:password -XPOST -HContent-Type:application/json localhost:9200/test/_bulk?refresh -d' {"index":{}} {"name": {"first": "Andrew", "last": "Wiggin"}} {"index":{}} {"name": {"first": "Julian", "last": "Delphiki", "suffix": "II"}} ' curl -uelastic:password -XPOST -HContent-Type:application/json localhost:9200/test/_search?pretty -d'{ "runtime_mappings": { "name.first": { "type": "keyword", "script": "if (\"Wiggin\".equals(doc[\"name.last.keyword\"].value)) {emit(\"Ender\");} else if (\"Delphiki\".equals(doc[\"name.last.keyword\"].value)) {emit(\"Bean\");}" } }, "query": { "match": { "name.first": "Bean" } } }' ``` Relates to #59332	2020-11-10 12:38:59 -05:00
Jake Landis	7dd57c9415	Introduce javaRestTest source set/task and convert modules (#59939 ) Introduce a javaRestTest source set and task to complement the yamlRestTest. javaRestTest differs in that the code is sourced from Java and may have different dependencies and setup requirements for the test clusters. This also allows the tests to run in parallel in different cluster instances to prevent any cross test contamination between the two types of tests. Also included in this PR: all :modules no longer use the integTest task. The tests are now driven by test, yamlRestTest, javaRestTest, and internalClusterTest. Since only :modules (and :rest-api-spec) have been converted to yamlRestTest we can now disable the integTest task if either yamlRestTest or javaRestTest have been applied. Once all projects are converted, we can delete the integTest task. related: #56841 related: #59444	2020-07-21 17:17:17 -05:00
malpani	08de504b44	Support ignore_keywords flag for word delimiter graph token filter (#59563 ) This commit allows customizing the word delimiter token filters to skip processing tokens tagged as keyword through the `ignore_keywords` flag Lucene's WordDelimiterGraphFilter already exposes. Fix for #59491	2020-07-21 16:11:11 +01:00
Jake Landis	ddd882b835	Convert modules to use yamlRestTest (#59089 ) This commit moves the modules REST tests to the newly introduced yamlRestTest source set. A few tests have also been re-named to include the correct IT suffix. Without changing the names, the testing conventions task would fail, since the YAML tests are no longer present to pacify the convention. These tests have moved to the internalClusterTest source set. related: #56841	2020-07-13 11:32:42 -05:00
Jake Landis	333a5d8cdf	Create plugin for yamlTest task (#56841 ) This commit creates a new Gradle plugin to provide a separate task name and source set for running YAML based REST tests. The only project converted to use the new plugin in this PR is distribution/archives/integ-test-zip, for which the testing has been moved to :rest-api-spec since it makes the most sense and it avoids a small but awkward change to the distribution plugin. The remaining cases in modules, plugins, and x-pack will be handled in followups. This plugin is distinctly different from the plugin introduced in #55896 since the YAML REST tests are intended to be black box tests over HTTP. As such they should not (by default) have access to the classpath for that which they are testing. The YAML based REST tests will be moved to separate source sets (yamlRestTest). Which source set is the target for the test resources depends on whether this new plugin is applied. If it is not applied, it will default to the test source set. Further, this introduces a breaking change for plugin developers that use the YAML testing framework. They will now need to either use the new source set and matching task, or configure the rest resources to use the old "test" source set that matches the old integTest task. (The former should be preferred). As part of this change (which is also breaking for plugin developers) the rest resources plugin has been removed from the build plugin and now requires either explicit application or application via the new YAML REST test plugin. Plugin developers should be able to fix the breaking changes to the YAML tests by adding apply plugin: 'elasticsearch.yaml-rest-test' and moving the YAML tests under a yamlRestTest folder (instead of test)	2020-07-06 12:13:01 -05:00
Przemyslaw Gomulka	9bef31ccd3	Do not create two loggers for DeprecationLogger (#58435 ) DeprecationLogger's constructor should not create two loggers. It was taking parent logger instance, changing its name with a .deprecation prefix and creating a new logger. Most of the time parent logger was not needed. It was causing Log4j to unnecessarily cache the unused parent logger instance.	2020-06-29 13:38:21 +02:00
Przemyslaw Gomulka	4d6dc51c72	Header warning logging refactoring (#55941 ) Splitting DeprecationLogger into two. HeaderWarningLogger - responsible for adding response warning headers and ThrottlingLogger - responsible for limiting the duplicated log entries for the same key (previously deprecateAndMaybeLog). Introducing a ThrottlingAndHeaderWarningLogger which is a base for other common logging usages where both response warning header and logging throttling was needed. relates #55699 relates #52369	2020-06-01 15:44:01 +02:00
Tomasz Elendt	66ded59929	Support multiple tokens on LHS in stemmer_override rules (#56113 ) (#56484 ) This commit adds support for rules with multiple tokens on LHS, also known as "contraction rules", into stemmer override token filter. Contraction rules are handy for translating multiple inflected words into the same root form. One side effect of this change is that it brings stemmer override rules format closer to synonym rules format so that it makes it easier to translate one into another. This change also makes stemmer override rules parser more strict so that it should catch more errors which were previously accepted. Closes #56113	2020-05-29 22:28:41 +02:00
Andrei Balici	da31b4b83d	Add `max_token_length` setting to the CharGroupTokenizer (#56860 ) Adds `max_token_length` option to the CharGroupTokenizer. Updates documentation as well to reflect the changes. Closes #56676	2020-05-20 14:15:57 +02:00
Amit Khandelwal	00fef6dfd3	Analysis enhancement - add preserve_original setting in ngram-token-filter (#55432 )	2020-05-04 10:06:37 +01:00
Amit Khandelwal	9e41feda86	Expose `preserve_original` in `edge_ngram` token filter (#55766 ) The Lucene `preserve_original` setting is currently not supported in the `edge_ngram` token filter. This change adds it with a default value of `false`. Closes #55767	2020-04-28 10:22:59 +02:00
Rory Hunter	8638d08ebf	Always use deprecateAndMaybeLog for deprecation warnings (#55115 ) Closes #53137. Replace calls to deprecate(String,Object...) with deprecateAndMaybeLog(...), with an appropriate key, so that all messages can potentially be deduplicated.	2020-04-16 16:19:45 +01:00
David Turner	6e98af385a	Add RepositoriesService to createComponents() args (#54814 ) Today we pass the `RepositoriesService` to the searchable snapshots plugin during the initialization of the `RepositoryModule`, forcing the plugin to be a `RepositoryPlugin` even though it does not implement any repositories. After discussion we decided it best for now to pass this in via `Plugin#createComponents` instead, pending some future work in which plugins can depend on services more dynamically.	2020-04-16 15:40:28 +01:00
Jason Tedor	95a7eed9aa	Rename MetaData to Metadata in all of the places (#54519 ) This is a simple naming change PR, to fix the fact that "metadata" is a single English word, and for too long we have not followed general naming conventions for it. We are also not consistent about it, for example, METADATA instead of META_DATA if we were trying to be consistent with MetaData (although METADATA is correct when considered in the context of "metadata"). This was a simple find and replace across the code base, only taking a few minutes to fix this naming issue forever.	2020-03-31 15:52:01 -04:00
Jake Landis	afc2383b72	Optimize which Rest resources are used by the Rest tests. (#53299 ) This should help with Gradle's incremental compile such that projects only depend upon the resources they use. related #52114	2020-03-18 09:09:29 -05:00
Jay Modi	0d1e67dbbb	Single instance of the IndexNameExpressionResolver (#52596 ) This commit modifies the codebase so that our production code uses a single instance of the IndexNameExpressionResolver class. This change is being made in preparation for allowing name expression resolution to be augmented by a plugin. In order to remove some instances of IndexNameExpressionResolver, the single instance is added as a parameter of Plugin#createComponents and PersistentTaskPlugin#getPersistentTasksExecutor.	2020-02-20 15:04:45 -07:00
Adrien Grand	28e2f16734	Prepare backport of #51260 . (#51876 ) Backport: #51875	2020-02-05 11:02:46 +01:00
Adrien Grand	d5bc6d6de0	Move analysis/mappings stats to cluster-stats. (#51260 ) Closes #51138	2020-02-04 16:56:49 +01:00
Marios Trivyzas	24e1858a70	Fix caching for PreConfiguredTokenFilter (#50912 ) The PreConfiguredTokenFilter#singletonWithVersion uses the version internally for the token filter factories but it registers only one instance in the cache and not one instance per version. This can lead to exceptions like the one described in #50734 since the singleton is created and cached using the version created of the first index that is processed. Remove the singletonWithVersion() methods and use the elasticsearchVersion() methods instead. Fixes: #50734	2020-01-16 12:04:14 +01:00
Christoph Büscher	9a4357ae04	Deprecate and remove camel-case nGram and edgeNGram tokenizers (#50862 ) We already deprecated and removed the camel-case versions of the nGram and edgeNGram filters a while ago and we should do the same with the nGram and edgeNGram tokenizers. This PR deprecates the use of these names in favour of ngram and edge_ngram in 7 and disallows usage in new indices starting with 8. Closes #50561	2020-01-14 17:18:47 +01:00
Alan Woodward	736ed474e2	Check for deprecations when analyzers are built (#50908 ) Generally speaking, deprecated analysis components in elasticsearch will issue deprecation warnings when they are first used. However, this means that no warnings are emitted when indexes are created with deprecated components, and users have to actually index a document to see warnings. This makes it much harder to see these warnings and act on them at appropriate times. This is worse in the case where components throw exceptions on upgrade. In this case, users will not be aware of a problem until a document is indexed, instead of at index creation time. This commit adds a new check that pushes an empty string through all user-defined analyzers and normalizers when an IndexAnalyzers object is built for each index; deprecation warnings and exceptions are now emitted when indexes are created or opened. Fixes #42349	2020-01-14 13:12:25 +00:00
Alan Woodward	2ba5da2eca	Remove type parameter from CIR.mapping(type, object...) (#50739 ) This commit removes the type parameter from `CreateIndexRequest.mapping(type, object...)`, and the associated delegating method on `CreateIndexRequestBuilder`. To make migration simpler, the method on `CreateIndexRequest` is renamed to `simpleMapping`, and on `CreateIndexRequestBuilder` to `setMapping`; this should help the compiler catch all necessary changes on upgrades. Relates to #41059	2020-01-09 16:02:28 +00:00
Christoph Büscher	4b366a4cbb	Make Multiplexer inherit filter chains analysis mode (#50662 ) Currently, if an updateable synonym filter is included in a multiplexer filter, it is not reloaded via the _reload_search_analyzers because the multiplexer itself doesn't pass on the analysis mode of the filters it contains, so it's not recognized as "updateable" in itself. Instead we can check and merge the AnalysisMode settings of all filters in the multiplexer and use the resulting mode (e.g. search-time only) for the multiplexer itself, thus making any synonym filters contained in it reloadable. This, of course, will also make the analyzers using the multiplexer be usable at search-time only. Closes #50554	2020-01-08 18:25:00 +01:00
Alan Woodward	a59b065091	Remove type parameter from `CreateIndexRequest.mapping(type, XContentBuilder)` (#50586 ) This continues the removal of type parameters from CreateIndexRequest.mapping methods started in #50419. Here the removed methods are almost entirely in test code, with the exception of a change to TransformIndex in the transform plugin. Relates to #41059	2020-01-08 09:18:31 +00:00
Christoph Büscher	68f22faef9	Delete removed token filter names from SynonymsAnalysisTests (#50438 ) The `testPreconfiguredTokenFilters` test refers to the `nGram` and `edgeNGram` token filter which are no longer part of the preconfigured token filters, so they can be removed here as well.	2020-01-02 16:53:56 +01:00
Christoph Büscher	c6f7166145	Throw Error on deprecated nGram and edgeNGram custom filters (#50376 ) The camel-case `nGram` and `edgeNGram` filter names were deprecated in 6. We currently throw errors on new indices when they are used. However these errors are currently only thrown for pre-configured filters, adding them as custom filters doesn't trigger the warning and error. This change adds the appropriate exceptions for `nGram` and `edgeNGram` respectively. Closes #50360	2019-12-20 18:01:05 +01:00
Stuart Tettemer	cd721b6386	Scripting: ScriptFactory not required by compile (#50344 ) Avoid backwards incompatible changes for 8.x and 7.6 by removing type restriction on compile and Factory. Factories may optionally implement ScriptFactory. If so, then they can indicate determinism and thus cacheability. Relates: #49466	2019-12-19 10:14:28 -07:00
Stuart Tettemer	356d1a274e	Scripting: Groundwork for caching script results (#49895 ) In order to cache script results in the query shard cache, we need to check if scripts are deterministic. This change adds a default method to the script factories, `isResultDeterministic() -> false` which is used by the `QueryShardContext`. Script results were never cached and that does not change here. Future changes will implement this method based on whether the results of the scripts are deterministic or not and therefore cacheable. Refs: #49466	2019-12-06 13:09:44 -07:00
Christoph Büscher	249f5a28a0	Remove outdated Todo in CommonAnalysisPlugin (#49450 )	2019-11-22 11:01:47 +01:00
Christoph Büscher	ed86750fa4	Allow custom characters in token_chars of ngram tokenizers (#49250 ) Currently the `token_chars` setting in both `edgeNGram` and `ngram` tokenizers only allows for a list of predefined character classes, which might not fit every use case. For example, including underscore "_" in a token would currently require the `punctuation` class which comes with a lot of other characters. This change adds an additional "custom" option to the `token_chars` setting, which requires an additional `custom_token_chars` setting to be present and which will be interpreted as a set of characters to include in a token. Closes #25894	2019-11-20 10:36:39 +01:00
gpaimla	d1ea9910c3	Implement Lucene EstonianAnalyzer, Stemmer (#49149 ) This PR adds a new analyzer and stemmer for the Estonian language. Closes #48895	2019-11-18 17:19:54 +01:00
Rory Hunter	3a3e5f6176	Apply 2-space indent to all gradle scripts (#48849 ) Closes #48724. Update `.editorconfig` to make the Java settings the default for all files, and then apply a 2-space indent to all `*.gradle` files. Then reformat all the files.	2019-11-13 10:14:04 +00:00

162 Commits