As per the new licensing change for Elasticsearch and Kibana, this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:
- Update LICENSE and NOTICE files throughout the code base, as well
as those packaged in our published artifacts
- Update IDE integration to now use the new license header on newly
created source files
- Remove references to the "OSS" distribution from our documentation
- Update build time verification checks to no longer allow Apache 2.0
license header in Elasticsearch source code
- Replace all existing Apache 2.0 license headers for non-xpack code
with the updated header (vendored code with Apache 2.0 headers obviously
remains the same).
- Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.
We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
Closes #64824. Introduce the concept of categories to deprecation
logging. Every location where we log a deprecation message must now
include a deprecation category.
We decided to rename `QueryShardContext` to clarify that it supports all parts
of search request execution. Before there was confusion over whether it should
only be used for building queries, or maybe only used in the query phase. This
PR also updates the javadocs.
Closes #64740.
This commit introduces a new sort field called `_shard_doc` that
can be used in conjunction with a PIT to consistently tiebreak
identical sort values. The sort value is a numeric long that is
composed of the ordinal of the shard (assigned by the coordinating node)
and the internal Lucene document ID. These two values are consistent within
a PIT, so this sort criterion can be used as the tiebreaker in any search
request.
Since this sort criterion is stable, we'd like to add it automatically to any
sorted search requests that use a PIT, but we also need to expose it explicitly
in order to be able to:
* Reverse the order of the tiebreaking, useful to search "before" `search_after`.
* Force the primary sort to use it in order to benefit from the `search_after` optimization when sorting by index order (to be released in Lucene 8.8).
I plan to add the documentation and the automatic configuration for PIT in a follow-up, since this change is already big.
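To illustrate, a minimal sketch of the explicit usage, assuming the existing point-in-time API and a `timestamp` field (the index name, credentials, and field are placeholders):
```
# Open a point in time on the index (returns an id; the one below is a placeholder)
curl -XPOST -uelastic:password 'localhost:9200/test/_pit?keep_alive=1m&pretty'
# Search with the PIT, adding `_shard_doc` as an explicit tiebreaker
# (here reversed with "desc", to search "before" a search_after value)
curl -XPOST -uelastic:password -HContent-Type:application/json 'localhost:9200/_search?pretty' -d'
{
  "pit": {"id": "<id from the _pit call>", "keep_alive": "1m"},
  "sort": [
    {"timestamp": "asc"},
    {"_shard_doc": "desc"}
  ]
}'
```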
Relates #56828
In the refactoring of TextFieldMapper, we lost the ability to define
a default search or search_quote analyzer in index settings. This
commit restores that ability, and adds some more comprehensive
testing.
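For reference, a minimal sketch of the restored behaviour, assuming the usual `default_search` and `default_search_quoted` analyzer names in index settings (index name and analyzer choices are illustrative):
```
curl -XPUT -uelastic:password -HContent-Type:application/json 'localhost:9200/test?pretty' -d'
{
  "settings": {
    "analysis": {
      "analyzer": {
        "default_search": {"type": "whitespace"},
        "default_search_quoted": {"type": "keyword"}
      }
    }
  },
  "mappings": {
    "properties": {
      "body": {"type": "text"}
    }
  }
}'
```
Text fields that don't declare their own search or search_quote analyzer should pick these up again.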
Fixes #65434
This adds a way to specify the `runtime_mappings` on a search request
which are always "runtime" fields. It looks like:
```
curl -XDELETE -uelastic:password -HContent-Type:application/json localhost:9200/test
curl -XPOST -uelastic:password -HContent-Type:application/json 'localhost:9200/test/_bulk?pretty&refresh' -d'
{"index": {}}
{"animal": "cat", "sound": "meow"}
{"index": {}}
{"animal": "dog", "sound": "woof"}
{"index": {}}
{"animal": "snake", "sound": "hisssssssssssssssss"}
'
curl -XPOST -uelastic:password -HContent-Type:application/json 'localhost:9200/test/_search?pretty' -d'
{
"runtime_mappings": {
"animal.upper": {
"type": "keyword",
"script": "for (String s : doc[\"animal.keyword\"]) {emit(s.toUpperCase())}"
}
},
"query": {
"match": {
"animal.upper": "DOG"
}
}
}'
```
NOTE:
If we have to send a search request with runtime mappings to a node that
doesn't support runtime mappings at all then we'll fail the search
request entirely. The alternative would be to not send those runtime
mappings and let the node fail the search request with an "unknown field"
error. I believe that would be surprising, because you defined
the field in the search request.
NOTE:
It isn't obvious but you can also use `runtime_mappings` to override fields
inside objects by naming the runtime fields with `.` in them. Like this:
```
curl -XDELETE -uelastic:password -HContent-Type:application/json localhost:9200/test
curl -uelastic:password -XPOST -HContent-Type:application/json 'localhost:9200/test/_bulk?refresh' -d'
{"index":{}}
{"name": {"first": "Andrew", "last": "Wiggin"}}
{"index":{}}
{"name": {"first": "Julian", "last": "Delphiki", "suffix": "II"}}
'
curl -uelastic:password -XPOST -HContent-Type:application/json 'localhost:9200/test/_search?pretty' -d'{
"runtime_mappings": {
"name.first": {
"type": "keyword",
"script": "if (\"Wiggin\".equals(doc[\"name.last.keyword\"].value)) {emit(\"Ender\");} else if (\"Delphiki\".equals(doc[\"name.last.keyword\"].value)) {emit(\"Bean\");}"
}
},
"query": {
"match": {
"name.first": "Bean"
}
}
}'
```
Relates to #59332
Introduce a javaRestTest source set and task to complement yamlRestTest.
javaRestTest differs in that the tests are written in Java and may have
different dependencies and setup requirements for the test clusters. This also
allows the tests to run in parallel in different cluster instances to prevent any
cross-test contamination between the two types of tests.
Also included in this PR: all :modules projects no longer use the integTest task. The tests
are now driven by test, yamlRestTest, javaRestTest, and internalClusterTest.
Since only :modules (and :rest-api-spec) have been converted to yamlRestTest
we can now disable the integTest task if either yamlRestTest or javaRestTest has
been applied. Once all projects are converted, we can delete the integTest task.
related: #56841
related: #59444
This commit allows the word delimiter token filters to be customized to skip
processing tokens tagged as keywords, via the `ignore_keywords` flag that
Lucene's WordDelimiterGraphFilter already exposes.
Fix for #59491
This commit moves the modules REST tests to the
newly introduced yamlRestTest source set. A few
tests have also been re-named to include the correct
IT suffix. Without the rename, the testing
conventions task would fail, since the YAML tests that
previously pacified the convention are no longer present.
These tests have moved to the internalClusterTest
source set.
related: #56841
This commit creates a new Gradle plugin to provide a separate task name
and source set for running YAML based REST tests. The only project
converted to use the new plugin in this PR is distribution/archives/integ-test-zip,
for which the testing has been moved to :rest-api-spec, since that makes the most
sense and avoids a small but awkward change to the distribution plugin.
The remaining cases in modules, plugins, and x-pack will be handled in followups.
This plugin is distinctly different from the plugin introduced in #55896 since
the YAML REST tests are intended to be black box tests over HTTP. As such they
should not (by default) have access to the classpath of the code that they are testing.
The YAML based REST tests will be moved to separate source sets (yamlRestTest).
Which source set is the target for the test resources depends on whether this
new plugin is applied. If it is not applied, the resources default to the test
source set.
Further, this introduces a breaking change for plugin developers that
use the YAML testing framework. They will now need to either use the new source set
and matching task, or configure the rest resources to use the old "test" source set that
matches the old integTest task. (The former should be preferred).
As part of this change (which is also breaking for plugin developers) the
rest resources plugin has been removed from the build plugin and now requires
either explicit application or application via the new YAML REST test plugin.
Plugin developers should be able to fix the breaking changes to the YAML tests
by adding `apply plugin: 'elasticsearch.yaml-rest-test'` and moving the YAML tests
under a `yamlRestTest` folder (instead of `test`).
DeprecationLogger's constructor should not create two loggers. It was
taking the parent logger instance, changing its name with a .deprecation
prefix, and creating a new logger.
Most of the time the parent logger was not needed, and this caused Log4j to
unnecessarily cache the unused parent logger instance.
This splits DeprecationLogger in two: HeaderWarningLogger, responsible for adding response warning headers, and ThrottlingLogger, responsible for limiting duplicated log entries for the same key (previously deprecateAndMaybeLog).
It also introduces a ThrottlingAndHeaderWarningLogger, which serves as a base for other common logging usages where both a response warning header and log throttling are needed.
relates #55699
relates #52369
This commit adds support for rules with multiple tokens on the LHS, also
known as "contraction rules", to the stemmer override token
filter. Contraction rules are handy for translating multiple
inflected words into the same root form. One side effect of this change is
that it brings the stemmer override rules format closer to the synonym rules
format, making it easier to translate one into the other.
This change also makes the stemmer override rules parser stricter, so
that it should catch more errors that were previously accepted.
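A minimal sketch of a contraction rule, assuming the standard `stemmer_override` syntax with inline rules:
```
curl -XPOST -uelastic:password -HContent-Type:application/json 'localhost:9200/_analyze?pretty' -d'
{
  "tokenizer": "standard",
  "filter": [
    {"type": "stemmer_override", "rules": ["running, runs => run"]}
  ],
  "text": "running runs"
}'
```
Both inflected forms on the left-hand side map to the single root form on the right, mirroring the synonym contraction format.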
Closes #56113
The Lucene `preserve_original` setting is currently not supported in the `edge_ngram`
token filter. This change adds it with a default value of `false`.
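A minimal sketch of the new flag via the Analyze API (against a local cluster; credentials are illustrative):
```
curl -XPOST -uelastic:password -HContent-Type:application/json 'localhost:9200/_analyze?pretty' -d'
{
  "tokenizer": "standard",
  "filter": [
    {"type": "edge_ngram", "min_gram": 2, "max_gram": 3, "preserve_original": true}
  ],
  "text": "elastic"
}'
```
With `preserve_original: true`, the output should contain "el" and "ela" plus the unchanged "elastic".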
Closes #55767
Closes #53137. Replace calls to `deprecate(String, Object...)` with `deprecateAndMaybeLog(...)`, with an appropriate key, so that all messages
can potentially be deduplicated.
Today we pass the `RepositoriesService` to the searchable snapshots plugin
during the initialization of the `RepositoryModule`, forcing the plugin to be a
`RepositoryPlugin` even though it does not implement any repositories.
After discussion we decided it best for now to pass this in via
`Plugin#createComponents` instead, pending some future work in which plugins
can depend on services more dynamically.
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
This commit modifies the codebase so that our production code uses a
single instance of the IndexNameExpressionResolver class. This change
is being made in preparation for allowing name expression resolution
to be augmented by a plugin.
In order to remove some instances of IndexNameExpressionResolver, the
single instance is added as a parameter of Plugin#createComponents and
PersistentTaskPlugin#getPersistentTasksExecutor.
The PreConfiguredTokenFilter#singletonWithVersion uses the version
internally for the token filter factories, but it registers only one
instance in the cache and not one instance per version. This can lead
to exceptions like the one described in #50734, since the singleton is
created and cached using the version of the first index
that is processed.
Remove the singletonWithVersion() methods and use the
elasticsearchVersion() methods instead.
Fixes: #50734
We already deprecated and removed the camel-case versions of the nGram and edgeNGram
filters a while ago and we should do the same with the nGram and edgeNGram tokenizers.
This PR deprecates the use of these names in favour of ngram and edge_ngram in 7
and disallows usage in new indices starting with 8.
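For reference, a sketch of the snake_case spelling that remains valid:
```
curl -XPOST -uelastic:password -HContent-Type:application/json 'localhost:9200/_analyze?pretty' -d'
{
  "tokenizer": {"type": "edge_ngram", "min_gram": 1, "max_gram": 3},
  "text": "quick"
}'
```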
Closes #50561
Generally speaking, deprecated analysis components in elasticsearch will issue deprecation
warnings when they are first used. However, this means that no warnings are emitted when
indexes are created with deprecated components, and users have to actually index a document
to see warnings. This makes it much harder to see these warnings and act on them at
appropriate times.
This is worse in the case where components throw exceptions on upgrade. In this case, users
will not be aware of a problem until a document is indexed, instead of at index creation time.
This commit adds a new check that pushes an empty string through all user-defined analyzers
and normalizers when an IndexAnalyzers object is built for each index; deprecation warnings
and exceptions are now emitted when indexes are created or opened.
Fixes #42349
This commit removes the type parameter from `CreateIndexRequest.mapping(type, object...)`,
and the associated delegating method on `CreateIndexRequestBuilder`. To make migration
simpler, the method on `CreateIndexRequest` is renamed to `simpleMapping`, and
on `CreateIndexRequestBuilder` to `setMapping`; this should help the compiler catch all
necessary changes on upgrades.
Relates to #41059
Currently, if an updateable synonym filter is included in a multiplexer filter, it is not reloaded via the
_reload_search_analyzers because the multiplexer itself doesn't pass on the analysis mode of the
filters it contains, so it's not recognized as "updateable" in itself. Instead we can check and merge
the AnalysisMode settings of all filters in the multiplexer and use the resulting mode (e.g. search-time
only) for the multiplexer itself, thus making any synonym filters contained in it reloadable.
This, of course, will also make the analyzers using the multiplexer be usable at search-time only.
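A sketch of the setup this fixes, assuming a file-backed updateable synonym filter wrapped in a multiplexer (all names are illustrative, and analysis/synonym.txt is assumed to exist in the node config directory); since the synonyms are updateable, the analyzer can only be used at search time:
```
curl -XPUT -uelastic:password -HContent-Type:application/json 'localhost:9200/test?pretty' -d'
{
  "settings": {
    "analysis": {
      "filter": {
        "my_synonyms": {
          "type": "synonym",
          "synonyms_path": "analysis/synonym.txt",
          "updateable": true
        },
        "my_multiplexer": {
          "type": "multiplexer",
          "filters": ["lowercase", "lowercase, my_synonyms"]
        }
      },
      "analyzer": {
        "my_search_analyzer": {
          "tokenizer": "standard",
          "filter": ["my_multiplexer"]
        }
      }
    }
  },
  "mappings": {
    "properties": {
      "body": {
        "type": "text",
        "analyzer": "standard",
        "search_analyzer": "my_search_analyzer"
      }
    }
  }
}'
```
With this change, _reload_search_analyzers can pick up `my_synonyms` even though it is nested inside the multiplexer.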
Closes #50554
This continues the removal of type parameters from CreateIndexRequest.mapping
methods started in #50419. Here the removed methods are almost entirely in test
code, with the exception of a change to TransformIndex in the transform plugin.
Relates to #41059
The `testPreconfiguredTokenFilters` test refers to the `nGram` and `edgeNGram`
token filters, which are no longer part of the preconfigured token filters, so
they can be removed here as well.
The camel-case `nGram` and `edgeNGram` filter names were deprecated in 6. We
currently throw errors on new indices when they are used. However, these errors
are currently only thrown for pre-configured filters; adding them as custom
filters doesn't trigger the warning or error. This change adds the appropriate
exceptions for `nGram` and `edgeNGram` respectively.
Closes #50360
Avoid backwards incompatible changes for 8.x and 7.6 by removing type
restriction on compile and Factory. Factories may optionally implement
ScriptFactory. If so, then they can indicate determinism and thus
cacheability.
Relates: #49466
In order to cache script results in the query shard cache, we need to
check if scripts are deterministic. This change adds a default method
to the script factories, `isResultDeterministic() -> false` which is
used by the `QueryShardContext`.
Script results were never cached and that does not change here. Future
changes will implement this method based on whether the results of the
scripts are deterministic or not and therefore cacheable.
Refs: #49466
Currently the `token_chars` setting in both `edgeNGram` and `ngram` tokenizers
only allows for a list of predefined character classes, which might not fit
every use case. For example, including underscore "_" in a token would currently
require the `punctuation` class which comes with a lot of other characters.
This change adds an additional "custom" option to the `token_chars` setting,
which requires an additional `custom_token_chars` setting to be present and
which will be interpreted as a set of characters to include in a token.
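A minimal sketch via the Analyze API, treating the underscore as a token character:
```
curl -XPOST -uelastic:password -HContent-Type:application/json 'localhost:9200/_analyze?pretty' -d'
{
  "tokenizer": {
    "type": "ngram",
    "min_gram": 3,
    "max_gram": 3,
    "token_chars": ["letter", "custom"],
    "custom_token_chars": "_"
  },
  "text": "foo_bar"
}'
```
Here "foo_bar" yields trigrams that span the underscore (e.g. "o_b") instead of the tokenizer breaking on it.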
Closes #25894
Closes #48724. Update `.editorconfig` to make the Java settings the default
for all files, and then apply a 2-space indent to all `*.gradle` files.
Then reformat all the files.
Make a number of changes so that code in the `libs` and `modules`
directories is more resilient to automatic formatting. This covers:
* Format cipher lists vertically, instead of horizontally
* Remove string concatenation where JSON fits on a single line
* Move some comments around so they aren't auto-formatted to a strange
place
As types are no longer used in index requests, we can remove the type parameter
from `prepareIndex` methods in the `Client` interface. However, just changing the signature
of `prepareIndex(index, type, id)` to `prepareIndex(index, id)` risks confusion when
upgrading with the previous (now removed) `prepareIndex(index, type)` method -
just changing the dependency version of java code would end up silently changing the
semantics of the method call. Instead we should just remove this method entirely, and
replace it by calling `prepareIndex(index).setId(id)`.
Most of the information in AnalysisPredicateScript.Token is pulled directly
from its underlying AttributeSource, but we also keep track of the token position,
and this state is held directly on the Token. This information needs to be reset when
the containing ScriptFilteringTokenFilter or ScriptedConditionTokenFilter is re-used.
Fixes #47197
Reloading of the synonym_graph filter doesn't currently work because the search-time
AnalysisMode doesn't get propagated to the TokenFilterFactory emitted by the
graph filter's getChainAwareTokenFilterFactory() method. This change fixes that.
Closes #45127
Due to https://issues.apache.org/jira/browse/LUCENE-8916, when you
try to use a synonym filter with the index_phrases option on a text field,
you can end up with null values in a Phrase query, leading to weird
exceptions further down the querying chain. As a workaround, this commit
disables the index_phrases optimization for queries that produce token
graphs.
Fixes #43976
This brings TokenizerFactory into line with CharFilterFactory and TokenFilterFactory,
and removes the need to pass around tokenizer names when building custom analyzers.
As this means that TokenizerFactory is no longer a functional interface, the commit also
adds a factory method to TokenizerFactory to make construction simpler.
StandardHtmlStripAnalyzer has been deprecated in 6.x and cannot be used for new
indices from 7.0 on. This change removes it entirely, along with the
tests and deprecation logging that were still around during the 7.x
versions.
Currently changing resources (like dictionaries, synonym files etc...) of search
time analyzers is only possible by closing an index, changing the underlying
resource (e.g. synonym files) and then re-opening the index for the change to
take effect.
This PR adds a new API endpoint that allows triggering reloading of certain
analysis resources (currently token filters) that will then pick up changes in
underlying file resources. To achieve this we introduce a new type of custom
analyzer (ReloadableCustomAnalyzer) that uses a ReuseStrategy that allows
swapping out analysis components. Custom analyzers that contain filters that are
marked as "updateable" will automatically choose this implementation. This PR
also adds this capability to `synonym` token filters for use in search time
analyzers.
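A sketch of the new endpoint, assuming an index whose search analyzer contains an updateable, file-backed `synonym` filter (index name and credentials are illustrative):
```
# After updating the synonym file on every node, trigger a reload:
curl -XPOST -uelastic:password 'localhost:9200/test/_reload_search_analyzers?pretty'
```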
Relates to #29051
#41560 removed the delimited_payload_filter as part of a general
cleanup of pre-version-7 restrictions, but missed removing the
preconfigured version, leading to #43684.
When a named token filter or char filter is passed as part of an Analyze API
request with no index, we currently try and build the relevant filter using no
index settings. However, this can miss cases where there is a pre-configured
filter defined in the analysis registry. One example here is the elision filter, which
has a pre-configured version built with the french elision set; when used as part
of normal analysis, this preconfigured set is used, but when used as part of the
Analyze API we end up with NPEs because it tries to instantiate the filter with
no index settings.
This commit changes the Analyze API to check for pre-configured filters in the case
that the request has no index defined, and is using a name rather than a custom
definition for a filter.
It also changes the pre-configured `word_delimiter_graph` filter and `edge_ngram`
tokenizer to make their settings consistent with the defaults used when creating
them with no settings.
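A sketch of the kind of request that previously failed with an NPE: the pre-configured `elision` filter referenced by name, with no index in the path (`\u0027` is just a JSON-escaped apostrophe):
```
curl -XPOST -uelastic:password -HContent-Type:application/json 'localhost:9200/_analyze?pretty' -d'
{
  "tokenizer": "standard",
  "filter": ["elision"],
  "text": "l\u0027avion"
}'
```
With this fix, the pre-configured French elision set is used and the response should contain the token "avion".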
Closes #43002. Closes #43621. Closes #43582.
We should throw an exception at construction time if a list of
articles is not provided, otherwise we can get random NPEs during
indexing.
Relates to #43002