Commit Graph

150 Commits

Author SHA1 Message Date
Mark Vieira a92a647b9f
Update sources with new SSPL+Elastic-2.0 license headers
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:

 - Update LICENSE and NOTICE files throughout the code base, as well
   as those packaged in our published artifacts
 - Update IDE integration to now use the new license header on newly
   created source files
 - Remove references to the "OSS" distribution from our documentation
 - Update build time verification checks to no longer allow Apache 2.0
   license header in Elasticsearch source code
 - Replace all existing Apache 2.0 license headers for non-xpack code
   with updated header (vendored code with Apache 2.0 headers obviously
   remains the same).
 - Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
2021-02-02 16:10:53 -08:00
Rory Hunter ad1f876daa
Replace NOT operator with explicit `false` check (#67817)
We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.

We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
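The two styles behave identically; the rule is purely about readability, since a single `!` is easy to overlook. A minimal illustration (the class and method names are hypothetical, not taken from the code base):

```java
import java.util.List;

// Hypothetical helper contrasting the two styles; only the `== false`
// form follows the in-house rule described above.
class ExplicitFalseCheck {

    // Discouraged: the leading `!` is easy to miss when skimming code.
    static boolean hasItemsNegated(List<String> items) {
        return !items.isEmpty();
    }

    // Preferred: the explicit comparison against `false` stands out.
    static boolean hasItems(List<String> items) {
        return items.isEmpty() == false;
    }
}
```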
2021-01-26 14:47:09 +00:00
Rory Hunter 1a05a5ac24
Introduce deprecation categories (#67443)
Closes #64824. Introduce the concept of categories to deprecation
logging. Every location where we log a deprecation message must now
include a deprecation category.
2021-01-18 16:16:54 +00:00
Julie Tibshirani 5852fbedf5
Rename QueryShardContext -> SearchExecutionContext. (#67490)
We decided to rename `QueryShardContext` to clarify that it supports all parts
of search request execution. Before there was confusion over whether it should
only be used for building queries, or maybe only used in the query phase. This
PR also updates the javadocs.

Closes #64740.
2021-01-14 09:11:59 -08:00
Jim Ferenczi c756ce1acf
Sort field tiebreaker for PIT (point in time) readers (#66093)
This commit introduces a new sort field called `_shard_doc` that
can be used in conjunction with a PIT to consistently tiebreak
identical sort values. The sort value is a numeric long that is
composed of the ordinal of the shard (assigned by the coordinating node)
and the internal Lucene document ID. These two values are consistent within
a PIT, so this sort criterion can be used as the tiebreaker of any search
request.
Since this sort criterion is stable, we'd like to add it automatically to any
sorted search request that uses a PIT, but we also need to expose it explicitly
in order to be able to:
* Reverse the order of the tiebreaking, which is useful to search "before" `search_after`.
* Force the primary sort to use it in order to benefit from the `search_after` optimization when sorting by index order (to be released in Lucene 8.8).

I plan to add the documentation and the automatic configuration for PIT in a follow up since this change is already big.
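The commit does not spell out how the two values are packed into one long. A minimal sketch, assuming the shard ordinal occupies the upper 32 bits and the Lucene doc ID the lower 32 bits (the class and method names are hypothetical):

```java
// Hypothetical encoding of a `_shard_doc`-style tiebreaker value, assuming
// the shard ordinal goes in the high 32 bits and the doc ID in the low 32 bits.
class ShardDocSortValue {

    static long encode(int shardIndex, int docId) {
        // Mask the doc ID so it is treated as unsigned and cannot overwrite the high bits.
        return (((long) shardIndex) << 32) | (docId & 0xFFFFFFFFL);
    }

    static int shardIndex(long value) {
        return (int) (value >>> 32);
    }

    static int docId(long value) {
        // Narrowing to int keeps only the low 32 bits.
        return (int) value;
    }
}
```

With non-negative inputs, comparing encoded values orders first by shard and then by doc ID within a shard, which is what makes such a value usable as a total-order tiebreaker; reversing the comparison gives the "before `search_after`" direction.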

Relates #56828
2020-12-18 12:13:12 +01:00
Alan Woodward 1a8ce8716d
Restore use of default search and search_quote analyzers (#65491)
In the refactoring of TextFieldMapper, we lost the ability to define
a default search or search_quote analyzer in index settings. This
commit restores that ability, and adds some more comprehensive
testing.

Fixes #65434
2020-11-26 16:57:45 +00:00
Nik Everett a08b52f3bd
Add `runtime_mappings` to search request (#64374)
This adds a way to specify the `runtime_mappings` on a search request
which are always "runtime" fields. It looks like:
```
curl -XDELETE -uelastic:password -HContent-Type:application/json localhost:9200/test
curl -XPOST -uelastic:password -HContent-Type:application/json 'localhost:9200/test/_bulk?pretty&refresh' -d'
{"index": {}}
{"animal": "cat", "sound": "meow"}
{"index": {}}
{"animal": "dog", "sound": "woof"}
{"index": {}}
{"animal": "snake", "sound": "hisssssssssssssssss"}
'

curl -XPOST -uelastic:password -HContent-Type:application/json localhost:9200/test/_search?pretty -d'
{
  "runtime_mappings": {
    "animal.upper": {
      "type": "keyword",
      "script": "for (String s : doc[\"animal.keyword\"]) {emit(s.toUpperCase())}"
    }
  },
  "query": {
    "match": {
      "animal.upper": "DOG"
    }
  }
}'
```

NOTE:
If we have to send a search request with runtime mappings to a node that
doesn't support runtime mappings at all, then we'll fail the search
request entirely. The alternative would be to not send those runtime
mappings and let the node fail the search request with an "unknown field"
error. I believe that would be surprising, because you defined
the field in the search request.

NOTE:
It isn't obvious but you can also use `runtime_mappings` to override fields
inside objects by naming the runtime fields with `.` in them. Like this:
```
curl -XDELETE -uelastic:password -HContent-Type:application/json localhost:9200/test
curl -uelastic:password -XPOST -HContent-Type:application/json localhost:9200/test/_bulk?refresh -d'
{"index":{}}
{"name": {"first": "Andrew", "last": "Wiggin"}}
{"index":{}}
{"name": {"first": "Julian", "last": "Delphiki", "suffix": "II"}}
'

curl -uelastic:password -XPOST -HContent-Type:application/json localhost:9200/test/_search?pretty -d'{
  "runtime_mappings": {
    "name.first": {
      "type": "keyword",
      "script": "if (\"Wiggin\".equals(doc[\"name.last.keyword\"].value)) {emit(\"Ender\");} else if (\"Delphiki\".equals(doc[\"name.last.keyword\"].value)) {emit(\"Bean\");}"
    }
  },
  "query": {
    "match": {
      "name.first": "Bean"
    }
  }
}'
```

Relates to #59332
2020-11-10 12:38:59 -05:00
Jake Landis 7dd57c9415
Introduce javaRestTest source set/task and convert modules (#59939)
Introduce a javaRestTest source set and task to complement the yamlRestTest.
javaRestTest differs such that the code is sourced from Java and may have
different dependencies and setup requirements for the test clusters. This also
allows the tests to run in parallel in different cluster instances to prevent any
cross test contamination between the two types of tests.

This PR also converts all :modules so that they no longer use the integTest task. The tests
are now driven by test, yamlRestTest, javaRestTest, and internalClusterTest.
Since only :modules (and :rest-api-spec) have been converted to yamlRestTest,
we can now disable the integTest task if either yamlRestTest or javaRestTest has
been applied. Once all projects are converted, we can delete the integTest task.

related: #56841
related: #59444
2020-07-21 17:17:17 -05:00
malpani 08de504b44
Support ignore_keywords flag for word delimiter graph token filter (#59563)
This commit allows customizing the word delimiter token filters to skip processing
tokens tagged as keyword, through the `ignore_keywords` flag that Lucene's
WordDelimiterGraphFilter already exposes.
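A heavily simplified sketch of the behavior: tokens carrying the keyword flag (the role Lucene's KeywordAttribute plays) pass through untouched, while the rest are split on non-alphanumeric characters. The class names are hypothetical, and real word delimiter filters also split on case changes and letter/digit transitions, which this sketch ignores:

```java
import java.util.ArrayList;
import java.util.List;

class KeywordAwareDelimiter {

    // Hypothetical token: its text plus a flag marking it as a keyword.
    static final class Token {
        final String text;
        final boolean keyword;

        Token(String text, boolean keyword) {
            this.text = text;
            this.keyword = keyword;
        }
    }

    // Splits each token on runs of non-alphanumeric characters, except that
    // keyword tokens are emitted unchanged when ignoreKeywords is true.
    static List<String> apply(List<Token> tokens, boolean ignoreKeywords) {
        List<String> out = new ArrayList<>();
        for (Token t : tokens) {
            if (ignoreKeywords && t.keyword) {
                out.add(t.text);
                continue;
            }
            for (String part : t.text.split("[^\\p{Alnum}]+")) {
                if (part.isEmpty() == false) { // skip the empty piece a leading delimiter produces
                    out.add(part);
                }
            }
        }
        return out;
    }
}
```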

Fix for #59491
2020-07-21 16:11:11 +01:00
Jake Landis ddd882b835
Convert modules to use yamlRestTest (#59089)
This commit moves the modules REST tests to the
newly introduced yamlRestTest source set. A few
tests have also been renamed to include the correct
IT suffix. Without the renames, the testing
conventions task would fail, since the YAML tests that
previously satisfied the convention are no longer present.
These tests have moved to the internalClusterTest
source set.

related: #56841
2020-07-13 11:32:42 -05:00
Jake Landis 333a5d8cdf
Create plugin for yamlTest task (#56841)
This commit creates a new Gradle plugin to provide a separate task name
and source set for running YAML based REST tests. The only project
converted to use the new plugin in this PR is distribution/archives/integ-test-zip,
whose tests have been moved to :rest-api-spec, since that makes the most
sense and avoids a small but awkward change to the distribution plugin.

The remaining cases in modules, plugins, and x-pack will be handled in followups.

This plugin is distinctly different from the plugin introduced in #55896 since
the YAML REST tests are intended to be black box tests over HTTP. As such they
should not (by default) have access to the classpath for that which they are testing.

The YAML based REST tests will be moved to separate source sets (yamlRestTest).
Which source set is the target for the test resources depends on whether this
new plugin is applied. If it is not applied, it will default to the test source
set.

Further, this introduces a breaking change for plugin developers that
use the YAML testing framework. They will now need to either use the new source set
and matching task, or configure the rest resources to use the old "test" source set that
matches the old integTest task. (The former should be preferred).

As part of this change (which is also breaking for plugin developers) the
rest resources plugin has been removed from the build plugin and now requires
either explicit application or application via the new YAML REST test plugin.

Plugin developers should be able to fix the breaking changes to the YAML tests
by adding `apply plugin: 'elasticsearch.yaml-rest-test'` and moving the YAML tests
under a yamlRestTest folder (instead of test).
2020-07-06 12:13:01 -05:00
Przemyslaw Gomulka 9bef31ccd3
Do not create two loggers for DeprecationLogger (#58435)
DeprecationLogger's constructor should not create two loggers. It was
taking parent logger instance, changing its name with a .deprecation
prefix and creating a new logger.
Most of the time the parent logger was not needed, and creating it caused Log4j
to unnecessarily cache the unused parent logger instance.
2020-06-29 13:38:21 +02:00
Przemyslaw Gomulka 4d6dc51c72
Header warning logging refactoring (#55941)
Splits DeprecationLogger into two: HeaderWarningLogger, responsible for adding response warning headers, and ThrottlingLogger, responsible for limiting duplicated log entries for the same key (previously deprecateAndMaybeLog).
Introduces a ThrottlingAndHeaderWarningLogger, which is a base for other common logging usages where both a response warning header and log throttling are needed.
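The throttling half can be pictured as a set of already-seen keys. This is a hypothetical illustration of the idea, not the ThrottlingLogger API; the class name, method names and the unbounded key set are assumptions (a production implementation would need to bound memory):

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical key-based throttling: a message is logged only the first
// time its key is seen. The key set is unbounded here for brevity.
class ThrottlingLoggerSketch {
    private final Set<String> seenKeys = new HashSet<>();
    private final List<String> output = new ArrayList<>();

    // Returns true if the message was logged, false if it was throttled.
    synchronized boolean log(String key, String message) {
        if (seenKeys.add(key) == false) {
            return false; // already logged once for this key
        }
        output.add(message);
        return true;
    }

    // Copy of everything logged so far.
    synchronized List<String> output() {
        return new ArrayList<>(output);
    }
}
```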

relates #55699
relates #52369
2020-06-01 15:44:01 +02:00
Tomasz Elendt 66ded59929
Support multiple tokens on LHS in stemmer_override rules (#56113) (#56484)
This commit adds support for rules with multiple tokens on the LHS, also
known as "contraction rules", to the stemmer override token
filter. Contraction rules are handy for translating multiple
inflected words into the same root form. One side effect of this change is
that it brings the stemmer override rule format closer to the synonym rule
format, which makes it easier to translate one into the other.

This change also makes the stemmer override rule parser stricter so
that it catches more errors which were previously accepted.
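A minimal parsing sketch, assuming the synonym-like `token1, token2 => stem` syntax. The class name and the specific validation checks are assumptions for illustration, not the filter's actual parser:

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical parser for rules such as "running, runs => run": every
// comma-separated token on the left maps to the single stem on the right.
class StemmerOverrideRules {

    static Map<String, String> parse(List<String> rules) {
        Map<String, String> stems = new HashMap<>();
        for (String rule : rules) {
            String[] sides = rule.split("=>", -1);
            // This sketch's own strictness checks: exactly one "=>" and a single non-empty stem.
            if (sides.length != 2) {
                throw new IllegalArgumentException("Invalid rule: [" + rule + "]");
            }
            String stem = sides[1].trim();
            if (stem.isEmpty() || stem.contains(",")) {
                throw new IllegalArgumentException("Invalid stem in rule: [" + rule + "]");
            }
            for (String lhs : sides[0].split(",", -1)) {
                String token = lhs.trim();
                if (token.isEmpty()) {
                    throw new IllegalArgumentException("Empty token in rule: [" + rule + "]");
                }
                stems.put(token, stem);
            }
        }
        return stems;
    }
}
```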

Closes #56113
2020-05-29 22:28:41 +02:00
Andrei Balici da31b4b83d
Add `max_token_length` setting to the CharGroupTokenizer (#56860)
Adds `max_token_length` option to the CharGroupTokenizer.
Updates documentation as well to reflect the changes.
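A simplified sketch of the setting's effect, assuming that (as in Lucene's CharTokenizer) a token reaching the limit is emitted and a new token started, rather than the extra characters being dropped. The class name is hypothetical and the setting's default value is not modeled:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Set;

class CharGroupSketch {

    // Splits on any character in `splitOn`; a token that reaches
    // maxTokenLength is emitted immediately and a new token begins.
    static List<String> tokenize(String text, Set<Character> splitOn, int maxTokenLength) {
        List<String> tokens = new ArrayList<>();
        StringBuilder current = new StringBuilder();
        for (char c : text.toCharArray()) {
            if (splitOn.contains(c)) {
                if (current.length() > 0) {
                    tokens.add(current.toString());
                    current.setLength(0);
                }
                continue;
            }
            current.append(c);
            if (current.length() == maxTokenLength) {
                tokens.add(current.toString());
                current.setLength(0);
            }
        }
        if (current.length() > 0) {
            tokens.add(current.toString());
        }
        return tokens;
    }
}
```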

Closes #56676
2020-05-20 14:15:57 +02:00
Amit Khandelwal 00fef6dfd3
Analysis enhancement - add preserve_original setting in ngram-token-filter (#55432)
2020-05-04 10:06:37 +01:00
Amit Khandelwal 9e41feda86
Expose `preserve_original` in `edge_ngram` token filter (#55766)
The Lucene `preserve_original` setting is currently not supported in the `edge_ngram`
token filter. This change adds it with a default value of `false`.
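A sketch of the semantics, assuming they follow Lucene's EdgeNGramTokenFilter, where the original term is kept only when it is shorter than `min_gram` or longer than `max_gram` (otherwise it is already one of the grams). The class and method names are hypothetical:

```java
import java.util.ArrayList;
import java.util.List;

class EdgeNGramSketch {

    // Emits prefixes of length minGram..maxGram; with preserveOriginal, the
    // whole term is also emitted when no gram already equals it.
    static List<String> edgeNGrams(String term, int minGram, int maxGram, boolean preserveOriginal) {
        List<String> out = new ArrayList<>();
        int upper = Math.min(maxGram, term.length());
        for (int len = minGram; len <= upper; len++) {
            out.add(term.substring(0, len));
        }
        if (preserveOriginal && (term.length() > maxGram || term.length() < minGram)) {
            out.add(term);
        }
        return out;
    }
}
```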

Closes #55767
2020-04-28 10:22:59 +02:00
Rory Hunter 8638d08ebf
Always use deprecateAndMaybeLog for deprecation warnings (#55115)
Closes #53137. Replace calls to deprecate(String,Object...) with deprecateAndMaybeLog(...), with an appropriate key, so that all messages
can potentially be deduplicated.
2020-04-16 16:19:45 +01:00
David Turner 6e98af385a
Add RepositoriesService to createComponents() args (#54814)
Today we pass the `RepositoriesService` to the searchable snapshots plugin
during the initialization of the `RepositoryModule`, forcing the plugin to be a
`RepositoryPlugin` even though it does not implement any repositories.

After discussion we decided it best for now to pass this in via
`Plugin#createComponents` instead, pending some future work in which plugins
can depend on services more dynamically.
2020-04-16 15:40:28 +01:00
Jason Tedor 95a7eed9aa
Rename MetaData to Metadata in all of the places (#54519)
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
2020-03-31 15:52:01 -04:00
Jake Landis afc2383b72
Optimize which Rest resources are used by the Rest tests. (#53299)
This should help with Gradle's incremental compile such that projects
only depend upon the resources they use.

related #52114
2020-03-18 09:09:29 -05:00
Jay Modi 0d1e67dbbb
Single instance of the IndexNameExpressionResolver (#52596)
This commit modifies the codebase so that our production code uses a
single instance of the IndexNameExpressionResolver class. This change
is being made in preparation for allowing name expression resolution
to be augmented by a plugin.

In order to remove some instances of IndexNameExpressionResolver, the
single instance is added as a parameter of Plugin#createComponents and
PersistentTaskPlugin#getPersistentTasksExecutor.
2020-02-20 15:04:45 -07:00
Adrien Grand 28e2f16734
Prepare backport of #51260. (#51876)
Backport: #51875
2020-02-05 11:02:46 +01:00
Adrien Grand d5bc6d6de0
Move analysis/mappings stats to cluster-stats. (#51260)
Closes #51138
2020-02-04 16:56:49 +01:00
Marios Trivyzas 24e1858a70
Fix caching for PreConfiguredTokenFilter (#50912)
The PreConfiguredTokenFilter#singletonWithVersion uses the version
internally for the token filter factories, but it registers only one
instance in the cache and not one instance per version. This can lead
to exceptions like the one described in #50734, since the singleton is
created and cached using the creation version of the first index
that is processed.

Remove the singletonWithVersion() methods and use the
elasticsearchVersion() methods instead.
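The fix amounts to caching one factory per index-creation version instead of one shared instance. A generic sketch of that pattern (the class name is hypothetical, and versions are simplified to plain ints):

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.IntFunction;

// Hypothetical per-version cache: a component built for one index version
// is never handed to an index created with a different version.
class PerVersionCache<T> {
    private final Map<Integer, T> cache = new ConcurrentHashMap<>();
    private final IntFunction<T> factory;

    PerVersionCache(IntFunction<T> factory) {
        this.factory = factory;
    }

    // Builds the instance for this version on first use, then reuses it.
    T get(int version) {
        return cache.computeIfAbsent(version, factory::apply);
    }
}
```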

Fixes: #50734
2020-01-16 12:04:14 +01:00
Christoph Büscher 9a4357ae04
Deprecate and remove camel-case nGram and edgeNGram tokenizers (#50862)
We already deprecated and removed the camel-case versions of the nGram and edgeNGram 
filters a while ago and we should do the same with the nGram and edgeNGram tokenizers.
This PR deprecates the use of these names in favour of ngram and edge_ngram in 7
and disallows usage in new indices starting with 8.

Closes #50561
2020-01-14 17:18:47 +01:00
Alan Woodward 736ed474e2
Check for deprecations when analyzers are built (#50908)
Generally speaking, deprecated analysis components in Elasticsearch will issue deprecation
warnings when they are first used. However, this means that no warnings are emitted when
indexes are created with deprecated components, and users have to actually index a document
to see warnings. This makes it much harder to see these warnings and act on them at
appropriate times.

This is worse in the case where components throw exceptions on upgrade. In this case, users
will not be aware of a problem until a document is indexed, instead of at index creation time.

This commit adds a new check that pushes an empty string through all user-defined analyzers
and normalizers when an IndexAnalyzers object is built for each index; deprecation warnings
and exceptions are now emitted when indexes are created or opened.
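The check itself is simple: run each analyzer once over empty input at build time, so any deprecation warning or exception fires immediately. A sketch with a hypothetical stand-in interface (not Lucene's Analyzer API):

```java
import java.util.List;
import java.util.function.Consumer;

class AnalyzerBuildCheck {

    // Hypothetical stand-in for an analyzer: analyzing text may report
    // deprecation warnings, or throw if a component is no longer allowed.
    interface Analyzer {
        void analyze(String text, Consumer<String> deprecationWarnings);
    }

    // Pushes an empty string through every analyzer when they are built, so
    // problems surface at index creation/open time, not at first indexing.
    static void checkOnBuild(List<Analyzer> analyzers, Consumer<String> deprecationWarnings) {
        for (Analyzer analyzer : analyzers) {
            analyzer.analyze("", deprecationWarnings);
        }
    }
}
```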

Fixes #42349
2020-01-14 13:12:25 +00:00
Alan Woodward 2ba5da2eca
Remove type parameter from CIR.mapping(type, object...) (#50739)
This commit removes the type parameter from `CreateIndexRequest.mapping(type, object...)`,
and the associated delegating method on `CreateIndexRequestBuilder`. To make migration
simpler, the method on `CreateIndexRequest` is renamed to `simpleMapping`, and
on `CreateIndexRequestBuilder` to `setMapping`; this should help the compiler catch all
necessary changes on upgrades.

Relates to #41059
2020-01-09 16:02:28 +00:00
Christoph Büscher 4b366a4cbb
Make Multiplexer inherit filter chains analysis mode (#50662)
Currently, if an updateable synonym filter is included in a multiplexer filter, it is not reloaded via the
_reload_search_analyzers API, because the multiplexer itself doesn't pass on the analysis mode of the
filters it contains, so it's not recognized as "updateable" itself. Instead we can check and merge
the AnalysisMode settings of all filters in the multiplexer and use the resulting mode (e.g. search-time
only) for the multiplexer itself, thus making any synonym filters contained in it reloadable.
This, of course, will also make the analyzers using the multiplexer usable at search time only.
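The merge can be sketched as a fold over the filters' modes. This is a hypothetical enum, not Elasticsearch's `AnalysisMode`; in particular, treating an index-time/search-time conflict as an error is an assumption of the sketch:

```java
import java.util.List;

enum AnalysisModeSketch {
    INDEX_TIME, SEARCH_TIME, ALL;

    // ALL is neutral; equal modes merge to themselves; INDEX_TIME and
    // SEARCH_TIME cannot be combined (assumed to be an error here).
    AnalysisModeSketch merge(AnalysisModeSketch other) {
        if (this == ALL) {
            return other;
        }
        if (other == ALL || other == this) {
            return this;
        }
        throw new IllegalStateException("Cannot merge " + this + " and " + other);
    }

    // Folds the modes of all filters in a multiplexer into a single mode.
    static AnalysisModeSketch mergeAll(List<AnalysisModeSketch> modes) {
        AnalysisModeSketch result = ALL;
        for (AnalysisModeSketch mode : modes) {
            result = result.merge(mode);
        }
        return result;
    }
}
```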

Closes #50554
2020-01-08 18:25:00 +01:00
Alan Woodward a59b065091
Remove type parameter from `CreateIndexRequest.mapping(type, XContentBuilder)` (#50586)
This continues the removal of type parameters from CreateIndexRequest.mapping
methods started in #50419. Here the removed methods are almost entirely in test
code, with the exception of a change to TransformIndex in the transform plugin.

Relates to #41059
2020-01-08 09:18:31 +00:00
Christoph Büscher 68f22faef9
Delete removed token filter names from SynonymsAnalysisTests (#50438)
The `testPreconfiguredTokenFilters` test refers to the `nGram` and `edgeNGram`
token filters, which are no longer part of the preconfigured token filters, so
they can be removed here as well.
2020-01-02 16:53:56 +01:00
Christoph Büscher c6f7166145
Throw Error on deprecated nGram and edgeNGram custom filters (#50376)
The camel-case `nGram` and `edgeNGram` filter names were deprecated in 6. We
currently throw errors on new indices when they are used. However these errors
are currently only thrown for pre-configured filters, adding them as custom
filters doesn't trigger the warning and error. This change adds the appropriate
exceptions for `nGram` and `edgeNGram` respectively.

Closes #50360
2019-12-20 18:01:05 +01:00
Stuart Tettemer cd721b6386
Scripting: ScriptFactory not required by compile (#50344)
Avoid backwards incompatible changes for 8.x and 7.6 by removing type
restriction on compile and Factory.  Factories may optionally implement
ScriptFactory.  If so, then they can indicate determinism and thus
cacheability.
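Together with the groundwork in #49895, the contract can be sketched as follows. The interface and class names are hypothetical stand-ins; only the `isResultDeterministic()` default of `false` and the optional implementation come from the commit messages:

```java
// Hypothetical stand-in for ScriptFactory: the conservative default says
// results are not deterministic, and therefore not cacheable.
interface ScriptFactorySketch {
    default boolean isResultDeterministic() {
        return false;
    }
}

class ScriptCachingPolicy {

    // Factories are not required to implement the interface; only those that
    // do, and that report determinism, are treated as cacheable.
    static boolean canCache(Object factory) {
        return factory instanceof ScriptFactorySketch
            && ((ScriptFactorySketch) factory).isResultDeterministic();
    }
}
```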

Relates: #49466
2019-12-19 10:14:28 -07:00
Stuart Tettemer 356d1a274e
Scripting: Groundwork for caching script results (#49895)
In order to cache script results in the query shard cache, we need to
check if scripts are deterministic.  This change adds a default method
to the script factories, `isResultDeterministic() -> false` which is
used by the `QueryShardContext`.

Script results were never cached and that does not change here.  Future
changes will implement this method based on whether the results of the
scripts are deterministic or not and therefore cacheable.

Refs: #49466
2019-12-06 13:09:44 -07:00
Christoph Büscher 249f5a28a0
Remove outdated Todo in CommonAnalysisPlugin (#49450)
2019-11-22 11:01:47 +01:00
Christoph Büscher ed86750fa4
Allow custom characters in token_chars of ngram tokenizers (#49250)
Currently the `token_chars` setting in both `edgeNGram` and `ngram` tokenizers
only allows for a list of predefined character classes, which might not fit
every use case. For example, including underscore "_" in a token would currently
require the `punctuation` class which comes with a lot of other characters.
This change adds an additional "custom" option to the `token_chars` setting,
which requires an additional `custom_token_chars` setting to be present and
which will be interpreted as a set of characters to include in a token.
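A sketch of the character test, assuming `custom` simply adds the characters of `custom_token_chars` to whichever classes are enabled. Only three classes are modeled, and all names and the error message are hypothetical:

```java
import java.util.Set;

class TokenCharsSketch {

    enum CharClass { LETTER, DIGIT, CUSTOM }

    // Mirrors the requirement that `custom` comes with a non-empty custom_token_chars.
    static void validate(Set<CharClass> tokenChars, String customTokenChars) {
        if (tokenChars.contains(CharClass.CUSTOM) && (customTokenChars == null || customTokenChars.isEmpty())) {
            throw new IllegalArgumentException("token_chars [custom] requires custom_token_chars");
        }
    }

    // A character belongs to a token if any enabled class accepts it.
    static boolean isTokenChar(int c, Set<CharClass> tokenChars, String customTokenChars) {
        return (tokenChars.contains(CharClass.LETTER) && Character.isLetter(c))
            || (tokenChars.contains(CharClass.DIGIT) && Character.isDigit(c))
            || (tokenChars.contains(CharClass.CUSTOM) && customTokenChars.indexOf(c) >= 0);
    }
}
```

With `letter` plus `custom` set to `_`, an underscore stays inside tokens without enabling the whole `punctuation` class.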

Closes #25894
2019-11-20 10:36:39 +01:00
gpaimla d1ea9910c3
Implement Lucene EstonianAnalyzer, Stemmer (#49149)
This PR adds a new analyzer and stemmer for the Estonian language.

Closes #48895
2019-11-18 17:19:54 +01:00
Rory Hunter 3a3e5f6176
Apply 2-space indent to all gradle scripts (#48849)
Closes #48724. Update `.editorconfig` to make the Java settings the default
for all files, and then apply a 2-space indent to all `*.gradle` files.
Then reformat all the files.
2019-11-13 10:14:04 +00:00
Rory Hunter cbfd61266e
Improve resiliency to auto-formatting in libs, modules (#48448)
Make a number of changes so that code in the `libs` and `modules`
directories is more resilient to automatic formatting. This covers:

* Format cipher lists vertically, instead of horizontally
* Remove string concatenation where JSON fits on a single line
* Move some comments around so they aren't auto-formatted to a strange
  place
2019-10-29 09:38:31 +00:00
Alan Woodward 750c6d8bb1
Remove Client.prepareIndex(index, type, id) method (#48443)
As types are no longer used in index requests, we can remove the type parameter
from `prepareIndex` methods in the `Client` interface. However, just changing the signature
of `prepareIndex(index, type, id)` to `prepareIndex(index, id)` risks confusion when
upgrading with the previous (now removed) `prepareIndex(index, type)` method -
just changing the dependency version of java code would end up silently changing the
semantics of the method call. Instead we should just remove this method entirely, and
replace it by calling `prepareIndex(index).setId(id)`
2019-10-25 11:09:52 +01:00
Alan Woodward c2a048b772
Reset Token position on reuse in scripted analysis (#47424)
Most of the information in AnalysisPredicateScript.Token is pulled directly
from its underlying AttributeSource, but we also keep track of the token position,
and this state is held directly on the Token. This information needs to be reset when
the containing ScriptFilteringTokenFilter or ScriptedConditionTokenFilter is re-used.

Fixes #47197
2019-10-02 11:19:25 +01:00
Tanguy Leroux b1a03a137f
Remove unused private methods and fields (#47115)
This commit removes a bunch of unused private fields and 
unused private methods from the code base.
2019-09-26 09:35:57 +02:00
Christoph Büscher bd25a52604
Enable reloading of synonym_graph filters (#45135)
Reloading of synonym_graph filters doesn't currently work because the search-time
AnalysisMode doesn't get propagated to the TokenFilterFactory emitted by the
graph filter's getChainAwareTokenFilterFactory() method. This change fixes that.

Closes #45127
2019-08-02 14:34:22 +02:00
Alan Woodward c8ae530e7a
Don't use index_phrases on graph queries (#44340)
Due to https://issues.apache.org/jira/browse/LUCENE-8916, when you
try to use a synonym filter with the index_phrases option on a text field,
you can end up with null values in a Phrase query, leading to weird
exceptions further down the querying chain. As a workaround, this commit
disables the index_phrases optimization for queries that produce token
graphs.

Fixes #43976
2019-07-17 16:08:28 +01:00
Alan Woodward 60b460d38a
Add name() method to TokenizerFactory (#43909)
This brings TokenizerFactory into line with CharFilterFactory and TokenFilterFactory,
and removes the need to pass around tokenizer names when building custom analyzers.

As this means that TokenizerFactory is no longer a functional interface, the commit also
adds a factory method to TokenizerFactory to make construction simpler.
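Because the factory now has two abstract methods, a lambda no longer fits, which is why a factory method helps. A sketch with hypothetical stand-in types (not the actual Elasticsearch or Lucene signatures):

```java
import java.util.function.Supplier;

class TokenizerFactorySketch {

    // Hypothetical stand-in for Lucene's Tokenizer.
    interface Tokenizer { }

    interface TokenizerFactory {
        // The factory now carries its own name.
        String name();

        Tokenizer create();

        // With two abstract methods this is no longer a functional interface,
        // so a static helper replaces lambda-based construction.
        static TokenizerFactory newFactory(String name, Supplier<Tokenizer> supplier) {
            return new TokenizerFactory() {
                @Override
                public String name() {
                    return name;
                }

                @Override
                public Tokenizer create() {
                    return supplier.get();
                }
            };
        }
    }
}
```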
2019-07-04 11:23:27 +01:00
Christoph Büscher 62d13e9468
Remove rests of StandardHtmlStripAnalyzer (#43485)
StandardHtmlStripAnalyzer has been deprecated in 6.x and cannot be used for new
indices from 7.0 on. This change removes it entirely, along with the tests
and deprecation logging that were still around during the 7.x
versions.
2019-06-28 11:25:51 +02:00
Christoph Büscher 56ee1a5e00
Allow reloading of search time analyzers (#43313)
Currently changing resources (like dictionaries, synonym files etc...) of search
time analyzers is only possible by closing an index, changing the underlying
resource (e.g. synonym files) and then re-opening the index for the change to
take effect.

This PR adds a new API endpoint that allows triggering reloading of certain
analysis resources (currently token filters) that will then pick up changes in
underlying file resources. To achieve this we introduce a new type of custom
analyzer (ReloadableCustomAnalyzer) that uses a ReuseStrategy that allows
swapping out analysis components. Custom analyzers that contain filters that are
marked as "updateable" will automatically choose this implementation. This PR
also adds this capability to `synonym` token filters for use in search time
analyzers.
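The core idea, swapping analysis resources in place instead of closing and reopening the index, can be sketched with an atomic reference. This is a hypothetical simplification, not the ReloadableCustomAnalyzer/ReuseStrategy mechanism the commit describes:

```java
import java.util.Set;
import java.util.concurrent.atomic.AtomicReference;
import java.util.function.Supplier;

// Hypothetical search-time resource held behind an atomic reference, so a
// reload publishes the new contents without closing anything.
class ReloadableSynonymSet {
    private final Supplier<Set<String>> loader; // e.g. something that re-reads a synonyms file
    private final AtomicReference<Set<String>> current;

    ReloadableSynonymSet(Supplier<Set<String>> loader) {
        this.loader = loader;
        this.current = new AtomicReference<>(loader.get());
    }

    boolean contains(String term) {
        return current.get().contains(term);
    }

    // Re-reads the underlying resource and atomically swaps in the result.
    void reload() {
        current.set(loader.get());
    }
}
```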

Relates to #29051
2019-06-27 18:27:11 +02:00
Alan Woodward 392245b45f
Remove preconfigured `delimited_payload_filter` (#43686)
#41560 removed the delimited_payload_filter as part of a general
cleanup of pre-version 7 restrictions, but missed removing the
preconfigured version due to #43684.
2019-06-27 14:42:27 +01:00
Alan Woodward fbefb4690e
Use preconfigured filters correctly in Analyze API (#43568)
When a named token filter or char filter is passed as part of an Analyze API
request with no index, we currently try to build the relevant filter using no
index settings. However, this can miss cases where there is a pre-configured
filter defined in the analysis registry. One example here is the elision filter, which
has a pre-configured version built with the French elision set; when used as part
of normal analysis, this preconfigured set is used, but when used as part of the
Analyze API we end up with NPEs because it tries to instantiate the filter with
no index settings.

This commit changes the Analyze API to check for pre-configured filters in the case
that the request has no index defined, and is using a name rather than a custom
definition for a filter.

It also changes the pre-configured `word_delimiter_graph` filter and `edge_ngram`
tokenizer to make their settings consistent with the defaults used when creating
them with no settings.

Closes #43002
Closes #43621 
Closes #43582
2019-06-27 09:01:53 +01:00
Alan Woodward d2c696d54b
Require [articles] setting in elision filter (#43083)
We should throw an exception at construction time if a list of
articles is not provided, otherwise we can get random NPEs during
indexing.

Relates to #43002
2019-06-27 08:56:26 +01:00