Commit Graph

22 Commits

Author SHA1 Message Date
Simon Cooper a830787b07
Bulk migration of Version.CURRENT for index created to IndexVersion.current() (#98490) 2023-08-15 13:47:27 +01:00
twosom c9f781f196
Cleanups in NoriTokenizerFactory and KuromojiTokenizerFactory (#92574)
Some code restructuring to improve readability.
2023-06-23 13:56:54 +02:00
Kiyoung's Noona e535df64c5
Fix typo in nori docs and tests (#92336)
C샤프 in English is C#, not C++.
(C++ in Korean is C플플, C쁠쁠 or C플러스플러스.)

The translation doesn't make sense, so I changed C++ to C#.
It might be true that the writer used C샤프 as just an independent example, regardless C++, but I think it is better to align them for better understanding.
2023-01-11 23:54:44 +01:00
Armin Braun 02568210ba
Don't extend AbstractIndexComponent in AbstractTokenFilter (#88113)
No need for this extension, we don't make use of the settings or deprecation logger
in production any more. Also, this slows down CS operations that require a
temporary index service which builds quite a bit slower when the loggers
need to be set up via reflective calls.
2022-06-28 12:13:36 +02:00
Armin Braun 7b916f2678
AbstractAnalyzerProvider does not need to extend AbstractIndexComponent (#86537)
Remove the inheritance here to make instances smaller and speed up many-shards benchmarks a little.
Did not remove the dead arguments from the constructors in this PR as that would have been a
very noisy change.
2022-05-08 22:34:52 +02:00
Przemyslaw Gomulka 037261356e
Convert 'id' and '_id' values in REST API tests to strings (#82681)
Follow-up from #77144 (comment) with converting id/_id to always be strings instead of integers. This makes the type value in the Elasticsearch specification be only string instead of string | number.

this change was generated using following command on ubuntu
find . -type f -name "*.yml" -print0 | xargs -0 sed -i -r 's/([^a-zA-Z0-9_\.]id|[^a-zA-Z0-9_]_id):(\s*)([0-9]+)/\1:\2"\3"/g'
2022-02-10 09:14:17 +01:00
Mark Vieira 12ad399c48 Reformat Elasticsearch source 2021-10-27 08:19:51 -07:00
Ryan Ernst ebf1ed7893
Fix split package in analysis-nori plugin (#78040)
The analysis-nori plugin reuses server the server package name for
analysis. This commit moves the plugin implementation to use a single
package name, o.e.p.analysis.nori
2021-09-21 08:11:29 -07:00
Mark Vieira a92a647b9f Update sources with new SSPL+Elastic-2.0 license headers
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:

 - Updating LICENSE and NOTICE files throughout the code base, as well
   as those packaged in our published artifacts
 - Update IDE integration to now use the new license header on newly
   created source files
 - Remove references to the "OSS" distribution from our documentation
 - Update build time verification checks to no longer allow Apache 2.0
   license header in Elasticsearch source code
 - Replace all existing Apache 2.0 license headers for non-xpack code
   with updated header (vendored code with Apache 2.0 headers obviously
   remains the same).
 - Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
2021-02-02 16:10:53 -08:00
Jake Landis 10be5d4c79
Convert most OSS plugins from integTest to [yaml | java]RestTest or internalClusterTest (#59444)
For all OSS plugins (except repository-* and discovery-*) integTest
task is now a no-op and all of the tests are now executed via a test,
yamlRestTest, javaRestTest, or internalClusterTest.

This commit does NOT convert the discovery-* and repository-* since they
are bit more complex then the rest of tests and this PR is large enough.
Those plugins will be addressed in a future PR(s).

This commit also fixes a minor issue that did not copy the rest api
for projects that only had YAML TEST tests.

related: #56841
2020-07-28 16:43:17 -05:00
Jason Tedor 95a7eed9aa
Rename MetaData to Metadata in all of the places (#54519)
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
2020-03-31 15:52:01 -04:00
Namgyu Kim 8d4ff290cc
Add nori_number token filter in analysis-nori (#53583)
This change adds the `nori_number` token filter.
It also adds a `discard_punctuation` option in nori_tokenizer that should be used in conjunction with the new filter.
2020-03-23 19:28:49 +01:00
Jim Ferenczi d66a307599
Add support for inlined user dictionary in the Kuromoji plugin (#45489)
This change adds a new option called user_dictionary_rules to
Kuromoji's tokenizer. It can be used to set additional tokenization rules
to the Japanese tokenizer directly in the settings (instead of using a file).
This commit also adds a check that no rules are duplicated since this is not allowed
in the UserDictionary.

Closes #25343
2019-08-20 15:06:01 +02:00
Alan Woodward 60b460d38a
Add name() method to TokenizerFactory (#43909)
This brings TokenizerFactory into line with CharFilterFactory and TokenFilterFactory,
and removes the need to pass around tokenizer names when building custom analyzers.

As this means that TokenizerFactory is no longer a functional interface, the commit also
adds a factory method to TokenizerFactory to make construction simpler.
2019-07-04 11:23:27 +01:00
Julie Tibshirani c2e9d13ebd
Default include_type_name to false in the yml test harness. (#38058)
This PR removes the temporary change we made to the yml test harness in #37285
to automatically set `include_type_name` to `true` in index creation requests
if it's not already specified. This is possible now that the vast majority of
index creation requests were updated to be typeless in #37611. A few additional
tests also needed updating here.

Additionally, this PR updates the test harness to set `include_type_name` to
`false` in index creation requests when communicating with 6.x nodes. This
mirrors the logic added in #37611 to allow for typeless document write requests
in test set-up code. With this update in place, we can remove many references
to `include_type_name: false` from the yml tests.
2019-02-01 11:44:13 -08:00
Colin Goodheart-Smithe 21e392e95e
Removes typed calls from YAML REST tests (#37611)
This PR attempts to remove all typed calls from our YAML REST tests. The PR adds include_type_name: false to create index requests that use a mapping and also to put mapping requests. It also removes _type from index requests where they haven't already been removed. The PR ignores tests named *_with_types.yml since this are specifically testing typed API behaviour.

The change also includes changing the test harness to add the type _doc to index, update, get and bulk requests that do not specify the document type when the test is running against a mixed 7.x/6.x cluster.
2019-01-30 16:32:58 +00:00
Jim Ferenczi a53e8653f2
Add support for inlined user dictionary in Nori (#36123)
Add support for inlined user dictionary in Nori

This change adds a new option called `user_dictionary_rules` to the
Nori a tokenizer`. It can be used to set additional tokenization rules
to the Korean tokenizer directly in the settings (instead of using a file).

Closes #35842
2018-12-07 15:26:08 +01:00
Jim Ferenczi 18866c4c0b
Make hits.total an object in the search response (#35849)
This commit changes the format of the `hits.total` in the search response to be an object with
a `value` and a `relation`. The `value` indicates the number of hits that match the query and the
`relation` indicates whether the number is accurate (in which case the relation is equals to `eq`)
or a lower bound of the total (in which case it is equals to `gte`).
This change also adds a parameter called `rest_total_hits_as_int` that can be used in the
search APIs to opt out from this change (retrieve the total hits as a number in the rest response).
Note that currently all search responses are accurate (`track_total_hits: true`) or they don't contain
`hits.total` (`track_total_hits: true`). We'll add a way to get a lower bound of the total hits in a
follow up (to allow numbers to be passed to `track_total_hits`).

Relates #33028
2018-12-05 19:49:06 +01:00
Armin Braun ed3b44fb4c
Handle TokenizerFactory TODOs (#32063)
* Don't replace Replace TokenizerFactory with Supplier, this approach was rejected in #32063 
* Remove unused parameter from constructor
2018-07-17 14:14:02 +02:00
Jim Ferenczi e852eb0548 Revert accidentally pushed changes in NoriAnalysisTests 2018-05-30 15:34:55 +02:00
Jim Ferenczi f582418ada Fix missing option serialization after backport
Relates #29465
2018-05-30 12:55:31 +02:00
Jim Ferenczi 891d3bd9c3
Expose the Lucene Korean analyzer module in a plugin (#30397)
This change adds a new plugin called `analysis-nori` that exposes
Korean text analysis in es using the new Lucene Korean analyzer module named (`nori`).
The plugin adds:
* a Korean analyzer: `nori`
* a Korean tokenizer: `nori_tokenizer`
* a part of speech stop filter: `nori_part_of_speech`
* a filter that can replace Hanja characters with their Hangul transcription: `nori_readingform`
2018-05-04 20:46:13 +02:00