898 Commits

Author SHA1 Message Date
Philippus Baalman fd6e8857bc
Mention `bbq_hnsw` for `m` and `ef_construction` options in docs (#117022) 2024-11-25 14:50:09 +01:00
István Zoltán Szabó 339e431081
[DOCS] Documents that ELSER is the default service for `semantic_text` (#115769) 2024-11-25 08:07:30 -05:00
shainaraskas 2d2ad00872
fix formatting errors (#116843) 2024-11-14 15:45:16 -05:00
kosabogi bada2a60ed
Updates chunk settings documentation (#116719) 2024-11-13 14:14:56 +01:00
István Zoltán Szabó 4058daf8b2
Revert "[DOCS] Documents that ELSER is the default service for `semantic_text…" (#115748)
This reverts commit 541bcf30e5.
2024-10-28 14:31:42 +01:00
shainaraskas 97ed0a93bb
Make a minor change to trigger release note process (#113975)
* changelog entry
2024-10-24 13:26:15 -04:00
István Zoltán Szabó 541bcf30e5
[DOCS] Documents that ELSER is the default service for `semantic_text` (#114615)
Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>
2024-10-24 08:53:12 +02:00
Salvatore Campagna f32051f462
fix: use setting instead of (#115193) 2024-10-22 11:09:19 +02:00
István Zoltán Szabó 1cae3c8361
[DOCS] Documents that dynamic templates are not supported by semantic_text. (#115195) 2024-10-21 12:51:10 +02:00
Salvatore Campagna ebd363d4af
Update synthetic source documentation (#112363)
* docs: update synthetic source docs

* fix: also doc values false works

* Revert "fix: also doc values false works"

This reverts commit 0895a76758.

* fix: update synthetic source documentation

* fix: all field types support it

* fix: no need to explicitly mention it

* fix: synthetic source sorting

* fix: may instead of might
2024-10-18 13:48:32 +02:00
Salvatore Campagna f6a1e36d6b
Replace usages of `_source.mode` in documentation (#114743)
We will deprecate the `_source.mode` mapping level configuration
in favor of the index-level `index.mapping.source.mode` setting.
As a result, we go through the documentation and update it to reflect
the introduction of the setting.
2024-10-16 16:17:41 +02:00
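
The change described in #114743 can be illustrated with a small, hedged sketch. It assumes a create-index request body shaped as a plain Python dict and moves a mapping-level `_source.mode` value to the index-level `index.mapping.source.mode` setting named in the commit. The helper name `move_source_mode_to_setting` and the body layout are illustrative assumptions, not part of Elasticsearch.

```python
import copy


def move_source_mode_to_setting(body):
    """Return a copy of `body` with mappings._source.mode moved to settings.

    Hypothetical helper: it only rewrites a dict and does not talk to a cluster.
    """
    result = copy.deepcopy(body)
    source = result.get("mappings", {}).get("_source", {})
    mode = source.pop("mode", None)
    if mode is None:
        # Nothing to migrate.
        return result
    if not source:
        # Drop the now-empty _source block from the mapping.
        del result["mappings"]["_source"]
    # Settings accept flat dotted keys; this key is the one named in the commit.
    result.setdefault("settings", {})["index.mapping.source.mode"] = mode
    return result


old_body = {
    "mappings": {
        "_source": {"mode": "synthetic"},
        "properties": {"message": {"type": "keyword"}},
    }
}
new_body = move_source_mode_to_setting(old_body)
```

After the call, `new_body` carries the mode under `settings` and `old_body` is left unchanged.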
Kostas Krikellas 8cf2cb35f6
Fix minor formatting issue (#114815)
The list with two options doesn't get rendered as a list, due to the
snippet in between.

https://www.elastic.co/guide/en/elasticsearch/reference/master/passthrough.html#passthrough-conflicts
2024-10-15 23:39:33 +11:00
Kostas Krikellas 4d775cba4f
Add documentation for passthrough field type (#114720)
* Guard second doc parsing pass with index setting

* add test

* updates

* updates

* merge

* Add documentation for passthrough field type

* Apply suggestions from code review

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

* updates

* updates

* Update docs/reference/mapping/types/passthrough.asciidoc

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

* address comment

* address comment

* Update docs/reference/mapping/types/passthrough.asciidoc

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

* address comment

---------

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>
2024-10-15 12:05:02 +02:00
Benjamin Trent 6c752abc23
Adding new bbq index types behind a feature flag (#114439)
Adds new index types `bbq_hnsw` and `bbq_flat`, which use the better binary quantization formats. They give a 32x reduction in memory with good recall properties.
2024-10-14 20:13:27 -04:00
Liam Thompson 1292580c03
[DOCS] Lookup runtime fields are now GA (#114221) 2024-10-07 14:52:42 +02:00
Simon Cooper 4ef5ea6d1c
Change default locale of date mappers to ENGLISH (#112799)
English does not change between the COMPAT and CLDR locale databases, whereas ROOT does.
2024-10-07 10:22:38 +01:00
Kostas Krikellas dd2024881d
Add object param for keeping synthetic source (#113690)
* Add object param for keeping synthetic source

* Update docs/changelog/113690.yaml

* fix merging

* add tests

* merge

* fix randomized tests

* add documentation

* dedup id in docs

* update documentation

* update documentation

* fix bwc

* fix bwc

* fix unintended

* Revert "fix bwc"

This reverts commit 18dc913eee.

* Revert "fix bwc"

This reverts commit f4ddb0e5e5.

* add missing test

* fix transform

* fix transform

* fix transform

* fix transform

* fix transform
2024-10-03 21:19:04 +03:00
István Zoltán Szabó b9adc701fa
[DOCS] Expands param descriptions for semantic_text (#114024)
Co-authored-by: Mike Pellegrini <mike.pellegrini@elastic.co>
2024-10-03 19:48:16 +02:00
Simon Cooper 31d50eed0f
Update 9.0 with various locale changes from 8.x (#113787) (#113870)
Forward-port changes from #113787, and update the docs with similar information to #113587
2024-10-02 11:41:33 +01:00
john-wagster 0fbb3bcb45
Updated Date Range to Follow Documentation When Assuming Missing Values (#112258)
* updated rangetype to be more in line with the docs (https://www.elastic.co/guide/en/elasticsearch/reference/current/query-dsl-range-query.html) and added tests to reflect as much
2024-10-01 09:21:47 -05:00
Kostas Krikellas c9f378da29
Revert "Apply auto-flattening to `subobjects: auto` (#112092)" (#113692)
* Revert "Apply auto-flattening to `subobjects: auto` (#112092)"

This reverts commit fffe8844

* fix DataGenerationHelper
2024-09-30 10:11:15 +03:00
István Zoltán Szabó 5e019998ef
[DOCS] Improves semantic text documentation. (#113606) 2024-09-26 16:09:28 +02:00
Kostas Krikellas fffe8844e9
Apply auto-flattening to `subobjects: auto` (#112092)
* Introduce mode `subobjects=auto` for objects

* Update docs/changelog/110524.yaml

* compilation error

* tests and fixes

* refactor

* spotless

* more tests

* fix nested objects

* fix test

* update fetch test

* add QA coverage

* update tests

* update tests

* update tests

* Apply auto-flattening to `subobjects: auto`

* Update docs/changelog/112092.yaml

* sync

* dont flatten subobjects auto

* refine test

* fix path for nested flattened objects and dynamic

* document `subobjects: auto`

* Apply suggestions from code review

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>

* comment updates

* restore indentation in comment

* update comment

* update comment

* update comment

* update comment

* rename isFlattenable

* add test for dynamic template

* fix copy_to and noop dynamic updates

* tests

* update comment

* fix tests

* update cluster feature in yaml test

* address comments

---------

Co-authored-by: Felix Barnsteiner <felixbarny@users.noreply.github.com>
2024-09-26 11:42:40 +03:00
Salvatore Campagna 208a1fe571
Introduce an `ignore_above` index-level setting (#113121)
Here we introduce a new index-level setting, `ignore_above`, similar to what we have
for `ignore_malformed`. The setting will apply to all `keyword`, `wildcard` and `flattened`
fields. Each field mapping will still be allowed to override the index-level setting using a
mapping-level `ignore_above` value.
2024-09-23 18:05:02 +02:00
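
The precedence rule stated in #113121 (a mapping-level `ignore_above` overrides the index-level value, and the index-level value applies only to `keyword`, `wildcard` and `flattened` fields) can be sketched as follows. This is a simplified model under stated assumptions, not the actual implementation: the function name `effective_ignore_above` and the dict-based field mapping are hypothetical.

```python
# Field types the commit says the index-level setting applies to.
APPLICABLE_TYPES = {"keyword", "wildcard", "flattened"}


def effective_ignore_above(field_mapping, index_level_value=None):
    """Resolve the ignore_above value for one field (illustrative model only).

    field_mapping: dict such as {"type": "keyword", "ignore_above": 256}
    index_level_value: the index-level ignore_above, or None if unset
    """
    if field_mapping.get("type") not in APPLICABLE_TYPES:
        # The index-level setting is not described as applying to other types.
        return None
    # A mapping-level value, when present, overrides the index-level one.
    return field_mapping.get("ignore_above", index_level_value)
```

For example, with an index-level value of 1024, a `keyword` field that declares `ignore_above: 256` resolves to 256, while a `wildcard` field with no override resolves to 1024.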
Felix Barnsteiner 8d223cbf7a
Add support for multi-value dimensions (#112645)
Closes https://github.com/elastic/elasticsearch/issues/110387

Having this in now affords us not having to introduce version checks in
the ES exporter later. We can simply use the same serialization logic
for metric attributes as we do for other signals. This also enables us
to properly map `*.ip` fields to the ip field type as ip fields
containing a list of IPs are not converted to a comma-separated list.
2024-09-23 17:31:18 +10:00
Stef Nestor a4dba7db8d
(Doc+) Sparse Vectors NA to mapping analyzers (#112523)
* retry
2024-09-05 09:19:19 -06:00
Simon Cooper a36d90cf34
Use CLDR locale provider on JDK 23+ (#110222)
JDK 23 removes the COMPAT locale provider, leaving CLDR as the only option. This commit configures Elasticsearch
to use the CLDR provider when on JDK 23, but still use the existing COMPAT provider when on JDK 22 and below.

This causes some differences in locale behaviour; this also adapts various tests to still work whether run on COMPAT or CLDR.
2024-09-04 13:42:40 +01:00
Ignacio Vera 3747765ab8
[DOC] geo_shape field type supports geo_hex aggregation (#112448) 2024-09-04 11:12:11 +02:00
István Zoltán Szabó 2c29a3ae0a
[DOCS] Highlights auto-chunking in intro of semantic text. (#111836) 2024-08-29 12:43:10 +02:00
Liam Thompson 4034615e29
[DOCS] Clarify copy_to behavior with strict dynamic mappings (#111408)
* [DOCS] Clarify copy_to behavior with strict dynamic mappings

* Add id

* De-verbosify

* Delete pesky comma

* More info about root and nest

* Fixes per review, clarify non-recursive explanation

* Skip tests for illustrative example

* Fix example syntax

* Fix typo
2024-08-01 14:37:17 +02:00
Felix Barnsteiner 3090438037
Add support for boolean dimensions (#111457)
Closes #111338
2024-07-31 23:00:32 +10:00
István Zoltán Szabó 1a5b008921
[DOCS] Clarifies semantic query behavior on sparse and dense vector fields (#111339)
* [DOCS] Clarifies semantic query behavior on sparse and dense vector fields.

* [DOCS] Adds a NOTE to the semantic query docs.
2024-07-26 16:53:38 +02:00
Carlos Delgado ff3a77ca46
Clarify some semantic_text docs (#111329) 2024-07-26 16:45:29 +02:00
István Zoltán Szabó 22ead8d106
[DOCS] Documents automatic text chunking behavior for semantic text. (#111331) 2024-07-26 12:02:47 +02:00
Tommaso Teofili 9b86fd17aa
Document how to update dense vector field type (#111038) 2024-07-23 09:55:31 +02:00
Ioana Tagirta e99aaad800
Document how to query for a specific feature within rank_features (#110749) 2024-07-11 16:19:14 +02:00
Oleksandr Kolomiiets 276ae121c2
Reflect latest changes in synthetic source documentation (#109501) 2024-07-04 09:48:04 -07:00
Carlos Delgado 30b32b6a46
semantic_text: Updated copy-to docs (#110350) 2024-07-03 10:18:40 +02:00
Kathleen DeRusso 7a1d532ffb
Pass over Sparse Vector docs for correctness (#110282)
* Remove legacy mentions of text expansion queries

* Add missing query_vector param to sparse_vector query docs

* Fix formatting errors in sparse vector query dsl doc

* Remove unnecessary test setup block
2024-07-02 13:37:25 -04:00
Felix Barnsteiner cdbe092d90
Update docs now that keyword dimensions support ignore_above (#110385)
This is a follow-up from https://github.com/elastic/elasticsearch/pull/110337
2024-07-02 17:04:57 +02:00
Benjamin Trent 5add44d7d1
Adds new `bit` element_type for dense_vectors (#110059)
This commit adds `bit` vector support by adding `element_type: bit` for
vectors. This new element type works for indexed and non-indexed
vectors. Additionally, it works with `hnsw` and `flat` index types. No
quantization based codec works with this element type, this is
consistent with `byte` vectors.

`bit` vectors accept up to `32768` dimensions in size and expect vectors
that are being indexed to be encoded either as a hexadecimal string or a
`byte[]` array where each element of the `byte` array represents `8`
bits of the vector.

`bit` vectors support script usage and regular query usage. When
indexed, all comparisons done are `xor` and `popcount` summations (aka,
hamming distance), and the scores are transformed and normalized given
the vector dimensions. Note, indexed bit vectors require `l2_norm` to be
the similarity.

For scripts, `l1norm` is the same as `hamming` distance and `l2norm` is
`sqrt(l1norm)`. `dotProduct` and `cosineSimilarity` are not supported.

Note that the number of dimensions for this element_type must always be
divisible by `8`, and the `byte[]` vectors provided for indexing must
have size `dim/8`, where each byte element represents `8` bits of
the vector.

closes: https://github.com/elastic/elasticsearch/issues/48322
2024-06-27 04:48:41 +10:00
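
The encoding and distance described in #110059 can be sketched in a few lines. This is a minimal illustration, not Elasticsearch code. It packs a list of bits into `dim/8` bytes, renders them as a hexadecimal string, and computes the hamming distance with `xor` and a popcount. The function names and the most-significant-bit-first packing order are assumptions made for the example.

```python
import math


def pack_bits(bits):
    """Pack 0/1 values into bytes, 8 bits per byte (MSB-first order is assumed)."""
    if len(bits) % 8 != 0:
        raise ValueError("bit vector dimensions must be divisible by 8")
    out = bytearray()
    for i in range(0, len(bits), 8):
        byte = 0
        for bit in bits[i:i + 8]:
            byte = (byte << 1) | (bit & 1)
        out.append(byte)
    return bytes(out)


def hamming(a, b):
    """Hamming distance between two packed vectors: popcount of the xor."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same number of bytes")
    return sum(bin(x ^ y).count("1") for x, y in zip(a, b))


# A 16-dimension bit vector becomes 16 / 8 = 2 bytes.
v1 = pack_bits([1, 0, 1, 0, 1, 0, 1, 0, 1, 1, 1, 1, 0, 0, 0, 0])
v2 = pack_bits([0] * 16)

hex_form = v1.hex()             # hexadecimal string form of the packed vector
l1 = hamming(v1, v2)            # per the commit, l1norm equals hamming distance
l2 = math.sqrt(l1)              # per the commit, l2norm is sqrt(l1norm)
```

Here `v1` packs to the two bytes `0xAA 0xF0`, so `hex_form` is `"aaf0"` and the hamming distance to the all-zero vector is 8.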
Mayya Sharipova 5c87eef89d
[DOCS] Vectors with cosine automatically normalized (#110071)
PR #99445 introduced automatic normalization of dense vectors with
cosine similarity. This adds a note about this in the documentation.

Relates to #99445
2024-06-22 22:32:25 +10:00
Oleksandr Kolomiiets 8bc5ecdc31
Support synthetic source together with ignore_malformed in histogram fields (#109882) 2024-06-20 09:09:45 -07:00
Oleksandr Kolomiiets 5440f178aa
Support synthetic source for geo_point when ignore_malformed is used (#109651) 2024-06-18 08:37:27 -07:00
Benjamin Trent 3aed0afb2b
Add new int4 quantization to dense_vector (#109317)
This adds a new quantization mechanism for HNSW and flat indices. Here
we add `int4` quantization via the `int4_hnsw` and `int4_flat` index
types. This quantization methodology further reduces the memory required
for fast HNSW, meaning that the memory required is 8x smaller than with
regular float32 values. 

An 8x reduction means that 1M 1024-dimension vectors go from requiring
3.8GB to 477MB.

Recall stays steady; there is some reduction, but it is recoverable by
slightly oversampling and reranking. For example, over 500k CohereV3
vectors, only 5 extra vectors need to be gathered to achieve over 0.98
recall in a brute-force scenario.

![recall](https://github.com/elastic/elasticsearch/assets/4357155/b47a79d0-020d-4baa-8199-41a932df00f7)
2024-06-18 00:15:43 +10:00
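
The memory figures quoted in #109317 can be checked with a short calculation. The sketch below counts raw vector data only, which is an assumption: it ignores HNSW graph overhead and any per-vector metadata. The helper name `raw_vector_bytes` is hypothetical.

```python
GIB = 2 ** 30


def raw_vector_bytes(num_vectors, dims, bits_per_dim):
    """Bytes needed to store the vector components alone (no index overhead)."""
    return num_vectors * dims * bits_per_dim // 8


float32_bytes = raw_vector_bytes(1_000_000, 1024, 32)  # 4 bytes per dimension
int4_bytes = raw_vector_bytes(1_000_000, 1024, 4)      # half a byte per dimension

print(f"float32: {float32_bytes / GIB:.2f} GiB")  # prints "float32: 3.81 GiB"
print(f"int4:    {int4_bytes / GIB:.3f} GiB")     # prints "int4:    0.477 GiB"
print(f"ratio:   {float32_bytes // int4_bytes}x")  # prints "ratio:   8x"
```

Read in binary units, the results come to roughly 3.8 GiB and 0.477 GiB, which lines up with the 3.8GB and 477MB quoted in the commit message.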
Carlos Delgado d10dfb4ac5
Add limitations section to semantic_text field type docs (#109666) 2024-06-13 15:19:00 +02:00
Oleksandr Kolomiiets c847235ed0
Support synthetic source for scaled_float and unsigned_long when ignore_malformed is used (#109506) 2024-06-12 11:05:23 -07:00
Benjamin Trent 29288d6590
Merge remote-tracking branch 'upstream/main' into lucene_snapshot_9_11 2024-06-11 06:54:23 -04:00
Carlos Delgado d975997a3a
Add semantic-text warning about inference endpoints removal (#109561) 2024-06-11 18:33:25 +10:00
Oleksandr Kolomiiets a9f31bd2aa
Support synthetic source for date fields when ignore_malformed is used (#109410) 2024-06-10 10:26:31 -07:00