Today there are request and response objects for the
get-cluster-settings action but the request is unused and the response
is only used in the REST layer. This commit removes the unused request
and renames the response to reflect that it's not a transport-layer
response. It also tidies a few things up in this area, removing the
unused `ActionResponse` superclass, making its fields final, and
replacing the overly-general `RestBuilderListener` with a regular
`RestToXContentListener` in the REST action.
Relates #82342 because to resolve that issue we will want to introduce
transport-layer request/response classes, and the classes involved in
this commit are in the way of that change.
A must_not clause containing a match_all query inside a bool query is currently not rewritten to a match_none query. This means that running a bool query with `"must_not": [{"terms": {"_tier": ["data_frozen", "data_cold"]}}]` is currently not rewritten to match_none on a cold or frozen tier node.
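For illustration (the index name is hypothetical), this is the shape of query involved:
```
GET /my-index/_search
{
  "query": {
    "bool": {
      "must_not": [
        { "terms": { "_tier": ["data_frozen", "data_cold"] } }
      ]
    }
  }
}
```
On a node holding only cold or frozen tier shards the `terms` query on `_tier` rewrites to `match_all`, so the enclosing `bool` query should rewrite to `match_none` and the shard can be skipped.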
The ESClientYamlSuiteTestCase is used to run yaml tests throughout
Elasticsearch. It utilizes the low level rest client in sniffing for
nodes, but the sniffer is not needed anywhere else in the test
framework.
This commit creates a new project, `:test:rest-runner` which is meant to
house the rest test running infrastructure. This has two purposes. First
is to remove the sniffer from the test framework dependencies, because
it transitively depends on Jackson. Second is to set up the runner for
future refactorings where it could be made to not depend on the entire
test framework, though how that could work is left for the future.
This adds support for GET and DELETE, the `ids` query, and
Elasticsearch's standard document versioning to TSDB. So you can do
things like:
```
POST /tsdb_idx/_doc?filter_path=_id
{
"@timestamp": "2021-12-29T19:25:05Z", "uid": "adsfadf", "v": 1.2
}
```
That'll return `{"_id" : "BsYQJjqS3TnsUlF3aDKnB34BAAA"}` which you can turn
around and fetch with
```
GET /tsdb_idx/_doc/BsYQJjqS3TnsUlF3aDKnB34BAAA
```
just like any other document in any other index. You can delete it too!
Or fetch it.
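For example, an illustrative sketch reusing the id returned above, fetching via the `ids` query and deleting by id:
```
GET /tsdb_idx/_search
{
  "query": { "ids": { "values": ["BsYQJjqS3TnsUlF3aDKnB34BAAA"] } }
}

DELETE /tsdb_idx/_doc/BsYQJjqS3TnsUlF3aDKnB34BAAA
```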
The ID comes from the dimensions and the `@timestamp`. So you can
overwrite the document:
```
POST /tsdb_idx/_bulk
{"index": {}}
{"@timestamp": "2021-12-29T19:25:05Z", "uid": "adsfadf", "v": 1.2}
```
Or you can write only if it doesn't already exist:
```
POST /tsdb_idx/_bulk
{"create": {}}
{"@timestamp": "2021-12-29T19:25:05Z", "uid": "adsfadf", "v": 1.2}
```
This works by generating an id from the dimensions and the `@timestamp`
when parsing the document. The id looks like:
* 4 bytes of hash from the routing calculated from routing_path fields
* 8 bytes of hash from the dimensions
* 8 bytes of timestamp
All that's base 64 encoded so that `Uid` can chew on it fairly
efficiently.
When it comes time to fetch or delete documents we base 64 decode the id
and grab the routing from the first four bytes. We use that hash to pick
the shard. Then we use the entire ID to perform the fetch or delete.
We don't implement update actions because we haven't written the
infrastructure to make sure the dimensions don't change. It's possible
to do, but feels like more than we need now.
There are a *ton* of compromises with this. The long term sad thing is that it
locks us into *indexing* the id of the sample. It'll index fairly
efficiently because each time series will have the same first eight
bytes. It's also possible we'd share many of the first few bytes in the
timestamp as well. In our tsdb rally track this costs 8.75 bytes per
document. It's substantial, but not overwhelming.
In the short term there are lots of problems that I'd like to save for a
follow up change:
1. ~~We still generate the automatic `_id` for the document but we don't use
it. We should stop generating it.~~ Included in this PR based on review comments.
2. We generate the time series `_id` on each shard and when replaying
the translog. It'd be the good kind of paranoid to generate it once
on the primary and then keep it forever.
3. We have to encode the `_id` as a string to pass it around
Elasticsearch internally. And Elasticsearch assumes that when an id
   is loaded we always store it as bytes encoded by `Uid` - which *does*
have nice encoding for base 64 bytes. But this whole thing requires
us to make the bytes, base 64 encode them, and then hand them back to
`Uid` to base 64 decode them into bytes. It's a bit hacky. And, it's
a small thing, but if the first byte of the routing hash encodes to
   254 or 255 then `Uid` spends an extra byte to encode it. One that'll
always be a common prefix for tsdb indices, but still, it hurts my
heart. It's just hard to fix.
4. We store the `_id` in Lucene stored fields for tsdb indices. Now
that we're building it from the dimensions and the `@timestamp` we
really don't *need* to store it. We could recalculate it when fetching
   documents. In the tsdb rally track this'd save us 6 bytes per document
at the cost of marginally slower fetches. Which is *fine*.
5. There are several error messages that try to use `_id` right now
during parsing but the `_id` isn't available until after the parsing
is complete. And, if parsing fails, it may not be possible to know
the id at all. All of these error messages will have to change,
at least in tsdb mode.
6. ~~If you specify an `_id` on the request right now we just overwrite
it. We should send you an error.~~ Included in this PR after review comments.
7. We have to entirely disable the append-only optimization that allows
Elasticsearch to skip looking up the ids in lucene. This *halves*
indexing speed. It's substantial. We have to claw that optimization
back *somehow*. Something like sliding bloom filters or relying on
the increasing timestamps.
8. We parse the source from json when building the routing hash when
   parsing fields. We should just build it from the parsed field values.
It looks like that'd improve indexing speed by about 20%.
9. Right now we write the `@timestamp` little endian. This is likely bad for
   the prefix encoded inverted index. It'll prefer big endian. Might shrink it.
10. Improve error message on version conflict to include tsid and timestamp.
11. Improve error message when modifying dimensions or timestamp in update_by_query
12. Make it possible to modify dimension or timestamp in reindex.
13. Test TSDB's `_id` in `RecoverySourceHandlerTests.java` and `EngineTests.java`.
I've had to make some changes as part of this that don't feel super
expected. The biggest one is changing `Engine.Result` to include the
`id`. When the `id` comes from the dimensions it is calculated by the
document parsing infrastructure, which happens in
`IndexShard#prepareIndex` and returns an `Engine.IndexResult`. To make
everything clean I made it so `id` is available on all `Engine.Result`s
and I made all of the "outer results classes" read from
`Engine.Results#id`. I'm not excited by it. But it works and it's what
we're going with.
I've opted to create two subclasses of `IdFieldMapper`, one for standard
indices and one for tsdb indices. This feels like the right way to
introduce the distinction, especially if we don't want tsdb to carry
around its old fielddata support. Honestly if we *need* to aggregate on
`_id` in tsdb mode we have doc values for the `_tsid` and the
`@timestamp` - we could build doc values for `_id` on the fly. But I'm
not expecting folks will need to do this. Also! I'd like to stop storing
tsdb's `_id` field (see number 4 above) and the new subclass feels like
a good place to put that too.
Term queries can in certain circumstances (eg when run against constant keyword
fields) rewrite themselves to match_no_docs queries, which is very useful for filtering
out shards from searches and field_caps requests. But match and match_phrase
queries can reduce down to simple term queries when there is no fuzziness defined
on them, and when they are run using a keyword analyzer.
This commit makes simple match and match_phrase rewrite themselves to term
queries when run against keyword fields.
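As a sketch of the effect (index and field names are hypothetical; assumes `event.module` is mapped as a `constant_keyword` field):
```
GET /logs-*/_search
{
  "query": { "match": { "event.module": "nginx" } }
}
```
With no fuzziness and a keyword analyzer the match query reduces to a term query, which can then rewrite to a match_no_docs query on shards whose constant value isn't `nginx`, so those shards can be filtered out.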
Fixes #82515
Our bwc tests now always upgrade through 7.last.
This PR fixes a race condition related to potentially
seeing a 7.last index when we upgrade from 7.non-last.
The ES code base is quite JSON heavy. It uses a lot of multi-line JSON requests in tests which need to be escaped and concatenated which in turn makes them hard to read. Let's try to leverage Java 15 text blocks for representing them.
This donates the `MapMatcher` project to Elasticsearch. We want it in
our code base so we can more easily maintain it and don't have to wait
for a release. It really isn't used outside of Elasticsearch so it isn't
really useful to keep it separate. We considered removing it entirely
but still think that it's mostly an improvement on
`NotEqualMessageBuilder` and the built-in `equalTo`.
Make the `index.time_series.start_time` and `index.time_series.end_time` settings required in tsdb indices.
This changes many tests that create a time_series index, adding the `index.time_series.start_time` and `index.time_series.end_time` settings to them.
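For reference, a minimal sketch of creating a tsdb index with these settings (the index name, field names, timestamp values, and the `time_series_dimension` mapping shown here are illustrative):
```
PUT /tsdb_idx
{
  "settings": {
    "index.mode": "time_series",
    "index.routing_path": ["uid"],
    "index.time_series.start_time": "2021-12-29T00:00:00Z",
    "index.time_series.end_time": "2021-12-30T00:00:00Z"
  },
  "mappings": {
    "properties": {
      "@timestamp": { "type": "date" },
      "uid": { "type": "keyword", "time_series_dimension": true },
      "v": { "type": "double" }
    }
  }
}
```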
This PR adds support for a field named `_tsid` that uniquely identifies the time series a document belongs to.
When a document is indexed into a time series index (IndexMode.TIME_SERIES), the `_tsid` field is generated from the values of all dimension fields.
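As an illustration (the index name is hypothetical, and this assumes `_tsid` is aggregatable via its doc values), `_tsid` can be used to group documents by time series, for example in a terms aggregation:
```
GET /tsdb_idx/_search
{
  "size": 0,
  "aggs": {
    "per_time_series": {
      "terms": { "field": "_tsid" }
    }
  }
}
```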
When configuring BWC tests to use feature flags, we need to ensure we
don't inadvertently enable the flag on the old version nodes, as those
are NOT release builds and therefore fail if the feature flag is
present. This commit tweaks our config here to ensure we explicitly
enable the flag only on the upgraded nodes. Closes #78219
* Route documents to the correct shards in tsdb
This causes elasticsearch to land documents from the same time series on
the same shard. It does so by adding a new index setting `routing_path`
which must be set when an index is in `mode: time_series` and may not be
set outside of that mode. That setting contains a list of patterns to
extract from the `_source` document that are hashed into the routing
value.
* Moar skip
* tsdb survives full cluster restart
* Remove auto generated id rejection
We do want to reject these documents but let's save that for a follow up
change.
* Simplify
* Forbid routing_required
* Small
* Fork fork knife
* Let failures flow
* Fix full cluster
* Always fork
* Retry?
* Remove pressure test arm
* Add missing settings
* WIP
* Remove leftover
* More cleaning
* Fixup more tests
* Remove old skip
* New tests for Retry
* More tests
* Revert unrelated
* One dispatch please
* Stuff moved
* More moving
* Explain why fork
* Back to ActionRunnable
* Update comment
* Utility method
* Move routing_path under feature flag
* Imports
Fix the split package org.elasticsearch.common.xcontent, between server and the x-content lib. Move the x-content lib exported package from org.elasticsearch.common.xcontent to org.elasticsearch.xcontent (following the naming convention of similar libraries). Removing split packages is a prerequisite to modularization.
* Do not create unused testCluster
This avoids creating test clusters that are not required during the build.
We use lazy configuration here on testClusters and only instantiate them as they're required.
* Do not fail on run task (debug)
* Create more test cluster lazy
* Make more test cluster lazy
* Avoid creating unused testcluster
* Fix PluginBuildPlugin
* Fix disabling geo db download
* Fix cluster setup in repository-multi-version
* Polishing
* Fix issue with erratic groovy logic
* Fix bwc tests
* Fix more bwcTests
* Fix more bwc tests
* Fix more bwc tests
* Fix more bwc tests
* Fix typo
* Minor polishing
* Fix rolling upgrade tests
* Fix cluster config in sql qa mixedcluster project
* Fix more bwc tests
* Clean up before review
* Document test cluster usage
* Api polishing after review
provide useCluster(Provider) method to TestClusterAware
Ideally we take this a step further and realize those test clusters only on use.
But out of scope of this PR.
* Allow gradle provider as value for nonSystemProperties
* Some simplification on test configuration
* Fix typo in rest test config
* Fix more typos
* Fix another typo
* Fix more typos
synced flush is going to be replaced by flush. This commit allows the synced_flush api only in v7 compatibility mode.
Worth noting - sync_id is gone and won't be available in v7 responses from indices.stats
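A minimal sketch of calling the legacy endpoint through the compatibility layer (the index name is illustrative; assumes the standard REST API compatibility `Accept` media type):
```
POST /my-index/_flush/synced
Accept: application/vnd.elasticsearch+json;compatible-with=7
```
Without the compatibility header the endpoint is no longer available.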
relates removal pr #50882
relates #51816
This is related to #74755. The test to check that compression settings
are compatible after upgrade uses the cluster.remote prefix for remote
cluster settings. These settings did not exist in early versions of 6.x.
This commit adds an assume true to ensure that the test is not run
against older versions.
With the overall theme of trying to configure and add less to the build instead of just disabling it later,
we're replacing standalone-test with standalone-rest-test, which avoids creating the
unused test tasks.
The standalone rest test plugin and the other rest test plugins behave a little differently in how source sets and test tasks are wired.
The standalone rest test plugin assumes that all RestTestTasks use the same sourceSet (test). The yaml and java rest test plugins use one dedicated sourceSet per test task.
In the long run we will probably migrate standalone-rest-test usages to one of the other plugins and deprecate standalone-rest-test.
This commit is related to #73497. It adds two new settings. The first setting
is transport.compression_scheme. This setting allows the user to
configure LZ4 or DEFLATE as the transport compression. Additionally, it
modifies transport.compress to support the value indexing_data. When
this setting is set to indexing_data only messages which are primarily
composed of raw source data will be compressed. This is bulk, operations
recovery, and shard changes messages.
When libs/core was created, several classes were moved from server's
o.e.common package, but they were not moved to a new package. Split
packages need to go away long term, so that Elasticsearch can even think
about modularization. This commit moves all the classes under o.e.common
in core to o.e.core.
relates #73784
Extract usage of internal API from TestClustersPlugin and PluginBuildPlugin and related plugins and build logic
This includes a refactoring of ElasticsearchDistribution to handle types
better, in a way that lets us differentiate between Elasticsearch
distribution types supported in TestClustersPlugin and types only supported
in internal plugins.
It also introduces a set of internal versions of public plugins.
As part of this we also generate the plugin descriptors now.
As a follow up on this we can actually move these publicly used classes into
an extra project (declared as an included build).
We keep LoggedExec and VersionProperties effectively public, and add a workaround for RestTestBase.
Script plugins cannot apply plugins and therefore won't work with porting
buildSrc to an included build as we plan. Therefore we take advantage
of moving our script plugins into precompiled script plugins.
As a limitation of this we ran into problems applying binary plugins
from script plugins and for now moved this out of those scripts.
Related to #71593 we move all build logic that is for elasticsearch build only into
the org.elasticsearch.gradle.internal* packages
This makes it clearer if build logic is considered to be used by external projects
Ultimately we want to only expose TestCluster and PluginBuildPlugin logic
to third party plugin authors.
This is a very first step towards that direction.
* Warn users if security is implicitly disabled
Elasticsearch has security features implicitly disabled by default for
Basic and Trial licenses, unless explicitly set in the configuration
file.
This may be good for onboarding, but it also leads to unintended insecure
clusters.
This change introduces clear warnings when security features are
implicitly disabled.
- a warning header in each REST response if security is implicitly
disabled;
- a log message during cluster boot.
This reduces the ceremony of declaring test artifacts for a project.
It also solves an issue with the usage of the deprecated testRuntime
configuration that testArtifacts extended from, which seems not required at
all and would have broken with Gradle 7.0 anyhow.
Test artifact resolution is now variant aware which allows us a more adequate
compile and runtime classpath for the consuming projects.
We also introduce a convention method in the Elasticsearch build to declare
test artifact dependencies in an easy way, close to how it's done by the
built-in Gradle test fixtures plugin.
Furthermore we cleaned up some inconsistent test dependency declarations where
projects relied both on a project and on its test artifacts.
As per the new licensing change for Elasticsearch and Kibana this commit
moves existing Apache 2.0 licensed source code to the new dual license
SSPL+Elastic license 2.0. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:
- Updating LICENSE and NOTICE files throughout the code base, as well
as those packaged in our published artifacts
- Update IDE integration to now use the new license header on newly
created source files
- Remove references to the "OSS" distribution from our documentation
- Update build time verification checks to no longer allow Apache 2.0
license header in Elasticsearch source code
- Replace all existing Apache 2.0 license headers for non-xpack code
with updated header (vendored code with Apache 2.0 headers obviously
remains the same).
- Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
This PR adjusts the BWC tests to handle the deprecation warnings that are now emitted in old clusters due to the backport of system index access deprecation warnings.
This PR adds deprecation warnings when accessing System Indices via the REST layer. At this time, these warnings are only enabled for Snapshot builds by default, to allow projects external to Elasticsearch additional time to adjust their access patterns.
Deprecation warnings will be triggered by all REST requests which access registered System Indices, except for purpose-specific APIs which access System Indices as an implementation detail, and a few specific APIs which will continue to allow access to system indices by default:
- `GET _cluster/health`
- `GET {index}/_recovery`
- `GET _cluster/allocation/explain`
- `GET _cluster/state`
- `POST _cluster/reroute`
- `GET {index}/_stats`
- `GET {index}/_segments`
- `GET {index}/_shard_stores`
- `GET _cat/[indices,aliases,health,recovery,shards,segments]`
Deprecation warnings for accessing system indices take the form:
```
this request accesses system indices: [.some_system_index], but in a future major version, direct access to system indices will be prevented by default
```
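For example, a search that directly targets a registered system index such as `.tasks` (illustrative request):
```
GET /.tasks/_search
```
would return the message above in the response's `Warning` header on builds where these warnings are enabled.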
Referencing a project instance during task execution is discouraged by
Gradle and should be avoided, e.g. it is incompatible with Gradle's
incubating configuration cache. Instead there are services available to handle
archive and filesystem operations in task actions.
Brings us one step closer to #57918
This change removes the HTTP content type required setting, which was
deprecated in 6.0 and only existed for users upgrading from 5.x so that
they did not need to remove the setting immediately. The setting has no
effect on behavior.
This commit introduces a new thread pool, `system_read`, which is
intended for use by system indices for all read operations (get and
search). The `system_read` pool is a fixed thread pool with a maximum
number of threads equal to the lesser of half of the available processors
or 5. Given the combination of both get and search operations in this
thread pool, the queue size has been set to 2000. The motivation for
this change is to allow system read operations to be serviced in spite
of the number of user searches.
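As a rough illustration (not part of this change), the pool's configuration can be inspected on a running node:
```
GET /_cat/thread_pool/system_read?v&h=name,size,queue_size
```
The `size` and `queue_size` columns should reflect the fixed pool size and the 2000 queue limit described above.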
In order to avoid a significant performance hit due to pattern matching
on all search requests, a new metadata flag is added to mark indices
as system or non-system. Previously created system indices will have the
flag added to their metadata upon upgrade to a version with this
capability.
Additionally, this change also introduces a new class, `SystemIndices`,
which encapsulates logic around system indices. Currently, the class
provides a method to check if an index is a system index and a method
to find a matching index descriptor given the name of an index.
Relates #50251
Relates #37867
* Merge test runner task into RestIntegTest
* Reorganizing Standalone runner and RestIntegTest task
* Rework general test task configuration and extension
- Replace immediate task creations by using task avoidance api
- One step closer to #56610
- Still many tasks are created during configuration phase. Tackled in separate steps
* Create new data-stream xpack module.
* Move TimestampFieldMapper to the new module;
this results in storing a composable index template
with a data stream definition only working with the default
distribution. This way data streams can only be used
with the default distribution, since a data stream can
currently only be created if a matching composable index
template with a data stream definition exists.
* Renamed `_timestamp` meta field mapper
to `_data_stream_timestamp` meta field mapper.
* Add logic to the put composable index template api
to fail if the `_data_stream_timestamp` meta field mapper
isn't registered, so that a more understandable
error is returned when attempting to store a template
with a data stream definition via the oss distribution (see the sketch below).
In a follow up the data stream transport and
rest actions can be moved to the xpack data-stream module.
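For context, a minimal sketch of the kind of template involved (names are illustrative):
```
PUT /_index_template/my-logs-template
{
  "index_patterns": ["my-logs-*"],
  "data_stream": {}
}
```
With the default distribution this succeeds because the `_data_stream_timestamp` meta field mapper is registered; with the oss distribution the request now fails with a more understandable error.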