This PR adds a new API for streaming serialization writes to a repository, enabling repository metadata of arbitrary size to be written with bounded memory.
The existing write APIs require knowledge of the eventual blob size beforehand. This forced us to materialize the serialized blob in memory before writing, costing a lot of memory in the case of e.g. very large `RepositoryData` (and limiting us to a `2G` max blob size).
With this PR the requirement to fully materialize the serialized metadata goes away, and the memory overhead becomes bounded by the outbound buffer size of the repository implementation.
As we move to larger repositories this makes master node stability much more predictable, since writing out `RepositoryData` no longer takes as much memory (the same applies to shard-level metadata). It also enables aggregating multiple metadata blobs into a single larger blob without massive overhead, and removes the `2G` size limit on `RepositoryData`.
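A minimal sketch of the idea, with a hypothetical `writeBlobStreaming` method standing in for the actual API added here: the caller serializes directly into a stream provided by the repository, so memory use stays bounded regardless of blob size.

```java
import java.io.IOException;
import java.io.OutputStream;
import java.nio.charset.StandardCharsets;

// Hypothetical streaming-write API: the repository hands the caller an
// OutputStream, so the serialized blob is never materialized in memory.
interface StreamingBlobContainer {
    @FunctionalInterface
    interface Writer {
        void writeTo(OutputStream out) throws IOException;
    }

    // Implementations flush through their own outbound buffer, which is what
    // bounds the memory overhead of the write.
    void writeBlobStreaming(String blobName, Writer writer) throws IOException;
}

class RepositoryDataWriter {
    static void write(StreamingBlobContainer container) throws IOException {
        container.writeBlobStreaming("index-42", out -> {
            // serialize entries directly into the stream instead of building
            // the full byte[] representation first
            for (int i = 0; i < 1_000_000; i++) {
                out.write(("entry-" + i + "\n").getBytes(StandardCharsets.UTF_8));
            }
        });
    }
}
```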
This PR returns the get snapshots API to the 7.x format (and transport client behavior) and enhances it for requests that ask for multiple repositories.
The changes for requests that target multiple repositories are (see the sketch after this list):
* Add `repository` field to `SnapshotInfo` and REST response
* Add `failures` map alongside `snapshots` list instead of returning just an exception response as done for single repo requests
* Pagination now works across repositories instead of being per repository for multi-repository requests
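Purely as an illustration of the shape described above (the names here are assumptions, not the actual wire format or classes), the multi-repository response pairs a flat, cross-repository list of snapshots with a per-repository failures map:

```java
import java.util.List;
import java.util.Map;

// Illustrative model only; the real response is built from SnapshotInfo and
// serialized over REST, and the field names here are assumptions.
record SnapshotSummary(String repository, String snapshot, String state) {}

record GetSnapshotsResult(
    List<SnapshotSummary> snapshots, // each entry carries its repository; paginated across repos
    Map<String, String> failures     // repository name -> failure reason
) {}
```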
closes #69108, closes #43462
With work to make repo APIs more async incoming in #73570
we need a non-blocking way to run this check. This adds that async
check and removes the need to manually pass executors around as well.
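Sketched in the ActionListener callback style used throughout the codebase (the listener interface and names below are simplified stand-ins):

```java
// Simplified stand-in for the ActionListener pattern: the check completes on
// whatever thread the repository uses for I/O and notifies the listener,
// instead of blocking a caller that had to be handed the right executor.
interface Listener<T> {
    void onResponse(T result);
    void onFailure(Exception e);
}

class RepositoryCheck {
    static void asyncCheck(Listener<Boolean> listener) {
        try {
            listener.onResponse(runBlockingCheck()); // hypothetical blocking variant
        } catch (Exception e) {
            listener.onFailure(e);
        }
    }

    static boolean runBlockingCheck() {
        return true;
    }
}
```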
When libs/core was created, several classes were moved from server's
o.e.common package, but they were not moved to a new package. Split
packages need to go away long term, so that Elasticsearch can even think
about modularization. This commit moves all the classes under o.e.common
in core to o.e.core.
relates #73784
The org.elasticsearch.bootstrap package exists in server with classes
for starting up Elasticsearch. The elasticsearch-core jar has a handful
of classes that were split out from there, namely java version parsing
and jarhell. This commit moves those classes to a new
org.elasticsearch.jdk package so as to not split the server owned
bootstrap package.
relates #73784
Since Java 16, the default value for illegal-access is deny. This means
the latest release of Elasticsearch, and all current integration tests,
run with deny (since we don't explicitly set it in jvm options). Yet
tests run with illegal-access=warn, for legacy reasons. #71908
proposed to remove the setting from test jvms, but concerns were raised
there about whether this would cause some test failures.
This commit explicitly sets tests to deny. This has the added benefit
that any failures will be caught even when running tests with older
jvms.
Use an iterator instead of a list when passing around what to delete.
In the case of very large deletes the iterator is much smaller than
the actual list of files to delete (since we save all the prefixes,
which adds up if the individual shard folders contain lots of deletes).
As a side effect, this commit also adjusts a few spots in logging where the
log messages could be catastrophic in size when trace logging is activated.
There should be a singleton for the empty version of this.
All the copying to `String[]` or use as an iterator makes
no sense either when we can just use the list outright.
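A sketch of both points, assuming the values are file names to delete: the JDK already ships a singleton empty iterator, and prefixes can be applied lazily rather than copied into new collections:

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

class DeleteIterators {
    // The JDK singleton for the empty case mentioned above.
    static Iterator<String> empty() {
        return Collections.emptyIterator();
    }

    // Lazily prepend the shard folder prefix instead of materializing a new
    // "<prefix>/<file>" string list per shard folder.
    static Iterator<String> prefixed(String prefix, List<String> files) {
        Iterator<String> it = files.iterator();
        return new Iterator<>() {
            @Override public boolean hasNext() { return it.hasNext(); }
            @Override public String next() { return prefix + "/" + it.next(); }
        };
    }
}
```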
Related to #71593, this moves all build logic that is only for the Elasticsearch build itself into
the org.elasticsearch.gradle.internal* packages.
This makes it clearer which build logic is considered usable by external projects.
Ultimately we want to expose only TestCluster and PluginBuildPlugin logic
to third-party plugin authors.
This is a very first step in that direction.
- Update Gradle wrapper to Gradle 7.0
- Remove deprecated usages to make the build 7.0 compatible
- Fix excludes in docs snippet tasks (see https://github.com/gradle/gradle/issues/16160 for details)
- Fix deprecation warnings in 7.0
- Add explicit dependencies that had been missed
- Make the extract-native-licenses task's output dir more explicit
- Use a snapshot of the ospackage plugin that already includes a fix for 7.0
- Fix test runtime classpath setup in repository-hdfs
- Make task dependencies explicit to fix further deprecation warnings
- Remove the manual check for http repo usages that has been deprecated in Gradle 7.0
- Update Spock to the latest 2.0 milestone, required for Groovy 3
As per the new licensing change for Elasticsearch and Kibana, this commit
moves existing Apache 2.0 licensed source code to the new dual
SSPL + Elastic License 2.0 license. In addition, existing x-pack code now uses
the new version 2.0 of the Elastic license. Full changes include:
- Updating LICENSE and NOTICE files throughout the code base, as well
as those packaged in our published artifacts
- Update IDE integration to now use the new license header on newly
created source files
- Remove references to the "OSS" distribution from our documentation
- Update build time verification checks to no longer allow Apache 2.0
license header in Elasticsearch source code
- Replace all existing Apache 2.0 license headers for non-xpack code
with updated header (vendored code with Apache 2.0 headers obviously
remains the same).
- Replace all Elastic license 1.0 headers with new 2.0 header in xpack.
We have an in-house rule to compare explicitly against `false` instead
of using the logical not operator (`!`). However, this hasn't
historically been enforced, meaning that there are many violations in
the source at present.
We now have a Checkstyle rule that can detect these cases, but before we
can turn it on, we need to fix the existing violations. This is being
done over a series of PRs, since there are a lot to fix.
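For reference, the pattern the rule enforces:

```java
class Example {
    static void handle(boolean closed) {
        // Disallowed once the Checkstyle rule is enabled: the `!` is easy to miss.
        // if (!closed) { doWork(); }

        // House style: compare explicitly against false.
        if (closed == false) {
            doWork();
        }
    }

    static void doWork() {}
}
```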
Part of the fixes for #66419, this commit permits nodes to emit the
deprecation warning regarding not specifying `?wait_for_active_shards`
when closing an index in 7.x versions for x ≥ 12. This change is
required on `master` too since the BWC tests encounter these warnings.
Relates #67246, which is the 7.x part of this change.
Eclipse wasn't seeing the special shadow jars we were making for
repository-azure and repository-hdfs so it wasn't able to compile those
plugins. This points Eclipse at the project that we use to build the
shadow jar which gets it compiling. The tests don't pass because we
aren't pointing at the shadow jars but at least we compile.
The client-side encrypted repository is a new type of snapshot repository that
internally delegates to the regular variants of snapshot repositories (of types
Azure, S3, GCS, FS, and maybe others but not yet tested). After the encrypted
repository is set up, it is transparent to the snapshot and restore APIs (i.e. all
snapshots stored in the encrypted repository are encrypted, no other parameters
required).
The encrypted repository is protected by a password stored in every node's
keystore (which must be the same across the nodes).
The password is used to generate a key encryption key (KEK), using the PBKDF2
function, which is used to encrypt (using the AES key wrap algorithm) other
symmetric keys (referred to as DEKs - data encryption keys), which themselves
are generated randomly, and which are ultimately used to encrypt the snapshot
blobs.
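A minimal sketch of that KEK/DEK scheme using standard JCE primitives; the salt, iteration count, and key sizes here are illustrative, not the plugin's actual parameters:

```java
import javax.crypto.Cipher;
import javax.crypto.KeyGenerator;
import javax.crypto.SecretKey;
import javax.crypto.SecretKeyFactory;
import javax.crypto.spec.PBEKeySpec;
import javax.crypto.spec.SecretKeySpec;

class KekDekSketch {
    static byte[] wrapFreshDek(char[] repositoryPassword, byte[] salt) throws Exception {
        // Derive the key encryption key (KEK) from the keystore password via PBKDF2.
        SecretKeyFactory pbkdf2 = SecretKeyFactory.getInstance("PBKDF2WithHmacSHA512");
        SecretKey derived = pbkdf2.generateSecret(new PBEKeySpec(repositoryPassword, salt, 10_000, 256));
        SecretKey kek = new SecretKeySpec(derived.getEncoded(), "AES");

        // Generate a random data encryption key (DEK) that will encrypt blob contents.
        KeyGenerator keyGen = KeyGenerator.getInstance("AES");
        keyGen.init(256);
        SecretKey dek = keyGen.generateKey();

        // Wrap (encrypt) the DEK with the KEK using AES key wrap, so only the
        // wrapped form needs to be stored alongside the encrypted blobs.
        Cipher wrap = Cipher.getInstance("AESWrap");
        wrap.init(Cipher.WRAP_MODE, kek);
        return wrap.wrap(dek);
    }
}
```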
For example, here is how to set up an encrypted FS repository:
------
1) make sure that the cluster runs under at least a "platinum" license
(simplest test configuration is to put `xpack.license.self_generated.type: "trial"`
in the elasticsearch.yml file)
2) identical to the un-encrypted FS repository, specify the mount point of the
shared FS in the elasticsearch.yml conf file (on all the cluster nodes),
e.g. `path.repo: ["/tmp/repo"]`
3) store the repository password inside the elasticsearch.keystore, *on every cluster node*.
In order to support changing the password of an existing repository (implemented in a follow-up),
the password itself must be named, e.g. for the "test_enc_pass" repository password name:
`./bin/elasticsearch-keystore add repository.encrypted.test_enc_pass.password`
*type in the password*
4) start up the cluster and create the new encrypted FS repository, named "test_enc", by calling:
```
curl -X PUT "localhost:9200/_snapshot/test_enc?pretty" -H 'Content-Type: application/json' -d'
{
  "type": "encrypted",
  "settings": {
    "location": "/tmp/repo/enc",
    "delegate_type": "fs",
    "password_name": "test_enc_pass"
  }
}
'
```
5) the snapshot and restore APIs work unmodified when they refer to this new repository, e.g.
` curl -X PUT "localhost:9200/_snapshot/test_enc/snapshot_1?wait_for_completion=true"`
Related: #49896, #41910, #50846, #48221, #65768
Except when writing actual segment files to the blob store
we always write `BytesReference` instead of a stream.
Only having the stream API available forces needless copies
on us. I fixed the straightforward needless copying for
HDFS and FS repos in this PR; we could do similar fixes for
GCS and Azure as well and thus significantly reduce the peak
memory use of these writes, on master nodes in particular.
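To illustrate the difference (method names hypothetical, not the actual `BlobContainer` API): with only a stream-based write available every caller pays for a copy, while a bytes-based overload lets implementations that can use the bytes directly skip it.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical container API illustrating the point above.
interface SketchBlobContainer {
    // Stream-only API: the implementation has to read the bytes back out,
    // i.e. copy them, even if the caller already holds them in memory.
    void writeBlob(String name, InputStream in, long length) throws IOException;

    // Bytes-based API: implementations like FS/HDFS can override this and
    // hand the bytes straight to the filesystem with no intermediate copy.
    default void writeBlob(String name, byte[] bytes) throws IOException {
        writeBlob(name, new ByteArrayInputStream(bytes), bytes.length);
    }
}
```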
Adds a bounded read implementation on the HDFS blob store, as well as integration tests to
the searchable snapshot project that ensure functionality on HDFS with both Kerberos and
simple authentication.
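The shape of a bounded read, mirroring the position/length style used elsewhere for blob reads (signature assumed for illustration):

```java
import java.io.IOException;
import java.io.InputStream;

interface BoundedBlobReads {
    // Returns a stream over [position, position + length) of the blob, so a
    // searchable snapshot can fetch exactly the Lucene file region it needs
    // instead of the whole blob.
    InputStream readBlob(String blobName, long position, long length) throws IOException;
}
```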
This tweaks the AntFixture handling to make it compliant with the task avoidance API.
Tasks of type StandaloneRestTestTask are now generally finalised by the typed ant stop task,
which allows us to remove error-prone dependsOn overrides in StandaloneRestTestTask. As a result
we also ported more task definitions in the build to the task avoidance API.
The next work item regarding AntFixture handling is porting AntFixture to a plain Gradle task and
removing the Groovy AntBuilder; that will allow us to port more build logic from Groovy to Java,
but it is out of the scope of this PR.
This ports the majority of the REST integ test tasks to use the task avoidance API.
There are some edge cases left that we need to investigate, but we can do that separately.
Security manager policies within plugins currently can ask to grant any
permission (though we block some within the security manager itself at
runtime). Yet most of these permissions should never be necessary, and
some we would actively not want any plugins to be allowed to use. This
commit adds validation of plugins' policy files to restrict the
permissions allowed to be granted to a subset that is reasonable for
plugins to need. The allowed permissions are not ideal (still containing
things like suppressAccessChecks), but it is a step forward in defining
a stricter model for plugins that reduces the surface area of potential
abuse.
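The essence of such a validation step, reduced to an allowlist check (the permission names here are examples, not the actual allowed set):

```java
import java.util.List;
import java.util.Set;

class PluginPolicyValidation {
    // Illustrative allowlist; the real set lives in the server code and is
    // more fine-grained than this.
    private static final Set<String> ALLOWED = Set.of(
        "java.net.SocketPermission",
        "java.lang.RuntimePermission",
        "java.lang.reflect.ReflectPermission" // still includes suppressAccessChecks
    );

    static void validate(List<String> grantedPermissionTypes) {
        for (String type : grantedPermissionTypes) {
            if (ALLOWED.contains(type) == false) {
                throw new IllegalArgumentException("plugin policy grants disallowed permission: " + type);
            }
        }
    }
}
```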
The hadoop library we use to connect to HDFS expects it is running in
a JVM dedicated to hadoop, as if it were an HDFS node. This means the
shutdown hooks hadoop tries to register are expected to work; there is no
handling of a lack of permissions to add shutdown hooks. This commit
removes the shutdown hook permission currently granted to
repository-hdfs by replacing the ShutdownHookManager class with one of
the same API, but with no-ops for the public methods.
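A sketch of the shape of such a replacement; Hadoop's real ShutdownHookManager has more overloads, but the idea is that every method becomes a no-op:

```java
// Drop-in stand-in with the same shape as Hadoop's ShutdownHookManager:
// nothing is ever registered with the JVM, so no shutdown-hook permission
// is needed.
public class ShutdownHookManager {
    private static final ShutdownHookManager INSTANCE = new ShutdownHookManager();

    public static ShutdownHookManager get() {
        return INSTANCE;
    }

    private ShutdownHookManager() {}

    public void addShutdownHook(Runnable shutdownHook, int priority) {
        // no-op: we never register hooks with the JVM
    }

    public boolean removeShutdownHook(Runnable shutdownHook) {
        return false; // nothing was registered
    }

    public boolean isShutdownInProgress() {
        return false;
    }
}
```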
Many of the metadata blobs we handle in the changed spots can grow
in size up to `O(1M)`. Not using recycled bytes when working with
them causes significant spikes in memory use for larger repositories.
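A generic illustration of buffer recycling (the codebase uses its own recycler/BigArrays infrastructure rather than anything like this queue):

```java
import java.util.ArrayDeque;

// Reuse large scratch buffers instead of freshly allocating ~1MB per
// metadata blob, which is what causes the memory spikes described above.
class BufferPool {
    private final ArrayDeque<byte[]> pool = new ArrayDeque<>();

    synchronized byte[] acquire() {
        byte[] buffer = pool.poll();
        return buffer != null ? buffer : new byte[1 << 20];
    }

    synchronized void release(byte[] buffer) {
        pool.push(buffer);
    }
}
```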
This commit removes `integTest` task from all es-plugins.
Most relevant projects have been converted to use yamlRestTest, javaRestTest,
or internalClusterTest in prior PRs.
A few projects needed to be adjusted to allow complete removal of this task:
* x-pack/plugin - converted to use yamlRestTest and javaRestTest
* plugins/repository-hdfs - kept the integTest task, but use `rest-test` plugin to define the task
* qa/die-with-dignity - convert to javaRestTest
* x-pack/qa/security-example-spi-extension - convert to javaRestTest
* multiple projects - remove the integTest.enabled = false (yay!)
related: #61802
related: #60630
related: #59444
related: #59089
related: #56841
related: #59939
related: #55896
* Merge test runner task into RestIntegTest
* Reorganizing Standalone runner and RestIntegTest task
* Rework general test task configuration and extension
Due to complicated access checks (reads and writes execute in their own access context) on some repositories (GCS, Azure, HDFS), using a hard coded buffer size of 4k for restores was needlessly inefficient.
By the same token, the use of stream copying with the default 8k buffer size for blob writes was inefficient as well.
We also had dedicated, undocumented buffer size settings for HDFS and FS repositories. For these two we would use a 100k buffer by default. We did not have such a setting for e.g. GCS though, which would only use an 8k read buffer, needlessly small for reading from a raw `URLConnection`.
This commit adds an undocumented setting that sets the default buffer size to `128k` for all repositories. It removes wasteful allocation of such a large buffer for small writes and reads in the case of HDFS and FS repositories (i.e. still using the smaller buffer to write metadata) but uses a large buffer for doing restores and uploading segment blobs.
This should speed up Azure and GCS restores and snapshots in a non-trivial way, as well as save some memory when reading small blobs on FS and HDFS repositories.
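The mechanics are just a stream copy with a larger, shared default buffer size; a sketch (the constant and names are illustrative, since the actual setting is undocumented):

```java
import java.io.IOException;
import java.io.InputStream;
import java.io.OutputStream;

class BufferedCopy {
    // 128k default as described above; illustrative constant.
    static final int DEFAULT_BUFFER_SIZE = 128 * 1024;

    // One large buffer makes restores and segment uploads cheaper than the
    // previous hard-coded 4k/8k copies.
    static long copy(InputStream in, OutputStream out, int bufferSize) throws IOException {
        byte[] buffer = new byte[bufferSize];
        long total = 0;
        int read;
        while ((read = in.read(buffer)) != -1) {
            out.write(buffer, 0, read);
            total += read;
        }
        return total;
    }
}
```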
In order to ensure that we do not write a broken piece of `RepositoryData`
because the physical repository generation was moved ahead more than one step
by erroneous concurrent writing to a repository, we must check whether
the current assumed repository generation exists in the repository physically.
Without this check we run the risk of writing on top of stale cached repository data.
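A sketch of the check, with hypothetical method names (repository metadata generations are stored as `index-N` blobs):

```java
import java.io.IOException;

// Before writing generation N + 1, verify that generation N is physically
// present; if it is not, a concurrent writer has moved the repository ahead
// and our cached repository data is stale.
interface RepositoryGenerationCheck {
    boolean blobExists(String blobName) throws IOException;

    default void ensureGenerationExists(long expectedGen) throws IOException {
        if (blobExists("index-" + expectedGen) == false) {
            throw new IllegalStateException(
                "expected generation [" + expectedGen + "] is missing; concurrent modification?");
        }
    }
}
```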
Relates #56911
This commit creates a new Gradle plugin to provide a separate task name
and source set for running YAML based REST tests. The only project
converted to use the new plugin in this PR is distribution/archives/integ-test-zip,
for which the testing has been moved to :rest-api-spec, since that makes the most
sense and avoids a small but awkward change to the distribution plugin.
The remaining cases in modules, plugins, and x-pack will be handled in followups.
This plugin is distinctly different from the plugin introduced in #55896 since
the YAML REST tests are intended to be black box tests over HTTP. As such they
should not (by default) have access to the classpath for that which they are testing.
The YAML based REST tests will be moved to separate source sets (yamlRestTest).
Which source set is the target for the test resources depends on whether this
new plugin is applied. If it is not applied, it will default to the test source
set.
Further, this introduces a breaking change for plugin developers that
use the YAML testing framework. They will now need to either use the new source set
and matching task, or configure the rest resources to use the old "test" source set that
matches the old integTest task. (The former should be preferred).
As part of this change (which is also breaking for plugin developers) the
rest resources plugin has been removed from the build plugin and now requires
either explicit application or application via the new YAML REST test plugin.
Plugin developers should be able to fix the breaking changes to the YAML tests
by adding `apply plugin: 'elasticsearch.yaml-rest-test'` and moving the YAML tests
under a yamlRestTest folder (instead of test)
Restoring from a snapshot (which is a particular form of recovery) does not currently take recovery throttling into account
(i.e. the `indices.recovery.max_bytes_per_sec` setting). While restores are subject to their own throttling (repository
setting `max_restore_bytes_per_sec`), this repository setting does not allow for values to be configured differently on a
per-node basis. As restores are very similar in nature to peer recoveries (streaming bytes to the node), it makes sense to
configure throttling in a single place.
The `max_restore_bytes_per_sec` setting is also changed to default to unlimited now, whereas previously it was set to
`40mb` (which is the current default of `indices.recovery.max_bytes_per_sec`). This means that no behavioral change
will be observed by clusters where the recovery and restore settings were not adapted.
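For illustration, recovery-style throttling boils down to pausing the copy loop via a rate limiter; here is a sketch using Lucene's `RateLimiter` (the wiring to the actual `indices.recovery.max_bytes_per_sec` setting is simplified away):

```java
import java.io.IOException;
import org.apache.lucene.store.RateLimiter;

// Sketch of restore traffic sharing recovery throttling: the copy loop
// reports bytes and pauses as needed to keep the configured rate.
class ThrottledRestore {
    private final RateLimiter limiter = new RateLimiter.SimpleRateLimiter(40.0); // MB/sec

    void onBytesRestored(long bytes) throws IOException {
        // pause() blocks just long enough to keep the configured rate
        limiter.pause(bytes);
    }
}
```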
Relates https://github.com/elastic/elasticsearch/issues/57023
Co-authored-by: James Rodewig <james.rodewig@elastic.co>
- Use java-library instead of the java plugin to allow usage of the api configuration
- Remove explicit references to runtime configurations in dependency declarations
- Make the test runtime classpath an input for the testing conventions
  - required as java-library will by default not build the jar file
  - the jar file is now an explicit input of the task and Gradle will ensure it is properly built
This change converts the module and plugin parameters
for testClusters to be lazy. Meaning that the values
are not resolved until they are actually used. This
removes the requirement to use project.afterEvaluate to
be able to resolve the bundle artifact.
Note - this does not completely remove the need for afterEvaluate
since it is still needed for the custom resource extension.
Guava was removed from Elasticsearch many years ago, but remnants of it
remain due to transitive dependencies. When a dependency pulls guava
into the compile classpath, devs can inadvertently begin using methods
from guava without realizing it. This commit moves guava to a runtime
dependency in the modules where it is needed.
Note that one special case is the html sanitizer in watcher. The third
party dep uses guava in the PolicyFactory class signature. However, only
calling a method on the PolicyFactory actually causes the class to be
loaded; a reference alone does not force the implementation to be
inspected at compile time. There we utilize a MethodHandle for invoking the
relevant method at runtime, where guava will continue to exist.
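A sketch of the MethodHandle approach (class and method names illustrative): the call site never references the guava-typed signatures at compile time, yet still invokes the sanitizer at runtime, where guava is present:

```java
import java.lang.invoke.MethodHandle;
import java.lang.invoke.MethodHandles;
import java.lang.invoke.MethodType;

class HtmlSanitizerBridge {
    // Look up the sanitizing method reflectively so the guava types in the
    // third-party signatures never need to be on the compile classpath.
    static String sanitize(Object policyFactory, String html) throws Throwable {
        MethodHandle handle = MethodHandles.publicLookup().findVirtual(
            policyFactory.getClass(),
            "sanitize",
            MethodType.methodType(String.class, String.class));
        return (String) handle.invoke(policyFactory, html);
    }
}
```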
This is a simple naming change PR, to fix the fact that "metadata" is a
single English word, and for too long we have not followed general
naming conventions for it. We are also not consistent about it, for
example, METADATA instead of META_DATA if we were trying to be
consistent with MetaData (although METADATA is correct when considered
in the context of "metadata"). This was a simple find and replace across
the code base, only taking a few minutes to fix this naming issue
forever.
* Remove BlobContainer Tests against Mocks
Removing all these weird mocks as asked for by #30424.
All these tests are now part of real repository ITs and otherwise left unchanged if they had
independent tests that didn't call the `createBlobStore` method previously.
The HDFS tests also get added coverage as a side-effect because they did not have an implementation
of the abstract repository ITs.
Closes #30424