It seems https://github.com/elastic/elasticsearch/pull/88982 introduced
a dependency on `elasticsearch-grok` to `x-pack-core`. Since the latter
is published to Maven Central, this means consumers will have issues
resolving its dependencies, since `elasticsearch-grok` isn't published.
This pull request resolves that by adding the publishing plugin to the
`grok` library. We'll then follow up separately to add that to our
release configuration.
Removing the custom dependency checksum functionality in favor of Gradle's built-in dependency verification support.
- Use SHA-256 in favor of SHA-1, as SHA-1 is no longer considered secure.
Closes https://github.com/elastic/elasticsearch/issues/69736
This removes many calls to the last remaining `createParser` method that
I deprecated in #79814, migrating callers to one of the new methods that
it created.
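For illustration only, the shape of the migration looks roughly like the following; the package names and the assumption that the replacement methods take an `XContentParserConfiguration` are based on the earlier PR and should be treated as a sketch, not the exact call sites changed here:
```java
import org.elasticsearch.xcontent.XContentParser;
import org.elasticsearch.xcontent.XContentParserConfiguration;
import org.elasticsearch.xcontent.json.JsonXContent;

class ParserExample {
    static XContentParser parse(String json) throws java.io.IOException {
        // old style: createParser(registry, deprecationHandler, json)
        // new style: a single configuration object bundles registry, deprecation handler, etc.
        return JsonXContent.jsonXContent.createParser(XContentParserConfiguration.EMPTY, json);
    }
}
```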
The initial implementation of the embedded class loader took a brute
force approach to supporting multi-release JARs - iterating over all
possible release versions when searching for classes and resources. This
change improves upon that approach by deriving and caching package and
version specific maps, so class and resource loading can go directly to
the class and resource bytes, respectively, rather than searching.
It's hard to get empirical numbers to quantify just how much this change
improves the performance of classes loaded by this loader, and there are
typically only a couple of hundred classes loaded, but the initial cli
seems observably much quicker, while server startup has improved
just a bit (at least on my machine).
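As a rough illustration of the direction (not the loader's actual code, which derives and caches its own per-package maps for the embedded jars), a `Runtime`-versioned `JarFile` resolves multi-release entries directly instead of probing each `META-INF/versions/N` directory:
```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.util.jar.JarEntry;
import java.util.jar.JarFile;
import java.util.zip.ZipFile;

class VersionedLookup {
    // Returns the class bytes for the current runtime release, or null if the class is absent.
    static byte[] classBytes(File jar, String binaryName) throws IOException {
        try (JarFile jf = new JarFile(jar, true, ZipFile.OPEN_READ, Runtime.version())) {
            JarEntry entry = jf.getJarEntry(binaryName.replace('.', '/') + ".class");
            if (entry == null) {
                return null;
            }
            try (InputStream in = jf.getInputStream(entry)) {
                return in.readAllBytes();
            }
        }
    }
}
```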
This PR reworks the testing conventions precommit plugin. This plugin now:
- is compatible with yaml and java rest tests as well as internalClusterTest (i.e. different sourceSets per test type)
- enforces test base class and simple naming conventions (as it did before; see the sketch after this list)
- adds one check task per test sourceSet
- uses the worker api to improve task execution parallelism and encapsulation
- is gradle configuration cache compatible
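For illustration only (not the plugin's implementation), the per-class check that these conventions boil down to looks something like:
```java
class TestingConventionsSketch {
    // A test class conforms if it extends the configured base class and its name
    // carries the suffix expected for its source set (e.g. "Tests" or "IT").
    static boolean conforms(Class<?> testClass, Class<?> requiredBaseClass, String requiredSuffix) {
        return requiredBaseClass.isAssignableFrom(testClass)
            && testClass.getSimpleName().endsWith(requiredSuffix);
    }
}
```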
This also ports the TestingConventions integration testing to Spock and removes the build-tools-internal/test kit folder that is not required anymore. We also add some common logic for testing java related gradle plugins.
We will apply further cleanup on other tests within our test suite in a dedicated follow-up.
This commit adds desired nodes status tracking to the cluster state. Previously the status was tracked
in-memory by DesiredNodesMembershipService; this approach had certain limitations and made
the consumer code more complex. This change takes a simpler approach, keeping the status updated when
the desired nodes are updated or when a new node joins, and storing the status in the cluster state,
which allows that information to be consumed easily wherever it is necessary.
Additionally, this commit moves test code away from depending directly on DesiredNodes, which can be
seen as an internal data structure, to relying more on UpdateDesiredNodesRequest.
Relates #84165
Introducing a stable logging API under libs/logging.
This change covers the most common use cases for logging: fetching a logger with LogManager, and emitting log messages with Logger and Level.
It is influenced by log4j2-api, but does not include Marker and LogBuilder methods.
Also, methods using org.apache.logging.log4j.util.Supplier are replaced with java.util.function.Supplier.
The basic implementation is present in server and injected statically in LogConfigurator.
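Intended usage looks roughly like this; the `org.elasticsearch.logging` package name and the exact Supplier overload are assumptions based on the description above, so treat this as a sketch:
```java
import java.util.function.Supplier;

import org.elasticsearch.logging.LogManager;
import org.elasticsearch.logging.Logger;

class LoggingExample {
    private static final Logger logger = LogManager.getLogger(LoggingExample.class);

    void run(String detail) {
        logger.info("plain message");
        // java.util.function.Supplier replaces org.apache.logging.log4j.util.Supplier,
        // so the message is only built when the level is enabled
        Supplier<String> lazy = () -> "expensive message built lazily: " + detail;
        logger.debug(lazy);
    }
}
```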
relates #84478
The jarhell check declares a URISyntaxException. However, this should
not be possible, as the paths and URLs come from the jdk conversion. This
commit makes a URISyntaxException when converting from URL to URI an
assertion error, similar to MalformedURLException when creating a URL.
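The pattern in miniature (illustrative, not the exact JarHell code): a syntax error here would mean the JDK itself produced a malformed URL, which is a bug rather than user error.
```java
import java.net.URI;
import java.net.URISyntaxException;
import java.net.URL;

class JarHellSketch {
    static URI toUri(URL url) {
        try {
            return url.toURI();
        } catch (URISyntaxException e) {
            // the URL came from the JDK, so this can only indicate a bug
            throw new AssertionError(e);
        }
    }
}
```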
While benchmarking things that are hot in x-content parsing, I noticed loads of duplicate
lambdas on the heap, and code related to these predicates etc. popping up in profiles.
These changes greatly simplify the rest-api code (though it could be made even simpler I think)
and remove it from profiling for the most part.
Co-authored-by: Joe Gallo <joe.gallo@elastic.co>
The 'ensure no self reference' check during ingest only needs to be
performed when there is a chance that a map or list is added to
`IngestDocument` that references other entries in the `IngestDocument`
and when there is a risk that a circular reference is created between
entries.
Today this is only possible in `ScriptProcessor`, since a custom script
could do this. So doing this check when no script processor is used is
not useful; it just adds an additional tax to the time spent
on ingest that in many cases really isn't necessary.
A flag is added to `IngestDocument` that controls whether the 'ensure no
self reference' check is performed at the end when all pipelines have
been executed. The `ScriptProcessor` has been modified to set this flag
and so this check will be performed if pipelines that execute have one
or more script processors.
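For illustration, the kind of structure the check guards against is a document whose source refers back to itself, which only a script can create (hypothetical example, not processor code):
```java
import java.util.HashMap;
import java.util.Map;

class SelfReferenceExample {
    static Map<String, Object> circularSource() {
        Map<String, Object> source = new HashMap<>();
        source.put("self", source); // a script could wire the document back into itself
        return source;              // serializing this without the check would never terminate
    }
}
```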
Closes #87335
This change modularizes the ingest.common component
by adding a module-info.java, as well as two dependent libs.
The project only requires painless SPI to compile, so that was
fixed along the way (so that the compile module path can be
inferred directly from the dependencies).
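For illustration, the declaration added by a change like this looks roughly as follows; the module names used here are assumptions, not the exact contents of the new module-info.java:
```java
module org.elasticsearch.ingest.common {
    requires org.elasticsearch.server;
    requires org.elasticsearch.painless.spi; // the compile-time SPI dependency mentioned above
    requires org.elasticsearch.grok;         // hypothetical name for one dependent lib
    requires org.elasticsearch.dissect;      // hypothetical name for the other dependent lib

    exports org.elasticsearch.ingest.common;
}
```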
The Strings.format method, which is used heavily in logging with
Supplier, should handle exceptions when a format pattern is incorrect.
This prevents hard-to-catch mistakes from blowing up the server.
Such mistakes are especially hard to detect in logging, where the
code that creates a message might only be executed when the logger is at debug
or trace level, which is not always the case in CI.
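A defensive format in the spirit described above (a sketch, not the actual implementation):
```java
import java.util.Arrays;
import java.util.IllegalFormatException;
import java.util.Locale;

class StringsSketch {
    static String format(String pattern, Object... args) {
        try {
            return String.format(Locale.ROOT, pattern, args);
        } catch (IllegalFormatException e) {
            // never let a bad pattern blow up the caller; fall back to something diagnosable
            return "Unable to format [" + pattern + "] with arguments " + Arrays.toString(args);
        }
    }
}
```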
relates #87077 (comment)
relates #86549
This is the result of a structural search/replace in IntelliJ. It only affects log methods with the signature
logger.info(Supplier<?>), where the level could be info/debug etc. and the supplier argument is of the form
() -> new ParameterizedMessage(...)
This commit also introduces a Strings utility class to avoid passing Locale.ROOT to every
String.format(Locale.ROOT, pattern, args) call.
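An illustrative before/after of the mechanical rewrite; the `index` variable and the package of the new `Strings` utility are placeholders/assumptions:
```java
import org.apache.logging.log4j.LogManager;
import org.apache.logging.log4j.Logger;
import org.apache.logging.log4j.message.ParameterizedMessage;
import org.elasticsearch.core.Strings; // assumed location of the new utility

class MigrationExample {
    private static final Logger logger = LogManager.getLogger(MigrationExample.class);

    void logFailure(String index, Exception e) {
        // before the rewrite:
        logger.warn(() -> new ParameterizedMessage("failed to refresh [{}]", index), e);
        // after the rewrite (Strings.format wraps String.format with Locale.ROOT):
        logger.warn(() -> Strings.format("failed to refresh [%s]", index), e);
    }
}
```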
relates #86549
This PR represents the initial phase of Modularizing Elasticsearch (with
Java Modules).
This initial phase modularizes the core of the Elasticsearch server
with Java Modules, which is then used to load and configure extension
components atop the server. Only a subset of extension components are
modularized at this stage (other components come in a later phase).
Components are loaded dynamically at runtime with custom class loaders
(same as is currently done). Components with a module-info.class are
defined to a module layer.
This architecture is somewhat akin to the Modular JDK, where
applications run on the classpath. In the analogy, the Elasticsearch
server modules are the platform (thus are always resolved and present),
while components without a module-info.class are non-modular code
running atop the Elasticsearch server modules. The extension components
cannot access types from non-exported packages of the server modules, in
the same way that classpath applications cannot access types from
non-exported packages of modules from the JDK. Broadly, the core
Elasticsearch java modules simply "wrap" the existing packages and export
them. There are opportunities to export less, which is best done in more
narrowly focused follow-up PRs.
The Elasticsearch distribution startup scripts are updated to put jars
on the module path (the class path is empty), so the distribution will
run the core of the server as java modules. A number of key components
have been retrofitted with module-info.java's too, and the remaining
components can follow later. Unit and functional tests run as
non-modular (since they commonly require package-private access), while
higher-level integration tests, that run the distribution, run as
modular.
Co-authored-by: Chris Hegarty <christopher.hegarty@elastic.co>
Co-authored-by: Ryan Ernst <ryan@iernst.net>
Co-authored-by: Rene Groeschke <rene@elastic.co>
This commit adds support for CPU ranges in the desired nodes API.
This aligns better with environments where administrators/orchestrators
can define lower and upper bounds for the number of CPUs that the
desired node would get once deployed.
This makes it possible to provide information about the expected CPUs, and
any allowed overcommit, that the desired node will run with.
This was the previous expected body for the desired nodes API (we still support it):
```
PUT /_internal/desired_nodes/history/1
{
  "nodes" : [
    {
      "settings" : {
        "node.name" : "instance-000187",
        "node.external_id": "instance-000187",
        "node.roles" : ["data_hot", "master"],
        "node.attr.data" : "hot",
        "node.attr.logical_availability_zone" : "zone-0"
      },
      "processors" : 8,
      "memory" : "58gb",
      "storage" : "1700gb",
      "node_version" : "8.3.0"
    }
  ]
}
```
Now it's possible to define `processors` or `processors_range` as in:
```
PUT /_internal/desired_nodes/history/1
{
  "nodes" : [
    {
      "settings" : {
        "node.name" : "instance-000187",
        "node.external_id": "instance-000187",
        "node.roles" : ["data_hot", "master"],
        "node.attr.data" : "hot",
        "node.attr.logical_availability_zone" : "zone-0"
      },
      "processors_range" : {"min": 8.0, "max": 16.0},
      "memory" : "58gb",
      "storage" : "1700gb",
      "node_version" : "8.3.0"
    }
  ]
}
```
Note that `max` in `processors_range` is optional.
This commit also moves from representing CPUs as integers to
accepting floating point numbers.
Note: I disabled the bwc yamlRestTests for versions < 8.3 since we introduced
a few "breaking changes", but since this is an internal API it should be fine.
Elasticsearch provides several command line tools, as well as the main script to start elasticsearch. While most of the logic is abstracted away for cli tools, the main elasticsearch script has hundreds of lines of platform-specific shell code. That code is difficult to maintain because it uses many special shell features which must then also exist on other platforms (i.e. windows batch files). Additionally, the logic in these scripts is not easy to test: we must be on the actual platform and test with a full installation of Elasticsearch, which is relatively slow (compared to most in-process tests).
This commit replaces the logic of the main server script, as well as the windows service management script, with Java. The new entrypoints use the CliToolLauncher. The server cli figures out all the necessary jvm options and such, then launches the real server process. If run in the foreground, the launcher will stay alive for the lifetime of Elasticsearch; the streams are effectively inherited so all output from Elasticsearch still goes to the console. If daemonizing, the launcher waits until Elasticsearch is "ready" (meaning the Node startup completed), then detaches and exits.
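A minimal sketch of the launch pattern described above, with hypothetical names; the real server cli does considerably more (assembling jvm options, waiting for a readiness signal when daemonizing, etc.):
```java
import java.io.IOException;
import java.util.List;

class ServerLaunchSketch {
    static void launch(List<String> command, boolean daemonize) throws IOException, InterruptedException {
        Process server = new ProcessBuilder(command)
            .inheritIO()          // the child's output goes straight to the console
            .start();
        if (daemonize == false) {
            // foreground: the launcher stays alive for the lifetime of Elasticsearch
            System.exit(server.waitFor());
        }
        // daemonizing: wait for readiness (not shown), then detach and exit
    }
}
```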
Co-authored-by: William Brafford <william.brafford@elastic.co>
This commit fixes the PrintWriters created by Terminal to use
autoflushing. It also fixes prompting to flush the error stream after
writing. Finally, it makes the APIs more consistent by adding getters
for verbosity and reader so that subclasses of Terminal can fully wrap
an existing terminal.
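The autoflush behavior described above, in miniature (illustrative, not the actual Terminal code):
```java
import java.io.OutputStream;
import java.io.PrintWriter;

class TerminalWriters {
    // PrintWriter's second constructor argument enables autoflush on println/printf/format,
    // so prompts and errors appear immediately without an explicit flush.
    static PrintWriter autoflushing(OutputStream out) {
        return new PrintWriter(out, true);
    }
}
```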
relates #85758
This change adds support to the embedded class loader to load the provider
and implementation dependencies as modules - within their own module
layer - when the caller itself is a named module. Currently, this code
is not yet triggered during deployment, since the caller is always an
unnamed module, but the caller will be modularized in a subsequent
change.
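Roughly, defining such a child layer uses the standard module-layer machinery; this is a sketch under assumptions, not the loader's actual code, which resolves the provider jars embedded in the outer jar:
```java
import java.lang.module.Configuration;
import java.lang.module.ModuleFinder;
import java.nio.file.Path;
import java.util.Set;

class LayerSketch {
    static ModuleLayer defineLayer(Path providerJarsDir, String rootModule, ClassLoader parentLoader) {
        ModuleFinder finder = ModuleFinder.of(providerJarsDir);
        ModuleLayer parent = ModuleLayer.boot();
        // resolve the provider's module graph against the parent layer
        Configuration cf = parent.configuration().resolve(finder, ModuleFinder.of(), Set.of(rootModule));
        // all modules in the child layer share one loader, isolated from the caller
        return parent.defineModulesWithOneLoader(cf, parentLoader);
    }
}
```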
Notable changes include:
- count implementations for MultiRangeQuery and IndexSortedNumericDocValuesRangeQuery, which may speed up certain aggregations
- more efficient decoding of docids in BKD reader
Each Command subclass can implement close() so that resources will be
cleaned up on exceptional exit like SIGINT. This is implemented through
a shutdown hook added in the superclass constructor. However, this hook
makes testing difficult because the hook cannot be added in normal
tests, so a flag must be overridden when testing Command classes.
This commit moves the shutdown hook handling into the CliToolLauncher
that creates the command. It also adds non-evil tests that check how the
hook runs, in place of the old evil tests that actually registered a
real shutdown hook.
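The hook wiring described above, reduced to its essence (illustrative, not the CliToolLauncher code):
```java
class HookSketch {
    static Thread registerCloser(AutoCloseable command) {
        Thread closer = new Thread(() -> {
            try {
                command.close();        // release resources on SIGINT or other abnormal exit
            } catch (Exception e) {
                // too late to do anything useful here
            }
        });
        Runtime.getRuntime().addShutdownHook(closer);
        return closer;                  // returned so tests can run the hook body directly
    }
}
```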
relates #85758
It is much easier to use the new API and not have to worry about all the manual
checks on the field name, which were dead code anyway because they'd only be
hit on malformed input bytes which Jackson would throw on before entering
our code.
Fix using the filter output stream without overriding the bulk write.
The usage in `directFieldAsBase64` seems like a serious performance bug,
since the stream is used to write a potentially large response here.
I also removed the `BlobOutputStream` that used to contain the same
fix now added to the no-close stream, after realizing the class is pointless
to begin with, to cut down on our usage of `FilterOutputStream`, where the bulk
write fix is needed.
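The underlying issue in miniature: `FilterOutputStream.write(byte[], int, int)` degrades to a byte-at-a-time loop unless it is overridden (this class is illustrative, not the one changed here):
```java
import java.io.FilterOutputStream;
import java.io.IOException;
import java.io.OutputStream;

class NoCloseOutputStream extends FilterOutputStream {
    NoCloseOutputStream(OutputStream out) {
        super(out);
    }

    @Override
    public void write(byte[] b, int off, int len) throws IOException {
        out.write(b, off, len); // delegate the bulk write instead of looping over single bytes
    }

    @Override
    public void close() {
        // intentionally do not close the wrapped stream
    }
}
```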
The sysprops and envVars members of Command provide cli implementations
with information about the jvm process that is running. This is
convenient at runtime, but difficult for tests to mock because they
must subclass the cli class.
This commit adds a ProcessInfo record, and plumbs it through the
main and execute methods. The new record includes system properties,
environment variables and the working directory. By having this be a
single new parameter, additional information can be added in the future
without again needing to modify the method signatures.
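A record along the lines described above (the component names are assumptions, not the actual declaration):
```java
import java.nio.file.Path;
import java.util.Map;

// system properties, environment variables, and the working directory of the running process
record ProcessInfo(Map<String, String> sysprops, Map<String, String> envVars, Path workingDir) {}
```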
relates #85758
We can be a little more efficient when parsing maps and exploit
the fact that we know the next token is a name in a couple of cases.
I fixed the most performance relevant one but there's a couple more
that could make use of this API in a follow up.
The cli lib has the SuppressForbidden annotation, but so does core,
which cli depends on. This commit removes the SuppressForbidden from
cli, in favor of the one from core.
relates #85758
Terminal is the abstraction Elasticsearch uses for all input and output,
both character based and binary. In an interactive shell, this is backed
by Java's Console, and in non-interactive it is backed by
stdin/stdout/stderr. Over time, the Terminal class has been amended to
support several different use cases, which has made constructing
subclasses for testing or filter based implementations complex. This
commit reworks Terminal so that the readers/writers/streams are
constructor arguments, instead of overrides. This allows subclasses to
simply call super with what is needed, rather than overriding several
methods and adding the same boilerplate implementation as others.
Note that the majority of the modifications here are to tests because
MockTerminal now has a factory method instead of a direct constructor.
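The general shape of the change, using a toy class rather than Elasticsearch's actual Terminal:
```java
import java.io.BufferedReader;
import java.io.PrintWriter;

class ToyTerminal {
    private final BufferedReader reader;
    private final PrintWriter out;
    private final PrintWriter err;

    // subclasses pass in the readers/writers they need instead of overriding getter methods
    ToyTerminal(BufferedReader reader, PrintWriter out, PrintWriter err) {
        this.reader = reader;
        this.out = out;
        this.err = err;
    }
}
```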
relates #85758
Most classes under elasticsearch-core had been moved to the o.e.core
package. However, a couple of io-related classes remained in an "internal"
package. This commit moves Streams and IOUtils to the core package, as
they are no more "internal" than the rest of the classes in core.
The Terminal class currently has two variants of readSecret, one that is
similar to Console.readPassword, and another that takes a maximum length
to read. This second variant was originally added as a protection when
reading the keystore password. However, there are no other uses of it,
and the passphrase has already been fully read in bin/elasticsearch by
the time it is read from Terminal, so it shouldn't be possible to
actually hit this edge case. This commit removes the maxLength variant.
relates #85758
In docker, ES allows settings to be set via environment variables. This
is currently handled in complex bash logic. This commit moves that logic
into EnvironmentAwareCommand where the rest of the initial settings are
found.
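A sketch of the kind of env-var-to-setting translation involved; the `ES_SETTING_` prefix and the underscore escaping rules here are my reading of the docker image's documented convention and should be treated as assumptions, not the code added by this change:
```java
import java.util.Locale;
import java.util.Optional;

class EnvSettingSketch {
    // e.g. ES_SETTING_CLUSTER_NAME -> cluster.name, with "__" standing for a literal underscore
    static Optional<String> settingName(String envVar) {
        if (envVar.startsWith("ES_SETTING_") == false) {
            return Optional.empty();
        }
        String name = envVar.substring("ES_SETTING_".length())
            .toLowerCase(Locale.ROOT)
            .replace("__", "\0")   // protect escaped underscores
            .replace('_', '.')
            .replace("\0", "_");
        return Optional.of(name);
    }
}
```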
relates #85758
Several mechanisms exist for initializing logging in cli tools. Some
base Command classes exist which initialize logging, but they do this
late, when they are constructed, which may be after static init has
occurred for classes grabbing a Logger. Other CLIs, like the node tool,
explicitly initialize logging to avoid that problem.
This commit removes all of the LoggingAware classes, and
unifies logging configuration to occur at the very beginning of the cli
launcher.
relates #85758
Currently any code needing to access system properties or environment
variables does it with the static methods provided by Java. While this
is ok in production since these are instantiated for the entire jvm
once, it makes any code reading these properties difficult to test
without mucking with the test jvm.
This commit adds system properties and environment variables to the base
Command class that our CLI tools use. While it does not propagate the
properties and env down for all possible uses in the system, it is the
first step, and it makes CLI testing a bit easier.
CLI scripts have a common infrastructure in that they call to the shared
elasticsearch-cli shell script which launches them with the appropriate
java command line. However, each underlying Java class must implement
its own main method.
This commit introduces a single main method to be shared by CLIs. The
new CliToolLauncher takes in system properties to determine which tool
is being run, and a new CliToolProvider SPI allows defining and finding
the named tools.
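The SPI shape described above, sketched with assumed names (Runnable stands in for the real Command type to keep the example self-contained):
```java
import java.util.ServiceLoader;

interface CliToolProvider {
    String name();
    Runnable create(); // the real SPI would return the tool's Command

    // the launcher can locate the requested tool by name via ServiceLoader
    static CliToolProvider load(String toolName) {
        return ServiceLoader.load(CliToolProvider.class).stream()
            .map(ServiceLoader.Provider::get)
            .filter(p -> p.name().equals(toolName))
            .findFirst()
            .orElseThrow(() -> new IllegalArgumentException("unknown cli tool [" + toolName + "]"));
    }
}
```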
relates #85758
Co-authored-by: William Brafford <william.brafford@elastic.co>