This splits out the registry and the service, which makes testing easier and removes much of the delegation from the old `APMMeter` to `Instruments` (now renamed `APMMeterRegistry`).
`APMMeterService` manages the lifecycle, and `APMMeterRegistry` holds the instruments.
OTel gauges should follow the callback model (or use `BatchCallback`); otherwise they will not be sent by
the APM Java agent.
This commit changes the gauge creation model to return `Observable*Gauge`
and uses `AtomicLong/Double` to store the current value, which is polled when
metrics are exported (that is, when the callback is called).
`SpanId` is used when explicitly closing the trace in `executeQueryPhase` to avoid double closing the associated task.
`doPrivileged` avoids hitting `java.lang.UnsupportedOperationException: Cannot define class using reflection: access denied ("java.lang.reflect.ReflectPermission" "suppressAccessChecks")` when classes are injected, as sometimes happens while switching spans.
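A minimal sketch of wrapping a span switch in a privileged block, using the JDK's `AccessController.doPrivileged` (deprecated for removal in recent JDKs, but still present). `PrivilegedSpans` and `switchSpan` are hypothetical names; the real change wraps the tracer's own span-switching code.

```java
import java.security.AccessController;
import java.security.PrivilegedAction;

/** Hypothetical helper that runs a span switch inside a privileged block. */
final class PrivilegedSpans {
    private PrivilegedSpans() {}

    /**
     * Runs the action so that permission checks stop at this frame, instead of also
     * checking less-privileged callers further up the stack (for example, when a class
     * definition is triggered while switching spans).
     */
    @SuppressWarnings("removal")
    static <T> T switchSpan(PrivilegedAction<T> action) {
        return AccessController.doPrivileged(action);
    }
}
```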
Removed `default Releasable withScope(Task task)` from the Tracer API because it created a `SpanId` implicitly, whereas in one of its three uses that `SpanId` was needed to close the span.
Fixes: #100072
The latest version contains a fix that allows sending metrics to the APM server. This also adds an APM agent JVM option,
`"enable_experimental_instrumentations", "true"`,
which is required to enable the otel-metrics-instrumentation.
Relates to https://github.com/elastic/elasticsearch/pull/99832
Adds metering instrument interfaces and adapter implementations for the OpenTelemetry instrument types:
* Gauge - a single number that can go up or down
* Histogram - bucketed samples
* Counter - monotonically increasing summed value
* UpDownCounter - summed value that may decrease
Supports both `Long*` and `Double*` versions of the instruments.
Instruments can be registered and retrieved by name through `APMMeter`, which is available via the `APMTelemetryProvider`.
The metering provider starts as the OpenTelemetry no-op provider.
`telemetry.metrics.enabled` turns on metering.
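A sketch of registering and retrieving instruments by name, with no-op instruments handed out until metering is enabled. Only a counter is shown; `LongCounter`, `MeterRegistrySketch` and their methods are hypothetical, and for simplicity the enabled flag (standing in for `telemetry.metrics.enabled`) is read once at construction.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.atomic.LongAdder;

/** Hypothetical counter instrument; the real interfaces also cover gauges, histograms, etc. */
interface LongCounter {
    void increment();

    long value();
}

/** Instrument registry that hands out no-op instruments until metering is enabled. */
class MeterRegistrySketch {
    private static final LongCounter NOOP = new LongCounter() {
        public void increment() {}

        public long value() {
            return 0;
        }
    };

    private final boolean enabled; // stands in for telemetry.metrics.enabled
    private final Map<String, LongCounter> counters = new ConcurrentHashMap<>();

    MeterRegistrySketch(boolean enabled) {
        this.enabled = enabled;
    }

    /** Registers (or returns the existing) counter with the given name. */
    LongCounter registerLongCounter(String name) {
        if (!enabled) {
            return NOOP; // nothing is recorded while metering is off
        }
        return counters.computeIfAbsent(name, n -> {
            LongAdder adder = new LongAdder();
            return new LongCounter() {
                public void increment() {
                    adder.increment();
                }

                public long value() {
                    return adder.sum();
                }
            };
        });
    }

    /** Looks up a previously registered counter by name; null if none was registered. */
    LongCounter getLongCounter(String name) {
        return counters.get(name);
    }
}
```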
In order to avoid adding yet another parameter to `createComponents`,
the `Tracer` interface is replaced with `TelemetryProvider`.
This makes it possible to get both the `Tracer` and (in the future) `Metric` interfaces.
With the addition of metrics support, the `TracerPlugin` name is no longer adequate, so it is renamed to `TelemetryPlugin`.
This also introduces the `TelemetryProvider` interface. While it is currently only used in `Node.java` to fetch the `Tracer` instance, it is intended to be used in `Plugin::createComponents` (to be done in a separate commit due to
the broad scope of this method).
This will allow plugins to access both the `Tracer` and `Metric` interfaces
without adding yet another argument to `createComponents`.
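The idea of passing a single provider instead of one argument per telemetry API can be sketched as follows. `Tracer` and `Meter` are empty hypothetical stand-ins, and the `getTracer`/`getMeter` accessors and `ExamplePlugin` are illustrative names, not the actual Elasticsearch signatures.

```java
/** Hypothetical stand-ins for the tracing and metrics APIs. */
interface Tracer {}

interface Meter {}

/** One argument that exposes every telemetry API a plugin may need. */
interface TelemetryProvider {
    Tracer getTracer();

    Meter getMeter();

    /** Provider used when telemetry is disabled. */
    TelemetryProvider NOOP = new TelemetryProvider() {
        private final Tracer tracer = new Tracer() {};
        private final Meter meter = new Meter() {};

        public Tracer getTracer() {
            return tracer;
        }

        public Meter getMeter() {
            return meter;
        }
    };
}

/** A plugin receives the provider instead of separate Tracer and Meter arguments. */
class ExamplePlugin {
    private Tracer tracer;
    private Meter meter;

    void createComponents(TelemetryProvider telemetryProvider) {
        this.tracer = telemetryProvider.getTracer();
        this.meter = telemetryProvider.getMeter();
    }

    boolean isWired() {
        return tracer != null && meter != null;
    }
}
```

Adding a further telemetry API later then means adding a method to the provider, not another parameter to every `createComponents` implementation.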
This also adds an `internal` subpackage in `module/apm`, so that it is more obvious
which packages are not exported.
This commit renames `tracing` to `telemetry.tracing` in both xpack/APM and Elasticsearch's `org.elasticsearch.tracing.Tracer` (the API).
The xpack/APM packages are renamed as follows:
* `org.elasticsearch.telemetry.apm` - the only exported package
* `org.elasticsearch.telemetry.apm.settings` - `APMSettings`
* `org.elasticsearch.telemetry.apm.tracing` - `APMTracer`
`org.elasticsearch.tracing.Tracer` is moved to `org.elasticsearch.telemetry.tracing.Tracer` (responsible for the majority of the changes in this PR).
When APM is enabled, a Security Manager exception is thrown:
java.security.AccessControlException: access denied ("java.net.NetPermission" "getProxySelector")
This commit adds a permission so that apm can be enabled
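What the grant amounts to can be modelled with the JDK permission classes. The actual change is a policy entry of the form `permission java.net.NetPermission "getProxySelector";`. `ApmPermissions` below is a hypothetical helper that builds an equivalent permission set and checks that it covers the previously failing check.

```java
import java.net.NetPermission;
import java.security.Permission;
import java.security.Permissions;

/** Hypothetical model of the grant: a permission set that now includes getProxySelector. */
final class ApmPermissions {
    private ApmPermissions() {}

    static Permissions granted() {
        Permissions permissions = new Permissions();
        // Same permission a policy file would grant with:
        // permission java.net.NetPermission "getProxySelector";
        permissions.add(new NetPermission("getProxySelector"));
        return permissions;
    }

    /** Returns true if the permission set covers the check that previously failed. */
    static boolean allowsProxySelector(Permissions permissions) {
        Permission needed = new NetPermission("getProxySelector");
        return permissions.implies(needed);
    }
}
```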
Following the changes in #95112, which relocated the calls
into the `AuthenticationService` that authenticate HTTP
requests, the authentication duration was no longer
included between `Tracer#startTrace` and
`Tracer#stopTrace`. Consequently, the span records
no longer covered the authentication duration.
This PR remedies that by changing the `Tracer`
implementation, i.e. `APMTracer`, to look for the trace start
time instant in the transient thread context and use it
when starting traces (instead of defaulting to the current time).
The trace start time is set in the thread context when
the per-request thread context is first populated
(with HTTP request headers).
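A sketch of the mechanism: the start time is stored once, when the request context is first populated, and the tracer prefers it over the current time. A plain map stands in for the transient thread context, and `RequestContext`, the `trace.start_time` key and `StartTimeAwareTracer` are all hypothetical.

```java
import java.time.Instant;
import java.util.HashMap;
import java.util.Map;

/** Hypothetical stand-in for the request-scoped thread context's transient values. */
class RequestContext {
    static final String TRACE_START_TIME = "trace.start_time"; // hypothetical key
    final Map<String, Object> transientValues = new HashMap<>();
}

class StartTimeAwareTracer {
    /** Called when the request's thread context is first populated from HTTP headers. */
    static void onRequestHeaders(RequestContext ctx, Instant receivedAt) {
        // putIfAbsent: later stages must not move the start time forward.
        ctx.transientValues.putIfAbsent(RequestContext.TRACE_START_TIME, receivedAt);
    }

    /** Uses the recorded start time if present, otherwise falls back to "now". */
    static Instant spanStartTime(RequestContext ctx, Instant now) {
        Object recorded = ctx.transientValues.get(RequestContext.TRACE_START_TIME);
        return recorded instanceof Instant start ? start : now;
    }
}
```

Because the span then starts at the recorded instant, any work done before `startTrace` is called, such as authentication, falls inside the span.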
We can DRY things up a little here, and also make things a little faster
(in case we missed a corner case where a list setting is hot), by using the optimized
string list setting constructor.
Fixes #82794. Upgrade the Spotless plugin, which addresses the issue
around formatting `instanceof` expressions. Formatting of statements
including lambdas seems to have improved too.
Today the APM `Tracer` interface identifies each span by a raw string,
but in practice there is structure to these strings: task-related spans
have IDs like `task-NNNN` and spans that relate to REST requests have
IDs like `rest-NNNN`. This convention is distributed across the codebase
a little too widely, so with this commit we centralise it into a
`SpanId` class, and introduce specific overrides for `Task` and
`RestRequest` to avoid callers needing to construct IDs themselves.
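A minimal sketch of centralising the naming convention. The `task-`/`rest-` prefixes come from the description above; the `Task` and `RestRequest` records are hypothetical stubs, and static factory methods are used here for illustration where the real change adds overrides for those types.

```java
/** Hypothetical minimal task and REST request types carrying the ids used for spans. */
record Task(long id) {}

record RestRequest(long requestId) {}

/** Centralises the span-id naming convention instead of building raw strings at call sites. */
record SpanId(String rawId) {
    static SpanId forTask(Task task) {
        return new SpanId("task-" + task.id());
    }

    static SpanId forRestRequest(RestRequest request) {
        return new SpanId("rest-" + request.requestId());
    }

    @Override
    public String toString() {
        return rawId;
    }
}
```

Callers pass a `Task` or `RestRequest` and never build the string themselves, so the convention lives in one place.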
Fixes #94689.
The APM agent version 1.33.0 fails to start on JDK 20, which prevents
the APM integration from working as expected. As a consequence,
tracing does not work.
When setting `ELASTIC_APM_LOG_LEVEL=debug` and
`ELASTIC_APM_LOG_FILE=/tmp/log.txt`, the agent log shows that there
is an issue with accessing `Unsafe` (sorry I don't have the exact
stack trace).
There were a few changes in the APM agent regarding the Security Manager
(SM) in recent versions, and updating the agent seems to make it
work as expected.
However, there is one known caveat so far
(https://github.com/elastic/apm-agent-java/issues/3074): running
the agent at `debug` log level with `ELASTIC_APM_LOG_LEVEL=debug`
triggers another security exception when it tries to establish a
connection with the APM server, because the agent logs whether or not
a proxy is used (which is forbidden by default by the SM and
isn't yet wrapped in a privileged call).
Closes #92338.
When tracing REST requests with APM, we capture HTTP headers as labels
on the trace, but redact sensitive values. However, we can't know ahead
of time what all the possible sensitive values are.
Push this redaction into the tracer, and make the redaction terms
configurable. Switch the defaults to the APM Java agent's defaults.
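A sketch of configurable, case-insensitive wildcard redaction of header values. `HeaderRedactor` and the `[REDACTED]` marker are hypothetical, and the terms used in the test are sample values, not necessarily the APM Java agent's exact defaults.

```java
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.TreeMap;
import java.util.regex.Pattern;

/** Redacts header values whose names match configurable wildcard terms (case-insensitive). */
class HeaderRedactor {
    static final String REDACTED = "[REDACTED]";
    private final List<Pattern> patterns;

    /** Each term may use '*' as a wildcard, e.g. "*auth*" or "password". */
    HeaderRedactor(List<String> terms) {
        this.patterns = terms.stream().map(HeaderRedactor::toPattern).toList();
    }

    /** Converts a wildcard term into a regex: literal parts are quoted, '*' becomes ".*". */
    private static Pattern toPattern(String term) {
        String[] parts = term.toLowerCase(Locale.ROOT).split("\\*", -1);
        StringBuilder regex = new StringBuilder();
        for (int i = 0; i < parts.length; i++) {
            if (i > 0) {
                regex.append(".*");
            }
            regex.append(Pattern.quote(parts[i]));
        }
        return Pattern.compile(regex.toString());
    }

    boolean isSensitive(String headerName) {
        String name = headerName.toLowerCase(Locale.ROOT);
        return patterns.stream().anyMatch(p -> p.matcher(name).matches());
    }

    /** Returns a copy of the headers with sensitive values replaced. */
    Map<String, String> redact(Map<String, String> headers) {
        Map<String, String> result = new TreeMap<>();
        headers.forEach((k, v) -> result.put(k, isSensitive(k) ? REDACTED : v));
        return result;
    }
}
```

Keeping the terms as data rather than hard-coded names is what lets operators extend the list for headers that could not be anticipated.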
This pull request adds the necessary support and implementation for profiling queries in the `Tracer`.
In order to use the APM Agent's inferred spans functionality, the active span's context has to be open in the current thread. This PR adds context-sensitive methods to the Tracer interface, implements them in APMTracer, and makes use of them in the private SearchService.executeQueryPhase(), which is on the stack for a lot of our most critical operations.
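The "context open in the current thread" requirement can be sketched with a thread-local and a scope that restores the previous value on close, in the same spirit as OpenTelemetry's `Context.makeCurrent()`. `Span`, `Scope` and `SpanContext` here are hypothetical types, not the methods added to the `Tracer` interface.

```java
/** Hypothetical span handle. */
record Span(String name) {}

/** Scope whose close() throws no checked exception, so try-with-resources needs no catch. */
interface Scope extends AutoCloseable {
    @Override
    void close();
}

/** Keeps the active span in a thread-local so code sampling this thread can find it. */
final class SpanContext {
    private static final ThreadLocal<Span> ACTIVE = new ThreadLocal<>();

    private SpanContext() {}

    static Span current() {
        return ACTIVE.get();
    }

    /** Makes the span active on this thread until the returned scope is closed. */
    static Scope withScope(Span span) {
        Span previous = ACTIVE.get();
        ACTIVE.set(span);
        return () -> {
            // Restore whatever was active before, so scopes can be nested.
            if (previous == null) {
                ACTIVE.remove();
            } else {
                ACTIVE.set(previous);
            }
        };
    }
}
```

Typical use is `try (Scope scope = SpanContext.withScope(span)) { ... }` around the work that should be attributed to the span.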
With this change we add the allocation deciders
to `createComponents`, so that we can simplify their use in the
Autoscaling plugin and implement a reserved state handler
in the future.
Closes #89414. Remove the workaround from #89135 that addressed #89107,
and instead upgrade the OpenTelemetry API, which contains a fix for the
underlying issue.
Part of #84369. Implement the `Tracer` interface by providing a
module that uses OpenTelemetry, along with Elastic's APM
agent for Java.
See the file `TRACING.md` for background on the changes and the
reasoning for some of the implementation decisions.
The configuration mechanism is the most fiddly part of this PR. The
Security Manager permissions required by the APM Java agent make
it prohibitive to start an agent from within Elasticsearch
programmatically, so it must be configured when the ES JVM starts.
That means that the startup CLI needs to assemble the required JVM
options.
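A sketch of how a startup CLI could assemble such options. `-javaagent:` is the standard JVM flag for attaching an agent, and `-Delastic.apm.<option>` is the APM Java agent's system-property form; the helper name and the jar path used in the test are hypothetical. Per the next paragraph, the secret token must not be passed this way.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

/** Hypothetical helper that turns agent settings into JVM options at startup. */
final class ApmJvmOptions {
    private ApmJvmOptions() {}

    static List<String> build(String agentJarPath, Map<String, String> agentSettings) {
        List<String> options = new ArrayList<>();
        // Attach the agent when the JVM starts, rather than from inside Elasticsearch.
        options.add("-javaagent:" + agentJarPath);
        // Sorted for deterministic output; each setting becomes an elastic.apm.* system property.
        new TreeMap<>(agentSettings).forEach((key, value) -> options.add("-Delastic.apm." + key + "=" + value));
        return options;
    }
}
```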
To complicate matters further, the APM agent needs a secret token
in order to ship traces to the APM server. We can't use Java system
properties to configure this, since otherwise the secret will be
readable by all code in Elasticsearch. It therefore has to be
configured in a dedicated config file. This in itself is awkward,
since we don't want to leave secrets in config files. Therefore,
we pull the APM secret token from the keystore, write it to a config
file, then delete the config file after ES starts.
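The write-then-delete lifecycle can be sketched with `java.nio.file`. `ApmSecretConfig` is a hypothetical helper and the `secret_token=` property line is an assumption about the file's format; the agent would be pointed at the file through its config-file option, which is not shown here.

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;

/** Hypothetical handling of the APM secret token pulled from the keystore. */
final class ApmSecretConfig {
    private ApmSecretConfig() {}

    /** Writes the token to a temporary agent config file and returns its path. */
    static Path writeConfig(char[] secretToken) throws IOException {
        Path file = Files.createTempFile("elastic-apm", ".properties");
        Files.writeString(file, "secret_token=" + new String(secretToken) + "\n", StandardCharsets.UTF_8);
        return file;
    }

    /** Deletes the config file once ES has started, so the secret does not linger on disk. */
    static void deleteAfterStartup(Path file) throws IOException {
        Files.deleteIfExists(file);
    }
}
```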
There's a further issue with the config file. Any options we set
in the APM agent config file cannot later be reconfigured via system
properties, so we need to make sure that only "static" configuration
goes into the config file.
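One way to enforce this rule is to partition the settings before writing anything. In the sketch below, `ApmConfigSplit` is hypothetical and its `STATIC_KEYS` set contains only `secret_token`, as an assumption about which settings are never reconfigured. Everything else is left for system properties, where it can still be changed later.

```java
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;

/** Splits agent settings between the config file and system properties. */
final class ApmConfigSplit {
    private ApmConfigSplit() {}

    /** Hypothetical set of settings that never change after startup. */
    static final Set<String> STATIC_KEYS = Set.of("secret_token");

    record Split(Map<String, String> configFile, Map<String, String> systemProperties) {}

    static Split split(Map<String, String> settings) {
        Map<String, String> file = new TreeMap<>();
        Map<String, String> props = new TreeMap<>();
        // Static settings go to the file; anything that may be reconfigured stays out of it.
        settings.forEach((k, v) -> (STATIC_KEYS.contains(k) ? file : props).put(k, v));
        return new Split(file, props);
    }
}
```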
I generated most of the files under `qa/apm` using an APM test
utility (I can't remember which one now, unfortunately). The goal
is to set up a complete system so that traces can be captured in
APM server, and the results in Elasticsearch inspected.