This is very useful when piping the input file to stdin and then using
/dev/stdin as the input file. e.g.
xzcat dump.xz |
promtool tsdb create-blocks-from openmetrics /dev/stdin /tmp/data
Signed-off-by: Nicolas Peugnet <nicolas.peugnet@lip6.fr>
* model/textparse: Change parser interface Metric(...) string to Labels(...)
Simplified the interface given no one is using the return argument.
Renamed for clarity too.
Found and discussed https://github.com/prometheus/prometheus/pull/15731#discussion_r1950916842
Signed-off-by: bwplotka <bwplotka@gmail.com>
* Fixed comments; optimized not needed copy for om and text.
Signed-off-by: bwplotka <bwplotka@gmail.com>
---------
Signed-off-by: bwplotka <bwplotka@gmail.com>
Add --ignore-unknown-fields that ignores unknown fields in rule group
files. There are lots of tools in the ecosystem that "like" to extend
the rule group file structure but they are currently unreadable by
promtool if there's anything extra. The purpose of this flag is so that
we could use the "vanilla" promtool instead of rolling our own.
Some examples of tools/code:
https://github.com/grafana/mimir/blob/main/pkg/mimirtool/rules/rwrulefmt/rulefmt.go8898eb3cc5/pkg/rules/rules.go (L18-L25)
Signed-off-by: Giedrius Statkevičius <giedrius.statkevicius@vinted.com>
What
Adds support for OTLP delta temporality to the OTLP endpoint.
This is done by calling the deltatocumulative processor from the OpenTelemetry collector during OTLP conversion.
Why
Delta conversion is a naturally stateful process, which requires careful request routing when operated inside a collector.
Prometheus is already stateful and doing the conversion in-server reduces the operational burden on the ingest architecture by only having one stateful component.
How
deltatocumulative is a OTel collector component that works as follows:
* pmetric.Metrics come from a receiver or in this case from the HTTP client
* It operates as an in-place update loop:
* for each sample, if not delta, leave unmodified
* if delta, do:
* state += sample, where state is the in-memory sum of all previous samples
* sample = state, sample value is now cumulative
* this is supported for sums (counters), gauges, histograms (old histograms) and exponential histograms (native histograms)
If a series receives no new samples for 5m, its state is removed from memory
Performance
Delta performance is a stateful operation and the OTel code is not highly optimized yet, e.g. it locks the entire processor for each request. Nonetheless, care has been taken to mitigate those effects:
delta conversion is behind a feature flag. If disabled, no conversion code is ever invoked
if enabled, conversion is not invoked if request not actually contains delta samples. This leads to no measureable performance difference between default-cumulative to convert-cumulative (only cumulative, feature on/off)
Signed-off-by: sh0rez <me@shorez.de>
Fix issues raised by staticcheck
We are not enabling staticcheck explicitly, though, because it has too many false positives.
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Instead of storing discovered labels on every target, recompute them if
required. The `Target` struct now needs to hold some more data required
to recompute them, such as ScrapeConfig.
This moves the load from every Prometheus all of the time, to just when
someone views Service Discovery in the UI.
The way `PopulateLabels` is used changes; you are no longer expected to
call it with a part-populated `labels.Builder`.
The signature of `Target.Labels` changes to take a `labels.Builder`
instead of a `ScratchBuilder`, for consistency with `DiscoveredLabels`.
This will save a lot of work when many targets are filtered out in
relabeling. Combine with `keep_dropped_targets` to avoid ever computing
most labels for dropped targets.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
We should never modify (or even shallow copy) Config after config.Load;
added comments and modified GetScrapeConfigs to do so. For GetScrapeConfigs
the validation (even repeated) was likely doing writes (because global fields was 0). We GetScrapeConfigs concurrently
in tests and ApplyConfig causing test races. In prod there were
races but likelyt only to replace 0 with 0, so not too severe.
I removed validation since I don't see anyone using our config.Config without Load.
I had to refactor one test that was doing it, all others use yaml config.
Fixes#15538
Previous attempt: https://github.com/prometheus/prometheus/pull/15634
Signed-off-by: bwplotka <bwplotka@gmail.com>
Enable the `auto-gomaxprocs` feature flag by default.
* Add command line flag `--no-auto-gomaxprocs` to disable.
Signed-off-by: SuperQ <superq@gmail.com>
Enable the `auto-gomemlimit` feature flag by default.
* Add command line flag `--no-auto-gomemlimit` to disable.
Signed-off-by: SuperQ <superq@gmail.com>
Previously, top-level initialization (including WAL reading) had this at
the top of the stack:
github.com/oklog/run.(*Group).Run.func1:38
I found this confusing, and thought it had something to do with logging.
Introducing another local function should make this clearer.
Also make the error message user-centric not code-centric.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
When we had a syntax error but restored the old file, we did not
re-trigger the config reload, so the config reload metric was showing
that config reload was unsucessful.
I made magic to handle logs in cmd/prometheus.
For now it is a separate file so we can backport this easily.
I will generalize the helper in another PR.
Signed-off-by: Julien <roidelapluie@o11y.eu>
* Add hidden flag for the delayed compaction random time window
Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>
* Update cmd/prometheus/main.go
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Alban Hurtaud <alban.hurtaud@amadeus.com>
* Update cmd/prometheus/main.go
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Alban Hurtaud <alban.hurtaud@amadeus.com>
* Update tsdb/db.go
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Alban Hurtaud <alban.hurtaud@amadeus.com>
* Fix flag name according to review - add test for delay
Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>
* Fix afer main rebase
Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>
* Implement review comments
Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>
* Update generatedelaytest to try with limit values
Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>
---------
Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>
Signed-off-by: Alban Hurtaud <alban.hurtaud@amadeus.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Fix some edge cases when OOO is enabled
Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>
Signed-off-by: Vanshika <102902652+Vanshikav123@users.noreply.github.com>
Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
Co-authored-by: Jesus Vazquez <jesusvzpg@gmail.com>
* promtool: Add debug flag for rule tests
This makes it print out the tsdb state (both input_series and rules that
are run) at the end of a test, making reasoning about tests much easier.
Signed-off-by: David Leadbeater <dgl@dgl.cx>
* Reuse generated test name from junit testing
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
---------
Signed-off-by: David Leadbeater <dgl@dgl.cx>
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Co-authored-by: David Leadbeater <dgl@dgl.cx>
turn some loops into subtests to make use of t.Parallel()
requires Go 1.22 to make use of https://go.dev/blog/loopvar-preview
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
For: #14355
This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:
- removed unused logging util func `RateLimit()`
- forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger
- move some of the json file logging functionality to use prom/common package functionality
- refactored some of the new json file logging for scraping
- changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers
- updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
- added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Now we check that a rule execution has taken place.
This also reduces the time to run the rules tests from 45s to 25s.
Signed-off-by: Julien <roidelapluie@o11y.eu>
The OTLP receiver can now considered stable. We've had it for longer
than a year in main and has received constant improvements.
Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
This commit introduces a new `/api/v1/notifications/live` endpoint that
utilizes Server-Sent Events (SSE) to stream notifications to the web UI.
This is used to display alerts such as when a configuration reload
has failed.
I opted for SSE over WebSockets because SSE is simpler to implement and
more robust for our use case. Since we only need one-way communication
from the server to the client, SSE fits perfectly without the overhead
of establishing and maintaining a two-way WebSocket connection.
When the SSE connection fails, we go back to a classic
/api/v1/notifications API endpoint.
This commit also contains the required UI changes for the new Mantine UI.
Signed-off-by: Julien <roidelapluie@o11y.eu>
The pattern of `import _ "net/http/pprof"` adds handlers to the default
http handler, but Prometheus does not use that. There are explicit
handlers in `web/web.go`.
So, we can remove this line with no impact to behaviour.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
When a YAML file is invalid, trigger auto-reload anyway so that user is
aware that the configuration file is incorrect.
Failing to do so does not change the reload status in metrics and api.
Signed-off-by: Julien <roidelapluie@o11y.eu>
Conflicts:
cmd/prometheus/main.go
docs/command-line/prometheus.md
docs/feature_flags.md
web/ui/build_ui.sh
web/web.go
Resolved by dropping the UTF-8 feature flag and adding the
`auto-reload-config` feature flag.
For the new web ui pick all changes from `main`.
This change causes Prometheus to allow all UTF-8 characters in metric and label names.
This means that names that were previously invalid and would have been previously rejected will be allowed through.
Signed-off-by: Owen Williams <owen.williams@grafana.com>
Fix call to newTestEngine(t) in promql/engine_test.go:3214.
`agent` feature-flag it's own cmdline flag now.
Remove `scrape.name-escaping-scheme` argument.
Signed-off-by: Jan Fajerski <jfajersk@redhat.com>
PromQL engine: Delay deletion of __name__ label to the end of the query evaluation
- This change allows optionally preserving the `__name__` label via the `label_replace` and `label_join` functions, and helps prevent the dreaded "vector cannot contain metrics with the same labelset" error.
- The implementation extends the `Series` and `Sample` structs with a boolean flag indicating whether the `__name__` label should be deleted at the end of the query evaluation.
- The `label_replace` and `label_join` functions can still access the value of the `__name__` label, even if it has been previously marked for deletion. If `__name__` is used as target label, it won't be dropped at the end of the query evaluation.
- Fixes https://github.com/prometheus/prometheus/issues/11397
- See https://github.com/jcreixell/prometheus/pull/2 for previous discussion, including the decision to create this PR and benchmark it before considering other alternatives (like refactoring `labels.Labels`).
- See https://github.com/jcreixell/prometheus/pull/1 for an alternative implementation using a special label instead of boolean flags.
- Note: a feature flag `promql-delayed-name-removal` has been added as it changes the behavior of some "weird" queries (see https://github.com/prometheus/prometheus/issues/11397#issuecomment-1451998792)
Example (this always fails, as `__name__` is being dropped by `count_over_time`):
```
count_over_time({__name__!=""}[1m])
=> Error executing query: vector cannot contain metrics with the same labelset
```
Before:
```
label_replace(count_over_time({__name__!=""}[1m]), "__name__", "count_$1", "__name__", "(.+)")
=> Error executing query: vector cannot contain metrics with the same labelset
```
After:
```
label_replace(count_over_time({__name__!=""}[1m]), "__name__", "count_$1", "__name__", "(.+)")
=>
count_go_gc_cycles_automatic_gc_cycles_total{instance="localhost:9090", job="prometheus"} 1
count_go_gc_cycles_forced_gc_cycles_total{instance="localhost:9090", job="prometheus"} 1
...
```
Signed-off-by: Jorge Creixell <jcreixell@gmail.com>
---------
Signed-off-by: Jorge Creixell <jcreixell@gmail.com>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Several things done here:
- Set `max-issues-per-linter` to 0 so that we actually see all linter
warnings and not just 50 per linter. (As we also set
`max-same-issues` to 0, I assume this was the intention from the
beginning.)
- Stop using the golangci-lint default excludes (by setting
`exclude-use-default: false`. Those are too generous and don't match
our style conventions. (I have re-added some of the excludes
explicitly in this commit. See below.)
- Re-add the `errcheck` exclusion we have used so far via the
defaults.
- Exclude the signature requirement `govet` has for `Seek` methods
because we use non-standard `Seek` methods a lot. (But we keep other
requirements, while the default excludes completely disabled the
check for common method segnatures.)
- Exclude warnings about missing doc comments on exported symbols. (We
used to be pretty adamant about doc comments, but stopped that at
some point in the past. By now, we have about 500 missing doc
comments. We may consider reintroducing this check, but that's
outside of the scope of this commit. The default excludes of
golangci-lint essentially ignore doc comments completely.)
- By stop using the default excludes, we now get warnings back on
malformed doc comments. That's the most impactful change in this
commit. It does not enforce doc comments (again), but _if_ there is
a doc comment, it has to have the recommended form. (Most of the
changes in this commit are fixing this form.)
- Improve wording/spelling of some comments in .golangci.yml, and
remove an outdated comment.
- Leave `package-comments` inactive, but add a TODO asking if we
should change that.
- Add a new sub-linter `comment-spacings` (and fix corresponding
comments), which avoids missing spaces after the leading `//`.
Signed-off-by: beorn7 <beorn@grafana.com>
avoid simultaneous compactions and reduce stress on shared resources.
This is enabled via `--enable-feature=delayed-compaction`.
Signed-off-by: machine424 <ayoubmrini424@gmail.com>