Commit Graph

16096 Commits

Author SHA1 Message Date
Bryan Boreham 2fb50b12cd
[PERF] TSDB: Optimize appender creation on empty chunks (#16922)
Skip creating an iterator and walking all through any existing values,
when we can easily tell there are no existing values.

This is the normal case - the TSDB head creates an appender immediately
after creating every chunk.

Remove redundant handling of empty chunks.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-08-27 17:11:08 +01:00
Bryan Boreham 4a782634a4
Merge pull request #17093 from prometheus/beorn7/histogram
tsdb: Remove unused `Layout()` methods
2025-08-27 16:26:43 +01:00
beorn7 23f1d3ba25 tsdb: Remove unused `Layout()` methods
Both `HistogramChunk` and `FloatHistogramChunk` have a `Layout()`
method for historical reasons. As it has turned out, these methods are
unused and also buggy. This commit simply removes them.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-08-27 17:01:23 +02:00
beorn7 71c21fb9e4 Fix minor issues after applying analyzer "modernize"
- The tool left an empty line behind that we don't need anymore, see
  https://github.com/prometheus/prometheus/pull/17092. (Arguably not a
  bug in the tool but just our stricter style about empty lines.)

- In tsdb/index/postings_test.go , our (admittedly somewhat
  convoluted) code structure tricked the tool so it spit out something
  that wouldn't even compile.

- storage/remote/queue_manager_test.go is just a minor formatting
  nit.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-08-27 15:44:11 +02:00
beorn7 747c5ee2b1 Apply analyzer "modernize" to the whole codebase
See
https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize
for details.

This ran into a few issues (arguably bugs in the modernize tool),
which I will fix in the next commit, so that we have transparency what
was done automatically.

Beyond those hiccups, I believe all the changes applied are
legitimate. Even where there might be no tangible direct gain, I would
argue it's still better to use the "modern" way to avoid micro
discussions in tiny style PRs later.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-08-27 14:48:41 +02:00
Ayoub Mrini 9cbb3a66c9
Merge pull request #17063 from machine424/muttt
test(notifier): add a test showing an alert mutation bug between alertmanager_config and fix it
2025-08-27 11:13:00 +02:00
pipiland2612 0246aa22f4 Parallel test
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-08-27 10:47:39 +02:00
Darkknight 9fc4212214
revert unexpected metadata metric for RWV2 and add log on unexpected metadata instead. (#17082)
Signed-off-by: leegin <leegin.t@gmail.com>
2025-08-26 11:54:14 -07:00
Lukasz Mierzwa 31282d67b7 Log when GC / block write starts
Right now Prometheus only logs when these operations are completed.
It's a bit surprising to see suddenly a message saying "I was busy doing X for the past N minutes"
so let's add a message when the operation starts, so it's easier to understand what Prometheus was doing at any point in time
when reading logs.

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-08-26 10:30:22 +01:00
bragi92 20580b6ba8
remote_write azure auth : add workload identity support (#16788)
* initial changes

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

* .

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

* fix comments

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

* fix tenantid test

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

* style

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

* Update storage/remote/azuread/azuread.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>

* Update storage/remote/azuread/azuread.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>

* Update storage/remote/azuread/azuread.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>

* Update storage/remote/azuread/azuread.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>

* Update storage/remote/azuread/azuread.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>

* Update storage/remote/azuread/azuread.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>

* Update storage/remote/azuread/azuread.go

Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>

* pr feedback

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>

---------

Signed-off-by: Kaveesh Dubey <kadubey@microsoft.com>
Signed-off-by: bragi92 <kadubey@microsoft.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2025-08-26 07:14:47 +01:00
machine424 8f79470ca9
fix(notifier): create a new alert when relabeling alters labels
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-08-26 07:37:45 +02:00
SuperQ b87cbf0294
Fixup err nil checks
Cleanup double `if` statements for errors being nil / not-nil.

Signed-off-by: SuperQ <superq@gmail.com>
2025-08-25 17:37:02 +02:00
machine424 bd725fd6b8
test(notifier): add a test showing an alert mutation bug between alertmanager_config (alertmanagersets)
The alert_relabel_configs should only apply to the corresponding alertmanagerset

Signed-off-by: machine424 <ayoubmrini424@gmail.com>
2025-08-25 17:04:14 +02:00
Darkknight 7cf585527f
remote_write: add metric for unexpected metadata in populateV2TimeSeries (#17034)
add metric to track unexpected metadata seen in populateV2TimeSeries, which would indicate metadata incorrectly routed in queue_manager code paths

---------

Signed-off-by: leegin <leegin.t@gmail.com>
Signed-off-by: Darkknight <leegin.t@gmail.com>
2025-08-22 10:33:52 -07:00
Bryan Boreham 153cdb2b0b [PERF] PromQL: Replace Fprintf %f with AppendFloat
The combination of `AvailableBuffer` followed by `Write` is optimised
inside `bytes.Buffer`.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-08-22 10:58:01 +01:00
Minh Nguyen c8deefb038
[tsdb] Add CounterResetHint: CounterReset to synthetic zero sample (#17011)
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-08-21 23:26:01 +02:00
Marco Pracucci 954cad35b2
Optimise concurrent rule evaluation for rules querying ALERTS and ALERTS_FOR_STATE (#17064)
* Optimise concurrent rule evaluation for rules querying ALERTS and ALERTS_FOR_STATE

Signed-off-by: Marco Pracucci <marco@pracucci.com>

* Further optimised the case of ALERTS and ALERTS_FOR_STATE without alertname label matcher

Signed-off-by: Marco Pracucci <marco@pracucci.com>

---------

Signed-off-by: Marco Pracucci <marco@pracucci.com>
2025-08-21 16:57:57 +02:00
Bryan Boreham b8d2d505f5 [PERF] PromQL: Replace some Sprintf with bytes.Buffer
Goes faster due to reduced memory allocation.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-08-21 11:38:05 +01:00
Bryan Boreham 49d9261693 [PERF] PromQL: Replace some simple Sprintf with string concat
This goes faster because there is no runtime format parsing.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-08-21 11:05:49 +01:00
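
To illustrate the idea in the commit above, here is a small sketch under stated assumptions: `withSprintf` and `withConcat` are hypothetical functions, not code from the PromQL printer. They produce the same string, but only the first has to interpret a format string at run time.

```go
package main

import "fmt"

// withSprintf interprets the format string on every call.
func withSprintf(name, arg string) string {
	return fmt.Sprintf("%s(%s)", name, arg)
}

// withConcat builds the same string with plain concatenation,
// so there is no format string to parse.
func withConcat(name, arg string) string {
	return name + "(" + arg + ")"
}

func main() {
	// Both approaches should yield identical output.
	fmt.Println(withSprintf("rate", "x[5m]") == withConcat("rate", "x[5m]"))
}
```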
Bryan Boreham e44ee2f182 [TESTS] PromQL: Add BenchmarkExprString
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-08-21 11:05:38 +01:00
Bryan Boreham 66fbea97bb [TESTS] Check expr with function call in TestExprString
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-08-21 11:04:46 +01:00
Bryan Boreham 8b3f59e9c3
Merge pull request #16593 from bboreham/ast-child-iter
[PERF] PromQL: Reduce allocations when walking syntax tree
2025-08-21 09:14:41 +01:00
Björn Rabenstein 7e37066994
Merge pull request #17051 from juliusmh/nh_counter_reset_hint_collision_annotation
Histograms: set annotation when adding or subtracting histograms that have `not_reset` and `reset` hints.
2025-08-20 17:24:12 +02:00
Julius Hinze cdf7208478
annotations: histogram counter reset warning includes operation
Signed-off-by: Julius Hinze <julius.hinze@grafana.com>
2025-08-20 15:14:21 +02:00
Julius Hinze 77b5c3f217
Histograms: set annotation when adding or subtracting histograms that have `not_reset` and `reset` hints.
Signed-off-by: Julius Hinze <julius.hinze@grafana.com>
2025-08-20 15:00:45 +02:00
Bryan Boreham 498f63e60b
Merge pull request #17029 from pr00se/wal-checkpoint-dropped-samples
TSDB: use timestamps rather than WAL segment numbers to track how long deleted series should be retained in checkpoints
2025-08-20 11:15:10 +01:00
Bartlomiej Plotka 5dc3c976b4
Merge pull request #17061 from prometheus/not-parallel
[TESTS] remote-write: Make TestShutdown non-parallel to reduce flakes.
2025-08-20 09:03:45 +01:00
Ganesh Vernekar a86d9a3858
Merge pull request #16925 from prometheus/codesome/stale-series-tracking
tsdb: Track stale series in the Head block based on stale sample
2025-08-19 15:35:19 -07:00
Patryk Prus bbc9e47e42
Add comment about differences between agent mode and regular Prometheus
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-08-19 18:33:52 -04:00
Ganesh Vernekar 3904b3cd5f Restore stale series count from chunk snapshots
Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>
2025-08-19 15:07:37 -07:00
Ganesh Vernekar b29ce3e489 Restore stale series count on WAL replay
Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>
2025-08-19 15:07:37 -07:00
Ganesh Vernekar 0c3d3d7466 Test the stale series tracking in Head
Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>
2025-08-19 15:07:37 -07:00
Ganesh Vernekar 7a947d3629 Track stale series in the Head
Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>
2025-08-19 15:07:27 -07:00
Bryan Boreham a3c4a9bd18 [TESTS] remote-write: Make TestShutdown non-parallel to reduce flakes.
Resolves #17045.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-08-19 18:04:20 +01:00
Justin Jung 0f98dcbc07
Engine: Allow error response code to be customized (#16257)
Currently the API always returns http code 422 for engine execution error, and

This PR allows the error code to be overridden, based on the ErrorType and the error itself.

Signed-off-by: Justin Jung <jungjust@amazon.com>
Signed-off-by: Justin Jung <justinjung04@gmail.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
2025-08-19 16:43:47 +01:00
Bartlomiej Plotka 93bbf4bc90
Merge pull request #17041 from bernot-dev/remove-queue-manager-startup-benchmark
test: remove obsolete queue manager test
2025-08-18 17:06:39 +01:00
Arve Knudsen 0a40df33fb
Make metric/label name validation scheme explicit (#16928)
* Parameterize metric/label name validation scheme

Parameterized metric/label name validation scheme

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
Co-authored-by: Julius Hinze <julius.hinze@grafana.com>
2025-08-18 08:09:00 +00:00
Arve Knudsen 68d0d3eee3
Remote write: Return after writing error response for invalid compression (#17050)
* Remote write: Return after writing error response for invalid compression

Fix remote write HTTP handler to return after writing error response for
invalid compression (non-Snappy).

---------

Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
2025-08-17 15:18:47 +00:00
Bryan Boreham af588174b0
Merge pull request #17052 from vicwicker/fix-chunk-encoding-docs
docs: Fix chunk format documentation for `varint` encoding
2025-08-17 09:35:19 +01:00
Adam Bernot 575a60ec92
test: fix flaky test
A race condition in TestSendSamplesWithBackoffWithSampleAgeLimit was
observed in CI where the sample age limit was too close to the backoff
time, causing samples to be dropped intermittently. Increasing the
SampleAgeLimit resolves the problem.

Signed-off-by: Adam Bernot <bernot@google.com>
2025-08-15 18:58:27 +00:00
Björn Rabenstein e8f650b00c
Merge pull request #17049 from prometheus/beorn7/doc
docs: counter vs. gauge histogram behavior with `+`/`-`
2025-08-15 11:19:51 +02:00
Victor Herrero Otal 0cbbc9b7d3 docs: Fix chunk format documentation for `varint` encoding
While preparing PR #16701, we identified an inconsistency in the chunk
format documentation. The `varint` encoding can require up to 10 bytes
for a 64-bit integer, such as when timestamps are encoded. However, the
chunk length field is a 32-bit integer, which requires at most 5 bytes
in `varint` encoding.

This is reflected in the code, where a maximum of 5 bytes are read when
parsing the chunk length.

    50ba25f273/tsdb/chunks/chunks.go (L709-L711)

    50ba25f273/tsdb/chunks/chunks.go (L47-L48)

Co-authored-by: Istvan Zoltan Ballok <istvan.zoltan.ballok@sap.com>
Signed-off-by: Victor Herrero Otal <victor.herrero.otal@sap.com>
2025-08-15 10:56:21 +02:00
Björn Rabenstein 1c002c5669
Merge pull request #17048 from prymitive/parserErr
ENHANCEMENT: Refactor TestParseExpressions to be more explicit about errors
2025-08-14 16:04:16 +02:00
Dimitar Dimitrov d94dab92a8
remote.ReadClient: allow multiple queries (#16742)
* remote read: simplify ReadMultiple to return single SeriesSet

Changed ReadMultiple to return a single SeriesSet with interleaved
series from all queries instead of a slice of SeriesSets. This
simplifies the interface and removes the complex multiplexing
infrastructure while maintaining the ability to send multiple
queries in a single HTTP request.

Changes:
- Updated ReadClient interface: ReadMultiple now returns storage.SeriesSet
- Removed multiplexing infrastructure (MessageQueue, QueueConsumer, etc.)
- Simplified response handling to interleave series from all queries
- Updated tests to match new interface
- All existing tests pass

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

* Fix sorting behavior in ReadMultiple for samples responses

When sortSeries=false, the previous implementation incorrectly used
storage.NewMergeSeriesSet which requires sorted inputs, violating the
function's contract and potentially producing incorrect results.

Changes:
- When sortSeries=true: Use NewMergeSeriesSet for efficient merging and
  deduplication of sorted series
- When sortSeries=false: Use simple concatenation to avoid the sorted
  input requirement, preserving duplicates from overlapping queries
- Add comprehensive tests to verify both sorting behaviors
- Update existing test expectations to match correct sorted order

Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>

* Refactor to reduce code duplication in ReadMultiple implementation

Extract common query result combination logic into a shared
combineQueryResults function that handles both sorted and unsorted
cases. This eliminates duplication between the real client
implementation and the mock client used in tests.

Changes:
- Add combineQueryResults helper function in client.go
- Refactor handleSamplesResponseImpl to use the helper
- Simplify mockedRemoteClient.ReadMultiple to use the same helper
- Reduce code duplication by ~30 lines while maintaining same functionality


Signed-off-by: Dimitar Dimitrov <dimitar.dimitrov@grafana.com>
2025-08-14 15:00:07 +02:00
Lukasz Mierzwa 7e22d2e5c0 Refactor TestParseExpressions to be more explicit about errors
Right now TestParseExpressions tests if a query returns an error but it only does a fuzzy check on returned errors.
The error returned by the parser is ParseErrors, which is a slice of ParseErr structs.
The Error() method on ParseErrors will return an error string based on the first error in that slice. This hides other returned errors so we can end up with bogus errors being returned but won't ever find this via this test.
This change makes the test compare returned error (which is always ParseErrors type) with expected ParseErrors slice.
The extra benefit of this is that current tests mostly ignore error positional range and only test for correct error message. Now errors must return expected positional information.
There are a few cases uncovered where the positional information of errors seems wrong, added FIXME for these lines.

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-08-14 10:01:44 +01:00
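
The problem described in the commit above can be sketched as follows. The types `parseErr` and `parseErrors` and the helper `sameErrors` are simplified, hypothetical stand-ins, not the real PromQL parser types: the point is only that comparing error strings sees the first entry, while comparing the whole slice also catches extra errors and wrong positions.

```go
package main

import (
	"errors"
	"fmt"
)

// parseErr is a hypothetical stand-in for a single parser error with a position range.
type parseErr struct {
	start, end int
	msg        string
}

// parseErrors is a hypothetical stand-in for the slice type returned by a parser.
type parseErrors []parseErr

// Error reports only the first entry, so later entries never show up in the string.
func (es parseErrors) Error() string {
	if len(es) == 0 {
		return "unknown parse error"
	}
	return es[0].msg
}

// sameErrors compares every entry, including positional information.
func sameErrors(err error, want parseErrors) bool {
	var got parseErrors
	if !errors.As(err, &got) || len(got) != len(want) {
		return false
	}
	for i := range got {
		if got[i] != want[i] {
			return false
		}
	}
	return true
}

func main() {
	var err error = parseErrors{{0, 3, "unexpected token"}, {4, 9, "bogus extra error"}}
	want := parseErrors{{0, 3, "unexpected token"}}
	// The string comparison cannot see the second, bogus error.
	fmt.Println(err.Error() == want.Error())
	// The full comparison does.
	fmt.Println(sameErrors(err, want))
}
```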
Björn Rabenstein 17d5d80c80
Merge pull request #17047 from shk1999/docs
chore(configuration.md): fix unit_testing_rules typo
2025-08-13 19:44:38 +02:00
beorn7 1071c82b42 docs: counter vs. gauge histogram behavior with `+`/`-`
This mostly handles the cases mentioned in #16576. However, there are
some related changes in here, too:

- Some line formatting to avoid lines longer than 80 characters.

- Establish in basics.md that histograms have a counter vs. gauge
  "flavor" that is also stored in the sample and not just by
  convention as for float samples.

- Add the documentation of the unary minus, which was missing so far.
  This requires a bit of restructuring.

- Cleaned up a few references to "Prometheus" that should better refer
  to "PromQL" (and "Prometheus's query language" → "PromQL" etc.).

I decided to not explain in all detail when and how PromQL detects an
incompatible counter reset. The spec is linked from basics.md, so the
minority that might be interested in this can still look it up.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-08-13 19:25:17 +02:00
Lukasz Mierzwa 7b308dc7fe Add a note about PositionRange values
Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-08-13 17:57:40 +01:00
Neeraj Gartia 2c0de4e7c2
Fix `histogram_quantile` annotation in range query when delayed name removal is disabled (#16794)
Signed-off-by: Neeraj Gartia <neerajgartia211002@gmail.com>
2025-08-13 18:06:48 +02:00
Björn Rabenstein 4217d4ba46
Merge pull request #17046 from prometheus/beorn7/promql
promqltest: Add test for unary minus with native histograms
2025-08-13 16:57:09 +02:00