Commit Graph

1428 Commits

Author SHA1 Message Date
Björn Rabenstein 1caac94026
Merge pull request #17302 from prometheus/release-3.7
buf.build / lint and publish (push) Waiting to run Details
CI / Go tests (push) Waiting to run Details
CI / More Go tests (push) Waiting to run Details
CI / Go tests with previous Go version (push) Waiting to run Details
CI / UI tests (push) Waiting to run Details
CI / Go tests on Windows (push) Waiting to run Details
CI / Mixins tests (push) Waiting to run Details
CI / Build Prometheus for common architectures (0) (push) Waiting to run Details
CI / Build Prometheus for common architectures (1) (push) Waiting to run Details
CI / Build Prometheus for common architectures (2) (push) Waiting to run Details
CI / Build Prometheus for all architectures (0) (push) Waiting to run Details
CI / Build Prometheus for all architectures (1) (push) Waiting to run Details
CI / Build Prometheus for all architectures (10) (push) Waiting to run Details
CI / Build Prometheus for all architectures (11) (push) Waiting to run Details
CI / Build Prometheus for all architectures (2) (push) Waiting to run Details
CI / Build Prometheus for all architectures (3) (push) Waiting to run Details
CI / Build Prometheus for all architectures (4) (push) Waiting to run Details
CI / Build Prometheus for all architectures (5) (push) Waiting to run Details
CI / Build Prometheus for all architectures (6) (push) Waiting to run Details
CI / Build Prometheus for all architectures (7) (push) Waiting to run Details
CI / Build Prometheus for all architectures (8) (push) Waiting to run Details
CI / Build Prometheus for all architectures (9) (push) Waiting to run Details
CI / Report status of build Prometheus for all architectures (push) Blocked by required conditions Details
CI / Check generated parser (push) Waiting to run Details
CI / golangci-lint (push) Waiting to run Details
CI / fuzzing (push) Waiting to run Details
CI / codeql (push) Waiting to run Details
CI / Publish main branch artifacts (push) Blocked by required conditions Details
CI / Publish release artefacts (push) Blocked by required conditions Details
CI / Publish UI on npm Registry (push) Blocked by required conditions Details
Scorecards supply-chain security / Scorecards analysis (push) Waiting to run Details
Merge release-3.7 back into main.
2025-10-07 18:40:08 +02:00
beorn7 e2aed2cd27 tsdb: Disable more tests on MS Windows
Signed-off-by: beorn7 <beorn@grafana.com>
2025-10-07 16:34:59 +02:00
Björn Rabenstein 68e4d4e5eb
Merge pull request #17298 from prometheus/release-3.7
buf.build / lint and publish (push) Waiting to run Details
CI / Go tests (push) Waiting to run Details
CI / More Go tests (push) Waiting to run Details
CI / Go tests with previous Go version (push) Waiting to run Details
CI / UI tests (push) Waiting to run Details
CI / Go tests on Windows (push) Waiting to run Details
CI / Mixins tests (push) Waiting to run Details
CI / Build Prometheus for common architectures (0) (push) Waiting to run Details
CI / Build Prometheus for common architectures (1) (push) Waiting to run Details
CI / Build Prometheus for common architectures (2) (push) Waiting to run Details
CI / Build Prometheus for all architectures (0) (push) Waiting to run Details
CI / Build Prometheus for all architectures (1) (push) Waiting to run Details
CI / Build Prometheus for all architectures (10) (push) Waiting to run Details
CI / Build Prometheus for all architectures (11) (push) Waiting to run Details
CI / Build Prometheus for all architectures (2) (push) Waiting to run Details
CI / Build Prometheus for all architectures (3) (push) Waiting to run Details
CI / Build Prometheus for all architectures (4) (push) Waiting to run Details
CI / Build Prometheus for all architectures (5) (push) Waiting to run Details
CI / Build Prometheus for all architectures (6) (push) Waiting to run Details
CI / Build Prometheus for all architectures (7) (push) Waiting to run Details
CI / Build Prometheus for all architectures (8) (push) Waiting to run Details
CI / Build Prometheus for all architectures (9) (push) Waiting to run Details
CI / Report status of build Prometheus for all architectures (push) Blocked by required conditions Details
CI / Check generated parser (push) Waiting to run Details
CI / golangci-lint (push) Waiting to run Details
CI / fuzzing (push) Waiting to run Details
CI / codeql (push) Waiting to run Details
CI / Publish main branch artifacts (push) Blocked by required conditions Details
CI / Publish release artefacts (push) Blocked by required conditions Details
CI / Publish UI on npm Registry (push) Blocked by required conditions Details
Scorecards supply-chain security / Scorecards analysis (push) Waiting to run Details
Merging back release-3.7 branch into master
2025-10-07 16:23:36 +02:00
Björn Rabenstein f2fc492473
Merge pull request #17284 from linasm/custom-bucket-bounds-match-fn
NHCB: Separate CustomBucketBoundsMatch from FloatBucketsMatch
2025-10-07 15:38:59 +02:00
beorn7 51c8e55835 tsdb: Do not track stFloat in typesInBatch explicitly
Signed-off-by: beorn7 <beorn@grafana.com>
2025-10-07 15:01:22 +02:00
beorn7 5f582a7e1f tsdb: Remove leftover debug fmt.Println
Signed-off-by: beorn7 <beorn@grafana.com>
2025-10-07 14:58:25 +02:00
Bartlomiej Plotka a4da440dad
fix: Fix slicelabels corruption when used with proto decoding (#17150)
* fix: Fix slicelabels corruption when used with proto decoding

Alternative to https://github.com/prometheus/prometheus/pull/16957/

Signed-off-by: bwplotka <bwplotka@gmail.com>

* addressed comments

Signed-off-by: bwplotka <bwplotka@gmail.com>

---------

Signed-off-by: bwplotka <bwplotka@gmail.com>
2025-10-07 12:06:48 +01:00
György Krajcsovits d11ee103ac
perf(tsdb): reuse map of sample types to speed up head appender
While investigating +10% CPU in v3.7 release, found that ~5% is from
expanding the types map. Try reuse.

Also fix a linter error.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-10-07 08:31:20 +02:00
György Krajcsovits c26a5390aa
perf(tsdb): reuse map of sample types to speed up head appender
While investigating +10% CPU in v3.7 release, found that ~5% is from
expanding the types map. Try reuse.

Also fix a linter error.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-10-06 21:44:34 +02:00
Linas Medziunas 8caf1f1c41 [NHCB] Separate CustomBucketBoundsMatch from floatBucketsMatch
Signed-off-by: Linas Medziunas <linas.medziunas@gmail.com>
2025-10-05 22:38:07 +03:00
Patryk Prus dc3e6af91a
tsdb: Fix appended sample count metrics when converting float staleness markers to histograms (#17241)
buf.build / lint and publish (push) Waiting to run Details
CI / Go tests (push) Waiting to run Details
CI / More Go tests (push) Waiting to run Details
CI / Go tests with previous Go version (push) Waiting to run Details
CI / UI tests (push) Waiting to run Details
CI / Go tests on Windows (push) Waiting to run Details
CI / Mixins tests (push) Waiting to run Details
CI / Build Prometheus for common architectures (0) (push) Waiting to run Details
CI / Build Prometheus for common architectures (1) (push) Waiting to run Details
CI / Build Prometheus for common architectures (2) (push) Waiting to run Details
CI / Build Prometheus for all architectures (0) (push) Waiting to run Details
CI / Build Prometheus for all architectures (1) (push) Waiting to run Details
CI / Build Prometheus for all architectures (10) (push) Waiting to run Details
CI / Build Prometheus for all architectures (11) (push) Waiting to run Details
CI / Build Prometheus for all architectures (2) (push) Waiting to run Details
CI / Build Prometheus for all architectures (3) (push) Waiting to run Details
CI / Build Prometheus for all architectures (4) (push) Waiting to run Details
CI / Build Prometheus for all architectures (5) (push) Waiting to run Details
CI / Build Prometheus for all architectures (6) (push) Waiting to run Details
CI / Build Prometheus for all architectures (7) (push) Waiting to run Details
CI / Build Prometheus for all architectures (8) (push) Waiting to run Details
CI / Build Prometheus for all architectures (9) (push) Waiting to run Details
CI / Report status of build Prometheus for all architectures (push) Blocked by required conditions Details
CI / Check generated parser (push) Waiting to run Details
CI / golangci-lint (push) Waiting to run Details
CI / fuzzing (push) Waiting to run Details
CI / codeql (push) Waiting to run Details
CI / Publish main branch artifacts (push) Blocked by required conditions Details
CI / Publish release artefacts (push) Blocked by required conditions Details
CI / Publish UI on npm Registry (push) Blocked by required conditions Details
Scorecards supply-chain security / Scorecards analysis (push) Waiting to run Details
tsdb: Fix appended sample count metrics when converting histogram staleness markers

Signed-off-by: Patryk Prus <p@trykpr.us>
Signed-off-by: Björn Rabenstein <github@rabenste.in>
Co-authored-by: Björn Rabenstein <github@rabenste.in>
2025-09-30 16:49:54 +00:00
George Krajcsovits 35d9f28c87
Update tsdb/record/record.go
Co-authored-by: Björn Rabenstein <beorn@grafana.com>
Signed-off-by: George Krajcsovits <krajorama@users.noreply.github.com>
2025-09-24 14:27:37 +02:00
György Krajcsovits 30f941c57c
fix(wal): ignore invalid native histogram schemas on load
Reduce the resolution of histograms as needed and ignore invalid
schemas while emitting a warning log.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-24 11:41:25 +02:00
György Krajcsovits a5a6413c1a
better errors naming and formatting, typo fixes
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-23 11:20:55 +02:00
György Krajcsovits b6df8d3274
feat(chunkenc): allow more native histograms schemas
Allow -9..52 schemas instead of just -4..8, but reduce resolution to 8 if
above.

The reduce code path will be slow, but we only expect it to happen if
TSDB already has higher resolution samples and we are in a rollback.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>

# Conflicts:
#	model/histogram/generic.go
2025-09-23 11:20:48 +02:00
György Krajcsovits 5b39b79f5a
refactor error creation and tests
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-19 09:26:34 +02:00
György Krajcsovits b99378f2c4
Merge remote-tracking branch 'origin/main' into krajo/native-histogram-schema-validation 2025-09-19 08:59:00 +02:00
György Krajcsovits 267be7dc20
fix(chunkenc): error out when reading unknown histogram schemas from chunks
Otherwise higher level code like PromQL needs to constantly check if it
can handle the samples.

Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
2025-09-18 09:21:03 +02:00
beorn7 bd0bf66f31 tsdb: Include floatHistograms in headAppender.Rollback()
Signed-off-by: beorn7 <beorn@grafana.com>
2025-09-17 19:22:25 +02:00
beorn7 b1fbf4f1e2 tsdb: Refactor staleness marker handling
With the fixed commit order, we can now handle the conversion of float
staleness markers to histogram staleness markers in a more direct way.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-09-17 19:22:25 +02:00
beorn7 7e82bdb75b tsdb: Fix commit order for mixed-typed series
Fixes https://github.com/prometheus/prometheus/issues/15177

The basic idea here is to divide the samples to be commited into (sub)
batches whenever we detect that the same series receives a sample of a
type different from the previous one. We then commit those batches one
after another, and we log them to the WAL one after another, so that
we hit both birds with the same stone. The cost of the stone is that
we have to track the sample type of each series in a map. Given the
amount of things we already track in the appender, I hope that it
won't make a dent. Note that this even addresses the NHCB special case
in the WAL.

This does a few other things that I could not resist to pick up on the
go:

- It adds more zeropool.Pools and uses the existing ones more
  consistently. My understanding is that this was merely an oversight.
  Maybe the additional pool usage will compensate for the increased
  memory demand of the map.

- Create the synthetic zero sample for histograms a bit more
  carefully. So far, we created a sample that always went into its own
  chunk. Now we create a sample that is compatible enough with the
  following sample to go into the same chunk. This changed the test
  results quite a bit. But IMHO it makes much more sense now.

- Continuing past efforts, I changed more namings of `Samples` into
  `Floats` to keep things consistent and less confusing. (Histogram
  samples are also samples.) I still avoided changing names in other
  packages.

- I added a few shortcuts `h := a.head`, saving many characters.

TODOs:

- Address @krajorama's TODOs about commit order and staleness handling.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-09-17 19:22:25 +02:00
beorn7 46cfc9fb99 tsdb: Extend TestDataNotAvailableAfterRollback
This exposes the ommission of float histograms from the rollback.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-09-17 19:22:25 +02:00
Bryan Boreham 11c49151b7
[REFACTOR] TSDB chunks: replace magic numbers with constants (#17095)
For size of header and position of flags byte.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-09-02 16:05:21 +01:00
Bryan Boreham aa12c0d4c3
Merge pull request #17074 from prymitive/logs
TSDB: Log when GC / block write starts
2025-09-02 12:55:12 +01:00
Bryan Boreham 8e133e100f
Merge pull request #17081 from prometheus/superq/if_err_nil
tsdb: Fixup err nil checks
2025-09-02 12:37:51 +01:00
bwplotka 794bf774c2 Reapply "prw: use Unit and Type labels for metadata when feature flag is enabled (#17033)"
This reverts commit f5fab47577.
2025-08-29 08:16:37 +01:00
bwplotka f5fab47577 Revert "prw: use Unit and Type labels for metadata when feature flag is enabled (#17033)"
This reverts commit c808a71e18.
2025-08-29 08:15:28 +01:00
Jonathan c808a71e18
prw: use Unit and Type labels for metadata when feature flag is enabled (#17033)
* chore: send Unit and Type when feature flag is enabled

Signed-off-by: perebaj <perebaj@gmail.com>

* remove unused code and comments

Signed-off-by: perebaj <perebaj@gmail.com>

* remove unreal scenario

Signed-off-by: perebaj <perebaj@gmail.com>

* remove unused if

Signed-off-by: perebaj <perebaj@gmail.com>

* remove unused labels

Signed-off-by: perebaj <perebaj@gmail.com>

* linter

Signed-off-by: perebaj <perebaj@gmail.com>

* enable type and unit through remotewrite config

Signed-off-by: perebaj <perebaj@gmail.com>

* remove test comment and capture type and unit when flag is enabled

Signed-off-by: perebaj <perebaj@gmail.com>

* gofumpt

Signed-off-by: perebaj <perebaj@gmail.com>

* modelTypeToWriteV2Type

Signed-off-by: perebaj <perebaj@gmail.com>

* use NewMetadataFromLabels

Signed-off-by: perebaj <perebaj@gmail.com>

* capture feature flag from main

Signed-off-by: perebaj <perebaj@gmail.com>

* simplifying logic

Signed-off-by: perebaj <perebaj@gmail.com>

* remove unused function

Signed-off-by: perebaj <perebaj@gmail.com>

* formatting code

Signed-off-by: perebaj <perebaj@gmail.com>

* gofumpt

Signed-off-by: perebaj <perebaj@gmail.com>

* remove public var: EnableTypeAndUnitLabels

Signed-off-by: perebaj <perebaj@gmail.com>

* remove enableTypeAndUnitLabels from TestPopulateV2TimeSeries_typeAndUnitLabels

Signed-off-by: perebaj <perebaj@gmail.com>

* remove enableTypeAndUnitLabels from main

Signed-off-by: perebaj <perebaj@gmail.com>

* use schema helper to populate metadata

Signed-off-by: perebaj <perebaj@gmail.com>

* remove metadata since nil is the default value

Signed-off-by: perebaj <perebaj@gmail.com>

* add TestPopulateV2TimeSeries_UnexpectedMetadata

Signed-off-by: perebaj <perebaj@gmail.com>

* Update storage/remote/queue_manager_test.go

Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>

---------

Signed-off-by: perebaj <perebaj@gmail.com>
Signed-off-by: Bartlomiej Plotka <bwplotka@gmail.com>
Co-authored-by: Bartlomiej Plotka <bwplotka@gmail.com>
2025-08-29 04:10:01 +00:00
Björn Rabenstein ba808d1736
Merge pull request #17092 from prometheus/beorn7/cleanup
Apply analyzer modernize to the whole codebase
2025-08-28 00:42:33 +02:00
Bryan Boreham 2fb50b12cd
[PERF] TSDB: Optimize appender creation on empty chunks (#16922)
Skip creating an iterator and walking all through any existing values,
when we can easily tell there are no existing values.

This is the normal case - the TSDB head creates an appender immediately
after creating every chunk.

Remove redundant handling of empty chunks.

Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
2025-08-27 17:11:08 +01:00
beorn7 23f1d3ba25 tsdb: Remove unused `Layout()` methods
Both `HistogramChunk` and `FloatHistogramChunk` have a `Layout()`
method for historical reasons. As it has turned out, these methods are
unused and also buggy. This commit simply removes them.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-08-27 17:01:23 +02:00
beorn7 71c21fb9e4 Fix minor issues after applying analyzer "modernize"
- The tool left an empty line behind that we don't need anymore, see
  https://github.com/prometheus/prometheus/pull/17092. (Arguably not a
  bug in the tool but just our stricter style about empty lines.)

- In tsdb/index/postings_test.go , our (admittedly somewhat
  convoluted) code structure tricked the tool so it spit out something
  that wouldn't even compile.

- storage/remote/queue_manager_test.go is just a minor formatting
  nit.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-08-27 15:44:11 +02:00
beorn7 747c5ee2b1 Apply analyzer "modernize" to the whole codebase
See
https://pkg.go.dev/golang.org/x/tools/gopls/internal/analysis/modernize
for details.

This ran into a few issues (arguably bugs in the modernize tool),
which I will fix in the next commit, so that we have transparency what
was done automatically.

Beyond those hiccups, I believe all the changes applied are
legitimate. Even where there might be no tangible direct gain, I would
argue it's still better to use the "modern" way to avoid micro
discussions in tiny style PRs later.

Signed-off-by: beorn7 <beorn@grafana.com>
2025-08-27 14:48:41 +02:00
Lukasz Mierzwa 31282d67b7 Log when GC / block write starts
Right now Prometheus only logs when these operations are completed.
It's a bit surprising to see suddenly a message saying "I was busy doing X for the past N minutes"
so let's add a message when the operation starts, so it's easier to understand what Prometheus was doing at any point in time
when reading logs.

Signed-off-by: Lukasz Mierzwa <l.mierzwa@gmail.com>
2025-08-26 10:30:22 +01:00
SuperQ b87cbf0294
Fixup err nil checks
Cleanup double `if` statements for errors being nil / not-nil.

Signed-off-by: SuperQ <superq@gmail.com>
2025-08-25 17:37:02 +02:00
Minh Nguyen c8deefb038
[tsdb] Add CounterResetHint: CounterReset to synthetic zero sample (#17011)
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-08-21 23:26:01 +02:00
Bryan Boreham 498f63e60b
Merge pull request #17029 from pr00se/wal-checkpoint-dropped-samples
TSDB: use timestamps rather than WAL segment numbers to track how long deleted series should be retained in checkpoints
2025-08-20 11:15:10 +01:00
Ganesh Vernekar a86d9a3858
Merge pull request #16925 from prometheus/codesome/stale-series-tracking
tsdb: Track stale series in the Head block based on stale sample
2025-08-19 15:35:19 -07:00
Patryk Prus bbc9e47e42
Add comment about differences between agent mode and regular Prometheus
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-08-19 18:33:52 -04:00
Ganesh Vernekar 3904b3cd5f Restore stale series count from chunk snapshots
Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>
2025-08-19 15:07:37 -07:00
Ganesh Vernekar b29ce3e489 Restore stale series count on WAL replay
Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>
2025-08-19 15:07:37 -07:00
Ganesh Vernekar 0c3d3d7466 Test the stale series tracking in Head
Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>
2025-08-19 15:07:37 -07:00
Ganesh Vernekar 7a947d3629 Track stale series in the Head
Signed-off-by: Ganesh Vernekar <ganesh.vernekar@reddit.com>
2025-08-19 15:07:27 -07:00
Victor Herrero Otal 0cbbc9b7d3 docs: Fix chunk format documentation for `varint` encoding
While preparing PR #16701, we identified an inconsistency in the chunk
format documentation. The `varint` encoding can require up to 10 bytes
for a 64-bit integer, such as when timestamps are encoded. However, the
chunk length field is a 32-bit integer, which requires at most 5 bytes
in `varint` encoding.

This is reflected in the code, where a maximum of 5 bytes are read when
parsing the chunk length.

    50ba25f273/tsdb/chunks/chunks.go (L709-L711)

    50ba25f273/tsdb/chunks/chunks.go (L47-L48)

Co-authored-by: Istvan Zoltan Ballok <istvan.zoltan.ballok@sap.com>
Signed-off-by: Victor Herrero Otal <victor.herrero.otal@sap.com>
2025-08-15 10:56:21 +02:00
pipiland2612 82a4b12507 Add t.parallel() for ./tsdb
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-08-12 14:12:42 +02:00
pipiland2612 de93387f0b Parallel tsdb/wlog test
Signed-off-by: pipiland2612 <nguyen.t.dang.minh@gmail.com>
2025-08-12 14:12:15 +02:00
Patryk Prus 676f7665fa
Use testutil.RequireEqual to handle dedupelabels in test
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-08-08 14:52:03 -04:00
Patryk Prus ead6dc32b9
Fix test
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-08-08 14:34:56 -04:00
Patryk Prus 5cb0192626
Address linter errors
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-08-08 14:25:14 -04:00
Patryk Prus 0fea41ed53
Refactor keep function to work for both agent and non-agent implementations
Signed-off-by: Patryk Prus <p@trykpr.us>
2025-08-08 14:12:47 -04:00