This partially reverts ae3d392aa9.
ae3d392aa9 added a call to db.mtx.Lock() that lasts for the entire duration of db.reloadBlocks();
previously db.mtx was locked only during the critical part of db.reloadBlocks().
The motivation was to protect against races:
9e0351e161 (r555699794)
The 'reloads' being mentioned are (I think) reloadBlocks() calls, rather than db.reload() or other methods.
TestTombstoneCleanRetentionLimitsRace was added to catch this, but I was never able to get any error out of it, even after disabling all calls to db.mtx in reloadBlocks() and CleanTombstones().
To make things more complicated, CleanTombstones() itself calls reloadBlocks(), so it seems that the real issue is that we might have concurrent calls to reloadBlocks().
The problem with this change is that db.reloadBlocks() can take a very long time, because it might need to load very large blocks from disk, which is slow.
While db.mtx is held a large part of the db is locked, including queries, since the db.mtx read lock is needed for the db.Querier() call.
One way this manifests itself is a gap in all metrics and blocked queries just after a large block compaction happens.
When compaction merges multiple day-or-more blocks into a week-or-more block it creates a single very big block.
After that block is written it needs to be loaded, and that seems to take many seconds (30-45), during which mtx is held and everything is blocked.
It turns out that there is another, more fine-grained lock aimed at this specific use case:
// cmtx ensures that compactions and deletions don't run simultaneously.
cmtx sync.Mutex
All calls to reloadBlocks() are wrapped inside the cmtx lock. The only exception is db.reload(), which this change fixes.
We can't add cmtx lock inside reloadBlocks() itself because it's called by a number of functions, some of which are already holding cmtx.
Looking at the code, I think it is sufficient to hold cmtx and skip the reloadBlocks()-wide mtx lock.
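A rough sketch of the intended locking split (simplified names, no error handling; not the actual Prometheus code):

package sketch

import "sync"

type Block struct{}

type DB struct {
	mtx    sync.RWMutex // guards the block list that queriers read
	cmtx   sync.Mutex   // ensures that compactions and deletions don't run simultaneously
	blocks []*Block
}

func (db *DB) reload() error {
	// Serialize with compactions/deletions instead of holding db.mtx for the
	// whole (potentially slow) block reload.
	db.cmtx.Lock()
	defer db.cmtx.Unlock()
	return db.reloadBlocks()
}

func (db *DB) reloadBlocks() error {
	newBlocks := openBlocksFromDisk() // slow; runs without db.mtx

	// Only swapping the in-memory block list needs db.mtx.
	db.mtx.Lock()
	db.blocks = newBlocks
	db.mtx.Unlock()
	return nil
}

func openBlocksFromDisk() []*Block { return nil }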
Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
Fix issues raised by staticcheck
We are not enabling staticcheck explicitly, though, because it has too many false positives.
---------
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
When creating dummy data for benchmarks, call `Commit()` periodically to
avoid growing the appender to enormous size.
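For illustration only, a hypothetical helper showing the pattern (populateSeries and its parameters are made up; the Appender API calls are the real ones):

package sketch

import (
	"context"
	"testing"

	"github.com/prometheus/prometheus/model/labels"
	"github.com/prometheus/prometheus/tsdb"
)

// populateSeries appends numSamples samples and commits every 10000 appends so
// the pending appender state stays small.
func populateSeries(b *testing.B, head *tsdb.Head, lbls labels.Labels, numSamples int) {
	app := head.Appender(context.Background())
	for i := 0; i < numSamples; i++ {
		if _, err := app.Append(0, lbls, int64(i), float64(i)); err != nil {
			b.Fatal(err)
		}
		if (i+1)%10000 == 0 {
			if err := app.Commit(); err != nil {
				b.Fatal(err)
			}
			app = head.Appender(context.Background())
		}
	}
	if err := app.Commit(); err != nil {
		b.Fatal(err)
	}
}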
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Exported the CheckpointPrefix constant to be used in other packages.
Updated references to the constant in db.go and checkpoint.go files.
This change improves code readability and maintainability.
Signed-off-by: johncming <johncming@yahoo.com>
Co-authored-by: johncming <conjohn668@gmail.com>
This enables it to take advantage of a more compact data structure
since all postings are known to be `*ListPostings`.
Remove the `Get` member which was not used for anything else, and fix up
tests.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Now we can call it with more specific types which is more efficient than
making everything go through the `Postings` interface.
Benchmark the concrete type.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
We need to create more postings entries so the merger has some work to do.
Not material for the regexp ones as they match so few series.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* [ENHANCEMENT] TSDB: Improve calculation of space used by labels
The labels for each series in the Head take up some space in the
Postings index, but far more space in the `memSeries` structure.
Instead of having the Postings index calculate this overhead, which is
a layering violation, have the caller pass in a function to do it.
Provide three implementations of this function for the three Labels
versions.
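As a hedged sketch of the shape of the change (the type and function names here are invented, not the actual new API):

package sketch

import "github.com/prometheus/prometheus/model/labels"

// labelSizeFunc reports how many bytes one label costs in the caller's series
// structure; the index no longer guesses this itself.
type labelSizeFunc func(name, value string) uint64

// labelsSizeByName sums the per-label cost for every series, grouped by label
// name, using whatever accounting the caller supplied.
func labelsSizeByName(series []labels.Labels, sizeOf labelSizeFunc) map[string]uint64 {
	sizes := map[string]uint64{}
	for _, ls := range series {
		ls.Range(func(l labels.Label) {
			sizes[l.Name] += sizeOf(l.Name, l.Value)
		})
	}
	return sizes
}

// One plausible implementation for the plain-strings Labels version: the bytes
// of the strings plus a small fixed per-label overhead (numbers illustrative).
func stringLabelsSize(name, value string) uint64 {
	return uint64(len(name)+len(value)) + 16
}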
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Remove the 2 minute timeout, as the default is 2 hours and wouldn't
interfere with the test. Otherwise the extra samples combined with
race detection can push the test over 2 minutes and make it fail.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
The segment size was too low for the additional NHCB data, thus it created
more segments than expected. This meant that fewer records were in the lower
numbered segments, which meant more data was kept.
FAIL: TestCheckpoint (4.05s)
FAIL: TestCheckpoint/compress=none (0.22s)
checkpoint_test.go:361:
Error Trace: /home/krajo/go/github.com/prometheus/prometheus/tsdb/wlog/checkpoint_test.go:361
Error: "0.8586956521739131" is not less than "0.8"
Test: TestCheckpoint/compress=none
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Signed-off-by: Antoine Pultier <45740+fungiboletus@users.noreply.github.com>
While investigating lock contention on `MemPostings`, we saw that lots
of locking is happening in `LabelValues` and
`PostingsForLabelsMatching`, both copying the label values slices while
holding the mutex.
This adds an extra map that holds an append-only label values slice for
each one of the label names. Since the slice is append-only, it can be
copied without holding the mutex.
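Roughly, with simplified names (the real type is index.MemPostings and carries more state):

package sketch

import "sync"

type memPostings struct {
	mtx sync.RWMutex
	// m maps label name -> label value -> sorted series references.
	m map[string]map[string][]uint64
	// lvs maps label name -> label values; each slice is append-only, so a
	// reader can take the slice header under the lock and copy it afterwards.
	lvs map[string][]string
}

func (p *memPostings) labelValues(name string) []string {
	p.mtx.RLock()
	vals := p.lvs[name] // cheap: just the slice header
	p.mtx.RUnlock()

	// The backing array is never mutated in place, only appended to, so the
	// copy can happen without holding the mutex.
	out := make([]string, len(vals))
	copy(out, vals)
	return out
}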
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Since dot matches newline now, `l=~".+"` means "any non-empty label
value", and #14144 added a specific method in the index for that, so we
don't need to run the matcher on each one of the label values.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Same as #15427 but for the new method added in #14144
Instead of allocating each ListPostings one by one, allocate them all in
one go.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Instead of allocating ListPostings pointers one by one, allocate a slice
and take pointers from that. It's faster, and also generates less
garbage (NewListPostings is one of the top offenders in number of
allocations).
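The pattern looks roughly like this (local stand-in type; the real code works on tsdb/index's ListPostings):

package sketch

type seriesRef uint64

type listPostings struct {
	list []seriesRef
	cur  seriesRef
}

// postingsForValues allocates every listPostings in one backing slice and
// hands out pointers into it, instead of one heap allocation per value.
func postingsForValues(m map[string][]seriesRef, values []string) []*listPostings {
	lps := make([]listPostings, len(values)) // single allocation
	out := make([]*listPostings, len(values))
	for i, v := range values {
		lps[i] = listPostings{list: m[v]}
		out[i] = &lps[i] // pointer into the shared backing array
	}
	return out
}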
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Simple follow-up to #13620. Modify `tsdb.PostingsForMatchers` to use the optimized tsdb.IndexReader.PostingsForLabelMatching method also for inverse matching.
Introduce method `PostingsForAllLabelValues`, to avoid changing the existing method.
The performance is much improved for a subset of the cases; there are up to
~60% CPU gains and ~12.5% reduction in memory usage.
Remove `TestReader_InversePostingsForMatcherHonorsContextCancel` since
`inversePostingsForMatcher` only passes `ctx` to `IndexReader` implementations now.
Signed-off-by: Arve Knudsen <arve.knudsen@gmail.com>
It was crashing due to uninitialized metrics, and not terminating due to
incorrectly reading segment names.
We need to export `SetMetrics` to avoid the first problem.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Always return unknown hint for first sample in non-gauge histogram chunk
---------
Signed-off-by: Fiona Liao <fiona.liao@grafana.com>
Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Move a couple of variables inside the scope of a goroutine, to avoid
data races.
Use `zeropool` to reduce garbage and avoid some lint warnings.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
This introduces back some unlocking that was removed in #13286 but in a
more balanced way, as suggested by @pracucci.
For TSDBs with a lot of churn, Delete() can take a couple of seconds,
and while it's holding the mutex, reads and writes are blocked waiting
for that mutex, increasing the number of connections handled and memory
usage.
This implementation pauses every 4K labels processed (note that also
compared to #13286 we're not processing all the label-values anymore,
but only the affected ones, because of #14307), makes sure that it's
possible to get the read lock, and waits for a few milliseconds more.
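A minimal sketch of the pausing pattern (constants and names simplified):

package sketch

import (
	"sync"
	"time"
)

// deleteAffected walks only the affected label names, and every 4K names it
// drops the write lock, lets blocked readers through, and backs off briefly
// before re-acquiring it.
func deleteAffected(mtx *sync.RWMutex, affected []string, process func(name string)) {
	mtx.Lock()
	for i, name := range affected {
		process(name)
		if (i+1)%4096 == 0 && i+1 < len(affected) {
			mtx.Unlock()
			// Taking and releasing a read lock proves readers could get in.
			mtx.RLock()
			mtx.RUnlock()
			time.Sleep(time.Millisecond)
			mtx.Lock()
		}
	}
	mtx.Unlock()
}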
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Co-authored-by: Marco Pracucci <marco@pracucci.com>
* Add hidden flag for the delayed compaction random time window
Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>
* Update cmd/prometheus/main.go
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Alban Hurtaud <alban.hurtaud@amadeus.com>
* Update cmd/prometheus/main.go
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Alban Hurtaud <alban.hurtaud@amadeus.com>
* Update tsdb/db.go
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
Signed-off-by: Alban Hurtaud <alban.hurtaud@amadeus.com>
* Fix flag name according to review - add test for delay
Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>
* Fix after main rebase
Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>
* Implement review comments
Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>
* Update generatedelaytest to try with limit values
Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>
---------
Signed-off-by: Alban HURTAUD <alban.hurtaud@amadeus.com>
Signed-off-by: Alban Hurtaud <alban.hurtaud@amadeus.com>
Co-authored-by: Ayoub Mrini <ayoubmrini424@gmail.com>
This reverts commit 50ef0dc954.
Memory allocation goes so high in Prombench that the system is unusable.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* [REFACTOR] simplify appender commit
Signed-off-by: Nicolas Takashi <nicolas.tcs@hotmail.com>
Signed-off-by: Arthur Silva Sens <arthursens2005@gmail.com>
Co-authored-by: George Krajcsovits <krajorama@users.noreply.github.com>
Co-authored-by: Arthur Silva Sens <arthursens2005@gmail.com>
* Remove unused option from HeadOptions
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
* Improve docs for appendable() method in head appender
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
* Ingest CT (float) samples in Agent DB
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
* allow for ingestion of CT native histogram
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
* adding some verification for ct ts
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
* Validating CT histogram before append and add newly created series to pending series
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
* checking the wal for written samples
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
* Checking for samples in test
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
* adding case for validations
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
* fixing comparison when dedupelabels is enabled
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
* unite tests, use table testing
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
* Implement CT related methods in timestampTracker for write storage
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
* adding error case to test
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
* removing unused fields
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
* Updating lastTs for series when adding CT to invalidate duplicates
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
* making sure that updating the lastTS wont cause OOO later on in Commit();
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
---------
Signed-off-by: Pedro Tanaka <pedro.tanaka@shopify.com>
memChunk is a linked list; speed up some common operations when there's no need to iterate all elements of the list.
Signed-off-by: Łukasz Mierzwa <l.mierzwa@gmail.com>
Fix some edge cases when OOO is enabled
Signed-off-by: Vanshikav123 <vanshikav928@gmail.com>
Signed-off-by: Vanshika <102902652+Vanshikav123@users.noreply.github.com>
Signed-off-by: Jesus Vazquez <jesusvzpg@gmail.com>
Co-authored-by: Jesus Vazquez <jesusvzpg@gmail.com>
When handling recoded histogram chunks the min time of the chunk is
updated by mistake. It should only be updated when the chunk is completely new.
Otherwise the ongoing chunk's meta min time will be later than the previously
written samples in it.
Same bug as https://github.com/prometheus/prometheus/pull/14629
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
When handling recoded histogram chunks the min time of the chunk is
updated by mistake. It should only update when the chunk is completely
new.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* Tests for MemPostings.{Add,Get} data race
* Fix MemPostings.{Add,Get} data race
We can't modify the postings list that are held in MemPostings as they
might already be in use by some readers.
* Modify BenchmarkHeadStripeSeriesCreate to have common labels
If there are no common labels on the series, we don't exercise the
ordering part of MemSeries, as we're just creating slices of one element
for each label value.
---------
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Move writing memSeries lastHistogramValue and lastFloatHistogramValue
after series creation under lock.
The resulting code isn't totally correct in the sense that we're setting
these values before Commit(), so they might be overwritten/rolled back
later.
Also, Append of a stale sample checks the values without the lock, so there's
still a potential race.
The correct solution would be to set these only in Commit() which we
actually do, but then Commit() would also need to process samples in
order and not floats first, then histograms, then float histograms - which
leads to not knowing what stale marker to write for histograms.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Two Appenders race when creating a series with a native histogram
as the memSeries will be common and the lastHistogram field is written
without lock.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
- `float histogram` → `floathistogram`, as it is used in the code.
- Actually link encodings to the code (to find the actual numerical values).
- `<bytes>` → `<data>` for consistency.
Signed-off-by: beorn7 <beorn@grafana.com>
For: #14355
This commit updates Prometheus to adopt stdlib's log/slog package in
favor of go-kit/log. As part of converting to use slog, several other
related changes are required to get prometheus working, including:
- removed unused logging util func `RateLimit()`
- forward ported the util/logging/Deduper logging by implementing a small custom slog.Handler that does the deduping before chaining log calls to the underlying real slog.Logger
- move some of the json file logging functionality to use prom/common package functionality
- refactored some of the new json file logging for scraping
- changes to promql.QueryLogger interface to swap out logging methods for relevant slog sugar wrappers
- updated lots of tests that used/replicated custom logging functionality, attempting to keep the logical goal of the tests consistent after the transition
- added a healthy amount of `if logger == nil { $makeLogger }` type conditional checks amongst various functions where none were provided -- old code that used the go-kit/log.Logger interface had several places where there were nil references when trying to use functions like `With()` to add keyvals on the new *slog.Logger type
Signed-off-by: TJ Hoplock <t.hoplock@gmail.com>
Because we are reimplementing the `IndexReader` to fetch in-order and
out-of-order chunks together, we must reproduce the behaviour of
`Head.indexRange()`, which floors the minimum time queried at `head.MinTime()`.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
The specialized versions of adding a sample to the ring:
func addH(s hSample, buf []hSample, r *sampleRing) []hSample
func addFH(s fhSample, buf []fhSample, r *sampleRing) []fhSample
already correctly copy histogram samples from the reused hReader, fhReader
buffers, but the generic version does not. This means that the
data is overwritten on the next read if the sample ring has seen histogram
and float samples at the same time and switched to generic mode.
The `genericAdd` function (which was commented out anyway) is by now quite
different from the specialized functions, so this commit deletes it.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
We are still seeing lock contention on MemPostings.mtx, and MemPostings.Delete() is by far the most expensive operation on that mutex.
This adds parallelism to that method, trying to reduce the amount of time we spend with the mutex held.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
If the query overlaps the range currently undergoing compaction, we
should only fetch chunks up to that time. Need to store that min time
in `HeadAndOOOIndexReader`.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Instead of a 2-bit write followed by a 14-bit write, do two 8-bit
writes, which goes much faster since it avoids looping.
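Schematically (hypothetical bit-writer interface, not the real bstream API):

package sketch

// bitWriter is a stand-in for the chunk encoder's bit stream.
type bitWriter interface {
	writeBits(v uint64, nbits int) // loops over the bits; slow for odd widths
	writeByte(b byte)              // handles an unaligned byte in one step
}

// writePrefixed14 emits a 2-bit control prefix followed by a 14-bit value.
// Packing them into 16 bits and issuing two byte writes avoids the looping
// that a 14-bit write would need.
func writePrefixed14(w bitWriter, prefix uint8, v uint16) {
	packed := uint16(prefix&0b11)<<14 | (v & 0x3FFF)
	w.writeByte(byte(packed >> 8))
	w.writeByte(byte(packed))
}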
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Benchmarks must do the same work N times.
Run 3 cases, where the values are constant, vary a bit, and vary a lot.
Also aim for 120 samples, the same as the TSDB default.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Shortcut for `.*` matches newlines as well.
Add preamble change `^(?s:`.
Add test.
Set the dotAll flag for the regex.
Add and fix regex tests.
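For reference, the anchoring with the dotAll flag looks like this (illustrative snippet, not the FastRegexMatcher internals):

package sketch

import "regexp"

// compileAnchored anchors the expression and sets the s flag so that `.`
// matches newlines; the `.*` / `.+` shortcuts must then treat values
// containing newlines as matches too.
func compileAnchored(expr string) (*regexp.Regexp, error) {
	return regexp.Compile("^(?s:" + expr + ")$")
}

// Shortcut behaviour consistent with the dotAll semantics above:
func matchesAnyValue(v string) bool      { return true }       // `.*`
func matchesNonEmptyValue(v string) bool { return len(v) > 0 } // `.+`, newlines included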
Signed-off-by: Mario Fernandez <mariofer@redhat.com>
Several regexps were coded like `"^.*$"`, which is an unnatural
formulation nobody is likely to use. Inside `NewMatcher`, `^` and `$`
are added anyway, which makes the form in the benchmark redundant.
It even printed it out in the expected way.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* tsdb: mmapCurrentOOOHeadChunk prepare for multiple ooo chunks
Currently float samples can only create a single ooo head chunk, but
native histograms can result in multiple due to counter resets, etc.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* tsdb: getOOOSeriesChunks prepare for multiple ooo chunks
Currently float samples can only create a single ooo head chunk, but
native histograms can result in multiple due to counter resets, etc.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
---------
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Check if headQuerier is nil before trying to use it.
* TestQueryOOOHeadDuringTruncate: unit test to check query during truncate
Regression test for #14822
* Simulate race between query and Compact()
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
promql, tsdb (histograms): Do not re-use spans between histograms
When multiple points exist with the same native histogram schemas, they
share their spans.
This causes a problem when a native histogram (NH) schema is modified (for example, during
a Sum): the other NHs with the same spans are also modified. As such,
we should create a new Span for each NH. This will ensure an NH's interfaces
are safe to use without considering the effect on other histograms.
At the moment this doesn't present itself as a problem because in all
aggregations and functions operating on native histograms they are copied
by the promql query engine first.
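The defensive copy amounts to something like this (helper name is made up):

package sketch

import "github.com/prometheus/prometheus/model/histogram"

// copySpans gives the histogram its own span slices before any mutation, so
// other histograms sharing the same spans keep their original values.
func copySpans(h *histogram.FloatHistogram) {
	h.PositiveSpans = append([]histogram.Span(nil), h.PositiveSpans...)
	h.NegativeSpans = append([]histogram.Span(nil), h.NegativeSpans...)
}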
Signed-off-by: Joshua Hesketh <josh@nitrotech.org>
---------
Signed-off-by: Joshua Hesketh <josh@nitrotech.org>
This was part of #14525 which was reverted.
I still think that having this benchmark committed in to the repo is
useful.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Starts its index from 0, but users call Next() before the first sample,
so it needs to start from -1.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
`getOOOSeriesChunks` was already finding sets of overlapping chunks; we
store those in a `multiMeta` struct so that `ChunkOrIterable` can
reconstruct an `Iterable` easily and predictably.
We no longer need a `MergeOOO` flag to indicate that this Meta should
be merged with other ones; this is explicit in the `multiMeta` structure.
We also no longer need `chunkMetaAndChunkDiskMapperRef`.
Add `wrapOOOHeadChunk` to defeat `chunkenc.Pool` - chunks are reset
during compaction, but if we wrap them (like `safeHeadChunk` was doing)
then this is skipped.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Go's built-in append() grows larger slices with factor 1.3, which means we do a lot more allocating and copying for larger postings.
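One way to sidestep that regrowth cost when the final length is known up front (illustrative only, not necessarily the exact change made here):

package sketch

// appendKnownSize pre-sizes the destination so append() never has to regrow
// and copy the slice while merging postings lists.
func appendKnownSize(lists [][]uint64) []uint64 {
	total := 0
	for _, l := range lists {
		total += len(l)
	}
	out := make([]uint64, 0, total) // one allocation, no regrow/copy cycles
	for _, l := range lists {
		out = append(out, l...)
	}
	return out
}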
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Case 1: OOO in-memory head chunk overlaps with first mmaped in-order chunk.
Query: |----------------------------------------------------------------|
InO: |------mmap---------------||---------mem----------------------|
OOO: |-----mem-----------|
This triggers ChunkOrIterableWithCopy not including OOO head chunks bug.
Similar to #14693, however testing the end of the interval doesn't
trigger the problem, because there the in-order head chunk will be
trimmed with a tombstone, causing the code to switch to ChunkOrIterable,
which was fixed.
See a36d1a8a92/tsdb/querier.go (L646)
where len(p.bufIter.Intervals) will be non zero, because it includes the
tombstone to trim the result to the query max time.
Thus a new test is added to check the overlap at the beginning of the
interval that has a separate chunk, which does not need trimming.
Note: the same test doesn't fail for the sample querier in Test_Querier_OOOQuery,
as that doesn't use copy, that is, copyHeadChunk is false in the if
condition above.
Case 2:
OOO mmaped head chunk overlaps with first mmaped in-order chunk.
Query: |----------------------------------------------------------------|
InO: |------mmap---------------||---------mem----------------------|
OOO: |-----mmap-----------| |--mem--|
In this case the meta contains the reference of the in-order chunk and
no indication that a merge is needed with the OOO mmaped chunk.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Several things done here:
- Set `max-issues-per-linter` to 0 so that we actually see all linter
warnings and not just 50 per linter. (As we also set
`max-same-issues` to 0, I assume this was the intention from the
beginning.)
- Stop using the golangci-lint default excludes (by setting
`exclude-use-default: false`). Those are too generous and don't match
our style conventions. (I have re-added some of the excludes
explicitly in this commit. See below.)
- Re-add the `errcheck` exclusion we have used so far via the
defaults.
- Exclude the signature requirement `govet` has for `Seek` methods
because we use non-standard `Seek` methods a lot. (But we keep other
requirements, while the default excludes completely disabled the
check for common method signatures.)
- Exclude warnings about missing doc comments on exported symbols. (We
used to be pretty adamant about doc comments, but stopped that at
some point in the past. By now, we have about 500 missing doc
comments. We may consider reintroducing this check, but that's
outside of the scope of this commit. The default excludes of
golangci-lint essentially ignore doc comments completely.)
- By no longer using the default excludes, we now get warnings back on
malformed doc comments. That's the most impactful change in this
commit. It does not enforce doc comments (again), but _if_ there is
a doc comment, it has to have the recommended form. (Most of the
changes in this commit are fixing this form.)
- Improve wording/spelling of some comments in .golangci.yml, and
remove an outdated comment.
- Leave `package-comments` inactive, but add a TODO asking if we
should change that.
- Add a new sub-linter `comment-spacings` (and fix corresponding
comments), which flags missing spaces after the leading `//`.
Signed-off-by: beorn7 <beorn@grafana.com>
* tsdb: Unit test query overlapping in order and ooo head
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* TSDB: Merge overlapping head chunk
The basic idea is that getOOOSeriesChunks can populate Meta.Chunk, but since
it only returns one Meta per overlapping time-slot, that pointer may end up in a
Meta with a head-chunk ID. So we need HeadAndOOOChunkReader.ChunkOrIterable()
to call mergedChunks in that case.
Previously, mergedChunks was checking that meta.Ref was a valid OOO chunk reference,
but it never actually uses that reference; it just finds all chunks overlapping in time.
So we can delete that code.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Co-authored-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
In `mmapCurrentOOOHeadChunk`, check if the number of OOO chunks is at the maximum and
drop the data with an error log. This is not expected to happen as the
maximum is over 8 million; that's 8 years of 1 sample every second.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
* Re-enable check in `createHeadWithOOOSamples` which wasn't really broken.
* Move code making `Block` into a `Queryable` into test file.
* Make `getSeriesChunks` return a slice (renamed `appendSeriesChunks`).
* Rename `oooMergedChunks` to `mergedChunks`.
* Improve comment on `ChunkOrIterableWithCopy`.
* Name return values from unpackHeadChunkRef.
Co-authored-by: Oleg Zaytsev <mail@olegzaytsev.com>
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Use headIndexReader instead.
OOOCompactionHeadIndexReader needs to be expanded slightly, because it previously delegated to OOOHeadIndexReader.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Just query via `HeadAndOOOQuerier`, which will skip series where no
in-order chunks are in range.
Now we don't need `OOORangeHead`.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Add `HeadAndOOOQuerier` which iterates just once over series, then
where necessary merges chunks from in-order and out-of-order lists.
Add a ChunkQuerier for in-order and ooo together
Add copy-last-chunk behaviour to HeadAndOOOChunkReader
Out-of-order chunk IDs are distinguished from in-order by setting bit 23.
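Sketch of the ID scheme (constant and function names are assumptions, not the actual code):

package sketch

// Head chunk IDs stay well below 2^23, so bit 23 is free to mark a chunk as
// coming from the out-of-order head.
const oooChunkIDBit = 1 << 23

func markOOO(id uint32) uint32  { return id | oooChunkIDBit }
func isOOO(id uint32) bool      { return id&oooChunkIDBit != 0 }
func stripOOO(id uint32) uint32 { return id &^ oooChunkIDBit }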
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Refactor existing BenchmarkQuerierSelect to provide the set-up.
Note that Head queries now run faster because they use a RangeHead.
Signed-off-by: Bryan Boreham <bjboreham@gmail.com>
Rename a variable.
Add parameters to memSeries.insert function.
No effect on how float samples are handled.
Related to #14546
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Discovered while working on #14546 OOO native histograms.
Not triggered on main before #14546 as the code path is unused.
There was a bug where the min time of a chunk was adjusted even
if it was only recoded and not completely new.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
Introduce delayed compaction to avoid simultaneous compactions and reduce stress on shared resources.
This is enabled via `--enable-feature=delayed-compaction`.
Signed-off-by: machine424 <ayoubmrini424@gmail.com>
* Fix appendable: check whether last val was a histogram
When appending a float, we were checking whether lastValue was equal to the
current value, but we didn't check whether the last value was a float value.
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>
Iterators may share spans without copying, so we always have to make a copy
before modification - copy-on-write.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
* chunkenc: allow missing empty buckets on histogram append
Allow appending to chunks when the histogram to be added is missing
some buckets, but the missing buckets are empty in the chunk.
For example bucket at index 5 is present in the chunk, but its value
is 0 and the new histogram doesn't have a bucket at index 5.
This fixes an issue of merging chunks where one chunk was recoded to
retroactively have some empty buckets in all the histograms and we are
merging in a histogram that doesn't have the empty bucket (because it
was not recoded yet).
The operation alters the histogram that is being added; however, this has
already been the case when appending gauge histograms. Thus the test
TestHistogramSeriesToChunks in storage package is changed to explicitly
test what happened to the appended histogram - Compact(0) call is removed.
The new expandIntSpansAndBuckets and expandFloatSpansAndBuckets functions
are a merge of expandSpansForward and counterResetInAnyBucket and
counterResetInAnyFloatBucket.
Signed-off-by: György Krajcsovits <gyorgy.krajcsovits@grafana.com>
We don't use seriesShard during DB initialization, so we can use the
same 8 bytes to store mmMaxTime, and save those 8 bytes during the rest of the
lifetime of the database.
This doesn't affect CPU performance.
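The trick looks roughly like this (simplified sketch; the real field lives in memSeries and access is synchronized elsewhere):

package sketch

type series struct {
	// Holds the max time of the last memory-mapped chunk during DB
	// initialization (when the shard hash isn't needed yet), and the shard
	// hash afterwards, so both uses share the same 8 bytes.
	shardHashOrMemoryMappedMaxTime uint64
}

func (s *series) shardHash() uint64 { return s.shardHashOrMemoryMappedMaxTime }

func (s *series) mmMaxTime() int64 { return int64(s.shardHashOrMemoryMappedMaxTime) }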
Signed-off-by: Oleg Zaytsev <mail@olegzaytsev.com>