Commit Graph

12423 Commits

Author SHA1 Message Date
YaacovHazan c6e5d1d5fe Merge remote-tracking branch 'upstream/unstable' into HEAD
2025-03-31 21:26:40 +03:00
DvirDukhan 8ea8f4220c
Update RediSearch Makefile - 7.99.90 (#13905)
2025-03-31 21:26:07 +03:00
Eran Hadad 1c646662e9
Bump module version to v7.99.90 for RedisBloom, JSON and Timeseries (#13908) 2025-03-31 21:24:22 +03:00
Ozan Tezcan 366c6aff81
Put replica online when bgsave is done (#13895)
Before https://github.com/redis/redis/pull/13732, replicas were brought
online immediately after the master wrote the last bytes of the RDB file
to the socket. This behavior remains unchanged when rdbchannel
replication is not used. With rdbchannel replication, however, the
replica is brought online only after the first ack is received, which the
replica sends once the RDB is loaded.

To align the two behaviors, this commit reverts that change so that the
replica is put online once bgsave is done.

Additional changes:
- INFO field `mem_total_replication_buffers` will also contain
`server.repl_full_sync_buffer.mem_used`, which shows the replication
stream accumulated during rdbchannel replication on the replica side.
- Deleted debug level logging from some replication tests. These tests
generate thousands of keys, which may cause per-key logging in some
cases.
2025-03-31 13:48:49 +03:00
YaacovHazan 5d887c58ae
Merge unstable into 8.0 (#13901)
Merging unstable towards GA
2025-03-30 15:11:57 +03:00
Jason aa8e2d1712
Ignore shardId updates from replica nodes (#13877)
Close https://github.com/redis/redis/issues/13868

This bug was introduced by https://github.com/redis/redis/pull/13468

## Issue
To maintain compatibility with older versions that do not support
shardid, when a replica passes a shardid, we also update the master’s
shardid accordingly.

However, when both the master and replica support shardid, an issue
arises: at one moment, the master may pass a shardid, causing us to
update both the master and all its replicas to match the master’s
shardid. But if the replica later passes a different shardid, we would
then update the master’s shardid again, leading to continuous changes in
shardid.

## Solution
Regardless of the situation, we always ensure that the replica’s shardid
remains consistent with the master’s shardid.
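
The oscillation and the fix can be illustrated with a small simulation. This is only a sketch: `Node`, `naive_update`, `fixed_update` and `count_master_changes` are hypothetical names, not functions from the Redis source, and the announced shard ids are fixed strings that stand in for messages arriving from each node.

```python
from dataclasses import dataclass


@dataclass
class Node:
    name: str
    shard_id: str
    is_replica: bool = False


def naive_update(master, replicas, sender, announced):
    """Simplified pre-fix rule: any sender can move the whole shard."""
    master.shard_id = announced
    for replica in replicas:
        replica.shard_id = announced


def fixed_update(master, replicas, sender, announced):
    """Simplified post-fix rule: shard ids announced by replicas are ignored,
    and replicas always follow the master's shard id."""
    if not sender.is_replica:
        master.shard_id = announced
    for replica in replicas:
        replica.shard_id = master.shard_id


def count_master_changes(update):
    """Count how often the master's shard id flips under a given rule."""
    master = Node("m", "shard-A")
    replica = Node("r", "shard-B", is_replica=True)
    changes = 0
    # The master and the replica alternately announce different shard ids.
    for sender, announced in [(master, "shard-A"), (replica, "shard-B")] * 3:
        before = master.shard_id
        update(master, [replica], sender, announced)
        changes += before != master.shard_id
    return changes
```

Under the naive rule the master's shard id keeps flipping for as long as both nodes keep announcing; under the fixed rule it never changes and the replica converges on the master's value.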
2025-03-30 15:15:04 +08:00
YaacovHazan 452b5b8a3b Merge remote-tracking branch 'upstream/unstable' into HEAD 2025-03-30 09:54:48 +03:00
Vitah Lin 057f039c4b
Fix 'RESTORE can set LFU' test (#13896)
When the `restore foo 0 $encoded freq 100` command and `set freq [r
object freq foo]` run in different minute timestamps (i.e., when
server.unixtime/60 changes between these operations), the assertion may
fail due to the LFU decay.

This PR updates the “RESTORE can set LFU” test to verify the actual freq
value based on minute timestamps.
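
A minimal sketch of the kind of calculation such a test needs, assuming a simplified decay model in which the counter drops by one per elapsed decay period and `lfu-decay-time` is 1 minute. The function name `expected_freq` and its parameters are illustrative, not taken from the test suite.

```python
def expected_freq(restored_freq, restore_time, read_time, decay_minutes=1):
    """Frequency expected after minute-granularity decay (simplified model).

    restore_time and read_time are Unix timestamps in seconds.
    """
    if decay_minutes <= 0:
        return restored_freq  # decay disabled in this model
    # Only whole-minute boundaries matter, not the number of elapsed seconds.
    elapsed_minutes = read_time // 60 - restore_time // 60
    periods = elapsed_minutes // decay_minutes
    return max(0, restored_freq - periods)
```

Two calls one second apart can therefore yield different values if the minute boundary falls between them, which is the flakiness described above.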

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-03-28 13:33:58 +08:00
debing.sun 87d8e71708
Fix defrag when type/encoding changes during scan (#13883)
This PR is based on: https://github.com/valkey-io/valkey/pull/1801

[SoftlyRaining](https://github.com/SoftlyRaining) was hunting for defrag
bugs with Jim and found a couple of improvements to make. Jim pointed
out that in several of the callbacks, if the encoding were to change it
simply returns without doing anything to `cursor` to make it reach 0,
meaning that it would continue no-op working on that item without making
any progress. Type and encoding can change while the defrag scan is in
progress if the value is mutated or replaced by something else with the
same key.
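
The cursor handling described above can be sketched as follows. This is a toy model, not the Redis defrag code: `defrag_step` and `run_scan` are hypothetical helpers, and a Python list stands in for a value that is scanned in slices.

```python
def defrag_step(value, expected_type, cursor, chunk=2):
    """Process one slice of `value`; return the next cursor (0 means done)."""
    if not isinstance(value, expected_type):
        # The type changed under us: report completion instead of returning
        # the same cursor, which would make the caller spin without progress.
        return 0
    end = min(cursor + chunk, len(value))
    # ... defragment value[cursor:end] here ...
    return 0 if end >= len(value) else end


def run_scan(db, key, expected_type, mutate_at=None, max_steps=100):
    """Drive defrag_step until the cursor reaches 0; return the step count."""
    cursor, steps = 0, 0
    while steps < max_steps:
        steps += 1
        if steps == mutate_at:
            db[key] = "replaced"  # the key now holds a different type
        cursor = defrag_step(db[key], expected_type, cursor)
        if cursor == 0:
            break
    return steps
```

If the type check returned `cursor` unchanged instead of 0, `run_scan` would only stop at `max_steps`, which mirrors the no-progress behavior the fix removes.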

---------

Signed-off-by: Rain Valentine <rsg000@gmail.com>
Co-authored-by: Rain Valentine <rsg000@gmail.com>
2025-03-27 08:58:57 +08:00
Ozan Tezcan a0da8390a2
Fix use-after-free when diskless load config is not swapdb (#13887)
When the diskless load configuration is set to on-empty-db, we retain a
pointer to the function library context. When emptyData() is called, it
frees this function library context pointer, leading to a use-after-free
situation.

I refactored code to ensure that emptyData() is called first, followed
by retrieving the valid pointer to the function library context.

Refactored code should not introduce any runtime implications.

Bug introduced by https://github.com/redis/redis/pull/13495 (Redis 8.0)

Co-authored-by: Oran Agra <oran@redislabs.com>
2025-03-26 21:50:10 +03:00
Cong Chen 981aa5c12f
Fix timing issue in HEXPIREAT test (#13873)
This fixes an error that occurs in the job
[test-valgrind-no-malloc-usable-size-test](https://github.com/redis/redis/actions/runs/13912357739/job/38929051397)
of the Daily workflow:

```
*** [err]: HEXPIREAT - Set time and then get TTL (listpackex) in tests/unit/type/hash-field-expire.tcl
Expected '999' to be between to '1000' and '2000' (context: type eval line 6 cmd {assert_range [r hpttl myhash FIELDS 1 field1] 1000 2000} proc ::test)
```
2025-03-26 10:00:38 +08:00
Oran Agra 2a189709e0
avoid possible use-after-free with module KSN changes (#13875)
In #13505, we changed the code to use the string value of the key rather
than the integer value on the stack. However, a test in
unit/moduleapi/keyspace_events uses a keyspace notification hook to
modify the value with RM_StringDMA, which can cause this value to be
released before it is used. This has not happened so far because we
were using shared integers, so releasing the object doesn't free it.
2025-03-24 12:24:52 +02:00
Yuan Wang 319bbcc1a7
Fix sdscatprintf error in the output of `info stats` (#13871)
CI failed: https://github.com/redis/redis/actions/runs/13981749993/job/39148249096,
because I didn't reassign `info` after `sdscatprintf(info, xxx)`.
Thanks to @sundb for spotting this.
Introduced in https://github.com/redis/redis/pull/13846
2025-03-24 09:17:58 +08:00
debing.sun 87b7c3ac1a
Fix rax node defragmentation being skipped (#13847)
First, when we call `raxSeek()` and then `raxNext()`, we get the
`RAX_ITER_JUST_SEEKED` flag and return success directly.
We always set the node defrag callback after `raxSeek()`, which means
that when we break out of defragmentation, the first node we land on
when resuming is never defragged.

In this PR, we save the next node to be processed as the resume point,
rather than the last node that was completed.
This way we defrag that next node before we exit, so it is not skipped
on the next resume.

---------

Co-authored-by: oranagra <oran@redislabs.com>
2025-03-24 08:57:08 +08:00
Benson-li 427c36888e
Fix potential infinite loop of RANDOMKEY during client pause (#13863)
This fixes the bug reported in
[#13862](https://github.com/redis/redis/issues/13862).

---------

Signed-off-by: li-benson <1260437731@qq.com>
Signed-off-by: youngmore1024 <youngmore1024@outlook.com>
Co-authored-by: youngmore1024 <youngmore1024@outlook.com>
2025-03-20 21:32:12 +08:00
debing.sun cb02bd190b
Fix timing issue in module defrag test (#13870)
After #13840, the data we populate is more complex and slower to work
through, and we always wait for a defragmentation cycle to end before
verifying that the test is okay.
However, in some slow environments, an entire defragmentation cycle can
exceed 5 seconds, and in my local test using 'taskset -c 0' it can reach
6 seconds, so this increases the threshold to avoid test failures.
2025-03-20 21:22:47 +08:00
Yuan Wang 951ec79654
Cluster compatibility check (#13846)
### Background
A program may run normally in standalone mode, but migrating to cluster
mode may cause errors, because some cross-slot commands cannot run in
cluster mode. We should provide a way to detect this issue while running
in standalone mode, and expose a metric that indicates the usage of
incompatible commands.

### Solution
To avoid a performance impact, we introduce a new config
`cluster-compatibility-sample-ratio`, which defines the sampling ratio
(0-100) for checking command compatibility in cluster mode. When a
command is executed, it is sampled at the specified ratio to determine
if it complies with Redis cluster constraints, such as cross-slot
restrictions.

A new metric is exposed: `cluster_incompatible_ops` in `info stats`
output.
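
The sampling idea can be sketched as below. This is a hedged illustration rather than the server's implementation: `should_sample`, `Stats` and `on_command` are hypothetical names, and only the config semantics (a ratio from 0 to 100) and the metric name `cluster_incompatible_ops` come from the description above.

```python
import random


def should_sample(ratio, rng=random):
    """Return True for roughly `ratio` percent of calls (ratio in 0..100)."""
    if ratio <= 0:
        return False  # sampling disabled: never pay for the check
    if ratio >= 100:
        return True   # check every command
    return rng.random() * 100 < ratio


class Stats:
    def __init__(self):
        self.cluster_incompatible_ops = 0


def on_command(stats, ratio, is_incompatible, rng=random):
    """Count a command as incompatible only if it was sampled and fails the check."""
    # The potentially expensive compatibility check runs only for sampled commands.
    if should_sample(ratio, rng) and is_incompatible():
        stats.cluster_incompatible_ops += 1
```

A counter built this way only reflects sampled commands, so with a ratio below 100 it undercounts the true number of incompatible operations.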

The following operations are considered incompatible:

- Cross-slot commands
  If a command operates on keys that belong to different slots, it is
  incompatible.
- `swap, copy, move, select` commands
  These commands involve multiple databases in some cases. Multiple
  databases are not allowed in cluster mode, so they are not compatible.
- Module command with `no-cluster` flag
  If a module command has the `no-cluster` flag, loading the module
  fails with an error when cluster is enabled, so it is incompatible.
- Script/function with `no-cluster` flag
  Similar to module commands, if `no-cluster` is declared in the shebang
  of a script/function, it cannot run in cluster mode.
- `sort` command with by/get pattern
  When the `sort` command has a `by/get` pattern option, the pattern
  must map to the same slot as the keys, otherwise it is incompatible in
  cluster mode.
- Script/function whose accessed keys and declared keys have different
  slots
  For script/function commands, we check not only the slots of the
  declared keys but also the slots of the keys actually accessed. If
  they differ, the command is considered incompatible.
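
For the cross-slot case, a minimal sketch of the check follows. It uses the key hash slot rule from the Redis cluster specification (CRC16 of the key, or of its hash tag, modulo 16384). The helper names `key_hash_slot` and `is_cross_slot` are illustrative and are not the names used in the server.

```python
import binascii


def key_hash_slot(key: bytes) -> int:
    """Map a key to one of 16384 slots, honoring {hash tags}."""
    start = key.find(b"{")
    if start != -1:
        end = key.find(b"}", start + 1)
        # Only a non-empty tag between the first '{' and the next '}' counts.
        if end != -1 and end != start + 1:
            key = key[start + 1:end]
    # crc_hqx with initial value 0 is the CRC16 (XMODEM) variant.
    return binascii.crc_hqx(key, 0) % 16384


def is_cross_slot(keys) -> bool:
    """True if the keys do not all hash to the same slot."""
    return len({key_hash_slot(k) for k in keys}) > 1
```

For example, `is_cross_slot([b"{user1}:name", b"{user1}:email"])` is false because both keys share the hash tag `user1`, while keys without a common tag usually land in different slots.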

**Besides**, commands like `keys, scan, flushall, script/function
flush`, which iterate over all data in standalone mode, only affect the
server that executes them in cluster mode and are not broadcast.
However, this does not lead to errors, so we do not consider them
incompatible commands.

### Performance impact test
**cross slot test**
Below are the test commands and results. When using MSET with 8 keys,
performance drops by approximately 3%.

**single key test**
Single-key commands could see a 1-2% performance drop, which may be due
to the overhead of the sampling function.
2025-03-20 10:35:53 +08:00
Filipe Oliveira (Redis) 3e012c9260
Fix string2d usage in case of hexadecimal strings parsing and overflow (#13845)
Since https://github.com/redis/redis/pull/11884, input that was accepted
as valid before 8.0 (a hexadecimal string) returned an error. This PR
addresses it. To avoid performance penalties, it hints to the compiler
that the fallbacks are not likely to happen.
Furthermore, we were ignoring `std::errc::result_out_of_range` results
from fast_float. This PR addresses that as well and includes tests for
both identified scenarios.
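
The control flow (fast path first, rare hexadecimal fallback, explicit out-of-range rejection) can be sketched as follows. This models the idea only: `string_to_double` is a hypothetical stand-in, Python's `float` plays the role of the fast parser, and details such as which spellings are accepted are assumptions rather than the behavior of `string2d`.

```python
import math


def string_to_double(s):
    """Return a float, or None if `s` is not an acceptable number."""
    if not s or s[0].isspace() or s[-1].isspace():
        return None
    try:
        value = float(s)  # fast path: ordinary decimal input
    except ValueError:
        # Unlikely fallback: only strings with a 0x prefix are retried as hex.
        if not s.lstrip("+-").lower().startswith("0x"):
            return None
        try:
            value = float.fromhex(s)
        except (ValueError, OverflowError):
            return None  # malformed or out-of-range hexadecimal literal
    if math.isnan(value):
        return None
    spelled_inf = s.lstrip("+-").lower() in ("inf", "infinity")
    if math.isinf(value) and not spelled_inf:
        return None  # finite-looking input that overflowed, e.g. "1e999"
    return value
```

Keeping the fallback off the common path mirrors the intent of the compiler hint: plain decimal input never pays for the hexadecimal handling.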

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-03-19 20:08:45 +08:00
debing.sun 26dcec4812
Fix messed-up unblocked clients in flush command (#13865)
Fix https://github.com/redis/redis/pull/13853#pullrequestreview-2675227138

This PR ensures that the client's current command is not reset by
`unblockClient()` when it still needs to be handled after `unblockClient()`.
The FLUSH command still requires reprocessing (updating the replication
offset) after `unblockClient()`. Therefore, we mark such blocked clients
with the CLIENT_PENDING_COMMAND flag to prevent the command from being
reset during `unblockClient()`.
2025-03-19 10:22:47 +08:00
debing.sun a5a3afd923
Fix crash during SLAVEOF when clients are blocked on lazyfree (#13853)
After https://github.com/redis/redis/pull/13167, when a client calls the
`FLUSHDB` command, we still empty the database asynchronously, and the
client is blocked until the lazyfree completes.

1) If another client calls the `SLAVEOF` command during this time, the
server will unblock all blocked clients, including those blocked by the
lazyfree. However, when unblocking a lazyfree-blocked client, we forgot
to call `updateStatsOnUnblock()`, which ultimately triggered an
assertion failure.

2) If a client blocked by lazyfree is unblocked midway, and at this
point the `bio_comp_list` has already received the completion
notification for the bio, we might end up processing a client that has
already been unblocked in `flushallSyncBgDone()`. Therefore, we need to
filter it out.

---------

Co-authored-by: oranagra <oran@redislabs.com>
2025-03-17 20:27:05 +08:00
YaacovHazan 095c131fbb Redis 8.0 M04
2025-03-16 10:12:40 +02:00
YaacovHazan 89eef40ca2 Merge remote-tracking branch 'upstream/unstable' into HEAD 2025-03-16 10:01:10 +02:00
kei-nan 752576ce47
Use Search v7.99.5 (#13859)
2025-03-16 10:00:51 +02:00
YaacovHazan 84471e238e
Redis 8.0 RC1 (#13851)
- Merge latest unstable
- Update version and release notes
2025-03-11 20:16:27 +02:00
YaacovHazan d1df881ec5 Redis 8.0 RC1 2025-03-11 10:03:17 +02:00
YaacovHazan 53949521de Merge remote-tracking branch 'upstream/unstable' into HEAD 2025-03-11 09:39:04 +02:00
Eran Hadad b704179f15
Update release of RedisJSON, RedisTS and RedisBloom 7.99.4 (#13850)
2025-03-11 09:36:28 +02:00
DvirDukhan 557e0b1c07
Update Makefile with search 7.99.4 (#13848)
2025-03-09 13:55:27 +02:00
YaacovHazan a39ffc1fe9 Merge remote-tracking branch 'upstream/unstable' into HEAD
2025-03-09 10:08:03 +02:00
debing.sun f364dcca2d
Make RM_DefragRedisModuleDict API support incremental defragmentation for dict leaf (#13840)
After https://github.com/redis/redis/pull/13816, we have a new API to
defrag RedisModuleDict.
Currently, only the defragmentation of the dictionary itself is
incremental; the defragmentation of values is not. If the values are
very large, this could lead to significant blocking.
Therefore, this PR adds incremental defragmentation for the values.

The main change is to the `RedisModuleDefragDictValueCallback`: we
modified the return value of this callback.
When the callback returns 1, we save the key of the current unfinished
node as `seekTo`, and the next time we enter, we continue defragmenting
this node.
When the return value is 0, we proceed to the next node.
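
The return-value contract can be sketched with a toy model. Everything here is hypothetical scaffolding (`defrag_dict`, `make_chunked_callback`, `run_to_completion`) around a plain Python dict; it only illustrates how a callback returning 1 makes the next pass resume at the same key.

```python
def defrag_dict(d, value_cb, seek_to=None):
    """Run one defrag pass over `d`, resuming at `seek_to` if given.

    Returns the key to resume from, or None when the whole dict is done.
    """
    for key in sorted(d):
        if seek_to is not None and key < seek_to:
            continue  # finished in an earlier pass
        if value_cb(key, d[key]) == 1:
            return key  # unfinished value: resume at this same key
    return None


def make_chunked_callback(chunk):
    """Build a callback that handles at most `chunk` items of a value per call."""
    progress = {}

    def callback(key, value):
        done = min(progress.get(key, 0) + chunk, len(value))
        progress[key] = done
        return 1 if done < len(value) else 0  # 1 = more work left, 0 = done

    return callback


def run_to_completion(d, chunk):
    """Repeat passes until the dict is fully processed; return the pass count."""
    callback, seek_to, passes = make_chunked_callback(chunk), None, 0
    while True:
        passes += 1
        seek_to = defrag_dict(d, callback, seek_to)
        if seek_to is None:
            return passes
```

Large values are thus spread over several passes instead of blocking a single pass until they are finished.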

## Test
Each dictionary in the global dict originally contained only 10 strings.
It has now been changed to a nested dictionary: each dictionary has 10
sub-dictionaries, and each sub-dictionary contains 10 strings. This has
led to a corresponding reduction in the defragmentation time obtained in
the other tests.
Therefore, the other tests have been modified to always wait for
defragmentation to be turned off before the test begins, then start it
after creating fragmentation, ensuring that they can always run for a
full defragmentation cycle.

---------

Co-authored-by: ephraimfeldblum <ephraim.feldblum@redis.com>
2025-03-04 17:19:41 +08:00
YaacovHazan cb261828bd
Merge unstable into 8.0 (#13835)
preparing for 8.0 RC1
2025-02-27 08:29:01 +02:00
YaacovHazan 9265234299 Merge remote-tracking branch 'upstream/unstable' into HEAD 2025-02-26 21:23:36 +02:00
debing.sun 7939ba031d
Enable the callback to be NULL for RM_DefragRedisModuleDict() and reduce the system calls of RM_DefragShouldStop() (#13830)
1) Enable the callback to be NULL for RM_DefragRedisModuleDict()
    Because the dictionary may store only the key without the value.

2) Reduce the system calls of RM_DefragShouldStop()
The API checks the following thresholds before performing a time check:
over 512 defrag hits, or over 1024 defrag misses, and performs the time
judgment if any of these thresholds are reached.

3) Added defragmentation statistics for dictionary items to cover the
associated code for RM_DefragRedisModuleDict().

4) Removed `module_ctx` from `defragModuleCtx` struct, which can be
replaced by a temporary variable.

---------

Co-authored-by: oranagra <oran@redislabs.com>
2025-02-26 20:04:29 +08:00
Yuan Wang f1d6542b1a
Stabilize tcl test cases (#13829)
We recently encountered some errors, as below:

HGETEX/HSETEX with PXAT/EXAT options: after getting the ttl, we calculate
the current time with `[clock seconds]`, which may be delayed and so
produce results greater than expected.

Dismiss memory test error: now that we have introduced rdb-channel
replication, the full synchronization might finish before the child
process exits, so calling `bgsave` immediately after full sync may fail.
2025-02-25 16:31:53 +08:00
Denis Nevmerzhitskii 33f03f6fc8
Fix wrong behavior of XREAD + after last entry of stream have been removed (#13632)
Close #13628

This PR changes behavior of special `+` id of XREAD command. Now it uses
`streamLastValidID` to find last entry instead of `last_id` field of
stream object.
This PR adds test for the issue.

**Notes**

The initial idea of updating `last_id` while executing XDEL seems to be wrong:
`last_id` is used to store the last generated id, not the id of the last entry.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: guybe7 <guy.benoish@redislabs.com>
2025-02-25 13:40:24 +08:00
Filipe Oliveira (Redis) 985bf68f34
Reduce redundant key slot calculations on expiration checks (#13796)
In high-pipeline/fast-command use-cases, expireIfNeeded can take up to
3% of CPU cycles.

This PR introduces an optimization where key expiration checks leverage
key slots to improve efficiency.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: ShooterIT <wangyuancode@163.com>
2025-02-25 11:55:30 +08:00
Moti Cohen 0200e8ada6
Fix multiple issues with "INFO KEYSIZES" (#13825)
This commit addresses several issues related to the `INFO KEYSIZES` feature:
- HyperLogLog commands: `KEYSIZES` hooks were not properly set or tested.
- HFE lazy expiration: `KEYSIZES` hooks were not properly set or tested.
- Empty DB & SYNC flow: On `blocking_async=0` flow, the global `keysizes`
  histogram was not reset (can be reproduced using `DEBUG RELOAD`).
- Empty string handling: Fix histogram for strings of size 0. Not 
  relevant to other data-types.
2025-02-25 00:38:44 +02:00
Filipe Oliveira (Redis) 1848809f66
Optimize dictFind by leveraging key length functions to avoid redundant computations. (#13792)
This PR enhances dictFind by introducing support for key length
functions, allowing the use of keyCompareWithLen when available. This
avoids redundant key length computations, improving efficiency,
especially when the dictionary is rehashing or there are a significant
number of hash collisions.

Additionally, it maintains backward compatibility and optimizes key
lookups without altering existing behavior.

Performance improvement on 100% GETs use-case

benchmark command used
```
taskset -c 1-11 memtier_benchmark --ratio 0:1 --key-maximum 1000000 --key-minimum 1 -c 1 -t 5 --pipeline 100 --key-pattern P:P --test-time 30 --hide-histogram -d 1024 -S /tmp/1.socket  -x 3
```

In unstable dictFindByHash takes 29% (and sdslen within it takes 8.9%)
of CPU cycles for a high-pipeline 100% gets use-case.
After this change dictFindByHash takes 27.8% (and sdslen within it takes
7.7%)

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
Co-authored-by: Yuan Wang <wangyuancode@163.com>
2025-02-24 22:27:06 +08:00
Filipe Oliveira (Redis) d7a448f9ae
Avoid redundant calls to sdslen(c->querybuf) in processMultibulkBuffer (#13787)
Optimize processMultibulkBuffer by avoiding redundant calls to
sdslen(c->querybuf).
The cached length is updated only when querybuf is modified.
2025-02-24 20:06:14 +08:00
debing.sun 658424fc83
Revert "Update history for ban-list propagation (#13749)" (#13827)
As discussed in
https://github.com/redis/redis/pull/13749#issuecomment-2673612941.
After #10398 we should record only the arguments and output changes in
the command history, while placing all others in the redis-doc, so
revert #13749.
2025-02-24 17:40:25 +08:00
Filipe Oliveira (Redis) 3f06ddfb7b
Reuse lookupCommand data on consecutive same command calls on main thread (#13764)
We can see that on fast commands and fast pipeline use-cases,
lookupCommand() takes 1.9% to 3.4% of total CPU cycles (depending on
pipeline). When consecutive commands are the same, we can
avoid the call to lookupCommand() completely without changing or adding
new fields to the client struct (we simply reuse the info already
available in lastcmd). This change can represent an improvement of around
4.4% in QPS on the high pipeline use-cases.

---------

Co-authored-by: debing.sun <debing.sun@redis.com>
2025-02-24 12:33:14 +08:00
YaacovHazan 1cddd0d3ba
Merge unstable into 8.0 (#13824) 2025-02-23 08:27:33 +02:00
debing.sun ee933d9e2b
Fixed passing incorrect endtime value for module context (#13822)
1) Fix a bug where an incorrect endtime was passed to the module.
   This bug was found by @ShooterIT.
After #13814, every endtime is a monotonic time, and we should no
longer convert it to a relative ustime.
Add assertions to prevent endtime from being much larger than the
current time.

2) Fix a race in test `Reduce defrag CPU usage when module data can't be
defragged`

---------

Co-authored-by: ShooterIT <wangyuancode@163.com>
2025-02-23 12:58:48 +08:00
YaacovHazan 1d5e13e121 Merge remote-tracking branch 'upstream/unstable' into HEAD 2025-02-22 21:48:15 +02:00
debing.sun 032357ec0f
Add RM_DefragRedisModuleDict module API (#13816)
In #13815, we introduced incremental defragmentation of global module
data.
Now we add a new module API, `RM_DefragRedisModuleDict`, to incrementally
defragment a `RedisModuleDict`.

This PR adds a new API and a new defrag callback:
```c
RedisModuleDict *RM_DefragRedisModuleDict(RedisModuleDefragCtx *ctx, RedisModuleDict *dict, RedisModuleDefragDictValueCallback valueCB, RedisModuleString **seekTo);

typedef void *(*RedisModuleDefragDictValueCallback)(RedisModuleDefragCtx *ctx, void *data, unsigned char *key, size_t keylen);
```

Usage:
```c
RedisModuleString *seekTo = NULL;
RedisModuleDict *dict = RedisModule_CreateDict(ctx);
... populate the dict code ...
/* Defragment a dictionary completely */
do {
    RedisModuleDict *new = RedisModule_DefragRedisModuleDict(ctx, dict, defragGlobalDictValueCB, &seekTo);
    if (new != NULL) {
        dict = new;
    }
} while (seekTo);
```

---------

Co-authored-by: ShooterIT <wangyuancode@163.com>
Co-authored-by: oranagra <oran@redislabs.com>
2025-02-20 21:09:29 +08:00
debing.sun 695126ccce
Add support for incremental defragmentation of global module data (#13815)
## Description

Currently, when performing defragmentation on non-key data within the
module, we cannot process the defragmentation incrementally. This
limitation affects the efficiency and flexibility of defragmentation in
certain scenarios.
The primary goal of this PR is to introduce support for incremental
defragmentation of global module data.

## Interface Change
New module API `RegisterDefragFunc2`

This is a more advanced version of `RM_RegisterDefragFunc`: it takes a
new callback (`RegisterDefragFunc2`) that has a return value. The
callback can use RM_DefragShouldStop and indicate either that it should
be called again later or that it is done (by returning 0).

## Note
The `RegisterDefragFunc` API remains available.

---------

Co-authored-by: ShooterIT <wangyuancode@163.com>
Co-authored-by: oranagra <oran@redislabs.com>
2025-02-20 00:28:16 +08:00
debing.sun 725cd268e6
Refactor of ActiveDefrag to reduce latencies (#13814)
This PR is based on: https://github.com/valkey-io/valkey/pull/1462

## Issue/Problems

Duty Cycle: Active Defrag has configuration values which determine the
intended percentage of CPU to be used based on a gradient of the
fragmentation percentage. However, Active Defrag performs its work on
the 100ms serverCron timer. It then computes a duty cycle and performs a
single long cycle. For example, if the intended CPU is computed to be
10%, Active Defrag will perform 10ms of work on this 100ms timer cron.

* This type of cycle introduces large latencies on the client (up to
25ms with default configurations)
* This mechanism is subject to starvation when slow commands delay the
serverCron

Maintainability: The current Active Defrag code is difficult to read &
maintain. Refactoring of the high level control mechanisms and functions
will allow us to more seamlessly adapt to new defragmentation needs.
Specific examples include:

* A single function (activeDefragCycle) includes the logic to
start/stop/modify the defragmentation as well as performing one "step"
of the defragmentation. This should be separated out, so that the actual
defrag activity can be performed on an independent timer (see duty cycle
above).
* The code is focused on kvstores, with other actions just thrown in at
the end (defragOtherGlobals). There's no mechanism to break this up to
reduce latencies.
* For the main dictionary (only), there is a mechanism to set aside
large keys to be processed in a later step. However this code creates a
separate list in each kvstore (main dict or not), bleeding/exposing
internal defrag logic. We only need 1 list - inside defrag. This logic
should be more contained for the main key store.
* The structure is not well suited towards other non-main-dictionary
items. For example, pub-sub and pub-sub-shard was added, but it's added
in such a way that in CMD mode, with multiple DBs, we will defrag
pub-sub repeatedly after each DB.

## Description of the feature

Primarily, this feature will split activeDefragCycle into 2 functions.

1. One function will be called from serverCron to determine if a defrag
cycle (a complete scan) needs to be started. It will also determine if
the CPU expenditure needs to be adjusted.
2. The 2nd function will be a timer proc dedicated to performing defrag.
This will be invoked independently from serverCron.

Once the functions are split, there is more control over the latency
created by the defrag process. A new configuration will be used to
determine the running time for the defrag timer proc. The default for
this will be 500us (one-half of the current minimum time). Then the
timer will be adjusted to achieve the desired CPU. As an example, 5% of
CPU will run the defrag process for 500us every 10ms. This is much
better than running for 5ms every 100ms.
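
The arithmetic in this paragraph can be checked with a short sketch. `defrag_period_us` is a hypothetical helper, not a function from the patch; it only encodes the relation "run time divided by period equals the target CPU fraction".

```c
#include <assert.h>
#include <stdint.h>

/* Start-to-start timer period, in microseconds, such that running for
 * run_us once per period consumes cpu_pct percent of one CPU. */
static uint64_t defrag_period_us(uint64_t run_us, unsigned cpu_pct) {
    assert(cpu_pct >= 1 && cpu_pct <= 100);
    return run_us * 100 / cpu_pct;
}
```

With these definitions, `defrag_period_us(500, 5)` gives 10000 (500us every 10ms) and `defrag_period_us(5000, 5)` gives 100000 (5ms every 100ms): the same CPU share, but the shorter run blocks the event loop for a tenth of the time per invocation.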

The timer function will also adjust to compensate for starvation. If a
slow command delays the timer, the process will run proportionately
longer to ensure that the configured CPU is achieved. Given the presence
of slow commands, the proportional extra time is insignificant to
latency. This also addresses the overload case. At 100% CPU, if the
event loop slows, defrag will run proportionately longer to achieve the
configured CPU utilization.

Optionally, in low CPU situations, there would be little impact in
utilizing more than the configured CPU. We could optionally allow the
timer to pop more often (even with a 0ms delay) and the (tail) latency
impact would not change.

And we add a time limit for the defrag duty cycle to prevent excessive
latency. When latency is already high (indicated by a long time between
calls), we don't want to make it worse by running defrag for too long.

Addressing maintainability:

* The basic code structure can more clearly be organized around a
"cycle".
* Have clear begin/end functions and a set of "stages" to be executed.
* Rather than stages being limited to "kvstore" type data, a cycle
should be more flexible, incorporating the ability to incrementally
perform arbitrary work. This will likely be necessary in the future for
certain module types. It can be used today to address oddballs like
defragOtherGlobals.
* We reduced some of the globals and some of the coupling.
defrag_later should be removed from serverDb.
* Each stage should begin on a fresh cycle. So if there are
non-time-bounded operations like kvstoreDictLUTDefrag, these would be
less likely to introduce additional latency.


---------

Signed-off-by: Jim Brunner brunnerj@amazon.com
Signed-off-by: Madelyn Olson madelyneolson@gmail.com
Co-authored-by: Madelyn Olson madelyneolson@gmail.com
Co-authored-by: ShooterIT <wangyuancode@163.com>
2025-02-20 00:05:24 +08:00
guybe7 66df58f961
Do not send NL if replica client is already closed (#13813)
In case a replica connection was closed mid-RDB, we should not send a \n
to that replica, otherwise, it may reach the replica BEFORE it realizes
that the RDB transfer failed, causing it to treat the \n as if it was
read from the RDB stream
2025-02-19 15:04:28 +07:00
luozongle01 b045fe4e17
Fix overflow on 32-bit systems when calculating idle time for eviction (#13804)
The `dictGetSignedIntegerVal` function should be used here
because in some cases (especially on 32-bit systems) `long` may
be 4 bytes, while the ttl saved in expires is a unix timestamp
in milliseconds, which needs more than 4 bytes. In this case, we may
not get the correct idle time, which may cause eviction
disorder: keys that should be evicted later may be
evicted earlier.
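
The truncation described above can be reproduced with a small sketch. It does not use the Redis dict API; `read_via_4_byte_long` and `read_via_int64` are hypothetical helpers that model reading the stored value through a 4-byte type versus a 64-bit type.

```c
#include <assert.h>
#include <stdint.h>

/* Models reading the stored millisecond timestamp through a 4-byte
 * integer: only the low 32 bits survive. */
static int64_t read_via_4_byte_long(int64_t stored_ms) {
    return (int64_t)(uint32_t)stored_ms;
}

/* Models reading the same value through a 64-bit signed accessor. */
static int64_t read_via_int64(int64_t stored_ms) {
    return stored_ms;
}
```

For a timestamp such as 1739934075000 ms, the 4-byte read returns only the low 32 bits, so any idle time derived from it is wrong, while the 64-bit read returns the value unchanged.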
2025-02-19 11:01:15 +08:00
Yunxiao Du c5f91abaf7
Fix syntax issue in comments of src/module.c (#13802)
Closes https://github.com/redis/redis/issues/13797. This only fixes a
syntax issue in comments, not in real code.
2025-02-19 10:58:14 +08:00