This PR aims to avoid a potential crash when efSearch is too large, in which case the memory allocated could lead to a server crash or an integer overflow (where less memory is allocated than expected).
- Limit the accepted EF in the request to 100_000, as in VADD.
- Limit the ef search to the number of nodes in the HNSW graph.
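The two limits can be sketched as follows; the constant and function names here are illustrative, not the actual VSIM code:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical hard cap, matching the limit the PR borrows from VADD. */
#define EF_HARD_LIMIT 100000

static uint64_t clamp_ef_search(uint64_t requested_ef, uint64_t node_count) {
    /* Cap the request so candidate buffers cannot grow unbounded
     * (or overflow the allocation-size computation). */
    if (requested_ef > EF_HARD_LIMIT) requested_ef = EF_HARD_LIMIT;
    /* An EF larger than the graph cannot improve recall; it only
     * inflates the allocation. */
    if (requested_ef > node_count) requested_ef = node_count;
    return requested_ef;
}
```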
This PR resolves a potential division-by-zero issue in the `redis-cli`
LRU test mode (`--lru-test`), as reported by the Linux Verification
Center.
Fixes #14361
This PR adds a `--ollama-url` option to `cli.py`, the lightweight
redis-cli-like tool that expands `!"text"` arguments into embeddings via
Ollama.
Previously, the embedding call was hardcoded to
`http://localhost:11434/api/embeddings`. With this change, users can
specify a custom Ollama server URL when starting the tool.
If no URL is provided, the tool defaults to the previous URL.
Hash field expiration is managed with two levels of data structures.
1. At the DB level, an ebuckets structure maintains the set of all
hashes that contain fields with expiration.
2. At the per-hash level, an ebuckets structure tracks fields with
expiration.
This pull request refactors the first level to operate per slot instead,
and introduces a new API called estore (expiration store). Its design
aligns closely with the existing kvstore API, ensuring consistency and
simplifying usage. The terminology at that level has been updated from
“HFE” or “hexpire” to “subexpiry”, reflecting a broader scope that can
later support other data types.
From the malloc-stats reports of both failures and successes, we can see
that the additional fragmentation mainly comes from bin24.
Analyzing those fragments, they come mainly from the entries of the dict:
since the `large_ebrax` test uses a dictionary with 1600 elements, it
moves a large number of entries during rehashing, and we do not perform
defragmentation on the dict entries.
In https://github.com/redis/redis/pull/13842 we changed to use two dicts
alternately to generate fragmentation. Normally, the entries should also
alternate, but rehashing disrupted this, resulting in bin24 fragmentation
that can't be defragged.
## Solution
In this PR, the length of a single dictionary was reduced from 1600 to
500 to avoid excessive rehashing, and the threshold was also lowered.
---------
Co-authored-by: oranagra <oran@redislabs.com>
This PR is based on https://github.com/valkey-io/valkey/pull/1303
This PR introduces a DEBUG_DEFRAG compilation option that enables
activedefrag functionality even when the allocator is not jemalloc, and
always forces defragmentation regardless of the amount or ratio of
fragmentation.
## Usage
```
make SANITIZER=address DEBUG_DEFRAG=<force|fully>
./runtest --debug-defrag
```
* `DEBUG_DEFRAG=force`
  * Ignores the defragmentation threshold so that defragmentation is always triggered.
  * Always reallocates pointers to probe for correctness issues in pointer reallocation.
* `DEBUG_DEFRAG=fully`
  * Includes everything in the `force` option.
  * Additionally performs a full defrag on every defrag cycle, which is significantly slower but more accurate.
---------
Co-authored-by: Ran Shidlansik <ranshid@amazon.com>
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: oranagra <oran@redislabs.com>
This PR fixes three defrag issues.
1. Fix the issue where `cgroup_ref_node` was not updated when the
consumer group was reallocated.
This crash was introduced by https://github.com/redis/redis/issues/14130
In this PR, when performing defragmentation on `s->cgroups` using
`defragRadixTree()`, we no longer rely on the automatic data
defragmentation of `defragRadixTree()`. Instead, we manually defragment
the consumer group and then update its reference in `s->cgroups`.
2. Fix a use-after-free issue caused by updating dictionary keys after
the HFE key is reallocated.
This issue was introduced by https://github.com/redis/redis/issues/13842
3. Fix the issue where `NextSegHdr->firstSeg` was not updated when the
first segment was reallocated.
This issue was introduced by https://github.com/redis/redis/issues/13842
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
We have code that assumes that if we're facing a big argument, the
next argument is likely to be very big too, so we allocate another huge
query buffer.
This backfires and returns an OOM error on any command with an argument
that's larger than half the memory limit (even if the db is completely
empty).
To mitigate that, we reserve query buffer for another big argument, only
if that big argument is less than 1/30 of the memory limit.
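The mitigation condition can be sketched as below; the function name and shape are illustrative, not the actual query-buffer code:

```c
#include <assert.h>
#include <stddef.h>

/* Only pre-reserve room for a second jumbo argument when the current
 * big argument is small relative to the configured memory limit. */
static int should_reserve_big_arg(size_t arg_len, size_t maxmemory) {
    if (maxmemory == 0) return 1;        /* no memory limit configured */
    return arg_len < maxmemory / 30;     /* the 1/30 threshold from the PR */
}
```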
This PR fixes two defrag issues.
1. Fix a use-after-free issue caused by updating dictionary keys after a
pubsub channel is reallocated.
This issue was introduced by https://github.com/redis/redis/pull/13058
2. Fix a potential use-after-free for Lua during AOF loading with defrag.
This issue was introduced by https://github.com/redis/redis/issues/13058
This fix follows https://github.com/redis/redis/pull/14319
This PR updates the LuaScript LRU list before script execution to
prevent accessing a potentially invalidated pointer after long-running
scripts.
From the following logs, if we are in a slow environment, the sentinels'
election process may become very slow.
Even if the master instance that was restarted and is slowly loading the
RDB has already finished loading, the election is only just getting started.
This PR makes the master load the RDB more slowly, and fixes the missing
reset of `key-load-delay` for the master node.
This PR follows https://github.com/redis/redis/pull/14226.
`make test` fails on a fresh checkout because test modules are not built
by default when running tests.
Error:
[exception]: Executing test client: ERR Error loading the extension.
Please check the server logs.
Solution:
Add `module_tests` to the test target dependencies.
This PR follows https://github.com/redis/redis/issues/14226.
When using parallel compilation with `make -j`, the `module_tests`
target and `REDIS_SERVER_NAME` may compile concurrently, leading to
build failures. This appears to be caused by both targets having a
shared dependency on `redismodule.h`, creating a race condition during
parallel execution.
Solution:
Add explicit dependency of `module_tests` on `$(REDIS_SERVER_NAME)` to
enforce build order.
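The dependency can be sketched as a Makefile fragment (hypothetical: the variable and target names follow this PR's text, and the real rule may differ):

```
# Make module_tests depend on the server binary so parallel builds
# cannot race on the shared generated header.
module_tests: $(REDIS_SERVER_NAME)
```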
This PR fixes two crashes due to defragmentation of Lua scripts,
which were introduced by https://github.com/redis/redis/pull/13108
1. During long-running Lua script execution, active defragmentation may
be triggered, causing the luaScript structure to be reallocated to a new
memory location; we then access `l->node` (possibly reallocated) after
script execution to update the Lua LRU list.
In this PR, we don't defrag during blocked scripts, so we don't mess up
the LRU update when the script ends.
Note that defrag is now only permitted during loading.
This PR also reverts the changes made by
https://github.com/redis/redis/pull/14274.
2. Forgot to update the Lua LRU list node's value.
Since `lua_scripts_lru_list` node stores a pointer to the `lua_script`'s
key, we also need to update `node->value` when the key is reallocated.
In this PR, after performing defragmentation on a Lua script, if the
script is in the LRU list, its reference in the LRU list will be
unconditionally updated.
Fixes #14257
The XGROUP CREATE and SETID subcommands allowed setting an ENTRIESREAD
value greater than the stream's total `entries_added` counter. This
could lead to a logically inconsistent state.
This commit adds a check to ensure the provided ENTRIESREAD value is not
greater than the number of entries ever added to the stream. If
ENTRIESREAD is too large, it gets set to the total number of entries in
the stream, i.e. `s->entries_added`.
This PR fixes a bug in the `hnsw_cursor_free` function where the prev
pointer was never updated during cursor list traversal. As a result, if
the cursor being freed was not the head of the list, it would not be
correctly unlinked, potentially causing memory leaks or corruption of
the cursor list.
Note that since `hnsw_cursor_free()` is not currently used, this PR does
not actually fix a live bug.
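The fix is the classic singly-linked-list unlink pattern. A minimal mock (not the actual cursor struct) showing the previously missing `prev` update:

```c
#include <assert.h>
#include <stddef.h>

typedef struct cursor { struct cursor *next; } cursor;

static void cursor_list_remove(cursor **head, cursor *c) {
    cursor *prev = NULL, *cur = *head;
    while (cur) {
        if (cur == c) {
            if (prev) prev->next = cur->next;  /* unlink a mid-list node */
            else *head = cur->next;            /* unlink the head */
            return;
        }
        prev = cur;       /* the missing update: advance prev as we scan */
        cur = cur->next;
    }
}
```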
These two tests often fail in slow environments.
1. `Module defrag: late defrag with cursor works` test
`defragtest_datatype_resumes` does not always reach 10 within a single
defrag cycle, so increase the threshold and move the assertion on
`defragtest_datatype_resumes` into `wait_for_condition`.
2. `Module defrag: global defrag works` test
Increase the waiting time for this test.
This PR mainly fixes two flaky tests.
1. Fix the failure of the `Active Defrag HFE with ebrax` test in
`memefficiency.tcl`
When the `redisObject` structure size changes, the current test design
becomes flaky:
the test creates 1 hash key + N string keys. When the string keys are
deleted, the hash keys may end up evenly distributed across the robj
slabs, making defragmentation impossible.
2. Fix `bulk reply protocol` test in `protocol.tcl` introduced by
https://github.com/redis/redis/pull/13711
When `OBJ_ENCODING_EMBSTR_SIZE_LIMIT` (currently 44) changes, it can
cause this test to fail. This isn't necessarily a problem, but the main
issue is that we use `rawread` to verify encoding correctness. If the
reply length doesn't match exactly, it can cause the test to hang and
become difficult to debug.
integrate module API tests into default test suite
- Add module_tests target to main Makefile to build test modules
- Include unit/moduleapi in test_dirs to run module tests with ./runtest
- Module API tests now run by default instead of requiring
runtest-moduleapi
---------
Co-authored-by: debing.sun <debing.sun@redis.com>
Refactor `dictEntryNoValue` to remove the need for a per-entry bit
(`ENTRY_PTR_NO_VALUE`) indicating the absence of a value. By aligning
`dictEntry` and `dictEntryNoValue` so that the key and next fields share
the same layout, we can reuse common code paths. The freed bit can now
be leveraged for other purposes.
The trade-off is that callers must now be more careful not to call
`dictSetVal()` or `dictGetVal()` on dictionaries that only store keys,
since we can no longer catch this with an assertion. However, this
limitation is manageable.
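The idea can be illustrated with mocked layouts (the real Redis definitions may differ in detail): as long as `key` and `next` sit at the same offsets in both structs, code that only touches keys and links can treat the two entry types uniformly, with no per-entry flag.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical layouts: the shared prefix is key + next; only the
 * full entry carries a value field past that prefix. */
typedef struct mockDictEntryNoValue {
    void *key;
    struct mockDictEntryNoValue *next;
} mockDictEntryNoValue;

typedef struct mockDictEntry {
    void *key;
    struct mockDictEntry *next;
    void *val;   /* extra field, beyond the shared prefix */
} mockDictEntry;
```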
## Summary
This PR adds a new configuration option `decode_array_with_array_mt` to
lua_cjson that allows users to control how empty JSON arrays are handled
during encoding/decoding.
## Problem
Currently, lua_cjson is ambiguous when handling empty tables:
- When decoding an empty JSON array `[]`, it becomes an empty Lua table.
This is mainly because both `{}` (object) and `[]` (array) are represented
as tables. The Lua cjson library then decides whether to encode a table
as a JSON object or as a JSON array depending on its length: if the
length is not 0, it becomes a JSON array; otherwise, it is treated as a
JSON object.
## Solution
Added a new configuration option `decode_array_with_array_mt` (default:
`false` for backward compatibility):
- **When `false` (default)**: Maintains current behavior - empty arrays
decode to Lua table
- **When `true`**: Empty JSON arrays decode to tables with a special
metatable marker `__is_cjson_array`
```lua
-- Usage Example
-- Default behavior without decode_array_with_array_mt (backward compatible)
local arr = cjson.decode("[]") -- plain table {}
cjson.encode(arr) -- produces "{}"
-- Default behavior (backward compatible)
cjson.decode_array_with_array_mt(false)
local arr = cjson.decode("[]") -- plain table {}
cjson.encode(arr) -- produces "{}"
-- New behavior
cjson.decode_array_with_array_mt(true)
local arr = cjson.decode("[]") -- table with __is_cjson_array metatable
cjson.encode(arr) -- produces "[]"
```
## Note
This new Lua cjson API (`decode_array_with_array_mt`) is modeled on
https://github.com/openresty/lua-cjson
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
Co-authored-by: Ozan Tezcan <ozantezcan@gmail.com>
## Summary
This PR optimizes the performance of
`zmalloc_get_frag_smallbins_by_arena()` by eliminating the overhead of
`snprintf()` and string parsing in the hot loop when calculating
jemalloc small bins frag bytes.
## Solution
Replaced `je_mallctl()` calls with `je_mallctlbymib()`.
This approach has two benefits:
1. Avoids the overhead of `snprintf`.
2. Avoids the overhead of jemalloc parsing the name string on every call.
After the key-value unification (kvobj), the MEMORY USAGE command may no
longer account for the embedded key length stored within the kvobj. To
fix this, replace `sizeof(*o)` with `zmalloc_size((void *)o)` to ensure
the full allocated size is measured.
In this context, `objectComputeSize()` was renamed to
`kvobjComputeSize()` and changed to compute the size of both the key and
its value, instead of only the value.
This commit adds support for the "touches-arbitrary-keys" command flag
in Redis modules, allowing module commands to be properly marked when
they modify keys not explicitly provided as arguments, to avoid wrapping
replicated commands with MULTI/EXEC.
Changes:
- Added "touches-arbitrary-keys" flag parsing in
commandFlagsFromString()
- Updated module command documentation to describe the new flag
- Added test implementation in zset module with zset.delall command to
demonstrate and verify the flag functionality
The zset.delall command serves as a test case that scans the keyspace
and deletes all zset-type keys, properly using the new flag since it
modifies keys not provided via argv.
This commit adds a new `zset.delall` command to the zset test module
that iterates through the keyspace and deletes all keys of type "zset".
Key changes:
- Added zset_delall() function that uses RedisModule_Scan to iterate
through all keys in the keyspace
- Added zset_delall_callback() that checks each key's type and deletes
zset keys using RedisModule_Call with "DEL" command
- Registered the new command with "write touches-arbitrary-keys" flags
since it modifies arbitrary keys not provided via argv
- Added support for "touches-arbitrary-keys" flag in module command
parsing
- Added comprehensive tests for the new functionality
The command returns the number of deleted zset keys and properly handles
replication by using the "s!" format specifier with RedisModule_Call to
ensure DEL commands are replicated to slaves and AOF.
Usage: ZSET.DELALL
Returns: Integer count of deleted zset keys
This bug was introduced by https://github.com/redis/redis/pull/14130
found by @oranagra
### Summary
`s->cgroup_ref` is created lazily at runtime, the first time a consumer
group is linked with a message, but it is not released when all
references are removed.
However, after `debug reload` or restart, if the PEL is empty (meaning
no consumer group is referencing any message), `s->cgroup_ref` will not
be recreated.
As a result, when executing XADD or XTRIM with `ACKED` option and
checking whether a message that is being read but has not been ACKed can
be deleted, the cgroup_ref being NULL will cause a crash.
### Code Path
```
xaddCommand -> streamTrim -> streamEntryIsReferenced
```
### Solution
Check if `s->cgroup_ref` is NULL in streamEntryIsReferenced().
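A mocked sketch of the guard (the struct and function here are stand-ins for the real `stream` and `streamEntryIsReferenced()`):

```c
#include <assert.h>
#include <stddef.h>

typedef struct { void *cgroup_ref; } mock_stream;

/* With no cgroup_ref rax at all, no consumer group can be
 * referencing the entry, so report "not referenced". */
static int mock_entry_is_referenced(mock_stream *s) {
    if (s->cgroup_ref == NULL) return 0;  /* the added NULL check */
    /* ... the real code would consult the per-group references here ... */
    return 1;
}
```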
Fix https://github.com/redis/redis/issues/14267
This bug was introduced by https://github.com/redis/redis/pull/13495
### Summary
When a replica clears a large database, it periodically calls
processEventsWhileBlocked() in the replicationEmptyDbCallback() callback
during the key deletion process.
If defragmentation is enabled, this means that active defrag can be
triggered while the database is being deleted.
The defragmentation process may also modify the database at this time,
which could lead to crashes when the database is accessed after
defragmentation.
Code Path:
```
replicationEmptyDbCallback() -> processEventsWhileBlocked() -> whileBlockedCron() -> defragWhileBlocked()
```
### Solution
This PR temporarily disables active defrag before emptying the database,
then restores the active defrag setting after the empty is complete.
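The save/disable/restore pattern can be sketched as follows; the flag and functions are mocked stand-ins, not the actual Redis config plumbing:

```c
#include <assert.h>

/* Mocked global standing in for the activedefrag config flag. */
static int active_defrag_enabled = 1;
static int defrag_ran_during_empty = 0;

/* Stand-in for a defrag cycle fired from processEventsWhileBlocked(). */
static void mock_defrag_cycle(void) {
    if (active_defrag_enabled) defrag_ran_during_empty = 1;
}

/* Sketch of the fix: disable defrag for the duration of the empty,
 * then restore the previous setting. */
static void empty_db_safely(void) {
    int saved = active_defrag_enabled;
    active_defrag_enabled = 0;
    mock_defrag_cycle();   /* events processed mid-delete cannot defrag */
    active_defrag_enabled = saved;
}
```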
---------
Co-authored-by: Copilot <175728472+Copilot@users.noreply.github.com>
### Description
This PR introduces support for using the processor clock
(USE_PROCESSOR_CLOCK) for the RISC-V architecture in the
`src/monotonic.c`. The change adds conditional compilation code enabling
Redis to utilize the RISC-V processor clock for monotonic timing,
improving time measurement accuracy and consistency on RISC-V platforms.
### Motivation
Currently, Redis's monotonic clock implementation lacks explicit
handling for RISC-V processor clocks. Adding USE_PROCESSOR_CLOCK support
helps Redis better leverage hardware capabilities on RISC-V, enhancing
portability and performance.
### Changes
- Added `USE_PROCESSOR_CLOCK` macro and related support code guarded by
RISC-V specific macros in `src/monotonic.c`.
- No existing functionality is changed for other architectures.
### Testing
- Build succeeds, and redis-server and redis-benchmark run well on
**Sophgo SG2042 RISC-V CPU**.
- It is not easy to test the performance improvement brought by
`USE_PROCESSOR_CLOCK` using redis-benchmark, so we wrote a
micro-benchmark `monotonic_bench.c` test on RISC-V.
From `monotonic_bench` on **Sophgo SG2042 RISC-V CPU**, we can see that
the RISC-V processor clock implementation is approximately **2.78 times
faster** than the POSIX monotonic clock method on this platform.
### Notes
- No impact on existing Redis platforms. Enables improved timing for
RISC-V users which is critical for latency measurements, timeouts, and
other internal Redis timing logic.
- This change aligns with Redis’s strategy of supporting diverse
hardware platforms with minimal footprint, and it preserves backward
compatibility with existing code.
- To use `USE_PROCESSOR_CLOCK` on RISC-V, compile Redis with `make
CFLAGS="-DUSE_PROCESSOR_CLOCK"`.
Signed-off-by: Huang Zheng <huang.zheng@sanechips.com.cn>
When Redis is shut down uncleanly (e.g., due to power loss), invalid
bytes may remain at the end of the AOF file. Currently, Redis detects
such corruption only after parsing most of the AOF, leading to delayed
error detection and increased downtime. Manual recovery via
`redis-check-aof --fix` is also time-consuming.
This fix introduces two new options to improve resilience and reduce
downtime:
- `aof-load-broken`: Enables automatic detection and repair of broken
AOF tails.
- `aof-load-broken-max-size`: Sets a maximum threshold (in bytes) for
the corrupted tail size that Redis will attempt to fix automatically
without requiring user intervention.
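A hypothetical redis.conf usage of the proposed options; the value syntax below follows common Redis config conventions, not released documentation:

```
aof-load-broken yes
aof-load-broken-max-size 10mb
```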
Hi, this PR implements the following changes:
1. The EPSILON option of VSIM is now documented.
2. The EPSILON behavior was fixed: the score was incorrectly interpreted
against the 0-2 interval provided by the underlying cosine similarity
(effectively divided by two in meaning) instead of the 0-1 interval. So
an EPSILON of 0.2 only returned elements with a distance between 1 and
0.9 instead of between 1 and 0.8. This is a *breaking change*, but the
command was not documented so far, and it is a fix: since the user sees
the similarity score, the old behavior was a total mismatch. I believe
this fix should definitely be backported as soon as possible.
3. There are now tests.
Thanks for checking,
Salvatore
Fix https://github.com/redis/redis/issues/14208
As mentioned in the above issue, RM_GetCommandKeysWithFlags could have
memory leak when the number of keys is larger than MAX_KEYS_BUFFER. This
PR fixes it by calling getKeysFreeResult before the function's return. A
TCL testcase is created to verify the fix.
getSlotOrReply() is used by the `CLUSTER SLOT-STATS` command but is
defined in cluster_legacy.c, which might not be present in all build
configurations.
Noticed we assume there are at least 3 arguments, since we access
index 2 in the `if` and only check `argc` later.
Moved the `argc` check to the start of the `if` so the code is a bit
safer.
In cluster mode with modules, for a given key, the slot resolution for
the KEYSIZES histogram update was incorrect. As a result, the histogram
might silently ignore those keys, or update the wrong slot's histogram.
Introduced by https://github.com/redis/redis/issues/13806
Fixed a crash in the MOVE command when moving hash objects that have
both key expiration and field expiration.
The issue occurred in the following scenario:
1. A hash has both key expiration and field expiration.
2. During the MOVE command, `setExpireByLink()` is called to set the
expiration time for the target hash, which may reallocate the hash's
kvobj.
3. Since the hash has field expiration, `hashTypeAddToExpires()` is
called to update the minimum field expiration time.
Issue:
However, the kvobj pointer wasn't updated with the return value from
`setExpireByLink()`, causing `hashTypeAddToExpires()` to use freed
memory.
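The underlying pattern can be sketched with mocks (names are stand-ins, not the real kvobj API): any function that may reallocate an object must have its return value re-captured before the object is used again.

```c
#include <assert.h>
#include <stdlib.h>

typedef struct { long long min_expire; } mock_kvobj;

/* Stands in for setExpireByLink(): simulates a reallocation. */
static mock_kvobj *mock_set_expire(mock_kvobj *kv) {
    mock_kvobj *moved = malloc(sizeof(*moved));
    *moved = *kv;
    free(kv);       /* the old pointer is now dangling */
    return moved;   /* callers must use this, not the old pointer */
}
```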