Authenticated users issuing specially crafted SETRANGE and SORT(_RO)
commands can trigger an integer overflow, resulting with Redis attempting
to allocate impossible amounts of memory and abort with an OOM panic.
Related to the hang reported in #11671
Currently, redis can disconnect a client due to reaching output buffer limit,
it'll also avoid feeding that output buffer with more data, but it will keep
running the loop in the command (despite the client already being marked for
disconnection)
This PR is an attempt to mitigate the problem, specifically for commands that
are easy to abuse, specifically: KEYS, HRANDFIELD, SRANDMEMBER, ZRANDMEMBER.
The RAND family of commands can take a negative COUNT argument (which is not
bound to the number of elements in the key), so it's enough to create a key
with one field, and then these commands can be used to hang redis.
For KEYS the caller can use the existing keyspace in redis (if big enough).
As Sentinel supports dynamic IP only when using hostnames, there
are few leftover addess comparison logic that doesn't take into
account that the IP might get change.
Co-authored-by: moticless <moticless@github.com>
(cherry picked from commit 4a27aa4875)
Before this commit, TLS tests on Ubuntu 22.04 would fail as dropped
connections result with an ECONNABORTED error thrown instead of an empty
read.
(cherry picked from commit 69d5576832)
This solves several problems in a more elegant way:
* No need to explicitly use `-lc` on x86_64 when building with `-m32`.
* Avoids issues with undefined floating point emulation funcs on ARM.
(cherry picked from commit f26e90be0c)
* Fix test modules linking on macOS 11.x.
* Use macOS 10.x for FreeBSD VM as VirtualBox is not yet supported on
11.
(cherry picked from commit 6d5a911707)
**Signed integer overflow.** Although, signed overflow issue can be problematic time to time
and change how compiler generates code, current findings mostly about signed shift or simple
addition overflow. For most platforms Redis can be compiled for, this wouldn't cause any issue
as far as I can tell (checked generated code on godbolt.org).
UB means nothing guaranteed and risky to reason about program behavior but I don't think any
of the fixes here worth backporting. As sanitizers are now part of the CI, preventing new issues
will be the real benefit.
partial cherry pick from commit b91d8b289b
The bug in BITFIELD seems to affect 12.2.1 used on Alpine
This test sets the master ping interval to 1 hour, in order to avoid
pings in the replicatoin stream incrementing the replication offset,
however, it didn't increase the repl-timeout so on slow machines
where the test took more than 60 seconds, the replicas would drop
and reconnect.
```
*** [err]: PSYNC2: Partial resync after restart using RDB aux fields in tests/integration/psync2.tcl
Replica didn't partial sync
```
The test would detect 4 additional partial syncs where it expects
only one.
(cherry picked from commit b0250b4508)
In moduleFireServerEvent we change the real client DB to 0 on freeClient in case the event is REDISMODULE_EVENT_CLIENT_CHANGE.
It results in a crash if the client is blocked on a key on other than DB 0.
The DB change is not necessary even for module-client, as we set its DB to 0 on either createClient or moduleReleaseTempClient.
Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit e4eb18b303)
Fix a few issues with the recent #11463
* use exitFromChild instead of exit
* test should ignore defunct process since that's what we expect to
happen for thees child processes when the parent dies.
* fix typo
Co-authored-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit 4c54528f0f)
During a diskless sync, if the master main process crashes, the child would
have hung in `write`. This fix closes the read fd on the child side, so that if the
parent crashes, the child will get a write error and exit.
This change also fixes disk-based replication, BGSAVE and AOFRW.
In that case the child wouldn't have been hang, it would have just kept
running until done which may be pointless.
There is a certain degree of risk here. in case there's a BGSAVE child that could
maybe succeed and the parent dies for some reason, the old code would have let
the child keep running and maybe succeed and avoid data loss.
On the other hand, if the parent is restarted, it would have loaded an old rdb file
(or none), and then the child could reach the end and rename the rdb file (data
conflicting with what the parent has), or also have a race with another BGSAVE
child that the new parent started.
Note that i removed a comment saying a write error will be ignored in the child
and handled by the parent (this comment was very old and i don't think relevant).
(cherry picked from commit ccaef5c923)
Funcion sentinelAddrEqualsHostname() of sentinel makes DNS resolve
and based on it determines if two IP addresses are equal. Now, If the
DNS resolve command fails, the function simply returns 0, even if the
hostnames are identical.
This might become an issue in case of failover such that sentinel might
receives from Redis instance, response to regular INFO query it sent,
and wrongly decide that the instance is pointing to is different leader
than the one recorded because of this function, yet hostnames are
identical. In turn sentinel disconnects the connection between sentinel
and valid slave which leads to -failover-abort-no-good-slave.
See issue #11241.
I managed to reproduce only part of the flow in which the function
return wrong result and trigger +fix-slave-config.
The fix is even if the function failed to resolve then compare based on
hostnames. That is our best effort as long as the server is unavailable
for some reason. It is fine since Redis instance cannot have multiple
hostnames for a given setup
(cherry picked from commit bd23b15ad7)
1. show the overcommit warning when overcommit is disabled (2),
not just when it is set to heuristic (0).
2. improve warning text to mention the issue with jemalloc causing VM
mapping fragmentation when set to 2.
(cherry picked from commit dd60c6c8d3)
If Redis crashes due to calling an invalid function pointer,
the `backtrace` function will try to dereference this invalid pointer
which will cause a crash inside the crash report and will kill
the processes without having all the crash report information.
Example:
```
=== REDIS BUG REPORT START: Cut & paste starting from here ===
198672:M 19 Sep 2022 18:06:12.936 # Redis 255.255.255 crashed by signal: 11, si_code: 1
198672:M 19 Sep 2022 18:06:12.936 # Accessing address: 0x1
198672:M 19 Sep 2022 18:06:12.936 # Crashed running the instruction at: 0x1
// here the processes is crashing
```
This PR tries to fix this crash be:
1. Identify the issue when it happened.
2. Replace the invalid pointer with a pointer to some dummy function
so that `backtrace` will not crash.
I identification is done by comparing `eip` to `info->si_addr`, if they
are the same we know that the crash happened on the same address it tries to
accesses and we can conclude that it tries to call and invalid function pointer.
To replace the invalid pointer we introduce a new function, `setMcontextEip`,
which is very similar to `getMcontextEip` and it knows to set the Eip for the
different supported OS's. After printing the trace we retrieve the old `Eip` value.
(cherry picked from commit 0bf90d9443)
This bug is introduced in #7653. (Redis 6.2.0)
When `server.maxmemory_eviction_tenacity` is 100, `eviction_time_limit_us` is
`ULONG_MAX`, and if we cannot find the best key to delete (e.g. maxmemory-policy
is `volatile-lru` and all keys with ttl have been evicted), in `cant_free` redis will sleep
forever if some items are being freed in the lazyfree thread.
(cherry picked from commit 464aa04188)
When FLUSHDB / FLUSHALL / SWAPDB is inside MULTI / EXEC, the
client side tracking invalidation message was interleaved with transaction response.
(cherry picked from commit 6f0a27e38e)
Fix#11030, use lua_rawget to avoid triggering metatables.
#11030 shows how return `_G` from the Lua script (either function or eval), cause the
Lua interpreter to Panic and the Redis processes to exit with error code 1.
Though return `_G` only panic on Redis 7 and 6.2.7, the underline issue exists on older
versions as well (6.0 and 6.2). The underline issue is returning a table with a metatable
such that the metatable raises an error.
The following example demonstrate the issue:
```
127.0.0.1:6379> eval "local a = {}; setmetatable(a,{__index=function() foo() end}) return a" 0
Error: Server closed the connection
```
```
PANIC: unprotected error in call to Lua API (user_script:1: Script attempted to access nonexistent global variable 'foo')
```
The Lua panic happened because when returning the result to the client, Redis needs to
introspect the returning table and transform the table into a resp. In order to scan the table,
Redis uses `lua_gettable` api which might trigger the metatable (if exists) and might raise an error.
This code is not running inside `pcall` (Lua protected call), so raising an error causes the
Lua to panic and exit. Notice that this is not a crash, its a Lua panic that exit with error code 1.
Returning `_G` panics on Redis 7 and 6.2.7 because on those versions `_G` has a metatable
that raises error when trying to fetch a none existing key.
### Solution
Instead of using `lua_gettable` that might raise error and cause the issue, use `lua_rawget`
that simply return the value from the table without triggering any metatable logic.
This is promised not to raise and error.
The downside of this solution is that it might be considered as breaking change, if someone
rely on metatable in the returned value. An alternative solution is to wrap this entire logic
with `pcall` (Lua protected call), this alternative require a much bigger refactoring.
### Back Porting
The same fix will work on older versions as well (6.2, 6.0). Notice that on those version,
the issue can cause Redis to crash if inside the metatable logic there is an attempt to accesses
Redis (`redis.call`). On 7.0, there is not crash and the `redis.call` is executed as if it was done
from inside the script itself.
### Tests
Tests was added the verify the fix
(cherry picked from commit 020e046b42)
RM_SetAbsExpire and RM_GetAbsExpire were not actually operational since
they were introduced, due to omission in API registration.
(cherry picked from commit 39d216a326)
Use SSL_shutdown(), in a best-effort manner, when closing a TLS
connection. This change better supports OpenSSL 3.x clients that will
not silently ignore the socket-level EOF.
(cherry picked from commit 45ae605332)
when we know the size of the zset we're gonna store in advance,
we can check if it's greater than the listpack encoding threshold,
in which case we can create a skiplist from the get go, and avoid
converting the listpack to skiplist later after it was already populated.
(cherry picked from commit 2189100383)
When `zrangestore` is called container destination object is created.
Before this PR we used to create a listpack based object even if `zset-max-ziplist-entries`
or equivalent`zset-max-listpack-entries` was set to 0.
This triggered immediate conversion of the listpack into a skiplist in `zrangestore`, which hits
an assertion resulting in an engine crash.
Added a TCL test that reproduces this issue.
(cherry picked from commit 6461f09f43)
On v6.2.7 a new mechanism was added to Lua scripts that allows
filtering the globals of the Lua interpreter. This mechanism was
added on the following commit: 11b602fbf8
One of the globals that was filtered out was `__redis__compare_helper`. This global
was missed and was not added to the allow list or to the deny list. This is
why we get the following warning when Redis starts:
`A key '__redis__compare_helper' was added to Lua globals which is not on the globals allow list nor listed on the deny list.`
After investigating the git blame log, the conclusion is that `__redis__compare_helper`
is no longer needed, the PR deletes this function, and fixes the warning.
Detailed Explanation:
`__redis__compare_helper` was added on this commit: https://github.com/redis/redis/commit/2c861050c1
Its purpose is to sort the replies of `SORT` command when script replication is enable and keep the replies
deterministic and avoid primary and replica synchronization issues. On `SORT` command, there was a need for
special compare function that are able to compare boolean values.
The need to sort the `SORT` command reply was removed on this commit: 36741b2c81
The sorting was moved to be part of the `SORT` command and there was not longer a need
to sort it on the Lua interpreter. The commit made `__redis__compare_helper` a dead code but did
not deleted it.
In #7491 (part of redis 6.2), we started using the monotonic timer instead of mstime to measure
command execution time for stats, apparently this meant sampling the clock 3 times per command
rather than two (wince we also need the wall-clock time).
In some cases this causes a significant overhead.
This PR fixes that by avoiding the use of monotonic timer, except for the cases were we know it
should be extremely fast.
This PR also adds a new INFO field called `monotonic_clock` that shows which clock redis is using.
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 3cd8baf616)
Before #9497, before redis-server was shut down, we did not manually shut down all the clients,
which would have prevented valgrind from detecting a memory leak in the client's argc.
(cherry picked from commit 1376d83363)
Avoid printing "Killed by PID" when si_code != SI_USER.
Apparently SI_USER isn't always set to 0. e.g. on Mac it's 0x10001 and the check that did <= was wrong.
(cherry picked from commit 6761d10cc3)
Avoid deferred array reply on genericZrangebyrankCommand() when consumer type is client.
I.e. any ZRANGE / ZREVRNGE (when tank is used).
This was a performance regression introduced in #7844 (v 6.2) mainly affecting pipelined workloads.
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 1dc89e2d02)
Avoid sprintf/ll2string on setDeferredAggregateLen()/addReplyLongLongWithPrefix() when we can used shared objects.
In some pipelined workloads this achieves about 10% improvement.
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit b857928ba7)
Consider the following example:
1. geoadd k1 -0.15307903289794921875 85 n1 0.3515625 85.00019260486917005437 n2.
2. geodist k1 n1 n2 returns "4891.9380"
3. but GEORADIUSBYMEMBER k1 n1 4891.94 m only returns n1.
n2 is in the boundingbox but out of search areas.So we let search areas contain boundingbox to get n2.
Co-authored-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit b2d393b990)