Commit Graph

10458 Commits

Author SHA1 Message Date
Oran Agra 137696d808 Redis 6.2.9 2023-01-16 18:41:08 +02:00
Oran Agra 37b3b2a7e0 Fix range issues in ZRANDMEMBER and HRANDFIELD (CVE-2023-22458)
missing range check in ZRANDMEMBER and HRANDIFLD leading to panic due
to protocol limitations
2023-01-16 18:41:08 +02:00
Oran Agra 5453899878 Avoid integer overflows in SETRANGE and SORT (CVE-2022-35977)
Authenticated users issuing specially crafted SETRANGE and SORT(_RO)
commands can trigger an integer overflow, resulting with Redis attempting
to allocate impossible amounts of memory and abort with an OOM panic.
2023-01-16 18:41:08 +02:00
Oran Agra 3148f3e8a5 Obuf limit, exit during loop in *RAND* commands and KEYS
Related to the hang reported in #11671
Currently, redis can disconnect a client due to reaching output buffer limit,
it'll also avoid feeding that output buffer with more data, but it will keep
running the loop in the command (despite the client already being marked for
disconnection)

This PR is an attempt to mitigate the problem, specifically for commands that
are easy to abuse, specifically: KEYS, HRANDFIELD, SRANDMEMBER, ZRANDMEMBER.
The RAND family of commands can take a negative COUNT argument (which is not
bound to the number of elements in the key), so it's enough to create a key
with one field, and then these commands can be used to hang redis.
For KEYS the caller can use the existing keyspace in redis (if big enough).
2023-01-16 18:41:08 +02:00
Moti Cohen 3ebace932d Fix sentinel issue if replica changes IP (#11590)
As Sentinel supports dynamic IP only when using hostnames, there
are few leftover addess comparison logic that doesn't take into
account that the IP might get change.

Co-authored-by: moticless <moticless@github.com>
(cherry picked from commit 4a27aa4875)
2023-01-16 18:41:08 +02:00
Oran Agra 423c78f4fa Redis 6.2.8 2022-12-12 17:02:54 +02:00
Yossi Gottlieb de4d78d7df Fix TLS tests on newer tcl-tls/OpenSSL. (#10910)
Before this commit, TLS tests on Ubuntu 22.04 would fail as dropped
connections result with an ECONNABORTED error thrown instead of an empty
read.

(cherry picked from commit 69d5576832)
2022-12-12 17:02:54 +02:00
Yossi Gottlieb e5e642aa9b Use 'gcc' instead of 'ld' to link test modules. (#9710)
This solves several problems in a more elegant way:

* No need to explicitly use `-lc` on x86_64 when building with `-m32`.
* Avoids issues with undefined floating point emulation funcs on ARM.

(cherry picked from commit f26e90be0c)
2022-12-12 17:02:54 +02:00
Yossi Gottlieb 52cccfbe94 Fix daily failures due to macos-latest change. (#9637)
* Fix test modules linking on macOS 11.x.
* Use macOS 10.x for FreeBSD VM as VirtualBox is not yet supported on
  11.

(cherry picked from commit 6d5a911707)
2022-12-12 17:02:54 +02:00
Ozan Tezcan 052e01e75d Some fixes to undefined behaviour bugs taken from (#9601)
**Signed integer overflow.** Although, signed overflow issue can be problematic time to time
and change how compiler generates code, current findings mostly about signed shift or simple
addition overflow. For most platforms Redis can be compiled for, this wouldn't cause any issue
as far as I can tell (checked generated code on godbolt.org).

UB means nothing guaranteed and risky to reason about program behavior but I don't think any
of the fixes here worth backporting. As sanitizers are now part of the CI, preventing new issues
will be the real benefit.

partial cherry pick from commit b91d8b289b
The bug in BITFIELD seems to affect 12.2.1 used on Alpine
2022-12-12 17:02:54 +02:00
Madelyn Olson 5d66aa3d22 Initialize manual failover replica target (#9814)
(cherry picked from commit 362b3b02e6)
2022-12-12 17:02:54 +02:00
Oran Agra 2f94d40e02 Bump vmactions/freebsd-vm to 0.3.0 2022-12-12 17:02:54 +02:00
Oran Agra d345d1da9b resolve build warnings in quicklist test 2022-12-12 17:02:54 +02:00
Oran Agra 0a3ae0f0a4 Try to fix a race in psync2 test (#11553)
This test sets the master ping interval to 1 hour, in order to avoid
pings in the replicatoin stream incrementing the replication offset,
however, it didn't increase the repl-timeout so on slow machines
where the test took more than 60 seconds, the replicas would drop
and reconnect.

```
*** [err]: PSYNC2: Partial resync after restart using RDB aux fields in tests/integration/psync2.tcl
Replica didn't partial sync
```

The test would detect 4 additional partial syncs where it expects
only one.

(cherry picked from commit b0250b4508)
2022-12-12 17:02:54 +02:00
chenyang8094 6029bd560e Fix null pointer subtraction warning (#10498)
The warning:
```
pqsort.c:106:7: warning: performing pointer subtraction with a null pointer has undefined behavior [-Wnull-pointer-subtraction]
loop:   SWAPINIT(a, es);
        ^~~~~~~~~~~~~~~
pqsort.c:65:47: note: expanded from macro 'SWAPINIT'
#define SWAPINIT(a, es) swaptype = ((char *)a - (char *)NULL) % sizeof(long) || \
```
Clang version:
```
Apple clang version 13.1.6 (clang-1316.0.21.2)
Target: x86_64-apple-darwin21.3.0
Thread model: posix
InstalledDir: /Applications/Xcode.app/Contents/Developer/Toolchains/XcodeDefault.xctoolchain/usr/bin
```

(cherry picked from commit cb625844bf)
2022-12-12 17:02:54 +02:00
uriyage 39e5a6fa2f Module CLIENT_CHANGE, Fix crash on free blocked client with DB!=0 (#11500)
In moduleFireServerEvent we change the real client DB to 0 on freeClient in case the event is REDISMODULE_EVENT_CLIENT_CHANGE.
It results in a crash if the client is blocked on a key on other than DB 0.

The DB change is not necessary even for module-client, as we set its DB to 0 on either createClient or moduleReleaseTempClient.

Co-authored-by: Madelyn Olson <34459052+madolson@users.noreply.github.com>
Co-authored-by: Binbin <binloveplay1314@qq.com>
Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit e4eb18b303)
2022-12-12 17:02:54 +02:00
Oran Agra 4ac3d79bfc fixes for fork child exit and test: #11463 (#11499)
Fix a few issues with the recent #11463
* use exitFromChild instead of exit
* test should ignore defunct process since that's what we expect to
  happen for thees child processes when the parent dies.
* fix typo

Co-authored-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit 4c54528f0f)
2022-12-12 17:02:54 +02:00
Oran Agra 51fa40ff42 diskless master, avoid bgsave child hung when fork parent crashes (#11463)
During a diskless sync, if the master main process crashes, the child would
have hung in `write`. This fix closes the read fd on the child side, so that if the
parent crashes, the child will get a write error and exit.

This change also fixes disk-based replication, BGSAVE and AOFRW.
In that case the child wouldn't have been hang, it would have just kept
running until done which may be pointless.

There is a certain degree of risk here. in case there's a BGSAVE child that could
maybe succeed and the parent dies for some reason, the old code would have let
the child keep running and maybe succeed and avoid data loss.
On the other hand, if the parent is restarted, it would have loaded an old rdb file
(or none), and then the child could reach the end and rename the rdb file (data
conflicting with what the parent has), or also have a race with another BGSAVE
child that the new parent started.

Note that i removed a comment saying a write error will be ignored in the child
and handled by the parent (this comment was very old and i don't think relevant).

(cherry picked from commit ccaef5c923)
2022-12-12 17:02:54 +02:00
Moti Cohen 042512aa36 Fix sentinel function that compares hostnames (if failed resolve) (#11419)
Funcion sentinelAddrEqualsHostname() of sentinel makes DNS resolve
and based on it determines if two IP addresses are equal. Now, If the
DNS resolve command fails, the function simply returns 0, even if the
hostnames are identical.

This might become an issue in case of failover such that sentinel might
receives from Redis instance, response to regular INFO query it sent,
and wrongly decide that the instance is pointing to is different leader
than the one recorded because of this function, yet hostnames are
identical. In turn sentinel disconnects the connection between sentinel
and valid slave which leads to -failover-abort-no-good-slave.
See issue #11241.

I managed to reproduce only part of the flow in which the function
return wrong result and trigger +fix-slave-config.

The fix is even if the function failed to resolve then compare based on
hostnames. That is our best effort as long as the server is unavailable
for some reason. It is fine since Redis instance cannot have multiple
hostnames for a given setup

(cherry picked from commit bd23b15ad7)
2022-12-12 17:02:54 +02:00
Oran Agra 3f39c18ac1 Improve linux overcommit check and warning (#11357)
1. show the overcommit warning when overcommit is disabled (2),
   not just when it is set to heuristic (0).
2. improve warning text to mention the issue with jemalloc causing VM
   mapping fragmentation when set to 2.

(cherry picked from commit dd60c6c8d3)
2022-12-12 17:02:54 +02:00
Meir Shpilraien (Spielrein) d8102f5ff8 Avoid crash on crash report when a bad function pointer was called (#11298)
If Redis crashes due to calling an invalid function pointer,
the `backtrace` function will try to dereference this invalid pointer
which will cause a crash inside the crash report and will kill
the processes without having all the crash report information.

Example:

```
=== REDIS BUG REPORT START: Cut & paste starting from here ===
198672:M 19 Sep 2022 18:06:12.936 # Redis 255.255.255 crashed by signal: 11, si_code: 1
198672:M 19 Sep 2022 18:06:12.936 # Accessing address: 0x1
198672:M 19 Sep 2022 18:06:12.936 # Crashed running the instruction at: 0x1
// here the processes is crashing
```

This PR tries to fix this crash be:
1. Identify the issue when it happened.
2. Replace the invalid pointer with a pointer to some dummy function
   so that `backtrace` will not crash.

I identification is done by comparing `eip` to `info->si_addr`, if they
are the same we know that the crash happened on the same address it tries to
accesses and we can conclude that it tries to call and invalid function pointer.

To replace the invalid pointer we introduce a new function, `setMcontextEip`,
which is very similar to `getMcontextEip` and it knows to set the Eip for the
different supported OS's. After printing the trace we retrieve the old `Eip` value.

(cherry picked from commit 0bf90d9443)
2022-12-12 17:02:54 +02:00
zhaozhao.zz 65ad4ff87e fix infinite sleep in performEvictions when have lazyfree jobs (#11237)
This bug is introduced in #7653. (Redis 6.2.0)

When `server.maxmemory_eviction_tenacity` is 100, `eviction_time_limit_us` is
`ULONG_MAX`, and if we cannot find the best key to delete (e.g. maxmemory-policy
is `volatile-lru` and all keys with ttl have been evicted), in `cant_free` redis will sleep
forever if some items are being freed in the lazyfree thread.

(cherry picked from commit 464aa04188)
2022-12-12 17:02:54 +02:00
DarrenJiang13 ec5564a4cf fix the client type in trackingInvalidateKey() (#11052)
Fix bug with scripts ignoring client tracking NOLOOP and
send an invalidation message anyway.

(cherry picked from commit 44859a41ee)
2022-12-12 17:02:54 +02:00
Huang Zhw 2cd5f6f3ff tracking pending invalidation message of flushdb sent by (#11068)
trackingHandlePendingKeyInvalidations should use proto.

(cherry picked from commit 61451b02cb)
2022-12-12 17:02:54 +02:00
Huang Zhw fbcd591b28 When client tracking is on, invalidation message of flushdb in a (#11038)
When FLUSHDB / FLUSHALL / SWAPDB is inside MULTI / EXEC, the
client side tracking invalidation message was interleaved with transaction response.

(cherry picked from commit 6f0a27e38e)
2022-12-12 17:02:54 +02:00
Meir Shpilraien (Spielrein) e1b17f1232 Fix #11030, use lua_rawget to avoid triggering metatables and crash. (#11032)
Fix #11030, use lua_rawget to avoid triggering metatables.

#11030 shows how return `_G` from the Lua script (either function or eval), cause the
Lua interpreter to Panic and the Redis processes to exit with error code 1.
Though return `_G` only panic on Redis 7 and 6.2.7, the underline issue exists on older
versions as well (6.0 and 6.2). The underline issue is returning a table with a metatable
such that the metatable raises an error.

The following example demonstrate the issue:
```
127.0.0.1:6379> eval "local a = {}; setmetatable(a,{__index=function() foo() end}) return a" 0
Error: Server closed the connection
```
```
PANIC: unprotected error in call to Lua API (user_script:1: Script attempted to access nonexistent global variable 'foo')
```

The Lua panic happened because when returning the result to the client, Redis needs to
introspect the returning table and transform the table into a resp. In order to scan the table,
Redis uses `lua_gettable` api which might trigger the metatable (if exists) and might raise an error.
This code is not running inside `pcall` (Lua protected call), so raising an error causes the
Lua to panic and exit. Notice that this is not a crash, its a Lua panic that exit with error code 1.

Returning `_G` panics on Redis 7 and 6.2.7 because on those versions `_G` has a metatable
that raises error when trying to fetch a none existing key.

### Solution

Instead of using `lua_gettable` that might raise error and cause the issue, use `lua_rawget`
that simply return the value from the table without triggering any metatable logic.
This is promised not to raise and error.

The downside of this solution is that it might be considered as breaking change, if someone
rely on metatable in the returned value. An alternative solution is to wrap this entire logic
with `pcall` (Lua protected call), this alternative require a much bigger refactoring.

### Back Porting

The same fix will work on older versions as well (6.2, 6.0). Notice that on those version,
the issue can cause Redis to crash if inside the metatable logic there is an attempt to accesses
Redis (`redis.call`). On 7.0, there is not crash and the `redis.call` is executed as if it was done
from inside the script itself.

### Tests

Tests was added the verify the fix

(cherry picked from commit 020e046b42)
2022-12-12 17:02:54 +02:00
chenyang8094 8c76e0f288 Register abs-expire apis (#11025)
RM_SetAbsExpire and RM_GetAbsExpire were not actually operational since
they were introduced, due to omission in API registration.

(cherry picked from commit 39d216a326)
2022-12-12 17:02:54 +02:00
Yossi Gottlieb b4de9e832d TLS: Notify clients on connection shutdown. (#10931)
Use SSL_shutdown(), in a best-effort manner, when closing a TLS
connection. This change better supports OpenSSL 3.x clients that will
not silently ignore the socket-level EOF.

(cherry picked from commit 45ae605332)
2022-12-12 17:02:54 +02:00
Oran Agra 70aaf50082 optimize zset conversion on large ZRANGESTORE (#10789)
when we know the size of the zset we're gonna store in advance,
we can check if it's greater than the listpack encoding threshold,
in which case we can create a skiplist from the get go, and avoid
converting the listpack to skiplist later after it was already populated.

(cherry picked from commit 2189100383)
2022-12-12 17:02:54 +02:00
Oran Agra 66472a5ee3 crash report instructions (#10816)
Trying to avoid people opening crash report issues about module crashes and ARM QEMU bugs.

(cherry picked from commit 475563e2e9)
2022-12-12 17:02:54 +02:00
Vitaly fd7bde5726 Fix ZRANGESTORE crash when zset_max_listpack_entries is 0 (#10767)
When `zrangestore` is called container destination object is created.
Before this PR we used to create a listpack based object even if `zset-max-ziplist-entries`
or equivalent`zset-max-listpack-entries` was set to 0.
This triggered immediate conversion of the listpack into a skiplist in `zrangestore`, which hits
an assertion resulting in an engine crash.

Added a TCL test that reproduces this issue.

(cherry picked from commit 6461f09f43)
2022-12-12 17:02:54 +02:00
Madelyn Olson 9d40f1cbb5 Unpause clients earlier during manual cluster failover (#9676)
Unpause clients after manual failover ends instead of the timed offset

(cherry picked from commit 32215e7889)
2022-12-12 17:02:54 +02:00
Wen Hui 389c5e14ad Sentinel: don't log auth-pass value for better security (#9652)
(cherry picked from commit 43b22f17dc)
2022-12-12 17:02:54 +02:00
Meir Shpilraien (Spielrein) 64c657a8af
Remove dead code on sorting reply on Lua scripts. (#10701)
On v6.2.7 a new mechanism was added to Lua scripts that allows
filtering the globals of the Lua interpreter. This mechanism was
added on the following commit: 11b602fbf8

One of the globals that was filtered out was `__redis__compare_helper`. This global
was missed and was not added to the allow list or to the deny list. This is
why we get the following warning when Redis starts:
`A key '__redis__compare_helper' was added to Lua globals which is not on the globals allow list nor listed on the deny list.`

After investigating the git blame log, the conclusion is that `__redis__compare_helper`
is no longer needed, the PR deletes this function, and fixes the warning.

Detailed Explanation:

`__redis__compare_helper` was added on this commit: https://github.com/redis/redis/commit/2c861050c1
Its purpose is to sort the replies of `SORT` command when script replication is enable and keep the replies
deterministic and avoid primary and replica synchronization issues. On `SORT` command, there was a need for
special compare function that are able to compare boolean values.

The need to sort the `SORT` command reply was removed on this commit: 36741b2c81
The sorting was moved to be part of the `SORT` command and there was not longer a need
to sort it on the Lua interpreter. The commit made `__redis__compare_helper` a dead code but did
not deleted it.
2022-12-06 12:22:36 +02:00
Oran Agra e6f67092f8 Redis 6.2.7 2022-04-27 16:31:52 +03:00
Yossi Gottlieb 3053337043 Fix test modules build issue on OS X 11. (#9658)
(cherry picked from commit 8bf4c2e38c)
2022-04-27 16:31:52 +03:00
Oran Agra 215c5eacb0 Skip cluster test unit in TLS mode.
see 7d6744c739
2022-04-27 16:31:52 +03:00
Oran Agra afb48c6cc9 Whitelist Lua print function to avoid breaking change in old releases 2022-04-27 16:31:52 +03:00
filipe oliveira 7fddebc272 Optimization: Use either monotonic or wall-clock to measure command execution time, to regain up to 4% execution time (#10502)
In #7491 (part of redis 6.2), we started using the monotonic timer instead of mstime to measure
command execution time for stats, apparently this meant sampling the clock 3 times per command
rather than two (wince we also need the wall-clock time).
In some cases this causes a significant overhead.

This PR fixes that by avoiding the use of monotonic timer, except for the cases were we know it
should be extremely fast.
This PR also adds a new INFO field called `monotonic_clock` that shows which clock redis is using.

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 3cd8baf616)
2022-04-27 16:31:52 +03:00
Oran Agra 28d20b8f14 Bring some CI adjustments from 7.0 into 6.2 2022-04-27 16:31:52 +03:00
Yossi Gottlieb ba2feb3004 Clean unused var compiler warning in module test. (#9289)
(cherry picked from commit 8bf433dc86)
2022-04-27 16:31:52 +03:00
sundb aa0606db08 Fix memory leak due to missing freeCallback in blockonbackground moduleapi test (#9499)
Before #9497, before redis-server was shut down, we did not manually shut down all the clients,
which would have prevented valgrind from detecting a memory leak in the client's argc.

(cherry picked from commit 1376d83363)
2022-04-27 16:31:52 +03:00
Ozan Tezcan 49c1c96fc1 Fix overflow check in expireGenericCommand
Partial cherry pick from #9601 in order for the tests in #9601 to pass

(cherry picked from commit b91d8b289b)
2022-04-27 16:31:52 +03:00
Oran Agra d92f2f5ad6 test suite improvements pulled back from 7.0 for cherry picked commits 2022-04-27 16:31:52 +03:00
Oran Agra e1c09d5982 crash log, print killer pid only when si_code is SI_USER (#10454)
Avoid printing "Killed by PID" when si_code != SI_USER.
Apparently SI_USER isn't always set to 0. e.g. on Mac it's 0x10001 and the check that did <= was wrong.

(cherry picked from commit 6761d10cc3)
2022-04-27 16:31:52 +03:00
yiyuaner 8d9e75c769 Fix an off by one error in zzlStrtod (#10465)
When vlen = sizeof(buf), the statement buf[vlen] = '\0' accessing the buffer buf is an off by one error.

(cherry picked from commit 08aed7e7dd)
2022-04-27 16:31:52 +03:00
Vitah Lin 2643c691be Fix memory leak in RM_StreamIteratorStop and moduleFreeKeyIterator (#10353)
* Fix memory leak in RM_StreamIteratorStop
* Fix memory leak in moduleFreeKeyIterator

(cherry picked from commit dff153ff24)
2022-04-27 16:31:52 +03:00
filipe oliveira ef78757e69 Optimization: Avoid deferred array reply on ZRANGE commands BYRANK (#10337)
Avoid deferred array reply on genericZrangebyrankCommand() when consumer type is client.
I.e. any ZRANGE / ZREVRNGE (when tank is used).
This was a performance regression introduced in #7844 (v 6.2) mainly affecting pipelined workloads.

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 1dc89e2d02)
2022-04-27 16:31:52 +03:00
filipe oliveira 35e697de83 Optimize deferred replies to use shared objects instead of sprintf (#10334)
Avoid sprintf/ll2string on setDeferredAggregateLen()/addReplyLongLongWithPrefix() when we can used shared objects.
In some pipelined workloads this achieves about 10% improvement.

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit b857928ba7)
2022-04-27 16:31:52 +03:00
qetu3790 159981e73c Fix geo search bounding box check causing missing results (#10018)
Consider the following example:
1. geoadd k1 -0.15307903289794921875 85 n1 0.3515625 85.00019260486917005437 n2.
2. geodist k1 n1 n2 returns  "4891.9380"
3. but GEORADIUSBYMEMBER k1 n1 4891.94 m only returns n1.
n2 is in the  boundingbox but out of search areas.So we let  search areas contain boundingbox to get n2.

Co-authored-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit b2d393b990)
2022-04-27 16:31:52 +03:00