Commit Graph

10461 Commits

Author SHA1 Message Date
filipe oliveira ef78757e69 Optimization: Avoid deferred array reply on ZRANGE commands BYRANK (#10337)
Avoid deferred array reply on genericZrangebyrankCommand() when consumer type is client.
I.e. any ZRANGE / ZREVRNGE (when tank is used).
This was a performance regression introduced in #7844 (v 6.2) mainly affecting pipelined workloads.

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 1dc89e2d02)
2022-04-27 16:31:52 +03:00
filipe oliveira 35e697de83 Optimize deferred replies to use shared objects instead of sprintf (#10334)
Avoid sprintf/ll2string on setDeferredAggregateLen()/addReplyLongLongWithPrefix() when we can used shared objects.
In some pipelined workloads this achieves about 10% improvement.

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit b857928ba7)
2022-04-27 16:31:52 +03:00
qetu3790 159981e73c Fix geo search bounding box check causing missing results (#10018)
Consider the following example:
1. geoadd k1 -0.15307903289794921875 85 n1 0.3515625 85.00019260486917005437 n2.
2. geodist k1 n1 n2 returns  "4891.9380"
3. but GEORADIUSBYMEMBER k1 n1 4891.94 m only returns n1.
n2 is in the  boundingbox but out of search areas.So we let  search areas contain boundingbox to get n2.

Co-authored-by: Binbin <binloveplay1314@qq.com>
(cherry picked from commit b2d393b990)
2022-04-27 16:31:52 +03:00
Yossi Gottlieb ec06e93319 Fix OpenSSL 3.0.x related issues. (#10291)
* Drop obsolete initialization calls.
* Use decoder API for DH parameters.
* Enable auto DH parameters if not explicitly used, which should be the
  preferred configuration going forward.

(cherry picked from commit 3881f7850f)
2022-04-27 16:31:52 +03:00
Oran Agra d6a8e64e69 Fix and improve module error reply statistics (#10278)
This PR handles several aspects
1. Calls to RM_ReplyWithError from thread safe contexts don't violate thread safety.
2. Errors returning from RM_Call to the module aren't counted in the statistics (they
  might be handled silently by the module)
3. When a module propagates a reply it got from RM_Call to it's client, then the error
  statistics are counted.

This is done by:
1. When appending an error reply to the output buffer, we avoid updating the global
  error statistics, instead we cache that error in a deferred list in the client struct.
2. When creating a RedisModuleCallReply object, the deferred error list is moved from
  the client into that object.
3. when a module calls RM_ReplyWithCallReply we copy the deferred replies to the dest
  client (if that's a real client, then that's when the error statistics are updated to the server)

Note about RM_ReplyWithCallReply: if the original reply had an array with errors, and the module
replied with just a portion of the original reply, and not the entire reply, the errors are currently not
propagated and the errors stats will not get propagated.

Fix #10180

(cherry picked from commit b099889a3a)
2022-04-27 16:31:52 +03:00
Oran Agra a06f10b009 Attempt to fix a rare crash in cluster tests. (#10265)
The theory is that a replica gets disconnected from within REPLCONF ACK,
so when we go up the stack, we'll crash when attempting to access
c->cmd->flags

(cherry picked from commit aa9beaca77)
2022-04-27 16:31:52 +03:00
ivanstosic-janea 0a26869a13 Fix protocol error caused by redis-benchmark (#10236)
The protocol error was caused by the buggy `writeHandler` in `redis-benchmark.c`,
which didn't handle one of the cases, thereby repeating data, leading to protocol errors
when the values being sent are very long.

This PR fixes #10233, issue introduced by #7959

(cherry picked from commit bb875603fb)
2022-04-27 16:31:52 +03:00
Binbin 2c529844fc Fix PSYNC crash with wrong offset (#10243)
`PSYNC replicationid str_offset` will crash the server.

The reason is in `masterTryPartialResynchronization`,
we will call `getLongLongFromObjectOrReply` check the
offset. With a wrong offset, it will add a reply and
then trigger a full SYNC and the client become a replica.

So crash in `c->bufpos == 0 && listLength(c->reply) == 0`.
In this commit, we check the psync_offset before entering
function `masterTryPartialResynchronization`, and return.

Regardless of that crash, accepting the sync, but also replying
with an error would have corrupt the replication stream.

(cherry picked from commit 344e41c922)
2022-04-27 16:31:52 +03:00
Moti Cohen 33f7e12b88 Improve srand entropy (and fix Sentinel failures) (#10197)
As Sentinel relies upon consensus algorithm, all sentinel instances,
randomize a time to initiate their next attempt to become the
leader of the group. But time after time, all raffled the same value.

The problem is in the line `srand(time(NULL)^getpid())` such that
all spinned up containers get same time (in seconds) and same pid
which is always 1. Added material `tv_usec` and verify that even
consecutive calls brings different values and makes the difference.

(cherry picked from commit 52b2fbe970)
2022-04-27 16:31:52 +03:00
Moti Cohen 90891a7dde Fixed Sentinel support for hostnames (#10146)
Sentinel tries to resolve instances hostname to IP only during registration.
It might be that the instance is unavailable during that time, such as
leader crashed and failover took place. Yet, promoted replica must support:

 - Register leader, even if it fails to resolve its hostname during failover
 - Try later to resolve it, if instance is disconnected. Note that
   this condition also support ip-change of an instance.

(cherry picked from commit 79f089bdd9)
2022-04-27 16:31:52 +03:00
David CARLIER 4a59230d82 zmalloc_get_rss netbsd impl fix proposal. (#10116)
Seems like the previous implementation was broken (always returning 0)

since kinfo_proc2 is used the KERN_PROC2 sysctl oid is more appropriate
and also the query's length was not necessarily accurate (6 here).

(cherry picked from commit 50fa627b90)
2022-04-27 16:31:52 +03:00
Binbin e24b947c9b LPOP/RPOP with count against non existing list return null array (#10095)
It used to return `$-1` in RESP2, now we will return `*-1`.
This is a bug in redis 6.2 when COUNT was added, the `COUNT`
option was introduced in #8179. Fix #10089.

the documentation of [LPOP](https://redis.io/commands/lpop) says
```
When called without the count argument:
Bulk string reply: the value of the first element, or nil when key does not exist.

When called with the count argument:
Array reply: list of popped elements, or nil when key does not exist.
```

(cherry picked from commit 39feee8e3a)
2022-04-27 16:31:52 +03:00
guybe7 c5a753c890 lpGetInteger returns int64_t, avoid overflow (#10068)
Fix #9410

Crucial for the ms and sequence deltas, but I changed all
calls, just in case (e.g. "flags")

Before this commit:
`ms_delta` and `seq_delta` could have overflown, causing `currid` to be wrong,
which in turn would cause `streamTrim` to trim the entire rax node (see new test)

(cherry picked from commit 7cd6a64d2f)
2022-04-27 16:31:52 +03:00
王辉 64327e4169 Fix C11_ATOMIC detection on GNU Make 4.3 (#10033)
Older version of GNU Make (<4.3) required quoting of number signs (#) to
avoid them being treated as a comment. Newer versions will treat this
quote as a literal.

This issue and a proposed solution is discussed here:
https://lists.gnu.org/archive/html/info-gnu/2020-01/msg00004.html

Co-authored-by: Yossi Gottlieb <yossigo@gmail.com>
(cherry picked from commit 747b08bee0)
2022-04-27 16:31:52 +03:00
sundb aab9b12786 Fix when the master connection is disconnected, replication retry read indefinitely (#10032)
Now if redis is still loading when we receive sigterm, we will wait for the loading to reach the event
loop (once in 2mb) before actually shutting down. See #10003.

This change caused valgrind CI to fail.
See https://github.com/redis/redis/runs/4662901673?check_suite_focus=true

This pr is mainly to solve the problem that redis process cannot be exited normally.
When the master is disconnected, if repl is processing diskless loading and using `connRead` to read data from master,
it may enter an infinite retry state, which does not handle `connRead` returning 0(master connection disconnected).

(cherry picked from commit 73951abe7b)
2022-04-27 16:31:52 +03:00
Madelyn Olson 066b68328e Redact ACL SETUSER arguments if the user has spaces (#9935)
(cherry picked from commit c40d23b89f)
2022-04-27 16:31:52 +03:00
sundb db3e3ebcb2 Santize dump payload: fix invalid listpack entry start with EOF (#9889)
When an invalid listpack entry starts with EOF, we will skip it when we verify it in the loop.

(cherry picked from commit 1808618f5d)
2022-04-27 16:31:52 +03:00
OfirMos 86db5091c7 fixed mem leak on rdb load error (#9860)
a rare case of short read that can happen when breaking the master-replica
connection on diskless load mode,

(cherry picked from commit 9f9c78578f)
2022-04-27 16:31:52 +03:00
Meir Shpilraien (Spielrein) 93c1d31d97 Clean Lua stack before parsing call reply to avoid crash on a call with many arguments (#9809)
This commit 0f8b634cd (CVE-2021-32626 released in 6.2.6, 6.0.16, 5.0.14)
fixes an invalid memory write issue by using `lua_checkstack` API to make
sure the Lua stack is not overflow. This fix was added on 3 places:
1. `luaReplyToRedisReply`
2. `ldbRedis`
3. `redisProtocolToLuaType`

On the first 2 functions, `lua_checkstack` is handled gracefully while the
last is handled with an assert and a statement that this situation can
not happened (only with misbehave module):

> the Redis reply might be deep enough to explode the LUA stack (notice
that currently there is no such command in Redis that returns such a nested
reply, but modules might do it)

The issue that was discovered is that user arguments is also considered part
of the stack, and so the following script (for example) make the assertion reachable:
```
local a = {}
for i=1,7999 do
    a[i] = 1
end
return redis.call("lpush", "l", unpack(a))
```

This is a regression because such a script would have worked before and now
its crashing Redis. The solution is to clear the function arguments from the Lua
stack which makes the original assumption true and the assertion unreachable.

(cherry picked from commit 6b0b04f1b2)
2022-04-27 16:31:52 +03:00
Binbin 8fca090ede Add tests to cover EXPIRE overflow fix (#9839)
In #8287, some overflow checks have been added. But when
`when *= 1000` overflows, it will become a positive number.
And the check not able to catch it. The key will be added with
a short expiration time and will deleted a few seconds later.

In #9601, will check the overflow after `*=` and return an
error first, and avoiding this situation.

In this commit, added some tests to cover those code paths.
Found it in #9825, and close it.

(cherry picked from commit 9273d09dd4)
2022-04-27 16:31:52 +03:00
Oran Agra e38d0b5a5f fix invalid read on corrupt ziplist (#9831)
If the last bytes in ziplist are corrupt and we decode from tail to head,
we may reach slightly outside the ziplist.

(cherry picked from commit a3a014294f)
2022-04-27 16:31:52 +03:00
Wen Hui 94013b8ecb Sentinel tls memory leak (#9753)
There was a memory leak when tls is used in Sentinels.
The memory leak is noticed when some of the replicas are offline.

(cherry picked from commit 2ce29e032b)
2022-04-27 16:31:52 +03:00
Itamar Haber 968cd2b967 Fixes LPOP/RPOP wrong replies when count is 0 (#9692)
Introduced in #8179, this fixes the command's replies in the 0 count edge case.
[BREAKING] changes the reply type when count is 0 to an empty array (instead of nil)
Moves LPOP ... 0 fast exit path after type check to reply with WRONGTYPE

(cherry picked from commit 06dd202a05)
2022-04-27 16:31:52 +03:00
Rafi Einstein 19c2b179c7 Fix memory leak when there's a read error of module aux data from rdb. (#9705)
(cherry picked from commit 734cde7e38)
2022-04-27 16:31:52 +03:00
Oran Agra 3ba7c6acbe fix new cluster tests issues (#9657)
Following #9483 the daily CI exposed a few problems.

* The cluster creation code (uses redis-cli) is complicated to test with TLS enabled.
  for now i'm just skipping them since the tests we run there don't really need that kind of coverage
* cluster port binding failures
  note that `find_available_port` already looks for a free cluster port
  but the code in `wait_server_started` couldn't detect the failure of binding
  (the text it greps for wasn't found in the log)

(cherry picked from commit 7d6744c739)
2022-04-27 16:31:52 +03:00
qetu3790 936ee01759 Release clients blocked on module commands in cluster resharding and down state (#9483)
Prevent clients from being blocked forever in cluster when they block with their own module command
and the hash slot is migrated to another master at the same time.
These will get a redirection message when unblocked.
Also, release clients blocked on module commands when cluster is down (same as other blocked clients)

This commit adds basic tests for the main (non-cluster) redis test infra that test the cluster.
This was done because the cluster test infra can't handle some common test features,
but most importantly we only build the test modules with the non-cluster test suite.

note that rather than really supporting cluster operations by the test infra, it was added (as dup code)
in two files, one for module tests and one for non-modules tests, maybe in the future we'll refactor that.

Co-authored-by: Oran Agra <oran@redislabs.com>
(cherry picked from commit 4962c5526d)
2022-04-27 16:31:52 +03:00
DarrenJiang13 8a0515d9a3 add missed error counting (#9646)
* add: add missed error counting in sentinel.c and cluster.c

(cherry picked from commit aa6deff01e)
2022-04-27 16:31:52 +03:00
Yossi Gottlieb 21ab5d4f78 hiredis: improve calloc() overflow fix. (#9630)
Cherry pick a more complete fix to 0215324a6 that also doesn't leak
memory from latest hiredis.

(cherry picked from commit 922ef86a3b)
2022-04-27 16:31:52 +03:00
Huang Zhw 0fb96d55bd Make tracking invalidation messages always after command's reply (#9422)
Tracking invalidation messages were sometimes sent in inconsistent order,
before the command's reply rather than after.
In addition to that, they were sometimes embedded inside other commands
responses, like MULTI-EXEC and MGET.

(cherry picked from commit fd135f3e2d)
2022-04-27 16:31:52 +03:00
Binbin f5d8a36983 Add missing pause tcl test to test_helper.tcl (#9158)
* Add keyname tags to avoid CROSSSLOT errors in external server CI
* Use new wait_for_blocked_clients_count in pause.tcl

(cherry picked from commit 5dddf496ce)
2022-04-27 16:31:52 +03:00
meir 11b602fbf8 Protect any table which is reachable from globals and added globals allow list.
The allow list is done by setting a metatable on the global table before initializing
any library. The metatable set the `__newindex` field to a function that check
the allow list before adding the field to the table. Fields which is not on the
allow list are simply ignored.

After initialization phase is done we protect the global table and each table
that might be reachable from the global table. For each table we also protect
the table metatable if exists.
2022-04-27 16:31:52 +03:00
meir b2ce3719af Protect globals of evals scripts.
Use the new `lua_enablereadonlytable` Lua API to protect the global tables of
evals scripts. The implemetation is easy, we simply call `lua_enablereadonlytable`
on the global table to turn it into a readonly table.
2022-04-27 16:31:52 +03:00
meir 21414ad480 Move user eval function to be located on Lua registry.
Today, Redis wrap the user Lua code with a Lua function.
For example, assuming the user code is:

```
return redis.call('ping')
```

The actual code that would have sent to the Lua interpreter was:

```
f_b3a02c833904802db9c34a3cf1292eee3246df3c() return redis.call('ping') end
```

The wraped code would have been saved on the global dictionary with the
following name: `f_<script sha>` (in our example `f_b3a02c833904802db9c34a3cf1292eee3246df3c`).

This approach allows one user to easily override the implementation a another user code, example:

```
f_b3a02c833904802db9c34a3cf1292eee3246df3c = function() return 'hacked' end
```

Running the above code will cause `evalsha b3a02c833904802db9c34a3cf1292eee3246df3c 0` to return
hacked although it should have returned `pong`.

Another disadventage is that Redis basically runs code on the loading (compiling) phase without been
aware of it. User can do code injection like this:

```
return 1 end <run code on compling phase> function() return 1
```

The wraped code will look like this and the entire `<run code on compling phase>` block will run outside
of eval or evalsha context:

```
f_<sha>() return 1 end <run code on compling phase> function() return 1 end
```
2022-04-27 16:31:52 +03:00
meir 13c1e1f298 Added support for Lua readonly tables.
The new feature can be turned off and on using the new `lua_enablereadonlytable` Lua API.

(cherry picked from commit 92b5098b87e2d0880a530899119524bf1dfbc332)
2022-04-27 16:31:52 +03:00
Vo Trong Phuc 34505d26f7
add check good slaves to write when execute script (#10249)
There was no check min-slave-* config when evaluating Lua script.
Add check enough good slaves for write command when evaluating scripts.

Co-authored-by: Phuc. Vo Trong <phucvt@vng.com.vn>
2022-04-11 12:59:27 +03:00
Oran Agra 4930d19e70 Redis 6.2.6 2021-10-04 13:59:40 +03:00
Oran Agra aba9517542 corrupt-dump-fuzzer test, avoid creating junk keys (#9302)
The execution of the RPOPLPUSH command by the fuzzer created junk keys,
that were later being selected by RANDOMKEY and modified.
This also meant that lists were statistically tested more than other
files.

Fix the fuzzer not to pass junk key names to RPOPLPUSH, and add a check
that detects that new keys are not added by the fuzzer to detect future
similar issues.

(cherry picked from commit 3f3f678a47)
2021-10-04 13:59:40 +03:00
sundb 7540708a61 Fix missing check for sanitize_dump in corrupt-dump-fuzzer test (#9285)
this means the assertion that checks that when deep sanitization is enabled,
there are no crashes, was missing.

(cherry picked from commit 3db0f1a284)
2021-10-04 13:59:40 +03:00
Yunier Pérez 664c7b36a5 Allow to override OPENSSL_PREFIX (#9567)
While the original issue was on Linux, this should work for other
platforms as well.
2021-10-04 13:59:40 +03:00
Yossi Gottlieb c390972c76 Propagate OPENSSL_PREFIX to hiredis. (#9345) 2021-10-04 13:59:40 +03:00
Oran Agra 73d286d523 Fix stream sanitization for non-int first value (#9553)
This was recently broken in #9321 when we validated stream IDs to be
integers but did that after to the stepping next record instead of before.

(cherry picked from commit 5a4ab7c7d2)
2021-10-04 13:59:40 +03:00
David CARLIER 4cbccab2bd TLS build fix on OpenBSD when built with LibreSSL. (#9486)
(cherry picked from commit 418c2e7931)
2021-10-04 13:59:40 +03:00
yvette903 23ed37dd87 Fix: client pause uses an old timeout (#9477)
A write request may be paused unexpectedly because `server.client_pause_end_time` is old.

**Recreate this:**
redis-cli -p 6379
127.0.0.1:6379> client pause 500000000 write
OK
127.0.0.1:6379> client unpause
OK
127.0.0.1:6379> client pause 10000 write
OK
127.0.0.1:6379> set key value

The write request `set key value` is paused util  the timeout of 500000000 milliseconds was reached.

**Fix:**
reset `server.client_pause_end_time` = 0 in `unpauseClients`

(cherry picked from commit f560531d5b)
2021-10-04 13:59:40 +03:00
zhaozhao.zz 9b25484a13 Fix wrong offset when replica pause (#9448)
When a replica paused, it would not apply any commands event the command comes from master, if we feed the non-applied command to replication stream, the replication offset would be wrong, and data would be lost after failover(since replica's `master_repl_offset` grows but command is not applied).

To fix it, here are the changes:
* Don't update replica's replication offset or propagate commands to sub-replicas when it's paused in `commandProcessed`.
* Show `slave_read_repl_offset` in info reply.
* Add an assert to make sure master client should never be blocked unless pause or module (some modules may use block way to do background (parallel) processing and forward original block module command to the replica, it's not a good way but it can work, so the assert excludes module now, but someday in future all modules should rewrite block command to propagate like what `BLPOP` does).

(cherry picked from commit 1b83353dc3)
2021-10-04 13:59:40 +03:00
Madelyn Olson 49f8f43890 Add test verifying PUBSUB NUMPAT behavior (#9209)
(cherry picked from commit 8b8f05c86c)
2021-10-04 13:59:40 +03:00
guybe7 c936f801c0 Fix two minor bugs (MIGRATE key args and getKeysUsingCommandTable) (#9455)
1. MIGRATE has a potnetial key arg in argv[3]. It should be reflected in the command table.
2. getKeysUsingCommandTable should never free getKeysResult, it is always freed by the caller)
   The reason we never encountered this double-free bug is that almost always getKeysResult
   uses the statis buffer and doesn't allocate a new one.

(cherry picked from commit 6aa2285e32)
2021-10-04 13:59:40 +03:00
sundb 694a869e78 Fix the timing of read and write events under kqueue (#9416)
Normally we execute the read event first and then the write event.
When the barrier is set, we will do it reverse.
However, under `kqueue`, if an `fd` has both read and write events,
reading the event using `kevent` will generate two events, which will
result in uncontrolled read and write timing.

This also means that the guarantees of AOF `appendfsync` = `always` are
not met on MacOS without this fix.

The main change to this pr is to cache the events already obtained when reading
them, so that if the same `fd` occurs again, only the mask in the cache is updated,
rather than a new event is generated.

This was exposed by the following test failure on MacOS:
```
*** [err]: AOF fsync always barrier issue in tests/integration/aof.tcl
Expected 544 != 544 (context: type eval line 26 cmd {assert {$size1 != $size2}} proc ::test)
```

(cherry picked from commit 306a5ccd2d)
2021-10-04 13:59:40 +03:00
sundb a2e8a3a241 Sanitize dump payload: fix double free after insert dup nodekey to stream rax and returns 0 (#9399)
(cherry picked from commit 492d8d0961)
2021-10-04 13:59:40 +03:00
Wang Yuan 2f8ec0e024 Fix the wrong detection of sync_file_range system call (#9371)
If we want to check `defined(SYNC_FILE_RANGE_WAIT_BEFORE)`, we should include fcntl.h.
otherwise, SYNC_FILE_RANGE_WAIT_BEFORE is not defined, and there is alway not `sync_file_range` system call.
Introduced by #8532

(cherry picked from commit 8edc3cd62c)
2021-10-04 13:59:40 +03:00
sundb 09c63c45dd Sanitize dump payload: handle remaining empty key when RDB loading and restore command (#9349)
This commit mainly fixes empty keys due to RDB loading and restore command,
which was omitted in #9297.

1) When loading quicklsit, if all the ziplists in the quicklist are empty, NULL will be returned.
    If only some of the ziplists are empty, then we will skip the empty ziplists silently.
2) When loading hash zipmap, if zipmap is empty, sanitization check will fail.
3) When loading hash ziplist, if ziplist is empty, NULL will be returned.
4) Add RDB loading test with sanitize.

(cherry picked from commit cbda492909)
2021-10-04 13:59:40 +03:00