Commit Graph

2494 Commits

Author SHA1 Message Date
antirez c641b670c3 Use new dictGetRandomKeys() API to get samples for eviction.
The eviction quality degradates a bit in my tests, but since the API is
faster, it allows to raise the number of samples, and overall is a win.
2014-03-20 16:52:12 +01:00
antirez 82b53c650c struct dictEntry -> dictEntry. 2014-03-20 16:20:37 +01:00
antirez 5317f5e99a Added dictGetRandomKeys() to dict.c: mass get random entries.
This new function is useful to get a number of random entries from an
hash table when we just need to do some sampling without particularly
good distribution.

It just jumps at a random place of the hash table and returns the first
N items encountered by scanning linearly.

The main usefulness of this function is to speedup Redis internal
sampling of the key space, for example for key eviction or expiry.
2014-03-20 15:50:46 +01:00
antirez 22c9cfaf57 LRU eviction pool implementation.
This is an improvement over the previous eviction algorithm where we use
an eviction pool that is persistent across evictions of keys, and gets
populated with the best candidates for evictions found so far.

It allows to approximate LRU eviction at a given number of samples
better than the previous algorithm used.
2014-03-20 11:57:29 +01:00
antirez 6d5790d682 Fix OBJECT IDLETIME return value converting to seconds.
estimateObjectIdleTime() returns a value in milliseconds now, so we need
to scale the output of OBJECT IDLETIME to seconds.
2014-03-20 11:55:18 +01:00
antirez ad6b0f70b2 Obtain LRU clock in a resolution dependent way.
For testing purposes it is handy to have a very high resolution of the
LRU clock, so that it is possible to experiment with scripts running in
just a few seconds how the eviction algorithms works.

This commit allows Redis to use the cached LRU clock, or a value
computed on demand, depending on the resolution. So normally we have the
good performance of a precomputed value, and a clock that wraps in many
days using the normal resolution, but if needed, changing a define will
switch behavior to an high resolution LRU clock.
2014-03-20 11:47:12 +01:00
antirez 1faf82663f Specify lruclock in redisServer structure via REDIS_LRU_BITS.
The padding field was totally useless: removed.
2014-03-20 11:37:27 +01:00
antirez d77e231682 Specify LRU resolution in milliseconds. 2014-03-20 11:33:25 +01:00
antirez fe30847016 Set LRU parameters via REDIS_LRU_BITS define. 2014-03-20 11:22:47 +01:00
antirez e150ec7d0c Unify stats reset for CONFIG RESETSTAT / initServer().
Now CONFIG RESETSTAT makes sure to reset all the fields, and in the
future it will be simpler to avoid missing new fields.
2014-03-19 12:55:49 +01:00
Matt Stancliff 67ed5f00aa Cluster: remove variable causing warning
GCC-4.9 warned about this, but clang didn't.

This commit fixes warning:
sentinel.c: In function 'sentinelReceiveHelloMessages':
sentinel.c:2156:43: warning: variable 'master' set but not used [-Wunused-but-set-variable]
     sentinelRedisInstance *ri = c->data, *master;
2014-03-18 15:35:09 -04:00
antirez b9e90a70fa Sentinel: sentinelRefreshInstanceInfo() minor refactoring.
Test sentinel.tilt condition on top and return if it is true.
This allows to remove the check for the tilt condition in the remaining
code paths of the function.
2014-03-18 15:35:47 +01:00
antirez 218cc5fc39 Sentinel: propagate down-after-ms changes to slaves and sentinels. 2014-03-18 14:37:44 +01:00
antirez bb6d850160 Sentinel: down-after-milliseconds is not master-specific.
addReplySentinelRedisInstance() modified so that this field is displayed
for all the kind of instances: Sentinels, Masters, Slaves.
2014-03-18 11:21:17 +01:00
antirez ae0b7680b3 Sentinel failure detection implementation improved.
Failure detection in Sentinel is ping-pong based. It used to work by
remembering the last time a valid PONG reply was received, and checking
if the reception time was too old compared to the current current time.

PINGs were sent at a fixed interval of 1 second.

This works in a decent way, but does not scale well when we want to set
very small values of "down-after-milliseconds" (this is the node
timeout basically).

This commit reiplements the failure detection making a number of
changes. Some changes are inspired to Redis Cluster failure detection
code:

* A new last_ping_time field is added in representation of instances.
  If non zero, we have an active ping that was sent at the specified
  time. When a valid reply to ping is received, the field is zeroed
  again.
* last_ping_time is not reset when we reconnect the link or send a new
  ping, so from our point of view it represents the time we started
  waiting for the instance to reply to our pings without receiving a
  reply.
* last_ping_time is now used in order to check if the instance is
  timed out. This means that we can have a node timeout of 100
  milliseconds and yet the system will work well since the new check is
  not bound to the period used to send pings.
* Pings are now sent every second, or often if the value of
  down-after-milliseconds is less than one second. With a lower limit of
  10 HZ ping frequency.
* Link reconnection code was improved. This is used in order to try to
  reconnect the link when we are at 50% of the node timeout without a
  valid reply received yet. However the old code triggered unnecessary
  reconnections when the node timeout was very small. Now that should be
  ok.

The new code passes the tests but more testing is needed and more unit
tests stressing the failure detector, so currently this is merged only
in the unstable branch.
2014-03-17 18:33:45 +01:00
antirez 3a2ff55617 Sentinel: use CLIENT SETNAME when connecting to Redis.
This makes debugging / monitoring of Sentinels simpler since you can
identify sentinels in CLIENT LIST output of Redis instances.
2014-03-15 14:59:23 +01:00
Matt Stancliff 584052ee6b Fix segfault from accessing array out of bounds
argc == 2; argv[2] == crash
2014-03-14 17:38:05 -04:00
antirez ed813863f0 Sentinel: be safe under crash-recovery assumptions.
Sentinel's main safety argument is that there are no two configurations
for the same master with the same version (configuration epoch).

For this to be true Sentinels require to be authorized by a majority.
Additionally Sentinels require to do two important things:

* Never vote again for the same epoch.
* Never exchange an old vote for a fresh one.

The first prerequisite, in a crash-recovery system model, requires to
persist the master->leader_epoch on durable storage before to reply to
messages. This was not the case.

We also make sure to persist the current epoch in order to never reply
to stale votes requests from other Sentinels, after a recovery.

The configuration is persisted by making use of fsync(), this is
considered in the context of this code a good enough guarantee that
after a restart our durable state is restored, however this may not
always be the case depending on the kind of hardware and operating
system used.
2014-03-14 14:58:44 +01:00
antirez 365094028b Sentinel: fake PUBLISH command to receive HELLO messages.
Now the way HELLO messages are received is unified.
Now it is no longer needed for Sentinels to converge to the higher
configuration for a master to be able to chat via some Redis instance,
the are able to directly exchanges configurations.

Note that this commit does not include the (trivial) change needed to
send HELLO messages to Sentinel instances as well, since for an error I
committed the change in the previous commit that refactored hello
messages processing into a separated function.
2014-03-14 11:07:42 +01:00
antirez 9dfe426fc8 Sentinel: HELLO processing refactored into sentinelProcessHelloMessage(). 2014-03-14 11:07:42 +01:00
antirez 133fccb03f Cluster: flag the transaction as dirty for the new redirections. 2014-03-13 15:11:53 +01:00
antirez 429aff4ef4 Linenoise updated, multiline mode enabled in redis-cli. 2014-03-13 15:11:08 +01:00
antirez cc11d103c0 redis-trib: call MIGRATE via r.client.call as fix for redis-rb API changes.
See issue #1593.

Thanks to @badboy for suggesting the direct client.call fix.
2014-03-11 16:10:13 +01:00
antirez df32eb6827 redis-trib: new subcommand 'call'. Exec command in all nodes.
Example:

./redis-trib.rb call 192.168.1.11:7000 config get cluster-node-timeout
2014-03-11 14:58:55 +01:00
antirez 2e5c394fa8 redis-trib: create subcommand is now able to assign spare slaves.
Example: if the user will try to configure a cluster with 9 nodes,
asking for 1 slave for master, redis-trib will configure a 4 masters
cluster with 1 slave each as usually, but this time will assign the
spare node as a slave of one of the masters.
2014-03-11 14:17:28 +01:00
antirez e26f4486b0 Cluster: update node configEpoch on UPDATE messages.
The UPDATE message contains the configEpoch of the node configuration
advertised in the packet. Update it if needed.
2014-03-11 11:53:09 +01:00
antirez a2ff90919f Cluster: set slot error if we receive an update for a busy slot.
By manually modifying nodes configurations in random ways, it is possible
to create the following scenario:

A is serving keys for slot 10
B is manually configured to serve keys for slot 10

A receives an update from B (or another node) where it is informed that
the slot 10 is now claimed by B with a greater configuration epoch,
however A still has keys from slot 10.

With this commit A will put the slot in error setting it in IMPORTING
state, so that redis-trib can detect the issue.
2014-03-11 11:49:47 +01:00
antirez 1ed0ad77f0 Cluster: clarified a comment in clusterUpdateSlotsConfigWith(). 2014-03-11 11:32:40 +01:00
antirez 8287945ff8 Cluster: flush importing/migrating state when master is turned into slave. 2014-03-11 11:22:06 +01:00
antirez 2e8e0ad44e Cluster: clusterCloseAllSlots() added. 2014-03-11 11:16:18 +01:00
antirez 8eae54aa1e DEBUG ERROR implemented.
The new "error" subcommand of the DEBUG command can reply with an user
selected error, specified as its sole argument:

    DEBUG ERROR "LOADING please wait..."

The error is generated just prefixing the command argument with a "-"
character, and replacing newlines with spaces (since error replies can't
include newlines).

The goal of the command is to help in Client libraries unit tests by
making simple to simulate a command call triggering a given error.
2014-03-10 23:01:55 +01:00
antirez 2705306ba1 DEBUG CMDKEYS: provide some guarantee to getKeysFromCommand().
getKeysFromCommand() is designed to be called with the command arguments
passing the basic arity checks described in the command table.

DEBUG CMDKEYS must provide the same guarantees for calling
getKeysFromCommand() to be safe.
2014-03-10 16:43:38 +01:00
antirez 5b864617bc Cluster: make sortGetKeys() able to handle multiple STORE options.
It does not make sense to pass multiple store options, so, better to
handle it ;-)
2014-03-10 16:39:07 +01:00
antirez c4ef1d6494 DEBUG CMDKEYS added for getKeysFromCommand() testing.
Examples:

    redis 127.0.0.1:6379> debug cmdkeys set foo bar
    1) "foo"
    redis 127.0.0.1:6379> debug cmdkeys mget a b c
    1) "a"
    2) "b"
    3) "c"
    redis 127.0.0.1:6379> debug cmdkeys zunionstore foo 2 a b
    1) "a"
    2) "b"
    3) "foo"
    redis 127.0.0.1:6379> debug cmdkeys ping
    (empty list or set)
2014-03-10 16:36:08 +01:00
antirez 3e1d772677 Cluster: don't allow BY option of SORT as well.
There is the exception of a "constant" BY pattern that is used in order
to signal to don't sort at all. In this case no lookup is needed so it
is possible to support this case in Cluster mode.
2014-03-10 16:28:18 +01:00
antirez 04cf02e8dc Cluster: SORT get keys helper implemented. 2014-03-10 16:26:08 +01:00
antirez 21765c8588 Cluster: evalGetKeys() fixed: was not setting keys count. 2014-03-10 16:23:42 +01:00
antirez 03344196f3 Cluster: don't allow GET option in cluster mode.
The commit also refactors a bit the error handling during SORT option
parsing.
2014-03-10 16:10:50 +01:00
antirez 8caecc9ab4 Fixed memory leak in SORT LIMIT option argument parsing on error. 2014-03-10 15:44:41 +01:00
antirez ef5e7fbaa2 Cluster: getKeysFromCommand() top comment improved. 2014-03-10 15:31:01 +01:00
antirez c0e818ab08 Cluster: evalGetKey() added for EVAL/EVALSHA.
Previously we used zunionInterGetKeys(), however after this function was
fixed to account for the destination key (not needed when the API was
designed for "diskstore") the two set of commands can no longer be served
by an unique keys-extraction function.
2014-03-10 15:26:13 +01:00
antirez caf7b9b425 Cluster: getKeysFromCommand() and related: top-comments added. 2014-03-10 15:24:38 +01:00
antirez 787b297046 Cluster: getKeysFromCommand() API cleaned up.
This API originated from the "diskstore" experiment, not for Redis
Cluster itself, so there were legacy/useless things trying to
differentiate between keys that are going to be overwritten and keys
that need to be fetched from disk (preloaded).

All useless with Cluster, so removed with the result of code
simplification.
2014-03-10 13:18:41 +01:00
antirez 55b88e0044 Cluster: some zunionInterGetKeys() comment trimmed.
Everything was pretty clear again from the initial statements.
2014-03-10 11:43:56 +01:00
Salvatore Sanfilippo aca6cb529b Merge pull request #1586 from mattsta/fix-zunioninterstorekeys
Fix key extraction for z{union,inter}store
2014-03-10 11:39:45 +01:00
antirez c1a7d3e61f Cluster: abort on port too high error.
It also fixes multi-line comment style to be consistent with the rest of
the code base.

Related to #1555.
2014-03-10 10:41:27 +01:00
Salvatore Sanfilippo 442b06db54 Merge pull request #1555 from mattsta/cluster-port-error-out
Cluster port error out
2014-03-10 10:37:50 +01:00
antirez ed8c55237b Cluster: be explicit about passing NULL as bind addr for connect.
The code was already correct but it was using that bindaddr[0] is set to
NULL as a side effect of current implementation if no bind address is
configured. This is not guarnteed to hold true in the future.
2014-03-10 10:33:53 +01:00
antirez 3e8a92ef8d Cluster: log error when anetTcpNonBlockBindConnect() fails. 2014-03-10 10:32:28 +01:00
Salvatore Sanfilippo 3b0edb80ec Merge pull request #1567 from mattsta/fix-cluster-join
Bind source address for cluster communication
2014-03-10 10:28:32 +01:00
antirez 0f1f25784f Cluster: better timeout and retry time for failover.
When node-timeout is too small, in the order of a few milliseconds,
there is no way the voting process can terminate during that time, so we
set a lower limit for the failover timeout of two seconds.

The retry time is set to two times the failover timeout time, so it is
at least 4 seconds.
2014-03-10 09:57:52 +01:00
Matt Stancliff f0782a6e86 Fix key extraction for z{union,inter}store
The previous implementation wasn't taking into account
the storage key in position 1 being a requirement (it
was only counting the source keys in positions 3 to N).

Fixes antirez/redis#1581
2014-03-07 16:33:20 -05:00
antirez 6984692060 Cluster: fix conditional generating TRYAGAIN error. 2014-03-07 16:18:00 +01:00
antirez 36676c2318 Redis Cluster: support for multi-key operations. 2014-03-07 13:19:09 +01:00
Salvatore Sanfilippo bbf39b7a3a Merge pull request #1576 from Hailei/fix-lruidletime-comment
Fix REDIS_LRU_CLOCK_MAX's value
2014-03-06 18:14:36 +01:00
antirez b74c899da3 Merge branch 'unstable' of github.com:/antirez/redis into unstable 2014-03-06 18:06:30 +01:00
Matt Stancliff e8bae92e54 Reset op_sec_last_sample_ops when reset requested
This value needs to be set to zero (in addition to
stat_numcommands) or else people may see
a negative operations per second count after they
run CONFIG RESETSTAT.

Fixes antirez/redis#1577
2014-03-06 18:00:08 +01:00
Matt Stancliff 385c25f70f Remove redundant IP length definition
REDIS_CLUSTER_IPLEN had the same value as
REDIS_IP_STR_LEN.  They were both #define'd
to the same INET6_ADDRSTRLEN.
2014-03-06 17:55:43 +01:00
Matt Stancliff d2040ab9b1 Remove some redundant code
Function nodeIp2String in cluster.c is exactly
anetPeerToString with a pre-extracted fd.
2014-03-06 17:55:39 +01:00
Matt Stancliff 59cf0b1902 Fix return value check for anetTcpAccept
anetTcpAccept returns ANET_ERR, not AE_ERR.

This isn't a physical error since both ANET_ERR
and AE_ERR are -1, but better to be consistent.
2014-03-06 17:55:31 +01:00
Salvatore Sanfilippo 54e99fb226 Merge pull request #1578 from badboy/patch-5
Small typo fixed
2014-03-06 17:40:04 +01:00
antirez 9b401819c0 Cast saveparams[].seconds to long for %ld format specifier. 2014-03-05 11:26:18 +01:00
Jan-Erik Rediger 5f5118bdad Small typo fixed 2014-03-05 00:41:02 +01:00
Matt Stancliff e5b1e7be64 Bind source address for cluster communication
The first address specified as a bind parameter
(server.bindaddr[0]) gets used as the source IP
for cluster communication.

If no bind address is specified by the user, the
behavior is unchanged.

This patch allows multiple Redis Cluster instances
to communicate when running on the same interface
of the same host.
2014-03-04 17:36:45 -05:00
antirez 47750998a6 Sentinel: more aggressive failover start desynchronization.
Sentinel needs to avoid split brain conditions due to multiple sentinels
trying to get voted at the exact same time.

So far some desynchronization was provided by fluctuating server.hz,
that is the frequency of the timer function call. However the
desynchonization provided in this way was not enough when using many
Sentinel instances, especially when a large quorum value is used in
order to force a greater degree of agreement (more than N/2+1).

It was verified that it was likely to trigger a split brain
condition, forcing the system to try again after a timeout.
Usually the system will succeed after a few retries, but this is not
optimal.

This commit desynchronizes instances in a more effective way to make it
likely that the first attempt will be successful.
2014-03-04 17:09:36 +01:00
antirez 08da025f56 CONFIG REWRITE should be logged at WARNING level. 2014-03-04 16:39:47 +01:00
zhanghailei 138695d990 refer to updateLRUClock's comment REDIS_LRU_CLOCK_MAX is 22 bits,but #define REDIS_LRU_CLOCK_MAX ((1<<21)-1) only 21 bits 2014-03-04 12:20:31 +08:00
zhanghailei c0f8665414 FIXED a typo more thank should be more than 2014-03-04 11:21:34 +08:00
zhanghailei 4b9ac6edd0 According to context,the size should be 16 rather than 64 2014-03-04 11:21:34 +08:00
antirez c5edd91716 Cluster: invalidate current transaction on redirections. 2014-03-03 17:11:51 +01:00
antirez e41a3edfab Merge branch 'cli_improved_bigkeys' of git://github.com/michael-grunder/redis into unstable 2014-03-03 11:20:54 +01:00
antirez 12a88d575d Document why we update peak memory in INFO. 2014-03-03 11:19:54 +01:00
antirez 0c1bb1313c Merge branch 'unstable' of github.com:/antirez/redis into unstable 2014-03-03 11:17:37 +01:00
antirez 8dea2029a4 Fix configEpoch assignment when a cluster slot gets "closed".
This is still code to rework in order to use agreement to obtain a new
configEpoch when a slot is migrated, however this commit handles the
special case that happens when the nodes are just started and everybody
has a configEpoch of 0. In this special condition to have the maximum
configEpoch is not enough as the special epoch 0 is not unique (all the
others are).

This does not fixes the intrinsic race condition of a failover happening
while we are resharding, that will be addressed later.
2014-03-03 11:12:11 +01:00
Matt Stancliff f1c9a203b2 Force INFO used_memory_peak to match peak memory
used_memory_peak only updates in serverCron every server.hz,
but Redis can use more memory and a user can request memory
INFO before used_memory_peak gets updated in the next
cron run.

This patch updates used_memory_peak to the current
memory usage if the current memory usage is higher
than the recorded used_memory_peak value.

(And it only calls zmalloc_used_memory() once instead of
twice as it was doing before.)
2014-02-28 17:47:41 -05:00
antirez a89c8bb87c Sentinel test: Makefile target added. 2014-02-28 16:00:00 +01:00
michael-grunder 806788d009 Improved bigkeys with progress, pipelining and summary
This commit reworks the redis-cli --bigkeys command to provide more
information about our progress as well as output summary information
when we're done.

 - We now show an approximate percentage completion as we go
 - Hiredis pipelining is used for TYPE and SIZE retreival
 - A summary of keyspace distribution and overall breakout at the end
2014-02-27 12:01:57 -08:00
antirez 76a6e82d89 warnigns -> warnings in redisBitpos(). 2014-02-27 13:17:23 +01:00
antirez 0e31eaa27f More consistent BITPOS behavior with bit=0 and ranges.
With the new behavior it is possible to specify just the start in the
range (the end will be assumed to be the first byte), or it is possible
to specify both start and end.

This is useful to change the behavior of the command when looking for
zeros inside a string.

1) If the user specifies both start and end, and no 0 is found inside
   the range, the command returns -1.

2) If instead no range is specified, or just the start is given, even
   if in the actual string no 0 bit is found, the command returns the
   first bit on the right after the end of the string.

So for example if the string stored at key foo is "\xff\xff":

    BITPOS foo (returns 16)
    BITPOS foo 0 -1 (returns -1)
    BITPOS foo 0 (returns 16)

The idea is that when no end is given the user is just looking for the
first bit that is zero and can be set to 1 with SETBIT, as it is
"available". Instead when a specific range is given, we just look for a
zero within the boundaries of the range.
2014-02-27 12:53:03 +01:00
antirez 38c620b3b5 Initial implementation of BITPOS.
It appears to work but more stress testing, and both unit tests and
fuzzy testing, is needed in order to ensure the implementation is sane.
2014-02-27 12:44:27 +01:00
antirez addd4de9c1 Merge branch 'unstable' of github.com:/antirez/redis into unstable 2014-02-27 10:14:03 +01:00
antirez 746ce35f5f Fix misaligned word access in redisPopcount(). 2014-02-27 09:46:20 +01:00
Matt Stancliff d769cad4bf Fix IP representation in clusterMsgDataGossip 2014-02-25 16:02:28 -05:00
antirez 55e36e1132 Merge branch 'bigkeys_scan' of git://github.com/michael-grunder/redis into unstable 2014-02-25 14:59:57 +01:00
michael-grunder 013a4ce242 Update --bigkeys to use SCAN
This commit changes the findBigKeys() function in redis-cli.c to use the new
SCAN command for iterating the keyspace, rather than RANDOMKEY.  Because we
can know when we're done using SCAN, it will exit after exhausting the keyspace.
2014-02-25 05:41:30 -08:00
antirez a2c76ffb1c redis-cli: also remove useless uint8_t. 2014-02-25 13:47:37 +01:00
antirez ba993cc685 redis-cli: don't use uint64_t where actually not needed.
The computation is just something to take the CPU busy, no need to use a
specific type. Since stdint.h was not included this prevented
compilation on certain systems.
2014-02-25 13:44:31 +01:00
antirez 5580350a7b redis-cli: check argument existence for --pattern. 2014-02-25 12:38:29 +01:00
antirez c1d67ea9b4 redis-cli: --intrinsic-latency run mode added. 2014-02-25 12:37:52 +01:00
antirez dcac007b81 redis-cli: added comments to split program in parts. 2014-02-25 12:24:45 +01:00
antirez b15411df98 Sentinel: log quorum with +monitor event. 2014-02-24 17:10:20 +01:00
antirez 6b373edb77 Sentinel: generate +monitor events at startup. 2014-02-24 16:33:55 +01:00
antirez 3b7a757468 Sentinel: log +monitor and +set events.
Now that we have a runtime configuration system, it is very important to
be able to log how the Sentinel configuration changes over time because
of API calls.
2014-02-24 16:33:43 +01:00
antirez 25cebf7285 Sentinel: added missing exit(1) after checking for config file. 2014-02-24 16:22:52 +01:00
Salvatore Sanfilippo e163332858 Merge pull request #1545 from mattsta/fix-redis-cli-sync
Deny SYNC and PSYNC in redis-cli
2014-02-23 17:47:28 +01:00
antirez b1c1386374 Sentinel: IDONTKNOW error removed.
This error was conceived for the older version of Sentinel that worked
via master redirection and that was not able to get configuration
updates from other Sentinels via the Pub/Sub channel of masters or
slaves.

This reply does not make sense today, every Sentinel should reply with
the best information it has currently. The error will make even more
sense in the future since the plan is to allow Sentinels to update the
configuration of other Sentinels via gossip with a direct chat without
the prerequisite that they have at least a monitored instance in common.
2014-02-22 17:34:46 +01:00
Matt Stancliff 2c273e3591 Add cluster or sentinel to proc title
If you launch redis with `redis-server --sentinel` then
in a ps, your output only says "redis-server IP:Port" — this
patch changes the proc title to include [sentinel] or
[cluster] depending on the current server mode:
e.g.  "redis-server IP:Port [sentinel]"
      "redis-server IP:Port [cluster]"
2014-02-20 23:58:54 -05:00
antirez 7d7b3810e7 Sentinel: report instances role switch events.
This is useful mostly for debugging of issues.
2014-02-20 12:13:52 +01:00
Matt Stancliff ce68caea37 Cluster: error out quicker if port is unusable
The default cluster control port is 10,000 ports higher than
the base Redis port.  If Redis is started on a too-high port,
Cluster can't start and everything will exit later anyway.
2014-02-19 17:30:07 -05:00
Matt Stancliff b20ae393f1 Fix "can't bind to address" error reporting.
Report the actual port used for the listening attempt instead of
server.port.

Originally, Redis would just listen on server.port.
But, with clustering, Redis uses a Cluster Port too,
so we can't say server.port is always where we are listening.

If you tried to launch Redis with a too-high port number (any
port where Port+10000 > 65535), Redis would refuse to start, but
only print an error saying it can't connect to the Redis port.

This patch fixes much confusions.
2014-02-19 17:26:33 -05:00
antirez 7cec9e48ce Sentinel: SENTINEL_SLAVE_RECONF_RETRY_PERIOD -> RECONF_TIMEOUT
Rename define to match the new meaning.
2014-02-18 10:27:38 +01:00
antirez 18b8bad53c Sentinel: fix slave promotion timeout.
If we can't reconfigure a slave in time during failover, go forward as
anyway the slave will be fixed by Sentinels in the future, once they
detect it is misconfigured.

Otherwise a failover in progress may never terminate if for some reason
the slave is uncapable to sync with the master while at the same time
it is not disconnected.
2014-02-18 08:50:57 +01:00
antirez ede33fb912 Get absoulte config file path before processig 'dir'.
The code tried to obtain the configuration file absolute path after
processing the configuration file. However if config file was a relative
path and a "dir" statement was processed reading the config, the absolute
path obtained was wrong.

With this fix the absolute path is obtained before processing the
configuration while the server is still in the original directory where
it was executed.
2014-02-17 16:44:53 +01:00
antirez e1b77b61f3 Sentinel: better specify startup errors due to config file.
Now it logs the file name if it is not accessible. Also there is a
different error for the missing config file case, and for the non
writable file case.
2014-02-17 16:44:49 +01:00
antirez 51bd9da1fd Update cached time in rdbLoad() callback.
server.unixtime and server.mstime are cached less precise timestamps
that we use every time we don't need an accurate time representation and
a syscall would be too slow for the number of calls we require.

Such an example is the initialization and update process of the last
interaction time with the client, that is used for timeouts.

However rdbLoad() can take some time to load the DB, but at the same
time it did not updated the time during DB loading. This resulted in the
bug described in issue #1535, where in the replication process the slave
loads the DB, creates the redisClient representation of its master, but
the timestamp is so old that the master, under certain conditions, is
sensed as already "timed out".

Thanks to @yoav-steinberg and Redis Labs Inc for the bug report and
analysis.
2014-02-13 15:13:26 +01:00
antirez 7e8abcf693 Log when CONFIG REWRITE goes bad. 2014-02-13 14:32:44 +01:00
antirez 21e6b0fbe9 Fix script cache bug in the scripting engine.
This commit fixes a serious Lua scripting replication issue, described
by Github issue #1549. The root cause of the problem is that scripts
were put inside the script cache, assuming that slaves and AOF already
contained it, even if the scripts sometimes produced no changes in the
data set, and were not actaully propagated to AOF/slaves.

Example:

    eval "if tonumber(KEYS[1]) > 0 then redis.call('incr', 'x') end" 1 0

Then:

    evalsha <sha1 step 1 script> 1 0

At this step sha1 of the script is added to the replication script cache
(the script is marked as known to the slaves) and EVALSHA command is
transformed to EVAL. However it is not dirty (there is no changes to db),
so it is not propagated to the slaves. Then the script is called again:

    evalsha <sha1 step 1 script> 1 1

At this step master checks that the script already exists in the
replication script cache and doesn't transform it to EVAL command. It is
dirty and propagated to the slaves, but they fail to evaluate the script
as they don't have it in the script cache.

The fix is trivial and just uses the new API to force the propagation of
the executed command regardless of the dirty state of the data set.

Thank you to @minus-infinity on Github for finding the issue,
understanding the root cause, and fixing it.
2014-02-13 12:10:43 +01:00
antirez fc08c8599f AOF write error: retry with a frequency of 1 hz. 2014-02-12 16:27:59 +01:00
antirez fe8352540f AOF: don't abort on write errors unless fsync is 'always'.
A system similar to the RDB write error handling is used, in which when
we can't write to the AOF file, writes are no longer accepted until we
are able to write again.

For fsync == always we still abort on errors since there is currently no
easy way to avoid replying with success to the user otherwise, and this
would violate the contract with the user of only acknowledging data
already secured on disk.
2014-02-12 16:11:36 +01:00
antirez db6d628c3e Cluster: clusterDelNode(): remove node from master's slaves. 2014-02-11 10:34:25 +01:00
antirez 5e0e03be41 Cluster: UPDATE messages are the norm and verbose.
Logging them at WARNING level was of little utility and of sure disturb.
2014-02-11 10:18:24 +01:00
antirez 8251d2d150 Cluster: redis-trib fix: handling of another trivial case. 2014-02-11 10:13:18 +01:00
antirez 4a64286c36 Cluster: configEpoch assignment in SETNODE improved.
Avoid to trash a configEpoch for every slot migrated if this node has
already the max configEpoch across the cluster.

Still work to do in this area but this avoids both ending with a very
high configEpoch without any reason and to flood the system with fsyncs.
2014-02-11 10:09:17 +01:00
antirez 72f7abf6a2 Cluster: clusterSetStartupEpoch() made more generally useful.
The actual goal of the function was to get the max configEpoch found in
the cluster, so make it general by removing the assignment of the max
epoch to currentEpoch that is useful only at startup.
2014-02-11 10:00:14 +01:00
antirez 44f7afe28a Cluster: always increment the configEpoch in SETNODE after import.
Removed a stale conditional preventing the configEpoch from incrementing
after the import in certain conditions. Since the master got a new slot
it should always claim a new configuration.
2014-02-11 09:50:37 +01:00
antirez a1349728ea Cluster: on resharding upgrade version of receiving node.
The node receiving the hash slot needs to have a version that wins over
the other versions in order to force the ownership of the slot.

However the current code is far from perfect since a failover can happen
during the manual resharding. The fix is a work in progress but the
bottom line is that the new version must either be voted as usually,
set by redis-trib manually after it makes sure can't be used by other
nodes, or reserved configEpochs could be used for manual operations (for
example odd versions could be never used by slaves and are always used
by CLUSTER SETSLOT NODE).
2014-02-11 00:36:05 +01:00
antirez 6dc26795aa Cluster: fsync at every SETSLOT command puts too pressure on disks.
During slots migration redis-trib can send a number of SETSLOT commands.
Fsyncing every time is a bit too much in production as verified
empirically.

To make sure configs are fsynced on all nodes after a resharding
redis-trib may send something like CLUSTER CONFSYNC.

In this case fsyncs were not providing too much value since anyway
processes can crash in the middle of the resharding of an hash slot, and
redis-trib should be able to recover from this condition anyway.
2014-02-10 23:54:08 +01:00
antirez 218358bbbd Cluster: conditions to clear "migrating" on slot for SETSLOT ... NODE changed.
If the slot is manually assigned to another node, clear the migrating
status regardless of the fact it was previously assigned to us or not,
as long as we no longer have keys for this slot.

This avoid a race during slots migration that may leave the slot in
migrating status in the source node, since it received an update message
from the destination node that is already claiming the slot.

This way we are sure that redis-trib at the end of the slot migration is
always able to close the slot correctly.
2014-02-10 23:51:47 +01:00
antirez 3107e7ca60 Cluster: remove debugging xputs from redis-trib. 2014-02-10 19:14:05 +01:00
antirez 1ae50a9b1d Cluster: redis-trib fix: cover new case of open slot.
The case is the trivial one a single node claiming the slot as
migrating, without nodes claiming it as importing.
2014-02-10 19:10:23 +01:00
antirez 59e03a8f35 redis-trib: log event after we have reference to 'master'. 2014-02-10 18:48:40 +01:00
antirez bf670e0745 Cluster: don't update slave's master if we don't know it.
There is no way we can update the slave's node->slaveof pointer if we
don't know the master (no node with such an ID in our tables).
2014-02-10 18:33:34 +01:00
antirez a3755ae9ee Cluster: ignore slot config changes if we are importing it. 2014-02-10 18:04:43 +01:00
antirez 6fc53e16ad Cluster: update configEpoch after manually messing with slots. 2014-02-10 18:01:58 +01:00
antirez be0bb19fd3 Cluster: redis-trib, more info about open slots error. 2014-02-10 17:44:16 +01:00
antirez 1a73c992a3 Cluster: fixed inverted arguments in logging function call. 2014-02-10 17:21:10 +01:00
antirez 32563b4a5f Cluster: clear the FAIL status for masters without slots.
Masters without slots don't participate to the cluster but just do
redirections, no need to take them in FAIL state if they are back
reachable.
2014-02-10 17:18:27 +01:00
Matt Stancliff 21648473aa Auto-enter slaveMode when SYNC from redis-cli
If someone asks for SYNC or PSYNC from redis-cli,
automatically enter slaveMode (as if they ran
redis-cli --slave) and continue printing the replication
stream until either they Ctrl-C or the master gets disconnected.
2014-02-10 11:10:31 -05:00
antirez 5b2082ead3 Cluster: replica migration should only work for masters serving slots. 2014-02-10 17:08:37 +01:00
antirez f106a79309 Cluster: redis-trib del-node variable typo fixed. 2014-02-10 16:59:09 +01:00
antirez f885fa8bac Cluster: clusterReadHandler() fixed to work with new message header. 2014-02-10 16:27:37 +01:00
antirez 344a065d51 Cluster: don't propagate PUBLISH two times.
PUBLISH both published messages via Cluster bus and replication when
cluster was enabled, resulting in duplicated message in the slave.
2014-02-10 16:00:27 +01:00
antirez 7bf7b7350c Cluster: signature changed to "RCmb" (Redis Cluster message bus).
Sounds better after all.
2014-02-10 15:55:21 +01:00
antirez dced9c0619 Cluster: discard bus messages with version != 0. 2014-02-10 15:54:22 +01:00
antirez 007e1c7cb2 Cluster: added signature + version in bus packets. 2014-02-10 15:53:09 +01:00
antirez dca95f241c Cluster: redis-trib: options table entry for add-node fixed. 2014-02-10 12:34:21 +01:00
antirez 6df4ffe639 Don't count time to feed MONITORs in SLOWLOG. 2014-02-07 18:29:20 +01:00
antirez 142281dc79 Cluster: keys slot computation now supports hash tags.
Currently this is marginally useful, only to make sure two keys are in
the same hash slot when the cluster is stable (no rehashing in
progress).

In the future it is possible that support will be added to run
mutli-keys operations with keys in the same hash slot.
2014-02-07 17:39:01 +01:00
antirez 2d6eb68993 Sentinel: allow SHUTDOWN command in Sentinel mode. 2014-02-07 11:22:24 +01:00
antirez 970de3e9c0 Check for EAGAIN in sendBulkToSlave().
Sometime an osx master with a Linux server over a slow link caused
a strange error where osx called the writable function for
the socket but actually apparently there was no room in the socket
buffer to accept the write: write(2) call returned an EAGAIN error,
that was not checked, so we considered write(2) == 0 always as a connection
reset, which was unfortunate since the bulk transfer has to start again.

Also more errors are logged with the WARNING level in the same code path
now.
2014-02-05 16:38:10 +01:00
antirez 04fe000bf8 Cluster: fixed MF condition in clusterHandleSlaveFailover().
For manual failover we need a manual failover in progress, and that
mf_can_start is true (master offset received and matched).
2014-02-05 16:01:56 +01:00
antirez c6f02fd67a Cluster: CLUSTER FAILOVER replies with OK and logs the event. 2014-02-05 15:52:38 +01:00
antirez c72449af30 Cluster: check that a MF is in progress in manualFailoverCheckTimeout().
Otherwise it is always detected as a manual failover timed out.
2014-02-05 15:45:24 +01:00
antirez b7402bcad5 Cluster: force AUTH ACK on manual failover.
When a slave requests masters vote for a manual failover, the
REQUEST_AUTH message is flagged in a special way in order to force the
masters to give the authorization even if the master is not marked as
failing.
2014-02-05 13:10:03 +01:00
antirez 4cf0cd5719 Cluster: manual failover initial implementation. 2014-02-05 13:01:24 +01:00
antirez 4919a13f50 CLIENT PAUSE and related API implemented.
The API is one of the bulding blocks of CLUSTER FAILOVER command that
executes a manual failover in Redis Cluster. However exposed as a
command that the user can call directly, it makes much simpler to
upgrade a standalone Redis instance using a slave in a safer way.

The commands works like that:

    CLIENT PAUSE <milliesconds>

All the clients that are not slaves and not in MONITOR state are paused
for the specified number of milliesconds. This means that slaves are
normally served in the meantime.

At the end of the specified amount of time all the clients are unblocked
and will continue operations normally. This command has no effects on
the population of the slow log, since clients are not blocked in the
middle of operations but only when there is to process new data.

Note that while the clients are unblocked, still new commands are
accepted and queued in the client buffer, so clients will likely not
block while writing to the server while the pause is active.
2014-02-04 16:16:09 +01:00
antirez b089ba98cc Scripting: expire keys in scripts only at first access.
Keys expiring in the middle of the execution of Lua scripts are to
create inconsistencies in masters and / or AOF files. See the following
example:

    if redis.call("exists",KEYS[1]) == 1
    then
        redis.call("incr","mycounter")
    end

    if redis.call("exists",KEYS[1]) == 1
    then
        return redis.call("incr","mycounter")
    end

The script executes two times the same *if key exists then incrementcounter*
logic. However the two executions will work differently in the master and
the slaves, provided some unlucky timing happens.

In the master the first time the key may still exist, while the second time
the key may no longer exist. This will result in the key incremented just one
time. However as a side effect the master will generate a synthetic
`DEL` command in the replication channel in order to force the slaves to
expire the key (given that key expiration is master-driven).

When the same script will run in the slave, the key will no longer be
there, so the script will not increment the key.

The key idea used to implement the expire-at-first-lookup semantics was
provided by Marc Gravell.
2014-02-03 16:15:53 +01:00
antirez b770079f2c Allow CONFIG and SHUTDOWN while in stale-slave state. 2014-02-03 15:51:03 +01:00
antirez 89884e8f6e Scripting: use mstime() and mstime_t for lua_time_start.
server.lua_time_start is expressed in milliseconds. Use mstime_t instead
of long long, and populate it with mstime() instead of ustime()/1000.

Functionally identical but more natural.
2014-02-03 15:45:40 +01:00
antirez 7be946fde2 Option "backlog" renamed "tcp-backlog".
This is especially important since we already have a concept of backlog
(the replication backlog).
2014-01-31 14:56:10 +01:00