redis原理之数据淘汰

在 redis 中，允许用户设置最大使用内存大小 server.maxmemory，在内存限定的情况下是很有用的。redis 内存数据集大小上升到一定大小的时候，就会施行数据淘汰策略。redis 提供 6种数据淘汰策略：

volatile-lru：从已设置过期时间的数据集（server.db[i].expires）中挑选最近最少使用的数据淘汰
volatile-ttl：从已设置过期时间的数据集（server.db[i].expires）中挑选将要过期的数据淘汰
volatile-random：从已设置过期时间的数据集（server.db[i].expires）中任意选择数据淘汰
allkeys-lru：从数据集（server.db[i].dict）中挑选最近最少使用的数据淘汰
allkeys-random：从数据集（server.db[i].dict）中任意选择数据淘汰
no-enviction（驱逐）：禁止驱逐数据

redis 确定驱逐某个键值对后，会删除这个数据，并将这个数据变更消息发布到本地（AOF 持久化）和从机（主从连接）。

LRU 数据淘汰机制

在服务器配置中保存了 lru 计数器 server.lrulock，会定时（redis 定时程序 serverCorn()）更新，server.lrulock 的值是根据 server.unixtime 计算出来的。

// redisServer 保存了 lru 计数器
struct redisServer {
    ...
    unsigned lruclock:22;       /* Clock incrementing every minute, for LRU */
    ...
};

另外，从 struct redisObject 中可以发现，每一个 redis 对象都会设置相应的 lru，即最近访问的时间。可以想象的是，每一次访问数据的时候（get不会），会更新 redisObject.lru。

LRU 数据淘汰机制是这样的：在数据集中随机挑选几个键值对，取出其中 lru 最大的键值对淘汰。所以，你会发现，redis 并不是保证取得所有数据集中最近最少使用（LRU）的键值对，而只是随机挑选的几个键值对中的。

// 每一个 redis 对象都保存了 lru
#define REDIS_LRU_CLOCK_MAX ((1<<21)-1) /* Max value of obj->lru */
#define REDIS_LRU_CLOCK_RESOLUTION 10 /* LRU clock resolution in seconds */
typedef struct redisObject {
    // 刚刚好 32 bits

    // 对象的类型，字符串/列表/集合/哈希表
    unsigned type:4;
    // 未使用的两个位
    unsigned notused:2;     /* Not used */
    // 编码的方式，redis 为了节省空间，提供多种方式来保存一个数据
    // 譬如：“123456789” 会被存储为整数 123456789
    unsigned encoding:4;
    unsigned lru:22;        /* lru time (relative to server.lruclock) */

    // 引用数
    int refcount;

    // 数据指针
    void *ptr;
} robj;

// redis 定时执行程序。联想：linux cron
int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData) {
    ......
    /* We have just 22 bits per object for LRU information.
     * So we use an (eventually wrapping) LRU clock with 10 seconds resolution.
     * 2^22 bits with 10 seconds resolution is more or less 1.5 years.
     *
     * Note that even if this will wrap after 1.5 years it's not a problem,
     * everything will still work but just some object will appear younger
     * to Redis. But for this to happen a given object should never be touched
     * for 1.5 years.
     *
     * Note that you can change the resolution altering the
     * REDIS_LRU_CLOCK_RESOLUTION define.
     */
    updateLRUClock();
    ......
}

// 更新服务器的 lru 计数器
void updateLRUClock(void) {
    server.lruclock = (server.unixtime/REDIS_LRU_CLOCK_RESOLUTION) &
                                                REDIS_LRU_CLOCK_MAX;
}

TTL 数据淘汰机制

redis 数据集数据结构中保存了键值对过期时间的表，即 redisDb.expires，在使用 SET 命令的时候，就有一个键值对超时时间的选项。和 LRU 数据淘汰机制类似，TTL 数据淘汰机制是这样的：从过期时间 redisDB.expires 表中随机挑选几个键值对，取出其中 ttl 最大的键值对淘汰。同样你会发现，redis 并不是保证取得所有过期时间的表中最快过期的键值对，而只是随机挑选的几个键值对中的。

无论是什么机制，都是从所有的键值对中挑选合适的淘汰。

在哪里开始淘汰数据

redis 每服务客户端执行一个命令的时候，会检测使用的内存是否超额。如果超额，即进行数据淘汰。

// 执行命令
int processCommand(redisClient *c) {
    ......
    // 内存超额
    /* Handle the maxmemory directive.
    *
    * First we try to free some memory if possible (if there are volatile
    * keys in the dataset). If there are not the only thing we can do
    * is returning an error. */
    if (server.maxmemory) {
        int retval = freeMemoryIfNeeded();
        if ((c->cmd->flags & REDIS_CMD_DENYOOM) && retval == REDIS_ERR) {
            flagTransaction(c);
            addReply(c, shared.oomerr);
            return REDIS_OK;
        }
    }
    ......
}

这是我们之前讲述过的命令处理函数。在处理命令处理函数的过程，会涉及到内存使用量的检测，如果检测到内存使用超额，会触发数据淘汰机制。我们来看看淘汰机制触发的函数 freeMemoryIfNeeded() 里面发生了什么。

// 如果需要，是否一些内存
int freeMemoryIfNeeded(void) {
    size_t mem_used, mem_tofree, mem_freed;
    int slaves = listLength(server.slaves);

    // redis 从机回复空间和 AOF 内存大小不计算入 redis 内存大小
    // 关于已使用内存大小是如何统计的，我们会其他章节讲解，这里先忽略这个细节
    /* Remove the size of slaves output buffers and AOF buffer from the
     * count of used memory. */
    mem_used = zmalloc_used_memory();

    // 从机回复空间大小
    if (slaves) {
        listIter li;
        listNode *ln;

        listRewind(server.slaves,&li);
        while((ln = listNext(&li))) {
            redisClient *slave = listNodeValue(ln);
            unsigned long obuf_bytes = getClientOutputBufferMemoryUsage(slave);
            if (obuf_bytes > mem_used)
                mem_used = 0;
            else
                mem_used -= obuf_bytes;
        }
    }
    // server.aof_buf && server.aof_rewrite_buf_blocks
    if (server.aof_state != REDIS_AOF_OFF) {
        mem_used -= sdslen(server.aof_buf);
        mem_used -= aofRewriteBufferSize();
    }

    // 内存是否超过设置大小
    /* Check if we are over the memory limit. */
    if (mem_used <= server.maxmemory) return REDIS_OK;

    // redis 中可以设置内存超额策略
    if (server.maxmemory_policy == REDIS_MAXMEMORY_NO_EVICTION)
        return REDIS_ERR; /* We need to free memory, but policy forbids. */

    /* Compute how much memory we need to free. */
    mem_tofree = mem_used - server.maxmemory;
    mem_freed = 0;
    while (mem_freed < mem_tofree) {
        int j, k, keys_freed = 0;

        // 遍历所有数据集
        for (j = 0; j < server.dbnum; j++) {
            long bestval = 0; /* just to prevent warning */
            sds bestkey = NULL;
            struct dictEntry *de;
            redisDb *db = server.db+j;
            dict *dict;

            // 不同的策略，选择的数据集不一样
            if (server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_LRU ||
                server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_RANDOM)
            {
                dict = server.db[j].dict;
            } else {
                dict = server.db[j].expires;
            }

            // 数据集为空，继续下一个数据集
            if (dictSize(dict) == 0) continue;

            // 随机淘汰随机策略：随机挑选
            /* volatile-random and allkeys-random policy */
            if (server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_RANDOM ||
                server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_RANDOM)
            {
                de = dictGetRandomKey(dict);
                bestkey = dictGetKey(de);
            }

            // LRU 策略：挑选最近最少使用的数据
            /* volatile-lru and allkeys-lru policy */
            else if (server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_LRU ||
                server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_LRU)
            {
                // server.maxmemory_samples 为随机挑选键值对次数
                // 随机挑选 server.maxmemory_samples个键值对，驱逐最近最少使用的数据
                for (k = 0; k < server.maxmemory_samples; k++) {
                    sds thiskey;
                    long thisval;
                    robj *o;

                    // 随机挑选键值对
                    de = dictGetRandomKey(dict);

                    // 获取键
                    thiskey = dictGetKey(de);

                    /* When policy is volatile-lru we need an additional lookup
                     * to locate the real key, as dict is set to db->expires. */
                    if (server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_LRU)
                        de = dictFind(db->dict, thiskey);
                    o = dictGetVal(de);

                    // 计算数据的空闲时间
                    thisval = estimateObjectIdleTime(o);

                    // 当前键值空闲时间更长，则记录
                    /* Higher idle time is better candidate for deletion */
                    if (bestkey == NULL || thisval > bestval) {
                        bestkey = thiskey;
                        bestval = thisval;
                    }
                }
            }

            // TTL 策略：挑选将要过期的数据
            /* volatile-ttl */
            else if (server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_TTL) {
                // server.maxmemory_samples 为随机挑选键值对次数
                // 随机挑选 server.maxmemory_samples个键值对，驱逐最快要过期的数据
                for (k = 0; k < server.maxmemory_samples; k++) {
                    sds thiskey;
                    long thisval;

                    de = dictGetRandomKey(dict);
                    thiskey = dictGetKey(de);
                    thisval = (long) dictGetVal(de);

                    /* Expire sooner (minor expire unix timestamp) is better
                     * candidate for deletion */
                    if (bestkey == NULL || thisval < bestval) {
                        bestkey = thiskey;
                        bestval = thisval;
                    }
                }
            }

            // 删除选定的键值对
            /* Finally remove the selected key. */
            if (bestkey) {
                long long delta;

                robj *keyobj = createStringObject(bestkey,sdslen(bestkey));

                // 发布数据更新消息，主要是 AOF 持久化和从机
                propagateExpire(db,keyobj);

                // 注意， propagateExpire() 可能会导致内存的分配， 
                // propagateExpire() 提前执行就是因为 redis 只计算 
                // dbDelete() 释放的内存大小。倘若同时计算 dbDelete() 
                // 释放的内存和 propagateExpire() 分配空间的大小，与此
                // 同时假设分配空间大于释放空间，就有可能永远退不出这个循环。
                // 下面的代码会同时计算 dbDelete() 释放的内存和 propagateExpire() 分配空间的大小：
                // propagateExpire(db,keyobj);
                // delta = (long long) zmalloc_used_memory();
                // dbDelete(db,keyobj);
                // delta -= (long long) zmalloc_used_memory();
                // mem_freed += delta;
                /////////////////////////////////////////

                /* We compute the amount of memory freed by dbDelete() alone.
                 * It is possible that actually the memory needed to propagate
                 * the DEL in AOF and replication link is greater than the one
                 * we are freeing removing the key, but we can't account for
                 * that otherwise we would never exit the loop.
                 *
                 * AOF and Output buffer memory will be freed eventually so
                 * we only care about memory used by the key space. */
                // 只计算 dbDelete() 释放内存的大小
                delta = (long long) zmalloc_used_memory();
                dbDelete(db,keyobj);
                delta -= (long long) zmalloc_used_memory();
                mem_freed += delta;

                server.stat_evictedkeys++;

                // 将数据的删除通知所有的订阅客户端
                notifyKeyspaceEvent(REDIS_NOTIFY_EVICTED, "evicted",
                    keyobj, db->id);
                decrRefCount(keyobj);
                keys_freed++;

                // 将从机回复空间中的数据及时发送给从机
                /* When the memory to free starts to be big enough, we may
                 * start spending so much time here that is impossible to
                 * deliver data to the slaves fast enough, so we force the
                 * transmission here inside the loop. */
                if (slaves) flushSlavesOutputBuffers();
            }
        }

        // 未能释放空间，且此时 redis 使用的内存大小依旧超额，失败返回
        if (!keys_freed) return REDIS_ERR; /* nothing to free... */
    }
    return REDIS_OK;
}

我们再回过来想想为什么需要内存回收？
1、在Redis中，set指令可以指定key的过期时间，当过期时间到达以后，key就失效了；

2、Redis是基于内存操作的，所有的数据都是保存在内存中，一台机器的内存是有限且很宝贵的。

基于以上两点，为了保证Redis能继续提供可靠的服务，Redis需要一种机制清理掉不常用的、无效的、多余的数据，失效后的数据需要及时清理，这就需要内存回收了。

Redis的内存回收机制

Redis的内存回收主要分为过期删除策略和内存淘汰策略两部分。

过期删除策略
删除达到过期时间的key。

1、定时删除
对于每一个设置了过期时间的key都会创建一个定时器，一旦到达过期时间就立即删除。该策略可以立即清除过期的数据，对内存较友好，但是缺点是占用了大量的CPU资源去处理过期的数据，会影响Redis的吞吐量和响应时间。
2、惰性删除

当访问一个key时，才判断该key是否过期，过期则删除。该策略能最大限度地节省CPU资源，但是对内存却十分不友好。有一种极端的情况是可能出现大量的过期key没有被再次访问，因此不会被清除，导致占用了大量的内存。

3、定期删除
每隔一段时间，扫描Redis中过期key字典，并清除部分过期的key。该策略是前两者的一个折中方案，还可以通过调整定时扫描的时间间隔和每次扫描的限定耗时，在不同情况下使得CPU和内存资源达到最优的平衡效果。

在Redis中，同时使用了定期删除和惰性删除。

过期删除策略原理
在正式介绍过期删除策略原理之前，先给大家介绍一点可能会用到的相关Redis基础知识。

redisDb结构体定义

我们知道，Redis是一个键值对数据库，对于每一个redis数据库，redis使用一个redisDb的结构体来保存，它的结构如下：

typedef struct redisDb {
        dict *dict;                 /* 数据库的键空间，保存数据库中的所有键值对 */
        dict *expires;              /* 保存所有过期的键 */
        dict *blocking_keys;        /* Keys with clients waiting for data (BLPOP)*/
        dict *ready_keys;           /* Blocked keys that received a PUSH */
        dict *watched_keys;         /* WATCHED keys for MULTI/EXEC CAS */
        int id;                     /* 数据库ID字段，代表不同的数据库 */
        long long avg_ttl;          /* Average TTL, just for stats */
} redisDb;

从结构定义中我们可以发现，对于每一个Redis数据库，都会使用一个字典的数据结构来保存每一个键值对，dict的结构图如下：

image.png

以上就是过期策略实现时用到比较核心的数据结构。程序=数据结构+算法，介绍完数据结构以后，接下来继续看看处理的算法是怎样的。

expires属性
redisDb定义的第二个属性是expires，它的类型也是字典，Redis会把所有过期的键值对加入到expires，之后再通过定期删除来清理expires里面的值。加入expires的场景有：

1、set指定过期时间expire
如果设置key的时候指定了过期时间，Redis会将这个key直接加入到expires字典中，并将超时时间设置到该字典元素。

2、调用expire命令
显式指定某个key的过期时间

3、恢复或修改数据
从Redis持久化文件中恢复文件或者修改key，如果数据中的key已经设置了过期时间，就将这个key加入到expires字典中

以上这些操作都会将过期的key保存到expires。redis会定期从expires字典中清理过期的key。

Redis清理过期key的时机

1、Redis在启动的时候，会注册两种事件，一种是时间事件，另一种是文件事件。时间事件主要是Redis处理后台操作的一类事件，比如客户端超时、删除过期key；文件事件是处理请求。

在时间事件中，redis注册的回调函数是serverCron，在定时任务回调函数中，通过调用databasesCron清理部分过期key。（这是定期删除的实现。）

int serverCron(struct aeEventLoop *eventLoop, long long id, void *clientData)
{
    …
    /* Handle background operations on Redis databases. */
    databasesCron();
    ...
}

2、每次访问key的时候，都会调用expireIfNeeded函数判断key是否过期，如果是，清理key。（这是惰性删除的实现。）

robj *lookupKeyRead(redisDb *db, robj *key) {
    robj *val;
    expireIfNeeded(db,key);
    val = lookupKey(db,key);
     ...
    return val;
}

3、每次事件循环执行时，主动清理部分过期key。

void aeMain(aeEventLoop *eventLoop) {
    eventLoop->stop = 0;
    while (!eventLoop->stop) {
        if (eventLoop->beforesleep != NULL)
            eventLoop->beforesleep(eventLoop);
        aeProcessEvents(eventLoop, AE_ALL_EVENTS);
    }
}

void beforeSleep(struct aeEventLoop *eventLoop) {
       ...
       /* Run a fast expire cycle (the called function will return
        - ASAP if a fast cycle is not needed). */
       if (server.active_expire_enabled && server.masterhost == NULL)
           activeExpireCycle(ACTIVE_EXPIRE_CYCLE_FAST);
       ...
   }

过期策略的实现

我们知道，Redis是以单线程运行的，在清理key是不能占用过多的时间和CPU，需要在尽量不影响正常的服务情况下，进行过期key的清理。过期清理的算法如下：

1、server.hz配置了serverCron任务的执行周期，默认是10，即CPU空闲时每秒执行十次。
2、每次清理过期key的时间不能超过CPU时间的25%：timelimit = 1000000*ACTIVE_EXPIRE_CYCLE_SLOW_TIME_PERC/server.hz/100;
    比如，如果hz=1，一次清理的最大时间为250ms，hz=10，一次清理的最大时间为25ms。
3、如果是快速清理模式（在beforeSleep函数调用），则一次清理的最大时间是1ms。
4、依次遍历所有的DB。
5、从db的过期列表中随机取20个key，判断是否过期，如果过期，则清理。
6、如果有5个以上的key过期，则重复步骤5，否则继续处理下一个db
7、在清理过程中，如果达到CPU的25%时间，退出清理过程。

从实现的算法中可以看出，这只是基于概率的简单算法，且是随机的抽取，因此是无法删除所有的过期key，通过调高hz参数可以提升清理的频率，过期key可以更及时的被删除，但hz太高会增加CPU时间的消耗。

删除key

Redis4.0以前，删除指令是del，del会直接释放对象的内存，大部分情况下，这个指令非常快，没有任何延迟的感觉。但是，如果删除的key是一个非常大的对象，比如一个包含了千万元素的hash，那么删除操作就会导致单线程卡顿，Redis的响应就慢了。为了解决这个问题，在Redis4.0版本引入了unlink指令，能对删除操作进行“懒”处理，将删除操作丢给后台线程，由后台线程来异步回收内存。
实际上，在判断key需要过期之后，真正删除key的过程是先广播expire事件到从库和AOF文件中，然后在根据redis的配置决定立即删除还是异步删除。
如果是立即删除，Redis会立即释放key和value占用的内存空间，否则，Redis会在另一个bio线程中释放需要延迟删除的空间。

小结

总的来说，Redis的过期删除策略是在启动时注册了serverCron函数，每一个时间时钟周期，都会抽取expires字典中的部分key进行清理，从而实现定期删除。另外，Redis会在访问key时判断key是否过期，如果过期了，就删除，以及每一次Redis访问事件到来时，beforeSleep都会调用activeExpireCycle函数，在1ms时间内主动清理部分key，这是惰性删除的实现

Redis结合了定期删除和惰性删除，基本上能很好的处理过期数据的清理，但是实际上还是有点问题的，如果过期key较多，定期删除漏掉了一部分，而且也没有及时去查，即没有走惰性删除，那么就会有大量的过期key堆积在内存中，导致redis内存耗尽，当内存耗尽之后，有新的key到来会发生什么事呢？是直接抛弃还是其他措施呢？有什么办法可以接受更多的key？

内存淘汰策略
Redis的内存淘汰策略，是指内存达到maxmemory极限时，使用某种算法来决定清理掉哪些数据，以保证新数据的存入。

Redis的内存淘汰机制

noeviction: 当内存不足以容纳新写入数据时，新写入操作会报错。
allkeys-lru：当内存不足以容纳新写入数据时，在键空间（server.db[i].dict）中，移除最近最少使用的 key（这个是最常用的）。
allkeys-random：当内存不足以容纳新写入数据时，在键空间（server.db[i].dict）中，随机移除某个 key。
volatile-lru：当内存不足以容纳新写入数据时，在设置了过期时间的键空间（server.db[i].expires）中，移除最近最少使用的 key。
volatile-random：当内存不足以容纳新写入数据时，在设置了过期时间的键空间（server.db[i].expires）中，随机移除某个 key。
volatile-ttl：当内存不足以容纳新写入数据时，在设置了过期时间的键空间（server.db[i].expires）中，有更早过期时间的 key 优先移除。

在配置文件中，通过maxmemory-policy可以配置要使用哪一个淘汰机制。

什么时候会进行淘汰？

Redis会在每一次处理命令的时候（processCommand函数调用freeMemoryIfNeeded）判断当前redis是否达到了内存的最大限制，如果达到限制，则使用对应的算法去处理需要删除的key。伪代码如下：

int processCommand(client *c)
{
    ...
    if (server.maxmemory) {
        int retval = freeMemoryIfNeeded();  
    }
    ...
}

LRU实现原理
在淘汰key时，Redis默认最常用的是LRU算法（Latest Recently Used）。Redis通过在每一个redisObject保存lru属性来保存key最近的访问时间，在实现LRU算法时直接读取key的lru属性。
具体实现时，Redis遍历每一个db，从每一个db中随机抽取一批样本key，默认是3个key，再从这3个key中，删除最近最少使用的key。实现伪代码如下：

keys = getSomeKeys(dict, sample)
key = findSmallestIdle(keys)
remove(key)

3这个数字是配置文件中的maxmeory-samples字段，也是可以可以设置采样的大小，如果设置为10，那么效果会更好，不过也会耗费更多的CPU资源。

总结

Redis对于内存的回收有两种方式，一种是过期key的回收，另一种是超过redis的最大内存后的内存释放。

对于第一种情况，Redis会在：

每一次访问的时候判断key的过期时间是否到达，如果到达，就删除key
2、redis启动时会创建一个定时事件，会定期清理部分过期的key，默认是每秒执行十次检查，每次过期key清理的时间不超过CPU时间的25%，即若hz=1，则一次清理时间最大为250ms，若hz=10，则一次清理时间最大为25ms。

对于第二种情况，redis会在每次处理redis命令的时候判断当前redis是否达到了内存的最大限制，如果达到限制，则使用对应的算法去处理需要删除的key。

Redis近似LRU算法

Redis作为目前最流行的KV内存数据库，也实现了自己的LRU（Latest Recently Used）算法，在内存写满的时候，依据其进行数据的淘汰。LRU算法本身的含义，这里不做赘述，严格的LRU算法，会优先选择淘汰最久没有访问的数据，这种实现也比较简单，通常是用一个双向链表+一个哈希表来实现O(1)的淘汰和更新操作。但是，Redis为了节省内存使用，和通常的LRU算法实现不太一样，Redis使用了采样的方法来模拟一个近似LRU算法。

下面先给一个图来直观的感受一下Redis的近似LRU算法和严格LRU算法的差异，

image.png

图中深灰色和浅灰色的点表示的key数量正好可以写满内存，绿色的点表示刚写入的key，浅灰色的点表示被淘汰的key，深灰色的点表示剩余的没有被淘汰的key。

在严格LRU算法下，图的左上部分，最先写入的一半的key，被顺序淘汰掉了，但是在Redis的近似LRU算法下，图的左下部分，可能出现很早之前写入的key，并没有被淘汰掉，写入时间更晚的key反而被淘汰了，但是也没有出现比较极端的刚刚写入不久的key就被淘汰的情况。

根据Redis作者的说法，如果访问Redis的模式呈现幂律分布，即通常说的二八分布，Redis 2.8的近似LRU算法和严格LRU算法差异不大，下面我们就来看看这个近似LRU算法是怎么实现的。

Redis LRU算法实现

Redis 2.8.19中使用了一个全局的LRU时钟，server.lruclock，定义如下，

#define REDIS_LRU_BITS 24
unsigned lruclock:REDIS_LRU_BITS; /* Clock for LRU eviction */

默认的LRU时钟的分辨率是1秒，可以通过改变REDIS_LRU_CLOCK_RESOLUTION宏的值来改变，Redis会在serverCron()中调用updateLRUClock定期的更新LRU时钟，更新的频率和hz参数有关，默认为100ms一次，如下，

#define REDIS_LRU_CLOCK_MAX ((1<<REDIS_LRU_BITS)-1) /* Max value of obj->lru */
#define REDIS_LRU_CLOCK_RESOLUTION 1 /* LRU clock resolution in seconds */

void updateLRUClock(void) {
    server.lruclock = (server.unixtime/REDIS_LRU_CLOCK_RESOLUTION) &
                                                REDIS_LRU_CLOCK_MAX;
}

server.unixtime是系统当前的unix时间戳，当lruclock的值超出REDIS_LRU_CLOCK_MAX时，会从头开始计算，所以在计算一个key的最长没有访问时间时，可能key本身保存的lru访问时间会比当前的lrulock还要大，这个时候需要计算额外时间，如下，

/* Given an object returns the min number of seconds the object was never
 * requested, using an approximated LRU algorithm. */
unsigned long estimateObjectIdleTime(robj *o) {
    if (server.lruclock >= o->lru) {
        return (server.lruclock - o->lru) * REDIS_LRU_CLOCK_RESOLUTION;
    } else {
        return ((REDIS_LRU_CLOCK_MAX - o->lru) + server.lruclock) *
                    REDIS_LRU_CLOCK_RESOLUTION;
    }
}

这样计算会不会有问题呢？还是有的，即某个key就是很久很久没有访问，lruclock从头开始后，又超过了该key保存的lru访问时间，这个时间是多久呢，在现有的lru时钟1秒分辨率下，24bit可以表示的最长时间大约是194天，所以一个key如果连续194天没有访问了，Redis计算该key的idle时间是有误的，但是这种情况应该非常罕见。

Redis支持的淘汰策略比较多，这里只涉及和LRU相关的，

volatile-lru 设置了过期时间的key参与近似的lru淘汰策略
allkeys-lru 所有的key均参与近似的lru淘汰策略

当进行LRU淘汰时，Redis按如下方式进行的，

  ......
            /* volatile-lru and allkeys-lru policy */
            else if (server.maxmemory_policy == REDIS_MAXMEMORY_ALLKEYS_LRU ||
                server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_LRU)
            {
                for (k = 0; k < server.maxmemory_samples; k++) {
                    sds thiskey;
                    long thisval;
                    robj *o;

                    de = dictGetRandomKey(dict);
                    thiskey = dictGetKey(de);
                    /* When policy is volatile-lru we need an additional lookup
                     * to locate the real key, as dict is set to db->expires. */
                    if (server.maxmemory_policy == REDIS_MAXMEMORY_VOLATILE_LRU)
                        de = dictFind(db->dict, thiskey);
                    o = dictGetVal(de);
                    thisval = estimateObjectIdleTime(o);

                    /* Higher idle time is better candidate for deletion */
                    if (bestkey == NULL || thisval > bestval) {
                        bestkey = thiskey;
                        bestval = thisval;
                    }
                }
            }
            ......

Redis会基于server.maxmemory_samples配置选取固定数目的key，然后比较它们的lru访问时间，然后淘汰最近最久没有访问的key，maxmemory_samples的值越大，Redis的近似LRU算法就越接近于严格LRU算法，但是相应消耗也变高，对性能有一定影响，样本值默认为5。

每个key的lru访问时间更新比较简单，但是有一点值得注意，为了避免fork子进程后额外的内存消耗，当进行bgsave或aof rewrite时，lru访问时间是不更新的。

robj *lookupKey(redisDb *db, robj *key) {
    dictEntry *de = dictFind(db->dict,key->ptr);
    if (de) {
        robj *val = dictGetVal(de);

        /* Update the access time for the ageing algorithm.
         * Don't do it if we have a saving child, as this will trigger
         * a copy on write madness. */
        if (server.rdb_child_pid == -1 && server.aof_child_pid == -1)
            val->lru = server.lruclock;
        return val;
    } else {
        return NULL;
    }
}

如果采用双向链表+hash表的方式来实现严格的LRU算法，初步估计每个key要增加额外32个字节左右的内存消耗，当key数量比较多时，还是会带来相当可观的内存消耗，Redis使用近似的LRU算法，每个key只需额外24bit的内存空间，节省还是相当的大的。下面我们介绍redis 3.x中对近似LRU算法的优化，使用尽量少的内存，使Redis的LRU算法更接近于严格LRU。

Redis近似LRU算法优化

上面我们看到了在Redis 2.8.19中LRU算法的具体实现，Redis使用了24 bit的lru时间戳来模拟一个近似的LRU算法，节省了实现一个严格LRU算法所需要的大量内存空间。

在Redis 3.0中对近似LRU算法进行了优化，既提升了算法的性能也提升了模拟效果。

Redis 3.0 LRU算法优化实现

Redis 3.0中主要做了如下优化：

LRU时钟的粒度从秒级提升为毫秒级
使用新的API来获取LRU替换时的采样样本
默认的LRU采样样本数从3提升为5
使用eviction pool来选取需要淘汰的key

提升LRU时钟的粒度，主要是为了在测试LRU算法性能时，能够在更短的时间内获取结果，更新LRU时钟的方法也有所变化，如果LRU时钟的时间粒度高于serverCron刷新的时间粒度，那么就主动获取最新的时间，否则使用server缓存的时间，

/* Macro used to obtain the current LRU clock.
 * If the current resolution is lower than the frequency we refresh the
 * LRU clock (as it should be in production servers) we return the
 * precomputed value, otherwise we need to resort to a system call. */
#define LRU_CLOCK() ((1000/server.hz <= LRU_CLOCK_RESOLUTION) ? server.lruclock : getLRUClock())

unsigned int getLRUClock(void) {
    return (mstime()/LRU_CLOCK_RESOLUTION) & LRU_CLOCK_MAX;
}

在Redis 2.8中每次选取淘汰样本时，都是调用dictGetRandomKey来随机获取一个key，会根据maxmemory-samples配置的大小，多次调用。这个流程在Redis 3.0中被优化为一次调用获取指定数量的key，且不需要每次都调用随机函数，如下，

unsigned int dictGetSomeKeys(dict *d, dictEntry **des, unsigned int count) {
    unsigned long j; /* internal hash table id, 0 or 1. */
    unsigned long tables; /* 1 or 2 tables? */
    unsigned long stored = 0, maxsizemask;
    unsigned long maxsteps;

    if (dictSize(d) < count) count = dictSize(d);
    maxsteps = count*10;

    /* Try to do a rehashing work proportional to 'count'. */
    for (j = 0; j < count; j++) {
        if (dictIsRehashing(d))
            _dictRehashStep(d);
        else
            break;
    }

    tables = dictIsRehashing(d) ? 2 : 1;
    maxsizemask = d->ht[0].sizemask;
    if (tables > 1 && maxsizemask < d->ht[1].sizemask)
        maxsizemask = d->ht[1].sizemask;

    /* Pick a random point inside the larger table. */
    unsigned long i = random() & maxsizemask;
    unsigned long emptylen = 0; /* Continuous empty entries so far. */
    while(stored < count && maxsteps--) {
        for (j = 0; j < tables; j++) {
            if (tables == 2 && j == 0 && i < (unsigned long) d->rehashidx) {
                if (i >= d->ht[1].size) i = d->rehashidx;
                continue;
            }
            if (i >= d->ht[j].size) continue; /* Out of range for this table. */
            dictEntry *he = d->ht[j].table[i];

            /* Count contiguous empty buckets, and jump to other
             * locations if they reach 'count' (with a minimum of 5). */
            if (he == NULL) {
                emptylen++;
                if (emptylen >= 5 && emptylen > count) {
                    i = random() & maxsizemask;
                    emptylen = 0;
                }
            } else {
                emptylen = 0;
                while (he) {
                    /* Collect all the elements of the buckets found non
                     * empty while iterating. */
                    *des = he;
                    des++;
                    he = he->next;
                    stored++;
                    if (stored == count) return stored;
                }
            }
        }
        i = (i+1) & maxsizemask;
    }
    return stored;
}

dictGetSomeKeys会随机从db的某个起始位置开始，连续获取指定数量的key，需要注意的是，如果db对应的字典正在做rehash，可能需要从两个hashtable来获取key。如果需要根据某种分布来随机获取字典里面的key，这种采样方式可能是不合适的，但是如果只是为了随机获取一系列key来作为LRU算法的淘汰样本，这种方式是可行的。

采样性能的提升带来的好处就是，我们可以在不牺牲淘汰算法性能的情况下，提高采样的样本数，让Redis的近似LRU算法更接近于严格LRU算法，所以目前Redis把超过maxmemory后默认的采样样本数从3个提升到5个。

最后一个也是最重要的改进是，选取要淘汰key的流程。之前是每次随机选取maxmemory-samples个key，然后比较它们的idle时间，idle时间最久的key会被淘汰掉。在Redis 3.0中增加了一个eviction pool的结构，eviction pool是一个数组，保存了之前随机选取的key及它们的idle时间，数组里面的key按idle时间升序排序，当内存满了需要淘汰数据时，会调用dictGetSomeKeys选取指定的数目的key，然后更新到eviction pool里面，如果新选取的key的idle时间比eviction pool里面idle时间最小的key还要小，那么就不会把它插入到eviction pool里面，这个思路和LIRS替换算法利用的每个块的历史信息思想有些类似，

eviction pool更新逻辑代码如下，

#define EVICTION_SAMPLES_ARRAY_SIZE 16
void evictionPoolPopulate(dict *sampledict, dict *keydict, struct evictionPoolEntry *pool) {
    int j, k, count;
    dictEntry *_samples[EVICTION_SAMPLES_ARRAY_SIZE];
    dictEntry **samples;

    /* Try to use a static buffer: this function is a big hit...
     * Note: it was actually measured that this helps. */
    if (server.maxmemory_samples <= EVICTION_SAMPLES_ARRAY_SIZE) {
        samples = _samples;
    } else {
        samples = zmalloc(sizeof(samples[0])*server.maxmemory_samples);
    }

    count = dictGetSomeKeys(sampledict,samples,server.maxmemory_samples);
    for (j = 0; j < count; j++) {
        unsigned long long idle;
        sds key;
        robj *o;
        dictEntry *de;

        de = samples[j];
        key = dictGetKey(de);
        /* If the dictionary we are sampling from is not the main
         * dictionary (but the expires one) we need to lookup the key
         * again in the key dictionary to obtain the value object. */
        if (sampledict != keydict) de = dictFind(keydict, key);
        o = dictGetVal(de);
        idle = estimateObjectIdleTime(o);

        /* Insert the element inside the pool.
         * First, find the first empty bucket or the first populated
         * bucket that has an idle time smaller than our idle time. */
        k = 0;
        while (k < MAXMEMORY_EVICTION_POOL_SIZE &&
               pool[k].key &&
               pool[k].idle < idle) k++;
        if (k == 0 && pool[MAXMEMORY_EVICTION_POOL_SIZE-1].key != NULL) {
            /* Can't insert if the element is < the worst element we have
             * and there are no empty buckets. */
            continue;
        } else if (k < MAXMEMORY_EVICTION_POOL_SIZE && pool[k].key == NULL) {
            /* Inserting into empty position. No setup needed before insert. */
        } else {
            /* Inserting in the middle. Now k points to the first element
             * greater than the element to insert.  */
            if (pool[MAXMEMORY_EVICTION_POOL_SIZE-1].key == NULL) {
                /* Free space on the right? Insert at k shifting
                 * all the elements from k to end to the right. */
                memmove(pool+k+1,pool+k,
                    sizeof(pool[0])*(MAXMEMORY_EVICTION_POOL_SIZE-k-1));
            } else {
                /* No free space on right? Insert at k-1 */
                k--;
                /* Shift all elements on the left of k (included) to the
                 * left, so we discard the element with smaller idle time. */
                sdsfree(pool[0].key);
                memmove(pool,pool+1,sizeof(pool[0])*k);
            }
        }
        pool[k].key = sdsdup(key);
        pool[k].idle = idle;
    }
    if (samples != _samples) zfree(samples);
}

当选取的淘汰策略和LRU相关时（allkeys-lru或volatile-lru），freeMemoryIfNeeded会调用evictionPoolPopulate来更新eviction pool，然后淘汰掉eviction pool里面的最后一个元素所对应的key，这样的选取淘汰key的方式的好处是：假设说新随机选取的key的访问时间可能比历史随机选取的key的访问时间还要新，但是在Redis 2.8中，新选取的key会被淘汰掉，这和LRU算法利用的访问局部性原理是相违背的，在Redis 3.0中，这种情况被避免了。

此外，如果某个历史选取的key的idle时间相对来说比较久，但是本次淘汰并没有被选中，因为出现了idle时间更久的key，那么在使用eviction pool的情况下，这种idle时间比较久的key淘汰概率增大了，因为它在eviction pool里面被保存下来，参与下轮淘汰，这个思路和访问局部性原理是契合的。

image.png

我们可以看到在前面1/2需要淘汰的key里面（浅灰色的点），Redis 3.0残留下来的key明显比Redis 2.8少了很多，而且后面新插入的1/2的key里面（绿色的点），Redis 3.0没有一个淘汰的key。

小结

Redis 3.0中对于LRU替换算法的优化，在只维护一个eviction pool带来的少量开销情况下，对算法效率的提升是比较明显的，效率的提升带来的是访问命中率的提升。同时，在目前3.4的unstable版本中我们也可以看见Redis计划实现LFU算法以支持更丰富的业务场景。