LRU缓存算法

一、什么是缓存

这里说的缓存是一种广义的概念，在计算机存储层次结构中，低一层的存储器都可以看做是高一层的缓存。比如Cache是内存的缓存，内存是硬盘的缓存，硬盘是网络的缓存等等。

缓存可以有效地解决存储器性能与容量的这对矛盾，但绝非看上去那么简单。如果缓存算法设计不当，非但不能提高访问速度，反而会使系统变得更慢。

从本质上来说，缓存之所以有效是因为程序和数据的局部性（locality）。程序会按固定的顺序执行，数据会存放在连续的内存空间并反复读写。这些特点使得我们可以缓存那些经常用到的数据，从而提高读写速度。

缓存的大小是固定的，它应该只保存最常被访问的那些数据。然而未来不可预知，我们只能从过去的访问序列做预测，于是就有了各种各样的缓存替换策略。本文介绍一种简单的缓存策略，称为最近最少使用（LRU，Least Recently Used）算法。

二、LRU的实现

我们以内存访问为例解释缓存的工作原理。假设缓存的大小固定，初始状态为空。每发生一次读内存操作，首先查找待读取的数据是否存在于缓存中，若是，则缓存命中，返回数据；若否，则缓存未命中，从内存中读取数据，并把该数据添加到缓存中。向缓存添加数据时，如果缓存已满，则需要删除访问时间最早的那条数据，这种更新缓存的方法就叫做LRU。

实现LRU时，我们需要关注它的读性能和写性能，理想的LRU应该可以在O(1)的时间内读取一条数据或更新一条数据，也就是说读写的时间复杂度都是O(1)。

此时很容易想到使用HashMap，根据数据的键访问数据可以达到O(1)的速度。但是更新缓存的速度却无法达到O(1)，因为需要确定哪一条数据的访问时间最早，这需要遍历所有缓存才能找到。

因此，我们需要一种既按访问时间排序，又能在常数时间内随机访问的数据结构。

这可以通过HashMap+双向链表实现。HashMap保证通过key访问数据的时间为O(1)，双向链表则按照访问时间的顺序依次穿过每个数据。之所以选择双向链表而不是单链表，是为了可以从中间任意结点修改链表结构，而不必从头结点开始遍历。

如下图所示，黑色部分为HashMap的结构，红色箭头则是双向链表的正向连接（逆向连接未画出）。可以清晰地看到，数据的访问顺序是1->3->5->6->10。我们只需要在每次访问过后改变链表的连接顺序即可。

HashMap+双向链表

实现代码如下：

/**
 * @author wjg
 * 
 * LRU（Least Recently Used）缓存算法
 * 使用HashMap+双向链表，使get和put的时间复杂度达到O(1)。
 * 读缓存时从HashMap中查找key，更新缓存时同时更新HashMap和双向链表，双向链表始终按照访问顺序排列。
 *
 */
public class LRUCache {

    /**
     * @param args
     * 测试程序，访问顺序为[[1,1],[2,2],[1],[3,3],[2],[4,4],[1],[3],[4]]，其中成对的数调用put，单个数调用get。
     * get的结果为[1],[-1],[-1],[3],[4]，-1表示缓存未命中，其它数字表示命中。
     */
    public static void main(String[] args) {
        
        LRUCache cache = new LRUCache(2);
        cache.put(1, 1);
        cache.put(2, 2);
        System.out.println(cache.get(1));
        cache.put(3, 3);
        System.out.println(cache.get(2));
        cache.put(4, 4);
        System.out.println(cache.get(1));
        System.out.println(cache.get(3));
        System.out.println(cache.get(4));

    }
    
    // 缓存容量
    private final int capacity;
    // 用于加速缓存项随机访问性能的HashMap
    private HashMap<Integer, Entry> map;
    // 双向链表头结点，该侧的缓存项访问时间较早
    private Entry head;
    // 双向链表尾结点，该侧的缓存项访问时间较新
    private Entry tail;

    public LRUCache(int capacity) {
        this.capacity = capacity;
        map = new HashMap<Integer, Entry>((int)(capacity / 0.75 + 1), 0.75f);
        head = new Entry(0, 0);
        tail = new Entry(0, 0);
        head.next = tail;
        tail.prev = head;
    }
    
    /**
     * 从缓存中获取key对应的值，若未命中则返回-1
     * @param key 键
     * @return key对应的值，若未命中则返回-1
     */
    public int get(int key) {
        if (map.containsKey(key)) {
            Entry entry = map.get(key);
            popToTail(entry);
            return entry.value;
        }
        return -1;
    }
    
    /**
     * 向缓存中插入或更新值
     * @param key 待更新的键
     * @param value 待更新的值
     */
    public void put(int key, int value) {
        if (map.containsKey(key)) {
            Entry entry = map.get(key);
            entry.value = value;
            popToTail(entry);
        }
        else {
            Entry newEntry = new Entry(key, value);
            if (map.size() >= capacity) {
                Entry first = removeFirst();
                map.remove(first.key);
            }
            addToTail(newEntry);
            map.put(key, newEntry);
        }
    }
    
    /**
     * 缓存项的包装类，包含键、值、前驱结点、后继结点
     * @author wjg
     *
     */
    class Entry {
        int key;
        int value;
        Entry prev;
        Entry next;
        
        Entry(int key, int value) {
            this.key = key;
            this.value = value;
        }
    }
    
    // 将entry结点移动到链表末端
    private void popToTail(Entry entry) {
        Entry prev = entry.prev;
        Entry next = entry.next;
        prev.next = next;
        next.prev = prev;
        Entry last = tail.prev;
        last.next = entry;
        tail.prev = entry;
        entry.prev = last;
        entry.next = tail;
    }
    
    // 移除链表首端的结点
    private Entry removeFirst() {
        Entry first = head.next;
        Entry second = first.next;
        head.next = second;
        second.prev = head;
        return first;
    }
    
    // 添加entry结点到链表末端
    private void addToTail(Entry entry) {
        Entry last = tail.prev;
        last.next = entry;
        tail.prev = entry;
        entry.prev = last;
        entry.next = tail;
    }

}

每个方法和成员变量前都有中文注释，不必过多解释。

值得一提的是，Java API中其实已经有数据类型提供了我们需要的功能，就是LinkedHashMap这个类。该类内部也是采用HashMap+双向链表实现的。使用这个类实现LRU就简练多了。

/**
 * 
 * 一个更简单实用的LRUCache方案，使用LinkedHashMap即可实现。
 * LinkedHashMap提供了按照访问顺序排序的方案，内部也是使用HashMap+双向链表。
 * 只需要重写removeEldestEntry方法，当该方法返回true时，LinkedHashMap会删除最旧的结点。
 * 
 * @author wjg
 *
 */
public class LRUCacheSimple {

    /**
     * @param args
     */
    public static void main(String[] args) {
        LRUCacheSimple cache = new LRUCacheSimple(2);
        cache.put(1, 1);
        cache.put(2, 2);
        System.out.println(cache.get(1));
        cache.put(3, 3);
        System.out.println(cache.get(2));
        cache.put(4, 4);
        System.out.println(cache.get(1));
        System.out.println(cache.get(3));
        System.out.println(cache.get(4));
    }
    
    private LinkedHashMap<Integer, Integer> map;
    private final int capacity;
    public LRUCacheSimple(int capacity) {
        this.capacity = capacity;
        map = new LinkedHashMap<Integer, Integer>(capacity, 0.75f, true){
            protected boolean removeEldestEntry(Map.Entry eldest) {
                return size() > capacity;
            }
        };
    }
    public int get(int key) {
        return map.getOrDefault(key, -1);
    }
    public void put(int key, int value) {
        map.put(key, value);
    }

}

只需要覆写LinkedHashMap的removeEldestEntry方法，在缓存已满的情况下返回true，内部就会自动删除最老的元素。

感兴趣的同学可以做一下LeetCode 146. LRU Cache这道题，尝试一下如何实现这个算法。

完整版代码LRUCache.java收录在GitHub仓库Algorithms中，欢迎大家下载试用。

参考资料

Cache replacement policies WikiPedia
146. LRU Cache LeetCode
Laziest implementation: Java's LinkedHashMap takes care of everything sky-xu