[Netty Study Notes 9] How FastThreadLocal Works

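In this post we look at how FastThreadLocal works. The JDK already ships a ThreadLocal class, so why does Netty provide its own FastThreadLocal? As the name suggests, FastThreadLocal is meant to be faster than ThreadLocal. How does it achieve that? Let's start with the class comment on FastThreadLocal: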

/**
 * A special variant of {@link ThreadLocal} that yields higher access performance when accessed from a
 * {@link FastThreadLocalThread}.
 * <p>
 * Internally, a {@link FastThreadLocal} uses a constant index in an array, instead of using hash code and hash table,
 * to look for a variable.  Although seemingly very subtle, it yields slight performance advantage over using a hash
 * table, and it is useful when accessed frequently.
 * </p><p>
 * To take advantage of this thread-local variable, your thread must be a {@link FastThreadLocalThread} or its subtype.
 * By default, all threads created by {@link DefaultThreadFactory} are {@link FastThreadLocalThread} due to this reason.
 * </p><p>
 * Note that the fast path is only possible on threads that extend {@link FastThreadLocalThread}, because it requires
 * a special field to store the necessary state.  An access by any other kind of thread falls back to a regular
 * {@link ThreadLocal}.
 * </p>
 */

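The source comment makes this fairly clear. The JDK stores thread-local values in a ThreadLocalMap, which is a hash table that resolves key collisions with linear probing. FastThreadLocal is backed by a plain array, and each FastThreadLocal instance is assigned its own index, so a lookup is naturally faster than with ThreadLocal. The advantage shows up mainly in two scenarios: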

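When there are many keys, the access performance of hashing plus linear probing degrades;
When access is frequent, the array's contiguous layout is friendly to the CPU cache: reading index 1 tends to pull index 1 and the next few slots into the cache, so a following read of index 2 does not have to go to the slower main memory.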
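
To make the contrast concrete, here is a minimal sketch of the two lookup strategies. It is neither the JDK's ThreadLocalMap nor Netty's InternalThreadLocalMap: the class LookupSketch and its methods are hypothetical, and the probing table is deliberately simplified (fixed capacity of 16, no resizing, no stale-entry cleanup).

```java
final class LookupSketch {
    // Simplified open-addressing table: keys and values in parallel arrays.
    // Assumption: fewer than 16 keys are ever stored, so probing always terminates.
    private final Object[] keys = new Object[16];
    private final Object[] values = new Object[16];

    // Index-style storage: one slot per variable, addressed directly.
    private final Object[] slots = new Object[16];

    // Hash-style put: start at the hashed slot and probe linearly on collision.
    void hashPut(Object key, Object value) {
        int i = System.identityHashCode(key) & (keys.length - 1);
        while (keys[i] != null && keys[i] != key) {
            i = (i + 1) & (keys.length - 1);
        }
        keys[i] = key;
        values[i] = value;
    }

    // Hash-style get: may need several probes before the key is found.
    Object hashGet(Object key) {
        int i = System.identityHashCode(key) & (keys.length - 1);
        while (keys[i] != null) {
            if (keys[i] == key) {
                return values[i];
            }
            i = (i + 1) & (keys.length - 1);
        }
        return null;
    }

    // Index-style put and get: a single array access, no hashing, no probing.
    void indexPut(int index, Object value) {
        slots[index] = value;
    }

    Object indexGet(int index) {
        return slots[index];
    }

    public static void main(String[] args) {
        LookupSketch sketch = new LookupSketch();
        Object key = new Object();
        sketch.hashPut(key, "via hash");
        sketch.indexPut(3, "via index");
        System.out.println(sketch.hashGet(key));
        System.out.println(sketch.indexGet(3));
    }
}
```

The more keys share the probing table, the longer the probe sequences in hashGet can become, while indexGet stays a single array read regardless of how many variables exist.
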
Note that FastThreadLocal only delivers its speed advantage when it is used from a FastThreadLocalThread. Now let's look at the implementation, starting with an example:

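import io.netty.util.concurrent.FastThreadLocal;

public class FastThreadLocalTest {
    private static final FastThreadLocal<String> threadLocal = new FastThreadLocal<>();

    public static void main(String[] args) {
        // The main thread is a plain Thread, not a FastThreadLocalThread,
        // so this run goes through the ThreadLocal fallback path.
        set();
        System.out.println(get()); // prints "abc"
    }

    private static String get() {
        return threadLocal.get();
    }

    private static void set() {
        threadLocal.set("abc");
    }
}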

Start with the set method:

public final void set(V value) {
        if (value != InternalThreadLocalMap.UNSET) {
            InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get();
            setKnownNotUnset(threadLocalMap, value);
        } else {
            remove();
        }
    }

If value != InternalThreadLocalMap.UNSET, set first obtains the InternalThreadLocalMap:

public static InternalThreadLocalMap get() {
        Thread thread = Thread.currentThread();
        if (thread instanceof FastThreadLocalThread) {
            return fastGet((FastThreadLocalThread) thread);
        } else {
            return slowGet();
        }
    }

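If the current thread is a FastThreadLocalThread, fastGet is used; otherwise slowGet is used. As its name implies, slowGet is the slower path, which matches the class comment: the speed advantage only applies on a FastThreadLocalThread. Let's look at slowGet first: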

private static InternalThreadLocalMap slowGet() {
        // slowThreadLocalMap is a ThreadLocal<InternalThreadLocalMap>
        ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = UnpaddedInternalThreadLocalMap.slowThreadLocalMap;
        InternalThreadLocalMap ret = slowThreadLocalMap.get();
        if (ret == null) {
            ret = new InternalThreadLocalMap();
            slowThreadLocalMap.set(ret);
        }
        return ret;
    }

static final ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = new ThreadLocal<InternalThreadLocalMap>();

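slowGet keeps the InternalThreadLocalMap in a JDK ThreadLocal, and the InternalThreadLocalMap in turn holds the value. The reason it is slower is plain: every access must first go through the JDK ThreadLocal lookup to obtain the InternalThreadLocalMap before anything else can happen.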
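
The dispatch between the two paths can be sketched without Netty. Everything below is a hypothetical stand-in: SketchMap plays the role of InternalThreadLocalMap, SketchFastThread the role of FastThreadLocalThread, and PathSketch.get() mimics the shape of InternalThreadLocalMap.get(). It is a simplified illustration, not Netty's code.

```java
final class PathSketch {
    // Stand-in for InternalThreadLocalMap: just a holder for the slot array.
    static final class SketchMap {
        final Object[] slots = new Object[32];
    }

    // Stand-in for FastThreadLocalThread: the map lives in a plain field.
    static final class SketchFastThread extends Thread {
        SketchMap map;

        SketchFastThread(Runnable task) {
            super(task);
        }
    }

    // Fallback storage for every other kind of thread.
    private static final ThreadLocal<SketchMap> SLOW = new ThreadLocal<>();

    static SketchMap get() {
        Thread t = Thread.currentThread();
        if (t instanceof SketchFastThread) {
            // Fast path: one field read on the thread object, no hashing.
            SketchFastThread ft = (SketchFastThread) t;
            if (ft.map == null) {
                ft.map = new SketchMap();
            }
            return ft.map;
        }
        // Slow path: a JDK ThreadLocal lookup has to happen first.
        SketchMap m = SLOW.get();
        if (m == null) {
            m = new SketchMap();
            SLOW.set(m);
        }
        return m;
    }

    public static void main(String[] args) throws InterruptedException {
        // The main thread is a plain Thread, so it takes the slow path.
        System.out.println(get() == get());

        // The custom thread type takes the fast path and fills its own field.
        SketchFastThread ft = new SketchFastThread(() -> get().slots[1] = "abc");
        ft.start();
        ft.join();
        System.out.println(ft.map.slots[1]);
    }
}
```

Both paths end with the same kind of map; the only difference is how many steps it takes to reach it, which is why the thread type decides whether the fast path is available.
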
Now fastGet:

private static InternalThreadLocalMap fastGet(FastThreadLocalThread thread) {
        InternalThreadLocalMap threadLocalMap = thread.threadLocalMap();
        if (threadLocalMap == null) {
            thread.setThreadLocalMap(threadLocalMap = new InternalThreadLocalMap());
        }
        return threadLocalMap;
    }

fastGet reads the threadLocalMap field of the FastThreadLocalThread, creating and storing a new map if there is none yet. Next, the InternalThreadLocalMap constructor:

private InternalThreadLocalMap() {
        super(newIndexedVariableTable());
    }

    private static Object[] newIndexedVariableTable() {
        Object[] array = new Object[INDEXED_VARIABLE_TABLE_INITIAL_SIZE];
        Arrays.fill(array, UNSET);
        return array;
    }

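InternalThreadLocalMap is the storage behind FastThreadLocal. Unlike ThreadLocalMap, which is a hash table, InternalThreadLocalMap uses a plain array. Its initial size is 32, and every slot is filled with the sentinel object UNSET.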
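
Filling the array with a sentinel rather than leaving it null lets the map tell "never set" apart from "set to null". A minimal sketch of that idea, using a hypothetical UnsetSketch class (the size 32 and the name UNSET follow the text above; the rest is made up for illustration):

```java
import java.util.Arrays;

final class UnsetSketch {
    // Unique marker object, compared by identity only.
    static final Object UNSET = new Object();

    private final Object[] slots = new Object[32];

    UnsetSketch() {
        Arrays.fill(slots, UNSET);
    }

    // Returns true only when the slot had never been set before.
    boolean set(int index, Object value) {
        Object old = slots[index];
        slots[index] = value;
        return old == UNSET;
    }

    boolean isSet(int index) {
        return slots[index] != UNSET;
    }

    Object get(int index) {
        return slots[index];
    }

    public static void main(String[] args) {
        UnsetSketch map = new UnsetSketch();
        System.out.println(map.isSet(5));     // false: still the sentinel
        System.out.println(map.set(5, null)); // true: first write to the slot
        System.out.println(map.isSet(5));     // true: null is a stored value
        System.out.println(map.set(5, "x"));  // false: this is an update
    }
}
```
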
Next, setKnownNotUnset:

private void setKnownNotUnset(InternalThreadLocalMap threadLocalMap, V value) {
        if (threadLocalMap.setIndexedVariable(index, value)) {
            addToVariablesToRemove(threadLocalMap, this);
        }
    }

public boolean setIndexedVariable(int index, Object value) {
        Object[] lookup = indexedVariables;
        if (index < lookup.length) {
            Object oldValue = lookup[index];
            lookup[index] = value;
            return oldValue == UNSET;
        } else {
            // index is beyond the current array, so grow it
            expandIndexedVariableTableAndSet(index, value);
            return true;
        }
    }
private void expandIndexedVariableTableAndSet(int index, Object value) {
        // Grow to the smallest power of two greater than index (a bit-smearing trick similar to HashMap's table sizing)
        Object[] oldArray = indexedVariables;
        final int oldCapacity = oldArray.length;
        int newCapacity = index;
        newCapacity |= newCapacity >>>  1;
        newCapacity |= newCapacity >>>  2;
        newCapacity |= newCapacity >>>  4;
        newCapacity |= newCapacity >>>  8;
        newCapacity |= newCapacity >>> 16;
        newCapacity ++;
        // Copy the existing elements into the new array
        Object[] newArray = Arrays.copyOf(oldArray, newCapacity);
        Arrays.fill(newArray, oldCapacity, newArray.length, UNSET);
        newArray[index] = value;
        indexedVariables = newArray;
    }

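Look at setIndexedVariable first. It reads the array; if index is greater than or equal to the array length, it grows the array and stores the new value, otherwise it stores the value directly.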
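
The capacity calculation in expandIndexedVariableTableAndSet is easier to follow with concrete numbers. The sketch below copies only the bit operations from the method above into a hypothetical helper, CapacitySketch.newCapacity, and prints the result for a few indexes:

```java
final class CapacitySketch {
    // Same bit-smearing steps as in expandIndexedVariableTableAndSet:
    // set every bit below the highest set bit, then add one.
    static int newCapacity(int index) {
        int n = index;
        n |= n >>> 1;
        n |= n >>> 2;
        n |= n >>> 4;
        n |= n >>> 8;
        n |= n >>> 16;
        n++;
        return n;
    }

    public static void main(String[] args) {
        System.out.println(newCapacity(32));  // 64
        System.out.println(newCapacity(33));  // 64
        System.out.println(newCapacity(64));  // 128
        System.out.println(newCapacity(100)); // 128
    }
}
```

The result depends on index, not on the old capacity: it is always a power of two strictly greater than index, so the new array is guaranteed to contain the slot being written, even if that means growing by more than a factor of two in one step.
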
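Then comes addToVariablesToRemove, which is called when setIndexedVariable returns true, that is, when the slot was previously UNSET (an insert rather than an update):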

private static void addToVariablesToRemove(InternalThreadLocalMap threadLocalMap, FastThreadLocal<?> variable) {
        Object v = threadLocalMap.indexedVariable(variablesToRemoveIndex);
        Set<FastThreadLocal<?>> variablesToRemove;
        if (v == InternalThreadLocalMap.UNSET || v == null) {
            // Create an IdentityHashMap-backed Set and store it in the InternalThreadLocalMap at variablesToRemoveIndex (slot 0)
            variablesToRemove = Collections.newSetFromMap(new IdentityHashMap<FastThreadLocal<?>, Boolean>());
            threadLocalMap.setIndexedVariable(variablesToRemoveIndex, variablesToRemove);
        } else {
            variablesToRemove = (Set<FastThreadLocal<?>>) v;
        }
        // Add this FastThreadLocal to variablesToRemove
        variablesToRemove.add(variable);
    }

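This records the FastThreadLocal in variablesToRemove (a Set), so that every variable set on the thread can be found and removed quickly when cleanup is needed. See the removeAll method: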

removeAll is called when a FastThreadLocalThread finishes running:
public static void removeAll() {
        InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.getIfSet();
        if (threadLocalMap == null) {
            return;
        }

        try {
            Object v = threadLocalMap.indexedVariable(variablesToRemoveIndex);
            if (v != null && v != InternalThreadLocalMap.UNSET) {
                @SuppressWarnings("unchecked")
                Set<FastThreadLocal<?>> variablesToRemove = (Set<FastThreadLocal<?>>) v;
                FastThreadLocal<?>[] variablesToRemoveArray =
                        variablesToRemove.toArray(new FastThreadLocal[0]);
                for (FastThreadLocal<?> tlv: variablesToRemoveArray) {
                    tlv.remove(threadLocalMap);
                }
            }
        } finally {
            InternalThreadLocalMap.remove();
        }
    }
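
Two details of this cleanup are worth isolating: the set is identity-based, and removeAll iterates over a snapshot array because each remove call modifies the set. The sketch below shows both with hypothetical types (RemoveAllSketch and Var are not Netty classes, and Var's equals is made intentionally lax to show that identity is what counts):

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class RemoveAllSketch {
    // Hypothetical variable type whose equals() treats all instances as equal,
    // to show that the identity-based set still keeps them apart.
    static final class Var {
        Object value = "set";

        @Override
        public boolean equals(Object o) {
            return o instanceof Var;
        }

        @Override
        public int hashCode() {
            return 1;
        }
    }

    // Same construction as in addToVariablesToRemove: a Set view of an IdentityHashMap.
    final Set<Var> toRemove = Collections.newSetFromMap(new IdentityHashMap<Var, Boolean>());

    void track(Var v) {
        toRemove.add(v);
    }

    // Clears the value and stops tracking the variable.
    void remove(Var v) {
        v.value = null;
        toRemove.remove(v);
    }

    void removeAll() {
        // Snapshot first: remove() mutates the set while we loop.
        for (Var v : toRemove.toArray(new Var[0])) {
            remove(v);
        }
    }

    public static void main(String[] args) {
        RemoveAllSketch sketch = new RemoveAllSketch();
        Var a = new Var();
        Var b = new Var();
        sketch.track(a);
        sketch.track(b);
        System.out.println(sketch.toRemove.size()); // 2, despite a.equals(b)
        sketch.removeAll();
        System.out.println(sketch.toRemove.size()); // 0
    }
}
```
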

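That covers the set method of FastThreadLocal. One thing worth mentioning: older versions of FastThreadLocal also used an ObjectCleaner to address the memory leak that arises when a non-FastThreadLocalThread thread falls back to the JDK ThreadLocal. Newer versions have removed that logic; the reason is given here: https://github.com/netty/netty/commit/5b1fe611a637c362a60b391079fff73b1a4ef912#diff-e0eb4e9a6ea15564e4ddd076c55978de, so I won't go into it.
Next, the get method: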

public final V get() {
        InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get();
        Object v = threadLocalMap.indexedVariable(index);
        if (v != InternalThreadLocalMap.UNSET) {
            return (V) v;
        }

        return initialize(threadLocalMap);
    }
public Object indexedVariable(int index) {
        Object[] lookup = indexedVariables;
        return index < lookup.length? lookup[index] : UNSET;
    }

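If the slot at this index holds a value (anything other than UNSET), get returns it directly. Otherwise it calls initialize(threadLocalMap), which invokes initialValue() (null unless overridden), stores the result in the slot, and returns it.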
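
The lazy-initialization behaviour of get can be sketched in isolation. LazyGetSketch.Var below is a hypothetical single-slot stand-in for a FastThreadLocal together with its map slot; the initCalls counter exists only to make the behaviour observable and has no counterpart in Netty.

```java
final class LazyGetSketch {
    static final Object UNSET = new Object();

    // Hypothetical single-slot stand-in for a FastThreadLocal plus its array slot.
    static class Var<V> {
        private Object slot = UNSET;
        int initCalls = 0; // illustration only: counts initialValue() invocations

        // Returns null unless a subclass overrides it.
        protected V initialValue() {
            return null;
        }

        @SuppressWarnings("unchecked")
        final V get() {
            if (slot != UNSET) {
                return (V) slot;
            }
            initCalls++;
            V v = initialValue();
            slot = v; // stored even when null, so the next get() skips initialization
            return v;
        }
    }

    public static void main(String[] args) {
        Var<String> plain = new Var<>();
        System.out.println(plain.get());      // null
        System.out.println(plain.get());      // null, served from the slot
        System.out.println(plain.initCalls);  // 1

        Var<String> custom = new Var<String>() {
            @Override
            protected String initialValue() {
                return "init";
            }
        };
        System.out.println(custom.get());     // init
    }
}
```

Because the null result replaces UNSET in the slot, initialization runs at most once per slot until the value is removed again.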