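This post analyzes how FastThreadLocal works. The JDK already ships with ThreadLocal, so why does Netty provide its own FastThreadLocal? As the name suggests, FastThreadLocal is meant to be faster than ThreadLocal. How does it achieve that? Let's start with the Javadoc on FastThreadLocal: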
/**
// FastThreadLocal gives faster access than ThreadLocal, but only when used from a FastThreadLocalThread
 * A special variant of {@link ThreadLocal} that yields higher access performance when accessed from a
 * {@link FastThreadLocalThread}.
 * <p>
// It looks values up by a fixed array index instead of hashing, which gives a slight edge over a hash table and pays off for frequently accessed variables
 * Internally, a {@link FastThreadLocal} uses a constant index in an array, instead of using hash code and hash table,
 * to look for a variable. Although seemingly very subtle, it yields slight performance advantage over using a hash
 * table, and it is useful when accessed frequently.
 * </p><p>
// To benefit from that speed, the thread must be a FastThreadLocalThread or a subclass
 * To take advantage of this thread-local variable, your thread must be a {@link FastThreadLocalThread} or its subtype.
 * By default, all threads created by {@link DefaultThreadFactory} are {@link FastThreadLocalThread} due to this reason.
 * </p><p>
 * Note that the fast path is only possible on threads that extend {@link FastThreadLocalThread}, because it requires
 * a special field to store the necessary state. An access by any other kind of thread falls back to a regular
 * {@link ThreadLocal}.
 * </p>
 */
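The source comment is fairly clear. The JDK stores ThreadLocal values in a ThreadLocalMap, a hash table that resolves key collisions with linear probing. FastThreadLocal is backed by a plain array in which each FastThreadLocal owns one fixed index, so access is naturally faster than with ThreadLocal. The difference shows mainly in two scenarios:
- With many keys, hashing plus linear probing gets slower as collisions increase;
- With frequent access, the array benefits from contiguous storage and CPU caching: reading index 1 pulls index 1 and the next few slots into the CPU cache, so a following read of index 2 does not have to go to the comparatively slow main memory.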
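To make the contrast concrete, here is a minimal sketch of the two lookup styles. It is neither the JDK's nor Netty's code: `LookupStyles`, `ProbingTable` and `IndexedTable` are hypothetical names, and the probing table has a fixed capacity of 16 with no resizing, so it is only meant for a handful of keys.

```java
import java.util.concurrent.atomic.AtomicInteger;

final class LookupStyles {

    /** Hash-style storage: open addressing with linear probing (fixed size, no resizing). */
    static final class ProbingTable {
        private final Object[] keys = new Object[16];
        private final Object[] values = new Object[16];

        void put(Object key, Object value) {
            int i = slot(key);
            // Walk forward until we find the key or an empty slot.
            while (keys[i] != null && keys[i] != key) {
                i = (i + 1) & (keys.length - 1);
            }
            keys[i] = key;
            values[i] = value;
        }

        Object get(Object key) {
            int i = slot(key);
            // Every collision costs an extra comparison before the key is found.
            while (keys[i] != null) {
                if (keys[i] == key) {
                    return values[i];
                }
                i = (i + 1) & (keys.length - 1);
            }
            return null;
        }

        private int slot(Object key) {
            return System.identityHashCode(key) & (keys.length - 1);
        }
    }

    /** Index-style storage: each variable is handed a fixed slot number once. */
    static final class IndexedTable {
        private static final AtomicInteger NEXT_INDEX = new AtomicInteger();
        private final Object[] slots = new Object[32];

        static int nextIndex() {
            return NEXT_INDEX.getAndIncrement();
        }

        void set(int index, Object value) {
            slots[index] = value;
        }

        Object get(int index) {
            return slots[index]; // one bounds check and one array load
        }
    }

    public static void main(String[] args) {
        Object key = new Object();
        ProbingTable hashed = new ProbingTable();
        hashed.put(key, "abc");

        int index = IndexedTable.nextIndex();
        IndexedTable indexed = new IndexedTable();
        indexed.set(index, "abc");

        System.out.println(hashed.get(key) + " " + indexed.get(index));
    }
}
```

The indexed variant never compares keys at all, which is where the "slight performance advantage" mentioned in the Javadoc comes from.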
Note that FastThreadLocal only delivers this speed advantage when used from a FastThreadLocalThread. Now let's look at the implementation, starting with an example:
import io.netty.util.concurrent.FastThreadLocal;

public class FastThreadLocalTest {
    private static FastThreadLocal<String> threadLocal = new FastThreadLocal<>();

    public static void main(String[] args) {
        set();
        System.out.println(get());
    }

    private static String get() {
        return threadLocal.get();
    }

    private static void set() {
        threadLocal.set("abc");
    }
}
First, the set method:
public final void set(V value) {
    if (value != InternalThreadLocalMap.UNSET) {
        InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get();
        setKnownNotUnset(threadLocalMap, value);
    } else {
        remove();
    }
}
If value != InternalThreadLocalMap.UNSET, the InternalThreadLocalMap is fetched first:
public static InternalThreadLocalMap get() {
    Thread thread = Thread.currentThread();
    if (thread instanceof FastThreadLocalThread) {
        return fastGet((FastThreadLocalThread) thread);
    } else {
        return slowGet();
    }
}
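If the current thread is a FastThreadLocalThread, fastGet is used; otherwise slowGet. The name slowGet already says it is the slower path, which matches the Javadoc's point that the speed advantage only applies on a FastThreadLocalThread. Let's look at slowGet first: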
private static InternalThreadLocalMap slowGet() {
    // slowThreadLocalMap is a ThreadLocal<InternalThreadLocalMap>
    ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = UnpaddedInternalThreadLocalMap.slowThreadLocalMap;
    InternalThreadLocalMap ret = slowThreadLocalMap.get();
    if (ret == null) {
        ret = new InternalThreadLocalMap();
        slowThreadLocalMap.set(ret);
    }
    return ret;
}

// Declared in UnpaddedInternalThreadLocalMap:
static final ThreadLocal<InternalThreadLocalMap> slowThreadLocalMap = new ThreadLocal<InternalThreadLocalMap>();
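Here a JDK ThreadLocal holds the InternalThreadLocalMap, and the InternalThreadLocalMap in turn holds the value. The reason it is slow is then obvious: every access first goes through the ThreadLocal to obtain the InternalThreadLocalMap before anything else can happen.
Now fastGet: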
private static InternalThreadLocalMap fastGet(FastThreadLocalThread thread) {
    InternalThreadLocalMap threadLocalMap = thread.threadLocalMap();
    if (threadLocalMap == null) {
        thread.setThreadLocalMap(threadLocalMap = new InternalThreadLocalMap());
    }
    return threadLocalMap;
}
It reads threadLocalMap from the FastThreadLocalThread and, if there is none yet, creates and installs a new one. Next, the InternalThreadLocalMap itself:
private InternalThreadLocalMap() {
    super(newIndexedVariableTable());
}

private static Object[] newIndexedVariableTable() {
    Object[] array = new Object[INDEXED_VARIABLE_TABLE_INITIAL_SIZE];
    Arrays.fill(array, UNSET);
    return array;
}
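InternalThreadLocalMap is the underlying storage of FastThreadLocal. Unlike ThreadLocalMap, which is a hash structure, InternalThreadLocalMap uses an array directly, with an initial size of 32 and every slot filled with the custom UNSET object.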
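The two lookup paths and the UNSET-filled array can be modeled with the JDK alone. The following is only a sketch under stated assumptions: `TwoPathLookup`, `FastThread` and `currentTable` are hypothetical stand-ins for InternalThreadLocalMap, FastThreadLocalThread and InternalThreadLocalMap.get(), and a bare `Object[]` replaces the real map class.

```java
import java.util.Arrays;

final class TwoPathLookup {
    /** Sentinel meaning "slot never written"; distinct from a stored null. */
    static final Object UNSET = new Object();
    static final int INITIAL_SIZE = 32;

    /** Stand-in for FastThreadLocalThread: the table lives in a plain field. */
    static class FastThread extends Thread {
        Object[] table; // only touched by the thread itself

        FastThread(Runnable task) {
            super(task);
        }
    }

    /** Fallback for every other thread: the table hides behind a JDK ThreadLocal. */
    private static final ThreadLocal<Object[]> SLOW_TABLE = new ThreadLocal<>();

    static Object[] currentTable() {
        Thread thread = Thread.currentThread();
        if (thread instanceof FastThread) {
            FastThread fast = (FastThread) thread;
            if (fast.table == null) {
                fast.table = newTable();
            }
            return fast.table; // fast path: a field read
        }
        Object[] table = SLOW_TABLE.get(); // slow path: a ThreadLocalMap lookup comes first
        if (table == null) {
            table = newTable();
            SLOW_TABLE.set(table);
        }
        return table;
    }

    private static Object[] newTable() {
        Object[] table = new Object[INITIAL_SIZE];
        Arrays.fill(table, UNSET);
        return table;
    }

    public static void main(String[] args) throws InterruptedException {
        Object[] mainTable = currentTable();
        System.out.println("main thread reuses its table: " + (mainTable == currentTable()));

        Object[][] seen = new Object[1][];
        FastThread worker = new FastThread(() -> seen[0] = currentTable());
        worker.start();
        worker.join();
        System.out.println("worker got its own table: " + (seen[0] != mainTable));
    }
}
```

Either way each thread ends up with its own table; the only difference is how many steps it takes to reach it.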
Next, setKnownNotUnset:
private void setKnownNotUnset(InternalThreadLocalMap threadLocalMap, V value) {
    if (threadLocalMap.setIndexedVariable(index, value)) {
        addToVariablesToRemove(threadLocalMap, this);
    }
}

public boolean setIndexedVariable(int index, Object value) {
    Object[] lookup = indexedVariables;
    if (index < lookup.length) {
        Object oldValue = lookup[index];
        lookup[index] = value;
        return oldValue == UNSET;
    } else {
        // grow the array
        expandIndexedVariableTableAndSet(index, value);
        return true;
    }
}

private void expandIndexedVariableTableAndSet(int index, Object value) {
    // The new capacity is the smallest power of two greater than index
    // (a bit trick similar to the one HashMap uses to round its capacity)
    Object[] oldArray = indexedVariables;
    final int oldCapacity = oldArray.length;
    int newCapacity = index;
    newCapacity |= newCapacity >>> 1;
    newCapacity |= newCapacity >>> 2;
    newCapacity |= newCapacity >>> 4;
    newCapacity |= newCapacity >>> 8;
    newCapacity |= newCapacity >>> 16;
    newCapacity ++;
    // copy the existing elements into the new array and fill the rest with UNSET
    Object[] newArray = Arrays.copyOf(oldArray, newCapacity);
    Arrays.fill(newArray, oldCapacity, newArray.length, UNSET);
    newArray[index] = value;
    indexedVariables = newArray;
}
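Start with setIndexedVariable: it reads the array, and if the target index is greater than or equal to the array length it grows the array and stores the new value; otherwise it stores the value directly.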
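The capacity computation is easier to follow with concrete numbers. The sketch below isolates the same shift-and-or steps in a hypothetical helper named `capacityFor` (the name and the wrapper class are illustrative, not part of Netty) and prints the result for a few indexes.

```java
final class CapacityRounding {

    /** Same bit-smearing steps as expandIndexedVariableTableAndSet, isolated for illustration. */
    static int capacityFor(int index) {
        int newCapacity = index;
        // Copy the highest set bit into every lower position...
        newCapacity |= newCapacity >>> 1;
        newCapacity |= newCapacity >>> 2;
        newCapacity |= newCapacity >>> 4;
        newCapacity |= newCapacity >>> 8;
        newCapacity |= newCapacity >>> 16;
        // ...then add one to land on the next power of two.
        newCapacity++;
        return newCapacity;
    }

    public static void main(String[] args) {
        for (int index : new int[] {32, 33, 63, 64, 100}) {
            System.out.println(index + " -> " + capacityFor(index));
        }
    }
}
```

Indexes 32, 33 and 63 all map to 64, while 64 and 100 map to 128. The result depends only on the index being written, so the array is not necessarily doubled: a single large index can grow it by more than a factor of two in one step.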
Next is addToVariablesToRemove, which is called when the value was inserted rather than updated (setIndexedVariable returns true in that case):
private static void addToVariablesToRemove(InternalThreadLocalMap threadLocalMap, FastThreadLocal<?> variable) {
    Object v = threadLocalMap.indexedVariable(variablesToRemoveIndex);
    Set<FastThreadLocal<?>> variablesToRemove;
    if (v == InternalThreadLocalMap.UNSET || v == null) {
        // create a Set backed by an IdentityHashMap and store it at index 0 of the InternalThreadLocalMap
        variablesToRemove = Collections.newSetFromMap(new IdentityHashMap<FastThreadLocal<?>, Boolean>());
        threadLocalMap.setIndexedVariable(variablesToRemoveIndex, variablesToRemove);
    } else {
        variablesToRemove = (Set<FastThreadLocal<?>>) v;
    }
    // add this FastThreadLocal to variablesToRemove
    variablesToRemove.add(variable);
}
This adds the FastThreadLocal to variablesToRemove (a Set), so that everything can be removed quickly when cleanup is needed; see the removeAll method.
removeAll runs when a FastThreadLocalThread finishes executing:
public static void removeAll() {
    InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.getIfSet();
    if (threadLocalMap == null) {
        return;
    }
    try {
        Object v = threadLocalMap.indexedVariable(variablesToRemoveIndex);
        if (v != null && v != InternalThreadLocalMap.UNSET) {
            @SuppressWarnings("unchecked")
            Set<FastThreadLocal<?>> variablesToRemove = (Set<FastThreadLocal<?>>) v;
            FastThreadLocal<?>[] variablesToRemoveArray =
                    variablesToRemove.toArray(new FastThreadLocal[0]);
            for (FastThreadLocal<?> tlv: variablesToRemoveArray) {
                tlv.remove(threadLocalMap);
            }
        }
    } finally {
        InternalThreadLocalMap.remove();
    }
}
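The bookkeeping pattern used by addToVariablesToRemove and removeAll can be sketched with plain JDK collections. Everything here is a simplified model: `CleanupRegistry` and `Variable` are hypothetical names, and the assumption that removing a variable also unregisters it from the set is mine; it would explain why removeAll copies the set into an array before iterating.

```java
import java.util.Collections;
import java.util.IdentityHashMap;
import java.util.Set;

final class CleanupRegistry {

    /** Hypothetical stand-in for a thread-local variable that unregisters itself when removed. */
    static final class Variable {
        Object value;

        void remove(Set<Variable> registry) {
            value = null;
            registry.remove(this); // mutates the registry during cleanup
        }
    }

    /** Identity-based set: membership depends on ==, never on equals/hashCode overrides. */
    static Set<Variable> newRegistry() {
        return Collections.newSetFromMap(new IdentityHashMap<Variable, Boolean>());
    }

    static void removeAll(Set<Variable> registry) {
        // Snapshot first: iterating the live set while remove() shrinks it
        // can throw ConcurrentModificationException.
        Variable[] snapshot = registry.toArray(new Variable[0]);
        for (Variable variable : snapshot) {
            variable.remove(registry);
        }
    }

    public static void main(String[] args) {
        Set<Variable> registry = newRegistry();
        Variable a = new Variable();
        a.value = "a";
        Variable b = new Variable();
        b.value = "b";
        registry.add(a);
        registry.add(b);

        removeAll(registry);
        System.out.println("registry empty after cleanup: " + registry.isEmpty());
    }
}
```

An identity-based set fits here because each variable is tracked as a distinct object, regardless of how a subclass might define equality.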
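That covers FastThreadLocal's set method. It is worth mentioning that older versions of FastThreadLocal also had an ObjectCleaner to deal with the memory leak that arises when threads other than FastThreadLocalThread go through the JDK ThreadLocal. Newer versions have removed that logic; the reason is given here: https://github.com/netty/netty/commit/5b1fe611a637c362a60b391079fff73b1a4ef912#diff-e0eb4e9a6ea15564e4ddd076c55978de, so I won't go into it.
Now the get method: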
public final V get() {
    InternalThreadLocalMap threadLocalMap = InternalThreadLocalMap.get();
    Object v = threadLocalMap.indexedVariable(index);
    if (v != InternalThreadLocalMap.UNSET) {
        return (V) v;
    }
    return initialize(threadLocalMap);
}

public Object indexedVariable(int index) {
    Object[] lookup = indexedVariables;
    return index < lookup.length? lookup[index] : UNSET;
}
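If a value is found at the index in the array, it is returned directly. Otherwise initialize is called, which by default initializes the slot to null and returns null.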
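The role of the UNSET sentinel in get can be shown with a small JDK-only model. This is a sketch, not Netty's code: `LazySlot` is a hypothetical class, the `Supplier` stands in for whatever produces the initial value, array growth is left out, and the registration for later cleanup is not modeled.

```java
import java.util.Arrays;
import java.util.function.Supplier;

final class LazySlot<V> {
    /** Sentinel meaning "never written"; lets a stored null count as a real value. */
    static final Object UNSET = new Object();

    private final int index;
    private final Supplier<V> initialValue; // hypothetical hook supplying the initial value

    LazySlot(int index, Supplier<V> initialValue) {
        this.index = index;
        this.initialValue = initialValue;
    }

    @SuppressWarnings("unchecked")
    V get(Object[] table) {
        Object v = index < table.length ? table[index] : UNSET;
        if (v != UNSET) {
            return (V) v; // includes an explicitly stored null
        }
        V initial = initialValue.get();
        table[index] = initial; // assumes the table is already large enough
        return initial;
    }

    static Object[] newTable(int size) {
        Object[] table = new Object[size];
        Arrays.fill(table, UNSET);
        return table;
    }

    public static void main(String[] args) {
        Object[] table = newTable(4);
        LazySlot<String> slot = new LazySlot<>(0, () -> null);
        System.out.println(slot.get(table));   // first read initializes the slot
        System.out.println(table[0] == UNSET); // the slot now holds null, not UNSET
    }
}
```

Because the first read stores the initial value, later reads take the direct-return branch even when that value is null.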