通过HashMap对比看ConcurrentHashMap
之前详细看了HashMap的实现,现在通过对比来学习下ConcurrentHashMap。
首先,jdk1.8之后,ConcurrentHashMap中的segment已经基本上废弃,原有的结构是三层:Segment—>Array—>链表
如果需要操作ConcurrentHashMap,则要先根据hash获取所在Segment,拿到锁之后,再计算在Segment中所处的数组index,进而操作。
现在的结构跟HashMap保持一致,废除了segment分段锁这一层,采用CAS+Synchronized来替代。
关键常量及变量
常量
HashMap:
static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; // aka 16
static final int MAXIMUM_CAPACITY = 1 << 30;
static final float DEFAULT_LOAD_FACTOR = 0.75f;
static final int TREEIFY_THRESHOLD = 8;
static final int UNTREEIFY_THRESHOLD = 6;
static final int MIN_TREEIFY_CAPACITY = 64;
ConcurrentHashMap:
private static final int MAXIMUM_CAPACITY = 1 << 30;
private static final int DEFAULT_CAPACITY = 16;
/**
* The largest possible (non-power of two) array size.
* Needed by toArray and related methods.
*/
static final int MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8;
/**
* The default concurrency level for this table. Unused but
* defined for compatibility with previous versions of this class.
*/
private static final int DEFAULT_CONCURRENCY_LEVEL = 16;
private static final float LOAD_FACTOR = 0.75f;
static final int TREEIFY_THRESHOLD = 8;
static final int UNTREEIFY_THRESHOLD = 6;
static final int MIN_TREEIFY_CAPACITY = 64;
/**
* Minimum number of rebinnings per transfer step. Ranges are
* subdivided to allow multiple resizer threads. This value
* serves as a lower bound to avoid resizers encountering
* excessive memory contention. The value should be at least
* DEFAULT_CAPACITY.
*/
private static final int MIN_TRANSFER_STRIDE = 16;
/**
* The number of bits used for generation stamp in sizeCtl.
* Must be at least 6 for 32bit arrays.
*/
private static int RESIZE_STAMP_BITS = 16;
/**
* The maximum number of threads that can help resize.
* Must fit in 32 - RESIZE_STAMP_BITS bits.
*/
private static final int MAX_RESIZERS = (1 << (32 - RESIZE_STAMP_BITS)) - 1;
/**
* The bit shift for recording size stamp in sizeCtl.
*/
private static final int RESIZE_STAMP_SHIFT = 32 - RESIZE_STAMP_BITS;
/*
* Encodings for Node hash fields. See above for explanation.
*/
static final int MOVED = -1; // hash for forwarding nodes // 表示正在扩容时 头节点的hash值
static final int TREEBIN = -2; // hash for roots of trees // 表示树的根节点hash值
static final int RESERVED = -3; // hash for transient reservations
// 除了首位是0,剩下的都是1,按位与,得正数(首位为0)
static final int HASH_BITS = 0x7fffffff; // usable bits of normal node hash
/** Number of CPUS, to place bounds on some sizings */
static final int NCPU = Runtime.getRuntime().availableProcessors();
/** For serialization compatibility. */
private static final ObjectStreamField[] serialPersistentFields = {
new ObjectStreamField("segments", Segment[].class),
new ObjectStreamField("segmentMask", Integer.TYPE),
new ObjectStreamField("segmentShift", Integer.TYPE)
};
通过对比可以看到,ConcurrentHashMap中增加了很多常量定义,而关键的loadFactor、DefaultCapacity、tree*相关变量,都还有保留。
下面详细看下多出来的这些常量都是有什么用
- MAX_ARRAY_SIZE,数组的最大长度,目前只在toArray方法中用到,而toArray方法在ConcurrentHashMap中并没有使用到,而且这里的数组并不是ConcurrentHashMap的存储底层table,所以这个参数可以暂时忽略。
- DEFAULT_CONCURRENCY_LEVEL,并发级别,看注释是没用的,只是为了保证老版本代码的兼容性。代码里面只有在序列化writeObject方法中有分支用到。暂时忽略。
- MIN_TRANSFER_STRIDE,最小迁移步幅,只在transfer方法中用到,暂时没太理解stride用来做什么的,//TODO
- RESIZE_STAMP_BITS,看注释没太明白什么意思,只在resizeStamp方法和下面两个常量中用到,先预留 //TODO
- MAX_RESIZERS 参与resize的最大线程数,如果是默认值,那么 (1 << (32-16))-1 = 32-1 = 31
- RESIZE_STAMP_SHIFT,值默认为16,看注释是用于记录sizeCtl中的 resize stamp偏移位,但是还是没太理解是啥意思 // TODO
- 一系列hash标识值,如果是-1,表示是forwardingNode 节点,如果是-2,说明是treeBin节点,也就是树的根节点。
- NCPU:number of cpus,当前服务器的核数
可以看到,这里多余的常量大都是resize相关,可以想到,ConcurrentHashMap的resize会比hashmap扩容复杂很多。我们接着往下看
变量
hashmap中的变量
transient Node<K,V>[] table;
transient Set<Map.Entry<K,V>> entrySet;
/**
* The number of key-value mappings contained in this map.
*/
transient int size;
transient int modCount;
int threshold;
final float loadFactor;
然后是ConcurrentHashMap中的变量:
/**
* The array of bins. Lazily initialized upon first insertion.
* Size is always a power of two. Accessed directly by iterators.
*/
transient volatile Node<K,V>[] table;
/**
* The next table to use; non-null only while resizing.
*/
private transient volatile Node<K,V>[] nextTable;
/**
* Base counter value, used mainly when there is no contention,
* but also as a fallback during table initialization
* races. Updated via CAS.
*/
private transient volatile long baseCount;
/**
* Table initialization and resizing control. When negative, the
* table is being initialized or resized: -1 for initialization,
* else -(1 + the number of active resizing threads). Otherwise,
* when table is null, holds the initial table size to use upon
* creation, or 0 for default. After initialization, holds the
* next element count value upon which to resize the table.
*/
private transient volatile int sizeCtl;
/**
* The next table index (plus one) to split while resizing.
*/
private transient volatile int transferIndex;
/**
* Spinlock (locked via CAS) used when resizing and/or creating CounterCells.
*/
private transient volatile int cellsBusy;
/**
* Table of counter cells. When non-null, size is a power of 2.
*/
private transient volatile CounterCell[] counterCells;
// views
private transient KeySetView<K,V> keySet;
private transient ValuesView<K,V> values;
private transient EntrySetView<K,V> entrySet;
可以看到,这里的差别就很大了,只有table和下面的keySet、entrySet之类的是重复的,其他的全都不一样。而且table在ConcurrentHashMap中还是volatile修饰的,在可见性方面做了保证。下面详细说下:
- table:作用跟HashMap中类似,用于存储节点,不过注意用了volatile修饰
- nextTable:看注释是在扩容的时候才会用到的集合,容量是table的两倍
- baseCount:记录当前ConcurrentHashMap中的元素个数,但是注意的是并不一定是当前的真实个数,通过cas无锁更新。
- sizeCtl:table初始化和resize时的控制位,通过注释可以看到,它的取值场景比较多,大概如下:
- 如果是负数,说明是初始化或resize中,-1表示正在初始化,-(1+N)表示正在被N个线程扩容
- 如果是正数或者0,表示table尚未被初始化。如果为正数,表示是初始化或者下次扩容时,table的容量大小
- 但是在实际测试时发现,它的值始终是当前ConcurrentHashMap容量的0.75倍,这与loadfactor是对应的。实际容量>=sizeCtl,则扩容。
- transferIndex:
- cellsBusy:在resize或创建counterCells时使用的自旋锁(通过CAS实现)
- counterCells:数组,长度总为2的N次方
详细的使用,会在下面实际的代码中一一补充。不过在这里并没有发现类似于HashMap中capacity的字段,也就是无法知道容量是多少。难道不需要么?
反射相关属性
// Unsafe mechanics
private static final sun.misc.Unsafe U;
private static final long SIZECTL;
private static final long TRANSFERINDEX;
private static final long BASECOUNT;
private static final long CELLSBUSY;
private static final long CELLVALUE;
private static final long ABASE;
private static final int ASHIFT;
static {
try {
U = sun.misc.Unsafe.getUnsafe(); // 获取Unsafe类的实例
Class<?> k = ConcurrentHashMap.class;
SIZECTL = U.objectFieldOffset
(k.getDeclaredField("sizeCtl"));
TRANSFERINDEX = U.objectFieldOffset
(k.getDeclaredField("transferIndex"));
BASECOUNT = U.objectFieldOffset
(k.getDeclaredField("baseCount"));
CELLSBUSY = U.objectFieldOffset
(k.getDeclaredField("cellsBusy"));
Class<?> ck = CounterCell.class;
CELLVALUE = U.objectFieldOffset
(ck.getDeclaredField("value"));
Class<?> ak = Node[].class;
ABASE = U.arrayBaseOffset(ak);
int scale = U.arrayIndexScale(ak);
if ((scale & (scale - 1)) != 0)
throw new Error("data type scale not a power of two");
ASHIFT = 31 - Integer.numberOfLeadingZeros(scale);
} catch (Exception e) {
throw new Error(e);
}
}
这里通过Unsafe反射获取到一些属性字段,分别对应上面的主要变量。
节点
先看下HashMap的节点Node源码:
/**
* Basic hash bin node, used for most entries. (See below for
* TreeNode subclass, and in LinkedHashMap for its Entry subclass.)
*/
static class Node<K,V> implements Map.Entry<K,V> {
final int hash;
final K key;
V value;
Node<K,V> next;
Node(int hash, K key, V value, Node<K,V> next) {
this.hash = hash;
this.key = key;
this.value = value;
this.next = next;
}
public final K getKey() { return key; }
public final V getValue() { return value; }
public final String toString() { return key + "=" + value; }
public final int hashCode() {
return Objects.hashCode(key) ^ Objects.hashCode(value);
}
public final V setValue(V newValue) {
V oldValue = value;
value = newValue;
return oldValue;
}
public final boolean equals(Object o) {
if (o == this)
return true;
if (o instanceof Map.Entry) {
Map.Entry<?,?> e = (Map.Entry<?,?>)o;
if (Objects.equals(key, e.getKey()) &&
Objects.equals(value, e.getValue()))
return true;
}
return false;
}
}
然后是ConcurrentHashMap中的Node实现:
/**
* Key-value entry. This class is never exported out as a
* user-mutable Map.Entry (i.e., one supporting setValue; see
* MapEntry below), but can be used for read-only traversals used
* in bulk tasks. Subclasses of Node with a negative hash field
* are special, and contain null keys and values (but are never
* exported). Otherwise, keys and vals are never null.
*/
static class Node<K,V> implements Map.Entry<K,V> {
final int hash;
final K key;
volatile V val;
volatile Node<K,V> next;
Node(int hash, K key, V val, Node<K,V> next) {
this.hash = hash;
this.key = key;
this.val = val;
this.next = next;
}
public final K getKey() { return key; }
public final V getValue() { return val; }
public final int hashCode() { return key.hashCode() ^ val.hashCode(); }
public final String toString(){ return key + "=" + val; }
public final V setValue(V value) {
throw new UnsupportedOperationException();
}
public final boolean equals(Object o) {
Object k, v, u; Map.Entry<?,?> e;
return ((o instanceof Map.Entry) &&
(k = (e = (Map.Entry<?,?>)o).getKey()) != null &&
(v = e.getValue()) != null &&
(k == key || k.equals(key)) &&
(v == (u = val) || v.equals(u)));
}
/**
* Virtualized support for map.get(); overridden in subclasses.
*/
Node<K,V> find(int h, Object k) {
Node<K,V> e = this;
if (k != null) {
do {
K ek;
if (e.hash == h &&
((ek = e.key) == k || (ek != null && k.equals(ek))))
return e;
} while ((e = e.next) != null);
}
return null;
}
}
可以看到,其实基本上实现类似,不过ConcurrentHashMap中对于value、next增加了volatile修饰。
ConcurrentHashMap中定义的类
在比较内部实现之前,增加这块内容,从整体上把握,防止一叶障目不见泰山。当然是参照网上的资源:
主要的类都在这里,当然在jdk1.8中又增加了一些类,如下:
当然,虽然东西多,只抓主要的类,如下:
- Node类
Node类主要用于存储具体键值对,其子类有ForwardingNode、ReservationNode、TreeNode和TreeBin四个子类。
- Traverser类
Traverser类主要用于遍历操作,其子类有BaseIterator、KeySpliterator、ValueSpliterator、EntrySpliterator四个类,BaseIterator用于遍历操作。KeySplitertor、ValueSpliterator、EntrySpliterator则用于键、值、键值对的划分。
- CollectionView类
CollectionView抽象类主要定义了视图操作,其子类KeySetView、ValueSetView、EntrySetView分别表示键视图、值视图、键值对视图。对视图均可以进行操作。
- Segment类
Segment类在JDK1.8中与之前的版本的JDK作用存在很大的差别,JDK1.8下,其在普通的ConcurrentHashMap操作中已经没有用到,其在序列化与反序列化的时候会发挥作用。
- CounterCell
CounterCell类主要用于对baseCount的计数。