[Java] [Translation] HashMap Source Code Analysis (Part 1)

I. Questions

Here are some questions about HashMap that come up often in interviews:

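1. What are the differences between HashMap, Hashtable, and ConcurrentHashMap?

2. Which parameters matter most?

3. After a HashMap resize is triggered, roughly how does it use bit operations to reduce the time cost?

4. Across JDK releases, the way HashMap stores its elements was structurally optimized. What is the storage layout, and what is the lookup time complexity?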

As these questions suggest, HashMap matters a great deal, so this post focuses on some of its underlying details.

Let's look at the HashMap source code.

====== Translating takes effort; please credit the source when reposting ======

II. Summary

1. HashMap is a hash table based implementation of the Map interface. It provides all of the optional map operations and permits a null key and null values.

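2. The HashMap class is roughly equivalent to Hashtable, except that it is unsynchronized and permits nulls. The class makes no guarantees about the order of the map; in particular, it does not guarantee that the order will remain constant over time.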
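To make the null-handling difference in items 1 and 2 concrete, here is a minimal sketch. The class name `NullKeyDemo` and the helper `acceptsNulls` are made up for this illustration; only `HashMap`, `Hashtable`, and `Map.put` come from the JDK.

```java
import java.util.HashMap;
import java.util.Hashtable;
import java.util.Map;

class NullKeyDemo {
    // Hypothetical helper: returns true if the map accepts both a null key
    // and a null value, false if it rejects them with a NullPointerException.
    static boolean acceptsNulls(Map<String, String> map) {
        try {
            map.put(null, "value stored under the null key");
            map.put("key", null);
            return true;
        } catch (NullPointerException e) {
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println("HashMap accepts nulls:   " + acceptsNulls(new HashMap<>()));
        System.out.println("Hashtable accepts nulls: " + acceptsNulls(new Hashtable<>()));
    }
}
```

HashMap stores both mappings, while Hashtable rejects null keys and null values.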

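3. The implementation provides constant-time performance for the basic operations (get and put), assuming the hash function disperses the elements properly among the buckets. Iterating over the collection views takes time proportional to the capacity of the HashMap instance (the number of buckets) plus its size (the number of key-value mappings). So if iteration performance matters, it is important not to set the initial capacity too high (or the load factor too low).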

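4. An instance of HashMap has two parameters that affect its performance: the initial capacity and the load factor. The capacity is the number of buckets in the hash table. As a rough analogy, picture the bank of storage lockers at a supermarket: each locker is a bucket, and the hash of a key decides which locker its entry goes into. Give the map a key and it looks in that key's locker for the value; if the key is not there, it gets nothing back (null). More lockers is not free, though: a larger table takes more memory and takes longer to iterate over. The initial capacity is simply the capacity at the time the hash table is created. The load factor is a measure of how full the hash table is allowed to get before its capacity is automatically increased. When the number of entries in the hash table exceeds the product of the load factor and the current capacity, the hash table is rehashed (that is, its internal data structures are rebuilt) so that it has approximately twice the number of buckets.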
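The resize rule in item 4 is plain arithmetic, so a small model may help. This is a sketch of the rule as the Javadoc states it, not HashMap's actual internal code; `ResizeThresholdDemo`, `threshold`, and `capacityAfter` are hypothetical names.

```java
class ResizeThresholdDemo {
    // Entries a table of this capacity may hold before a resize is triggered:
    // the product of the load factor and the current capacity.
    static int threshold(int capacity, float loadFactor) {
        return (int) (capacity * loadFactor);
    }

    // Models the capacity after inserting `entries` mappings one by one,
    // doubling the table whenever the entry count exceeds the threshold.
    static int capacityAfter(int entries, int initialCapacity, float loadFactor) {
        int capacity = initialCapacity;
        for (int size = 1; size <= entries; size++) {
            if (size > threshold(capacity, loadFactor)) {
                capacity *= 2;
            }
        }
        return capacity;
    }

    public static void main(String[] args) {
        System.out.println(threshold(16, 0.75f));         // 12
        System.out.println(capacityAfter(12, 16, 0.75f)); // 16: still within the threshold
        System.out.println(capacityAfter(13, 16, 0.75f)); // 32: the 13th entry exceeds 12
    }
}
```

With the default capacity of 16 and load factor of 0.75, the model doubles the table when the 13th entry arrives.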

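5. As a general rule, the default load factor (0.75) offers a good tradeoff between time and space costs. Higher values decrease the space overhead but increase the lookup cost (reflected in most of the operations of the HashMap class, including get and put). The expected number of entries in the map and its load factor should be taken into account when setting the initial capacity, so as to minimize the number of rehash operations. If the initial capacity is greater than the maximum number of entries divided by the load factor, no rehash operations will ever occur.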
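The sizing rule in item 5 can be turned into a one-line calculation. The sketch below assumes the expected number of entries is known up front; `InitialCapacityDemo`, `capacityFor`, and `build` are hypothetical names, and only the `HashMap(int, float)` constructor is the JDK's.

```java
import java.util.HashMap;
import java.util.Map;

class InitialCapacityDemo {
    // Smallest int strictly greater than expectedEntries / loadFactor,
    // following the "greater than ... divided by the load factor" rule.
    static int capacityFor(int expectedEntries, float loadFactor) {
        return (int) (expectedEntries / loadFactor) + 1;
    }

    // Builds a map sized up front for `expectedEntries` mappings, then fills it.
    static Map<Integer, String> build(int expectedEntries) {
        float loadFactor = 0.75f;
        Map<Integer, String> map =
                new HashMap<>(capacityFor(expectedEntries, loadFactor), loadFactor);
        for (int i = 0; i < expectedEntries; i++) {
            map.put(i, "value-" + i);
        }
        return map;
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(1000, 0.75f)); // 1334
        System.out.println(build(1000).size());       // 1000
    }
}
```

The requested capacity is only a lower bound: the implementation may round it up, which does not affect the reasoning here.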

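6. If many mappings are to be stored in a HashMap instance, creating it with a sufficiently large capacity lets the mappings be stored more efficiently than letting the map rehash automatically as it grows. In other words, if everything can fit in one go, size the table for that up front rather than letting it expand again and again. **PS: It is like a company outing: ideally one big coach takes everyone, even if a few seats stay empty (though too many empty seats is also a waste). You don't want to find that the coach is too small and have to call for an extra car, let alone a second coach of the same size.** Note that using many keys with the same hashCode() is a sure way to slow down the performance of any hash table. To ameliorate the impact, when keys are Comparable, this class may use comparison order among the keys to help break ties.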
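As an illustration of the last point in item 6, the sketch below uses a deliberately bad key type whose hashCode() is constant, so every key lands in the same bucket. `CollidingKeysDemo` and `BadKey` are hypothetical names invented for this example. The map stays correct; the Javadoc only says that it may use the Comparable ordering to limit the slowdown.

```java
import java.util.HashMap;
import java.util.Map;

class CollidingKeysDemo {
    // Hypothetical key type: every instance reports the same hash code,
    // but instances are Comparable, so ties can be broken by ordering.
    static final class BadKey implements Comparable<BadKey> {
        final int id;

        BadKey(int id) { this.id = id; }

        @Override public int hashCode() { return 42; }

        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).id == id;
        }

        @Override public int compareTo(BadKey other) {
            return Integer.compare(id, other.id);
        }
    }

    // Fills a map with n keys that all collide.
    static Map<BadKey, Integer> fill(int n) {
        Map<BadKey, Integer> map = new HashMap<>();
        for (int i = 0; i < n; i++) {
            map.put(new BadKey(i), i);
        }
        return map;
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = fill(10_000);
        System.out.println(map.size());
        System.out.println(map.get(new BadKey(9_999)));
    }
}
```

Lookups still return the right values, just without the benefit of hashing; a well-distributed hashCode() remains the real fix.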

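7. Note that this implementation is not synchronized. If multiple threads access a hash map concurrently, and at least one of the threads modifies the map structurally, it must be synchronized externally. (A structural modification is any operation that adds or deletes one or more mappings; merely changing the value associated with a key that the instance already contains is not a structural modification.) This is typically accomplished by synchronizing on some object that naturally encapsulates the map.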

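8. If no such object exists, the map should be "wrapped" using the Collections.synchronizedMap method. This is best done at creation time, to prevent accidental unsynchronized access to the map: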

Map m = Collections.synchronizedMap(new HashMap(...));
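A slightly fuller sketch of the wrapper from item 8, with two threads writing to the same map. The class `SynchronizedMapDemo` and the method `fillFromTwoThreads` are hypothetical names. The `synchronized (map)` block around the iteration follows the documented requirement of `Collections.synchronizedMap` to lock on the returned map while traversing its collection views.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.Map;

class SynchronizedMapDemo {
    // Two threads insert disjoint key ranges; returns the number of keys seen.
    static int fillFromTwoThreads(int perThread) {
        // Wrap at creation time so no unsynchronized reference escapes.
        Map<Integer, Integer> map = Collections.synchronizedMap(new HashMap<>());

        Thread a = new Thread(() -> {
            for (int i = 0; i < perThread; i++) map.put(i, i);
        });
        Thread b = new Thread(() -> {
            for (int i = 0; i < perThread; i++) map.put(perThread + i, i);
        });
        a.start();
        b.start();
        try {
            a.join();
            b.join();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for writers", e);
        }

        // Individual calls are synchronized by the wrapper, but iteration
        // must hold the wrapper's lock manually.
        int count = 0;
        synchronized (map) {
            for (Integer ignored : map.keySet()) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        System.out.println(fillFromTwoThreads(1_000)); // 2000
    }
}
```

With a plain HashMap in place of the wrapper, the same two writers could lose entries or corrupt the table.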

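9. The iterators returned by all of this class's "collection view methods" are fail-fast: if the map is structurally modified at any time after the iterator is created, in any way except through the iterator's own remove method, the iterator throws a ConcurrentModificationException. Thus, in the face of concurrent modification, the iterator fails quickly and cleanly, rather than risking arbitrary, non-deterministic behavior at an undetermined time in the future.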

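10. Note that the fail-fast behavior of an iterator cannot be guaranteed: generally speaking, it is impossible to make any hard guarantees in the presence of unsynchronized concurrent modification. Fail-fast iterators throw ConcurrentModificationException on a best-effort basis. It would therefore be wrong to write a program that depends on this exception for its correctness; the fail-fast behavior of iterators should be used only to detect bugs.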
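The fail-fast behavior in items 9 and 10 is easy to reproduce in a single thread. The sketch below contrasts a structural change made behind the iterator's back with a removal through the iterator itself. `FailFastDemo` and its methods are hypothetical names. The exception shows up reliably in this single-threaded setup, but as item 10 says, that is a debugging aid and not something to build program logic on.

```java
import java.util.ConcurrentModificationException;
import java.util.HashMap;
import java.util.Iterator;
import java.util.Map;

class FailFastDemo {
    static Map<String, Integer> sample() {
        Map<String, Integer> map = new HashMap<>();
        map.put("a", 1);
        map.put("b", 2);
        map.put("c", 3);
        return map;
    }

    // Adds a mapping directly on the map while an iterator is open,
    // then reports whether the iterator's next step fails fast.
    static boolean modifyDuringIterationThrows() {
        Map<String, Integer> map = sample();
        Iterator<String> it = map.keySet().iterator();
        it.next();
        map.put("d", 4); // structural modification outside the iterator
        try {
            it.next();
            return false;
        } catch (ConcurrentModificationException e) {
            return true;
        }
    }

    // Removes odd values through the iterator's own remove method,
    // which is the permitted way to modify the map mid-iteration.
    static int removeOddValuesViaIterator() {
        Map<String, Integer> map = sample();
        Iterator<Map.Entry<String, Integer>> it = map.entrySet().iterator();
        while (it.hasNext()) {
            if (it.next().getValue() % 2 == 1) {
                it.remove();
            }
        }
        return map.size();
    }

    public static void main(String[] args) {
        System.out.println(modifyDuringIterationThrows());
        System.out.println(removeOddValuesViaIterator());
    }
}
```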

III. Appendix

Source code of java.util.HashMap:

/**
 * Hash table based implementation of the <tt>Map</tt> interface. This
 * implementation provides all of the optional map operations, and permits
 * <tt>null</tt> values and the <tt>null</tt> key. (The <tt>HashMap</tt>
 * class is roughly equivalent to <tt>Hashtable</tt>, except that it is
 * unsynchronized and permits nulls.) This class makes no guarantees as to
 * the order of the map; in particular, it does not guarantee that the order
 * will remain constant over time.
 *
 * This implementation provides constant-time performance for the basic
 * operations (<tt>get</tt> and <tt>put</tt>), assuming the hash function
 * disperses the elements properly among the buckets. Iteration over
 * collection views requires time proportional to the "capacity" of the
 * <tt>HashMap</tt> instance (the number of buckets) plus its size (the number
 * of key-value mappings). Thus, it's very important not to set the initial
 * capacity too high (or the load factor too low) if iteration performance is
 * important.
 *
 * An instance of <tt>HashMap</tt> has two parameters that affect its
 * performance: initial capacity and load factor. The
 * capacity is the number of buckets in the hash table, and the initial
 * capacity is simply the capacity at the time the hash table is created. The
 * load factor is a measure of how full the hash table is allowed to
 * get before its capacity is automatically increased. When the number of
 * entries in the hash table exceeds the product of the load factor and the
 * current capacity, the hash table is rehashed (that is, internal data
 * structures are rebuilt) so that the hash table has approximately twice the
 * number of buckets.
 *
 * As a general rule, the default load factor (.75) offers a good
 * tradeoff between time and space costs. Higher values decrease the
 * space overhead but increase the lookup cost (reflected in most of
 * the operations of the <tt>HashMap</tt> class, including
 * <tt>get</tt> and <tt>put</tt>). The expected number of entries in
 * the map and its load factor should be taken into account when
 * setting its initial capacity, so as to minimize the number of
 * rehash operations. If the initial capacity is greater than the
 * maximum number of entries divided by the load factor, no rehash
 * operations will ever occur.
 *
 * If many mappings are to be stored in a <tt>HashMap</tt>
 * instance, creating it with a sufficiently large capacity will allow
 * the mappings to be stored more efficiently than letting it perform
 * automatic rehashing as needed to grow the table. Note that using
 * many keys with the same {@code hashCode()} is a sure way to slow
 * down performance of any hash table. To ameliorate impact, when keys
 * are {@link Comparable}, this class may use comparison order among
 * keys to help break ties.
 *
 * Note that this implementation is not synchronized.
 * If multiple threads access a hash map concurrently, and at least one of
 * the threads modifies the map structurally, it must be
 * synchronized externally. (A structural modification is any operation
 * that adds or deletes one or more mappings; merely changing the value
 * associated with a key that an instance already contains is not a
 * structural modification.) This is typically accomplished by
 * synchronizing on some object that naturally encapsulates the map.
 *
 * If no such object exists, the map should be "wrapped" using the
 * {@link Collections#synchronizedMap Collections.synchronizedMap}
 * method. This is best done at creation time, to prevent accidental
 * unsynchronized access to the map:<pre>
 * Map m = Collections.synchronizedMap(new HashMap(...));</pre>
 *
 * The iterators returned by all of this class's "collection view methods"
 * are fail-fast: if the map is structurally modified at any time after
 * the iterator is created, in any way except through the iterator's own
 * <tt>remove</tt> method, the iterator will throw a
 * {@link ConcurrentModificationException}. Thus, in the face of concurrent
 * modification, the iterator fails quickly and cleanly, rather than risking
 * arbitrary, non-deterministic behavior at an undetermined time in the
 * future.
 *
 * Note that the fail-fast behavior of an iterator cannot be guaranteed
 * as it is, generally speaking, impossible to make any hard guarantees in the
 * presence of unsynchronized concurrent modification. Fail-fast iterators
 * throw <tt>ConcurrentModificationException</tt> on a best-effort basis.
 * Therefore, it would be wrong to write a program that depended on this
 * exception for its correctness: the fail-fast behavior of iterators
 * should be used only to detect bugs.
 *
 * This class is a member of the
 * <a href="{@docRoot}/../technotes/guides/collections/index.html">
 * Java Collections Framework</a>.
 *
 * @param <K> the type of keys maintained by this map
 * @param <V> the type of mapped values
 *
 * @author Doug Lea
 * @author Josh Bloch
 * @author Arthur van Hoff
 * @author Neal Gafter
 * @see Object#hashCode()
 * @see Collection
 * @see Map
 * @see TreeMap
 * @see Hashtable
 * @since 1.2
 */
