How HashMap Is Implemented

I wrote this post to consolidate what I learned while reading several other blog posts about how HashMap works, and it borrows heavily from them.

HashMap Overview

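HashMap is an unsynchronized implementation of the Map interface backed by a hash table. It provides all of the optional map operations and permits null values and the null key. It makes no guarantees about the order of the mappings; in particular, it does not guarantee that the order will remain constant over time.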

HashMap Data Structure

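HashMap stores its data in an array combined with linked lists: the underlying structure is an array, and each element of that array is a linked list. When a new HashMap is created, the array is initialized (later JDK versions, including JDK 8, defer allocating it until the first insertion).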

Arrays

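An array occupies one contiguous block of memory, and the whole block must be reserved up front, even for slots that are not yet in use. Given the address of the first element, any element can be located by its index in O(1) time. Inserting or deleting an element, however, takes O(n) time, because every element after it has to be shifted. In short, arrays make lookup by index easy but make insertion and deletion costly.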
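A small Java sketch of the array trade-off follows. Reading by index is a single access, while inserting in the middle shifts every later element with System.arraycopy. ArrayInsertDemo and insertAt are illustrative names, not part of any library. Growing the array by doubling when it is full is an assumption of this sketch.

```java
import java.util.Arrays;

class ArrayInsertDemo {
    // Inserts value at index into arr, whose first `size` slots are in use.
    // Every element from index onward shifts one slot to the right, so the
    // cost grows with the number of elements after index: O(n).
    static int[] insertAt(int[] arr, int size, int index, int value) {
        // Assumption of this sketch: grow by doubling when the array is full.
        int[] out = (size < arr.length) ? arr : Arrays.copyOf(arr, Math.max(1, arr.length * 2));
        System.arraycopy(out, index, out, index + 1, size - index);
        out[index] = value;
        return out;
    }

    public static void main(String[] args) {
        int[] arr = {10, 20, 30, 40, 0};  // capacity 5, first 4 slots in use
        System.out.println(arr[2]);        // 30: direct access by index, O(1)
        arr = insertAt(arr, 4, 1, 15);
        System.out.println(Arrays.toString(arr)); // [10, 15, 20, 30, 40]
    }
}
```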

Linked Lists

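The nodes of a linked list are scattered through memory and connected by references. Given a reference to the neighbouring node, inserting or deleting a node takes O(1) time. Finding an element, however, requires walking the list from the head, which takes O(n) time. In short, linked lists make lookup hard but make insertion and deletion easy.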
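The next sketch is a minimal singly linked list in Java. The names LinkedListDemo, Node, pushFront, removeAfter and get are illustrative. Inserting at the head and removing the node after a known node only change references. Looking up a node by position has to walk the list.

```java
class LinkedListDemo {
    // Each node holds a value and a reference to the next node.
    static final class Node {
        final int value;
        Node next;
        Node(int value, Node next) { this.value = value; this.next = next; }
    }

    // Insert at the head: one new reference, O(1) regardless of list length.
    static Node pushFront(Node head, int value) {
        return new Node(value, head);
    }

    // Delete the node after prev: one reference update, O(1).
    static void removeAfter(Node prev) {
        if (prev.next != null) prev.next = prev.next.next;
    }

    // Lookup by position: walk from the head, O(n).
    static int get(Node head, int index) {
        Node cur = head;
        for (int i = 0; i < index && cur != null; i++) {
            cur = cur.next;
        }
        if (cur == null) throw new IndexOutOfBoundsException("index " + index);
        return cur.value;
    }

    public static void main(String[] args) {
        Node head = null;
        head = pushFront(head, 3);
        head = pushFront(head, 2);
        head = pushFront(head, 1);        // list: 1 -> 2 -> 3
        System.out.println(get(head, 2)); // 3, reached after following two links
        removeAfter(head);                // unlink 2: list is now 1 -> 3
        System.out.println(get(head, 1)); // 3
    }
}
```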

[Figure: HashMap storage structure, an array whose slots each hold a linked list]

As the figure above shows, HashMap combines the two structures. The table itself is an array, and each array slot holds a linked list of entries.

How HashMap Works

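HashMap has an inner class called Entry that holds a key and a value as instance fields. Every time a key-value pair is put into the HashMap, an Entry object is created for it and stored in the Entry array table (declared below). The hash of the key determines which slot of table the Entry goes into.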

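/**
 * The table, resized as necessary. Length MUST Always be a power of two.
 */
transient Entry<K,V>[] table;

static class Entry<K,V> implements Map.Entry<K,V> {
    final K key;
    V value;
    Entry<K,V> next; // next entry in the same bucket's linked list
    int hash;        // cached hash of key
    // constructor and Map.Entry methods omitted
}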

Entry is the element type of the array. Each Map.Entry is a key-value pair that also holds a reference to the next element, and these references form the linked list.

How HashMap Stores and Retrieves Entries

Storing

/**
 * Associates the specified value with the specified key in this map.
 * If the map previously contained a mapping for the key, the old
 * value is replaced.
 *
 * @param key key with which the specified value is to be associated
 * @param value value to be associated with the specified key
 * @return the previous value associated with <tt>key</tt>, or
 *         <tt>null</tt> if there was no mapping for <tt>key</tt>.
 *         (A <tt>null</tt> return can also indicate that the map
 *         previously associated <tt>null</tt> with <tt>key</tt>.)
 */
public V put(K key, V value) {
    // HashMap allows a null key.
    // For a null key, putForNullKey() stores the value in bucket 0 of the array.
    if (key == null)
        return putForNullKey(value);
    // Compute the hash of the key
    int hash = hash(key);
    // Map the hash to an index (bucket) in the array
    int i = indexFor(hash, table.length);
    // Walk the linked list in bucket i, looking for an entry with the same key
    for (Entry<K,V> e = table[i]; e != null; e = e.next) {
        Object k;
        if (e.hash == hash && ((k = e.key) == key || key.equals(k))) {
            // Key already present: replace the value and return the old one
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    // No entry in bucket i has this key (the bucket is empty or holds other keys)
    modCount++;
    // Add a new entry for this key and value to bucket i
    addEntry(hash, key, value, i);
    return null;
}

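The source above shows what happens when an element is put into a HashMap. First, the array index (bucket) is computed from the key's hash. Then the singly linked list at that index is traversed, and each entry's hash and key are compared with the new key (by == or equals). If the key is found, the new value overwrites the old one. If it is not found, a new entry is inserted at the head of the list. This head insertion is related to HashMap's thread-safety problems.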
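The put flow above can be condensed into a small sketch. MiniHashMap is a hypothetical teaching class, not the JDK source, and it differs from HashMap in three ways. It has a fixed 16-slot table and never resizes. It folds the null-key case into the general path (hash 0, keys compared with Objects.equals) instead of using a separate putForNullKey. Its hash() borrows JDK 8's h ^ (h >>> 16) mixing rather than JDK 7's hash function.

```java
import java.util.Objects;

class MiniHashMap<K, V> {
    static final class Entry<K, V> {
        final int hash;
        final K key;
        V value;
        Entry<K, V> next; // next entry in the same bucket
        Entry(int hash, K key, V value, Entry<K, V> next) {
            this.hash = hash; this.key = key; this.value = value; this.next = next;
        }
    }

    // Fixed power-of-two length; this sketch never resizes.
    @SuppressWarnings("unchecked")
    private final Entry<K, V>[] table = (Entry<K, V>[]) new Entry[16];

    // Null keys hash to 0, as in HashMap; other keys mix high bits into low bits.
    static int hash(Object key) {
        if (key == null) return 0;
        int h = key.hashCode();
        return h ^ (h >>> 16);
    }

    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public V put(K key, V value) {
        int hash = hash(key);
        int i = indexFor(hash, table.length);
        for (Entry<K, V> e = table[i]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) {
                V old = e.value;   // key already present: overwrite
                e.value = value;
                return old;
            }
        }
        table[i] = new Entry<>(hash, key, value, table[i]); // key absent: insert at head
        return null;
    }

    public V get(Object key) {
        int hash = hash(key);
        for (Entry<K, V> e = table[indexFor(hash, table.length)]; e != null; e = e.next) {
            if (e.hash == hash && Objects.equals(e.key, key)) return e.value;
        }
        return null;
    }

    public static void main(String[] args) {
        MiniHashMap<String, Integer> m = new MiniHashMap<>();
        System.out.println(m.put("a", 1)); // null: new key
        System.out.println(m.put("a", 2)); // 1: old value replaced
        System.out.println(m.get("a"));    // 2
    }
}
```

The null-key path works the same way in this sketch: put(null, v) lands in bucket 0 because hash(null) is 0.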

The putForNullKey(V value) method

/**
 * Offloaded version of put for null keys
 */
private V putForNullKey(V value) {
    // The null key always lives in bucket 0; replace its value if it already exists
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null) {
            V oldValue = e.value;
            e.value = value;
            e.recordAccess(this);
            return oldValue;
        }
    }
    modCount++;
    // No null key yet: add one to bucket 0 with hash 0
    addEntry(0, null, value, 0);
    return null;
}
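putForNullKey shows the JDK 7 internals. With java.util.HashMap, the observable effect is that a map holds at most one null key, and putting it again overwrites its value. The demo class name below is illustrative.

```java
import java.util.HashMap;
import java.util.Map;

class NullKeyDemo {
    public static void main(String[] args) {
        Map<String, String> map = new HashMap<>();
        System.out.println(map.put(null, "first"));  // null: no previous mapping for the null key
        System.out.println(map.put(null, "second")); // first: the single null-key entry is overwritten
        System.out.println(map.get(null));           // second
        System.out.println(map.size());              // 1
    }
}
```

By contrast, Hashtable and ConcurrentHashMap reject null keys with a NullPointerException.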

The addEntry(int hash, K key, V value, int bucketIndex) method

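addEntry places the key-value pair at the bucket index i that put computed from the hash. It is a package-private method of HashMap. It is shown here together with the size and threshold fields it uses: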

/**
 * The number of key-value mappings contained in this map.
 */
transient int size;

/**
 * The next size value at which to resize (capacity * load factor).
 * With the default load factor of 0.75 this is 75% of the capacity;
 * when it is reached, the table is doubled.
 * @serial
 */
int threshold;

/**
 * Adds a new entry with the specified key, value and hash code to
 * the specified bucket.  It is the responsibility of this
 * method to resize the table if appropriate.
 *
 * Subclass overrides this to alter the behavior of put method.
 */
void addEntry(int hash, K key, V value, int bucketIndex) {
    // Resize only when size has reached the threshold and the target bucket is occupied
    if ((size >= threshold) && (null != table[bucketIndex])) {
        resize(2 * table.length);                   // double the capacity
        hash = (null != key) ? hash(key) : 0;       // recompute the hash (null key -> 0)
        bucketIndex = indexFor(hash, table.length); // recompute the bucket in the new table
    }

    createEntry(hash, key, value, bucketIndex);
}
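indexFor is called in put and again in addEntry after a resize, but its body is not shown above. In the JDK 7 source it is return h & (length-1);. The sketch below reuses that body to show why the table length must be a power of two: for such lengths, the bit mask equals a non-negative modulo. It also shows what doubling the table does to an index when the hash itself is unchanged. IndexForDemo is an illustrative name.

```java
class IndexForDemo {
    // Same body as JDK 7's HashMap.indexFor: keep the low bits of h.
    static int indexFor(int h, int length) {
        return h & (length - 1);
    }

    public static void main(String[] args) {
        for (int h : new int[] {5, 17, 33, 12345, -7}) {
            // For a power-of-two length, the mask equals Math.floorMod(h, length)
            // and is never negative, unlike h % length for negative h.
            System.out.println(h + ": indexFor=" + indexFor(h, 16)
                    + " floorMod=" + Math.floorMod(h, 16) + " %=" + (h % 16));
        }
        // After doubling, one more hash bit takes part in the index, so an entry
        // either stays at i or moves to i + oldLength.
        System.out.println(indexFor(17, 16) + " -> " + indexFor(17, 32)); // 1 -> 17
        System.out.println(indexFor(33, 16) + " -> " + indexFor(33, 32)); // 1 -> 1
    }
}
```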

The createEntry(int hash, K key, V value, int bucketIndex) method

/**
 * Like addEntry except that this version is used when creating entries
 * as part of Map construction or "pseudo-construction" (cloning,
 * deserialization).  This version needn't worry about resizing the table.
 *
 * Subclass overrides this to alter the behavior of HashMap(Map),
 * clone, and readObject.
 */
void createEntry(int hash, K key, V value, int bucketIndex) {
    // Current head of the bucket's list (null if the bucket is empty)
    Entry<K,V> e = table[bucketIndex];
    // The new entry points to the old head and becomes the new head
    table[bucketIndex] = new Entry<>(hash, key, value, e);
    size++;
}
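createEntry puts the new entry in front of the existing chain, so the entries in one bucket end up in reverse insertion order. The sketch models only that step, using a hypothetical String-keyed Entry rather than the JDK class. JDK 8 changed this behavior and appends new nodes to the tail of the list.

```java
class HeadInsertDemo {
    static final class Entry {
        final String key;
        final Entry next;
        Entry(String key, Entry next) { this.key = key; this.next = next; }
    }

    // Mirrors createEntry: the new entry points at the old head and becomes the new head.
    static Entry createEntry(Entry bucketHead, String key) {
        return new Entry(key, bucketHead);
    }

    // Renders the chain from head to tail.
    static String chain(Entry head) {
        StringBuilder sb = new StringBuilder();
        for (Entry e = head; e != null; e = e.next) {
            if (sb.length() > 0) sb.append(" -> ");
            sb.append(e.key);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        Entry bucket = null;
        for (String k : new String[] {"A", "B", "C"}) {
            bucket = createEntry(bucket, k);
        }
        System.out.println(chain(bucket)); // C -> B -> A
    }
}
```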

Retrieving

/**
 * Returns the value to which the specified key is mapped,
 * or {@code null} if this map contains no mapping for the key.
 *
 * <p>More formally, if this map contains a mapping from a key
 * {@code k} to a value {@code v} such that {@code (key==null ? k==null :
 * key.equals(k))}, then this method returns {@code v}; otherwise
 * it returns {@code null}.  (There can be at most one such mapping.)
 *
 * <p>A return value of {@code null} does not <i>necessarily</i>
 * indicate that the map contains no mapping for the key; it's also
 * possible that the map explicitly maps the key to {@code null}.
 * The {@link #containsKey containsKey} operation may be used to
 * distinguish these two cases.
 *
 * @see #put(Object, Object)
 */
public V get(Object key) {
    // For a null key, getForNullKey() walks the list in bucket 0
    if (key == null)
        return getForNullKey();
    // Otherwise locate the matching entry by hash and equals
    Entry<K,V> entry = getEntry(key);

    return null == entry ? null : entry.getValue();
}

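The source above shows how HashMap gets an element. It first computes the key's hash to find the corresponding slot in the array. It then walks the linked list in that slot and compares keys with equals to find the entry it needs.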
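As the Javadoc above notes, get returns null both when a key is absent and when the key is mapped to null. The quick check below runs against java.util.HashMap; the class name is illustrative. It shows containsKey telling the two cases apart. getOrDefault (Java 8+) likewise falls back to the default only when the key is absent.

```java
import java.util.HashMap;
import java.util.Map;

class GetVsContainsKey {
    public static void main(String[] args) {
        Map<String, Integer> map = new HashMap<>();
        map.put("present", null);

        System.out.println(map.get("present"));              // null: key maps to null
        System.out.println(map.get("absent"));               // null: key is not in the map
        System.out.println(map.containsKey("present"));      // true
        System.out.println(map.containsKey("absent"));       // false
        System.out.println(map.getOrDefault("present", -1)); // null: key exists, so no default
        System.out.println(map.getOrDefault("absent", -1));  // -1
    }
}
```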

The getForNullKey() method

/**
 * Offloaded version of get() to look up null keys.  Null keys map
 * to index 0.  This null case is split out into separate methods
 * for the sake of performance in the two most commonly used
 * operations (get and put), but incorporated with conditionals in
 * others.
 */
private V getForNullKey() {
    for (Entry<K,V> e = table[0]; e != null; e = e.next) {
        if (e.key == null)
            return e.value;
    }
    return null;
}

The getEntry(Object key) method

/**
 * Returns the entry associated with the specified key in the
 * HashMap.  Returns null if the HashMap contains no mapping
 * for the key.
 */
final Entry<K,V> getEntry(Object key) {
    int hash = (key == null) ? 0 : hash(key);
    for (Entry<K,V> e = table[indexFor(hash, table.length)];
         e != null;
         e = e.next) {
        Object k;
        if (e.hash == hash &&
            ((k = e.key) == key || (key != null && key.equals(k))))
            return e;
    }
    return null;
}

Summary

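Internally, HashMap treats each key-value pair as a single unit, an Entry object, and keeps all of them in an Entry[] array. To store an Entry, it uses the hash to choose the array slot. It then uses equals to decide where the Entry goes in that slot's linked list: an existing entry with an equal key is updated, and otherwise a new entry is added. To retrieve an Entry, it again uses the hash to locate the slot and then uses equals to pick the Entry out of that slot's list.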
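The sketch below shows the division of labour between hash and equals. BadKey is a hypothetical key class whose hashCode() always returns 42, so every key lands in the same bucket. HashMap still keeps distinct keys apart and overwrites equal ones, because equals decides the match inside the bucket. The cost is that every lookup scans the same crowded bucket.

```java
import java.util.HashMap;
import java.util.Map;

class CollisionDemo {
    // Hypothetical key type with a deliberately constant hashCode(),
    // so only equals() can tell two keys apart.
    static final class BadKey {
        final String name;
        BadKey(String name) { this.name = name; }
        @Override public int hashCode() { return 42; }
        @Override public boolean equals(Object o) {
            return o instanceof BadKey && ((BadKey) o).name.equals(name);
        }
    }

    public static void main(String[] args) {
        Map<BadKey, Integer> map = new HashMap<>();
        map.put(new BadKey("x"), 1);
        map.put(new BadKey("y"), 2);
        map.put(new BadKey("x"), 3); // equals the first key: value overwritten, no new entry
        System.out.println(map.size());               // 2
        System.out.println(map.get(new BadKey("x"))); // 3
        System.out.println(map.get(new BadKey("y"))); // 2
    }
}
```

The reverse mistake, overriding equals without hashCode, usually breaks lookups. Two equal keys then hash to different buckets, so get never reaches the equals comparison.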

Going Further

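Two important parameters of HashMap are the capacity and the load factor, which default to 16 and 0.75. When the number of entries exceeds capacity * loadFactor (16 * 0.75 = 12 by default), the map resizes to twice its capacity. Both values can be set when the HashMap is constructed. The HashMap documentation describes the default load factor of 0.75 as a good tradeoff between time and space costs, so it is rarely changed. That leaves the initial capacity as the setting that determines how often the map resizes. Resizing is expensive, because every entry has to be moved into the new table. If you can estimate in advance how many entries the map will hold, set a suitable initial capacity when you create it to avoid repeated resizes.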
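The sketch below makes the cost of the default capacity concrete by counting resizes under a simplified model. The model starts at a given capacity and doubles whenever the size exceeds capacity * loadFactor. That is JDK 8's rule; JDK 7 additionally requires the target bucket to be occupied, so its counts can differ slightly. ResizeCount, resizes, nextPowerOfTwo and capacityFor are hypothetical helpers, not HashMap methods.

```java
class ResizeCount {
    // Simplified model (an assumption, not JDK code): double the capacity
    // whenever size exceeds capacity * loadFactor.
    static int resizes(int capacity, float loadFactor, int entries) {
        int count = 0;
        int threshold = (int) (capacity * loadFactor);
        for (int size = 1; size <= entries; size++) {
            if (size > threshold) {
                capacity *= 2;
                threshold = (int) (capacity * loadFactor);
                count++;
            }
        }
        return count;
    }

    // Smallest power of two >= n (for n >= 1), like HashMap's table sizing.
    static int nextPowerOfTwo(int n) {
        int p = 1;
        while (p < n) p <<= 1;
        return p;
    }

    // Initial capacity large enough that `expected` entries never trigger a resize.
    static int capacityFor(int expected, float loadFactor) {
        return nextPowerOfTwo((int) Math.ceil(expected / (double) loadFactor));
    }

    public static void main(String[] args) {
        System.out.println(resizes(16, 0.75f, 1000));  // 7 (16 -> 2048)
        int cap = capacityFor(1000, 0.75f);
        System.out.println(cap);                        // 2048
        System.out.println(resizes(cap, 0.75f, 1000)); // 0
    }
}
```

Under this model, 1000 insertions into a default HashMap trigger 7 resizes (16 up to 2048). Starting with capacity 2048 (1000 / 0.75, rounded up to the next power of two) triggers none. On Java 19 and later, HashMap.newHashMap(int numMappings) computes a suitable capacity for you.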

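Java 8 optimized HashMap so that a bucket can be stored either as a linked list or as a red-black tree. HashMap chooses the form per bucket, based on how many entries that bucket holds. In putVal, after a new node is appended to the tail of a bucket's list, the following check runs:
if (binCount >= TREEIFY_THRESHOLD - 1) // -1 for 1st
    treeifyBin(tab, hash);
TREEIFY_THRESHOLD is 8, so this condition holds once the list already held 8 nodes before the append. treeifyBin then converts the list into a red-black tree of TreeNode objects (not a TreeMap), which reduces lookups in that bucket from O(n) to O(log n). If the table is still smaller than MIN_TREEIFY_CAPACITY (64), treeifyBin resizes the table instead.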
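The treeify decision described above can be modelled in a few lines. TREEIFY_THRESHOLD and MIN_TREEIFY_CAPACITY carry their JDK 8 values. TreeifyDecision and afterAppend are hypothetical helpers: they reproduce only the branching in putVal and treeifyBin, not the tree construction itself.

```java
class TreeifyDecision {
    // Same values as the constants in JDK 8's java.util.HashMap.
    static final int TREEIFY_THRESHOLD = 8;
    static final int MIN_TREEIFY_CAPACITY = 64;

    // Hypothetical helper: what happens after a new node is appended to a bucket
    // that already held existingNodes (>= 1) nodes, in a table of tableLength.
    static String afterAppend(int existingNodes, int tableLength) {
        // putVal's loop starts at the bucket's first node, so binCount equals
        // existingNodes - 1 at the moment the new node is appended.
        int binCount = existingNodes - 1;
        if (binCount < TREEIFY_THRESHOLD - 1) {
            return "keep list";
        }
        // treeifyBin grows a small table instead of building a tree.
        return tableLength < MIN_TREEIFY_CAPACITY ? "resize" : "treeify";
    }

    public static void main(String[] args) {
        System.out.println(afterAppend(7, 64)); // keep list: the bucket now has 8 nodes
        System.out.println(afterAppend(8, 16)); // resize: 9 nodes, but the table is smaller than 64
        System.out.println(afterAppend(8, 64)); // treeify: 9 nodes in a table of 64
    }
}
```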