HashMap分析面试总结

本着针对面试，不负责任的态度，写下《面试总结》系列。本系列记录面试过程中各个知识点，而不是入门系列，如果有不懂的自行学习。

基础知识
1. 可以接受null键和值，而Hashtable则不能
2. 非synchronized，所以很快
3. 存储的是键值对
4. 使用数组+链表的方式存储数据
5. 对key进行hash(散列)算法，所以顺序不固定
6. 实际使用Node存储
常量&变量


 // public class HashMap extends AbstractMap<K,V> implements Map<K,V>, Cloneable, Serializable {}
  /**
   * The default initial capacity - MUST be a power of two.
   默认数组长度
   */
  static final int DEFAULT_INITIAL_CAPACITY = 1 << 4; 

  /**
   * The maximum capacity, used if a higher value is implicitly specified
   * by either of the constructors with arguments.
   * MUST be a power of two <= 1<<30.
   * 数组最大长度
   */
  static final int MAXIMUM_CAPACITY = 1 << 30;

  /**
   * The load factor used when none specified in constructor.
   * 默认装填因子
   */
  static final float DEFAULT_LOAD_FACTOR = 0.75f;
  
  
  static class Node<K,V> implements Map.Entry<K,V> {
      final int hash;
      final K key;
      V value;
      Node<K,V> next;
  }
  
    /**
   * The number of key-value mappings contained in this map.
   */
  transient int size;

  /**
   * 阈值
   * The next size value at which to resize (capacity * load factor).
   *
   * @serial
   */
  // (The javadoc description is true upon serialization.
  // Additionally, if the table array has not been allocated, this
  // field holds the initial array capacity, or zero signifying
  // DEFAULT_INITIAL_CAPACITY.)
  int threshold;
  
  public V put(K key, V value) {
      return putVal(hash(key), key, value, false, true);
  }
  //实际存储方法
  final V putVal(int hash, K key, V value, boolean onlyIfAbsent,boolean evict) {}
  //扩容方法
  final Node<K,V>[] resize() {}
  
  
  public V get(Object key) {
      Node<K,V> e;
      return (e = getNode(hash(key), key)) == null ? null : e.value;
  }
  //实际取值方法
  final Node<K,V> getNode(int hash, Object key) {}

用法
1. put(key,value)
  调用hashCode(key)，使用node存储hash,key,value，如果hashcode存在则使用链表存储。
2. get(key)
  根据key的hashcode找到Entry，然后获取值对象，如果根据hashcode找到的是个链表，再去根据key.equals()判断，链表中正确的节点。

关于扩容

当HashMap的大小超过了阈值(size> threshold)的时候(默认的装填因子为0.75，也就是说当一个map填满了它定义容量的75%就会去扩容),HashMap大小会扩大到原来的2倍。整个过程类似于创建新的数组，将原数组的元素重新hash后放到新数组中(rehashing)。

HashMap是非同步的，所以在多线程中使用时需要注意扩容等问题

相关概念
- hashing的概念
- HashMap中解决碰撞的方法
- equals()和hashCode()的应用，以及它们在HashMap中的重要性
- 不可变对象的好处
- HashMap多线程的条件竞争
- 重新调整HashMap的大小

参考地址：http://www.importnew.com/7099.html

以上是网上能搜到的解释，下面是个人总结的知识点提要

如面试遇到此问题，第一步，反问面试官，您说的是哪个版本的HashMap

hashmap底层使用数组+链表的数据结构，实现存储数据，使用拉链法解决碰撞问题。
map.put(key,value)的时候，内部会对key进行一次hash算法，得到一个hash值，对这个hash值&操作得到元素在数组中的位置。
如果该位置没有元素，那么直接加入，如果发生碰撞了，那么用拉链法，需要遍历链表比较key和hash值，如果有就覆盖，没有就到表尾了，所以会插到表尾。
初始容量为16，加载因子0.75，当map添加的元素超过12个的时候会触发扩容机制。数组的容量翻倍，已经存入的元素做rehash的操作，重新在数组中找位置存储。
java8后改为碰撞链表元素超过8个，用红黑树实现
java8在表尾，java7是在链表头插入

思考点：
什么情况下考虑使用SparseArray和ArrayMap替换HashMap的情况

相关面试题

1. 为什么HashMap的容量总是2^x？

从源码中可以看到，当putVal方法中，是通过tab[i = (n - 1) & hash]得到在数组中位置的。
依稀记得当年在学校中，学到hash算法的时候，学的都是n%size运算，来确定数值在数组中的位置，而HashMap中为什么要用到&运算呢。

原因如下

大家都知道&运算要比%运算速度快，虽然可能是几毫米的差别。
在n为2^x时，(n-1)&hash == hash%n

为什么容量总是2^x？
首先，Hash算法要解决的一个最大的问题，就是hash冲突，既然不能避免hash冲突，那么就要有个好的算法解决。

而在做&运算时，如果选用非2ⁿ的数时，n-1转换为二进制，不能保证后几位全为1，这样做在&hash的运算中，不能做到均匀分布。违背了(n-1)&hash的初衷。

(16)₁₀ = 2⁴ = (10000)₂
(16-1)₁₀ =(1111)₂
假设n的值非2^x值，10
(10-1)₁₀ =(1001)₂
(19-1)₁₀ =(10011)₂

10011
&1111
=(11)₂=(3)₁₀

10011
&1001
=(1)₂=(1)₁₀

同样的%运算，19%16 = 3 ，19%10 = 9。

任意一个数与(1111)₂做&运算，都不会因为(1111)₂的值而影响到运算结果。

2. 如果初始化HashMap的时候定义大小为非2^x会影响到计算吗？

答案是，肯定不会，这种情况JAVA的工程师肯定考虑到了。

源码中我们可以看到，传入的capacity只是影响到了threshold的值，而threshold的值还是通过tableSizeFor()确定的。

  public HashMap(int initialCapacity) {
        this(initialCapacity, DEFAULT_LOAD_FACTOR);
    }

public HashMap(int initialCapacity, float loadFactor) {
        if (initialCapacity < 0)
            throw new IllegalArgumentException("Illegal initial capacity: " + initialCapacity);
        if (initialCapacity > MAXIMUM_CAPACITY)
            initialCapacity = MAXIMUM_CAPACITY;
        if (loadFactor <= 0 || Float.isNaN(loadFactor))
            throw new IllegalArgumentException("Illegal load factor: " + loadFactor);
        this.loadFactor = loadFactor;
        this.threshold = tableSizeFor(initialCapacity);
    }

在tableSizeFor()方法中。

 static final int tableSizeFor(int cap) {
      // cap=10
        int n = cap - 1;
        // n  =9   1001
        n |= n >>> 1;
    // (1001)|(0100)=1101
        n |= n >>> 2;
    //(1101)|(0011)=1111
        n |= n >>> 4;
     // (1111)|(0000)=1111
        n |= n >>> 8;
    // (1111)|(0000)=1111
        n |= n >>> 16;
    // (1111)|(0000)=1111
      //return n+1 = (10000)=16
  //确保threshold 为16， 2的4次幂
        return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
    }

在putVal()方法中，如果第一次添加值，那么table==null，会进入到resize()方法中，这个时候，就会用到threshold创建新的Node数组。

final Node<K,V>[] resize() {
        Node<K,V>[] oldTab = table;
      //第一次添加值，table==null; oldCap = 0;
        int oldCap = (oldTab == null) ? 0 : oldTab.length;
      //将threshold的值设置为oldThr，下面创建table的时候用到
        int oldThr = threshold;
        int newCap, newThr = 0;
        if (oldCap > 0) {
           ....
        }
        else if (oldThr > 0) 
          //通过threshold设置新数组容量
            newCap = oldThr;
        else { 
            ....
        }
        if (newThr == 0) {
           ....
        }
        threshold = newThr;
        @SuppressWarnings({"rawtypes","unchecked"})
        //通过threshold设置table的初始容量
        Node<K,V>[] newTab = (Node<K,V>[])new Node[newCap];
        table = newTab;
       ....
        return newTab;
    }

通过以上操作，不论初始化HashMap的时候，传入的容量是多少，都能保证HashMap的容量是2^x。

HashMap分析面试总结