Java的集合类比较庞大，来看一个简单的架构图。

http://www.cnblogs.com/yangliguo/p/7476788.html

我们可以看到Java的Set中的数据是不重复的，无序的。

而HashSet是基于HashMap来实现的，使用了HashMap的Key来实现各种操作。由于HashMap的key不允许重复，而允许null值，并且线程不安全，所以HashSet也是不允许值重复，允许null值，线程不安全的。

1. HashSet的实现

构造方法

HashSet的构造方法指明了hashSet内部用到的是一个HashMap，并且HashMap的键是我们储存在HashSet里的数据，而值是一个PRESENT的空对象。

可以看到，构造初始化的HashMap容量是16，装载因子是0.75
当然，你可以自行设置你需要的大小和装载因子。

 /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * the specified initial capacity and the specified load factor.
     *
     * @param      initialCapacity   the initial capacity of the hash map
     * @param      loadFactor        the load factor of the hash map
     * @throws     IllegalArgumentException if the initial capacity is less
     *             than zero, or if the load factor is nonpositive
     */
    public HashSet(int initialCapacity, float loadFactor) {
        map = new HashMap<>(initialCapacity, loadFactor);
    }

  /**
     * Constructs a new, empty set; the backing <tt>HashMap</tt> instance has
     * the specified initial capacity and default load factor (0.75).
     *
     * @param      initialCapacity   the initial capacity of the hash table
     * @throws     IllegalArgumentException if the initial capacity is less
     *             than zero
     */
    public HashSet(int initialCapacity) {
        map = new HashMap<>(initialCapacity);
    }

不例外，HashSet也同样可以通过Collection来做初始化。

  /**
     * Constructs a new set containing the elements in the specified
     * collection.  The <tt>HashMap</tt> is created with default load factor
     * (0.75) and an initial capacity sufficient to contain the elements in
     * the specified collection.
     *
     * @param c the collection whose elements are to be placed into this set
     * @throws NullPointerException if the specified collection is null
     */
    public HashSet(Collection<? extends E> c) {
        map = new HashMap<>(Math.max((int) (c.size()/.75f) + 1, 16));
        addAll(c);
    }

Math.max((int) (c.size()/.75f) + 1, 16)的意思是，通过原始集合的长度来算HashMap应开辟的空间，但无论如何，HashMap都长度不应该小于16。

2. 不重复原理

要怎样才能做到不重复？

下面是HashSet里的add()方法

 /**
     * Adds the specified element to this set if it is not already present.
     * More formally, adds the specified element <tt>e</tt> to this set if
     * this set contains no element <tt>e2</tt> such that
     * <tt>(e==null&nbsp;?&nbsp;e2==null&nbsp;:&nbsp;e.equals(e2))</tt>.
     * If this set already contains the element, the call leaves the set
     * unchanged and returns <tt>false</tt>.
     *
     * @param e element to be added to this set
     * @return <tt>true</tt> if this set did not already contain the specified
     * element
     */
    public boolean add(E e) {
        return map.put(e, PRESENT)==null;
    }

关注这一段注释:
If this set already contains the element, the call leaves the set
* unchanged and returns <tt>false</tt>.

add方法返回true说明添加成功，返回false说明集合中已有该元素，添加失败。

而这一切又是由HashMap来决定的。
将上面的问题换个说法，如何判断两个对象是相等的？我们知道两个对象相等之后，就可以不添加后者，保证序列元素的唯一性。

下面有两个事情要考虑。

引用相等
对象相等

引用相等

示意如下：

Head First Java

我们都知道如果如下代码应该返回的结果：

public class Main {
    public static void main(String[] args) {
        Object obj = new Object();
        Object obj2 = obj;
        System.out.println(obj == obj2);
    }
}

两个引用指向同一个对象，所以引用相等。

对象相等

但是对象相等是个什么概念呢？
如果说要把堆上的两个对象视为相等的，那么不仅仅要满足equal的成立，也要满足hashCode的成立。

看这样一段代码

public class Main {
    public static void main(String[] args) {
        String a = new String("SS");
        String b = new String("SS");

        System.out.println(a.hashCode());
        System.out.println(b.hashCode());
        System.out.println(a.hashCode() == b.hashCode());
    }
}

输出的结果是true。
这样就能判断这两个对象是相等的，因为在String类中，也有复写相应的hashCode方法。

那下面我们重写一个类。

package SetDemo;

public class Test {
    private int id;
    private String s;

    public Test(int id, String s) {
        this.id = id;
        this.s = s;
    }

    @Override
    public boolean equals(Object obj) {
        if(obj instanceof Test){
            Test a = (Test) obj;
            return a.id == this.id && this.s.equals(a.s);
        }
        return false;
    }
}

其中我们简要重写了equal方法
经测试

public class Main {
    public static void main(String[] args) {
        Test a = new Test(2,"233");
        Test b = new Test(2,"233");

        System.out.println(a.equals(b));   //返回true
        System.out.println(a.hashCode() == b.hashCode());  //返回false
    }
}

返回结果是equal，但是hashCode不相等。
有了这个铺垫，就知道为什么HashSet能做到检查重复了。

当你把对象加入HashSet的时候，他会使用hashCode来判断对象需要放入的位置，但同时也会与其它已经加入的对象的hashCode做对比，如果没有出现hashCode一致的情况，就判断没有元素重复。也就是说，只有在hashCode不同的时候，HashSet才把两个对象判断成不同的两个对象。

这样我们上面的例子，两个对象明明equal，却因为hashCode不同而被判断不相同，所以为了避免这种情况，我们必须重写hashCode方法。

上面就是一道很常见的面试题的答案：为什么重写equal()方法也要重写hashCode()方法？

这里直接使用s的hashCode方法

@Override
    public int hashCode() {
        return this.s.hashCode();
    }

3. HashCode

思考一些问题，API规定，

如果两个对象相等，那么他们的hashCode也一定要相等；
如果两个对象相等，那么调用的equal方法也必须返回true
如果两个对象hashCode相等，但是他们不一定是相等的（前面复写的hashCode方法仅仅用了s的hashCode，但不能保证id一定相等。）
如果equal被覆盖过，所以hashCode也必须被覆盖
hashCode来源于内存规则，由堆上内存分配生成的一个数字，所以说如果不重写hashCode()，两个对象是永远不会相等的。

至于hashCode在某些情况下为什么会相等，这主要是取决于hashMap的哈希算法了，数据结构中所涉及到的。

HashSet浅谈