2020-03-04-Android的ArrayMap

前面介绍过HashMap和SparseArray的原理，今天简单看一下另一种数据结构，ArrayMap。实际上ArrayMap和SparseArray都是二分查找，它们的查询时间复杂度都是O(log n)。两者的数据结构存在差异，SparseArray是维持两个Object类型的数组，分别存放key和value，ArrayMap是维持一个int类型的hash数组，用于排序和搜索，和一个Object类型的数组，同时存放key和value。为了避免对象的频繁创建和回收，ArrayMap可以最多维持大小为4和8的两种缓存数组各10个。

ArrayMap原理.jpg

构造函数

首先看一下ArrayMap有两个数组，一个int类型的数组mHashes用来存储key值的hash，一个Object类型的数组mArray，用来存储key和value的键值对，它的大小是mHashes的两倍。mSize用来记录现存在多少对数据。

    int[] mHashes;
    Object[] mArray;
    int mSize;

同时ArrayMap本身维持两个缓存数组，这种缓存机制是为了避免对象的频繁创建和回收。mBaseCache容量大小是4，mTwiceBaseCache容量是8，最大缓存数组个数都是10.

    static Object[] mBaseCache;
    static Object[] mTwiceBaseCache;

要理解缓存机制，必须看一下内存分配和内存释放的过程。
构造函数有两个参数，capacity是容量，默认是0。identityHashCode是判断采用哪种hashCode。这里不详细介绍，可以看下hashCode和System.identityHashCode方法的区别。我的理解是hashCode方法是可以被重写的。

    public ArrayMap(int capacity, boolean identityHashCode) {
        mIdentityHashCode = identityHashCode;

        // If this is immutable, use the sentinal EMPTY_IMMUTABLE_INTS
        // instance instead of the usual EmptyArray.INT. The reference
        // is checked later to see if the array is allowed to grow.
        if (capacity < 0) {//1
            mHashes = EMPTY_IMMUTABLE_INTS;
            mArray = EmptyArray.OBJECT;
        } else if (capacity == 0) {//1
            mHashes = EmptyArray.INT;
            mArray = EmptyArray.OBJECT;
        } else {//2
            allocArrays(capacity);
        }
        mSize = 0;//3
    }

1.如果capacity小于等于0，构造两个空数组；
其中EmptyArray是一个final类，EmptyArray.OBJECT相当于new Object[0];EmptyArray.INT相当于new int[0];
2.如果capacity大于0，通过allocArrays分配内存。
3.初始状态下没有数据，mSize=0。
接下来看看allocArrays分配内存的方法。

    private void allocArrays(final int size) {
        if (mHashes == EMPTY_IMMUTABLE_INTS) {
            throw new UnsupportedOperationException("ArrayMap is immutable");
        }
        if (size == (BASE_SIZE*2)) {//1
            synchronized (ArrayMap.class) {
                if (mTwiceBaseCache != null) {//缓存不为空
                    final Object[] array = mTwiceBaseCache;//取出第一个缓存数组
                    mArray = array;//赋值给mArray
                    mTwiceBaseCache = (Object[])array[0];//将mTwiceBaseCache指向下一个缓存数组
                    mHashes = (int[])array[1];//将mHashes指向下一个hash数组
                    array[0] = array[1] = null;//将原来已取出的数组回收
                    mTwiceBaseCacheSize--;//缓存数组数量减一
                    if (DEBUG) Log.d(TAG, "Retrieving 2x cache " + mHashes
                            + " now have " + mTwiceBaseCacheSize + " entries");
                    return;
                }
            }
        } else if (size == BASE_SIZE) {//2
            synchronized (ArrayMap.class) {
                if (mBaseCache != null) {
                    final Object[] array = mBaseCache;
                    mArray = array;
                    mBaseCache = (Object[])array[0];
                    mHashes = (int[])array[1];
                    array[0] = array[1] = null;
                    mBaseCacheSize--;
                    if (DEBUG) Log.d(TAG, "Retrieving 1x cache " + mHashes
                            + " now have " + mBaseCacheSize + " entries");
                    return;
                }
            }
        }
        //3
        mHashes = new int[size];
        mArray = new Object[size<<1];
    }

1.如果需要分配的capacity容量等于8，直接将缓存数组mTwiceBaseCache赋值给mArray，然后将mTwichBaseCache指向下一个缓存数组，这里实际上是一个链表结构，最后回收已取出的缓存数组；
2.如果capacity容量等于4，将缓存数组mBaseCache赋值给mArray ，原理同上；
3.其他情况下，不使用缓存数组，直接通过new构建新数组。

数据获取

先看一下get方法。

    @Override
    public V get(Object key) {
        final int index = indexOfKey(key);
        return index >= 0 ? (V)mArray[(index<<1)+1] : null;
    }

它首先通过indexOfKey方法查找该元素所在的位置。

    public int indexOfKey(Object key) {
        return key == null ? indexOfNull()
                : indexOf(key, mIdentityHashCode ? System.identityHashCode(key) : key.hashCode());
    }

如果key不为null，通过indexOf方法查找。接下来看下indexOf的代码。

    int indexOf(Object key, int hash) {
        final int N = mSize;

        // Important fast case: if nothing is in here, nothing to look for.
        if (N == 0) {
            return ~0;//1
        }

        int index = binarySearchHashes(mHashes, N, hash);//2

        // If the hash code wasn't found, then we have no entry for this key.
        if (index < 0) {
            return index;//3
        }

        // If the key at the returned index matches, that's what we want.
        if (key.equals(mArray[index<<1])) {
            return index;//4
        }

        // Search for a matching key after the index.
        int end;
        for (end = index + 1; end < N && mHashes[end] == hash; end++) {
            if (key.equals(mArray[end << 1])) return end;//5
        }

        // Search for a matching key before the index.
        for (int i = index - 1; i >= 0 && mHashes[i] == hash; i--) {
            if (key.equals(mArray[i << 1])) return i;//6
        }

        // Key not found -- return negative value indicating where a
        // new entry for this key should go.  We use the end of the
        // hash chain to reduce the number of array entries that will
        // need to be copied when inserting.
        return ~end;//7
    }

1.如果当前数组大小为0，直接返回0的反码；
2.注释2处是一个二分查找方法binarySearchHashes，之前已经分析过；
3.如果index小于0，该key不存在，直接返回；
4.如果该索引位置key值相等，返回索引index；因为mArray存放的是key和value的键值对，大小是mHashes的两倍，所以需要对索引index左移一位，作乘2运算。
5.如果该位置key值不相等，说明存在冲突，从index往后寻找，找到key值直接返回。
6.从index往前查找，找到key值直接返回。
7.如果没有找到，说明key值不存在，返回当前位置的反码。

数据插入

数据插入主要是put方法。

    public V put(K key, V value) {
        final int osize = mSize;
        final int hash;
        int index;
        if (key == null) {//1
            hash = 0;
            index = indexOfNull();
        } else {//2
            hash = mIdentityHashCode ? System.identityHashCode(key) : key.hashCode();
            index = indexOf(key, hash);
        }
        if (index >= 0) {//3
            index = (index<<1) + 1;
            final V old = (V)mArray[index];
            mArray[index] = value;
            return old;
        }

        index = ~index;
        if (osize >= mHashes.length) {//4
            final int n = osize >= (BASE_SIZE*2) ? (osize+(osize>>1))
                    : (osize >= BASE_SIZE ? (BASE_SIZE*2) : BASE_SIZE);

            if (DEBUG) Log.d(TAG, "put: grow from " + mHashes.length + " to " + n);

            final int[] ohashes = mHashes;
            final Object[] oarray = mArray;
            allocArrays(n);//5

            if (CONCURRENT_MODIFICATION_EXCEPTIONS && osize != mSize) {
                throw new ConcurrentModificationException();
            }

            if (mHashes.length > 0) {//6
                if (DEBUG) Log.d(TAG, "put: copy 0-" + osize + " to 0");
                System.arraycopy(ohashes, 0, mHashes, 0, ohashes.length);
                System.arraycopy(oarray, 0, mArray, 0, oarray.length);
            }

            freeArrays(ohashes, oarray, osize);//7
        }

        if (index < osize) {//8
            if (DEBUG) Log.d(TAG, "put: move " + index + "-" + (osize-index)
                    + " to " + (index+1));
            System.arraycopy(mHashes, index, mHashes, index + 1, osize - index);
            System.arraycopy(mArray, index << 1, mArray, (index + 1) << 1, (mSize - index) << 1);
        }

        if (CONCURRENT_MODIFICATION_EXCEPTIONS) {
            if (osize != mSize || index >= mHashes.length) {
                throw new ConcurrentModificationException();
            }
        }
        mHashes[index] = hash;//9
        mArray[index<<1] = key;
        mArray[(index<<1)+1] = value;
        mSize++;
        return null;
    }

1.如果key值等于null，寻找null的索引；
2.key值不等于null，通过二分查找寻找索引；
3.如果key值存在，直接替换value，返回旧的value；
4.如果当前元素个数大于等于数组容量，需要进行扩容，如果数组容量小于8，以4或8为单位进行扩容；
5.扩容后需要通过allocArrays方法重新进行内存的分配；
6.如果旧的数组不为空，将旧数组内容复制到新数组；
6.将旧数组内存释放；
7.如果索引位置在原数组中间，将索引以后的内容后移；
8.插入key和value。
接下来看一下上面注释6处对内存的回收方法freeArrays。

    private static void freeArrays(final int[] hashes, final Object[] array, final int size) {
        if (hashes.length == (BASE_SIZE*2)) {
            synchronized (ArrayMap.class) {
                if (mTwiceBaseCacheSize < CACHE_SIZE) {
                    array[0] = mTwiceBaseCache;//1
                    array[1] = hashes;//2
                    for (int i=(size<<1)-1; i>=2; i--) {
                        array[i] = null;//3
                    }
                    mTwiceBaseCache = array;//4
                    mTwiceBaseCacheSize++;//5
                    if (DEBUG) Log.d(TAG, "Storing 2x cache " + array
                            + " now have " + mTwiceBaseCacheSize + " entries");
                }
            }
        } else if (hashes.length == BASE_SIZE) {//6
            synchronized (ArrayMap.class) {
                if (mBaseCacheSize < CACHE_SIZE) {
                    array[0] = mBaseCache;
                    array[1] = hashes;
                    for (int i=(size<<1)-1; i>=2; i--) {
                        array[i] = null;
                    }
                    mBaseCache = array;
                    mBaseCacheSize++;
                    if (DEBUG) Log.d(TAG, "Storing 1x cache " + array
                            + " now have " + mBaseCacheSize + " entries");
                }
            }
        }
    }

1.将回收的数组指向mTwiceBaseCache第一个元素位置；
2.回收的hash指向mTwiceBaseCache第二个元素位置；
3.该数组其它位置被赋值null，对象回收；
4.将mTwiceBaseCache指向数组；
5.缓存数量加1；
6.如果回收的数组长度是4，原理同上，增加一个mBaseCache 缓存。

java内存分配.jpg

最后举一个例子，从初始状态开始，mArray，mHashes，mBaseCache和mTwiceBaseCache是4个空数组，都没有分配内存。
1.首先通过put插入第一对数据；
2.通过indexOf方法查找应该插入的索引位置，因为此时mSize等于0，返回了0的反码；
3.因为数组容量是0，需要进行扩容，分配4个单位的内存；
4.在allocArrays方法中对mHashes和mArray进行初始化，长度分别是4和8，但是这里只是创建了两个引用数组，并没有创建对象，没有在堆中分配内存；
5.将hash插入mHashes数组的第一个位置，将key插入mArray数组第一个位置，value插入数组第二个位置，这里创建了对象也分配了内存；
6.通过put方法插入第二对数据，假设第二对数据的hash比原来小，将原来数据后移一位，往前面插入数据，重复3次，这样已经插入4对数据；
7.插入第五个数据的时候就需要扩容，分配8个单位的内存；
8.然后将原来的数组内存释放，这个时候就将原来的数组赋值给mBaseCache；
9.如果同个进程内的另外一块代码使用了allocArrays，请求分配4个单位的内存，就可以将mBaseCache分配给该方法。这样就避免了减少频繁地创建和回收Map对象。
参考：

http://gityuan.com/2019/01/13/arraymap/
https://droidyue.com/blog/2017/02/12/dive-into-arraymap-in-android/

2020-03-04-Android的ArrayMap

构造函数

数据获取

数据插入

推荐阅读更多精彩内容