线性表是一个具有n个相同特性的数据元素的有限序列。定义中说明了一个线性表中存在同样的一种数据元素,并且长度有限、元素有序,在JCF中的实现是java.util.ArrayList
。
1.源码注释
- Resizable-array implementation of the <tt>List</tt> interface. Implements
all optional list operations, and permits all elements, including
<tt>null</tt>. In addition to implementing the <tt>List</tt> interface this class provides methods to manipulate the size of the array that is used internally to store the list. (This class is roughly equivalent to <tt>Vector</tt>, except that it is unsynchronized.)
ArrayList是List接口的可变长数组实现,实现了List中所有的可选操作(个人理解可选操作指的是接口中定义了某些子类实现后可能抛出如Unsupport异常的方法)(Many of the modification methods in the collection interfaces are labeled optional. Implementations are permitted to not perform one or more of these operations, throwing a runtime exception (UnsupportedOperationException) if they are attempted. ),允许存放所有类型包括null,在实现List方法之外,还增加了操作内部存储数据的数组大小的方法,ArrayList和Vector大致相等,只不过ArrayList是非同步的。
- <p>The <tt>size</tt>, <tt>isEmpty</tt>, <tt>get</tt>, <tt>set</tt>, <tt>iterator</tt>, and <tt>listIterator</tt> operations run in constant
time. The <tt>add</tt> operation runs in <i>amortized constant time</i>,
that is, adding n elements requires O(n) time. All of the other operations
run in linear time (roughly speaking). The constant factor is low compared
to that for the <tt>LinkedList</tt> implementation.
size(),isEmpty,get(),set(),iterator(),listIterator()操作运行的是常数时间,add()以一个固定摊销时间执行,添加n个元素需要O(n)+的时间,其他所有的操作都是线性时间的,常数因子比LinkedList实现小。
Each ArrayList instance has a capacity. The capacity is
the size of the array used to store the elements in the list. It is always
at least as large as the list size. As elements are added to an ArrayList,
its capacity grows automatically. The details of the growth policy are not
specified beyond the fact that adding an element has constant amortized
time cost.
An application can increase the capacity of an <tt>ArrayList</tt> instance
before adding a large number of elements using the <tt>ensureCapacity</tt> operation. This may reduce the amount of incremental reallocation.
每个ArrayList实例都有一容量,容量指的是集合中用来存储数据的数组的大小,它总是至少和集合一样大,随着元素的加入,容量会自动增长。除了添加一个元素需要固定摊销的时间成本外,增长的策略的时间成本并没有详细说明。应用在添加很多的元素之前可以调用ensureCapacity方法,来增加ArrayList的容量,从而减少递增的重新分配开销。
<p><strong>Note that this implementation is not synchronized.</strong>
If multiple threads access an <tt>ArrayList</tt> instance concurrently,
and at least one of the threads modifies the list structurally, it
<i>must</i> be synchronized externally. (A structural modification is
any operation that adds or deletes one or more elements, or explicitly
resizes the backing array; merely setting the value of an element is not
a structural modification.) This is typically accomplished by
synchronizing on some object that naturally encapsulates the list.
If no such object exists, the list should be "wrapped" using the
{@link Collections#synchronizedList Collections.synchronizedList}
method. This is best done at creation time, to prevent accidental
unsynchronized access to the list:<pre>
List list = Collections.synchronizedList(new ArrayList(...));</pre>
ArrayList实现是非同步的,如果多个线程并发的访问一个实例,并且至少有一个线程修改了集合的结构,则必须要有额外的同步。(一个结构修改指的是添加、删除一个或多个元素,或者改变数组的大小,如果仅仅设置元素的值并不是结构修改)。通常使用自然包含集合的对象来实现同步,如果没有这个样的对象,则应该使用Collections.synchronizedList
来包裹集合。包裹操作最好在创建的时候完成,以防止意外的非同步访问。
The iterators returned by this class's {@link #iterator() iterator} and
{@link #listIterator(int) listIterator} methods are <em>fail-fast</em>:</a>
if the list is structurally modified at any time after the iterator is
created, in any way except through the iterator's own
{@link ListIterator#remove() remove} or
{@link ListIterator#add(Object) add} methods, the iterator will throw a
{@link ConcurrentModificationException}. Thus, in the face of
concurrent modification, the iterator fails quickly and cleanly, rather
than risking arbitrary, non-deterministic behavior at an undetermined
time in the future.
<p>Note that the fail-fast behavior of an iterator cannot be guaranteed
as it is, generally speaking, impossible to make any hard guarantees in the
presence of unsynchronized concurrent modification. Fail-fast iterators
throw {@code ConcurrentModificationException} on a best-effort basis.
Therefore, it would be wrong to write a program that depended on this
exception for its correctness: <i>the fail-fast behavior of iterators
should be used only to detect bugs.</i>
iterator()和listIterator()方法返回的迭代器都是快速失败的,如果在迭代器创建之后集合的结构被修改了,无论以任何方法,除了迭代器自身的remove(),add()方法,迭代器都会抛出ConcurrentModification异常。因此,在面对并发修改时,迭代器会快速、干净的失败,而不是冒着武断的风险,在未来不确定的时间有不确定的运行情况。但是,迭代器并不能保证总是快速失败的,一般来说,在非同步的并发修改的情况下是不可能有严格准确的保证。快速失败的迭代器是指经最大努力的抛出CME,因此,不应该使程序的正确性依赖于是否抛出CME议程,而是仅仅把它用作BUG的检测。
2.可变长度
- ArrayList内部是使用array存放元素数据,初始化不指定容量时,使用空数组,当第一个元素加入,容量为
DEFAULT_CAPACITY = 10
。 - 扩容逻辑如何设计、实现?
初始化时如果指定了大于0的容量参数,则会new出一个指定长度的数组作为内部存储,向集合添加元素时,首先会进行calculateCapacity
操作,这个方法会进行如下判断:当现有数组为空数组,则取DEFAULT_CAPACITY
和添加元素需要达到的长度的最大值,如果现有不是空数组,则calculateCapacity
返回添加后的数组长度array.length + elementCount
。
ensureExplicitCapacity
方法中有overflow-conscious code
这样的注释,说明了minCapacity - elementData.length > 0
是具有溢出感知的,何为溢出感知?
a、b是两个正整数,且a比b大,我们将分为四种情况分析存在溢出情况下的比较方法:a<b,a-b<0这两个比较方式的区别,在比较之前明确比较都是数学意义上的,不应该以运行结果为导向,而且ArrayList中我们只考虑向上的溢出情况:
grow()方法实际进行数组扩容操作
private void grow(int minCapacity) {
// overflow-conscious code
int oldCapacity = elementData.length;
// oldCapacity >> 1 相当于 oldCapacity /2 ,newCapacity = 1.5 * oldCapacity
int newCapacity = oldCapacity + (oldCapacity >> 1);
//当newCapacity溢出,1.5 * L - (L+1) < 0 need false but true
if (newCapacity - minCapacity < 0)
//如果所需大小大于1.5*oldCapacity,或者1.5* oldCapacity溢出时,则以所需大小扩至minCapacity
newCapacity = minCapacity;
//此时newCapacity=minCapacity也有可能溢出,因为minCapacity = length+1
//所以此处仍使用溢出感知判断,当溢出时仍能返回true
if (newCapacity - MAX_ARRAY_SIZE > 0)
newCapacity = hugeCapacity(minCapacity);
// minCapacity is usually close to size, so this is a win:
//minCapacity通常情况下不会大于1.5* oldSize,更加接近于oldSize大小,所以使用newCapacity(猜测,理解可能不准确)
elementData = Arrays.copyOf(elementData, newCapacity);
}
private static int hugeCapacity(int minCapacity) {
//确认minCapacity没有产生溢出,否则JVM将会溢出或OOM
if (minCapacity < 0) // overflow
throw new OutOfMemoryError();
//如果minCapacity大于MAX_ARRAY_SIZE 则,
//新数组的长度为Integer.Max,否则新数组的长度是MAX_ARRAY_SIZE,
//这里说明了当ArrayList最大的容量是Integer.MAX_VALUE
return (minCapacity > MAX_ARRAY_SIZE) ?
Integer.MAX_VALUE :
MAX_ARRAY_SIZE;
}
- 另外有个问题
MAX_ARRAY_SIZE = Integer.MAX_VALUE - 8
的设计思路是什么样的?为什么这里减去的是8这个特殊的数值?
首先看一下数组类型在Heap中是如何存储的:
在实际测试中,不考虑JVM内存大小限制, 开启指针压缩时
new int[Integer.MAX_VALUE - 2]
可以通过JVM限制(Requested array size exceeds VM limit),实际数组可以容纳最大值:Integer.MAX_VALUE - 1 -1 个,
不开启指针压缩时,new int[Integer.MAX_VALUE - 3]
可以通过JVM限制(Requested array size exceeds VM limit),实际数组容纳最大值:Integer.MAX_VALUE - 2 -1个。
In HotSpot JVM array size is limited by internal representation. In the GC code JVM passes around the size of an array in heap words as an int
then converts back from heap words to jint
this may cause an overflow. So in order to avoid crashes and unexpected behavior the maximum array length is limited by [(max size - header size)] depends on C/C++ compiler which was used to build the JVM you are running(gcc for linux, clang for macos), and runtime settings(like UseCompressedClassPointers
),这里猜测在GC传值中,如果开启指针压缩,以4字节一个单位,则有效字节的offset取值范围是0,1,2,3,(4,···,Integer.MAX_VALUE)之间,由于0,1,2offset字节所有的对象都是一样的,针对于数组类型,额外需要跳过4个字节长度存储,所以数组的最大容纳元素个数就是
Integer.MAX_VALUE - 1 -1 个,这里的思路是站在数组长度是否多占字节来考虑的,开启压缩多占4字节,也就少容纳1个元素,不开启指针压缩,多占8字节(padding4字节),少容纳2个元素。
那么大胆的猜测:
MAX-8的考虑是站在总的byte序列上的,开启指针压缩,Header总共占用16字节,会占用4个元素空间,未开启压缩,header占用32字节,会占用8个元素空间,按照最极端的情况,数组就只能存放MAX_VALUE-8个元素。
在stackoverflow上有一个回答
参考:
https://github.com/openjdk-mirror/jdk7u-hotspot/blob/master/src/share/vm/oops/arrayOop.hpp#L109
https://stackoverflow.com/questions/31382531/why-i-cant-create-an-array-with-large-size
https://stackoverflow.com/questions/3038392/do-java-arrays-have-a-maximum-size
https://www.ibm.com/developerworks/java/library/j-codetoheap/index.html#ArrayList
https://www.jianshu.com/p/9d729c9c94c4
http://hg.openjdk.java.net/code-tools/jol/file/tip/jol-samples/src/main/java/org/openjdk/jol/samples/