part 7
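Analysis of the Memory Allocation Flow under UseConcMarkSweepGC
-XX:+UseConcMarkSweepGC, commonly known as CMS, is a heap management scheme designed to reduce GC pause times. The heap manager it uses is GenCollectedHeap; the young generation is of type ParNew and the old generation is ConcurrentMarkSweepGeneration. The young generation is collected with a multi-threaded version of the copying algorithm and is divided into three spaces: Eden, From and To. The old generation is collected periodically by CMS. CMSInitiatingOccupancyFraction sets the old-generation occupancy at which CMS decides that a CMS GC is needed. When the flag is not set, the effective threshold is 92%, that is, a CMS GC is started once 92% of the old generation is in use. This default is computed rather than hard-coded: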
void ConcurrentMarkSweepGeneration::init_initiating_occupancy(intx io, uintx tr) {
  assert(io <= 100 && tr <= 100, "Check the arguments");
  if (io >= 0) {
    _initiating_occupancy = (double)io / 100.0;
  } else {
    _initiating_occupancy = ((100 - MinHeapFreeRatio) +
                             (double)(tr * MinHeapFreeRatio) / 100.0)
                            / 100.0;
  }
}
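The parameter io is CMSInitiatingOccupancyFraction and tr is CMSTriggerRatio. If CMSInitiatingOccupancyFraction is set (io >= 0), _initiating_occupancy is simply (double)io / 100.0; otherwise it is computed in the else branch. When CMSTriggerRatio is not set it defaults to 80, and MinHeapFreeRatio is 40, so the result is ((100 - 40) + 80 * 40 / 100) / 100 = 0.92.
CMS GC comes in two forms, background GC and foreground GC. The former is a background task: the CMS thread continually checks whether a CMS GC is needed and performs the collection. The latter is triggered by "Allocation Failure" or "Promotion Failure"; it is an active GC that is stop-the-world (STW) from start to finish, and its implementation uses the SerialOld strategy, a mark-sweep-compact algorithm over the whole heap. The details of CMS GC are a separate topic; the focus of this article is the object allocation strategy under UseConcMarkSweepGC.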
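Under UseConcMarkSweepGC, objects are still allocated in Eden first. The young generation is ParNew, a subclass of DefNew; GC on ParNew is the multi-threaded version of GC on DefNew, so allocation on ParNew should be much the same as on DefNew. The rest of this part walks through the complete allocation flow under UseConcMarkSweepGC.
Earlier articles already analysed object allocation under UseSerialGC, so for CMS we start directly at CollectedHeap::common_mem_allocate_noinit, which was also covered there. The function first calls allocate_from_tlab to try to allocate from the TLAB; if the current TLAB cannot satisfy the request, a new TLAB is requested. From the heap's point of view, requesting a TLAB and allocating an object are both memory requests, so the subsequent flow is the same. If the TLAB path fails, memory is requested directly from the heap through Universe::heap()->mem_allocate. The heap is shared by all threads, unlike a TLAB, which is thread-private, so threads now contend with one another and the allocation has to be synchronised (the first attempt shown below is lock-free; later attempts are made with the heap lock held). This is why one hopes the TLAB does its job. GenCollectorPolicy::mem_allocate_work carries out allocation in the heap, and the following mainly analyses how this function is implemented.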
// First allocation attempt is lock-free.
Generation *young = gch->young_gen();
assert(young->supports_inline_contig_alloc(),
       "Otherwise, must do alloc within heap lock");
if (young->should_allocate(size, is_tlab)) {
  result = young->par_allocate(size, is_tlab);
  if (result != NULL) {
    assert(gch->is_in_reserved(result), "result not in heap");
    return result;
  }
}
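young->should_allocate decides whether the request should be served from the young generation; large objects should be allocated directly in the old generation. If the object is not large, young->par_allocate is called. The implementation used is DefNew's, since ParNew inherits par_allocate from DefNew: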
HeapWord* DefNewGeneration::par_allocate(size_t word_size,
                                         bool is_tlab) {
  HeapWord* res = eden()->par_allocate(word_size);
  if (CMSEdenChunksRecordAlways && _old_gen != NULL) {
    _old_gen->sample_eden_chunk();
  }
  return res;
}
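As shown, memory is requested from Eden; the actual work is done by ContiguousSpace::par_allocate_impl, which earlier articles covered and which is not repeated here. Because the young generation is collected by copying, it has no fragmentation, so bump-pointer allocation (also called "pointer collision") can be used: a top pointer marks the start of the free memory, and allocating a block of size just advances top by size. If Eden cannot satisfy the request, allocation from From is attempted. gch->attempt_allocation implements the full fallback: try the young generation first (Eden, then From) and, if that fails, try Old. The implementation is as follows: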
HeapWord* GenCollectedHeap::attempt_allocation(size_t size,
                                               bool is_tlab,
                                               bool first_only) {
  HeapWord* res = NULL;
  if (_young_gen->should_allocate(size, is_tlab)) {
    res = _young_gen->allocate(size, is_tlab);
    if (res != NULL || first_only) {
      return res;
    }
  }
  if (_old_gen->should_allocate(size, is_tlab)) {
    res = _old_gen->allocate(size, is_tlab);
  }
  return res;
}
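Returning briefly to the bump-pointer allocation used in Eden: the idea of advancing a top pointer without taking a lock can be sketched with a compare-and-swap loop. This is a toy model, not HotSpot's ContiguousSpace: the class name BumpSpace is hypothetical, sizes are in bytes rather than heap words, and alignment is ignored.

```cpp
#include <atomic>
#include <cstddef>

// Toy contiguous space: the range [top_, end_) is free.
class BumpSpace {
 public:
  BumpSpace(char* bottom, std::size_t bytes)
      : top_(bottom), end_(bottom + bytes) {}

  // Lock-free allocation: retry the compare-and-swap until it succeeds
  // or the space turns out to be too small for the request.
  void* par_allocate(std::size_t bytes) {
    char* old_top = top_.load();
    for (;;) {
      if (static_cast<std::size_t>(end_ - old_top) < bytes) {
        return nullptr;  // not enough room; the caller must fall back
      }
      char* new_top = old_top + bytes;
      // On failure, old_top is refreshed with the current value of top_.
      if (top_.compare_exchange_weak(old_top, new_top)) {
        return old_top;  // the block starts where top used to be
      }
    }
  }

 private:
  std::atomic<char*> top_;
  char* const end_;
};
```

A nullptr result corresponds to the situation in the text where Eden cannot satisfy the request and the caller moves on to From or Old.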
_young_gen->allocate tries Eden once more and, failing that, tries the From space:
HeapWord* DefNewGeneration::allocate(size_t word_size, bool is_tlab) {
  // This is the slow-path allocation for the DefNewGeneration.
  // Most allocations are fast-path in compiled code.
  // We try to allocate from the eden. If that works, we are happy.
  // Note that since DefNewGeneration supports lock-free allocation, we
  // have to use it here, as well.
  HeapWord* result = eden()->par_allocate(word_size);
  if (result != NULL) {
    if (CMSEdenChunksRecordAlways && _old_gen != NULL) {
      _old_gen->sample_eden_chunk();
    }
  } else {
    // If the eden is full and the last collection bailed out, we are running
    // out of heap space, and we try to allocate the from-space, too.
    // allocate_from_space can't be inlined because that would introduce a
    // circular dependency at compile time.
    result = allocate_from_space(word_size);
  }
  return result;
}
eden()->par_allocate requests memory from Eden; if that fails, allocate_from_space tries the From space:
// The last collection bailed out, we are running out of heap space,
// so we try to allocate the from-space, too.
HeapWord* DefNewGeneration::allocate_from_space(size_t size) {
  bool should_try_alloc = should_allocate_from_space() || GCLocker::is_active_and_needs_gc();
  // If the Heap_lock is not locked by this thread, this will be called
  // again later with the Heap_lock held.
  bool do_alloc = should_try_alloc && (Heap_lock->owned_by_self()
                  || (SafepointSynchronize::is_at_safepoint()
                      && Thread::current()->is_VM_thread()));
  HeapWord* result = NULL;
  if (do_alloc) {
    result = from()->allocate(size);
  }
  return result;
}
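Of course, allocation in From must first be permitted; if it is not, the request cannot be served from From. should_allocate_from_space makes this judgement. The attempt is also made when GCLocker::is_active_and_needs_gc() is true, that is, when a GC is needed but is being held off because some thread is inside a JNI critical section. In either case the allocation actually happens only if the current thread owns Heap_lock, or is the VM thread at a safepoint. Here is the logic of should_allocate_from_space: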
bool should_allocate_from_space() const {
  return _should_allocate_from_space;
}
void clear_should_allocate_from_space() {
  _should_allocate_from_space = false;
}
void set_should_allocate_from_space() {
  _should_allocate_from_space = true;
}
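This is trivial: it just returns the value of _should_allocate_from_space, so the real logic lies wherever that flag is set. The condition is fairly strict. As the comments in the listings above put it, the From space is used only when Eden is full and the last collection bailed out, meaning the heap is running out of space: the flag is set after a collection when collection_attempt_is_safe returns false while Eden is still not empty, and it is cleared otherwise. collection_attempt_is_safe is implemented as follows: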
bool DefNewGeneration::collection_attempt_is_safe() {
  if (!to()->is_empty()) {
    log_trace(gc)(":: to is not empty ::");
    return false;
  }
  if (_old_gen == NULL) {
    GenCollectedHeap* gch = GenCollectedHeap::heap();
    _old_gen = gch->old_gen();
  }
  return _old_gen->promotion_attempt_is_safe(used());
}
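If To is not empty, collection_attempt_is_safe returns false immediately; a non-empty To means that a promotion failure has occurred. If no promotion failure has occurred, whether promotion is safe is decided by _old_gen->promotion_attempt_is_safe: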
bool ConcurrentMarkSweepGeneration::promotion_attempt_is_safe(size_t max_promotion_in_bytes) const {
  size_t available = max_available();
  size_t av_promo = (size_t)gc_stats()->avg_promoted()->padded_average();
  bool res = (available >= av_promo) || (available >= max_promotion_in_bytes);
  return res;
}
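available is the free memory in the old generation, av_promo is the (padded) average size of the objects promoted from the young generation, and max_promotion_in_bytes is the young generation's current usage (Eden + From). So promotion is considered safe if the old generation's free space is at least the average promoted size, or at least the young generation's usage; otherwise it is unsafe.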
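A small worked example may help with the two-sided check. The sketch below restates the comparison as a free function over plain byte counts; the function signature and the sample numbers are illustrative assumptions, not values taken from a real heap.

```cpp
#include <cstddef>

// Illustrative restatement of the check; all inputs are plain byte counts.
bool promotion_attempt_is_safe(std::size_t available,
                               std::size_t avg_promoted_padded,
                               std::size_t max_promotion_in_bytes) {
  return available >= avg_promoted_padded ||
         available >= max_promotion_in_bytes;
}

constexpr std::size_t MB = 1024 * 1024;
```

For instance, with 100 MB free in the old generation, a padded average promotion of 20 MB and 300 MB used in the young generation, the first comparison already succeeds and promotion is deemed safe. With only 10 MB free, both comparisons fail and it is deemed unsafe.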
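To summarise: allocation in From is attempted only when _should_allocate_from_space is set or GCLocker::is_active_and_needs_gc() is true, and only by a thread that owns Heap_lock or by the VM thread at a safepoint. In every other case the request has to go to the old generation.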
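The gate in allocate_from_space can be restated as a pure function of five booleans. The struct and function names below are hypothetical; the fields simply stand in for the calls made in the listing (should_allocate_from_space(), GCLocker::is_active_and_needs_gc(), Heap_lock->owned_by_self(), and so on).

```cpp
// Hypothetical snapshot of the conditions read by allocate_from_space.
struct FromSpaceState {
  bool should_allocate_from_space;     // flag kept by the young generation
  bool gc_locker_active_and_needs_gc;  // a needed GC is being held off
  bool heap_lock_owned_by_self;
  bool at_safepoint;
  bool is_vm_thread;
};

bool may_allocate_from_space(const FromSpaceState& s) {
  // Is there a reason to dip into From at all?
  bool should_try = s.should_allocate_from_space ||
                    s.gc_locker_active_and_needs_gc;
  // Is the caller in a position where doing so is safe?
  bool privileged = s.heap_lock_owned_by_self ||
                    (s.at_safepoint && s.is_vm_thread);
  return should_try && privileged;
}
```

Written this way, it is easy to see that a thread without Heap_lock never allocates from From, even when the flag is set; as the comment in the listing notes, the function is called again later with the lock held.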
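If the young generation (Eden and From) cannot satisfy the request, the old generation is tried. _old_gen->should_allocate first decides whether allocation in the old generation is permitted; if it is, _old_gen->allocate performs the allocation. First, the implementation of _old_gen->should_allocate: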
// Returns "true" iff this generation should be used to allocate an
// object of the given size. Young generations might
// wish to exclude very large objects, for example, since, if allocated
// often, they would greatly increase the frequency of young-gen
// collection.
virtual bool should_allocate(size_t word_size, bool is_tlab) {
  bool result = false;
  size_t overflow_limit = (size_t)1 << (BitsPerSize_t - LogHeapWordSize);
  if (!is_tlab || supports_tlab_allocation()) {
    result = (word_size > 0) && (word_size < overflow_limit);
  }
  return result;
}
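The overflow_limit expression in should_allocate guards against a word count whose byte size would not fit in a size_t. A minimal sketch of that arithmetic follows, assuming a 64-bit build where a heap word is 8 bytes (so the log2 of the word size is 3); the constant and function names are illustrative, not HotSpot's.

```cpp
#include <climits>
#include <cstddef>

// Assumed value for a 64-bit build: 8-byte heap words, log2(8) == 3.
constexpr unsigned kLogHeapWordSize = 3;
constexpr unsigned kBitsPerSizeT = sizeof(std::size_t) * CHAR_BIT;

constexpr std::size_t overflow_limit() {
  return static_cast<std::size_t>(1) << (kBitsPerSizeT - kLogHeapWordSize);
}

// A word count below the limit can be converted to bytes without wrapping.
constexpr bool size_in_range(std::size_t word_size) {
  return word_size > 0 && word_size < overflow_limit();
}
```

Under these assumptions the limit is 2^61 words on a 64-bit machine: the largest accepted request, 2^61 - 1 words, is 2^64 - 8 bytes, which still fits in a size_t.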
If should_allocate returns true, memory is requested from the old generation through _old_gen->allocate:
HeapWord* ConcurrentMarkSweepGeneration::allocate(size_t size, bool tlab) {
  CMSSynchronousYieldRequest yr;
  MutexLockerEx x(freelistLock(), Mutex::_no_safepoint_check_flag);
  return have_lock_and_allocate(size, tlab);
}
HeapWord* ConcurrentMarkSweepGeneration::have_lock_and_allocate(size_t size,
                                                                bool tlab /* ignored */) {
  assert_lock_strong(freelistLock());
  size_t adjustedSize = CompactibleFreeListSpace::adjustObjectSize(size);
  HeapWord* res = cmsSpace()->allocate(adjustedSize);
  // Allocate the object live (grey) if the background collector has
  // started marking. This is necessary because the marker may
  // have passed this address and consequently this object will
  // not otherwise be greyed and would be incorrectly swept up.
  // Note that if this object contains references, the writing
  // of those references will dirty the card containing this object
  // allowing the object to be blackened (and its references scanned)
  // either during a preclean phase or at the final checkpoint.
  if (res != NULL) {
    // We may block here with an uninitialized object with
    // its mark-bit or P-bits not yet set. Such objects need
    // to be safely navigable by block_start().
    assert(oop(res)->klass_or_null() == NULL, "Object should be uninitialized here.");
    assert(!((FreeChunk*)res)->is_free(), "Error, block will look free but show wrong size");
    collector()->direct_allocated(res, adjustedSize);
    _direct_allocated_words += adjustedSize;
    // allocation counters
    NOT_PRODUCT(
      _numObjectsAllocated++;
      _numWordsAllocated += (int)adjustedSize;
    )
  }
  return res;
}
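CompactibleFreeListSpace is responsible for allocation in the CMS old generation. Unlike DefNew or ParNew, the CMS old generation can become fragmented, so bump-pointer allocation cannot be used; instead it manages its memory with free lists. Here is the implementation of CompactibleFreeListSpace::allocate: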
HeapWord* CompactibleFreeListSpace::allocate(size_t size) {
  HeapWord* res = NULL;
  res = allocate_adaptive_freelists(size);
  if (res != NULL) {
    FreeChunk* fc = (FreeChunk*)res;
    fc->markNotFree();
    // Verify that the block offset table shows this to
    // be a single block, but not one which is unallocated.
    _bt.verify_single_block(res, size);
    _bt.verify_not_unallocated(res, size);
  }
  return res;
}
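allocate_adaptive_freelists does its best to find a suitable block of memory. The flow is quite involved, but it closely resembles the memory pool implementation in the C++ STL, so studying that implementation is worthwhile if you have the chance. The implementation of allocate_adaptive_freelists is examined below.
As an aside: if the Old space cannot satisfy the request either, expand_heap_and_allocate expands the heap and retries the allocation. If that also fails, a GC has to be run: a VM_GenCollectForAllocation operation is handed to the VMThread for execution. Whether it performs a Minor GC or a Full GC depends on the circumstances; earlier articles covered this, so it is not repeated here. What follows is the detail of CMS free-list allocation, namely the implementation of allocate_adaptive_freelists: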
HeapWord* CompactibleFreeListSpace::allocate_adaptive_freelists(size_t size) {
  assert_lock_strong(freelistLock());
  HeapWord* res = NULL;
  assert(size == adjustObjectSize(size),
         "use adjustObjectSize() before calling into allocate()");
  // Strategy
  //   if small
  //     exact size from small object indexed list if small
  //     small or large linear allocation block (linAB) as appropriate
  //     take from lists of greater sized chunks
  //   else
  //     dictionary
  //     small or large linear allocation block if it has the space
  // Try allocating exact size from indexTable first
  if (size < IndexSetSize) {
    res = (HeapWord*) getChunkFromIndexedFreeList(size);
    if(res != NULL) {
      assert(res != (HeapWord*)_indexedFreeList[size].head(),
             "Not removed from free list");
      // no block offset table adjustment is necessary on blocks in
      // the indexed lists.
    // Try allocating from the small LinAB
    } else if (size < _smallLinearAllocBlock._allocation_size_limit &&
               (res = getChunkFromSmallLinearAllocBlock(size)) != NULL) {
      // if successful, the above also adjusts block offset table
      // Note that this call will refill the LinAB to
      // satisfy the request. This is different that
      // evm.
      // Don't record chunk off a LinAB? smallSplitBirth(size);
    } else {
      // Raid the exact free lists larger than size, even if they are not
      // overpopulated.
      res = (HeapWord*) getChunkFromGreater(size);
    }
  } else {
    // Big objects get allocated directly from the dictionary.
    res = (HeapWord*) getChunkFromDictionaryExact(size);
    if (res == NULL) {
      // Try hard not to fail since an allocation failure will likely
      // trigger a synchronous GC. Try to get the space from the
      // allocation blocks.
      res = getChunkFromSmallLinearAllocBlockRemainder(size);
    }
  }
  return res;
}
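The "Strategy" comment in the listing above can be illustrated with a heavily simplified toy model: exact-size lists for small requests, raiding larger lists when the exact one is empty, and a size-ordered dictionary for everything else. This is only a sketch under stated assumptions. The names ToyFreeLists and Chunk and the cutoff of 8 words are hypothetical, chunks are just (start, size) pairs with no real memory behind them, and the linear allocation blocks and block offset table of the real code are left out entirely.

```cpp
#include <cstddef>
#include <map>
#include <vector>

// A free chunk described by its start offset and size, both in words.
struct Chunk { std::size_t start; std::size_t size; };

class ToyFreeLists {
 public:
  static constexpr std::size_t kIndexSetSize = 8;  // assumed small-size cutoff

  void add_free(Chunk c) {
    if (c.size == 0) return;
    if (c.size < kIndexSetSize) indexed_[c.size].push_back(c);
    else dictionary_.emplace(c.size, c);
  }

  // Returns true and fills `out` with a chunk of exactly `size` words.
  bool allocate(std::size_t size, Chunk& out) {
    if (size == 0) return false;
    if (size < kIndexSetSize) {
      if (take_exact(size, out)) return true;  // exact-size list first
      for (std::size_t s = size + 1; s < kIndexSetSize; ++s) {
        Chunk big;
        if (take_exact(s, big)) return split(big, size, out);  // raid larger lists
      }
    }
    // Large request, or small lists exhausted: smallest dictionary chunk that fits.
    auto it = dictionary_.lower_bound(size);
    if (it == dictionary_.end()) return false;
    Chunk big = it->second;
    dictionary_.erase(it);
    return split(big, size, out);
  }

 private:
  bool take_exact(std::size_t size, Chunk& out) {
    std::vector<Chunk>& list = indexed_[size];
    if (list.empty()) return false;
    out = list.back();
    list.pop_back();
    return true;
  }

  bool split(Chunk big, std::size_t size, Chunk& out) {
    out = Chunk{big.start, size};
    add_free(Chunk{big.start + size, big.size - size});  // remainder goes back
    return true;
  }

  std::vector<Chunk> indexed_[kIndexSetSize];
  std::multimap<std::size_t, Chunk> dictionary_;
};
```

A false return here plays the role of the NULL result in the real function, which, as its comments note, is likely to trigger a synchronous GC and is therefore worth trying hard to avoid.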
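The free-list allocation strategy used by CMS is complex; what that complexity buys is a high allocation rate. This material will be written up in more detail later.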