GC part 8


GC Flow Analysis under UseConcMarkSweepGC

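Compared with SerialGC, CMS is far more complex. It is the first collector whose GC threads can run concurrently with user threads, and that is very hard to get right. While collection is in progress, user threads keep producing new garbage and changing reference relationships. As a result, an object the GC threads have not marked, and would therefore treat as garbage, can become reachable again. CMS has to handle all of these cases correctly.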

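CMS GC comes in two forms: foreground GC and background GC.

A foreground GC is an active GC. It is a Full GC brought on by a Minor GC, and it uses the same algorithm as Serial Old (single-threaded mark-sweep-compact). If a foreground GC is triggered while a background GC is still running, a "concurrent mode failure" occurs.

A background GC is the CMS Old GC. It only collects the old generation (ConcurrentMarkSweepGeneration) and is a periodic, passive GC. The ConcurrentMarkSweepThread periodically checks whether a background GC should be started. The usual condition is that old-generation occupancy has exceeded the CMS Old GC trigger threshold. That threshold is 92% by default: when CMSInitiatingOccupancyFraction is left unset, it is derived from MinHeapFreeRatio and CMSTriggerRatio, and their defaults give 92%. It can be set explicitly with CMSInitiatingOccupancyFraction. It is advisable to also enable -XX:+UseCMSInitiatingOccupancyOnly; otherwise CMS additionally decides on its own from collected statistics, which makes its behaviour considerably harder to predict.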
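To make the 92% figure concrete, here is a small C++ sketch of how the threshold can be derived. It rests on three assumptions: the defaults MinHeapFreeRatio=40 and CMSTriggerRatio=80 (verify them on your JVM with -XX:+PrintFlagsFinal), and a negative CMSInitiatingOccupancyFraction meaning "not set". The function names are hypothetical, and this is not HotSpot code.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper: returns the old-gen occupancy fraction (0.0 - 1.0)
// at which a background CMS cycle would be considered.
double initiating_occupancy(int64_t initiating_fraction,   // CMSInitiatingOccupancyFraction
                            uint64_t trigger_ratio,         // CMSTriggerRatio
                            uint64_t min_heap_free_ratio) { // MinHeapFreeRatio
  assert(initiating_fraction <= 100 && trigger_ratio <= 100);
  if (initiating_fraction >= 0) {
    // Explicitly configured: use it as a percentage.
    return static_cast<double>(initiating_fraction) / 100.0;
  }
  // Not configured: (100 - MinHeapFreeRatio) plus CMSTriggerRatio percent
  // of MinHeapFreeRatio, expressed as a fraction.
  return ((100 - min_heap_free_ratio) +
          static_cast<double>(trigger_ratio * min_heap_free_ratio) / 100.0) /
         100.0;
}

// Hypothetical check: start a background cycle once occupancy exceeds it.
bool should_start_background_cycle(double old_used, double old_capacity,
                                   double threshold) {
  return old_used / old_capacity > threshold;
}
```

With the assumed defaults, initiating_occupancy(-1, 80, 40) computes (60 + 32) / 100 = 0.92. An explicit CMSInitiatingOccupancyFraction=75 gives 0.75 instead.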

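UseConcMarkSweepGC still uses GenCollectedHeap as its heap manager, so the GC policy is the same as for Serial GC and is not repeated here. The rest of this article focuses on two things: the implementation details of the CMS Old GC, and how background GC and foreground GC cooperate to reclaim garbage. The CMS process is complicated. Below is the enumeration of the states a CMS Old GC may pass through: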

  // CMS abstract state machine
  // initial_state: Idling
  // next_state(Idling)            = {Marking}
  // next_state(Marking)           = {Precleaning, Sweeping}
  // next_state(Precleaning)       = {AbortablePreclean, FinalMarking}
  // next_state(AbortablePreclean) = {FinalMarking}
  // next_state(FinalMarking)      = {Sweeping}
  // next_state(Sweeping)          = {Resizing}
  // next_state(Resizing)          = {Resetting}
  // next_state(Resetting)         = {Idling}
  // The numeric values below are chosen so that:
  // . _collectorState <= Idling ==  post-sweep && pre-mark
  // . _collectorState in (Idling, Sweeping) == {initial,final}marking ||
  //                                            precleaning || abortablePreclean
 public:
  enum CollectorState {
    Resizing            = 0,
    Resetting           = 1,
    Idling              = 2,
    InitialMarking      = 3,
    Marking             = 4,
    Precleaning         = 5,
    AbortablePreclean   = 6,
    FinalMarking        = 7,
    Sweeping            = 8
  };
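The transitions that the code analyzed in this article actually performs can be modelled as a small C++ sketch. The enum values are copied from the listing above. The successor table follows the phase order described in the rest of this article; note that the abstract comment above does not list InitialMarking separately. The helper functions are hypothetical, and a foreground collection can also reset the state outside this table.

```cpp
#include <cassert>
#include <set>

// Same values as the CollectorState enum quoted above.
enum CollectorState {
  Resizing = 0, Resetting = 1, Idling = 2, InitialMarking = 3, Marking = 4,
  Precleaning = 5, AbortablePreclean = 6, FinalMarking = 7, Sweeping = 8
};

// Illustrative successor table for a background cycle (not HotSpot code).
std::set<CollectorState> next_states(CollectorState s) {
  switch (s) {
    case Idling:            return {InitialMarking};
    case InitialMarking:    return {Marking};
    case Marking:           return {Precleaning};
    case Precleaning:       return {AbortablePreclean, FinalMarking};
    case AbortablePreclean: return {FinalMarking};
    case FinalMarking:      return {Sweeping};
    case Sweeping:          return {Resizing};
    case Resizing:          return {Resetting};
    case Resetting:         return {Idling};
  }
  return {};
}

bool is_valid_transition(CollectorState from, CollectorState to) {
  return next_states(from).count(to) > 0;
}

// The numeric ordering lets plain comparisons classify a state.
bool is_post_sweep_pre_mark(CollectorState s) { return s <= Idling; }
bool is_marking_phase(CollectorState s) { return s > Idling && s < Sweeping; }
```

The two predicates encode the ordering trick described in the comment above the enum. Anything up to Idling is post-sweep and pre-mark. Anything strictly between Idling and Sweeping is a marking or precleaning phase.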

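Idling is the initial state. It also means that the background GC is not currently collecting, so a foreground GC started in this state will not cause a "concurrent mode failure".

Put simply, a CMS Old GC goes through a few key steps: initial mark (STW), concurrent mark, final mark (STW) and sweeping. CMS seems to spend most of its cycle marking. This is mainly because it tries to keep user-thread pauses as short as possible, so some phases run concurrently with user threads. That concurrency produces "floating garbage": objects that die after being marked live survive until the next cycle. It also makes the overall CMS implementation very complex and hard to follow.

The sections below walk through the key steps one by one. For each step they try to explain what it does, why it exists, and how it may show up at runtime.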

CMSCollector::collect_in_background performs the work of a background GC, and CMSCollector::collect performs the work of a foreground GC. The analysis below always starts from one of these two functions.

InitialMarking (initial mark)

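Initial marking is a STW (stop-the-world) process. When CMS finds that the current state _collectorState is InitialMarking, it performs the initial-mark work. The entry code for InitialMarking is: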

      case InitialMarking:
        {
          ReleaseForegroundGC x(this);
          stats().record_cms_begin();
          VM_CMS_Initial_Mark initial_mark_op(this);
          VMThread::execute(&initial_mark_op);
        }
        // The collector state may be any legal state at this point
        // since the background collector may have yielded to the
        // foreground collector.
        break;

The VMThread schedules and executes VM_CMS_Initial_Mark::doit. Here is what that function does:

void VM_CMS_Initial_Mark::doit() {
  HS_PRIVATE_CMS_INITMARK_BEGIN();
  GCIdMark gc_id_mark(_gc_id);

  _collector->_gc_timer_cm->register_gc_pause_start("Initial Mark");

  GenCollectedHeap* gch = GenCollectedHeap::heap();
  GCCauseSetter gccs(gch, GCCause::_cms_initial_mark);

  VM_CMS_Operation::verify_before_gc();

  IsGCActiveMark x; // stop-world GC active
  _collector->do_CMS_operation(CMSCollector::CMS_op_checkpointRootsInitial, gch->gc_cause());

  VM_CMS_Operation::verify_after_gc();

  _collector->_gc_timer_cm->register_gc_pause_end();

  HS_PRIVATE_CMS_INITMARK_END();
}

doit calls _collector->do_CMS_operation. The argument CMSCollector::CMS_op_checkpointRootsInitial shows that the initial-mark procedure comes next. CMSCollector::do_CMS_operation looks like this:

void CMSCollector::do_CMS_operation(CMS_op_type op, GCCause::Cause gc_cause) {
  GCTraceCPUTime tcpu;
  TraceCollectorStats tcs(counters());

  switch (op) {
    case CMS_op_checkpointRootsInitial: {
      GCTraceTime(Info, gc) t("Pause Initial Mark", NULL, GCCause::_no_gc, true);
      SvcGCMarker sgcm(SvcGCMarker::OTHER);
      checkpointRootsInitial();
      break;
    }
    case CMS_op_checkpointRootsFinal: {
      GCTraceTime(Info, gc) t("Pause Remark", NULL, GCCause::_no_gc, true);
      SvcGCMarker sgcm(SvcGCMarker::OTHER);
      checkpointRootsFinal();
      break;
    }
    default:
      fatal("No such CMS_op");
  }
}

This function is also called in the FinalMarking phase, with the operation CMS_op_checkpointRootsFinal. Both CMS_op_checkpointRootsFinal and CMS_op_checkpointRootsInitial are STW. Now let's follow the CMS_op_checkpointRootsInitial path, which calls checkpointRootsInitial:

// Checkpoint the roots into this generation from outside
// this generation. [Note this initial checkpoint need only
// be approximate -- we'll do a catch up phase subsequently.]
void CMSCollector::checkpointRootsInitial() {
  assert(_collectorState == InitialMarking, "Wrong collector state");
  check_correct_thread_executing();
  TraceCMSMemoryManagerStats tms(_collectorState,GenCollectedHeap::heap()->gc_cause());

  save_heap_summary();
  report_heap_summary(GCWhen::BeforeGC);

  ReferenceProcessor* rp = ref_processor();
  assert(_restart_addr == NULL, "Control point invariant");
  {
    // acquire locks for subsequent manipulations
    MutexLockerEx x(bitMapLock(),
                    Mutex::_no_safepoint_check_flag);
    checkpointRootsInitialWork();
    // enable ("weak") refs discovery
    rp->enable_discovery();
    _collectorState = Marking;
  }
}

The call to focus on is checkpointRootsInitialWork. CMSParallelInitialMarkEnabled is true by default, so the following code runs:

      // The parallel version.
      WorkGang* workers = gch->workers();
      assert(workers != NULL, "Need parallel worker threads.");
      uint n_workers = workers->active_workers();

      StrongRootsScope srs(n_workers);

      CMSParInitialMarkTask tsk(this, &srs, n_workers);
      initialize_sequential_subtasks_for_young_gen_rescan(n_workers);
      // If the total workers is greater than 1, then multiple workers
      // may be used at some time and the initialization has been set
      // such that the single threaded path cannot be used.
      if (workers->total_workers() > 1) {
        workers->run_task(&tsk);
      } else {
        tsk.work(0);
      }

CMSParInitialMarkTask is the actual task, and CMSParInitialMarkTask::work performs the InitialMarking work. Its code, shown below, reveals what InitialMarking has to do:

void CMSParInitialMarkTask::work(uint worker_id) {
  elapsedTimer _timer;
  ResourceMark rm;
  HandleMark   hm;
  // ---------- scan from roots --------------
  _timer.start();
  GenCollectedHeap* gch = GenCollectedHeap::heap();
  ParMarkRefsIntoClosure par_mri_cl(_collector->_span, &(_collector->_markBitMap));
  // ---------- young gen roots --------------
  {
    work_on_young_gen_roots(&par_mri_cl);
    _timer.stop();
    log_trace(gc, task)("Finished young gen initial mark scan work in %dth thread: %3.3f sec",
                        worker_id, _timer.seconds());
  }
  // ---------- remaining roots --------------
  _timer.reset();
  _timer.start();
  CLDToOopClosure cld_closure(&par_mri_cl, true);
  gch->cms_process_roots(_strong_roots_scope,
                         false,     // yg was scanned above
                         GenCollectedHeap::ScanningOption(_collector->CMSCollector::roots_scanning_options()),
                         _collector->should_unload_classes(),
                         &par_mri_cl,
                         &cld_closure);
  assert(_collector->should_unload_classes()
         || (_collector->CMSCollector::roots_scanning_options() & GenCollectedHeap::SO_AllCodeCache),
         "if we didn't scan the code cache, we have to be ready to drop nmethods with expired weak oops");
  _timer.stop();
  log_trace(gc, task)("Finished remaining root initial mark scan work in %dth thread: %3.3f sec",
                      worker_id, _timer.seconds());
}

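The InitialMarking phase treats the GC roots and the young-generation objects as roots. It marks the old-generation objects they reference directly, which gives the initial set of live old-generation objects.

CMS marks live objects using what is known as "tri-color marking":
- white means not yet marked;
- grey means the object itself is marked, but the objects it references have not all been marked yet;
- black means the object is marked and all the objects it references have been marked as well.

The concrete implementation is quite complex and is not analyzed further here.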
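The tri-color idea can be illustrated on a toy object graph. The following C++ sketch illustrates the general algorithm; it is not CMS's implementation. CMS keeps a single mark bit per object in _markBitMap plus a marking stack rather than an explicit three-valued color. All names here are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// White = not yet marked, Grey = marked but children not yet scanned,
// Black = marked and all children scanned.
enum class Color { White, Grey, Black };

struct ToyHeap {
  std::vector<std::vector<std::size_t>> refs;  // refs[i] = objects i points to
  std::vector<Color> color;
  explicit ToyHeap(std::size_t n) : refs(n), color(n, Color::White) {}
};

void tri_color_mark(ToyHeap& heap, const std::vector<std::size_t>& roots) {
  std::vector<std::size_t> grey;  // worklist holding every grey object
  for (std::size_t r : roots) {   // shade the roots grey
    if (heap.color[r] == Color::White) {
      heap.color[r] = Color::Grey;
      grey.push_back(r);
    }
  }
  while (!grey.empty()) {
    std::size_t obj = grey.back();
    grey.pop_back();
    for (std::size_t child : heap.refs[obj]) {
      if (heap.color[child] == Color::White) {  // shade white children grey
        heap.color[child] = Color::Grey;
        grey.push_back(child);
      }
    }
    heap.color[obj] = Color::Black;  // every child has now been shaded
  }
}
```

Objects still white when the worklist is empty are unreachable from the roots. In a stop-the-world setting they are garbage. Under concurrent mutation this is no longer guaranteed, which is why CMS needs the later phases.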

Marking (concurrent mark)

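This phase is called concurrent marking; "concurrent" means that user threads and GC threads run at the same time. As a result, several things can happen while the GC threads are marking:
- young-generation objects may be promoted;
- large objects may be allocated directly in the old generation, as the allocation policy allows;
- objects that survive a Minor GC may be promoted early because they do not fit into the To Survivor space;
- more complicated still, reference relationships between objects may change.

All of these objects need to be re-marked. Otherwise they would wrongly be considered unreachable and reclaimed, causing serious runtime errors.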

      case Marking:
        // initial marking in checkpointRootsInitialWork has been completed
        if (markFromRoots()) { // we were successful
          assert(_collectorState == Precleaning, "Collector state should "
            "have changed");
        } else {
          assert(_foregroundGCIsActive, "Internal state inconsistency");
        }
        break;

markFromRoots is responsible for all the work of the concurrent mark phase. Let's walk through its main flow:

bool CMSCollector::markFromRoots() {
  // we might be tempted to assert that:
  // assert(!SafepointSynchronize::is_at_safepoint(),
  //        "inconsistent argument?");
  // However that wouldn't be right, because it's possible that
  // a safepoint is indeed in progress as a young generation
  // stop-the-world GC happens even as we mark in this generation.
  assert(_collectorState == Marking, "inconsistent state?");
  check_correct_thread_executing();
  verify_overflow_empty();

  // Weak ref discovery note: We may be discovering weak
  // refs in this generation concurrent (but interleaved) with
  // weak ref discovery by the young generation collector.

  CMSTokenSyncWithLocks ts(true, bitMapLock());
  GCTraceCPUTime tcpu;
  CMSPhaseAccounting pa(this, "Concurrent Mark");
  bool res = markFromRootsWork();
  if (res) {
    _collectorState = Precleaning;
  } else { // We failed and a foreground collection wants to take over
    assert(_foregroundGCIsActive, "internal state inconsistency");
    assert(_restart_addr == NULL,  "foreground will restart from scratch");
    log_debug(gc)("bailing out to foreground collection");
  }
  verify_overflow_empty();
  return res;
}

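Inside markFromRoots, the call to markFromRootsWork does the main work. markFromRoots then checks whether the phase completed successfully. If it did, the state moves to Precleaning and the GC thread continues with the Precleaning phase. Here are the details of markFromRootsWork: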

bool CMSCollector::markFromRootsWork() {
  // iterate over marked bits in bit map, doing a full scan and mark
  // from these roots using the following algorithm:
  // . if oop is to the right of the current scan pointer,
  //   mark corresponding bit (we'll process it later)
  // . else (oop is to left of current scan pointer)
  //   push oop on marking stack
  // . drain the marking stack

  // Note that when we do a marking step we need to hold the
  // bit map lock -- recall that direct allocation (by mutators)
  // and promotion (by the young generation collector) is also
  // marking the bit map. [the so-called allocate live policy.]
  // Because the implementation of bit map marking is not
  // robust wrt simultaneous marking of bits in the same word,
  // we need to make sure that there is no such interference
  // between concurrent such updates.

  // already have locks
  assert_lock_strong(bitMapLock());

  verify_work_stacks_empty();
  verify_overflow_empty();
  bool result = false;
  if (CMSConcurrentMTEnabled && ConcGCThreads > 0) {
    result = do_marking_mt();
  } else {
    result = do_marking_st();
  }
  return result;
}

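If CMSConcurrentMTEnabled is set and ConcGCThreads is greater than 0, do_marking_mt (the multi-threaded version) runs. Otherwise do_marking_st (the single-threaded version) runs. For simplicity, only the single-threaded version is analyzed below: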

bool CMSCollector::do_marking_st() {
  ResourceMark rm;
  HandleMark   hm;

  // Temporarily make refs discovery single threaded (non-MT)
  ReferenceProcessorMTDiscoveryMutator rp_mut_discovery(ref_processor(), false);
  MarkFromRootsClosure markFromRootsClosure(this, _span, &_markBitMap,
    &_markStack, CMSYield);
  // the last argument to iterate indicates whether the iteration
  // should be incremental with periodic yields.
  _markBitMap.iterate(&markFromRootsClosure);
  // If _restart_addr is non-NULL, a marking stack overflow
  // occurred; we need to do a fresh iteration from the
  // indicated restart address.
  while (_restart_addr != NULL) {
    if (_foregroundGCIsActive) {
      // We may be running into repeated stack overflows, having
      // reached the limit of the stack size, while making very
      // slow forward progress. It may be best to bail out and
      // let the foreground collector do its job.
      // Clear _restart_addr, so that foreground GC
      // works from scratch. This avoids the headache of
      // a "rescan" which would otherwise be needed because
      // of the dirty mod union table & card table.
      _restart_addr = NULL;
      return false;  // indicating failure to complete marking
    }
    // Deal with stack overflow:
    // we restart marking from _restart_addr
    HeapWord* ra = _restart_addr;
    markFromRootsClosure.reset(ra);
    _restart_addr = NULL;
    _markBitMap.iterate(&markFromRootsClosure, ra, _span.end());
  }
  return true;
}

markFromRootsClosure is a closure object. Its do_bit function is called by BitMap::iterate, as can be seen in CMSCollector::do_marking_st. First, the implementation of BitMap::iterate:

// Note that if the closure itself modifies the bitmap
// then modifications in and to the left of the _bit_ being
// currently sampled will not be seen. Note also that the
// interval [leftOffset, rightOffset) is right open.
bool BitMap::iterate(BitMapClosure* blk, idx_t leftOffset, idx_t rightOffset) {
  verify_range(leftOffset, rightOffset);

  idx_t startIndex = word_index(leftOffset);
  idx_t endIndex   = MIN2(word_index(rightOffset) + 1, size_in_words());
  for (idx_t index = startIndex, offset = leftOffset;
       offset < rightOffset && index < endIndex;
       offset = (++index) << LogBitsPerWord) {
    idx_t rest = map(index) >> (offset & (BitsPerWord - 1));
    for (; offset < rightOffset && rest != 0; offset++) {
      if (rest & 1) {
        if (!blk->do_bit(offset)) return false;
        //  resample at each closure application
        // (see, for instance, CMS bug 4525989)
        rest = map(index) >> (offset & (BitsPerWord -1));
      }
      rest = rest >> 1;
    }
  }
  return true;
}
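The core loop of BitMap::iterate can be reproduced on a plain std::vector of 64-bit words. This C++ sketch assumes 64-bit words and uses std::function in place of BitMapClosure; the names are hypothetical. Like the original, it re-reads the current word after each callback, so bits set to the right of the current position are still visited.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <functional>
#include <vector>

using Word = std::uint64_t;
constexpr std::size_t kBitsPerWord = 64;

// Visits every set bit in [left, right) in ascending order.
// Stops early (returning false) if do_bit returns false.
bool iterate_set_bits(const std::vector<Word>& map, std::size_t left,
                      std::size_t right,
                      const std::function<bool(std::size_t)>& do_bit) {
  std::size_t offset = left;
  for (std::size_t index = left / kBitsPerWord;
       offset < right && index < map.size();
       offset = (++index) * kBitsPerWord) {
    // Shift so that bit 'offset' sits at position 0 of 'rest'.
    Word rest = map[index] >> (offset % kBitsPerWord);
    for (; offset < right && rest != 0; offset++) {
      if (rest & 1) {
        if (!do_bit(offset)) return false;
        // Resample: the callback may have set bits further right.
        rest = map[index] >> (offset % kBitsPerWord);
      }
      rest >>= 1;
    }
  }
  return true;
}
```

As in the original, bits the callback sets at or to the left of the current position are not seen. That is exactly the case MarkFromRootsClosure handles with its marking stack.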

BitMap::iterate calls the BitMapClosure's do_bit function for every set bit; here the BitMapClosure is a MarkFromRootsClosure. Below is the implementation of do_bit:

bool MarkFromRootsClosure::do_bit(size_t offset) {
  if (_skipBits > 0) {
    _skipBits--;
    return true;
  }
  // convert offset into a HeapWord*
  HeapWord* addr = _bitMap->startWord() + offset;
  assert(_bitMap->endWord() && addr < _bitMap->endWord(),
         "address out of range");
  assert(_bitMap->isMarked(addr), "tautology");
  if (_bitMap->isMarked(addr+1)) {
    // this is an allocated but not yet initialized object
    assert(_skipBits == 0, "tautology");
    _skipBits = 2;  // skip next two marked bits ("Printezis-marks")
    oop p = oop(addr);
    if (p->klass_or_null_acquire() == NULL) {
      DEBUG_ONLY(if (!_verifying) {)
        // We re-dirty the cards on which this object lies and increase
        // the _threshold so that we'll come back to scan this object
        // during the preclean or remark phase. (CMSCleanOnEnter)
        if (CMSCleanOnEnter) {
          size_t sz = _collector->block_size_using_printezis_bits(addr);
          HeapWord* end_card_addr   = (HeapWord*)round_to(
                                         (intptr_t)(addr+sz), CardTableModRefBS::card_size);
          MemRegion redirty_range = MemRegion(addr, end_card_addr);
          assert(!redirty_range.is_empty(), "Arithmetical tautology");
          // Bump _threshold to end_card_addr; note that
          // _threshold cannot possibly exceed end_card_addr, anyhow.
          // This prevents future clearing of the card as the scan proceeds
          // to the right.
          assert(_threshold <= end_card_addr,
                 "Because we are just scanning into this object");
          if (_threshold < end_card_addr) {
            _threshold = end_card_addr;
          }
          if (p->klass_or_null_acquire() != NULL) {
            // Redirty the range of cards...
            _mut->mark_range(redirty_range);
          } // ...else the setting of klass will dirty the card anyway.
        }
      DEBUG_ONLY(})
      return true;
    }
  }
  scanOopsInOop(addr);
  return true;
}

The function to focus on is MarkFromRootsClosure::scanOopsInOop:

void MarkFromRootsClosure::scanOopsInOop(HeapWord* ptr) {
  assert(_bitMap->isMarked(ptr), "expected bit to be set");
  assert(_markStack->isEmpty(),
         "should drain stack to limit stack usage");
  // convert ptr to an oop preparatory to scanning
  oop obj = oop(ptr);
  // Ignore mark word in verification below, since we
  // may be running concurrent with mutators.
  assert(obj->is_oop(true), "should be an oop");
  assert(_finger <= ptr, "_finger runneth ahead");
  // advance the finger to right end of this object
  _finger = ptr + obj->size();
  assert(_finger > ptr, "we just incremented it above");
  // On large heaps, it may take us some time to get through
  // the marking phase. During
  // this time it's possible that a lot of mutations have
  // accumulated in the card table and the mod union table --
  // these mutation records are redundant until we have
  // actually traced into the corresponding card.
  // Here, we check whether advancing the finger would make
  // us cross into a new card, and if so clear corresponding
  // cards in the MUT (preclean them in the card-table in the
  // future).

  DEBUG_ONLY(if (!_verifying) {)
    // The clean-on-enter optimization is disabled by default,
    // until we fix 6178663.
    if (CMSCleanOnEnter && (_finger > _threshold)) {
      // [_threshold, _finger) represents the interval
      // of cards to be cleared  in MUT (or precleaned in card table).
      // The set of cards to be cleared is all those that overlap
      // with the interval [_threshold, _finger); note that
      // _threshold is always kept card-aligned but _finger isn't
      // always card-aligned.
      HeapWord* old_threshold = _threshold;
      assert(old_threshold == (HeapWord*)round_to(
              (intptr_t)old_threshold, CardTableModRefBS::card_size),
             "_threshold should always be card-aligned");
      _threshold = (HeapWord*)round_to(
                     (intptr_t)_finger, CardTableModRefBS::card_size);
      MemRegion mr(old_threshold, _threshold);
      assert(!mr.is_empty(), "Control point invariant");
      assert(_span.contains(mr), "Should clear within span");
      _mut->clear_range(mr);
    }
  DEBUG_ONLY(})
  // Note: the finger doesn't advance while we drain
  // the stack below.
  PushOrMarkClosure pushOrMarkClosure(_collector,
                                      _span, _bitMap, _markStack,
                                      _finger, this);
  bool res = _markStack->push(obj);
  assert(res, "Empty non-zero size stack should have space for single push");
  while (!_markStack->isEmpty()) {
    oop new_oop = _markStack->pop();
    // Skip verifying header mark word below because we are
    // running concurrent with mutators.
    assert(new_oop->is_oop(true), "Oops! expected to pop an oop");
    // now scan this oop's oops
    new_oop->oop_iterate(&pushOrMarkClosure);
    do_yield_check();
  }
  assert(_markStack->isEmpty(), "tautology, emphasizing post-condition");
}

The oop_iterate call is where objects get marked. The actual work is done by do_oop in the PushOrMarkClosure closure. Here are the details:

void PushOrMarkClosure::do_oop(oop obj) {
  // Ignore mark word because we are running concurrent with mutators.
  assert(obj->is_oop_or_null(true), "Expected an oop or NULL at " PTR_FORMAT, p2i(obj));
  HeapWord* addr = (HeapWord*)obj;
  if (_span.contains(addr) && !_bitMap->isMarked(addr)) {
    // Oop lies in _span and isn't yet grey or black
    _bitMap->mark(addr);            // now grey
    if (addr < _finger) {
      // the bit map iteration has already either passed, or
      // sampled, this bit in the bit map; we'll need to
      // use the marking stack to scan this oop's oops.
      bool simulate_overflow = false;
      NOT_PRODUCT(
        if (CMSMarkStackOverflowALot &&
            _collector->simulate_overflow()) {
          // simulate a stack overflow
          simulate_overflow = true;
        }
      )
      if (simulate_overflow || !_markStack->push(obj)) { // stack overflow
        log_trace(gc)("CMS marking stack overflow (benign) at " SIZE_FORMAT, _markStack->capacity());
        assert(simulate_overflow || _markStack->isFull(), "Else push should have succeeded");
        handle_stack_overflow(addr);
      }
    }
    // anything including and to the right of _finger
    // will be scanned as we iterate over the remainder of the
    // bit map
    do_yield_check();
  }
}

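do_oop marks the object. If the object lies to the left of _finger, it also pushes the object onto _markStack. The while loop in MarkFromRootsClosure::scanOopsInOop then pops objects from _markStack and keeps traversing and marking them, so the whole process works much like recursion.

In other words, the concurrent mark phase takes the objects marked in the initial mark phase as roots and marks everything reachable from them. User threads keep running during marking, which complicates matters. This is why CMS needs several marking passes: without them, some live objects could be missed during marking and then wrongly reclaimed during sweeping.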
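The "finger" scheme described in the comment at the top of markFromRootsWork can be modelled on a toy old generation. This C++ sketch makes four simplifications: every object occupies exactly one slot, the mark bitmap is a std::vector<bool>, and yielding, stack-overflow handling and concurrency are all left out. All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

struct ToyOldGen {
  std::vector<std::vector<std::size_t>> refs;  // refs[addr] = referenced addrs
  std::vector<bool> mark_bits;                 // one mark bit per object
  explicit ToyOldGen(std::size_t n) : refs(n), mark_bits(n, false) {}
};

// Walks the bitmap left to right. A newly marked reference to the RIGHT of
// the finger only gets its bit set, because the walk will reach it later.
// A newly marked reference to the LEFT of the finger has already been
// passed, so it is pushed on the stack and drained immediately.
void mark_from_roots(ToyOldGen& gen) {
  std::vector<std::size_t> mark_stack;
  for (std::size_t addr = 0; addr < gen.mark_bits.size(); ++addr) {
    if (!gen.mark_bits[addr]) continue;
    std::size_t finger = addr + 1;  // right end of the object being scanned
    mark_stack.push_back(addr);
    while (!mark_stack.empty()) {
      std::size_t obj = mark_stack.back();
      mark_stack.pop_back();
      for (std::size_t child : gen.refs[obj]) {
        if (gen.mark_bits[child]) continue;  // already grey or black
        gen.mark_bits[child] = true;         // now grey
        if (child < finger) {
          mark_stack.push_back(child);       // walk already passed it
        }
      }
    }
  }
}
```

In the test scenario, object 4 is the only initially marked object. Object 1 lies left of the finger, so it is pushed and drained immediately, together with object 0. Object 5 lies right of the finger, so it only gets its bit set and is scanned when the bitmap walk reaches it.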

Precleaning

After Marking, _collectorState is updated to Precleaning. The entry point of this phase is:

   case Precleaning:
        // marking from roots in markFromRoots has been completed
        preclean();
        assert(_collectorState == AbortablePreclean ||
               _collectorState == FinalMarking,
               "Collector state should have changed");
        break;

The preclean function does the work of the Precleaning phase:

void CMSCollector::preclean() {
  check_correct_thread_executing();
  assert(Thread::current()->is_ConcurrentGC_thread(), "Wrong thread");
  verify_work_stacks_empty();
  verify_overflow_empty();
  _abort_preclean = false;
  if (CMSPrecleaningEnabled) {
    if (!CMSEdenChunksRecordAlways) {
      _eden_chunk_index = 0;
    }
    size_t used = get_eden_used();
    size_t capacity = get_eden_capacity();
    // Don't start sampling unless we will get sufficiently
    // many samples.
    if (used < (((capacity / CMSScheduleRemarkSamplingRatio) / 100)
                * CMSScheduleRemarkEdenPenetration)) {
      _start_sampling = true;
    } else {
      _start_sampling = false;
    }
    GCTraceCPUTime tcpu;
    CMSPhaseAccounting pa(this, "Concurrent Preclean");
    preclean_work(CMSPrecleanRefLists1, CMSPrecleanSurvivors1);
  }
  CMSTokenSync x(true); // is cms thread
  if (CMSPrecleaningEnabled) {
    sample_eden();
    _collectorState = AbortablePreclean;
  } else {
    _collectorState = FinalMarking;
  }
  verify_work_stacks_empty();
  verify_overflow_empty();
}
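The sampling guard in preclean() is plain integer arithmetic and can be restated on its own. The C++ helper below copies the expression from the code above; the function name is hypothetical. The defaults used in the example (CMSScheduleRemarkSamplingRatio=5, CMSScheduleRemarkEdenPenetration=50) are assumptions to verify with -XX:+PrintFlagsFinal.

```cpp
#include <cassert>
#include <cstddef>

// Same integer expression as in CMSCollector::preclean():
//   used < ((capacity / CMSScheduleRemarkSamplingRatio) / 100)
//          * CMSScheduleRemarkEdenPenetration
bool should_start_sampling(std::size_t eden_used, std::size_t eden_capacity,
                           std::size_t sampling_ratio,     // CMSScheduleRemarkSamplingRatio
                           std::size_t eden_penetration) { // CMSScheduleRemarkEdenPenetration (%)
  assert(sampling_ratio > 0);
  return eden_used < ((eden_capacity / sampling_ratio) / 100) * eden_penetration;
}
```

With those assumed defaults the threshold is about one tenth of eden capacity. For a capacity of 100000 bytes it is ((100000 / 5) / 100) * 50 = 10000, so sampling starts only if eden usage is below 10000 bytes.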

CMSPrecleaningEnabled controls whether the Precleaning phase runs. It is true by default, so precleaning is performed unless it is disabled; unless there is a special reason, keep the default. preclean_work does the actual Precleaning work, which includes:

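(1) During concurrent marking, young-generation objects may start referencing old-generation objects. Those old-generation objects need to be marked so they are not reclaimed.
(2) During concurrent marking, references inside the old generation may change. The affected old-generation objects also need to be marked.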
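Changes of kind (2) are found through the card table. A write barrier dirties the card covering an updated field, and precleaning rescans the dirty cards concurrently. The following C++ sketch models that idea. The 512-byte card size and all names are hypothetical, and the model is a simplification, not HotSpot's CardTableModRefBS or its mod union table.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr std::size_t kCardShift = 9;  // 512-byte cards (assumption)

struct ToyCardTable {
  std::vector<std::uint8_t> dirty;  // one byte per card: 1 = dirty
  explicit ToyCardTable(std::size_t heap_bytes)
      : dirty((heap_bytes >> kCardShift) + 1, 0) {}

  // Called after a mutator stores a reference into the field at this offset.
  void post_write_barrier(std::size_t field_offset) {
    dirty[field_offset >> kCardShift] = 1;
  }

  // Cleans each dirty card and hands it to rescan_card, which would re-scan
  // the objects on that card. Returns the number of cards processed.
  template <typename RescanFn>
  std::size_t preclean(RescanFn rescan_card) {
    std::size_t work = 0;
    for (std::size_t card = 0; card < dirty.size(); ++card) {
      if (dirty[card]) {
        dirty[card] = 0;  // clean first, so a racing write re-dirties it
        rescan_card(card);
        ++work;
      }
    }
    return work;
  }
};
```

preclean returns the number of cards it processed. This is analogous to the "work done" value that the abortable preclean loop compares against CMSAbortablePrecleanMinWorkPerIteration.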

AbortablePreclean

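AbortablePreclean exists to serve CMS's ultimate goal of shortening STW time. It does work similar to Precleaning and runs in a loop. The loop is conditional: once certain conditions are met, it exits and the STW Final Mark phase follows. The purpose of AbortablePreclean (and of Precleaning) is to do as much as possible up front so that Final Mark has less to mark, which shortens the STW pause.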

The abortable_preclean function is responsible for the AbortablePreclean phase:

// Try and schedule the remark such that young gen
// occupancy is CMSScheduleRemarkEdenPenetration %.
void CMSCollector::abortable_preclean() {
  check_correct_thread_executing();
  assert(CMSPrecleaningEnabled,  "Inconsistent control state");
  assert(_collectorState == AbortablePreclean, "Inconsistent control state");

  // If Eden's current occupancy is below this threshold,
  // immediately schedule the remark; else preclean
  // past the next scavenge in an effort to
  // schedule the pause as described above. By choosing
  // CMSScheduleRemarkEdenSizeThreshold >= max eden size
  // we will never do an actual abortable preclean cycle.
  if (get_eden_used() > CMSScheduleRemarkEdenSizeThreshold) {
    GCTraceCPUTime tcpu;
    CMSPhaseAccounting pa(this, "Concurrent Abortable Preclean");
    // We need more smarts in the abortable preclean
    // loop below to deal with cases where allocation
    // in young gen is very very slow, and our precleaning
    // is running a losing race against a horde of
    // mutators intent on flooding us with CMS updates
    // (dirty cards).
    // One, admittedly dumb, strategy is to give up
    // after a certain number of abortable precleaning loops
    // or after a certain maximum time. We want to make
    // this smarter in the next iteration.
    // XXX FIX ME!!! YSR
    size_t loops = 0, workdone = 0, cumworkdone = 0, waited = 0;
    while (!(should_abort_preclean() ||
             ConcurrentMarkSweepThread::cmst()->should_terminate())) {
      workdone = preclean_work(CMSPrecleanRefLists2, CMSPrecleanSurvivors2);
      cumworkdone += workdone;
      loops++;
      // Voluntarily terminate abortable preclean phase if we have
      // been at it for too long.
      if ((CMSMaxAbortablePrecleanLoops != 0) &&
          loops >= CMSMaxAbortablePrecleanLoops) {
        log_debug(gc)(" CMS: abort preclean due to loops ");
        break;
      }
      if (pa.wallclock_millis() > CMSMaxAbortablePrecleanTime) {
        log_debug(gc)(" CMS: abort preclean due to time ");
        break;
      }
      // If we are doing little work each iteration, we should
      // take a short break.
      if (workdone < CMSAbortablePrecleanMinWorkPerIteration) {
        // Sleep for some time, waiting for work to accumulate
        stopTimer();
        cmsThread()->wait_on_cms_lock(CMSAbortablePrecleanWaitMillis);
        startTimer();
        waited++;
      }
    }
    log_trace(gc)(" [" SIZE_FORMAT " iterations, " SIZE_FORMAT " waits, " SIZE_FORMAT " cards)] ",
                               loops, waited, cumworkdone);
  }
  CMSTokenSync x(true); // is cms thread
  if (_collectorState != Idling) {
    assert(_collectorState == AbortablePreclean,
           "Spontaneous state transition?");
    _collectorState = FinalMarking;
  } // Else, a foreground collection completed this CMS cycle.
  return;
}

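CMSScheduleRemarkEdenSizeThreshold defaults to 2M, and the work below is only done when Eden usage exceeds this value. The while loop that follows does the same work as Precleaning, because it uses the same preclean_work function. The conditions governing this loop are worth looking at: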

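(1) CMSMaxAbortablePrecleanLoops sets the maximum number of iterations. The default is 0, meaning no limit.
(2) CMSMaxAbortablePrecleanTime sets the maximum time spent in the loop. The default is 5000 ms.
(3) CMSAbortablePrecleanMinWorkPerIteration sets a minimum amount of work per iteration, counted as the number of cards preclean_work processed; its default is 100. If an iteration does less work than that, the thread waits CMSAbortablePrecleanWaitMillis (default 100 ms) before continuing. This condition throttles the loop rather than ending it.
(4) should_abort_preclean returns true: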

inline bool CMSCollector::should_abort_preclean() const {
  // We are in the midst of an "abortable preclean" and either
  // scavenge is done or foreground GC wants to take over collection
  return _collectorState == AbortablePreclean &&
         (_abort_preclean || _foregroundGCIsActive ||
          GenCollectedHeap::heap()->incremental_collection_will_fail(true /* consult_young */));
}

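_abort_preclean corresponds to the "scavenge is done" case in the code comment. _foregroundGCIsActive means a foreground collection (the Serial-Old-style mark-sweep-compact) wants to take over. incremental_collection_will_fail means the heap expects the next young ("incremental") collection to fail, for example because a "promotion failure" has already happened, so the JVM recommends going straight to a Full GC. In each of these cases should_abort_preclean returns true.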

(5) ConcurrentMarkSweepThread::cmst()->should_terminate() returns true, meaning the ConcurrentMarkSweepThread has been asked to terminate.
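The exit and back-off conditions (1) to (5) can be followed deterministically in a small C++ model. Wall-clock time and sleeping are replaced by a simulated counter: each preclean_work call is assumed to cost ms_per_iteration, and waits are assumed to count towards the time limit. should_abort stands for conditions (4) and (5). All names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>

struct AbortablePrecleanParams {
  std::size_t max_loops;          // CMSMaxAbortablePrecleanLoops (0 = unlimited)
  std::size_t max_time_ms;        // CMSMaxAbortablePrecleanTime
  std::size_t min_work_per_iter;  // CMSAbortablePrecleanMinWorkPerIteration (cards)
  std::size_t wait_ms;            // CMSAbortablePrecleanWaitMillis
  std::size_t ms_per_iteration;   // simulated cost of one preclean_work call
};

struct LoopResult { std::size_t loops = 0, waits = 0, cum_work = 0; };

LoopResult abortable_preclean_loop(const AbortablePrecleanParams& p,
                                   const std::function<bool()>& should_abort,
                                   const std::function<std::size_t()>& preclean_work) {
  LoopResult r;
  std::size_t elapsed_ms = 0;  // simulated wall clock for the phase
  while (!should_abort()) {
    std::size_t work = preclean_work();
    r.cum_work += work;
    r.loops++;
    elapsed_ms += p.ms_per_iteration;
    if (p.max_loops != 0 && r.loops >= p.max_loops) break;  // loop limit
    if (elapsed_ms > p.max_time_ms) break;                   // time limit
    if (work < p.min_work_per_iter) {  // little work found: back off
      elapsed_ms += p.wait_ms;
      r.waits++;
    }
  }
  return r;
}
```

Suppose the time limit is 5000 ms, each iteration costs 10 ms, and no work is ever found. Then each iteration also waits 100 ms, so the loop stops on the time limit after 47 iterations and 46 waits.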

FinalMarking (final mark)

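FinalMarking is the remark phase and requires STW. Before looking at the code, let's consider what it has to do.

The remark phase has to finish marking. Once it completes, every old-generation object not marked live is treated as garbage, and its memory is reclaimed in a later phase. The initial mark phase marked the old-generation objects reachable from the GC roots and the young generation. The two preclean phases are corrective measures that record and handle changes made while GC threads and user threads ran concurrently.

Final Mark scans the entire young generation under STW to find the old-generation objects it reaches. If the young generation holds many live objects, there is a lot to scan and the STW pause grows. That is why the AbortablePreclean phase tries to let a young GC happen first, so that fewer young-generation objects need scanning during Final Mark.

Because GC threads and user threads run concurrently during concurrent marking, the following may happen: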

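(1) During the concurrent phases, young-generation objects start (or stop) referencing old-generation objects.
(2) During the concurrent phases, GC roots start (or stop) referencing old-generation objects.
(3) During the concurrent phases, references inside the old generation change. Such changes are recorded in dirty cards, so scanning the dirty cards is enough.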

Final Mark has to account for all of these cases. Let's look at the work done in this phase:

        {
          ReleaseForegroundGC x(this);

          VM_CMS_Final_Remark final_remark_op(this);
          VMThread::execute(&final_remark_op);
        }
        assert(_foregroundGCShouldWait, "block post-condition");
        break;

A VM_CMS_Final_Remark operation is handed to the VMThread for execution, so VM_CMS_Final_Remark::doit shows exactly what the phase does:

void VM_CMS_Final_Remark::doit() {
  if (lost_race()) {
    // Nothing to do.
    return;
  }
  HS_PRIVATE_CMS_REMARK_BEGIN();
  GCIdMark gc_id_mark(_gc_id);

  _collector->_gc_timer_cm->register_gc_pause_start("Final Mark");

  GenCollectedHeap* gch = GenCollectedHeap::heap();
  GCCauseSetter gccs(gch, GCCause::_cms_final_remark);

  VM_CMS_Operation::verify_before_gc();

  IsGCActiveMark x; // stop-world GC active
  _collector->do_CMS_operation(CMSCollector::CMS_op_checkpointRootsFinal, gch->gc_cause());

  VM_CMS_Operation::verify_after_gc();

  _collector->save_heap_summary();
  _collector->_gc_timer_cm->register_gc_pause_end();

  HS_PRIVATE_CMS_REMARK_END();
}

As with initial marking, do_CMS_operation is used, but this time with the operation type CMSCollector::CMS_op_checkpointRootsFinal. For that operation, do_CMS_operation calls checkpointRootsFinal:

void CMSCollector::checkpointRootsFinal() {
  assert(_collectorState == FinalMarking, "incorrect state transition?");
  check_correct_thread_executing();
  // world is stopped at this checkpoint
  assert(SafepointSynchronize::is_at_safepoint(),
         "world should be stopped");
  TraceCMSMemoryManagerStats tms(_collectorState,GenCollectedHeap::heap()->gc_cause());

  verify_work_stacks_empty();
  verify_overflow_empty();

  log_debug(gc)("YG occupancy: " SIZE_FORMAT " K (" SIZE_FORMAT " K)",
                _young_gen->used() / K, _young_gen->capacity() / K);
  {
    if (CMSScavengeBeforeRemark) {
      GenCollectedHeap* gch = GenCollectedHeap::heap();
      // Temporarily set flag to false, GCH->do_collection will
      // expect it to be false and set to true
      FlagSetting fl(gch->_is_gc_active, false);

      gch->do_collection(true,                      // full (i.e. force, see below)
                         false,                     // !clear_all_soft_refs
                         0,                         // size
                         false,                     // is_tlab
                         GenCollectedHeap::YoungGen // type
        );
    }
    FreelistLocker x(this);
    MutexLockerEx y(bitMapLock(),
                    Mutex::_no_safepoint_check_flag);
    checkpointRootsFinalWork();
  }
  verify_work_stacks_empty();
  verify_overflow_empty();
}

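If CMSScavengeBeforeRemark is set, a YGC is performed before FinalMark. The reason was given earlier: FinalMark is STW, so a young generation with many live objects means many objects to scan and a longer pause. Running a YGC first reclaims the dead young-generation objects and reduces what FinalMark has to scan.

CMSScavengeBeforeRemark is false by default, and it is best not to set it casually. Because of the preclean phases, a YGC may already have happened during preclean, and a second one would be unnecessary. Let CMS work at its own pace, and intervene only when its behaviour clearly does not match expectations.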

Sweeping

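As the name suggests, this phase cleans up garbage objects, and it runs concurrently. In the whole periodic CMS cycle, every phase except initMark and FinalMark can run concurrently. The sweep function does the cleanup by calling a key function, sweepWork, whose implementation is: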

void CMSCollector::sweepWork(ConcurrentMarkSweepGeneration* old_gen) {
  // We iterate over the space(s) underlying this generation,
  // checking the mark bit map to see if the bits corresponding
  // to specific blocks are marked or not. Blocks that are
  // marked are live and are not swept up. All remaining blocks
  // are swept up, with coalescing on-the-fly as we sweep up
  // contiguous free and/or garbage blocks:
  // We need to ensure that the sweeper synchronizes with allocators
  // and stop-the-world collectors. In particular, the following
  // locks are used:
  // . CMS token: if this is held, a stop the world collection cannot occur
  // . freelistLock: if this is held no allocation can occur from this
  //                 generation by another thread
  // . bitMapLock: if this is held, no other thread can access or update
  //

  // Note that we need to hold the freelistLock if we use
  // block iterate below; else the iterator might go awry if
  // a mutator (or promotion) causes block contents to change
  // (for instance if the allocator divvies up a block).
  // If we hold the free list lock, for all practical purposes
  // young generation GC's can't occur (they'll usually need to
  // promote), so we might as well prevent all young generation
  // GC's while we do a sweeping step. For the same reason, we might
  // as well take the bit map lock for the entire duration

  // check that we hold the requisite locks
  assert(have_cms_token(), "Should hold cms token");
  assert(ConcurrentMarkSweepThread::cms_thread_has_cms_token(), "Should possess CMS token to sweep");
  assert_lock_strong(old_gen->freelistLock());
  assert_lock_strong(bitMapLock());

  assert(!_inter_sweep_timer.is_active(), "Was switched off in an outer context");
  assert(_intra_sweep_timer.is_active(),  "Was switched on  in an outer context");
  old_gen->cmsSpace()->beginSweepFLCensus((float)(_inter_sweep_timer.seconds()),
                                          _inter_sweep_estimate.padded_average(),
                                          _intra_sweep_estimate.padded_average());
  old_gen->setNearLargestChunk();

  {
    SweepClosure sweepClosure(this, old_gen, &_markBitMap, CMSYield);
    old_gen->cmsSpace()->blk_iterate_careful(&sweepClosure);
    // We need to free-up/coalesce garbage/blocks from a
    // co-terminal free run. This is done in the SweepClosure
    // destructor; so, do not remove this scope, else the
    // end-of-sweep-census below will be off by a little bit.
  }
  old_gen->cmsSpace()->sweep_completed();
  old_gen->cmsSpace()->endSweepFLCensus(sweep_count());
  if (should_unload_classes()) {                // unloaded classes this cycle,
    _concurrent_cycles_since_last_unload = 0;   // ... reset count
  } else {                                      // did not unload classes,
    _concurrent_cycles_since_last_unload++;     // ... increment count
  }
}
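The comment at the top of sweepWork states the core idea. Walk the generation in address order, keep blocks whose mark bit is set, and coalesce every run of adjacent garbage and/or free blocks into one free chunk. The following is a minimal sketch of just that idea. It assumes a hypothetical Block type in place of the real space, mark bit map and free lists, and it leaves out locking, yielding and the census bookkeeping.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical block descriptor: `marked` stands in for the mark bit map.
// Unmarked blocks are either garbage objects or chunks already on a free list.
struct Block {
  std::size_t start;
  std::size_t size;
  bool marked;
};

struct FreeChunk {
  std::size_t start;
  std::size_t size;
};

// Sweep in address order: marked (live) blocks stay in place, and each run of
// adjacent unmarked blocks is coalesced into a single free chunk.
std::vector<FreeChunk> sweep(const std::vector<Block>& blocks) {
  std::vector<FreeChunk> free_list;
  bool in_run = false;
  FreeChunk run{0, 0};
  for (const Block& b : blocks) {
    if (b.marked) {
      // A live block ends the current free run, if any.
      if (in_run) {
        free_list.push_back(run);
        in_run = false;
      }
      continue;
    }
    if (!in_run) {
      run = FreeChunk{b.start, 0};
      in_run = true;
    }
    run.size += b.size;
  }
  // A run that reaches the end of the space is flushed last; in HotSpot this
  // happens in the SweepClosure destructor.
  if (in_run) free_list.push_back(run);
  return free_list;
}
```

Nothing is moved: live blocks stay where they are, which is why a CMS old generation can fragment over time and eventually need a compacting collection.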

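To restate: CMS only collects the CMS generation, i.e. the old generation. Apart from ConcMarkSweepGC, the old-generation collection of the other GC types can essentially be regarded as a full GC (G1 has not been examined yet). The sweep algorithm itself is not analyzed further here.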

foreground gc

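Everything described so far is the periodic CMS collection, i.e. background gc, a passive GC that is started by monitoring old-generation occupancy. A foreground gc is an active GC. It usually means that the young generation ran a Minor GC and hit "promotion failed", insufficient free space in the old generation, or a similar condition. The exact causes depend on the GC policy of the GenCollectedHeap, which an earlier article covers. The following briefly analyzes how foreground gc works.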

The entry point of foreground gc is ConcurrentMarkSweepGeneration::collect:

void ConcurrentMarkSweepGeneration::collect(bool   full,
                                            bool   clear_all_soft_refs,
                                            size_t size,
                                            bool   tlab)
{
  collector()->collect(full, clear_all_soft_refs, size, tlab);
}

void CMSCollector::collect(bool   full,
                           bool   clear_all_soft_refs,
                           size_t size,
                           bool   tlab)
{
  // The following "if" branch is present for defensive reasons.
  // In the current uses of this interface, it can be replaced with:
  // assert(!GCLocker.is_active(), "Can't be called otherwise");
  // But I am not placing that assert here to allow future
  // generality in invoking this interface.
  if (GCLocker::is_active()) {
    // A consistency test for GCLocker
    assert(GCLocker::needs_gc(), "Should have been set already");
    // Skip this foreground collection, instead
    // expanding the heap if necessary.
    // Need the free list locks for the call to free() in compute_new_size()
    compute_new_size();
    return;
  }
  acquire_control_and_collect(full, clear_all_soft_refs);
}

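The function acquire_control_and_collect does the work of foreground gc. Its name gives it away: it first has to acquire control, i.e. take control of the heap. When a foreground gc is triggered, a background gc may be running, and the two kinds of GC cannot run at the same time. Since foreground gc clearly has higher priority, the background gc must give up its collection and the foreground gc takes over collecting the old generation. The foreground gc also collects the young generation along the way, so it is a full GC. Here is the relevant part of acquire_control_and_collect: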

{
    MutexLockerEx x(CGC_lock, Mutex::_no_safepoint_check_flag);
    if (_foregroundGCShouldWait) {
      // We are going to be waiting for action for the CMS thread;
      // it had better not be gone (for instance at shutdown)!
      assert(ConcurrentMarkSweepThread::cmst() != NULL && !ConcurrentMarkSweepThread::cmst()->has_terminated(),
             "CMS thread must be running");
      // Wait here until the background collector gives us the go-ahead
      ConcurrentMarkSweepThread::clear_CMS_flag(
        ConcurrentMarkSweepThread::CMS_vm_has_token);  // release token
      // Get a possibly blocked CMS thread going:
      //   Note that we set _foregroundGCIsActive true above,
      //   without protection of the CGC_lock.
      CGC_lock->notify();
      assert(!ConcurrentMarkSweepThread::vm_thread_wants_cms_token(),
             "Possible deadlock");
      while (_foregroundGCShouldWait) {
        // wait for notification
        CGC_lock->wait(Mutex::_no_safepoint_check_flag);
        // Possibility of delay/starvation here, since CMS token does
        // not know to give priority to VM thread? Actually, i think
        // there wouldn't be any delay/starvation, but the proof of
        // that "fact" (?) appears non-trivial. XXX 20011219YSR
      }
      ConcurrentMarkSweepThread::set_CMS_flag(
        ConcurrentMarkSweepThread::CMS_vm_has_token);
    }
  }

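This block waits for the background gc to voluntarily hand control of the heap to the foreground gc. In collect_in_background (the background gc), the collector first checks whether a foreground gc is in progress (_foregroundGCIsActive == true) and, if so, abandons this background gc right away. Otherwise, between phases it checks whether a foreground gc is waiting: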

    {
      // Check if the FG collector wants us to yield.
      CMSTokenSync x(true); // is cms thread
      if (waitForForegroundGC()) {
        // We yielded to a foreground GC, nothing more to be
        // done this round.
        assert(_foregroundGCShouldWait == false, "We set it to false in "
               "waitForForegroundGC()");
        log_debug(gc, state)("CMS Thread " INTPTR_FORMAT " exiting collection CMS state %d",
                             p2i(Thread::current()), _collectorState);
        return;
      } else {
        // The background collector can run but check to see if the
        // foreground collector has done a collection while the
        // background collector was waiting to get the CGC_lock
        // above.  If yes, break so that _foregroundGCShouldWait
        // is cleared before returning.
        if (_collectorState == Idling) {
          break;
        }
      }
    }
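The control flow of collect_in_background described above (check once before starting, then check again before every phase) can be sketched as a single-threaded loop. The driver, its callback and the linear phase order are assumptions for illustration. The real loop switches on _collectorState and may take the other transitions the state machine allows, e.g. skipping AbortablePreclean.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// Numeric values copied from the CollectorState enum quoted earlier.
enum CollectorState {
  Resizing = 0, Resetting = 1, Idling = 2, InitialMarking = 3, Marking = 4,
  Precleaning = 5, AbortablePreclean = 6, FinalMarking = 7, Sweeping = 8
};

// Simplified successor function that always takes the common path
// Precleaning -> AbortablePreclean -> FinalMarking.
CollectorState next_phase(CollectorState s) {
  switch (s) {
    case InitialMarking:    return Marking;
    case Marking:           return Precleaning;
    case Precleaning:       return AbortablePreclean;
    case AbortablePreclean: return FinalMarking;
    case FinalMarking:      return Sweeping;
    case Sweeping:          return Resizing;
    case Resizing:          return Resetting;
    default:                return Idling;  // Resetting (and Idling) -> Idling
  }
}

// Hypothetical driver for one background cycle. `foreground_active` is polled
// before every phase, standing in for waitForForegroundGC(). Phases that ran are
// appended to `ran`; the return value is the state the cycle stopped in.
CollectorState run_background_cycle(
    const std::function<bool(CollectorState)>& foreground_active,
    std::vector<CollectorState>* ran) {
  if (foreground_active(Idling)) return Idling;  // skip the whole cycle
  CollectorState state = InitialMarking;
  while (state != Idling) {
    if (foreground_active(state)) return state;  // yield, leaving the cycle unfinished
    ran->push_back(state);                        // the phase's real work would go here
    state = next_phase(state);
  }
  return Idling;
}
```

With a callback that turns true at FinalMarking, the cycle stops after four phases and returns FinalMarking, a state above Idling. That is the situation in which the foreground collector logs a concurrent mode failure or interruption.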

waitForForegroundGC performs this check and, if a foreground gc is active, waits for it:

bool CMSCollector::waitForForegroundGC() {
  bool res = false;
  assert(ConcurrentMarkSweepThread::cms_thread_has_cms_token(),
         "CMS thread should have CMS token");
  // Block the foreground collector until the
  // background collectors decides whether to
  // yield.
  MutexLockerEx x(CGC_lock, Mutex::_no_safepoint_check_flag);
  _foregroundGCShouldWait = true;
  if (_foregroundGCIsActive) {
    // The background collector yields to the
    // foreground collector and returns a value
    // indicating that it has yielded.  The foreground
    // collector can proceed.
    res = true;
    _foregroundGCShouldWait = false;
    ConcurrentMarkSweepThread::clear_CMS_flag(
      ConcurrentMarkSweepThread::CMS_cms_has_token);
    ConcurrentMarkSweepThread::set_CMS_flag(
      ConcurrentMarkSweepThread::CMS_cms_wants_token);
    // Get a possibly blocked foreground thread going
    CGC_lock->notify();
    log_debug(gc, state)("CMS Thread " INTPTR_FORMAT " waiting at CMS state %d",
                         p2i(Thread::current()), _collectorState);
    while (_foregroundGCIsActive) {
      CGC_lock->wait(Mutex::_no_safepoint_check_flag);
    }
    ConcurrentMarkSweepThread::set_CMS_flag(
      ConcurrentMarkSweepThread::CMS_cms_has_token);
    ConcurrentMarkSweepThread::clear_CMS_flag(
      ConcurrentMarkSweepThread::CMS_cms_wants_token);
  }
  log_debug(gc, state)("CMS Thread " INTPTR_FORMAT " continuing at CMS state %d",
                       p2i(Thread::current()), _collectorState);
  return res;
}

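In other words, if a foreground gc is running (or waiting) at this point, the background gc yields and abandons this cycle. Otherwise it sets _foregroundGCShouldWait so that any foreground gc arriving later waits, and the check is repeated after the current CMS phase finishes.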
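The waiting on both sides amounts to a handshake over one lock and two flags. Here is a minimal sketch with std::mutex and std::condition_variable. It assumes a hypothetical CollectorHandshake class whose two booleans play the roles of _foregroundGCIsActive and _foregroundGCShouldWait. The CMS token flags and the Heap_lock are left out.

```cpp
#include <cassert>
#include <condition_variable>
#include <mutex>
#include <thread>

// Hypothetical model of the CGC_lock handshake between the foreground (VM thread)
// and background (CMS thread) collectors. CMS token flags are omitted.
class CollectorHandshake {
 public:
  // Foreground side: announce itself, then wait until the background stops holding it off.
  void foreground_acquire() {
    std::unique_lock<std::mutex> lock(mu_);
    foreground_active_ = true;
    cv_.notify_all();
    cv_.wait(lock, [this] { return !foreground_should_wait_; });
  }

  // Foreground side: the full collection is done; let the background continue.
  void foreground_release() {
    std::lock_guard<std::mutex> lock(mu_);
    foreground_active_ = false;
    cv_.notify_all();
  }

  // Background side, called at every phase boundary (cf. waitForForegroundGC).
  // Returns true if it yielded to a foreground collection.
  bool background_checkpoint() {
    std::unique_lock<std::mutex> lock(mu_);
    foreground_should_wait_ = true;  // foreground requests arriving later must wait
    if (!foreground_active_) return false;
    foreground_should_wait_ = false;  // hand over control
    cv_.notify_all();
    cv_.wait(lock, [this] { return !foreground_active_; });
    return true;
  }

  // Background side: cycle finished normally, stop holding the foreground off.
  void background_finish() {
    std::lock_guard<std::mutex> lock(mu_);
    foreground_should_wait_ = false;
    cv_.notify_all();
  }

 private:
  std::mutex mu_;
  std::condition_variable cv_;
  bool foreground_active_ = false;       // role of _foregroundGCIsActive
  bool foreground_should_wait_ = false;  // role of _foregroundGCShouldWait
};
```

In CMS, a checkpoint that returns true ends the background cycle, because the foreground collector has already collected the old generation in the meantime.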

Once the foreground gc has obtained control of the heap, it runs the following code:

  if (first_state > Idling) {
    report_concurrent_mode_interruption();
  }

void CMSCollector::report_concurrent_mode_interruption() {
  if (is_external_interruption()) {
    log_debug(gc)("Concurrent mode interrupted");
  } else {
    log_debug(gc)("Concurrent mode failure");
    _gc_tracer_cm->report_concurrent_mode_failure();
  }
}

bool CMSCollector::is_external_interruption() {
  GCCause::Cause cause = GenCollectedHeap::heap()->gc_cause();
  return GCCause::is_user_requested_gc(cause) ||
         GCCause::is_serviceability_requested_gc(cause);
}

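When reading CMS GC logs you will occasionally see "Concurrent mode interrupted" or "Concurrent mode failure". Both are logged when a foreground gc finds that a background gc was already under way (first_state is beyond Idling). If the collection was requested by the user, e.g. through System.gc(), or by a serviceability tool, the message is "Concurrent mode interrupted"; otherwise it is "Concurrent mode failure".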
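The decision in report_concurrent_mode_interruption can be restated as a small pure function. This sketch assumes a hypothetical, heavily reduced Cause enum. HotSpot's GCCause has many more values, and the real classification is done by GCCause::is_user_requested_gc and GCCause::is_serviceability_requested_gc.

```cpp
#include <cassert>
#include <string>

// Numeric values copied from the CollectorState enum quoted earlier.
enum CollectorState {
  Resizing = 0, Resetting = 1, Idling = 2, InitialMarking = 3, Marking = 4,
  Precleaning = 5, AbortablePreclean = 6, FinalMarking = 7, Sweeping = 8
};

// Hypothetical, heavily reduced set of GC causes.
enum class Cause { SystemGc, ServiceabilityRequest, AllocationFailure };

// Returns the message a foreground gc would log, or "" if no background cycle
// was in progress (first_state <= Idling, i.e. post-sweep and pre-mark).
std::string concurrent_mode_message(CollectorState first_state, Cause cause) {
  if (first_state <= Idling) return "";
  bool external = cause == Cause::SystemGc || cause == Cause::ServiceabilityRequest;
  return external ? "Concurrent mode interrupted" : "Concurrent mode failure";
}
```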

After that, CMSCollector::do_compaction_work performs a mark-sweep-compact collection. The actual work is done in GenMarkSweep::invoke_at_safepoint, which was covered in the Serial Old analysis and is not repeated here.
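GenMarkSweep performs a sliding mark-compact in four phases: mark live objects, compute forwarding addresses, adjust pointers, and move objects. The sketch below shows those four phases on a toy heap. The Obj type, the address-sorted vector and the std::map forwarding table are simplifications assumed for illustration; HotSpot stores the forwarding pointer in the object header.

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <vector>

constexpr std::size_t kNull = static_cast<std::size_t>(-1);

// Hypothetical toy object with one outgoing reference, identified by address.
struct Obj {
  std::size_t addr;  // start address
  std::size_t size;
  std::size_t ref;   // address of the referenced object, or kNull
  bool marked;
};

// Four-phase sliding mark-compact over a heap sorted by address.
// `roots` holds object addresses and is updated to the new addresses.
std::vector<Obj> mark_sweep_compact(std::vector<Obj> heap, std::vector<std::size_t>& roots) {
  std::map<std::size_t, std::size_t> index_of;  // address -> position in `heap`
  for (std::size_t i = 0; i < heap.size(); ++i) index_of[heap[i].addr] = i;

  // Phase 1: mark everything reachable from the roots.
  std::vector<std::size_t> stack(roots);
  while (!stack.empty()) {
    std::size_t a = stack.back();
    stack.pop_back();
    if (a == kNull) continue;
    Obj& o = heap[index_of.at(a)];
    if (o.marked) continue;
    o.marked = true;
    stack.push_back(o.ref);
  }

  // Phase 2: compute each live object's new address, sliding toward address 0.
  std::map<std::size_t, std::size_t> forward;  // old address -> new address
  std::size_t top = 0;
  for (const Obj& o : heap) {
    if (o.marked) {
      forward[o.addr] = top;
      top += o.size;
    }
  }

  // Phase 3: adjust pointers (roots here, object fields in phase 4) to new addresses.
  auto adjust = [&forward](std::size_t a) { return a == kNull ? kNull : forward.at(a); };
  for (std::size_t& r : roots) r = adjust(r);

  // Phase 4: move live objects to their new addresses; dead objects are dropped.
  std::vector<Obj> compacted;
  for (const Obj& o : heap) {
    if (o.marked) compacted.push_back(Obj{forward.at(o.addr), o.size, adjust(o.ref), false});
  }
  return compacted;
}
```

Unlike the CMS sweep, live objects end up contiguous at the bottom of the space, so fragmentation disappears, but every live object may be moved during the pause.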

Summary

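CMS GC as a whole is very complex. User threads and GC threads run concurrently, foreground gc and background gc have to coordinate, and many parameters are involved, any of which can make the JVM run poorly if set carelessly. So do not use a parameter unless you understand how it actually behaves.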

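Why does CMS Old GC have so many phases? Mainly to shorten STW pauses, which is why both the mark and the sweep phases were made concurrent. initMark and FinalMark are STW, but initMark marks very little (only GCRoot -> cms gen and YoungGen -> cms gen). Thanks to the two preclean phases and the dirty cards, FinalMark has far fewer objects to scan. If in practice every FinalMark turns out to be very long, set the parameter that runs a YGC before FinalMark (CMSScavengeBeforeRemark) to reduce the objects it has to scan. Because user threads run concurrently with GC threads during the Mark and preclean phases of CMS Old GC, the following can change in the meantime: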

(1) young gen -> old gen
(2) GCRoot -> old gen
(3) old gen internal ref changed

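FinalMark exists to resolve these. FinalMark scans the young generation to mark the young gen -> old gen references and rescans the GC roots. Reference changes inside the old generation made during the concurrent phases are recorded in dirty cards, so FinalMark only needs to scan the dirty cards.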
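Dirty cards can be pictured with a minimal card table. This sketch assumes a hypothetical CardTable class. The old generation is divided into 512-byte cards (the card size HotSpot uses), a write barrier dirties the card covering any updated reference field, and remark rescans only the dirty cards. It ignores details such as the mod union table CMS keeps alongside the card table and the card cleaning done during precleaning.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical card table over the old generation (offsets relative to its start).
class CardTable {
 public:
  static constexpr std::size_t kCardSize = 512;

  explicit CardTable(std::size_t old_gen_bytes)
      : cards_((old_gen_bytes + kCardSize - 1) / kCardSize, kClean) {}

  // Post-write barrier: a reference stored into the field at `field_offset`
  // dirties the card that covers it.
  void on_reference_store(std::size_t field_offset) {
    cards_[field_offset / kCardSize] = kDirty;
  }

  // Remark: collect the dirty cards (whose objects must be rescanned) and clean them.
  std::vector<std::size_t> take_dirty_cards() {
    std::vector<std::size_t> dirty;
    for (std::size_t i = 0; i < cards_.size(); ++i) {
      if (cards_[i] == kDirty) {
        dirty.push_back(i);
        cards_[i] = kClean;
      }
    }
    return dirty;
  }

 private:
  enum : std::uint8_t { kClean = 0, kDirty = 1 };  // arbitrary encoding for the sketch
  std::vector<std::uint8_t> cards_;
};
```

Two stores into the first card and one into the third leave exactly two dirty cards, so remark would rescan only the objects on those 1024 bytes instead of the whole old generation.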

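Finally, a word on foreground gc versus background gc: ideally, foreground gc should not happen at all. A foreground gc means the JVM has concluded that nothing else can satisfy the allocation, so it performs a thorough cleanup, i.e. a full GC. It is also single-threaded and mark-sweep-compact, so you can imagine how slow it is. If foreground gcs happen frequently, analyze why. A good starting point is GenCollectedHeap::do_collection, to understand the GC policy. Note that different GCs use different heaps: Serial and CMS use GenCollectedHeap, the others do not, as discussed in earlier articles.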