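JVM GC Study Notes

Preface

These are my study notes on what-is-garbage-collection. If you are comfortable reading English, follow the link and study the original directly; it is far better written than these notes.

Memory Regions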

Eden

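The figure below shows a more detailed breakdown of Eden, which can be further divided into:

Thread Local Allocation Buffer (TLAB): owned exclusively by one thread and used to allocate that thread's own objects. Carving out TLABs avoids unnecessary synchronization overhead when many threads allocate at once.
Shared Eden space: when a TLAB does not have enough memory, the object is allocated in the shared Eden space instead (the "Common area" in the figure below).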
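To make the two allocation paths concrete, here is a minimal sketch of bump-pointer allocation. It is a toy model, not HotSpot code: the class `TlabSketch`, the 64-unit `TLAB_SIZE`, and the `allocate` helper are all assumptions made for illustration. It only shows why the TLAB path needs no synchronization while the shared path does.

```java
import java.util.concurrent.atomic.AtomicInteger;

/** Toy model: each thread bumps a private pointer and falls back to a shared area. */
class TlabSketch {
    static final int TLAB_SIZE = 64; // hypothetical buffer size, in abstract units

    // The shared Eden area is touched by every thread, so its pointer must be atomic.
    static final AtomicInteger sharedTop = new AtomicInteger(0);

    /** Per-thread buffer: only its owner touches it, so a plain int is enough. */
    static final class Tlab {
        int top = 0;

        /** Returns the offset inside the TLAB, or -1 if the request does not fit. */
        int allocate(int size) {
            if (top + size > TLAB_SIZE) {
                return -1;
            }
            int offset = top;
            top += size; // plain pointer bump: no lock, no CAS
            return offset;
        }
    }

    /** Tries the TLAB first, then the shared area; reports where the object landed. */
    static String allocate(Tlab tlab, int size) {
        int offset = tlab.allocate(size);
        if (offset >= 0) {
            return "tlab@" + offset;
        }
        return "shared@" + sharedTop.getAndAdd(size); // atomic bump in the shared area
    }

    public static void main(String[] args) {
        Tlab tlab = new Tlab();
        System.out.println(allocate(tlab, 40)); // fits in the TLAB
        System.out.println(allocate(tlab, 40)); // does not fit, goes to the shared area
    }
}
```

A real JVM has more options at this point (for example handing the thread a fresh TLAB), so treat the fallback here as a simplification of the behavior described above.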

When Eden does not have enough memory to create an object, the JVM runs a Minor GC. If Eden still cannot fit the object after the Minor GC, the object is created in the Old Generation.

Mark and Copy: mark all live objects in Eden, copy them to a Survivor space, then clear Eden.

image.png

Survivor Spaces

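The Survivor space is split into two regions, from and to, and one of them is always empty. When a GC happens in Eden, the live objects in both Eden and from are copied into to. After that, from is empty, which amounts to from and to swapping roles.

Once an object has stayed in the Survivor spaces "long enough", it is promoted to the Old Generation, as the following quote describes: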

This process of copying the live objects between the two Survivor spaces is repeated several times until some objects are considered to have matured and are ‘old enough’. Remember that, based on the generational hypothesis, objects which have survived for some time are expected to continue to be used for very long time.
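The copy-and-swap cycle described above can be sketched in a few lines. This is a toy simulation under stated assumptions: `MinorGcSketch`, the `Obj` class with its `live` flag, and the `MAX_AGE` threshold of 3 are all hypothetical, and liveness is given up front instead of being computed from GC roots.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy simulation of Minor GC with two Survivor spaces and age-based promotion. */
class MinorGcSketch {
    static final int MAX_AGE = 3; // hypothetical promotion threshold

    static final class Obj {
        final String name;
        final boolean live; // assumed known; a real collector derives this by marking
        int age = 0;

        Obj(String name, boolean live) {
            this.name = name;
            this.live = live;
        }
    }

    List<Obj> eden = new ArrayList<>();
    List<Obj> from = new ArrayList<>();
    List<Obj> to = new ArrayList<>(); // always empty between collections
    List<Obj> old = new ArrayList<>();

    void minorGc() {
        List<Obj> candidates = new ArrayList<>(eden);
        candidates.addAll(from);
        for (Obj o : candidates) {
            if (!o.live) {
                continue; // dead objects are simply never copied
            }
            o.age++;
            if (o.age >= MAX_AGE) {
                old.add(o); // survived long enough: promote
            } else {
                to.add(o); // otherwise copy into the empty Survivor space
            }
        }
        eden.clear();
        from.clear();
        // Swap roles: the space just filled becomes "from", the emptied one becomes "to".
        List<Obj> tmp = from;
        from = to;
        to = tmp;
    }
}
```

With these assumptions, a live object allocated in Eden sits in a Survivor space after the first and second collections and moves to the Old Generation on the third.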

image.png

Old Generation

GC in the Old Generation proceeds roughly as follows:

Mark reachable objects by setting the marked bit next to all objects accessible through GC roots

Delete all unreachable objects

Compact the content of old space by copying the live objects contiguously to the beginning of the Old space
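As a rough illustration of the delete and compact steps, the sketch below slides marked objects to the front of a toy heap. It assumes the mark bits have already been computed; `CompactSketch`, the `String[]` heap, and the `boolean[]` mark array are hypothetical stand-ins, and a real collector would also have to update every reference to a moved object.

```java
import java.util.Arrays;

/** Toy heap: slots hold object names, marked[i] says whether slot i is reachable. */
class CompactSketch {
    /**
     * Drops unmarked objects and slides marked ones to the beginning of the heap.
     * Returns the number of live objects, which is also the index of the first free slot.
     */
    static int sweepAndCompact(String[] heap, boolean[] marked) {
        int free = 0;
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && marked[i]) {
                heap[free] = heap[i]; // free <= i, so nothing live is overwritten
                free++;
            }
        }
        Arrays.fill(heap, free, heap.length, null); // everything after the live block is free
        return free;
    }
}
```

After compaction all free space forms one contiguous block, so the next allocation can again be a simple pointer bump.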

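PermGen (Permanent Generation)

PermGen holds items such as the following (TODO: the exact contents still need to be researched in detail):

metadata, such as class information
some interned strings
and so on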

GC Algorithms

Marking Reachable Objects

The idea is simple: starting from the GC roots, mark every reachable object, as shown in the figure below.

image.png
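The marking step is an ordinary graph traversal. Below is a minimal sketch, assuming the heap is modeled as a map from object name to the names it references; `MarkSketch` and this map-based representation are illustrative assumptions, not how a JVM stores objects.

```java
import java.util.ArrayDeque;
import java.util.Collections;
import java.util.Deque;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

/** Marks everything reachable from the given roots using an explicit stack. */
class MarkSketch {
    static Set<String> mark(Map<String, List<String>> refs, List<String> roots) {
        Set<String> marked = new HashSet<>();
        Deque<String> stack = new ArrayDeque<>(roots);
        while (!stack.isEmpty()) {
            String obj = stack.pop();
            if (!marked.add(obj)) {
                continue; // already marked; this also makes cycles harmless
            }
            for (String child : refs.getOrDefault(obj, Collections.emptyList())) {
                stack.push(child);
            }
        }
        return marked;
    }
}
```

Note that the loop only ever visits objects reachable from the roots. Unreachable objects are never touched, so the work done is proportional to the number of live objects.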

But which objects qualify as GC roots? Here are a few typical kinds:

Local variable and input parameters of the currently executing methods

Active threads

Static field of the loaded classes

JNI references

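Note that the marking phase causes a stop-the-world pause, although it is usually short enough to ignore. What determines the length of this pause? The number of live objects, not the size of the heap: marking only traverses objects reachable from the GC roots, so dead objects are never visited.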

Such a situation when the application threads are temporarily stopped so that the JVM can indulge in housekeeping activities is called a safe point resulting in a Stop The World pause.

Removing Unused Objects

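Note that the three approaches below are general techniques and are not tied to any specific GC algorithm.

Sweep

The figure below illustrates the basic idea of sweep. Note that after sweeping, the heap is left heavily fragmented.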

image.png
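The fragmentation problem is easy to see in a toy model. The sketch below frees unmarked slots in place and records them in a free-list; `SweepSketch`, the slot-array heap, and the `largestFreeRun` helper are assumptions made purely for illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy sweep: unmarked slots are freed in place and remembered in a free-list. */
class SweepSketch {
    /** Clears unmarked slots and returns the indices of all free slots. */
    static List<Integer> sweep(String[] heap, boolean[] marked) {
        List<Integer> freeList = new ArrayList<>();
        for (int i = 0; i < heap.length; i++) {
            if (heap[i] != null && !marked[i]) {
                heap[i] = null; // reclaim the dead object; live objects do not move
            }
            if (heap[i] == null) {
                freeList.add(i);
            }
        }
        return freeList;
    }

    /** Length of the longest run of adjacent free slots. */
    static int largestFreeRun(String[] heap) {
        int best = 0;
        int run = 0;
        for (String slot : heap) {
            run = (slot == null) ? run + 1 : 0;
            best = Math.max(best, run);
        }
        return best;
    }
}
```

For a heap of `{"A", "B", null, "C", "D", "E"}` where only A, C, and E are marked, three slots end up free but the largest contiguous run is two, so an object needing three adjacent slots cannot be placed even though enough total space exists.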

Compact

A simple illustration of compaction:

image.png

Copy

image.png

Four Concrete GC Algorithms

Serial GC

Young Generation: mark-copy
Old Generation: mark-sweep-compact

Parallel GC

This combination of Garbage Collectors uses mark-copy in the Young Generation and mark-sweep-compact in the Old Generation. Both Young and Old collections trigger stop-the-world events, stopping all application threads to perform garbage collection.

In short, the Young Generation uses mark-copy, the Old Generation uses mark-sweep-compact, and both collections stop the world.
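To check which pair of collectors a running JVM actually uses, the standard `java.lang.management` API exposes one `GarbageCollectorMXBean` per collector. The sketch below just lists them; the class name `CollectorInfo` is made up here, and the reported names differ between JVM versions and between flags such as `-XX:+UseSerialGC` and `-XX:+UseParallelGC`, so no particular output is assumed.

```java
import java.lang.management.GarbageCollectorMXBean;
import java.lang.management.ManagementFactory;
import java.util.ArrayList;
import java.util.List;

/** Lists the garbage collectors of the current JVM with their collection counts. */
class CollectorInfo {
    static List<String> collectorNames() {
        List<String> names = new ArrayList<>();
        for (GarbageCollectorMXBean gc : ManagementFactory.getGarbageCollectorMXBeans()) {
            // Typically one bean covers the Young Generation and another the Old Generation.
            names.add(gc.getName() + " (collections so far: " + gc.getCollectionCount() + ")");
        }
        return names;
    }

    public static void main(String[] args) {
        collectorNames().forEach(System.out::println);
    }
}
```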

Concurrent Mark and Sweep

CMS was designed to avoid stop-the-world pauses in the Old Generation as much as possible. It achieves this in two ways:

it does not compact the Old Generation but uses free-lists to manage reclaimed space.
it does most of the job in the mark-and-sweep phases concurrently with the application, so application threads are not blocked for most of the work.

CMS Log Analysis

Young Generation GC log analysis:

image.png

One thing worth noting about CMS: while the Old Generation is being collected, a Minor GC can still run at the same time. The two do not conflict.

Phase 1: Initial Mark

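One of the STW phases. It marks only the Old Generation objects that are directly reachable from GC roots or referenced from live objects in the Young Generation; it does not mark every live object.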

image.png

Phase 2: Concurrent Mark

During this phase the Garbage Collector traverses the Old Generation and marks all live objects, starting from the roots found in the previous phase of “Initial Mark”.

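Question: if live objects were already marked in phase 1, why is another concurrent marking pass needed in phase 2?
Answer: phase 1 did not mark every live object in the Old Generation, only those reachable directly from GC roots and from the Young Generation. Phase 2 starts from those objects and traverses the rest.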

image.png

Phase 3: Concurrent Preclean

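Because phase 2 runs concurrently without blocking the application, some references may change while it is in progress. For example, the "current object" in the figure below gained a reference to another object during phase 2.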

image.png

The JVM flags the objects that changed during phase 2 as dirty, then rechecks and re-marks them in phase 3.

image.png
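One common way to track such changes is a card table: the heap is divided into fixed-size "cards", and every reference write flags the card it touches. The notes above do not say which mechanism is used, so the sketch below is an assumption for illustration only; `CardTableSketch`, the 512-byte `CARD_SIZE`, and both method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;

/** Toy card table: a reference write dirties the card covering the written address. */
class CardTableSketch {
    static final int CARD_SIZE = 512; // assumed number of heap bytes covered by one card

    private final boolean[] dirty;

    CardTableSketch(int heapBytes) {
        dirty = new boolean[(heapBytes + CARD_SIZE - 1) / CARD_SIZE];
    }

    /** Write barrier: called whenever a reference field at the given address is updated. */
    void onReferenceWrite(int address) {
        dirty[address / CARD_SIZE] = true;
    }

    /** Preclean: returns the cards that must be rescanned and clears their dirty flags. */
    List<Integer> preclean() {
        List<Integer> toRescan = new ArrayList<>();
        for (int card = 0; card < dirty.length; card++) {
            if (dirty[card]) {
                toRescan.add(card);
                dirty[card] = false;
            }
        }
        return toRescan;
    }
}
```

The point of the design is that the collector only rescans the flagged regions instead of repeating the whole marking pass.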

Phase 4: Concurrent Abortable Preclean

Again, a concurrent phase that is not stopping the application’s threads. This one attempts to take as much work off the shoulders of the stop-the-world Final Remark as possible.

This is also a marking phase, but it can be aborted when certain conditions are met.

Phase 5: Final Remark

An STW phase. It ensures that every live object in the Old Generation has been marked correctly.

Phase 6: Concurrent Sweep

Reclaims the heap space occupied by unmarked objects.

image.png

Phase 7: Concurrent Reset

Concurrently executed phase, resetting inner data structures of the CMS algorithm and preparing them for the next cycle.
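As a compact summary of the seven phases above, the enum below records which of them stop the world. The `CmsPhase` type and its `pausingPhases` helper are illustrative names only; the STW flags come straight from the phase descriptions in these notes.

```java
import java.util.ArrayList;
import java.util.List;

/** The CMS phases in order, with a flag for the ones that pause application threads. */
enum CmsPhase {
    INITIAL_MARK(true),
    CONCURRENT_MARK(false),
    CONCURRENT_PRECLEAN(false),
    CONCURRENT_ABORTABLE_PRECLEAN(false),
    FINAL_REMARK(true),
    CONCURRENT_SWEEP(false),
    CONCURRENT_RESET(false);

    final boolean stopTheWorld;

    CmsPhase(boolean stopTheWorld) {
        this.stopTheWorld = stopTheWorld;
    }

    /** Returns the phases during which application threads are stopped. */
    static List<CmsPhase> pausingPhases() {
        List<CmsPhase> result = new ArrayList<>();
        for (CmsPhase phase : values()) {
            if (phase.stopTheWorld) {
                result.add(phase);
            }
        }
        return result;
    }
}
```

Only Initial Mark and Final Remark pause the application; the other five phases run alongside it.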

CMS Long GC Problems

CMS can end up in the infamous Full GC pause in two situations:

concurrent mode failure
promotion failure

concurrent mode failure

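If the Old Generation runs out of memory while CMS is in a concurrent phase, all application threads are blocked (a stop-the-world Full GC) until the collection finishes and frees enough Old Generation memory. Under normal conditions the concurrent phases do not stop the world.

There are usually two remedies:

Start CMS earlier, instead of waiting until the Old Generation has very little memory left.
Try increasing the number of threads used in the concurrent phases. (This may not help much, because the application may need that CPU time more.)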
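For the first remedy, a back-of-the-envelope calculation can suggest how early is early enough. The sketch below is not from the original notes: the formula, the class `CmsThresholdSketch`, and all three inputs are assumptions. The idea is that the cycle must start while there is still room for everything that will be promoted before it finishes. In HotSpot the start threshold is set with `-XX:CMSInitiatingOccupancyFraction` (usually together with `-XX:+UseCMSInitiatingOccupancyOnly`), and the concurrent thread count with `-XX:ConcGCThreads`.

```java
/** Rough estimate of the latest Old Generation occupancy at which a CMS cycle should start. */
class CmsThresholdSketch {
    /**
     * @param oldGenMb          Old Generation capacity in MB
     * @param promotionMbPerSec assumed rate at which objects are promoted, in MB per second
     * @param cycleSeconds      assumed duration of one concurrent cycle, in seconds
     * @return occupancy percentage (0 to 100) that still leaves room for promotions during the cycle
     */
    static int latestSafeStartPercent(long oldGenMb, long promotionMbPerSec, long cycleSeconds) {
        long headroomMb = promotionMbPerSec * cycleSeconds; // space consumed while CMS is running
        long safeMb = Math.max(0, oldGenMb - headroomMb);
        return (int) (safeMb * 100 / oldGenMb);
    }

    public static void main(String[] args) {
        // 1000 MB Old Generation, 50 MB/s promoted, 4 s per cycle: 200 MB of headroom needed.
        System.out.println(latestSafeStartPercent(1000, 50, 4)); // prints 80
    }
}
```

In practice promotion rates fluctuate, so a value with some margin below such an estimate is the safer choice; measure with GC logs before settling on a number.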
