Dashixiong's Python Source Code Study Notes (54): Python's Memory Management Mechanism (9)

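Dashixiong's Python Source Code Study Notes (53): Python's Memory Management Mechanism (8)
Dashixiong's Python Source Code Study Notes (55): Python's Memory Management Mechanism (10)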

V. Garbage Collection in Python

2. Generational Garbage Collection

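Programs written in any language, of any kind and any size, share one trait:

Most memory blocks are short-lived, while the rest live much longer, some of them from program start until exit.

The short-lived share is typically between 80% and 98%.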

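This matters a great deal for garbage collection:

The extra work done by a collector such as mark-and-sweep grows with the total number of memory blocks in the system.

So the fewer blocks a collection has to examine, the less extra work it does and the more efficient it is.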

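Based on this, a space-for-time strategy can be used:

Divide all memory blocks in the system into sets according to how long they have survived; each set is called a generation.

The older a generation, the less often it is collected: the longer an object has lived, the less likely it is to be garbage, so it should be examined less often.

Survival time is measured in garbage collection passes: the more collections an object has survived, the older it is.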

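Python's generational garbage collector uses three generations:

A generation is simply a linked list; objects belong to the same generation if they are on the same list.

So three generations means maintaining three linked lists.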

Include/internal/mem.h

struct gc_generation {
    PyGC_Head head;
    int threshold; /* collection threshold */
    int count; /* count of allocations or collections of younger
                  generations */
};

Modules/gcmodule.c

#define GEN_HEAD(n) (&_PyRuntime.gc.generations[n].head)

void
_PyGC_Initialize(struct _gc_runtime_state *state)
{
    state->enabled = 1; /* automatic collection enabled? */

#define _GEN_HEAD(n) (&state->generations[n].head)
    struct gc_generation generations[NUM_GENERATIONS] = {
        /* PyGC_Head,                                 threshold,      count */
        {{{_GEN_HEAD(0), _GEN_HEAD(0), 0}},           700,            0},
        {{{_GEN_HEAD(1), _GEN_HEAD(1), 0}},           10,             0},
        {{{_GEN_HEAD(2), _GEN_HEAD(2), 0}},           10,             0},
    };
    for (int i = 0; i < NUM_GENERATIONS; i++) {
        state->generations[i] = generations[i];
    };
    state->generation0 = GEN_HEAD(0);
    struct gc_generation permanent_generation = {
          {{&state->permanent_generation.head, &state->permanent_generation.head, 0}}, 0, 0
    };
    state->permanent_generation = permanent_generation;
}
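The thresholds set here can be inspected from Python through the standard gc module. A minimal check, with the caveat that the printed values depend on the interpreter version and on whether gc.set_threshold has been called; (700, 10, 10) corresponds to the source quoted above.

```python
import gc

# One threshold per generation, youngest first.
thresholds = gc.get_threshold()
print(thresholds)

# gc.isenabled() reports whether automatic collection is switched on.
print(gc.isenabled())
```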

_PyObject_GC_TRACK uses the variable _PyGC_generation0, a pointer to the head of the generation 0 list:

Include/objimpl.h

/* Tell the GC to track this object.  NB: While the object is tracked the
 * collector it must be safe to call the ob_traverse method. */
#define _PyObject_GC_TRACK(o) do { \
    PyGC_Head *g = _Py_AS_GC(o); \
    if (_PyGCHead_REFS(g) != _PyGC_REFS_UNTRACKED) \
        Py_FatalError("GC object already tracked"); \
    _PyGCHead_SET_REFS(g, _PyGC_REFS_REACHABLE); \
    g->gc.gc_next = _PyGC_generation0; \
    g->gc.gc_prev = _PyGC_generation0->gc.gc_prev; \
    g->gc.gc_prev->gc.gc_next = g; \
    _PyGC_generation0->gc.gc_prev = g; \
    } while (0);
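The four pointer assignments in the macro link the object in just before the list head, that is, at the tail of a circular doubly linked list. A rough Python model follows; the Node class and the track and names functions are hypothetical stand-ins for PyGC_Head and the macro, not part of CPython.

```python
class Node:
    """Stand-in for PyGC_Head: a node in a circular doubly linked list."""

    def __init__(self, name):
        self.name = name
        # An empty list is a head node that points at itself.
        self.gc_next = self
        self.gc_prev = self


def track(head, g):
    """Link g in just before head, mirroring the macro's four assignments."""
    g.gc_next = head
    g.gc_prev = head.gc_prev
    g.gc_prev.gc_next = g
    head.gc_prev = g


def names(head):
    """Walk the list from head and return the node names in order."""
    out = []
    node = head.gc_next
    while node is not head:
        out.append(node.name)
        node = node.gc_next
    return out


generation0 = Node("head")
for name in ["x", "y", "z"]:
    track(generation0, Node(name))

print(names(generation0))  # ['x', 'y', 'z']
```

Because the head's gc_prev always points at the last node, appending takes constant time however long the list is.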

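In each gc_generation, count is a counter rather than the length of the list: for generation 0 it tracks allocations of collectable objects since the last collection, and for the older generations it tracks how many times the next younger generation has been collected (see the comment on the struct).
In _PyObject_GC_Alloc, every successful allocation is followed by count++ on generation 0: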

Modules/gcmodule.c

static PyObject *
_PyObject_GC_Alloc(int use_calloc, size_t basicsize)
{
    PyObject *op;
    PyGC_Head *g;
    ... ...
    _PyRuntime.gc.generations[0].count++; /* number of allocated GC objects */
    ... ...
}

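Together with _PyObject_GC_TRACK above, this means every newly created collectable object starts out in the generation 0 list.
In gc_generation, threshold is the limit that count is compared against; the source shows it is 700 for generation 0 and 10 for generations 1 and 2.
So once generation 0's count exceeds 700, a garbage collection is triggered.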
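The counters can be observed with gc.get_count() from the standard gc module. A small experiment, assuming a standard CPython build; the exact numbers vary between runs and interpreter versions, so none are shown.

```python
import gc

was_enabled = gc.isenabled()
gc.disable()  # avoid an automatic collection in the middle of the experiment
try:
    before = gc.get_count()
    # Lists are container objects, so each allocation is tracked by the GC.
    keep = [[] for _ in range(100)]
    after = gc.get_count()
finally:
    # Restore whatever state automatic collection was in before.
    if was_enabled:
        gc.enable()

print(before, after)
```

On the CPython version quoted above, the first element of the tuple should be noticeably larger after the allocations than before.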

Modules/gcmodule.c

static Py_ssize_t
collect_generations(void)
{
    int i;
    Py_ssize_t n = 0;

    /* Find the oldest generation (highest numbered) where the count
     * exceeds the threshold.  Objects in the that generation and
     * generations younger than it will be collected. */
    for (i = NUM_GENERATIONS-1; i >= 0; i--) {
        if (_PyRuntime.gc.generations[i].count > _PyRuntime.gc.generations[i].threshold) {
            /* Avoid quadratic performance degradation in number
               of tracked objects. See comments at the beginning
               of this file, and issue #4074.
            */
            if (i == NUM_GENERATIONS - 1
                && _PyRuntime.gc.long_lived_pending < _PyRuntime.gc.long_lived_total / 4)
                continue;
            n = collect_with_callback(i);
            break;
        }
    }
    return n;
}

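Although it is generation 0 exceeding its threshold that triggers the collection, Python uses the occasion to find the oldest generation whose count exceeds its threshold and collects that generation together with all younger ones.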
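The selection rule in collect_generations can be modelled in Python. This sketch is only an illustration: the function name pick_generation is hypothetical, and the counts, thresholds and the two long_lived values are passed in as plain arguments instead of being read from the runtime state.

```python
def pick_generation(counts, thresholds, long_lived_pending=0, long_lived_total=0):
    """Return the oldest generation whose count exceeds its threshold, or None."""
    oldest = len(counts) - 1
    for i in range(oldest, -1, -1):
        if counts[i] > thresholds[i]:
            # Skip a full collection while few objects are pending
            # relative to those that survived the last full collection.
            if i == oldest and long_lived_pending < long_lived_total // 4:
                continue
            return i
    return None


thresholds = [700, 10, 10]
print(pick_generation([701, 3, 0], thresholds))   # 0
print(pick_generation([701, 11, 0], thresholds))  # 1
# Generation 2 is over its threshold but is skipped by the long-lived check.
print(pick_generation([701, 11, 11], thresholds,
                      long_lived_pending=10, long_lived_total=100))  # 1
```

Collecting the returned generation also collects every younger one, as the comment in the C source states.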
