Smart Pointers: Implementation Principles and Usage Practices

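This article discusses the underlying implementation of smart pointers and some issues that come up when using them. It is not a beginner's tutorial; for basic usage of smart pointers, see here.
Memory leaks in C++ have long been a source of criticism. To reduce the mental burden on developers and make memory management easier, C++11 added smart pointers to the standard library. Smart pointers address memory leaks through the RAII mechanism.

C++11 introduced three smart pointers (leaving aside auto_ptr, which was removed in C++17): shared_ptr, weak_ptr and unique_ptr.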

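1 Implementation of smart pointers

This section uses gcc's libstdc++ as the reference for how smart pointers are implemented. Only the basic implementation is considered; custom deleters and other features are ignored.

1.1 unique_ptr implementation

Use case: exclusive ownership of data.
The unique_ptr source is the simplest of the three. The class template holds a pointer to the data, deletes the copy constructor and copy assignment operator, and provides a move constructor and move assignment operator.
A simplified implementation: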

template <typename Tp>
class my_unique_ptr
{
public:
    explicit my_unique_ptr(Tp *ptr = nullptr) : ptr(ptr) {}
    my_unique_ptr(my_unique_ptr &&other) noexcept : ptr(other.ptr)
    {
        other.ptr = nullptr; // the source gives up ownership
    }
    my_unique_ptr &operator=(my_unique_ptr &&other) noexcept
    {
        if (this != &other)
        {
            delete ptr; // release the object currently owned
            ptr = other.ptr;
            other.ptr = nullptr;
        }
        return *this;
    }
    ~my_unique_ptr() { delete ptr; }

    // Copying is disabled because ownership is exclusive.
    my_unique_ptr(const my_unique_ptr &other) = delete;
    my_unique_ptr &operator=(const my_unique_ptr &other) = delete;

private:
    Tp *ptr;
};
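To connect the simplified class to the real thing, here is a minimal sketch that exercises the same move-only behaviour with std::unique_ptr. The Widget type and the function name are hypothetical, introduced only for this illustration.

```cpp
#include <cassert>
#include <memory>
#include <utility>

// Hypothetical payload type used only for this illustration.
struct Widget
{
    int value;
    explicit Widget(int v) : value(v) {}
};

// Transfers ownership from one unique_ptr to another and reports
// whether the source was left empty after the move.
bool source_is_empty_after_move()
{
    std::unique_ptr<Widget> first(new Widget(42));
    std::unique_ptr<Widget> second(std::move(first)); // move construction
    // std::unique_ptr<Widget> third(second);         // would not compile: copy is deleted
    assert(second->value == 42);
    return first == nullptr;
}
```

Calling source_is_empty_after_move() from main returns true: after the move, only second owns the Widget, and it is deleted exactly once when second goes out of scope.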

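1.2 shared_ptr

Use case: shared ownership of data.
The shared_ptr class template holds two pointers: one to the data and one to the reference count. The reference count needs its own heap allocation. It lives on the heap so that the count pointers of several shared_ptr instances can point to the same block and therefore share one count.

With two shared_ptr instances pointing to the same data, memory looks like the figure below (data and ref_count are both on the heap):

[Figure: shared smart pointers]

A simplified implementation: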

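#include <atomic>

template <typename Tp>
class my_shared_ptr
{
public:
    my_shared_ptr() : ptr(nullptr), ref_count_ptr(nullptr) {}
    explicit my_shared_ptr(Tp *ptr) : ptr(ptr), ref_count_ptr(new std::atomic<int>(1)) {}
    my_shared_ptr(const my_shared_ptr &other)
        : ptr(other.ptr), ref_count_ptr(other.ref_count_ptr)
    {
        if (ref_count_ptr)
            ++(*ref_count_ptr); // one more owner of the same data
    }

    my_shared_ptr &operator=(const my_shared_ptr &other)
    {
        if (this != &other)
        {
            release(); // give up the data owned so far
            ptr = other.ptr;
            ref_count_ptr = other.ref_count_ptr;
            if (ref_count_ptr)
                ++(*ref_count_ptr);
        }
        return *this;
    }

    ~my_shared_ptr() { release(); }

private:
    // Drops one reference; the last owner frees both data and count.
    void release()
    {
        if (ref_count_ptr && (--(*ref_count_ptr)) == 0)
        {
            delete ptr;
            delete ref_count_ptr;
        }
    }

    Tp *ptr;
    std::atomic<int> *ref_count_ptr;
};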
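The counting behaviour can be observed on std::shared_ptr through use_count(). The sketch below is only an illustration; the two function names are hypothetical.

```cpp
#include <cassert>
#include <memory>

// Returns the reference count observed while two owners exist.
long count_with_two_owners()
{
    std::shared_ptr<int> first(new int(7));
    assert(first.use_count() == 1);
    std::shared_ptr<int> second = first; // copy: both share one count
    return first.use_count();
}

// Returns the count after the second owner has gone out of scope.
long count_after_copy_destroyed()
{
    std::shared_ptr<int> first(new int(7));
    {
        std::shared_ptr<int> second = first;
        assert(second.use_count() == 2);
    } // second is destroyed here and the count drops again
    return first.use_count();
}
```

The first function returns 2 and the second returns 1, matching the increment in the copy constructor and the decrement in the destructor of the simplified class.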

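1.3 weak_ptr

Use case: resolving circular references between shared_ptr instances.
A circular reference between shared_ptr instances causes a memory leak. Surprising as it may be, memory can still leak even when smart pointers are used.
In principle shared_ptr and unique_ptr should cover every use case, but shared_ptr brings the circular reference problem with it. In this article's view, weak_ptr was added purely to solve that problem, so it should be used only where shared_ptr instances form a cycle and nowhere else.
Here is how a shared_ptr circular reference arises: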

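#include <memory>
using namespace std;

class B; // forward declaration: A refers to B before B is defined

class A
{
public:
    shared_ptr<B> b;
};

class B
{
public:
    shared_ptr<A> a;
};

int main()
{
    shared_ptr<A> aPtr(new A);
    shared_ptr<B> bPtr(new B);
    aPtr->b = bPtr; // B now has two owners: bPtr and aPtr->b
    bPtr->a = aPtr; // A now has two owners: aPtr and bPtr->a
    return 0;
}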

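[Figure: shared smart pointer circular reference]

From the code and the figure above, while the instances of A and B are in scope, aPtr->ref_count_ptr->ref_count = 2 and bPtr->ref_count_ptr->ref_count = 2.
On leaving the scope, bPtr is destroyed first and the destructor bPtr->~shared_ptr<B>() runs. bPtr->ref_count_ptr->ref_count is decremented to 1. Since it has not reached 0, bPtr->data_ptr->~B() is not called, so bPtr->data_ptr->a->ref_count_ptr->ref_count (that is, aPtr->ref_count_ptr->ref_count) is unchanged: aPtr->ref_count_ptr->ref_count = 2.
When aPtr is destroyed, the destructor aPtr->~shared_ptr<A>() runs. aPtr->ref_count_ptr->ref_count is decremented to 1. Since it has not reached 0, aPtr->data_ptr->~A() is not called, so aPtr->data_ptr->b->ref_count_ptr->ref_count (that is, bPtr->ref_count_ptr->ref_count) is unchanged: bPtr->ref_count_ptr->ref_count = 1.

Put simply, A and B each hold a shared_ptr to the other. When shared_ptr<B> is destroyed, B's reference count drops from 2 to 1. Because the count is not 0, ~B is not called to destroy B, so the shared_ptr<A> inside B is not destroyed either and A's reference count stays at 2.
When shared_ptr<A> is destroyed afterwards, A's reference count drops from 2 to 1. Because the count is not 0, ~A is not called to destroy A, so the shared_ptr<B> inside A is not destroyed either and B's reference count stays at 1.
In the end the reference counts of A and B are both 1, the heap memory they occupy is never released, and memory has leaked.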
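The counts in this walkthrough can be checked with use_count(). The sketch below rebuilds the cycle with hypothetical NodeA and NodeB types standing in for A and B. As an assumption of this sketch, it breaks the cycle by hand before returning so that the example itself does not leak.

```cpp
#include <cassert>
#include <memory>

struct NodeB; // forward declaration so NodeA can refer to it

struct NodeA { std::shared_ptr<NodeB> b; };
struct NodeB { std::shared_ptr<NodeA> a; };

// Builds the cycle and returns the count seen for the NodeA object.
long count_inside_cycle()
{
    std::shared_ptr<NodeA> aPtr(new NodeA);
    std::shared_ptr<NodeB> bPtr(new NodeB);
    aPtr->b = bPtr;
    bPtr->a = aPtr;
    long observed = aPtr.use_count(); // aPtr itself plus bPtr->a
    assert(bPtr.use_count() == 2);    // bPtr itself plus aPtr->b
    // Break the cycle by hand so that this sketch itself does not leak.
    bPtr->a.reset();
    return observed;
}
```

count_inside_cycle() returns 2. Without the reset() call, both objects would still have a count of 1 after the function returns, which is the leak described above.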

Introducing weak_ptr resolves the circular reference.

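#include <memory>
using namespace std;

class B; // forward declaration: A refers to B before B is defined

class A
{
public:
    shared_ptr<B> b;
};

class B
{
public:
    weak_ptr<A> a; // non-owning: does not increase A's reference count
};

int main()
{
    shared_ptr<A> aPtr(new A);
    shared_ptr<B> bPtr(new B);
    aPtr->b = bPtr;
    bPtr->a = aPtr; // A's reference count stays at 1
    return 0;
}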
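To check that the weak_ptr version really releases its objects, the following sketch counts destructor calls and uses lock() to reach the object through the weak_ptr. Parent, Child and the destroyed counter are hypothetical names introduced for this illustration, not part of the example above.

```cpp
#include <cassert>
#include <memory>

struct Child;

struct Parent
{
    std::shared_ptr<Child> child;
    static int destroyed; // counts destructor calls for the demo
    ~Parent() { ++destroyed; }
};
int Parent::destroyed = 0;

struct Child
{
    std::weak_ptr<Parent> parent; // non-owning back reference
};

// Returns true if the Parent was destroyed once its shared_ptr left scope.
bool parent_is_released()
{
    Parent::destroyed = 0;
    std::weak_ptr<Parent> observer;
    {
        std::shared_ptr<Parent> p = std::make_shared<Parent>();
        std::shared_ptr<Child> c = std::make_shared<Child>();
        p->child = c;
        c->parent = p; // does not increase the strong count
        assert(p.use_count() == 1);
        // lock() yields a shared_ptr while the object is still alive.
        std::shared_ptr<Parent> locked = c->parent.lock();
        assert(locked && locked.use_count() == 2);
        observer = p;
    }
    // All owners are gone: the weak_ptr has expired and lock() returns empty.
    return Parent::destroyed == 1 && observer.expired() && !observer.lock();
}
```

parent_is_released() returns true. Because the back reference is weak, the Parent count reaches 0 at the end of the inner scope, ~Parent runs, and that in turn releases the Child.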

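2 Smart pointer usage practices

Beyond choosing the right smart pointer for the situation, here are some rather personal suggestions:

When passing a smart pointer into a function (or lambda), prefer passing it by copy, which matches the original purpose of smart pointers (passing by reference also works, of course).
Personally, when unique_ptr solves the problem I do not use shared_ptr: the lifetime is predictable and the pointer's lifetime is not extended for no reason.
Create smart pointers with make_shared and make_unique, e.g. shared_ptr<A> a = make_shared<A>(). The return value of make_XXX is subject to return value optimization (RVO; copy elision is guaranteed since C++17 and is at worst a cheap move before that), so there is no need to worry about = triggering a copy construction.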
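These suggestions might look like the following in code. This is a minimal sketch with hypothetical function names; note that make_unique requires C++14.

```cpp
#include <cassert>
#include <functional>
#include <memory>

// Captures the shared_ptr by value in a lambda and returns the callable.
// The lambda's own copy keeps the int alive after the local pointer is gone.
std::function<int()> make_reader()
{
    std::shared_ptr<int> data = std::make_shared<int>(5);
    assert(data.use_count() == 1);
    return [data]() { return *data; }; // captured by copy, not by reference
}

// Sole ownership is handed from the function to its caller,
// so unique_ptr is enough and no shared_ptr is needed.
std::unique_ptr<int> make_value()
{
    return std::make_unique<int>(9); // make_unique requires C++14
}
```

make_reader()() yields 5 even though the local data variable no longer exists. Had the lambda captured data by reference, it would have been left with a dangling reference.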

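3 Is shared_ptr thread-safe?

This is a well-worn topic. Almost none of the STL components are thread-safe; thread safety has to be ensured by additional means.
shared_ptr is no exception. Although the implementation uses an atomic for the reference count, that only makes the reference count itself thread-safe. Updating the count, updating the pointer variable, and operating on the pointed-to data are not atomic as a whole. Broadly speaking, overall thread safety is not guaranteed.
A detailed analysis follows: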

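(1) Multiple threads each hold a copy of a shared_ptr from the main thread

Reference count of the shared_ptr:
Thread-safe.
Pointer to the data inside the shared_ptr:
Each thread has its own copy, so repointing it does not affect other threads. Repointing decrements the reference count of the originally pointed-to memory by 1, which is thread-safe because the count is an atomic variable. One effect is local and the other is thread-safe, so this is thread-safe overall.
Data that the pointer inside the shared_ptr actually points to:
Multiple threads operate on the same memory, so there is a data race; not thread-safe.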

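(2) Multiple threads hold a reference to a shared_ptr in the main thread

Reference count of the shared_ptr:
Thread-safe.
Pointer to the data inside the shared_ptr:
All threads operate on the same pointer variable, so there is a data race; not thread-safe.
Data that the pointer inside the shared_ptr actually points to:
Multiple threads operate on the same memory, so there is a data race; not thread-safe.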
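As a sketch of case (1), each thread below receives its own copy of the shared_ptr, and a mutex supplies the additional protection the pointed-to data needs. The Counter type and run_workers function are hypothetical, and the mutex is just one possible way to guard the data.

```cpp
#include <cassert>
#include <memory>
#include <mutex>
#include <thread>
#include <vector>

// Hypothetical shared state: value is the pointed-to data,
// mtx is the extra mechanism that makes updates to it safe.
struct Counter
{
    std::mutex mtx;
    int value = 0;
};

// Each thread gets its own copy of the shared_ptr (case 1)
// and updates the shared Counter while holding the mutex.
int run_workers(int thread_count, int increments)
{
    assert(thread_count > 0 && increments >= 0);
    std::shared_ptr<Counter> counter = std::make_shared<Counter>();
    std::vector<std::thread> workers;
    for (int i = 0; i < thread_count; ++i)
    {
        // Capturing by value copies the shared_ptr; the atomic count covers that.
        workers.emplace_back([counter, increments]() {
            for (int j = 0; j < increments; ++j)
            {
                std::lock_guard<std::mutex> lock(counter->mtx);
                ++counter->value; // without the lock this would be a data race
            }
        });
    }
    for (std::thread &t : workers)
        t.join();
    return counter->value;
}
```

run_workers(4, 1000) returns 4000. Copying the shared_ptr into each thread needs no lock, whereas the increments of value do.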

4 How to make shared_ptr thread-safe

The C++20 standard introduced atomic<shared_ptr> to provide thread safety; for details, see here.
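A minimal sketch of how atomic<shared_ptr> might be used follows. The Published class and its member names are hypothetical. As an assumption of this sketch, it checks the feature-test macro __cpp_lib_atomic_shared_ptr and falls back to the older std::atomic_load and std::atomic_store overloads for shared_ptr when the C++20 specialization is not available.

```cpp
#include <atomic>
#include <cassert>
#include <memory>

// Hypothetical holder for a value that one thread republishes while others read it.
class Published
{
public:
    explicit Published(int initial) : current(std::make_shared<int>(initial)) {}

    // Atomically replaces the shared_ptr itself.
    void publish(int next)
    {
        std::shared_ptr<int> fresh = std::make_shared<int>(next);
#if defined(__cpp_lib_atomic_shared_ptr)
        current.store(fresh);
#else
        std::atomic_store(&current, fresh); // pre-C++20 free-function overload
#endif
    }

    // Atomically takes a snapshot; it stays valid even if publish() runs later.
    std::shared_ptr<int> snapshot() const
    {
#if defined(__cpp_lib_atomic_shared_ptr)
        return current.load();
#else
        return std::atomic_load(&current);
#endif
    }

private:
#if defined(__cpp_lib_atomic_shared_ptr)
    std::atomic<std::shared_ptr<int>> current; // C++20
#else
    std::shared_ptr<int> current;
#endif
};

// Shows that an older snapshot is unaffected by a later publish().
bool snapshot_survives_publish()
{
    Published box(1);
    std::shared_ptr<int> old_view = box.snapshot();
    box.publish(2);
    assert(*box.snapshot() == 2);
    return *old_view == 1;
}
```

snapshot_survives_publish() returns true. This makes only loads and stores of the pointer variable atomic, which addresses the data race on the pointer in case (2); the pointed-to data still needs its own protection if it is modified concurrently.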
