OC Internals 03: Memory Alignment and malloc Source Analysis

Three ways to get a memory size

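sizeof: an operator that can be applied to basic data types, pointers, and objects. Because it is not a function, its result is determined at compile time.
class_getInstanceSize: returns the total size of an instance's member variables, aligned to 8 bytes. Import the header objc/runtime.h before using it.
malloc_size: returns the amount of memory the system actually allocated. Import the header malloc/malloc.h before using it.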
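A small C sketch of the sizeof claim: because sizeof is an operator, its operand is not evaluated (variable-length arrays are the one exception), and the result is a constant that can size an array or appear in a static assertion. class_getInstanceSize and malloc_size need the Objective-C runtime and Apple's libmalloc, so they are left out; the names sizeof_skips_evaluation and scratch are made up for illustration.

```c
#include <assert.h>
#include <stddef.h>

/* sizeof is an operator, not a function call: for a non-VLA operand the
   expression inside is never evaluated, so the increment has no effect. */
static int sizeof_skips_evaluation(void) {
    int i = 0;
    size_t s = sizeof(i++);   /* i++ is not executed */
    (void)s;
    return i;                 /* still 0 */
}

/* The result is an integer constant expression, so it can be used as an
   array bound and inside a compile-time assertion. */
static char scratch[sizeof(double) * 2];
static_assert(sizeof(scratch) == 2 * sizeof(double), "sizeof is a compile-time constant");
```

Calling sizeof_skips_evaluation() returns 0, which is a quick way to see that nothing inside sizeof(...) runs at run time.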

Struct memory alignment

First, define two custom structs and print their sizes.

Figure: printing the struct sizes

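As the figure above shows, the two structs have the same members, but because the members are declared in a different order, the structs end up with different sizes. This is memory alignment: the compiler inserts padding so that each member starts at a suitably aligned offset. It applies to C structs in general, not only on iOS.
Using the alignment rules from the previous article, let's calculate the sizes of the two structs by hand.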

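StructA:
a is a char and takes 1 byte.
b is a double and takes 8 bytes. It must start at a multiple of 8, so 7 bytes of padding follow a.
c is a short and takes 2 bytes.
d is an int and takes 4 bytes. It must start at a multiple of 4, so 2 bytes of padding follow c.
The total size must also be a multiple of the largest member's alignment (8), so StructA's total size is: ((1(a) + 7(padding) + 8(b) + 2(c) + 2(padding) + 4(d) + 7) >> 3 << 3 = 24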

StructB:
b is a double and takes 8 bytes.
d is an int and takes 4 bytes.
c is a short and takes 2 bytes.
a is a char and takes 1 byte.
So StructB's total size is: ((8(b) + 4(d) + 2(c) + 1(a) + 7) >> 3 << 3 = 16
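The two hand calculations can be checked with offsetof and sizeof. The struct declarations below are reconstructed from the member names and types in the text (the original screenshot is not available), and the numbers assume double is 8-byte aligned, as on arm64 and x86-64; on some 32-bit ABIs the results differ. round_up_8 is a hypothetical helper that spells out the (x + 7) >> 3 << 3 step.

```c
#include <assert.h>
#include <stddef.h>

struct StructA {
    char   a;   /* offset 0                           */
    double b;   /* offset 8 (7 bytes of padding)      */
    short  c;   /* offset 16                          */
    int    d;   /* offset 20 (2 bytes of padding)     */
};              /* 24 bytes, already a multiple of 8  */

struct StructB {
    double b;   /* offset 0                           */
    int    d;   /* offset 8                           */
    short  c;   /* offset 12                          */
    char   a;   /* offset 14                          */
};              /* 15 bytes of data, padded to 16     */

/* Rounds x up to the next multiple of 8, the largest member alignment in
   both structs: adding 7 and clearing the low 3 bits. */
static size_t round_up_8(size_t x) {
    return (x + 7) >> 3 << 3;
}
```

For StructB, the members occupy 15 bytes and round_up_8(15) gives 16, matching sizeof(struct StructB).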

Next, let's explore nested structs.

Nested structs

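Let's calculate the size of StructC by hand.
a is a char and takes 1 byte.
b is a double and takes 8 bytes. It must start at a multiple of 8, so 7 bytes of padding follow a.
c is a short and takes 2 bytes.
structA is of type StructA, which we calculated above as 24 bytes. It must start at a multiple of StructA's largest member (double, 8 bytes), so 6 bytes of padding follow c.
d is an int and takes 4 bytes.
So StructC's total size is: ((1(a) + 7(padding) + 8(b) + 2(c) + 6(padding) + 24(structA) + 4(d) + 7) >> 3 << 3 = 56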
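The nested case can be verified the same way. As above, the declarations are reconstructed from the text and assume an 8-byte-aligned double; the point to notice is that the embedded struct is aligned to its largest member (8), not to its total size (24).

```c
#include <assert.h>
#include <stddef.h>

struct StructA { char a; double b; short c; int d; };   /* 24 bytes, alignment 8 */

struct StructC {
    char           a;        /* offset 0                          */
    double         b;        /* offset 8                          */
    short          c;        /* offset 16                         */
    struct StructA structA;  /* offset 24 (6 bytes of padding)    */
    int            d;        /* offset 48                         */
};                           /* 52 bytes of data, padded to 56    */
```

Here offsetof(struct StructC, structA) is 24 and offsetof(struct StructC, d) is 48, the same offsets as in the hand calculation.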

So when defining a struct, you can order its members from largest to smallest to save memory and reduce overhead.

The table below lists the sizes of common types on 32-bit and 64-bit iOS devices.

C	OC	32-bit	64-bit
bool	BOOL (64-bit)	1	1
signed char	SInt8, int8_t, BOOL (32-bit)	1	1
unsigned char	UInt8, Boolean	1	1
short	int16_t	2	2
unsigned short	UInt16	2	2
signed short	SInt16	2	2
int	int32_t, NSInteger (32-bit), boolean_t (32-bit)	4	4
unsigned int	UInt32 (64-bit), boolean_t (64-bit), NSUInteger (32-bit)	4	4
signed int	SInt32 (64-bit)	4	4
long	NSInteger (64-bit)	4	8
unsigned long	NSUInteger (64-bit)	4	8
long long	int64_t	8	8
unsigned long long	UInt64	8	8
signed long long	SInt64	8	8
float	CGFloat (32-bit)	4	4
double	CGFloat (64-bit)	8	8

Having looked at structs, let's look at the memory layout of objects in OC.

@interface Person : NSObject
@property(nonatomic, copy) NSString *name;
@property(nonatomic, assign) unsigned short age;
@property(nonatomic, copy) NSString *nickName;
@property(nonatomic, assign) double height;
@property(nonatomic, assign) char sex;
@property(nonatomic, assign) long score;
@end

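Figure: memory layout of person

As the figure shows, the memory layout of an OC object does not follow the order in which the properties are declared.
The 1st 8 bytes hold the isa pointer.
The 2nd 8 bytes hold age and sex.
The 3rd and 4th 8 bytes hold name and nickName.
The 5th 8 bytes hold height.
The 6th 8 bytes hold score.
From this we can conclude that OC rearranges the object's memory layout, which reduces memory overhead and makes reads faster.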
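To see how much the rearrangement saves, here is a hypothetical C model of Person's instance memory, assuming a 64-bit (LP64) target. NSString * is modelled as void * and isa as a plain pointer; one struct keeps the declaration order, the other uses the order described above.

```c
#include <assert.h>
#include <stddef.h>

/* Ivars kept in declaration order (hypothetical model). */
struct PersonDeclared {
    void          *isa;       /* 0                          */
    void          *name;      /* 8                          */
    unsigned short age;       /* 16, then 6 bytes padding   */
    void          *nickName;  /* 24                         */
    double         height;    /* 32                         */
    char           sex;       /* 40, then 7 bytes padding   */
    long           score;     /* 48                         */
};                            /* 56 bytes                   */

/* Ivars in the rearranged order observed in memory (hypothetical model). */
struct PersonReordered {
    void          *isa;       /* 0                          */
    unsigned short age;       /* 8                          */
    char           sex;       /* 10, then 5 bytes padding   */
    void          *name;      /* 16                         */
    void          *nickName;  /* 24                         */
    double         height;    /* 32                         */
    long           score;     /* 40                         */
};                            /* 48 bytes                   */
```

Packing age and sex into the same 8-byte word removes 8 bytes of padding, which is the saving the rearranged layout achieves.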

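One more note: Swift, being a static language, does not reorder properties.

Figure: memory layout of an object in Swift

As the figure shows, the memory layout of a Swift object follows the order of its member variables. So, to save memory and speed up reads, the member variables of model classes defined in Swift should be ordered from largest to smallest.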

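In the previous article we saw that alloc calls calloc under the hood to allocate a block of memory. Next, let's explore what calloc does internally.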

Preparation:

Download the malloc source.
Build the source (see here for reference).

Call calloc and set a breakpoint.

Figure: calling calloc

Use Step Into to enter the calloc source.

Figure: calloc

This is intermediate-layer code; keep following it.

Figure: _malloc_zone_calloc

Keep following.

Figure: default_zone_malloc

This step obtains the real zone (creating it if needed) and then calls calloc on that zone. Keep following.

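Figure: nano_calloc

This step first computes the total size and then picks an allocation path based on it. I passed in 40, which is smaller than NANO_MAX_SIZE (256), so execution enters _nano_malloc_check_clear.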
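The size check described for nano_calloc can be sketched as follows. This is not libmalloc's code: choose_zone and zone_choice_t are hypothetical names, the overflow check is the usual division test for num_items * size, and treating exactly 256 bytes as a nano size (<=) is an assumption, since the text only says requests smaller than 256 go to the nano path.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NANO_MAX_SIZE 256   /* value quoted in the text */

typedef enum { USE_NANO, USE_HELPER_ZONE, SIZE_OVERFLOW } zone_choice_t;

/* Hypothetical: compute num_items * size without overflowing, then route
   small requests to the nano allocator and everything else elsewhere. */
static zone_choice_t choose_zone(size_t num_items, size_t size, size_t *total_out) {
    if (size != 0 && num_items > SIZE_MAX / size) {
        return SIZE_OVERFLOW;              /* product would not fit in size_t */
    }
    size_t total = num_items * size;
    *total_out = total;
    return total <= NANO_MAX_SIZE ? USE_NANO : USE_HELPER_ZONE;
}
```

With the 40-byte request from the text, choose_zone(1, 40, &total) returns USE_NANO; a 257-byte request falls through to the helper zone.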

Figure: _nano_malloc_check_clear

Next, follow segregated_next_block.

Figure: segregated_next_block

On the first call to segregated_next_block, no band exists yet and there is no cache either, so segregated_band_grow is called to allocate a new band.

Figure: segregated_band_grow

The nano_blk_addr_t union and the related macros are defined as follows.


#define NANO_MAG_BITS           6
#define NANO_BAND_BITS          17
#define NANO_SLOT_BITS          4
#define NANO_OFFSET_BITS        17

// little-endian: the first field occupies the least significant bits
struct nano_blk_addr_s {
    uint64_t
    nano_offset:NANO_OFFSET_BITS,              //17 locates the block
    nano_slot:NANO_SLOT_BITS,                  //4  bucket of homogenous quanta-multiple blocks
    nano_band:NANO_BAND_BITS,                  //17
    nano_mag_index:NANO_MAG_BITS,              //6  the core that allocated this block
    nano_signature:NANOZONE_SIGNATURE_BITS;    //20 (= 64 - 17 - 4 - 17 - 6) the address range devoted to us.
};

typedef union  {
    uint64_t            addr;
    struct nano_blk_addr_s    fields;
} nano_blk_addr_t;

#define SLOT_IN_BAND_SIZE   (1 << NANO_OFFSET_BITS) /* 128 KB per slot */
#define SLOT_KEY_LIMIT      (1 << NANO_SLOT_BITS) /* Must track nano_slot width */
#define BAND_SIZE       (1 << (NANO_SLOT_BITS + NANO_OFFSET_BITS)) /*  == Number of bytes covered by a page table entry (2 MB) */
#define NANO_MAG_SIZE       (1 << NANO_MAG_BITS)  /* 64 */
#define NANO_SLOT_SIZE      (1 << NANO_SLOT_BITS) /* 16 */
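Bitfield order is implementation-defined in C, so a portable way to read the same fields is with shifts and masks. The sketch below uses the widths from the definitions above and assumes the least-significant-field-first layout shown there; nano_addr_fields_t, take_bits and decode_nano_addr are hypothetical helpers.

```c
#include <assert.h>
#include <stdint.h>

/* Field widths from the definitions above; the signature gets the
   remaining 64 - 17 - 4 - 17 - 6 = 20 bits. */
#define NANO_OFFSET_BITS 17
#define NANO_SLOT_BITS    4
#define NANO_BAND_BITS   17
#define NANO_MAG_BITS     6

typedef struct {
    uint64_t offset, slot, band, mag_index, signature;
} nano_addr_fields_t;

/* Extracts the low `width` bits of *addr and shifts them out. */
static uint64_t take_bits(uint64_t *addr, unsigned width) {
    uint64_t v = *addr & ((UINT64_C(1) << width) - 1);
    *addr >>= width;
    return v;
}

static nano_addr_fields_t decode_nano_addr(uint64_t addr) {
    nano_addr_fields_t f;
    f.offset    = take_bits(&addr, NANO_OFFSET_BITS);  /* bits 0..16  */
    f.slot      = take_bits(&addr, NANO_SLOT_BITS);    /* bits 17..20 */
    f.band      = take_bits(&addr, NANO_BAND_BITS);    /* bits 21..37 */
    f.mag_index = take_bits(&addr, NANO_MAG_BITS);     /* bits 38..43 */
    f.signature = addr;                                /* bits 44..63 */
    return f;
}
```

Because the offset field is 17 bits wide, each slot spans 2^17 bytes (128 KB), and 16 slots make up one 2 MB band.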

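Let's summarize the nanozone allocation process:
Determine the mag for the current CPU and the slot computed from the size parameter, then look for a previously freed block cached in the corresponding chained_block_s linked list. If one is found, check that its pointer address is valid; if it is, return it directly.
On the first nano malloc, the nanozone has no cache yet, so memory is allocated directly at contiguous addresses within the nanozone's address range.
If the current slot in the current band is exhausted, a new band is requested from the system (each band has a fixed size of 2 MB and holds 16 slots of 128 KB). The base address, the limit address, and the current allocation address of this contiguous memory are maintained in meta data structures. These meta data are stored in the meta_data field of nanozone_t as a two-dimensional array indexed by mag and slot (the number of mags equals the number of processors, and there are 16 slots).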
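The process above can be pictured with a much-simplified model of one (mag, slot) cell. Two assumptions are made that the text does not state: sizes are rounded up to 16-byte quanta with slot k holding blocks of (k + 1) * 16 bytes, and a zero-byte request is treated as one quantum. All type and function names are hypothetical; where the sketch returns NULL, the real allocator would grow a new band.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define NANO_QUANTUM 16   /* assumption: 16-byte quanta, 16 slots up to 256 bytes */

/* Maps a request size to a slot index and the rounded block size. */
static size_t nano_slot_index(size_t size, size_t *slot_bytes) {
    if (size == 0) size = NANO_QUANTUM;                 /* assumption */
    size_t k = (size + NANO_QUANTUM - 1) / NANO_QUANTUM; /* number of quanta */
    *slot_bytes = k * NANO_QUANTUM;
    return k - 1;
}

typedef struct free_block { struct free_block *next; } free_block_t;

/* Hypothetical per-(mag, slot) meta data: cached freed blocks plus a bump
   pointer over contiguous address space. */
typedef struct {
    free_block_t *free_list;  /* blocks freed earlier, reused first        */
    uintptr_t next_addr;      /* next never-used address in this slot      */
    uintptr_t limit;          /* end of the space the slot currently owns  */
    size_t block_size;        /* all blocks in one slot have the same size */
} slot_meta_t;

static void slot_init(slot_meta_t *m, void *base, size_t bytes, size_t block_size) {
    m->free_list = NULL;
    m->next_addr = (uintptr_t)base;
    m->limit = (uintptr_t)base + bytes;
    m->block_size = block_size;
}

static void *slot_alloc(slot_meta_t *m) {
    if (m->free_list) {                               /* 1. reuse a cached block */
        free_block_t *b = m->free_list;
        m->free_list = b->next;
        return b;
    }
    if (m->limit - m->next_addr >= m->block_size) {   /* 2. contiguous bump allocation */
        void *p = (void *)m->next_addr;
        m->next_addr += m->block_size;
        return p;
    }
    return NULL;  /* 3. slot exhausted: the real allocator would grow a band */
}

static void slot_free(slot_meta_t *m, void *p) {
    free_block_t *b = p;          /* the freed block stores the list link */
    b->next = m->free_list;
    m->free_list = b;
}
```

Under the 16-byte quantum assumption, the 40-byte request from earlier maps to slot 2 with 48-byte blocks.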

The above covers requests smaller than NANO_MAX_SIZE (256). Next, let's explore what happens when the size is larger than 256.

Figure: szone_calloc

Here the helper_zone is obtained.

Keep following.

Figure: szone_calloc

Like nano_calloc, it first computes total_bytes from num_items, then continues. Keep following.

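Figure: szone_malloc_should_clear

Here we can see that memory allocated from the szone falls into four categories by size: tiny, small, medium, and large. I passed in a size of 257, so execution takes the tiny branch; let's use tiny as the example for the analysis below.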
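A sketch of the two decisions made before tiny_malloc_should_clear runs: picking a size class, and converting bytes into msize (the number of quanta passed to tiny_malloc_should_clear). The 16-byte tiny quantum is an assumption about 64-bit libmalloc; the class limits are parameters because they vary by version and platform, and the values used below are placeholders, not libmalloc's constants. All names are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define TINY_QUANTUM 16   /* assumption: tiny blocks come in 16-byte quanta */

/* Rounds a byte count up to whole quanta; zero-byte requests get one. */
static unsigned tiny_msize_for_bytes(size_t size) {
    unsigned msize = (unsigned)((size + TINY_QUANTUM - 1) / TINY_QUANTUM);
    return msize ? msize : 1;
}

typedef enum { SIZE_TINY, SIZE_SMALL, SIZE_MEDIUM, SIZE_LARGE } size_class_t;

/* Upper bounds (inclusive) of each class; supplied by the caller. */
typedef struct { size_t tiny_max, small_max, medium_max; } size_limits_t;

static size_class_t classify(size_t size, const size_limits_t *lim) {
    if (size <= lim->tiny_max)   return SIZE_TINY;
    if (size <= lim->small_max)  return SIZE_SMALL;
    if (size <= lim->medium_max) return SIZE_MEDIUM;
    return SIZE_LARGE;
}
```

For the 257-byte request in the text, tiny_msize_for_bytes(257) yields 17 quanta, i.e. a 272-byte block, under the 16-byte quantum assumption.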

void *
tiny_malloc_should_clear(rack_t *rack, msize_t msize, boolean_t cleared_requested)
{
    void *ptr;
    // Compute the mag_index; magazines is an array of 64 magazine_t entries.
    mag_index_t mag_index = tiny_mag_get_thread_index() % rack->num_magazines;
    // Get the magazine at index mag_index.
    magazine_t *tiny_mag_ptr = &(rack->magazines[mag_index]);

    MALLOC_TRACE(TRACE_tiny_malloc, (uintptr_t)rack, TINY_BYTES_FOR_MSIZE(msize), (uintptr_t)tiny_mag_ptr, cleared_requested);

#if DEBUG_MALLOC
    if (DEPOT_MAGAZINE_INDEX == mag_index) {
        malloc_zone_error(rack->debug_flags, true, "malloc called for magazine index -1\n");
        return (NULL);
    }

    if (!msize) {
        malloc_zone_error(rack->debug_flags, true, "invariant broken (!msize) in allocation (region)\n");
        return (NULL);
    }
#endif

    SZONE_MAGAZINE_PTR_LOCK(tiny_mag_ptr);

    // If the tiny cache is enabled
#if CONFIG_TINY_CACHE
    ptr = tiny_mag_ptr->mag_last_free;

    if (tiny_mag_ptr->mag_last_free_msize == msize) {
        // we have a winner
        // First check whether the most recently freed block is exactly the requested size (both are aligned slot sizes); if so, return it directly.
        tiny_mag_ptr->mag_last_free = NULL;
        tiny_mag_ptr->mag_last_free_msize = 0;
        tiny_mag_ptr->mag_last_free_rgn = NULL;
        SZONE_MAGAZINE_PTR_UNLOCK(tiny_mag_ptr);
        CHECK(szone, __PRETTY_FUNCTION__);
        if (cleared_requested) {
            memset(ptr, 0, TINY_BYTES_FOR_MSIZE(msize));
        }
#if DEBUG_MALLOC
        if (LOG(szone, ptr)) {
            malloc_report(ASL_LEVEL_INFO, "in tiny_malloc_should_clear(), tiny cache ptr=%p, msize=%d\n", ptr, msize);
        }
#endif
        return ptr;
    }
#endif /* CONFIG_TINY_CACHE */

    // Tiny cache disabled, or the last freed block did not match the requested size
    while (1) {
        // Search the free list first
        ptr = tiny_malloc_from_free_list(rack, tiny_mag_ptr, mag_index, msize);
        if (ptr) {
            SZONE_MAGAZINE_PTR_UNLOCK(tiny_mag_ptr);
            CHECK(szone, __PRETTY_FUNCTION__);
            if (cleared_requested) {
                memset(ptr, 0, TINY_BYTES_FOR_MSIZE(msize));
            }
            return ptr;
        }

#if CONFIG_RECIRC_DEPOT
        // Take a usable region from the backup magazine (the depot), move it wholesale into the current magazine, then repeat the step above.
        if (tiny_get_region_from_depot(rack, tiny_mag_ptr, mag_index, msize)) {
            // Try the free list again
            ptr = tiny_malloc_from_free_list(rack, tiny_mag_ptr, mag_index, msize);
            if (ptr) {
                SZONE_MAGAZINE_PTR_UNLOCK(tiny_mag_ptr);
                CHECK(szone, __PRETTY_FUNCTION__);
                if (cleared_requested) {
                    memset(ptr, 0, TINY_BYTES_FOR_MSIZE(msize));
                }
                return ptr;
            }
        }
#endif // CONFIG_RECIRC_DEPOT

        // The magazine is exhausted. A new region (heap) must be allocated to satisfy this call to malloc().
        // The allocation, an mmap() system call, will be performed outside the magazine spin locks by the first
        // thread that suffers the exhaustion. That thread sets "alloc_underway" and enters a critical section.
        // Threads arriving here later are excluded from the critical section, yield the CPU, and then retry the
        // allocation. After some time the magazine is resupplied, the original thread leaves with its allocation,
        // and retry-ing threads succeed in the code just above.
        if (!tiny_mag_ptr->alloc_underway) {
            // No new region is currently being allocated, so this thread allocates one
            void *fresh_region;

            // time to create a new region (do this outside the magazine lock)
            // Mark that a new heap (region) is being allocated
            tiny_mag_ptr->alloc_underway = TRUE;
            OSMemoryBarrier();
            SZONE_MAGAZINE_PTR_UNLOCK(tiny_mag_ptr);
            // Allocate the new heap (region)
            fresh_region = mvm_allocate_pages(TINY_REGION_SIZE,
                    TINY_BLOCKS_ALIGN,
                    MALLOC_FIX_GUARD_PAGE_FLAGS(rack->debug_flags),
                    VM_MEMORY_MALLOC_TINY);
            SZONE_MAGAZINE_PTR_LOCK(tiny_mag_ptr);

            // DTrace USDT Probe
            MAGMALLOC_ALLOCREGION(TINY_SZONE_FROM_RACK(rack), (int)mag_index, fresh_region, TINY_REGION_SIZE);

            if (!fresh_region) { // out of memory!
                tiny_mag_ptr->alloc_underway = FALSE;
                OSMemoryBarrier();
                SZONE_MAGAZINE_PTR_UNLOCK(tiny_mag_ptr);
                return NULL;
            }

            region_set_cookie(&REGION_COOKIE_FOR_TINY_REGION(fresh_region));
            // malloc from the most recent region or the newly allocated region
            ptr = tiny_malloc_from_region_no_lock(rack, tiny_mag_ptr, mag_index, msize, fresh_region);

            // we don't clear because this freshly allocated space is pristine
            tiny_mag_ptr->alloc_underway = FALSE;
            OSMemoryBarrier();
            SZONE_MAGAZINE_PTR_UNLOCK(tiny_mag_ptr);
            CHECK(szone, __PRETTY_FUNCTION__);
            return ptr;
        } else {
            SZONE_MAGAZINE_PTR_UNLOCK(tiny_mag_ptr);
            yield();
            SZONE_MAGAZINE_PTR_LOCK(tiny_mag_ptr);
        }
    }
    /* NOTREACHED */
}

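tiny_malloc_from_free_list repeatedly tries a series of strategies to get memory from the free list.
If the free list still cannot supply usable memory, tiny_get_region_from_depot is used.
In every rack, the magazine_t at index -1 of the magazines array is kept as a backup, called the depot. This function checks whether the depot holds a region that satisfies the request. If it does, it updates the association between the depot and the region, associates the region with the current magazine_t, and then repeats the free list process.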
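To make the order of attempts in tiny_malloc_should_clear concrete, here is a toy model that only counts blocks. It is a sketch, not libmalloc's logic: the names are hypothetical, block sizes on the free list are ignored, and locking and the alloc_underway handshake are left out.

```c
#include <assert.h>

typedef enum {
    FROM_LAST_FREE, FROM_FREE_LIST, FROM_DEPOT, FROM_NEW_REGION, OUT_OF_MEMORY
} alloc_source_t;

/* Hypothetical bookkeeping for one magazine; only counts are tracked. */
typedef struct {
    unsigned last_free_msize;  /* msize of the last-freed block, 0 if none      */
    unsigned free_blocks;      /* usable blocks on this magazine's free list    */
    unsigned depot_blocks;     /* usable blocks in a region parked in the depot */
    int      can_map_region;   /* whether mapping a fresh region would succeed  */
} magazine_model_t;

/* Follows the same order as tiny_malloc_should_clear. msize is always >= 1,
   so an empty cache (msize 0) never matches. */
static alloc_source_t tiny_alloc_source(magazine_model_t *m, unsigned msize) {
    if (m->last_free_msize == msize) {      /* exact-size hit on the last freed block */
        m->last_free_msize = 0;
        return FROM_LAST_FREE;
    }
    if (m->free_blocks > 0) {               /* tiny_malloc_from_free_list */
        m->free_blocks--;
        return FROM_FREE_LIST;
    }
    if (m->depot_blocks > 0) {              /* region moved over from the depot; */
        m->free_blocks = m->depot_blocks - 1; /* one of its blocks is used now   */
        m->depot_blocks = 0;
        return FROM_DEPOT;
    }
    return m->can_map_region ? FROM_NEW_REGION : OUT_OF_MEMORY;  /* fresh region */
}
```

Starting from a magazine with a matching last-free block, one free-list block and a two-block depot region, successive requests are served from the cache, the free list, the depot, the free list again, and finally a new region.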

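scalable_zone free process
First, check whether the address the pointer points to is valid.
If no block is attached to the last free pointer, put the freed block there.
If a last free block already exists, swap them: the freed block takes its place, and the block previously held in last free is added to the free list. (Before it is added to the free list, the region bitmap is checked to see whether the block can be merged with its neighboring blocks into a larger block; if so, they are merged into one.)
If, after merging, the region's free bytes exceed a certain threshold, the region is moved to the backup magazine (index -1).
If the entire region is empty, it is returned directly to the kernel.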
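The merge step can be pictured with a toy region. The real allocator tracks block boundaries and free state with bitmaps; here two plain arrays play that role, the region size of 16 quanta is arbitrary, and region_model_t, region_free and region_is_empty are hypothetical names. The sketch only shows coalescing with both neighbors and detecting a fully empty region.

```c
#include <assert.h>

#define REGION_QUANTA 16   /* hypothetical toy region of 16 quanta */

typedef struct {
    unsigned char len[REGION_QUANTA];     /* block length in quanta, set at block starts only */
    unsigned char is_free[REGION_QUANTA]; /* 1 if the block starting here is free             */
} region_model_t;

/* Frees the block starting at quantum `start`, merges it with a free
   successor and a free predecessor, and returns the merged block's start.
   Assumes block starts chain from 0 to the end of the region. */
static int region_free(region_model_t *r, int start) {
    r->is_free[start] = 1;

    int next = start + r->len[start];
    if (next < REGION_QUANTA && r->is_free[next]) {   /* merge with the next block */
        r->len[start] += r->len[next];
        r->len[next] = 0;
        r->is_free[next] = 0;
    }

    int prev = -1;                                     /* locate the previous block */
    for (int i = 0; i < start; i += r->len[i]) {
        prev = i;
    }
    if (prev >= 0 && r->is_free[prev]) {               /* merge into the previous block */
        r->len[prev] += r->len[start];
        r->len[start] = 0;
        r->is_free[start] = 0;
        return prev;
    }
    return start;
}

/* True when a single free block covers the whole region, the point at
   which the region would be handed back to the kernel. */
static int region_is_empty(const region_model_t *r) {
    return r->is_free[0] && r->len[0] == REGION_QUANTA;
}
```

Freeing four adjacent 4-quantum blocks in any order ends with one 16-quantum free block, and region_is_empty then reports the region as empty.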

Finally, here is a flowchart drawn by a seasoned developer.

Figure: calloc flowchart