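This article covers the following questions:
1. What is Class?
2. The memory layout of Class
3. The design philosophy behind class_rw_t and class_ro_t
4. How categories relate to class_rw_t
Reading the source (version objc4-781.2)
Source address
Open objc-private.h and you will find that Class is a pointer to a struct: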
typedef struct objc_class *Class;
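Searching the source for "struct objc_class" turns up definitions in five header files. The one in objc-runtime-new.h is the one in effect for Objective-C 2.0; the others are guarded by macros that exclude them.
A simplified definition of the objc_class struct: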
struct objc_class : objc_object {
// Class ISA;
Class superclass;
cache_t cache; // formerly cache pointer and vtable
class_data_bits_t bits; // class_rw_t * plus custom rr/alloc flags
class_rw_t *data() const {
return bits.data();
}
...
};
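objc_class inherits from objc_object. (C++ extends C structs: they may define member functions and use inheritance, and their default access level is public, unlike a C++ class, whose default is private.) Next, the definition of objc_object: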
struct objc_object {
private:
isa_t isa;
public:
// ISA() assumes this is NOT a tagged pointer object
Class ISA();
// rawISA() assumes this is NOT a tagged pointer object or a non pointer ISA
Class rawISA();
// getIsa() allows this to be a tagged pointer object
Class getIsa();
...
};
So, for now, we can picture the struct roughly like this:
struct objc_class {
// Class ISA;
Class superclass;
cache_t cache; // formerly cache pointer and vtable
class_data_bits_t bits; // class_rw_t * plus custom rr/alloc flags
class_rw_t *data() const {
return bits.data();
}
...
Class ISA();
// rawISA() assumes this is NOT a tagged pointer object or a non pointer ISA
Class rawISA();
// getIsa() allows this to be a tagged pointer object
Class getIsa();
...
};
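The following member is private, so code inside the class cannot touch isa directly and has to go through the accessor functions that objc_object wraps around it. That is plain encapsulation: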
private:
isa_t isa;
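To make the access rule concrete, here is a minimal C++ sketch of a struct that inherits a private member and can only reach it through a public accessor. ObjectSketch, ClassSketch and rawIsa are hypothetical names chosen for illustration; they are not the runtime's types.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-ins; the "Sketch" names are not part of objc4.
struct ObjectSketch {
private:
    std::uintptr_t isa;                              // not visible to derived types
public:
    explicit ObjectSketch(std::uintptr_t value) : isa(value) {}
    std::uintptr_t rawIsa() const { return isa; }    // plays the role of ISA()/rawISA()
};

// struct inheritance is public by default, so no access specifier is needed.
struct ClassSketch : ObjectSketch {
    ClassSketch *superclass;
    ClassSketch(std::uintptr_t isaValue, ClassSketch *super)
        : ObjectSketch(isaValue), superclass(super) {}
    // Writing `isa` here would not compile; only the public accessor is usable.
    bool sharesIsaWith(const ClassSketch &other) const {
        return rawIsa() == other.rawIsa();
    }
};

// Usage: the object holds the inherited isa plus the derived superclass pointer.
inline void demoInheritedIsa() {
    ClassSketch root(0x10, nullptr);
    ClassSketch child(0x10, &root);
    assert(child.sharesIsaWith(root));
    assert(child.superclass == &root);
    assert(sizeof(ClassSketch) == 2 * sizeof(void *));
}
```

The derived struct still pays for the inherited member in its size, even though it cannot name it.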
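Let's go through the members from top to bottom:
The struct declares a superclass pointer of type Class, a cache_t object, and a class_data_bits_t object. The wording matters here. In Objective-C an object is always handled through a pointer, but a struct can also be stored by value. A struct pointer takes 8 bytes on a 64-bit system, whereas a struct object takes the sum of its members' sizes plus whatever padding the alignment rules require. On iOS, members are aligned to 8 bytes and heap allocations are made in 16-byte units, which keeps memory access efficient.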
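The difference between member sum, padded size and allocation unit can be checked with a small sketch. PaddedSketch and roundUp are hypothetical helpers written for this illustration, and the concrete numbers in the comments assume a typical 64-bit target.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Hypothetical struct used only to show padding; not a runtime type.
struct PaddedSketch {
    std::uint8_t  a;   // 1 byte, followed by padding so that b is aligned
    std::uint64_t b;   // 8 bytes
    std::uint16_t c;   // 2 bytes, followed by tail padding
};

// Round a size up to a multiple of `unit` (unit must be a power of two).
constexpr std::size_t roundUp(std::size_t size, std::size_t unit) {
    return (size + unit - 1) & ~(unit - 1);
}

inline void demoSizes() {
    // The members add up to 11 bytes, but padding makes the struct larger
    // (typically 24 bytes on a 64-bit target).
    const std::size_t memberSum =
        sizeof(std::uint8_t) + sizeof(std::uint64_t) + sizeof(std::uint16_t);
    assert(sizeof(PaddedSketch) > memberSum);
    assert(sizeof(PaddedSketch) % alignof(PaddedSketch) == 0);
    // A pointer to the struct is pointer-sized regardless of the struct's size.
    assert(sizeof(PaddedSketch *) == sizeof(void *));
    // A 24-byte request rounded to 16-byte units occupies 32 bytes.
    assert(roundUp(24, 16) == 32);
}
```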
Structure of cache_t
A simplified definition of cache_t follows; all member variables are kept and the functions are omitted:
struct cache_t {
#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_OUTLINED
explicit_atomic<struct bucket_t *> _buckets;
explicit_atomic<mask_t> _mask;
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16
explicit_atomic<uintptr_t> _maskAndBuckets;
mask_t _mask_unused;
// How much the mask is shifted by.
static constexpr uintptr_t maskShift = 48;
// Additional bits after the mask which must be zero. msgSend
// takes advantage of these additional bits to construct the value
// `mask << 4` from `_maskAndBuckets` in a single instruction.
static constexpr uintptr_t maskZeroBits = 4;
// The largest mask value we can store.
static constexpr uintptr_t maxMask = ((uintptr_t)1 << (64 - maskShift)) - 1;
// The mask applied to `_maskAndBuckets` to retrieve the buckets pointer.
static constexpr uintptr_t bucketsMask = ((uintptr_t)1 << (maskShift - maskZeroBits)) - 1;
// Ensure we have enough bits for the buckets pointer.
static_assert(bucketsMask >= MACH_VM_MAX_ADDRESS, "Bucket field doesn't have enough bits for arbitrary pointers.");
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_LOW_4
// _maskAndBuckets stores the mask shift in the low 4 bits, and
// the buckets pointer in the remainder of the value. The mask
// shift is the value where (0xffff >> shift) produces the correct
// mask. This is equal to 16 - log2(cache_size).
explicit_atomic<uintptr_t> _maskAndBuckets;
mask_t _mask_unused;
static constexpr uintptr_t maskBits = 4;
static constexpr uintptr_t maskMask = (1 << maskBits) - 1;
static constexpr uintptr_t bucketsMask = ~maskMask;
#else
#error Unknown cache mask storage type.
#endif
#if __LP64__
uint16_t _flags;
#endif
uint16_t _occupied;
public:
...
};
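The comments in the CACHE_MASK_STORAGE_HIGH_16 branch describe one word that holds the mask in its top 16 bits and the buckets pointer in its low bits. The sketch below reuses the constants from that branch; the pack and unpack helpers are my own reading of those comments, not code quoted from the runtime, and the address is made up.

```cpp
#include <cassert>
#include <cstdint>

constexpr std::uint64_t kMaskShift    = 48;  // same value as maskShift above
constexpr std::uint64_t kMaskZeroBits = 4;   // same value as maskZeroBits above
constexpr std::uint64_t kMaxMask      = (std::uint64_t{1} << (64 - kMaskShift)) - 1;
constexpr std::uint64_t kBucketsMask  =
    (std::uint64_t{1} << (kMaskShift - kMaskZeroBits)) - 1;

// Hypothetical helper: put the mask in the top 16 bits and the address below it.
constexpr std::uint64_t packMaskAndBuckets(std::uint64_t bucketsAddr, std::uint64_t mask) {
    return (mask << kMaskShift) | (bucketsAddr & kBucketsMask);
}
constexpr std::uint64_t unpackBuckets(std::uint64_t packed) { return packed & kBucketsMask; }
constexpr std::uint64_t unpackMask(std::uint64_t packed)    { return packed >> kMaskShift; }

inline void demoPacking() {
    const std::uint64_t addr = 0x0000000100A0C010;            // made-up address below 2^44
    const std::uint64_t packed = packMaskAndBuckets(addr, 7); // mask 7 means 8 buckets
    assert(unpackBuckets(packed) == addr);
    assert(unpackMask(packed) == 7);
}
```

Both fields come back with a single mask or shift, which is the point of storing them in one word.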
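The cache_t definition is fairly long, so let's break it down. It contains conditional-compilation directives and several static variables. Only one branch of an #if/#elif chain is compiled, and static members live in static storage, so they take no space inside the struct. How much memory does a cache_t object occupy, then? Here is the struct trimmed down once more: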
struct cache_t {
#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_OUTLINED
explicit_atomic<struct bucket_t *> _buckets;
explicit_atomic<mask_t> _mask;
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16
explicit_atomic<uintptr_t> _maskAndBuckets;
mask_t _mask_unused;
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_LOW_4
explicit_atomic<uintptr_t> _maskAndBuckets;
mask_t _mask_unused;
#else
#error Unknown cache mask storage type.
#endif
#if __LP64__
uint16_t _flags;
#endif
uint16_t _occupied;
public:
...
};
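explicit_atomic is a struct template that wraps std::atomic<T>; for the lock-free types used here, its size equals the size of its type argument T: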
template <typename T>
struct explicit_atomic : public std::atomic<T> {
explicit explicit_atomic(T initial) noexcept : std::atomic<T>(std::move(initial)) {}
operator T() const = delete;
T load(std::memory_order order) const noexcept {
return std::atomic<T>::load(order);
}
void store(T desired, std::memory_order order) noexcept {
std::atomic<T>::store(desired, order);
}
static explicit_atomic<T> *from_pointer(T *ptr) {
static_assert(sizeof(explicit_atomic<T> *) == sizeof(T *),
"Size of atomic must match size of original");
explicit_atomic<T> *atomic = (explicit_atomic<T> *)ptr;
ASSERT(atomic->is_lock_free());
return atomic;
}
};
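The size claim can be checked directly against std::atomic, the base of explicit_atomic. The C++ standard does not guarantee these equalities, but they hold on mainstream 64-bit toolchains, which is what this sketch assumes. BucketSketch is a hypothetical stand-in for bucket_t.

```cpp
#include <atomic>
#include <cassert>
#include <cstdint>

struct BucketSketch { void *imp; void *sel; };   // hypothetical stand-in for bucket_t

// Assumption: the atomics involved are lock-free and add no extra storage.
static_assert(sizeof(std::atomic<BucketSketch *>) == sizeof(BucketSketch *),
              "atomic pointer is expected to be pointer-sized");
static_assert(sizeof(std::atomic<std::uint32_t>) == sizeof(std::uint32_t),
              "atomic uint32_t is expected to be 4 bytes");

// Usage: explicit_atomic forces callers to name a memory order on load/store.
inline std::uint32_t demoAtomicMask() {
    std::atomic<std::uint32_t> mask(0);
    assert(mask.is_lock_free());
    mask.store(7, std::memory_order_relaxed);
    return mask.load(std::memory_order_relaxed);
}
```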
And what is mask_t? A 32-bit unsigned integer, 4 bytes:
typedef uint32_t mask_t;
How is uintptr_t defined? It takes 8 bytes on a 64-bit system:
typedef unsigned long int uintptr_t;
Now we can compute the size of cache_t on a 64-bit system: 8 + 4 + 2 + 2 = 16 bytes.
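The arithmetic can be verified by mirroring the member list in a plain struct. CacheSketch and ClassLayoutSketch are hypothetical mirrors that assume a 64-bit target and use plain integers in place of the atomic wrappers; they are not the runtime's definitions.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

static_assert(sizeof(void *) == 8, "this sketch assumes a 64-bit target");

struct CacheSketch {
    std::uintptr_t maskAndBuckets;                    // explicit_atomic<uintptr_t> in the source
    std::uint32_t  maskUnused;                        // mask_t
    static constexpr std::uintptr_t maskShift = 48;   // static: no per-object storage
    std::uint16_t  flags;
    std::uint16_t  occupied;
};
static_assert(sizeof(CacheSketch) == 16, "8 + 4 + 2 + 2");

struct ClassLayoutSketch {
    std::uintptr_t isa;          // inherited from objc_object in the real struct
    void          *superclass;
    CacheSketch    cache;
    std::uintptr_t bits;         // class_data_bits_t holds one uintptr_t
};
static_assert(offsetof(ClassLayoutSketch, superclass) == 8,  "after isa");
static_assert(offsetof(ClassLayoutSketch, cache)      == 16, "after superclass");
static_assert(offsetof(ClassLayoutSketch, bits)       == 32, "after the 16-byte cache");

// Usage: the whole class header comes to 40 bytes in this sketch.
inline std::size_t demoClassHeaderSize() {
    assert(sizeof(ClassLayoutSketch) == 40);
    return sizeof(ClassLayoutSketch);
}
```

Under these assumptions bits sits 32 bytes from the start of the class object.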
Structure of class_data_bits_t
It has a single member variable, so it takes 8 bytes:
struct class_data_bits_t {
uintptr_t bits;
};
Next comes this const member function, which returns a class_rw_t pointer:
class_rw_t *data() const {
return bits.data();
}
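The comment on bits says it stores "class_rw_t * plus custom rr/alloc flags" in one word. A minimal sketch of that idea follows, assuming the low three bits carry flags and the rest is the pointer. The real runtime applies its own FAST_DATA_MASK constant, so the mask values and the DataBitsSketch type here are illustrative assumptions.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical stand-in for class_rw_t; alignas(8) keeps the low 3 address bits zero.
struct alignas(8) RwSketch { std::uint32_t flags; };

constexpr std::uintptr_t kFlagMask = 0x7;          // assumed: 3 flag bits
constexpr std::uintptr_t kDataMask = ~kFlagMask;   // assumed: everything else is the pointer

struct DataBitsSketch {
    std::uintptr_t bits = 0;

    void setData(RwSketch *rw) {
        const std::uintptr_t raw = reinterpret_cast<std::uintptr_t>(rw);
        assert((raw & kFlagMask) == 0);            // aligned pointers leave room for flags
        bits = (bits & kFlagMask) | raw;           // keep existing flags, swap the pointer
    }
    void setFlag(std::uintptr_t flag) { bits |= (flag & kFlagMask); }
    bool hasFlag(std::uintptr_t flag) const { return (bits & flag & kFlagMask) != 0; }
    RwSketch *data() const { return reinterpret_cast<RwSketch *>(bits & kDataMask); }
};

inline void demoDataBits() {
    static RwSketch rw{0};
    DataBitsSketch b;
    b.setFlag(0x1);
    b.setData(&rw);
    assert(b.data() == &rw);   // masking strips the flag bits again
    assert(b.hasFlag(0x1));
    assert(!b.hasFlag(0x2));
}
```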
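Structure of class_rw_t
A simplified definition of class_rw_t follows. This is the core of the matter: the four functions below return, in order, a pointer to a class_ro_t struct and objects of type method_array_t, property_array_t and protocol_array_t.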
struct class_rw_t {
const class_ro_t *ro() const {...}
const method_array_t methods() const {...}
const property_array_t properties() const {...}
const protocol_array_t protocols() const {...}
};
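Setting class_ro_t aside for a moment and reading on, we find that all three array types inherit from the class template list_array_tt, which implements the functions for attaching, storing and releasing lists: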
class method_array_t : public list_array_tt<method_t, method_list_t>
{
...
};
class property_array_t : public list_array_tt<property_t, property_list_t>
{
...
};
class protocol_array_t : public list_array_tt<protocol_ref_t, protocol_list_t>
{
...
};
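The part to read closely is the template's attachLists function, the core of Objective-C's dynamic behavior. It has three branches:
If an array of lists already exists, it reallocates the array, uses memmove to shift the old lists toward the back (to array()->lists + addedCount), and uses memcpy to copy addedLists into the front.
Otherwise, if there is no list yet and exactly one list is being added, it simply stores addedLists[0] in list.
Otherwise it turns a single list into an array: it allocates the array, puts the old list (if any) at the back, and copies addedLists into the front.
So, as far as the data structure goes, methods, properties and protocols all support dynamic updates.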
template <typename Element, typename List>
class list_array_tt {
void attachLists(List* const * addedLists, uint32_t addedCount) {
if (addedCount == 0) return;
if (hasArray()) {
// many lists -> many lists
uint32_t oldCount = array()->count;
uint32_t newCount = oldCount + addedCount;
setArray((array_t *)realloc(array(), array_t::byteSize(newCount)));
array()->count = newCount;
memmove(array()->lists + addedCount, array()->lists,
oldCount * sizeof(array()->lists[0]));
memcpy(array()->lists, addedLists,
addedCount * sizeof(array()->lists[0]));
}
else if (!list && addedCount == 1) {
// 0 lists -> 1 list
list = addedLists[0];
}
else {
// 1 list -> many lists
List* oldList = list;
uint32_t oldCount = oldList ? 1 : 0;
uint32_t newCount = oldCount + addedCount;
setArray((array_t *)malloc(array_t::byteSize(newCount)));
array()->count = newCount;
if (oldList) array()->lists[addedCount] = oldList;
memcpy(array()->lists, addedLists,
addedCount * sizeof(array()->lists[0]));
}
}
};
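The observable effect of attachLists is that new lists land in front of the existing ones. The sketch below reproduces only that ordering with std::vector in place of realloc, memmove and memcpy. ListArraySketch and MethodListSketch are hypothetical types, and the lookup helper is a simplified stand-in for method resolution.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

using MethodListSketch = std::vector<std::string>;   // one list of selector names

struct ListArraySketch {
    std::vector<const MethodListSketch *> lists;     // front = searched first

    // New lists go in front and keep their relative order;
    // existing lists shift toward the back.
    void attachLists(const std::vector<const MethodListSketch *> &added) {
        lists.insert(lists.begin(), added.begin(), added.end());
    }

    // Return the index of the first list that contains `sel`, or -1.
    int firstListContaining(const std::string &sel) const {
        for (std::size_t i = 0; i < lists.size(); ++i)
            for (const auto &name : *lists[i])
                if (name == sel) return static_cast<int>(i);
        return -1;
    }
};

inline void demoAttach() {
    static const MethodListSketch base = {"run", "eat"};
    static const MethodListSketch category = {"run"};
    ListArraySketch array;
    array.attachLists({&base});       // the class's own list first
    array.attachLists({&category});   // a later list is prepended
    assert(array.lists[0] == &category);
    assert(array.firstListContaining("run") == 0);   // the later list wins
    assert(array.firstListContaining("eat") == 1);
}
```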
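Structure of class_ro_t
Does this look familiar? Its base lists are what ends up as the oldList mentioned above. class_ro_t likewise holds methods, properties and protocols, plus the instance variables: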
struct class_ro_t {
uint32_t flags;
uint32_t instanceStart;
uint32_t instanceSize;
#ifdef __LP64__
uint32_t reserved;
#endif
const uint8_t * ivarLayout;
const char * name;
method_list_t * baseMethodList;
protocol_list_t * baseProtocols;
const ivar_list_t * ivars;
const uint8_t * weakIvarLayout;
property_list_t *baseProperties;
};
How do we know that the properties, methods and other lists in class_ro_t become the old lists? Here is another piece of source:
/***********************************************************************
* realizeClassWithoutSwift
* Performs first-time initialization on class cls,
* including allocating its read-write data.
* Does not perform any Swift-side initialization.
* Returns the real class structure for the class.
* Locking: runtimeLock must be write-locked by the caller
**********************************************************************/
static Class realizeClassWithoutSwift(Class cls, Class previously)
{
runtimeLock.assertLocked();
class_rw_t *rw;
Class supercls;
Class metacls;
if (!cls) return nil;
if (cls->isRealized()) return cls;
ASSERT(cls == remapClass(cls));
// fixme verify class is not in an un-dlopened part of the shared cache?
auto ro = (const class_ro_t *)cls->data();
auto isMeta = ro->flags & RO_META;
if (ro->flags & RO_FUTURE) {
// This was a future class. rw data is already allocated.
rw = cls->data();
ro = cls->data()->ro();
ASSERT(!isMeta);
cls->changeInfo(RW_REALIZED|RW_REALIZING, RW_FUTURE);
} else {
// Normal class. Allocate writeable class data.
rw = objc::zalloc<class_rw_t>();
rw->set_ro(ro);
rw->flags = RW_REALIZED|RW_REALIZING|isMeta;
cls->setData(rw);
}
...
}
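The normal-class branch above boils down to: read the data pointer as ro, allocate rw, link rw to ro, and repoint the class at rw. Below is a minimal sketch of that handoff. RoData, RwData, RealizableClassSketch and the separate realized flag are simplifications introduced here; the runtime keeps this state in flag bits rather than in a bool.

```cpp
#include <cassert>
#include <cstdint>
#include <memory>

struct RoData { const char *name; };                       // compile-time data
struct RwData { std::uint32_t flags = 0; const RoData *ro = nullptr; };

constexpr std::uint32_t kRealizedFlag = 1u << 31;          // hypothetical flag value

struct RealizableClassSketch {
    const void *data = nullptr;        // RoData* before realization, RwData* after
    bool realized = false;             // simplification of the runtime's flag check
    std::unique_ptr<RwData> rwStorage; // owns the runtime-allocated rw
};

inline RwData *realize(RealizableClassSketch &cls) {
    if (cls.realized) return cls.rwStorage.get();           // already done: return as is
    const RoData *ro = static_cast<const RoData *>(cls.data);
    cls.rwStorage.reset(new RwData());                      // allocate writable data
    cls.rwStorage->ro = ro;                                 // like rw->set_ro(ro)
    cls.rwStorage->flags = kRealizedFlag;
    cls.data = cls.rwStorage.get();                         // like cls->setData(rw)
    cls.realized = true;
    return cls.rwStorage.get();
}

inline void demoRealize() {
    static const RoData ro{"Person"};
    RealizableClassSketch cls;
    cls.data = &ro;                     // what the compiler emitted
    RwData *rw = realize(cls);
    assert(rw->ro == &ro);              // rw now fronts the read-only data
    assert(cls.data == rw);
    assert(realize(cls) == rw);         // a second call does nothing
}
```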
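Apple's comment is clear: "Performs first-time initialization on class cls". Every class goes through this function when it is realized for the first time. The class's initial information is stored in a class_ro_t, and before realization cls->data() actually points at that class_ro_t, which is why the code casts it. For a normal class the function allocates a class_rw_t, stores the ro pointer in it with set_ro, and calls setData(rw); from then on bits.data() returns the rw pointer. And what is bits? It should look familiar as well: it is the class_data_bits_t member.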
struct objc_class {
// Class ISA;
Class superclass;
cache_t cache; // formerly cache pointer and vtable
class_data_bits_t bits; // class_rw_t * plus custom rr/alloc flags
class_rw_t *data() const {
return bits.data();
}
...
};
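Here is one more piece of source, this time unabridged. In this version the runtime does not call extAlloc as part of realizing the class; it calls it lazily, through extAllocIfNeeded, the first time the class needs writable lists (for example when categories are attached). extAlloc takes the method_list_t, property_list_t and protocol_list_t out of ro and attaches them to the newly allocated class_rw_ext_t (rwe) with attachLists: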
class_rw_ext_t *class_rw_t::extAlloc(const class_ro_t *ro, bool deepCopy)
{
runtimeLock.assertLocked();
auto rwe = objc::zalloc<class_rw_ext_t>();
rwe->version = (ro->flags & RO_META) ? 7 : 0;
method_list_t *list = ro->baseMethods();
if (list) {
if (deepCopy) list = list->duplicate();
rwe->methods.attachLists(&list, 1);
}
// See comments in objc_duplicateClass
// property lists and protocol lists historically
// have not been deep-copied
//
// This is probably wrong and ought to be fixed some day
property_list_t *proplist = ro->baseProperties;
if (proplist) {
rwe->properties.attachLists(&proplist, 1);
}
protocol_list_t *protolist = ro->baseProtocols;
if (protolist) {
rwe->protocols.attachLists(&protolist, 1);
}
set_ro_or_rwe(rwe, ro);
return rwe;
}
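The lazy pattern behind extAlloc can be sketched as follows: reads fall back to ro until something needs a writable copy, at which point the extension is allocated once and seeded from ro. RoLists, RwExtSketch and RwLazySketch are hypothetical names, and only the method lists are modelled.

```cpp
#include <cassert>
#include <cstddef>
#include <memory>
#include <string>
#include <vector>

struct RoLists { std::vector<std::string> baseMethods; };   // compile-time data

struct RwExtSketch {
    const RoLists *ro;
    std::vector<std::vector<std::string>> methodLists;      // writable copy
};

struct RwLazySketch {
    const RoLists *ro = nullptr;
    std::unique_ptr<RwExtSketch> ext;    // stays null until a modification needs it

    // Allocate the extension on first use and seed it with ro's base list.
    RwExtSketch *extAllocIfNeeded() {
        if (!ext) {
            ext.reset(new RwExtSketch{ro, {}});
            if (!ro->baseMethods.empty()) ext->methodLists.push_back(ro->baseMethods);
        }
        return ext.get();
    }

    // Read path: use the extension when it exists, otherwise read ro directly.
    std::size_t methodListCount() const {
        if (ext) return ext->methodLists.size();
        return ro->baseMethods.empty() ? 0 : 1;
    }
};

inline void demoLazyExt() {
    static const RoLists ro{{"run", "eat"}};
    RwLazySketch rw;
    rw.ro = &ro;
    assert(rw.ext == nullptr);                 // nothing allocated for an untouched class
    assert(rw.methodListCount() == 1);
    RwExtSketch *e = rw.extAllocIfNeeded();
    assert(e->methodLists.size() == 1);        // seeded from ro
    assert(rw.extAllocIfNeeded() == e);        // allocated only once
}
```

A class that is never modified at run time never pays for the extension.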
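The design philosophy behind class_rw_t and class_ro_t
Why does Apple define two such similar structs to implement Class? ro stands for read-only and rw for read-write. class_ro_t is a compile-time product: the properties, methods, protocols and instance variables declared in a class's source files are already stored in class_ro_t when compilation finishes. class_rw_t is a runtime product, designed to support the dynamic nature of Class: at run time the properties, protocols and methods from class_ro_t are merged into the corresponding mutable data structures.
Can a category really add properties?
What about categories? The source is long, but I could not resist pasting all of it. Before attaching anything, attachCategories executes auto rwe = cls->data()->extAllocIfNeeded();, and extAllocIfNeeded() calls extAlloc() when no rwe exists yet. extAlloc() attaches the lists from ro to rwe first, and attachLists then puts each category's lists in front of them. That is why a category method with the same name as a method of the original class is the one that gets called. By the same logic, when several categories of a class define the same method, the one from the category loaded last is always found first.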
static void
attachCategories(Class cls, const locstamped_category_t *cats_list, uint32_t cats_count,
int flags)
{
if (slowpath(PrintReplacedMethods)) {
printReplacements(cls, cats_list, cats_count);
}
if (slowpath(PrintConnecting)) {
_objc_inform("CLASS: attaching %d categories to%s class '%s'%s",
cats_count, (flags & ATTACH_EXISTING) ? " existing" : "",
cls->nameForLogging(), (flags & ATTACH_METACLASS) ? " (meta)" : "");
}
/*
* Only a few classes have more than 64 categories during launch.
* This uses a little stack, and avoids malloc.
*
* Categories must be added in the proper order, which is back
* to front. To do that with the chunking, we iterate cats_list
* from front to back, build up the local buffers backwards,
* and call attachLists on the chunks. attachLists prepends the
* lists, so the final result is in the expected order.
*/
constexpr uint32_t ATTACH_BUFSIZ = 64;
method_list_t *mlists[ATTACH_BUFSIZ];
property_list_t *proplists[ATTACH_BUFSIZ];
protocol_list_t *protolists[ATTACH_BUFSIZ];
uint32_t mcount = 0;
uint32_t propcount = 0;
uint32_t protocount = 0;
bool fromBundle = NO;
bool isMeta = (flags & ATTACH_METACLASS);
auto rwe = cls->data()->extAllocIfNeeded();
for (uint32_t i = 0; i < cats_count; i++) {
auto& entry = cats_list[i];
method_list_t *mlist = entry.cat->methodsForMeta(isMeta);
if (mlist) {
if (mcount == ATTACH_BUFSIZ) {
prepareMethodLists(cls, mlists, mcount, NO, fromBundle);
rwe->methods.attachLists(mlists, mcount);
mcount = 0;
}
mlists[ATTACH_BUFSIZ - ++mcount] = mlist;
fromBundle |= entry.hi->isBundle();
}
property_list_t *proplist =
entry.cat->propertiesForMeta(isMeta, entry.hi);
if (proplist) {
if (propcount == ATTACH_BUFSIZ) {
rwe->properties.attachLists(proplists, propcount);
propcount = 0;
}
proplists[ATTACH_BUFSIZ - ++propcount] = proplist;
}
protocol_list_t *protolist = entry.cat->protocolsForMeta(isMeta);
if (protolist) {
if (protocount == ATTACH_BUFSIZ) {
rwe->protocols.attachLists(protolists, protocount);
protocount = 0;
}
protolists[ATTACH_BUFSIZ - ++protocount] = protolist;
}
}
if (mcount > 0) {
prepareMethodLists(cls, mlists + ATTACH_BUFSIZ - mcount, mcount, NO, fromBundle);
rwe->methods.attachLists(mlists + ATTACH_BUFSIZ - mcount, mcount);
if (flags & ATTACH_EXISTING) flushCaches(cls);
}
rwe->properties.attachLists(proplists + ATTACH_BUFSIZ - propcount, propcount);
rwe->protocols.attachLists(protolists + ATTACH_BUFSIZ - protocount, protocount);
}
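The block comment in attachCategories explains the ordering trick: walk the categories front to back but fill the local buffer from the back. Here is a minimal sketch of just that step, using strings in place of list pointers. orderForAttach is a hypothetical helper, and unlike the real function it does not flush the buffer in chunks.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Walk categories in load order, fill a fixed buffer from the back,
// and return the used tail. The last-loaded category ends up first.
inline std::vector<std::string> orderForAttach(const std::vector<std::string> &loadOrder) {
    constexpr std::size_t kBufSize = 64;      // same size as ATTACH_BUFSIZ
    assert(loadOrder.size() <= kBufSize);     // the real code flushes full chunks instead
    std::string buffer[kBufSize];
    std::size_t count = 0;
    for (const auto &category : loadOrder) {
        buffer[kBufSize - ++count] = category;
    }
    return std::vector<std::string>(buffer + kBufSize - count, buffer + kBufSize);
}

inline void demoCategoryOrder() {
    const std::vector<std::string> loaded = {"CatA", "CatB", "CatC"};
    const std::vector<std::string> attached = orderForAttach(loaded);
    assert(attached.front() == "CatC");       // loaded last, searched first
    assert(attached.back() == "CatA");
}
```

Since attachLists then prepends this tail to the class's own lists, the last-loaded category is searched first and the class's own methods last.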
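As shown above, the properties declared in a category are also added to the class's property list. So why do we say that a category cannot add properties, when they clearly were added?
The reason is that a property is accessed with dot syntax, which ends up calling a getter that reads an instance variable. For a property declared in a category the compiler generates neither the getter/setter implementations nor the instance variable. Instance variables live in class_ro_t, and a category cannot add one dynamically, so accessing it through the underscore name does not even compile because the variable does not exist. To make a property declared in a category usable, you have to implement the getter and setter yourself and emulate the instance variable's storage, for example with associated objects.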
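Emulating the missing instance variable usually means keeping the value in a side table keyed by the object. In Objective-C that is what objc_setAssociatedObject and objc_getAssociatedObject provide. The C++ sketch below only mimics the idea with a map; every name in it is hypothetical, and it skips the locking and cleanup that the runtime performs.

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <string>
#include <utility>

// Side table keyed by (object address, key address).
using AssocKey = std::pair<std::uintptr_t, std::uintptr_t>;

inline std::map<AssocKey, std::string> &assocTable() {
    static std::map<AssocKey, std::string> table;
    return table;
}
inline AssocKey makeKey(const void *object, const void *key) {
    return AssocKey(reinterpret_cast<std::uintptr_t>(object),
                    reinterpret_cast<std::uintptr_t>(key));
}
inline void setAssociated(const void *object, const void *key, const std::string &value) {
    assocTable()[makeKey(object, key)] = value;
}
inline std::string getAssociated(const void *object, const void *key) {
    const auto it = assocTable().find(makeKey(object, key));
    return it == assocTable().end() ? std::string() : it->second;
}

struct PersonSketch { int age = 0; };     // fixed layout: no room for a new field
static const char kNicknameKey = 0;       // its address serves as a unique key

// "Category" accessors written by hand, outside the struct.
inline void setNickname(PersonSketch &p, const std::string &name) {
    setAssociated(&p, &kNicknameKey, name);
}
inline std::string nickname(const PersonSketch &p) {
    return getAssociated(&p, &kNicknameKey);
}

inline void demoAssociatedStorage() {
    PersonSketch tom, amy;
    setNickname(tom, "Tommy");
    assert(nickname(tom) == "Tommy");
    assert(nickname(amy).empty());                  // values are kept per object
    assert(sizeof(PersonSketch) == sizeof(int));    // the layout did not grow
}
```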
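Summary
Class has many implementation details; this article only covered its memory structure. The next one will look at isa.
1. What is Class?
A pointer to struct objc_class, a struct that inherits from objc_object.
2. The memory layout of Class
isa
For an instance it points to the class object, and for a class object it points to the metaclass. On 32-bit it is a plain cls pointer; on 64-bit it also stores extra information about the object, such as whether it has a custom C++ destructor, whether it has associated objects, whether it is weakly referenced, and whether a side table holds part of its reference count.
superclass
Pointer to the superclass.
cache
Caches the methods that have already been invoked on this class, so later message sends can skip the full lookup.
class_rw_t
The struct that stores dynamic data. Through the attachLists function it supports dynamic updates of methods, properties and protocols.
class_ro_t
Static data. The class's initial information is stored in class_ro_t. At run time, when writable lists are first needed, the method_list_t, property_list_t and protocol_list_t are taken from ro and attached to the rw side (class_rw_ext_t) with attachLists.
3. The design philosophy behind class_rw_t and class_ro_t
class_rw_t is designed to support the dynamic nature of Class: at run time the properties, protocols and methods from class_ro_t are merged into the corresponding data structures. Note that instance variables are not included, because adding or removing them dynamically would corrupt the memory layout of existing instances.
4. How categories relate to class_rw_t
The attachCategories function merges the methods, properties and protocols of categories into the rw side. Before it does so, the lists from ro have already been attached, so the category lists end up in front. As a result, a category method with the same name as a method of the original class is called first, and among several categories of a class the same-named method of the category loaded last is always found first.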