Analyzing the objc_msgSend Flow

I. What Is the Runtime?

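Turning source code into an executable program usually takes three steps: compiling, linking, and running. Languages differ in what they do at each step.

C is a static language: the types of all variables are fixed at compile time, and so is the function each call refers to, along with that function's implementation.

Objective-C, by contrast, is a dynamic language. For a message send, the compiler does not need to know the receiver's concrete class, nor which function will actually run. The receiver's class is inspected at run time, and the implementation is looked up by method name (the selector) at run time as well. So before the program runs, we cannot say for certain what a given method call will do.

By deferring these decisions from compile time and link time to run time, Objective-C becomes far more flexible. We can even replace a method's implementation while the program is running, which is what made the once very popular "hot update" technique possible.

The foundation that implements this run-time mechanism is the Runtime.

The Runtime is in fact a library. It lets us create objects, inspect objects, and modify the methods of classes and objects dynamically while the program runs.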
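To make "look the method up by name at run time, and swap its implementation while running" concrete, here is a minimal sketch in plain C. It is a toy model, not the real Runtime: toy_method, toy_send, toy_replace and the say_v1/say_v2 functions are all made-up names, and the real library uses SEL, IMP and per-class method lists instead of a single global table.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical toy types; the real runtime uses SEL, IMP and objc_class. */
typedef int (*toy_imp)(int);

typedef struct {
    const char *name;  /* method name, resolved at run time */
    toy_imp imp;       /* current implementation */
} toy_method;

static int say_v1(int x) { return x + 1; }
static int say_v2(int x) { return x * 10; }

/* One global "method list" standing in for a class. */
static toy_method toy_methods[] = {
    { "say", say_v1 },
};
static const size_t toy_method_count = sizeof toy_methods / sizeof toy_methods[0];

/* Find a method by name; NULL if the "class" does not respond to it. */
static toy_method *toy_find(const char *name) {
    for (size_t i = 0; i < toy_method_count; i++) {
        if (strcmp(toy_methods[i].name, name) == 0) return &toy_methods[i];
    }
    return NULL;
}

/* "Send a message": look up by name, then call. Returns -1 if not found. */
static int toy_send(const char *name, int arg) {
    toy_method *m = toy_find(name);
    return m ? m->imp(arg) : -1;
}

/* Replace an implementation while the program runs. Returns 1 on success. */
static int toy_replace(const char *name, toy_imp new_imp) {
    toy_method *m = toy_find(name);
    if (m == NULL) return 0;
    m->imp = new_imp;
    return 1;
}
```

Calling toy_send("say", 4) before and after toy_replace("say", say_v2) gives different results even though the call site is unchanged, which is the property hot patching relies on.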

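Three ways to interact with the Runtime

Objective-C code, for example calling a method directly with [self say], or using @selector().
Framework & Service, for example NSSelectorFromString, isKindOfClass:, and isMemberOfClass:.
Runtime API, for example low-level functions such as sel_registerName and class_getInstanceSize.

Runtime features: dynamic typing, dynamic binding, and dynamic loading.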

II. Exploring objc_msgSend

1. What is objc_msgSend? The official documentation explains it as follows:

When it encounters a method call, the compiler generates a call to one of the functions objc_msgSend, objc_msgSend_stret, objc_msgSendSuper, or objc_msgSendSuper_stret. Messages sent to an object’s superclass (using the super keyword) are sent using objc_msgSendSuper; other messages are sent using objc_msgSend. Methods that have data structures as return values are sent using objc_msgSendSuper_stret and objc_msgSend_stret.

In short:
The compiler turns every method call into a call to one of four functions. Messages sent through the super keyword go to objc_msgSendSuper (or objc_msgSendSuper_stret), all other messages go to objc_msgSend (or objc_msgSend_stret), and the _stret variants are the ones used when the method returns a data structure.
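The rule in the quoted documentation is a two-way decision, which can be sketched as a small C function. This only restates the documentation: the enum and function names are invented for illustration, the real choice is made inside the compiler, and it also depends on ABI details this sketch ignores (for instance, whether a given struct is actually returned through memory on the target architecture).

```c
/* Hypothetical names for illustration; the compiler makes this choice internally. */
typedef enum {
    ENTRY_MSG_SEND,
    ENTRY_MSG_SEND_STRET,
    ENTRY_MSG_SEND_SUPER,
    ENTRY_MSG_SEND_SUPER_STRET
} entry_point;

/* Pick the entry point from the two properties the documentation mentions. */
static entry_point pick_entry_point(int sent_to_super, int returns_struct) {
    if (sent_to_super) {
        return returns_struct ? ENTRY_MSG_SEND_SUPER_STRET : ENTRY_MSG_SEND_SUPER;
    }
    return returns_struct ? ENTRY_MSG_SEND_STRET : ENTRY_MSG_SEND;
}

/* Map the toy enum back to the function name used in the documentation. */
static const char *entry_point_name(entry_point e) {
    switch (e) {
    case ENTRY_MSG_SEND:             return "objc_msgSend";
    case ENTRY_MSG_SEND_STRET:       return "objc_msgSend_stret";
    case ENTRY_MSG_SEND_SUPER:       return "objc_msgSendSuper";
    case ENTRY_MSG_SEND_SUPER_STRET: return "objc_msgSendSuper_stret";
    }
    return "unknown";
}
```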

2. Prepare a runnable objc source project

int main(int argc, const char * argv[]) {
    @autoreleasepool { 
        PHPerson * person = [[PHPerson alloc]init];
        Class cls = object_getClass(person);
        [person doFirst];
        [person doSecond];
        [person doThird];
        NSLog(@"");
    }
    return 0;
}

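Create a PHPerson class with the instance methods doFirst, doSecond, and doThird.
cd into the folder that contains the project's main.m.
Run clang -rewrite-objc main.m to translate it into a .cpp file.
Open the generated .cpp file and find main: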

int main(int argc, const char * argv[]) {
    /* @autoreleasepool */ { __AtAutoreleasePool __autoreleasepool; 

        PHPerson * person = ((PHPerson *(*)(id, SEL))(void *)objc_msgSend)((id)((PHPerson *(*)(id, SEL))(void *)objc_msgSend)((id)objc_getClass("PHPerson"), sel_registerName("alloc")), sel_registerName("init"));
        Class cls = object_getClass(person);
        ((void (*)(id, SEL))(void *)objc_msgSend)((id)person, sel_registerName("doFirst"));
        ((void (*)(id, SEL))(void *)objc_msgSend)((id)person, sel_registerName("doSecond"));
        ((void (*)(id, SEL))(void *)objc_msgSend)((id)person, sel_registerName("doThird"));
        NSLog((NSString *)&__NSConstantStringImpl__var_folders_4j_v597272j6kb0q7g5x72k3xhw0000gn_T_main_d18b1e_mi_0);
    }
    return 0;
}

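Every method call has been rewritten into the form ((void (*)(id, SEL))(void *)objc_msgSend)((id)person, sel_registerName("method name")): objc_msgSend is cast to a function pointer with the right signature and then called with the receiver and the selector.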
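The cast is the interesting part of that rewritten line. A single entry point cannot have one fixed C signature that fits every method, so the caller casts it to the signature it expects before calling. The sketch below shows the same pattern in portable C with made-up functions (add_ints, half, lookup_fn); it uses a generic function-pointer type rather than void *, since converting between void * and function pointers is not guaranteed by ISO C.

```c
/* A deliberately "untyped" function pointer, playing the role that the
   (void *)objc_msgSend cast plays in the rewritten code. */
typedef void (*generic_fn)(void);

static int add_ints(int a, int b) { return a + b; }
static double half(double x) { return x / 2.0; }

/* Hypothetical lookup: hands back functions with different signatures
   behind one common pointer type. */
static generic_fn lookup_fn(int which) {
    return which == 0 ? (generic_fn)add_ints : (generic_fn)half;
}

/* The caller knows the real signature and casts back before calling. */
static int call_add(int a, int b) {
    generic_fn f = lookup_fn(0);
    return ((int (*)(int, int))f)(a, b);
}

static double call_half(double x) {
    generic_fn f = lookup_fn(1);
    return ((double (*)(double))f)(x);
}
```

Calling through the wrong signature would be undefined behavior, which is why the rewritten code spells out the full (return type, id, SEL) signature at every call site.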
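3. Breakpoint debugging

Set a breakpoint on one of the method calls.
Turn on disassembly in the debugger.

Result: after adding a symbolic breakpoint on objc_msgSend, the debugger shows that the objc_msgSend function lives in libobjc.A.dylib.

4. Assembly analysis (I am not fluent in assembly and worked through this with references, so corrections are welcome)

Turn disassembly off, add the symbolic breakpoint, and run the project again.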


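The debugger jumps into objc-msg-x86_64.s, where the comment r10 = self->isa tells us that this step loads the object's isa pointer.
Next, let's read the objc_msgSend source in libobjc.A.dylib.
Apps on real devices run on arm64, so objc-msg-arm64.s is the main file studied below.

objc_msgSend turns out to be written in assembly. Why assembly? There are two main reasons:

Speed. objc_msgSend sits on the path of every method call, and hand-written assembly keeps its fast path down to a handful of instructions.

C cannot express it. A C function cannot preserve arguments of unknown number and type and then jump to an arbitrary function pointer; the language lacks the features this requires.

An excerpt of the entry point (this one is from the x86_64 file, as the r10 comment shows):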

    ENTRY _objc_msgSend
    UNWIND _objc_msgSend, NoFrame
    NilTest NORMAL
    GetIsaFast NORMAL       // r10 = self->isa
    // calls IMP on success
    CacheLookup NORMAL, CALL, _objc_msgSend
    NilTestReturnZero NORMAL

    GetIsaSupport NORMAL

After isa has been fetched, the lookup continues in the class's method cache.

Cache lookup

Search the project for CacheLookup.


The arm64 version of the macro:

.macro CacheLookup
LLookupStart$1:

    // p1 = SEL, p16 = isa
    ldr p11, [x16, #CACHE]              // p11 = mask|buckets

#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16
    and p10, p11, #0x0000ffffffffffff   // p10 = buckets
    and p12, p1, p11, LSR #48       // x12 = _cmd & mask
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_LOW_4
    and p10, p11, #~0xf         // p10 = buckets
    and p11, p11, #0xf          // p11 = maskShift
    mov p12, #0xffff
    lsr p11, p12, p11               // p11 = mask = 0xffff >> p11
    and p12, p1, p11                // x12 = _cmd & mask
#else
#error Unsupported cache mask storage for ARM64.
#endif

    add p12, p10, p12, LSL #(1+PTRSHIFT)
                     // p12 = buckets + ((_cmd & mask) << (1+PTRSHIFT))

    ldp p17, p9, [x12]      // {imp, sel} = *bucket
1:  cmp p9, p1          // if (bucket->sel != _cmd)
    b.ne    2f          //     scan more
    CacheHit $0         // call or return imp
    
2:  // not hit: p12 = not-hit bucket
    CheckMiss $0            // miss if bucket->sel == 0
    cmp p12, p10        // wrap if bucket == buckets
    b.eq    3f
    ldp p17, p9, [x12, #-BUCKET_SIZE]!  // {imp, sel} = *--bucket
    b   1b          // loop

3:  // wrap: p12 = first bucket, w11 = mask
#if CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_HIGH_16
    add p12, p12, p11, LSR #(48 - (1+PTRSHIFT))
                    // p12 = buckets + (mask << 1+PTRSHIFT)
#elif CACHE_MASK_STORAGE == CACHE_MASK_STORAGE_LOW_4
    add p12, p12, p11, LSL #(1+PTRSHIFT)
                    // p12 = buckets + (mask << 1+PTRSHIFT)
#else
#error Unsupported cache mask storage for ARM64.
#endif
    // Clone scanning loop to miss instead of hang when cache is corrupt.
    // The slow path may detect any corruption and halt later.

    ldp p17, p9, [x12]      // {imp, sel} = *bucket
1:  cmp p9, p1          // if (bucket->sel != _cmd)
    b.ne    2f          //     scan more
    CacheHit $0         // call or return imp
    
2:  // not hit: p12 = not-hit bucket
    CheckMiss $0            // miss if bucket->sel == 0
    cmp p12, p10        // wrap if bucket == buckets
    b.eq    3f
    ldp p17, p9, [x12, #-BUCKET_SIZE]!  // {imp, sel} = *--bucket
    b   1b          // loop

LLookupEnd$1:
LLookupRecover$1:
3:  // double wrap
    JumpMiss $0

.endmacro
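The macro above is dense, so here is a hedged C model of what it appears to do, pieced together from its comments: unpack mask and buckets from one word (the HIGH_16 layout), start at bucket sel & mask, scan backwards, wrap once to the last bucket, and give up on an empty bucket or on a second wrap. All names (toy_bucket, toy_cache_lookup, toy_cache_demo) are invented, and selectors and IMPs are plain integers standing in for the real pointers.

```c
#include <stddef.h>
#include <stdint.h>

typedef struct {
    uintptr_t imp;  /* stand-in for a function pointer */
    uintptr_t sel;  /* stand-in for SEL; 0 marks an empty bucket */
} toy_bucket;       /* field order follows the {imp, sel} comment in the macro */

/* HIGH_16 layout: top 16 bits hold the mask, low 48 bits the buckets address. */
static uint64_t pack_mask_and_buckets(uint64_t buckets_addr, uint16_t mask) {
    return ((uint64_t)mask << 48) | (buckets_addr & 0x0000ffffffffffffULL);
}
static uint64_t unpack_buckets(uint64_t packed) { return packed & 0x0000ffffffffffffULL; }
static uint16_t unpack_mask(uint64_t packed)    { return (uint16_t)(packed >> 48); }

/* Returns the cached imp, or 0 on a miss. */
static uintptr_t toy_cache_lookup(const toy_bucket *buckets, uintptr_t mask, uintptr_t sel) {
    const toy_bucket *b = buckets + (sel & mask);  /* home bucket */
    int wrapped = 0;
    for (;;) {
        if (b->sel == sel) return b->imp;  /* CacheHit */
        if (b->sel == 0) return 0;         /* CheckMiss: empty bucket */
        if (b == buckets) {                /* reached the first bucket */
            if (wrapped) return 0;         /* double wrap: JumpMiss */
            wrapped = 1;
            b = buckets + mask;            /* wrap to the last bucket */
        } else {
            b--;                           /* scan backwards */
        }
    }
}

/* Small fixed cache (4 buckets, mask 3) to try the lookup on. */
static uintptr_t toy_cache_demo(uintptr_t sel) {
    const toy_bucket buckets[4] = {
        { 10, 0x14 },  /* home slot 0 */
        { 20, 0x15 },  /* home slot 1 */
        {  0, 0    },  /* empty */
        { 30, 0x18 },  /* home slot 0 was taken, so it sits in the wrapped slot 3 */
    };
    return toy_cache_lookup(buckets, 3, sel);
}
```

In the demo, 0x18 is found only after the scan wraps from bucket 0 to bucket 3, and 0x19 misses once the backward scan reaches the empty bucket 2.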

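Two macros stand out here: CacheHit and CheckMiss. CacheHit handles a cache hit ("call or return imp": it either calls the cached IMP or returns it, depending on the mode). CheckMiss handles the case where the scan reaches an empty bucket, meaning the selector is not in the cache. Let's look inside CheckMiss: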

.macro CheckMiss
    // miss if bucket->sel == 0
.if $0 == GETIMP
    cbz p9, LGetImpMiss
.elseif $0 == NORMAL
    cbz p9, __objc_msgSend_uncached
.elseif $0 == LOOKUP
    cbz p9, __objc_msgLookup_uncached
.else
.abort oops
.endif
.endmacro

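Here $0 is NORMAL, so we follow __objc_msgSend_uncached. Both it and __objc_msgLookup_uncached invoke the MethodTableLookup macro and then use the IMP that comes back.

Part of the arm64 MethodTableLookup source: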


.macro MethodTableLookup
    
....
    // lookUpImpOrForward(obj, sel, cls, LOOKUP_INITIALIZE | LOOKUP_RESOLVER)
    // receiver and selector already in x0 and x1
    bl  _lookUpImpOrForward
....
    mov sp, fp
    ldp fp, lr, [sp], #16
    AuthenticateLR
.endmacro

The macro calls _lookUpImpOrForward. Let's see what that function does:

IMP lookUpImpOrForward(id inst, SEL sel, Class cls, int behavior)
{
...
}

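One passage in its comments is worth quoting:

runtimeLock is held during isRealized and isInitialized checking to prevent races against concurrent realization. runtimeLock is held during method search to make method-lookup + cache-fill atomic with respect to method addition.
Otherwise, a category could be added but ignored indefinitely because the cache was re-filled with the old value after the cache flush on behalf of the category.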

if (fastpath(behavior & LOOKUP_CACHE)) {
        imp = cache_getImp(cls, sel);
        if (imp) goto done_nolock;
   }

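A second comment covers the +initialize case: if sel == initialize, class_initialize will send +initialize, and then the messenger will send +initialize again after this procedure finishes. Of course, if this is not being called from the messenger, that won't happen.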

if (slowpath(!cls->isRealized())) {
        cls = realizeClassMaybeSwiftAndLeaveLocked(cls, runtimeLock);
    }
    if (slowpath((behavior & LOOKUP_INITIALIZE) && !cls->isInitialized())) {
        cls = initializeAndLeaveLocked(cls, inst, runtimeLock);
    }

The search then walks up the superclass chain, checking each superclass's method cache in turn.
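The overall shape of the slow path can be modeled in a few lines of C. This is a simplified sketch under stated assumptions, not the real lookUpImpOrForward: it skips the lock, class realization, +initialize, the per-class caches and any resolver step, and only shows "search this class, then each superclass, and fall back to a forwarding IMP". The model_* types, the Person/Student classes and forward_marker are all hypothetical.

```c
#include <stddef.h>
#include <string.h>

typedef int (*model_imp)(void);

typedef struct {
    const char *sel;  /* selector name */
    model_imp imp;    /* implementation */
} model_method;

typedef struct model_class {
    const struct model_class *superclass;  /* NULL for the root class */
    const model_method *methods;
    size_t method_count;
} model_class;

static int person_walk(void)    { return 1; }
static int student_study(void)  { return 2; }
static int forward_marker(void) { return -1; }  /* stands in for the forwarding IMP */

static const model_method person_methods[]  = { { "walk",  person_walk   } };
static const model_method student_methods[] = { { "study", student_study } };

static const model_class person_class  = { NULL,          person_methods,  1 };
static const model_class student_class = { &person_class, student_methods, 1 };

/* Search cls, then every superclass; never returns NULL. */
static model_imp model_lookup_or_forward(const model_class *cls, const char *sel) {
    for (const model_class *cur = cls; cur != NULL; cur = cur->superclass) {
        for (size_t i = 0; i < cur->method_count; i++) {
            if (strcmp(cur->methods[i].sel, sel) == 0) return cur->methods[i].imp;
        }
    }
    return forward_marker;  /* nothing found anywhere in the chain */
}
```

A Student instance finds "study" in its own class and "walk" in Person, while an unknown selector ends up at the forwarding marker instead of a NULL pointer.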

The lookup code calls cache_getImp. Following it into the assembly:

END_ENTRY _cache_getImp


/********************************************************************
*
* id _objc_msgForward(id self, SEL _cmd,...);
*
* _objc_msgForward is the externally-callable
*   function returned by things like method_getImplementation().
* _objc_msgForward_impcache is the function pointer actually stored in
*   method caches.
*
********************************************************************/
    STATIC_ENTRY __objc_msgForward_impcache
    // No stret specialization.
    b   __objc_msgForward
    END_ENTRY __objc_msgForward_impcache

    ENTRY __objc_msgForward

    adrp    x17, __objc_forward_handler@PAGE
    ldr p17, [x17, __objc_forward_handler@PAGEOFF]
    TailCallFunctionPointer x17

    END_ENTRY __objc_msgForward
    ENTRY _objc_msgSend_noarg
    b   _objc_msgSend
    END_ENTRY _objc_msgSend_noarg
    ENTRY _objc_msgSend_debug
    b   _objc_msgSend
    END_ENTRY _objc_msgSend_debug

    ENTRY _objc_msgSendSuper2_debug
    b   _objc_msgSendSuper2
    END_ENTRY _objc_msgSendSuper2_debug

    ENTRY _method_invoke
    // x1 is method triplet instead of SEL
    add p16, p1, #METHOD_IMP
    ldr p17, [x16]
    ldr p1, [x1, #METHOD_NAME]
    TailCallMethodListImp x17, x16
    END_ENTRY _method_invoke

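The comment says, in short: _objc_msgForward is the externally callable function returned by things like method_getImplementation(), while _objc_msgForward_impcache is the function pointer actually stored in method caches.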

IMP method_getImplementation(Method m)
{
    return m ? m->imp : nil;
}
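The forwarding entry in the assembly above does very little: __objc_msgForward_impcache branches to __objc_msgForward, which loads a handler from the global __objc_forward_handler and tail-calls it. A rough C model of that indirection is sketched below. Every name in it is made up, the handlers take a selector name and return an int purely so the behavior can be observed, and what the real default handler does is not modeled.

```c
#include <stddef.h>
#include <string.h>

typedef int (*handler_fn)(const char *sel);

/* Placeholder default; returns -1 so callers can tell it ran. */
static int default_handler(const char *sel) { (void)sel; return -1; }

/* Global handler slot, playing the role of __objc_forward_handler. */
static handler_fn forward_handler = default_handler;

/* Models __objc_msgForward: load the handler and call it. */
static int msg_forward(const char *sel) { return forward_handler(sel); }

/* Models __objc_msgForward_impcache: just branches to msg_forward. */
static int msg_forward_impcache(const char *sel) { return msg_forward(sel); }

/* Hypothetical custom handler: reports the selector's length. */
static int custom_handler(const char *sel) { return (int)strlen(sel); }

/* Install a handler; passing NULL restores the default. */
static void set_forward_handler(handler_fn h) {
    forward_handler = h ? h : default_handler;
}
```

Because the handler is read from a global on every call, swapping it changes what every unhandled message does without touching any cache entry that points at the forwarding stub.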
