iOS Internals Study Notes 12

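Day 12 of studying iOS internals. Today we finally move on to a new chapter. You may remember the program loading flow from iOS Internals Study Notes 1.1: dyld_start -> dyld::_main -> dyld::initializeMainExecutable -> libSystem_initializer. This new chapter is about dyld.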

What is dyld?

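  • Let's start with a diagram of the build pipeline.
  • In the diagram, the source files written by the programmer are compiled and assembled, then linked together with static and dynamic libraries to produce the final executable. The static linker does this at build time. For dynamic libraries it only records the dependency, and dyld performs the actual linking when the program launches.
  • Terminology: dyld (the dynamic link editor) is Apple's dynamic linker and an important part of Apple's operating systems.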
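To make "dynamic linking" a bit more concrete, here is a deliberately simplified model, not dyld's actual algorithm or data structures. It assumes a flat lookup: each loaded library exports some symbols, and an undefined symbol is bound to the first library in load order that exports it. `ToyLibrary`, `findProvider`, `sampleLibraries` and the library names are all hypothetical names used only for illustration.

```cpp
#include <cassert>
#include <map>
#include <string>
#include <vector>

// Hypothetical, simplified stand-in for a loaded library (not a dyld type).
struct ToyLibrary {
    std::string name;                              // made-up library name
    std::map<std::string, unsigned long> exports;  // symbol name -> address
};

// Return the first library in load order that exports `symbol`,
// or nullptr when no loaded library provides it.
const ToyLibrary* findProvider(const std::vector<ToyLibrary>& loadOrder,
                               const std::string& symbol) {
    for (const ToyLibrary& lib : loadOrder) {
        if (lib.exports.count(symbol) != 0) {
            return &lib;
        }
    }
    return nullptr;
}

// Two made-up libraries: both export _foo, only the second exports _bar.
std::vector<ToyLibrary> sampleLibraries() {
    ToyLibrary a;
    a.name = "libA.dylib";
    a.exports["_foo"] = 0x1000;
    ToyLibrary b;
    b.name = "libB.dylib";
    b.exports["_foo"] = 0x2000;
    b.exports["_bar"] = 0x2100;
    std::vector<ToyLibrary> libs;
    libs.push_back(a);
    libs.push_back(b);
    return libs;
}
```

Calling `findProvider(libs, "_foo")` on the sample data picks libA.dylib because it comes first in load order. Real dyld normally uses two-level namespaces recorded at build time, so treat this only as an intuition for what "binding a symbol at launch" means.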

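From the definition we know that dyld links dynamic libraries, but how does it actually do that? Let's find out by reading the dyld source code.
The source is large, so where should an analysis of dyld start?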

dyld_start

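  • dyld_start is the entry point of the whole launch, so it is the natural place to start exploring. Open the dyld source project and search globally for dyld_start.
  • This lands in assembly code, where a comment reads call dyldbootstrap::start. Here dyldbootstrap is a C++ namespace and start is a function in that namespace. Search globally for dyldbootstrap and start.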
namespace dyldbootstrap {
       ... 
 //
//  This is code to bootstrap dyld.  This work in normally done for a program by dyld and crt.
//  In dyld we have to do this manually.
//
uintptr_t start(const dyld3::MachOLoaded* appsMachHeader, int argc, const char* argv[],
                const dyld3::MachOLoaded* dyldsMachHeader, uintptr_t* startGlue)
{
       // ...
       //  bootstrapping dyld

    _subsystem_init(apple);

    // now that we are done bootstrapping dyld, call dyld's main
    uintptr_t appsSlide = appsMachHeader->getSlide();
    return dyld::_main((macho_header*)appsMachHeader, appsSlide, argc, argv, envp, apple, startGlue);
}
}
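  • The most important call is the one in the return statement, dyld::_main, so step straight into it.
  • With the code folded, the function turns out to be around 1,000 lines long, but we only need to find the key parts.
  • We have now reached dyld::_main. The goal is to work out how dyld links images and what its main flow looks like.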
//
// Entry point for dyld.  The kernel loads dyld and jumps to __dyld_start which
// sets up some registers and call this function.
//
// Returns address of main() in target program which __dyld_start jumps to
//
uintptr_t
_main(const macho_header* mainExecutableMH, uintptr_t mainExecutableSlide, 
        int argc, const char* argv[], const char* envp[], const char* apple[], 
        uintptr_t* startGlue)
{

    // ... (some code omitted)
    // gather information needed to run the main executable
    getHostInfo(mainExecutableMH, mainExecutableSlide);
    // Set the platform ID in the all image infos so debuggers can tell the process type
    { ... }

    //  Check to see if we need to override the platform.
    // some handling of dyld_root_path
    { ... }

    // configure process restrictions for the executable
    configureProcessRestrictions(mainExecutableMH, envp);

    // Check if we should force dyld3.  Note we have to do this outside of the regular env parsing due to AMFI
    { ... }

    // load shared cache
    checkSharedRegionDisable((dyld3::MachOLoaded*)mainExecutableMH, mainExecutableSlide); // system level: shared cache handling

    // ... (some code omitted)

  {
    // find entry point for main executable
    result = (uintptr_t)sMainExecutable->getEntryFromLC_MAIN();
    if ( result != 0 ) {
        // main executable uses LC_MAIN, we need to use helper in libdyld to call into main()
        if ( (gLibSystemHelpers != NULL) && (gLibSystemHelpers->version >= 9) )
            *startGlue = (uintptr_t)gLibSystemHelpers->startGlueToCallExit;
        else
            halt("libdyld.dylib support not present for LC_MAIN");
    }
    else {
        // main executable uses LC_UNIXTHREAD, dyld needs to let "start" in program set up for main()
        result = (uintptr_t)sMainExecutable->getEntryFromLC_UNIXTHREAD();
        *startGlue = 0;
    }
  }
    return result;
}
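The entry-point selection at the end of _main can be summarized in a small standalone model. This is a sketch under assumptions, not dyld code: `EntryChoice` and `chooseEntry` are hypothetical names, an address of 0 stands for "load command absent", and the sketch throws an exception where the real code calls halt().

```cpp
#include <cassert>
#include <cstdint>
#include <stdexcept>

// Hypothetical result type: where to jump, and whether the libdyld
// helper (startGlue in the excerpt) is needed to call main().
struct EntryChoice {
    std::uintptr_t entry;
    bool usesStartGlue;
};

// lcMainEntry / unixThreadEntry are 0 when the load command is absent.
// helperAvailable stands for the gLibSystemHelpers version check.
EntryChoice chooseEntry(std::uintptr_t lcMainEntry,
                        std::uintptr_t unixThreadEntry,
                        bool helperAvailable) {
    EntryChoice choice;
    if (lcMainEntry != 0) {
        // LC_MAIN: main() is reached through a helper in libdyld.
        if (!helperAvailable) {
            throw std::runtime_error("libdyld.dylib support not present for LC_MAIN");
        }
        choice.entry = lcMainEntry;
        choice.usesStartGlue = true;
        return choice;
    }
    // LC_UNIXTHREAD: the program's own "start" sets up for main().
    choice.entry = unixThreadEntry;
    choice.usesStartGlue = false;
    return choice;
}
```

For example, `chooseEntry(0x4000, 0, true)` selects the LC_MAIN address and asks for the glue, while `chooseEntry(0, 0x5000, false)` falls back to the LC_UNIXTHREAD address without it. The addresses are arbitrary sample values.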
  • _main returns result, and result is obtained from sMainExecutable, so sMainExecutable is what we follow next.
    // ... 
    // instantiate ImageLoader for main executable
    sMainExecutable = instantiateFromLoadedImage(mainExecutableMH, mainExecutableSlide, sExecPath); 
    // sMainExecutable is initialized here

    // load any inserted libraries
    if  ( sEnv.DYLD_INSERT_LIBRARIES != NULL ) {
        for (const char* const* lib = sEnv.DYLD_INSERT_LIBRARIES; *lib != NULL; ++lib) 
         loadInsertedDylib(*lib);
    }

    // link main executable 
    // {...}
    link(sMainExecutable, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
     
    // link any inserted libraries
    // do this after linking main executable so that any dylibs pulled in by inserted 
    // dylibs (e.g. libSystem) will not be in front of dylibs the program uses
    //  {...}
    if (sInsertedDylibCount > 0 ) {
        for(unsigned int i=0; i < sInsertedDylibCount; ++i) {
          ImageLoader* image = sAllImages[i+1];
          link(image, sEnv.DYLD_BIND_AT_LAUNCH, true, ImageLoader::RPathChain(NULL, NULL), -1);
          image->setNeverUnloadRecursive();
       }
    }
    //  {...}
    // <rdar://problem/12186933> 
   // do weak binding only after all inserted images linked       
    sMainExecutable->weakBind(gLinkContext);
    //  {...}
    // run all initializer
    initializeMainExecutable(); 
    //  {...}
    // notify any montoring proccesses that this process is about to enter main()
    notifyMonitoringDyldMain();
  
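  • sMainExecutable = instantiateFromLoadedImage(...): instantiates the main executable, that is, creates the image loader for it
  • load any inserted libraries: loads the inserted dynamic libraries
  • link main executable: links the main executable
  • link any inserted libraries: links the inserted dynamic libraries
  • sMainExecutable->weakBind: performs weak symbol binding for the main executable
  • initializeMainExecutable: runs all initializers
  • notifyMonitoringDyldMain: notifies monitoring processes that this process is about to enter main()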
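The ordering in the list above matters, especially that inserted libraries are loaded before the main executable is linked but linked after it. The following sketch only records that order as strings. It is a toy under assumptions: `splitColonList` and `launchPhases` are hypothetical helpers, the library paths are made up, and skipping empty entries in the colon-separated DYLD_INSERT_LIBRARIES value is a choice of this sketch rather than documented dyld behavior.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Split a colon-separated list such as a DYLD_INSERT_LIBRARIES value.
// Empty entries are skipped (a simplification of this sketch).
std::vector<std::string> splitColonList(const std::string& value) {
    std::vector<std::string> out;
    std::string current;
    for (char c : value) {
        if (c == ':') {
            if (!current.empty()) out.push_back(current);
            current.clear();
        } else {
            current.push_back(c);
        }
    }
    if (!current.empty()) out.push_back(current);
    return out;
}

// Return the phase names in the order the excerpt performs them.
std::vector<std::string> launchPhases(const std::string& insertLibraries) {
    const std::vector<std::string> inserted = splitColonList(insertLibraries);
    std::vector<std::string> phases;
    phases.push_back("instantiate main executable");
    for (const std::string& lib : inserted) phases.push_back("load inserted " + lib);
    phases.push_back("link main executable");
    for (const std::string& lib : inserted) phases.push_back("link inserted " + lib);
    phases.push_back("weak bind");
    phases.push_back("run initializers");
    phases.push_back("notify monitoring processes");
    return phases;
}
```

With `launchPhases("/tmp/libHook.dylib")` the hook library is loaded in phase 2, the main executable is linked in phase 3, and the hook library is linked only in phase 4.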

The most important of these is initializeMainExecutable. How is initializeMainExecutable actually implemented?

  • Continue by exploring initializeMainExecutable.
void initializeMainExecutable()
{
    // record that we've reached this step
    gLinkContext.startedInitializingMainExecutable = true;

    // run initialzers for any inserted dylibs
    ImageLoader::InitializerTimingList initializerTimes[allImagesCount()];
    initializerTimes[0].count = 0;
    const size_t rootCount = sImageRoots.size();
    if ( rootCount > 1 ) {
        for(size_t i=1; i < rootCount; ++i) {
            sImageRoots[i]->runInitializers(gLinkContext, initializerTimes[0]);
        }
    }
    // initialize the images
    // run initializers for main executable and everything it brings up 
    sMainExecutable->runInitializers(gLinkContext, initializerTimes[0]);
    // { ... }
}
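The loop in initializeMainExecutable starts at index 1 and the main executable is handled last. A minimal sketch of that ordering, assuming (as the loop suggests) that index 0 of the roots list is the main executable and the rest are inserted dylibs. `initializerOrder` and the image names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Given the root images (index 0 assumed to be the main executable),
// return the order in which their initializers are started.
std::vector<std::string> initializerOrder(const std::vector<std::string>& imageRoots) {
    assert(!imageRoots.empty());  // there is always a main executable
    std::vector<std::string> order;
    // inserted dylibs first (indices 1..n-1)
    for (std::size_t i = 1; i < imageRoots.size(); ++i) {
        order.push_back(imageRoots[i]);
    }
    // then the main executable and everything it brings up
    order.push_back(imageRoots[0]);
    return order;
}
```

For roots `App`, `libHook.dylib`, `libTrace.dylib` the sketch returns `libHook.dylib`, `libTrace.dylib`, `App`, which is why code in an inserted library can run before anything in the app itself.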

  • Step into ImageLoader::runInitializers.
void ImageLoader::runInitializers(const LinkContext& context, InitializerTimingList& timingInfo)
{
    uint64_t t1 = mach_absolute_time();
    mach_port_t thisThread = mach_thread_self();
    ImageLoader::UninitedUpwards up;
    up.count = 1;
    up.imagesAndPaths[0] = { this, this->getPath() };
    // run the initializers
    processInitializers(context, thisThread, timingInfo, up);
    context.notifyBatch(dyld_image_state_initialized, false);
    //
    mach_port_deallocate(mach_task_self(), thisThread);
    uint64_t t2 = mach_absolute_time();
    fgTotalInitTime += (t2 - t1);
}
  • Step into ImageLoader::processInitializers.
void ImageLoader::processInitializers(const LinkContext& context, mach_port_t thisThread,
                                     InitializerTimingList& timingInfo, ImageLoader::UninitedUpwards& images)
{
    uint32_t maxImageCount = context.imageCount()+2;
    ImageLoader::UninitedUpwards upsBuffer[maxImageCount];
    ImageLoader::UninitedUpwards& ups = upsBuffer[0];
    ups.count = 0;
    // Calling recursive init on all images in images list, building a new list of
    // uninitialized upward dependencies.
    for (uintptr_t i=0; i < images.count; ++i) {
        images.imagesAndPaths[i].first->recursiveInitialization(context, thisThread, images.imagesAndPaths[i].second, timingInfo, ups);
    }
    //  { ... }
}
  • Step into ImageLoader::recursiveInitialization.
void ImageLoader::recursiveInitialization(const LinkContext& context, mach_port_t this_thread, const char* pathToInitialize,
                                          InitializerTimingList& timingInfo, UninitedUpwards& uninitUps)
{  
       try {
            // initialize lower level libraries first
            for(unsigned int i=0; i < libraryCount(); ++i) {
                ImageLoader* dependentImage = libImage(i);
                if ( dependentImage != NULL ) {
                    // don't try to initialize stuff "above" me yet
                    if ( libIsUpward(i) ) {
                        uninitUps.imagesAndPaths[uninitUps.count] = { dependentImage, libPath(i) };
                        uninitUps.count++;
                    }
                    else if ( dependentImage->fDepth >= fDepth ) {
                        // recurse into the dependent image
                        dependentImage->recursiveInitialization(context, this_thread, libPath(i), timingInfo, uninitUps);
                    }
                }
            }
        
            // key code
            context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
            // initialize this image
            bool hasInitializers = this->doInitialization(context);

            // let anyone know we finished initializing this image
            fState = dyld_image_state_initialized;
            oldState = fState;
            context.notifySingle(dyld_image_state_initialized, this, NULL);
            // { ... }
        }

}
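The shape of recursiveInitialization is a depth-first, dependencies-first walk: an image's initializers run only after the images it links against have been initialized. The sketch below models just that ordering. It is a simplification under assumptions: `ToyImage` and `recursiveInit` are hypothetical names, upward dependencies and the fDepth check from the excerpt are not modeled, and a plain `initialized` flag stands in for fState.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <utility>
#include <vector>

// Hypothetical stand-in for an image and its direct dependencies.
struct ToyImage {
    std::string name;
    std::vector<std::size_t> deps;  // indices of images this one links against
    bool initialized;

    ToyImage(std::string n, std::vector<std::size_t> d)
        : name(std::move(n)), deps(std::move(d)), initialized(false) {}
};

// Initialize lower-level libraries first, then this image.
// `order` records the sequence in which initializers would run.
void recursiveInit(std::vector<ToyImage>& images, std::size_t index,
                   std::vector<std::string>& order) {
    assert(index < images.size());
    ToyImage& img = images[index];
    if (img.initialized) return;   // already handled
    img.initialized = true;        // mark early so dependency cycles terminate
    for (std::size_t dep : img.deps) {
        recursiveInit(images, dep, order);
    }
    order.push_back(img.name);     // this image's initializers run here
}
```

With App depending on libUI.dylib and libSystem.B.dylib, and libUI.dylib depending on libSystem.B.dylib, starting at App yields libSystem.B.dylib, libUI.dylib, App. libUI.dylib is a made-up name for the example.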
  • Search globally for notifySingle to find where it is assigned and how it is implemented.
  • Assignment of notifySingle:
gLinkContext.notifySingle           = &notifySingle;
  • Implementation of notifySingle:
static void notifySingle(dyld_image_states state, const ImageLoader* image, ImageLoader::InitializerTimingList* timingInfo)
{
    //dyld::log("notifySingle(state=%d, image=%s)\n", state, image->getPath());
    std::vector<dyld_image_state_change_handler>* handlers = stateToHandlers(state, sSingleHandlers);
    if ( handlers != NULL ) {
        dyld_image_info info;
        info.imageLoadAddress   = image->machHeader();
        info.imageFilePath      = image->getRealPath();
        info.imageFileModDate   = image->lastModified();
        for (std::vector<dyld_image_state_change_handler>::iterator it = handlers->begin(); it != handlers->end(); ++it) {
             // ...
        }
    }
    if ( (state == dyld_image_state_dependents_initialized) && (sNotifyObjCInit != NULL) && image->notifyObjC() ) {
        uint64_t t0 = mach_absolute_time();
        dyld3::ScopedTimer timer(DBG_DYLD_TIMING_OBJC_INIT, (uint64_t)image->machHeader(), 0, 0);
         // ---- key call: sNotifyObjCInit
        (*sNotifyObjCInit)(image->getRealPath(), image->machHeader());
         //  ----
        uint64_t t1 = mach_absolute_time();
        uint64_t t2 = mach_absolute_time();
        uint64_t timeInObjC = t1-t0;
        uint64_t emptyTime = (t2-t1)*100;
        if ( (timeInObjC > emptyTime) && (timingInfo != NULL) ) {
            timingInfo->addTime(image->getShortName(), timeInObjC);
        }
    }
    // mach message csdlc about dynamically unloaded images
    //  { ... }
}
  • Search globally for sNotifyObjCInit.
static _dyld_objc_notify_init       sNotifyObjCInit;
void registerObjCNotifiers(_dyld_objc_notify_mapped mapped, _dyld_objc_notify_init init, _dyld_objc_notify_unmapped unmapped)
{
    // record functions to call
    sNotifyObjCMapped   = mapped;
    sNotifyObjCInit     = init;
    sNotifyObjCUnmapped = unmapped;
}
  • Search globally for registerObjCNotifiers.
void _dyld_objc_notify_register(_dyld_objc_notify_mapped    mapped,
                                _dyld_objc_notify_init      init,
                                _dyld_objc_notify_unmapped  unmapped)
{
    dyld::registerObjCNotifiers(mapped, init, unmapped);
}
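  • Let's pause here. Above 👆 we kept reading the source, finding the key code and following it further, and in the end we arrived at _dyld_objc_notify_register.
  • The conclusion so far is the chain initializeMainExecutable -> ImageLoader::runInitializers -> processInitializers -> recursiveInitialization -> notifySingle -> sNotifyObjCInit, where sNotifyObjCInit is the callback stored by registerObjCNotifiers, which in turn is called from _dyld_objc_notify_register.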
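The pattern behind sNotifyObjCInit is an ordinary registered function pointer: one side stores a callback, the other side calls it when a particular state is reached. Here is a minimal standalone sketch of that pattern. The names mirror the roles in the excerpts but are hypothetical (`ToyState`, `registerInitNotifier`, `toyNotifySingle`, `loadImagesStub`, `gInitLog`), and only the init callback is modeled.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Two of the image states seen in the excerpts, as a toy enum.
enum ToyState { kDependentsInitialized, kInitialized };

typedef void (*NotifyInit)(const char* path);

static NotifyInit sNotifyInit = nullptr;    // plays the role of sNotifyObjCInit
static std::vector<std::string> gInitLog;   // records callback invocations

// Plays the role of registerObjCNotifiers (init callback only).
void registerInitNotifier(NotifyInit init) {
    sNotifyInit = init;
}

// Plays the role of notifySingle: the callback fires only for the
// "dependents initialized" state and only once something is registered.
void toyNotifySingle(ToyState state, const char* path) {
    if (state == kDependentsInitialized && sNotifyInit != nullptr) {
        (*sNotifyInit)(path);
    }
}

// Stand-in for the runtime's load_images callback.
void loadImagesStub(const char* path) {
    gInitLog.push_back(path);
}
```

Before `registerInitNotifier(&loadImagesStub)` is called, `toyNotifySingle` does nothing. Afterwards, every image that reaches `kDependentsInitialized` is reported to the stub, which is the behavior the `sNotifyObjCInit != NULL` check in notifySingle implies.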

So why does tracing backwards from ImageLoader::recursiveInitialization lead to _dyld_objc_notify_register?

_dyld_objc_notify_register

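  • At this point static analysis of _dyld_objc_notify_register cannot take us any further. What now?
  • Experience says to switch to dynamic analysis: run the program.
  • Add a symbolic breakpoint on _dyld_objc_notify_register.
  • Print the stack with bt. The backtrace shows _dyld_objc_notify_register being called in libdyld.dylib.
  • A global search for _dyld_objc_notify_register in the libobjc source shows that _objc_init calls it.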
void _objc_init(void)
{
   //... (some init calls omitted)
    _imp_implementationWithBlock_init();
    _dyld_objc_notify_register(&map_images, load_images, unmap_image);

#if __OBJC2__
    didCallDyldNotifyRegister = true;
#endif
}
  • We already know that tracing backwards along dyld_start -> ... -> ImageLoader::recursiveInitialization leads to _dyld_objc_notify_register.
  • And _objc_init also calls _dyld_objc_notify_register during its initialization.

This raises a new question: how are dyld_start and _objc_init related?

  • Run the program again and print the stack with bt.

  • According to the backtrace: 2️⃣ libdispatch.dylib: _os_object_init -> 1️⃣ libobjc.A.dylib: _objc_init

  • Look up _os_object_init in the libdispatch.dylib source to verify this.

  • Verified: 3️⃣ libdispatch.dylib: libdispatch_init -> 2️⃣ libdispatch.dylib: _os_object_init

  • Verified: 4️⃣ libSystem.B.dylib: libSystem_initializer -> 3️⃣ libdispatch.dylib: libdispatch_init
  • Verified: 5️⃣ dyld: ImageLoaderMachO::doModInitFunctions -> 4️⃣ libSystem.B.dylib: libSystem_initializer
void ImageLoaderMachO::doModInitFunctions(const LinkContext& context)
{
    // key code
    // the libSystem initializer must run first, otherwise dyld throws
    if ( ! dyld::gProcessInfo->libSystemInitialized ) {
        // <rdar://problem/17973316> libSystem initializer must run first
        const char* installPath = getInstallPath();
        if ( (installPath == NULL) || (strcmp(installPath, libSystemPath(context)) != 0) )
            dyld::throwf("initializer in image (%s) that does not link with libSystem.dylib\n", this->getPath());
    }
    // now safe to use malloc() and other calls in libSystem.dylib
    dyld::gProcessInfo->libSystemInitialized = true;
}
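The guard above can be read as a tiny state machine: until libSystem has been initialized, the only image allowed to run initializers is libSystem itself. The following standalone sketch models that rule. It rests on assumptions: `ToyProcessInfo` and `checkInitializerAllowed` are hypothetical names, a C++ exception stands in for dyld::throwf, and /usr/lib/libfoo.dylib is a made-up path.

```cpp
#include <cassert>
#include <stdexcept>
#include <string>

// Hypothetical stand-in for the flag kept in dyld::gProcessInfo.
struct ToyProcessInfo {
    bool libSystemInitialized;
};

// Called before an image's initializers run. Throws when an image other
// than libSystem tries to initialize before libSystem has done so.
void checkInitializerAllowed(ToyProcessInfo& info,
                             const std::string& installPath,
                             const std::string& libSystemPath) {
    if (!info.libSystemInitialized) {
        if (installPath != libSystemPath) {
            throw std::runtime_error("initializer in image (" + installPath +
                                     ") that does not link with libSystem.dylib");
        }
    }
    // from here on it is considered safe to use libSystem services
    info.libSystemInitialized = true;
}
```

Calling it first for /usr/lib/libfoo.dylib throws and leaves the flag unset. Calling it for the libSystem path sets the flag, after which any other image passes the check.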
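  • From the above 👆 we can derive the following flow: dyld: ImageLoaderMachO::doModInitFunctions -> libSystem.B.dylib: libSystem_initializer -> libdispatch.dylib: libdispatch_init -> libdispatch.dylib: _os_object_init -> libobjc.A.dylib: _objc_init
  • Search globally for doModInitFunctions in the dyld source.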
bool ImageLoaderMachO::doInitialization(const LinkContext& context)
{
    CRSetCrashLogMessage2(this->getPath());

    // mach-o has -init and static initializers
    doImageInit(context);
    doModInitFunctions(context);
    
    CRSetCrashLogMessage2(NULL);
    
    return (fHasDashInit || fHasInitializers);
}
  • doInitialization calls doModInitFunctions. Next, search for doInitialization to see where it is called.
void ImageLoader::recursiveInitialization {
        // let objc know we are about to initialize this image
            uint64_t t1 = mach_absolute_time();
            fState = dyld_image_state_dependents_initialized;
            oldState = fState;
            context.notifySingle(dyld_image_state_dependents_initialized, this, &timingInfo);
            
            // initialize this image
            bool hasInitializers = this->doInitialization(context);

            // let anyone know we finished initializing this image
            fState = dyld_image_state_initialized;
            oldState = fState;
            context.notifySingle(dyld_image_state_initialized, this, NULL);
}

  • ImageLoader::recursiveInitialization calls doInitialization:
    doInitialization -> doModInitFunctions -> ... -> _objc_init
  • Now the whole flow connects and forms a closed loop.

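  • With all of this analysis, we can finally answer why tracing backwards from ImageLoader::recursiveInitialization leads to _dyld_objc_notify_register.

  • When _objc_init calls _dyld_objc_notify_register, it is registering callbacks: it passes in the three arguments map_images, load_images and unmap_image.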

So why use callbacks at all?

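  • Because only dyld knows when each image it links has finished loading; libobjc cannot determine that on its own. So dyld keeps a handler in notifySingle. _objc_init calls _dyld_objc_notify_register to pass in the three callbacks, and when an image reaches dyld_image_state_dependents_initialized, notifySingle invokes the registered init callback (sNotifyObjCInit). What actually runs therefore depends on what was registered.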
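The closed loop can be replayed in a few lines: initializing libSystem triggers the registration, and every image initialized afterwards is reported to the registered callback before its own initializers run. This is a toy replay under assumptions, not the real control flow in full. `MiniDyld`, `fakeLibSystemInitializer` and `simulateLaunch` are hypothetical, and how images that were initialized before registration are handled is outside this sketch.

```cpp
#include <cassert>
#include <functional>
#include <string>
#include <vector>

// Hypothetical miniature of the dyld side of the loop.
struct MiniDyld {
    std::function<void(const std::string&)> objcInit;  // empty until registered
    std::vector<std::string> log;

    // Role of _dyld_objc_notify_register (init callback only).
    void registerNotifier(std::function<void(const std::string&)> init) {
        objcInit = init;
    }

    // Role of recursiveInitialization for one image: notify first,
    // then run the image's own initializer (if it has one).
    void initializeImage(const std::string& name,
                         const std::function<void(MiniDyld&)>& initializer) {
        if (objcInit) objcInit(name);
        log.push_back("init " + name);
        if (initializer) initializer(*this);
    }
};

// Stands in for libSystem_initializer -> ... -> _objc_init.
void fakeLibSystemInitializer(MiniDyld& dyld) {
    dyld.registerNotifier([&dyld](const std::string& image) {
        dyld.log.push_back("load_images " + image);
    });
}

// Initialize libSystem first, then the app, and return what happened.
std::vector<std::string> simulateLaunch() {
    MiniDyld dyld;
    dyld.initializeImage("libSystem.B.dylib", fakeLibSystemInitializer);
    dyld.initializeImage("App", nullptr);
    return dyld.log;
}
```

The log produced by `simulateLaunch()` has three entries: `init libSystem.B.dylib`, then `load_images App`, then `init App`. The callback for App fires only because libSystem's initializer registered it earlier in the same walk.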
