Mach-O 文件结构

本文源码从苹果开源官网获得

什么是Mach-O

Mach-OMach Object文件格式的缩写,是用于 iOS 和 macOS 的可执行文件,目标代码,动态库,内核转储的文件格式。

Mach-O 文件格式

苹果官方给的一张文件结构图:


Mach-O文件结构

我们编写一个HelloWorld程序,将其编译,然后通过MachOView来打开.out文件:

可以知道Mach-O由三部分组成:

  • Header:指明了CPU架构、文件类型、Load Commands 个数等一些基本信息。
  • Load Commands:描述了怎样加载每个 Segment 的信息。在 Mach-O 文件中可以有多个 Segment,每个 Segment 可能包含零个、一个或多个 Section。
  • Data:Segment 的具体数据,包含了代码和数据等。

Header

/*
 * The 32-bit mach header appears at the very beginning of the object file for
 * 32-bit architectures.
 */
struct mach_header {
    uint32_t    magic;      /* mach magic number identifier */
    cpu_type_t  cputype;    /* cpu specifier */
    cpu_subtype_t   cpusubtype; /* machine specifier */
    uint32_t    filetype;   /* type of file */
    uint32_t    ncmds;      /* number of load commands */
    uint32_t    sizeofcmds; /* the size of all the load commands */
    uint32_t    flags;      /* flags */
};

/*
 * The 64-bit mach header appears at the very beginning of object files for
 * 64-bit architectures.
 */
struct mach_header_64 {
    uint32_t    magic;      /* mach magic number identifier */
    cpu_type_t  cputype;    /* cpu specifier */
    cpu_subtype_t   cpusubtype; /* machine specifier */
    uint32_t    filetype;   /* type of file */
    uint32_t    ncmds;      /* number of load commands */
    uint32_t    sizeofcmds; /* the size of all the load commands */
    uint32_t    flags;      /* flags */
    uint32_t    reserved;   /* reserved */
};
  • magic:魔数,0xfeedface是32位,0xcefaedfe是64位
/* Constant for the magic field of the mach_header (32-bit architectures) */
#define MH_MAGIC    0xfeedface  /* the mach magic number */
#define MH_CIGAM    0xcefaedfe  /* NXSwapInt(MH_MAGIC) */
  • cputype:CPU类型
  • cpusubtype:CPU具体类型
  • filetype:文件类型,例如可执行文件、库文件等
    文件类型filetype的宏定义有:
#define MH_OBJECT   0x1     /* relocatable object file */
#define MH_EXECUTE  0x2     /* demand paged executable file */
#define MH_FVMLIB   0x3     /* fixed VM shared library file */
#define MH_CORE     0x4     /* core file */
#define MH_PRELOAD  0x5     /* preloaded executable file */
#define MH_DYLIB    0x6     /* dynamically bound shared library */
#define MH_DYLINKER 0x7     /* dynamic link editor */
#define MH_BUNDLE   0x8     /* dynamically bound bundle file */
#define MH_DYLIB_STUB   0x9     /* shared library stub for static */
                    /*  linking only, no section contents */
#define MH_DSYM     0xa     /* companion file with only debug */
                    /*  sections */
#define MH_KEXT_BUNDLE  0xb     /* x86_64 kexts */
  • ncmds:Load Commands的数量
  • sizeofcmds:Load Commands的总大小
  • flags:标志位,用于描述该文件的详细信息。
  • reserved:64位才有的保留字段,暂时没用

标志位flags的宏定义有:

#define MH_NOUNDEFS 0x1     /* the object file has no undefined
                       references */
#define MH_INCRLINK 0x2     /* the object file is the output of an
                       incremental link against a base file
                       and can't be link edited again */
#define MH_DYLDLINK 0x4     /* the object file is input for the
                       dynamic linker and can't be staticly
                       link edited again */
#define MH_BINDATLOAD   0x8     /* the object file's undefined
                       references are bound by the dynamic
                       linker when loaded. */
#define MH_PREBOUND 0x10        /* the file has its dynamic undefined
                       references prebound. */
#define MH_SPLIT_SEGS   0x20        /* the file has its read-only and
                       read-write segments split */
#define MH_LAZY_INIT    0x40        /* the shared library init routine is
                       to be run lazily via catching memory
                       faults to its writeable segments
                       (obsolete) */
#define MH_TWOLEVEL 0x80        /* the image is using two-level name
                       space bindings */
#define MH_FORCE_FLAT   0x100       /* the executable is forcing all images
                       to use flat name space bindings */
#define MH_NOMULTIDEFS  0x200       /* this umbrella guarantees no multiple
                       defintions of symbols in its
                       sub-images so the two-level namespace
                       hints can always be used. */
#define MH_NOFIXPREBINDING 0x400    /* do not have dyld notify the
                       prebinding agent about this
                       executable */
#define MH_PREBINDABLE  0x800           /* the binary is not prebound but can
                       have its prebinding redone. only used
                                           when MH_PREBOUND is not set. */
#define MH_ALLMODSBOUND 0x1000      /* indicates that this binary binds to
                                           all two-level namespace modules of
                       its dependent libraries. only used
                       when MH_PREBINDABLE and MH_TWOLEVEL
                       are both set. */ 
#define MH_SUBSECTIONS_VIA_SYMBOLS 0x2000/* safe to divide up the sections into
                        sub-sections via symbols for dead
                        code stripping */
#define MH_CANONICAL    0x4000      /* the binary has been canonicalized
                       via the unprebind operation */
#define MH_WEAK_DEFINES 0x8000      /* the final linked image contains
                       external weak symbols */
#define MH_BINDS_TO_WEAK 0x10000    /* the final linked image uses
                       weak symbols */

#define MH_ALLOW_STACK_EXECUTION 0x20000/* When this bit is set, all stacks 
                       in the task will be given stack
                       execution privilege.  Only used in
                       MH_EXECUTE filetypes. */
#define MH_DEAD_STRIPPABLE_DYLIB 0x400000 /* Only for use on dylibs.  When
                         linking against a dylib that
                         has this bit set, the static linker
                         will automatically not create a
                         LC_LOAD_DYLIB load command to the
                         dylib if no symbols are being
                         referenced from the dylib. */
#define MH_ROOT_SAFE 0x40000           /* When this bit is set, the binary 
                      declares it is safe for use in
                      processes with uid zero */
                                         
#define MH_SETUID_SAFE 0x80000         /* When this bit is set, the binary 
                      declares it is safe for use in
                      processes when issetugid() is true */

#define MH_NO_REEXPORTED_DYLIBS 0x100000 /* When this bit is set on a dylib, 
                      the static linker does not need to
                      examine dependent dylibs to see
                      if any are re-exported */
#define MH_PIE 0x200000         /* When this bit is set, the OS will
                       load the main executable at a
                       random address.  Only used in
                       MH_EXECUTE filetypes. */

对于上面的HelloWorld程序来说,它的Header信息如下:

Load Commands

struct load_command {
    uint32_t cmd;       /* type of load command */
    uint32_t cmdsize;   /* total size of command in bytes */
};
  • cmd类型:指定command类型
  • cmdsize:表示command大小,用于计算到下一个command的偏移量

cmd类型:

cmd 作用
LC_SEGMENT/LC_SEGMENT_64 将段内数据加载映射到内存中去
LC_SYMTAB 符号表信息
LC_DYSYMTAB 动态符号表信息
LC_DYLD_INFO_ONLY 动态库信息
LC_LOAD_DYLINKER 启动dyld
LC_UUID 唯一标识符
LC_SOURCE_VERSION 源代码版本
LC_MAIN 程序入口
LC_LOAD_DYLIB 加载动态库
LC_FUNCTION_STARTS 函数符号表
LC_DATA_IN_CODE Data注入代码地址
LC_CODE_SIGNATURE 代码签名信息

segment

首先看看segment的定义:

struct segment_command { /* for 32-bit architectures */
    uint32_t    cmd;        /* LC_SEGMENT */
    uint32_t    cmdsize;    /* includes sizeof section structs */
    char        segname[16];    /* segment name */
    uint32_t    vmaddr;     /* memory address of this segment */
    uint32_t    vmsize;     /* memory size of this segment */
    uint32_t    fileoff;    /* file offset of this segment */
    uint32_t    filesize;   /* amount to map from the file */
    vm_prot_t   maxprot;    /* maximum VM protection */
    vm_prot_t   initprot;   /* initial VM protection */
    uint32_t    nsects;     /* number of sections in segment */
    uint32_t    flags;      /* flags */
};
  • cmd:上面提到的Load Command类型
  • cmdsize:Load Command大小
  • segname[16]:段名称
segname 含义
__PAGEZERO 可执行文件捕获空指针的段
__TEXT 代码段和只读数据
__DATA 全局变量和静态变量
__LINKEDIT 包含动态链接器所需的符号、字符串表等数据
  • vmaddr:段虚拟地址(未偏移),真实虚拟地址要加上ASLR的偏移量
  • vmsize:段的虚拟地址大小
  • fileoff:段在文件内的地址偏移
  • filesize:段在文件内的大小
    加载segment的过程,就是从文件偏移fileoff处,将大小为filesize的段,加载到虚拟机vmaddr处。
  • nsects:段内section数量
  • flags:标志位,用于描述详细信息
    标志位宏定义:
#define SG_HIGHVM   0x1 /* the file contents for this segment is for
                   the high part of the VM space, the low part
                   is zero filled (for stacks in core files) */
#define SG_FVMLIB   0x2 /* this segment is the VM that is allocated by
                   a fixed VM library, for overlap checking in
                   the link editor */
#define SG_NORELOC  0x4 /* this segment has nothing that was relocated
                   in it and nothing relocated to it, that is
                   it maybe safely replaced without relocation*/
#define SG_PROTECTED_VERSION_1  0x8 /* This segment is protected.  If the
                       segment starts at file offset 0, the
                       first page of the segment is not
                       protected.  All other pages of the
                       segment are protected. */

section

section的定义:

struct section { /* for 32-bit architectures */
    char        sectname[16];   /* name of this section */
    char        segname[16];    /* segment this section goes in */
    uint32_t    addr;       /* memory address of this section */
    uint32_t    size;       /* size in bytes of this section */
    uint32_t    offset;     /* file offset of this section */
    uint32_t    align;      /* section alignment (power of 2) */
    uint32_t    reloff;     /* file offset of relocation entries */
    uint32_t    nreloc;     /* number of relocation entries */
    uint32_t    flags;      /* flags (section type and attributes)*/
    uint32_t    reserved1;  /* reserved (for offset or index) */
    uint32_t    reserved2;  /* reserved (for count or sizeof) */
};
  • sectname:section名称
  • segname:所属的segment名称
    (大写的__TEXT代表segment,小写的__text代表section
sectname 含义
__text 主程序代码
__subs 桩代码
__stub_helper 用于动态链接,启动dyld
__cstring 硬编码的C字符串
__la_symbol_ptr 延迟加载
__data 初始化的可变的变量
  • addr:section在内存中的地址
  • size:section大小
  • offset:section在文件中的偏移
  • align:内存对齐边界
  • reloff:重定位入口在文件中的偏移
  • nreloc:重定位入口数量
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 213,752评论 6 493
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 91,100评论 3 387
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 159,244评论 0 349
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 57,099评论 1 286
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 66,210评论 6 385
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 50,307评论 1 292
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,346评论 3 412
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,133评论 0 269
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 44,546评论 1 306
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 36,849评论 2 328
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,019评论 1 341
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 34,702评论 4 337
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,331评论 3 319
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,030评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,260评论 1 267
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 46,871评论 2 365
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 43,898评论 2 351

推荐阅读更多精彩内容