The source code quoted in this article comes from Apple's open source site.
What is Mach-O
Mach-O is short for the Mach Object file format. It is the file format used on iOS and macOS for executables, object code, dynamic libraries, and core dumps.
Mach-O file format
A file structure diagram from Apple's official documentation:
We write a HelloWorld program, compile it, and open the resulting .out file in MachOView:
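This shows that a Mach-O file consists of three parts:
- Header: records basic information such as the CPU architecture, the file type, and the number of Load Commands.
- Load Commands: describe how each Segment is to be loaded. A Mach-O file can contain multiple Segments, and each Segment can contain zero, one, or more Sections.
- Data: the actual contents of the Segments, including code and data.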
Header
/*
 * The 32-bit mach header appears at the very beginning of the object file for
 * 32-bit architectures.
 */
struct mach_header {
    uint32_t      magic;      /* mach magic number identifier */
    cpu_type_t    cputype;    /* cpu specifier */
    cpu_subtype_t cpusubtype; /* machine specifier */
    uint32_t      filetype;   /* type of file */
    uint32_t      ncmds;      /* number of load commands */
    uint32_t      sizeofcmds; /* the size of all the load commands */
    uint32_t      flags;      /* flags */
};
/*
 * The 64-bit mach header appears at the very beginning of object files for
 * 64-bit architectures.
 */
struct mach_header_64 {
    uint32_t      magic;      /* mach magic number identifier */
    cpu_type_t    cputype;    /* cpu specifier */
    cpu_subtype_t cpusubtype; /* machine specifier */
    uint32_t      filetype;   /* type of file */
    uint32_t      ncmds;      /* number of load commands */
    uint32_t      sizeofcmds; /* the size of all the load commands */
    uint32_t      flags;      /* flags */
    uint32_t      reserved;   /* reserved */
};
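- magic: the magic number. 0xfeedface (MH_MAGIC) identifies a 32-bit file and 0xfeedfacf (MH_MAGIC_64) a 64-bit file. 0xcefaedfe (MH_CIGAM) and 0xcffaedfe (MH_CIGAM_64) are the same values byte-swapped, which means the file's byte order is the opposite of the reader's.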
/* Constant for the magic field of the mach_header (32-bit architectures) */
#define MH_MAGIC 0xfeedface /* the mach magic number */
#define MH_CIGAM 0xcefaedfe /* NXSwapInt(MH_MAGIC) */
- cputype: the CPU type
- cpusubtype: the specific CPU model within that type
- filetype: the file type, for example an executable or a library
The filetype values are defined as:
#define MH_OBJECT 0x1 /* relocatable object file */
#define MH_EXECUTE 0x2 /* demand paged executable file */
#define MH_FVMLIB 0x3 /* fixed VM shared library file */
#define MH_CORE 0x4 /* core file */
#define MH_PRELOAD 0x5 /* preloaded executable file */
#define MH_DYLIB 0x6 /* dynamically bound shared library */
#define MH_DYLINKER 0x7 /* dynamic link editor */
#define MH_BUNDLE 0x8 /* dynamically bound bundle file */
#define MH_DYLIB_STUB 0x9 /* shared library stub for static */
/* linking only, no section contents */
#define MH_DSYM 0xa /* companion file with only debug */
/* sections */
#define MH_KEXT_BUNDLE 0xb /* x86_64 kexts */
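- ncmds: the number of Load Commands
- sizeofcmds: the total size of all Load Commands in bytes
- flags: flag bits that describe the file in more detail
- reserved: a reserved field that exists only in the 64-bit header and is currently unused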
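The fields above are enough to tell which header variant a file uses. The following is a minimal sketch, assuming the file is already in memory and is a thin (non-universal) Mach-O file. `macho_summary`, `swap32`, and `macho_summarize` are hypothetical names made up for this example. The two 64-bit magic values are the ones `<mach-o/loader.h>` defines as `MH_MAGIC_64` and `MH_CIGAM_64`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Magic values as defined in <mach-o/loader.h>. */
#define MH_MAGIC    0xfeedfaceu /* 32-bit, same byte order as the reader */
#define MH_CIGAM    0xcefaedfeu /* 32-bit, byte-swapped */
#define MH_MAGIC_64 0xfeedfacfu /* 64-bit, same byte order as the reader */
#define MH_CIGAM_64 0xcffaedfeu /* 64-bit, byte-swapped */

/* Hypothetical summary type for this example, not an Apple type. */
typedef struct {
    int      is_64;       /* 1 when the file starts with mach_header_64 */
    int      swapped;     /* 1 when every field must be byte-swapped */
    uint32_t filetype;    /* MH_OBJECT, MH_EXECUTE, ... */
    uint32_t ncmds;       /* number of load commands */
    uint32_t header_size; /* 28 bytes (32-bit) or 32 bytes (64-bit) */
} macho_summary;

static uint32_t swap32(uint32_t v) {
    return (v >> 24) | ((v >> 8) & 0x0000ff00u) |
           ((v << 8) & 0x00ff0000u) | (v << 24);
}

/* Returns 0 on success, -1 when buf is too short or the magic is unknown. */
static int macho_summarize(const uint8_t *buf, size_t len, macho_summary *out) {
    assert(buf != NULL && out != NULL);
    uint32_t w[7]; /* the seven 32-bit fields shared by both header variants */
    if (len < sizeof w) return -1;
    memcpy(w, buf, sizeof w);

    switch (w[0]) {
    case MH_MAGIC:    out->is_64 = 0; out->swapped = 0; break;
    case MH_CIGAM:    out->is_64 = 0; out->swapped = 1; break;
    case MH_MAGIC_64: out->is_64 = 1; out->swapped = 0; break;
    case MH_CIGAM_64: out->is_64 = 1; out->swapped = 1; break;
    default:          return -1; /* not a thin Mach-O file */
    }
    out->header_size = out->is_64 ? 32u : 28u;
    if (len < out->header_size) return -1;

    /* Field order: magic, cputype, cpusubtype, filetype, ncmds, ... */
    out->filetype = out->swapped ? swap32(w[3]) : w[3];
    out->ncmds    = out->swapped ? swap32(w[4]) : w[4];
    return 0;
}
```

The load commands start at offset `header_size`, so a caller would typically pass `buf + header_size` to whatever code walks them.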
The flags values are defined as:
#define MH_NOUNDEFS 0x1 /* the object file has no undefined
references */
#define MH_INCRLINK 0x2 /* the object file is the output of an
incremental link against a base file
and can't be link edited again */
#define MH_DYLDLINK 0x4 /* the object file is input for the
dynamic linker and can't be staticly
link edited again */
#define MH_BINDATLOAD 0x8 /* the object file's undefined
references are bound by the dynamic
linker when loaded. */
#define MH_PREBOUND 0x10 /* the file has its dynamic undefined
references prebound. */
#define MH_SPLIT_SEGS 0x20 /* the file has its read-only and
read-write segments split */
#define MH_LAZY_INIT 0x40 /* the shared library init routine is
to be run lazily via catching memory
faults to its writeable segments
(obsolete) */
#define MH_TWOLEVEL 0x80 /* the image is using two-level name
space bindings */
#define MH_FORCE_FLAT 0x100 /* the executable is forcing all images
to use flat name space bindings */
#define MH_NOMULTIDEFS 0x200 /* this umbrella guarantees no multiple
defintions of symbols in its
sub-images so the two-level namespace
hints can always be used. */
#define MH_NOFIXPREBINDING 0x400 /* do not have dyld notify the
prebinding agent about this
executable */
#define MH_PREBINDABLE 0x800 /* the binary is not prebound but can
have its prebinding redone. only used
when MH_PREBOUND is not set. */
#define MH_ALLMODSBOUND 0x1000 /* indicates that this binary binds to
all two-level namespace modules of
its dependent libraries. only used
when MH_PREBINDABLE and MH_TWOLEVEL
are both set. */
#define MH_SUBSECTIONS_VIA_SYMBOLS 0x2000/* safe to divide up the sections into
sub-sections via symbols for dead
code stripping */
#define MH_CANONICAL 0x4000 /* the binary has been canonicalized
via the unprebind operation */
#define MH_WEAK_DEFINES 0x8000 /* the final linked image contains
external weak symbols */
#define MH_BINDS_TO_WEAK 0x10000 /* the final linked image uses
weak symbols */
#define MH_ALLOW_STACK_EXECUTION 0x20000/* When this bit is set, all stacks
in the task will be given stack
execution privilege. Only used in
MH_EXECUTE filetypes. */
#define MH_DEAD_STRIPPABLE_DYLIB 0x400000 /* Only for use on dylibs. When
linking against a dylib that
has this bit set, the static linker
will automatically not create a
LC_LOAD_DYLIB load command to the
dylib if no symbols are being
referenced from the dylib. */
#define MH_ROOT_SAFE 0x40000 /* When this bit is set, the binary
declares it is safe for use in
processes with uid zero */
#define MH_SETUID_SAFE 0x80000 /* When this bit is set, the binary
declares it is safe for use in
processes when issetugid() is true */
#define MH_NO_REEXPORTED_DYLIBS 0x100000 /* When this bit is set on a dylib,
the static linker does not need to
examine dependent dylibs to see
if any are re-exported */
#define MH_PIE 0x200000 /* When this bit is set, the OS will
load the main executable at a
random address. Only used in
MH_EXECUTE filetypes. */
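Because each flag occupies its own bit, the flags field is decoded by testing bits one at a time. The sketch below does this for a handful of the flags listed above. `flag_name`, `kFlagNames`, and `describe_flags` are hypothetical names, and the table is deliberately incomplete.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical lookup entry: one flag bit and its macro name. */
struct flag_name { uint32_t bit; const char *name; };

/* A subset of the MH_* flags, values copied from the listing above. */
static const struct flag_name kFlagNames[] = {
    {0x1,      "MH_NOUNDEFS"},
    {0x4,      "MH_DYLDLINK"},
    {0x80,     "MH_TWOLEVEL"},
    {0x8000,   "MH_WEAK_DEFINES"},
    {0x10000,  "MH_BINDS_TO_WEAK"},
    {0x200000, "MH_PIE"},
};

/* Writes the names of the known flags set in `flags` into `out`, separated
 * by spaces, and returns how many names were written. Stops early if `out`
 * is too small for the next name. */
static int describe_flags(uint32_t flags, char *out, size_t cap) {
    assert(out != NULL && cap > 0);
    int count = 0;
    size_t used = 0;
    out[0] = '\0';
    for (size_t i = 0; i < sizeof kFlagNames / sizeof kFlagNames[0]; i++) {
        if ((flags & kFlagNames[i].bit) == 0) continue;
        int n = snprintf(out + used, cap - used, "%s%s",
                         count ? " " : "", kFlagNames[i].name);
        if (n < 0 || (size_t)n >= cap - used) break; /* would not fit */
        used += (size_t)n;
        count++;
    }
    return count;
}
```

For instance, a flags value of 0x00200085 has the bits for MH_NOUNDEFS, MH_DYLDLINK, MH_TWOLEVEL, and MH_PIE set, so `describe_flags` reports those four names.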
For the HelloWorld program above, the Header contains:
Load Commands
struct load_command {
    uint32_t cmd;     /* type of load command */
    uint32_t cmdsize; /* total size of command in bytes */
};
- cmd: the type of the command
- cmdsize: the size of the command in bytes, used to compute the offset of the next command
cmd types:
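cmd | Purpose |
---|---|
LC_SEGMENT/LC_SEGMENT_64 | Maps a segment's data into memory |
LC_SYMTAB | Symbol table information |
LC_DYSYMTAB | Dynamic symbol table information |
LC_DYLD_INFO_ONLY | Compressed dyld information (rebase, bind, and export data) |
LC_LOAD_DYLINKER | Path of the dynamic linker (dyld) to load |
LC_UUID | Unique identifier of the binary |
LC_SOURCE_VERSION | Source code version |
LC_MAIN | Program entry point |
LC_LOAD_DYLIB | A dynamic library to load |
LC_FUNCTION_STARTS | Table of function start addresses |
LC_DATA_IN_CODE | Locations of data embedded in code |
LC_CODE_SIGNATURE | Code signature information |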
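Since every command begins with the same two fields, a reader can step through all of them by adding cmdsize to the current offset. Here is a minimal sketch of that walk, assuming the commands are in memory and in the reader's byte order. `lc_visitor`, `walk_load_commands`, and `count_segments` are hypothetical names. The `LC_SEGMENT`, `LC_SEGMENT_64`, and `LC_SYMTAB` values match `<mach-o/loader.h>`.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

#define LC_SEGMENT    0x1u
#define LC_SYMTAB     0x2u
#define LC_SEGMENT_64 0x19u

/* Same layout as struct load_command in the listing above. */
struct load_command {
    uint32_t cmd;
    uint32_t cmdsize;
};

/* Hypothetical callback type, invoked once per load command. */
typedef void (*lc_visitor)(uint32_t cmd, uint32_t cmdsize, void *ctx);

/* `cmds` points just past the mach header. Returns the number of commands
 * visited, or -1 when a cmdsize is inconsistent with sizeofcmds. */
static int walk_load_commands(const uint8_t *cmds, uint32_t ncmds,
                              uint32_t sizeofcmds, lc_visitor visit,
                              void *ctx) {
    assert(cmds != NULL);
    uint32_t offset = 0;
    for (uint32_t i = 0; i < ncmds; i++) {
        struct load_command lc;
        if (sizeofcmds - offset < sizeof lc) return -1;
        memcpy(&lc, cmds + offset, sizeof lc);
        /* A command must cover its own header and stay inside the region. */
        if (lc.cmdsize < sizeof lc || lc.cmdsize > sizeofcmds - offset)
            return -1;
        if (visit != NULL) visit(lc.cmd, lc.cmdsize, ctx);
        offset += lc.cmdsize; /* cmdsize is the distance to the next command */
    }
    return (int)ncmds;
}

/* Example visitor: counts segment commands; ctx must point to an int. */
static void count_segments(uint32_t cmd, uint32_t cmdsize, void *ctx) {
    (void)cmdsize;
    if (cmd == LC_SEGMENT || cmd == LC_SEGMENT_64) ++*(int *)ctx;
}
```

Validating cmdsize before advancing matters because the value comes from the file: a cmdsize of zero would otherwise loop on the same command, and an oversized one would read past the end of the buffer.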
segment
First, the definition of a segment:
struct segment_command { /* for 32-bit architectures */
    uint32_t  cmd;         /* LC_SEGMENT */
    uint32_t  cmdsize;     /* includes sizeof section structs */
    char      segname[16]; /* segment name */
    uint32_t  vmaddr;      /* memory address of this segment */
    uint32_t  vmsize;      /* memory size of this segment */
    uint32_t  fileoff;     /* file offset of this segment */
    uint32_t  filesize;    /* amount to map from the file */
    vm_prot_t maxprot;     /* maximum VM protection */
    vm_prot_t initprot;    /* initial VM protection */
    uint32_t  nsects;      /* number of sections in segment */
    uint32_t  flags;       /* flags */
};
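- cmd: the Load Command type described above
- cmdsize: the size of the Load Command
- segname[16]: the segment name
segname | Meaning |
---|---|
__PAGEZERO | In executables, an inaccessible region at address 0 that traps null-pointer dereferences |
__TEXT | Code and read-only data |
__DATA | Global and static variables |
__LINKEDIT | Symbol table, string table, and other data needed by the dynamic linker |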
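- vmaddr: the segment's virtual address before sliding; the actual runtime address is vmaddr plus the ASLR slide
- vmsize: the size of the segment in virtual memory
- fileoff: the segment's offset within the file
- filesize: the segment's size within the file
  Loading a segment means mapping filesize bytes, starting at file offset fileoff, into virtual memory at vmaddr.
- nsects: the number of sections in the segment
- flags: flag bits that describe the segment in more detail
The flag values are defined as: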
#define SG_HIGHVM 0x1 /* the file contents for this segment is for
the high part of the VM space, the low part
is zero filled (for stacks in core files) */
#define SG_FVMLIB 0x2 /* this segment is the VM that is allocated by
a fixed VM library, for overlap checking in
the link editor */
#define SG_NORELOC 0x4 /* this segment has nothing that was relocated
in it and nothing relocated to it, that is
it maybe safely replaced without relocation*/
#define SG_PROTECTED_VERSION_1 0x8 /* This segment is protected. If the
segment starts at file offset 0, the
first page of the segment is not
protected. All other pages of the
segment are protected. */
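The relationship between fileoff, filesize, vmaddr, and vmsize can be illustrated with a toy loader. This is only a sketch: `seg_info`, `slid_address`, and `map_segment` are hypothetical names, the struct keeps just the four fields involved, and the copy into a plain buffer stands in for the memory mapping a real loader performs. When vmsize is larger than filesize, the remainder of the segment is zero-filled.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical, reduced view of a segment command. */
typedef struct {
    uint64_t vmaddr;   /* address before the ASLR slide is applied */
    uint64_t vmsize;   /* bytes the segment occupies in memory */
    uint64_t fileoff;  /* where the segment's bytes start in the file */
    uint64_t filesize; /* how many bytes come from the file */
} seg_info;

/* Runtime address of the segment once the loader has picked a slide. */
static uint64_t slid_address(const seg_info *seg, uint64_t slide) {
    assert(seg != NULL);
    return seg->vmaddr + slide;
}

/* Copies the segment's file bytes into `dst`, which stands in for the
 * memory at the slid address, and zero-fills the rest up to vmsize.
 * Returns 0 on success, -1 when the sizes are inconsistent. */
static int map_segment(const seg_info *seg, const uint8_t *file,
                       size_t file_len, uint8_t *dst, size_t dst_len) {
    assert(seg != NULL && file != NULL && dst != NULL);
    if (seg->filesize > seg->vmsize) return -1;
    if (seg->fileoff > file_len ||
        seg->filesize > file_len - seg->fileoff) return -1;
    if (seg->vmsize > dst_len) return -1;
    memcpy(dst, file + seg->fileoff, (size_t)seg->filesize);
    memset(dst + seg->filesize, 0, (size_t)(seg->vmsize - seg->filesize));
    return 0;
}
```

__PAGEZERO is the extreme case of this layout: its filesize is 0, so nothing is read from the file and the whole range exists only in the address space.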
section
The definition of a section:
struct section { /* for 32-bit architectures */
    char     sectname[16]; /* name of this section */
    char     segname[16];  /* segment this section goes in */
    uint32_t addr;         /* memory address of this section */
    uint32_t size;         /* size in bytes of this section */
    uint32_t offset;       /* file offset of this section */
    uint32_t align;        /* section alignment (power of 2) */
    uint32_t reloff;       /* file offset of relocation entries */
    uint32_t nreloc;       /* number of relocation entries */
    uint32_t flags;        /* flags (section type and attributes)*/
    uint32_t reserved1;    /* reserved (for offset or index) */
    uint32_t reserved2;    /* reserved (for count or sizeof) */
};
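- sectname: the section name
- segname: the name of the segment the section belongs to (by convention, uppercase names such as __TEXT denote segments and lowercase names such as __text denote sections)
sectname | Meaning |
---|---|
__text | The program's executable code |
__stubs | Stub code for calls to dynamically linked functions |
__stub_helper | Helper code for dynamic linking that calls into dyld |
__cstring | Hard-coded C strings |
__la_symbol_ptr | Lazy symbol pointers, bound on first use |
__data | Initialized mutable variables |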
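- addr: the section's address in memory
- size: the section's size in bytes
- offset: the section's offset within the file
- align: the section's alignment, stored as a power of 2 (the exponent, not the byte count)
- reloff: the file offset of the relocation entries
- nreloc: the number of relocation entries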
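Two of these fields are easy to misread. The 16-byte name fields are fixed-width, so a name that uses all 16 bytes has no terminating NUL, and align holds an exponent, not a byte count. The helpers below are a small sketch of how both might be handled; `name_equals`, `section_alignment`, and `align_up` are hypothetical names.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Compares a fixed 16-byte sectname/segname field with a C string.
 * The field is NUL-padded, but not NUL-terminated when the name is
 * exactly 16 bytes long, so strcmp on the field is not safe. */
static int name_equals(const char field[16], const char *name) {
    assert(field != NULL && name != NULL);
    size_t n = strlen(name);
    if (n > 16) return 0;
    if (memcmp(field, name, n) != 0) return 0;
    return n == 16 || field[n] == '\0';
}

/* Converts the stored exponent into a byte alignment: align 4 -> 16 bytes. */
static uint64_t section_alignment(uint32_t align) {
    assert(align < 64);
    return (uint64_t)1 << align;
}

/* Rounds addr up to the next multiple of the section's alignment. */
static uint64_t align_up(uint64_t addr, uint32_t align) {
    uint64_t a = section_alignment(align);
    return (addr + a - 1) & ~(a - 1);
}
```

With these helpers, `name_equals(sect.sectname, "__text")` matches only the lowercase section name, and `align_up(0x1001, 4)` yields 0x1010.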