This talk is going to be divided into two sections: the first is more theory and the second more practical. I’ll be doing the first, theory part, and in it I’ll be walking you through all the steps that happen, all the way up to main.
But in order for you to understand and appreciate all the steps, I first need to give you a crash course on Mach-O and virtual memory.
Mach-O is a bunch of file types for different runtime executables.
Executable: the main binary in an application
dylib: a dynamic library (on other platforms you know these as DSOs or DLLs)
bundle: a special kind of dylib that you cannot link against; all you can do is load it at runtime with dlopen, and that’s used on Mac OS for plug-ins.
image: refers to any of these three types. And I’ll be using that term a lot.
Framework: a dylib with a special directory structure around it to hold files needed by that dylib. So let’s dive right into the Mach-O image format.
Mach-O Image File:
1. segment
The file is divided into segments; by convention, all segment names use uppercase letters.
Each segment is always a multiple of the page size; in this example the TEXT is 3 pages, and the DATA and LINKEDIT are each one page.
page size:
the page size is determined by the hardware
arm64: 16K
everything else: 4K
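The page arithmetic above can be sketched in a few lines. This is only an illustration: the 40,000-byte content size and the dictionary labels are made up for the example, and only the 16K and 4K page sizes come from the talk.

```python
# Page sizes quoted in the talk; the dictionary keys are labels chosen for this sketch.
PAGE_SIZE = {"arm64": 16 * 1024, "everything else": 4 * 1024}

def pages_needed(content_bytes: int, page_size: int) -> int:
    """Number of whole pages required to hold content_bytes."""
    return -(-content_bytes // page_size)  # ceiling division

def segment_size(content_bytes: int, page_size: int) -> int:
    """Segment size after rounding up to a multiple of the page size."""
    return pages_needed(content_bytes, page_size) * page_size

# A hypothetical __TEXT segment holding 40,000 bytes of content.
text_bytes = 40_000
for arch, size in PAGE_SIZE.items():
    print(arch, pages_needed(text_bytes, size), segment_size(text_bytes, size))
```

With these assumed numbers the same content takes 3 pages on arm64 and 10 pages elsewhere, and in both cases the segment ends exactly on a page boundary.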
2. section
Sections are something the compiler emits.
Sections are really just a subrange of a segment; they don’t have any of the constraints of being page size, but they are non-overlapping.
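To make the two rules for sections concrete (inside the segment, not overlapping each other), here is a small check. The function name and the offsets and sizes in the layout are hypothetical values picked for the sketch, not taken from a real binary.

```python
from typing import List, Tuple

Section = Tuple[str, int, int]  # (name, offset within the segment, size in bytes)

def sections_are_valid(segment_size: int, sections: List[Section]) -> bool:
    """True if every section lies inside the segment and no two overlap."""
    end_of_previous = 0
    for _name, offset, size in sorted(sections, key=lambda s: s[1]):
        if offset < end_of_previous or offset + size > segment_size:
            return False
        end_of_previous = offset + size
    return True

# Hypothetical layout inside a one-page (16K) segment. Note that neither
# section starts or ends on a page boundary, and that is allowed.
layout = [("__text", 0x100, 0x2F00), ("__cstring", 0x3000, 0x123)]
print(sections_are_valid(16 * 1024, layout))
```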
Common segments:
__TEXT has header, code, and read-only constants
__DATA has all read-write content: globals, static variables, etc.
__LINKEDIT doesn’t contain your functions or global variables; LINKEDIT contains information about your functions and variables, such as their names and addresses.
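The header at the start of __TEXT is also where an image records which kind of file it is. As a rough sketch, assuming a little-endian 64-bit image, the file type can be read like this. The constants are the values defined in Apple's mach-o/loader.h; the function name image_kind and the synthetic header bytes are inventions for the example.

```python
import struct

MH_MAGIC_64 = 0xFEEDFACF  # magic number of a 64-bit Mach-O header
# MH_EXECUTE, MH_DYLIB, MH_BUNDLE
FILE_TYPES = {0x2: "executable", 0x6: "dylib", 0x8: "bundle"}

def image_kind(header: bytes) -> str:
    """Return the image kind recorded in a little-endian mach_header_64."""
    # Fields: magic, cputype, cpusubtype, filetype, ncmds, sizeofcmds, flags, reserved
    fields = struct.unpack_from("<8I", header)
    magic, filetype = fields[0], fields[3]
    if magic != MH_MAGIC_64:
        raise ValueError("not a little-endian 64-bit Mach-O header")
    return FILE_TYPES.get(filetype, f"other (0x{filetype:x})")

# Synthetic header for demonstration only: an arm64 dylib with no load commands.
CPU_TYPE_ARM64 = 0x0100000C
fake_header = struct.pack("<8I", MH_MAGIC_64, CPU_TYPE_ARM64, 0, 0x6, 0, 0, 0, 0)
print(image_kind(fake_header))
```

A framework would show up here as a dylib, since the framework part is only the directory structure around the file.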
You may have also heard of universal files, what are they?
Well, suppose you build an iOS app for 64-bit, and now you have this Mach-O file. So what happens next in Xcode when you say you also want to build it for 32-bit devices?
When you rebuild, Xcode will build another separate Mach-O file, this one built for 32-bit, armv7.
And then those two files are merged into a third file, called the Mach-O universal file.
And that has a header at the start, and all the header has is a list of all the architectures and what their offsets are in the file.
And that header is also one page in size.
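Reading that list of architectures back out is straightforward. The sketch below assumes the classic 32-bit fat header layout from Apple's mach-o/fat.h (big-endian, a magic number and a count followed by five-field entries); the function name and the synthetic two-slice header are made up for the example.

```python
import struct

FAT_MAGIC = 0xCAFEBABE  # universal-file magic; the fat header is stored big-endian
CPU_NAMES = {12: "arm (32-bit)", 0x0100000C: "arm64"}  # CPU_TYPE_ARM, CPU_TYPE_ARM64

def list_architectures(data: bytes):
    """Return (cpu name, file offset, size) for each slice in a universal file."""
    magic, count = struct.unpack_from(">2I", data, 0)
    if magic != FAT_MAGIC:
        raise ValueError("not a universal (fat) file")
    slices = []
    for i in range(count):
        # fat_arch fields: cputype, cpusubtype, offset, size, align
        cputype, _sub, offset, size, _align = struct.unpack_from(">5I", data, 8 + 20 * i)
        slices.append((CPU_NAMES.get(cputype, hex(cputype)), offset, size))
    return slices

# Synthetic header: two slices, each starting on a 16K boundary (align = 2**14).
PAGE = 16 * 1024
fake_fat = struct.pack(">2I", FAT_MAGIC, 2)
fake_fat += struct.pack(">5I", 12, 9, 1 * PAGE, 3 * PAGE, 14)          # armv7 slice
fake_fat += struct.pack(">5I", 0x0100000C, 0, 4 * PAGE, 5 * PAGE, 14)  # arm64 slice
for entry in list_architectures(fake_fat):
    print(entry)
```

In this made-up layout the first slice starts one page into the file, which is consistent with the header taking up a whole page.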
Now you may be wondering, why are the segments multiples of the page size?
Why is the header a page in size, when it’s wasting a lot of space?
Well, the reason everything is page based has to do with our next topic, which is virtual memory.
So what is virtual memory?
Virtual memory is a level of indirection.
Every process has a logical address space, which gets mapped to physical pages of RAM.
Now this mapping does not have to be one to one: you could have logical addresses that go to no physical RAM, and you can have multiple logical addresses that go to the same physical RAM.
This offers lots of opportunities.
So what can you do with VM?
Well first, if you have a logical address that does not map to any physical RAM, when you access that address in your process, a page fault happens.
At that point the kernel stops that thread and tries to figure out what needs to happen.
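A toy model may help here. This is not how the kernel implements anything; it is a dictionary standing in for a page table, with class and method names invented for the sketch, just to show a many-to-one mapping and what an unmapped address looks like.

```python
PAGE_SIZE = 4096

class PageFault(Exception):
    """Raised when a logical page has no physical page behind it."""

class AddressSpace:
    def __init__(self):
        self.page_table = {}  # logical page number -> physical page number

    def map_page(self, logical_page: int, physical_page: int) -> None:
        self.page_table[logical_page] = physical_page

    def translate(self, address: int) -> int:
        """Turn a logical address into a physical one, or raise PageFault."""
        page, offset = divmod(address, PAGE_SIZE)
        if page not in self.page_table:
            raise PageFault(f"no physical page behind logical address {address:#x}")
        return self.page_table[page] * PAGE_SIZE + offset

space = AddressSpace()
space.map_page(2, 7)  # logical pages 2 and 5 both point at physical page 7
space.map_page(5, 7)
print(hex(space.translate(2 * PAGE_SIZE + 0x10)))
print(hex(space.translate(5 * PAGE_SIZE + 0x10)))
try:
    space.translate(3 * PAGE_SIZE)  # logical page 3 was never mapped
except PageFault as fault:
    print("page fault:", fault)
```

In the real system the exception is where the kernel steps in: it stops the thread and decides whether it can supply a page or not.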
The next thing is, if you have two processes with different logical addresses mapping to the same physical page, those two processes are now sharing the same bit of RAM.
You now have sharing between processes.
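The effect of sharing can be observed even without a second process. The sketch below is a single-process stand-in, under that stated assumption: it creates two separate shared mappings of the same page of a temporary file, writes through one, and reads through the other. Two real processes mapping the same file would see the same behaviour.

```python
import mmap
import tempfile

with tempfile.TemporaryFile() as f:
    f.write(b"\0" * mmap.PAGESIZE)  # one page of zeros to back the mappings
    f.flush()

    # Two independent shared mappings of the same page of the same file.
    view_a = mmap.mmap(f.fileno(), mmap.PAGESIZE)
    view_b = mmap.mmap(f.fileno(), mmap.PAGESIZE)

    view_a[0:5] = b"hello"          # write through the first mapping
    seen_by_b = bytes(view_b[0:5])  # read through the second mapping
    print(seen_by_b)

    view_a.close()
    view_b.close()
```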
Another interesting feature is file-backed mapping.
Rather than actually reading an entire file into RAM, you can tell the VM system, through the mmap call, that I want this slice of this file mapped to this address range in my process.
So why would you do that? Well, rather than having to read the entire file, by having that mapping set up, as you first access those different addresses, as if you had read it into memory, each time you access an address that hasn’t been accessed before it will cause a page fault, and the kernel will read just that one page.
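Here is what mapping a slice of a file looks like through Python's mmap module, which wraps the mmap call. The helper name map_slice and the four-chunk temporary file are assumptions for the sketch; the only real constraint shown is that the offset must be a multiple of mmap.ALLOCATIONGRANULARITY.

```python
import mmap
import os
import tempfile

GRANULARITY = mmap.ALLOCATIONGRANULARITY  # mmap offsets must be a multiple of this

def map_slice(path: str, offset: int, length: int) -> mmap.mmap:
    """Map length bytes of the file starting at offset, read-only."""
    with open(path, "rb") as f:
        # The mapping keeps its own handle, so it outlives the file object.
        return mmap.mmap(f.fileno(), length, access=mmap.ACCESS_READ, offset=offset)

# Build a hypothetical four-chunk file where every byte records its chunk index.
fd, path = tempfile.mkstemp()
with os.fdopen(fd, "wb") as f:
    for chunk in range(4):
        f.write(bytes([chunk]) * GRANULARITY)

# Map only the third chunk; nothing is copied into a buffer up front.
view = map_slice(path, offset=2 * GRANULARITY, length=GRANULARITY)
first_byte = view[0]   # pages are supplied on demand as they are touched
last_byte = view[-1]
mapped_length = len(view)
print(first_byte, last_byte, mapped_length)
view.close()
os.remove(path)
```

The program never issues a read for the other three chunks; whatever it does not touch is never brought in on its behalf.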