Revisited: stack space and stack frame layout on x86-64

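This continues the previous article, "where the top of the stack is on x86". This time we look at the stack frame layout and the argument-passing rules on x86-64, as used by Linux and other operating systems that follow the System V AMD64 ABI calling convention.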

Register differences

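The previous article covered the general-purpose registers available on each architecture and what they are used for. x86 has only 8 general-purpose registers (eax, ebx, ecx, edx, ebp, esp, esi, edi). x86-64 widens these to 64 bits (rax, rbx, rcx, rdx, rbp, rsp, rsi, rdi) and adds 8 new registers (r8, r9, r10, r11, r12, r13, r14, r15).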

Argument passing

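What we care about most is how these registers are used to pass arguments on x86-64. According to the ABI, the first 6 integer or pointer arguments of a function are passed in registers, in the order rdi, rsi, rdx, rcx, r8, r9. From the 7th argument onward, all remaining arguments are passed on the stack.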
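The rule above can be written down as a small lookup table. What follows is a minimal sketch in C; the helper names int_arg_location and int_arg_index are made up for illustration and belong to no ABI or library. It only covers integer and pointer arguments. Floating-point arguments and structs passed by value follow other rules that this article does not discuss.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Registers that carry the first six integer or pointer arguments
 * under the System V AMD64 ABI, in argument order. */
static const char *const INT_ARG_REGS[] = {
    "rdi", "rsi", "rdx", "rcx", "r8", "r9"
};
enum { NUM_INT_ARG_REGS = 6 };

/* Hypothetical helper: where the integer/pointer argument with the
 * given zero-based index is passed. Returns a register name, or
 * "stack" for the seventh argument onward. */
const char *int_arg_location(size_t index)
{
    if (index < NUM_INT_ARG_REGS)
        return INT_ARG_REGS[index];
    return "stack";
}

/* Hypothetical inverse lookup: zero-based argument index carried by
 * `reg`, or -1 if `reg` is not one of the six argument registers. */
int int_arg_index(const char *reg)
{
    assert(reg != NULL);
    for (int i = 0; i < NUM_INT_ARG_REGS; i++) {
        if (strcmp(reg, INT_ARG_REGS[i]) == 0)
            return i;
    }
    return -1;
}
```

For a function with eight long arguments a through h, indices 0 to 5 (a to f) map to the six registers, and indices 6 and 7 (g and h) map to "stack".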

Analyzing a stack frame example

Again we take a typical C function as the example and look at its stack frame layout:

long utilfunc(long a, long b, long c);  /* defined further below */

long myfunc(long a, long b, long c, long d,
            long e, long f, long g, long h)
{
    long xx = a * b * c * d * e * f * g * h;
    long yy = a + b + c + d + e + f + g + h;
    long zz = utilfunc(xx, yy, xx % yy);
    return zz + 20;
}

Combining this with the analysis in the previous article, we get the following stack frame layout for this function:

[Figure: stack frame layout of myfunc on x86-64 (stack x86-64.png)]

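The function has 8 arguments. The last two (g and h) are passed on the stack, just as they would be on x86. At the end of the frame, however, there is a so-called "red zone". Let's see what this area is.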
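To make the position of the two stack arguments concrete, here is a small sketch that computes their offsets from %rbp. It assumes the usual prologue (push %rbp, then mov %rsp,%rbp) and that every argument is an 8-byte integer or pointer. The function name stack_arg_rbp_offset is hypothetical.

```c
#include <assert.h>

enum {
    SLOT_SIZE = 8,        /* every stack slot is 8 bytes wide */
    NUM_INT_ARG_REGS = 6  /* rdi, rsi, rdx, rcx, r8, r9 */
};

/* Offset from %rbp of the argument with zero-based `index`.
 * Only meaningful for arguments passed on the stack (index >= 6),
 * and only under the assumptions stated above. */
long stack_arg_rbp_offset(int index)
{
    assert(index >= NUM_INT_ARG_REGS);
    /* 0(%rbp) holds the caller's saved %rbp and 8(%rbp) the return
     * address, so the first stack argument starts at 16(%rbp). */
    return 2 * SLOT_SIZE + (long)(index - NUM_INT_ARG_REGS) * SLOT_SIZE;
}
```

Under these assumptions, g (index 6) sits at 16(%rbp) and h (index 7) at 24(%rbp).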

The red zone

Quoting the System V AMD64 ABI specification:
The 128-byte area beyond the location pointed to by %rsp is considered to be reserved and shall not be modified by signal or interrupt handlers. Therefore, functions may use this area for temporary data that is not needed across function calls. In particular, leaf functions may use this area for their entire stack frame, rather than adjusting the stack pointer in the prologue and epilogue. This area is known as the red zone.

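Paraphrasing: the 128 bytes just below the address in %rsp (at lower addresses, since the stack grows downward) are reserved, and signal and interrupt handlers must not modify them. A function can therefore keep temporary data there. In particular, a leaf function can use these 128 bytes as its entire stack frame instead of moving the stack pointer on entry and exit, as functions normally do to allocate and release a frame. This 128-byte area is called the red zone.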

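In short, the red zone is an optimization. Because signal and interrupt handlers leave this area alone, a function can use it for temporary data without moving the stack pointer, which saves the two instructions that adjust rsp. The area is, however, overwritten by function calls: a call pushes the return address just below %rsp, and the callee builds its own frame there. That is why the ABI text says the red zone is most useful in leaf functions, which call nothing.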
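The extent of the red zone can be stated as a simple address-range check. The sketch below models %rsp and a memory address as plain 64-bit integers; in_red_zone is a made-up name, and the code only illustrates the arithmetic rather than inspecting a real stack.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

enum { RED_ZONE_SIZE = 128 };  /* size given in the ABI quote above */

/* True if `addr` lies in the red zone for the stack pointer value
 * `rsp`, i.e. in the range [rsp - 128, rsp - 1]. */
bool in_red_zone(uint64_t rsp, uint64_t addr)
{
    assert(rsp >= RED_ZONE_SIZE);  /* keep the subtraction from wrapping */
    return addr >= rsp - RED_ZONE_SIZE && addr < rsp;
}
```

With this definition, the address rsp itself is not in the red zone (it belongs to the current frame), rsp - 8 and rsp - 128 are, and rsp - 129 is not.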

This may still be hard to picture, so look back at myfunc above: the utilfunc it calls is a leaf function. Here is the code of utilfunc:

long utilfunc(long a, long b, long c)
{
    long xx = a + 2;
    long yy = b + 3;
    long zz = c + 4;
    long sum = xx + yy + zz;
  
    return xx * yy * zz + sum;
}

None of this function's arguments are passed on the stack, since all three fit in registers. Its frame looks like this:

[Figure: stack frame of the leaf function utilfunc (yezi x86-64.png)]

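As the figure shows, this leaf function stores all of its local variables directly in the 128-byte red zone, that is, in the area just below the stack pointer at the time myfunc calls it. The most visible difference is that rsp is no longer decremented.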

Here is another example:

/*test.c*/
long test2(long a, long b, long c)  /* leaf function */
{
    return a*b + c;
}
long test1(long a, long b)
{
    return test2(b, a, 3);
}
int main(int argc, char const *argv[])
{
    return test1(1, 2);
}

Compile it with gcc and disassemble the result:

gcc test.c && objdump -d a.out

The disassembly of test2, test1, and main:

00000000004004d6 <test2>:
  4004d6:   55                      push   %rbp
  4004d7:   48 89 e5                mov    %rsp,%rbp
  4004da:   48 89 7d f8             mov    %rdi,-0x8(%rbp)
  4004de:   48 89 75 f0             mov    %rsi,-0x10(%rbp)
  4004e2:   48 89 55 e8             mov    %rdx,-0x18(%rbp)
  4004e6:   48 8b 45 f8             mov    -0x8(%rbp),%rax
  4004ea:   48 0f af 45 f0          imul   -0x10(%rbp),%rax
  4004ef:   48 89 c2                mov    %rax,%rdx
  4004f2:   48 8b 45 e8             mov    -0x18(%rbp),%rax
  4004f6:   48 01 d0                add    %rdx,%rax
  4004f9:   5d                      pop    %rbp
  4004fa:   c3                      retq

00000000004004fb <test1>:
  4004fb:   55                      push   %rbp
  4004fc:   48 89 e5                mov    %rsp,%rbp
  4004ff:   48 83 ec 10             sub    $0x10,%rsp
  400503:   48 89 7d f8             mov    %rdi,-0x8(%rbp)
  400507:   48 89 75 f0             mov    %rsi,-0x10(%rbp)
  40050b:   48 8b 4d f8             mov    -0x8(%rbp),%rcx
  40050f:   48 8b 45 f0             mov    -0x10(%rbp),%rax
  400513:   ba 03 00 00 00          mov    $0x3,%edx
  400518:   48 89 ce                mov    %rcx,%rsi
  40051b:   48 89 c7                mov    %rax,%rdi
  40051e:   e8 b3 ff ff ff          callq  4004d6 <test2>
  400523:   c9                      leaveq 
  400524:   c3                      retq

0000000000400525 <main>:
  400525:   55                      push   %rbp
  400526:   48 89 e5                mov    %rsp,%rbp
  400529:   48 83 ec 10             sub    $0x10,%rsp
  40052d:   89 7d fc                mov    %edi,-0x4(%rbp)
  400530:   48 89 75 f0             mov    %rsi,-0x10(%rbp)
  400534:   be 02 00 00 00          mov    $0x2,%esi
  400539:   bf 01 00 00 00          mov    $0x1,%edi
  40053e:   e8 b8 ff ff ff          callq  4004fb <test1>
  400543:   c9                      leaveq 
  400544:   c3                      retq   
  400545:   66 2e 0f 1f 84 00 00    nopw   %cs:0x0(%rax,%rax,1)
  40054c:   00 00 00 
  40054f:   90                      nop

Both main and test1 move rsp to allocate space for their stack frames:

4004ff: 48 83 ec 10             sub    $0x10,%rsp
....
400529: 48 83 ec 10             sub    $0x10,%rsp

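test2, being a leaf function, works directly off rbp/rsp (the two are equal at this point, because the prologue copies rsp into rbp and never subtracts from rsp). Its arguments and local variables are stored in the red zone. The stack frame layout of test2 is: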

[Figure: stack frame of the leaf function test2 (yezi2 x86-64.png)]
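The difference between the prologues of test1 and test2 can be replayed with a toy model of the two registers. This is only a sketch: the Regs struct and both function names are invented for illustration, the starting value of rsp is arbitrary, and the saved %rbp value itself is not modeled. The frame sizes and offsets are the ones visible in the listing above.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    uint64_t rsp;
    uint64_t rbp;
} Regs;

/* Replays "push %rbp; mov %rsp,%rbp; sub $frame_bytes,%rsp".
 * Pass frame_bytes == 0 for a prologue without the sub, as in test2. */
Regs run_prologue(Regs r, uint64_t frame_bytes)
{
    assert(r.rsp >= 8 + frame_bytes);
    r.rsp -= 8;            /* push %rbp */
    r.rbp = r.rsp;         /* mov %rsp,%rbp */
    r.rsp -= frame_bytes;  /* sub $frame_bytes,%rsp */
    return r;
}

/* Number of bytes of local storage that lie below %rsp (in the red
 * zone), when the deepest local is stored at -deepest(%rbp). */
uint64_t bytes_below_rsp(Regs r, uint64_t deepest)
{
    assert(deepest <= r.rbp);
    uint64_t lowest = r.rbp - deepest;
    return lowest < r.rsp ? r.rsp - lowest : 0;
}
```

For test2 (no sub, deepest slot at -0x18(%rbp)) the model reports 0x18 bytes below rsp, well within the 128-byte red zone. For test1 (sub $0x10, deepest slot at -0x10(%rbp)) it reports 0, since everything lives inside the allocated frame.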

On the use of the base pointer rbp/ebp (original title: saving a general-purpose register)

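In many cases the base pointer (rbp, or ebp on x86) is not really needed: everything can be located relative to the stack pointer alone, and the DWARF (Debugging With Attributed Record Formats) debug information format has a way of handling frames without a base pointer, namely CFI (Call Frame Information). This is why some compilers have started to omit the base pointer under aggressive optimization. Doing so shortens the function prologue and epilogue and frees one more general-purpose register (GPR) for the program, which is especially valuable on x86, where GPRs are scarce. On x86, gcc keeps the base pointer by default when compiling without optimization, but it also offers the -fomit-frame-pointer option. Whether using this option is advisable is hotly debated. After looking through the related material, our summary is as follows.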

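In short, indexing the stack frame through %rsp avoids the traditional use of %rbp. The technique saves two instructions in the prologue and epilogue, and it frees a general-purpose register for the program to use.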
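Addressing a local through %rsp instead of %rbp is just a change of base. The sketch below shows the conversion for a frame set up with push %rbp, mov %rsp,%rbp, sub $frame_bytes,%rsp, so that %rbp equals %rsp plus frame_bytes. The function name rbp_to_rsp_offset is hypothetical, and the sample values in the note afterwards are taken from the main listings later in this article.

```c
#include <assert.h>

/* Converts an %rbp-relative offset into the equivalent %rsp-relative
 * offset, assuming %rbp == %rsp + frame_bytes (see the lead-in). */
long rbp_to_rsp_offset(long rbp_offset, long frame_bytes)
{
    assert(frame_bytes >= 0);
    return frame_bytes + rbp_offset;
}
```

In the frame-pointer build of main, which reserves 0x20 bytes, the variable sum is at -0x4(%rbp), which is the same location as 0x1c(%rsp), and the saved argv at -0x20(%rbp) is the same location as (%rsp).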

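To make this concrete, I wrote another simple C program containing a leaf function and compiled it twice: once normally and once with the -fomit-frame-pointer option.
The C program: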

#include <stdio.h>

int add(int a, int b)
{

        return a + b;
}

int main(int argc, char const *argv[])
{

        int sum = 0;

        sum = add(1,2);

        printf("%d\n",sum);

        return 0;
}

Compiled normally with gcc, the disassembly is:

0000000000400526 <add>:
  400526:   55                      push   %rbp
  400527:   48 89 e5                mov    %rsp,%rbp
  40052a:   89 7d fc                mov    %edi,-0x4(%rbp)
  40052d:   89 75 f8                mov    %esi,-0x8(%rbp)
  400530:   8b 55 fc                mov    -0x4(%rbp),%edx
  400533:   8b 45 f8                mov    -0x8(%rbp),%eax
  400536:   01 d0                   add    %edx,%eax
  400538:   5d                      pop    %rbp
  400539:   c3                      retq   

000000000040053a <main>:
  40053a:   55                      push   %rbp
  40053b:   48 89 e5                mov    %rsp,%rbp
  40053e:   48 83 ec 20             sub    $0x20,%rsp
  400542:   89 7d ec                mov    %edi,-0x14(%rbp)
  400545:   48 89 75 e0             mov    %rsi,-0x20(%rbp)
  400549:   c7 45 fc 00 00 00 00    movl   $0x0,-0x4(%rbp)
  400550:   be 02 00 00 00          mov    $0x2,%esi
  400555:   bf 01 00 00 00          mov    $0x1,%edi
  40055a:   e8 c7 ff ff ff          callq  400526 <add>
  40055f:   89 45 fc                mov    %eax,-0x4(%rbp)
  400562:   8b 45 fc                mov    -0x4(%rbp),%eax
  400565:   89 c6                   mov    %eax,%esi
  400567:   bf 04 06 40 00          mov    $0x400604,%edi
  40056c:   b8 00 00 00 00          mov    $0x0,%eax
  400571:   e8 8a fe ff ff          callq  400400 <printf@plt>
  400576:   b8 00 00 00 00          mov    $0x0,%eax
  40057b:   c9                      leaveq 
  40057c:   c3                      retq   
  40057d:   0f 1f 00                nopl   (%rax)

Compiling with the option and disassembling:

lic@ubuntu:~/Documents$ gcc -fomit-frame-pointer test2.c
lic@ubuntu:~/Documents$ objdump -d a.out

The disassembly:

0000000000400526 <add>:
  400526:   89 7c 24 fc             mov    %edi,-0x4(%rsp)
  40052a:   89 74 24 f8             mov    %esi,-0x8(%rsp)
  40052e:   8b 54 24 fc             mov    -0x4(%rsp),%edx
  400532:   8b 44 24 f8             mov    -0x8(%rsp),%eax
  400536:   01 d0                   add    %edx,%eax
  400538:   c3                      retq   

0000000000400539 <main>:
  400539:   48 83 ec 28             sub    $0x28,%rsp
  40053d:   89 7c 24 0c             mov    %edi,0xc(%rsp)
  400541:   48 89 34 24             mov    %rsi,(%rsp)
  400545:   c7 44 24 1c 00 00 00    movl   $0x0,0x1c(%rsp)
  40054c:   00 
  40054d:   be 02 00 00 00          mov    $0x2,%esi
  400552:   bf 01 00 00 00          mov    $0x1,%edi
  400557:   e8 ca ff ff ff          callq  400526 <add>
  40055c:   89 44 24 1c             mov    %eax,0x1c(%rsp)
  400560:   8b 44 24 1c             mov    0x1c(%rsp),%eax
  400564:   89 c6                   mov    %eax,%esi
  400566:   bf 04 06 40 00          mov    $0x400604,%edi
  40056b:   b8 00 00 00 00          mov    $0x0,%eax
  400570:   e8 8b fe ff ff          callq  400400 <printf@plt>
  400575:   b8 00 00 00 00          mov    $0x0,%eax
  40057a:   48 83 c4 28             add    $0x28,%rsp
  40057e:   c3                      retq   
  40057f:   90                      nop
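One detail worth checking in the two listings is the size of main's frame: 0x20 with the frame pointer, 0x28 without it. A plausible explanation, assuming the System V AMD64 rule that %rsp must be a multiple of 16 at every call instruction (so it is 8 bytes off on function entry, right after the return address has been pushed), is that the extra 8 bytes stand in for the 8 bytes that push %rbp used to contribute. The sketch below checks that arithmetic; call_site_aligned is a made-up helper name.

```c
#include <assert.h>
#include <stdbool.h>

enum {
    STACK_ALIGN = 16,  /* required alignment of %rsp at a call */
    RET_ADDR_SIZE = 8  /* pushed by the call that entered the function */
};

/* True if %rsp is 16-byte aligned at call sites inside a function
 * that has pushed `pushed_bytes` and then subtracted `sub_bytes`
 * from %rsp, assuming %rsp was aligned just before the function
 * itself was called. */
bool call_site_aligned(unsigned pushed_bytes, unsigned sub_bytes)
{
    assert(pushed_bytes % 8 == 0);  /* pushes are 8 bytes in 64-bit mode */
    return (RET_ADDR_SIZE + pushed_bytes + sub_bytes) % STACK_ALIGN == 0;
}
```

Both versions of main pass the check: 8 + 8 + 0x20 and 8 + 0 + 0x28 are each 48. Keeping 0x20 after dropping the push would give 40, which is not a multiple of 16.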

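Not only the leaf function but also main no longer sets up rbp. Note, though, that for an elf64-x86-64 program built with gcc the actual entry point is the _start function. Here is its disassembly: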

0000000000400430 <_start>:
  400430:   31 ed                   xor    %ebp,%ebp
  400432:   49 89 d1                mov    %rdx,%r9
  400435:   5e                      pop    %rsi
  400436:   48 89 e2                mov    %rsp,%rdx
  400439:   48 83 e4 f0             and    $0xfffffffffffffff0,%rsp
  40043d:   50                      push   %rax
  40043e:   54                      push   %rsp
  40043f:   49 c7 c0 f0 05 40 00    mov    $0x4005f0,%r8
  400446:   48 c7 c1 80 05 40 00    mov    $0x400580,%rcx
  40044d:   48 c7 c7 39 05 40 00    mov    $0x400539,%rdi
  400454:   e8 b7 ff ff ff          callq  400410 <__libc_start_main@plt>
  400459:   f4                      hlt    
  40045a:   66 0f 1f 44 00 00       nopw   0x0(%rax,%rax,1)

These results differ from those shown in some articles online. For reference, my environment and object file format are:

lic@ubuntu:~/Documents$ gcc --version
gcc (Ubuntu 5.4.0-6ubuntu1~16.04.12) 5.4.0 20160609

a.out:     file format elf64-x86-64

We will describe the _start function in a separate article on the ELF file format.

Finally, note that the Windows x64 ABI has no red zone.