RAFT分布式K/V缓存 C++实现

学习MIT6.824时，想从socket开始写起，构建网络层，RPC层，Raft层，到应用层的k/v缓存。想自己实现去感受一下各个部分到底是如何运作的，以学习和练习为主，并且为之后的学习打好基础。

参考资料：
网络库：主要参考muduo，事件驱动的非阻塞网络库
RPC : protobuf https://developers.google.com/protocol-buffers/docs/reference/cpp/google.protobuf.service
可视化：https://raft.github.io/
可视化: http://thesecretlivesofdata.com/raft/
论文原版：https://ramcloud.atlassian.net/wiki/download/attachments/6586375/raft.pdf
论文中文翻译: https://github.com/maemual/raft-zh_cn/blob/master/raft-zh_cn.md
MIT6.824 Lab2 :https://pdos.csail.mit.edu/6.824/labs/lab-raft.html

github:https://github.com/JiaMing5139/DistributedSystem

一.Raft中数据结构的定义，对应论文中Figure2

1.RPC消息格式（protobuf）

message LogEntry{
int64 index = 1;
int64 term = 2;
string commandName =3 ;
bytes command = 4;
}

message AppendEntriesRequest {
int64 term = 1;
string leaderId = 2;
int64 prevLogIndex = 3;
int64 prevLogTerm =4;
repeated LogEntry LogEntries = 5;
int64 leaderCommit;
}

message AppendEntriesResponse {
int64 Term = 1;
int64 Index = 2;
int64 CommitIndex =3;
bool Success = 4;
}
论文Figure写 Response中包括 Success 和term
但只靠success 分别不出来一种状态，就是对端是否确认有一个新的log被match，原地不动和有新logMatch都返回true，所以需要本地再维护一个状态判断是否又新log被发出
加上Index，对端直接返回自己match的序号，更简单

message RequestVoteRequest {
uint64 Term=1;
uint64 LastLogIndex=2;
uint64 LastLogTerm=3;
string CandidateName=4;
}

message RequestVoteResponse {
uint64 Term=1;
bool VoteGranted=2;
}

service RaftService {
rpc AppendEntries(AppendEntriesRequest) returns(AppendEntriesResponse);
rpc Vote(RequestVoteRequest) returns(RequestVoteResponse);
}

2. 每个Raft实例需要维护的变量

class Client{
int64_t nextIndex_ = 0;
int64_t matchIndex_ = 0;
bool voteGranted_ = false;
std::string ClientName;
}
class Raft{
int64_t currentTerm_; // 当先raft的term
State currentState_; //当前状态
int votedNum_;//收集到的投票
string votedFor_; //当前term给谁投的票
vector<Client> clients_//其他节点
}

一.领导人选择的限制条件

- Raft为了保证同一Term只有一个Leader被选出，选举时需要什么条件？

过半结点投票同意，每个任期只能投票一次。如果出现等票情况
则electiontimeout过期自动开启新一轮Term的领导人选举
成为leader后立马发出appendEntry 限制新的candidatate出现

- 为了保证安全性，选举时需要条件？

安全性就是指提交后的日志，不会被再次覆盖
因为必须超过半数才能当领导人，两个任期内必定会有一个完整的结点是重合的。
所以不论怎么发生网络划分，都必须让那个两个任期中重复的，也是日志最完整的结点当Leader，才能保证不会被其他的日志覆盖，所以要进行限制
一种临界情况：
S1 S5 结点的Term为45
S2 S3 S4 的Term为2，已经有三个日志确认提交，返回给客户端成功
此时RAFT必须保证三条日志不会丢失

屏幕快照 2020-06-22 上午5.47.56.png

为了保证RAFT正常工作，必须有3个及以上的结点正在运行，才能选出Leader正常运行
这时候S3 S4两个保证完整结点死掉，S1 S5重启成功，这时候仅剩1个S2保存着正常结点
为了让这个运行着的RAFT返回正确的结果，必须让S2当结点Leader，保证日志的完整性

屏幕快照 2020-06-22 上午5.48.11.png

状态转移表：

event\state	follower
electionTimeout	清空上一个任期内的记录 this.currentTerm++ this.currentState=Candidate votedFor = selfname for c in clients to c.requestRpc()
RequestVoteRpc	如果日志不是最新的，返回false Term的三种情况: 1.localterm大于peerterm，说明收到的请求是过期的candidate返回false 2.localterm等于peerterm，说明同一时间有两个candidate再发送投票请求，只投一个 3.localterm小于peerterm,说明这是一个新Term的请求，且是第一个到的，将之设置为自己的Term，返回true
VoteResponseRpc	如果peertrm > localterm localterm = peerterm

event\state	candidate
electionTimeout	清空上一个任期内的记录 this.currentTerm++ this.currentState=Candidate votedFor = selfname for c in clients to c.requestRpc() （有人瓜分选票，没有出现leader，再次重复给没有voteGranted为false的client发送s）
RequestVoteRpc	处理同follower，但肯定反回false,因为成为candidate必然已经投票给了自己
VoteResponseRpc	if（RequestVoteResponse.VoteGranted == true）{ client.voteGranted=true 查看有多少client的voteGranted为true if 超过一半 this.currentState=Leader start AppendEntriesTimer 让所有client中的nextinex = commitedIndex + 1 } if (RequestVoteResponse.Term > currentTerm) current =Term

event\state	Leader
electionTimeout	Leader不会有electionout
RequestVoteRpc	如果peerTerm比较大，说明自己是一个过期的Leader，将状态变为Follower，然后像一半的follower投同意票
VoteResponseRpc	丢弃？

二.日志提交的限制条件

日志提交指的是，已经给客户端返回成功，保证该日志永远不会丢失。

如何判断多数结点都已经在N处保存了log？

多数client的matchIndex >= N

为了保证安全性，需要什么限制条件？

不提交之前任期的日志

日志复制中需要用到的各个参数的变化时机
client.nextindex 用于推测client中需要下一个日志的位置
1.初始化为最后一个日志的坐标+1
2.当AppendEntry返回True时，nextIndex可能+1
3.当 AppendEntry返回false时， nextIndex - 1
client.matchindex 保证matchIndex已经被该客户端所保存
1.初始化为0
2.当AppendEntry返回时，matchIndex=peer.matchIndex

用来保证安全，用来保守的表示从头数已经共享到第几个log给字节点，

event\state	Leader
heartBeatTimeout	1.设置preIndex和preTer (preLogindex,preTerm = locallogs[client.nextIndex - 1].index,term) 2. 如果nextIndex在log的范围内，且matchIndex + 1 = nextindex?{ 则说明已经匹配上位置,并添加nextIndex位置的log到rpc中 } 3.重设heartBeat定时器
AppendEntryRpc	产生leader后，同一个term只有一个leader，所以leader不可能会收到另一个leader发来同一任期的AppendEntryRpc if peerterm>localterm leader变为follower，设置自己term]
AppendResponseRpc	1.如果peerTerm>localTerm 则一定返回的是false，自己是一个过期Leader，状态变为follower 2.如果peerTerm<=localTerm ，且成功则可以开始继续处理，否则对应client的nextIndex-1，matchIndex = response.matchIndex(0) 3.如果是一个新match的log，则让client.match++ client.nextIndex = client.match+1 4.如果最新match的日志是在自己任期内，则开始进行commited判断

event\state	Follower
heartBeatTimeout
AppendEntryRpc	0. 重置electionTimer 1.term检查，失败返回false 2.if preIndex preTerm匹配检查，response.matchIndex = 0失败返回false 3.else 匹配成功，如果日志不为空，则将日志添加,且进行commindex更新（leadercommit和logs.back().index中的较小值）
AppendResponseRpc

event\state	Candidate
heartBeatTimeout
AppendEntryRpc	0. 重置electionTimer,设置状态为follower 1.term检查，失败返回false 2.if preIndex preTerm匹配检查，response.matchIndex = 0失败返回false 3.else 匹配成功，如果日志不为空，则将日志添加,且进行commindex更新（leadercommit和logs.back().index中的较小值）
AppendResponseRpc

三.与客户端的交互

RAFT分布式K/V缓存 C++实现