DDIA Ch5

All of the difficulty in replication lies in handling changes to replicated data, and that’s what this chapter is about.

Replication itself is not the hard part; things only get complicated once the replicated data has to change.

Replication of databases is an old topic—the principles haven’t changed much since they were studied in the 1970s

I had no idea this was such an old problem. Makes sense, though: aren't Lamport's papers already about 30 years old?

WAL shipping

a WAL contains details of which bytes were changed in which disk blocks

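I thought a WAL was just a record of writes. I didn't expect it to record which bytes were written in which disk block. I should go back to the CS5600 slides, which probably cover this.

So the WAL really tracks what was written where on disk, at that low a level? Impressive.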

Logical log replication

A logical log is decoupled from the storage engine internals by using a different log format for replication (presumably it records only the data that was written, not where on disk it was written).

A logical log for a relational database is usually a sequence of records describing writes to database tables at the granularity of a row:

For an inserted row, the log contains the new values of all columns.

For a deleted row, the log contains enough information to uniquely identify the row that was deleted. Typically this would be the primary key, but if there is no primary key on the table, the old values of all columns need to be logged.

For an updated row, the log contains enough information to uniquely identify the updated row, and the new values of all columns (or at least the new values of all columns that changed).
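The three record types above can be sketched as plain data structures. This is only an illustration, not the log format of any real database: the class names, the `apply` helper, and the `id` primary-key column are all assumptions made for the example, and the no-primary-key case is not handled.

```python
from dataclasses import dataclass
from typing import Any, Dict, Union

Row = Dict[str, Any]


@dataclass
class Insert:
    table: str
    new_values: Row        # new values of all columns


@dataclass
class Delete:
    table: str
    key: Any               # primary key identifying the deleted row


@dataclass
class Update:
    table: str
    key: Any               # primary key identifying the updated row
    changed_values: Row    # new values of the columns that changed


LogRecord = Union[Insert, Delete, Update]


def apply(tables: Dict[str, Dict[Any, Row]], record: LogRecord, pk: str = "id") -> None:
    """Apply one logical log record to an in-memory replica (hypothetical helper)."""
    rows = tables.setdefault(record.table, {})
    if isinstance(record, Insert):
        rows[record.new_values[pk]] = dict(record.new_values)
    elif isinstance(record, Delete):
        rows.pop(record.key, None)
    elif isinstance(record, Update):
        rows[record.key].update(record.changed_values)


# The leader's log, replayed in order on a follower.
log = [
    Insert("users", {"id": 1, "name": "alice", "city": "Boston"}),
    Insert("users", {"id": 2, "name": "bob", "city": "Austin"}),
    Update("users", key=1, changed_values={"city": "Seattle"}),
    Delete("users", key=2),
]

replica: Dict[str, Dict[Any, Row]] = {}
for record in log:
    apply(replica, record)
```

Nothing in these records mentions disk blocks or byte offsets, which is what lets the follower store the rows however it likes.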

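A logical log has several benefits. Because it is decoupled from the storage engine, different nodes can even run different storage engines (one based on SSTables or an LSM-tree, another based on a B-tree). So you have the option of giving the leader high write throughput and the followers that serve reads high read throughput.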

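Backward compatible. Because the log only describes write operations, the leader and followers can run different versions of the database software (for example, something like Avro could let the current DB version on a node read writes from an older version, or let a follower on a newer version apply writes from an older-version leader). So you can upgrade the followers first, then fail over, and get a zero-downtime upgrade.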

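A logical log is easier for external systems to parse (such as a data warehouse). This is presumably the use case mentioned earlier: a data warehouse uses a different storage layout (custom indexes and caches), but it needs nothing special from the log itself; it can consume the log and do the replication on its own.

This technique is called change data capture; we will return to it in [[DDIA Ch11]].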
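A rough sketch of an external consumer, assuming the changes arrive as a stream of row-level dicts. The `op`/`key`/`values` field names and the `consume` function are made up for this example and do not correspond to any real CDC tool. The consumer keeps its own derived structure (a per-city row count) without knowing anything about the source database's storage engine.

```python
from collections import Counter
from typing import Any, Dict, Iterable


def consume(change_stream: Iterable[Dict[str, Any]]) -> Counter:
    """Maintain a per-city row count from row-level changes (hypothetical consumer)."""
    city_of: Dict[Any, Any] = {}   # the consumer's own copy: row key -> city
    counts: Counter = Counter()
    for change in change_stream:
        key = change["key"]
        if change["op"] in ("insert", "update"):
            old = city_of.get(key)
            # An update may omit unchanged columns, so fall back to the old city.
            new = change["values"].get("city", old)
            if old is not None:
                counts[old] -= 1
            if new is not None:
                counts[new] += 1
            city_of[key] = new
        elif change["op"] == "delete":
            old = city_of.pop(key, None)
            if old is not None:
                counts[old] -= 1
    return +counts   # unary plus drops entries whose count fell to zero


changes = [
    {"op": "insert", "key": 1, "values": {"name": "alice", "city": "Boston"}},
    {"op": "insert", "key": 2, "values": {"name": "bob", "city": "Boston"}},
    {"op": "update", "key": 1, "values": {"city": "Seattle"}},
    {"op": "delete", "key": 2},
]
city_counts = consume(changes)
```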

Eventual consistency

That is, a follower needs time to sync with the leader, so the followers will "eventually" catch up and become consistent with the leader.

This effect is known as eventual consistency
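A toy simulation of replication lag, to make the "eventually" concrete. The `Leader` and `Follower` classes and the `replicate(max_entries)` knob are invented for this sketch; real systems stream the log over the network rather than reading it directly.

```python
from typing import Any, Dict, List, Tuple


class Leader:
    def __init__(self) -> None:
        self.log: List[Tuple[str, Any]] = []   # ordered (key, value) writes
        self.data: Dict[str, Any] = {}

    def write(self, key: str, value: Any) -> None:
        self.data[key] = value
        self.log.append((key, value))


class Follower:
    def __init__(self, leader: Leader) -> None:
        self.leader = leader
        self.applied = 0                       # number of log entries applied so far
        self.data: Dict[str, Any] = {}

    def replicate(self, max_entries: int) -> None:
        """Apply up to max_entries pending log entries, in log order."""
        pending = self.leader.log[self.applied:self.applied + max_entries]
        for key, value in pending:
            self.data[key] = value
        self.applied += len(pending)

    def lag(self) -> int:
        return len(self.leader.log) - self.applied


leader = Leader()
follower = Follower(leader)
leader.write("x", 1)
leader.write("x", 2)

stale_read = follower.data.get("x")     # nothing replicated yet
follower.replicate(max_entries=1)
partial_read = follower.data.get("x")   # follower has caught up halfway
follower.replicate(max_entries=10)
final_read = follower.data.get("x")     # follower now matches the leader
```

Between the two `replicate` calls the follower serves an old value of `x`; once writes stop and the follower drains the log, its state matches the leader's.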

$\begin{align*} \frac{x+1}{x+2} &= E[(X-E[X])^2]\\ &= E[X^{2}-2XE[X]+E[X]^2]\\ &= E[X^{2}]-2E[X]E[X]+E[X]^2 \end{align*}$

What?! Lamport invented LaTeX?!

Monotonic reads

Monotonic reads is a lesser guarantee than strong consistency, but a stronger guarantee than eventual consistency.

So eventual consistency is the weakest guarantee? That seems right.

Monotonic reads only means that if one user makes several reads in sequence, they will not see time go backward; i.e., they will not read older data after having previously read newer data.

One way to achieve this is to make sure that each user always reads from the same replica.
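One possible way to pin a user to a replica is to hash the user ID instead of picking a replica at random. A minimal sketch, where the replica names and the `replica_for` function are assumptions for illustration:

```python
import hashlib
from typing import List


def replica_for(user_id: str, replicas: List[str]) -> str:
    """Pick the same replica for the same user on every call (hypothetical helper)."""
    # hashlib gives a hash that is stable across processes, unlike Python's
    # built-in hash(), which is randomized per process for strings.
    digest = hashlib.sha256(user_id.encode("utf-8")).digest()
    index = int.from_bytes(digest[:8], "big") % len(replicas)
    return replicas[index]


replicas = ["follower-1", "follower-2", "follower-3"]
first = replica_for("user-42", replicas)
second = replica_for("user-42", replicas)
```

This only holds while the replica list is unchanged; if the chosen replica fails, the user's reads have to be rerouted to another replica, and the guarantee needs extra care at that point.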

Consistent prefix reads

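One anomaly arises when follower 2 lags less than follower 1. Two different users write to the leaders of these two followers, and a third person, the observer, reads from both followers. The observer may then see the data from follower 2 before the data from follower 1, which violates causality: if the write on follower 1 is a question and the write on follower 2 is the answer, the observer sees the answer first and only then the question.

Consistent prefix reads prevent this kind of anomaly: if a sequence of writes happens in a certain order, anyone reading those writes sees them appear in that same order.

One solution is to make sure that any writes that are causally related to each other are written to the same partition, but in some applications that cannot be done efficiently.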
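A small sketch of the same-partition idea, assuming causally related writes share some grouping key (here a made-up `conversation_id`). The partition count, the function names, and the in-memory lists standing in for partitions are all hypothetical.

```python
import zlib
from collections import defaultdict
from typing import DefaultDict, List, Tuple

NUM_PARTITIONS = 4   # assumed partition count for the example

# Each partition is an append-only list, so it preserves write order.
partitions: DefaultDict[int, List[Tuple[str, str]]] = defaultdict(list)


def partition_for(conversation_id: str) -> int:
    """Map every write of one conversation to the same partition."""
    return zlib.crc32(conversation_id.encode("utf-8")) % NUM_PARTITIONS


def write(conversation_id: str, message: str) -> None:
    partitions[partition_for(conversation_id)].append((conversation_id, message))


def read_conversation(conversation_id: str) -> List[str]:
    """Read one conversation from its single partition, in write order."""
    p = partition_for(conversation_id)
    return [msg for cid, msg in partitions[p] if cid == conversation_id]


write("chat-7", "Q: question")
write("chat-9", "unrelated message")
write("chat-7", "A: answer")

chat = read_conversation("chat-7")
```

Because the question and the answer land in one ordered partition, a reader of that partition cannot see the answer without the question before it. Ordering across different partitions is still not guaranteed.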