3. Cost-Performance and the Multi-Component LSM-Tree (3)
The factor COST_π/COST_P, the ratio of the per-page cost of a multi-page block I/O to the cost of a single-page I/O, is a constant, and nothing we do with the LSM-tree structure can affect it.
However, the batching efficiency of a merge step, measured by M (which enters the cost as the factor 1/M), is roughly proportional to the size ratio between the C0 and C1 components;
the larger C0 is relative to C1, the more efficient the merge;
up to a point, this means we can save further on disk-arm cost by using a larger C0 component, but this entails a larger memory cost to hold C0.
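To make the dependence on C0 concrete, here is a minimal Python sketch of the cost model under stated assumptions. The batching factor M is taken as (S_p/S_e)·S0/(S0 + S1): entries per page (page size S_p over entry size S_e) times the fraction of merged entries that come from C0. The LSM-to-B-tree insert cost ratio is taken as K1·(COST_π/COST_P)·(1/M), with K1 a small constant. The function names, the K1 = 2.0 default, and all numeric inputs are hypothetical placeholders.

```python
def batching_factor(entries_per_page: float, s0: float, s1: float) -> float:
    """M: average number of C0 entries merged into each C1 leaf page.

    Assumed form: entries per page times the fraction of merged entries
    contributed by C0, i.e. (S_p / S_e) * S0 / (S0 + S1).
    """
    return entries_per_page * s0 / (s0 + s1)


def lsm_to_btree_cost_ratio(cost_pi: float, cost_p: float, m: float,
                            k1: float = 2.0) -> float:
    """Assumed model: COST_LSM / COST_B = K1 * (COST_pi / COST_P) * (1 / M).

    k1 = 2.0 is a placeholder for the small constant in the model.
    """
    return k1 * (cost_pi / cost_p) / m


if __name__ == "__main__":
    # Hypothetical inputs: 200 entries per page, C1 fixed at 1000 size units,
    # and a page read as part of a multi-page block costing 1/10 of a random page I/O.
    for s0 in (10, 50, 100):
        m = batching_factor(200, s0, 1000)
        ratio = lsm_to_btree_cost_ratio(cost_pi=1.0, cost_p=10.0, m=m)
        print(f"S0={s0:>4}  M={m:6.2f}  LSM/B-tree cost ratio={ratio:.4f}")
```

Only M changes as S0 grows. COST_π/COST_P stays fixed, which is exactly the constant factor the text says the LSM-tree structure cannot influence.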
There is an optimal mix of component sizes that minimizes the combined cost of disk arms and memory capacity, but for a large C0 the solution can be quite expensive in terms of memory.
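The existence of an optimum can be illustrated with a deliberately simplified toy model. This is an assumption, not the paper's full derivation. Hold C1 fixed. Memory cost then grows linearly with S0, while the disk-arm cost of merging scales like 1/M ∝ (S0 + S1)/S0. Under this model the total cost is mem_price·S0 + arm_price·(S0 + S1)/S0, and it is minimized at S0 = sqrt(arm_price·S1/mem_price). The prices `mem_price` and `arm_price` and all sizes are hypothetical.

```python
import math


def total_cost(s0: float, s1: float, mem_price: float, arm_price: float) -> float:
    """Toy model: memory to hold C0 plus disk arms paid per unit of merge I/O."""
    memory = mem_price * s0
    # Merge I/O per insert is assumed to scale like 1/M ~ (S0 + S1) / S0.
    disk_arms = arm_price * (s0 + s1) / s0
    return memory + disk_arms


def optimal_s0(s1: float, mem_price: float, arm_price: float) -> float:
    """Closed-form minimizer of total_cost over S0 (derivative set to zero)."""
    return math.sqrt(arm_price * s1 / mem_price)


if __name__ == "__main__":
    s1, mem_price, arm_price = 1000.0, 1.0, 10.0  # hypothetical units
    best_grid = min(range(1, 1001),
                    key=lambda s0: total_cost(s0, s1, mem_price, arm_price))
    print("grid search:", best_grid, "closed form:", optimal_s0(s1, mem_price, arm_price))
    # In this model the optimal C0 grows like sqrt(S1): a 100x larger C1
    # needs a 10x larger C0, so memory still grows with the index.
    print("S1 x100 ->", optimal_s0(100 * s1, mem_price, arm_price))
```

The square-root growth is a property of this toy model only. It shows why the optimal C0 keeps getting more expensive as the index grows, which is the pressure that leads to adding more components.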
It is this consideration that motivates the multi-component LSM-tree, which is investigated in Section 3.3. A three-component LSM-tree has a memory-resident component C0 and disk-resident components C1 and C2, whose sizes increase with the subscript.
A rolling merge process runs between C0 and C1, and a separate rolling merge runs between C1 and C2. Each one moves entries from the smaller component to the larger one whenever the smaller component exceeds its threshold size.
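The flow of entries between the three components can be sketched in Python, with simplifying assumptions. Each component is an in-memory sorted list. When a component exceeds its (hypothetical) threshold, its entire contents are merged into the next component in one step. The paper's rolling merge, by contrast, moves entries incrementally in key order. The class `ThreeComponentLSM` and its parameters are illustrative only.

```python
import bisect
import heapq


class ThreeComponentLSM:
    """Toy three-component LSM-tree: C0 (memory), C1 and C2 (stand-ins for disk)."""

    def __init__(self, c0_limit: int, c1_limit: int) -> None:
        self.c0: list[int] = []  # memory-resident component, kept sorted
        self.c1: list[int] = []  # smaller "disk" component
        self.c2: list[int] = []  # largest "disk" component
        self.c0_limit = c0_limit
        self.c1_limit = c1_limit

    def insert(self, key: int) -> None:
        bisect.insort(self.c0, key)
        if len(self.c0) > self.c0_limit:
            # C0 -> C1 merge: combine two sorted runs, then empty C0.
            self.c1 = list(heapq.merge(self.c0, self.c1))
            self.c0 = []
            if len(self.c1) > self.c1_limit:
                # C1 -> C2 merge, triggered only when C1 overflows its threshold.
                self.c2 = list(heapq.merge(self.c1, self.c2))
                self.c1 = []

    def contains(self, key: int) -> bool:
        # Search from the newest (smallest) component to the oldest.
        for comp in (self.c0, self.c1, self.c2):
            i = bisect.bisect_left(comp, key)
            if i < len(comp) and comp[i] == key:
                return True
        return False


if __name__ == "__main__":
    lsm = ThreeComponentLSM(c0_limit=4, c1_limit=20)
    for k in ((i * 37) % 100 for i in range(100)):  # a permutation of 0..99
        lsm.insert(k)
    print(len(lsm.c0), len(lsm.c1), len(lsm.c2), lsm.contains(57))
```

Because both merges only combine already-sorted runs, each component stays sorted without re-sorting. This sequential access pattern is what lets the real structure use multi-page block I/O.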
The advantage of a three-component LSM-tree is that batching efficiency can be improved geometrically by choosing the size of C1 to optimize both size ratios, between C0 and C1 and between C1 and C2. As a result, the C0 memory component can be made much smaller relative to the total index, which significantly reduces cost.
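The "geometric" improvement can be shown with a small calculation. For illustration, assume every adjacent pair of components has the same size ratio r, and that a merge's batching efficiency worsens as r grows. With a fixed C0 size, a two-component tree needs r = S_largest/S0, while a three-component tree needs only r = (S_largest/S0)^(1/2). The function names and sizes below are hypothetical.

```python
def per_merge_ratio(largest: float, c0: float, components: int) -> float:
    """Common size ratio r between adjacent components, given C0 and the largest size."""
    return (largest / c0) ** (1 / (components - 1))


def c0_size(largest: float, ratio: float, components: int) -> float:
    """Size of C0 when each component is `ratio` times the previous one."""
    return largest / ratio ** (components - 1)


if __name__ == "__main__":
    largest = 1_000_000  # hypothetical size of the largest component, in pages
    # Same memory budget (C0 = 100 pages): the per-merge ratio drops from 10000 to about 100.
    print(per_merge_ratio(largest, 100, 2), per_merge_ratio(largest, 100, 3))
    # Same per-merge ratio (r = 100): C0 shrinks from 10000 pages to 100 pages.
    print(c0_size(largest, 100, 2), c0_size(largest, 100, 3))
```

The sketch leaves out the cost of this approach: each entry is now merged twice, once into C1 and again into C2.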
Section 3.4 derives a mathematical procedure for arriving at the optimal relative sizes of the different components of a multi-component LSM-tree to minimize total cost for memory and disk.
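The kind of optimization Section 3.4 carries out can be illustrated with a simplified proxy. This is an assumption, not the paper's exact cost function. Suppose the merge I/O per inserted entry is proportional to the sum of the adjacent size ratios r_i = S_i/S_{i-1}. If S0 and the largest component are fixed, the product of the ratios is fixed, so by the AM-GM inequality the sum is smallest when all ratios are equal. The sketch checks this for three components by scanning the size of C1. All names and sizes are hypothetical.

```python
def merge_cost_proxy(sizes: list[float]) -> float:
    """Assumed proxy for merge I/O per entry: sum of adjacent size ratios S_i / S_{i-1}."""
    return sum(sizes[i] / sizes[i - 1] for i in range(1, len(sizes)))


def equal_ratio_sizes(s0: float, largest: float, components: int) -> list[float]:
    """Component sizes forming a geometric sequence from s0 up to largest."""
    r = (largest / s0) ** (1 / (components - 1))
    return [s0 * r ** i for i in range(components)]


if __name__ == "__main__":
    s0, s2 = 100.0, 1_000_000.0  # hypothetical C0 and C2 sizes
    # Scan candidate C1 sizes and keep the one with the lowest proxy cost.
    best_s1 = min(range(200, 500_000, 100),
                  key=lambda s1: merge_cost_proxy([s0, s1, s2]))
    print("best C1 from scan:", best_s1)
    print("geometric sizes:", equal_ratio_sizes(s0, s2, 3))
```

Under this proxy, the scan lands on the geometric mean of S0 and S2, so the middle component sits at equal ratio from both ends.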