LSM-Tree(10)

2.1 How a Two Component LSM-tree Grows

To trace the metamorphosis of an LSM-tree from the beginning of its growth, let us begin with a first insertion to the C0 tree component in memory.
跟踪LSM-tree增加内容的变化过程,先从内存存储的C0树插入开始。
Unlike the C1 tree, the C0 tree is not expected to have a B-tree-like structure.
不像C1,C0没有类似B树的结构。
For one thing, the nodes could be any size: there is no need to insist on disk page size nodes since the C0 tree never sits on disk, and so we need not sacrifice CPU efficiency to minimize depth.
首先,大小没有限制:不需要匹配磁盘页大小,C0不会存储在磁盘上,所以不需要消耗CPU减少深度。
Thus a (2-3) tree or AVL-tree (as explained, for example, in [1]) are possible alternative structures for a C0 tree.
因此一个(2-3)树活在AVL-tree是两个可以选择的C0结构。
When the growing C0 tree first reaches its threshold size, a leftmost sequence of entries is deleted from the C0 tree (this should be done in an efficient batch manner rather than one entry at a time) and reorganized into a C1 tree leaf node packed 100% full.
当C0达到边界大小,一个最左边的序列会从C0删除(批量操作,而不是每条内容只想一次)重组写入填满C1的子节点。
Successive leaf nodes are placed left-to-right in the initial pages of a buffer resident multi-page block until the block is full;
连续的叶子节点从左到右存储在缓存的多页存储块中,知道存满;
then this block is written out to disk to become the first part of the C1 tree disk-resident leaf level.
然后存储磁盘称为C1的叶节点。
A directory node structure for the C1 tree is created in memory buffers as successive leaf nodes are added, with details explained below.
C1的节点目录连续的存储在内存中,详细内容后续分析。

Successive multi-page blocks of the C1 tree leaf level in ever increasing key-sequence order are written out to disk to keep the C0 tree threshold size from exceeding its threshold.
以不断增加的键序顺序将C1树叶子级的连续多页块写入磁盘,以防止C0树阈值大小超过其阈值。(有道翻译)
Upper level C1 tree directory nodes are maintained in separate multi-page block buffers, or else in single page buffers, whichever makes more sense from a standpoint of total memory and disk arm cost;
上层C1树目录节点是在单独的多页块缓冲区中维护的,或者是在单页缓冲区中维护的,从总内存和磁盘臂成本的角度来看,哪个更有意义;(有道翻译)
entries in these directory nodes contain separators that channel access to individual single-page nodes below, as in a B-tree.
这些目录节点中的条目包含分隔符,分隔符引导对下面单个单页节点的访问,就像在b -树中一样。(有道翻译)
The intention is to provide efficient exact-match access along a path of single page index nodes down to the leaf level, avoiding multi-page block reads in such a case to minimize memory buffer requirements.
这样做的目的是沿着单页索引节点到叶节点的路径提供高效的精确匹配访问,在这种情况下避免多页块读取,以最小化内存缓冲区需求。(有道翻译)
Thus we read and write multi- page blocks for the rolling merge or for long range retrievals, and single-page nodes for indexed find (exact-match) access.
因此,我们为滚动合并或长范围检索读写多页块,为索引的find(精确匹配)访问读写单页节点。(有道翻译)
A somewhat different architecture that supports such a dichotomy is presented in [21].
一个不同的架构见[21].
Partially full multi-page blocks of C1 directory nodes are usually allowed to remain in buffer while a sequence of leaf node blocks are written out.
在写出叶节点块序列时,C1目录节点的部分满的多页块通常允许留在缓冲区中。(有道翻译)
C1 directory nodes are forced to new positions on disk when:
C1目录在下面的情况会强制写入磁盘新空间:
-A multi-page block buffer containing directory nodes becomes full
多页缓存满
-The root node splits, increasing the depth of the C1 tree (to a depth greater than two)
根分裂,增加一层深度
-A checkpoint is performed
执行检查点

In the first case, the single multi-page block which has filled is written out to disk.
在第一种情况下,已填满的单个多页块被写入磁盘。
In the latter two cases, all multi-page block buffers and directory node buffers are flushed to disk.
后面两种情况,多页块的缓存和目录缓存写入磁盘。

After the rightmost leaf entry of the C0 tree is written out to the C1 tree for the first time, the process starts over on the left end of the two trees, except that now and with successive passes multi-page leaf-level blocks of the C1 tree must be read into buffer and merged with the entries in the C0 tree, thus creating new multi-page leaf blocks of C1 to be written to disk.
最右边的C0树的叶子条目后首次写入C1树,左边的过程重新开始的两棵树,现在除了和连续通过多页面C1树的叶级块必须读入缓冲区和合并C0树中的条目,从而创建新的C1多页叶块,将其写入磁盘。(有道翻译)

Once the merge starts, the situation is more complex.
一旦开始合并,场景就更复杂了。
We picture the rolling merge process in a two component LSM-tree as having a conceptual cursor which slowly circulates in quantized steps through equal key values of the C0 tree and C1 tree components, drawing indexing data out from the C0 tree to the C1 tree on disk.
我们将两个组件lsm树中的滚动合并过程描绘为一个概念性游标,它以量化的步骤缓慢地循环C0树和C1树组件的相同键值,将索引数据从C0树提取到磁盘上的C1树。(有道翻译)。
The rolling merge cursor has a position at the leaf level of the C1 tree and within each higher directory level as well.
位置标记在C1的叶子层上,同时在每一个更高层的目录上。
At each level, all currently merging multi-page blocks of the C1 tree will in general be split into two blocks:
每层,C1的当前多页块一般分成两块:
the "empty- ing" block whose entries have been depleted but which retains information not yet reached by the merge cursor,
空块:保留了已删除但是合并标记还没有执行到的内容,
and the "filling" block which reflects the result of the merge up to this moment.
填充块:合并结果保存。
There will be an analogous "filling node" and "emptying node" defining the cursor which will certainly be buffer resident.
缓存中的填充块和空快都有类似的填充节点和空节点对应。
For concurrent access purposes, both the emptying block and the filling block on each level contain an integral number of page-sized nodes of the C1 tree, which simply happen to be buffer resident.
出于并发操作目的,每层的空快和填充快都有一个整数表示页节点大小,简化缓存操作。
(During the merge step that restructures individual nodes, other types of concurrent access to the entries on those nodes are blocked.
在合并操作重构节点阶段,其他并发操作节点的被阻塞。
) Whenever a complete flush of all buffered nodes to disk is required, all buffered information at each level must be written to new positions on disk (
当一个完整flush缓存操作执行,所有缓存信息写入新磁盘新位置
with positions reflected in superior directory information, and a sequential log entry for recovery purposes).
带有目录信息序列化操作log用于恢复信息)。
At a later point, when the filling block in buffer on some level of the C1 tree fills and must be flushed again, it goes to a new position.
之后,一个填充快需要写入磁盘时,会写入一个新的磁盘位置。
Old information that might still be needed during recovery is never overwritten on disk, only invalidated as new writes succeed with more up-to-date information.
旧的信息仍然可能用于恢复信息,只有更新成功了,信息才失效。
A somewhat more detailed explanation of the rolling merge process is presented in Section 4, where con- currency and recovery designs are considered.
第四章更详细介绍合并处理过程,包括并发和恢复的设计实现。

It is an important efficiency consideration of the LSM-tree that when the rolling merge process on a particular level of the C1 tree passes through nodes at a relatively high rate, all reads and writes are in multi-page blocks.
当在C1树的特定级别上滚动合并进程以相对较高的速度通过节点时,所有读和写都在多页块中,这是lsm树的一个重要效率考虑因素。(有道翻译)
By eliminating seek time and rotational latency, we expect to gain a large advantage over random page I/O involved in normal B-tree entry insertion.
通过消除寻道时间和旋转延迟,我们期望获得比普通b树插入所涉及的随机页面I/O更大的优势。(有道翻译)
(This advantage is analyzed below, in Section 3.2.
详细信息见3.2.
) The idea of always writing multi-page blocks to new locations was inspired by the Log-Structured File System devised by Rosenblum and Ousterhout [23], from which the Log-Structured Merge-tree takes its name.
总是将多页块写入新位置的想法受到了Rosenblum和Ousterhout[23]设计的日志结构文件系统的启发,日志结构合并树就是从这个系统中获得名字的。(有道翻译)
Note that the continuous use of new disk space for fresh multi-page block writes implies that the area of disk being written will wrap, and old discarded blocks must be reused.
请注意,为新的多页块写而持续使用新的磁盘空间意味着要写入的磁盘区域将被包装,并且必须重用旧的丢弃的块。(有道翻译)
This bookkeeping can be done in a memory table; old multi-page blocks are invalidated and reused as single units, and re- covery is guaranteed by the checkpoint.
这种簿记可以在内存表中完成;旧的多页块失效并作为单个单元重用,恢复由检查点保证。(有道翻译)
In the Log-Structured File System, the reuse of old blocks involves significant I/O because blocks are typically only partially freed up, so reuse requires a block read and block write.
在日志结构文件系统中,旧块的重用涉及大量的I/O,因为块通常只被部分释放,所以重用需要读块和写块。(有道翻译)
In the LSM-Tree, blocks are totally freed up on the trailing edge of the rolling merge, so no extra I/O is involved.
在LSM-Tree中,滚动合并的后缘完全释放了块,因此不涉及额外的I/O。(有道翻译)

吐槽:作者的长句子太长了。
todo:在分析一下,部分没有自己翻译

©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 194,319评论 5 459
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 81,801评论 2 371
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 141,567评论 0 319
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 52,156评论 1 263
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 61,019评论 4 355
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 46,090评论 1 272
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 36,500评论 3 381
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 35,192评论 0 253
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 39,474评论 1 290
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 34,566评论 2 309
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 36,338评论 1 326
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 32,212评论 3 312
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 37,572评论 3 298
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 28,890评论 0 17
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 30,169评论 1 250
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 41,478评论 2 341
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 40,661评论 2 335

推荐阅读更多精彩内容

  • 2. The Two Component LSM-Tree Algorithm(3) The rolling me...
    i_need_job阅读 192评论 0 0
  • 2. The Two Component LSM-Tree Algorithm(2) The operation ...
    i_need_job阅读 161评论 0 0
  • 2. The Two Component LSM-Tree Algorithm(1) An LSM-tree is...
    i_need_job阅读 141评论 0 1
  • pyspark.sql模块 模块上下文 Spark SQL和DataFrames的重要类: pyspark.sql...
    mpro阅读 9,423评论 0 13
  • 16宿命:用概率思维提高你的胜算 以前的我是风险厌恶者,不喜欢去冒险,但是人生放弃了冒险,也就放弃了无数的可能。 ...
    yichen大刀阅读 6,018评论 0 4