LSM-Tree(54)

4.2. Recovery in the LSM-tree(4)

This recovery approach clearly works, and its only drawback is that there is a possibly large pause while various disk writes take place during the checkpoint process. This pause is not terribly significant, however, since we can write the C0 component to disk in a short period and then resume inserts to the C0 component while the rest of the writes to disk complete; this will simply result in a longer than usual latency period during which index entries newly inserted to C0 are not merged out to larger disk-based components. Once the checkpoint is complete, the rolling merge process can catch up on work it has missed. Note that the last piece of information mentioned in the checkpoint log list above was the current information for dynamic allocation of new multi-page blocks. In the case of a crash, we will need to figure out in recovery what multi-page blocks are available in our dynamic disk storage allocation algorithm. This is clearly not a difficult problem; in fact a more difficult problem of garbage collecting fragmented information within such a block had to be solved in [23].
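To make the checkpoint-and-recover idea concrete, here is a minimal sketch, not the paper's implementation. It assumes a toy in-memory model: `TinyLSM`, `Checkpoint`, `recover`, and the integer block numbers are all hypothetical names invented for illustration, a Python list stands in for the durable log, and the rolling merge itself is omitted. The checkpoint records the log position, a snapshot of C0, and the set of allocated multi-page blocks; recovery restores the snapshot, replays the log entries written after it, and derives the free blocks from the allocation information.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass(frozen=True)
class Checkpoint:
    lsn: int                      # log position when the checkpoint was taken
    c0_snapshot: dict             # C0 contents written out during the checkpoint
    allocated_blocks: frozenset   # multi-page blocks in use at checkpoint time


@dataclass
class TinyLSM:
    total_blocks: int
    c0: dict = field(default_factory=dict)
    log: list = field(default_factory=list)      # stands in for the durable log
    allocated: set = field(default_factory=set)  # multi-page blocks in use
    checkpoint: Optional[Checkpoint] = None

    def insert(self, key, value):
        self.log.append((key, value))  # log first, then update the memory component
        self.c0[key] = value

    def take_checkpoint(self):
        # Copying C0 is the short pause; inserts may resume right afterwards.
        self.checkpoint = Checkpoint(
            len(self.log), dict(self.c0), frozenset(self.allocated)
        )

    def free_blocks(self):
        # Blocks available to the allocator: everything not recorded as in use.
        return set(range(self.total_blocks)) - self.allocated


def recover(log, checkpoint, total_blocks):
    """Rebuild state from the last checkpoint plus the log written after it."""
    lsm = TinyLSM(total_blocks)
    lsm.log = list(log)
    lsm.c0 = dict(checkpoint.c0_snapshot)
    lsm.allocated = set(checkpoint.allocated_blocks)
    lsm.checkpoint = checkpoint
    for key, value in log[checkpoint.lsn:]:  # replay inserts made after the checkpoint
        lsm.c0[key] = value
    return lsm
```

In this simplified model, any block allocated after the checkpoint is treated as free again on recovery, which is safe only because everything written after the checkpoint is rebuilt from the log.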

Another detail of recovery has to do with directory information. Note that as the rolling merge progresses, each time a multi-page block or a higher level directory node is brought in from disk to be emptied it must immediately be assigned a new disk position in case a checkpoint occurs before the emptying is completed and remaining buffered information must be forced out to disk. This means that the directory entries pointing down to the emptying nodes must be immediately corrected to point to the new node locations. Similarly we must immediately assign a disk position for newly created nodes so that directory entries in the tree will be able to point immediately to the appropriate position on disk. At every point we need to take care that directory nodes containing pointers to lower-level nodes buffered by a rolling merge are also buffered; only in this way can we make all necessary modifications quickly so that a checkpoint will not be held up waiting for I/Os to correct directories. Furthermore, after a checkpoint occurs and the multi-page blocks are read back into memory buffers to continue the rolling merge, all the blocks involved must be assigned to a new disk position, and thus all directory pointers to subsidiary nodes must be corrected. If this sounds like a great deal of work the reader should recall that there is no additional I/O necessary and the number of pointers involved is probably only about 64 for each block buffered. Furthermore these changes should be amortized over a large number of merged nodes, assuming that the checkpoints are only taken frequently enough to keep recovery time from growing beyond a few minutes; this implies a few minutes of I/O between checkpoints.
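The pointer correction described above can be sketched as follows. This is an illustrative model under stated assumptions, not the paper's data layout: `DirectoryNode`, `relocate_child`, the `buffered` dictionary keyed by disk position, and the counter used as an allocator are all hypothetical. The point it shows is that relocating a node that has been brought in for emptying is a purely in-memory update, as long as the parent directory node is buffered too.

```python
import itertools
from dataclasses import dataclass, field


@dataclass
class DirectoryNode:
    disk_pos: int
    # Directory entries: disk positions of the child nodes this node points to.
    child_positions: list = field(default_factory=list)


def relocate_child(directory, index, buffered, next_pos):
    """Give a buffered child node a new disk position and fix the parent's pointer.

    `buffered` maps disk position -> node contents held in memory;
    `next_pos` is an iterator yielding unused disk positions (assumed allocator).
    """
    old_pos = directory.child_positions[index]
    new_pos = next(next_pos)
    buffered[new_pos] = buffered.pop(old_pos)   # the node now lives at new_pos
    directory.child_positions[index] = new_pos  # corrected immediately, no I/O
    return new_pos
```

For example, with a directory pointing at positions 10 and 11 and an allocator starting at 100, relocating the first child leaves the directory pointing at 100 and 11. If a checkpoint then forces the buffers out, the directory already refers to the position the node will be written to.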

TODO: translate this section myself and reread it carefully.
