LSM-tree 6.1 Extensions of LSM-tree Application

6.1 Extensions of LSM-tree Application

(有道翻译)

To begin with, it should be clear that the LSM-tree entries could themselves contain records rather than RIDs pointing to records elsewhere on disk. This means that the records themselves can be clustered by their keyvalue. The cost for this is larger entries and a concomitant ac-celeration of the rate of insert R in bytes per second and therefore of cursor movement and total I/O rate H. However, as we saw in Example 3.3 a three component LSM-tree should be able to provide the necessary circulation at a cost of the disk media to store the records and index, and all of this disk media would be needed in any event to store the rows in a non-clustered manner.
首先,应该清楚的是,LSM-tree条目本身可以包含记录,而不是指向磁盘上其他地方的记录的rid。这意味着记录本身可以根据它们的键值进行集群。这是更大的条目和相应的成本慢慢的插入R字节每秒的速度,因此光标移动和总I / O率h。然而,当我们看到在例3.3三个组件LSM-tree应该能够提供必要的流通成本的磁盘媒体存储记录和索引,所有这些磁盘介质在任何情况下都需要以非集群方式存储行。

Advantages of clustering might have quite important performance implications. For example, consider the Escrow transactional method [20], which serves as a good layer to support work- flow management because of the non-blocking nature of long-lived updates. In the Escrow method, a number of incremental changes to various aggregate Escrow fields can be generated by a long-lived transaction transaction. The approach used is to set aside the incremental amount requested (Escrow quantity) and unlock the aggregate record for concurrent requests. We need to keep logs for these Escrow quantities, and we can think of two possible clustering indexes for these logs: Transaction ID (TID) of the generating transaction, and Field ID (FID) of the field on which the Escrow quantity was taken. We might easily have twenty Escrow logs with a single TID in existence over an extended period (extended enough so that the logs are no longer be memory resident in classical log structures), and clustering by TID would be important up until the time when the transaction performs a commit or abort, which determines the ultimate effect these logs will have. In the event of a commit, the quantity taken out of the field would be permanent and the log can simply be forgotten, but in the event of an abort we would like to return the quantity to the field specified by the log's FID. A certain amount of speed is called for. In processing an abort, the logs of an aborted transaction should be accessed (clustering by TID is an important advantage) and fields with corresponding FID should be corrected. However, if the field is not memory resident, rather than read in the containing record the log can be re- inverted (placed in a different LSM-tree) clustered by its FID. Then when an Escrow field is read back into memory, we will try to access all logs clustered by FID that might have some update to perform; again there might be a large number of logs accessed, and clustering these logs in an LSM-tree is an important savings. Using LSM-trees to cluster Escrow logs first by TID, then by FID when the associated field is not in memory, will save a large number of I/Os where long-lived transactions make large numbers of updates to cold or warm data. This ap- proach is an improvement over the "extended field" concept of [20].
集群的优点可能具有非常重要的性能影响。例如,考虑托管事务方法[20],由于长时间更新的非阻塞特性,它可以作为一个很好的层来支持工作流管理。在托管方法中,可以通过一个长期存在的事务事务生成对各种集合托管字段的许多增量更改。所使用的方法是留出请求的增量数量(托管数量),并为并发请求解锁聚合记录。我们需要为这些托管数量保留日志,我们可以为这些日志考虑两个可能的聚类索引:生成事务的事务ID (TID)和获取托管数量的字段的字段ID (FID)。我们可能很容易有二十托管日志与单个TID存在很长一段时间内(足够扩展日志不再是内存常驻在经典日志结构),和集群TID会重要到事务执行提交或中止时,这决定了这些日志的最终效果。在提交的情况下,从字段中取出的数量将是永久的,日志可以简单地忘记,但在中止的情况下,我们希望将数量返回到日志的FID指定的字段。需要一定的速度。在处理中止时,应该访问中止事务的日志(通过TID进行集群是一个重要的优势),并且应该更正带有相应FID的字段。然而,如果该字段不是内存驻留的,而不是在包含的记录中读取,则可以由其FID集群将日志重新倒置(放置在不同的lsm树中)。然后,当托管字段被读入内存时,我们将尝试访问所有由FID集群的日志,这些日志可能需要执行一些更新;同样,可能会访问大量的日志,将这些日志聚集到lsm树中是一个重要的节省。使用lsm -树首先由TID对托管日志进行集群,然后在关联字段不在内存中的情况下由FID对其进行集群,这将节省大量I/ o,在这些I/ o中,长寿命事务将对冷数据或暖数据进行大量更新。这种方法是对[20]的“扩展域”概念的改进。

Another possible variation to the LSM-tree algorithm mentioned at the end of Section 2.2 is the possibility of retaining recent entries (generated in the last τi seconds) in component Ci rather then letting them migrate out to Ci+1. A number of alternatives are suggested by this idea. One variation suggests that during cursor circulation, a time-key index such as that provided by the TSB-tree might be generated. The rolling merge can be used to provide great efficiency for new version inserts, and the multi-component structure suggests a final component migration to write-once storage, with a good deal of control over archival time-key indexing. This approach clearly deserves further study, and has been the subject of a conference paper [22].
在第2.2节末尾提到的lsm树算法的另一种可能的变化是可能保留成分Ci中最近的条目(在最后τi秒内产生),而不是让它们迁移到Ci+1。这个想法提出了许多替代方案。一种变化表明,在游标循环期间,可能会生成tsb树所提供的时间键索引。滚动合并可用于提高新版本插入的效率,多组件结构建议将最终组件迁移到一次写存储,并对归档时间键索引进行了大量控制。这种方法显然值得进一步研究,并已成为会议文件[22]的主题。

Other ideas for further research include the following.
其他进一步研究的想法包括以下几点。

(1) Extend the cost analysis approach of Theorem 3.1 and Example 3.3 to situations where some proportion of find operations must be balanced with the merge for purposes of I/O bal- ancing. Because of tha added load on the disks, it will no longer be possible to assign all of the disk I/O capacity to rolling merge operations and optimize for that case. Some proportion of the disk capacity will have to be set aside for the find operation workload. Other ways to extend the cost analysis are to allow for deletions prior to migration to component CK and consider re- taining some proportion of recent entries in the inner component Ci-1 during the (Ci-1, Ci) merge.
(1)将定理3.1和例3.3的成本分析方法扩展到某些情况下,为了实现I/O全局化,find操作的一部分必须与合并操作平衡。由于磁盘上增加了负载,将不再可能将所有磁盘I/O容量分配给滚动合并操作并针对这种情况进行优化。必须为查找操作工作负载预留磁盘容量的一部分。扩展成本分析的其他方法是允许在迁移到组件CK之前进行删除,并考虑在(Ci-1, Ci)合并期间在内部组件Ci-1中重新保留一些最近的条目。

(2) It is clear that we can offload the CPU work to maintain the LSM-tree so that this doesn't have to be done by the CPU that produces the log records. We merely need to communicate the logs to the other CPU and then communicate later find requests as well. In cases where there is shared memory, it is possible that finds can be done almost without added latency. The design for such distributed work needs to be carefully thought out.
(2)很明显,我们可以卸下CPU的工作来维护LSM-tree,这样就不需要由产生日志记录的CPU来完成。我们只需要将日志通信到另一个CPU,然后再通信查找请求。在有共享内存的情况下,查找几乎可以在不增加延迟的情况下完成。这种分布式工作的设计需要仔细考虑。

Acknowledgments

The authors would like to acknowledge the assistance of Jim Gray and Dave Lomet, both of whom read an early version of this paper and made valuable suggestions for improvement. In addition, the reviewers for this journal article made many valuable suggestions.
作者感谢Jim Gray和Dave Lomet的帮助,他们都阅读了本文的早期版本,并提出了宝贵的改进建议。此外,本期刊文章的审稿人也提出了许多有价值的建议。

todo:自己翻译

最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 215,133评论 6 497
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 91,682评论 3 390
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 160,784评论 0 350
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 57,508评论 1 288
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 66,603评论 6 386
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 50,607评论 1 293
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,604评论 3 415
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 38,359评论 0 270
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 44,805评论 1 307
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 37,121评论 2 330
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 39,280评论 1 344
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 34,959评论 5 339
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,588评论 3 322
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 31,206评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,442评论 1 268
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 47,193评论 2 367
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 44,144评论 2 352

推荐阅读更多精彩内容