6.1 Extensions of LSM-tree Application
(有道翻译)
To begin with, it should be clear that the LSM-tree entries could themselves contain records rather than RIDs pointing to records elsewhere on disk. This means that the records themselves can be clustered by their keyvalue. The cost for this is larger entries and a concomitant ac-celeration of the rate of insert R in bytes per second and therefore of cursor movement and total I/O rate H. However, as we saw in Example 3.3 a three component LSM-tree should be able to provide the necessary circulation at a cost of the disk media to store the records and index, and all of this disk media would be needed in any event to store the rows in a non-clustered manner.
首先,应该清楚的是,LSM-tree条目本身可以包含记录,而不是指向磁盘上其他地方的记录的rid。这意味着记录本身可以根据它们的键值进行集群。这是更大的条目和相应的成本慢慢的插入R字节每秒的速度,因此光标移动和总I / O率h。然而,当我们看到在例3.3三个组件LSM-tree应该能够提供必要的流通成本的磁盘媒体存储记录和索引,所有这些磁盘介质在任何情况下都需要以非集群方式存储行。
Advantages of clustering might have quite important performance implications. For example, consider the Escrow transactional method [20], which serves as a good layer to support work- flow management because of the non-blocking nature of long-lived updates. In the Escrow method, a number of incremental changes to various aggregate Escrow fields can be generated by a long-lived transaction transaction. The approach used is to set aside the incremental amount requested (Escrow quantity) and unlock the aggregate record for concurrent requests. We need to keep logs for these Escrow quantities, and we can think of two possible clustering indexes for these logs: Transaction ID (TID) of the generating transaction, and Field ID (FID) of the field on which the Escrow quantity was taken. We might easily have twenty Escrow logs with a single TID in existence over an extended period (extended enough so that the logs are no longer be memory resident in classical log structures), and clustering by TID would be important up until the time when the transaction performs a commit or abort, which determines the ultimate effect these logs will have. In the event of a commit, the quantity taken out of the field would be permanent and the log can simply be forgotten, but in the event of an abort we would like to return the quantity to the field specified by the log's FID. A certain amount of speed is called for. In processing an abort, the logs of an aborted transaction should be accessed (clustering by TID is an important advantage) and fields with corresponding FID should be corrected. However, if the field is not memory resident, rather than read in the containing record the log can be re- inverted (placed in a different LSM-tree) clustered by its FID. Then when an Escrow field is read back into memory, we will try to access all logs clustered by FID that might have some update to perform; again there might be a large number of logs accessed, and clustering these logs in an LSM-tree is an important savings. Using LSM-trees to cluster Escrow logs first by TID, then by FID when the associated field is not in memory, will save a large number of I/Os where long-lived transactions make large numbers of updates to cold or warm data. This ap- proach is an improvement over the "extended field" concept of [20].
集群的优点可能具有非常重要的性能影响。例如,考虑托管事务方法[20],由于长时间更新的非阻塞特性,它可以作为一个很好的层来支持工作流管理。在托管方法中,可以通过一个长期存在的事务事务生成对各种集合托管字段的许多增量更改。所使用的方法是留出请求的增量数量(托管数量),并为并发请求解锁聚合记录。我们需要为这些托管数量保留日志,我们可以为这些日志考虑两个可能的聚类索引:生成事务的事务ID (TID)和获取托管数量的字段的字段ID (FID)。我们可能很容易有二十托管日志与单个TID存在很长一段时间内(足够扩展日志不再是内存常驻在经典日志结构),和集群TID会重要到事务执行提交或中止时,这决定了这些日志的最终效果。在提交的情况下,从字段中取出的数量将是永久的,日志可以简单地忘记,但在中止的情况下,我们希望将数量返回到日志的FID指定的字段。需要一定的速度。在处理中止时,应该访问中止事务的日志(通过TID进行集群是一个重要的优势),并且应该更正带有相应FID的字段。然而,如果该字段不是内存驻留的,而不是在包含的记录中读取,则可以由其FID集群将日志重新倒置(放置在不同的lsm树中)。然后,当托管字段被读入内存时,我们将尝试访问所有由FID集群的日志,这些日志可能需要执行一些更新;同样,可能会访问大量的日志,将这些日志聚集到lsm树中是一个重要的节省。使用lsm -树首先由TID对托管日志进行集群,然后在关联字段不在内存中的情况下由FID对其进行集群,这将节省大量I/ o,在这些I/ o中,长寿命事务将对冷数据或暖数据进行大量更新。这种方法是对[20]的“扩展域”概念的改进。
Another possible variation to the LSM-tree algorithm mentioned at the end of Section 2.2 is the possibility of retaining recent entries (generated in the last τi seconds) in component Ci rather then letting them migrate out to Ci+1. A number of alternatives are suggested by this idea. One variation suggests that during cursor circulation, a time-key index such as that provided by the TSB-tree might be generated. The rolling merge can be used to provide great efficiency for new version inserts, and the multi-component structure suggests a final component migration to write-once storage, with a good deal of control over archival time-key indexing. This approach clearly deserves further study, and has been the subject of a conference paper [22].
在第2.2节末尾提到的lsm树算法的另一种可能的变化是可能保留成分Ci中最近的条目(在最后τi秒内产生),而不是让它们迁移到Ci+1。这个想法提出了许多替代方案。一种变化表明,在游标循环期间,可能会生成tsb树所提供的时间键索引。滚动合并可用于提高新版本插入的效率,多组件结构建议将最终组件迁移到一次写存储,并对归档时间键索引进行了大量控制。这种方法显然值得进一步研究,并已成为会议文件[22]的主题。
Other ideas for further research include the following.
其他进一步研究的想法包括以下几点。
(1) Extend the cost analysis approach of Theorem 3.1 and Example 3.3 to situations where some proportion of find operations must be balanced with the merge for purposes of I/O bal- ancing. Because of tha added load on the disks, it will no longer be possible to assign all of the disk I/O capacity to rolling merge operations and optimize for that case. Some proportion of the disk capacity will have to be set aside for the find operation workload. Other ways to extend the cost analysis are to allow for deletions prior to migration to component CK and consider re- taining some proportion of recent entries in the inner component Ci-1 during the (Ci-1, Ci) merge.
(1)将定理3.1和例3.3的成本分析方法扩展到某些情况下,为了实现I/O全局化,find操作的一部分必须与合并操作平衡。由于磁盘上增加了负载,将不再可能将所有磁盘I/O容量分配给滚动合并操作并针对这种情况进行优化。必须为查找操作工作负载预留磁盘容量的一部分。扩展成本分析的其他方法是允许在迁移到组件CK之前进行删除,并考虑在(Ci-1, Ci)合并期间在内部组件Ci-1中重新保留一些最近的条目。
(2) It is clear that we can offload the CPU work to maintain the LSM-tree so that this doesn't have to be done by the CPU that produces the log records. We merely need to communicate the logs to the other CPU and then communicate later find requests as well. In cases where there is shared memory, it is possible that finds can be done almost without added latency. The design for such distributed work needs to be carefully thought out.
(2)很明显,我们可以卸下CPU的工作来维护LSM-tree,这样就不需要由产生日志记录的CPU来完成。我们只需要将日志通信到另一个CPU,然后再通信查找请求。在有共享内存的情况下,查找几乎可以在不增加延迟的情况下完成。这种分布式工作的设计需要仔细考虑。
Acknowledgments
The authors would like to acknowledge the assistance of Jim Gray and Dave Lomet, both of whom read an early version of this paper and made valuable suggestions for improvement. In addition, the reviewers for this journal article made many valuable suggestions.
作者感谢Jim Gray和Dave Lomet的帮助,他们都阅读了本文的早期版本,并提出了宝贵的改进建议。此外,本期刊文章的审稿人也提出了许多有价值的建议。
todo:自己翻译