2.2 Finds in the LSM-tree Index
When an exact-match find or range find requiring immediate response is performed through the LSM-tree index, first the C0 tree and then the C1 tree is searched for the value or values desired.
This may imply a slight CPU overhead compared to the B-tree case, since two directories may need to be searched.
In LSM-trees with more than two components, there may also be an I/O overhead.
To anticipate Chapter 3 somewhat, we define a multi-component LSM-tree as having components C0, C1, C2, ..., CK-1 and CK, indexed tree structures of increasing size, where C0 is memory resident and all other components are disk resident.
There are asynchronous rolling merge processes in progress between all component pairs (Ci-1, Ci); each moves entries out from the smaller to the larger component whenever the smaller component, Ci-1, exceeds its threshold size.
As a rule, in order to guarantee that all entries in the LSM-tree have been examined, it is necessary for an exact-match find or range find to access each component Ci through its index structure.
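The general rule above can be sketched as follows. This is a minimal illustration, not the paper's implementation: each component is modeled as a sorted in-memory list, and the names `Component`, `scan`, and `range_find` are hypothetical. A range find has to consult every component, because a matching entry may sit in any of them.

```python
from bisect import bisect_left, bisect_right


class Component:
    """Hypothetical stand-in for one LSM component Ci: a key-sorted list of (key, value)."""

    def __init__(self, entries=()):
        self.entries = sorted(entries, key=lambda kv: kv[0])

    def scan(self, lo, hi):
        """Return all entries with lo <= key <= hi, using binary search on the keys."""
        keys = [k for k, _ in self.entries]
        return self.entries[bisect_left(keys, lo):bisect_right(keys, hi)]


def range_find(components, lo, hi):
    """Visit every component C0..CK and combine the matches in key order."""
    hits = []
    for component in components:
        hits.extend(component.scan(lo, hi))
    return sorted(hits, key=lambda kv: kv[0])


# Assumed sample data: C0 is the smallest component, C2 the largest.
components = [
    Component([(5, "a")]),
    Component([(3, "b"), (7, "c")]),
    Component([(1, "d"), (6, "e")]),
]
result = range_find(components, 3, 6)
```

Here the matches for the range 3..6 come from three different components, which is why no component can be skipped in the general case.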
However, there are a number of possible optimizations where this search can be limited to an initial subset of the components.
First, where unique index values are guaranteed by the logic of generation, as when timestamps are guaranteed to be distinct, a matching indexed find is complete if it locates the desired value in an early Ci component.
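A short sketch of this first optimization, assuming each component can be modeled as a Python `dict` and that keys are unique across the whole LSM-tree. The function name `exact_find_unique` and the returned probe count are illustrative additions, not part of the paper.

```python
def exact_find_unique(components, key):
    """Search C0, C1, ... in order and stop at the first hit.

    Stopping early is only safe under the assumption that index values are
    unique, so no second copy of `key` can exist in a later component.
    Returns (value, number_of_components_probed).
    """
    for probed, component in enumerate(components, start=1):
        if key in component:
            return component[key], probed
    return None, len(components)


# Assumed sample data: three components holding distinct timestamp keys.
c0 = {1030: "newest"}
c1 = {1010: "older"}
c2 = {1000: "oldest"}

hit_in_c0 = exact_find_unique([c0, c1, c2], 1030)
hit_in_c1 = exact_find_unique([c0, c1, c2], 1010)
miss = exact_find_unique([c0, c1, c2], 9999)
```

A hit in C0 costs one probe and no disk access, while a miss still has to visit every component.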
As another example, we could limit our search when the find criterion uses recent timestamp values so that the entries sought could not yet have migrated out to the largest components.
As the merge cursor circulates through the (Ci, Ci+1) pairs, we will often have reason to retain entries in Ci that have been inserted in the recent past (in the last τi seconds), allowing only the older entries to go out to Ci+1.
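The retention rule can be illustrated with a simplified merge step. This sketch assumes each entry records its insertion time and models components as dicts; the name `merge_step` is hypothetical. It also ignores the cursor-driven, key-ordered nature of a real rolling merge and the threshold-size trigger described earlier.

```python
def merge_step(ci, ci_next, now, tau):
    """Move entries older than `tau` seconds from Ci to Ci+1.

    Both components map key -> (inserted_at, value). Entries inserted within
    the last `tau` seconds stay in Ci, so finds for recent values never need
    to look in the larger component.
    """
    for key in list(ci):  # copy the keys because ci is modified in the loop
        inserted_at, _value = ci[key]
        if now - inserted_at > tau:
            ci_next[key] = ci.pop(key)


# Assumed sample data: at time 200 with tau = 30, only "b" is recent.
ci = {"a": (100, "old entry"), "b": (190, "recent entry")}
ci_next = {}
merge_step(ci, ci_next, now=200, tau=30)
```

After the step, the entry inserted at time 190 remains in Ci and the entry inserted at time 100 has migrated to Ci+1.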
In cases where the most frequent find references are to recently inserted values, many finds can be completed in the C0 tree, and so the C0 tree fulfills a valuable memory buffering function.
This point was also made in [23], and represents an important efficiency consideration.
For example, indexes to short-term transaction UNDO logs accessed in the event of an abort will have a large proportion of accesses in a relatively short time-span after creation, and we can expect most of these indexes to remain memory resident.
By keeping track of the start-time for each transaction we can guarantee that all logs for a transaction started in the last τ0 seconds, for example, will be found in component C0, without recourse to disk components.
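The start-time check might look like the sketch below. It assumes the retention guarantee holds for C0 (nothing inserted in the last `tau0` seconds has been merged out) and models each component as a dict from transaction id to a list of log records. The function `find_undo_logs` and its parameters are hypothetical.

```python
def find_undo_logs(txn_id, txn_start, now, tau0, c0, disk_components):
    """Collect UNDO log records for one transaction.

    If the transaction started within the last `tau0` seconds, every one of
    its log records is newer than that, so all of them must still be in C0
    and the disk components are skipped. Returns (logs, components_probed).
    """
    logs = list(c0.get(txn_id, []))
    probed = ["C0"]
    if now - txn_start > tau0:
        # Older transaction: some records may already have migrated to disk.
        for i, component in enumerate(disk_components, start=1):
            logs.extend(component.get(txn_id, []))
            probed.append(f"C{i}")
    return logs, probed


# Assumed sample data: T2 is recent, T1 started long ago.
c0 = {"T2": ["undo-3", "undo-4"], "T1": ["undo-2"]}
c1 = {"T1": ["undo-1"]}

recent = find_undo_logs("T2", txn_start=995, now=1000, tau0=10, c0=c0, disk_components=[c1])
old = find_undo_logs("T1", txn_start=900, now=1000, tau0=10, c0=c0, disk_components=[c1])
```

The recent transaction is answered from memory alone, while the older one falls back to the full search across components.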