2022-09-25 ES Storage Design (Part 1)

Study materials:

1.https://www.elastic.co/cn/blog/found-dive-into-elasticsearch-storage

2.https://alibaba-cloud.medium.com/analysis-of-lucene-basic-concepts-5ff5d8b90a53

3.https://dzone.com/refcardz/lucene

4.https://stackoverflow.com/questions/2602253/how-does-lucene-index-documents

5.https://lucene.apache.org/core/3_5_0/fileformats.html

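Goals:

Learn Lucene's storage design.

Studying ES itself can wait for now, because ES involves a lot of distributed-systems material; for the moment the focus is on single-node storage design.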

Definitions

Official site: basic definitions

Segment

Basic storage design

When Lucene writes data, it first writes to an in-memory buffer (similar to the MemTable in an LSM tree, but not readable). When the buffer reaches a certain size, it is flushed to disk as a segment. Each segment has its own independent index and is independently searchable, but its data can never be changed. This scheme avoids random writes: data is written in batches or appended, which gives high write throughput. Documents written to a segment cannot be modified, but they can be deleted. Deletion does not touch the document in its original location inside the segment; instead, the DocID of the deleted document is recorded in a separate file, so the data files themselves are never modified. A query therefore has to search multiple segments, merge the results, and filter out deleted documents. To keep queries efficient, Lucene has a policy for merging multiple segments, similar to the way an LSM tree merges SSTables.
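To make the write path above concrete, here is a minimal sketch in Python. It is a toy model, not Lucene's implementation: the names ToyIndex, Segment and max_buffered_docs are made up for illustration, and the "separate file" for deletions is modelled as an in-memory set.

```python
from collections import defaultdict


class Segment:
    """An immutable mini-index: term -> sorted list of local doc IDs."""

    def __init__(self, docs):
        self.docs = tuple(docs)  # stored text, never modified after flush
        postings = defaultdict(list)
        for doc_id, text in enumerate(self.docs):
            for term in sorted(set(text.lower().split())):
                postings[term].append(doc_id)
        self.postings = dict(postings)
        # Deletions live beside the data (Lucene keeps them in a separate file).
        self.deleted = set()

    def search(self, term):
        hits = self.postings.get(term.lower(), [])
        return [d for d in hits if d not in self.deleted]


class ToyIndex:
    """Hypothetical index: a write buffer plus a list of flushed segments."""

    def __init__(self, max_buffered_docs=2):
        self.max_buffered_docs = max_buffered_docs
        self.buffer = []    # in memory and NOT searchable until flushed
        self.segments = []

    def add(self, text):
        self.buffer.append(text)
        if len(self.buffer) >= self.max_buffered_docs:
            self.flush()

    def flush(self):
        if self.buffer:
            self.segments.append(Segment(self.buffer))
            self.buffer = []

    def delete(self, seg_no, doc_id):
        # Only the deletion marker changes; the segment data is untouched.
        self.segments[seg_no].deleted.add(doc_id)

    def search(self, term):
        # Query every segment and merge the per-segment results.
        return [(n, d) for n, seg in enumerate(self.segments)
                for d in seg.search(term)]

    def merge(self):
        # Rewrite the live docs into one new segment, dropping deleted ones.
        live = [t for seg in self.segments
                for d, t in enumerate(seg.docs) if d not in seg.deleted]
        self.segments = [Segment(live)] if live else []
```

In this sketch a document added with add() does not show up in search() until the buffer is flushed, delete() leaves seg.docs unchanged, and merge() is the only step that physically drops deleted documents.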

Types of fields

1. In Lucene, fields may be stored, in which case their text is stored in the index literally, in a non-inverted manner.

2. Fields that are inverted are called indexed.

3. The text of a field may be tokenized into terms to be indexed, or the text of a field may be used literally as a term to be indexed.
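The three properties are independent of each other, which a small sketch may make clearer. This is a toy model in Python, not Lucene's Field API: the Field dataclass, index_document, and the sample field values are all made up for illustration.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Field:
    name: str
    value: str
    stored: bool = False     # keep the literal text for retrieval
    indexed: bool = False    # add the field to the inverted index
    tokenized: bool = False  # split the text into terms before indexing


def index_document(doc_id, fields, inverted, stored):
    """Route each field to the stored area, the inverted index, or both."""
    for f in fields:
        if f.stored:
            stored.setdefault(doc_id, {})[f.name] = f.value
        if f.indexed:
            # Tokenized: one term per word. Otherwise the whole value is one term.
            terms = f.value.lower().split() if f.tokenized else [f.value]
            for term in terms:
                inverted.setdefault((f.name, term), set()).add(doc_id)


inverted, stored = {}, {}
index_document(0, [
    Field("title", "Lucene in Action", stored=True, indexed=True, tokenized=True),
    Field("id", "DOC-42", stored=True, indexed=True),  # indexed as one literal term
    Field("body", "segments are immutable", indexed=True, tokenized=True),  # searchable, not stored
], inverted, stored)
```

Here "body" can be searched but its original text cannot be read back, while "id" can only be matched by its exact value.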

Field Infos

1. .fnm: the definitions of the fields

2. .fdx: the field index, which contains pointers into the field data

3. .fdt: the stored fields for documents
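The relationship between the field index and the field data can be sketched as a fixed-width pointer file sitting next to a variable-length data file. The Python below is only an illustration of that idea: the record layout (a 4-byte length prefix followed by UTF-8 text) and the function names are assumptions, and the real files carry more information per document.

```python
import struct


def write_stored_fields(docs):
    """Return (fdt, fdx) byte strings for a list of stored text values."""
    fdt, fdx = bytearray(), bytearray()
    for text in docs:
        fdx += struct.pack(">Q", len(fdt))  # fixed 8-byte pointer per document
        data = text.encode("utf-8")
        fdt += struct.pack(">I", len(data)) + data  # variable-length record
    return bytes(fdt), bytes(fdx)


def read_stored_field(fdt, fdx, doc_id):
    """Fetch one document's stored text without scanning the data file."""
    # Fixed-width index entries allow a direct jump: entry doc_id is at doc_id * 8.
    (offset,) = struct.unpack_from(">Q", fdx, doc_id * 8)
    (length,) = struct.unpack_from(">I", fdt, offset)
    return fdt[offset + 4: offset + 4 + length].decode("utf-8")
```

Because every index entry has the same width, looking up document N costs one read in the index plus one read in the data file, regardless of how long the earlier documents are.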

Term Infos

1. .tis: part of the term dictionary; stores the term info

2. .tii: the index into the Term Infos file
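One way to picture the two files: the term infos file holds every term in sorted order, and the index keeps only every Nth term so that it is small enough to hold in memory. The Python sketch below illustrates that sparse-index idea only; it is not the on-disk format, and INDEX_INTERVAL, build_term_files and lookup are hypothetical names with a deliberately tiny interval.

```python
from bisect import bisect_right

INDEX_INTERVAL = 4  # hypothetical small value chosen for the demo


def build_term_files(term_infos):
    """term_infos: dict of term -> info. Returns (tis, tii)."""
    tis = sorted(term_infos.items())  # full sorted term dictionary
    # Sparse index: every INDEX_INTERVAL-th term plus its position in tis.
    tii = [(tis[i][0], i) for i in range(0, len(tis), INDEX_INTERVAL)]
    return tis, tii


def lookup(tis, tii, term):
    """Find a term's info via the sparse index, or return None."""
    # Binary-search the small index for the last entry <= term ...
    pos = bisect_right([t for t, _ in tii], term) - 1
    if pos < 0:
        return None
    start = tii[pos][1]
    # ... then scan at most INDEX_INTERVAL entries of the full dictionary.
    for t, info in tis[start:start + INDEX_INTERVAL]:
        if t == term:
            return info
    return None
```

A larger interval makes the index smaller but lengthens the scan in the full dictionary, so the interval is a memory versus lookup-time trade-off.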

Other

all