The Elasticsearch Data Write Process

A simple flowchart of the write process

1. Process

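Note: reading "Direct I/O and Buffer I/O" first will make this article easier to follow.
(1) After a node receives a write, the data is not written straight to disk. It first goes into an in-memory index buffer and is appended to the translog file at the same time (a setting controls whether the translog write is fsynced to disk or only reaches the page cache). Once the data in the index buffer reaches a threshold, a refresh is triggered and the data is written out as a segment file.
(2) Once the amount of data in the translog file reaches a threshold, a flush is triggered: the translog is cleared and a commit point file is generated.
Note: ES writes data by producing segment files and merging them later, mainly so that writes are sequential and lock-free.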
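
The two steps above can be sketched as a toy in-memory model. This is only an illustration, not Elasticsearch code: the class `ToyShard` and the thresholds `buffer_limit` and `translog_limit` are hypothetical names, and the thresholds count documents rather than bytes to keep the example small.

```python
from dataclasses import dataclass, field


@dataclass
class ToyShard:
    """Toy model of the write path described above (hypothetical, not ES code)."""
    buffer_limit: int = 3      # assumed: docs held in the index buffer before a refresh
    translog_limit: int = 5    # assumed: translog entries before a flush
    index_buffer: list = field(default_factory=list)
    translog: list = field(default_factory=list)
    cached_segments: list = field(default_factory=list)     # segments in the page cache
    committed_segments: list = field(default_factory=list)  # fsynced, listed in the commit point

    def index(self, doc):
        # A write goes to the index buffer and to the translog at the same time.
        self.index_buffer.append(doc)
        self.translog.append(doc)
        if len(self.index_buffer) >= self.buffer_limit:
            self.refresh()
        if len(self.translog) >= self.translog_limit:
            self.flush()

    def refresh(self):
        # Turn the buffered docs into a new segment and empty the buffer.
        if self.index_buffer:
            self.cached_segments.append(tuple(self.index_buffer))
            self.index_buffer.clear()

    def flush(self):
        # Refresh, persist all cached segments (the commit point), clear the translog.
        self.refresh()
        self.committed_segments.extend(self.cached_segments)
        self.cached_segments.clear()
        self.translog.clear()


shard = ToyShard()
for doc_id in range(5):
    shard.index(doc_id)
print(shard.committed_segments, shard.translog)
```

With these assumed limits, the third document triggers a refresh and the fifth triggers a flush, after which the translog is empty and both segments are committed.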

2. Concepts

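(1) refresh: writes the data in the index buffer to the page cache and empties the index buffer.
(2) flush: "An Elasticsearch flush is the process of performing a Lucene commit and starting a new translog." In other words, it triggers a refresh, deletes the old translog file (so that the translog does not grow so large that a crashed node takes a long time to recover), and generates a commit point.
(3) commit point: records all segments of each index (only data that has already been fsynced, not the part still in the page cache). This file is also updated when segments are merged.
(4) index buffer: an in-memory buffer. Each node has a single index buffer, shared by all of its shards. Data is sorted and compressed in the index buffer.
(5) translog: shard-level; each shard has its own translog.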

3. Data integrity and failure recovery

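If a crash happens while data is still only in the index buffer, the data that has not been persisted is lost. Thanks to the translog mechanism, a recovering node replays that data from the translog file. (If the translog is not synchronously fsynced from the page cache to the file, up to {index.translog.sync_interval} worth of data can still be lost.)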
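
The recovery idea can be illustrated with a small sketch. It is a simplified model under stated assumptions, not how Elasticsearch implements recovery: the function `recover` and its parameter `fsynced_upto` (how many translog operations reached disk before the crash) are hypothetical.

```python
def recover(committed_docs, translog_ops, fsynced_upto):
    """Rebuild the shard contents after a crash in this toy model.

    committed_docs: docs in segments covered by the last commit point
    translog_ops:   operations appended to the translog since the last flush
    fsynced_upto:   assumed number of translog operations already fsynced to disk
    """
    # Only translog operations that reached disk survive the crash.
    durable_ops = translog_ops[:fsynced_upto]
    # Replay them on top of the committed data.
    return list(committed_docs) + durable_ops


committed = ["a", "b"]
pending = ["c", "d", "e"]

# Synchronous translog writes: every acknowledged operation is on disk.
print(recover(committed, pending, fsynced_upto=len(pending)))

# Periodic fsync: operations written after the last fsync are lost.
print(recover(committed, pending, fsynced_upto=1))
```

The second call models the case in the parenthesis above: whatever was written after the last periodic fsync cannot be replayed.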

4. Parameters

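Parameter	Notes
index.translog.durability	request: after every request, the data in the page cache is fsynced to disk (synchronous write). async: data is fsynced to disk once every {index.translog.sync_interval}. Since Elasticsearch v2.0 the default is request; if losing a small amount of data is acceptable, async can be used.
index.translog.sync_interval	How often data in the page cache is fsynced to the translog file. Default: 5s.
index.translog.flush_threshold_size	The translog file size that triggers a flush. Default: 512MB. ~~Raising this threshold reduces the number of merges but increases recovery time.~~
~~index.translog.retention.size~~	Same as above
~~index.translog.retention.age~~	Same as above
indices.memory.index_buffer_size	Sets the size of the index buffer. Default: 10%, that is, 10% of the heap. This is a global setting and cannot be configured per index.
index.refresh_interval	Sets how often a refresh happens. Default: 1s. It can be set to -1, in which case a refresh is triggered only once the index buffer is full. This is an index-level setting and can be configured separately for each index.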
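
As an illustration of how the index-level parameters fit together, the sketch below builds a JSON body that could be sent to Elasticsearch's update index settings API (`PUT /<index>/_settings`). The helper name `translog_settings_body` is hypothetical, and whether a given setting can be changed on an open index depends on the Elasticsearch version, so treat this as a starting point rather than a recipe. The node-level `indices.memory.index_buffer_size` is not included because it is configured in `elasticsearch.yml`.

```python
import json


def translog_settings_body(durability="request",
                           flush_threshold_size="512mb",
                           refresh_interval="1s"):
    """Build an index settings body (hypothetical helper, defaults taken from the table)."""
    if durability not in ("request", "async"):
        raise ValueError("durability must be 'request' or 'async'")
    return {
        "index": {
            "translog": {
                "durability": durability,
                "flush_threshold_size": flush_threshold_size,
            },
            "refresh_interval": refresh_interval,
        }
    }


# Example: trade a little durability and search freshness for write throughput.
body = translog_settings_body(durability="async", refresh_interval="30s")
print(json.dumps(body, indent=2))
```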


Last edited: 2020.05.05 08:38:51