一、CommitLog
RocketMQ 通过使用内存映射文件来提高IO 访问性能,无论是CommitLog 、ConsumeQueue 单个文件都被设计为固定长度,如果一个文件写满以后再创建一个新文件,文件名就为该文件第一条消息对应的全局物理偏移量。
CommitLog逻辑存储如下:
二、MappedFileQueue
(一) MappedFileQueue核心属性
MappedFileQueue巳是MappedFile 的管理容器, MappedFileQueue 是对存储目录的封装,例如CommitLog 文件的存储路径{ ROCKET_HOME} /store/commitlog/ ,该目录下会存在多个内存映射文件(MappedFile)
MappedFileQueue 的核心属性:
String storePath :存储目录。
int mappedFileSize : 单个文件的存储大小。
CopyOnWriteArrayList<MappedFile> mappedFiles: MappedFile 文件集合。
AllocateMappedFileService allocateMappedFileService :创建MappedFile 服务类。
long flushedWhere = 0 : 当前刷盘指针, 表示该指针之前的所有数据全部持久化到磁盘。
long committedWhere = 0 : 当前数据提交指针,内存中ByteBuffer 当前的写指针,该值大于等于flushedWhere 。
(二) 根据消息存储时间戳来查找MappdFile
根据消息存储时间戳来查找MappdFile 。从MappedFile 列表中第一个文件开始查找,找到第一个最后一次更新时间大于待查找时间戳的文件,如果不存在,则返回最后一个MappedFile 文件。
public MappedFile getMappedFileByTime(final long timestamp) {
Object[] mfs = this.copyMappedFiles(0);
if (null == mfs)
return null;
for (int i = 0; i < mfs.length; i++) {
MappedFile mappedFile = (MappedFile) mfs[i];
if (mappedFile.getLastModifiedTimestamp() >= timestamp) {
return mappedFile;
}
}
return (MappedFile) mfs[mfs.length - 1];
}
(三) 根据偏移量获取文件
根据offset 定位MappedFile 的算法为:( int) ((offset / this.mappedFileSize) -(mappedFile.getFileFromOffset() / this . MappedFileSize ))
/**
* Finds a mapped file by offset.
*
* @param offset Offset.
* @param returnFirstOnNotFound If the mapped file is not found, then return the first one.
* @return Mapped file or null (when not found and returnFirstOnNotFound is <code>false</code>).
*/
public MappedFile findMappedFileByOffset(final long offset, final boolean returnFirstOnNotFound) {
try {
MappedFile firstMappedFile = this.getFirstMappedFile();
MappedFile lastMappedFile = this.getLastMappedFile();
if (firstMappedFile != null && lastMappedFile != null) {
// 是否存在file
if (offset < firstMappedFile.getFileFromOffset() || offset >= lastMappedFile.getFileFromOffset() + this.mappedFileSize) {
LOG_ERROR.warn("Offset not matched. Request offset: {}, firstOffset: {}, lastOffset: {}, mappedFileSize: {}, mappedFiles count: {}",
offset,
firstMappedFile.getFileFromOffset(),
lastMappedFile.getFileFromOffset() + this.mappedFileSize,
this.mappedFileSize,
this.mappedFiles.size());
} else {
int index = (int) ((offset / this.mappedFileSize) - (firstMappedFile.getFileFromOffset() / this.mappedFileSize));
MappedFile targetFile = null;
try {
targetFile = this.mappedFiles.get(index);
} catch (Exception ignored) {
}
if (targetFile != null && offset >= targetFile.getFileFromOffset()
&& offset < targetFile.getFileFromOffset() + this.mappedFileSize) {
return targetFile;
}
for (MappedFile tmpMappedFile : this.mappedFiles) {
if (offset >= tmpMappedFile.getFileFromOffset()
&& offset < tmpMappedFile.getFileFromOffset() + this.mappedFileSize) {
return tmpMappedFile;
}
}
}
if (returnFirstOnNotFound) {
return firstMappedFile;
}
}
} catch (Exception e) {
log.error("findMappedFileByOffset Exception", e);
}
return null;
}
(四) 获取最后一个MappedFile
public MappedFile getLastMappedFile(final long startOffset, boolean needCreate) {
long createOffset = -1;
MappedFile mappedFileLast = getLastMappedFile();
// 是否为空
if (mappedFileLast == null) {
createOffset = startOffset - (startOffset % this.mappedFileSize);
}
// 不为空,但是已经满
if (mappedFileLast != null && mappedFileLast.isFull()) {
createOffset = mappedFileLast.getFileFromOffset() + this.mappedFileSize;
}
// 创建mappedfile
if (createOffset != -1 && needCreate) {
String nextFilePath = this.storePath + File.separator + UtilAll.offset2FileName(createOffset);
String nextNextFilePath = this.storePath + File.separator
+ UtilAll.offset2FileName(createOffset + this.mappedFileSize);
MappedFile mappedFile = null;
// 通过AllocateMappedFileService创建MappedFile,后面小节进行具体分析
if (this.allocateMappedFileService != null) {
mappedFile = this.allocateMappedFileService.putRequestAndReturnMappedFile(nextFilePath,
nextNextFilePath, this.mappedFileSize);
} else {
// 直接创建MappedFile
try {
mappedFile = new MappedFile(nextFilePath, this.mappedFileSize);
} catch (IOException e) {
log.error("create mappedFile exception", e);
}
}
if (mappedFile != null) {
if (this.mappedFiles.isEmpty()) {
mappedFile.setFirstCreateInQueue(true);
}
this.mappedFiles.add(mappedFile);
}
return mappedFile;
}
return mappedFileLast;
}
三、MappedFile
(一) MappedFile 核心属性
MappedFile是RocketMQ 内存映射文件的具体实现,相关属性如下:
int OS_PAGE_SIZE :操作系统每页大小,默认4k 。
AtomicLong TOTAL_MAPPED_VIRTUAL_MEMORY : 当前JVM 实例中MappedFile虚拟内存。
Atomiclnteger TOTAL_MAPPED_FILES :当前JVM 实例中MappedFile 对象个数。
Atomiclnteger wrotePosition : 当前该文件的写指针,从0 开始(内存映射文件中的写指针) 。
Atomiclnteger committedPosition :当前文件的提交指针,如果开启transientStore PoolEnable, 则数据会存储在TransientStorePool 中, 然后提交到内存映射ByteBuffer 中, 再刷写到磁盘。
Atomiclnteger flushedPosition :刷写到磁盘指针,该指针之前的数据持久化到磁盘中。
int fileSize :文件大小。
FileChannel fileChannel : 文件通道。
ByteBuffer writeBuffer :堆内存ByteBuffer , 如果不为空,数据首先将存储在该Buffer 中, 然后提交到MappedFile 对应的内存映射文件Buffer 。transientStorePoolEnable为true 时不为空。
TransientStorePool transientStorePool :堆内存池, transientStorePoolEnable 为true时启用。
String fileName :文件名称。
long fileFromOffset :该文件的初始偏移量。
File file :物理文件。
MappedByteBuffer mappedByteBuffer :物理文件对应的内存映射Buffer 。
volatile long storeTimestamp = 0 :文件最后一次内容写入时间。
boolean firstCreatelnQueue :是否是MappedFileQueue 队列中第一个文件。
(二) MappedFile 初始化
private void init(final String fileName, final int fileSize) throws IOException {
this.fileName = fileName;
this.fileSize = fileSize;
this.file = new File(fileName);
this.fileFromOffset = Long.parseLong(this.file.getName());
boolean ok = false;
// 确保目录存在
ensureDirOK(this.file.getParent());
try {
this.fileChannel = new RandomAccessFile(this.file, "rw").getChannel();
this.mappedByteBuffer = this.fileChannel.map(MapMode.READ_WRITE, 0, fileSize);
TOTAL_MAPPED_VIRTUAL_MEMORY.addAndGet(fileSize);
TOTAL_MAPPED_FILES.incrementAndGet();
ok = true;
} catch (FileNotFoundException e) {
log.error("create file channel " + this.fileName + " Failed. ", e);
throw e;
} catch (IOException e) {
log.error("map file " + this.fileName + " Failed. ", e);
throw e;
} finally {
if (!ok && this.fileChannel != null) {
this.fileChannel.close();
}
}
}
(三) MappedFile 提交
内存映射文件的提交动作由MappedFile 的commit 方法实现,具体实现如下:
public int commit(final int commitLeastPages) {
if (writeBuffer == null) {
//no need to commit data to file channel, so just regard wrotePosition as committedPosition.
return this.wrotePosition.get();
}
if (this.isAbleToCommit(commitLeastPages)) {
if (this.hold()) {
commit0(commitLeastPages);
this.release();
} else {
log.warn("in commit, hold failed, commit offset = " + this.committedPosition.get());
}
}
// All dirty data has been committed to FileChannel.
if (writeBuffer != null && this.transientStorePool != null && this.fileSize == this.committedPosition.get()) {
this.transientStorePool.returnBuffer(writeBuffer);
this.writeBuffer = null;
}
return this.committedPosition.get();
}
commitLeastPages为本次最小提交页数,如果提交页数不满足commitLeastPages,则不执行提交操作,writeBuffer 如果为空,直接返回wrotePosition 指针,无须执行commit操作, 表明commit 操作主体是writeBuffer。
最终调用commit0()方法,实现真正的数据提交,具体实现如下:
protected void commit0(final int commitLeastPages) {
int writePos = this.wrotePosition.get();
int lastCommittedPosition = this.committedPosition.get();
if (writePos - this.committedPosition.get() > 0) {
try {
ByteBuffer byteBuffer = writeBuffer.slice();
byteBuffer.position(lastCommittedPosition);
byteBuffer.limit(writePos);
this.fileChannel.position(lastCommittedPosition);
this.fileChannel.write(byteBuffer);
this.committedPosition.set(writePos);
} catch (Throwable e) {
log.error("Error occurred when commit data to FileChannel.", e);
}
}
}
首先创建writeBuffer 的共享缓存区,然后将新创建的position 回退到上一次提交的位置( committedPosition ) , 设置limit 为wrotePosition (当前最大有效数据指针),然后把commitedPosition 到wrotePosition 的数据复制(写入)到FileChannel中, 然后更新committedPosition 指针为wrotePosition.commit 的作用就是将MappedFile#writeBuffer中的数据提交到文件通道FileChannel 中。
ByteBuffer 使用技巧: slice () 方法创建一个共享缓存区, 与原先的ByteBuffer 共享内存但维护一套独立的指针( position 、mark 、limit) 。
(四) MappedFile 刷新
MappedFile调用flush方法,将内存中的数据刷新到磁盘中,具体实现如下:
/**
* @return The current flushed position
*/
public int flush(final int flushLeastPages) {
if (this.isAbleToFlush(flushLeastPages)) {
if (this.hold()) {
int value = getReadPosition();
try {
//We only append data to fileChannel or mappedByteBuffer, never both.
if (writeBuffer != null || this.fileChannel.position() != 0) {
this.fileChannel.force(false);
} else {
this.mappedByteBuffer.force();
}
} catch (Throwable e) {
log.error("Error occurred when force data to disk.", e);
}
this.flushedPosition.set(value);
this.release();
} else {
log.warn("in flush, hold failed, flush offset = " + this.flushedPosition.get());
this.flushedPosition.set(getReadPosition());
}
}
return this.getFlushedPosition();
}
刷写磁盘,直接调用mappedByteBuffer 或fileChannel 的force 方法将内存中的数据持久化到磁盘,那么flushedPosition 应该等于MappedByteBuffer 中的写指针;
如果writeBuffer不为空, 则flushedPosition 应等于上一次commit 指针;因为上一次提交的数据就是进入到MappedByteBuffer 中的数据;
如果writeBuffer 为空,数据是直接进入到MappedByteBuffer的wrotePosition 代表的是MappedByteBuffer 中的指针,故设置flushedPosition 为wrotePosition 。
(五) 获取MappedFile 最大读指针
RocketMQ 文件的一个组织方式是内存映射文件,预先申请一块连续的固定大小的内存, 需要一套指针标识当前最大有效数据的位置,获取最大有效数据偏移量的方法由MappedFile 的getReadPosition 方法实现
/**
* @return The max position which have valid data
*/
public int getReadPosition() {
return this.writeBuffer == null ? this.wrotePosition.get() : this.committedPosition.get();
}
获取当前文件最大的可读指针。如果writeBuffer 为空, 则直接返回当前的写指针;如果writeBuffer 不为空, 则返回上一次提交的指针。在MappedFile 设计中,只有提交了的数据(写入到MappedByteBuffer 或FileChannel 中的数据)才是安全的数据。
四、 AllocateMappedFileService
(一) AllocateMappedFileService核心属性
ConcurrentMap<String, AllocateRequest> requestTable:key是filepath,value是分配请求
PriorityBlockingQueue<AllocateRequest> requestQueue:分配请求的 队列,注意是优先级队列
int waitTimeOut:生成对应请求到创建MappedFile,可以等待5s
(二) MappedFile创建
通过调用putRequestAndReturnMappedFile方法,往队列里面添加创建MappedFile的请求
public MappedFile putRequestAndReturnMappedFile(String nextFilePath, String nextNextFilePath, int fileSize) {
// 处理两个分配MappedFile请求
int canSubmitRequests = 2;
if (this.messageStore.getMessageStoreConfig().isTransientStorePoolEnable()) {
if (this.messageStore.getMessageStoreConfig().isFastFailIfNoBufferInStorePool()
&& BrokerRole.SLAVE != this.messageStore.getMessageStoreConfig().getBrokerRole()) { //if broker is slave, don't fast fail even no buffer in pool
canSubmitRequests = this.messageStore.getTransientStorePool().remainBufferNumbs() - this.requestQueue.size();
}
}
AllocateRequest nextReq = new AllocateRequest(nextFilePath, fileSize);
boolean nextPutOK = this.requestTable.putIfAbsent(nextFilePath, nextReq) == null;
if (nextPutOK) {
if (canSubmitRequests <= 0) {
log.warn("[NOTIFYME]TransientStorePool is not enough, so create mapped file error, " +
"RequestQueueSize : {}, StorePoolSize: {}", this.requestQueue.size(), this.messageStore.getTransientStorePool().remainBufferNumbs());
this.requestTable.remove(nextFilePath);
return null;
}
boolean offerOK = this.requestQueue.offer(nextReq);
if (!offerOK) {
log.warn("never expected here, add a request to preallocate queue failed");
}
canSubmitRequests--;
}
AllocateRequest nextNextReq = new AllocateRequest(nextNextFilePath, fileSize);
boolean nextNextPutOK = this.requestTable.putIfAbsent(nextNextFilePath, nextNextReq) == null;
if (nextNextPutOK) {
if (canSubmitRequests <= 0) {
log.warn("[NOTIFYME]TransientStorePool is not enough, so skip preallocate mapped file, " +
"RequestQueueSize : {}, StorePoolSize: {}", this.requestQueue.size(), this.messageStore.getTransientStorePool().remainBufferNumbs());
this.requestTable.remove(nextNextFilePath);
} else {
boolean offerOK = this.requestQueue.offer(nextNextReq);
if (!offerOK) {
log.warn("never expected here, add a request to preallocate queue failed");
}
}
}
if (hasException) {
log.warn(this.getServiceName() + " service has exception. so return null");
return null;
}
AllocateRequest result = this.requestTable.get(nextFilePath);
try {
if (result != null) {
boolean waitOK = result.getCountDownLatch().await(waitTimeOut, TimeUnit.MILLISECONDS);
if (!waitOK) {
log.warn("create mmap timeout " + result.getFilePath() + " " + result.getFileSize());
return null;
} else {
this.requestTable.remove(nextFilePath);
return result.getMappedFile();
}
} else {
log.error("find preallocate mmap failed, this never happen");
}
} catch (InterruptedException e) {
log.warn(this.getServiceName() + " service has exception. ", e);
}
return null;
}
异步通过调用mmapOperation方法,异步创建MappedFile
/**
* Only interrupted by the external thread, will return false
*/
private boolean mmapOperation() {
boolean isSuccess = false;
AllocateRequest req = null;
try {
req = this.requestQueue.take();
AllocateRequest expectedRequest = this.requestTable.get(req.getFilePath());
if (null == expectedRequest) {
log.warn("this mmap request expired, maybe cause timeout " + req.getFilePath() + " "
+ req.getFileSize());
return true;
}
if (expectedRequest != req) {
log.warn("never expected here, maybe cause timeout " + req.getFilePath() + " "
+ req.getFileSize() + ", req:" + req + ", expectedRequest:" + expectedRequest);
return true;
}
if (req.getMappedFile() == null) {
long beginTime = System.currentTimeMillis();
MappedFile mappedFile;
if (messageStore.getMessageStoreConfig().isTransientStorePoolEnable()) {
try {
mappedFile = ServiceLoader.load(MappedFile.class).iterator().next();
mappedFile.init(req.getFilePath(), req.getFileSize(), messageStore.getTransientStorePool());
} catch (RuntimeException e) {
log.warn("Use default implementation.");
mappedFile = new MappedFile(req.getFilePath(), req.getFileSize(), messageStore.getTransientStorePool());
}
} else {
mappedFile = new MappedFile(req.getFilePath(), req.getFileSize());
}
long eclipseTime = UtilAll.computeEclipseTimeMilliseconds(beginTime);
if (eclipseTime > 10) {
int queueSize = this.requestQueue.size();
log.warn("create mappedFile spent time(ms) " + eclipseTime + " queue size " + queueSize
+ " " + req.getFilePath() + " " + req.getFileSize());
}
// pre write mappedFile
if (mappedFile.getFileSize() >= this.messageStore.getMessageStoreConfig()
.getMapedFileSizeCommitLog()
&&
this.messageStore.getMessageStoreConfig().isWarmMapedFileEnable()) {
mappedFile.warmMappedFile(this.messageStore.getMessageStoreConfig().getFlushDiskType(),
this.messageStore.getMessageStoreConfig().getFlushLeastPagesWhenWarmMapedFile());
}
req.setMappedFile(mappedFile);
this.hasException = false;
isSuccess = true;
}
} catch (InterruptedException e) {
log.warn(this.getServiceName() + " interrupted, possibly by shutdown.");
this.hasException = true;
return false;
} catch (IOException e) {
log.warn(this.getServiceName() + " service has exception. ", e);
this.hasException = true;
if (null != req) {
requestQueue.offer(req);
try {
Thread.sleep(1);
} catch (InterruptedException ignored) {
}
}
} finally {
if (req != null && isSuccess)
req.getCountDownLatch().countDown();
}
return true;
}