目录
一、hbase
1.1 文件格式
[图片上传失败...(image-9cdc93-1646955640463)]
1.2 索引Block数据结构
代码入口:
class BlockIndexChunk {
// key
private final List<byte[]> blockKeys = new ArrayList<>();
// 对应的block的offset
private final List<Long> blockOffsets = new ArrayList<>();
// block的大小
private final List<Integer> onDiskDataSizes = new ArrayList<>();
// 叶子结点
private final List<Long> numSubEntriesAt = new ArrayList<>();
}
1.3 数据索引构建
代码入口:org.apache.hadoop.hbase.regionserver.StoreFileWriter#append
1.3.1 leaf index block
构建leaf index block代码块:
void writeNonRoot(DataOutput out) throws IOException {
// The number of entries in the block.
out.writeInt(blockKeys.size());
if (secondaryIndexOffsetMarks.size() != blockKeys.size()) {
throw new IOException("Corrupted block index chunk writer: " +
blockKeys.size() + " entries but " +
secondaryIndexOffsetMarks.size() + " secondary index items");
}
// For each entry, write a "secondary index" of relative offsets to the
// entries from the end of the secondary index. This works, because at
// read time we read the number of entries and know where the secondary
// index ends.
for (int currentSecondaryIndex : secondaryIndexOffsetMarks)
out.writeInt(currentSecondaryIndex);
// We include one other element in the secondary index to calculate the
// size of each entry more easily by subtracting secondary index elements.
out.writeInt(curTotalNonRootEntrySize);
for (int i = 0; i < blockKeys.size(); ++i) {
out.writeLong(blockOffsets.get(i));
out.writeInt(onDiskDataSizes.get(i));
out.write(blockKeys.get(i));
}
}
构建上层索引,在root chunk 中添加索引
public void blockWritten(long offset, int onDiskSize, int uncompressedSize) {
// Add leaf index block size
totalBlockOnDiskSize += onDiskSize;
totalBlockUncompressedSize += uncompressedSize;
if (singleLevelOnly)
throw new UnsupportedOperationException(INLINE_BLOCKS_NOT_ALLOWED);
if (firstKey == null) {
throw new IllegalStateException("Trying to add second-level index " +
"entry with offset=" + offset + " and onDiskSize=" + onDiskSize +
"but the first key was not set in writeInlineBlock");
}
if (rootChunk.getNumEntries() == 0) {
// We are writing the first leaf block, so increase index level.
expectNumLevels(1);
numLevels = 2;
}
// Add another entry to the second-level index. Include the number of
// entries in all previous leaf-level chunks for mid-key calculation.
rootChunk.add(firstKey, offset, onDiskSize, totalNumEntries);
firstKey = null;
}
1.3.2 intermediate level data index block
代码入口:org.apache.hadoop.hbase.io.hfile.HFileBlockIndex.BlockIndexWriter#writeIndexBlocks
/**
* Current leaf-level chunk. New entries referencing data blocks get added
* to this chunk until it grows large enough to be written to disk.
*/
private BlockIndexChunk curInlineChunk = new BlockIndexChunk();
if (curInlineChunk != null) {
while (rootChunk.getRootSize() > maxChunkSize
// HBASE-16288: if firstKey is larger than maxChunkSize we will loop indefinitely
&& rootChunk.getNumEntries() > minIndexNumEntries
// Sanity check. We will not hit this (minIndexNumEntries ^ 16) blocks can be addressed
&& numLevels < 16) {
rootChunk = writeIntermediateLevel(out, rootChunk);
numLevels += 1;
}
}
1.3.3 root index
代码入口:org.apache.hadoop.hbase.io.hfile.HFileBlockIndex.BlockIndexWriter#writeIndexBlocks
// write the root level
long rootLevelIndexPos = out.getPos();
{
DataOutput blockStream =
blockWriter.startWriting(BlockType.ROOT_INDEX);
rootChunk.writeRoot(blockStream);
if (midKeyMetadata != null)
blockStream.write(midKeyMetadata);
blockWriter.writeHeaderAndData(out);
if (cacheConf != null) {
cacheConf.getBlockCache().ifPresent(cache -> {
HFileBlock blockForCaching = blockWriter.getBlockForCaching(cacheConf);
cache.cacheBlock(new BlockCacheKey(nameForCaching, rootLevelIndexPos, true,
blockForCaching.getBlockType()), blockForCaching);
});
}
}
1.3.4 序列化结构
一级索引:
[图片上传失败...(image-a839e-1646955640462)]
多级索引查询:二分查找
static int binarySearchNonRootIndex(Cell key, ByteBuff nonRootIndex,
CellComparator comparator) {
int numEntries = nonRootIndex.getIntAfterPosition(0);
int low = 0;
int high = numEntries - 1;
int mid = 0;
// Entries start after the number of entries and the secondary index.
// The secondary index takes numEntries + 1 ints.
int entriesOffset = Bytes.SIZEOF_INT * (numEntries + 2);
// If we imagine that keys[-1] = -Infinity and
// keys[numEntries] = Infinity, then we are maintaining an invariant that
// keys[low - 1] < key < keys[high + 1] while narrowing down the range.
ByteBufferKeyOnlyKeyValue nonRootIndexkeyOnlyKV = new ByteBufferKeyOnlyKeyValue();
ObjectIntPair<ByteBuffer> pair = new ObjectIntPair<>();
// 二分查找
while (low <= high) {
mid = low + ((high - low) >> 1);
// Midkey's offset relative to the end of secondary index
int midKeyRelOffset = nonRootIndex.getIntAfterPosition(Bytes.SIZEOF_INT * (mid + 1));
// The offset of the middle key in the blockIndex buffer
int midKeyOffset = entriesOffset // Skip secondary index
+ midKeyRelOffset // Skip all entries until mid
+ SECONDARY_INDEX_ENTRY_OVERHEAD; // Skip offset and on-disk-size
// We subtract the two consecutive secondary index elements, which
// gives us the size of the whole (offset, onDiskSize, key) tuple. We
// then need to subtract the overhead of offset and onDiskSize.
int midLength = nonRootIndex.getIntAfterPosition(Bytes.SIZEOF_INT * (mid + 2)) -
midKeyRelOffset - SECONDARY_INDEX_ENTRY_OVERHEAD;
// we have to compare in this order, because the comparator order
// has special logic when the 'left side' is a special key.
// TODO make KeyOnlyKeyValue to be Buffer backed and avoid array() call. This has to be
// done after HBASE-12224 & HBASE-12282
// TODO avoid array call.
nonRootIndex.asSubByteBuffer(midKeyOffset, midLength, pair);
nonRootIndexkeyOnlyKV.setKey(pair.getFirst(), pair.getSecond(), midLength);
int cmp = PrivateCellUtil.compareKeyIgnoresMvcc(comparator, key, nonRootIndexkeyOnlyKV);
// key lives above the midpoint
if (cmp > 0)
low = mid + 1; // Maintain the invariant that keys[low - 1] < key
// key lives below the midpoint
else if (cmp < 0)
high = mid - 1; // Maintain the invariant that key < keys[high + 1]
else
return mid; // exact match
}
// As per our invariant, keys[low - 1] < key < keys[high + 1], meaning
// that low - 1 < high + 1 and (low - high) <= 1\. As per the loop break
// condition, low >= high + 1\. Therefore, low = high + 1.
if (low != high + 1) {
throw new IllegalStateException("Binary search broken: low=" + low
+ " " + "instead of " + (high + 1));
}
// OK, our invariant says that keys[low - 1] < key < keys[low]. We need to
// return i such that keys[i] <= key < keys[i + 1]. Therefore i = low - 1.
int i = low - 1;
// Some extra validation on the result.
if (i < -1 || i >= numEntries) {
throw new IllegalStateException("Binary search broken: result is " +
i + " but expected to be between -1 and (numEntries - 1) = " +
(numEntries - 1));
}
return i;
}
1.4 布隆过滤器构建
会根据key进行拆分不同大小的bloomchunk
内存数据结构:BloomFilterChunk
一层索引
代码入口:CompoundBloomFilterWriter
@InterfaceAudience.Private
public class BloomFilterChunk implements BloomFilterBase {
/** Bytes (B) in the array. This actually has to fit into an int. */
protected long byteSize;
/** Number of hash functions */
protected int hashCount;
/** Hash type */
protected final int hashType;
/** Hash Function */
protected final Hash hash;
/** Keys currently in the bloom */
protected int keyCount;
/** Max Keys expected for the bloom */
protected int maxKeys;
/** Bloom bits */
protected ByteBuffer bloom;
/** The type of bloom */
protected BloomType bloomType;
}
二、presto
目前 presto没有索引类,只有统计类的数据数据结构
列统计信息:ColumnStatistics
public final class ColumnStatistics
{
private static final ColumnStatistics EMPTY = new ColumnStatistics(Estimate.unknown(), Estimate.unknown(), Estimate.unknown(), Optional.empty());
private final Estimate nullsFraction;
private final Estimate distinctValuesCount;
private final Estimate dataSize;
private final Optional<DoubleRange> range;
}
public enum ColumnStatisticType
{
MIN_VALUE,
MAX_VALUE,
NUMBER_OF_DISTINCT_VALUES,
NUMBER_OF_NON_NULL_VALUES,
NUMBER_OF_TRUE_VALUES,
MAX_VALUE_SIZE_IN_BYTES,
TOTAL_SIZE_IN_BYTES,
}
# 四、es-lucene
[https://www.elastic.co/cn/blog/frame-of-reference-and-roaring-bitmaps](https://www.elastic.co/cn/blog/frame-of-reference-and-roaring-bitmaps)
[https://issues.apache.org/jira/browse/LUCENE-5983](https://issues.apache.org/jira/browse/LUCENE-5983)
代码块 : RoaringDocIdSet
/**
- {@link DocIdSet} implementation inspired from http://roaringbitmap.org/
- The space is divided into blocks of 2^16 bits and each block is encoded
- independently. In each block, if less than 2^12 bits are set, then
- documents are simply stored in a short[]. If more than 216-212 bits are
- set, then the inverse of the set is encoded in a simple short[]. Otherwise
- a {@link FixedBitSet} is used.
- @lucene.internal
*/
public class RoaringDocIdSet extends DocIdSet {
}
目前lucene在使用query的查询缓存的时候缓存的是 docId ,目前是自己实现的bitmap
# 五、carbondata
# 六、参考