RocksDB FAQ

Source: https://github.com/facebook/rocksdb/wiki/RocksDB-FAQ

Building RocksDB

Q: What is the absolute minimum version of gcc that we need to build RocksDB?

A: 4.8.

Q: What is RocksDB's latest stable release?

A: All the releases in https://github.com/facebook/rocksdb/releases are stable. For RocksJava, stable releases are available in https://oss.sonatype.org/#nexus-search;quick~rocksdb.

Basic Read/Write

Q: Are basic operations Put(), Write(), Get() and NewIterator() thread safe?

A: Yes.

Q: Can I write to RocksDB using multiple processes?

A: No. However, it can be opened using Secondary DB. If no write goes to the database, it can be opened in read-only mode from multiple processes.

Q: Does RocksDB support multi-process read access?

A: Yes, you can read it as a secondary database using DB::OpenAsSecondary(). RocksDB also supports multi-process read-only access without writes to the database, by opening it with DB::OpenForReadOnly().

Q: Is it safe to close RocksDB while another thread is issuing read, write or manual compaction requests?

A: No. The users of RocksDB need to make sure all functions have finished before they close RocksDB. You can speed up the waiting by calling CancelAllBackgroundWork().

Q: What's the maximum key and value sizes supported?

A: In general, RocksDB is not designed for large keys. The maximum recommended sizes for key and value are 8MB and 3GB respectively.

Q: What's the fastest way to load data into RocksDB?

A: A fast way to insert data directly into the DB:

  1. use a single writer thread and insert in sorted order
  2. batch hundreds of keys into one write batch
  3. use vector memtable
  4. make sure options.max_background_flushes is at least 4
  5. before inserting the data, disable automatic compaction, and set options.level0_file_num_compaction_trigger, options.level0_slowdown_writes_trigger and options.level0_stop_writes_trigger to very large values. After inserting all the data, issue a manual compaction.

Steps 3-5 will be done automatically if you call Options::PrepareForBulkLoad() on your options.
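
For illustration, a minimal C++ sketch of this fast-load path; it is a sketch under assumptions (a fresh database at the hypothetical path /tmp/bulkdb, synthetic pre-sorted keys), not code from the FAQ:

    #include <cassert>
    #include <cstdio>
    #include "rocksdb/db.h"
    #include "rocksdb/options.h"
    #include "rocksdb/write_batch.h"

    int main() {
      rocksdb::Options options;
      options.create_if_missing = true;
      options.PrepareForBulkLoad();  // covers steps 3-5 above

      rocksdb::DB* db;
      rocksdb::Status s = rocksdb::DB::Open(options, "/tmp/bulkdb", &db);
      assert(s.ok());

      // Steps 1-2: single writer thread, sorted keys, hundreds per batch.
      rocksdb::WriteBatch batch;
      char key[16];
      for (int i = 0; i < 100000; i++) {
        std::snprintf(key, sizeof(key), "key%08d", i);  // zero-padded => sorted
        batch.Put(key, "value");
        if (batch.Count() >= 500) {
          s = db->Write(rocksdb::WriteOptions(), &batch);
          assert(s.ok());
          batch.Clear();
        }
      }
      db->Write(rocksdb::WriteOptions(), &batch);

      // Step 5: after loading everything, issue one manual compaction.
      db->CompactRange(rocksdb::CompactRangeOptions(), nullptr, nullptr);
      delete db;
      return 0;
    }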

If you can pre-process the data offline before inserting, there is a faster way: sort the data, generate SST files with non-overlapping ranges in parallel, and bulk load the SST files. See https://github.com/facebook/rocksdb/wiki/Creating-and-Ingesting-SST-files

Q: What is the correct way to delete the DB? Can I simply call DestroyDB() on a live DB?

A: Closing the DB and then destroying it with DestroyDB() is the correct way. Calling DestroyDB() on a live DB is undefined behavior.

Q: What is the difference between DestroyDB() and directly deleting the DB directory manually?

A: The major difference is that DestroyDB() will take care of the case where the RocksDB database is stored in multiple directories. For instance, a single DB can be configured to store its data in multiple directories by specifying different paths to DBOptions::db_paths, DBOptions::db_log_dir, and DBOptions::wal_dir.

Q: Any better way to dump key-value pairs generated by a map-reduce job into RocksDB?

A: A better way is to use SstFileWriter, which allows you to directly create RocksDB SST files and add them to a RocksDB database. However, if you're adding SST files to an existing RocksDB database, their key range must not overlap with the database. See https://github.com/facebook/rocksdb/wiki/Creating-and-Ingesting-SST-files
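
A hedged sketch of that flow, assuming an already-open DB and an illustrative file path:

    #include "rocksdb/db.h"
    #include "rocksdb/options.h"
    #include "rocksdb/sst_file_writer.h"

    // Build one SST file, then ingest it into the live database.
    rocksdb::Status BulkAdd(rocksdb::DB* db, const rocksdb::Options& options) {
      rocksdb::SstFileWriter writer(rocksdb::EnvOptions(), options);
      rocksdb::Status s = writer.Open("/tmp/file1.sst");  // illustrative path
      if (!s.ok()) return s;
      s = writer.Put("key1", "value1");  // keys must be added in sorted order
      if (s.ok()) s = writer.Put("key2", "value2");
      if (s.ok()) s = writer.Finish();
      if (!s.ok()) return s;
      // Per the answer above, the file's key range must not overlap the DB.
      return db->IngestExternalFile({"/tmp/file1.sst"},
                                    rocksdb::IngestExternalFileOptions());
    }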

Q: Is it safe to read from or write to RocksDB inside a compaction filter callback?

A: It is safe to read, but not always safe to write to RocksDB inside a compaction filter callback, as a write might cause a deadlock when the write-stop condition is triggered.

Q: Does RocksDB hold SST files and memtables for a snapshot?

A: No. See https://github.com/facebook/rocksdb/wiki/RocksDB-Overview#gets-iterators-and-snapshots for how snapshots work.

Q: With DBWithTTL, is there a time bound for the expired keys to be removed?

A: DBWithTTL itself does not provide an upper time bound. Expired keys will be removed when they are part of any compaction. However, there is no guarantee of when such a compaction will start. For instance, if you have a certain key-range that is never updated, then compaction is less likely to apply to that key-range. For leveled compaction, you can enforce some limit using the periodic compaction feature. The feature currently has a limitation: if the write rate is so slow that memtable flush is never triggered, periodic compaction won't be triggered either.

Q: If I delete a column family, and I didn't yet delete the column family handle, can I still use it to access the data?

A: Yes. DropColumnFamily() only marks the specified column family as dropped, and it will not actually be dropped until its reference count goes to zero.

Q: Why does RocksDB issue reads from the disk when I only make write requests?

A: Such IO reads are from compactions. A RocksDB compaction reads from one or more SST files, performs a merge-sort-like operation, generates new SST files, and deletes the old SST files it used as input.

Q: Is block_size before compression, or after?

A: block_size is the size before compression.

Q: After using options.prefix_extractor, I sometimes see wrong results. What's wrong?

A: There are limitations in options.prefix_extractor. If prefix iterating is used, Prev() and SeekToLast() are not supported, and many operations don't support SeekToFirst() either. A common mistake is to seek the last key of a prefix by calling Seek() followed by Prev(). This is, however, not supported. Currently there is no way to find the last key of a prefix with prefix iterating. Also, you can't continue iterating keys after finishing the prefix you seek to. In places where those operations are needed, you can set ReadOptions.total_order_seek = true to disable prefix iterating.
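
For illustration, a sketch of disabling prefix iterating for a single read; the fixed prefix length of 4 is an arbitrary assumption:

    #include <memory>
    #include "rocksdb/db.h"
    #include "rocksdb/options.h"
    #include "rocksdb/slice_transform.h"

    // At open time, a fixed-length prefix extractor was configured, e.g.:
    // options.prefix_extractor.reset(rocksdb::NewFixedPrefixTransform(4));

    void SeekToLastKey(rocksdb::DB* db) {
      rocksdb::ReadOptions ro;
      ro.total_order_seek = true;  // disable prefix iterating for this read
      std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(ro));
      it->SeekToLast();  // supported again under total-order seek
      if (it->Valid()) {
        // use it->key() and it->value()
      }
    }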

Q: If Put() or Write() is called with WriteOptions.sync=true, does it mean all previous writes are persistent too?

A: Yes, but only for all previous writes with WriteOptions.disableWAL=false.

Q: I disabled write-ahead-log and rely on DB::Flush() to persist the data. It works well for a single column family. Can I do the same if I have multiple column families?

A: Yes. Set options.atomic_flush=true to enable atomic flush across multiple column families.

Q: What's the best way to delete a range of keys?

A: See https://github.com/facebook/rocksdb/wiki/DeleteRange .
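
For illustration, a minimal call sketch of DB::DeleteRange(); the key bounds are illustrative and db is an open database:

    #include "rocksdb/db.h"

    // Deletes every key in ["user:1000", "user:2000") in the default column family.
    rocksdb::Status DeleteUsers(rocksdb::DB* db) {
      return db->DeleteRange(rocksdb::WriteOptions(),
                             db->DefaultColumnFamily(),
                             "user:1000", "user:2000");
    }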

Q: What are column families used for?

A: The most common reasons for using column families:

  1. Use different compaction settings, comparators, compression types, merge operators, or compaction filters in different parts of the data
  2. Drop a column family to delete its data
  3. One column family to store metadata and another one to store the data.

Q: What's the difference between storing data in multiple column families and in multiple rocksdb databases?

A: The main differences will be backup, atomic writes and performance of writes. The advantage of using multiple databases: a database is the unit of backup or checkpoint. It's easier to copy a database to another host than a column family. Advantages of using multiple column families:

  1. write batches are atomic across multiple column families on one database. You can't achieve this using multiple RocksDB databases (see the sketch after this list).
  2. If you issue sync writes to WAL, too many databases may hurt the performance.
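
To illustrate point 1, a sketch of one batch spanning two column families; the handles cf_meta and cf_data are hypothetical, obtained when the DB was opened:

    #include "rocksdb/db.h"
    #include "rocksdb/write_batch.h"

    // Both updates commit atomically, or neither does.
    rocksdb::Status AtomicUpdate(rocksdb::DB* db,
                                 rocksdb::ColumnFamilyHandle* cf_meta,
                                 rocksdb::ColumnFamilyHandle* cf_data) {
      rocksdb::WriteBatch batch;
      batch.Put(cf_meta, "user:42:version", "7");
      batch.Put(cf_data, "user:42", "payload");
      return db->Write(rocksdb::WriteOptions(), &batch);
    }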

Q: Is RocksDB really “lockless” in reads?

A: Reads might hold a mutex in the following situations:

  1. access the sharded block cache
  2. access the table cache if options.max_open_files != -1
  3. if a read happens just after flush or compaction finishes, it may briefly hold the global mutex to fetch the latest metadata of the LSM tree.
  4. the memory allocators RocksDB relies on (e.g. jemalloc) may sometimes hold locks. These locks are only held rarely, or with fine granularity.

Q: If I update multiple keys, should I issue multiple Put() calls, or put them in one write batch and issue Write()?

A: Using a WriteBatch to batch more keys usually performs better than individual Put() calls.

Q: What's the best practice to iterate all the keys?

A: If it's a small or read-only database, just create an iterator and iterate over all the keys. Otherwise, consider recreating the iterator once in a while, because an iterator holds resources and prevents them from being released. If you need to read from a consistent view, create a snapshot and iterate using it, as in the sketch below.
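
A sketch of the consistent-view pattern, assuming an open rocksdb::DB* db:

    #include <memory>
    #include "rocksdb/db.h"

    void ScanAll(rocksdb::DB* db) {
      const rocksdb::Snapshot* snap = db->GetSnapshot();
      rocksdb::ReadOptions ro;
      ro.snapshot = snap;
      {
        std::unique_ptr<rocksdb::Iterator> it(db->NewIterator(ro));
        for (it->SeekToFirst(); it->Valid(); it->Next()) {
          // process it->key() / it->value()
        }
      }  // destroy the iterator promptly so it stops pinning resources
      db->ReleaseSnapshot(snap);
    }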

Q: I have different key spaces. Should I separate them using prefixes, or use different column families?

A: If each key space is reasonably large, it's a good idea to put them in different column families. If each can be small, then you should consider packing multiple key spaces into one column family, to avoid the trouble of maintaining too many column families.

Q: Is the performance of iterator Next() the same as Prev()?

A: The performance of reversed iteration is usually much worse than forward iteration. There are various reasons for that:

  1. delta encoding in data blocks is more friendly to Next()
  2. the skip list used in the memtable is single-directional, so Prev() requires another binary search
  3. the internal key order is optimized for Next().

Q: If I want to retrieve 10 keys from RocksDB, is it better to batch them and use MultiGet() versus issuing 10 individual Get() calls?

A: There are potential performance benefits in using MultiGet(). See https://github.com/facebook/rocksdb/wiki/MultiGet-Performance .
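
A usage sketch with illustrative keys, assuming an open rocksdb::DB* db:

    #include <string>
    #include <vector>
    #include "rocksdb/db.h"

    void BatchedLookup(rocksdb::DB* db) {
      std::vector<rocksdb::Slice> keys = {"k1", "k2", "k3"};
      std::vector<std::string> values;
      // One call instead of three Get() calls.
      std::vector<rocksdb::Status> statuses =
          db->MultiGet(rocksdb::ReadOptions(), keys, &values);
      for (size_t i = 0; i < keys.size(); i++) {
        if (statuses[i].ok()) {
          // values[i] holds the value for keys[i]
        }
      }
    }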

Q: If I have multiple column families and call DB functions without a column family handle, what will the result be?

A: It will operate only on the default column family.

Q: Can I reuse ReadOptions, WriteOptions, etc., across multiple threads?

A: As long as they are const, you are free to reuse them.

Feature Support

Q: Can I cancel a specific compaction?

A: No, you can't cancel one specific compaction.

Q: Can I close the DB when a manual compaction is in progress?

A: No, it's not safe to do that. However, you can call CancelAllBackgroundWork(db, true) in another thread to abort the running compactions, so that you can close the DB sooner. Since 6.5, you can also speed it up using DB::DisableManualCompaction().

Q: Is it safe to directly copy an open RocksDB instance?

A: No, unless the RocksDB instance is opened in read-only mode.

Q: Does RocksDB support replication?

A: No, RocksDB does not directly support replication. However, it offers some APIs that can be used as building blocks to support replication. For instance, GetUpdatesSince() allows developers to iterate through all updates since a specific point in time. See https://github.com/facebook/rocksdb/wiki/Replication-Helpers

Q: Does RocksDB support group commit?

A: Yes. Multiple write requests issued by multiple threads may be grouped together. One of the threads writes the WAL log for those write requests in a single write, and fsyncs once if configured.

Q: Is it possible to scan/iterate over keys only? If so, is that more efficient than loading keys and values?

A: No, it is usually not more efficient. RocksDB's values are normally stored inline with keys. When a user iterates over the keys, the values are already loaded in memory, so skipping the value won't save much. In BlobDB, keys and large values are stored separately, so it may be beneficial to only iterate keys, but it is not supported yet. We may add the support in the future.

Q: Is the transaction object thread-safe?

A: No, it's not. You can't issue multiple operations to the same transaction concurrently. (Of course, you can execute multiple transactions in parallel, which is the point of the feature.)

Q: After an iterator moves away from a key/value, is the memory pointed to by that key/value still kept?

A: No, it can be freed, unless you set ReadOptions.pin_data = true and your setting supports this feature.

Q: Can I programmatically read data from an SST file?

A: We don't support it right now. But you can dump the data using sst_dump. Since version 6.5, you'll be able to do it using SstFileReader.
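
Since 6.5, reading an SST file programmatically can look like this sketch (the path is illustrative):

    #include <memory>
    #include "rocksdb/options.h"
    #include "rocksdb/sst_file_reader.h"

    void DumpSst() {
      rocksdb::Options options;
      rocksdb::SstFileReader reader(options);
      rocksdb::Status s = reader.Open("/path/to/file.sst");  // illustrative path
      if (!s.ok()) return;
      std::unique_ptr<rocksdb::Iterator> it(
          reader.NewIterator(rocksdb::ReadOptions()));
      for (it->SeekToFirst(); it->Valid(); it->Next()) {
        // it->key() / it->value()
      }
    }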

Q: RocksDB repair: when can I use it? Best practices?

A: Check https://github.com/facebook/rocksdb/wiki/RocksDB-Repairer

Configuration and Tuning

Q: What's the default value of the block cache?

A: 8MB. That's too low for most use cases, so it's likely that you need to set your own value.
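
A sketch of setting your own value; the 512MB figure is an arbitrary example, not a recommendation:

    #include "rocksdb/cache.h"
    #include "rocksdb/options.h"
    #include "rocksdb/table.h"

    rocksdb::Options MakeOptions() {
      rocksdb::BlockBasedTableOptions table_options;
      table_options.block_cache = rocksdb::NewLRUCache(512 << 20);  // 512MB
      rocksdb::Options options;
      options.table_factory.reset(
          rocksdb::NewBlockBasedTableFactory(table_options));
      return options;
    }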

Q: Are bloom filter blocks of SST files always loaded to memory, or can they be loaded from disk?

A: The behavior is configurable. When BlockBasedTableOptions::cache_index_and_filter_blocks is set to true, bloom filters and the index block will be loaded into the LRU block cache only when related Get() requests are issued. In the other case, where cache_index_and_filter_blocks is set to false, RocksDB will try to keep the index block and bloom filter in memory for up to DBOptions::max_open_files SST files.

Q: Is it safe to configure a different prefix extractor for different column families?

A: Yes.

Q: Can I change the prefix extractor?

A: No. Once you've specified a prefix extractor, you cannot change it. However, you can disable it by specifying a null value.

Q: How to configure RocksDB to use multiple disks?

A: You can create a single filesystem (ext3, xfs, etc.) on multiple disks. Then, you can run RocksDB on that single file system.
Some tips when using disks:

  • If RAID is used, use a larger RAID stripe size (64kb is too small, 1MB would be excellent).
  • Consider enabling compaction read-ahead by setting ColumnFamilyOptions::compaction_readahead_size to at least 2MB.
  • If the workload is write-heavy, have enough compaction threads to keep the disks busy
  • Consider enabling async write behind for compaction

Q: Can I open RocksDB with a different compression type and still read old data?

A: Yes, since RocksDB stores the compression information in each SST file and performs decompression accordingly, you can change the compression type and the DB will still be able to read existing files. In addition, you can also specify a different compression for the last level by specifying ColumnFamilyOptions::bottommost_compression.

Q: Can I put log files and sst files in different directories? How about information logs?

A: Yes. WAL files can be placed in a separate directory by specifying DBOptions::wal_dir, and information logs can likewise be written to a separate directory by using DBOptions::db_log_dir.
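
A configuration sketch with illustrative paths:

    #include "rocksdb/options.h"

    rocksdb::Options MakeOptions() {
      rocksdb::Options options;
      options.wal_dir = "/wal_disk/rocksdb_wal";     // WAL files
      options.db_log_dir = "/log_disk/rocksdb_log";  // information (LOG) files
      // SST files stay under the DB path passed to DB::Open().
      return options;
    }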

Q: If I use non-default comparators or merge operators, can I still use the ldb tool?

A: You cannot use the regular ldb tool in this case. However, you can build your own custom ldb tool by passing your options to rocksdb::LDBTool::Run(argc, argv, options) and compiling it.

Q: What will happen if I open RocksDB with a different compaction style?

A: When opening a RocksDB database with a different compaction style or compaction settings, one of the following scenarios will happen:

  1. The database will refuse to open if the new configuration is incompatible with the current LSM layout.
  2. If the new configuration is compatible with the current LSM layout, then RocksDB will continue and open the database. However, in order to make the new options take full effect, it might require a full compaction.

Consider using the migration helper function OptionChangeMigration(), which will compact the files to satisfy the new compaction style if needed.

Q: Does RocksDB have columns? If it doesn't have columns, why are there column families?

A: No, RocksDB doesn't have columns. See https://github.com/facebook/rocksdb/wiki/Column-Families for what a column family is.

Q: How can I estimate how much space will be reclaimed if I issue a full manual compaction?

A: There is no easy way to predict it accurately, especially when there is a compaction filter. If the database size is steady, the DB property rocksdb.estimate-live-data-size is the best estimation.

Q: What's the difference between a snapshot, a checkpoint and a backup?

A: A snapshot is a logical concept. Users can query data using the program interface, but underlying compactions still rewrite existing files.

A checkpoint will create a physical mirror of all the database files using the same Env. This operation is very cheap if the file system supports hard links for creating mirrored files.

A backup can move the physical database files to another Env (like HDFS). The backup engine also supports incremental copy between different backups.

Q: Which compression type should I use?

A: Start with LZ4 (or Snappy, if LZ4 is not available) for all levels for good performance. If you want to further reduce data size, try to use ZStandard (or Zlib, if ZStandard is not available) in the bottommost level. See https://github.com/facebook/rocksdb/wiki/Setup-Options-and-Basic-Tuning#compression
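
That recommendation in options form, as a sketch:

    #include "rocksdb/options.h"

    rocksdb::Options MakeOptions() {
      rocksdb::Options options;
      options.compression = rocksdb::kLZ4Compression;   // all levels
      options.bottommost_compression = rocksdb::kZSTD;  // last level only
      return options;
    }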

Q: Is compaction needed if no key is deleted or overwritten?

A: Even if there is no need to clear out-of-date data, compaction is needed to ensure read performance.

Q: After a write with options.disableWAL=true, if I write another record with options.sync=true, will it persist the previous write too?

A: No. After the program crashes, writes with options.disableWAL=true will be lost if they are not flushed to SST files.

Q: What is options.target_file_size_multiplier useful for?

A: It's a rarely used feature. For example, you can use it to reduce the number of SST files.

Q: I observed burst write I/Os. How can I eliminate that?

A: Try to use the rate limiter: see https://github.com/facebook/rocksdb/wiki/Rate-Limiter
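
A sketch of attaching a rate limiter; the 10MB/s budget is an arbitrary example:

    #include "rocksdb/options.h"
    #include "rocksdb/rate_limiter.h"

    rocksdb::Options MakeOptions() {
      rocksdb::Options options;
      // Caps the I/O issued by flushes and compactions.
      options.rate_limiter.reset(rocksdb::NewGenericRateLimiter(10 << 20));
      return options;
    }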

Q: Can I change the compaction filter without reopening the DB?

A: It's not supported. However, you can achieve it by implementing your own CompactionFilterFactory which returns different compaction filters.

Q: How many column families can a single db support?

A: Users should be able to run at least thousands of column families without seeing any error. However, too many column families don't usually perform well. We don't recommend using more than a few hundred column families.

Q: Can I reuse DBOptions or ColumnFamilyOptions to open multiple DBs or column families?

A: Yes. Internally, RocksDB always makes a copy of those options, so you can freely change them and reuse these objects.

Portability

Q: Can I run RocksDB and store the data on HDFS?

A: Yes, by using the Env returned by NewHdfsEnv(), RocksDB will store the data on HDFS. However, file locks are currently not supported in the HDFS Env.

Q: Does RocksJava support all the features?

A: We are working toward making RocksJava feature compatible. However, you're more than welcome to submit a pull request if you find something missing.

Backup

Q: Can I preserve a “snapshot” of RocksDB and later roll back the DB state to it?

A: Yes, via the BackupEngine or [[Checkpoints]].

Q: Does BackupableDB create a point-in-time snapshot of the database?

A: Yes, when BackupOptions::backup_log_files = true, or when flush_before_backup = true is passed to CreateNewBackup().

Q: Does the backup process affect accesses to the database in the meantime?

A: No, you can keep reading and writing to the database at the same time.

Q: How can I configure RocksDB to backup to HDFS?

A: Use BackupableDB and set backup_env to the return value of NewHdfsEnv().

Failure Handling

Q: Does RocksDB throw exceptions?

A: No, RocksDB returns rocksdb::Status to indicate any error. However, RocksDB does not catch exceptions thrown by STL or other dependencies. For instance, it's possible that you will see std::bad_alloc when memory allocation fails, or similar exceptions in other situations.

Q: How does RocksDB handle read or write I/O errors?

A: If an I/O error happens in foreground operations such as Get() and Write(), RocksDB will return a rocksdb::IOError status. If the error happens in background threads and options.paranoid_checks=true, RocksDB will switch to read-only mode. All writes will be rejected with a status code representing the background error.

Q: How to distinguish the type of exceptions thrown by RocksJava?

A: RocksJava throws RocksDBException for all RocksDB-related exceptions.

Failure Recovery

Q: If my process crashes, can it corrupt the database?

A: No, but data in the un-flushed memtables might be lost if [[Write Ahead Log]] (WAL) is disabled.

Q: If my machine crashes and reboots, will RocksDB preserve the data?

A: Data is synced when you issue a sync write (write with WriteOptions.sync=true), call DB::SyncWAL(), or when memtables are flushed.

Q: How to know the number of keys stored in a RocksDB database?

A: Use GetIntProperty(cf_handle, "rocksdb.estimate-num-keys") to obtain an estimated number of keys stored in a column family, or use GetAggregatedIntProperty("rocksdb.estimate-num-keys", &num_keys) to obtain an estimated number of keys stored in the whole RocksDB database.
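
A usage sketch, assuming an open rocksdb::DB* db:

    #include <cstdint>
    #include "rocksdb/db.h"

    void CountKeys(rocksdb::DB* db) {
      uint64_t num_keys = 0;
      // Estimate for one column family:
      db->GetIntProperty(db->DefaultColumnFamily(),
                         "rocksdb.estimate-num-keys", &num_keys);
      // Estimate aggregated over all column families:
      db->GetAggregatedIntProperty("rocksdb.estimate-num-keys", &num_keys);
    }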

Q: Why can GetIntProperty only return an estimated number of keys in a RocksDB database?

A: Obtaining an accurate number of keys in any LSM database like RocksDB is a challenging problem, as they contain duplicate keys and deletion entries (i.e., tombstones), and a full compaction would be required to get an accurate count. In addition, if the RocksDB database contains merge operators, the estimated number of keys becomes even less accurate.

Resource Management

Q: How much resource does an iterator hold, and when will these resources be released?

A: Iterators hold both data blocks and memtables in memory. The resources each iterator holds are:

  1. The data blocks that the iterator is currently pointing to. See https://github.com/facebook/rocksdb/wiki/Memory-usage-in-RocksDB#blocks-pinned-by-iterators
  2. The memtables that existed when the iterator was created, even after the memtables have been flushed.
  3. All the SST files on disk that existed when the iterator was created, even if they are compacted.

These resources will be released when the iterator is deleted.

Q: How to estimate the total size of index and filter blocks in a DB?

A: For an offline DB, "sst_dump --show_properties --command=none" will show you the index and filter size for a specific sst file. You can sum them up for the whole DB. For a running DB, you can fetch the DB property kAggregatedTableProperties, or call DB::GetPropertiesOfAllTables() and sum up the index and filter block sizes of individual files, as in the sketch below.
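
A sketch of the running-DB approach for the default column family:

    #include <cstdint>
    #include "rocksdb/db.h"
    #include "rocksdb/table_properties.h"

    void SumIndexAndFilter(rocksdb::DB* db) {
      rocksdb::TablePropertiesCollection props;
      rocksdb::Status s = db->GetPropertiesOfAllTables(&props);
      if (!s.ok()) return;
      uint64_t index_size = 0, filter_size = 0;
      for (const auto& entry : props) {  // one entry per SST file
        index_size += entry.second->index_size;
        filter_size += entry.second->filter_size;
      }
    }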

Q: Can RocksDB tell us the total number of keys in the database? Or the total number of keys within a range?

A: RocksDB can estimate the number of keys through the DB property "rocksdb.estimate-num-keys". Note this estimation can be far off when there are merge operators, overwrites of existing keys, or deletions of non-existing keys.

The best way to estimate the total number of keys within a range is to first estimate the size of the range by calling DB::GetApproximateSizes(), and then estimate the number of keys from that.

Others

Q: Who is using RocksDB?

A: https://github.com/facebook/rocksdb/blob/main/USERS.md

Q: How should I implement multiple data shards/partitions?

A: You can use one RocksDB database per shard/partition. Multiple RocksDB instances can run as separate processes or within a single process. When multiple instances of RocksDB are used within a single process, some resources (like the thread pool, block cache, rate limiter, etc.) can be shared between those RocksDB instances. (See https://github.com/facebook/rocksdb/wiki/RocksDB-Overview#support-for-multiple-embedded-databases-in-the-same-process)

Q: DB operations fail because of out-of-space. How can I unblock myself?

A: First clear up some free space. The DB will automatically start accepting operations once enough free space is available. The only exception is if 2PC is enabled and the WAL sync fails (in this case, the DB needs to be reopened). See [[Background Error Handling]] for more details.
