A Lightweight Elasticsearch Alternative: Meilisearch

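Overview

Written in Rust: fast (in my experience even faster than Go-based engines) and lightweight, whereas Java-based stacks tend to be bloated and memory-hungry.
Meilisearch supports most of the queries commonly used with ES, which is enough for typical needs.
Well suited to single-node deployment (which doesn't mean it can't be deployed in a distributed setup); a few million documents is entirely manageable.
Pagination works better than in ES, and search ranking relevance is roughly on par with ES's scoring.
It will probably need a few more years of maturing before developers widely accept it.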

Installation

docker pull getmeili/meilisearch:v1.8
docker run -d --name meilisearch -p 7700:7700 -e MEILI_MASTER_KEY='meilisearch-api' getmeili/meilisearch:v1.8

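MEILI_MASTER_KEY is a custom secret that works like a password. It is optional in the default development mode; in production mode (MEILI_ENV=production) a master key of at least 16 bytes is mandatory, so the 15-character example key above would not be accepted there.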

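Open http://127.0.0.1:7700/ in a browser and enter the key above.
This is a built-in web console (the search preview) that is handy for debugging; it is only served in development mode.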

Meilisearch provides sample data to experiment with.

Download the sample data:
curl https://www.meilisearch.com/movies.json -O

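Create the index and import the data (the index is created automatically on the first insert, and the call returns an asynchronous task):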
curl -X POST "http://localhost:7700/indexes/movies/documents?primaryKey=id" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer meilisearch-api" \
  --data-binary @movies.json

You can now browse the data at http://127.0.0.1:7700/, or query it from your own program.
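A minimal Python sketch of querying from a program, using only the standard library and the REST endpoint POST /indexes/{index}/search. It assumes the local container started above, the master key meilisearch-api and the movies index; `build_search_request` and `search` are illustrative names, not part of any Meilisearch SDK. Building the request separately from sending it lets you inspect it without a running server.

```python
import json
import urllib.request

# Assumptions: the local container from the docker command above,
# the "movies" index imported with curl, and the master key 'meilisearch-api'.
MEILI_URL = "http://127.0.0.1:7700"
API_KEY = "meilisearch-api"


def build_search_request(index: str, query: str, limit: int = 5) -> urllib.request.Request:
    """Build (but do not send) a POST /indexes/{index}/search request."""
    body = json.dumps({"q": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        f"{MEILI_URL}/indexes/{index}/search",
        data=body,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )


def search(index: str, query: str, limit: int = 5) -> dict:
    """Send the request and decode the JSON response (needs a running server)."""
    with urllib.request.urlopen(build_search_request(index, query, limit)) as resp:
        return json.load(resp)
```

With the server running, `search("movies", "botman")["hits"]` returns the matching documents, each carrying the movies.json fields such as id and title. In a real application you would send a search-only API key rather than the master key.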

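Configuration

Run meilisearch with -h to list the options it supports.
Official documentation on configuration options:
https://www.meilisearch.com/docs/learn/self_hosted/configure_meilisearch_at_launch#command-line-options-and-flags

The official page is very detailed, so for quick reference I list the complete -h output here. Options, especially the experimental ones, differ between versions, so check -h on the version you actually run.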

/meili_data # /bin/meilisearch -h
Usage: meilisearch [OPTIONS]

Options:
      --config-file-path <CONFIG_FILE_PATH>
          Set the path to a configuration file that should be used to setup the engine. Format must be TOML
      --db-path <DB_PATH>
          Designates the location where database files will be created and retrieved [env: MEILI_DB_PATH=] [default: ./data.ms]
      --dump-dir <DUMP_DIR>
          Sets the directory where Meilisearch will create dump files [env: MEILI_DUMP_DIR=] [default: dumps/]
      --env <ENV>
          Configures the instance's environment. Value must be either `production` or `development` [env: MEILI_ENV=] [default: development] [possible values: development, production]
      --experimental-contains-filter
          Experimental contains filter feature. For more information, see: <https://github.com/orgs/meilisearch/discussions/763> [env: MEILI_EXPERIMENTAL_CONTAINS_FILTER=]
      --experimental-drop-search-after <EXPERIMENTAL_DROP_SEARCH_AFTER>
          Experimental drop search after. For more information, see: <https://github.com/orgs/meilisearch/discussions/783> [env: MEILI_EXPERIMENTAL_DROP_SEARCH_AFTER=] [default: 60]
      --experimental-dumpless-upgrade
          Experimental dumpless upgrade. For more information, see: <https://github.com/orgs/meilisearch/discussions/804> [env: MEILI_EXPERIMENTAL_DUMPLESS_UPGRADE=]
      --experimental-embedding-cache-entries <EXPERIMENTAL_EMBEDDING_CACHE_ENTRIES>
          Enables experimental caching of search query embeddings. The value represents the maximal number of entries in the cache of each distinct embedder [env: MEILI_EXPERIMENTAL_EMBEDDING_CACHE_ENTRIES=] [default: 0]
      --experimental-enable-logs-route
          Experimental logs route feature. For more information, see: <https://github.com/orgs/meilisearch/discussions/721> [env: MEILI_EXPERIMENTAL_ENABLE_LOGS_ROUTE=]
      --experimental-enable-metrics
          Experimental metrics feature. For more information, see: <https://github.com/meilisearch/meilisearch/discussions/3518> [env: MEILI_EXPERIMENTAL_ENABLE_METRICS=]
      --experimental-limit-batched-tasks-total-size <EXPERIMENTAL_LIMIT_BATCHED_TASKS_TOTAL_SIZE>
          Experimentally reduces the maximum total size, in bytes, of tasks that will be processed at once, see: <https://github.com/orgs/meilisearch/discussions/801> [env: MEILI_EXPERIMENTAL_LIMIT_BATCHED_TASKS_SIZE=] [default: 18446744073709551615]
      --experimental-logs-mode <EXPERIMENTAL_LOGS_MODE>
          Experimental logs mode feature. For more information, see: <https://github.com/orgs/meilisearch/discussions/723> [env: MEILI_EXPERIMENTAL_LOGS_MODE=] [default: HUMAN]
      --experimental-max-number-of-batched-tasks <EXPERIMENTAL_MAX_NUMBER_OF_BATCHED_TASKS>
          Experimentally reduces the maximum number of tasks that will be processed at once, see: <https://github.com/orgs/meilisearch/discussions/713> [env: MEILI_EXPERIMENTAL_MAX_NUMBER_OF_BATCHED_TASKS=] [default: 18446744073709551615]
      --experimental-nb-searches-per-core <EXPERIMENTAL_NB_SEARCHES_PER_CORE>
          Experimental number of searches per core. For more information, see: <https://github.com/orgs/meilisearch/discussions/784> [env: MEILI_EXPERIMENTAL_NB_SEARCHES_PER_CORE=] [default: 4]
      --experimental-no-snapshot-compaction
          Experimental no snapshot compaction feature [env: MEILI_EXPERIMENTAL_NO_SNAPSHOT_COMPACTION=]
      --experimental-reduce-indexing-memory-usage
          Experimental RAM reduction during indexing, do not use in production, see: <https://github.com/meilisearch/product/discussions/652> [env: MEILI_EXPERIMENTAL_REDUCE_INDEXING_MEMORY_USAGE=]
      --experimental-replication-parameters
          Enable multiple features that helps you to run meilisearch in a replicated context. For more information, see: <https://github.com/orgs/meilisearch/discussions/725> [env: MEILI_EXPERIMENTAL_REPLICATION_PARAMETERS=]
      --experimental-search-queue-size <EXPERIMENTAL_SEARCH_QUEUE_SIZE>
          Experimental search queue size. For more information, see: <https://github.com/orgs/meilisearch/discussions/729> [env: MEILI_EXPERIMENTAL_SEARCH_QUEUE_SIZE=] [default: 1000]
  -h, --help
          Print help (see more with '--help')
      --http-addr <HTTP_ADDR>
          Sets the HTTP address and port Meilisearch will use [env: MEILI_HTTP_ADDR=0.0.0.0:7700] [default: localhost:7700]
      --http-payload-size-limit <HTTP_PAYLOAD_SIZE_LIMIT>
          Sets the maximum size of accepted payloads. Value must be given in bytes or explicitly stating a base unit (for instance: 107374182400, '107.7Gb', or '107374 Mb') [env: MEILI_HTTP_PAYLOAD_SIZE_LIMIT=] [default: 100000000]
      --ignore-dump-if-db-exists
          Prevents a Meilisearch instance with an existing database from throwing an error when using `--import-dump`. Instead, the dump will be ignored and Meilisearch will launch using the existing database [env: MEILI_IGNORE_DUMP_IF_DB_EXISTS=]
      --ignore-missing-dump
          Prevents Meilisearch from throwing an error when `--import-dump` does not point to a valid dump file. Instead, Meilisearch will start normally without importing any dump [env: MEILI_IGNORE_MISSING_DUMP=]
      --ignore-missing-snapshot
          Prevents a Meilisearch instance from throwing an error when `--import-snapshot` does not point to a valid snapshot file [env: MEILI_IGNORE_MISSING_SNAPSHOT=]
      --ignore-snapshot-if-db-exists
          Prevents a Meilisearch instance with an existing database from throwing an error when using `--import-snapshot`. Instead, the snapshot will be ignored and Meilisearch will launch using the existing database [env: MEILI_IGNORE_SNAPSHOT_IF_DB_EXISTS=]
      --import-dump <IMPORT_DUMP>
          Imports the dump file located at the specified path. Path must point to a `.dump` file. If a database already exists, Meilisearch will throw an error and abort launch [env: MEILI_IMPORT_DUMP=]
      --import-snapshot <IMPORT_SNAPSHOT>
          Launches Meilisearch after importing a previously-generated snapshot at the given filepath [env: MEILI_IMPORT_SNAPSHOT=]
      --log-level <LOG_LEVEL>
          Defines how much detail should be present in Meilisearch's logs [env: MEILI_LOG_LEVEL=] [default: INFO]
      --master-key <MASTER_KEY>
          Sets the instance's master key, automatically protecting all routes except `GET /health` [env: MEILI_MASTER_KEY=meilisearch-api]
      --max-indexing-memory <MAX_INDEXING_MEMORY>
          Sets the maximum amount of RAM Meilisearch can use when indexing. By default, Meilisearch uses no more than two thirds of available memory [env: MEILI_MAX_INDEXING_MEMORY=] [default: "10.373163858428597 GiB"]
      --max-indexing-threads <MAX_INDEXING_THREADS>
          Sets the maximum number of threads Meilisearch can use during indexation. By default, the indexer avoids using more than half of a machine's total processing units. This ensures Meilisearch is always ready to perform searches, even while you are updating an index [env: MEILI_MAX_INDEXING_THREADS=] [default: 6]
      --no-analytics
          Deactivates Meilisearch's built-in telemetry when provided [env: MEILI_NO_ANALYTICS=]
      --schedule-snapshot [<SNAPSHOT_INTERVAL_SEC>]
          Activates scheduled snapshots when provided. Snapshots are disabled by default [env: MEILI_SCHEDULE_SNAPSHOT=] [default: ]
      --snapshot-dir <SNAPSHOT_DIR>
          Sets the directory where Meilisearch will store snapshots [env: MEILI_SNAPSHOT_DIR=] [default: snapshots/]
      --ssl-auth-path <SSL_AUTH_PATH>
          Enables client authentication in the specified path [env: MEILI_SSL_AUTH_PATH=]
      --ssl-cert-path <SSL_CERT_PATH>
          Sets the server's SSL certificates [env: MEILI_SSL_CERT_PATH=]
      --ssl-key-path <SSL_KEY_PATH>
          Sets the server's SSL key files [env: MEILI_SSL_KEY_PATH=]
      --ssl-ocsp-path <SSL_OCSP_PATH>
          Sets the server's OCSP file. *Optional* [env: MEILI_SSL_OCSP_PATH=]
      --ssl-require-auth
          Makes SSL authentication mandatory [env: MEILI_SSL_REQUIRE_AUTH=]
      --ssl-resumption
          Activates SSL session resumption [env: MEILI_SSL_RESUMPTION=]
      --ssl-tickets
          Activates SSL tickets [env: MEILI_SSL_TICKETS=]
      --task-webhook-authorization-header <TASK_WEBHOOK_AUTHORIZATION_HEADER>
          The Authorization header to send on the webhook URL whenever a task finishes so a third party can be notified [env: MEILI_TASK_WEBHOOK_AUTHORIZATION_HEADER=]
      --task-webhook-url <TASK_WEBHOOK_URL>
          Called whenever a task finishes so a third party can be notified [env: MEILI_TASK_WEBHOOK_URL=]
  -V, --version
          Print version

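The short English descriptions are enough to work out what each option does.

Note that every option can be set either as a command-line flag or through the environment variable shown in its [env: ...] entry. Taking the data directory option --db-path as an example, the deployment command is:
/bin/meilisearch --db-path=/home/data/data.ms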

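Options can also be supplied through a configuration file; note that only TOML is supported. For example, create a config.toml file containing:

db_path = "/home/data/data.ms"

Start the service:
/bin/meilisearch --config-file-path=./config.toml

The effect is the same as the command above.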
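Since the same option shows up as a flag, an environment variable and a TOML key, the naming pattern visible in the -h output can be captured in a small helper. This is a hypothetical sketch: `option_names` and `to_toml` are my own names, and the mapping is a convention read off the [env: ...] column rather than a documented guarantee (for instance, --experimental-limit-batched-tasks-total-size lists MEILI_EXPERIMENTAL_LIMIT_BATCHED_TASKS_SIZE). Per the official docs, when an option is set in several places, command-line flags take precedence over environment variables, which take precedence over the config file.

```python
def option_names(flag: str) -> dict:
    """Derive the env var and TOML key from a CLI flag such as '--db-path'.

    Follows the naming pattern visible in `meilisearch -h`; it is a convention,
    not a guarantee, so double-check the [env: ...] entry for each option.
    """
    name = flag.lstrip("-")               # 'db-path'
    snake = name.replace("-", "_")        # 'db_path'
    return {
        "cli": f"--{name}",
        "env": f"MEILI_{snake.upper()}",  # 'MEILI_DB_PATH'
        "toml": snake,                    # 'db_path'
    }


def to_toml(options: dict) -> str:
    """Render {flag: value} pairs as config.toml lines (str, int and bool only)."""
    lines = []
    for flag, value in options.items():
        key = option_names(flag)["toml"]
        if isinstance(value, bool):       # check bool before int (bool is an int)
            rendered = "true" if value else "false"
        elif isinstance(value, int):
            rendered = str(value)
        else:                             # quote strings, escaping \ and "
            rendered = '"' + str(value).replace("\\", "\\\\").replace('"', '\\"') + '"'
        lines.append(f"{key} = {rendered}")
    return "\n".join(lines) + "\n"
```

For example, `to_toml({"--db-path": "/home/data/data.ms"})` produces the same single line as the config.toml shown above.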

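The project also provides a complete example configuration file:
curl https://raw.githubusercontent.com/meilisearch/meilisearch/latest/config.toml > config.toml

The most frequently used options are db-path, import-dump, config-file-path and master-key.
For further operations work, the rest of the options above are worth a closer look.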

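Features

Think of it as a document database: an index plays the role of a table, and a document corresponds to a single record.

I won't spend much time cataloguing every query feature here (they are easy to look up online); below are a few commonly used ones.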

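Index creation
Indexes can be created explicitly or implicitly: if you insert documents into an index that does not exist yet, Meilisearch creates it from the data and inserts the records.
Primary key
Every index must have a primary key attribute whose value uniquely identifies each document. You can name this field when creating the index.
If you don't, Meilisearch tries to infer it from your documents, and the task fails if it cannot settle on a single candidate field.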

You can also change the primary key field later, but only while the index contains no documents.
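Explicit creation and a primary key change are plain REST calls: POST /indexes with a uid and primaryKey, and PATCH /indexes/{uid} with a new primaryKey. A sketch that only builds the requests; the index books, the fields book_id and isbn, and the helper names are hypothetical, and the URL and key assume the local container above.

```python
import json
import urllib.request

MEILI_URL = "http://127.0.0.1:7700"   # assumed local instance
API_KEY = "meilisearch-api"           # assumed master key


def _request(method: str, path: str, body: dict) -> urllib.request.Request:
    """Build an authenticated JSON request without sending it."""
    return urllib.request.Request(
        MEILI_URL + path,
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json",
                 "Authorization": f"Bearer {API_KEY}"},
        method=method,
    )


def create_index(uid: str, primary_key: str) -> urllib.request.Request:
    """Explicit creation: POST /indexes with the index uid and its primary key."""
    return _request("POST", "/indexes", {"uid": uid, "primaryKey": primary_key})


def update_primary_key(uid: str, primary_key: str) -> urllib.request.Request:
    """PATCH /indexes/{uid}; only accepted while the index holds no documents."""
    return _request("PATCH", f"/indexes/{uid}", {"primaryKey": primary_key})
```

Sending either one with `urllib.request.urlopen(...)` returns a task summary rather than the finished result, since index operations are asynchronous. Implicit creation needs no call at all: the curl import above did it.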

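Asynchronous tasks: adding or updating documents, creating indexes and similar operations are asynchronous. The request returns a task summary immediately and the work is queued and processed in the background, which keeps heavy indexing from blocking the API.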
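Because these calls only enqueue work, a client that needs the outcome has to poll the task. The write request returns a taskUid, and GET /tasks/{taskUid} reports a status of enqueued, processing, succeeded, failed or canceled. `wait_for_task` below is a hypothetical helper that polls until a final state; the fetch function is injectable so the logic can be tried without a server, and the URL and key are assumptions.

```python
import json
import time
import urllib.request
from typing import Callable

MEILI_URL = "http://127.0.0.1:7700"   # assumed local instance
API_KEY = "meilisearch-api"           # assumed master key
FINISHED = {"succeeded", "failed", "canceled"}


def fetch_task(task_uid: int) -> dict:
    """GET /tasks/{taskUid} and decode the task object (needs a running server)."""
    req = urllib.request.Request(
        f"{MEILI_URL}/tasks/{task_uid}",
        headers={"Authorization": f"Bearer {API_KEY}"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)


def wait_for_task(task_uid: int,
                  fetch: Callable[[int], dict] = fetch_task,
                  interval: float = 0.5,
                  timeout: float = 30.0) -> dict:
    """Poll until the task leaves the enqueued/processing states, or time out."""
    deadline = time.monotonic() + timeout
    while True:
        task = fetch(task_uid)
        if task["status"] in FINISHED:
            return task
        if time.monotonic() >= deadline:
            raise TimeoutError(f"task {task_uid} still {task['status']}")
        time.sleep(interval)
```

A finished task with status failed carries an error object describing what went wrong, so check the status rather than assuming success.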
Data export, import and migration (dumps and snapshots).
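A hedged sketch of the dump workflow: POST /dumps asks the running instance to write a .dump file into its --dump-dir (this is itself an asynchronous task), and a fresh instance can then be started with --import-dump, both option names taken from the -h output above. The helper names, the data path and the example dump file name are hypothetical.

```python
import urllib.request

MEILI_URL = "http://127.0.0.1:7700"   # assumed source instance
API_KEY = "meilisearch-api"           # assumed master key


def create_dump_request() -> urllib.request.Request:
    """POST /dumps: ask the server to write a .dump file into its --dump-dir."""
    return urllib.request.Request(
        f"{MEILI_URL}/dumps",
        data=b"",                     # the route takes no body
        headers={"Authorization": f"Bearer {API_KEY}"},
        method="POST",
    )


def import_command(dump_file: str, db_path: str) -> list:
    """argv for starting a fresh instance from a dump (flags from `meilisearch -h`)."""
    if not dump_file.endswith(".dump"):
        raise ValueError("--import-dump expects a .dump file")
    return ["/bin/meilisearch", "--db-path", db_path, "--import-dump", dump_file]
```

The argv can be passed to `subprocess.run`. If a database already exists at db_path, the import aborts unless --ignore-dump-if-db-exists is given. Dumps are also the usual way to carry data across Meilisearch version upgrades.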
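AI-powered search with embedders: for example, you can add vector search just by configuring an embedding provider's API key. I recommend OpenAI's embedding models, which cost only a few cents per million tokens (check current pricing).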
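A sketch of what configuring an OpenAI embedder and running a hybrid search could look like, as JSON bodies built in Python. The embedders setting, the source value openAi, and the hybrid parameter with embedder and semanticRatio follow the Meilisearch API as I know it, but details changed across versions (around v1.8 vector search was still experimental and had to be enabled first through the /experimental-features route), so check the docs for your version. The embedder name default, the model and the document template are assumptions.

```python
def embedder_settings(api_key: str, model: str = "text-embedding-3-small") -> dict:
    """Body for PATCH /indexes/movies/settings registering an OpenAI embedder."""
    return {
        "embedders": {
            "default": {                  # hypothetical embedder name
                "source": "openAi",
                "apiKey": api_key,
                "model": model,           # assumed model choice
                # Liquid template that turns each document into the text to embed
                "documentTemplate": "A movie titled {{doc.title}}: {{doc.overview}}",
            }
        }
    }


def hybrid_search_body(query: str, semantic_ratio: float = 0.5) -> dict:
    """Body for POST /indexes/movies/search mixing keyword and vector results.

    semanticRatio 0.0 means pure keyword search, 1.0 pure vector search.
    """
    if not 0.0 <= semantic_ratio <= 1.0:
        raise ValueError("semanticRatio must be between 0 and 1")
    return {
        "q": query,
        "hybrid": {"embedder": "default", "semanticRatio": semantic_ratio},
    }
```

Registering the embedder triggers an indexing task that calls the provider for every document, which is where the per-token cost comes in.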
Access control via API keys, including temporary permissions (keys with an expiry date, or tenant tokens).
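A sketch of a temporary, search-only key: POST /keys (sent with the master key) accepts actions, indexes and an expiresAt timestamp, and the response contains the generated key. The helper name and description text are hypothetical; the current time is passed in so the result is deterministic.

```python
from datetime import datetime, timedelta, timezone


def search_key_body(index: str, valid_for: timedelta, now: datetime) -> dict:
    """Body for POST /keys: a search-only key limited to one index that expires."""
    expires = (now + valid_for).astimezone(timezone.utc)
    return {
        "description": f"temporary search key for {index}",  # hypothetical label
        "actions": ["search"],          # no write or admin rights
        "indexes": [index],             # only this index is reachable
        "expiresAt": expires.strftime("%Y-%m-%dT%H:%M:%SZ"),
    }
```

After expiresAt, requests made with the key are rejected, which makes it suitable for handing out short-lived access.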

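Search

Basic search: plain full-text search. Note that searching for "Americane", where the trailing e is an accidental typo, still finds "American", because typo tolerance is built in.
Filters: restrict results on field values, like a SQL WHERE clause. A field must first be added to the index's filterableAttributes setting.
Sorting: ascending or descending order on fields listed in the sortableAttributes setting.
Pagination: yes, with offset and limit.
Prefix search, similar to LIKE 'mat%' in MySQL: searching "mat" also matches "matrix".
Synonyms: you can configure phone so that searching it also searches [iphone, apple phone], saving you from querying each alias separately.
Searchable attributes: you can choose which fields of an index are searched.
Attribute cropping: suppose you search an index of novels whose 100,000-character text is stored in a content field. Returning it whole would be huge, so at search time you can crop that field to a given length (cropLength counts words).
Returning selected fields, i.e. going from SELECT * to SELECT id, name, sex.
Highlighting: matched terms in the results can be wrapped in tags, e.g. 中华<em>人民</em>共和国万岁, where the matched term 人民 sits inside the tags so the front end can style it. The tags are customizable, and several fields can be highlighted at once, e.g. title and desc of a real record.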
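Filters, sorting, synonyms and searchable fields are all index settings, which can be updated together with PATCH /indexes/movies/settings. A sketch of such a body for the sample movies data; the field choices are assumptions and the helper names are hypothetical. Updating settings is itself an asynchronous task that may trigger reindexing.

```python
def movie_settings() -> dict:
    """Body for PATCH /indexes/movies/settings (field names from movies.json)."""
    return {
        # filter and sort only work on attributes declared here first
        "filterableAttributes": ["genres", "release_date"],
        "sortableAttributes": ["release_date", "title"],
        # search only these fields; earlier fields rank higher
        "searchableAttributes": ["title", "overview"],
        # one-way synonyms: searching "phone" also matches the listed terms
        "synonyms": {"phone": ["iphone", "apple phone"]},
    }


def synonyms_both_ways(group: list) -> dict:
    """Make every word in the group a synonym of every other (mutual synonyms)."""
    return {word: [w for w in group if w != word] for word in group}
```

Synonyms declared as in movie_settings are one-way: searching iphone does not also return phone results. To make a group mutual, each word has to list the others, which is what synonyms_both_ways produces.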

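Commonly used search parameters

Parameter	Description
q	Query keywords
filter	Filter expression
sort	Sort fields
limit	Maximum number of results returned
offset	Number of results to skip
attributesToRetrieve	Fields to return
attributesToHighlight	Fields to highlight
attributesToCrop	Fields to crop
cropLength	Crop length (in words)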
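The parameters in the table map directly onto the JSON body of POST /indexes/{uid}/search. The hypothetical helper `build_search_body` below assembles them and converts a page number into offset and limit; the example values are assumptions for the movies data.

```python
def build_search_body(q: str, *, filter_expr=None, sort=None, page: int = 1,
                      page_size: int = 20, retrieve=None, highlight=None,
                      crop=None, crop_length: int = 10) -> dict:
    """Assemble a POST /indexes/{uid}/search body from the table's parameters."""
    if page < 1 or page_size < 1:
        raise ValueError("page and page_size must be >= 1")
    # page N of size S means skipping (N - 1) * S hits
    body = {"q": q, "limit": page_size, "offset": (page - 1) * page_size}
    if filter_expr:
        body["filter"] = filter_expr          # e.g. "genres = Drama"
    if sort:
        body["sort"] = sort                   # e.g. ["release_date:desc"]
    if retrieve:
        body["attributesToRetrieve"] = retrieve
    if highlight:
        body["attributesToHighlight"] = highlight
        body["highlightPreTag"] = "<em>"      # same as the default tags
        body["highlightPostTag"] = "</em>"
    if crop:
        body["attributesToCrop"] = crop
        body["cropLength"] = crop_length      # counted in words, not characters
    return body
```

Meilisearch also accepts page and hitsPerPage instead of offset and limit, in which case the response reports an exact totalHits and totalPages. By default, hits beyond the first 1000 cannot be reached through pagination (the maxTotalHits pagination setting).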

Meilisearch offers many more search options; see the documentation:
https://meilisearch.org.cn/docs/home