Meilisearch: a lightweight alternative to Elasticsearch

Overview

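  1. Written in Rust: fast (arguably faster than Go-based alternatives) and lightweight, while Java-based software tends to be heavy and memory-hungry
  2. Most of the common query types that ES supports are also supported by Meilisearch, which is enough for everyday use
  3. Best suited to single-node deployment, though that does not rule out a distributed setup; a data set of a few million documents is handled comfortably
  4. Pagination works better than in ES; search ranking is roughly on par with ES's relevance scoring in accuracy
  5. It still needs a few more years of maturing in the market before developers broadly accept it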

Installation

docker pull getmeili/meilisearch:v1.8
docker run -d --name meilisearch -p 7700:7700 -e MEILI_MASTER_KEY='meilisearch-api' getmeili/meilisearch:v1.8

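MEILI_MASTER_KEY is a key you choose yourself and works like a password. You can leave it unset in the default development environment, but the API is then unprotected.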

Open http://127.0.0.1:7700/ in a browser and enter the key above.
This is a web console that can be used for debugging.

The official site provides sample data to experiment with.

Download the sample data
curl https://www.meilisearch.com/movies.json -O

Create the index and import the data
curl -X POST "http://localhost:7700/indexes/movies/documents?primaryKey=id" \
  -H "Content-Type: application/json" \
  -H "Authorization: Bearer meilisearch-api" \
  --data-binary @movies.json

The results are now visible at http://127.0.0.1:7700/, and the data can of course also be queried from a program.
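
As a rough illustration of querying from a program, here is a minimal Python sketch that uses only the standard library. It assumes the instance started above is listening on localhost:7700 with the master key meilisearch-api, and that the sample documents have a title field; the helper names are made up for this example.

```python
import json
import urllib.request

# Assumptions: local instance from the install step, started with this master key.
BASE_URL = "http://localhost:7700"
API_KEY = "meilisearch-api"


def build_search_request(index_uid, query, limit=5):
    """Build (without sending) a POST /indexes/{index_uid}/search request."""
    body = json.dumps({"q": query, "limit": limit}).encode("utf-8")
    return urllib.request.Request(
        url=f"{BASE_URL}/indexes/{index_uid}/search",
        data=body,
        method="POST",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )


def search(index_uid, query, limit=5):
    """Send the request and return the list of matching documents ("hits")."""
    request = build_search_request(index_uid, query, limit)
    with urllib.request.urlopen(request) as response:
        return json.load(response)["hits"]
```

With the server running, search("movies", "botman") returns a list of dictionaries, one per matching movie. In a real service the master key should be replaced by a more restricted API key.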

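Configuration

Run meilisearch with the -h flag to list the configuration options it supports.
The official documentation for these options is at
https://www.meilisearch.com/docs/learn/self_hosted/configure_meilisearch_at_launch#command-line-options-and-flags

The official description is very detailed, so the full list is reproduced here for quick reference.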

/meili_data # /bin/meilisearch -h
Usage: meilisearch [OPTIONS]

Options:
      --config-file-path <CONFIG_FILE_PATH>
          Set the path to a configuration file that should be used to setup the engine. Format must be TOML
      --db-path <DB_PATH>
          Designates the location where database files will be created and retrieved [env: MEILI_DB_PATH=] [default: ./data.ms]
      --dump-dir <DUMP_DIR>
          Sets the directory where Meilisearch will create dump files [env: MEILI_DUMP_DIR=] [default: dumps/]
      --env <ENV>
          Configures the instance's environment. Value must be either `production` or `development` [env: MEILI_ENV=] [default: development] [possible values: development, production]
      --experimental-contains-filter
          Experimental contains filter feature. For more information, see: <https://github.com/orgs/meilisearch/discussions/763> [env: MEILI_EXPERIMENTAL_CONTAINS_FILTER=]
      --experimental-drop-search-after <EXPERIMENTAL_DROP_SEARCH_AFTER>
          Experimental drop search after. For more information, see: <https://github.com/orgs/meilisearch/discussions/783> [env: MEILI_EXPERIMENTAL_DROP_SEARCH_AFTER=] [default: 60]
      --experimental-dumpless-upgrade
          Experimental dumpless upgrade. For more information, see: <https://github.com/orgs/meilisearch/discussions/804> [env: MEILI_EXPERIMENTAL_DUMPLESS_UPGRADE=]
      --experimental-embedding-cache-entries <EXPERIMENTAL_EMBEDDING_CACHE_ENTRIES>
          Enables experimental caching of search query embeddings. The value represents the maximal number of entries in the cache of each distinct embedder [env: MEILI_EXPERIMENTAL_EMBEDDING_CACHE_ENTRIES=] [default: 0]
      --experimental-enable-logs-route
          Experimental logs route feature. For more information, see: <https://github.com/orgs/meilisearch/discussions/721> [env: MEILI_EXPERIMENTAL_ENABLE_LOGS_ROUTE=]
      --experimental-enable-metrics
          Experimental metrics feature. For more information, see: <https://github.com/meilisearch/meilisearch/discussions/3518> [env: MEILI_EXPERIMENTAL_ENABLE_METRICS=]
      --experimental-limit-batched-tasks-total-size <EXPERIMENTAL_LIMIT_BATCHED_TASKS_TOTAL_SIZE>
          Experimentally reduces the maximum total size, in bytes, of tasks that will be processed at once, see: <https://github.com/orgs/meilisearch/discussions/801> [env: MEILI_EXPERIMENTAL_LIMIT_BATCHED_TASKS_SIZE=] [default: 18446744073709551615]
      --experimental-logs-mode <EXPERIMENTAL_LOGS_MODE>
          Experimental logs mode feature. For more information, see: <https://github.com/orgs/meilisearch/discussions/723> [env: MEILI_EXPERIMENTAL_LOGS_MODE=] [default: HUMAN]
      --experimental-max-number-of-batched-tasks <EXPERIMENTAL_MAX_NUMBER_OF_BATCHED_TASKS>
          Experimentally reduces the maximum number of tasks that will be processed at once, see: <https://github.com/orgs/meilisearch/discussions/713> [env: MEILI_EXPERIMENTAL_MAX_NUMBER_OF_BATCHED_TASKS=] [default: 18446744073709551615]
      --experimental-nb-searches-per-core <EXPERIMENTAL_NB_SEARCHES_PER_CORE>
          Experimental number of searches per core. For more information, see: <https://github.com/orgs/meilisearch/discussions/784> [env: MEILI_EXPERIMENTAL_NB_SEARCHES_PER_CORE=] [default: 4]
      --experimental-no-snapshot-compaction
          Experimental no snapshot compaction feature [env: MEILI_EXPERIMENTAL_NO_SNAPSHOT_COMPACTION=]
      --experimental-reduce-indexing-memory-usage
          Experimental RAM reduction during indexing, do not use in production, see: <https://github.com/meilisearch/product/discussions/652> [env: MEILI_EXPERIMENTAL_REDUCE_INDEXING_MEMORY_USAGE=]
      --experimental-replication-parameters
          Enable multiple features that helps you to run meilisearch in a replicated context. For more information, see: <https://github.com/orgs/meilisearch/discussions/725> [env: MEILI_EXPERIMENTAL_REPLICATION_PARAMETERS=]
      --experimental-search-queue-size <EXPERIMENTAL_SEARCH_QUEUE_SIZE>
          Experimental search queue size. For more information, see: <https://github.com/orgs/meilisearch/discussions/729> [env: MEILI_EXPERIMENTAL_SEARCH_QUEUE_SIZE=] [default: 1000]
  -h, --help
          Print help (see more with '--help')
      --http-addr <HTTP_ADDR>
          Sets the HTTP address and port Meilisearch will use [env: MEILI_HTTP_ADDR=0.0.0.0:7700] [default: localhost:7700]
      --http-payload-size-limit <HTTP_PAYLOAD_SIZE_LIMIT>
          Sets the maximum size of accepted payloads. Value must be given in bytes or explicitly stating a base unit (for instance: 107374182400, '107.7Gb', or '107374 Mb') [env: MEILI_HTTP_PAYLOAD_SIZE_LIMIT=] [default: 100000000]
      --ignore-dump-if-db-exists
          Prevents a Meilisearch instance with an existing database from throwing an error when using `--import-dump`. Instead, the dump will be ignored and Meilisearch will launch using the existing database [env: MEILI_IGNORE_DUMP_IF_DB_EXISTS=]
      --ignore-missing-dump
          Prevents Meilisearch from throwing an error when `--import-dump` does not point to a valid dump file. Instead, Meilisearch will start normally without importing any dump [env: MEILI_IGNORE_MISSING_DUMP=]
      --ignore-missing-snapshot
          Prevents a Meilisearch instance from throwing an error when `--import-snapshot` does not point to a valid snapshot file [env: MEILI_IGNORE_MISSING_SNAPSHOT=]
      --ignore-snapshot-if-db-exists
          Prevents a Meilisearch instance with an existing database from throwing an error when using `--import-snapshot`. Instead, the snapshot will be ignored and Meilisearch will launch using the existing database [env: MEILI_IGNORE_SNAPSHOT_IF_DB_EXISTS=]
      --import-dump <IMPORT_DUMP>
          Imports the dump file located at the specified path. Path must point to a `.dump` file. If a database already exists, Meilisearch will throw an error and abort launch [env: MEILI_IMPORT_DUMP=]
      --import-snapshot <IMPORT_SNAPSHOT>
          Launches Meilisearch after importing a previously-generated snapshot at the given filepath [env: MEILI_IMPORT_SNAPSHOT=]
      --log-level <LOG_LEVEL>
          Defines how much detail should be present in Meilisearch's logs [env: MEILI_LOG_LEVEL=] [default: INFO]
      --master-key <MASTER_KEY>
          Sets the instance's master key, automatically protecting all routes except `GET /health` [env: MEILI_MASTER_KEY=meilisearch-api]
      --max-indexing-memory <MAX_INDEXING_MEMORY>
          Sets the maximum amount of RAM Meilisearch can use when indexing. By default, Meilisearch uses no more than two thirds of available memory [env: MEILI_MAX_INDEXING_MEMORY=] [default: "10.373163858428597 GiB"]
      --max-indexing-threads <MAX_INDEXING_THREADS>
          Sets the maximum number of threads Meilisearch can use during indexation. By default, the indexer avoids using more than half of a machine's total processing units. This ensures Meilisearch is always ready to perform searches, even while you are updating an index [env: MEILI_MAX_INDEXING_THREADS=] [default: 6]
      --no-analytics
          Deactivates Meilisearch's built-in telemetry when provided [env: MEILI_NO_ANALYTICS=]
      --schedule-snapshot [<SNAPSHOT_INTERVAL_SEC>]
          Activates scheduled snapshots when provided. Snapshots are disabled by default [env: MEILI_SCHEDULE_SNAPSHOT=] [default: ]
      --snapshot-dir <SNAPSHOT_DIR>
          Sets the directory where Meilisearch will store snapshots [env: MEILI_SNAPSHOT_DIR=] [default: snapshots/]
      --ssl-auth-path <SSL_AUTH_PATH>
          Enables client authentication in the specified path [env: MEILI_SSL_AUTH_PATH=]
      --ssl-cert-path <SSL_CERT_PATH>
          Sets the server's SSL certificates [env: MEILI_SSL_CERT_PATH=]
      --ssl-key-path <SSL_KEY_PATH>
          Sets the server's SSL key files [env: MEILI_SSL_KEY_PATH=]
      --ssl-ocsp-path <SSL_OCSP_PATH>
          Sets the server's OCSP file. *Optional* [env: MEILI_SSL_OCSP_PATH=]
      --ssl-require-auth
          Makes SSL authentication mandatory [env: MEILI_SSL_REQUIRE_AUTH=]
      --ssl-resumption
          Activates SSL session resumption [env: MEILI_SSL_RESUMPTION=]
      --ssl-tickets
          Activates SSL tickets [env: MEILI_SSL_TICKETS=]
      --task-webhook-authorization-header <TASK_WEBHOOK_AUTHORIZATION_HEADER>
          The Authorization header to send on the webhook URL whenever a task finishes so a third party can be notified [env: MEILI_TASK_WEBHOOK_AUTHORIZATION_HEADER=]
      --task-webhook-url <TASK_WEBHOOK_URL>
          Called whenever a task finishes so a third party can be notified [env: MEILI_TASK_WEBHOOK_URL=]
  -V, --version
          Print version

The descriptions are short enough that the purpose of each option is mostly self-explanatory.

One point deserves a concrete example. Taking the data location option --db-path,
the command used at deployment time is
/bin/meilisearch --db-path=/home/data/data.ms

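Meilisearch also lets you supply these options through a configuration file. Note that only TOML (.toml) files are supported. For example,
create a file named config.toml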

db_path = "/home/data/data.ms"

Start the service
/bin/meilisearch --config-file-path=./config.toml

The effect is the same as with the command-line flag.
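
The same option therefore shows up under three spellings: a command-line flag, an environment variable (shown as [env: ...] in the help output) and a TOML key. The Python sketch below captures the naming pattern as it appears for --db-path; it is only a convention inferred from the help text, and at least one experimental option in that output has an environment variable that does not follow it, so -h remains the reference. The function names are made up for illustration.

```python
import json


def option_names(flag):
    """Derive the env variable and TOML key that usually match a CLI flag."""
    name = flag.lstrip("-")              # "--db-path" -> "db-path"
    snake = name.replace("-", "_")       # "db-path"   -> "db_path"
    return {
        "flag": f"--{name}",
        "env": f"MEILI_{snake.upper()}",
        "toml_key": snake,
    }


def toml_line(flag, value):
    """Render one config.toml line for a string-valued option."""
    # json.dumps produces a double-quoted string, which is also a valid
    # TOML basic string for simple values such as file paths.
    return f"{option_names(flag)['toml_key']} = {json.dumps(value)}"


print(option_names("--db-path"))
print(toml_line("--db-path", "/home/data/data.ms"))
```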

The official repository provides a complete sample configuration file for download
curl https://raw.githubusercontent.com/meilisearch/meilisearch/latest/config.toml > config.toml

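The options used most often are db-path, import-dump, config-file-path and master-key.
If you need to go further on the operations side, the other sections of the configuration file are worth a look.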

Features

Think of it as a document database: an index corresponds to a table, and a document corresponds to a single record.

I won't spend much time cataloguing every query feature, since they are easy to look up online. Below are a few commonly used points.

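  1. Index creation
    An index can be created explicitly or implicitly. Adding documents directly to an index that does not exist yet creates the index from the data and then inserts the records.

  2. Primary key
    Every index must have a primary key attribute, and every document has a unique id. The field can be specified when the index is created.
    If none is specified, Meilisearch infers a field from your data set to use as the unique identifier.
    The primary key field can also be updated later, although Meilisearch only accepts the change while the index contains no documents.

  3. Asynchronous tasks: adding documents, updating them and creating indexes are asynchronous operations, which is how the server copes with compute-intensive work

  4. Data export, import and migration

  5. Search with embedders: for example, you can bring in vector search just by configuring an LLM vendor's API key. OpenAI's embeddings are recommended, at roughly 3 US cents per million tokens

  6. Permission management, including temporary permissions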
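
To make explicit index creation and the asynchronous task model more concrete, here is a minimal Python sketch. It assumes a local instance with the master key meilisearch-api; the helper names are invented for this example. POST /indexes replies with an enqueued task, and GET /tasks/{taskUid} is polled until the status becomes succeeded, failed or canceled.

```python
import json
import time
import urllib.request

BASE_URL = "http://localhost:7700"   # assumption: local instance
API_KEY = "meilisearch-api"          # assumption: master key from the install step


def call_api(method, path, payload=None):
    """Send one JSON request to Meilisearch and return the decoded response."""
    data = None if payload is None else json.dumps(payload).encode("utf-8")
    request = urllib.request.Request(
        url=BASE_URL + path,
        data=data,
        method=method,
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)


def create_index(uid, primary_key, api=call_api):
    """Explicitly create an index; the server answers with an enqueued task."""
    return api("POST", "/indexes", {"uid": uid, "primaryKey": primary_key})


def wait_for_task(task_uid, api=call_api, interval=0.5, max_polls=60):
    """Poll GET /tasks/{task_uid} until the task reaches a final status."""
    for _ in range(max_polls):
        task = api("GET", f"/tasks/{task_uid}")
        if task["status"] in ("succeeded", "failed", "canceled"):
            return task
        time.sleep(interval)
    raise TimeoutError(f"task {task_uid} did not finish after {max_polls} polls")
```

A typical call sequence would be task = create_index("movies", "id") followed by wait_for_task(task["taskUid"]). The api parameter exists only so the polling logic can be exercised without a server.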

Querying

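  • Basic search: plain full-text search. Note that if you search for "Americane", where the trailing e is a slip of the finger, "American" is still found, because typo tolerance is built in
  • Filters: restrict results by field values, much like a WHERE clause over several fields
  • Sorting of results in ascending or descending order
  • Pagination, and yes, with offset and limit
  • Prefix search, comparable to LIKE 'mat%' in MySQL: searching "mat" also matches "matrix"
  • Synonyms: you can configure a search for phone to also match [iphone, apple phone], which saves running several queries under different names
  • Searchable attributes: you can choose which fields of an index are searched
  • Cropping: suppose the index holds novels, with a 100,000-character novel stored in the content field. Returning it whole would make the response huge, so you can specify at search time how much of that field is returned
  • Attribute selection: return only the fields you name, like going from select * to select id,name,sex
  • Highlighting: the matched terms in the results can be wrapped in a tag, for example 中华<em>人民</em>共和国万岁, where the query term 人民 is wrapped in the tag so that the front end can style it. The tag is customizable, and several fields can be highlighted at once, for example title and desc in a real record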
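
Several of these features are index settings rather than query options, and Meilisearch only accepts filter and sort on attributes that have been declared filterable or sortable. A minimal Python sketch of such a settings update follows; the field names (title, overview, genres, release_date) are assumptions based on the movies sample data, and the helper names are invented.

```python
import json
import urllib.request

BASE_URL = "http://localhost:7700"   # assumption: local instance
API_KEY = "meilisearch-api"          # assumption: master key from the install step

# Field names are assumptions based on the movies sample data.
MOVIE_SETTINGS = {
    "searchableAttributes": ["title", "overview"],     # fields that are searched
    "filterableAttributes": ["genres", "release_date"],  # usable in "filter"
    "sortableAttributes": ["release_date"],            # usable in "sort"
    "synonyms": {"phone": ["iphone", "apple phone"]},  # phone also matches these
}


def build_settings_request(index_uid, settings):
    """Build (without sending) a PATCH /indexes/{index_uid}/settings request."""
    return urllib.request.Request(
        url=f"{BASE_URL}/indexes/{index_uid}/settings",
        data=json.dumps(settings).encode("utf-8"),
        method="PATCH",
        headers={
            "Content-Type": "application/json",
            "Authorization": f"Bearer {API_KEY}",
        },
    )


def apply_settings(index_uid, settings):
    """Send the update; the response describes an enqueued task."""
    request = build_settings_request(index_uid, settings)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Settings updates are asynchronous like other write operations, so apply_settings("movies", MOVIE_SETTINGS) returns a task summary rather than the final result. A synonym defined this way works in one direction only: searching phone also finds iphone, but not the reverse unless that mapping is added too.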

Commonly used search parameters

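Parameter              Description
q                      Search keywords
filter                 Filter conditions
sort                   Sort fields
limit                  Maximum number of results returned
offset                 Number of results to skip
attributesToRetrieve   Fields to return
attributesToHighlight  Fields to highlight
attributesToCrop       Fields to crop
cropLength             Crop length (counted in words)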
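
To show how these parameters fit together, here is a minimal Python sketch that assembles a search request body. The function and argument names are invented for this example, and the filter and sort values in the comments assume fields from the movies sample data. highlightPreTag and highlightPostTag are the parameters that replace the default <em> tag.

```python
def build_search_body(q, page=1, page_size=20, filter_expr=None, sort=None,
                      fields=None, highlight=None, highlight_tags=None,
                      crop=None, crop_length=None):
    """Assemble the JSON body for POST /indexes/{index_uid}/search."""
    body = {
        "q": q,
        "limit": page_size,                   # size of one page
        "offset": (page - 1) * page_size,     # results skipped before this page
    }
    if filter_expr is not None:
        body["filter"] = filter_expr          # e.g. 'genres = "Action"'
    if sort is not None:
        body["sort"] = list(sort)             # e.g. ["release_date:desc"]
    if fields is not None:
        body["attributesToRetrieve"] = list(fields)
    if highlight is not None:
        body["attributesToHighlight"] = list(highlight)
        if highlight_tags is not None:
            # Custom opening and closing tags around each matched term.
            body["highlightPreTag"], body["highlightPostTag"] = highlight_tags
    if crop is not None:
        body["attributesToCrop"] = list(crop)
        if crop_length is not None:
            body["cropLength"] = crop_length  # counted in words
    return body


print(build_search_body("mat", page=2, page_size=10, fields=["id", "title"]))
```

In the response, the highlighted and cropped versions of each field are returned under the _formatted key of every hit, next to the unmodified values.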

The documentation covers many more query options; see
https://meilisearch.org.cn/docs/home
