Check cluster health
$ curl localhost:9200/_cat/health
1597911300 08:15:00 elasticsearch green 1 1 0 0 0 0 0 0 - 100.0%
View cluster health (with column headers)
$ curl 'localhost:9200/_cat/health?v'
epoch timestamp cluster status node.total node.data shards pri relo init unassign pending_tasks max_task_wait_time active_shards_percent
1597911567 08:19:27 elasticsearch green 1 1 0 0 0 0 0 0 - 100.0%
The cluster is named "elasticsearch", it is running normally, and its status is green.
Cluster node information
$ curl localhost:9200/_cat/nodes?v
ip heap.percent ram.percent cpu load_1m load_5m load_15m node.role master name
127.0.0.1 16 90 7 0.12 1.05 1.21 dilmrt * royzeng-VirtualBox
We can see one node named "royzeng-VirtualBox"; it is the only node in our cluster.
List all indices
$ curl 'localhost:9200/_cat/indices?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
This result means that there are no indices in our cluster yet.
Create an index
Request
PUT /<index>
Now let's create an index called "customer" and then list all indices again:
curl -XPUT 'localhost:9200/customer?pretty'
curl 'localhost:9200/_cat/indices?v'
The first command uses PUT to create an index named "customer". Appending pretty to the call asks Elasticsearch to pretty-print the JSON response.
The response:
$ curl -XPUT 'localhost:9200/customer?pretty'
{
"acknowledged" : true,
"shards_acknowledged" : true,
"index" : "customer"
}
$ curl 'localhost:9200/_cat/indices?v'
health status index uuid pri rep docs.count docs.deleted store.size pri.store.size
yellow open customer 0zmIaw_fSeKLcEmKsdXb7g 1 1 0 0 208b 208b
The result of the second command tells us that we now have an index named customer, with 1 primary shard and 1 replica, containing 0 documents.
You may also have noticed that the customer index carries a yellow health tag. Recall from the earlier discussion that yellow means some replicas are not (or not yet) allocated. This happens because Elasticsearch creates one replica for the index by default. Since we only have a single node running, that replica cannot be allocated (for high availability it must live on a different node) until another node joins the cluster. Once the replica has been allocated on a second node, the index health will turn green.
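If you intend to stay on a single node, one option is to drop the replica count to 0 with the index settings API; the snippet below is a minimal sketch against the customer index created above (after it is applied, _cat/indices should report the index as green):
PUT /customer/_settings
{
  "index": {
    "number_of_replicas": 0
  }
}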
Specify a mapping when creating an index
Mapping types were removed after version 7.0; the _type fields that still appear in some examples below exist only for backward compatibility.
PUT /my-index-000001
{
"mappings": {
"properties": {
"age": { "type": "integer" },
"email": { "type": "keyword" },
"name": { "type": "text" }
}
}
}
For example:
$ curl -H "Content-Type: application/json" -XPUT localhost:9200/students -d '
> {
> "mappings": {
> "properties": {
> "age": { "type": "integer" },
> "email": { "type": "keyword" },
> "name": { "type": "text" }
> }
> }
> }'
{"acknowledged":true,"shards_acknowledged":true,"index":"students"}
Add a field to an existing index
PUT /my-index-000001/_mapping
{
"properties": {
"employee-id": {
"type": "keyword",
"index": false
}
}
}
For example:
$ curl -H "Content-Type: application/json" -XPUT localhost:9200/customer/_mapping -d '
> {
> "properties": {
> "name": {
> "type": "text"
> }
> }
> }'
{"acknowledged":true}
Add a document
Request
PUT /<target>/_doc/<_id>
POST /<target>/_doc/
PUT /<target>/_create/<_id>
POST /<target>/_create/<_id>
For example:
$ curl -H "Content-Type: application/json" -XPUT 'localhost:9200/teacher/_doc/3?pretty' -d '
> {
> "name": "Roy Zeng"
> }'
{
"_index" : "teacher",
"_type" : "_doc",
"_id" : "3",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 2,
"_primary_term" : 2
}
# Using POST auto-generates an id.
$ curl -H "Content-Type: application/json" -XPOST 'localhost:9200/teacher/_doc/' -d '
> {
> "name": "Tony He"
> }'
{"_index":"teacher","_type":"_doc","_id":"-L-6D3QBwc726urOj0Un","_version":1,"result":"created","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":3,"_primary_term":2}
Get a document
Retrieve the documents we just indexed.
Request
GET <index>/_doc/<_id>
HEAD <index>/_doc/<_id>
GET <index>/_source/<_id>
HEAD <index>/_source/<_id>
$ curl -XGET localhost:9200/teacher/_doc/3?pretty
{
"_index" : "teacher",
"_type" : "_doc",
"_id" : "3",
"_version" : 1,
"_seq_no" : 2,
"_primary_term" : 2,
"found" : true,
"_source" : {
"name" : "Roy Zeng"
}
}
$ curl -XGET localhost:9200/teacher/_source/3?pretty
{
"name" : "Roy Zeng"
}
$ curl -XGET localhost:9200/teacher/_doc/-L-6D3QBwc726urOj0Un?pretty
{
"_index" : "teacher",
"_type" : "_doc",
"_id" : "-L-6D3QBwc726urOj0Un",
"_version" : 1,
"_seq_no" : 3,
"_primary_term" : 2,
"found" : true,
"_source" : {
"name" : "Tony He"
}
}
Retrieve everything in an index
GET /<target>/_search
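For example, a quick way to list every document in the teacher index used throughout this page (the search API is covered in more detail below):
$ curl 'localhost:9200/teacher/_search?pretty'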
Bulk read with mget
mget lets us fetch many documents in a single request. Like the single-document get API, an mget lookup is driven by index and id; for example, we can mget 50 documents at once, those 50 documents can live in 50 different indices, and each individual get can specify its own routing or which fields to return.
mget fetches a batch of documents by index, type, and id. It cannot be used as a query: you need to know at least the index and the id of each document, which is what distinguishes it from a search.
GET /_mget
GET /<index>/_mget
Without specifying an index
GET /_mget
{
"docs": [
{
"_index": "my-index-000001",
"_id": "1"
},
{
"_index": "my-index-000002",
"_id": "2"
}
]
}
Specifying an index
GET /my-index-000001/_mget
{
"docs": [
{
"_type": "_doc",
"_id": "1"
},
{
"_type": "_doc",
"_id": "2"
}
]
}
Or, simplified:
GET /my-index-000001/_mget
{
"ids" : ["1", "2"]
}
For example:
$ curl -H 'Content-Type: application/json' -XGET localhost:9200/_mget?pretty -d '
> {
> "docs" : [
> {
> "_index": "teacher",
> "_id" : "1"
> },
> {
> "_index": "customer",
> "_id" : "2"
> }
> ]
> }'
{
"docs" : [
{
"_index" : "teacher",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"_seq_no" : 12,
"_primary_term" : 2,
"found" : true,
"_source" : {
"name" : "John Doe becomes Jane Doe"
}
},
{
"_index" : "customer",
"_type" : "_doc",
"_id" : "2",
"_version" : 1,
"_seq_no" : 1,
"_primary_term" : 1,
"found" : true,
"_source" : {
"name" : "Roy Doe"
}
}
]
}
$ curl -H 'Content-Type: application/json' -XGET localhost:9200/customer/_source/_mget?pretty -d '
> {
> "ids" : ["1", "2"]
> }'
{
"docs" : [
{
"_index" : "customer",
"_type" : "_source",
"_id" : "1",
"found" : false
},
{
"_index" : "customer",
"_type" : "_source",
"_id" : "2",
"found" : false
}
]
}
Note that the second request above used the wrong path (/customer/_source/_mget): Elasticsearch treated _source as a type, which is why both lookups came back with found: false. The correct endpoint is /customer/_mget.
In addition, you can filter the returned data (_source) per document; by default, if a document is stored, the whole document is returned.
Common _source filtering
GET /_mget
{
"docs": [
{
"_index": "test",
"_type": "_doc",
"_id": "1",
"_source": false
},
{
"_index": "test",
"_type": "_doc",
"_id": "2",
"_source": [ "field3", "field4" ]
},
{
"_index": "test",
"_type": "_doc",
"_id": "3",
"_source": {
"include": [ "user" ],
"exclude": [ "user.location" ]
}
}
]
}
For example:
$ curl -H 'Content-Type: application/json' -XGET localhost:9200/customer/_mget?pretty -d '
> {
> "docs": [
> {
> "_id": "1",
> "_source": [ "age" ]
> }
> ]
> }'
{
"docs" : [
{
"_index" : "customer",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"_seq_no" : 2,
"_primary_term" : 3,
"found" : true,
"_source" : {
"age" : 20
}
}
]
}
Delete
Typical deletion needs:
- delete an entire index
- delete a single document by id
- delete a batch of documents matching a query
Delete an index
$ curl -XDELETE 'localhost:9200/employee'
{"acknowledged":true}
Verify again:
$ curl 'localhost:9200/_cat/indices?v'
Delete a document
Request
DELETE /<index>/_doc/<_id>
Deleting a document is straightforward. The following example shows how to delete the document with ID 2:
$ curl -XDELETE "localhost:9200/teacher/_doc/2"
{"_index":"teacher","_type":"_doc","_id":"2","_version":2,"result":"deleted","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":9,"_primary_term":2}
Delete documents by query
POST /<target>/_delete_by_query
POST /my-index-000001/_delete_by_query
{
"query": {
"match": {
"user.id": "elkbee"
}
}
}
Example
$ curl -H 'Content-Type: application/json' -XPOST localhost:9200/customer/_delete_by_query?pretty -d '
> {
> "query": {
> "match": {
> "age": 20
> }
> }
> }'
{
"took" : 168,
"timed_out" : false,
"total" : 1,
"deleted" : 1,
"batches" : 1,
"version_conflicts" : 0,
"noops" : 0,
"retries" : {
"bulk" : 0,
"search" : 0
},
"throttled_millis" : 0,
"requests_per_second" : -1.0,
"throttled_until_millis" : 0,
"failures" : [ ]
}
Update a document
Request
POST /<index>/_update/<_id>
Example
Note: the JSON body format changes here; the fields are wrapped in a doc object.
$ curl -H 'Content-Type: application/json' -XPOST "localhost:9200/teacher/_update/3?pretty" -d '
> {
> "doc" : {
> "name": "Roy He"
> }
> }'
{
"_index" : "teacher",
"_type" : "_doc",
"_id" : "3",
"_version" : 3,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 5,
"_primary_term" : 2
}
You can also add a new field:
$ curl -H 'Content-Type: application/json' -XPOST "localhost:9200/teacher/_update/3?pretty" -d '
> {
> "doc" : {
> "name": "Roy Zeng",
> "age": 20
> }
> }'
{
"_index" : "teacher",
"_type" : "_doc",
"_id" : "3",
"_version" : 5,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 7,
"_primary_term" : 2
}
Updates can also be performed with simple scripts. This example uses a script to add 5 to age:
$ curl -H 'Content-Type: application/json' -XPOST "localhost:9200/teacher/_update/3?pretty" -d '
> {
> "script" : "ctx._source.age += 5"
> }'
{
"_index" : "teacher",
"_type" : "_doc",
"_id" : "3",
"_version" : 6,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 8,
"_primary_term" : 2
}
In the example above, ctx._source refers to the document currently being updated.
You can also modify a document simply by indexing it again (overwriting it).
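For illustration, a minimal sketch that overwrites the teacher document with ID 3 from above. Unlike _update, this replaces the whole _source rather than merging fields, so anything left out of the body is dropped:
PUT /teacher/_doc/3
{
  "name": "Roy Zeng",
  "age": 25
}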
Bulk operations
Request
POST /_bulk
POST /<target>/_bulk
Provides a way to perform multiple index, create, delete, and update actions in a single request.
The bulk format:
action: index / create / update / delete
metadata: _index, _type, _id
request body: _source (delete actions do not take a request body)
{ action: { metadata }}
{ request body }
The difference between create and index
If the document already exists, a create action fails and reports that the document already exists, whereas an index action succeeds (overwriting the existing document).
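Outside of _bulk, the same behavior can be seen with the document APIs listed earlier; as a minimal sketch, sending the request below against the teacher document with ID 3 created above should fail with a 409 version_conflict error, while the same body sent with PUT /teacher/_doc/3 (an index action) would succeed and overwrite the document:
PUT /teacher/_create/3
{
  "name": "Roy Zeng"
}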
The following example, in a single bulk request, first updates the first document (ID 1) and then deletes the second document (ID 2):
$ curl -H 'Content-Type: application/json' -XPOST "localhost:9200/teacher/_bulk?pretty" -d '
> {"update":{"_id":"1"}}
> {"doc": { "name": "John Doe becomes Jane Doe" } }
> {"delete":{"_id":"2"}}
> '
{
"took" : 12,
"errors" : false,
"items" : [
{
"update" : {
"_index" : "teacher",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 12,
"_primary_term" : 2,
"status" : 200
}
},
{
"delete" : {
"_index" : "teacher",
"_type" : "_doc",
"_id" : "2",
"_version" : 1,
"result" : "not_found",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 13,
"_primary_term" : 2,
"status" : 404
}
}
]
}
$ cat request
{ "index" : { "_index" : "test", "_id" : "1" } }
{ "field1" : "value1" }
{ "delete" : { "_index" : "test", "_id" : "2" } }
{ "create" : { "_index" : "test", "_id" : "3" } }
{ "field1" : "value3" }
{ "update" : {"_index" : "test", "_id" : "1" } }
{ "doc" : {"field2" : "value2"} }
$ curl -s -H "Content-Type: application/x-ndjson" -XPOST localhost:9200/_bulk --data-binary "@request"
{"took":29,"errors":true,"items":[{"index":{"_index":"test","_type":"_doc","_id":"1","_version":6,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":8,"_primary_term":1,"status":200}},{"delete":{"_index":"test","_type":"_doc","_id":"2","_version":1,"result":"not_found","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":9,"_primary_term":1,"status":404}},{"create":{"_index":"test","_type":"_doc","_id":"3","status":409,"error":{"type":"version_conflict_engine_exception","reason":"[3]: version conflict, document already exists (current version [1])","index_uuid":"CNHCqjrOR8CaPPqBY6-4dw","shard":"0","index":"test"}}},{"update":{"_index":"test","_type":"_doc","_id":"1","_version":7,"result":"updated","_shards":{"total":2,"successful":1,"failed":0},"_seq_no":10,"_primary_term":1,"status":200}}]}
Access control
Prerequisite: X-Pack security is enabled (setting [xpack.security.enabled] to [true] in elasticsearch.yml).
First create a role, then create a user and assign the role to it.
Privileges come in different scopes (cluster / index / application); here I mainly care about index privileges. See the official privilege documentation for the full list.
Create a role
Request
POST /_security/role/<name>
PUT /_security/role/<name>
Example: create the role my_admin_role:
POST /_security/role/my_admin_role
{
"cluster": ["all"],
"indices": [
{
"names": [ "index1", "index2" ],
"privileges": ["all"]
}
]
}
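Once security is enabled, requests must be authenticated. A minimal curl sketch of the same call (elastic and <password> are placeholders for whatever superuser credentials your cluster uses):
$ curl -u elastic:<password> -H "Content-Type: application/json" -XPOST 'localhost:9200/_security/role/my_admin_role?pretty' -d '
{
  "cluster": ["all"],
  "indices": [
    {
      "names": [ "index1", "index2" ],
      "privileges": ["all"]
    }
  ]
}'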
Create a user
Request
POST /_security/user/<username>
PUT /_security/user/<username>
Example: create the user jacknich:
POST /_security/user/jacknich
{
"password" : "j@rV1s",
"roles" : [ "my_admin_role" ],
"full_name" : "Jack Nicholson",
"email" : "jacknich@example.com"
}
Verify:
GET /_security/user/jacknich
{
"jacknich" : {
"username" : "jacknich",
"roles" : [
"my_admin_role",
"monitor"
],
"full_name" : "Jack Nicholson",
"email" : "jacknich@example.com",
"metadata" : { },
"enabled" : true
}
}
Search API
In practice, this is the API you will use most often.
GET /my-index-000001/_search
Request
GET /<target>/_search
GET /_search
POST /<target>/_search
POST /_search
Examples
Match all documents
GET /teacher/_search
{
"query":{"match_all":{}}
}
$ curl -H 'Content-Type: application/json' -XGET localhost:9200/teacher/_search?pretty -d '
> {
> "query": { "match_all": {} }
> }'
{
"took" : 37,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 1.0,
"hits" : [
{
"_index" : "teacher",
"_type" : "_doc",
"_id" : "-L-6D3QBwc726urOj0Un",
"_score" : 1.0,
"_source" : {
"name" : "Tony He"
}
},
{
"_index" : "teacher",
"_type" : "_doc",
"_id" : "1",
"_score" : 1.0,
"_source" : {
"name" : "John Doe becomes Jane Doe"
}
}
]
}
}
In the response above, we can see the following parts:
- took — how long Elasticsearch took to run the search, in milliseconds
- timed_out — whether the search timed out
- _shards — how many shards were searched, and how many of them succeeded/failed
- hits — the search results
- hits.total — the total number of documents matching the search criteria
- hits.total.value — the total number of documents found
- hits.hits — the actual array of results (the first 10 documents by default)
- max_score — the score of the most relevant document
Search for people named Tony
GET /teacher/_search
{
"query" : {
"match" : {
"name" : "Tony"
}
}
}
$ curl -H 'Content-Type: application/json' -XGET localhost:9200/teacher/_search?pretty -d '
> {
> "query" : {
> "match" : {
> "name" : "Tony"
> }
> }
> }'
{
"took" : 445,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 2,
"relation" : "eq"
},
"max_score" : 0.7801935,
"hits" : [
{
"_index" : "teacher",
"_type" : "_doc",
"_id" : "-L-6D3QBwc726urOj0Un",
"_score" : 0.7801935,
"_source" : {
"name" : "Tony He"
}
},
{
"_index" : "teacher",
"_type" : "_doc",
"_id" : "3",
"_score" : 0.7801935,
"_source" : {
"name" : "Tony Wang"
}
}
]
}
}
If you want to match the whole phrase rather than documents that contain any one of the keywords, use match_phrase instead of match.
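For example, a minimal sketch of a phrase query against the same index; only documents whose name contains the exact phrase "Tony He" should match:
GET /teacher/_search
{
  "query": {
    "match_phrase": {
      "name": "Tony He"
    }
  }
}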
Simple queries (URI search)
A search can be executed purely through the URI. Not all search options are exposed in this mode, but it is convenient for quick curl tests.
$ curl -XGET 'http://localhost:9200/twitter/tweet/_search?q=user:kimchy'
Multiple conditions are joined with &; sorting defaults to ascending (asc). A sort sketch follows the output below.
Example
$ curl -H 'Content-Type: application/json' -XGET localhost:9200/teacher/_search?q=name:Tony
{"took":50,"timed_out":false,"_shards":{"total":1,"successful":1,"skipped":0,"failed":0},"hits":{"total":{"value":2,"relation":"eq"},"max_score":0.5442147,"hits":[{"_index":"teacher","_type":"_doc","_id":"3","_score":0.5442147,"_source":
{
"name": "Tony Wang"
}},{"_index":"teacher","_type":"_doc","_id":"-L-6D3QBwc726urOj0Un","_score":0.5442147,"_source":
{
"name": "Tony He"
}}]}}
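A sketch combining a query string with the sort parameter mentioned above; it assumes the teacher documents carry an age field, as in the aggregation examples below:
$ curl 'localhost:9200/teacher/_search?q=name:Tony&sort=age:desc&pretty'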
Paginated queries
You can paginate results with the from and size parameters. from is the offset of the first result you want, and size is the number of results to return; from defaults to 0 and size defaults to 10.
For example, to fetch the second page (with a page size of 1):
GET /teacher/_search
{
"query": { "match_all": {} },
"from": 1,
"size": 1
}
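The same request as a curl sketch against the node used throughout this page:
$ curl -H 'Content-Type: application/json' -XGET 'localhost:9200/teacher/_search?pretty' -d '
{
  "query": { "match_all": {} },
  "from": 1,
  "size": 1
}'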
Aggregations
Find out how many people live in each city. The request uses a terms aggregation to group all the documents in the teacher index by address.
$ curl -H 'Content-Type: application/json' -XGET localhost:9200/teacher/_search?pretty -d '
> {
> "size": 0,
> "aggs": {
> "group_by_address": {
> "terms": {
> "field": "address.keyword"
> }
> }
> }
> }'
{
"took" : 250,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"group_by_address" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Guangzhou",
"doc_count" : 2
},
{
"key" : "Shanghai",
"doc_count" : 1
}
]
}
}
}
In the response, buckets holds the buckets for the address field, and doc_count is the number of documents per city; we can see that Guangzhou has 2. Because "size": 0 was set, only the aggregation results are returned, with no document hits.
Nested aggregations
The request nests an avg aggregation inside the group_by_address aggregation to compute the average age per city.
$ curl -H 'Content-Type: application/json' -XGET localhost:9200/teacher/_search?pretty -d '
> {
> "size": 0,
> "aggs": {
> "group_by_address": {
> "terms": {
> "field": "address.keyword"
> },
> "aggs": {
> "average_age": {
> "avg": {
> "field": "age"
> }
> }
> }
> }
> }
> }'
{
"took" : 58,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 3,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"group_by_address" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Guangzhou",
"doc_count" : 2,
"average_age" : {
"value" : 25.0
}
},
{
"key" : "Shanghai",
"doc_count" : 1,
"average_age" : {
"value" : 20.0
}
}
]
}
}
}
The aggregation framework
The aggregation framework provides aggregated data on top of a search query.
Types of aggregations
There are many different types of aggregations, each with its own purpose and output. To understand them better, it is often easiest to group them into four main families:
Bucketing
A family of aggregations that build buckets, where each bucket is associated with a key and a document criterion. When the aggregation runs, every bucket criterion is evaluated against each document in the context, and when a criterion matches, the document is considered to "fall into" the corresponding bucket. At the end of the aggregation process we get a list of buckets, each with the set of documents that "belong" to it.
Metric
Aggregations that keep track of and compute metrics over a set of documents.
Matrix
A family of aggregations that operate on multiple fields and produce a matrix result from the values extracted from the requested document fields. Unlike metric and bucket aggregations, this family does not yet support scripting.
Pipeline
Aggregations that aggregate the output of other aggregations and their associated metrics (see the sketch after this list).
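As an illustration of the pipeline family, a minimal sketch that averages the per-city average_age values produced by the nested aggregation shown earlier; avg_bucket and its buckets_path are standard pipeline-aggregation syntax, and the field names are the ones used above:
GET /teacher/_search
{
  "size": 0,
  "aggs": {
    "group_by_address": {
      "terms": { "field": "address.keyword" },
      "aggs": {
        "average_age": { "avg": { "field": "age" } }
      }
    },
    "overall_average_age": {
      "avg_bucket": { "buckets_path": "group_by_address>average_age" }
    }
  }
}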
Now for the interesting part. Since each bucket effectively defines a document set (all the documents that belong to that bucket), aggregations can be associated at the bucket level, and they will execute within the context of that bucket. This is where the real power of aggregations kicks in: aggregations can be nested!
Structuring aggregations
The following snippet describes the basic structure of an aggregation:
"aggregations" : {
"<aggregation_name>" : {
"<aggregation_type>" : {
<aggregation_body>
}
[,"meta" : { [<meta_data_body>] } ]?
[,"aggregations" : { [<sub_aggregation>]+ } ]?
}
[,"<aggregation_name_2>" : { ... } ]*
}
The aggregations object in the JSON (aggs also works) holds the aggregations to be computed. Each aggregation is associated with a user-defined logical name (for example, if an aggregation computes an average price, it makes sense to name it avg_price); these logical names also uniquely identify the aggregations in the response. Each aggregation has a specific type (<aggregation_type> in the snippet above), which is usually the first key inside the named aggregation body, and each type defines its own body depending on the nature of the aggregation (for example, an avg aggregation on a specific field defines the field on which the average is computed).
At the same level as the aggregation type you can optionally define a set of additional aggregations, although this only makes sense when the enclosing aggregation is of the bucketing kind. In that case, the sub-aggregations you define on the bucket aggregation are computed for every bucket it builds. For example, if you define a set of aggregations under a range aggregation, the sub-aggregations are computed for each of the range buckets defined.
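To make the last point concrete, a minimal sketch of a range aggregation with a nested sub-aggregation, again assuming the teacher documents carry an age field; the average_age sub-aggregation is computed once per age-range bucket:
GET /teacher/_search
{
  "size": 0,
  "aggs": {
    "age_ranges": {
      "range": {
        "field": "age",
        "ranges": [
          { "to": 25 },
          { "from": 25 }
        ]
      },
      "aggs": {
        "average_age": { "avg": { "field": "age" } }
      }
    }
  }
}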