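title: "Elasticsearch Study Notes: Creating, Reading, Updating, and Deleting Documents"
date: 2020-06-10T14:16:17+08:00
summary: "Elasticsearch Study Notes: Creating, Reading, Updating, and Deleting Documents"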
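For an explanation of Elasticsearch concepts, see the official blog. For a first impression of Elasticsearch, this article is a good start: 《终于有人把Elasticsearch原理讲透了》 ("Finally, someone explains how Elasticsearch really works").
References for these notes:
Assume Elasticsearch and Kibana are already installed. Enter ip:5601 in the browser to open the Kibana interface. This time we use the Dev Tools feature from the Kibana menu.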
Creating a document
Run the following command:
POST twitter/_doc/1
{
"username":"Dannis",
"uid":1
}
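This command sends a POST request to Elasticsearch. It creates an index named twitter (if the index does not exist yet) and, in the same step, a document with ID 1. Click the triangle icon to run the request.
The result is: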
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 0,
"_primary_term" : 1
}
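Let's go through the response. _index shows that the index is named twitter, _type is the document type _doc, and _id is the document ID 1. All of these were specified in the request. Next, _version is the document's version number, and result shows that the operation was a creation (created). _shards reports on shard copies: total is 2, successful is 1, and failed is 0. What does that mean?
Run GET _cat/shards/twitter to view the shard information for the index. The output is: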
twitter 0 p STARTED 1 3.8kb 172.29.0.2 09baeea2d96a
twitter 0 r UNASSIGNED
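The twitter index has two shard copies. p is short for primary and marks the primary shard, and STARTED means the shard is running normally. But what is r?
Let's look at the index settings. Enter and run: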
GET twitter/_settings
The response:
{
"twitter" : {
"settings" : {
"index" : {
"creation_date" : "1591930378288",
"number_of_shards" : "1",
"number_of_replicas" : "1",
"uuid" : "3vc_AZQcQbGM-a6FtTSX5g",
"version" : {
"created" : "7070199"
},
"provided_name" : "twitter"
}
}
}
}
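This shows the index settings in detail. creation_date is the index creation time as a timestamp in milliseconds. number_of_shards is the number of primary shards, which defaults to 1. number_of_replicas is the number of replicas per primary shard, which also defaults to 1.
Back to the output of GET _cat/shards/twitter: r stands for replica, a copy of the primary shard, and UNASSIGNED means the replica has not been allocated to any node. Why? My local Elasticsearch runs in development mode with a single node, and a replica is never allocated to the same node as its primary. In a cluster with more nodes, the replica would be allocated automatically to another node. This also explains _shards in the create response: total counts the primary plus its replica (2), and the write succeeded only on the primary (1).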
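To make the relationship between the two outputs concrete, here is a minimal Python sketch. It assumes the default column order of _cat/shards (index, shard, prirep, state, docs, store, ip, node); the helper names parse_cat_shards and expected_shard_copies are made up for illustration and are not part of any Elasticsearch client.

```python
# Assumed default column order of GET _cat/shards (no custom "h=" parameter).
CAT_SHARDS_COLUMNS = ["index", "shard", "prirep", "state", "docs", "store", "ip", "node"]

SAMPLE_OUTPUT = """\
twitter 0 p STARTED 1 3.8kb 172.29.0.2 09baeea2d96a
twitter 0 r UNASSIGNED
"""


def parse_cat_shards(text):
    """Turn each non-empty line into a dict; unassigned shards simply lack the trailing columns."""
    rows = []
    for line in text.strip().splitlines():
        rows.append(dict(zip(CAT_SHARDS_COLUMNS, line.split())))
    return rows


def expected_shard_copies(number_of_shards, number_of_replicas):
    """Each primary shard has number_of_replicas copies in addition to itself."""
    return number_of_shards * (1 + number_of_replicas)


rows = parse_cat_shards(SAMPLE_OUTPUT)
unassigned = [r for r in rows if r["state"] == "UNASSIGNED"]
print(len(rows), "shard copies,", len(unassigned), "unassigned")
print("expected copies:", expected_shard_copies(1, 1))
```

With number_of_shards and number_of_replicas both set to 1, the expected number of shard copies is 2, which matches the total value in _shards.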
If no document ID is specified when creating a document, Elasticsearch generates one automatically:
POST twitter/_doc
{
"username":"test",
"age":20
}
The response:
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "EIcep3IBFDGGZnEIs9aT",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 4,
"_primary_term" : 1
}
The response includes the generated _id. Let's fetch the document:
GET twitter/_doc/EIcep3IBFDGGZnEIs9aT
The response contains exactly the data we wrote:
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "EIcep3IBFDGGZnEIs9aT",
"_version" : 1,
"_seq_no" : 4,
"_primary_term" : 1,
"found" : true,
"_source" : {
"username" : "test",
"age" : 20
}
}
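The Dev Tools requests above are plain HTTP calls, so they can also be issued outside Kibana. The following is a minimal sketch using only the Python standard library. It assumes Elasticsearch is reachable at http://localhost:9200 without authentication, which may not match your setup, and build_index_request is a hypothetical helper name. The sketch only builds the request object; sending it is left to the caller.

```python
import json
import urllib.request

ES_URL = "http://localhost:9200"  # assumption: local node, no TLS, no authentication


def build_index_request(index, document, doc_id=None):
    """Build the equivalent of 'POST <index>/_doc/<id>' (or 'POST <index>/_doc' for an auto-generated ID)."""
    path = f"/{index}/_doc"
    if doc_id is not None:
        path += f"/{doc_id}"
    return urllib.request.Request(
        ES_URL + path,
        data=json.dumps(document).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


with_id = build_index_request("twitter", {"username": "Dannis", "uid": 1}, doc_id=1)
auto_id = build_index_request("twitter", {"username": "test", "age": 20})
print(with_id.get_method(), with_id.full_url)
print(auto_id.get_method(), auto_id.full_url)
```

To actually send a request, pass it to urllib.request.urlopen and parse the JSON body of the response. The generated ID, if any, is returned in the _id field.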
We can also create a document through the _create endpoint:
POST twitter/_create/2
{
"username":"GB",
"uid":1,
"city":"Guangzhou",
"province":"Guangdong",
"country":"China"
}
The response:
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "2",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 6,
"_primary_term" : 1
}
The data was written successfully. If we run the same command again:
POST twitter/_create/2
{
"username":"GB",
"uid":1,
"city":"Guangzhou",
"province":"Guangdong",
"country":"China"
}
The response:
{
"error" : {
"root_cause" : [
{
"type" : "version_conflict_engine_exception",
"reason" : "[2]: version conflict, document already exists (current version [1])",
"index_uuid" : "AiRNBupCRWWLFKqFSVW3mw",
"shard" : "0",
"index" : "twitter"
}
],
"type" : "version_conflict_engine_exception",
"reason" : "[2]: version conflict, document already exists (current version [1])",
"index_uuid" : "AiRNBupCRWWLFKqFSVW3mw",
"shard" : "0",
"index" : "twitter"
},
"status" : 409
}
An error is returned telling us that the document already exists. What happens if we run the following command instead?
POST twitter/_doc/1
{
"username":"Dannis",
"uid":1
}
The response:
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 7,
"_primary_term" : 1
}
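The request succeeds, but result is now updated and the version number has become 2. So POST twitter/_create/2 creates a document only if that ID does not exist yet, while POST twitter/_doc/1 creates the document if it does not exist and otherwise replaces the existing document as a whole, reports updated, and increments the version by 1.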
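The difference between the two endpoints can be modelled in a few lines. This is a toy in-memory sketch of the observed behaviour, not how Elasticsearch is implemented; ToyIndex and VersionConflict are invented names, and VersionConflict merely stands in for the HTTP 409 version_conflict_engine_exception.

```python
class VersionConflict(Exception):
    """Stand-in for the 409 version_conflict_engine_exception response."""


class ToyIndex:
    """Toy model: maps a document ID to (version, source)."""

    def __init__(self):
        self._docs = {}

    def index(self, doc_id, source):
        """Like POST <index>/_doc/<id>: create, or replace the whole document."""
        exists = doc_id in self._docs
        version = self._docs[doc_id][0] + 1 if exists else 1
        self._docs[doc_id] = (version, dict(source))
        return {"_id": doc_id, "_version": version,
                "result": "updated" if exists else "created"}

    def create(self, doc_id, source):
        """Like POST <index>/_create/<id>: fail if the ID is already taken."""
        if doc_id in self._docs:
            current = self._docs[doc_id][0]
            raise VersionConflict(
                f"[{doc_id}]: version conflict, document already exists "
                f"(current version [{current}])")
        return self.index(doc_id, source)

    def get_source(self, doc_id):
        return self._docs[doc_id][1]


twitter = ToyIndex()
first = twitter.index("1", {"username": "Dannis", "uid": 1})
second = twitter.index("1", {"username": "Dannis"})  # replaces the whole document
created = twitter.create("2", {"username": "GB", "uid": 1})
try:
    twitter.create("2", {"username": "GB", "uid": 1})
    conflict = None
except VersionConflict as exc:
    conflict = str(exc)
```

Note that the second index call drops uid from the stored document, because indexing replaces the source rather than merging into it.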
Querying a document
Enter and run:
GET twitter/_doc/1
The response:
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"username" : "Dannis",
"uid" : 1
}
}
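The document data we defined sits under the _source field, and the found field shows that the document was found. If we request an ID for which no document exists, such as: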
GET twitter/_doc/2
The response:
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "2",
"found" : false
}
This time found is false.
Updating a document
Enter and run:
POST twitter/_update/1
{
"doc": {
"username":"Dannis-update"
}
}
The response:
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1
}
result is updated and _version is now 2. Let's check:
GET twitter/_doc/1
The response:
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"_seq_no" : 1,
"_primary_term" : 1,
"found" : true,
"_source" : {
"username" : "Dannis-update",
"uid" : 1
}
}
Good: username now has the value we wanted.
What happens if the update adds a field that does not exist yet? Enter and run:
POST twitter/_update/1
{
"doc": {
"age":25
}
}
The response:
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_version" : 3,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 2,
"_primary_term" : 1
}
The document version is now 3.
Let's check again:
GET twitter/_doc/1
The response:
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_version" : 3,
"_seq_no" : 2,
"_primary_term" : 1,
"found" : true,
"_source" : {
"username" : "Dannis-update",
"uid" : 1,
"age" : 25
}
}
The returned _source also contains the newly added field. So a partial update changes a field's value if the field already exists and adds the field if it does not.
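The merge behaviour of a partial update with doc can be sketched as a recursive dictionary merge. This is a simplified model under stated assumptions: nested objects are merged key by key, every other value (including arrays) is replaced, and an update that changes nothing is reported as noop, mirroring the default detect_noop behaviour. The function names are hypothetical.

```python
import copy


def _merge_into(target, partial):
    """Merge 'partial' into 'target' in place: objects recurse, other values are replaced."""
    for key, value in partial.items():
        if isinstance(value, dict) and isinstance(target.get(key), dict):
            _merge_into(target[key], value)
        else:
            target[key] = copy.deepcopy(value)


def apply_partial_update(source, partial):
    """Return (new_source, result) without modifying the original source."""
    merged = copy.deepcopy(source)
    _merge_into(merged, partial)
    result = "noop" if merged == source else "updated"
    return merged, result


doc = {"username": "Dannis", "uid": 1}
doc, r1 = apply_partial_update(doc, {"username": "Dannis-update"})  # existing field changes
doc, r2 = apply_partial_update(doc, {"age": 25})                    # new field is added
doc, r3 = apply_partial_update(doc, {"age": 25})                    # nothing changes
print(doc, r1, r2, r3)
```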
Deleting a document
Enter and run:
DELETE twitter/_doc/1
The response:
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_version" : 4,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 3,
"_primary_term" : 1
}
_version is now 4 and result is deleted. So every write to a document (index, update, or delete) increments its version number by 1.
Let's query the document again:
GET twitter/_doc/1
The response:
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"found" : false
}
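found is false: the document with ID 1 can no longer be found, so the deletion succeeded.
Next, let's look at bulk operations on documents.
Bulk creating documents
To create several documents at once, enter and run: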
POST _bulk
{"index":{"_index":"twitter","_id":3}}
{"username":"张三","uid":3,"age":30}
{"index":{"_index":"twitter","_id":4}}
{"username":"李四","uid":4,"age":25}
{"index":{"_index":"twitter","_id":5}}
{"username":"王五","uid":5,"age":18}
The response:
{
"took" : 44,
"errors" : false,
"items" : [
{
"index" : {
"_index" : "twitter",
"_type" : "_doc",
"_id" : "3",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 9,
"_primary_term" : 1,
"status" : 201
}
},
{
"index" : {
"_index" : "twitter",
"_type" : "_doc",
"_id" : "4",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 10,
"_primary_term" : 1,
"status" : 201
}
},
{
"index" : {
"_index" : "twitter",
"_type" : "_doc",
"_id" : "5",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 11,
"_primary_term" : 1,
"status" : 201
}
}
]
}
No errors. What if we want to write to different indices in the same request? Run:
POST _bulk
{"index":{"_index":"twitter","_id":6}}
{"username":"Jack","uid":6,"age":22}
{"index":{"_index":"twitter_v1","_id":1}}
{"username":"test","age":20}
This writes one document each to twitter and twitter_v1. Querying with GET twitter/_doc/6 and GET twitter_v1/_doc/1 returns both documents.
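The _bulk body is newline-delimited JSON: one action line, followed by a source line for actions that carry a document, and the body must end with a newline. Outside Dev Tools it is sent with the Content-Type application/x-ndjson. Below is a minimal sketch of building such a body in Python; build_bulk_body and its (action, metadata, source) tuple format are assumptions made for this example, not an Elasticsearch API.

```python
import json


def build_bulk_body(operations):
    """Serialize (action, metadata, source) tuples into an NDJSON _bulk body.

    'source' may be None for actions that carry no document line.
    """
    lines = []
    for action, metadata, source in operations:
        lines.append(json.dumps({action: metadata}, ensure_ascii=False))
        if source is not None:
            lines.append(json.dumps(source, ensure_ascii=False))
    # The bulk API requires the final line to be terminated by a newline as well.
    return "\n".join(lines) + "\n"


body = build_bulk_body([
    ("index", {"_index": "twitter", "_id": 6}, {"username": "Jack", "uid": 6, "age": 22}),
    ("index", {"_index": "twitter_v1", "_id": 1}, {"username": "test", "age": 20}),
])
print(body, end="")
```

Each line has to be a single-line JSON object, which is why json.dumps is used without indentation. Encode the body as UTF-8 before sending it.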
Bulk querying documents
To fetch several documents at once, run:
GET _mget
{
"docs":[
{
"_index":"twitter",
"_id":1
},
{
"_index":"twitter",
"_id":2
}
]
}
The response:
{
"docs" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_version" : 3,
"_seq_no" : 8,
"_primary_term" : 1,
"found" : true,
"_source" : {
"username" : "GB",
"uid" : 1,
"city" : "Guangzhou2",
"province" : "Guangdong",
"country" : "China"
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "2",
"_version" : 1,
"_seq_no" : 6,
"_primary_term" : 1,
"found" : true,
"_source" : {
"username" : "GB",
"uid" : 1,
"city" : "Guangzhou",
"province" : "Guangdong",
"country" : "China"
}
}
]
}
This fetches documents from a single index. Documents from several indices can also be fetched in one request, for example:
GET _mget
{
"docs":[
{
"_index":"twitter",
"_id":1
},
{
"_index":"twitter_v1",
"_id":1
}
]
}
The result:
{
"docs" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_version" : 3,
"_seq_no" : 8,
"_primary_term" : 1,
"found" : true,
"_source" : {
"username" : "GB",
"uid" : 1,
"city" : "Guangzhou2",
"province" : "Guangdong",
"country" : "China"
}
},
{
"_index" : "twitter_v1",
"_type" : "_doc",
"_id" : "1",
"_version" : 1,
"_seq_no" : 0,
"_primary_term" : 1,
"found" : true,
"_source" : {
"username" : "test",
"age" : 20
}
}
]
}
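When _mget is called from application code, the request body is usually generated from a list of targets, and the response is split into found and missing documents. A minimal sketch follows; mget_body and split_mget_response are hypothetical helper names, and the sample response is a shortened version of the response shape shown above.

```python
def mget_body(targets):
    """Build the _mget request body from (index, id) pairs."""
    return {"docs": [{"_index": index, "_id": doc_id} for index, doc_id in targets]}


def split_mget_response(response):
    """Return ({(index, id): source}, [(index, id), ...]) for found and missing docs."""
    found, missing = {}, []
    for doc in response["docs"]:
        key = (doc["_index"], doc["_id"])
        if doc.get("found"):
            found[key] = doc["_source"]
        else:
            missing.append(key)
    return found, missing


request_body = mget_body([("twitter", 1), ("twitter_v1", 1)])

# Shortened sample response: one hit and one miss.
sample_response = {
    "docs": [
        {"_index": "twitter_v1", "_id": "1", "found": True,
         "_source": {"username": "test", "age": 20}},
        {"_index": "twitter", "_id": "99", "found": False},
    ]
}
found, missing = split_mget_response(sample_response)
print(found, missing)
```

A missing document does not make the whole request fail: each entry in docs carries its own found flag, so the caller has to check every entry.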
Bulk updating documents
To update documents in bulk, run:
POST _bulk
{"update":{"_index":"twitter","_id":1}}
{"doc":{"username":"GB-update"}}
{"update":{"_index":"twitter_v1","_id":1}}
{"doc":{"age":25}}
The response:
{
"took" : 107,
"errors" : false,
"items" : [
{
"update" : {
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_version" : 4,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 13,
"_primary_term" : 1,
"status" : 200
}
},
{
"update" : {
"_index" : "twitter_v1",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 1,
"_primary_term" : 1,
"status" : 200
}
}
]
}
Let's query again with the following command:
GET _mget
{
"docs":[
{
"_index":"twitter",
"_id":1
},
{
"_index":"twitter_v1",
"_id":1
}
]
}
The response below shows that the data now has the values we wanted.
{
"docs" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_version" : 4,
"_seq_no" : 13,
"_primary_term" : 1,
"found" : true,
"_source" : {
"username" : "GB-update",
"uid" : 1,
"city" : "Guangzhou2",
"province" : "Guangdong",
"country" : "China"
}
},
{
"_index" : "twitter_v1",
"_type" : "_doc",
"_id" : "1",
"_version" : 2,
"_seq_no" : 1,
"_primary_term" : 1,
"found" : true,
"_source" : {
"username" : "test",
"age" : 25
}
}
]
}
Bulk deleting documents
Bulk deletion uses a command similar to bulk update:
POST _bulk
{"delete":{"_index":"twitter","_id":1}}
{"delete":{"_index":"twitter_v1","_id":1}}
The response:
{
"took" : 80,
"errors" : false,
"items" : [
{
"delete" : {
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"_version" : 5,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 14,
"_primary_term" : 1,
"status" : 200
}
},
{
"delete" : {
"_index" : "twitter_v1",
"_type" : "_doc",
"_id" : "1",
"_version" : 3,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 2,
"_primary_term" : 1,
"status" : 200
}
}
]
}
When we query the two documents again with GET _mget, they are no longer found:
{
"docs" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "1",
"found" : false
},
{
"_index" : "twitter_v1",
"_type" : "_doc",
"_id" : "1",
"found" : false
}
]
}
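Creating, updating, and deleting documents in one request
The examples above show that all bulk operations on documents go through the _bulk endpoint and differ only in the request body. Can create, update, and delete operations be combined in a single request? Let's try: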
POST _bulk
{"index":{"_index":"twitter","_id":10}}
{"username":"小飞飞","age":12}
{"update":{"_index":"twitter","_id":3}}
{"doc":{"username":"张三-update"}}
{"delete":{"_index":"twitter","_id":6}}
The response:
{
"took" : 51,
"errors" : false,
"items" : [
{
"index" : {
"_index" : "twitter",
"_type" : "_doc",
"_id" : "10",
"_version" : 1,
"result" : "created",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 15,
"_primary_term" : 1,
"status" : 201
}
},
{
"update" : {
"_index" : "twitter",
"_type" : "_doc",
"_id" : "3",
"_version" : 2,
"result" : "updated",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 16,
"_primary_term" : 1,
"status" : 200
}
},
{
"delete" : {
"_index" : "twitter",
"_type" : "_doc",
"_id" : "6",
"_version" : 2,
"result" : "deleted",
"_shards" : {
"total" : 2,
"successful" : 1,
"failed" : 0
},
"_seq_no" : 17,
"_primary_term" : 1,
"status" : 200
}
}
]
}
Let's bulk query again:
GET _mget
{
"docs":[
{
"_index":"twitter",
"_id":10
},
{
"_index":"twitter",
"_id":3
},
{
"_index":"twitter",
"_id":6
}
]
}
The response:
{
"docs" : [
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "10",
"_version" : 1,
"_seq_no" : 15,
"_primary_term" : 1,
"found" : true,
"_source" : {
"username" : "小飞飞",
"age" : 12
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "3",
"_version" : 2,
"_seq_no" : 16,
"_primary_term" : 1,
"found" : true,
"_source" : {
"username" : "张三-update",
"uid" : 3,
"age" : 30
}
},
{
"_index" : "twitter",
"_type" : "_doc",
"_id" : "6",
"found" : false
}
]
}
The data all matches expectations.
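All bulk responses in these notes have errors set to false. In a bulk request each action succeeds or fails on its own, so client code normally checks the top-level errors flag and then inspects the individual items. The sketch below shows one way to do that; failed_bulk_items is a hypothetical helper, and the failing sample response is constructed for illustration rather than captured from a real cluster.

```python
def failed_bulk_items(response):
    """Return (action, id, status, error_type) for every failed item in a _bulk response."""
    failures = []
    if not response.get("errors"):
        return failures  # fast path: nothing failed
    for item in response["items"]:
        # Each item is a one-key object: {"index": {...}}, {"update": {...}}, ...
        action, result = next(iter(item.items()))
        if "error" in result:
            failures.append(
                (action, result["_id"], result["status"], result["error"]["type"]))
    return failures


# Illustrative response: the index action succeeded, the create action hit an existing ID.
sample_response = {
    "took": 5,
    "errors": True,
    "items": [
        {"index": {"_index": "twitter", "_id": "10", "result": "created", "status": 201}},
        {"create": {"_index": "twitter", "_id": "2", "status": 409,
                    "error": {"type": "version_conflict_engine_exception",
                              "reason": "[2]: version conflict, document already exists"}}},
    ],
}
failures = failed_bulk_items(sample_response)
print(failures)
```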
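One last note: when creating documents through _bulk, the index in the action line {"index":{"_index":"twitter","_id":10}} is a verb. It means to index a document, that is, to create it or replace an existing one, rather than the noun for an index.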