Parent / Child
-
对象和 Nested 对象的局限性
- 每次更新,需要重新索引整个对象(包括根对象和嵌套对象)
-
ES 提供了类似关系型数据库中 Join 的实现。使⽤ Join 数据类型实现,可以通过维护 Parent / Child 的关系,从⽽分离两个对象
⽗⽂档和⼦⽂档是两个独⽴的⽂档
更新⽗⽂档⽆需重新索引⼦⽂档。⼦⽂档被添加,更新或者删除也不会影响到⽗⽂档和其他的⼦⽂档
⽗⼦关系
-
定义⽗⼦关系的⼏个步骤
1.设置索引的 Mapping
2.索引⽗⽂档
3.索引⼦⽂档
4.按需查询⽂档
设置 Mapping
# 设定 Parent/Child Mapping
PUT my_blogs
{
"settings": {
"number_of_shards": 2
},
"mappings": {
"properties": {
"blog_comments_relation": {
"type": "join",
"relations": {
"blog": "comment"
}
},
"content": {
"type": "text"
},
"title": {
"type": "keyword"
}
}
}
}
索引⽗⽂档
#索引父文档
PUT my_blogs/_doc/blog1
{
"title":"Learning Elasticsearch",
"content":"learning ELK @ geektime",
"blog_comments_relation":{
"name":"blog"
}
}
#索引父文档
PUT my_blogs/_doc/blog2
{
"title":"Learning Hadoop",
"content":"learning Hadoop",
"blog_comments_relation":{
"name":"blog"
}
}
索引⼦⽂档
#索引子文档
PUT my_blogs/_doc/comment1?routing=blog1
{
"comment":"I am learning ELK",
"username":"Jack",
"blog_comments_relation":{
"name":"comment",
"parent":"blog1"
}
}
#索引子文档
PUT my_blogs/_doc/comment2?routing=blog2
{
"comment":"I like Hadoop!!!!!",
"username":"Jack",
"blog_comments_relation":{
"name":"comment",
"parent":"blog2"
}
}
#索引子文档
PUT my_blogs/_doc/comment3?routing=blog2
{
"comment":"Hello Hadoop",
"username":"Bob",
"blog_comments_relation":{
"name":"comment",
"parent":"blog2"
}
}
-
⽗⽂档和⼦⽂档必须存在相同的分⽚上
- 确保查询 join 的性能
-
当指定⼦⽂档时候,必须指定它的⽗⽂档 Id
- 使⽤ route 参数来保证,分配到相同的分⽚
Parent / Child 所⽀持的查询
查询所有⽂档
Parent Id 查询
Has Child 查询
Has Parent 查询
使⽤ has_child 查询
返回⽗⽂档
-
通过对⼦⽂档进⾏查询
返回具有相关⼦⽂档的⽗⽂档
⽗⼦⽂档在相同的分⽚上,因此 Join 效率⾼
# Has Child 查询,返回父文档
POST my_blogs/_search
{
"query": {
"has_child": {
"type": "comment",
"query" : {
"match": {
"username" : "Jack"
}
}
}
}
}
使⽤ has_parent 查询
返回相关的⼦⽂档
通过对⽗⽂档进⾏查询
返回所有相关⼦⽂档
# Has Parent 查询,返回相关的子文档
POST my_blogs/_search
{
"query": {
"has_parent": {
"parent_type": "blog",
"query" : {
"match": {
"title" : "Learning Hadoop"
}
}
}
}
}
使⽤ parent_id 查询
返回所有相关⼦⽂档
通过对⽗⽂档 Id 进⾏查询
返回所有相关⼦⽂档
# Parent Id 查询
POST my_blogs/_search
{
"query": {
"parent_id": {
"type": "comment",
"id": "blog2"
}
}
}
访问⼦⽂档
- 需指定⽗⽂档 routing 参数
#通过ID ,访问子文档
GET my_blogs/_doc/comment3
res:(报错)
{
"_index" : "my_blogs",
"_type" : "_doc",
"_id" : "comment3",
"found" : false
}
#通过ID和routing ,访问子文档
GET my_blogs/_doc/comment3?routing=blog2
res:
{
"_index" : "my_blogs",
"_type" : "_doc",
"_id" : "comment3",
"_version" : 1,
"_seq_no" : 4,
"_primary_term" : 1,
"_routing" : "blog2",
"found" : true,
"_source" : {
"comment" : "Hello Hadoop",
"username" : "Bob",
"blog_comments_relation" : {
"name" : "comment",
"parent" : "blog2"
}
}
}
更新⼦⽂档
- 更新⼦⽂档不会影响到⽗⽂档
#更新子文档
PUT my_blogs/_doc/comment3?routing=blog2
{
"comment": "Hello Hadoop???",
"blog_comments_relation": {
"name": "comment",
"parent": "blog2"
}
}
嵌套对象 v.s ⽗⼦⽂档
Nested Object | Parent / Child | |
---|---|---|
优点 | ⽂档存储在⼀起,读取性能⾼ | ⽗⼦⽂档可以独⽴更新 |
缺点 | 更新嵌套的⼦⽂档时,需要更新整个⽂档 | 需要额外的内存维护关系。读取 性能相对差 |
适⽤场景 | ⼦⽂档偶尔更新,以查询为主 | ⼦⽂档更新频繁 |
课程demo
DELETE my_blogs
# 设定 Parent/Child Mapping
PUT my_blogs
{
"settings": {
"number_of_shards": 2
},
"mappings": {
"properties": {
"blog_comments_relation": {
"type": "join",
"relations": {
"blog": "comment"
}
},
"content": {
"type": "text"
},
"title": {
"type": "keyword"
}
}
}
}
#索引父文档
PUT my_blogs/_doc/blog1
{
"title":"Learning Elasticsearch",
"content":"learning ELK @ geektime",
"blog_comments_relation":{
"name":"blog"
}
}
#索引父文档
PUT my_blogs/_doc/blog2
{
"title":"Learning Hadoop",
"content":"learning Hadoop",
"blog_comments_relation":{
"name":"blog"
}
}
#索引子文档
PUT my_blogs/_doc/comment1?routing=blog1
{
"comment":"I am learning ELK",
"username":"Jack",
"blog_comments_relation":{
"name":"comment",
"parent":"blog1"
}
}
#索引子文档
PUT my_blogs/_doc/comment2?routing=blog2
{
"comment":"I like Hadoop!!!!!",
"username":"Jack",
"blog_comments_relation":{
"name":"comment",
"parent":"blog2"
}
}
#索引子文档
PUT my_blogs/_doc/comment3?routing=blog2
{
"comment":"Hello Hadoop",
"username":"Bob",
"blog_comments_relation":{
"name":"comment",
"parent":"blog2"
}
}
# 查询所有文档
POST my_blogs/_search
{
}
#根据父文档ID查看
GET my_blogs/_doc/blog2
# Parent Id 查询
POST my_blogs/_search
{
"query": {
"parent_id": {
"type": "comment",
"id": "blog2"
}
}
}
# Has Child 查询,返回父文档
POST my_blogs/_search
{
"query": {
"has_child": {
"type": "comment",
"query" : {
"match": {
"username" : "Jack"
}
}
}
}
}
# Has Parent 查询,返回相关的子文档
POST my_blogs/_search
{
"query": {
"has_parent": {
"parent_type": "blog",
"query" : {
"match": {
"title" : "Learning Hadoop"
}
}
}
}
}
#通过ID ,访问子文档
GET my_blogs/_doc/comment3
#通过ID和routing ,访问子文档
GET my_blogs/_doc/comment3?routing=blog2
#更新子文档
PUT my_blogs/_doc/comment3?routing=blog2
{
"comment": "Hello Hadoop??",
"blog_comments_relation": {
"name": "comment",
"parent": "blog2"
}
}
相关阅读
- https://www.elastic.co/guide/en/elasticsearch/reference/7.1/query-dsl-has-child-query.html
- https://www.elastic.co/guide/en/elasticsearch/reference/7.1/query-dsl-has-parent-query.html
- https://www.elastic.co/guide/en/elasticsearch/reference/7.1/query-dsl-parent-id-query.html
- https://www.elastic.co/guide/en/elasticsearch/reference/7.1/query-dsl-parent-id-query.html