PS:我使用的ES版本是7.9.2
如题:最近遇到类似需求,需要将terms聚合的结果以列表形式分页展示。
- Demo数据准备:
PUT /employees/_bulk
{ "index" : { "_id" : "1" } }
{ "name" : "Emma","age":32,"job":"Product Manager","gender":"female","salary":35000 }
{ "index" : { "_id" : "2" } }
{ "name" : "Underwood","age":41,"job":"Dev Manager","gender":"male","salary": 50000}
{ "index" : { "_id" : "3" } }
{ "name" : "Tran","age":25,"job":"Web Designer","gender":"male","salary":18000 }
{ "index" : { "_id" : "4" } }
{ "name" : "Rivera","age":26,"job":"Web Designer","gender":"female","salary": 22000}
{ "index" : { "_id" : "5" } }
{ "name" : "Rose","age":25,"job":"QA","gender":"female","salary":18000 }
{ "index" : { "_id" : "6" } }
{ "name" : "Lucy","age":31,"job":"QA","gender":"female","salary": 25000}
{ "index" : { "_id" : "7" } }
{ "name" : "Byrd","age":27,"job":"QA","gender":"male","salary":20000 }
{ "index" : { "_id" : "8" } }
{ "name" : "Foster","age":27,"job":"Java Programmer","gender":"male","salary": 20000}
{ "index" : { "_id" : "9" } }
{ "name" : "Gregory","age":32,"job":"Java Programmer","gender":"male","salary":22000 }
{ "index" : { "_id" : "10" } }
{ "name" : "Bryant","age":20,"job":"Java Programmer","gender":"male","salary": 9000}
{ "index" : { "_id" : "11" } }
{ "name" : "Jenny","age":36,"job":"Java Programmer","gender":"female","salary":38000 }
{ "index" : { "_id" : "12" } }
{ "name" : "Mcdonald","age":31,"job":"Java Programmer","gender":"male","salary": 32000}
{ "index" : { "_id" : "13" } }
{ "name" : "Jonthna","age":30,"job":"Java Programmer","gender":"female","salary":30000 }
{ "index" : { "_id" : "14" } }
{ "name" : "Marshall","age":32,"job":"Javascript Programmer","gender":"male","salary": 25000}
{ "index" : { "_id" : "15" } }
{ "name" : "King","age":33,"job":"Java Programmer","gender":"male","salary":28000 }
{ "index" : { "_id" : "16" } }
{ "name" : "Mccarthy","age":21,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : { "_id" : "17" } }
{ "name" : "Goodwin","age":25,"job":"Javascript Programmer","gender":"male","salary": 16000}
{ "index" : { "_id" : "18" } }
{ "name" : "Catherine","age":29,"job":"Javascript Programmer","gender":"female","salary": 20000}
{ "index" : { "_id" : "19" } }
{ "name" : "Boone","age":30,"job":"DBA","gender":"male","salary": 30000}
{ "index" : { "_id" : "20" } }
{ "name" : "Kathy","age":29,"job":"DBA","gender":"female","salary": 20000}
按job分桶聚合
- query1:
GET employees/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"myTerms": {
"terms": {
"field": "job.keyword",
"size": 10
}
}
}
}
- result1
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 20,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"myTerms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Java Programmer",
"doc_count" : 7
},
{
"key" : "Javascript Programmer",
"doc_count" : 4
},
{
"key" : "QA",
"doc_count" : 3
},
{
"key" : "DBA",
"doc_count" : 2
},
{
"key" : "Web Designer",
"doc_count" : 2
},
{
"key" : "Dev Manager",
"doc_count" : 1
},
{
"key" : "Product Manager",
"doc_count" : 1
}
]
}
}
}
利用bucket_sort实现分页
- query2
GET employees/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"myTerms": {
"terms": {
"field": "job.keyword",
"size": 10
},
"aggs": {
"myBucketSort": {
"bucket_sort": {
"from": 0,
"size": 5,
"gap_policy": "SKIP"
}
}
}
}
}
}
注意:
- bucket_sort中 from不是pageNum,如想实现pageNum效果,from=pageNum*size即可;
- terms聚合的size,这里demo中分桶后只有7条,所以我只设了10,实际上size可以尽可能的设置大一点,具体大小按实际情况来看;
- result2
{
"took" : 0,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 20,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"myTerms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Java Programmer",
"doc_count" : 7
},
{
"key" : "Javascript Programmer",
"doc_count" : 4
},
{
"key" : "QA",
"doc_count" : 3
},
{
"key" : "DBA",
"doc_count" : 2
},
{
"key" : "Web Designer",
"doc_count" : 2
}
]
}
}
}
可以看到实现了分页的效果,但是新的问题出现了,没有total的话前端很多分页插件效果都不好,查询相关文档,cardinality类似SQL中distinct某个字段后,再count,于是使用cardinality获取相关聚合结果的total:
- query3
GET employees/_search
{
"size": 0,
"query": {
"match_all": {}
},
"aggs": {
"myTerms": {
"terms": {
"field": "job.keyword",
"size": 10
},
"aggs": {
"myBucketSort": {
"bucket_sort": {
"from": 0,
"size": 5,
"gap_policy": "SKIP"
}
}
}
},
"termsCount": {
"cardinality": {
"field": "job.keyword",
"precision_threshold": 30000
}
}
}
}
这里简单介绍一下precision_threshold
摘抄官网的原文:
The precision threshold options allows to trade memory for accuracy, and defines a unique count below which counts are expected to be close to accurate. Above this value, counts might become a bit more fuzzy. The maximum supported value is 40000, thresholds above this number will have the same effect as a threshold of 40000. The default value is 3000.
翻译:
精度阈值选项允许用内存交换精度,并定义了一个唯一的计数,在该计数低于此值时,预计计数接近准确。超过这个值,计数可能会变得有点模糊。支持的最大值是40000,高于这个数字的阈值将具有与40000阈值相同的效果。缺省值为3000。
- result3
{
"took" : 3,
"timed_out" : false,
"_shards" : {
"total" : 1,
"successful" : 1,
"skipped" : 0,
"failed" : 0
},
"hits" : {
"total" : {
"value" : 20,
"relation" : "eq"
},
"max_score" : null,
"hits" : [ ]
},
"aggregations" : {
"myTerms" : {
"doc_count_error_upper_bound" : 0,
"sum_other_doc_count" : 0,
"buckets" : [
{
"key" : "Java Programmer",
"doc_count" : 7
},
{
"key" : "Javascript Programmer",
"doc_count" : 4
},
{
"key" : "QA",
"doc_count" : 3
},
{
"key" : "DBA",
"doc_count" : 2
},
{
"key" : "Web Designer",
"doc_count" : 2
}
]
},
"termsCount" : {
"value" : 7
}
}
}
result3中termsCount就是我们想要得到的total。
以上就是我对terms聚合后再对其聚合结果进行分页的解决办法,如果大家有其他更好的办法,也请分享出来。
如果觉得本文有帮助到你,请给我点个赞吧!
PS:如果需要在此基础上支持搜索功能,请移步Elasticsearch聚合结果分页并支持之模糊查询