Spring集成ES的聚合查询

1. 在build.gradle中添加需要的jar包

elasticsearch:                  "org.elasticsearch:elasticsearch:7.5.1",
elasticsearch_client:           "org.elasticsearch.client:transport:7.5.1",
springboot_elasticsearch:       "org.springframework.boot:spring-boot-starter-data-elasticsearch:2.2.0.RELEASE"

请注意,SpringBoot是2.2.0.RELEASE才兼容elasticsearch 7.x

2. application.properties

在application.properties添加elasticsearch的配置

#es的默认名称,如果安装es时没有做特殊的操作名字都是此名称
spring.data.elasticsearch.cluster-name=my-application
# Elasticsearch 集群节点服务地址,用逗号分隔,如果没有指定其他就启动一个客户端节点,默认java访问端口9300
spring.data.elasticsearch.cluster-nodes=localhost:9300
# 设置连接超时时间
spring.data.elasticsearch.properties.transport.tcp.connect_timeout=120s

3. Aggregation基础实现

3.1 聚合基础

操作之前,我们首先建一个employee索引,employee索引类型包括8个字段:编号(id), 姓名(ename),性别(sex),年龄(age),籍贯(birthplace),部门名称(deptid),职位(job),工资(salary)。

生成的mapping如下:

(1)对单个字段group by

例如要计算每个部门的员工数,如果使用SQL语句,应表达如下:

select deptid, count(*) as emp_count 
from employee 
group by deptid 
where sex=0

es的java代码如下:

public static void groupByTest(TransportClient client) {
        SearchRequestBuilder requestBuilder = client.prepareSearch("esindex").setTypes("employee");
        TermsAggregationBuilder termsAggregationBuilder = AggregationBuilders.terms("emp_count").field("deptid");
        requestBuilder.setQuery(QueryBuilders.termQuery("sex", 0)).addAggregation(termsAggregationBuilder);

        SearchResponse response = requestBuilder.execute().actionGet();
        
        Terms aggregation = response.getAggregations().get("emp_count");
        for (Terms.Bucket bucket : aggregation.getBuckets()) {
            System.out.println("部门编号=" + bucket.getKey() + ";员工数=" + bucket.getDocCount());
        }
    }

输出结果:

部门编号=8;员工数=31
部门编号=5;员工数=25
部门编号=6;员工数=22
部门编号=7;员工数=22

(2)group by多个field

例如要计算每个部门每个籍贯的员工数,如果使用SQL语句,应表达如下:

select deptid, birthplace, count(*) as emp_count 
from employee 
group by deptid, birthplace

es的java代码如下:

public static void groupByMutilFieldTest(TransportClient client) {
        SearchRequestBuilder requestBuilder = client.prepareSearch("esindex").setTypes("employee");
        TermsAggregationBuilder aggregationBuilder1 = AggregationBuilders.terms("emp_count").field("deptid");
        TermsAggregationBuilder aggregationBuilder2 = AggregationBuilders.terms("region_count").field("birthplace");
        requestBuilder.addAggregation(aggregationBuilder1.subAggregation(aggregationBuilder2));
        
        SearchResponse response = requestBuilder.execute().actionGet();
        
        Terms terms1 = response.getAggregations().get("emp_count");
        Terms terms2;
        for (Terms.Bucket bucket : terms1.getBuckets()) {
            System.out.println("部门编号=" + bucket.getKey());
            terms2 = bucket.getAggregations().get("region_count");
            for (Terms.Bucket bucket2 : terms2.getBuckets()) {
                System.out.println("籍贯=" + bucket2.getKey() + ";员工数=" + bucket2.getDocCount());
            }
        }
    }

输出结果:

部门编号=8
籍贯=广东佛山;员工数=9
籍贯=湖北武汉;员工数=8
籍贯=湖南长沙;员工数=5
籍贯=广东深圳;员工数=4
籍贯=广东广州;员工数=3
籍贯=湖南岳阳;员工数=2
部门编号=5
籍贯=湖北武汉;员工数=7
籍贯=广东佛山;员工数=4
籍贯=广东广州;员工数=4
籍贯=广东深圳;员工数=4
籍贯=湖南岳阳;员工数=3
籍贯=湖南长沙;员工数=3
部门编号=6
籍贯=广东广州;员工数=6
籍贯=广东佛山;员工数=4
籍贯=广东深圳;员工数=4
籍贯=湖南岳阳;员工数=4
籍贯=湖北武汉;员工数=2
籍贯=湖南长沙;员工数=2
部门编号=7
籍贯=广东广州;员工数=6
籍贯=湖南岳阳;员工数=6
籍贯=广东深圳;员工数=4
籍贯=湖北武汉;员工数=3
籍贯=广东佛山;员工数=2
籍贯=湖南长沙;员工数=1

(3)max/min/sum/avg

例如计算每个部门最高的工资,如果使用SQL语句,应表达如下:

select deptid, max(salary) as max_salary 
from employee 
group by deptid

es的java代码如下:

public static void maxTest(TransportClient client) {
        SearchRequestBuilder requestBuilder = client.prepareSearch("esindex").setTypes("employee");

        TermsAggregationBuilder aggregationBuilder1 = AggregationBuilders.terms("deptid").field("deptid");
        MaxAggregationBuilder aggregationBuilder2 = AggregationBuilders.max("maxsalary").field("salary");
        requestBuilder.addAggregation(aggregationBuilder1.subAggregation(aggregationBuilder2));
        SearchResponse response = requestBuilder.execute().actionGet();

        Terms aggregation = response.getAggregations().get("deptid");
        Max terms2 = null;
        for (Terms.Bucket bucket : aggregation.getBuckets()) {
            terms2 = bucket.getAggregations().get("maxsalary");  //class org.elasticsearch.search.aggregations.metrics.max.InternalMax
            System.out.println("部门编号=" + bucket.getKey() + ";最高工资=" + terms2.getValue());
        }
    }

输出结果:

部门编号=8;最高工资=8000.0
部门编号=5;最高工资=8000.0
部门编号=6;最高工资=8000.0
部门编号=7;最高工资=8000.0

(4)对多个field求max/min/sum/avg

例如要计算每个部门的平均年龄,同时又要计算总薪资,最后按平均年龄升序排序,如果使用SQL语句,应表达如下:

select deptid, avg(age) as avg_age, sum(salary) as max_salary 
from employee 
group by deptid 
order by avg_age asc

es的java代码如下:

public static void maxSumTest1(TransportClient client) {
        SearchRequestBuilder requestBuilder = client.prepareSearch("esindex").setTypes("employee");

        TermsAggregationBuilder aggregationBuilder1 = AggregationBuilders.terms("deptid").field("deptid")
                .order(BucketOrder.aggregation("avg_age",true)); //按平均年龄升序排序,
        AggregationBuilder aggregationBuilder2 = AggregationBuilders.avg("avg_age").field("age");
        AggregationBuilder aggregationBuilder3 = AggregationBuilders.sum("sum_salary").field("salary");
        requestBuilder.addAggregation(aggregationBuilder1.subAggregation(aggregationBuilder2).subAggregation(aggregationBuilder3));
        SearchResponse response = requestBuilder.execute().actionGet();

        Terms aggregation = response.getAggregations().get("deptid");
        Avg terms2 = null;
        Sum term3 = null;
        for (Terms.Bucket bucket : aggregation.getBuckets()) {
            terms2 = bucket.getAggregations().get("avg_age"); // org.elasticsearch.search.aggregations.metrics.avg.InternalAvg
            term3 = bucket.getAggregations().get("sum_salary"); // org.elasticsearch.search.aggregations.metrics.sum.InternalSum
            System.out.println("部门编号=" + bucket.getKey() + ";平均年龄=" + terms2.getValue() + ";总工资=" + term3.getValue());
        }
    }

输出结果:

部门编号=7;平均年龄=31.954545454545453;总工资=129000.0
部门编号=5;平均年龄=33.36;总工资=121000.0
部门编号=8;平均年龄=33.54838709677419;总工资=171000.0
部门编号=6;平均年龄=33.59090909090909;总工资=121000.0

3.2 ORDER 排序

1. 按桶的doc_count升序排序

AggregationBuilders
    .terms("genders")
    .field("gender")
    .order(BucketOrder.count(true))

2. 按桶的英文字母顺序,按升序排列

AggregationBuilders
    .terms("genders")
    .field("gender")
    .order(BucketOrder.key(true))

3.按单值 度量子聚合 (由聚合名称标识)对桶进行排序

AggregationBuilders
    .terms("genders")
    .field("gender")
    .order(BucketOrder.aggregation("avg_height", false))
    .subAggregation(
        AggregationBuilders.avg("avg_height").field("height")
    )

4. 按多个标准排序桶

AggregationBuilders
    .terms("genders")
    .field("gender")
    .order(BucketOrder.compound( // in order of priority:
        BucketOrder.aggregation("avg_height", false), // sort by sub-aggregation first
        BucketOrder.count(true))) // then bucket count as a tie-breaker
    .subAggregation(
        AggregationBuilders.avg("avg_height").field("height")
    )

3.3 Range Aggregation 范围聚合

AggregationBuilder aggregation =
        AggregationBuilders
                .range("agg")
                .field("height")
                .addUnboundedTo(1.0f)               // from -infinity to 1.0 (excluded)
                .addRange(1.0f, 1.5f)               // from 1.0 to 1.5 (excluded)
                .addUnboundedFrom(1.5f);            // from 1.5 to +infinity

3.4 Date Range Aggregation 日期范围聚合

AggregationBuilder aggregation =
        AggregationBuilders
                .dateRange("agg")
                .field("dateOfBirth")
                .format("yyyy")
                .addUnboundedTo("1950")    // from -infinity to 1950 (excluded)
                .addRange("1950", "1960")  // from 1950 to 1960 (excluded)
                .addUnboundedFrom("1960"); // from 1960 to +infinity

3.5 Ip Range Aggregation IP范围聚合

AggregationBuilder aggregation =
        AggregationBuilders
                .ipRange("agg")
                .field("ip")
                .addUnboundedTo("192.168.1.0")             // from -infinity to 192.168.1.0 (excluded)
                .addRange("192.168.1.0", "192.168.2.0")    // from 192.168.1.0 to 192.168.2.0 (excluded)
                .addUnboundedFrom("192.168.2.0");          // from 192.168.2.0 to +infinity

3.6 Histogram Aggregation 直方图聚合

AggregationBuilder aggregation =
        AggregationBuilders
                .histogram("agg")
                .field("height")
                .interval(1);

3.7 Date Histogram Aggregation 日期直方图句

AggregationBuilder aggregation =
        AggregationBuilders
                .dateHistogram("agg")
                .field("dateOfBirth")
                .dateHistogramInterval(DateHistogramInterval.YEAR);

或者你想要设置一个十天的间隔

AggregationBuilder aggregation =
        AggregationBuilders
                .dateHistogram("agg")
                .field("dateOfBirth")
                .dateHistogramInterval(DateHistogramInterval.days(10));

3.8 Geo Distance Aggregation 地理距离聚合

AggregationBuilder aggregation =
        AggregationBuilders
                .geoDistance("agg", new GeoPoint(48.84237171118314,2.33320027692004))
                .field("address.location")
                .unit(DistanceUnit.KILOMETERS)
                .addUnboundedTo(3.0)
                .addRange(3.0, 10.0)
                .addRange(10.0, 500.0);
最后编辑于
©著作权归作者所有,转载或内容合作请联系作者
  • 序言:七十年代末,一起剥皮案震惊了整个滨河市,随后出现的几起案子,更是在滨河造成了极大的恐慌,老刑警刘岩,带你破解...
    沈念sama阅读 213,254评论 6 492
  • 序言:滨河连续发生了三起死亡事件,死亡现场离奇诡异,居然都是意外死亡,警方通过查阅死者的电脑和手机,发现死者居然都...
    沈念sama阅读 90,875评论 3 387
  • 文/潘晓璐 我一进店门,熙熙楼的掌柜王于贵愁眉苦脸地迎上来,“玉大人,你说我怎么就摊上这事。” “怎么了?”我有些...
    开封第一讲书人阅读 158,682评论 0 348
  • 文/不坏的土叔 我叫张陵,是天一观的道长。 经常有香客问我,道长,这世上最难降的妖魔是什么? 我笑而不...
    开封第一讲书人阅读 56,896评论 1 285
  • 正文 为了忘掉前任,我火速办了婚礼,结果婚礼上,老公的妹妹穿的比我还像新娘。我一直安慰自己,他们只是感情好,可当我...
    茶点故事阅读 66,015评论 6 385
  • 文/花漫 我一把揭开白布。 她就那样静静地躺着,像睡着了一般。 火红的嫁衣衬着肌肤如雪。 梳的纹丝不乱的头发上,一...
    开封第一讲书人阅读 50,152评论 1 291
  • 那天,我揣着相机与录音,去河边找鬼。 笑死,一个胖子当着我的面吹牛,可吹牛的内容都是我干的。 我是一名探鬼主播,决...
    沈念sama阅读 39,208评论 3 412
  • 文/苍兰香墨 我猛地睁开眼,长吁一口气:“原来是场噩梦啊……” “哼!你这毒妇竟也来了?” 一声冷哼从身侧响起,我...
    开封第一讲书人阅读 37,962评论 0 268
  • 序言:老挝万荣一对情侣失踪,失踪者是张志新(化名)和其女友刘颖,没想到半个月后,有当地人在树林里发现了一具尸体,经...
    沈念sama阅读 44,388评论 1 304
  • 正文 独居荒郊野岭守林人离奇死亡,尸身上长有42处带血的脓包…… 初始之章·张勋 以下内容为张勋视角 年9月15日...
    茶点故事阅读 36,700评论 2 327
  • 正文 我和宋清朗相恋三年,在试婚纱的时候发现自己被绿了。 大学时的朋友给我发了我未婚夫和他白月光在一起吃饭的照片。...
    茶点故事阅读 38,867评论 1 341
  • 序言:一个原本活蹦乱跳的男人离奇死亡,死状恐怖,灵堂内的尸体忽然破棺而出,到底是诈尸还是另有隐情,我是刑警宁泽,带...
    沈念sama阅读 34,551评论 4 335
  • 正文 年R本政府宣布,位于F岛的核电站,受9级特大地震影响,放射性物质发生泄漏。R本人自食恶果不足惜,却给世界环境...
    茶点故事阅读 40,186评论 3 317
  • 文/蒙蒙 一、第九天 我趴在偏房一处隐蔽的房顶上张望。 院中可真热闹,春花似锦、人声如沸。这庄子的主人今日做“春日...
    开封第一讲书人阅读 30,901评论 0 21
  • 文/苍兰香墨 我抬头看了看天上的太阳。三九已至,却和暖如春,着一层夹袄步出监牢的瞬间,已是汗流浃背。 一阵脚步声响...
    开封第一讲书人阅读 32,142评论 1 267
  • 我被黑心中介骗来泰国打工, 没想到刚下飞机就差点儿被人妖公主榨干…… 1. 我叫王不留,地道东北人。 一个月前我还...
    沈念sama阅读 46,689评论 2 362
  • 正文 我出身青楼,却偏偏与公主长得像,于是被迫代替她去往敌国和亲。 传闻我的和亲对象是个残疾皇子,可洞房花烛夜当晚...
    茶点故事阅读 43,757评论 2 351

推荐阅读更多精彩内容