最近在用es做统计的时候,有个同比和环比的需求,但是数据量超过10000的时候只能显示10000+,如下:
经过在网上冲浪之后找到了解决办法,一行代码搞定!
searchSourceBuilder.trackTotalHits(true);
官网解释是这样的:
elasticsearch 分页查询全部数据只显示10000条问题
当索引非常非常大(千万或亿),是无法安装 from + size 做深分页的,分页越深则越容易OOM,即便不OOM,也是很消耗CPU和内存资源的。官方在后2.x版本中已增加限定 index.max_result_window:10000作为保护措施 ,即默认 from + size 不能超过1万。
所以数据量大的话,采用scroll游标查询。
final Scroll scroll = new Scroll(TimeValue.timeValueMinutes(1L)); //设定滚动时间间隔
searchRequest.scroll(scroll);
searchSourceBuilder.size(1000); //设定每次返回多少条数据
searchRequest.source(searchSourceBuilder);
SearchResponse searchResponse = restHighLevelClient.search(searchRequest, RequestOptions.DEFAULT);
String scrollId = searchResponse.getScrollId();
List<Map<String, Object>> mapList = new ArrayList<>();
boolean flag;
SearchHit[] searchHits = searchResponse.getHits().getHits();
for (SearchHit searchHit : searchHits) {
mapList.add(searchHit.getSourceAsMap());
}
}
//遍历搜索命中的数据,直到没有数据
while (searchHits != null && searchHits.length > 0) {
SearchScrollRequest scrollRequest = new SearchScrollRequest(scrollId);
scrollRequest.scroll(scroll);
try {
searchResponse = restHighLevelClient.scroll(scrollRequest, RequestOptions.DEFAULT);
} catch (IOException e) {
e.printStackTrace();
}
scrollId = searchResponse.getScrollId();
searchHits = searchResponse.getHits().getHits();
if (searchHits != null && searchHits.length > 0) {
for (SearchHit searchHit : searchHits) {
mapList.add(searchHit.getSourceAsMap());
}
}
}
}
//清除滚屏
ClearScrollRequest clearScrollRequest = new ClearScrollRequest();
clearScrollRequest.addScrollId(scrollId);//也可以选择setScrollIds()将多个scrollId一起使用
ClearScrollResponse clearScrollResponse = null;
try {
clearScrollResponse = restHighLevelClient.clearScroll(clearScrollRequest,RequestOptions.DEFAULT);
} catch (IOException e) {
e.printStackTrace();
}
boolean succeeded = clearScrollResponse.isSucceeded();
System.out.println("succeeded:" + succeeded);