32、初识搜索引擎_分页搜索以及deep paging性能问题深度图解揭秘

1、如何使用Elasticsearch进行分页搜索的语法

size：分页的页大小
from：分页到第几页，从0开始为第一页

语法:

GET /_search?size=10
GET /_search?size=10&from=0
GET /_search?size=10&from=20

举例：

1、首先查询test_index/test_type中有几条数据,哪些数据

GET /test_index/test_type/_search
-----------------------------结果----------------------------
{
  "took": 8,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 1,
    "hits": [....省略 ]
  }
}
查询到的9条数据，id分别是：AW9QyvfMvHutHJSW2Hir，7，11，2，5，8，9，1，3

2、我们假设将这9条数据分成3页，每一页是3条数据，来实验一下这个分页搜索的效果
第一页：id = AW9QyvfMvHutHJSW2Hir，7，11

GET /test_index/test_type/_search?from=0&size=3
-------------------------结果----------------------------
{
  "took": 2,
  "timed_out": false,
  "_shards": {
    "total": 3,
    "successful": 3,
    "failed": 0
  },
  "hits": {
    "total": 9,
    "max_score": 1,
    "hits": [
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "AW9QyvfMvHutHJSW2Hir",
        "_score": 1,
        "_source": {
          "test_content": "my test_content automatic"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "7",
        "_score": 1,
        "_source": {
          "test_field": "test client 2"
        }
      },
      {
        "_index": "test_index",
        "_type": "test_type",
        "_id": "11",
        "_score": 1,
        "_source": {
          "num": 0,
          "tags": [
            "tag1"
          ]
        }
      }
    ]
  }
}

第二页：id = 2，5，8

GET /test_index/test_type/_search?from=3&size=3

第三页：id = 9，1，3

GET /test_index/test_type/_search?from=6&size=3

2、什么是deep paging，它的底层原理是什么，会产生什么问题？

deep paging性能问题，很高级的知识点

什么是deep paging：

deep paging简单来说就是搜索过深。比如总共有4个shard，其中有3个shard存储有相同索引的数据，有一个shard上没有，总共有60000条数据，3个shard，每个shard上存储了20000条数据，每页是10条数据，这个时候，要搜索第1000页的数据，一般情况下，搜索到的就是10001-10010，但是分片从每个shard上获取数据的话，我们真正获取到的数据是第几条到第几条？

错误的理解：每个shard都要返回的是最后10条数据。
错误原因：因为客户端的请求可能首先会发送到一个不包含这个index的shard上去，这个node就是coordinate node，那么这个coordinate node就会将搜索请求转发到index的三个shard所在node上去。

实际情况：要搜索60000条数据中的第1000页的数据，实际上每个shard都要将内部的20000条数据从第1000页数据如第10001-10010条为止之前的数据都拿出来，不是只拿10条，而是拿10010条数据，3个shard每个shard都返回10010条数据给coordinate node，coordinate node会总共接收30030条数据，然后再把这些数据进行排序，这里排序根据_score（相关度分数）排序，从排序之后的数据里取第1000页的数据，正好10条数据，并返回给客户端。

deep paging有什么问题：

搜索过深的时候，就需要在coordinate node上存储大量的数据，还要进行大量数据的排序，排序之后，再取出对应的那一页，这个过程，既会耗费网络宽带和内存，还因为大量的排序运算耗费cpu，这是deep paging的性能问题，我们应该尽量避免出现这种deep paging操作。

deep paging问题图解